VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction

Corresponding authors

1 Sun Yat-Sen University, 2 Cardiff University, 3 ShanghaiTech University, 4 Peng Cheng Laboratory, 5 Guangdong Key Laboratory of Big Data Analysis and Processing

CVPR 2025

Abstract

Virtual Try-On (VTON) is a transformative technology in e-commerce and fashion design, enabling realistic digital visualization of clothing on individuals. In this work, we propose VTON 360, a novel 3D VTON method that addresses the open challenge of achieving high-fidelity VTON that supports any-view rendering. Specifically, we leverage the equivalence between a 3D model and its rendered multi-view 2D images, and reformulate 3D VTON as an extension of 2D VTON that ensures 3D-consistent results across multiple views. To achieve this, we extend 2D VTON models to take multi-view garments and clothing-agnostic human body images as input, and propose several novel techniques to enhance them, including: i) a pseudo-3D pose representation using normal maps derived from the SMPL-X 3D human model, ii) a multi-view spatial attention mechanism that models the correlations between features from different viewing angles, and iii) a multi-view CLIP embedding that enhances the garment CLIP features used in 2D VTON with camera information. Extensive experiments on large-scale real datasets and clothing images from e-commerce platforms demonstrate the effectiveness of our approach.

Method

Given an input 3D human model \(\mathbf{G_{\rm src}}\) and a pair of garment images \((g_f, g_b)\), our method 1) renders \(\mathbf{G_{\rm src}}\) into multi-view 2D images (left), 2) edits the rendered multi-view 2D images (middle), and 3) reconstructs the edited images into a 3D model \(\mathbf{G_{\rm VTON}}\) (right). In the crucial step 2), we propose three novel techniques to equip a typical 2D VTON network with the capability to generate 3D-consistent results: i) Pseudo-3D Pose Input, ii) Multi-view Spatial Attention, and iii) Multi-view CLIP Embedding.
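To make the idea behind Multi-view Spatial Attention more concrete, the following is a minimal NumPy sketch, not the implementation used in VTON 360. It assumes that per-view feature maps are flattened into tokens and that tokens from all views are concatenated before a single self-attention pass, so that every spatial location can attend to locations in every other view. The function name `multi_view_spatial_attention`, the tensor shapes, and the random projection matrices `w_q`, `w_k`, `w_v` are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_view_spatial_attention(feats, w_q, w_k, w_v):
    """Joint self-attention over the tokens of all views (illustrative only).

    feats: array of shape (V, N, C) with V views, N spatial tokens per view,
           and C channels per token.
    Returns the attended features with shape (V, N, C_out) and the
    (V*N, V*N) attention matrix.
    """
    num_views, num_tokens, channels = feats.shape
    # Merge the view and spatial axes so attention spans every view at once.
    tokens = feats.reshape(num_views * num_tokens, channels)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Scaled dot-product attention over all V*N tokens.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    out = attn @ v
    return out.reshape(num_views, num_tokens, -1), attn


# Toy inputs: 4 views, 16 tokens per view, 8 channels (assumed sizes).
rng = np.random.default_rng(0)
V, N, C = 4, 16, 8
feats = rng.standard_normal((V, N, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))

out, attn = multi_view_spatial_attention(feats, w_q, w_k, w_v)

# Perturb only view 1 and recompute: view 0's output changes as well,
# because its tokens attend to tokens from the other views.
perturbed = feats.copy()
perturbed[1] += 1.0
out_perturbed, _ = multi_view_spatial_attention(perturbed, w_q, w_k, w_v)
view0_changed = not np.allclose(out[0], out_perturbed[0])
```

In this sketch the cross-view coupling comes entirely from the reshape that merges the view and token axes; attending within each view separately would instead leave `out[0]` unaffected by a change to view 1.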


3D Virtual Try-on Results with E-commerce garments (from MVG Dataset)

3D Virtual Try-on Results on THuman2.0 Dataset

3D Virtual Try-on Results on MVHumanNet Dataset

3D Virtual Try-on Results on an Unseen Real Scene


Qualitative Comparisons