Computer Graphics Forum
Browsing Computer Graphics Forum by Subject "3D imaging"
Item ClothingTwin: Reconstructing Inner and Outer Layers of Clothing Using 3D Gaussian Splatting (The Eurographics Association and John Wiley & Sons Ltd., 2025) Jung, Munkyung; Lee, Dohae; Lee, In-Kwon; Christie, Marc; Pietroni, Nico; Wang, Yu-Shuen
We introduce ClothingTwin, a novel end-to-end framework for reconstructing 3D digital twins of clothing that capture both the outer and inner fabric, without the need for manual mannequin removal. Traditional 2D "ghost mannequin" photography techniques remove the mannequin and composite partial inner textures to create images in which the garment appears as if it were worn by a transparent model. However, extending such methods to photorealistic 3D Gaussian Splatting (3DGS) is far more challenging: achieving consistent inner-layer compositing across the large sets of images used for 3DGS optimization quickly becomes impractical if done manually. To address these issues, ClothingTwin introduces three key innovations. First, a specialized image acquisition protocol captures two sets of images for each garment: one worn normally on the mannequin (outer layer exposed) and one worn inside-out (inner layer exposed). This eliminates the need to painstakingly edit out mannequins in thousands of images and provides full coverage of all fabric surfaces. Second, we employ a mesh-guided 3DGS reconstruction for each layer and leverage Non-Rigid Iterative Closest Point (ICP) to align the outer and inner point clouds despite their distinct geometries. Third, our enhanced rendering pipeline, featuring mesh-guided back-face culling, back-to-front alpha blending, and recalculated spherical harmonic angles, ensures photorealistic visualization of the combined outer and inner layers without inter-layer artifacts. Experimental evaluations on various garments show that ClothingTwin outperforms conventional 3DGS-based methods, and our ablation study validates the effectiveness of each proposed component.

Item CP-NeRF: Conditionally Parameterized Neural Radiance Fields for Cross-scene Novel View Synthesis (The Eurographics Association and John Wiley & Sons Ltd., 2023) He, Hao; Liang, Yixun; Xiao, Shishi; Chen, Jierun; Chen, Yingcong; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Neural radiance fields (NeRF) have demonstrated a promising research direction for novel view synthesis. However, existing approaches either require per-scene optimization, which takes significant computation time, or condition on local features, which overlooks the global context of images. To tackle this shortcoming, we propose Conditionally Parameterized Neural Radiance Fields (CP-NeRF), a plug-in module that enables NeRF to leverage contextual information at different scales. Instead of optimizing the model parameters of NeRFs directly, we train a Feature Pyramid hyperNetwork (FPN) that extracts view-dependent global and local information from images within or across scenes to produce the model parameters. Our model can be trained end-to-end with the standard photometric loss from NeRF. Extensive experiments demonstrate that our method significantly boosts the performance of NeRF, achieving state-of-the-art results on various benchmark datasets.
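The CP-NeRF entry above hinges on a hypernetwork that produces the parameters of a NeRF-style MLP from image features. The following is a rough sketch of that general idea only, not the authors' Feature Pyramid hyperNetwork; all module names, layer sizes, and the two-layer target MLP are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyHyperNet(nn.Module):
    """Maps a global image feature to the weights of a small MLP, then evaluates it."""
    def __init__(self, feat_dim=256, in_dim=3, hidden=64, out_dim=4):
        super().__init__()
        # Parameter shapes of the target MLP: one hidden layer plus an output layer.
        self.shapes = [(hidden, in_dim), (hidden,), (out_dim, hidden), (out_dim,)]
        n_params = sum(torch.Size(s).numel() for s in self.shapes)
        self.generator = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, n_params))

    def forward(self, feat, x):
        # feat: (feat_dim,) image feature; x: (N, in_dim) query points.
        params = self.generator(feat)
        sizes = [torch.Size(s).numel() for s in self.shapes]
        w1, b1, w2, b2 = torch.split(params, sizes)
        h = torch.relu(x @ w1.view(self.shapes[0]).t() + b1)
        return h @ w2.view(self.shapes[2]).t() + b2

# Usage: TinyHyperNet()(torch.randn(256), torch.randn(1024, 3)) -> (1024, 4) outputs.
```

The key property is that gradients flow through the generated weights back into the generator, so the whole stack can be trained with a photometric loss in the usual way.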
Item HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections (The Eurographics Association and John Wiley & Sons Ltd., 2024) Dudai, Chen; Alper, Morris; Bezalel, Hana; Hanocka, Rana; Lang, Itai; Averbuch-Elor, Hadar; Bermano, Amit H.; Kalogerakis, Evangelos
Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior work focuses primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In more constrained 3D domains, recent methods have leveraged modern vision-and-language models as a strong prior of 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain and fail to exploit the geometric consistency of images capturing multiple views of such scenes. In this work, we present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene, by harnessing the power of state-of-the-art vision-and-language models with adaptations for understanding landmark scene semantics. To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly related textual information. Our approach is built upon the premise that images physically grounded in space can provide a powerful supervision signal for localizing new concepts, whose semantics may be unlocked from Internet textual metadata with large language models. We use correspondences between views of scenes to bootstrap spatial understanding of these semantics, providing guidance for 3D-compatible segmentation that is ultimately lifted to a volumetric scene representation. To evaluate our method, we present a new benchmark dataset containing large-scale scenes with ground-truth segmentations for multiple semantic concepts. Our results show that HaLo-NeRF can accurately localize a variety of semantic concepts related to architectural landmarks, surpassing the results of other 3D models as well as strong 2D segmentation baselines. Our code and data are publicly available at https://tau-vailab.github.io/HaLo-NeRF/.

Item Joint Deblurring and 3D Reconstruction for Macrophotography (The Eurographics Association and John Wiley & Sons Ltd., 2025) Zhao, Yifan; Li, Liangchen; Zhou, Yuqi; Wang, Kai; Liang, Yan; Zhang, Juyong; Christie, Marc; Pietroni, Nico; Wang, Yu-Shuen
Macro lenses offer high resolution and large magnification, and 3D modeling of small, detailed objects can provide richer information. However, defocus blur in macrophotography is a long-standing problem that heavily hinders clear imaging of captured objects and their high-quality 3D reconstruction. Traditional image deblurring methods require a large number of images and annotations, and there is currently no multi-view 3D reconstruction method designed for macrophotography. In this work, we propose a joint deblurring and 3D reconstruction method for macrophotography. Starting from captured multi-view blurry images, we jointly optimize a clear 3D model of the object and the defocus blur kernel of each pixel. The entire framework adopts differentiable rendering to self-supervise the optimization of the 3D model and the defocus blur kernels. Extensive experiments show that, from a small number of multi-view images, our proposed method can not only achieve high-quality image deblurring but also recover a high-fidelity 3D appearance.
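The macrophotography entry above jointly optimizes a sharp 3D model and a per-pixel defocus blur kernel. Below is a minimal sketch of how a per-pixel kernel can be applied to a rendered image in a differentiable way; it is a generic construction rather than the authors' pipeline, and the function name, kernel parameterization, and softmax normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def blur_with_per_pixel_kernels(rendered, kernels):
    """rendered: (C, H, W) sharp rendering; kernels: (H, W, k*k) per-pixel kernel logits."""
    c, h, w = rendered.shape
    k = int(round(kernels.shape[-1] ** 0.5))                   # kernel window is k x k
    # Gather the k*k neighbours of every pixel, then take a weighted sum per pixel.
    patches = F.unfold(rendered.unsqueeze(0), kernel_size=k, padding=k // 2)  # (1, C*k*k, H*W)
    patches = patches.reshape(c, k * k, h, w)
    weights = torch.softmax(kernels, dim=-1).permute(2, 0, 1)  # (k*k, H, W), sums to 1 per pixel
    return (patches * weights.unsqueeze(0)).sum(dim=1)          # (C, H, W) blurred image
```

Because both the rendering and the kernel weights stay differentiable, a photometric loss against the blurry captures can supervise the 3D model and the kernels at the same time.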
Item Multimodal 3D Few-Shot Classification via Gaussian Mixture Discriminant Analysis (The Eurographics Association and John Wiley & Sons Ltd., 2025) Wu, Yiqi; Wu, Huachao; Hu, Ronglei; Chen, Yilin; Zhang, Dejun; Christie, Marc; Pietroni, Nico; Wang, Yu-Shuen
While pre-trained 3D vision-language models are becoming increasingly available, there remains a lack of frameworks that can effectively harness their capabilities for few-shot classification. In this work, we propose PointGMDA, a training-free framework that combines Gaussian Mixture Models (GMMs) with Gaussian Discriminant Analysis (GDA) to perform robust classification using only a few labeled point cloud samples. Our method estimates GMM parameters per class from support data and computes mixture-weighted prototypes, which are then used in GDA with a shared covariance matrix to construct decision boundaries. This formulation allows us to model intra-class variability more expressively than traditional single-prototype approaches, while maintaining analytical tractability. To incorporate semantic priors, we integrate CLIP-style textual prompts and fuse predictions from the geometric and textual modalities through a hybrid scoring strategy. We further introduce PointGMDA-T, a lightweight attention-guided refinement module that learns residuals for fast feature adaptation, improving robustness under distribution shift. Extensive experiments on ModelNet40 and ScanObjectNN demonstrate that PointGMDA outperforms strong baselines across a variety of few-shot settings, with consistent gains under both training-free and fine-tuned conditions. These results highlight the effectiveness and generality of our probabilistic modeling and multimodal adaptation framework. Our code is publicly available at https://github.com/djzgroup/PointGMDA.
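The PointGMDA entry above scores queries against mixture-weighted class prototypes using a shared-covariance Gaussian discriminant. A minimal sketch of that textbook-style computation follows, assuming equal class priors and a small diagonal regularizer; it is not the authors' released code.

```python
import numpy as np

def mixture_prototype(means, weights):
    """Collapse a class's GMM component means (K, D) with weights (K,) into one prototype."""
    return np.average(means, axis=0, weights=weights)

def gda_scores(query, prototypes, shared_cov):
    """Linear discriminant scores under a shared covariance (equal priors assumed).

    query: (D,) feature; prototypes: (C, D), one prototype per class;
    shared_cov: (D, D) covariance pooled over all support features.
    """
    precision = np.linalg.inv(shared_cov + 1e-6 * np.eye(shared_cov.shape[0]))
    # score_c = mu_c^T P x - 0.5 * mu_c^T P mu_c  (the x^T P x term is class-independent).
    lin = prototypes @ precision @ query
    quad = 0.5 * np.einsum('cd,de,ce->c', prototypes, precision, prototypes)
    return lin - quad
```

Scores from geometric features and from text-prompt features could then be combined, in the spirit of the hybrid scoring strategy the abstract describes.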
Item Practical Acquisition of Shape and Plausible Appearance of Reflective and Translucent Objects (The Eurographics Association and John Wiley & Sons Ltd., 2023) Lin, Arvin; Lin, Yiming; Ghosh, Abhijeet; Ritschel, Tobias; Weidlich, Andrea
We present a practical method for acquiring the shape and plausible appearance of reflective and translucent objects for realistic rendering and relighting applications. Such objects are extremely challenging to scan with existing capture setups and have previously required complex light-stage hardware emitting continuous illumination. We instead employ a practical capture setup consisting of a set of desktop LCD screens that illuminate such objects with piece-wise continuous illumination for acquisition. We employ phase-shifted sinusoidal illumination for novel estimation of high-quality photometric normals and the transmission vector, along with diffuse-specular separated reflectance/transmission maps for realistic relighting. We further employ neural in-painting to fill holes in our measurements caused by gaps in the screen illumination, and a novel NeuS-based neural rendering approach that combines the shape and reflectance maps acquired from multiple viewpoints for high-quality 3D surface geometry reconstruction along with plausible, realistic rendering of complex light transport in such objects.

Item ShellNeRF: Learning a Controllable High-resolution Model of the Eye and Periocular Region (The Eurographics Association and John Wiley & Sons Ltd., 2024) Li, Gengyan; Sarkar, Kripasindhu; Meka, Abhimitra; Buehler, Marcel; Mueller, Franziska; Gotardo, Paulo; Hilliges, Otmar; Beeler, Thabo; Bermano, Amit H.; Kalogerakis, Evangelos
Eye gaze and expressions are crucial non-verbal signals in face-to-face communication. Visual effects and telepresence demand significant improvements in personalized tracking, animation, and synthesis of the eye region to achieve true immersion. Morphable face models, in combination with coordinate-based neural volumetric representations, show promise in solving the difficult problem of reconstructing intricate geometry (eyelashes) and synthesizing photorealistic appearance variations (wrinkles and specularities) of eye performances. We propose a novel hybrid representation, ShellNeRF, that builds a discretized volume around a 3DMM face mesh using concentric surfaces to model the deformable 'periocular' region. We define a canonical space using the UV layout of the shells, which constrains the space of dense correspondence search. Combined with an explicit eyeball mesh for modeling corneal light transport, our model allows for animatable photorealistic 3D synthesis of the whole eye region. Using multi-view video input, we demonstrate significant improvements over the state of the art in expression re-enactment and transfer for high-resolution close-up views of the eye region.
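ShellNeRF above builds its discretized volume from concentric surfaces around a face mesh. One simple way to construct such offset shells is sketched below; this is a hedged illustration only, and the linear offset scheme, shell count, and names are assumptions rather than the paper's construction.

```python
import numpy as np

def offset_shells(vertices, normals, num_shells=8, max_offset=0.02):
    """Build concentric offset surfaces by displacing vertices along their normals.

    vertices: (V, 3) mesh vertices; normals: (V, 3) unit vertex normals.
    Returns (num_shells, V, 3): one copy of the mesh per shell, pushed outward
    by an increasing offset (in mesh units), shell 0 being the mesh itself.
    """
    offsets = np.linspace(0.0, max_offset, num_shells)
    return vertices[None, :, :] + offsets[:, None, None] * normals[None, :, :]
```

Because every shell shares the base mesh's UV layout, per-shell samples inherit a natural 2D parameterization, which is what makes a UV-based canonical space convenient for correspondence.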