Search Results

Now showing 1 - 10 of 36
  • Item
    Dense 3D Gaussian Splatting Initialization for Sparse Image Data
    (The Eurographics Association, 2024) Seibt, Simon; Chang, Thomas Vincent Siu-Lung; von Rymon Lipinski, Bartosz; Latoschik, Marc Erich; Liu, Lingjie; Averkiou, Melinos
    This paper presents advancements in novel-view synthesis with 3D Gaussian Splatting (3DGS) using a dense and accurate SfM point cloud initialization approach. We address the challenge of achieving photorealistic renderings from sparse image data, where basic 3DGS training may result in suboptimal convergence, thus leading to visual artifacts. The proposed method enhances the precision and density of the initially reconstructed point clouds by refining 3D positions and extrapolating additional points, even for difficult image regions, e.g., those with repeating patterns and suboptimal visual coverage. Our contributions focus on improving "Dense Feature Matching for Structure-from-Motion" (DFM4SfM) based on a homographic decomposition of the image space to support 3DGS training: First, a grid-based feature detection method is introduced for DFM4SfM to ensure a well-distributed 3D Gaussian initialization uniformly over all depth planes. Second, the SfM feature matching is complemented by a geometric plausibility check, priming the homography estimation and thereby improving the initial placement of 3D Gaussians. Experimental results on the NeRF-LLFF dataset demonstrate that this approach achieves superior qualitative and quantitative results, even for fewer views, and the potential for a significantly accelerated 3DGS training with faster convergence.
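    The grid-based detection idea can be illustrated with a short sketch: the image is divided into cells and features are detected per cell, so that no region is left without candidate points. This is a generic illustration assuming OpenCV's SIFT detector, not the authors' DFM4SfM implementation; the grid size and per-cell budget are arbitrary placeholders.

```python
import cv2
import numpy as np

def grid_feature_detection(gray, rows=8, cols=8, per_cell=64):
    """Detect keypoints per grid cell so features cover all image regions."""
    sift = cv2.SIFT_create()
    h, w = gray.shape
    keypoints = []
    for r in range(rows):
        for c in range(cols):
            # Restrict detection to one cell via a binary mask.
            mask = np.zeros((h, w), dtype=np.uint8)
            mask[r * h // rows:(r + 1) * h // rows,
                 c * w // cols:(c + 1) * w // cols] = 255
            kps = sift.detect(gray, mask)
            # Keep only the strongest responses within this cell.
            kps = sorted(kps, key=lambda k: k.response, reverse=True)[:per_cell]
            keypoints.extend(kps)
    return keypoints

# Usage: gray = cv2.imread("view_000.png", cv2.IMREAD_GRAYSCALE)
#        kps = grid_feature_detection(gray)
```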
  • Item
    Single-Image SVBRDF Estimation with Learned Gradient Descent
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Luo, Xuejiao; Scandolo, Leonardo; Bousseau, Adrien; Eisemann, Elmar; Bermano, Amit H.; Kalogerakis, Evangelos
    Recovering spatially-varying materials from a single photograph of a surface is inherently ill-posed, making the direct application of a gradient descent on the reflectance parameters prone to poor minima. Recent methods leverage deep learning either by directly regressing reflectance parameters using feed-forward neural networks or by learning a latent space of SVBRDFs using encoder-decoder or generative adversarial networks followed by a gradient-based optimization in latent space. The former is fast but does not account for the likelihood of the prediction, i.e., how well the resulting reflectance explains the input image. The latter provides a strong prior on the space of spatially-varying materials, but this prior can hinder the reconstruction of images that are too different from the training data. Our method combines the strengths of both approaches. We optimize reflectance parameters to best reconstruct the input image using a recurrent neural network, which iteratively predicts how to update the reflectance parameters given the gradient of the reconstruction likelihood. By combining a learned prior with a likelihood measure, our approach provides a maximum a posteriori estimate of the SVBRDF. Our evaluation shows that this learned gradient-descent method achieves state-of-the-art performance for SVBRDF estimation on synthetic and real images.
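    A minimal PyTorch sketch of the learned gradient-descent loop described above: a recurrent module receives the gradient of the reconstruction loss and predicts the next update to the reflectance parameters. The architecture, hidden size, and the differentiable render function are placeholder assumptions, not the authors' network.

```python
import torch
import torch.nn as nn

class UpdatePredictor(nn.Module):
    """Recurrent module: maps the current likelihood gradient to a parameter update."""
    def __init__(self, n_params, hidden=128):
        super().__init__()
        self.cell = nn.GRUCell(n_params, hidden)
        self.head = nn.Linear(hidden, n_params)

    def forward(self, grad, state):
        state = self.cell(grad, state)
        return self.head(state), state

def learned_descent(target, render, params, predictor, steps=20):
    """Inference-time refinement loop; render(params) is a differentiable
    forward renderer (placeholder assumption)."""
    state = torch.zeros(1, 128)
    params = params.clone().requires_grad_(True)
    for _ in range(steps):
        loss = (render(params) - target).pow(2).mean()   # reconstruction likelihood
        grad, = torch.autograd.grad(loss, params)
        update, state = predictor(grad.detach().view(1, -1), state)
        params = (params + update.view_as(params)).detach().requires_grad_(True)
    return params
```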
  • Item
    3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization
    (The Eurographics Association, 2024) Chung, SeungJeh; Park, JooHyun; Kang, HyeongYeop; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
    3D stylization, the application of specific styles to three-dimensional objects, offers substantial commercial potential by enabling the creation of uniquely styled 3D objects tailored to diverse scenes. Recent advancements in artificial intelligence and text-driven manipulation methods have made the stylization process increasingly intuitive and automated. While these methods reduce human costs by minimizing reliance on manual labor and expertise, they predominantly focus on holistic stylization, neglecting the application of desired styles to individual components of a 3D object. This limitation restricts fine-grained controllability. To address this gap, we introduce 3DStyleGLIP, a novel framework specifically designed for text-driven, part-tailored 3D stylization. Given a 3D mesh and a text prompt, 3DStyleGLIP utilizes the vision-language embedding space of the Grounded Language-Image Pre-training (GLIP) model to localize individual parts of the 3D mesh and modify their appearance to match the styles specified in the text prompt. 3DStyleGLIP effectively integrates part localization and stylization guidance within GLIP's shared embedding space through an end-to-end process, enabled by a part-level style loss and two complementary learning techniques. This neural methodology meets the user's need for fine-grained style editing and delivers high-quality part-specific stylization results, opening new possibilities for customization and flexibility in 3D content creation. Our code and results are available at https://github.com/sj978/3DStyleGLIP.
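    The structure of a part-level style loss can be sketched as follows: each localized part of the rendered mesh is pushed toward the embedding of its own style prompt in a shared vision-language space. The glip_image_embed and glip_text_embed helpers are hypothetical stand-ins for a vision-language model's encoders; this illustrates the loss structure only, not the paper's implementation.

```python
import torch.nn.functional as F

def part_style_loss(rendered, part_masks, part_prompts,
                    glip_image_embed, glip_text_embed):
    """Sketch: push each masked part region toward its own style prompt.
    The glip_* callables are hypothetical encoder stand-ins."""
    loss = 0.0
    for mask, prompt in zip(part_masks, part_prompts):
        region = rendered * mask                              # isolate one part
        img_emb = F.normalize(glip_image_embed(region), dim=-1)
        txt_emb = F.normalize(glip_text_embed(prompt), dim=-1)
        loss = loss + (1.0 - (img_emb * txt_emb).sum(-1).mean())  # cosine distance
    return loss
```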
  • Item
    Application of 3D Gaussian Splatting for Cinematic Anatomy on Consumer Class Devices
    (The Eurographics Association, 2024) Niedermayr, Simon; Neuhauser, Christoph; Petkov, Kaloian; Engel, Klaus; Westermann, Rüdiger; Linsen, Lars; Thies, Justus
    Interactive photorealistic rendering of 3D anatomy is used in medical education to explain the structure of the human body. It is currently restricted to frontal teaching scenarios, where interactive demonstrations can hardly be achieved even with a powerful GPU and high-speed access to the large storage device hosting the data set. We present the use of novel view synthesis via compressed 3D Gaussian Splatting (3DGS) to overcome this restriction, and to even enable students to perform cinematic anatomy on lightweight and mobile devices. Our proposed pipeline first finds a set of camera poses that captures all potentially seen structures in the data. High-quality images are then generated with path tracing and converted into a compact 3DGS representation, consuming < 70 MB even for data sets of multiple GBs. This allows for real-time photorealistic novel view synthesis that recovers structures up to the voxel resolution and is almost indistinguishable from the path-traced images.
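    A covering set of camera poses, as required by the first stage of such a pipeline, can be obtained in many ways; one simple option is to place cameras on a Fibonacci sphere around the volume and aim each at its center. The sketch below illustrates only that generic option and is not the paper's pose-selection procedure.

```python
import numpy as np

def fibonacci_sphere_poses(n, radius, center=np.zeros(3)):
    """Place n cameras on a sphere around `center` and aim them at it.
    Returns 3x4 camera-to-world matrices [right | up | -forward | eye]."""
    golden = np.pi * (3.0 - np.sqrt(5.0))
    poses = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n            # uniform in (-1, 1)
        r = np.sqrt(1.0 - y * y)
        theta = golden * i
        eye = center + radius * np.array([r * np.cos(theta), y, r * np.sin(theta)])
        forward = (center - eye) / np.linalg.norm(center - eye)
        right = np.cross(forward, np.array([0.0, 1.0, 0.0]))
        right /= np.linalg.norm(right)
        up = np.cross(right, forward)
        poses.append(np.stack([right, up, -forward, eye], axis=1))
    return poses
```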
  • Item
    HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Dudai, Chen; Alper, Morris; Bezalel, Hana; Hanocka, Rana; Lang, Itai; Averbuch-Elor, Hadar; Bermano, Amit H.; Kalogerakis, Evangelos
    Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In more constrained 3D domains, recent methods have leveraged modern vision-and-language models as a strong prior of 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain and fail to exploit the geometric consistency of images capturing multiple views of such scenes. In this work, we present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene, by harnessing the power of SOTA vision-and-language models with adaptations for understanding landmark scene semantics. To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly-related textual information. Our approach is built upon the premise that images physically grounded in space can provide a powerful supervision signal for localizing new concepts, whose semantics may be unlocked from Internet textual metadata with large language models. We use correspondences between views of scenes to bootstrap spatial understanding of these semantics, providing guidance for 3D-compatible segmentation that ultimately lifts to a volumetric scene representation. To evaluate our method, we present a new benchmark dataset containing large-scale scenes with ground-truth segmentations for multiple semantic concepts. Our results show that HaLo-NeRF can accurately localize a variety of semantic concepts related to architectural landmarks, surpassing the results of other 3D models as well as strong 2D segmentation baselines. Our code and data are publicly available at https://tau-vailab.github.io/HaLo-NeRF/.
  • Item
    Search Me Knot, Render Me Knot: Embedding Search and Differentiable Rendering of Knots in 3D
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Gangopadhyay, Aalok; Gupta, Paras; Sharma, Tarun; Singh, Prajwal; Raman, Shanmuganathan; Hu, Ruizhen; Lefebvre, Sylvain
    We introduce the problem of knot-based inverse perceptual art. Given multiple target images and their corresponding viewing configurations, the objective is to find a 3D knot-based tubular structure whose appearance resembles the target images when viewed from the specified viewing configurations. To solve this problem, we first design a differentiable rendering algorithm for rendering tubular knots embedded in 3D for arbitrary perspective camera configurations. Utilizing this differentiable rendering algorithm, we search over the space of knot configurations to find the ideal knot embedding. We represent the knot embeddings via homeomorphisms of the desired template knot, where the weights of an invertible neural network parametrize the homeomorphisms. Our approach is fully differentiable, making it possible to find the ideal 3D tubular structure for the desired perceptual art using gradient-based optimization. We propose several loss functions that impose additional physical constraints, enforcing that the tube is free of self-intersection, lies within a predefined region in space, satisfies the physical bending limits of the tube material, and keeps the material cost within a specified budget. We demonstrate that our knot representation is highly expressive and gives impressive results even for challenging target images under both single-view and multiple-view constraints. Through an extensive ablation study, we show that each proposed loss function effectively ensures physical realizability. We construct a real-world 3D-printed object to demonstrate the practical utility of our approach.
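    One of the physical-plausibility constraints, freedom from self-intersection, can be sketched as a simple penalty on sampled points of the tube centerline: any pair of non-adjacent samples closer than the tube diameter is penalized. This PyTorch sketch is a generic illustration; the sampling density, neighbour-exclusion window, and squared penalty are assumptions rather than the paper's exact loss.

```python
import torch

def self_intersection_loss(points, tube_radius, skip=4):
    """Penalize centerline samples closer than the tube diameter.
    `points` is an (N, 3) tensor of samples along the closed knot;
    `skip` excludes neighbours along the curve that are naturally close."""
    d = torch.cdist(points, points)                    # (N, N) pairwise distances
    n = points.shape[0]
    idx = torch.arange(n)
    gap = (idx[None, :] - idx[:, None]).abs()
    gap = torch.minimum(gap, n - gap)                  # the knot is a closed loop
    d = d.masked_fill(gap <= skip, float("inf"))       # ignore adjacent samples
    violation = torch.clamp(2.0 * tube_radius - d, min=0.0)
    return violation.pow(2).sum()
```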
  • Item
    TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Franke, Linus; Rückert, Darius; Fink, Laura; Stamminger, Marc; Bermano, Amit H.; Kalogerakis, Evangelos
    Point-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, even the latest approaches in this domain are not without shortcomings. 3D Gaussian Splatting [KKLD23] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [RFS22] can produce crisper images, but its neural reconstruction network decreases performance, grapples with temporal instability, and cannot effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrates that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage. The project page is located at: https://lfranke.github.io/trips
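    The level-selection idea, writing each point into the pyramid layer that matches its projected size, can be sketched in a few lines: the continuous pyramid coordinate is the log of the point's screen-space extent, and the point is blended between the two bracketing levels. This is a generic sketch of the idea (the per-pixel bilinear part of the trilinear write and the GPU rasterizer are omitted), with level conventions chosen purely for illustration.

```python
import math

def pyramid_level_weights(point_size_px, num_levels):
    """Pick the two image-pyramid levels bracketing a point's projected size and
    the blend factor between them. Level 0 holds 1-pixel points; level l holds
    2**l-pixel points (an assumed convention)."""
    level = math.log2(max(point_size_px, 1.0))   # continuous pyramid coordinate
    lo = min(int(math.floor(level)), num_levels - 1)
    hi = min(lo + 1, num_levels - 1)
    t = level - math.floor(level)                # weight toward the coarser level
    return (lo, 1.0 - t), (hi, t)

# Example: a 3-pixel point writes into levels 1 and 2 with weights ~0.42 and ~0.58.
```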
  • Item
    CoupNeRF: Property-aware Neural Radiance Fields for Multi-Material Coupled Scenario Reconstruction
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Li, Jin; Gao, Yang; Song, Wenfeng; Li, Yacong; Li, Shuai; Hao, Aimin; Qin, Hong; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
    Neural Radiance Fields (NeRFs) have achieved significant recognition for their proficiency in scene reconstruction and rendering by utilizing neural networks to depict intricate volumetric environments. Despite considerable research dedicated to reconstructing physical scenes, few works succeed in challenging scenarios involving dynamic, multi-material objects. To alleviate this, we introduce CoupNeRF, an efficient neural network architecture that is aware of multiple material properties. This architecture combines physically grounded continuum mechanics with NeRF, facilitating the identification of motion systems across a wide range of physical coupling scenarios. We first reconstruct the specific materials of objects within 3D physical fields to learn material parameters. Then, we develop a method to model the neighbouring particles, enhancing the learning process specifically in regions where material transitions occur. The effectiveness of CoupNeRF is demonstrated through extensive experiments, showcasing its proficiency in accurately coupling and identifying the behavior of complex physical scenes that span multiple physics domains.
  • Item
    LO-Gaussian: Gaussian Splatting for Low-light and Overexposure Scenes through Simulated Filter
    (The Eurographics Association, 2024) You, Jingjiao; Zhang, Yuanyang; Zhou, Tianchen; Zhao, Yecheng; Yao, Li; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
    Recent advancements in 3D Gaussian-based scene reconstruction and novel view synthesis have achieved impressive results. However, real-world images often suffer from adverse lighting conditions, which can hinder the performance of these techniques. Although progress has been made in addressing poor illumination, existing methods still struggle to accurately recover complex details in low-light and overexposed images. To address this challenge, we propose a method called LO-Gaussian, designed to recover illumination effectively in both low-light and overexposed scenes. Our approach simulates adverse lighting conditions with a filter during training, which is jointly optimized with the original 3D Gaussian rendering. During inference, the simulated filter is removed, allowing the model to render the scene, decoupled from the adverse lighting, under normal lighting conditions. We validate the effectiveness of our method through experiments on two publicly available datasets that include both poorly illuminated scenes and their corresponding normal illumination images. Experimental results demonstrate that LO-Gaussian consistently achieves optimal or near-optimal performance across these datasets, confirming the efficacy of our approach in illumination restoration.
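    The simulated-filter idea can be sketched as a small learnable image-space module appended to the rendered output during training and dropped at inference. The exposure/gamma parameterization below is an assumption chosen for illustration, not the paper's exact filter.

```python
import torch
import torch.nn as nn

class SimulatedLightFilter(nn.Module):
    """Learnable exposure/gamma filter applied to the rendered image during
    training so the underlying Gaussians learn a normally-lit scene; dropped
    at inference. A generic sketch, not the paper's exact parameterization."""
    def __init__(self):
        super().__init__()
        self.log_gain = nn.Parameter(torch.zeros(3))   # per-channel exposure
        self.log_gamma = nn.Parameter(torch.zeros(1))  # tone curve

    def forward(self, rgb):                            # rgb: (..., 3) in [0, 1]
        gain = torch.exp(self.log_gain)
        gamma = torch.exp(self.log_gamma)
        return torch.clamp(rgb * gain, 1e-6, 1.0) ** gamma

# Training:  loss = (filter(render(gaussians, cam)) - adverse_photo).abs().mean()
# Inference: use render(gaussians, cam) directly, without the filter.
```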
  • Item
    Enhancing Spatiotemporal Resampling with a Novel MIS Weight
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Pan, Xingyue; Zhang, Jiaxuan; Huang, Jiancong; Liu, Ligang; Bermano, Amit H.; Kalogerakis, Evangelos
    In real-time rendering, optimizing the sampling of large-scale candidates is crucial. The spatiotemporal reservoir resampling (ReSTIR) method provides an effective approach for handling large candidate samples, while the Generalized Resampled Importance Sampling (GRIS) theory provides a general framework for resampling algorithms. However, we have observed that when the generalized multiple importance sampling (MIS) weight from previous work is used during spatiotemporal reuse, variance gradually amplifies when there are significant differences between candidate domains. To address this issue, we propose a new MIS weight suitable for resampling that blends samples from different sampling domains, ensuring convergence of results as the proportion of non-canonical samples increases. Additionally, we apply this weight to temporal resampling to reduce noise caused by scene changes or jitter. Our method effectively reduces energy loss in the biased version of ReSTIR DI while incurring no additional overhead, and it also suppresses artifacts caused by a high proportion of temporal samples. As a result, our approach leads to lower variance in the sampling results.
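    For context, the generalized MIS weight referred to above is, in GRIS-style resampling, typically the generalized balance heuristic: a candidate drawn from domain i is weighted by that domain's target density at the candidate, scaled by its sample count, relative to all domains. The sketch below shows that baseline weight, not the new weight proposed in the paper.

```python
def generalized_balance_heuristic(i, x, target_pdfs, counts):
    """Generalized balance-heuristic MIS weight (GRIS baseline): weight of
    domain i for candidate x, given each domain's (unnormalized) target density
    evaluated at x and the number of candidates drawn from each domain."""
    denom = sum(c * p(x) for c, p in zip(counts, target_pdfs))
    return counts[i] * target_pdfs[i](x) / denom if denom > 0.0 else 0.0
```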