Search Results
Now showing 1 - 10 of 42
1. Edge-Friend: Fast and Deterministic Catmull-Clark Subdivision Surfaces (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Kuth, Bastian; Oberberger, Max; Chajdas, Matthäus; Meyer, Quirin; Bikker, Jacco; Gribble, Christiaan
We present edge-friend, a data structure for quad meshes that provides access to the neighborhood information required for Catmull-Clark subdivision surface refinement. Edge-friend enables efficient real-time subdivision surface rendering. In particular, the resulting algorithm is deterministic, does not require hardware support for atomic floating-point arithmetic, and is optimized for efficient rendering on GPUs. Edge-friend exploits the fact that, after one subdivision step, two edges can be uniquely and implicitly assigned to each quad. Additionally, edge-friend is a compact data structure that adds little overhead. Our algorithm is simple to implement in a single compute-shader kernel and requires minimal synchronization, which makes it particularly suited for asynchronous execution. We easily extend our kernel to support relevant Catmull-Clark subdivision surface features, including semi-smooth creases, boundaries, animation, and attribute interpolation. In the case of topology changes, our data structure requires little preprocessing, making it amenable to a variety of applications, including real-time editing and animation. Our method can process and render billions of triangles per second on modern GPUs. For a sample mesh, our algorithm generates and renders 2.9 million triangles in 0.58 ms on an AMD Radeon RX 7900 XTX GPU.
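The classic Catmull-Clark rules that the edge-friend entry above builds on are easy to state in reference form. Below is a minimal sketch (Python/NumPy, assuming a closed, quad-only mesh with no creases or boundaries) of the point computations in one refinement step. The paper's actual contribution, the edge-friend structure and GPU kernel, concerns how new connectivity and neighborhood access are organized, which this CPU sketch deliberately leaves out.

```python
import numpy as np
from collections import defaultdict

def catmull_clark_points(V, F):
    """Point computations for one Catmull-Clark step on a closed quad mesh.
    V: (n, 3) float vertex positions; F: (m, 4) int quad vertex indices.
    Returns face points, edge points, and repositioned original vertices."""
    face_pts = V[F].mean(axis=1)                     # centroid of each quad

    edge_faces = defaultdict(list)                   # edge -> adjacent faces
    for fi, f in enumerate(F):
        for k in range(4):
            e = tuple(sorted((int(f[k]), int(f[(k + 1) % 4]))))
            edge_faces[e].append(fi)

    # Interior edge point: average of the two endpoints and two face points.
    edge_pts = {e: (V[e[0]] + V[e[1]] + face_pts[fs].sum(0)) / (2 + len(fs))
                for e, fs in edge_faces.items()}

    # Vertex rule: P' = (Q + 2R + (n-3)P) / n, with Q the mean adjacent face
    # point, R the mean adjacent edge midpoint, n the valence.
    Q = defaultdict(lambda: np.zeros(3))
    R = defaultdict(lambda: np.zeros(3))
    valence = defaultdict(int)
    for e in edge_faces:
        mid = 0.5 * (V[e[0]] + V[e[1]])
        for v in e:
            R[v] += mid
            valence[v] += 1
    for fi, f in enumerate(F):
        for v in f:
            Q[int(v)] += face_pts[fi]
    newV = np.array([(Q[v] / valence[v] + 2 * R[v] / valence[v]
                      + (valence[v] - 3) * V[v]) / valence[v]
                     for v in range(len(V))])
    return face_pts, edge_pts, newV

# Demo: one step on a unit cube (8 vertices, 6 quads).
V = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
F = np.array([[0, 1, 3, 2], [4, 6, 7, 5], [0, 4, 5, 1],
              [2, 3, 7, 6], [0, 2, 6, 4], [1, 5, 7, 3]])
fp, ep, nv = catmull_clark_points(V, F)
```

Splitting each quad into four new quads around its face point is precisely the connectivity step where the paper's implicit two-edges-per-quad assignment pays off.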
2. Single-Image SVBRDF Estimation with Learned Gradient Descent (The Eurographics Association and John Wiley & Sons Ltd., 2024)
Luo, Xuejiao; Scandolo, Leonardo; Bousseau, Adrien; Eisemann, Elmar; Bermano, Amit H.; Kalogerakis, Evangelos
Recovering spatially-varying materials from a single photograph of a surface is inherently ill-posed, making the direct application of gradient descent on the reflectance parameters prone to poor minima. Recent methods leverage deep learning either by directly regressing reflectance parameters with feed-forward neural networks, or by learning a latent space of SVBRDFs with encoder-decoder or generative adversarial networks followed by gradient-based optimization in latent space. The former is fast but does not account for the likelihood of the prediction, i.e., how well the resulting reflectance explains the input image. The latter provides a strong prior on the space of spatially-varying materials, but this prior can hinder the reconstruction of images that differ too much from the training data. Our method combines the strengths of both approaches. We optimize reflectance parameters to best reconstruct the input image using a recurrent neural network that iteratively predicts how to update the reflectance parameters given the gradient of the reconstruction likelihood. By combining a learned prior with a likelihood measure, our approach provides a maximum a posteriori estimate of the SVBRDF. Our evaluation shows that this learned gradient descent achieves state-of-the-art performance for SVBRDF estimation on synthetic and real images.

3. World-Space Spatiotemporal Path Resampling for Path Tracing (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Zhang, Hangyu; Wang, Beibei; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
With the advent of hardware-accelerated ray tracing, more and more real-time rendering applications render images with ray-traced global illumination (GI). However, the low sample counts attainable at real-time frame rates pose enormous challenges to existing path sampling methods. Recent work (ReSTIR GI) samples indirect illumination effectively with a dramatic variance reduction. However, as a screen-space path resampling approach, it can only reuse paths at the first bounce, which brings limited benefit in complex scenes. To this end, we propose a world-space spatiotemporal path resampling approach. Our approach caches more path samples in a world-space grid, which allows reusing sub-paths that start at non-primary path vertices. Furthermore, we introduce a practical normal-aware hash-grid construction approach that provides more efficient candidate samples for path resampling. Overall, our method achieves improvements ranging from 16.6% to 41.9% in mean squared error (MSE) compared to the previous method, with only 4.4% to 8.4% extra time cost.
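To make the normal-aware hash grid from the world-space resampling entry above concrete, here is a minimal sketch (Python/NumPy). The cell size, normal quantization, hash constants, and table size are all illustrative assumptions, not the paper's values; the point is that binning by quantized position and normal keeps the candidates in a bucket roughly coplanar, so they are safer to reuse across shading points.

```python
import numpy as np

def cell_key(position, normal, cell_size=0.25, n_bins=8, table_size=1 << 20):
    """Hash a shading point into a world-space cell keyed by its position
    AND its quantized normal (assumed constants throughout)."""
    q = np.floor(position / cell_size).astype(np.int64)       # spatial bin
    nq = np.floor((normal + 1.0) * 0.5 * n_bins)              # normal bin
    nq = nq.clip(0, n_bins - 1).astype(np.int64)
    key = np.concatenate([q, nq])                             # 6 integers
    primes = np.array([73856093, 19349663, 83492791,
                       2654435761, 97, 1013], dtype=np.int64) # assumed mixers
    return int((key * primes).sum() % table_size)

# Candidate samples for resampling at a shading point are whatever reservoir
# sits in the bucket of that point's (position, normal) key.
grid = {}
k = cell_key(np.array([1.3, 0.2, -4.7]), np.array([0.0, 1.0, 0.0]))
grid.setdefault(k, []).append("path-sample payload goes here")
```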
4. Ray-aligned Occupancy Map Array for Fast Approximate Ray Tracing (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Zeng, Zheng; Xu, Zilin; Wang, Lu; Wu, Lifan; Yan, Ling-Qi; Ritschel, Tobias; Weidlich, Andrea
We present a new software ray tracing solution that efficiently computes visibilities in dynamic scenes. We first introduce a novel scene representation: the ray-aligned occupancy map array (ROMA), generated by rasterizing the dynamic scene once per frame. Our key contribution is a fast, low-divergence tracing method that computes visibilities in constant time, without constructing and traversing traditional intersection acceleration data structures such as BVHs. To further improve accuracy and alleviate aliasing, we use a spatiotemporal scheme to stochastically distribute the candidate ray samples. We demonstrate the practicality of our method by integrating it into a modern real-time renderer and showing better performance than existing techniques based on distance fields (DFs). Our method is free of the typical artifacts caused by incomplete scene information, and is about 2.5× to 10× faster than generating and tracing DFs at the same resolution and equal storage.

5. Learning Dynamic 3D Geometry and Texture for Video Face Swapping (The Eurographics Association and John Wiley & Sons Ltd., 2022)
Otto, Christopher; Naruniec, Jacek; Helminger, Leonhard; Etterlin, Thomas; Mignone, Graziana; Chandran, Prashanth; Zoss, Gaspard; Schroers, Christopher; Gross, Markus; Gotardo, Paulo; Bradley, Derek; Weber, Romann; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Face swapping is the process of applying a source actor's appearance to a target actor's performance in a video. This is a challenging visual effect that has seen increasing demand in film and television production. Recent work has shown that data-driven methods based on deep learning can produce compelling effects at production quality in a fraction of the time required by a traditional 3D pipeline. However, the dominant approach operates only on 2D imagery, without reference to the underlying facial geometry or texture, resulting in poor generalization under novel viewpoints and little artistic control. Methods that do incorporate geometry rely on pre-learned facial priors that do not adapt well to the particular geometric features of the source and target faces. We approach the problem of face swapping from the perspective of learning simultaneous convolutional facial autoencoders for the source and target identities, using a shared encoder network with identity-specific decoders. The key novelty in our approach is that each decoder first lifts the latent code into a 3D representation, comprising a dynamic face texture and a deformable 3D face shape, before projecting this 3D face back onto the input image using a differentiable renderer. The coupled autoencoders are trained only on videos of the source and target identities, without requiring 3D supervision. By leveraging the learned 3D geometry and texture, our method achieves face swapping with higher quality than off-the-shelf monocular 3D face reconstruction, and an overall lower FID score than state-of-the-art 2D methods. Furthermore, our 3D representation allows efficient artistic control over the result, which can be hard to achieve with existing 2D approaches.
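The shared-encoder/per-identity-decoder layout in the face-swapping entry above is simple to express. Below is a minimal sketch (Python/PyTorch, with hypothetical layer sizes): the real decoders produce a dynamic texture and a deformable 3D shape that a differentiable renderer projects back to the image, whereas this toy version emits a flat parameter vector just to show the swapping mechanism itself: encode a target frame, then decode with the source identity's decoder.

```python
import torch
import torch.nn as nn

class SwapAutoencoder(nn.Module):
    def __init__(self, latent=256, out_dim=1024):
        super().__init__()
        # One encoder shared across identities, so the latent code captures
        # expression/pose rather than identity.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(latent))
        # One decoder per identity; stand-ins for texture + shape decoders.
        self.decoders = nn.ModuleDict({
            "source": nn.Linear(latent, out_dim),
            "target": nn.Linear(latent, out_dim)})

    def forward(self, frame, identity):
        z = self.encoder(frame)            # identity-agnostic performance code
        return self.decoders[identity](z)  # identity-specific 3D parameters

# Face swap: encode a target frame, decode with the *source* decoder.
model = SwapAutoencoder()
target_frame = torch.rand(1, 3, 64, 64)
swapped_params = model(target_frame, "source")
```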
6. Meshlets and How to Shade Them: A Study on Texture-Space Shading (The Eurographics Association and John Wiley & Sons Ltd., 2022)
Neff, Thomas; Mueller, Joerg H.; Steinberger, Markus; Schmalstieg, Dieter; Chaine, Raphaëlle; Kim, Min H.
Commonly used image-space layouts of shading points, such as those used in deferred shading, are strictly view-dependent, which restricts efficient caching and temporal amortization. In contrast, texture-space layouts can represent shading on all surface points and can be tailored to the needs of a particular application. However, the best grouping of shading points, which we call a shading unit, in texture space remains unclear. Choices of shading-unit granularity (how many primitives or pixels per unit) and of shading-unit parametrization (how to assign texture coordinates to shading points) lead to different outcomes in final image quality, overshading cost, and memory consumption. Among the possible choices, shading units consisting of larger groups of scene primitives, so-called meshlets, remain unexplored so far. In this paper, we introduce a taxonomy for analyzing existing texture-space shading methods based on the group size and parametrization of shading units. Furthermore, we introduce a novel texture-space layout strategy that operates on large shading units: the meshlet shading atlas. We experimentally demonstrate that the meshlet shading atlas outperforms previous approaches in image quality, run-time performance, and temporal upsampling for a given number of fragment shader invocations. The meshlet shading atlas lends itself to use with popular cluster-based rendering of meshes with high geometric detail.

7. HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections (The Eurographics Association and John Wiley & Sons Ltd., 2024)
Dudai, Chen; Alper, Morris; Bezalel, Hana; Hanocka, Rana; Lang, Itai; Averbuch-Elor, Hadar; Bermano, Amit H.; Kalogerakis, Evangelos
Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In more constrained 3D domains, recent methods have leveraged modern vision-and-language models as a strong prior on 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain and fail to exploit the geometric consistency of images capturing multiple views of such scenes. In this work, we present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene, by harnessing the power of SOTA vision-and-language models adapted for understanding landmark scene semantics. To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly-related textual information. Our approach is built on the premise that images physically grounded in space provide a powerful supervision signal for localizing new concepts, whose semantics may be unlocked from Internet textual metadata with large language models. We use correspondences between views of scenes to bootstrap spatial understanding of these semantics, providing guidance for 3D-compatible segmentation that ultimately lifts to a volumetric scene representation. To evaluate our method, we present a new benchmark dataset containing large-scale scenes with ground-truth segmentations for multiple semantic concepts. Our results show that HaLo-NeRF can accurately localize a variety of semantic concepts related to architectural landmarks, surpassing other 3D models as well as strong 2D segmentation baselines. Our code and data are publicly available at https://tau-vailab.github.io/HaLo-NeRF/.

8. Search Me Knot, Render Me Knot: Embedding Search and Differentiable Rendering of Knots in 3D (The Eurographics Association and John Wiley & Sons Ltd., 2024)
Gangopadhyay, Aalok; Gupta, Paras; Sharma, Tarun; Singh, Prajwal; Raman, Shanmuganathan; Hu, Ruizhen; Lefebvre, Sylvain
We introduce the problem of knot-based inverse perceptual art. Given multiple target images and their corresponding viewing configurations, the objective is to find a 3D knot-based tubular structure whose appearance resembles the target images when viewed from the specified viewing configurations. To solve this problem, we first design a differentiable rendering algorithm for rendering tubular knots embedded in 3D under arbitrary perspective camera configurations. Using this differentiable renderer, we search over the space of knot configurations to find the ideal knot embedding. We represent knot embeddings via homeomorphisms of a desired template knot, where the homeomorphisms are parametrized by the weights of an invertible neural network. Our approach is fully differentiable, making it possible to find the ideal 3D tubular structure for the desired perceptual art using gradient-based optimization. We propose several loss functions that impose additional physical constraints, enforcing that the tube is free of self-intersections, lies within a predefined region in space, satisfies the physical bending limits of the tube material, and keeps the material cost within a specified budget. We demonstrate through our results that this knot representation is highly expressive and gives impressive results even for challenging target images, under both single-view and multiple-view constraints. Through an extensive ablation study, we show that each proposed loss function effectively ensures physical realizability. We construct a real-world 3D-printed object to demonstrate the practical utility of our approach.
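Of the physical-constraint losses listed in the knot entry above, the self-intersection term is the easiest to sketch. A minimal version (Python/PyTorch; the radius, neighbor-skip window, and quadratic penalty shape are assumptions, not the paper's formulation) penalizes any two non-neighboring centerline samples that come closer than one tube diameter, smoothly enough to sit inside a gradient-based optimization:

```python
import torch

def self_intersection_loss(points, radius=0.05, skip=4):
    """points: (N, 3) samples along a closed tube centerline, in curve order.
    Penalize pairs closer than 2*radius, ignoring neighbors along the loop."""
    d = torch.cdist(points, points)                  # pairwise distances
    n = points.shape[0]
    idx = torch.arange(n)
    ring = (idx[None, :] - idx[:, None]).abs()
    ring = torch.minimum(ring, n - ring)             # distance along the loop
    mask = ring > skip                               # skip curve neighbors
    violation = torch.clamp(2 * radius - d, min=0.0) # penetration depth
    return (violation[mask] ** 2).sum()

pts = torch.rand(128, 3, requires_grad=True)
loss = self_intersection_loss(pts)
loss.backward()                                      # usable in optimization
```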
9. Progressive Denoising of Monte Carlo Rendered Images (The Eurographics Association and John Wiley & Sons Ltd., 2022)
Firmino, Arthur; Frisvad, Jeppe Revall; Jensen, Henrik Wann; Chaine, Raphaëlle; Kim, Min H.
Image denoising based on deep learning has become a powerful tool for accelerating Monte Carlo rendering. Deep learning techniques can produce smooth images from a low sample count. Unfortunately, existing deep learning methods are biased and do not converge to the correct solution as the number of samples increases. In this paper, we propose a progressive denoising technique that aims to use denoising only when it is beneficial and to reduce its impact at high sample counts. We use Stein's unbiased risk estimate (SURE) to estimate the error in the denoised image, and we combine this with a neural network to infer a per-pixel mixing parameter. We further augment this network with confidence intervals based on classical statistics to ensure consistency and convergence of the final denoised image. Our results demonstrate that our method is consistent and that it improves existing denoising techniques. Furthermore, it can be used in combination with existing high-quality denoisers to ensure consistency. In addition to being asymptotically unbiased, progressive denoising is particularly good at preserving fine details that would otherwise be lost with existing denoisers.

10. CubeGAN: Omnidirectional Image Synthesis Using Generative Adversarial Networks (The Eurographics Association and John Wiley & Sons Ltd., 2023)
May, Christopher; Aliaga, Daniel; Myszkowski, Karol; Niessner, Matthias
We propose a framework to create projectively correct and seam-free cube-map images using generative adversarial learning. Deep generation of cube-maps that contain the correct projection of the environment onto their faces is not straightforward, as has been recognized in prior work. Our approach extends an existing framework, StyleGAN3, to produce cube-maps instead of planar images. In addition to reshaping the output, we include a cube-specific volumetric initialization component, a projective resampling component, and a modification of the augmentation operations to the spherical domain. Our results demonstrate the network's generation capabilities when trained on imagery from various 3D environments. Additionally, we show the power and quality of our GAN design in an inversion task, combined with navigation capabilities, to perform novel view synthesis.
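To see why the progressive denoising entry above (item 9) is consistent, consider the per-pixel blend at its core. The sketch below (Python/NumPy) uses a simple inverse-error weighting as an illustrative stand-in for the paper's network-inferred mixing parameter and SURE machinery: it blends the raw Monte Carlo image with the denoised one so that the weight on the biased denoised image vanishes as the raw variance shrinks with sample count.

```python
import numpy as np

def progressive_blend(noisy, denoised, var_noisy, err_denoised):
    """Blend per pixel, trusting whichever estimate is currently better.
    var_noisy: per-pixel variance of the raw Monte Carlo estimate;
    err_denoised: per-pixel error estimate of the denoised image (the paper
    derives this from SURE; here it is just an input)."""
    alpha = var_noisy / (var_noisy + err_denoised + 1e-8)
    return alpha * denoised + (1.0 - alpha) * noisy

# As samples accumulate, var_noisy ~ sigma^2 / n -> 0, so alpha -> 0 and the
# output approaches the unbiased Monte Carlo image: consistency in miniature.
h, w = 4, 4
noisy, denoised = np.random.rand(h, w), np.random.rand(h, w)
out = progressive_blend(noisy, denoised,
                        var_noisy=np.full((h, w), 0.01),
                        err_denoised=np.full((h, w), 0.001))
```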