Search Results
Now showing 1 - 10 of 17
Item Deep Compositional Denoising for High-quality Monte Carlo Rendering (The Eurographics Association and John Wiley & Sons Ltd., 2021)
Zhang, Xianyao; Manzi, Marco; Vogels, Thijs; Dahlberg, Henrik; Gross, Markus; Papas, Marios; Bousseau, Adrien and McGuire, Morgan
We propose a deep-learning method for automatically decomposing noisy Monte Carlo renderings into components that kernel-predicting denoisers can denoise more effectively. In our model, a neural decomposition module learns to predict noisy components and corresponding feature maps, which are consecutively reconstructed by a denoising module. The components are predicted based on statistics aggregated at the pixel level by the renderer. Denoising these components individually allows the use of per-component kernels that adapt to each component's noisy signal characteristics. Experimentally, we show that the proposed decomposition module consistently improves the denoising quality of current state-of-the-art kernel-predicting denoisers on large-scale academic and production datasets.

Item Glyph-Based Visualization of Affective States (The Eurographics Association, 2020)
Kovacevic, Nikola; Wampfler, Rafael; Solenthaler, Barbara; Gross, Markus; Günther, Tobias; Kerren, Andreas and Garth, Christoph and Marai, G. Elisabeta
Decades of research in psychology on the formal measurement of emotions led to the concept of affective states. Visualizing the measured affective state can be useful in education, as it allows teachers to adapt lessons based on the affective state of students. In the entertainment industry, game mechanics can be adapted based on the boredom and frustration levels of a player. Visualizing the affective state can also increase emotional self-awareness of the user whose state is being measured, which can have an impact on well-being. However, graphical user interfaces seldom visualize the user's affective state, but rather focus on the purely objective interaction between the system and the user.
This paper proposes two graphical user interface widgets that visualize the user's affective state, ensuring a compact and unobtrusive visualization. In a user study with 644 participants, the widgets were evaluated in relation to a baseline widget and were tested on intuitiveness and understandability. Particularly in terms of understandability, the baseline was outperformed by our two widgets.

Item Neural Denoising for Deep-Z Monte Carlo Renderings (The Eurographics Association and John Wiley & Sons Ltd., 2024)
Zhang, Xianyao; Röthlin, Gerhard; Zhu, Shilin; Aydin, Tunç Ozan; Salehi, Farnood; Gross, Markus; Papas, Marios; Bermano, Amit H.; Kalogerakis, Evangelos
We present a kernel-predicting neural denoising method for path-traced deep-Z images that facilitates their usage in animation and visual effects production. Deep-Z images provide enhanced flexibility during compositing as they contain color, opacity, and other rendered data at multiple depth-resolved bins within each pixel. However, they are subject to noise, and rendering until convergence is prohibitively expensive. The current state of the art in deep-Z denoising yields objectionable artifacts, and current neural denoising methods are incapable of handling the variable number of depth bins in deep-Z images. Our method extends kernel-predicting convolutional neural networks to address the challenges stemming from denoising deep-Z images. We propose a hybrid reconstruction architecture that combines the depth-resolved reconstruction at each bin with the flattened reconstruction at the pixel level. Moreover, we propose depth-aware neighbor indexing of the depth-resolved inputs to the convolution and denoising kernel application operators, which reduces artifacts caused by depth misalignment present in deep-Z images.
We evaluate our method on a production-quality deep-Z dataset, demonstrating significant improvements in denoising quality and performance compared to the current state-of-the-art deep-Z denoiser. By addressing the significant challenge of the cost associated with rendering path-traced deep-Z images, we believe that our approach will pave the way for broader adoption of deep-Z workflows in future productions.

Item Learning Dynamic 3D Geometry and Texture for Video Face Swapping (The Eurographics Association and John Wiley & Sons Ltd., 2022)
Otto, Christopher; Naruniec, Jacek; Helminger, Leonhard; Etterlin, Thomas; Mignone, Graziana; Chandran, Prashanth; Zoss, Gaspard; Schroers, Christopher; Gross, Markus; Gotardo, Paulo; Bradley, Derek; Weber, Romann; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Face swapping is the process of applying a source actor's appearance to a target actor's performance in a video. This is a challenging visual effect that has seen increasing demand in film and television production. Recent work has shown that data-driven methods based on deep learning can produce compelling effects at production quality in a fraction of the time required for a traditional 3D pipeline. However, the dominant approach operates only on 2D imagery without reference to the underlying facial geometry or texture, resulting in poor generalization under novel viewpoints and little artistic control. Methods that do incorporate geometry rely on pre-learned facial priors that do not adapt well to particular geometric features of the source and target faces. We approach the problem of face swapping from the perspective of learning simultaneous convolutional facial autoencoders for the source and target identities, using a shared encoder network with identity-specific decoders.
The key novelty in our approach is that each decoder first lifts the latent code into a 3D representation, comprising a dynamic face texture and a deformable 3D face shape, before projecting this 3D face back onto the input image using a differentiable renderer. The coupled autoencoders are trained only on videos of the source and target identities, without requiring 3D supervision. By leveraging the learned 3D geometry and texture, our method achieves face swapping with higher quality than when using off-the-shelf monocular 3D face reconstruction, and overall lower FID score than state-of-the-art 2D methods. Furthermore, our 3D representation allows for efficient artistic control over the result, which can be hard to achieve with existing 2D approaches.

Item Interactive Sculpting of Digital Faces Using an Anatomical Modeling Paradigm (The Eurographics Association and John Wiley & Sons Ltd., 2020)
Gruber, Aurel; Fratarcangeli, Marco; Zoss, Gaspard; Cattaneo, Roman; Beeler, Thabo; Gross, Markus; Bradley, Derek; Jacobson, Alec and Huang, Qixing
Digitally sculpting 3D human faces is a very challenging task. It typically requires either 1) highly-skilled artists using complex software packages for high quality results, or 2) highly-constrained simple interfaces for consumer-level avatar creation, such as in game engines. We propose a novel interactive method for the creation of digital faces that is simple and intuitive to use, even for novice users, while consistently producing plausible 3D face geometry, and allowing editing freedom beyond traditional video game avatar creation. At the core of our system lies a specialized anatomical local face model (ALM), which is constructed from a dataset of several hundred 3D face scans. User edits are propagated to constraints for an optimization of our data-driven ALM model, ensuring the resulting face remains plausible even for simple edits like clicking and dragging surface points.
We show how several natural interaction methods can be implemented in our framework, including direct control of the surface, indirect control of semantic features like age, ethnicity, gender, and BMI, as well as indirect control through manipulating the underlying bony structures. The result is a simple new method for creating digital human faces, for artists and novice users alike. Our method is attractive for low-budget VFX and animation productions, and our anatomical modeling paradigm can complement traditional game engine avatar design packages.

Item Deep Reconstruction of 3D Smoke Densities from Artist Sketches (The Eurographics Association and John Wiley & Sons Ltd., 2022)
Kim, Byungsoo; Huang, Xingchang; Wuelfroth, Laura; Tang, Jingwei; Cordonnier, Guillaume; Gross, Markus; Solenthaler, Barbara; Chaine, Raphaëlle; Kim, Min H.
Creative processes of artists often start with hand-drawn sketches illustrating an object. Pre-visualizing these keyframes is especially challenging when applied to volumetric materials such as smoke. The authored 3D density volumes must capture realistic flow details and turbulent structures, which is highly non-trivial and remains a manual and time-consuming process. We therefore present a method to compute a 3D smoke density field directly from 2D artist sketches, bridging the gap between early-stage prototyping of smoke keyframes and pre-visualization. From the sketch inputs, we compute an initial volume estimate and optimize the density iteratively with an updater CNN. Our differentiable sketcher is embedded into the end-to-end training, which results in robust reconstructions. Our training data set and sketch augmentation strategy are designed to enable general applicability. We evaluate the method on synthetic inputs and sketches from artists depicting both realistic smoke volumes and highly non-physical smoke shapes.
The high computational performance and robustness of our method at test time allow interactive authoring sessions of volumetric density fields for rapid prototyping of ideas by novice users.

Item Facial Animation with Disentangled Identity and Motion using Transformers (The Eurographics Association and John Wiley & Sons Ltd., 2022)
Chandran, Prashanth; Zoss, Gaspard; Gross, Markus; Gotardo, Paulo; Bradley, Derek; Dominik L. Michels; Soeren Pirk
We propose a 3D+time framework for modeling dynamic sequences of 3D facial shapes, representing realistic non-rigid motion during a performance. Our work extends neural 3D morphable models by learning a motion manifold using a transformer architecture. More specifically, we derive a novel transformer-based autoencoder that can model and synthesize 3D geometry sequences of arbitrary length. This transformer naturally determines frame-to-frame correlations required to represent the motion manifold, via the internal self-attention mechanism. Furthermore, our method disentangles the constant facial identity from the time-varying facial expressions in a performance, using two separate codes to represent neutral identity and the performance itself within separate latent subspaces. Thus, the model represents identity-agnostic performances that can be paired with an arbitrary new identity code and fed through our new identity-modulated performance decoder; the result is a sequence of 3D meshes for the performance with the desired identity and temporal length.
We demonstrate how our disentangled motion model has natural applications in performance synthesis, performance retargeting, key-frame interpolation and completion of missing data, performance denoising and retiming, and other potential applications that include full 3D body modeling.

Item A Perceptual Shape Loss for Monocular 3D Face Reconstruction (The Eurographics Association and John Wiley & Sons Ltd., 2023)
Otto, Christopher; Chandran, Prashanth; Zoss, Gaspard; Gross, Markus; Gotardo, Paulo; Bradley, Derek; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Monocular 3D face reconstruction is a widespread topic, and existing approaches tackle the problem either through fast neural network inference or offline iterative reconstruction of face geometry. In either case, carefully designed energy functions are minimized, commonly including loss terms like a photometric loss, a landmark reprojection loss, and others. In this work we propose a new loss function for monocular face capture, inspired by how humans would perceive the quality of a 3D face reconstruction given a particular image. It is widely known that shading provides a strong indicator for 3D shape in the human visual system. As such, our new 'perceptual' shape loss aims to judge the quality of a 3D face estimate using only shading cues. Our loss is implemented as a discriminator-style neural network that takes an input face image and a shaded render of the geometry estimate, and then predicts a score that perceptually evaluates how well the shaded render matches the given image. This 'critic' network operates on the RGB image and geometry render alone, without requiring an estimate of the albedo or illumination in the scene. Furthermore, our loss operates entirely in image space and is thus agnostic to mesh topology.
We show how our new perceptual shape loss can be combined with traditional energy terms for monocular 3D face optimization and deep neural network regression, improving upon current state-of-the-art results.

Item Robust Image Denoising using Kernel Predicting Networks (The Eurographics Association, 2021)
Cai, Zhilin; Zhang, Yang; Manzi, Marco; Oztireli, Cengiz; Gross, Markus; Aydin, Tunç Ozan; Theisel, Holger and Wimmer, Michael
We present a new method for designing high quality denoisers that are robust to varying noise characteristics of input images. Instead of taking a conventional blind denoising approach or relying on explicit noise parameter estimation networks as well as invertible camera imaging pipeline models, we propose a two-stage model that first processes an input image with a small set of specialized denoisers, and then passes the resulting intermediate denoised images to a kernel predicting network that estimates per-pixel denoising kernels. We demonstrate that our approach achieves robustness to noise parameters at a level that exceeds comparable blind denoisers, while also coming close to state-of-the-art denoising quality for camera sensor noise.

Item GANtlitz: Ultra High Resolution Generative Model for Multi-Modal Face Textures (The Eurographics Association and John Wiley & Sons Ltd., 2024)
Gruber, Aurel; Collins, Edo; Meka, Abhimitra; Mueller, Franziska; Sarkar, Kripasindhu; Orts-Escolano, Sergio; Prasso, Luca; Busch, Jay; Gross, Markus; Beeler, Thabo; Bermano, Amit H.; Kalogerakis, Evangelos
High-resolution texture maps are essential to render photoreal digital humans for visual effects or to generate data for machine learning. The acquisition of high resolution assets at scale is cumbersome: it involves enrolling a large number of human subjects, using expensive multi-view camera setups, and significant manual artistic effort to align the textures.
To alleviate these problems, we introduce GANtlitz (a play on the German noun Antlitz, meaning face), a generative model that can synthesize multi-modal ultra-high-resolution face appearance maps for novel identities. Our method solves three distinct challenges: 1) unavailability of a very large data corpus generally required for training generative models, 2) memory and computational limitations of training a GAN at ultra-high resolutions, and 3) consistency of appearance features such as skin color, pores and wrinkles in high-resolution textures across different modalities. We introduce dual-style blocks, an extension to the style blocks of the StyleGAN2 architecture, which improve multi-modal synthesis. Our patch-based architecture is trained only on image patches obtained from a small set of face textures (<100) and yet allows us to generate seamless appearance maps of novel identities at 6k×4k resolution. Extensive qualitative and quantitative evaluations and baseline comparisons show the efficacy of our proposed system.