Search Results

  • Item
    State of the Art on Diffusion Models for Visual Computing
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Po, Ryan; Yifan, Wang; Golyanik, Vladislav; Aberman, Kfir; Barron, Jon T.; Bermano, Amit; Chan, Eric; Dekel, Tali; Holynski, Aleksander; Kanazawa, Angjoo; Liu, C. Karen; Liu, Lingjie; Mildenhall, Ben; Nießner, Matthias; Ommer, Björn; Theobalt, Christian; Wonka, Peter; Wetzstein, Gordon; Aristidou, Andreas; Macdonnell, Rachel
    The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has seen exponential growth and relevant papers are published across the computer graphics, computer vision, and AI communities with new works appearing daily on arXiv. This rapid growth of the field makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models and the implementation details and design choices of the popular Stable Diffusion model, and to give an overview of important aspects of these generative AI tools, including personalization, conditioning, and inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point to explore this exciting topic for researchers, artists, and practitioners alike.
  • Item
    State of the Art in Dense Monocular Non-Rigid 3D Reconstruction
    (The Eurographics Association and John Wiley & Sons Ltd., 2023) Tretschk, Edith; Kairanda, Navami; B R, Mallikarjun; Dabral, Rishabh; Kortylewski, Adam; Egger, Bernhard; Habermann, Marc; Fua, Pascal; Theobalt, Christian; Golyanik, Vladislav; Bousseau, Adrien; Theobalt, Christian
    3D reconstruction of deformable (or non-rigid) scenes from a set of monocular 2D image observations is a long-standing and actively researched area of computer vision and graphics. It is an ill-posed inverse problem, since, without additional prior assumptions, it permits infinitely many solutions leading to accurate projection to the input 2D images. Non-rigid reconstruction is a foundational building block for downstream applications like robotics, AR/VR, or visual content creation. The key advantage of using monocular cameras is their omnipresence and availability to the end users as well as their ease of use compared to more sophisticated camera set-ups such as stereo or multi-view systems. This survey focuses on state-of-the-art methods for dense non-rigid 3D reconstruction of various deformable objects and composite scenes from monocular videos or sets of monocular views. It reviews the fundamentals of 3D reconstruction and deformation modeling from 2D image observations. We then start from general methods, which handle arbitrary scenes and make only a few prior assumptions, and proceed towards techniques making stronger assumptions about the observed objects and types of deformations (e.g. human faces, bodies, hands, and animals). A significant part of this STAR is also devoted to classification and a high-level comparison of the methods, as well as an overview of the datasets for training and evaluation of the discussed techniques. We conclude by discussing open challenges in the field and the social aspects associated with the usage of the reviewed methods.
  • Item
    Recent Trends in 3D Reconstruction of General Non-Rigid Scenes
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Yunus, Raza; Lenssen, Jan Eric; Niemeyer, Michael; Liao, Yiyi; Rupprecht, Christian; Theobalt, Christian; Pons-Moll, Gerard; Huang, Jia-Bin; Golyanik, Vladislav; Ilg, Eddy; Aristidou, Andreas; Macdonnell, Rachel
    Reconstructing models of the real world, including 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision. It enables the synthesizing of photorealistic novel views, useful for the movie industry and AR/VR applications. It also facilitates the content creation necessary in computer games and AR/VR by avoiding laborious manual design processes. Further, such models are fundamental for intelligent computing systems that need to interpret real-world scenes and actions to act and interact safely with the human world. Notably, the world surrounding us is dynamic, and reconstructing models of dynamic, non-rigidly moving scenes is a severely underconstrained and challenging problem. This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs such as data from RGB and RGB-D sensors, among others, conveying an understanding of different approaches, their potential applications, and promising further research directions. The report covers 3D reconstruction of general non-rigid scenes and further addresses the techniques for scene decomposition, editing and controlling, and generalizable and generative modeling. More specifically, we first review the common and fundamental concepts necessary to understand and navigate the field and then discuss the state-of-the-art techniques by reviewing recent approaches that use traditional and machine-learning-based neural representations, including a discussion on the newly enabled applications. The STAR is concluded with a discussion of the remaining limitations and open challenges.
  • Item
    Virtual Humans meet Event-based and Quantum-enhanced Vision
    (The Eurographics Association, 2025) Habermann, Marc; Golyanik, Vladislav; Mantiuk, Rafal; Hildebrandt, Klaus
    The tutorial is split into two parts, i.e., two 90-minute talks. In the first half, Marc Habermann will provide an introduction to creating a digital twin of a real human. In the second half, Vladislav Golyanik will introduce new ways of sensing the real world using event-based vision and show how quantum computers can help address fundamental problems in graphics and vision.
  • Item
    D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video
    (The Eurographics Association and John Wiley & Sons Ltd., 2025) Kappel, Moritz; Hahlbohm, Florian; Scholz, Timon; Castillo, Susana; Theobalt, Christian; Eisemann, Martin; Golyanik, Vladislav; Magnor, Marcus; Bousseau, Adrien; Day, Angela
    Dynamic reconstruction and spatiotemporal novel-view synthesis of non-rigidly deforming scenes have recently gained increased attention. While existing work achieves impressive quality and performance on multi-view or teleporting camera setups, most methods fail to efficiently and faithfully recover motion and appearance from casual monocular captures. This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as casual smartphone captures. Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point distribution that encodes local geometry and appearance in separate hash-encoded neural feature grids for static and dynamic regions. By sampling a discrete point cloud from our model, we can efficiently render high-quality novel views using a fast differentiable rasterizer and neural rendering network. Similar to recent work, we leverage advances in neural scene analysis by incorporating data-driven priors like monocular depth estimation and object segmentation to resolve motion and depth ambiguities originating from the monocular captures. In addition to guiding the optimization process, we show that these priors can be exploited to explicitly initialize our scene representation to drastically improve optimization speed and final image quality. As evidenced by our experimental evaluation, our dynamic point cloud model not only enables fast optimization and real-time frame rates for interactive applications, but also achieves competitive image quality on monocular benchmark sequences. Our code and data are available online at https://moritzkappel.github.io/projects/dnpc/.