Search Results

Now showing 1 - 4 of 4
  • Item
    State of the Art on Diffusion Models for Visual Computing
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Po, Ryan; Yifan, Wang; Golyanik, Vladislav; Aberman, Kfir; Barron, Jon T.; Bermano, Amit; Chan, Eric; Dekel, Tali; Holynski, Aleksander; Kanazawa, Angjoo; Liu, C. Karen; Liu, Lingjie; Mildenhall, Ben; Nießner, Matthias; Ommer, Björn; Theobalt, Christian; Wonka, Peter; Wetzstein, Gordon; Aristidou, Andreas; Macdonnell, Rachel
    The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has seen exponential growth and relevant papers are published across the computer graphics, computer vision, and AI communities with new works appearing daily on arXiv. This rapid growth of the field makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models, implementation details and design choices of the popular Stable Diffusion model, as well as overview important aspects of these generative AI tools, including personalization, conditioning, inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point to explore this exciting topic for researchers, artists, and practitioners alike.
  • Item
    Recent Trends in 3D Reconstruction of General Non-Rigid Scenes
    (The Eurographics Association and John Wiley & Sons Ltd., 2024) Yunus, Raza; Lenssen, Jan Eric; Niemeyer, Michael; Liao, Yiyi; Rupprecht, Christian; Theobalt, Christian; Pons-Moll, Gerard; Huang, Jia-Bin; Golyanik, Vladislav; Ilg, Eddy; Aristidou, Andreas; Macdonnell, Rachel
    Reconstructing models of the real world, including 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision. It enables the synthesizing of photorealistic novel views, useful for the movie industry and AR/VR applications. It also facilitates the content creation necessary in computer games and AR/VR by avoiding laborious manual design processes. Further, such models are fundamental for intelligent computing systems that need to interpret real-world scenes and actions to act and interact safely with the human world. Notably, the world surrounding us is dynamic, and reconstructing models of dynamic, non-rigidly moving scenes is a severely underconstrained and challenging problem. This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs such as data from RGB and RGB-D sensors, among others, conveying an understanding of different approaches, their potential applications, and promising further research directions. The report covers 3D reconstruction of general non-rigid scenes and further addresses the techniques for scene decomposition, editing and controlling, and generalizable and generative modeling. More specifically, we first review the common and fundamental concepts necessary to understand and navigate the field and then discuss the state-of-the-art techniques by reviewing recent approaches that use traditional and machine-learning-based neural representations, including a discussion on the newly enabled applications. The STAR is concluded with a discussion of the remaining limitations and open challenges.
  • Item
    Virtual Humans meet Event-based and Quantum-enhanced Vision
    (The Eurographics Association, 2025) Habermann, Marc; Golyanik, Vladislav; Mantiuk, Rafal; Hildebrandt, Klaus
    The tutorial is split in two parts, i.e. two 90 minute talks. In the first half, Marc Habermann will provide an introduction to creating a digital twin of a real human. Second, Vladislav Golyanik will introduce new ways of sensing the real world using event-based vision and how quantum computers can enhance fundamental problems in graphics and vision.
  • Item
    D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video
    (The Eurographics Association and John Wiley & Sons Ltd., 2025) Kappel, Moritz; Hahlbohm, Florian; Scholz, Timon; Castillo, Susana; Theobalt, Christian; Eisemann, Martin; Golyanik, Vladislav; Magnor, Marcus; Bousseau, Adrien; Day, Angela
    Dynamic reconstruction and spatiotemporal novel-view synthesis of non-rigidly deforming scenes recently gained increased attention. While existing work achieves impressive quality and performance on multi-view or teleporting camera setups, most methods fail to efficiently and faithfully recover motion and appearance from casual monocular captures. This paper contributes to the field by introducing a new method for dynamic novel view synthesis from monocular video, such as casual smartphone captures. Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point distribution that encodes local geometry and appearance in separate hash-encoded neural feature grids for static and dynamic regions. By sampling a discrete point cloud from our model, we can efficiently render high-quality novel views using a fast differentiable rasterizer and neural rendering network. Similar to recent work, we leverage advances in neural scene analysis by incorporating data-driven priors like monocular depth estimation and object segmentation to resolve motion and depth ambiguities originating from the monocular captures. In addition to guiding the optimization process, we show that these priors can be exploited to explicitly initialize our scene representation to drastically improve optimization speed and final image quality. As evidenced by our experimental evaluation, our dynamic point cloud model not only enables fast optimization and real-time frame rates for interactive applications, but also achieves competitive image quality on monocular benchmark sequences. Our code and data are available online https://moritzkappel.github.io/projects/dnpc/.