Eurographics Digital Library
This is the DSpace 7 platform of the Eurographics Digital Library.
- The contents of the Eurographics Digital Library Archive are freely accessible. Only access to the full-text documents of the journal Computer Graphics Forum (joint property of Wiley and Eurographics) is restricted to Eurographics members, members of institutions holding an Institutional Membership at Eurographics, and users of the TIB Hannover. On the item pages you will find purchase links to the TIB Hannover.
- As a Eurographics member, you can log in with your email address and password from https://services.eg.org. If you belong to an institutional member and are on a computer within a Eurographics-registered IP range, you can proceed immediately.
- Since 2022, all new publications by Eurographics are licensed under Creative Commons. Publishing with Eurographics is Plan-S compliant. Please see the Eurographics Licensing and Open Access Policy for more details.
Recent Submissions
Toward Democratizing Human Motion Generation
(Tel Aviv University, 2025-04-24) Guy Tevet
Human motion generation is a challenging task due to the intricate complexity of human movement. Capturing the subtle dynamics of coordination, balance, and expression requires models capable of synthesizing both the physical plausibility and the nuanced variability inherent in human motion. Furthermore, the interdependence of spatial and temporal factors makes designing effective algorithms a difficult problem. As a result, motion generation remains accessible primarily to professional users, and even for them, it is a labor-intensive process requiring significant expertise and resources. The overarching goal of this work is to develop generative tools and intuitive controls that empower content creators, democratizing human motion synthesis and addressing these challenges. By leveraging advances in machine learning and generative artificial intelligence (AI), this research seeks to enable users, regardless of expertise, to produce realistic, diverse, and context-aware animations with minimal effort. Such tools are not only intended to ease the technical and creative burden for professionals but also to open up animation and motion creation to a broader audience, making the process approachable, efficient, and cost-effective.
The journey begins with MotionCLIP, which bridges the human motion domain with the semantic richness of CLIP. By aligning human motion representations with CLIP's text and image embeddings, MotionCLIP enables text-to-motion generation, semantic editing, and interpolation. Its capacity to interpret abstract prompts is exemplified by generating a sitting motion from the prompt "couch" or mimicking a web-swinging motion from "Spiderman". These results demonstrate MotionCLIP's potential to create nuanced animations and expand the creative toolkit for animators and novices alike.
Next, the Motion Diffusion Model (MDM) introduces diffusion processes into motion synthesis, addressing the diversity and many-to-many mappings inherent in human motion. MDM combines a lightweight transformer architecture with geometric losses to ensure physically plausible and visually coherent results. It excels in tasks like text-to-motion and action-to-motion, offering state-of-the-art performance on benchmarks while requiring modest computational resources. MDM's versatility is further demonstrated in inpainting tasks, such as filling gaps in motion sequences or editing specific body parts while preserving the rest of the animation.
Extending the utility of diffusion models, Human Motion Diffusion as a Generative Prior explores advanced composition techniques for motion generation. Sequential composition enables the synthesis of long, coherent animations by stitching shorter segments, while parallel composition allows the generation of multi-character interactions using a lightweight communication block. Model composition offers fine-grained control, blending priors to edit and refine joint-level motion trajectories. These methods highlight how generative priors can support complex and nuanced motion applications, addressing previously unmet needs in the field.
Finally, we suggest integrating data-driven motion generation into physics simulation through CLoSD, a framework that combines motion diffusion models with reinforcement learning (RL). Acting as a universal planner, the diffusion module generates text-driven motion plans, while the RL controller ensures physical plausibility and interaction with the environment. This synergy enables characters to perform a variety of tasks, from navigating to a goal to interacting with objects and transitioning between actions such as sitting and standing. CLoSD thus bridges the gap between intuitive control and physical realism, opening new horizons for interactive motion generation.
By addressing the inherent challenges of motion synthesis with neural generative methods, this work has influenced how motion is created and controlled. Its contributions lay the groundwork for intuitive, democratized tools that can empower professionals and novices alike to produce rich, realistic animations.
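To make the diffusion part of this pipeline concrete, the sketch below shows one way an MDM-style text-conditioned motion denoiser could be set up: a transformer encoder receives a noised motion sequence together with a conditioning token built from the diffusion timestep and a pre-computed text embedding, predicts the clean motion directly, and is trained with a reconstruction loss plus a simple geometric (velocity) term. All module names, feature dimensions, and loss weights here are illustrative assumptions, not the published architecture.

```python
# Minimal sketch of an MDM-style motion denoiser (illustrative only).
# Assumptions: motions are [batch, frames, features] tensors, text is
# pre-encoded (e.g. by CLIP) into a single 512-d conditioning vector, and
# the network predicts the clean motion x0 from a noised input.
# Positional encodings are omitted for brevity.
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    def __init__(self, n_feats=263, d_model=512, n_layers=8, n_heads=4):
        super().__init__()
        self.in_proj = nn.Linear(n_feats, d_model)
        self.t_embed = nn.Sequential(nn.Linear(1, d_model), nn.SiLU(),
                                     nn.Linear(d_model, d_model))
        self.text_proj = nn.Linear(512, d_model)   # CLIP text dim assumed 512
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, n_feats)

    def forward(self, x_t, t, text_emb):
        # Prepend a (timestep + text) conditioning token to the motion tokens.
        cond = self.t_embed(t.float().view(-1, 1)) + self.text_proj(text_emb)
        tokens = torch.cat([cond.unsqueeze(1), self.in_proj(x_t)], dim=1)
        h = self.encoder(tokens)[:, 1:]            # drop the conditioning token
        return self.out_proj(h)                    # predicted clean motion x0

def training_loss(model, x0, t, text_emb, alphas_cumprod):
    # Standard DDPM forward noising, with the network predicting x0 directly.
    a = alphas_cumprod[t].view(-1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise
    x0_hat = model(x_t, t, text_emb)
    recon = (x0_hat - x0).pow(2).mean()
    # A simple geometric term: match frame-to-frame velocities.
    vel = (x0_hat[:, 1:] - x0_hat[:, :-1]) - (x0[:, 1:] - x0[:, :-1])
    return recon + 1.0 * vel.pow(2).mean()
```

Because such a denoiser predicts the clean motion at every diffusion step, constraints such as inpainting masks over frames or body parts can be imposed on the intermediate estimates during sampling, which is the mechanism behind the editing applications mentioned above.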
Neural Point-based Rendering for Immersive Novel View Synthesis
(Open FAU, 2025-05-26) Franke, Linus
Recent advances in neural rendering have greatly improved the realism and efficiency of digitizing real-world environments, enabling new possibilities for virtual experiences. However, achieving high-quality digital replicas of physical spaces is challenging due to the need for advanced 3D reconstruction and real-time rendering techniques, with visual output often deteriorating under challenging capture conditions. This thesis therefore explores point-based neural rendering approaches to address key challenges such as geometric inconsistencies, scalability, and perceptual fidelity, ultimately enabling realistic and interactive virtual scene exploration. The vision is to enable immersive virtual reality (VR) scene exploration and virtual teleportation with the best perceptual quality for the user. This work introduces techniques to improve point-based Novel View Synthesis (NVS) by refining geometric accuracy and reducing visual artifacts. By detecting and correcting errors in point-cloud-based reconstructions, this approach improves rendering stability and accuracy. Additionally, an efficient rendering pipeline is proposed that combines rasterization with neural refinement to achieve high-quality results at real-time frame rates, ensuring smooth and consistent visual output across diverse scenes. To extend the scalability of neural point representations, a hierarchical structure is presented that efficiently organizes and renders massive point clouds, enabling real-time NVS of city-scale environments. Furthermore, a perceptually optimized foveated rendering technique is developed for VR applications, leveraging the characteristics of the human visual system to balance performance and perceptual quality. Lastly, a real-time neural reconstruction technique is proposed that eliminates preprocessing requirements, allowing for immediate virtual teleportation and interactive scene exploration. Through these advances, this thesis pushes the boundaries of neural point-based rendering, offering solutions that balance quality, efficiency, and scalability. The findings pave the way for more interactive and immersive virtual experiences, with applications spanning VR, augmented reality (AR), and digital content exploration.
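The rasterization-plus-refinement idea can be pictured with the following minimal sketch: learned per-point features are splatted into an image with a simple nearest-depth rule, and a small CNN decodes (and inpaints) the resulting feature image into RGB. The one-pixel splatting, function names, and feature sizes are simplifying assumptions; the pipelines described in the thesis are considerably more elaborate.

```python
# Minimal sketch of a point-based neural rendering step (illustrative only).
# Assumptions: points carry learned feature vectors; we splat them into a
# feature image and let a small CNN turn the result into an RGB frame.
import torch
import torch.nn as nn

def rasterize_points(xyz, feats, K, cam_T_world, H, W):
    """Project points and keep the nearest one per pixel (one-pixel splats)."""
    ones = torch.ones_like(xyz[:, :1])
    cam = (cam_T_world @ torch.cat([xyz, ones], 1).T).T[:, :3]   # camera space
    valid = cam[:, 2] > 1e-3
    cam, feats = cam[valid], feats[valid]
    uv = (K @ cam.T).T
    u = (uv[:, 0] / uv[:, 2]).round().long()
    v = (uv[:, 1] / uv[:, 2]).round().long()
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z, feats = u[inside], v[inside], cam[inside, 2], feats[inside]

    image = torch.zeros(feats.shape[1], H, W)
    depth = torch.full((H, W), float("inf"))
    # Far-to-near overwrite so the nearest point wins; a scatter-based
    # z-buffer would be much faster in practice.
    for i in torch.argsort(z, descending=True):
        image[:, v[i], u[i]] = feats[i]
        depth[v[i], u[i]] = z[i]
    return image, depth

class RefinementNet(nn.Module):
    """Tiny CNN that fills holes and decodes point features to RGB."""
    def __init__(self, c_in=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, feat_image):
        return self.net(feat_image.unsqueeze(0)).squeeze(0)

# Hypothetical usage:
# feat_img, _ = rasterize_points(xyz, point_feats, K, cam_T_world, 480, 640)
# rgb = RefinementNet(c_in=point_feats.shape[1])(feat_img)
```

Since both the splatted features and the refinement network are differentiable with respect to the per-point features, a photometric loss against captured images can optimize the point features end to end.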
Image-based 3D Reconstructions via Differentiable Rendering of Neural Implicit Representations
(2025-02-14) Tianhao Wu
Modeling objects in 3D is critical for various graphics and metaverse applications and is a fundamental step towards 3D machine reasoning; the ability to reconstruct objects from RGB images alone significantly enhances its applications. Representing objects in 3D involves learning two distinct aspects of the objects: geometry, which describes where the mass is located, and appearance, which determines the exact pixel colors rendered on screen. While learning approximate appearance with known geometry is straightforward, obtaining correct geometry, or recovering both simultaneously from RGB images alone, has long been a challenging task.
Recent advancements in Differentiable Rendering and Neural Implicit Representations have significantly pushed the limits of geometry and appearance reconstruction from RGB images. Utilizing their continuous, differentiable, and less restrictive representations, it is possible to optimize geometry and appearance simultaneously from the ground-truth images, leading to much better reconstruction accuracy and re-rendering quality. As one of the major neural implicit representations to have received great attention, the Neural Radiance Field (NeRF) achieves clean and straightforward joint reconstruction of volumetric geometry and non-Lambertian appearance from a dense set of RGB images. Various other representations and modifications have also been proposed to handle specific tasks such as smooth surface modeling, sparse-view reconstruction, or dynamic scene reconstruction. However, existing methods still make strict assumptions about the scenes captured and reconstructed, significantly constraining their application scenarios. For instance, current reconstructions typically assume the scene to be perfectly opaque with no semi-transparent effects, assume that no dynamic noise or occluders are included in the capture, or do not optimize rendering efficiency for high-frequency appearance in the scene.
In this dissertation, we present three advancements that push the quality of image-based 3D reconstruction towards robust, reliable, and user-friendly real-world solutions. Our improvements cover the representation, architecture, and optimization of image-based 3D reconstruction approaches. First, we introduce AlphaSurf, a novel implicit representation with decoupled geometry and surface opacity and a grid-based architecture that enables accurate surface reconstruction of intricate or semi-transparent objects. Compared to a traditional image-based 3D reconstruction pipeline that considers only geometry and appearance, it computes the ray-surface intersection and the intersection opacity separately while keeping both naturally differentiable, supporting decoupled optimization from a photometric loss. Specifically, intersections on AlphaSurf are found in closed form via analytical solutions of cubic polynomials, avoiding Monte-Carlo sampling, and are therefore fully differentiable by construction, while additional grid-based opacity and radiance fields are incorporated to allow reconstruction from RGB images only.
We then consider the dynamic noise and occluders accidentally included in captures intended for static 3D reconstruction, a common challenge encountered in the real world. This issue is particularly problematic for street scans or scenes containing potential dynamic noise such as cars, humans, or plants. We propose D^2NeRF, a method that reconstructs 3D scenes from casual mobile phone videos with all dynamic occluders decoupled from the static scene. This approach incorporates modeling of both 3D and 4D objects from RGB images and utilizes freedom constraints to achieve dynamic decoupling without semantic-based guidance. Hence, it can handle uncommon dynamic noise such as pouring liquid and moving shadows.
Finally, we look into the efficiency constraints of 3D reconstruction and rendering, and specifically propose a solution for the lightweight representation of scene components with simple geometry but high-frequency textures. We utilize a sparse set of anchors with correspondences from 3D to 2D texture space, enabling high-frequency clothing on a forward-facing neural avatar to be modeled as a 2D texture with neural deformation, a simplified and constrained representation.
This dissertation provides a comprehensive overview of neural implicit representations and their applications in 3D reconstruction from RGB images, along with several advancements for achieving more robust and efficient reconstruction in challenging real-world scenarios. We demonstrate that the representation, architecture, and optimization need to be specifically designed to deal with the obstacles of image-based reconstruction due to the severely ill-posed nature of the problem. With the correct design of the method, we can reconstruct translucent surfaces, remove dynamic occluders from the capture, and efficiently model high-frequency appearance from only posed multi-view images or monocular video.
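Most of the methods discussed above build on the differentiable volume rendering popularized by NeRF, sketched below: densities and colors sampled along a camera ray are alpha-composited into a pixel color, so a photometric loss on that pixel back-propagates into the underlying field. This is only the generic compositing step; AlphaSurf's closed-form cubic intersections and decoupled surface opacity are not reproduced here.

```python
# Minimal sketch of NeRF-style differentiable volume rendering (illustrative
# only). sigma_net and color_net below stand in for whatever field is being
# optimized; they are not part of any specific published implementation.
import torch

def composite_ray(sigmas, colors, t_vals):
    """sigmas: [S], colors: [S, 3], t_vals: [S] sample distances along one ray."""
    deltas = torch.cat([t_vals[1:] - t_vals[:-1],
                        torch.tensor([1e10])])          # last segment open-ended
    alphas = 1.0 - torch.exp(-sigmas * deltas)          # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas + 1e-10]), 0)[:-1]  # transmittance
    weights = alphas * trans                            # contribution per sample
    rgb = (weights.unsqueeze(-1) * colors).sum(0)       # composited pixel color
    depth = (weights * t_vals).sum(0)                   # expected termination depth
    return rgb, depth, weights

# Hypothetical usage:
# rgb, _, _ = composite_ray(sigma_net(x), color_net(x, view_dir), t_vals)
# loss = ((rgb - rgb_gt) ** 2).mean()   # photometric supervision from RGB images
```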
A micrograin formalism for the rendering of porous materials
(2024-12-06) Lucas, Simon
This thesis focuses on the impact of microscopic structures on material appearance, with a particular emphasis on porous materials. We first evaluated existing appearance models by conducting light transport simulations on sphere aggregates representing porous volumes. We found that none of the existing models accurately matched the simulations, with most errors arising from surface effects. This opened the path to the development of a specialized Bidirectional Scattering Distribution Function (BSDF) model for rendering porous layers, such as those found on surfaces covered with dust, rust, or dirt. Our model extends the Trowbridge-Reitz (GGX) distribution to handle pores between elliptical opaque micrograins and introduces a view- and light-dependent filling factor to blend the porous and base layers. By adding height-normal and light-view correlations in the masking and shadowing terms, our model produces realistic effects seen in real-world materials that were previously hard to obtain, such as retro-reflection and height-color correlations. To improve the rendering efficiency of micrograin materials, we introduce an efficient importance sampling routine for the visible normal distribution function (vNDF). Through numerical simulations, we validate the accuracy of our model. Finally, our work provides a comprehensive formalism for rendering porous layers and opens many perspectives for future work.
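For context, the reference routine below samples the GGX distribution of visible normals following Heitz (2018); micrograin models of the kind described here build on this style of vNDF sampling, though the thesis's extension to elliptical opaque micrograins and its filling-factor blending are not reproduced in this sketch.

```python
# Standard GGX visible-normal (vNDF) sampling routine after Heitz (2018),
# shown for reference only; not the thesis's micrograin-specific variant.
import numpy as np

def sample_ggx_vndf(view, alpha_x, alpha_y, u1, u2):
    """view: unit vector toward the camera in the local shading frame (z = normal).
    Returns a micro-normal distributed according to the visible normal
    distribution of an anisotropic GGX surface, given uniform randoms u1, u2."""
    # Stretch the view so the problem reduces to a hemisphere configuration.
    vh = np.array([alpha_x * view[0], alpha_y * view[1], view[2]])
    vh /= np.linalg.norm(vh)
    # Build an orthonormal basis around the stretched view direction.
    lensq = vh[0] ** 2 + vh[1] ** 2
    t1 = (np.array([-vh[1], vh[0], 0.0]) / np.sqrt(lensq)
          if lensq > 0 else np.array([1.0, 0.0, 0.0]))
    t2 = np.cross(vh, t1)
    # Sample a point on a disk, warped toward the visible half.
    r = np.sqrt(u1)
    phi = 2.0 * np.pi * u2
    p1 = r * np.cos(phi)
    p2 = r * np.sin(phi)
    s = 0.5 * (1.0 + vh[2])
    p2 = (1.0 - s) * np.sqrt(max(0.0, 1.0 - p1 * p1)) + s * p2
    # Reproject onto the hemisphere and unstretch back to the ellipsoid.
    nh = p1 * t1 + p2 * t2 + np.sqrt(max(0.0, 1.0 - p1 * p1 - p2 * p2)) * vh
    ne = np.array([alpha_x * nh[0], alpha_y * nh[1], max(0.0, nh[2])])
    return ne / np.linalg.norm(ne)
```

Sampling only the visible normals avoids generating micro-normals that are back-facing to the view direction, which is what makes such routines efficient in practice.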
How to Train Your Renderer: Optimized Methods for Learning Path Distributions in Monte Carlo Light Transport
(2025-05-06) Rath, Alexander
Light transport simulation allows us to preview architectural marvels before they break ground, practice complex surgeries without a living subject, and explore alien worlds from the comfort of our homes. Fueled by the steady advancements in computer hardware, rendering virtual scenes is more accessible than ever, and is met by an unprecedented demand for such content. Light interacts with our world in various intricate ways, hence the challenge in realistic rendering lies in tracing all the possible paths that light could take within a given virtual scene. Contemporary approaches predominantly rely on Monte Carlo integration, for which countless sampling procedures have been proposed to handle certain families of effects robustly. Handling all effects holistically through specialized sampling routines, however, remains an unsolved problem.
A promising alternative is to use learning techniques that automatically adapt to the effects present in the scene. However, such approaches require many complex design choices, for which existing works commonly resort to heuristics. In this work, we investigate what constitutes effective learning algorithms for rendering, from data representation and the quantities to be learned to the fitting process itself. By strategically optimizing these components for desirable goals, such as overall render efficiency, we demonstrate significant improvements over existing approaches.
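As a toy illustration of what "learning a path distribution" can mean, the sketch below maintains an equal-area directional histogram (one such histogram would typically be stored per spatial region), accumulates Monte Carlo radiance estimates into it, and then importance-samples directions in proportion to the learned mass. The data structure and fitting scheme are deliberately naive stand-ins; practical systems, including those studied in this thesis, use far more sophisticated spatial-directional representations and optimize them for overall render efficiency.

```python
# Toy sketch of online path guiding: learn where light comes from, then
# sample proportionally to the learned distribution (illustrative only).
import numpy as np

class DirectionalHistogram:
    """Equal-area (cos(theta), phi) binning over the sphere: every bin covers the
    same solid angle, so converting bin mass to a solid-angle pdf is a constant."""
    def __init__(self, n_cos=8, n_phi=16):
        self.n_cos, self.n_phi = n_cos, n_phi
        self.weights = np.full((n_cos, n_phi), 1e-3)   # small prior avoids zero pdfs
        self.bin_solid_angle = 4.0 * np.pi / (n_cos * n_phi)

    def _bin(self, d):
        i = min(int((d[2] * 0.5 + 0.5) * self.n_cos), self.n_cos - 1)
        phi = np.arctan2(d[1], d[0]) % (2.0 * np.pi)
        j = min(int(phi / (2.0 * np.pi) * self.n_phi), self.n_phi - 1)
        return i, j

    def record(self, direction, radiance_estimate):
        """Accumulate Monte Carlo estimates; the histogram adapts to the scene."""
        self.weights[self._bin(direction)] += radiance_estimate

    def pdf(self, direction):
        return self.weights[self._bin(direction)] / (self.weights.sum()
                                                     * self.bin_solid_angle)

    def sample(self, rng):
        """Pick a bin proportional to its learned mass, then a direction inside it."""
        p = self.weights.ravel() / self.weights.sum()
        k = rng.choice(p.size, p=p)
        i, j = divmod(k, self.n_phi)
        cos_t = -1.0 + 2.0 * (i + rng.random()) / self.n_cos
        phi = 2.0 * np.pi * (j + rng.random()) / self.n_phi
        sin_t = np.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        d = np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])
        return d, self.pdf(d)
```

In a renderer, such a learned distribution would typically be combined defensively with BSDF sampling via multiple importance sampling, so that the estimator stays robust while the distribution is still being learned.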