Volume 43 (2024)
Browsing Volume 43 (2024) by Issue Date
Now showing 1 - 20 of 252
Item
LightUrban: Similarity Based Fine-grained Instancing for Lightweighting Complex Urban Point Clouds
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Lu, Zi Ang; Xiong, Wei Dan; Ren, Peng; Jia, Jin Yuan; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Large-scale urban point clouds play a vital role in various applications, but rendering and transmitting such data remains challenging due to its large volume, complicated structures, and significant redundancy. In this paper, we present LightUrban, the first point cloud instancing framework for efficient rendering and transmission of fine-grained complex urban scenes. We first introduce a segmentation method to organize the point clouds into individual building and vegetation instances from coarse to fine. Next, we propose an unsupervised similarity detection approach to accurately group instances with similar shapes. Furthermore, a fast pose and size estimation component is applied to calculate the transformations between the representative instance and the corresponding similar instances in each group. By replacing individual instances with their group's representative instance, the data volume and redundancy can be dramatically reduced. Experimental results on large-scale urban scenes demonstrate the effectiveness of our algorithm. In summary, our method not only structures urban point clouds but also significantly reduces data volume and redundancy, filling the gap in lightweighting urban landscapes through instancing.

Item
Time-varying Extremum Graphs
(© 2024 Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2024) Das, Somenath; Sridharamurthy, Raghavendra; Natarajan, Vijay; Alliez, Pierre; Wimmer, Michael
We introduce the time-varying extremum graph (TVEG), a topological structure to support visualization and analysis of a time-varying scalar field. The extremum graph is a substructure of the Morse-Smale complex. It captures the adjacency relationship between cells in the Morse decomposition of a scalar field. We define the TVEG as a time-varying extension of the extremum graph and demonstrate how it captures salient feature tracks within a dynamic scalar field. We formulate the construction of the TVEG as an optimization problem and describe an algorithm for computing the graph. We also demonstrate the capabilities of the TVEG towards identification and exploration of topological events such as deletion, generation, split, and merge within a dynamic scalar field via comprehensive case studies, including viscous fingers and a 3D von Kármán vortex street dataset.

Item
MatUp: Repurposing Image Upsamplers for SVBRDFs
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Gauthier, Alban; Kerbl, Bernhard; Levallois, Jérémy; Faury, Robin; Thiery, Jean-Marc; Boubekeur, Tamy; Garces, Elena; Haines, Eric
We propose MatUp, an upsampling filter for material super-resolution. Our method takes as input a low-resolution SVBRDF and upscales its maps so that their rendering under various lighting conditions fits upsampled renderings inferred in the radiance domain by pre-trained RGB upsamplers. We formulate our local filter as a compact Multilayer Perceptron (MLP), which acts on a small window of the input SVBRDF and is optimized using a data-fitting loss defined over upsampled radiance at various locations. This optimization is performed entirely at the scale of a single, independent material. In doing so, MatUp leverages the reconstruction capabilities acquired over large collections of natural images by pre-trained RGB models and provides regularization over self-similar structures. In particular, our lightweight neural filter avoids retraining complex architectures from scratch or accessing any large collection of low/high-resolution material pairs, which do not actually exist at the scale at which RGB upsamplers are trained. As a result, MatUp provides fine and coherent details in the upscaled material maps, as shown in the extensive evaluation we provide.
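As a concrete illustration of this per-material fit, here is a minimal PyTorch sketch; all shapes, the toy Lambertian shader, and the bilinear stand-in for the pre-trained RGB upsampler are assumptions for illustration, not the authors' code. A compact MLP maps each 3x3 window of the low-resolution SVBRDF to a 2x2 block of upscaled texels and is optimized so that renderings of its output match radiance upsampled in the RGB domain.

```python
# Hypothetical MatUp-style per-material optimization (names invented).
import torch
import torch.nn as nn
import torch.nn.functional as F

C = 9                      # e.g. albedo(3) + normal(3) + rough/metal/height
mlp = nn.Sequential(       # compact local filter, as in the abstract
    nn.Linear(C * 3 * 3, 64), nn.ReLU(),
    nn.Linear(64, C * 2 * 2),   # each 3x3 window emits a 2x2 upscaled block
)

def shade(maps, light):
    """Toy Lambertian rendering; a stand-in for the paper's renderer."""
    albedo, normal = maps[:, :3], F.normalize(maps[:, 3:6], dim=1)
    ndotl = (normal * light.view(1, 3, 1, 1)).sum(1, keepdim=True).clamp(min=0)
    return albedo * ndotl

def upsample_svbrdf(lowres):                    # lowres: (1, C, H, W)
    win = F.unfold(lowres, 3, padding=1)        # (1, C*9, H*W) windows
    out = mlp(win.transpose(1, 2))              # (1, H*W, C*4)
    h, w = lowres.shape[2], lowres.shape[3]
    return F.pixel_shuffle(out.transpose(1, 2).reshape(1, C * 4, h, w), 2)

lowres = torch.rand(1, C, 32, 32)               # one material, one optimization
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(200):
    light = F.normalize(torch.randn(3), dim=0)  # vary lighting, as in the loss
    target = F.interpolate(shade(lowres, light), scale_factor=2,
                           mode='bilinear', align_corners=False)  # RGB-domain stand-in
    loss = F.mse_loss(shade(upsample_svbrdf(lowres), light), target)
    opt.zero_grad(); loss.backward(); opt.step()
```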
Item
SGP 2024 CGF 43-5: Frontmatter
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Hu, Ruizhen; Lefebvre, Sylvain

Item
A Robust Grid-Based Meshing Algorithm for Embedding Self-Intersecting Surfaces
(© 2024 Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2024) Gagniere, S.; Han, Y.; Chen, Y.; Hyde, D.; Marquez-Razon, A.; Teran, J.; Fedkiw, R.; Alliez, Pierre; Wimmer, Michael
The creation of a volumetric mesh representing the interior of an input polygonal mesh is a common requirement in graphics and computational mechanics applications. Most mesh creation techniques assume that the input surface is not self-intersecting. However, due to numerical and/or user error, input surfaces are commonly self-intersecting to some degree. The removal of self-intersections is a burdensome task that complicates workflow and generally slows down the process of creating simulation-ready digital assets. We present a method for the creation of a volumetric embedding hexahedron mesh from a self-intersecting input triangle mesh. Our method is designed for efficiency by minimizing the use of computationally expensive exact/adaptive-precision arithmetic. Although our approach places almost no limit on the degree of self-intersection in the input surface, our focus is on efficiency in the most common case: many minimal self-intersections. The embedding hexahedron mesh is created from a uniform background grid and consists of hexahedron elements that are geometric copies of grid cells. Multiple copies of a single grid cell are used to resolve regions of self-intersection/overlap. Lastly, we develop a novel topology-aware embedding-mesh coarsening technique to allow for user-specified mesh resolution, as well as a topology-aware tetrahedralization of the hexahedron mesh.

Item
TailorMe: Self-Supervised Learning of an Anatomically Constrained Volumetric Human Shape Model
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Wenninger, Stephan; Kemper, Fabian; Schwanecke, Ulrich; Botsch, Mario; Bermano, Amit H.; Kalogerakis, Evangelos
Human shape spaces have been extensively studied, as they are a core element of human shape and pose inference tasks. Classic methods for creating a human shape model register a surface template mesh to a database of 3D scans and use dimensionality reduction techniques, such as Principal Component Analysis, to learn a compact representation. While these shape models enable global shape modifications by correlating anthropometric measurements with the learned subspace, they provide only limited localized shape control. We instead register a volumetric anatomical template, consisting of skeleton bones and soft tissue, to the surface scans of the CAESAR database. We further enlarge our training data to the full Cartesian product of all skeletons and all soft tissues using physically plausible volumetric deformation transfer. This data is then used to learn an anatomically constrained volumetric human shape model in a self-supervised fashion. The resulting TailorMe model enables shape sampling, localized shape manipulation, and fast inference from given surface scans.
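For context, the classic surface-based pipeline that TailorMe moves beyond reduces to PCA over registered template vertices. A minimal NumPy sketch with placeholder data follows; the scan count and vertex count are invented.

```python
# Classic PCA shape space over registered scans (placeholder data).
import numpy as np

registered_verts = np.random.rand(100, 1000, 3)   # (num_scans, V, 3), placeholder
X = registered_verts.reshape(len(registered_verts), -1)
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

k = 10                                            # keep leading components
basis = Vt[:k]                                    # (k, V*3) shape basis

def sample_shape(coeffs):                         # coeffs: (k,)
    return (mean + coeffs @ basis).reshape(-1, 3) # new body shape, (V, 3)

body = sample_shape(np.random.randn(k) * S[:k] / np.sqrt(len(X)))
```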
Item
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Dudai, Chen; Alper, Morris; Bezalel, Hana; Hanocka, Rana; Lang, Itai; Averbuch-Elor, Hadar; Bermano, Amit H.; Kalogerakis, Evangelos
Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In more constrained 3D domains, recent methods have leveraged modern vision-and-language models as a strong prior of 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain and fail to exploit the geometric consistency of images capturing multiple views of such scenes. In this work, we present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene, by harnessing the power of state-of-the-art vision-and-language models with adaptations for understanding landmark scene semantics. To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly related textual information. Our approach is built upon the premise that images physically grounded in space can provide a powerful supervision signal for localizing new concepts, whose semantics may be unlocked from Internet textual metadata with large language models. We use correspondences between views of scenes to bootstrap spatial understanding of these semantics, providing guidance for 3D-compatible segmentation that ultimately lifts to a volumetric scene representation. To evaluate our method, we present a new benchmark dataset containing large-scale scenes with ground-truth segmentations for multiple semantic concepts. Our results show that HaLo-NeRF can accurately localize a variety of semantic concepts related to architectural landmarks, surpassing the results of other 3D models as well as strong 2D segmentation baselines. Our code and data are publicly available at https://tau-vailab.github.io/HaLo-NeRF/.

Item
Controllable Anime Image Editing Based on the Probability of Attribute Tags
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Song, Zhenghao; Mo, Haoran; Gao, Chengying; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Editing anime images via probabilities of attribute tags allows controlling the degree of the manipulation in an intuitive and convenient manner. Existing methods fall short in progressive modification and in preserving unintended regions of the input image. We propose a controllable anime image editing framework based on adjusting tag probabilities, in which a probability encoding network (PEN) is developed to encode the probabilities into features that capture their continuous characteristics. The encoded features are thus able to direct the generative process of a pre-trained diffusion model and facilitate linear manipulation. We also introduce a local editing module that automatically identifies the intended regions and constrains the edits to those regions only, leaving the others unchanged. Comprehensive comparisons with existing methods indicate the effectiveness of our framework in both one-shot and linear editing modes. Results in additional applications further demonstrate the generalization ability of our approach.
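A hypothetical sketch of the tag-probability conditioning idea, with the encoder architecture and tag count invented for illustration: sweeping a tag's probability and encoding it yields a smoothly varying conditioning feature that could steer the generative process.

```python
# Invented stand-in for a probability encoding network (PEN).
import torch
import torch.nn as nn

class ProbabilityEncoder(nn.Module):
    def __init__(self, num_tags=8, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_tags, dim), nn.SiLU(),
                                 nn.Linear(dim, dim))
    def forward(self, probs):              # probs: (B, num_tags) in [0, 1]
        return self.net(probs)

pen = ProbabilityEncoder()
base = torch.zeros(1, 8)
edit = base.clone(); edit[0, 2] = 1.0      # raise one attribute tag to 1
for t in torch.linspace(0, 1, 5):          # progressive, linear manipulation
    cond = pen(torch.lerp(base, edit, t))  # feature that would steer diffusion
    # denoise(x_t, cond) would go here in the full pipeline
```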
Item
Diffusion-based Human Motion Style Transfer with Semantic Guidance
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Hu, Lei; Zhang, Zihao; Ye, Yongjing; Xu, Yiwen; Xia, Shihong; Skouras, Melina; Wang, He
3D human motion style transfer is a fundamental problem in computer graphics and animation processing. Existing AdaIN-based methods necessitate datasets with balanced style distribution and content/style labels to train the clustered latent space. However, in practical scenarios we may encounter a single unseen style example, not in sufficient quantity to constitute a style cluster for AdaIN-based methods. Therefore, in this paper, we propose a novel two-stage framework for few-shot style transfer learning based on the diffusion model. Specifically, in the first stage, we pre-train a diffusion-based text-to-motion model as a generative prior so that it can cope with various content motion inputs. In the second stage, based on the single style example, we fine-tune the pre-trained diffusion model in a few-shot manner to make it capable of style transfer. The key idea is to regard the reverse process of diffusion as a motion-style translation process, since motion styles can be viewed as special motion variations. During the fine-tuning for style transfer, a simple yet effective semantic-guided style transfer loss, coordinated with a style-example reconstruction loss, is introduced to supervise the style transfer in the CLIP semantic space. The qualitative and quantitative evaluations demonstrate that our method can achieve state-of-the-art performance and has practical applications. The source code is available at https://github.com/hlcdyy/diffusion-based-motion-style-transfer.

Item
TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Franke, Linus; Rückert, Darius; Fink, Laura; Stamminger, Marc; Bermano, Amit H.; Kalogerakis, Evangelos
Point-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, even the latest approaches in this domain are not without shortcomings. 3D Gaussian Splatting [KKLD23] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. ADOP [RFS22], on the other hand, produces crisper images, but its neural reconstruction network reduces performance, it grapples with temporal instability, and it is unable to effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image, including detail beyond the splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrates that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage. The project page is located at: https://lfranke.github.io/trips
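The layer-selection rule can be sketched in a few lines; this is one reading of the abstract, not the authors' implementation. A point whose projected footprint spans s pixels is written into the two pyramid levels bracketing log2(s), with a linear blend between them ("trilinear" = bilinear within a level plus linear across levels).

```python
# Sketch of pyramid-level selection from projected point size.
import math

def pyramid_level(projected_size_px: float, num_levels: int):
    """Return the two levels and the blend weight for a point's footprint."""
    level = max(0.0, math.log2(max(projected_size_px, 1e-6)))
    lo = min(int(level), num_levels - 1)
    hi = min(lo + 1, num_levels - 1)
    return lo, hi, level - lo   # write weight (1-w) to lo and w to hi

print(pyramid_level(3.0, num_levels=8))  # footprint of 3 px: between levels 1 and 2
```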
Item
Disentangled Lifespan Synthesis via Transformer-Based Nonlinear Regression
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Li, Mingyuan; Guo, Yingchun; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Lifespan face age transformation aims to generate facial images that accurately depict an individual's appearance at different age stages. This task is highly challenging due to the need for reasonable changes in facial features while preserving identity characteristics. Existing methods tend to synthesize unsatisfactory results, such as entangled facial attributes and low identity preservation, especially when dealing with large age gaps. Furthermore, over-manipulating the style vector may push it out of the latent space and damage image quality. To address these issues, this paper introduces a novel nonlinear regression model, Disentangled Lifespan face Aging (DL-Aging), to achieve high-quality age transformation images. Specifically, we propose an age modulation encoder that extracts age-related multi-scale facial features as key and value, and uses the reconstructed style vector of the image as the query. Multi-head cross-attention in the W+ space is utilized to iteratively update the query for aging-image reconstruction. This nonlinear transformation enables the model to learn a more disentangled mode of transformation, which is crucial for alleviating facial attribute entanglement. Additionally, we introduce a W+ space age regularization term to prevent excessive manipulation of the style vector and ensure it remains within the W+ space during transformation, thereby improving generation quality and aging accuracy. Extensive qualitative and quantitative experiments demonstrate that the proposed DL-Aging outperforms state-of-the-art methods regarding aging accuracy, image quality, attribute disentanglement, and identity preservation, especially for large age gaps.

Item
Practical Method to Estimate Fabric Mechanics from Metadata
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Dominguez-Elvira, Henar; Nicás, Alicia; Cirio, Gabriel; Rodríguez, Alejandro; Garces, Elena; Bermano, Amit H.; Kalogerakis, Evangelos
Estimating fabric mechanical properties is crucial to creating realistic digital twins. Existing methods typically require testing physical fabric samples with expensive devices or cumbersome capture setups. In this work, we propose a method to estimate fabric mechanics from known manufacturer metadata alone, such as the fabric family, the density, the composition, and the thickness. Further, to alleviate the need to know the fabric family (which might be ambiguous or unknown to non-specialists), we propose an end-to-end neural method that works with planar images of the textile as input. We evaluate our methods using extensive tests that include the industry-standard Cusick drape test, and demonstrate that both of them produce drapes that strongly correlate with ground-truth estimates provided by lab equipment. Our method is the first to propose such a simple capture approach for mechanical properties, outperforming other methods that require testing the fabric in specific setups.
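A minimal stand-in for a metadata-to-mechanics regressor might look as follows; the feature layout, family categories, normalization, and output parameter set are all assumptions for illustration, not the paper's architecture.

```python
# Hypothetical metadata -> simulator-parameter regressor.
import torch
import torch.nn as nn

FAMILIES = ["woven", "knit", "nonwoven"]          # assumed categories

def featurize(family: str, density_gsm: float, thickness_mm: float,
              cotton_frac: float) -> torch.Tensor:
    one_hot = [1.0 if f == family else 0.0 for f in FAMILIES]
    return torch.tensor(one_hot + [density_gsm / 300.0,  # crude normalization
                                   thickness_mm, cotton_frac])

regressor = nn.Sequential(nn.Linear(len(FAMILIES) + 3, 32), nn.ReLU(),
                          nn.Linear(32, 2))        # e.g. [stretch, bending] stiffness

params = regressor(featurize("woven", 210.0, 0.4, 0.8))
stretch_stiffness, bending_stiffness = params.unbind(-1)
```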
Item
Real-time Terrain Enhancement with Controlled Procedural Patterns
(© 2024 Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd., 2024) Grenier, C.; Guérin, É.; Galin, É.; Sauvage, B.; Alliez, Pierre; Wimmer, Michael
Assisting the authoring of virtual terrains is a perennial challenge in the creation of convincing synthetic landscapes. In particular, there is a need for augmenting artist-controlled low-resolution models with consistent relief details. We present a structured noise that procedurally enhances terrains in real time by adding spatially varying erosion patterns. The patterns can be cascaded, i.e., narrow ones are nested into large ones. Our model builds upon Phasor noise, which we adapt to the specific characteristics of terrains (water flow, slope orientation). Relief details correspond to the underlying terrain characteristics and align with the slope to preserve the coherence of generated landforms. Moreover, our model allows for artist control, providing a palette of control maps, and can be efficiently implemented in graphics hardware, allowing for real-time synthesis and rendering and thus effective and intuitive authoring.

Item
PartwiseMPC: Interactive Control of Contact-Guided Motions
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Khoshsiyar, Niloofar; Gou, Ruiyu; Zhou, Tianhong; Andrews, Sheldon; Panne, Michiel van de; Skouras, Melina; Wang, He
Physics-based character motions remain difficult to create and control. We make two contributions towards simpler specification and faster generation of physics-based control. First, we introduce a novel partwise model predictive control (MPC) method that exploits independent planning for body parts when this proves beneficial, while defaulting to whole-body motion planning when that proves more effective. Second, we introduce a new approach to motion specification, based on specifying an ordered set of contact keyframes. These each specify a small number of pairwise contacts between the body and the environment, and serve as loose specifications of motion strategies. Unlike regular keyframes or traditional trajectory-optimization constraints, they are heavily under-constrained and have flexible timing. We demonstrate a range of challenging contact-rich motions that can be generated online at interactive rates using this framework. We further show the generalization capabilities of the method.

Item
G-Style: Stylized Gaussian Splatting
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Kovács, Áron Samuel; Hermosilla, Pedro; Raidou, Renata Georgia; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We introduce G-Style, a novel algorithm designed to transfer the style of an image onto a 3D scene represented using Gaussian Splatting. Gaussian Splatting is a powerful 3D representation for novel view synthesis, as, compared to other approaches based on Neural Radiance Fields, it provides fast scene renderings and user control over the scene. Recent preprints have demonstrated that the style of Gaussian Splatting scenes can be modified using an image exemplar. However, since the scene geometry remains fixed during the stylization process, current solutions fall short of producing satisfactory results. Our algorithm aims to address these limitations through a three-step process: In a pre-processing step, we remove undesirable Gaussians with large projection areas or highly elongated shapes. Subsequently, we combine several losses carefully designed to preserve different scales of the style in the image, while maintaining the integrity of the original scene content as much as possible. During the stylization process, and following the original design of Gaussian Splatting, we split Gaussians where additional detail is necessary within our scene by tracking the gradient of the stylized color. Our experiments demonstrate that G-Style generates high-quality stylizations within just a few minutes, outperforming existing methods both qualitatively and quantitatively.
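The pre-processing cull can be sketched as a simple filter over per-Gaussian scales; the thresholds and the footprint heuristic below are invented, and the paper's exact criteria may differ.

```python
# Sketch of culling large or highly elongated Gaussians before stylization.
import numpy as np

def keep_mask(scales: np.ndarray, max_area=1.0, max_anisotropy=10.0):
    """scales: (N, 3) per-Gaussian axis lengths; returns Gaussians to keep."""
    sorted_s = np.sort(scales, axis=1)                 # ascending per row
    area = np.pi * sorted_s[:, 2] * sorted_s[:, 1]     # footprint of two largest axes
    anis = sorted_s[:, 2] / np.maximum(sorted_s[:, 0], 1e-8)
    return (area < max_area) & (anis < max_anisotropy)

scales = np.abs(np.random.randn(1000, 3)) * 0.1        # placeholder scene
print(keep_mask(scales).mean(), "fraction of Gaussians kept")
```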
Item
Computational Smocking through Fabric-Thread Interaction
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Zhou, Ningfeng; Ren, Jing; Sorkine-Hornung, Olga; Bermano, Amit H.; Kalogerakis, Evangelos
We formalize Italian smocking, an intricate embroidery technique that gathers flat fabric into pleats along meandering lines of stitches, resulting in pleats that fold and gather where the stitching veers. In contrast to English smocking, characterized by colorful stitches decorating uniformly shaped pleats, and Canadian smocking, which uses localized knots to form voluminous pleats, Italian smocking permits the fabric to move freely along the stitched threads following curved paths, resulting in complex and unpredictable pleats with highly diverse, irregular structures, achieved simply by pulling on the threads. We introduce a novel method for digital previewing of Italian smocking results, given the thread stitching path as input. Our method uses a coarse-grained mass-spring system to simulate the interaction between the threads and the fabric. This configuration guides the fine-level fabric deformation through an adaptation of the state-of-the-art simulator C-IPC [LKJ21]. Our method models the general problem of fabric-thread interaction and can be readily adapted to preview Canadian smocking as well. We compare our results to baseline approaches and physical fabrications to demonstrate the accuracy of our method.
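A generic explicit mass-spring step of the kind used at such a coarse level could be sketched as follows; this is a textbook integrator with invented constants, not the authors' system.

```python
# Minimal explicit mass-spring step (symplectic Euler) over an edge network.
import numpy as np

def step(x, v, edges, rest, k=50.0, dt=1e-3, mass=1.0):
    """x: (N,3) positions, v: (N,3) velocities, edges: (E,2), rest: (E,)"""
    f = np.zeros_like(x)
    d = x[edges[:, 1]] - x[edges[:, 0]]
    length = np.linalg.norm(d, axis=1, keepdims=True)
    fs = k * (length - rest[:, None]) * d / np.maximum(length, 1e-9)
    np.add.at(f, edges[:, 0], fs)      # stretched spring pulls endpoints together
    np.add.at(f, edges[:, 1], -fs)
    f[:, 2] -= 9.81 * mass             # gravity
    v = v + dt * f / mass
    return x + dt * v, v

# Two nodes joined by one stretched spring relax toward rest length 1.0.
x = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
v = np.zeros((2, 3))
x, v = step(x, v, np.array([[0, 1]]), np.array([1.0]))
```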
Item
Cinematographic Camera Diffusion Model
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Jiang, Hongda; Wang, Xi; Christie, Marc; Liu, Libin; Chen, Baoquan; Bermano, Amit H.; Kalogerakis, Evangelos
Designing effective camera trajectories in virtual 3D environments is a challenging task even for experienced animators. Despite an elaborate film grammar, forged through years of experience, that enables the specification of camera motions through cinematographic properties (framing, shot sizes, angles, motions), there are endless possibilities in deciding how to place and move cameras with characters. Dealing with these possibilities is part of the complexity of the problem. While numerous techniques have been proposed in the literature (optimization-based solving, encoding of empirical rules, learning from real examples, and more), the results either lack variety or ease of control. In this paper, we propose a cinematographic camera diffusion model that uses a transformer-based architecture to handle temporality and exploits the stochasticity of diffusion models to generate diverse, high-quality trajectories conditioned on high-level textual descriptions. We extend the work by integrating keyframing constraints and the ability to blend naturally between motions using latent interpolation, augmenting the degree of control available to designers. We demonstrate the strengths of this text-to-camera-motion approach through qualitative and quantitative experiments and gather feedback from professional artists. The code and data are available at https://github.com/jianghd1996/Camera-control.

Item
Practical Appearance Model for Foundation Cosmetics
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Lanza, Dario; Padrón-Griffe, Juan Raúl; Pranovich, Alina; Muñoz, Adolfo; Frisvad, Jeppe Revall; Jarabo, Adrian; Garces, Elena; Haines, Eric
Cosmetic products have found their place in various aspects of human life, yet their digital appearance reproduction has received little attention. We present an appearance model for cosmetics, in particular for foundation layers, that reproduces a range of existing appearances of foundation cosmetics: from a glossy to a matte to an almost velvety look. Our model is a multilayered BSDF that reproduces the stacking of multiple layers of cosmetics. Inspired by the microscopic particulates used in cosmetics, we model each individual layer as a stochastic participating medium with two types of scatterers that mimic the most prominent visual features of cosmetics: spherical diffusers, resulting in a uniform distribution of radiance, and platelets, responsible for the glossy look of certain cosmetics. We implement our model on top of the position-free Monte Carlo framework, which allows us to include multiple scattering. We validate our model against measured reflectance data, and demonstrate the versatility and expressiveness of our model by thoroughly exploring the range of appearances that it can produce.

Item
Robust Diffusion-based Motion In-betweening
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Qin, Jia; Yan, Peng; An, Bo; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
The emergence of learning-based motion in-betweening techniques offers animators a more efficient way to animate characters. However, existing non-generative methods either struggle to support long transition generation or produce results that lack diversity. Meanwhile, diffusion models have shown promising results in synthesizing diverse and high-quality motions driven by text and keyframes. However, in these methods, keyframes often serve as a guide rather than a strict constraint and can sometimes be ignored when keyframes are sparse. To address these issues, we propose a lightweight yet effective diffusion-based motion in-betweening framework that generates animations conforming to keyframe constraints. We incorporate keyframe constraints into the training phase to enhance robustness in handling various constraint densities. Moreover, we employ relative positional encoding to improve the model's generalization on long-range in-betweening tasks. This approach enables the model to learn from short animations while generating realistic in-betweening motions spanning thousands of frames. We conduct extensive experiments to validate our framework using the newly proposed metrics K-FID, K-Diversity, and K-Error, designed to evaluate generative in-betweening methods. Results demonstrate that our method outperforms existing diffusion-based methods across various lengths and keyframe densities. We also show that our method can be applied to text-driven motion synthesis, offering fine-grained control over the generated results.
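The keyframe-constraint idea can be illustrated with a simple masking step; the shapes and the exact constraint mechanism here are assumptions based on the abstract. Known keyframes keep their clean values while the model denoises the frames in between.

```python
# Sketch of hard keyframe conditioning inside a diffusion step.
import torch

def apply_keyframe_constraint(x_noisy, x_clean, keyframe_idx):
    """x_noisy, x_clean: (T, D) motion; keyframe_idx: constrained frame indices."""
    mask = torch.zeros(x_noisy.shape[0], 1)
    mask[keyframe_idx] = 1.0
    return mask * x_clean + (1.0 - mask) * x_noisy   # keyframes stay exact

x = apply_keyframe_constraint(torch.randn(120, 66), torch.randn(120, 66),
                              torch.tensor([0, 60, 119]))
```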
Item
Real-time Neural Rendering of Dynamic Light Fields
(The Eurographics Association and John Wiley & Sons Ltd., 2024) Coomans, Arno; Dominici, Edoardo Alberto; Döring, Christian; Mueller, Joerg H.; Hladky, Jozef; Steinberger, Markus; Bermano, Amit H.; Kalogerakis, Evangelos
Synthesising high-quality views of dynamic scenes via path tracing is prohibitively expensive. Although caching offline-quality global illumination in neural networks alleviates this issue, existing neural view synthesis methods are mostly limited to static scenes, have low inference performance, or do not integrate well with existing rendering paradigms. We propose a novel neural method that captures a dynamic light field, renders at real-time frame rates at 1920x1080 resolution, and integrates seamlessly with Monte Carlo ray tracing frameworks. We demonstrate how a combination of spatial and temporal encodings and a novel surface-space encoding are each effective at capturing different kinds of spatio-temporal signals. Together with a compact fully fused neural network and architectural improvements, we achieve a twenty-fold increase in network inference speed compared to related methods at equal or better quality. Our approach is suitable for providing offline-quality real-time rendering in a variety of scenarios, such as free-viewpoint video, interactive multi-view rendering, or streaming rendering. Finally, our work can be integrated into other rendering paradigms, e.g., providing a dynamic background for interactive scenarios where the foreground is rendered with traditional methods.
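To illustrate how several input encodings can be combined ahead of a compact network, here is a toy sketch using generic frequency encodings; the paper's actual spatial, temporal, and surface-space encodings differ, and all shapes below are invented.

```python
# Toy combination of positional and temporal encodings before a small MLP.
import torch
import torch.nn as nn

def freq_encode(x, n_freqs=4):          # x: (B, D) -> (B, D*2*n_freqs)
    bands = 2.0 ** torch.arange(n_freqs) * torch.pi
    ang = x[..., None] * bands          # (B, D, n_freqs)
    return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(1)

pos, t = torch.rand(8, 3), torch.rand(8, 1)              # sample positions, times
feat = torch.cat([freq_encode(pos), freq_encode(t)], dim=-1)
head = nn.Sequential(nn.Linear(feat.shape[1], 64), nn.ReLU(), nn.Linear(64, 3))
radiance = head(feat)                   # per-sample RGB radiance estimate
```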