Search Results

Now showing 1 - 4 of 4
  • Item
    MoNeRF: Deformable Neural Rendering for Talking Heads via Latent Motion Navigation
    (Eurographics ‐ The European Association for Computer Graphics and John Wiley & Sons Ltd., 2024) Li, X.; Ding, Y.; Li, R.; Tang, Z.; Li, K.
Novel view synthesis for talking heads presents significant challenges due to the complex and diverse motion transformations involved. Conventional methods often rely on structure priors, like facial templates, to warp observed images into a canonical space conducive to rendering. However, the incorporation of such priors introduces a trade-off: while aiding in synthesis, they concurrently amplify model complexity, limiting generalizability to other deformable scenes. Departing from this paradigm, we introduce a pioneering solution: the motion-conditioned neural radiance field, MoNeRF, designed to model talking heads through latent motion navigation. At the core of MoNeRF lies a novel approach utilizing a compact set of latent codes to represent orthogonal motion directions. This strategy empowers MoNeRF to efficiently capture and depict intricate scene motion by linearly combining these latent codes. In an extended capability, MoNeRF facilitates motion control through latent code adjustments, supports view transfer based on reference videos, and seamlessly extends its applicability to model human bodies without necessitating structural modifications. Rigorous quantitative and qualitative experiments demonstrate MoNeRF's superior performance compared to state-of-the-art methods in talking head synthesis. We will release the source code upon publication.
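    The mechanism described in the MoNeRF abstract above, a small bank of orthogonal latent codes whose linear combination conditions a deformation field, can be illustrated with a short sketch. Everything below (class names, dimensions, the QR orthogonalization, the placeholder MLP) is an illustrative assumption, not the authors' released code.

```python
# Illustrative sketch only: a compact bank of latent codes spans a motion
# space, a per-frame coefficient vector picks a point in it by linear
# combination, and the resulting motion code conditions a deformation MLP.
import torch
import torch.nn as nn


class LatentMotionBank(nn.Module):
    def __init__(self, num_codes: int = 8, code_dim: int = 64):
        super().__init__()
        # K learnable latent codes, treated as motion directions
        self.codes = nn.Parameter(torch.randn(num_codes, code_dim))

    def forward(self, coeffs: torch.Tensor) -> torch.Tensor:
        # coeffs: (B, K) per-frame mixing weights -> (B, code_dim) motion code
        q, _ = torch.linalg.qr(self.codes.T)   # orthonormalize the directions
        return coeffs @ q.T                    # linear combination of directions


class DeformationField(nn.Module):
    """Maps sampled 3D points plus a motion code to canonical-space points."""
    def __init__(self, code_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x: torch.Tensor, motion_code: torch.Tensor) -> torch.Tensor:
        # x: (B, N, 3) ray samples; motion_code: (B, code_dim)
        code = motion_code[:, None, :].expand(-1, x.shape[1], -1)
        return x + self.mlp(torch.cat([x, code], dim=-1))


bank, deform = LatentMotionBank(), DeformationField()
coeffs = torch.randn(2, 8)                  # per-frame coefficients (e.g. optimized)
points = torch.rand(2, 1024, 3)             # sampled points along camera rays
canonical = deform(points, bank(coeffs))    # (2, 1024, 3), fed to a canonical NeRF
```

    In this reading, editing the coefficient vector corresponds to the "latent code adjustments" the abstract mentions for motion control; the canonical points would then be queried by an ordinary radiance field.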
  • Item
    Deep‐Learning‐Based Facial Retargeting Using Local Patches
    (Eurographics ‐ The European Association for Computer Graphics and John Wiley & Sons Ltd., 2024) Choi, Yeonsoo; Lee, Inyup; Cha, Sihun; Kim, Seonghyeon; Jung, Sunjin; Noh, Junyong
In the era of digital animation, the quest to produce lifelike facial animations for virtual characters has led to the development of various retargeting methods. While retargeting facial motion between models of similar shape has been very successful, challenges arise when retargeting is performed on stylized or exaggerated 3D characters that deviate significantly from human facial structures. In this scenario, it is important to consider the target character's facial structure and possible range of motion to preserve the semantics of the original facial motions after retargeting. To achieve this, we propose a local patch-based retargeting method that transfers facial animations captured in a source performance video to a target stylized 3D character. Our method consists of three modules. The Automatic Patch Extraction Module extracts local patches from the source video frame. These patches are processed through the Reenactment Module to generate correspondingly re-enacted target local patches. The Weight Estimation Module calculates the animation parameters for the target character at every frame for the creation of a complete facial animation sequence. Extensive experiments demonstrate that our method can successfully transfer the semantic meaning of source facial expressions to stylized characters with considerable variations in facial feature proportion.
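    The three-module data flow named in the abstract above can be wired up as a minimal, hypothetical sketch. The module bodies below are trivial placeholders and the patch centers, sizes, and parameter count are invented; only the extraction → reenactment → weight-estimation pipeline reflects the description.

```python
# Placeholder wiring only: the real networks, patch layout and number of
# animation parameters are unknown here; what matches the abstract is the
# patch extraction -> reenactment -> weight estimation data flow.
import torch
import torch.nn as nn


class PatchExtraction(nn.Module):
    """Stand-in for the Automatic Patch Extraction Module: crops square local
    patches (e.g. eyes, mouth) from a source frame at given centers."""
    def forward(self, frame, centers, size: int = 64):
        # frame: (3, H, W); centers: (P, 2) pixel coordinates
        patches = []
        for cx, cy in centers.round().long().tolist():
            patches.append(frame[:, cy - size // 2: cy + size // 2,
                                    cx - size // 2: cx + size // 2])
        return torch.stack(patches)                      # (P, 3, size, size)


class Reenactment(nn.Module):
    """Stand-in for the Reenactment Module: maps source patches to re-enacted
    target-character patches (a trivial conv placeholder here)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, patches):                          # (P, 3, h, w)
        return self.net(patches)


class WeightEstimation(nn.Module):
    """Stand-in for the Weight Estimation Module: regresses per-frame animation
    parameters for the target character from the re-enacted patches."""
    def __init__(self, num_params: int = 51):            # 51 is a made-up count
        super().__init__()
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(3, num_params), nn.Sigmoid())

    def forward(self, target_patches):                   # -> (num_params,)
        return self.head(target_patches).mean(dim=0)


frame = torch.rand(3, 256, 256)                          # one source video frame
centers = torch.tensor([[80.0, 100.0], [176.0, 100.0], [128.0, 190.0]])
patches = PatchExtraction()(frame, centers)              # local source patches
params = WeightEstimation()(Reenactment()(patches))      # parameters for this frame
```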
  • Item
    Conditional Font Generation With Content Pre‐Train and Style Filter
    (Eurographics ‐ The European Association for Computer Graphics and John Wiley & Sons Ltd., 2024) Hong, Yang; Li, Yinfei; Qiao, Xiaojun; Zhang, Junsong
    Automatic font generation aims to streamline the design process by creating new fonts with minimal style references. This technology significantly reduces the manual labour and costs associated with traditional font design. Image‐to‐image translation has been the dominant approach, transforming font images from a source style to a target style using a few reference images. However, this framework struggles to fully decouple content from style, particularly when dealing with significant style shifts. Despite these limitations, image‐to‐image translation remains prevalent due to two main challenges faced by conditional generative models: (1) inability to handle unseen characters and (2) difficulty in providing precise content representations equivalent to the source font. Our approach tackles these issues by leveraging recent advancements in Chinese character representation research to pre‐train a robust content representation model. This model not only handles unseen characters but also generalizes to non‐existent ones, a capability absent in traditional image‐to‐image translation. We further propose a Transformer‐based Style Filter that not only accurately captures stylistic features from reference images but also handles any combination of them, fostering greater convenience for practical automated font generation applications. Additionally, we incorporate content loss with commonly used pixel‐ and perceptual‐level losses to refine the generated results from a comprehensive perspective. Extensive experiments validate the effectiveness of our method, particularly its ability to handle unseen characters, demonstrating significant performance gains over existing state‐of‐the‐art methods.
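    One distinctive point in the abstract above is that the Transformer-based Style Filter accepts any combination of reference images. A hedged sketch of that property follows; the glyph size (64×64), patch embedding, layer sizes, and the learnable pooling token are all assumptions rather than the paper's architecture.

```python
# Illustrative sketch only: a Transformer encoder over reference-glyph tokens
# naturally handles an arbitrary number R of style references, pooling them
# into a single style code via a learnable token.
import torch
import torch.nn as nn


class StyleFilter(nn.Module):
    """Hypothetical Transformer-based style aggregator: encodes R reference
    glyphs into tokens and pools them into a single style code."""
    def __init__(self, d_model: int = 256):
        super().__init__()
        self.patch_embed = nn.Sequential(
            nn.Conv2d(1, d_model, kernel_size=8, stride=8),  # 64x64 -> 8x8 patches
            nn.Flatten(2),                                   # (*, d_model, 64)
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.style_token = nn.Parameter(torch.zeros(1, 1, d_model))

    def forward(self, refs: torch.Tensor) -> torch.Tensor:
        # refs: (B, R, 1, 64, 64) with an arbitrary number R of references
        b, r = refs.shape[:2]
        tokens = self.patch_embed(refs.flatten(0, 1))            # (B*R, d, 64)
        tokens = tokens.transpose(1, 2).reshape(b, r * 64, -1)   # (B, R*64, d)
        tokens = torch.cat([self.style_token.expand(b, -1, -1), tokens], dim=1)
        return self.encoder(tokens)[:, 0]                        # (B, d) style code


refs = torch.rand(2, 3, 1, 64, 64)     # three reference glyphs per sample
style = StyleFilter()(refs)            # (2, 256); the same module handles any R
```

    The pooled style code would then be combined with the pre-trained content representation in a generator, and trained with the pixel-, perceptual-, and content-level losses the abstract mentions.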
  • Item
    THGS: Lifelike Talking Human Avatar Synthesis From Monocular Video Via 3D Gaussian Splatting
    (Eurographics ‐ The European Association for Computer Graphics and John Wiley & Sons Ltd., 2025) Chen, Chuang; Yu, Lingyun; Yang, Quanwei; Zheng, Aihua; Xie, Hongtao
Despite the remarkable progress in 3D talking head generation, directly generating 3D talking human avatars still suffers from rigid facial expressions, distorted hand textures and out-of-sync lip movements. In this paper, we extend the speaker-specific talking head generation task to talking human avatar synthesis and propose a novel pipeline, THGS, that animates lifelike Talking Human avatars using 3D Gaussian Splatting (3DGS). Given speech audio, expression and body poses as input, THGS effectively overcomes the limitations of 3DGS human reconstruction methods in capturing expressive dynamics from a short monocular video. Firstly, we introduce a simple yet effective approach to facial dynamics reconstruction, where subtle facial dynamics can be generated by linearly combining the static head model and expression blendshapes. Secondly, a module is proposed for lip-synced mouth movement animation, building connections between speech audio and mouth Gaussian movements. Thirdly, we employ a strategy to optimize these parameters on the fly, which aligns hand movements and expressions better with the video input. Experimental results demonstrate that THGS can achieve high-fidelity 3D talking human avatar animation at 150+ fps on a web-based rendering system, meeting the requirements of real-time applications. Our project page is at .
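    The facial-dynamics step in the abstract above is described explicitly as a linear combination of a static head model with expression blendshapes. A minimal sketch of that combination, applied here to per-Gaussian positions with made-up counts and shapes, is shown below.

```python
# Minimal sketch of a per-frame blendshape combination, applied here to
# per-Gaussian positions; counts, shapes and the einsum layout are assumptions.
import torch


def blend_head(static_positions: torch.Tensor,    # (N, 3) static head model
               blendshape_offsets: torch.Tensor,  # (K, N, 3) expression offsets
               weights: torch.Tensor) -> torch.Tensor:  # (K,) per-frame weights
    # x(t) = x_static + sum_k w_k(t) * B_k
    return static_positions + torch.einsum("k,knd->nd", weights, blendshape_offsets)


static = torch.rand(10_000, 3)                    # e.g. Gaussian centers of the head
offsets = torch.randn(52, 10_000, 3) * 0.01       # 52 is a made-up blendshape count
frame_weights = torch.rand(52)                    # e.g. driven by audio/expression input
deformed = blend_head(static, offsets, frame_weights)   # positions for this frame
```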