Title: Personalized Visual Dubbing through Virtual Dubber and Full Head Reenactment
Authors: Jeon, Bobae; Paquette, Eric; Mudur, Sudhir; Popa, Tiberiu; Ceylan, Duygu; Li, Tzu-Mao
Date: 2025-05-09
ISBN: 978-3-03868-268-4
ISSN: 1017-4656
DOI: https://doi.org/10.2312/egs.20251034
Handle: https://diglib.eg.org/handle/10.2312/egs20251034
Pages: 4
License: Creative Commons Attribution 4.0 International License
CCS Concepts: Computing methodologies → Image manipulation; Computing methodologies → Animation

Abstract: Visual dubbing aims to modify facial expressions to "lip-sync" a new audio track. While person-generic talking head generation methods achieve expressive lip synchronization across arbitrary identities, they usually lack person-specific details and fail to generate high-quality results. Conversely, person-specific methods require extensive training. Our method combines the strengths of both approaches by incorporating a virtual dubber, a person-generic talking head, as an intermediate representation. We then employ an autoencoder-based person-specific identity swapping network to transfer the actor identity, enabling full-head reenactment that includes hair, face, ears, and neck. This eliminates artifacts while ensuring temporal consistency. Our quantitative and qualitative evaluations demonstrate that our method achieves a superior balance between lip-sync accuracy and realistic facial reenactment.
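
The abstract describes an autoencoder-based, person-specific identity swapping network that maps the virtual dubber's frames onto the actor for full-head reenactment. The paper does not give the architecture here, so the sketch below is only a minimal illustration of one common autoencoder identity-swap design (a shared encoder with one decoder per identity, as in DeepFakes-style swapping); the class names, layer sizes, resolution, and training loss are illustrative assumptions, not the authors' method.

    # Minimal sketch, assuming a shared-encoder / per-identity-decoder
    # autoencoder swap. All names and hyperparameters are hypothetical.
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, latent_dim=512):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),    # 128 -> 64
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),  # 64 -> 32
                nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2), # 32 -> 16
                nn.Conv2d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2), # 16 -> 8
                nn.Flatten(),
                nn.Linear(512 * 8 * 8, latent_dim),
            )

        def forward(self, x):
            return self.net(x)

    class Decoder(nn.Module):
        def __init__(self, latent_dim=512):
            super().__init__()
            self.fc = nn.Linear(latent_dim, 512 * 8 * 8)
            self.net = nn.Sequential(
                nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
                nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 64
                nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 64 -> 128
            )

        def forward(self, z):
            h = self.fc(z).view(-1, 512, 8, 8)
            return self.net(h)

    class IdentitySwapper(nn.Module):
        """Shared encoder + one decoder per identity (hypothetical design).

        Training: each identity's frames are reconstructed through its own
        decoder. Inference: encode a virtual-dubber frame and decode it with
        the actor decoder, transferring the actor identity onto the dubbed
        head motion.
        """
        def __init__(self, latent_dim=512):
            super().__init__()
            self.encoder = Encoder(latent_dim)
            self.decoder_dubber = Decoder(latent_dim)  # reconstructs virtual-dubber frames
            self.decoder_actor = Decoder(latent_dim)   # reconstructs actor frames

        def reconstruct(self, x, identity):
            z = self.encoder(x)
            dec = self.decoder_dubber if identity == "dubber" else self.decoder_actor
            return dec(z)

        def swap_to_actor(self, dubber_frame):
            # Keep the expression/pose captured by the shared latent,
            # render it with the actor-specific decoder.
            return self.decoder_actor(self.encoder(dubber_frame))

    if __name__ == "__main__":
        model = IdentitySwapper()
        dubber_frames = torch.rand(2, 3, 128, 128)  # dummy 128x128 RGB head crops
        actor_frames = torch.rand(2, 3, 128, 128)
        # One reconstruction step per identity (L1 loss as a placeholder).
        loss = (model.reconstruct(dubber_frames, "dubber") - dubber_frames).abs().mean() \
             + (model.reconstruct(actor_frames, "actor") - actor_frames).abs().mean()
        loss.backward()
        # Identity transfer at inference time.
        swapped = model.swap_to_actor(dubber_frames)
        print(swapped.shape)  # torch.Size([2, 3, 128, 128])

In such a design, the shared latent space is what lets the dubbed lip motion survive the swap, while the person-specific decoder supplies the actor's appearance over the whole head region (hair, face, ears, neck) rather than only the mouth.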