Authors: Dang, Rujing; Wang, Shaohui; Wang, Haoqian
Editors: Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Date issued: 2023-10-09
ISBN: 978-3-03868-234-9
DOI: https://doi.org/10.2312/pg.20231274
URI: https://diglib.eg.org:443/handle/10.2312/pg20231274

Abstract: Audio-driven talking head generation has wide applications in virtual games, virtual hosts, online meetings, etc. Recently, great progress has been made in synthesizing talking heads with neural radiance fields. However, existing few-shot talking head synthesis methods still suffer from inaccurate deformation and a lack of visual consistency. We therefore propose a Generalizable Dynamic Radiance Field (GDRF) that rapidly generalizes to unseen identities from only a few shots. We introduce a warping module with 3D constraints that operates in feature volume space, is identity-adaptive, and exhibits strong shape-shifting ability. Our method generates more accurately deformed and view-consistent target images than previous methods. Furthermore, we map the audio signal to 3DMM parameters with an LSTM network, which captures long-term context and yields more continuous and natural video. Extensive experiments demonstrate the superiority of our proposed method.

License: Attribution 4.0 International License (CC BY 4.0)
CCS Concepts: Computing methodologies -> Reconstruction; Animation; Shape representations
Keywords: Computing methodologies; Reconstruction; Animation; Shape representations
Title: Generalizable Dynamic Radiance Fields For Talking Head Synthesis With Few-shot
DOI: 10.2312/pg.20231274
Pages: 81-88 (8 pages)
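The abstract states that the audio signal is mapped to 3DMM parameters by an LSTM network so that long-term context can be exploited. The sketch below is a minimal illustration of that general idea, not the authors' implementation: the class name Audio2ExpLSTM, the 29-dimensional per-frame audio features, the 64 expression coefficients, and the two-layer architecture are all assumptions made for the example.

```python
# Minimal sketch (PyTorch) of an audio -> 3DMM parameter regressor built
# around an LSTM. Names and dimensions are hypothetical, chosen only to
# illustrate sequence-level regression with long-term context.
import torch
import torch.nn as nn


class Audio2ExpLSTM(nn.Module):
    def __init__(self, audio_dim=29, hidden_dim=128, exp_dim=64, num_layers=2):
        super().__init__()
        # The LSTM consumes the whole audio feature sequence, so each frame's
        # prediction can draw on long-term context rather than a local window.
        self.lstm = nn.LSTM(audio_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        # Linear head maps each hidden state to per-frame 3DMM parameters.
        self.head = nn.Linear(hidden_dim, exp_dim)

    def forward(self, audio_feats):
        # audio_feats: (batch, num_frames, audio_dim)
        hidden, _ = self.lstm(audio_feats)   # (batch, num_frames, hidden_dim)
        return self.head(hidden)             # (batch, num_frames, exp_dim)


if __name__ == "__main__":
    model = Audio2ExpLSTM()
    audio = torch.randn(2, 100, 29)          # 2 clips, 100 frames each
    exp_params = model(audio)
    print(exp_params.shape)                   # torch.Size([2, 100, 64])
```

The predicted per-frame 3DMM parameters would then serve as the deformation conditioning for the dynamic radiance field; how that conditioning is consumed is specific to the paper and not reproduced here.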