Audio-Driven Speech Animation with Text-Guided Expression
Date
2024
Authors
Sunjin Jung, Sewhan Chun, Junyong Noh
Publisher
The Eurographics Association
Abstract
We introduce a novel method for generating expressive speech animations of a 3D face, driven by both audio and text descriptions. Many previous approaches have focused on generating facial expressions using pre-defined emotion categories. In contrast, our method can generate facial expressions from text descriptions unseen during training, without being limited to specific emotion classes. Our system employs a two-stage approach. In the first stage, an auto-encoder is trained to disentangle content and expression features from facial animations. In the second stage, two transformer-based networks predict the content and expression features from audio and text inputs, respectively. These features are then passed to the decoder of the pre-trained auto-encoder, yielding the final expressive speech animation. By accommodating diverse forms of natural language, such as emotion words or detailed facial expression descriptions, our method offers an intuitive and versatile way to generate expressive speech animations. Extensive quantitative and qualitative evaluations, including a user study, demonstrate that our method can produce natural expressive speech animations that correspond to the input audio and text descriptions.
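Since this record carries only the abstract, the following is a minimal PyTorch sketch of how such a two-stage pipeline might be wired, based solely on the description above. Every class name, feature dimension, and backbone choice (GRU and Transformer encoders, wav2vec-style audio features, a sentence-embedding text input) is a hypothetical assumption for illustration, not the authors' implementation.

# Hypothetical sketch (not the paper's code): stage 1 disentangles content and
# expression features from facial animation; stage 2 predicts those features
# from audio and text, then reuses the frozen stage-1 decoder.
import torch
import torch.nn as nn


class DisentanglingAutoEncoder(nn.Module):
    """Stage 1: split an animation sequence into content and expression
    features, then reconstruct it from the two."""
    def __init__(self, n_verts=5023 * 3, content_dim=128, expr_dim=64):
        super().__init__()
        self.content_enc = nn.GRU(n_verts, content_dim, batch_first=True)
        self.expr_enc = nn.GRU(n_verts, expr_dim, batch_first=True)
        self.decoder = nn.GRU(content_dim + expr_dim, n_verts, batch_first=True)

    def encode(self, anim):                      # anim: (B, T, n_verts)
        content, _ = self.content_enc(anim)      # per-frame content features
        _, expr_h = self.expr_enc(anim)          # sequence-level expression code
        expr = expr_h[-1].unsqueeze(1).expand(-1, anim.size(1), -1)
        return content, expr

    def decode(self, content, expr):
        out, _ = self.decoder(torch.cat([content, expr], dim=-1))
        return out                                # reconstructed animation

    def forward(self, anim):
        return self.decode(*self.encode(anim))


class AudioToContent(nn.Module):
    """Stage 2a: predict content features from per-frame audio features."""
    def __init__(self, audio_dim=768, content_dim=128):
        super().__init__()
        self.proj = nn.Linear(audio_dim, content_dim)
        layer = nn.TransformerEncoderLayer(content_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, audio_feats):               # (B, T, audio_dim)
        return self.encoder(self.proj(audio_feats))


class TextToExpression(nn.Module):
    """Stage 2b: predict an expression code from an embedding of a free-form
    expression description, broadcast over all frames."""
    def __init__(self, text_dim=512, expr_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU(),
                                 nn.Linear(256, expr_dim))

    def forward(self, text_emb, n_frames):        # text_emb: (B, text_dim)
        expr = self.mlp(text_emb)
        return expr.unsqueeze(1).expand(-1, n_frames, -1)


# Inference: combine predicted content and expression with the frozen decoder.
if __name__ == "__main__":
    B, T = 1, 100
    ae = DisentanglingAutoEncoder()
    a2c, t2e = AudioToContent(), TextToExpression()
    audio_feats = torch.randn(B, T, 768)          # e.g. wav2vec-style features
    text_emb = torch.randn(B, 512)                # e.g. sentence embedding
    anim = ae.decode(a2c(audio_feats), t2e(text_emb, T))
    print(anim.shape)                             # (1, 100, 15069)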
Description
CCS Concepts: Computing methodologies → Animation; Neural networks
@inproceedings{10.2312:pg.20241290,
booktitle = {Pacific Graphics Conference Papers and Posters},
editor = {Chen, Renjie and Ritschel, Tobias and Whiting, Emily},
title = {{Audio-Driven Speech Animation with Text-Guided Expression}},
author = {Jung, Sunjin and Chun, Sewhan and Noh, Junyong},
year = {2024},
publisher = {The Eurographics Association},
ISBN = {978-3-03868-250-9},
DOI = {10.2312/pg.20241290}
}