Expressive Speech Animation Synthesis with Phoneme-Level Controls

Date
2008
Journal Title
Computer Graphics Forum
Journal ISSN
1467-8659
Publisher
The Eurographics Association and Blackwell Publishing Ltd
Abstract
This paper presents a novel data-driven expressive speech animation synthesis system with phoneme-level controls. The system is built on a pre-recorded facial motion capture database in which an actress was directed to recite a pre-designed corpus with four facial expressions (neutral, happiness, anger, and sadness). Given new phoneme-aligned expressive speech and its emotion modifiers as inputs, a constrained dynamic programming algorithm searches the processed facial motion database for the best-matched captured motion clips by minimizing a cost function. Users can optionally specify hard constraints (motion-node constraints for expressing phoneme utterances) and soft constraints (emotion modifiers) to guide this search. We also introduce a phoneme-Isomap interface for visualizing and interacting with phoneme clusters, which are typically composed of thousands of facial motion capture frames. On top of this visualization interface, users can conveniently remove contaminated motion subsequences from a large facial motion dataset. Facial animation synthesis experiments and objective comparisons between synthesized and captured facial motion show that the system is effective for producing realistic expressive speech animations.
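The search step described in the abstract is a constrained dynamic-programming (Viterbi-style) minimization over candidate motion clips. The Python sketch below is a minimal illustration of that idea under stated assumptions: the MotionNode structure, the emotion and smoothness cost terms, and the weights are hypothetical and do not reproduce the paper's actual cost function or motion representation.

# Hypothetical sketch of a constrained dynamic-programming search over a
# phoneme-labeled motion-clip database. All data structures and cost terms
# are illustrative assumptions, not the paper's formulation.
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class MotionNode:
    clip_id: int
    phoneme: str
    emotion: str            # e.g. "neutral", "happiness", "anger", "sadness"
    start_pose: List[float] # first-frame facial parameters (illustrative)
    end_pose: List[float]   # last-frame facial parameters (illustrative)

def emotion_cost(node: MotionNode, desired_emotion: str) -> float:
    """Soft constraint: penalize clips whose emotion label differs from the modifier."""
    return 0.0 if node.emotion == desired_emotion else 1.0

def transition_cost(prev: MotionNode, cur: MotionNode) -> float:
    """Smoothness term: squared distance between the previous clip's last pose
    and the current clip's first pose."""
    return sum((a - b) ** 2 for a, b in zip(prev.end_pose, cur.start_pose))

def search_motion_path(
    phonemes: List[str],
    emotions: List[str],
    database: Dict[str, List[MotionNode]],
    hard_constraints: Optional[Dict[int, int]] = None,  # position -> required clip_id
    w_emotion: float = 1.0,
    w_smooth: float = 1.0,
) -> List[MotionNode]:
    hard_constraints = hard_constraints or {}
    # Candidate motion nodes per phoneme position; hard constraints prune the lattice.
    candidates: List[List[MotionNode]] = []
    for i, ph in enumerate(phonemes):
        nodes = database.get(ph, [])
        if i in hard_constraints:
            nodes = [n for n in nodes if n.clip_id == hard_constraints[i]]
        if not nodes:
            raise ValueError(f"no candidate motion nodes for phoneme {ph!r} at position {i}")
        candidates.append(nodes)

    # Dynamic programming over the candidate lattice.
    cost = [[w_emotion * emotion_cost(n, emotions[0]) for n in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row_cost, row_back = [], []
        for n in candidates[i]:
            local = w_emotion * emotion_cost(n, emotions[i])
            best_j, best_c = min(
                ((j, cost[i - 1][j] + w_smooth * transition_cost(p, n))
                 for j, p in enumerate(candidates[i - 1])),
                key=lambda t: t[1],
            )
            row_cost.append(best_c + local)
            row_back.append(best_j)
        cost.append(row_cost)
        back.append(row_back)

    # Backtrack the minimum-cost path of motion nodes.
    j = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    path = [candidates[-1][j]]
    for i in range(len(candidates) - 1, 0, -1):
        j = back[i][j]
        path.append(candidates[i - 1][j])
    return list(reversed(path))

The paper's actual cost function and motion-node representation are considerably richer; this sketch only shows how hard constraints prune the candidate lattice while soft (emotion) costs and a transition smoothness term bias the selected path.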
Citation
@article{10.1111:j.1467-8659.2008.01192.x,
  journal   = {Computer Graphics Forum},
  title     = {{Expressive Speech Animation Synthesis with Phoneme-Level Controls}},
  author    = {Deng, Z. and Neumann, U.},
  year      = {2008},
  publisher = {The Eurographics Association and Blackwell Publishing Ltd},
  ISSN      = {1467-8659},
  DOI       = {10.1111/j.1467-8659.2008.01192.x}
}