Expressive Speech Animation Synthesis with Phoneme-Level Controls

dc.contributor.author: Deng, Z.
dc.contributor.author: Neumann, U.
dc.date.accessioned: 2015-02-21T13:21:53Z
dc.date.available: 2015-02-21T13:21:53Z
dc.date.issued: 2008
dc.description.abstract: This paper presents a novel data-driven expressive speech animation synthesis system with phoneme-level controls. The system is built on a pre-recorded facial motion capture database, for which an actress was directed to recite a pre-designed corpus with four facial expressions (neutral, happiness, anger, and sadness). Given new phoneme-aligned expressive speech and its emotion modifiers as inputs, a constrained dynamic programming algorithm searches the processed facial motion database for the best-matched captured motion clips by minimizing a cost function. Users can optionally specify hard constraints (motion-node constraints for expressing phoneme utterances) and soft constraints (emotion modifiers) to guide the search. We also introduce a phoneme-Isomap interface for visualizing and interacting with phoneme clusters, which are typically composed of thousands of facial motion capture frames. On top of this novel visualization interface, users can conveniently remove contaminated motion subsequences from a large facial motion dataset. Facial animation synthesis experiments and objective comparisons between synthesized and captured facial motion show that the system is effective for producing realistic expressive speech animations.
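The abstract describes a constrained dynamic programming search that selects one captured motion clip per phoneme while minimizing a cost function, guided by hard constraints (a pinned clip for a given phoneme) and soft constraints (emotion biases folded into the cost). The sketch below is an illustrative Viterbi-style reconstruction under assumed interfaces, not the authors' implementation: `data_cost[t][j]` stands in for the match cost of clip `j` at phoneme slot `t` (with any emotion bias already added), `trans_cost(i, j)` for the smoothness cost between consecutive clips, and `hard_constraints` for the motion-node constraints.

```python
# Illustrative sketch only (hypothetical names; not the paper's code):
# Viterbi-style constrained dynamic programming over candidate motion
# clips, one clip per phoneme slot, minimizing total data + transition cost.

def search_motion_path(data_cost, trans_cost, hard_constraints=None):
    """data_cost[t][j]: cost of using clip j at phoneme slot t.
    trans_cost(i, j): smoothness cost of following clip i with clip j.
    hard_constraints: optional {t: j} pinning phoneme slot t to clip j."""
    hard = hard_constraints or {}
    INF = float("inf")

    def allowed(t, j):
        # A hard (motion-node) constraint rules out every other clip.
        return t not in hard or hard[t] == j

    # Accumulated best cost for each candidate clip at the current slot.
    best = [data_cost[0][j] if allowed(0, j) else INF
            for j in range(len(data_cost[0]))]
    back = []  # back-pointer tables for path recovery
    for t in range(1, len(data_cost)):
        cur, ptr = [], []
        for j in range(len(data_cost[t])):
            if not allowed(t, j):
                cur.append(INF)
                ptr.append(-1)
                continue
            # Best predecessor under accumulated cost plus transition cost.
            i_best = min(range(len(best)),
                         key=lambda i: best[i] + trans_cost(i, j))
            cur.append(best[i_best] + trans_cost(i_best, j) + data_cost[t][j])
            ptr.append(i_best)
        back.append(ptr)
        best = cur

    # Backtrack the minimum-cost clip sequence.
    j = min(range(len(best)), key=lambda i: best[i])
    total = best[j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1], total
```

For example, with two slots and two candidate clips each, a cheap same-clip transition favors staying on one clip unless the data cost says otherwise, and pinning slot 1 to clip 0 via `hard_constraints={1: 0}` forces the path through that clip even at higher total cost.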
dc.description.number: 8
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 27
dc.identifier.doi: 10.1111/j.1467-8659.2008.01192.x
dc.identifier.issn: 1467-8659
dc.identifier.pages: 2096-2113
dc.identifier.uri: https://doi.org/10.1111/j.1467-8659.2008.01192.x
dc.publisher: The Eurographics Association and Blackwell Publishing Ltd
dc.title: Expressive Speech Animation Synthesis with Phoneme-Level Controls