Dynamic Units of Visual Speech

Taylor, Sarah L.; Mahler, Moshe; Theobald, Barry-John; Matthews, Iain

Files

275-284.pdf (1.37 MB)

visemes-sca-2012-720-final.mov (42.64 MB)

Date

2012

Authors

Taylor, Sarah L.
Mahler, Moshe
Theobald, Barry-John
Matthews, Iain

Publisher

The Eurographics Association

Abstract

We present a new method for generating a dynamic, concatenative unit of visual speech that can be used to produce realistic visual speech animation. We redefine visemes as temporal units that describe distinctive speech movements of the visual speech articulators. Traditionally, visemes have been surmised to be the set of static mouth shapes representing clusters of contrastive phonemes (e.g. /p, b, m/ and /f, v/). In this work, the motion of the visual speech articulators is used to generate discrete, dynamic visual speech gestures. These gestures are clustered, providing a finite set of movements that describe visual speech: the visemes. Dynamic visemes are applied to speech animation by simply concatenating viseme units. We compare dynamic visemes to static visemes using subjective evaluation. We find that dynamic visemes produce more accurate and visually pleasing speech animation from phonetically annotated audio, reducing the time an animator needs to spend manually refining the animation.
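The pipeline the abstract outlines (extract gestures, cluster them into a finite set, then concatenate units) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that each gesture is a 1-D trajectory of a single articulator parameter (for example, lip opening per frame), that gestures are compared after resampling to a fixed length, and that plain k-means stands in for whatever clustering the paper actually uses. All function names and the toy data are hypothetical.

```python
def resample(segment, n=4):
    """Linearly resample a 1-D gesture to n samples (n >= 2) so that
    gestures of different durations can be compared point by point."""
    if len(segment) == 1:
        return [segment[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(segment) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=10):
    """Plain k-means with deterministic farthest-point initialisation.
    An assumption of this sketch, not the clustering used in the paper."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each gesture to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def concatenate(unit_ids, centroids):
    """Join viseme units end to end (no blending at the boundaries)."""
    track = []
    for uid in unit_ids:
        track.extend(centroids[uid])
    return track


# Toy gestures: three mouth-opening and two mouth-closing movements
# of different durations (hypothetical data).
gestures = [
    [0.0, 0.5, 1.0],
    [0.0, 0.3, 0.7, 1.0],
    [1.0, 0.5, 0.0],
    [1.0, 0.6, 0.2, 0.0],
    [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
]
vectors = [resample(g) for g in gestures]
centroids, labels = kmeans(vectors, k=2)

# Animate an open, close, open sequence by concatenating cluster centroids.
track = concatenate([labels[0], labels[2], labels[0]], centroids)
```

In this sketch each cluster centroid plays the role of one dynamic viseme, and the animation track is simply the selected units laid end to end. A production system would need multi-dimensional articulator parameters and some treatment of the joins between units.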

@inproceedings{:10.2312/SCA/SCA12/275-284,
  booktitle = {Eurographics/ACM SIGGRAPH Symposium on Computer Animation},
  editor    = {Jehee Lee and Paul Kry},
  title     = {{Dynamic Units of Visual Speech}},
  author    = {Taylor, Sarah L. and Mahler, Moshe and Theobald, Barry-John and Matthews, Iain},
  year      = {2012},
  publisher = {The Eurographics Association},
  ISSN      = {1727-5288},
  ISBN      = {978-3-905674-37-8},
  DOI       = {10.2312/SCA/SCA12/275-284}
}