A Data Collection Protocol, Tool and Analysis for the Mapping of Speech Volume to Avatar Facial Animation

Authors: Miyawaki, Ryosuke; Perusquia-Hernandez, Monica; Isoyama, Naoya; Uchiyama, Hideaki; Kiyokawa, Kiyoshi
Editors: Hideaki Uchiyama; Jean-Marie Normand
Date of publication: 2022-11-29
Year: 2022
ISBN: 978-3-03868-179-3
ISSN: 1727-530X
DOI: https://doi.org/10.2312/egve.20221273
URI: https://diglib.eg.org:443/handle/10.2312/egve20221273
Pages: 27-34 (8 pages)
License: Attribution 4.0 International License
CCS Concepts: Human-centered computing -> Visualization toolkits

Abstract: Knowing the relationship between speech-related facial movement and speech is important for avatar animation. Accurate facial displays are necessary to convey perceptual speech characteristics fully. Recently, efforts have been made to infer the relationship between facial movement and speech with data-driven methodologies using computer vision. To this end, we propose to use blendshape-based facial movement tracking, because it can be easily translated to avatar movement. Furthermore, we present a protocol for audio-visual and behavioral data collection, together with a web-based tool that aids in collecting and synchronizing data. As a start, we provide a database of six Japanese participants reading emotion-related scripts at different volume levels. Using this methodology, we found a relationship between speech volume and facial movement around the nose, cheeks, and mouth, as well as head pitch. We hope that our protocol, web-based tool, and collected data will be useful for other scientists to derive models for avatar animation.
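The abstract describes an analysis that relates per-frame speech volume to blendshape-tracked facial movement. The snippet below is a minimal sketch of that kind of analysis, not the authors' code: the file names (blendshapes.csv, speech.wav), the frame rate, and the assumption that every CSV column holds one blendshape channel are all illustrative.

```python
# Minimal sketch (not the authors' analysis code): correlate per-frame speech
# volume with blendshape activation time series. File names, column layout,
# and the frame rate below are illustrative assumptions.
import numpy as np
import pandas as pd
import soundfile as sf
from scipy.stats import pearsonr

FPS = 30  # assumed video/blendshape frame rate

# Blendshape coefficients exported per video frame (hypothetical CSV layout:
# one row per frame, one column per blendshape channel, values in [0, 1]).
shapes = pd.read_csv("blendshapes.csv")

# Synchronized speech recording for the same take.
audio, sr = sf.read("speech.wav")
if audio.ndim > 1:          # mix down to mono if the recording is stereo
    audio = audio.mean(axis=1)

# Per-frame RMS volume: slice the audio into windows one video frame long.
samples_per_frame = int(round(sr / FPS))
n_frames = min(len(shapes), len(audio) // samples_per_frame)
rms = np.array([
    np.sqrt(np.mean(audio[i * samples_per_frame:(i + 1) * samples_per_frame] ** 2))
    for i in range(n_frames)
])
shapes = shapes.iloc[:n_frames]

# Pearson correlation between speech volume and each blendshape channel.
for channel in shapes.columns:
    r, p = pearsonr(rms, shapes[channel].to_numpy())
    print(f"{channel:>24s}  r = {r:+.2f}  p = {p:.3g}")
```

In this sketch, channels with large positive correlations would be candidates (e.g., around the mouth or cheeks) for driving avatar blendshapes from speech volume; the actual protocol, synchronization, and statistics are those reported in the paper.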