    •   Eurographics DL Home
    • Graphics Dissertation Online
    • 2021

    Machine Learning For Plausible Gesture Generation From Speech For Virtual Humans

    Date
    2021-08-03
    Author
    Ferstl, Ylva
    Abstract
    The growing use of virtual humans in applications such as games, human-computer interfaces, and virtual reality demands appealing and engaging characters that can be created at minimal cost and time. Nonverbal behavior is an integral part of human communication and is important for believable embodied virtual agents. Co-speech gesture is a key aspect of nonverbal communication, and virtual agents are more engaging when they exhibit gesture behavior. Hand-animation of gesture is costly and does not scale to applications where agents may produce new utterances after deployment. Automated gesture generation is therefore attractive, as it enables any new utterance to be animated on the fly. A large body of research has been dedicated to automatic gesture generation, but generating expressive and well-defined gesture motion has commonly relied on explicitly formulated if-then rules or probabilistic modelling of annotated features. Machine learning approaches, which can work on unlabelled data, are catching up; however, they often still produce averaged motion that fails to capture the speech-gesture relationship adequately. The results from machine-learned models point to the high complexity of the speech-to-motion learning task. In this work, we explore a number of machine learning methods for improving the speech-to-motion learning outcome, including transfer learning from speech and motion models, adversarial training, and modelling explicit expressive gesture parameters from speech. We develop a method for automatically segmenting individual gestures from a motion stream, enabling detailed analysis of the speech-gesture relationship. We present two large multimodal datasets of conversational speech and motion, designed specifically for this modelling problem. Finally, we present and evaluate a novel speech-to-gesture system that merges machine learning and database sampling.
    URI
    https://diglib.eg.org:443/handle/10.2312/2633145
    Collections
    • 2021

    Eurographics Association copyright © 2013 - 2023 
    Send Feedback | Contact - Imprint | Data Privacy Policy | Disable Google Analytics
    Theme by @mire NV
    System hosted at  Graz University of Technology.