Show simple item record

dc.contributor.author    Kuriyama, Shigeru    en_US
dc.contributor.author    Mukai, Tomohiko    en_US
dc.contributor.author    Taketomi, Takafumi    en_US
dc.contributor.author    Mukasa, Tomoyuki    en_US
dc.contributor.editor    Dominik L. Michels    en_US
dc.contributor.editor    Soeren Pirk    en_US
dc.date.accessioned    2022-08-10T15:20:04Z
dc.date.available    2022-08-10T15:20:04Z
dc.date.issued    2022
dc.identifier.issn    1467-8659
dc.identifier.uri    https://doi.org/10.1111/cgf.14645
dc.identifier.uri    https://diglib.eg.org:443/handle/10.1111/cgf14645
dc.description.abstract    Gestural animations in the amusement or entertainment field often require rich expressions; however, it is still challenging to synthesize characteristic gestures automatically. Although style transfer based on a neural network model is a potential solution, existing methods mainly focus on cyclic motions such as gaits and require re-training when new motion styles are added. Moreover, their per-pose transformation cannot capture time-dependent features, so motion styles with different periods and timings are difficult to transfer. This limitation is critical for gestural motions, which require complicated time alignment due to the variety of exaggerated or intentionally performed behaviors. This study introduces a context-based style transfer of gestural motions with neural networks to ensure stable conversion even for exaggerated, dynamically complicated gestures. We present a model based on a vision transformer that transfers the content and style features of gestures by time-segmenting them into tokens in a latent space. We extend this model to yield the probability of swapping gesture tokens for style transfer. A transformer model is well suited to semantically consistent matching among gesture tokens, owing to their correlation with spoken words. The compact architecture of our network model requires only a small number of parameters and little computation, making it suitable for real-time applications on ordinary devices. We introduce loss functions based on the restoration error of identically and cyclically transferred gesture tokens, together with content and style similarity losses evaluated by splicing features inside the transformer. This loss design enables unsupervised and zero-shot learning, which provides scalability with respect to motion data. We comparatively evaluated our style transfer method, focusing mainly on expressive gestures, using our dataset captured for various scenarios and styles, and introduced new error metrics tailored for gestures. Our experiments showed that our method surpasses existing methods in the numerical accuracy and stability of style transfer.    en_US
dc.publisher    The Eurographics Association and John Wiley & Sons Ltd.    en_US
dc.subject    CCS Concepts: Computing methodologies --> Motion processing
dc.subject    Computing methodologies
dc.subject    Motion processing
dc.title    Context-based Style Transfer of Tokenized Gestures    en_US
dc.description.seriesinformation    Computer Graphics Forum
dc.description.sectionheaders    Learning
dc.description.volume    41
dc.description.number    8
dc.identifier.doi    10.1111/cgf.14645
dc.identifier.pages    305-315
dc.identifier.pages    11 pages
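
The abstract above describes time-segmenting gesture sequences into tokens and letting a transformer estimate the probability of swapping content tokens with style tokens. Below is a minimal, hypothetical PyTorch sketch of that token-mixing idea, not the authors' implementation; the dimensions, class and parameter names, and the soft swap-by-attention step are illustrative assumptions.

```
# Minimal sketch (assumed shapes/names): split a gesture clip into fixed-length
# time segments (tokens), encode them with a transformer, and blend style tokens
# into the content sequence according to a learned swap probability.
import torch
import torch.nn as nn

class GestureTokenMixer(nn.Module):
    def __init__(self, pose_dim=63, token_len=16, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.token_len = token_len
        # Embed each time segment (token) of poses into a latent vector.
        self.embed = nn.Linear(pose_dim * token_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Scores how well each content token matches each style token.
        self.match = nn.Linear(d_model, d_model)
        self.decode = nn.Linear(d_model, pose_dim * token_len)

    def tokenize(self, motion):
        # motion: (batch, frames, pose_dim) -> (batch, n_tokens, token_len * pose_dim)
        b, t, d = motion.shape
        n = t // self.token_len
        return motion[:, : n * self.token_len].reshape(b, n, self.token_len * d)

    def forward(self, content_motion, style_motion):
        c = self.encoder(self.embed(self.tokenize(content_motion)))
        s = self.encoder(self.embed(self.tokenize(style_motion)))
        # Soft "swap probability" of replacing each content token with each
        # style token, realized here as attention over style tokens.
        logits = torch.einsum('bnd,bmd->bnm', self.match(c), s)
        swap_prob = torch.softmax(logits, dim=-1)
        mixed = torch.einsum('bnm,bmd->bnd', swap_prob, s)
        return self.decode(mixed)  # stylized tokens, (batch, n_tokens, token_len * pose_dim)

# Usage with random stand-in data (hypothetical dimensions):
model = GestureTokenMixer()
content = torch.randn(2, 128, 63)  # neutral gesture clip
style = torch.randn(2, 160, 63)    # exaggerated gesture clip
stylized = model(content, style)
```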


This item appears in the following Collection(s)

  • 41-Issue 8
    ACM SIGGRAPH / Eurographics Symposium on Computer Animation 2022
