Context-based Style Transfer of Tokenized Gestures

The Eurographics Association and John Wiley & Sons Ltd.
Gestural animations in amusement and entertainment applications often require rich expressiveness, yet automatically synthesizing characteristic gestures remains challenging. Although style transfer based on a neural network model is a potential solution, existing methods mainly focus on cyclic motions such as gaits and require retraining whenever new motion styles are added. Moreover, their per-pose transformation cannot account for time-dependent features, so motion styles that differ in period and timing are difficult to transfer. This limitation is critical for gestural motions, which require complicated time alignment because of the variety of exaggerated or intentionally performed behaviors. This study introduces a context-based style transfer of gestural motions with neural networks that ensures stable conversion even for exaggerated, dynamically complicated gestures. We present a model based on a vision transformer that transfers the content and style features of gestures by segmenting them in time to compose tokens in a latent space. We extend this model to yield the probability of swapping gesture tokens for style transfer. A transformer model is well suited to semantically consistent matching among gesture tokens, owing to their correlation with spoken words. The compact architecture of our network requires only a small number of parameters and a low computational cost, which makes it suitable for real-time applications on an ordinary device. We introduce loss functions composed of the restoration error of identically and cyclically transferred gesture tokens and the similarity losses of content and style evaluated by splicing features inside the transformer. This loss design allows unsupervised and zero-shot learning, which makes the method scalable with respect to motion data.
We comparatively evaluated our style transfer method, focusing mainly on expressive gestures, using our own dataset captured for various scenarios and styles and introducing new error metrics tailored to gestures. Our experiments showed that our method is superior to existing methods in the numerical accuracy and stability of style transfer.
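
As an illustration of the token-based idea described in the abstract, the following is a minimal sketch, not the authors' model. It assumes that a motion is a list of per-frame pose vectors, that tokens are formed directly from raw poses rather than in a learned latent space, and that the swap probability is a softmax over scaled dot-product similarity. The function names `tokenize`, `swap_probabilities`, and `soft_swap` are hypothetical.

```python
import math


def tokenize(motion, seg_len):
    """Split a motion (list of per-frame pose vectors) into time segments.

    Each token concatenates seg_len consecutive poses. Trailing frames that
    do not fill a whole segment are dropped. (Assumption: the paper builds
    tokens in a latent space; raw poses are used here as a stand-in.)
    """
    n_tokens = len(motion) // seg_len
    tokens = []
    for i in range(n_tokens):
        segment = motion[i * seg_len:(i + 1) * seg_len]
        tokens.append([x for pose in segment for x in pose])
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def swap_probabilities(content_tokens, style_tokens):
    """For each content token, return a distribution over style tokens.

    Assumption: similarity is a scaled dot product, as in standard
    transformer attention; the paper's actual scoring may differ.
    """
    dim = len(content_tokens[0])
    probs = []
    for c in content_tokens:
        scores = [
            sum(a * b for a, b in zip(c, s)) / math.sqrt(dim)
            for s in style_tokens
        ]
        probs.append(softmax(scores))
    return probs


def soft_swap(content_tokens, style_tokens):
    """Replace each content token by a probability-weighted mix of style tokens."""
    probs = swap_probabilities(content_tokens, style_tokens)
    dim = len(style_tokens[0])
    return [
        [sum(p * s[d] for p, s in zip(row, style_tokens)) for d in range(dim)]
        for row in probs
    ]


# Toy data: 8 content frames and 12 style frames of 2-D "poses".
content = [[0.1 * t, 0.0] for t in range(8)]
style = [[0.0, 0.2 * t] for t in range(12)]

content_tokens = tokenize(content, seg_len=4)  # 2 tokens of dimension 8
style_tokens = tokenize(style, seg_len=4)      # 3 tokens of dimension 8
stylized = soft_swap(content_tokens, style_tokens)
```

Because the content and style clips are tokenized independently, they may differ in length; the output keeps one token per content segment, which is the property that lets timing differences be handled at the token level rather than per pose.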

CCS Concepts: Computing methodologies --> Motion processing

, journal = {Computer Graphics Forum},
title = {{Context-based Style Transfer of Tokenized Gestures}},
author = {Kuriyama, Shigeru and Mukai, Tomohiko and Taketomi, Takafumi and Mukasa, Tomoyuki},
year = {},
publisher = {The Eurographics Association and John Wiley & Sons Ltd.},
ISSN = {},
DOI = {}
}