VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations

Zargarbashi, Fatemeh; Agrawal, Dhruv; Buhmann, Jakob; Guay, Martin; Coros, Stelian; Sumner, Robert W.

dc.contributor.author	Zargarbashi, Fatemeh
dc.contributor.author	Agrawal, Dhruv
dc.contributor.author	Buhmann, Jakob
dc.contributor.author	Guay, Martin
dc.contributor.author	Coros, Stelian
dc.contributor.author	Sumner, Robert W.
dc.contributor.editor	Masia, Belen
dc.contributor.editor	Thies, Justus
dc.date.accessioned	2026-04-17T12:41:46Z
dc.date.available	2026-04-17T12:41:46Z
dc.date.issued	2026
dc.description.abstract	Human motion data is inherently rich and complex, containing both semantic content and subtle stylistic features that are challenging to model. We propose a novel method for effective disentanglement of the style and content in human motion data to facilitate style transfer. Our approach is guided by the insight that content corresponds to coarse motion attributes while style captures the finer, expressive details. To model this hierarchy, we employ Residual Vector Quantized Variational Autoencoders (RVQ-VAEs) to learn a coarse-to-fine representation of motion. We further enhance the disentanglement by integrating codebook learning with contrastive learning and a novel information leakage loss to organize the content and the style across different codebooks. We harness this disentangled representation using our simple and effective inference-time technique Quantized Code Swapping, which enables motion style transfer without requiring any fine-tuning for unseen styles. Our framework demonstrates strong versatility across multiple inference applications, including style transfer, style removal, and motion blending.
dc.description.number	2
dc.description.sectionheaders	Motion in the Wild: From Individuals to Crowds
dc.description.seriesinformation	Computer Graphics Forum
dc.description.volume	45
dc.identifier.doi	10.1111/cgf.70377
dc.identifier.issn	1467-8659
dc.identifier.pages	14 pages
dc.identifier.uri	https://diglib.eg.org/handle/10.1111/cgf70377
dc.identifier.uri	https://doi.org/10.1111/cgf.70377
dc.publisher	The Eurographics Association and John Wiley & Sons Ltd.
dc.rights	CC-BY-4.0
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Animation
dc.subject	Learning latent representations
dc.title	VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations
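
The abstract describes a coarse-to-fine residual quantized representation and an inference-time step called Quantized Code Swapping. The following is only a toy sketch of that general idea, not the authors' implementation: the codebooks, the 2-D vectors, the function names, and the choice that the first level alone holds content are all assumptions made for illustration. Each level quantizes the residual left by the previous one, and swapping keeps the coarse codes of one input while taking the finer codes from another.

```python
# Toy residual vector quantization (RVQ) with code swapping.
# Everything here is hypothetical: hand-written codebooks, 2-D vectors,
# and "level 0 = content, later levels = style" are illustrative assumptions.

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(codebooks, vec):
    """Quantize vec level by level; each level encodes the remaining residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

def rvq_decode(codebooks, codes):
    """Reconstruct by summing the selected entry from every level."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def swap_codes(content_codes, style_codes, n_content_levels=1):
    """Keep the coarse codes of one input and the finer codes of another."""
    return content_codes[:n_content_levels] + style_codes[n_content_levels:]

# Level 0 is coarse, level 1 is a finer correction (both made up).
CODEBOOKS = [
    [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]],
    [[0.5, 0.5], [-0.5, 0.5], [0.5, -0.5]],
]

content_codes = rvq_encode(CODEBOOKS, [4.4, 0.6])   # [1, 0]
style_codes = rvq_encode(CODEBOOKS, [-0.4, 4.5])    # [2, 1]
mixed = swap_codes(content_codes, style_codes)      # [1, 1]
print(mixed, rvq_decode(CODEBOOKS, mixed))          # [1, 1] [3.5, 0.5]
```

In this sketch the swapped result keeps the coarse position of the first input and the fine offset of the second. How many levels carry content, and how the codebooks are trained to separate the two, is specified in the paper rather than here.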

Files

Original bundle

Name: cgf70377.pdf
Size: 27.26 MB
Format: Adobe Portable Document Format

Name: paper1269_1_mm4.mp4
Size: 100.04 MB
Format: Video MP4

Collections

45-Issue 2
EG 2026 - Full Papers - CGF 45-Issue 2