MultiCOIN: Multi-Modal COntrollable INbetweening

Date: 2026
Journal: Computer Graphics Forum
Publisher: The Eurographics Association and John Wiley & Sons Ltd.
Abstract
Video inbetweening creates smooth transitions between two frames, making it an indispensable tool for video editing and long-form video synthesis. Existing methods struggle with large or complex motion and offer limited control over intermediate frames, often misaligning with user intent. We introduce MultiCOIN, a video inbetweening framework supporting multi-modal controls, including depth transitions and layering, motion trajectories, text prompts, and target regions for movement localization. It balances flexibility, usability, and fine-grained precision. Built on a Diffusion Transformer (DiT), chosen for its proven capability to generate high-quality long videos, our model maps all motion controls into a unified sparse point-based representation compatible with the denoising process. Further, to respect the variety of controls, which operate at varying levels of granularity and influence, we separate content and motion into two branches, enabling dedicated generators for each. A stage-wise training strategy ensures stable learning of multi-modal controls. Extensive experiments show improved motion complexity, controllability, and narrative consistency. Project Page: MultiCOIN.
@article{10.1111:cgf.70362,
  journal   = {Computer Graphics Forum},
  title     = {{MultiCOIN: Multi-Modal COntrollable INbetweening}},
  author    = {Tanveer, Maham and Zhou, Yang and Niklaus, Simon and Mahdavi Amiri, Ali and Zhang, Hao (Richard) and Singh, Krishna Kumar and Zhao, Nanxuan},
  year      = {2026},
  publisher = {The Eurographics Association and John Wiley & Sons Ltd.},
  ISSN      = {1467-8659},
  DOI       = {10.1111/cgf.70362}
}