Computer Science > Computer Vision and Pattern Recognition

arXiv:2405.14864 (cs)

[Submitted on 23 May 2024]

Title: Video Diffusion Models are Training-free Motion Interpreter and Controller

Authors: Zeqi Xiao, Yifan Zhou, Shuai Yang, Xingang Pan

Abstract: Video generation primarily aims to model authentic and customized motion across frames, making understanding and controlling the motion a crucial topic. Most diffusion-based studies on video motion focus on motion customization with training-based paradigms, which, however, demand substantial training resources and necessitate retraining for diverse models. Crucially, these approaches do not explore how video diffusion models encode cross-frame motion information in their features, so their effectiveness lacks interpretability and transparency. To address this gap, this paper introduces a novel perspective to understand, localize, and manipulate motion-aware features in video diffusion models. Through analysis using Principal Component Analysis (PCA), our work reveals that robust motion-aware features already exist in video diffusion models. We present a new MOtion FeaTure (MOFT), obtained by eliminating content correlation information and filtering motion channels. MOFT provides a distinct set of benefits, including the ability to encode comprehensive motion information with clear interpretability, extraction without the need for training, and generalizability across diverse architectures. Leveraging MOFT, we propose a novel training-free video motion control framework. Our method demonstrates competitive performance in generating natural and faithful motion, providing architecture-agnostic insights and applicability in a variety of downstream tasks.
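The abstract describes MOFT extraction only at a high level (remove content correlation, then filter motion channels). The following is a hypothetical NumPy sketch of that two-step idea, not the authors' implementation. It assumes intermediate features of shape (F, C, H, W) (frames, channels, height, width), interprets "eliminating content correlation" as subtracting the per-location mean across frames, and substitutes a simple temporal-variance ranking for the PCA-guided channel selection the paper uses. The function name moft_sketch and the parameter k are illustrative assumptions.

```python
import numpy as np


def moft_sketch(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical MOFT-style extraction (illustrative, not the paper's code).

    features: array of shape (F, C, H, W), assumed to be intermediate
              video diffusion features for F frames.
    k:        number of channels to keep (assumed hyperparameter).
    """
    if features.ndim != 4:
        raise ValueError("expected features of shape (F, C, H, W)")
    # Step 1 (assumed form of content removal): subtract the per-location
    # mean over frames, so content shared by every frame cancels out.
    centered = features - features.mean(axis=0, keepdims=True)
    # Step 2 (stand-in for PCA-guided channel filtering): score each channel
    # by its variance across frames, averaged over spatial positions.
    scores = centered.var(axis=0).mean(axis=(1, 2))  # shape (C,)
    keep = np.argsort(scores)[::-1][:k]
    mask = np.zeros(features.shape[1], dtype=bool)
    mask[keep] = True
    # Zero out the non-selected channels, mimicking a binary channel mask.
    return centered * mask[None, :, None, None]


# Toy demo: static content in every channel, frame-dependent signal in channel 3.
rng = np.random.default_rng(0)
F, C, H, W = 8, 16, 4, 4
feats = np.broadcast_to(rng.normal(size=(1, C, H, W)), (F, C, H, W)).copy()
feats[:, 3] += np.arange(F, dtype=float)[:, None, None]
out = moft_sketch(feats, k=1)
```

In the toy demo, only channel 3 survives and its static content is removed, leaving the per-frame offset centered around zero. Ranking channels by temporal variance is just one simple way to pick "motion channels"; the paper's actual selection criterion should be taken from the full text.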

Comments:	Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2405.14864 [cs.CV]
	(or arXiv:2405.14864v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.14864

Submission history

From: Zeqi Xiao
[v1] Thu, 23 May 2024 17:59:40 UTC (10,647 KB)
