Official implementation of "Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation" (ICCV 2025)
Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation
Yingjie Chen,
Yifang Men,
Yuan Yao,
Miaomiao Cui,
Liefeng Bo
Motion-controllable image animation is a fundamental task with a wide range of potential applications. Recent works have made progress in controlling camera or object motion via the same 2D motion representations or different control signals, but they still struggle to support collaborative camera and object motion control with adaptive control granularity. To this end, we introduce a 3D-aware motion representation and propose an image animation framework, called Perception-as-Control, to achieve fine-grained collaborative motion control. Specifically, we construct the 3D-aware motion representation from a reference image, manipulate it based on interpreted user intentions, and perceive it from different viewpoints. In this way, camera and object motions are transformed into intuitive, consistent visual changes. The proposed framework then leverages the perception results as motion control signals, enabling it to support various motion-related video synthesis tasks in a unified and flexible way. Experiments demonstrate the superiority of the proposed method.
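To make the idea concrete, here is a toy sketch (illustrative only, not this repo's code) of how camera and object motion can both be expressed as 3D point motion and then "perceived" as consistent 2D visual changes through a moving camera; all values and names below are assumptions:

```python
# Toy sketch of the core idea: represent the scene as 3D points, move an
# object and the camera independently, and "perceive" the result by
# projecting into each camera view. All numbers here are illustrative.
import numpy as np

points = np.random.rand(100, 3) + np.array([0.0, 0.0, 3.0])  # toy scene points
object_shift = np.array([0.02, 0.0, 0.0])                    # per-frame object motion
cam_shift = np.array([0.0, 0.01, 0.0])                       # per-frame camera motion
K = np.array([[500.0, 0, 128], [0, 500.0, 128], [0, 0, 1]])  # toy intrinsics

frames = []
for t in range(16):
    moved = points + t * object_shift   # manipulate the 3D representation
    cam_pos = t * cam_shift             # move the camera along its trajectory
    rel = moved - cam_pos               # points in the camera frame (identity rotation)
    uv = (K @ rel.T).T                  # perspective projection
    uv = uv[:, :2] / uv[:, 2:3]         # normalize; these 2D tracks play the role
    frames.append(uv)                   # of the "perception" control signals
```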
- (2025-08-04) A gradio demo is released.
- (2025-06-27) Our work has been accepted by ICCV 2025!
- (2025-03-31) We release the inference code and model weights of Perception-as-Control.
- (2025-03-10) We release an updated version of the paper with more details.
- (2025-01-09) The project page, demo video, and technical report are released. The full version of the paper with more details is in progress.
- Release inference code and model weights
- Provide a Gradio demo
- Release a DiT version
- Release training code
$ pip install -r requirements.txt
- Download pretrained weights and put them in $INSTALL_DIR/pretrained_weights.
- Download the pretrained weights of the base models and put them in $INSTALL_DIR/pretrained_weights:
The pretrained weights are organized as follows:
./pretrained_weights/
|-- denoising_unet.pth
|-- reference_unet.pth
|-- cam_encoder.pth
|-- obj_encoder.pth
|-- sd-vae-ft-mse
| |-- ...
|-- sd-image-variations-diffusers
| |-- ...
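If the two base models above correspond to the public Hugging Face repositories stabilityai/sd-vae-ft-mse and lambdalabs/sd-image-variations-diffusers (an assumption; verify against this repo's instructions), they can be fetched with huggingface_hub, for example:

```python
# Sketch: download the base models into ./pretrained_weights/.
# The repo ids below are assumptions; verify them before use.
from huggingface_hub import snapshot_download

snapshot_download("stabilityai/sd-vae-ft-mse",
                  local_dir="./pretrained_weights/sd-vae-ft-mse")
snapshot_download("lambdalabs/sd-image-variations-diffusers",
                  local_dir="./pretrained_weights/sd-image-variations-diffusers")
```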
$ python inference.py
The results will be saved in $INSTALL_DIR/outputs.
$ python run_gradio.py
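For orientation, here is a minimal sketch of what a Gradio wrapper around the inference entry point could look like; the function, inputs, and outputs below are hypothetical placeholders, not the actual contents of run_gradio.py:

```python
import gradio as gr

def run_inference(reference_image, camera_traj, object_traj):
    """Hypothetical placeholder: load the reference image and trajectories,
    run the Perception-as-Control pipeline, and return a video path."""
    raise NotImplementedError  # replace with the repo's pipeline call

demo = gr.Interface(
    fn=run_inference,
    inputs=[
        gr.Image(type="filepath", label="Reference image"),
        gr.File(label="Camera trajectory"),
        gr.File(label="Object trajectory"),
    ],
    outputs=gr.Video(label="Animated result"),
)
demo.launch()
```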
| Camera Motion Control | Object Motion Control | Collaborative Motion Control |
| --- | --- | --- |
| basic_cam.mp4 | multi-instance_obj.mp4 | large_collab.mp4 |
| abitrary_cam.mp4 | fine-grained_obj.mp4 | subtle_collab.mp4 |
| Motion Generation | Motion Clone | Motion Transfer | Motion Editing |
| --- | --- | --- | --- |
| motion_generation.mp4 | motion_clone.mp4 | motion_transfer.mp4 | motion_editing.mp4 |
For more details, please refer to our project page.
If you find this code useful for your research, please use the following BibTeX entry.
@inproceedings{chen2025perception,
title={Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation},
author={Chen, Yingjie and Men, Yifang and Yao, Yuan and Cui, Miaomiao and Bo, Liefeng},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
note={arXiv preprint arXiv:2501.05020},
url={https://chen-yingjie.github.io/projects/Perception-as-Control/index.html},
year={2025}}
We would like to thank the contributors to Moore-AnimateAnyone, Depth-Anything-V2, SpaTracker, TartanVO, and diffusers for their open research and exploration.