rCM: Score-Regularized Continuous-Time Consistency Model
🚀 SOTA Diffusion Distillation & Few-Step Video Generation
rCM is the first work that:
- Scales up continuous-time consistency distillation (e.g., sCM/MeanFlow) to 10B+ parameter video diffusion models.
- Provides an open-source FlashAttention-2 Jacobian-vector product (JVP) kernel with support for parallelisms such as FSDP/CP.
- Identifies the quality bottleneck of sCM and overcomes it via a forward–reverse divergence joint distillation framework (sketched below).
- Delivers models that generate videos with both high quality and strong diversity in only 2–4 steps.
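For orientation, the joint objective can be read schematically as a consistency-distillation term (forward divergence) regularized by a score-distillation term (reverse divergence). This is our shorthand, not the paper's exact formulation; see the paper for the precise losses and weighting:

```latex
% Schematic only; the exact divergences and weighting follow the paper.
\mathcal{L}_{\mathrm{rCM}}
  = \underbrace{\mathcal{L}_{\mathrm{sCM}}}_{\text{forward divergence (consistency)}}
  + \lambda \, \underbrace{\mathcal{L}_{\mathrm{score}}}_{\text{reverse divergence (score regularization)}}
```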
| sCM | DMD2 | rCM (Ours) |
|---|---|---|
| Wan1.3B-sCM-4step.mp4 | Wan1.3B-DMD2-4step.mp4 | Wan1.3B-rCM-4step.mp4 |
rCM achieves both high quality and strong diversity.
| 1-step | 2-step | 4-step |
|---|---|---|
| 1step.mp4 | 2step.mp4 | 4step.mp4 |
hotpot.mp4
This codebase is built on top of Cosmos-Predict2. Please follow its environment setup instructions.
Below is an example inference script for running rCM on T2V:
```bash
# Basic usage:
#   PYTHONPATH=. python rcm/inference/wan2pt1_t2v_rcm_infer.py [arguments]
#
# Arguments:
#   --model_size         Model size: "1.3B" or "14B" (default: 1.3B)
#   --num_samples        Number of videos to generate (default: 1)
#   --num_steps          Sampling steps, 1-4 (default: 4)
#   --sigma_max          Initial sigma for rCM (default: 80); larger values (e.g., 1600) reduce diversity but may enhance quality
#   --dit_path           Path to the distilled DiT checkpoint (REQUIRED for inference)
#   --vae_path           Path to the Wan2.1 VAE (default: checkpoints/Wan2.1_VAE.pth)
#   --text_encoder_path  Path to the umT5 text encoder (default: checkpoints/models_t5_umt5-xxl-enc-bf16.pth)
#   --prompt             Text prompt for video generation (default: A stylish woman walks down a Tokyo street...)
#   --resolution         Output resolution, e.g., "480p", "720p" (default: 480p)
#   --aspect_ratio       Aspect ratio in W:H format (default: 16:9)
#   --seed               Random seed for reproducibility (default: 0)
#   --save_path          Output file path including extension (default: output/generated_video.mp4)

# Example
PYTHONPATH=. python rcm/inference/wan2pt1_t2v_rcm_infer.py \
    --dit_path checkpoints/rCM_Wan2.1_T2V_1.3B_480p.pt \
    --num_samples 5 \
    --prompt "A cinematic shot of a snowy mountain at sunrise"
```

See Wan examples for additional usage examples.
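For intuition about what `--num_steps` and `--sigma_max` control, below is a minimal sketch of generic multistep consistency sampling (a common pattern for consistency models; this is our simplification, not the repo's exact sampler). Sampling starts from pure noise at `sigma_max`, maps to a clean estimate in one call, then alternates re-noising at decreasing sigmas with further denoising calls:

```python
import torch

def multistep_consistency_sample(f, shape, sigmas, device="cuda"):
    """Generic multistep consistency sampling (illustrative only).

    `f(x, sigma)` is a consistency model mapping a noisy sample at noise
    level `sigma` to a clean estimate; `sigmas` is a decreasing schedule
    whose first entry plays the role of --sigma_max.
    """
    x = sigmas[0] * torch.randn(shape, device=device)  # pure noise at sigma_max
    x0 = f(x, sigmas[0])                               # 1-step sample
    for sigma in sigmas[1:]:
        # Re-noise the clean estimate to a smaller noise level, then
        # denoise again; each extra step trades speed for refinement.
        x = x0 + sigma * torch.randn(shape, device=device)
        x0 = f(x, sigma)
    return x0
```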
The full distillation pipeline still requires refactoring. We provide essential reference code for the key components:
- FlashAttention-2 JVP kernel: `rcm/utils/flash_attention_jvp_triton.py` (see the JVP sketch after this list)
- JVP-adapted Wan2.1 student network: `rcm/networks/wan2pt1_jvp.py`
- Training: `rcm/models/t2v_model_distill_rcm.py`
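Continuous-time consistency distillation differentiates the student network along the diffusion trajectory, which requires Jacobian-vector products through every layer, including attention. As a hedged illustration of the quantity the Triton kernel fuses (its actual API may differ), forward-mode AD over a plain PyTorch attention yields the same JVP; `naive_attention` here is our own reference implementation, not a repo function:

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Plain scaled dot-product attention; traceable by forward-mode AD.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Primal inputs and tangent directions: (batch, heads, seq, head_dim).
q, k, v = (torch.randn(1, 4, 128, 64) for _ in range(3))
tq, tk, tv = (torch.randn(1, 4, 128, 64) for _ in range(3))

# torch.func.jvp returns the attention output together with its directional
# derivative along (tq, tk, tv). A fused kernel such as the one in
# rcm/utils/flash_attention_jvp_triton.py computes the same quantity without
# materializing the full attention matrix, which is what makes JVPs feasible
# at video-model sequence lengths.
out, out_tangent = torch.func.jvp(naive_attention, (q, k, v), (tq, tk, tv))
print(out.shape, out_tangent.shape)  # both torch.Size([1, 4, 128, 64])
```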
There are promising directions to explore building on rCM. For example:
- Few-step distilled models lag behind the teacher in aspects such as physical consistency; this can potentially be improved via reward-based post-training.
- The forward–reverse divergence joint distillation framework of rCM could be extended to autoregressive video diffusion.
We thank the Cosmos-Predict2 project for providing the awesome open-source video diffusion training codebase.
```bibtex
@article{zheng2025rcm,
  title={Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency},
  author={Zheng, Kaiwen and Wang, Yuji and Ma, Qianli and Chen, Huayu and Zhang, Jintao and Balaji, Yogesh and Chen, Jianfei and Liu, Ming-Yu and Zhu, Jun and Zhang, Qinsheng},
  journal={arXiv preprint arXiv:2510.08431},
  year={2025}
}
```