NVlabs/rcm

rCM: Score-Regularized Continuous-Time Consistency Model
🚀SOTA Diffusion Distillation & Few-Step Video Generation

   

Overview

rCM is the first work that:

  • Scales up continuous-time consistency distillation (e.g., sCM/MeanFlow) to 10B+ parameter video diffusion models.
  • Provides an open-source FlashAttention-2 Jacobian-vector product (JVP) kernel that supports parallelism schemes such as FSDP and CP.
  • Identifies the quality bottleneck of sCM and overcomes it via a forward–reverse divergence joint distillation framework.
  • Delivers models that generate videos with both high quality and strong diversity in only 2–4 steps.
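
The forward–reverse framing in the third bullet can be made concrete with a toy example. The sketch below assumes the two divergences behave like the two directions of KL divergence, which is a simplification and not the rCM training loss. The three-bin distributions and the names `teacher`, `covering`, and `seeking` are invented purely for illustration.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy "teacher" distribution: two modes separated by a low-density bin.
teacher = [0.499, 0.002, 0.499]

# Hypothetical students (illustrative only):
covering = [1 / 3, 1 / 3, 1 / 3]  # spreads mass everywhere, including the gap
seeking = [0.98, 0.01, 0.01]      # commits to a single mode

students = {"covering": covering, "seeking": seeking}

# Forward direction: expectation under the teacher, penalizes missed modes.
forward = {name: kl(teacher, q) for name, q in students.items()}
# Reverse direction: expectation under the student, penalizes mass in low-density regions.
reverse = {name: kl(q, teacher) for name, q in students.items()}

for name in students:
    print(f"{name:9s} forward KL = {forward[name]:.3f}  reverse KL = {reverse[name]:.3f}")
```

In this toy setting the forward direction prefers the covering student (diverse but blurry), while the reverse direction prefers the mode-seeking student (sharp but less diverse). Combining both terms is one intuitive way to read the goal of keeping quality and diversity together.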

Comparison with Other Diffusion Distillation Methods on Wan2.1 T2V 1.3B (4-step)

  • sCM: Wan1.3B-sCM-4step.mp4
  • DMD2: Wan1.3B-DMD2-4step.mp4
  • rCM (Ours): Wan1.3B-rCM-4step.mp4

rCM achieves both high quality and strong diversity.

Performance under Fewer (1–2) Steps

  • 1-step: 1step.mp4
  • 2-step: 2step.mp4
  • 4-step: 4step.mp4

5 Random Videos with Distilled Wan2.1 T2V 14B (4-step)

hotpot.mp4

Getting Started

This codebase is built on top of Cosmos-Predict2. Please follow its environment setup instructions.

Inference

Below is an example inference script for running rCM on text-to-video (T2V) generation:

# Basic usage:
#   PYTHONPATH=. python rcm/inference/wan2pt1_t2v_rcm_infer.py [arguments]

# Arguments:
# --model_size         Model size: "1.3B" or "14B" (default: 1.3B)
# --num_samples        Number of videos to generate (default: 1)
# --num_steps          Sampling steps, 1–4 (default: 4)
# --sigma_max          Initial sigma for rCM (default: 80); larger choices (e.g., 1600) reduce diversity but may enhance quality
# --dit_path           Path to the distilled DiT model checkpoint (REQUIRED for inference)
# --vae_path           Path to Wan2.1 VAE (default: checkpoints/Wan2.1_VAE.pth)
# --text_encoder_path  Path to umT5 text encoder (default: checkpoints/models_t5_umt5-xxl-enc-bf16.pth)
# --prompt             Text prompt for video generation (default: A stylish woman walks down a Tokyo street...)
# --resolution         Output resolution, e.g. "480p", "720p" (default: 480p)
# --aspect_ratio       Aspect ratio in W:H format (default: 16:9)
# --seed               Random seed for reproducibility (default: 0)
# --save_path          Output file path including extension (default: output/generated_video.mp4)


# Example
PYTHONPATH=. python rcm/inference/wan2pt1_t2v_rcm_infer.py \
    --dit_path checkpoints/rCM_Wan2.1_T2V_1.3B_480p.pt \
    --num_samples 5 \
    --prompt "A cinematic shot of a snowy mountain at sunrise"

See Wan examples for additional usage examples.
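
To compare settings such as `--sigma_max 80` against `--sigma_max 1600`, it can help to generate the command lines programmatically. The following is a minimal sketch, not part of the repository: `build_rcm_command` is a hypothetical helper, the output file names are made up, and only flags documented above are used. It prints the commands rather than running them.

```python
import shlex

SCRIPT = "rcm/inference/wan2pt1_t2v_rcm_infer.py"


def build_rcm_command(dit_path, prompt, num_steps=4, sigma_max=80, seed=0,
                      save_path="output/generated_video.mp4"):
    """Hypothetical helper: assemble the argv list for one inference run."""
    if not 1 <= num_steps <= 4:
        raise ValueError("num_steps must be between 1 and 4")
    return [
        "python", SCRIPT,
        "--dit_path", dit_path,
        "--prompt", prompt,
        "--num_steps", str(num_steps),
        "--sigma_max", str(sigma_max),
        "--seed", str(seed),
        "--save_path", save_path,
    ]


# One command per sigma_max value, each writing to its own (made-up) output file.
commands = [
    build_rcm_command(
        "checkpoints/rCM_Wan2.1_T2V_1.3B_480p.pt",
        "A cinematic shot of a snowy mountain at sunrise",
        sigma_max=sigma,
        save_path=f"output/sigma{sigma}.mp4",
    )
    for sigma in (80, 1600)
]

for cmd in commands:
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print("PYTHONPATH=. " + shlex.join(cmd))
```

To execute a command directly from Python instead of printing it, one option is `subprocess.run(cmd, env={**os.environ, "PYTHONPATH": "."}, check=True)` from the repository root.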

Training

The full distillation pipeline still requires refactoring. In the meantime, we provide reference code for the key components:

  • FlashAttention-2 JVP kernel: rcm/utils/flash_attention_jvp_triton.py
  • JVP-adapted Wan2.1 student network: rcm/networks/wan2pt1_jvp.py
  • Training: rcm/models/t2v_model_distill_rcm.py
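
For readers unfamiliar with what a JVP through attention computes, the following is a naive pure-Python reference for a single query. It is not the Triton kernel in this repository and makes no attempt at FlashAttention-style tiling; the function names and the small inputs are assumptions chosen for illustration. Given inputs `(q, K, V)` and tangents `(dq, dK, dV)`, it returns the attention output together with its directional derivative.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, K, V):
    """Single-query scaled dot-product attention."""
    scale = 1.0 / math.sqrt(len(q))
    p = _softmax([scale * _dot(q, k) for k in K])
    return [sum(pj * v[c] for pj, v in zip(p, V)) for c in range(len(V[0]))]


def attention_jvp(q, K, V, dq, dK, dV):
    """Return (out, d_out), where d_out is the derivative of the output
    along the tangent direction (dq, dK, dV)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * _dot(q, k) for k in K]
    # Product rule on each score q . k_j.
    dscores = [scale * (_dot(dq, k) + _dot(q, dk)) for k, dk in zip(K, dK)]
    p = _softmax(scores)
    # Softmax tangent: dp_j = p_j * (ds_j - sum_i p_i * ds_i).
    mean_ds = _dot(p, dscores)
    dp = [pj * (ds - mean_ds) for pj, ds in zip(p, dscores)]
    n_out = len(V[0])
    out = [sum(pj * v[c] for pj, v in zip(p, V)) for c in range(n_out)]
    # Product rule on sum_j p_j * v_j.
    d_out = [
        sum(dpj * v[c] + pj * dv[c] for dpj, pj, v, dv in zip(dp, p, V, dV))
        for c in range(n_out)
    ]
    return out, d_out


# Small made-up inputs and tangents.
q = [0.3, -0.2]
K = [[0.1, 0.4], [-0.5, 0.2], [0.7, -0.1]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
dq = [1.0, -1.0]
dK = [[0.2, 0.0], [0.0, -0.3], [0.1, 0.1]]
dV = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.5]]

out, d_out = attention_jvp(q, K, V, dq, dK, dV)
print("output :", out)
print("tangent:", d_out)
```

The output and its tangent are computed in a single pass from shared intermediates (the softmax weights), which is the property a fused kernel can exploit to avoid a separate backward pass.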

Future Directions

rCM opens up several promising directions. For example:

  • Few-step distilled models lag behind the teacher in aspects such as physical consistency; this can potentially be improved via reward-based post-training.
  • The forward–reverse divergence joint distillation framework of rCM could be extended to autoregressive video diffusion.

Acknowledgement

We thank the Cosmos-Predict2 project for providing the awesome open-source video diffusion training codebase.

Citation

@article{zheng2025rcm,
  title={Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency},
  author={Zheng, Kaiwen and Wang, Yuji and Ma, Qianli and Chen, Huayu and Zhang, Jintao and Balaji, Yogesh and Chen, Jianfei and Liu, Ming-Yu and Zhu, Jun and Zhang, Qinsheng},
  journal={arXiv preprint arXiv:2510.08431},
  year={2025}
}
