[!New] Our paper, Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing, has been accepted to ICML 2025 (arXiv link).
Our code is built upon Open-Sora (https://github.com/hpcaitech/Open-Sora), with the following features:
- autoregressive video generation, i.e., generating each subsequent clip conditioned on the last frames of the previous clip
- causal generation (via causal temporal attention)
- cache sharing: the kv-cache is shared across all denoising steps. This differs from the kv-cache implementation in live2diff
- kv-cache queue, i.e., autoregressive generation without redundant computation of the overlapping conditional frames; the oldest kv-cache entries are dequeued
- cyclic temporal positional embeddings (TPEs), i.e., we use a cyclic shift to support the kv-cache queue
- the key differences between our implementation and live2diff:
  - our kv-cache is shared across all denoising steps, whereas they store a kv-cache for every denoising step
  - we use a cache queue structure to support autoregressive generation, facilitated by the cyclic TPEs
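
The features above can be illustrated with a small, dependency-free sketch. It is not the code in this repository: `causal_mask`, `KVCacheQueue`, and all of their parameters are hypothetical names chosen for illustration, and the "kv" entries are plain strings standing in for per-frame key/value tensors. It assumes that the queue has a fixed capacity, that the oldest frames are dropped first, and that positional indices wrap around modulo the size of the positional-embedding table.

```python
from collections import deque


def causal_mask(num_frames):
    """mask[i][j] is True when frame i may attend to frame j (only j <= i)."""
    return [[j <= i for j in range(num_frames)] for i in range(num_frames)]


class KVCacheQueue:
    """Fixed-capacity FIFO of per-frame kv entries (hypothetical illustration).

    A single queue is meant to be reused at every denoising step,
    instead of keeping a separate cache per step.
    """

    def __init__(self, max_frames, num_positions):
        # deque(maxlen=...) discards the oldest entry once capacity is reached
        self.entries = deque(maxlen=max_frames)
        self.num_positions = num_positions  # size of the positional table
        self.next_pos = 0                   # position index of the next new frame

    def append_clip(self, frame_kvs):
        """Enqueue the kv of newly generated frames with cyclic position indices."""
        for kv in frame_kvs:
            self.entries.append((self.next_pos, kv))
            self.next_pos = (self.next_pos + 1) % self.num_positions

    def cached_positions(self):
        """Position indices of the frames currently held in the queue."""
        return [pos for pos, _ in self.entries]

    def positions_for_new_frames(self, n):
        """Position indices the next n frames would receive (wrapping around)."""
        return [(self.next_pos + i) % self.num_positions for i in range(n)]


cache = KVCacheQueue(max_frames=4, num_positions=6)
cache.append_clip(["kv0", "kv1", "kv2"])
cache.append_clip(["kv3", "kv4", "kv5"])      # kv0 and kv1 are dequeued
print(cache.cached_positions())               # [2, 3, 4, 5]
print(cache.positions_for_new_frames(2))      # [0, 1], indices wrap around
```

Because the position indices wrap around rather than grow without bound, the cached entries keep the indices they were stored with while new frames reuse the slots freed by dequeued ones, which is the role the cyclic shift plays in this sketch.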
An overfitting demo
bash scripts/train.sh \
configs/causal_stdit/train_overfit_beach_demo.py \
overfit_demo \
9686 0
SkyTimelapse demo
bash scripts/train.sh \
configs/causal_stdit/train_SkyTimelapse_demo.py \
skytimelapse_demo \
9686 0
Refer to scripts/train.sh to configure ROOT_DATA_DIR.
The code is still being prepared.
If you find this code helpful, please cite our paper:
@inproceedings{gao2025ca2,
title={Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing},
author={Gao, Kaifeng and Shi, Jiaxin and Zhang, Hanwang and Wang, Chunping and Xiao, Jun and Chen, Long},
booktitle={ICML},
year={2025},
organization={PMLR}
}