Junhao Cheng1,2,
Yuying Ge1,✉,
Yixiao Ge1,
Jing Liao2,
Ying Shan1
1ARC Lab, Tencent PCG,
2City University of Hong Kong
Experience the endless adventure of infinite anime life with AnimeGamer! 🤩
You can step into the shoes of Sosuke from "Ponyo on the Cliff" and interact with a dynamic game world through open-ended language instructions. AnimeGamer generates consistent multi-turn game states, consisting of dynamic animation shots (i.e., videos) with contextual consistency (e.g., the purple car and the forest background), and updates to character states including stamina, social, and entertainment values.
With AnimeGamer, you can bring together beloved characters like Qiqi from "Qiqi's Delivery Service" and Pazu from "Castle in the Sky" to meet and interact in the anime world. Imagine Pazu mastering Qiqi's broom-flying skills, creating unique and magical experiences. AnimeGamer generalizes to interactions between characters from different anime films and to new character actions, opening up endless possibilities.
AnimeGamer is built upon Multimodal Large Language Models (MLLMs) to generate each game state, including dynamic animation shots that depict character movements and updates to character states. An overview of AnimeGamer follows; the training process consists of three phases (a conceptual sketch of the resulting inference loop is given after the list):
- (a) We model animation shots using action-aware multimodal representations through an encoder and train a diffusion-based decoder to reconstruct videos, with the additional input of motion scope that indicates action intensity.
- (b) We train an MLLM to predict the next game state representations by taking the history instructions and game state representations as input.
- (c) We further enhance the quality of decoded animation shots from the MLLM via an adaptation phase, where the decoder is fine-tuned by taking MLLM's predictions as input.
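To make the design concrete, below is a minimal sketch of the inference-time game loop that these phases produce. All class and method names (GameState, AnimeGamerMLLM, AnimationDecoder, play_turn) are hypothetical placeholders, not this repository's actual API; the sketch only illustrates how history instructions and game-state representations flow through the MLLM and the diffusion-based decoder.

```python
# Hypothetical sketch of the AnimeGamer game loop; all names are illustrative
# placeholders, not the actual API of this repository.
from dataclasses import dataclass


@dataclass
class GameState:
    shot_repr: list[float]   # action-aware multimodal representation of the animation shot
    stamina: int = 100       # character states updated every turn
    social: int = 0
    entertainment: int = 0


class AnimeGamerMLLM:
    """Predicts the next game state from history instructions and state representations (phase b)."""

    def predict_next_state(self, history: list[tuple[str, GameState]], instruction: str) -> GameState:
        raise NotImplementedError  # autoregressive prediction of representations and character states


class AnimationDecoder:
    """Diffusion-based decoder that renders representations into a video clip (phases a and c)."""

    def decode(self, shot_repr: list[float], motion_scope: float) -> bytes:
        raise NotImplementedError  # motion_scope indicates the action intensity of the decoded shot


def play_turn(mllm: AnimeGamerMLLM, decoder: AnimationDecoder,
              history: list[tuple[str, GameState]], instruction: str,
              motion_scope: float = 1.0) -> tuple[bytes, GameState]:
    """One interaction turn: predict the next game state, then decode its animation shot."""
    state = mllm.predict_next_state(history, instruction)  # next representations + state updates
    shot = decoder.decode(state.shot_repr, motion_scope)    # decode representation into a video clip
    history.append((instruction, state))                    # keep context for multi-turn consistency
    return shot, state
```

In the released code, these two stages correspond to inference_MLLM.py and inference_Decoder.py, as described in the inference instructions below.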
- [2025-04-02] Release weights of models separately trained on "Qiqi's Delivery Service" and "Ponyo on the Cliff" 🔥
- [2025-04-02] Release paper on arXiv 🔥🔥🔥
- [2025-04-01] Release inference code 🔥🔥🔥
- [2025-03-28] Create the repository 🔥🔥🔥
- Release data processing pipeline
- Release training code
- Release weights of models trained on a mixture of anime films (the same setting as in our paper)
To set up the environment for inference, run the following commands:

```bash
git clone https://github.com/TencentARC/AnimeGamer.git
cd AnimeGamer
conda create -n animegamer python==3.10 -y
conda activate animegamer
pip install -r requirements.txt
```
Then download the pre-trained checkpoints (AnimeGamer, Mistral-7B-Instruct-v0.1, and CogVideoX-2b) into the checkpoints directory:

```bash
cd checkpoints
git clone https://huggingface.co/TencentARC/AnimeGamer
git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
git clone https://huggingface.co/Gluttony10/CogVideoX-2b-sat
```
To generate action-aware multimodal representations and update character states, run:

```bash
python inference_MLLM.py
```
To decode the representations into animation shots, run:

```bash
python inference_Decoder.py
```
Change the instructions in `./game_demo` to customize your play.
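The layout and format of the files under `./game_demo` are defined by the release itself; as a small, purely hypothetical convenience, the snippet below simply previews whatever files are there so you can see which instructions to edit before re-running the inference scripts. It makes no assumptions about the actual file format.

```python
# Hypothetical helper: preview the contents of ./game_demo so you know which
# instruction files to edit; it does not assume any particular file format.
from pathlib import Path

for path in sorted(Path("./game_demo").rglob("*")):
    if path.is_file():
        print(f"== {path} ==")
        try:
            print(path.read_text(encoding="utf-8")[:300])  # preview the first 300 characters
        except UnicodeDecodeError:
            print("(binary file, skipped)")
```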
Our codebase is built upon CogVideoX and SEED-X. Thanks for their wonderful projects.
If you find this work helpful, please consider citing:
TODO