slime

Chinese version

slime is an LLM post-training framework for RL scaling, providing two core capabilities:

  1. High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
  2. Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.

slime is the RL framework behind GLM-4.5 and GLM-4.6. Apart from models from Z.ai, it also supports the following models:

  • Qwen3 series (Qwen3Next, Qwen3MoE, Qwen3), Qwen2.5 series;
  • DeepSeek V3 series (DeepSeek V3, V3.1, DeepSeek R1);
  • Llama 3.

Architecture Overview

[Figure: slime architecture diagram]

Module Descriptions:

  • training (Megatron): Responsible for the main training process, reads data from the Data Buffer, and synchronizes parameters to the rollout module after training.
  • rollout (SGLang + router): Generates new data (including rewards/verifier outputs) and stores it in the Data Buffer.
  • data buffer: A bridge module that manages prompt initialization, custom data, and rollout generation methods.
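The data flow between the three modules can be sketched as a toy loop. This is a minimal illustration only, not slime's implementation: the class names `DataBuffer`, `Rollout`, and `Trainer`, their methods, and the fake reward are all hypothetical stand-ins chosen to show the order of operations (generate, store, train, sync weights).

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataBuffer:
    """Hypothetical stand-in for the data buffer: holds prompts and rollout samples."""
    prompts: list
    samples: deque = field(default_factory=deque)

    def next_prompts(self, n):
        return self.prompts[:n]

    def put(self, batch):
        self.samples.extend(batch)

    def get(self, n):
        return [self.samples.popleft() for _ in range(min(n, len(self.samples)))]


class Rollout:
    """Hypothetical stand-in for the rollout module (SGLang + router in slime)."""

    def __init__(self):
        self.weights_version = 0

    def generate(self, prompts):
        # A real engine would sample from the model and score each response;
        # here the response and reward are placeholders.
        return [
            {
                "prompt": p,
                "response": f"echo:{p}",
                "reward": float(len(p)),
                "weights_version": self.weights_version,
            }
            for p in prompts
        ]

    def update_weights(self, version):
        self.weights_version = version


class Trainer:
    """Hypothetical stand-in for the training module (Megatron in slime)."""

    def __init__(self):
        self.step = 0

    def train(self, batch):
        # A real trainer would compute a loss and update parameters.
        self.step += 1
        return self.step


def run(num_iterations, batch_size=2):
    buffer = DataBuffer(prompts=["a", "bb", "ccc"])
    rollout, trainer = Rollout(), Trainer()
    history = []
    for _ in range(num_iterations):
        # rollout -> data buffer
        buffer.put(rollout.generate(buffer.next_prompts(batch_size)))
        # data buffer -> training
        batch = buffer.get(batch_size)
        version = trainer.train(batch)
        # training -> rollout (parameter synchronization)
        rollout.update_weights(version)
        history.append((version, [s["weights_version"] for s in batch]))
    return history
```

In this sketch each batch is generated with the weights produced by the previous training step, which is the synchronization pattern the module descriptions imply.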

Quick Start

A comprehensive quick start guide covering environment setup, data preparation, training startup, and key code analysis is available in the documentation.

We also provide examples for some use cases not covered in the quick start guide; please check examples.

Projects Built upon slime

slime has powered several novel research projects and production systems. Here are some notable examples:

⚡ TritonForge: Agentic RL Training Framework for Kernel Generation

TritonForge leverages slime's SFT & RL capabilities to train LLMs that automatically generate optimized GPU kernels. By using a two-stage training approach—supervised fine-tuning followed by reinforcement learning with multi-turn compilation feedback—TritonForge achieves remarkable results in converting PyTorch operations into high-performance Triton kernels.

🚀 APRIL: Accelerating RL Training with Active Partial Rollouts

APRIL introduces a system-level optimization that seamlessly integrates with slime to accelerate the rollout generation phase in RL training. By intelligently over-provisioning requests and actively managing partial completions, APRIL addresses the long-tail generation bottleneck that typically consumes over 90% of RL training time.

These projects showcase slime's versatility—from training code-generation models to optimizing RL training systems—making it a powerful foundation for both research and production deployments.

Arguments Walkthrough

Arguments in slime are divided into three categories:

  1. Megatron arguments: slime reads all arguments defined by the Megatron installation found on PYTHONPATH. You can configure Megatron by passing arguments such as --tensor-model-parallel-size 2.
  2. SGLang arguments: All arguments for the installed SGLang are supported. These arguments must be prefixed with --sglang-. For example, --mem-fraction-static should be passed as --sglang-mem-fraction-static.
  3. slime-specific arguments: Please refer to: slime/utils/arguments.py

For complete usage instructions, please refer to the Usage Documentation.
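The --sglang- prefix convention can be illustrated with a small argparse sketch. This is an assumption-laden example, not slime's actual argument handling: `build_parser` and `split_sglang_args` are hypothetical helpers, and only the two flags named above are registered.

```python
import argparse

# argparse turns "--sglang-foo-bar" into the attribute "sglang_foo_bar".
SGLANG_PREFIX = "sglang_"


def build_parser():
    parser = argparse.ArgumentParser()
    # Megatron-style argument, passed without a prefix.
    parser.add_argument("--tensor-model-parallel-size", type=int, default=1)
    # SGLang argument, exposed with the --sglang- prefix.
    parser.add_argument("--sglang-mem-fraction-static", type=float, default=None)
    return parser


def split_sglang_args(namespace):
    """Separate prefixed SGLang options from the rest and strip the prefix."""
    sglang, other = {}, {}
    for key, value in vars(namespace).items():
        if key.startswith(SGLANG_PREFIX):
            sglang[key[len(SGLANG_PREFIX):]] = value
        else:
            other[key] = value
    return sglang, other


args = build_parser().parse_args(
    ["--tensor-model-parallel-size", "2", "--sglang-mem-fraction-static", "0.8"]
)
sglang_kwargs, other_kwargs = split_sglang_args(args)
```

Here `sglang_kwargs` holds the option under its original SGLang name (`mem_fraction_static`), while `other_kwargs` keeps the unprefixed arguments.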

Developer Guide

  • Contributions are welcome! If you have suggestions for new features, performance tuning, or feedback on user experience, feel free to submit an Issue or PR 😊

  • Use pre-commit to ensure code style consistency for your commits:

apt install pre-commit -y
pre-commit install

# run pre-commit to ensure code style consistency
pre-commit run --all-files --show-diff-on-failure --color=always

FAQ & Acknowledgements

  • For frequently asked questions, please see the Q&A
  • Special thanks to the following projects & communities: SGLang, Megatron‑LM, mbridge, OpenRLHF, veRL, Pai-Megatron-Patch and others.
  • To cite slime, please use:
@misc{slime_github,
  author       = {Zilin Zhu and Chengxing Xie and Xin Lv and slime Contributors},
  title        = {slime: An LLM post-training framework for RL Scaling},
  year         = {2025},
  howpublished = {\url{https://github.com/THUDM/slime}},
  note         = {GitHub repository. Corresponding author: Xin Lv},
  urldate      = {2025-06-19}
}