AmberLJC/LLMSys-PaperList


Awesome LLM Systems Papers

A curated list of academic papers, articles, tutorials, slides, and projects related to Large Language Model (LLM) systems. Star this repository to keep abreast of the latest developments in this booming research field.

Table of Contents

LLM Systems

Training

Systems for Post-training / RLHF

Fault Tolerance / Straggler Mitigation

Serving

Compound AI Systems

Serving at the edge

System Efficiency Optimization - Model Co-design

Multi-Modal Systems

LLM for Systems

Industrial LLM Technical Report

LLM Frameworks

Training

  • DeepSpeed: a deep learning optimization library that makes distributed training and inference easy, efficient, and effective | Microsoft

  • Accelerate | Hugging Face

  • LLaVA

  • Megatron | Nvidia

  • NeMo | Nvidia

  • torchtitan | PyTorch

  • veScale | ByteDance

  • DeepSeek Open Infra

  • VeOmni: Scaling any Modality Model Training

  • Cornstarch: Distributed Multimodal Training Must Be Multimodality-Aware | UMich

  • Post-Training

    • TRL: Transformers Reinforcement Learning
    • OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray
    • VeRL: Volcano Engine Reinforcement Learning for LLMs
    • rLLM: Reinforcement Learning for Language Agents
    • SkyRL: A Modular Full-stack RL Library for LLMs
    • AReal: Distributed RL System for LLM Reasoning
    • ROLL: Reinforcement Learning Optimization for Large-Scale Learning
    • slime: an LLM post-training framework aimed at RL scaling
    • RAGEN: Training Agents by Reinforcing Reasoning

Serving

Survey Paper

LLM Benchmark / Leaderboard / Traces

Related ML Readings

MLSys Courses

Other Reading

About

Large Language Model (LLM) Systems Paper List


Contributors 9