Implement LRU eviction policy for LoRA adapters #11041
Conversation
Summary of Changes

Hello @ConnorLi96, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances memory management for LoRA adapters within SGLang by introducing a configurable Least Recently Used (LRU) eviction policy. The new policy aims to optimize cache efficiency by prioritizing frequently accessed adapters, keeping them in memory longer than less used ones. The changes involve a new modular framework for eviction policies, integration into the LoRA memory pool, and a command-line option for users to select their preferred policy, all while ensuring backward compatibility with the existing FIFO behavior.
Code Review

This pull request introduces a configurable LRU eviction policy for LoRA adapters, which is a great enhancement for managing memory more intelligently. The implementation is well structured, introducing a new eviction-policy framework and integrating it cleanly into the existing LoRAMemoryPool and LoRAManager. The changes maintain backward compatibility by defaulting to the existing FIFO policy. My review includes a minor suggestion to improve code conciseness in the eviction logic.
python/sglang/srt/lora/mem_pool.py
```python
candidates = set()
pinned_uids = set()

for buffer_id in range(self.max_loras_per_batch):
    uid = self.buffer_id_to_uid[buffer_id]
    if uid not in cur_uids and uid is not None:
        candidates.add(uid)
        lora_ref = lora_refs.get(uid)
        if lora_ref is not None and lora_ref.pinned:
            pinned_uids.add(uid)
```
The logic for collecting eviction candidates can be made more concise and readable. Using a comprehension to build a list of candidate info first, then creating the candidates and pinned_uids sets from it, can make the code more declarative and easier to follow.
```python
all_candidates = [
    (uid, lora_refs.get(uid))
    for uid in self.buffer_id_to_uid
    if uid not in cur_uids and uid is not None
]
candidates = {uid for uid, _ in all_candidates}
pinned_uids = {uid for uid, ref in all_candidates if ref and ref.pinned}
```
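The suggested refactor can be exercised standalone. The stand-in data below is illustrative only: adapter uids are invented, and a SimpleNamespace approximates the pool's LoRARef objects (only the pinned attribute from the diff is modeled).

```python
from types import SimpleNamespace

# Stand-in pool state mimicking the diff above; a None slot represents
# an empty buffer, and "a" is in the current batch so it is skipped.
buffer_id_to_uid = ["a", "b", None, "c"]
cur_uids = {"a"}
lora_refs = {
    "b": SimpleNamespace(pinned=True),
    "c": SimpleNamespace(pinned=False),
}

all_candidates = [
    (uid, lora_refs.get(uid))
    for uid in buffer_id_to_uid
    if uid not in cur_uids and uid is not None
]
candidates = {uid for uid, _ in all_candidates}
pinned_uids = {uid for uid, ref in all_candidates if ref and ref.pinned}

print(candidates)   # {"b", "c"}
print(pinned_uids)  # {"b"}
```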
LGTM. Great work!
Nice, thank you so much for the guidance all the way! Can we add the run-ci label for this PR? Or we can just merge it directly.
Motivation
This PR addresses the feature request [Feature] (2/2) Support LRU cache for LoRA eviction.
This PR implements a configurable LRU (Least Recently Used) eviction policy for LoRA adapters to provide more intelligent memory management. Currently, SGLang only supports FIFO eviction, which may not be optimal for workloads where certain LoRA adapters are accessed more frequently than others. The LRU policy ensures that frequently used adapters remain in memory while less recently used ones are evicted first, potentially improving cache hit rates and overall performance.
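The difference between the two policies can be seen on a short access trace. The sketch below is not SGLang code; the tracker class and adapter names are invented purely to illustrate how recency of use, rather than load order, drives LRU victim selection.

```python
from collections import OrderedDict

class LRUTracker:
    """Toy recency tracker: OrderedDict keeps uids from oldest to newest use."""

    def __init__(self):
        self._order = OrderedDict()

    def mark_used(self, uid):
        # Insert (or re-insert) and move to the end: most recently used.
        self._order[uid] = None
        self._order.move_to_end(uid)

    def select_victim(self, candidates):
        # Walk from least to most recently used; evict the first candidate.
        for uid in self._order:
            if uid in candidates:
                return uid
        return None

tracker = LRUTracker()
for uid in ["A", "B", "C", "A"]:   # "A" is reused, so "B" is now least recent
    tracker.mark_used(uid)

print(tracker.select_victim({"A", "B", "C"}))  # -> B
# A plain FIFO policy would instead evict "A", the first adapter loaded,
# even though it was just used.
```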
Modifications
- Added an eviction_policy.py module with an abstract EvictionPolicy class
- Implemented LRUEvictionPolicy using OrderedDict for O(1) access tracking
- Implemented FIFOEvictionPolicy for backward compatibility
- Added a --lora-eviction-policy argument to ServerArgs with choices ["fifo", "lru"]
- Updated LoRAMemoryPool to use configurable eviction policies
- Updated LoRAManager to pass the eviction policy to the memory pool
- Updated SRTRunner to accept an eviction policy parameter

All changes maintain full backward compatibility with default FIFO behavior.
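The framework described above might be sketched roughly as follows. The class names follow the PR description, but the method signatures and victim-selection interface are assumptions for illustration, not SGLang's actual API.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict

class EvictionPolicy(ABC):
    """Illustrative base class: track adapter use, pick an eviction victim."""

    @abstractmethod
    def mark_used(self, uid): ...

    @abstractmethod
    def select_victim(self, candidates, pinned): ...

class FIFOEvictionPolicy(EvictionPolicy):
    def __init__(self):
        self._arrival = []              # load order only; reuse is ignored

    def mark_used(self, uid):
        if uid not in self._arrival:
            self._arrival.append(uid)

    def select_victim(self, candidates, pinned):
        for uid in self._arrival:       # oldest load first
            if uid in candidates and uid not in pinned:
                return uid
        raise RuntimeError("no evictable adapter")

class LRUEvictionPolicy(EvictionPolicy):
    def __init__(self):
        self._recency = OrderedDict()   # oldest -> newest access

    def mark_used(self, uid):
        self._recency[uid] = None
        self._recency.move_to_end(uid)  # O(1) recency update

    def select_victim(self, candidates, pinned):
        for uid in self._recency:       # least recently used first
            if uid in candidates and uid not in pinned:
                return uid
        raise RuntimeError("no evictable adapter")

fifo, lru = FIFOEvictionPolicy(), LRUEvictionPolicy()
for uid in ["A", "B", "A"]:             # "A" is reused after "B" arrives
    fifo.mark_used(uid)
    lru.mark_used(uid)

print(fifo.select_victim({"A", "B"}, set()))  # -> A (first loaded)
print(lru.select_victim({"A", "B"}, set()))   # -> B (least recently used)
```

Keeping both policies behind one abstract interface is what lets a flag like --lora-eviction-policy swap them without touching the memory pool's eviction call sites.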
Accuracy Tests
This PR does not affect model outputs or inference accuracy.
Benchmarking and Profiling
The LRU eviction policy is designed to improve cache efficiency for workloads with non-uniform adapter access patterns, and its performance impact is expected to be minimal.
Detailed benchmarking will be conducted with realistic workloads in future testing.