
Conversation

@ConnorLi96 (Contributor) commented Sep 29, 2025

Motivation

This PR addresses the feature request: [Feature] (2/2) Support LRU cache for LoRA eviction.

This PR implements a configurable LRU (Least Recently Used) eviction policy for LoRA adapters to provide more intelligent memory management. Currently, SGLang only supports FIFO eviction, which may not be optimal for workloads where certain LoRA adapters are accessed more frequently than others. The LRU policy ensures that frequently used adapters remain in memory while less recently used ones are evicted first, potentially improving cache hit rates and overall performance.

Modifications

  • New eviction policy framework: Added eviction_policy.py module with abstract EvictionPolicy class
  • LRU implementation: Implemented LRUEvictionPolicy using OrderedDict for O(1) access tracking (see the sketch after this list)
  • FIFO compatibility: Maintained existing FIFOEvictionPolicy for backward compatibility
  • Server configuration: Added --lora-eviction-policy argument to ServerArgs with choices ["fifo", "lru"]
  • Memory pool integration: Updated LoRAMemoryPool to use configurable eviction policies
  • Manager coordination: Modified LoRAManager to pass eviction policy to memory pool
  • Test runner support: Extended SRTRunner to accept eviction policy parameter
  • Adapter pinning: Implemented mechanism to prevent specific adapters from being evicted

All changes maintain full backward compatibility with default FIFO behavior.
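The snippet below is a minimal sketch of the eviction-policy framework described above, not the exact SGLang implementation: the class names EvictionPolicy, FIFOEvictionPolicy, and LRUEvictionPolicy come from this PR, while the method names (mark_used, select_victim), their signatures, and the shared _OrderedPolicy helper are illustrative assumptions.

    from abc import ABC, abstractmethod
    from collections import OrderedDict
    from typing import Optional, Set


    class EvictionPolicy(ABC):
        """Decides which LoRA adapter to evict when the memory pool is full."""

        @abstractmethod
        def mark_used(self, uid: str) -> None:
            """Record that an adapter was used by the current batch."""

        @abstractmethod
        def select_victim(self, candidates: Set[str], pinned: Set[str]) -> Optional[str]:
            """Pick one evictable adapter from `candidates`, never a pinned one."""


    class _OrderedPolicy(EvictionPolicy):
        """Shared helper: keeps adapters in an OrderedDict and evicts from the front."""

        def __init__(self) -> None:
            self._order: "OrderedDict[str, None]" = OrderedDict()

        def select_victim(self, candidates: Set[str], pinned: Set[str]) -> Optional[str]:
            # Walk from the eviction end (front) and skip pinned or absent adapters.
            victim = next(
                (uid for uid in self._order if uid in candidates and uid not in pinned),
                None,
            )
            if victim is not None:
                del self._order[victim]
            return victim


    class FIFOEvictionPolicy(_OrderedPolicy):
        """Evicts in insertion order; later uses do not refresh an adapter's position."""

        def mark_used(self, uid: str) -> None:
            self._order.setdefault(uid, None)  # record the first use only


    class LRUEvictionPolicy(_OrderedPolicy):
        """Evicts the least recently used adapter; every use refreshes recency."""

        def mark_used(self, uid: str) -> None:
            self._order[uid] = None
            self._order.move_to_end(uid)  # O(1): now the most recently used

In this sketch the only behavioral difference between the two policies is mark_used: FIFO keeps insertion order, LRU refreshes the entry on every use, and both skip pinned adapters when choosing a victim.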

Accuracy Tests

This PR does not affect model outputs or inference accuracy.

Benchmarking and Profiling

The LRU eviction policy is designed to improve cache efficiency for workloads with non-uniform adapter access patterns. Performance impact is minimal:

  • Memory overhead: Negligible (OrderedDict for tracking access order)
  • Eviction decision: O(1) time complexity for both LRU and FIFO
  • Access tracking: O(1) time complexity per adapter use (illustrated below)

Detailed benchmarking will be conducted with realistic workloads in future testing.
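The O(1) figures follow directly from OrderedDict semantics; a tiny, self-contained illustration (independent of the SGLang code):

    from collections import OrderedDict

    order = OrderedDict()                    # least recent ... most recent
    order["adapter_a"] = None
    order["adapter_b"] = None

    order.move_to_end("adapter_a")           # O(1): adapter_a becomes most recent
    victim, _ = order.popitem(last=False)    # O(1): pop the least recently used
    print(victim)                            # -> adapter_b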

Checklist

Summary of Changes

Hello @ConnorLi96, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the memory management for LoRA adapters within SGLang by introducing a configurable Least Recently Used (LRU) eviction policy. This new policy aims to optimize cache efficiency by prioritizing frequently accessed adapters, keeping them in memory longer than less used ones. The changes involve a new modular framework for eviction policies, integration into the LoRA memory pool, and a command-line option for users to select their preferred policy, all while ensuring backward compatibility with the existing FIFO behavior.

Highlights

  • New Eviction Policy Framework: Introduced 'eviction_policy.py' with an abstract 'EvictionPolicy' class, allowing for flexible eviction strategies beyond FIFO.
  • LRU Implementation: Added 'LRUEvictionPolicy' utilizing 'OrderedDict' for efficient O(1) access tracking and eviction decisions, improving cache efficiency.
  • Configurable Policy Selection: Integrated a '--lora-eviction-policy' argument into 'ServerArgs' to allow users to choose between 'fifo' (default) and 'lru' policies (a usage sketch follows this list).
  • Memory Pool Integration: Updated 'LoRAMemoryPool' to dynamically use the selected eviction policy, replacing the previous hardcoded FIFO logic and enhancing memory management.
  • Adapter Pinning Support: Enhanced the eviction mechanism to respect pinned LoRA adapters, preventing them from being evicted prematurely.
  • Backward Compatibility: Ensured that the default behavior remains FIFO, maintaining compatibility with existing setups while offering the new LRU option.
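As a rough usage sketch, selecting the policy might look like the following; the lora_eviction_policy field name simply mirrors the new --lora-eviction-policy flag, and the other arguments are placeholders rather than values taken from this PR:

    from sglang.srt.server_args import ServerArgs

    # Placeholders: model_path and lora_paths are illustrative, not from this PR.
    args = ServerArgs(
        model_path="<base-model>",
        lora_paths=["<path-to-adapter-a>", "<path-to-adapter-b>"],
        lora_eviction_policy="lru",  # default is "fifo"
    )

    # Roughly equivalent CLI invocation (illustrative):
    #   python -m sglang.launch_server --model-path <base-model> \
    #       --lora-paths <path-a> <path-b> --lora-eviction-policy lru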

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a configurable LRU eviction policy for LoRA adapters, which is a great enhancement for managing memory more intelligently. The implementation is well-structured, introducing a new eviction policy framework and integrating it cleanly into the existing LoRAMemoryPool and LoRAManager. The changes maintain backward compatibility by defaulting to the existing FIFO policy. My review includes a minor suggestion to improve code conciseness in the eviction logic.

Comment on lines 204 to 213
candidates = set()
pinned_uids = set()

for buffer_id in range(self.max_loras_per_batch):
    uid = self.buffer_id_to_uid[buffer_id]
    if uid not in cur_uids and uid is not None:
        candidates.add(uid)
        lora_ref = lora_refs.get(uid)
        if lora_ref is not None and lora_ref.pinned:
            pinned_uids.add(uid)
Severity: medium

The logic for collecting eviction candidates can be made more concise and readable. Using a comprehension to build a list of candidate info first, then creating the candidates and pinned_uids sets from it, can make the code more declarative and easier to follow.

            all_candidates = [
                (uid, lora_refs.get(uid))
                for uid in self.buffer_id_to_uid
                if uid not in cur_uids and uid is not None
            ]
            candidates = {uid for uid, _ in all_candidates}
            pinned_uids = {uid for uid, ref in all_candidates if ref and ref.pinned}

@ConnorLi96 force-pushed the feature/sglang_lora_lru branch from 90b9bf5 to af09896 on September 30, 2025 01:28
@ConnorLi96 force-pushed the feature/sglang_lora_lru branch from af09896 to 8e7afe1 on September 30, 2025 03:04
@ConnorLi96 requested a review from lifuhuang on October 1, 2025 21:59
@ConnorLi96 requested a review from lifuhuang on October 7, 2025 02:03
@lifuhuang (Collaborator) left a comment

LGTM. Great work!

@ConnorLi96 (Contributor, Author) replied:

    LGTM. Great work!

Nice, thank you so much for the guidance all the way! Can we add the run-ci label to this PR, or should we just merge it directly?
