Skip to content

[Feature]: Add Character-Level Prefix Routing for Cache-Aware Load Balancers (SGLang) #16144

@brunamagrinidacruz

Description

@brunamagrinidacruz

The Feature

Add support for prefix-based prompt caching in LiteLLM's router.

Motivation, pitch

I am currently using LiteLLM as an API gateway/management layer in front of SGLang Router to help with rate limiting (TPM) and routing to the appropriate endpoints (Router instances). SGLang Router itself supports a cache-aware load balancing strategy.

If my current understanding is correct, LiteLLM's current prompt caching implementation uses message-level exact hash matching. This is a bit different than SGLang which uses a character level prefix matching strategy for cache routing.

I'm wondering if there's any support, or plans to support, a prefix-aware cache routing strategy to help maintain global cache affinity in this scenario. Without this mechanism, requests that would normally result in a cache hit (e.g., from one router to an SGLang worker) might instead be routed to a different router instance, therefore missing the cache.

To address this, is it possible to add support for prefix-aware prompt caching/routing in LiteLLM's router? Instead of only matching exact hashes, the router could:

Use the longest shared prefix of the prompt (character based) to select a worker.
Optionally, expose configuration for prefix match thresholds (as in SGLang).
This would allow better cache affinity and performance when using LiteLLM with SGLang or any prefix-caching backend.

Currently, LiteLLM has a fallback mechanism that, on a cache miss, removes up to the last 3 messages and tries again. I think this may not handle cases where there are structural changes in the conversation (e.g., inserted or deleted in the middle, or shared long system prompts across different sessions).

Having some prefix-aware routing logic however, can help in this scenario.

LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?

No

Twitter / LinkedIn details

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions