LLM Semantic Router

Auto-selection of model

An Envoy External Processor (ExtProc) that acts as an external Mixture-of-Models (MoM) router. It intelligently directs OpenAI API requests to the most suitable backend model from a defined pool based on semantic understanding of the request's intent. This is achieved using BERT classification. Conceptually similar to Mixture-of-Experts (MoE) which lives within a model, this system selects the best entire model for the nature of the task.

As such, the overall inference accuracy is improved by using a pool of models that are better suited for different types of tasks:

The detailed design doc can be found here.

The screenshot below shows the LLM Router dashboard in Grafana.

The router is implemented in two ways: Golang (with Rust FFI based on Candle) and Python. Benchmarking will be conducted to determine the best implementation.

Auto-selection of tools

Select the tools to use based on the prompt, avoiding the use of tools that are not relevant to the prompt so as to reduce the number of prompt tokens and improve tool selection accuracy by the LLM.

PII detection

Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the privacy of the user.

Prompt guard

Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving.

Semantic cache

Cache the semantic representation of the prompt so as to reduce the number of prompt tokens and improve the overall inference latency.

Usage

Prerequisites

Rust
Envoy
Huggingface CLI

Run the Envoy Proxy

This listens for incoming requests and uses the ExtProc filter.

make run-envoy

Download the models

make download-models

Run the Semantic Router (Go Implementation)

This builds the Rust binding and the Go router, then starts the ExtProc gRPC server that Envoy communicates with.

make run-router

Once both Envoy and the router are running, you can test the routing logic using predefined prompts:

# Test the tools auto-selection
make test-tools

# Test the auto-selection of model
make test-prompt

# Test the prompt guard
make test-prompt-guard

# Test the PII detection
make test-pii

This will send curl requests simulating different types of user prompts (Math, Creative Writing, General) to the Envoy endpoint (http://localhost:8801). The router should direct these to the appropriate backend model configured in config/config.yaml.

Testing

A comprehensive test suite is available to validate the functionality of the Semantic Router. The tests follow the data flow through the system, from client request to routing decision.

Prerequisites

Install test dependencies:

pip install -r tests/requirements.txt

Running Tests

Make sure both the Envoy proxy and Router are running:

make run-envoy  # In one terminal
make run-router  # In another terminal

Run all tests in sequence:

python tests/run_all_tests.py

Run a specific test:

python tests/00-client-request-test.py

Run only tests matching a pattern:

python tests/run_all_tests.py --pattern "0*-*.py"

Check if services are running without running tests:

python tests/run_all_tests.py --check-only

The test suite includes:

Basic client request tests
Envoy ExtProc interaction tests
Router classification tests
Semantic cache tests
Category-specific tests
Metrics validation tests

Name		Name	Last commit message	Last commit date
Latest commit History 69 Commits
.github/workflows		.github/workflows
candle-binding		candle-binding
classifier_model_fine_tuning		classifier_model_fine_tuning
config		config
docs		docs
dual_classifier		dual_classifier
model_eval		model_eval
multitask_bert_fine_tuning		multitask_bert_fine_tuning
pii_model_fine_tuning		pii_model_fine_tuning
prompt_guard_fine_tuning		prompt_guard_fine_tuning
semantic_router		semantic_router
test_program		test_program
tests		tests
.gitignore		.gitignore
Dockerfile		Dockerfile
Dockerfile.extproc		Dockerfile.extproc
Makefile		Makefile
README.md		README.md
llm-router-dashboard.json		llm-router-dashboard.json
requirements.txt		requirements.txt
verify_tokenization.py		verify_tokenization.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

LLM Semantic Router

Auto-selection of model

Auto-selection of tools

PII detection

Prompt guard

Semantic cache

Usage

Prerequisites

Run the Envoy Proxy

Download the models

Run the Semantic Router (Go Implementation)

Testing

Prerequisites

Running Tests

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

redhat-et/semantic_router

Folders and files

Latest commit

History

Repository files navigation

LLM Semantic Router

Auto-selection of model

Auto-selection of tools

PII detection

Prompt guard

Semantic cache

Usage

Prerequisites

Run the Envoy Proxy

Download the models

Run the Semantic Router (Go Implementation)

Testing

Prerequisites

Running Tests

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Packages