Table of Contents
- 1. Overview
- 2. Download images
- 3. Generation Server Setup
- 4. Orchestrator Setup
- 5. Simple E-mail Detector Setup
- 6. Sentence Chunker Setup
- 7. Granite Guardian HAP Detector Setup
- 8. Summary
This document contains all the steps necessary to run the Guardrails ecosystem locally. The guardrails framework was designed for larger production deployments; in this tutorial, however, we'll work with smaller components that can be executed locally. The intent of this workshop is not to produce a production-ready environment, but to illustrate the components of the guardrails ecosystem, what each one does, and where it applies.
The Guardrails ecosystem is composed of four component types: generation servers, detectors, chunkers, and an orchestrator. Use the commands below to download the Docker images for each component.
docker pull ollama/ollama:0.11.7
docker pull quay.io/mdevin0/guardrails-orchestrator:latest
docker pull quay.io/mdevin0/email-detector:latest
docker pull quay.io/mdevin0/granite-guardian-hap-detector:latest
docker pull quay.io/mdevin0/chunker:latest
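Optionally, you can check that all five images are available locally before moving on (the filter below simply matches the image names used in this workshop):
docker images | grep -E 'ollama|mdevin0'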
Tip
If you can't use Docker, you can use podman and podman-compose instead for everything in this workshop.
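For example, the image pulls above and the compose commands used later translate directly (a quick sketch, not exhaustive):
podman pull ollama/ollama:0.11.7
podman-compose up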
Generation servers are responsible for serving a generative language model. At the time of writing, the framework supports servers that comply with the OpenAI Completions and Chat Completions APIs, though only vLLM has been tested as a generation server. In this workshop, we will use Ollama as our generation server.
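The requests in this workshop use the Completions API. For reference, an equivalent Chat Completions request against an OpenAI-compatible server would look roughly like the sketch below; you can try it against the Ollama service configured in the next step, once it is up and the model is downloaded:
curl --location 'http://localhost:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3:0.6b",
    "messages": [
        {"role": "user", "content": "/no_think Hi there! How are you?"}
    ]
}'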
We'll be using the official Ollama docker image through docker compose. To start the service, create a file named docker-compose.yaml with the following content:
services:
  generation-server:
    image: ollama/ollama:0.11.7
    container_name: generation-server
    ports:
      - "8000:8000"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:8000
    restart: unless-stopped

volumes:
  ollama_data:
    driver: local
Then, run docker compose up to start the service.
The next step is to download the generation model.
To do so, run the following commands on a different terminal:
# Get inside generation-server container
docker exec -ti generation-server bash

# Download model
ollama pull qwen3:0.6b

# Exit container
exit

# Test model is listed
curl localhost:8000/v1/models

# The output of this command should be similar to this:
{"object":"list","data":[{"id":"qwen3:0.6b","object":"model","created":1757891103,"owned_by":"library"}]}
You can run the following command to test the model:
curl --location 'http://localhost:8000/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3:0.6b",
"prompt": "/no_think Hi there! How are you?"
}'
Once we have the generation-server up and running, we can set up the orchestrator and configure it to use the generation server.
Create a file named orchestrator.yaml and add the following content:
openai:
  service:
    hostname: generation-server
    port: 8000

detectors:
  email:
    type: text_contents
    service:
      hostname: email-detector
      port: 8001
    chunker_id: whole_doc_chunker
    default_threshold: 0.5
The full documentation for the orchestrator config can be found here.
Tip
On SELinux-enabled Linux environments, you may need to run the following command to allow the file to be accessed by containers:
sudo chcon -Rt svirt_sandbox_file_t orchestrator.yaml
Now we need to add the orchestrator service to the compose file. To do so, add the following under services in docker-compose.yaml:
  orchestrator:
    image: quay.io/mdevin0/guardrails-orchestrator:latest
    container_name: orchestrator
    ports:
      - "8090:8033"
    volumes:
      - ${PWD}/orchestrator.yaml:/config/config.yaml
    environment:
      - ORCHESTRATOR_CONFIG=/config/config.yaml
    depends_on:
      - generation-server
    restart: unless-stopped
After that, stop the running compose stack and execute docker compose up again to bring up both the generation server and the orchestrator.
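To confirm that both containers came up, you can optionally list the running compose services:
docker compose ps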
To test the connection, you can run the following command:
curl --location 'http://localhost:8090/api/v2/text/completions-detection' \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3:0.6b",
"prompt": "/no_think Hi there! How are you?"
}'
The documentation for the orchestrator completions endpoint can be found here, and its source code can be found in this repository.
Now we have the generation server and the orchestrator set up, but there is little value in the guardrails framework without detectors. We will now configure an e-mail detector by adding the following service to docker-compose.yaml:
  email-detector:
    image: quay.io/mdevin0/email-detector:latest
    container_name: email-detector
    ports:
      - "8091:8001"
    restart: unless-stopped
Then, run docker compose up to bring up the new services.
To test an input detection, run the following command:
curl --location 'http://localhost:8090/api/v2/text/completions-detection' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "qwen3:0.6b",
"prompt": "/no_think My e-mail is [email protected].",
"detectors": {
"input": {
"email": {}
}
}
}'
To test output detection, run the following request:
curl --location 'http://localhost:8090/api/v2/text/completions-detection' \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3:0.6b",
"prompt": "/no_think Could you generate a random e-mail address?",
"detectors": {
"output": {
"email": {}
}
}
}'
The source code for the e-mail detector can be found on this repository.
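If you want to see what the orchestrator sends to the detector, you can also call the detector directly on its published port. The sketch below assumes the detector implements the standard detectors text-contents endpoint (POST /api/v1/text/contents) that the orchestrator uses; the exact path and payload may differ for other detector implementations:
curl --location 'http://localhost:8091/api/v1/text/contents' \
--header 'Content-Type: application/json' \
--data '{
    "contents": ["My e-mail is [email protected]."]
}'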
Now that we have a detector, we will configure a sentence chunker. Add the following service to docker-compose.yaml:
  sentence-chunker:
    image: quay.io/mdevin0/chunker:latest
    container_name: sentence-chunker
    ports:
      - "50052:50051"
    volumes:
      - nltk_data:/root/nltk_data/
    restart: unless-stopped
And also add the following entry under volumes:
  nltk_data:
    driver: local
In orchestrator.yaml, add the sentence_chunker entry at the top level, right below the openai block:
chunkers:
  sentence_chunker:
    type: sentence
    service:
      hostname: sentence-chunker
      port: 50051
The block above registers the sentence chunker in the orchestrator.
We can also register a version of the e-mail detector configured with the sentence chunker by adding the following entry under detectors in orchestrator.yaml:
  email-sentence:
    type: text_contents
    service:
      hostname: email-detector
      port: 8001
    chunker_id: sentence_chunker
    default_threshold: 0.5
Now, we can start the docker compose services again and get into the sentence chunker container to download the necessary model. To do so:
# Get inside sentence-chunker container
docker exec -ti sentence-chunker bash
# Start Python and download the punkt_tab tokenizer data
python
import nltk
nltk.download('punkt_tab')
# Exit python and container
exit()
exit
Now we can test a call to the e-mail detector configured with the sentence chunker by running the following:
curl --location 'http://localhost:8090/api/v2/text/completions-detection' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "qwen3:0.6b",
"prompt": "Hey, how'\''s it going? My e-mail address is [email protected].",
"detectors": {
"input": {
"email-sentence": {}
}
}
}'
The source code for this chunker is available on this repository.
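Tip
The punkt_tab download can also be done non-interactively from the host in a single command, for example:
docker exec sentence-chunker python -c "import nltk; nltk.download('punkt_tab')"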
The last step in this tutorial is configuring a HAP (Hate, Abuse and Profanity) detector based on the Granite Guardian model. Granite is an open-source family of models developed by IBM under the Apache 2.0 license, which means these models can be used even in enterprise environments. From the Granite family, we'll be using a Granite Guardian model focused on HAP detection.
To enable this detector, add the following entry under services in docker-compose.yaml:
  gg-hap-detector:
    image: quay.io/mdevin0/granite-guardian-hap-detector:latest
    container_name: gg-hap-detector
    ports:
      - "8092:8002"
    volumes:
      - ~/.cache/huggingface/:/root/.cache/huggingface/
    restart: unless-stopped
Then, register the detector configured with the sentence chunker in orchestrator.yaml, under detectors:
  gg-hap-sentence:
    type: text_contents
    service:
      hostname: gg-hap-detector
      port: 8002
    chunker_id: sentence_chunker
    default_threshold: 0
Feel free to also register a version using the whole_doc_chunker.
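For reference, such an entry would mirror the one above, only swapping the chunker (the detector name gg-hap below is arbitrary):
  gg-hap:
    type: text_contents
    service:
      hostname: gg-hap-detector
      port: 8002
    chunker_id: whole_doc_chunker
    default_threshold: 0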
To test the detection, run the following command:
curl --location 'http://localhost:8090/api/v2/text/completions-detection' \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3:0.6b",
"prompt": "I love cats! I hate aliens!",
"detectors": {
"input": {
"gg-hap-sentence": {}
}
}
}'
The source code for this detector can be found on this repository.
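As with the e-mail detector, the same detector can also be applied to the model output rather than the prompt; whether anything is flagged will of course depend on the text the model generates. For example:
curl --location 'http://localhost:8090/api/v2/text/completions-detection' \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3:0.6b",
    "prompt": "/no_think Write one sentence about cats and one about aliens.",
    "detectors": {
        "output": {
            "gg-hap-sentence": {}
        }
    }
}'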
In this workshop, we configured a Guardrails ecosystem that can be used to run detections on both user input and LLM output. While the orchestrator is the same component you would use in production, the chunker and detectors used in this tutorial are intended for educational purposes only. Likewise, depending on your production workload, you'll most likely want a generation server such as vLLM, as Ollama is intended mostly for local usage.