The Dial RAG answers user questions using information from the documents provided by the user. It supports the following document formats: PDF, DOC/DOCX, PPT/PPTX, TXT and other plain text formats such as code files. It also supports PDF, JPEG, PNG and other image formats for image understanding.
The Dial RAG implements several retrieval methods to find the relevant information:
- Description retriever - uses a vision model to generate page image descriptions and performs search on them. Supports different vision models, like `gpt-4o-mini`, `gemini-1.5-flash-002` or `anthropic.claude-v3-haiku`.
- Multimodal retriever - uses multimodal embedding models for page image search. Supports different multimodal models, like `azure-ai-vision-embeddings`, Google `multimodalembedding@001` or `amazon.titan-embed-image-v1`.
- Semantic retriever - uses a text embedding model to find the relevant information in the documents.
- Keyword retriever - uses the Okapi BM25 algorithm to find the relevant information in the documents.
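The keyword retriever relies on Okapi BM25. As a minimal illustration of how BM25 ranks documents for a query (a sketch of the algorithm, not Dial RAG's actual implementation, which presumably uses an optimized library):

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score each document against the query terms with Okapi BM25."""
    n = len(documents)
    tokenized = [doc.lower().split() for doc in documents]
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency of each query term across the corpus
    df = {t: sum(1 for d in tokenized if t in d) for t in query_terms}
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * (k1 + 1) * tf[t] / denom
        scores.append(score)
    return scores
```

Documents that do not contain any query term score zero, while term frequency is saturated by `k1` and long documents are penalized via `b`.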
The Dial RAG is intended to be used in the Dial environment. It uses the Dial Core to access the LLMs and other services.
The following environment variables are required for the deployment configuration:
| Variable | Description |
|---|---|
| `DIAL_URL` | URL to the Dial Core |
| `DIAL_RAG__INDEX_STORAGE__USE_DIAL_FILE_STORAGE` | Set to `True` to store indexes in the Dial File Storage instead of the local file storage |
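For example, a minimal deployment configuration might look like this (the Dial Core URL is a placeholder for your own):

```sh
# Required deployment configuration (URL is illustrative)
export DIAL_URL="http://dial-core.example:8080"
# Store indexes in the Dial File Storage instead of the local file storage
export DIAL_RAG__INDEX_STORAGE__USE_DIAL_FILE_STORAGE=True
```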
The Dial RAG provides a set of configuration files with predefined settings for different environments. The configuration files are located in the `config` directory. You can set the `DIAL_RAG__CONFIG_PATH` environment variable to point to the required configuration file depending on the Dial environment and available models.
The following configuration files are available in the `config` directory:
- `config/aws_description.yaml` - AWS environment with description retriever, which uses the `Claude 3 Haiku` model for page image descriptions and `Claude 3.5 Sonnet` for the answer generation.
- `config/aws_embedding.yaml` - AWS environment with multimodal retriever, which uses the `amazon.titan-embed-image-v1` model for page image embeddings and `Claude 3.5 Sonnet` for the answer generation.
- `config/azure_description.yaml` - Azure environment with description retriever, which uses the `GPT-4o mini` model for page image descriptions and `GPT-4o` for the answer generation.
- `config/azure_embedding.yaml` - Azure environment with multimodal retriever, which uses the `azure-ai-vision-embeddings` model for page image embeddings and `GPT-4o` for the answer generation.
- `config/gcp_description.yaml` - GCP environment with description retriever, which uses the `Gemini 1.5 Flash` model for page image descriptions and `Gemini 1.5 Pro` for the answer generation.
- `config/gcp_embedding.yaml` - GCP environment with multimodal retriever, which uses the Google `multimodalembedding@001` model for page image embeddings and `Gemini 1.5 Pro` for the answer generation.
- `config/azure_with_gcp_embedding.yaml` - mixed environment which assumes you have access to both Azure and GCP models in the Dial. It uses the Google `multimodalembedding@001` model for page image embeddings and `GPT-4o` for the answer generation.
If you are running the Dial RAG in a different environment, you can create your own configuration file based on one of the provided files and set the `DIAL_RAG__CONFIG_PATH` environment variable to point to it. If you need a small change in the configuration (for example, to change the model name), you can point `DIAL_RAG__CONFIG_PATH` to the existing file and override the required settings using environment variables. See the Additional environment variables section for the list of available settings.
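For example, to pick one of the provided configuration files and tweak a single documented setting via the environment (values are illustrative):

```sh
# Use the Azure description-retriever configuration...
export DIAL_RAG__CONFIG_PATH=config/azure_description.yaml
# ...and override an individual documented setting with an environment variable
export LOG_LEVEL=DEBUG
```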
| Variable | Default | Description |
|---|---|---|
| `LOG_LEVEL` | `INFO` | Log level for the application. |
| `LOG_LEVEL_OVERRIDE` | `{}` | Allows overriding the log level for specific modules. Example: `LOG_LEVEL_OVERRIDE='{"dial_rag": "DEBUG", "urllib3": "ERROR"}'` |
Dial RAG has additional variables to tune its performance and behavior:
Optional, default value: .
Path to the yaml configuration file. See the `config` directory for examples.
Optional, default value: http://dial-proxy.dial-proxy
URL to the Dial Core.
Optional, default value: False
Enables support of debug commands in the messages. Should be `false` for prod environments; it is set to `true` only for staging. See Debug commands for more details.
Optional, default value: 6
Size of the process pool for document parsing, image extraction and similar CPU-bound tasks. It is set to `max(1, CPU_COUNT - 2)` to leave some CPU cores for other tasks.
Optional, default value: 1
The embedding process itself uses multiple cores. Should be `1`, unless you have a lot of cores and can explicitly see underutilisation (i.e. you only have very small documents in the requests).
Optional, default value: 1
Embedding process for the query. Should be `1`, unless you have a lot of cores.
Optional, default value: False
Set to `True` to store indexes in the Dial File Storage instead of in-memory storage.
Optional, default value: 128MiB
Used to cache the document indexes and avoid requesting the Dial Core File API every time a user makes several requests for the same document. Could be increased to reduce load on the Dial Core File API if there are a lot of concurrent users (requires a corresponding increase of the pod memory). Can be an integer for bytes, or a pydantic.ByteSize-compatible string (e.g. `128MiB`, `1GiB`, `2.5GiB`).
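The accepted size format can be illustrated with a simplified parser for ByteSize-style strings (a sketch only; the real parsing is done by `pydantic.ByteSize`, which accepts more unit spellings):

```python
# Binary and decimal units, as accepted by ByteSize-style strings
UNITS = {
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
    "KB": 1000, "MB": 1000**2, "GB": 1000**3,
    "B": 1,
}


def parse_byte_size(value):
    """Parse an int or a string like '128MiB' or '2.5GiB' into bytes."""
    if isinstance(value, int):
        return value
    s = value.strip()
    # Try longer unit suffixes first so 'MiB' is not mistaken for 'B'
    for unit in sorted(UNITS, key=len, reverse=True):
        if s.endswith(unit):
            return int(float(s[: -len(unit)]) * UNITS[unit])
    return int(s)  # bare number of bytes
```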
Optional, default value: False
Ignore errors during document loading. Used for Web RAG for requests with multiple documents.
Optional, default value: False
Use profiler to collect performance metrics for the request.
Optional, default value: False
Allows writing the links of the attached documents to the logs with log levels higher than DEBUG.
If enabled, Dial RAG will log the links to the documents for log messages with levels from INFO to CRITICAL where relevant. For example, an ERROR log message with an exception during document processing will contain the link to the document.
If disabled, only log messages with DEBUG level may contain the links to the documents, to avoid logging sensitive information. For example, the links to the documents will not be logged for the ERROR log messages with an exception during document processing.
Optional, default value: 30
Timeout for the whole request. Includes connection establishment, sending the request, and receiving the response.
Optional, default value: 30
Timeout for establishing a connection to the server.
Optional, default value: 30
Timeout for the whole request. Includes connection establishment, sending the request, and receiving the response.
Optional, default value: 30
Timeout for establishing a connection to the server.
Optional, default value: 5MiB
Limits the size of the document the RAG will accept for processing. This limit is applied to the size of the text extracted from the document, not the size of the attached document itself. Can be an integer for bytes, or a pydantic.ByteSize-compatible string.
Optional, default value: 1000
Sets the chunk size for unstructured document loader.
Optional, default value: None
Enables MultimodalRetriever, which uses multimodal embedding models for page image search.
Optional, default value: `llm=LlmConfig(deployment_name='gpt-4o-mini-2024-07-18', max_prompt_tokens=0, max_retries=1000000000) estimated_task_tokens=4000 time_limit_multiplier=1.5 min_time_limit_sec=300`
Enables DescriptionRetriever, which uses a vision model to generate page image descriptions and performs search on them.
Optional, default value: gpt-4o-2024-05-13
Used to set the deployment name of the LLM used in the chain. Could be useful if the model deployments have non-standard names in the Dial Core configuration.
Optional, default value: 0
Sets `max_prompt_tokens` for the history truncation for the LLM, if history is used. Requires the `DEPLOYMENT_NAME` model to support the history truncation and the `max_prompt_tokens` parameter. Could be set to `0` to disable the history truncation for models which do not support it, but this will cause an error if the max model context window is reached.
Optional, default value: 2
Sets the number of retries to send the request to the LLM.
Optional, default value: None
Allows overriding the system prompt template.
Optional, default value: True
Used to set whether to use the history for the answer generation. If true, the previous messages from the chat history will be passed to the model. If false, only the query (last user message or standalone question, depending on the `query_chain` settings) will be passed to the model for the answer generation.
Optional, default value: 4
Sets the number of page images to pass to the model for the answer generation. If it is greater than 0, the model in `llm.deployment_name` should accept images in the user messages. Could be set to 0 (together with `USE_MULTIMODAL_INDEX=False` and `USE_DESCRIPTION_INDEX=False`) for text-only RAG.
Optional, default value: 1536
Sets the size of the page images to pass to the model for the answer generation.
Optional, default value: gpt-4o-2024-05-13
Used to set the deployment name of the LLM used in the chain. Could be useful if the model deployments have non-standard names in the Dial Core configuration.
Optional, default value: 0
Sets `max_prompt_tokens` for the history truncation for the LLM, if history is used. Requires the `DEPLOYMENT_NAME` model to support the history truncation and the `max_prompt_tokens` parameter. Could be set to `0` to disable the history truncation for models which do not support it, but this will cause an error if the max model context window is reached.
Optional, default value: 2
Sets the number of retries to send the request to the LLM.
Optional, default value: True
Used to set whether to use the history for the chat history summarization into the standalone question for retrieval. If true, the previous messages from the chat history will be passed to the model to make a standalone question. If false, the last user message is assumed to be a standalone question and is used for retrieval as is.
Dial RAG supports the following commands in messages:

- `/attach <url>` - allows providing a URL to the attached document in the message body. It is equivalent to setting `messages[i].custom_content.attachments[j].url` in the Dial API.

The `/attach` command is useful to attach a document which is available on the Internet and is not uploaded to the Dial File Storage.
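For illustration, a message using `/attach` and its equivalent using the Dial API attachments field might look like this (the URL and question are made up; the `custom_content.attachments` path follows the Dial API as described above):

```python
# A user message carrying the document URL via the /attach command...
message_with_command = {
    "role": "user",
    "content": "/attach https://example.com/report.pdf\nWhat is the revenue?",
}

# ...is equivalent to attaching the document via the Dial API field:
message_with_attachment = {
    "role": "user",
    "content": "What is the revenue?",
    "custom_content": {
        "attachments": [{"url": "https://example.com/report.pdf"}],
    },
}
```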
Dial RAG supports the following debug commands if the option `ENABLE_DEBUG_COMMANDS` is set to `true`:

- `/model <model>` - allows overriding the chat model used for the answer generation. Should be a deployment name of a chat model available in the Dial.
- `/query_model <model>` - allows overriding the model used to summarize the chat history into the standalone question. Should be a deployment name of a chat model available in the Dial. The model should support tool calls.
- `/profile` - generates a CPU profile report for the request. The report will be available as an attachment in the `Profiler` stage.
This project uses Python==3.11 and Poetry>=1.8.5 as a dependency manager.
Check out Poetry's documentation on how to install it on your system before proceeding.
If you have Poetry>=1.8.5 and Python 3.11 installed in the system, to install the requirements you can run:

```sh
poetry install
```
This will install all requirements for running the package, linting, formatting and tests.
Alternatively, if you have uv installed, you can use it to create the environment with the required version of Python and Poetry:

```sh
uv venv "$VIRTUAL_ENV" --python 3.11
uvx [email protected] install
```

This will install all requirements for running the package, linting, formatting and tests, the same as the `poetry install` command above.
If you want to use Poetry from uv with the make commands, you can set the `POETRY=uvx [email protected]` environment variable:

```sh
POETRY="uvx [email protected]" make install
```
The recommended IDE is VSCode. Open the project in VSCode and install the recommended extensions.
This project uses Ruff as a linter and formatter. To configure it for your IDE follow the instructions in https://docs.astral.sh/ruff/editors/setup/.
As of now, Windows distributions do not include the make tool. To run make commands, the tool can be installed using the following command (since Windows 10):

```sh
winget install GnuWin32.Make
```

For convenience, the tool folder can be added to the PATH environment variable as `C:\Program Files (x86)\GnuWin32\bin`.
The command definitions inside Makefile should be cross-platform to keep the development environment setup simple.
Copy `.env.example` to `.env` and customize it for your environment for the development process. See the Configuration section for the list of environment variables.
Run the development server locally:

```sh
make serve
```

Run the development server in Docker:

```sh
make docker_serve
```

Open `localhost:5000/docs` to make sure the server is up and running.
The `docker_compose_local` folder contains the Docker Compose file and auxiliary scripts to run Dial RAG with Dial Core in Docker Compose. The `docker-compose.yml` file is configured to run Dial RAG alongside Dial Core, Dial Chat UI, and the DIAL Adapter for DIAL to provide access to LLMs.
1. In the `docker_compose_local` folder, create a file named `.env` and define the following variables:

   - `DIAL_RAG_URL` - Provide the URL for the local Dial RAG instance (including the IP address and port) if you are running it in your IDE. The default value is `http://host.docker.internal:5000`.
   - `REMOTE_DIAL_URL` - Provide the URL for the remote Dial Core to access the LLMs.
   - `REMOTE_DIAL_API_KEY` - Provide the API key for the remote Dial Core.
   - `DEPLOY_DIAL_RAG=<0|1>` - Set to `0` to skip deploying the Dial RAG container in Docker Compose (useful for debugging the application locally). Set to `1` to deploy Dial RAG as a Docker Compose container.

   These variables will be passed to `dial_conf/core/config.json` and used for communication between the Dial and Dial RAG applications.

2. Navigate to the `docker_compose_local` folder and run the following command in the terminal:

   ```sh
   docker-compose up
   ```

   This will bring up the entire Dial application, ready to use.

3. If you need to rebuild the Dial RAG image, use the following command:

   ```sh
   docker-compose up --build dial-rag
   ```
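A minimal `docker_compose_local/.env` for step 1 might look like this (all values are placeholders to replace with your own):

```sh
# docker_compose_local/.env (illustrative values)
DIAL_RAG_URL=http://host.docker.internal:5000
REMOTE_DIAL_URL=https://remote-dial-core.example.com
REMOTE_DIAL_API_KEY=your-remote-dial-api-key
DEPLOY_DIAL_RAG=1
```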
Run the linting before committing:

```sh
make lint
```

To auto-fix formatting issues run:

```sh
make format
```

Run unit tests locally:

```sh
make test
```

Run unit tests in Docker:

```sh
make docker_test
```
Some of the tests marked with the `@e2e_test` decorator utilize cached results located in the `./tests/cache` directory. By default, these tests will use the cached values. During test execution, you may encounter warning or failure messages such as `Failed: There is no response found in cache, use environment variable REFRESH=True to update`. This indicates that some logic has changed and that the cached responses are out of date.

These tests can be executed using environment variables or nox sessions:

- `make test` (or `nox -s test`) - usual test run, executed on CI. The tests use ONLY the cached responses from the LLM. If a cache entry is missing, the test throws an exception.
- `REFRESH=True make test` (or `nox -s test -- --refresh`) - this flag will delete all unused cache files and store new ones required by the executed tests.

To use the `REFRESH` flag, you need a running dial-core on `DIAL_CORE_HOST` (default `localhost:8080`) with `DIAL_CORE_API_KEY` (default `dial_api_key`).
To remove the virtual environment and build artifacts:

```sh
make clean
```
This project uses settings-doc to generate the Configuration section of this documentation from the Pydantic settings. To update the documentation run:

```sh
make docs
```