Traditional retrieval-augmented generation (RAG) systems rely on static data ingested in batches, which limits their ability to support time-critical use cases like emergency response or live monitoring. These situations require immediate access to dynamic data sources such as sensor feeds or radio signals.
The Streaming Data to RAG developer example solves this by enabling RAG systems to process live data streams in real time. It features a GPU-accelerated software-defined radio (SDR) pipeline that continuously captures radio frequency (RF) signals, transcribes them into searchable text, and embeds and indexes the transcripts in real time.
While this reference example showcases an ingestion pipeline transcribing FM-radio data, any number of data streaming workloads could be implemented with this architecture (see the docs on API integration). Other use cases include smart spaces, predictive maintenance, root cause analysis, or any complex system with many interrelated sensing and data streams.
- Using the Brev Launchable
- Requirements
- Overview
- Quickstart
- Usage
- API Integration
- File Replay Module
- Troubleshooting
- Hardware Integration
- Ethical Considerations
- License
- Terms of Use
- Go to https://build.nvidia.com/nvidia/streaming-data-to-rag.
- Click Deploy on Cloud to go to the Launchable page on Brev.
- Click Deploy Launchable to start an instance that can run the notebooks.
- Click Open Notebook once the Launchable has been built to open the Jupyter server.
- Open the `launchable.ipynb` notebook inside the Jupyter server.
Before you start, make sure that you have:
- A GPU capable of running the locally deployed NIMs (ASR and embedding)
- An NVIDIA API key from build.nvidia.com
- Docker, Docker Compose, and the NVIDIA Container Toolkit
- CUDA >=12.2
- Ubuntu >=22.04
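Before building anything, it can help to confirm the GPU and driver are visible. This is a minimal sketch that simply shells out to `nvidia-smi`:

```python
# Confirm an NVIDIA GPU and driver are visible before building the containers.
import subprocess

result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True)
print(result.stdout)
```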
See the full documentation for the third-party repositories used:
- Context-Aware RAG (`streaming-data-to-rag-dev` development branch) - GitHub repository. Handles ingestion of text documents and context-aware Q&A with stored data.
  - Note that certain components of CA-RAG in this blueprint are pre-release features that will be included in a future update at https://github.com/NVIDIA/context-aware-rag.
- Containers:
- Milvus Standalone
- Neo4j
- CA-RAG Ingestion Service
- CA-RAG Retrieval Service
- NeMo Agent Toolkit UI (`streaming-data-to-rag-dev` development branch) - GitHub repository. Provides a user interface with a chat window and a view of streaming and stored data.
- Software-Defined Radio
- Receives baseband I/Q samples of FM radio via UDP, uses a Holoscan application to perform signal processing, and interacts with Riva to convert the RF data into transcripts of the audio.
- Container base: `nvcr.io/nvidia/clara-holoscan/holoscan:v3.0.0-dgpu`
- FM File Replay
- Reads audio files, performs GPU-accelerated FM modulation on the audio, and sends the result to the Holoscan SDR via UDP.
- Container base: `nvcr.io/nvidia/pytorch:23.08-py3`
- Parakeet 0.6 ASR NIM
- Handles automatic speech recognition (ASR) via local deployment
- NVIDIA NeMo Retriever Llama 3.2 embedding NIM
- Handles text embeddings via a local deployment of the NVIDIA NeMo Retriever Llama 3.2 embedding model
NOTE: The NIM services used by the RAG backend (i.e., everything except the ASR NIM) have configurable endpoints that can be set in `external/context-aware-rag/config/config.yaml`. Cloud and on-prem NIM deployments can be used interchangeably by updating the endpoints in the RAG configuration file.
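To see what is currently configured before editing, you can load and dump the file. This is a minimal viewer sketch (requires PyYAML); the key layout is defined by the Context-Aware RAG repository, so consult that file for the actual schema:

```python
# Minimal sketch: inspect the RAG backend configuration before editing it.
# The exact key layout is defined by the Context-Aware RAG repo, so treat
# this as a viewer, not a schema reference. Requires PyYAML.
import yaml

with open("external/context-aware-rag/config/config.yaml") as f:
    cfg = yaml.safe_load(f)

# Dump the full config; look for the NIM endpoint fields and point them at
# either cloud (build.nvidia.com) or local deployments as needed.
print(yaml.safe_dump(cfg, sort_keys=False))
```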
This developer example is intended for:
- AI developers and engineers
- Research teams and intelligence analysts
- Radio enthusiasts
For a complete walkthrough including step-by-step deployment in a Jupyter environment, see `notebooks/quickstart.ipynb`.
This blueprint uses two open-source NVIDIA repositories that have been augmented for this workflow:
- NeMo Agent Toolkit UI: Open-source repository used as the UI
- Context-Aware RAG: Open-source RAG repository originally shown in the VSS Blueprint
First, initialize the required submodules:
git submodule update --init --recursive
Set up the required environment variables. You can either export them directly or create a `.env` file.
export NVIDIA_API_KEY=<your-api-key> # Your API key from build.nvidia.com
export MODEL_DIRECTORY=<nim-cache> # [optional] Usually ~/.cache/nim, for NIM model caching
For testing without physical SDR hardware, configure file replay mode. Some sample files from NVIDIA's AI Podcast have been included here for demo purposes. The audio has been clipped for brevity, and names and voices have been modified for privacy.
You may also upload your own audio files to explore the Streaming Data to RAG developer example on custom content. Any audio file that can be loaded with `librosa.load` can be used.
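To verify that a custom file is compatible before wiring it into the replay service, a quick check with `librosa` is enough. A minimal sketch, with an illustrative file path:

```python
# Quick compatibility check: if librosa can load the file, the replay
# service can use it. The path below is illustrative; point it at your file.
import librosa

audio, sr = librosa.load("src/file-replay/files/my_audio.mp3", sr=None, mono=True)
print(f"Loaded {len(audio) / sr:.1f} s of audio at {sr} Hz")
```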
The file replay service is not just replaying audio. Rather, it fully simulates the type of data the Holoscan SDR would see if it were connected to physical hardware producing baseband I/Q samples; a rough sketch of that traffic follows below. For more details, see `src/file-replay/README.md`.
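To make that concrete, the following sketch sends a dummy complex baseband tone over UDP, which is the general shape of the data involved. The port, packet size, and sample format are assumptions for illustration only; the real packet layout is documented in `src/file-replay/README.md`.

```python
# Illustrative only: stream a dummy complex baseband tone over UDP, the same
# general shape of data the file-replay service sends to the Holoscan SDR.
# The port and sample format are assumptions, not the service's real contract.
import socket

import numpy as np

iq = np.exp(2j * np.pi * 0.05 * np.arange(1024)).astype(np.complex64)  # dummy tone
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(iq.tobytes(), ("127.0.0.1", 5005))  # hypothetical host and port
```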
# Replay files must be comma-separated and located in `src/file-replay/files`
export REPLAY_FILES="sample_files/ai_gtc_1.mp3, sample_files/ai_gtc_2.mp3, sample_files/ai_gtc_3.mp3"
export REPLAY_TIME=3600 # Maximum replay time in seconds
export REPLAY_MAX_FILE_SIZE=50 # Maximum size of an individual file in MB
Build the image needed for the context-aware RAG system:
docker build -t ctx_rag -f external/context-aware-rag/docker/Dockerfile external/context-aware-rag
Build the images for the FM radio ingestion workflow:
- Parakeet ASR NIM (`asr-nim`)
- Holoscan-based SDR (`holoscan-sdr`)
- NeMo Agent Toolkit UI (`agentiq-ui`)
- File replay (`fm-file-replay`)
docker compose -f deploy/docker-compose.yaml --profile replay build
- Context-Aware RAG services:
docker compose -f external/context-aware-rag/docker/deploy/compose.yaml up -d
- Wait for the RAG services to be healthy before proceeding; a readiness-polling sketch follows this list.
- Retrieval service: http://localhost:8000/health
- Ingestion service: http://localhost:8001/health
- Milvus: http://localhost:9091/healthz
- Neo4j: http://localhost:7474
- FM Radio Ingestion Workflow and UI:
docker compose -f deploy/docker-compose.yaml --profile replay up -d
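If you prefer to script the wait rather than check the health endpoints above by hand, a simple readiness poll works. A minimal sketch, assuming the default ports shown above:

```python
# Poll the RAG service health endpoints (default ports) until all are up.
import time

import requests

ENDPOINTS = {
    "retrieval": "http://localhost:8000/health",
    "ingestion": "http://localhost:8001/health",
    "milvus": "http://localhost:9091/healthz",
    "neo4j": "http://localhost:7474",
}

for name, url in ENDPOINTS.items():
    for _ in range(60):  # wait up to ~5 minutes per service
        try:
            if requests.get(url, timeout=5).ok:
                print(f"{name} is healthy")
                break
        except requests.RequestException:
            pass
        time.sleep(5)
    else:
        raise RuntimeError(f"{name} did not become healthy in time")
```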
Quick notes about the Context-Aware RAG services:
Several services in the Context-Aware RAG deployment are used only for telemetry and are not required for core functionality: `jaeger`, `cassandra`, `cassandra-schema`, `otel_collector`, `prometheus`, and `grafana`.
Additionally, there are 'summary' and 'notification' features in the Context-Aware RAG repository that are not used in this developer example. See the documentation for more details on how to enable and take advantage of these features.
Once all services are running, access the frontend at http://localhost:3000.
Note: If running on a remote system, you'll need to set up port forwarding. With SSH, this looks like:
ssh -L 3000:localhost:3000 <user>@<remote>
You should see real-time transcripts appearing in the UI as the system processes the audio files.
To view the complete history of transcripts exported to the CA-RAG workflow, click the "History" button in the header tab.
Note: Documents are not ingested into the database right away. They are batched into sets of N documents; once a batch is full it is processed and subsequently injected into the graph DB.
The batch size is parameterized in `external/context-aware-rag/config/config.yaml` under `chat.params.batch_size`. Other chat parameters are set here as well, including the maximum number of documents retrieved (`top_k`, with a default of 25).
When the SDR workflow finalizes a transcript and sends it to the ingestion service, it appears in the transcript history page with a header marking it as newly ingested. Once the document's batch is processed and available for retrieval, the header changes to reflect that.
The context-aware RAG workflow can filter and retrieve data based on document ingestion time and document stream. Note that all times are in UTC; to see the relative time, check the timestamps on the transcript history page.
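Because queries are interpreted in UTC, it helps to know your local offset when asking about specific times. A minimal sketch:

```python
# All query times are UTC; print the current local time next to UTC so
# "at 10:00 PM"-style questions can be phrased correctly.
from datetime import datetime, timezone

now = datetime.now().astimezone()
print("local:", now.strftime("%H:%M %Z"))
print("UTC:  ", now.astimezone(timezone.utc).strftime("%H:%M %Z"))
```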
Some suggested questions to start with:
- Recent summary:
  - "Summarize the last 10 minutes on channel 0"
    - Retrieves documents added on stream 0 between 600 seconds ago and now
  - "Summarize the main topics discussed, excluding channel 2, for the past hour."
    - Retrieves up to `top_k` documents from all streams except stream 2 between 3600 seconds ago and now
- Time window:
  - "What was the topic of conversation on channel 2 15 minutes ago?"
    - Retrieves documents added on stream 2 900 seconds ago, with a default 5-minute window
  - "Between 2 minutes and half an hour ago, what was the most interesting fact you heard?"
    - Retrieves documents between 120 and 1800 seconds ago on all channels
- Specific time:
  - "At 9 o'clock, what was the topic on channel 3?"
    - Retrieves documents in a default 5-minute window around either 9 AM or 9 PM, whichever was more recent
  - "At 10:00 PM, what was the topic on channel 3, using a 10 minute window?"
    - Retrieves documents in a 10-minute window around 10 PM on channel 3
- Excluding recent:
  - "What was the main topic of conversation on channel 0, excluding the past ten minutes?"
    - Retrieves up to `top_k` documents on channel 0 added prior to 600 seconds ago
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns here.
Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.
The software and materials are governed by the NVIDIA Software License Agreement and the Product-Specific Terms for AI Products; except for the models, which are governed by the NVIDIA Community Model License; the NVIDIA NeMo-Agent-Toolkit-UI, which is governed by the MIT License; the NVIDIA Holoscan SDK and the NVIDIA Context Aware RAG, which are governed by the Apache 2.0 License; and the audio files, which are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
For llama-3.1-70b-instruct, the Llama 3.1 Community License Agreement applies. Built with Llama. For llama-3.2-nv-embedqa-1b-v2 and llama-3.2-nv-rerankqa-1b-v2, the Llama 3.2 Community License Agreement applies. Built with Llama.