# Rime LiveKit Agents: Technical Overview

## Project Summary

This project is a Python-based, real-time conversational AI agent system built on top of LiveKit and Rime.ai. It enables hyper-realistic, character-driven voice agents that can join LiveKit audio rooms, respond to users in natural language, and speak with expressive, customizable voices. The system leverages advanced TTS (text-to-speech) models, OpenAI LLMs (large language models), and a modular plugin architecture for extensibility.
## Contents

- Project Summary
- Folder Structure
- Key Components
- Core Technologies
- Setup & Installation
- Environment Variables & API Keys
- Running the Agent
- Customization & Prompt Engineering
- Technical Notes
- Demo/Deployment Tips
- References
## Folder Structure

```
rime-livekit-agents/
│
├── .env                    # Environment variables (API keys, URLs)
├── agent_configs.py        # Voice/personality configs and prompt engineering
├── rime_agent.py           # Main agent logic and entrypoint
├── requirements.txt        # Python dependencies
├── text_utils.py           # Custom sentence tokenizer for TTS
├── TECHNICAL_OVERVIEW.md   # This technical documentation
├── README.md               # Basic project info
├── KMS/                    # (Optional) Key Management Service or logs
│   └── logs/
└── __pycache__/            # Python bytecode cache
```
## Key Components

- `rime_agent.py`
  - Main entry point for the agent.
  - Handles the LiveKit room connection, session management, plugin integration, and the event loop.
  - Integrates TTS, LLM, STT, noise cancellation, and turn detection.
- `agent_configs.py`
  - Defines agent personalities, TTS settings, and prompt engineering.
  - Example: the `"celeste"` persona, styled as a clingy, playful, flirty university girlfriend.
  - Each persona can have a unique TTS speed, model, and prompt.
- `text_utils.py`
  - Implements custom sentence tokenization for advanced TTS models (e.g., Arcana).
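To illustrate the tokenizer's role: streaming LLM output must be split into sentence-sized chunks so TTS can start speaking before the full reply arrives. The following is a minimal regex-based sketch, not the actual contents of `text_utils.py` (the function name `split_sentences` is hypothetical, and the real tokenizer likely has extra rules for Arcana's inline tags such as `<laugh>`):

```python
import re

# Hypothetical sketch: the real text_utils.py may use different rules,
# e.g. to avoid splitting inside Arcana inline tags like <laugh>.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text: str) -> list[str]:
    """Split text into sentence-sized chunks suitable for streaming TTS."""
    return [chunk for chunk in _SENTENCE_END.split(text.strip()) if chunk]
```

Splitting on whitespace that follows terminal punctuation keeps the punctuation attached to each chunk, which TTS models generally need for correct prosody.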
## Core Technologies

- LiveKit: real-time audio/video infrastructure for scalable, multi-user rooms.
- Rime.ai: hyper-realistic TTS models (`arcana`, `mistv2`).
- OpenAI: LLMs (e.g., GPT-4o-mini) for generating conversational responses.
- Python 3.11+: all orchestration and logic.
- LiveKit plugins: noise cancellation, turn detection, and TTS enhancements.
## Setup & Installation

1. Clone the repository:

   ```
   git clone https://github.com/uw-datasci/AI-GF.git
   ```

2. Create and activate a virtual environment.

   Windows:

   ```
   python -m venv .venv
   .venv\Scripts\activate
   ```

   macOS/Linux:

   ```
   python3 -m venv .venv
   source .venv/bin/activate
   ```

3. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

4. Download model files:

   ```
   python rime_agent.py download-files
   ```

   This will download any required TTS or turn detection models.
## Environment Variables & API Keys

Create a `.env` file in the project root with the following keys:

```
LIVEKIT_URL=wss://<your-livekit-server>.livekit.cloud
LIVEKIT_API_KEY=<your-livekit-api-key>
LIVEKIT_API_SECRET=<your-livekit-api-secret>
OPENAI_API_KEY=<your-openai-api-key>
RIME_API_KEY=<your-rime-api-key>

# Optional: Tavus avatar integration
TAVUS_API_KEY=<your-tavus-api-key>
TAVUS_REPLICA_ID=<your-tavus-replica-id>
```

Required API keys:
- LiveKit: For connecting to your LiveKit Cloud or self-hosted server.
- OpenAI: For LLM responses (ensure your key has quota).
- Rime.ai: For TTS (Arcana, Mistv2, etc.).
- Tavus (optional): For avatar video integration.
Note:
- Do not surround values with quotes unless the value contains spaces.
- If you see quota errors, check your OpenAI or Rime.ai usage and billing.
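The agent reads these keys from the process environment at startup. How the project actually loads `.env` is not shown here; as a self-contained sketch (not the project's loading code, which may use a library such as `python-dotenv` instead), a minimal hand-rolled parser could look like:

```python
import os

def load_dotenv_minimal(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Sketch only: skips blank lines and '#' comments, strips optional
    surrounding quotes, and never overrides variables already set.
    """
    loaded: dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip().strip('"').strip("'")
            if key and key not in os.environ:
                os.environ[key] = value
                loaded[key] = value
    return loaded
```

Note how this matches the quoting advice above: unquoted values are taken verbatim, and surrounding quotes, when present, are stripped.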
## Running the Agent

Run the agent in console mode for local testing:

```
python rime_agent.py console
```

Connect the agent to a LiveKit room:

```
python rime_agent.py dev
```

- Ensure all required environment variables are set in `.env`.
- The agent will join the specified LiveKit room and respond to participants.
- Press `Ctrl + C` in the terminal to stop the agent at any time.
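The `console`, `dev`, and `download-files` arguments are subcommands handled by the LiveKit Agents CLI on the agent's behalf. As a toy sketch of that dispatch pattern only (the real `rime_agent.py` does not hand-roll this; it hands control to the livekit-agents CLI helper):

```python
# Toy sketch of subcommand dispatch, for illustration only.
def dispatch(argv: list[str]) -> str:
    """Route a subcommand the way `python rime_agent.py <mode>` behaves."""
    modes = {
        "console": "run locally against the terminal for testing",
        "dev": "connect to a LiveKit room using .env credentials",
        "download-files": "prefetch TTS and turn-detection model files",
    }
    mode = argv[1] if len(argv) > 1 else "console"
    if mode not in modes:
        raise SystemExit(f"unknown mode: {mode!r} (expected one of {sorted(modes)})")
    return modes[mode]
```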
## Customization & Prompt Engineering

Edit `agent_configs.py` to:

- Add new personas (copy the `"celeste"` config and modify it).
- Change the TTS speed (`"speed_alpha"`), model, or speaker.
- Update the `llm_prompt` for different conversational styles.
- Adjust the `intro_phrase` for custom greetings.

Example:

```python
"celeste": {
    "tts_options": {
        "model": "arcana",
        "speaker": "celeste",
        "speed_alpha": 1.0,  # 1.0 = normal speed
        ...
    },
    "llm_prompt": "...",
    "intro_phrase": "hey cutie... <laugh> I was just thinking about you. what are you up to?",
},
```

Tip: lower `"speed_alpha"` if TTS is too fast for avatar sync.
## Technical Notes

- **Dependencies:**
  - Uses a forked version of `livekit-plugins-rime` for improved Arcana support (see `requirements.txt`).
  - All audio processing, TTS, and LLM calls are asynchronous for low latency.
- **Plugins:**
  - Noise cancellation (`livekit-plugins-noise-cancellation`)
  - Turn detection (`livekit-plugins-turn-detector`)
- **Extensibility:**
  - Add new plugins, voices, or logic by extending the agent/session classes.
- **Microphone selection:**
  - By default, the agent uses the system default input device.
  - To change this, modify the code to set the desired device index using `sounddevice` or a similar library.
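The asynchrony note is the key to low latency: the agent can begin synthesizing the first sentence while the LLM is still streaming the rest of the reply. A toy `asyncio` sketch of that overlap, with stub coroutines standing in for the real LLM and TTS clients (none of these names come from the codebase):

```python
import asyncio

async def fake_llm_stream():
    """Stand-in for a streaming LLM reply, yielding one sentence at a time."""
    for sentence in ["hey cutie...", "I was just thinking about you.", "what are you up to?"]:
        await asyncio.sleep(0.01)  # simulate token latency
        yield sentence

async def fake_tts(sentence: str) -> str:
    """Stand-in for an async TTS call returning 'audio' for one sentence."""
    await asyncio.sleep(0.01)  # simulate synthesis latency
    return f"<audio:{sentence}>"

async def speak_streaming() -> list[str]:
    """Kick off TTS for each sentence as it arrives instead of waiting
    for the full LLM reply -- the idea behind the agent's low latency."""
    tasks = []
    async for sentence in fake_llm_stream():
        # create_task schedules synthesis immediately, overlapping it
        # with the still-running LLM stream.
        tasks.append(asyncio.create_task(fake_tts(sentence)))
    return list(await asyncio.gather(*tasks))
```

`asyncio.gather` preserves submission order, so the synthesized chunks play back in the order the LLM produced them even though they were synthesized concurrently.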
## Demo/Deployment Tips

- **For demos:**
  - Highlight real-time, character-driven voice interaction.
  - Show expressive TTS and persona switching.
  - Demonstrate easy customization via `agent_configs.py`.
  - Explain the integration with LiveKit for scalable audio rooms.
- **For production:**
  - Deploy on a cloud VM or service (e.g., Render, AWS, Azure).
  - Use secure storage for API keys.
  - Monitor usage and quotas for OpenAI and Rime.ai.
  - Optionally, connect a web or mobile frontend via the LiveKit APIs.
### Troubleshooting

- **Quota errors:** if you see `insufficient_quota` or 429 errors, check your OpenAI or Rime.ai account usage and billing.
- **Audio sync issues:** if the TTS audio runs ahead of the avatar, lower `"speed_alpha"` in `agent_configs.py`.
- **Missing dependencies:** re-run `pip install -r requirements.txt` in your activated virtual environment.
- **Microphone issues:** ensure your preferred input device is set as the default, or modify the code to select a specific device.
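Transient 429 responses (rate limits, as opposed to a genuinely exhausted quota) can often be ridden out with retries. A generic backoff wrapper, sketched here as an illustration and not part of this codebase (in real code you would catch the specific 429 exception your SDK raises, not this made-up `RateLimitError`):

```python
import random
import time

class RateLimitError(Exception):
    """Illustrative stand-in for the HTTP 429 exception an SDK raises."""

def call_with_backoff(fn, max_retries: int = 4, base_delay: float = 0.5):
    """Retry fn() on rate-limit errors with exponential backoff plus jitter.

    Generic sketch only; persistent insufficient_quota errors still
    require fixing billing, not retrying.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # 0.5s, 1s, 2s, ... plus jitter so concurrent agents don't retry in lockstep
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```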
## References

- LiveKit Agents Documentation
- Rime.ai
- LiveKit Cloud
- OpenAI Platform
- Tavus (if using avatar video)
This document provides a comprehensive technical overview and setup guide for the Rime LiveKit Agents project. For further details, see the codebase and README.