I build production-grade, real-time voice agents, RAG systems, and multimodal AI, blending deep learning, systems programming, and deployment to ship reliable products.
- Voice AI: low-latency STT/TTS with Faster-Whisper, VAD, RealtimeTTS (Coqui, Kokoro, Orpheus)
- Conversational agents: FastAPI + WebSockets, Redis, LangChain
- RAG: ChromaDB, Pinecone, HuggingFace LLMs
- Multimodal agents: audio • text • vision
- Deployment: Docker, Kubernetes, FastAPI backends, Streamlit/JS frontends
- Data: PostgreSQL, Redis; async pipelines, session/state management
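A toy sketch of the session/state management pattern above, using stdlib asyncio only (an in-memory dict stands in for Redis; `SessionStore` and its methods are illustrative names, not from the actual projects):

```python
import asyncio


class SessionStore:
    """In-memory stand-in for a Redis-backed session store."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}
        self._lock = asyncio.Lock()

    async def append_turn(self, session_id: str, role: str, text: str) -> None:
        # The lock serializes concurrent WebSocket handlers touching one session.
        async with self._lock:
            session = self._data.setdefault(session_id, {"history": []})
            session["history"].append({"role": role, "text": text})

    async def history(self, session_id: str) -> list[dict]:
        async with self._lock:
            return list(self._data.get(session_id, {"history": []})["history"])


async def main() -> list[dict]:
    store = SessionStore()
    await store.append_turn("s1", "user", "hello")
    await store.append_turn("s1", "assistant", "hi there")
    return await store.history("s1")


history = asyncio.run(main())
```

Swapping the dict for Redis hash/list commands keeps the same interface while making sessions survive process restarts.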
Real-time Voice Agent (ChatGPT Voice Mini)
- Full-duplex WebSocket voice, partial/final transcripts, VAD, memory, entity extraction
- Stack: FastAPI, Redis, PostgreSQL, JS | STT: Faster-Whisper | TTS: RealtimeTTS (Coqui, Orpheus)
- GitHub: https://github.com/Siddharth0207/voice-agent-llama
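A minimal sketch of the partial/final transcript flow, stdlib asyncio only (the real project streams over FastAPI WebSockets with Faster-Whisper; the message shapes here are an assumption for illustration):

```python
import asyncio
import json


async def stream_transcripts(chunks: list[str], out_queue: asyncio.Queue) -> None:
    """Simulate STT emitting growing partial hypotheses, then a final transcript."""
    text = ""
    for chunk in chunks:
        text = (text + " " + chunk).strip()
        # Partial results let the client render words as they are recognized.
        await out_queue.put(json.dumps({"type": "partial", "text": text}))
    # Once VAD detects end of utterance, emit the final transcript.
    await out_queue.put(json.dumps({"type": "final", "text": text}))


async def main() -> list[dict]:
    q: asyncio.Queue = asyncio.Queue()
    await stream_transcripts(["hello", "there", "agent"], q)
    messages = []
    while not q.empty():
        messages.append(json.loads(await q.get()))
    return messages


messages = asyncio.run(main())
```

In the full-duplex version, the same queue feeds a WebSocket send loop while a receive loop keeps accepting audio, so playback and listening overlap.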
RAG-Driven QA Engine
- Context-aware Q&A with Chroma/Pinecone + HuggingFace LLMs
- Packaged for Streamlit and Flask frontends
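The retrieval step at the heart of the engine, as a toy sketch: bag-of-words cosine similarity stands in for the embedding search that Chroma/Pinecone do at scale (all names and documents below are illustrative):

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and return the top-k,
    which then become the context prepended to the LLM prompt."""
    qv = Counter(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: cosine(qv, Counter(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


docs = [
    "Redis stores session state for the agent",
    "ChromaDB indexes document embeddings",
    "FastAPI serves the WebSocket endpoint",
]
context = retrieve("which database indexes embeddings", docs, k=1)
```

Replacing the `Counter` vectors with dense embeddings from a HuggingFace model and the `sorted` call with a vector-DB query gives the production shape without changing the control flow.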

Skills & Tools
Python, JavaScript, SQL • FastAPI, LangChain, HuggingFace, PyTorch, WebRTC • Redis, PostgreSQL • Docker, Kubernetes • Whisper/Faster-Whisper, VAD, RealtimeTTS • Git, VS Code

- Modular, composable agent architectures
- Long-term memory, multi-turn reasoning
- Reproducible, well-documented releases
- Collaborations on voice-first AI UX
If you’re building real-time agents or production ML infra, let’s connect.
“Build systems that listen, speak, and understand — in real time.”