Twilio Demo - Real-time Audio Translation

A real-time audio translation application built with FastAPI, Twilio, and Palabra AI that enables live voice conversations between speakers of different languages.

🚀 Features

Real-time Audio Processing: Live audio streaming and processing using Twilio Media Streams
Automatic Speech Recognition: Real-time transcription with language detection (English/Russian)
Live Translation: Instant translation between the client's and the operator's languages
Web Interface: Real-time transcription display with WebSocket updates
Multi-party Calls: Support for client-operator conversations
Audio Mixing: Intelligent mixing of original and translated audio
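
To make the audio mixing feature concrete, here is a minimal sketch of mixing two buffers as a weighted sum with clipping. It assumes 16-bit little-endian mono PCM; the function name `mix_pcm` and the gain values are hypothetical and not taken from the project.

```python
import struct


def mix_pcm(original: bytes, translated: bytes,
            original_gain: float = 0.2, translated_gain: float = 1.0) -> bytes:
    """Mix two s16le mono buffers sample by sample, clipping to the int16 range."""
    n = min(len(original), len(translated)) // 2  # whole samples present in both
    a = struct.unpack(f"<{n}h", original[: n * 2])
    b = struct.unpack(f"<{n}h", translated[: n * 2])
    mixed = []
    for x, y in zip(a, b):
        value = int(x * original_gain + y * translated_gain)
        mixed.append(max(-32768, min(32767, value)))  # avoid integer wrap-around
    return struct.pack(f"<{n}h", *mixed)
```

Keeping the original voice at a low gain lets the listener hear the speaker's tone while the translated audio stays intelligible.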

🏗️ Architecture

┌─────────────┐    ┌─────────────┐    ┌───────────────────────┐
│   Client    │    │   Twilio    │    │   Websocket Server    │
│  (Phone)    │◄──►│  (Gateway)  │◄──►│      (FastAPI)        │
└─────────────┘    └─────────────┘    └───────────────────────┘
                           ▲                   ▲
                           │                   │
                           │                   │
                           │                   │
                           │                   ▼
                           │            ┌─────────────┐
                           │            │  Palabra    │
                           │            │     API     │
                           │            │             │
                           │            └─────────────┘
                           │                   
                           ▼                   
                    ┌─────────────┐            
                    │  Operator   │            
                    │  (Phone)    │            
                    └─────────────┘

🛠️ Technology Stack

Backend: FastAPI, Python 3.11+
Audio Processing: NumPy, audioop (a standard-library module that was removed in Python 3.13)
WebSocket: Starlette WebSockets
Telephony: Twilio API
AI Services: Palabra AI (ASR, Translation, TTS)
Frontend: HTML, CSS, JavaScript
Process Management: Multiprocessing with async workers

📋 Prerequisites

Python 3.11 or higher
Twilio paid account with phone numbers (free trial accounts have limitations)
Palabra AI API credentials
Environment variables configured

🔧 Installation

Clone the repository

git clone <repository-url>
cd twilio-demo

Configure environment variables

Edit the Makefile and set your actual values for:
- Twilio credentials
- Palabra AI credentials
- Server configuration
- Language settings

Install dependencies
```
make install
```
Install development dependencies (optional)
```
make dev
```

⚙️ Configuration

Environment Variables

All environment variables are configured in the Makefile. Edit the variables in the Makefile according to your setup:

# Environment variables
export TWILIO_ACCOUNT_SID = your_account_sid
export TWILIO_AUTH_TOKEN = your_auth_token
export TWILIO_NUMBER = your_twilio_phone_number
export PALABRA_CLIENT_ID = your_client_id
export PALABRA_CLIENT_SECRET = your_client_secret
export HOST = your_server_hostname_or_ip
export OPERATOR_NUMBER = operator_phone_number
export PORT = 7839
export SOURCE_LANGUAGE = en
export TARGET_LANGUAGE = pl

Variable Descriptions

Twilio Configuration

TWILIO_ACCOUNT_SID - Your Twilio Account SID
TWILIO_AUTH_TOKEN - Your Twilio Auth Token
TWILIO_NUMBER - Your Twilio phone number that clients will call

This Twilio article explains how to obtain both credentials.

Palabra AI Configuration

PALABRA_CLIENT_ID - Your Palabra AI client identifier
PALABRA_CLIENT_SECRET - Your Palabra AI client secret key

This Palabra article explains how to obtain both credentials.

Server Configuration

HOST - Your server's public hostname or IP address (for local development, use a Cloudflare Tunnel hostname or an equivalent tunneling service)
OPERATOR_NUMBER - The operator's phone number for receiving calls
PORT - Server port number (defaults to 7839)

Language Configuration

SOURCE_LANGUAGE - Language spoken by the client (e.g., en, ru, de, es)
TARGET_LANGUAGE - Language spoken by the operator (e.g., en, ru, de, es)
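
The variables above could be read on the Python side roughly as follows. This is a sketch under assumptions: the `Settings` dataclass and `load_settings` function are hypothetical and do not reflect the project's actual `settings.py`. Only the variable names and the default port come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables without a documented default; PORT falls back to 7839.
REQUIRED = (
    "TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_NUMBER",
    "PALABRA_CLIENT_ID", "PALABRA_CLIENT_SECRET",
    "HOST", "OPERATOR_NUMBER", "SOURCE_LANGUAGE", "TARGET_LANGUAGE",
)


@dataclass(frozen=True)
class Settings:
    twilio_account_sid: str
    twilio_auth_token: str
    twilio_number: str
    palabra_client_id: str
    palabra_client_secret: str
    host: str
    operator_number: str
    source_language: str
    target_language: str
    port: int = 7839


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on missing values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    values = {name.lower(): env[name] for name in REQUIRED}
    return Settings(port=int(env.get("PORT", "7839")), **values)
```

Failing at startup with the list of missing names is usually easier to debug than a call that silently never connects.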

🛠️ Makefile Commands

The project includes a Makefile for common operations:

Available Commands

make help - Show all available commands
make install - Create virtual environment and install dependencies
make dev - Install dependencies with development tools
make run - Start the server with environment variables from Makefile
make clean - Remove virtual environment
make format - Format code with black and isort
make check - Run all code quality checks

Environment Variables in Makefile

All environment variables are defined in the Makefile using export statements. This ensures they are available when running make run or other commands.

🌐 Local Development with Cloudflare Tunnel

For local development, you'll need to expose your local server to the internet so Twilio can send webhooks. The recommended tool for this is Cloudflare Tunnel (cloudflared).

Setting up Cloudflare Tunnel

Install cloudflared

Follow the Cloudflare Tunnel documentation for installation instructions.

Start cloudflared tunnel

cloudflared tunnel --url http://localhost:${PORT}

Copy the tunnel URL
```
https://abc123.trycloudflare.com
```
Set HOST variable

Use the tunnel URL (without the protocol) as your HOST value:
```
HOST=abc123.trycloudflare.com
```

Important Notes

HTTPS Required: Twilio requires HTTPS for webhooks, which Cloudflare Tunnel provides
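Tunnel URLs: A quick tunnel started with --url gets a new random trycloudflare.com hostname on every start; a named tunnel attached to your own domain keeps a stable hostname
Update Twilio Webhooks: Remember to update HOST and your Twilio webhook URLs whenever the tunnel URL changes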

Twilio Webhook Configuration

After setting up your tunnel, you need to configure Twilio webhooks to point to your server:

Go to Twilio Console → Phone Numbers → Manage → Active numbers
Click on your phone number
In the "Voice Configuration" section, set:
- Webhook URL: https://${HOST}/twiml/client
- HTTP Method: POST

For detailed instructions, see the Twilio Phone Number Configuration documentation.

Important: Replace ${HOST} with your actual tunnel hostname (e.g., abc123.trycloudflare.com).
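
As an illustration of how the tunnel URL relates to the HOST value and the webhook URL, here is a small sketch using only the standard library. The helper names `host_from_tunnel_url` and `client_webhook_url` are hypothetical. The `/twiml/client` path is the one given above.

```python
from urllib.parse import urlsplit


def host_from_tunnel_url(tunnel_url: str) -> str:
    """Return the hostname part of a tunnel URL, i.e. the value to use for HOST."""
    host = urlsplit(tunnel_url).hostname
    if not host:
        raise ValueError(f"Expected a full URL such as https://..., got {tunnel_url!r}")
    return host


def client_webhook_url(host: str) -> str:
    """Build the Voice webhook URL to paste into the Twilio Console."""
    return f"https://{host}/twiml/client"
```

For example, `client_webhook_url(host_from_tunnel_url("https://abc123.trycloudflare.com"))` produces the URL to enter in the "Voice Configuration" section.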

Geographic Permissions

Critical: Ensure that the country of your operator's phone number is enabled in Twilio's Geographic Permissions. If the operator's country is not enabled, Twilio will block outbound calls to that number.

To configure Geographic Permissions:

Go to Twilio Console → Voice → Geographic Permissions
Enable calling to the country where your operator's phone number is located

For detailed information about Geographic Permissions and toll fraud protection, see the Twilio Geographic Permissions documentation.

🚀 Usage

Starting the Server

make run

The server will start on http://0.0.0.0:${PORT}

Note: Make sure you have configured all environment variables in the Makefile before starting the server.

Making a Call

Client calls your Twilio number
System automatically calls the operator
Twilio streams the audio of both call legs to the server over WebSocket
Real-time translation begins automatically
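
To show how a call leg can be attached to the server's WebSocket endpoint, the sketch below builds a TwiML response with `<Connect><Stream>`, Twilio's verb for bidirectional Media Streams. Whether the project generates exactly this TwiML is an assumption; the function name `stream_twiml` and the example role value are hypothetical. The URL pattern follows the `/voice/{role}/{session_id}` endpoint listed under API Endpoints.

```python
import xml.etree.ElementTree as ET


def stream_twiml(host: str, role: str, session_id: str) -> str:
    """Return TwiML that connects the call's audio to a WebSocket stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Twilio requires a secure wss:// URL for Media Streams.
    ET.SubElement(connect, "Stream", url=f"wss://{host}/voice/{role}/{session_id}")
    return ET.tostring(response, encoding="unicode")
```

A handler for `POST /twiml/client` would return this string with the `application/xml` content type.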

Web Interface

Access the transcription interface at:

https://${HOST}:${PORT}/transcription

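Replace ${HOST} and ${PORT} with your actual server hostname/IP address and port number. When the server is reached through a Cloudflare Tunnel, omit the port: the tunnel serves HTTPS on the default port 443, as in the webhook URL above.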

📁 Project Structure

twilio-demo/
├── main.py                 # FastAPI application entry point
├── bridge.py               # Audio bridge and WebSocket handling
├── settings.py             # Configuration and role settings
├── transcription.py        # Transcription broadcasting
├── utils/
│   ├── audio.py           # Audio processing workers
│   ├── calls.py           # Call session management
│   └── worker.py          # Async process manager
├── templates/
│   └── transcription.html # Web interface template
├── static/
│   ├── css/
│   │   └── styles.css     # Styling for web interface
│   └── js/
│       └── app.js         # WebSocket client logic
├── pyproject.toml         # Project configuration
└── README.md              # This file

🔌 API Endpoints

HTTP Endpoints

POST /twiml/client - Handle incoming client calls
POST /voice/callback/{session_id} - Handle call status updates
POST /voice/disconnect/{role}/{session_id} - Handle call termination
GET /transcription - Web interface for transcriptions

WebSocket Endpoints

WS /voice/{role}/{session_id} - Audio streaming for calls
WS /transcription-ws - Real-time transcription updates
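
On the audio WebSocket, Twilio Media Streams exchanges JSON text messages in which `media` events carry base64-encoded audio. The sketch below shows one way to decode incoming frames and wrap outgoing ones. The helper names are hypothetical, and the project's own handling in `bridge.py` may differ.

```python
import base64
import json
from typing import Optional


def extract_audio(message: str) -> Optional[bytes]:
    """Return the raw audio bytes of a 'media' event, or None for other events."""
    data = json.loads(message)
    if data.get("event") != "media":
        return None  # e.g. 'connected', 'start', 'stop'
    return base64.b64decode(data["media"]["payload"])


def build_media_message(stream_sid: str, audio: bytes) -> str:
    """Wrap audio bytes in a 'media' message addressed to the given stream."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    })
```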

🎵 Audio Processing

The application processes audio in the following pipeline:

Input: μ-law encoded audio from Twilio (8kHz, mono)
Conversion: Convert to PCM (24kHz, 16-bit, mono)
Processing: Send to Palabra AI for ASR and translation
Output: Receive translated audio and mix with original
Delivery: Send mixed audio back to participants
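
The input and conversion steps can be sketched in pure Python as follows. This is an illustration, not the project's implementation (which lists NumPy and audioop in its stack): the function names are hypothetical, and linear interpolation stands in for a proper resampling filter.

```python
import struct


def ulaw_to_pcm16(data: bytes) -> list[int]:
    """Decode G.711 mu-law bytes into signed 16-bit sample values."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
        magnitude = (((u & 0x0F) << 3) + 0x84) << ((u >> 4) & 0x07)
        magnitude -= 0x84  # remove the encoder bias
        samples.append(-magnitude if u & 0x80 else magnitude)
    return samples


def upsample_x3(samples: list[int]) -> list[int]:
    """Upsample 8 kHz to 24 kHz by linear interpolation (three outputs per input)."""
    out = []
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        step = nxt - current
        out.extend((current, current + step // 3, current + (2 * step) // 3))
    return out


def twilio_frame_to_pcm24k(frame: bytes) -> bytes:
    """Convert a mu-law 8 kHz frame into s16le PCM at 24 kHz."""
    samples = upsample_x3(ulaw_to_pcm16(frame))
    return struct.pack(f"<{len(samples)}h", *samples)
```

The reverse path (24kHz PCM back to 8kHz μ-law) would decimate by three after low-pass filtering and then apply μ-law encoding.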

Audio Specifications

Input Format: μ-law, 8kHz, 1 channel
Processing Format: PCM s16le, 24kHz, 1 channel
Output Format: μ-law, 8kHz, 1 channel
Twilio Buffer Size: 960 bytes (20ms of 16-bit mono PCM at 24kHz)
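
The buffer size follows directly from the format: bytes per frame = sample rate × frame duration × bytes per sample × channels. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def frame_bytes(sample_rate_hz: int, bytes_per_sample: int,
                frame_ms: int = 20, channels: int = 1) -> int:
    """Size in bytes of one audio frame of the given duration."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample * channels
```

With these formats, `frame_bytes(24000, 2)` gives 960 bytes for the processing format, and `frame_bytes(8000, 1)` gives 160 bytes for the same 20ms of μ-law audio.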

🌐 Web Interface

The web interface provides:

Real-time Transcription: Live display of conversation
Translation Status: Indicates when translations are pending
Connection Status: WebSocket connection monitoring
Responsive Design: Works on desktop and mobile devices
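
One common way to push live updates to every open browser tab is a small in-process broadcaster with one queue per WebSocket client. The sketch below illustrates that pattern only: the class name and message fields are hypothetical and are not taken from the project's `transcription.py`.

```python
import asyncio


class TranscriptionBroadcaster:
    """Fan out transcription messages to every subscribed client queue."""

    def __init__(self) -> None:
        self._subscribers: set = set()

    def subscribe(self) -> asyncio.Queue:
        """Register a new client and return the queue it should read from."""
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        """Stop delivering messages to a client, e.g. after it disconnects."""
        self._subscribers.discard(queue)

    def publish(self, message: dict) -> None:
        """Deliver a message to all current subscribers without blocking."""
        for queue in self._subscribers:
            queue.put_nowait(message)
```

Each handler for `/transcription-ws` would call `subscribe()`, forward items from its queue to the socket, and call `unsubscribe()` when the client disconnects.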

📸 Screenshots

Main Interface

Transcription Display

🔒 Security

Twilio Signature Validation: All webhooks are verified
Environment Variables: Credentials are supplied through environment variables rather than hard-coded in application code
Input Validation: All user inputs are validated
Error Handling: Comprehensive error handling and logging
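
Twilio signs each webhook by computing an HMAC-SHA1 over the full request URL followed by the POST parameters (sorted by name, each name and value concatenated), keyed with the Auth Token, and sends the base64 result in the `X-Twilio-Signature` header. The sketch below reproduces that documented scheme with the standard library for illustration. The function names are hypothetical; in practice the official `twilio` helper library provides a request validator, and the project's own check may be implemented differently.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the expected signature for a form-encoded POST webhook."""
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(auth_token: str, url: str, params: Mapping[str, str],
                     header_signature: str) -> bool:
    """Compare the header value with the expected signature in constant time."""
    expected = compute_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_signature)
```

The URL must be the exact public URL Twilio called (for example the tunnel URL), not the local address the server listens on.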

🧪 Development

Code Quality Tools

Black: Code formatting
Ruff: Linting and formatting
isort: Import sorting
Vulture: Dead code detection

Running Development Tools

# Format code
ruff format .

# Lint code
ruff check .

# Sort imports
ruff check --select I .

# Check for dead code
vulture .

🐛 Troubleshooting

Common Issues

WebSocket Connection Failed
- Check if server is running
- Verify firewall settings
- Check WebSocket URL configuration

Audio Not Processing
- Verify Palabra AI credentials
- Check audio format compatibility
- Monitor server logs for errors

Calls Not Connecting
- Verify Twilio credentials
- Check phone number configuration
- Ensure proper webhook URLs

Logs

The application provides detailed logging:

INFO: Connection status and call events
WARNING: Non-critical issues
ERROR: Errors and exceptions

📝 API Documentation

Once the server is running, access the interactive API documentation at:

https://${HOST}:${PORT}/docs
