A real-time audio translation application built with FastAPI, Twilio, and Palabra AI that enables live voice conversations between speakers of different languages.

PalabraAI/twilio-demo

Twilio Demo - Real-time Audio Translation

A real-time audio translation application built with FastAPI, Twilio, and Palabra AI that enables live voice conversations between speakers of different languages.

🚀 Features

  • Real-time Audio Processing: Live audio streaming and processing using Twilio Media Streams
  • Automatic Speech Recognition: Real-time transcription with language detection (English/Russian)
  • Live Translation: Instant translation between the configured source and target languages
  • Web Interface: Real-time transcription display with WebSocket updates
  • Multi-party Calls: Support for client-operator conversations
  • Audio Mixing: Intelligent mixing of original and translated audio

πŸ—οΈ Architecture

┌─────────────┐    ┌─────────────┐    ┌───────────────────────┐
│   Client    │    │   Twilio    │    │   Websocket Server    │
│  (Phone)    │◄──►│  (Gateway)  │◄──►│      (FastAPI)        │
└─────────────┘    └─────────────┘    └───────────────────────┘
                           ▲                   ▲
                           │                   │
                           │                   │
                           │                   │
                           │                   ▼
                           │            ┌─────────────┐
                           │            │  Palabra    │
                           │            │     API     │
                           │            │             │
                           │            └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │  Operator   │
                    │  (Phone)    │
                    └─────────────┘

🛠️ Technology Stack

  • Backend: FastAPI, Python 3.11+
  • Audio Processing: NumPy, audioop
  • WebSocket: Starlette WebSockets
  • Telephony: Twilio API
  • AI Services: Palabra AI (ASR, Translation, TTS)
  • Frontend: HTML, CSS, JavaScript
  • Process Management: Multiprocessing with async workers

📋 Prerequisites

  • Python 3.11 or higher
  • Twilio paid account with phone numbers (free trial accounts have limitations)
  • Palabra AI API credentials
  • Environment variables configured

🔧 Installation

  1. Clone the repository

    git clone <repository-url>
    cd twilio-demo
  2. Configure environment variables. Edit the Makefile and set your actual values for:

    • Twilio credentials
    • Palabra AI credentials
    • Server configuration
    • Language settings
  3. Install dependencies

    make install
  4. Install development dependencies (optional)

    make dev

⚙️ Configuration

Environment Variables

All environment variables are configured in the Makefile. Edit the variables in the Makefile according to your setup:

# Environment variables
export TWILIO_ACCOUNT_SID = your_account_sid
export TWILIO_AUTH_TOKEN = your_auth_token
export TWILIO_NUMBER = your_twilio_phone_number
export PALABRA_CLIENT_ID = your_client_id
export PALABRA_CLIENT_SECRET = your_client_secret
export HOST = your_server_hostname_or_ip
export OPERATOR_NUMBER = operator_phone_number
export PORT = 7839
export SOURCE_LANGUAGE = en
export TARGET_LANGUAGE = pl

Variable Descriptions

Twilio Configuration

  • TWILIO_ACCOUNT_SID - Your Twilio Account SID
  • TWILIO_AUTH_TOKEN - Your Twilio Auth Token
  • TWILIO_NUMBER - Your Twilio phone number that clients will call

This Twilio article explains how to obtain both credentials.

Palabra AI Configuration

  • PALABRA_CLIENT_ID - Your Palabra AI client identifier
  • PALABRA_CLIENT_SECRET - Your Palabra AI client secret key

This Palabra article explains how to obtain both credentials.

Server Configuration

  • HOST - Your server's hostname or IP address (for local development you may use Cloudflare Tunnel URL or its alternatives)
  • OPERATOR_NUMBER - The operator's phone number for receiving calls
  • PORT - Server port number (defaults to 7839)

Language Configuration

  • SOURCE_LANGUAGE - Language spoken by the client (e.g., en, ru, de, es)
  • TARGET_LANGUAGE - Language spoken by the operator (e.g., en, ru, de, es)
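
On the Python side, the variables described above could be read roughly as follows. This is a minimal sketch, not the project's actual settings.py: the Settings dataclass and the load_settings function are hypothetical names introduced for illustration. Only the variable names and the default port 7839 come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names as listed in this README; PORT is optional and handled separately.
REQUIRED = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_NUMBER",
    "PALABRA_CLIENT_ID",
    "PALABRA_CLIENT_SECRET",
    "HOST",
    "OPERATOR_NUMBER",
    "SOURCE_LANGUAGE",
    "TARGET_LANGUAGE",
)


@dataclass(frozen=True)
class Settings:  # hypothetical container, not the project's settings.py
    twilio_account_sid: str
    twilio_auth_token: str
    twilio_number: str
    palabra_client_id: str
    palabra_client_secret: str
    host: str
    operator_number: str
    source_language: str
    target_language: str
    port: int = 7839


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail early with one message that names every missing variable."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    values = {name.lower(): env[name] for name in REQUIRED}
    return Settings(port=int(env.get("PORT", "7839")), **values)
```

Failing at startup with the full list of missing names tends to be easier to debug than a KeyError raised in the middle of the first call.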

🛠️ Makefile Commands

The project includes a Makefile for common operations:

Available Commands

  • make help - Show all available commands
  • make install - Create virtual environment and install dependencies
  • make dev - Install dependencies with development tools
  • make run - Start the server with environment variables from Makefile
  • make clean - Remove virtual environment
  • make format - Format code with black and isort
  • make check - Run all code quality checks

Environment Variables in Makefile

All environment variables are defined in the Makefile using export statements. This ensures they are available when running make run or other commands.

🌐 Local Development with Cloudflare Tunnel

For local development, you'll need to expose your local server to the internet so Twilio can send webhooks. The recommended tool for this is Cloudflare Tunnel (cloudflared).

Setting up Cloudflare Tunnel

  1. Install cloudflared. Follow the Cloudflare Tunnel documentation for installation instructions.

  2. Start cloudflared tunnel

    cloudflared tunnel --url http://localhost:${PORT}
  3. Copy the tunnel URL

    https://abc123.trycloudflare.com
    
  4. Set the HOST variable. Use the tunnel URL (without the protocol) as your HOST value:

    HOST=abc123.trycloudflare.com

Important Notes

  • HTTPS Required: Twilio requires HTTPS for webhooks, which Cloudflare Tunnel provides
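  • URL Stability: Quick tunnels started with cloudflared tunnel --url get a new random trycloudflare.com URL on every start; for a hostname that survives restarts, use a named tunnel on your own domain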
  • Update Twilio Webhooks: Remember to update your Twilio webhook URLs when the tunnel URL changes

Twilio Webhook Configuration

After setting up your tunnel, you need to configure Twilio webhooks to point to your server:

  1. Go to Twilio Console → Phone Numbers → Manage → Active numbers
  2. Click on your phone number
  3. In the "Voice Configuration" section, set:
    • Webhook URL: https://${HOST}/twiml/client
    • HTTP Method: POST

For detailed instructions, see the Twilio Phone Number Configuration documentation.

Important: Replace ${HOST} with your actual tunnel hostname (e.g., abc123.trycloudflare.com).
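
The two manual steps above (stripping the protocol from the tunnel URL and building the webhook URL) can be sketched in a few lines. The helper names host_from_tunnel_url and voice_webhook_url are hypothetical; the /twiml/client path is the one given in this README.

```python
from urllib.parse import urlparse


def host_from_tunnel_url(url: str) -> str:
    """Return the bare hostname of a tunnel URL (scheme, port and path removed)."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    if not parsed.hostname:
        raise ValueError(f"Cannot extract a hostname from {url!r}")
    return parsed.hostname


def voice_webhook_url(host: str) -> str:
    """Build the URL to enter in the Voice Configuration section."""
    return f"https://{host}/twiml/client"


host = host_from_tunnel_url("https://abc123.trycloudflare.com")
print(host)                     # abc123.trycloudflare.com
print(voice_webhook_url(host))  # https://abc123.trycloudflare.com/twiml/client
```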

Geographic Permissions

Critical: Ensure that the country of your operator's phone number is enabled in Twilio's Geographic Permissions. If the operator's country is not enabled, Twilio will block outbound calls to that number.

To configure Geographic Permissions:

  1. Go to Twilio Console → Voice → Geographic Permissions
  2. Enable calling to the country where your operator's phone number is located

For detailed information about Geographic Permissions and toll fraud protection, see the Twilio Geographic Permissions documentation.

🚀 Usage

Starting the Server

make run

The server will start on http://0.0.0.0:${PORT}

Note: Make sure you have configured all environment variables in the Makefile before starting the server.

Making a Call

  1. Client calls your Twilio number
  2. System automatically calls the operator
  3. Both parties are connected via WebSocket
  4. Real-time translation begins automatically
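
Step 3 relies on TwiML that tells Twilio to open a bidirectional media stream to the server. The sketch below shows what the response of POST /twiml/client might look like, using Twilio's Connect and Stream verbs. It is an illustration under assumptions, not the project's code: the function name client_twiml, the role value "client", and the UUID session id are hypothetical, and the real handler may build the XML with the Twilio helper library instead.

```python
import uuid
import xml.etree.ElementTree as ET


def client_twiml(host: str, session_id: str) -> str:
    """Return TwiML that connects the caller's audio to the WebSocket endpoint."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # URL pattern taken from the WS /voice/{role}/{session_id} endpoint;
    # "client" as the role value is an assumption.
    ET.SubElement(connect, "Stream", url=f"wss://{host}/voice/client/{session_id}")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


session_id = uuid.uuid4().hex  # hypothetical session identifier
print(client_twiml("abc123.trycloudflare.com", session_id))
```

Connect with Stream (rather than Start with Stream) is used here because the server has to send translated audio back into the call, which requires a bidirectional stream.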

Web Interface

Access the transcription interface at:

https://${HOST}:${PORT}/transcription

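Replace ${HOST} and ${PORT} with your actual server hostname/IP address and port number. When the server is reached through a Cloudflare Tunnel, omit the port: the tunnel serves HTTPS on the default port 443, as in the webhook URL above.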

πŸ“ Project Structure

twilio-demo/
├── main.py                 # FastAPI application entry point
├── bridge.py               # Audio bridge and WebSocket handling
├── settings.py             # Configuration and role settings
├── transcription.py        # Transcription broadcasting
├── utils/
│   ├── audio.py           # Audio processing workers
│   ├── calls.py           # Call session management
│   └── worker.py          # Async process manager
├── templates/
│   └── transcription.html # Web interface template
├── static/
│   ├── css/
│   │   └── styles.css     # Styling for web interface
│   └── js/
│       └── app.js         # WebSocket client logic
├── pyproject.toml         # Project configuration
└── README.md              # This file

🔌 API Endpoints

HTTP Endpoints

  • POST /twiml/client - Handle incoming client calls
  • POST /voice/callback/{session_id} - Handle call status updates
  • POST /voice/disconnect/{role}/{session_id} - Handle call termination
  • GET /transcription - Web interface for transcriptions

WebSocket Endpoints

  • WS /voice/{role}/{session_id} - Audio streaming for calls
  • WS /transcription-ws - Real-time transcription updates
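
The /transcription-ws endpoint has to deliver every transcription update to all connected browsers. One common way to do that is a small fan-out hub with one queue per connection. The sketch below is an assumption about how such broadcasting could work, not a copy of transcription.py; the class name TranscriptionHub and the message shape are hypothetical.

```python
import asyncio


class TranscriptionHub:
    """Fan out each transcription message to every subscribed listener."""

    def __init__(self) -> None:
        self._listeners = set()

    def subscribe(self) -> asyncio.Queue:
        """Register a listener and return the queue it should read from."""
        queue = asyncio.Queue()
        self._listeners.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        """Stop delivering messages to a listener (e.g. on disconnect)."""
        self._listeners.discard(queue)

    def publish(self, message: dict) -> None:
        """Queue the message for every current listener without blocking."""
        for queue in self._listeners:
            queue.put_nowait(message)


async def demo() -> None:
    hub = TranscriptionHub()
    browser = hub.subscribe()
    hub.publish({"role": "client", "text": "Hello"})  # hypothetical message shape
    print(await browser.get())


asyncio.run(demo())
```

In a WebSocket handler, each connection would call subscribe, forward every item from its queue to the socket, and call unsubscribe when the socket closes.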

🎵 Audio Processing

The application processes audio in the following pipeline:

  1. Input: μ-law encoded audio from Twilio (8kHz, mono)
  2. Conversion: Convert to PCM (24kHz, 16-bit, mono)
  3. Processing: Send to Palabra AI for ASR and translation
  4. Output: Receive translated audio and mix with original
  5. Delivery: Send mixed audio back to participants
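
Steps 1, 2 and 4 of the pipeline can be sketched in pure Python. This is an illustration under assumptions, not the code in utils/audio.py: the project lists NumPy and audioop (audioop was removed from the standard library in Python 3.13), while this sketch uses a plain G.711 lookup table, linear interpolation instead of a proper resampling filter, and a hypothetical gain of 0.2 for the original voice in the mix.

```python
import struct

BIAS = 0x84  # G.711 mu-law bias


def _decode_byte(value: int) -> int:
    """Decode one mu-law byte to a signed 16-bit sample (G.711)."""
    value = ~value & 0xFF
    magnitude = (((value & 0x0F) << 3) + BIAS) << ((value >> 4) & 0x07)
    magnitude -= BIAS
    return -magnitude if value & 0x80 else magnitude


ULAW_TABLE = [_decode_byte(b) for b in range(256)]


def ulaw_to_pcm16(frame: bytes) -> list:
    """Step 1-2a: mu-law bytes -> 16-bit PCM samples at the same rate."""
    return [ULAW_TABLE[b] for b in frame]


def upsample_3x(samples: list) -> list:
    """Step 2b: 8 kHz -> 24 kHz by linear interpolation (no filtering)."""
    out = []
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        step = (nxt - current) / 3
        out.extend((current, round(current + step), round(current + 2 * step)))
    return out


def mix(original: list, translated: list, original_gain: float = 0.2) -> list:
    """Step 4: attenuated original plus translation, clipped to int16."""
    mixed = []
    for i in range(max(len(original), len(translated))):
        o = original[i] * original_gain if i < len(original) else 0
        t = translated[i] if i < len(translated) else 0
        mixed.append(max(-32768, min(32767, round(o + t))))
    return mixed


def to_pcm_bytes(samples: list) -> bytes:
    """Serialize samples as little-endian signed 16-bit PCM (s16le)."""
    return struct.pack(f"<{len(samples)}h", *samples)


frame = bytes([0xFF] * 160)  # 20 ms of mu-law silence at 8 kHz
pcm_24k = to_pcm_bytes(upsample_3x(ulaw_to_pcm16(frame)))
print(len(pcm_24k))  # 960
```

A 160-byte μ-law frame becomes 480 samples after upsampling, which is 960 bytes of 16-bit PCM.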

Audio Specifications

  • Input Format: μ-law, 8kHz, 1 channel
  • Processing Format: PCM s16le, 24kHz, 1 channel
  • Output Format: μ-law, 8kHz, 1 channel
  • Twilio Buffer Size: 960 bytes (20 ms of 16-bit PCM at 24 kHz; the same 20 ms is 160 bytes as 8 kHz μ-law)
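
The buffer size follows directly from the formats listed above. A short calculation, assuming 20 ms frames and mono audio (the helper name frame_bytes is hypothetical):

```python
def frame_bytes(sample_rate_hz: int, bytes_per_sample: int,
                frame_ms: int = 20, channels: int = 1) -> int:
    """Size in bytes of one audio frame of the given duration."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * bytes_per_sample * channels


print(frame_bytes(8000, 1))   # mu-law, 8 kHz: 160 bytes
print(frame_bytes(24000, 2))  # PCM s16le, 24 kHz: 960 bytes
```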

🌐 Web Interface

The web interface provides:

  • Real-time Transcription: Live display of conversation
  • Translation Status: Indicates when translations are pending
  • Connection Status: WebSocket connection monitoring
  • Responsive Design: Works on desktop and mobile devices

📸 Screenshots

Main Interface

[Screenshot: main interface]

Transcription Display

[Screenshot: transcription display]

🔒 Security

  • Twilio Signature Validation: All webhooks are verified
  • Environment Variables: Sensitive data stored securely
  • Input Validation: All user inputs are validated
  • Error Handling: Comprehensive error handling and logging
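
For reference, Twilio signs each webhook by appending the POST parameters (sorted by name, each as name followed by value) to the full request URL, computing an HMAC-SHA1 of that string with the Auth Token as the key, and sending the Base64 result in the X-Twilio-Signature header. The sketch below reproduces that scheme with the standard library only. It is not the project's code, and the function names are hypothetical; the Twilio helper library ships its own RequestValidator class, which is likely the better choice in production.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def compute_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Signature for a form-encoded POST webhook, as Twilio computes it."""
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"),
                      payload.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(auth_token: str, url: str,
                     params: Mapping[str, str], header_signature: str) -> bool:
    """Constant-time comparison against the X-Twilio-Signature header value."""
    expected = compute_signature(auth_token, url, params)
    return hmac.compare_digest(expected.encode("utf-8"),
                               header_signature.encode("utf-8"))
```

The URL passed in must be the public one Twilio called (for example the tunnel URL), not the local address the server listens on, otherwise the signatures will not match.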

🧪 Development

Code Quality Tools

  • Black: Code formatting
  • Ruff: Linting and formatting
  • isort: Import sorting
  • Vulture: Dead code detection

Running Development Tools

# Format code
ruff format .

# Lint code
ruff check .

# Sort imports
ruff check --select I .

# Check for dead code
vulture .

πŸ› Troubleshooting

Common Issues

  1. WebSocket Connection Failed

    • Check if server is running
    • Verify firewall settings
    • Check WebSocket URL configuration
  2. Audio Not Processing

    • Verify Palabra AI credentials
    • Check audio format compatibility
    • Monitor server logs for errors
  3. Calls Not Connecting

    • Verify Twilio credentials
    • Check phone number configuration
    • Ensure proper webhook URLs

Logs

The application provides detailed logging:

  • INFO: Connection status and call events
  • WARNING: Non-critical issues
  • ERROR: Errors and exceptions
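
A logging setup matching the levels above could look like this. The logger name twilio_demo and the message format are assumptions for illustration; the project's real configuration may differ.

```python
import logging


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Attach one stream handler and set the minimum level to report."""
    logger = logging.getLogger("twilio_demo")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = configure_logging()
log.info("Call connected")            # connection status and call events
log.warning("Translation delayed")    # non-critical issues
log.error("Palabra session failed")   # errors and exceptions
```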

πŸ“ API Documentation

Once the server is running, access the interactive API documentation at:

https://${HOST}:${PORT}/docs
