A real-time voice AI assistant for conducting scaffolding safety inspections by phone, powered by OpenAI's Realtime API with Twilio integration, SQLite database storage, and Docker deployment.
- Real-time Voice Interaction: Speech-to-speech conversation with low latency
- Structured Data Collection: Tag identifier, inspector name, location, pass/fail, and comments
- SQLite Database: Persistent storage of all inspection records
- Twilio Integration: Connect via phone calls
- Caller Recognition: Automatically remembers and greets returning callers by name
- REST API: Query inspections by tag, location, result, or get statistics
- Docker Support: Easy deployment with Docker Compose
- MCP Tool Support: Extensible tool system for additional capabilities
- Interrupt Capable: Users can interrupt the AI mid-response
- Docker and Docker Compose (for containerized deployment)
- Node.js 20+ (for running locally without Docker)
- OpenAI API key with Realtime API access
- (Optional) Twilio account with phone number for phone integration
- (Optional) ngrok or similar tool for local development
- Clone and Configure
cp .env.example .env
Edit .env and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
- Build and Run
docker compose up -d
This will:
  - Build the Docker image
  - Start the container in detached mode
  - Create the ./data directory for database persistence
  - Expose the server on port 5050
- View Logs
docker compose logs -f
- Stop the Service
docker compose down
- Rebuild After Changes
docker compose up -d --build
The server will be available at http://localhost:5050
Database will be persisted in ./data/inspections.db
- Clone and Install
npm install
- Configure Environment
cp .env.example .env
Edit .env and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
- Run the Server
npm start
# or, for development with auto-reload:
npm run dev
The server will start on http://localhost:5050
- Run Tests (Optional)
npm test
- OPENAI_API_KEY: Your OpenAI API key (required)
- PORT: Server port (default: 5050)
- VOICE: Voice selection (alloy, echo, or shimmer; default: alloy)
- SYSTEM_MESSAGE: Customize the AI assistant's personality
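The README does not show how the server reads these variables. As a minimal sketch, assuming only the defaults listed above, a loader could validate them once at startup so misconfiguration fails fast; `loadConfig` and `ALLOWED_VOICES` are hypothetical names, not the project's actual code.

```javascript
// Hypothetical startup config loader; variable names and defaults come from the list above.
const ALLOWED_VOICES = ['alloy', 'echo', 'shimmer'];

function loadConfig(env = process.env) {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    // Same message as the "Missing OPENAI_API_KEY" troubleshooting entry
    throw new Error('Missing OPENAI_API_KEY');
  }

  // PORT is optional; fall back to 5050 and reject non-numeric or out-of-range values
  const port = env.PORT ? Number.parseInt(env.PORT, 10) : 5050;
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  // VOICE is optional; only the three documented voices are accepted here
  const voice = env.VOICE || 'alloy';
  if (!ALLOWED_VOICES.includes(voice)) {
    throw new Error(`Unsupported VOICE "${voice}"; expected one of ${ALLOWED_VOICES.join(', ')}`);
  }

  // SYSTEM_MESSAGE is optional and left undefined when not set
  return { apiKey, port, voice, systemMessage: env.SYSTEM_MESSAGE };
}
```

Calling `loadConfig()` with no argument reads `process.env`; passing a plain object makes the function easy to unit-test.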
For phone integration:
- Set up Twilio environment variables in .env:
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=your_phone_number
- Configure your Twilio phone number's webhook:
  - Voice webhook: https://your-domain/incoming-call
  - Method: POST
- For local development, expose the server with ngrok:
ngrok http 5050
Then use the ngrok forwarding URL, followed by /incoming-call, as your Twilio voice webhook.
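Twilio signs every webhook request with your auth token and sends the result in the `X-Twilio-Signature` header. The README does not say whether this server checks it; if you want to, the official `twilio` package provides `validateRequest`, and the hand-rolled sketch below shows what that check computes. `computeTwilioSignature` and `isValidTwilioRequest` are hypothetical names, and the sketch assumes the form-encoded POST body has already been parsed into a plain object.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: builds the string Twilio signs for a form-encoded POST
// (full request URL, then each POST parameter sorted by name as name+value),
// signs it with HMAC-SHA1 using the auth token, and base64-encodes the digest.
function computeTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto.createHmac('sha1', authToken).update(data, 'utf8').digest('base64');
}

// Hypothetical helper: compares the header value against the expected signature.
function isValidTwilioRequest(authToken, signatureHeader, url, params) {
  if (typeof signatureHeader !== 'string') return false;
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so compare lengths first
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

Behind ngrok, validate against the public ngrok URL rather than `http://localhost:5050/...`, because Twilio signs the URL it actually called.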
The server supports MCP (Model Context Protocol) for extending the AI with tools.
Enable MCP Servers:
- Edit .env and specify which servers to use:
MCP_SERVERS=weather,memory
- Configure each server command:
MCP_WEATHER_COMMAND=npx -y @modelcontextprotocol/server-weather
MCP_MEMORY_COMMAND=npx -y @modelcontextprotocol/server-memory
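One way a server could turn the `MCP_SERVERS` list and the matching `MCP_<NAME>_COMMAND` variables into launchable commands is sketched below. `parseMcpConfig` is a hypothetical helper, and mapping non-alphanumeric characters to underscores in the variable name (so `brave-search` reads `MCP_BRAVE_SEARCH_COMMAND`) is an assumption, not documented behavior.

```javascript
// Hypothetical parser for the MCP_SERVERS / MCP_<NAME>_COMMAND convention shown above.
function parseMcpConfig(env = process.env) {
  const names = (env.MCP_SERVERS || '')
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean); // ignore empty entries such as a trailing comma

  return names.map((name) => {
    // Assumption: characters invalid in env var names become underscores
    const key = `MCP_${name.toUpperCase().replace(/[^A-Z0-9]/g, '_')}_COMMAND`;
    const commandLine = env[key];
    if (!commandLine) {
      throw new Error(`MCP server "${name}" is enabled but ${key} is not set`);
    }
    // Naive whitespace split: quoted arguments containing spaces are not supported
    const [command, ...args] = commandLine.trim().split(/\s+/);
    return { name, command, args };
  });
}
```

Each entry could then be started with Node's `child_process.spawn(command, args)`, which is how stdio-based MCP servers are normally launched.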
Available MCP Servers:
- @modelcontextprotocol/server-weather: Weather information
- @modelcontextprotocol/server-memory: Persistent memory across conversations
- @modelcontextprotocol/server-filesystem: File system operations
- @modelcontextprotocol/server-sqlite: SQLite database queries
- @modelcontextprotocol/server-github: GitHub API operations
- @modelcontextprotocol/server-brave-search: Web search
See mcp-config.example.json for more examples.
The application provides REST API endpoints to query inspection data:
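# List inspections
curl "http://localhost:5050/inspections"
# Limit the number of results
curl "http://localhost:5050/inspections?limit=50"
# Look up inspections by tag
curl "http://localhost:5050/inspections/tag/TAG-12345"
# Failed inspections only
curl "http://localhost:5050/inspections/result/FAIL"
# Passed inspections, limited to 20
curl "http://localhost:5050/inspections/result/PASS?limit=20"
# Inspections at a location (URL-encode spaces as %20)
curl "http://localhost:5050/inspections/location/Building%207"
# Summary statistics
curl "http://localhost:5050/inspections/stats"
The stats endpoint returns: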
{
"stats": {
"total": 150,
"passed": 142,
"failed": 8,
"unique_inspectors": 12,
"unique_locations": 25
}
}
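The same endpoints can be called from Node.js. This sketch assumes the routes exactly as in the curl examples above; `inspectionsUrl` and `fetchStats` are hypothetical helper names, and the response shape is taken from the stats example.

```javascript
// Hypothetical client helpers for the REST endpoints listed above.
const BASE_URL = 'http://localhost:5050';

// Builds one of the documented routes. The API exposes one filter per route,
// so tag takes precedence over result, and result over location.
function inspectionsUrl({ tag, result, location, limit } = {}, base = BASE_URL) {
  let path = '/inspections';
  if (tag) path += `/tag/${encodeURIComponent(tag)}`;
  else if (result) path += `/result/${encodeURIComponent(result)}`;
  else if (location) path += `/location/${encodeURIComponent(location)}`;

  const url = new URL(path, base);
  if (limit !== undefined) url.searchParams.set('limit', String(limit));
  return url.toString();
}

// Uses the global fetch that ships with Node.js 18+ (so with the required Node.js 20+).
async function fetchStats(base = BASE_URL) {
  const response = await fetch(new URL('/inspections/stats', base));
  if (!response.ok) throw new Error(`Stats request failed: HTTP ${response.status}`);
  const body = await response.json();
  return body.stats; // { total, passed, failed, unique_inspectors, unique_locations }
}
```

For example, `inspectionsUrl({ location: 'Building 7' })` produces the same URL as the `Building%207` curl example, and `await fetchStats()` resolves to the `stats` object shown above.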
- Storage: SQLite database at ./data/inspections.db
- Schema:
  - Inspections table: Equipment ID, inspector name, location, pass/fail result, comments, phone number, timestamps
  - Callers table: Phone number, caller name, first/last call timestamps, total calls
- Caller Recognition: Phone numbers are automatically associated with names for personalized greetings
- Persistence: The ./data directory is mounted into the Docker container, so the database survives container restarts and rebuilds
- Backup: Copy the data/ directory, preferably while the service is stopped so no write is in progress
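Caller recognition amounts to keying the Callers table on the phone number. The sketch below uses an in-memory `Map` as a stand-in for that table so it runs without SQLite; `createCallerStore`, `recordCall`, `greetingFor`, and the camelCase field names are all hypothetical. In the real database, the same logic would typically be a single upsert on the phone number column.

```javascript
// Hypothetical in-memory stand-in for the Callers table, keyed by phone number.
function createCallerStore() {
  return new Map();
}

// Records a call: creates the caller on first contact, otherwise bumps the counters.
function recordCall(store, phoneNumber, callerName, now = new Date()) {
  const existing = store.get(phoneNumber);
  if (existing) {
    existing.lastCallAt = now;
    existing.totalCalls += 1;
    if (callerName) existing.callerName = callerName; // keep the most recently given name
    return existing;
  }
  const record = {
    phoneNumber,
    callerName: callerName || null,
    firstCallAt: now,
    lastCallAt: now,
    totalCalls: 1,
  };
  store.set(phoneNumber, record);
  return record;
}

// Returns a personalized greeting for known callers, or null for unknown ones.
function greetingFor(store, phoneNumber) {
  const caller = store.get(phoneNumber);
  return caller && caller.callerName ? `Welcome back, ${caller.callerName}!` : null;
}
```

On a first call `greetingFor` returns `null`; once the name has been saved with `recordCall`, the next call from that number gets "Welcome back, John!".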
1. User calls the Twilio number
2. The system recognizes returning callers and greets them by name (e.g., "Welcome back, John!")
3. AI: "What's your inspection tag number?" (or equipment location)
4. User provides the tag (e.g., "SCAFF-001")
5. AI collects the inspector's name and location
6. The system saves the caller's name for future calls
7. AI asks: "Does the scaffolding pass or fail?"
8. AI asks: "Any concerns to note?"
9. AI submits the structured JSON data to the database
10. AI: "You may now hang up, or let me know if you'd like to enter another inspection"
11. User can either hang up or record additional inspections (loops back to step 3)
The application includes a comprehensive test suite with 112 tests covering:
- Database operations (34 tests), including safety checks
- Equipment management (36 tests)
- Validation logic (29 tests)
- Integration workflows (13 tests)
Run tests:
# Run all tests
npm test
# Run specific test suites
npm run test:database
npm run test:equipment
npm run test:validation
npm run test:integration
# Watch mode for development
npm run test:watch
Test coverage:
- ✅ 112 passing tests
- ✅ All core modules covered
- ✅ Edge cases and error handling
- ✅ Production safety checks
- ✅ Fast execution (<1 second)
ai-realtime-audio/
├── index.js # Main server and WebSocket handling
├── database.js # SQLite database operations
├── equipment.js # Equipment registry
├── validation.js # Input validation logic
├── system-prompt.txt # AI system instructions
├── data/ # SQLite database storage
├── test/ # Test suite
│   ├── database.test.js
│   ├── equipment.test.js
│   ├── validation.test.js
│   ├── integration.test.js
│   └── README.md
├── .env # Configuration
├── package.json # Dependencies and scripts
├── Dockerfile # Docker image definition
└── docker-compose.yml # Docker Compose configuration
When adding new functionality:
- Add tests in the appropriate test file
- Run tests to ensure they pass:
npm test
- Update documentation if needed
Example test:
// Assumes a chai-style `expect`, imported the same way as in the existing test files.
// myFunction, input, and expected are placeholders for the code under test.
describe('New Feature', function() {
  it('should do something specific', function() {
    const result = myFunction(input);
    expect(result).to.equal(expected);
  });
});
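As a more concrete instance of the testing pattern above, here is a hypothetical validator for the record the AI submits at the end of a call (tag, inspector name, location, pass/fail, comments). The field names and normalization rules (trimming, uppercasing the tag and result) are assumptions; the project's validation.js may behave differently.

```javascript
// Hypothetical validator for the structured inspection record submitted after a call.
// Field names (tag, inspectorName, location, result, comments) are assumptions.
const RESULTS = ['PASS', 'FAIL'];

function validateInspection(input) {
  const data = input || {};
  const errors = [];
  // Non-string values (including undefined) are treated as empty
  const clean = (value) => (typeof value === 'string' ? value.trim() : '');

  const tag = clean(data.tag).toUpperCase();
  const inspectorName = clean(data.inspectorName);
  const location = clean(data.location);
  const result = clean(data.result).toUpperCase();
  const comments = clean(data.comments);

  if (!tag) errors.push('tag is required');
  if (!inspectorName) errors.push('inspectorName is required');
  if (!location) errors.push('location is required');
  if (!RESULTS.includes(result)) errors.push('result must be PASS or FAIL');

  return {
    valid: errors.length === 0,
    errors,
    // Normalized record; empty comments are stored as null
    value: { tag, inspectorName, location, result, comments: comments || null },
  };
}
```

Because it is a pure function, it is easy to cover with a `describe`/`it` test like the one above, e.g. asserting that `validateInspection({ ...record, result: 'maybe' }).valid` is `false`.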
You can test the WebSocket connection directly:
# Connect to the WebSocket endpoint
wscat -c ws://localhost:5050/media-stream
Call your Twilio phone number and start talking to the AI assistant!
Phone Call → Twilio → WebSocket (media-stream) → Server → OpenAI Realtime API
                                                   ↓
                                          MCP Servers (tools)
The server:
- Receives audio from Twilio via WebSocket
- Forwards audio to OpenAI Realtime API
- Handles tool calls via MCP servers
- Streams AI responses back to Twilio
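The audio relay in the middle of this pipeline is mostly message translation. The sketch below shows it as two pure functions, assuming the Realtime session is configured with `g711_ulaw` input and output audio (Twilio Media Streams carry 8 kHz μ-law audio as base64), so payloads can pass through unchanged, and that server-side voice activity detection is enabled. Event names follow the Realtime API beta (`input_audio_buffer.append`, `response.audio.delta`, `input_audio_buffer.speech_started`); newer API versions may rename some of them, and `twilioToOpenAI`/`openAIToTwilio` are hypothetical names.

```javascript
// Hypothetical translation from a parsed Twilio Media Streams message to a Realtime API event.
function twilioToOpenAI(twilioMessage) {
  if (twilioMessage.event === 'media') {
    // Forward the caller's base64 mu-law audio as-is
    return { type: 'input_audio_buffer.append', audio: twilioMessage.media.payload };
  }
  return null; // 'start', 'stop', 'mark', etc. would be handled elsewhere
}

// Hypothetical translation from a parsed Realtime API event to Twilio messages.
function openAIToTwilio(openAIEvent, streamSid) {
  switch (openAIEvent.type) {
    case 'response.audio.delta':
      // Stream a chunk of AI speech back to the caller
      return [{ event: 'media', streamSid, media: { payload: openAIEvent.delta } }];
    case 'input_audio_buffer.speech_started':
      // Caller started talking: tell Twilio to drop audio it has buffered
      return [{ event: 'clear', streamSid }];
    default:
      return [];
  }
}
```

In a WebSocket handler, each Twilio message would be parsed with `JSON.parse`, passed to `twilioToOpenAI`, and any non-null result sent to the OpenAI socket with `JSON.stringify`; the reverse direction works the same way. Sending Twilio's `clear` message on `speech_started` is what lets a caller interrupt the AI mid-response; a fuller implementation would also truncate or cancel the in-progress response on the OpenAI side.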
- GET /: Health check and status
- POST /incoming-call: Twilio webhook for incoming calls (returns TwiML)
- WS /media-stream: WebSocket endpoint for audio streaming
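For `POST /incoming-call`, Twilio expects TwiML in the response. A minimal sketch, assuming the stream URL is built from the host the webhook request arrived on: `<Connect><Stream>` opens a bidirectional Media Stream to the `/media-stream` WebSocket for the rest of the call. `incomingCallTwiml` and `escapeXml` are hypothetical names; the project's actual TwiML may include additional verbs (for example a spoken greeting).

```javascript
// Escapes a value for safe use inside an XML attribute.
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical TwiML builder: connects the call to this server's media-stream WebSocket.
function incomingCallTwiml(host) {
  const streamUrl = `wss://${host}/media-stream`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="${escapeXml(streamUrl)}" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}
```

A handler would send this with `Content-Type: text/xml`, e.g. `incomingCallTwiml(req.headers.host)`. Building the URL from the request's Host header keeps the TwiML in sync with the public ngrok or production domain, which is the mismatch the "No audio in Twilio calls" entry warns about.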
# Run with auto-reload
npm run dev
# View logs
# The server logs all events for debugging
"Missing OPENAI_API_KEY"
- Ensure the .env file exists and contains your OpenAI API key
MCP server not starting
- Check that the command is correct in .env
- Ensure the MCP server package is accessible (npx will auto-install it)
- Check server logs for specific error messages
No audio in Twilio calls
- Verify webhook URL is publicly accessible
- Check that WebSocket URL in TwiML matches your server
- Ensure OpenAI API key has Realtime API access
License: ISC