AI Realtime Audio Assistant - Scaffolding Inspection

A real-time voice AI assistant for conducting scaffolding safety inspections by phone. It is powered by OpenAI's Realtime API, uses Twilio for phone calls and SQLite for storage, and can be deployed with Docker.

Features

- Real-time Voice Interaction: Speech-to-speech conversation with low latency
- Structured Data Collection: Tag identifier, inspector name, location, pass/fail, and comments
- SQLite Database: Persistent storage of all inspection records
- Twilio Integration: Connect via phone calls
- Caller Recognition: Automatically remembers and greets returning callers by name
- REST API: Query inspections by tag, location, or result, or get statistics
- Docker Support: Easy deployment with Docker Compose
- MCP Tool Support: Extensible tool system for additional capabilities
- Interrupt Capable: Users can interrupt the AI mid-response

Prerequisites

Option 1: Docker (Recommended)

- Docker
- Docker Compose
- OpenAI API key with Realtime API access

Option 2: Local Development

- Node.js 20+
- OpenAI API key with Realtime API access
- (Optional) Twilio account with a phone number for phone integration
- (Optional) ngrok or a similar tunneling tool for local development

Quick Start with Docker

Clone and Configure

```
cp .env.example .env
```

Edit .env and add your OpenAI API key:

```
OPENAI_API_KEY=your_openai_api_key_here
```

Build and Run
```
docker compose up -d
```
This will:
- Build the Docker image
- Start the container in detached mode
- Create the ./data directory for database persistence
- Expose the server on port 5050
View Logs
```
docker compose logs -f
```
Stop the Service
```
docker compose down
```
Rebuild After Changes
```
docker compose up -d --build
```

The server will be available at http://localhost:5050

The database will be persisted in ./data/inspections.db.

Quick Start (Local Development)

Clone and Install
```
npm install
```

Configure Environment

```
cp .env.example .env
```

Edit .env and add your OpenAI API key:

```
OPENAI_API_KEY=your_openai_api_key_here
```

Run the Server

```
npm start
# or, for development with auto-reload:
npm run dev
```

Server will start on http://localhost:5050

Run Tests (Optional)
```
npm test
```

Configuration

Basic Configuration

- OPENAI_API_KEY: Your OpenAI API key (required)
- PORT: Server port (default: 5050)
- VOICE: Voice to use: alloy, echo, or shimmer (default: alloy)
- SYSTEM_MESSAGE: Customizes the AI assistant's personality
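As an illustration of how these variables and defaults fit together, a startup check might look like the sketch below. It is not the project's index.js: the function name loadConfig and the choice to reject voices outside the three listed are assumptions; only the variable names and defaults come from the list above.

```javascript
// Hypothetical helper: reads the environment variables documented above
// and applies the documented defaults (PORT 5050, VOICE alloy).
const ALLOWED_VOICES = ['alloy', 'echo', 'shimmer'];

function loadConfig(env = process.env) {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    // Same message as the "Missing OPENAI_API_KEY" troubleshooting entry.
    throw new Error('Missing OPENAI_API_KEY');
  }

  const port = env.PORT ? Number.parseInt(env.PORT, 10) : 5050;
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  const voice = env.VOICE || 'alloy';
  if (!ALLOWED_VOICES.includes(voice)) {
    // Assumption: this sketch only accepts the voices listed in the README.
    throw new Error(`Unsupported VOICE: ${voice}`);
  }

  return {
    apiKey,
    port,
    voice,
    // Optional; assumption: undefined means "fall back to the default prompt".
    systemMessage: env.SYSTEM_MESSAGE,
  };
}
```

For example, loadConfig({ OPENAI_API_KEY: 'sk-test' }) yields port 5050 and voice 'alloy', while an empty environment fails fast with the same error named under Troubleshooting.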

Twilio Configuration (Optional)

For phone integration:

Set up Twilio environment variables in .env:

```
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=your_phone_number
```

Configure your Twilio phone number's webhook:
- Voice webhook: https://your-domain/incoming-call
- Method: POST
For local development, use ngrok:
```
ngrok http 5050
```
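Then set your Twilio voice webhook to the HTTPS forwarding URL that ngrok prints, with /incoming-call appended (e.g. https://your-ngrok-domain/incoming-call).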
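When Twilio posts to /incoming-call, the server replies with TwiML that tells Twilio to open a media stream back to /media-stream. The sketch below builds such a response with Twilio's `<Connect><Stream>` TwiML; the function name and the idea of taking the public host from the request are assumptions, not the project's actual code.

```javascript
// Escape characters that are special inside XML attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: builds the TwiML returned by POST /incoming-call.
// `host` is the public host Twilio reached (e.g. the ngrok domain), so the
// stream URL points back at this server's /media-stream WebSocket.
function buildIncomingCallTwiml(host) {
  const streamUrl = `wss://${host}/media-stream`;
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="${escapeXml(streamUrl)}" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}
```

A route handler would send this string with a text/xml content type. Deriving the host per request means the same code works behind ngrok and behind a real domain.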

MCP Tool Integration

The server supports MCP (Model Context Protocol) for extending the AI with tools.

Enable MCP Servers:

Edit .env and specify which servers to use:
```
MCP_SERVERS=weather,memory
```

Configure each server command:

```
MCP_WEATHER_COMMAND=npx -y @modelcontextprotocol/server-weather
MCP_MEMORY_COMMAND=npx -y @modelcontextprotocol/server-memory
```
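The MCP_SERVERS list and the per-server MCP_<NAME>_COMMAND variables can be combined into launchable commands roughly as sketched below. This is an assumption about how the server reads them, not its actual code; the function name is hypothetical, and mapping hyphens in server names to underscores (so "brave-search" reads MCP_BRAVE_SEARCH_COMMAND) is likewise an assumption.

```javascript
// Hypothetical helper: turns MCP_SERVERS=weather,memory plus the matching
// MCP_<NAME>_COMMAND variables into { name, command, args } entries that a
// process launcher such as child_process.spawn(command, args) could use.
function parseMcpServers(env = process.env) {
  const names = (env.MCP_SERVERS || '')
    .split(',')
    .map((name) => name.trim())
    .filter(Boolean);

  return names.map((name) => {
    // Assumption: hyphens become underscores to form a valid variable name.
    const key = `MCP_${name.toUpperCase().replace(/-/g, '_')}_COMMAND`;
    const raw = env[key];
    if (!raw) {
      throw new Error(`MCP server "${name}" is enabled but ${key} is not set`);
    }
    // Naive whitespace split: fine for the npx commands shown above, but it
    // does not handle quoted arguments that contain spaces.
    const [command, ...args] = raw.trim().split(/\s+/);
    return { name, command, args };
  });
}
```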

Available MCP Servers:

- @modelcontextprotocol/server-weather: Weather information
- @modelcontextprotocol/server-memory: Persistent memory across conversations
- @modelcontextprotocol/server-filesystem: File system operations
- @modelcontextprotocol/server-sqlite: SQLite database queries
- @modelcontextprotocol/server-github: GitHub API operations
- @modelcontextprotocol/server-brave-search: Web search

See mcp-config.example.json for more examples.

Inspection Data API

The application provides REST API endpoints to query inspection data:

Get All Inspections

```
curl http://localhost:5050/inspections
curl "http://localhost:5050/inspections?limit=50"
```

Get Inspection by Tag

```
curl http://localhost:5050/inspections/tag/TAG-12345
```

Filter by Result (PASS/FAIL)

```
curl http://localhost:5050/inspections/result/FAIL
curl "http://localhost:5050/inspections/result/PASS?limit=20"
```

Search by Location

```
curl http://localhost:5050/inspections/location/Building%207
```
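The same queries can be made from Node.js. The sketch below builds the URLs for the endpoints above and fetches them with the fetch function built into Node.js 18+. The helper names are hypothetical, and since the README only documents the shape of the stats response, the parsed JSON is returned as-is.

```javascript
// Hypothetical helper: builds a URL for the inspection endpoints above.
// baseUrl is the server origin, e.g. "http://localhost:5050".
// Path segments are percent-encoded, so "Building 7" becomes "Building%207".
function inspectionsUrl(baseUrl, { tag, result, location, limit } = {}) {
  let path = '/inspections';
  if (tag) path += `/tag/${encodeURIComponent(tag)}`;
  else if (result) path += `/result/${encodeURIComponent(result.toUpperCase())}`;
  else if (location) path += `/location/${encodeURIComponent(location)}`;

  const url = new URL(path, baseUrl);
  if (limit !== undefined) url.searchParams.set('limit', String(limit));
  return url.toString();
}

// Fetches one of the endpoints and returns the parsed JSON body.
async function getInspections(baseUrl, filters) {
  const response = await fetch(inspectionsUrl(baseUrl, filters));
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, getInspections('http://localhost:5050', { result: 'PASS', limit: 20 }) requests the same URL as the second curl command under "Filter by Result".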

Get Statistics

```
curl http://localhost:5050/inspections/stats
```

Returns:

```
{
  "stats": {
    "total": 150,
    "passed": 142,
    "failed": 8,
    "unique_inspectors": 12,
    "unique_locations": 25
  }
}
```
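A client might turn this payload into a pass rate, as in the sketch below. The helper is hypothetical, and its consistency check assumes every stored inspection is either PASS or FAIL (true for the sample above, where 142 + 8 = 150).

```javascript
// Hypothetical helper: derives a pass rate from the /inspections/stats payload.
function summarizeStats({ stats }) {
  const { total, passed, failed } = stats;
  if (passed + failed !== total) {
    // Assumption: every inspection is recorded as either PASS or FAIL.
    console.warn(`passed + failed (${passed + failed}) does not match total (${total})`);
  }
  const passRate = total > 0 ? passed / total : 0;
  return {
    total,
    passed,
    failed,
    passRatePercent: Math.round(passRate * 1000) / 10, // one decimal place
  };
}
```

With the sample response above, summarizeStats returns a passRatePercent of 94.7.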

Database

- Storage: SQLite database at ./data/inspections.db
- Schema:
  - Inspections table: Equipment ID, inspector name, location, pass/fail result, comments, phone number, timestamps
  - Callers table: Phone number, caller name, first/last call timestamps, total calls
- Caller Recognition: Phone numbers are automatically associated with names for personalized greetings
- Persistence: Under Docker, the database is persisted in ./data on the host via a volume mount
- Backup: Copy the data/ directory, preferably while the server is stopped so the SQLite file is not copied mid-write
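The caller-recognition logic implied by the Callers table can be modeled in memory as below. This is a sketch, not database.js: the class and method names are hypothetical, a Map stands in for the SQLite table (a real version would run an equivalent upsert), and the first-time greeting text is invented for illustration.

```javascript
// In-memory stand-in for the Callers table (phone number, caller name,
// first/last call timestamps, total calls).
class CallerRegistry {
  constructor() {
    this.callers = new Map(); // phone number -> caller record
  }

  // Records a call: creates the caller on first contact, otherwise updates
  // the last-call timestamp and increments the call count.
  recordCall(phoneNumber, now = new Date()) {
    const existing = this.callers.get(phoneNumber);
    if (existing) {
      existing.lastCall = now;
      existing.totalCalls += 1;
      return existing;
    }
    const record = { phoneNumber, name: undefined, firstCall: now, lastCall: now, totalCalls: 1 };
    this.callers.set(phoneNumber, record);
    return record;
  }

  // Saves the name collected during the call for future greetings.
  setName(phoneNumber, name) {
    const record = this.callers.get(phoneNumber);
    if (record) record.name = name;
  }

  // Personalized greeting for known callers; the fallback text is hypothetical.
  greetingFor(phoneNumber) {
    const record = this.callers.get(phoneNumber);
    return record && record.name
      ? `Welcome back, ${record.name}!`
      : 'Welcome to the scaffolding inspection line.';
  }
}
```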

Usage

Inspection Call Flow

1. User calls the Twilio number
2. System recognizes returning callers and greets them by name (e.g., "Welcome back, John!")
3. AI: "What's your inspection tag number?" (or equipment location)
4. User provides the tag (e.g., "SCAFF-001")
5. AI collects the inspector's name and the location
6. System saves the caller's name for future calls
7. AI asks: "Does the scaffolding pass or fail?"
8. AI asks: "Any concerns to note?"
9. AI submits the structured JSON data to the database
10. AI: "You may now hang up, or let me know if you'd like to enter another inspection"
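Before the structured record is saved, it can be checked for the fields the call flow collects. The validator below is a hypothetical sketch, not the contents of validation.js; the field names (tag, inspectorName, location, result, comments) are assumptions based on the fields listed in this README.

```javascript
// Hypothetical validator for the structured inspection record.
function validateInspection(record) {
  if (record === null || typeof record !== 'object') {
    return { valid: false, errors: ['record must be an object'], value: null };
  }

  const errors = [];
  const requireText = (field) => {
    if (typeof record[field] !== 'string' || record[field].trim() === '') {
      errors.push(`${field} is required`);
    }
  };

  requireText('tag');
  requireText('inspectorName');
  requireText('location');

  // Normalize spoken answers such as "pass" to the stored PASS/FAIL values.
  const result = typeof record.result === 'string' ? record.result.trim().toUpperCase() : '';
  if (result !== 'PASS' && result !== 'FAIL') {
    errors.push('result must be PASS or FAIL');
  }

  // Comments are optional, but must be text when present.
  if (record.comments !== undefined && typeof record.comments !== 'string') {
    errors.push('comments must be a string when provided');
  }

  return { valid: errors.length === 0, errors, value: { ...record, result } };
}
```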

Development

Testing

The application includes a test suite with 112 tests covering:

- Database operations (34 tests), including safety checks
- Equipment management (36 tests)
- Validation logic (29 tests)
- Integration workflows (13 tests)

Run tests:

```
# Run all tests
npm test

# Run specific test suites
npm run test:database
npm run test:equipment
npm run test:validation
npm run test:integration

# Watch mode for development
npm run test:watch
```

Test coverage:

✅ 112 passing tests
✅ All core modules covered
✅ Edge cases and error handling
✅ Production safety checks
✅ Fast execution (<1 second)

Project Structure

```
ai-realtime-audio/
├── index.js              # Main server and WebSocket handling
├── database.js           # SQLite database operations
├── equipment.js          # Equipment registry
├── validation.js         # Input validation logic
├── system-prompt.txt     # AI system instructions
├── data/                 # SQLite database storage
├── test/                 # Test suite
│   ├── database.test.js
│   ├── equipment.test.js
│   ├── validation.test.js
│   ├── integration.test.js
│   └── README.md
├── .env                  # Configuration (created from .env.example)
├── package.json          # Dependencies and scripts
├── Dockerfile            # Docker image definition
└── docker-compose.yml    # Docker Compose configuration
```

Adding New Tests

When adding new functionality:

- Add tests in the appropriate test file
- Run the tests to ensure they pass: npm test
- Update the documentation if needed

Example test:

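```
const { expect } = require('chai'); // assumes chai (CommonJS build) is a dev dependency

// Hypothetical module under test; replace with the module you are testing
const { myFunction } = require('../my-module');

describe('New Feature', function () {
  it('should do something specific', function () {
    const result = myFunction('input');
    expect(result).to.equal('expected');
  });
});
```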

Inspection call flow, final step: the user can either hang up or record additional inspections (the flow loops back to step 3).

Testing Without Twilio

You can test the WebSocket connection directly:

```
# Install wscat if needed: npm install -g wscat
# Connect to the WebSocket endpoint
wscat -c ws://localhost:5050/media-stream
```
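wscat only sends the text you type, so exercising the server this way means sending messages shaped like Twilio's Media Streams protocol. The hypothetical helper below generates them (connected, start, one media message per audio chunk, then stop). The event names and field layout follow Twilio's Media Streams format; the SIDs are fake placeholders, and which fields this server actually reads is an assumption.

```javascript
// Builds JSON strings shaped like Twilio Media Streams messages, which can be
// pasted into a wscat session (one message per line) to simulate a call.
function fakeTwilioMessages(audioChunks, streamSid = 'MZ_FAKE_STREAM', callSid = 'CA_FAKE_CALL') {
  const messages = [
    { event: 'connected', protocol: 'Call', version: '1.0.0' },
    {
      event: 'start',
      sequenceNumber: '1',
      streamSid,
      start: {
        streamSid,
        callSid,
        tracks: ['inbound'],
        mediaFormat: { encoding: 'audio/x-mulaw', sampleRate: 8000, channels: 1 },
      },
    },
  ];

  audioChunks.forEach((chunk, i) => {
    messages.push({
      event: 'media',
      sequenceNumber: String(i + 2),
      streamSid,
      media: {
        track: 'inbound',
        chunk: String(i + 1),
        timestamp: String(i * 20), // assumes 20 ms chunks
        payload: Buffer.from(chunk).toString('base64'), // raw 8 kHz mu-law bytes
      },
    });
  });

  messages.push({ event: 'stop', sequenceNumber: String(audioChunks.length + 2), streamSid, stop: { callSid } });
  return messages.map((message) => JSON.stringify(message));
}
```

For example, fakeTwilioMessages([Buffer.alloc(160, 0xff)]).forEach((line) => console.log(line)) prints four lines; 160 bytes of 0xFF is 20 ms of mu-law silence at 8 kHz.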

With Twilio

Call your Twilio phone number and start talking to the AI assistant!

Architecture

```
Phone Call → Twilio → WebSocket (media-stream) → Server → OpenAI Realtime API
                                                      ↓
                                                 MCP Servers (tools)
```

The server:

- Receives audio from Twilio via WebSocket
- Forwards audio to the OpenAI Realtime API
- Handles tool calls via MCP servers
- Streams AI responses back to Twilio
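The relay described above amounts to translating messages between two WebSocket protocols. The sketch below shows that mapping as pure functions. It assumes the Realtime session is configured for G.711 μ-law (g711_ulaw) audio, so payloads pass through unchanged, and it uses the Realtime API's beta event names (input_audio_buffer.append, response.audio.delta, input_audio_buffer.speech_started); later API versions rename some output events, so check them against the version the server targets. The function names are hypothetical.

```javascript
// Twilio -> OpenAI: forward caller audio. Returns null for Twilio messages
// that need no forwarding (start/stop are handled separately).
function twilioToOpenAI(twilioMsg) {
  if (twilioMsg.event === 'media') {
    return { type: 'input_audio_buffer.append', audio: twilioMsg.media.payload };
  }
  return null;
}

// OpenAI -> Twilio: forward assistant audio, or tell Twilio to drop queued
// playback when the caller starts speaking (the interrupt feature).
function openAIToTwilio(openAIEvent, streamSid) {
  switch (openAIEvent.type) {
    case 'response.audio.delta':
      return { event: 'media', streamSid, media: { payload: openAIEvent.delta } };
    case 'input_audio_buffer.speech_started':
      return { event: 'clear', streamSid };
    default:
      return null;
  }
}
```

In a server, each non-null result would be serialized with JSON.stringify and sent on the opposite socket. A fuller interrupt implementation may also truncate the interrupted response on the OpenAI side (conversation.item.truncate).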

API Endpoints

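- GET /: Health check and status
- POST /incoming-call: Twilio webhook for incoming calls (returns TwiML)
- WS /media-stream: WebSocket endpoint for audio streaming
- GET /inspections, /inspections/tag/{tag}, /inspections/result/{result}, /inspections/location/{location}, /inspections/stats: Inspection data queries (see Inspection Data API)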

Development

```
# Run with auto-reload
npm run dev

# View logs
# The server logs all events for debugging
```

Troubleshooting

"Missing OPENAI_API_KEY"

- Ensure the .env file exists and contains your OpenAI API key

MCP server not starting

- Check that the command in .env is correct
- Ensure the MCP server package is accessible (npx -y installs it automatically if needed)
- Check the server logs for specific error messages

No audio in Twilio calls

- Verify that the webhook URL is publicly accessible
- Check that the WebSocket URL in the TwiML matches your server
- Ensure your OpenAI API key has Realtime API access

Resources

License

ISC
