Generative AI is producing a bunch of fun new models for us devs to poke at. Did you know you can use these over the phone?
Twilio gives you a superpower called Media Streams, which opens a WebSocket connection to both sides of a phone call. You can get audio streamed to you, process it, and send audio back.
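Here is a rough sketch of what travels over that WebSocket. It assumes Twilio's documented message shapes: JSON frames with an `event` field (`connected`, `start`, `media`, `stop`, `mark`), where `media` frames carry base64-encoded 8 kHz mono µ-law audio in `media.payload`. The helper names `parseTwilioMessage` and `buildMediaMessage` are hypothetical and not taken from this repo.

```javascript
// Hypothetical helpers (not from this repo) for Twilio Media Streams frames.
// Assumes the documented JSON shapes: event = "connected" | "start" | "media" | "stop" | "mark".

// Parse one raw WebSocket text frame from Twilio.
// Returns { type, streamSid?, audio? }, where audio is a Buffer of µ-law bytes.
function parseTwilioMessage(raw) {
  const msg = JSON.parse(raw);
  switch (msg.event) {
    case "start":
      // Keep the stream SID: it is needed to send audio back on the same call.
      return { type: "start", streamSid: msg.start.streamSid };
    case "media":
      // Payload is base64-encoded audio/x-mulaw, 8000 Hz, mono.
      return {
        type: "media",
        streamSid: msg.streamSid,
        audio: Buffer.from(msg.media.payload, "base64"),
      };
    default:
      // "connected", "stop", "mark", etc. carry no audio.
      return { type: msg.event };
  }
}

// Build the JSON frame that plays µ-law audio back to the caller.
// Twilio only accepts this on bidirectional streams (<Connect><Stream>).
function buildMediaMessage(streamSid, mulawAudio) {
  return JSON.stringify({
    event: "media",
    streamSid,
    media: { payload: mulawAudio.toString("base64") },
  });
}
```

Because the audio stays µ-law at 8 kHz end to end, the simplest pipeline never decodes it: forward the bytes to the speech-to-text service and ask the text-to-speech service for the same format.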
This repo is a work-in-progress demo exploring two models: Deepgram for speech-to-text and the incredibly fun ElevenLabs for text-to-speech.
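A minimal sketch of how the two services could be pointed at Twilio's audio format. The endpoints, the `Authorization: Token` and `xi-api-key` headers, and the `encoding`/`sample_rate`/`output_format` parameters follow the public Deepgram and ElevenLabs HTTP APIs as I understand them, but the helper functions are hypothetical and not how this repo necessarily does it; check both providers' docs before relying on the details.

```javascript
// Hypothetical request builders (not from this repo).

// Twilio streams raw 8 kHz mono µ-law with no WAV header, so Deepgram's
// live endpoint must be told the encoding and sample rate explicitly.
function deepgramListenUrl(options = {}) {
  const params = new URLSearchParams({
    encoding: "mulaw",
    sample_rate: "8000",
    channels: "1",
    ...options, // extra query parameters, e.g. { interim_results: "true" }
  });
  return `wss://api.deepgram.com/v1/listen?${params}`;
}

// Deepgram authenticates with "Authorization: Token <key>".
function deepgramHeaders(apiKey) {
  return { Authorization: `Token ${apiKey}` };
}

// ElevenLabs text-to-speech request. Asking for ulaw_8000 output means the
// returned bytes can be base64-encoded and sent straight back to Twilio.
function elevenLabsRequest(apiKey, voiceId, text) {
  const url =
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}` +
    "?output_format=ulaw_8000";
  return {
    url,
    init: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ text }),
    },
  };
}
```

On Node 18 or later, `const { url, init } = elevenLabsRequest(key, voiceId, "Hello!")` followed by `await fetch(url, init)` would send the request; the Deepgram URL and headers would be handed to whatever WebSocket client the server uses.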
Sign up for Deepgram and ElevenLabs
Use something like ngrok to expose port 3000 through a public tunnel
ngrok http 3000
Copy .env.example to .env and update the keys
Set SERVER to your tunneled URL
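A small sketch of checking those settings at startup. `SERVER` comes from this README; the key names `DEEPGRAM_API_KEY` and `ELEVENLABS_API_KEY` are assumptions, since .env.example is not shown here, so match them to the real file. The scheme-stripping step is also an assumption, made so `SERVER` works whether or not it includes `https://`.

```javascript
// Hypothetical config check. SERVER is from the README; the two API key
// names are assumptions and should match whatever .env.example uses.
const REQUIRED_KEYS = ["SERVER", "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY"];

function loadConfig(env = process.env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required settings in .env: ${missing.join(", ")}`);
  }
  // Accept "https://abc.ngrok.io/" or "abc.ngrok.io" and keep only the host,
  // so it can build both https:// webhook URLs and wss:// stream URLs.
  const host = env.SERVER.replace(/^[a-z]+:\/\//i, "").replace(/\/+$/, "");
  return {
    host,
    deepgramApiKey: env.DEEPGRAM_API_KEY,
    elevenLabsApiKey: env.ELEVENLABS_API_KEY,
  };
}
```

If the server loads .env with the `dotenv` package, `require("dotenv").config()` has to run before `loadConfig()` so the values are already in `process.env`.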
Install the necessary packages
npm install
Start the web server
node server.js
Wire up your Twilio number using the console or CLI
twilio phone-numbers:update +18889876 --voice-url=https://your-server.ngrok.io/incoming
There is a Stream TwiML verb that connects the call's audio to your WebSocket server.
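A minimal sketch of the TwiML the `/incoming` webhook could return. It uses `<Connect><Stream>`, which opens a bidirectional stream (needed to send audio back), and Twilio requires the stream URL to use `wss://`. The function names and the `/connection` WebSocket path are hypothetical, not taken from this repo.

```javascript
// Hypothetical TwiML builder for the /incoming webhook.
// The "/connection" path is an assumption; use whatever path server.js listens on.
function escapeXml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function incomingCallTwiml(host, path = "/connection") {
  // Escape the URL because it lands inside an XML attribute.
  const url = escapeXml(`wss://${host}${path}`);
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="${url}" /></Connect></Response>`
  );
}
```

The webhook would send this string with a `Content-Type: text/xml` header. While the `<Connect>` is active, Twilio does not move on to later TwiML, so the call stays on the stream until the WebSocket closes.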