home / skills / openclaw / skills / phone-agent
This skill enables a real-time voice AI on calls, transcribing speech, reasoning with an LLM, and speaking responses via TTS.
npx playbooks add skill openclaw/skills --skill phone-agentReview the files below or copy the command above to add this skill to your agents.
---
name: phone-agent
description: "Run a real-time AI phone agent using Twilio, Deepgram, and ElevenLabs. Handles incoming calls, transcribes audio, generates responses via LLM, and speaks back via streaming TTS. Use when user wants to: (1) Test voice AI capabilities, (2) Handle phone calls programmatically, (3) Build a conversational voice bot."
---
# Phone Agent Skill
Runs a local FastAPI server that acts as a real-time voice bridge.
## Architecture
```
Twilio (Phone) <--> WebSocket (Audio) <--> [Local Server] <--> Deepgram (STT)
|
+--> OpenAI (LLM)
+--> ElevenLabs (TTS)
```
## Prerequisites
1. **Twilio Account**: Phone number + TwiML App.
2. **Deepgram API Key**: For fast speech-to-text.
3. **OpenAI API Key**: For the conversation logic.
4. **ElevenLabs API Key**: For realistic text-to-speech.
5. **Ngrok** (or similar): To expose your local port 8080 to Twilio.
## Setup
1. **Install Dependencies**:
```bash
pip install -r scripts/requirements.txt
```
2. **Set Environment Variables** (in `~/.moltbot/.env`, `~/.clawdbot/.env`, or export):
```bash
export DEEPGRAM_API_KEY="your_key"
export OPENAI_API_KEY="your_key"
export ELEVENLABS_API_KEY="your_key"
export TWILIO_ACCOUNT_SID="your_sid"
export TWILIO_AUTH_TOKEN="your_token"
export PORT=8080
```
3. **Start the Server**:
```bash
python3 scripts/server.py
```
4. **Expose to Internet**:
```bash
ngrok http 8080
```
5. **Configure Twilio**:
- Go to your Phone Number settings.
- Set "Voice & Fax" -> "A Call Comes In" to **Webhook**.
- URL: `https://<your-ngrok-url>.ngrok.io/incoming`
- Method: `POST`
## Usage
Call your Twilio number. The agent should answer, transcribe your speech, think, and reply in a natural voice.
## Customization
- **System Prompt**: Edit `SYSTEM_PROMPT` in `scripts/server.py` to change the persona.
- **Voice**: Change `ELEVENLABS_VOICE_ID` to use different voices.
- **Model**: Switch `gpt-4o-mini` to `gpt-4` for smarter (but slower) responses.
This skill runs a real-time AI phone agent that connects Twilio calls to a local FastAPI server, transcribes audio with Deepgram, generates conversational responses with an LLM, and returns spoken audio via ElevenLabs streaming TTS. It is designed for rapid testing and prototyping of voice AI, or for programmatic handling of phone calls. The setup uses ngrok (or similar) to expose a local port to Twilio for live phone integration.
Incoming Twilio calls are routed to the FastAPI server over a webhook and upgraded to a WebSocket audio stream. The server forwards audio to Deepgram for speech-to-text, passes transcriptions to an LLM for intent and response generation, and streams ElevenLabs TTS audio back to the caller. Configuration is driven by environment variables for API keys, Twilio credentials, and voice/model choices.
What external services do I need?
You need Twilio (phone + TwiML app), Deepgram (STT), an LLM provider (OpenAI or similar), and ElevenLabs (TTS).
Can I run this in production?
This setup is intended for prototyping. For production, replace ngrok with a stable HTTPS endpoint, add authentication, scale handling, and robust error/retry logic.