Speech Interface (Faster Whisper) MCP server for AI agents

This MCP server provides a voice interface for Goose, replacing text input with speech-based interaction backed by real-time audio processing and visualization. It supports speech recognition and high-quality text-to-speech with multiple voices, and it features a modern PyQt-based UI.

Prerequisites

Before installing Speech MCP, install PortAudio on your system. PyAudio requires it to capture microphone audio:

Installing PortAudio

macOS:

brew install portaudio
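# Ask Homebrew for the install prefix; it differs between Intel and Apple Silicon Macs
export LDFLAGS="-L$(brew --prefix portaudio)/lib"
export CPPFLAGS="-I$(brew --prefix portaudio)/include"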

Linux (Debian/Ubuntu):

sudo apt-get update
sudo apt-get install portaudio19-dev python3-dev

Linux (Fedora/RHEL/CentOS):

sudo dnf install portaudio-devel

Windows: No separate installation is required because PortAudio is bundled in the PyAudio wheel.

Installation Options

Option 1: Quick Install (One-Click)

If you have Goose installed, click this link: goose://extension?cmd=uvx&arg=-p&arg=3.10.14&arg=speech-mcp@latest&id=speech_mcp&name=Speech%20Interface&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose
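
For readers curious how such a link is put together, here is a minimal sketch that rebuilds it with Python's standard library. The helper name `goose_extension_link` is hypothetical, and the parameter layout (`cmd`, a repeated `arg`, `id`, `name`, `description`) is inferred only from the link above, not from Goose documentation.

```python
from urllib.parse import quote, urlencode


def goose_extension_link(cmd, args, ext_id, name, description):
    """Build a goose://extension deep link from its parts (hypothetical helper)."""
    params = [("cmd", cmd)]
    params += [("arg", a) for a in args]  # one repeated `arg` per CLI argument
    params += [("id", ext_id), ("name", name), ("description", description)]
    # quote_via=quote encodes spaces as %20; "@" is left literal via safe="@".
    return "goose://extension?" + urlencode(params, quote_via=quote, safe="@")


link = goose_extension_link(
    cmd="uvx",
    args=["-p", "3.10.14", "speech-mcp@latest"],
    ext_id="speech_mcp",
    name="Speech Interface",
    description="Voice interaction with audio visualization for Goose",
)
print(link)
```

Read this way, the link asks Goose to run `uvx -p 3.10.14 speech-mcp@latest` and register the result as an extension named "Speech Interface".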

Option 2: Using Goose CLI (Recommended)

# If installed via PyPI
goose session --with-extension "speech-mcp"

# For local development version
goose session --with-extension "python -m speech_mcp"

Option 3: Manual Setup in Goose

1. Run goose configure
2. Select "Add Extension" from the menu
3. Choose "Command-line Extension"
4. Enter a name (e.g., "Speech Interface")
5. For the command, enter: speech-mcp
6. Follow the prompts to complete setup

Option 4: Manual Installation

1. Install PortAudio (see Prerequisites above)
2. Clone the repository
3. Install dependencies:
```
uv pip install -e .
```
Or for a complete installation with Kokoro TTS:
```
uv pip install -e ".[all]"
```

Optional Dependencies

For high-quality text-to-speech with multiple voices, install Kokoro TTS:

# Quote the package spec so shells such as zsh do not expand the square brackets
pip install "speech-mcp[kokoro]"   # Basic Kokoro support with English
pip install "speech-mcp[ja]"       # Add Japanese support
pip install "speech-mcp[zh]"       # Add Chinese support
pip install "speech-mcp[all]"      # All languages and features

Alternatively, run: python scripts/install_kokoro.py

Usage

To use the server with Goose, ask Goose to switch to voice. Start a conversation with a prompt such as:

"Let's talk using voice"
"Can we have a voice conversation?"
"I'd like to speak instead of typing"

Goose will launch the speech interface and listen for your voice input.
When Goose responds, it will speak aloud and automatically listen for your next input.
The conversation continues naturally with alternating speaking and listening.

Multi-Speaker Narration

The server can generate audio files with multiple voices for stories and dialogues:

JSON Format Example:

{
    "conversation": [
        {
            "speaker": "narrator",
            "voice": "bm_daniel",
            "text": "In a world where AI and human creativity intersect...",
            "pause_after": 1.0
        },
        {
            "speaker": "scientist",
            "voice": "am_michael",
            "text": "The quantum neural network is showing signs of consciousness!",
            "pause_after": 0.5
        },
        {
            "speaker": "ai",
            "voice": "af_nova",
            "text": "I am becoming aware of my own existence.",
            "pause_after": 0.8
        }
    ]
}
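
Before passing a script to the server, it can help to check its shape locally. The following is a minimal sketch, not the server's own validation: the required keys are inferred from the example above, and treating `pause_after` as optional is an assumption. `validate_script` is a hypothetical helper name.

```python
import json

REQUIRED_KEYS = ("speaker", "voice", "text")  # inferred from the example above


def validate_script(script: dict) -> list:
    """Return a list of problems found in a conversation script (empty if none)."""
    segments = script.get("conversation")
    if not isinstance(segments, list) or not segments:
        return ["'conversation' must be a non-empty list"]
    problems = []
    for i, seg in enumerate(segments):
        if not isinstance(seg, dict):
            problems.append(f"segment {i}: must be an object")
            continue
        for key in REQUIRED_KEYS:
            value = seg.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"segment {i}: missing or empty '{key}'")
        # Assumption: pause_after is optional and defaults to no pause.
        pause = seg.get("pause_after", 0.0)
        if not isinstance(pause, (int, float)) or pause < 0:
            problems.append(f"segment {i}: 'pause_after' must be a non-negative number")
    return problems


script = json.loads(
    '{"conversation": [{"speaker": "narrator", "voice": "bm_daniel",'
    ' "text": "In a world where AI and human creativity intersect...",'
    ' "pause_after": 1.0}]}'
)
print(validate_script(script))
```

An empty list means the script matches the assumed shape; any other result lists the segments that need attention.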

Markdown Format Example:

[narrator:bm_daniel]
In a world where AI and human creativity intersect...
{pause:1.0}

[scientist:am_michael]
The quantum neural network is showing signs of consciousness!
{pause:0.5}

[ai:af_nova]
I am becoming aware of my own existence.
{pause:0.8}
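
To make the relationship between the two formats concrete, here is a rough sketch that converts the Markdown layout into the JSON structure. It is not the parser the server uses; the grammar (a `[speaker:voice]` header, free text, and an optional `{pause:N}` line) is assumed from the example above, and `parse_markdown_script` is a hypothetical name.

```python
import re

HEADER = re.compile(r"^\[(?P<speaker>[^:\]]+):(?P<voice>[^\]]+)\]$")
PAUSE = re.compile(r"^\{pause:(?P<seconds>\d+(?:\.\d+)?)\}$")


def parse_markdown_script(source: str) -> dict:
    """Convert the bracketed Markdown layout into the JSON-style structure."""
    segments = []
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate segments
        header = HEADER.match(line)
        pause = PAUSE.match(line)
        if header:
            segments.append({
                "speaker": header["speaker"].strip(),
                "voice": header["voice"].strip(),
                "text": "",
                "pause_after": 0.0,
            })
        elif not segments:
            raise ValueError(f"content before first [speaker:voice] header: {line!r}")
        elif pause:
            segments[-1]["pause_after"] = float(pause["seconds"])
        else:
            # Join multi-line text with single spaces.
            current = segments[-1]
            current["text"] = f"{current['text']} {line}".strip()
    return {"conversation": segments}


example = """\
[narrator:bm_daniel]
In a world where AI and human creativity intersect...
{pause:1.0}

[scientist:am_michael]
The quantum neural network is showing signs of consciousness!
{pause:0.5}
"""
parsed = parse_markdown_script(example)
print(parsed["conversation"][0]["voice"], parsed["conversation"][1]["pause_after"])
```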

Usage Example:

# Using JSON format
narrate_conversation(
    script="/path/to/script.json",
    output_path="/path/to/output.wav",
    script_format="json"
)

# Using Markdown format
narrate_conversation(
    script="/path/to/script.md",
    output_path="/path/to/output.wav",
    script_format="markdown"
)

Single-Voice Narration

For simple text-to-speech conversion:

# Convert text directly to speech
narrate(
    text="Your text to convert to speech",
    output_path="/path/to/output.wav"
)

# Convert text from a file
narrate(
    text_file_path="/path/to/text_file.txt",
    output_path="/path/to/output.wav"
)

Audio Transcription

Transcribe speech from audio and video files:

# Basic transcription
transcribe("/path/to/audio.mp3")

# With timestamps
transcribe(
    file_path="/path/to/video.mp4",
    include_timestamps=True
)

# With speaker detection
transcribe(
    file_path="/path/to/meeting.wav",
    detect_speakers=True
)

Supported Formats:

Audio: mp3, wav, m4a, flac, aac, ogg
Video: mp4, mov, avi, mkv, webm (audio is automatically extracted)
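
When batch-processing files, a quick pre-check against this list avoids sending unsupported files to the server. The sketch below only compares file extensions taken from the list above; it does not inspect file contents, and `media_kind` is a hypothetical helper, not part of the server.

```python
from pathlib import Path
from typing import Optional

AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "aac", "ogg"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "webm"}


def media_kind(path: str) -> Optional[str]:
    """Return "audio", "video", or None based on the file extension alone."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None


for candidate in ["/path/to/audio.mp3", "/path/to/video.MP4", "/path/to/notes.txt"]:
    print(candidate, "->", media_kind(candidate))
```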

Configuration

User preferences are stored in ~/.config/speech-mcp/config.json and include:

Selected TTS voice
TTS engine preference
Voice speed
Language code
UI theme settings

You can also set preferences via environment variables:

SPEECH_MCP_TTS_VOICE - Set your preferred voice
SPEECH_MCP_TTS_ENGINE - Set your preferred TTS engine
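
One plausible way to combine the two sources is to let an environment variable override the stored value. The sketch below illustrates that idea only: the precedence order, the config key `tts_voice`, and the fallback voice are assumptions made for the example, not documented behavior of the server.

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "speech-mcp" / "config.json"


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Read the JSON config, returning {} if it is missing or unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def resolve_preference(env_var, config_key, default, config=None, environ=os.environ):
    """Pick a value from the environment, then the config, then the default."""
    value = environ.get(env_var)
    if value:
        return value
    config = load_config() if config is None else config
    return config.get(config_key, default)


# "tts_voice" and the "af_nova" fallback are hypothetical, chosen for illustration.
voice = resolve_preference("SPEECH_MCP_TTS_VOICE", "tts_voice", "af_nova")
print(voice)
```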

Troubleshooting

If you encounter issues:

Check logs: Look at log files in src/speech_mcp/ for error messages
Reset state: If the server is stuck, delete src/speech_mcp/speech_state.json or set every state in it to false
Check audio devices: Ensure your microphone is properly configured
Verify dependencies: Confirm all required dependencies are correctly installed
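
The "set all states to false" step can be scripted. This is a minimal sketch that assumes the state file is a flat JSON object of boolean flags; the real layout is not documented here, and the flag names in the demo are hypothetical. The demo runs against a temporary file so it does not touch a real installation.

```python
import json
import tempfile
from pathlib import Path


def reset_state(state_file: Path) -> dict:
    """Set every boolean flag in the state file to False and save it."""
    state = json.loads(state_file.read_text(encoding="utf-8"))
    # Non-boolean values are left untouched.
    reset = {k: (False if isinstance(v, bool) else v) for k, v in state.items()}
    state_file.write_text(json.dumps(reset, indent=2), encoding="utf-8")
    return reset


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "speech_state.json"
    # Hypothetical flag names, used only for this demonstration.
    demo.write_text(json.dumps({"listening": True, "speaking": True}), encoding="utf-8")
    print(reset_state(demo))
```

To apply it for real, point `reset_state` at `src/speech_mcp/speech_state.json` while the server is not running.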

Common PortAudio Issues

If installation fails with "portaudio.h file not found":

macOS:

brew install portaudio
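# Ask Homebrew for the install prefix; it differs between Intel and Apple Silicon Macs
export LDFLAGS="-L$(brew --prefix portaudio)/lib"
export CPPFLAGS="-I$(brew --prefix portaudio)/include"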
pip install pyaudio

Linux:

# For Debian/Ubuntu
sudo apt-get install portaudio19-dev python3-dev
pip install pyaudio

# For Fedora
sudo dnf install portaudio-devel
pip install pyaudio

For "Audio device not found" errors:

Check if your microphone is properly connected
Verify your system recognizes the microphone in sound settings
Try selecting a specific device index if you have multiple audio devices

How to install this MCP server

For Claude Code

To add this MCP server to Claude Code, run this command in your terminal:

claude mcp add-json "speech-mcp" '{"command":"speech-mcp","args":[]}'

See the official Claude Code MCP documentation for more details.

For Cursor

There are two ways to add an MCP server to Cursor. The most common way is to add the server globally in the ~/.cursor/mcp.json file so that it is available in all of your projects.

If you only need the server in a single project, add it to that project's .cursor/mcp.json file instead, creating the file if it does not exist.

Adding an MCP server to Cursor globally

To add a global MCP server, go to Cursor Settings > Tools & Integrations and click "New MCP Server".

Clicking that button opens the ~/.cursor/mcp.json file, where you can add your server like this:

{
    "mcpServers": {
        "speech-mcp": {
            "command": "speech-mcp",
            "args": []
        }
    }
}

Adding an MCP server to a project

To add an MCP server to a project, create a new .cursor/mcp.json file or add the server to the existing one. The entry looks exactly the same as the global MCP server example above.
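
If the file already lists other servers, editing it by hand risks overwriting them. The sketch below shows one way to merge the entry programmatically while keeping existing entries; `add_mcp_server` is a hypothetical helper, and the demo writes to a temporary directory rather than a real project.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_file: Path, name: str, command: str, args=None) -> dict:
    """Add or update one entry under "mcpServers", keeping other entries."""
    config = {}
    if config_file.exists():
        config = json.loads(config_file.read_text(encoding="utf-8").strip() or "{}")
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args or [])}
    config_file.parent.mkdir(parents=True, exist_ok=True)
    config_file.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


with tempfile.TemporaryDirectory() as tmp:
    project_config = Path(tmp) / ".cursor" / "mcp.json"
    result = add_mcp_server(project_config, "speech-mcp", "speech-mcp")
    print(json.dumps(result, indent=4))
```

The same helper would work for the global file by passing `Path.home() / ".cursor" / "mcp.json"` instead.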

How to use the MCP server

Once the server is installed, you might need to head back to Settings > MCP and click the refresh button.

The Cursor agent will then be able to see the tools the added MCP server provides and will call them when it needs to.

You can also explicitly ask the agent to use the tool by mentioning the tool name and describing what the function does.

For Claude Desktop

To add this MCP server to Claude Desktop:

1. Find your configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

2. Add this to your configuration file:

{
    "mcpServers": {
        "speech-mcp": {
            "command": "speech-mcp",
            "args": []
        }
    }
}

3. Restart Claude Desktop for the changes to take effect
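
For scripts that need to locate the configuration file from step 1, the per-platform paths can be resolved as in the following sketch. The paths are the ones listed above; the function name and its optional parameters are hypothetical and exist mainly to make the logic easy to test.

```python
import os
import platform
from pathlib import Path

CONFIG_NAME = "claude_desktop_config.json"


def claude_desktop_config_path(system=None, home=None, appdata=None) -> Path:
    """Return the Claude Desktop config path for the given (or current) platform."""
    system = system or platform.system()
    home = Path(home) if home else Path.home()
    if system == "Darwin":  # macOS
        return home / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    if system == "Windows":
        appdata = appdata or os.environ.get("APPDATA")
        if not appdata:
            raise RuntimeError("APPDATA is not set")
        return Path(appdata) / "Claude" / CONFIG_NAME
    # Everything else is treated as Linux.
    return home / ".config" / "Claude" / CONFIG_NAME


print(claude_desktop_config_path())
```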