
Kokoro TTS MCP Server

Provides text-to-speech synthesis via Kokoro TTS with configurable voices, speed, and optional playback/save.

Installation
Add the following to your MCP client configuration file.

Configuration

{
    "mcpServers": {
        "kokoro_tts": {
            "command": "/Users/giannisan/pinokio/bin/miniconda/bin/uv",
            "args": [
                "--directory",
                "/Users/giannisan/Documents/Cline/MCP/kokoro-tts-mcp",
                "run",
                "tts-mcp.py"
            ]
        }
    }
}

You can use the Kokoro TTS MCP Server to add high‑quality text‑to‑speech synthesis to your applications. It exposes a single MCP tool that converts text into spoken audio with customizable voices and playback options, making it easy to integrate voice features into your projects.

How to use

Connect your MCP client to the kokoro_tts server, then invoke the generate_speech tool. You provide the text you want spoken, choose a voice, and optionally adjust the speed. You can also save the audio to a file and play it back immediately if you wish.

How to install

Before installing, make sure you have Python 3.10 or higher and the uv package manager. Install uv first, then create a virtual environment and install the server from the project directory.

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

```bash
uv venv
source .venv/bin/activate  # On Windows, use: .venv\Scripts\activate
uv pip install .
```

Configuration and start commands

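The server runs locally over stdio: the MCP client launches uv, which runs the tts-mcp.py script from the project directory. The paths in the configuration example are specific to the original author's machine; replace them with the absolute path to your own uv binary and to your local kokoro-tts-mcp directory.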

{
  "mcpServers": {
    "kokoro_tts": {
      "type": "stdio",
      "name": "kokoro_tts",
      "command": "/Users/giannisan/pinokio/bin/miniconda/bin/uv",
      "args": [
        "--directory",
        "/Users/giannisan/Documents/Cline/MCP/kokoro-tts-mcp",
        "run",
        "tts-mcp.py"
      ],
      "env": {}
    }
  }
}
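
Because the paths in the configuration above are machine-specific, it can help to generate the entry rather than edit it by hand. The following is a minimal sketch, not part of the server itself: the function name `build_kokoro_config` and the example paths are hypothetical, and it only reproduces the `type`, `name`, `command`, and `args` fields shown above.

```python
import json
import shutil
from typing import Optional


def build_kokoro_config(project_dir: str, uv_path: Optional[str] = None) -> dict:
    """Build an mcpServers entry for the kokoro_tts server.

    project_dir: absolute path to your local kokoro-tts-mcp directory
                 (assumed to contain tts-mcp.py, as in the configuration above).
    uv_path:     path to the uv binary; looked up on PATH when omitted.
    """
    uv = uv_path or shutil.which("uv")
    if uv is None:
        raise FileNotFoundError("uv not found on PATH; pass uv_path explicitly")
    return {
        "mcpServers": {
            "kokoro_tts": {
                "type": "stdio",
                "name": "kokoro_tts",
                "command": uv,
                "args": ["--directory", project_dir, "run", "tts-mcp.py"],
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    config = build_kokoro_config("/path/to/kokoro-tts-mcp", uv_path="/usr/local/bin/uv")
    print(json.dumps(config, indent=2))
```

Paste the printed JSON into your MCP client configuration file, merging it with any servers that are already listed there.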

Usage notes for generating speech

Within an MCP client, call the generate_speech tool with the text to speak and any optional parameters. Only text is required; the voice and speed fall back to their defaults, and saving and immediate playback are optional.

Key parameters you can provide when calling generate_speech:

  • text: The text to convert to speech (required)
  • voice: The voice to use for synthesis (default: af_heart)
  • speed: Speech speed multiplier (default: 1.0)
  • save_path: Directory in which to save the generated audio files (optional)
  • play_audio: Whether to play the audio immediately (default: False)
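
The parameter list above can be captured in a small helper that fills in the documented defaults. This is a sketch under stated assumptions: the helper name `build_generate_speech_args` is hypothetical, and the validation rules (non-empty text, positive speed) are choices made for this example, not documented behaviour of the server.

```python
from typing import Any, Dict, Optional


def build_generate_speech_args(
    text: str,
    voice: str = "af_heart",
    speed: float = 1.0,
    save_path: Optional[str] = None,
    play_audio: bool = False,
) -> Dict[str, Any]:
    """Assemble the arguments for a generate_speech call.

    Defaults mirror the parameter list above. The checks below are
    assumptions of this sketch, not rules enforced by the server.
    """
    if not text.strip():
        raise ValueError("text is required and must not be empty")
    if speed <= 0:
        raise ValueError("speed must be a positive multiplier")

    args: Dict[str, Any] = {
        "text": text,
        "voice": voice,
        "speed": speed,
        "play_audio": play_audio,
    }
    # Only send save_path when the caller wants the audio written to disk.
    if save_path is not None:
        args["save_path"] = save_path
    return args
```

Leaving save_path out of the payload when it is not set keeps the request minimal and lets the server apply whatever behaviour it has for unsaved audio.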

Notes on dependencies and platform support

The server depends on Kokoro TTS and its Python integration. Audio playback is supported on Windows, macOS, and Linux.

Examples of client usage

In your MCP client, request speech generation by passing the text you want spoken, choosing a voice such as af_heart, setting speed to 1.0, and requesting immediate playback if desired. You can also set save_path to store the resulting audio file for later use.
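
To make the request concrete, the sketch below builds the JSON-RPC 2.0 message that an MCP client sends for a `tools/call` request to generate_speech. Most MCP clients and SDKs construct this message for you after the initialization handshake, so this is only an illustration of the wire format; the helper name `make_tool_call_request`, the sample text, and the save directory are hypothetical.

```python
import json
from typing import Any, Dict


def make_tool_call_request(request_id: int, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request for the generate_speech tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_speech", "arguments": arguments},
    }
    return json.dumps(request)


message = make_tool_call_request(
    1,
    {
        "text": "Hello from Kokoro.",
        "voice": "af_heart",
        "speed": 1.0,
        "play_audio": True,
        "save_path": "/tmp/kokoro-audio",  # hypothetical directory
    },
)
print(message)
```

Over a stdio connection, a client writes this message as a single line of JSON to the server's standard input and reads the response from its standard output.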

Security and maintenance notes

Keep the server and its dependencies up to date. Use secure connections for MCP endpoints if you expose the server remotely. Monitor for updates to Kokoro TTS and the MCP client libraries you integrate.

Available tools

generate_speech

Generates speech from text with optional voice, speed, and playback/save options. Returns audio data or a saved file path based on configuration.