
Speech MCP Server

Provides text-to-speech capabilities using Kokoro TTS with configurable voice and speed, via an MCP interface.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "hammeiam-koroko-speech-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "speech-mcp-server"
      ],
      "env": {
        "MCP_DEFAULT_VOICE": "af_bella",
        "MCP_DEFAULT_SPEECH_SPEED": "1.1"
      }
    }
  }
}
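
If your client configuration already lists other servers, the entry above has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge in TypeScript. The helper name `withSpeechServer` and the type names are hypothetical, introduced only for illustration. Reading and writing the file itself is left out, because its location depends on your MCP client.

```typescript
// Shape of one server entry, mirroring the configuration shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>; // values are given as strings in the config above
}

interface McpClientConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: returns a copy of `config` with the speech server added,
// leaving any other configured servers untouched.
function withSpeechServer(
  config: McpClientConfig,
  voice: string,
  speed: number
): McpClientConfig {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["-y", "speech-mcp-server"],
    env: {
      MCP_DEFAULT_VOICE: voice,
      MCP_DEFAULT_SPEECH_SPEED: String(speed), // serialize the number as a string
    },
  };
  return {
    ...config,
    mcpServers: { ...config.mcpServers, "hammeiam-koroko-speech-mcp": entry },
  };
}

// Usage: start from an empty config and print the merged result as JSON.
const emptyConfig: McpClientConfig = { mcpServers: {} };
console.log(JSON.stringify(withSpeechServer(emptyConfig, "af_bella", 1.1), null, 2));
```

The function returns a new object instead of mutating its input, so the original configuration stays available if the write fails.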

The Speech MCP Server is a Model Context Protocol (MCP) server that provides text-to-speech using the Kokoro TTS model. It is MCP-compliant, installs from a single package, and works with your preferred MCP client to convert text into speech with a selectable voice and adjustable speed.

How to use

Connect to the Speech MCP Server from your MCP client to convert text to speech. Start the server with your preferred default voice and speed, then call its MCP tools to generate speech from text. You can also list the available voices and check the model status, which is useful during startup or when troubleshooting initialization.

How to install

Prerequisites: Node.js and a package manager (npm, pnpm, or yarn) installed on your system.

Install the speech-mcp-server package with your chosen package manager:

# Using npm
npm install speech-mcp-server

# Using pnpm (recommended)
pnpm add speech-mcp-server

# Using yarn
yarn add speech-mcp-server

Run and configure the server

Run the server with its default settings, or set the MCP_DEFAULT_VOICE and MCP_DEFAULT_SPEECH_SPEED environment variables at startup to override the default voice and speech speed.

Run with default configuration:

npm start

Run with a custom default speed and voice:

MCP_DEFAULT_SPEECH_SPEED=1.5 MCP_DEFAULT_VOICE=af_bella npm start
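
Environment variables always arrive as strings, so a server has to parse and validate them before use. The sketch below shows one way such defaults could be resolved. It does not reproduce the package's actual implementation. The helper `resolveDefaults` and both fallback values are assumptions made for illustration.

```typescript
interface SpeechDefaults {
  voice: string;
  speed: number;
}

// Assumed fallbacks for this sketch; the package's real defaults may differ.
const FALLBACK_VOICE = "af_bella";
const FALLBACK_SPEED = 1.0;

// Hypothetical helper: resolve the default voice and speed from an
// environment-like object (for example, process.env in Node.js).
function resolveDefaults(env: Record<string, string | undefined>): SpeechDefaults {
  // An unset or blank voice falls back to the assumed default.
  const voice = env["MCP_DEFAULT_VOICE"]?.trim() || FALLBACK_VOICE;

  // Number("") is 0 and Number("fast") is NaN; both fail the check below.
  const parsed = Number(env["MCP_DEFAULT_SPEECH_SPEED"] ?? "");
  const speed = Number.isFinite(parsed) && parsed > 0 ? parsed : FALLBACK_SPEED;

  return { voice, speed };
}

// Usage: the same values as the command line above.
const defaults = resolveDefaults({
  MCP_DEFAULT_SPEECH_SPEED: "1.5",
  MCP_DEFAULT_VOICE: "af_bella",
});
console.log(defaults);
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the function easy to test.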

Using the server through MCP tools

The server exposes a set of MCP tools to perform text-to-speech tasks. You can convert text, adjust options, list voices, and check the model status. Use your MCP client to call these tools, and expect audio data or metadata in response.

Available tools

text_to_speech

Converts text to speech using the default settings.

text_to_speech_with_options

Converts text to speech with customizable speed and optional voice.

list_voices

Lists all available voices for text-to-speech.

get_model_status

Checks the initialization status of the TTS model.
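
MCP clients invoke tools by sending a JSON-RPC 2.0 request with the method `tools/call`, passing the tool name and its arguments in `params`. The sketch below builds such requests for the tools listed above. The tool names come from this page, but the argument names (`text`, `speed`, `voice`) are assumptions. Query the server with `tools/list` to see each tool's actual input schema. The helper `buildToolCall` is hypothetical, and most clients or SDKs construct these messages for you.

```typescript
// Wire format of an MCP tool invocation (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper: wrap a tool name and arguments in a tools/call request.
function buildToolCall(
  name: string,
  args: Record<string, unknown> = {}
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++, // each request gets its own id
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// A plausible sequence: check readiness, pick a voice, then synthesize.
const statusRequest = buildToolCall("get_model_status");
const voicesRequest = buildToolCall("list_voices");
const speakRequest = buildToolCall("text_to_speech_with_options", {
  text: "Hello from Kokoro.", // assumed argument name
  speed: 1.2,                 // assumed argument name
  voice: "af_bella",          // assumed argument name
});

console.log(JSON.stringify([statusRequest, voicesRequest, speakRequest], null, 2));
```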