VOICEVOX TTS MCP Server

Provides VOICEVOX-based text-to-speech over MCP, with multi-speaker control, streaming playback, and cross-platform support.

Installation
Add the following to your MCP client configuration file.

{
  "mcpServers": {
    "kajidog-mcp-tts-voicevox": {
      "url": "http://localhost:3000/mcp",
      "headers": {
        "VOICEVOX_URL": "http://localhost:50021",
        "MCP_HTTP_HOST": "0.0.0.0",
        "MCP_HTTP_MODE": "true",
        "MCP_HTTP_PORT": "3000",
        "MCP_ALLOWED_HOSTS": "localhost,127.0.0.1",
        "VOICEVOX_USE_STREAMING": "false",
        "VOICEVOX_DEFAULT_SPEAKER": "1",
        "VOICEVOX_DEFAULT_IMMEDIATE": "true",
        "VOICEVOX_RESTRICT_IMMEDIATE": "false",
        "VOICEVOX_DEFAULT_SPEED_SCALE": "1.0",
        "VOICEVOX_DEFAULT_WAIT_FOR_END": "false",
        "VOICEVOX_RESTRICT_WAIT_FOR_END": "false",
        "VOICEVOX_DEFAULT_WAIT_FOR_START": "false",
        "VOICEVOX_RESTRICT_WAIT_FOR_START": "false"
      }
    }
  }
}

This is a VOICEVOX-based text-to-speech MCP server. It lets clients such as Claude Desktop make your assistant speak in multiple voices, queue or interrupt playback, and stream audio on Windows, macOS, and Linux.

How to use

Connect to the MCP server from your client and call the speak tool to convert text into speech. You can switch speakers within a single request by assigning different speaker IDs to different text segments, so one call can deliver a multi-segment, multi-speaker dialogue. Playback can be immediate or queued, with options to wait for playback to start or to end, and you can enable streaming playback when your environment supports it.
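As a rough sketch of the single-call, multi-speaker idea above, the snippet below builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send for the speak tool. The `tools/call` envelope follows the MCP specification. The argument names (`text`, `immediate`, `waitForEnd`) and the `speaker_id:text` line format are assumptions made for illustration; check them against the schema the server reports through `tools/list` before relying on them.

```python
from __future__ import annotations

import json
from typing import Any


def build_speak_request(
    segments: list[tuple[int, str]],
    request_id: int = 1,
    immediate: bool = True,
    wait_for_end: bool = False,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for the `speak` tool.

    Each segment is a (speaker_id, text) pair. The argument names used
    below are hypothetical; confirm them against the server's tool schema.
    """
    # Assumed input format: one "speaker_id:text" entry per line.
    text = "\n".join(f"{speaker}:{line}" for speaker, line in segments)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "speak",
            "arguments": {
                "text": text,
                "immediate": immediate,
                "waitForEnd": wait_for_end,
            },
        },
    }


# Two segments, two speakers, one request.
request = build_speak_request([(1, "Hello!"), (3, "Nice to meet you.")])
print(json.dumps(request, indent=2, ensure_ascii=False))
```

In practice your MCP client builds and sends this message for you; the sketch only makes the shape of a multi-speaker call visible.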

How to install

Before installing, you need Node.js 18.0.0 or higher and a running VOICEVOX Engine. Installing ffplay is also recommended for low-latency streaming playback.
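The prerequisites above can be checked with a small script. This is a minimal sketch, not part of the server: the helper names are hypothetical, and it only assumes that `node --version` prints a string such as `v18.17.0` and that ffplay, when installed, is on your `PATH`.

```python
from __future__ import annotations

import re
import shutil
from typing import Optional

# Minimum Node.js version stated in the prerequisites.
MIN_NODE = (18, 0, 0)


def parse_node_version(raw: str) -> tuple[int, int, int]:
    """Parse the output of `node --version`, for example "v18.17.0"."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", raw.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version string: {raw!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def node_is_new_enough(raw: str) -> bool:
    """Return True when the reported version is at least MIN_NODE."""
    return parse_node_version(raw) >= MIN_NODE


def find_ffplay() -> Optional[str]:
    """Return the path to ffplay if it is on PATH, otherwise None."""
    return shutil.which("ffplay")
```

To check a real installation, pass the standard output of `subprocess.run(["node", "--version"], capture_output=True, text=True)` to `node_is_new_enough`, and treat a `None` result from `find_ffplay()` as "streaming playback may be unavailable" rather than as a hard failure.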

Install the MCP server and required tooling with these steps.

# 1) Install Node.js (if you don’t have it)
#     Use a version manager or download from nodejs.org

# 2) Start VOICEVOX Engine (must be running)

# 3) Start the MCP server
#    This example uses npx to download and run the server package
npx -y @kajidog/mcp-tts-voicevox

Additional setup and configs

Configure your client to reach the MCP server either over HTTP or as a local stdio process. Use HTTP mode when the server runs on its own or must be reachable remotely; use stdio mode when the client should start the server itself from its configuration.

{
  "mcpServers": {
    "tts-http": {
      "type": "http",
      "url": "http://localhost:3000/mcp",
      "args": []
    },
    "tts-stdio": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@kajidog/mcp-tts-voicevox"]
    }
  }
}
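If your client configuration file already lists other servers, you will want to add one of these entries without overwriting the rest. The sketch below shows one way to do that on an in-memory dictionary. The `add_server` helper and the `other-server` entry are hypothetical, and the location of the configuration file depends on your client, so reading and writing the file is left out.

```python
import copy
import json

# Entries taken from the configuration example above.
STDIO_ENTRY = {
    "type": "stdio",
    "command": "npx",
    "args": ["-y", "@kajidog/mcp-tts-voicevox"],
}
HTTP_ENTRY = {"type": "http", "url": "http://localhost:3000/mcp"}


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of `config` with `entry` stored under mcpServers[name]."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})[name] = copy.deepcopy(entry)
    return merged


# Hypothetical existing configuration with an unrelated server.
existing = {"mcpServers": {"other-server": {"type": "stdio", "command": "other"}}}
updated = add_server(existing, "tts-stdio", STDIO_ENTRY)
print(json.dumps(updated, indent=2))
```

Working on a copy keeps the original configuration intact, so you can compare the two before writing anything back to disk.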

Troubleshooting

If audio does not play, verify the VOICEVOX Engine is running and that the playback tools are available on your platform (ffplay is optional but enables streaming). If the MCP client does not recognize the server, confirm the correct command or URL is used and that you restarted the client after changes.
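A quick way to rule out the first cause is to ask the engine for its version over HTTP. The sketch below assumes the VOICEVOX Engine answers `GET /version` at the base URL you configured in `VOICEVOX_URL`; the function name is hypothetical, and the server's own ping_voicevox tool may perform a different check.

```python
import http.client
import urllib.error
import urllib.request


def engine_is_reachable(base_url: str = "http://localhost:50021", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/version answers with HTTP 200."""
    url = base_url.rstrip("/") + "/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or a broken response.
        return False
```

If this returns `False` for the URL in your configuration, start or restart the VOICEVOX Engine before investigating the playback tools.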

Configuration

Environment variables control how the MCP server operates. Common settings include the VOICEVOX Engine URL, the default speaker, the default speed scale, and streaming options. The VOICEVOX_RESTRICT_* variables restrict the corresponding playback options, which helps keep behavior safe and consistent.

# Example environment variables
export VOICEVOX_URL=http://localhost:50021
export VOICEVOX_DEFAULT_SPEAKER=1
export VOICEVOX_DEFAULT_SPEED_SCALE=1.0
export VOICEVOX_USE_STREAMING=false
export VOICEVOX_DEFAULT_IMMEDIATE=true
export VOICEVOX_DEFAULT_WAIT_FOR_START=false
export VOICEVOX_DEFAULT_WAIT_FOR_END=false
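If you write helper scripts around the server, it can be convenient to read the same variables into typed values. The following is a minimal sketch of that idea, not the server's own parsing logic: the `TtsSettings` class and `load_settings` function are hypothetical, and the fallback values simply mirror the example above rather than the server's built-in defaults.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TtsSettings:
    voicevox_url: str
    default_speaker: int
    default_speed_scale: float
    use_streaming: bool
    default_immediate: bool
    default_wait_for_start: bool
    default_wait_for_end: bool


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a "true"/"false" variable, falling back to `default` if unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def load_settings(env: Mapping[str, str] = os.environ) -> TtsSettings:
    """Build settings from environment variables (fallbacks mirror the example)."""
    return TtsSettings(
        voicevox_url=env.get("VOICEVOX_URL", "http://localhost:50021"),
        default_speaker=int(env.get("VOICEVOX_DEFAULT_SPEAKER", "1")),
        default_speed_scale=float(env.get("VOICEVOX_DEFAULT_SPEED_SCALE", "1.0")),
        use_streaming=_flag(env, "VOICEVOX_USE_STREAMING", False),
        default_immediate=_flag(env, "VOICEVOX_DEFAULT_IMMEDIATE", True),
        default_wait_for_start=_flag(env, "VOICEVOX_DEFAULT_WAIT_FOR_START", False),
        default_wait_for_end=_flag(env, "VOICEVOX_DEFAULT_WAIT_FOR_END", False),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching your shell environment.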

Security and access

Limit access to the MCP server by configuring allowed hosts and origins when using HTTP mode. If you expose the server on a network, ensure only trusted clients can reach it.
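To make the allowed-hosts idea concrete, the sketch below checks an incoming `Host` header value against a comma-separated allow-list such as the `MCP_ALLOWED_HOSTS` value shown earlier (`localhost,127.0.0.1`). It illustrates the general technique only; the function names are hypothetical and the server's actual validation may differ.

```python
from __future__ import annotations


def parse_allowed_hosts(raw: str) -> set[str]:
    """Split a comma-separated list such as "localhost,127.0.0.1"."""
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


def hostname_from_header(host_header: str) -> str:
    """Drop the optional port from a Host header value."""
    host = host_header.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:3000" -> "::1"
        end = host.find("]")
        return host[1:end] if end != -1 else host
    # "localhost:3000" -> "localhost"; a bare name is returned unchanged
    return host.split(":", 1)[0]


def is_host_allowed(host_header: str, allowed: set[str]) -> bool:
    """Return True when the header's hostname is in the allow-list."""
    return hostname_from_header(host_header) in allowed


allowed = parse_allowed_hosts("localhost,127.0.0.1")
print(is_host_allowed("localhost:3000", allowed))
print(is_host_allowed("evil.example.com:3000", allowed))
```

Rejecting requests whose `Host` header is not on the list is a common defense against DNS rebinding when a local HTTP server listens on a broad address such as `0.0.0.0`.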

Tools and usage

The server exposes a set of tools to perform synthesis and manage playback from MCP clients. The primary tool is speak, which converts text to speech. Other useful tools include ping_voicevox, get_speakers, get_speaker_detail, stop_speaker, generate_query, and synthesize_file.

Available tools

speak

Text-to-speech synthesis callable from MCP clients to convert text into audio. Supports per-segment speaker selection, speed control, and playback options.

ping_voicevox

Check connectivity to the VOICEVOX Engine.

get_speakers

Retrieve the list of available VOICEVOX speakers.

get_speaker_detail

Fetch detailed information for a specific speaker.

stop_speaker

Stop current playback and clear the queue.

generate_query

Create a speech synthesis query for debugging or inspection.

synthesize_file

Generate an audio file from text input.
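The tools listed above are all invoked through the same MCP request shape. As a sketch, the helpers below build a `tools/list` request (to discover each tool's real argument schema) and a `tools/call` request for a named tool, rejecting names that are not in the list from this page. The helper names are hypothetical, and no argument names are assumed: pass whatever the schema returned by `tools/list` requires.

```python
from __future__ import annotations

from typing import Any, Optional

# Tool names as listed in this document.
KNOWN_TOOLS = frozenset({
    "speak",
    "ping_voicevox",
    "get_speakers",
    "get_speaker_detail",
    "stop_speaker",
    "generate_query",
    "synthesize_file",
})


def build_tools_list_request(request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request asking the server for its tool schemas."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def build_tool_call(
    name: str,
    arguments: Optional[dict[str, Any]] = None,
    request_id: int = 1,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for one of the known tools."""
    if name not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# A connectivity check needs no arguments.
ping = build_tool_call("ping_voicevox", request_id=2)
print(ping)
```

Calling `tools/list` first and reading the returned input schemas is the safest way to learn which arguments get_speaker_detail, generate_query, and synthesize_file expect.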