AivisSpeech MCP Server

Provides an MCP-compliant interface to synthesize speech via the AivisSpeech Engine, enabling applications to generate voice output.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "kentaro-aivis-speech-mcp": {
      "command": "node",
      "args": [
        "/path/to/aivis-speech-mcp/dist/index.js"
      ],
      "env": {
        "AIVIS_SPEECH_API_URL": "http://localhost:10101",
        "AIVIS_SPEECH_SPEAKER_ID": "888753760"
      }
    }
  }
}

You can use the AivisSpeech MCP Server to expose speech synthesis capabilities to AI assistants and applications through the Model Context Protocol (MCP). This server talks to the AivisSpeech Engine to produce high‑quality voice output, while offering a type-safe TypeScript design and an architecture that’s easy to extend for your needs.

How to use

Connect your MCP client to the AivisSpeech MCP Server and request speech synthesis by sending the appropriate MCP requests. You can also retrieve speaker information and adjust voice styles as part of your integration. The server handles communication with the AivisSpeech Engine and returns synthesized audio data to your application.
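As a rough illustration of what such a request might look like on the wire, the sketch below builds an MCP tools/call message (JSON-RPC 2.0) for the synthesize tool. The tool name comes from the tool list on this page; the text argument is a hypothetical parameter name, since the server's actual input schema is not documented here. A real client would also complete the MCP initialize handshake before calling tools.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for a named tool.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "synthesize" is a tool listed for this server; the "text" argument is an
// assumed parameter name used only for illustration.
const request = buildToolCall(1, "synthesize", { text: "こんにちは" });

// Over the stdio transport, each message is sent as a single line of JSON.
const wireMessage = JSON.stringify(request) + "\n";
console.log(wireMessage);
```

In practice an MCP client library or an MCP-aware editor builds and sends these messages for you; the sketch only shows the envelope.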

How to install

Prerequisites: Node.js 18.x or newer and npm 9.x or newer. The AivisSpeech Engine is not bundled with this server; install it separately.

# Clone the repository
git clone https://github.com/kentaro/aivis-speech-mcp.git
cd aivis-speech-mcp

# Install dependencies
npm install

# Build the project
npm run build

# Set up environment variables
cp .env.sample .env
# Edit the .env file to configure settings

# Cursor MCP configuration
cp .cursor/mcp.json.sample .cursor/mcp.json
# In mcp.json, replace "/path/to/aivis-speech-mcp/dist/index.js" with the actual path
# Example: "C:/Users/yourname/path/to/aivis-speech-mcp/dist/index.js"

Notes on the final start command

With the stdio configuration, the MCP client starts the server itself by running node against the built dist/index.js, so no separate start command is needed once the path in the client configuration is correct. To run the server manually in development or production mode, use the npm scripts defined in the project's package.json.

Environment and configuration

Configure the connection to the AivisSpeech Engine and default speaker in your environment. The following are example values you can use as a starting point.

Configuration examples

# AivisSpeech API Configuration
AIVIS_SPEECH_API_URL=http://localhost:10101  # AivisSpeech Engine API endpoint

# Speaker Configuration
AIVIS_SPEECH_SPEAKER_ID=888753760  # Default speaker ID

{
  "mcpServers": {
    "AivisSpeech-MCP": {
      "command": "node",
      "args": ["/path/to/aivis-speech-mcp/dist/index.js"]
    }
  }
}
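
To make the role of the two environment variables concrete, here is a minimal sketch of how a server might read and validate them. The function and type names (loadConfig, AivisConfig) are hypothetical and not taken from the project; the defaults simply mirror the example values above.

```typescript
// Hypothetical config shape; the names are illustrative, not the project's own.
interface AivisConfig {
  apiUrl: string;
  speakerId: number;
}

// Defaults mirror the example values shown in this section.
const DEFAULT_API_URL = "http://localhost:10101";
const DEFAULT_SPEAKER_ID = 888753760;

// Read settings from an environment-like object, falling back to defaults.
function loadConfig(env: Record<string, string | undefined>): AivisConfig {
  const rawUrl = env.AIVIS_SPEECH_API_URL;
  // Strip trailing slashes so paths can be appended with a single "/".
  const apiUrl = (rawUrl && rawUrl.trim() !== "" ? rawUrl.trim() : DEFAULT_API_URL)
    .replace(/\/+$/, "");

  const rawId = env.AIVIS_SPEECH_SPEAKER_ID;
  const speakerId =
    rawId === undefined || rawId.trim() === "" ? DEFAULT_SPEAKER_ID : Number(rawId);
  if (!Number.isInteger(speakerId)) {
    throw new Error(`AIVIS_SPEECH_SPEAKER_ID must be an integer, got "${rawId}"`);
  }
  return { apiUrl, speakerId };
}

const config = loadConfig({ AIVIS_SPEECH_API_URL: "http://localhost:10101/" });
console.log(config);
```

In a Node.js process you would typically pass process.env as the argument. Failing fast on a non-numeric speaker ID surfaces a misconfigured .env file at startup rather than at the first synthesis request.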

Troubleshooting

Common issues and fixes include:

  • Cannot connect to AivisSpeech Engine: verify AIVIS_SPEECH_API_URL in your .env file
  • Audio not playing: check your system audio settings and active output device
  • Speaker ID not found: ensure the AivisSpeech Engine is running and list available speaker IDs
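
The first and last checks above can be scripted. The sketch below assumes the Engine exposes a VOICEVOX-compatible GET /speakers endpoint that returns an array of speakers, each with a styles array of { id, name } objects, and that the configured speaker ID corresponds to one of those style IDs. Both points are assumptions; confirm them against the Engine API documentation. The sample data is made up for illustration.

```typescript
// Assumed (VOICEVOX-compatible) shape of the GET /speakers response.
interface Style { id: number; name: string; }
interface Speaker { name: string; styles: Style[]; }

// Minimal fetch signature so the sketch does not depend on runtime typings.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Ask the Engine for its speakers. A rejected promise here usually means the
// URL is wrong or the Engine is not running.
async function fetchSpeakers(apiUrl: string, fetchFn: FetchLike): Promise<Speaker[]> {
  const res = await fetchFn(`${apiUrl}/speakers`);
  if (!res.ok) {
    throw new Error(`Engine responded with HTTP ${res.status}`);
  }
  return (await res.json()) as Speaker[];
}

// Collect every style ID the Engine reports.
function availableIds(speakers: Speaker[]): number[] {
  const ids: number[] = [];
  for (const speaker of speakers) {
    for (const style of speaker.styles) {
      ids.push(style.id);
    }
  }
  return ids;
}

// Hypothetical sample data, for illustration only.
const sample: Speaker[] = [
  { name: "Example Speaker", styles: [{ id: 888753760, name: "Normal" }] },
];
const configuredId = 888753760;
const found = availableIds(sample).indexOf(configuredId) !== -1;
console.log(found ? "speaker ID found" : "speaker ID not found");
```

Against a running Engine on Node.js 18 or newer, fetchSpeakers(apiUrl, fetch) would supply the real list in place of the sample.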

Architecture

The MCP Server is composed of an MCP service that handles MCP-compliant requests and an AivisSpeech service that communicates with the Engine to perform speech synthesis.
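
One way to picture this split is a protocol-facing class that delegates to an engine-facing interface. The sketch below is a simplified illustration, not the project's code: SpeechEngineService, McpToolService, callTool, and the argument names are all hypothetical, and the engine is replaced by a stand-in that records calls instead of contacting the Engine.

```typescript
// Engine-facing service: everything that talks to the Engine sits behind this.
interface SpeechEngineService {
  synthesize(text: string, speakerId: number): Promise<Uint8Array>;
}

// Protocol-facing service: validates tool calls and delegates to the engine.
class McpToolService {
  private readonly engine: SpeechEngineService;
  private readonly defaultSpeakerId: number;

  constructor(engine: SpeechEngineService, defaultSpeakerId: number) {
    this.engine = engine;
    this.defaultSpeakerId = defaultSpeakerId;
  }

  async callTool(
    name: string,
    args: { text?: string; speakerId?: number },
  ): Promise<string> {
    if (name !== "synthesize") {
      throw new Error(`Unknown tool: ${name}`);
    }
    if (!args.text) {
      throw new Error("synthesize requires a non-empty text argument");
    }
    const speakerId = args.speakerId ?? this.defaultSpeakerId;
    const audio = await this.engine.synthesize(args.text, speakerId);
    return `synthesized ${audio.byteLength} bytes`;
  }
}

// Stand-in engine for local testing; it records calls and returns dummy bytes.
const calls: Array<{ text: string; speakerId: number }> = [];
const fakeEngine: SpeechEngineService = {
  synthesize: async (text, speakerId) => {
    calls.push({ text, speakerId });
    return new Uint8Array(4);
  },
};

const service = new McpToolService(fakeEngine, 888753760);
service.callTool("synthesize", { text: "hello" }).then((summary) => {
  console.log(summary);
});
```

Keeping the Engine behind an interface is what makes the protocol layer testable without a running Engine, and it is one plausible reading of the "easy to extend" claim made earlier.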

API and MCP protocol

The server exposes MCP tools for text-to-speech synthesis, retrieving speaker information, and configuring voice styles. For protocol details, see the MCP specification; for available features and response formats, see the Engine API documentation.

Available tools

synthesize

Generate speech audio from text by sending a synthesis request to the AivisSpeech Engine through the MCP server.

getSpeakerInfo

Fetch information about available speakers from the AivisSpeech Engine.

setVoiceStyle

Configure voice style or parameters used during synthesis to influence tone, pace, and timbre.