Whisper Speech Recognition MCP Server

Faster Whisper MCP Server: AI-powered audio transcription using the Whisper model with MCP integration. Supports multiple languages, batch processing, and several output formats (VTT, SRT, JSON).

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "jacobngai-fast-whisper-mcp-server": {
      "command": "python",
      "args": [
        "D:/path/to/whisper_server.py"
      ]
    }
  }
}

You can run a high-performance Whisper-based speech recognition MCP server that transcribes single files or batches, uses CUDA acceleration when available, and outputs VTT, SRT, or JSON. This guide shows you how to install and run the server, configure it in an MCP client, and use the available transcription tools.

How to use

You interact with the server through an MCP client to transcribe audio files or batches. The server exposes tools to get model information and perform transcriptions in single-file or batch modes. To start using it, launch the server locally, configure it in your MCP client, and then invoke the transcription tools from your client’s UI or CLI.

How to install

```bash
# Prerequisites
# Ensure you have Python 3.10+
# Install dependencies in a virtual environment (recommended)
python -m venv venv
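# Activate the environment with the line that matches your platform
source venv/bin/activate      # Unix or macOS
# venv\Scripts\activate       # Windows (run this instead of the line above)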

# Install Python dependencies
pip install -r requirements.txt

# Optional: Install PyTorch and torchaudio per your CUDA version
# CUDA 12.6
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# CUDA 12.1
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

# CPU only
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
```

```bash
# Verify CUDA availability (optional)
nvcc --version
nvidia-smi
```
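The same check can be run from Python inside the virtual environment. The sketch below is illustrative and not part of the server: `cuda_report` is a hypothetical helper name, and it assumes only that `nvcc` and `nvidia-smi` would be on your `PATH` if the CUDA toolkit and driver are installed. If PyTorch is not installed, the report simply says so.

```python
import importlib.util
import shutil


def cuda_report() -> dict:
    """Collect basic facts about the local CUDA setup (hypothetical helper)."""
    report = {
        # shutil.which returns None when the executable is not on PATH.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["torch_cuda_available"] = bool(torch.cuda.is_available())
        except ImportError:
            # A broken or partial install is treated as "no CUDA".
            pass
    return report


print(cuda_report())
```

If `torch_cuda_available` is `False` while `nvidia_smi_on_path` is `True`, a likely cause is that a CPU-only PyTorch build is installed.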

```bash
# Start the server (example)
python whisper_server.py
```

Configuration and usage notes

You can configure an MCP client to start and control the server. A common configuration example for integrating with a client like Claude Desktop is shown below. This config runs the server locally via Python and points to the script that starts the server.

{
  "mcpServers": {
    "whisper": {
      "command": "python",
      "args": ["D:/path/to/whisper_server.py"],
      "env": {}
    }
  }
}
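If your client configuration already lists other servers, you may prefer to add the entry programmatically rather than edit the file by hand. The following is a minimal sketch: `add_whisper_server` is a hypothetical helper, the `other-server` entry is made-up sample data, and the script path is the same placeholder used in the configuration above. Where the configuration file lives depends on your client, so reading and writing the file is left out.

```python
import json


def add_whisper_server(config: dict, script_path: str, name: str = "whisper") -> dict:
    """Return a copy of an MCP client config with the Whisper entry added."""
    # Copy the server map so the caller's dictionary is not mutated.
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {
        "command": "python",
        "args": [script_path],
        "env": {},
    }
    return {**config, "mcpServers": servers}


# Sample existing configuration (made up for illustration).
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_whisper_server(existing, "D:/path/to/whisper_server.py")
print(json.dumps(updated, indent=2))
```

The helper returns a new dictionary instead of modifying its input, so you can compare the old and new configuration before saving anything.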

Starting and using the server with an MCP client

On Windows, you can start the server with the batch script, if one is available. On other platforms, start it with Python: activate the environment where the dependencies are installed, then run the start command.

python whisper_server.py

Available transcription tools

The server provides the following transcription tools, which you can invoke from your MCP client:
- get_model_info: Retrieve information about available Whisper models
- transcribe: Transcribe a single audio file
- batch_transcribe: Transcribe multiple audio files in a folder

{
  "tool": "get_model_info",
  "description": "Get information about available Whisper models"
}

Available tools

get_model_info

Get information about available Whisper models

transcribe

Transcribe a single audio file to text in VTT, SRT, or JSON format

batch_transcribe

Batch transcribe audio files in a folder for bulk processing
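To make the VTT and SRT output formats concrete, the sketch below renders a few timed segments in both. It is not the server's implementation: the function names and the `(start, end, text)` segment tuples are assumptions chosen for illustration. The two formats differ mainly in the millisecond separator (a comma in SRT, a period in VTT), the numbered cues in SRT, and the `WEBVTT` header line in VTT.

```python
def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments) -> str:
    """Render segments as SRT: numbered cues, comma before milliseconds."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """Render segments as WebVTT: header line, period before milliseconds."""
    cues = []
    for start, end, text in segments:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        cues.append(f"{timing}\n{text}\n")
    return "WEBVTT\n\n" + "\n".join(cues)


# Sample segments (made up for illustration).
segments = [(0.0, 2.5, "Hello."), (2.5, 5.04, "Welcome back.")]
print(to_srt(segments))
print(to_vtt(segments))
```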