Faster Whisper MCP Server - AI-powered audio transcription using the Whisper model with MCP integration. Supports multiple languages, batch processing, and several output formats (VTT, SRT, JSON).
Configuration
```json
{
  "mcpServers": {
    "jacobngai-fast-whisper-mcp-server": {
      "command": "python",
      "args": [
        "D:/path/to/whisper_server.py"
      ]
    }
  }
}
```
You can run a high-performance Whisper-based speech recognition MCP server that transcribes audio efficiently, supports batch processing, uses CUDA acceleration when available, and outputs VTT, SRT, and JSON. This guide shows you how to install and run the server, configure it in an MCP client, and use the available transcription tools.
You interact with the server through an MCP client to transcribe audio files or batches. The server exposes tools to get model information and perform transcriptions in single-file or batch modes. To start using it, launch the server locally, configure it in your MCP client, and then invoke the transcription tools from your client’s UI or CLI.
```bash
# Prerequisites
# Ensure you have Python 3.10+
# Install dependencies in a virtual environment (recommended)
python -m venv venv
source venv/bin/activate # on Unix or macOS
venv\Scripts\activate # on Windows
# Install Python dependencies
pip install -r requirements.txt
# Optional: Install PyTorch and torchaudio per your CUDA version
# CUDA 12.6
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
# CUDA 12.1
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
# CPU only
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu
```
```bash
# Verify CUDA availability (optional)
nvcc --version
nvidia-smi
```
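The same check can be run from Python, which also confirms whether PyTorch itself can see the GPU. This is a minimal sketch and not part of the server: the function name `detect_cuda` and the report keys are made up for illustration, and the PyTorch check is skipped if `torch` is not installed.

```python
import shutil


def detect_cuda():
    """Report which CUDA-related checks succeed on this machine."""
    report = {
        # True if the command-line tools are on PATH
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "nvcc_found": shutil.which("nvcc") is not None,
        "torch_cuda_available": False,
    }
    try:
        import torch  # optional: only present if PyTorch is installed

        report["torch_cuda_available"] = bool(torch.cuda.is_available())
    except ImportError:
        pass  # PyTorch not installed; leave the flag False
    return report


if __name__ == "__main__":
    for check, ok in detect_cuda().items():
        print(f"{check}: {ok}")
```

If `torch_cuda_available` is `False` while `nvidia_smi_found` is `True`, the installed PyTorch build is probably a CPU-only wheel or does not match the CUDA version.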
```bash
# Start the server (example)
python whisper_server.py
```
You can configure an MCP client to start and control the server. A common configuration for a client such as Claude Desktop is shown below. It runs the server locally via Python and points to the script that starts the server.
```json
{
  "mcpServers": {
    "whisper": {
      "command": "python",
      "args": ["D:/path/to/whisper_server.py"],
      "env": {}
    }
  }
}
```
On Windows, you can start the server with the batch script, if one is available. On other platforms, start it with Python: activate the environment where the dependencies are installed, then run the start command.
```bash
python whisper_server.py
```
The server provides the following transcription tools, which you can invoke from your MCP client:
- `get_model_info`: Retrieve information about available Whisper models
- `transcribe`: Transcribe a single audio file
- `batch_transcribe`: Transcribe multiple audio files in a folder
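Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such requests by hand to show their shape. The tool names are the ones listed above; the argument names `audio_path` and `output_format` and the audio file path are assumptions for illustration only, so check the input schema the server reports before relying on them. In normal use your MCP client builds these messages for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def build_tool_call(name, arguments=None):
    """Return an MCP `tools/call` request as a JSON-RPC 2.0 message (a dict)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Assumed here to take no arguments.
info_request = build_tool_call("get_model_info")

# Hypothetical argument names: the real ones are defined by the server's tool schema.
transcribe_request = build_tool_call(
    "transcribe",
    {"audio_path": "D:/path/to/audio.mp3", "output_format": "srt"},
)

print(json.dumps(info_request, indent=2))
print(json.dumps(transcribe_request, indent=2))
```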
```json
{
  "tool": "get_model_info",
  "description": "Get information about available Whisper models"
}
```
- `get_model_info`: Get information about available Whisper models
- `transcribe`: Transcribe a single audio file to text in VTT, SRT, or JSON format
- `batch_transcribe`: Batch transcribe audio files in a folder for bulk processing