Provides high‑performance Whisper-based transcription with batch support, CUDA acceleration, and multiple output formats.
Configuration
{
  "mcpServers": {
    "biguncle-fast-whisper-mcp-server": {
      "command": "python",
      "args": [
        "D:/path/to/whisper_server.py"
      ],
      "env": {
        "ENV_PLACEHOLDER": "YOUR_VALUE"
      }
    }
  }
}

You can run a high-performance Whisper-based MCP server that transcribes audio efficiently, supports batch processing, and outputs formats like VTT, SRT, and JSON. It includes model caching and dynamic batching to maximize GPU usage, making large transcription tasks fast and scalable.
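The dynamic batching idea can be illustrated with a small sketch. Everything below is hypothetical (the function and parameter names are not from the server's code), and the real heuristic may differ:

```python
def pick_batch_size(free_vram_bytes, bytes_per_item, max_batch=32):
    """Pick the largest batch that fits in the reported free GPU memory.

    Illustrative heuristic: clamp the number of items that fit into
    free VRAM to the range [1, max_batch].
    """
    if bytes_per_item <= 0:
        raise ValueError("bytes_per_item must be positive")
    return max(1, min(max_batch, free_vram_bytes // bytes_per_item))
```

In practice the free-memory figure would come from the CUDA runtime (for example `torch.cuda.mem_get_info()`), and the per-item cost would be estimated from audio length and model size.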
Start by launching the MCP server locally through your preferred MCP client. Once running, you can access tools to get model information, transcribe a single audio file, or batch transcribe all audio files in a folder. For integration with GUI clients like Claude Desktop, configure the client to point at the Whisper MCP server and use the provided tools for common transcription tasks.
Prerequisites: Python 3.10 or later. You also need the following Python packages installed: faster-whisper>=0.9.0, torch==2.6.0+cu126, torchaudio==2.6.0+cu126, and mcp[cli]>=1.2.0.
1. Clone or download the project files.
2. Create and activate a virtual environment (recommended). Use a Python venv or your preferred environment manager.
3. Install dependencies.
pip install -r requirements.txt

4. Start the server on your platform. On Windows, run the startup script. On other platforms, you can start the server with Python directly.
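Based on the prerequisites above, the referenced requirements.txt would contain pins along these lines:

```
faster-whisper>=0.9.0
torch==2.6.0+cu126
torchaudio==2.6.0+cu126
mcp[cli]>=1.2.0
```

The `+cu126` wheels are CUDA builds; installing them typically requires pointing pip at the PyTorch CUDA package index (for example via `--extra-index-url`).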
Configure your MCP client to connect to the Whisper server. If you are using Claude Desktop, place the following configuration in your Claude Desktop config file and restart Claude Desktop.
{
  "mcpServers": {
    "whisper": {
      "command": "python",
      "args": ["D:/path/to/whisper_server.py"],
      "env": {}
    }
  }
}

Once connected, three tools are available. Retrieve information about available Whisper models (sizes, capabilities) to determine the best model for your transcription needs.
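If you prefer not to edit the JSON by hand, the entry can be merged into an existing Claude Desktop config programmatically. This is a stdlib-only sketch; the helper name and the paths are illustrative, not part of the project:

```python
import json
from pathlib import Path


def add_whisper_server(config_path, server_script):
    """Merge a 'whisper' entry into a Claude Desktop config file,
    creating the file if it does not exist yet."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})["whisper"] = {
        "command": "python",
        "args": [str(server_script)],
        "env": {},
    }
    path.write_text(json.dumps(config, indent=2))
```

Because the function merges rather than overwrites, any other servers already configured in the file are preserved.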
Transcribe a single audio file into the desired output format (VTT, SRT, or JSON).
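The difference between the subtitle formats is mostly timestamp syntax: SRT separates milliseconds with a comma, WebVTT with a period. A small stdlib sketch of the conversion (these helpers are illustrative, not the server's code):

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def vtt_timestamp(seconds):
    """Format seconds as a WebVTT timestamp: HH:MM:SS.mmm."""
    return srt_timestamp(seconds).replace(",", ".")
```

For example, `srt_timestamp(3661.5)` yields `"01:01:01,500"`.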
Batch transcribe all audio files within a folder, with dynamic batching based on GPU memory.
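Batch transcription starts from discovering the audio files in a folder. A minimal sketch of that step (the extension set here is an assumption, not the server's exact list):

```python
from pathlib import Path

# Assumed extension list; the server may accept a different set.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def collect_audio_files(folder):
    """Return audio files in a folder, sorted for a stable batch order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

The resulting list would then be chunked into batches sized to the available GPU memory before transcription.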