
Voice-to-Text MCP Server

A collection of MCP services

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "gongjiaben-mcp": {
      "url": "https://bailian.aliyuncs.com/v1/audio/transcriptions",
      "headers": {
        "DEFAULT_ENGINE": "remote_api",
        "BAILIAN_API_KEY": "your_bailian_api_key",
        "BAILIAN_API_URL": "https://bailian.aliyuncs.com/v1/audio/transcriptions",
        "DEFAULT_API_TYPE": "bailian",
        "DEFAULT_LANGUAGE": "zh-CN"
      }
    }
  }
}

This MCP server provides voice-to-text transcription with support for multiple engines, audio formats, and languages. It can transcribe single files or batches, analyze audio metadata, and convert between audio formats, using either a remote API or a local engine, and it is driven from an MCP client.

How to use

You use an MCP client to talk to the server to transcribe audio, analyze files, and convert formats. Start by launching the server, then choose an engine (remote API or local), and finally run a single transcription or a batch job. You can combine single-file transcriptions, batch jobs, and format conversions to fit your workflow.

How to install

# Prerequisites: Python and uv (the Python package and project manager)

# Option 1: Install with uv (recommended)
# Clone the project folder that contains the MCP server
# Follow project-specific clone steps if you have a repository URL

# Install dependencies via the runtime
uv sync

# Run the MCP server
uv run mcp dev main.py

# Or run directly
uv run python main.py

Additional configuration and usage notes

Configure the server defaults to tailor transcription behavior and API usage. Set the default language, engine, and API type, and provide your API keys and endpoints as environment variables.

# Environment defaults
export DEFAULT_LANGUAGE=zh-CN
export DEFAULT_ENGINE=remote_api
export DEFAULT_API_TYPE=bailian
export BAILIAN_API_KEY=your_bailian_api_key
export BAILIAN_API_URL=https://bailian.aliyuncs.com/v1/audio/transcriptions
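
As an illustration of how these variables could be consumed, here is a minimal Python sketch. The `TranscriptionDefaults` class and `load_defaults` function are hypothetical names, not part of the server, and the fallback values simply mirror the example values shown above; the server's real defaults may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionDefaults:
    """Hypothetical container for the documented environment defaults."""
    language: str
    engine: str
    api_type: str
    api_key: str
    api_url: str


def load_defaults(env=None):
    """Read the documented variables from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return TranscriptionDefaults(
        # Fallbacks are assumptions copied from the example exports.
        language=env.get("DEFAULT_LANGUAGE", "zh-CN"),
        engine=env.get("DEFAULT_ENGINE", "remote_api"),
        api_type=env.get("DEFAULT_API_TYPE", "bailian"),
        # Credentials have no sensible fallback, so they default to empty.
        api_key=env.get("BAILIAN_API_KEY", ""),
        api_url=env.get("BAILIAN_API_URL", ""),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.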

What you can do with the MCP server

- Transcribe a single audio file using a chosen engine (remote API or Google Speech Recognition).
- Transcribe multiple files in a batch.
- Analyze an audio file to get format, duration, and sample rate.
- Convert audio between formats like WAV, MP3, M4A, FLAC, OGG, and AAC.
- Retrieve supported input formats and available output text formats (TXT, SRT, VTT).
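
To make the SRT output format concrete, the following Python sketch renders timed text segments as SRT cues. It is not the server's implementation: the `(start, end, text)` segment shape and the function names `srt_timestamp` and `to_srt` are assumptions chosen for illustration, and the source does not say how the server obtains segment timings.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT puts a comma before milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

VTT differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds.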

Troubleshooting

If you encounter issues with API keys or endpoints, verify environment variables or pass them directly in the API call. For format or network problems, ensure FFmpeg is installed and the chosen engine has network access if using remote APIs.
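
The checks described above can be sketched as a small diagnostic in Python. The function name `find_setup_problems` is hypothetical, and treating both `BAILIAN_API_KEY` and `BAILIAN_API_URL` as required is an assumption that only applies when the Bailian remote API is the chosen engine.

```python
import os
import shutil

# Assumption: these two variables are needed for the Bailian remote API.
REQUIRED_VARS = ("BAILIAN_API_KEY", "BAILIAN_API_URL")


def find_setup_problems(env=None, which=shutil.which):
    """Return a list of human-readable setup problems (empty if none found)."""
    env = os.environ if env is None else env
    problems = [
        f"environment variable {name} is not set"
        for name in REQUIRED_VARS
        if not env.get(name)
    ]
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Calling `find_setup_problems()` with no arguments inspects the current process environment and PATH.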

Tools and endpoints overview

The server exposes functions to perform transcription and analysis tasks, including: transcribe_audio_file, transcribe_audio_data, transcribe_with_remote_api, batch_transcribe, analyze_audio_file, convert_audio_file_format, and get_supported_formats. It also provides a resource view via audio://info/{file_path} and audio://formats for supported formats.
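
MCP clients invoke tools and read resources through JSON-RPC 2.0 messages (`tools/call` and `resources/read`). The Python sketch below builds such messages for the names listed above. The helper names are hypothetical, and the argument names `file_path` and `language` are assumptions: the source does not document the tools' parameter schemas, so check the schema the server reports via `tools/list` before relying on them.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def resource_read_request(request_id, uri):
    """Build a JSON-RPC 2.0 request that reads an MCP resource by URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


if __name__ == "__main__":
    # Argument names below are assumed, not taken from the server's schema.
    call = tool_call_request(
        1, "transcribe_audio_file", {"file_path": "meeting.wav", "language": "zh-CN"}
    )
    print(json.dumps(call, ensure_ascii=False, indent=2))
    print(json.dumps(resource_read_request(2, "audio://formats"), indent=2))
```

In practice an MCP client library assembles these messages for you; the sketch only shows what travels over the wire.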

Security and best practices

Keep API keys confidential. Use environment variables or secure secret management. Limit access to the MCP server to trusted clients, and rotate keys regularly.

Available tools

transcribe_audio_file

Transcribes a single audio file using a selected engine (remote API or Google Speech Recognition) and language.

transcribe_audio_data

Transcribes provided audio data into text, with language and engine options.

transcribe_with_remote_api

Transcribes audio by calling a remote API, requiring API type and credentials.

batch_transcribe

Transcribes multiple audio files in a batch, returning a collection of results.
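
One way to think about batch behavior is that a failure on one file should not abort the rest. The Python sketch below illustrates that pattern under stated assumptions: `batch_transcribe_sketch` and the result keys (`file`, `ok`, `text`, `error`) are hypothetical, and `transcribe` stands in for whatever single-file function is used. The server's actual result shape is not documented in the source.

```python
def batch_transcribe_sketch(paths, transcribe):
    """Apply `transcribe` to each path, recording per-file success or failure."""
    results = []
    for path in paths:
        try:
            results.append({"file": path, "ok": True, "text": transcribe(path)})
        except Exception as exc:
            # Keep going: one bad file should not lose the other results.
            results.append({"file": path, "ok": False, "error": str(exc)})
    return results
```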

analyze_audio_file

Analyzes an audio file to extract metadata such as format, duration, and sample rate.

convert_audio_file_format

Converts an audio file from one format to another (e.g., MP3 to WAV).
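
Since the troubleshooting notes point to FFmpeg for format problems, a conversion can be sketched in Python as a call to the `ffmpeg` command-line tool, which infers the output format from the destination file extension. The helper names are hypothetical, and whether the server shells out to `ffmpeg` this way is an assumption.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Build the argument list; -y overwrites dst, -i names the input file."""
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_audio(src, dst):
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```

For example, `convert_audio("talk.mp3", "talk.wav")` would produce a WAV file next to the source, provided `ffmpeg` is installed and on PATH.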

get_supported_formats

Returns the list of supported input and output formats.