Provides a bridge for Claude and other agents to analyze long videos and images using Qwen3-VL on Modal.
Configuration
```
{
  "mcpServers": {
    "adamanz-qwen-video-mcp-server": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/adamanz/qwen-video-mcp-server",
        "run",
        "server.py"
      ],
      "env": {
        "MODAL_APP": "qwen-video-understanding",
        "MODAL_WORKSPACE": "adam-31541",
        "QWEN_IMAGE_ENDPOINT": "Auto-generated",
        "QWEN_VIDEO_ENDPOINT": "Auto-generated"
      }
    }
  }
}
```

You deploy and run an MCP server that connects Claude to a video-understanding model on Modal, letting you analyze long videos and images with recall, grounding, OCR, and customizable prompts. This guide walks you through installing, configuring, using, and troubleshooting the Qwen Video Understanding MCP Server so you can bring these video-analysis capabilities into your AI workflows.
Use the server with an MCP client to run video and image analyses, extract on-screen text and speech, generate summaries, and perform targeted question answering about video content. The server acts as a bridge between Claude and your Qwen3-VL model on Modal, handling prompts, framing, and result extraction for you.
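Under the hood, an MCP client invokes these capabilities as JSON-RPC `tools/call` requests. As an illustrative sketch (the tool name `analyze_video` and the argument keys `video_url`, `prompt`, and `max_frames` are assumptions based on the descriptions below, not confirmed names from this server), a request might look like:

```
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "analyze_video",
    "arguments": {
      "video_url": "https://example.com/demo.mp4",
      "prompt": "Summarize the key events in this video.",
      "max_frames": 64
    }
  }
}
```

Claude Desktop constructs these requests for you; you normally only see the prompt and the returned analysis.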
Prerequisites: a Modal account with the `modal` CLI installed and authenticated, Python with `pip` (or `uv`) available, and Claude Desktop or another MCP client.
Follow these concrete steps to install and run the MCP server locally and connect it to Claude.
```
# 1. Deploy the model to Modal (if not already done)
cd ~/qwen-video-modal
modal deploy qwen_video.py

# 2. Install the MCP server
cd ~/qwen-video-mcp-server
pip install -e .
# Optional alternative using uv:
# uv pip install -e .

# 3. Configure environment
cp .env.example .env
# Edit .env with your Modal workspace name

# 4. Add to Claude Desktop (MCP config example)
# This is the runtime config used by Claude Desktop to start the MCP server
```
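After step 3, the resulting `.env` might look like the sketch below. The values are placeholders; the two endpoint variables are optional overrides and can be left unset so the server derives the URLs from the workspace and app names:

```
MODAL_WORKSPACE=your-modal-workspace
MODAL_APP=qwen-video-understanding
# Optional: only set these to override the derived endpoint URLs
# QWEN_IMAGE_ENDPOINT=https://image-endpoint.example
# QWEN_VIDEO_ENDPOINT=https://video-endpoint.example
```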
Note: adapt paths and workspace/app names to your environment as needed.

Configuration, usage, and troubleshooting details help you tailor the MCP server to your environment and catch issues early.
Server configuration and environment variables are shown below. Use these values as a reference when wiring up your Claude Desktop client.
```
{
  "mcpServers": {
    "qwen_video": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/youruser/qwen-video-mcp-server",
        "run",
        "server.py"
      ],
      "env": {
        "MODAL_WORKSPACE": "your-modal-workspace",
        "MODAL_APP": "qwen-video-understanding"
      }
    }
  },
  "envVars": [
    {"name": "MODAL_WORKSPACE", "description": "Your Modal workspace/username", "example": "your-workspace"},
    {"name": "MODAL_APP", "description": "Name of the Modal app for the MCP server", "example": "qwen-video-understanding"},
    {"name": "QWEN_IMAGE_ENDPOINT", "description": "Override image endpoint URL", "example": "https://image-endpoint.example"},
    {"name": "QWEN_VIDEO_ENDPOINT", "description": "Override video endpoint URL", "example": "https://video-endpoint.example"}
  ]
}
```

If you see a timeout, a 502/503 error, or a video-URL accessibility issue, try the following:

- Confirm the Modal app is deployed and running (`modal app list`).
- Double-check `MODAL_WORKSPACE` and `MODAL_APP` (or the `QWEN_*_ENDPOINT` overrides) in your environment.
- Make sure the video URL is publicly reachable without authentication.
- For long videos, lower `max_frames` to reduce processing time.
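One plausible way to resolve endpoints from these variables is to prefer an explicit override and otherwise derive the URL from the workspace and app names. The sketch below assumes Modal's default `https://{workspace}--{app}-{function}.modal.run` web-endpoint naming; the actual function suffixes depend on the deployed function names, so treat this as illustrative rather than the server's real logic:

```
def resolve_endpoint(kind: str, env: dict) -> str:
    """Return the endpoint URL for kind ('image' or 'video').

    Hypothetical logic: prefer an explicit QWEN_<KIND>_ENDPOINT override,
    otherwise build a Modal-style URL from MODAL_WORKSPACE and MODAL_APP.
    """
    override = env.get(f"QWEN_{kind.upper()}_ENDPOINT")
    if override and override != "Auto-generated":
        return override
    return f"https://{env['MODAL_WORKSPACE']}--{env['MODAL_APP']}-{kind}.modal.run"
```

This ordering lets the override variables short-circuit the derived URL, which is useful when an endpoint has a custom label.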
For development workflows, install dev dependencies and run tests to keep the MCP server reliable.
```
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest
```

The MCP server supports analyzing videos and images via URLs, reporting on-frame content, generating summaries in various styles, extracting on-screen text and transcribing speech, answering targeted questions, and comparing frames to track changes over time. You configure and run the server through a local command that starts the server process and connects to your Modal deployment; Claude then connects to the server to issue prompts and receive results.
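A minimal pytest-style sketch of the kind of unit test this workflow runs. The helper `is_supported_video_url` is hypothetical, invented here for illustration; the server's real helpers and test suite may differ:

```
# Hypothetical helper: the real server's validation logic may differ.
def is_supported_video_url(url: str) -> bool:
    """Accept only http(s) URLs ending in a common video extension."""
    return url.startswith(("http://", "https://")) and url.lower().endswith(
        (".mp4", ".mov", ".webm", ".mkv")
    )


def test_accepts_https_mp4():
    assert is_supported_video_url("https://example.com/clip.mp4")


def test_rejects_local_path():
    assert not is_supported_video_url("/tmp/clip.mp4")
```

Running `pytest` discovers and executes the `test_*` functions automatically.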
- Analyze a video via URL with a custom prompt and optional constraints such as max_frames to control processing.
- Analyze an image via URL with a custom prompt to describe it or extract details.
- Generate a video summary in different styles such as brief, standard, or detailed.
- Extract on-screen text and transcribe speech from a video.
- Answer specific questions about the content of a video.
- Analyze changes and progression in a video by comparing frames.
- Check the status of the Modal endpoint configuration.
- List all server capabilities and supported formats.