
Qwen Video Understanding MCP Server

Provides a bridge for Claude and other agents to analyze long videos and images using Qwen3-VL on Modal.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "adamanz-qwen-video-mcp-server": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/adamanz/qwen-video-mcp-server",
        "run",
        "server.py"
      ],
      "env": {
        "MODAL_APP": "qwen-video-understanding",
        "MODAL_WORKSPACE": "adam-31541",
        "QWEN_IMAGE_ENDPOINT": "Auto-generated",
        "QWEN_VIDEO_ENDPOINT": "Auto-generated"
      }
    }
  }
}

This MCP server connects Claude to a Qwen3-VL video understanding model deployed on Modal, so you can analyze long videos and images with recall, grounding, OCR, and custom prompts. This guide covers how to use, install, configure, and troubleshoot the Qwen Video Understanding MCP Server.

How to use

Use the server with an MCP client to run video and image analyses, extract on-screen text and speech, generate summaries, and perform targeted question answering about video content. The server acts as a bridge between Claude and your Qwen3-VL model on Modal, handling prompts, framing, and result extraction for you.

How to install

Prerequisites you must have before starting:

  • Modal account with access to a Modal workspace
  • Python 3.10 or newer
  • A deployed Qwen video understanding model on Modal

Follow these concrete steps to install and run the MCP server locally and connect it to Claude.

# 1. Deploy the Model to Modal (if not already done)
cd ~/qwen-video-modal
modal deploy qwen_video.py

# 2. Install the MCP Server
cd ~/qwen-video-mcp-server
pip install -e .

# Optional alternative using uv
# uv pip install -e .

# 3. Configure Environment
cp .env.example .env
# Edit .env with your Modal workspace name

# 4. Add to Claude Desktop
# Add the server entry from the Configuration section to your Claude Desktop MCP config.
# Claude Desktop uses that entry to start the MCP server.

Note: Adapt paths and workspace/app names to your environment as needed.
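
The configuration lists the two endpoint variables as auto-generated and overridable. The following is a minimal sketch of how that resolution could work, not the server's actual code. It assumes Modal's usual web endpoint URL pattern (`https://<workspace>--<app>-<label>.modal.run`). The `Settings` class, both helper functions, and the labels `analyze-image` and `analyze-video` are hypothetical names chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    workspace: str
    app: str
    image_endpoint: str
    video_endpoint: str


def default_endpoint(workspace: str, app: str, label: str) -> str:
    # Assumption: Modal's usual web endpoint pattern. The label is hypothetical.
    return f"https://{workspace}--{app}-{label}.modal.run"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables; explicit endpoint overrides win."""
    env = os.environ if env is None else env
    workspace = env.get("MODAL_WORKSPACE", "").strip()
    if not workspace:
        raise ValueError("MODAL_WORKSPACE is not set")
    # Assumption: fall back to the app name used in the example config.
    app = env.get("MODAL_APP") or "qwen-video-understanding"
    image = env.get("QWEN_IMAGE_ENDPOINT") or default_endpoint(
        workspace, app, "analyze-image")
    video = env.get("QWEN_VIDEO_ENDPOINT") or default_endpoint(
        workspace, app, "analyze-video")
    return Settings(workspace, app, image, video)
```

Passing a plain dictionary instead of reading `os.environ` keeps the function easy to test. If the derived URLs do not match your deployment, set QWEN_IMAGE_ENDPOINT and QWEN_VIDEO_ENDPOINT explicitly.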

Additional sections

Configuration, usage, and troubleshooting details help you tailor the MCP server to your environment and catch issues early.

Server configuration and environment variables are shown below. Use these values as a reference when wiring up your Claude Desktop client.

{
  "mcpServers": {
    "qwen_video": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/youruser/qwen-video-mcp-server",
        "run",
        "server.py"
      ],
      "env": {
        "MODAL_WORKSPACE": "your-modal-workspace",
        "MODAL_APP": "qwen-video-understanding"
      }
    }
  }
}

Environment variables:

  • MODAL_WORKSPACE: Your Modal workspace/username (example: your-workspace)
  • MODAL_APP: Name of the Modal app for the MCP server (example: qwen-video-understanding)
  • QWEN_IMAGE_ENDPOINT: Override image endpoint URL (example: https://image-endpoint.example)
  • QWEN_VIDEO_ENDPOINT: Override video endpoint URL (example: https://video-endpoint.example)
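
If your client config already contains other servers, you can merge the entry programmatically instead of editing by hand. The sketch below is one way to do that, under stated assumptions: `build_server_entry` and `add_server` are hypothetical helpers, and the config file path is whatever your MCP client uses.

```python
import json
from pathlib import Path


def build_server_entry(server_dir: str, workspace: str,
                       app: str = "qwen-video-understanding") -> dict:
    """Build the server entry shown in the reference configuration."""
    return {
        "command": "uv",
        "args": ["--directory", server_dir, "run", "server.py"],
        "env": {"MODAL_WORKSPACE": workspace, "MODAL_APP": app},
    }


def add_server(config_path: Path, name: str, entry: dict) -> dict:
    """Insert or replace one entry under mcpServers, keeping the others."""
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On macOS, Claude Desktop typically reads its config from `~/Library/Application Support/Claude/claude_desktop_config.json`. Confirm the location for your client and platform, and restart the client after changing the file.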

Troubleshooting

If you see a timeout, a 502/503 error, or a video URL accessibility issue, try the following:

  • Check that the video URL is publicly accessible and does not require authentication
  • Reduce max_frames or use shorter video segments to avoid timeouts
  • Wait and retry if the Modal container is starting up (cold start)
  • Ensure the Modal workspace and app names in the environment match your actual setup
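
The "wait and retry" advice for cold starts can be sketched as a retry loop with exponential backoff. This is an illustrative pattern, not part of the server: `TransientError` and `call_with_retry` are hypothetical names, and you would raise `TransientError` yourself when a request times out or returns 502/503.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for timeouts and 502/503 responses."""


def call_with_retry(fn: Callable[[], T], attempts: int = 4,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on TransientError with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With the defaults, the delays are 2, 4, and 8 seconds before the final attempt. Retrying does not help when the video URL itself is inaccessible, so check that first.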

Development

For development workflows, install dev dependencies and run tests to keep the MCP server reliable.

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

Notes on usage and capabilities

The MCP server supports analyzing videos and images via URLs, describing frame content, generating summaries in various styles, extracting on-screen text and speech, answering targeted questions, and comparing frames to track changes over time. You start the server with a local command, the server forwards requests to your Modal deployment, and Claude connects to it to issue prompts and receive results.

Available tools

analyze_video

Analyze a video via URL with a custom prompt and optional constraints like max_frames to control processing.

analyze_image

Analyze an image via URL with a custom prompt to describe or extract details.

summarize_video

Generate a video summary in different styles such as brief, standard, or detailed.

extract_video_text

Extract on-screen text and transcribe speech from a video.

video_qa

Answer specific questions about the content of a video.

compare_video_frames

Analyze changes and progression in a video by comparing frames.

check_endpoint_status

Check the status of the Modal endpoint configuration.

list_capabilities

List all server capabilities and supported formats.
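
MCP clients invoke these tools with a JSON-RPC `tools/call` request. The sketch below shows the shape of such a request for `analyze_video`. Only the tool name and `max_frames` come from this guide. The argument names `video_url` and `prompt`, the example URL, and the helper `tools_call_request` are assumptions, so check the tool's published input schema for the real names.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, tool: str,
                       arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


request = tools_call_request(1, "analyze_video", {
    "video_url": "https://example.com/clip.mp4",   # hypothetical argument name
    "prompt": "List the main events in order.",    # hypothetical argument name
    "max_frames": 32,                              # lower this to avoid timeouts
})
print(request)
```

In practice Claude Desktop or another MCP client builds this message for you. The sketch is mainly useful for debugging or for scripting calls against the server.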