Qwen3-VL Video Understanding MCP Server (Blaxel)
Provides MCP-based video and image analysis via Blaxel using Qwen3-VL-8B-Instruct, including summarization and text extraction.
Configuration
{
  "mcpServers": {
    "adamanz-qwen-video-blaxel-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/qwen-video-blaxel-mcp",
        "run",
        "server.py"
      ],
      "env": {
        "BLAXEL_MODEL": "qwen-qwen3-vl-8b-instruct",
        "BLAXEL_API_KEY": "YOUR_API_KEY",
        "BLAXEL_API_URL": "https://api.blaxel.ai/v1"
      }
    }
  }
}

You deploy the Qwen3-VL-8B-Instruct model on Blaxel and run the Qwen3-VL Video Understanding MCP Server locally so that Claude and other agents can analyze videos and images. The server uses Qwen3-VL-8B-Instruct on Blaxel’s H100 GPUs to provide video analysis, image analysis, video summarization, text extraction, and video Q&A, all exposed through a lightweight MCP server that runs on your machine.
You will interact with this MCP server from your MCP client to perform common analysis tasks. You can analyze a video by URL with a custom prompt, analyze an image by URL, generate summaries in different styles, extract on-screen text and transcriptions, and ask targeted questions about video content. When you start the server, you provide your Blaxel API key and model, then call the available functions from your client to perform these actions.
Typical usage patterns include sending a video URL and a natural language prompt to describe or query the content, requesting a summarized version of the video in a chosen style, or extracting both text and speech from video footage. Your MCP client will expose these functions so you can integrate them into your workflows and agents.
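As a rough illustration of the usage pattern above, the sketch below builds the JSON-RPC message an MCP client sends when it calls a tool. The `tools/call` method and the `name`/`arguments` fields come from the MCP protocol; the tool name `analyze_video` and the argument names `video_url` and `prompt` are hypothetical placeholders, since the actual names are defined by the server.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument names, used only for illustration.
request = build_tool_call(
    "analyze_video",
    {
        "video_url": "https://example.com/demo.mp4",
        "prompt": "Describe what happens in this clip.",
    },
)
print(request)
```

In practice your MCP client (for example Claude Desktop) builds and sends these messages for you; the sketch only shows what travels over the wire.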
Prerequisites: Python 3.10 or newer, and ffmpeg for video frame extraction. You also need an active Blaxel account and access to the Blaxel CLI.
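A quick way to confirm the local prerequisites is a small check script. The sketch below is not part of the project; `check_prerequisites` is a hypothetical helper, and the `blaxel` executable name is assumed from the commands used in this guide.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)
REQUIRED_TOOLS = ("ffmpeg", "blaxel")  # "blaxel" is assumed from this guide's commands


def check_prerequisites(version_info=None, which=shutil.which) -> list:
    """Return a list of problems; an empty list means every check passed."""
    version = tuple(version_info or sys.version_info)[:2]
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


for problem in check_prerequisites():
    print("Missing prerequisite:", problem)
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.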
1) Deploy the model to Blaxel using the provided configuration. This registers the Qwen3-VL-8B-Instruct model on your Blaxel account with GPU support.

```bash
cat << 'EOF' | blaxel apply -f -
apiVersion: blaxel.ai/v1alpha1
kind: Model
metadata:
  name: qwen-qwen3-vl-8b-instruct
  displayName: Qwen/Qwen3-VL-8B-Instruct
spec:
  enabled: true
  policies: []
  flavors:
    - name: nvidia-h100/x4
      type: gpu
  runtime:
    model: Qwen/Qwen3-VL-8B-Instruct
    type: hf_private_endpoint
    image: ''
    args: []
    endpointName: qwenqwen3-vl-8b-instruct-nvidia-h100
    organization: adamanz
  integrationConnections:
    - huggingface-4s2m2h
EOF
```
Or use the provided config:

```bash
blaxel apply -f blaxel-model.yaml
```

2) Get your API key from Blaxel to authenticate requests.

```bash
blaxel auth token
```

3) Install the MCP Server locally. Change to the project directory and install in editable mode.

```bash
cd qwen-video-blaxel-mcp
pip install -e .
```
Or install with uv for development:

```bash
uv pip install -e .
```

4) Configure environment variables for your Blaxel credentials and model.

```bash
cp .env.example .env
# Edit .env with your Blaxel API key and desired model
```

5) Add the MCP server configuration to Claude Desktop so you can run it directly from the Claude interface.
{
  "mcpServers": {
    "qwen_blaxel_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/qwen-video-blaxel-mcp",
        "run",
        "server.py"
      ],
      "env": {
        "BLAXEL_API_KEY": "your-blaxel-api-key",
        "BLAXEL_MODEL": "qwen-qwen3-vl-8b-instruct"
      }
    }
  }
}

6) Restart Claude Desktop to load the new MCP server. The qwen_blaxel_mcp tools will be available for use from your Claude interface.
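If you already have other MCP servers configured, the entry from step 5 has to be merged into the existing `mcpServers` object rather than replacing it. The following sketch shows one way to do that merge in memory; `add_mcp_server` is a hypothetical helper, and reading or writing the actual Claude Desktop config file is left out because its location depends on your platform.

```python
import json


def add_mcp_server(config: dict, name: str, project_dir: str,
                   api_key: str, model: str) -> dict:
    """Return a copy of a Claude Desktop config with one MCP server added or replaced."""
    servers = dict(config.get("mcpServers", {}))  # copy so the input is not mutated
    servers[name] = {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "server.py"],
        "env": {"BLAXEL_API_KEY": api_key, "BLAXEL_MODEL": model},
    }
    return {**config, "mcpServers": servers}


# "other_server" stands in for whatever servers you already have configured.
existing = {"mcpServers": {"other_server": {"command": "node", "args": ["index.js"]}}}
updated = add_mcp_server(
    existing,
    name="qwen_blaxel_mcp",
    project_dir="/path/to/qwen-video-blaxel-mcp",
    api_key="your-blaxel-api-key",
    model="qwen-qwen3-vl-8b-instruct",
)
print(json.dumps(updated, indent=2))
```

Returning a new dictionary instead of editing in place makes it easy to compare the result with the original before saving it.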
With this configuration in place, the server authenticates to Blaxel with your API key and uses the Qwen3-VL-8B-Instruct model on H100 GPUs. ffmpeg is required to extract frames from videos for analysis. You pass video and image URLs when calling the tools from your MCP client.
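To make the role of ffmpeg concrete, the sketch below builds a frame-sampling command using ffmpeg's `fps` video filter and a numbered image output pattern. It is an assumption about how frames might be sampled, not the server's actual implementation; `frame_extraction_command`, the sampling rate, and the output file naming are all hypothetical.

```python
import os
import shlex


def frame_extraction_command(video_path: str, output_dir: str, fps: float = 1.0) -> list:
    """Build an ffmpeg argument list that samples `fps` frames per second as JPEGs."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", video_path,                 # input video file
        "-vf", f"fps={fps}",              # keep `fps` frames per second of video
        os.path.join(output_dir, "frame_%04d.jpg"),  # frame_0001.jpg, frame_0002.jpg, ...
    ]


command = frame_extraction_command("clip.mp4", "frames", fps=0.5)
print(shlex.join(command))
# To run it: subprocess.run(command, check=True), after creating the output directory.
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.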
Supported formats: Video formats include mp4, webm, mov, and avi; image formats include jpg, jpeg, png, gif, and webp.
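Before sending a URL to the server, a client can check its extension against the formats listed above. This is a minimal sketch with a hypothetical helper name, `classify_media_url`; it looks only at the URL path, so URLs without a file extension are reported as unsupported even if the content behind them is valid.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

VIDEO_EXTENSIONS = {".mp4", ".webm", ".mov", ".avi"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def classify_media_url(url: str) -> str:
    """Return "video", "image", or "unsupported" based on the URL's file extension."""
    # urlparse drops the query string, so "?token=..." does not affect the suffix
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return "unsupported"


print(classify_media_url("https://example.com/clip.MP4?token=abc"))
print(classify_media_url("https://example.com/photo.webp"))
```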
If you need to inspect or adjust the Blaxel configuration, you can run a quick check to verify API connectivity and model readiness.
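A local pre-flight check can catch missing settings before any request is made. The sketch below only inspects environment variables and does not contact the Blaxel API. The variable names come from the configuration shown in this guide; treating `https://api.blaxel.ai/v1` as the default URL and the helper name `load_blaxel_settings` are assumptions.

```python
import os

REQUIRED_VARS = ("BLAXEL_API_KEY", "BLAXEL_MODEL")
DEFAULT_API_URL = "https://api.blaxel.ai/v1"  # assumed default, taken from the sample config


def load_blaxel_settings(env=None) -> dict:
    """Read Blaxel settings from the environment and fail early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_key": env["BLAXEL_API_KEY"],
        "model": env["BLAXEL_MODEL"],
        "api_url": env.get("BLAXEL_API_URL", DEFAULT_API_URL).rstrip("/"),
    }


try:
    settings = load_blaxel_settings()
    print("Using model", settings["model"], "at", settings["api_url"])
except RuntimeError as error:
    print(error)
```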
Security: Keep your Blaxel API key secret. Supply it through environment variables rather than hard-coding it, and restrict access to the Claude Desktop configuration file, which stores the key in plain text.
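One practical habit that follows from this is never printing the full key in logs or error messages. The helper below is a hypothetical sketch of masking a secret so that only its last few characters remain visible.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters of a secret with asterisks."""
    if visible < 0:
        raise ValueError("visible must not be negative")
    if len(value) <= visible:
        # Too short to reveal anything safely: hide the whole value
        return "*" * len(value)
    hidden = len(value) - visible
    return "*" * hidden + value[hidden:]


print("BLAXEL_API_KEY =", mask_secret("your-blaxel-api-key"))
```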
The server exposes the following tools:
Analyze a video from a URL with a custom prompt to extract content, frames, and insights.
Analyze a single image from a URL and generate descriptive responses based on the prompt.
Create a summary of the video in a chosen style (brief, standard, or detailed).
Answer specific questions about the video content, given a URL.
Extract on-screen text and transcribe spoken content from a video.
Check the current Blaxel API configuration for correctness.
List all capabilities exposed by this MCP server.
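The three summary styles mentioned above (brief, standard, detailed) could be mapped to different instructions for the model. The sketch below shows one possible mapping; the prompt wording and the helper name `build_summary_prompt` are hypothetical and do not reflect the prompts the server actually uses.

```python
# Hypothetical prompt wording for each documented summary style.
SUMMARY_STYLES = {
    "brief": "Summarize the video in one or two sentences.",
    "standard": "Summarize the video in a short paragraph covering the main events.",
    "detailed": "Give a detailed, chronological summary of the video, "
                "including notable on-screen text.",
}


def build_summary_prompt(style: str = "standard") -> str:
    """Return the instruction for a summary style, rejecting unknown styles."""
    key = style.strip().lower()
    if key not in SUMMARY_STYLES:
        allowed = ", ".join(sorted(SUMMARY_STYLES))
        raise ValueError(f"Unknown style {style!r}; expected one of: {allowed}")
    return SUMMARY_STYLES[key]


print(build_summary_prompt("brief"))
```

Validating the style up front gives the caller a clear error instead of a summary in an unexpected format.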