
Vision MCP Server

An MCP server that analyzes images with Vision Language Models, given a prompt and an image.

Installation
Add the following to your MCP client configuration file.

Configuration

```json
{
  "mcpServers": {
    "i-richardwang-vision-mcp": {
      "command": "uvx",
      "args": [
        "vision-mcp"
      ],
      "env": {
        "OPENAI_MODEL": "gpt-4o",
        "OPENAI_API_KEY": "YOUR_API_KEY",
        "OPENAI_API_BASE": "https://api.openai.com"
      }
    }
  }
}
```
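If your client already has a configuration file with other servers in it, the entry above can be merged in rather than pasted by hand. The following is a minimal sketch under stated assumptions: the helper name `add_vision_server` and the config path are hypothetical, and the location and exact schema of the file depend on your MCP client.

```python
import json
from pathlib import Path

# Server entry copied from the configuration above.
VISION_ENTRY = {
    "command": "uvx",
    "args": ["vision-mcp"],
    "env": {
        "OPENAI_MODEL": "gpt-4o",
        "OPENAI_API_KEY": "YOUR_API_KEY",
        "OPENAI_API_BASE": "https://api.openai.com",
    },
}


def add_vision_server(config_path: Path,
                      name: str = "i-richardwang-vision-mcp") -> dict:
    """Merge the Vision MCP entry into a client config, keeping other servers.

    Hypothetical helper: the path of the config file is client-specific.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text())
    else:
        config = {}
    # Create the "mcpServers" section if missing, then add or replace our entry.
    config.setdefault("mcpServers", {})[name] = VISION_ENTRY
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Replace `YOUR_API_KEY` with a real key before starting the client; the sketch writes the placeholder unchanged.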

Vision MCP exposes an image-analysis MCP server powered by Vision Language Models. You send a prompt and an image source through an MCP client and get the model's analysis back, which makes image understanding available inside your workflows and applications.

How to use

To analyze images, run the Vision MCP server and connect an MCP client to it. First make sure the environment variables for the vision model (OPENAI_MODEL, OPENAI_API_KEY, OPENAI_API_BASE) are set, then start the server with the command from your client configuration. Call the analyze_image tool with a prompt and an image source (URL or local file path). The server supports common image formats such as JPEG, PNG, and WebP.
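Before sending an image source to the server, a client can check whether it is a URL or a local path and whether its extension matches one of the formats named above. This is a rough client-side sketch, not part of Vision MCP: the function name `describe_image_source` is hypothetical, and the server may accept formats beyond the ones listed here.

```python
from pathlib import PurePath
from urllib.parse import urlparse

# Extensions for the formats named in the text (JPEG, PNG, WebP).
# Assumption: the server may well accept others.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def describe_image_source(source: str) -> tuple[str, bool]:
    """Return (kind, looks_supported), where kind is "url" or "file"."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # Use only the URL path so query strings do not hide the extension.
        kind, path = "url", parsed.path
    else:
        kind, path = "file", source
    suffix = PurePath(path).suffix.lower()
    return kind, suffix in SUPPORTED_SUFFIXES
```

The check looks only at the file extension, so a URL without one (for example an image served from an API endpoint) is reported as unsupported even though the server might handle it.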

How to install

Prerequisites: you need uv, a Python package manager, installed. You also need to configure your MCP client to point at the server. The command below installs uv on your machine.

```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then add the client configuration snippet from the Configuration section above, adapted to your MCP client setup, so your client can reach the Vision MCP server.

Running the Vision MCP server

After installation, start the Vision MCP server with the command specified in your client configuration. In the example configuration, uvx runs the vision-mcp package locally:

```sh
uvx vision-mcp
```
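An MCP client normally launches this command for you, but when testing by hand it helps to confirm the environment variables first. The sketch below assumes that all three variables from the example configuration are required and that the server communicates over stdio; neither is confirmed here, and `missing_vars` and `launch_server` are hypothetical helper names.

```python
import os
import subprocess

# Variable names taken from the example configuration.
# Assumption: the server needs all three to be set.
REQUIRED_VARS = ("OPENAI_MODEL", "OPENAI_API_KEY", "OPENAI_API_BASE")


def missing_vars(env) -> list:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def launch_server(extra_env=None) -> subprocess.Popen:
    """Start `uvx vision-mcp` with the current environment plus extra_env."""
    env = {**os.environ, **(extra_env or {})}
    missing = missing_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Pipes on stdin/stdout assume the server speaks MCP over stdio.
    return subprocess.Popen(
        ["uvx", "vision-mcp"],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

Checking up front gives a clear error message instead of a failure later, when the first image is analyzed.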

Available tools

analyze_image

Analyze and understand image content from files or URLs using a Vision Language Model. Provide a prompt and an image source to return insights about the image.
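For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request for analyze_image. The argument names `prompt` and `image_source` are assumptions, since the tool's input schema is not given here; a real client would read the schema from the server's `tools/list` response and would complete the MCP initialization handshake before calling any tool.

```python
import json
from itertools import count

# Simple generator of increasing JSON-RPC request ids.
_request_ids = count(1)


def build_analyze_request(prompt: str, image_source: str) -> str:
    """Serialize a tools/call request for the analyze_image tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": "analyze_image",
            # Assumed argument names; check the tool's schema via tools/list.
            "arguments": {"prompt": prompt, "image_source": image_source},
        },
    }
    return json.dumps(request)


message = build_analyze_request(
    "Describe the chart in this image.",
    "https://example.com/chart.png",  # placeholder URL
)
```

In practice an MCP client library handles this framing for you; the raw message is shown only to make clear what "provide a prompt and an image source" amounts to on the wire.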