
Image Recognition MCP Server

An MCP server that provides image recognition 👀 capabilities using Anthropic and OpenAI vision APIs

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "mario-andreschak-mcp-image-recognition": {
      "command": "python",
      "args": [
        "-m",
        "image_recognition_server.server"
      ],
      "env": {
        "LOG_LEVEL": "INFO",
        "ENABLE_OCR": "true",
        "OPENAI_MODEL": "gpt-4o-mini",
        "TESSERACT_CMD": "/path/to/tesseract",
        "OPENAI_API_KEY": "YOUR_OPENAI_API_KEY",
        "OPENAI_TIMEOUT": "60",
        "OPENAI_BASE_URL": "https://openai.example/api/v1",
        "VISION_PROVIDER": "anthropic",
        "ANTHROPIC_API_KEY": "YOUR_ANTHROPIC_API_KEY",
        "FALLBACK_PROVIDER": "OPENAI_OR_ANTHROPOID"
      }
    }
  }
}

This MCP Image Recognition Server provides image understanding capabilities by leveraging Anthropic's Claude Vision or OpenAI's GPT-4 Vision APIs. It accepts images in common formats, supplied either as base64 data or as file paths, and can optionally extract text with Tesseract OCR. With configurable primary and fallback providers, the server helps you build intelligent image analysis into your apps or workflows.

How to use

To use the server, run it locally or expose it to your network, then connect your MCP client to the server endpoint. You can describe an image either by sending base64-encoded image data together with its MIME type, or by pointing the client at an image file. If OCR is enabled, the server can also extract text from images using Tesseract, returned alongside the description.

Practical usage patterns include describing an image from raw data, describing an image from a file, and combining description with optional OCR results. You can configure the primary and fallback vision providers to switch between Anthropic Claude Vision and OpenAI GPT-4 Vision as needed.
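As a sketch of the base64 pattern, the snippet below packages raw image bytes into arguments for a describe_image-style tool call. The field names "image_data" and "mime_type" are illustrative assumptions; check the server's tool schema for the exact names.

```python
import base64

def build_describe_image_args(image_bytes: bytes, mime_type: str) -> dict:
    """Package raw image bytes as base64 tool-call arguments.

    Note: the keys "image_data" and "mime_type" are assumed here;
    consult the server's actual tool schema before use.
    """
    return {
        "image_data": base64.b64encode(image_bytes).decode("ascii"),
        "mime_type": mime_type,
    }

# In practice the bytes would come from reading a real image file.
args = build_describe_image_args(b"\x89PNG-dummy-bytes", "image/png")
```

Base64 encoding keeps the payload safe to embed in JSON-RPC messages, at the cost of roughly a third more bytes than the raw image.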

How to install

Prerequisites: Python 3.8 or higher is required. Tesseract OCR is optional but needed if you want text extraction.

# 1) Clone the project
git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition

# 2) Prepare environment
cp .env.example .env
# Edit .env with your API keys and preferences

# 3) Build on Windows (if using the provided build script)
build.bat

Run the server in normal mode using Python, or start it via the Windows batch script.

# Run with Python
python -m image_recognition_server.server

# Alternative Windows start
run.bat server

Configuration and notes

Configure your environment in the .env file to enable and customize vision providers, API access, and optional OCR.

Environment variables you can set include the following. Use placeholder values where you do not yet have real keys.

ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
VISION_PROVIDER=anthropic  # or openai
FALLBACK_PROVIDER=openai   # optional; openai or anthropic
LOG_LEVEL=INFO
ENABLE_OCR=true
TESSERACT_CMD=/path/to/tesseract  # optional
OPENAI_MODEL=gpt-4o-mini
OPENAI_BASE_URL=https://openai.example/api/v1  # optional for custom endpoints
OPENAI_TIMEOUT=60

Notes on usage and capabilities

Supported input formats include JPEG, PNG, GIF, and WebP. You can provide images as Base64 data or as file paths. When OCR is enabled, you get text extraction results alongside image descriptions.
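Since base64 submissions must include a MIME type, a small helper can derive it from the file extension and reject unsupported formats. This is a sketch using Python's standard mimetypes module, not part of the server itself; the set of supported types is taken from the list above.

```python
import mimetypes

# The four input formats listed as supported by the server.
SUPPORTED = {"image/jpeg", "image/png", "image/gif", "image/webp"}

def guess_supported_mime(path: str) -> str:
    """Guess a file's MIME type and verify it is a supported image format."""
    mime, _ = mimetypes.guess_type(path)
    if mime not in SUPPORTED:
        raise ValueError(f"Unsupported or unknown image type: {mime!r}")
    return mime
```

This catches mistakes like passing a text file before any API call is made, rather than surfacing a provider-side error later.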

OpenRouter and other OpenAI-compatible providers are supported through the OPENAI_BASE_URL and OPENAI_MODEL settings. If using OpenRouter, set OPENAI_BASE_URL to the OpenRouter API endpoint and OPENAI_MODEL to an OpenRouter-formatted model string (typically provider/model-name).
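For example, a .env fragment routing vision calls through OpenRouter might look like the following. The base URL is OpenRouter's public API endpoint; the model string is illustrative, so substitute any vision-capable model from OpenRouter's catalog.

```
VISION_PROVIDER=openai
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=anthropic/claude-3.5-sonnet   # example only; pick from the catalog
OPENAI_API_KEY=YOUR_OPENROUTER_API_KEY
```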

Available tools

describe_image

Generates a detailed description of an image from base64 data and MIME type.

describe_image_from_file

Generates a detailed description of an image from a file path.
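The two tools mirror each other: one takes inline data, the other a path. A minimal sketch of preparing arguments for the file-based tool follows; the parameter name "filepath" is an assumption, so verify it against the tool's schema.

```python
import os

def build_file_args(path: str) -> dict:
    """Resolve a path to absolute form for a describe_image_from_file call.

    Note: the key "filepath" is assumed; check the tool schema. Resolving
    to an absolute path avoids ambiguity about the server's working directory.
    """
    return {"filepath": os.path.abspath(path)}
```

Prefer the file-based tool when the client and server share a filesystem; fall back to base64 when they do not.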