Docker-based MCP stack combining Crawl4AI, SearXNG, and Supabase for web search, crawling, and RAG in AI agents and coding assistants.
Configuration
{
"mcpServers": {
"alexesom-crawl4ai-rag-mcp-gemini": {
"url": "http://localhost:8051/sse",
"headers": {
"HOST": "0.0.0.0",
"PORT": "8051",
"NEO4J_URI": "bolt://localhost:7687",
"TRANSPORT": "sse",
"SEARXNG_URL": "http://searxng:8080",
"MODEL_CHOICE": "gemini-2.5-flash-lite",
"SUPABASE_URL": "https://xyz.supabase.co",
"GEMINI_API_KEY": "YOUR_GEMINI_API_KEY",
"SUPABASE_SERVICE_KEY": "YOUR_SUPABASE_SERVICE_KEY"
}
}
}
}

You deploy a self-contained MCP server stack that combines web crawling, private search, and RAG capabilities to power AI agents and coding assistants. Everything runs in Docker, with an integrated SearXNG search instance and optional knowledge graph features for code validation and hallucination detection. This setup lets you search the web, crawl content, chunk and store it as vector data, and run advanced RAG queries to answer questions or retrieve code examples.
Once you have the stack running, you interact with it through your MCP client by connecting to the SSE endpoint or by launching the local stdio MCP if you prefer a direct process integration. Use the provided tools to search, crawl, and perform RAG queries. Common workflows include searching with SearXNG and then automatically scraping discovered URLs, storing them in the vector database, and retrieving semantically relevant results or raw content as needed.
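Behind any MCP client integration, a tool invocation travels as a JSON-RPC 2.0 request with method `tools/call`. Here is a minimal sketch of building such a payload by hand; the tool name `perform_rag_query` and its arguments are illustrative assumptions, not names confirmed by this page:

```python
import json

def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request body for an MCP tools/call invocation."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical RAG query against the stack (tool name assumed, not verified).
payload = build_tool_call(
    "perform_rag_query",
    {"query": "vector search setup", "source": "supabase.com"},
)
print(payload)
```

In practice your MCP client handles this framing for you over the SSE or stdio transport; the sketch only shows what crosses the wire.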
Prerequisites: You need Docker and Docker Compose, access to a Supabase project for vector storage, and a Gemini API key for embeddings.
# Quick start steps
# 1) Get the stack up and running
docker compose up -d
# 2) For the SSE transport, configure your MCP client with the example JSON connection snippet shown below.
# 3) For the stdio-based local MCP, prepare a Python virtual environment and follow the Python execution flow described in the configuration steps.

The stack includes an integrated SearXNG private search instance, a Redis-compatible cache, and a Caddy reverse proxy for HTTPS. It exposes an MCP server at http://localhost:8051 for SSE connections and an internal SearXNG service at http://searxng:8080. You can enable advanced RAG strategies at runtime via environment variables in the .env file.
# Example environment values to populate your .env before starting
TRANSPORT=sse
HOST=0.0.0.0
PORT=8051
SEARXNG_URL=http://searxng:8080
SEARXNG_USER_AGENT=MCP-Crawl4AI-RAG-Server/1.0
SEARXNG_DEFAULT_ENGINES=google,bing,duckduckgo
SEARXNG_TIMEOUT=30
GEMINI_API_KEY=YOUR_GEMINI_API_KEY
MODEL_CHOICE=gemini-2.5-flash-lite
SUPABASE_URL=YOUR_SUPABASE_URL
SUPABASE_SERVICE_KEY=YOUR_SUPABASE_SERVICE_KEY
USE_CONTEXTUAL_EMBEDDINGS=false
USE_HYBRID_SEARCH=false
USE_AGENTIC_RAG=false
USE_RERANKING=false
USE_KNOWLEDGE_GRAPH=false
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=YOUR_NEO4J_PASSWORD

Connect your MCP client using the SSE endpoint exposed by the stack. This is the recommended remote interface for most MCP clients.
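The USE_* toggles above are plain truthy strings. If you script against the same .env values, a small helper keeps their interpretation consistent; this is a sketch of common truthy conventions, not the server's actual parsing code:

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a USE_*-style environment toggle as a boolean."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

# Simulate two flags from the .env example; the rest fall back to the default.
os.environ["USE_HYBRID_SEARCH"] = "true"
os.environ["USE_RERANKING"] = "false"

enabled = {flag: env_flag(flag) for flag in (
    "USE_CONTEXTUAL_EMBEDDINGS", "USE_HYBRID_SEARCH",
    "USE_AGENTIC_RAG", "USE_RERANKING", "USE_KNOWLEDGE_GRAPH",
)}
print(enabled)
```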
{
  "mcpServers": {
    "crawl4ai_rag": {
      "transport": "sse",
      "url": "http://localhost:8051/sse"
    }
  }
}

You can run a local, stdio-based MCP server by executing Python from your virtual environment and running the MCP script directly. This lets you integrate the MCP into your local tooling without container networking.
{
  "crawl4ai_rag": {
    "command": "/absolute/path/to/your/virtualenv/.venv/bin/python",
    "args": ["/absolute/path/to/repo/crawl4ai-rag-mcp-gemini/src/crawl4ai_mcp.py"],
    "envFile": "/absolute/path/to/stdio/envfile/.env"
  }
}

The server provides key tools to manage the web intelligence stack, including scraping, crawling, semantic search, and RAG processing. You can perform RAG queries with contextual embeddings, hybrid search, and knowledge graph support when enabled.
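Before storage, scraped pages are chunked so each piece fits an embedding call. The stack's exact chunking strategy is not documented on this page; a generic fixed-size chunker with overlap illustrates the idea:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, a common pre-embedding step."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` chars each time
    return chunks

doc = "word " * 300  # stand-in for scraped markdown (1500 characters)
pieces = chunk_text(doc, chunk_size=400, overlap=40)
print(len(pieces))
```

The overlap preserves context across chunk boundaries so a sentence cut in half still appears whole in one of the two neighboring chunks.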
Start the complete stack with Docker Compose, verify services, and perform a health check to confirm the server is up and reachable.
docker compose up -d
docker compose logs -f
docker compose ps
curl http://localhost:8051/health

If something goes wrong, check the service logs for mcp-crawl4ai and searxng, verify internal networking, and ensure ports do not conflict. For production HTTPS, configure SEARXNG_HOSTNAME and SEARXNG_TLS and use your domain.
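Services can take a few seconds to come up after `docker compose up -d`, so a retry loop around the health endpoint is handy. Here is a sketch with an injected probe function; a real probe would issue an HTTP GET against http://localhost:8051/health:

```python
import time

def wait_for_healthy(probe, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll a health probe until it reports success or attempts run out."""
    for _ in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # service not accepting connections yet
        time.sleep(delay)
    return False

# Simulated probe: fails twice while the container starts, then succeeds.
state = {"calls": 0}
def fake_probe() -> bool:
    state["calls"] += 1
    return state["calls"] >= 3

ok = wait_for_healthy(fake_probe, attempts=5, delay=0.01)
print(ok)
```

Injecting the probe keeps the retry logic testable without a running stack; swap in a real HTTP check for deployment scripts.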
Scrape one or more URLs and store their content in the vector database. Supports single or batch processing.
Intelligently crawl a full website based on the URL type (sitemap, llms-full.txt, or a regular webpage) and follow internal links.
List all available sources (domains) in the database.
Run a semantic RAG query with optional source filtering to retrieve relevant content.
Comprehensive web search tool that integrates SearXNG with automated scraping and RAG processing; returns RAG results or raw markdown content.
Search specifically for code examples and their summaries from crawled documentation (requires USE_AGENTIC_RAG=true).
Parse a GitHub repository into a Neo4j knowledge graph, extracting structure for validation.
Analyze scripts for hallucinations by validating imports, calls, and usage against the knowledge graph.
Explore the knowledge graph with commands to list repos, classes, and methods.
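The RAG query tools ultimately rank stored chunks by vector similarity. The real stack uses Gemini embeddings stored in Supabase; this toy example with hand-made three-dimensional vectors shows only the cosine-similarity ranking step:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for Gemini vectors in the Supabase store.
chunks = {
    "install docker compose": [0.9, 0.1, 0.0],
    "configure neo4j auth":   [0.1, 0.8, 0.3],
    "searxng engine list":    [0.2, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "how do I install the stack"

ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
print(ranked[0])
```

In production, pgvector performs this ranking inside Postgres; the Python version only makes the math visible.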