Docker-based MCP stack combining Crawl4AI, SearXNG, and Supabase for web search, crawling, and RAG in AI agents and coding assistants.
Configuration
{
"mcpServers": {
"alexesom-crawl4ai-rag-mcp-gemini": {
"url": "http://localhost:8051/sse",
"headers": {
"HOST": "0.0.0.0",
"PORT": "8051",
"NEO4J_URI": "bolt://localhost:7687",
"TRANSPORT": "sse",
"SEARXNG_URL": "http://searxng:8080",
"MODEL_CHOICE": "gemini-2.5-flash-lite",
"SUPABASE_URL": "https://xyz.supabase.co",
"GEMINI_API_KEY": "YOUR_GEMINI_API_KEY",
"SUPABASE_SERVICE_KEY": "YOUR_SUPABASE_SERVICE_KEY"
}
}
}
}

You deploy a self-contained MCP server stack that combines web crawling, private search, and RAG capabilities to power AI agents and coding assistants. Everything runs in Docker, with an integrated SearXNG search instance and optional knowledge graph features for code validation and hallucination detection. This setup lets you search the web, crawl content, chunk and store it as vector data, and run advanced RAG queries to answer questions or retrieve code examples.
Once you have the stack running, you interact with it through your MCP client by connecting to the SSE endpoint or by launching the local stdio MCP if you prefer a direct process integration. Use the provided tools to search, crawl, and perform RAG queries. Common workflows include searching with SearXNG and then automatically scraping discovered URLs, storing them in the vector database, and retrieving semantically relevant results or raw content as needed.
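Behind any MCP client integration, a tool invocation travels as a JSON-RPC 2.0 request with method `tools/call`. Here is a minimal sketch of building such a payload by hand; the tool name `perform_rag_query` and its arguments are illustrative assumptions, not names confirmed by this page:

```python
import json

def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request body for an MCP tools/call invocation."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical RAG query against the stack (tool name assumed, not verified).
payload = build_tool_call(
    "perform_rag_query",
    {"query": "vector search setup", "source": "supabase.com"},
)
print(payload)
```

In practice your MCP client handles this framing for you over the SSE or stdio transport; the sketch only shows what crosses the wire.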
Prerequisites: You need Docker and Docker Compose, access to a Supabase project for vector storage, and a Gemini API key for embeddings.
# Quick start steps
# 1) Get the stack up and running
docker compose up -d
# 2) For the SSE transport, configure your MCP client with the example JSON connection snippet shown below.
# 3) For the stdio-based local MCP, prepare a Python virtual environment and follow the Python execution flow described in the configuration steps.

The stack includes an integrated SearXNG private search instance, a Redis-compatible cache, and a Caddy reverse proxy for HTTPS. It exposes an MCP server at http://localhost:8051 for SSE connections and an internal SearXNG service at http://searxng:8080. You can enable advanced RAG strategies at runtime via environment variables in the .env file.
# Example environment values to populate your .env before starting
TRANSPORT=sse
HOST=0.0.0.0
PORT=8051
SEARXNG_URL=http://searxng:8080
SEARXNG_USER_AGENT=MCP-Crawl4AI-RAG-Server/1.0
SEARXNG_DEFAULT_ENGINES=google,bing,duckduckgo
SEARXNG_TIMEOUT=30
GEMINI_API_KEY=YOUR_GEMINI_API_KEY
MODEL_CHOICE=gemini-2.5-flash-lite
SUPABASE_URL=YOUR_SUPABASE_URL
SUPABASE_SERVICE_KEY=YOUR_SUPABASE_SERVICE_KEY
USE_CONTEXTUAL_EMBEDDINGS=false
USE_HYBRID_SEARCH=false
USE_AGENTIC_RAG=false
USE_RERANKING=false
USE_KNOWLEDGE_GRAPH=false
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=YOUR_NEO4J_PASSWORD

Connect your MCP client using the SSE endpoint exposed by the stack. This is the recommended remote interface for most MCP clients.
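The USE_* toggles above are plain truthy strings. If you script against the same .env values, a small helper keeps their interpretation consistent; this is a sketch of common truthy conventions, not the server's actual parsing code:

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Interpret a USE_*-style environment toggle as a boolean."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}

# Simulate two flags from the .env example; the rest fall back to the default.
os.environ["USE_HYBRID_SEARCH"] = "true"
os.environ["USE_RERANKING"] = "false"

enabled = {flag: env_flag(flag) for flag in (
    "USE_CONTEXTUAL_EMBEDDINGS", "USE_HYBRID_SEARCH",
    "USE_AGENTIC_RAG", "USE_RERANKING", "USE_KNOWLEDGE_GRAPH",
)}
print(enabled)
```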
{
  "mcpServers": {
    "crawl4ai_rag": {
      "transport": "sse",
      "url": "http://localhost:8051/sse"
    }
  }
}

You can run a local, stdio-based MCP server by executing Python from your virtual environment and running the MCP script directly. This lets you integrate the MCP into your local tooling without container networking.
{
  "crawl4ai_rag": {
    "command": "/absolute/path/to/your/virtualenv/.venv/bin/python",
    "args": ["/absolute/path/to/repo/crawl4ai-rag-mcp-gemini/src/crawl4ai_mcp.py"],
    "envFile": "/absolute/path/to/stdio/envfile/.env"
  }
}

The server provides key tools to manage the web intelligence stack, including scraping, crawling, semantic search, and RAG processing. You can perform RAG queries with contextual embeddings, hybrid search, and knowledge graph support when enabled.
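Before storage, scraped pages are chunked so each piece fits an embedding call. The stack's exact chunking strategy is not documented on this page; a generic fixed-size chunker with overlap illustrates the idea:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap, a common pre-embedding step."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` chars each time
    return chunks

doc = "word " * 300  # stand-in for scraped markdown (1500 characters)
pieces = chunk_text(doc, chunk_size=400, overlap=40)
print(len(pieces))
```

The overlap preserves context across chunk boundaries so a sentence cut in half still appears whole in one of the two neighboring chunks.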
Start the complete stack with Docker Compose, verify services, and perform a health check to confirm the server is up and reachable.
docker compose up -d
docker compose logs -f
docker compose ps
curl http://localhost:8051/health

If something goes wrong, check the service logs for mcp-crawl4ai and searxng, verify internal networking, and ensure ports do not conflict. For production HTTPS, configure SEARXNG_HOSTNAME and SEARXNG_TLS and use your domain.
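Services can take a few seconds to come up after `docker compose up -d`, so a retry loop around the health endpoint is handy. Here is a sketch with an injected probe function; a real probe would issue an HTTP GET against http://localhost:8051/health:

```python
import time

def wait_for_healthy(probe, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll a health probe until it reports success or attempts run out."""
    for _ in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # service not accepting connections yet
        time.sleep(delay)
    return False

# Simulated probe: fails twice while the container starts, then succeeds.
state = {"calls": 0}
def fake_probe() -> bool:
    state["calls"] += 1
    return state["calls"] >= 3

ok = wait_for_healthy(fake_probe, attempts=5, delay=0.01)
print(ok)
```

Injecting the probe keeps the retry logic testable without a running stack; swap in a real HTTP check for deployment scripts.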
Scrape one or more URLs and store their content in the vector database. Supports single or batch processing.
Intelligently crawl a full website based on the URL type (sitemap, llms-full.txt, or a regular webpage) and follow internal links.
List all available sources (domains) in the database.
Run a semantic RAG query with optional source filtering to retrieve relevant content.
Comprehensive web search tool that integrates SearXNG with automated scraping and RAG processing; returns RAG results or raw markdown content.
Search specifically for code examples and their summaries from crawled documentation (requires USE_AGENTIC_RAG=true).
Parse a GitHub repository into a Neo4j knowledge graph, extracting structure for validation.
Analyze scripts for hallucinations by validating imports, calls, and usage against the knowledge graph.
Explore the knowledge graph with commands to list repos, classes, and methods.
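The RAG query tools ultimately rank stored chunks by vector similarity. The real stack uses Gemini embeddings stored in Supabase; this toy example with hand-made three-dimensional vectors shows only the cosine-similarity ranking step:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" standing in for Gemini vectors in the Supabase store.
chunks = {
    "install docker compose": [0.9, 0.1, 0.0],
    "configure neo4j auth":   [0.1, 0.8, 0.3],
    "searxng engine list":    [0.2, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "how do I install the stack"

ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
print(ranked[0])
```

In production, pgvector performs this ranking inside Postgres; the Python version only makes the math visible.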