Crawl4AI RAG MCP Server

Provides web crawling, RAG, and knowledge-graph capabilities for AI agents with a Supabase-backed vector store.

Installation
Add the following to your MCP client configuration file.

{
    "mcpServers": {
        "crawl4ai_rag_docker": {
            "command": "docker",
            "args": [
                "run",
                "--env-file",
                ".env",
                "-p",
                "8051:8051",
                "mcp/crawl4ai-rag"
            ],
            "env": {
                "OPENAI_API_KEY": "sk-...",
                "SUPABASE_URL": "https://xyz.supabase.co",
                "SUPABASE_SERVICE_KEY": "eyJ...",
                "NEO4J_URI": "bolt://localhost:7687",
                "NEO4J_USER": "neo4j",
                "NEO4J_PASSWORD": "password",
                "TRANSPORT": "sse"
            }
        }
    }
}
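
Note that the env block above sets variables for the docker CLI process on your machine; Docker does not forward them into the container, which instead reads its settings from the .env file supplied via --env-file. If your client supports SSE connections directly, you can also start the container yourself (see How to install) and point the client at the endpoint. A minimal sketch, assuming your client accepts a url field:

{
    "mcpServers": {
        "crawl4ai_rag": {
            "transport": "sse",
            "url": "http://localhost:8051/sse"
        }
    }
}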

You can run the Crawl4AI RAG MCP server to crawl web content, store it in a vector database, and perform retrieval-augmented generation over the crawled data. This enables AI agents and AI coding assistants to access up-to-date web knowledge, with flexible RAG strategies and optional knowledge graph features for validation and analysis.

How to use

Start the server with a local runtime or in a container, then connect your MCP client to the running instance over the supported transport (SSE or stdio). Once connected, you can crawl websites, index their content into the vector store, and run semantic searches over the crawled material; with agentic RAG enabled, the server also extracts and stores code examples for quick retrieval. Results can be filtered by data source, combined with keyword matching (hybrid search), and reranked for relevance, and an optional knowledge graph supports hallucination detection and repository analysis.
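
For example, with the official MCP Python SDK you can connect over SSE and drive the tools programmatically. This is a sketch, not the project's documented client: the tool names come from the list below, but the argument names ("url", "query") are assumptions; confirm them via list_tools().

import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    # Assumes the server is running with TRANSPORT=sse on the default port.
    async with sse_client("http://localhost:8051/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Crawl one page into the vector store.
            await session.call_tool(
                "crawl_single_page",
                {"url": "https://docs.example.com/quickstart"},  # hypothetical URL
            )

            # Ask a question over everything crawled so far.
            result = await session.call_tool(
                "perform_rag_query",
                {"query": "How do I authenticate?"},
            )
            print(result.content)

asyncio.run(main())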

How to install

Before installing, you need: a container runtime (Docker) or a recent Python with uv for a direct install, a Supabase project to serve as the vector database backend, and an OpenAI API key for the embedding model.

Option 1: Run with Docker (recommended)

Clone the project, then build and run the container.

git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
cd mcp-crawl4ai-rag

# Build the image; the server port is baked in at build time
docker build -t mcp/crawl4ai-rag --build-arg PORT=8051 .

# Run with your .env (see below) and publish the server port
docker run --env-file .env -p 8051:8051 mcp/crawl4ai-rag

Option 2: Run with uv directly (no Docker)

If you prefer running directly with Python, follow these steps to set up a virtual environment and install dependencies.

git clone https://github.com/coleam00/mcp-crawl4ai-rag.git
cd mcp-crawl4ai-rag

# Install uv, then create and activate a virtual environment
pip install uv

uv venv
.venv\Scripts\activate  # Windows
# on Mac/Linux: source .venv/bin/activate

# Install the server, then run Crawl4AI's one-time setup
# (downloads the Playwright browsers used for crawling)
uv pip install -e .
crawl4ai-setup

# Create a configuration file based on your environment, including keys and endpoints.

Configure and start the server

Create a configuration file named .env in the project root with settings for host, port, transport, API keys, embedding model, RAG strategies, and data sources. You can enable or disable features like contextual embeddings, hybrid search, agentic RAG, reranking, and knowledge graph according to your needs.
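
A minimal .env sketch follows. The variable names mirror the client configuration above and the strategy flags named on this page (for example USE_AGENTIC_RAG), but treat the exact set as an assumption and consult the repository's example env file for the authoritative list.

# Server
HOST=0.0.0.0
PORT=8051
TRANSPORT=sse

# OpenAI (embeddings and, optionally, contextual summaries)
OPENAI_API_KEY=sk-...

# RAG strategy toggles
USE_CONTEXTUAL_EMBEDDINGS=false
USE_HYBRID_SEARCH=true
USE_AGENTIC_RAG=false
USE_RERANKING=true
USE_KNOWLEDGE_GRAPH=false

# Supabase vector store
SUPABASE_URL=https://xyz.supabase.co
SUPABASE_SERVICE_KEY=eyJ...

# Neo4j (only if USE_KNOWLEDGE_GRAPH=true)
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password

With the .env in place, start the server via the docker run command above, or directly with uv (the entry-point path is taken from the project README; verify it against your checkout):

uv run src/crawl4ai_mcp.py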

Additional configuration notes

Security and deployment considerations: manage API keys through the .env file rather than hard-coding them into client configurations, protect your Supabase and Neo4j credentials, and expose the server only to trusted networks. When the server runs in a container, remember that localhost in connection strings such as NEO4J_URI refers to the container itself; to reach services on the host machine, use host.docker.internal (Docker Desktop) or the host's address.

Troubleshooting tips

If you cannot connect from your MCP client, verify that the server is listening on the configured host and port, ensure the .env file contains valid credentials for OpenAI, Supabase, and Neo4j (if enabled), and confirm that the RAG strategy flags you enabled are actually supported by your deployment (for example, the knowledge graph features require a reachable Neo4j instance).
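
With the SSE transport you can also sanity-check reachability from the command line before involving the client. Assuming the default port, a healthy server answers with a text/event-stream:

curl -N http://localhost:8051/sse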

Notes on knowledge graph features

Enabling the knowledge graph adds tools to parse GitHub repositories into Neo4j, detect hallucinations in AI-generated code, and explore the resulting graph. This requires a running Neo4j instance and, for containerized deployments, the host-networking adjustments noted above so the server can actually reach it.
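
If you need a local Neo4j for testing, the official Docker image is one option. This is a sketch; choose a real password and add persistent volumes for anything beyond experimentation:

docker run -d \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/your-password \
    neo4j

On Docker Desktop, a containerized MCP server reaches this instance at bolt://host.docker.internal:7687 rather than bolt://localhost:7687.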

Available tools

crawl_single_page

Crawl a single web page and store its content in the vector database for later retrieval.

smart_crawl_url

Crawl an entire website intelligently based on URL type: sitemaps and text files are ingested directly, while regular pages are crawled recursively by following internal links.

get_available_sources

Retrieve a list of all available content sources/domains indexed in the vector store.

perform_rag_query

Execute a semantic RAG query with optional source filtering to retrieve relevant content.

search_code_examples

(Requires USE_AGENTIC_RAG) Find and summarize code examples from crawled documentation.

parse_github_repository

Parse a GitHub repository into a knowledge graph in Neo4j, extracting classes, methods, and relationships.

check_ai_script_hallucinations

Validate AI-generated Python scripts against the knowledge graph to detect hallucinations.

query_knowledge_graph

Explore the indexed repositories, classes, methods, and other graph elements via an interactive query interface.
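
A typical retrieval flow chains these tools: list the indexed sources, then scope a query to one of them. As in the earlier sketch, the "source" argument name is an assumption; check the tool schema via list_tools().

import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("http://localhost:8051/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover which domains have been indexed so far.
            sources = await session.call_tool("get_available_sources", {})
            print(sources.content)

            # Restrict the query to a single source.
            result = await session.call_tool(
                "perform_rag_query",
                {"query": "rate limits", "source": "docs.example.com"},  # hypothetical source
            )
            print(result.content)

asyncio.run(main())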