
Crawl4AI MCP Server

Provides web search, crawling, and Retrieval Augmented Generation for AI agents and coding assistants.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "ai-enthusiasts-crawl4ai-rag-mcp": {
      "command": "docker",
      "args": [
        "exec",
        "-i",
        "crawl4aimcp-mcp-1",
        "uv",
        "run",
        "python",
        "src/main.py"
      ],
      "env": {
        "USE_KNOWLEDGE_GRAPH": "true"
      }
    }
  }
}

This is a self-contained MCP server that combines Crawl4AI, SearXNG, and Supabase so AI agents and coding assistants can search the web, crawl content, store embeddings, and run retrieval-augmented generation workflows. It is Docker-based, with zero Python environment setup and an integrated, production-ready stack for fast, private web intelligence.

How to use

You interact with the MCP server through an MCP client to perform search, crawl, and RAG tasks. Start the stack, connect your client, and run workflows that query the built-in SearXNG search, scrape and crawl content, create vector embeddings, and execute RAG queries. You can also leverage advanced RAG strategies such as contextual embeddings, hybrid search, and agentic RAG for code examples.

Typical usage patterns include: initiating a search to discover relevant pages, triggering automated scraping of found URLs, storing content as embeddings, and then requesting RAG-processed results that focus on the most relevant chunks. You can also run a full workflow that starts from a query, gathers URLs, scrapes content, stores it, and returns either semantically organized results or raw Markdown content depending on your needs.
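As a rough illustration of the store-then-retrieve flow described above, here is a toy Python sketch. Everything in it is illustrative: the real server uses proper vector embeddings and a database, not the bag-of-words stand-in shown here, and none of these class or function names come from the server's actual API.

```python
# Toy sketch of "scrape -> store as embeddings -> RAG query" (illustrative only).
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words term-frequency vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    def __init__(self):
        self.docs = []  # (url, chunk, vector) triples

    def add(self, url: str, chunk: str):
        # "Scraped" content is stored alongside its vector.
        self.docs.append((url, chunk, embed(chunk)))

    def query(self, question: str, top_k: int = 2):
        # Rank stored chunks by similarity to the question.
        qv = embed(question)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[2]), reverse=True)
        return [(url, chunk) for url, chunk, _ in ranked[:top_k]]

store = ToyVectorStore()
store.add("https://example.com/a", "crawl4ai crawls web pages into markdown")
store.add("https://example.com/b", "searxng is a privacy respecting metasearch engine")
results = store.query("how does crawling web pages work")
print(results[0][0])  # https://example.com/a ranks first
```

The real pipeline replaces the toy vectors with model-generated embeddings and the in-memory list with a database, but the shape of the workflow — ingest, vectorize, rank by similarity, return the top chunks — is the same.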

How to install

To run the MCP server locally or in a development environment you need Docker and Docker Compose, with Make available for the convenience commands. Allow at least 8 GB of RAM for production-style workloads.

1) Clone the project repository and navigate into it.

2) Start the stack in production mode to run all services together.

make prod  # Starts all services in production mode

3) If you are developing, you can start services with hot reloading and debug logging.

make dev   # Starts services with hot reloading and debug logging

Configuration and runtime details

The minimal client configuration shown under Installation is the complete runtime wiring for an MCP client such as Claude Desktop: it invokes the MCP runtime inside the running Docker container via docker exec and sets the USE_KNOWLEDGE_GRAPH environment variable to enable knowledge graph features.

Notes on advanced features and tools

The server ships with a rich set of tools to perform web search, crawling, and RAG processing. Core tools include URL scraping, smart crawling, listing available sources, and performing RAG queries. A comprehensive, integrated search tool performs end-to-end workflows from search to RAG results.

Troubleshooting and tips

If services fail to start, verify the container state and view the logs for each service:

docker compose ps                # Inspect running containers
docker logs crawl4aimcp-mcp-1    # Examine the MCP container's logs

If the MCP connection does not respond, test the MCP server directly inside the container and check its logs for runtime errors:

docker exec -i crawl4aimcp-mcp-1 uv run python src/main.py

Available tools

scrape_urls

Scrape one or more URLs and store their content in the vector database; supports single URLs and batch processing.

smart_crawl_url

Intelligently crawl a full website based on the type of URL (sitemap, llms-full.txt, or a regular webpage) with recursive traversal.
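The URL-type dispatch described above might look something like the following sketch. This is a guess at the routing logic, not the server's code; classify_crawl_target is a hypothetical name.

```python
def classify_crawl_target(url: str) -> str:
    """Guess how smart_crawl_url might route a URL (illustrative only)."""
    if url.endswith(".xml") and "sitemap" in url:
        return "sitemap"      # parse the sitemap and crawl each listed page
    if url.endswith("llms-full.txt"):
        return "llms-full"    # ingest the pre-flattened text file directly
    return "webpage"          # scrape, then recursively follow internal links

print(classify_crawl_target("https://example.com/sitemap.xml"))  # sitemap
```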

get_available_sources

Retrieve a list of all available content sources (domains) in the database.

perform_rag_query

Run a semantic search over crawled content with optional source filtering to retrieve relevant results.

search

Comprehensive web search that connects SearXNG results with automated scraping and RAG processing; returns either RAG-processed results or raw Markdown content.
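Over MCP, a client invokes tools with a standard tools/call request. A call to the search tool might look like the following; the request envelope follows the MCP specification, but the argument names (query, return_raw_markdown) are assumptions for illustration, not documented parameters of this server.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": {
      "query": "vector database comparison",
      "return_raw_markdown": false
    }
  }
}
```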

search_code_examples

Search specifically for code examples and summaries from crawled documentation (requires USE_AGENTIC_RAG=true).
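Assuming the same Docker-based client configuration shown under Installation, enabling this feature would mean adding the flag alongside USE_KNOWLEDGE_GRAPH in the env block:

```json
"env": {
  "USE_KNOWLEDGE_GRAPH": "true",
  "USE_AGENTIC_RAG": "true"
}
```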

parse_github_repository

Parse a GitHub repository into a Neo4j knowledge graph across multiple languages.

parse_local_repository

Parse local Git repositories directly without cloning, supporting multi-language codebases.

parse_repository_branch

Parse specific branches of repositories for version-specific analysis.

analyze_code_cross_language

Perform semantic search across multiple languages to identify similar patterns.

check_ai_script_hallucinations

Analyze Python scripts for AI hallucinations by validating imports and usage against the knowledge graph.
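A drastically simplified picture of the import-validation step, with the knowledge graph stood in for by a plain set of known modules. All names here are illustrative; the real tool validates usage against Neo4j, not a hard-coded set.

```python
import ast

def extract_imports(source: str) -> set:
    """Collect top-level module names imported by a Python script."""
    tree = ast.parse(source)
    mods = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            mods.add(node.module.split(".")[0])
    return mods

# Toy stand-in for the knowledge graph: modules known to exist.
known_modules = {"os", "json", "requests"}

script = "import os\nimport totally_made_up_pkg\n"
suspect = extract_imports(script) - known_modules
print(suspect)  # {'totally_made_up_pkg'} is flagged as a possible hallucination
```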

query_knowledge_graph

Explore and query the Neo4j knowledge graph with commands for repos, classes, methods, and more.

get_script_analysis_info

Get information about script analysis setup, available paths, and usage instructions for hallucination detection tools.

smart_code_search

Intelligent code search combining Qdrant semantic search with Neo4j structural validation and confidence scoring.

extract_and_index_repository_code

Bridge Neo4j data into Qdrant for searchable code examples with rich metadata.

check_ai_script_hallucinations_enhanced

Dual-validation hallucination detection using Neo4j and Qdrant with merged confidence scores.