Documentation Crawler MCP server for AI agents

This MCP server provides a specialized documentation service for developers. It gives AI agents accurate API lookups and reduces hallucinations when they work with framework documentation in tools like Cursor.

Installation

To reuse a Chrome browser that is already installed, set PUPPETEER_SKIP_DOWNLOAD before installing so that Puppeteer skips downloading its own bundled browser:

On macOS/Linux:

export PUPPETEER_SKIP_DOWNLOAD=true
npm install

On Windows (Command Prompt):

set PUPPETEER_SKIP_DOWNLOAD=true
npm install

On Windows (PowerShell):

$env:PUPPETEER_SKIP_DOWNLOAD = "true"
npm install
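
Skipping the download only stops Puppeteer from fetching its own browser. If the crawler does not locate Chrome on its own, Puppeteer can be told where it is through the PUPPETEER_EXECUTABLE_PATH environment variable or the executablePath launch option. The sketch below is an illustration, not this project's code: buildLaunchOptions is a hypothetical helper, and whether the crawler passes such options to puppeteer.launch is an assumption.

```javascript
// Hypothetical helper (not part of this project): builds Puppeteer launch
// options from environment variables.
function buildLaunchOptions(env = process.env) {
  const options = { headless: true };
  // PUPPETEER_EXECUTABLE_PATH is also read by Puppeteer itself; passing it
  // explicitly just makes the chosen browser visible in the code.
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  }
  return options;
}

// Usage, assuming puppeteer is installed:
//   const browser = await puppeteer.launch(buildLaunchOptions());
console.log(buildLaunchOptions({ PUPPETEER_EXECUTABLE_PATH: "/usr/bin/google-chrome" }));
```

The path /usr/bin/google-chrome is only an example; use the location of Chrome on your machine.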

Document Crawling Setup

Before running the server, you need to crawl documentation from framework websites.

Creating Crawler Configuration

Create a doc-sources.js file in the config directory:

// config/doc-sources.js

// Document source configuration
export const docSources = [
    {
        // Document source name - used for source parameter in searches
        name: "taro",
        // Base URL of the documentation website
        url: "https://docs.taro.zone/docs",
        // Include patterns - specify URL paths to crawl (empty array means all pages)
        includePatterns: [
        ],
        // Exclude patterns - specify URL paths to skip (supports regex)
        excludePatterns: [
            /\d\.x/,   // Exclude version pages
            /apis/     // Exclude API pages
        ]
    },
    {
        name: "taroify",
        url: "https://taroify.github.io/taroify.com/introduce/",
        includePatterns: [
            "/components/",          // All component pages
            "/components/*/",        // Component sub-pages
            "/components/*/*/"       // Component sub-sub-pages
        ],
        excludePatterns: []
    },
    {
        name: "jquery",
        url: "https://www.jquery123.com/",
        includePatterns: [],         // Empty array means crawl all pages
        excludePatterns: [
            /version/               // Exclude version-related pages
        ]
    }
];

// Global crawler configuration
export const crawlerConfig = {
    // Maximum number of concurrent crawl tasks
    maxConcurrency: 40,
    // Page load timeout (milliseconds)
    pageLoadTimeout: 30000,
    // Content load timeout (milliseconds)
    contentLoadTimeout: 5000,
    // Headless mode, following Puppeteer's convention: true runs without a visible browser window, false shows the window
    headless: false,
    // Retry count
    maxRetries: 3,
    // Retry delay (milliseconds)
    retryDelay: 2000,
    // Request delay (milliseconds)
    requestDelay: 1000
};
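
To make the include and exclude rules concrete, here is a minimal sketch of how a crawler could decide whether to visit a URL. It is not taken from this project: shouldCrawl and globToRegExp are hypothetical names, and the sketch assumes that string patterns are matched against the URL path with * standing for one path segment, that regular expressions are tested against the same path, and that exclusions win over inclusions.

```javascript
// Hypothetical helper: turns a string pattern such as "/components/*/" into a
// RegExp where "*" matches exactly one path segment.
function globToRegExp(pattern) {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*/g, "[^/]+");
  return new RegExp(escaped);
}

// Hypothetical helper: applies excludePatterns first, then includePatterns.
function shouldCrawl(url, source) {
  const path = new URL(url).pathname;
  const matches = (pattern) =>
    pattern instanceof RegExp ? pattern.test(path) : globToRegExp(pattern).test(path);

  if (source.excludePatterns.some(matches)) return false;
  // An empty include list means "crawl everything that is not excluded".
  return source.includePatterns.length === 0 || source.includePatterns.some(matches);
}

const taro = { includePatterns: [], excludePatterns: [/\d\.x/, /apis/] };
console.log(shouldCrawl("https://docs.taro.zone/docs/components-desc", taro)); // true
// The next URL is made up for illustration; it is rejected by /\d\.x/.
console.log(shouldCrawl("https://docs.taro.zone/docs/2.x/intro", taro)); // false
```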

Running the Crawler

After setting up the configuration, start the crawler:

npm run crawl

The crawler will automatically fetch documents from the specified websites and save them in the JSON format required by the MCP server.
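
The maxRetries and retryDelay settings in crawlerConfig suggest a retry loop around each page fetch. The sketch below shows one way such a loop could look. withRetries is a hypothetical helper, not this project's code, and it assumes that maxRetries counts additional attempts after the first failure.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs an async task, retrying up to maxRetries more
// times and waiting retryDelay milliseconds between attempts.
async function withRetries(task, { maxRetries = 3, retryDelay = 2000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) await sleep(retryDelay);
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}

// Usage sketch, assuming a Puppeteer page object and the crawlerConfig above:
//   await withRetries(
//     () => page.goto(url, { timeout: crawlerConfig.pageLoadTimeout }),
//     crawlerConfig
//   );
```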

Starting the MCP Server

Once documents are crawled, start the server:

npm start

The server will automatically detect and load document files from the docs directory and provide interfaces through the MCP protocol.

Usage

The server provides two main MCP tools, search_docs and get_doc_detail:

Searching Documents

Use the search_docs tool to search for documentation based on keywords:

// Search for documents
const searchRequest = {
  jsonrpc: "2.0",
  id: "search1",
  method: "tools/call",
  params: {
    name: "search_docs",
    arguments: { 
      query: "component", 
      source: "taro", 
      limit: 5 
    }
  }
};

Parameters:

query: Search keyword (string, required)
source: Document source name (string, optional)
limit: Maximum number of results (number, optional, default 10)

Special case: when the query is "reload", the server reloads all documents.
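
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch with a hypothetical helper named buildSearchRequest. Treating limit as a positive integer is an assumption, since the parameter list only says "number".

```javascript
// Hypothetical client-side helper: validates arguments and builds the
// JSON-RPC request for the search_docs tool.
function buildSearchRequest(id, { query, source, limit } = {}) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new TypeError("query is required and must be a non-empty string");
  }
  if (limit !== undefined && (!Number.isInteger(limit) || limit < 1)) {
    throw new RangeError("limit must be a positive integer"); // assumption, see above
  }
  const args = { query };
  // Optional parameters are only sent when the caller provides them, so the
  // server-side default for limit still applies.
  if (source !== undefined) args.source = source;
  if (limit !== undefined) args.limit = limit;
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search_docs", arguments: args },
  };
}

console.log(JSON.stringify(buildSearchRequest("search1", { query: "component", source: "taro", limit: 5 }), null, 2));
```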

Getting Document Details

Use the get_doc_detail tool to retrieve detailed information about a specific document:

// Get document details
const detailRequest = {
  jsonrpc: "2.0",
  id: "detail1",
  method: "tools/call",
  params: {
    name: "get_doc_detail",
    arguments: { 
      id: "https://docs.taro.zone/docs/components-desc", 
      source: "taro" 
    }
  }
};

Parameters:

id: Document ID (string, required)
source: Document source name (string, optional)
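
A client also has to unpack the reply. Under the MCP specification, a tools/call result carries a content array of typed items and an optional isError flag. The sketch below assumes this server returns its document data as text items. extractToolText is a hypothetical helper, and the sample response is made up for illustration.

```javascript
// Hypothetical helper: pulls the text out of a tools/call response, or throws
// if the response is a JSON-RPC error or a failed tool call.
function extractToolText(response) {
  if (response.error) {
    throw new Error(`JSON-RPC error ${response.error.code}: ${response.error.message}`);
  }
  const result = response.result ?? {};
  const text = (result.content ?? [])
    .filter((item) => item.type === "text")
    .map((item) => item.text)
    .join("\n");
  if (result.isError) {
    throw new Error(`Tool call failed: ${text}`);
  }
  return text;
}

// Made-up response; the real payload depends on the server.
const sampleResponse = {
  jsonrpc: "2.0",
  id: "detail1",
  result: { content: [{ type: "text", text: "Example document body" }] },
};
console.log(extractToolText(sampleResponse)); // Example document body
```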

Reloading Documents

To reload all documents, call search_docs with the query "reload":

// Reload documents
const reloadRequest = {
  jsonrpc: "2.0",
  id: "reload1",
  method: "tools/call",
  params: {
    name: "search_docs",
    arguments: { 
      query: "reload" 
    }
  }
};

Configuring Cursor

To use this server with Cursor, add the following configuration to your mcp.json file:

{
  "mcpServers": {
    "Documentation MCP Server": {
      "command": "node",
      "args": ["/absolute/path/to/server.js"],
      "env": { "NODE_ENV": "development" }
    }
  }
}

Note: Make sure to use the full absolute path to the server file, not a relative path. The server will automatically output a Cursor-compatible configuration example when started.

Running Tests

You can run automated tests to verify the server's functionality:

npm test

The tests check basic server functions:

MCP server initialization
Search tool calls
Document detail tool calls

How to install this MCP server

For Claude Code

To add this MCP server to Claude Code, run this command in your terminal:

claude mcp add-json "MCP" '{"command":"node","args":["/absolute/path/to/server.js"],"env":{"NODE_ENV":"development"}}'

See the official Claude Code MCP documentation for more details.

For Cursor

There are two ways to add an MCP server to Cursor. The most common way is to add the server globally in the ~/.cursor/mcp.json file so that it is available in all of your projects.

If you only need the server in a single project, add it to that project's .cursor/mcp.json file instead, creating the file if it does not exist.

Adding an MCP server to Cursor globally

To add a global MCP server, go to Cursor Settings > Tools & Integrations and click "New MCP Server".

Clicking that button opens the ~/.cursor/mcp.json file, where you can add your server like this:

{
    "mcpServers": {
        "Documentation MCP Server": {
            "command": "node",
            "args": [
                "/absolute/path/to/server.js"
            ],
            "env": {
                "NODE_ENV": "development"
            }
        }
    }
}

Adding an MCP server to a project

To add an MCP server to a project, create a new .cursor/mcp.json file in the project or add the server to the existing one. The entry looks exactly the same as the global MCP server example above.
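
When an mcp.json file already lists other servers, the new entry should be merged in rather than replacing the file. The sketch below shows that merge on plain objects. addMcpServer is a hypothetical helper, its absolute-path check is a rough approximation (Node's path.isAbsolute is the more robust choice), and the "other" server entry is made up for illustration.

```javascript
// Hypothetical helper: returns a new mcp.json object with one more server
// entry, leaving existing entries untouched.
function addMcpServer(config, name, serverPath) {
  // Rough check for POSIX paths ("/...") and Windows drive paths ("C:/...").
  const isAbsolute = serverPath.startsWith("/") || /^[A-Za-z]:[\\/]/.test(serverPath);
  if (!isAbsolute) {
    throw new Error(`Expected an absolute path, got: ${serverPath}`);
  }
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers ?? {}),
      [name]: {
        command: "node",
        args: [serverPath],
        env: { NODE_ENV: "development" },
      },
    },
  };
}

// Made-up existing configuration with one unrelated server.
const existingConfig = { mcpServers: { other: { command: "node", args: ["/opt/other/index.js"] } } };
const updatedConfig = addMcpServer(existingConfig, "Documentation MCP Server", "/absolute/path/to/server.js");
console.log(JSON.stringify(updatedConfig, null, 2));
```

Writing the result back to .cursor/mcp.json is left out here; the same object shape applies to the global ~/.cursor/mcp.json file.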

How to use the MCP server

Once the server is installed, you might need to head back to Settings > MCP and click the refresh button.

The Cursor agent can then see the tools that the MCP server exposes and will call them when it needs to.

You can also explicitly ask the agent to use the tool by mentioning the tool name and describing what the function does.

For Claude Desktop

To add this MCP server to Claude Desktop:

1. Find your configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

2. Add this to your configuration file:

{
    "mcpServers": {
        "Documentation MCP Server": {
            "command": "node",
            "args": [
                "/absolute/path/to/server.js"
            ],
            "env": {
                "NODE_ENV": "development"
            }
        }
    }
}

3. Restart Claude Desktop for the changes to take effect.
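
The per-platform locations from step 1 can be resolved in a script. This is a minimal sketch: claudeDesktopConfigPath is a hypothetical helper, it uses the values Node.js reports in process.platform, and it assumes the HOME and APPDATA environment variables are set.

```javascript
// Hypothetical helper: maps a Node.js platform name to the Claude Desktop
// configuration file location listed in step 1.
function claudeDesktopConfigPath(platform = process.platform, env = process.env) {
  switch (platform) {
    case "darwin":
      return `${env.HOME}/Library/Application Support/Claude/claude_desktop_config.json`;
    case "win32":
      return `${env.APPDATA}\\Claude\\claude_desktop_config.json`;
    case "linux":
      return `${env.HOME}/.config/Claude/claude_desktop_config.json`;
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}

// The home directory below is made up for illustration.
console.log(claudeDesktopConfigPath("linux", { HOME: "/home/dev" }));
// /home/dev/.config/Claude/claude_desktop_config.json
```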