
Webcrawler MCP Server

Extract website content, map links, and generate Markdown content for multiple URLs.

Installation
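Add the following to your MCP client configuration file. The command uses relative paths (server.py and .), so either make sure the client starts it from the repository directory or replace them with absolute paths.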

Configuration

{
  "mcpServers": {
    "jmh108-md-webcrawl-mcp": {
      "command": "fastmcp",
      "args": [
        "dev",
        "server.py",
        "--with-editable",
        "."
      ],
      "env": {
        "OUTPUT_PATH": "./output",
        "REQUEST_TIMEOUT": "30",
        "MAX_CONCURRENT_REQUESTS": "5"
      }
    }
  }
}

This server is a Python-based MCP web crawler: it extracts website content and saves it as Markdown files, maps site structure by following links, and processes multiple URLs in batches. It is useful for building searchable content catalogs from web pages and for generating offline, documentation-style Markdown from live sites.

How to use

To use the server, call its tools from an MCP client. A typical workflow extracts content from a URL and saves it as Markdown, scans the pages linked from that URL to build a content map, and then creates a Markdown index from the content map for navigation. You can batch-process several URLs this way and store the resulting Markdown files in a chosen output directory.

How to install

Prerequisites: You need Python 3.10 or newer and a working Python environment (FastMCP and the MCP Python SDK both require Python 3.10+). You also need FastMCP to run the server; step 4 below installs it.

# 1) Clone the project repository
git clone https://github.com/yourusername/webcrawler.git
cd webcrawler
# 2) Install Python dependencies
pip install -r requirements.txt
# 3) Optional: Configure environment variables
export OUTPUT_PATH=./output  # Set your preferred output directory

Next, install FastMCP and start the server. fastmcp install registers the server with the Claude Desktop client, and fastmcp dev runs it in development mode together with the MCP Inspector. Adjust the path to server.py if you run these commands from outside the repository directory.

# 4) Install FastMCP (Python-based MCP runtime)
pip install fastmcp
# 5) Optional: register the server with Claude Desktop
fastmcp install server.py
# 6) Run the server in development mode
fastmcp dev server.py --with-editable .

Configuration and usage notes

Three environment variables control where output goes and how aggressively the crawler fetches pages: OUTPUT_PATH sets the directory for saved Markdown files, MAX_CONCURRENT_REQUESTS caps the number of requests in flight at once, and REQUEST_TIMEOUT sets the per-request timeout in seconds. The sample configuration above uses ./output, 5, and 30 respectively; lower the concurrency or raise the timeout when crawling slow or rate-limited sites.
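The sketch below shows one way a crawler could honor these settings. It is not taken from the server's source: CrawlerSettings, load_settings, and run_batch are hypothetical names, the fallback values simply mirror the sample configuration, and fetch is any async function you supply (for example, one built on an HTTP client). An asyncio.Semaphore caps the requests in flight, and asyncio.wait_for enforces the per-request timeout.

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Iterable, Mapping, Optional, Tuple


@dataclass(frozen=True)
class CrawlerSettings:
    output_path: str
    request_timeout: float  # seconds allowed per request
    max_concurrent_requests: int


def load_settings(env: Mapping[str, str] = os.environ) -> CrawlerSettings:
    """Hypothetical loader; the fallbacks mirror the sample config (assumed defaults)."""
    return CrawlerSettings(
        output_path=env.get("OUTPUT_PATH", "./output"),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "30")),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
    )


async def run_batch(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    settings: CrawlerSettings,
) -> Dict[str, Optional[str]]:
    """Hypothetical batch runner: fetch all URLs, never more than the limit at once."""
    semaphore = asyncio.Semaphore(settings.max_concurrent_requests)

    async def fetch_one(url: str) -> Tuple[str, Optional[str]]:
        async with semaphore:  # waits here while the concurrency limit is reached
            try:
                body = await asyncio.wait_for(fetch(url), timeout=settings.request_timeout)
            except asyncio.TimeoutError:
                body = None  # record a timed-out request as missing
        return url, body

    results = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(results)


if __name__ == "__main__":
    async def fake_fetch(url: str) -> str:
        await asyncio.sleep(0.01)  # stands in for a real HTTP request
        return f"<html>{url}</html>"

    settings = load_settings()
    pages = asyncio.run(run_batch(["https://example.com/a", "https://example.com/b"], fake_fetch, settings))
    print(sorted(pages))
```

Passing fetch in as a parameter keeps the concurrency logic independent of any particular HTTP library, which also makes it easy to test offline.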

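Examples of typical workflows

The commands below illustrate which tool to call and with which arguments; the exact invocation syntax depends on the MCP client or command-line tool you use.
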
# Extract content from a single URL and save as Markdown
mcp call extract_content --url "https://example.com" --output_path "example.md"

# Scan linked content from a URL and create an index of the content map
mcp call scan_linked_content --url "https://example.com" | \
mcp call create_index --content_map - --output_path "index.md"

Available tools

extract_content

Tool to fetch a webpage and extract its main content, saving it as a Markdown file in the specified output path.
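As a rough illustration of the Markdown-conversion step, the following sketch converts a small subset of HTML using only the standard library's html.parser. It is an assumption, not the server's implementation: MarkdownConverter and html_to_markdown are hypothetical names, the sketch treats script, style, nav, and footer elements as non-content, and it handles only headings, paragraphs, list items, and links.

```python
from html.parser import HTMLParser
from typing import List, Optional

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED = {"script", "style", "nav", "footer"}  # assumed to be non-content


class MarkdownConverter(HTMLParser):
    """Hypothetical converter for headings, paragraphs, list items, and links."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self.blocks: List[str] = []   # finished Markdown blocks
        self.current: List[str] = []  # text pieces of the block being built
        self.prefix = ""              # "# ", "- ", ... for the current block
        self.href: Optional[str] = None
        self.skip_depth = 0           # > 0 while inside a skipped element

    def flush(self) -> None:
        # Collapse runs of whitespace and emit the pending block, if any.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.flush()
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "li":
            self.flush()
            self.prefix = "- "
        elif tag == "p":
            self.flush()
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.current.append("[")  # open the Markdown link text

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in HEADINGS or tag in ("p", "li"):
            self.flush()
        elif tag == "a" and self.href:
            self.current.append(f"]({self.href})")  # close the link with its target
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    converter.flush()  # emit trailing text outside any block element
    return "\n\n".join(converter.blocks) + "\n"


if __name__ == "__main__":
    print(html_to_markdown("<h1>Title</h1><p>Hello <a href='/x'>link</a></p>"))
```

Real pages usually call for a dedicated readability or HTML-to-Markdown library; this sketch only shows the shape of the transformation.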

scan_linked_content

Tool to crawl a URL and enumerate linked pages, producing a content map that can be used to build indexes.
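A content map can be built with a breadth-first crawl that stays on the starting host. In this sketch, LinkCollector and scan_links are hypothetical names, the content-map shape ({url: {"title": ..., "links": [...]}}) is assumed rather than taken from the server, and fetch is a caller-supplied function that returns a page's HTML, so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the page title and absolute link targets from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links and drop "#fragment" parts.
                absolute, _ = urldefrag(urljoin(self.base_url, href))
                if absolute not in self.links:
                    self.links.append(absolute)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_links(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> Dict[str, dict]:
    """Hypothetical breadth-first crawl limited to start_url's host."""
    host = urlparse(start_url).netloc
    content_map: Dict[str, dict] = {}
    queue = deque([start_url])
    while queue and len(content_map) < max_pages:
        url = queue.popleft()
        if url in content_map:
            continue  # already visited via another path
        collector = LinkCollector(url)
        collector.feed(fetch(url))
        collector.close()
        content_map[url] = {"title": collector.title.strip(), "links": collector.links}
        for link in collector.links:
            parsed = urlparse(link)
            # Follow only http(s) links on the same host; others are just recorded.
            if parsed.scheme in ("http", "https") and parsed.netloc == host and link not in content_map:
                queue.append(link)
    return content_map
```

Injecting fetch keeps the traversal logic separate from networking; a real crawler would also respect robots.txt and the concurrency and timeout settings described earlier.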

create_index

Tool to generate an index from a content map, producing a Markdown index file that links to extracted content.
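The following sketch renders an index from a content map, assuming the map is a plain dict from page URL to page title and that each page was saved under a filename derived from its URL. Both the map shape and the url_to_filename naming scheme are hypothetical; the real tool's format may differ.

```python
import re
from typing import Mapping
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Hypothetical naming scheme: host + path, non-alphanumeric runs become '-'."""
    parsed = urlparse(url)
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-")
    return (slug or "index") + ".md"


def create_index(content_map: Mapping[str, str], heading: str = "Index") -> str:
    """Render {url: title} as a Markdown list linking to the local Markdown files."""
    lines = [f"# {heading}", ""]
    # Sort entries alphabetically by title, ignoring case.
    for url, title in sorted(content_map.items(), key=lambda item: item[1].lower()):
        label = title or url  # fall back to the URL when a page has no title
        lines.append(f"- [{label}]({url_to_filename(url)}) (source: <{url}>)")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(create_index({"https://example.com/": "Home", "https://example.com/docs/intro": "Getting started"}))
```

Writing the returned string to a file such as index.md produces the index; linking each entry to both the local file and its source URL keeps the offline copy traceable to the live page.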