
Doc Scraper MCP Server

Scrapes documentation from URLs and converts the HTML to Markdown for use in MCP workflows.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "askjohngeorge-mcp-doc-scraper": {
      "command": "python",
      "args": [
        "-m",
        "mcp_doc_scraper"
      ]
    }
  }
}

The Doc Scraper MCP Server converts web documentation into Markdown you can store and reuse. It fetches documentation from any URL, converts it to Markdown, and saves it to an output path you choose, all within the MCP framework so your clients can use it.

How to use

To use the Doc Scraper MCP Server, start the local server and call its scrape_docs tool from your MCP client. The server exposes a single tool that accepts a URL to scrape and a destination path where the Markdown should be saved. Your MCP client invokes this tool as part of its workflow, supplying the URL and the desired output location.

Typical usage pattern: provide the URL of the documentation you want to convert and specify where to store the resulting markdown file. The server handles fetching the page, converting HTML to markdown, and writing the markdown file to your chosen location.
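The fetch/convert/write pipeline above can be sketched in plain Python. This is not the server's actual implementation (the real server uses aiohttp and a proper HTML-to-Markdown converter); it is only a stdlib illustration of the three steps, with a deliberately naive tag-stripping converter:

```python
# Sketch of the scrape_docs workflow: convert HTML -> Markdown -> write file.
# NOT the server's implementation; illustrative only.
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Naive extractor: collects text, turning <h1>-<h3> into # headings."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._heading = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._heading = "#" * int(tag[1]) + " "

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append((self._heading or "") + text)
            self._heading = None


def html_to_markdown(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    return "\n\n".join(parser.parts)


def save_docs(html: str, output_path: str) -> str:
    """Convert HTML to Markdown and write the result to output_path."""
    markdown = html_to_markdown(html)
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    Path(output_path).write_text(markdown, encoding="utf-8")
    return markdown
```

The real tool also performs the network fetch; here that step is omitted so the sketch stays self-contained.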

How to install

Prerequisites: Python 3.x, a working Python environment, and git for cloning the repository.

# Install the Doc Scraper MCP Server locally using Python
# 1) Clone the repository
git clone https://github.com/askjohngeorge/mcp-doc-scraper.git
cd mcp-doc-scraper

# 2) Create and activate a virtual environment
python -m venv venv
# On macOS/Linux
source venv/bin/activate
# On Windows
venv\Scripts\activate

# 3) Install dependencies in editable mode
pip install -e .

Run the server locally using Python. This starts the MCP-compatible service that exposes the scrape_docs tool.

python -m mcp_doc_scraper

Alternatively, Smithery can install the server with automatic client-specific setup. To install the Doc Scraper MCP Server for the Claude client, run:

npx -y @smithery/cli install @askjohngeorge/mcp-doc-scraper --client claude

Installation flow: clone the repository, create and activate a virtual environment, install dependencies, and start the server with the Python command shown above. For development, install any development dependencies as needed and run the server the same way.

Configuration and usage notes

The Doc Scraper MCP Server provides a single tool named scrape_docs that you call from your MCP client. It accepts two inputs: url for the documentation page you want to scrape and output_path for where the generated Markdown file should be saved.

Example inputs you would pass from your MCP client: a documentation URL and a local path such as /outputs/docs/target.md. The server handles the fetch, HTML-to-markdown conversion, and file write steps, returning control to your client once the file is saved.

Notes about environment and tooling

No special environment variables are required to run the server by default. The core dependencies are managed through Python packaging, including aiohttp, mcp, and pydantic.

If you plan to integrate with MCP tooling, ensure your client is configured to communicate with the local Python process started by python -m mcp_doc_scraper and that the client passes the url and output_path parameters to the scrape_docs tool.
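For reference, the tool invocation your client sends follows the MCP tools/call convention over JSON-RPC. The sketch below shows the shape of such a request; the example URL and path are placeholders, and in practice your MCP client library handles this framing for you:

```python
# Shape of an MCP tools/call request invoking scrape_docs.
# The url and output_path values are illustrative placeholders.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "scrape_docs",
        "arguments": {
            "url": "https://example.com/docs/getting-started",
            "output_path": "/outputs/docs/target.md",
        },
    },
}

payload = json.dumps(request)
```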

Troubleshooting and tips

If the server fails to start, verify that your Python environment is active, dependencies are installed, and there are no port or file permission issues preventing the output path from being written.

If you encounter issues with scraping or conversion, check that the target URL is accessible and that the HTML content can be converted to Markdown. Review the output path for write permissions and ensure the path exists or can be created by the server process.
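A small pre-flight check can rule out the common failure modes above before you call the tool. The helper below is hypothetical (the server does not ship it); it only validates that the URL looks fetchable and that the output directory, if it exists, is writable:

```python
# Hypothetical pre-flight check for scrape_docs inputs; not part of the server.
import os
from pathlib import Path
from urllib.parse import urlparse


def preflight(url: str, output_path: str) -> list:
    """Return a list of problems found with the url/output_path pair."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("URL does not look fetchable: %r" % url)
    parent = Path(output_path).parent
    if parent.exists() and not os.access(parent, os.W_OK):
        problems.append("No write permission for %s" % parent)
    return problems
```

An empty list means both inputs pass the basic checks; any strings returned describe what to fix.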

Available tools

scrape_docs

Scrape documentation from a URL and save it as a Markdown file at a specified output path.