Site Crawler MCP Server

Provides advanced website crawling with asset extraction, SEO, security, and compliance analyses.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "andacguven-site-crawler-mcp": {
      "command": "uvx",
      "args": [
        "--from",
        "/path/to/site-crawler-mcp",
        "site-crawler-mcp"
      ],
      "env": {
        "PYTHONPATH": "/path/to/site-crawler-mcp/src",
        "CRAWLER_TIMEOUT": "30",
        "CRAWLER_USER_AGENT": "SiteCrawlerMCP/1.0",
        "CRAWLER_MAX_CONCURRENT": "5"
      }
    }
  }
}

Site Crawler MCP lets you run a local MCP server that crawls websites and extracts assets, metadata, and security/SEO information. It’s designed for e-commerce sites and general web crawling needs, enabling multi-mode data collection, polite crawling, and structured outputs for business insights.

How to use

You run the Site Crawler MCP server through an MCP client by configuring an MCP server entry and starting the local process. You can start a server in different ways depending on your preferred runtime environment: via uvx (recommended for development), via uv run, or by invoking Python directly. Each method connects to the local server process and exposes the same capabilities through the MCP protocol.

To use it in your workflow, add a server entry that points to how you will start the process. You’ll typically provide a short, stable name for the server, specify the runtime command, and pass the appropriate arguments to launch the server. Once configured, you can request crawling operations that extract images, metadata, SEO data, brand information, and more in a single pass.
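As an alternative to the uvx-based entry shown above, a server entry that launches the process with `uv run` might look like the following sketch; adjust the project path, and note that the exact config format depends on your MCP client:

```json
{
  "mcpServers": {
    "site-crawler": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/site-crawler-mcp", "site-crawler-mcp"]
    }
  }
}
```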

How to install

Before installing, make sure you have Python 3.10 or newer and a working development environment. You will also need a virtual environment and a package manager (such as pip or uv) to install dependencies.

Option A: Install from PyPI (once published)

pip install site-crawler-mcp

Option B: Install from source (development)

Using uv (recommended), follow these steps to set up a development environment and install the package:

# Clone the repository
git clone https://github.com/AndacGuven/site-crawler-mcp.git
cd site-crawler-mcp

# Create virtual environment with Python 3.12
uv venv --python 3.12
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies and package
uv sync

Option C: Install from source using a plain Python environment

# Clone the repository
git clone https://github.com/AndacGuven/site-crawler-mcp.git
cd site-crawler-mcp

# Create virtual environment (recommended)
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .

Additional setup notes

If you plan to run the MCP server locally, start it with one of the runtime invocations described above (uvx, uv run, or Python directly) from your project path, then connect with your MCP client.
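As a sketch, the three invocations might look like this; the module path `site_crawler_mcp` for the direct Python invocation is an assumption and may differ from the actual package layout:

```shell
# Start via uvx (recommended for development)
uvx --from /path/to/site-crawler-mcp site-crawler-mcp

# Start via uv run from inside the project directory
uv run site-crawler-mcp

# Start by invoking Python directly (module name assumed)
python -m site_crawler_mcp
```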

Configuration and runtime tips

Tune concurrency and timeouts to match your environment: the crawler supports a maximum number of concurrent requests (CRAWLER_MAX_CONCURRENT), a per-request timeout in seconds (CRAWLER_TIMEOUT), and a custom user agent string for your crawls (CRAWLER_USER_AGENT).
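For example, the `env` section of the server entry can be adjusted for a slower, more polite crawl (the values below are illustrative):

```json
"env": {
  "CRAWLER_TIMEOUT": "60",
  "CRAWLER_MAX_CONCURRENT": "2",
  "CRAWLER_USER_AGENT": "MyCrawler/1.0"
}
```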

For best results, tailor the crawl depth and the maximum number of pages to crawl. The system is designed to deduplicate URLs and respect politeness delays to avoid overloading target sites.

Examples and common workflows

Use the multi-mode extraction to gather a mix of data types in one run. Typical combinations include images, meta data, SEO information, and brand details to build a comprehensive view of a site.
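As a sketch, a multi-mode request to the `site_crawlAssets` tool could be expressed as an MCP `tools/call` message like the one below; the argument names (`url`, `modes`) are assumptions and may differ in the tool's actual schema:

```json
{
  "method": "tools/call",
  "params": {
    "name": "site_crawlAssets",
    "arguments": {
      "url": "https://example.com",
      "modes": ["images", "meta", "seo", "brand"]
    }
  }
}
```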

Available tools

site_crawlAssets

Crawl a website and extract various assets across multiple modes such as images, meta, brand, seo, performance, security, compliance, infrastructure, legal, careers, references, and contact information.

images

Extract all images with metadata including alt text, dimensions, and format.

meta

Extract basic SEO metadata like title, description, and H1 tags.

brand

Extract branding information such as logo, company name, and about pages.

seo

Perform a comprehensive SEO analysis including meta tags and structured data.

performance

Collect page load metrics and performance indicators.

security

Inspect security headers and HTTPS configuration.

compliance

Check accessibility and regulatory compliance.

infrastructure

Detect server technologies and CDN usage.

legal

Identify privacy policies and terms, including KVKK/GDPR signals.

careers

Find job opportunities and career pages.

references

Gather client testimonials and case studies.

contact

Extract contact details such as emails, phones, and addresses.