Site Crawler MCP Server
Provides advanced website crawling with asset extraction, SEO, security, and compliance analyses.
Configuration
{
"mcpServers": {
"andacguven-site-crawler-mcp": {
"command": "uvx",
"args": [
"--from",
"/path/to/site-crawler-mcp",
"site-crawler-mcp"
],
"env": {
"PYTHONPATH": "/path/to/site-crawler-mcp/src",
"CRAWLER_TIMEOUT": "30",
"CRAWLER_USER_AGENT": "SiteCrawlerMCP/1.0",
"CRAWLER_MAX_CONCURRENT": "5"
}
}
}
}
Site Crawler MCP lets you run a local MCP server that crawls websites and extracts assets, metadata, and security/SEO information. It's designed for e-commerce sites and general web crawling needs, enabling multi-mode data collection, polite crawling, and structured outputs for business insights.
You run the Site Crawler MCP server through an MCP client by configuring an MCP server entry and starting the local process. You can start a server in different ways depending on your preferred runtime environment: via uvx (recommended for development), via uv run, or by invoking Python directly. Each method connects to the local server process and exposes the same capabilities through the MCP protocol.
To use it in your workflow, add a server entry that points to how you will start the process. You'll typically provide a short, stable name for the server, specify the runtime command, and pass the appropriate arguments to launch the server. Once configured, you can request crawling operations that extract images, metadata, SEO data, brand information, and more in a single pass.
Before installing, you need Python 3.10+ and a working development environment. You will also use a virtual environment and a package manager to install dependencies.
Option A: Install from PyPI when published
pip install site-crawler-mcp
Using uv (recommended), follow these steps to set up a development environment and install the package:
# Clone the repository
git clone https://github.com/AndacGuven/site-crawler-mcp.git
cd site-crawler-mcp
# Create virtual environment with Python 3.12
uv venv --python 3.12
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies and package
uv sync
Option B: Install from source using a plain Python environment
# Clone the repository
git clone https://github.com/AndacGuven/site-crawler-mcp.git
cd site-crawler-mcp
# Create virtual environment (recommended)
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On Linux/Mac:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Install in development mode
pip install -e .
If you plan to run the MCP server locally, you'll use one of the following runtime invocations to start the server and connect with your MCP client. The examples below show the commands you'll run from your project path when starting the server.
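Under the project layout above, starting the server typically looks like one of the following. The `uvx` form matches the MCP configuration at the top of this page; the `python -m site_crawler_mcp` module name is an assumption based on the `src/` layout shown there, so check the project's README if it differs.

```shell
# Option 1: uvx (recommended for development); matches the MCP config above
uvx --from /path/to/site-crawler-mcp site-crawler-mcp

# Option 2: uv run, from inside the project directory
uv run site-crawler-mcp

# Option 3: plain Python (module name assumed from the src/ layout)
PYTHONPATH=/path/to/site-crawler-mcp/src python -m site_crawler_mcp
```

Whichever form you choose, your MCP client's `command` and `args` entries should mirror it so the client can spawn and manage the process.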
Configure the server to respect concurrency and timeouts to match your environment. The crawler supports setting maximum concurrent requests and a request timeout, and you can customize the user agent string for your crawls.
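These settings can also be exported in the shell before launching the server; the variable names and values below are the same ones shown in the configuration block at the top of this page.

```shell
# Same variable names and values as the MCP configuration above
export CRAWLER_TIMEOUT=30                       # per-request timeout, in seconds
export CRAWLER_USER_AGENT="SiteCrawlerMCP/1.0"  # identify your crawler politely
export CRAWLER_MAX_CONCURRENT=5                 # cap on simultaneous requests
```

Lower `CRAWLER_MAX_CONCURRENT` and raise `CRAWLER_TIMEOUT` when crawling slow or rate-limited sites.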
For best results, tailor the crawl depth and the maximum number of pages to crawl. The system is designed to deduplicate URLs and respect politeness delays to avoid overloading target sites.
Use the multi-mode extraction to gather a mix of data types in one run. Typical combinations include images, meta data, SEO information, and brand details to build a comprehensive view of a site.
Crawl a website and extract assets across multiple modes in a single pass:
images: Extract all images with metadata including alt text, dimensions, and format.
meta: Extract basic SEO metadata like title, description, and H1 tags.
brand: Extract branding information such as logo, company name, and about pages.
seo: Perform a comprehensive SEO analysis including meta tags and structured data.
performance: Collect page load metrics and performance indicators.
security: Inspect security headers and HTTPS configuration.
compliance: Check accessibility and regulatory compliance.
infrastructure: Detect server technologies and CDN usage.
legal: Identify privacy policies and terms, including KVKK/GDPR signals.
careers: Find job opportunities and career pages.
references: Gather client testimonials and case studies.
contact: Extract contact details such as emails, phones, and addresses.
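After the MCP handshake, a client combining several of these modes would send a `tools/call` request along these lines. The tool name `site_crawl` and the argument names (`url`, `modes`) are illustrative assumptions, not taken from the project's documentation.

```shell
# Illustrative request body only; a real MCP client performs the
# initialize handshake before sending any tools/call messages.
REQUEST='{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"site_crawl","arguments":{"url":"https://example.com","modes":["images","meta","seo","brand"]}}}'
echo "$REQUEST"
```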