Provides a configurable web crawling MCP server that follows links, respects delays, and handles concurrent requests for scalable data collection.
You can deploy and run a Web Crawler MCP Server that exposes a crawl endpoint via MCP for configurable web crawling tasks. This server lets you perform controlled crawls with adjustable depth, delays, timeouts, and concurrency, all managed through an MCP client.
To perform a crawl, connect with your MCP client and invoke the crawl capability exposed by the server. You specify the target URL and the crawl depth, and the server handles requests in a controlled, asynchronous way: it collects pages, follows links within the configured constraints, and respects rate limits and timeouts. Environment variables (or the equivalent MCP configuration values) control how aggressively the crawler behaves through depth, delay, timeout, and concurrency settings. Ensure you have the appropriate permissions and follow each target site's policies before crawling.
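At the protocol level, MCP tool invocations are JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for this server. The tool name `crawl` and the `url`/`depth` argument names are assumptions based on the server's description; query the server's tool listing (`tools/list`) for the exact schema.

```typescript
// Illustrative: the JSON-RPC 2.0 message an MCP client sends to call a tool.
// Tool name and argument names are assumed, not confirmed by the server docs.
interface CrawlArgs {
  url: string;
  depth?: number;
}

function buildCrawlCall(id: number, args: CrawlArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name: "crawl", arguments: args },
  };
}

console.log(JSON.stringify(buildCrawlCall(1, { url: "https://example.com", depth: 2 })));
```

In practice an MCP client library constructs and sends this message for you over the stdio transport; the sketch only shows the wire-level shape.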
Prerequisites: Node.js v18 or newer and npm v9 or newer.
Step 1: Install dependencies and build the project.
git clone https://github.com/jitsmaster/web-crawler-mcp.git
cd web-crawler-mcp
npm install
npm run build
Step 2: Create runtime configuration for the server using environment variables.
CRAWL_LINKS=false
MAX_DEPTH=3
REQUEST_DELAY=1000
TIMEOUT=5000
MAX_CONCURRENT=5
Step 3: Start the MCP server.
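As a sketch of how such settings are typically read in a Node.js server, the helper below parses the variables named above, falling back to the example values when a variable is unset. The helper itself is illustrative, not the server's actual code, and whether the real server uses these defaults is an assumption.

```typescript
// Illustrative config loader for the environment variables shown above.
// Fallback values mirror the example configuration; the real server's
// defaults may differ.
interface CrawlerConfig {
  crawlLinks: boolean;
  maxDepth: number;
  requestDelayMs: number;
  timeoutMs: number;
  maxConcurrent: number;
}

function loadConfig(env: Record<string, string | undefined>): CrawlerConfig {
  const num = (v: string | undefined, fallback: number) => {
    if (v === undefined || v === "") return fallback;
    const n = Number(v);
    return Number.isFinite(n) ? n : fallback;
  };
  return {
    crawlLinks: env.CRAWL_LINKS === "true",
    maxDepth: num(env.MAX_DEPTH, 3),
    requestDelayMs: num(env.REQUEST_DELAY, 1000),
    timeoutMs: num(env.TIMEOUT, 5000),
    maxConcurrent: num(env.MAX_CONCURRENT, 5),
  };
}

console.log(loadConfig(process.env));
```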
npm start
The server can be wired into your MCP workflow using a local (stdio) runtime configuration. This runs the server as a local process started by a command such as node, pointed at the built entry file. The following configuration snippet shows the exact structure used to run the server locally and pass through the environment variables shown above.
{
"mcpServers": {
"web_crawler": {
"command": "node",
"args": ["/path/to/web-crawler/build/index.js"],
"env": {
"CRAWL_LINKS": "false",
"MAX_DEPTH": "3",
"REQUEST_DELAY": "1000",
"TIMEOUT": "5000",
"MAX_CONCURRENT": "5"
}
}
}
}
Environment variables control crawling behavior: set CRAWL_LINKS to follow or ignore links, MAX_DEPTH to limit crawl depth, REQUEST_DELAY to pace requests, TIMEOUT to bound individual requests, and MAX_CONCURRENT to cap simultaneous requests. When deploying in production, consider securing access to the MCP endpoint and applying rate limiting on the client side to stay within target site policies.
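A minimal sketch of how MAX_CONCURRENT and REQUEST_DELAY interact in a crawler's request loop, assuming a worker-pool design (illustrative only; `fetchPage` is a hypothetical stand-in for the server's real request logic):

```typescript
// Illustrative worker pool: at most maxConcurrent requests run at once,
// and each worker pauses for requestDelayMs between requests.
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function crawlAll(
  urls: string[],
  fetchPage: (url: string) => Promise<string>,
  maxConcurrent = 5,
  requestDelayMs = 1000,
): Promise<string[]> {
  const results: string[] = new Array(urls.length);
  let next = 0; // shared index: each worker claims the next unfetched URL
  async function worker(): Promise<void> {
    while (next < urls.length) {
      const i = next++;
      results[i] = await fetchPage(urls[i]);
      await sleep(requestDelayMs); // pacing between requests
    }
  }
  const workers = Array.from(
    { length: Math.min(maxConcurrent, urls.length) },
    () => worker(),
  );
  await Promise.all(workers);
  return results;
}
```

With maxConcurrent workers each sleeping requestDelayMs between fetches, the pool naturally bounds both parallelism and request rate, which is the behavior the two environment variables describe.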
To initiate a crawl via your MCP client, provide the target URL and an optional depth parameter. The server executes the crawl according to the configured environment and MCP settings, respecting server-side limits such as delay, timeout, and concurrency.