
firecrawl-crawl skill


This skill bulk-extracts content from a website by crawling links to a defined depth, filtering paths, and returning structured results.

npx playbooks add skill firecrawl/cli --skill firecrawl-crawl

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
2.6 KB
---
name: firecrawl-crawl
description: |
  Bulk extract content from an entire website or site section. Use this skill when the user wants to crawl a site, extract all pages from a docs section, bulk-scrape multiple pages following links, or says "crawl", "get all the pages", "extract everything under /docs", "bulk extract", or needs content from many pages on the same site. Handles depth limits, path filtering, and concurrent extraction.
allowed-tools:
  - Bash(firecrawl *)
  - Bash(npx firecrawl *)
---

# firecrawl crawl

Bulk extract content from a website. Crawls pages following links up to a depth/limit.

## When to use

- You need content from many pages on a site (e.g., all `/docs/`)
- You want to extract an entire site section
- Step 4 in the [workflow escalation pattern](firecrawl-cli): search → scrape → map → **crawl** → browser

## Quick start

```bash
# Crawl a docs section
firecrawl crawl "<url>" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json

# Full crawl with depth limit
firecrawl crawl "<url>" --max-depth 3 --wait --progress -o .firecrawl/crawl.json

# Check status of a running crawl
firecrawl crawl <job-id>
```

## Options

| Option                    | Description                                 |
| ------------------------- | ------------------------------------------- |
| `--wait`                  | Wait for crawl to complete before returning |
| `--progress`              | Show progress while waiting                 |
| `--limit <n>`             | Max pages to crawl                          |
| `--max-depth <n>`         | Max link depth to follow                    |
| `--include-paths <paths>` | Only crawl URLs matching these paths        |
| `--exclude-paths <paths>` | Skip URLs matching these paths              |
| `--delay <ms>`            | Delay between requests                      |
| `--max-concurrency <n>`   | Max parallel crawl workers                  |
| `--pretty`                | Pretty print JSON output                    |
| `-o, --output <path>`     | Output file path                            |

## Tips

- Always use `--wait` when you need the results immediately. Without it, crawl returns a job ID for async polling.
- Use `--include-paths` to scope the crawl — don't crawl an entire site when you only need one section.
- Crawl consumes credits per page. Check `firecrawl credit-usage` before large crawls.
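For large crawls, the tips above suggest an async flow: check credits first, launch without `--wait`, then poll. A sketch, assuming the launch command prints the job ID you later pass back (as in Quick start; URL and paths are placeholders):

```shell
# Check remaining credits before committing to a large crawl
firecrawl credit-usage

# Start the crawl without --wait; it returns a job ID instead of blocking
firecrawl crawl "https://example.com" --include-paths /docs --limit 200 \
  -o .firecrawl/docs.json

# Later: poll status with the job ID printed by the command above
firecrawl crawl <job-id>
```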

## See also

- [firecrawl-scrape](../firecrawl-scrape/SKILL.md) — scrape individual pages
- [firecrawl-map](../firecrawl-map/SKILL.md) — discover URLs before deciding to crawl
- [firecrawl-download](../firecrawl-download/SKILL.md) — download site to local files (uses map + scrape)

Overview

This skill bulk-extracts content from an entire website or a specific site section. It runs a link-following crawl with configurable depth, page limits, path filters, and concurrent workers. Use it to collect many pages for analysis, ingestion, or offline processing. Results can be returned synchronously or as an async job ID for polling.

How this skill works

The crawler starts from a seed URL and follows internal links up to a maximum depth and page limit. You can include or exclude path patterns to scope the crawl, set request delays, and tune concurrency for performance and politeness. The skill outputs a JSON file containing scraped page content and metadata, or returns a job ID when run asynchronously. Progress and wait options let you monitor or block until completion.

When to use it

  • You need content from many pages on the same site (e.g., an entire /docs section).
  • You want to extract every page under a specific path for indexing or archival.
  • You need to bulk-scrape linked pages rather than single-page scrapes.
  • You want controlled breadth/depth with concurrency and delay settings for large crawls.

Best practices

  • Scope the crawl with --include-paths to avoid crawling the whole site and reduce cost.
  • Use --limit and --max-depth to control crawl size and runtime.
  • Set --delay and reasonable --max-concurrency to avoid overloading the target site.
  • Run with --wait when you need immediate results; otherwise poll the returned job ID.
  • Check credit usage before large crawls to manage costs.
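Put together, the practices above might look like the following (the URL, paths, and limits are illustrative, not recommendations for any particular site):

```shell
# Scope to one section, cap size and depth, throttle, and block with progress
firecrawl crawl "https://example.com" \
  --include-paths /docs \
  --limit 50 \
  --max-depth 2 \
  --delay 1000 \
  --max-concurrency 2 \
  --wait --progress \
  -o .firecrawl/docs.json
```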

Example use cases

  • Crawl and extract all developer docs under /docs for building a searchable knowledge base.
  • Bulk-extract product pages across a site section for competitive analysis or feed generation.
  • Crawl a blog archive to ingest posts into an AI training or summarization pipeline.
  • Run a depth-limited crawl to map and scrape linked help articles for support automation.

FAQ

What if I only want a subset of URLs?

Use --include-paths to restrict which paths are followed and --exclude-paths to skip unwanted patterns.
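For example, to crawl a docs section while skipping a subtree inside it (the URL and paths are placeholders):

```shell
# Follow only /docs URLs, but skip the legacy archive underneath it
firecrawl crawl "https://example.com" \
  --include-paths /docs \
  --exclude-paths /docs/archive \
  --limit 100 --wait \
  -o .firecrawl/docs.json
```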

How do I get results immediately?

Run with --wait to block until the crawl finishes and writes its output; without it you receive a job ID for async polling.

How do I avoid overloading a site?

Set --delay to add time between requests and lower --max-concurrency to reduce parallel workers.