
firecrawl-agent skill


This skill helps you extract structured data from complex websites by autonomously navigating pages and returning JSON with a defined schema.

npx playbooks add skill firecrawl/cli --skill firecrawl-agent


SKILL.md
---
name: firecrawl-agent
description: |
  AI-powered autonomous data extraction that navigates complex sites and returns structured JSON. Use this skill when the user wants structured data from websites, needs to extract pricing tiers, product listings, directory entries, or any data as JSON with a schema. Triggers on "extract structured data", "get all the products", "pull pricing info", "extract as JSON", or when the user provides a JSON schema for website data. More powerful than simple scraping for multi-page structured extraction.
allowed-tools:
  - Bash(firecrawl *)
  - Bash(npx firecrawl *)
---

# firecrawl agent

AI-powered autonomous extraction. The agent navigates sites and extracts structured data (takes 2-5 minutes).

## When to use

- You need structured data from complex multi-page sites
- Manual scraping would require navigating many pages
- You want the AI to figure out where the data lives

## Quick start

```bash
# Extract structured data
firecrawl agent "extract all pricing tiers" --wait -o .firecrawl/pricing.json

# With a JSON schema for structured output
firecrawl agent "extract products" --schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}' --wait -o .firecrawl/products.json

# Focus on specific pages
firecrawl agent "get feature list" --urls "<url>" --wait -o .firecrawl/features.json
```

## Options

| Option                 | Description                               |
| ---------------------- | ----------------------------------------- |
| `--urls <urls>`        | Starting URLs for the agent               |
| `--model <model>`      | Model to use: spark-1-mini or spark-1-pro |
| `--schema <json>`      | JSON schema for structured output         |
| `--schema-file <path>` | Path to JSON schema file                  |
| `--max-credits <n>`    | Credit limit for this agent run           |
| `--wait`               | Wait for agent to complete                |
| `--pretty`             | Pretty print JSON output                  |
| `-o, --output <path>`  | Output file path                          |
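For larger schemas, `--schema-file` keeps the command readable. A minimal sketch of that workflow — the schema fields, file names, and credit cap here are illustrative, not part of this skill:

```bash
# Save the schema to a file instead of passing it inline
cat > product-schema.json <<'EOF'
{
  "type": "object",
  "properties": {
    "products": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name":  {"type": "string"},
          "price": {"type": "number"}
        }
      }
    }
  }
}
EOF

# Sanity-check the JSON before spending credits on a run
python3 -m json.tool product-schema.json > /dev/null && echo "schema OK"

# Then reference the file and cap the spend:
#   firecrawl agent "get all the products" \
#     --schema-file product-schema.json \
#     --max-credits 50 --wait -o .firecrawl/products.json
```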

## Tips

- Always use `--wait` to get results inline; without it, the command returns a job ID instead.
- Use `--schema` for predictable, structured output — otherwise the agent returns freeform data.
- Agent runs consume more credits than simple scrapes. Use `--max-credits` to cap spending.
- For simple single-page extraction, prefer `scrape` — it's faster and cheaper.
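Once a run finishes, the output file is plain JSON and easy to post-process. A sketch using a fabricated result file — the real shape depends on your prompt and schema:

```bash
# Fabricated example of a saved agent result; a real run's output
# depends on the prompt and schema you supplied
mkdir -p .firecrawl
cat > .firecrawl/pricing.json <<'EOF'
[{"name": "Starter", "price": 0}, {"name": "Pro", "price": 29}]
EOF

# Pull the plan names out for downstream use
python3 -c 'import json; print([p["name"] for p in json.load(open(".firecrawl/pricing.json"))])'
```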

## See also

- [firecrawl-scrape](../firecrawl-scrape/SKILL.md) — simpler single-page extraction
- [firecrawl-browser](../firecrawl-browser/SKILL.md) — manual browser automation (more control)
- [firecrawl-crawl](../firecrawl-crawl/SKILL.md) — bulk extraction without AI

Overview

This skill performs AI-powered autonomous data extraction that navigates complex websites and returns structured JSON. It is designed to locate, traverse, and extract multi-page content like product lists, pricing tiers, and directory entries. Runs typically take a few minutes and can accept a JSON schema to enforce predictable output.

How this skill works

The agent launches from one or more starting URLs, explores linked pages as needed, and identifies where target data resides. It uses an AI planner to decide navigation and extraction steps, then outputs the results as structured JSON. Optionally provide a JSON schema to constrain the output shape and improve reliability.

When to use it

  • You need structured JSON from complex, multi-page sites.
  • You want the agent to discover where data lives rather than specify every page.
  • You need pricing tiers, complete product listings, or directory entries aggregated.
  • You have a JSON schema and require predictable output for downstream processing.
  • Manual scraping would be slow due to many linked pages or dynamic navigation.

Best practices

  • Provide one or more focused starting URLs to reduce exploration time.
  • Supply a JSON schema when you require consistent field types and names.
  • Use --wait to receive results inline instead of a job ID.
  • Set a credit limit with --max-credits on large sites to control cost.
  • Prefer simpler single-page scrapes for trivial extraction to save time and credits.
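Putting those practices together, one run might be assembled like this — the URL, schema path, and credit cap are placeholder values, not defaults:

```bash
# Assemble a run that follows the practices above (values are illustrative)
cmd=(firecrawl agent "pull pricing info"
  --urls "https://example.com/pricing"   # focused starting URL
  --schema-file plan-schema.json         # consistent field names and types
  --max-credits 50                       # cost ceiling for this run
  --wait                                 # inline results, not a job ID
  -o .firecrawl/pricing.json)

# Review the assembled command before running it
printf '%q ' "${cmd[@]}"; echo
```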

Example use cases

  • Extract all product details (name, sku, price, description) across a multi-category ecommerce site into JSON.
  • Pull pricing tiers and feature lists from a SaaS website and return an array of plan objects.
  • Crawl a business directory to collect company names, addresses, phone numbers, and websites.
  • Aggregate course catalogs and schedules across department pages into a standardized JSON schema.
  • Collect technical documentation sections scattered across pages into a structured output for indexing.

FAQ

What if I need a specific JSON layout?

Pass a JSON schema via --schema or --schema-file when running the agent. The agent will attempt to conform extracted data to that schema for predictable output.
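For instance, a schema requesting an array of plan objects might look like the following — the field names are illustrative, chosen to match the pricing-tier use case above:

```json
{
  "type": "object",
  "properties": {
    "plans": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name":     {"type": "string"},
          "price":    {"type": "number"},
          "features": {"type": "array", "items": {"type": "string"}}
        }
      }
    }
  }
}
```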

How long does an extraction take?

Typical runs take 2–5 minutes depending on site complexity and the number of pages; set --max-credits if you need a tighter cost bound.