
firecrawl skill

/firecrawl

This skill helps you scrape, crawl, and extract structured data from websites using the Firecrawl API via curl, producing AI-ready content.

npx playbooks add skill vm0-ai/vm0-skills --skill firecrawl

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
11.5 KB
---
name: firecrawl
description: Firecrawl web scraping API via curl. Use this skill to scrape webpages, crawl websites, discover URLs, search the web, or extract structured data.
vm0_secrets:
  - FIRECRAWL_API_KEY
---

# Firecrawl

Use the Firecrawl API via direct `curl` calls to **scrape websites and extract data for AI**.

> Official docs: `https://docs.firecrawl.dev/`

---

## When to Use

Use this skill when you need to:

- **Scrape a webpage** and convert to markdown/HTML
- **Crawl an entire website** and extract all pages
- **Discover all URLs** on a website
- **Search the web** and get full page content
- **Extract structured data** using AI

---

## Prerequisites

1. Sign up at https://www.firecrawl.dev/
2. Get your API key from the dashboard

```bash
export FIRECRAWL_API_KEY="fc-your-api-key"
```
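
To confirm the key works, a quick sanity check is a minimal scrape of a placeholder page (a sketch only; the `bash -c` wrapping is explained in the note below):

```bash
# Write a tiny scrape payload, call the API, and print the "success" field.
cat > /tmp/firecrawl_request.json <<'EOF'
{"url": "https://example.com", "formats": ["markdown"]}
EOF
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.success'
```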

---


> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
> ```

## How to Use

All examples below assume you have `FIRECRAWL_API_KEY` set.

Base URL: `https://api.firecrawl.dev/v1`

---

## 1. Scrape - Single Page

Extract content from a single webpage.

### Basic Scrape

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["markdown"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

### Scrape with Options

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.example.com/api",
  "formats": ["markdown"],
  "onlyMainContent": true,
  "timeout": 30000
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.markdown'
```

### Get HTML Instead

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["html"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.html'
```

### Get Screenshot

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "formats": ["screenshot"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.screenshot'
```
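
If you want the image itself, the `screenshot` value is typically a URL to the captured image (verify against your actual response); a sketch to download it, with `screenshot.png` as an example filename:

```bash
# Extract the screenshot URL from the response, then fetch the image.
# Assumes .data.screenshot is a URL; adjust if your response returns inline data.
SHOT_URL="$(bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.data.screenshot')"
curl -s -o screenshot.png "$SHOT_URL"
```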

**Scrape Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `url` | string | URL to scrape (required) |
| `formats` | array | `markdown`, `html`, `rawHtml`, `screenshot`, `links` |
| `onlyMainContent` | boolean | Skip headers/footers |
| `timeout` | number | Timeout in milliseconds |
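
The `links` format from the table above has no example of its own; a minimal sketch (the array typically comes back under `.data.links`, but confirm against your response):

```bash
# Request only the links found on the page.
cat > /tmp/firecrawl_request.json <<'EOF'
{"url": "https://example.com", "formats": ["links"]}
EOF
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data.links'
```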

---

## 2. Crawl - Entire Website

Crawl all pages of a website (async operation).

### Start a Crawl

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com",
  "limit": 50,
  "maxDepth": 2
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

**Response:**

```json
{
  "success": true,
  "id": "crawl-job-id-here"
}
```
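
To capture the job ID for the follow-up calls, a small sketch (assumes the crawl payload above is still in `/tmp/firecrawl_request.json` and that `jq` is available):

```bash
# Start the crawl and keep only the returned job id in a shell variable.
JOB_ID="$(bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.id')"
echo "Crawl job: $JOB_ID"
```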

### Check Crawl Status

Replace `<job-id>` with the actual job ID returned from the crawl request:

```bash
bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq '{status, completed, total}'
```

### Get Crawl Results

Replace `<job-id>` with the actual job ID:

```bash
bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq '.data[] | {url: .metadata.url, title: .metadata.title}'
```
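
To dump every crawled page to its own file, a sketch (the `pages/` directory is just an example name, and it assumes each result carries a `markdown` field, which is the default crawl format):

```bash
# Fetch the finished crawl once, then write each page's markdown to pages/N.md.
mkdir -p pages
RESPONSE="$(bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"')"
COUNT="$(echo "$RESPONSE" | jq '.data | length')"
i=0
while [ "$i" -lt "$COUNT" ]; do
  echo "$RESPONSE" | jq -r ".data[$i].markdown // empty" > "pages/$i.md"
  i=$((i + 1))
done
```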

### Crawl with Path Filters

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "limit": 20,
  "maxDepth": 3,
  "includePaths": ["/posts/*"],
  "excludePaths": ["/admin/*", "/login"]
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/crawl" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json'
```

**Crawl Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `url` | string | Starting URL (required) |
| `limit` | number | Max pages to crawl (default: 100) |
| `maxDepth` | number | Max crawl depth (default: 3) |
| `includePaths` | array | Paths to include (e.g., `/blog/*`) |
| `excludePaths` | array | Paths to exclude |

---

## 3. Map - URL Discovery

Get all URLs from a website quickly.

### Basic Map

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://example.com"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.links[:10]'
```

### Map with Search Filter

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://shop.example.com",
  "search": "product",
  "limit": 500
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.links'
```

**Map Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `url` | string | Website URL (required) |
| `search` | string | Filter URLs containing keyword |
| `limit` | number | Max URLs to return (default: 1000) |
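
To keep the full URL list for later processing, a minimal sketch (the `urls.txt` filename is just an example):

```bash
# Map the site, write one URL per line, and report how many were found.
cat > /tmp/firecrawl_request.json <<'EOF'
{"url": "https://example.com", "limit": 1000}
EOF
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.links[]' > urls.txt
wc -l < urls.txt
```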

---

## 4. Search - Web Search

Search the web and get full page content.

### Basic Search

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "AI news 2024",
  "limit": 5
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, url: .url}'
```

### Search with Full Content

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "machine learning tutorials",
  "limit": 3,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, content: .markdown[:500]}'
```

**Search Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | string | Search query (required) |
| `limit` | number | Number of results (default: 10) |
| `scrapeOptions` | object | Options for scraping results |

---

## 5. Extract - AI Data Extraction

Extract structured data from pages using AI.

### Basic Extract

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract the product name, price, and description"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

### Extract with Schema

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/product/123"],
  "prompt": "Extract product information",
  "schema": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "price": {"type": "number"},
      "currency": {"type": "string"},
      "inStock": {"type": "boolean"}
    }
  }
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

### Extract from Multiple URLs

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": [
    "https://example.com/product/1",
    "https://example.com/product/2"
  ],
  "prompt": "Extract product name and price"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

**Extract Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `urls` | array | URLs to extract from (required) |
| `prompt` | string | Description of data to extract (required) |
| `schema` | object | JSON schema for structured output |
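
A common pattern is feeding URLs discovered by the map endpoint into extract. A sketch, assuming a `urls.txt` file with one URL per line (as produced by the map example earlier) and a purely illustrative prompt:

```bash
# Build an extract payload from the first 5 URLs in urls.txt, then call extract.
head -5 urls.txt | jq -R . | jq -s '{urls: ., prompt: "Extract the page title and a one-sentence summary"}' > /tmp/firecrawl_request.json
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```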

---

## Practical Examples

### Scrape Documentation

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://docs.python.org/3/tutorial/",
  "formats": ["markdown"],
  "onlyMainContent": true
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.data.markdown' > python-tutorial.md
```

### Find All Blog Posts

Write to `/tmp/firecrawl_request.json`:

```json
{
  "url": "https://blog.example.com",
  "search": "post"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/map" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq -r '.links[]'
```

### Research a Topic

Write to `/tmp/firecrawl_request.json`:

```json
{
  "query": "best practices REST API design 2024",
  "limit": 5,
  "scrapeOptions": {"formats": ["markdown"]}
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/search" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data[] | {title: .metadata.title, url: .url}'
```

### Extract Pricing Data

Write to `/tmp/firecrawl_request.json`:

```json
{
  "urls": ["https://example.com/pricing"],
  "prompt": "Extract all pricing tiers with name, price, and features"
}
```

Then run:

```bash
bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/extract" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json' | jq '.data'
```

### Poll Crawl Until Complete

Replace `<job-id>` with the actual job ID:

```bash
while true; do
  STATUS="$(bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq -r '.status')"
  echo "Status: $STATUS"
  [ "$STATUS" = "completed" ] && break
  sleep 5
done
```
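
Once the loop exits, fetch the finished results from the same endpoint (a sketch; `crawl_results.json` is just an example filename):

```bash
# Save the crawl's data array for downstream processing.
bash -c 'curl -s "https://api.firecrawl.dev/v1/crawl/<job-id>" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}"' | jq '.data' > crawl_results.json
```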

---

## Response Format

### Scrape Response

```json
{
  "success": true,
  "data": {
    "markdown": "# Page Title\n\nContent...",
    "metadata": {
      "title": "Page Title",
      "description": "...",
      "url": "https://..."
    }
  }
}
```

### Crawl Status Response

```json
{
  "success": true,
  "status": "completed",
  "completed": 50,
  "total": 50,
  "data": [...]
}
```

---

## Guidelines

1. **Rate limits**: Add delays between requests to avoid 429 errors
2. **Crawl limits**: Set reasonable `limit` values to control API usage
3. **Main content**: Use `onlyMainContent: true` for cleaner output
4. **Async crawls**: Large crawls are async; poll `/crawl/{id}` for status
5. **Extract prompts**: Be specific for better AI extraction results
6. **Check success**: Always check the `success` field in responses (a sketch follows this list)
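
As a sketch of guideline 6, a small wrapper that aborts when `success` is false (the `error` field name is an assumption; inspect the actual error response you receive):

```bash
# Run a request from /tmp/firecrawl_request.json and fail loudly if it did not succeed.
RESPONSE="$(bash -c 'curl -s -X POST "https://api.firecrawl.dev/v1/scrape" -H "Authorization: Bearer ${FIRECRAWL_API_KEY}" -H "Content-Type: application/json" -d @/tmp/firecrawl_request.json')"
if [ "$(echo "$RESPONSE" | jq -r '.success')" != "true" ]; then
  echo "Firecrawl request failed: $(echo "$RESPONSE" | jq -c '.error // .')" >&2
else
  echo "$RESPONSE" | jq '.data'
fi
```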

Overview

This skill provides curl-based access to the Firecrawl web scraping API to scrape pages, crawl sites, discover URLs, search the web, and extract structured data for AI workflows. It focuses on simple JSON request files, authenticated requests using FIRECRAWL_API_KEY, and output formats like markdown, html, screenshots, and structured JSON. Use it when you need reliable programmatic scraping and AI-friendly data extraction via shell scripts.

How this skill works

You send POST requests to the Firecrawl endpoints (/scrape, /crawl, /map, /search, /extract) with a small JSON payload describing the target and options. The API returns structured responses containing content, metadata, links, or extraction outputs; crawls are asynchronous and return a job id you poll for completion. Common options include formats (markdown, html, rawHtml, screenshot, links), onlyMainContent, timeouts, limits, depth, and optional extraction schemas.

When to use it

  • Scrape a single page and convert it to markdown or HTML for downstream processing.
  • Crawl an entire site asynchronously to index pages, metadata, and titles.
  • Map a website to quickly discover all internal URLs or filter links by keyword.
  • Search the web for query results and retrieve full page content for research.
  • Extract structured data (products, pricing, specs) from one or many pages using an AI prompt or JSON schema.

Best practices

  • Set FIRECRAWL_API_KEY in your environment and wrap curl calls in bash -c when piping to avoid environment variable clearing.
  • Use onlyMainContent: true to reduce noise from headers/footers for cleaner Markdown/HTML outputs.
  • Respect rate limits: add delays and set reasonable limit values to avoid 429 errors and excessive usage.
  • Use async crawl jobs for large sites and poll /crawl/{job-id} until status is completed before fetching results.
  • Provide specific prompts and JSON schemas for the extract endpoint to get consistent, structured outputs.

Example use cases

  • Save a documentation site as markdown for offline search and AI indexing.
  • Crawl a blog to collect post URLs, titles, and publish dates for migration or analytics.
  • Map an e-commerce site to list all product URLs and feed them to a price-monitoring pipeline.
  • Search recent news on a topic and retrieve full page markdown for summarization by an LLM.
  • Extract pricing tiers and features from a pricing page into a JSON schema for comparison tools.

FAQ

How do I authenticate requests?

Set FIRECRAWL_API_KEY in your environment and include an Authorization: Bearer header in curl; use bash -c '...' when piping.

Are crawls synchronous?

No. Crawl requests are async: POST /crawl returns a job id. Poll GET /crawl/{id} to check status and retrieve results.

What output formats are available?

Supported formats include markdown, html, rawHtml, screenshot, and links. Use formats in the scrape payload to control response.