---
name: firecrawl-scraper
description: |
  Convert websites into LLM-ready data with Firecrawl API. Features: scrape, crawl, map, search, extract, agent (autonomous), batch operations, and change tracking. Handles JavaScript, anti-bot bypass, PDF/DOCX parsing, and branding extraction. Prevents 10 documented errors.
  Use when: scraping websites, crawling sites, web search + scrape, autonomous data gathering, monitoring content changes, extracting brand/design systems, or troubleshooting content not loading, JavaScript rendering, bot detection, v2 migration, job status errors, DNS resolution, or stealth mode pricing.
user-invocable: true
---
# Firecrawl Web Scraper Skill
**Status**: Production Ready
**Last Updated**: 2026-01-20
**Official Docs**: https://docs.firecrawl.dev
**API Version**: v2
**SDK Versions**: firecrawl-py 4.13.0+, @mendable/firecrawl-js 4.11.1+
---
## What is Firecrawl?
Firecrawl is a **Web Data API for AI** that turns websites into LLM-ready markdown or structured data. It handles:
- **JavaScript rendering** - Executes client-side JavaScript to capture dynamic content
- **Anti-bot bypass** - Gets past CAPTCHA and bot detection systems
- **Format conversion** - Outputs as markdown, HTML, JSON, screenshots, summaries
- **Document parsing** - Processes PDFs, DOCX files, and images
- **Autonomous agents** - AI-powered web data gathering without URLs
- **Change tracking** - Monitor content changes over time
- **Branding extraction** - Extract color schemes, typography, logos
---
## API Endpoints Overview
| Endpoint | Purpose | Use Case |
|----------|---------|----------|
| `/scrape` | Single page | Extract article, product page |
| `/crawl` | Full site | Index docs, archive sites |
| `/map` | URL discovery | Find all pages, plan strategy |
| `/search` | Web search + scrape | Research with live data |
| `/extract` | Structured data | Product prices, contacts |
| `/agent` | Autonomous gathering | No URLs needed, AI navigates |
| `/batch-scrape` | Multiple URLs | Bulk processing |
---
## 1. Scrape Endpoint (`/v2/scrape`)
Scrapes a single webpage and returns clean, structured content.
### Basic Usage
```python
from firecrawl import Firecrawl
import os
app = Firecrawl(api_key=os.environ.get("FIRECRAWL_API_KEY"))
# Basic scrape
doc = app.scrape(
    url="https://example.com/article",
    formats=["markdown", "html"],
    only_main_content=True
)
print(doc.markdown)
print(doc.metadata)
```
```typescript
import FirecrawlApp from '@mendable/firecrawl-js';
const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
const result = await app.scrape('https://example.com/article', {
  formats: ['markdown', 'html'],
  onlyMainContent: true
});
console.log(result.markdown);
```
### Output Formats
| Format | Description |
|--------|-------------|
| `markdown` | LLM-optimized content |
| `html` | Full HTML |
| `rawHtml` | Unprocessed HTML |
| `screenshot` | Page capture (with viewport options) |
| `links` | All URLs on page |
| `json` | Structured data extraction |
| `summary` | AI-generated summary |
| `branding` | Design system data |
| `changeTracking` | Content change detection |
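For formats not demonstrated elsewhere in this skill, such as `summary` and `links`, the request shape is the same; a minimal sketch (the response attribute names below are assumed to match the requested formats):
```python
# Request an AI-generated summary plus every link found on the page.
doc = app.scrape(
    url="https://example.com/article",
    formats=["summary", "links"]
)
print(doc.summary)     # AI-generated summary (assumed attribute name)
print(doc.links[:10])  # first 10 discovered URLs (assumed attribute name)
```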
### Advanced Options
```python
doc = app.scrape(
    url="https://example.com",
    formats=["markdown", "screenshot"],
    only_main_content=True,
    remove_base64_images=True,
    wait_for=5000,  # Wait 5s for JS
    timeout=30000,
    # Location & language
    location={"country": "AU", "languages": ["en-AU"]},
    # Cache control
    max_age=0,  # Fresh content (no cache)
    store_in_cache=True,
    # Stealth proxies for complex sites
    proxy="stealth",
    # Custom headers
    headers={"User-Agent": "Custom Bot 1.0"}
)
```
### Browser Actions
Perform interactions before scraping:
```python
doc = app.scrape(
    url="https://example.com",
    actions=[
        {"type": "click", "selector": "button.load-more"},
        {"type": "wait", "milliseconds": 2000},
        {"type": "scroll", "direction": "down"},
        {"type": "write", "selector": "input#search", "text": "query"},
        {"type": "press", "key": "Enter"},
        {"type": "screenshot"}  # Capture state mid-action
    ]
)
```
### JSON Mode (Structured Extraction)
```python
# With schema
doc = app.scrape(
    url="https://example.com/product",
    formats=[{
        "type": "json",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "number"},
                "in_stock": {"type": "boolean"}
            }
        }
    }]
)

# Without schema (prompt-only)
doc = app.scrape(
    url="https://example.com/product",
    formats=[{
        "type": "json",
        "prompt": "Extract the product name, price, and availability"
    }]
)
```
### Branding Extraction
Extract design system and brand identity:
```python
doc = app.scrape(
    url="https://example.com",
    formats=["branding"]
)
# Returns:
# - Color schemes and palettes
# - Typography (fonts, sizes, weights)
# - Spacing and layout metrics
# - UI component styles
# - Logo and imagery URLs
# - Brand personality traits
```
---
## 2. Crawl Endpoint (`/v2/crawl`)
Crawls all accessible pages from a starting URL.
```python
result = app.crawl(
    url="https://docs.example.com",
    limit=100,
    max_discovery_depth=3,
    allowed_domains=["docs.example.com"],
    exclude_paths=["/api/*", "/admin/*"],
    scrape_options={
        "formats": ["markdown"],
        "only_main_content": True
    }
)
for page in result.data:
    print(f"Scraped: {page.metadata.source_url}")
    print(f"Content: {page.markdown[:200]}...")
```
### Async Crawl with Webhooks
```python
# Start crawl (returns immediately)
job = app.start_crawl(
    url="https://docs.example.com",
    limit=1000,
    webhook="https://your-domain.com/webhook"
)
print(f"Job ID: {job.id}")

# Or poll for status
status = app.get_crawl_status(job.id)
```
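If you poll instead of using webhooks, a minimal polling loop looks like this (the short initial sleep avoids the job-status race condition covered in Issue #6 below; treat the exact status strings as assumptions):
```python
import time

job = app.start_crawl(url="https://docs.example.com", limit=1000)
time.sleep(2)  # give the job time to become visible to the status endpoint

# Poll until the crawl finishes (assumed terminal statuses: "completed", "failed")
while True:
    status = app.get_crawl_status(job.id)
    if status.status in ("completed", "failed"):
        break
    time.sleep(5)

if status.status == "completed":
    for page in status.data:
        print(page.metadata.source_url)
```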
---
## 3. Map Endpoint (`/v2/map`)
Rapidly discover all URLs on a website without scraping content.
```python
urls = app.map(url="https://example.com")
print(f"Found {len(urls)} pages")
for url in urls[:10]:
    print(url)
```
Use for: sitemap discovery, crawl planning, website audits.
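A common pattern is to pair `/map` with batch scraping: discover URLs, filter the ones you need, then scrape them in bulk. A sketch, assuming `map` returns an iterable of URL strings as in the example above (the `/blog/` filter is illustrative):
```python
# Discover URLs, keep only blog posts, then batch-scrape them.
urls = app.map(url="https://example.com")
blog_urls = [u for u in urls if "/blog/" in u][:50]

results = app.batch_scrape(
    urls=blog_urls,
    formats=["markdown"],
    only_main_content=True
)
for page in results.data:
    print(f"{page.metadata.source_url}: {len(page.markdown)} chars")
```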
---
## 4. Search Endpoint (`/search`) - NEW
Perform web searches and optionally scrape the results in one operation.
```python
# Basic search
results = app.search(
    query="best practices for React server components",
    limit=10
)
for result in results:
    print(f"{result.title}: {result.url}")

# Search + scrape results
results = app.search(
    query="React server components tutorial",
    limit=5,
    scrape_options={
        "formats": ["markdown"],
        "only_main_content": True
    }
)
for result in results:
    print(f"{result.title}")
    print(result.markdown[:500])
```
### Search Options
```python
results = app.search(
    query="machine learning papers",
    limit=20,
    # Filter by source type
    sources=["web", "news", "images"],
    # Filter by category
    categories=["github", "research", "pdf"],
    # Location
    location={"country": "US"},
    # Time filter
    tbs="qdr:m",  # Past month (qdr:h=hour, qdr:d=day, qdr:w=week, qdr:y=year)
    timeout=30000
)
```
**Cost**: 2 credits per 10 results + scraping costs if enabled.
---
## 5. Extract Endpoint (`/v2/extract`)
AI-powered structured data extraction from single pages, multiple pages, or entire domains.
### Single Page
```python
from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    description: str
    in_stock: bool

result = app.extract(
    urls=["https://example.com/product"],
    schema=Product,
    system_prompt="Extract product information"
)
print(result.data)
```
### Multi-Page / Domain Extraction
```python
# Extract from entire domain using wildcard
result = app.extract(
    urls=["example.com/*"],  # All pages on domain
    schema=Product,
    system_prompt="Extract all products"
)

# Enable web search for additional context
result = app.extract(
    urls=["example.com/products"],
    schema=Product,
    enable_web_search=True  # Follow external links
)
```
### Prompt-Only Extraction (No Schema)
```python
result = app.extract(
    urls=["https://example.com/about"],
    prompt="Extract the company name, founding year, and key executives"
)
# LLM determines output structure
```
---
## 6. Agent Endpoint (`/agent`) - NEW
Autonomous web data gathering without requiring specific URLs. The agent searches, navigates, and gathers data using natural language prompts.
```python
# Basic agent usage
result = app.agent(
    prompt="Find the pricing plans for the top 3 headless CMS platforms and compare their features"
)
print(result.data)

# With schema for structured output
from pydantic import BaseModel
from typing import List

class CMSPricing(BaseModel):
    name: str
    free_tier: bool
    starter_price: float
    features: List[str]

result = app.agent(
    prompt="Find pricing for Contentful, Sanity, and Strapi",
    schema=CMSPricing
)

# Optional: focus on specific URLs
result = app.agent(
    prompt="Extract the enterprise pricing details",
    urls=["https://contentful.com/pricing", "https://sanity.io/pricing"]
)
```
### Agent Models
| Model | Best For | Cost |
|-------|----------|------|
| `spark-1-mini` (default) | Simple extractions, high volume | Standard |
| `spark-1-pro` | Complex analysis, ambiguous data | 60% more |
```python
result = app.agent(
    prompt="Analyze competitive positioning...",
    model="spark-1-pro"  # For complex tasks
)
```
### Async Agent
```python
# Start agent (returns immediately)
job = app.start_agent(
    prompt="Research market trends..."
)

# Poll for results
status = app.check_agent_status(job.id)
if status.status == "completed":
    print(status.data)
```
**Note**: Agent is in Research Preview. 5 free daily requests, then credit-based billing.
---
## 7. Batch Scrape - NEW
Process multiple URLs efficiently in a single operation.
### Synchronous (waits for completion)
```python
results = app.batch_scrape(
    urls=[
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3"
    ],
    formats=["markdown"],
    only_main_content=True
)
for page in results.data:
    print(f"{page.metadata.source_url}: {len(page.markdown)} chars")
```
### Asynchronous (with webhooks)
```python
job = app.start_batch_scrape(
    urls=url_list,
    formats=["markdown"],
    webhook="https://your-domain.com/webhook"
)
# Webhook receives events: started, page, completed, failed
```
```typescript
const job = await app.startBatchScrape(urls, {
  formats: ['markdown'],
  webhook: 'https://your-domain.com/webhook'
});

// Poll for status
const status = await app.checkBatchScrapeStatus(job.id);
```
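A minimal sketch of a webhook receiver (Flask is used purely for illustration; the payload field names below are assumptions based on the event names above, so verify the exact shape against the webhook docs):
```python
from flask import Flask, request

server = Flask(__name__)

@server.post("/webhook")
def firecrawl_webhook():
    event = request.get_json(force=True)
    # Assumed payload shape: {"type": "...", "id": "...", "data": [...], "error": "..."}
    event_type = event.get("type", "")
    if event_type.endswith("page"):
        for page in event.get("data", []):
            print("Received page:", page.get("metadata", {}).get("sourceURL"))
    elif event_type.endswith("completed"):
        print("Job completed:", event.get("id"))
    elif event_type.endswith("failed"):
        print("Job failed:", event.get("error"))
    return {"ok": True}
```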
---
## 8. Change Tracking - NEW
Monitor content changes over time by comparing scrapes.
```python
# Enable change tracking
doc = app.scrape(
    url="https://example.com/pricing",
    formats=["markdown", "changeTracking"]
)

# Response includes:
print(doc.change_tracking.status)  # new, same, changed, removed
print(doc.change_tracking.previous_scrape_at)
print(doc.change_tracking.visibility)  # visible, hidden
```
### Comparison Modes
```python
# Git-diff mode (default)
doc = app.scrape(
    url="https://example.com/docs",
    formats=["markdown", "changeTracking"],
    change_tracking_options={
        "mode": "diff"
    }
)
print(doc.change_tracking.diff)  # Line-by-line changes

# JSON mode (structured comparison)
doc = app.scrape(
    url="https://example.com/pricing",
    formats=["markdown", "changeTracking"],
    change_tracking_options={
        "mode": "json",
        "schema": {"type": "object", "properties": {"price": {"type": "number"}}}
    }
)
# Costs 5 credits per page
```
**Change States**:
- `new` - Page not seen before
- `same` - No changes since last scrape
- `changed` - Content modified
- `removed` - Page no longer accessible
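A small monitoring sketch that acts on the states above (`notify_team` is a hypothetical placeholder for your own alerting):
```python
# Re-scrape a pricing page and react to its change-tracking status.
doc = app.scrape(
    url="https://example.com/pricing",
    formats=["markdown", "changeTracking"]
)

status = doc.change_tracking.status
if status == "changed":
    notify_team(doc.change_tracking.diff)  # hypothetical alerting hook
elif status == "removed":
    notify_team("Pricing page is no longer accessible")
```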
---
## Authentication
```bash
# Get API key from https://www.firecrawl.dev/app
# Store in environment
FIRECRAWL_API_KEY=fc-your-api-key-here
```
**Never hardcode API keys!**
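A small sketch of loading the key from the environment and failing fast when it is missing:
```python
import os
from firecrawl import Firecrawl

api_key = os.environ.get("FIRECRAWL_API_KEY")
if not api_key:
    raise RuntimeError("FIRECRAWL_API_KEY is not set")

app = Firecrawl(api_key=api_key)
```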
---
## Cloudflare Workers Integration
**The Firecrawl SDK cannot run in Cloudflare Workers** (requires Node.js). Use the REST API directly:
```typescript
interface Env {
  FIRECRAWL_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { url } = await request.json<{ url: string }>();
    const response = await fetch('https://api.firecrawl.dev/v2/scrape', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${env.FIRECRAWL_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url,
        formats: ['markdown'],
        onlyMainContent: true
      })
    });
    const result = await response.json();
    return Response.json(result);
  }
};
```
---
## Rate Limits & Pricing
### Warning: Stealth Mode Pricing Change (May 2025)
Stealth mode now costs **5 credits per request** when actively used. The default proxy mode is "auto", which only charges stealth credits when the basic attempt fails.
**Recommended pattern**:
```python
# Use auto mode (default) - only charges 5 credits if stealth is needed
doc = app.scrape(url, formats=["markdown"])

# Or conditionally enable stealth for specific errors
if error_status_code in [401, 403, 500]:
    doc = app.scrape(url, formats=["markdown"], proxy="stealth")
```
### Unified Billing (November 2025)
Credits and tokens were merged into a single system. The Extract endpoint now uses credits (15 tokens = 1 credit).
### Pricing Tiers
| Tier | Credits/Month | Notes |
|------|---------------|-------|
| Free | 500 | Good for testing |
| Hobby | 3,000 | $19/month |
| Standard | 100,000 | $99/month |
| Growth | 500,000 | $399/month |
**Credit Costs**:
- Scrape: 1 credit (basic), 5 credits (stealth)
- Crawl: 1 credit per page
- Search: 2 credits per 10 results
- Extract: 5 credits per page (changed from tokens in v2.6.0)
- Agent: Dynamic (complexity-based)
- Change Tracking JSON mode: +5 credits
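As a rough sanity check before large jobs, you can estimate credits from the costs above (illustrative only; actual billing depends on your plan and which features a request ends up using):
```python
# Rough credit estimator based on the costs listed above.
COSTS = {
    "scrape_basic": 1,    # per page
    "scrape_stealth": 5,  # per page when stealth is actively used
    "crawl_page": 1,      # per crawled page
    "search_per_10": 2,   # per 10 search results
    "extract_page": 5,    # per extracted page
}

def estimate_crawl_credits(pages: int, stealth_ratio: float = 0.0) -> int:
    """Estimate credits for a crawl where some fraction of pages needs stealth."""
    stealth_pages = int(pages * stealth_ratio)
    basic_pages = pages - stealth_pages
    return basic_pages * COSTS["scrape_basic"] + stealth_pages * COSTS["scrape_stealth"]

print(estimate_crawl_credits(1000, stealth_ratio=0.1))  # 900 + 500 = 1400 credits
```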
---
## Common Issues & Solutions
| Issue | Cause | Solution |
|-------|-------|----------|
| Empty content | JS not loaded | Add `wait_for: 5000` or use `actions` |
| Rate limit exceeded | Over quota | Check dashboard, upgrade plan |
| Timeout error | Slow page | Increase `timeout`, use `proxy: "stealth"` |
| Bot detection | Anti-scraping | Use `proxy: "stealth"`, add `location` |
| Invalid API key | Wrong format | Must start with `fc-` |
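A minimal helper that combines the fixes above into one call path (the function name and thresholds are illustrative; the parameters match the scrape options shown earlier in this skill):
```python
# Illustrative: wait for JS, raise the timeout, and retry with stealth proxies
# when the first attempt looks blocked or comes back empty.
def robust_scrape(app, url: str):
    doc = app.scrape(
        url=url,
        formats=["markdown"],
        wait_for=5000,   # give client-side JS time to render
        timeout=60000    # allow slow pages
    )
    content = (doc.markdown or "").lower()
    if not content or "access denied" in content or "cloudflare" in content:
        # Retry with stealth proxies (5 credits when actively used)
        doc = app.scrape(url=url, formats=["markdown"], proxy="stealth", wait_for=5000)
    return doc
```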
---
## Known Issues Prevention
This skill prevents **10** documented issues:
### Issue #1: Stealth Mode Pricing Change (May 2025)
**Error**: Unexpected credit costs when using stealth mode
**Source**: [Stealth Mode Docs](https://docs.firecrawl.dev/features/stealth-mode) | [Changelog](https://www.firecrawl.dev/changelog)
**Why It Happens**: Starting May 8th, 2025, Stealth Mode proxy requests cost **5 credits per request** (previously included in standard pricing). This is a significant billing change.
**Prevention**: Use auto mode (default) which only charges stealth credits if basic fails
```python
# RECOMMENDED: Use auto mode (default)
doc = app.scrape(url, formats=['markdown'])
# Auto retries with stealth (5 credits) only if basic fails

# Or conditionally enable based on error status
try:
    doc = app.scrape(url, formats=['markdown'], proxy='basic')
except Exception as e:
    if getattr(e, "status_code", None) in [401, 403, 500]:
        doc = app.scrape(url, formats=['markdown'], proxy='stealth')
```
**Stealth Mode Options**:
- `auto` (default): Charges 5 credits only if stealth succeeds after basic fails
- `basic`: Standard proxies, 1 credit cost
- `stealth`: 5 credits per request when actively used
---
### Issue #2: v2.0.0 Breaking Changes - Method Renames
**Error**: `AttributeError: 'FirecrawlApp' object has no attribute 'scrape_url'`
**Source**: [v2.0.0 Release](https://github.com/firecrawl/firecrawl/releases/tag/v2.0.0) | [Migration Guide](https://docs.firecrawl.dev/migrate-to-v2)
**Why It Happens**: v2.0.0 (August 2025) renamed SDK methods across all languages
**Prevention**: Use new method names
**JavaScript/TypeScript**:
- `scrapeUrl()` → `scrape()`
- `crawlUrl()` → `crawl()` or `startCrawl()`
- `asyncCrawlUrl()` → `startCrawl()`
- `checkCrawlStatus()` → `getCrawlStatus()`
**Python**:
- `scrape_url()` → `scrape()`
- `crawl_url()` → `crawl()` or `start_crawl()`
```python
# OLD (v1)
doc = app.scrape_url("https://example.com")
# NEW (v2)
doc = app.scrape("https://example.com")
```
---
### Issue #3: v2.0.0 Breaking Changes - Format Changes
**Error**: `'extract' is not a valid format`
**Source**: [v2.0.0 Release](https://github.com/firecrawl/firecrawl/releases/tag/v2.0.0)
**Why It Happens**: Old `"extract"` format renamed to `"json"` in v2.0.0
**Prevention**: Use new object format for JSON extraction
```python
# OLD (v1)
doc = app.scrape_url(
    url="https://example.com",
    params={
        "formats": ["extract"],
        "extract": {"prompt": "Extract title"}
    }
)

# NEW (v2)
doc = app.scrape(
    url="https://example.com",
    formats=[{"type": "json", "prompt": "Extract title"}]
)

# With schema
doc = app.scrape(
    url="https://example.com",
    formats=[{
        "type": "json",
        "prompt": "Extract product info",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "price": {"type": "number"}
            }
        }
    }]
)
```
**Screenshot format also changed**:
```python
# NEW: Screenshot as object
formats=[{
    "type": "screenshot",
    "fullPage": True,
    "quality": 80,
    "viewport": {"width": 1920, "height": 1080}
}]
```
---
### Issue #4: v2.0.0 Breaking Changes - Crawl Options
**Error**: `'allowBackwardCrawling' is not a valid parameter`
**Source**: [v2.0.0 Release](https://github.com/firecrawl/firecrawl/releases/tag/v2.0.0)
**Why It Happens**: Several crawl parameters renamed or removed in v2.0.0
**Prevention**: Use new parameter names
**Parameter Changes**:
- `allowBackwardCrawling` → Use `crawlEntireDomain` instead
- `maxDepth` → Use `maxDiscoveryDepth` instead
- `ignoreSitemap` (bool) → `sitemap` ("only", "skip", "include")
```python
# OLD (v1)
app.crawl_url(
    url="https://docs.example.com",
    params={
        "allowBackwardCrawling": True,
        "maxDepth": 3,
        "ignoreSitemap": False
    }
)

# NEW (v2)
app.crawl(
    url="https://docs.example.com",
    crawl_entire_domain=True,
    max_discovery_depth=3,
    sitemap="include"  # "only", "skip", or "include"
)
```
---
### Issue #5: v2.0.0 Default Behavior Changes
**Error**: Stale cached content returned unexpectedly
**Source**: [v2.0.0 Release](https://github.com/firecrawl/firecrawl/releases/tag/v2.0.0)
**Why It Happens**: v2.0.0 changed several defaults
**Prevention**: Be aware of new defaults
**Default Changes**:
- `maxAge` now defaults to **2 days** (cached by default)
- `blockAds`, `skipTlsVerification`, `removeBase64Images` enabled by default
```python
# Force fresh data if needed
doc = app.scrape(url, formats=['markdown'], max_age=0)
# Disable cache entirely
doc = app.scrape(url, formats=['markdown'], store_in_cache=False)
```
---
### Issue #6: Job Status Race Condition
**Error**: `"Job not found"` when checking crawl status immediately after creation
**Source**: [GitHub Issue #2662](https://github.com/firecrawl/firecrawl/issues/2662)
**Why It Happens**: Database replication delay between job creation and status endpoint availability
**Prevention**: Wait 1-3 seconds before first status check, or implement retry logic
```python
import time

# Start crawl
job = app.start_crawl(url="https://docs.example.com")
print(f"Job ID: {job.id}")

# REQUIRED: Wait before first status check
time.sleep(2)  # 1-3 seconds recommended

# Now status check succeeds
status = app.get_crawl_status(job.id)

# Or implement retry logic
def get_status_with_retry(job_id, max_retries=3, delay=1):
    for attempt in range(max_retries):
        try:
            return app.get_crawl_status(job_id)
        except Exception as e:
            if "Job not found" in str(e) and attempt < max_retries - 1:
                time.sleep(delay)
                continue
            raise

status = get_status_with_retry(job.id)
```
---
### Issue #7: DNS Errors Return HTTP 200
**Error**: DNS resolution failures return `success: false` with HTTP 200 status instead of 4xx
**Source**: [GitHub Issue #2402](https://github.com/firecrawl/firecrawl/issues/2402) | Fixed in v2.7.0
**Why It Happens**: As of v2.7.0, DNS resolution failures intentionally return HTTP 200 with `success: false` and an error `code`, for consistent error handling across endpoints
**Prevention**: Check `success` field and `code` field, don't rely on HTTP status alone
```typescript
const result = await app.scrape('https://nonexistent-domain-xyz.com');

// DON'T rely on the HTTP status code
// Response: HTTP 200 with { success: false, code: "SCRAPE_DNS_RESOLUTION_ERROR" }

// DO check the success field
if (!result.success) {
  if (result.code === 'SCRAPE_DNS_RESOLUTION_ERROR') {
    console.error('DNS resolution failed');
  }
  throw new Error(result.error);
}
```
**Note**: DNS resolution errors still charge 1 credit despite failure.
---
### Issue #8: Bot Detection Still Charges Credits
**Error**: Cloudflare error page returned as "successful" scrape, credits charged
**Source**: [GitHub Issue #2413](https://github.com/firecrawl/firecrawl/issues/2413)
**Why It Happens**: Fire-1 engine charges credits even when bot detection prevents access
**Prevention**: Validate content isn't an error page before processing; use stealth mode for protected sites
```python
# First attempt without stealth
url = "https://protected-site.com"
doc = app.scrape(url=url, formats=["markdown"])

# Validate content isn't an error page
if "cloudflare" in doc.markdown.lower() or "access denied" in doc.markdown.lower():
    # Retry with stealth proxies (costs 5 credits if successful)
    doc = app.scrape(url=url, formats=["markdown"], proxy="stealth")
```
**Cost Impact**: A basic scrape charges 1 credit even on failure; a stealth retry charges an additional 5 credits.
---
### Issue #9: Self-Hosted Anti-Bot Fingerprinting Weakness
**Error**: `"All scraping engines failed!"` (SCRAPE_ALL_ENGINES_FAILED) on sites with anti-bot measures
**Source**: [GitHub Issue #2257](https://github.com/firecrawl/firecrawl/issues/2257)
**Why It Happens**: Self-hosted Firecrawl lacks advanced anti-fingerprinting techniques present in cloud service
**Prevention**: Use Firecrawl cloud service for sites with strong anti-bot measures, or configure proxy
```bash
# Self-hosted fails on Cloudflare-protected sites
curl -X POST 'http://localhost:3002/v2/scrape' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://www.example.com/",
    "pageOptions": { "engine": "playwright" }
  }'
# Error: "All scraping engines failed!"

# Workaround: Use cloud service instead
# Cloud service has better anti-fingerprinting
```
**Note**: This affects self-hosted v2.3.0+ with default docker-compose setup. Warning present: "⚠️ WARNING: No proxy server provided. Your IP address may be blocked."
---
### Issue #10: Cache Performance Best Practices (Community-sourced)
**Suboptimal**: Not leveraging cache can make requests 500% slower
**Source**: [Fast Scraping Docs](https://docs.firecrawl.dev/features/fast-scraping) | [Blog Post](https://www.firecrawl.dev/blog/mastering-firecrawl-scrape-endpoint)
**Why It Matters**: Default `maxAge` is 2 days in v2+, but many use cases need different strategies
**Prevention**: Use appropriate cache strategy for your content type
```python
# Fresh data (real-time pricing, stock prices)
doc = app.scrape(url, formats=["markdown"], max_age=0)
# 10-minute cache (news, blogs)
doc = app.scrape(url, formats=["markdown"], max_age=600000) # milliseconds
# Use default cache (2 days) for static content
doc = app.scrape(url, formats=["markdown"]) # maxAge defaults to 172800000
# Don't store in cache (one-time scrape)
doc = app.scrape(url, formats=["markdown"], store_in_cache=False)
# Require minimum age before re-scraping (v2.7.0+)
doc = app.scrape(url, formats=["markdown"], min_age=3600000) # 1 hour minimum
```
**Performance Impact**:
- Cached response: Milliseconds
- Fresh scrape: Seconds
- Speed difference: **Up to 500%**
---
## Package Versions
| Package | Version | Last Checked |
|---------|---------|--------------|
| firecrawl-py | 4.13.0+ | 2026-01-20 |
| @mendable/firecrawl-js | 4.11.1+ | 2026-01-20 |
| API Version | v2 | Current |
---
## Official Documentation
- **Docs**: https://docs.firecrawl.dev
- **Python SDK**: https://docs.firecrawl.dev/sdks/python
- **Node.js SDK**: https://docs.firecrawl.dev/sdks/node
- **API Reference**: https://docs.firecrawl.dev/api-reference
- **GitHub**: https://github.com/mendableai/firecrawl
- **Dashboard**: https://www.firecrawl.dev/app
---
**Token Savings**: ~65% vs manual integration
**Error Prevention**: 10 documented issues (v2 migration, stealth pricing, job status race, DNS errors, bot detection billing, self-hosted limitations, cache optimization)
**Production Ready**: Yes
**Last verified**: 2026-01-21 | **Skill version**: 2.0.0 | **Changes**: Added Known Issues Prevention section with 10 documented errors from TIER 1-2 research findings; added v2 migration guidance; documented stealth mode pricing change and unified billing model
---
## FAQ
**Does it handle pages with heavy JavaScript?**
Yes. Use `wait_for`, browser actions, or a longer `timeout` to let client-side rendering complete before capture.
**How do I avoid bot detection charges?**
The default proxy mode is `auto`: stealth credits are consumed only when the basic attempt fails. Enable stealth explicitly only for known anti-bot pages to limit cost.
**Can I get structured output?**
Yes. Use `/extract` with a JSON schema, or prompt-only extraction, to produce typed JSON for downstream pipelines.
**How do I monitor site changes?**
Include `changeTracking` in `formats` to receive a status (`new`, `same`, `changed`, `removed`) and diffs; JSON mode supports structured comparisons.