This skill accelerates data extraction from protected sites using fast non-JS scraping and JS rendering with geo-targeted proxies.
Add this skill to your agents:
```bash
npx playbooks add skill vm0-ai/vm0-skills --skill scrapeninja
```
---
name: scrapeninja
description: High-performance web scraping API with Chrome TLS fingerprint and JS rendering
vm0_secrets:
- SCRAPENINJA_API_KEY
---
# ScrapeNinja
High-performance web scraping API with Chrome TLS fingerprint, rotating proxies, smart retries, and optional JavaScript rendering.
> Official docs: https://scrapeninja.net/docs/
---
## When to Use
Use this skill when you need to:
- Scrape websites with anti-bot protection (Cloudflare, Datadome)
- Extract data without running a full browser (fast `/scrape` endpoint)
- Render JavaScript-heavy pages (`/scrape-js` endpoint)
- Use rotating proxies with geo selection (US, EU, Brazil, etc.)
- Extract structured data with Cheerio extractors
- Intercept AJAX requests
- Take screenshots of pages
---
## Prerequisites
1. Get an API key from RapidAPI or APIRoad:
- RapidAPI: https://rapidapi.com/restyler/api/scrapeninja
- APIRoad: https://apiroad.net/marketplace/apis/scrapeninja
2. Set the environment variable:
```bash
# For RapidAPI
export SCRAPENINJA_API_KEY="your-rapidapi-key"
# For APIRoad (use X-Apiroad-Key header instead)
export SCRAPENINJA_API_KEY="your-apiroad-key"
```
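All examples below call the RapidAPI host with the `X-RapidAPI-Key` header. If your key comes from APIRoad, swap the host and auth header; a minimal sketch, assuming the `scrapeninja.apiroad.net` host from ScrapeNinja's docs (verify the exact host in your APIRoad dashboard):
```bash
# APIRoad variant: same request body, different host and auth header.
curl -s -X POST "https://scrapeninja.apiroad.net/scrape" \
  --header "Content-Type: application/json" \
  --header "X-Apiroad-Key: ${SCRAPENINJA_API_KEY}" \
  -d @/tmp/scrapeninja_request.json
```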
---
> **Important:** When using `$VAR` in a command that pipes to another command, wrap the command containing `$VAR` in `bash -c '...'`. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
> ```bash
> bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"'
> ```
## How to Use
### 1. Basic Scrape (Non-JS, Fast)
High-performance scraping with Chrome TLS fingerprint, no JavaScript:
Write to `/tmp/scrapeninja_request.json`:
```json
{
"url": "https://example.com"
}
```
Then run:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'
```
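To keep the raw HTML for offline parsing, write the `body` field to a file (the output path is just an example):
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.body' > /tmp/scrapeninja_body.html
```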
**With custom headers and retries:**
Write to `/tmp/scrapeninja_request.json`:
```json
{
"url": "https://example.com",
"headers": ["Accept-Language: en-US"],
"retryNum": 3,
"timeout": 15
}
```
Then run:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```
### 2. Scrape with JavaScript Rendering
For JavaScript-heavy sites (React, Vue, etc.):
Write to `/tmp/scrapeninja_request.json`:
```json
{
"url": "https://example.com",
"waitForSelector": "h1",
"timeout": 20
}
```
Then run:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, bodyLength: (.body | length)}'
```
**With screenshot:**
Write to `/tmp/scrapeninja_request.json`:
```json
{
"url": "https://example.com",
"screenshot": true
}
```
Then run:
```bash
# Extract the screenshot field from the response
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.info.screenshot'
```
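If the `screenshot` field comes back as inline base64 (as shown in the Response Format below), you can decode it straight to a PNG; a sketch assuming GNU `base64`:
```bash
# Decode the base64 screenshot to a file (older macOS uses `base64 -D`).
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.info.screenshot' | base64 -d > /tmp/screenshot.png
```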
### 3. Geo-Based Proxy Selection
Use proxies from specific regions:
Write to `/tmp/scrapeninja_request.json`:
```json
{
"url": "https://example.com",
"geo": "eu"
}
```
Then run:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq .info
```
Available geos: `us`, `eu`, `br` (Brazil), `fr` (France), `de` (Germany), `4g-eu`
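To sanity-check which region a request exits from, scrape any IP-echo service and read the body; `ipinfo.io` here is just one example of such a service:
```bash
cat > /tmp/scrapeninja_request.json <<'EOF'
{
  "url": "https://ipinfo.io/json",
  "geo": "eu"
}
EOF
# The scraped body is the echo service's JSON, including the proxy's country.
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.body'
```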
### 4. Smart Retries
Retry on specific HTTP status codes or text patterns:
Write to `/tmp/scrapeninja_request.json`:
```json
{
"url": "https://example.com",
"retryNum": 3,
"statusNotExpected": [403, 429, 503],
"textNotExpected": ["captcha", "Access Denied"]
}
```
Then run:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```
### 5. Extract Data with Cheerio
Extract structured JSON using Cheerio extractor functions:
Write to `/tmp/scrapeninja_request.json`:
```json
{
"url": "https://news.ycombinator.com",
"extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $(\".titleline > a\").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr(\"href\")})).get(); }"
}
```
Then run:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.extractor'
```
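Escaping a multi-line extractor inside hand-written JSON is error-prone; one way around it (a local convenience, not part of the API) is to let `jq` do the quoting:
```bash
# Build the request file with jq so the extractor survives JSON escaping.
jq -n --arg url "https://news.ycombinator.com" \
      --arg fn 'function(input, cheerio) {
        let $ = cheerio.load(input);
        return $(".titleline > a").slice(0, 5)
          .map((i, el) => ({ title: $(el).text(), url: $(el).attr("href") }))
          .get();
      }' \
      '{url: $url, extractor: $fn}' > /tmp/scrapeninja_request.json
```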
### 6. Intercept AJAX Requests
Capture XHR/fetch responses:
Write to `/tmp/scrapeninja_request.json`:
```json
{
"url": "https://example.com",
"catchAjaxHeadersUrlMask": "api/data"
}
```
Then run:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.info.catchedAjax'
```
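The intercepted response body arrives as a string; if it is itself JSON, a second `jq` pass pretty-prints it:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.info.catchedAjax.body' | jq .
```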
### 7. Block Resources for Speed
Speed up JS rendering by blocking images and media:
Write to `/tmp/scrapeninja_request.json`:
```json
{
"url": "https://example.com",
"blockImages": true,
"blockMedia": true
}
```
Then run:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json'
```
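To gauge the speed-up, time the same render with and without the blocking flags (numbers vary widely by site):
```bash
# Run once with and once without blockImages/blockMedia in the request file,
# editing /tmp/scrapeninja_request.json between runs, and compare wall time.
time bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' > /dev/null
```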
---
## API Endpoints
| Endpoint | Description |
|----------|-------------|
| `/scrape` | Fast non-JS scraping with Chrome TLS fingerprint |
| `/scrape-js` | Full Chrome browser with JS rendering |
| `/v2/scrape-js` | Enhanced JS rendering for protected sites (APIRoad only) |
---
## Request Parameters
### Common Parameters (all endpoints)
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `url` | string | required | URL to scrape |
| `headers` | string[] | - | Custom HTTP headers |
| `retryNum` | int | 1 | Number of retry attempts |
| `geo` | string | `us` | Proxy geo: us, eu, br, fr, de, 4g-eu |
| `proxy` | string | - | Custom proxy URL (overrides geo) |
| `timeout` | int | 10 (`/scrape`), 16 (`/scrape-js`) | Timeout per attempt in seconds |
| `textNotExpected` | string[] | - | Text patterns that trigger retry |
| `statusNotExpected` | int[] | [403, 502] | HTTP status codes that trigger retry |
| `extractor` | string | - | Cheerio extractor function |
### JS Rendering Parameters (`/scrape-js`, `/v2/scrape-js`)
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `waitForSelector` | string | - | CSS selector to wait for |
| `postWaitTime` | int | - | Extra wait time after load (1-12s) |
| `screenshot` | bool | true | Take page screenshot |
| `blockImages` | bool | false | Block image loading |
| `blockMedia` | bool | false | Block CSS and font loading |
| `catchAjaxHeadersUrlMask` | string | - | URL pattern to intercept AJAX |
| `viewport` | object | 1920x1080 | Custom viewport size |
---
## Response Format
```json
{
"info": {
"statusCode": 200,
"finalUrl": "https://example.com",
"headers": ["content-type: text/html"],
"screenshot": "base64-encoded-png",
"catchedAjax": {
"url": "https://example.com/api/data",
"method": "GET",
"body": "...",
"status": 200
}
},
"body": "<html>...</html>",
"extractor": { "extracted": "data" }
}
```
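A common post-processing pattern is to save the response once and pull fields from the file; the paths here are illustrative:
```bash
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' > /tmp/scrapeninja_response.json
jq '.info.statusCode' /tmp/scrapeninja_response.json            # HTTP status of the target
jq -r '.body' /tmp/scrapeninja_response.json > /tmp/page.html   # raw HTML
jq '.extractor' /tmp/scrapeninja_response.json                  # extractor output, if any
```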
---
## Guidelines
1. **Start with `/scrape`**: Use the fast non-JS endpoint first; switch to `/scrape-js` only if needed
2. **Retries**: Set `retryNum` to 2-3 for unreliable sites
3. **Geo Selection**: Use `eu` for European sites, `us` for American sites
4. **Extractors**: Test extractors at https://scrapeninja.net/cheerio-sandbox/
5. **Blocked Sites**: For Cloudflare/Datadome protected sites, use `/v2/scrape-js` via APIRoad
6. **Screenshots**: Set `screenshot: false` to speed up JS rendering
7. **Rate Limits**: Check your plan limits on RapidAPI/APIRoad dashboard
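Since every example repeats the same `curl` boilerplate, a small shell helper keeps calls short. This is our own convenience wrapper, not part of the API:
```bash
# Hypothetical helper: POST a request file to an endpoint and print raw JSON.
# Inside Claude Code, inline the curl in bash -c as in the examples above,
# since a shell function cannot cross the bash -c boundary.
scrapeninja_call() {
  local endpoint="$1" payload="${2:-/tmp/scrapeninja_request.json}"
  curl -s -X POST "https://scrapeninja.p.rapidapi.com/${endpoint}" \
    --header "Content-Type: application/json" \
    --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" \
    -d @"$payload"
}

scrapeninja_call scrape | jq '.info.statusCode'
```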
---
## Tools
- **Playground**: https://scrapeninja.net/scraper-sandbox
- **Cheerio Sandbox**: https://scrapeninja.net/cheerio-sandbox
- **cURL Converter**: https://scrapeninja.net/curl-to-scraper