
tinyfish-web-agent skill


This skill enables automated web data extraction and task automation using TinyFish, which handles bot-protected sites and drives browser actions via natural language.

npx playbooks add skill tinyfish-io/skills --skill tinyfish-web-agent

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
4.7 KB
---
name: tinyfish
description: Use the TinyFish web agent to scrape websites, extract structured data, and automate browser actions using natural language. Use when you need to extract or scrape data from websites, handle bot-protected sites, or automate web tasks.
homepage: https://agent.tinyfish.ai
requires:
  env:
    - TINYFISH_API_KEY
---

# TinyFish Web Agent

Requires: `TINYFISH_API_KEY` environment variable

## Pre-flight Check (REQUIRED)

Before making any API call, **always** run this first to verify the key is available:

```bash
[ -n "$TINYFISH_API_KEY" ] && echo "TINYFISH_API_KEY is set" || echo "TINYFISH_API_KEY is NOT set"
```

If the key is **not set**, you **MUST stop and ask the user** to add their API key. Do **NOT** fall back to other tools or approaches — the task requires TinyFish.

Tell the user:

> You need a TinyFish API key. Get one at: <https://agent.tinyfish.ai/api-keys>
>
> Then set it so the agent can use it:
>
> **Option 1 — Environment variable (works everywhere):**
> ```bash
> export TINYFISH_API_KEY="your-key-here"
> ```
>
> **Option 2 — Claude Code settings (Claude Code only):**
> Add to `~/.claude/settings.local.json`:
> ```json
> {
>   "env": {
>     "TINYFISH_API_KEY": "your-key-here"
>   }
> }
> ```

Do NOT proceed until the key is confirmed available.
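For scripted use, the same check can be written as a fail-fast guard with the shell's `${VAR:?message}` expansion. A minimal sketch of the idiom, using a throwaway `DEMO_KEY` variable so it can run without a real key:

```bash
# ${VAR:?msg} aborts with a non-zero status and prints msg when VAR is unset
# or empty. In a real script you would guard TINYFISH_API_KEY itself:
#   : "${TINYFISH_API_KEY:?TINYFISH_API_KEY is not set - see https://agent.tinyfish.ai/api-keys}"
DEMO_KEY="abc123"
: "${DEMO_KEY:?DEMO_KEY is not set}"   # passes silently: DEMO_KEY is set
( unset DEMO_KEY
  : "${DEMO_KEY:?DEMO_KEY is not set}" ) 2>/dev/null \
  && echo "unexpected" \
  || echo "subshell exited non-zero as expected"
```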

## Best Practices

1. **Specify JSON format**: Always describe the exact structure you want returned
2. **Parallel calls**: When extracting from multiple independent sites, make separate parallel calls instead of combining into one prompt

## Basic Extract/Scrape

Extract data from a page. Specify the JSON structure you want:

```bash
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "goal": "Extract product info as JSON: {\"name\": str, \"price\": str, \"in_stock\": bool}"
  }'
```

## Multiple Items

Extract lists of data with explicit structure:

```bash
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "goal": "Extract all products as JSON array: [{\"name\": str, \"price\": str, \"url\": str}]"
  }'
```

## Stealth Mode

For bot-protected sites, add `"browser_profile": "stealth"` to the request body:

```bash
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://protected-site.com",
    "goal": "Extract product data as JSON: {\"name\": str, \"price\": str, \"description\": str}",
    "browser_profile": "stealth"
  }'
```

## Proxy

Route through a specific country by adding `"proxy_config"` to the body:

```bash
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://geo-restricted-site.com",
    "goal": "Extract pricing data as JSON: {\"item\": str, \"price\": str, \"currency\": str}",
    "browser_profile": "stealth",
    "proxy_config": {"enabled": true, "country_code": "US"}
  }'
```

## Output

The SSE stream returns `data: {...}` lines. The final result is the event where `type == "COMPLETE"` and `status == "COMPLETED"` — the extracted data is in the `resultJson` field. Claude reads the raw SSE output directly; no script-side parsing is needed.
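If a script does need to pull the result out of a saved stream, `grep`, `sed`, and `jq` are enough. A minimal sketch against a hand-written sample of the event shapes described above (real events may carry additional fields):

```bash
# Sample SSE output (illustrative; field names per the Output section above)
cat > /tmp/tinyfish_sse.txt <<'EOF'
data: {"type":"PROGRESS","status":"RUNNING"}
data: {"type":"COMPLETE","status":"COMPLETED","resultJson":{"name":"Widget","price":"$9.99","in_stock":true}}
EOF

# Keep only data: lines, strip the prefix, select the final COMPLETE event
grep '^data: ' /tmp/tinyfish_sse.txt \
  | sed 's/^data: //' \
  | jq -c 'select(.type == "COMPLETE" and .status == "COMPLETED") | .resultJson'
```

This prints the compact JSON object from `resultJson`.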

## Parallel Extraction

When extracting from multiple independent sources, make separate parallel curl calls instead of combining into one prompt:

**Good** - Parallel calls:
```bash
# Compare pizza prices - issue these as separate calls and run them concurrently
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://pizzahut.com",
    "goal": "Extract pizza prices as JSON: [{\"name\": str, \"price\": str}]"
  }'

curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://dominos.com",
    "goal": "Extract pizza prices as JSON: [{\"name\": str, \"price\": str}]"
  }'
```

**Bad** - Single combined call:
```bash
# Don't do this - less reliable and slower
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://pizzahut.com",
    "goal": "Extract prices from Pizza Hut and also go to Dominos..."
  }'
```

Each independent extraction task should be its own API call. This is faster (parallel execution) and more reliable.
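At the shell level, running calls in parallel means backgrounding each one and waiting for all of them. A sketch of that pattern (the `extract` helper, output paths, and connect timeout are illustrative, not part of the API):

```bash
# One background job per site, then wait for all of them to finish
extract() {  # $1 = URL, $2 = file to save the SSE stream to
  curl -N -s --connect-timeout 10 -X POST \
    "https://agent.tinyfish.ai/v1/automation/run-sse" \
    -H "X-API-Key: $TINYFISH_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"$1\", \"goal\": \"Extract pizza prices as JSON: [{\\\"name\\\": str, \\\"price\\\": str}]\"}" \
    > "$2"
}

extract "https://pizzahut.com" /tmp/pizzahut.sse &
extract "https://dominos.com"  /tmp/dominos.sse &
wait  # blocks until both extractions complete
```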

Overview

This skill integrates the TinyFish web agent to extract data, scrape websites, and automate browser actions using natural-language goals. It requires a TinyFish API key and is designed for reliable, structured data output and bot-protected site handling. Use it to return JSON-formatted results directly from web pages or to run automated browser flows.

How this skill works

Before any call, the skill verifies the TINYFISH_API_KEY environment variable is set and stops if it is missing. It sends POST requests to TinyFish's run-sse endpoint with a URL and a natural-language goal that specifies the exact JSON structure to return. For bot-protected or geo-restricted sites, the request can include stealth browser profiles and proxy configuration. The final extracted data appears in the SSE stream's COMPLETE event in the resultJson field.

When to use it

  • Extracting structured JSON from a single web page or a list of items.
  • Scraping sites that use bot detection or require realistic browser behavior.
  • Automating sequences of browser actions and collecting the final outputs.
  • Extracting data from geo-restricted pages by routing through a country-specific proxy.
  • Running parallel, independent extractions for speed and reliability.

Best practices

  • Always run the pre-flight check to confirm TINYFISH_API_KEY is set; do not proceed without it.
  • Specify the exact JSON schema you want returned (fields and types) to get predictable output.
  • Make independent extraction tasks in parallel API calls rather than combining multiple sites in one call.
  • Use browser_profile: "stealth" for bot-protected sites and proxy_config for geographic targeting.
  • Read the SSE stream and use the event with type == "COMPLETE" and status == "COMPLETED" to retrieve resultJson.

Example use cases

  • Extract product lists and prices from an e-commerce category page as a JSON array.
  • Scrape job postings across multiple company pages in parallel and normalize fields.
  • Access a region-locked pricing page by routing through a US proxy and return structured pricing data.
  • Automate a login flow in stealth mode, navigate to a user dashboard, and extract account metrics.
  • Compare menu item prices between competitors by running separate, parallel extraction calls.

FAQ

What happens if the TinyFish API key is not set?

The skill halts and asks you to set TINYFISH_API_KEY. You must provide the key before any API calls are made; do not fall back to other tools.

How do I handle bot-protected sites?

Include "browser_profile": "stealth" in the request body to use a stealth browser profile. Add proxy_config when you also need to route through a specific country.