
tinyfish-web-agent skill

/skills/simantak-dabhade/tinyfish-web-agent

This skill extracts data from websites using TinyFish, handling bot protection and automating browser tasks for reliable web scraping.

This is most likely a fork of the tinyfish-web-agent skill from tinyfish-io.

npx playbooks add skill openclaw/skills --skill tinyfish-web-agent


SKILL.md
---
name: tinyfish
description: Use the TinyFish web agent to scrape websites, extract structured data, and automate browser actions using natural language. Use when you need to extract data from websites, handle bot-protected sites, or automate web tasks.
homepage: https://agent.tinyfish.ai
requires:
  env:
    - TINYFISH_API_KEY
---

# TinyFish Web Agent

Requires: `TINYFISH_API_KEY` environment variable

## Pre-flight Check (REQUIRED)

Before making any API call, **always** run this first to verify the key is available:

```bash
[ -n "$TINYFISH_API_KEY" ] && echo "TINYFISH_API_KEY is set" || echo "TINYFISH_API_KEY is NOT set"
```

If the key is **not set**, you **MUST stop and ask the user** to add their API key. Do **NOT** fall back to other tools or approaches — the task requires TinyFish.

Tell the user:

> You need a TinyFish API key. Get one at: <https://agent.tinyfish.ai/api-keys>
>
> Then set it so the agent can use it:
>
> **Option 1 — Environment variable (works everywhere):**
> ```bash
> export TINYFISH_API_KEY="your-key-here"
> ```
>
> **Option 2 — Claude Code settings (Claude Code only):**
> Add to `~/.claude/settings.local.json`:
> ```json
> {
>   "env": {
>     "TINYFISH_API_KEY": "your-key-here"
>   }
> }
> ```

Do NOT proceed until the key is confirmed available.

## Best Practices

1. **Specify JSON format**: Always describe the exact structure you want returned
2. **Parallel calls**: When extracting from multiple independent sites, make separate parallel calls instead of combining into one prompt

## Basic Extract/Scrape

Extract data from a page. Specify the JSON structure you want:

```bash
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "goal": "Extract product info as JSON: {\"name\": str, \"price\": str, \"in_stock\": bool}"
  }'
```

## Multiple Items

Extract lists of data with explicit structure:

```bash
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "goal": "Extract all products as JSON array: [{\"name\": str, \"price\": str, \"url\": str}]"
  }'
```

## Stealth Mode

For bot-protected sites, add `"browser_profile": "stealth"` to the request body:

```bash
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://protected-site.com",
    "goal": "Extract product data as JSON: {\"name\": str, \"price\": str, \"description\": str}",
    "browser_profile": "stealth"
  }'
```

## Proxy

Route through a specific country by adding `"proxy_config"` to the body:

```bash
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://geo-restricted-site.com",
    "goal": "Extract pricing data as JSON: {\"item\": str, \"price\": str, \"currency\": str}",
    "browser_profile": "stealth",
    "proxy_config": {"enabled": true, "country_code": "US"}
  }'
```

## Output

The SSE stream returns `data: {...}` lines. The final result is the event where `type == "COMPLETE"` and `status == "COMPLETED"` — the extracted data is in the `resultJson` field. Claude reads the raw SSE output directly; no script-side parsing is needed.
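If you do want to post-process the stream in a script, a minimal sketch using `sed` and `jq` (the sample stream below is hypothetical; real events come from the `run-sse` endpoint):

```shell
# Hypothetical sample stream; a real one is what run-sse emits.
sse='data: {"type":"PROGRESS","status":"RUNNING"}
data: {"type":"COMPLETE","status":"COMPLETED","resultJson":"{\"name\":\"Widget\",\"price\":\"$9.99\"}"}'

# Strip the "data: " SSE framing, then select the COMPLETE/COMPLETED
# event and print its resultJson payload as a raw string.
result=$(printf '%s\n' "$sse" \
  | sed -n 's/^data: //p' \
  | jq -r 'select(.type == "COMPLETE" and .status == "COMPLETED") | .resultJson')

echo "$result"
```

Since `resultJson` is itself a JSON string, you can pipe `$result` through `jq` again to query individual fields.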

## Parallel Extraction

When extracting from multiple independent sources, make separate parallel curl calls instead of combining into one prompt:

**Good** - Parallel calls:
```bash
# Compare pizza prices - run these simultaneously
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://pizzahut.com",
    "goal": "Extract pizza prices as JSON: [{\"name\": str, \"price\": str}]"
  }'

curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://dominos.com",
    "goal": "Extract pizza prices as JSON: [{\"name\": str, \"price\": str}]"
  }'
```

**Bad** - Single combined call:
```bash
# Don't do this - less reliable and slower
curl -N -s -X POST "https://agent.tinyfish.ai/v1/automation/run-sse" \
  -H "X-API-Key: $TINYFISH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://pizzahut.com",
    "goal": "Extract prices from Pizza Hut and also go to Dominos..."
  }'
```

Each independent extraction task should be its own API call. This is faster (parallel execution) and more reliable.
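In a shell script, true parallelism comes from backgrounding each call and waiting for all of them. A sketch of the pattern, with a stub `fetch` standing in for the curl call above (the URLs and output paths are illustrative):

```shell
# Stub standing in for the run-sse curl call shown earlier; in real use,
# replace the body with that curl command, passing "$1" as the url field.
fetch() {
  echo "extracted from $1"
}

# Launch both extractions in the background, then block until both finish.
fetch "https://pizzahut.com" > /tmp/pizzahut.json &
fetch "https://dominos.com"  > /tmp/dominos.json &
wait

cat /tmp/pizzahut.json /tmp/dominos.json
```

Each background job writes to its own file so the interleaved output of concurrent runs never mixes.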

Overview

This skill uses the TinyFish (Mino) web agent to extract content, scrape structured data, and automate browser actions via natural language. It connects to the Mino API and runs headless browser sessions that can handle complex pages, geo-restricted content, and bot protections. Use it to return clean JSON results suitable for pipelines, databases, or downstream processing.

How this skill works

You send the target URL and a clear natural-language goal that specifies the JSON structure you want. The agent drives a real browser session, performs navigation and extraction, and streams events until completion. Results appear in event["resultJson"] when the run completes; options include browser profiles (like stealth) and proxy_config to control routing.

When to use it

  • Scraping product, pricing, or listing pages into a strict JSON schema.
  • Extracting multiple items from a paginated or dynamic site.
  • Accessing sites protected by anti-bot measures using stealth profiles.
  • Collecting data from geo-restricted pages by routing through a country-specific proxy.
  • Automating multi-step browser tasks where structured output is required.

Best practices

  • Always define the exact JSON schema you need in the goal to ensure consistent output.
  • Run independent extractions as parallel API calls instead of a single combined prompt.
  • Use the stealth browser_profile for bot-protected sites and add proxy_config for geo-sensitive content.
  • Stream and monitor events; read event["type"] and event["resultJson"] on COMPLETE for final results.
  • Keep prompts focused and single-purpose to improve speed and reliability.

Example use cases

  • Extract all products from an e-commerce category page into [{"name": str, "price": str, "url": str}].
  • Compare hourly pricing across competitor sites by running parallel extract calls.
  • Collect localized pricing by setting proxy_config to the target country code.
  • Automate login and download steps, then extract structured data from the post-login page.
  • Scrape listings from a site with bot defenses using browser_profile set to stealth.

FAQ

How do I get the final JSON result?

Monitor the SSE stream events and read event["resultJson"] when event["type"] == "COMPLETE" and status is COMPLETED.

Can TinyFish bypass anti-bot protections?

It can mitigate many protections using the stealth browser profile, but success depends on the site's defenses and may require proxy or session tweaks.