tavily skill

This skill enables live web search, extraction, mapping, and crawling via the Tavily REST API, producing structured results for AI workflows.

npx playbooks add skill intellectronica/agent-skills --skill tavily

Review the files below or copy the command above to add this skill to your agents.

Files (1): SKILL.md (7.1 KB)
---
name: tavily
description: Use this skill for web search, extraction, mapping, crawling, and research via Tavily’s REST API when web searches are needed and no built-in tool is available, or when Tavily’s LLM-friendly format is beneficial.
---

# Tavily

## Purpose

Provide a curl-based interface to Tavily’s REST API for web search, extraction, mapping, crawling, and optional research. Return structured results suitable for LLM workflows and multi-step investigations.

## When to Use

- Use when a task needs live web information, site extraction, mapping, or crawling.
- Use when web searches are needed and no built-in tool is available, or when Tavily’s LLM-friendly output (summaries, chunks, sources, citations) is beneficial.
- Use when a task requires structured search results, extraction, or site discovery from Tavily.

## Required Environment

- Require `TAVILY_API_KEY` in the environment.
- If `TAVILY_API_KEY` is missing, prompt the user to provide the API key before proceeding.
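
A minimal pre-flight check, assuming a POSIX shell:

```bash
# Fail fast if the key is absent; the agent should then ask the user for it.
if [ -z "${TAVILY_API_KEY:-}" ]; then
  echo "TAVILY_API_KEY is not set; please provide a Tavily API key." >&2
  exit 1
fi
```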

## Base URL and Auth

- Base URL: `https://api.tavily.com`
- Authentication: `Authorization: Bearer $TAVILY_API_KEY`
- Content type: `Content-Type: application/json`
- Optional project tracking: add `X-Project-ID: <project-id>` if project attribution is needed.
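
When issuing several calls, the common headers can be factored into shell variables; a small sketch (variable names are illustrative, bash assumed):

```bash
BASE_URL="https://api.tavily.com"
AUTH=(-H "Authorization: Bearer $TAVILY_API_KEY" -H "Content-Type: application/json")

# Subsequent calls then reduce to:
#   curl -sS -X POST "$BASE_URL/search" "${AUTH[@]}" -d '<json>'
```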

## Tool Mapping (Tavily REST)

### 1) search → POST /search

Use for web search with optional answer and content extraction.

Recommended minimal request:

```bash
curl -sS -X POST "https://api.tavily.com/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "query": "<query>",
    "search_depth": "basic",
    "max_results": 5,
    "include_answer": true,
    "include_raw_content": false,
    "include_images": false
  }'
```

Key parameters (all optional unless noted):
- `query` (required): search text
- `search_depth`: `basic` | `advanced` | `fast` | `ultra-fast`
- `chunks_per_source`: 1–3 (only with `search_depth: advanced`)
- `max_results`: 0–20
- `topic`: `general` | `news` | `finance`
- `time_range`: `day|week|month|year|d|w|m|y`
- `start_date`, `end_date`: `YYYY-MM-DD`
- `include_answer`: `false` | `true` | `basic` | `advanced`
- `include_raw_content`: `false` | `true` | `markdown` | `text`
- `include_images`: boolean
- `include_image_descriptions`: boolean
- `include_favicon`: boolean
- `include_domains`, `exclude_domains`: string arrays
- `country`: country name (general topic only)
- `auto_parameters`: boolean
- `include_usage`: boolean

Expected response fields:
- `answer` (if requested), `results[]` with `title`, `url`, `content`, `score`, `raw_content` (optional), `favicon` (optional)
- `response_time`, `usage`, `request_id`
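
A sketch of a filtered news search that then pulls the answer and a source list out of the response with `jq` (the domain and time filters are illustrative; `jq` is assumed to be available):

```bash
curl -sS -X POST "https://api.tavily.com/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "query": "<query>",
    "topic": "news",
    "time_range": "week",
    "include_domains": ["reuters.com"],
    "include_answer": true,
    "max_results": 5
  }' \
| jq -r '"ANSWER: \(.answer // "n/a")", (.results[] | "- \(.title) | \(.url) (score: \(.score))")'
```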

### 2) extract → POST /extract

Use for extracting content from specific URLs.

```bash
curl -sS -X POST "https://api.tavily.com/extract" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "urls": ["https://example.com/article"],
    "query": "<optional intent for reranking>",
    "chunks_per_source": 3,
    "extract_depth": "basic",
    "format": "markdown",
    "include_images": false,
    "include_favicon": false
  }'
```

Key parameters:
- `urls` (required): array of URLs
- `query`: rerank chunks by intent
- `chunks_per_source`: 1–5 (only when `query` provided)
- `extract_depth`: `basic` | `advanced`
- `format`: `markdown` | `text`
- `timeout`: 1–60 seconds
- `include_usage`: boolean

Expected response fields:
- `results[]` with `url`, `raw_content`, `images`, `favicon`
- `failed_results[]`, `response_time`, `usage`, `request_id`
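
A sketch that extracts two pages with intent-based reranking and reports failures separately (URLs are placeholders; `failed_results` entries are handled defensively since their exact shape may vary):

```bash
curl -sS -X POST "https://api.tavily.com/extract" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "urls": ["https://example.com/a", "https://example.com/b"],
    "query": "pricing and rate limits",
    "chunks_per_source": 3,
    "format": "markdown"
  }' \
| jq -r '(.results[] | "OK: \(.url)"), (.failed_results[]? | "FAILED: \(.url? // .)")'
```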

### 3) map → POST /map

Use for generating a site map (URL discovery only).

```bash
curl -sS -X POST "https://api.tavily.com/map" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "url": "https://docs.tavily.com",
    "max_depth": 1,
    "max_breadth": 20,
    "limit": 50,
    "allow_external": true
  }'
```

Key parameters:
- `url` (required)
- `instructions`: natural language guidance (raises cost)
- `max_depth`: 1–5
- `max_breadth`: 1+
- `limit`: 1+
- `select_paths`, `select_domains`, `exclude_paths`, `exclude_domains`: arrays of regex strings
- `allow_external`: boolean
- `timeout`: 10–150 seconds
- `include_usage`: boolean

Expected response fields:
- `base_url`, `results[]` (list of URLs), `response_time`, `usage`, `request_id`
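
A sketch that restricts discovery to SDK documentation paths (the regex is illustrative) and prints the discovered URLs:

```bash
curl -sS -X POST "https://api.tavily.com/map" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "url": "https://docs.tavily.com",
    "max_depth": 2,
    "limit": 50,
    "select_paths": ["/sdk/.*"]
  }' \
| jq -r '.results[]'
```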

### 4) crawl → POST /crawl

Use for site traversal with built-in extraction.

```bash
curl -sS -X POST "https://api.tavily.com/crawl" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "url": "https://docs.tavily.com",
    "instructions": "Find all pages about the Python SDK",
    "max_depth": 1,
    "max_breadth": 20,
    "limit": 50,
    "extract_depth": "basic",
    "format": "markdown",
    "include_images": false
  }'
```

Key parameters:
- `url` (required)
- `instructions`: optional; raises cost and enables `chunks_per_source`
- `chunks_per_source`: 1–5 (only with `instructions`)
- `max_depth`, `max_breadth`, `limit`: same as map
- `extract_depth`: `basic` | `advanced`
- `format`: `markdown` | `text`
- `include_images`, `include_favicon`, `allow_external`
- `timeout`: 10–150 seconds
- `include_usage`: boolean

Expected response fields:
- `base_url`, `results[]` with `url`, `raw_content`, `favicon`
- `response_time`, `usage`, `request_id`
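
To collect crawled pages into a single markdown file, a sketch (the output layout is an arbitrary choice):

```bash
curl -sS -X POST "https://api.tavily.com/crawl" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{"url": "https://docs.tavily.com", "max_depth": 1, "limit": 20, "format": "markdown"}' \
| jq -r '.results[] | "## \(.url)\n\n\(.raw_content)\n"' > crawl-output.md
```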

## Optional Research Workflow (Deep Investigation)

Use when a query needs multi-step analysis and citations.

### create research task → POST /research

```bash
curl -sS -X POST "https://api.tavily.com/research" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{
    "input": "<research question>",
    "model": "auto",
    "stream": false,
    "citation_format": "numbered"
  }'
```

Expected response fields:
- `request_id`, `created_at`, `status` (pending), `input`, `model`, `response_time`

### get research status → GET /research/{request_id}

```bash
curl -sS -X GET "https://api.tavily.com/research/<request_id>" \
  -H "Authorization: Bearer $TAVILY_API_KEY"
```

Expected response fields:
- `status`: `completed` once the task finishes (poll until then)
- `content`: report text or structured object
- `sources[]`: `{ title, url, favicon }`
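
A create-then-poll sketch (the 5-second interval is arbitrary, and terminal failure states are not handled; `content` may be a structured object, in which case `jq -r` prints it as JSON):

```bash
# Create the task and capture its request_id.
REQUEST_ID=$(curl -sS -X POST "https://api.tavily.com/research" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{"input": "<research question>", "model": "auto", "stream": false}' \
| jq -r '.request_id')

# Poll until the task reports completed.
while true; do
  STATUS=$(curl -sS "https://api.tavily.com/research/$REQUEST_ID" \
    -H "Authorization: Bearer $TAVILY_API_KEY" | jq -r '.status')
  [ "$STATUS" = "completed" ] && break
  sleep 5
done

# Fetch the final report and sources.
curl -sS "https://api.tavily.com/research/$REQUEST_ID" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
| jq -r '.content, (.sources[]? | "- \(.title) | \(.url)")'
```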

### streaming research (SSE)

Set `"stream": true` in the POST body and use curl with `-N` to stream events:

```bash
curl -N -X POST "https://api.tavily.com/research" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{"input":"<question>","stream":true,"model":"pro"}'
```

Handle SSE events (tool calls, tool responses, content chunks, sources, done).
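
A minimal sketch that strips the SSE framing and prints each event payload as it arrives (the exact event schema is not specified here; inspect the stream for the actual event names):

```bash
curl -sN -X POST "https://api.tavily.com/research" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{"input":"<question>","stream":true}' \
| while IFS= read -r line; do
    # SSE payload lines are prefixed with "data: "; everything else is framing.
    case "$line" in
      data:*) printf '%s\n' "${line#data: }" ;;
    esac
  done
```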

## Usage Notes

- Treat `search`, `extract`, `map`, and `crawl` as the primary endpoints for discovery and content retrieval.
- Return structured results with URLs, titles, and summaries for easy downstream use.
- Default to conservative parameters (`search_depth: basic`, `max_results: 5`) unless deeper recall is needed.
- Reuse consistent request bodies across calls to keep results predictable.

## Error Handling

- If any request returns 401/403, prompt for or re-check `TAVILY_API_KEY`.
- If timeouts occur, reduce `max_depth`/`limit` or use `search_depth: basic`.
- If responses are too large, lower `max_results` or `chunks_per_source`.
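
To branch on the HTTP status explicitly, a sketch that captures the status code separately from the body:

```bash
# Write the body to a temp file; curl prints only the status code to stdout.
BODY_FILE=$(mktemp)
HTTP_CODE=$(curl -sS -o "$BODY_FILE" -w '%{http_code}' -X POST "https://api.tavily.com/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TAVILY_API_KEY" \
  -d '{"query": "<query>"}')

case "$HTTP_CODE" in
  200) jq . "$BODY_FILE" ;;
  401|403) echo "Authentication failed: re-check TAVILY_API_KEY." >&2 ;;
  *) echo "Request failed with HTTP $HTTP_CODE." >&2; cat "$BODY_FILE" >&2 ;;
esac
rm -f "$BODY_FILE"
```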

Overview

This skill provides a REST API interface to Tavily for web search, extraction, mapping, crawling, and optional multi-step research. It returns structured, LLM-friendly results (summaries, chunks, sources, citations) suitable for downstream agent workflows. Use it when live web data or site discovery is required and built-in tools are not available.

How this skill works

Authenticate with TAVILY_API_KEY and call the Tavily endpoints: search, extract, map, crawl, and research. Each endpoint accepts JSON parameters controlling depth, breadth, format, and limits, and returns structured responses with URLs, titles, content or answers, usage, and request IDs. The research endpoint supports asynchronous polling and streaming (SSE) for long-running, cited investigations.

When to use it

  • When you need live web search results with summarized, LLM-ready output.
  • When extracting or reformatting content from specific URLs for downstream analysis.
  • When you must discover site structure or crawl pages with built-in extraction.
  • When a multi-step, citation-backed research workflow is required.
  • When no built-in web tool is available and structured results are necessary.

Best practices

  • Require TAVILY_API_KEY in the environment and prompt the user if it is missing.
  • Default to conservative parameters (search_depth: basic, max_results: 5) then increase only if needed.
  • Prefer include_answer and formatted raw_content for LLM consumption when context is required.
  • Use map for URL discovery and crawl when you need extraction combined with traversal.
  • Handle 401/403 by re-checking credentials; reduce depth/limits on timeouts or oversized responses.

Example use cases

  • Perform a quick web search that returns a short answer and top 5 source summaries for an agent to cite.
  • Extract markdown-formatted content from a set of product documentation pages for ingestion into a knowledge base.
  • Generate a site map to discover product pages and then crawl selected paths to extract article text.
  • Run a research task that streams intermediate findings and final citations for a deep investigation workflow.
  • Filter search results by domain, country, or time range to assemble focused, timely evidence.

FAQ

What happens if TAVILY_API_KEY is missing or invalid?

The skill should prompt the user to provide or re-check the API key. HTTP 401/403 responses indicate authentication issues.

Which endpoint should I use to discover URLs on a site?

Use map for URL discovery only, and use crawl when you want traversal plus built-in extraction and chunking.