
search skill


This skill helps you locate relevant web content using AI-powered semantic search to find pages, papers, and code by meaning.

npx playbooks add skill benjaminjackson/exa-skills --skill search

Run the command above to add this skill to your agents.

---
name: exa-search
description: Search the web for content matching a query with AI-powered semantic search. Use for finding relevant web pages, research papers, news articles, code repositories, or any web content by meaning rather than just keywords.
---

# Exa Search

Token-efficient strategies for web search using exa-ai.

**Use `--help` to see available commands and verify usage before running:**
```bash
exa-ai <command> --help
```

## Critical Requirements

**MUST follow these rules when using exa-ai search:**

### Shared Requirements

This skill inherits requirements from [Common Requirements](../../../docs/common-requirements.md):
- Schema design patterns → All schema operations
- Output format selection → All output operations

### MUST NOT Rules

1. **Avoid the `--text` flag**: Prefer structured output with schemas over raw text extraction for better token efficiency

## Cost Optimization

### Pricing
- **1-25 results**: $0.005 per search
- **26-100 results**: $0.025 per search (5x more expensive)

**Cost strategy:**
1. **Default to 1-25 results**: 5x cheaper, sufficient for most queries
2. **Need 50+ results? Run multiple targeted searches**: Two 25-result searches with different angles ($0.010 total) beat one 50-result search ($0.025), with better quality and more control
3. **Use 26-100 results sparingly**: Only when you need comprehensive coverage that multiple targeted searches would miss
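The arithmetic behind strategy 2 can be checked with a small bash sketch. The helper names are hypothetical and the prices are copied from the table above (not fetched from Exa). Amounts are kept in mills ($0.001) so that bash integer arithmetic stays exact.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cost of one search in mills ($0.001 units), using the
# pricing tiers listed above (1-25 results: $0.005, 26-100 results: $0.025).
search_cost_mills() {
  local n=$1
  if (( n >= 1 && n <= 25 )); then
    echo 5
  elif (( n >= 26 && n <= 100 )); then
    echo 25
  else
    echo "num-results must be between 1 and 100" >&2
    return 1
  fi
}

# Hypothetical helper: total cost of a plan, one argument per search
# giving that search's --num-results value.
plan_cost_mills() {
  local total=0 n cost
  for n in "$@"; do
    cost=$(search_cost_mills "$n") || return 1
    total=$(( total + cost ))
  done
  echo "$total"
}

echo "two 25-result searches: $(plan_cost_mills 25 25) mills"
echo "one 50-result search:   $(plan_cost_mills 50) mills"
```

Two 25-result searches come to 10 mills ($0.010), compared with 25 mills ($0.025) for a single 50-result search.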

## Token Optimization

**Apply these strategies:**

- **Use toon format**: `--output-format toon` for 40% fewer tokens than JSON (use when reading output directly)
- **Use JSON + jq**: Extract only needed fields with jq (use when piping/processing output)
- **Use `--summary`**: Get AI-generated summaries instead of full page text
- **Use schemas**: Extract structured data with `--summary-schema` (always pipe to jq)
- **Limit results**: Use `--num-results N` to get only what you need

**IMPORTANT**: Choose one approach; don't mix them:
- **Approach 1: toon only** - Compact YAML-like output for direct reading
- **Approach 2: JSON + jq** - Extract specific fields programmatically
- **Approach 3: Schemas + jq** - Get structured data; always use JSON output (the default) and pipe to jq

Examples:
```bash
# ❌ High token usage
exa-ai search "AI news" --num-results 10

# ✅ Approach 1: toon format for direct reading (~60% reduction vs. the baseline above)
exa-ai search "AI news" --num-results 3 --output-format toon

# ✅ Approach 2: JSON + jq for field extraction (~90% reduction vs. the baseline above)
exa-ai search "AI news" --num-results 3 | jq -r '.results[].title'

# ❌ Don't mix toon with jq (toon is YAML-like, not JSON)
exa-ai search "AI news" --output-format toon | jq -r '.results[].title'
```

## Quick Start

### Basic Search
```bash
exa-ai search "Anthropic Claude new features" --num-results 5 --output-format toon
```

### Search with Category Filter
```bash
exa-ai search "machine learning architectures" --category "research paper" --num-results 10
```

### Extract Structured Data
```bash
exa-ai search "AI safety research papers 2024" \
  --summary \
  --summary-schema '{"type":"object","properties":{"title":{"type":"string"},"key_finding":{"type":"string"}}}' \
  --num-results 3 | jq -r '.results[].summary | fromjson | "- \(.title): \(.key_finding)"'
```
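The `fromjson` step is there because, in the command above, each `summary` comes back as a JSON-encoded string rather than a nested object. The sketch below runs the same filter offline against a hand-written sample response. The sample and the `format_findings` helper are hypothetical, and the response shape is inferred from this document's jq filters rather than from Exa's API documentation.

```bash
#!/usr/bin/env bash
# Hypothetical sample response containing only the field the filter reads.
# Each summary is a JSON *string*, so jq must decode it with fromjson.
sample='{
  "results": [
    {"summary": "{\"title\":\"Paper A\",\"key_finding\":\"Finding A\"}"},
    {"summary": "{\"title\":\"Paper B\",\"key_finding\":\"Finding B\"}"}
  ]
}'

# Same filter as the command above, reading a response from stdin.
format_findings() {
  jq -r '.results[].summary | fromjson | "- \(.title): \(.key_finding)"'
}

printf '%s\n' "$sample" | format_findings
```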

### LinkedIn Search
```bash
exa-ai search "Anthropic" --linkedin company
exa-ai search "Dario Amodei" --linkedin person
```

## Detailed Reference

For complete options, examples, and advanced usage, consult [REFERENCE.md](REFERENCE.md).

### Shared Requirements

<shared-requirements>

## Schema Design

### MUST: Use object wrapper for schemas

**Applies to**: answer, search, find-similar, get-contents

When using schema parameters (`--output-schema` or `--summary-schema`), always wrap properties in an object:

```json
{"type":"object","properties":{"field_name":{"type":"string"}}}
```

**DO NOT** use bare properties without the object wrapper; this example is missing `"type":"object"` at the root:
```json
{"properties":{"field_name":{"type":"string"}}}
```

**Why**: The Exa API requires a valid JSON Schema with an object type at the root level. Omitting this causes validation errors.

**Examples**:
```bash
# ✅ CORRECT - object wrapper included
exa-ai search "AI news" \
  --summary-schema '{"type":"object","properties":{"headline":{"type":"string"}}}'

# ❌ WRONG - missing object wrapper
exa-ai search "AI news" \
  --summary-schema '{"properties":{"headline":{"type":"string"}}}'
```
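A schema can be checked for the root object wrapper before it is sent. This is a minimal sketch using `jq -e`, which sets its exit status from the last value it outputs (non-zero for `false` or `null`, and non-zero on a parse error). The `check_schema_root` name is hypothetical, and the check covers only the root-level rule above, not full JSON Schema validation.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if the schema parses as JSON,
# declares "type":"object" at the root, and has an object-valued "properties".
check_schema_root() {
  printf '%s' "$1" |
    jq -e '.type == "object" and (.properties | type == "object")' >/dev/null 2>&1
}

good='{"type":"object","properties":{"headline":{"type":"string"}}}'
bad='{"properties":{"headline":{"type":"string"}}}'

check_schema_root "$good" && echo "ok: object wrapper present"
check_schema_root "$bad"  || echo "rejected: missing \"type\":\"object\" at the root"
```

In a real run, you would call `check_schema_root "$schema"` first and pass `--summary-schema "$schema"` to exa-ai only when it succeeds.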

---

## Output Format Selection

### MUST NOT: Mix toon format with jq

**Applies to**: answer, context, search, find-similar, get-contents

`toon` format produces YAML-like output, not JSON. DO NOT pipe toon output to jq for parsing:

```bash
# ❌ WRONG - toon is not JSON
exa-ai search "query" --output-format toon | jq -r '.results'

# ✅ CORRECT - use JSON (default) with jq
exa-ai search "query" | jq -r '.results[].title'

# ✅ CORRECT - use toon for direct reading only
exa-ai search "query" --output-format toon
```

**Why**: jq expects valid JSON input. toon format is designed for human readability and produces YAML-like output that jq cannot parse.
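If output has been saved to a file and its format is uncertain, one option is to test whether it parses as JSON before handing it to jq. This sketch relies on `jq empty`, which consumes its input without printing anything and exits non-zero on a parse error. The `is_json` helper is hypothetical, and the YAML-like sample stands in for toon output without reproducing its exact syntax.

```bash
#!/usr/bin/env bash
# Hypothetical guard: true only if the file parses as JSON.
is_json() {
  jq empty "$1" >/dev/null 2>&1
}

tmp=$(mktemp -d)
printf '{"results":[{"title":"Example"}]}\n' > "$tmp/search.json"
# Illustrative YAML-like text standing in for toon output (not its exact syntax).
printf 'results:\n  title: Example\n' > "$tmp/search.toon"

for f in "$tmp/search.json" "$tmp/search.toon"; do
  if is_json "$f"; then
    jq -r '.results[].title' "$f"
  else
    echo "skipping $f: not JSON, read it directly instead"
  fi
done
```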

### SHOULD: Choose one output approach

**Applies to**: answer, context, search, find-similar, get-contents

Pick one strategy and stick with it throughout your workflow:

1. **Approach 1: toon only** - Compact YAML-like output for direct reading
   - Use when: Reading output directly, no further processing needed
   - Token savings: ~40% reduction vs JSON
   - Example: `exa-ai search "query" --output-format toon`

2. **Approach 2: JSON + jq** - Extract specific fields programmatically
   - Use when: Need to extract specific fields or pipe to other commands
   - Token savings: ~80-90% reduction (extracts only needed fields)
   - Example: `exa-ai search "query" | jq -r '.results[].title'`

3. **Approach 3: Schemas + jq** - Structured data extraction with validation
   - Use when: Need consistent structured output across multiple queries
   - Token savings: ~85% reduction + consistent schema
   - Example: `exa-ai search "query" --summary-schema '{...}' | jq -r '.results[].summary | fromjson'`

**Why**: Mixing approaches increases complexity and token usage. Choosing one approach optimizes for your use case.

---

## Shell Command Best Practices

### MUST: Run commands directly, parse separately

**Applies to**: monitor, search (websets), research, and all skills using complex commands

When using the Bash tool with complex shell syntax, run commands directly and parse output in separate steps:

```bash
# ❌ WRONG - CLI call and parser wrapped in one command substitution
webset_id=$(exa-ai webset-create --search '{"query":"..."}' | jq -r '.webset_id')

# ✅ CORRECT - run directly (tee shows the output and saves a copy), then parse
exa-ai webset-create --search '{"query":"..."}' | tee output.json
# Then in a follow-up command:
webset_id=$(jq -r '.webset_id' output.json)
```

**Why**: Complex nested `$(...)` command substitutions can fail unpredictably in shell environments. Running commands directly and parsing separately improves reliability and makes debugging easier.
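The pattern can be tried offline with a stand-in for the CLI. In this sketch, `fake_exa_ai` is a hypothetical stub that prints a fixed response containing only the `webset_id` field the jq filter above expects. Replace it with `exa-ai` for a real run.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL STUB standing in for `exa-ai` so the pattern runs offline.
fake_exa_ai() {
  printf '{"webset_id":"ws_example123"}\n'
}

workdir=$(mktemp -d)

# Step 1: run the command directly; tee shows the output and also saves it.
fake_exa_ai webset-create --search '{"query":"..."}' | tee "$workdir/output.json"

# Step 2: parse the saved file in a separate command.
webset_id=$(jq -r '.webset_id' "$workdir/output.json")
echo "webset_id=$webset_id"
```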

### MUST NOT: Use nested command substitutions

**Applies to**: All skills when using complex multi-step operations

Avoid nesting multiple levels of command substitution:

```bash
# ❌ WRONG - deeply nested
result=$(exa-ai search "$(cat query.txt | tr '\n' ' ')" --num-results $(cat config.json | jq -r '.count'))

# ✅ CORRECT - sequential steps
query=$(tr '\n' ' ' < query.txt)
count=$(jq -r '.count' config.json)
exa-ai search "$query" --num-results "$count"
```

**Why**: Nested command substitutions are fragile and hard to debug when they fail. Sequential steps make each operation explicit and easier to troubleshoot.
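Splitting the steps also leaves room to check each intermediate value. For example, `jq -r '.count'` prints `null` when the key is missing, and that string would otherwise be passed straight to `--num-results`. The `validate_count` helper below is hypothetical, and the 1-100 range is an assumption taken from the pricing tiers under Cost Optimization.

```bash
#!/usr/bin/env bash
# Hypothetical validation step between reading $count and calling exa-ai:
# accept whole numbers 1-100 (10# forces base 10 for values like "08") and
# warn when the value falls in the higher-priced 26-100 tier.
validate_count() {
  local count=$1
  if [[ ! $count =~ ^[0-9]+$ ]] || (( 10#$count < 1 || 10#$count > 100 )); then
    echo "invalid --num-results value: '$count'" >&2
    return 1
  fi
  if (( 10#$count > 25 )); then
    echo "warning: $count results is billed at the 26-100 tier" >&2
  fi
}

count=30
validate_count "$count" && echo "would run: exa-ai search \"\$query\" --num-results $count"
```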

### SHOULD: Break complex commands into sequential steps

**Applies to**: All skills when working with multi-step workflows

For readability and reliability, break complex operations into clear sequential steps:

```bash
# ❌ Less maintainable - everything in one line
exa-ai webset-create --search '{"query":"startups","count":1}' | jq -r '.webset_id' | xargs -I {} exa-ai webset-search-create {} --query "AI" --behavior override

# ✅ More maintainable - clear steps
exa-ai webset-create --search '{"query":"startups","count":1}' | tee output.json
webset_id=$(jq -r '.webset_id' < output.json)
exa-ai webset-search-create "$webset_id" --query "AI" --behavior override
```

**Why**: Sequential steps are easier to understand, debug, and modify. Each step can be verified independently.

</shared-requirements>

Overview

This skill provides AI-powered semantic web search using exa-ai, optimized for token efficiency and cost control. It helps find relevant web pages, papers, news, and repositories by meaning rather than keyword matches. The skill emphasizes structured outputs, schema usage, and practical command patterns for reliable results.

How this skill works

The skill runs exa-ai search commands with options for result count, output format, summaries, and JSON schemas. Exa returns matching pages along with optional AI-generated summaries or schema-shaped fields, and the skill keeps that output compact for human reading or programmatic pipelines. It applies token- and cost-saving strategies such as limiting results, choosing toon vs JSON, and using --summary or --summary-schema.

When to use it

When you need semantic matches across web pages, research, news, or code repositories.
When you want token-efficient search results for downstream processing or human consumption.
When you require structured summaries or validated fields for automation (use --summary-schema).
When you want to run several targeted searches instead of a single large, expensive query.
When you need reliable shell workflows broken into sequential steps for debugging.

Best practices

Default to 1–25 results (--num-results 1..25) to minimize cost unless comprehensive coverage is required.
Pick a single output approach and stick with it: toon for reading, JSON+jq for extraction, or Schemas+jq for structured data.
Avoid the --text flag; prefer structured summaries or schemas for token efficiency.
Wrap schema definitions in an object: include "type":"object" at the root to avoid validation errors.
Do not mix toon with jq; toon is YAML-like and not valid JSON for jq parsing.
Run complex shell operations sequentially and avoid nested command substitutions for robustness.

Example use cases

Quickly scan the top 3–5 recent AI safety papers with --summary and --num-results 5, using toon for a compact readout.
Extract titles and key findings from 3 research articles using --summary-schema and jq for automated ingestion.
Search LinkedIn profiles or companies with built-in --linkedin filters for targeted background checks.
Run two focused 25-result searches with different angles instead of a single 50-result search to reduce cost and improve relevance.
Create a repeatable pipeline: run exa-ai search, save JSON, then parse selected fields with jq in a separate step.

FAQ

Why should I avoid the --text flag?

--text returns raw page text, which consumes many tokens. Structured summaries or schemas produce compact, targeted output and reduce cost.

When should I use toon vs JSON?

Use toon for human-readable, low-token output when not parsing. Use JSON+jq when extracting fields programmatically. Use schemas+jq for validated structured data.