This skill retrieves and summarizes web content from URLs, extracting structured data to fuel efficient analysis and automation.
To add this skill to your agents, run `npx playbooks add skill benjaminjackson/exa-skills --skill get-contents`.
---
name: exa-get-contents
description: Retrieve and extract content from URLs with AI-powered summarization and structured data extraction. Use for scraping web pages, extracting specific information, summarizing articles, or crawling websites with subpages.
---
# Exa Get Contents
Token-efficient strategies for retrieving and extracting content from URLs using exa-ai.
**Use `--help` to see available commands and verify usage before running:**
```bash
exa-ai <command> --help
```
## Critical Requirements
**MUST follow these rules when using exa-ai get-contents:**
### Shared Requirements
This skill inherits requirements from [Common Requirements](../../../docs/common-requirements.md):
- Schema design patterns → All schema operations
- Output format selection → All output operations
### MUST Rules
1. **Always use livecrawl**: Include `--livecrawl-timeout 10000` (a 10-second timeout; Exa expresses livecrawl timeouts in milliseconds) to get fresh, up-to-date content instead of cached results
### SHOULD Rules
1. **Prefer --summary over --text**: Use summaries with schemas for structured extraction instead of full text for better token efficiency
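The livecrawl flag is mandatory on every call, so one option is a small shell wrapper that appends it automatically. This is a minimal sketch and is not part of exa-ai. The function name `get_contents` is hypothetical, and the wrapper assumes the CLI accepts options after the URL, as the examples in this skill already do.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of exa-ai): always append the required
# livecrawl flag so no call silently falls back to cached content.
# Assumes exa-ai accepts options after the URL argument.
get_contents() {
  exa-ai get-contents "$@" --livecrawl-timeout 10000
}

# Usage (same flags as the examples in this skill):
# get_contents "https://example.com" --summary --output-format toon
```

If a caller also passes `--livecrawl-timeout`, the flag appears twice. How the CLI resolves that duplicate is not documented here.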
## Cost Optimization
### Pricing
- **Per piece of content**: $0.001
Each URL counts as one piece of content. Multiple URLs increase cost linearly.
**Cost strategy:**
- Only fetch URLs you need
- Use `--summary` instead of `--text` to reduce processing (and token costs)
- Combine with search results to target specific URLs rather than crawling broadly
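Pricing is a flat per-URL rate, so a batch costs the URL count times $0.001. For example, the three-URL call under Quick Start would cost about $0.003. The helper below sketches that arithmetic for a comma-separated URL list. `estimate_cost` is hypothetical and is not an exa-ai command. It assumes the $0.001 rate above stays flat.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate get-contents cost for a comma-separated URL list,
# assuming the flat $0.001 per piece of content listed above.
estimate_cost() {
  local urls="$1"
  local count
  # Split on commas and count the non-empty entries.
  count=$(printf '%s\n' "$urls" | tr ',' '\n' | grep -c .)
  # awk does the floating-point multiplication that bash arithmetic cannot.
  awk -v n="$count" 'BEGIN { printf "%d URL(s) -> $%.3f\n", n, n * 0.001 }'
}

estimate_cost "https://anthropic.com,https://openai.com,https://cohere.com"
```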
## Token Optimization
**Apply these strategies:**
- **Use toon format**: `--output-format toon` for 40% fewer tokens than JSON (use when reading output directly)
- **Use JSON + jq**: Extract only needed fields with jq (use when piping/processing output)
- **Use --summary**: Get AI-generated summaries instead of full page text
- **Use schemas**: Extract structured data with `--summary-schema` (always pipe to jq)
- **Limit extraction**: Use `--text-max-characters`, `--links`, and `--image-links` to control output size
**IMPORTANT**: Choose one approach, don't mix them:
- **Approach 1: toon only** - Compact YAML-like output for direct reading
- **Approach 2: JSON + jq** - Extract specific fields programmatically
- **Approach 3: Schemas + jq** - Get structured data, always use JSON output (default) and pipe to jq
Examples:
```bash
# ❌ High token usage - full text
exa-ai get-contents "https://example.com" --text --livecrawl-timeout 10000
# ✅ Approach 1: toon format with summary (70% reduction)
exa-ai get-contents "https://example.com" --summary --livecrawl-timeout 10000 --output-format toon
# ✅ Approach 2: JSON + jq for summary extraction (80% reduction)
exa-ai get-contents "https://example.com" --summary --livecrawl-timeout 10000 | jq '.results[].summary'
# ✅ Approach 3: Schema + jq for structured extraction (85% reduction)
exa-ai get-contents "https://example.com" \
--summary \
--livecrawl-timeout 10000 \
--summary-schema '{"type":"object","properties":{"key_info":{"type":"string"}}}' | \
jq -r '.results[].summary | fromjson | .key_info'
# ❌ Don't mix toon with jq (toon is YAML-like, not JSON)
exa-ai get-contents "https://example.com" --output-format toon | jq -r '.results'
```
## Quick Start
### Basic Content with Summary
```bash
exa-ai get-contents "https://anthropic.com" --summary --livecrawl-timeout 10000 --output-format toon
```
### Custom Summary Query
```bash
exa-ai get-contents "https://techcrunch.com" \
--summary \
--livecrawl-timeout 10000 \
--summary-query "What are the main tech news stories on this page?" | jq '.results[].summary'
```
### Structured Data Extraction
```bash
exa-ai get-contents "https://www.stripe.com" \
--summary \
--livecrawl-timeout 10000 \
--summary-schema '{"type":"object","properties":{"company_name":{"type":"string"},"main_product":{"type":"string"},"target_market":{"type":"string"}}}' | jq -r '.results[].summary | fromjson'
```
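The `fromjson` step is needed because, as the jq filters in this skill assume, each `summary` comes back as a JSON-encoded string when `--summary-schema` is used. The sketch below runs that filter offline against a saved response. `sample.json` and its field values are hypothetical placeholders, not real API output. Saving output to a file and then parsing it also follows the run-then-parse pattern from the shared requirements.

```bash
#!/usr/bin/env bash
# Hypothetical saved response, shaped the way this skill's jq filters assume:
# each result's "summary" is a JSON-encoded string.
cat > sample.json <<'EOF'
{"results":[{"url":"https://www.stripe.com","summary":"{\"company_name\":\"Example Co\",\"main_product\":\"Example product\",\"target_market\":\"Example market\"}"}]}
EOF

# fromjson decodes the summary string into an object so fields can be selected.
jq -r '.results[].summary | fromjson | .company_name' sample.json

# Without fromjson, selecting .company_name fails because jq cannot index a string.
```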
### Multiple URLs
```bash
exa-ai get-contents "https://anthropic.com,https://openai.com,https://cohere.com" \
--summary \
--livecrawl-timeout 10000 \
--output-format toon
```
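When the URL list comes from an earlier step, such as search results, you can build the comma-separated argument from a Bash array instead of editing long strings by hand. `join_urls` is a hypothetical helper and not an exa-ai feature. The sketch assumes the comma-separated format shown in the Multiple URLs example.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of exa-ai): join URLs with commas.
# "$*" joins positional arguments using the first character of IFS.
join_urls() {
  local IFS=','
  printf '%s\n' "$*"
}

urls=(
  "https://anthropic.com"
  "https://openai.com"
  "https://cohere.com"
)
url_list=$(join_urls "${urls[@]}")
echo "$url_list"
# Then, for example:
# exa-ai get-contents "$url_list" --summary --livecrawl-timeout 10000 --output-format toon
```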
## Detailed Reference
For complete options, examples, and advanced usage, consult [REFERENCE.md](REFERENCE.md).
### Shared Requirements
<shared-requirements>
## Schema Design
### MUST: Use object wrapper for schemas
**Applies to**: answer, search, find-similar, get-contents
When using schema parameters (`--output-schema` or `--summary-schema`), always wrap properties in an object:
```json
{"type":"object","properties":{"field_name":{"type":"string"}}}
```
**DO NOT** use bare properties without the object wrapper:
```json
{"properties":{"field_name":{"type":"string"}}} // ❌ Missing "type":"object"
```
**Why**: The Exa API requires a valid JSON Schema with an object type at the root level. Omitting this causes validation errors.
**Examples**:
```bash
# ✅ CORRECT - object wrapper included
exa-ai search "AI news" \
--summary-schema '{"type":"object","properties":{"headline":{"type":"string"}}}'
# ❌ WRONG - missing object wrapper
exa-ai search "AI news" \
--summary-schema '{"properties":{"headline":{"type":"string"}}}'
```
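To catch a missing wrapper before spending a request, you can check a schema string locally with jq. This is a minimal sketch that assumes jq is installed. `check_schema` is a hypothetical helper. It only checks that the input is valid JSON with `"type":"object"` at the root, which is not full JSON Schema validation.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not an exa-ai feature).
# jq -e exits non-zero when the result is false or the input is not valid JSON.
check_schema() {
  printf '%s' "$1" | jq -e '.type == "object"' > /dev/null 2>&1
}

schema='{"type":"object","properties":{"headline":{"type":"string"}}}'
if check_schema "$schema"; then
  echo "schema OK"
else
  echo "schema is missing the root object wrapper" >&2
fi
```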
---
## Output Format Selection
### MUST NOT: Mix toon format with jq
**Applies to**: answer, context, search, find-similar, get-contents
`toon` format produces YAML-like output, not JSON. DO NOT pipe toon output to jq for parsing:
```bash
# ❌ WRONG - toon is not JSON
exa-ai search "query" --output-format toon | jq -r '.results'
# ✅ CORRECT - use JSON (default) with jq
exa-ai search "query" | jq -r '.results[].title'
# ✅ CORRECT - use toon for direct reading only
exa-ai search "query" --output-format toon
```
**Why**: jq expects valid JSON input. toon format is designed for human readability and produces YAML-like output that jq cannot parse.
### SHOULD: Choose one output approach
**Applies to**: answer, context, search, find-similar, get-contents
Pick one strategy and stick with it throughout your workflow:
1. **Approach 1: toon only** - Compact YAML-like output for direct reading
- Use when: Reading output directly, no further processing needed
- Token savings: ~40% reduction vs JSON
- Example: `exa-ai search "query" --output-format toon`
2. **Approach 2: JSON + jq** - Extract specific fields programmatically
- Use when: Need to extract specific fields or pipe to other commands
- Token savings: ~80-90% reduction (extracts only needed fields)
- Example: `exa-ai search "query" | jq -r '.results[].title'`
3. **Approach 3: Schemas + jq** - Structured data extraction with validation
- Use when: Need consistent structured output across multiple queries
- Token savings: ~85% reduction + consistent schema
- Example: `exa-ai search "query" --summary-schema '{...}' | jq -r '.results[].summary | fromjson'`
**Why**: Mixing approaches increases complexity and token usage. Choosing one approach optimizes for your use case.
---
## Shell Command Best Practices
### MUST: Run commands directly, parse separately
**Applies to**: monitor, search (websets), research, and all skills using complex commands
When using the Bash tool with complex shell syntax, run commands directly and parse output in separate steps:
```bash
# ❌ WRONG - API call wrapped in command substitution
webset_id=$(exa-ai webset-create --search '{"query":"..."}' | jq -r '.webset_id')
# ✅ CORRECT - run directly (saving the output), then parse
exa-ai webset-create --search '{"query":"..."}' > output.json
# Then in a follow-up command:
webset_id=$(jq -r '.webset_id' < output.json)
```
**Why**: Complex nested `$(...)` command substitutions can fail unpredictably in shell environments. Running commands directly and parsing separately improves reliability and makes debugging easier.
### MUST NOT: Use nested command substitutions
**Applies to**: All skills when using complex multi-step operations
Avoid nesting multiple levels of command substitution:
```bash
# ❌ WRONG - deeply nested
result=$(exa-ai search "$(cat query.txt | tr '\n' ' ')" --num-results $(cat config.json | jq -r '.count'))
# ✅ CORRECT - sequential steps
query=$(cat query.txt | tr '\n' ' ')
count=$(cat config.json | jq -r '.count')
exa-ai search "$query" --num-results "$count"
```
**Why**: Nested command substitutions are fragile and hard to debug when they fail. Sequential steps make each operation explicit and easier to troubleshoot.
### SHOULD: Break complex commands into sequential steps
**Applies to**: All skills when working with multi-step workflows
For readability and reliability, break complex operations into clear sequential steps:
```bash
# ❌ Less maintainable - everything in one line
exa-ai webset-create --search '{"query":"startups","count":1}' | jq -r '.webset_id' | xargs -I {} exa-ai webset-search-create {} --query "AI" --behavior override
# ✅ More maintainable - clear steps
exa-ai webset-create --search '{"query":"startups","count":1}' > output.json
webset_id=$(jq -r '.webset_id' < output.json)
exa-ai webset-search-create "$webset_id" --query "AI" --behavior override
```
**Why**: Sequential steps are easier to understand, debug, and modify. Each step can be verified independently.
</shared-requirements>
This skill retrieves and extracts content from web URLs with AI-powered summarization and structured data extraction. It focuses on token- and cost-efficient web scraping by enforcing live crawling and promoting summary- and schema-based extraction. Use it to fetch fresh page content, produce compact summaries, or return JSON that follows a schema for downstream processing.
Every call should perform a live crawl (pass `--livecrawl-timeout 10000`) so that content is current rather than cached. The skill supports three output strategies: compact toon output for direct reading, JSON + jq for programmatic field extraction, and schema-driven summaries (`--summary-schema`) for structured data. Prefer `--summary` over full text, and wrap every schema in a root object so extraction validates.
**Why must I use a root object in schemas?**
The API requires a JSON Schema with `"type":"object"` at the root. Omitting it causes validation errors, so always wrap properties in an object.
**Can I mix toon output with jq parsing?**
No. toon produces YAML-like output, which is not valid JSON. Use JSON output when you plan to pipe results to jq.