web-research skill

This skill enables deep web research by using Exa and Firecrawl endpoints to find sources, extract clean text, and answer factual questions.

npx playbooks add skill merit-systems/x402scan-skills --skill web-research

SKILL.md
---
name: web-research
description: |
  Neural web search and content extraction using x402-protected APIs. Better than WebSearch for deep research and WebFetch for blocked sites.

  USE FOR:
  - Deep web research and investigation
  - Finding similar pages to a reference URL
  - Extracting clean text from web pages
  - Scraping sites that block standard fetchers
  - Getting direct answers to factual questions
  - Research requiring multiple sources

  TRIGGERS:
  - "research", "investigate", "deep dive", "find sources"
  - "similar to", "pages like", "more like this"
  - "scrape", "extract content from", "get the text from"
  - "blocked site", "can't access", "paywall"
  - "what is", "explain", "answer this"

  Prefer Exa for semantic/neural search, Firecrawl for direct scraping.

  IMPORTANT: Never guess endpoint paths. All paths follow the pattern https://enrichx402.com/api/{provider}/{action}. Use exact URLs from the Quick Reference table below or call x402.discover_api_endpoints first.
mcp:
  - x402
---

# Web Research with x402 APIs

> **STOP — Read before making any API call.** enrichx402.com endpoints are **not** the same as each provider's native API. All paths use the format `https://enrichx402.com/api/{provider}/{action}`. You MUST either:
> 1. Copy exact URLs from the Quick Reference table below, OR
> 2. Run `x402.discover_api_endpoints(url="https://enrichx402.com")` to get the correct paths
>
> **Guessing paths will fail** with 405 errors (wrong path) or 404 errors (missing `/api/` prefix).

Access Exa (neural search) and Firecrawl (web scraping) through x402-protected endpoints.
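
One way to guard against guessed paths is to check a URL's shape before fetching. The sketch below is a hypothetical helper (it is not part of the x402 tools) that only verifies the `https://enrichx402.com/api/{provider}/{action}` pattern described above. It cannot confirm that an endpoint actually exists, so it complements the Quick Reference table and `x402.discover_api_endpoints` rather than replacing them.

```python
from urllib.parse import urlparse

# Hypothetical helper: checks URL shape only, not whether the endpoint exists.
EXPECTED_HOST = "enrichx402.com"


def looks_like_enrich_endpoint(url: str) -> bool:
    """Return True if url matches https://enrichx402.com/api/{provider}/{action}."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != EXPECTED_HOST:
        return False
    # Split the path into non-empty segments, e.g. ["api", "exa", "search"].
    segments = [s for s in parsed.path.split("/") if s]
    return len(segments) == 3 and segments[0] == "api"
```

A URL that omits the `/api/` prefix, such as `https://enrichx402.com/exa/search`, is rejected by this check before any paid call is attempted.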

## Setup

See [rules/getting-started.md](rules/getting-started.md) for installation and wallet setup.

## Quick Reference

| Task | Endpoint | Price | Best For |
|------|----------|-------|----------|
| Neural search | `https://enrichx402.com/api/exa/search` | $0.01 | Semantic web search |
| Find similar | `https://enrichx402.com/api/exa/find-similar` | $0.01 | Pages similar to a URL |
| Extract text | `https://enrichx402.com/api/exa/contents` | $0.002 | Clean text from URLs |
| Direct answers | `https://enrichx402.com/api/exa/answer` | $0.01 | Factual Q&A |
| Scrape page | `https://enrichx402.com/api/firecrawl/scrape` | $0.0126 | Single page to markdown |
| Web search | `https://enrichx402.com/api/firecrawl/search` | $0.0252 | Search with scraping |

## When to Use What

| Scenario | Tool |
|----------|------|
| General web search | WebSearch (free) or Exa ($0.01) |
| Semantic/conceptual search | Exa search |
| Find pages like X | Exa find-similar |
| Get clean text from URL | Exa contents |
| Scrape blocked/JS-heavy site | Firecrawl scrape |
| Search + scrape results | Firecrawl search |
| Quick fact lookup | Exa answer |

See [rules/when-to-use.md](rules/when-to-use.md) for detailed guidance.

## Exa Neural Search

Semantic search that understands meaning, not just keywords:

```mcp
x402.fetch(
  url="https://enrichx402.com/api/exa/search",
  method="POST",
  body={
    "query": "startups building AI agents for customer support",
    "numResults": 10,
    "type": "neural"
  }
)
```

**Options:**
- `query` - Search query (required)
- `numResults` - Number of results (default: 10, max: 25)
- `type` - "neural" (semantic) or "keyword" (traditional)
- `includeDomains` - Only search these domains
- `excludeDomains` - Skip these domains
- `startPublishedDate` / `endPublishedDate` - Date range filter

**Returns**: List of URLs with titles, snippets, and relevance scores.
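
When request bodies are assembled in a script, the documented limits can be checked before paying for a call. The following is a minimal sketch, not the provider's client: `build_search_body` is a hypothetical name, the lower bound of 1 for `numResults` is an assumption, and the date filters are left out because their expected format is not stated here.

```python
from typing import Any, Optional

MAX_RESULTS = 25  # documented maximum for numResults
SEARCH_TYPES = {"neural", "keyword"}


def build_search_body(
    query: str,
    num_results: int = 10,
    search_type: str = "neural",
    include_domains: Optional[list[str]] = None,
    exclude_domains: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build a request body for exa/search, validating the documented limits."""
    if not query.strip():
        raise ValueError("query is required")
    # ASSUMPTION: at least one result must be requested.
    if not 1 <= num_results <= MAX_RESULTS:
        raise ValueError(f"numResults must be between 1 and {MAX_RESULTS}")
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"type must be one of {sorted(SEARCH_TYPES)}")

    body: dict[str, Any] = {
        "query": query,
        "numResults": num_results,
        "type": search_type,
    }
    # Only send optional filters that were actually provided.
    if include_domains:
        body["includeDomains"] = include_domains
    if exclude_domains:
        body["excludeDomains"] = exclude_domains
    return body
```

The returned dictionary can be passed as the `body` argument of the `x402.fetch` call shown above.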

## Find Similar Pages

Find pages semantically similar to a reference URL:

```mcp
x402.fetch(
  url="https://enrichx402.com/api/exa/find-similar",
  method="POST",
  body={
    "url": "https://example.com/article-i-like",
    "numResults": 10
  }
)
```

Great for:
- Finding competitor products
- Discovering related content
- Expanding research sources

## Extract Text Content

Get clean, structured text from URLs:

```mcp
x402.fetch(
  url="https://enrichx402.com/api/exa/contents",
  method="POST",
  body={
    "urls": [
      "https://example.com/article1",
      "https://example.com/article2"
    ]
  }
)
```

**Options:**
- `urls` - Array of URLs to extract
- `text` - Include full text (default: true)
- `highlights` - Include key highlights

Cheapest option ($0.002) when you already have URLs and just need the content.
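
Because one call accepts several URLs, it can help to deduplicate a candidate list before sending it. This is a small sketch with a hypothetical helper name. It does no chunking, since whether the endpoint caps the number of URLs per call is not stated here.

```python
def build_contents_body(urls: list[str]) -> dict[str, list[str]]:
    """Build one exa/contents request body from a list of URLs, dropping duplicates."""
    # dict.fromkeys keeps the first occurrence of each URL and preserves order.
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    if not unique:
        raise ValueError("at least one URL is required")
    return {"urls": unique}
```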

## Direct Answers

Get factual answers to questions:

```mcp
x402.fetch(
  url="https://enrichx402.com/api/exa/answer",
  method="POST",
  body={
    "query": "What is the population of Tokyo?"
  }
)
```

Returns a direct answer with source citations. Best for:
- Factual questions
- Quick lookups
- Verification of claims

## Firecrawl Scrape

Scrape a single page to clean markdown:

```mcp
x402.fetch(
  url="https://enrichx402.com/api/firecrawl/scrape",
  method="POST",
  body={
    "url": "https://example.com/page-to-scrape"
  }
)
```

**Options:**
- `url` - Page to scrape (required)
- `formats` - Output formats: ["markdown", "html", "links"]
- `onlyMainContent` - Skip nav/footer/ads (default: true)
- `waitFor` - Milliseconds to wait for JavaScript to render

**Advantages over WebFetch:**
- Handles JavaScript-rendered content
- Bypasses common blocking
- Extracts main content only
- LLM-optimized markdown output

## Firecrawl Search

Web search with automatic scraping of results:

```mcp
x402.fetch(
  url="https://enrichx402.com/api/firecrawl/search",
  method="POST",
  body={
    "query": "best practices for react server components",
    "limit": 5
  }
)
```

**Options:**
- `query` - Search query (required)
- `limit` - Number of results (default: 5)
- `scrapeOptions` - Options passed to scraper

Returns search results with full scraped content for each.

## Workflows

### Deep Research

1. (Optional) Check balance: `x402.get_wallet_info`
2. **Discover endpoints (required before first fetch unless you copy exact URLs from the Quick Reference table):** `x402.discover_api_endpoints(url="https://enrichx402.com")`
3. Search broadly with Exa
4. Find related sources with find-similar
5. Extract content from top sources
6. Synthesize findings

```mcp
x402.fetch(
  url="https://enrichx402.com/api/exa/search",
  method="POST",
  body={"query": "AI agents in healthcare 2024", "numResults": 15}
)
```

```mcp
x402.fetch(
  url="https://enrichx402.com/api/exa/find-similar",
  method="POST",
  body={"url": "https://best-article-found.com"}
)
```

```mcp
x402.fetch(
  url="https://enrichx402.com/api/exa/contents",
  method="POST",
  body={"urls": ["url1", "url2", "url3"]}
)
```
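
The three calls above can be chained in a script. The sketch below is illustrative only: `fetch` is a hypothetical stand-in for `x402.fetch` that takes a URL and a body and returns parsed JSON, and the `{"results": [{"url": ...}]}` response shape is an assumption, so check a real response before relying on it.

```python
from typing import Any, Callable

BASE = "https://enrichx402.com/api/exa"

# Stand-in for x402.fetch: takes (url, body) and returns the parsed JSON response.
Fetch = Callable[[str, dict[str, Any]], dict[str, Any]]


def deep_research(fetch: Fetch, query: str, top_n: int = 3) -> dict[str, Any]:
    """Search, expand from the top hit, then extract text from the best sources."""
    # ASSUMPTION: search and find-similar return {"results": [{"url": ...}, ...]}.
    search = fetch(f"{BASE}/search", {"query": query, "numResults": 15})
    hits = [r["url"] for r in search.get("results", [])]
    if not hits:
        return {"urls": [], "contents": None}

    similar = fetch(f"{BASE}/find-similar", {"url": hits[0]})
    related = [r["url"] for r in similar.get("results", [])]

    # Take the top_n from each list, dropping duplicates while keeping rank order.
    urls = list(dict.fromkeys(hits[:top_n] + related[:top_n]))
    # One batched contents call covers every selected URL.
    contents = fetch(f"{BASE}/contents", {"urls": urls})
    return {"urls": urls, "contents": contents}
```

Passing `fetch` in as a parameter keeps the sketch testable with a fake and makes no claim about how the real tool is invoked. Synthesis (step 6) is left to the caller.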

### Blocked Site Scraping

- [ ] Try WebFetch first (free)
- [ ] If blocked/empty, use Firecrawl with `waitFor` for JS-heavy sites

```mcp
x402.fetch(
  url="https://enrichx402.com/api/firecrawl/scrape",
  method="POST",
  body={"url": "https://blocked-site.com/article", "waitFor": 3000}
)
```
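
The checklist above amounts to a try-free-then-pay fallback. Here is a minimal sketch under stated assumptions: `free_fetch` and `paid_scrape` are hypothetical callables standing in for WebFetch and the Firecrawl scrape call, and the 200-character threshold for "empty" is an arbitrary illustrative value.

```python
from typing import Callable, Optional

MIN_USEFUL_CHARS = 200  # ASSUMPTION: shorter responses are treated as blocked or empty


def fetch_with_fallback(
    url: str,
    free_fetch: Callable[[str], Optional[str]],
    paid_scrape: Callable[[str, int], str],
    wait_for_ms: int = 3000,
) -> tuple[str, str]:
    """Return (source, text), trying the free fetcher before the paid scraper."""
    try:
        text = free_fetch(url)
    except Exception:
        # Treat any fetch error (403, timeout, ...) as "blocked".
        text = None
    if text and len(text.strip()) >= MIN_USEFUL_CHARS:
        return "free", text
    # Fall back to the paid scraper, giving JavaScript time to render.
    return "firecrawl", paid_scrape(url, wait_for_ms)
```

Returning the source label alongside the text makes it easy to see how often the paid path is taken.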

## Cost Optimization

- **Use Exa contents** ($0.002) when you already have URLs
- **Use WebSearch/WebFetch first** (free) and fall back to x402 endpoints
- **Batch URL extraction** - pass multiple URLs to Exa contents
- **Limit results** - request only as many as needed
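
To compare plans before spending, the prices in the Quick Reference table can be summed per planned call. This sketch assumes each price is charged per call regardless of how many URLs or results the call covers, which the table does not state explicitly. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Prices copied from the Quick Reference table (USD).
# ASSUMPTION: each price is charged once per call.
PRICES = {
    "exa/search": Decimal("0.01"),
    "exa/find-similar": Decimal("0.01"),
    "exa/contents": Decimal("0.002"),
    "exa/answer": Decimal("0.01"),
    "firecrawl/scrape": Decimal("0.0126"),
    "firecrawl/search": Decimal("0.0252"),
}


def estimate_cost(calls: dict[str, int]) -> Decimal:
    """Sum the price of a planned set of calls, e.g. {"exa/search": 2}."""
    unknown = set(calls) - set(PRICES)
    if unknown:
        raise KeyError(f"no listed price for: {sorted(unknown)}")
    # Decimal avoids the rounding noise that float sums of small prices produce.
    return sum((PRICES[name] * count for name, count in calls.items()), Decimal("0"))
```

Under that assumption, one search, one find-similar, and one batched contents call total $0.022, while scraping three pages individually with Firecrawl totals $0.0378.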

## Parallel Calls

Independent searches can run in parallel:

```mcp
# These don't depend on each other
x402.fetch(url=".../exa/search", body={"query": "topic A"})
x402.fetch(url=".../exa/search", body={"query": "topic B"})
```
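
In an agent, running calls in parallel usually means issuing several tool calls in the same turn. For script-based use, the same idea can be sketched with a thread pool. As before, `fetch` is a hypothetical stand-in for `x402.fetch`, and the worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

SEARCH_URL = "https://enrichx402.com/api/exa/search"


def search_many(
    fetch: Callable[[str, dict[str, Any]], Any],
    queries: list[str],
    max_workers: int = 4,
) -> dict[str, Any]:
    """Run independent searches concurrently and return results keyed by query."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with queries.
        results = pool.map(lambda q: fetch(SEARCH_URL, {"query": q}), queries)
        return dict(zip(queries, results))
```

Threads suit this case because each call spends its time waiting on the network. If any single call raises, the exception surfaces when its result is collected.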

Overview

This skill provides neural web search and robust content extraction through x402-protected APIs. It combines semantic search (Exa) with a resilient scraper (Firecrawl) to handle blocked or JS-heavy sites and to return clean, LLM-ready text and direct factual answers. Use it when standard fetchers fail or when you need conceptual matches and multi-source synthesis.

How this skill works

You interact with x402 endpoints that proxy Exa (neural search, find-similar, contents, answer) and Firecrawl (scrape, search). Before using any endpoint, you must discover or copy the exact API paths under the enrichx402.com /api/ prefix. Exa performs semantic ranking and content extraction; Firecrawl renders and scrapes pages that block conventional fetchers or require JavaScript.

When to use it

  • Deep investigations that need multiple corroborating sources
  • Finding pages similar to a reference URL (competitor or related content discovery)
  • Extracting clean, structured text from a list of URLs
  • Scraping paywalled, bot-blocked, or JavaScript-heavy pages
  • Getting quick factual answers with source citations

Best practices

  • Always call x402.discover_api_endpoints(url="https://enrichx402.com") or copy exact /api/ URLs before fetching
  • Prefer Exa for semantic queries and bulk content extraction (exa/contents) to save cost
  • Try free WebSearch/WebFetch first; fall back to Exa/Firecrawl only when needed
  • Batch URL extractions in a single exa/contents call for cost efficiency
  • Use Firecrawl scrape with waitFor for pages that require JS rendering or are blocked

Example use cases

  • Researching market landscape: run Exa search, find-similar on top hits, extract with exa/contents, synthesize
  • Investigative reporting: scrape a blocked page with firecrawl/scrape and extract main content as markdown
  • Source expansion: provide a reference URL to exa/find-similar to discover related articles and competitors
  • Fact-checking: query exa/answer for direct answers with source citations
  • Bulk content ingestion: pass many URLs to exa/contents to obtain clean text for downstream LLM analysis

FAQ

Do I need to guess endpoint paths?

No. Never guess paths. Use the exact enrichx402.com /api/ URLs or call x402.discover_api_endpoints to obtain them.

When should I use Firecrawl versus Exa?

Use Exa for semantic search, similarity, and cheap text extraction. Use Firecrawl when pages are blocked, require JS rendering, or you need scraper-optimized markdown.