
web-fetch skill


This skill fetches and extracts clean, readable content from web pages using Jina Reader, returning title, text, and metadata for analysis.

npx playbooks add skill vaayne/agent-kit --skill web-fetch


SKILL.md
---
name: web-fetch
description: Fetch and extract clean content from URLs using Jina Reader API. Use when users need to read webpage content, extract article text, or fetch URL content for analysis. Triggers on "fetch this page", "read this URL", "extract content from", "get the content of", "what does this page say".
---

# Web Fetch

## Overview

Extract clean, readable content from any URL using Jina Reader API. Returns raw JSON with title, content, and metadata optimized for LLM consumption.

## When to Use

- User wants to read or analyze webpage content
- Need to extract article text from a URL
- Fetching documentation or reference pages
- Converting web pages to clean text for processing

## Workflow

1. Identify the URL from user request
2. Validate URL format
3. Run the fetch script
4. Present extracted content to user
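
Step 2 could be as simple as the check below. This is a minimal sketch, not the logic inside `scripts/web_fetch.py`. The helper name `is_valid_url` is hypothetical, and accepting only absolute `http`/`https` URLs is an assumption.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Return True if `url` looks like an absolute http(s) URL.

    Hypothetical helper for workflow step 2; the real script may differ.
    """
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse raises ValueError on some malformed input, e.g. "http://[::1"
        return False
    # Require an http/https scheme and a non-empty host part
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ["https://example.com/article", "example.com", "ftp://example.com"]:
        print(candidate, is_valid_url(candidate))
```

A bare `example.com` is rejected because `urlparse` treats it as a path with no scheme. Prepending `https://` before validating is one way to normalize such input.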

## Usage

```bash
# Basic fetch
uv run --script scripts/web_fetch.py --url "https://example.com"

# With custom timeout
uv run --script scripts/web_fetch.py \
  --url "https://example.com/article" \
  --timeout 60
```

## Parameters

| Parameter   | Default    | Description                           |
| ----------- | ---------- | ------------------------------------- |
| `--url`     | (required) | URL to fetch and extract content from |
| `--timeout` | 30         | Request timeout in seconds            |

## Output Contract

| Scenario    | stdout             | stderr             | exit code |
| ----------- | ------------------ | ------------------ | --------- |
| Success     | Raw JSON from Jina | (empty)            | 0         |
| Invalid URL | (empty)            | Error message      | 1         |
| Timeout     | (empty)            | Timeout error      | 1         |
| HTTP Error  | (empty)            | HTTP error details | 1         |
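
One way to honor this contract in Python is to turn every failure into an `(stdout, stderr, exit code)` triple. This is a sketch under assumptions. `run` is a hypothetical function, and `fetch` is an injected callable standing in for the real network call so the mapping can be exercised offline. The error messages are illustrative and not the script's actual wording.

```python
import socket
import urllib.error
from typing import Callable, Tuple
from urllib.parse import urlparse

# (stdout, stderr, exit code), matching the columns of the contract table
Result = Tuple[str, str, int]

# socket.timeout is an alias of TimeoutError on Python 3.10+, a separate class before that
TIMEOUT_ERRORS = (TimeoutError, socket.timeout)


def run(url: str, timeout: float, fetch: Callable[[str, float], str]) -> Result:
    try:
        parsed = urlparse(url)
        valid = parsed.scheme in ("http", "https") and bool(parsed.netloc)
    except ValueError:
        valid = False
    if not valid:
        return "", f"Invalid URL: {url!r}", 1
    try:
        body = fetch(url, timeout)
    except urllib.error.HTTPError as exc:  # check before URLError: HTTPError subclasses it
        return "", f"HTTP error {exc.code}: {exc.reason}", 1
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, TIMEOUT_ERRORS):
            return "", f"Timeout after {timeout}s", 1
        return "", f"Request failed: {exc.reason}", 1
    except TIMEOUT_ERRORS:
        return "", f"Timeout after {timeout}s", 1
    # Success: raw body on stdout, nothing on stderr
    return body, "", 0
```

A script's entry point would then write the first element to stdout and the second to stderr, and exit with the third.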

Success output contains:

- Page title and description
- Clean extracted content (markdown-formatted)
- URL and metadata
- Token usage information
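
A caller can parse stdout as JSON to get at these fields. The key layout below is an assumption about Jina's JSON response, not something this skill documents: a top-level `data` object holding `title`, `description`, `url`, `content`, and `usage.tokens`. `Page` and `parse_page` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    title: str
    description: str
    url: str
    content: str  # markdown-formatted body
    tokens: Optional[int]  # None if the payload carries no usage info


def parse_page(raw: str) -> Page:
    """Extract the success fields from the script's stdout.

    Assumes the page is nested under "data" with a "usage": {"tokens": N}
    object; adjust the keys if the real payload differs.
    """
    payload = json.loads(raw)
    data = payload.get("data") or {}
    usage = data.get("usage") or {}
    return Page(
        title=data.get("title", ""),
        description=data.get("description", ""),
        url=data.get("url", ""),
        content=data.get("content", ""),
        tokens=usage.get("tokens"),
    )
```

Using `.get` with defaults keeps the parser tolerant of pages that lack a description or usage block.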

## Prerequisites

- Uses Jina Reader API (no API key required)
- Requires `uv` for running PEP 723 scripts

## Examples

### Fetch a webpage

```bash
uv run --script scripts/web_fetch.py \
  --url "https://docs.python.org/3/whatsnew/3.12.html"
```

### Fetch with longer timeout for slow pages

```bash
uv run --script scripts/web_fetch.py \
  --url "https://example.com/large-article" \
  --timeout 60
```

Overview

This skill fetches and extracts clean, readable content from any public URL using the Jina Reader API. It returns structured JSON containing title, markdown-formatted content, metadata, and token-usage info optimized for downstream LLM processing. Use it when you need reliable article text extraction or to convert web pages into text for analysis or summarization.

How this skill works

The skill identifies and validates the URL, sends a fetch request to the Jina Reader API, and prints the JSON that Jina returns to stdout unchanged. Jina performs the extraction itself: its response includes the title, description, the main article body (cleaned and formatted as markdown), and metadata such as the source URL and token counts. Invalid URLs, timeouts, and HTTP failures are reported on stderr with exit code 1.
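
As a rough illustration of the request step, the Python sketch below builds a Jina Reader call using only the standard library. Two details are assumptions about Jina's public API and are not stated in this skill: the endpoint form (prefixing the target URL with https://r.jina.ai/) and the Accept: application/json header. build_request and fetch are hypothetical names, not functions from web_fetch.py.

```python
import urllib.request

# Assumed endpoint form: the target URL is appended to this prefix
JINA_READER_PREFIX = "https://r.jina.ai/"


def build_request(target_url: str) -> urllib.request.Request:
    # The Accept header asks for a JSON response instead of plain text (assumption)
    return urllib.request.Request(
        JINA_READER_PREFIX + target_url,
        headers={"Accept": "application/json"},
    )


def fetch(target_url: str, timeout: float = 30) -> str:
    # Performs the network call; raises urllib.error.HTTPError or URLError on failure
    with urllib.request.urlopen(build_request(target_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = build_request("https://example.com")
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from the network call means the URL and headers can be checked without hitting Jina.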

When to use it

  • You need to read or analyze the main content of a webpage.
  • Extracting article text from blog posts, news pages, or documentation.
  • Preparing web content for summarization, question answering, or indexing.
  • Automating data collection from reference pages or public APIs.
  • Converting web pages to clean markdown for further NLP pipelines.

Best practices

  • Always validate and normalize URLs before invoking the skill, to catch malformed input early.
  • Increase timeout for large or slow-loading pages (default is 30 seconds).
  • Handle non-200 HTTP responses and implement retries in your calling logic for robustness.
  • Inspect the token-usage metadata when chaining many extractions, to keep downstream context size and costs under control.
  • Sanitize or filter extracted content if downstream systems require stricter input control.
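
For the retry advice, a caller might wrap the documented command in a loop with exponential backoff. This is a sketch under assumptions. fetch_with_retries and default_runner are hypothetical names, and the attempt count and delays are illustrative. The command must run from the skill's directory so that scripts/web_fetch.py resolves.

```python
import subprocess
import time
from typing import Callable, List


def default_runner(cmd: List[str]) -> subprocess.CompletedProcess:
    # Runs the skill's script and captures stdout/stderr as text
    return subprocess.run(cmd, capture_output=True, text=True)


def fetch_with_retries(
    url: str,
    timeout: int = 30,
    attempts: int = 3,
    base_delay: float = 1.0,
    runner: Callable[[List[str]], subprocess.CompletedProcess] = default_runner,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    cmd = ["uv", "run", "--script", "scripts/web_fetch.py",
           "--url", url, "--timeout", str(timeout)]
    last_error = ""
    for attempt in range(attempts):
        result = runner(cmd)
        if result.returncode == 0:
            return result.stdout  # raw JSON from Jina
        last_error = result.stderr.strip()
        if attempt < attempts - 1:
            # Exponential backoff between attempts: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"web-fetch failed after {attempts} attempts: {last_error}")
```

Because the Output Contract uses exit code 1 for every failure, this loop cannot tell a transient timeout from an invalid URL. Validating the URL before calling it avoids retrying input that will never succeed.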

Example use cases

  • Fetch a news article to generate a concise summary for a daily briefing.
  • Extract API docs or changelogs to feed into a developer assistant.
  • Pull product pages for automated competitor analysis or feature extraction.
  • Convert long-form blog posts into markdown to create vector embeddings for search.
  • Retrieve academic or technical articles for citation extraction and indexing.

FAQ

Do I need an API key to use the Jina Reader through this skill?

No. The skill calls the public Jina Reader API without an API key.

What happens if the page is behind authentication or blocked?

The fetch will fail with an HTTP error; the skill reports the error details on stderr and returns a nonzero exit code.

Can I adjust request timeout?

Yes. The skill accepts a --timeout parameter in seconds (default 30) to accommodate slow pages.