defuddle skill

This skill extracts clean Markdown content from web pages using the Defuddle CLI, removing clutter to save tokens.

npx playbooks add skill guanyang/antigravity-skills --skill defuddle

SKILL.md

---
name: defuddle
description: Extract clean markdown content from web pages using Defuddle CLI, removing clutter and navigation to save tokens. Use instead of WebFetch when the user provides a URL to read or analyze, such as online documentation, articles, blog posts, or any other standard web page.
---

# Defuddle

Use the Defuddle CLI to extract clean, readable content from web pages. Prefer it over WebFetch for standard web pages: it removes navigation, ads, and clutter, which reduces token usage.

If not installed: `npm install -g defuddle-cli`

## Usage

Always use `--md` for markdown output:

```bash
defuddle parse <url> --md
```

Save to file:

```bash
defuddle parse <url> --md -o content.md
```

Extract specific metadata:

```bash
defuddle parse <url> -p title
defuddle parse <url> -p description
defuddle parse <url> -p domain
```

## Output formats

| Flag | Format |
|------|--------|
| `--md` | Markdown (recommended; use by default) |
| `--json` | JSON with both HTML and markdown |
| (none) | HTML |
| `-p <name>` | Specific metadata property |

Overview

This skill extracts clean, readable Markdown content from web pages using the Defuddle CLI, removing navigation, ads, and other clutter to save tokens. It is the preferred extractor for standard documentation, articles, and blog posts when a URL is provided. The skill outputs focused Markdown or JSON and can also capture specific metadata properties.

How this skill works

The skill invokes the Defuddle CLI to parse the provided URL and request Markdown output (--md) by default. It can write results to a file, return JSON that includes both HTML and Markdown, or fetch specific metadata properties (title, description, domain) with -p. It strips extraneous page chrome so downstream language models receive concise, high-value content.
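
The metadata and Markdown modes described above can be combined into a note with YAML front matter. The following is a minimal bash sketch, assuming `defuddle` is on PATH and behaves as documented in this skill (`-p <name>` prints one property, `--md` prints the Markdown body). The helper names `yaml_quote` and `page_to_note` are hypothetical, not part of Defuddle.

```bash
#!/usr/bin/env bash
# Sketch: turn a web page into a Markdown note with YAML front matter.
# Assumes `defuddle parse <url> -p <name>` and `defuddle parse <url> --md`
# work as described in this skill. Helper names are hypothetical.

# Quote a value as a YAML double-quoted string (escape \ first, then ").
yaml_quote() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

# Print front matter built from page metadata, followed by the Markdown body.
# Note: this runs the CLI four times, so the page is fetched four times.
page_to_note() {
  local url="$1"
  local title description domain body
  title="$(defuddle parse "$url" -p title)" || return 1
  description="$(defuddle parse "$url" -p description)" || return 1
  domain="$(defuddle parse "$url" -p domain)" || return 1
  body="$(defuddle parse "$url" --md)" || return 1

  printf -- '---\n'
  printf 'title: %s\n' "$(yaml_quote "$title")"
  printf 'description: %s\n' "$(yaml_quote "$description")"
  printf 'domain: %s\n' "$(yaml_quote "$domain")"
  printf 'source: %s\n' "$(yaml_quote "$url")"
  printf -- '---\n\n%s\n' "$body"
}
```

For example, `page_to_note https://example.com/post > post.md` writes a note ready for a Markdown notes folder. A single `--json` call might avoid the repeated fetches, but its field names are not documented here, so the sketch sticks to the flags this skill lists.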

When to use it

- You have a URL to a standard web page (docs, blog post, article) and need clean text for analysis or summarization.
- You want to reduce token usage by removing navigation, ads, and sidebars before feeding content to a model.
- You need Markdown output ready for notes, documentation, or publishing workflows.
- You want structured metadata (title, description, domain) alongside the page content.
- You prefer a deterministic, CLI-based extractor to a general web fetch, and the page is static or server-rendered.

Best practices

- Always request Markdown (`--md`) for compact, model-friendly output.
- Save content to a file when reusing it across tasks (`defuddle parse <url> --md -o content.md`).
- Install Defuddle globally if it is not present (`npm install -g defuddle-cli`) and make sure the `defuddle` command is on your PATH.
- If a page is rendered mostly client-side (a single-page app), check that the server returns usable HTML, or render the page before parsing.
- Request specific metadata (`-p title`, `-p description`, `-p domain`) when you only need small context fields.
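
The advice to save content to a file for reuse can be sketched as a small cache. This is a minimal bash sketch, assuming only the `defuddle parse <url> --md -o <file>` form shown in this skill; the function name `fetch_md_cached`, the `.defuddle-cache` directory, and the file-naming scheme are hypothetical choices for illustration.

```bash
# Sketch: fetch a page as Markdown once, then reuse the saved file.
# `fetch_md_cached` and the cache layout are hypothetical; the only CLI call is
# `defuddle parse <url> --md -o <file>` from this skill.
fetch_md_cached() {
  local url="$1"
  local cache_dir="${2:-.defuddle-cache}"
  local name file
  # Filesystem-safe name: every character outside [A-Za-z0-9._-] becomes '_'.
  # Distinct URLs can collide (e.g. ?a=1 vs &a=1); hash the URL if that matters.
  name="$(printf '%s' "$url" | sed 's/[^A-Za-z0-9._-]/_/g')"
  file="$cache_dir/$name.md"
  mkdir -p "$cache_dir" || return 1
  if [ ! -s "$file" ]; then
    # Write to a temp file first so a failed fetch never leaves a partial entry.
    defuddle parse "$url" --md -o "$file.tmp" || { rm -f "$file.tmp"; return 1; }
    mv "$file.tmp" "$file" || return 1
  fi
  # Print the path so callers can read or pass along the cached Markdown.
  printf '%s\n' "$file"
}
```

Calling `fetch_md_cached https://example.com/docs` twice runs the CLI only once; the second call just prints the cached path.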

Example use cases

- Convert a technical blog post into clean Markdown before summarization or code extraction.
- Harvest documentation pages into a knowledge base or note system with minimal manual cleanup.
- Preprocess multiple article URLs to reduce token cost before feeding them into an LLM pipeline.
- Extract a page's title and description to auto-populate metadata fields in publishing workflows.
- Pull readable content into Markdown for offline review, diffing, or archiving.
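
Preprocessing several article URLs is mostly a loop around the single-page command. Below is a minimal bash sketch, assuming the `defuddle parse <url> --md -o <file>` form from this skill; the function name `fetch_many`, the `articles` default directory, and the `001.md, 002.md, ...` naming are hypothetical.

```bash
# Sketch: convert a list of article URLs (one per line on stdin) into Markdown
# files named 001.md, 002.md, ... in list order. `fetch_many` and the naming
# scheme are hypothetical; each page goes through `defuddle parse <url> --md -o`.
fetch_many() {
  local out_dir="${1:-articles}"
  local url n=0 failed=0
  mkdir -p "$out_dir" || return 1
  # `|| [ -n "$url" ]` also handles a last line with no trailing newline.
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in ''|'#'*) continue ;; esac   # skip blanks and comments
    n=$((n + 1))
    # </dev/null keeps the CLI from consuming the remaining URL list on stdin.
    if ! defuddle parse "$url" --md -o "$out_dir/$(printf '%03d' "$n").md" </dev/null; then
      printf 'failed: %s\n' "$url" >&2
      failed=$((failed + 1))
    fi
  done
  printf '%d fetched, %d failed\n' "$((n - failed))" "$failed"
  [ "$failed" -eq 0 ]   # non-zero exit status if any URL failed
}
```

Usage: `fetch_many articles < urls.txt`. The function prints a summary and returns a non-zero status if any URL failed, so it can gate a later pipeline step.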

FAQ

Is Defuddle required to use this skill?

Yes. The skill relies on the Defuddle CLI. Install it with `npm install -g defuddle-cli` and make sure the `defuddle` command is on your PATH.
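
A script that depends on the CLI can check for it up front. This is a minimal bash sketch using the package name given in this skill (`defuddle-cli`); the function name `ensure_defuddle` is hypothetical, and the PATH hint assumes a Unix-like layout where npm's global binaries live under `$(npm prefix -g)/bin`.

```bash
# Sketch: make sure the Defuddle CLI is callable, installing it if needed.
# Uses the package name given in this skill (defuddle-cli); `ensure_defuddle`
# is a hypothetical helper name.
ensure_defuddle() {
  if command -v defuddle >/dev/null 2>&1; then
    return 0
  fi
  if ! command -v npm >/dev/null 2>&1; then
    echo "npm not found: install Node.js and npm first" >&2
    return 1
  fi
  npm install -g defuddle-cli || return 1
  # The install can succeed while npm's global bin directory is not on PATH.
  if ! command -v defuddle >/dev/null 2>&1; then
    echo "defuddle installed but not on PATH; on Unix-like systems add \"\$(npm prefix -g)/bin\" to PATH" >&2
    return 1
  fi
}
```

Calling `ensure_defuddle` at the start of a script makes a missing CLI fail early with a clear message instead of a `command not found` error halfway through.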

What output formats are supported?

Use `--md` for Markdown (recommended), `--json` for JSON containing both HTML and Markdown, or no flag to get raw HTML. Use `-p <name>` to fetch a specific metadata property.

What if a page is a JavaScript-heavy single-page app?

Defuddle works best on server-rendered or static pages. For client-rendered pages, pre-render the page (or fetch a server-rendered version, if the site offers one) before parsing.
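
One way to pre-render is to let a headless browser run the page's scripts and save the resulting DOM. This is a minimal bash sketch under several assumptions: a Chromium-family browser that supports `--headless` and `--dump-dom` (the binary name varies by platform, so it is overridable through a hypothetical `BROWSER_BIN` variable), and a `defuddle parse` that accepts a local HTML file path as well as a URL, which this skill does not state. `prerender_then_parse` is a hypothetical helper name.

```bash
# Sketch: render a client-side page in a headless Chromium-family browser,
# then run Defuddle on the saved HTML. Assumptions: the browser supports
# --headless and --dump-dom (binary name varies; override with BROWSER_BIN),
# and `defuddle parse` accepts a local HTML file path as well as a URL.
# `prerender_then_parse` is a hypothetical helper name.
prerender_then_parse() {
  local url="$1"
  local browser="${BROWSER_BIN:-chromium}"
  local dir html status=0
  dir="$(mktemp -d)" || return 1
  html="$dir/page.html"
  # --dump-dom prints the page's DOM, after its scripts have run, to stdout.
  if "$browser" --headless --dump-dom "$url" > "$html"; then
    defuddle parse "$html" --md || status=$?
  else
    echo "browser render failed: $url" >&2
    status=1
  fi
  rm -rf "$dir"   # clean up the temporary HTML either way
  return "$status"
}
```

Usage: `BROWSER_BIN=google-chrome prerender_then_parse https://example.com/app > app.md`. Content that loads after the browser considers the page ready may still be missing, so spot-check the output.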