/article-extractor
This skill extracts clean article content from URLs, saves it as markdown for offline reading, and can fall back to the Wayback Machine when the original page is unavailable.
npx playbooks add skill jrajasekera/claude-skills --skill article-extractor
---
name: article-extractor
description: Extract clean article content from URLs and save as markdown. Triggers when user provides a webpage URL and wants to download it, extract content, get a clean version without ads, capture an article for offline reading, save an article, grab content from a page, archive a webpage, clip an article, or read something later. Handles blog posts, news articles, tutorials, documentation pages, and similar web content. Supports Wayback Machine for dead links or paywalled content. This skill handles the entire workflow - do NOT use web_fetch or other tools first, just call the extraction script directly with the URL.
---
# Article Extractor
Extract clean article content from URLs, removing ads, navigation, and clutter. A multi-tool fallback improves reliability when a single extractor fails.
## Workflow
When the user provides a URL to download or extract:
1. Call the extraction script directly with the URL (do NOT fetch the URL first with web_fetch)
2. The script handles fetching, extraction, and saving automatically
3. The script returns a clean markdown file with frontmatter
## Usage
```bash
# Basic extraction
scripts/extract-article.sh "https://example.com/article"
# Specify output location
scripts/extract-article.sh "https://example.com/article" -o my-article.md -d ~/Documents
# Try Wayback Machine if original fails
scripts/extract-article.sh "https://example.com/article" --wayback
```
Make the script executable if needed: `chmod +x scripts/extract-article.sh`
## Key Options
- `-o <file>` - Output filename
- `-d <dir>` - Output directory
- `-w, --wayback` - Try Wayback Machine if extraction fails
- `-t <tool>` - Force tool: `jina`, `trafilatura`, `readability`, `fallback`
- `-q` - Quiet mode
For complete options, exit codes, tool details, and examples, see [references/tools-and-options.md](references/tools-and-options.md).
## Common Failures
- **Exit 3 (access denied)**: Paywall or login required - try `--wayback`
- **Exit 4 (no content)**: Heavy JavaScript - try a different tool with `-t <tool>`
- **Exit 2 (network)**: Connection issue - check URL
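The exit codes above can be mapped to hints by a small wrapper. This is a minimal sketch, not part of the skill: `hint_for_exit`, `extract_with_hint`, and the `EXTRACT_CMD` variable are hypothetical names, and treating exit code 0 as success is an assumption based on shell convention. Only codes 2, 3, and 4 come from the list above.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; exit codes 2, 3, 4 are from the list above.

# Path to the extraction script; overridable so the helpers can be tried with a stub.
EXTRACT_CMD="${EXTRACT_CMD:-scripts/extract-article.sh}"

# Print a one-line hint for an exit code.
hint_for_exit() {
  case "$1" in
    0) echo "ok" ;;  # assumption: 0 means success
    2) echo "network: check the URL" ;;
    3) echo "access denied: try --wayback" ;;
    4) echo "no content: try a different tool with -t" ;;
    *) echo "unknown: see references/tools-and-options.md" ;;
  esac
}

# Run the extractor; on failure, print a hint to stderr and return the same exit code.
extract_with_hint() {
  local url="$1" status=0
  "$EXTRACT_CMD" "$url" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "extraction failed ($status): $(hint_for_exit "$status")" >&2
  fi
  return "$status"
}
```

After sourcing these functions, `extract_with_hint "https://example.com/article"` would run the extraction and explain any failure code it recognizes.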
## Local Tools (Optional)
For offline extraction: `scripts/install-deps.sh`
This skill extracts clean article content from a webpage URL and saves it as a markdown file with frontmatter. It removes ads, navigation, and clutter so you get a reader-friendly version for offline reading, archiving, or publishing. The tool supports fallback extractors and Wayback Machine retrieval for dead or paywalled links.
Give the script a URL and it fetches the page, runs content-extraction tools, cleans the HTML, and writes a markdown file with frontmatter. Call the extraction script directly with the URL rather than fetching the page first. The script also supports forcing a specific extractor, quiet mode, and an option to try the Wayback Machine when the original fails.
What if the extractor reports access denied or a paywall?
Try the `--wayback` option to fetch an archived copy. If that fails, the page may require credentials or advanced bypasses that the extractor can’t handle.
Extraction returned no content, or the page relies heavily on JavaScript. What now?
Retry with a different tool using `-t` (`jina`, `trafilatura`, `readability`, or `fallback`). If the failure was a network error, check the URL and your connection, and consult the exit code for details.
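The retry advice can be sketched as a loop over the four tool names listed under Key Options. This is an illustration under stated assumptions: `extract_first_working_tool` and `EXTRACT_CMD` are hypothetical names, the order of the tools is arbitrary, and a non-zero exit code is assumed to mean the attempt failed.

```bash
#!/usr/bin/env bash
# Sketch only: the function name and tool order are assumptions, not part of the skill.

# Path to the extraction script; overridable so the loop can be tried with a stub.
EXTRACT_CMD="${EXTRACT_CMD:-scripts/extract-article.sh}"

# Try each extractor in turn with -t; print the name of the first tool that succeeds.
# Returns 1 if every tool fails.
extract_first_working_tool() {
  local url="$1" tool
  for tool in jina trafilatura readability fallback; do
    if "$EXTRACT_CMD" "$url" -t "$tool" -q; then
      echo "$tool"
      return 0
    fi
  done
  return 1
}
```

Stopping at the first success avoids repeated downloads of the same page. If every tool fails, the exit code from a single direct run is the better guide to what went wrong.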