---
name: web-scraper
description: Extract data from web pages using CSS selectors, with support for pagination, rate limiting, and multiple output formats.
metadata:
  short-description: Scrape data from websites
  source:
    repository: https://github.com/cheeriojs/cheerio
    license: MIT
---
# Web Scraper Tool
## Description
Extract structured data from web pages using CSS selectors with rate limiting and pagination support.
## Trigger
- `/scrape` command
- User requests web data extraction
- User needs to parse HTML
## Usage
```bash
# Scrape a single page
python scripts/web_scraper.py --url "https://example.com" --selector ".item" --output data.json

# Scrape with multiple selectors
python scripts/web_scraper.py --url "https://example.com" --selectors "title:.title,price:.price,link:a@href"

# Scrape multiple pages
python scripts/web_scraper.py --urls urls.txt --selector ".product" --output products.json --delay 2
```
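The script's internals are not included here, but the `--selectors` flag implies a small spec format: comma-separated `field:selector` pairs, with an optional `@attr` suffix to read an attribute instead of element text. A hedged sketch of how such a spec could be parsed and applied with BeautifulSoup (the function names are illustrative, not the script's real API):

```python
from bs4 import BeautifulSoup

def parse_selector_spec(spec: str) -> dict:
    """Parse 'title:.title,link:a@href' into {field: (css_selector, attr_or_None)}."""
    fields = {}
    for pair in spec.split(","):
        name, selector = pair.split(":", 1)
        selector, _, attr = selector.partition("@")  # '@attr' -> read an attribute
        fields[name] = (selector, attr or None)
    return fields

def extract(html: str, fields: dict) -> dict:
    """Apply each field's selector to one document; missing elements become None."""
    soup = BeautifulSoup(html, "html.parser")
    record = {}
    for name, (selector, attr) in fields.items():
        el = soup.select_one(selector)
        if el is None:
            record[name] = None
        else:
            record[name] = el.get(attr) if attr else el.get_text(strip=True)
    return record
```

For example, `parse_selector_spec("title:.title,price:.price,link:a@href")` yields three fields, the last reading the `href` attribute of the first matching anchor.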
## Tags
`scraping`, `web`, `html`, `data-extraction`, `automation`
## Compatibility
- Codex: ✅
- Claude Code: ✅
## Overview
This skill extracts structured data from web pages using CSS selectors, with built-in support for pagination, rate limiting, and multiple output formats. It is designed for quick, repeatable extraction from static HTML pages and simple listing sites, turning page elements into JSON, CSV, or other structured outputs for analysis or automation.
## How It Works
You provide one or more CSS selectors that map to fields on the target pages. The tool fetches each page, applies the selectors to the parsed document, follows pagination rules when configured, and waits a configurable delay between requests. Results can be emitted as JSON or CSV, streamed, or written to a file, and the tool can also take a list of URLs to process in batch.
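As a concrete illustration of that fetch-extract-wait loop, here is a minimal, self-contained sketch using `requests` and BeautifulSoup with a fixed inter-request delay. Every name below is an assumption for illustration; the actual script shipped with this skill may differ.

```python
import json
import time

import requests
from bs4 import BeautifulSoup

def scrape(urls: list, selector: str, delay: float = 2.0) -> list:
    """Fetch each URL, collect the text of every element matching the selector,
    and sleep between requests so the target site is not hammered."""
    results = []
    for i, url in enumerate(urls):
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        results.extend({"url": url, "text": el.get_text(strip=True)}
                       for el in soup.select(selector))
        if i < len(urls) - 1:
            time.sleep(delay)  # maps to the --delay flag
    return results

if __name__ == "__main__":
    data = scrape(["https://example.com"], ".item")
    print(json.dumps(data, indent=2))
```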
## FAQ
**Which output formats are supported?**
JSON and CSV are the common outputs; the tool can also stream results or write to a specified file.

**How does pagination work?**
You can supply a pagination selector or a URL pattern. The scraper follows pages until no new items are found or a page limit is reached.
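A minimal sketch of that stopping rule, assuming the pagination rule is a CSS selector pointing at a "next page" link (all names here are hypothetical, not the script's actual API):

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def scrape_paginated(start_url: str, item_selector: str, next_selector: str,
                     max_pages: int = 50, delay: float = 2.0) -> list:
    """Follow 'next' links, collecting item text, until a page yields no new
    items or the page limit is reached."""
    items, url = [], start_url
    for _ in range(max_pages):
        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
        page_items = [el.get_text(strip=True) for el in soup.select(item_selector)]
        if not page_items:
            break  # stop: this page produced no new items
        items.extend(page_items)
        next_link = soup.select_one(next_selector)
        if next_link is None or not next_link.get("href"):
            break  # stop: no further pagination link
        url = urljoin(url, next_link["href"])  # resolve relative hrefs
        time.sleep(delay)
    return items
```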