
This skill extracts Xiaohongshu search results via a connected browser, outputting Markdown, RSS, or JSON for note lists and details.

npx playbooks add skill inclusionai/aworld --skill xhs-scraper

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
1.3 KB
---
name: xhs-scraper
description: Xiaohongshu search scraping skill - scrapes Xiaohongshu search results via agent-browser (CDP), supporting list + detail scraping and multiple output formats. Use cases: scraping note lists and bodies by keyword, generating RSS/JSON/Markdown.
---

# Xiaohongshu Scraper (xhs-scraper)

## Overview

Scrapes Xiaohongshu search results through a CDP-connected browser (agent-browser): scrolls the list page to collect card metadata, optionally opens detail pages to fetch note bodies, and outputs Markdown / RSS / JSON.

## Tool paths

- Script: `.claude/skills/xhs-scraper/scrape_xhs.sh`
- Dependencies: `agent-browser` (with CDP connected), `python3`

## Usage

```bash
./scrape_xhs.sh -k <keyword> [-p <cdp_port>] [-n <max_scrolls>] [-d <detail_count>] [-o <output_file>] [-f <format>]
```

### Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `-k` | Search keyword (required) | - |
| `-p` | CDP port | 9222 |
| `-n` | Max scroll count on the list page | 5 |
| `-d` | Number of items to open detail pages for (0 = list only) | 10 |
| `-o` | Output file path | stdout |
| `-f` | Format: `md` \| `rss` \| `json` | md |

### Examples

```bash
./scrape_xhs.sh -k "Agent开发工程师"
./scrape_xhs.sh -k "AI Agent岗位" -d 5 -f rss -o feed.xml
./scrape_xhs.sh -k "大模型面经" -n 10 -d 20 -f json -o data.json
```

Overview

This skill scrapes Xiaohongshu (Little Red Book) search results via a CDP-connected browser (agent-browser). It collects list-card metadata, optionally visits detail pages to extract full content, and exports results as Markdown, RSS, or JSON. Designed for automated keyword-driven harvesting of notes for analysis, monitoring, and feed generation.

How this skill works

The skill drives a browser using the Chrome DevTools Protocol to perform a search, scroll the list page to load cards, and capture card metadata (title, author, time, link, cover). Optionally it opens a configurable number of detail pages to extract full note text and images. Output is formatted as Markdown, RSS, or structured JSON for downstream consumption.
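For downstream consumption, the JSON output can be processed with `python3`, which the skill already depends on. The record below is a hypothetical stand-in to illustrate the idea; the actual field names are defined by `scrape_xhs.sh` and may differ:

```shell
# Hypothetical sample record standing in for `scrape_xhs.sh -f json` output;
# the real schema is whatever the script emits.
cat > sample.json <<'EOF'
[{"title": "Demo note", "author": "someone", "link": "https://example.com/1", "content": "body"}]
EOF

# Print one "title - author" line per note
python3 -c '
import json
for note in json.load(open("sample.json")):
    print(note["title"], "-", note["author"])
'
```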

When to use it

  • Collect public Xiaohongshu notes for keyword research or trend monitoring
  • Generate an RSS feed or JSON dataset from platform search results
  • Scrape list-level metadata only or include detail-level full text for deeper analysis
  • Automate periodic harvesting for competitor or content tracking
  • Prepare markdown exports for documentation or manual review

Best practices

  • Run against a stable CDP-enabled browser instance (agent-browser) to avoid session issues
  • Start with small scroll and detail limits to validate selectors before scaling up
  • Respect the platform’s terms of service and rate limits; add delays if needed
  • Choose JSON for programmatic workflows, RSS for feed consumption, Markdown for human-readable archives
  • Keep output paths and formats explicit and rotate or deduplicate results when running repeatedly
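When running repeatedly, deduplication can be as simple as merging runs by note link. A minimal coreutils sketch, using throwaway files as stand-ins for two scrape runs (the links are made up):

```shell
# Two stand-in runs containing one overlapping note link
printf 'https://example.com/note/1\nhttps://example.com/note/2\n' > run1.txt
printf 'https://example.com/note/2\nhttps://example.com/note/3\n' > run2.txt

# Merge and deduplicate; merged.txt ends up with 3 unique links
sort -u run1.txt run2.txt > merged.txt
wc -l < merged.txt
```

For JSON output you would key on the link field instead, but the same merge-then-dedupe shape applies.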

Example use cases

  • Harvest top notes for a given job-related keyword and export as Markdown for recruiter review
  • Build a daily RSS feed of new notes for a product or topic to monitor sentiment
  • Create a JSON dataset of note titles, authors, and timestamps for trend analysis
  • Scrape detailed note bodies for a small sample to train or evaluate content classifiers
  • Produce Markdown archives of candidate interview experiences or study notes for team knowledge sharing
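Periodic harvesting, such as the daily RSS feed above, fits naturally in cron. A sketch of a crontab entry; the schedule, paths, and keyword are all placeholders:

```shell
# Hypothetical crontab line: run daily at 07:00 and write the feed where a reader can poll it
0 7 * * * /path/to/scrape_xhs.sh -k "AI Agent" -d 5 -f rss -o /srv/feeds/xhs.xml
```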

FAQ

Do I need to log in to Xiaohongshu?

No. The skill scrapes public search results. Logged-in content may appear differently; use a logged-in browser if you need access to private content.

How do I control how many detail pages are visited?

Set the detail count parameter (`-d`) to limit how many items the scraper opens for full-content extraction; a value of 0 collects list metadata only.