
This skill extracts Xiaohongshu search results via a connected browser, outputting Markdown, RSS, or JSON for note lists and details.

npx playbooks add skill inclusionai/aworld --skill xhs-scraper

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
1.3 KB
---
name: xhs-scraper
description: Xiaohongshu search scraping skill - scrapes Xiaohongshu search results via agent-browser (CDP), supporting list + detail scraping and multiple output formats. Use cases: scraping note lists and bodies by keyword, generating RSS/JSON/Markdown.
---

# Xiaohongshu Scraper (xhs-scraper)

## Overview

Scrapes Xiaohongshu search results through a CDP-connected browser (agent-browser): scrolls the list page to collect card metadata, optionally opens detail pages to fetch note bodies, and outputs Markdown / RSS / JSON.

## Tool paths

- Script: `.claude/skills/xhs-scraper/scrape_xhs.sh`
- Dependencies: `agent-browser` (with CDP connected), `python3`

## Usage

```bash
./scrape_xhs.sh -k <keyword> [-p <cdp_port>] [-n <max_scrolls>] [-d <detail_count>] [-o <output_file>] [-f <format>]
```

### Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `-k` | Search keyword (required) | - |
| `-p` | CDP port | 9222 |
| `-n` | Max scroll count on the list page | 5 |
| `-d` | Number of items to open detail pages for (0 = list only) | 10 |
| `-o` | Output file path | stdout |
| `-f` | Format: `md` \| `rss` \| `json` | md |

### Examples

```bash
./scrape_xhs.sh -k "Agent开发工程师"
./scrape_xhs.sh -k "AI Agent岗位" -d 5 -f rss -o feed.xml
./scrape_xhs.sh -k "大模型面经" -n 10 -d 20 -f json -o data.json
```

Overview

This skill scrapes Xiaohongshu (Little Red Book) search results via a CDP-connected browser (agent-browser). It collects list-card metadata, optionally visits detail pages to extract full content, and exports results as Markdown, RSS, or JSON. Designed for automated keyword-driven harvesting of notes for analysis, monitoring, and feed generation.

How this skill works

The skill drives a browser using the Chrome DevTools Protocol to perform a search, scroll the list page to load cards, and capture card metadata (title, author, time, link, cover). Optionally it opens a configurable number of detail pages to extract full note text and images. Output is formatted as Markdown, RSS, or structured JSON for downstream consumption.
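For downstream consumption, the JSON output can be processed with `python3`, which the skill already depends on. The record below is a hypothetical stand-in to illustrate the idea; the actual field names are defined by `scrape_xhs.sh` and may differ:

```shell
# Hypothetical sample record standing in for `scrape_xhs.sh -f json` output;
# the real schema is whatever the script emits.
cat > sample.json <<'EOF'
[{"title": "Demo note", "author": "someone", "link": "https://example.com/1", "content": "body"}]
EOF

# Print one "title - author" line per note
python3 -c '
import json
for note in json.load(open("sample.json")):
    print(note["title"], "-", note["author"])
'
```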

When to use it

  • Collect public Xiaohongshu notes for keyword research or trend monitoring
  • Generate an RSS feed or JSON dataset from platform search results
  • Scrape list-level metadata only or include detail-level full text for deeper analysis
  • Automate periodic harvesting for competitor or content tracking
  • Prepare markdown exports for documentation or manual review

Best practices

  • Run against a stable CDP-enabled browser instance (agent-browser) to avoid session issues
  • Start with small scroll and detail limits to validate selectors before scaling up
  • Respect the platform’s terms of service and rate limits; add delays if needed
  • Choose JSON for programmatic workflows, RSS for feed consumption, Markdown for human-readable archives
  • Keep output paths and formats explicit and rotate or deduplicate results when running repeatedly
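When running repeatedly, deduplication can be as simple as merging runs by note link. A minimal coreutils sketch, using throwaway files as stand-ins for two scrape runs (the links are made up):

```shell
# Two stand-in runs containing one overlapping note link
printf 'https://example.com/note/1\nhttps://example.com/note/2\n' > run1.txt
printf 'https://example.com/note/2\nhttps://example.com/note/3\n' > run2.txt

# Merge and deduplicate; merged.txt ends up with 3 unique links
sort -u run1.txt run2.txt > merged.txt
wc -l < merged.txt
```

For JSON output you would key on the link field instead, but the same merge-then-dedupe shape applies.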

Example use cases

  • Harvest top notes for a given job-related keyword and export as Markdown for recruiter review
  • Build a daily RSS feed of new notes for a product or topic to monitor sentiment
  • Create a JSON dataset of note titles, authors, and timestamps for trend analysis
  • Scrape detailed note bodies for a small sample to train or evaluate content classifiers
  • Produce Markdown archives of candidate interview experiences or study notes for team knowledge sharing
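Periodic harvesting, such as the daily RSS feed above, fits naturally in cron. A sketch of a crontab entry; the schedule, paths, and keyword are all placeholders:

```shell
# Hypothetical crontab line: run daily at 07:00 and write the feed where a reader can poll it
0 7 * * * /path/to/scrape_xhs.sh -k "AI Agent" -d 5 -f rss -o /srv/feeds/xhs.xml
```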

FAQ

Do I need to log in to Xiaohongshu?

No. The skill scrapes public search results. Logged-in content may appear differently; use a logged-in browser if you need access to private content.

How do I control how many detail pages are visited?

Set the detail count parameter (`-d`) to limit how many items the scraper opens for full-content extraction; a value of 0 collects list metadata only.