
This skill enables targeted X (Twitter) data collection via a connected browser, filtering by user or keywords and exporting Markdown, RSS, or JSON.

npx playbooks add skill inclusionai/aworld --skill x-scraper

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
2.7 KB
---
name: x-scraper
description: X (Twitter) scraping skill - scrapes a given user's tweets or the home recommendation feed through agent-browser (CDP), with support for keyword filtering, tab switching, and multi-format output. Use cases - scrape a timeline by user/keyword, view the home recommendation feed, generate RSS/JSON/Markdown.
---

# X Scraper (x-scraper)

## Overview

Scrapes X (Twitter) content through a CDP-connected browser (agent-browser). Two scripts are included:

1. **scrape_x_user.sh** — scrapes a given user's timeline, with optional keyword filtering
2. **scrape_x_home.sh** — scrapes the logged-in user's home recommendation feed (For you / Following)

Both scripts support Markdown / RSS / JSON output.

## Tool paths

- User scraping: `./scrape_x_user.sh`
- Home feed: `./scrape_x_home.sh`
- Dependencies: `agent-browser` (CDP connected and logged into X), `python3`
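Before running either script, it helps to confirm that agent-browser's CDP endpoint is actually reachable. A minimal sketch (the helper name is ours; `/json/version` is the standard DevTools HTTP endpoint, and 9222 matches the scripts' default `-p` value):

```python
# Build the DevTools (CDP) version-check URL for a given port; fetching it
# (e.g. with curl or urllib) before scraping confirms agent-browser is up.
def cdp_version_url(port: int = 9222) -> str:
    return f"http://127.0.0.1:{port}/json/version"

print(cdp_version_url())  # http://127.0.0.1:9222/json/version
```

If the URL returns a JSON payload, the browser is connected and the scripts can attach to it.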

---

## 1. User post scraping (scrape_x_user.sh)

Scrapes a user's latest posts by username, with optional keyword search filtering.

### Usage

```bash
./scrape_x_user.sh [-u <username>] [-k <keyword>] [-p <cdp_port>] [-n <max_scrolls>] [-o <output_file>] [-f <format>]
```

### Parameters

| Parameter | Description | Default |
|------|------|------|
| `-u` | X username (without @) | Alibaba_Qwen |
| `-k` | Search keyword (optional; if omitted, all of the user's latest posts are scraped) | - |
| `-p` | CDP port | 9222 |
| `-n` | Maximum scroll count | 10 |
| `-o` | Output file path | stdout |
| `-f` | Format: `md` \| `rss` \| `json` | md |

### Examples

```bash
./scrape_x_user.sh
./scrape_x_user.sh -k qwen3
./scrape_x_user.sh -u chenchengpro -k claw -f rss -o feed.xml
./scrape_x_user.sh -u chenchengpro -f json -n 20 -o data.json
```

---

## 2. Home feed scraping (scrape_x_home.sh)

Scrapes the logged-in user's X home recommendation feed, with support for switching between the For you and Following tabs.

### Usage

```bash
./scrape_x_home.sh [-t <tab>] [-p <cdp_port>] [-n <max_scrolls>] [-o <output_file>] [-f <format>]
```

### Parameters

| Parameter | Description | Default |
|------|------|------|
| `-t` | Feed tab: `foryou` \| `following` | foryou |
| `-p` | CDP port | 9222 |
| `-n` | Maximum scroll count | 5 |
| `-o` | Output file path | stdout |
| `-f` | Format: `md` \| `rss` \| `json` | md |

### Output fields

Each post includes: `author` (display name + handle), `time` (ISO timestamp), `text` (body), `link` (post URL), `hasMedia` (whether it contains images/video), `retweet` (retweet/pinned context)
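As a sketch of downstream processing, the snippet below filters a JSON export by keyword using the field names listed above. The sample posts are hypothetical; a real export would be loaded with `json.load` from a file produced by `-f json -o posts.json`.

```python
import json

# Hypothetical sample mirroring the documented output fields; a real
# export would come from: ./scrape_x_user.sh -f json -o posts.json
posts = json.loads("""[
  {"author": "Qwen (@Alibaba_Qwen)", "time": "2025-01-01T08:00:00Z",
   "text": "qwen3 benchmark results", "link": "https://x.com/a/status/1",
   "hasMedia": true, "retweet": ""},
  {"author": "Qwen (@Alibaba_Qwen)", "time": "2025-01-02T09:00:00Z",
   "text": "office hours this week", "link": "https://x.com/a/status/2",
   "hasMedia": false, "retweet": ""}
]""")

# Keep only posts whose text mentions the keyword (case-insensitive).
keyword = "qwen3"
hits = [p["link"] for p in posts if keyword in p["text"].lower()]
print(hits)  # ['https://x.com/a/status/1']
```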

### Examples

```bash
./scrape_x_home.sh                           # Scrape the For you feed
./scrape_x_home.sh -t following -n 10        # Scrape the Following timeline
./scrape_x_home.sh -f json -o feed.json      # JSON output to a file
./scrape_x_home.sh -n 3 -f rss -o home.xml   # Small scrape, RSS output
```

Overview

This skill scrapes X (Twitter) content via a browser connected through Chrome DevTools Protocol (agent-browser). It provides two scripts to harvest a user timeline or the logged-in home recommendation stream and exports results as Markdown, RSS, or JSON. The tool supports keyword filtering, tab switching (For you / Following), and configurable scrolling limits.

How this skill works

The scripts control a CDP-connected browser to navigate X, load timelines, and scroll to collect visible posts. scrape_x_user.sh targets a specific username and can filter posts by keyword; scrape_x_home.sh captures the logged-in account’s home feed and can switch between recommendation tabs. Output includes structured fields (author, time, text, link, media flag, retweet context) and is written to stdout or a file in Markdown, RSS, or JSON.

When to use it

  • Harvest a specific user’s recent tweets for archiving or analysis.
  • Monitor a public figure or project feed and filter by keywords or topics.
  • Capture the logged-in account’s For You or Following recommendation stream for research.
  • Generate RSS or JSON feeds from X content for downstream consumption.
  • Quickly export timelines to Markdown for note-taking or reporting.

Best practices

  • Ensure agent-browser CDP is running and the browser is logged into X before running scripts.
  • Use modest max scroll counts for initial runs to avoid long sessions and to validate results.
  • Combine keyword filters with user scraping to reduce noise and focus on relevant posts.
  • Prefer JSON output for programmatic processing; use RSS for feed consumption and Markdown for human review.
  • Rotate output file paths when collecting repeated snapshots to avoid overwriting historical captures.
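One way to rotate output paths, per the last point, is to timestamp each snapshot file so repeated runs never overwrite earlier captures. A sketch; the `snapshots/` directory and naming scheme are our own choices:

```python
from datetime import datetime, timezone

def snapshot_path(prefix: str, fmt: str = "json") -> str:
    # e.g. snapshots/Alibaba_Qwen-20250101T080000Z.json
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"snapshots/{prefix}-{stamp}.{fmt}"

# Pass the result to the scripts' -o flag, e.g.:
#   ./scrape_x_user.sh -u Alibaba_Qwen -f json -o "<generated path>"
print(snapshot_path("Alibaba_Qwen"))
```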

Example use cases

  • Daily scrape of a project lead’s timeline, filtering for release-related keywords, output to JSON for downstream analytics.
  • Capture your logged-in For You feed weekly to study recommendation differences versus Following, export as RSS for review.
  • Create a Markdown digest of a researcher’s posts filtered by topic for inclusion in a weekly newsletter.
  • Fetch 10 pages of a user timeline to build a local archive and convert to JSON for import into a search index.
  • Quickly generate an RSS feed from the home recommendation stream for a private monitoring dashboard.
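For the local-archive use case above, repeated snapshots will overlap; deduplicating on the `link` field is one way to merge them before indexing. A sketch with hypothetical sample data:

```python
# Merge two hypothetical snapshot lists, keeping the first copy of each link.
snapshot_a = [{"link": "https://x.com/a/status/1", "text": "first"},
              {"link": "https://x.com/a/status/2", "text": "second"}]
snapshot_b = [{"link": "https://x.com/a/status/2", "text": "second"},
              {"link": "https://x.com/a/status/3", "text": "third"}]

merged = {}
for post in snapshot_a + snapshot_b:
    merged.setdefault(post["link"], post)  # first occurrence wins

print(len(merged))  # 3 unique posts
```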

FAQ

What prerequisites are required to run the scripts?

A running agent-browser with CDP enabled and logged into X, plus Python 3 installed on the system.

How do I limit how many posts are collected?

Use the -n parameter to control the maximum number of scrolls; lower values reduce the number of posts fetched.

Which output format should I choose?

Use JSON for automated processing, RSS for feed readers, and Markdown for human-friendly summaries.