This skill fetches a URL, extracts the article body in Markdown by default, and supports various output formats and browser-based strategies.
npx playbooks add skill dcjanus/prompts --skill fetch-url
---
name: fetch-url
description: Fetch a link and extract its main content (Markdown by default); built-in X/Twitter URL handling improves the fetch success rate on restricted pages.
---
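Run from the directory containing this file: `./scripts/fetch_url.py URL` (only `http` / `https` URLs are supported).
Note: the script must be executed directly as an executable file.
Example invocation (do not use `uv run python` or `python`):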
```bash
cd skills/fetch-url && ./scripts/fetch_url.py https://example.com --output ./page.md
```
Incorrect examples:
```bash
uv run python skills/fetch-url/scripts/fetch_url.py https://example.com --output ./page.md
python skills/fetch-url/scripts/fetch_url.py https://example.com --output ./page.md
```
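If direct execution fails with a "Permission denied" error, the execute bit may be missing from the script. The following is a minimal sketch of a check that could run before the first call; `ensure_executable` is a hypothetical helper name, not part of the skill.

```bash
# Hypothetical helper: add the execute bit only when it is missing.
ensure_executable() {
  [ -x "$1" ] || chmod +x "$1"
}

script=./scripts/fetch_url.py
if [ -f "$script" ]; then
  ensure_executable "$script"
  # Direct execution, as the skill requires.
  "$script" https://example.com --output ./page.md
else
  echo "Not found: $script (run this from the skill directory)" >&2
fi
```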
By default the script auto-detects a local Chromium-based browser; if none is found, install the Playwright browser:
```bash
uv run playwright install chromium
```
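To see in advance whether a Chromium-based browser is available, a manual check along these lines may help. This is only a sketch: the candidate binary names are assumptions, and the script's actual detection logic is not documented here, so it may look in other places.

```bash
# Hypothetical helper: print the first Chromium-family binary found on PATH.
# The candidate names below are assumptions, not the script's detection list.
find_chromium() {
  for name in chromium chromium-browser google-chrome google-chrome-stable microsoft-edge brave-browser; do
    if command -v "$name" >/dev/null 2>&1; then
      command -v "$name"
      return 0
    fi
  done
  return 1
}

if browser_path="$(find_chromium)"; then
  echo "Found browser: $browser_path"
  echo "It can be passed explicitly with: --browser-path $browser_path"
else
  echo "No Chromium-family browser on PATH; consider: uv run playwright install chromium"
fi
```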
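Options:
- `--output`: write the output to a file (default: stdout).
- `--timeout-ms`: Playwright navigation timeout in milliseconds (default: 60000).
- `--browser-path`: path to a local Chromium-based browser (default: auto-detected).
- `--output-format`: output format (default: `markdown`); supported values are `csv`, `html`, `json`, `markdown`, `raw-html`, `txt`, `xml`, `xmltei`. `raw-html` outputs the rendered HTML directly, bypassing trafilatura.
- `--fetch-strategy`: available only with `markdown` output; supports `auto`, `agent`, `jina`, `browser`. Default: `auto`.
Common `--fetch-strategy` values:
- `auto`: the default choice.
- `agent`: prefers Markdown content negotiation with the origin site.
- `jina`: prefers Jina Reader.
- `browser`: uses local Playwright directly.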
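`auto` is the documented default, and its internal ordering is not stated here. Where explicit control over the order is wanted, a wrapper could try the strategies one by one. The sketch below assumes the script exits with a non-zero status when a fetch fails, which this document does not confirm; `fetch_with_fallback` and the `FETCH_URL` variable are hypothetical names.

```bash
# Path to the script; overridable for testing. Hypothetical variable name.
FETCH_URL="${FETCH_URL:-./scripts/fetch_url.py}"

# Try each strategy in turn and print the name of the first one that succeeds.
# Assumes a failed fetch yields a non-zero exit status (unverified).
fetch_with_fallback() {
  url="$1"
  out="$2"
  for strategy in agent jina browser; do
    if "$FETCH_URL" "$url" --fetch-strategy "$strategy" --output "$out"; then
      echo "$strategy"
      return 0
    fi
  done
  return 1
}
```

A call such as `fetch_with_fallback https://example.com ./page.md` would then write the page to `./page.md` and print which strategy produced it.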
Environment variables:
- Set `JINA_API_KEY` to raise the Jina Reader rate limit: `JINA_API_KEY=your-token ./scripts/fetch_url.py ...`
Examples:
```bash
./scripts/fetch_url.py https://example.com --output ./page.md --timeout-ms 60000
./scripts/fetch_url.py https://example.com --fetch-strategy jina
JINA_API_KEY=your-token ./scripts/fetch_url.py https://example.com --fetch-strategy jina
./scripts/fetch_url.py https://example.com --fetch-strategy browser
./scripts/fetch_url.py https://x.com/jack/status/20 --output-format markdown
./scripts/fetch_url.py https://x.com/jack/status/20 --output-format markdown --fetch-strategy browser
```
Reference: [`scripts/fetch_url.py`](scripts/fetch_url.py)
The fetch-url skill fetches a web page and extracts its main text content (default output: Markdown). It includes special handling for X/Twitter URLs to improve retrieval success on restricted pages. The tool runs as a standalone executable script and supports multiple output formats and fetch strategies.
Run the script directly from its directory as an executable. Depending on the chosen fetch strategy, it navigates to the URL with Playwright, calls Jina Reader, or prefers Markdown negotiated with the origin site. For browser-based fetching it can detect a local Chromium-based browser automatically or use a Playwright-installed one. The script then extracts the content (with trafilatura, except for `raw-html`, which outputs the rendered HTML as-is) and writes the result to stdout or a file in the requested format.
How do I run the script?
Change to the fetch-url directory and run the script directly as an executable: `./scripts/fetch_url.py https://example.com --output ./page.md`. Do not call it through `python` or `uv run python`.
What if Chromium is not found on my system?
Install the Playwright browser (for example, `uv run playwright install chromium`) or provide a local Chromium path with `--browser-path`.