This skill renders HTML content into images by driving agent-browser and capturing screenshots for ready-to-publish visuals.
Install with: `npx playbooks add skill inclusionai/aworld --skill html-to-image`
---
name: html-to-image
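description: HTML-to-image skill. Renders an HTML file or HTML content through agent-browser and captures a screenshot as an image. Suited to generating infographics, social media graphics, data visualization screenshots, and similar use cases.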
---
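# HTML to Image (html-to-image)
## Overview
Renders an HTML file or HTML content into an image through agent-browser. Typical usage: Claude generates polished HTML → this skill takes a screenshot → you get an image that is ready to publish.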
## Tool paths
- Script: `.claude/skills/html-to-image/html_to_image.sh`
- Dependencies: `agent-browser` (with CDP already connected), `python3`
## Usage
```bash
./html_to_image.sh -o <output> [-f <html_file> | -c <html_content>] [-w <width>] [-e <height>] [-p <cdp_port>] [--full | --no-full]
```
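### Parameters
| Parameter | Description | Required | Default |
|------|------|------|------|
| `-o` | Output image path (.png) | Yes | - |
| `-f` | HTML file path (mutually exclusive with `-c`) | One of the two | - |
| `-c` | HTML content string (mutually exclusive with `-f`) | One of the two | - |
| `-w` | Viewport width | No | 1080 |
| `-e` | Viewport height (full-page screenshot if omitted) | No | - |
| `-p` | CDP port | No | 9222 |
| `--full` | Full-page screenshot (ignores the viewport height limit) | No | On by default |
| `--no-full` | Capture only the fixed viewport instead of the full page (as used in the fixed-viewport example) | No | - |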
### Examples
```bash
# Screenshot from an HTML file
./html_to_image.sh \
-f card.html -o card.png
# Pass HTML content directly
./html_to_image.sh \
-c '<html><body><h1>Hello</h1></body></html>' \
-o hello.png
# Set the width (sized for mobile)
./html_to_image.sh \
-f infographic.html -o output.png -w 750
# Fixed-viewport screenshot (not full page)
./html_to_image.sh \
-f page.html -o output.png -w 1080 -e 1920 --no-full
```
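When several HTML files need rendering, the single-file invocation above can be wrapped in a loop. The following is a minimal sketch, not part of the skill itself: the function name `render_dir` and the `HTML_TO_IMAGE` variable are hypothetical, and only the documented `-f`, `-o`, and `-w` flags are used.

```bash
#!/usr/bin/env bash
# Sketch: render every *.html file in a directory to a PNG of the same name.
# HTML_TO_IMAGE is a hypothetical override for the script location.
HTML_TO_IMAGE="${HTML_TO_IMAGE:-./html_to_image.sh}"

render_dir() {
  local src_dir="$1" out_dir="$2" width="${3:-1080}"
  local html
  mkdir -p "$out_dir" || return 1
  for html in "$src_dir"/*.html; do
    # Skip the literal pattern when the directory has no HTML files.
    [ -e "$html" ] || continue
    "$HTML_TO_IMAGE" -f "$html" \
      -o "$out_dir/$(basename "$html" .html).png" \
      -w "$width" || return 1
  done
}
```

For example, `render_dir ./cards ./out 750` would call the script once per file and stop at the first failure.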
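### Typical workflow
1. Claude generates polished HTML from the content (infographics, cards, and so on)
2. Use this skill to capture it as a PNG
3. Pass the screenshot to `xhs-publisher` to publish it on Xiaohongshu (小红书)
```bash
# Generate the image
./html_to_image.sh -f card.html -o card.png
# Publish to Xiaohongshu
./.claude/skills/xhs-publisher/publish_xhs.sh -t "Title" -c "Body text" -i card.png
```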
This skill converts HTML files or inline HTML content into PNG images by rendering them in a browser driven by agent-browser and taking a screenshot. It is designed to turn generated HTML (cards, infographics, visualizations) into publishable images quickly. The tool is script-driven and works over a Chrome DevTools Protocol (CDP) connection.
The script loads the given HTML (from a file or a content string) into an agent-browser instance connected via CDP, sets the viewport size, and captures a screenshot. You can request a viewport-limited capture or a full-page screenshot. The output is a PNG file written to the specified path.
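The source does not show how a content string reaches the browser. One plausible approach, sketched here purely as an assumption, is to write the string to a temporary file and hand the browser a `file://` URL. The function name `html_content_to_url` is hypothetical, and the real script may work differently.

```bash
#!/usr/bin/env bash
# Sketch (assumption, not the skill's actual code): turn an HTML string
# into a file:// URL that a browser could load.
html_content_to_url() {
  local dir
  dir="$(mktemp -d)" || return 1
  # Write the content verbatim, without interpreting escapes.
  printf '%s' "$1" > "$dir/page.html" || return 1
  # python3 is already a listed dependency; pathlib builds a valid file URI.
  python3 -c 'import pathlib, sys; print(pathlib.Path(sys.argv[1]).resolve().as_uri())' \
    "$dir/page.html"
}
```

Calling `html_content_to_url '<h1>Hello</h1>'` prints a URL ending in `page.html` under a fresh temporary directory.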
What inputs are supported?
You can provide an HTML file path or pass HTML content as a string; one of these is required.
How do I control output dimensions?
Use `-w` to set the viewport width and `-e` to set the viewport height. Omit `-e` or pass `--full` to capture the full page height.
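To confirm what dimensions a capture actually produced, the PNG header can be read directly. This is a small sketch with a hypothetical helper name, `png_size`; it relies only on `python3` and the standard PNG layout (an 8-byte signature followed by the IHDR chunk). The pixel width may differ from the `-w` value if the browser applies a device scale factor other than 1.

```bash
#!/usr/bin/env bash
# Sketch: print "<width> <height>" in pixels, read from a PNG's IHDR chunk.
png_size() {
  python3 - "$1" <<'PY'
import struct
import sys

with open(sys.argv[1], "rb") as f:
    header = f.read(24)

# Bytes 0-7 are the PNG signature; bytes 12-15 must name the IHDR chunk.
if header[:8] != b"\x89PNG\r\n\x1a\n" or header[12:16] != b"IHDR":
    sys.exit("not a PNG file: " + sys.argv[1])

# Width and height are big-endian 32-bit integers at bytes 16-23.
width, height = struct.unpack(">II", header[16:24])
print(width, height)
PY
}
```

For instance, `png_size output.png` after the fixed-viewport example would be expected to report a height tied to `-e`, while a full-page capture reports the rendered page height.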
What if the browser is not reachable?
Confirm that agent-browser is running and that CDP is listening on the expected port (default 9222). Pass `-p` if the browser uses a different port.
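A quick reachability check can run before the script. The sketch below assumes the browser exposes the standard DevTools HTTP endpoint `/json/version` on localhost and that `curl` is installed; the helper names `cdp_version_url` and `cdp_is_up` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that a CDP endpoint answers before running html_to_image.sh.

# Build the DevTools version URL for a port (default 9222, as documented).
cdp_version_url() {
  local port="${1:-9222}"
  printf 'http://127.0.0.1:%s/json/version\n' "$port"
}

# Return 0 if the endpoint responds within 2 seconds, non-zero otherwise.
cdp_is_up() {
  curl --silent --fail --max-time 2 "$(cdp_version_url "${1:-}")" >/dev/null
}
```

A typical guard would be `cdp_is_up 9222 || echo "CDP not reachable on port 9222" >&2`, followed by the normal `html_to_image.sh` call with a matching `-p` value.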