
This skill renders HTML content into images by driving agent-browser and capturing screenshots for ready-to-publish visuals.

npx playbooks add skill inclusionai/aworld --skill html-to-image

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: html-to-image
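description: HTML-to-image skill - renders an HTML file or HTML content in agent-browser and captures it as an image. Suited to infographics, social media graphics, data visualization screenshots, and similar uses.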
---

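# HTML to Image (html-to-image)

## Overview

Renders an HTML file or HTML content to an image through agent-browser. Typical use: Claude generates polished HTML → this skill takes a screenshot → you get an image that is ready to publish.

## Tool paths

- Script: `.claude/skills/html-to-image/html_to_image.sh`
- Dependencies: `agent-browser` (CDP connected), `python3`

## Usage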

```bash
./html_to_image.sh -o <output> [-f <html_file> | -c <html_content>] [-w <width>] [-e <height>] [-p <cdp_port>] [--full | --no-full]
```

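### Parameters

| Parameter | Description | Required | Default |
|------|------|------|------|
| `-o` | Output image path (.png) | Yes | - |
| `-f` | HTML file path (mutually exclusive with `-c`) | One of `-f` / `-c` | - |
| `-c` | HTML content string (mutually exclusive with `-f`) | One of `-f` / `-c` | - |
| `-w` | Viewport width | No | 1080 |
| `-e` | Viewport height (full-page screenshot if omitted) | No | - |
| `-p` | CDP port | No | 9222 |
| `--full` | Full-page screenshot (ignores the viewport height limit); disable with `--no-full` | No | On by default |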

### Examples

```bash
# Screenshot from an HTML file
./html_to_image.sh \
  -f card.html -o card.png

# Pass HTML content directly
./html_to_image.sh \
  -c '<html><body><h1>Hello</h1></body></html>' \
  -o hello.png

# Set the width (fits mobile dimensions)
./html_to_image.sh \
  -f infographic.html -o output.png -w 750

# Fixed-viewport screenshot (not full page)
./html_to_image.sh \
  -f page.html -o output.png -w 1080 -e 1920 --no-full
```

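### Typical workflow

1. Claude generates polished HTML from the content (infographics, cards, and so on)
2. Use this skill to capture it as a PNG
3. Pass the screenshot to `xhs-publisher` to publish it to Xiaohongshu

```bash
# Generate the image
./html_to_image.sh -f card.html -o card.png

# Publish to Xiaohongshu
./.claude/skills/xhs-publisher/publish_xhs.sh -t "Title" -c "Body text" -i card.png
```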

Overview

This skill converts HTML files or inline HTML content into PNG images by rendering them in an agent-connected browser and taking a screenshot. It is designed to turn generated HTML (cards, infographics, visualizations) into publishable images quickly. The tool is script-driven and works over a Chrome DevTools Protocol (CDP) connection.

How this skill works

The script loads the given HTML (from a file or a content string) into an agent-browser instance connected via CDP, sets the viewport size, and captures a screenshot. You can request either a viewport-limited capture or a full-page screenshot. The output is a PNG file written to the specified path.
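
For the content-string path, the HTML has to become something a browser can load. The sketch below shows one plausible way to do that: write the string to a temporary file and hand the browser a file:// URL. This is an assumption for illustration, not the script's actual implementation, and html_content_to_url is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of html_to_image.sh): write an HTML string
# to a temporary file and print a file:// URL that a browser could open.
html_content_to_url() {
  local content="$1"
  local dir
  dir="$(mktemp -d)" || return 1
  # printf '%s' writes the content verbatim, without interpreting escapes.
  printf '%s' "$content" > "$dir/page.html"
  printf 'file://%s/page.html\n' "$dir"
}
```

The URL is built by plain concatenation, so a temporary directory whose path contains spaces or other special characters would need percent-encoding first.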

When to use it

  • Generate shareable images from programmatically produced HTML
  • Create social media cards or blog visuals from HTML templates
  • Capture data visualizations rendered in-browser for reports
  • Batch-convert HTML snippets into images for publishing workflows
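
The batch-conversion case can be sketched as a loop over a directory. The script path and the -f / -o flags come from SKILL.md; the batch_convert function and the HTML_TO_IMAGE variable are hypothetical names introduced here.

```bash
#!/usr/bin/env bash
# Script path as listed in SKILL.md; override via the environment if it lives elsewhere.
HTML_TO_IMAGE="${HTML_TO_IMAGE:-.claude/skills/html-to-image/html_to_image.sh}"

# Hypothetical helper: convert every *.html file in a directory to a PNG
# with the same base name, continuing past individual failures.
batch_convert() {
  local dir="$1" html status=0
  for html in "$dir"/*.html; do
    [ -e "$html" ] || continue   # the glob matched nothing
    "$HTML_TO_IMAGE" -f "$html" -o "${html%.html}.png" || status=1
  done
  return "$status"
}
```

Calling batch_convert ./cards would write ./cards/a.png next to ./cards/a.html, and the function returns non-zero if any single conversion failed.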

Best practices

  • Provide complete HTML including styles and fonts to ensure consistent rendering
  • Prefer full-page screenshots for long content; use viewport height for fixed-size artboards
  • Test viewport width and height locally to match target platform aspect ratios
  • Ensure agent-browser/CDP is running and accessible on the specified port before invoking the script
  • Use semantic, self-contained HTML so screenshots do not depend on external network resources
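
To act on the self-contained-HTML advice, a rough pre-check can flag files that pull in remote resources. This is a heuristic sketch, not part of the skill; has_external_refs is a hypothetical name, and the pattern only looks at src/href attributes and CSS url(...) values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the HTML file appears to reference
# http(s) or protocol-relative resources, 1 if none are found.
has_external_refs() {
  grep -Eiq \
    -e "(src|href)[[:space:]]*=[[:space:]]*[\"']?(https?:)?//" \
    -e "url\([[:space:]]*[\"']?(https?:)?//" \
    "$1"
}
```

The check can over-report, since an ordinary hyperlink in an href also matches even though it does not affect rendering. Treat a match as a prompt to inspect the file rather than as an error.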

Example use cases

  • Turn Claude-generated HTML templates into PNGs for social posts
  • Snapshot interactive charts or dashboards into static images for documents
  • Produce thumbnails or preview images for web content pipelines
  • Create sized images for mobile feeds by setting a mobile-width viewport
  • Integrate into a publish workflow: generate HTML → capture PNG → upload to a platform

FAQ

What inputs are supported?

You can provide an HTML file path or pass HTML content as a string; one of these is required.
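
A wrapper can enforce the either-or rule before calling the script. How html_to_image.sh itself reports a missing or duplicated input is not documented here, so the check below is a caller-side sketch and validate_input is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only when exactly one of the two inputs
# (file path for -f, content string for -c) is non-empty.
validate_input() {
  local html_file="$1" html_content="$2"
  if [ -n "$html_file" ] && [ -n "$html_content" ]; then
    echo "error: pass either -f or -c, not both" >&2
    return 2
  elif [ -z "$html_file" ] && [ -z "$html_content" ]; then
    echo "error: one of -f or -c is required" >&2
    return 2
  fi
}
```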

How do I control output dimensions?

Use -w to set viewport width and -e for viewport height. Omit -e or use --full to capture the full page height; to capture only the viewport, set -e and pass --no-full, as in the SKILL.md example.
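
The two sizing modes can be wrapped in a small helper that prints the matching flags. The flags themselves (-w, -e, --full, --no-full) appear in SKILL.md; capture_flags is a hypothetical name, and the 1080 fallback mirrors the documented default width.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the size-related flags for html_to_image.sh.
# With a height, request a fixed-viewport capture; without one, a full-page capture.
capture_flags() {
  local width="${1:-1080}" height="${2:-}"
  if [ -n "$height" ]; then
    printf '%s\n' "-w $width -e $height --no-full"
  else
    printf '%s\n' "-w $width --full"
  fi
}
```

The output is meant to be word-split on purpose, for example: ./html_to_image.sh -f page.html -o out.png $(capture_flags 1080 1920)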

What if the browser is not reachable?

Confirm agent-browser is running and CDP is listening on the port (default 9222). Adjust -p if using a different port.
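
A quick preflight check can tell you whether anything is listening on the port before you invoke the script. This sketch relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh; cdp_port_open is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 0 if something accepts TCP connections on the
# given local port, 2 if the port is not a number, non-zero otherwise.
cdp_port_open() {
  local port="${1:-9222}"
  case "$port" in
    ''|*[!0-9]*) echo "error: invalid port: $port" >&2; return 2 ;;
  esac
  # The subshell opens and immediately closes the connection.
  (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null
}
```

An open port only shows that some process is listening; it does not prove the listener speaks CDP. A typical guard would be: cdp_port_open 9222 || echo "start agent-browser first" >&2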