
heygen skill


This skill helps you generate AI avatar videos and manage prompts, scenes, and translations via the HeyGen API, including in Remotion workflows.

npx playbooks add skill heygen-com/skills --skill heygen

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: heygen
description: |
  HeyGen AI video creation API. Use when: (1) Using Video Agent for one-shot prompt-to-video generation, (2) Generating AI avatar videos with /v2/video/generate, (3) Working with HeyGen avatars, voices, backgrounds, or captions, (4) Creating transparent WebM videos for compositing, (5) Polling video status or handling webhooks, (6) Integrating HeyGen with Remotion for programmatic video, (7) Translating or dubbing existing videos, (8) Generating standalone TTS audio with the Starfish model via /v1/audio.
homepage: https://docs.heygen.com/reference/generate-video-agent
allowed-tools: mcp__heygen__*
metadata:
  openclaw:
    requires:
      env:
        - HEYGEN_API_KEY
    primaryEnv: HEYGEN_API_KEY
---

# HeyGen API

AI avatar video creation API for generating talking-head videos, explainers, and presentations.

## Tool Selection

If HeyGen MCP tools are available (`mcp__heygen__*`), **prefer them** over direct HTTP API calls — they handle authentication and request formatting automatically.

| Task | MCP Tool | Fallback (Direct API) |
|------|----------|----------------------|
| Generate video from prompt | `mcp__heygen__generate_video_agent` | `POST /v1/video_agent/generate` |
| Check video status / get URL | `mcp__heygen__get_video` | `GET /v1/video_status.get` |
| List account videos | `mcp__heygen__list_videos` | `GET /v1/video.list` |
| Generate TTS audio | `mcp__heygen__text_to_speech` | `POST /v1/audio/text_to_speech` |
| List TTS voices | `mcp__heygen__list_audio_voices` | `GET /v1/audio/voices` |
| Delete a video | `mcp__heygen__delete_video` | `DELETE /v1/video.delete` |

If no HeyGen MCP tools are available, make direct HTTP API calls with the `X-Api-Key: $HEYGEN_API_KEY` header, as documented in the reference files.
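A minimal Python sketch of building such a request with only the standard library, assuming the base URL `https://api.heygen.com` (confirm it in references/authentication.md). The helper name `heygen_request` is hypothetical; it builds the request without sending it, so the auth header, query string, and JSON body can be inspected first.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: public API base URL; confirm in references/authentication.md.
BASE_URL = "https://api.heygen.com"


def heygen_request(method, path, api_key=None, body=None, params=None):
    """Build (but do not send) an authenticated HeyGen API request."""
    key = api_key or os.environ["HEYGEN_API_KEY"]
    url = BASE_URL + path
    if params:
        # Query-string parameters, e.g. {"video_id": "abc"} for video_status.get
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"X-Api-Key": key, "Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Sending it is a network call (not run here):
# with urllib.request.urlopen(req) as resp:
#     payload = json.load(resp)
```

For example, `heygen_request("GET", "/v1/video_status.get", params={"video_id": vid})` builds the status check used in the workflow below.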

## Default Workflow

**Prefer Video Agent** for most video requests.
Always use [prompt-optimizer.md](references/prompt-optimizer.md) guidelines to structure prompts with scenes, timing, and visual styles.

**With MCP tools:**
1. Write an optimized prompt using [prompt-optimizer.md](references/prompt-optimizer.md) → [visual-styles.md](references/visual-styles.md)
2. Call `mcp__heygen__generate_video_agent` with the prompt and config (duration_sec, orientation, avatar_id)
3. Call `mcp__heygen__get_video` with the returned video_id to poll status and get the download URL

**Without MCP tools (direct API):**
1. Write an optimized prompt using [prompt-optimizer.md](references/prompt-optimizer.md) → [visual-styles.md](references/visual-styles.md)
2. `POST /v1/video_agent/generate` — see [video-agent.md](references/video-agent.md)
3. `GET /v1/video_status.get?video_id=<id>` — see [video-status.md](references/video-status.md)
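For step 2, a sketch of the request body and of reading the returned id. The body shape is an assumption: it reuses the MCP tool's config field names (duration_sec, orientation, avatar_id) nested under a `config` key, and it assumes the id comes back as `data.video_id`. Check both against references/video-agent.md; `video_agent_body` and `extract_video_id` are hypothetical helpers.

```python
def video_agent_body(prompt, duration_sec=None, orientation=None, avatar_id=None):
    """Build a JSON body for POST /v1/video_agent/generate.

    Assumption: optional settings nest under "config" with the same names the
    MCP tool accepts; unset options are omitted so server defaults apply.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    options = {"duration_sec": duration_sec, "orientation": orientation, "avatar_id": avatar_id}
    config = {k: v for k, v in options.items() if v is not None}
    body = {"prompt": prompt}
    if config:
        body["config"] = config
    return body


def extract_video_id(response_json):
    """Return data.video_id from a parsed response (assumed location)."""
    video_id = (response_json.get("data") or {}).get("video_id")
    if not video_id:
        raise ValueError(f"no video_id in response: {response_json!r}")
    return video_id
```

Send the body with the `X-Api-Key` header, then pass the extracted id to the status check in step 3.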

Use `POST /v2/video/generate` only when the user explicitly needs:
- Exact script without AI modification
- Specific voice_id selection
- Different avatars/backgrounds per scene
- Precise per-scene timing control
- Programmatic/batch generation with exact specs
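When those requirements apply, each scene becomes one entry in the request. The sketch below assumes the `video_inputs` / `character` / `voice` / `background` / `dimension` field names for the v2 endpoint and a default 1920×1080 landscape size; treat them as assumptions to verify in references/video-generation.md. `Scene` and `build_v2_payload` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    avatar_id: str
    voice_id: str
    script: str                        # spoken verbatim, no AI rewriting
    background_color: str = "#FFFFFF"


def build_v2_payload(scenes, width=1920, height=1080):
    """Build a POST /v2/video/generate body with one video_inputs entry per scene.

    Field names are assumptions; confirm them in references/video-generation.md.
    """
    if not scenes:
        raise ValueError("at least one scene is required")
    return {
        "video_inputs": [
            {
                "character": {"type": "avatar", "avatar_id": s.avatar_id, "avatar_style": "normal"},
                "voice": {"type": "text", "input_text": s.script, "voice_id": s.voice_id},
                "background": {"type": "color", "value": s.background_color},
            }
            for s in scenes
        ],
        "dimension": {"width": width, "height": height},
    }
```

Keeping each scene's avatar, voice, and script together in one object makes per-scene differences explicit, which is the main reason to choose this endpoint over Video Agent.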

## Quick Reference

| Task | MCP Tool | Read |
|------|----------|------|
| Generate video from prompt (easy) | `mcp__heygen__generate_video_agent` | [prompt-optimizer.md](references/prompt-optimizer.md) → [visual-styles.md](references/visual-styles.md) → [video-agent.md](references/video-agent.md) |
| Generate video with precise control | — | [video-generation.md](references/video-generation.md), [avatars.md](references/avatars.md), [voices.md](references/voices.md) |
| Check video status / get download URL | `mcp__heygen__get_video` | [video-status.md](references/video-status.md) |
| Add captions or text overlays | — | [captions.md](references/captions.md), [text-overlays.md](references/text-overlays.md) |
| Transparent video for compositing | — | [video-generation.md](references/video-generation.md) (WebM section) |
| Generate standalone TTS audio | `mcp__heygen__text_to_speech` | [text-to-speech.md](references/text-to-speech.md) |
| List TTS voices | `mcp__heygen__list_audio_voices` | [voices.md](references/voices.md) |
| Translate/dub existing video | — | [video-translation.md](references/video-translation.md) |
| Use with Remotion | — | [remotion-integration.md](references/remotion-integration.md) |

## Reference Files

### Foundation
- [references/authentication.md](references/authentication.md) - API key setup and X-Api-Key header
- [references/quota.md](references/quota.md) - Credit system and usage limits
- [references/video-status.md](references/video-status.md) - Polling patterns and download URLs
- [references/assets.md](references/assets.md) - Uploading images, videos, audio

### Core Video Creation
- [references/avatars.md](references/avatars.md) - Listing avatars, styles, avatar_id selection
- [references/voices.md](references/voices.md) - Listing voices, locales, speed/pitch
- [references/scripts.md](references/scripts.md) - Writing scripts, pauses, pacing
- [references/video-generation.md](references/video-generation.md) - POST /v2/video/generate and multi-scene videos
- [references/video-agent.md](references/video-agent.md) - One-shot prompt video generation
- [references/prompt-optimizer.md](references/prompt-optimizer.md) - Writing effective Video Agent prompts (core workflow + rules)
- [references/visual-styles.md](references/visual-styles.md) - 20 named visual styles with full specs
- [references/prompt-examples.md](references/prompt-examples.md) - Full production prompt example + ready-to-use templates
- [references/dimensions.md](references/dimensions.md) - Resolution and aspect ratios

### Video Customization
- [references/backgrounds.md](references/backgrounds.md) - Solid colors, images, video backgrounds
- [references/text-overlays.md](references/text-overlays.md) - Adding text with fonts and positioning
- [references/captions.md](references/captions.md) - Auto-generated captions and subtitles

### Advanced Features
- [references/templates.md](references/templates.md) - Template listing and variable replacement
- [references/video-translation.md](references/video-translation.md) - Translating videos and dubbing
- [references/text-to-speech.md](references/text-to-speech.md) - Standalone TTS audio with Starfish model
- [references/photo-avatars.md](references/photo-avatars.md) - Creating avatars from photos
- [references/webhooks.md](references/webhooks.md) - Webhook endpoints and events

### Integration
- [references/remotion-integration.md](references/remotion-integration.md) - Using HeyGen in Remotion compositions

Overview

This skill provides a concise interface and best practices for generating AI avatar videos with HeyGen. It covers using the Video Agent for one-shot prompt-to-video generation and the v2/video/generate endpoint when you need exact per-scene control. The skill also explains creating transparent WebM outputs, standalone TTS audio, and integrating HeyGen output into Remotion pipelines.

How this skill works

Use the Video Agent API (POST /v1/video_agent/generate) for most prompt-driven video requests, structuring prompts with scenes, timing, and visual style according to the prompt-optimizer guidelines. Use POST /v2/video/generate when you need strict script fidelity, a specific voice_id, a different avatar or background per scene, or programmatic batch generation. Generation is asynchronous: poll the status endpoint (GET /v1/video_status.get) or use webhooks to detect completion and retrieve the download URL. For audio only, call POST /v1/audio/text_to_speech, which uses the Starfish TTS model, to produce standalone speech.
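A sketch of a polling loop with a timeout and exponential backoff, a reasonable default when webhooks are not set up. Here `fetch_status` is a caller-supplied (hypothetical) function that performs GET /v1/video_status.get and returns the parsed `data` object; the terminal status strings `completed` and `failed` and the `video_url` key are assumptions to confirm in references/video-status.md. Injecting `sleep` and `clock` keeps the loop testable without real waiting.

```python
import time


class VideoFailed(RuntimeError):
    pass


def wait_for_video(fetch_status, video_id, interval=10.0, max_interval=60.0,
                   timeout=1800.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the video completes, then return its status payload.

    Assumptions: "completed" and "failed" are the terminal statuses; any
    other status means keep waiting.
    """
    deadline = clock() + timeout
    delay = interval
    while True:
        data = fetch_status(video_id)
        status = data.get("status")
        if status == "completed":
            return data  # assumed to carry the download URL (e.g. "video_url")
        if status == "failed":
            raise VideoFailed(f"video {video_id} failed: {data.get('error')!r}")
        if clock() + delay > deadline:
            raise TimeoutError(f"video {video_id} still {status!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_interval)  # back off to avoid hammering the API
```

Doubling the delay up to a cap keeps early checks responsive for short videos while limiting request volume for long renders.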

When to use it

  • Create quick talking-head or explainer videos from a natural language prompt (use Video Agent).
  • Control exact per-scene timing, a specific voice_id, or multiple avatars/backgrounds (use v2/video/generate).
  • Produce transparent WebM videos for compositing over other footage or UIs.
  • Translate or dub an existing video into another language with automated lip-sync and voice options.
  • Generate standalone TTS audio (Starfish model) for podcasts, IVR, or pairing with custom visuals.
  • Integrate generated assets into Remotion for programmatic, frame-accurate compositions.
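For the Remotion case, a composition is declared with an integer `durationInFrames` at a chosen `fps`. The hypothetical helper below converts a clip length in seconds into frames, rounding up so the avatar clip's final partial frame is not cut off; where the duration comes from (for example, a duration field in the status response) is an assumption.

```python
import math


def duration_in_frames(duration_sec, fps=30):
    """Whole frames needed to show a clip of duration_sec at fps.

    Rounds up so the final partial frame is not cut off. The inner round()
    guards against float products landing a hair above a whole number.
    """
    if duration_sec < 0 or fps <= 0:
        raise ValueError("duration_sec must be >= 0 and fps > 0")
    return math.ceil(round(duration_sec * fps, 6))
```

For example, a 30-second HeyGen clip at 30 fps needs 900 frames, and a 12.34-second clip needs 371.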

Best practices

  • Prefer Video Agent for general prompt-to-video workflows; reserve v2/video/generate for strict, deterministic specifications.
  • Write prompts with clear scenes, durations, visual references, and desired pacing using the prompt optimizer format.
  • Use the avatar and voice listings to test several combinations; pin a voice_id only when you need repeatable results.
  • Choose WebM with an alpha channel for compositing; verify target player support for transparent WebM.
  • Use webhook events or polling patterns in video-status docs to handle async completion robustly.
  • Upload background assets and overlays before generating, so requests reference files that are already hosted; this avoids upload latency at generation time and ensures the exact files are used.

Example use cases

  • Marketing explainer: feed a short script to Video Agent to generate a 60–90 second avatar explainer with captions.
  • Course lessons: produce multi-scene scripted lessons using v2/video/generate to control timing and multiple avatars.
  • Localized dubs: translate and dub an educational video into Spanish with voice matching and subtitles.
  • Product demo compositing: generate transparent WebM avatar overlays and composite them into a demo timeline in Remotion.
  • Voice assets: generate Starfish TTS audio files for spoken UI prompts or narration tracks.
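For the voice-assets case, a sketch of building a POST /v1/audio/text_to_speech request. The base URL and the `text` and `voice_id` body fields are assumptions; pick a voice from GET /v1/audio/voices and confirm the field names in references/text-to-speech.md. `tts_request` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://api.heygen.com"  # assumed base URL


def tts_request(api_key, text, voice_id):
    """Build a POST /v1/audio/text_to_speech request (not sent).

    Assumption: the body takes "text" and "voice_id".
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    body = {"text": text, "voice_id": voice_id}
    return urllib.request.Request(
        BASE_URL + "/v1/audio/text_to_speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```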

FAQ

When should I choose Video Agent vs v2/video/generate?

Use Video Agent for quick, AI-curated prompt-to-video creation. Choose v2/video/generate when you need exact script fidelity, specific voice/avatar per scene, or strict per-scene timing.
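The same rule expressed as a small decision function. The need labels are hypothetical names for the five cases listed under Default Workflow; any other need keeps the default Video Agent path.

```python
# Hypothetical labels for the cases that justify /v2/video/generate.
V2_NEEDS = frozenset({
    "exact_script",       # script must be spoken verbatim, no AI modification
    "specific_voice_id",  # a particular voice_id is required
    "per_scene_assets",   # different avatars/backgrounds per scene
    "per_scene_timing",   # precise per-scene timing control
    "batch_exact_specs",  # programmatic/batch generation with exact specs
})


def choose_endpoint(needs):
    """Return the generation endpoint for a request with the given needs."""
    if V2_NEEDS.intersection(needs):
        return "POST /v2/video/generate"
    return "POST /v1/video_agent/generate"
```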

How do I get notified when a video is ready?

Either poll the video status endpoint (GET /v1/video_status.get) using the patterns in video-status.md, or configure webhooks to receive completion events and download URLs.
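If you use webhooks, the receiving handler mainly needs to parse the event and dispatch on its type. The event names `avatar_video.success` / `avatar_video.fail` and the `event_data` fields are assumptions to check against references/webhooks.md, and the sketch leaves out any request authentication your endpoint may need. `handle_webhook` is a hypothetical helper.

```python
import json


def handle_webhook(raw_body, on_ready, on_failed):
    """Dispatch a webhook delivery to the matching callback.

    Assumed payload shape (confirm in references/webhooks.md):
    {"event_type": "avatar_video.success" | "avatar_video.fail",
     "event_data": {"video_id": ..., "url": ...}}
    Returns the event type so callers can log or ignore unknown events.
    """
    event = json.loads(raw_body)
    event_type = event.get("event_type")
    data = event.get("event_data") or {}
    if event_type == "avatar_video.success":
        on_ready(data.get("video_id"), data.get("url"))
    elif event_type == "avatar_video.fail":
        on_failed(data.get("video_id"), data)  # pass full data for diagnostics
    return event_type
```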