
This skill lets your AI agent render a 3D VRM avatar with lip sync, expressions, and WebSocket control for interactive VTuber-style conversations.

npx playbooks add skill openclaw/skills --skill clawatar

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
---
name: clawatar
description: Give your AI agent a 3D VRM avatar body with animations, expressions, voice chat, and lip sync. Use when the user wants a visual avatar, VRM viewer, avatar companion, VTuber-style character, or 3D character they can talk to. Installs a web-based viewer controllable via WebSocket.
---

# Clawatar - 3D VRM Avatar Viewer

Give your AI agent a body. Web-based VRM avatar with 162 animations, expressions, TTS lip sync, and AI chat.

## Install & Start

```bash
# Clone and install
git clone https://github.com/Dongping-Chen/Clawatar.git ~/.openclaw/workspace/clawatar
cd ~/.openclaw/workspace/clawatar && npm install

# Start (Vite + WebSocket server)
npm run start
```

Opens at http://localhost:3000 with WS control at ws://localhost:8765.

Users must provide their own VRM model (drag & drop onto page, or set `model.url` in `clawatar.config.json`).
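
If you prefer to script the setup instead of drag & drop, an agent can write the model path into the config directly. A minimal sketch, assuming `model.url` corresponds to a nested `model` object inside `clawatar.config.json` (check the shipped config for the exact layout) and using a placeholder model path:

```js
// set-model.js (hypothetical helper) - run from the clawatar directory
const fs = require('fs');

const cfg = JSON.parse(fs.readFileSync('clawatar.config.json', 'utf8'));
// "./models/my-avatar.vrm" is a placeholder; point it at your own VRM file.
cfg.model = { ...cfg.model, url: './models/my-avatar.vrm' };
fs.writeFileSync('clawatar.config.json', JSON.stringify(cfg, null, 2) + '\n');
```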

## WebSocket Commands

Send JSON to `ws://localhost:8765`:

### play_action
```json
{"type": "play_action", "action_id": "161_Waving"}
```

### set_expression
```json
{"type": "set_expression", "name": "happy", "weight": 0.8}
```
Expressions: `happy`, `angry`, `sad`, `surprised`, `relaxed`

### speak (requires ElevenLabs API key)
```json
{"type": "speak", "text": "Hello!", "action_id": "161_Waving", "expression": "happy"}
```

### reset
```json
{"type": "reset"}
```
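
The commands above can be chained over a single connection. A minimal sketch (run from the clawatar directory so `require('ws')` resolves; the three-second delay before `reset` is arbitrary):

```js
const WebSocket = require('ws');
const ws = new WebSocket('ws://localhost:8765');

ws.on('open', () => {
  // Smile, wave, then return to the default pose and disconnect.
  ws.send(JSON.stringify({ type: 'set_expression', name: 'happy', weight: 0.8 }));
  ws.send(JSON.stringify({ type: 'play_action', action_id: '161_Waving' }));
  setTimeout(() => {
    ws.send(JSON.stringify({ type: 'reset' }));
    ws.close();
  }, 3000);
});
```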

## Quick Animation Reference

| Mood | Action ID |
|------|-----------|
| Greeting | `161_Waving` |
| Happy | `116_Happy Hand Gesture` |
| Thinking | `88_Thinking` |
| Agreeing | `118_Head Nod Yes` |
| Disagreeing | `144_Shaking Head No` |
| Laughing | `125_Laughing` |
| Sad | `142_Sad Idle` |
| Dancing | `105_Dancing`, `143_Samba Dancing`, `164_Ymca Dance` |
| Thumbs Up | `153_Standing Thumbs Up` |
| Idle | `119_Idle` |

Full list: `public/animations/catalog.json` (162 animations)
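
To find an ID the table above does not cover, you can search the catalog by keyword. A minimal sketch that only assumes the catalog contains the action-ID strings somewhere in its JSON, without relying on a particular schema (`find-animation.js` is a hypothetical filename; run it from the clawatar directory):

```js
// find-animation.js - list action IDs whose name contains a keyword
const fs = require('fs');

const keyword = (process.argv[2] || 'wave').toLowerCase();
const raw = fs.readFileSync('public/animations/catalog.json', 'utf8');
const ids = new Set(
  (raw.match(/"[^"]*"/g) || [])
    .map(s => s.slice(1, -1))                     // strip the surrounding quotes
    .filter(s => /^\d+_/.test(s))                 // keep strings shaped like "161_Waving"
    .filter(s => s.toLowerCase().includes(keyword))
);
console.log([...ids].join('\n') || `No action IDs matching "${keyword}"`);
```

For example, `node find-animation.js dance` should list the dancing animations.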

## Sending Commands from Agent

```bash
cd ~/.openclaw/workspace/clawatar && node -e "
const W=require('ws'),s=new W('ws://localhost:8765');
s.on('open',()=>{s.send(JSON.stringify({type:'speak',text:'Hello!',action_id:'161_Waving',expression:'happy'}));setTimeout(()=>s.close(),1000)})
"
```
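
For repeated use, the same idea can live in a small script that forwards whatever command JSON it is given. A sketch (`send.js` is a hypothetical filename; keep it in the clawatar directory so `require('ws')` resolves against the project's dependencies):

```js
// send.js - forward one JSON command to the Clawatar WebSocket server
// Usage: node send.js '{"type":"play_action","action_id":"125_Laughing"}'
const WebSocket = require('ws');

const payload = process.argv[2] || '{"type":"reset"}';
const ws = new WebSocket('ws://localhost:8765');

ws.on('open', () => {
  ws.send(payload);                     // send the raw JSON command as-is
  setTimeout(() => ws.close(), 1000);   // brief delay before closing, as in the one-liner
});
ws.on('error', err => {
  console.error('Could not reach ws://localhost:8765:', err.message);
  process.exit(1);
});
```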

## UI Features

- **Touch reactions**: Click avatar head/body for reactions
- **Emotion bar**: Quick 😊😢😠😮😌💃 buttons
- **Background scenes**: Sakura Garden, Night Sky, Café, Sunset
- **Camera presets**: Face, Portrait, Full Body, Cinematic
- **Voice chat**: Mic input → AI response → TTS lip sync

## Config

Edit `clawatar.config.json` for ports, voice settings, and the model URL. TTS requires an ElevenLabs API key in the environment (`ELEVENLABS_API_KEY`) or in `~/.openclaw/openclaw.json` under `skills.entries.sag.apiKey`.
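
A minimal preflight sketch that checks both documented key locations before an agent sends `speak` commands (`check-tts-key.js` is a hypothetical name, and checking the environment variable first is an assumption about precedence):

```js
// check-tts-key.js - report whether an ElevenLabs key is available for TTS
const fs = require('fs');
const os = require('os');
const path = require('path');

function resolveKey() {
  if (process.env.ELEVENLABS_API_KEY) return process.env.ELEVENLABS_API_KEY;
  try {
    const openclawConfig = JSON.parse(
      fs.readFileSync(path.join(os.homedir(), '.openclaw', 'openclaw.json'), 'utf8')
    );
    return openclawConfig?.skills?.entries?.sag?.apiKey;
  } catch {
    return undefined; // no config file or unparsable JSON
  }
}

console.log(resolveKey()
  ? 'ElevenLabs key found: speak commands should work.'
  : 'No ElevenLabs key: speak commands will not produce audio.');
```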

## Notes

- Animations from [Mixamo](https://www.mixamo.com/): credit required, non-commercial use only
- VRM model not included (BYOM: Bring Your Own Model)
- Works standalone without OpenClaw; AI chat is optional

Overview

This skill gives your AI agent a fully interactive 3D VRM avatar with animations, expressions, voice chat and lip sync. It runs as a web-based viewer with a WebSocket control interface so agents can drive animation, expression and speech in real time. The viewer supports 162 animations, emotion controls, camera presets and scene backgrounds for VTuber-style or companion experiences.

How this skill works

The skill hosts a Vite-based web viewer (default http://localhost:3000) and a WebSocket control server (default ws://localhost:8765). Agents send JSON commands over the WebSocket to play animations, set facial expressions, trigger TTS speech with lip sync, or reset the avatar. Users supply their own VRM model (drag-and-drop or set model.url in the config) and may configure ports, voice settings and ElevenLabs API credentials for TTS.

When to use it

  • You want a visual avatar for an AI assistant, VTuber or virtual companion.
  • You need remote control of animations and expressions via WebSocket from an agent.
  • You want real-time voice chat with TTS-driven lip sync.
  • You need a web-based viewer with camera presets and scene backgrounds.
  • You want to demo an AI character with rich, pre-baked animations and touch reactions.

Best practices

  • Provide a compliant VRM model that supports facial blend shapes for lip sync and expressions.
  • If you use TTS, store the ElevenLabs API key securely in an environment variable or your OpenClaw config.
  • Map high-level agent intents to animation IDs and expression names to keep behavior consistent (see the sketch after this list).
  • Test camera presets and scene choices with your model; different VRMs need different framing.
  • Respect Mixamo licensing: credit the animations and avoid commercial use where it is not permitted.
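
A minimal sketch of such an intent mapping, using action IDs from the quick reference and the documented expression names (the intent names and the helper itself are arbitrary choices, not part of the skill):

```js
// intent-map.js - map high-level intents to avatar commands (hypothetical helper)
const INTENT_MAP = {
  greet:     { action_id: '161_Waving',             expression: 'happy' },
  agree:     { action_id: '118_Head Nod Yes',       expression: 'relaxed' },
  disagree:  { action_id: '144_Shaking Head No',    expression: 'sad' },
  think:     { action_id: '88_Thinking',            expression: 'relaxed' },
  celebrate: { action_id: '116_Happy Hand Gesture', expression: 'happy' },
};

// Build a speak command for a given intent; fall back to idle if the intent is unknown.
function speakWithIntent(intent, text) {
  const { action_id, expression } =
    INTENT_MAP[intent] || { action_id: '119_Idle', expression: 'relaxed' };
  return JSON.stringify({ type: 'speak', text, action_id, expression });
}

module.exports = { INTENT_MAP, speakWithIntent };
```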

Example use cases

  • VTuber streaming: control gestures, expressions and TTS as a live host avatar.
  • Customer-facing avatar: guide users with gestures and speech on a website or kiosk.
  • Companion bot: run idle behaviors, touch reactions and context-aware responses.
  • Demoing character designs: preview different animations, backgrounds and camera presets.
  • Research or teaching: prototype embodied conversational agents with lip-synced speech.

FAQ

Do I get a VRM model with the skill?

No. You must provide your own VRM model (BYOM). Drag and drop it onto the viewer or set model.url in clawatar.config.json.

How do I enable TTS lip sync?

Provide an ElevenLabs API key via the ELEVENLABS_API_KEY environment variable or in your OpenClaw config. Then send a speak command over the WebSocket to trigger TTS and lip sync.

Where do I find animation IDs?

A catalog of 162 animations is available in the viewer's public/animations/catalog.json. Common action IDs (e.g., 161_Waving) are listed in the quick reference.