edge-tts-uvx skill

/skills/al-one/edge-tts-uvx

This skill generates high-quality text-to-speech audio from text using edge-tts, with configurable voice, speed, pitch, and subtitles.

npx playbooks add skill openclaw/skills --skill edge-tts-uvx

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: edge-tts-uvx
description: |
  Text-to-speech conversion using `uvx edge-tts` for generating audio from text.
  Use when:
    (1) User requests audio/voice output with the "tts" trigger or keyword.
    (2) Content needs to be spoken rather than read (multitasking, accessibility, driving, cooking).
    (3) User wants a specific voice, speed, pitch, or format for TTS output.
metadata:
  {
    "openclaw":
      {
        "emoji": "🗣️",
        "requires": {"bins": ["uvx"]}
      }
  }
---

# Edge-TTS Skill

Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via the `edge-tts` command-line tool, run through `uvx`.
Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation.

## Usage
```shell
uvx edge-tts --text "{msg}" --write-media {tempdir}/{filename}.mp3

# With subtitles
uvx edge-tts --text "{msg}" --write-media {tempdir}/{filename}.mp3 --write-subtitles -
```
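
The `{msg}`, `{tempdir}`, and `{filename}` placeholders above have to be filled in by the caller. A minimal sketch of one way to do that, assuming a POSIX shell with `mktemp`; the function name `tts_to_tempfile` and the default file name `speech` are hypothetical and not part of edge-tts:

```shell
# Hypothetical helper: synthesize the text in "$1" to an MP3 inside a fresh
# temporary directory and print the resulting path.
# "$2" is an optional base file name; "speech" is an arbitrary default.
tts_to_tempfile() (
  msg=$1
  filename=${2:-speech}
  tempdir=$(mktemp -d) || exit 1
  out="$tempdir/$filename.mp3"
  # Quote every expansion so spaces in the text stay inside one argument.
  # Discard the tool's stdout so only the path is printed by this function.
  uvx edge-tts --text "$msg" --write-media "$out" >/dev/null || exit 1
  printf '%s\n' "$out"
)

# Usage: path=$(tts_to_tempfile "Hello, world" greeting)
```

Quoting `"$msg"` matters more than the temporary directory: unquoted text containing spaces would be split into several arguments.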

## Changing rate (speed), volume, and pitch
```shell
uvx edge-tts --text "{msg}" --write-media {tempdir}/{filename}.mp3 --rate=+50%
uvx edge-tts --text "{msg}" --write-media {tempdir}/{filename}.mp3 --volume=+50% --pitch=-50Hz
```
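
The examples above pass signed values: a percentage for `--rate` and `--volume`, and a value in Hz for `--pitch`. The sketch below checks that shape before calling the CLI. It assumes the tool expects exactly the forms shown in the examples, which is inferred from them rather than from a specification; the function names are hypothetical.

```shell
# Hypothetical validators, assuming values must look like the examples above:
# a leading sign, digits, then "%" (rate, volume) or "Hz" (pitch).
is_signed_percent() {
  printf '%s\n' "$1" | grep -Eq '^[+-][0-9]+%$'
}

is_signed_hz() {
  printf '%s\n' "$1" | grep -Eq '^[+-][0-9]+Hz$'
}

# Hypothetical wrapper: tts_adjusted TEXT OUTFILE RATE VOLUME PITCH
tts_adjusted() {
  is_signed_percent "$3" || { echo "bad rate: $3" >&2; return 2; }
  is_signed_percent "$4" || { echo "bad volume: $4" >&2; return 2; }
  is_signed_hz "$5" || { echo "bad pitch: $5" >&2; return 2; }
  # The --flag=value form, as in the examples above, keeps a negative value
  # such as -50Hz attached to its flag instead of looking like a new option.
  uvx edge-tts --text "$1" --write-media "$2" \
    --rate="$3" --volume="$4" --pitch="$5"
}

# Usage: tts_adjusted "Hello" hello.mp3 +20% +0% -10Hz
```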

## Changing the voice
```shell
uvx edge-tts --text "{msg}" --write-media {tempdir}/{filename}.mp3 --voice=zh-CN-XiaoxiaoNeural
```

## Available voices
```
Name                               Gender    ContentCategories      VoicePersonalities
en-GB-LibbyNeural                  Female    General                Friendly, Positive
en-GB-RyanNeural                   Male      General                Friendly, Positive
en-GB-SoniaNeural                  Female    General                Friendly, Positive
en-GB-ThomasNeural                 Male      General                Friendly, Positive
en-HK-SamNeural                    Male      General                Friendly, Positive
en-HK-YanNeural                    Female    General                Friendly, Positive
en-US-AnaNeural                    Female    Cartoon, Conversation  Cute
en-US-AndrewMultilingualNeural     Male      Conversation, Copilot  Warm, Confident, Authentic, Honest
en-US-AndrewNeural                 Male      Conversation, Copilot  Warm, Confident, Authentic, Honest
en-US-AriaNeural                   Female    News, Novel            Positive, Confident
en-US-AvaMultilingualNeural        Female    Conversation, Copilot  Expressive, Caring, Pleasant, Friendly
en-US-AvaNeural                    Female    Conversation, Copilot  Expressive, Caring, Pleasant, Friendly
en-US-BrianMultilingualNeural      Male      Conversation, Copilot  Approachable, Casual, Sincere
en-US-BrianNeural                  Male      Conversation, Copilot  Approachable, Casual, Sincere
en-US-ChristopherNeural            Male      News, Novel            Reliable, Authority
en-US-EmmaMultilingualNeural       Female    Conversation, Copilot  Cheerful, Clear, Conversational
en-US-EmmaNeural                   Female    Conversation, Copilot  Cheerful, Clear, Conversational
en-US-EricNeural                   Male      News, Novel            Rational
en-US-GuyNeural                    Male      News, Novel            Passion
en-US-JennyNeural                  Female    General                Friendly, Considerate, Comfort
en-US-MichelleNeural               Female    News, Novel            Friendly, Pleasant
en-US-RogerNeural                  Male      News, Novel            Lively
en-US-SteffanNeural                Male      News, Novel            Rational
fr-FR-DeniseNeural                 Female    General                Friendly, Positive
fr-FR-HenriNeural                  Male      General                Friendly, Positive
zh-CN-XiaoxiaoNeural               Female    News, Novel            Warm
zh-CN-YunjianNeural                Male      Sports, Novel          Passion
zh-CN-liaoning-XiaobeiNeural       Female    Dialect                Humorous
zh-CN-shaanxi-XiaoniNeural         Female    Dialect                Bright
zh-HK-HiuGaaiNeural                Female    General                Friendly, Positive
zh-HK-WanLungNeural                Male      General                Friendly, Positive
zh-TW-HsiaoChenNeural              Female    General                Friendly, Positive
zh-TW-YunJheNeural                 Male      General                Friendly, Positive
```

Retrieve all available voices using shell commands:
```shell
uvx edge-tts --list-voices
```
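
The full list is long, so it can help to narrow it to one language or region. A small sketch, assuming `--list-voices` prints one voice per row with the voice name in the first column, as in the table above; `voices_for_locale` is a hypothetical name:

```shell
# Hypothetical helper: print the names of voices whose name starts with the
# given locale prefix, for example "en-GB" or just "zh".
voices_for_locale() {
  uvx edge-tts --list-voices |
    # index(...) == 1 means the first column begins with "<prefix>-".
    awk -v prefix="$1-" 'index($1, prefix) == 1 { print $1 }'
}

# Usage: voices_for_locale zh
```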

Overview

This skill converts text into high-quality neural speech using the uvx edge-tts command-line wrapper. It supports multiple languages, selectable voices, adjustable rate, pitch and volume, and can output audio files and subtitles. Use it when you need spoken output for accessibility, multitasking, or media generation. The tool is designed for simple shell integration and automated TTS workflows.

How this skill works

The skill calls the uvx edge-tts CLI to synthesize speech with Microsoft Edge neural voices. It accepts plain text or piped input and writes audio files (MP3 in the examples above) and optional subtitle files. You can list the available voices, set the voice name, and adjust rate, pitch, and volume via command options. The commands are suitable for scripts, chat responders, or on-demand audio generation.
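
For piped input, one possible approach is to save standard input to a temporary file and hand that file to the CLI. The sketch below assumes the installed edge-tts accepts a `--file` option as an alternative to `--text`; confirm with `uvx edge-tts --help` before relying on it. The name `tts_from_stdin` is hypothetical.

```shell
# Hypothetical helper: read text from stdin and synthesize it to "$1".
# Assumes edge-tts supports --file (read the text from a file).
tts_from_stdin() (
  out=$1
  textfile=$(mktemp) || exit 1
  # Copy everything on stdin into the temporary file.
  cat > "$textfile"
  uvx edge-tts --file "$textfile" --write-media "$out"
  status=$?
  # Remove the temporary text file whether or not synthesis succeeded.
  rm -f "$textfile"
  exit "$status"
)

# Usage: printf 'Dinner is ready.\n' | tts_from_stdin dinner.mp3
```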

When to use it

  • User explicitly requests audio or uses the "tts" trigger/keyword.
  • Content must be spoken for hands-free contexts (driving, cooking, exercising).
  • Accessibility needs: screen readers, visually impaired users, or learning tools.
  • You want a specific voice, language, or voice personality for narration.
  • Generating audio assets for podcasts, videos, or automated announcements.

Best practices

  • Preview short samples when selecting a voice and personality to match tone.
  • Adjust rate, pitch, and volume conservatively to keep natural prosody.
  • Provide clean, punctuation-aware input; add pauses or SSML-like cues if supported.
  • Generate subtitles alongside audio for searchability and accessibility.
  • Cache generated audio for repeat requests to save time and API/network usage.
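
The caching suggestion above can be sketched as a small wrapper that keys each file on the voice and the text. This is an illustration under stated assumptions, not part of the skill: it assumes `sha256sum` is available (GNU coreutils), and both `tts_cached` and `TTS_CACHE_DIR` are made-up names.

```shell
# Hypothetical cache: reuse the MP3 when the same voice and text were
# synthesized before. Prints the path of the cached file.
tts_cached() (
  voice=$1
  msg=$2
  cache_dir=${TTS_CACHE_DIR:-${TMPDIR:-/tmp}/edge-tts-cache}
  mkdir -p "$cache_dir" || exit 1
  # Key on voice and text so a voice change never returns stale audio.
  key=$(printf '%s\n%s' "$voice" "$msg" | sha256sum | cut -d ' ' -f 1)
  out="$cache_dir/$key.mp3"
  if [ ! -s "$out" ]; then
    # Write to a temporary name first so a failed run leaves no partial file
    # under the final name.
    uvx edge-tts --voice="$voice" --text "$msg" --write-media "$out.part" >/dev/null &&
      mv "$out.part" "$out" || exit 1
  fi
  printf '%s\n' "$out"
)

# Usage: path=$(tts_cached en-US-AriaNeural "Welcome back")
```

If rate, pitch, or volume are also varied, they would need to be added to the key in the same way.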

Example use cases

  • Produce narrated instructions for a cooking app with a friendly female voice.
  • Create podcast intros and transitions with a confident, authoritative voice.
  • Generate accessibility audio files for articles, reports, or ebooks.
  • Automate voice prompts for an IVR or interactive kiosk in multiple languages.
  • Export audio + subtitles for short educational videos or course modules.

FAQ

How do I change voice, speed, pitch, or volume?

Pass the --voice, --rate, --pitch, and --volume flags to uvx edge-tts.

Can I get a list of available voices?

Yes. Run uvx edge-tts --list-voices to retrieve all supported voices and their metadata.

Does it produce subtitle files?

Yes. Use the --write-subtitles option to output subtitle data alongside the audio file.
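
To keep the subtitles in a file next to the audio, a file name can be passed to `--write-subtitles`. A minimal sketch with a hypothetical helper name; the `.srt` extension is an assumption, since the subtitle format may depend on the installed edge-tts version:

```shell
# Hypothetical helper: tts_with_subtitles TEXT BASENAME
# Writes BASENAME.mp3 and a subtitle file with the same base name.
tts_with_subtitles() {
  uvx edge-tts --text "$1" \
    --write-media "$2.mp3" \
    --write-subtitles "$2.srt"
}

# Usage: tts_with_subtitles "Lesson one: getting started" lesson-01
```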