text-to-speech skill

This skill converts text to speech audio with multiple voices and languages, enabling quick generation of accessible audio content.

npx playbooks add skill aidotnet/moyucode --skill text-to-speech

Review the files below or copy the command above to add this skill to your agents.

SKILL.md

---
name: text-to-speech
description: Convert text to speech audio files, with support for multiple voices and languages.
metadata:
  short-description: Text to speech
source:
  repository: https://github.com/pyttsx3/pyttsx3
  license: MPL-2.0
---

# Text to Speech Tool

## Description
Convert text to speech audio files with support for multiple voices, languages, and speech rates.

## Trigger
- `/tts` command
- User needs text to speech
- User wants to generate audio

## Usage

```bash
# Speak text
python scripts/text_to_speech.py "Hello World"

# Save to file
python scripts/text_to_speech.py "Hello World" --output hello.mp3

# Change voice/rate
python scripts/text_to_speech.py "Hello" --rate 150 --voice 1

# Read from file
python scripts/text_to_speech.py --file document.txt --output audio.mp3
```
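
The script itself is not shown here, so the following is only a sketch of how a command line with these flags could be parsed. The function names `build_parser` and `resolve_text` are hypothetical; only the flag names (`--file`, `--output`, `--rate`, `--voice`) come from the examples above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(description="Convert text to speech.")
    # The positional text is optional because --file can supply it instead.
    parser.add_argument("text", nargs="?", help="Text to speak")
    parser.add_argument("--file", help="Read the text from this file instead")
    parser.add_argument("--output", help="Write audio to this file instead of speaking")
    parser.add_argument("--rate", type=int, default=None, help="Speech rate")
    parser.add_argument("--voice", type=int, default=None, help="Index of the voice to use")
    return parser


def resolve_text(args: argparse.Namespace) -> str:
    """Return the text to synthesize, preferring --file over the positional text."""
    if args.file:
        with open(args.file, encoding="utf-8") as handle:
            return handle.read()
    if args.text:
        return args.text
    raise SystemExit("Provide text or --file")
```

Leaving `--rate` and `--voice` at `None` by default lets the caller fall back to the engine's own settings when a flag is omitted.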

## Tags
`tts`, `speech`, `audio`, `voice`, `accessibility`

## Compatibility
- Codex: ✅
- Claude Code: ✅

Overview

This skill converts plain text into speech audio files, supporting multiple voices, languages, and adjustable speech rates. It produces playable audio (MP3, WAV) and can read single strings or whole documents. Use it to generate narration, accessibility audio, voice prompts, or previews for apps and content.

How this skill works

Provide text directly or point to a text file with --file; the tool selects a voice, language, and rate, then synthesizes the audio and either speaks it or, when --output is given, writes it to a file. The --output, --voice, and --rate flags set the output filename, voice index, and speech rate. The engine handles basic punctuation and preserves line breaks for natural pauses.
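
Since the skill lists pyttsx3 as its source repository, the flow described above might look roughly like the sketch below. The `synthesize` helper and its parameters are hypothetical, not the script's actual code; the engine calls (`init`, `setProperty`, `getProperty`, `say`, `save_to_file`, `runAndWait`) are pyttsx3's. The `engine` parameter is an assumption added so the function can be exercised with a stand-in object.

```python
def synthesize(text, output=None, rate=None, voice_index=None, engine=None):
    """Speak `text` aloud, or save it to `output` when a path is given."""
    if engine is None:
        # Third-party dependency, imported lazily so the helper can be
        # imported (and tested with a stand-in engine) without pyttsx3.
        import pyttsx3
        engine = pyttsx3.init()
    if rate is not None:
        engine.setProperty("rate", rate)
    if voice_index is not None:
        # Assumption: the voice index refers to the engine's voice list.
        voices = engine.getProperty("voices")
        engine.setProperty("voice", voices[voice_index].id)
    if output:
        engine.save_to_file(text, output)
    else:
        engine.say(text)
    # Queued commands only execute once the event loop runs.
    engine.runAndWait()
    return engine
```

With pyttsx3, the audio container written by `save_to_file` may depend on the platform's speech driver, so it is worth checking the resulting file before relying on a particular extension.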

When to use it

Creating narrated audio for articles, tutorials, or e-learning modules.
Generating accessibility audio for visually impaired users.
Prototyping voice prompts for IVR, chatbots, or apps.
Batch converting documents into podcasts or audio libraries.
Quickly previewing different voices and speaking rates.
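
For the batch-conversion case, one possible approach is to generate one command per document and run them in turn. The sketch below only builds the argument lists; `build_batch_commands` is a hypothetical helper, and it assumes that `--file`, `--output`, `--rate`, and `--voice` can be combined in a single call.

```python
from pathlib import Path


def build_batch_commands(folder, rate=150, voice=1, extension=".mp3"):
    """Return one command (as an argument list) per .txt file in `folder`."""
    commands = []
    # Sort for a stable, reproducible processing order.
    for source in sorted(Path(folder).glob("*.txt")):
        # Write the audio next to the source, swapping only the extension.
        target = source.with_suffix(extension)
        commands.append([
            "python", "scripts/text_to_speech.py",
            "--file", str(source),
            "--output", str(target),
            "--rate", str(rate),
            "--voice", str(voice),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, which stops the batch at the first failing document.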

Best practices

Choose a voice and language that match your audience for natural results.
Adjust rate and add punctuation or line breaks to control cadence and pauses.
Test short samples before batch-processing long documents to ensure tone and clarity.
Specify clear output filenames and formats (mp3, wav) to match distribution needs.
Sanitize or segment long input texts to avoid runtime or memory issues.
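
Segmenting long input, as suggested in the last point, could be done along sentence boundaries. This is a minimal sketch, not part of the skill: `split_into_chunks` and its `max_chars` limit are hypothetical, and the sentence splitting is a simple punctuation heuristic.

```python
import re


def split_into_chunks(text: str, max_chars: int = 1000) -> list[str]:
    """Group sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept whole rather than cut.
    """
    # Split after ., ! or ? when followed by whitespace (a rough heuristic).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized to its own file, which keeps individual runs short and makes a failed segment easy to retry.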

Example use cases

Convert a blog post into an MP3 for distribution as a short podcast.
Generate voice prompts for a prototype conversational UI with different voices.
Produce audio versions of user manuals for accessibility compliance.
Create narration for a video by exporting lines as WAV and syncing in an editor.

FAQ

What input formats are supported?

Plain text strings and text files are supported; feed the text directly or use the --file option to read a document.

Which output formats can I create?

Common formats like MP3 and WAV are supported; choose one through the extension of the filename passed to --output (for example, --output hello.mp3).

How do I change the voice or speaking speed?

Use command-line flags to select a voice index and adjust the rate (e.g., --voice 1 --rate 150) to fine-tune tone and tempo.
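
To find out which index to pass, it can help to list the voices the engine reports. The sketch below uses pyttsx3's `getProperty("voices")`; `list_voices` is a hypothetical helper, and it is an assumption that the skill's voice index follows the same ordering.

```python
def list_voices(engine=None):
    """Return (index, name, id) for each voice the engine reports."""
    if engine is None:
        # Third-party dependency, imported lazily; needs a working speech driver.
        import pyttsx3
        engine = pyttsx3.init()
    voices = engine.getProperty("voices")
    return [(index, voice.name, voice.id) for index, voice in enumerate(voices)]
```

Printing the result, for example with `for row in list_voices(): print(row)`, shows the voices installed on the current machine, which vary by platform.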