
speechall-cli skill

/skills/atacan/speechall-cli

This skill helps you transcribe audio and video via the speechall CLI, manage models, and run diarization from the terminal.

npx playbooks add skill openclaw/skills --skill speechall-cli

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: speechall-cli
description: "Install and use the speechall CLI tool for speech-to-text transcription. Use when the user wants to: (1) transcribe audio or video files to text, (2) install speechall on macOS or Linux, (3) list available STT models and their capabilities, (4) use speaker diarization, subtitles, or other transcription features from the terminal. Triggers on mentions of speechall, audio transcription CLI, or speech-to-text from the command line."
---

# speechall-cli

CLI for speech-to-text transcription via the Speechall API. Supports multiple providers (OpenAI, Deepgram, AssemblyAI, Google, Gemini, Groq, ElevenLabs, Cloudflare, and more).

## Installation

### Homebrew (macOS and Linux)

```bash
brew install Speechall/tap/speechall
```

**Without Homebrew**: Download the binary for your platform from https://github.com/Speechall/speechall-cli/releases and place it on your `PATH`.

### Verify

```bash
speechall --version
```

## Authentication

An API key is required. Provide it via environment variable (preferred) or flag:

```bash
export SPEECHALL_API_KEY="your-key-here"
# or
speechall --api-key "your-key-here" audio.wav
```

Create an API key at https://speechall.com/console/api-keys

## Commands

### transcribe (default)

Transcribe an audio or video file. This is the default subcommand — `speechall audio.wav` is equivalent to `speechall transcribe audio.wav`.

```bash
speechall <file> [options]
```

**Options:**

| Flag | Description | Default |
|---|---|---|
| `--model <provider.model>` | STT model identifier | `openai.gpt-4o-mini-transcribe` |
| `--language <code>` | Language code (e.g. `en`, `tr`, `de`) | API default (auto-detect) |
| `--output-format <format>` | Output format (`text`, `json`, `verbose_json`, `srt`, `vtt`) | API default |
| `--diarization` | Enable speaker diarization | off |
| `--speakers-expected <n>` | Expected number of speakers (use with `--diarization`) | — |
| `--no-punctuation` | Disable automatic punctuation | — |
| `--temperature <0.0-1.0>` | Model temperature | — |
| `--initial-prompt <text>` | Text prompt to guide model style | — |
| `--custom-vocabulary <term>` | Terms to boost recognition (repeatable) | — |
| `--ruleset-id <uuid>` | Replacement ruleset UUID | — |
| `--api-key <key>` | API key (overrides `SPEECHALL_API_KEY` env var) | — |

**Examples:**

```bash
# Basic transcription
speechall interview.mp3

# Specific model and language
speechall call.wav --model deepgram.nova-2 --language en

# Speaker diarization with SRT output
speechall meeting.wav --diarization --speakers-expected 3 --output-format srt

# Custom vocabulary for domain-specific terms
speechall medical.wav --custom-vocabulary "myocardial" --custom-vocabulary "infarction"

# Transcribe a video file (macOS extracts audio automatically)
speechall presentation.mp4
```

### models

List available speech-to-text models. Outputs JSON to stdout. Filters combine with AND logic.

```bash
speechall models [options]
```

**Filter flags:**

| Flag | Description |
|---|---|
| `--provider <name>` | Filter by provider (e.g. `openai`, `deepgram`) |
| `--language <code>` | Filter by supported language (`tr` matches `tr`, `tr-TR`, `tr-CY`) |
| `--diarization` | Only models supporting speaker diarization |
| `--srt` | Only models supporting SRT output |
| `--vtt` | Only models supporting VTT output |
| `--punctuation` | Only models supporting automatic punctuation |
| `--streamable` | Only models supporting real-time streaming |
| `--vocabulary` | Only models supporting custom vocabulary |

**Examples:**

```bash
# List all available models
speechall models

# Models from a specific provider
speechall models --provider deepgram

# Models that support Turkish and diarization
speechall models --language tr --diarization

# Pipe to jq for specific fields
speechall models --provider openai | jq '.[].identifier'
```

## Tips

- On macOS, video files (`.mp4`, `.mov`, etc.) are automatically converted to audio before upload.
- On Linux, pass audio files directly (`.wav`, `.mp3`, `.m4a`, `.flac`, etc.).
- Output goes to stdout. Redirect to save: `speechall audio.wav > transcript.txt`
- Errors go to stderr, so piping stdout is safe.
- Run `speechall --help`, `speechall transcribe --help`, or `speechall models --help` to see all valid enum values for model identifiers, language codes, and output formats.
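
Because stdout and stderr are separated, a script can capture the transcript and any diagnostics independently. A minimal sketch (file names here are illustrative):

```shell
# Save the transcript to one file and any error output to another.
speechall audio.wav > transcript.txt 2> speechall-errors.log

# Only trust the transcript when the command succeeded.
if [ $? -eq 0 ]; then
  echo "Transcript saved to transcript.txt"
else
  echo "Transcription failed; see speechall-errors.log" >&2
fi
```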

Overview

This skill installs and uses the speechall CLI to perform speech-to-text transcription from the terminal. It supports multiple providers and formats, plus features like speaker diarization, custom vocabulary, and subtitle outputs (SRT/VTT). Use it to transcribe audio or video files quickly on macOS or Linux and list available STT models and capabilities.

How this skill works

Install the speechall binary via Homebrew or by downloading the release for your platform, then authenticate with an API key (environment variable or flag). Run the default transcribe command, or the models command to list provider models and capabilities. The tool uploads audio (or extracts audio from video on macOS), calls the selected provider/model, and writes the transcription to stdout in text, JSON, verbose JSON, SRT, or VTT format.
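
Put together, a minimal end-to-end session looks like this (the key and file name are placeholders):

```shell
brew install Speechall/tap/speechall       # install (macOS and Linux)
export SPEECHALL_API_KEY="your-key-here"   # authenticate
speechall models --provider openai         # inspect available models
speechall interview.mp3 > interview.txt    # transcribe to a file
```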

When to use it

  • Transcribe recorded interviews, meetings, podcasts, or lecture audio from the command line.
  • Quickly extract subtitles (SRT/VTT) for video files before publishing.
  • Compare or choose STT providers and models by listing capabilities such as diarization or punctuation.
  • Add speaker diarization for multi-person meetings and export speaker-labeled transcripts.
  • Automate bulk transcription in scripts by redirecting stdout to files.
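
The bulk-transcription case above can be sketched as a small shell loop (the directory contents and `.mp3` extension are illustrative):

```shell
# Transcribe every .mp3 in the current directory, writing one
# transcript per input file; ${f%.mp3} strips the extension.
for f in *.mp3; do
  [ -e "$f" ] || continue   # skip the loop when no .mp3 files match
  speechall "$f" > "${f%.mp3}.txt"
done
```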

Best practices

  • Set SPEECHALL_API_KEY as an environment variable to avoid exposing keys on the command line or in shell history.
  • Choose a model that supports needed features (diarization, SRT, punctuation) via speechall models before transcribing.
  • Redirect stdout to save transcripts and handle errors separately (stderr).
  • Supply --custom-vocabulary for domain-specific terms to improve recognition accuracy.
  • On macOS, feed video files directly; on Linux, extract the audio first or provide an audio format supported by the API.
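
The model-selection practice can be scripted. This sketch assumes, as in the jq example earlier, that `speechall models` emits a JSON array of objects with an `identifier` field:

```shell
# List identifiers of models that support diarization...
speechall models --diarization | jq -r '.[].identifier'

# ...then pass the chosen one explicitly when transcribing.
speechall meeting.wav --model deepgram.nova-2 --diarization
```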

Example use cases

  • Transcribe a podcast episode: speechall episode.mp3 > episode-transcript.txt
  • Generate subtitles for a recorded webinar: speechall webinar.mp4 --output-format srt > webinar.srt
  • Run speaker diarization for a meeting: speechall meeting.wav --diarization --speakers-expected 4 --output-format vtt
  • List models that support Turkish and diarization to pick the best fit: speechall models --language tr --diarization
  • Boost recognition of medical terminology: speechall notes.wav --custom-vocabulary "myocardial" --custom-vocabulary "infarction"

FAQ

How do I authenticate the CLI?

Provide your API key via the SPEECHALL_API_KEY environment variable or pass --api-key on the command line.

Can I transcribe video files?

Yes. On macOS, the CLI extracts audio from common video formats automatically. On Linux, provide an audio file or pre-extract the audio.
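
On Linux, one common way to pre-extract the audio is with ffmpeg, assuming it is installed (file names are illustrative):

```shell
in=webinar.mp4
# Extract the audio track into a 16 kHz mono WAV: -vn drops the video
# stream, -ar sets the sample rate, -ac 1 downmixes to mono.
ffmpeg -i "$in" -vn -ar 16000 -ac 1 audio.wav
# Transcribe and name the transcript after the input file.
speechall audio.wav > "${in%.mp4}-transcript.txt"
```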