This skill transcribes local audio with Apple Silicon optimized MLX Whisper, eliminating API keys and delivering ready-to-use subtitles and translations. To add it to your agents:

```bash
npx playbooks add skill openclaw/skills --skill mlx-whisper
```
---
name: mlx-whisper
version: 1.0.0
description: Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).
homepage: https://github.com/ml-explore/mlx-examples/tree/main/whisper
metadata: {"clawdbot":{"emoji":"🍎","requires":{"bins":["mlx_whisper"]},"install":[{"id":"pip","kind":"pip","package":"mlx-whisper","bins":["mlx_whisper"],"label":"Install mlx-whisper (pip)"}]}}
---
# MLX Whisper
Local speech-to-text using Apple MLX, optimized for Apple Silicon Macs.
## Quick Start
```bash
mlx_whisper /path/to/audio.mp3 --model mlx-community/whisper-large-v3-turbo
```
## Common Usage
```bash
# Transcribe to text file
mlx_whisper audio.m4a -f txt -o ./output
# Transcribe with language hint
mlx_whisper audio.mp3 --language en --model mlx-community/whisper-large-v3-turbo
# Generate subtitles (SRT)
mlx_whisper video.mp4 -f srt -o ./subs
# Translate to English
mlx_whisper foreign.mp3 --task translate
```
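The SRT files produced by `-f srt` are plain text and easy to post-process. A minimal sketch (not part of mlx-whisper, and the sample cues are illustrative) that strips cue numbers and timestamp lines to recover a flat transcript:

```python
def srt_to_text(srt: str) -> str:
    """Drop cue numbers and timestamp lines, keep only subtitle text."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        kept.append(line)
    return " ".join(kept)

sample = """1
00:00:00,000 --> 00:00:02,500
Hello world.

2
00:00:02,500 --> 00:00:05,000
This is a test."""

print(srt_to_text(sample))  # Hello world. This is a test.
```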
## Models (download on first use)
| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| mlx-community/whisper-tiny | ~75MB | Fastest | Basic |
| mlx-community/whisper-base | ~140MB | Fast | Good |
| mlx-community/whisper-small | ~470MB | Medium | Better |
| mlx-community/whisper-medium | ~1.5GB | Slower | Great |
| mlx-community/whisper-large-v3 | ~3GB | Slowest | Best |
| mlx-community/whisper-large-v3-turbo | ~1.6GB | Fast | Excellent (Recommended) |
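For scripts that choose a model automatically, the table can be encoded as data. A minimal sketch (the sizes are the approximate values from the table, and in this lineup a larger download roughly tracks higher quality):

```python
# Approximate download sizes in MB, taken from the table above.
MODELS = {
    "mlx-community/whisper-tiny": 75,
    "mlx-community/whisper-base": 140,
    "mlx-community/whisper-small": 470,
    "mlx-community/whisper-medium": 1500,
    "mlx-community/whisper-large-v3-turbo": 1600,
    "mlx-community/whisper-large-v3": 3000,
}

def best_model(max_mb: int) -> str:
    """Pick the largest model whose download fits the given budget."""
    fitting = [(mb, name) for name, mb in MODELS.items() if mb <= max_mb]
    if not fitting:
        raise ValueError(f"no model fits under {max_mb} MB")
    return max(fitting)[1]

print(best_model(2000))  # mlx-community/whisper-large-v3-turbo
```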
## Notes
- Requires Apple Silicon Mac (M1/M2/M3/M4)
- Models cache to `~/.cache/huggingface/`
- Default model is `mlx-community/whisper-tiny`; use `--model mlx-community/whisper-large-v3-turbo` for best results
## Overview
This skill provides local speech-to-text using MLX Whisper, optimized for Apple Silicon Macs and designed to run without an API key. It supports multiple models and formats, from quick transcripts to subtitle (SRT) generation and translation. The default model is lightweight for fast runs, but larger models are available for higher accuracy. Models download on first use and are cached locally for repeated runs.

The tool runs inference locally on Apple Silicon (M1/M2/M3/M4) using MLX-optimized Whisper models. It loads the chosen model (default tiny) from the Hugging Face cache, processes audio or video files, and outputs text, subtitles, or translations. You can select model size, language hints, output format, and task (transcribe or translate) via command-line options.
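These command-line options compose well from scripts. A minimal sketch that only builds the `mlx_whisper` argument list for a batch of files (actually running it requires the binary on PATH and an Apple Silicon Mac; the file names are placeholders):

```python
from pathlib import Path
import shlex

def srt_command(audio: Path, out_dir: Path,
                model: str = "mlx-community/whisper-large-v3-turbo") -> list[str]:
    """Build the mlx_whisper argument list for SRT subtitle output."""
    return [
        "mlx_whisper", str(audio),
        "--model", model,
        "-f", "srt",
        "-o", str(out_dir),
    ]

# Print the commands for every .mp4 in the current directory
# (pass each list to subprocess.run to actually transcribe).
for clip in sorted(Path(".").glob("*.mp4")):
    print(shlex.join(srt_command(clip, Path("./subs"))))
```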
## FAQ
**Does this require an internet connection?**
You need internet to download models the first time, but inference runs locally after models are cached.

**Which Macs are supported?**
Apple Silicon Macs (M1/M2/M3/M4) are required for the MLX-optimized models.

**Where are models stored?**
Downloaded models are cached in `~/.cache/huggingface/` by default.
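Because models land in the standard Hugging Face hub cache layout (`hub/models--{org}--{name}` under the cache root), a quick pre-flight check can tell whether a download will be needed. A minimal sketch:

```python
from pathlib import Path

def is_cached(repo_id: str,
              cache_root: Path = Path.home() / ".cache" / "huggingface") -> bool:
    """True if the Hugging Face hub cache already holds this repo.

    Hub snapshots live under hub/models--{org}--{name}/ (standard HF layout).
    """
    folder = "models--" + repo_id.replace("/", "--")
    return (cache_root / "hub" / folder).exists()

print(is_cached("mlx-community/whisper-large-v3-turbo"))
```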