This skill transcribes audio locally on Apple Silicon with Parakeet MLX, producing txt, srt, vtt, or json output entirely offline.
Add this skill to your agents with: `npx playbooks add skill openclaw/skills --skill parakeet-mlx`
---
name: parakeet-mlx
description: Local speech-to-text with Parakeet MLX (ASR) for Apple Silicon (no API key).
homepage: https://github.com/senstella/parakeet-mlx
metadata: {"clawdbot":{"emoji":"🦜","requires":{"bins":["parakeet-mlx"]},"install":[{"id":"uv-tool","kind":"uv","formula":"parakeet-mlx","bins":["parakeet-mlx"],"label":"Install Parakeet MLX CLI (uv tool install)"}]}}
---
# Parakeet MLX (CLI)
Use `parakeet-mlx` to transcribe audio locally on Apple Silicon.
## Quick start
- `parakeet-mlx /path/audio.mp3 --output-format txt`
- `parakeet-mlx /path/audio.m4a --output-format vtt --highlight-words`
- `parakeet-mlx *.mp3 --output-format all`
## Notes
- Install CLI with: `uv tool install parakeet-mlx -U` (not `uv add` or `pip install`)
- Use `parakeet-mlx --help` to see all options (`--help`, not `-h`).
- Models download from Hugging Face to `~/.cache/huggingface` on first run.
- Default model: `mlx-community/parakeet-tdt-0.6b-v3` (optimized for Apple Silicon).
- Requires `ffmpeg` installed for audio processing.
- Output formats: txt, srt, vtt, json, or all.
- Use `--verbose` for detailed progress and confidence scores.
- Accepts multiple files (shell wildcards like `*.mp3` work).
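The notes above can be combined into a simple batch workflow. This is a sketch, not part of the skill itself: the `recordings/` directory is an example path, and the script skips files that already have a transcript so reruns are cheap.

```shell
#!/bin/sh
# Batch-transcribe every .mp3 under recordings/ to a sibling .txt file.
set -eu

# Bail out politely if the CLI is not installed yet.
command -v parakeet-mlx >/dev/null 2>&1 || {
  echo "parakeet-mlx not found; install with: uv tool install parakeet-mlx" >&2
  exit 0
}

for f in recordings/*.mp3; do
  [ -e "$f" ] || continue            # glob matched nothing
  out="${f%.mp3}.txt"                # e.g. recordings/talk.mp3 -> recordings/talk.txt
  if [ -e "$out" ]; then
    echo "skip: $out already exists"
    continue
  fi
  parakeet-mlx "$f" --output-format txt
done
```

Because the transcript name is derived from the input name, the loop is idempotent: rerunning it only transcribes files that are new since the last run.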
## How it works
The CLI accepts one or more audio files, invokes `ffmpeg` for preprocessing, and runs the Parakeet MLX ASR model (default: `mlx-community/parakeet-tdt-0.6b-v3`) entirely on-device on Apple Silicon, so no API key is needed. Models are fetched to `~/.cache/huggingface` automatically on first run and reused thereafter, and the tool can emit txt, srt, vtt, json, or all formats, with optional verbose progress and word highlighting.
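Since the pipeline depends on two binaries and a model cache, a preflight check can save a failed first run. A minimal sketch (the check itself is generic shell, not a feature of the skill):

```shell
#!/bin/sh
# Preflight: confirm both required binaries are on PATH and report
# where models will be downloaded on first use.
for bin in ffmpeg parakeet-mlx; do
  if command -v "$bin" >/dev/null 2>&1; then
    echo "ok: $bin found"
  else
    echo "missing: $bin" >&2
  fi
done
echo "model cache: ${HOME}/.cache/huggingface"
```

Running this before the first transcription surfaces a missing `ffmpeg` immediately, rather than partway through audio preprocessing.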
## FAQ
**Do I need an API key?** No. The tool runs locally and does not require any API key.

**Where are models stored?** Models download to `~/.cache/huggingface` on the first run and are reused from that cache.

**What audio formats are supported?** The CLI accepts common audio files; `ffmpeg` handles decoding, so most formats, such as mp3 and m4a, work.
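If a particular container does trip up transcription, one workaround is to normalize it to WAV with `ffmpeg` first. A sketch, with assumptions labeled: `meeting.ogg` is an example filename, and 16 kHz mono is a common ASR-friendly choice rather than a documented requirement of this CLI.

```shell
#!/bin/sh
# Convert an unusual container to WAV, then transcribe the WAV.
set -eu
in="meeting.ogg"
wav="${in%.*}.wav"                  # meeting.wav
if command -v ffmpeg >/dev/null 2>&1 && [ -e "$in" ]; then
  # -ar 16000: 16 kHz sample rate; -ac 1: mono (assumptions, see above)
  ffmpeg -i "$in" -ar 16000 -ac 1 "$wav"
  parakeet-mlx "$wav" --output-format srt
fi
```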