
parakeet-mlx skill

/skills/kylehowells/parakeet-mlx

This skill transcribes audio locally on Apple Silicon using Parakeet MLX, producing transcripts and captions in txt, srt, vtt, or json for fully offline, private processing.

npx playbooks add skill openclaw/skills --skill parakeet-mlx

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
1.2 KB
---
name: parakeet-mlx
description: Local speech-to-text with Parakeet MLX (ASR) for Apple Silicon (no API key).
homepage: https://github.com/senstella/parakeet-mlx
metadata: {"clawdbot":{"emoji":"🦜","requires":{"bins":["parakeet-mlx"]},"install":[{"id":"uv-tool","kind":"uv","formula":"parakeet-mlx","bins":["parakeet-mlx"],"label":"Install Parakeet MLX CLI (uv tool install)"}]}}
---

# Parakeet MLX (CLI)

Use `parakeet-mlx` to transcribe audio locally on Apple Silicon.

Quick start
- `parakeet-mlx /path/audio.mp3 --output-format txt`
- `parakeet-mlx /path/audio.m4a --output-format vtt --highlight-words`
- `parakeet-mlx *.mp3 --output-format all`

Notes
- Install CLI with: `uv tool install parakeet-mlx -U` (not `uv add` or `pip install`)
- Use `parakeet-mlx --help` to see all options (`--help`, not `-h`).
- Models download from Hugging Face to `~/.cache/huggingface` on first run.
- Default model: `mlx-community/parakeet-tdt-0.6b-v3` (optimized for Apple Silicon).
- Requires `ffmpeg` installed for audio processing.
- Output formats: txt, srt, vtt, json, or all.
- Use `--verbose` for detailed progress and confidence scores.
- Accepts multiple files (shell wildcards like `*.mp3` work).

Overview

This skill provides a local speech-to-text CLI built on Parakeet MLX, optimized for Apple Silicon and requiring no API key. It runs entirely on-device, downloads models to the Hugging Face cache on first use, and outputs transcripts or captions in multiple formats.

How this skill works

The CLI accepts one or more audio files, invokes ffmpeg for preprocessing, and runs the Parakeet MLX ASR model (default: mlx-community/parakeet-tdt-0.6b-v3) locally on Apple Silicon. Models are fetched to ~/.cache/huggingface automatically on first run, and the tool can emit txt, srt, vtt, json, or all formats with optional verbosity and word-highlighting.
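
The flow described above can be sketched as a dry-run shell script: verify both required binaries exist, then print the command that would run (audio.mp3 is a hypothetical file name, and only flags documented in this skill are used):

```shell
#!/bin/sh
# Sketch of the pipeline: ffmpeg preprocesses audio, parakeet-mlx transcribes.
# Printed as a dry run; replace the final echo with the command to execute it.
for tool in ffmpeg parakeet-mlx; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "missing prerequisite: $tool"
    exit 0
  fi
done
echo "would run: parakeet-mlx audio.mp3 --output-format vtt --verbose"
```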

When to use it

  • Transcribing audio on a Mac with Apple Silicon without sending data to a third-party API.
  • Batch-processing many files using shell wildcards like *.mp3.
  • Generating publication-ready captions (SRT/VTT) for videos or podcasts.
  • Working offline or in environments with strict privacy requirements.
  • Testing or archiving audio transcripts locally before distribution.

Best practices

  • Install the CLI via: uv tool install parakeet-mlx -U (do not use pip or uv add).
  • Ensure ffmpeg is installed and available in PATH for reliable audio preprocessing.
  • Run a single-file test first so the model downloads and caches to ~/.cache/huggingface.
  • Use --output-format all for multiple formats in one pass, and --verbose when you need confidence scores.
  • Prefer the default Apple Silicon–optimized model unless you have a specific reason to choose another.

Example use cases

  • Transcribe a meeting recording: parakeet-mlx meeting.m4a --output-format txt
  • Create captions for a lecture video: parakeet-mlx lecture.mp3 --output-format vtt --highlight-words
  • Batch convert a podcast archive: parakeet-mlx *.mp3 --output-format all
  • Produce machine-readable output with --output-format json, adding --verbose for confidence scores during quality checks
  • Process audio locally when compliance or privacy rules forbid cloud transcription

FAQ

Do I need an API key?

No. The tool runs locally and does not require any API key.

Where are models stored?

Models download to ~/.cache/huggingface on the first run and are reused from that cache.

What audio formats are supported?

The CLI accepts common audio files; ffmpeg handles format conversion, so most formats like mp3 and m4a work.