
mlx-whisper skill


This skill transcribes audio locally using Apple Silicon-optimized MLX Whisper, requires no API key, and delivers ready-to-use transcripts, subtitles, and translations.

npx playbooks add skill openclaw/skills --skill mlx-whisper

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
---
name: mlx-whisper
version: 1.0.0
description: Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).
homepage: https://github.com/ml-explore/mlx-examples/tree/main/whisper
metadata: {"clawdbot":{"emoji":"🍎","requires":{"bins":["mlx_whisper"]},"install":[{"id":"pip","kind":"pip","package":"mlx-whisper","bins":["mlx_whisper"],"label":"Install mlx-whisper (pip)"}]}}
---

# MLX Whisper

Local speech-to-text using Apple MLX, optimized for Apple Silicon Macs.

## Quick Start

```bash
mlx_whisper /path/to/audio.mp3 --model mlx-community/whisper-large-v3-turbo
```

## Common Usage

```bash
# Transcribe to text file
mlx_whisper audio.m4a -f txt -o ./output

# Transcribe with language hint
mlx_whisper audio.mp3 --language en --model mlx-community/whisper-large-v3-turbo

# Generate subtitles (SRT)
mlx_whisper video.mp4 -f srt -o ./subs

# Translate to English
mlx_whisper foreign.mp3 --task translate
```

## Models (download on first use)

| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| mlx-community/whisper-tiny | ~75MB | Fastest | Basic |
| mlx-community/whisper-base | ~140MB | Fast | Good |
| mlx-community/whisper-small | ~470MB | Medium | Better |
| mlx-community/whisper-medium | ~1.5GB | Slower | Great |
| mlx-community/whisper-large-v3 | ~3GB | Slowest | Best |
| mlx-community/whisper-large-v3-turbo | ~1.6GB | Fast | Excellent (Recommended) |

## Notes

- Requires Apple Silicon Mac (M1/M2/M3/M4)
- Models cache to `~/.cache/huggingface/`
- Default model is `mlx-community/whisper-tiny`; use `--model mlx-community/whisper-large-v3-turbo` for best results

Overview

This skill provides local speech-to-text using MLX Whisper, optimized for Apple Silicon Macs and designed to run without an API key. It supports multiple models and formats, from quick transcripts to subtitle (SRT) generation and translation. The default model is lightweight for fast runs, but larger models are available for higher accuracy. Models download on first use and are cached locally for repeated runs.

How this skill works

The tool runs inference locally on Apple Silicon (M1/M2/M3/M4) using MLX-optimized Whisper models. It loads a chosen model (default tiny) from the Hugging Face cache, processes audio or video files, and outputs text, subtitles, or translations. You can select model size, language hints, output format, and task (transcribe or translate) via command-line options.
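Because models load from the Hugging Face cache, you can check up front whether a given model is already local before relying on offline runs. A minimal sketch, assuming the default cache location (repo names map to directories by replacing `/` with `--`, per the Hub cache layout):

```shell
#!/bin/sh
# Sketch: test whether a model is already in the local Hugging Face cache.
# Assumes the default cache location; "/" in the repo id becomes "--" on disk.
model="mlx-community/whisper-large-v3-turbo"
dir="$HOME/.cache/huggingface/hub/models--$(printf '%s' "$model" | sed 's|/|--|g')"
if [ -d "$dir" ]; then
  echo "cached: $model (runs fully offline)"
else
  echo "not cached: $model (first run will download it)"
fi
```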

When to use it

  • Transcribing interviews, meetings, or lectures on an Apple Silicon Mac without sending audio to external APIs.
  • Generating SRT subtitles for videos when you need local processing and quick turnaround.
  • Translating spoken audio into English or another language offline.
  • Batch processing archived audio files where privacy or bandwidth is a concern.
  • Testing different model sizes to balance speed and transcription quality.

Best practices

  • Use mlx-community/whisper-large-v3-turbo for best balance of speed and accuracy when disk space allows.
  • Keep models cached in ~/.cache/huggingface/ to avoid repeated downloads across runs.
  • Provide a language hint (--language) for improved accuracy when audio language is known.
  • Use higher-quality or larger models for noisy audio or speakers with accents.
  • Transcode very long or unsupported formats to standard audio (WAV or MP3) before processing.
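The transcoding step in the last bullet can be scripted. A minimal sketch assuming ffmpeg is installed (e.g. via Homebrew); the `wav_name` and `to_wav` helper names are illustrative, not part of mlx-whisper:

```shell
#!/bin/sh
# Sketch: normalize any media file to 16 kHz mono WAV before transcription.
# Requires ffmpeg; helper names are hypothetical.
wav_name() {
  # lecture.mkv -> lecture.wav
  printf '%s.wav\n' "${1%.*}"
}

to_wav() {
  out="$(wav_name "$1")"
  # 16 kHz mono matches Whisper's expected input format
  ffmpeg -y -i "$1" -ar 16000 -ac 1 "$out" && printf '%s\n' "$out"
}

# Usage: to_wav lecture.mkv && mlx_whisper lecture.wav
```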

Example use cases

  • Transcribe a meeting audio file to a text file: mlx_whisper meeting.mp3 -f txt -o ./output
  • Generate subtitles for a video: mlx_whisper video.mp4 -f srt -o ./subs
  • Translate foreign-language audio to English: mlx_whisper foreign.mp3 --task translate
  • Quick local transcription for notes with the tiny model on a laptop: mlx_whisper note.m4a
  • Process a folder of archived interviews locally to preserve privacy and avoid API costs.
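The batch-processing case can be sketched as a simple loop; directory names and the model choice below are illustrative:

```shell
#!/bin/sh
# Sketch: batch-transcribe every .mp3 under ./interviews to text in ./output.
mkdir -p ./output
n=0
for f in ./interviews/*.mp3; do
  [ -e "$f" ] || continue   # glob matched nothing: skip cleanly
  mlx_whisper "$f" -f txt -o ./output \
    --model mlx-community/whisper-large-v3-turbo
  n=$((n + 1))
done
echo "transcribed $n file(s)"
```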

FAQ

Does this require an internet connection?

You need internet to download models the first time, but inference runs locally after models are cached.
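To guarantee later runs stay offline, you can pre-fetch a model while connected. A hedged sketch, assuming `huggingface-cli` is available (it ships with the `huggingface_hub` package that mlx-whisper depends on); the `PREFETCH` guard is illustrative, added so the script dry-runs by default rather than downloading gigabytes:

```shell
#!/bin/sh
# Sketch: pre-fetch a model while online so later runs never touch the network.
# Set PREFETCH=1 to actually download (~1.6 GB for large-v3-turbo).
model="mlx-community/whisper-large-v3-turbo"

if [ "${PREFETCH:-0}" = "1" ]; then
  huggingface-cli download "$model"
  # HF_HUB_OFFLINE=1 makes the hub library fail fast instead of downloading.
  HF_HUB_OFFLINE=1 mlx_whisper audio.mp3 --model "$model"
else
  echo "dry run; set PREFETCH=1 to download $model"
fi
```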

Which Macs are supported?

Apple Silicon Macs (M1/M2/M3/M4) are required for the MLX-optimized models.

Where are models stored?

Downloaded models are cached in ~/.cache/huggingface/ by default.