```bash
npx playbooks add skill openclaw/skills --skill whisper-local-api
```
---
name: whisper-local-api
description: Offline, OpenAI-compatible local Whisper ASR endpoint for OpenClaw. Runs faster-whisper (large-v3-turbo) with no cloud telemetry, a low RAM footprint, and high-accuracy speech-to-text, suitable for private AI agent voice commands.
---
# Whisper Local API - Secure & Private ASR
Deploy a privacy-first, fully local speech-to-text service with a repeatable, scripted setup. This lets OpenClaw transcribe audio safely on your own hardware without ever contacting third-party cloud APIs.
## Key Features
* **100% Offline & Private:** Your voice data, commands, and transcriptions never leave your host system. Zero cloud dependencies.
* **Highly Accurate:** Uses the `large-v3-turbo` model via `faster-whisper`, achieving state-of-the-art accuracy even with accents or background noise.
* **Low Memory Footprint:** Runs in roughly 400–500 MB of RAM, lightweight enough for a VPS or low-resource edge server.
* **OpenAI API Compatible:** Exposes a `/v1/audio/transcriptions` endpoint that matches OpenAI's request and response format, so any client that supports OpenAI's Whisper API works unchanged.
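For reference, a successful transcription follows OpenAI's JSON response shape. A minimal sketch of parsing it (the sample payload below is illustrative, not captured from this service):

```python
import json

# Illustrative response body in OpenAI's transcription format;
# the service is expected to return {"text": "..."} for the default response_format.
sample_response = '{"text": "turn off the kitchen lights"}'

def extract_transcript(body: str) -> str:
    """Pull the transcript out of an OpenAI-style JSON response body."""
    payload = json.loads(body)
    return payload["text"]

print(extract_transcript(sample_response))  # -> turn off the kitchen lights
```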
## Standard Workflow
1. Install/update runtime:
```bash
bash scripts/bootstrap.sh
```
2. Start service:
```bash
bash scripts/start.sh
```
3. Validate service health:
```bash
bash scripts/healthcheck.sh
```
4. (Optional) Run a smoke transcription test with a local audio file:
```bash
bash scripts/smoke-test.sh /path/to/test-speech.mp3
```
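Under the hood, `scripts/healthcheck.sh` only needs the HTTP service to answer on its port. A rough Python equivalent (the exact health route is an assumption; an unreachable server simply yields `False`):

```python
import urllib.request
import urllib.error

def service_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the local ASR service answers HTTP at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status: it is up.
        return True
    except (urllib.error.URLError, OSError):
        return False
```

After `scripts/start.sh` succeeds, `service_is_up("http://localhost:9000")` should return `True`.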
## Repo Location
Default install/update path used by scripts:
* `~/whisper-local-api`
Override with env var before running scripts:
```bash
WHISPER_DIR=/custom/path bash scripts/bootstrap.sh
```
## OpenClaw Integration Notes
After the healthcheck passes, point OpenClaw at the local endpoint:
* URL: `http://localhost:9000`
* Endpoint: `/v1/audio/transcriptions`
No authentication tokens are sent over the network; the endpoint is unauthenticated by design (see Safety Rules below).
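Any OpenAI-style client can talk to this endpoint, but the request can also be built by hand. A sketch using only the Python standard library (the `file` and `model` field names follow OpenAI's API; the model value the service accepts is an assumption here):

```python
import json
import urllib.request
import uuid

def build_multipart(filename: str, audio: bytes, model: str = "whisper-1"):
    """Build a multipart/form-data body matching OpenAI's transcription API."""
    boundary = uuid.uuid4().hex
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n{model}\r\n'.encode(),
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode() + audio + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path: str, url: str = "http://localhost:9000/v1/audio/transcriptions") -> str:
    """Upload an audio file to the local endpoint and return the transcript."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(path.rsplit("/", 1)[-1], f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

With the service running, `transcribe("/path/to/test-speech.mp3")` should return the transcript string, mirroring `scripts/smoke-test.sh`.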
## Safety Rules
* Ask before any package-manager operations.
* The API binds to `0.0.0.0` (all interfaces) by default, so it is reachable from your network, not just localhost. If exposing it to the public internet, deploy behind a reverse proxy (such as Nginx) and enforce HTTPS + Basic Auth.
* The service automatically falls back from `float16` to `int8` compute precision when the higher-precision mode is unsupported, preventing crashes on constrained CPUs.
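The precision fallback above can be sketched generically. This is an illustrative helper, not the service's actual code; with `faster-whisper` the loader would be something like `lambda p: WhisperModel("large-v3-turbo", compute_type=p)`:

```python
def load_with_fallback(load, precisions=("float16", "int8")):
    """Try each compute precision in order; return (model, precision) for the first that loads."""
    last_error = None
    for precision in precisions:
        try:
            return load(precision), precision
        except (ValueError, RuntimeError, MemoryError) as exc:
            last_error = exc  # e.g. float16 unsupported on this CPU
    raise last_error

# Demo with a fake loader that rejects float16, as an unsupported CPU would.
def fake_loader(precision):
    if precision == "float16":
        raise ValueError("float16 not supported on this CPU")
    return f"model[{precision}]"

model, used = load_with_fallback(fake_loader)
print(used)  # -> int8
```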
## FAQ
**Is audio ever sent to external services?**
No. All transcription runs locally; there are no cloud telemetry or external API calls.
**What resources does it require?**
Typical memory footprint is roughly 400–500 MB of RAM. CPU requirements depend on throughput and precision; the `float16` → `int8` fallback keeps the service stable on constrained systems.
**How do I integrate it with software expecting OpenAI's Whisper API?**
Point clients at the local URL (default `http://localhost:9000`) and use the `/v1/audio/transcriptions` endpoint; responses follow OpenAI's JSON format.