
whisper-local-api skill


This skill provides a private, offline OpenAI-compatible Whisper ASR endpoint for fast, accurate speech-to-text on your own hardware.

npx playbooks add skill openclaw/skills --skill whisper-local-api


SKILL.md
---
name: whisper-local-api
description: Secure, offline, OpenAI-compatible local Whisper ASR endpoint for OpenClaw. Features faster-whisper (large-v3-turbo), built-in privacy with no cloud telemetry, low-RAM usage footprint, and high-accuracy speech-to-text transcription. Perfect for safe and private AI agent voice commands.
---

# Whisper Local API - Secure & Private ASR

Deploy a privacy-first, fully local speech-to-text service with a deterministic, scripted setup. This lets OpenClaw process audio transcriptions on your own hardware without ever contacting third-party cloud APIs.

## Key Features

*   **100% Offline & Private:** Your voice data, commands, and transcriptions never leave your host system. Zero cloud dependencies.
*   **Highly Accurate:** Uses the `large-v3-turbo` model via `faster-whisper`, achieving state-of-the-art accuracy even with accents or background noise.
*   **Lightweight:** Runs in roughly 400-500MB of RAM, making it suitable for VPS or low-resource edge servers.
*   **OpenAI API Compatible:** Exposes a `/v1/audio/transcriptions` endpoint that matches OpenAI's request and response format, so any client that supports OpenAI's Whisper API works unchanged.
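The request format matches OpenAI's multipart upload. A minimal stdlib-only sketch of building and sending such a request (the `model` value is accepted for compatibility; the placeholder audio bytes and filename are illustrative):

```python
"""Sketch of an OpenAI-style multipart request to the local endpoint."""
import urllib.error
import urllib.request
import uuid

API_URL = "http://localhost:9000/v1/audio/transcriptions"


def build_transcription_request(audio_bytes: bytes, filename: str) -> tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload of one audio file."""
    boundary = uuid.uuid4().hex
    parts = [
        # OpenAI clients send a "model" field; the local service uses its own model
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\nwhisper-1\r\n'.encode(),
        # The audio file itself, as the "file" form field
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode()
        + audio_bytes
        + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, content_type = build_transcription_request(b"\x00" * 16, "clip.wav")
    req = urllib.request.Request(API_URL, data=body, headers={"Content-Type": content_type})
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(resp.read().decode())
    except (urllib.error.URLError, OSError):
        print("service not reachable at", API_URL)
```

In practice any existing OpenAI Whisper client can be pointed at `http://localhost:9000` instead; this sketch only shows what goes over the wire.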

## Standard Workflow

1. Install/update runtime:
   ```bash
   bash scripts/bootstrap.sh
   ```
2. Start service:
   ```bash
   bash scripts/start.sh
   ```
3. Validate service health:
   ```bash
   bash scripts/healthcheck.sh
   ```
4. (Optional) Run a smoke transcription test with a local audio file:
   ```bash
   bash scripts/smoke-test.sh /path/to/test-speech.mp3
   ```

## Repo Location

Default install/update path used by scripts:
*   `~/whisper-local-api`

Override with env var before running scripts:
```bash
WHISPER_DIR=/custom/path bash scripts/bootstrap.sh
```

## OpenClaw Integration Notes

After the healthcheck passes, use the secure local endpoint:
*   URL: `http://localhost:9000`
*   Endpoint: `/v1/audio/transcriptions`

No authentication tokens are passed over the network.
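A successful response follows OpenAI's JSON shape: an object whose `text` field holds the transcript. A small sketch of parsing it (the sample payload is illustrative):

```python
import json


def parse_transcription(payload: str) -> str:
    """Extract the transcript from an OpenAI-style JSON response.

    The default response format is a JSON object with a "text" field.
    """
    data = json.loads(payload)
    if "text" not in data:
        raise ValueError(f"unexpected response shape: {sorted(data)}")
    return data["text"]


# Illustrative payload matching OpenAI's documented response format
sample = '{"text": "turn on the workshop lights"}'
print(parse_transcription(sample))  # → turn on the workshop lights
```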

## Safety Rules

*   Ask before any package-manager operations.
*   The API binds to `0.0.0.0` (all interfaces), so it is reachable from the network by default. If exposing it beyond localhost, deploy behind a reverse proxy (like Nginx) and enforce HTTPS + Basic Auth.
*   The service automatically falls back from `float16` to `int8` compute precision to avoid memory-related failures on constrained CPUs.
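For the reverse-proxy rule above, a hypothetical Nginx sketch with TLS and Basic Auth in front of the local service (the hostname, certificate paths, and htpasswd file are assumptions for illustration):

```nginx
server {
    listen 443 ssl;
    server_name whisper.example.com;

    ssl_certificate     /etc/ssl/certs/whisper.pem;
    ssl_certificate_key /etc/ssl/private/whisper.key;

    auth_basic           "Whisper ASR";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location /v1/audio/transcriptions {
        proxy_pass http://127.0.0.1:9000;
        client_max_body_size 100m;  # allow larger audio uploads
    }
}
```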

Overview

This skill provides a secure, offline OpenAI-compatible Whisper ASR endpoint optimized for OpenClaw. It runs large-v3-turbo via faster-whisper for high-accuracy speech-to-text while keeping all data local and private. The service is memory-efficient (~400–500MB RAM) and exposes a drop-in /v1/audio/transcriptions endpoint matching OpenAI’s Whisper format.

How this skill works

It installs and runs a local HTTP service that implements the OpenAI Whisper transcription API and uses faster-whisper with the large-v3-turbo model. Audio sent to the /v1/audio/transcriptions endpoint is transcribed on-device with automatic memory-safe fallbacks (float16 → int8) to avoid crashes. No telemetry or cloud calls are made; the endpoint is compatible with any client expecting OpenAI-style JSON responses.

When to use it

  • You need fully local, private transcription for voice commands or sensitive audio.
  • Deploying on low-memory VPS, edge devices, or development machines where RAM is constrained.
  • Integrating speech input with OpenClaw or other agents expecting the OpenAI Whisper API.
  • When you want deterministic, offline testing or archiving of ASR results without external dependencies.

Best practices

  • Run the service behind a secure reverse proxy (HTTPS + Basic Auth) if you expose it beyond localhost.
  • Validate health with the included healthcheck and smoke-test scripts after installation.
  • Set WHISPER_DIR to override the default install path before bootstrap if you require a custom layout.
  • Confirm package-manager changes with the operator before upgrades; the tool asks before altering system packages.
  • Monitor memory usage on very small hosts and prefer int8 fallback on constrained CPUs.

Example use cases

  • Local voice control for OpenClaw agents without sending audio to the cloud.
  • Batch-transcribing archived audio for offline analytics or searching sensitive recordings.
  • Embedding private ASR in demos, labs, or workshops where internet access is restricted.
  • Edge deployments on low-cost VPS for remote teams that require private transcription.

FAQ

Is audio ever sent to external services?

No. All transcription runs locally; there is no cloud telemetry and there are no external API calls.

What resources does it require?

Typical memory footprint is around 400–500MB RAM; CPU requirements depend on throughput and model precision, and the float16→int8 fallback avoids memory-related crashes on constrained systems.
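The fallback decision can be pictured as a simple precision choice based on available memory. A purely illustrative sketch (the threshold and function name are hypothetical; the real service makes this decision internally when loading the model):

```python
def choose_compute_type(available_ram_mb: int, threshold_mb: int = 1024) -> str:
    """Prefer float16 when memory is plentiful; fall back to int8 otherwise.

    Hypothetical sketch of the float16 -> int8 fallback described above;
    the 1024 MB threshold is an assumption, not the service's actual value.
    """
    return "float16" if available_ram_mb >= threshold_mb else "int8"


print(choose_compute_type(2048))  # roomy host keeps float16
print(choose_compute_type(512))   # constrained host falls back to int8
```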

How do I integrate it with software expecting OpenAI’s Whisper API?

Point clients at the local URL (default http://localhost:9000) and use the /v1/audio/transcriptions endpoint; responses follow OpenAI’s JSON format for seamless compatibility.