
whisper-local-api skill


This skill provides a private, offline OpenAI-compatible Whisper ASR endpoint for fast, accurate speech-to-text on your own hardware.

npx playbooks add skill openclaw/skills --skill whisper-local-api


SKILL.md
---
name: whisper-local-api
description: Secure, offline, OpenAI-compatible local Whisper ASR endpoint for OpenClaw. Features faster-whisper (large-v3-turbo), built-in privacy with no cloud telemetry, low-RAM usage footprint, and high-accuracy speech-to-text transcription. Perfect for safe and private AI agent voice commands.
---

# Whisper Local API - Secure & Private ASR

Deploy a privacy-first, fully local speech-to-text service with a deterministic, scripted setup. This lets OpenClaw process audio transcriptions on your own hardware without ever contacting third-party cloud APIs.

## Key Features

*   **100% Offline & Private:** Your voice data, commands, and transcriptions never leave your host system. Zero cloud dependencies.
*   **Highly Accurate:** Uses the `large-v3-turbo` model via `faster-whisper`, achieving state-of-the-art accuracy even with accents or background noise.
*   **Lightweight:** Runs in roughly 400-500MB of RAM, making it suitable for VPS or low-resource edge servers.
*   **OpenAI API Compatible:** Exposes a `/v1/audio/transcriptions` endpoint that matches OpenAI's request and response format, so any client that supports OpenAI's Whisper API works unchanged.
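The request format matches OpenAI's multipart upload. A minimal stdlib-only sketch of building and sending such a request (the `model` value is accepted for compatibility; the placeholder audio bytes and filename are illustrative):

```python
"""Sketch of an OpenAI-style multipart request to the local endpoint."""
import urllib.error
import urllib.request
import uuid

API_URL = "http://localhost:9000/v1/audio/transcriptions"


def build_transcription_request(audio_bytes: bytes, filename: str) -> tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload of one audio file."""
    boundary = uuid.uuid4().hex
    parts = [
        # OpenAI clients send a "model" field; the local service uses its own model
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\nwhisper-1\r\n'.encode(),
        # The audio file itself, as the "file" form field
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode()
        + audio_bytes
        + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, content_type = build_transcription_request(b"\x00" * 16, "clip.wav")
    req = urllib.request.Request(API_URL, data=body, headers={"Content-Type": content_type})
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(resp.read().decode())
    except (urllib.error.URLError, OSError):
        print("service not reachable at", API_URL)
```

In practice any existing OpenAI Whisper client can be pointed at `http://localhost:9000` instead; this sketch only shows what goes over the wire.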

## Standard Workflow

1. Install/update runtime:
   ```bash
   bash scripts/bootstrap.sh
   ```
2. Start service:
   ```bash
   bash scripts/start.sh
   ```
3. Validate service health:
   ```bash
   bash scripts/healthcheck.sh
   ```
4. (Optional) Run a smoke transcription test with a local audio file:
   ```bash
   bash scripts/smoke-test.sh /path/to/test-speech.mp3
   ```

## Repo Location

Default install/update path used by scripts:
*   `~/whisper-local-api`

Override with env var before running scripts:
```bash
WHISPER_DIR=/custom/path bash scripts/bootstrap.sh
```

## OpenClaw Integration Notes

After the healthcheck passes, use the secure local endpoint:
*   URL: `http://localhost:9000`
*   Endpoint: `/v1/audio/transcriptions`

No authentication tokens are passed over the network.
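A successful response follows OpenAI's JSON shape: an object whose `text` field holds the transcript. A small sketch of parsing it (the sample payload is illustrative):

```python
import json


def parse_transcription(payload: str) -> str:
    """Extract the transcript from an OpenAI-style JSON response.

    The default response format is a JSON object with a "text" field.
    """
    data = json.loads(payload)
    if "text" not in data:
        raise ValueError(f"unexpected response shape: {sorted(data)}")
    return data["text"]


# Illustrative payload matching OpenAI's documented response format
sample = '{"text": "turn on the workshop lights"}'
print(parse_transcription(sample))  # → turn on the workshop lights
```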

## Safety Rules

*   Ask before any package-manager operations.
*   The API binds to `0.0.0.0` (all interfaces), so it is reachable from the network by default. If exposing it beyond localhost, deploy behind a reverse proxy (like Nginx) and enforce HTTPS + Basic Auth.
*   The service automatically falls back from `float16` to `int8` compute precision to avoid memory-related failures on constrained CPUs.
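For the reverse-proxy rule above, a hypothetical Nginx sketch with TLS and Basic Auth in front of the local service (the hostname, certificate paths, and htpasswd file are assumptions for illustration):

```nginx
server {
    listen 443 ssl;
    server_name whisper.example.com;

    ssl_certificate     /etc/ssl/certs/whisper.pem;
    ssl_certificate_key /etc/ssl/private/whisper.key;

    auth_basic           "Whisper ASR";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location /v1/audio/transcriptions {
        proxy_pass http://127.0.0.1:9000;
        client_max_body_size 100m;  # allow larger audio uploads
    }
}
```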

Overview

This skill provides a secure, offline OpenAI-compatible Whisper ASR endpoint optimized for OpenClaw. It runs large-v3-turbo via faster-whisper for high-accuracy speech-to-text while keeping all data local and private. The service is memory-efficient (~400–500MB RAM) and exposes a drop-in /v1/audio/transcriptions endpoint matching OpenAI’s Whisper format.

How this skill works

It installs and runs a local HTTP service that implements the OpenAI Whisper transcription API and uses faster-whisper with the large-v3-turbo model. Audio sent to the /v1/audio/transcriptions endpoint is transcribed on-device with automatic memory-safe fallbacks (float16 → int8) to avoid crashes. No telemetry or cloud calls are made; the endpoint is compatible with any client expecting OpenAI-style JSON responses.

When to use it

  • You need fully local, private transcription for voice commands or sensitive audio.
  • Deploying on low-memory VPS, edge devices, or development machines where RAM is constrained.
  • Integrating speech input with OpenClaw or other agents expecting the OpenAI Whisper API.
  • When you want deterministic, offline testing or archiving of ASR results without external dependencies.

Best practices

  • Run the service behind a secure reverse proxy (HTTPS + Basic Auth) if you expose it beyond localhost.
  • Validate health with the included healthcheck and smoke-test scripts after installation.
  • Set WHISPER_DIR to override the default install path before bootstrap if you require a custom layout.
  • Confirm package-manager changes with the operator before upgrades; the tool asks before altering system packages.
  • Monitor memory usage on very small hosts and prefer int8 fallback on constrained CPUs.

Example use cases

  • Local voice control for OpenClaw agents without sending audio to the cloud.
  • Batch-transcribing archived audio for offline analytics or searching sensitive recordings.
  • Embedding private ASR in demos, labs, or workshops where internet access is restricted.
  • Edge deployments on low-cost VPS for remote teams that require private transcription.

FAQ

Is audio ever sent to external services?

No. All transcription runs locally; there is no cloud telemetry and there are no external API calls.

What resources does it require?

Typical memory footprint is around 400–500MB RAM; CPU requirements depend on throughput and model precision, and the float16→int8 fallback avoids memory-related crashes on constrained systems.
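The fallback decision can be pictured as a simple precision choice based on available memory. A purely illustrative sketch (the threshold and function name are hypothetical; the real service makes this decision internally when loading the model):

```python
def choose_compute_type(available_ram_mb: int, threshold_mb: int = 1024) -> str:
    """Prefer float16 when memory is plentiful; fall back to int8 otherwise.

    Hypothetical sketch of the float16 -> int8 fallback described above;
    the 1024 MB threshold is an assumption, not the service's actual value.
    """
    return "float16" if available_ram_mb >= threshold_mb else "int8"


print(choose_compute_type(2048))  # roomy host keeps float16
print(choose_compute_type(512))   # constrained host falls back to int8
```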

How do I integrate it with software expecting OpenAI’s Whisper API?

Point clients at the local URL (default http://localhost:9000) and use the /v1/audio/transcriptions endpoint; responses follow OpenAI’s JSON format for seamless compatibility.