
openai-whisper-api skill


This skill transcribes audio using the OpenAI Whisper API, generating text transcripts from audio files with configurable model, language, and output options.

npx playbooks add skill openclaw/skills --skill openai-whisper-api

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
1.1 KB
---
name: openai-whisper-api
description: Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
homepage: https://platform.openai.com/docs/guides/speech-to-text
metadata: {"clawdbot":{"emoji":"☁️","requires":{"bins":["curl"],"env":["OPENAI_API_KEY"]},"primaryEnv":"OPENAI_API_KEY"}}
---

# OpenAI Whisper API (curl)

Transcribe an audio file via OpenAI’s `/v1/audio/transcriptions` endpoint.

## Quick start

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
```

Defaults:
- Model: `whisper-1`
- Output: `<input>.txt`

## Useful flags

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.ogg --model whisper-1 --out /tmp/transcript.txt
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --language en
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --prompt "Speaker names: Peter, Daniel"
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a --json --out /tmp/transcript.json
```

## API key

Set `OPENAI_API_KEY`, or configure it in `~/.clawdbot/clawdbot.json`:

```json5
{
  skills: {
    "openai-whisper-api": {
      apiKey: "OPENAI_KEY_HERE"
    }
  }
}
```

Overview

This skill transcribes audio using the OpenAI Audio Transcriptions API (Whisper) so you can convert speech to text quickly and reliably. It uses the whisper-1 model by default and writes output to a text file next to the input unless you specify a different path. The tool supports model selection, language hints, custom prompts, and JSON output for integration.

How this skill works

The script calls the /v1/audio/transcriptions endpoint with your audio file and API key. Flags let you select the model, force a specific language, provide a prompt (for speaker names or domain context), and choose plain-text or JSON output. API credentials are read from the OPENAI_API_KEY environment variable or from a local configuration file.
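Under the hood this is a standard multipart POST. A minimal sketch of the equivalent curl request (the audio path is a placeholder, and the command is printed rather than executed so you can inspect it first):

```shell
# Equivalent request the script makes (a sketch). AUDIO is a placeholder path.
AUDIO="audio.m4a"
CMD=(curl -sS https://api.openai.com/v1/audio/transcriptions
  -H "Authorization: Bearer ${OPENAI_API_KEY:-sk-your-key}"
  -F "file=@${AUDIO}"
  -F "model=whisper-1"
  -F "response_format=text")
# Print the assembled command for inspection; run "${CMD[@]}" to transcribe.
printf '%s ' "${CMD[@]}"; echo
```

The `--language` and `--prompt` flags map to additional `-F language=...` and `-F prompt=...` form fields.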

When to use it

  • Transcribing interviews, meetings, or lectures to searchable text.
  • Generating captions or notes from recorded audio for content production.
  • Batch-processing archived audio as part of an indexing workflow.
  • When you need language hints or speaker-context prompts to improve accuracy.
  • Exporting structured transcripts in JSON for downstream parsing.
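For the batch-processing case, a minimal loop sketch (shown as a dry run that prints each command; remove `echo` to execute, and substitute your own directory for the placeholder `./recordings`):

```shell
# Batch sketch: transcribe every .m4a in a folder. Outputs default to <input>.txt.
for f in ./recordings/*.m4a; do
  [ -e "$f" ] || continue            # skip if the glob matched nothing
  echo {baseDir}/scripts/transcribe.sh "$f"
done
```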

Best practices

  • Use clear, high-quality audio and reduce background noise for better accuracy.
  • Pass --language with an ISO-639-1 code when auto-detection is unreliable, e.g. for short clips or mixed-language audio.
  • Use the prompt flag to supply speaker names or domain context when relevant.
  • Request JSON output if you plan to programmatically segment or label transcripts.
  • Keep your OPENAI_API_KEY secure and load it from environment variables or a protected config file.
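With --json, the output should be the API's json response format, an object with a "text" field. A minimal extraction sketch (python3 is used here to avoid extra dependencies; `jq -r .text transcript.json` works equally well if jq is installed; the file content below is a stand-in for real output):

```shell
# Stand-in for a real --json transcript, then pull the plain text out of it.
printf '%s' '{"text":"hello world"}' > transcript.json
python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["text"])' transcript.json
```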

Example use cases

  • Run scripts/transcribe.sh meeting.m4a to generate meeting notes in meeting.m4a.txt.
  • Transcribe a podcast episode to JSON, then import timestamps and speakers into an editor.
  • Process a folder of lecture recordings to create searchable text for an LMS.
  • Provide speaker names in a prompt for panel discussions to improve labeling.

FAQ

Where does the API key come from?

Set OPENAI_API_KEY in your environment or place it under the skill entry in a local config file.

How do I change the output file?

Use the --out flag to specify a different output path and filename.

Can I force the transcription language?

Yes — pass the --language flag with the ISO language code to avoid incorrect auto-detection.