
transcribe skill

/transcribe

This skill transcribes audio to text using the Groq Whisper API, handling multiple formats and delivering clean, punctuated output.

npx playbooks add skill badlogic/pi-skills --skill transcribe

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: transcribe
description: Speech-to-text transcription using Groq Whisper API. Supports m4a, mp3, wav, ogg, flac, webm.
---

# Transcribe

Speech-to-text using the Groq Whisper API.

## Setup

The script needs the `GROQ_API_KEY` environment variable. Check whether it is already set:
```bash
echo $GROQ_API_KEY
```
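
Echoing the variable prints the secret to the terminal and to any captured logs. A minimal alternative, assuming a POSIX-compatible shell, reports only whether the variable is set. The function name `check_groq_key` is illustrative and not part of the skill:

```bash
# Illustrative helper (not part of the skill): report whether the key is set
# without printing its value.
check_groq_key() {
  if [ -n "${GROQ_API_KEY:-}" ]; then
    echo "GROQ_API_KEY is set"
  else
    echo "GROQ_API_KEY is not set"
    return 1
  fi
}
```

Calling `check_groq_key` prints one line and returns non-zero when the key is missing, so it can also gate a script.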

If not set, guide the user through setup:
1. Ask if they have a Groq API key
2. If not, have them sign up at https://console.groq.com/ and create an API key
3. Have them add the following line to their shell profile (~/.zshrc or ~/.bashrc):
   ```bash
   export GROQ_API_KEY="<their-api-key>"
   ```
4. Then run `source ~/.zshrc` or `source ~/.bashrc`, whichever file was edited (or restart the terminal)

## Usage

```bash
{baseDir}/transcribe.sh <audio-file>
```

## Supported Formats

- m4a, mp3, wav, ogg, flac, webm
- Max file size: 25 MB
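
These limits can be checked before calling the script. The sketch below is not part of the skill: the function name `check_audio_file` is hypothetical, and it assumes the limit means 25 × 1024 × 1024 bytes, which the source does not specify.

```bash
# Hypothetical pre-flight check; not part of transcribe.sh.
MAX_BYTES=$((25 * 1024 * 1024))   # assumption: the 25 MB limit read as 25 MiB

check_audio_file() {
  local file="$1" ext size
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }

  # Lower-case the extension so "MP3" and "mp3" are treated alike.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    m4a|mp3|wav|ogg|flac|webm) ;;
    *) echo "unsupported format: $ext" >&2; return 1 ;;
  esac

  # wc -c is portable; the arithmetic expansion strips any padding it prints.
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
}
```

The function returns 0 for an acceptable file and prints the reason to stderr otherwise, for example `check_audio_file recording.m4a && {baseDir}/transcribe.sh recording.m4a`.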

## Output

Prints a plain-text transcription with punctuation and proper capitalization to stdout.

Overview

This skill provides speech-to-text transcription using the Groq Whisper API. It accepts common audio formats and returns a punctuation- and capitalization-aware plain-text transcription to stdout. It’s designed for quick command-line use in developer workflows.

How this skill works

The script uploads an audio file to the Groq Whisper API and receives a text transcription response. It supports m4a, mp3, wav, ogg, flac, and webm files up to 25 MB. The tool requires a GROQ_API_KEY environment variable for authenticated requests and prints the final transcription to standard output.
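
The upload step described above can be sketched with `curl`. This is not the skill's actual `transcribe.sh`, whose contents are not shown here. The endpoint is Groq's OpenAI-compatible transcription route; the model name `whisper-large-v3` and the `response_format=text` field are assumptions about what the script sends, and the function name `transcribe_sketch` is hypothetical.

```bash
# Hypothetical sketch of the upload step; not the skill's transcribe.sh.
transcribe_sketch() {
  local file="$1"
  [ -n "${GROQ_API_KEY:-}" ] || { echo "GROQ_API_KEY is not set" >&2; return 1; }
  [ -f "$file" ] || { echo "file not found: $file" >&2; return 1; }

  # Multipart upload; --fail makes curl exit non-zero on HTTP errors.
  # The model and response_format values are assumptions, not taken from the skill.
  curl -sS --fail \
    -H "Authorization: Bearer ${GROQ_API_KEY}" \
    -F "file=@${file}" \
    -F "model=whisper-large-v3" \
    -F "response_format=text" \
    https://api.groq.com/openai/v1/audio/transcriptions
}
```

Validating the key and the file before the request keeps failures local and cheap: a missing key or path never reaches the network.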

When to use it

  • Convert meeting recordings or interviews into searchable text
  • Generate captions or notes from voice memos and podcasts
  • Extract text from short audio clips for indexing or analysis
  • Rapidly transcribe demos or spoken code explanations for documentation
  • Integrate into CI or automation that needs quick transcription results from the command line

Best practices

  • Keep audio under 25 MB or split longer recordings into segments
  • Use clear, single-speaker recordings for highest accuracy
  • Set GROQ_API_KEY in your shell profile (e.g., ~/.zshrc or ~/.bashrc) and reload the shell
  • Normalize audio levels and remove background noise before upload when possible
  • Name files with timestamps or metadata to keep transcriptions organized
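
Splitting a long recording, as suggested in the first item, can be done with ffmpeg's segment muxer. The sketch below is not part of the skill and assumes ffmpeg is installed; `split_audio` and the `DRY_RUN` switch are hypothetical names. Segment length is given in seconds, so pick a value that keeps each piece under the limit for your bitrate (at 128 kbit/s, ten minutes is roughly 9.6 MB).

```bash
# Hypothetical helper: split a recording into fixed-length segments.
# Assumes ffmpeg is installed. With DRY_RUN=1 it only prints the command.
split_audio() {
  local input="$1" seconds="${2:-600}"
  local base="${input%.*}" ext="${input##*.}"
  # -c copy avoids re-encoding; segments are cut at packet boundaries.
  local cmd=(ffmpeg -i "$input" -f segment -segment_time "$seconds"
             -c copy "${base}_%03d.${ext}")

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `split_audio talk.mp3 600` would write `talk_000.mp3`, `talk_001.mp3`, and so on. If stream copy produces unplayable segments for a given container, re-encoding instead of `-c copy` is the usual fallback.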

Example use cases

  • Transcribe a 10-minute podcast clip to produce episode show notes
  • Convert interview recordings into editable text for publishing
  • Generate searchable transcripts of user research sessions
  • Produce captions for short product demo videos
  • Automate transcription in a CI task that verifies spoken instructions or tests
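
For the batch and automation cases, a small wrapper can run the script over every supported file in a directory. This is a sketch, not part of the skill: `transcribe_dir` and the `TRANSCRIBE_CMD` variable are hypothetical, and the default path to `transcribe.sh` is an assumption.

```bash
# Hypothetical batch wrapper. TRANSCRIBE_CMD should point at the skill's
# transcribe.sh; the default path here is an assumption.
transcribe_dir() {
  local dir="$1" f out
  for f in "$dir"/*.m4a "$dir"/*.mp3 "$dir"/*.wav \
           "$dir"/*.ogg "$dir"/*.flac "$dir"/*.webm; do
    [ -f "$f" ] || continue            # unmatched globs stay literal; skip them
    out="${f%.*}.txt"
    # Write to a temp file first so a failed run leaves no partial transcript.
    if "${TRANSCRIBE_CMD:-./transcribe.sh}" "$f" > "$out.tmp"; then
      mv "$out.tmp" "$out"
    else
      rm -f "$out.tmp"
      echo "failed: $f" >&2
    fi
  done
}
```

Each transcript lands next to its audio file with a `.txt` extension, and failures are reported on stderr without stopping the loop.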

FAQ

What audio formats are supported?

m4a, mp3, wav, ogg, flac, and webm are supported.

How large can the audio file be?

Maximum file size is 25 MB. Split longer files before uploading.

How do I supply the API key?

Set GROQ_API_KEY as an environment variable in your shell profile (e.g., ~/.zshrc or ~/.bashrc), then reload the shell or restart the terminal.