
This skill helps you integrate real-time Deepgram transcription with low latency, speaker diarization, and multilingual support into applications.

npx playbooks add skill willsigmon/sigstack --skill deepgram-expert

---
name: Deepgram Expert
description: Deepgram - real-time transcription, low latency, speaker diarization, enterprise-ready
allowed-tools: Read, Edit, Bash, WebFetch
model: sonnet
---

# Deepgram Expert

Enterprise-grade real-time speech-to-text.

## Pricing (2026)

- **$0.0043/minute** (pay-as-you-go)
- Volume tiers automatic (no contracts)
- Real-time and batch options
- Speaker diarization included
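
Since billing is metered per audio minute, cost scales linearly with duration. A quick back-of-envelope helper (the rate constant mirrors the pay-as-you-go price above; adjust if your tier differs):

```python
# Estimate transcription cost at the pay-as-you-go rate ($0.0043/min).
RATE_PER_MINUTE = 0.0043

def transcription_cost(audio_seconds: float) -> float:
    """Estimated cost in USD for a given audio duration."""
    return round(audio_seconds / 60 * RATE_PER_MINUTE, 4)

print(transcription_cost(3600))  # one hour of audio -> 0.258
```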

## Key Advantages
- **20 seconds** to transcribe 1 hour
- Low latency streaming (0.2-0.3x real-time)
- 5.8% WER on technical audio
- Speaker diarization built-in
- 30+ languages

## API Usage

### Batch Transcription
```python
from deepgram import DeepgramClient, PrerecordedOptions, FileSource

deepgram = DeepgramClient(api_key="...")

options = PrerecordedOptions(
    model="nova-3",
    smart_format=True,
    diarize=True,
    punctuate=True
)

# Read the audio into a buffer and send it as a FileSource payload
with open("audio.mp3", "rb") as audio:
    payload: FileSource = {"buffer": audio.read()}

response = deepgram.listen.rest.v("1").transcribe_file(payload, options)

print(response.results.channels[0].alternatives[0].transcript)
```

### Real-Time Streaming
```python
from deepgram import DeepgramClient, LiveOptions, LiveTranscriptionEvents

deepgram = DeepgramClient(api_key="...")

options = LiveOptions(
    model="nova-3",
    language="en",
    smart_format=True,
    interim_results=True
)

connection = deepgram.listen.websocket.v("1")

def on_transcript(self, result, **kwargs):
    if result.is_final:
        print(result.channel.alternatives[0].transcript)

connection.on(LiveTranscriptionEvents.Transcript, on_transcript)
connection.start(options)

# Stream audio, then close the connection
connection.send(audio_chunk)
connection.finish()
```

### Speaker Diarization
```python
options = PrerecordedOptions(
    model="nova-3",
    diarize=True
)

# Response includes speaker labels
for word in response.results.channels[0].alternatives[0].words:
    print(f"Speaker {word.speaker}: {word.word}")
```
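
Word-level speaker labels can be folded into per-speaker utterances for readable output. A minimal sketch, assuming a `words` list shaped like the SDK response (each entry carrying `speaker` and `word` fields); the sample data here is illustrative, not API output:

```python
from itertools import groupby

# Stand-in for response.results.channels[0].alternatives[0].words
words = [
    {"speaker": 0, "word": "hello"},
    {"speaker": 0, "word": "there"},
    {"speaker": 1, "word": "hi"},
]

# Collapse consecutive words from the same speaker into one utterance
utterances = [
    (speaker, " ".join(w["word"] for w in group))
    for speaker, group in groupby(words, key=lambda w: w["speaker"])
]

for speaker, text in utterances:
    print(f"Speaker {speaker}: {text}")
```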

## Model Options

| Model | Speed | Accuracy | Use Case |
|-------|-------|----------|----------|
| nova-3 | Fast | Best | General |
| whisper | Medium | Good | Accuracy focus |
| enhanced | Fast | Good | Low latency |

## Integration

### Node.js SDK
```typescript
import fs from "fs";
import { createClient } from "@deepgram/sdk";

const deepgram = createClient(process.env.DEEPGRAM_KEY);

const { result, error } = await deepgram.listen.prerecorded.transcribeFile(
  fs.readFileSync("audio.mp3"),
  { model: "nova-3", smart_format: true }
);
```

## Best For
- Real-time applications
- Call center transcription
- Live captioning
- Voice analytics
- Technical/medical audio

Use when: Real-time transcription, low-latency requirements, speaker identification

Overview

This skill packages Deepgram's enterprise-grade speech-to-text capabilities for real-time and batch transcription with low latency and built-in speaker diarization. It targets high-throughput production use: live captioning, call centers, voice analytics, and technical audio. The implementation supports Node.js and Python SDKs and exposes model selection and streaming controls for latency/accuracy trade-offs.

How this skill works

The skill uses Deepgram’s SDKs and WebSocket streaming to send audio and receive transcripts with interim and final results. Batch transcription runs via REST/file uploads, returning formatted transcripts and word-level speaker labels when diarization is enabled. Model options (nova-3, whisper, enhanced) let you prioritize speed or accuracy; pricing is metered per minute with automatic volume tiers.
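
Interim results for a given stretch of audio arrive repeatedly and are superseded until a final result lands, so a client should display interim text but persist only finals. A minimal sketch of that buffering pattern (application-side logic, not SDK code):

```python
class TranscriptBuffer:
    """Show interim results immediately; persist only finals."""

    def __init__(self):
        self.finalized = []  # committed transcript segments
        self.interim = ""    # latest provisional text, replaced on each update

    def update(self, text: str, is_final: bool) -> None:
        if is_final:
            self.finalized.append(text)
            self.interim = ""
        else:
            self.interim = text  # supersedes the previous interim

    def display(self) -> str:
        return " ".join(self.finalized + ([self.interim] if self.interim else []))

buf = TranscriptBuffer()
buf.update("hel", is_final=False)          # provisional, shown but not stored
buf.update("hello world", is_final=True)   # committed
print(buf.display())  # "hello world"
```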

When to use it

  • Live captioning for events or broadcasts requiring sub-second latency.
  • Call center transcription with speaker diarization for agent/customer separation.
  • Voice analytics pipelines that need fast transcript availability for indexing.
  • Transcribing technical or medical audio where low WER matters.
  • Batch processing large audio archives where speed and cost efficiency matter.

Best practices

  • Choose nova-3 for general low-latency use, whisper when accuracy is prioritized.
  • Enable smart_format and punctuate to get readable, production-ready transcripts.
  • Use interim_results for responsive UIs, but rely on is_final for persistent storage.
  • Enable diarize for multi-speaker audio and map speaker labels to meeting metadata.
  • Chunk long streams into manageable segments to reduce memory and simplify retry logic.
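
The chunking advice above can be sketched as a generator that yields fixed-size frames from an audio file, so each websocket send stays small and is easy to retry (file path and frame size are illustrative):

```python
def audio_frames(path: str, frame_bytes: int = 8192):
    """Yield fixed-size audio chunks so each send stays small and retryable."""
    with open(path, "rb") as f:
        while chunk := f.read(frame_bytes):
            yield chunk

# Usage with a live connection (connection from the streaming example):
# for chunk in audio_frames("audio.raw"):
#     connection.send(chunk)
```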

Example use cases

  • Real-time captions for webinars and live streams with sub-second display.
  • Automated call summaries and QA in contact centers with speaker separation.
  • Indexer for searchable meeting archives with timestamped, speaker-labeled words.
  • Medical research transcription where technical accuracy and fast turnaround are required.

FAQ

What latency can I expect for streaming transcription?

Typical low-latency streaming runs around 0.2–0.3x real-time; final transcript availability depends on model and network conditions.

Does diarization increase cost or processing time?

Diarization is included and adds modest processing overhead; it can increase turnaround time slightly for batch jobs but is available in both real-time and prerecorded flows.