This skill helps you manage audio systems for Leavn and Modcaster, enabling real-time TTS, caption syncing, recording, and fingerprinting across devices.
```
npx playbooks add skill willsigmon/sigstack --skill audio-expert
```
Review the files below or copy the command above to add this skill to your agents.
---
name: Audio Expert
description: Leavn and Modcaster audio systems - TTS, streaming, recording, fingerprinting, ML processing, caption sync
allowed-tools: Read, Edit, Grep
---
# Audio Expert
Audio architecture for Leavn (Bible app) and Modcaster (podcast app).
## Leavn Audio Systems
### Core Services
- `AudioPlaybackService` - TTS + verse reading
- `GuidedAudioOrchestrator` - Guided mode coordination
- `SermonRecordingService` - Sermon capture
- `StreamingTTSClient` - Real-time TTS
- `ChatterboxKit` - Voice profiles
### Caption & Timing
- `CaptionTimecodeAligner` - Precision sync
- `CaptionSyncCoordinator` - State machine
- Verse boundary detection
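The core lookup behind a `CaptionTimecodeAligner`-style service — mapping a playback timecode to the verse currently being spoken — can be sketched as a binary search over sorted verse start times. This is an illustrative sketch, not Leavn's actual implementation; the `VerseBoundary` type and `verseIndex(at:in:)` helper are hypothetical names.

```swift
import Foundation

// Hypothetical sketch: verse boundaries arrive as sorted start times (seconds);
// the aligner finds the last boundary at or before the current playback time.
struct VerseBoundary {
    let verseIndex: Int
    let startTime: TimeInterval
}

func verseIndex(at time: TimeInterval, in boundaries: [VerseBoundary]) -> Int? {
    // Binary search for the last boundary whose startTime <= time.
    var low = 0, high = boundaries.count - 1
    var result: Int? = nil
    while low <= high {
        let mid = (low + high) / 2
        if boundaries[mid].startTime <= time {
            result = boundaries[mid].verseIndex
            low = mid + 1
        } else {
            high = mid - 1
        }
    }
    return result
}
```

Returning `nil` before the first boundary lets the caption state machine distinguish "no verse yet" from verse 1, which matters when audio starts with a preamble.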
### Audio Graph
- `AudioGraphManager` initialization
- Scene modes: CarPlay, background, etc.
### Common Fixes
- Audio conflicts/interruptions
- TTS voice selection
- Recording state management
- Caption synchronization
## Modcaster Audio Systems
### Audio Fingerprinting
Spectral peak extraction (Shazam-style):
1. Apply FFT using vDSP
2. Extract spectral peaks
3. Create constellation map
4. Hash into compact fingerprint
Use cases: intro/outro detection, ad identification, and cross-show matching.
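Step 4 above — hashing constellation-map peak pairs into compact fingerprints — can be sketched in pure Swift. Peak extraction (steps 1–3, FFT via vDSP) is assumed to have already produced `(frequencyBin, frameIndex)` pairs; the bit layout and fan-out value here are illustrative, not Modcaster's actual format.

```swift
import Foundation

// Assumed output of steps 1-3: spectral peaks as (frequency bin, time frame).
struct Peak {
    let frequencyBin: Int   // e.g. 0..<512 after a 1024-point FFT
    let frameIndex: Int     // position in time
}

// Shazam-style pairing: each anchor peak is combined with a few following
// target peaks, and (anchorFreq, targetFreq, timeDelta) is packed into a
// UInt32 hash: 9 bits + 9 bits + 14 bits.
func fingerprintHashes(peaks: [Peak], fanOut: Int = 5) -> [(hash: UInt32, frame: Int)] {
    var hashes: [(hash: UInt32, frame: Int)] = []
    for (i, anchor) in peaks.enumerated() {
        for target in peaks.dropFirst(i + 1).prefix(fanOut) {
            let dt = target.frameIndex - anchor.frameIndex
            guard dt > 0, dt < (1 << 14) else { continue }
            let h = UInt32(anchor.frequencyBin & 0x1FF) << 23
                  | UInt32(target.frequencyBin & 0x1FF) << 14
                  | UInt32(dt)
            hashes.append((hash: h, frame: anchor.frameIndex))
        }
    }
    return hashes
}
```

Matching then reduces to looking up hashes in an index and checking that the anchor frames of the hits line up with a consistent time offset.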
### On-Device ML
- `AVAudioEngine` tap installation, buffer processing
- `Core ML` models for voice enhancement, noise reduction
- `Sound Analysis` for speech/music classification
- Neural Engine utilization
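Installing the tap itself is framework-specific, but the per-buffer math a tap callback feeds downstream can be sketched standalone. This is a hypothetical example: on device, `samples` would come from `AVAudioPCMBuffer.floatChannelData`; here it is a plain `[Float]` so the logic runs anywhere.

```swift
import Foundation

// RMS level of one audio buffer: a typical first feature computed inside
// an AVAudioEngine tap callback before handing off to ML processing.
func rmsLevel(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumSquares / Float(samples.count)).squareRoot()
}

// Simple noise gate: skip ML work on buffers that are effectively silence.
// The threshold is a placeholder, not a tuned value.
func isAboveGate(_ samples: [Float], threshold: Float = 0.01) -> Bool {
    rmsLevel(samples) >= threshold
}
```

Gating cheap features like RMS before invoking a model is a common way to keep Neural Engine utilization (and battery cost) down on quiet passages.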
### Content Classification
- Episode type: full/trailer/bonus
- Ad segment detection
- Intro/outro recognition
- Speech vs music separation
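Speech/music separation typically starts from simple per-frame features. As an illustration only — real classification in Modcaster would go through the `Sound Analysis` framework — here is a zero-crossing-rate feature, one classic input to such classifiers.

```swift
import Foundation

// Zero-crossing rate: the fraction of adjacent sample pairs that change sign.
// Speech tends to show different ZCR dynamics than sustained music, which is
// why it appears as an input feature in speech-vs-music classifiers.
func zeroCrossingRate(_ samples: [Float]) -> Float {
    guard samples.count > 1 else { return 0 }
    var crossings = 0
    for i in 1..<samples.count where (samples[i - 1] < 0) != (samples[i] < 0) {
        crossings += 1
    }
    return Float(crossings) / Float(samples.count - 1)
}
```

A classifier would consume this alongside spectral features per frame; no single threshold on ZCR alone is reliable, so none is suggested here.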
This skill describes an audio infrastructure blueprint for Leavn (Bible app) and Modcaster (podcast app). It covers TTS, streaming, recording, fingerprinting, on-device ML, and caption synchronization to build reliable playback, capture, and content analysis pipelines. The emphasis is practical: services, common fixes, and component interactions for production mobile/audio systems.
The architecture splits responsibilities into focused services: playback and TTS, guided orchestration, sermon recording, streaming TTS, and voice profile management. Caption alignment and stateful caption sync ensure precise verse timing. For podcast workflows, the system extracts spectral peaks, hashes constellation maps into compact fingerprints, and runs on-device ML taps for enhancement and classification.
How do I avoid audio interruptions when switching scenes?
Centralize audio session management in `AudioGraphManager` and use explicit scene modes (CarPlay, background) with prioritized resource handling to prevent conflicts.
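The arbitration described above can be sketched as a small priority-based coordinator. The `SceneMode` cases, their priorities, and the `SceneArbiter` type are illustrative assumptions, not Leavn's actual `AudioGraphManager` API.

```swift
import Foundation

// Hypothetical scene-mode arbitration: features request a scene instead of
// reconfiguring the audio session directly, and the highest-priority active
// scene wins, preventing interruption conflicts.
enum SceneMode: Int, Comparable {
    case background = 0
    case foreground = 1
    case carPlay = 2        // highest priority: never preempted by the others

    static func < (lhs: SceneMode, rhs: SceneMode) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

struct SceneArbiter {
    private(set) var active: SceneMode = .background

    // A new scene takes over only if its priority is at least the active one's.
    mutating func request(_ mode: SceneMode) -> Bool {
        guard mode >= active else { return false }
        active = mode
        return true
    }

    mutating func release(_ mode: SceneMode) {
        if mode == active { active = .background }
    }
}
```

Routing every session change through one arbiter also gives a single place to log transitions, which makes interruption bugs far easier to reproduce.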
Where should I run ML models for noise reduction?
Prefer `Core ML` models executed on the Neural Engine with `AVAudioEngine` taps for buffer capture; this keeps processing local and minimizes network dependency.
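The configuration side of this answer is a one-liner. A minimal fragment, assuming a hypothetical `NoiseReducer` model class (the `MLModelConfiguration` API is real; the model name is not):

```swift
import CoreML

// Request all compute units so Core ML may schedule eligible layers on the
// Neural Engine, falling back to GPU/CPU where it cannot.
let config = MLModelConfiguration()
config.computeUnits = .all
// let model = try NoiseReducer(configuration: config)  // hypothetical model class
```

`.all` is usually the right default; forcing `.cpuOnly` is mainly useful when debugging numerical differences between compute units.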