audio-fingerprint-expert skill

/plugins/media/skills/audio-fingerprint-expert

This skill helps you detect and skip intros and ads, and identify music, using audio fingerprints across platforms.

npx playbooks add skill willsigmon/sigstack --skill audio-fingerprint-expert

SKILL.md

---
name: Audio Fingerprint Expert
description: Audio fingerprinting - music recognition, ad detection, intro/outro skipping
allowed-tools: Read, Edit, Bash, WebFetch
model: sonnet
---

# Audio Fingerprint Expert

Identify and match audio content using fingerprinting.

## Use Cases for Modcaster
- Skip intros/outros automatically
- Detect and skip ads
- Identify music in podcasts
- Match duplicate content

## Top Services

### Commercial APIs

**AudD ($2-5/1000 requests)**
- Neural network based
- Music recognition
- Real-time and batch

**ACRCloud**
- Industry leader
- Cross-platform SDKs
- Custom fingerprint databases

**ShazamAPI (via RapidAPI)**
- The classic
- Huge music database
- Enterprise options

### Open Source

**AcoustID (Free)**
- Links to MusicBrainz
- Community-powered
- Chromaprint fingerprinting

**Dejavu**
- Python implementation
- Self-hosted
- Custom audio matching

## AudD API

### Recognize Music
```bash
curl -X POST "https://api.audd.io/" \
  -F "api_token=YOUR_TOKEN" \
  -F "file=@audio.mp3" \
  -F "return=spotify,apple_music"
```

### Python
```python
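import requests

# Open the file in a context manager so the handle is closed after upload
with open('audio.mp3', 'rb') as audio_file:
    response = requests.post(
        'https://api.audd.io/',
        data={
            'api_token': 'YOUR_TOKEN',
            'return': 'spotify,apple_music',
        },
        files={'file': audio_file},
        timeout=30,
    )
response.raise_for_status()  # Surface HTTP errors early

result = response.json()
# 'result' is empty when no match is found
if result.get('result'):
    print(f"Found: {result['result']['title']} by {result['result']['artist']}")
else:
    print("No match found")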
```
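
### Handling the Response

The response check above can be made a little more defensive. A minimal sketch, assuming the JSON carries a top-level `status` field and a `result` object that is null when nothing matched. The helper name `describe_match` and the sample payloads are hypothetical, written by hand for illustration rather than captured from the API.

```python
from typing import Optional


def describe_match(payload: dict) -> Optional[str]:
    """Return 'title by artist' for a successful match, else None.

    Assumes a payload shaped like {"status": ..., "result": {...} or None}.
    """
    if payload.get("status") != "success":
        return None
    result = payload.get("result")
    if not result:
        return None
    title = result.get("title", "unknown title")
    artist = result.get("artist", "unknown artist")
    return f"{title} by {artist}"


# Hand-written sample payloads (not real API output)
matched = {"status": "success", "result": {"title": "Song A", "artist": "Band B"}}
no_match = {"status": "success", "result": None}
```

Returning `None` for both errors and misses keeps the calling code simple; split the two cases if you need to retry on errors.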

## AcoustID (Free)

### Generate Fingerprint
```bash
# Install chromaprint
brew install chromaprint

# Generate fingerprint
fpcalc -json audio.mp3
```
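
### Read the Fingerprint in Python

The `fpcalc -json` output can be consumed from a script. A minimal sketch, assuming `fpcalc` is on the PATH and prints a JSON object with `duration` and `fingerprint` keys. The function names and the sample string are illustrative, and the sample fingerprint is shortened by hand.

```python
import json
import subprocess


def parse_fpcalc_json(text: str) -> tuple[float, str]:
    """Extract (duration_seconds, fingerprint) from fpcalc's JSON output."""
    data = json.loads(text)
    return float(data["duration"]), data["fingerprint"]


def fingerprint_file(path: str) -> tuple[float, str]:
    """Run fpcalc on a file; requires the fpcalc binary on PATH."""
    completed = subprocess.run(
        ["fpcalc", "-json", path],
        capture_output=True, text=True, check=True,
    )
    return parse_fpcalc_json(completed.stdout)


# Hand-written sample shaped like fpcalc output (fingerprint shortened)
sample = '{"duration": 12.5, "fingerprint": "AQAAT0mUaEkSRZEG"}'
duration, fingerprint = parse_fpcalc_json(sample)
```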

### Lookup
```python
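import acoustid

API_KEY = 'YOUR_ACOUSTID_KEY'  # Application API key from acoustid.org

# match() fingerprints the file with Chromaprint and queries the AcoustID web service
for score, recording_id, title, artist in acoustid.match(API_KEY, 'audio.mp3'):
    print(f"Match ({score:.2f}): {title} by {artist}")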
```

## Dejavu (Self-Hosted)

### Setup
```python
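from dejavu import Dejavu
# Import path used by recent Dejavu releases; older ones use `from dejavu.recognize import FileRecognizer`
from dejavu.logic.recognizer.file_recognizer import FileRecognizer

# Dejavu's documented backends are MySQL and PostgreSQL; connection values are placeholders
djv = Dejavu({
    "database_type": "mysql",
    "database": {
        "host": "127.0.0.1",
        "user": "root",
        "password": "YOUR_PASSWORD",
        "database": "dejavu",
    },
})

# Fingerprint known audio
djv.fingerprint_directory("known_intros/", [".mp3", ".wav"])

# Match unknown audio
songs = djv.recognize(FileRecognizer, "podcast_episode.mp3")
print(songs)  # Returns matches with offsets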
```

## Podcast Ad Detection Pattern

```python
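# Sketch: assumes `djv` and `FileRecognizer` from the Dejavu setup, plus
# `known_ads` (list of file paths) and `episode_file` defined by the caller

# 1. Fingerprint known ads
for ad_file in known_ads:
    djv.fingerprint_file(ad_file)

# 2. When processing an episode, pass the recognizer class as in the setup
matches = djv.recognize(FileRecognizer, episode_file)

# 3. Get timestamps of ads
# NOTE: 'offset' and 'duration' are placeholder keys; map them to the fields
# your Dejavu version returns, and take ad lengths from your own metadata
ad_segments = [(m['offset'], m['offset'] + m['duration']) for m in matches]

# 4. Skip those segments in the player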
```
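
### Skipping Segments in the Player

Step 4 of the pattern above can be sketched without any fingerprinting library. This assumes ad segments are `(start_seconds, end_seconds)` tuples. The names `merge_segments` and `next_position` and the 0.5 second merge gap are illustrative choices, not part of any player API.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def merge_segments(segments: List[Segment], gap: float = 0.5) -> List[Segment]:
    """Merge segments that overlap or sit within `gap` seconds of each other."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def next_position(position: float, segments: List[Segment]) -> float:
    """Return where playback should continue: the end of the ad segment
    containing `position`, or `position` itself if it is not inside one."""
    for start, end in segments:
        if start <= position < end:
            return end
    return position


ads = merge_segments([(300.0, 330.0), (60.0, 90.0), (89.8, 120.0)])
```

Merging first avoids the player seeking twice when two detected ads run back to back.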

## Accuracy Tips
- Use 10-30 second samples
- Higher sample rate = better accuracy
- Background noise and heavy compression reduce match accuracy
- Store fingerprints, not audio

Use when: Music recognition, ad skipping, duplicate detection, audio matching

Overview

This skill implements audio fingerprinting to identify and match audio content for tasks like music recognition, ad detection, intro/outro skipping, and duplicate detection. It supports both commercial APIs and open-source self-hosted options so you can choose trade-offs between cost, latency, and control. The skill focuses on generating and comparing compact fingerprints rather than storing raw audio.

How this skill works

The skill extracts short, robust fingerprints from audio segments and compares them against a database or remote API to find matches and timestamps. It can fingerprint known items (ads, intros, songs) and then scan new recordings to locate those signatures, returning match scores and segment offsets. Integrations include services like AudD, ACRCloud, Shazam (via RapidAPI), and open-source tools like Chromaprint/AcoustID and Dejavu for self-hosted workflows.
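
The comparison step described above can be illustrated with a toy matcher. This is a sketch only, not how AudD, ACRCloud, or Dejavu match internally. It assumes fingerprints are sequences of 32-bit integers (the form Chromaprint emits with `fpcalc -raw`) and scores alignment by bit error rate. The function names and the sample values are made up for illustration.

```python
from typing import List, Tuple


def bit_error_rate(a: List[int], b: List[int]) -> float:
    """Fraction of differing bits between two equal-length lists of 32-bit values."""
    if len(a) != len(b) or not a:
        raise ValueError("fingerprints must be non-empty and equal length")
    differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(a, b))
    return differing / (32 * len(a))


def best_offset(reference: List[int], recording: List[int]) -> Tuple[int, float]:
    """Slide `reference` over `recording`; return (offset, lowest bit error rate)."""
    if len(reference) > len(recording):
        raise ValueError("reference must not be longer than recording")
    best = (0, 1.0)
    for offset in range(len(recording) - len(reference) + 1):
        window = recording[offset:offset + len(reference)]
        rate = bit_error_rate(reference, window)
        if rate < best[1]:
            best = (offset, rate)
    return best


# Toy data: the intro appears at frame 3 with one bit flipped
intro = [0x0F0F0F0F, 0x12345678, 0xDEADBEEF]
episode = [0x0, 0xFFFFFFFF, 0xAAAAAAAA, 0x0F0F0F0F, 0x12345679, 0xDEADBEEF, 0x55555555]
offset, rate = best_offset(intro, episode)
```

A low bit error rate at some offset suggests the reference occurs there; the threshold for calling it a match is something to tune on your own audio.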

When to use it

Automatically skip podcast intros, outros, or known ad breaks during playback or processing.
Detect and label advertisements and promos in long-form audio for analytics or redaction.
Recognize songs within podcasts or user uploads to provide metadata and rights info.
Find and remove duplicate clips across a media library or archive.
Build a custom fingerprint database for brand monitoring or content matching.

Best practices

Fingerprints: store compact fingerprints instead of raw audio to save storage and speed up matching.
Sample length: use 10–30 second representative samples for reliable matches; shorter noisy samples reduce accuracy.
Audio quality: prefer higher sample rates and minimize background noise when fingerprinting reference files.
Hybrid approach: use commercial APIs for broad catalog coverage and self-hosted fingerprints for proprietary content.
Timestamps: persist match offsets and durations to enable precise skipping or clipping in playback.
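
Persisting offsets and durations, as the last practice suggests, could look like the following. A minimal sketch using Python's built-in `sqlite3`; the table layout, column names, and helper functions are assumptions chosen for illustration, not a schema the skill prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent store
conn.execute(
    """
    CREATE TABLE segments (
        episode_id TEXT NOT NULL,
        label      TEXT NOT NULL,   -- e.g. 'ad', 'intro', 'outro'
        start_s    REAL NOT NULL,
        duration_s REAL NOT NULL,
        score      REAL
    )
    """
)


def save_segment(episode_id, label, start_s, duration_s, score=None):
    """Persist one matched segment."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO segments VALUES (?, ?, ?, ?, ?)",
            (episode_id, label, start_s, duration_s, score),
        )


def skip_ranges(episode_id):
    """Return (start, end) pairs for an episode, ordered by start time."""
    rows = conn.execute(
        "SELECT start_s, start_s + duration_s FROM segments "
        "WHERE episode_id = ? ORDER BY start_s",
        (episode_id,),
    )
    return rows.fetchall()


save_segment("ep-001", "ad", 300.0, 30.0, 0.92)
save_segment("ep-001", "intro", 0.0, 15.0, 0.98)
```

Storing the start and duration rather than a precomputed end keeps the row valid if a segment is later trimmed.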

Example use cases

Podcast platform automatically removes or masks known ad segments before distribution using stored fingerprints.
Media player that skips recurring intro/outro music by matching against a local fingerprint database in real time.
Content operations team identifies reused clips across thousands of episodes to detect duplicate uploads.
Music discovery feature that recognizes songs inside episodes and links to streaming metadata (Spotify/Apple Music).
Brand monitoring system that finds when sponsor ads run inside third-party content using fingerprint matches.

FAQ

Which solution should I pick: commercial API or self-hosted?

Use commercial APIs for broad coverage and fast setup; choose self-hosted (Dejavu/Chromaprint+AcoustID) when you need full control, lower per-request cost, or to fingerprint proprietary content.
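
The two options can also be combined. A minimal sketch of a local-first lookup, assuming each backend is wrapped in a function that takes a file path and returns a match dict or `None`. The `identify` helper and the stub lookups are hypothetical stand-ins for real clients.

```python
from typing import Callable, Optional

Lookup = Callable[[str], Optional[dict]]


def identify(path: str, local_lookup: Lookup, remote_lookup: Lookup) -> Optional[dict]:
    """Try the self-hosted database first; call the paid API only on a miss."""
    match = local_lookup(path)
    if match is not None:
        return {**match, "source": "local"}
    match = remote_lookup(path)
    if match is not None:
        return {**match, "source": "remote"}
    return None


# Stub lookups standing in for real clients
def fake_local(path: str) -> Optional[dict]:
    return {"title": "Show intro"} if path == "intro.mp3" else None


def fake_remote(path: str) -> Optional[dict]:
    return {"title": "Some song"} if path == "song.mp3" else None
```

Checking the local database first means per-request charges apply only to audio your own fingerprints do not cover.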

How long should reference samples be?

Aim for 10–30 seconds of clear audio. That balances uniqueness for matching with storage and compute efficiency.

Can fingerprints survive noise or compression?

Yes. Fingerprinting is designed to be robust to common compression and moderate noise, but excessive distortion or a very low bitrate can reduce accuracy.