
audio-analyzer skill

/audio-analyzer

This skill analyzes audio files to reveal tempo, key, loudness, and frequency insights with visualizations and exportable reports.

npx playbooks add skill dkyazzentwatwa/chatgpt-skills --skill audio-analyzer

---
name: audio-analyzer
description: Comprehensive audio analysis with waveform visualization, spectrogram, BPM detection, key detection, frequency analysis, and loudness metrics.
---

# Audio Analyzer

A comprehensive toolkit for analyzing audio files. Extract tempo, musical key, frequency content, and loudness metrics, and generate professional visualizations.

## Quick Start

```python
from scripts.audio_analyzer import AudioAnalyzer

# Analyze an audio file
analyzer = AudioAnalyzer("song.mp3")
analyzer.analyze()

# Get all analysis results
results = analyzer.get_results()
print(f"BPM: {results['tempo']['bpm']}")
print(f"Key: {results['key']['key']} {results['key']['mode']}")

# Generate visualizations
analyzer.plot_waveform("waveform.png")
analyzer.plot_spectrogram("spectrogram.png")

# Full report
analyzer.save_report("analysis_report.json")
```

## Features

- **Tempo/BPM Detection**: Beat tracking with a confidence score
- **Key Detection**: Musical key and mode (major/minor) identification
- **Frequency Analysis**: Spectrum, dominant frequencies, frequency bands
- **Loudness Metrics**: RMS, peak, approximate LUFS, dynamic range
- **Waveform Visualization**: Multi-channel waveform plots
- **Spectrogram**: Time-frequency visualization with customization
- **Chromagram**: Pitch class visualization for harmonic analysis
- **Beat Grid**: Visual beat markers overlaid on waveform
- **Export Formats**: JSON report, PNG/SVG visualizations

## API Reference

### Initialization

```python
# From file
analyzer = AudioAnalyzer("audio.mp3")

# With custom sample rate
analyzer = AudioAnalyzer("audio.wav", sr=44100)
```

### Analysis Methods

```python
# Run full analysis
analyzer.analyze()

# Individual analyses
analyzer.analyze_tempo()      # BPM and beat positions
analyzer.analyze_key()        # Musical key detection
analyzer.analyze_loudness()   # RMS, peak, LUFS
analyzer.analyze_frequency()  # Spectrum analysis
analyzer.analyze_dynamics()   # Dynamic range
```

### Results Access

```python
# Get all results as dict
results = analyzer.get_results()

# Individual results
tempo = analyzer.get_tempo()        # {'bpm': 120, 'confidence': 0.85, 'beats': [...]}
key = analyzer.get_key()            # {'key': 'C', 'mode': 'major', 'confidence': 0.72}
loudness = analyzer.get_loudness()  # {'rms_db': -14.2, 'peak_db': -0.5, 'lufs': -14.0}
freq = analyzer.get_frequency()     # {'dominant_freq': 440, 'spectrum': [...]}
```

### Visualization Methods

```python
# Waveform
analyzer.plot_waveform(
    output="waveform.png",
    figsize=(12, 4),
    color="#1f77b4",
    show_rms=True
)

# Spectrogram
analyzer.plot_spectrogram(
    output="spectrogram.png",
    figsize=(12, 6),
    cmap="magma",           # viridis, plasma, inferno, magma
    freq_scale="log",       # linear, log, mel
    max_freq=8000           # Hz
)

# Chromagram (pitch classes)
analyzer.plot_chromagram(
    output="chromagram.png",
    figsize=(12, 4)
)

# Onset strength / beat grid
analyzer.plot_beats(
    output="beats.png",
    figsize=(12, 4),
    show_strength=True
)

# Combined dashboard
analyzer.plot_dashboard(
    output="dashboard.png",
    figsize=(14, 10)
)
```
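A chromagram folds spectral energy into the 12 pitch classes (C through B), regardless of octave. The NumPy sketch below computes a single chroma vector from one FFT frame to illustrate the idea; it is not the skill's implementation (which likely builds on librosa, for example `librosa.feature.chroma_stft`, though that is an assumption). `chroma_vector` and its frequency limits are hypothetical.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(samples, sr, fmin=27.5, fmax=4186.0):
    """Fold FFT power into 12 pitch classes (hypothetical helper, single frame)."""
    x = np.asarray(samples, dtype=float)
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    keep = (freqs >= fmin) & (freqs <= fmax)  # roughly the piano range A0..C8
    # Nearest MIDI note: 69 is A4 (440 Hz); MIDI note % 12 gives 0 = C ... 11 = B
    midi = np.round(69 + 12 * np.log2(freqs[keep] / 440.0)).astype(int)
    chroma = np.zeros(12)
    np.add.at(chroma, midi % 12, mag[keep] ** 2)  # accumulate power per pitch class
    return chroma / (chroma.max() + 1e-12)  # normalize so the strongest class is ~1


# Usage: a 1-second 440 Hz sine should peak at pitch class A
sr = 22050
t = np.arange(sr) / sr
chroma = chroma_vector(np.sin(2 * np.pi * 440.0 * t), sr)
print(PITCH_CLASSES[int(np.argmax(chroma))])  # A
```

A real chromagram repeats this per STFT frame, producing a 12 x frames matrix that is then plotted over time.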

### Export

```python
# JSON report with all analysis
analyzer.save_report("report.json")

# Summary text
summary = analyzer.get_summary()
print(summary)
```

## Analysis Details

### Tempo Detection

Uses a beat-tracking algorithm to detect:
- **BPM**: Beats per minute (tempo)
- **Beat positions**: Timestamps of detected beats
- **Confidence**: Reliability score (0-1)

```python
tempo = analyzer.get_tempo()
# {
#     'bpm': 128.0,
#     'confidence': 0.89,
#     'beats': [0.0, 0.469, 0.938, 1.406, ...],  # seconds
#     'beat_count': 256
# }
```
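One common way to turn an onset-strength envelope into a BPM estimate is autocorrelation: a periodic pulse correlates strongly with itself when shifted by one beat period. The NumPy sketch below shows only that idea; the skill's actual beat tracker is not documented here (full trackers such as `librosa.beat.beat_track` also place individual beats), and `estimate_bpm`, its BPM search range, and the frame rate are assumptions. Like real trackers, it can report half or double the true tempo on some material.

```python
import numpy as np


def estimate_bpm(onset_env, frame_rate, min_bpm=60.0, max_bpm=200.0):
    """Estimate tempo from an onset-strength envelope via autocorrelation (sketch)."""
    x = np.asarray(onset_env, dtype=float)
    x = x - x.mean()  # remove DC so the autocorrelation reflects periodicity
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep lags 0 .. N-1
    # Translate the BPM search range into a lag range measured in frames
    min_lag = max(1, int(np.floor(60.0 * frame_rate / max_bpm)))
    max_lag = min(int(np.ceil(60.0 * frame_rate / min_bpm)), len(ac) - 1)
    best_lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / best_lag


# Usage: onset peaks every 50 frames at 100 frames/s, i.e. a 0.5 s period = 120 BPM
frame_rate = 100
env = np.zeros(1000)
env[::50] = 1.0
print(estimate_bpm(env, frame_rate))  # 120.0
```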

### Key Detection

Analyzes harmonic content to identify:
- **Key**: Root note (C, C#, D, etc.)
- **Mode**: Major or minor
- **Confidence**: Detection confidence
- **Key profile**: Correlation with each key

```python
key = analyzer.get_key()
# {
#     'key': 'A',
#     'mode': 'minor',
#     'confidence': 0.76,
#     'profile': {'C': 0.12, 'C#': 0.08, ...}
# }
```
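Key detection from harmonic content is often done by template matching: correlate a 12-bin pitch-class (chroma) vector against major and minor key profiles rotated to all 12 tonics, then pick the best match (the Krumhansl-Schmuckler approach). The sketch below uses the Krumhansl-Kessler profile values; whether the skill uses this method or these profiles is an assumption, and `detect_key` is a hypothetical name. Its `profile` entry mimics the per-key correlations shown above, computed here for the detected mode only.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Krumhansl-Kessler key profiles; index 0 is the tonic
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])


def detect_key(chroma):
    """Pick the key whose rotated profile best correlates with a 12-bin chroma vector."""
    chroma = np.asarray(chroma, dtype=float)
    scores = {}
    for mode, template in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
        # np.roll(template, tonic) moves the profile's tonic weight to index `tonic`
        scores[mode] = [np.corrcoef(chroma, np.roll(template, tonic))[0, 1] for tonic in range(12)]
    mode = max(scores, key=lambda m: max(scores[m]))
    tonic = int(np.argmax(scores[mode]))
    return {
        "key": PITCH_CLASSES[tonic],
        "mode": mode,
        "confidence": float(scores[mode][tonic]),  # Pearson r, so it can be negative
        "profile": {pc: round(float(r), 3) for pc, r in zip(PITCH_CLASSES, scores[mode])},
    }


# Usage: a chroma vector shaped exactly like the A-minor profile
result = detect_key(np.roll(MINOR_PROFILE, 9))
print(result["key"], result["mode"])  # A minor
```

In practice the input would be a time-averaged chromagram. Relative keys (for example C major and A minor) share pitch content, which is one reason confidence scores matter.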

### Loudness Metrics

Comprehensive loudness analysis:
- **RMS dB**: Root mean square level
- **Peak dB**: Maximum sample level
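- **LUFS**: Integrated loudness estimate based on the broadcast standard (simplified; see Limitations)
- **Dynamic Range**: Level difference (dB) between loud and quiet sections
- **Crest factor**: Ratio of peak level to RMS level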

```python
loudness = analyzer.get_loudness()
# {
#     'rms_db': -14.2,
#     'peak_db': -0.3,
#     'lufs': -14.0,
#     'dynamic_range_db': 12.5,
#     'crest_factor': 8.2
# }
```
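RMS, peak, and crest factor follow directly from the samples. The NumPy sketch below computes them for a mono float signal in the range [-1, 1], so levels are in dBFS. The dynamic-range definition used here (spread between the 95th and 10th percentiles of short-term RMS levels over 400 ms windows) is a hypothetical choice, since the skill does not document its own, and LUFS is omitted because it additionally requires K-weighting and gating per ITU-R BS.1770. `loudness_metrics` is a hypothetical name, and its crest factor is a linear ratio (the unit of the skill's `crest_factor` is not stated).

```python
import numpy as np


def loudness_metrics(samples, sr, window_s=0.4, eps=1e-12):
    """RMS/peak levels in dBFS, crest factor, and a percentile-based dynamic range (sketch)."""
    x = np.asarray(samples, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    n = max(1, int(window_s * sr))
    usable = (len(x) // n) * n
    # Short-term RMS over non-overlapping windows; a clip shorter than one window is used whole
    frames = x[:usable].reshape(-1, n) if usable else x.reshape(1, -1)
    frame_db = 20 * np.log10(np.sqrt(np.mean(frames ** 2, axis=1)) + eps)
    return {
        "rms_db": float(20 * np.log10(rms + eps)),
        "peak_db": float(20 * np.log10(peak + eps)),
        "crest_factor": float(peak / (rms + eps)),  # linear peak-to-RMS ratio
        "dynamic_range_db": float(np.percentile(frame_db, 95) - np.percentile(frame_db, 10)),
    }


# Usage: a full-scale sine has RMS of about -3.01 dBFS and a crest factor of sqrt(2)
sr = 48000
t = np.arange(sr) / sr
m = loudness_metrics(np.sin(2 * np.pi * 440.0 * t), sr)
print(round(m["rms_db"], 2), round(m["crest_factor"], 3))  # -3.01 1.414
```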

### Frequency Analysis

Spectrum analysis including:
- **Dominant frequency**: Strongest frequency component
- **Frequency bands**: Energy in bass, mid, treble
- **Spectral centroid**: "Brightness" of audio
- **Spectral rolloff**: Frequency below which 85% of the spectral energy lies

```python
freq = analyzer.get_frequency()
# {
#     'dominant_freq': 440.0,
#     'spectral_centroid': 2150.3,
#     'spectral_rolloff': 4200.5,
#     'bands': {
#         'sub_bass': -28.5,      # 20-60 Hz
#         'bass': -18.2,          # 60-250 Hz
#         'low_mid': -12.1,       # 250-500 Hz
#         'mid': -10.8,           # 500-2000 Hz
#         'high_mid': -14.3,      # 2000-4000 Hz
#         'high': -22.1           # 4000-20000 Hz
#     }
# }
```
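These frequency metrics can be sketched from a single windowed FFT with NumPy. Assumptions: a Hann window over the whole signal, a magnitude-weighted spectral centroid, a rolloff computed on power (the share of spectral energy), and band levels reported in dB relative to total power. That last choice would produce negative values like those above, but the skill's actual normalization is not confirmed. `frequency_metrics` is a hypothetical name; the band edges match the ranges listed above.

```python
import numpy as np

# Band edges in Hz, matching the ranges in the example output
BANDS = {
    "sub_bass": (20, 60),
    "bass": (60, 250),
    "low_mid": (250, 500),
    "mid": (500, 2000),
    "high_mid": (2000, 4000),
    "high": (4000, 20000),
}


def frequency_metrics(samples, sr, rolloff_pct=0.85, eps=1e-12):
    """Dominant frequency, spectral centroid/rolloff, and relative band levels (sketch)."""
    x = np.asarray(samples, dtype=float)
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    power = mag ** 2
    cumulative = np.cumsum(power)
    # First bin at which cumulative power reaches rolloff_pct of the total
    rolloff_idx = int(np.searchsorted(cumulative, rolloff_pct * cumulative[-1]))
    bands = {}
    for name, (lo, hi) in BANDS.items():
        in_band = (freqs >= lo) & (freqs < hi)
        bands[name] = float(10 * np.log10(power[in_band].sum() / (cumulative[-1] + eps) + eps))
    return {
        "dominant_freq": float(freqs[np.argmax(mag)]),
        "spectral_centroid": float(np.sum(freqs * mag) / (np.sum(mag) + eps)),
        "spectral_rolloff": float(freqs[min(rolloff_idx, len(freqs) - 1)]),
        "bands": bands,
    }


# Usage: a 440 Hz sine lands in the low_mid band
sr = 44100
t = np.arange(sr) / sr
f = frequency_metrics(np.sin(2 * np.pi * 440.0 * t), sr)
print(round(f["dominant_freq"], 1))  # 440.0
```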

## CLI Usage

```bash
# Full analysis with all visualizations
python audio_analyzer.py --input song.mp3 --output-dir ./analysis/

# Just tempo and key
python audio_analyzer.py --input song.mp3 --analyze tempo key --output report.json

# Generate specific visualization
python audio_analyzer.py --input song.mp3 --plot spectrogram --output spec.png

# Dashboard view
python audio_analyzer.py --input song.mp3 --dashboard --output dashboard.png

# Batch analyze directory
python audio_analyzer.py --input-dir ./songs/ --output-dir ./reports/
```

### CLI Arguments

| Argument | Description | Default |
|----------|-------------|---------|
| `--input` | Input audio file | Required unless `--input-dir` is given |
| `--input-dir` | Directory of audio files (batch mode) | - |
| `--output` | Output file path | - |
| `--output-dir` | Output directory | `.` |
| `--analyze` | One or more analysis types: tempo, key, loudness, frequency, all | `all` |
| `--plot` | Plot type: waveform, spectrogram, chromagram, beats, dashboard | - |
| `--dashboard` | Generate the combined dashboard image | - |
| `--format` | Output format: json, txt | `json` |
| `--sr` | Sample rate for analysis (Hz) | `22050` |
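The CLI examples imply that exactly one of `--input` or `--input-dir` is supplied and that `--analyze` accepts several values. A hypothetical `argparse` parser enforcing that contract might look like the sketch below; it is not taken from `audio_analyzer.py`, whose real parser may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the CLI Arguments table (not the script's own)."""
    parser = argparse.ArgumentParser(prog="audio_analyzer.py")
    source = parser.add_mutually_exclusive_group(required=True)  # exactly one input source
    source.add_argument("--input", help="Input audio file")
    source.add_argument("--input-dir", help="Directory of audio files")
    parser.add_argument("--output", help="Output file path")
    parser.add_argument("--output-dir", default=".", help="Output directory")
    parser.add_argument("--analyze", nargs="+", default=["all"],
                        choices=["tempo", "key", "loudness", "frequency", "all"])
    parser.add_argument("--plot", choices=["waveform", "spectrogram", "chromagram", "beats", "dashboard"])
    parser.add_argument("--dashboard", action="store_true", help="Render the combined dashboard")
    parser.add_argument("--format", default="json", choices=["json", "txt"])
    parser.add_argument("--sr", type=int, default=22050, help="Sample rate for analysis (Hz)")
    return parser


args = build_parser().parse_args(
    ["--input", "song.mp3", "--analyze", "tempo", "key", "--output", "report.json"]
)
print(args.analyze, args.sr, args.output_dir)  # ['tempo', 'key'] 22050 .
```

A mutually exclusive group makes argparse reject a command that passes both `--input` and `--input-dir`, rather than silently picking one.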

## Examples

### Song Analysis

```python
from scripts.audio_analyzer import AudioAnalyzer

analyzer = AudioAnalyzer("track.mp3")
analyzer.analyze()

key = analyzer.get_key()
print(f"Tempo: {analyzer.get_tempo()['bpm']:.1f} BPM")
print(f"Key: {key['key']} {key['mode']}")
print(f"Loudness: {analyzer.get_loudness()['lufs']:.1f} LUFS")

analyzer.plot_dashboard("track_analysis.png")
```

### Podcast Quality Check

```python
from scripts.audio_analyzer import AudioAnalyzer

analyzer = AudioAnalyzer("podcast.mp3")
analyzer.analyze_loudness()

loudness = analyzer.get_loudness()
if loudness['lufs'] > -16:
    print("Warning: Audio may be too loud for typical podcast targets")
elif loudness['lufs'] < -20:
    print("Warning: Audio may be too quiet")
else:
    print("Loudness is within the typical podcast range (-20 to -16 LUFS)")
```

### Batch Analysis

```python
import os
from scripts.audio_analyzer import AudioAnalyzer

results = []
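song_dir = "./songs"
for filename in sorted(os.listdir(song_dir)):
    # Case-insensitive extension check so .MP3/.WAV files are included too
    if filename.lower().endswith(('.mp3', '.wav', '.flac')):
        analyzer = AudioAnalyzer(os.path.join(song_dir, filename))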
        analyzer.analyze()
        results.append({
            'file': filename,
            'bpm': analyzer.get_tempo()['bpm'],
            'key': f"{analyzer.get_key()['key']} {analyzer.get_key()['mode']}",
            'lufs': analyzer.get_loudness()['lufs']
        })

# Sort by BPM for DJ set
results.sort(key=lambda x: x['bpm'])
```

## Supported Formats

Input formats (via librosa/soundfile):
- MP3
- WAV
- FLAC
- OGG
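- M4A/AAC (typically requires an extra decoder backend such as audioread with FFmpeg, since libsndfile does not read these formats)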
- AIFF

Output formats:
- JSON (analysis report)
- PNG (visualizations)
- SVG (visualizations)
- TXT (summary)

## Dependencies

```
librosa>=0.10.0
soundfile>=0.12.0
matplotlib>=3.7.0
numpy>=1.24.0
scipy>=1.10.0
```

## Limitations

- Key detection works best with melodic content (less accurate for drums/percussion)
- BPM detection may struggle with free-tempo or complex time signatures
- Very short clips (<5 seconds) may have reduced accuracy
- LUFS calculation is simplified (not full ITU-R BS.1770-4)

## Overview

This skill performs comprehensive audio analysis and visualization for music, podcasts, and sound design. It extracts tempo, musical key, frequency content, and loudness metrics, and can generate waveform, spectrogram, chromagram, beat-grid, and dashboard images. The outputs include numeric reports (JSON/TXT) and publication-ready PNG/SVG visuals.

## How this skill works

The analyzer loads common audio formats and runs modular analysis steps: beat tracking for BPM and beats, harmonic analysis for key detection, FFT-based spectral analysis for frequency metrics, and loudness estimation including RMS, peak, LUFS, and dynamic range. Visualization functions render multi-channel waveforms, time-frequency spectrograms, chromagrams, and annotated beat grids. Results are exposed through a simple API and CLI for batch processing and report export.

## When to use it

- Analyze tempo and beat positions for DJ sets or remixing
- Detect musical key and mode for harmonic mixing or songwriting
- Audit loudness for podcast and broadcast compliance
- Inspect frequency balance and dominant tones for mixing and mastering
- Generate visual reports for presentations, metadata, or catalogs

## Best practices

- Use long clips (preferably over 20 seconds) for reliable key and LUFS measurements
- For tempo detection, prefer steady rhythmic content; for complex material, check the confidence score and verify results manually
- Resample to a consistent sample rate when batch-processing many files so that metrics are comparable
- Combine spectrogram and chromagram views to diagnose masking or harmonic clashes
- Export JSON reports for automated pipelines and PNG/SVG for human review
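If you load and preprocess audio yourself instead of passing the same `sr` to `AudioAnalyzer`, polyphase resampling with SciPy (a listed dependency) is one way to bring every file to a common rate before comparing metrics. `resample_to` is a hypothetical helper; `scipy.signal.resample_poly` is SciPy's own function, and choosing it over other resamplers is an assumption, not the skill's documented behavior.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to(samples, orig_sr, target_sr):
    """Polyphase resampling to a common rate (sketch; assumes SciPy is installed)."""
    g = gcd(int(orig_sr), int(target_sr))
    up, down = int(target_sr) // g, int(orig_sr) // g  # e.g. 48000 -> 22050 uses 147/320
    return resample_poly(np.asarray(samples, dtype=float), up, down)


# Usage: bring a 48 kHz clip down to the CLI's default analysis rate of 22050 Hz
clip_48k = np.zeros(48000)  # 1 second of silence as a stand-in signal
clip_22k = resample_to(clip_48k, 48000, 22050)
print(len(clip_22k))  # 22050
```

`resample_poly` applies an anti-aliasing filter as part of the rate change, which naive sample dropping or linear interpolation does not.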

## Example use cases

- Batch-analyze a music library to create BPM-sorted DJ playlists and key labels
- Run loudness checks on podcast episodes and flag LUFS outside target ranges
- Generate spectrograms and waveform dashboards for release assets and quality control
- Detect dominant frequencies and spectral centroids to guide EQ decisions during mixing
- Produce beat-grid overlays to align stems for tempo-syncing and editing

## FAQ

### How accurate is the key detection?

Key detection performs well on melodic music but can be unreliable on percussion-heavy or heavily processed tracks; use the provided confidence score and key profile for verification.

### Can this calculate broadcast-standard LUFS?

The tool provides a practical LUFS estimate suitable for loudness checks, but it simplifies the standard algorithm; for strict compliance use a dedicated ITU-R BS.1770-4 implementation.