---
name: audio-analyzer
description: Comprehensive audio analysis with waveform visualization, spectrogram, BPM detection, key detection, frequency analysis, and loudness metrics.
---
# Audio Analyzer
A comprehensive toolkit for analyzing audio files. Extract detailed information, including tempo, musical key, frequency content, and loudness metrics, and generate professional visualizations.
## Quick Start
```python
from scripts.audio_analyzer import AudioAnalyzer
# Analyze an audio file
analyzer = AudioAnalyzer("song.mp3")
analyzer.analyze()
# Get all analysis results
results = analyzer.get_results()
print(f"BPM: {results['tempo']['bpm']}")
print(f"Key: {results['key']['key']} {results['key']['mode']}")
# Generate visualizations
analyzer.plot_waveform("waveform.png")
analyzer.plot_spectrogram("spectrogram.png")
# Full report
analyzer.save_report("analysis_report.json")
```
## Features
- **Tempo/BPM Detection**: Accurate beat tracking with confidence score
- **Key Detection**: Musical key and mode (major/minor) identification
- **Frequency Analysis**: Spectrum, dominant frequencies, frequency bands
- **Loudness Metrics**: RMS, peak, LUFS, dynamic range
- **Waveform Visualization**: Multi-channel waveform plots
- **Spectrogram**: Time-frequency visualization with customization
- **Chromagram**: Pitch class visualization for harmonic analysis
- **Beat Grid**: Visual beat markers overlaid on waveform
- **Export Formats**: JSON report, PNG/SVG visualizations
## API Reference
### Initialization
```python
# From file
analyzer = AudioAnalyzer("audio.mp3")
# With custom sample rate
analyzer = AudioAnalyzer("audio.wav", sr=44100)
```
### Analysis Methods
```python
# Run full analysis
analyzer.analyze()
# Individual analyses
analyzer.analyze_tempo() # BPM and beat positions
analyzer.analyze_key() # Musical key detection
analyzer.analyze_loudness() # RMS, peak, LUFS
analyzer.analyze_frequency() # Spectrum analysis
analyzer.analyze_dynamics() # Dynamic range
```
### Results Access
```python
# Get all results as dict
results = analyzer.get_results()
# Individual results
tempo = analyzer.get_tempo() # {'bpm': 120, 'confidence': 0.85, 'beats': [...]}
key = analyzer.get_key() # {'key': 'C', 'mode': 'major', 'confidence': 0.72}
loudness = analyzer.get_loudness() # {'rms_db': -14.2, 'peak_db': -0.5, 'lufs': -14.0}
freq = analyzer.get_frequency() # {'dominant_freq': 440, 'spectrum': [...]}
```
### Visualization Methods
```python
# Waveform
analyzer.plot_waveform(
    output="waveform.png",
    figsize=(12, 4),
    color="#1f77b4",
    show_rms=True
)
# Spectrogram
analyzer.plot_spectrogram(
    output="spectrogram.png",
    figsize=(12, 6),
    cmap="magma",       # viridis, plasma, inferno, magma
    freq_scale="log",   # linear, log, mel
    max_freq=8000       # Hz
)
# Chromagram (pitch classes)
analyzer.plot_chromagram(
    output="chromagram.png",
    figsize=(12, 4)
)
# Onset strength / beat grid
analyzer.plot_beats(
    output="beats.png",
    figsize=(12, 4),
    show_strength=True
)
# Combined dashboard
analyzer.plot_dashboard(
    output="dashboard.png",
    figsize=(14, 10)
)
```
### Export
```python
# JSON report with all analysis
analyzer.save_report("report.json")
# Summary text
summary = analyzer.get_summary()
print(summary)
```
## Analysis Details
### Tempo Detection
Uses a beat-tracking algorithm to detect:
- **BPM**: Beats per minute (tempo)
- **Beat positions**: Timestamps of detected beats
- **Confidence**: Reliability score (0-1)
```python
tempo = analyzer.get_tempo()
# {
#     'bpm': 128.0,
#     'confidence': 0.89,
#     'beats': [0.0, 0.469, 0.938, 1.406, ...],  # seconds
#     'beat_count': 256
# }
```
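The skill's internal beat tracker isn't documented here, but librosa (a listed dependency) offers the standard building blocks. A minimal sketch of onset-based beat tracking with librosa's defaults, as an illustration rather than the skill's confirmed implementation:

```python
import numpy as np
import librosa

# Load at the CLI's default analysis rate (22050 Hz, matching --sr)
y, sr = librosa.load("song.mp3", sr=22050)

# Estimate a global tempo and beat frame indices from onset strength
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
tempo = float(np.atleast_1d(tempo)[0])  # scalar in older librosa, 1-element array in newer
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

print(f"Estimated tempo: {tempo:.1f} BPM over {len(beat_times)} beats")
```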
### Key Detection
Analyzes harmonic content to identify:
- **Key**: Root note (C, C#, D, etc.)
- **Mode**: Major or minor
- **Confidence**: Detection confidence
- **Key profile**: Correlation with each key
```python
key = analyzer.get_key()
# {
#     'key': 'A',
#     'mode': 'minor',
#     'confidence': 0.76,
#     'profile': {'C': 0.12, 'C#': 0.08, ...}
# }
```
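The detection method isn't specified above; a common approach, shown here as an illustrative sketch, correlates a time-averaged chromagram against Krumhansl-Kessler key profiles. This is an assumption about the technique, not the skill's confirmed implementation:

```python
import numpy as np
import librosa

# Krumhansl-Kessler major/minor pitch-class profiles, tonic at index 0 (C)
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

y, sr = librosa.load("song.mp3")
chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)

# Try all 24 keys; the best-correlating rotated profile wins
best = None
for mode, profile in (("major", MAJOR), ("minor", MINOR)):
    for shift in range(12):
        score = np.corrcoef(np.roll(profile, shift), chroma)[0, 1]
        if best is None or score > best[0]:
            best = (score, NOTES[shift], mode)

print(f"Estimated key: {best[1]} {best[2]} (r={best[0]:.2f})")
```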
### Loudness Metrics
Comprehensive loudness analysis:
- **RMS dB**: Root mean square level
- **Peak dB**: Maximum sample level
- **LUFS**: Integrated loudness (broadcast standard)
- **Dynamic Range**: Difference between loud and quiet sections
```python
loudness = analyzer.get_loudness()
# {
#     'rms_db': -14.2,
#     'peak_db': -0.3,
#     'lufs': -14.0,
#     'dynamic_range_db': 12.5,
#     'crest_factor': 8.2
# }
```
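RMS, peak, and crest factor follow directly from the sample values. A minimal numpy sketch; note that true LUFS additionally requires K-weighting and gating, which this omits (consistent with the simplified estimate noted under Limitations):

```python
import numpy as np
import librosa

# Load at the file's native rate, mixed down to mono
y, _ = librosa.load("song.mp3", sr=None, mono=True)

peak = np.max(np.abs(y))
rms = np.sqrt(np.mean(y ** 2))

print(f"Peak: {20 * np.log10(peak):.1f} dBFS")
print(f"RMS: {20 * np.log10(rms):.1f} dBFS")
print(f"Crest factor: {peak / rms:.1f}")  # peak-to-RMS ratio; higher = more transient audio
```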
### Frequency Analysis
Spectrum analysis including:
- **Dominant frequency**: Strongest frequency component
- **Frequency bands**: Energy in bass, mid, treble
- **Spectral centroid**: "Brightness" of audio
- **Spectral rolloff**: Frequency below which 85% of energy exists
```python
freq = analyzer.get_frequency()
# {
#     'dominant_freq': 440.0,
#     'spectral_centroid': 2150.3,
#     'spectral_rolloff': 4200.5,
#     'bands': {
#         'sub_bass': -28.5,   # 20-60 Hz
#         'bass': -18.2,       # 60-250 Hz
#         'low_mid': -12.1,    # 250-500 Hz
#         'mid': -10.8,        # 500-2000 Hz
#         'high_mid': -14.3,   # 2000-4000 Hz
#         'high': -22.1        # 4000-20000 Hz
#     }
# }
```
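For reference, these spectral features map onto standard librosa calls. A sketch assuming default STFT parameters:

```python
import numpy as np
import librosa

y, sr = librosa.load("song.mp3")

centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.85).mean()

# Dominant frequency: peak bin of the time-averaged magnitude spectrum
magnitude = np.abs(librosa.stft(y)).mean(axis=1)
dominant = librosa.fft_frequencies(sr=sr)[np.argmax(magnitude)]

print(f"Centroid: {centroid:.0f} Hz, rolloff (85%): {rolloff:.0f} Hz, dominant: {dominant:.0f} Hz")
```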
## CLI Usage
```bash
# Full analysis with all visualizations
python audio_analyzer.py --input song.mp3 --output-dir ./analysis/
# Just tempo and key
python audio_analyzer.py --input song.mp3 --analyze tempo key --output report.json
# Generate specific visualization
python audio_analyzer.py --input song.mp3 --plot spectrogram --output spec.png
# Dashboard view
python audio_analyzer.py --input song.mp3 --dashboard --output dashboard.png
# Batch analyze directory
python audio_analyzer.py --input-dir ./songs/ --output-dir ./reports/
```
### CLI Arguments
| Argument | Description | Default |
|----------|-------------|---------|
| `--input` | Input audio file | Required (unless `--input-dir` is used) |
| `--input-dir` | Directory of audio files | - |
| `--output` | Output file path | - |
| `--output-dir` | Output directory | `.` |
| `--analyze` | Analysis types: tempo, key, loudness, frequency, all | `all` |
| `--plot` | Plot type: waveform, spectrogram, chromagram, beats, dashboard | - |
| `--format` | Output format: json, txt | `json` |
| `--sr` | Sample rate for analysis | `22050` |
## Examples
### Song Analysis
```python
analyzer = AudioAnalyzer("track.mp3")
analyzer.analyze()
print(f"Tempo: {analyzer.get_tempo()['bpm']:.1f} BPM")
print(f"Key: {analyzer.get_key()['key']} {analyzer.get_key()['mode']}")
print(f"Loudness: {analyzer.get_loudness()['lufs']:.1f} LUFS")
analyzer.plot_dashboard("track_analysis.png")
```
### Podcast Quality Check
```python
analyzer = AudioAnalyzer("podcast.mp3")
analyzer.analyze_loudness()
loudness = analyzer.get_loudness()
if loudness['lufs'] > -16:
    print("Warning: Audio may be too loud for podcast standards")
elif loudness['lufs'] < -20:
    print("Warning: Audio may be too quiet")
else:
    print("Loudness is within podcast standards (-16 to -20 LUFS)")
```
### Batch Analysis
```python
import os
from scripts.audio_analyzer import AudioAnalyzer
results = []
for filename in os.listdir("./songs"):
    if filename.endswith(('.mp3', '.wav', '.flac')):
        analyzer = AudioAnalyzer(f"./songs/{filename}")
        analyzer.analyze()
        results.append({
            'file': filename,
            'bpm': analyzer.get_tempo()['bpm'],
            'key': f"{analyzer.get_key()['key']} {analyzer.get_key()['mode']}",
            'lufs': analyzer.get_loudness()['lufs']
        })
# Sort by BPM for DJ set
results.sort(key=lambda x: x['bpm'])
```
## Supported Formats
Input formats (via librosa/soundfile):
- MP3
- WAV
- FLAC
- OGG
- M4A/AAC
- AIFF
Output formats:
- JSON (analysis report)
- PNG (visualizations)
- SVG (visualizations)
- TXT (summary)
## Dependencies
```
librosa>=0.10.0
soundfile>=0.12.0
matplotlib>=3.7.0
numpy>=1.24.0
scipy>=1.10.0
```
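Assuming a standard pip workflow (the repo's packaging isn't shown here), these can be installed directly:

```bash
pip install "librosa>=0.10.0" "soundfile>=0.12.0" "matplotlib>=3.7.0" "numpy>=1.24.0" "scipy>=1.10.0"
```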
## Limitations
- Key detection works best with melodic content (less accurate for drums/percussion)
- BPM detection may struggle with free-tempo or complex time signatures
- Very short clips (<5 seconds) may have reduced accuracy
- LUFS calculation is simplified (not full ITU-R BS.1770-4)
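When these limitations matter, the confidence fields returned by the analyzer can gate downstream use. A small hypothetical helper (the 0.6 threshold is arbitrary, not a documented default):

```python
def reliable_key(analyzer, threshold=0.6):
    """Return an 'A minor'-style string only when detection confidence clears the threshold.

    Hypothetical helper; adjust the cutoff to your tolerance for false positives.
    """
    key = analyzer.get_key()
    if key['confidence'] >= threshold:
        return f"{key['key']} {key['mode']}"
    return None  # treat low-confidence detections as unknown
```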
## FAQ
**How accurate is the key detection?**
Key detection performs well on melodic music but can be unreliable on percussion-heavy or heavily processed tracks; use the provided confidence score and key profile for verification.
**Can this calculate broadcast-standard LUFS?**
The tool provides a practical LUFS estimate suitable for loudness checks, but it simplifies the standard algorithm; for strict compliance, use a dedicated ITU-R BS.1770-4 implementation.
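For compliant measurement, a dedicated library such as pyloudnorm implements the full standard (a suggestion only; it is not a dependency of this skill):

```python
import soundfile as sf
import pyloudnorm as pyln  # pip install pyloudnorm

data, rate = sf.read("podcast.wav")
meter = pyln.Meter(rate)  # ITU-R BS.1770-4 K-weighted meter
print(f"Integrated loudness: {meter.integrated_loudness(data):.1f} LUFS")
```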