
performance-regression-debugging skill

/skills/performance-regression-debugging

This skill helps you detect, diagnose, and fix performance regressions by comparing baselines and profiling changes to restore speed.

npx playbooks add skill aj-geddes/useful-ai-prompts --skill performance-regression-debugging

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
5.0 KB
---
name: performance-regression-debugging
description: Identify and debug performance regressions from code changes. Use comparison and profiling to locate what degraded performance and restore baseline metrics.
---

# Performance Regression Debugging

## Overview

Performance regressions occur when code changes degrade application performance. Detecting them early and resolving them quickly is critical to restoring baseline metrics.

## When to Use

- Performance degrades after a deployment
- Metrics show a negative trend
- Users report slowness
- A/B testing shows unexpected performance variance
- As part of regular performance monitoring

## Instructions

### 1. **Detection & Measurement**

```javascript
// Before: 500ms response time
// After: 1000ms response time (2x slower = regression)

// Capture baseline metrics
const baseline = {
  responseTime: 500,  // ms
  timeToInteractive: 2000,  // ms
  largestContentfulPaint: 1500,  // ms
  memoryUsage: 50,  // MB
  bundleSize: 150  // KB gzipped
};

// Monitor after change
const current = {
  responseTime: 1000,
  timeToInteractive: 4000,
  largestContentfulPaint: 3000,
  memoryUsage: 150,
  bundleSize: 200
};

// Calculate regression
const regressions = {};
for (const metric of Object.keys(baseline)) {
  const change = (current[metric] - baseline[metric]) / baseline[metric];
  if (change > 0.1) {  // >10% degradation
    regressions[metric] = {
      baseline: baseline[metric],
      current: current[metric],
      percentChange: (change * 100).toFixed(1) + '%',
      severity: change > 0.5 ? 'Critical' : 'High'
    };
  }
}

// Results (every metric degrades by more than 10%):
// responseTime: 500ms → 1000ms (+100%, Critical)
// timeToInteractive: 2000ms → 4000ms (+100%, Critical)
// largestContentfulPaint: 1500ms → 3000ms (+100%, Critical)
// memoryUsage: 50MB → 150MB (+200%, Critical)
// bundleSize: 150KB → 200KB (+33.3%, High)
```

### 2. **Root Cause Identification**

```yaml
Systematic Search:

Step 1: Identify Changed Code
  - Check git commits between versions
  - Review code review comments
  - Identify risky changes
  - Prioritize by likelihood

Step 2: Binary Search (Bisect)
  - Start with suspected change
  - Disable the change
  - Re-measure performance
  - If performance recovers → this is the culprit
  - If not → re-enable it and test the next suspect

  git bisect start
  git bisect bad HEAD
  git bisect good v1.0.0
  # Test each commit

Step 3: Profile the Change
  - Run profiler on old vs new code
  - Compare flame graphs
  - Identify expensive functions
  - Check allocation patterns

Step 4: Analyze Impact
  - Code review the change
  - Understand what changed
  - Check for O(n²) algorithms
  - Look for new database queries
  - Check for missing indexes

---

Common Regressions:

N+1 Query:
  Before: 1 query (10ms)
  After: 1000 queries (1000ms)
  Cause: Removed JOIN, now querying in a loop
  Fix: Restore the JOIN or use eager loading

Missing Index:
  Before: Index Scan (10ms)
  After: Seq Scan (500ms)
  Cause: New filter column with no index
  Fix: Add an index on the filtered column

Memory Leak:
  Before: 50MB memory
  After: 500MB after 1 hour
  Cause: Listener never removed, cache grows unbounded
  Fix: Remove listeners on teardown, bound the cache

Bundle Size:
  Before: 150KB gzipped
  After: 250KB gzipped
  Cause: Added a library without tree-shaking
  Fix: Use a lighter alternative or code-split

Algorithm Efficiency:
  Before: O(n) = 1ms for 1000 items
  After: O(n²) = 1000ms for 1000 items
  Cause: Nested loops added
  Fix: Use a more efficient algorithm or data structure
```
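The algorithm-efficiency regression is the easiest of these to reproduce and fix locally. Below is a minimal sketch (the data shape is an assumption) of an accidental O(n²) scan and its O(n) replacement using a Set:

```javascript
// Regressed version: indexOf/includes inside the loop each scan the
// whole array, making the function O(n²) overall.
function findDuplicatesSlow(ids) {
  const dupes = [];
  for (let i = 0; i < ids.length; i++) {
    if (ids.indexOf(ids[i]) !== i && !dupes.includes(ids[i])) {
      dupes.push(ids[i]);
    }
  }
  return dupes;
}

// Fixed version: Set membership checks are O(1), so one pass is O(n).
function findDuplicatesFast(ids) {
  const seen = new Set();
  const dupes = new Set();
  for (const id of ids) {
    if (seen.has(id)) dupes.add(id);
    seen.add(id);
  }
  return [...dupes];
}

const ids = [1, 2, 3, 2, 4, 3];
console.log(findDuplicatesSlow(ids)); // [ 2, 3 ]
console.log(findDuplicatesFast(ids)); // [ 2, 3 ]
```

Both versions produce identical results; only the fast one stays linear as the input grows, which is exactly the kind of equivalence a before/after profile confirms.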

### 3. **Fixing & Verification**

```yaml
Fix Process:

1. Understand the Problem
  - Profile and identify exactly what's slow
  - Measure impact quantitatively
  - Understand root cause

2. Implement Fix
  - Make minimal changes
  - Don't introduce new issues
  - Test locally first
  - Measure improvement

3. Verify Fix
  - Run same measurement
  - Check regression gone
  - Ensure no new issues
  - Compare metrics

  Before regression: 500ms
  After regression: 1000ms
  After fix: 550ms (acceptable, minor overhead)

4. Prevent Recurrence
  - Add performance test
  - Set performance budget
  - Alert on regressions
  - Code review for perf
```
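The "run same measurement" check in step 3 can be scripted so verification is repeatable rather than eyeballed. A minimal sketch, assuming a 10% tolerance for noise and minor overhead:

```javascript
// A fix is acceptable when the re-measured metric is within a small
// tolerance of the pre-regression baseline (tolerance is an assumption).
function isFixAcceptable(baselineMs, measuredMs, tolerance = 0.1) {
  return measuredMs <= baselineMs * (1 + tolerance);
}

console.log(isFixAcceptable(500, 1000)); // false: regression still present
console.log(isFixAcceptable(500, 550));  // true: within 10% of baseline
console.log(isFixAcceptable(500, 560));  // false: just over budget
```

Running this check in the same script that detected the regression guarantees the before/after numbers are directly comparable.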

### 4. **Prevention Measures**

```yaml
Performance Testing:

Baseline Testing:
  - Establish baseline metrics
  - Record for each release
  - Track trends over time
  - Alert on degradation

Load Testing:
  - Test with realistic load
  - Measure under stress
  - Identify bottlenecks
  - Catch regressions

Performance Budgets:
  - Set max bundle size
  - Set max response time
  - Set max LCP/FCP
  - Enforce in CI/CD

Monitoring:
  - Track real user metrics
  - Alert on degradation
  - Compare releases
  - Analyze trends

---

Checklist:

[ ] Baseline metrics established
[ ] Regression detected and measured
[ ] Changed code identified
[ ] Root cause found (code, data, infra)
[ ] Fix implemented
[ ] Fix verified
[ ] No new issues introduced
[ ] Performance test added
[ ] Budget set
[ ] Monitoring updated
[ ] Team notified
[ ] Prevention measures in place
```
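A budget check like the one described above can run as a CI step. Below is a minimal sketch (the budget values and metric names are assumptions); the build fails when the returned list is non-empty:

```javascript
// Hard budgets per metric (illustrative values).
const budgets = {
  bundleSize: 150,              // KB gzipped
  responseTime: 600,            // ms
  largestContentfulPaint: 2000  // ms
};

// Compare measured metrics against budgets and return the violations
// so a CI step can fail the build when any budget is exceeded.
function checkBudgets(budgets, measured) {
  return Object.keys(budgets)
    .filter((metric) => measured[metric] > budgets[metric])
    .map((metric) => `${metric}: ${measured[metric]} exceeds budget ${budgets[metric]}`);
}

const violations = checkBudgets(budgets, {
  bundleSize: 200,
  responseTime: 550,
  largestContentfulPaint: 3000
});
console.log(violations);
// In a CI script: if (violations.length) process.exitCode = 1;
```

Enforcing the budget on every build turns regression detection from a post-deploy scramble into a pre-merge gate.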

## Key Points

- Establish baseline metrics for comparison
- Use binary search to find culprit commits
- Profile to identify exact bottleneck
- Measure before/after fix
- Add performance regression tests
- Set and enforce performance budgets
- Monitor production metrics
- Alert on significant degradation
- Document root cause
- Prevent through code review

Overview

This skill helps identify and debug performance regressions introduced by code changes. It guides you from detection through root-cause isolation, fix implementation, verification, and prevention so you can restore baseline metrics quickly. The approach combines metric comparison, binary search across commits, and targeted profiling.

How this skill works

First, capture baseline and current metrics and flag regressions by percent change and severity. Then use a systematic search: inspect changed commits, run git bisect to narrow the culprit, and profile old vs new code to find expensive functions or allocation patterns. Finally implement minimal fixes, re-measure to verify recovery, and add regression tests and budgets to prevent recurrence.

When to use it

  • After a deployment that coincides with slower user-facing metrics
  • When monitoring shows a deteriorating trend in response time or memory
  • If users report slowness or errors after recent changes
  • During A/B experiments that reveal performance variance
  • As part of release validation when performance budgets are enforced

Best practices

  • Establish and store baseline metrics for each release to enable direct comparisons
  • Prioritize suspected changes, then binary-search (git bisect) to isolate the commit
  • Profile before and after to compare flame graphs and allocation patterns
  • Make minimal, measured changes; verify with the same tests that detected the regression
  • Add automated performance tests and enforce budgets in CI to catch regressions early

Example use cases

  • Response time doubled after a feature merge — run bisect and profile to find added O(n²) loop
  • Page LCP spikes after a library import — compare bundle sizes and tree-shaking impact
  • Memory use grows over time after a release — identify leaks via allocation profiles
  • Database queries slowed — detect N+1 queries or missing indexes and restore joins/indexes
  • CI alerts when bundle size or latency exceeds set budgets — block the release and fix

FAQ

How do I decide which metrics to collect as baseline?

Collect user-facing metrics (response time, TTI, LCP/FCP), memory, CPU, DB query counts, and bundle size. Choose metrics tied to user experience and system health.

When should I use git bisect vs. profiling first?

If the culprit is likely a recent code change among many commits, start with git bisect. If you need an immediate clue about hotspots, profile both versions in parallel to guide bisecting and prioritize commits.