
performance skill


This skill helps you optimize Python library performance through profiling, memory analysis, benchmarking, and practical optimization strategies.

npx playbooks add skill wdm0006/python-skills --skill performance

---
name: optimizing-python-performance
description: Optimizes Python library performance through profiling (cProfile, PyInstrument), memory analysis (memray, tracemalloc), benchmarking (pytest-benchmark), and optimization strategies. Use when analyzing performance bottlenecks, finding memory leaks, or setting up performance regression testing.
---

# Python Performance Optimization

## Profiling Quick Start

```bash
# PyInstrument (statistical, readable output)
python -m pyinstrument script.py

# cProfile (detailed, built-in)
python -m cProfile -s cumulative script.py

# Memory profiling
pip install memray
memray run script.py
memray flamegraph memray-*.bin
```

## PyInstrument Usage

```python
from pyinstrument import Profiler

profiler = Profiler()
profiler.start()
result = my_function()
profiler.stop()
print(profiler.output_text(unicode=True, color=True))
```
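cProfile can be driven from code the same way, using the standard-library `pstats` module to sort and trim the output. A minimal sketch (`slow_function` is a placeholder for the code under investigation):

```python
import cProfile
import pstats

def slow_function():
    # Placeholder workload; substitute the code you want to profile
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
slow_function()
profiler.disable()

# Sort by cumulative time and show the ten heaviest entries
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)
```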

## Memory Analysis

```python
import tracemalloc

tracemalloc.start()
# ... code ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:10]:
    print(stat)
```
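To hunt a leak rather than inspect a single snapshot, comparing two snapshots shows where allocations grew between them. A minimal sketch, where the `leaky` list simulates retained objects:

```python
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

leaky = []
for i in range(10_000):
    leaky.append("x" * 100)  # simulated retention

current = tracemalloc.take_snapshot()
# Entries with a positive size_diff show where memory grew since the baseline
for stat in current.compare_to(baseline, "lineno")[:5]:
    print(stat)
```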

## Benchmarking (pytest-benchmark)

```python
def test_encode_benchmark(benchmark):
    result = benchmark(encode, 37.7749, -122.4194)
    assert len(result) == 12
```

```bash
pytest tests/ --benchmark-only
pytest tests/ --benchmark-compare
```
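For quick one-off measurements outside a test suite, the standard-library `timeit` module is enough. A sketch (the sorting workload is arbitrary; the minimum of several repeats is the most stable estimate):

```python
import timeit

# Each repeat runs the statement 1000 times and returns the total seconds
times = timeit.repeat(
    "sorted(data)",
    setup="data = list(range(1000))[::-1]",
    repeat=5,
    number=1000,
)
print(f"best of 5: {min(times) / 1000 * 1e6:.1f} µs per call")
```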

## Common Optimizations

```python
# Use set for membership (O(1) vs O(n))
valid = set(items)
if item in valid: ...

# Use deque for queue operations
from collections import deque
queue = deque(items)
queue.popleft()  # O(1) vs list.pop(0), which is O(n)

# Use generators for large data
def process(items):
    for item in items:
        yield transform(item)

# Cache expensive computations
from functools import lru_cache

@lru_cache(maxsize=1000)
def expensive(x):
    return compute(x)

# String building
result = "".join(str(x) for x in items)  # Not += in loop
```
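The `lru_cache` pattern above can be verified with `cache_info()`. A sketch with a call counter standing in for the expensive computation:

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=1000)
def expensive(x):
    global calls
    calls += 1  # counts real computations, not cache hits
    return x * x

for _ in range(5):
    expensive(7)  # computed once, served from the cache afterwards

print(calls)                        # 1
print(expensive.cache_info().hits)  # 4
```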

## Algorithm Complexity

| Operation | list | set | dict |
|-----------|------|-----|------|
| Lookup | O(n) | O(1) | O(1) |
| Insert | O(1) append, O(n) at index | O(1) | O(1) |
| Delete | O(n) | O(1) | O(1) |
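The lookup row is easy to check empirically with `timeit`; a sketch (the collection size is arbitrary, and absolute timings vary by machine):

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# Worst case for the list: the element is absent, so the scan touches every item
list_time = timeit.timeit(lambda: -1 in as_list, number=200)
set_time = timeit.timeit(lambda: -1 in as_set, number=200)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```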

For detailed strategies, see:
- **[PROFILING.md](PROFILING.md)** - Advanced profiling techniques
- **[BENCHMARKS.md](BENCHMARKS.md)** - CI benchmark regression testing

## Optimization Checklist

```
Before Optimizing:
- [ ] Confirm there's a real problem
- [ ] Profile to find actual bottleneck
- [ ] Establish baseline measurements

Process:
- [ ] Algorithm improvements first
- [ ] Then data structures
- [ ] Then implementation details
- [ ] Measure after each change

After:
- [ ] Add benchmarks to prevent regression
- [ ] Verify correctness unchanged
- [ ] Document why the optimization was needed
```

## Learn More

This skill is based on the [Performance](https://mcginniscommawill.com/guides/python-library-development/#performance-beyond-raw-speed) section of the [Guide to Developing High-Quality Python Libraries](https://mcginniscommawill.com/guides/python-library-development/) by [Will McGinnis](https://mcginniscommawill.com/).

Overview

This skill optimizes Python library performance by combining profiling, memory analysis, and benchmarking tools with practical optimization patterns. It guides you to find real bottlenecks, apply focused fixes, and add regression checks so improvements are measurable and durable. The goal is faster, leaner, and more maintainable Python libraries.

How this skill works

It runs statistical and deterministic profilers (PyInstrument, cProfile) to identify hot code paths, and uses memory tracers (memray, tracemalloc) to locate leaks and heavy allocations. It integrates pytest-benchmark for repeatable benchmarks and regression checks. The workflow emphasizes measuring before changes, applying algorithmic or data-structure fixes, and re-measuring to confirm gains.

When to use it

  • You suspect slow operations but don’t know the cause
  • A library shows unexpected memory growth or leaks
  • Preparing performance regression tests for CI
  • Before publishing a performance-sensitive release
  • When choosing between algorithmic vs implementation changes

Best practices

  • Always confirm a real performance problem before optimizing; measure baseline timings and memory usage
  • Profile with PyInstrument for readable, high-level hotspots and cProfile for detailed call statistics
  • Use tracemalloc or memray to capture snapshots and flamegraphs for memory issues
  • Prioritize algorithmic improvements and appropriate data structures before micro-optimizations
  • Add pytest-benchmark tests to CI and compare results to prevent regressions
  • Document why and how an optimization was applied and include correctness tests

Example use cases

  • Find and fix a slow nested loop by switching to a more efficient algorithm or data structure
  • Track down a memory leak caused by lingering references using tracemalloc or memray and fix object retention
  • Add pytest-benchmark tests to catch regressions when refactoring core library functions
  • Replace repeated list membership checks with set lookups to improve lookup-heavy code
  • Cache expensive pure functions with functools.lru_cache to reduce repeated computation

FAQ

Which profiler should I run first?

Start with PyInstrument for a quick, readable view of hotspots. Use cProfile when you need precise call counts and cumulative timings.

How do I avoid over-optimizing?

Follow the checklist: confirm a real problem, profile to find the true bottleneck, measure gains after each change, and ensure tests still pass. Favor algorithm and data-structure changes before micro-tuning.