
python-performance-optimization skill


This skill helps you profile and optimize Python code to reduce latency and resource usage through proven techniques.

npx playbooks add skill nickcrew/claude-cortex --skill python-performance-optimization

---
name: python-performance-optimization
description: Python performance optimization patterns using profiling, algorithmic improvements, and acceleration techniques. Use when optimizing slow Python code, reducing memory usage, or improving application throughput and latency.
---

# Python Performance Optimization

Expert guidance for profiling, optimizing, and accelerating Python applications through systematic analysis, algorithmic improvements, efficient data structures, and acceleration techniques.

## When to Use This Skill

- Code runs too slowly for production requirements
- High CPU usage or memory consumption issues
- Need to reduce API response times or batch processing duration
- Application fails to scale under load
- Optimizing data processing pipelines or scientific computing
- Reducing cloud infrastructure costs through efficiency gains
- Profile-guided optimization after measuring performance bottlenecks

## Core Concepts

**The Golden Rule**: Never optimize without profiling first. As a rule of thumb, ~80% of execution time is spent in ~20% of the code.

**Optimization Hierarchy** (in priority order):
1. **Algorithm complexity** - O(n²) → O(n log n) or O(n) yields order-of-magnitude gains on large inputs
2. **Data structure choice** - List → Set turns O(n) membership tests into O(1), often thousands of times faster on large collections
3. **Language features** - Comprehensions, built-ins, generators
4. **Caching** - Memoization for repeated calculations
5. **Compiled extensions** - NumPy, Numba, Cython for hot paths
6. **Parallelism** - Multiprocessing for CPU-bound work

**Key Principle**: Algorithmic improvements beat micro-optimizations every time.
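As a minimal sketch of the top two levels of the hierarchy, consider de-duplicating a sequence while preserving order (function names here are illustrative, not part of the skill's API):

```python
def dedupe_quadratic(items):
    # O(n^2): each membership test scans the growing result list
    result = []
    for x in items:
        if x not in result:
            result.append(x)
    return result

def dedupe_linear(items):
    # O(n): membership tests against a set are O(1) on average
    seen = set()
    result = []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

data = [3, 1, 3, 2, 1]
assert dedupe_quadratic(data) == dedupe_linear(data) == [3, 1, 2]
```

Both functions return the same answer; only the auxiliary data structure changes, which is exactly why this change dominates micro-optimizations.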

## Quick Reference

Load detailed guides for specific optimization areas:

| Task | Load reference |
| --- | --- |
| Profile code and find bottlenecks | `skills/python-performance-optimization/references/profiling.md` |
| Algorithm and data structure optimization | `skills/python-performance-optimization/references/algorithms.md` |
| Memory optimization and generators | `skills/python-performance-optimization/references/memory.md` |
| String concatenation and file I/O | `skills/python-performance-optimization/references/string-io.md` |
| NumPy, Numba, Cython, multiprocessing | `skills/python-performance-optimization/references/acceleration.md` |

## Optimization Workflow

### Phase 1: Measure
1. **Profile with cProfile** - Identify slow functions
2. **Line profile hot paths** - Find exact slow lines
3. **Memory profile** - Check for memory bottlenecks
4. **Benchmark baseline** - Record current performance
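Steps 1 and 4 above can be sketched with the standard library alone; `workload` is a stand-in for the code under investigation:

```python
import cProfile
import io
import pstats
import timeit

def workload(n):
    # Stand-in for the real code being profiled
    return sum(i * i for i in range(n))

# 1. Profile to identify slow functions
profiler = cProfile.Profile()
profiler.enable()
workload(100_000)
profiler.disable()
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(3)

# 4. Record a benchmark baseline to compare against after optimizing
baseline = timeit.timeit(lambda: workload(10_000), number=50)
print(f"baseline: {baseline:.4f}s for 50 runs")
```

For steps 2 and 3, `line_profiler` and `memory_profiler` (third-party packages) fill the gaps cProfile leaves.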

### Phase 2: Analyze
1. **Check algorithm complexity** - Is it O(n²) or worse?
2. **Evaluate data structures** - Are you using lists for lookups?
3. **Identify repeated work** - Can results be cached?
4. **Find I/O bottlenecks** - Database queries, file operations

### Phase 3: Optimize
1. **Improve algorithms first** - Biggest impact
2. **Use appropriate data structures** - Set/dict for O(1) lookups
3. **Apply caching** - `@lru_cache` for expensive functions
4. **Use generators** - For large datasets
5. **Leverage NumPy/Numba** - For numerical code
6. **Parallelize** - Multiprocessing for CPU-bound tasks

### Phase 4: Validate
1. **Re-profile** - Verify improvements
2. **Benchmark** - Measure speedup quantitatively
3. **Test correctness** - Ensure optimizations didn't break functionality
4. **Document** - Explain why optimization was needed
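A minimal way to quantify a speedup in step 2, using the list→set change as the example:

```python
import timeit

data_list = list(range(50_000))
data_set = set(data_list)

# Worst case for the list: the sought item is at the end
before = timeit.timeit(lambda: 49_999 in data_list, number=1_000)
after = timeit.timeit(lambda: 49_999 in data_set, number=1_000)
print(f"speedup: {before / after:.0f}x")
```

Recording numbers like these alongside the change documents why the optimization was worthwhile.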

## Common Optimization Patterns

### Pattern 1: Replace List with Set for Lookups
```python
large_list = list(range(100_000))
large_set = set(large_list)

# Slow: O(n) scan of the list on every membership test
found = 99_999 in large_list

# Fast: O(1) average-case hash lookup
found = 99_999 in large_set
```

### Pattern 2: Use Comprehensions
```python
n = 100_000

# Slower: repeated method-call overhead inside the loop
result = []
for i in range(n):
    result.append(i * 2)

# Faster: the loop body runs in optimized bytecode
result = [i * 2 for i in range(n)]
```

### Pattern 3: Cache Expensive Calculations
```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_function(n):
    # First call per argument computes; repeat calls hit the cache
    return complex_calculation(n)  # placeholder for real work
```

### Pattern 4: Use Generators for Large Data
```python
# Memory inefficient: reads the whole file into a list
def read_file(path):
    with open(path) as f:
        return [line.strip() for line in f]

# Memory efficient: streams one line at a time
def iter_lines(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
```

### Pattern 5: Vectorize with NumPy
```python
# Pure Python: interpreted loop, slow for large n
result = sum(i**2 for i in range(1_000_000))

# NumPy: vectorized, typically one to two orders of magnitude faster
import numpy as np
result = int(np.sum(np.arange(1_000_000, dtype=np.int64) ** 2))
```

## Common Mistakes to Avoid

1. **Optimizing before profiling** - You'll optimize the wrong code
2. **Using lists for membership tests** - Use sets/dicts instead
3. **String concatenation in loops** - Use `"".join()` or `StringIO`
4. **Loading entire files into memory** - Use generators
5. **N+1 database queries** - Use JOINs or batch queries
6. **Ignoring built-in functions** - They're C-optimized and fast
7. **Premature optimization** - Focus on algorithmic improvements first
8. **Not benchmarking** - Always measure improvements quantitatively
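Mistake 3 above is worth a concrete comparison; `"".join()` and `io.StringIO` both avoid repeatedly copying a growing string:

```python
from io import StringIO

parts = [str(i) for i in range(1_000)]

# Bad: += in a loop may copy the growing string on every iteration
s = ""
for p in parts:
    s += p

# Good: join builds the result in a single pass
joined = "".join(parts)

# Good: StringIO buffers incremental writes
buf = StringIO()
for p in parts:
    buf.write(p)

assert s == joined == buf.getvalue()
```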

## Decision Tree

**Start here**: Profile with cProfile to find bottlenecks

**Hot path is algorithm?**
- Yes → Check complexity, improve algorithm, use better data structures
- No → Continue

**Hot path is computation?**
- Numerical loops → NumPy or Numba
- CPU-bound → Multiprocessing
- Already fast enough → Done

**Hot path is memory?**
- Large data → Generators, streaming
- Many objects → `__slots__`, object pooling
- Caching needed → `@lru_cache` or custom cache

**Hot path is I/O?**
- Database → Batch queries, indexes, connection pooling
- Files → Buffering, streaming
- Network → Async I/O, request batching
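The `__slots__` branch of the tree can be sketched as follows (class names are illustrative); slotted classes drop the per-instance `__dict__`, which adds up when millions of small objects are alive at once:

```python
import sys

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlimPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y

p, sp = Point(1, 2), SlimPoint(1, 2)

# Slotted instances carry no per-instance __dict__
assert not hasattr(sp, "__dict__")
normal_bytes = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
print(f"normal: {normal_bytes} B, slotted: {sys.getsizeof(sp)} B")
```

Exact byte counts vary by Python version and platform, so measure on your own interpreter.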

## Best Practices

1. **Profile before optimizing** - Measure to find real bottlenecks
2. **Optimize algorithms first** - O(n²) → O(n) beats micro-optimizations
3. **Use appropriate data structures** - Set/dict for lookups, not lists
4. **Leverage built-ins** - C-implemented built-ins are faster than pure Python
5. **Avoid premature optimization** - Optimize hot paths identified by profiling
6. **Use generators for large data** - Reduce memory usage with lazy evaluation
7. **Batch operations** - Minimize overhead from syscalls and network requests
8. **Cache expensive computations** - Use `@lru_cache` or custom caching
9. **Consider NumPy/Numba** - Vectorization and JIT for numerical code
10. **Parallelize CPU-bound work** - Use multiprocessing to utilize all cores
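Practice 10 can be sketched with `multiprocessing.Pool`; `cpu_heavy` is a stand-in for real work, and on platforms that use the `spawn` start method the worker function must live in an importable module:

```python
import os
from multiprocessing import Pool

def cpu_heavy(n):
    # Stand-in for real CPU-bound work
    return sum(i * i for i in range(n))

def parallel_sums(sizes):
    # Coarse-grained tasks, one process per core (capped at 4 here)
    with Pool(processes=min(os.cpu_count() or 1, 4)) as pool:
        return pool.map(cpu_heavy, sizes)

if __name__ == "__main__":
    print(parallel_sums([10_000, 20_000, 30_000]))
```

Keep tasks coarse: process startup and pickling overhead swamps the benefit for tiny work items.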

## Resources

- **Python Performance**: https://wiki.python.org/moin/PythonSpeed
- **cProfile**: https://docs.python.org/3/library/profile.html
- **NumPy**: https://numpy.org/doc/stable/user/absolute_beginners.html
- **Numba**: https://numba.pydata.org/
- **Cython**: https://cython.readthedocs.io/
- **High Performance Python** (Book by Gorelick & Ozsvald)

Overview

This skill provides practical patterns and a workflow for profiling and optimizing Python applications to reduce latency, lower memory usage, and improve throughput. It emphasizes measurement-first optimization, algorithmic improvements, and targeted acceleration using libraries like NumPy, Numba, and multiprocessing. The guidance is focused on repeatable steps and high-impact changes rather than micro-optimizations.

How this skill works

The skill guides you through a four-phase workflow: measure (cProfile, line and memory profilers, benchmarking), analyze (complexity, data-structure choices, I/O hotspots), optimize (algorithm changes, caching, generators, vectorization, parallelism), and validate (re-profile and benchmark). It identifies hot paths, recommends replacement patterns (e.g., list→set, comprehensions, @lru_cache), and suggests when to adopt compiled extensions or parallelism.

When to use it

  • Application response times or throughput are below requirements
  • High CPU usage or uncontrolled memory growth is observed
  • Data pipelines or scientific code run too slowly to scale
  • Cloud costs are high due to inefficient code
  • After profiling reveals a clear bottleneck to optimize

Best practices

  • Always profile before changing code; target the 20% of code that consumes 80% of time
  • Fix algorithmic complexity before micro-optimizing (O(n²) → O(n log n) or O(n))
  • Choose the right data structures: use set/dict for membership and fast lookups
  • Use generators and streaming to reduce peak memory usage
  • Leverage C-optimized libraries (NumPy) or JIT (Numba) for numeric hotspots
  • Re-profile and benchmark after each change and keep tests to ensure correctness

Example use cases

  • Speeding up a batch ETL job by replacing quadratic routines with linear algorithms and streaming data
  • Reducing API latency by moving expensive repeated computations behind @lru_cache
  • Accelerating numerical simulations by vectorizing loops with NumPy or applying Numba JIT to hot functions
  • Scaling CPU-bound tasks across cores with multiprocessing or job queues
  • Lowering memory footprint of file processing by converting list-based reads to generator-based streaming

FAQ

What profiler should I start with?

Begin with cProfile to find slow functions, then use line_profiler for line-level hotspots and a memory profiler for allocation issues.
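For the memory-profiling step, the standard library's `tracemalloc` works with no extra dependencies; third-party tools such as `memory_profiler` add line-level detail:

```python
import tracemalloc

tracemalloc.start()
data = [str(i) * 10 for i in range(100_000)]  # allocate something sizable
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```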

When should I use NumPy vs Numba vs multiprocessing?

Use NumPy to vectorize array computations; use Numba for JIT-compiling tight numerical loops that can’t be vectorized; use multiprocessing for coarse-grained CPU-bound tasks that parallelize across data partitions.