performance-expert skill

/stdlib/devops/performance-expert

This skill provides expert guidance on profiling, benchmarking, and tuning for fast, scalable applications across frontend, backend, and databases.

npx playbooks add skill personamanagmentlayer/pcl --skill performance-expert

---
name: performance-expert
version: 1.0.0
description: Expert-level performance optimization, profiling, benchmarking, and tuning
category: devops
tags: [performance, optimization, profiling, benchmarking, scalability]
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash(*)
---

# Performance Expert

Expert guidance for performance optimization, profiling, benchmarking, and system tuning.

## Core Concepts

### Performance Fundamentals
- Response time vs throughput
- Latency vs bandwidth
- CPU, memory, I/O bottlenecks
- Concurrency vs parallelism (see the sketch after this list)
- Caching strategies
- Load balancing
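The concurrency/parallelism distinction is easiest to see in code. A minimal sketch (function names and timings are illustrative): threads overlap waiting time for I/O-bound work, while processes spread CPU-bound work across cores.

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def io_bound(n: int) -> int:
    time.sleep(0.1)  # stands in for a network or disk call
    return n

def cpu_bound(n: int) -> int:
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Concurrency: 8 threads overlap their waits (~0.1s total, not ~0.8s)
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(io_bound, range(8)))

    # Parallelism: worker processes use multiple cores for CPU-bound work
    with ProcessPoolExecutor() as pool:
        list(pool.map(cpu_bound, [2_000_000] * 4))
```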

### Optimization Areas
- Algorithm optimization
- Database optimization
- Network optimization
- Frontend performance
- Backend performance
- Infrastructure tuning

### Profiling Tools
- CPU profilers
- Memory profilers (see the tracemalloc sketch after this list)
- Network profilers
- Application Performance Monitoring (APM)
- Load testing tools
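For memory profiling without third-party dependencies, the standard library's tracemalloc module is a reasonable first stop; a minimal sketch:

```python
import tracemalloc

tracemalloc.start()

data = [str(i) * 10 for i in range(100_000)]  # allocate something measurable

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)  # top allocation sites, grouped by source line
```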

## Python Performance

```python
import cProfile
import pstats
import timeit
import memory_profiler
from functools import lru_cache
from typing import List
import numpy as np

# Performance Profiling
def profile_function(func):
    """Decorator for profiling function execution"""
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()

        result = func(*args, **kwargs)

        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(10)  # Top 10 functions

        return result
    return wrapper

@profile_function
def slow_function():
    total = 0
    for i in range(1000000):
        total += i
    return total

# Memoization for expensive computations
@lru_cache(maxsize=128)
def fibonacci(n: int) -> int:
    """Cached Fibonacci calculation"""
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Vectorization with NumPy
def slow_loop(data: List[float]) -> List[float]:
    """Slower: pure-Python loop (list comprehension)"""
    return [x ** 2 + 2 * x + 1 for x in data]

def fast_vectorized(data: np.ndarray) -> np.ndarray:
    """Fast: Using NumPy vectorization"""
    return data ** 2 + 2 * data + 1

# Benchmarking
def benchmark_function(func, *args, iterations=1000):
    """Benchmark function execution time"""
    total_time = timeit.timeit(
        lambda: func(*args),
        number=iterations
    )
    avg_time = total_time / iterations

    return {
        'total_time': total_time,
        'avg_time': avg_time,
        'iterations': iterations
    }

# Memory profiling (requires the third-party memory-profiler package)
@memory_profiler.profile
def memory_intensive_function():
    """Function that uses significant memory"""
    data = [i for i in range(1000000)]
    return sum(data)

# Efficient string concatenation
def slow_string_concat(items: List[str]) -> str:
    """Slow: String concatenation in loop"""
    result = ""
    for item in items:
        result += item  # Creates new string each time
    return result

def fast_string_concat(items: List[str]) -> str:
    """Fast: Using join"""
    return "".join(items)

# Generator for memory efficiency
def slow_list_comprehension(n: int) -> List[int]:
    """Returns all squares at once"""
    return [i ** 2 for i in range(n)]

def fast_generator(n: int):
    """Yields squares one at a time"""
    for i in range(n):
        yield i ** 2
```
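A quick usage sketch tying these together, assuming the helpers above are defined in the same module: benchmark the pure-Python version against the vectorized one and compare average times.

```python
import numpy as np

data = [float(i) for i in range(10_000)]
arr = np.array(data)

print(benchmark_function(slow_loop, data, iterations=100))
print(benchmark_function(fast_vectorized, arr, iterations=100))
```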

## Database Optimization

```python
import json
from typing import List

from sqlalchemy import create_engine, Index
from sqlalchemy.orm import sessionmaker
import redis

# Note: User and Post are assumed ORM models defined elsewhere

# Connection pooling
engine = create_engine(
    'postgresql://user:pass@localhost/db',
    pool_size=20,
    max_overflow=0,
    pool_pre_ping=True,
    pool_recycle=3600
)

# Query optimization
class DatabaseOptimizer:
    def __init__(self, session):
        self.session = session

    def bad_n_plus_one(self):
        """N+1 query problem"""
        users = self.session.query(User).all()
        for user in users:
            posts = user.posts  # Triggers additional query per user
            print(f"{user.name}: {len(posts)} posts")

    def good_eager_loading(self):
        """Eager loading to avoid N+1"""
        from sqlalchemy.orm import joinedload

        users = self.session.query(User)\
            .options(joinedload(User.posts))\
            .all()

        for user in users:
            posts = user.posts  # No additional query
            print(f"{user.name}: {len(posts)} posts")

    def use_indexes(self):
        """Declare indexes for frequent queries"""
        # Index on frequently queried columns (declaring the Index object
        # is not enough on its own; create it via a migration or
        # metadata.create_all() so it exists in the database)
        Index('idx_user_email', User.email)
        Index('idx_post_created_at', Post.created_at)

        # Composite index for queries filtering on both columns
        Index('idx_post_user_status', Post.user_id, Post.status)

    def batch_operations(self, items: List[dict]):
        """Batch insert instead of individual inserts"""
        # Bad: Individual inserts
        # for item in items:
        #     self.session.add(User(**item))
        #     self.session.commit()

        # Good: Batch insert
        self.session.bulk_insert_mappings(User, items)
        self.session.commit()

    def use_pagination(self, page: int = 1, page_size: int = 20):
        """Paginate large result sets"""
        offset = (page - 1) * page_size

        return self.session.query(User)\
            .order_by(User.created_at.desc())\
            .limit(page_size)\
            .offset(offset)\
            .all()

# Redis caching
class CacheOptimizer:
    def __init__(self, redis_client: redis.Redis):
        self.redis = redis_client

    def cache_query_result(self, key: str, query_func, ttl: int = 3600):
        """Cache database query results"""
        # Check cache first
        cached = self.redis.get(key)
        if cached:
            return json.loads(cached)

        # Execute query
        result = query_func()

        # Cache result
        self.redis.setex(key, ttl, json.dumps(result))

        return result

    def cache_aside_pattern(self, key: str, fetch_func, ttl: int = 3600):
        """Implement cache-aside pattern"""
        data = self.redis.get(key)

        if data is None:
            data = fetch_func()
            self.redis.setex(key, ttl, json.dumps(data))
        else:
            data = json.loads(data)

        return data
```
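A usage sketch for the cache-aside helper above, assuming a local Redis instance (host and key names are illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
cache = CacheOptimizer(r)

def fetch_user_count():
    return {"active_users": 1234}  # stand-in for a real database query

# First call runs fetch_user_count and caches the result; repeats hit Redis
stats = cache.cache_aside_pattern("stats:user_count", fetch_user_count, ttl=600)
```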

## Frontend Performance

```javascript
// Debouncing for expensive operations
function debounce(func, wait) {
    let timeout;
    return function executedFunction(...args) {
        const later = () => {
            clearTimeout(timeout);
            func(...args);
        };
        clearTimeout(timeout);
        timeout = setTimeout(later, wait);
    };
}

// Example: Debounce search input
const searchInput = document.getElementById('search');
const debouncedSearch = debounce((query) => {
    // Expensive search operation
    fetchSearchResults(query);
}, 300);

searchInput.addEventListener('input', (e) => {
    debouncedSearch(e.target.value);
});

// Lazy loading images
const lazyLoadImages = () => {
    const images = document.querySelectorAll('img[data-src]');

    const imageObserver = new IntersectionObserver((entries) => {
        entries.forEach(entry => {
            if (entry.isIntersecting) {
                const img = entry.target;
                img.src = img.dataset.src;
                img.removeAttribute('data-src');
                imageObserver.unobserve(img);
            }
        });
    });

    images.forEach(img => imageObserver.observe(img));
};

// Virtual scrolling for large lists (simplified sketch)
class VirtualScroller {
    constructor(container, items, itemHeight) {
        this.container = container;
        this.items = items;
        this.itemHeight = itemHeight;
        this.visibleItems = Math.ceil(container.clientHeight / itemHeight) + 1;
        // Re-render on scroll so only visible rows exist in the DOM
        this.container.addEventListener('scroll', () => this.render());
        this.render();
    }

    render() {
        const scrollTop = this.container.scrollTop;
        const startIndex = Math.floor(scrollTop / this.itemHeight);
        const endIndex = startIndex + this.visibleItems;

        // Only render visible items; spacer divs preserve scrollbar size
        const visibleData = this.items.slice(startIndex, endIndex);
        const topPad = startIndex * this.itemHeight;
        const bottomPad = Math.max((this.items.length - endIndex) * this.itemHeight, 0);

        this.container.innerHTML =
            `<div style="height: ${topPad}px"></div>` +
            visibleData
                .map(item => `<div style="height: ${this.itemHeight}px">${item}</div>`)
                .join('') +
            `<div style="height: ${bottomPad}px"></div>`;
    }
}

// Code splitting with dynamic imports
async function loadModule() {
    const module = await import('./heavy-module.js');
    module.init();
}
```

## API Performance

```python
from fastapi import FastAPI, BackgroundTasks
from fastapi.responses import StreamingResponse
import asyncio
from typing import AsyncGenerator

app = FastAPI()

# Async endpoints for I/O bound operations
@app.get("/users/{user_id}")
async def get_user(user_id: str):
    """Async endpoint for database queries (db is a placeholder async data layer)"""
    user = await db.fetch_user(user_id)
    return user

# Streaming responses for large data
async def generate_large_file() -> AsyncGenerator[bytes, None]:
    """Stream a large file in chunks (plain file reads still block the
    event loop briefly; consider aiofiles for fully async disk I/O)"""
    chunk_size = 8192
    with open("large_file.csv", "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
            await asyncio.sleep(0)  # Yield control to the event loop

@app.get("/download")
async def download_large_file():
    return StreamingResponse(
        generate_large_file(),
        media_type="text/csv"
    )

# Background tasks for long-running operations
@app.post("/process")
async def process_data(background_tasks: BackgroundTasks):
    """Offload heavy processing to background"""
    # heavy_processing_task is a placeholder for your long-running job
    background_tasks.add_task(heavy_processing_task)
    return {"status": "processing started"}

# Connection pooling
import aiohttp

class APIClient:
    def __init__(self):
        self.session = None

    async def __aenter__(self):
        self.session = aiohttp.ClientSession(
            connector=aiohttp.TCPConnector(limit=100)
        )
        return self

    async def __aexit__(self, *args):
        await self.session.close()

    async def fetch(self, url: str):
        async with self.session.get(url) as response:
            return await response.json()
```
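A usage sketch for the pooled client above (URLs are placeholders): many requests issued concurrently over a single connection pool.

```python
import asyncio

async def main():
    urls = [f"https://api.example.com/items/{i}" for i in range(10)]
    async with APIClient() as client:
        # All requests share the pooled session instead of opening
        # a new connection per call
        return await asyncio.gather(*(client.fetch(u) for u in urls))

results = asyncio.run(main())
```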

## Load Testing

```python
from locust import HttpUser, task, between

class PerformanceTest(HttpUser):
    wait_time = between(1, 3)

    @task(3)
    def get_users(self):
        """High frequency endpoint"""
        self.client.get("/api/users", headers=self.auth_headers)

    @task(1)
    def create_user(self):
        """Lower frequency endpoint"""
        self.client.post("/api/users", json={
            "email": "test@example.com",
            "name": "Test User"
        }, headers=self.auth_headers)

    def on_start(self):
        """Login once per simulated user and reuse the token"""
        response = self.client.post("/api/login", json={
            "username": "test",
            "password": "password"
        })
        self.token = response.json()["token"]
        self.auth_headers = {"Authorization": f"Bearer {self.token}"}
```

## Best Practices

### General
- Measure before optimizing
- Focus on bottlenecks
- Use appropriate data structures (see the sketch after this list)
- Cache expensive computations
- Minimize I/O operations
- Use connection pooling
- Implement pagination
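Data-structure choice alone can change complexity classes. A minimal sketch: membership tests are O(n) on a list but O(1) on average for a set.

```python
import timeit

items_list = list(range(100_000))
items_set = set(items_list)

# Worst case for the list: the element is at the very end
print(timeit.timeit(lambda: 99_999 in items_list, number=1_000))
print(timeit.timeit(lambda: 99_999 in items_set, number=1_000))
```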

### Database
- Create proper indexes
- Avoid N+1 queries
- Use eager loading
- Batch operations
- Optimize queries
- Use read replicas (see the routing sketch after this list)
- Implement caching
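A minimal read-replica routing sketch with SQLAlchemy (connection URLs are placeholders): reads go to a replica, writes stay on the primary.

```python
from sqlalchemy import create_engine, text

primary = create_engine("postgresql://user:pass@primary-host/db")
replica = create_engine("postgresql://user:pass@replica-host/db")

def run_read(query: str):
    # Offload read-only queries to the replica
    with replica.connect() as conn:
        return conn.execute(text(query)).fetchall()

def run_write(query: str):
    # Writes always go to the primary to preserve consistency
    with primary.begin() as conn:
        conn.execute(text(query))
```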

### Frontend
- Minimize bundle size
- Code splitting
- Lazy loading
- Compress assets
- Use CDN
- Optimize images
- Debounce/throttle events

### Backend
- Use async for I/O
- Implement caching
- Connection pooling
- Background processing
- Horizontal scaling
- Load balancing

## Anti-Patterns

❌ Premature optimization
❌ No profiling/measurement
❌ Optimizing wrong bottlenecks
❌ Ignoring caching
❌ Synchronous I/O operations
❌ No database indexes
❌ Loading all data at once

## Resources

- Python Performance Tips: https://wiki.python.org/moin/PythonSpeed/PerformanceTips
- Web Performance: https://web.dev/performance/
- Locust Load Testing: https://locust.io/
- py-spy Profiler: https://github.com/benfred/py-spy
- Chrome DevTools: https://developer.chrome.com/docs/devtools/

Overview

This skill provides expert-level guidance for performance optimization, profiling, benchmarking, and system tuning across application stacks. It synthesizes core concepts, practical patterns, and tool-specific advice for CPU, memory, I/O, network, frontend, backend, and database workloads. The goal is actionable steps to find bottlenecks, measure impact, and safely improve throughput and latency.

How this skill works

The skill walks you through performance fundamentals, targeted optimization areas, and concrete techniques: profiling with CPU/memory/network profilers and APMs, benchmarking and load testing, and tuning infrastructure and code. It explains where to measure (hot paths, I/O, DB queries, network) and offers patterns like caching, connection pooling, async I/O, and batching. Example snippets demonstrate profiling decorators, vectorization, query optimization, frontend lazy loading/code splitting, streaming APIs, and load test scenarios.

When to use it

  • Diagnosing slow response times or unpredictable latency in production
  • Improving throughput under load or reducing resource costs
  • Validating optimization impact with benchmarks and load tests
  • Addressing database inefficiencies like N+1 queries or missing indexes
  • Optimizing frontend load time, bundle size, or large-list rendering

Best practices

  • Measure before changing: identify true bottlenecks with profilers and metrics
  • Focus on high-impact paths: optimize the small fraction of code that accounts for most of the runtime
  • Use appropriate tools: CPU and memory profilers, APMs, and load testers
  • Prefer safe, observable changes: feature flags, benchmarks, and telemetry
  • Cache judiciously and use connection pooling to reduce I/O strain
  • Apply async I/O or background processing for long-running tasks

Example use cases

  • Profile a slow Python endpoint, fix heavy loops with vectorization or caching, verify with benchmarks
  • Eliminate N+1 queries via eager loading and add indexes to speed common queries
  • Reduce frontend time-to-interactive using code splitting, lazy loading, and debounced inputs
  • Stream large downloads with async generators and StreamingResponse to lower memory use
  • Run Locust load tests to validate scaling strategy and identify contention points

FAQ

Where should I start when an application is slow?

Start by measuring: collect response-time metrics, run a profiler for CPU/memory hotspots, and check slow DB queries. Fix the highest-impact issue first and validate with benchmarks.

How do I know if caching will help?

Measure cost of computing or fetching the data, estimate cache hit rate, and test latency improvements. Use cache-aside patterns with TTLs and monitor cache eviction and staleness.
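As a back-of-envelope check (a sketch; numbers are illustrative), expected latency with a cache is the hit rate times the cache lookup cost plus the miss rate times the lookup-plus-recompute cost:

```python
def expected_latency(hit_rate: float, cache_ms: float, compute_ms: float) -> float:
    # A miss pays the cache lookup plus the underlying fetch/compute
    return hit_rate * cache_ms + (1 - hit_rate) * (cache_ms + compute_ms)

# e.g. 80% hit rate, 1 ms Redis lookup, 50 ms database query
print(expected_latency(0.8, 1.0, 50.0))  # ~11 ms vs 50 ms uncached
```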