home / skills / williamzujkowski / standards / advanced-optimization

advanced-optimization skill

unsafe

This skill helps you optimize PostgreSQL performance by selecting appropriate indexes, analyzing query plans, and applying production-ready optimization

npx playbooks add skill williamzujkowski/standards --skill advanced-optimization

Review the files below or copy the command above to add this skill to your agents.

Files (9)

SKILL.md

17.1 KB

---
name: database-advanced-optimization
category: database
difficulty: advanced
estimated_time: 60 minutes
prerequisites:
- Basic SQL knowledge
- Database design fundamentals
- Understanding of indexes and query execution
tags:
- postgresql
- mongodb
- redis
- performance
- optimization
- scaling
- caching
- sharding
learning_objectives:
- Master database-specific optimization techniques
- Implement effective caching strategies
- Design scalable database architectures
- Optimize query performance across multiple database systems
- Configure connection pooling and monitoring
version: 1.0.0
last_updated: 2025-01-17
description: 'Use SQL (PostgreSQL) when:'
---


# Database Advanced Optimization

## Level 1: Quick Reference (5 minutes)

### Database Selection Guide

**Use SQL (PostgreSQL) when:**

- ACID compliance is critical
- Complex joins and transactions required
- Data has clear relational structure
- Strong consistency needed
- Rich query capabilities required

**Use NoSQL (MongoDB) when:**

- Flexible schema needed
- Horizontal scaling is priority
- Document-oriented data model fits
- High write throughput required
- Eventual consistency acceptable

**Use In-Memory (Redis) when:**

- Sub-millisecond latency required
- Caching layer needed
- Real-time features (pub/sub, streams)
- Session management
- Rate limiting or counters

### Common Optimization Patterns

```sql
-- PostgreSQL: Create covering index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at)
INCLUDE (status, total_amount);

-- PostgreSQL: Analyze query plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE user_id = 123;
```

```javascript
// MongoDB: Create compound index
db.orders.createIndex({ userId: 1, createdAt: -1 }, { background: true });

// MongoDB: Use aggregation pipeline efficiently
db.orders.aggregate([
  { $match: { status: "pending" } },
  { $sort: { createdAt: -1 } },
  { $limit: 100 }
]);
```

```python
# Redis: Implement cache-aside pattern
def get_user(user_id):
    cache_key = f"user:{user_id}"
    user = redis.get(cache_key)

    if user is None:
        user = db.query("SELECT * FROM users WHERE id = %s", user_id)
        redis.setex(cache_key, 3600, json.dumps(user))

    return json.loads(user)
```

### Essential Optimization Checklist

#### PostgreSQL

- [ ] Create appropriate indexes (B-tree, GIN, GiST)
- [ ] Analyze query plans with EXPLAIN ANALYZE
- [ ] Configure autovacuum appropriately
- [ ] Set up connection pooling (PgBouncer)
- [ ] Monitor with pg_stat_statements
- [ ] Optimize shared_buffers and work_mem
- [ ] Configure appropriate WAL settings

#### MongoDB

- [ ] Design effective shard keys
- [ ] Create compound indexes for common queries
- [ ] Enable profiler for slow queries
- [ ] Configure replica sets for read scaling
- [ ] Optimize aggregation pipelines
- [ ] Set appropriate write concerns
- [ ] Monitor with MongoDB Compass/Atlas

#### Redis

- [ ] Implement appropriate caching strategy
- [ ] Configure maxmemory and eviction policies
- [ ] Use pipelining for bulk operations
- [ ] Set appropriate TTLs
- [ ] Monitor memory usage and hit rates
- [ ] Configure persistence (RDB/AOF) appropriately
- [ ] Use Redis Cluster for scaling

### Quick Wins

**Immediate Impact:**

1. Add indexes for frequently queried columns
2. Enable query result caching
3. Implement connection pooling
4. Add Redis cache layer for hot data
5. Optimize N+1 queries with batch loading

**Performance Monitoring:**

```bash
# PostgreSQL query stats
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY mean_time DESC LIMIT 10;

# MongoDB profiler
db.setProfilingLevel(1, { slowms: 100 });
db.system.profile.find().sort({ ts: -1 }).limit(10);

# Redis stats
redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'
```

---

## Level 2:
>
> **📚 Full Examples**: See [REFERENCE.md](./REFERENCE.md) for complete code samples, detailed configurations, and production-ready implementations.

 Implementation Guide (45 minutes)

### PostgreSQL Advanced Optimization

#### Index Strategies

PostgreSQL supports multiple index types, each optimized for specific use cases:

**B-tree Indexes (Default)**


*See [REFERENCE.md](./REFERENCE.md#example-0) for complete implementation.*


**GIN Indexes (Full-text and Array Search)**


*See [REFERENCE.md](./REFERENCE.md#example-1) for complete implementation.*


**GiST Indexes (Geometric and Range Data)**


*See [REFERENCE.md](./REFERENCE.md#example-2) for complete implementation.*


#### Query Optimization with EXPLAIN ANALYZE


*See [REFERENCE.md](./REFERENCE.md#example-3) for complete implementation.*


**Understanding EXPLAIN Output:**


*See [REFERENCE.md](./REFERENCE.md#example-4) for complete implementation.*


**Optimization Decisions:**

- Index Scan = Good (using indexes)
- Buffers shared hit = Good (data in cache)
- Nested Loop = Appropriate for small result sets
- Hash Join = Better for large joins

#### VACUUM and Maintenance


*See [REFERENCE.md](./REFERENCE.md#example-5) for complete implementation.*


#### Connection Pooling with PgBouncer


*See [REFERENCE.md](./REFERENCE.md#example-6) for complete implementation.*


**Connection Pooling Best Practices:**

- Use `transaction` mode for most applications
- Set `default_pool_size` = (CPU cores × 2) + effective_spindle_count
- Monitor pool usage: `SHOW POOLS;` in PgBouncer console
- Application connection string: `postgresql://user:pass@pgbouncer:6432/myapp`

#### PostgreSQL Performance Tuning


*See [REFERENCE.md](./REFERENCE.md#example-7) for complete implementation.*


### MongoDB Advanced Optimization

#### Sharding Strategies

**Shard Key Selection:**


*See [REFERENCE.md](./REFERENCE.md#example-8) for complete implementation.*


**Shard Key Patterns:**

1. **Range-based Sharding** (ordered data):

```javascript
// Good for time-series data
sh.shardCollection("analytics.events", { "date": 1, "userId": 1 })

// Queries benefit from targeted routing
db.events.find({ date: ISODate("2025-01-17"), userId: 123 })
```

2. **Hash-based Sharding** (random distribution):

```javascript
// Good for even distribution, poor for range queries
sh.shardCollection("users.profiles", { "_id": "hashed" })

// All shards queried for range
db.profiles.find({ _id: { $gt: 1000, $lt: 2000 } })
```

3. **Zone Sharding** (geographic distribution):


*See [REFERENCE.md](./REFERENCE.md#example-11) for complete implementation.*


#### Index Optimization


*See [REFERENCE.md](./REFERENCE.md#example-12) for complete implementation.*


**Index Analysis:**


*See [REFERENCE.md](./REFERENCE.md#example-13) for complete implementation.*


#### Aggregation Pipeline Optimization


*See [REFERENCE.md](./REFERENCE.md#example-14) for complete implementation.*


#### Replica Set Configuration


*See [REFERENCE.md](./REFERENCE.md#example-15) for complete implementation.*


### Redis Advanced Patterns

#### Caching Strategies

**1. Cache-Aside (Lazy Loading)**


*See [REFERENCE.md](./REFERENCE.md#example-16) for complete implementation.*


**2. Write-Through Caching**


*See [REFERENCE.md](./REFERENCE.md#example-17) for complete implementation.*


**3. Write-Behind (Write-Back) Caching**


*See [REFERENCE.md](./REFERENCE.md#example-18) for complete implementation.*


**4. Read-Through Caching**


*See [REFERENCE.md](./REFERENCE.md#example-19) for complete implementation.*


#### Redis Data Structures and Use Cases


*See [REFERENCE.md](./REFERENCE.md#example-20) for complete implementation.*


#### Pub/Sub and Streams

**Pub/Sub Pattern:**


*See [REFERENCE.md](./REFERENCE.md#example-21) for complete implementation.*


**Redis Streams (Preferred for Reliable Messaging):**


*See [REFERENCE.md](./REFERENCE.md#example-22) for complete implementation.*


#### Redis Cluster Configuration


*See [REFERENCE.md](./REFERENCE.md#example-23) for complete implementation.*


**Redis Configuration (redis.conf):**


*See [REFERENCE.md](./REFERENCE.md#example-24) for complete implementation.*


### Query Optimization Techniques

#### Solving the N+1 Problem

**Bad: N+1 Queries**


*See [REFERENCE.md](./REFERENCE.md#example-25) for complete implementation.*


**Good: Join or Batch Loading**


*See [REFERENCE.md](./REFERENCE.md#example-26) for complete implementation.*


#### Database Scaling Strategies

**1. Vertical Scaling (Scale Up)**

- Increase CPU, RAM, storage
- Simple but has limits
- Good for initial growth

**2. Horizontal Scaling (Scale Out)**

**Read Replicas:**


*See [REFERENCE.md](./REFERENCE.md#example-27) for complete implementation.*


**Sharding:**


*See [REFERENCE.md](./REFERENCE.md#example-28) for complete implementation.*


#### Connection Pooling Best Practices

**HikariCP (Java):**


*See [REFERENCE.md](./REFERENCE.md#example-29) for complete implementation.*


**Python (SQLAlchemy):**


*See [REFERENCE.md](./REFERENCE.md#example-30) for complete implementation.*


**Node.js (pg-pool):**


*See [REFERENCE.md](./REFERENCE.md#example-31) for complete implementation.*


### Database Monitoring and Profiling

#### PostgreSQL Monitoring

**pg_stat_statements Extension:**


*See [REFERENCE.md](./REFERENCE.md#example-32) for complete implementation.*


**Key Metrics to Monitor:**


*See [REFERENCE.md](./REFERENCE.md#example-33) for complete implementation.*


#### MongoDB Profiling

**Enable Profiler:**


*See [REFERENCE.md](./REFERENCE.md#example-34) for complete implementation.*


**Analyze Slow Queries:**


*See [REFERENCE.md](./REFERENCE.md#example-35) for complete implementation.*


**Server Status Metrics:**


*See [REFERENCE.md](./REFERENCE.md#example-36) for complete implementation.*


#### Redis Monitoring

**INFO Command:**


*See [REFERENCE.md](./REFERENCE.md#example-37) for complete implementation.*


**Key Metrics:**


*See [REFERENCE.md](./REFERENCE.md#example-38) for complete implementation.*


**Slow Log:**


*See [REFERENCE.md](./REFERENCE.md#example-39) for complete implementation.*


### Backup and Recovery Strategies

#### PostgreSQL Backup

**Physical Backup (pg_basebackup):**

```bash
# Full backup
pg_basebackup -h localhost -U postgres -D /backup/pg_data -Fp -Xs -P

# Compressed backup
pg_basebackup -h localhost -U postgres -D /backup/pg_data -Ft -z -P
```

**Logical Backup (pg_dump):**


*See [REFERENCE.md](./REFERENCE.md#example-41) for complete implementation.*


**Restore:**


*See [REFERENCE.md](./REFERENCE.md#example-42) for complete implementation.*


**Point-in-Time Recovery (PITR):**


*See [REFERENCE.md](./REFERENCE.md#example-43) for complete implementation.*


#### MongoDB Backup

**mongodump:**


*See [REFERENCE.md](./REFERENCE.md#example-44) for complete implementation.*


**mongorestore:**


*See [REFERENCE.md](./REFERENCE.md#example-45) for complete implementation.*


**Filesystem Snapshots (Replica Set):**


*See [REFERENCE.md](./REFERENCE.md#example-46) for complete implementation.*


#### Redis Backup

**RDB (Snapshot):**


*See [REFERENCE.md](./REFERENCE.md#example-47) for complete implementation.*


**AOF (Append-Only File):**

```bash
# Enable AOF (redis.conf)
appendonly yes
appendfsync everysec

# Rewrite AOF
redis-cli BGREWRITEAOF

# Backup AOF file
cp /var/lib/redis/appendonly.aof /backup/appendonly-$(date +%Y%m%d).aof
```

**Restore:**

```bash
# Stop Redis
systemctl stop redis

# Restore RDB or AOF
cp /backup/dump.rdb /var/lib/redis/
# or
cp /backup/appendonly.aof /var/lib/redis/

# Start Redis
systemctl start redis
```

---

## Level 3: Deep Dive Resources (10 minutes)

### Official Documentation

**PostgreSQL:**

- [PostgreSQL Performance Tips](https://www.postgresql.org/docs/current/performance-tips.html)
- [pg_stat_statements](https://www.postgresql.org/docs/current/pgstatstatements.html)
- [EXPLAIN Documentation](https://www.postgresql.org/docs/current/sql-explain.html)
- [PgBouncer Documentation](https://www.pgbouncer.org/usage.html)

**MongoDB:**

- [MongoDB Performance Best Practices](https://www.mongodb.com/docs/manual/administration/analyzing-mongodb-performance/)
- [Sharding Guide](https://www.mongodb.com/docs/manual/sharding/)
- [Index Strategies](https://www.mongodb.com/docs/manual/applications/indexes/)
- [Aggregation Pipeline Optimization](https://www.mongodb.com/docs/manual/core/aggregation-pipeline-optimization/)

**Redis:**

- [Redis Documentation](https://redis.io/documentation)
- [Redis Cluster Tutorial](https://redis.io/docs/manual/scaling/)
- [Redis Persistence](https://redis.io/docs/manual/persistence/)
- [Redis Best Practices](https://redis.io/docs/manual/patterns/)

### Books and Courses

- "High Performance PostgreSQL" by Gregory Smith
- "PostgreSQL Query Optimization" by Henrietta Dombrovskaya
- "MongoDB: The Definitive Guide" by Shannon Bradshaw
- "Redis in Action" by Josiah Carlson
- "Designing Data-Intensive Applications" by Martin Kleppmann

### Tools and Utilities

**PostgreSQL:**

- pgAdmin 4 - Database management
- pg_top - Real-time monitoring
- pgBadger - Log analyzer
- pgtune - Configuration generator

**MongoDB:**

- MongoDB Compass - GUI client
- MongoDB Atlas - Managed service
- mongostat/mongotop - Command-line monitoring
- Studio 3T - Advanced GUI

**Redis:**

- RedisInsight - GUI client
- redis-cli - Command-line interface
- redis-stat - Real-time monitoring
- redis-rdb-tools - RDB file analysis

### Practice Exercises

1. **Index Optimization Challenge**: Given a slow query, design optimal indexes
2. **Sharding Strategy**: Design shard key for multi-tenant application
3. **Cache Implementation**: Build cache-aside pattern with TTL and invalidation
4. **Query Analysis**: Use EXPLAIN to optimize complex join queries
5. **Backup Strategy**: Design comprehensive backup/recovery plan

### Related Skills

- [Database Design Fundamentals](../design-fundamentals/SKILL.md)
- [API Performance Optimization](../../backend/api-performance/SKILL.md)
- [System Architecture Patterns](../../architecture/system-patterns/SKILL.md)
- [Monitoring and Observability](../../devops/monitoring/SKILL.md)

## Examples

### Basic Usage

```python
// TODO: Add basic example for advanced-optimization
// This example demonstrates core functionality
```

### Advanced Usage

```python
// TODO: Add advanced example for advanced-optimization
// This example shows production-ready patterns
```

### Integration Example

```python
// TODO: Add integration example showing how advanced-optimization
// works with other systems and services
```

See `examples/advanced-optimization/` for complete working examples.

## Integration Points

This skill integrates with:

### Upstream Dependencies

- **Tools**: Common development tools and frameworks
- **Prerequisites**: Basic understanding of general concepts

### Downstream Consumers

- **Applications**: Production systems requiring advanced-optimization functionality
- **CI/CD Pipelines**: Automated testing and deployment workflows
- **Monitoring Systems**: Observability and logging platforms

### Related Skills

- See other skills in this category

### Common Integration Patterns

1. **Development Workflow**: How this skill fits into daily development
2. **Production Deployment**: Integration with production systems
3. **Monitoring & Alerting**: Observability integration points

## Common Pitfalls

### Pitfall 1: Insufficient Testing

**Problem:** Not testing edge cases and error conditions leads to production bugs

**Solution:** Implement comprehensive test coverage including:

- Happy path scenarios
- Error handling and edge cases
- Integration points with external systems

**Prevention:** Enforce minimum code coverage (80%+) in CI/CD pipeline

### Pitfall 2: Hardcoded Configuration

**Problem:** Hardcoding values makes applications inflexible and environment-dependent

**Solution:** Use environment variables and configuration management:

- Separate config from code
- Use environment-specific configuration files
- Never commit secrets to version control

**Prevention:** Use tools like dotenv, config validators, and secret scanners

### Pitfall 3: Ignoring Security Best Practices

**Problem:** Security vulnerabilities from not following established security patterns

**Solution:** Follow security guidelines:

- Input validation and sanitization
- Proper authentication and authorization
- Encrypted data transmission (TLS/SSL)
- Regular security audits and updates

**Prevention:** Use security linters, SAST tools, and regular dependency updates

**Best Practices:**

- Follow established patterns and conventions for advanced-optimization
- Keep dependencies up to date and scan for vulnerabilities
- Write comprehensive documentation and inline comments
- Use linting and formatting tools consistently
- Implement proper error handling and logging
- Regular code reviews and pair programming
- Monitor production metrics and set up alerts

---

## Bundled Resources

This skill includes 6 production-ready templates and scripts:

1. **templates/postgres-optimization.sql** - PostgreSQL optimization queries and index strategies
2. **templates/mongodb-sharding.yaml** - MongoDB sharding configuration
3. **templates/redis-caching.py** - Redis caching pattern implementations
4. **scripts/query-profiling.sh** - Automated database query profiling
5. **config/pgbouncer.ini** - PgBouncer connection pooling configuration
6. **templates/monitoring-dashboard.json** - Grafana dashboard for database metrics

All resources are located in the `skills/database/advanced-optimization/` directory.

Overview

This skill teaches advanced database optimization with a focus on PostgreSQL while comparing when to use NoSQL and in-memory stores. It gives quick decision guidance, concrete optimization patterns, monitoring checks, and production-ready operational practices. Use it to make database choices, tune performance, and implement caching and scaling strategies fast.

How this skill works

The skill inspects common performance pain points and provides actionable recommendations: index strategies, EXPLAIN ANALYZE interpretation, vacuum and maintenance guidance, and connection pooling. It shows concrete SQL snippets, caching patterns (Redis cache-aside), and monitoring queries for PostgreSQL, MongoDB, and Redis. It also outlines backup and recovery commands and scaling approaches (read replicas, sharding, clustering).

When to use it

Choose PostgreSQL when ACID, complex joins, or strong consistency are required
Choose MongoDB when schema flexibility and high write throughput matter
Use Redis for sub-millisecond latency, caching, or real-time pub/sub features
Apply this guide when diagnosing slow queries or high connection counts
Use before production launches to set pooling, backups, and monitoring

Best practices

Create covering and compound indexes that match query predicates and order-by columns
Use EXPLAIN (ANALYZE, BUFFERS) to validate plan changes before applying schema tweaks
Run autovacuum tuning and periodic VACUUM/ANALYZE for high-write tables
Front-end hot-data with Redis using cache-aside and set sensible TTLs
Add PgBouncer or a pooling layer and size pools based on CPU and expected connections
Monitor pg_stat_statements, Redis INFO, and MongoDB profiler for slow-query baselines

Example use cases

Speed up a slow user-orders JOIN by adding a composite index and re-evaluating the plan
Implement cache-aside for user sessions to reduce read load on PostgreSQL
Design shard keys and targeted queries for a time-series collection in MongoDB
Setup PgBouncer in transaction mode and tune pool sizes for a web application
Create regular PITR backups and test restores for production PostgreSQL

FAQ

How quickly can I see performance gains?

Small changes like adding missing indexes or enabling pooling often show improvements within minutes; structural changes and tuning require testing with EXPLAIN and monitoring over days.

When should I shard vs use read replicas?

Use read replicas first for read scaling. Sharding is appropriate when a single node cannot handle write or storage needs or when you need scale-out for write-heavy workloads.