home / skills / williamzujkowski / cognitive-toolworks / database-optimization-analyzer

database-optimization-analyzer skill

Q: Do recommendations include implementation code?

Yes — every actionable index or rewrite includes concrete DDL/commands and notes about locking or concurrency (e.g., CONCURRENTLY for PostgreSQL).

safe

/skills/database-optimization-analyzer

This skill analyzes database queries to optimize performance through quick wins and deeper schema and index improvements for scalable workloads.

npx playbooks add skill williamzujkowski/cognitive-toolworks --skill database-optimization-analyzer

Review the files below or copy the command above to add this skill to your agents.

Files (6)

SKILL.md

14.1 KB

---
name: Database Optimization Analyzer
slug: database-optimization-analyzer
description: Optimize SQL and NoSQL databases with schema design, query performance tuning, and indexing strategies.
capabilities:
  - Analyze query execution plans (EXPLAIN/EXPLAIN ANALYZE)
  - Recommend indexes for SQL and NoSQL databases
  - Identify N+1 queries and query anti-patterns
  - Suggest denormalization strategies for NoSQL
  - Optimize schema design for performance
  - Detect missing indexes and unused indexes
  - Analyze database-specific optimization opportunities
inputs:
  - query: SQL or NoSQL query text
  - database_type: postgresql, mysql, mongodb, redis, or other
  - execution_plan: optional EXPLAIN output
  - schema: optional table/collection schema definition
  - performance_metrics: optional query execution time, rows scanned
outputs:
  - optimization_recommendations: array of actionable improvements
  - index_suggestions: specific index definitions
  - query_rewrites: optimized query alternatives
  - estimated_impact: performance improvement estimates
  - implementation_priority: high/medium/low
keywords:
  - database
  - performance
  - optimization
  - indexing
  - query-tuning
  - SQL
  - NoSQL
  - PostgreSQL
  - MySQL
  - MongoDB
  - Redis
version: 1.0.0
owner: cognitive-toolworks
license: MIT
security: No credentials required; analyzes query patterns only
links:
  - https://www.postgresql.org/docs/current/performance-tips.html
  - https://dev.mysql.com/doc/refman/8.0/en/optimization.html
  - https://www.mongodb.com/docs/manual/core/query-optimization/
  - https://redis.io/docs/latest/develop/use/patterns/
  - https://use-the-index-luke.com/
---

## Purpose & When-To-Use

Use this skill when:
- Slow query identified (execution time >100ms for OLTP, >5s for OLAP)
- Database performance degradation observed
- Schema design review needed before production deployment
- Query optimization required for scalability
- Index strategy needs validation
- Database migration planning (SQL to NoSQL or vice versa)
- Performance regression analysis after schema changes

Do not use for:
- Database administration tasks (backup, replication, failover)
- Database installation or configuration
- Security hardening (use security-assessment-framework)
- Data modeling or ER diagram generation

## Pre-Checks

1. **Time Normalization**: `NOW_ET = 2025-10-25T22:10:45-04:00` (NIST/time.gov semantics)
2. **Input Validation**:
   - Database type must be one of: postgresql, mysql, mongodb, redis, or specify 'other'
   - Query text must be non-empty string
   - If execution_plan provided, must match database type format
3. **Source Freshness**:
   - PostgreSQL docs accessed: 2025-10-25T22:10:45-04:00
   - MySQL docs accessed: 2025-10-25T22:10:45-04:00
   - MongoDB docs accessed: 2025-10-25T22:10:45-04:00
   - Redis docs accessed: 2025-10-25T22:10:45-04:00
   - use-the-index-luke.com accessed: 2025-10-25T22:10:45-04:00

## Procedure

### T1: Fast Path (≤2k tokens, 80% common cases)

**Scope**: Quick wins for common query anti-patterns without deep execution plan analysis.

**Steps**:
1. **Detect Database Type**: Parse query syntax or use provided database_type
2. **Quick Pattern Scan**:
   - SELECT * usage (recommend explicit columns)
   - Missing WHERE clause on large tables
   - LIKE with leading wildcard (%pattern)
   - Functions on indexed columns (e.g., WHERE YEAR(date_col) = 2025)
   - Cartesian joins (missing JOIN conditions)
   - Subqueries that could be JOINs
3. **Index Hints** (generic):
   - Columns in WHERE, JOIN ON, ORDER BY, GROUP BY clauses
   - Composite index recommendations based on query predicates
4. **Output**: 3-5 actionable quick wins with implementation code

**Abort Conditions**:
- Query syntax invalid for specified database type
- Performance metrics contradict issue (query already fast)

### T2: Extended Analysis (≤6k tokens, deep optimization)

**Scope**: Detailed execution plan analysis, schema review, and database-specific optimizations.

**Steps**:
1. **Execution Plan Analysis** (if provided):
   - **PostgreSQL**: Parse EXPLAIN ANALYZE output
     - Identify Seq Scan on large tables → recommend index
     - Nested Loop with high row estimates → consider Hash Join
     - Sort operations → evaluate index for ORDER BY
     - High cost nodes → focus optimization effort
   - **MySQL**: Parse EXPLAIN output
     - type=ALL (full table scan) → add index
     - Extra='Using filesort' → index for ORDER BY
     - Extra='Using temporary' → optimize GROUP BY
     - rows examined vs rows returned ratio
   - **MongoDB**: Parse explain() output
     - COLLSCAN (collection scan) → create index
     - IXSCAN with high nReturned/totalKeysExamined → index selectivity issue
     - SORT stage → compound index with sort fields
   - **Redis**: Analyze command patterns
     - KEYS command → use SCAN for large keyspaces
     - Large SET/LIST operations → consider pipelining
     - Missing TTL on ephemeral data

2. **Schema Design Review**:
   - **Normalization Issues**:
     - Over-normalization causing excessive JOINs → selective denormalization
     - Under-normalization causing data redundancy → normalize
   - **Data Types**:
     - Oversized VARCHAR → right-size
     - Using TEXT for short strings → use VARCHAR
     - Numeric precision mismatches
   - **Partitioning Opportunities**:
     - Tables >10M rows → consider partitioning by date/range
     - Time-series data → partition by month/quarter

3. **Database-Specific Optimizations**:
   - **PostgreSQL**:
     - Partial indexes for filtered queries
     - GIN/GiST indexes for full-text search, JSON
     - Covering indexes to avoid table lookups
     - Materialized views for complex aggregations
     - (Source: https://www.postgresql.org/docs/current/indexes-types.html, accessed 2025-10-25T22:10:45-04:00)
   - **MySQL**:
     - InnoDB buffer pool sizing
     - Query cache considerations (deprecated in 8.0)
     - Index merge optimization
     - (Source: https://dev.mysql.com/doc/refman/8.0/en/optimization-indexes.html, accessed 2025-10-25T22:10:45-04:00)
   - **MongoDB**:
     - ESR rule (Equality, Sort, Range) for compound indexes
     - Index intersection vs compound index trade-offs
     - Aggregation pipeline optimization ($match early, $project to reduce data)
     - (Source: https://www.mongodb.com/docs/manual/core/index-compound/, accessed 2025-10-25T22:10:45-04:00)
   - **Redis**:
     - Hash vs String for structured data
     - Sorted Sets for ranking/leaderboards
     - Pub/Sub vs Streams for messaging
     - (Source: https://redis.io/docs/latest/develop/data-types/, accessed 2025-10-25T22:10:45-04:00)

4. **N+1 Query Detection**:
   - Identify repeated queries with different parameters in application logs
   - Recommend batch loading (IN clause, JOIN, or eager loading)

5. **Output**: Detailed optimization report with:
   - Prioritized recommendations (high/medium/low impact)
   - Specific DDL statements for indexes
   - Query rewrites with before/after comparison
   - Estimated performance improvement (based on row reduction, index usage)

**Decision Rules**:
- If execution plan shows >80% time in Seq Scan → HIGH priority index recommendation
- If query returns <10% of scanned rows → index selectivity issue or missing predicate
- If schema has >5 levels of normalization → evaluate denormalization candidates
- If MongoDB query scans >10k documents to return <100 → create compound index

### T3: Deep Dive (≤12k tokens, comprehensive analysis)

**Scope**: Multi-query optimization, workload analysis, and migration recommendations.

**Steps**:
1. **Workload Pattern Analysis**:
   - Analyze multiple queries as a workload
   - Identify overlapping index opportunities
   - Detect conflicting index requirements (OLTP vs OLAP)
   - Recommend read replicas or caching for read-heavy workloads

2. **Advanced Index Strategies**:
   - Covering indexes to eliminate table lookups
   - Partial/filtered indexes to reduce index size
   - Index-only scans (PostgreSQL) or loose index scans (MySQL)
   - Multi-column index order optimization

3. **Migration Guidance**:
   - SQL to NoSQL candidates (high read/write ratio, flexible schema)
   - NoSQL to SQL candidates (complex transactions, strong consistency)
   - Polyglot persistence recommendations (right database for right use case)

4. **Benchmarking Recommendations**:
   - Generate pgbench (PostgreSQL), sysbench (MySQL), or YCSB (NoSQL) test scenarios
   - Define success metrics (latency p95, throughput)

5. **Output**: Comprehensive optimization strategy with implementation roadmap

**Abort Conditions**:
- Workload too diverse to analyze coherently (>10 distinct query patterns)
- Insufficient data to make reliable recommendations

## Decision Rules

1. **Index Creation Threshold**:
   - Recommend index if query scans >1000 rows to return <100 rows
   - Skip index if table <10k rows (full scan acceptable)
   - Avoid index if column cardinality <5% (low selectivity)

2. **Denormalization Threshold**:
   - Consider if query requires >3 JOINs
   - Evaluate if read:write ratio >10:1
   - Reject if data consistency critical (financial transactions)

3. **Database Type Recommendations**:
   - PostgreSQL: Complex queries, JSONB, full-text search, geospatial
   - MySQL: Simple OLTP, strong ecosystem, MariaDB compatibility
   - MongoDB: Flexible schema, horizontal scaling, document model
   - Redis: Caching, session store, real-time analytics, pub/sub

4. **Ambiguity Handling**:
   - If execution plan unclear → request EXPLAIN ANALYZE (PostgreSQL) or EXPLAIN FORMAT=JSON (MySQL)
   - If schema unknown → request schema dump or information_schema query
   - If performance acceptable despite anti-patterns → note improvement potential but low priority

## Output Contract

**Schema** (JSON):
```json
{
  "analysis_timestamp": "ISO-8601 datetime",
  "database_type": "string (postgresql|mysql|mongodb|redis|other)",
  "query_analyzed": "string",
  "findings": [
    {
      "type": "string (index|query_rewrite|schema_change|configuration)",
      "severity": "string (high|medium|low)",
      "description": "string",
      "recommendation": "string",
      "implementation_code": "string (DDL or config)",
      "estimated_impact": "string (e.g., '80% reduction in rows scanned')"
    }
  ],
  "index_recommendations": [
    {
      "table_or_collection": "string",
      "index_definition": "string (CREATE INDEX ... or db.collection.createIndex(...))",
      "rationale": "string",
      "estimated_size_mb": "number (optional)"
    }
  ],
  "query_rewrites": [
    {
      "original_query": "string",
      "optimized_query": "string",
      "improvement_rationale": "string"
    }
  ],
  "next_steps": ["string array of actionable items"],
  "sources_consulted": ["string array of URLs with access dates"]
}
```

**Required Fields**:
- `analysis_timestamp`, `database_type`, `findings` (must have ≥1 finding or explain why no issues found)
- If T2/T3: `sources_consulted` must include 2-4 authoritative sources

## Examples

**Example: PostgreSQL Slow Query Optimization**

Input:
```json
{
  "database_type": "postgresql",
  "query": "SELECT * FROM orders WHERE customer_id = 12345 AND status = 'pending' ORDER BY created_at DESC",
  "performance_metrics": {
    "execution_time_ms": 2300,
    "rows_scanned": 450000,
    "rows_returned": 15
  }
}
```

Output (T1):
```json
{
  "findings": [
    {"type": "query_rewrite", "severity": "medium", "description": "SELECT * loads unnecessary columns", "recommendation": "Use explicit column list"},
    {"type": "index", "severity": "high", "description": "Seq Scan on orders (450k rows scanned for 15 returned)", "recommendation": "Create composite index on (customer_id, status, created_at)"}
  ],
  "index_recommendations": [
    {"table_or_collection": "orders", "index_definition": "CREATE INDEX idx_orders_customer_status_created ON orders(customer_id, status, created_at DESC);", "rationale": "Covers WHERE predicates and ORDER BY"}
  ]
}
```

## Quality Gates

1. **Token Budget**:
   - T1: ≤2000 tokens (quick pattern scan, generic index hints)
   - T2: ≤6000 tokens (execution plan analysis, database-specific optimizations)
   - T3: ≤12000 tokens (workload analysis, migration guidance)
   - Abort if approaching limit without actionable output

2. **Safety**:
   - Never recommend DROP INDEX without analyzing query workload impact
   - Warn about locking implications of CREATE INDEX on large tables (suggest CONCURRENTLY for PostgreSQL)
   - Flag irreversible operations (denormalization, data type changes)

3. **Auditability**:
   - Every index recommendation must cite specific query pattern
   - Execution plan findings must reference specific nodes/stages
   - Estimated impact must show calculation method

4. **Determinism**:
   - Same query + execution plan → same recommendations
   - Randomness only in example data generation, not analysis logic

5. **Source Quality**:
   - T2+ must cite 2-4 official database documentation URLs with access dates
   - use-the-index-luke.com for general indexing principles (accessed 2025-10-25T22:10:45-04:00): https://use-the-index-luke.com/

## Resources

**PostgreSQL**:
- Official Performance Tips: https://www.postgresql.org/docs/current/performance-tips.html
- Index Types: https://www.postgresql.org/docs/current/indexes-types.html
- EXPLAIN Documentation: https://www.postgresql.org/docs/current/sql-explain.html

**MySQL**:
- Optimization Guide: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
- Index Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization-indexes.html
- EXPLAIN Format: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html

**MongoDB**:
- Query Optimization: https://www.mongodb.com/docs/manual/core/query-optimization/
- Compound Indexes: https://www.mongodb.com/docs/manual/core/index-compound/
- Explain Plans: https://www.mongodb.com/docs/manual/reference/method/cursor.explain/

**Redis**:
- Data Types: https://redis.io/docs/latest/develop/data-types/
- Patterns: https://redis.io/docs/latest/develop/use/patterns/
- Best Practices: https://redis.io/docs/latest/develop/use/optimization/

**General**:
- use-the-index-luke.com: SQL indexing fundamentals (database-agnostic)
- See `resources/` directory for query templates and index strategy guides

Overview

This skill analyzes and optimizes SQL and NoSQL databases for schema design, query performance, and indexing strategies. I provide quick actionable fixes for common anti-patterns and deeper execution-plan driven recommendations for PostgreSQL, MySQL, MongoDB, and Redis. Outputs include prioritized findings, concrete DDL/commands, and an implementation roadmap.

How this skill works

I start with input validation and detect database type from the query or explicit parameter. For fast-path checks I scan for common anti-patterns (SELECT *, missing WHERE, leading wildcards, functions on indexed columns, cartesian joins) and produce 3–5 quick wins with code. For extended analysis I parse EXPLAIN/EXPLAIN ANALYZE formats, review schema and data types, evaluate index selectivity, and produce prioritized index, query-rewrite, and schema recommendations with estimated impact and authoritative sources.

When to use it

Slow query identified (OLTP >100ms, OLAP >5s)
Performance regression after schema changes or deployment
Schema review before production or migration planning
When index strategy needs validation or selectivity issues appear
When multiple related queries suggest overlapping index opportunities

Best practices

Provide execution plan output and basic schema (information_schema or collection sample) for T2/T3 analyses
Prefer non-blocking index creation (CONCURRENTLY in PostgreSQL) on large tables
Right-size data types and avoid SELECT * in production queries
Use composite/partial indexes to cover WHERE + ORDER BY patterns where appropriate
Measure before and after: capture rows examined, execution_time_ms, and p95 latency for benchmarks

Example use cases

T1 quick fix: replace SELECT * and add composite index for WHERE+ORDER BY to reduce scanned rows
T2 execution-plan driven: parse EXPLAIN ANALYZE to detect sequential scans and recommend CREATE INDEX with DDL and impact estimate
T3 workload analysis: analyze multiple queries to propose shared covering indexes and partitioning for large tables
MongoDB: recommend compound index ordering per ESR rule and adjust aggregation pipeline ($match early)
Redis: replace KEYS with SCAN and suggest pipelining or TTLs for ephemeral data

FAQ

What inputs produce the best results?

Provide database_type, the exact query, execution plan (EXPLAIN/ANALYZE), basic schema or sample rows, and performance metrics (rows_scanned, rows_returned, execution_time_ms).

When will you recommend creating an index?

I recommend an index when a query scans >1000 rows to return <100 rows, or EXPLAIN shows full table/collection scans on large datasets; I skip indexes on tables <10k rows or very low cardinality columns.

Do recommendations include implementation code?

Yes — every actionable index or rewrite includes concrete DDL/commands and notes about locking or concurrency (e.g., CONCURRENTLY for PostgreSQL).