home / skills / odjaramillo / custom-rules / reporting-optimization

reporting-optimization skill

not checked

/packages/cli/.agent/skills/reporting-optimization

npx playbooks add skill odjaramillo/custom-rules --skill reporting-optimization

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

3.8 KB

---
name: reporting-optimization
description: >
  High Performance Reporting and Query Optimization
  Trigger: When optimizing database queries for reporting.
license: Apache-2.0
metadata:
  author: poletron
  version: "1.0"
  scope: [root]
  auto_invoke: "Working with reporting optimization"

## When to Use

Use this skill when:
- Optimizing slow reporting queries
- Building dashboards with heavy aggregations
- Working with large datasets
- Implementing caching strategies

---

## Decision Tree

```
Need query optimization?   → Use EXPLAIN ANALYZE
Need cached aggregations?  → Use Materialized View
Need complex breakdown?    → Use CTEs
Need row ranking?          → Use Window Functions
Need partial coverage?     → Use Partial Index
```

---

# Reporting & Optimization Guidelines

Reporting queries often scan large datasets. Inefficient queries can degrade the performance of the entire system. Follow these guidelines to ensure speed and stability.

## 1. Query Optimization

### 1.1 SELECT Efficiency
- **No `SELECT *`:** Fetching unnecessary columns increases I/O and network payload. Select only required fields.
- **SARGable Queries:** Ensure predicates (WHERE clauses) are "Search ARGument ABLE" to perform Index Seeks instead of Scans.
    - *Bad:* `WHERE YEAR(created_at) = 2023` (Function on column prevents index use).
    - *Good:* `WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'`.

### 1.2 CTEs vs Temporary Tables
- **Common Table Expressions (CTEs):** Use `WITH` clauses for readability and to break down complex logic. In modern PostgreSQL (12+), CTEs are materialized when beneficial, making them performant.
- **Temporary Tables:** For extremely complex multi-step processing involving heavy intermediate indexing, use `CREATE TEMP TABLE`.

### 1.3 Subqueries
- Avoid Correlated Subqueries in the `SELECT` list that run once per row. Rewrite them as `JOIN`s or `LATERAL` joins.
    - *Bad:* `SELECT id, (SELECT count(*) FROM orders WHERE user_id = u.id) FROM users u`
    - *Good:* `SELECT u.id, count(o.id) FROM users u LEFT JOIN orders o ON u.id = o.user_id GROUP BY u.id`

## 2. Aggregation Strategies

### 2.1 Materialized Views
- For dashboards requiring heavy aggregation (Count, Sum, Avg) over millions of rows, do not query the live transactional table every time.
- Use **Materialized Views** to cache the result:
    ```sql
    CREATE MATERIALIZED VIEW mv_daily_sales AS
    SELECT day, SUM(total) FROM sales GROUP BY day;
    ```
- Refresh explicitly: `REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_sales;`.

### 2.2 Window Functions
- Use Window Functions (`ROW_NUMBER()`, `RANK()`, `LEAD()`, `LAG()`) for complex reporting intra-row logic instead of self-joins.

## 3. Safety Guardrails

### 3.1 Timeouts
- Set `statement_timeout` for reporting roles to prevent a runaway query from locking resources indefinitely.

### 3.2 Result Limits
- Always verify the estimated cardinality. If a report could return 1M+ rows, enforce `LIMIT` or pagination logic.

## 4. Query Profiling

### 4.1 EXPLAIN ANALYZE
Before deploying complex queries, use `EXPLAIN ANALYZE` to understand the execution plan.
- Look for **Seq Scans** on large tables (potential missing index).
- Check **Actual vs Estimated Rows** for significant discrepancies (stale statistics).
- Identify **Nested Loops** with high row counts that could be rewritten as Hash Joins.

```sql
EXPLAIN ANALYZE
SELECT u.id, COUNT(o.id)
FROM USER u
LEFT JOIN ORDER o ON u.id = o.user_id
WHERE u.is_active = TRUE
GROUP BY u.id;
```

### 4.2 Partial Indexes
Create indexes that only cover a subset of rows to save space and speed up specific queries.
- *Example:* Index only active users.
    ```sql
    CREATE INDEX idx_user_active ON USER(email) WHERE is_active = TRUE;
    ```
- Use for columns with low cardinality where only specific values are frequently queried.