home / skills / odjaramillo / custom-rules / reporting-optimization
npx playbooks add skill odjaramillo/custom-rules --skill reporting-optimizationReview the files below or copy the command above to add this skill to your agents.
---
name: reporting-optimization
description: >
High Performance Reporting and Query Optimization
Trigger: When optimizing database queries for reporting.
license: Apache-2.0
metadata:
author: poletron
version: "1.0"
scope: [root]
auto_invoke: "Working with reporting optimization"
## When to Use
Use this skill when:
- Optimizing slow reporting queries
- Building dashboards with heavy aggregations
- Working with large datasets
- Implementing caching strategies
---
## Decision Tree
```
Need query optimization? → Use EXPLAIN ANALYZE
Need cached aggregations? → Use Materialized View
Need complex breakdown? → Use CTEs
Need row ranking? → Use Window Functions
Need partial coverage? → Use Partial Index
```
---
# Reporting & Optimization Guidelines
Reporting queries often scan large datasets. Inefficient queries can degrade the performance of the entire system. Follow these guidelines to ensure speed and stability.
## 1. Query Optimization
### 1.1 SELECT Efficiency
- **No `SELECT *`:** Fetching unnecessary columns increases I/O and network payload. Select only required fields.
- **SARGable Queries:** Ensure predicates (WHERE clauses) are "Search ARGument ABLE" to perform Index Seeks instead of Scans.
- *Bad:* `WHERE YEAR(created_at) = 2023` (Function on column prevents index use).
- *Good:* `WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'`.
### 1.2 CTEs vs Temporary Tables
- **Common Table Expressions (CTEs):** Use `WITH` clauses for readability and to break down complex logic. In modern PostgreSQL (12+), CTEs are materialized when beneficial, making them performant.
- **Temporary Tables:** For extremely complex multi-step processing involving heavy intermediate indexing, use `CREATE TEMP TABLE`.
### 1.3 Subqueries
- Avoid Correlated Subqueries in the `SELECT` list that run once per row. Rewrite them as `JOIN`s or `LATERAL` joins.
- *Bad:* `SELECT id, (SELECT count(*) FROM orders WHERE user_id = u.id) FROM users u`
- *Good:* `SELECT u.id, count(o.id) FROM users u LEFT JOIN orders o ON u.id = o.user_id GROUP BY u.id`
## 2. Aggregation Strategies
### 2.1 Materialized Views
- For dashboards requiring heavy aggregation (Count, Sum, Avg) over millions of rows, do not query the live transactional table every time.
- Use **Materialized Views** to cache the result:
```sql
CREATE MATERIALIZED VIEW mv_daily_sales AS
SELECT day, SUM(total) FROM sales GROUP BY day;
```
- Refresh explicitly: `REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_sales;`.
### 2.2 Window Functions
- Use Window Functions (`ROW_NUMBER()`, `RANK()`, `LEAD()`, `LAG()`) for complex reporting intra-row logic instead of self-joins.
## 3. Safety Guardrails
### 3.1 Timeouts
- Set `statement_timeout` for reporting roles to prevent a runaway query from locking resources indefinitely.
### 3.2 Result Limits
- Always verify the estimated cardinality. If a report could return 1M+ rows, enforce `LIMIT` or pagination logic.
## 4. Query Profiling
### 4.1 EXPLAIN ANALYZE
Before deploying complex queries, use `EXPLAIN ANALYZE` to understand the execution plan.
- Look for **Seq Scans** on large tables (potential missing index).
- Check **Actual vs Estimated Rows** for significant discrepancies (stale statistics).
- Identify **Nested Loops** with high row counts that could be rewritten as Hash Joins.
```sql
EXPLAIN ANALYZE
SELECT u.id, COUNT(o.id)
FROM USER u
LEFT JOIN ORDER o ON u.id = o.user_id
WHERE u.is_active = TRUE
GROUP BY u.id;
```
### 4.2 Partial Indexes
Create indexes that only cover a subset of rows to save space and speed up specific queries.
- *Example:* Index only active users.
```sql
CREATE INDEX idx_user_active ON USER(email) WHERE is_active = TRUE;
```
- Use for columns with low cardinality where only specific values are frequently queried.