
ln-651-query-efficiency-auditor skill

/ln-651-query-efficiency-auditor

This skill audits query efficiency in codebases, identifying redundant fetches, N-UPDATE/DELETE loops, and missing bulk operations to improve performance.

npx playbooks add skill levnikolaevich/claude-code-skills --skill ln-651-query-efficiency-auditor


SKILL.md
---
name: ln-651-query-efficiency-auditor
description: "Query efficiency audit worker (L3). Checks redundant entity fetches, N-UPDATE/DELETE loops, unnecessary resolves, over-fetching, missing bulk operations, wrong caching scope. Returns findings with severity, location, effort, recommendations."
allowed-tools: Read, Grep, Glob, Bash
---

> **Paths:** File paths (`shared/`, `references/`, `../ln-*`) are relative to skills repo root. If not found at CWD, locate this SKILL.md directory and go up one level for repo root.

# Query Efficiency Auditor (L3 Worker)

Specialized worker auditing database query patterns for redundancy, inefficiency, and misuse.

## Purpose & Scope

- **Worker in ln-650 coordinator pipeline** - invoked by ln-650-persistence-performance-auditor
- Audit **query efficiency** (Priority: HIGH)
- Check redundant fetches, batch operation misuse, caching scope problems
- Return structured findings with severity, location, effort, recommendations
- Calculate compliance score (X/10) for Query Efficiency category

## Inputs (from Coordinator)

**MANDATORY READ:** Load `shared/references/task_delegation_pattern.md#audit-coordinator--worker-contract` for contextStore structure.

Receives `contextStore` with: `tech_stack`, `best_practices`, `db_config` (database type, ORM settings), `codebase_root`.

**Domain-aware:** Supports `domain_mode` + `current_domain`.

## Workflow

1) **Parse context from contextStore**
   - Extract tech_stack, best_practices, db_config
   - Determine scan_path (same logic as ln-624)

2) **Scan codebase for violations**
   - All Grep/Glob patterns use `scan_path`
   - Trace call chains for redundant fetches (requires reading caller + callee)

3) **Collect findings with severity, location, effort, recommendation**

4) **Calculate score using penalty algorithm**

5) **Return JSON result to coordinator**

## Audit Rules (Priority: HIGH)

### 1. Redundant Entity Fetch
**What:** Same entity fetched from DB twice in a call chain

**Detection:**
- Find function A that calls `repo.get(id)` or `session.get(Model, id)`, then passes `id` (not object) to function B
- Function B also calls `repo.get(id)` or `session.get(Model, id)` for the same entity
- Common pattern: `acquire_next_pending()` returns job, but `_process_job(job_id)` re-fetches it

**Detection patterns (Python/SQLAlchemy):**
- Grep for `repo.*get_by_id|session\.get\(|session\.query.*filter.*id` in service/handler files
- Trace: if function receives `entity_id: int/UUID` AND internally does `repo.get(entity_id)`, check if caller already has entity object
- Check `expire_on_commit` setting: if `False`, objects remain valid after commit

**Severity:**
- **HIGH:** Redundant fetch in API request handler (adds latency per request)
- **MEDIUM:** Redundant fetch in background job (less critical)

**Recommendation:** Pass entity object instead of ID, or remove second fetch when `expire_on_commit=False`

**Effort:** S (change signature to accept object instead of ID)
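
The redundant-fetch pattern and its fix can be sketched as follows. This is a minimal illustration: `Job`, `JobRepo`, and the processing functions are hypothetical stand-ins, not project APIs; the repo counts `get()` calls to simulate DB round-trips.

```python
# Hypothetical sketch of a redundant entity fetch and its fix.
from dataclasses import dataclass


@dataclass
class Job:
    id: int
    status: str = "pending"


class JobRepo:
    """In-memory repo that counts simulated DB round-trips."""

    def __init__(self):
        self.jobs = {1: Job(1)}
        self.fetch_count = 0

    def get(self, job_id: int) -> Job:
        self.fetch_count += 1  # each call simulates one round-trip
        return self.jobs[job_id]


def process_by_id(repo: JobRepo) -> None:
    # BEFORE: caller fetches the job, then passes only the ID ...
    job = repo.get(1)
    _process_job_by_id(repo, job.id)


def _process_job_by_id(repo: JobRepo, job_id: int) -> None:
    job = repo.get(job_id)  # ... and the callee re-fetches it
    job.status = "done"


def process_by_object(repo: JobRepo) -> None:
    # AFTER: pass the already-loaded object; no second round-trip
    job = repo.get(1)
    _process_job(job)


def _process_job(job: Job) -> None:
    job.status = "done"


before, after = JobRepo(), JobRepo()
process_by_id(before)     # 2 fetches
process_by_object(after)  # 1 fetch
print(before.fetch_count, after.fetch_count)
```

With `expire_on_commit=False`, the already-loaded ORM object stays valid after commit, so the object-passing variant is safe.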

### 2. N-UPDATE/DELETE Loop
**What:** Loop of individual UPDATE/DELETE operations instead of single batch query

**Detection:**
- Pattern: `for item in items: await repo.update(item.id, ...)` or `for item in items: await repo.delete(item.id)`
- Pattern: `for item in items: session.execute(update(Model).where(...))`

**Detection patterns:**
- Grep for `for .* in .*:` followed by `repo\.(update|delete|reset|save|mark_)` within 1-3 lines
- Grep for `for .* in .*:` followed by `session\.execute\(.*update\(` within 1-3 lines

**Severity:**
- **HIGH:** Loop over >10 items (N separate round-trips to DB)
- **MEDIUM:** Loop over <=10 items

**Recommendation:** Replace with single `UPDATE ... WHERE id IN (...)` or `session.execute(update(Model).where(Model.id.in_(ids)))`

**Effort:** M (rewrite query + test)
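
The loop-vs-set-based contrast can be shown with the stdlib `sqlite3` module; the table and statuses are illustrative. In SQLAlchemy the equivalent fix is `session.execute(update(Model).where(Model.id.in_(ids)).values(...))`.

```python
# Sketch: N-UPDATE loop vs. a single set-based UPDATE (stdlib sqlite3).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO jobs VALUES (?, 'pending')",
                 [(i,) for i in range(1, 6)])

ids = [1, 2, 3]

# BEFORE: one UPDATE per item -> N round-trips to the DB
for job_id in ids:
    conn.execute("UPDATE jobs SET status = 'failed' WHERE id = ?", (job_id,))

# AFTER: single UPDATE ... WHERE id IN (...) -> one round-trip
placeholders = ",".join("?" * len(ids))
conn.execute(
    f"UPDATE jobs SET status = 'retrying' WHERE id IN ({placeholders})", ids
)

rows = conn.execute(
    "SELECT id FROM jobs WHERE status = 'retrying' ORDER BY id"
).fetchall()
print(rows)  # [(1,), (2,), (3,)]
```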

### 3. Unnecessary Resolve
**What:** Re-resolving a value from DB when it is already available in the caller's scope

**Detection:**
- Method receives `profile_id` and resolves engine from it, but caller already determined `engine`
- Method receives `lang_code` and looks up dialect_id, but caller already has both `lang` and `dialect`
- Pattern: function receives `X_id`, does `get(X_id)`, extracts `.field`, when caller already has `field`

**Severity:**
- **MEDIUM:** Extra DB query per invocation, especially in high-frequency paths

**Recommendation:** Split method into two variants: `with_known_value(value, ...)` and `resolving_value(id, ...)`; or pass resolved value directly

**Effort:** S-M (refactor signature, update callers)
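
The two-variant split can be sketched like this. `ProfileRepo` and `engine_for_profile` are hypothetical; the repo counts lookups to stand in for DB queries.

```python
# Sketch: split a method into "known value" and "resolving" variants.
class ProfileRepo:
    def __init__(self):
        self.engines = {42: "neural-v2"}
        self.lookups = 0

    def engine_for_profile(self, profile_id: int) -> str:
        self.lookups += 1  # simulates one DB query
        return self.engines[profile_id]


def synthesize_with_known_engine(engine: str, text: str) -> str:
    # Caller already resolved the engine: no extra query here
    return f"{engine}:{text}"


def synthesize_resolving_engine(repo: ProfileRepo, profile_id: int,
                                text: str) -> str:
    # Convenience variant for callers that only have the ID
    return synthesize_with_known_engine(
        repo.engine_for_profile(profile_id), text
    )


repo = ProfileRepo()
engine = repo.engine_for_profile(42)  # caller resolves once ...
out = synthesize_with_known_engine(engine, "hi")  # ... and reuses it
print(out, repo.lookups)
```

Callers that already hold the resolved value use the first variant; only callers that genuinely start from an ID pay for the lookup.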

### 4. Over-Fetching
**What:** Loading a full ORM model when only a few fields are needed

**Detection:**
- `session.query(Model)` or `select(Model)` without `.options(load_only(...))` for models with >10 columns
- Especially in list/search endpoints that return many rows
- Pattern: loading full entity but only using 2-3 fields

**Severity:**
- **MEDIUM:** Large models (>15 columns) in list endpoints
- **LOW:** Small models (<10 columns) or single-entity endpoints

**Recommendation:** Use `load_only()`, `defer()`, or raw `select(Model.col1, Model.col2)` for list queries

**Effort:** S (add load_only to query)
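
A column-selection sketch with stdlib `sqlite3` (table and columns are illustrative); in SQLAlchemy the same idea is `select(User).options(load_only(User.id, User.name))` or a column-level `select(User.id, User.name)`.

```python
# Sketch: over-fetching vs. selecting only the rendered fields.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
    "bio TEXT, settings TEXT)"
)
conn.execute(
    "INSERT INTO users VALUES (1, 'Ada', 'ada@example.com', '...', '{}')"
)

# BEFORE: pulls every column, including wide ones the list view never shows
wide = conn.execute("SELECT * FROM users").fetchall()

# AFTER: fetch only the fields the list endpoint renders
narrow = conn.execute("SELECT id, name FROM users").fetchall()
print(len(wide[0]), narrow)  # 5 [(1, 'Ada')]
```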

### 5. Missing Bulk Operations
**What:** Sequential INSERT/DELETE/UPDATE instead of bulk operations

**Detection:**
- `for item in items: session.add(item)` instead of `session.add_all(items)`
- `for item in items: session.delete(item)` instead of bulk delete
- Pattern: loop with single `INSERT` per iteration

**Severity:**
- **MEDIUM:** Any sequential add/delete in loop (missed batch optimization)

**Recommendation:** Use `session.add_all()`, `session.execute(insert(Model).values(list_of_dicts))`, `bulk_save_objects()`

**Effort:** S (replace loop with bulk call)
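
The bulk replacement can be sketched with `sqlite3.executemany`; the SQLAlchemy analogues are `session.add_all(items)` and `session.execute(insert(Model).values(list_of_dicts))`.

```python
# Sketch: per-row INSERT loop replaced by one bulk call (stdlib sqlite3).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"item-{i}") for i in range(1, 4)]

# BEFORE: one INSERT statement per item
# for row in rows:
#     conn.execute("INSERT INTO items VALUES (?, ?)", row)

# AFTER: single bulk call
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 3
```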

### 6. Wrong Caching Scope
**What:** Request-scoped cache for data that rarely changes (should be app-scoped)

**Detection:**
- Service registered as request-scoped (e.g., via FastAPI `Depends()`) with internal cache (`_cache` dict, `_loaded` flag)
- Cache populated by expensive query (JOINs, aggregations) per each request
- Data TTL >> request duration (e.g., engine configurations, language lists, feature flags)

**Detection patterns:**
- Find classes with `_cache`, `_loaded`, `_initialized` attributes
- Check if class is created per-request (via DI registration scope)
- Compare: data change frequency vs cache lifetime

**Severity:**
- **HIGH:** Expensive query (JOINs, subqueries) cached only per-request
- **MEDIUM:** Simple query cached per-request

**Recommendation:** Move cache to app-scoped service (singleton), add TTL-based invalidation, or use CacheService with configurable TTL

**Effort:** M (change DI scope, add TTL logic)
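
An app-scoped TTL cache can be sketched as below. `load_engine_configs` is a hypothetical stand-in for an expensive JOIN-heavy query; in a real service the cache instance would be registered as an app-scoped singleton in the DI container.

```python
# Sketch: app-scoped TTL cache replacing a per-request cache.
import time


class AppScopedConfigCache:
    """Shared across requests; the loader re-runs only after the TTL."""

    def __init__(self, loader, ttl_seconds: float = 300.0):
        self._loader = loader
        self._ttl = ttl_seconds
        self._value = None
        self._loaded_at = float("-inf")

    def get(self):
        now = time.monotonic()
        if now - self._loaded_at >= self._ttl:
            self._value = self._loader()  # expensive query runs rarely
            self._loaded_at = now
        return self._value


calls = {"n": 0}


def load_engine_configs():
    calls["n"] += 1  # counts how often the "expensive query" actually runs
    return {"default": "neural-v2"}


cache = AppScopedConfigCache(load_engine_configs, ttl_seconds=300.0)
for _ in range(100):  # 100 simulated requests share a single load
    cache.get()
print(calls["n"])  # 1
```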

## Scoring Algorithm

**MANDATORY READ:** Load `shared/references/audit_scoring.md` for unified scoring formula.
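
For illustration only, a penalty-style score might look like the sketch below. The authoritative formula lives in `shared/references/audit_scoring.md`; the weights here are assumptions chosen for demonstration, not the real ones.

```python
# Hypothetical penalty-scoring sketch; weights are illustrative only.
def score(critical: int, high: int, medium: int, low: int) -> int:
    penalty = critical * 2.5 + high * 1.0 + medium * 0.25 + low * 0.1
    return max(0, round(10 - penalty))


# e.g. 0 critical, 3 high, 4 medium, 1 low
print(score(0, 3, 4, 1))  # 6
```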

## Output Format

Return JSON to coordinator:

```json
{
  "category": "Query Efficiency",
  "score": 6,
  "total_issues": 8,
  "critical": 0,
  "high": 3,
  "medium": 4,
  "low": 1,
  "findings": [
    {
      "severity": "HIGH",
      "location": "app/infrastructure/messaging/job_processor.py:434",
      "issue": "Redundant entity fetch: job re-fetched by ID after acquire_next_pending already returned it",
      "principle": "Query Efficiency / DRY Data Access",
      "recommendation": "Pass job object to _process_job instead of job_id",
      "effort": "S"
    }
  ]
}
```

## Critical Rules

- **Do not auto-fix:** Report only
- **Trace call chains:** Rules 1 and 3 require reading both caller and callee
- **ORM-aware:** Check `expire_on_commit`, `autoflush`, session scope before flagging redundant fetches
- **Context-aware:** Small datasets or infrequent operations may justify simpler code
- **Exclude tests:** Do not flag test fixtures or setup code

## Definition of Done

- contextStore parsed (tech_stack, db_config, ORM settings)
- scan_path determined (domain path or codebase root)
- All 6 checks completed:
  - redundant fetch, N-UPDATE loop, unnecessary resolve, over-fetching, bulk ops, caching scope
- Findings collected with severity, location, effort, recommendation
- Score calculated
- JSON returned to coordinator

## Reference Files

- **Audit scoring formula:** `shared/references/audit_scoring.md`
- **Audit output schema:** `shared/references/audit_output_schema.md`

---
**Version:** 1.0.0
**Last Updated:** 2026-02-04

Overview

This skill audits database query patterns for redundancy, inefficiency, and misuse, and returns structured findings with severity, location, effort, and concrete recommendations. It is designed to run as an L3 worker in a coordinator pipeline and produces a compliance score for the Query Efficiency category. The auditor focuses on redundant fetches, N-update/delete loops, unnecessary resolves, over-fetching, missing bulk operations, and wrong caching scope.

How this skill works

The worker parses the provided contextStore (tech_stack, best_practices, db_config, codebase_root) and determines the scan_path (domain-aware). It performs static scans and call-chain tracing to detect patterns: duplicate entity retrievals, per-item DB loops, repeated resolves, full-model loads when partial fields suffice, missing bulk APIs, and request-scoped caches for rarely changing data. Findings are collected, scored by a penalty algorithm, and returned as JSON for the coordinator.

When to use it

  • During pre-release performance audits of persistence and service layers
  • When investigating high-latency API endpoints or background job slowness
  • As part of regular architectural reviews for database access patterns
  • Before major refactors that change service boundaries or DI scopes
  • To validate ORM and session configuration impact on data access

Best practices

  • Prefer passing resolved entity objects to callees instead of IDs when available
  • Replace per-item UPDATE/DELETE loops with set-based operations or bulk APIs
  • Use selective column loads (load_only/defer/select specific columns) for list endpoints
  • Centralize caches at application scope for rarely changing data and add TTL
  • Trace caller → callee chains before flagging redundant fetches and respect ORM session settings
  • Exclude test code paths and consider dataset size/frequency before prioritizing fixes

Example use cases

  • Detect a handler that calls repo.get(id) then passes id to a worker that re-fetches the same entity
  • Flag loops performing repo.update/delete inside for-loops over >10 items
  • Identify list endpoints loading full models with >15 columns when only two fields are rendered
  • Find services that populate an in-memory cache per request for configuration data that rarely changes
  • Locate methods that re-resolve simple derived values already available to the caller

FAQ

Does the auditor auto-fix issues it finds?

No. The auditor only reports findings with recommendations; it does not modify code.

Will it flag false positives for small or infrequent datasets?

The auditor is context-aware: it factors in dataset size and operation frequency, lowering severity for, or excluding, cases where simpler code is justified.