
This skill helps you perform code review across diffs and PRs, focusing on architecture and cross-file patterns to improve quality.

npx playbooks add skill gigaverse-app/skillet --skill code-review

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: code-review
description: Use when asked to "review code", "review PR", "review diff", "check this PR", "do a code review", or mentions "code review", "PR review", "full diff", "cross-file patterns", "code quality", "review changes".
---

# Code Review Best Practices

## Core Philosophy

**Review at PR level, not file-by-file. Focus on egregious structural issues, not minutiae. Look for cross-file patterns that indicate architectural problems.**

## Quick Start

```bash
# ALWAYS filter out lock files first (often 10k+ lines of noise)
git diff main...HEAD -- . ':!uv.lock' ':!poetry.lock' > /tmp/pr-diff.txt
wc -lc /tmp/pr-diff.txt  # Most diffs fit in 256KB after this

# Check size - if under 256KB, analyze the whole thing
# If still too big, filter more
git diff main...HEAD -- . ':!uv.lock' ':!docs/*' > /tmp/code-diff.txt
```
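
The size check in the comments above can be made explicit. A minimal sketch of that decision, assuming the filtered diff from the command above (the 256KB threshold and paths are illustrative):

```bash
# Hypothetical helper: decide whether the filtered diff fits in one review pass
DIFF=/tmp/pr-diff.txt
SIZE=$(wc -c < "$DIFF")
if [ "$SIZE" -lt 262144 ]; then   # 256KB = 262144 bytes
  echo "Diff is ${SIZE} bytes - review it in a single pass"
else
  echo "Diff is ${SIZE} bytes - apply more filters (docs, then tests)"
fi
```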

## What to Look For - Egregious Issues Only

### Quick Checklist
- [ ] **Code duplication** - Same logic in 2+ places?
- [ ] **Repeated patterns** - Same code structure 3+ times?
- [ ] **God functions** - Functions over 100 lines?
- [ ] **Weak types** - Any, object, raw dict/list, missing annotations?
- [ ] **Type-deficient patterns** - hasattr, getattr instead of proper types?
- [ ] **Meaningless tests** - Tests that just verify assignment?
- [ ] **Dead code** - Unused functions, commented code?

### What NOT to Flag
- Variable naming (unless truly confusing)
- Line length
- Comment style
- Whitespace
- Import order

These are auto-fixable or minor. Focus on structural problems.

## Critical Rules Quick Reference

```python
# Flag: Weak types
def process(data: Any) -> dict: ...               # Bad: Any, raw dict
def process(data: UserData) -> UserResponse: ...  # Good: specific types

# Flag: hasattr/getattr (type deficiency marker)
if hasattr(obj, "name"): ...    # Bad: why don't you know the type?
if isinstance(obj, Named): ...  # Good: use a protocol or union type

# Flag: God functions
def process_order(order_data):  # Bad: 200 lines doing everything
    ...
def process_order(order_data):  # Good: extract each responsibility
    validated = _validate(order_data)
    transformed = _transform(validated)

# Flag: Duplicate logic across files
# service_a.py and service_b.py carry the same 20-line validation block
# Extract it to a shared utility

# Flag: Related classes without a shared interface
class CacheDirectory:
    def list_entries(self): ...      # Names differ across sibling classes
class StatusDirectory:
    def get_all_entries(self): ...   # Should share an ABC or protocol
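
# Flag: Meaningless tests (illustrative sketch; User is a hypothetical model)
def test_user_name():
    user = User(name="Ann")
    assert user.name == "Ann"     # Only re-verifies constructor assignment

# Flag: Dead code
def _legacy_transform(data): ...  # No remaining callers in the codebase
# result = old_compute(x)         # Commented-out code left behind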
```

## Handling Large Diffs

**Keep maximum context to spot cross-file patterns.**

### Step 1: Remove Lock Files (Always)
```bash
git diff main...HEAD -- . ':!uv.lock' > /tmp/diff.txt
wc -lc /tmp/diff.txt  # Under 256KB? STOP - analyze whole thing
```

### Step 2: Remove Docs (If Still Too Big)
```bash
git diff main...HEAD -- . ':!uv.lock' ':!docs/*' > /tmp/code-diff.txt
```

### Step 3: Remove Tests (Last Resort)
```bash
git diff main...HEAD -- . ':!uv.lock' ':!docs/*' ':!tests/*' > /tmp/prod.txt
```

### Step 4: Chunk with LARGE Chunks (Rare)
```bash
# Only if >256KB after all filtering
# Use 15k lines to preserve cross-file patterns
# Small chunks (2k lines) defeat the purpose!
```
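
If chunking really is unavoidable, plain `split` keeps the pieces large enough to preserve cross-file context. A sketch, assuming the filtered diff from Step 3:

```bash
# Split into 15k-line chunks so cross-file context survives within each piece
split -l 15000 /tmp/prod.txt /tmp/prod-chunk-
ls /tmp/prod-chunk-*  # Review each chunk in turn, noting patterns that recur across chunks
```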

## LLM-Based Full Diff Review

**Why use external LLM tools?** Claude Code has a 25K context limit per tool call. For reviewing entire PR diffs (50K-200K+ tokens), use models with larger context windows.

**Benefits:**
1. Cross-file pattern detection in one pass
2. Different models catch different issues
3. Faster than file-by-file review

### Workflow

```bash
# 1. Extract changes (handles lock file exclusion automatically)
./scripts/extract-changes.sh              # PR: current branch vs main
./scripts/extract-changes.sh abc123       # Specific commit
./scripts/extract-changes.sh auth/login   # Path pattern (fuzzy match)

# 2. Run code review with large-context model
./scripts/llm-review.sh -m gpt-4o
./scripts/llm-review.sh -m claude-3-5-sonnet-latest

# 3. Run specialized reviews
./scripts/llm-review-tests.sh -m gpt-4o      # Test quality
./scripts/llm-review-types.sh -m gpt-4o      # Type hints
```

### Setup

Requires [Simon Willison's llm tool](https://llm.datasette.io/):

```bash
pip install llm
llm keys set openai          # For GPT-4
pip install llm-claude-3     # For Claude
llm keys set claude
```

See [references/llm-tooling.md](references/llm-tooling.md) for full setup and usage guide.

### Using LLM Findings

LLM review provides **hints**, not final answers:
1. Identify areas to investigate deeper
2. Cross-check with different models
3. Use Claude Code to implement actual fixes

## Reference Files

For detailed patterns and examples:
- [references/what-to-flag.md](references/what-to-flag.md) - Duplication, weak types, god functions
- [references/code-bloat-patterns.md](references/code-bloat-patterns.md) - AI-generated bloat patterns
- [references/llm-tooling.md](references/llm-tooling.md) - LLM tool setup and usage for full-diff review

**Remember**: The ENTIRE POINT of full diff review is cross-file patterns. Don't chunk unless you absolutely must!

## Human-in-the-Loop After Code Review

**CRITICAL:** After completing a code review, ALWAYS:

1. **Present findings** - Output the full review report with all sections
2. **List suggested actions** - Number each potential fix
3. **Ask for approval** - Use AskUserQuestion before creating TODOs or executing
4. **Wait for explicit approval** - User may reject, ask for clarification, or approve selectively

**Why this matters:**
- LLM reviews often suggest unnecessary changes
- Some suggestions may be wrong or over-engineered
- User knows context that LLM doesn't
- Executing without approval wastes time on rejected work

**Example flow:**
```
[Code review findings output]

## Suggested Actions
1. Add format-duration validator to Schema
2. Add tests for format-duration validation
3. Move constant to config class

Which items should I proceed with?
```

Overview

This skill performs focused, PR-level code reviews that prioritize structural and cross-file issues over minor style nitpicks. It filters large diffs, preserves cross-file context, and surfaces high-impact problems like duplication, god functions, and weak typing. The goal is concise, actionable findings with numbered suggested actions and a required approval step before making changes.

How this skill works

It inspects the full PR diff (after filtering lock files and large noise) to detect cross-file patterns, duplicated logic, overly large functions, and type-deficient code. For large diffs it progressively excludes docs/tests and uses large-context LLM workflows when needed. The output is a structured review report with findings, prioritized actions, and an explicit ask-for-approval step before any automated edits.

When to use it

  • When asked to review a PR, diff, or full change set ("review code", "review PR", "review diff").
  • When you need cross-file pattern detection rather than file-by-file comments.
  • When diffs are large and you need guidance on chunking or using large-context models.
  • When you want high-impact, structural feedback (duplication, god functions, weak types).
  • Before merging substantial changes or refactors that may introduce architectural issues.

Best practices

  • Review at the PR level; focus on structural problems over minor style issues.
  • Always filter out lock files first, then docs, then tests if needed to reduce noise.
  • Flag weak types, hasattr/getattr patterns, duplicated logic, and functions >100 lines.
  • Preserve large chunks (e.g., ~15k lines) to retain cross-file context; avoid tiny chunks.
  • Provide a numbered list of suggested actions and request explicit user approval before applying fixes.

Example use cases

  • Run a full PR review to find duplicated validation logic across services and propose a shared utility.
  • Detect a 200-line god function and suggest extracting validation, transformation, and persistence responsibilities.
  • Spot widespread use of raw dicts/Any and recommend introducing typed models or protocols.
  • Use LLM-assisted large-context review for a 100k+ token diff to surface cross-file architectural smells.
  • Produce a concise report with prioritized fixes and ask the author which items to implement.

FAQ

What should I not flag during review?

Skip minor style issues: naming (unless genuinely confusing), line length, whitespace, import order, and comment style. These are low-value or auto-fixable.

How do you handle very large diffs?

Filter lock files and docs first, then tests as a last resort; if still large, use large-context LLMs or preserve large chunks to maintain cross-file patterns.