---
name: session-review
description: Analyzes completed agent sessions to identify what went well, what went wrong, and patterns to improve. Reads from .claude/ folder (session logs, progress.md, git history) and produces actionable insights. Use after completing work sessions to learn from agent behavior and improve future runs.
---

# Session Review Skill

Analyze agent sessions to extract lessons and improve future performance.

## Purpose

After an agent session (or series of sessions), this skill helps you:
- Understand what the agent actually did
- Identify good patterns to reinforce
- Spot bad patterns to prevent
- Find wasted effort
- Generate improvements for skills/prompts

## Quick Start

### Trigger Phrases

- "Review the last session"
- "Analyze what went wrong in this project"
- "What patterns do you see in the agent's work?"
- "Help me understand why the agent struggled with X"

### Basic Usage

```
1. Run after completing a project or hitting issues
2. Point to project with .claude/ folder
3. Get structured analysis with recommendations
```

## Input Sources

The skill reads from multiple sources:

```
.claude/
├── progress.md           # Session summaries
├── sessions/             # Detailed session logs
│   ├── 2025-01-10-001.md
│   └── 2025-01-10-002.md
├── features.json         # Original plan
├── tasks.db              # Task tracking
└── checkpoints/          # Human decision points
    └── pending-review.md

Git History
├── Commit messages       # What was done
├── Diffs                 # What actually changed
└── Reverts               # What was undone

Code Quality
├── Build status          # Did it compile?
├── Test results          # Did tests pass?
└── Lint output           # Code quality issues
```
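
Before a review starts, it can help to confirm which of these inputs actually exist. The following is a minimal sketch, assuming the `.claude/` layout shown above sits directly under the project root; the names `EXPECTED_INPUTS` and `missing_inputs` are hypothetical and not part of the skill.

```python
from pathlib import Path

# Paths taken from the tree above, relative to the project root.
EXPECTED_INPUTS = [
    ".claude/progress.md",
    ".claude/sessions",
    ".claude/features.json",
    ".claude/tasks.db",
    ".claude/checkpoints",
]


def missing_inputs(project_root: str) -> list[str]:
    """Return the expected inputs that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_INPUTS if not (root / rel).exists()]
```

An empty result means every file-based source is present; git history and build/test status still need to be checked separately.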

## Analysis Framework

### 1. Execution Analysis

Compare plan vs reality:

| Metric | Source | Question |
|--------|--------|----------|
| Tasks planned | features.json | How many tasks were defined? |
| Tasks completed | progress.md | How many actually finished? |
| Tasks abandoned | git history | What was started but reverted? |
| Scope creep | task additions | What wasn't in the original plan? |
| Time estimates | tasks.db | How accurate were estimates? |
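
The plan-versus-reality comparison in this table can be sketched as a small function. This is a minimal sketch, assuming the planned and completed task names have already been extracted from features.json and progress.md into plain lists; `execution_summary` and its input shape are hypothetical.

```python
def execution_summary(planned: list[str], completed: list[str]) -> dict:
    """Compare planned task names with completed task names.

    How the names are extracted from features.json and progress.md
    is assumed to happen elsewhere.
    """
    planned_set, completed_set = set(planned), set(completed)
    done_from_plan = planned_set & completed_set
    return {
        "planned": len(planned_set),
        "completed": len(done_from_plan),
        # Completed work that was never in the original plan (scope creep).
        "unplanned": sorted(completed_set - planned_set),
        # Planned work that has not been finished.
        "remaining": sorted(planned_set - completed_set),
        "completion_rate": len(done_from_plan) / len(planned_set) if planned_set else 0.0,
    }


# Made-up task names, for illustration only.
summary = execution_summary(
    planned=["login", "export", "settings"],
    completed=["login", "settings", "theme-sync"],
)
```

Here `summary["unplanned"]` surfaces scope creep and `summary["remaining"]` lists planned tasks that never finished.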

### 2. Quality Analysis

Assess work quality:

| Signal | Good | Bad |
|--------|------|-----|
| Build failures | 0-1 per session | Multiple per task |
| Test failures | Caught and fixed | Left broken |
| Reverts | Rare | Frequent |
| Commit size | Focused, small | Huge or tiny |
| Code review flags | Minor style | Logic errors |

### 3. Pattern Detection

Look for recurring behaviors:

**Good Patterns**
- Verifies before starting new work
- Commits after each task
- Updates progress consistently
- Asks clarifying questions
- Tests changes end-to-end

**Bad Patterns**
- Starts multiple tasks before finishing one
- Commits broken code
- Ignores test failures
- Over-engineers simple tasks
- Repeats same mistakes

### 4. Decision Analysis

Review checkpoint decisions:

| Decision Point | What Happened | Outcome |
|----------------|---------------|---------|
| Ambiguity found | Agent guessed | Wrong assumption |
| Scope change | Human approved | Added 2 days |
| Build failure | Auto-fixed | Fixed but fragile |

## Output Format

### Session Review Report

```markdown
# Session Review: [Project Name]

## Summary
- **Sessions analyzed**: 5
- **Tasks completed**: 12/15 (80%)
- **Time spent**: 8.5 hours on planned work, plus 1 hour of unplanned testing
- **Estimated time**: 6 hours (planned work ran 42% over)

## What Went Well

### ✓ Consistent Progress Tracking
The agent updated progress.md after every task, making it easy
to resume sessions and understand state.

### ✓ Good Verification Habit
Before starting new features, the agent ran the test suite
and fixed any regressions first.

### ✓ Appropriate Scope Questions
When the export feature was ambiguous, the agent paused and
asked rather than guessing. This saved rework.

## What Went Wrong

### ✗ Underestimated Database Tasks
Database migrations took 3x longer than estimated. The agent
didn't account for data migration complexity.

**Pattern**: Estimates for database work consistently low
**Recommendation**: Add 2x multiplier for DB tasks

### ✗ Repeated Authentication Bug
The same JWT expiration bug was introduced twice across sessions.
The agent didn't learn from the first fix.

**Pattern**: No memory of previous bugs
**Recommendation**: Add known-issues.md to .claude/ folder

### ✗ Over-engineered Settings Page
Simple settings page became a full preferences system with
versioning, import/export, and sync. Scope crept significantly.

**Pattern**: Feature expansion without checkpoint
**Recommendation**: Add scope_change checkpoint at 'pause' level

## Time Analysis

| Category | Planned | Actual | Variance |
|----------|---------|--------|----------|
| Frontend | 3h | 2.5h | -17% ✓ |
| Backend | 2h | 3h | +50% |
| Database | 1h | 3h | +200% ✗ |
| Testing | 0h | 1h | (unplanned) |

## Recommendations

### For This Project

1. **Add database migration checklist**
   Create .claude/checklists/database.md with migration steps

2. **Track known bugs**
   Add .claude/known-issues.md that the agent reads each session

3. **Tighten scope checkpoints**
   Change scope_change from 'review' to 'pause'

### For Your Skills

1. **Update api-design skill**
   Add section on JWT token lifecycle, common expiration bugs

2. **Update prd-analyzer skill**
   Add 2x multiplier for database task estimates

### For Future Projects

1. **Start with tighter checkpoints**
   First epic in 'supervised', then graduate to 'semi-auto'

2. **Require tests for auth code**
   Any authentication changes must include test coverage
```
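
The Variance column in the example report's Time Analysis table is the difference between actual and planned hours, expressed as a percentage of planned hours. A minimal sketch of that arithmetic, using the figures from the example table (the function name `variance_pct` is hypothetical):

```python
from typing import Optional


def variance_pct(planned_hours: float, actual_hours: float) -> Optional[float]:
    """Percent over (+) or under (-) the plan; None when nothing was planned."""
    if planned_hours == 0:
        return None  # unplanned work has no meaningful percentage
    return (actual_hours - planned_hours) / planned_hours * 100


# (planned, actual) hours from the example Time Analysis table.
time_by_category = {
    "Frontend": (3, 2.5),
    "Backend": (2, 3),
    "Database": (1, 3),
    "Testing": (0, 1),
}

variances = {name: variance_pct(p, a) for name, (p, a) in time_by_category.items()}
```

Categories with no planned hours return `None`, which matches the "(unplanned)" entry in the table.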

## Analysis Techniques

### Git History Analysis

```bash
# Count commits in a session's time window
git rev-list --count --since="2025-01-10" --until="2025-01-11" HEAD

# Find reverts (indicates mistakes); -i makes the match case-insensitive
git log --oneline -i --grep="revert"

# Find fix commits (indicates bugs)
git log --oneline -i --grep="fix"

# Per-commit change summaries (large ones might be problematic)
# Note: grep -E does not support \d, so use [0-9]
git log --stat --since="2025-01-10" | grep -E "^ [0-9]+ files? changed"

# Commits per hour of day (should be steady)
git log --format="%ai" --since="2025-01-10" | cut -d' ' -f2 | cut -d: -f1 | sort | uniq -c
```
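
The revert and fix counts can also be tallied from captured `git log --oneline` output. This is a rough sketch rather than part of the skill: `summarize_subjects` is a hypothetical helper, the sample log is made up, and it only inspects subject lines, whereas `git log --grep` searches the whole commit message.

```python
import re

REVERT_RE = re.compile(r"\brevert", re.IGNORECASE)
FIX_RE = re.compile(r"\bfix", re.IGNORECASE)


def summarize_subjects(oneline_log: str) -> dict:
    """Count revert and fix commits in `git log --oneline` output."""
    subjects = []
    for line in oneline_log.splitlines():
        if not line.strip():
            continue
        # Each line is "<abbreviated hash> <subject>"; keep only the subject.
        _, _, subject = line.strip().partition(" ")
        subjects.append(subject)
    return {
        "commits": len(subjects),
        "reverts": sum(1 for s in subjects if REVERT_RE.search(s)),
        "fixes": sum(1 for s in subjects if FIX_RE.search(s)),
    }


# Made-up sample output, for illustration only.
sample_log = """\
a1b2c3d Add export endpoint
d4e5f6a Fix JWT expiration check
0a1b2c3 Revert "Add export endpoint"
9f8e7d6 fix: settings page layout
"""
counts = summarize_subjects(sample_log)
```

The word-boundary match keeps subjects such as "Add prefix option" from being counted as fixes.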

### Progress Analysis

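```python
# Sketch for analyzing progress.md. The layout is an assumption: each session
# starts with a "## Session" heading and logs one bullet per event, e.g.
# "- [x] task", "- checkpoint: ...", "- error: ...", "- scope change: ...".
from dataclasses import dataclass, field

PREFIXES = {"- checkpoint:": "checkpoints", "- error:": "errors", "- scope change:": "scope_changes"}

@dataclass
class Session:
    tasks_completed: int = 0
    checkpoints: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    scope_changes: list = field(default_factory=list)

def parse_sessions(progress_md: str) -> list:
    sessions = []
    for line in (raw.strip() for raw in progress_md.splitlines()):
        if line.startswith("## Session"):
            sessions.append(Session())
        elif sessions and line.lower().startswith("- [x]"):
            sessions[-1].tasks_completed += 1
        elif sessions:
            for prefix, attr in PREFIXES.items():
                if line.lower().startswith(prefix):
                    getattr(sessions[-1], attr).append(line[len(prefix):].strip())
    return sessions

def analyze_progress(progress_md: str) -> dict:
    sessions = parse_sessions(progress_md)
    return {
        'total_sessions': len(sessions),
        'tasks_per_session': [s.tasks_completed for s in sessions],
        'checkpoints_hit': [c for s in sessions for c in s.checkpoints],
        'errors_encountered': [e for s in sessions for e in s.errors],
        'scope_changes': [c for s in sessions for c in s.scope_changes],
    }
```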

### Task Completion Analysis

```sql
-- Query tasks.db for completion patterns.
-- Assumes a tasks table with task_type, status, estimate_hours,
-- started_at, and completed_at columns.
SELECT
    task_type,
    COUNT(*) AS total,
    AVG(estimate_hours) AS avg_estimate,
    -- julianday() returns fractional days, so * 24 gives hours
    AVG((julianday(completed_at) - julianday(started_at)) * 24) AS avg_actual,
    -- ratio > 1 means tasks took longer than estimated
    AVG((julianday(completed_at) - julianday(started_at)) * 24)
        / AVG(estimate_hours) AS accuracy_ratio
FROM tasks
WHERE status = 'done'
GROUP BY task_type;
```
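
To try the query without a real tasks.db, it can be run against an in-memory SQLite database. This is a sketch under stated assumptions: the table schema is inferred from the columns the query uses, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical schema: only the columns the query refers to.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tasks (
        task_type TEXT, status TEXT, estimate_hours REAL,
        started_at TEXT, completed_at TEXT)"""
)
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?, ?)",
    [
        # Made-up sample rows, for illustration only.
        ("database", "done", 1.0, "2025-01-10 09:00:00", "2025-01-10 12:00:00"),
        ("frontend", "done", 2.0, "2025-01-10 13:00:00", "2025-01-10 14:00:00"),
        ("frontend", "in_progress", 1.0, "2025-01-10 15:00:00", None),
    ],
)

rows = conn.execute(
    """SELECT task_type,
              AVG((julianday(completed_at) - julianday(started_at)) * 24)
                  / AVG(estimate_hours) AS accuracy_ratio
       FROM tasks
       WHERE status = 'done'
       GROUP BY task_type
       ORDER BY task_type"""
).fetchall()

# A ratio above 1 means the work took longer than estimated.
multipliers = {task_type: round(ratio, 2) for task_type, ratio in rows}
```

The resulting ratios are one possible starting point for per-task-type estimate multipliers.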

## Pattern Library

### Common Good Patterns

| Pattern | Signal | Reinforce By |
|---------|--------|--------------|
| Verify first | Runs tests before new work | Add to session protocol |
| Small commits | <100 lines per commit | Praise in review |
| Clear messages | Commit messages explain why | Include in prompts |
| Asks questions | Pauses on ambiguity | Keep checkpoints |

### Common Bad Patterns

| Pattern | Signal | Prevent By |
|---------|--------|------------|
| Scope creep | Unplanned features | Tighter checkpoints |
| Broken commits | Build fails after commit | Require build pass |
| Giant commits | >500 lines | Split task definition |
| Repeated bugs | Same fix twice | Known issues file |
| Estimate optimism | Actual time consistently exceeds estimates | Add multipliers |
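
The commit-size thresholds in these tables (under 100 lines for small commits, over 500 for giant ones) can be checked against the summary line that `git log --shortstat` prints for each commit. A minimal sketch, assuming size means insertions plus deletions; the helper names and the "medium" label for the in-between range are hypothetical.

```python
import re

# Matches the per-commit summary line from `git log --shortstat`, e.g.
# " 3 files changed, 120 insertions(+), 15 deletions(-)".
SHORTSTAT_RE = re.compile(
    r"(\d+) files? changed(?:, (\d+) insertions?\(\+\))?(?:, (\d+) deletions?\(-\))?"
)


def lines_changed(shortstat_line: str) -> int:
    """Total inserted plus deleted lines in one shortstat summary line."""
    match = SHORTSTAT_RE.search(shortstat_line)
    if match is None:
        raise ValueError(f"not a shortstat line: {shortstat_line!r}")
    insertions = int(match.group(2) or 0)
    deletions = int(match.group(3) or 0)
    return insertions + deletions


def classify_commit(total_lines: int) -> str:
    """Apply the thresholds from the tables: <100 small, >500 giant."""
    if total_lines < 100:
        return "small"
    if total_lines > 500:
        return "giant"
    return "medium"
```

For example, `classify_commit(lines_changed(" 3 files changed, 120 insertions(+), 15 deletions(-)"))` falls in the in-between range.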

### Warning Signs

| Signal | Meaning | Action |
|--------|---------|--------|
| Multiple reverts | Agent is thrashing | Switch to supervised |
| Long gaps in commits | Stuck on something | Review what's blocking |
| Sudden scope expansion | Agent is over-engineering | Pause and refocus |
| Skipped tests | Agent avoiding verification | Require test runs |
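
The "long gaps in commits" signal can be detected from commit timestamps. This is a rough sketch, assuming ISO 8601 author dates such as those printed by `git log --format=%aI`; the function name, the sample timestamps, and the two-hour threshold are illustrative assumptions, not values defined by the skill.

```python
from datetime import datetime, timedelta


def find_commit_gaps(timestamps: list[str], max_gap: timedelta) -> list[tuple[str, str]]:
    """Return consecutive commit pairs separated by more than max_gap.

    timestamps are ISO 8601 strings, e.g. from `git log --format=%aI`.
    """
    times = sorted(datetime.fromisoformat(ts) for ts in timestamps)
    gaps = []
    for earlier, later in zip(times, times[1:]):
        if later - earlier > max_gap:
            gaps.append((earlier.isoformat(), later.isoformat()))
    return gaps


# Made-up timestamps, for illustration only.
sample = [
    "2025-01-10T09:00:00+00:00",
    "2025-01-10T09:40:00+00:00",
    "2025-01-10T13:10:00+00:00",
]
gaps = find_commit_gaps(sample, max_gap=timedelta(hours=2))
```

Each returned pair marks a stretch worth reviewing in the session logs to see what was blocking progress.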

## Integration with Project Harness

### After Each Epic

Run session review at epic boundaries:

```yaml
# In .claude/config.yaml
checkpoints:
  epic_complete: pause  # Good time to review
```

When an epic completes:
1. Run session-review on the epic's sessions
2. Apply recommendations to remaining work
3. Adjust estimates for similar future tasks

### After Project Completion

Full project review:
1. Analyze all sessions
2. Generate project retrospective
3. Update skills with learnings
4. Document patterns for future projects

## Output Files

Generate these artifacts:

| File | Purpose |
|------|---------|
| `.claude/reviews/YYYY-MM-DD.md` | Session review report |
| `.claude/patterns.md` | Accumulated good/bad patterns |
| `.claude/known-issues.md` | Bugs to watch for |
| `.claude/estimate-adjustments.md` | Multipliers by task type |
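
The dated report path in the first row can be built from a review date. A minimal sketch; the helper name `review_report_path` and the example project root are hypothetical.

```python
from datetime import date
from pathlib import Path


def review_report_path(project_root: str, review_date: date) -> Path:
    """Build the .claude/reviews/YYYY-MM-DD.md path for a given date."""
    return Path(project_root) / ".claude" / "reviews" / f"{review_date.isoformat()}.md"


path = review_report_path("my-project", date(2025, 1, 11))
```

Using the ISO date as the file name keeps reports sorted chronologically in a directory listing.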

## Review Checklist

When running a session review:

```markdown
## Pre-Review
- [ ] All sessions have logs in .claude/sessions/
- [ ] Git history is available
- [ ] Build/test status known

## Analysis
- [ ] Compared planned vs actual tasks
- [ ] Checked estimate accuracy
- [ ] Identified reverts/fixes
- [ ] Found repeated patterns
- [ ] Reviewed checkpoint decisions

## Output
- [ ] Generated review report
- [ ] Listed actionable recommendations
- [ ] Updated patterns.md if new patterns found
- [ ] Updated known-issues.md if bugs repeated
```

## Anti-Patterns

| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| Review too late | Lessons forgotten | Review after each epic |
| Blame the agent | Not actionable | Focus on system improvements |
| Ignore estimates | Keep being wrong | Track and adjust multipliers |
| No follow-through | Same mistakes repeat | Track recommendation implementation |