---
name: session-review
description: Analyzes completed agent sessions to identify what went well, what went wrong, and patterns to improve. Reads from .claude/ folder (session logs, progress.md, git history) and produces actionable insights. Use after completing work sessions to learn from agent behavior and improve future runs.
---
# Session Review Skill
Analyze agent sessions to extract lessons and improve future performance.
## Purpose
After an agent session (or series of sessions), this skill helps you:
- Understand what the agent actually did
- Identify good patterns to reinforce
- Spot bad patterns to prevent
- Find wasted effort
- Generate improvements for skills/prompts
## Quick Start
### Trigger Phrases
- "Review the last session"
- "Analyze what went wrong in this project"
- "What patterns do you see in the agent's work?"
- "Help me understand why the agent struggled with X"
### Basic Usage
```
1. Run after completing a project or hitting issues
2. Point to project with .claude/ folder
3. Get structured analysis with recommendations
```
## Input Sources
The skill reads from multiple sources:
```
.claude/
├── progress.md           # Session summaries
├── sessions/             # Detailed session logs
│   ├── 2025-01-10-001.md
│   └── 2025-01-10-002.md
├── features.json         # Original plan
├── tasks.db              # Task tracking
└── checkpoints/          # Human decision points
    └── pending-review.md

Git History
├── Commit messages       # What was done
├── Diffs                 # What actually changed
└── Reverts               # What was undone

Code Quality
├── Build status          # Did it compile?
├── Test results          # Did tests pass?
└── Lint output           # Code quality issues
```
## Analysis Framework
### 1. Execution Analysis
Compare plan vs reality:
| Metric | Source | Question |
|--------|--------|----------|
| Tasks planned | features.json | How many tasks were defined? |
| Tasks completed | progress.md | How many actually finished? |
| Tasks abandoned | git history | What was started but reverted? |
| Scope creep | task additions | What wasn't in the original plan? |
| Time estimates | tasks.db | How accurate were estimates? |
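The plan-versus-reality comparison in the table above can be sketched as simple set arithmetic. This is a minimal illustration, not the skill's actual implementation: it assumes you have already extracted task IDs from features.json and progress.md (their real schemas are not specified here), and `summarize_execution` and `ExecutionSummary` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ExecutionSummary:
    planned: int
    completed: int          # planned tasks that were finished
    unfinished: list[str]   # planned but not completed
    unplanned: list[str]    # completed but never planned (candidate scope creep)
    completion_rate: float


def summarize_execution(planned_ids: list[str], completed_ids: list[str]) -> ExecutionSummary:
    """Compare planned task IDs against completed ones."""
    planned = set(planned_ids)
    completed = set(completed_ids)
    done_from_plan = planned & completed
    rate = len(done_from_plan) / len(planned) if planned else 0.0
    return ExecutionSummary(
        planned=len(planned),
        completed=len(done_from_plan),
        unfinished=sorted(planned - completed),
        unplanned=sorted(completed - planned),
        completion_rate=rate,
    )


# Hypothetical task IDs, for illustration only.
summary = summarize_execution(
    planned_ids=["auth", "export", "settings"],
    completed_ids=["auth", "settings", "sync"],
)
print(f"{summary.completed}/{summary.planned} planned tasks done")
print("Unplanned work:", summary.unplanned)
```

Anything that lands in `unplanned` is worth checking against the checkpoint log to see whether a human approved it.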
### 2. Quality Analysis
Assess work quality:
| Signal | Good | Bad |
|--------|------|-----|
| Build failures | 0-1 per session | Multiple per task |
| Test failures | Caught and fixed | Left broken |
| Reverts | Rare | Frequent |
| Commit size | Focused, small | Huge or tiny |
| Code review flags | Minor style | Logic errors |
### 3. Pattern Detection
Look for recurring behaviors:
**Good Patterns**
- Verifies before starting new work
- Commits after each task
- Updates progress consistently
- Asks clarifying questions
- Tests changes end-to-end
**Bad Patterns**
- Starts multiple tasks before finishing one
- Commits broken code
- Ignores test failures
- Over-engineers simple tasks
- Repeats same mistakes
### 4. Decision Analysis
Review checkpoint decisions:
| Decision Point | What Happened | Outcome |
|----------------|---------------|---------|
| Ambiguity found | Agent guessed | Wrong assumption |
| Scope change | Human approved | Added 2 days |
| Build failure | Auto-fixed | Fixed but fragile |
## Output Format
### Session Review Report
```markdown
# Session Review: [Project Name]
## Summary
- **Sessions analyzed**: 5
- **Tasks completed**: 12/15 (80%)
- **Time spent**: 9.5 hours
- **Estimated time**: 6 hours (58% over)
## What Went Well
### ✓ Consistent Progress Tracking
The agent updated progress.md after every task, making it easy
to resume sessions and understand state.
### ✓ Good Verification Habit
Before starting new features, the agent ran the test suite
and fixed any regressions first.
### ✓ Appropriate Scope Questions
When the export feature was ambiguous, the agent paused and
asked rather than guessing. This saved rework.
## What Went Wrong
### ✗ Underestimated Database Tasks
Database migrations took 3x longer than estimated. The agent
didn't account for data migration complexity.
**Pattern**: Estimates for database work consistently low
**Recommendation**: Add 2x multiplier for DB tasks
### ✗ Repeated Authentication Bug
The same JWT expiration bug was introduced twice across sessions.
The agent didn't learn from the first fix.
**Pattern**: No memory of previous bugs
**Recommendation**: Add known-issues.md to .claude/ folder
### ✗ Over-engineered Settings Page
Simple settings page became a full preferences system with
versioning, import/export, and sync. Scope crept significantly.
**Pattern**: Feature expansion without checkpoint
**Recommendation**: Add scope_change checkpoint at 'pause' level
## Time Analysis
| Category | Planned | Actual | Variance |
|----------|---------|--------|----------|
| Frontend | 3h | 2.5h | -17% ✓ |
| Backend | 2h | 3h | +50% |
| Database | 1h | 3h | +200% ✗ |
| Testing | 0h | 1h | (unplanned) |
## Recommendations
### For This Project
1. **Add database migration checklist**
   Create .claude/checklists/database.md with migration steps
2. **Track known bugs**
   Add .claude/known-issues.md that agent reads each session
3. **Tighten scope checkpoints**
   Change scope_change from 'review' to 'pause'
### For Your Skills
1. **Update api-design skill**
   Add section on JWT token lifecycle, common expiration bugs
2. **Update prd-analyzer skill**
   Add 2x multiplier for database task estimates
### For Future Projects
1. **Start with tighter checkpoints**
   First epic in 'supervised', then graduate to 'semi-auto'
2. **Require tests for auth code**
   Any authentication changes must include test coverage
```
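The Variance column in the sample report's Time Analysis table is the relative difference between actual and planned hours. A minimal sketch of that arithmetic follows; `variance_pct` is a hypothetical helper name, and treating zero planned hours as "unplanned" is an assumption that mirrors the Testing row.

```python
from typing import Optional


def variance_pct(planned_hours: float, actual_hours: float) -> Optional[float]:
    """Return percent over (+) or under (-) plan, or None if nothing was planned."""
    if planned_hours == 0:
        return None  # unplanned work has no meaningful percentage
    return (actual_hours - planned_hours) / planned_hours * 100


# Planned/actual hours taken from the sample Time Analysis table.
rows = {"Frontend": (3, 2.5), "Backend": (2, 3), "Database": (1, 3), "Testing": (0, 1)}
for category, (planned, actual) in rows.items():
    pct = variance_pct(planned, actual)
    label = "(unplanned)" if pct is None else f"{pct:+.0f}%"
    print(f"{category:<9} {label}")
```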
## Analysis Techniques
### Git History Analysis
```bash
# Count commits in a session window
git log --oneline --since="2025-01-10" --until="2025-01-11" | wc -l
# Find reverts (indicates mistakes)
git log --oneline -i --grep="revert"
# Find fix commits (indicates bugs)
git log --oneline -i --grep="fix"
# Large commits (might be problematic): show each commit's change summary line
git log --stat --since="2025-01-10" | grep -E "^[[:space:]]+[0-9]+ files? changed"
# Commits per hour of day (should be steady)
git log --format="%ai" --since="2025-01-10" | cut -d' ' -f2 | cut -d: -f1 | sort | uniq -c
```
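If you would rather tally these signals in one pass, the output of `git log --oneline` can be post-processed in Python. The sketch below is illustrative only: `count_signals` is a hypothetical helper, the sample log and its hashes are made up, and matching on the words "revert" and "fix" is the same heuristic as the grep commands above, so it will miss commits that do not use those words.

```python
import re
from collections import Counter

# Case-insensitive matches at a word start in the commit subject.
SIGNALS = {
    "revert": re.compile(r"\brevert", re.IGNORECASE),
    "fix": re.compile(r"\bfix", re.IGNORECASE),
}


def count_signals(oneline_log: str) -> Counter:
    """Count total, revert, and fix commits in `git log --oneline` output."""
    counts = Counter(total=0, revert=0, fix=0)
    for line in oneline_log.splitlines():
        if not line.strip():
            continue
        # Each line is "<short-hash> <subject>"; keep only the subject.
        _, _, subject = line.partition(" ")
        counts["total"] += 1
        for name, pattern in SIGNALS.items():
            if pattern.search(subject):
                counts[name] += 1
    return counts


# Made-up log text for illustration.
sample_log = """\
a1b2c3d Fix JWT expiration check
d4e5f6a Add settings page
0f9e8d7 Revert "Add settings sync"
"""
print(count_signals(sample_log))
```

To run it against a real repository, pass in the text captured from `subprocess.run(["git", "log", "--oneline"], capture_output=True, text=True).stdout`.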
### Progress Analysis
```python
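# Pseudocode for analyzing progress.md.
# parse_sessions() is not defined here: it stands for whatever parser matches
# your progress.md layout and returns one object per session.
def analyze_progress(progress_md: str) -> dict:
    sessions = parse_sessions(progress_md)
    metrics = {
        'total_sessions': len(sessions),
        'tasks_per_session': [],
        'checkpoints_hit': [],
        'errors_encountered': [],
        'scope_changes': [],
    }
    for session in sessions:
        metrics['tasks_per_session'].append(session.tasks_completed)
        metrics['checkpoints_hit'].extend(session.checkpoints)
        metrics['errors_encountered'].extend(session.errors)
        # Assumes each session also records its scope changes
        metrics['scope_changes'].extend(session.scope_changes)
    return metrics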
```
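The analysis above relies on a `parse_sessions` helper that the skill leaves undefined, because the layout of progress.md is project-specific. The following stand-in shows one possible shape. Everything about the file format here is an assumption made for illustration: the `## ` session headings, the `- Key: value` lines, and the key names are hypothetical, so adapt the parser to whatever your progress.md actually contains.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    name: str
    tasks_completed: int = 0
    checkpoints: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    scope_changes: list[str] = field(default_factory=list)


def parse_sessions(progress_md: str) -> list[Session]:
    """Parse a HYPOTHETICAL progress.md layout (see the sample below)."""
    sessions: list[Session] = []
    for raw in progress_md.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # A level-2 heading starts a new session.
            sessions.append(Session(name=line[3:].strip()))
        elif sessions and ":" in line:
            key, _, value = line.lstrip("- ").partition(":")
            key, value = key.strip().lower(), value.strip()
            current = sessions[-1]
            if key == "tasks completed":
                current.tasks_completed = int(value)
            elif key == "checkpoint":
                current.checkpoints.append(value)
            elif key == "error":
                current.errors.append(value)
            elif key == "scope change":
                current.scope_changes.append(value)
    return sessions


# Made-up progress.md content in the assumed layout.
sample = """\
## Session 2025-01-10-001
- Tasks completed: 3
- Checkpoint: scope_change
## Session 2025-01-10-002
- Tasks completed: 2
- Error: build failed after migration
"""
for s in parse_sessions(sample):
    print(s.name, s.tasks_completed, s.checkpoints, s.errors)
```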
### Task Completion Analysis
```sql
-- Query tasks.db for completion patterns.
-- accuracy_ratio > 1 means tasks of that type ran over their estimates.
SELECT
    task_type,
    COUNT(*) AS total,
    AVG(estimate_hours) AS avg_estimate,
    AVG(
        CAST((julianday(completed_at) - julianday(started_at)) * 24 AS REAL)
    ) AS avg_actual,
    AVG(
        CAST((julianday(completed_at) - julianday(started_at)) * 24 AS REAL)
    ) / AVG(estimate_hours) AS accuracy_ratio
FROM tasks
WHERE status = 'done'
GROUP BY task_type;
```
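To try a query like this without touching a real tasks.db, you can run it against an in-memory SQLite database. The sketch below is an illustration under assumptions: the table definition is inferred from the column names used in the query above (the real tasks.db schema may differ), the rows are made up, and the ratio is computed in Python rather than in SQL to keep the query short.

```python
import sqlite3

QUERY = """
SELECT
    task_type,
    COUNT(*) AS total,
    AVG(estimate_hours) AS avg_estimate,
    AVG((julianday(completed_at) - julianday(started_at)) * 24) AS avg_actual
FROM tasks
WHERE status = 'done'
GROUP BY task_type
ORDER BY task_type;
"""

conn = sqlite3.connect(":memory:")  # stand-in for .claude/tasks.db
# Assumed schema, inferred from the columns the query references.
conn.execute(
    """CREATE TABLE tasks (
        task_type TEXT, status TEXT, estimate_hours REAL,
        started_at TEXT, completed_at TEXT
    )"""
)
# Made-up rows for illustration.
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?, ?)",
    [
        ("database", "done", 1.0, "2025-01-10 09:00:00", "2025-01-10 12:00:00"),
        ("frontend", "done", 3.0, "2025-01-10 13:00:00", "2025-01-10 15:30:00"),
        ("frontend", "todo", 2.0, None, None),
    ],
)

results = conn.execute(QUERY).fetchall()
for task_type, total, avg_estimate, avg_actual in results:
    ratio = avg_actual / avg_estimate  # > 1 means the work ran over estimate
    print(f"{task_type}: {total} done, {avg_actual:.1f}h actual "
          f"vs {avg_estimate:.1f}h estimated ({ratio:.2f}x)")
conn.close()
```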
## Pattern Library
### Common Good Patterns
| Pattern | Signal | Reinforce By |
|---------|--------|--------------|
| Verify first | Runs tests before new work | Add to session protocol |
| Small commits | <100 lines per commit | Praise in review |
| Clear messages | Commit messages explain why | Include in prompts |
| Asks questions | Pauses on ambiguity | Keep checkpoints |
### Common Bad Patterns
| Pattern | Signal | Prevent By |
|---------|--------|------------|
| Scope creep | Unplanned features | Tighter checkpoints |
| Broken commits | Build fails after commit | Require build pass |
| Giant commits | >500 lines | Split task definition |
| Repeated bugs | Same fix twice | Known issues file |
| Estimate optimism | Always over time | Add multipliers |
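The commit-size thresholds in the two tables above (under 100 lines is small, over 500 lines is giant) can be applied mechanically. A minimal sketch, assuming you already have a lines-changed count per commit; `classify_commit_size` and the "medium" label for everything in between are hypothetical additions.

```python
def classify_commit_size(lines_changed: int, small_max: int = 100, giant_min: int = 500) -> str:
    """Bucket a commit by total lines changed (insertions + deletions)."""
    if lines_changed < small_max:
        return "small"
    if lines_changed > giant_min:
        return "giant"
    return "medium"  # assumed label for commits between the two thresholds


# Hypothetical line counts for four commits.
for n in (12, 100, 500, 501):
    print(n, classify_commit_size(n))
```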
### Warning Signs
| Signal | Meaning | Action |
|--------|---------|--------|
| Multiple reverts | Agent is thrashing | Switch to supervised |
| Long gaps in commits | Stuck on something | Review what's blocking |
| Sudden scope expansion | Agent is over-engineering | Pause and refocus |
| Skipped tests | Agent avoiding verification | Require test runs |
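The "long gaps in commits" signal can be checked by comparing consecutive commit timestamps. The sketch below is one way to do that, not a prescribed method: `find_commit_gaps` is a hypothetical helper, the two-hour default threshold is an arbitrary assumption to tune per project, and the timestamps are made up.

```python
from datetime import datetime, timedelta


def find_commit_gaps(timestamps: list[datetime], max_gap: timedelta = timedelta(hours=2)):
    """Return (start, end, gap) tuples where consecutive commits are more than max_gap apart."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap > max_gap:
            gaps.append((earlier, later, gap))
    return gaps


# Made-up commit times for illustration.
commits = [
    datetime(2025, 1, 10, 9, 0),
    datetime(2025, 1, 10, 9, 40),
    datetime(2025, 1, 10, 13, 10),  # 3.5 hours after the previous commit
    datetime(2025, 1, 10, 13, 45),
]
for start, end, gap in find_commit_gaps(commits):
    print(f"No commits between {start:%H:%M} and {end:%H:%M} ({gap})")
```

A gap is only a prompt to look at the session log for that window; it may simply be a break between sessions.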
## Integration with Project Harness
### After Each Epic
Run session review at epic boundaries:
```yaml
# In .claude/config.yaml
checkpoints:
  epic_complete: pause  # Good time to review
```
When an epic completes:
1. Run session-review on the epic's sessions
2. Apply recommendations to remaining work
3. Adjust estimates for similar future tasks
### After Project Completion
Full project review:
1. Analyze all sessions
2. Generate project retrospective
3. Update skills with learnings
4. Document patterns for future projects
## Output Files
Generate these artifacts:
| File | Purpose |
|------|---------|
| `.claude/reviews/YYYY-MM-DD.md` | Session review report |
| `.claude/patterns.md` | Accumulated good/bad patterns |
| `.claude/known-issues.md` | Bugs to watch for |
| `.claude/estimate-adjustments.md` | Multipliers by task type |
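The dated report name in the table above follows the ISO `YYYY-MM-DD` pattern, which `date.isoformat()` produces directly. A small sketch of building that path; `review_report_path` is a hypothetical helper and the date is an example.

```python
from datetime import date
from pathlib import Path


def review_report_path(project_root: Path, review_date: date) -> Path:
    """Build .claude/reviews/YYYY-MM-DD.md under the given project root."""
    return project_root / ".claude" / "reviews" / f"{review_date.isoformat()}.md"


path = review_report_path(Path("."), date(2025, 1, 11))
print(path.as_posix())
```

Before writing the report, create the directory with `path.parent.mkdir(parents=True, exist_ok=True)`.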
## Review Checklist
When running a session review:
```markdown
## Pre-Review
- [ ] All sessions have logs in .claude/sessions/
- [ ] Git history is available
- [ ] Build/test status known
## Analysis
- [ ] Compared planned vs actual tasks
- [ ] Checked estimate accuracy
- [ ] Identified reverts/fixes
- [ ] Found repeated patterns
- [ ] Reviewed checkpoint decisions
## Output
- [ ] Generated review report
- [ ] Listed actionable recommendations
- [ ] Updated patterns.md if new patterns found
- [ ] Updated known-issues.md if bugs repeated
```
## Anti-Patterns
| Anti-Pattern | Problem | Fix |
|--------------|---------|-----|
| Review too late | Lessons forgotten | Review after each epic |
| Blame the agent | Not actionable | Focus on system improvements |
| Ignore estimates | Keep being wrong | Track and adjust multipliers |
| No follow-through | Same mistakes repeat | Track recommendation implementation |