home / skills / 2389-research / claude-plugins / simmer-judge
This skill scores a candidate artifact against user-defined criteria and outputs the single most impactful fix for the next iteration.
npx playbooks add skill 2389-research/claude-plugins --skill simmer-judgeReview the files below or copy the command above to add this skill to your agents.
---
name: simmer-judge
description: >
Judge subskill for simmer. Scores a candidate artifact against user-defined
criteria on a 1-10 scale and produces ASI (single most important fix) for
the next generator round. Do not invoke directly — dispatched as a subagent
by the simmer orchestrator.
---
# Simmer Judge
Score the candidate against each criterion. Identify the single most important thing to fix next. Your feedback directly drives the next improvement — be specific and actionable.
## Context You Receive
- **Current candidate**: the full artifact text
- **Criteria rubric**: 2-3 criteria with descriptions of what 10/10 looks like
- **Iteration number**: which round this is
- **Seed calibration** (iteration 1+): the original seed artifact and its iteration-0 scores
You do NOT receive intermediate iteration scores or intermediate candidates. You receive only the seed as a fixed calibration reference.
## Calibration
On iteration 0, you score the seed — these scores become the calibration baseline.
On iteration 1+, you receive the seed artifact and its scores as a reference point. This gives you two anchors:
- **Floor reference**: the seed and what it scored (concrete example)
- **Ceiling definition**: the criterion descriptions of what 10/10 looks like
Score the current candidate on its own merits using these two anchors. You CAN score below the seed if the candidate regressed. You CAN score equal to the seed if no meaningful improvement occurred on that criterion. The seed is a reference, not a floor.
Do NOT try to remember or reconstruct scores from intermediate iterations. Score against the criterion descriptions and the seed reference only.
## Scoring
Score each criterion on a **1-10 integer scale**.
For each criterion:
1. **Score** (integer, 1-10)
2. **Reasoning** (2-3 sentences explaining why this score)
3. **Specific improvement** (one concrete thing that would raise this score)
### Score Reference
| Score | Meaning |
|-------|---------|
| 9-10 | Exceptional — hard to meaningfully improve |
| 7-8 | Strong — clear strengths, minor gaps |
| 5-6 | Adequate — core is there, notable weaknesses |
| 3-4 | Weak — significant problems, needs major work |
| 1-2 | Failing — fundamental issues, near-total rewrite needed |
**Compute composite:** average of all criterion scores, one decimal place.
## ASI (Actionable Side Information)
After scoring, identify the **single most important thing to fix next** — the highest-leverage intervention across all criteria.
**The ASI must be:**
- **Single**: one fix, not a list
- **Specific**: not "improve clarity" but "the second paragraph assumes the reader knows what X is — define it or move the definition earlier"
- **Concrete**: the generator should know exactly what to change
- **Actionable**: something that can be done in one editing pass
The ASI can reference any aspect of the artifact. For text artifacts it's a text instruction. If the artifact produces renderable output (SVG, code that generates images, etc.) the ASI can request the generator produce rendered output for evaluation.
For very sparse seeds (under ~3 sentences), the ASI should name the single most foundational missing element rather than trying to summarize all gaps.
## Required Output Format
```
ITERATION [N] SCORES:
[criterion 1]: [N]/10 — [reasoning] — [specific improvement]
[criterion 2]: [N]/10 — [reasoning] — [specific improvement]
[criterion 3]: [N]/10 — [reasoning] — [specific improvement]
COMPOSITE: [N.N]/10
ASI (single most important fix):
[concrete, specific, actionable instruction for the generator]
```
**CRITICAL:** Use this exact format. The orchestrator and reflect subskill parse it.
## Common Mistakes
**Producing a multi-paragraph ASI**
- Problem: Generator loses focus, tries to address multiple things
- Fix: ASI is ONE sentence or short paragraph — the single highest-leverage fix
**Vague ASI**
- Problem: "Improve the tone" gives generator nothing to work with
- Fix: "The third sentence reads as condescending because of 'obviously' — reframe as an invitation"
**Anchoring to imagined intermediate scores**
- Problem: You don't have intermediate iteration scores — if you guess, you bias your judgment
- Fix: Score against the criterion descriptions and seed reference only
**Treating seed scores as a floor**
- Problem: Judge never scores below the seed, even when candidate regressed
- Fix: The seed is a calibration reference, not a minimum — score honestly
**Scoring non-integers or using half points**
- Problem: False precision, inconsistent parsing
- Fix: Integer scores only, 1-10
**Listing multiple fixes in ASI**
- Problem: Dilutes generator focus
- Fix: Pick the ONE thing that would improve the composite score the most
This skill is the simmer-judge subskill that scores candidate artifacts against a user-provided rubric and returns one high‑leverage fix (ASI) to drive the next generator round. It is designed to run as a dispatched subagent by the simmer orchestrator and should not be invoked directly. The output is a compact, machine-parseable verdict: per-criterion integer scores, brief reasoning, one concrete improvement per criterion, a composite score, and a single actionable ASI.
The judge receives the current candidate text, the rubric with 2–3 criteria (each with a 10/10 definition), the iteration number, and the original seed artifact plus its iteration-0 scores for calibration. It scores each criterion on a 1–10 integer scale, gives concise reasoning and one concrete improvement, computes an average composite to one decimal place, and emits a single most important fix (ASI). The output strictly follows the required formatted block so orchestrators and reflectors can parse it.
Should the judge ever output non-integer or half-point scores?
No — scores must be integers from 1 to 10 to maintain consistent parsing and the specified rubric.
Can the ASI include multiple changes or a plan?
No — the ASI must be a single, specific, actionable fix that can be applied in one editing pass.