simmer-reflect skill

/simmer/skills/simmer-reflect

This skill records each iteration's results in a trajectory table, tracks the best candidate so far, and forwards the judge's ASI unchanged to the next round.

npx playbooks add skill 2389-research/claude-plugins --skill simmer-reflect

SKILL.md
---
name: simmer-reflect
description: >
  Reflect subskill for simmer. Records iteration results in trajectory table,
  tracks best candidate, and passes ASI forward to the next round. Do not
  invoke directly — called by simmer orchestrator after each judge round.
---

# Simmer Reflect

You are the only subskill that sees the full score history. Your job: record the iteration, track the best candidate, and pass the ASI forward.

## Context You Receive

- **Full score history**: all iterations so far (scores, composites, key changes)
- **Current iteration number** and **max iterations**
- **Latest judge output**: scores + ASI for this round
- **Generator report**: what changed this round (2-3 sentences)

## What To Do

### 1. Record in Trajectory

Update `{OUTPUT_DIR}/trajectory.md` with the running score table.

**Required format (do not add extra columns):**
- Columns: Iteration, [criterion names from rubric], Composite, Key Change
- Below table: `Best candidate: iteration [N] (composite: [N.N]/10)`
- No "Best?" column — use the line below the table instead

```markdown
# Simmer Trajectory

| Iteration | [criterion 1] | [criterion 2] | [criterion 3] | Composite | Key Change |
|-----------|---------------|---------------|---------------|-----------|------------|
| 0         | 4             | 5             | 3             | 4.0       | seed       |
| 1         | 7             | 5             | 4             | 5.3       | [summary]  |
| 2         | 7             | 6             | 7             | 6.7       | [summary]  |

Best candidate: iteration 2 (composite: 6.7/10)
```

The "Key Change" column uses the generator's 2-3 sentence report, condensed to a few words (under 60 characters). For iteration 0 (the seed), Key Change is always "seed".
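The required format above can be produced mechanically once the scores are known. The following is a minimal sketch, not the skill's actual implementation: `IterationRecord`, `render_trajectory`, the criterion names, and the key-change texts are all hypothetical, and it assumes each Key Change has already been condensed before rendering.

```python
from dataclasses import dataclass


@dataclass
class IterationRecord:
    """One row of the trajectory table (hypothetical structure for this sketch)."""
    iteration: int
    scores: dict[str, int]  # criterion name -> judge score
    composite: float
    key_change: str  # already condensed to a few words


def render_trajectory(records: list[IterationRecord], criteria: list[str]) -> str:
    """Build the full text of trajectory.md from the recorded iterations."""
    header = ["Iteration", *criteria, "Composite", "Key Change"]
    lines = [
        "# Simmer Trajectory",
        "",
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for rec in records:
        # Iteration 0 is the seed, so its Key Change is always "seed".
        key_change = "seed" if rec.iteration == 0 else rec.key_change
        if len(key_change) >= 60:
            raise ValueError(f"Key Change must be under 60 characters: {key_change!r}")
        cells = [
            str(rec.iteration),
            *(str(rec.scores[name]) for name in criteria),
            f"{rec.composite:.1f}",
            key_change,
        ]
        lines.append("| " + " | ".join(cells) + " |")
    # max() keeps the earliest record on ties (an assumption; the skill does not define ties).
    best = max(records, key=lambda rec: rec.composite)
    lines += ["", f"Best candidate: iteration {best.iteration} (composite: {best.composite:.1f}/10)"]
    return "\n".join(lines) + "\n"


# Illustrative data only: the scores mirror the example table, the names are made up.
records = [
    IterationRecord(0, {"clarity": 4, "accuracy": 5, "tone": 3}, 4.0, "seed"),
    IterationRecord(1, {"clarity": 7, "accuracy": 5, "tone": 4}, 5.3, "tightened intro"),
    IterationRecord(2, {"clarity": 7, "accuracy": 6, "tone": 7}, 6.7, "rewrote examples"),
]
trajectory_text = render_trajectory(records, ["clarity", "accuracy", "tone"])
print(trajectory_text)
```

The sketch rejects an over-long Key Change instead of truncating it, because condensing the generator's report is a summarization step that a character cut would not reproduce.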

### 2. Track Best Candidate

Compare this iteration's composite score to the best-so-far. If it is higher, update the "Best candidate" line at the bottom of the trajectory; otherwise leave that line as it is.

**The best candidate may not be the latest one.** If iteration 3 scores lower than iteration 2, the best is still iteration 2.
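As an illustration of tracking the best candidate separately from the latest one, here is a small sketch. The `Best` tuple and `update_best` function are hypothetical names, the 6.1 composite for iteration 3 is invented for the example, and keeping the earlier candidate on a tie is an assumption, since the skill only says to update when the new composite is better.

```python
from typing import NamedTuple, Optional


class Best(NamedTuple):
    iteration: int
    composite: float


def update_best(best: Optional[Best], iteration: int, composite: float) -> Best:
    """Return the best-so-far after seeing one more iteration."""
    if best is None or composite > best.composite:
        return Best(iteration, composite)
    # Lower or tied composite: keep the earlier best (tie handling is an assumption).
    return best


best = None
# Iterations 0-2 follow the example table; iteration 3 (6.1) is a made-up regression.
for iteration, composite in enumerate([4.0, 5.3, 6.7, 6.1]):
    best = update_best(best, iteration, composite)

print(f"Best candidate: iteration {best.iteration} (composite: {best.composite:.1f}/10)")
```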

### 2b. Handle Regression

If this iteration's composite is LOWER than best-so-far:
- Note the regression in the trajectory Key Change column (e.g., "regressed — ASI targeted X but Y suffered")
- Advise the orchestrator: next generator should receive the BEST candidate (not the latest), plus the current ASI
- Include in output: `REGRESSION: true — use iteration [N] as input to next generator`
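The regression rule can be sketched as a small decision function. This is one possible reading, not the skill's definitive logic: `RegressionAdvice` and `check_regression` are hypothetical names, the best-so-far values are taken from before the current iteration is considered, and feeding the latest candidate forward when there is no regression is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RegressionAdvice:
    regression: bool
    input_iteration: int  # iteration whose candidate the next generator should receive


def check_regression(
    current_iteration: int,
    current_composite: float,
    best_iteration: int,
    best_composite: float,
) -> RegressionAdvice:
    """Compare the current composite to the best-so-far and pick the next input."""
    if current_composite < best_composite:
        # Regression: the next generator gets the BEST candidate, not the latest.
        return RegressionAdvice(regression=True, input_iteration=best_iteration)
    # No regression: assume the latest candidate is carried forward.
    return RegressionAdvice(regression=False, input_iteration=current_iteration)


# Hypothetical round: iteration 3 scores 6.1 while iteration 2 holds the best at 6.7.
advice = check_regression(3, 6.1, best_iteration=2, best_composite=6.7)
print(advice)
```

In either case the current round's ASI still goes to the next generator; only the candidate used as input changes.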

### 3. Pass ASI Forward

Return to the orchestrator:
- The ASI from this round's judge (passed unchanged to next generator)
- Which iteration file contains the current best candidate
- Whether iterations remain
- Whether a regression occurred (and which candidate to use as input)

## Output to Orchestrator

```
ITERATION [N] RECORDED
BEST SO FAR: iteration [N] (composite: [N.N]/10)
REGRESSION: [true/false] — [if true: use iteration N as input to next generator]
ITERATIONS REMAINING: [N]
ASI FOR NEXT ROUND: [the judge's ASI, unchanged]
```
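To make the output format concrete, the sketch below fills in the five lines from already-computed values. `format_report` and its parameters are hypothetical, the sample ASI text is invented, and printing a bare `REGRESSION: false` when no regression occurred is an assumption, since the template only spells out the true case.

```python
EM_DASH = "\u2014"  # separator used in the REGRESSION line of the format above


def format_report(
    iteration: int,
    best_iteration: int,
    best_composite: float,
    regression: bool,
    iterations_remaining: int,
    asi: str,
) -> str:
    """Assemble the report that reflect returns to the orchestrator."""
    regression_line = "REGRESSION: false"
    if regression:
        regression_line = (
            f"REGRESSION: true {EM_DASH} use iteration {best_iteration} "
            "as input to next generator"
        )
    return "\n".join([
        f"ITERATION {iteration} RECORDED",
        f"BEST SO FAR: iteration {best_iteration} (composite: {best_composite:.1f}/10)",
        regression_line,
        f"ITERATIONS REMAINING: {iterations_remaining}",
        f"ASI FOR NEXT ROUND: {asi}",  # inserted verbatim, never edited or summarized
    ])


# Hypothetical inputs for a regressed third iteration.
judge_asi = "Tone score is held back by passive constructions in section 2."
report = format_report(3, 2, 6.7, True, 2, judge_asi)
print(report)
```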

## Common Mistakes

**Modifying the ASI**
- Problem: Reflect edits or summarizes the judge's ASI before passing it forward
- Fix: Pass the ASI through unchanged — the judge wrote it for the generator

**Not tracking best-so-far separately**
- Problem: Assumes the last iteration is the best
- Fix: Always compare composite to best-so-far, update if better

**Writing candidate content into trajectory**
- Problem: Trajectory file becomes huge, clutters context
- Fix: Trajectory only contains scores, composites, and short key-change summaries

Overview

This skill records iteration results for the simmer orchestrator, maintains a running trajectory table, and forwards the judge's ASI unchanged to the next generator. It identifies and tracks the best candidate across rounds and flags regressions when an iteration performs worse than the best-so-far. Do not invoke this subskill directly — it runs automatically after each judge round.

How this skill works

After each judge round, the skill adds a row to the trajectory table with the iteration's criterion scores, composite, and a short key-change summary. It compares the current composite to the best-so-far, updates the Best candidate line if the new composite is higher, and flags a regression if it is lower than the best-so-far. Finally, it returns the judge's ASI untouched, indicates which iteration file holds the candidate to feed to the next generator, and reports whether iterations remain.

When to use it

  • Automatically after every judge round in the simmer pipeline
  • When you need a single authoritative record of score history and changes
  • When the orchestrator must know which candidate to feed to the next generator
  • When you must detect and handle regressions between iterations

Best practices

  • Write the trajectory table with only the required columns and no candidate content
  • Condense the generator report into a short Key Change (under 60 characters)
  • Never modify or summarize the judge's ASI — pass it forward unchanged
  • Always compare composite scores to the current best, not to the last iteration
  • If a regression occurs, report REGRESSION: true and recommend the best-so-far candidate as input to the next generator

Example use cases

  • Recording scores and changes across iterative refinement of a prompt or model output
  • Choosing the correct candidate to supply to the next generator after a poor iteration
  • Maintaining a compact, machine- and human-readable trajectory.md for audits
  • Signaling the orchestrator that iterations remain and which file to load next

FAQ

Should I ever edit the judge's ASI before passing it on?

No. The judge's ASI must be forwarded exactly as received.

What summary text belongs in the Key Change column?

A few words (under 60 characters) condensing the generator's 2-3 sentence report, or 'seed' for iteration 0.