
evaluator-optimizer skill


This skill guides iterative refinement of code, docs, or designs to production-grade quality through structured evaluation and targeted improvements.

npx playbooks add skill nickcrew/claude-cortex --skill evaluator-optimizer

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
3.1 KB
---
name: evaluator-optimizer
description: Iterative refinement workflow for polishing code, documentation, or designs through systematic evaluation and improvement cycles. Use when refining drafts into production-grade quality.
keywords:
  - refinement
  - iteration
  - code quality
  - evaluation
  - optimization
  - polish
triggers:
  - refine this
  - polish this
  - optimize this
  - improve iteratively
  - evaluator-optimizer
---

# Evaluator-Optimizer

Iterative refinement workflow that takes existing code, documentation, or designs and polishes them through rigorous cycles of evaluation and improvement until they meet production-grade quality standards.

## When to Use This Skill

- Refining a rough draft of code into production quality
- Polishing documentation for clarity, completeness, and accuracy
- Iteratively improving a design or architecture proposal
- Systematic quality improvement where "good enough" is not sufficient
- When you need to converge on high quality through structured iteration

## Quick Reference

| Task | Load reference |
| --- | --- |
| Evaluation criteria and quality rubrics | `skills/evaluator-optimizer/references/evaluation-criteria.md` |

## Workflow: The Loop

For any given artifact (code, text, design):

1. **Accept**: Take the current version of the artifact.
2. **Evaluate**: Act as a harsh critic. Rate the artifact on correctness, clarity, efficiency, style, and safety. Assign a score out of 100.
3. **Decide**:
   - Score >= 90: **Stop** and present the result.
   - Score < 90: **Refine**.
4. **Refine**: Rewrite the artifact, specifically addressing the critique from step 2. List what changed and why.
5. **Repeat**: Return to step 2 with the new version.
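The loop above can be sketched in Python. This is a minimal illustration, not part of the skill itself: `evaluate` and `refine` are hypothetical stand-ins for the critique and rewrite steps, and the threshold and two-iteration patience rule mirror steps 3 and the convergence rule below.

```python
from typing import Callable


def evaluator_optimizer(
    artifact: str,
    evaluate: Callable[[str], tuple[int, list[str]]],  # returns (score /100, issues)
    refine: Callable[[str, list[str]], str],
    threshold: int = 90,  # step 3: stop when score >= 90
    patience: int = 2,    # convergence rule: stop after 2 non-improving iterations
) -> str:
    best, best_score = artifact, -1
    stale = 0
    while True:
        score, issues = evaluate(artifact)
        if score > best_score:
            best, best_score, stale = artifact, score, 0
        else:
            stale += 1
        if best_score >= threshold or stale >= patience:
            return best  # present the best version seen so far
        artifact = refine(artifact, issues)
```

Tracking `best` separately means a refinement that lowers the score is effectively reverted when the loop stops, matching the self-correction rule.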

## Behavioral Rules

- **Do not settle**: "Good enough" is not good enough. You are here to polish.
- **Be explicit**: When evaluating, list specific flaws. "The function `process_data` is O(n^2) but could be O(n)."
- **Show your work**: Summarize changes in each iteration.
- **Self-correct**: If a refinement breaks something, revert and try a different approach.
- **Converge**: Each iteration must improve the score. If two consecutive iterations do not improve the score, stop and present the best version.

## Iteration Output Template

```markdown
## Iteration [N] Evaluation

| Criterion | Score (1-10) | Notes |
|-----------|-------------|-------|
| Correctness | | |
| Clarity | | |
| Efficiency | | |
| Style | | |
| Safety | | |
| **Total** | **/50** | **Score /100 = total × 2** |

### Issues Found
1. [Specific issue with location]
2. [Specific issue with location]

### Refinements Applied
- [Change 1 and rationale]
- [Change 2 and rationale]
```
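The template's five criteria score 1–10 each, for a total out of 50, while the loop's stop threshold is on a /100 scale. A trivial helper makes the conversion explicit (criterion names are taken from the template):

```python
def to_percent(criteria_scores: dict[str, int]) -> int:
    # Five criteria (correctness, clarity, efficiency, style, safety),
    # each 1-10, so the total is /50; doubling maps it to the /100 scale.
    assert len(criteria_scores) == 5, "expected exactly five criteria"
    return sum(criteria_scores.values()) * 2
```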

## Example Interaction

**Input**: "Refine this Python script."

**Iteration 1 Evaluation**:
- Correctness: Good
- Efficiency: Poor - uses nested loops for matching
- Style: Variable names `a` and `b` are unclear
- Score: 60/100

**Refinements applied**:
- Flattened loops using a set lookup (O(n))
- Renamed `a` to `users`, `b` to `active_ids`
- Added type hints

**Iteration 2 Evaluation**:
- Correctness: Good
- Efficiency: Excellent
- Style: Good
- Score: 95/100

Result: Present the refined script.
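The efficiency refinement in the example, replacing nested loops with a set lookup, might look like this. The function and variable names (`find_active`, `users`, `active_ids`) are illustrative, not taken from any real script:

```python
# Before: O(n*m) nested loops over users and active IDs
def find_active(users: list[dict], active_ids: list[int]) -> list[dict]:
    matched = []
    for user in users:
        for uid in active_ids:
            if user["id"] == uid:
                matched.append(user)
    return matched


# After: O(n + m) using a set for constant-time membership tests
def find_active_fast(users: list[dict], active_ids: list[int]) -> list[dict]:
    active = set(active_ids)  # O(m) one-time build, O(1) average lookup
    return [user for user in users if user["id"] in active]
```

Both versions return the same result; only the asymptotic cost changes, which is exactly the kind of targeted, critique-driven refinement the loop calls for.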

Overview

This skill implements an iterative evaluator-optimizer loop that polishes code, documentation, or designs until they reach production-grade quality. It behaves like a relentless critic and engineer: evaluate, score, refine, and repeat until a high-quality threshold is met. Use it to turn drafts into reliably correct, efficient, and well-styled deliverables.

How this skill works

The workflow accepts an artifact, runs a rigorous evaluation against explicit criteria (correctness, clarity, efficiency, style, safety), and assigns a numeric score. If the score is below the convergence threshold, it applies targeted refinements addressing each critique, documents changes and rationale, then re-evaluates. Iterations continue until the artifact scores high enough or no further measurable improvement is possible.

When to use it

  • Refining a prototype script into production-ready code
  • Polishing technical documentation for clarity and completeness
  • Improving architecture or design proposals through measurable iterations
  • Enforcing quality standards where incremental improvements are required
  • Stabilizing a draft that must meet safety or compliance requirements

Best practices

  • Begin with a clear rubric and target score (default: 90/100) to guide convergence
  • Provide the full artifact and any contextual constraints (platform, dependencies, APIs)
  • Prefer small, focused changes per iteration to isolate regressions
  • Record why each change was made and how it affects the score
  • Stop and present the best version if two iterations fail to improve the score

Example use cases

  • Convert a messy Python script into a typed, efficient, and well-documented module
  • Iteratively improve an API spec so it meets security and backward-compatibility rules
  • Polish a README and onboarding docs until they clearly explain setup and common workflows
  • Refine a system design document to reduce ambiguity and highlight trade-offs
  • Optimize algorithm implementations for worst-case performance and readability

FAQ

How many iterations does this usually take?

It varies with the starting quality of the artifact: minor drafts may converge in 1–3 iterations, while complex systems can take more. Each iteration must measurably improve the score.

What if a refinement introduces a regression?

Revert the breaking change, document the failure, and try an alternative refinement; the workflow requires self-correction and monotonic score improvement.