---
name: rubric
description: Measurable criteria translating qualitative debate to quantitative scores
allowed-tools:
- read_file
- write_file
tier: 1
protocol: RUBRIC
tags: [moollm, scoring, criteria, evaluation, decision]
credits: "Mike Gallaher — RUBRIC-bridge methodology"
related: [adversarial-committee, evaluator, scoring, advertisement]
templates:
  - file: RUBRIC.yml.tmpl
    purpose: Scoring rubric definition
---
# Rubric
> *"What gets measured gets managed."*
Explicit criteria with weights translate qualitative debate into defensible scores.
## Rubric Structure
```yaml
rubric:
  id: client-evaluation-v1
  name: "Client Engagement Rubric"
  version: 1.1

  criteria:
    resource_efficiency:
      weight: 0.20
      scale: [1, 2, 3, 4, 5]
      description: "How well does this fit our current capacity?"
      anchors:
        5: "Perfect fit, no adjustments needed"
        4: "Good fit, minor adjustments"
        3: "Manageable, some reallocation"
        2: "Difficult, significant changes"
        1: "Would overwhelm current capacity"
    risk_level:
      weight: 0.30
      scale: [1, 2, 3, 4, 5]
      description: "What's the risk profile?"
      anchors:
        5: "Minimal risk, strong track record"
        4: "Low risk, minor concerns"
        3: "Moderate risk, manageable"
        2: "Elevated risk, needs mitigation"
        1: "High risk, major red flags"
    strategic_alignment:
      weight: 0.25
      scale: [1, 2, 3, 4, 5]
      description: "Does this advance our strategic goals?"
      anchors:
        5: "Core to our strategy"
        4: "Strongly aligned"
        3: "Compatible"
        2: "Tangential"
        1: "Misaligned"
    stakeholder_impact:
      weight: 0.25
      scale: [1, 2, 3, 4, 5]
      description: "How does this affect our stakeholders?"
      anchors:
        5: "Benefits all stakeholders"
        4: "Benefits most, neutral for others"
        3: "Mixed impact"
        2: "Some stakeholders harmed"
        1: "Significant negative impact"

  thresholds:
    accept: 3.5
    reject: 2.0
    review: [2.0, 3.5]  # between reject and accept
```
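Before scoring, it can help to sanity-check a rubric definition: weights should sum to 1.0, and every point on each scale should have an anchor. A minimal sketch in Python, assuming the rubric above is saved to a file and loaded with PyYAML; `load_rubric` is an illustrative helper, not part of the RUBRIC protocol:

```python
# Illustrative validation sketch (assumes PyYAML is installed).
import yaml

def load_rubric(path: str) -> dict:
    """Load a rubric definition and check basic invariants."""
    with open(path) as f:
        rubric = yaml.safe_load(f)["rubric"]
    weights = [c["weight"] for c in rubric["criteria"].values()]
    # Weights should sum to 1.0 so totals stay on the anchor scale.
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {sum(weights)}, expected 1.0")
    for name, c in rubric["criteria"].items():
        # Every point on the scale needs an anchor for consistent scoring.
        missing = set(c["scale"]) - set(c["anchors"])
        if missing:
            raise ValueError(f"{name}: no anchors for {sorted(missing)}")
    return rubric
```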
## Scoring
```yaml
evaluation:
  rubric: client-evaluation-v1
  subject: "Client X Engagement"
  evaluator: independent  # no debate context
  scores:
    resource_efficiency:
      score: 4
      rationale: "Good capacity fit, minor timeline adjustment"
    risk_level:
      score: 2
      rationale: "Scope creep history is concerning"
    strategic_alignment:
      score: 4
      rationale: "Aligns with growth goals"
    stakeholder_impact:
      score: 3
      rationale: "Positive for team growth, stressful for some"
  weighted_total: 3.15  # below accept threshold
  recommendation: review
  critique: |
    The risk score of 2 is dragging down the overall total.
    If risk can be mitigated (milestone billing, scope boundaries)
    and re-scored at 4, the total rises to 3.75 (accept).
    Suggestion: revise the proposal with explicit risk mitigation.
```
## Calculation
```
weighted_total = Σ (criterion_weight × criterion_score)

Example:
  resource_efficiency:  0.20 × 4 = 0.80
  risk_level:           0.30 × 2 = 0.60
  strategic_alignment:  0.25 × 4 = 1.00
  stakeholder_impact:   0.25 × 3 = 0.75

  total = 0.80 + 0.60 + 1.00 + 0.75 = 3.15
```
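The same calculation as code. A minimal sketch assuming rubric and score dictionaries shaped like the YAML examples above; `weighted_total` and `recommend` are illustrative names, not part of the protocol:

```python
# Illustrative scoring helpers; data shapes follow the YAML above.
def weighted_total(rubric: dict, scores: dict) -> float:
    """Sum of weight x score over all criteria."""
    return sum(
        crit["weight"] * scores[name]["score"]
        for name, crit in rubric["criteria"].items()
    )

def recommend(total: float, thresholds: dict) -> str:
    """Map a weighted total onto accept / reject / review."""
    if total >= thresholds["accept"]:
        return "accept"
    if total <= thresholds["reject"]:
        return "reject"
    return "review"

# Client X example: 0.20*4 + 0.30*2 + 0.25*4 + 0.25*3 = 3.15 -> "review"
```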
## Rubric Evolution
Rubrics improve over time:
```yaml
rubric_history:
  - version: 1.0
    date: "2025-06-01"
    note: "Initial version"
  - version: 1.1
    date: "2025-09-15"
    changes:
      - "Added 'stakeholder_impact' criterion"
      - "Reweighted risk from 0.35 to 0.30"
    reason: "Post-mortem showed we missed stakeholder effects"
  - version: 1.2
    date: "2026-01-05"
    changes:
      - "Refined risk anchors for scope creep"
    reason: "Client X evaluation revealed gap"
```
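Because each entry records its changes, versions can also be compared mechanically, which is useful when auditing how a reweighting shifts past decisions. A hypothetical sketch that diffs criterion weights between two loaded rubric versions:

```python
# Hypothetical version-diff helper for rubric audits.
def diff_weights(old: dict, new: dict) -> dict:
    """Report criteria whose weights changed between two versions."""
    names = set(old["criteria"]) | set(new["criteria"])
    return {
        name: (
            old["criteria"].get(name, {}).get("weight"),  # None if added
            new["criteria"].get(name, {}).get("weight"),  # None if removed
        )
        for name in sorted(names)
        if old["criteria"].get(name, {}).get("weight")
        != new["criteria"].get(name, {}).get("weight")
    }

# Diffing v1.0 against v1.1 would report:
#   risk_level:         (0.35, 0.30)  # reweighted
#   stakeholder_impact: (None, 0.25)  # added
```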
## Domain-Specific Rubrics
```yaml
# rubrics/code-review-rubric.yml
rubric:
  id: code-review-v1
  criteria:
    correctness:     { weight: 0.30 }
    readability:     { weight: 0.20 }
    maintainability: { weight: 0.20 }
    performance:     { weight: 0.15 }
    test_coverage:   { weight: 0.15 }

# rubrics/hiring-rubric.yml
rubric:
  id: hiring-v1
  criteria:
    technical_skills: { weight: 0.30 }
    collaboration:    { weight: 0.25 }
    growth_potential: { weight: 0.20 }
    culture_add:      { weight: 0.15 }
    communication:    { weight: 0.10 }
```
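The scoring machinery stays the same across domains; only the rubric file changes. A brief usage sketch reusing the illustrative helpers from earlier (the weights follow the code-review rubric above; the file is assumed to also define scales and anchors as in the full example):

```python
# Same engine, different rubric file (paths and scores are illustrative).
rubric = load_rubric("rubrics/code-review-rubric.yml")
scores = {
    "correctness":     {"score": 5},
    "readability":     {"score": 4},
    "maintainability": {"score": 4},
    "performance":     {"score": 3},
    "test_coverage":   {"score": 2},
}
total = weighted_total(rubric, scores)
# 0.30*5 + 0.20*4 + 0.20*4 + 0.15*3 + 0.15*2 = 3.85 -> accept at 3.5
```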
## Commands
| Command | Action |
|---------|--------|
| `DEFINE RUBRIC [name]` | Create new rubric |
| `SCORE [subject] AGAINST [rubric]` | Evaluate against criteria |
| `COMPARE [a] [b] USING [rubric]` | Side-by-side evaluation |
| `REVISE RUBRIC [changes]` | Update rubric version |
## Integration
```mermaid
graph LR
    C[Committee Debate] --> O[Output]
    O -->|THROW| E[Evaluator]
    R[RUBRIC.yml] --> E
    E --> S{Score}
    S -->|≥ 3.5| A[Accept]
    S -->|≤ 2.0| X[Reject]
    S -->|between| REV[Review]
    REV -->|critique| C
```
## FAQ

**How is the weighted total calculated?**
Multiply each criterion score by its weight and sum the results.

**What if criteria weights don't sum to 1.0?**
Normalize the weights so they sum to 1.0 before computing totals; this keeps scores comparable across rubrics (see the sketch below).

**When should I revise a rubric?**
Revise when an audit or post-mortem reveals a decision gap, or when priorities shift; record version notes for traceability.
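A minimal sketch of that normalization step; `normalize_weights` is an illustrative helper operating on the criteria mapping from the rubric YAML:

```python
# Illustrative normalization helper (not part of the RUBRIC protocol).
def normalize_weights(criteria: dict) -> dict:
    """Rescale criterion weights in place so they sum to 1.0."""
    total = sum(c["weight"] for c in criteria.values())
    for c in criteria.values():
        c["weight"] /= total
    return criteria
```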