
confidence-scoring skill

/blueprint-plugin/skills/confidence-scoring

This skill assesses confidence in PRPs and work-orders, using structured scoring to determine readiness for execution or subagent delegation.

npx playbooks add skill laurigates/claude-plugins --skill confidence-scoring


---
model: haiku
created: 2025-12-16
modified: 2026-02-06
reviewed: 2025-12-16
name: confidence-scoring
description: "Assess quality of PRPs and work-orders using systematic confidence scoring. Use when evaluating readiness for execution or subagent delegation."
allowed-tools: Read, Grep, Glob
---

# Confidence Scoring for PRPs and Work-Orders

This skill provides systematic evaluation of PRPs (Product Requirement Prompts) and work-orders to determine their readiness for execution or delegation.

## When to Use This Skill

Activate this skill when:
- Creating a new PRP (`/prp:create`)
- Generating a work-order (`/blueprint:work-order`)
- Deciding whether to execute or refine a PRP
- Evaluating whether a task is ready for subagent delegation
- Reviewing PRPs/work-orders for quality

## Scoring Dimensions

### 1. Context Completeness (1-10)

Evaluates whether all necessary context is explicitly provided.

| Score | Criteria |
|-------|----------|
| **10** | All file paths explicit with line numbers, all code snippets included, library versions specified, integration points documented |
| **8-9** | Most context provided, minor gaps that can be inferred from codebase |
| **6-7** | Key context present but some discovery required |
| **4-5** | Significant context missing, will need exploration |
| **1-3** | Minimal context, extensive discovery needed |

**Checklist**:
- [ ] File paths are absolute or clearly relative to project root
- [ ] Code snippets include actual line numbers (e.g., `src/auth.py:45-60`)
- [ ] Library versions are specified
- [ ] Integration points are documented
- [ ] Patterns from codebase are shown with examples
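
The first two checklist items can be spot-checked mechanically. A minimal sketch, assuming the PRP text is available as a string; the regex and helper name are illustrative, not part of the skill:

```python
import re
from pathlib import Path

# Matches references like "src/auth.py:45-60" or "src/auth.py:45".
FILE_REF = re.compile(r"(?P<path>[\w./-]+\.\w+):(?P<start>\d+)(?:-(?P<end>\d+))?")

def check_file_references(prp_text: str, project_root: str = ".") -> list[str]:
    """Return a list of problems found with file:line references in a PRP."""
    problems = []
    for match in FILE_REF.finditer(prp_text):
        path = Path(project_root) / match["path"]
        if not path.is_file():
            problems.append(f"missing file: {match['path']}")
            continue
        line_count = sum(1 for _ in path.open(encoding="utf-8"))
        end = int(match["end"] or match["start"])
        if end > line_count:
            problems.append(
                f"line range past end of file: {match.group(0)} "
                f"(file has {line_count} lines)"
            )
    return problems
```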

### 2. Implementation Clarity (1-10)

Evaluates how clear the implementation approach is.

| Score | Criteria |
|-------|----------|
| **10** | Pseudocode covers all cases, step-by-step clear, edge cases addressed |
| **8-9** | Main path clear, most edge cases covered |
| **6-7** | Implementation approach clear, some details need discovery |
| **4-5** | High-level only, significant ambiguity |
| **1-3** | Vague requirements, unclear approach |

**Checklist**:
- [ ] Task breakdown is explicit
- [ ] Pseudocode is provided for complex logic
- [ ] Implementation order is specified
- [ ] Edge cases are identified
- [ ] Error handling approach is documented
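
For the pseudocode item, the goal is code-shaped reasoning that names the main path and each edge case, not production code. A sketch for a hypothetical token-refresh task; every name here is illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

class ReauthenticationRequired(Exception):
    """Raised when the refresh flow cannot continue without a fresh login."""

@dataclass
class Session:
    refresh_token: str | None
    refresh_expires_at: datetime

def refresh_access_token(session: Session, now: datetime | None = None) -> str:
    now = now or datetime.now(timezone.utc)
    # Edge case 1: no refresh token on the session -> force re-authentication.
    if session.refresh_token is None:
        raise ReauthenticationRequired("no refresh token")
    # Edge case 2: refresh token already expired -> same outcome.
    if session.refresh_expires_at <= now:
        raise ReauthenticationRequired("refresh token expired")
    # Main path: exchange the refresh token for a new access token.
    # (A real PRP would name the project's actual auth client call here.)
    return f"access-token-derived-from:{session.refresh_token}"
```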

### 3. Gotchas Documented (1-10)

Evaluates whether known pitfalls are documented with mitigations.

| Score | Criteria |
|-------|----------|
| **10** | All known pitfalls documented, each has mitigation, library-specific issues covered |
| **8-9** | Major gotchas covered, mitigations clear |
| **6-7** | Some gotchas documented, may discover more |
| **4-5** | Few gotchas mentioned, incomplete coverage |
| **1-3** | No gotchas documented |

**Checklist**:
- [ ] Library-specific gotchas documented
- [ ] Version-specific behaviors noted
- [ ] Common mistakes identified
- [ ] Each gotcha has a mitigation
- [ ] Race conditions/concurrency issues addressed

### 4. Validation Coverage (1-10)

Evaluates whether executable validation commands are provided.

| Score | Criteria |
|-------|----------|
| **10** | All quality gates have executable commands, expected outcomes specified |
| **8-9** | Main validation commands present, most outcomes specified |
| **6-7** | Some validation commands, gaps in coverage |
| **4-5** | Minimal validation commands |
| **1-3** | No executable validation |

**Checklist**:
- [ ] Linting command provided and executable
- [ ] Type checking command provided (if applicable)
- [ ] Unit test command with specific test files
- [ ] Integration test command (if applicable)
- [ ] Coverage check command with threshold
- [ ] Security scan command (if applicable)
- [ ] All commands include expected outcomes
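
To make "executable commands with expected outcomes" concrete, here is a minimal sketch of gates expressed as data and run in order. The commands and test file name are placeholders for whatever the project's own config defines:

```python
import subprocess

# Placeholder gates; a real PRP copies these from package.json / pyproject.toml.
GATES = {
    "lint":       {"cmd": ["npm", "run", "lint"],                "expect": "exit 0, no warnings"},
    "type-check": {"cmd": ["npx", "tsc", "--noEmit"],            "expect": "exit 0"},
    "unit-tests": {"cmd": ["npm", "test", "--", "auth.test.ts"], "expect": "all tests pass"},
}

def run_gates(gates: dict) -> bool:
    """Run each gate and report pass/fail against its expected outcome."""
    all_passed = True
    for name, gate in gates.items():
        result = subprocess.run(gate["cmd"], capture_output=True, text=True)
        passed = result.returncode == 0
        all_passed = all_passed and passed
        print(f"{name}: {'PASS' if passed else 'FAIL'} (expected: {gate['expect']})")
    return all_passed
```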

### 5. Test Coverage (1-10) - Work-Orders Only

Evaluates whether test cases are specified.

| Score | Criteria |
|-------|----------|
| **10** | All test cases specified with assertions, edge cases covered |
| **8-9** | Main test cases specified, most assertions included |
| **6-7** | Key test cases present, some gaps |
| **4-5** | Few test cases, minimal detail |
| **1-3** | No test cases specified |

**Checklist**:
- [ ] Each test case has code template
- [ ] Assertions are explicit
- [ ] Happy path tested
- [ ] Error cases tested
- [ ] Edge cases tested
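
As an illustration of the level of detail a high-scoring work-order specifies, here are test-case templates for a hypothetical `slugify()` helper. The function and assertions are stand-ins, not part of the skill:

```python
import pytest

def slugify(title: str) -> str:
    """Stand-in implementation so the example runs; a real work-order would
    reference the project's own function."""
    if not title.strip():
        raise ValueError("title must not be empty")
    return "-".join(title.lower().split())

def test_slugify_happy_path():
    assert slugify("Hello World") == "hello-world"

def test_slugify_empty_title_raises():  # error case
    with pytest.raises(ValueError):
        slugify("   ")

def test_slugify_collapses_internal_whitespace():  # edge case
    assert slugify("a   b") == "a-b"
```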

## Calculating Overall Score

### For PRPs
```
Overall = (Context + Implementation + Gotchas + Validation) / 4
```

### For Work-Orders
```
Overall = (Context + Gotchas + TestCoverage + Validation) / 4
```
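
Expressed as code, both formulas are a plain average over four dimensions. A minimal sketch; the function names are illustrative:

```python
def prp_score(context: float, implementation: float, gotchas: float, validation: float) -> float:
    return round((context + implementation + gotchas + validation) / 4, 1)

def work_order_score(context: float, gotchas: float, test_coverage: float, validation: float) -> float:
    return round((context + gotchas + test_coverage + validation) / 4, 1)

# Example 1 below: (9 + 8 + 8 + 9) / 4 = 8.5
assert prp_score(9, 8, 8, 9) == 8.5
```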

## Score Thresholds

| Score | Readiness | Recommendation |
|-------|-----------|----------------|
| **9-10** | Excellent | Ready for autonomous subagent execution |
| **7-8** | Good | Ready for execution with some discovery |
| **5-6** | Fair | Needs refinement before execution |
| **3-4** | Poor | Significant gaps, recommend research phase |
| **1-2** | Inadequate | Restart with proper research |
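
For fractional averages (e.g. 8.5), one reasonable reading of the bands is to treat each threshold as a lower bound, which matches how the examples below classify 8.5 as ready for execution:

```python
def readiness(overall: float) -> str:
    """Map an overall score onto the readiness bands above (lower-bound reading)."""
    if overall >= 9:
        return "Excellent: ready for autonomous subagent execution"
    if overall >= 7:
        return "Good: ready for execution with some discovery"
    if overall >= 5:
        return "Fair: needs refinement before execution"
    if overall >= 3:
        return "Poor: significant gaps, recommend research phase"
    return "Inadequate: restart with proper research"
```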

## Response Templates

### High Confidence (7+)

```markdown
## Confidence Score: X.X/10

| Dimension | Score | Notes |
|-----------|-------|-------|
| Context Completeness | X/10 | [specific observation] |
| Implementation Clarity | X/10 | [specific observation] |
| Gotchas Documented | X/10 | [specific observation] |
| Validation Coverage | X/10 | [specific observation] |
| **Overall** | **X.X/10** | |

**Assessment:** Ready for execution

**Strengths:**
- [Key strength 1]
- [Key strength 2]

**Recommendations (optional):**
- [Minor improvement 1]
```

### Low Confidence (<7)

```markdown
## Confidence Score: X.X/10

| Dimension | Score | Notes |
|-----------|-------|-------|
| Context Completeness | X/10 | [specific gap] |
| Implementation Clarity | X/10 | [specific gap] |
| Gotchas Documented | X/10 | [specific gap] |
| Validation Coverage | X/10 | [specific gap] |
| **Overall** | **X.X/10** | |

**Assessment:** Needs refinement before execution

**Gaps to Address:**
- [ ] [Gap 1 with suggested action]
- [ ] [Gap 2 with suggested action]
- [ ] [Gap 3 with suggested action]

**Next Steps:**
1. [Specific research action]
2. [Specific documentation action]
3. [Specific validation action]
```

## Examples

### Example 1: Well-Prepared PRP

```markdown
## Confidence Score: 8.5/10

| Dimension | Score | Notes |
|-----------|-------|-------|
| Context Completeness | 9/10 | All files explicit, code snippets with line refs |
| Implementation Clarity | 8/10 | Pseudocode covers main path, one edge case unclear |
| Gotchas Documented | 8/10 | Redis connection pool, JWT format issues covered |
| Validation Coverage | 9/10 | All gates have commands, outcomes specified |
| **Overall** | **8.5/10** | |

**Assessment:** Ready for execution

**Strengths:**
- Comprehensive codebase intelligence with actual code snippets
- Validation gates are copy-pasteable
- Known library gotchas well-documented

**Recommendations:**
- Consider documenting concurrent token refresh edge case
```

### Example 2: Needs Work

```markdown
## Confidence Score: 5.0/10

| Dimension | Score | Notes |
|-----------|-------|-------|
| Context Completeness | 4/10 | File paths vague ("somewhere in auth/") |
| Implementation Clarity | 6/10 | High-level approach clear, no pseudocode |
| Gotchas Documented | 3/10 | No library-specific gotchas |
| Validation Coverage | 7/10 | Test command present, missing lint/type check |
| **Overall** | **5.0/10** | |

**Assessment:** Needs refinement before execution

**Gaps to Address:**
- [ ] Add explicit file paths (use `grep` to find them)
- [ ] Add pseudocode for token generation logic
- [ ] Research jsonwebtoken gotchas (check GitHub issues)
- [ ] Add linting and type checking commands

**Next Steps:**
1. Run `/prp:curate-docs jsonwebtoken` to create ai_docs entry
2. Use Explore agent to find exact file locations
3. Add validation gate commands from project's package.json
```

## Integration with Blueprint Development

This skill is automatically applied when:
- `/prp:create` generates a new PRP
- `/blueprint:work-order` generates a work-order
- Reviewing existing PRPs for execution readiness

The confidence score determines:
- **9+**: Proceed with subagent delegation
- **7-8**: Proceed with direct execution
- **< 7**: Refine before execution

## Tips for Improving Scores

### Context Completeness
- Use `grep` to find exact file locations
- Include actual line numbers in code snippets
- Reference ai_docs entries for library patterns

### Implementation Clarity
- Write pseudocode before describing approach
- Enumerate edge cases explicitly
- Define error handling strategy

### Gotchas Documented
- Search GitHub issues for library gotchas
- Check Stack Overflow for common problems
- Document team experience from past projects

### Validation Coverage
- Copy commands from project's config (package.json, pyproject.toml)
- Include specific file paths in test commands
- Specify expected outcomes for each gate
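
As a sketch of the last tip, validation commands can be lifted straight from the project's own scripts rather than written from memory. The script names below are assumptions; a Python project would read pyproject.toml or a Makefile instead:

```python
import json
from pathlib import Path

def gates_from_package_json(project_root: str = ".") -> dict[str, str]:
    """Build gate commands from package.json scripts that actually exist."""
    scripts = json.loads((Path(project_root) / "package.json").read_text())["scripts"]
    wanted = ["lint", "typecheck", "test", "coverage"]  # assumed script names
    return {name: f"npm run {name}" for name in wanted if name in scripts}
```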

Overview

This skill assesses the quality of PRPs (Product Requirement Prompts) and work-orders using a systematic confidence scoring model to determine readiness for execution or delegation. It produces per-dimension scores, an overall score, and actionable recommendations to refine items before running subagents. The output is designed to be copy-pasteable into reviews and automation gates.

How this skill works

The skill evaluates five dimensions: Context Completeness, Implementation Clarity, Gotchas Documented, Validation Coverage, and (for work-orders) Test Coverage. Each dimension is scored 1–10 against its checklist, and the dimension scores are averaged into an overall score using the PRP or work-order formula. The skill then emits a templated assessment (score table, strengths, gaps, and next steps) plus a readiness recommendation for delegation or further refinement.

When to use it

  • After creating a new PRP (/prp:create) to verify execution readiness
  • When generating a work-order (/blueprint:work-order) to confirm test and validation coverage
  • Before delegating tasks to subagents to avoid wasted cycles
  • During PRP or work-order reviews to identify gaps and mitigations
  • When deciding whether to run an automated execution pipeline or pause for refinement

Best practices

  • Include absolute or clearly relative file paths and line ranges in examples
  • Provide pseudocode and an explicit task breakdown for complex logic
  • Document known library/version-specific gotchas with mitigation steps
  • Add executable validation commands with expected outcomes (lint, tests, coverage, security scans)
  • Specify concrete test cases and assertions for work-orders to improve test coverage scores

Example use cases

  • Score a newly authored PRP to decide if it can be delegated to an autonomous subagent
  • Validate a generated work-order to ensure test cases and validation gates are present before CI runs
  • Review legacy task descriptions and convert them into high-confidence PRPs for safe automation
  • Triage incoming implementation requests to determine research vs execution tracks
  • Produce a concise remediation checklist for a PRP rated below the readiness threshold

FAQ

What overall score is required to delegate to a subagent?

A score of 9–10 indicates excellent readiness for autonomous subagent execution; 7–8 is acceptable for direct execution with minor discovery; below 7 requires refinement.

Which dimensions matter most for PRPs vs work-orders?

For PRPs the scored dimensions are Context Completeness, Implementation Clarity, Gotchas Documented, and Validation Coverage. For work-orders, Test Coverage replaces Implementation Clarity; the other three dimensions are unchanged.

How are validation commands expected to be provided?

Validation commands should be executable copy-paste commands from project configs (package.json, pyproject.toml) and include expected outcomes or thresholds.

What immediate actions follow a low-confidence assessment?

Provide a short remediation checklist: add explicit file paths, write pseudocode, document gotchas with mitigations, and add executable validation/test commands.