
confidence-scoring skill

/blueprint-plugin/skills/confidence-scoring

This skill assesses confidence in PRPs and work-orders, using structured scoring to determine readiness for execution or subagent delegation.

npx playbooks add skill laurigates/claude-plugins --skill confidence-scoring


---
model: haiku
created: 2025-12-16
modified: 2026-02-06
reviewed: 2025-12-16
name: confidence-scoring
description: "Assess quality of PRPs and work-orders using systematic confidence scoring. Use when evaluating readiness for execution or subagent delegation."
allowed-tools: Read, Grep, Glob
---

# Confidence Scoring for PRPs and Work-Orders

This skill provides systematic evaluation of PRPs (Product Requirement Prompts) and work-orders to determine their readiness for execution or delegation.

## When to Use This Skill

Activate this skill when:
- Creating a new PRP (`/prp:create`)
- Generating a work-order (`/blueprint:work-order`)
- Deciding whether to execute or refine a PRP
- Evaluating whether a task is ready for subagent delegation
- Reviewing PRPs/work-orders for quality

## Scoring Dimensions

### 1. Context Completeness (1-10)

Evaluates whether all necessary context is explicitly provided.

| Score | Criteria |
|-------|----------|
| **10** | All file paths explicit with line numbers, all code snippets included, library versions specified, integration points documented |
| **8-9** | Most context provided, minor gaps that can be inferred from codebase |
| **6-7** | Key context present but some discovery required |
| **4-5** | Significant context missing, will need exploration |
| **1-3** | Minimal context, extensive discovery needed |

**Checklist**:
- [ ] File paths are absolute or clearly relative to project root
- [ ] Code snippets include actual line numbers (e.g., `src/auth.py:45-60`)
- [ ] Library versions are specified
- [ ] Integration points are documented
- [ ] Patterns from codebase are shown with examples
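
The first two checklist items can be spot-checked mechanically. A minimal sketch, assuming the PRP text is available as a string; the regex and helper name are illustrative, not part of the skill:

```python
import re
from pathlib import Path

# Matches references like "src/auth.py:45-60" or "src/auth.py:45".
FILE_REF = re.compile(r"(?P<path>[\w./-]+\.\w+):(?P<start>\d+)(?:-(?P<end>\d+))?")

def check_file_references(prp_text: str, project_root: str = ".") -> list[str]:
    """Return a list of problems found with file:line references in a PRP."""
    problems = []
    for match in FILE_REF.finditer(prp_text):
        path = Path(project_root) / match["path"]
        if not path.is_file():
            problems.append(f"missing file: {match['path']}")
            continue
        line_count = sum(1 for _ in path.open(encoding="utf-8"))
        end = int(match["end"] or match["start"])
        if end > line_count:
            problems.append(
                f"line range past end of file: {match.group(0)} "
                f"(file has {line_count} lines)"
            )
    return problems
```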

### 2. Implementation Clarity (1-10)

Evaluates how clear the implementation approach is.

| Score | Criteria |
|-------|----------|
| **10** | Pseudocode covers all cases, step-by-step clear, edge cases addressed |
| **8-9** | Main path clear, most edge cases covered |
| **6-7** | Implementation approach clear, some details need discovery |
| **4-5** | High-level only, significant ambiguity |
| **1-3** | Vague requirements, unclear approach |

**Checklist**:
- [ ] Task breakdown is explicit
- [ ] Pseudocode is provided for complex logic
- [ ] Implementation order is specified
- [ ] Edge cases are identified
- [ ] Error handling approach is documented
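
For the pseudocode item, the goal is code-shaped reasoning that names the main path and each edge case, not production code. A sketch for a hypothetical token-refresh task; every name here is illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

class ReauthenticationRequired(Exception):
    """Raised when the refresh flow cannot continue without a fresh login."""

@dataclass
class Session:
    refresh_token: str | None
    refresh_expires_at: datetime

def refresh_access_token(session: Session, now: datetime | None = None) -> str:
    now = now or datetime.now(timezone.utc)
    # Edge case 1: no refresh token on the session -> force re-authentication.
    if session.refresh_token is None:
        raise ReauthenticationRequired("no refresh token")
    # Edge case 2: refresh token already expired -> same outcome.
    if session.refresh_expires_at <= now:
        raise ReauthenticationRequired("refresh token expired")
    # Main path: exchange the refresh token for a new access token.
    # (A real PRP would name the project's actual auth client call here.)
    return f"access-token-derived-from:{session.refresh_token}"
```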

### 3. Gotchas Documented (1-10)

Evaluates whether known pitfalls are documented with mitigations.

| Score | Criteria |
|-------|----------|
| **10** | All known pitfalls documented, each has mitigation, library-specific issues covered |
| **8-9** | Major gotchas covered, mitigations clear |
| **6-7** | Some gotchas documented, may discover more |
| **4-5** | Few gotchas mentioned, incomplete coverage |
| **1-3** | No gotchas documented |

**Checklist**:
- [ ] Library-specific gotchas documented
- [ ] Version-specific behaviors noted
- [ ] Common mistakes identified
- [ ] Each gotcha has a mitigation
- [ ] Race conditions/concurrency issues addressed

### 4. Validation Coverage (1-10)

Evaluates whether executable validation commands are provided.

| Score | Criteria |
|-------|----------|
| **10** | All quality gates have executable commands, expected outcomes specified |
| **8-9** | Main validation commands present, most outcomes specified |
| **6-7** | Some validation commands, gaps in coverage |
| **4-5** | Minimal validation commands |
| **1-3** | No executable validation |

**Checklist**:
- [ ] Linting command provided and executable
- [ ] Type checking command provided (if applicable)
- [ ] Unit test command with specific test files
- [ ] Integration test command (if applicable)
- [ ] Coverage check command with threshold
- [ ] Security scan command (if applicable)
- [ ] All commands include expected outcomes
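
To make "executable commands with expected outcomes" concrete, here is a minimal sketch of gates expressed as data and run in order. The commands and test file name are placeholders for whatever the project's own config defines:

```python
import subprocess

# Placeholder gates; a real PRP copies these from package.json / pyproject.toml.
GATES = {
    "lint":       {"cmd": ["npm", "run", "lint"],                "expect": "exit 0, no warnings"},
    "type-check": {"cmd": ["npx", "tsc", "--noEmit"],            "expect": "exit 0"},
    "unit-tests": {"cmd": ["npm", "test", "--", "auth.test.ts"], "expect": "all tests pass"},
}

def run_gates(gates: dict) -> bool:
    """Run each gate and report pass/fail against its expected outcome."""
    all_passed = True
    for name, gate in gates.items():
        result = subprocess.run(gate["cmd"], capture_output=True, text=True)
        passed = result.returncode == 0
        all_passed = all_passed and passed
        print(f"{name}: {'PASS' if passed else 'FAIL'} (expected: {gate['expect']})")
    return all_passed
```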

### 5. Test Coverage (1-10) - Work-Orders Only

Evaluates whether test cases are specified.

| Score | Criteria |
|-------|----------|
| **10** | All test cases specified with assertions, edge cases covered |
| **8-9** | Main test cases specified, most assertions included |
| **6-7** | Key test cases present, some gaps |
| **4-5** | Few test cases, minimal detail |
| **1-3** | No test cases specified |

**Checklist**:
- [ ] Each test case has code template
- [ ] Assertions are explicit
- [ ] Happy path tested
- [ ] Error cases tested
- [ ] Edge cases tested
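
As an illustration of the level of detail a high-scoring work-order specifies, here are test-case templates for a hypothetical `slugify()` helper. The function and assertions are stand-ins, not part of the skill:

```python
import pytest

def slugify(title: str) -> str:
    """Stand-in implementation so the example runs; a real work-order would
    reference the project's own function."""
    if not title.strip():
        raise ValueError("title must not be empty")
    return "-".join(title.lower().split())

def test_slugify_happy_path():
    assert slugify("Hello World") == "hello-world"

def test_slugify_empty_title_raises():  # error case
    with pytest.raises(ValueError):
        slugify("   ")

def test_slugify_collapses_internal_whitespace():  # edge case
    assert slugify("a   b") == "a-b"
```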

## Calculating Overall Score

### For PRPs
```
Overall = (Context + Implementation + Gotchas + Validation) / 4
```

### For Work-Orders
```
Overall = (Context + Gotchas + TestCoverage + Validation) / 4
```
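
Expressed as code, both formulas are a plain average over four dimensions. A minimal sketch; the function names are illustrative:

```python
def prp_score(context: float, implementation: float, gotchas: float, validation: float) -> float:
    return round((context + implementation + gotchas + validation) / 4, 1)

def work_order_score(context: float, gotchas: float, test_coverage: float, validation: float) -> float:
    return round((context + gotchas + test_coverage + validation) / 4, 1)

# Example 1 below: (9 + 8 + 8 + 9) / 4 = 8.5
assert prp_score(9, 8, 8, 9) == 8.5
```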

## Score Thresholds

| Score | Readiness | Recommendation |
|-------|-----------|----------------|
| **9-10** | Excellent | Ready for autonomous subagent execution |
| **7-8** | Good | Ready for execution with some discovery |
| **5-6** | Fair | Needs refinement before execution |
| **3-4** | Poor | Significant gaps, recommend research phase |
| **1-2** | Inadequate | Restart with proper research |
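
For fractional averages (e.g. 8.5), one reasonable reading of the bands is to treat each threshold as a lower bound, which matches how the examples below classify 8.5 as ready for execution:

```python
def readiness(overall: float) -> str:
    """Map an overall score onto the readiness bands above (lower-bound reading)."""
    if overall >= 9:
        return "Excellent: ready for autonomous subagent execution"
    if overall >= 7:
        return "Good: ready for execution with some discovery"
    if overall >= 5:
        return "Fair: needs refinement before execution"
    if overall >= 3:
        return "Poor: significant gaps, recommend research phase"
    return "Inadequate: restart with proper research"
```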

## Response Templates

### High Confidence (7+)

```markdown
## Confidence Score: X.X/10

| Dimension | Score | Notes |
|-----------|-------|-------|
| Context Completeness | X/10 | [specific observation] |
| Implementation Clarity | X/10 | [specific observation] |
| Gotchas Documented | X/10 | [specific observation] |
| Validation Coverage | X/10 | [specific observation] |
| **Overall** | **X.X/10** | |

**Assessment:** Ready for execution

**Strengths:**
- [Key strength 1]
- [Key strength 2]

**Recommendations (optional):**
- [Minor improvement 1]
```

### Low Confidence (<7)

```markdown
## Confidence Score: X.X/10

| Dimension | Score | Notes |
|-----------|-------|-------|
| Context Completeness | X/10 | [specific gap] |
| Implementation Clarity | X/10 | [specific gap] |
| Gotchas Documented | X/10 | [specific gap] |
| Validation Coverage | X/10 | [specific gap] |
| **Overall** | **X.X/10** | |

**Assessment:** Needs refinement before execution

**Gaps to Address:**
- [ ] [Gap 1 with suggested action]
- [ ] [Gap 2 with suggested action]
- [ ] [Gap 3 with suggested action]

**Next Steps:**
1. [Specific research action]
2. [Specific documentation action]
3. [Specific validation action]
```

## Examples

### Example 1: Well-Prepared PRP

```markdown
## Confidence Score: 8.5/10

| Dimension | Score | Notes |
|-----------|-------|-------|
| Context Completeness | 9/10 | All files explicit, code snippets with line refs |
| Implementation Clarity | 8/10 | Pseudocode covers main path, one edge case unclear |
| Gotchas Documented | 8/10 | Redis connection pool, JWT format issues covered |
| Validation Coverage | 9/10 | All gates have commands, outcomes specified |
| **Overall** | **8.5/10** | |

**Assessment:** Ready for execution

**Strengths:**
- Comprehensive codebase intelligence with actual code snippets
- Validation gates are copy-pasteable
- Known library gotchas well-documented

**Recommendations:**
- Consider documenting concurrent token refresh edge case
```

### Example 2: Needs Work

```markdown
## Confidence Score: 5.0/10

| Dimension | Score | Notes |
|-----------|-------|-------|
| Context Completeness | 4/10 | File paths vague ("somewhere in auth/") |
| Implementation Clarity | 6/10 | High-level approach clear, no pseudocode |
| Gotchas Documented | 3/10 | No library-specific gotchas |
| Validation Coverage | 7/10 | Test command present, missing lint/type check |
| **Overall** | **5.0/10** | |

**Assessment:** Needs refinement before execution

**Gaps to Address:**
- [ ] Add explicit file paths (use `grep` to find them)
- [ ] Add pseudocode for token generation logic
- [ ] Research jsonwebtoken gotchas (check GitHub issues)
- [ ] Add linting and type checking commands

**Next Steps:**
1. Run `/prp:curate-docs jsonwebtoken` to create ai_docs entry
2. Use Explore agent to find exact file locations
3. Add validation gate commands from project's package.json
```

## Integration with Blueprint Development

This skill is automatically applied when:
- `/prp:create` generates a new PRP
- `/blueprint:work-order` generates a work-order
- Reviewing existing PRPs for execution readiness

The confidence score determines:
- **9+**: Proceed with subagent delegation
- **7-8**: Proceed with direct execution
- **< 7**: Refine before execution

## Tips for Improving Scores

### Context Completeness
- Use `grep` to find exact file locations
- Include actual line numbers in code snippets
- Reference ai_docs entries for library patterns

### Implementation Clarity
- Write pseudocode before describing approach
- Enumerate edge cases explicitly
- Define error handling strategy

### Gotchas Documented
- Search GitHub issues for library gotchas
- Check Stack Overflow for common problems
- Document team experience from past projects

### Validation Coverage
- Copy commands from project's config (package.json, pyproject.toml)
- Include specific file paths in test commands
- Specify expected outcomes for each gate
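
As a sketch of the last tip, validation commands can be lifted straight from the project's own scripts rather than written from memory. The script names below are assumptions; a Python project would read pyproject.toml or a Makefile instead:

```python
import json
from pathlib import Path

def gates_from_package_json(project_root: str = ".") -> dict[str, str]:
    """Build gate commands from package.json scripts that actually exist."""
    scripts = json.loads((Path(project_root) / "package.json").read_text())["scripts"]
    wanted = ["lint", "typecheck", "test", "coverage"]  # assumed script names
    return {name: f"npm run {name}" for name in wanted if name in scripts}
```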

Overview

This skill assesses the quality of PRPs (Product Requirement Prompts) and work-orders using a systematic confidence scoring model to determine readiness for execution or delegation. It produces per-dimension scores, an overall score, and actionable recommendations to refine items before running subagents. The output is designed to be copy-pasteable into reviews and automation gates.

How this skill works

The skill evaluates five dimensions: Context Completeness, Implementation Clarity, Gotchas Documented, Validation Coverage, and (for work-orders) Test Coverage. Each dimension is scored 1–10 against its checklist, and the dimension scores are averaged into an overall score using the PRP or work-order formula. The skill then emits a templated assessment (score table, strengths, gaps, and next steps) plus a readiness recommendation for delegation or further refinement.

When to use it

  • After creating a new PRP (/prp:create) to verify execution readiness
  • When generating a work-order (/blueprint:work-order) to confirm test and validation coverage
  • Before delegating tasks to subagents to avoid wasted cycles
  • During PRP or work-order reviews to identify gaps and mitigations
  • When deciding whether to run an automated execution pipeline or pause for refinement

Best practices

  • Include absolute or clearly relative file paths and line ranges in examples
  • Provide pseudocode and an explicit task breakdown for complex logic
  • Document known library/version-specific gotchas with mitigation steps
  • Add executable validation commands with expected outcomes (lint, tests, coverage, security scans)
  • Specify concrete test cases and assertions for work-orders to improve test coverage scores

Example use cases

  • Score a newly authored PRP to decide if it can be delegated to an autonomous subagent
  • Validate a generated work-order to ensure test cases and validation gates are present before CI runs
  • Review legacy task descriptions and convert them into high-confidence PRPs for safe automation
  • Triage incoming implementation requests to determine research vs execution tracks
  • Produce a concise remediation checklist for a PRP rated below the readiness threshold

FAQ

What overall score is required to delegate to a subagent?

A score of 9–10 indicates excellent readiness for autonomous subagent execution; 7–8 is acceptable for direct execution with minor discovery; below 7 requires refinement.

Which dimensions matter most for PRPs vs work-orders?

For PRPs the scored dimensions are Context Completeness, Implementation Clarity, Gotchas Documented, and Validation Coverage. For work-orders, Test Coverage replaces Implementation Clarity; the other three dimensions are unchanged.

How are validation commands expected to be provided?

Validation commands should be executable copy-paste commands from project configs (package.json, pyproject.toml) and include expected outcomes or thresholds.

What immediate actions follow a low-confidence assessment?

Provide a short remediation checklist: add explicit file paths, write pseudocode, document gotchas with mitigations, and add executable validation/test commands.