
baseline-quality-assessment skill


This skill helps you plan iteration 0 with a data-driven baseline (V_meta ≥0.40), enabling rapid convergence and clear prioritization.

npx playbooks add skill zpankz/mcp-skillset --skill baseline-quality-assessment

Copy the command above to add this skill to your agents.

---
name: Baseline Quality Assessment
description: Achieve comprehensive baseline (V_meta ≥0.40) in iteration 0 to enable rapid convergence. Use when planning iteration 0 time allocation, domain has established practices to reference, rich historical data exists for immediate quantification, or targeting 3-4 iteration convergence. Provides 4 quality levels (minimal/basic/comprehensive/exceptional), component-by-component V_meta calculation guide, and 3 strategies for comprehensive baseline (leverage prior art, quantify baseline, domain universality analysis). 40-50% iteration reduction when V_meta(s₀) ≥0.40 vs <0.20. Spend 3-4 extra hours in iteration 0, save 3-6 hours overall.
allowed-tools: Read, Grep, Glob, Bash, Edit, Write
---

# Baseline Quality Assessment

**Invest in iteration 0 to save 40-50% of total time.**

> A strong baseline (V_meta ≥0.40) is the foundation of rapid convergence. Spend hours in iteration 0 to save days overall.

---

## When to Use This Skill

Use this skill when:
- 📋 **Planning iteration 0**: Deciding time allocation and priorities
- 🎯 **Targeting rapid convergence**: Want 3-4 iterations (not 5-7)
- 📚 **Prior art exists**: Domain has established practices to reference
- 📊 **Historical data available**: Can quantify baseline immediately
- ⏰ **Time constraints**: Need methodology in 10-15 hours total
- 🔍 **Gap clarity needed**: Want obvious iteration objectives

**Don't use when**:
- ❌ Exploratory domain (no prior art)
- ❌ Greenfield project (no historical data)
- ❌ Time abundant (standard convergence acceptable)
- ❌ Incremental baseline acceptable (build up gradually)

---

## Quick Start (30 minutes)

### Baseline Quality Self-Assessment

Calculate your V_meta(s₀):

**V_meta = (Completeness + Effectiveness + Reusability + Validation) / 4**

**Completeness** (Documentation exists?):
- 0.00: No documentation
- 0.25: Basic notes only
- 0.50: Partial documentation (some categories)
- 0.75: Most documentation complete
- 1.00: Comprehensive documentation

**Effectiveness** (Speedup quantified?):
- 0.00: No baseline measurement
- 0.25: Informal estimates
- 0.50: Some metrics measured
- 0.75: Most metrics quantified
- 1.00: Full quantitative baseline

**Reusability** (Transferable patterns?):
- 0.00: No patterns identified
- 0.25: Ad-hoc solutions only
- 0.50: Some patterns emerging
- 0.75: Most patterns codified
- 1.00: Universal patterns documented

**Validation** (Evidence-based?):
- 0.00: No validation
- 0.25: Anecdotal only
- 0.50: Some data analysis
- 0.75: Systematic analysis
- 1.00: Comprehensive validation

**Example** (Bootstrap-003, V_meta(s₀) = 0.48):
```
Completeness: 0.60 (10-category taxonomy, 79.1% coverage)
Effectiveness: 0.40 (Error rate quantified: 5.78%)
Reusability: 0.40 (5 workflows, 5 patterns, 8 guidelines)
Validation: 0.50 (1,336 errors analyzed)
---
V_meta(s₀) = (0.60 + 0.40 + 0.40 + 0.50) / 4 = 0.475 ≈ 0.48
```

**Target**: V_meta(s₀) ≥ 0.40 for rapid convergence
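A minimal Python sketch of this calculation, reusing the Bootstrap-003 scores from the example above (the function is illustrative, not part of any tooling shipped with this skill):

```python
# Sketch of the V_meta self-assessment: four components scored 0.00-1.00,
# averaged with equal weights, then compared to the 0.40 target.

def v_meta(completeness: float, effectiveness: float,
           reusability: float, validation: float) -> float:
    """Average of the four baseline components."""
    return (completeness + effectiveness + reusability + validation) / 4

# Bootstrap-003 baseline from the example above:
score = v_meta(completeness=0.60, effectiveness=0.40,
               reusability=0.40, validation=0.50)
print(f"V_meta(s0) = {score:.3f}")          # 0.475, reported as 0.48
print("comprehensive baseline" if score >= 0.40 else "below target")
```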

---

## Four Baseline Quality Levels

### Level 1: Minimal (V_meta <0.20)

**Characteristics**:
- No or minimal documentation
- No quantitative metrics
- No pattern identification
- No validation

**Iteration 0 time**: 1-2 hours
**Total iterations**: 6-10 (standard to slow convergence)
**Example**: Starting from scratch in novel domain

**When acceptable**: Exploratory research, no prior art

### Level 2: Basic (V_meta 0.20-0.39)

**Characteristics**:
- Basic documentation (notes, informal structure)
- Some metrics identified (not quantified)
- Ad-hoc patterns (not codified)
- Anecdotal validation

**Iteration 0 time**: 2-3 hours
**Total iterations**: 5-7 (standard convergence)
**Example**: Bootstrap-002 (V_meta(s₀) = 0.04, but quickly built to basic)

**When acceptable**: Standard timelines, incremental approach

### Level 3: Comprehensive (V_meta 0.40-0.60) ⭐ TARGET

**Characteristics**:
- Structured documentation (taxonomy, categories)
- Quantified metrics (baseline measured)
- Codified patterns (initial pattern library)
- Systematic validation (data analysis)

**Iteration 0 time**: 3-5 hours
**Total iterations**: 3-4 (rapid convergence)
**Example**: Bootstrap-003 (V_meta(s₀) = 0.48, converged in 3 iterations)

**When to target**: Time constrained, prior art exists, data available

### Level 4: Exceptional (V_meta >0.60)

**Characteristics**:
- Comprehensive documentation (≥90% coverage)
- Full quantitative baseline (all metrics)
- Extensive pattern library
- Validated methodology (proven in 1+ contexts)

**Iteration 0 time**: 5-8 hours
**Total iterations**: 2-3 (exceptional rapid convergence)
**Example**: Hypothetical (not yet observed in experiments)

**When to target**: Adaptation of proven methodology, domain expertise high
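The level boundaries and iteration estimates above can be encoded as a small lookup; the sketch below only restates the thresholds quoted in this section:

```python
# Sketch: map a V_meta(s0) score to its quality level and the total
# iteration range quoted in the level descriptions above.

def baseline_level(v_meta: float) -> tuple[str, str]:
    """Return (quality level, expected total iterations)."""
    if v_meta < 0.20:
        return "minimal", "6-10 iterations"
    if v_meta < 0.40:
        return "basic", "5-7 iterations"
    if v_meta <= 0.60:
        return "comprehensive", "3-4 iterations"
    return "exceptional", "2-3 iterations"

print(baseline_level(0.48))   # ('comprehensive', '3-4 iterations')
```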

---

## Three Strategies for Comprehensive Baseline

### Strategy 1: Leverage Prior Art (2-3 hours)

**When**: Domain has established practices

**Steps**:

1. **Literature review** (30 min):
   - Industry best practices
   - Existing methodologies
   - Academic research

2. **Extract patterns** (60 min):
   - Common approaches
   - Known anti-patterns
   - Success metrics

3. **Adapt to context** (60 min):
   - What's applicable?
   - What needs modification?
   - What's missing?

**Example** (Bootstrap-003):
```
Prior art: Error handling literature
- Detection: Industry standard (logs, monitoring)
- Diagnosis: Root cause analysis patterns
- Recovery: Retry, fallback patterns
- Prevention: Static analysis, linting

Adaptation:
- Detection: meta-cc MCP queries (novel application)
- Diagnosis: Session history analysis (context-specific)
- Recovery: Generic patterns apply
- Prevention: Pre-tool validation (novel approach)

Result: V_completeness = 0.60 (60% from prior art, 40% novel)
```
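One way to keep the adaptation step honest is to catalogue each prior-art pattern together with how it carries over. The sketch below is illustrative; the `PriorArtPattern` structure and its entries are assumptions loosely based on the Bootstrap-003 example above, not a prescribed format:

```python
# Sketch of the "adapt to context" step: record each prior-art pattern
# and whether it applies as-is, needs adaptation, or is a gap to fill.
from dataclasses import dataclass

@dataclass
class PriorArtPattern:
    name: str
    source: str           # where the pattern comes from
    applicability: str    # "applies as-is" | "needs adaptation" | "gap"
    notes: str = ""

patterns = [
    PriorArtPattern("retry/fallback recovery", "error-handling literature", "applies as-is"),
    PriorArtPattern("root cause analysis", "error-handling literature", "needs adaptation",
                    "diagnose from session history rather than server logs"),
    PriorArtPattern("pre-tool validation", "none found", "gap",
                    "novel approach, to be developed in later iterations"),
]

needs_work = [p.name for p in patterns if p.applicability != "applies as-is"]
print(f"{len(patterns)} patterns catalogued; needs adaptation or new work: {needs_work}")
```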

### Strategy 2: Quantify Baseline (1-2 hours)

**When**: Rich historical data exists

**Steps**:

1. **Identify data sources** (15 min):
   - Logs, session history, metrics
   - Git history, CI/CD logs
   - Issue trackers, user feedback

2. **Extract metrics** (30 min):
   - Volume (total instances)
   - Rate (frequency)
   - Distribution (categories)
   - Impact (cost)

3. **Analyze patterns** (45 min):
   - What's most common?
   - What's most costly?
   - What's preventable?

**Example** (Bootstrap-003):
```
Data source: meta-cc MCP server
Query: meta-cc query-tools --status error

Results:
- Volume: 1,336 errors
- Rate: 5.78% error rate
- Distribution: File-not-found 12.2%, Read-before-write 5.2%, etc.
- Impact: MTTD 15 min, MTTR 30 min

Analysis:
- Top 3 categories account for 23.7% of errors
- File path issues most preventable
- Clear automation opportunities

Result: V_effectiveness = 0.40 (baseline quantified)
```
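A hedged sketch of the extraction step, assuming tool-call records have already been exported to a JSONL file. The field names (`status`, `error_type`) and the file name are assumptions for illustration, not the actual meta-cc output format:

```python
# Sketch: compute volume, rate, and distribution from a JSONL export of
# tool calls. Adapt the field names to whatever your data source emits.
import json
from collections import Counter

def quantify_baseline(path: str) -> dict:
    """Volume, rate, and category distribution of error records."""
    with open(path) as f:
        calls = [json.loads(line) for line in f if line.strip()]
    errors = [c for c in calls if c.get("status") == "error"]
    distribution = Counter(e.get("error_type", "unknown") for e in errors)
    return {
        "volume": len(errors),                                # total instances
        "rate": len(errors) / len(calls) if calls else 0.0,   # frequency
        "distribution": distribution.most_common(5),          # category breakdown
    }

# Usage (hypothetical file): print(quantify_baseline("tool_calls.jsonl"))
```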

### Strategy 3: Domain Universality Analysis (1-2 hours)

**When**: Domain is universal (errors, testing, CI/CD)

**Steps**:

1. **Identify universal patterns** (30 min):
   - What applies to all projects?
   - What's language-agnostic?
   - What's platform-agnostic?

2. **Document transferability** (30 min):
   - What % is reusable?
   - What needs adaptation?
   - What's project-specific?

3. **Create initial taxonomy** (30 min):
   - Categorize patterns
   - Identify gaps
   - Estimate coverage

**Example** (Bootstrap-003):
```
Universal patterns:
- Errors affect all software (100% universal)
- Detection, diagnosis, recovery, prevention (universal workflow)
- File operations, API calls, data validation (universal categories)

Taxonomy (iteration 0):
- 10 categories identified
- 1,058 errors classified (79.1% coverage)
- Gaps: Edge cases, complex interactions

Result: V_reusability = 0.40 (universal patterns identified)
```
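The coverage estimate in step 3 is simply the share of observed instances that fall into the categories defined so far. The sketch below uses illustrative per-category counts; only the 1,336 total comes from the example above:

```python
# Sketch of the taxonomy coverage estimate for iteration 0.

def taxonomy_coverage(classified: dict[str, int], total: int) -> float:
    """Fraction of observed instances covered by the current categories."""
    return sum(classified.values()) / total if total else 0.0

# Counts below are illustrative -- replace with your own classification results.
classified = {"file-not-found": 163, "read-before-write": 69, "other known categories": 826}
print(f"coverage = {taxonomy_coverage(classified, total=1336):.0%}")  # remainder = gaps
```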

---

## Baseline Investment ROI

**Trade-off**: Spend more in iteration 0 to save overall time

**Data** (from experiments):

| Baseline | Iter 0 Time | Total Iterations | Total Time | Savings |
|----------|-------------|------------------|------------|---------|
| Minimal (<0.20) | 1-2h | 6-10 | 24-40h | Baseline |
| Basic (0.20-0.39) | 2-3h | 5-7 | 20-28h | 10-30% |
| Comprehensive (0.40-0.60) | 3-5h | 3-4 | 12-16h | 40-50% |
| Exceptional (>0.60) | 5-8h | 2-3 | 10-15h | 50-60% |

**Example** (Bootstrap-003):
```
Comprehensive baseline:
- Iteration 0: 3 hours (vs 1 hour minimal)
- Total: 10 hours, 3 iterations
- Savings: 15-25 hours vs minimal baseline (60-70%)

ROI: +2 hours investment → 15-25 hours saved
```
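The trade-off in the table can be checked with simple range arithmetic; the sketch below only reuses the hour ranges quoted above and makes no new measurement claims:

```python
# Sketch: best/worst-case ROI of a comprehensive baseline vs a minimal one,
# using the hour ranges from the table above.
MINIMAL = {"iter0": (1, 2), "total": (24, 40)}
COMPREHENSIVE = {"iter0": (3, 5), "total": (12, 16)}

extra_low = COMPREHENSIVE["iter0"][0] - MINIMAL["iter0"][1]   # 1h extra at best
extra_high = COMPREHENSIVE["iter0"][1] - MINIMAL["iter0"][0]  # 4h extra at worst
saved_low = MINIMAL["total"][0] - COMPREHENSIVE["total"][1]   # 8h saved at least
saved_high = MINIMAL["total"][1] - COMPREHENSIVE["total"][0]  # 28h saved at most
print(f"extra iteration 0 investment: {extra_low}-{extra_high}h; "
      f"total time saved: {saved_low}-{saved_high}h")
```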

**Recommendation**: Target comprehensive (V_meta ≥0.40) when:
- Time constrained (need fast convergence)
- Prior art exists (can leverage quickly)
- Data available (can quantify immediately)

---

## Component-by-Component Guide

### Completeness (Documentation)

**0.00**: No documentation

**0.25**: Basic notes
- Informal observations
- Bullet points
- No structure

**0.50**: Partial documentation
- Some categories/patterns
- 40-60% coverage
- Basic structure

**0.75**: Most documentation
- Structured taxonomy
- 70-90% coverage
- Clear organization

**1.00**: Comprehensive
- Complete taxonomy
- 90%+ coverage
- Production-ready

**Target for V_meta ≥0.40**: Completeness ≥0.50

### Effectiveness (Quantification)

**0.00**: No baseline measurement

**0.25**: Informal estimates
- "Errors happen sometimes"
- No numbers

**0.50**: Some metrics
- Volume measured (e.g., 1,336 errors)
- Rate not calculated

**0.75**: Most metrics
- Volume, rate, distribution
- Missing impact (MTTD/MTTR)

**1.00**: Full quantification
- All metrics measured
- Baseline fully quantified

**Target for V_meta ≥0.40**: Effectiveness ≥0.30

### Reusability (Patterns)

**0.00**: No patterns

**0.25**: Ad-hoc solutions
- One-off fixes
- No generalization

**0.50**: Some patterns
- 3-5 patterns identified
- Partial universality

**0.75**: Most patterns
- 5-10 patterns codified
- High transferability

**1.00**: Universal patterns
- Complete pattern library
- 90%+ transferable

**Target for V_meta ≥0.40**: Reusability ≥0.40

### Validation (Evidence)

**0.00**: No validation

**0.25**: Anecdotal
- "Seems to work"
- No data

**0.50**: Some data
- Basic analysis
- Limited scope

**0.75**: Systematic
- Comprehensive analysis
- Clear evidence

**1.00**: Validated
- Multiple contexts
- Statistical confidence

**Target for V_meta ≥0.40**: Validation ≥0.30
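The per-component floors above, together with the overall V_meta ≥0.40 target, can be checked in one place. The thresholds in this sketch are the ones listed in this guide; the function itself is illustrative:

```python
# Sketch: verify each component floor and the overall V_meta >= 0.40 target.
TARGETS = {"completeness": 0.50, "effectiveness": 0.30,
           "reusability": 0.40, "validation": 0.30}

def check_baseline(scores: dict[str, float]) -> dict[str, bool]:
    """Per-component floors from this guide plus the overall target."""
    checks = {name: scores[name] >= floor for name, floor in TARGETS.items()}
    checks["v_meta >= 0.40"] = sum(scores.values()) / len(scores) >= 0.40
    return checks

# Bootstrap-003 baseline scores from the Quick Start example:
print(check_baseline({"completeness": 0.60, "effectiveness": 0.40,
                      "reusability": 0.40, "validation": 0.50}))
```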

---

## Iteration 0 Checklist (for V_meta ≥0.40)

**Documentation** (Target: Completeness ≥0.50):
- [ ] Create initial taxonomy (≥5 categories)
- [ ] Document 3-5 patterns/workflows
- [ ] Achieve 60-80% coverage
- [ ] Structured markdown documentation

**Quantification** (Target: Effectiveness ≥0.30):
- [ ] Measure volume (total instances)
- [ ] Calculate rate (frequency)
- [ ] Analyze distribution (category breakdown)
- [ ] Baseline quantified with numbers

**Patterns** (Target: Reusability ≥0.40):
- [ ] Identify 3-5 universal patterns
- [ ] Document transferability
- [ ] Estimate reusability %
- [ ] Distinguish universal vs domain-specific

**Validation** (Target: Validation ≥0.30):
- [ ] Analyze historical data
- [ ] Sample validation (≥30 instances)
- [ ] Evidence-based claims
- [ ] Data sources documented

**Time Investment**: 3-5 hours

**Expected V_meta(s₀)**: 0.40-0.50

---

## Success Criteria

Baseline quality assessment succeeded when:

1. **V_meta target met**: V_meta(s₀) ≥ 0.40 achieved
2. **Iteration reduction**: 3-4 iterations vs 5-7 (40-50% reduction)
3. **Time savings**: Total time ≤12-16 hours (comprehensive baseline)
4. **Gap clarity**: Clear objectives for iteration 1-2
5. **ROI positive**: Baseline investment is smaller than the total time saved

**Bootstrap-003 Validation**:
- ✅ V_meta(s₀) = 0.48 (target met)
- ✅ 3 iterations (vs 6 for Bootstrap-002 with minimal baseline)
- ✅ 10 hours total (60% reduction)
- ✅ Gaps clear (top 3 automations identified)
- ✅ ROI: +2h investment → 15h saved

---

## Related Skills

**Parent framework**:
- [methodology-bootstrapping](../methodology-bootstrapping/SKILL.md) - Core OCA cycle

**Uses baseline for**:
- [rapid-convergence](../rapid-convergence/SKILL.md) - V_meta ≥0.40 is criterion #1

**Validation**:
- [retrospective-validation](../retrospective-validation/SKILL.md) - Data quantification

---

## References

**Core guide**:
- [Quality Levels](reference/quality-levels.md) - Detailed level definitions
- [Component Guide](reference/components.md) - V_meta calculation
- [Investment ROI](reference/roi.md) - Time savings analysis

**Examples**:
- [Bootstrap-003 Comprehensive](examples/error-recovery-comprehensive-baseline.md) - V_meta=0.48
- [Bootstrap-002 Minimal](examples/testing-strategy-minimal-baseline.md) - V_meta=0.04

---

**Status**: ✅ Validated | 40-50% iteration reduction | Positive ROI

Overview

This skill defines a practical method to establish a strong baseline (V_meta ≥0.40) in iteration 0 to enable rapid convergence. It prescribes component-level scoring, four quality levels, and three concrete strategies to reach a comprehensive baseline. Use it to trade a small extra investment up front for large overall time savings and clearer iteration goals.

How this skill works

The skill inspects four components (Completeness, Effectiveness, Reusability, and Validation) and calculates V_meta as their average. It provides a short self-assessment rubric for each component, an iteration 0 checklist, and three focused strategies (leverage prior art, quantify the baseline, domain universality analysis) to raise V_meta quickly. The output guides time allocation and predicts iteration count and ROI.

When to use it

  • Planning iteration 0 time allocation and priorities
  • Targeting rapid convergence (3–4 iterations)
  • When prior art and rich historical data exist for immediate quantification
  • When time is constrained and clear iteration objectives are needed
  • When you want obvious gap identification and measurable baselines

Best practices

  • Spend 3–5 hours in iteration 0 to aim for V_meta ≥0.40 for best ROI
  • Start with prior-art extraction to fast-track completeness and reusability
  • Quantify baseline metrics from logs and issue history to boost Effectiveness
  • Document universal patterns and an initial taxonomy to improve transferability
  • Use the iteration 0 checklist: taxonomy, metrics, patterns, and sample validation

Example use cases

  • Bootstrap a recovery workflow where historical error logs exist and you need 3 iterations to converge
  • Adapt an established methodology to a new project by extracting and applying prior patterns
  • Measure and prioritize automation candidates by quantifying error volume and impact
  • Create a reusable pattern library for cross-project CI/CD issues using universality analysis

FAQ

What is the target V_meta for rapid convergence?

Target V_meta(s₀) ≥0.40 to achieve 3–4 iteration convergence and ~40–50% iteration reduction.

How much extra time should I invest in iteration 0?

Plan 3–5 hours for a comprehensive baseline; 2–3 hours for a basic baseline; exceptional baselines may take 5–8 hours.