
quality-audit skill


This skill audits other skills along four dimensions (Clarity, Completeness, Accuracy, and Usefulness) and delivers actionable quality recommendations.

npx playbooks add skill nickcrew/claude-cortex --skill quality-audit

Review the files below or copy the command above to add this skill to your agents.

Files (1)

SKILL.md

8.9 KB

---
name: quality-audit
description: >-
  Meta-skill for auditing and validating skill quality. Use when reviewing
  skills for consistency, completeness, accuracy, and adherence to standards.
  Provides structured rubrics, scoring frameworks, and actionable recommendations.
author: cortex team
version: 1.0.0
license: MIT
tags: [meta, quality, validation, review, standards]
created: 2026-01-05
updated: 2026-01-05
triggers:
  - audit skill
  - review skill quality
  - validate skill
  - skill quality check
  - rubric assessment
dependencies:
  skills: []
  tools: []
token_estimate: ~2000
---

# Quality Audit Skill

Systematic framework for evaluating skill quality across four dimensions: **Clarity**, **Completeness**, **Accuracy**, and **Usefulness**.

## When to Use This Skill

- Reviewing a new skill before adding to the registry
- Auditing existing skills for quality improvements
- Creating quality rubrics for skill validation
- Standardizing skill quality across the library
- Preparing skills for production use

## Core Principles

### The Four Quality Dimensions

| Dimension | Weight | Focus |
|-----------|--------|-------|
| **Clarity** | 25% | Structure, readability, progressive disclosure |
| **Completeness** | 25% | Coverage, examples, edge cases, anti-patterns |
| **Accuracy** | 30% | Correctness, best practices, security |
| **Usefulness** | 20% | Real-world applicability, production-readiness |

### Scoring Scale (1-5)

| Score | Label | Meaning |
|-------|-------|---------|
| 1 | Unacceptable | Fundamentally broken, dangerous, or unusable |
| 2 | Needs Work | Major issues requiring significant revision |
| 3 | Acceptable | Meets minimum standards, functional |
| 4 | Good | High quality, minor improvements possible |
| 5 | Excellent | Exemplary, production-ready, best-in-class |

### Passing Criteria

- **Minimum**: 3.0 weighted average (acceptable)
- **Target**: 4.0 weighted average (good)
- **Exceptional**: 4.5+ weighted average (excellent)
- **Blocking**: Accuracy must be ≥3.0 (no dangerous advice)
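
The weights and thresholds above combine into a single number. The following is a minimal sketch in Python, assuming each dimension is scored 1-5. The function names and the band labels it returns are illustrative only and are not part of the cortex CLI.

```python
# Weights from the dimension table, as integer percentages so that
# threshold comparisons such as "exactly 3.0" are not affected by float error.
WEIGHTS = {
    "clarity": 25,
    "completeness": 25,
    "accuracy": 30,
    "usefulness": 20,
}


def weighted_average(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (1-5) into one weighted average."""
    return sum(scores[name] * weight for name, weight in WEIGHTS.items()) / 100


def classify(scores: dict[str, float]) -> str:
    """Map scores to a band from the passing criteria (labels are hypothetical)."""
    total = weighted_average(scores)
    # Accuracy is blocking: below 3.0 fails regardless of the total.
    if scores["accuracy"] < 3.0 or total < 3.0:
        return "fail"
    if total >= 4.5:
        return "exceptional"
    if total >= 4.0:
        return "target"
    return "minimum"


example = {"clarity": 4, "completeness": 3, "accuracy": 5, "usefulness": 4}
print(weighted_average(example), classify(example))
```

Note that a skill scoring 5 everywhere except a 2 in Accuracy still fails under this reading, even though its weighted average is 4.1.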

## Audit Workflow

### Phase 1: Structure Check

```yaml
checklist:
  structure:
    - "[ ] Has valid YAML frontmatter"
    - "[ ] Contains required metadata (name, description)"
    - "[ ] Follows progressive disclosure (Tier 1 → 2 → 3)"
    - "[ ] Sections are logically ordered"
    - "[ ] Token estimate is reasonable (<5000 for core)"
```
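
The first two checklist items can be approximated with the standard library. This sketch assumes the frontmatter is delimited by `---` lines and only looks for top-level keys. It is not a YAML parser, so a real validity check would need one, and `check_frontmatter` is a hypothetical helper name.

```python
import re

REQUIRED_KEYS = ("name", "description")  # required metadata per the checklist


def check_frontmatter(text: str) -> list[str]:
    """Return a list of problems found in a SKILL.md frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing opening '---' delimiter"]
    try:
        # Index of the closing delimiter, searching after the opening line.
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return ["missing closing '---' delimiter"]
    # Collect top-level keys only (lines with no leading indentation).
    keys = set()
    for line in lines[1:end]:
        match = re.match(r"^([A-Za-z_][\w-]*):", line)
        if match:
            keys.add(match.group(1))
    return [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in keys]


sample = "---\nname: quality-audit\ntoken_estimate: ~2000\n---\n# Body\n"
print(check_frontmatter(sample))
```

For the sample above, the only reported problem is the missing `description` key.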

### Phase 2: Content Evaluation

```yaml
checklist:
  content:
    - '[ ] "When to Use" section is clear'
    - "[ ] Core principles are well-defined"
    - "[ ] Code examples are complete and runnable"
    - "[ ] Anti-patterns are documented"
    - "[ ] Troubleshooting guidance exists"
```

### Phase 3: Dimension Scoring

For each dimension, evaluate against specific criteria:

**Clarity Criteria:**
- Well-organized sections with logical flow
- Concise explanations without jargon overload
- Code examples are readable and well-commented
- Progressive disclosure from simple to complex

**Completeness Criteria:**
- Covers core concepts thoroughly
- Includes edge cases and error handling
- Provides both do's and don'ts
- Has working examples for main use cases

**Accuracy Criteria:**
- Code examples compile/run without errors
- Follows current best practices (not deprecated)
- Security considerations are correct
- Performance claims are verifiable

**Usefulness Criteria:**
- Examples solve real-world problems
- Can be applied immediately
- Scales to production use cases
- Includes troubleshooting guidance

### Phase 4: Report Generation

```markdown
## Audit Report: {skill_name}

**Date**: {date}
**Auditor**: {auditor}
**Status**: {PASS|FAIL|NEEDS_REVIEW}

### Scores

| Dimension | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| Clarity | {x}/5 | 25% | {x*0.25} |
| Completeness | {x}/5 | 25% | {x*0.25} |
| Accuracy | {x}/5 | 30% | {x*0.30} |
| Usefulness | {x}/5 | 20% | {x*0.20} |
| **Total** | | | **{sum}/5** |

### Issues Found

- [CRITICAL] {issue description}
- [MAJOR] {issue description}
- [MINOR] {issue description}

### Recommendations

1. {actionable recommendation}
2. {actionable recommendation}
```
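
The Scores table in the template can be filled in mechanically once the four scores are known. This is a rough sketch, assuming integer scores and the default weights. `render_scores_table` is a hypothetical helper, not a cortex function.

```python
# Default weights as percentages, in the order the template lists them.
WEIGHTS = {"Clarity": 25, "Completeness": 25, "Accuracy": 30, "Usefulness": 20}


def render_scores_table(scores: dict[str, int]) -> str:
    """Render the markdown Scores table for an audit report."""
    rows = [
        "| Dimension | Score | Weight | Weighted |",
        "|-----------|-------|--------|----------|",
    ]
    for name, weight in WEIGHTS.items():
        weighted = scores[name] * weight / 100
        rows.append(f"| {name} | {scores[name]}/5 | {weight}% | {weighted:.2f} |")
    # Sum in integer points first, then scale, to keep the total exact.
    total = sum(scores[name] * weight for name, weight in WEIGHTS.items()) / 100
    rows.append(f"| **Total** | | | **{total:.2f}/5** |")
    return "\n".join(rows)


print(render_scores_table({"Clarity": 4, "Completeness": 3, "Accuracy": 5, "Usefulness": 4}))
```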

## Implementation Patterns

### Pattern 1: Quick Audit (5-minute review)

Use for rapid assessment of skill quality:

```bash
# Run automated structure checks
cortex skills audit <skill-name> --quick

# Output: Pass/Fail with basic metrics
```

**Quick Audit Checks:**
1. YAML frontmatter valid?
2. Required sections present?
3. Code blocks have language tags?
4. No TODO/FIXME markers?
5. Token count reasonable?
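
Checks 3 and 4 are simple enough to sketch directly. The version below assumes fences are not nested and treats any `TODO` or `FIXME` word as a marker, so it would also flag documentation that merely mentions those words. It illustrates the idea and does not show how `cortex skills audit --quick` is implemented.

```python
import re

FENCE = "`" * 3  # built programmatically to keep this example easy to embed


def quick_checks(markdown: str) -> list[str]:
    """Flag untagged code fences and TODO/FIXME markers, with 1-based line numbers."""
    issues = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            # Only an opening fence needs a language tag; a closing fence is bare.
            if not in_fence and stripped == FENCE:
                issues.append(f"line {number}: code block has no language tag")
            in_fence = not in_fence
        if re.search(r"\b(TODO|FIXME)\b", line):
            issues.append(f"line {number}: unresolved TODO/FIXME marker")
    return issues


sample = "\n".join(["# Title", FENCE, "ls", FENCE, FENCE + "bash", "ls  # TODO", FENCE])
for issue in quick_checks(sample):
    print(issue)
```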

### Pattern 2: Full Audit (15-30 minute review)

Comprehensive evaluation with human review:

```bash
# Generate full audit report
cortex skills audit <skill-name> --full

# Interactive mode for scoring
cortex skills audit <skill-name> --interactive
```

**Full Audit Process:**
1. Run automated checks
2. Read through content manually
3. Test code examples
4. Score each dimension
5. Document issues and recommendations
6. Generate report

### Pattern 3: Comparative Audit

Compare skill against reference implementation:

```bash
# Compare against template-skill-enhanced
cortex skills audit <skill-name> --compare template-skill-enhanced
```

### Pattern 4: Batch Audit

Audit multiple skills for registry health:

```bash
# Audit all skills in a category
cortex skills audit --category security

# Audit skills below threshold
cortex skills audit --below-score 3.5
```

## CLI Commands

```bash
# Basic audit
cortex skills audit <skill-name>

# Options:
#   --quick           Quick structural check only
#   --full            Full audit with all dimensions
#   --interactive     Interactive scoring mode
#   --output FILE     Write report to file
#   --format FORMAT   Output format (markdown|json|yaml)
#   --compare SKILL   Compare against reference skill
#   --fix             Auto-fix simple issues (formatting)
```

## Creating Custom Rubrics

Skills can define custom rubrics in `validation/rubric.yaml`:

```yaml
# validation/rubric.yaml
version: "1.0.0"
skill_name: my-skill

dimensions:
  clarity:
    weight: 25
    criteria:
      - "API examples use realistic data"
      - "Error handling is shown for each operation"
  completeness:
    weight: 25
    criteria:
      - "Covers all HTTP methods"
      - "Includes pagination patterns"
  accuracy:
    weight: 30
    criteria:
      - "Follows REST conventions"
      - "Security headers documented"
  usefulness:
    weight: 20
    criteria:
      - "Examples work with common frameworks"

passing_criteria:
  minimum_score: 3.5  # Higher bar for this skill
  required_dimensions:
    - accuracy
    - completeness
```
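
A rubric like this is easy to get subtly wrong, for example with weights that no longer sum to 100. The sketch below checks a rubric that has already been loaded into a dictionary by a YAML parser of your choice. The rules it enforces are assumptions inferred from the example above, not the published rubric schema, and `validate_rubric` is a hypothetical name.

```python
def validate_rubric(rubric: dict) -> list[str]:
    """Return a list of problems in a parsed rubric; an empty list means none found."""
    problems = []
    dimensions = rubric.get("dimensions", {})

    # Assumption: weights are percentages and should total 100.
    total = sum(dim.get("weight", 0) for dim in dimensions.values())
    if total != 100:
        problems.append(f"weights sum to {total}, expected 100")

    for name, dim in dimensions.items():
        if not dim.get("criteria"):
            problems.append(f"dimension '{name}' has no criteria")

    passing = rubric.get("passing_criteria", {})
    # Assumption: every required dimension must be defined under `dimensions`.
    for name in passing.get("required_dimensions", []):
        if name not in dimensions:
            problems.append(f"required dimension '{name}' is not defined")

    minimum = passing.get("minimum_score")
    if minimum is not None and not 1 <= minimum <= 5:
        problems.append(f"minimum_score {minimum} is outside the 1-5 scale")
    return problems


rubric = {
    "dimensions": {
        "clarity": {"weight": 25, "criteria": ["API examples use realistic data"]},
        "completeness": {"weight": 25, "criteria": ["Covers all HTTP methods"]},
    },
    "passing_criteria": {"minimum_score": 3.5, "required_dimensions": ["accuracy"]},
}
for problem in validate_rubric(rubric):
    print(problem)
```

With only two of the four dimensions defined, the example reports both the weight total of 50 and the undefined `accuracy` dimension.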

## Best Practices

### Do

- **Be specific** - "Line 45: SQL query vulnerable to injection" not "has security issues"
- **Be actionable** - Include how to fix each issue
- **Be fair** - Use the same standards consistently
- **Document evidence** - Quote specific content for each score
- **Prioritize** - Critical issues first, suggestions last

### Don't

- Score based on personal style preferences
- Mark deprecated patterns without suggesting alternatives
- Fail skills for missing optional sections
- Ignore security issues, no matter how strong the other scores are
- Rush through audits for complex skills

## Anti-Patterns

### The Rubber Stamp

- **Problem**: Approving skills without thorough review
- **Why it's bad**: Low-quality skills erode trust in the library
- **Fix**: Use the full audit checklist, test code examples

### The Perfectionist Block

- **Problem**: Failing skills for minor issues
- **Why it's bad**: Prevents useful skills from being available
- **Fix**: Distinguish between blocking issues and suggestions

### Score Inflation

- **Problem**: Giving high scores without justification
- **Why it's bad**: Makes scores meaningless
- **Fix**: Document specific evidence for each score

## Integration with CI/CD

```yaml
# .github/workflows/skill-quality.yml
name: Skill Quality Gate

on:
  pull_request:
    paths:
      - 'skills/**'

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
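      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the diff against the base branch resolves
      - name: Install cortex
        run: pip install cortex
      - name: Audit changed skills
        run: |
          # List skill directories touched by this PR, de-duplicated
          for skill in $(git diff --name-only "origin/${{ github.base_ref }}...HEAD" | grep '^skills/' | cut -d'/' -f2 | sort -u); do
            cortex skills audit "$skill" --quick --fail-under 3.0
          done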
```

## Troubleshooting

### "Audit fails but skill looks fine"

1. Check YAML frontmatter syntax
2. Verify all required sections exist
3. Ensure code blocks have language tags
4. Check for hidden characters (copy/paste issues)

### "Scores seem inconsistent"

1. Review the scoring guide for each dimension
2. Calibrate by auditing template-skill-enhanced first
3. Use --interactive mode for clearer criteria

## External Resources

- [Skill Template Reference](../template-skill-enhanced/SKILL.md)
- [Rubric Schema](../rubric.schema.yaml)
- [Skill Creator Guide](../skill-creator/SKILL.md)

## Changelog

### 1.0.0 (2026-01-05)
- Initial release
- Four-dimension scoring framework
- CLI integration
- CI/CD workflow example

Overview

This skill provides a meta-audit framework for evaluating and validating the quality of AI skills across clarity, completeness, accuracy, and usefulness. It delivers structured rubrics, scoring rules, and report templates to produce consistent, actionable audit results. Use it to standardize quality gates before publishing or deploying skills.

How this skill works

The skill runs automated structure checks, guides a human review across four weighted dimensions, and produces a scored audit report with findings and prioritized recommendations. It supports quick, full, comparative, and batch audit patterns, plus CLI options to export reports and integrate with CI. Custom rubrics can be supplied per-skill to adjust weights and pass criteria.

When to use it

- Before adding a new skill to a registry
- Auditing existing skills for quality improvements
- Creating or applying standardized validation rubrics
- Enforcing quality gates in CI/CD pipelines
- Batch reviews of category or low-scoring skills

Best practices

- Be specific and cite line numbers or exact snippets for issues
- Provide actionable remediation steps, not just descriptions of problems
- Use the same rubric and calibration examples across auditors
- Prioritize critical security and accuracy issues before stylistic ones
- Test code examples and include runnable snippets where possible

Example use cases

- Run a quick 5-minute structural check during pull requests to block obvious issues
- Perform a 15–30 minute full audit for production readiness with interactive scoring
- Compare an incoming skill to a reference implementation to identify gaps
- Batch-audit a category of skills to produce a registry health report
- Integrate the audit into CI to fail PRs under a minimum quality threshold

FAQ

What are the scoring thresholds for pass/fail?

The minimum acceptable score is a 3.0 weighted average, the target is 4.0, and 4.5 or higher is exceptional. Accuracy must also score at least 3.0; a lower Accuracy score blocks the skill regardless of its weighted average.

Can I customize the rubric per skill?

Yes. Skills can include a validation/rubric.yaml to override weights, criteria, and minimum passing scores for that specific skill.

When should I run a quick audit vs a full audit?

Use quick audits for rapid checks on PRs or large batches; use full audits for production candidates, complex skills, or when testing code examples is necessary.