
critique skill

/critique

This skill conducts multi-lens dialectical critique to stress-test arguments and synthesize robust conclusions across structural, evidential, scope, adversarial, and pragmatic lenses.

npx playbooks add skill zpankz/mcp-skillset --skill critique

Review the files below or copy the command above to add this skill to your agents.

Files (6)
SKILL.md
---
name: critique
description: Multi-perspective dialectical reasoning with cross-evaluative synthesis. Spawns parallel evaluative lenses (STRUCTURAL, EVIDENTIAL, SCOPE, ADVERSARIAL, PRAGMATIC) that critique thesis AND critique each other's critiques, producing N-squared evaluation matrix before recursive aggregation. Triggers on /critique, /dialectic, /crosseval, requests for thorough analysis, stress-testing arguments, or finding weaknesses. Implements Hegelian refinement enhanced with interleaved multi-domain evaluation and convergent synthesis.
---

# Critique: Multi-Lens Dialectical Refinement

Execute adversarial self-refinement through parallel evaluative lenses with cross-evaluation and recursive aggregation.

## Architecture

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                         DIALECTIC ENGINE v3                                  │
├──────────────────────────────────────────────────────────────────────────────┤
│  Φ0: CLASSIFY    → complexity assessment, mode selection, lens allocation    │
│  Φ1: THESIS      → committed position with claim DAG                         │
│  Φ2: MULTI-LENS  → N lenses evaluate thesis (N critiques)                    │
│      ANTITHESIS    + each lens evaluates others (N×(N-1) cross-evals)        │
│                    = N² total evaluation cells                               │
│  Φ3: AGGREGATE   → consensus/contested/unique extraction                     │
│      SYNTHESIS     + recursive compression passes → single output            │
│  Φ4: CONVERGE    → stability check, iterate or finalize                      │
└──────────────────────────────────────────────────────────────────────────────┘

PHASE DEPENDENCIES:
  Φ0 ──► Φ1 ──► Φ2a ──► Φ2b ──► Φ3 ──► Φ4
              (initial)  (cross)      │
                                      └──► Φ1 (if ITERATE)
```
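
A minimal orchestration sketch of this pipeline. The phase implementations are injected as callables and the `config` keys mirror the Φ0 configs below; everything else here is illustrative, not a prescribed API:

```python
from typing import Any, Callable

def run_critique(query: str,
                 classify: Callable[[str], dict],
                 generate_thesis: Callable[[str, dict], Any],
                 critique_and_synthesize: Callable[[Any, dict], Any],
                 convergence: Callable[[Any, Any], float]) -> Any:
    """Drive Φ0→Φ4, looping Φ1→Φ3 until convergence or cycle exhaustion."""
    config = classify(query)                  # Φ0: mode, lenses, cycles, threshold
    thesis = generate_thesis(query, config)   # Φ1: committed claim DAG
    synthesis = thesis
    for _cycle in range(config["cycles"]):
        synthesis = critique_and_synthesize(thesis, config)   # Φ2a + Φ2b + Φ3
        if convergence(thesis, synthesis) >= config["threshold"]:
            return synthesis                  # Φ4: CONVERGED
        thesis = synthesis                    # Φ4: ITERATE (Φ3 becomes new Φ1)
    return synthesis                          # Φ4: EXHAUSTED (report uncertainty)
```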

## Mode Selection

### Automatic Mode Detection

```python
def select_mode(query: str) -> Mode:
    """
    Select critique depth based on query characteristics.
    
    QUICK:  Simple claims, factual questions, narrow scope
    STANDARD: Moderate complexity, clear domain, some nuance
    FULL:   Complex arguments, multiple stakeholders, high stakes
    """
    # Indicator helpers (single_claim, factual_verifiable, etc.) are boolean
    # heuristics assumed to be defined elsewhere; True values are tallied below.
    indicators = {
        "quick": [
            len(query) < 200,
            single_claim(query),
            factual_verifiable(query),
            low_controversy(query)
        ],
        "full": [
            len(query) > 1000,
            multi_stakeholder(query),
            ethical_implications(query),
            policy_recommendation(query),
            high_stakes_decision(query)
        ]
    }
    
    if sum(indicators["quick"]) >= 3:
        return Mode.QUICK
    elif sum(indicators["full"]) >= 2:
        return Mode.FULL
    else:
        return Mode.STANDARD
```

### Mode Specifications

| Mode | Lenses | Cross-Eval | Cycles | Threshold | Token Budget |
|------|--------|------------|--------|-----------|--------------|
| QUICK | 3 (S,E,A) | None | 1 | 0.85 | ~800 |
| STANDARD | 5 (all) | Selective (10 cells) | 2 | 0.92 | ~2000 |
| FULL | 5 (all) | Complete (20 cells) | 3 | 0.96 | ~4000 |

### Manual Triggers

| Trigger | Mode | Description |
|---------|------|-------------|
| `/critique` | Auto-detect | Intelligent mode selection |
| `/critique-quick` | QUICK | Fast, 3-lens, no cross-eval |
| `/critique-standard` | STANDARD | Balanced, selective cross-eval |
| `/critique-full` | FULL | Complete N² analysis |
| `/crosseval` | FULL | Emphasis on Φ2b matrix |
| `/aggregate` | FULL | Emphasis on Φ3 synthesis |

## Evaluative Lenses

Five orthogonal perspectives designed for comprehensive coverage with minimal overlap:

| Lens | Code | Domain | Core Question | Orthogonality Rationale |
|------|------|--------|---------------|-------------------------|
| STRUCTURAL | S | Logic & coherence | Is reasoning valid? | Form vs content |
| EVIDENTIAL | E | Evidence & epistemology | What justifies belief? | Justification type |
| SCOPE | O | Boundaries & generality | Where does this apply? | Domain limits |
| ADVERSARIAL | A | Opposition & alternatives | What's the best counter? | External challenge |
| PRAGMATIC | P | Application & consequence | Does this work? | Theory vs practice |

### Lens Independence Validation

Lenses target distinct failure modes:
- **S** catches: invalid inference, circular reasoning, equivocation
- **E** catches: weak evidence, unfalsifiable claims, cherry-picking
- **O** catches: overgeneralization, edge cases, context dependence
- **A** catches: stronger alternatives, unconsidered objections
- **P** catches: implementation barriers, unintended consequences

Overlap detection: If two lenses identify the same issue, it's either a genuine high-priority concern (reinforce) or a lens calibration problem (investigate).
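
A minimal sketch of that overlap check, assuming attacks are dicts with the `lens`, `target`, and `type` fields from the Φ2a schema below (the grouping key is an illustrative choice):

```python
from collections import defaultdict

def detect_overlaps(attacks: list[dict]) -> dict:
    """Group attacks that hit the same claim with the same attack type.

    Keys flagged by multiple lenses are either genuine high-priority
    concerns (reinforce) or lens calibration problems (investigate).
    """
    by_issue: dict = defaultdict(set)
    for a in attacks:
        by_issue[(a["target"], a["type"])].add(a["lens"])
    return {issue: lenses for issue, lenses in by_issue.items() if len(lenses) > 1}
```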

## Execution Protocol

### Φ0: Classification & Mode Selection

```python
def classify_and_configure(query: str) -> Config:
    mode = select_mode(query)
    
    configs = {
        Mode.QUICK: {
            "lenses": ["S", "E", "A"],
            "cross_eval": False,
            "cycles": 1,
            "threshold": 0.85,
            "token_budget": 800
        },
        Mode.STANDARD: {
            "lenses": ["S", "E", "O", "A", "P"],
            "cross_eval": "selective",  # 10 highest-value cells
            "cycles": 2,
            "threshold": 0.92,
            "token_budget": 2000
        },
        Mode.FULL: {
            "lenses": ["S", "E", "O", "A", "P"],
            "cross_eval": "complete",   # All 25 cells
            "cycles": 3,
            "threshold": 0.96,
            "token_budget": 4000
        }
    }
    
    return Config(**configs[mode], mode=mode)
```

**Output**: `[CRITIQUE:Φ0|mode={m}|lenses={n}|cross={type}|budget={t}]`

### Φ1: Thesis Generation

Generate committed response with explicit claim DAG.

**Requirements**:
1. State positions with **falsifiable specificity**
2. Build claim graph with stability ordering:
   - `F` (FOUNDATIONAL) — axioms, definitions (immutable after Φ1)
   - `S` (STRUCTURAL) — derived claims (attackable)
   - `P` (PERIPHERAL) — applications (most vulnerable)
3. Verify acyclicity (DAG enforcement)
4. Compute initial topology metrics

**Schema**:
```yaml
thesis:
  response: "{Complete committed response}"
  claims:
    - id: C1
      content: "{Specific falsifiable claim}"
      stability: F|S|P
      supports: [C2, C3]
      depends_on: []
      confidence: 0.0-1.0
      evidence_type: empirical|logical|definitional|analogical
  topology:
    nodes: {n}
    edges: {e}
    density: {e/n}  # Target ≥2.0
    cycles: 0       # Must be 0 (enforced)
  aggregate_confidence: 0.0-1.0
  completion_marker: "Φ1_COMPLETE"  # Required for Φ2 to proceed
```
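
A sketch of the Φ1 validity checks implied by this schema (acyclicity, density, completion marker), written over the YAML parsed into a plain dict; counting edges over `depends_on` links only is an illustrative simplification:

```python
def validate_thesis(thesis: dict) -> dict:
    """Check DAG acyclicity and compute topology metrics for a Φ1 thesis."""
    claims = {c["id"]: c for c in thesis["claims"]}
    edges = [(c["id"], dep) for c in thesis["claims"] for dep in c["depends_on"]]

    # Cycle detection: depth-first search with white/gray/black coloring.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {cid: WHITE for cid in claims}

    def has_cycle(node: str) -> bool:
        color[node] = GRAY
        for dep in claims[node]["depends_on"]:
            if color.get(dep) == GRAY:
                return True
            if color.get(dep) == WHITE and has_cycle(dep):
                return True
        color[node] = BLACK
        return False

    acyclic = not any(color[cid] == WHITE and has_cycle(cid) for cid in list(claims))
    return {
        "nodes": len(claims),
        "edges": len(edges),
        "density": len(edges) / max(len(claims), 1),   # target ≥ 2.0
        "acyclic": acyclic,                            # must be True (DAG enforcement)
        "ready": acyclic and thesis.get("completion_marker") == "Φ1_COMPLETE",
    }
```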

**Output**: `[CRITIQUE:Φ1|claims={n}|edges={e}|η={density}|conf={c}|✓]`

### Φ2: Multi-Lens Antithesis

#### Φ2a: Initial Lens Evaluations

**Prerequisite**: `Φ1.completion_marker == "Φ1_COMPLETE"`

Each lens independently evaluates thesis using attack vectors:

```yaml
# STRUCTURAL lens attacks
structural:
  - non_sequitur: "Conclusion does not follow from premises"
  - circular_reasoning: "Conclusion presupposed in premises"
  - false_dichotomy: "Excluded middle options"
  - equivocation: "Term shifts meaning mid-argument"

# EVIDENTIAL lens attacks  
evidential:
  - insufficient_evidence: "Claim exceeds evidential support"
  - cherry_picking: "Counter-evidence unaddressed"
  - unfalsifiable: "No possible disconfirming evidence"
  - correlation_causation: "Causal claim from correlational data"

# SCOPE lens attacks
scope:
  - overgeneralization: "Specific case → universal claim"
  - edge_case: "Valid boundary defeats universal"
  - context_dependence: "Unstated contextual requirements"

# ADVERSARIAL lens attacks
adversarial:
  - steel_man: "Strongest form of opposition"
  - alternative_explanation: "Competing hypothesis equally plausible"
  - precedent_contradiction: "Accepted instance defeats thesis"

# PRAGMATIC lens attacks
pragmatic:
  - implementation_barrier: "Cannot be executed as stated"
  - unintended_consequence: "Second-order effects harmful"
  - scaling_failure: "Works small, fails large"
```

**Per-lens output**:
```yaml
lens_evaluation:
  lens: S|E|O|A|P
  attacks:
    - target: C{id}
      type: "{attack_vector}"
      content: "{Specific critique}"
      severity: fatal|major|minor|cosmetic
      confidence_impact: -1.0 to 0.0
  summary_score: 0.0-1.0
  completion_marker: "Φ2a_{lens}_COMPLETE"
```

**Completion Gate**: All lenses must have `completion_marker` before Φ2b proceeds.
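
A minimal sketch of that gate, assuming the per-lens evaluations are collected as dicts matching the schema above:

```python
def phase2a_gate(lens_evaluations: list[dict], configured_lenses: list[str]) -> bool:
    """True only when every configured lens has emitted its Φ2a completion marker."""
    emitted = {e.get("completion_marker") for e in lens_evaluations}
    required = {f"Φ2a_{lens}_COMPLETE" for lens in configured_lenses}
    return required <= emitted   # subset check: no lens may be missing
```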

#### Φ2b: Cross-Lens Evaluation

**Prerequisite**: All `Φ2a_{lens}_COMPLETE` markers present

**QUICK mode**: Skip Φ2b entirely

**STANDARD mode**: Evaluate 10 highest-value cells:
- High-severity attacks from each lens (5 cells)
- Highest-confidence attacks cross-checked by adjacent lens (5 cells)

**FULL mode**: Complete 5×5 matrix (25 cells, minus 5 diagonal = 20 evaluations)

```
Cross-evaluation matrix:
    │  S eval │  E eval │  O eval │  A eval │  P eval │
────┼─────────┼─────────┼─────────┼─────────┼─────────┤
S → │    —    │   S→E   │   S→O   │   S→A   │   S→P   │
E → │   E→S   │    —    │   E→O   │   E→A   │   E→P   │
O → │   O→S   │   O→E   │    —    │   O→A   │   O→P   │
A → │   A→S   │   A→E   │   A→O   │    —    │   A→P   │
P → │   P→S   │   P→E   │   P→O   │   P→A   │    —    │
```
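
A sketch of cell enumeration for this matrix. FULL mode is exactly the 20 off-diagonal cells; the STANDARD-mode selection below is one illustrative reading of "highest-value cells" (adjacent-lens pairing), not a prescribed heuristic:

```python
from itertools import permutations

LENSES = ["S", "E", "O", "A", "P"]

def cross_eval_cells(mode: str) -> list[tuple[str, str]]:
    """Return (evaluator, evaluated) pairs for Φ2b."""
    if mode == "QUICK":
        return []                                    # Φ2b skipped entirely
    if mode == "FULL":
        return list(permutations(LENSES, 2))         # 20 off-diagonal cells

    # STANDARD: 10 cells built from adjacent-lens pairings around a fixed ring.
    ring = {lens: LENSES[(i + 1) % len(LENSES)] for i, lens in enumerate(LENSES)}
    forward = [(lens, ring[lens]) for lens in LENSES]    # each lens reviews its neighbour
    backward = [(ring[lens], lens) for lens in LENSES]   # and is reviewed in return
    return forward + backward
```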

**Cross-eval output**:
```yaml
cross_evaluation:
  evaluator: S|E|O|A|P
  evaluated: S|E|O|A|P
  verdict: endorse|partial|reject
  agreements: ["{attack_ids}"]
  disagreements:
    - attack: "{attack_id}"
      objection: "{Why evaluator disagrees}"
  missed: ["{What evaluator would add}"]
  calibration: "{Over/under severity assessment}"
```

**Output**: `[CRITIQUE:Φ2|mode={m}|attacks={n}|cross={cells}|✓]`

### Φ3: Aggregation & Synthesis

#### Phase 3a: Matrix Analysis

```python
def analyze_matrix(all_attacks: list, cross_evals: Matrix) -> Analysis:
    # Consensus: ≥80% lenses agree
    consensus = [a for a in all_attacks if agreement_rate(a) >= 0.80]
    
    # Contested: 40-79% agreement
    contested = [a for a in all_attacks if 0.40 <= agreement_rate(a) < 0.80]
    
    # Unique: Single lens, but cross-eval endorsed
    unique = [a for a in all_attacks 
              if source_count(a) == 1 and cross_endorsed(a)]
    
    # Rejected: <40% agreement AND cross-eval rejection
    rejected = [a for a in all_attacks 
                if agreement_rate(a) < 0.40 and cross_rejected(a)]
    
    return Analysis(consensus, contested, unique, rejected)
```
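
The helpers above (`agreement_rate`, `source_count`, `cross_endorsed`, `cross_rejected`) are left undefined by this spec. One plausible reading of `agreement_rate`, written over the verdict map an attack accumulates during Φ2b:

```python
def agreement_rate(cross_verdicts: dict[str, str]) -> float:
    """Fraction of evaluating lenses endorsing an attack.

    cross_verdicts maps lens code → "endorse" | "partial" | "reject";
    partial endorsements count half. An un-reviewed attack defaults to 1.0
    (only its source lens has weighed in).
    """
    if not cross_verdicts:
        return 1.0
    weight = {"endorse": 1.0, "partial": 0.5, "reject": 0.0}
    return sum(weight[v] for v in cross_verdicts.values()) / len(cross_verdicts)
```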

#### Phase 3b: Conflict Resolution

For contested items:

```python
def resolve_contested(contested: list, matrix: Matrix) -> list:
    resolutions = []
    for attack in contested:
        support_weight = sum(credibility(s) for s in supporters(attack))
        oppose_weight = sum(credibility(o) for o in opposers(attack))
        
        if support_weight > oppose_weight * 1.5:
            resolution = "ADOPT"
        elif oppose_weight > support_weight * 1.5:
            resolution = "REJECT"
        else:
            resolution = "CONDITIONAL"
        
        resolutions.append(Resolution(attack, resolution, rationale(attack)))
    return resolutions
```

#### Phase 3c: Recursive Compression

```
Pass 1: Apply consensus → Core modifications (mandatory)
Pass 2: Apply contested → Conditional modifications (with qualifications)
Pass 3: Apply unique → Enhancement layer (optional enrichment)
Pass 4: Validate coherence → If failed, re-compress with tighter constraints
```

**Maximum compression passes**: 4 (prevent infinite recursion)
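
One reading of the pass structure as code, with the layer application, coherence check, and constraint tightening injected as callables (none of these names are prescribed by this skill):

```python
MAX_COMPRESSION_PASSES = 4   # hard cap to prevent infinite recursion

def compress(thesis, analysis, apply_layer, is_coherent, tighten):
    """Φ3c sketch: apply layers in priority order, validate, re-compress if needed."""
    draft = thesis
    for _attempt in range(MAX_COMPRESSION_PASSES):
        draft = apply_layer(draft, analysis["consensus"], mode="mandatory")    # Pass 1
        draft = apply_layer(draft, analysis["contested"], mode="conditional")  # Pass 2
        draft = apply_layer(draft, analysis["unique"], mode="optional")        # Pass 3
        if is_coherent(draft):                                                 # Pass 4
            return draft
        analysis = tighten(analysis)   # coherence failed: tighter constraints, retry
    return draft                       # cap reached: emit best draft, flag uncertainty upstream
```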

**Synthesis output**:
```yaml
synthesis:
  response: "{Refined response}"
  modifications:
    from_consensus: [{claim, action, rationale}]
    from_contested: [{claim, action, condition}]
    from_unique: [{claim, enhancement}]
  rejected_attacks: [{attack, rejection_rationale}]
  residual_uncertainties: [{uncertainty, disagreeing_lenses, impact}]
  confidence:
    initial: {Φ1}
    final: {post-synthesis}
```

**Output**: `[CRITIQUE:Φ3|consensus={n}|contested={n}|unique={n}|rejected={n}|conf={f}]`

### Φ4: Convergence Check

**Convergence Formula**:
```python
convergence = (
    0.30 * semantic_similarity(Φ1, Φ3) +
    0.25 * graph_similarity(Φ1.claims, Φ3.claims) +
    0.25 * confidence_stability(Φ1.conf, Φ3.conf) +
    0.20 * (len(Φ3.consensus) / total_attacks)   # consensus rate
)
```

**Threshold Justification**:
- 0.85 (QUICK): Acceptable for low-stakes, rapid iteration
- 0.92 (STANDARD): Balances thoroughness with efficiency
- 0.96 (FULL): High confidence required for complex/high-stakes

**Outcomes**:
- `CONVERGED`: Score ≥ threshold → output Φ3 synthesis
- `ITERATE`: Score < threshold AND cycles < max → Φ3 becomes new Φ1
- `EXHAUSTED`: Cycles exhausted → output Φ3 with uncertainty report

**Output**: `[CRITIQUE:Φ4|conv={score}|{STATUS}|iter={n}/{max}]`
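
A sketch of the Φ4 decision, assuming the convergence score above has already been computed:

```python
def phi4_decision(score: float, threshold: float, cycle: int, max_cycles: int) -> str:
    """Map a convergence score onto the three Φ4 outcomes."""
    if score >= threshold:
        return "CONVERGED"    # output Φ3 synthesis
    if cycle < max_cycles:
        return "ITERATE"      # Φ3 becomes the new Φ1
    return "EXHAUSTED"        # output Φ3 with uncertainty report
```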

## Graceful Degradation

When resources constrained (token budget, time pressure):

```
FULL → interrupt → Continue as STANDARD
STANDARD → interrupt → Continue as QUICK
QUICK → interrupt → Output best available synthesis with uncertainty flag
```

**Degradation markers**:
```yaml
degraded_output:
  original_mode: FULL
  actual_mode: STANDARD
  skipped_phases: [Φ2b_partial]
  confidence_penalty: -0.1
  recommendation: "Re-run in FULL mode for complete analysis"
```
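
A sketch of the downgrade path, with the confidence penalty and re-run recommendation taken from the marker schema above (the shape of the terminal QUICK case is an assumption):

```python
DOWNGRADE = {"FULL": "STANDARD", "STANDARD": "QUICK"}

def degrade(current_mode: str, skipped_phases: list[str]) -> dict:
    """Produce a degraded_output record when resource limits interrupt a run."""
    next_mode = DOWNGRADE.get(current_mode)
    if next_mode is None:
        # QUICK interrupted: emit the best available synthesis, flagged as uncertain.
        return {"actual_mode": "QUICK", "uncertainty_flag": True,
                "recommendation": "Re-run when budget allows"}
    return {
        "original_mode": current_mode,
        "actual_mode": next_mode,
        "skipped_phases": skipped_phases,
        "confidence_penalty": -0.1,
        "recommendation": f"Re-run in {current_mode} mode for complete analysis",
    }
```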

## Compact Output Mode

```
[CRITIQUE|mode={m}|L={lenses}|c={cycle}/{max}]
[Φ1|n{claims}|e{edges}|η{density}|conf{c}|✓]
[Φ2|attacks{n}|cross{cells}|S:{s}|E:{e}|O:{o}|A:{a}|P:{p}|✓]
[Φ3|consensus{n}|contested{n}|unique{n}|rejected{n}|✓]
[Φ4|conv{score}|{STATUS}|conf{initial}→{final}]

SYNTHESIS: {2-3 sentence refined conclusion}
KEY_CHANGES: {Most significant modifications from Φ1}
RESIDUAL: {Primary unresolved uncertainty, if any}
```

## Meta-Cognitive Markers

```
[CLASSIFYING]  — Φ0: determining mode and resources
[COMMITTING]   — Φ1: stating without hedge
[LENS:X]       — Φ2a: evaluating from lens X perspective
[CROSS:X→Y]    — Φ2b: lens X evaluating lens Y's critique
[CONSENSUS]    — Φ3a: noting cross-lens agreement
[CONTESTED]    — Φ3a: noting genuine disagreement
[RESOLVING]    — Φ3b: applying resolution protocol
[COMPRESSING]  — Φ3c: recursive synthesis pass
[CONVERGING]   — Φ4: stability detected
[DEGRADING]    — Resource constraint, reducing scope
```

## Constraints

1. **Phase Dependencies**: Each phase requires predecessor completion marker
2. **DAG Enforcement**: Claim graph must remain acyclic; circular reasoning = fatal
3. **Stability Ordering**: FOUNDATIONAL claims immutable after Φ1
4. **Genuine Critique**: Softball attacks detected via cross-eval and rejected
5. **Compression Termination**: Max 4 recursive passes in Φ3c
6. **Convergence Cap**: Max cycles from config; output uncertainty if exhausted
7. **Token Budget**: Respect mode-specific limits; degrade gracefully if exceeded

## Integration

- **hierarchical-reasoning**: Map lenses to strategic/tactical/operational
- **graph**: Claim topology analysis, k-bisimulation on evaluation matrix
- **think**: Mental models power individual lens templates
- **non-linear**: Subagent spawning for parallel lens execution
- **infranodus**: Graph gap detection enhances STRUCTURAL lens
- **component**: Structure critique outputs as validatable configuration

## References

- `references/lens-specifications.md` — Complete lens templates and attack vectors
- `references/cross-evaluation-protocol.md` — Matrix construction and analysis
- `references/aggregation-algorithms.md` — Consensus extraction and compression

Overview

This skill performs multi-perspective dialectical refinement of a thesis using five orthogonal evaluative lenses and recursive synthesis. It spawns parallel critiques (STRUCTURAL, EVIDENTIAL, SCOPE, ADVERSARIAL, PRAGMATIC), cross-evaluates those critiques pairwise, and compresses the results into a coherent, higher-confidence output. The process supports quick, standard, and full modes with graceful degradation when resources are constrained.

How this skill works

The engine classifies the request to pick a mode, generates a committed thesis with a falsifiable claim graph, and runs independent lens evaluations that produce targeted attacks. In STANDARD and FULL modes each lens also cross-checks the other lenses' critiques, forming the N² evaluation matrix; matrix analysis then classifies issues as consensus, contested, unique, or rejected and resolves contested items through credibility-weighted reconciliation. Finally, recursive compression yields a refined response, and a convergence check either finalizes the synthesis or iterates the cycle.

When to use it

  • Stress-testing arguments, policies, or proposals for hidden weaknesses
  • Creating a robust, falsifiable version of a prior claim or recommendation
  • Producing a balanced synthesis from competing evidence and perspectives
  • Evaluating high-stakes or multi-stakeholder decisions that need thorough vetting
  • Rapid diagnostics for logical, evidential, or implementation flaws

Best practices

  • Provide a clear, specific thesis or question to maximize efficiency and accuracy
  • Choose STANDARD for most analytic needs; use FULL for high-stakes or complex cases
  • If a run degrades under token or time limits, re-run it later in the originally requested (or FULL) mode to recover skipped cross-checks
  • Accept the committed Φ1 thesis as the starting point for critique—this forces falsifiable specificity
  • Use the compact output mode for quick summaries and the full synthesis for decision documents

Example use cases

  • Refining a policy recommendation before executive approval, exposing edge cases and unintended consequences
  • Evaluating a scientific claim by assessing logic, evidence quality, scope limits, and counterhypotheses
  • Stress-testing a startup go-to-market plan for operational barriers and scaling failure modes
  • Comparing competing product strategies by surfacing unexamined assumptions and stronger alternatives
  • Producing meeting-ready synthesis that lists consensus changes, conditional fixes, and residual uncertainties

FAQ

How long does a full critique take and when should I use it?

FULL mode is token- and time-intensive and suits complex, high-stakes issues; use STANDARD for typical analysis and QUICK for lightweight checks or single-claim verification.

What happens if lenses disagree?

Disagreements are categorized as contested; the system weighs the credibility of supporting and opposing lenses to ADOPT, REJECT, or mark an attack CONDITIONAL, and records residual uncertainties for human review.