
critique skill

/critique

This skill conducts multi-lens dialectical critique to stress-test arguments and synthesize robust conclusions across structural, evidential, scope, adversarial, and pragmatic lenses.

npx playbooks add skill zpankz/mcp-skillset --skill critique

Review the files below or copy the command above to add this skill to your agents.

Files (6)
SKILL.md
---
name: critique
description: Multi-perspective dialectical reasoning with cross-evaluative synthesis. Spawns parallel evaluative lenses (STRUCTURAL, EVIDENTIAL, SCOPE, ADVERSARIAL, PRAGMATIC) that critique thesis AND critique each other's critiques, producing N-squared evaluation matrix before recursive aggregation. Triggers on /critique, /dialectic, /crosseval, requests for thorough analysis, stress-testing arguments, or finding weaknesses. Implements Hegelian refinement enhanced with interleaved multi-domain evaluation and convergent synthesis.
---

# Critique: Multi-Lens Dialectical Refinement

Execute adversarial self-refinement through parallel evaluative lenses with cross-evaluation and recursive aggregation.

## Architecture

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                         DIALECTIC ENGINE v3                                  │
├──────────────────────────────────────────────────────────────────────────────┤
│  Φ0: CLASSIFY    → complexity assessment, mode selection, lens allocation    │
│  Φ1: THESIS      → committed position with claim DAG                         │
│  Φ2: MULTI-LENS  → N lenses evaluate thesis (N critiques)                    │
│      ANTITHESIS    + each lens evaluates others (N×(N-1) cross-evals)        │
│                    = N² total evaluation cells                               │
│  Φ3: AGGREGATE   → consensus/contested/unique extraction                     │
│      SYNTHESIS     + recursive compression passes → single output            │
│  Φ4: CONVERGE    → stability check, iterate or finalize                      │
└──────────────────────────────────────────────────────────────────────────────┘

PHASE DEPENDENCIES:
  Φ0 ──► Φ1 ──► Φ2a ──► Φ2b ──► Φ3 ──► Φ4
              (initial)  (cross)      │
                                      └──► Φ1 (if ITERATE)
```
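
A minimal orchestration sketch of this pipeline. The phase implementations are injected as callables and the `config` keys mirror the Φ0 configs below; everything else here is illustrative, not a prescribed API:

```python
from typing import Any, Callable

def run_critique(query: str,
                 classify: Callable[[str], dict],
                 generate_thesis: Callable[[str, dict], Any],
                 critique_and_synthesize: Callable[[Any, dict], Any],
                 convergence: Callable[[Any, Any], float]) -> Any:
    """Drive Φ0→Φ4, looping Φ1→Φ3 until convergence or cycle exhaustion."""
    config = classify(query)                  # Φ0: mode, lenses, cycles, threshold
    thesis = generate_thesis(query, config)   # Φ1: committed claim DAG
    synthesis = thesis
    for _cycle in range(config["cycles"]):
        synthesis = critique_and_synthesize(thesis, config)   # Φ2a + Φ2b + Φ3
        if convergence(thesis, synthesis) >= config["threshold"]:
            return synthesis                  # Φ4: CONVERGED
        thesis = synthesis                    # Φ4: ITERATE (Φ3 becomes new Φ1)
    return synthesis                          # Φ4: EXHAUSTED (report uncertainty)
```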

## Mode Selection

### Automatic Mode Detection

```python
def select_mode(query: str) -> Mode:
    """
    Select critique depth based on query characteristics.
    
    QUICK:  Simple claims, factual questions, narrow scope
    STANDARD: Moderate complexity, clear domain, some nuance
    FULL:   Complex arguments, multiple stakeholders, high stakes
    """
    # Indicator helpers (single_claim, factual_verifiable, etc.) are boolean
    # heuristics assumed to be defined elsewhere; True values are tallied below.
    indicators = {
        "quick": [
            len(query) < 200,
            single_claim(query),
            factual_verifiable(query),
            low_controversy(query)
        ],
        "full": [
            len(query) > 1000,
            multi_stakeholder(query),
            ethical_implications(query),
            policy_recommendation(query),
            high_stakes_decision(query)
        ]
    }
    
    if sum(indicators["quick"]) >= 3:
        return Mode.QUICK
    elif sum(indicators["full"]) >= 2:
        return Mode.FULL
    else:
        return Mode.STANDARD
```

### Mode Specifications

| Mode | Lenses | Cross-Eval | Cycles | Threshold | Token Budget |
|------|--------|------------|--------|-----------|--------------|
| QUICK | 3 (S,E,A) | None | 1 | 0.85 | ~800 |
| STANDARD | 5 (all) | Selective (10 cells) | 2 | 0.92 | ~2000 |
| FULL | 5 (all) | Complete (20 cells) | 3 | 0.96 | ~4000 |

### Manual Triggers

| Trigger | Mode | Description |
|---------|------|-------------|
| `/critique` | Auto-detect | Intelligent mode selection |
| `/critique-quick` | QUICK | Fast, 3-lens, no cross-eval |
| `/critique-standard` | STANDARD | Balanced, selective cross-eval |
| `/critique-full` | FULL | Complete N² analysis |
| `/crosseval` | FULL | Emphasis on Φ2b matrix |
| `/aggregate` | FULL | Emphasis on Φ3 synthesis |

## Evaluative Lenses

Five orthogonal perspectives designed for comprehensive coverage with minimal overlap:

| Lens | Code | Domain | Core Question | Orthogonality Rationale |
|------|------|--------|---------------|-------------------------|
| STRUCTURAL | S | Logic & coherence | Is reasoning valid? | Form vs content |
| EVIDENTIAL | E | Evidence & epistemology | What justifies belief? | Justification type |
| SCOPE | O | Boundaries & generality | Where does this apply? | Domain limits |
| ADVERSARIAL | A | Opposition & alternatives | What's the best counter? | External challenge |
| PRAGMATIC | P | Application & consequence | Does this work? | Theory vs practice |

### Lens Independence Validation

Lenses target distinct failure modes:
- **S** catches: invalid inference, circular reasoning, equivocation
- **E** catches: weak evidence, unfalsifiable claims, cherry-picking
- **O** catches: overgeneralization, edge cases, context dependence
- **A** catches: stronger alternatives, unconsidered objections
- **P** catches: implementation barriers, unintended consequences

Overlap detection: If two lenses identify the same issue, it's either a genuine high-priority concern (reinforce) or a lens calibration problem (investigate).
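
A minimal sketch of that overlap check, assuming attacks are dicts with the `lens`, `target`, and `type` fields from the Φ2a schema below (the grouping key is an illustrative choice):

```python
from collections import defaultdict

def detect_overlaps(attacks: list[dict]) -> dict:
    """Group attacks that hit the same claim with the same attack type.

    Keys flagged by multiple lenses are either genuine high-priority
    concerns (reinforce) or lens calibration problems (investigate).
    """
    by_issue: dict = defaultdict(set)
    for a in attacks:
        by_issue[(a["target"], a["type"])].add(a["lens"])
    return {issue: lenses for issue, lenses in by_issue.items() if len(lenses) > 1}
```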

## Execution Protocol

### Φ0: Classification & Mode Selection

```python
def classify_and_configure(query: str) -> Config:
    mode = select_mode(query)
    
    configs = {
        Mode.QUICK: {
            "lenses": ["S", "E", "A"],
            "cross_eval": False,
            "cycles": 1,
            "threshold": 0.85,
            "token_budget": 800
        },
        Mode.STANDARD: {
            "lenses": ["S", "E", "O", "A", "P"],
            "cross_eval": "selective",  # 10 highest-value cells
            "cycles": 2,
            "threshold": 0.92,
            "token_budget": 2000
        },
        Mode.FULL: {
            "lenses": ["S", "E", "O", "A", "P"],
            "cross_eval": "complete",   # All 25 cells
            "cycles": 3,
            "threshold": 0.96,
            "token_budget": 4000
        }
    }
    
    return Config(**configs[mode], mode=mode)
```

**Output**: `[CRITIQUE:Φ0|mode={m}|lenses={n}|cross={type}|budget={t}]`

### Φ1: Thesis Generation

Generate committed response with explicit claim DAG.

**Requirements**:
1. State positions with **falsifiable specificity**
2. Build claim graph with stability ordering:
   - `F` (FOUNDATIONAL) — axioms, definitions (immutable after Φ1)
   - `S` (STRUCTURAL) — derived claims (attackable)
   - `P` (PERIPHERAL) — applications (most vulnerable)
3. Verify acyclicity (DAG enforcement)
4. Compute initial topology metrics

**Schema**:
```yaml
thesis:
  response: "{Complete committed response}"
  claims:
    - id: C1
      content: "{Specific falsifiable claim}"
      stability: F|S|P
      supports: [C2, C3]
      depends_on: []
      confidence: 0.0-1.0
      evidence_type: empirical|logical|definitional|analogical
  topology:
    nodes: {n}
    edges: {e}
    density: {e/n}  # Target ≥2.0
    cycles: 0       # Must be 0 (enforced)
  aggregate_confidence: 0.0-1.0
  completion_marker: "Φ1_COMPLETE"  # Required for Φ2 to proceed
```
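
A sketch of the Φ1 validity checks implied by this schema (acyclicity, density, completion marker), written over the YAML parsed into a plain dict; counting edges over `depends_on` links only is an illustrative simplification:

```python
def validate_thesis(thesis: dict) -> dict:
    """Check DAG acyclicity and compute topology metrics for a Φ1 thesis."""
    claims = {c["id"]: c for c in thesis["claims"]}
    edges = [(c["id"], dep) for c in thesis["claims"] for dep in c["depends_on"]]

    # Cycle detection: depth-first search with white/gray/black coloring.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {cid: WHITE for cid in claims}

    def has_cycle(node: str) -> bool:
        color[node] = GRAY
        for dep in claims[node]["depends_on"]:
            if color.get(dep) == GRAY:
                return True
            if color.get(dep) == WHITE and has_cycle(dep):
                return True
        color[node] = BLACK
        return False

    acyclic = not any(color[cid] == WHITE and has_cycle(cid) for cid in list(claims))
    return {
        "nodes": len(claims),
        "edges": len(edges),
        "density": len(edges) / max(len(claims), 1),   # target ≥ 2.0
        "acyclic": acyclic,                            # must be True (DAG enforcement)
        "ready": acyclic and thesis.get("completion_marker") == "Φ1_COMPLETE",
    }
```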

**Output**: `[CRITIQUE:Φ1|claims={n}|edges={e}|η={density}|conf={c}|✓]`

### Φ2: Multi-Lens Antithesis

#### Φ2a: Initial Lens Evaluations

**Prerequisite**: `Φ1.completion_marker == "Φ1_COMPLETE"`

Each lens independently evaluates thesis using attack vectors:

```yaml
# STRUCTURAL lens attacks
structural:
  - non_sequitur: "Conclusion does not follow from premises"
  - circular_reasoning: "Conclusion presupposed in premises"
  - false_dichotomy: "Excluded middle options"
  - equivocation: "Term shifts meaning mid-argument"

# EVIDENTIAL lens attacks  
evidential:
  - insufficient_evidence: "Claim exceeds evidential support"
  - cherry_picking: "Counter-evidence unaddressed"
  - unfalsifiable: "No possible disconfirming evidence"
  - correlation_causation: "Causal claim from correlational data"

# SCOPE lens attacks
scope:
  - overgeneralization: "Specific case → universal claim"
  - edge_case: "Valid boundary defeats universal"
  - context_dependence: "Unstated contextual requirements"

# ADVERSARIAL lens attacks
adversarial:
  - steel_man: "Strongest form of opposition"
  - alternative_explanation: "Competing hypothesis equally plausible"
  - precedent_contradiction: "Accepted instance defeats thesis"

# PRAGMATIC lens attacks
pragmatic:
  - implementation_barrier: "Cannot be executed as stated"
  - unintended_consequence: "Second-order effects harmful"
  - scaling_failure: "Works small, fails large"
```

**Per-lens output**:
```yaml
lens_evaluation:
  lens: S|E|O|A|P
  attacks:
    - target: C{id}
      type: "{attack_vector}"
      content: "{Specific critique}"
      severity: fatal|major|minor|cosmetic
      confidence_impact: -1.0 to 0.0
  summary_score: 0.0-1.0
  completion_marker: "Φ2a_{lens}_COMPLETE"
```

**Completion Gate**: All lenses must have `completion_marker` before Φ2b proceeds.
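
A minimal sketch of that gate, assuming the per-lens evaluations are collected as dicts matching the schema above:

```python
def phase2a_gate(lens_evaluations: list[dict], configured_lenses: list[str]) -> bool:
    """True only when every configured lens has emitted its Φ2a completion marker."""
    emitted = {e.get("completion_marker") for e in lens_evaluations}
    required = {f"Φ2a_{lens}_COMPLETE" for lens in configured_lenses}
    return required <= emitted   # subset check: no lens may be missing
```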

#### Φ2b: Cross-Lens Evaluation

**Prerequisite**: All `Φ2a_{lens}_COMPLETE` markers present

**QUICK mode**: Skip Φ2b entirely

**STANDARD mode**: Evaluate 10 highest-value cells:
- High-severity attacks from each lens (5 cells)
- Highest-confidence attacks cross-checked by adjacent lens (5 cells)

**FULL mode**: Complete 5×5 matrix (25 cells, minus 5 diagonal = 20 evaluations)

```
Cross-evaluation matrix:
    │  S eval │  E eval │  O eval │  A eval │  P eval │
────┼─────────┼─────────┼─────────┼─────────┼─────────┤
S → │    —    │   S→E   │   S→O   │   S→A   │   S→P   │
E → │   E→S   │    —    │   E→O   │   E→A   │   E→P   │
O → │   O→S   │   O→E   │    —    │   O→A   │   O→P   │
A → │   A→S   │   A→E   │   A→O   │    —    │   A→P   │
P → │   P→S   │   P→E   │   P→O   │   P→A   │    —    │
```
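
A sketch of cell enumeration for this matrix. FULL mode is exactly the 20 off-diagonal cells; the STANDARD-mode selection below is one illustrative reading of "highest-value cells" (adjacent-lens pairing), not a prescribed heuristic:

```python
from itertools import permutations

LENSES = ["S", "E", "O", "A", "P"]

def cross_eval_cells(mode: str) -> list[tuple[str, str]]:
    """Return (evaluator, evaluated) pairs for Φ2b."""
    if mode == "QUICK":
        return []                                    # Φ2b skipped entirely
    if mode == "FULL":
        return list(permutations(LENSES, 2))         # 20 off-diagonal cells

    # STANDARD: 10 cells built from adjacent-lens pairings around a fixed ring.
    ring = {lens: LENSES[(i + 1) % len(LENSES)] for i, lens in enumerate(LENSES)}
    forward = [(lens, ring[lens]) for lens in LENSES]    # each lens reviews its neighbour
    backward = [(ring[lens], lens) for lens in LENSES]   # and is reviewed in return
    return forward + backward
```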

**Cross-eval output**:
```yaml
cross_evaluation:
  evaluator: S|E|O|A|P
  evaluated: S|E|O|A|P
  verdict: endorse|partial|reject
  agreements: ["{attack_ids}"]
  disagreements:
    - attack: "{attack_id}"
      objection: "{Why evaluator disagrees}"
  missed: ["{What evaluator would add}"]
  calibration: "{Over/under severity assessment}"
```

**Output**: `[CRITIQUE:Φ2|mode={m}|attacks={n}|cross={cells}|✓]`

### Φ3: Aggregation & Synthesis

#### Phase 3a: Matrix Analysis

```python
def analyze_matrix(all_attacks: list, cross_evals: Matrix) -> Analysis:
    # Consensus: ≥80% lenses agree
    consensus = [a for a in all_attacks if agreement_rate(a) >= 0.80]
    
    # Contested: 40-79% agreement
    contested = [a for a in all_attacks if 0.40 <= agreement_rate(a) < 0.80]
    
    # Unique: Single lens, but cross-eval endorsed
    unique = [a for a in all_attacks 
              if source_count(a) == 1 and cross_endorsed(a)]
    
    # Rejected: <40% agreement AND cross-eval rejection
    rejected = [a for a in all_attacks 
                if agreement_rate(a) < 0.40 and cross_rejected(a)]
    
    return Analysis(consensus, contested, unique, rejected)
```
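
The helpers above (`agreement_rate`, `source_count`, `cross_endorsed`, `cross_rejected`) are left undefined by this spec. One plausible reading of `agreement_rate`, written over the verdict map an attack accumulates during Φ2b:

```python
def agreement_rate(cross_verdicts: dict[str, str]) -> float:
    """Fraction of evaluating lenses endorsing an attack.

    cross_verdicts maps lens code → "endorse" | "partial" | "reject";
    partial endorsements count half. An un-reviewed attack defaults to 1.0
    (only its source lens has weighed in).
    """
    if not cross_verdicts:
        return 1.0
    weight = {"endorse": 1.0, "partial": 0.5, "reject": 0.0}
    return sum(weight[v] for v in cross_verdicts.values()) / len(cross_verdicts)
```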

#### Phase 3b: Conflict Resolution

For contested items:

```python
def resolve_contested(contested: list, matrix: Matrix) -> list:
    resolutions = []
    for attack in contested:
        support_weight = sum(credibility(s) for s in supporters(attack))
        oppose_weight = sum(credibility(o) for o in opposers(attack))
        
        if support_weight > oppose_weight * 1.5:
            resolution = "ADOPT"
        elif oppose_weight > support_weight * 1.5:
            resolution = "REJECT"
        else:
            resolution = "CONDITIONAL"
        
        resolutions.append(Resolution(attack, resolution, rationale(attack)))
    return resolutions
```

#### Phase 3c: Recursive Compression

```
Pass 1: Apply consensus → Core modifications (mandatory)
Pass 2: Apply contested → Conditional modifications (with qualifications)
Pass 3: Apply unique → Enhancement layer (optional enrichment)
Pass 4: Validate coherence → If failed, re-compress with tighter constraints
```

**Maximum compression passes**: 4 (prevent infinite recursion)
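
One reading of the pass structure as code, with the layer application, coherence check, and constraint tightening injected as callables (none of these names are prescribed by this skill):

```python
MAX_COMPRESSION_PASSES = 4   # hard cap to prevent infinite recursion

def compress(thesis, analysis, apply_layer, is_coherent, tighten):
    """Φ3c sketch: apply layers in priority order, validate, re-compress if needed."""
    draft = thesis
    for _attempt in range(MAX_COMPRESSION_PASSES):
        draft = apply_layer(draft, analysis["consensus"], mode="mandatory")    # Pass 1
        draft = apply_layer(draft, analysis["contested"], mode="conditional")  # Pass 2
        draft = apply_layer(draft, analysis["unique"], mode="optional")        # Pass 3
        if is_coherent(draft):                                                 # Pass 4
            return draft
        analysis = tighten(analysis)   # coherence failed: tighter constraints, retry
    return draft                       # cap reached: emit best draft, flag uncertainty upstream
```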

**Synthesis output**:
```yaml
synthesis:
  response: "{Refined response}"
  modifications:
    from_consensus: [{claim, action, rationale}]
    from_contested: [{claim, action, condition}]
    from_unique: [{claim, enhancement}]
  rejected_attacks: [{attack, rejection_rationale}]
  residual_uncertainties: [{uncertainty, disagreeing_lenses, impact}]
  confidence:
    initial: {Φ1}
    final: {post-synthesis}
```

**Output**: `[CRITIQUE:Φ3|consensus={n}|contested={n}|unique={n}|rejected={n}|conf={f}]`

### Φ4: Convergence Check

**Convergence Formula**:
```python
convergence = (
    0.30 * semantic_similarity(Φ1, Φ3) +
    0.25 * graph_similarity(Φ1.claims, Φ3.claims) +
    0.25 * confidence_stability(Φ1.conf, Φ3.conf) +
    0.20 * (len(Φ3.consensus) / total_attacks)   # consensus rate
)
```

**Threshold Justification**:
- 0.85 (QUICK): Acceptable for low-stakes, rapid iteration
- 0.92 (STANDARD): Balances thoroughness with efficiency
- 0.96 (FULL): High confidence required for complex/high-stakes

**Outcomes**:
- `CONVERGED`: Score ≥ threshold → output Φ3 synthesis
- `ITERATE`: Score < threshold AND cycles < max → Φ3 becomes new Φ1
- `EXHAUSTED`: Cycles exhausted → output Φ3 with uncertainty report

**Output**: `[CRITIQUE:Φ4|conv={score}|{STATUS}|iter={n}/{max}]`
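
A sketch of the Φ4 decision, assuming the convergence score above has already been computed:

```python
def phi4_decision(score: float, threshold: float, cycle: int, max_cycles: int) -> str:
    """Map a convergence score onto the three Φ4 outcomes."""
    if score >= threshold:
        return "CONVERGED"    # output Φ3 synthesis
    if cycle < max_cycles:
        return "ITERATE"      # Φ3 becomes the new Φ1
    return "EXHAUSTED"        # output Φ3 with uncertainty report
```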

## Graceful Degradation

When resources constrained (token budget, time pressure):

```
FULL → interrupt → Continue as STANDARD
STANDARD → interrupt → Continue as QUICK
QUICK → interrupt → Output best available synthesis with uncertainty flag
```

**Degradation markers**:
```yaml
degraded_output:
  original_mode: FULL
  actual_mode: STANDARD
  skipped_phases: [Φ2b_partial]
  confidence_penalty: -0.1
  recommendation: "Re-run in FULL mode for complete analysis"
```
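
A sketch of the downgrade path, with the confidence penalty and re-run recommendation taken from the marker schema above (the shape of the terminal QUICK case is an assumption):

```python
DOWNGRADE = {"FULL": "STANDARD", "STANDARD": "QUICK"}

def degrade(current_mode: str, skipped_phases: list[str]) -> dict:
    """Produce a degraded_output record when resource limits interrupt a run."""
    next_mode = DOWNGRADE.get(current_mode)
    if next_mode is None:
        # QUICK interrupted: emit the best available synthesis, flagged as uncertain.
        return {"actual_mode": "QUICK", "uncertainty_flag": True,
                "recommendation": "Re-run when budget allows"}
    return {
        "original_mode": current_mode,
        "actual_mode": next_mode,
        "skipped_phases": skipped_phases,
        "confidence_penalty": -0.1,
        "recommendation": f"Re-run in {current_mode} mode for complete analysis",
    }
```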

## Compact Output Mode

```
[CRITIQUE|mode={m}|L={lenses}|c={cycle}/{max}]
[Φ1|n{claims}|e{edges}|η{density}|conf{c}|✓]
[Φ2|attacks{n}|cross{cells}|S:{s}|E:{e}|O:{o}|A:{a}|P:{p}|✓]
[Φ3|consensus{n}|contested{n}|unique{n}|rejected{n}|✓]
[Φ4|conv{score}|{STATUS}|conf{initial}→{final}]

SYNTHESIS: {2-3 sentence refined conclusion}
KEY_CHANGES: {Most significant modifications from Φ1}
RESIDUAL: {Primary unresolved uncertainty, if any}
```

## Meta-Cognitive Markers

```
[CLASSIFYING]  — Φ0: determining mode and resources
[COMMITTING]   — Φ1: stating without hedge
[LENS:X]       — Φ2a: evaluating from lens X perspective
[CROSS:X→Y]    — Φ2b: lens X evaluating lens Y's critique
[CONSENSUS]    — Φ3a: noting cross-lens agreement
[CONTESTED]    — Φ3a: noting genuine disagreement
[RESOLVING]    — Φ3b: applying resolution protocol
[COMPRESSING]  — Φ3c: recursive synthesis pass
[CONVERGING]   — Φ4: stability detected
[DEGRADING]    — Resource constraint, reducing scope
```

## Constraints

1. **Phase Dependencies**: Each phase requires predecessor completion marker
2. **DAG Enforcement**: Claim graph must remain acyclic; circular reasoning = fatal
3. **Stability Ordering**: FOUNDATIONAL claims immutable after Φ1
4. **Genuine Critique**: Softball attacks detected via cross-eval and rejected
5. **Compression Termination**: Max 4 recursive passes in Φ3c
6. **Convergence Cap**: Max cycles from config; output uncertainty if exhausted
7. **Token Budget**: Respect mode-specific limits; degrade gracefully if exceeded

## Integration

- **hierarchical-reasoning**: Map lenses to strategic/tactical/operational
- **graph**: Claim topology analysis, k-bisimulation on evaluation matrix
- **think**: Mental models power individual lens templates
- **non-linear**: Subagent spawning for parallel lens execution
- **infranodus**: Graph gap detection enhances STRUCTURAL lens
- **component**: Structure critique outputs as validatable configuration

## References

- `references/lens-specifications.md` — Complete lens templates and attack vectors
- `references/cross-evaluation-protocol.md` — Matrix construction and analysis
- `references/aggregation-algorithms.md` — Consensus extraction and compression

Overview

This skill performs multi-perspective dialectical refinement of a thesis using five orthogonal evaluative lenses and recursive synthesis. It spawns parallel critiques (STRUCTURAL, EVIDENTIAL, SCOPE, ADVERSARIAL, PRAGMATIC), cross-evaluates those critiques pairwise, and compresses the results into a coherent, higher-confidence output. The process supports quick, standard, and full modes with graceful degradation when resources are constrained.

How this skill works

The engine classifies the request to pick a mode, generates a committed thesis with a falsifiable claim graph, and runs independent lens evaluations that produce targeted attacks. In STANDARD and FULL modes each lens also cross-checks the other lenses' critiques, forming the N² evaluation matrix; matrix analysis then classifies issues as consensus, contested, unique, or rejected and resolves contested items through credibility-weighted reconciliation. Finally, recursive compression yields a refined response, and a convergence check either finalizes the synthesis or iterates the cycle.

When to use it

  • Stress-testing arguments, policies, or proposals for hidden weaknesses
  • Creating a robust, falsifiable version of a prior claim or recommendation
  • Producing a balanced synthesis from competing evidence and perspectives
  • Evaluating high-stakes or multi-stakeholder decisions that need thorough vetting
  • Rapid diagnostics for logical, evidential, or implementation flaws

Best practices

  • Provide a clear, specific thesis or question to maximize efficiency and accuracy
  • Choose STANDARD for most analytic needs; use FULL for high-stakes or complex cases
  • If a run degrades under token or time limits, re-run it later in the originally requested (or FULL) mode to recover skipped cross-checks
  • Accept the committed Φ1 thesis as the starting point for critique—this forces falsifiable specificity
  • Use the compact output mode for quick summaries and the full synthesis for decision documents

Example use cases

  • Refining a policy recommendation before executive approval, exposing edge cases and unintended consequences
  • Evaluating a scientific claim by assessing logic, evidence quality, scope limits, and counterhypotheses
  • Stress-testing a startup go-to-market plan for operational barriers and scaling failure modes
  • Comparing competing product strategies by surfacing unexamined assumptions and stronger alternatives
  • Producing meeting-ready synthesis that lists consensus changes, conditional fixes, and residual uncertainties

FAQ

How long does a full critique take and when should I use it?

FULL mode is token- and time-intensive and suits complex, high-stakes issues; use STANDARD for typical analysis and QUICK for lightweight checks or single-claim verification.

What happens if lenses disagree?

Disagreements are categorized as contested; the system weighs the credibility of supporting and opposing lenses to ADOPT, REJECT, or mark an attack CONDITIONAL, and records residual uncertainties for human review.