rpp skill

/rpp

This skill generates hierarchical knowledge graphs using recursive Pareto compression to create optimized domain ontologies and small-world schemas.

npx playbooks add skill zpankz/mcp-skillset --skill rpp

Review the files below or copy the command above to add this skill to your agents.

Files (11)

SKILL.md (9.0 KB)
---
name: rpp
description: |
  Generates hierarchical knowledge graphs via the Recursive Pareto Principle for optimised
  schema construction. Produces four-level structures (L0 meta-graph through L3 detail-graph)
  where each level contains 80% fewer nodes while grounding 80% of the coverage of the level
  it derives from, achieving 51% coverage from 0.8% of nodes via Pareto³ compression. Use
  when creating domain ontologies
  or knowledge architectures requiring: (1) Atomic first principles with emergent composites, 
  (2) Pareto-optimised information density, (3) Small-world topology with validated node 
  ratios (L1:L2 2-3:1), or (4) Bidirectional construction. Integrates with graph (η≥4 
  validation), abduct (refactoring), mega (SuperHyperGraphs), infranodus (gap detection). 
  Triggers: 'schema generation', 'ontology creation', 'Pareto hierarchy', 'recursive graph', 
  'first principles decomposition'.
---

# Recursive Pareto Principle (RPP)

> **λL.τ** : Domain → OptimisedSchema via recursive Pareto compression

## Purpose

Generate hierarchical knowledge structures where each level achieves maximum explanatory power with minimum nodes through recursive application of the Pareto principle.

## Core Model

```
L0 (Meta-graph/Schema)    ← 0.8% nodes → 51% coverage (Pareto³)
      ▲ abductive generalisation
      │
L1 (Logic-graph/Atomic)   ← 4% nodes → 64% coverage (Pareto²)
      ▲ Pareto extraction
      │
L2 (Concept-graph)        ← 20% nodes → 80% coverage (Pareto¹)
      ▲ emergent clustering
      │
L3 (Detail-graph)         ← 100% nodes → ground truth
```
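
Why 0.8% of the nodes grounds 51% of the coverage: the 80/20 split compounds multiplicatively across the three extractions (0.2³ = 0.008 of the nodes, 0.8³ ≈ 0.512 of the coverage). A minimal arithmetic sketch in plain Python:

```python
# Recursive Pareto compounding: each extraction keeps 20% of the nodes
# of the level below while retaining 80% of its coverage.
node_frac, coverage = 1.0, 1.0
for level in ("L2", "L1", "L0"):
    node_frac *= 0.20   # 80% fewer nodes per extraction
    coverage *= 0.80    # each level grounds 80% of its source
    print(f"{level}: {node_frac:.1%} of nodes -> {coverage:.0%} coverage")

# L2: 20.0% of nodes -> 80% coverage
# L1: 4.0% of nodes -> 64% coverage
# L0: 0.8% of nodes -> 51% coverage
```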

### Level Specifications

| Level | Role | Node % | Coverage | Node ratio |
|-------|------|--------|----------|------------|
| L0 | Meta-graph/Schema | 0.8% | 51% | 6-9:1 to L1 |
| L1 | Logic-graph/Atomic | 4% | 64% | 2-3:1 to L2 |
| L2 | Concept-graph/Composite | 20% | 80% | — |
| L3 | Detail-graph/Ground-truth | 100% | 100% | — |

### Node Ratio Constraints

- **L1:L2** = 2-3:1 (atomic to composite)
- **L0:L2** = 9-12:1 (schema to concept)
- **L1:L3** = 6-9:1 (atomic to detail)
- **Generation constraint**: 2-3 children per node at any level
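
A minimal sketch of checking these bounds mechanically; `check_ratio` and the commented node-count variables are illustrative placeholders, not part of the skill's API:

```python
def check_ratio(name: str, a: int, b: int, lo: float, hi: float) -> None:
    """Assert that the node-count ratio a:b falls within [lo, hi]:1."""
    ratio = a / b
    assert lo <= ratio <= hi, f"{name} = {ratio:.1f}:1, expected {lo}-{hi}:1"

# With hypothetical per-level node counts n_l0 ... n_l3:
# check_ratio("L1:L2", n_l1, n_l2, 2, 3)   # atomic to composite
# check_ratio("L1:L3", n_l1, n_l3, 6, 9)   # atomic to detail
```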

## Quick Start

### 1. Domain Analysis

```python
from rpp import RPPGenerator

# Initialize with domain text
rpp = RPPGenerator(domain="pharmacology")

# Extract ground truth (L3); `corpus` is the domain text collection to model
l3_graph = rpp.extract_details(corpus)
```

### 2. Hierarchical Construction

```python
# Bottom-up: L3 → L2 → L1 → L0
l2_graph = rpp.cluster_concepts(l3_graph, pareto_threshold=0.8)
l1_graph = rpp.extract_atomics(l2_graph, pareto_threshold=0.8)
l0_schema = rpp.generalise_schema(l1_graph, pareto_threshold=0.8)

# Validate ratios
rpp.validate_ratios(l0_schema, l1_graph, l2_graph, l3_graph)
```

### 3. Topology Validation

```python
# Ensure small-world properties
metrics = rpp.validate_topology(
    target_eta=4.0,        # Edge density
    target_ratio_l1_l2=(2, 3),
    target_ratio_l1_l3=(6, 9)
)
```

## Construction Methods

### Bottom-Up (Reconstruction)

Start from first principles, build emergent complexity:

```
L3 details → cluster → L2 concepts → extract → L1 atomics → generalise → L0 schema
```

Use when: Ground truth is well-defined, deriving principles from evidence.

### Top-Down (Decomposition)

Start from control systems, decompose to details:

```
L0 schema → derive → L1 atomics → expand → L2 concepts → ground → L3 details
```

Use when: Schema exists, validating against domain specifics.

### Bidirectional (Recommended)

Simultaneous construction with convergence:

```
┌─────────────────────────────────────────────┐
│ Bottom-Up             ⊗            Top-Down │
│ L3→L2→L1→L0        merge        L0→L1→L2→L3 │
│           └───────→ L2 ←───────┘            │
│                 convergence                 │
└─────────────────────────────────────────────┘
```

Use when: Iterative refinement needed, validating both directions.
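
A sketch of the convergence loop for this pattern. It reuses the Quick Start methods (`extract_details`, `cluster_concepts`, `extract_atomics`, `generalise_schema`); `propose_schema`, `decompose`, `merge_at`, and `converged` are hypothetical placeholders for the merge-at-L2 step:

```python
def build_bidirectional(rpp, corpus, max_rounds=5):
    l3 = rpp.extract_details(corpus)       # bottom-up anchor: ground truth
    l0 = rpp.propose_schema(corpus)        # top-down anchor: candidate schema
    merged = None
    for _ in range(max_rounds):
        bottom_up_l2 = rpp.cluster_concepts(l3, pareto_threshold=0.8)
        top_down_l2 = rpp.decompose(l0, to_level=2)
        merged = rpp.merge_at(level=2, a=bottom_up_l2, b=top_down_l2)
        if rpp.converged(merged):          # both directions agree at L2
            break
        # Re-derive the schema from the merged mid-level and iterate
        l1 = rpp.extract_atomics(merged, pareto_threshold=0.8)
        l0 = rpp.generalise_schema(l1, pareto_threshold=0.8)
    return merged
```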

## Graph Topology

### Small-World Properties

The RPP graph exhibits:
- **High clustering** — Related concepts form dense clusters
- **Short path length** — Any two nodes connected via few hops
- **Core-periphery structure** — L0/L1 form core, L2/L3 form periphery
- **Orthogonal bridges** — Unexpected cross-hierarchical connections

### Topology Targets

| Metric | Target | Validation |
|--------|--------|------------|
| η (density) | ≥ 4.0 | `graph.validate_topology()` |
| κ (clustering) | > 0.3 | Small-world coefficient |
| φ (isolation) | < 0.2 | No orphan nodes |
| Bridge edges | Present | Cross-level connections |
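
A sketch of checking these targets on a flattened graph with networkx, assuming η is read as edges per node, κ as average clustering, and φ as the fraction of isolated nodes (the table does not pin these definitions down):

```python
import networkx as nx

def topology_report(G: nx.Graph) -> dict:
    """Evaluate the RPP topology targets on a flattened level graph."""
    n = G.number_of_nodes()
    eta = G.number_of_edges() / n          # density proxy, target >= 4.0
    kappa = nx.average_clustering(G)       # clustering, target > 0.3
    phi = len(list(nx.isolates(G))) / n    # orphan fraction, target < 0.2
    return {"eta": eta, "kappa": kappa, "phi": phi,
            "ok": eta >= 4.0 and kappa > 0.3 and phi < 0.2}
```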

### Edge Types

1. **Vertical edges** — Parent-child across levels (L0↔L1↔L2↔L3)
2. **Horizontal edges** — Sibling relations within level
3. **Hyperedges** — Multi-node interactions (weighted by semantic importance)
4. **Bridge edges** — Orthogonal cross-hierarchical connections
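
One way the four edge types could be represented; an illustrative data model, not the skill's actual storage schema:

```python
from dataclasses import dataclass
from enum import Enum

class EdgeType(Enum):
    VERTICAL = "vertical"       # parent-child across levels (L0-L3)
    HORIZONTAL = "horizontal"   # sibling relation within a level
    HYPER = "hyper"             # multi-node interaction
    BRIDGE = "bridge"           # orthogonal cross-hierarchical link

@dataclass
class Edge:
    kind: EdgeType
    nodes: tuple[str, ...]      # two node ids, or more for a hyperedge
    weight: float = 1.0         # semantic importance (used for hyperedges)
```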

## Pareto Extraction Algorithm

```python
def pareto_extract(source_graph, target_ratio=0.2):
    """
    Extract Pareto-optimal nodes from a source graph.

    Args:
        source_graph: Input graph (e.g., L3 when extracting L2)
        target_ratio: Maximum fraction of nodes to keep (default 0.2 = 20%)

    Returns:
        Reduced graph with at most target_ratio * |nodes| nodes,
        grounding at least (1 - target_ratio) of semantic coverage
    """
    # 1. Compute node importance (PageRank + semantic weight)
    importance = compute_importance(source_graph)

    # 2. Greedily take the most important nodes until coverage is met
    selected = []
    coverage = 0.0
    for node in sorted(source_graph.nodes, key=lambda n: importance[n], reverse=True):
        selected.append(node)
        coverage += node.coverage_contribution
        if coverage >= (1 - target_ratio):
            break

    # 3. Verify the Pareto constraint: few nodes, high coverage
    assert len(selected) / len(source_graph.nodes) <= target_ratio
    assert coverage >= (1 - target_ratio)

    # 4. Build the reduced graph, preserving bridge edges and topology
    return build_subgraph(selected, preserve_bridges=True)
```
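
`compute_importance` is left unspecified above; a minimal sketch consistent with the "PageRank + semantic weight" comment, assuming networkx and a per-node `semantic_weight` attribute (the 50/50 blend is an arbitrary illustrative choice):

```python
import networkx as nx

def compute_importance(G: nx.Graph, alpha: float = 0.5) -> dict:
    """Blend PageRank centrality with a per-node semantic weight in [0, 1]."""
    pr = nx.pagerank(G)
    return {n: alpha * pr[n] + (1 - alpha) * G.nodes[n].get("semantic_weight", 0.0)
            for n in G}
```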

## Integration Points

### With graph skill

```python
# Validate RPP topology
from graph import validate_topology
metrics = validate_topology(rpp_graph, require_eta=4.0)
```

### With abduct skill

```python
# Refactor schema for optimisation
from abduct import refactor_schema
l0_optimised = refactor_schema(l0_schema, target_compression=0.8)
```

### With mega skill

```python
# Extend to n-SuperHyperGraphs for complex domains
from mega import extend_to_superhypergraph
shg = extend_to_superhypergraph(rpp_graph, max_hyperedge_arity=5)
```

### With infranodus MCP

```python
# Detect structural gaps
gaps = mcp__infranodus__generateContentGaps(rpp_graph.to_text())
bridges = mcp__infranodus__getGraphAndAdvice(optimize="gaps")
```

## Scale Invariance Principles

The RPP framework embodies scale-invariant patterns:

| Principle | Application in RPP |
|-----------|-------------------|
| Fractal self-similarity | Each level mirrors whole structure |
| Pareto distribution | 80/20 at each level compounds |
| Neuroplasticity | Pruning weak, amplifying strong connections |
| Free energy principle | Minimising surprise through compression |
| Critical phase transitions | Level boundaries as phase transitions |
| Power-law distribution | Node importance follows power law |
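
The power-law row can be sanity-checked by fitting a line to the ranked node importances in log-log space; a rough numpy sketch (the residual threshold is an illustrative choice, not a rigorous goodness-of-fit test):

```python
import numpy as np

def looks_power_law(importances, max_resid_std=0.5) -> bool:
    """Crude check: ranked importances should be roughly linear in log-log space."""
    ranked = np.sort(np.asarray(importances, dtype=float))[::-1]
    ranked = ranked[ranked > 0]             # log is undefined at zero
    x = np.log(np.arange(1, len(ranked) + 1))
    y = np.log(ranked)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return slope < 0 and resid.std() < max_resid_std
```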

## References

For detailed implementation, see:

| Need | File |
|------|------|
| Level-specific construction | [references/level-construction.md](references/level-construction.md) |
| Topology validation | [references/topology-validation.md](references/topology-validation.md) |
| Pareto algorithms | [references/pareto-algorithms.md](references/pareto-algorithms.md) |
| Scale invariance theory | [references/scale-invariance.md](references/scale-invariance.md) |
| Integration patterns | [references/integration-patterns.md](references/integration-patterns.md) |
| Examples and templates | [references/examples.md](references/examples.md) |

## Scripts

| Script | Purpose |
|--------|---------|
| [scripts/rpp_generator.py](scripts/rpp_generator.py) | Core RPP graph generation |
| [scripts/pareto_extract.py](scripts/pareto_extract.py) | Level extraction algorithm |
| [scripts/validate_ratios.py](scripts/validate_ratios.py) | Node ratio validation |
| [scripts/topology_check.py](scripts/topology_check.py) | Small-world validation |

## Checklist

### Before Generation
- [ ] Domain corpus available
- [ ] Target level count defined (typically 4)
- [ ] Integration skills accessible (graph, abduct)

### During Generation
- [ ] L3 ground truth extracted
- [ ] Each level achieves ~80% of its source's coverage with ~20% of its nodes
- [ ] Node ratios within constraints
- [ ] Hyperedge weights computed

### After Generation
- [ ] Topology validated (η≥4)
- [ ] Small-world coefficient verified
- [ ] Bridge edges present
- [ ] Schema exported in required format

---

```
λL.τ                     L3→L2→L1→L0 via Pareto extraction
80/20 → 64/4 → 51/0.8   recursive compression chain
rpp                      hierarchical knowledge architecture
```

Overview

This skill generates four-level hierarchical knowledge graphs using the Recursive Pareto Principle (RPP) to produce highly compressed, high-coverage schemas. It constructs L0 (meta-schema) through L3 (detail graph) so that each successively higher level contains about 80% fewer nodes than the one below while grounding most of its semantic coverage. The result is Pareto³ compression that delivers dense, small-world ontologies optimized for explanation and navigation.

How this skill works

RPP extracts an L3 ground-truth graph from a corpus, then applies iterative Pareto extraction and clustering to produce L2 concepts, L1 atomic logic nodes, and an L0 meta-schema. Each extraction selects top nodes by importance (PageRank + semantic weight) until coverage targets are met, preserving bridge edges and topology. Topology validators enforce small-world metrics (η, clustering, isolation) and node-ratio constraints during bidirectional convergence.

When to use it

  • Designing domain ontologies or knowledge architectures from large corpora
  • Deriving atomic first principles with emergent composite concepts
  • Compressing information for optimized explanation, search, or summarization
  • Validating or refactoring existing schemas to meet topology/ratio constraints

Best practices

  • Start with a clear L3 corpus and extract ground-truth details before clustering
  • Use bidirectional construction (bottom-up + top-down) for iterative convergence and validation
  • Validate node ratios and topology (η≥4, clustering >0.3, isolation <0.2) after each level extraction
  • Preserve bridge and hyperedges to maintain cross-hierarchical semantics and small-world structure

Example use cases

  • Create a compact ontology for a scientific domain (e.g., pharmacology) that maps first principles to experimental detail
  • Refactor an enterprise knowledge graph to improve explainability and reduce node bloat while keeping coverage
  • Generate navigation and search schemas where 0.8% of nodes can surface 51% of signals for rapid triage
  • Integrate with graph/abduct/mega/infranodus workflows to detect gaps and extend to SuperHyperGraphs

FAQ

What coverage and node ratios should I expect at each level?

Target approximately L3=100% nodes, L2≈20% nodes for ~80% coverage, L1≈4% for ~64% coverage, and L0≈0.8% for ~51% coverage, with L1:L2 roughly 2–3:1.

When should I use bidirectional construction instead of bottom-up only?

Use bidirectional (recommended) when you must validate a proposed schema against evidence or iterate rapidly—merge bottom-up discovery with top-down constraints for robust convergence.