rpp skill

/rpp

This skill generates hierarchical knowledge graphs using recursive Pareto compression to create optimized domain ontologies and small-world schemas.

npx playbooks add skill zpankz/mcp-skillset --skill rpp

Review the files below or copy the command above to add this skill to your agents.

Files (11)

SKILL.md (9.0 KB)
---
name: rpp
description: |
  Generates hierarchical knowledge graphs via the Recursive Pareto Principle for optimised
  schema construction. Produces four-level structures (L0 meta-graph through L3 detail-graph)
  where each level contains 80% fewer nodes while grounding 80% of the coverage of the level
  it derives from, achieving 51% coverage from 0.8% of nodes via Pareto³ compression. Use
  when creating domain ontologies
  or knowledge architectures requiring: (1) Atomic first principles with emergent composites, 
  (2) Pareto-optimised information density, (3) Small-world topology with validated node 
  ratios (L1:L2 2-3:1), or (4) Bidirectional construction. Integrates with graph (η≥4 
  validation), abduct (refactoring), mega (SuperHyperGraphs), infranodus (gap detection). 
  Triggers: 'schema generation', 'ontology creation', 'Pareto hierarchy', 'recursive graph', 
  'first principles decomposition'.
---

# Recursive Pareto Principle (RPP)

> **λL.τ** : Domain → OptimisedSchema via recursive Pareto compression

## Purpose

Generate hierarchical knowledge structures where each level achieves maximum explanatory power with minimum nodes through recursive application of the Pareto principle.

## Core Model

```
L0 (Meta-graph/Schema)    ← 0.8% nodes → 51% coverage (Pareto³)
      ▲ abductive generalisation
      │
L1 (Logic-graph/Atomic)   ← 4% nodes → 64% coverage (Pareto²)
      ▲ Pareto extraction
      │
L2 (Concept-graph)        ← 20% nodes → 80% coverage (Pareto¹)
      ▲ emergent clustering
      │
L3 (Detail-graph)         ← 100% nodes → ground truth
```
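
Why 0.8% of the nodes grounds 51% of the coverage: the 80/20 split compounds multiplicatively across the three extractions (0.2³ = 0.008 of the nodes, 0.8³ ≈ 0.512 of the coverage). A minimal arithmetic sketch in plain Python:

```python
# Recursive Pareto compounding: each extraction keeps 20% of the nodes
# of the level below while retaining 80% of its coverage.
node_frac, coverage = 1.0, 1.0
for level in ("L2", "L1", "L0"):
    node_frac *= 0.20   # 80% fewer nodes per extraction
    coverage *= 0.80    # each level grounds 80% of its source
    print(f"{level}: {node_frac:.1%} of nodes -> {coverage:.0%} coverage")

# L2: 20.0% of nodes -> 80% coverage
# L1: 4.0% of nodes -> 64% coverage
# L0: 0.8% of nodes -> 51% coverage
```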

### Level Specifications

| Level | Role | Node % | Coverage | Node ratio |
|-------|------|--------|----------|------------|
| L0 | Meta-graph/Schema | 0.8% | 51% | 6-9:1 to L1 |
| L1 | Logic-graph/Atomic | 4% | 64% | 2-3:1 to L2 |
| L2 | Concept-graph/Composite | 20% | 80% | — |
| L3 | Detail-graph/Ground-truth | 100% | 100% | — |

### Node Ratio Constraints

- **L1:L2** = 2-3:1 (atomic to composite)
- **L0:L2** = 9-12:1 (schema to concept)
- **L1:L3** = 6-9:1 (atomic to detail)
- **Generation constraint**: 2-3 children per node at any level
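
A minimal sketch of checking these bounds mechanically; `check_ratio` and the commented node-count variables are illustrative placeholders, not part of the skill's API:

```python
def check_ratio(name: str, a: int, b: int, lo: float, hi: float) -> None:
    """Assert that the node-count ratio a:b falls within [lo, hi]:1."""
    ratio = a / b
    assert lo <= ratio <= hi, f"{name} = {ratio:.1f}:1, expected {lo}-{hi}:1"

# With hypothetical per-level node counts n_l0 ... n_l3:
# check_ratio("L1:L2", n_l1, n_l2, 2, 3)   # atomic to composite
# check_ratio("L1:L3", n_l1, n_l3, 6, 9)   # atomic to detail
```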

## Quick Start

### 1. Domain Analysis

```python
from rpp import RPPGenerator

# Initialize with domain text
rpp = RPPGenerator(domain="pharmacology")

# Extract ground truth (L3); `corpus` is the domain text collection to model
l3_graph = rpp.extract_details(corpus)
```

### 2. Hierarchical Construction

```python
# Bottom-up: L3 → L2 → L1 → L0
l2_graph = rpp.cluster_concepts(l3_graph, pareto_threshold=0.8)
l1_graph = rpp.extract_atomics(l2_graph, pareto_threshold=0.8)
l0_schema = rpp.generalise_schema(l1_graph, pareto_threshold=0.8)

# Validate ratios
rpp.validate_ratios(l0_schema, l1_graph, l2_graph, l3_graph)
```

### 3. Topology Validation

```python
# Ensure small-world properties
metrics = rpp.validate_topology(
    target_eta=4.0,        # Edge density
    target_ratio_l1_l2=(2, 3),
    target_ratio_l1_l3=(6, 9)
)
```

## Construction Methods

### Bottom-Up (Reconstruction)

Start from first principles, build emergent complexity:

```
L3 details → cluster → L2 concepts → extract → L1 atomics → generalise → L0 schema
```

Use when: Ground truth is well-defined, deriving principles from evidence.

### Top-Down (Decomposition)

Start from control systems, decompose to details:

```
L0 schema → derive → L1 atomics → expand → L2 concepts → ground → L3 details
```

Use when: Schema exists, validating against domain specifics.

### Bidirectional (Recommended)

Simultaneous construction with convergence:

```
┌─────────────────────────────────────────────┐
│ Bottom-Up             ⊗            Top-Down │
│ L3→L2→L1→L0        merge        L0→L1→L2→L3 │
│           └───────→ L2 ←───────┘            │
│                 convergence                 │
└─────────────────────────────────────────────┘
```

Use when: Iterative refinement needed, validating both directions.
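
A sketch of the convergence loop for this pattern. It reuses the Quick Start methods (`extract_details`, `cluster_concepts`, `extract_atomics`, `generalise_schema`); `propose_schema`, `decompose`, `merge_at`, and `converged` are hypothetical placeholders for the merge-at-L2 step:

```python
def build_bidirectional(rpp, corpus, max_rounds=5):
    l3 = rpp.extract_details(corpus)       # bottom-up anchor: ground truth
    l0 = rpp.propose_schema(corpus)        # top-down anchor: candidate schema
    merged = None
    for _ in range(max_rounds):
        bottom_up_l2 = rpp.cluster_concepts(l3, pareto_threshold=0.8)
        top_down_l2 = rpp.decompose(l0, to_level=2)
        merged = rpp.merge_at(level=2, a=bottom_up_l2, b=top_down_l2)
        if rpp.converged(merged):          # both directions agree at L2
            break
        # Re-derive the schema from the merged mid-level and iterate
        l1 = rpp.extract_atomics(merged, pareto_threshold=0.8)
        l0 = rpp.generalise_schema(l1, pareto_threshold=0.8)
    return merged
```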

## Graph Topology

### Small-World Properties

The RPP graph exhibits:
- **High clustering** — Related concepts form dense clusters
- **Short path length** — Any two nodes connected via few hops
- **Core-periphery structure** — L0/L1 form core, L2/L3 form periphery
- **Orthogonal bridges** — Unexpected cross-hierarchical connections

### Topology Targets

| Metric | Target | Validation |
|--------|--------|------------|
| η (density) | ≥ 4.0 | `graph.validate_topology()` |
| κ (clustering) | > 0.3 | Small-world coefficient |
| φ (isolation) | < 0.2 | No orphan nodes |
| Bridge edges | Present | Cross-level connections |
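
A sketch of checking these targets on a flattened graph with networkx, assuming η is read as edges per node, κ as average clustering, and φ as the fraction of isolated nodes (the table does not pin these definitions down):

```python
import networkx as nx

def topology_report(G: nx.Graph) -> dict:
    """Evaluate the RPP topology targets on a flattened level graph."""
    n = G.number_of_nodes()
    eta = G.number_of_edges() / n          # density proxy, target >= 4.0
    kappa = nx.average_clustering(G)       # clustering, target > 0.3
    phi = len(list(nx.isolates(G))) / n    # orphan fraction, target < 0.2
    return {"eta": eta, "kappa": kappa, "phi": phi,
            "ok": eta >= 4.0 and kappa > 0.3 and phi < 0.2}
```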

### Edge Types

1. **Vertical edges** — Parent-child across levels (L0↔L1↔L2↔L3)
2. **Horizontal edges** — Sibling relations within level
3. **Hyperedges** — Multi-node interactions (weighted by semantic importance)
4. **Bridge edges** — Orthogonal cross-hierarchical connections
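
One way the four edge types could be represented; an illustrative data model, not the skill's actual storage schema:

```python
from dataclasses import dataclass
from enum import Enum

class EdgeType(Enum):
    VERTICAL = "vertical"       # parent-child across levels (L0-L3)
    HORIZONTAL = "horizontal"   # sibling relation within a level
    HYPER = "hyper"             # multi-node interaction
    BRIDGE = "bridge"           # orthogonal cross-hierarchical link

@dataclass
class Edge:
    kind: EdgeType
    nodes: tuple[str, ...]      # two node ids, or more for a hyperedge
    weight: float = 1.0         # semantic importance (used for hyperedges)
```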

## Pareto Extraction Algorithm

```python
def pareto_extract(source_graph, target_ratio=0.2):
    """
    Extract Pareto-optimal nodes from a source graph.

    Args:
        source_graph: Input graph (e.g., L3 when extracting L2)
        target_ratio: Maximum fraction of nodes to keep (default 0.2 = 20%)

    Returns:
        Reduced graph with at most target_ratio * |nodes| nodes,
        grounding at least (1 - target_ratio) of semantic coverage
    """
    # 1. Compute node importance (PageRank + semantic weight)
    importance = compute_importance(source_graph)

    # 2. Greedily take the most important nodes until coverage is met
    selected = []
    coverage = 0.0
    for node in sorted(source_graph.nodes, key=lambda n: importance[n], reverse=True):
        selected.append(node)
        coverage += node.coverage_contribution
        if coverage >= (1 - target_ratio):
            break

    # 3. Verify the Pareto constraint: few nodes, high coverage
    assert len(selected) / len(source_graph.nodes) <= target_ratio
    assert coverage >= (1 - target_ratio)

    # 4. Build the reduced graph, preserving bridge edges and topology
    return build_subgraph(selected, preserve_bridges=True)
```
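
`compute_importance` is left unspecified above; a minimal sketch consistent with the "PageRank + semantic weight" comment, assuming networkx and a per-node `semantic_weight` attribute (the 50/50 blend is an arbitrary illustrative choice):

```python
import networkx as nx

def compute_importance(G: nx.Graph, alpha: float = 0.5) -> dict:
    """Blend PageRank centrality with a per-node semantic weight in [0, 1]."""
    pr = nx.pagerank(G)
    return {n: alpha * pr[n] + (1 - alpha) * G.nodes[n].get("semantic_weight", 0.0)
            for n in G}
```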

## Integration Points

### With graph skill

```python
# Validate RPP topology
from graph import validate_topology
metrics = validate_topology(rpp_graph, require_eta=4.0)
```

### With abduct skill

```python
# Refactor schema for optimisation
from abduct import refactor_schema
l0_optimised = refactor_schema(l0_schema, target_compression=0.8)
```

### With mega skill

```python
# Extend to n-SuperHyperGraphs for complex domains
from mega import extend_to_superhypergraph
shg = extend_to_superhypergraph(rpp_graph, max_hyperedge_arity=5)
```

### With infranodus MCP

```python
# Detect structural gaps
gaps = mcp__infranodus__generateContentGaps(rpp_graph.to_text())
bridges = mcp__infranodus__getGraphAndAdvice(optimize="gaps")
```

## Scale Invariance Principles

The RPP framework embodies scale-invariant patterns:

| Principle | Application in RPP |
|-----------|-------------------|
| Fractal self-similarity | Each level mirrors whole structure |
| Pareto distribution | 80/20 at each level compounds |
| Neuroplasticity | Pruning weak, amplifying strong connections |
| Free energy principle | Minimising surprise through compression |
| Critical phase transitions | Level boundaries as phase transitions |
| Power-law distribution | Node importance follows power law |
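
The power-law row can be sanity-checked by fitting a line to the ranked node importances in log-log space; a rough numpy sketch (the residual threshold is an illustrative choice, not a rigorous goodness-of-fit test):

```python
import numpy as np

def looks_power_law(importances, max_resid_std=0.5) -> bool:
    """Crude check: ranked importances should be roughly linear in log-log space."""
    ranked = np.sort(np.asarray(importances, dtype=float))[::-1]
    ranked = ranked[ranked > 0]             # log is undefined at zero
    x = np.log(np.arange(1, len(ranked) + 1))
    y = np.log(ranked)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return slope < 0 and resid.std() < max_resid_std
```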

## References

For detailed implementation, see:

| Need | File |
|------|------|
| Level-specific construction | [references/level-construction.md](references/level-construction.md) |
| Topology validation | [references/topology-validation.md](references/topology-validation.md) |
| Pareto algorithms | [references/pareto-algorithms.md](references/pareto-algorithms.md) |
| Scale invariance theory | [references/scale-invariance.md](references/scale-invariance.md) |
| Integration patterns | [references/integration-patterns.md](references/integration-patterns.md) |
| Examples and templates | [references/examples.md](references/examples.md) |

## Scripts

| Script | Purpose |
|--------|---------|
| [scripts/rpp_generator.py](scripts/rpp_generator.py) | Core RPP graph generation |
| [scripts/pareto_extract.py](scripts/pareto_extract.py) | Level extraction algorithm |
| [scripts/validate_ratios.py](scripts/validate_ratios.py) | Node ratio validation |
| [scripts/topology_check.py](scripts/topology_check.py) | Small-world validation |

## Checklist

### Before Generation
- [ ] Domain corpus available
- [ ] Target level count defined (typically 4)
- [ ] Integration skills accessible (graph, abduct)

### During Generation
- [ ] L3 ground truth extracted
- [ ] Each level achieves ~80% of its source's coverage with ~20% of its nodes
- [ ] Node ratios within constraints
- [ ] Hyperedge weights computed

### After Generation
- [ ] Topology validated (η≥4)
- [ ] Small-world coefficient verified
- [ ] Bridge edges present
- [ ] Schema exported in required format

---

```
λL.τ                     L3→L2→L1→L0 via Pareto extraction
80/20 → 64/4 → 51/0.8   recursive compression chain
rpp                      hierarchical knowledge architecture
```

Overview

This skill generates four-level hierarchical knowledge graphs using the Recursive Pareto Principle (RPP) to produce highly compressed, high-coverage schemas. It constructs L0 (meta-schema) through L3 (detail graph) so that each successively higher level contains about 80% fewer nodes than the one below while grounding most of its semantic coverage. The result is Pareto³ compression that delivers dense, small-world ontologies optimized for explanation and navigation.

How this skill works

RPP extracts an L3 ground-truth graph from a corpus, then applies iterative Pareto extraction and clustering to produce L2 concepts, L1 atomic logic nodes, and an L0 meta-schema. Each extraction selects top nodes by importance (PageRank + semantic weight) until coverage targets are met, preserving bridge edges and topology. Topology validators enforce small-world metrics (η, clustering, isolation) and node-ratio constraints during bidirectional convergence.

When to use it

  • Designing domain ontologies or knowledge architectures from large corpora
  • Deriving atomic first principles with emergent composite concepts
  • Compressing information for optimized explanation, search, or summarization
  • Validating or refactoring existing schemas to meet topology/ratio constraints

Best practices

  • Start with a clear L3 corpus and extract ground-truth details before clustering
  • Use bidirectional construction (bottom-up + top-down) for iterative convergence and validation
  • Validate node ratios and topology (η≥4, clustering >0.3, isolation <0.2) after each level extraction
  • Preserve bridge and hyperedges to maintain cross-hierarchical semantics and small-world structure

Example use cases

  • Create a compact ontology for a scientific domain (e.g., pharmacology) that maps first principles to experimental detail
  • Refactor an enterprise knowledge graph to improve explainability and reduce node bloat while keeping coverage
  • Generate navigation and search schemas where 0.8% of nodes can surface 51% of signals for rapid triage
  • Integrate with graph/abduct/mega/infranodus workflows to detect gaps and extend to SuperHyperGraphs

FAQ

What coverage and node ratios should I expect at each level?

Target approximately L3=100% nodes, L2≈20% nodes for ~80% coverage, L1≈4% for ~64% coverage, and L0≈0.8% for ~51% coverage, with L1:L2 roughly 2–3:1.

When should I use bidirectional construction instead of bottom-up only?

Use bidirectional (recommended) when you must validate a proposed schema against evidence or iterate rapidly—merge bottom-up discovery with top-down constraints for robust convergence.