
padic-ultrametric skill

/skills/padic-ultrametric

This skill applies p-adic ultrametric distances to hierarchical skill analysis, integrating with UMAP, HNSW, and MLX to improve semantic clustering.

npx playbooks add skill plurigrid/asi --skill padic-ultrametric


Files (3)
SKILL.md
15.0 KB
---
name: padic-ultrametric
description: P-adic ultrametric distance as foundation for UMAP→itUMAP→HNSW→Snowflake→MLX→SPI
  skill stack with cq/jq/narya normal form diffing
license: MIT
metadata:
  trit: -1
  color: '#8B4513'
  gf3_role: MINUS
  version: 1.0.0
  prime: 2
  precision: 64
  interface_ports:
  - References
  - Integration Points
trit: -1
---
# P-adic Ultrametric Skill

**Trit**: -1 (MINUS - validates/constrains via non-Archimedean metric)  
**GF(3) Triad**: `padic-ultrametric (-1) ⊗ skill-embedding-vss (0) ⊗ gay-mcp (+1) = 0 ✓`

## The Stack: From Mathematics to Metal

```
┌─────────────────────────────────────────────────────────────────────┐
│  P-ADIC ULTRAMETRIC: The Non-Archimedean Foundation                 │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 6: P-adic Valuation                                          │
│           d(x,z) ≤ max(d(x,y), d(y,z)) ← Strong Triangle Inequality │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 5: UMAP / itUMAP                                             │
│           Manifold approximation with ultrametric preservation      │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 4: HNSW (Hierarchical Navigable Small World)                 │
│           Log(n) approximate nearest neighbor via skip-list graph   │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 3: Snowflake Arctic 1024-dim Embeddings                      │
│           Dense semantic vectors: text → R^1024                     │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 2: MLX (Apple Machine Learning Framework)                    │
│           Unified memory, lazy evaluation, Metal kernels            │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 1: Apple Silicon (M1/M2/M3/M4)                               │
│           SPI-verified: deterministic across parallel streams       │
└─────────────────────────────────────────────────────────────────────┘
```

## Why P-adic Ultrametrics?

### The Strong Triangle Inequality

Standard (Archimedean) triangle inequality: `d(x,z) ≤ d(x,y) + d(y,z)`

P-adic (non-Archimedean) inequality: `d(x,z) ≤ max(d(x,y), d(y,z))`

**Implications:**
1. **All triangles are isosceles** with the unequal side (if any) being the shortest
2. **Natural hierarchical clustering** - no intermediate distances
3. **Every point of a ball is a center** - the same ball can be described around any of its points
4. **Perfect for skill taxonomies** - semantic similarity is hierarchical
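
A minimal, self-contained check makes both properties concrete under the 2-adic norm (the integers 12, 4, 20 are chosen purely for illustration):

```python
def v2(n: int) -> int:
    """2-adic valuation: exponent of 2 dividing n (n != 0)."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

def norm2(n: int) -> float:
    """2-adic norm |n|_2 = 2^(-v2(n)); |0|_2 = 0."""
    return 0.0 if n == 0 else 2.0 ** (-v2(n))

x, y, z = 12, 4, 20
d_xy, d_yz, d_xz = norm2(x - y), norm2(y - z), norm2(x - z)   # 1/8, 1/16, 1/8

assert d_xz <= max(d_xy, d_yz)                                        # strong triangle inequality
assert sorted([d_xy, d_yz, d_xz])[1] == sorted([d_xy, d_yz, d_xz])[2]  # isosceles: two largest equal
```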

### Connection to Prime Geodesics (Chromatic Walk)

From [chromatic-walk](file:///Users/bob/.claude/skills/chromatic-walk/SKILL.md):

> Walks are **prime geodesics**: non-backtracking paths that are unambiguously traversable in p-adic number systems.

| Property | Prime Path | Composite Path |
|----------|------------|----------------|
| Factorization | **Unique** | Multiple |
| p-adic valuation | **Well-defined** | Ambiguous |
| Möbius μ(n) | ≠ 0 | = 0 (filtered) |
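
To make the Möbius filter in the table concrete, here is a naive sketch (the helper below is illustrative, not part of this skill's API):

```python
def mobius(n: int) -> int:
    """Naive Möbius function: 0 if n has a squared prime factor, else (-1)^k for k distinct primes."""
    if n == 1:
        return 1
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:   # repeated prime factor → not squarefree
                return 0
            count += 1
        else:
            d += 1
    if n > 1:
        count += 1
    return -1 if count % 2 else 1

print(mobius(7))    # -1 : prime index, kept
print(mobius(12))   #  0 : repeated factor (2^2), filtered out
```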

## Core Implementation

### P-adic Valuation and Norm

```python
import numpy as np

def p_adic_valuation(n: int, p: int = 2) -> float:
    """v_p(n) = largest k such that p^k divides n; v_p(0) = +infinity."""
    if n == 0:
        return float('inf')
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def p_adic_norm(n: int, p: int = 2) -> float:
    """|n|_p = p^(-v_p(n)). Smaller values are "further"."""
    v = p_adic_valuation(n, p)
    return 0.0 if v == float('inf') else p ** (-v)

def padic_ultrametric_distance(emb_a: np.ndarray, emb_b: np.ndarray, p: int = 2) -> float:
    """
    P-adic ultrametric: d_p(a, b) = max_i |a_i - b_i|_p
    
    Satisfies: d(x,z) ≤ max(d(x,y), d(y,z))
    """
    diff = emb_a - emb_b
    scale = 2 ** 32
    diff_int = (diff * scale).astype(np.int64)
    
    norms = [p_adic_norm(abs(int(d)), p) if d != 0 else 0.0 for d in diff_int]
    return max(norms) if norms else 0.0
```
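
A brief property test for the functions above; it is a sketch that uses small integer-valued test vectors so the 2^32 fixed-point discretization is exact and the inequality holds without floating-point caveats:

```python
import numpy as np

# Property check: integer-valued vectors keep the fixed-point discretization exact.
rng = np.random.default_rng(1069)
for _ in range(200):
    x, y, z = rng.integers(-8, 9, size=(3, 16)).astype(np.float64)
    d_xy = padic_ultrametric_distance(x, y)
    d_yz = padic_ultrametric_distance(y, z)
    d_xz = padic_ultrametric_distance(x, z)
    assert d_xz <= max(d_xy, d_yz)
print("strong triangle inequality verified on 200 random triples")
```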

### Content IDs with Normal Forms

Content is addressed by IDs with normal forms: content is normalized via cq/jq conventions, then changes are diffed with narya.el-style semantics:

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class ContentID:
    """Content-addressable identifier with normal form."""
    id: str
    content: str
    normal_form: str
    hash: str
    source: str  # 'cq' | 'jq' | 'narya'

def jq_normalize(content: str) -> str:
    """Normalize JSON/YAML using jq-style: sort keys, compact."""
    try:
        data = json.loads(content)
        return json.dumps(data, sort_keys=True, separators=(',', ':'))
    except json.JSONDecodeError:
        return content.strip()

def cq_normalize(content: str) -> str:
    """Normalize using cq (Clojure query) style - EDN/S-expr."""
    content = re.sub(r'\s+', ' ', content)
    return content.strip()

@dataclass
class NaryaDiff:
    """Narya-style semantic diff: before/after/delta/birth/death."""
    before: str
    after: str
    delta: Dict[str, Any]
    birth: List[str]   # New content
    death: List[str]   # Removed content
    
    @classmethod
    def from_contents(cls, before: str, after: str) -> 'NaryaDiff':
        before_lines = set(before.split('\n'))
        after_lines = set(after.split('\n'))
        birth = list(after_lines - before_lines)
        death = list(before_lines - after_lines)
        return cls(
            before=before, after=after,
            delta={'added': len(birth), 'removed': len(death), 'changed': len(birth) + len(death)},
            birth=birth, death=death
        )
    
    def to_narya_witness(self) -> Dict:
        """Format as Narya proof witness for observational bridge types."""
        import hashlib
        return {
            'before': hashlib.sha256(self.before.encode()).hexdigest()[:16],
            'after': hashlib.sha256(self.after.encode()).hexdigest()[:16],
            'delta': self.delta,
            'impact': self.delta['changed'] > 0
        }
```
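
A brief usage sketch with made-up JSON payloads, showing how normal forms and the narya witness fit together:

```python
before = '{"name": "padic-ultrametric", "prime": 2}'
after  = '{"prime": 2, "name": "padic-ultrametric", "precision": 64}'

# Normal forms are key-order independent, so only the real change survives.
nf_before = jq_normalize(before)
nf_after  = jq_normalize(after)

diff = NaryaDiff.from_contents(nf_before, nf_after)
print(diff.delta)               # {'added': 1, 'removed': 1, 'changed': 2}
print(diff.to_narya_witness())  # hashed before/after plus delta and impact flag
```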

## MLX Operation Tracing → SPI Verification

Trace every MLX op down to Metal kernels with Strong Parallelism Invariance:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
import zlib

@dataclass
class MLXTrace:
    """Trace MLX operations to Apple Silicon primitives."""
    operation: str
    input_shapes: List[Tuple[int, ...]]
    output_shape: Tuple[int, ...]
    metal_kernel: Optional[str] = None
    flops: int = 0
    memory_bytes: int = 0

class SPIVerifier:
    """Strong Parallelism Invariance: same seed → same result across parallel streams."""
    GOLDEN = 0x9E3779B97F4A7C15
    
    def __init__(self, seed: int):
        self.seed = seed
        self.state = seed
        self.traces: List[MLXTrace] = []
        self.checksums: List[int] = []
    
    def log_trace(self, trace: MLXTrace):
        self.traces.append(trace)
        # Update checksum deterministically (CRC32 keeps the chain stable across
        # processes, unlike Python's built-in string hash)
        self.state = ((self.state ^ zlib.crc32(trace.operation.encode())) * self.GOLDEN) & 0xFFFFFFFFFFFFFFFF
        self.checksums.append(self.state & 0xFFFF)
    
    def verify_chain(self) -> bool:
        """Verify deterministic chain: recompute from seed, compare checksums."""
        verifier = SPIVerifier(self.seed)
        for trace in self.traces:
            verifier.log_trace(trace)
        return verifier.checksums == self.checksums
```
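
A minimal usage sketch; the operation names, shapes, and FLOP counts below are illustrative, not actual MLX traces:

```python
verifier = SPIVerifier(seed=1069)
verifier.log_trace(MLXTrace(operation="matmul",
                            input_shapes=[(1, 1024), (1024, 1024)],
                            output_shape=(1, 1024),
                            metal_kernel="gemm",
                            flops=2 * 1024 * 1024))
verifier.log_trace(MLXTrace(operation="softmax",
                            input_shapes=[(1, 1024)],
                            output_shape=(1, 1024)))

assert verifier.verify_chain()   # replaying the traces from the same seed reproduces the checksums
print(verifier.checksums)        # one 16-bit digest per logged op
```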

## UMAP ↔ itUMAP ↔ HNSW Integration

### itUMAP (Iterative UMAP)

Preserves geodesic distances better than standard UMAP:

```python
import numpy as np

def itumap_with_padic(embeddings: np.ndarray, n_neighbors: int = 15,
                      metric: str = 'padic', prime: int = 2) -> np.ndarray:
    """
    itUMAP with p-adic ultrametric as base distance.

    1. Compute p-adic distance matrix
    2. Run UMAP with precomputed distances
    3. Iterate: refine based on projection error (left as a follow-up pass;
       this function performs the single-pass projection)
    """
    n = len(embeddings)
    
    # P-adic distance matrix
    dist_matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i+1, n):
            d = padic_ultrametric_distance(embeddings[i], embeddings[j], prime)
            dist_matrix[i, j] = d
            dist_matrix[j, i] = d
    
    # UMAP with precomputed distances
    import umap
    reducer = umap.UMAP(
        metric='precomputed',
        n_neighbors=n_neighbors,
        min_dist=0.1
    )
    projection = reducer.fit_transform(dist_matrix)
    
    return projection
```
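
A usage sketch, assuming `umap-learn` is installed; the random matrix stands in for real Snowflake embeddings:

```python
import numpy as np

rng = np.random.default_rng(1069)
fake_embeddings = rng.normal(size=(40, 64))   # 40 stand-in skill embeddings

coords = itumap_with_padic(fake_embeddings, n_neighbors=10, prime=2)
print(coords.shape)   # (40, 2): one planar point per embedding
```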

### HNSW with P-adic Reranking

Use Euclidean HNSW for recall, p-adic for precision:

```python
from typing import List, Tuple

import numpy as np

def hnsw_padic_search(index: 'PAdicSkillIndex', query: np.ndarray, k: int = 10) -> List[Tuple[str, float]]:
    """
    Two-stage search:
    1. HNSW retrieval (fast, Euclidean)
    2. P-adic reranking (precise, ultrametric)
    """
    # Stage 1: Get 3k candidates via HNSW
    candidates = index.conn.execute(f'''
        SELECT name, array_distance(embedding, ?::FLOAT[1024]) as dist
        FROM skills ORDER BY dist ASC LIMIT ?
    ''', [query.tolist(), k * 3]).fetchall()
    
    # Stage 2: Rerank by p-adic distance
    reranked = []
    for name, eucl_dist in candidates:
        if name in index.embeddings:
            padic_dist = padic_ultrametric_distance(query, index.embeddings[name], index.prime)
            reranked.append((name, padic_dist))
    
    reranked.sort(key=lambda x: x[1])
    return reranked[:k]
```
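
A hypothetical invocation, assuming a `PAdicSkillIndex` with a DuckDB connection (`conn`), an `embeddings` dict keyed by skill name, and a `prime` attribute, as the function above expects:

```python
import numpy as np
from padic_ultrametric import PAdicSkillIndex

idx = PAdicSkillIndex('/Users/bob/.claude/skills', seed=1069, prime=2)
idx.index_skills_with_ids()

query = np.asarray(idx.embeddings['bisimulation-game'], dtype=np.float64)
for name, padic_dist in hnsw_padic_search(idx, query, k=5):
    print(f'{name}: p-adic={padic_dist:.6f}')
```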

## Ultrametric Clustering

The strong triangle inequality gives natural hierarchical clusters:

```python
import numpy as np
from typing import Dict, List

def ultrametric_clustering(embeddings: Dict[str, np.ndarray],
                           threshold: float = 0.5,
                           prime: int = 2) -> List[List[str]]:
    """
    Agglomerative clustering in ultrametric space (complete-linkage merge rule;
    in an exact ultrametric this coincides with single linkage).

    In an ultrametric, all triangles are isosceles → natural dendrograms.
    """
    skills = list(embeddings.keys())
    n = len(skills)
    
    # Distance matrix
    dist_matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i+1, n):
            d = padic_ultrametric_distance(embeddings[skills[i]], embeddings[skills[j]], prime)
            dist_matrix[i, j] = d
            dist_matrix[j, i] = d
    
    # Agglomerative merging by maximum cross-cluster distance (complete linkage)
    clusters = [[i] for i in range(n)]
    
    while len(clusters) > 1:
        min_dist = float('inf')
        merge_pair = None
        
        for i, c1 in enumerate(clusters):
            for j, c2 in enumerate(clusters[i+1:], i+1):
                d = max(dist_matrix[a, b] for a in c1 for b in c2)
                if d < min_dist:
                    min_dist = d
                    merge_pair = (i, j)
        
        if min_dist > threshold or merge_pair is None:
            break
        
        i, j = merge_pair
        clusters[i].extend(clusters[j])
        clusters.pop(j)
    
    return [[skills[i] for i in cluster] for cluster in clusters]
```
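
A usage sketch with randomly generated stand-in vectors (the skill names are just labels); with a low threshold most skills stay in their own cluster, and raising it merges them along the dendrogram:

```python
import numpy as np

rng = np.random.default_rng(1069)
names = ['padic-ultrametric', 'skill-embedding-vss', 'gay-mcp', 'chromatic-walk']
fake = {name: rng.normal(size=32) for name in names}

for threshold in (0.25, 1.0):
    clusters = ultrametric_clustering(fake, threshold=threshold, prime=2)
    print(threshold, clusters)
```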

## Propagation to All Skills

### How to Use P-adic Distance in Any Skill

```python
# In any skill that uses similarity/distance (a, b, c are that skill's own embedding vectors):
from padic_ultrametric import padic_ultrametric_distance, PAdicConfig

config = PAdicConfig(prime=2, precision=64)

# Replace Euclidean distance:
# d_ab = np.linalg.norm(a - b)                          # OLD
d_ab = padic_ultrametric_distance(a, b, config.prime)   # NEW
d_bc = padic_ultrametric_distance(b, c, config.prime)
d_ac = padic_ultrametric_distance(a, c, config.prime)

# The ultrametric property guarantees the strong triangle inequality:
assert d_ac <= max(d_ab, d_bc)
```

## CLI Usage

```bash
# Find p-adic nearest neighbors
python -c "
from padic_ultrametric import PAdicSkillIndex
idx = PAdicSkillIndex('/Users/bob/.claude/skills', seed=1069, prime=2)
idx.index_skills_with_ids()
for name, eucl, padic in idx.padic_nearest('bisimulation-game', k=5):
    print(f'{name}: eucl={eucl:.4f}, p-adic={padic:.6f}')
"

# Find skills with content IDs and normal forms
python -c "
from padic_ultrametric import PAdicSkillIndex
idx = PAdicSkillIndex('/Users/bob/.claude/skills')
cids = idx.index_skills_with_ids()
for name, cid in list(cids.items())[:10]:
    print(f'{name}: {cid.id} (hash: {cid.hash}, source: {cid.source})')
"

# Generate SPI report
python -c "
from padic_ultrametric import PAdicSkillIndex
idx = PAdicSkillIndex('/Users/bob/.claude/skills', seed=1069)
idx.index_skills_with_ids()
report = idx.spi_report()
print(f'Seed: {report[\"seed\"]}, Chain valid: {report[\"chain_valid\"]}')
print(f'Total FLOPS: {report[\"total_flops\"]:,}')
"
```

## Invariants

```yaml
invariants:
  - name: ultrametric_property
    predicate: "d(x,z) ≤ max(d(x,y), d(y,z))"
    scope: all_triples
    
  - name: spi_determinism
    predicate: "same seed → same checksum chain"
    scope: per_session
    
  - name: normal_form_stability
    predicate: "jq_normalize(cq_normalize(x)) = jq_normalize(x)"
    scope: per_content
    
  - name: gf3_conservation
    predicate: "padic(-1) + embedding-vss(0) + gay-mcp(+1) = 0"
    scope: per_triad
```
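
A lightweight checker for the first invariant, sketched against the `padic_ultrametric_distance` defined above (the module import path is assumed):

```python
from itertools import combinations

from padic_ultrametric import padic_ultrametric_distance

def check_ultrametric_property(embeddings: dict, prime: int = 2, tol: float = 1e-12) -> bool:
    """For every triple, the two largest pairwise distances must coincide
    (equivalent to d(x,z) <= max(d(x,y), d(y,z)) in all orientations)."""
    names = list(embeddings)
    for a, b, c in combinations(names, 3):
        dists = sorted([
            padic_ultrametric_distance(embeddings[a], embeddings[b], prime),
            padic_ultrametric_distance(embeddings[b], embeddings[c], prime),
            padic_ultrametric_distance(embeddings[a], embeddings[c], prime),
        ])
        if dists[2] > dists[1] + tol:
            return False
    return True
```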

---

## End-of-Skill Interface

### Integration Points

| Skill | Integration Point | Benefit |
|-------|------------------|---------|
| `skill-embedding-vss` | `find_nearest()` | Hierarchical skill clusters |
| `chromatic-walk` | Prime geodesic validation | Non-backtracking paths |
| `bisimulation-game` | Observational equivalence | Ultrametric bisimilarity |
| `gay-mcp` | Color distance | p-adic hue difference |
| `glass-bead-game` | World hopping distance | Triangle inequality sparsification |

## References

- [Non-Archimedean Geometry](https://en.wikipedia.org/wiki/Non-Archimedean_geometry)
- [P-adic Numbers](https://en.wikipedia.org/wiki/P-adic_number)
- [Chromatic Walk: Prime Geodesics](file:///Users/bob/.claude/skills/chromatic-walk/SKILL.md)
- [Skill Embedding VSS](file:///Users/bob/.claude/skills/skill-embedding-vss/SKILL.md)
- [Structure-Aware Version Control via Observational Bridge Types](https://topos.institute/blog/2024-11-13-structure-aware-version-control-via-observational-bridge-types/)
- [SplitMix64](https://dl.acm.org/doi/10.1145/2714064.2660195)


---

## Autopoietic Marginalia

> **The interaction IS the skill improving itself.**

Every use of this skill is an opportunity for worlding:
- **MEMORY** (-1): Record what was learned
- **REMEMBERING** (0): Connect patterns to other skills  
- **WORLDING** (+1): Evolve the skill based on use



*Add Interaction Exemplars here as the skill is used.*

Overview

This skill implements a p-adic ultrametric distance and integrates it across a retrieval and embedding stack (UMAP → itUMAP → HNSW → Snowflake → MLX → SPI). It uses non-Archimedean geometry to produce hierarchical, deterministic similarity measures and ties content-addressable IDs to jq/cq/narya normal forms. The design emphasizes reproducible traces down to Metal kernels on Apple Silicon.

How this skill works

It computes p-adic valuations and norms on discretized embedding differences, then defines distance as the coordinate-wise maximum p-adic norm (a true ultrametric). Distances feed into an itUMAP pipeline for projection, HNSW for fast candidate recall, and a p-adic reranker for precision. The skill also normalizes content via jq/cq conventions and produces narya-style semantic diffs and SPI-verified MLX traces to guarantee deterministic behavior.

When to use it

  • When hierarchical, tree-like similarity is required rather than smooth Euclidean neighborhoods
  • To improve semantic clustering for taxonomy building, exploiting the fact that every point of an ultrametric ball is a center
  • When you need deterministic, reproducible ML traces across parallel executions on Apple Silicon
  • To rerank fast Euclidean retrievals with a precise ultrametric criterion
  • When content IDs must be stable via normal forms and semantic diffs

Best practices

  • Discretize and scale embeddings consistently before p-adic valuation to avoid numeric artifacts
  • Use Euclidean HNSW for candidate generation, then apply p-adic reranking for precision
  • Choose the prime and fixed precision (bit scaling) to match embedding quantization
  • Log MLX traces and verify via SPI on a fixed seed to ensure chain determinism
  • Normalize content with jq/cq before computing hashes or diffs to keep IDs stable

Example use cases

  • Building hierarchical skill taxonomies where every cluster has clear centers
  • High-precision reranking for semantic search: HNSW for recall, p-adic for final ranking
  • Embedding visualization that preserves ultrametric geodesics using itUMAP
  • Generating stable content-addressable IDs and narya witnesses for change provenance
  • Producing SPI reports that trace ML ops to Metal kernels for deployment verification

FAQ

Why use a p-adic ultrametric instead of Euclidean distance?

P-adic ultrametrics enforce the strong triangle inequality, producing natural hierarchical clusters, stable centers, and clear taxonomies where intermediate distances are suppressed.

How do I keep results deterministic across runs?

Use the SPI verifier: fix a seed, emit MLXTrace entries for each operation, and validate checksums. This reproducibility extends to Metal kernels on Apple Silicon when traces are logged consistently.