
padic-ultrametric skill

/skills/padic-ultrametric

This skill applies p-adic ultrametric distances to hierarchical skill analysis, integrating with UMAP, HNSW, and MLX to improve semantic clustering.

npx playbooks add skill plurigrid/asi --skill padic-ultrametric


Files (3)
SKILL.md
15.0 KB
---
name: padic-ultrametric
description: P-adic ultrametric distance as foundation for UMAP→itUMAP→HNSW→Snowflake→MLX→SPI
  skill stack with cq/jq/narya normal form diffing
license: MIT
metadata:
  trit: -1
  color: '#8B4513'
  gf3_role: MINUS
  version: 1.0.0
  prime: 2
  precision: 64
  interface_ports:
  - References
  - Integration Points
trit: -1
---
# P-adic Ultrametric Skill

**Trit**: -1 (MINUS - validates/constrains via non-Archimedean metric)  
**GF(3) Triad**: `padic-ultrametric (-1) ⊗ skill-embedding-vss (0) ⊗ gay-mcp (+1) = 0 ✓`

## The Stack: From Mathematics to Metal

```
┌─────────────────────────────────────────────────────────────────────┐
│  P-ADIC ULTRAMETRIC: The Non-Archimedean Foundation                 │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 6: P-adic Valuation                                          │
│           d(x,z) ≤ max(d(x,y), d(y,z)) ← Strong Triangle Inequality │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 5: UMAP / itUMAP                                             │
│           Manifold approximation with ultrametric preservation      │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 4: HNSW (Hierarchical Navigable Small World)                 │
│           Log(n) approximate nearest neighbor via skip-list graph   │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 3: Snowflake Arctic 1024-dim Embeddings                      │
│           Dense semantic vectors: text → R^1024                     │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 2: MLX (Apple Machine Learning Framework)                    │
│           Unified memory, lazy evaluation, Metal kernels            │
├─────────────────────────────────────────────────────────────────────┤
│  Layer 1: Apple Silicon (M1/M2/M3/M4)                               │
│           SPI-verified: deterministic across parallel streams       │
└─────────────────────────────────────────────────────────────────────┘
```

## Why P-adic Ultrametrics?

### The Strong Triangle Inequality

Standard (Archimedean) triangle inequality: `d(x,z) ≤ d(x,y) + d(y,z)`

P-adic (non-Archimedean) inequality: `d(x,z) ≤ max(d(x,y), d(y,z))`

**Implications:**
1. **All triangles are isosceles** with the unequal side (if any) being the shortest
2. **Natural hierarchical clustering** - no intermediate distances
3. **Every point of a ball is a center** - the same ball can be described around any of its points
4. **Perfect for skill taxonomies** - semantic similarity is hierarchical
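
A minimal, self-contained check makes both properties concrete under the 2-adic norm (the integers 12, 4, 20 are chosen purely for illustration):

```python
def v2(n: int) -> int:
    """2-adic valuation: exponent of 2 dividing n (n != 0)."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k

def norm2(n: int) -> float:
    """2-adic norm |n|_2 = 2^(-v2(n)); |0|_2 = 0."""
    return 0.0 if n == 0 else 2.0 ** (-v2(n))

x, y, z = 12, 4, 20
d_xy, d_yz, d_xz = norm2(x - y), norm2(y - z), norm2(x - z)   # 1/8, 1/16, 1/8

assert d_xz <= max(d_xy, d_yz)                                        # strong triangle inequality
assert sorted([d_xy, d_yz, d_xz])[1] == sorted([d_xy, d_yz, d_xz])[2]  # isosceles: two largest equal
```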

### Connection to Prime Geodesics (Chromatic Walk)

From [chromatic-walk](file:///Users/bob/.claude/skills/chromatic-walk/SKILL.md):

> Walks are **prime geodesics**: non-backtracking paths that are unambiguously traversable in p-adic number systems.

| Property | Prime Path | Composite Path |
|----------|------------|----------------|
| Factorization | **Unique** | Multiple |
| p-adic valuation | **Well-defined** | Ambiguous |
| Möbius μ(n) | ≠ 0 | = 0 (filtered) |
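
To make the Möbius filter in the table concrete, here is a naive sketch (the helper below is illustrative, not part of this skill's API):

```python
def mobius(n: int) -> int:
    """Naive Möbius function: 0 if n has a squared prime factor, else (-1)^k for k distinct primes."""
    if n == 1:
        return 1
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:   # repeated prime factor → not squarefree
                return 0
            count += 1
        else:
            d += 1
    if n > 1:
        count += 1
    return -1 if count % 2 else 1

print(mobius(7))    # -1 : prime index, kept
print(mobius(12))   #  0 : repeated factor (2^2), filtered out
```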

## Core Implementation

### P-adic Valuation and Norm

```python
import numpy as np

def p_adic_valuation(n: int, p: int = 2) -> float:
    """v_p(n) = largest k such that p^k divides n; v_p(0) = +infinity."""
    if n == 0:
        return float('inf')
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def p_adic_norm(n: int, p: int = 2) -> float:
    """|n|_p = p^(-v_p(n)). Smaller values are "further"."""
    v = p_adic_valuation(n, p)
    return 0.0 if v == float('inf') else p ** (-v)

def padic_ultrametric_distance(emb_a: np.ndarray, emb_b: np.ndarray, p: int = 2) -> float:
    """
    P-adic ultrametric: d_p(a, b) = max_i |a_i - b_i|_p
    
    Satisfies: d(x,z) ≤ max(d(x,y), d(y,z))
    """
    diff = emb_a - emb_b
    scale = 2 ** 32
    diff_int = (diff * scale).astype(np.int64)
    
    norms = [p_adic_norm(abs(int(d)), p) if d != 0 else 0.0 for d in diff_int]
    return max(norms) if norms else 0.0
```
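
A brief property test for the functions above; it is a sketch that uses small integer-valued test vectors so the 2^32 fixed-point discretization is exact and the inequality holds without floating-point caveats:

```python
import numpy as np

# Property check: integer-valued vectors keep the fixed-point discretization exact.
rng = np.random.default_rng(1069)
for _ in range(200):
    x, y, z = rng.integers(-8, 9, size=(3, 16)).astype(np.float64)
    d_xy = padic_ultrametric_distance(x, y)
    d_yz = padic_ultrametric_distance(y, z)
    d_xz = padic_ultrametric_distance(x, z)
    assert d_xz <= max(d_xy, d_yz)
print("strong triangle inequality verified on 200 random triples")
```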

### Content IDs with Normal Forms

Content is addressed by IDs with normal forms: content is normalized via cq/jq conventions, then changes are diffed with narya.el-style semantics:

```python
import json
import re
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class ContentID:
    """Content-addressable identifier with normal form."""
    id: str
    content: str
    normal_form: str
    hash: str
    source: str  # 'cq' | 'jq' | 'narya'

def jq_normalize(content: str) -> str:
    """Normalize JSON/YAML using jq-style: sort keys, compact."""
    try:
        data = json.loads(content)
        return json.dumps(data, sort_keys=True, separators=(',', ':'))
    except json.JSONDecodeError:
        return content.strip()

def cq_normalize(content: str) -> str:
    """Normalize using cq (Clojure query) style - EDN/S-expr."""
    content = re.sub(r'\s+', ' ', content)
    return content.strip()

@dataclass
class NaryaDiff:
    """Narya-style semantic diff: before/after/delta/birth/death."""
    before: str
    after: str
    delta: Dict[str, Any]
    birth: List[str]   # New content
    death: List[str]   # Removed content
    
    @classmethod
    def from_contents(cls, before: str, after: str) -> 'NaryaDiff':
        before_lines = set(before.split('\n'))
        after_lines = set(after.split('\n'))
        birth = list(after_lines - before_lines)
        death = list(before_lines - after_lines)
        return cls(
            before=before, after=after,
            delta={'added': len(birth), 'removed': len(death), 'changed': len(birth) + len(death)},
            birth=birth, death=death
        )
    
    def to_narya_witness(self) -> Dict:
        """Format as Narya proof witness for observational bridge types."""
        import hashlib
        return {
            'before': hashlib.sha256(self.before.encode()).hexdigest()[:16],
            'after': hashlib.sha256(self.after.encode()).hexdigest()[:16],
            'delta': self.delta,
            'impact': self.delta['changed'] > 0
        }
```
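
A brief usage sketch with made-up JSON payloads, showing how normal forms and the narya witness fit together:

```python
before = '{"name": "padic-ultrametric", "prime": 2}'
after  = '{"prime": 2, "name": "padic-ultrametric", "precision": 64}'

# Normal forms are key-order independent, so only the real change survives.
nf_before = jq_normalize(before)
nf_after  = jq_normalize(after)

diff = NaryaDiff.from_contents(nf_before, nf_after)
print(diff.delta)               # {'added': 1, 'removed': 1, 'changed': 2}
print(diff.to_narya_witness())  # hashed before/after plus delta and impact flag
```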

## MLX Operation Tracing → SPI Verification

Trace every MLX op down to Metal kernels with Strong Parallelism Invariance:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
import zlib

@dataclass
class MLXTrace:
    """Trace MLX operations to Apple Silicon primitives."""
    operation: str
    input_shapes: List[Tuple[int, ...]]
    output_shape: Tuple[int, ...]
    metal_kernel: Optional[str] = None
    flops: int = 0
    memory_bytes: int = 0

class SPIVerifier:
    """Strong Parallelism Invariance: same seed → same result across parallel streams."""
    GOLDEN = 0x9E3779B97F4A7C15
    
    def __init__(self, seed: int):
        self.seed = seed
        self.state = seed
        self.traces: List[MLXTrace] = []
        self.checksums: List[int] = []
    
    def log_trace(self, trace: MLXTrace):
        self.traces.append(trace)
        # Update checksum deterministically (CRC32 keeps the chain stable across
        # processes, unlike Python's built-in string hash)
        self.state = ((self.state ^ zlib.crc32(trace.operation.encode())) * self.GOLDEN) & 0xFFFFFFFFFFFFFFFF
        self.checksums.append(self.state & 0xFFFF)
    
    def verify_chain(self) -> bool:
        """Verify deterministic chain: recompute from seed, compare checksums."""
        verifier = SPIVerifier(self.seed)
        for trace in self.traces:
            verifier.log_trace(trace)
        return verifier.checksums == self.checksums
```
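
A minimal usage sketch; the operation names, shapes, and FLOP counts below are illustrative, not actual MLX traces:

```python
verifier = SPIVerifier(seed=1069)
verifier.log_trace(MLXTrace(operation="matmul",
                            input_shapes=[(1, 1024), (1024, 1024)],
                            output_shape=(1, 1024),
                            metal_kernel="gemm",
                            flops=2 * 1024 * 1024))
verifier.log_trace(MLXTrace(operation="softmax",
                            input_shapes=[(1, 1024)],
                            output_shape=(1, 1024)))

assert verifier.verify_chain()   # replaying the traces from the same seed reproduces the checksums
print(verifier.checksums)        # one 16-bit digest per logged op
```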

## UMAP ↔ itUMAP ↔ HNSW Integration

### itUMAP (Iterative UMAP)

Preserves geodesic distances better than standard UMAP:

```python
import numpy as np

def itumap_with_padic(embeddings: np.ndarray, n_neighbors: int = 15,
                      metric: str = 'padic', prime: int = 2) -> np.ndarray:
    """
    itUMAP with p-adic ultrametric as base distance.

    1. Compute p-adic distance matrix
    2. Run UMAP with precomputed distances
    3. Iterate: refine based on projection error (left as a follow-up pass;
       this function performs the single-pass projection)
    """
    n = len(embeddings)
    
    # P-adic distance matrix
    dist_matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i+1, n):
            d = padic_ultrametric_distance(embeddings[i], embeddings[j], prime)
            dist_matrix[i, j] = d
            dist_matrix[j, i] = d
    
    # UMAP with precomputed distances
    import umap
    reducer = umap.UMAP(
        metric='precomputed',
        n_neighbors=n_neighbors,
        min_dist=0.1
    )
    projection = reducer.fit_transform(dist_matrix)
    
    return projection
```
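
A usage sketch, assuming `umap-learn` is installed; the random matrix stands in for real Snowflake embeddings:

```python
import numpy as np

rng = np.random.default_rng(1069)
fake_embeddings = rng.normal(size=(40, 64))   # 40 stand-in skill embeddings

coords = itumap_with_padic(fake_embeddings, n_neighbors=10, prime=2)
print(coords.shape)   # (40, 2): one planar point per embedding
```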

### HNSW with P-adic Reranking

Use Euclidean HNSW for recall, p-adic for precision:

```python
from typing import List, Tuple

import numpy as np

def hnsw_padic_search(index: 'PAdicSkillIndex', query: np.ndarray, k: int = 10) -> List[Tuple[str, float]]:
    """
    Two-stage search:
    1. HNSW retrieval (fast, Euclidean)
    2. P-adic reranking (precise, ultrametric)
    """
    # Stage 1: Get 3k candidates via HNSW
    candidates = index.conn.execute(f'''
        SELECT name, array_distance(embedding, ?::FLOAT[1024]) as dist
        FROM skills ORDER BY dist ASC LIMIT ?
    ''', [query.tolist(), k * 3]).fetchall()
    
    # Stage 2: Rerank by p-adic distance
    reranked = []
    for name, eucl_dist in candidates:
        if name in index.embeddings:
            padic_dist = padic_ultrametric_distance(query, index.embeddings[name], index.prime)
            reranked.append((name, padic_dist))
    
    reranked.sort(key=lambda x: x[1])
    return reranked[:k]
```
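
A hypothetical invocation, assuming a `PAdicSkillIndex` with a DuckDB connection (`conn`), an `embeddings` dict keyed by skill name, and a `prime` attribute, as the function above expects:

```python
import numpy as np
from padic_ultrametric import PAdicSkillIndex

idx = PAdicSkillIndex('/Users/bob/.claude/skills', seed=1069, prime=2)
idx.index_skills_with_ids()

query = np.asarray(idx.embeddings['bisimulation-game'], dtype=np.float64)
for name, padic_dist in hnsw_padic_search(idx, query, k=5):
    print(f'{name}: p-adic={padic_dist:.6f}')
```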

## Ultrametric Clustering

The strong triangle inequality gives natural hierarchical clusters:

```python
import numpy as np
from typing import Dict, List

def ultrametric_clustering(embeddings: Dict[str, np.ndarray],
                           threshold: float = 0.5,
                           prime: int = 2) -> List[List[str]]:
    """
    Agglomerative clustering in ultrametric space (complete-linkage merge rule;
    in an exact ultrametric this coincides with single linkage).

    In an ultrametric, all triangles are isosceles → natural dendrograms.
    """
    skills = list(embeddings.keys())
    n = len(skills)
    
    # Distance matrix
    dist_matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i+1, n):
            d = padic_ultrametric_distance(embeddings[skills[i]], embeddings[skills[j]], prime)
            dist_matrix[i, j] = d
            dist_matrix[j, i] = d
    
    # Agglomerative merging by maximum cross-cluster distance (complete linkage)
    clusters = [[i] for i in range(n)]
    
    while len(clusters) > 1:
        min_dist = float('inf')
        merge_pair = None
        
        for i, c1 in enumerate(clusters):
            for j, c2 in enumerate(clusters[i+1:], i+1):
                d = max(dist_matrix[a, b] for a in c1 for b in c2)
                if d < min_dist:
                    min_dist = d
                    merge_pair = (i, j)
        
        if min_dist > threshold or merge_pair is None:
            break
        
        i, j = merge_pair
        clusters[i].extend(clusters[j])
        clusters.pop(j)
    
    return [[skills[i] for i in cluster] for cluster in clusters]
```
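
A usage sketch with randomly generated stand-in vectors (the skill names are just labels); with a low threshold most skills stay in their own cluster, and raising it merges them along the dendrogram:

```python
import numpy as np

rng = np.random.default_rng(1069)
names = ['padic-ultrametric', 'skill-embedding-vss', 'gay-mcp', 'chromatic-walk']
fake = {name: rng.normal(size=32) for name in names}

for threshold in (0.25, 1.0):
    clusters = ultrametric_clustering(fake, threshold=threshold, prime=2)
    print(threshold, clusters)
```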

## Propagation to All Skills

### How to Use P-adic Distance in Any Skill

```python
# In any skill that uses similarity/distance (a, b, c are that skill's own embedding vectors):
from padic_ultrametric import padic_ultrametric_distance, PAdicConfig

config = PAdicConfig(prime=2, precision=64)

# Replace Euclidean distance:
# d_ab = np.linalg.norm(a - b)                          # OLD
d_ab = padic_ultrametric_distance(a, b, config.prime)   # NEW
d_bc = padic_ultrametric_distance(b, c, config.prime)
d_ac = padic_ultrametric_distance(a, c, config.prime)

# The ultrametric property guarantees the strong triangle inequality:
assert d_ac <= max(d_ab, d_bc)
```

## CLI Usage

```bash
# Find p-adic nearest neighbors
python -c "
from padic_ultrametric import PAdicSkillIndex
idx = PAdicSkillIndex('/Users/bob/.claude/skills', seed=1069, prime=2)
idx.index_skills_with_ids()
for name, eucl, padic in idx.padic_nearest('bisimulation-game', k=5):
    print(f'{name}: eucl={eucl:.4f}, p-adic={padic:.6f}')
"

# Find skills with content IDs and normal forms
python -c "
from padic_ultrametric import PAdicSkillIndex
idx = PAdicSkillIndex('/Users/bob/.claude/skills')
cids = idx.index_skills_with_ids()
for name, cid in list(cids.items())[:10]:
    print(f'{name}: {cid.id} (hash: {cid.hash}, source: {cid.source})')
"

# Generate SPI report
python -c "
from padic_ultrametric import PAdicSkillIndex
idx = PAdicSkillIndex('/Users/bob/.claude/skills', seed=1069)
idx.index_skills_with_ids()
report = idx.spi_report()
print(f'Seed: {report[\"seed\"]}, Chain valid: {report[\"chain_valid\"]}')
print(f'Total FLOPS: {report[\"total_flops\"]:,}')
"
```

## Invariants

```yaml
invariants:
  - name: ultrametric_property
    predicate: "d(x,z) ≤ max(d(x,y), d(y,z))"
    scope: all_triples
    
  - name: spi_determinism
    predicate: "same seed → same checksum chain"
    scope: per_session
    
  - name: normal_form_stability
    predicate: "jq_normalize(cq_normalize(x)) = jq_normalize(x)"
    scope: per_content
    
  - name: gf3_conservation
    predicate: "padic(-1) + embedding-vss(0) + gay-mcp(+1) = 0"
    scope: per_triad
```
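
A lightweight checker for the first invariant, sketched against the `padic_ultrametric_distance` defined above (the module import path is assumed):

```python
from itertools import combinations

from padic_ultrametric import padic_ultrametric_distance

def check_ultrametric_property(embeddings: dict, prime: int = 2, tol: float = 1e-12) -> bool:
    """For every triple, the two largest pairwise distances must coincide
    (equivalent to d(x,z) <= max(d(x,y), d(y,z)) in all orientations)."""
    names = list(embeddings)
    for a, b, c in combinations(names, 3):
        dists = sorted([
            padic_ultrametric_distance(embeddings[a], embeddings[b], prime),
            padic_ultrametric_distance(embeddings[b], embeddings[c], prime),
            padic_ultrametric_distance(embeddings[a], embeddings[c], prime),
        ])
        if dists[2] > dists[1] + tol:
            return False
    return True
```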

---

## End-of-Skill Interface

### Integration Points

| Skill | Integration Point | Benefit |
|-------|------------------|---------|
| `skill-embedding-vss` | `find_nearest()` | Hierarchical skill clusters |
| `chromatic-walk` | Prime geodesic validation | Non-backtracking paths |
| `bisimulation-game` | Observational equivalence | Ultrametric bisimilarity |
| `gay-mcp` | Color distance | p-adic hue difference |
| `glass-bead-game` | World hopping distance | Triangle inequality sparsification |

## References

- [Non-Archimedean Geometry](https://en.wikipedia.org/wiki/Non-Archimedean_geometry)
- [P-adic Numbers](https://en.wikipedia.org/wiki/P-adic_number)
- [Chromatic Walk: Prime Geodesics](file:///Users/bob/.claude/skills/chromatic-walk/SKILL.md)
- [Skill Embedding VSS](file:///Users/bob/.claude/skills/skill-embedding-vss/SKILL.md)
- [Structure-Aware Version Control via Observational Bridge Types](https://topos.institute/blog/2024-11-13-structure-aware-version-control-via-observational-bridge-types/)
- [SplitMix64](https://dl.acm.org/doi/10.1145/2714064.2660195)


---

## Autopoietic Marginalia

> **The interaction IS the skill improving itself.**

Every use of this skill is an opportunity for worlding:
- **MEMORY** (-1): Record what was learned
- **REMEMBERING** (0): Connect patterns to other skills  
- **WORLDING** (+1): Evolve the skill based on use



*Add Interaction Exemplars here as the skill is used.*

Overview

This skill implements a p-adic ultrametric distance and integrates it across a retrieval and embedding stack (UMAP → itUMAP → HNSW → Snowflake → MLX → SPI). It uses non-Archimedean geometry to produce hierarchical, deterministic similarity measures and ties content-addressable IDs to jq/cq/narya normal forms. The design emphasizes reproducible traces down to Metal kernels on Apple Silicon.

How this skill works

It computes p-adic valuations and norms on discretized embedding differences, then defines distance as the coordinate-wise maximum p-adic norm (a true ultrametric). Distances feed into an itUMAP pipeline for projection, HNSW for fast candidate recall, and a p-adic reranker for precision. The skill also normalizes content via jq/cq conventions and produces narya-style semantic diffs and SPI-verified MLX traces to guarantee deterministic behavior.

When to use it

  • When hierarchical, tree-like similarity is required rather than smooth Euclidean neighborhoods
  • To improve semantic clustering for taxonomy building, exploiting the fact that every point of an ultrametric ball is a center
  • When you need deterministic, reproducible ML traces across parallel executions on Apple Silicon
  • To rerank fast Euclidean retrievals with a precise ultrametric criterion
  • When content IDs must be stable via normal forms and semantic diffs

Best practices

  • Discretize and scale embeddings consistently before p-adic valuation to avoid numeric artifacts
  • Use Euclidean HNSW for candidate generation, then apply p-adic reranking for precision
  • Choose the prime and fixed precision (bit scaling) to match embedding quantization
  • Log MLX traces and verify via SPI on a fixed seed to ensure chain determinism
  • Normalize content with jq/cq before computing hashes or diffs to keep IDs stable

Example use cases

  • Building hierarchical skill taxonomies where every cluster has clear centers
  • High-precision reranking for semantic search: HNSW for recall, p-adic for final ranking
  • Embedding visualization that preserves ultrametric geodesics using itUMAP
  • Generating stable content-addressable IDs and narya witnesses for change provenance
  • Producing SPI reports that trace ML ops to Metal kernels for deployment verification

FAQ

Why use a p-adic ultrametric instead of Euclidean distance?

P-adic ultrametrics enforce the strong triangle inequality, producing natural hierarchical clusters, stable centers, and clear taxonomies where intermediate distances are suppressed.

How do I keep results deterministic across runs?

Use the SPI verifier: fix a seed, emit MLXTrace entries for each operation, and validate checksums. This reproducibility extends to Metal kernels on Apple Silicon when traces are logged consistently.