home / skills / zpankz / mcp-skillset / leann

leann skill

needs review

This skill enables fast local semantic search for large codebases by using anchor-based reconstruction to dramatically reduce storage while preserving accuracy.

npx playbooks add skill zpankz/mcp-skillset --skill leann

Review the files below or copy the command above to add this skill to your agents.

Files (5)

SKILL.md

18.7 KB

---
name: leann
description: Local RAG indexing with 97% storage reduction via anchor-based lazy recomputation. Graph-based selective embedding storage for memory-efficient semantic code search.
version: 1.0.0
triggers:
  - keywords: [leann, semantic search, code index, RAG, vector search, embedding index, anchor graph]
  - context: [indexing large codebases, memory-efficient search, local code search, production RAG]
  - file_patterns: [leann.config.json, leann.index, .leann/]
tags: [rag, vector-search, graph, compression, local-first, semantic-search]
---

# LEANN Skill

**LEANN (Learned Embedding ANchor Navigation)** - A graph-based selective recomputation system achieving 97% storage reduction for local RAG indexes while maintaining fast retrieval performance.

---

## Core Principle

**Use leann for persistent local code indexes with minimal storage overhead.**

Instead of storing all embeddings at full precision (3 GB for 1M files), leann stores only strategic anchors and reconstructs others on-demand (128 MB for same dataset - **95.8% reduction**).

---

## When to Use LEANN

### Primary Use Cases

**1. Large Codebase Indexing (10K+ files)**
- Monorepos with multiple services
- Enterprise codebases that exceed vector database free tiers
- Projects requiring fast local semantic search without cloud dependencies

**2. Memory-Constrained Environments**
- Development machines with limited RAM
- CI/CD pipelines needing index validation
- Edge deployments or air-gapped systems

**3. Cost-Sensitive RAG Applications**
- Avoiding Pinecone/Weaviate monthly costs
- Self-hosted vector search with minimal infrastructure
- Batch processing of large document collections

**4. Real-Time Code Navigation**
- IDE integrations for semantic code search
- Developer tools needing instant relevance feedback
- Documentation search within editors

### When NOT to Use LEANN

**Use traditional vector DBs instead when:**
- Dataset is small (<1K items) - overhead not worth it
- Cloud infrastructure is required (multi-tenant, global CDN)
- Need advanced features (hybrid search, filtering, multi-tenancy)
- Absolute lowest latency required (<5ms) - leann trades latency for storage

---

## Architecture Overview

### Two-Stage Query Process

```
Query → [Stage 1: Anchor Graph Search] → [Stage 2: Lazy Reconstruction] → Results

Stage 1: Fast HNSW/DiskANN traversal (1-5ms)
Stage 2: Reconstruct top candidates (5-25ms)
Total: 10-30ms (HNSW) or 50-200ms (DiskANN)
```

### Storage Model

```
Naive:  N × D × 4 bytes (full embeddings)
LEANN:  M × D × 4 + N × (D / 8) bytes (anchors + compressed deltas)

Example (100K files, 768 dims):
Naive:  100,000 × 768 × 4     = 307 MB
LEANN:  1,000 × 768 × 4       = 3 MB (anchors)
        100,000 × 96          = 9.6 MB (deltas)
        Total                 ≈ 12.6 MB (95.9% reduction)
```

### Key Components

1. **Anchor Graph:** HNSW (in-memory) or DiskANN (disk-based) graph of anchor embeddings
2. **Product Quantization:** Compress delta vectors (delta = embedding - anchor)
3. **ZMQ Embedding Server:** GPU-accelerated embedding generation
4. **Delta Index:** Incremental updates without full rebuild

---

## Quick Start

### 1. Installation

```bash
# Via pip
pip install leann

# Via conda
conda install -c conda-forge leann

# From source
git clone https://github.com/leann-index/leann.git
cd leann && pip install -e .
```

### 2. Start Embedding Server

```bash
# Docker (recommended - GPU support)
docker run -d \
  --name leann-embeddings \
  --gpus all \
  -p 5555:5555 \
  -e MODEL=sentence-transformers/all-MiniLM-L6-v2 \
  leann/embedding-server:latest

# Python script (alternative)
python -m leann.server \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  --port 5555 \
  --device cuda
```

### 3. Create Index

```bash
# Basic indexing
leann index create \
  --input /path/to/codebase \
  --output ./leann.index \
  --config leann.config.json

# With progress tracking
leann index create \
  --input /path/to/codebase \
  --output ./leann.index \
  --progress \
  --verbose
```

### 4. Query Index

```bash
# CLI query
leann query \
  --index ./leann.index \
  --query "JWT authentication middleware" \
  --top-k 10

# Watch mode (live updates)
leann serve \
  --index ./leann.index \
  --watch \
  --port 8080
```

---

## Configuration

### Minimal Configuration

```json
{
  "backend": "hnsw",
  "complexity": "medium",
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 300
  },
  "quantization": {
    "subVectors": 96,
    "codebookSize": 256
  },
  "embeddingServer": {
    "endpoint": "tcp://localhost:5555",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "batchSize": 128,
    "timeout": 30000
  }
}
```

### Production Configuration

```json
{
  "backend": "hnsw",
  "complexity": "high",
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 1000,
    "samples": 20000
  },
  "quantization": {
    "subVectors": 96,
    "codebookSize": 256,
    "trainingSamples": 50000
  },
  "embeddingServer": {
    "endpoint": "tcp://localhost:5555",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "batchSize": 256,
    "timeout": 60000
  },
  "incremental": {
    "enabled": true,
    "deltaIndexThreshold": 1000,
    "rebuildSchedule": "0 2 * * *"
  },
  "hnsw": {
    "M": 32,
    "efConstruction": 400,
    "efSearch": 200
  }
}
```

### Configuration Parameters

#### Backend Selection

**`backend: "hnsw"` (In-Memory)**
- Use when index fits in RAM
- Query latency: 10-30ms
- Suitable for: <1M files, <50K anchors

**`backend: "diskann"` (Disk-Based)**
- Use when index exceeds RAM
- Query latency: 50-200ms
- Suitable for: 1M+ files, massive monorepos

#### Complexity Levels

| Level | M | efConstruction | efSearch | Build Time | Accuracy |
|-------|---|----------------|----------|------------|----------|
| low | 12 | 100 | 50 | Fast | Good |
| medium | 16 | 200 | 100 | Moderate | Better |
| high | 32 | 400 | 200 | Slow | Best |

#### Anchor Selection Strategies

**1. Random (Fastest)**
```json
{
  "type": "random",
  "count": 300
}
```
- Speed: O(N)
- Quality: Acceptable for homogeneous codebases
- Use when: Prototyping or single-language projects

**2. K-Means (Recommended)**
```json
{
  "type": "kmeans",
  "clusters": 300,
  "samples": 10000
}
```
- Speed: O(N × k × iterations)
- Quality: Excellent coverage
- Use when: Production deployments (default)

**3. Max-Coverage (Best Quality)**
```json
{
  "type": "max-coverage",
  "count": 300,
  "diversityThreshold": 0.2
}
```
- Speed: O(N × M²)
- Quality: Best for sparse distributions
- Use when: High-value outliers must be findable

---

## Common Workflows

### Workflow 1: Initial Indexing

```bash
# 1. Discover codebase
find /path/to/codebase -type f \( -name "*.py" -o -name "*.ts" \) | wc -l
# Output: 15,000 files

# 2. Calculate recommended anchors
# Rule: M = √N → √15000 ≈ 122 → recommend 300 for better coverage

# 3. Create config
cat > leann.config.json <<EOF
{
  "backend": "hnsw",
  "complexity": "medium",
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 300,
    "samples": 5000
  },
  "quantization": {
    "subVectors": 96,
    "codebookSize": 256
  },
  "embeddingServer": {
    "endpoint": "tcp://localhost:5555",
    "model": "sentence-transformers/all-MiniLM-L6-v2",
    "batchSize": 128,
    "timeout": 30000
  }
}
EOF

# 4. Build index
leann index create \
  --config leann.config.json \
  --input /path/to/codebase \
  --output ./leann.index \
  --progress

# 5. Validate
leann index validate ./leann.index

# 6. Test query
leann query \
  --index ./leann.index \
  --query "database connection pooling" \
  --top-k 5
```

### Workflow 2: CI/CD Integration

```yaml
# .github/workflows/update-index.yml
name: Update LEANN Index

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * 0'  # Weekly full rebuild

jobs:
  update-index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup LEANN
        run: |
          pip install leann
          docker run -d -p 5555:5555 leann/embedding-server:latest

      - name: Incremental Update
        run: |
          leann index update \
            --index ./leann.index \
            --git-diff origin/main HEAD \
            --output ./leann-updated.index

      - name: Validate Index
        run: |
          leann index validate ./leann-updated.index
          if [ $? -ne 0 ]; then
            # Validation failed - trigger full rebuild
            leann index create \
              --config leann.config.json \
              --input . \
              --output ./leann-updated.index
          fi

      - name: Upload Artifact
        uses: actions/upload-artifact@v3
        with:
          name: leann-index
          path: ./leann-updated.index
```

### Workflow 3: IDE Integration

```python
# VS Code extension example
from leann import LEANNIndex

class SemanticSearchProvider:
    def __init__(self, index_path):
        self.index = LEANNIndex.load(index_path)

    def search(self, query, top_k=10):
        results = self.index.query(
            query=query,
            top_k=top_k,
            mode='retrieve',
            filters={'languages': ['python', 'typescript']}
        )
        return [
            {
                'path': item.item.path,
                'score': item.score,
                'preview': self._get_preview(item.item)
            }
            for item in results.items
        ]
```

### Workflow 4: Polyglot Codebase

```json
{
  "backend": "hnsw",
  "complexity": "medium",
  "anchorSelection": {
    "type": "stratified-kmeans",
    "totalClusters": 500,
    "stratification": {
      "enabled": true,
      "groupBy": "language",
      "minPerGroup": 20,
      "allocation": "proportional"
    }
  },
  "filters": {
    "languages": ["python", "typescript", "go", "rust"],
    "excludePaths": ["node_modules/", "__pycache__/", "vendor/", "target/"]
  }
}
```

**Benefit:** Ensures each language gets proportional representation in anchors, preventing Python from dominating a Python/TypeScript/Go codebase.

---

## Performance Tuning

### Query Latency Optimization

**Problem:** Queries taking >50ms

**Diagnosis:**
```bash
leann index stats ./leann.index

# Check:
# - Avg candidates evaluated (should be <1000)
# - efSearch parameter (default: 100)
```

**Solutions:**

1. **Reduce efSearch** (faster, lower recall)
```json
{
  "hnsw": {
    "efSearch": 50
  }
}
```

2. **Increase anchor count** (more memory, better coverage)
```bash
leann index rebuild \
  --index ./leann.index \
  --new-anchor-count 1000 \
  --output ./leann-improved.index
```

3. **Use early termination**
```bash
leann query \
  --index ./leann.index \
  --query "authentication" \
  --top-k 10 \
  --early-terminate \
  --confidence-threshold 0.9
```

### Storage Optimization

**Problem:** Index larger than expected

**Diagnosis:**
```bash
leann index stats ./leann.index

# Check:
# - Anchor count (should be ~√N)
# - PQ configuration (96 subvectors × 256 codebook is standard)
```

**Solutions:**

1. **Reduce anchor count** (lower quality, less storage)
```json
{
  "anchorSelection": {
    "clusters": 100  // Reduced from 300
  }
}
```

2. **Increase PQ compression** (lower quality, less storage)
```json
{
  "quantization": {
    "subVectors": 128,  // More aggressive compression
    "codebookSize": 128
  }
}
```

### Accuracy Improvement

**Problem:** Poor retrieval relevance

**Diagnosis:**
```bash
leann index validate ./leann.index --verbose

# Check:
# - Coverage score (target: >0.80)
# - Quantization error (target: <0.05)
```

**Solutions:**

1. **Increase anchor count**
```bash
leann index rebuild \
  --index ./leann.index \
  --new-anchor-count 1000 \
  --output ./leann-improved.index
```

2. **Use k-means selection** (better than random)
```json
{
  "anchorSelection": {
    "type": "kmeans",  // Changed from "random"
    "clusters": 500
  }
}
```

3. **Enable re-ranking with cross-encoder**
```bash
leann query \
  --index ./leann.index \
  --query "authentication middleware" \
  --mode rerank \
  --top-k 10
```

---

## Monitoring & Health

### Key Metrics

**1. Coverage Score** (target: >0.80)
```bash
leann index validate ./leann.index --metric coverage
```
Measures how uniformly anchors cover embedding space.

**2. Quantization Error** (target: <0.05)
```bash
leann index validate ./leann.index --metric quantization
```
Measures average reconstruction error from PQ compression.

**3. Utilization Balance** (target: >0.70)
```bash
leann index validate ./leann.index --metric balance
```
Checks if anchors are evenly utilized (no overloaded anchors).

### Rebuild Triggers

Automatic rebuild when:
- Coverage score drops below 0.70
- Delta index exceeds threshold (default: 10% of main index)
- Query latency degrades >30% from baseline
- Manual trigger: `leann index rebuild`

### Health Check Script

```bash
#!/bin/bash
# scripts/index-validator.sh

INDEX="./leann.index"

echo "LEANN Index Health Check"
echo "========================"

# Check coverage
COVERAGE=$(leann index validate $INDEX --metric coverage --json | jq -r '.score')
echo "Coverage: $COVERAGE (target: >0.80)"

# Check quantization error
QUANT_ERROR=$(leann index validate $INDEX --metric quantization --json | jq -r '.error')
echo "Quantization Error: $QUANT_ERROR (target: <0.05)"

# Check balance
BALANCE=$(leann index validate $INDEX --metric balance --json | jq -r '.score')
echo "Balance: $BALANCE (target: >0.70)"

# Check delta index size
DELTA_SIZE=$(leann index stats $INDEX --json | jq -r '.deltaIndexSize')
MAIN_SIZE=$(leann index stats $INDEX --json | jq -r '.mainIndexSize')
DELTA_RATIO=$(echo "scale=2; $DELTA_SIZE / $MAIN_SIZE * 100" | bc)
echo "Delta Index: $DELTA_RATIO% of main (rebuild at 10%)"

# Recommend actions
if (( $(echo "$COVERAGE < 0.70" | bc -l) )); then
  echo "⚠️  WARNING: Coverage too low - consider rebuilding with more anchors"
fi

if (( $(echo "$QUANT_ERROR > 0.05" | bc -l) )); then
  echo "⚠️  WARNING: High quantization error - consider reducing PQ compression"
fi

if (( $(echo "$DELTA_RATIO > 10" | bc -l) )); then
  echo "⚠️  WARNING: Delta index large - recommend full rebuild"
fi
```

---

## Integration Patterns

### Pattern 1: Python API

```python
from leann import LEANNIndex, QueryRequest

# Load index
index = LEANNIndex.load('./leann.index')

# Basic query
results = index.query(
    query="database connection pooling",
    top_k=10,
    mode='retrieve'
)

for item in results.items:
    print(f"{item.item.path} (score: {item.score:.2f})")
    print(f"  {item.preview}\n")

# Query with filters
results = index.query(
    query="authentication middleware",
    top_k=5,
    filters={
        'languages': ['typescript'],
        'paths': ['src/middleware/*.ts']
    }
)

# Batch queries
queries = [
    "JWT authentication",
    "database migration",
    "error handling"
]
results_batch = index.batch_query(queries, top_k=5)
```

### Pattern 2: REST API

```python
from fastapi import FastAPI
from leann import LEANNIndex

app = FastAPI()
index = LEANNIndex.load('./leann.index')

@app.post('/search')
async def search(query: str, top_k: int = 10):
    results = index.query(query=query, top_k=top_k)
    return {
        'results': [
            {
                'path': item.item.path,
                'score': item.score,
                'preview': item.preview
            }
            for item in results.items
        ],
        'latency': results.latency
    }

@app.get('/health')
async def health():
    stats = index.get_stats()
    return {
        'status': 'healthy' if stats.healthScore > 70 else 'degraded',
        'healthScore': stats.healthScore,
        'indexSize': stats.storageSize,
        'avgLatency': stats.avgLatency
    }
```

### Pattern 3: CLI Tool

```bash
#!/bin/bash
# Wrapper script for common queries

function search() {
  leann query \
    --index ~/.cache/leann/project.index \
    --query "$1" \
    --top-k "${2:-10}" \
    --format json | jq -r '.items[] | "\(.item.path):\(.item.startLine)"'
}

function update-index() {
  leann index update \
    --index ~/.cache/leann/project.index \
    --watch \
    --auto-rebuild
}

# Usage:
# search "database connection" 5
# update-index
```

---

## Troubleshooting

### Issue: Out of Memory During Indexing

**Error:**
```
Cannot allocate memory for k-means clustering
```

**Solution:**
Reduce sampling in anchor selection:

```json
{
  "anchorSelection": {
    "type": "kmeans",
    "clusters": 300,
    "samples": 5000  // Reduced from 10000
  }
}
```

### Issue: Embedding Server Connection Failed

**Error:**
```
Connection refused: tcp://localhost:5555
```

**Solution:**
```bash
# Check if server is running
docker ps | grep leann-embeddings

# If not, start it
docker run -d -p 5555:5555 leann/embedding-server:latest

# Test connection
curl http://localhost:5555/health
```

### Issue: Low Query Accuracy

**Error:**
```
Results not relevant to query
```

**Diagnosis:**
```bash
leann index validate ./leann.index --verbose
# Check coverage score and quantization error
```

**Solution:**
```bash
# Rebuild with more anchors and k-means selection
leann index rebuild \
  --index ./leann.index \
  --anchor-strategy kmeans \
  --anchor-count 1000 \
  --output ./leann-improved.index
```

---

## Resources

**Codebase Documentation:**
- Type definitions: `@architect/leann-codebase/types/core.ts`
- Anchor selection principles: `@architect/leann-codebase/principles/anchor-selection.md`
- Graph navigation: `@architect/leann-codebase/principles/graph-navigation.md`
- Indexing workflows: `@architect/leann-codebase/templates/indexing-workflow.md`

**References:**
- Quick commands: `@architect/leann/assets/cheatsheet.md`
- Indexing patterns: `@architect/leann/references/indexing-patterns.md`
- Configuration guide: `@architect/leann/references/configuration.md`

**External:**
- GitHub: https://github.com/leann-index/leann
- Documentation: https://leann-index.github.io
- Paper: "LEANN: Learned Embedding Anchor Navigation"

---

## Summary

**Core Value Proposition:**
- 97% storage reduction vs traditional vector databases
- 10-30ms query latency (local, no network overhead)
- Zero cloud costs (self-hosted)
- Incremental updates (no nightly rebuilds)

**Best For:**
- Large codebases (10K+ files)
- Memory-constrained environments
- Cost-sensitive RAG applications
- Local-first semantic search

**Decision Tree:**

```
Need semantic code search?
├─ Dataset < 1K items → Use simple vector DB (overhead not worth it)
└─ Dataset > 1K items
    ├─ Cloud infrastructure required → Use Pinecone/Weaviate
    └─ Local-first / cost-sensitive → Use LEANN
        ├─ Index < RAM → HNSW backend (10-30ms)
        └─ Index > RAM → DiskANN backend (50-200ms)
```

**Golden Rules:**
1. Start with medium complexity, increase only if needed
2. Use k-means anchor selection for production
3. Monitor coverage score - rebuild if <0.70
4. Enable incremental updates for CI/CD
5. Use re-ranking for user-facing search

Overview

This skill implements LEANN: a local RAG indexing system that cuts embedding storage by ~95–97% using anchor-based lazy recomputation and graph navigation. It provides a memory-efficient semantic search solution for large codebases, enabling fast retrieval with minimal on-disk footprint. The design balances storage, latency, and accuracy for developer tooling and self-hosted applications.

How this skill works

LEANN builds an anchor graph (HNSW or DiskANN) of a small set of full embeddings and stores compressed delta vectors for the full corpus. Queries first traverse the anchor graph to retrieve candidate anchors, then lazily reconstruct candidate embeddings on demand using product quantization. Incremental delta indexing and a GPU-backed embedding server allow fast updates without full rebuilds.

When to use it

Index large monorepos or codebases (10k+ files) where full-precision storage is prohibitive
Run semantic search on development machines, CI/CD runners, or air-gapped environments with limited memory
Self-hosted RAG applications that must avoid commercial vector DB costs
IDE integrations and developer tools that need semantic code navigation without cloud dependencies
Edge deployments where disk-efficient indexes are critical

Best practices

Start with k-means anchor selection for balanced coverage; increase anchors if recall drops
Tune HNSW efSearch/efConstruction based on latency vs. recall targets
Monitor coverage and quantization error; rebuild when coverage <0.70 or quantization error >0.05
Use a GPU embedding server for batch embedding and low-latency lazy reconstruction
Enable incremental delta indexing and schedule periodic full rebuilds when delta ratio grows

Example use cases

Local semantic code search in an IDE extension with a small on-disk index
CI pipeline that incrementally updates and validates a code search index on pull requests
Self-hosted knowledge base for offline teams that requires low storage and high recall
Large polyglot repositories where stratified anchor selection ensures balanced language coverage
Batch processing of documentation where cloud vector DB costs are unacceptable

FAQ

How much storage reduction can I expect?

Typical reductions are 95–97% compared to storing all full embeddings. Exact savings depend on anchor count and PQ settings.

What latency should I expect for queries?

Two-stage queries usually run 10–30ms with HNSW and 50–200ms with DiskANN; lazy reconstruction adds 5–25ms for top candidates.

When should I rebuild the index?

Rebuild when coverage score drops below ~0.70, delta index exceeds configured threshold (default ~10%), or query latency/recall degrades significantly.