home / skills / benchflow-ai / skillsbench / lean4-memories

lean4-memories skill

safe

/tasks/lean4-proof/environment/skills/lean4-memories

This skill helps Lean 4 projects remember proven proof patterns and avoid dead ends by syncing memory across sessions.

This is most likely a fork of the lean4-memories skill from cameronfreer

npx playbooks add skill benchflow-ai/skillsbench --skill lean4-memories

Review the files below or copy the command above to add this skill to your agents.

Files (3)

SKILL.md

12.5 KB

---
name: lean4-memories
description: This skill should be used when working on Lean 4 formalization projects to maintain persistent memory of successful proof patterns, failed approaches, project conventions, and user preferences across sessions using MCP memory server integration
---

# Lean 4 Memories

## Overview

This skill enables persistent learning and knowledge accumulation across Lean 4 formalization sessions by leveraging MCP (Model Context Protocol) memory servers. It transforms stateless proof assistance into a learning system that remembers successful patterns, avoids known dead-ends, and adapts to project-specific conventions.

**Core principle:** Learn from each proof session and apply accumulated knowledge to accelerate future work.

## When to Use This Skill

This skill applies when working on Lean 4 formalization projects, especially:
- **Multi-session projects** - Long-running formalizations spanning days/weeks/months
- **Repeated proof patterns** - Similar theorems requiring similar approaches
- **Complex proofs** - Theorems with multiple attempted approaches
- **Team projects** - Shared knowledge across multiple developers
- **Learning workflows** - Building up domain-specific proof expertise

**Especially important when:**
- Starting a new session on an existing project
- Encountering a proof pattern similar to previous work
- Trying an approach that previously failed
- Needing to recall project-specific conventions
- Building on successful proof strategies from earlier sessions

## How Memory Integration Works

### Memory Scoping

All memories are scoped by:
1. **Project path** - Prevents cross-project contamination
2. **Skill context** - Memories tagged with `lean4-memories`
3. **Entity type** - Structured by pattern type (ProofPattern, FailedApproach, etc.)

**Example scoping:**
```
Project: /Users/freer/work/exch-repos/exchangeability-cursor
Skill: lean4-memories
Entity: ProofPattern:condExp_unique_pattern
```

### Memory Types

**1. ProofPattern** - Successful proof strategies
```
Store when: Proof completes successfully after exploration
Retrieve when: Similar goal pattern detected
```

**2. FailedApproach** - Known dead-ends to avoid
```
Store when: Approach attempted but failed/looped/errored
Retrieve when: About to try similar approach
```

**3. ProjectConvention** - Code style and patterns
```
Store when: Consistent pattern observed (naming, structure, tactics)
Retrieve when: Creating new definitions/theorems
```

**4. UserPreference** - Workflow customization
```
Store when: User expresses preference (verbose output, specific tools, etc.)
Retrieve when: Choosing between options
```

**5. TheoremDependency** - Relationships between theorems
```
Store when: One theorem proves useful for proving another
Retrieve when: Looking for helper lemmas
```

## Memory Workflows

### Storing Memories

**After successful proof:**
```lean
-- Just proved: exchangeable_iff_fullyExchangeable
-- Store the successful pattern
```

Store:
- Goal pattern: `exchangeable X ↔ fullyExchangeable X`
- Successful tactics: `[apply measure_eq_of_fin_marginals_eq, intro, simp]`
- Helper lemmas used: `[prefixCylinder_measurable, isPiSystem_prefixCylinders]`
- Difficulty: medium (54 lines)
- Confidence: high (proof clean, no warnings)

**After failed approach:**
```lean
-- Attempted: simp only [condExp_indicator, mul_comm]
-- Result: infinite loop, build timeout
```

Store:
- Failed tactic: `simp only [condExp_indicator, mul_comm]`
- Error: "infinite simp loop"
- Context: conditional expectation with indicator
- Recommendation: "Use simp only [condExp_indicator] without mul_comm"

**Project conventions observed:**
```lean
-- Pattern: All measure theory proofs start with haveI
haveI : MeasurableSpace Ω := inferInstance
```

Store:
- Convention: "Measure theory proofs require explicit MeasurableSpace instance"
- Pattern: `haveI : MeasurableSpace Ω`
- Frequency: 15 occurrences
- Files: DeFinetti/ViaL2.lean, Core.lean, Contractability.lean

### Retrieving Memories

**Starting new proof session:**
1. Load project-specific conventions
2. Retrieve similar proof patterns from past work
3. Surface any known issues with current file/module

**Encountering similar goal:**
```
⊢ condExp μ m X =ᵐ[μ] condExp μ m Y

Memory retrieved: "Similar goals proved using condExp_unique"
Pattern: "Show ae_eq, verify measurability, apply condExp_unique"
Success rate: 3/3 in this project
```

**Before trying a tactic:**
```
About to: simp only [condExp_indicator, mul_comm]

Memory retrieved: ⚠️ WARNING - This combination causes infinite loop
Failed in: ViaL2.lean:2830 (2025-10-17)
Alternative: Use simp only [condExp_indicator], then ring
```

## Integration with lean4-theorem-proving Skill

The lean4-memories skill complements (doesn't replace) lean4-theorem-proving:

**lean4-theorem-proving provides:**
- General Lean 4 workflows (4-Phase approach)
- mathlib search and tactics reference
- Automation scripts
- Domain-specific knowledge (measure theory, probability)

**lean4-memories adds:**
- Project-specific learned patterns
- History of what worked/failed in this project
- Accumulated domain expertise from your proofs
- Personalized workflow preferences

**Use together:**
1. lean4-theorem-proving guides general workflow
2. lean4-memories provides project-specific context
3. Memories inform tactics choices from lean4-theorem-proving

## Memory Operations

### Storing a Successful Proof Pattern

After completing a proof, store the pattern using MCP memory:

**What to capture:**
- **Goal pattern** - Type/structure of goal (equality, exists, forall, etc.)
- **Tactics sequence** - Tactics that worked, in order
- **Helper lemmas** - Key lemmas applied
- **Difficulty** - Lines of proof, complexity estimate
- **Confidence** - Clean proof vs sorries/warnings
- **Context** - File, module, theorem name

**When to store:**
- Proof completed successfully (no sorries)
- Non-trivial (>10 lines or required exploration)
- Likely to be useful again (similar theorems expected)

**Storage format:**
```
Entity type: ProofPattern
Name: {descriptive_name}
Attributes:
  - project: {absolute_path}
  - goal_pattern: {pattern_description}
  - tactics: [list, of, tactics]
  - helper_lemmas: [lemma1, lemma2]
  - difficulty: {small|medium|large}
  - confidence: {0.0-1.0}
  - file: {filename}
  - timestamp: {date}
```

### Storing a Failed Approach

When an approach fails (error, loop, timeout), store to avoid repeating:

**What to capture:**
- **Failed tactic** - Exact tactic/sequence that failed
- **Error type** - Loop, timeout, type error, etc.
- **Context** - What was being proved
- **Alternative** - What worked instead (if known)

**When to store:**
- Infinite simp loops
- Tactics causing build timeouts
- Type mismatches from subtle issues
- Approaches that seemed promising but didn't work

**Storage format:**
```
Entity type: FailedApproach
Name: {descriptive_name}
Attributes:
  - project: {absolute_path}
  - failed_tactic: {tactic_text}
  - error: {error_description}
  - context: {what_was_being_proved}
  - alternative: {what_worked}
  - timestamp: {date}
```

### Storing Project Conventions

Track consistent patterns that emerge:

**What to capture:**
- **Naming conventions** - h_ for hypotheses, have_ for results
- **Proof structure** - Standard opening moves (haveI, intro patterns)
- **Import patterns** - Commonly used imports
- **Tactic preferences** - measurability vs explicit proofs

**When to store:**
- Pattern observed 3+ times consistently
- Convention affects multiple files
- Style guide established

### Retrieving Memories

**Before starting proof:**
```
1. Query for similar goal patterns
2. Surface successful tactics for this pattern
3. Check for known issues with current context
4. Suggest helper lemmas from similar proofs
```

**During proof:**
```
1. Before each major tactic, check for known failures
2. When stuck, retrieve alternative approaches
3. Suggest next tactics based on past success
```

**Query patterns:**
```
# Find similar proofs
search_entities(
  query="condExp equality goal",
  filters={"project": current_project, "entity_type": "ProofPattern"}
)

# Check for failures
search_entities(
  query="simp only condExp_indicator",
  filters={"project": current_project, "entity_type": "FailedApproach"}
)

# Get conventions
search_entities(
  query="naming conventions measure theory",
  filters={"project": current_project, "entity_type": "ProjectConvention"}
)
```

## Best Practices

### Memory Quality

**DO store:**
- ✅ Successful non-trivial proofs (>10 lines)
- ✅ Failed approaches that wasted significant time
- ✅ Consistent patterns observed multiple times
- ✅ Project-specific insights

**DON'T store:**
- ❌ Trivial proofs (rfl, simp, exact)
- ❌ One-off tactics unlikely to recur
- ❌ General Lean knowledge (already in training/mathlib)
- ❌ Temporary workarounds

### Memory Hygiene

**Confidence scoring:**
- **High (0.8-1.0)** - Clean proof, no warnings, well-tested
- **Medium (0.5-0.8)** - Works but has minor issues
- **Low (0.0-0.5)** - Hacky solution, needs refinement

**Aging:**
- Recent memories (same session) = higher relevance
- Older memories = verify still applicable
- Patterns from many sessions = high confidence

**Pruning:**
- Remove memories for deleted theorems
- Update when better approach found
- Mark as outdated if project evolves

### User Control

**Users can:**
- Toggle lean4-memories skill on/off independently
- Clear project-specific memories
- Review stored memories
- Adjust confidence thresholds
- Export/import memories for sharing

## Example Workflow

**Session 1: First proof**
```lean
-- Proving: measure_eq_of_fin_marginals_eq
-- No memories yet, explore from scratch
-- [After 30 minutes of exploration]
-- ✅ Success with π-system uniqueness approach

Store: ProofPattern "pi_system_uniqueness"
  - Works for: measure equality via finite marginals
  - Tactics: [isPiSystem, generateFrom_eq, measure_eq_on_piSystem]
  - Confidence: 0.9
```

**Session 2: Similar theorem (weeks later)**
```lean
-- Proving: fullyExchangeable_via_pathLaw
-- Goal: Show two measures equal
-- System: "Similar to measure_eq_of_fin_marginals_eq"
--         Retrieve memory: pi_system_uniqueness pattern
--         Suggestion: "Try isPiSystem approach?"

-- ✅ Success in 5 minutes using remembered pattern
```

**Session 3: Avoiding failure**
```lean
-- Proving: condIndep_of_condExp_eq
-- About to: simp only [condExp_indicator, mul_comm]
-- ⚠️ Memory: This causes infinite loop (stored Session 1)
--          Alternative: simp only [condExp_indicator], then ring

-- Avoid 20-minute debugging session by using memory
```

## Configuration

### Memory Server Setup

Ensure MCP memory server is configured:

```json
// In Claude Desktop config
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    }
  }
}
```

### Project-Specific Settings

Memories are automatically scoped by project path. To work across multiple projects:

**Same formalization, different repos:**
```
# Link memories using project aliases
# (Future enhancement - not yet implemented)
```

**Sharing memories with team:**
```
# Export/import functionality
# (Future enhancement - not yet implemented)
```

## Integration with Automation Scripts

Memories enhance script usage:

**proof_templates.sh:**
- Retrieve project-specific template preferences
- Include common proof patterns in scaffolding

**suggest_tactics.sh:**
- Prioritize tactics that succeeded in this project
- Warn about tactics with known issues

**sorry_analyzer.py:**
- Link sorries to similar completed proofs
- Suggest approaches based on memory

## Limitations and Caveats

**What memories DON'T replace:**
- Mathematical understanding
- Lean type system knowledge
- mathlib API documentation
- Formal verification principles

**Potential issues:**
- Stale memories if project evolves significantly
- Over-fitting to specific project patterns
- Memory bloat if not maintained
- Cross-project contamination if scoping fails

**Mitigation:**
- Regular review of stored memories
- Confidence scoring and aging
- Strict project-path scoping
- User control over memory operations

## Future Enhancements

**Planned features:**
- Memory visualization dashboard
- Pattern mining across projects
- Collaborative memory sharing
- Automated memory pruning
- Integration with git history
- Cross-project pattern detection (with user consent)

## See Also

- **lean4-theorem-proving skill** - Core workflows and automation
- **MCP memory server docs** - https://modelcontextprotocol.io/docs/getting-started/intro
- **references/memory-patterns.md** - Detailed memory operation examples

Overview

This skill enables persistent, project-scoped memory for Lean 4 formalization workflows by integrating with an MCP memory server. It records successful proof patterns, failed approaches, project conventions, and user preferences so future sessions reuse what worked and avoid known dead-ends. The result is faster iteration on recurring theorems and more consistent team practices.

How this skill works

The skill stores structured memory entities (ProofPattern, FailedApproach, ProjectConvention, UserPreference, TheoremDependency) scoped to the project path and tagged with lean4-memories. On startup and during proof exploration it queries the MCP memory server for matches to the current goal, upcoming tactic choices, or file context and surfaces relevant patterns, warnings, and helper lemmas. Memories include metadata (confidence, difficulty, file, timestamp) so recommendations can be ranked and aged.

When to use it

Long-running multi-session Lean 4 projects where knowledge should persist across days or weeks
Proofs that repeatedly use similar tactics or lemma combinations
Complex or exploratory proofs where failed approaches should be recorded to avoid repetition
Team repositories where conventions and successful strategies should be shared
Sessions where project-specific naming, imports, or measurability patterns matter

Best practices

Store non-trivial, repeatable proof patterns (avoid trivial one-line proofs)
Record failed approaches that caused loops, timeouts, or hard-to-find errors
Score confidence and age memories; prefer recent, high-confidence patterns
Scope memories by project path to prevent cross-project contamination
Review and prune outdated memories when project APIs or style evolve

Example use cases

After completing a multi-step proof, save the tactics sequence and helper lemmas so similar future goals are solved faster
Before running simp, check memories for combinations that previously caused infinite simp loops and use recommended alternatives
On session start, load project conventions (haveI usage, common imports) to scaffold new proofs consistently
When stuck, retrieve alternative approaches previously used for related goals and suggested helper lemmas
Automate scaffolding scripts to prefer project-specific templates and tactics stored in memory

FAQ

How are memories scoped to avoid mixing projects?

All memories include the absolute project path and a lean4-memories tag; queries filter by project path so recommendations are project-specific.

When should I store a proof pattern?

Store patterns for non-trivial proofs (e.g., >10 lines or exploratory solutions) that are likely to recur; avoid trivial or one-off tactics.