
obsidian-process skill

/obsidian-process

This skill batch-processes Obsidian vaults, extracting wikilinks, normalizing tags, and managing frontmatter for scalable migrations and metadata consistency.

npx playbooks add skill zpankz/mcp-skillset --skill obsidian-process

Review the files below or copy the command above to add this skill to your agents.

Files (51)
SKILL.md
---
name: obsidian-process
description: This skill should be used when batch processing Obsidian markdown vaults. Handles wikilink extraction, tag normalization, frontmatter CRUD operations, and vault analysis. Use for vault-wide transformations, link auditing, tag standardization, metadata management, and migration workflows. Integrates with obsidian-markdown for syntax validation and obsidian-data-importer for structured imports.
---

# Obsidian Process

Modular Python batch processing toolkit for Obsidian vault operations with dry-run support, backup/rollback capabilities, and comprehensive reporting.

## System Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    BatchProcessor (Base)                     │
│  - find_markdown_files()  - read_file()  - write_file()    │
│  - VaultContext (rollback)  - ProcessingResult (output)     │
└──────────────────────────┬──────────────────────────────────┘
                           │
     ┌─────────────────────┼─────────────────────┐
     │                     │                     │
     ▼                     ▼                     ▼
┌──────────┐        ┌──────────┐         ┌──────────────┐
│ Wikilink │        │   Tag    │         │ Frontmatter  │
│Extractor │        │Normalizer│         │  Processor   │
└──────────┘        └──────────┘         └──────────────┘
```

## Quick Start

### Extract All Wikilinks

```bash
python scripts/wikilink_extractor.py --vault /path/to/vault --output links.json
```

### Normalize Tags

```bash
python scripts/tag_normalizer.py --vault /path/to/vault --case lower --dry-run
```

### Process Frontmatter

```bash
python scripts/frontmatter_processor.py --vault /path/to/vault --operation add --key status --value draft
```

## Modules

### 1. BatchProcessor (base class)

All processors inherit from `BatchProcessor`:

```python
from datetime import datetime

from batch_processor import BatchProcessor, ProcessingResult, VaultContext

class MyProcessor(BatchProcessor):
    def process(self) -> ProcessingResult:
        processed = modified = 0
        with VaultContext(self):  # snapshots files; rolls back on exceptions
            for file in self.find_markdown_files():
                content = self.read_file(file)
                new_content = content  # transform content here
                if new_content != content:
                    self.write_file(file, new_content)
                    modified += 1
                processed += 1
        return ProcessingResult(success=True, files_processed=processed,
                                files_modified=modified, errors=[], warnings=[],
                                metadata={}, timestamp=datetime.now().isoformat())
```

**Key Features**:
- `--dry-run`: Preview changes without modifying files
- `--verbose`: Detailed logging output
- Automatic backup before modifications
- Rollback on errors via `VaultContext`

### 2. WikilinkExtractor

Extracts and analyzes all wikilink formats:

| Format | Example | Captured Fields |
|--------|---------|-----------------|
| Basic | `[[Note]]` | target |
| Alias | `[[Note\|Display]]` | target, display_text |
| Header | `[[Note#Section]]` | target, header |
| Block | `[[Note^abc123]]` | target, block_id |
| Embed | `![[Image.png]]` | target, is_embedded |

**Output**: JSON with `link_index`, `backlink_index`, `statistics`

```bash
# Build link graph
python scripts/wikilink_extractor.py --vault ./vault --output graph.json

# Find broken links
python scripts/wikilink_extractor.py --vault ./vault --find-broken
```
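
The exact patterns live in `scripts/wikilink_extractor.py`; the sketch below shows one way all five formats can be captured with a single regex and named groups. It is illustrative, not the extractor's actual implementation:

```python
import re

# Illustrative pattern: optional "!" marks an embed, then the target,
# an optional #header or ^block-id, and an optional |display alias.
WIKILINK = re.compile(
    r"(?P<embed>!)?\[\["
    r"(?P<target>[^\[\]|#^]+)"
    r"(?:#(?P<header>[^\[\]|^]+))?"
    r"(?:\^(?P<block_id>[^\[\]|]+))?"
    r"(?:\|(?P<display>[^\[\]]+))?"
    r"\]\]"
)

text = "See [[Note#Setup|the setup guide]] and ![[Image.png]]."
for match in WIKILINK.finditer(text):
    print(match.groupdict())
```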

### 3. TagNormalizer

Handles both inline (`#tag`) and frontmatter tags:

**Operations**:
- Case normalization: `lower`, `upper`, `title`
- Pattern-based rules via JSON config
- Hierarchy parsing: `#project/active/urgent` → 3 levels
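
Hierarchy levels follow from splitting on `/`; a minimal illustration of how the three levels in the example above can be derived:

```python
tag = "project/active/urgent"
levels = tag.split("/")  # ['project', 'active', 'urgent'] -> 3 levels
# Ancestors at each depth, useful for hierarchy-aware reporting:
ancestors = ["/".join(levels[:i]) for i in range(1, len(levels) + 1)]
# ['project', 'project/active', 'project/active/urgent']
```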

```bash
# Normalize all tags to lowercase
python scripts/tag_normalizer.py --vault ./vault --case lower

# Apply custom rules
python scripts/tag_normalizer.py --vault ./vault --rules rules.json
```

**Rules JSON Format**:
```json
{
  "rules": [
    {"pattern": "^todo$", "replacement": "task", "case_sensitive": false},
    {"pattern": "^WIP$", "replacement": "in-progress"}
  ]
}
```
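
A sketch of how such a rules file might be applied to a single tag. Field names follow the JSON format above; the default of treating rules as case-sensitive unless stated otherwise is an assumption, and the actual script's logic may differ:

```python
import json
import re

def apply_rules(tag: str, rules_path: str) -> str:
    """Apply the first matching rule to a tag; return it unchanged otherwise."""
    with open(rules_path) as f:
        rules = json.load(f)["rules"]
    for rule in rules:
        # Assumption: rules are case-sensitive unless the file says otherwise.
        flags = 0 if rule.get("case_sensitive", True) else re.IGNORECASE
        if re.search(rule["pattern"], tag, flags):
            return re.sub(rule["pattern"], rule["replacement"], tag, flags=flags)
    return tag

print(apply_rules("TODO", "rules.json"))  # -> "task" with the rules above
```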

### 4. FrontmatterProcessor

Full YAML frontmatter CRUD with templates and validators:

**Operations**:
- `add`: Add/update field (merges with existing)
- `remove`: Delete field from frontmatter
- `update`: Same as add (explicit intent)
- `validate`: Check against registered validators
- `template`: Apply predefined templates

```bash
# Add created date to all files
python scripts/frontmatter_processor.py --vault ./vault \
  --operation add --key created --value "2024-01-01"

# Apply article template
python scripts/frontmatter_processor.py --vault ./vault \
  --operation template --template article

# Validate all frontmatter
python scripts/frontmatter_processor.py --vault ./vault --operation validate
```
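
Under the hood, the `add` operation amounts to merging one key into the YAML block while leaving the note body untouched. A simplified sketch of that merge using PyYAML (`add_key` is a hypothetical helper, not the script's API, and it ignores edge cases like `---` inside the body):

```python
import yaml  # PyYAML

def add_key(text: str, key: str, value) -> str:
    """Hypothetical helper: insert or update one frontmatter key."""
    if text.startswith("---\n"):
        _, block, body = text.split("---\n", 2)
        meta = yaml.safe_load(block) or {}
    else:
        meta, body = {}, text
    meta[key] = value  # add/update semantics: an existing key is overwritten
    return "---\n" + yaml.safe_dump(meta, sort_keys=False) + "---\n" + body

print(add_key("---\ntags: [draft]\n---\n# Title\n", "status", "draft"))
```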

**Built-in Templates**:
- `basic`: created, tags, aliases
- `article`: created, modified, tags, aliases, author, published, draft
- `meeting`: created, tags, attendees, date, location

## Integration Patterns

### With obsidian-markdown Skill

Use obsidian-process for batch operations, then validate with obsidian-markdown:

```
1. obsidian-process: Normalize tags vault-wide
2. obsidian-markdown: Validate Obsidian syntax compliance
3. obsidian-process: Generate validation report
```

### With obsidian-data-importer Skill

Process imported data after ingestion:

```
1. obsidian-data-importer: Convert CSV/JSON to markdown
2. obsidian-process: Apply frontmatter templates
3. obsidian-process: Build wikilink index
4. obsidian-markdown: Validate final output
```

## Common Workflows

### Vault Health Check

```bash
# 1. Extract all wikilinks
python scripts/wikilink_extractor.py --vault ./vault --output links.json

# 2. Identify orphaned notes (no incoming links)
python scripts/wikilink_extractor.py --vault ./vault --find-orphans

# 3. Validate frontmatter
python scripts/frontmatter_processor.py --vault ./vault --operation validate
```

### Tag Migration

```bash
# 1. Preview changes with dry-run
python scripts/tag_normalizer.py --vault ./vault --case lower --dry-run

# 2. Apply normalization
python scripts/tag_normalizer.py --vault ./vault --case lower

# 3. Report statistics
python scripts/tag_normalizer.py --vault ./vault --stats-only
```

### Frontmatter Standardization

```bash
# 1. Apply template to all notes
python scripts/frontmatter_processor.py --vault ./vault \
  --operation template --template basic --dry-run

# 2. Add custom fields
python scripts/frontmatter_processor.py --vault ./vault \
  --operation add --key project --value "[[Projects/Main]]"
```

## ProcessingResult Schema

All processors return standardized results:

```python
@dataclass
class ProcessingResult:
    success: bool
    files_processed: int
    files_modified: int
    errors: List[str]
    warnings: List[str]
    metadata: Dict[str, Any]  # Operation-specific data
    timestamp: str
```
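
Because the result is a flat dataclass, serializing it into a JSON report is straightforward; for example (the counts below are made-up data):

```python
import json
from dataclasses import asdict
from datetime import datetime

from batch_processor import ProcessingResult

result = ProcessingResult(success=True, files_processed=120, files_modified=37,
                          errors=[], warnings=["2 files skipped"], metadata={},
                          timestamp=datetime.now().isoformat())
print(json.dumps(asdict(result), indent=2))  # ready-to-save JSON report
```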

## Safety Features

| Feature | Description |
|---------|-------------|
| `--dry-run` | Preview all changes without writing |
| Backup | Automatic `.bak` files before modification |
| Rollback | `VaultContext` reverts on exceptions |
| Logging | Timestamped operation logs |
| Validation | Pre-write content validation |
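
Conceptually, rollback works like a context manager that snapshots files on entry and restores the snapshots if an exception escapes. A simplified sketch of the idea (not the actual `VaultContext` code):

```python
import shutil

class RollbackContext:
    """Simplified sketch: back up files on entry, restore them on exceptions."""
    def __init__(self, files):
        self.files = list(files)

    def __enter__(self):
        for f in self.files:
            shutil.copy2(f, str(f) + ".bak")  # automatic .bak backup
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            for f in self.files:
                shutil.copy2(str(f) + ".bak", f)  # revert on failure
        return False  # re-raise the exception after restoring
```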

## Extension Guide

To add a new processor:

```python
from datetime import datetime
from pathlib import Path

from batch_processor import BatchProcessor, ProcessingResult

class MyProcessor(BatchProcessor):
    def __init__(self, vault_path, dry_run=False, verbose=False):
        super().__init__(vault_path, dry_run, verbose)
        # Custom initialization goes here

    def process_file(self, file_path: Path) -> bool:
        content = self.read_file(file_path)
        new_content = content  # transform content here
        return self.write_file(file_path, new_content)

    def process(self) -> ProcessingResult:
        files = self.find_markdown_files()
        modified = sum(self.process_file(f) for f in files)
        return ProcessingResult(success=True, files_processed=len(files),
                                files_modified=modified, errors=[], warnings=[],
                                metadata={}, timestamp=datetime.now().isoformat())
```

## CLI Reference

All scripts share common arguments:

```
--vault PATH      Path to Obsidian vault (required)
--dry-run         Preview changes without modifying
--verbose, -v     Enable detailed logging
--output PATH     Output file for results (JSON)
```

### wikilink_extractor.py

```
--find-broken     Report links to non-existent notes
--find-orphans    Report notes with no incoming links
--stats-only      Output statistics without full index
```

### tag_normalizer.py

```
--case {lower,upper,title}  Case normalization mode
--rules PATH                JSON file with transformation rules
--stats-only                Output tag statistics only
```

### frontmatter_processor.py

```
--operation {add,remove,update,validate,template}
--key KEY         Frontmatter key for add/remove/update
--value VALUE     Value to set (supports JSON arrays/objects)
--template NAME   Template name for template operation
```

## LSP Integration

**When to use**: Semantic analysis that requires graph awareness, such as backlink chains, cross-vault renaming, and context-aware broken-link detection.

**Setup**:
```bash
export ENABLE_LSP_TOOL=1
# Requires: markdown-oxide in PATH (cargo install markdown-oxide)
```

**Core Capabilities** (via `scripts/lsp_integration.py`):

| Operation | Purpose |
|-----------|---------|
| `findReferences` | Find all backlinks to a note |
| `goToDefinition` | Resolve wikilink targets |
| `workspaceSymbol` | Search vault for tags/headings |
| `diagnostics` | Detect broken links |
| `rename` | Cross-vault symbol renaming |

**Key Classes**:
- `LSPClient`: Low-level LSP protocol wrapper
- `MarkdownOxideIntegration`: High-level Obsidian operations
- `RecursiveLSP`: Graph traversal (backlink chains, orphan detection, strongly connected components)
- `VaultLSPAnalyzer`: Extends BatchProcessor with LSP-powered vault analysis
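
The recursive traversal amounts to a depth-bounded breadth-first walk over a backlink index. A plain-Python sketch of the idea, using a prebuilt dict in place of live LSP queries (`index` below is made-up data, and the real class drives markdown-oxide instead):

```python
from collections import deque

def recursive_backlinks(backlinks: dict[str, list[str]],
                        start: str, depth: int) -> set[str]:
    """BFS over a note -> [notes linking to it] index, bounded by depth."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        note, d = frontier.popleft()
        if d == depth:
            continue  # depth bound reached; do not expand further
        for src in backlinks.get(note, []):
            if src not in seen:
                seen.add(src)
                frontier.append((src, d + 1))
    return seen - {start}

index = {"note.md": ["daily/2024-01-01.md"],
         "daily/2024-01-01.md": ["mocs/home.md"]}
print(recursive_backlinks(index, "note.md", depth=3))
```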

**CLI Quick Reference**:
```bash
# Analyze entire vault
python scripts/lsp_integration.py analyze-vault --vault ~/vault --output analysis.json

# Find recursive backlinks
python scripts/lsp_integration.py recursive-backlinks --vault ~/vault --file note.md --depth 3
```

**Detailed Documentation**: See `references/lsp-patterns.md` for:
- Complete LSP method reference
- Recursive query patterns (backlink chains, reference graphs, transitive closure)
- Batch processing strategies with caching
- Integration patterns with WikilinkExtractor, TagNormalizer, VaultAnalyzer
- Performance optimization and troubleshooting

Overview

This skill is a batch processing toolkit for Obsidian markdown vaults that automates wikilink extraction, tag normalization, frontmatter CRUD, and vault analysis. It is designed for safe, repeatable vault-wide transformations with dry-run, automatic backup, and rollback support. Use it to standardize metadata, audit links, migrate tags, and generate structured reports for downstream tools.

How this skill works

Processors inherit from a common BatchProcessor that finds markdown files, reads and transforms content, and writes changes while producing a standardized ProcessingResult. Key modules include a WikilinkExtractor that builds link and backlink indexes, a TagNormalizer for inline and frontmatter tags with rule-driven transforms, and a FrontmatterProcessor that performs add/remove/update/validate/template operations. Safety features include --dry-run, automatic .bak backups, VaultContext rollback on exceptions, and pre-write validation; outputs are JSON-ready for integration.

When to use it

  • Standardize tags across an entire vault before sharing or exporting
  • Audit and repair broken or orphaned wikilinks and build a link graph
  • Apply or enforce frontmatter templates and metadata policies
  • Run vault-wide migrations after bulk imports or structural changes
  • Generate reports for validation and downstream processing by other tools

Best practices

  • Always run with --dry-run first to preview changes and catch unexpected transforms
  • Version-control your vault and rely on automatic .bak files for an extra safety layer
  • Use JSON rule files for tag mappings to ensure reproducible migrations
  • Combine processing with obsidian-markdown validation to catch syntax issues before commit
  • Run targeted processors on subsets (folders or globs) before scaling to the whole vault

Example use cases

  • Normalize all tags to lowercase, preview with --dry-run, then apply across the vault
  • Extract a full wikilink index, find broken links, and export a graph.json for analysis
  • Apply an article frontmatter template to newly imported notes after CSV/JSON ingestion
  • Detect orphaned notes and generate a report for content cleanup
  • Perform a controlled rename of a top-level note using LSP-enabled recursive backlink updates

FAQ

Can I preview changes without modifying files?

Yes — use --dry-run to simulate all changes and produce reports without writing files.

How does rollback work if something fails mid-run?

Processors use VaultContext, which backs up files before writes and restores those backups automatically if an exception occurs mid-run.

Can I apply complex tag mappings?

Yes — TagNormalizer accepts a JSON rules file with regex patterns and replacements for deterministic, case-aware mappings.