home / skills / jmagly / aiwg / source-unifier

This skill intelligently unifies documentation from multiple sources—docs, GitHub, and PDFs—and reports conflicts for consistent, accurate AI-assisted

npx playbooks add skill jmagly/aiwg --skill source-unifier

Review the files below or copy the command above to add this skill to your agents.

Files (1)
SKILL.md
7.7 KB
---
name: source-unifier
description: Merge multiple documentation sources (docs, GitHub, PDF) with conflict detection. Use when combining docs + code for complete skill coverage.
tools: Read, Write, Bash, Glob, Grep
---

# Source Unifier Skill

## Purpose

Single responsibility: Intelligently merge documentation from multiple sources (websites, GitHub repos, PDFs) while detecting and transparently reporting conflicts between documented and implemented behavior. (BP-4)

## Grounding Checkpoint (Archetype 1 Mitigation)

Before executing, VERIFY:

- [ ] All source URLs/paths are accessible
- [ ] Each source type is correctly identified (docs, github, pdf)
- [ ] Output directory is writable
- [ ] Merge mode is specified (rule-based or AI-enhanced)
- [ ] Conflict resolution strategy is defined

**DO NOT merge without inspecting each source first.**

## Uncertainty Escalation (Archetype 2 Mitigation)

ASK USER instead of guessing when:

- Conflict severity unclear (is doc or code authoritative?)
- Multiple valid interpretations of API signature
- Source versions don't match (v2 docs vs v3 code)
- Merge strategy produces ambiguous results

**NEVER silently resolve conflicts. Always report discrepancies.**

## Context Scope (Archetype 3 Mitigation)

| Context Type | Included | Excluded |
|--------------|----------|----------|
| RELEVANT | All specified sources, merge config | Unrelated documentation |
| PERIPHERAL | Version history for context | Other projects |
| DISTRACTOR | Previous merge attempts | Unrelated codebases |

## Conflict Types

| Type | Severity | Description | Example |
|------|----------|-------------|---------|
| Missing in code | HIGH | Documented but not implemented | API endpoint in docs, not in code |
| Missing in docs | MEDIUM | Implemented but not documented | Hidden feature in code |
| Signature mismatch | MEDIUM | Different parameters/types | `func(a, b)` vs `func(a, b, c=None)` |
| Description mismatch | LOW | Different explanations | Wording differences |

## Workflow Steps

### Step 1: Verify Sources (Grounding)

```bash
# Test documentation URL
curl -I https://docs.example.com/

# Test GitHub repo
gh repo view owner/repo --json name,description

# Test PDF file
file manual.pdf && pdfinfo manual.pdf
```

### Step 2: Create Unified Configuration

```json
{
  "name": "myframework",
  "description": "Complete framework knowledge from docs + code",
  "merge_mode": "rule-based",
  "conflict_resolution": {
    "missing_in_code": "warn",
    "missing_in_docs": "include",
    "signature_mismatch": "show_both",
    "description_mismatch": "prefer_docs"
  },
  "sources": [
    {
      "type": "documentation",
      "base_url": "https://docs.example.com/",
      "extract_api": true,
      "max_pages": 200
    },
    {
      "type": "github",
      "repo": "owner/myframework",
      "include_code": true,
      "code_analysis_depth": "surface",
      "max_issues": 100
    },
    {
      "type": "pdf",
      "path": "docs/manual.pdf",
      "extract_tables": true
    }
  ]
}
```

### Step 3: Execute Unified Scraping

**Option A: With skill-seekers**

```bash
skill-seekers unified --config unified-config.json
```

**Option B: Manual merge workflow**

1. Scrape each source independently
2. Extract API signatures from each
3. Compare and detect conflicts
4. Generate merged output with conflict annotations

### Step 4: Review Conflict Report

The unifier generates a conflict report:

```markdown
# Conflict Report: myframework

## Summary
- Total APIs analyzed: 245
- Conflicts detected: 18
- Missing in code: 3 (HIGH)
- Missing in docs: 8 (MEDIUM)
- Signature mismatches: 5 (MEDIUM)
- Description mismatches: 2 (LOW)

## HIGH Severity Conflicts

### `deprecated_function()`
- **Status**: Documented but not found in code
- **Documentation**: "Use this function to..."
- **Code**: NOT FOUND
- **Recommendation**: Remove from docs or implement

## MEDIUM Severity Conflicts

### `process_data(input: str)`
- **Status**: Signature mismatch
- **Documentation**: `process_data(input: str)`
- **Code**: `process_data(input: str, validate: bool = True)`
- **Recommendation**: Update documentation to include `validate` parameter
```

### Step 5: Validate Merged Output

```bash
# Check merged skill structure
ls -la output/myframework/

# Verify conflict annotations
grep -r "⚠️\|Conflict\|WARNING" output/myframework/references/

# Count conflict markers
grep -c "Conflict" output/myframework/references/*.md
```

## Recovery Protocol (Archetype 4 Mitigation)

On error:

1. **PAUSE** - Preserve partial merge state
2. **DIAGNOSE** - Check error type:
   - `Source unavailable` → Skip source, note in report
   - `Parse error` → Check source format, retry with different parser
   - `Memory error` → Process sources sequentially
   - `Conflict overflow` → Increase conflict threshold or filter by severity
3. **ADAPT** - Adjust merge strategy
4. **RETRY** - Resume merge (max 3 attempts)
5. **ESCALATE** - Present partial results, ask user for conflict resolution

## Checkpoint Support

State saved to: `.aiwg/working/checkpoints/source-unifier/`

```
checkpoints/source-unifier/
├── source_1_docs.json      # Processed docs
├── source_2_github.json    # Processed GitHub
├── source_3_pdf.json       # Processed PDF
├── conflicts.json          # Detected conflicts
└── merge_progress.json     # Current merge state
```

Resume: `skill-seekers unified --config config.json --resume`

## Output Structure

```
output/myframework/
├── SKILL.md                    # Main skill with conflict summary
├── references/
│   ├── index.md                # Unified index
│   ├── api_reference.md        # Merged API docs (with conflict markers)
│   ├── guides.md               # Merged guides
│   └── conflicts.md            # Detailed conflict report
├── sources/
│   ├── documentation.md        # Original docs content
│   ├── github.md               # GitHub-extracted content
│   └── pdf.md                  # PDF-extracted content
└── metadata/
    ├── sources.json            # Source metadata
    └── conflict_summary.json   # Machine-readable conflicts
```

## Conflict Markers in Output

Merged content includes inline conflict markers:

```markdown
#### `process_data(input: str, validate: bool = True)`

⚠️ **Conflict**: Documentation signature differs from implementation

**Documentation says:**
```python
def process_data(input: str) -> dict:
    """Process input data and return results."""
```

**Code implementation:**
```python
def process_data(input: str, validate: bool = True) -> dict:
    """Process input data with optional validation."""
```

**Resolution**: Documentation should be updated to include the `validate` parameter added in v2.3.
```

## Merge Modes

| Mode | Description | Use Case |
|------|-------------|----------|
| `rule-based` | Apply predefined rules for conflict resolution | Fast, deterministic |
| `ai-enhanced` | Use AI to intelligently merge conflicting content | Better quality, slower |
| `manual` | Generate conflicts only, user resolves | Full control |

## Troubleshooting

| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| Too many conflicts | Sources very different | Filter by severity, merge incrementally |
| False positives | Parser differences | Normalize API extraction |
| Missing content | Source incomplete | Add supplementary source |
| Merge too slow | Large sources | Use parallel processing |

## References

- Skill Seekers Unified Scraping: https://github.com/jmagly/Skill_Seekers/blob/main/docs/UNIFIED_SCRAPING.md
- REF-001: Production-Grade Agentic Workflows (BP-4, BP-6 model consortium parallel)
- REF-002: LLM Failure Modes (Archetype 1-4 mitigations)

Overview

This skill intelligently merges multiple documentation sources (web docs, GitHub repos, PDFs) into a single unified reference while detecting and reporting conflicts between documentation and implementation. It produces a merged output with clear conflict annotations and a machine-readable conflict summary to support review and remediation. Use it to create complete, auditable skill coverage combining code and docs.

How this skill works

The unifier verifies source accessibility and type, extracts API signatures and content from each source, and compares items to detect missing items, signature mismatches, and description differences. It highlights conflicts by severity, saves checkpoints during processing, and emits a unified output layout with both human-readable conflict reports and structured JSON for automation. The tool supports rule-based, AI-enhanced, and manual merge modes and refuses to silently resolve unclear conflicts.

When to use it

  • Combining product docs, repository code, and PDFs into a single authoritative reference
  • Preparing a skill or library manifest that must reflect both documentation and implementation
  • Auditing API surface to find undocumented or misdocumented features
  • Automating release notes or migration work where docs/code drift is common
  • Generating conflict reports for engineering or docs teams prior to publication

Best practices

  • Verify all source URLs/paths and types before running a merge
  • Choose a merge mode that fits risk: rule-based for speed, AI-enhanced for nuance, manual for full control
  • Escalate ambiguous conflicts to a human rather than auto-resolving
  • Persist checkpoints and run merges incrementally on large codebases
  • Filter conflicts by severity to prioritize remediation work

Example use cases

  • Merge v2 docs and v3 code to identify breaking signature changes before release
  • Create a combined API reference for a SDK that pulls docs, README, and inline code comments
  • Audit a repo to surface implemented endpoints that lack documentation
  • Generate a conflict report for a cross-team review meeting
  • Resume a paused merge after a transient source outage using saved checkpoints

FAQ

What happens when sources disagree and neither is clearly authoritative?

The skill flags the conflict, marks its severity, and prompts you to choose an authoritative source or supply a resolution strategy; it never silently picks one.

Can the tool run incrementally on very large projects?

Yes. It saves checkpoint files and supports sequential processing, retries, and resume to handle memory limits or partial failures.