home / skills / petekp / claude-code-setup / hierarchical-matching-systems
This skill diagnoses, architects, and debugs hierarchical matching systems across multi-level hierarchies to ensure correct, stable, and constrained matches.
npx playbooks add skill petekp/claude-code-setup --skill hierarchical-matching-systemsReview the files below or copy the command above to add this skill to your agents.
---
name: hierarchical-matching-systems
description: >
Expertise in architecting, implementing, reviewing, and debugging hierarchical matching systems.
Use when working with: (1) Two-sided matching (Gale-Shapley, hospital-resident, student-school),
(2) Assignment/optimization problems (Hungarian algorithm, bipartite matching),
(3) Multi-level hierarchy matching (org charts, taxonomies, nested categories),
(4) Entity resolution and record linkage across hierarchies.
Triggers: debugging match quality issues, reviewing matching algorithms, translating business
requirements into constraints, validating match correctness, architecting new matching systems,
fixing unstable matches, resolving constraint violations, diagnosing preference misalignment.
---
# Hierarchical Matching Systems
This skill provides rigid diagnostic and architectural procedures for hierarchical matching systems. Follow checklists exactly—matching bugs often hide in skipped steps.
## Quick Reference
- **Algorithm selection**: See [references/decision-guide.md](references/decision-guide.md)
- **Algorithm details**: See [references/algorithms.md](references/algorithms.md)
---
## 1. Problem Classification Checklist
Before any work, classify the problem. Check ALL that apply:
```
□ TWO-SIDED: Both sides have preferences (students↔schools, workers↔jobs)
□ ONE-SIDED: Only one side has preferences (tasks→workers, items→bins)
□ HIERARCHICAL: Entities exist at multiple levels (org→dept→team→person)
□ WEIGHTED: Matches have costs/scores to optimize
□ CONSTRAINED: Hard limits exist (capacity, exclusions, required pairings)
□ STABLE: Solution must resist defection (no blocking pairs)
□ OPTIMAL: Solution must minimize/maximize objective function
□ FUZZY: Entities may partially match (entity resolution, deduplication)
```
Classification determines algorithm family. Proceed to Section 2 for architecture or Section 3 for debugging.
---
## 2. Architecture Procedure
Follow these phases in order when designing a new matching system.
### Phase 2.1: Requirements Translation
Convert each business requirement into formal constraints:
| Business Requirement | Constraint Type | Formal Expression |
|---------------------|-----------------|-------------------|
| "Each student gets one school" | Capacity | `|matches(s)| = 1 ∀ student s` |
| "Schools have seat limits" | Capacity | `|matches(school)| ≤ capacity` |
| "Siblings must be together" | Coupling | `school(s1) = school(s2) if siblings(s1,s2)` |
| "Student X cannot attend Y" | Exclusion | `(X, Y) ∉ matches` |
| "Priority for residents" | Preference ordering | `rank(resident) < rank(non-resident)` |
**Checklist:**
```
□ List ALL business requirements
□ Classify each as: capacity | coupling | exclusion | ordering | soft preference
□ Identify conflicts between requirements (document tradeoffs)
□ Distinguish HARD constraints (must satisfy) from SOFT (optimize toward)
□ Validate translations with stakeholder examples
```
### Phase 2.2: Algorithm Selection
Use [references/decision-guide.md](references/decision-guide.md) to select algorithm. Verify:
```
□ Algorithm handles all HARD constraints
□ Algorithm can optimize SOFT constraints (or document gaps)
□ Complexity acceptable for data size (see references/algorithms.md)
□ Stability requirements met (if two-sided)
□ Optimality requirements met (if weighted)
```
### Phase 2.3: Data Model Design
Define entities, relationships, and preference representation:
```
□ Entity schema for each side (attributes, identifiers)
□ Preference representation (ranked list | score matrix | pairwise comparisons)
□ Constraint encoding (how exclusions/couplings are stored)
□ Hierarchy representation (if multi-level: tree | DAG | adjacency list)
□ Tie-breaking rules (deterministic ordering for equal preferences)
```
### Phase 2.4: Interface Contracts
Specify inputs, outputs, and invariants:
**Input Contract:**
```
□ Preference format and validation rules
□ Constraint format and validation rules
□ Required vs optional fields
□ How missing preferences are handled (reject | default rank | exclude)
```
**Output Contract:**
```
□ Match format (pairs | assignment map | ranked list)
□ Unmatched entity handling (explicit list | null matches | error)
□ Match metadata (scores, stability proof, constraint satisfaction report)
```
**Invariants:**
```
□ Determinism: same input → same output (document randomness if any)
□ Completeness: all entities matched OR explicitly unmatched
□ Validity: all matches satisfy hard constraints
```
### Phase 2.5: Testing Strategy
Define validation before implementation:
```
□ Unit tests for preference parsing and constraint validation
□ Property tests: stability, optimality, constraint satisfaction
□ Edge cases: empty inputs, single entity, all tied preferences
□ Regression tests from known-good examples
□ Performance benchmarks at target scale
```
---
## 3. Debugging Procedure
Follow this diagnostic sequence for any matching issue. Do not skip steps.
### Phase 3.1: Symptom Classification
Identify the symptom category:
| Symptom | Category | Go To |
|---------|----------|-------|
| Same inputs, different outputs | INSTABILITY | 3.2 |
| Matches violate business rules | CONSTRAINT VIOLATION | 3.3 |
| Matches technically valid but "wrong" | PREFERENCE MISALIGNMENT | 3.4 |
| Errors with nested/hierarchical data | HIERARCHY BUG | 3.5 |
| Poor performance at scale | PERFORMANCE | 3.6 |
### Phase 3.2: Instability Diagnosis
**Root causes of non-deterministic matches:**
```
□ RANDOMNESS: Check for unseeded RNG in tie-breaking
→ Fix: Use deterministic tie-breaker (lexicographic ID, timestamp)
□ FLOATING POINT: Check score comparisons for floating point issues
→ Fix: Use epsilon comparison or integer scores
□ HASH ORDERING: Check if iteration order depends on hash maps
→ Fix: Sort keys before iteration
□ PARALLEL RACE: Check for concurrent modifications
→ Fix: Synchronize or use sequential processing
□ INPUT ORDERING: Check if algorithm is order-sensitive
→ Fix: Canonicalize input order (sort by ID)
```
**Verification:**
```
□ Run matching 10x with identical inputs
□ Diff all outputs
□ If any differ, add logging to identify divergence point
```
### Phase 3.3: Constraint Violation Diagnosis
**Diagnostic sequence:**
```
1. □ IDENTIFY: Which specific constraint is violated?
→ List the violated constraint and the violating match
2. □ TRACE: Where should constraint be enforced?
→ Map constraint to code location (filter | validation | algorithm step)
3. □ VERIFY ENCODING: Is constraint correctly represented?
→ Print constraint data structure, verify against requirement
4. □ VERIFY ENFORCEMENT: Is constraint checked at right time?
→ Add logging before/after enforcement point
5. □ CHECK ORDERING: Is constraint checked before conflicting decisions?
→ Trace decision sequence, verify constraint checked first
6. □ CHECK COMPLETENESS: Are all instances covered?
→ Enumerate all entities that should be constrained
```
**Common failure patterns:**
| Pattern | Symptom | Fix |
|---------|---------|-----|
| Late enforcement | Valid intermediate state, invalid final | Move check earlier |
| Partial coverage | Some entities constrained, others not | Enumerate all cases |
| Soft vs hard confusion | Constraint violated for "better" match | Reclassify as hard |
| Stale data | Constraint on outdated values | Refresh before check |
### Phase 3.4: Preference Misalignment Diagnosis
When matches are valid but don't reflect intended priorities:
```
1. □ EXTRACT: Get the actual preference data used
→ Log/print the exact preference structure at match time
2. □ COMPARE: Check against expected preferences
→ Side-by-side diff with business-stated priorities
3. □ TRACE TRANSFORMATION: Follow preference from input to algorithm
→ Log at each transformation step (parsing, normalization, scoring)
4. □ CHECK SCORING: Verify score calculation
→ Manual calculation for 2-3 example cases
5. □ CHECK AGGREGATION: If multi-criteria, verify combination
→ Test each criterion independently, then combined
6. □ CHECK NORMALIZATION: Verify scale/range handling
→ Check for min/max, z-score, or rank normalization bugs
```
**Scoring function checklist:**
```
□ Direction correct (higher = better or lower = better, consistently)
□ Scale appropriate (no single factor dominating)
□ Missing values handled (null → 0? → excluded? → default?)
□ Ties handled explicitly (not left to floating point chance)
□ Edge cases: extreme values, all same values, single candidate
```
### Phase 3.5: Hierarchy Traversal Diagnosis
For multi-level matching issues:
```
1. □ VISUALIZE: Draw the hierarchy with the problematic match
→ Tree diagram showing all levels and the match path
2. □ CHECK INHERITANCE: Do child constraints inherit from parent?
→ Verify constraint propagation rules
3. □ CHECK AGGREGATION: How do child preferences roll up?
→ Verify aggregation function (sum | max | weighted | majority)
4. □ CHECK LEVEL INTERACTION: Can matches cross levels?
→ Document allowed/forbidden cross-level matches
5. □ CHECK TRAVERSAL ORDER: Top-down or bottom-up?
→ Verify algorithm processes levels in intended order
6. □ CHECK PARTIAL MATCHES: Can a parent match without all children?
→ Document completeness requirements per level
```
**Common hierarchy bugs:**
| Bug | Symptom | Fix |
|-----|---------|-----|
| Missing propagation | Child ignores parent constraint | Add inheritance logic |
| Double counting | Same entity weighted multiple times | Deduplicate in aggregation |
| Level skipping | Match at wrong level | Add level validation |
| Orphan handling | Unattached children cause errors | Define orphan policy |
### Phase 3.6: Performance Diagnosis
For scale and speed issues:
```
1. □ PROFILE: Identify the slow component
→ Time each phase: input parsing, preference building, matching, output
2. □ COMPLEXITY CHECK: Verify actual vs expected complexity
→ Log iteration counts, compare to theoretical O(n)
3. □ MEMORY CHECK: Profile memory usage
→ Watch for preference matrix explosion (n² space)
4. □ ALGORITHM FIT: Verify algorithm appropriate for scale
→ See references/algorithms.md for complexity comparison
5. □ CACHING: Check for redundant computation
→ Log cache hits/misses for preference lookups
6. □ BATCH VS STREAMING: Check processing model
→ Full recomputation vs incremental updates
```
---
## 4. Testing & Validation Procedure
### 4.1: Correctness Properties
Test these properties for every matching system:
```
□ DETERMINISM: run(input) = run(input) (10 trials minimum)
□ COMPLETENESS: all entities either matched or explicitly unmatched
□ VALIDITY: all matches satisfy all hard constraints
□ STABILITY (if applicable): no blocking pairs exist
□ OPTIMALITY (if applicable): objective function at expected value
```
### 4.2: Stability Verification (Two-Sided Matching)
For stable matching, verify no blocking pairs:
```python
# Pseudocode - verify no blocking pair exists
for each unmatched_pair (a, b):
if a prefers b over current_match(a):
if b prefers a over current_match(b):
FAIL: blocking pair found (a, b)
```
```
□ Enumerate all non-matched pairs
□ Check mutual preference for each
□ Report any blocking pairs found
□ For large instances, sample-check (document coverage)
```
### 4.3: Constraint Satisfaction Verification
```
□ List all hard constraints
□ For each match, verify against each constraint
□ Generate constraint satisfaction report
□ Flag any violations with specific match and constraint
```
### 4.4: Edge Case Test Suite
Mandatory test cases:
```
□ Empty input (no entities on one or both sides)
□ Single entity (one-to-one degenerate case)
□ All identical preferences (maximum tie scenario)
□ Mutually exclusive preferences (everyone wants same thing)
□ Impossible constraints (unsatisfiable, should error clearly)
□ Maximum capacity (all slots exactly filled)
□ Minimum capacity (barely enough slots)
□ Self-referential (can entity match itself? test boundary)
□ Circular preferences (A→B→C→A)
```
### 4.5: Regression Test Maintenance
```
□ Capture real production cases that revealed bugs
□ Minimize to smallest reproducing example
□ Document expected behavior explicitly
□ Run on every change to matching logic
```
---
## 5. Review Checklist
When reviewing matching system code or design:
### 5.1: Design Review
```
□ Problem correctly classified (Section 1)
□ Algorithm appropriate for problem class (references/decision-guide.md)
□ All business requirements mapped to constraints (Section 2.1)
□ Hard vs soft constraints clearly distinguished
□ Tie-breaking is deterministic and documented
□ Hierarchy semantics defined (if applicable)
```
### 5.2: Implementation Review
```
□ Preference representation matches algorithm requirements
□ Constraints enforced at correct point in algorithm
□ No hidden randomness (unseeded RNG, hash iteration)
□ Floating point comparison handled correctly
□ Edge cases handled (empty, single, ties)
□ Error messages identify specific constraint violations
```
### 5.3: Testing Review
```
□ All properties from 4.1 tested
□ Edge cases from 4.4 covered
□ Performance benchmarked at realistic scale
□ Regression tests exist for past bugs
```
---
## Appendix: Common Anti-Patterns
| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| Greedy first-come | Order-dependent, non-optimal | Use proper algorithm |
| Score = sum(all factors) | One factor dominates | Normalize scales |
| Retry until valid | Non-deterministic, slow | Fix constraint order |
| Global preference cache | Stale across updates | Invalidate on change |
| String matching for entities | Case/whitespace bugs | Use canonical IDs |
| Float equality for ties | Non-deterministic | Use epsilon or integer |
| Recursive hierarchy walk | Stack overflow risk | Use iterative with explicit stack |
| N² preference matrix | Memory explosion | Use sparse representation |
This skill provides pragmatic procedures for architecting, implementing, reviewing, and debugging hierarchical matching systems. It codifies classification checklists, an ordered architecture process, and step-by-step debugging flows for two-sided, assignment, multi-level, and entity-resolution matching problems. Use it to convert business requirements into formal constraints and to restore correctness and stability in production matchers.
The skill inspects problem class (two-sided, one-sided, hierarchical, weighted, constrained, fuzzy) and maps requirements to constraint types and algorithms. It prescribes data model, interface contracts, testing, and deterministic tie-breaking. For debugging, it walks a strict diagnostic sequence: classify the symptom, trace encoding and enforcement, verify preference transformation, and profile performance.
How do I decide between stability and optimality?
Classify whether a blocking-pair-free outcome (stability) is required by stakeholders or whether a global objective should be optimized. Two-sided preference problems typically prioritize stability; weighted assignments favor optimality. Document tradeoffs if both are requested.
What’s the first thing to check for non-deterministic outputs?
Verify unseeded randomness, hash-dependent iteration order, input ordering sensitivity, and floating-point tie handling. Run the matching 10+ times with identical inputs and diff outputs to localize divergence.