home / skills / phrazzld / claude-config / schema-design

schema-design skill

/skills/schema-design

This skill helps you design robust database schemas by applying normalization, data types, keys, constraints, and anti-pattern avoidance for scalable systems.

npx playbooks add skill phrazzld/claude-config --skill schema-design

Review the files below or copy the command above to add this skill to your agents.

Files (6)
skill.md
3.4 KB
---
name: schema-design
description: "Universal database schema design principles: normalization strategy, data types, primary keys, constraints, anti-patterns, and red flags. Apply when designing schemas, reviewing database architecture, or refactoring data models."
---

# Database Schema Design Principles

**"Schema design debt compounds faster than code debt. Fix it now or pay 10x later."**

## Foundation Principles

### Primary Keys
- Every table MUST have a primary key
- Prefer surrogate keys (auto-increment, UUIDv7) over composite natural keys
- UUIDv7 for distributed systems, auto-increment BIGINT for monoliths

### Foreign Keys
- Use FK constraints unless specific reason not to (high-volume logging, sharded DBs)
- ON DELETE: RESTRICT (safest), CASCADE (use sparingly), SET NULL (breaks audit)

### Data Types
- Choose smallest sufficient type (BIGINT vs INT = 4 bytes * rows)
- Money: DECIMAL (never FLOAT/DOUBLE)
- Dates without time: DATE not DATETIME
- Small sets: ENUM not VARCHAR

### Constraints
- NOT NULL on required columns
- UNIQUE on natural keys
- CHECK for business rules
- DEFAULT where appropriate

## Quality Checklist

### Structural Integrity
- [ ] Every table has primary key
- [ ] Foreign key constraints defined
- [ ] Appropriate data types (smallest sufficient)
- [ ] NOT NULL, UNIQUE, CHECK constraints

### Anti-Pattern Scan
- [ ] No EAV (entity-attribute-value) patterns
- [ ] No god tables (> 50 columns)
- [ ] No multi-valued fields (CSV in columns)
- [ ] No DATETIME for date-only data

### Performance
- [ ] Indexes match query patterns
- [ ] Foreign keys indexed
- [ ] Composite index column order optimized

## Decision Trees

### "Should I denormalize this?"
```
Have evidence of query performance problem?
├─ NO → DON'T denormalize (premature optimization)
└─ YES → Tried indexes, query optimization, caching?
   ├─ NO → Try those first
   └─ YES → Read-heavy (> 100:1)?
      ├─ NO → Normalize, optimize queries
      └─ YES → Denormalize specific fields
```

### "UUID or auto-increment?"
```
Distributed system (multiple write nodes)?
├─ YES → UUIDv7 (time-ordered, better than v4)
└─ NO → Exposed to users (issue-123)?
   ├─ YES → Auto-increment (better UX)
   └─ NO → Auto-increment (better performance)
```

### "Soft or hard delete?"
```
GDPR "right to erasure" applies?
├─ YES → Hard delete or audit table
└─ NO → Need audit trail?
   ├─ YES → Audit table pattern (recommended)
   └─ NO → High deletion rate (> 20%)?
      ├─ YES → Hard delete
      └─ NO → Soft delete acceptable
```

## References

Detailed patterns and examples:
- `references/anti-patterns.md` — EAV, god tables, multi-valued fields, red flags
- `references/normalization.md` — 1NF/2NF/3NF, when to denormalize, OLTP vs OLAP
- `references/advanced-patterns.md` — Soft delete, temporal data, JSON columns
- `references/naming-conventions.md` — Tables, columns, indexes, constraints
- `references/performance-patterns.md` — Indexing strategy, partitioning, data types

## Remember

**"The best schema is one you can understand in 6 months and modify with confidence."**

Design schemas that:
1. **Enforce integrity** — Constraints, foreign keys, data types
2. **Optimize for common patterns** — Indexes, denormalization where proven
3. **Enable evolution** — Proper normalization, migration strategy
4. **Prevent known anti-patterns** — No EAV, god tables, multi-valued fields

Overview

This skill codifies universal database schema design principles to help you design, review, and refactor relational data models. It focuses on keys, data types, constraints, normalization, anti-pattern detection, and practical decision trees for trade-offs. Use it to catch structural debt early and make schemas maintainable, performant, and evolvable.

How this skill works

The skill inspects table definitions and metadata to verify primary keys, foreign key constraints, appropriate data types, and presence of NOT NULL/UNIQUE/CHECK defaults. It scans for anti-patterns (EAV, god tables, multi-valued fields) and evaluates indexing and foreign key indexing against common query patterns. Decision trees guide choices like UUID vs auto-increment, denormalization, and soft vs hard deletes based on concrete criteria.

When to use it

  • Designing a new database schema for an application or service
  • Performing architecture reviews or technical debt audits on existing schemas
  • Refactoring data models before scaling or migrating databases
  • Diagnosing performance issues related to schema choices and indexes
  • Deciding between normalization and denormalization for read/write patterns

Best practices

  • Always define a primary key; prefer surrogate keys (UUIDv7 for distributed, BIGINT auto-increment for monoliths)
  • Enforce integrity with FK constraints, NOT NULL, UNIQUE, and CHECK where appropriate
  • Choose the smallest sufficient data type and use DECIMAL for money; DATE for date-only values
  • Index columns based on query patterns and ensure foreign key columns are indexed
  • Avoid EAV, god tables (>50 columns), CSV/multi-valued columns and premature denormalization
  • Use decision trees: optimize/index/cache before denormalizing; pick UUIDv7 only for multi-writer systems

Example use cases

  • Reviewing a legacy database to identify schema debt and propose prioritized fixes
  • Designing tables for a distributed service that requires globally unique IDs
  • Refactoring an OLTP schema to reduce row size and improve index efficiency
  • Choosing a delete strategy (soft vs hard) based on GDPR and audit requirements
  • Deciding whether to add a denormalized column for a heavy-read endpoint after profiling

FAQ

When should I use UUIDv7 instead of auto-increment?

Use UUIDv7 for distributed systems with multiple write nodes; prefer auto-increment BIGINT for single-node monoliths for performance and UX when sequential IDs are acceptable.

Is denormalization always bad?

No. Denormalize only when you have proven read-heavy performance problems and after trying indexes, query optimization, and caching; target specific fields, not whole tables.

Can I skip foreign key constraints for performance?

Only in narrow cases (very high-volume logging, sharded systems). Prefer FK constraints for integrity; if skipped, enforce relationships at the application layer and document the reason.