home / skills / srstomp / pokayokay / database-design
database-design skill

not checked
/plugins/pokayokay/skills/database-design
npx playbooks add skill srstomp/pokayokay --skill database-design
Review the files below or copy the command above to add this skill to your agents.
Files (7)
SKILL.md
12.5 KB
---
name: database-design
description: Design database schemas, plan migrations, optimize queries, and manage data models. Covers relational databases (PostgreSQL, MySQL, SQLite), document databases (MongoDB), and ORM integration (Prisma, Drizzle, TypeORM). Use this skill when designing schemas, reviewing data models, planning migrations, optimizing slow queries, or establishing database patterns for a project. Triggers on "database", "schema", "migration", "model", "query optimization", "index", "normalize/denormalize".
---

# Database Design

Design efficient, maintainable database schemas with safe migration strategies.

## Design Process

```
┌─────────────────────────────────────────────────────────────────┐
│                    DATABASE DESIGN PROCESS                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. REQUIREMENTS      2. MODELING         3. SCHEMA             │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐       │
│  │ Entities    │  →  │ ER diagram  │  →  │ Tables &    │       │
│  │ Attributes  │     │ Relations   │     │ Columns     │       │
│  │ Constraints │     │ Cardinality │     │ Constraints │       │
│  └─────────────┘     └─────────────┘     └─────────────┘       │
│                                                                 │
│  4. INDEXES          5. MIGRATION        6. REVIEW              │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐       │
│  │ Query       │  →  │ Safe changes│  →  │ Performance │       │
│  │ patterns    │     │ Rollback    │     │ Consistency │       │
│  │ Performance │     │ Zero-down   │     │ Integrity   │       │
│  └─────────────┘     └─────────────┘     └─────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Core Principles

### 1. Data Integrity First
Enforce constraints at the database level, not just application.

```sql
-- ✅ Database enforces integrity
CREATE TABLE orders (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID NOT NULL REFERENCES users(id),
  total DECIMAL(10,2) NOT NULL CHECK (total >= 0),
  status VARCHAR(20) NOT NULL DEFAULT 'pending'
);

-- ❌ Relying only on application validation
CREATE TABLE orders (
  id INTEGER,
  user_id INTEGER,  -- no FK
  total TEXT        -- wrong type, no constraint
);
```

### 2. Normalize by Default, Denormalize with Purpose
Start with 3NF. Denormalize only when you have measured performance needs.

```
✅ Normalize: Eliminate redundancy, maintain consistency
✅ Denormalize: Reduce joins for read-heavy queries (with justification)
❌ Premature optimization without query patterns
```

### 3. Explicit Over Implicit
Clear naming, explicit constraints, documented decisions.

```sql
-- ✅ Explicit
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()

-- ❌ Implicit
user INTEGER  -- What does this reference?
created TEXT  -- When? Timezone?
```

### 4. Plan for Evolution
Schema will change. Design for safe migrations.

```
✅ Additive changes preferred (add columns, tables)
✅ Nullable new columns (can deploy before backfill)
✅ Soft deletes for recoverable data
❌ Breaking changes without migration plan
```

## Quick Decisions

### Primary Key Strategy

| Strategy | Use When | Pros | Cons |
|----------|----------|------|------|
| UUID v4 | Distributed systems, security-sensitive | No collisions, unpredictable | 128-bit, random = bad index | 
| UUID v7 | Modern default choice | Sortable, no collisions | Larger than integer |
| ULID | Need K-sortable + readable | URL-safe, time-ordered | Less common |
| Auto-increment | Simple apps, legacy compat | Small, sequential | Enumerable, single-point |
| Composite | Junction tables | Natural fit for M:N | Complex queries |

**Recommendation:** UUID v7 or ULID for new projects. Auto-increment for simple apps.

### Relationship Patterns

```
One-to-One:   user ←→ profile       → FK + UNIQUE on child
One-to-Many:  user ←→ orders        → FK on "many" side
Many-to-Many: users ←→ roles        → Junction table
Self-ref:     employee ←→ manager   → FK to same table
Polymorphic:  comments on any entity → See references/schema-patterns.md
```

### When to Denormalize

| Signal | Action |
|--------|--------|
| N+1 queries in hot path | Consider embedding |
| Join across 4+ tables frequently | Materialized view or redundant column |
| Counting relationships is slow | Store counter cache |
| Full-text search on joined data | Denormalize to search index |

**Rule:** Measure first. Denormalize with documentation.

### Soft Delete vs Hard Delete

| Use Soft Delete | Use Hard Delete |
|-----------------|-----------------|
| Audit requirements | GDPR "right to erasure" |
| User-recoverable data | Session/temporary data |
| Billing/financial records | PII after retention period |
| Referenced by other tables | Truly ephemeral data |

```sql
-- Soft delete pattern
deleted_at TIMESTAMPTZ DEFAULT NULL
-- Query non-deleted: WHERE deleted_at IS NULL
-- Create partial index for performance
CREATE INDEX idx_users_active ON users(email) WHERE deleted_at IS NULL;
```

## Standard Schema Patterns

### Timestamps

Always include, always use timezone:

```sql
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
-- Add trigger or ORM hook for updated_at
```

### Audit Columns

```sql
created_by UUID REFERENCES users(id),
updated_by UUID REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
```

### Soft Delete

```sql
deleted_at TIMESTAMPTZ DEFAULT NULL,
deleted_by UUID REFERENCES users(id)
```

### Versioning

```sql
version INTEGER NOT NULL DEFAULT 1
-- Increment on update, use for optimistic locking
```

## Index Design Checklist

```
□ Primary key (automatic)
□ Foreign keys (add manually in most DBs)
□ Unique constraints
□ Columns in WHERE clauses (high selectivity)
□ Columns in ORDER BY
□ Columns in JOIN conditions
□ Composite indexes for multi-column queries

⚠️ Avoid:
□ Indexing low-cardinality columns alone (status, boolean)
□ Over-indexing (slows writes)
□ Indexes that duplicate existing coverage
```

### Index Types Quick Reference

| Type | Use Case | Example |
|------|----------|---------|
| B-tree | Default, range queries | Most columns |
| Hash | Equality only | Lookup tables |
| GIN | Arrays, JSONB, full-text | Tags, search |
| GiST | Geometric, range types | PostGIS, tsrange |
| BRIN | Very large, naturally ordered | Time-series |

## Migration Safety Checklist

Before any production migration:

```
□ Migration is reversible (has rollback plan)
□ Tested on copy of production data
□ Estimated lock time calculated
□ Deployment can proceed if migration fails
□ New code works with old AND new schema
□ Backfill strategy for new columns
```

### Safe Migration Patterns

| Change | Safe Approach |
|--------|---------------|
| Add column | Add as nullable, deploy code, backfill, add NOT NULL |
| Remove column | Stop using in code, deploy, then remove column |
| Rename column | Add new, copy data, update code, remove old |
| Add index | CREATE INDEX CONCURRENTLY (PostgreSQL) |
| Add constraint | Add as NOT VALID, then VALIDATE separately |

### Dangerous Operations

```sql
-- ⚠️ Locks table (avoid on large tables)
ALTER TABLE users ADD COLUMN name VARCHAR(255) NOT NULL DEFAULT '';
ALTER TABLE users ADD CONSTRAINT ... (without NOT VALID);
CREATE INDEX idx_users_email ON users(email);  -- non-concurrent

-- ✅ Safe alternatives
ALTER TABLE users ADD COLUMN name VARCHAR(255);  -- nullable first
ALTER TABLE users ADD CONSTRAINT ... NOT VALID;
ALTER TABLE users VALIDATE CONSTRAINT ...;  -- separate transaction
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
```

## Anti-Patterns

### ❌ God Tables
```sql
-- Everything in one table
CREATE TABLE data (
  id SERIAL,
  type VARCHAR(50),
  json_blob JSONB
);
-- No type safety, impossible to query efficiently
```

### ❌ EAV (Entity-Attribute-Value)
```sql
-- Attributes as rows
CREATE TABLE attributes (
  entity_id INT,
  attribute_name VARCHAR(100),
  attribute_value TEXT
);
-- Impossible to enforce types, terrible performance
```

### ❌ Implicit Relationships
```sql
-- Magic strings instead of FKs
CREATE TABLE orders (
  user TEXT,  -- Is this user_id? username? email?
  product TEXT
);
```

### ❌ Over-denormalization
```sql
-- Copying everything everywhere
CREATE TABLE orders (
  user_name VARCHAR(255),     -- What if name changes?
  user_email VARCHAR(255),    -- Stale data guaranteed
  user_address TEXT,
  user_phone VARCHAR(50)
);
```

### ❌ Missing Constraints
```sql
-- Database allows invalid data
CREATE TABLE products (
  price DECIMAL,  -- Can be NULL, negative, anything
  quantity INT    -- Negative inventory?
);
```

## ORM Integration

### Prisma Conventions

```prisma
model User {
  id        String   @id @default(uuid())
  email     String   @unique
  orders    Order[]
  createdAt DateTime @default(now()) @map("created_at")
  updatedAt DateTime @updatedAt @map("updated_at")
  
  @@map("users")
}

model Order {
  id        String   @id @default(uuid())
  userId    String   @map("user_id")
  user      User     @relation(fields: [userId], references: [id])
  total     Decimal
  status    OrderStatus @default(PENDING)
  
  @@map("orders")
  @@index([userId])
}
```

### Drizzle Conventions

```typescript
export const users = pgTable('users', {
  id: uuid('id').primaryKey().defaultRandom(),
  email: varchar('email', { length: 255 }).notNull().unique(),
  createdAt: timestamp('created_at').notNull().defaultNow(),
  updatedAt: timestamp('updated_at').notNull().defaultNow(),
});

export const orders = pgTable('orders', {
  id: uuid('id').primaryKey().defaultRandom(),
  userId: uuid('user_id').notNull().references(() => users.id),
  total: decimal('total', { precision: 10, scale: 2 }).notNull(),
  status: varchar('status', { length: 20 }).notNull().default('pending'),
}, (table) => ({
  userIdIdx: index('orders_user_id_idx').on(table.userId),
}));
```

## Design Deliverables

When designing a new schema, produce:

1. **Entity list** - All entities with key attributes
2. **ER diagram** - Relationships and cardinality (Mermaid or similar)
3. **Schema DDL** - CREATE TABLE statements
4. **Index plan** - Expected queries and supporting indexes
5. **Migration plan** - If modifying existing schema

### Example ER Diagram (Mermaid)

```mermaid
erDiagram
    users ||--o{ orders : places
    users ||--o{ addresses : has
    orders ||--|{ order_items : contains
    products ||--o{ order_items : "ordered in"
    
    users {
        uuid id PK
        string email UK
        string name
        timestamp created_at
    }
    
    orders {
        uuid id PK
        uuid user_id FK
        decimal total
        enum status
        timestamp created_at
    }
```

---

**References:**
- [references/schema-patterns.md](references/schema-patterns.md) — Common schema patterns (users, multi-tenant, polymorphic, etc.)
- [references/migration-strategies.md](references/migration-strategies.md) — Safe migration techniques, rollback procedures, zero-downtime patterns
- [references/index-design.md](references/index-design.md) — When and how to add indexes, query analysis
- [references/postgresql.md](references/postgresql.md) — PostgreSQL-specific features, extensions, optimization
- [references/prisma-patterns.md](references/prisma-patterns.md) — Prisma schema design, migrations, best practices
- [references/query-optimization.md](references/query-optimization.md) — Identifying and fixing slow queries, EXPLAIN analysis