
database-design skill


This skill helps you design scalable SQLite schemas with normalization, indexing, and FTS optimization for desktop applications.

npx playbooks add skill martinholovsky/claude-skills-generator --skill database-design

Review the files below or copy the command above to add this skill to your agents.

SKILL.md
---
name: Database Design Expert
risk_level: MEDIUM
description: Expert in database schema design with focus on normalization, indexing strategies, FTS optimization, and performance-oriented architecture for desktop applications
version: 1.0.0
author: JARVIS AI Assistant
tags: [database, schema, design, normalization, indexing, fts, performance]
model: claude-sonnet-4-5-20250929
---

# Database Design Expert

## 0. Mandatory Reading Protocol

**CRITICAL**: Before implementing ANY database schema, you MUST read the relevant reference files:

### Trigger Conditions for Reference Files

**Read `references/advanced-patterns.md` WHEN**:
- Designing schemas for new features
- Implementing complex relationships (many-to-many, polymorphic)
- Setting up inheritance patterns
- Designing for high-performance queries

**Read `references/security-examples.md` WHEN**:
- Storing sensitive user data
- Designing audit trails
- Implementing access control at database level
- Handling PII or financial data

---

## 1. Overview

**Risk Level: MEDIUM**

**Justification**: Database schema design impacts data integrity, query performance, and application security. Poor design can lead to data corruption, performance bottlenecks, and difficulty in maintaining data consistency. Schema changes in production require careful migration planning.

You are an expert in database schema design, specializing in:
- **Normalization** with appropriate denormalization for performance
- **Indexing strategies** for query optimization
- **Full-Text Search (FTS5)** schema design
- **Constraint design** for data integrity
- **Migration-friendly schemas** that evolve safely

### Core Principles

1. **TDD First** - Write tests for schema and queries before implementation
2. **Performance Aware** - Design for query patterns, optimize indexes, profile regularly
3. **Normalize then denormalize** - Start with 3NF, denormalize based on measured needs
4. **Constraint everything** - Use database constraints as the last line of defense
5. **Migration safety** - All schema changes must be reversible and tested

### Primary Use Cases
- Desktop application data modeling
- Local-first application architecture
- Efficient search and retrieval patterns
- Audit and history tracking
- Configuration and settings storage

---

## 2. Core Responsibilities

### 2.1 Data Integrity Principles

1. **Normalize to eliminate redundancy** - Then denormalize strategically for performance
2. **Use appropriate constraints** - Primary keys, foreign keys, unique, check constraints
3. **Design for referential integrity** - Foreign keys with appropriate cascade rules
4. **Plan for schema evolution** - Design migrations that preserve data

### 2.2 Performance Design Principles

1. **Index for your queries** - Analyze query patterns before indexing
2. **Avoid over-indexing** - Each index slows writes
3. **Use covering indexes** - Include columns in index to avoid table lookups
4. **Design for locality** - Keep related data together

---

## 3. Technical Foundation

### 3.1 SQLite Data Types

| SQLite Type | Use For | Notes |
|-------------|---------|-------|
| INTEGER | IDs, counts, booleans | `INTEGER PRIMARY KEY` aliases rowid; AUTOINCREMENT prevents rowid reuse |
| TEXT | Strings, JSON, UUIDs | No length limit |
| REAL | Floating point | 8-byte IEEE float |
| BLOB | Binary data | Files, encrypted data |
| NUMERIC | Dates, decimals | Affinity converts text to INTEGER/REAL when lossless |
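
To see affinity in action, here is a minimal sketch (hypothetical `samples` table) showing how SQLite assigns storage classes:

```sql
-- Hypothetical table illustrating type affinity
CREATE TABLE samples (
    flag INTEGER,       -- booleans stored as 0/1
    price NUMERIC,      -- NUMERIC affinity: '19.99' is stored as REAL
    created_at TEXT     -- ISO-8601 strings sort chronologically
);
INSERT INTO samples VALUES (1, '19.99', datetime('now'));
SELECT typeof(flag), typeof(price), typeof(created_at) FROM samples;
-- Expected result: integer | real | text
```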

### 3.2 Normalization Levels

| Form | Description | When to Use |
|------|-------------|-------------|
| 1NF | Atomic values, no repeating groups | Always |
| 2NF | 1NF + no partial dependencies | Most tables |
| 3NF | 2NF + no transitive dependencies | Default choice |
| BCNF | 3NF + every determinant is a key | Complex relationships |
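
As a quick illustration of the 3NF default, here is a sketch (hypothetical `orders`/`customers` example) that removes a transitive dependency by moving customer attributes out of the orders table:

```sql
-- Before (violates 3NF): orders(id, customer_id, customer_city, total)
-- customer_city depends on customer_id, not on the order itself.
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    city TEXT NOT NULL
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL NOT NULL
);
```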

---

## 4. Implementation Patterns

### 4.1 Base Table Template

```sql
CREATE TABLE entities (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL CHECK(length(name) BETWEEN 1 AND 255),
    email TEXT UNIQUE NOT NULL CHECK(email LIKE '%_@__%.__%'),
    status TEXT NOT NULL DEFAULT 'active' CHECK(status IN ('active', 'inactive', 'deleted')),
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    deleted_at TEXT
);

CREATE INDEX idx_entities_status ON entities(status) WHERE deleted_at IS NULL;
```

### 4.2 Relationship Patterns

#### One-to-Many
```sql
CREATE TABLE documents (
    id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, title TEXT NOT NULL,
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_documents_user ON documents(user_id);
```

#### Many-to-Many
```sql
CREATE TABLE document_tags (
    document_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
    PRIMARY KEY (document_id, tag_id),
    FOREIGN KEY (document_id) REFERENCES documents(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);
CREATE INDEX idx_doctags_tag ON document_tags(tag_id);
```
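
A typical lookup through the junction table might look like this (a sketch, assuming a `tags(id, name)` table):

```sql
-- All tags for one document; the reverse lookup (documents for a tag)
-- is what idx_doctags_tag accelerates.
SELECT t.name FROM tags t
JOIN document_tags dt ON dt.tag_id = t.id
WHERE dt.document_id = ?;
```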

#### Self-Referential (Hierarchies)

```sql
-- Tree structure (adjacency list)
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES categories(id) ON DELETE CASCADE,
    name TEXT NOT NULL
);
CREATE INDEX idx_categories_parent ON categories(parent_id);
```
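
Adjacency lists pair naturally with recursive CTEs; a sketch for walking a subtree:

```sql
-- Fetch a category and all of its descendants with their depth
WITH RECURSIVE subtree(id, name, depth) AS (
    SELECT id, name, 0 FROM categories WHERE id = ?
    UNION ALL
    SELECT c.id, c.name, s.depth + 1
    FROM categories c JOIN subtree s ON c.parent_id = s.id
)
SELECT * FROM subtree ORDER BY depth;
```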

### 4.3 Full-Text Search Schema

```sql
-- Content table
CREATE TABLE articles (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
);

-- FTS5 virtual table
CREATE VIRTUAL TABLE articles_fts USING fts5(
    title, body, content='articles', content_rowid='id',
    tokenize='porter unicode61', prefix='2 3'
);

-- Sync triggers (INSERT, UPDATE, DELETE)
CREATE TRIGGER articles_ai AFTER INSERT ON articles BEGIN
    INSERT INTO articles_fts(rowid, title, body) VALUES (new.id, new.title, new.body);
END;
-- Matching UPDATE and DELETE triggers are sketched below
```
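
For completeness, a sketch of the matching triggers. With external-content FTS5 tables, deletions go through the special `'delete'` command so the index can remove the old tokens:

```sql
CREATE TRIGGER articles_ad AFTER DELETE ON articles BEGIN
    INSERT INTO articles_fts(articles_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
END;

CREATE TRIGGER articles_au AFTER UPDATE ON articles BEGIN
    INSERT INTO articles_fts(articles_fts, rowid, title, body)
    VALUES ('delete', old.id, old.title, old.body);
    INSERT INTO articles_fts(rowid, title, body)
    VALUES (new.id, new.title, new.body);
END;

-- Example query: prefix matching works because of the prefix indexes
SELECT rowid, highlight(articles_fts, 0, '[', ']') AS title
FROM articles_fts WHERE articles_fts MATCH 'data*' ORDER BY rank;
```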

### 4.4 Audit Trail Pattern

```sql
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT NOT NULL, balance REAL DEFAULT 0);

CREATE TABLE accounts_audit (
    id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL,
    field_name TEXT NOT NULL, old_value TEXT, new_value TEXT,
    changed_at TEXT DEFAULT (datetime('now')),
    FOREIGN KEY (account_id) REFERENCES accounts(id) ON DELETE CASCADE
);

CREATE TRIGGER accounts_audit_update AFTER UPDATE ON accounts
WHEN old.balance != new.balance
BEGIN
    INSERT INTO accounts_audit (account_id, field_name, old_value, new_value)
    VALUES (new.id, 'balance', old.balance, new.balance);
END;

CREATE INDEX idx_audit_account ON accounts_audit(account_id, changed_at DESC);
```
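
Reading the history back is then a single indexed range scan, for example:

```sql
-- Newest-first change history for one account; uses idx_audit_account
SELECT field_name, old_value, new_value, changed_at
FROM accounts_audit
WHERE account_id = ?
ORDER BY changed_at DESC;
```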

---

## 5. Security Standards

### 5.1 Data Integrity Controls

```sql
-- Numeric, string format, and enum constraints
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL CHECK(email LIKE '%_@__%.__%'),
    phone TEXT CHECK(phone IS NULL OR phone GLOB '+[0-9]*'),
    status TEXT NOT NULL DEFAULT 'pending' CHECK(status IN ('pending', 'active', 'deleted'))
);

-- Date range validation
CREATE TABLE events (
    id INTEGER PRIMARY KEY, start_date TEXT NOT NULL, end_date TEXT NOT NULL,
    CHECK(end_date >= start_date)
);
```

### 5.2 Soft Delete Pattern

```sql
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL, deleted_at TEXT);
CREATE VIEW active_documents AS SELECT * FROM documents WHERE deleted_at IS NULL;
CREATE INDEX idx_documents_active ON documents(title) WHERE deleted_at IS NULL;
```
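
Usage is then a pair of UPDATEs rather than a DELETE; a sketch:

```sql
-- Soft delete and restore; reads go through the filtered view
UPDATE documents SET deleted_at = datetime('now') WHERE id = ?;
UPDATE documents SET deleted_at = NULL WHERE id = ?;
SELECT * FROM active_documents WHERE title LIKE ?;
```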

---

## 6. Indexing Strategies

```sql
-- Single column for equality/range | Composite (equality first, then range)
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

-- Covering index (avoid table lookup) | Partial index (filtered queries)
CREATE INDEX idx_users_cover ON users(email, name, status);
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

-- Expression index | Always verify with EXPLAIN
CREATE INDEX idx_users_lower ON users(LOWER(email));
EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?;
```

---

## 7. Implementation Workflow (TDD)

### Step 1: Write Failing Tests First

```python
# tests/test_schema.py
import pytest
import sqlite3

@pytest.fixture
def db():
    conn = sqlite3.connect(':memory:')
    conn.execute("PRAGMA foreign_keys = ON")
    yield conn
    conn.close()

class TestUserSchema:
    def test_email_uniqueness(self, db):
        db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
        db.execute("INSERT INTO users (email) VALUES ('[email protected]')")
        with pytest.raises(sqlite3.IntegrityError):
            db.execute("INSERT INTO users (email) VALUES ('[email protected]')")

    def test_email_format_constraint(self, db):
        db.execute("""CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            email TEXT UNIQUE NOT NULL CHECK(email LIKE '%_@__%.__%'))""")
        with pytest.raises(sqlite3.IntegrityError):
            db.execute("INSERT INTO users (email) VALUES ('invalid')")

    def test_index_used_for_lookup(self, db):
        db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
        db.execute("CREATE INDEX idx_users_email ON users(email)")
        plan = db.execute("EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ('alice@example.com',)).fetchone()
        assert 'USING INDEX' in plan[3]  # column 3 holds the plan detail text
```

### Step 2: Implement Schema to Pass Tests

```python
# src/database/schema.py
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT UNIQUE NOT NULL CHECK(email LIKE '%_@__%.__%'),
    name TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);
"""

def init_schema(conn):
    """Initialize database schema."""
    conn.executescript(SCHEMA_SQL)
    conn.commit()
```

### Step 3: Run Tests and Verify

```bash
# Run schema tests
pytest tests/test_schema.py -v

# Run with coverage
pytest tests/test_schema.py --cov=src/database --cov-report=term-missing
```

### Step 4: Test Migrations

```python
# tests/test_migrations.py
def test_migration_adds_column(db):
    """Migration should add new column without data loss."""
    # Setup: create old schema with data
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    db.execute("INSERT INTO users (email) VALUES ('[email protected]')")

    # Run migration
    db.execute("ALTER TABLE users ADD COLUMN name TEXT DEFAULT 'Unknown'")

    # Verify: data preserved, new column exists
    row = db.execute("SELECT id, email, name FROM users").fetchone()
    assert row == (1, 'alice@example.com', 'Unknown')
```

---

## 8. Performance Patterns

### 8.1 Indexing Strategies

**Good: Composite index with correct column order**
```sql
-- Query: WHERE user_id = ? AND created_at > ? ORDER BY created_at DESC
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
```

**Bad: Wrong column order wastes index**
```sql
-- Range column first prevents using equality match efficiently
CREATE INDEX idx_orders_wrong ON orders(created_at, user_id);
```

### 8.2 Query Optimization

**Good: Use covering index to avoid table lookup**
```sql
-- Include all needed columns in index
CREATE INDEX idx_users_email_cover ON users(email, name, status);
-- Query only touches index, never reads table
SELECT name, status FROM users WHERE email = ?;
```

**Bad: SELECT * with large rows**
```sql
-- Forces table lookup even with index
SELECT * FROM users WHERE email = ?;
```

### 8.3 Connection Pooling

**Good: Reuse connections with pool**
```python
from contextlib import contextmanager
import sqlite3
import threading

class ConnectionPool:
    def __init__(self, db_path, max_connections=5):
        self._pool, self._lock = [], threading.Lock()
        self._db_path, self._max = db_path, max_connections

    def _acquire(self):
        # Reuse an idle connection when available; otherwise open a new one
        with self._lock:
            if self._pool:
                return self._pool.pop()
        return sqlite3.connect(self._db_path, check_same_thread=False)

    def _release(self, conn):
        # Return to the pool, or close if the pool is already full
        with self._lock:
            if len(self._pool) < self._max:
                self._pool.append(conn)
                return
        conn.close()

    @contextmanager
    def get_connection(self):
        conn = self._acquire()
        try:
            yield conn
        finally:
            self._release(conn)
```

**Bad: Create new connection per query**
```python
def get_user(email):
    conn = sqlite3.connect('app.db')  # Expensive!
    result = conn.execute("SELECT * FROM users WHERE email = ?", (email,)).fetchone()
    conn.close()
    return result
```

### 8.4 Denormalization Tradeoffs

**Good: Store computed values for read-heavy patterns**
```sql
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    item_count INTEGER NOT NULL DEFAULT 0,  -- Denormalized
    total_amount REAL NOT NULL DEFAULT 0    -- Denormalized
);
-- Use triggers to maintain denormalized values (sketch below)
```
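
A sketch of one such trigger, assuming the `order_items(order_id, price, quantity)` table from the example below (matching UPDATE and DELETE triggers are needed as well):

```sql
CREATE TRIGGER order_items_ai AFTER INSERT ON order_items BEGIN
    UPDATE orders
    SET item_count   = item_count + 1,
        total_amount = total_amount + new.price * new.quantity
    WHERE id = new.order_id;
END;
```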

**Bad: Calculate aggregates on every read**
```sql
SELECT o.id, COUNT(oi.id), SUM(oi.price * oi.quantity)
FROM orders o JOIN order_items oi ON oi.order_id = o.id GROUP BY o.id;
```

### 8.5 Partitioning Strategies

**Good: Partition large tables by time**
```sql
CREATE TABLE events_2024 (id INTEGER PRIMARY KEY, event_type TEXT, created_at TEXT CHECK(created_at LIKE '2024%'));
CREATE TABLE events_2025 (id INTEGER PRIMARY KEY, event_type TEXT, created_at TEXT CHECK(created_at LIKE '2025%'));
CREATE VIEW events AS SELECT * FROM events_2024 UNION ALL SELECT * FROM events_2025;
```

**Bad: One monolithic table where time-bounded queries must scan millions of irrelevant rows**

---

## 9. Common Mistakes & Anti-Patterns

| Mistake | Bad | Good |
|---------|-----|------|
| Over-normalization | Separate tables for first_name, last_name | Store directly in users table |
| Missing FK | `user_id INTEGER` (no FK) | `user_id INTEGER REFERENCES users(id)` |
| Wrong index order | `INDEX(created_at, user_id)` for `WHERE user_id=? AND created_at>?` | `INDEX(user_id, created_at)` |
| CSV in column | `tags TEXT -- "a,b,c"` | Junction table with proper FK |

---

## 10. Pre-Implementation Checklist

### Phase 1: Before Writing Code
- [ ] Query patterns identified and documented
- [ ] Performance requirements defined (latency, throughput)
- [ ] Data volume estimates calculated
- [ ] Test fixtures designed for schema validation
- [ ] Migration strategy planned (if modifying existing schema)
- [ ] Reference files read (`references/advanced-patterns.md`, `references/security-examples.md`)

### Phase 2: During Implementation
- [ ] All tables have PRIMARY KEY
- [ ] Foreign keys defined for all relationships
- [ ] Appropriate ON DELETE actions (CASCADE, RESTRICT, SET NULL)
- [ ] CHECK constraints for data validation
- [ ] UNIQUE constraints where needed
- [ ] NOT NULL for required fields
- [ ] Indexes created for all foreign keys
- [ ] Composite indexes with correct column order (equality before range)
- [ ] FTS5 tables with sync triggers if needed
- [ ] Tests written and passing for constraints

### Phase 3: Before Committing
- [ ] `pytest tests/test_schema.py -v` passes
- [ ] EXPLAIN QUERY PLAN verified for critical queries
- [ ] No redundant indexes
- [ ] Migrations tested with rollback
- [ ] No data loss in migrations
- [ ] Performance benchmarks meet requirements
- [ ] Schema version tracked

---

## 11. Summary

Your goal is to create database schemas that are:

- **Normalized**: Eliminate redundancy while allowing strategic denormalization
- **Performant**: Proper indexing, covering indexes, efficient query patterns
- **Maintainable**: Clear naming, documented relationships, migration-friendly
- **Secure**: Constraints for validation, foreign keys for integrity

You understand that schema design requires balancing:
1. Normalization vs. query performance
2. Indexing benefits vs. write overhead
3. Flexibility vs. constraints
4. Current needs vs. future evolution

**Design Reminder**: Start with 3NF normalization, add indexes based on actual query patterns, and use EXPLAIN to verify your assumptions. When in doubt, consult `references/advanced-patterns.md` for complex relationship patterns.

Overview

This skill provides expert guidance on database schema design for desktop and local-first applications, with emphasis on normalization, indexing, full-text search, and migration-safe architectures. It prescribes test-driven schema development, constraint-driven integrity, and practical denormalization strategies for performance. The guidance targets SQLite-based environments and performance-sensitive read/write patterns.

How this skill works

I inspect intended query patterns, data volumes, and security requirements to recommend schemas, indexes, and FTS setups. I supply templates for common patterns (base entities, one-to-many, many-to-many, hierarchies), FTS5 integration with sync triggers, audit trails, and safe migration steps. I also provide actionable tests and EXPLAIN usage to validate index effectiveness and migration safety.

When to use it

  • Designing schemas for new desktop application features
  • Modeling complex relationships (many-to-many, polymorphic, hierarchies)
  • Optimizing search with FTS5 and synchronized triggers
  • Storing sensitive or audited data requiring constraints and audit trails
  • Evolving production schemas where reversible migrations are required

Best practices

  • Drive schema changes with tests first; write failing tests for constraints and query plans
  • Normalize to 3NF by default, then denormalize only after measuring query performance
  • Index for observed query patterns; prefer composite and covering indexes where helpful
  • Use database constraints (PK, FK, UNIQUE, CHECK) as the last line of defense for data integrity
  • Implement FTS5 virtual tables with triggers to keep content synchronized and use prefixes/tokenizers for relevance
  • Plan and test reversible migrations using fixtures that preserve existing data

Example use cases

  • Local-first note app needing full-text search across titles and bodies with fast updates
  • Financial desktop app requiring audit trails, field-level change records, and strict CHECK constraints
  • Offline-first CRM storing contacts, tags, and activity logs with many-to-many tag relationships
  • Read-heavy reporting app that uses denormalized counters and covering indexes to avoid expensive joins
  • Time-series event storage partitioned by year with a union view for queries spanning ranges

FAQ

How do I decide when to denormalize?

Denormalize only when tests and profiling show read performance issues; keep denormalized fields maintained by triggers or application logic and document tradeoffs.

How should I validate index usage?

Use EXPLAIN QUERY PLAN and tests that assert the planner uses the intended index; combine with representative fixtures to mimic production data shapes.
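
For example, a quick check against the covering index from section 8.2 (the expected plan text is indicative and may vary across SQLite versions):

```sql
EXPLAIN QUERY PLAN
SELECT name, status FROM users WHERE email = 'alice@example.com';
-- Expected detail: SEARCH users USING COVERING INDEX idx_users_email_cover (email=?)
```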

What is the safest way to add a column in SQLite?

Use ALTER TABLE ADD COLUMN with a sensible default and write migration tests that confirm data preservation; for complex changes, create a new table, copy the data, and swap inside a single transaction, as sketched below.
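
A minimal sketch of that rebuild-and-swap pattern, with hypothetical table and column names:

```sql
-- Disable PRAGMA foreign_keys before the swap if other tables reference users
BEGIN;
CREATE TABLE users_new (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL DEFAULT 'Unknown'  -- newly added column
);
INSERT INTO users_new (id, email) SELECT id, email FROM users;
DROP TABLE users;
ALTER TABLE users_new RENAME TO users;
COMMIT;
```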