home / skills / ratacat / claude-skills / data-systems-architecture

data-systems-architecture skill

safe

This skill helps you design data systems for performance and scalability by guiding schema choices, indexing, OLTP/OLAP trade-offs, and ORM/N+1 handling.

npx playbooks add skill ratacat/claude-skills --skill data-systems-architecture

Review the files below or copy the command above to add this skill to your agents.

Files (8)

SKILL.md

5.3 KB

---
name: data-systems-architecture
description: Use when designing databases for data-heavy applications, making schema decisions for performance, choosing between normalization and denormalization, selecting storage/indexing strategies, planning for scale, or evaluating OLTP vs OLAP trade-offs. Also use when encountering N+1 queries, ORM issues, or concurrency problems.
---

# Data Systems Architecture

## Overview

**Core principle:** Good data system architecture balances reliability (correct operation under faults), scalability (handling growth gracefully), and maintainability (enabling productive change over time). Every architectural decision involves trade-offs between these concerns.

This skill synthesizes knowledge from three foundational texts:
- *Designing Data-Intensive Applications* (Kleppmann) - distributed systems, storage engines, scaling
- *The Art of PostgreSQL* (Fontaine) - PostgreSQL-specific patterns, SQL as programming
- *PostgreSQL Query Optimization* (Dombrovskaya et al.) - execution plans, performance tuning

## When to Use

| Symptom | Start With |
|---------|------------|
| Designing a new database/schema | `01-foundational-principles.md` |
| Normalization vs denormalization decisions | `02-data-modeling.md` |
| Need to understand OLTP vs OLAP | `03-storage-engines.md` |
| Slow queries, index selection | `04-indexing.md` |
| Planning for growth, read replicas | `05-scaling-patterns.md` |
| Race conditions, deadlocks, isolation issues | `06-transactions-concurrency.md` |
| N+1 queries, ORM problems, application integration | `07-application-integration.md` |

## Navigation

### Reference Files (Load as needed)

```
01-foundational-principles.md    - Reliability/Scalability/Maintainability, load parameters
02-data-modeling.md              - Normalization, denormalization, schema design patterns
03-storage-engines.md            - B-trees, LSM-trees, OLTP vs OLAP, PostgreSQL internals
04-indexing.md                   - Index types, compound indexes, covering indexes, maintenance
05-scaling-patterns.md           - Replication, partitioning, sharding strategies
06-transactions-concurrency.md   - ACID, isolation levels, MVCC, locking patterns
07-application-integration.md    - ORM pitfalls, N+1, business logic placement, batch processing
```

## Quick Decision Framework

```
New system design?
├─ Yes → Read 01, then 02 for data model
└─ No → What's the problem?
         ├─ "Queries are slow" → Read 04 (indexing) + 03 (storage patterns)
         ├─ "Data is inconsistent" → Read 02 (modeling) + 06 (transactions)
         ├─ "Can't handle the load" → Read 05 (scaling) + 03 (OLTP vs OLAP)
         ├─ "App makes too many queries" → Read 07 (N+1, ORM patterns)
         └─ "Race conditions/deadlocks" → Read 06 (concurrency)
```

## Core Concepts (Quick Reference)

### The Three Pillars

| Concern | Definition | Key Question |
|---------|------------|--------------|
| **Reliability** | System works correctly under faults | What happens when things fail? |
| **Scalability** | Handles growth gracefully | What's 10x load look like? |
| **Maintainability** | Easy to operate and evolve | Can new engineers understand this? |

### Data Model Selection

| Model | Best For | Avoid When |
|-------|----------|------------|
| **Relational** | Many-to-many relationships, joins, consistency | Highly hierarchical data, constant schema changes |
| **Document** | Self-contained docs, tree structures | Need for joins, many-to-many |
| **Graph** | Highly connected data, recursive queries | Simple CRUD, no relationship traversal |

### OLTP vs OLAP

| Aspect | OLTP | OLAP |
|--------|------|------|
| **Query pattern** | Point lookups, few rows | Aggregates, many rows |
| **Optimization** | Index everything used in WHERE | Fewer indexes, full scans OK |
| **Storage** | Row-oriented | Consider column-oriented |

### Index Type Quick Reference

| Type | Use Case | PostgreSQL |
|------|----------|------------|
| **B-tree** | Equality, range, sorting | Default, most queries |
| **Hash** | Equality only | Faster for exact match |
| **GIN** | Arrays, JSONB, full-text | `@>`, `@@` operators |
| **GiST** | Geometric, range types | PostGIS, nearest-neighbor |
| **BRIN** | Large, naturally ordered tables | Time-series data |

### Isolation Levels

| Level | Prevents | PostgreSQL Default? |
|-------|----------|-------------------|
| **Read Committed** | Dirty reads | Yes |
| **Repeatable Read** | + Non-repeatable reads | No |
| **Serializable** | All anomalies | No (uses SSI) |

## Design Checklist

Before finalizing a data architecture:

- [ ] Identified load parameters (read/write ratio, data volume, latency requirements)
- [ ] Chose appropriate data model (relational/document/graph hybrid?)
- [ ] Normalized to 3NF first, denormalized only with measured justification
- [ ] Designed indexes for actual query patterns (not hypothetical)
- [ ] Considered 10x growth scenario
- [ ] Established isolation level requirements
- [ ] Defined where business logic lives (app vs DB vs both)
- [ ] Planned for operations (backups, monitoring, migrations)

## References

- Kleppmann, M. *Designing Data-Intensive Applications* (O'Reilly, 2017)
- Fontaine, D. *The Art of PostgreSQL* (2nd ed., 2020)
- Dombrovskaya, H., Novikov, B., Bailliekova, A. *PostgreSQL Query Optimization* (Apress, 2021)

Overview

This skill helps design and evaluate data systems for data-heavy applications, guiding schema choices, indexing, storage, and scaling decisions. It emphasizes trade-offs between reliability, scalability, and maintainability so you can make pragmatic choices for production systems. Use it to diagnose query performance, concurrency issues, and ORM integration problems.

How this skill works

The skill inspects workload characteristics (read/write ratio, query shapes, growth expectations) and maps them to concrete architecture patterns: relational vs document vs graph models, OLTP vs OLAP storage choices, and appropriate index types. It provides prescriptive guidance on normalization vs denormalization, transaction isolation and locking, replication/partitioning strategies, and remedies for common application-layer problems like N+1 queries and ORM inefficiencies.

When to use it

Designing a new database/schema for a data-heavy app
Choosing between normalization and denormalization for performance
Deciding OLTP vs OLAP and suitable storage/indexing engines
Diagnosing slow queries and selecting or redesigning indexes
Planning scaling: replication, partitioning, sharding, and 10x growth
Resolving concurrency issues, deadlocks, or ORM N+1 query problems

Best practices

Start by measuring: capture query patterns, latencies, and growth projections before changing schema
Normalize to 3NF first; denormalize selectively with measurable performance gains and clear maintenance plans
Design indexes for real, high-impact queries and monitor their maintenance costs
Choose storage and index types based on access patterns (row vs column, B-tree vs GIN/BRIN)
Define isolation level requirements explicitly and prefer optimistic patterns where possible to reduce contention
Plan for operations: backups, monitoring, migration paths, and runbooks before launch

Example use cases

Designing a primary schema for a transactional service with many joins and strict consistency
Refactoring slow report queries by choosing columnar storage or materialized views for OLAP workloads
Resolving an N+1 problem by introducing batch loading and targeted joins or query builders in the ORM layer
Choosing partitioning and replica strategies to handle a 10x increase in write throughput
Tuning PostgreSQL: selecting B-tree/GIN/BRIN indexes and adjusting vacuum/maintenance for large time-series data

FAQ

When should I denormalize a schema?

Denormalize only after measuring that normalized queries are a performance bottleneck and when the increased write/maintenance cost is acceptable; prefer materialized views or caching if suitable.

How do I decide between replication and sharding?

Start with replication for read scaling and high availability; choose sharding when single-node capacity limits (storage/IO) are reached or write throughput requires horizontal partitioning.