This skill helps you design ClickHouse schemas with optimized ORDER BY, partitioning, and data types to achieve sub-second queries and strong compression.
Install with:

```sh
npx playbooks add skill obsessiondb/clickhouse-skills --skill clickhouse-schema-design
```
---
name: clickhouse-schema-design
description: "15+ schema rules for ClickHouse tables. ALWAYS LOAD when creating or modifying tables. Achieves sub-second queries, 10x compression, and automated data lifecycle."
---
# ClickHouse Schema Design
**ALWAYS LOAD** when creating or modifying ClickHouse tables.
## Goals
- Sub-second query response on billion-row tables
- 10x+ compression ratios vs raw data
- Zero-maintenance data retention and tiered storage
- Queries that scan <1% of data instead of full table scans
## Reference Documentation
- [MergeTree Engine](https://clickhouse.com/docs/engines/table-engines/mergetree-family/mergetree)
- [Primary Keys and Indexes](https://clickhouse.com/docs/optimize/sparse-primary-indexes)
- [Data Types](https://clickhouse.com/docs/sql-reference/data-types)
- [TTL](https://clickhouse.com/docs/sql-reference/statements/alter/ttl)
- [Altinity: Pick Keys Guide](https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/)
**Search terms:** schema design, ORDER BY, PRIMARY KEY, PARTITION BY, data types, TTL, index_granularity
## Critical Rules
### [CRITICAL]
1. **ORDER BY determines query performance.** Put most-filtered columns first, low cardinality before high.
2. **PARTITION BY is for data management, not query speed.** Use ORDER BY to speed up queries.
3. **Use smallest data types possible.** UInt32 not UInt64 if values fit.
### [HIGH]
4. **LowCardinality for strings with <10K unique values.** Saves storage and speeds queries.
5. **Avoid Nullable when possible.** Use DEFAULT values instead.
6. **PRIMARY KEY can be a prefix of ORDER BY.** Reduces index size.
### [MEDIUM]
7. **3-5 columns in ORDER BY is typical.** More columns rarely help, fewer may miss optimization opportunities.
## Engine Selection
| Engine | Use Case | Key Behavior |
| ---------------------- | ----------------------------------- | ----------------------------------------------------------- |
| `MergeTree` | Append-only data (events, logs) | Standard storage, no special merge logic |
| `ReplacingMergeTree` | Deduplication, upserts | Keeps latest row per ORDER BY key; query with `argMax` pattern |
| `SummingMergeTree` | Pre-aggregated counters | Automatically sums numeric columns on merge |
| `AggregatingMergeTree` | Complex aggregates (uniq, quantile) | Stores intermediate states; use `-State`/`-Merge` functions |
| `CollapsingMergeTree` | Mutable rows | Uses +1/-1 sign column to cancel/update rows |
**For engine examples and Materialized Views, see `clickhouse-materialized-views` skill.**
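The `argMax` pattern referenced for `ReplacingMergeTree` can be sketched as follows; the `users` table, its `updated_at` version column, and the other column names are illustrative assumptions:

```sql
-- Hypothetical table: ENGINE = ReplacingMergeTree(updated_at) ORDER BY (id).
-- Merges are asynchronous, so duplicate rows may exist until a merge runs;
-- deduplicate at query time by taking the values from the latest version:
SELECT
    id,
    argMax(name, updated_at)  AS name,
    argMax(email, updated_at) AS email
FROM users
GROUP BY id;

-- Simpler but slower alternative, especially on large tables:
-- SELECT * FROM users FINAL;
```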
## ORDER BY Selection
### The 3-5 Column Rule
```sql
ORDER BY (
tenant_id, -- 1. Lowest cardinality, most filtered
event_date, -- 2. Time component (common filter)
user_id, -- 3. Medium cardinality
event_type -- 4. Higher cardinality (optional)
)
```
### Decision Process
1. **Which columns appear in WHERE clauses?** - These go first
2. **What's the cardinality?** - Lower cardinality columns before higher
3. **Is there a time dimension?** - Usually included for range queries
### Good vs Bad
```sql
-- BAD: High cardinality first, rarely-filtered columns
ORDER BY (user_id, timestamp, tenant_id)
-- GOOD: Low cardinality first, commonly-filtered columns
ORDER BY (tenant_id, toDate(timestamp), user_id)
```
### PRIMARY KEY as Prefix
```sql
-- Full sorting key for data layout
ORDER BY (tenant_id, event_date, user_id, event_type)
-- Shorter primary key for smaller index (optional)
PRIMARY KEY (tenant_id, event_date)
```
## PARTITION BY Guidelines
### Size Targets
| Scenario | Target Partition Size |
| ------------------------------------- | --------------------- |
| General MergeTree tables | 1-300 GB |
| SummingMergeTree / ReplacingMergeTree | 400 MB - 40 GB |
| Small tables (<5 GB total) | No partitioning |
### Common Patterns
```sql
-- Monthly (most common for analytics)
PARTITION BY toYYYYMM(event_date)
-- Daily (high volume, >1TB/month)
PARTITION BY toDate(event_date)
-- Multi-tenant with time
PARTITION BY (tenant_id, toYYYYMM(event_date))
-- No partitioning (small tables)
-- Simply omit PARTITION BY clause
```
### Anti-Pattern
```sql
-- BAD: Over-partitioning creates thousands of small parts
PARTITION BY (toDate(event_date), user_id)
-- BAD: Partitioning by high-cardinality column
PARTITION BY user_id
```
## Data Type Optimization
### Numbers
```sql
-- Use smallest type that fits your data
count UInt16, -- Max 65,535 (instead of UInt64)
percentage Float32, -- Instead of Float64 if 6-7 digits precision is enough
flags UInt8, -- For small integers, booleans
```
### Strings
```sql
-- LowCardinality for <10K unique values
country LowCardinality(String),
status LowCardinality(String),
event_type LowCardinality(String),
-- Regular String for high cardinality
user_agent String,
url String,
```
### Dates and Times
```sql
-- Use simplest type that meets requirements
event_date Date, -- If you only need date
event_time DateTime, -- If you need seconds
event_time_precise DateTime64(3), -- Only if you need milliseconds
```
### Avoiding Nullable
```sql
-- BAD: Nullable adds overhead and complexity
user_id Nullable(UInt64),
-- GOOD: Use DEFAULT for missing values
user_id UInt64 DEFAULT 0,
-- GOOD: Use empty string for missing text
name String DEFAULT '',
```
## TTL Configuration
### Delete Old Data
```sql
CREATE TABLE events (
event_date Date,
...
) ENGINE = MergeTree()
ORDER BY (...)
TTL event_date + INTERVAL 90 DAY DELETE;
```
### Tiered Storage
```sql
CREATE TABLE events (
event_date Date,
...
) ENGINE = MergeTree()
ORDER BY (...)
TTL
event_date + INTERVAL 7 DAY TO VOLUME 'hot',
event_date + INTERVAL 30 DAY TO VOLUME 'warm',
event_date + INTERVAL 365 DAY DELETE;
```
### Column-Level TTL
```sql
CREATE TABLE events (
event_date Date,
user_id UInt64,
-- Delete PII after 30 days, keep aggregated data
email String TTL event_date + INTERVAL 30 DAY,
ip_address String TTL event_date + INTERVAL 30 DAY
) ENGINE = MergeTree()
ORDER BY (event_date, user_id);
```
## Complete Example
```sql
CREATE TABLE analytics_events (
-- Time dimension
event_date Date,
event_time DateTime,
-- Identifiers (low to high cardinality)
tenant_id UInt32,
user_id UInt64,
session_id String,
-- Categorical data (use LowCardinality)
event_type LowCardinality(String),
country LowCardinality(String),
device_type LowCardinality(String),
-- Metrics
duration_ms UInt32,
-- Flexible data
properties String, -- JSON as string
-- Skip indices for secondary lookups
INDEX idx_user user_id TYPE bloom_filter(0.01) GRANULARITY 3,
INDEX idx_session session_id TYPE bloom_filter(0.01) GRANULARITY 3
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (tenant_id, event_date, user_id)
TTL event_date + INTERVAL 12 MONTH DELETE
SETTINGS index_granularity = 8192;
```
## Anti-Patterns
| Anti-Pattern | Problem | Solution |
| ------------------------------------------ | ---------------------------------- | -------------------------------------- |
| `ORDER BY (uuid, timestamp)` | High cardinality first | Put low cardinality columns first |
| `PARTITION BY toDate(ts)` for small tables | Too many small partitions | Omit partitioning or use monthly |
| `Nullable(UInt64)` everywhere | Storage and query overhead | Use DEFAULT values |
| `String` for status codes | Wastes space | Use `LowCardinality(String)` or `Enum` |
| `DateTime64(9)` always | Nanosecond precision rarely needed | Use `DateTime` or `DateTime64(3)` |
| Putting timestamp first in ORDER BY | Poor compression and filtering | Put categorical columns first |
## Verification Queries
```sql
-- Check table size and compression
SELECT
table,
formatReadableSize(sum(bytes_on_disk)) AS size,
sum(rows) AS rows,
round(sum(bytes_on_disk) / sum(rows), 2) AS bytes_per_row
FROM system.parts
WHERE active AND database = 'default'
GROUP BY table;
-- Check partition sizes
SELECT
partition,
count() AS parts,
sum(rows) AS rows,
formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND table = 'your_table'
GROUP BY partition
ORDER BY partition;
-- Verify column types
SELECT name, type, compression_codec
FROM system.columns
WHERE database = 'default' AND table = 'your_table';
```
## Troubleshooting
**Always ask for user confirmation before applying schema changes (ALTER TABLE, recreating tables).**
### "Too Many Parts" Error
**Problem:** `DB::Exception: Too many parts`, inserts rejected, merge queue growing
**Diagnose:**
```sql
SELECT table, partition, count() AS parts
FROM system.parts
WHERE active AND database = 'default'
GROUP BY table, partition
HAVING parts > 300
ORDER BY parts DESC;
```
**Solutions:**
| Cause | Fix |
|-------|-----|
| Over-partitioning (daily + high cardinality) | Use monthly partitions: `PARTITION BY toYYYYMM(date)` |
| Too many small inserts | Batch inserts: 1000+ rows per INSERT |
| High-cardinality partition key | Remove high-cardinality columns from PARTITION BY |
```sql
-- Fix: Change from daily to monthly partitioning (requires table recreation)
CREATE TABLE events_new (...) PARTITION BY toYYYYMM(event_date) ...;
INSERT INTO events_new SELECT * FROM events;
RENAME TABLE events TO events_old, events_new TO events;
DROP TABLE events_old;
```
### Poor Compression (<3x Ratio)
**Problem:** Table using more disk than expected, compression ratio below 3x
**Diagnose:**
```sql
SELECT
column,
type,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 2) AS ratio
FROM system.parts_columns
WHERE active AND table = 'your_table'
GROUP BY column, type
ORDER BY ratio ASC;
```
**Solutions:**
| Cause | Fix |
|-------|-----|
| High-cardinality column first in ORDER BY | Reorder: low cardinality columns first |
| String for low-cardinality data | Use `LowCardinality(String)` |
| Wrong codec for data pattern | Use DoubleDelta for timestamps, Gorilla for floats |
| Random UUIDs in ORDER BY | Move UUID later in ORDER BY, or use different key |
```sql
-- Check cardinality to decide LowCardinality usage
SELECT uniq(status) FROM events; -- If <10K, use LowCardinality
-- Fix column type (requires recreation or new column)
ALTER TABLE events ADD COLUMN status_new LowCardinality(String);
ALTER TABLE events UPDATE status_new = status WHERE 1;
-- Then migrate queries to use status_new
```
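The codec fixes from the table above can be sketched like this; the column names are illustrative, and note that a codec change applies to newly written parts, while existing parts are recompressed only as merges rewrite them:

```sql
-- Timestamps: DoubleDelta efficiently encodes near-constant increments
ALTER TABLE events MODIFY COLUMN event_time DateTime CODEC(DoubleDelta, LZ4);

-- Slowly changing floats (gauges): Gorilla XOR-encodes consecutive values
ALTER TABLE events MODIFY COLUMN cpu_usage Float32 CODEC(Gorilla, LZ4);
```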
### Slow Queries Despite Good Schema
**Problem:** Queries slow even with proper ORDER BY and partitioning
**Diagnose:**
```sql
EXPLAIN indexes = 1 SELECT ... FROM your_table WHERE ...;
-- Check: Are granules being skipped? Is partition pruning happening?
```
**Solutions:**
| Cause | Fix |
|-------|-----|
| Query doesn't filter on ORDER BY prefix | Add ORDER BY columns to WHERE clause |
| Function on filter column | Store computed column, filter on that |
| Missing skip index for secondary lookups | Add bloom_filter index |
| Selecting too many columns | Select only needed columns |
```sql
-- Example: Table ORDER BY (tenant_id, event_date, user_id)
-- BAD: Skips ORDER BY prefix
SELECT * FROM events WHERE user_id = 123;
-- GOOD: Include prefix
SELECT * FROM events WHERE tenant_id = 1 AND event_date = today() AND user_id = 123;
-- ALT: Add skip index for direct user_id lookups
ALTER TABLE events ADD INDEX idx_user user_id TYPE bloom_filter(0.01) GRANULARITY 4;
```
This skill provides a compact set of 15+ actionable schema rules and patterns for ClickHouse table design. It is intended to be loaded whenever creating or modifying tables to achieve sub-second queries, strong compression, and automated data lifecycle. The guidance focuses on ORDER BY, PARTITION BY, data types, TTL, and verification/troubleshooting steps.
The skill inspects schema choices and recommends concrete changes: ideal ORDER BY ordering, partitioning strategy, smallest-fit data types, LowCardinality usage, and TTL rules for tiered storage or deletion. It also suggests engine selection (MergeTree variants), skip indexes for secondary lookups, and provides verification queries and remediation steps for common problems like too many parts or poor compression. Use it as a checklist before applying ALTER/CREATE operations.
## FAQ

**Will changing ORDER BY always require table recreation?**

Almost always. `ALTER TABLE ... MODIFY ORDER BY` can only append expressions built from newly added columns to the end of the sorting key; any other change requires creating a new table with the desired ORDER BY and migrating the data. Test and confirm before switching production tables.
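A migration sketch, mirroring the partitioning fix in the Troubleshooting section (table and column names are illustrative):

```sql
-- Sketch: recreate the table with the new sorting key, then swap names
CREATE TABLE events_new (/* same columns as events */) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (tenant_id, event_date, user_id);

INSERT INTO events_new SELECT * FROM events;

-- Verify row counts match before dropping the old table
RENAME TABLE events TO events_old, events_new TO events;
DROP TABLE events_old;
```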
**How do I choose partition granularity?**

Target the recommended partition sizes: 1–300 GB for general MergeTree tables, 400 MB–40 GB for Summing/Replacing variants. Use monthly partitions for most analytics workloads; use daily partitions only for very high volume (>1 TB/month).