
This skill analyzes ClickHouse table designs, partitions, ORDER BY keys, and materialized views to identify anti-patterns and optimization opportunities.

npx playbooks add skill altinity/skills --skill altinity-expert-clickhouse-schema

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
3.8 KB
---
name: altinity-expert-clickhouse-schema
description: Analyze ClickHouse table structure, partitioning, ORDER BY keys, materialized views, and identify schema design anti-patterns. Use for table design issues and optimization.
---

# Table Schema and Design Analysis

Analyze table structure, partitioning, ORDER BY, materialized views, and identify design anti-patterns.

---

## Diagnostics

Run all of the queries in `checks.sql` and analyze the results.
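If `checks.sql` is not at hand, a representative check of the kind it contains (a sketch, not the file's actual contents) flags partitions with many active parts:

```sql
-- Hypothetical stand-in for one of the checks.sql queries:
-- partitions with many active parts suggest a merge backlog
select database, table, partition, count() as parts
from system.parts
where active
group by database, table, partition
having parts > 100
order by parts desc
limit 20
```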

---

## Deep Dive Queries (Placeholder-Based)

### Partition Distribution for Specific Table

```sql
select
    database,
    table,
    partition,
    count() as parts,
    sum(rows) as total_rows,
    formatReadableSize(sum(bytes_on_disk)) as partition_size,
    formatReadableSize(median(bytes_on_disk)) as median_part_size
from system.parts
where active and database = '{database}' and table = '{table}'
group by database, table, partition
order by partition desc
limit 100
```

### Column Compression Analysis for Specific Table

```sql
select
    name,
    type,
    formatReadableSize(data_compressed_bytes) as compressed,
    formatReadableSize(data_uncompressed_bytes) as uncompressed,
    round(data_uncompressed_bytes / nullIf(data_compressed_bytes, 0), 2) as ratio,
    compression_codec
from system.columns
where database = '{database}' and table = '{table}'
order by data_compressed_bytes desc
limit 50
```

**Look for:**
- Columns with ratio < 2 → consider better codec or data transformation
- Large columns without codec → add CODEC(ZSTD) or LZ4HC
- String columns with low cardinality → consider LowCardinality(String)
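Findings like these are typically addressed with `ALTER TABLE ... MODIFY COLUMN`; a sketch with placeholder table and column names:

```sql
-- Placeholder names: replace events/payload/country with real identifiers.
-- Apply a stronger general-purpose codec to a large, poorly compressed column:
alter table events modify column payload String codec(ZSTD(3));

-- Convert a repetitive String column to LowCardinality:
alter table events modify column country LowCardinality(String);
```

Note that existing parts may only be recompressed as they are merged; new parts use the new codec immediately.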

### Index Usage Analysis for Specific Database

```sql
select
    database,
    table,
    name as index_name,
    type,
    expr,
    granularity
from system.data_skipping_indices
where database = '{database}'
order by database, table
```

---

## Schema Design Recommendations

### Partition Key Guidelines

| Data Volume | Recommended Granularity | Example |
|-------------|------------------------|---------|
| < 10GB/month | No partitioning or yearly | `toYear(ts)` |
| 10-100GB/month | Monthly | `toYYYYMM(ts)` |
| 100GB-1TB/month | Weekly or daily | `toMonday(ts)` |
| > 1TB/month | Daily | `toDate(ts)` |
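As an illustration, a table ingesting tens of GB per month lands in the monthly row of the table above; a minimal sketch (all names are placeholders):

```sql
-- Sketch: ~50 GB/month of events → monthly partitions
create table events
(
    tenant_id UInt32,
    ts        DateTime,
    payload   String
)
engine = MergeTree
partition by toYYYYMM(ts)
order by (tenant_id, ts);
```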

### ORDER BY Guidelines

1. **First column**: Low cardinality, frequently filtered (e.g., `tenant_id`, `region`)
2. **Second column**: Time-based if range queries common
3. **Subsequent**: Other filter columns by selectivity (most selective last)

**Anti-patterns:**
- UUID/hash as first column
- High-cardinality ID without tenant prefix
- DateTime64 with microseconds as first column
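To make the contrast concrete, a hypothetical key layout following the guidelines versus the anti-pattern:

```sql
-- Anti-pattern: a UUID first forces most queries to scan nearly all granules
-- order by (event_id, ts)

-- Following the guidelines: low-cardinality prefix, then time, then the ID
create table clicks
(
    tenant_id UInt16,
    ts        DateTime,
    event_id  UUID
)
engine = MergeTree
order by (tenant_id, ts, event_id);
```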

### Compression Codec Recommendations

| Data Type | Recommended Codec |
|-----------|-------------------|
| Integers (sequential) | `Delta, ZSTD` |
| Integers (random) | `ZSTD` or `LZ4HC` |
| Floats | `Gorilla, ZSTD` |
| Timestamps | `DoubleDelta, ZSTD` |
| Strings (long) | `ZSTD(3)` |
| Strings (repetitive) | `LowCardinality` + `ZSTD` |
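Put together, a table applying these recommendations might look like the sketch below (column names are invented):

```sql
create table metrics
(
    ts     DateTime codec(DoubleDelta, ZSTD),    -- timestamps
    seq    UInt64   codec(Delta, ZSTD),          -- sequential integers
    value  Float64  codec(Gorilla, ZSTD),        -- floats
    note   String   codec(ZSTD(3)),              -- long strings
    status LowCardinality(String) codec(ZSTD)    -- repetitive strings
)
engine = MergeTree
order by ts;
```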

---

## Cross-Module Triggers

| Finding | Load Module | Reason |
|---------|-------------|--------|
| Many small partitions | `altinity-expert-clickhouse-ingestion` | Check batch sizing |
| Oversized partitions | `altinity-expert-clickhouse-merges` | Merge can't complete |
| High PK memory | `altinity-expert-clickhouse-memory` | Memory pressure |
| MV performance issues | `altinity-expert-clickhouse-reporting` | Query analysis |
| Too many parts per partition | `altinity-expert-clickhouse-merges` | Merge backlog |

---

## Settings Reference

| Setting | Default | Recommendation |
|---------|---------|----------------|
| `index_granularity` | 8192 | Lower for point lookups, higher for scans |
| `ttl_only_drop_parts` | 0 | Set to 1 if TTL deletes entire partitions |
| `min_bytes_for_wide_part` | 10MB | Increase if many small parts |
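These are MergeTree-level table settings. `ttl_only_drop_parts` can be changed on an existing table, while `index_granularity` is normally chosen at creation time (the table name below is a placeholder):

```sql
-- Placeholder table name; adjust values to your workload.
alter table events modify setting ttl_only_drop_parts = 1;

-- index_granularity is normally set when the table is created:
-- create table ... engine = MergeTree order by (...)
-- settings index_granularity = 4096;
```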

Overview

This skill analyzes ClickHouse table schemas to find partitioning, ORDER BY, compression, and materialized view issues and surface schema-design anti-patterns. It runs diagnostic queries against system tables and provides practical recommendations to improve query performance, storage efficiency, and merge behavior. Use it to quickly audit table design and prioritize fixes.

How this skill works

The skill executes a set of diagnostic SQL checks against system.parts, system.columns, and system.data_skipping_indices to collect partition counts, partition sizes, column compression stats, and index definitions. It highlights problematic patterns such as many small partitions, oversized partitions, poor compression ratios, missing codecs, suboptimal ORDER BY keys, and risky index configurations. It maps findings to concrete recommendations and cross-module triggers for upstream issues like ingestion or merge settings.

When to use it

  • After onboarding new datasets to validate partitioning and ORDER BY choices
  • When query latency or memory usage spikes and you suspect the schema is the cause
  • During capacity planning to tune partition granularity and compression
  • If merges are failing or there are unusually many parts per partition
  • Before creating large materialized views or changing primary keys

Best practices

  • Choose partition granularity by monthly ingest volume: none or yearly below 10GB/month, scaling up to daily above 1TB/month
  • Make the first ORDER BY column low-cardinality and frequently filtered (tenant_id, region)
  • Use time as the second ORDER BY key when range queries are common
  • Apply codecs per data shape: Delta/DoubleDelta for sequential integers/timestamps, ZSTD for general use
  • Convert repetitive string columns to LowCardinality(String) to reduce storage and memory
  • Monitor partition sizes and avoid many tiny partitions or very large partitions that block merges

Example use cases

  • Audit a slow reporting table to determine if ORDER BY or missing indices cause scans
  • Inspect a high-ingest table to recommend partitioning (daily, weekly, monthly) based on bytes/month
  • Find columns with poor compression ratio and suggest codecs or data transforms
  • Detect too many parts per partition and point to ingestion batch sizing or merge settings
  • Validate materialized views for design anti-patterns that cause heavy refresh or memory pressure

FAQ

How do you choose partition granularity?

Pick granularity by ingest volume: no/yearly for <10GB/month, monthly for 10–100GB, weekly/daily for 100GB–1TB, and daily for >1TB.

When should I use LowCardinality(String)?

Use it for string columns with low distinct values relative to rows to reduce storage and improve query performance.
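One way to check is to measure the distinct-value ratio directly (table and column names below are placeholders; the threshold is a rough rule of thumb):

```sql
select
    uniq(country) as distinct_values,
    count() as total_rows,
    round(uniq(country) / count(), 6) as ratio
from events;
-- a few thousand distinct values or fewer, with a ratio far below 1,
-- usually makes the column a good LowCardinality(String) candidate
```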