home / skills / obsessiondb / clickhouse-skills / clickhouse-materialized-views
This skill helps you design and manage ClickHouse materialized views for real-time aggregation and data transformation, optimizing queries and avoiding common
npx playbooks add skill obsessiondb/clickhouse-skills --skill clickhouse-materialized-viewsReview the files below or copy the command above to add this skill to your agents.
---
name: clickhouse-materialized-views
description: "10+ patterns for ClickHouse Materialized Views. Load when creating MVs for real-time aggregation, data transformation, or query optimization. Covers SummingMergeTree, AggregatingMergeTree, and common pitfalls."
---
# ClickHouse Materialized Views
Load when creating Materialized Views for real-time aggregation, ETL pipelines, or query optimization.
**Prerequisite:** Understand MergeTree engine variants in `clickhouse-schema-design` skill.
## Reference Documentation
- [Materialized Views](https://clickhouse.com/docs/materialized-view)
- [AggregatingMergeTree](https://clickhouse.com/docs/engines/table-engines/mergetree-family/aggregatingmergetree)
- [Altinity: MV Best Practices](https://kb.altinity.com/altinity-kb-schema-design/materialized-views/)
**Search terms:** materialized view, MV, real-time aggregation, AggregateFunction, -State, -Merge, SummingMergeTree, pre-aggregation, incremental
## Critical Rules
### [CRITICAL]
1. **MVs are triggers, not caches.** They process INSERT data, not query results.
2. **Use correct engine.** AggregatingMergeTree for complex aggregates, SummingMergeTree for simple counters.
3. **Query with -Merge functions or argMax.** Aggregation completes at query time, not insert time.
### [HIGH]
4. **MV sees INSERT only.** No backfill; existing data must be inserted manually.
5. **ORDER BY in target must match GROUP BY in MV.** Otherwise aggregation won't work properly.
### [MEDIUM]
6. **Avoid too many MVs on one source table.** Each MV adds overhead to every INSERT.
## How Materialized Views Work
```
Source Table ──INSERT──► MV Transform ──► Target Table
│
└─ MV executes SELECT for each inserted block
```
**Key insight:** The MV's SELECT query runs on each INSERT batch. Results go to the target table. The MV does NOT query historical data.
## Pattern 1: Real-Time Counters (SummingMergeTree)
Best for simple sums and counts that need real-time updates.
```sql
-- Source: raw events
CREATE TABLE events (
event_time DateTime,
tenant_id UInt32,
event_type String,
user_id UInt64
) ENGINE = MergeTree()
ORDER BY (tenant_id, event_time);
-- Target: daily counters
CREATE TABLE daily_event_counts (
date Date,
tenant_id UInt32,
event_type LowCardinality(String),
event_count UInt64,
unique_users UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (tenant_id, date, event_type);
-- MV: transform inserts
CREATE MATERIALIZED VIEW daily_event_counts_mv
TO daily_event_counts AS
SELECT
toDate(event_time) AS date,
tenant_id,
event_type,
count() AS event_count,
uniq(user_id) AS unique_users -- WARNING: Not additive!
FROM events
GROUP BY date, tenant_id, event_type;
```
**Warning:** `uniq()` in SummingMergeTree is not accurate—sums don't equal unique counts. Use AggregatingMergeTree for unique counts.
## Pattern 2: Complex Aggregates (AggregatingMergeTree)
For accurate uniq, quantiles, or any non-additive aggregate.
```sql
-- Target: uses AggregateFunction types
CREATE TABLE user_metrics_agg (
date Date,
tenant_id UInt32,
total_events AggregateFunction(sum, UInt64),
unique_users AggregateFunction(uniq, UInt64),
p95_duration AggregateFunction(quantile(0.95), Float32)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, date);
-- MV: use -State functions
CREATE MATERIALIZED VIEW user_metrics_mv
TO user_metrics_agg AS
SELECT
toDate(event_time) AS date,
tenant_id,
sumState(1) AS total_events,
uniqState(user_id) AS unique_users,
quantileState(0.95)(duration_ms) AS p95_duration
FROM events
GROUP BY date, tenant_id;
-- Query: use -Merge functions
SELECT
date,
tenant_id,
sumMerge(total_events) AS events,
uniqMerge(unique_users) AS users,
quantileMerge(0.95)(p95_duration) AS p95
FROM user_metrics_agg
WHERE tenant_id = 1
GROUP BY date, tenant_id;
```
**Key pattern:** `-State` to insert, `-Merge` to query.
## Pattern 3: Data Transformation Pipeline
Transform/enrich data as it arrives.
```sql
-- Source: raw JSON logs
CREATE TABLE raw_logs (
timestamp DateTime,
raw_json String
) ENGINE = MergeTree()
ORDER BY timestamp;
-- Target: parsed structured data
CREATE TABLE parsed_logs (
timestamp DateTime,
level LowCardinality(String),
service LowCardinality(String),
message String,
trace_id String
)
ENGINE = MergeTree()
ORDER BY (service, timestamp);
-- MV: parse JSON on insert
CREATE MATERIALIZED VIEW parsed_logs_mv
TO parsed_logs AS
SELECT
timestamp,
JSONExtractString(raw_json, 'level') AS level,
JSONExtractString(raw_json, 'service') AS service,
JSONExtractString(raw_json, 'message') AS message,
JSONExtractString(raw_json, 'trace_id') AS trace_id
FROM raw_logs;
```
## Pattern 4: Last Value Tracking (ReplacingMergeTree)
Track latest state per entity.
```sql
-- Target: latest user state
CREATE TABLE user_latest_state (
user_id UInt64,
last_seen DateTime,
last_action LowCardinality(String),
total_actions UInt64
)
ENGINE = ReplacingMergeTree(last_seen)
ORDER BY user_id;
-- MV: update on each event
CREATE MATERIALIZED VIEW user_state_mv
TO user_latest_state AS
SELECT
user_id,
max(event_time) AS last_seen,
argMax(event_type, event_time) AS last_action,
count() AS total_actions
FROM events
GROUP BY user_id;
```
**Query with argMax pattern (avoid FINAL on large tables):**
```sql
SELECT
user_id,
argMax(last_seen, last_seen) AS last_seen,
argMax(last_action, last_seen) AS last_action,
argMax(total_actions, last_seen) AS total_actions
FROM user_latest_state
WHERE user_id = 123
GROUP BY user_id;
```
## Backfilling Existing Data
MVs don't process existing data. Backfill manually:
```sql
-- Insert historical data into target table
INSERT INTO daily_event_counts
SELECT
toDate(event_time) AS date,
tenant_id,
event_type,
count() AS event_count,
uniq(user_id) AS unique_users
FROM events
WHERE event_time < '2024-01-01' -- Before MV was created
GROUP BY date, tenant_id, event_type;
```
## MV Management
### Check MV Status
```sql
-- List all MVs
SELECT name, engine, create_table_query
FROM system.tables
WHERE engine = 'MaterializedView';
-- Check target table size
SELECT
table,
formatReadableSize(sum(bytes_on_disk)) AS size,
sum(rows) AS rows
FROM system.parts
WHERE active AND table = 'daily_event_counts'
GROUP BY table;
```
### Pause/Resume MV
```sql
-- Pause (stop processing inserts)
ALTER TABLE events DETACH MATERIALIZED VIEW daily_event_counts_mv;
-- Resume
ALTER TABLE events ATTACH MATERIALIZED VIEW daily_event_counts_mv;
```
### Modify MV
MVs cannot be altered. Drop and recreate:
```sql
DROP VIEW daily_event_counts_mv;
-- Optionally truncate target: TRUNCATE TABLE daily_event_counts;
CREATE MATERIALIZED VIEW daily_event_counts_mv TO daily_event_counts AS ...;
-- Backfill if needed
```
## Common Pitfalls
| Pitfall | Problem | Solution |
|---------|---------|----------|
| Using `uniq()` with SummingMergeTree | Sums don't equal uniques | Use AggregatingMergeTree with `uniqState/uniqMerge` |
| Forgetting argMax or -Merge | Incomplete aggregation results | Use `argMax` pattern for Replacing/Collapsing, `-Merge` for Aggregating |
| No backfill after MV creation | Missing historical data | Manually INSERT historical aggregates |
| MV on wrong table | Inserts to wrong source ignored | Ensure MV is on the table receiving INSERTs |
| Too many MVs on one source | Slow inserts | Consider fewer MVs or async processing |
## Decision Tree
```
Need to aggregate data at query time?
│
├─ Yes, and data changes frequently → Query raw data
│
├─ Yes, but queries are slow → Create MV
│ │
│ ├─ Simple sums/counts only?
│ │ └─ SummingMergeTree
│ │
│ ├─ Need uniq, quantile, or complex aggregates?
│ │ └─ AggregatingMergeTree with -State/-Merge
│ │
│ └─ Need latest value per key?
│ └─ ReplacingMergeTree with argMax pattern
│
└─ Need to transform/parse data on insert?
└─ MV with regular MergeTree target
```
## Complete Example: Multi-Level Aggregation
```sql
-- Level 1: Raw events (source)
CREATE TABLE events (...) ENGINE = MergeTree() ORDER BY ...;
-- Level 2: Hourly aggregates
CREATE TABLE hourly_stats (
hour DateTime,
tenant_id UInt32,
events AggregateFunction(sum, UInt64),
users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree() ORDER BY (tenant_id, hour);
CREATE MATERIALIZED VIEW hourly_mv TO hourly_stats AS
SELECT
toStartOfHour(event_time) AS hour,
tenant_id,
sumState(1) AS events,
uniqState(user_id) AS users
FROM events GROUP BY hour, tenant_id;
-- Level 3: Daily aggregates (from hourly)
CREATE TABLE daily_stats (
date Date,
tenant_id UInt32,
events AggregateFunction(sum, UInt64),
users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree() ORDER BY (tenant_id, date);
CREATE MATERIALIZED VIEW daily_mv TO daily_stats AS
SELECT
toDate(hour) AS date,
tenant_id,
sumMergeState(events) AS events, -- Merge then re-State
uniqMergeState(users) AS users
FROM hourly_stats GROUP BY date, tenant_id;
```
**Query any level:**
```sql
-- Fast daily query
SELECT date, sumMerge(events), uniqMerge(users)
FROM daily_stats WHERE tenant_id = 1 GROUP BY date;
-- Drill down to hourly
SELECT hour, sumMerge(events), uniqMerge(users)
FROM hourly_stats WHERE tenant_id = 1 AND toDate(hour) = today() GROUP BY hour;
```
## Troubleshooting
**Always ask for user confirmation before creating/modifying MVs or target tables.**
### Wrong Aggregation Results
**Problem:** Counts are too high, uniq values don't match raw data, aggregates seem doubled
**Diagnose:**
```sql
-- Compare MV result vs raw query
SELECT count() FROM target_table;
SELECT count() FROM source_table WHERE <same_filters>;
-- Check for duplicate keys in target
SELECT date, tenant_id, count() AS rows
FROM target_table
GROUP BY date, tenant_id
HAVING rows > 1;
```
**Solutions:**
| Cause | Fix |
|-------|-----|
| Using `uniq()` with SummingMergeTree | Switch to AggregatingMergeTree with `uniqState`/`uniqMerge` |
| Forgetting `-Merge` in query | Always use `sumMerge()`, `uniqMerge()` for AggregatingMergeTree |
| Forgetting `argMax` for ReplacingMergeTree | Use argMax pattern: `SELECT key, argMax(col, version) ... GROUP BY key` |
| Duplicate inserts to source | Deduplicate source or use ReplacingMergeTree for target |
```sql
-- BAD: uniq in SummingMergeTree (sums don't work)
ENGINE = SummingMergeTree()
-- SELECT uniq(user_id) AS users -- Wrong!
-- GOOD: AggregatingMergeTree with State/Merge
ENGINE = AggregatingMergeTree()
-- MV: uniqState(user_id) AS users
-- Query: uniqMerge(users)
```
### MV Not Updating / Missing Data
**Problem:** Target table not receiving new data, counts stuck at old values
**Diagnose:**
```sql
-- Check if MV is attached
SELECT name, engine FROM system.tables WHERE engine = 'MaterializedView';
-- Check target table recent data
SELECT max(date), count() FROM target_table;
-- Verify source table is receiving inserts
SELECT max(event_time), count() FROM source_table WHERE event_time > now() - INTERVAL 1 HOUR;
```
**Solutions:**
| Cause | Fix |
|-------|-----|
| MV detached | `ALTER TABLE source ATTACH MATERIALIZED VIEW mv_name` |
| MV on wrong source table | Drop MV, recreate with correct source |
| Historical data not backfilled | Manually INSERT aggregated historical data |
| Inserts going to different table | Ensure app inserts to the MV's source table |
```sql
-- Backfill historical data
INSERT INTO target_table
SELECT
toDate(event_time) AS date,
tenant_id,
sumState(1) AS events,
uniqState(user_id) AS users
FROM source_table
WHERE event_time < '2024-01-01' -- Before MV existed
GROUP BY date, tenant_id;
```
### Slow Inserts After Adding MV
**Problem:** INSERT performance degraded after creating MV, insert latency increased
**Diagnose:**
```sql
-- Check MVs on this table
SELECT name, as_select FROM system.tables
WHERE engine = 'MaterializedView' AND as_select LIKE '%source_table%';
-- Check MV query complexity
EXPLAIN SELECT ... FROM source_table ...; -- Use MV's SELECT query
```
**Solutions:**
| Cause | Fix |
|-------|-----|
| Too many MVs on one source | Consolidate MVs or use async inserts |
| Complex MV query (JOINs, heavy transforms) | Simplify MV, move complexity to query time |
| MV target table has wrong ORDER BY | Match target ORDER BY to MV's GROUP BY |
```sql
-- Example: MV groups by (tenant_id, date)
-- Target table ORDER BY should match
CREATE TABLE target (...)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, date); -- Matches GROUP BY in MV
```
### Target Table Growing Too Large
**Problem:** MV target table larger than expected, not aggregating properly
**Diagnose:**
```sql
-- Check rows per key (should be 1 after merge for same GROUP BY)
SELECT date, tenant_id, count() AS rows
FROM target_table
GROUP BY date, tenant_id
ORDER BY rows DESC
LIMIT 10;
```
**Solutions:**
| Cause | Fix |
|-------|-----|
| ORDER BY doesn't match GROUP BY | Recreate target with ORDER BY matching MV's GROUP BY |
| Background merges haven't run | Wait for automatic merge, or use argMax in query |
| Wrong engine (MergeTree instead of Summing/Aggregating) | Recreate with correct engine |
```sql
-- ORDER BY must match GROUP BY columns for proper aggregation
-- MV: GROUP BY (tenant_id, date, event_type)
-- Target: ORDER BY (tenant_id, date, event_type) -- Must match!
```
This skill provides 10+ proven patterns for ClickHouse Materialized Views (MVs) to build real-time aggregation, transformation, and query-optimization pipelines. It highlights which MergeTree engines to use—SummingMergeTree, AggregatingMergeTree, ReplacingMergeTree—and warns about common pitfalls like missing backfills and wrong ORDER BY choices. Practical examples and troubleshooting guidance help you design safe, performant MVs.
Materialized Views run the MV SELECT on each INSERT batch into the source table and write results into a target table; they do not read historical data. Use -State functions in MVs to store intermediate aggregate states and -Merge functions at query time to complete aggregation for AggregatingMergeTree. Choose SummingMergeTree for additive counters and AggregatingMergeTree for non-additive aggregates like uniq or quantiles.
Do MVs process historical data automatically?
No. Materialized Views only run on new INSERT batches. Backfill historical data by inserting aggregated results into the target table manually.
When should I use AggregatingMergeTree vs SummingMergeTree?
Use SummingMergeTree for pure additive metrics (sums, counters). Use AggregatingMergeTree when you need non-additive aggregates like uniq(), quantile(), or when accurate merging of state is required.