
This skill helps diagnose ClickHouse ingestion bottlenecks and optimize INSERT performance by analyzing batch sizing, materialized view (MV) overhead, and Kafka consumer issues.

npx playbooks add skill altinity/skills --skill altinity-expert-clickhouse-ingestion

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
2.1 KB
---
name: altinity-expert-clickhouse-ingestion
description: Diagnose ClickHouse INSERT performance, batch sizing, part creation patterns, and ingestion bottlenecks. Use for slow inserts and data pipeline issues.
---

## Diagnostics

Run all queries from the file checks.sql and analyze the results.

## Problem-Specific Investigation

### Insert with MV Overhead - Correlate by Query ID

When inserts feed materialized views, MV processing runs synchronously as part of the INSERT, so a slow MV delays the insert itself. To correlate a slow insert with its MV breakdown:

```sql
-- Correlate slow insert with MV breakdown (requires query_id)
select
    view_name,
    view_duration_ms,
    read_rows,
    written_rows,
    status
from system.query_views_log
where query_id = '{query_id}'
  and event_date = today()  -- widen if the insert ran earlier
order by view_duration_ms desc
```
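To obtain the query_id in the first place, system.query_log can be scanned for recent slow inserts. A sketch, assuming query_log is enabled; the 1000 ms threshold is illustrative:

```sql
-- Find candidate slow INSERT query_ids for MV correlation
select
    query_id,
    event_time,
    query_duration_ms,
    written_rows,
    written_bytes
from system.query_log
where type = 'QueryFinish'
  and query_kind = 'Insert'
  and event_date = today()
  and query_duration_ms > 1000
order by query_duration_ms desc
limit 20
```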

### Kafka Consumer Exception Drill-Down (Targeted)

Use this only for problematic Kafka tables to avoid noisy output.

```sql
-- Filter to a specific Kafka table when lag is observed
select
    hostName() as host,
    database,
    table,
    consumer_id,
    is_currently_used,
    dateDiff('second', last_poll_time, now()) as last_poll_age_s,
    dateDiff('second', last_commit_time, now()) as last_commit_age_s,
    num_messages_read,
    num_commits,
    length(assignments.topic) as assigned_partitions,
    length(exceptions.text) as exception_count,
    exceptions.text[-1] as last_exception
from clusterAllReplicas('{cluster}', system.kafka_consumers)
where database = '{db}'
  and table = '{kafka_table}'
order by is_currently_used desc, last_poll_age_s desc
limit 50
```

## Ad-Hoc Query Guidelines

### Required Safeguards

```sql
-- Always limit results
limit 100

-- Always time-bound
where event_date = today()
-- or
where event_time > now() - interval 1 hour

-- For query_log, filter by type
where type = 'QueryFinish'  -- completed
-- or
where type like 'Exception%'  -- failed
```

### Useful Filters

```sql
-- Filter by table
where has(tables, 'database.table_name')

-- Filter by user
where user = 'producer_app'

-- Filter by insert size
where written_rows > 1000000  -- large inserts
where written_rows < 100      -- micro-batches
```
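Putting the safeguards and filters together, a typical ad-hoc query looks like the sketch below; 'db.target_table' and the 5000 ms threshold are placeholders to adjust:

```sql
-- Recent failed or slow inserts into one table, with safeguards applied
select
    event_time,
    type,
    query_duration_ms,
    written_rows,
    exception
from system.query_log
where event_date = today()
  and event_time > now() - interval 1 hour
  and query_kind = 'Insert'
  and has(tables, 'db.target_table')
  and (type like 'Exception%' or query_duration_ms > 5000)
order by event_time desc
limit 100
```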

Overview

This skill diagnoses ClickHouse ingestion and INSERT performance issues, focusing on batch sizing, part creation patterns, and pipeline bottlenecks. It helps pinpoint slow inserts, materialized view overhead, Kafka consumer problems, and problematic commit/poll behavior. Use it to convert mysterious slow-downs into actionable diagnostics and tuning steps.

How this skill works

The skill runs a curated set of SQL checks to collect metrics from system tables and logs, including query_views_log, kafka_consumers, and query_log. It correlates slow INSERTs with downstream work such as materialized view execution and inspects Kafka consumer health and lag for affected tables. It enforces safe query patterns (limits and time bounds) to avoid noisy or costly scans and highlights suspicious patterns like oversized batches or many small parts.

When to use it

  • When INSERT statements are unexpectedly slow or show high latency spikes.
  • When materialized views might be causing upstream insert delays.
  • When Kafka-backed tables show lag, exceptions, or stalled consumers.
  • When parts proliferation or poor batch sizing is suspected.
  • During incident triage or performance audits of data pipelines.

Best practices

  • Always run diagnostics with result limits and recent time bounds to avoid heavy cluster load.
  • Correlate by query_id to map a slow INSERT to its MV subqueries and durations.
  • Filter Kafka diagnostics to the specific table to reduce noise and focus on relevant consumers.
  • Classify inserts by written_rows to find micro-batches or oversized batches and tune producer behavior.
  • Use cluster-aware system tables (clusterAllReplicas) for multi-node environments to get full visibility.
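The batch-size classification above can be sketched as a single aggregation over query_log; the bucket thresholds are illustrative, not recommendations:

```sql
-- Bucket today's inserts by size to spot micro-batches vs oversized batches
select
    multiIf(written_rows < 100, 'micro (<100 rows)',
            written_rows < 100000, 'small (<100k)',
            written_rows < 1000000, 'medium (<1M)',
            'large (>=1M)') as batch_class,
    count() as inserts,
    sum(written_rows) as total_rows,
    round(avg(query_duration_ms)) as avg_ms
from system.query_log
where type = 'QueryFinish'
  and query_kind = 'Insert'
  and event_date = today()
group by batch_class
order by inserts desc
limit 100
```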

Example use cases

  • Identify a slow INSERT where a materialized view’s long processing time is the root cause.
  • Detect a Kafka consumer that stopped polling or shows repeated exceptions for a given topic/table.
  • Find patterns of many small inserts creating many parts, causing merge pressure and poor throughput.
  • Validate that large bulk loads are creating expected part sizes and not overwhelming merges.
  • Triage an intermittent pipeline regression by filtering query_log to completed and exception entries from the last hour.
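To check the small-insert/many-parts pattern directly, system.part_log records each part creation; a sketch, assuming part_log is enabled on the server:

```sql
-- Parts created per table in the last hour; many parts with few rows
-- each indicates micro-batching and merge pressure
select
    database,
    table,
    count() as new_parts,
    round(avg(rows)) as avg_rows_per_part,
    sum(rows) as total_rows
from system.part_log
where event_type = 'NewPart'
  and event_time > now() - interval 1 hour
group by database, table
order by new_parts desc
limit 20
```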

FAQ

How do I link a slow insert to materialized view overhead?

Capture the insert's query_id, then query system.query_views_log for that id and sort by view_duration_ms to see which views contribute most to latency.
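The two steps can also be collapsed into one join between the logs; a sketch, assuming both logs are enabled, with an illustrative 1000 ms threshold:

```sql
-- Slow inserts with their per-view breakdown in a single query
select
    ql.query_id,
    ql.query_duration_ms as insert_ms,
    qvl.view_name,
    qvl.view_duration_ms
from system.query_log as ql
join system.query_views_log as qvl on qvl.query_id = ql.query_id
where ql.type = 'QueryFinish'
  and ql.query_kind = 'Insert'
  and ql.event_date = today()
  and qvl.event_date = today()
  and ql.query_duration_ms > 1000
order by ql.query_duration_ms desc, qvl.view_duration_ms desc
limit 50
```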

Why restrict Kafka diagnostics to a single table?

Kafka consumer tables can produce high-volume output; filtering to the specific database.table keeps results actionable and reduces noise.