home / skills / altinity / skills / altinity-expert-clickhouse-reporting

This skill diagnoses ClickHouse query performance, identifies slow patterns, and offers optimization insights to reduce latency and timeouts.

npx playbooks add skill altinity/skills --skill altinity-expert-clickhouse-reporting

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
2.7 KB
---
name: altinity-expert-clickhouse-reporting
description: Diagnose ClickHouse SELECT query performance, analyze query patterns, identify slow queries, and find optimization opportunities. Use for query latency and timeout issues.
---

# Query Performance Analysis

Diagnose SELECT query performance issues, analyze query patterns, and identify optimization opportunities.

---

## Diagnostics

Run all queries from the file checks.sql and analyze the results.

---

## Query Optimization Hints

### Index Usage Check

```sql
-- Check if data skipping indices exist
select
    database,
    table,
    name as index_name,
    type,
    expr,
    granularity
from system.data_skipping_indices
where database = '{database}' and table = '{table}'
```

### Mark Count for Query

For a specific slow query, check how many marks (granules) were read:

```sql
select
    query_id,
    read_rows,
    selected_marks,
    selected_parts,
    formatReadableSize(read_bytes) as read_bytes,
    round(read_rows / nullIf(selected_marks, 0)) as rows_per_mark
from system.query_log
where query_id = '{query_id}'
  and type = 'QueryFinish'
```

**High `selected_marks`** relative to result = index not selective enough.

---

## Ad-Hoc Query Guidelines

### Required Safeguards
```sql
-- Always time-bound
where event_date >= today() - 1
-- or
where event_time > now() - interval 1 hour

-- Always limit
limit 100

-- Filter by type
where type = 'QueryFinish'  -- completed
where type like 'Exception%'  -- failed
```

### Useful Filters
```sql
-- By user
where user = 'analytics_user'

-- By query pattern
where query ilike '%SELECT%FROM my_table%'

-- By duration threshold
where query_duration_ms > 10000  -- > 10 seconds

-- By normalized hash (for specific query pattern)
where normalized_query_hash = 1234567890
```

---

## Cross-Module Triggers

| Finding | Load Module | Reason |
|---------|-------------|--------|
| High memory queries | `altinity-expert-clickhouse-memory` | Memory limits/optimization |
| Reading too many parts | `altinity-expert-clickhouse-merges` | Part consolidation |
| Poor index selectivity | `altinity-expert-clickhouse-schema` | Index/ORDER BY design |
| Cache misses | `altinity-expert-clickhouse-caches` | Cache sizing |
| MV slow | `altinity-expert-clickhouse-ingestion` | MV optimization |

---

## Settings Reference

| Setting | Scope | Notes |
|---------|-------|-------|
| `max_execution_time` | Query | Query timeout |
| `max_rows_to_read` | Query | Limit rows scanned |
| `max_bytes_to_read` | Query | Limit bytes scanned |
| `max_threads` | Query | Parallelism |
| `use_query_cache` | Query | Enable query result caching |
| `log_queries` | Server | Enable query logging |
| `log_queries_min_query_duration_ms` | Server | Log threshold |

Overview

This skill diagnoses ClickHouse SELECT query performance, surfaces slow or timing-out queries, and highlights optimization opportunities. It helps engineers analyze query patterns, identify expensive reads and index issues, and prioritize fixes to reduce latency and avoid timeouts. The goal is actionable insights for faster, more efficient queries.

How this skill works

The skill runs a suite of diagnostic checks and analyzes system tables (query_log, data_skipping_indices, etc.) to collect metrics like read_rows, selected_marks, selected_parts, and read_bytes. It detects high mark or part counts, long durations, and problematic query patterns, then maps findings to likely root causes and related optimization areas. It also suggests settings and safeguards to avoid heavy ad-hoc scans.

When to use it

  • Investigating SELECT queries with high latency or timeouts
  • After deployment of new queries or schema changes
  • When query logs show frequent long-running or failed queries
  • During capacity planning or performance regressions
  • Before optimizing indexes, materialized views, or merges

Best practices

  • Always run ad-hoc analysis with time bounds (e.g., event_date >= today() - 1) to avoid full-table scans
  • Apply LIMIT and type filters when exploring query_log to reduce noise
  • Check selected_marks and rows_per_mark to assess index selectivity
  • Filter by normalized_query_hash or query pattern to focus on a single query shape
  • Cross-reference findings with memory, merges, and cache diagnostics to find root causes

Example use cases

  • Find recent SELECT queries exceeding 10s and list their query_ids and read_bytes
  • Identify queries that read many marks indicating poor data-skipping index selectivity
  • Verify whether data_skipping_indices exist for a troublesome table and inspect index expressions and granularity
  • Measure rows_per_mark for a slow query to estimate index effectiveness
  • Detect patterns of repeated heavy reads that suggest need for PART consolidation or materialized view

FAQ

How do I tell if an index is ineffective?

Compare selected_marks to read_rows; a high selected_marks count relative to result rows indicates low selectivity and an index that isn’t filtering effectively.

What quick safeguards should I use for ad-hoc queries?

Always add a time-bound filter (e.g., last hour or day), use LIMIT, and restrict to completed or exception entries (type = 'QueryFinish' or like 'Exception%') to avoid long-running scans.