
This skill analyzes ClickHouse disk usage and compression, identifies bottlenecks, and recommends optimizations for storage efficiency.

```
npx playbooks add skill altinity/skills --skill altinity-expert-clickhouse-storage
```

Review the files below or copy the command above to add this skill to your agents.

Files (2)

**SKILL.md** (1.5 KB)
---
name: altinity-expert-clickhouse-storage
description: Diagnose ClickHouse disk usage, compression efficiency, part sizes, and storage bottlenecks. Use for disk space issues and slow IO.
---

# Storage and Disk Usage Analysis

Diagnose disk usage, compression efficiency, part sizes, and storage bottlenecks.

---

## Diagnostics

Run all queries from `checks.sql` and analyze the results.

---

## Ad-Hoc Query Guidelines

### Required Safeguards
```sql
-- Always cap result sets
limit 100

-- When querying system.part_log, restrict to recent events
where event_date >= today() - 1
```

### Key Tables
- `system.disks` - Disk configuration
- `system.parts` - Part storage details
- `system.columns` - Column compression
- `system.storage_policies` - Tiered storage
- `system.detached_parts` - Orphaned parts
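
As an illustration of the kind of ad-hoc query these tables support (a sketch, not necessarily a query from `checks.sql`), the following ranks tables by on-disk size using `system.parts`:

```sql
-- Sketch: top tables by compressed on-disk size (active parts only)
select
    database,
    table,
    formatReadableSize(sum(bytes_on_disk)) as on_disk,
    sum(rows) as total_rows,
    count() as part_count
from system.parts
where active
group by database, table
order by sum(bytes_on_disk) desc
limit 100
```

Note that it follows both safeguards above: a `limit`, and no scan of historical log tables.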

---

## Cross-Module Triggers

| Finding | Load Module | Reason |
|---------|-------------|--------|
| Poor compression | `altinity-expert-clickhouse-schema` | Codec recommendations |
| Many small parts | `altinity-expert-clickhouse-merges` | Merge backlog |
| High write IO | `altinity-expert-clickhouse-ingestion` | Batch sizing |
| System logs large | `altinity-expert-clickhouse-logs` | TTL configuration |
| Slow disk + merges | `altinity-expert-clickhouse-merges` | Merge optimization |

---

## Settings Reference

| Setting | Notes |
|---------|-------|
| `min_bytes_for_wide_part` | Threshold for Wide vs Compact parts |
| `min_rows_for_wide_part` | Row threshold for Wide parts |
| `max_bytes_to_merge_at_max_space_in_pool` | Max merge size |
| `prefer_not_to_merge` | Disable merges (emergency) |
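
To see the current values of these settings on a server, a quick check against `system.merge_tree_settings` can look like the sketch below (`prefer_not_to_merge` is a volume-level setting and shows up in `system.storage_policies` instead):

```sql
-- Sketch: inspect MergeTree settings relevant to part sizing and merges
select name, value, changed
from system.merge_tree_settings
where name in (
    'min_bytes_for_wide_part',
    'min_rows_for_wide_part',
    'max_bytes_to_merge_at_max_space_in_pool'
)
limit 100
```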

Overview

This skill diagnoses ClickHouse disk usage and storage health to find space pressure, inefficient compression, and I/O bottlenecks. It examines part sizes, compression ratios, detached or orphaned parts, and storage policy issues so that fixes can be prioritized. Use it when disk space is tight or queries suffer from slow disk performance.

How this skill works

It runs a curated set of diagnostic queries (from `checks.sql`) against system tables such as `system.disks`, `system.parts`, `system.columns`, `system.storage_policies`, and `system.detached_parts`. The skill analyzes compression efficiency, part-size distribution, merge-backlog indicators, and disk utilization to identify root causes and recommend actions. It also applies safe ad-hoc query rules (result limits and recent-date filters) to avoid placing heavy load on production clusters.
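
For example, a compression-efficiency check of the kind described above might look like this sketch over `system.columns` (not necessarily the exact query in `checks.sql`):

```sql
-- Sketch: columns with the worst compression ratio (uncompressed / compressed)
select
    database,
    table,
    name as column_name,
    formatReadableSize(sum(data_compressed_bytes)) as compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) as uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) as ratio
from system.columns
where data_compressed_bytes > 0
group by database, table, name
order by ratio desc
limit 100
```

A ratio close to 1 on a large column is a candidate for a codec change, which is where the schema module takes over.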

When to use it

  • Disk space unexpectedly low or growing fast
  • Queries are I/O-bound or show slow local reads
  • Large number of small parts or high merge backlog
  • Suspicion of poor compression or inefficient codecs
  • Orphaned detached parts consuming space

Best practices

  • Always run with query limits and recent-date filters to avoid scanning entire history
  • Compare compression ratio at the column level before changing codecs
  • Inspect min_bytes_for_wide_part and min_rows_for_wide_part before changing part thresholds
  • Check storage_policies for tiering and correct disk assignments
  • Coordinate merge or prefer_not_to_merge changes with maintenance windows
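
Checking `storage_policies` for tiering and disk assignments, as suggested above, can be as simple as this sketch:

```sql
-- Sketch: volumes, disks, and move thresholds per storage policy
select
    policy_name,
    volume_name,
    disks,
    formatReadableSize(max_data_part_size) as max_part_size,
    move_factor,
    prefer_not_to_merge
from system.storage_policies
limit 100
```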

Example use cases

  • Find which disks are near capacity and which tables contribute most to usage
  • Identify columns with poor compression to target codec changes
  • Detect many small parts and link to merge backlog recommendations
  • Locate detached parts and provide cleanup options
  • Assess whether slow I/O correlates with active merges or disk contention
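
For the first use case, a capacity check over `system.disks` might look like this sketch:

```sql
-- Sketch: free vs total space per configured disk
select
    name,
    path,
    formatReadableSize(free_space) as free,
    formatReadableSize(total_space) as total,
    round(100 * (1 - free_space / total_space), 1) as used_pct
from system.disks
limit 100
```

Combining this with the per-table breakdown from `system.parts` shows which tables contribute most to the fullest disks.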

FAQ

Will diagnostics run heavy or full-table scans?

No. Queries enforce result limits and recent-date filters to avoid full scans; apply the same ad-hoc query safeguards to any follow-up queries you run on production clusters.

Does this skill change configuration automatically?

No. It provides diagnostics and concrete recommendations. Configuration changes (codec, merge settings, storage policies) should be applied manually or via an ops plan after review.