---
name: estimation-techniques
description: Back-of-envelope calculations for system design. Use when estimating QPS, storage, bandwidth, or latency for capacity planning. Includes latency numbers every programmer should know and common estimation patterns.
allowed-tools: Read, Glob, Grep
---
# Estimation Techniques
This skill provides frameworks for back-of-envelope calculations essential for system design and capacity planning.
## When to Use This Skill
**Keywords:** back-of-envelope, estimation, QPS, storage calculation, bandwidth, latency, capacity planning, scale estimation
**Use this skill when:**
- Estimating system capacity requirements
- Calculating storage needs for a feature
- Determining bandwidth requirements
- Sizing infrastructure for expected load
- Justifying architectural decisions with numbers
- Preparing for system design interviews
## Core Principle
**Estimation is not about precision; it's about order of magnitude.**
Getting within 10x is usually good enough for architectural decisions. The goal is to identify if you need:
- 1 server or 100 servers
- 1 GB or 1 TB of storage
- 10 ms or 1 second latency
## Essential Numbers to Know
### Powers of 2
| Power | Value | Approximate |
| ----- | ----- | ----------- |
| 2^10 | 1,024 | ~1 Thousand (KB) |
| 2^20 | 1,048,576 | ~1 Million (MB) |
| 2^30 | 1,073,741,824 | ~1 Billion (GB) |
| 2^40 | 1,099,511,627,776 | ~1 Trillion (TB) |
### Time Conversions
| Unit | Seconds | Useful For |
| ---- | ------- | ---------- |
| 1 minute | 60 | Short operations |
| 1 hour | 3,600 | Batch jobs |
| 1 day | 86,400 (~100K) | Daily aggregations |
| 1 month | 2,592,000 (~2.5M) | Monthly calculations |
| 1 year | 31,536,000 (~30M) | Annual projections |
### Availability Targets
| Availability | Downtime/Year | Downtime/Month | Downtime/Day |
| ------------ | ------------- | -------------- | ------------ |
| 99% (two 9s) | 3.65 days | 7.31 hours | 14.4 min |
| 99.9% (three 9s) | 8.76 hours | 43.8 min | 1.44 min |
| 99.99% (four 9s) | 52.6 min | 4.38 min | 8.64 sec |
| 99.999% (five 9s) | 5.26 min | 26.3 sec | 864 ms |
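Every cell in the table follows from `period * (1 - availability)`. A minimal Python sketch, assuming a 365-day year; the function name `downtime_seconds` is illustrative and not part of this skill:

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000, as in the Time Conversions table


def downtime_seconds(availability_pct: float, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Allowed downtime, in seconds, over a period at the given availability."""
    return period_seconds * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.99, 99.999):
    per_year_hours = downtime_seconds(pct) / 3600
    per_day_seconds = downtime_seconds(pct, SECONDS_PER_DAY)
    print(f"{pct}%: {per_year_hours:.2f} h/year, {per_day_seconds:.3f} s/day")
```

Passing a different `period_seconds` (for example a 30-day month) gives the budget for that window.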
### Latency Numbers Every Programmer Should Know
**See full reference:** `references/latency-numbers.md`
Quick reference:
| Operation | Latency | Relative |
| --------- | ------- | -------- |
| L1 cache reference | 0.5 ns | 1x |
| L2 cache reference | 7 ns | 14x |
| Main memory reference | 100 ns | 200x |
| SSD random read | 16 us | 32,000x |
| Round trip same datacenter | 0.5 ms | 1,000,000x |
| HDD seek | 2 ms | 4,000,000x |
| Round trip CA to Netherlands | 150 ms | 300,000,000x |
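One way to put these numbers to work is to ask how many sequential operations fit in a latency budget. The sketch below copies the quick-reference values into a dictionary; the key names and helper functions are hypothetical, chosen only for illustration:

```python
# Latencies in nanoseconds, copied from the quick-reference table above.
LATENCY_NS = {
    "l1_cache": 0.5,
    "l2_cache": 7,
    "main_memory": 100,
    "ssd_random_read": 16_000,
    "datacenter_round_trip": 500_000,
    "hdd_seek": 2_000_000,
    "ca_to_netherlands_round_trip": 150_000_000,
}


def relative_to_l1(operation: str) -> float:
    """How many L1 cache references take as long as one `operation`."""
    return LATENCY_NS[operation] / LATENCY_NS["l1_cache"]


def sequential_ops_in_budget(operation: str, budget_ms: float) -> int:
    """Number of back-to-back operations that fit in a latency budget."""
    budget_ns = budget_ms * 1_000_000
    return int(budget_ns // LATENCY_NS[operation])


# Example: a 100 ms budget leaves room for a limited number of serial network hops.
print(sequential_ops_in_budget("datacenter_round_trip", 100))
print(sequential_ops_in_budget("ca_to_netherlands_round_trip", 100))
```

A result of zero for the cross-continent case is the point: one such round trip already exceeds a 100 ms budget.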
## Estimation Patterns
### Pattern 1: QPS (Queries Per Second)
#### Formula
```text
QPS = (Number of Users) x (Actions per User per Day) / (Seconds per Day)
```
#### Example: Twitter-like service
```text
Given:
- 300 million monthly active users
- 50% are daily active = 150M DAU
- Average user reads 20 tweets/day
QPS = 150M * 20 / 86,400
≈ 3 billion / 100,000   (rounding 86,400 up to 10^5)
≈ 30,000 QPS
Peak load (typically 2-3x average):
Peak QPS = 30,000 * 3 = 90,000 QPS
```
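The same calculation as a small Python sketch. The function names are illustrative, and the 3x peak factor is the example's assumption rather than a measured value. Without rounding 86,400 to 10^5 the average comes out near 35,000 QPS, which is the same order of magnitude as the hand estimate.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: int, actions_per_user_per_day: float) -> float:
    """Average queries per second, assuming load is spread evenly over the day."""
    return daily_active_users * actions_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 3.0) -> float:
    """Scale the average by an assumed peak-to-average ratio (typically 2-3x)."""
    return avg_qps * peak_factor


avg = average_qps(150_000_000, 20)
print(f"average ~{avg:,.0f} QPS, peak ~{peak_qps(avg):,.0f} QPS")
```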
### Pattern 2: Storage Estimation
#### Formula
```text
Storage = (Items per Day) x (Size per Item) x (Retention Period in Days) x (Replication Factor)
```
#### Example: Photo storage service
```text
Given:
- 100 million users
- 10% upload daily = 10M uploads/day
- Average photo size = 2 MB
- Keep 5 years of data
- Replication factor = 3
Daily storage = 10M * 2 MB = 20 TB
Yearly storage = 20 TB * 365 = 7.3 PB
5-year storage = 7.3 * 5 = 36.5 PB
With replication = 36.5 * 3 = ~110 PB
```
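A sketch of the storage calculation in Python, assuming decimal units (1 MB = 10^6 bytes) and a flat upload rate over the retention period. The function and constant names are illustrative:

```python
MB = 10**6
PB = 10**15


def storage_bytes(items_per_day: int, bytes_per_item: int,
                  retention_days: int, replication_factor: int = 1) -> int:
    """Total bytes stored after `retention_days` at a constant ingest rate."""
    return items_per_day * bytes_per_item * retention_days * replication_factor


raw = storage_bytes(10_000_000, 2 * MB, 5 * 365)
replicated = storage_bytes(10_000_000, 2 * MB, 5 * 365, replication_factor=3)
print(f"raw: {raw / PB:.1f} PB, replicated: {replicated / PB:.1f} PB")
```

If uploads grow over time, a constant rate understates the total; treat the result as a lower bound in that case.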
### Pattern 3: Bandwidth Estimation
#### Formula
```text
Bandwidth = (QPS) x (Request Size or Response Size)
```
#### Example: Video streaming service
```text
Given:
- 1 million concurrent viewers
- Average bitrate = 5 Mbps
- Peak hours: 8 PM - 11 PM
Bandwidth = 1M * 5 Mbps = 5 Tbps
With 20% overhead: ~6 Tbps
CDN egress cost (rough, for one 3-hour peak window):
6 Tbps / 8 bits/byte = 750 GB/s
750 GB/s * 3 hours * 3600 sec/hour = ~8.1 PB transferred
8.1 PB * $0.02/GB = ~$160,000 per evening
(one reason Netflix built its own CDN)
```
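The bandwidth and egress-cost arithmetic can be sketched as follows. The $0.02/GB price is the example's rough figure, not a quoted rate from any provider, and the helper names are hypothetical:

```python
BITS_PER_BYTE = 8
GB = 10**9  # decimal gigabyte; an assumption, billing units vary by provider


def total_bandwidth_bps(concurrent_viewers: int, bitrate_bps: float,
                        overhead: float = 0.20) -> float:
    """Aggregate bandwidth in bits per second, including protocol overhead."""
    return concurrent_viewers * bitrate_bps * (1 + overhead)


def egress_cost_usd(bandwidth_bps: float, duration_seconds: float,
                    usd_per_gb: float) -> float:
    """Cost of sustaining `bandwidth_bps` for `duration_seconds`."""
    gigabytes = bandwidth_bps / BITS_PER_BYTE * duration_seconds / GB
    return gigabytes * usd_per_gb


peak_bps = total_bandwidth_bps(1_000_000, 5_000_000)
cost = egress_cost_usd(peak_bps, 3 * 3600, 0.02)
print(f"{peak_bps / 1e12:.1f} Tbps, ~${cost:,.0f} per 3-hour peak window")
```

With the example's inputs this works out to roughly $160,000 for a single 3-hour peak window, before counting off-peak traffic.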
### Pattern 4: Cache Size Estimation
#### Formula
```text
Cache Size = (QPS) x (Cache TTL) x (Response Size) x (Unique Ratio)
```
#### Example: API response cache
```text
Given:
- 10,000 QPS
- Cache TTL = 5 minutes = 300 seconds
- Average response = 10 KB
- 20% of requests are unique
Cache entries = 10,000 * 300 * 0.20 = 600,000 entries
Cache size = 600,000 * 10 KB = 6 GB
With overhead (keys, metadata): ~10 GB
```
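A Python sketch of the cache formula, assuming every unique response stays cached for the full TTL and using decimal units (10 KB = 10,000 bytes). The 1.6x overhead multiplier is a hypothetical value chosen to land near the example's ~10 GB figure:

```python
def cache_entries(qps: float, ttl_seconds: float, unique_ratio: float) -> float:
    """Distinct entries resident in the cache within one TTL window."""
    return qps * ttl_seconds * unique_ratio


def cache_size_bytes(qps: float, ttl_seconds: float, response_bytes: int,
                     unique_ratio: float, overhead_factor: float = 1.0) -> float:
    """Payload bytes, optionally scaled for keys and metadata."""
    return cache_entries(qps, ttl_seconds, unique_ratio) * response_bytes * overhead_factor


payload = cache_size_bytes(10_000, 300, 10_000, 0.20)
with_overhead = cache_size_bytes(10_000, 300, 10_000, 0.20, overhead_factor=1.6)
print(f"payload: {payload / 1e9:.1f} GB, with overhead: {with_overhead / 1e9:.1f} GB")
```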
### Pattern 5: Database Sizing
#### Formula
```text
DB Size = (Number of Rows) x (Row Size) x (1 + Index Overhead) x (Number of Copies)
```
#### Example: User profile database
```text
Given:
- 500 million users
- Average profile = 1 KB (name, email, settings, etc.)
- Index overhead = 30%
- Primary + 2 replicas = 3x
Data size = 500M * 1 KB = 500 GB
With indexes = 500 GB * 1.3 = 650 GB
With replication = 650 GB * 3 = ~2 TB
Memory for hot data (20% of ~2 TB): ~400 GB across all copies (~130 GB per node)
```
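A sketch of the sizing steps in Python. Index overhead is expressed as a fraction (0.30 for 30%) and applied as a `1 + overhead` multiplier; the function names and the per-node hot-set helper are illustrative assumptions:

```python
def db_size_bytes(rows: int, row_bytes: int,
                  index_overhead: float = 0.30, copies: int = 3) -> float:
    """Total on-disk size across all copies, including index overhead."""
    return rows * row_bytes * (1 + index_overhead) * copies


def hot_set_bytes_per_node(rows: int, row_bytes: int,
                           index_overhead: float = 0.30,
                           hot_fraction: float = 0.20) -> float:
    """Memory one node needs to keep the hot fraction of data plus indexes resident."""
    return rows * row_bytes * (1 + index_overhead) * hot_fraction


total = db_size_bytes(500_000_000, 1_000)
hot = hot_set_bytes_per_node(500_000_000, 1_000)
print(f"total: {total / 1e12:.2f} TB, hot set per node: {hot / 1e9:.0f} GB")
```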
## Common Estimation Scenarios
### Scenario 1: URL Shortener
```text
Requirements:
- 100M new URLs/month
- 10:1 read:write ratio
Writes:
- 100M / (30 * 24 * 3600) = ~40 writes/second
- Peak: ~100 writes/second
Reads:
- 40 * 10 = 400 reads/second
- Peak: ~1000 reads/second
Storage (5 years):
- 100M URLs/month * 60 months = 6 billion URLs
- Average record = 100 bytes (short URL) + 500 bytes (long URL) = 600 bytes
- 6B * 600 bytes = 3.6 TB
- With indexes and overhead: ~5 TB
```
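The URL shortener numbers can be reproduced with a short script. It assumes a 30-day month and a flat creation rate; all names are illustrative:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000


def per_second(monthly_count: int) -> float:
    """Convert a monthly volume into an average per-second rate."""
    return monthly_count / SECONDS_PER_MONTH


new_urls_per_month = 100_000_000
read_write_ratio = 10
record_bytes = 100 + 500  # short URL + long URL

writes_per_sec = per_second(new_urls_per_month)
reads_per_sec = writes_per_sec * read_write_ratio
total_records = new_urls_per_month * 60  # 5 years
raw_storage_bytes = total_records * record_bytes

print(f"writes ~{writes_per_sec:.0f}/s, reads ~{reads_per_sec:.0f}/s")
print(f"records: {total_records:,}, raw storage: {raw_storage_bytes / 1e12:.1f} TB")
```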
### Scenario 2: Chat Application
```text
Requirements:
- 10M daily active users
- Average 50 messages sent/day
- Average 200 messages received/day
Message throughput:
- Sends: 10M * 50 / 86,400 = ~6,000 messages/second
- Peak: ~20,000 messages/second
Connections:
- Each user maintains 1-3 connections (phone, laptop, tablet)
- Peak concurrent: 10M * 0.1 (10% online) * 2 = 2M connections
Storage (1 year):
- 10M users * 50 msgs/day * 365 days = 182B messages/year
- Average message = 200 bytes
- 182B * 200 bytes = 36.4 TB/year
```
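A sketch of the chat-application estimates. The 10% online fraction and 2 connections per user are the scenario's assumptions; the function names are hypothetical:

```python
SECONDS_PER_DAY = 86_400


def send_rate_per_second(dau: int, messages_sent_per_day: int) -> float:
    """Average messages sent per second across all daily active users."""
    return dau * messages_sent_per_day / SECONDS_PER_DAY


def concurrent_connections(dau: int, online_fraction: float,
                           connections_per_user: float) -> float:
    """Open connections at peak, given the share of users online at once."""
    return dau * online_fraction * connections_per_user


def yearly_message_bytes(dau: int, messages_sent_per_day: int, bytes_per_message: int) -> int:
    """Raw message storage added per year, before replication."""
    return dau * messages_sent_per_day * 365 * bytes_per_message


print(f"~{send_rate_per_second(10_000_000, 50):,.0f} messages/second")
print(f"~{concurrent_connections(10_000_000, 0.10, 2):,.0f} connections")
print(f"{yearly_message_bytes(10_000_000, 50, 200) / 1e12:.1f} TB/year")
```

Storage here counts each message once at the sender; if every recipient gets a separate copy, the received-message count is the better multiplier.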
### Scenario 3: Video Streaming
```text
Requirements:
- 100M monthly active users
- 30% watch daily = 30M DAU
- Average 1 hour/day viewing
Concurrent viewers (peak):
- 30M DAU * 1 hour / 24 hours * 3 (peak factor) = ~4M concurrent
Bandwidth:
- Average stream: 5 Mbps
- 4M * 5 Mbps = 20 Tbps peak bandwidth
Storage (library of 10K titles):
- Average video = 2 hours
- Multiple qualities: 480p (1GB), 720p (3GB), 1080p (5GB), 4K (20GB)
- Per title (all qualities combined): ~30 GB
- Library: 10K * 30 GB = 300 TB
```
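The concurrency estimate depends on watch time: average concurrency is total viewing hours per day spread over 24 hours, then scaled by a peak factor. A minimal sketch with illustrative names, using the scenario's 3x peak factor:

```python
def peak_concurrent_viewers(dau: int, hours_watched_per_day: float,
                            peak_factor: float = 3.0) -> float:
    """Average concurrency (viewing hours spread over 24 h) times a peak factor."""
    average_concurrent = dau * hours_watched_per_day / 24
    return average_concurrent * peak_factor


def library_bytes(titles: int, bytes_per_quality: list[int]) -> int:
    """Storage for a catalog where each title is stored once per quality level."""
    return titles * sum(bytes_per_quality)


GB = 10**9
viewers = peak_concurrent_viewers(30_000_000, 1)
library = library_bytes(10_000, [1 * GB, 3 * GB, 5 * GB, 20 * GB])
print(f"~{viewers:,.0f} concurrent viewers, {viewers * 5e6 / 1e12:.1f} Tbps")
print(f"library: {library / 1e12:.0f} TB")
```

Doubling the assumed watch time doubles the concurrency, which is why the viewing-hours assumption is worth stating explicitly.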
## Estimation Tips
### Round Aggressively
```text
Instead of:        Use:
86,400 seconds     ~100,000 (10^5)
2.5 million        ~3 million
7.3 petabytes      ~10 petabytes
```
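For the coarsest form of rounding, a value can be snapped to its nearest power of ten on a log scale. This is a sketch with a hypothetical helper name; it rounds more aggressively than the hand-rounded table above (2.5 million becomes 10^6 rather than ~3 million):

```python
import math


def nearest_power_of_ten(value: float) -> float:
    """Snap a positive value to the nearest power of ten (on a log scale)."""
    if value <= 0:
        raise ValueError("value must be positive")
    return 10 ** round(math.log10(value))


for v in (86_400, 2_500_000, 7.3):
    print(v, "->", nearest_power_of_ten(v))
```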
### Use Orders of Magnitude
Think in powers of 10:
- Thousands (10^3)
- Millions (10^6)
- Billions (10^9)
- Trillions (10^12)
### State Your Assumptions
Always verbalize:
- "I'm assuming 10% of users are active at peak"
- "I'm estimating average message size at 200 bytes"
- "I'm using a 3x replication factor"
### Sanity Check Results
After calculating, ask:
- "Does this make sense?"
- "Is this in the right order of magnitude?"
- "What would change if my assumption is off by 10x?"
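The last question can be answered mechanically by re-running an estimate with each input scaled down and up by 10x. A minimal sketch; `sensitivity` and the sample `qps` estimator are hypothetical helpers, not part of this skill:

```python
from typing import Callable


def sensitivity(estimate: Callable[..., float], inputs: dict[str, float],
                factor: float = 10) -> dict[str, tuple[float, float, float]]:
    """For each input, return (low, base, high) with that input off by `factor`."""
    base = estimate(**inputs)
    results = {}
    for name, value in inputs.items():
        low = estimate(**{**inputs, name: value / factor})
        high = estimate(**{**inputs, name: value * factor})
        results[name] = (low, base, high)
    return results


def qps(users: float, actions_per_day: float) -> float:
    return users * actions_per_day / 86_400


for name, (low, base, high) in sensitivity(qps, {"users": 150e6, "actions_per_day": 20}).items():
    print(f"{name}: {low:,.0f} .. {base:,.0f} .. {high:,.0f} QPS")
```

If the low and high ends would lead to the same architecture, the assumption does not need refining.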
## Common Mistakes
### Mistake 1: Ignoring Peak vs Average
#### Problem
Sizing for average load.
```text
Average QPS: 10,000
Peak QPS: 30,000 (often 2-3x average)
If you size for 10,000, you'll fail at peak.
```
### Mistake 2: Forgetting Replication
#### Problem
Calculating raw storage without copies.
```text
Data: 1 TB
With 3 replicas: 3 TB
With backups: 4-5 TB
```
### Mistake 3: Not Accounting for Growth
#### Problem
Sizing for current, not future.
```text
Current users: 10M
Expected growth: 50%/year
Year 3: 10M * 1.5^3 = 34M users
Size for at least 2x current to avoid near-term issues.
```
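The growth projection is plain compound growth. A short sketch, assuming a constant annual rate (the function name is illustrative):

```python
def projected_users(current_users: float, annual_growth_rate: float, years: int) -> float:
    """Compound growth: current * (1 + rate) ** years."""
    return current_users * (1 + annual_growth_rate) ** years


for year in range(4):
    print(f"year {year}: {projected_users(10_000_000, 0.50, year) / 1e6:.1f}M users")
```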
### Mistake 4: Over-Precision
#### Problem
Calculating to 3 decimal places.
```text
Bad: "We need exactly 3,456,789 IOPS"
Good: "We need roughly 3-4 million IOPS"
```
## Quick Reference Calculations
| Need | Formula |
| ---- | ------- |
| QPS | users * actions/day / 86400 |
| Storage/day | items/day * size/item |
| Bandwidth | QPS * response_size |
| Cache hit rate | 1 - (DB_QPS / total_QPS) |
| Servers needed | QPS / QPS_per_server |
| Shards needed | data_size / max_shard_size |
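The server and shard formulas produce fractional results, so round up in practice. A sketch of three of the rows above; the per-server capacity, shard size, and headroom values in the example calls are hypothetical:

```python
import math


def servers_needed(peak_qps: float, qps_per_server: float, headroom: float = 0.0) -> int:
    """Whole servers required at peak, with optional fractional headroom."""
    return math.ceil(peak_qps * (1 + headroom) / qps_per_server)


def shards_needed(data_bytes: float, max_shard_bytes: float) -> int:
    """Whole shards required to hold the data set."""
    return math.ceil(data_bytes / max_shard_bytes)


def cache_hit_rate(db_qps: float, total_qps: float) -> float:
    """Fraction of requests served without reaching the database."""
    return 1 - db_qps / total_qps


print(servers_needed(90_000, 2_000))           # assumes 2,000 QPS per server
print(shards_needed(650 * 10**9, 100 * 10**9))  # assumes 100 GB per shard
print(f"{cache_hit_rate(2_000, 10_000):.0%}")
```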
## Related Skills
- `design-interview-methodology` - Overall interview framework
- `quality-attributes-taxonomy` - NFR definitions (scalability, performance)
- `database-scaling` - Database capacity planning (Phase 3)
- `caching-strategies` - Cache sizing and hit rates (Phase 3)
## Related Commands
- `/sd:estimate <scenario>` - Calculate capacity interactively
## Related Agents
- `capacity-planner` - Guided estimation with calculations
## References
- `references/latency-numbers.md` - Complete latency reference table
---
## Version History
- **v1.0.0** (2025-12-26): Initial release
---
## Last Updated
**Date:** 2025-12-26
**Model:** claude-opus-4-5-20251101