This skill helps you design efficient BigQuery tables and queries, optimize partitioning and clustering, and integrate BigQuery with Node.js for faster analytics.

```
npx playbooks add skill trantuananh-17/product-reviews --skill bigquery
```
---
name: bigquery-analytics
description: Use this skill when the user asks about "BigQuery", "analytics queries", "data warehouse", "partitioning", "clustering", "SQL queries", or any BigQuery-related work. Provides BigQuery table design, query optimization, and Node.js integration patterns.
---
# BigQuery Best Practices
## Table Design
### Partitioning (REQUIRED for tables > 1GB)
```sql
CREATE TABLE `project.dataset.events` (
  event_id STRING,
  shop_id STRING,
  event_type STRING,
  created_at TIMESTAMP,
  data JSON
)
PARTITION BY DATE(created_at)
CLUSTER BY shop_id, event_type;
```
| Data Size | Partition By |
|-----------|--------------|
| < 1GB | Not needed |
| 1GB - 1TB | DATE/TIMESTAMP |
| > 1TB | DATE + consider sharding |
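
If you manage tables from Node.js, the same layout can be applied at creation time through the `@google-cloud/bigquery` client's table metadata options. A minimal sketch; the dataset and table names are placeholders, and `requirePartitionFilter` is an optional guard that rejects queries missing a partition filter:

```javascript
import {BigQuery} from '@google-cloud/bigquery';

const bigquery = new BigQuery();

// Create the partitioned, clustered events table from Node.js.
// 'dataset' and 'events' are illustrative names; adjust for your project.
await bigquery.dataset('dataset').createTable('events', {
  schema: {
    fields: [
      {name: 'event_id', type: 'STRING'},
      {name: 'shop_id', type: 'STRING'},
      {name: 'event_type', type: 'STRING'},
      {name: 'created_at', type: 'TIMESTAMP'},
      {name: 'data', type: 'JSON'},
    ],
  },
  timePartitioning: {type: 'DAY', field: 'created_at'},
  clustering: {fields: ['shop_id', 'event_type']},
  requirePartitionFilter: true, // reject queries that omit the partition filter
});
```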
---
## Query Patterns
### Always Use Partition Filter
```sql
-- ❌ BAD: No partition filter (full scan)
SELECT * FROM events WHERE shop_id = 'shop_123';
-- ✅ GOOD: Partition filter included
SELECT * FROM events
WHERE created_at >= '2024-01-01'
AND created_at < '2024-02-01'
AND shop_id = 'shop_123';
```
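
The same filtered query can be run from Node.js with named parameters, which also covers the "parameterized queries" item in the checklist below. A sketch assuming the `@google-cloud/bigquery` client; the table path and values are placeholders:

```javascript
import {BigQuery} from '@google-cloud/bigquery';

const bigquery = new BigQuery();

// Partition filter plus named parameters; JS Date values map to TIMESTAMP params.
const [rows] = await bigquery.query({
  query: `
    SELECT event_id, event_type, created_at
    FROM \`project.dataset.events\`
    WHERE created_at >= @start
      AND created_at < @end
      AND shop_id = @shopId`,
  params: {
    start: new Date('2024-01-01'),
    end: new Date('2024-02-01'),
    shopId: 'shop_123',
  },
});
```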
### Select Only Needed Columns
```sql
-- ❌ BAD: SELECT *
SELECT * FROM events;
-- ✅ GOOD: Select specific columns
SELECT event_id, event_type, created_at FROM events;
```
---
## Node.js Integration
### Always Batch Inserts
```javascript
// ✅ GOOD: Single batch insert
await table.insert(batch.map(row => ({
  ...row,
  time: new Date()
})));
// ❌ BAD: Insert one row at a time
for (const row of batch) {
  await table.insert([row]);
}
```
| Scenario | Max Batch Size |
|----------|----------------|
| Streaming inserts | 500-1000 rows |
| High throughput | Up to 10,000 rows |
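
To stay within these limits, large arrays can be chunked before calling `table.insert`. A minimal sketch; `insertInBatches` is a hypothetical helper, with the default of 500 taken from the streaming row above:

```javascript
// Split rows into chunks so each streaming insert stays within the batch limits above.
async function insertInBatches(table, rows, batchSize = 500) {
  for (let i = 0; i < rows.length; i += batchSize) {
    await table.insert(rows.slice(i, i + batchSize));
  }
}
```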
---
## Cost Control
```javascript
// Dry run before expensive queries
const [job] = await bigquery.createQueryJob({
  query: sql,
  dryRun: true
});
// Job statistics live on job.metadata; totalBytesProcessed comes back as a string
const bytes = Number(job.metadata.statistics.totalBytesProcessed);
const estimatedCost = (bytes / 1e12) * 5; // ~$5 per TB scanned (on-demand pricing)
```
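
Beyond dry runs, a hard cap can be set per job with `maximumBytesBilled`, which makes BigQuery fail the job rather than bill past the limit. A sketch continuing the example above (same `bigquery` client and `sql` string); the 10 GB cap is an illustrative value:

```javascript
// Hard guardrail: the job errors out instead of billing more than the cap.
const [job] = await bigquery.createQueryJob({
  query: sql,
  maximumBytesBilled: String(10 * 1024 ** 3), // ~10 GB; pick a limit that fits your budget
});
const [rows] = await job.getQueryResults();
```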
---
## Checklist
```
□ Large tables (>1GB) have partitioning
□ Queries include partition column in WHERE
□ Tables clustered by frequently filtered columns
□ No SELECT * - select specific columns
□ Using parameterized queries
□ Batch inserts (not row-by-row)
```

This skill helps you design and optimize BigQuery tables, write cost-efficient analytics queries, and integrate BigQuery with Node.js. It focuses on partitioning, clustering, query patterns, batched inserts, and cost controls. Use it to reduce query costs, improve performance, and standardize integration patterns for streaming and batch loads.
The skill inspects table size and recommends partitioning and clustering strategies, enforces query patterns that limit scanned bytes (partition filters, column projection), and shows Node.js patterns for efficient inserts and dry-run cost estimation. It provides concrete SQL and JavaScript examples you can copy into your projects. It also offers a short operational checklist to validate datasets and queries.
### When should I shard instead of partition?
Sharding is only appropriate at extreme scale, when partitions become too large or too hot; prefer DATE/TIMESTAMP partitioning up to the terabyte range.
### How large should Node.js insert batches be?
For streaming inserts, aim for 500–1,000 rows per batch; high-throughput bulk loads can go up to ~10,000 rows, depending on payload size and latency needs.