home / skills / deletexiumu / agentskills-hub / sql-standards

sql-standards skill

/skills/public/smart-data-query/skills/sql-standards

This skill enforces Hive and SparkSQL compatible SQL standards, ensuring executability, clear parameterization, and dialect-aware output for data warehouses.

npx playbooks add skill deletexiumu/agentskills-hub --skill sql-standards

Review the files below or copy the command above to add this skill to your agents.

Files (6)

SKILL.md

2.4 KB

---
name: sql-standards
description: SQL output standards and Hive dialect compatibility rules for data warehouse queries. This skill should be used when generating SQL for Hive/SparkSQL/GaussDB, or when reviewing SQL for compatibility issues.
---

# SQL 标准规范

生成数仓 SQL 时必须遵守的输出规范、方言规则和验收要求。

## 输出规范

- **可直接执行**：完整 SQL（允许 CTE），不输出伪代码
- **参数清晰**：可变参数用 `{{start_date}}` 占位符，注释说明格式
- **字段命名一致**：输出字段与需求字段一一对应
- **分区可下推**：用分区字段（`dt/ds/biz_date`）过滤，避免全表扫描
- **中文别名**：最终 SELECT 用中文别名，Hive 用 `` `中文` ``，GaussDB 用 `"中文"`

### CTE 推荐结构

1. `base`：事实表取数（必要列 + 分区过滤 + 业务过滤）
2. `agg`：按需求粒度聚合
3. `dim_*`：维度补充（先去重/取最新再 join）
4. `final`：整理输出字段、排序

## Hive 低版本兼容性

### 禁止的语法

| 禁止项 | 替代方案 |
|--------|----------|
| TRANSFORM 函数 | `LATERAL VIEW EXPLODE`、`regexp_extract`、`get_json_object` |
| SELECT/JOIN ON 中的子查询 `(select ...)` | 派生表 / CTE 先算好再 JOIN |
| `ORDER BY 1, 2, 3` 序号排序 | 用输出列名排序 |

### ORDER BY 规则

`ORDER BY` 必须只引用当前 SELECT 的输出列名。如果用了中文别名，必须用中文别名排序：

```sql
-- 正确
SELECT province_name AS `省份`, COUNT(*) AS `数量`
FROM t GROUP BY province_name
ORDER BY `省份`

-- 错误：用底层字段名
ORDER BY province_name
```

## 验收要点

- 禁用 `select *`，显式列出输出字段
- `group by` 与选择列一致，聚合指标不重复计算
- join 时写清类型与 key，维表多版本先去重再 join
- 默认 Hive/SparkSQL 方言，GaussDB 仅按需输出
- 口径不确定时给多分支 SQL 或注释说明假设

## 详细文档

| 主题 | 路径 |
|------|------|
| SQL 输出规范 | [`references/SQL-输出规范.md`](references/SQL-输出规范.md) |
| 方言兼容 | [`references/方言与兼容性.md`](references/方言与兼容性.md) |
| 静态验收清单 | [`references/静态验收清单.md`](references/静态验收清单.md) |
| 日志迭代 | [`references/日志与迭代机制.md`](references/日志与迭代机制.md) |
| 需求问卷 | [`references/问数需求问卷模板.md`](references/问数需求问卷模板.md) |

Overview

This skill encodes SQL output standards and Hive-compatible dialect rules for data warehouse queries. It guides generation and review of SQL for Hive, SparkSQL, and GaussDB so outputs are executable, auditable, and compatible across target engines. Use it to enforce naming, partitioning, CTE structure, and low-version Hive restrictions.

How this skill works

The skill inspects SQL drafts and enforces a checklist: executable SQL (no pseudocode), explicit parameter placeholders, and column-level correspondence with requirements. It checks dialect-specific constraints (forbidden constructs in older Hive, ORDER BY rules, and GaussDB differences), partition pushdown, and CTE structure recommendations. It also flags common acceptance issues like select *, mismatched GROUP BY, ambiguous joins, and versioned dimension handling.

When to use it

Generating SQL intended to run on Hive, SparkSQL, or GaussDB
Reviewing existing queries for cross-dialect compatibility or low-version Hive constraints
Preparing production-ready SQL with partition filters and clear parameterization
Validating aggregation correctness, join keys, and explicit output fields
Creating SQL templates for scheduled jobs or shared data assets

Best practices

Always output complete, executable SQL; allow CTEs but no pseudocode
Use clear placeholders like {{start_date}} and document expected formats in comments
Filter by partition columns (dt/ds/biz_date) to enable partition pushdown
List output columns explicitly; avoid select * and ensure GROUP BY matches select
Use recommended CTE structure: base → agg → dim_* → final
For Hive low versions, avoid TRANSFORM and inline subqueries in SELECT/JOIN; materialize them as CTEs or derived tables

Example use cases

Authoring a scheduled ETL query that must run on both SparkSQL and older Hive clusters
Reviewing a teammate's report SQL to enforce partition filters and explicit projections
Converting a query using TRANSFORM or inline subqueries into CTE-based compatible SQL
Preparing multiple SQL branches with documented assumptions when business logic is ambiguous
Producing user-facing result sets with Chinese aliases using Hive or GaussDB quoting rules

FAQ

How should I format mutable parameters?

Use placeholders like {{start_date}} and add a comment that specifies type and format (for example: -- {{start_date}}: YYYY-MM-DD).

What if the target engine is unknown?

Default to Hive/SparkSQL-compatible SQL and provide GaussDB-specific variants only when required.

How to handle dimension tables with multiple versions?

Deduplicate or select the latest version in a dim_* CTE before joining to fact tables.