This skill helps you design reliable data pipelines by defining data contracts, quality checks, and observability so ETL workflows stay predictable and recoverable.

To add this skill to your agents:

```bash
npx playbooks add skill vadimcomanescu/codex-skills --skill senior-data-engineer
```
---
name: senior-data-engineer
description: "Data engineering workflows for designing reliable pipelines and datasets: ingestion, transforms, orchestration, schema evolution, data contracts, quality checks, and observability. Use when building ETL/ELT, reviewing pipelines, defining warehouse/lake schemas, or diagnosing data quality incidents."
---
# Senior Data Engineer
Make data pipelines boring: predictable, observable, and recoverable.
## Quick Start
1) Define the data contract: schema, semantics, freshness, and ownership (a minimal contract sketch follows this list).
2) Design the pipeline: inputs, transformations, outputs, backfills, and failure handling.
3) Add data quality checks: nulls, ranges, uniqueness, and referential integrity (a check sketch follows this list).
4) Define the operational story: retries, checkpoints, alerting, and lineage.
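The contract from step 1 can live as a small versioned artifact next to the pipeline code. A minimal sketch in Python, assuming an illustrative `orders` dataset (the field names, owner address, and SLA values are made up and are not part of the bundled `references/data-contract.md` template):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Versioned contract: schema + semantics + freshness + ownership."""
    name: str
    version: str
    owner: str                   # team accountable for the dataset
    freshness_sla_minutes: int   # maximum acceptable staleness
    schema: dict[str, str]       # column name -> logical type
    semantics: dict[str, str]    # column name -> meaning / unit


# Illustrative contract for a hypothetical orders table.
orders_contract = DataContract(
    name="orders",
    version="1.2.0",
    owner="data-platform@example.com",
    freshness_sla_minutes=60,
    schema={"order_id": "string", "amount_usd": "decimal", "created_at": "timestamp"},
    semantics={"amount_usd": "gross order value in USD, tax included"},
)
```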
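The checks from step 3 translate directly into assertions over each batch. A sketch using pandas, with hypothetical `orders` and `customers` tables:

```python
import pandas as pd


def run_quality_checks(orders: pd.DataFrame, customers: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []

    # Nulls: required keys must always be present.
    if orders["order_id"].isna().any():
        failures.append("order_id contains nulls")

    # Ranges: negative amounts are never valid.
    if (orders["amount_usd"] < 0).any():
        failures.append("amount_usd has negative values")

    # Uniqueness: order_id is the primary key.
    if orders["order_id"].duplicated().any():
        failures.append("order_id is not unique")

    # Referential integrity: every order points at a known customer.
    missing = ~orders["customer_id"].isin(customers["customer_id"])
    if missing.any():
        failures.append(f"{int(missing.sum())} orders reference unknown customers")

    return failures
```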
## Optional tool: lightweight profiling for CSV/JSONL
```bash
python ~/.codex/skills/senior-data-engineer/scripts/data_quality_scan.py path/to/data.csv --out /tmp/data_profile.json
```
## References
- Data contract template: `references/data-contract.md`
- Pipeline checklist: `references/pipeline-checklist.md`
This skill helps design and harden data engineering workflows so pipelines are predictable, observable, and recoverable. It focuses on defining data contracts, designing ingestion and transform steps, implementing quality checks, and building an operational story for alerts and recovery. Use it to standardize ETL/ELT patterns and reduce production incidents.
The skill inspects pipeline design and documentation to ensure inputs, outputs, transformations, backfills, and failure handling are explicit. It evaluates schema and semantic contracts, recommends data quality checks (nulls, ranges, uniqueness, referential integrity), and identifies gaps in retries, checkpoints, alerting, and lineage. Optionally, it can run a lightweight scan on CSV/JSONL files to produce a quick data profile for schema and quality validation.
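For a sense of what such a profile contains, the sketch below computes per-column null and distinct counts for a CSV. It is an illustration only, not the bundled `data_quality_scan.py`:

```python
import csv
import json
from collections import Counter


def profile_csv(path: str, max_distinct: int = 1000) -> dict:
    """Profile a CSV: row count plus per-column null and distinct counts."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = 0
        nulls: Counter = Counter()
        distinct: dict[str, set] = {name: set() for name in reader.fieldnames or []}
        for row in reader:
            rows += 1
            for col, value in row.items():
                if col not in distinct:   # ignore ragged rows with extra fields
                    continue
                if value in ("", None):
                    nulls[col] += 1
                elif len(distinct[col]) < max_distinct:
                    distinct[col].add(value)
    return {
        "rows": rows,
        "columns": {
            col: {"nulls": nulls[col], "distinct_seen": len(values)}
            for col, values in distinct.items()
        },
    }


print(json.dumps(profile_csv("path/to/data.csv"), indent=2))
```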
**Can I profile large datasets with this skill?**
The built-in profiler is lightweight and best suited to sample files or extracts. For large datasets, sample first or use scalable profiling tools, then feed the results into the same contract and quality checks.
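One hedged way to produce such a sample with pandas, streaming the file in chunks so memory stays bounded (the filename, chunk size, and sampling fraction are arbitrary choices):

```python
import pandas as pd

# Keep a random ~1% of each 500k-row chunk instead of loading the full file.
sample_parts = []
for chunk in pd.read_csv("big_dataset.csv", chunksize=500_000):
    sample_parts.append(chunk.sample(frac=0.01, random_state=42))

sample = pd.concat(sample_parts, ignore_index=True)
sample.to_csv("/tmp/big_dataset_sample.csv", index=False)
# Profile the sample, then apply the same contract and quality checks as on the full dataset.
```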
**How do data contracts handle schema evolution?**
Define versioned contracts with explicit migration rules: distinguish additive from breaking changes, require compatibility tests, and gate deployments of breaking changes behind automated checks.
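A deployment gate can be as simple as diffing two contract schemas and refusing breaking changes without a major version bump. A minimal sketch; the classification rules here are assumptions, not a standard:

```python
def classify_schema_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Compare column->type mappings: removed or retyped columns are breaking,
    added columns are additive."""
    removed = set(old) - set(new)
    retyped = {col for col in set(old) & set(new) if old[col] != new[col]}
    added = set(new) - set(old)

    if removed or retyped:
        return "breaking"   # requires a new major contract version plus migration
    if added:
        return "additive"   # backward compatible; minor version bump
    return "unchanged"


# Example: adding a column is additive; dropping or retyping one would be breaking.
v1 = {"order_id": "string", "amount_usd": "decimal"}
v2 = {"order_id": "string", "amount_usd": "decimal", "currency": "string"}
assert classify_schema_change(v1, v2) == "additive"
```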