This skill helps you design reliable data pipelines by defining data contracts, quality checks, and observability so ETL workflows stay predictable and recoverable.

To add this skill to your agents:

```bash
npx playbooks add skill vadimcomanescu/codex-skills --skill senior-data-engineer
```
---
name: senior-data-engineer
description: "Data engineering workflows for designing reliable pipelines and datasets: ingestion, transforms, orchestration, schema evolution, data contracts, quality checks, and observability. Use when building ETL/ELT, reviewing pipelines, defining warehouse/lake schemas, or diagnosing data quality incidents."
---
# Senior Data Engineer
Make data pipelines boring: predictable, observable, and recoverable.
## Quick Start
1) Define the data contract: schema, semantics, freshness, and ownership (a minimal contract sketch follows this list).
2) Design the pipeline: inputs, transformations, outputs, backfills, and failure handling.
3) Add data quality checks: nulls, ranges, uniqueness, and referential integrity (a check sketch follows this list).
4) Define the operational story: retries, checkpoints, alerting, and lineage.
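The contract from step 1 can live as a small versioned artifact next to the pipeline code. A minimal sketch in Python, assuming an illustrative `orders` dataset (the field names, owner address, and SLA values are made up and are not part of the bundled `references/data-contract.md` template):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Versioned contract: schema + semantics + freshness + ownership."""
    name: str
    version: str
    owner: str                   # team accountable for the dataset
    freshness_sla_minutes: int   # maximum acceptable staleness
    schema: dict[str, str]       # column name -> logical type
    semantics: dict[str, str]    # column name -> meaning / unit


# Illustrative contract for a hypothetical orders table.
orders_contract = DataContract(
    name="orders",
    version="1.2.0",
    owner="data-platform@example.com",
    freshness_sla_minutes=60,
    schema={"order_id": "string", "amount_usd": "decimal", "created_at": "timestamp"},
    semantics={"amount_usd": "gross order value in USD, tax included"},
)
```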
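The checks from step 3 translate directly into assertions over each batch. A sketch using pandas, with hypothetical `orders` and `customers` tables:

```python
import pandas as pd


def run_quality_checks(orders: pd.DataFrame, customers: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []

    # Nulls: required keys must always be present.
    if orders["order_id"].isna().any():
        failures.append("order_id contains nulls")

    # Ranges: negative amounts are never valid.
    if (orders["amount_usd"] < 0).any():
        failures.append("amount_usd has negative values")

    # Uniqueness: order_id is the primary key.
    if orders["order_id"].duplicated().any():
        failures.append("order_id is not unique")

    # Referential integrity: every order points at a known customer.
    missing = ~orders["customer_id"].isin(customers["customer_id"])
    if missing.any():
        failures.append(f"{int(missing.sum())} orders reference unknown customers")

    return failures
```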
## Optional tool: lightweight profiling for CSV/JSONL
```bash
python ~/.codex/skills/senior-data-engineer/scripts/data_quality_scan.py path/to/data.csv --out /tmp/data_profile.json
```
## References
- Data contract template: `references/data-contract.md`
- Pipeline checklist: `references/pipeline-checklist.md`
This skill helps design and harden data engineering workflows so pipelines are predictable, observable, and recoverable. It focuses on defining data contracts, designing ingestion and transform steps, implementing quality checks, and building an operational story for alerts and recovery. Use it to standardize ETL/ELT patterns and reduce production incidents.
The skill inspects pipeline design and documentation to ensure inputs, outputs, transformations, backfills, and failure handling are explicit. It evaluates schema and semantic contracts, recommends data quality checks (nulls, ranges, uniqueness, referential integrity), and identifies gaps in retries, checkpoints, alerting, and lineage. Optionally, it can run a lightweight scan on CSV/JSONL files to produce a quick data profile for schema and quality validation.
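For a sense of what such a profile contains, the sketch below computes per-column null and distinct counts for a CSV. It is an illustration only, not the bundled `data_quality_scan.py`:

```python
import csv
import json
from collections import Counter


def profile_csv(path: str, max_distinct: int = 1000) -> dict:
    """Profile a CSV: row count plus per-column null and distinct counts."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = 0
        nulls: Counter = Counter()
        distinct: dict[str, set] = {name: set() for name in reader.fieldnames or []}
        for row in reader:
            rows += 1
            for col, value in row.items():
                if col not in distinct:   # ignore ragged rows with extra fields
                    continue
                if value in ("", None):
                    nulls[col] += 1
                elif len(distinct[col]) < max_distinct:
                    distinct[col].add(value)
    return {
        "rows": rows,
        "columns": {
            col: {"nulls": nulls[col], "distinct_seen": len(values)}
            for col, values in distinct.items()
        },
    }


print(json.dumps(profile_csv("path/to/data.csv"), indent=2))
```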
**Can I profile large datasets with this skill?**
The built-in profiler is lightweight and best suited to sample files or extracts. For large datasets, sample first or use scalable profiling tools, then feed the results into the same contract and quality checks.
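One hedged way to produce such a sample with pandas, streaming the file in chunks so memory stays bounded (the filename, chunk size, and sampling fraction are arbitrary choices):

```python
import pandas as pd

# Keep a random ~1% of each 500k-row chunk instead of loading the full file.
sample_parts = []
for chunk in pd.read_csv("big_dataset.csv", chunksize=500_000):
    sample_parts.append(chunk.sample(frac=0.01, random_state=42))

sample = pd.concat(sample_parts, ignore_index=True)
sample.to_csv("/tmp/big_dataset_sample.csv", index=False)
# Profile the sample, then apply the same contract and quality checks as on the full dataset.
```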
**How do data contracts handle schema evolution?**
Define versioned contracts with explicit migration rules: distinguish additive from breaking changes, require compatibility tests, and gate deployments of breaking changes behind automated checks.
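A deployment gate can be as simple as diffing two contract schemas and refusing breaking changes without a major version bump. A minimal sketch; the classification rules here are assumptions, not a standard:

```python
def classify_schema_change(old: dict[str, str], new: dict[str, str]) -> str:
    """Compare column->type mappings: removed or retyped columns are breaking,
    added columns are additive."""
    removed = set(old) - set(new)
    retyped = {col for col in set(old) & set(new) if old[col] != new[col]}
    added = set(new) - set(old)

    if removed or retyped:
        return "breaking"   # requires a new major contract version plus migration
    if added:
        return "additive"   # backward compatible; minor version bump
    return "unchanged"


# Example: adding a column is additive; dropping or retyping one would be breaking.
v1 = {"order_id": "string", "amount_usd": "decimal"}
v2 = {"order_id": "string", "amount_usd": "decimal", "currency": "string"}
assert classify_schema_change(v1, v2) == "additive"
```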