
This skill helps you design reliable data pipelines by defining contracts, quality checks, and observability to ensure predictable, recoverable ETL workflows.

npx playbooks add skill vadimcomanescu/codex-skills --skill senior-data-engineer

Review the files below or copy the command above to add this skill to your agents.

Files (5)
SKILL.md
1.1 KB
---
name: senior-data-engineer
description: "Data engineering workflows for designing reliable pipelines and datasets: ingestion, transforms, orchestration, schema evolution, data contracts, quality checks, and observability. Use when building ETL/ELT, reviewing pipelines, defining warehouse/lake schemas, or diagnosing data quality incidents."
---

# Senior Data Engineer

Make data pipelines boring: predictable, observable, and recoverable.

## Quick Start
1) Define the data contract: schema, semantics, freshness, and ownership.
2) Design the pipeline: inputs, transformations, outputs, backfills, and failure handling.
3) Add data quality checks: nulls, ranges, uniqueness, and referential integrity.
4) Define the operational story: retries, checkpoints, alerting, and lineage.
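As a concrete starting point, step 1 can be captured in code as a small contract object. This is a minimal sketch; the `orders` feed, field names, and thresholds are hypothetical:

```python
# Minimal data contract sketch (hypothetical "orders" feed):
# schema + semantics + freshness + ownership in one place.
contract = {
    "dataset": "orders",
    "owner": "data-platform@example.com",       # hypothetical owner
    "freshness_sla_minutes": 60,
    "schema": {
        "order_id": {"type": "string", "required": True, "unique": True},
        "amount": {"type": "float", "required": True, "min": 0.0},
        "created_at": {"type": "timestamp", "required": True},
    },
    "semantics": {"amount": "gross order value in USD, taxes included"},
}

def validate_row(row: dict, contract: dict) -> list[str]:
    """Return a list of contract violations for a single row."""
    errors = []
    for col, spec in contract["schema"].items():
        value = row.get(col)
        if value is None:
            if spec.get("required"):
                errors.append(f"{col}: missing required value")
            continue
        if "min" in spec and value < spec["min"]:
            errors.append(f"{col}: {value} below minimum {spec['min']}")
    return errors
```

Keeping the contract as data (rather than prose) lets the same object drive both documentation and row-level validation.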

## Optional tool: lightweight profiling for CSV/JSONL
```bash
python ~/.codex/skills/senior-data-engineer/scripts/data_quality_scan.py path/to/data.csv --out /tmp/data_profile.json
```

## References
- Data contract template: `references/data-contract.md`
- Pipeline checklist: `references/pipeline-checklist.md`

Overview

This skill helps design and harden data engineering workflows so pipelines are predictable, observable, and recoverable. It focuses on defining data contracts, designing ingestion and transform steps, implementing quality checks, and building an operational story for alerts and recovery. Use it to standardize ETL/ELT patterns and reduce production incidents.

How this skill works

The skill inspects pipeline design and documentation to ensure inputs, outputs, transformations, backfills, and failure handling are explicit. It evaluates schema and semantic contracts, recommends data quality checks (nulls, ranges, uniqueness, referential integrity), and identifies gaps in retries, checkpoints, alerting, and lineage. Optionally, it can run a lightweight profile on CSV/JSONL files to produce a quick data profile for schema and quality validation.
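The bundled `data_quality_scan.py` is not reproduced here, but a minimal profile in the same spirit (per-column null rate and distinct count over a small CSV extract) could look like this sketch:

```python
import csv
import io

def profile_csv(text: str) -> dict:
    """Per-column null rate and distinct count for a small CSV extract."""
    rows = list(csv.DictReader(io.StringIO(text)))
    profile = {}
    for col in (rows[0].keys() if rows else []):
        values = [r[col] for r in rows]
        non_null = [v for v in values if v not in ("", None)]
        profile[col] = {
            "rows": len(values),
            "null_rate": 1 - len(non_null) / len(values),
            "distinct": len(set(non_null)),
        }
    return profile
```

A profile like this is enough to catch surprise nulls or cardinality changes before a load, which is the point of running it at ingestion time.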

When to use it

  • Designing a new ETL/ELT pipeline or onboarding a new data source
  • Reviewing or hardening existing pipelines before production rollout
  • Defining or validating warehouse/lake schemas and data contracts
  • Diagnosing data quality incidents or unexplained schema drift
  • Building observability and operational runbooks for data teams

Best practices

  • Treat a data contract as the single source of truth: include schema, semantics, freshness, and ownership
  • Design explicit failure modes and backfill strategies for every pipeline
  • Implement automated data quality checks at ingestion and after transforms
  • Ship lineage and monitoring with the pipeline so root cause analysis is fast
  • Favor idempotent transforms and checkpointed writes to support safe retries
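The last practice, idempotent transforms with checkpointed writes, can be sketched against a hypothetical in-memory warehouse: overwriting a partition (never appending) and checkpointing only after the write lands makes retries safe.

```python
# Sketch only: "warehouse" and "checkpoints" stand in for real storage.
warehouse: dict[str, list[dict]] = {}
checkpoints: set[str] = set()

def load_partition(partition: str, rows: list[dict]) -> None:
    warehouse[partition] = rows   # overwrite, never append: rerun-safe
    checkpoints.add(partition)    # record progress after the write lands

def run_pipeline(batches: dict[str, list[dict]]) -> None:
    for partition, rows in batches.items():
        if partition in checkpoints:   # already loaded: skip on retry
            continue
        load_partition(partition, rows)
```

Running the pipeline twice over the same batches leaves the warehouse unchanged, which is exactly the property a retrying orchestrator relies on.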

Example use cases

  • Create a data contract for a new API feed including expected schema, required fields, and SLA
  • Audit an existing pipeline to add uniqueness and referential integrity checks
  • Design an orchestration plan with retries, checkpointing, and backfill instructions
  • Run a quick profile on a CSV dump to detect schema inconsistencies before loading
  • Draft an incident playbook that maps alerts to owners and remediation steps
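The uniqueness and referential-integrity audit from the use cases above can boil down to two small checks; the table and column names here are illustrative:

```python
def check_unique(rows: list[dict], key: str) -> list:
    """Return key values that appear more than once."""
    seen, dupes = set(), []
    for r in rows:
        if r[key] in seen:
            dupes.append(r[key])
        seen.add(r[key])
    return dupes

def check_referential(child: list[dict], fk: str, parent: list[dict], pk: str) -> list:
    """Return foreign-key values with no matching parent row."""
    parent_keys = {r[pk] for r in parent}
    return [r[fk] for r in child if r[fk] not in parent_keys]
```

Run these after each transform stage; an empty result from both is a cheap, explicit gate before publishing a dataset.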

FAQ

Can I profile large datasets with this skill?

The built-in profiler is lightweight and best for sample files or extracts. For large datasets, sample or use scalable profiling tools and feed results into the same contract and quality checks.
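One simple way to sample a large file in a single pass is reservoir sampling, which maintains a fixed-size uniform sample without knowing the total row count up front. A sketch:

```python
import random

def reservoir_sample(lines, k: int, seed: int = 0) -> list:
    """Uniform fixed-size sample from a stream in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            sample.append(line)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # replace with decreasing probability
            if j < k:
                sample[j] = line
    return sample
```

The resulting sample can then go through the same contract validation and quality checks as a full extract.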

How do data contracts handle schema evolution?

Define versioned contracts with explicit migration rules: distinguish additive from breaking changes, require compatibility tests, and gate deployment of breaking changes behind automated checks.
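A compatibility gate can start as a simple classifier over the contract's column-to-type mapping. This sketch (an assumption, not the skill's built-in logic) treats removed or retyped columns as breaking and new columns as additive:

```python
def classify_change(old: dict, new: dict) -> str:
    """Classify a contract change as 'breaking', 'additive', or 'none'.

    old/new map column name -> type. Removing or retyping a column breaks
    existing consumers; adding a column is additive.
    """
    removed = set(old) - set(new)
    retyped = {c for c in set(old) & set(new) if old[c] != new[c]}
    if removed or retyped:
        return "breaking"
    if set(new) - set(old):
        return "additive"
    return "none"
```

Wiring this into CI lets additive changes ship automatically while breaking ones require an explicit version bump and migration plan.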