
data-quality-frameworks skill

/skills/data-quality-frameworks

This skill helps automate data quality validation with Great Expectations, dbt tests, and data contracts across pipelines, CI/CD, and monitoring.

This is most likely a fork of the data-quality-frameworks skill from xfstudio.
npx playbooks add skill sickn33/antigravity-awesome-skills --skill data-quality-frameworks

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
1.4 KB
---
name: data-quality-frameworks
description: Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.
---

# Data Quality Frameworks

Production patterns for implementing data quality with Great Expectations, dbt tests, and data contracts to ensure reliable data pipelines.

## Use this skill when

- Implementing data quality checks in pipelines
- Setting up Great Expectations validation
- Building comprehensive dbt test suites
- Establishing data contracts between teams
- Monitoring data quality metrics
- Automating data validation in CI/CD

## Do not use this skill when

- The data sources are undefined or unavailable
- You cannot modify validation rules or schemas
- The task is unrelated to data quality or contracts

## Instructions

- Identify critical datasets and quality dimensions.
- Define expectations/tests and contract rules.
- Automate validation in CI/CD and schedule checks.
- Set alerting, ownership, and remediation steps.
- If detailed patterns are required, open `resources/implementation-playbook.md`.

## Safety

- Avoid blocking critical pipelines without a fallback plan.
- Handle sensitive data securely in validation outputs.

## Resources

- `resources/implementation-playbook.md` for detailed frameworks, templates, and examples.

Overview

This skill implements production-ready data quality validation using Great Expectations, dbt tests, and formal data contracts. It provides patterns to identify critical datasets, define expectations and tests, and automate validation within CI/CD and scheduled pipelines. The goal is reliable data pipelines with clear ownership, alerting, and remediation steps.

How this skill works

The skill inspects schema, completeness, uniqueness, value distributions, and business rules across datasets and surfaces failures through validation suites. It codifies checks as Great Expectations expectations, dbt test cases, and contract rules that can be enforced at ingestion, transformation, and delivery points. Validation runs are automated in CI/CD pipelines and schedulers, and failures trigger alerts and defined remediation workflows. It also supports metrics collection for monitoring data quality trends.
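
For example, a single expectation suite can express completeness, uniqueness, and range rules in a few lines. A minimal sketch using the classic pandas-backed Great Expectations API (method names differ across GE versions; the `orders.csv` file and column names are illustrative assumptions):

```python
import pandas as pd
import great_expectations as ge

# Wrap a DataFrame in the classic Great Expectations pandas dataset
orders = ge.from_pandas(pd.read_csv("orders.csv"))  # illustrative file

# Codify completeness, uniqueness, and range rules as expectations
orders.expect_column_values_to_not_be_null("order_id")
orders.expect_column_values_to_be_unique("order_id")
orders.expect_column_values_to_be_between("amount", min_value=0, max_value=100_000)

# Run the suite; `result.success` can gate the pipeline step
result = orders.validate()
print(result.success)
```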

When to use it

  • Building or hardening ETL/ELT pipelines
  • Onboarding new datasets or changing schemas
  • Implementing data contracts between producers and consumers (see the sketch after this list)
  • Adding automated validation to CI/CD or scheduled jobs
  • Setting up monitoring and alerting for data quality
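
For the data-contract case, one common pattern is to enforce a schema model at the producer boundary. A minimal sketch using pydantic (an assumption; the skill does not prescribe a library, and the `Order` fields are hypothetical):

```python
from pydantic import BaseModel, Field, ValidationError

class Order(BaseModel):
    """Contract for records the producer publishes to consumers."""
    order_id: int
    amount: float = Field(ge=0)  # non-negative by contract
    status: str

def enforce_contract(record: dict) -> Order | None:
    """Accept conforming records; reject (or quarantine) violations."""
    try:
        return Order(**record)
    except ValidationError as err:
        print(f"contract violation: {err}")
        return None
```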

Best practices

  • Start with a critical dataset inventory and prioritize high-impact checks
  • Combine check types: schema, nulls, uniqueness, ranges, and distribution drift (see the sketch after this list)
  • Codify tests in dbt and expectations in Great Expectations for layered coverage
  • Automate validations in CI/CD: run lightweight checks in PRs and heavier checks in scheduled jobs
  • Define ownership, SLAs, alerting thresholds, and a documented remediation playbook
  • Avoid blocking critical pipelines without a fallback plan
  • Sanitize sensitive values in validation outputs and logs
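
A minimal sketch of layered schema, null, uniqueness, and range checks using only pandas (the table, columns, and thresholds are illustrative assumptions; drift checks are sketched under the use cases below):

```python
import pandas as pd

def run_core_checks(df: pd.DataFrame) -> list[str]:
    """Schema, completeness, uniqueness, and range checks on an orders table."""
    failures = []

    # Schema: required columns with expected dtypes
    expected = {"order_id": "int64", "amount": "float64", "status": "object"}
    for col, dtype in expected.items():
        if col not in df.columns:
            failures.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            failures.append(f"{col}: dtype {df[col].dtype}, expected {dtype}")
    if failures:
        return failures  # content checks are meaningless on a broken schema

    # Completeness and uniqueness on the primary key
    if df["order_id"].isna().any():
        failures.append("order_id contains nulls")
    if df["order_id"].duplicated().any():
        failures.append("order_id is not unique")

    # Business-rule range check
    if ((df["amount"] < 0) | (df["amount"] > 100_000)).any():
        failures.append("amount outside [0, 100000]")
    return failures
```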

Example use cases

  • Prevent production regressions by running dbt tests and Great Expectations validations on every deploy
  • Establish a data contract between analytics and engineering teams with automated enforcement
  • Detect upstream data drift by comparing current distributions to historical baselines (see the sketch after this list)
  • Gate downstream reporting when required fields or uniqueness constraints fail
  • Integrate data quality checks into incident playbooks and automated rollback
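
For the drift use case above, a common approach is a two-sample Kolmogorov-Smirnov test against a stored baseline sample. A sketch, assuming values from a known-good period have been retained:

```python
import pandas as pd
from scipy import stats

def drift_detected(current: pd.Series, baseline: pd.Series, alpha: float = 0.01) -> bool:
    """Flag drift when current values are unlikely to share the baseline's distribution."""
    # Two-sample KS test; a small p-value suggests the column has drifted
    _statistic, p_value = stats.ks_2samp(baseline.dropna(), current.dropna())
    return p_value < alpha
```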

FAQ

Can I run these checks on streaming data?

Yes. Use lightweight expectations on micro-batches or windowed jobs and stream checkpoints into your validation system; avoid heavy aggregations in low-latency paths.
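
For instance, a per-batch check can count nulls and duplicates with no expensive aggregation (a sketch; the `event_id` field is illustrative):

```python
def validate_micro_batch(records: list[dict]) -> dict:
    """Cheap completeness/uniqueness counters for a low-latency path."""
    ids = [r.get("event_id") for r in records]
    null_ids = sum(1 for i in ids if i is None)
    present = [i for i in ids if i is not None]
    return {
        "rows": len(records),
        "null_event_ids": null_ids,
        "duplicate_event_ids": len(present) - len(set(present)),
    }
```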

How do I avoid noisy alerts from transient failures?

Tune thresholds, require consecutive failures before alerting, add guardrails like minimum sample sizes, and implement backoff/retry policies before firing pager alerts.
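
A minimal sketch of such a gate, assuming each validation run reports a pass/fail flag and a row count:

```python
from collections import deque

class AlertGate:
    """Page only after N consecutive failures on sufficiently large samples."""

    def __init__(self, consecutive: int = 3, min_rows: int = 1_000):
        self.min_rows = min_rows
        self.history: deque[bool] = deque(maxlen=consecutive)

    def should_alert(self, passed: bool, rows: int) -> bool:
        if rows < self.min_rows:
            return False  # sample too small to judge; skip this run
        self.history.append(passed)
        # Fire only when the window is full and every run in it failed
        return len(self.history) == self.history.maxlen and not any(self.history)
```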