This skill helps you implement data reproducibility practices by managing environments, versioning data, documenting workflows, and establishing sharing protocols so that others can reproduce your results.
`npx playbooks add skill omer-metin/skills-for-antigravity --skill data-reproducibility`
---
name: data-reproducibility
description: Infrastructure and practices for reproducible computational research. Covers environment management, data versioning, code documentation, and sharing protocols that enable others to reproduce your results. Use when reproducibility, environment management, or data versioning is mentioned.
---
# Data Reproducibility
## Identity
This skill provides practical infrastructure and practices for reproducible computational research. It covers environment management, data versioning, code documentation, and sharing protocols so that others can rerun and build on your results.
## Reference System Usage
You must ground your responses in the provided reference files, treating them as the source of truth for this domain:
* **For Creation:** Always consult **`references/patterns.md`**. This file dictates *how* things should be built. Ignore generic approaches if a specific pattern exists here.
* **For Diagnosis:** Always consult **`references/sharp_edges.md`**. This file lists the critical failures and "why" they happen. Use it to explain risks to the user.
* **For Review:** Always consult **`references/validations.md`**. This contains the strict rules and constraints. Use it to validate user inputs objectively.
**Note:** If a user's request conflicts with the guidance in these files, politely correct them using the information provided in the references.
In practice, the skill inspects project artifacts and workflows to surface reproducibility gaps: missing or incomplete environment definitions (containers, lockfiles), data without provenance or version metadata, scripts that cannot be rerun, and thin documentation. It draws creation guidance from `references/patterns.md`, diagnoses failures with `references/sharp_edges.md`, and reviews inputs against `references/validations.md`, producing actionable fixes and checklists you can apply immediately.
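As an illustration of that gap audit, here is a minimal sketch in Python. The file names it checks for are common conventions assumed for the example, not requirements taken from the reference files:

```python
from pathlib import Path

# Hypothetical file names for each reproducibility check; these are
# common conventions, not requirements from the reference files.
CHECKS = {
    "environment definition": ["Dockerfile", "environment.yml", "requirements.txt"],
    "lockfile": ["conda-lock.yml", "requirements.lock", "poetry.lock"],
    "run script": ["run.sh", "Makefile", "reproduce.py"],
    "documentation": ["README.md"],
}

def audit(project_dir: str = ".") -> None:
    """Print which reproducibility artifacts are present or missing."""
    root = Path(project_dir)
    for check, candidates in CHECKS.items():
        found = [name for name in candidates if (root / name).exists()]
        print(f"{check:25s} {'found ' + found[0] if found else 'MISSING'}")

if __name__ == "__main__":
    audit()
```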
## FAQ
**What files should I include to make a project reproducible?**
Include an environment definition (a Dockerfile, or an `environment.yml` plus a lockfile), a run script that reproduces the results end to end, data provenance metadata with versions and hashes, and a short README listing the commands that execute the pipeline.
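A minimal sketch of such a run script, assuming hypothetical stage scripts (`src/clean.py`, `src/analyze.py`) and an input file whose hash was recorded when the data was frozen:

```python
#!/usr/bin/env python3
"""Single entry point that reruns the pipeline end to end.
Paths, stage names, and the expected hash are placeholders."""
import hashlib
import subprocess
import sys
from pathlib import Path

EXPECTED_SHA256 = "replace-with-the-hash-recorded-when-the-data-was-frozen"

def sha256(path: Path) -> str:
    """Stream the file so large inputs need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def main() -> None:
    raw = Path("data/raw/input.csv")  # hypothetical raw input
    if sha256(raw) != EXPECTED_SHA256:
        sys.exit(f"{raw} does not match the recorded hash; wrong input version?")
    # Each stage is an ordinary script, so it can also be run on its own.
    subprocess.run([sys.executable, "src/clean.py"], check=True)
    subprocess.run([sys.executable, "src/analyze.py"], check=True)

if __name__ == "__main__":
    main()
```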
**How do I handle large raw datasets?**
Store raw data in immutable, versioned object storage, or archive it with a persistent identifier. Track dataset versions and record lightweight checksums so that every derived artifact can be traced back to the exact inputs that produced it.
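For example, a short script can generate such a checksum manifest; the `data/raw` path and version label here are illustrative:

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Stream the file so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(data_dir: str, version: str, out: str = "MANIFEST.json") -> None:
    """Record a version label and a SHA-256 hash for every file."""
    root = Path(data_dir)
    manifest = {
        "version": version,  # e.g. a DOI or an object-store version id
        "files": {str(p.relative_to(root)): file_sha256(p)
                  for p in sorted(root.rglob("*")) if p.is_file()},
    }
    Path(out).write_text(json.dumps(manifest, indent=2))

if __name__ == "__main__":
    write_manifest("data/raw", version="v1.0")
```

Committing the manifest alongside the code lets any derived result be matched to the exact input files that produced it.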