
This skill applies GATK best practices for germline and somatic variant calling with joint genotyping support.

npx playbooks add skill a5c-ai/babysitter --skill gatk-variant-caller


SKILL.md
---
name: gatk-variant-caller
description: GATK best practices skill for germline and somatic variant calling with joint genotyping
allowed-tools:
  - Read
  - Write
  - Glob
  - Grep
  - Edit
  - WebFetch
  - WebSearch
  - Bash
metadata:
  version: "1.0"
  category: bioinformatics
  tags:
    - variant-analysis
    - gatk
    - snv
    - indel
---

# GATK Variant Caller Skill

## Purpose
Provide GATK best practices for germline and somatic variant calling with joint genotyping support.

## Capabilities
- HaplotypeCaller execution
- Base quality score recalibration (BQSR)
- Variant quality score recalibration (VQSR)
- Joint genotyping across cohorts
- GVCF generation and management
- Mutect2 somatic calling

## Usage Guidelines
- Follow the GATK Best Practices workflows
- Apply BQSR for improved accuracy
- Use VQSR for quality filtering when the sample count permits
- Generate GVCFs for scalable joint calling
- Use Mutect2 for somatic variant calling
- Document resource bundles and versions
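The last guideline can be made concrete with a small provenance manifest. This is a minimal sketch, not part of the skill itself: the manifest layout and the function names (`sha256_of`, `build_manifest`, `write_manifest`) are hypothetical, and tool version strings are supplied by the caller (for example, copied from the output of `gatk --version`).

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large FASTA/VCF files are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(tool_versions: dict[str, str], resources: dict[str, Path]) -> dict:
    """Hypothetical manifest layout: tool versions plus a checksum for each resource file."""
    return {
        "tools": dict(sorted(tool_versions.items())),
        "resources": {
            name: {"path": str(path), "sha256": sha256_of(path)}
            for name, path in sorted(resources.items())
        },
    }


def write_manifest(manifest: dict, out_path: Path) -> None:
    # sort_keys keeps the file byte-identical across runs with identical inputs
    out_path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
```

Storing checksums rather than just file names means a silently replaced reference or known-sites VCF shows up as a manifest difference during an audit.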

## Dependencies
- GATK4
- Picard

## Process Integration
- Whole Genome Sequencing Pipeline (wgs-analysis-pipeline)
- Clinical Variant Interpretation (clinical-variant-interpretation)
- Tumor Molecular Profiling (tumor-molecular-profiling)
- Rare Disease Diagnostic Pipeline (rare-disease-diagnostics)

Overview

This skill implements GATK best practices for germline and somatic variant calling with support for joint genotyping. It packages BQSR, HaplotypeCaller, Mutect2, VQSR, and GVCF management as automated steps in a reproducible workflow suitable for cohort-scale analysis. It is designed to integrate with orchestration tools that run deterministic, resumable pipelines.
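One common way to make a pipeline resumable is to skip any step whose declared outputs already exist. The following is a minimal sketch under that assumption; `Step` and `run_pipeline` are hypothetical names, not an API exposed by this skill or by GATK.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Step:
    """One pipeline step: a command line plus the files it is expected to produce."""
    name: str
    argv: tuple[str, ...]
    outputs: tuple[Path, ...]

    def is_complete(self) -> bool:
        # A step with no declared outputs is never considered complete;
        # otherwise every output must exist and be non-empty.
        return bool(self.outputs) and all(
            p.exists() and p.stat().st_size > 0 for p in self.outputs
        )


def run_pipeline(steps: list[Step], dry_run: bool = False) -> list[str]:
    """Run steps in order, skipping any whose outputs already exist.

    Returns the names of the steps that were (or, in dry-run mode, would be) executed.
    """
    executed = []
    for step in steps:
        if step.is_complete():
            continue  # resume: an earlier run already produced this step's outputs
        executed.append(step.name)
        if not dry_run:
            # check=True stops the pipeline at the first failing tool
            subprocess.run(step.argv, check=True)
    return executed
```

Checking for non-empty outputs is a simplification: a crashed tool can leave a truncated file behind, so a production runner would typically write to a temporary name and rename it only after the command succeeds.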

How this skill works

The skill runs data preprocessing (including Base Quality Score Recalibration), then executes HaplotypeCaller in GVCF mode to produce per-sample GVCFs for germline analysis. For somatic workflows, it runs Mutect2 on tumor-normal pairs, followed by filtering. It performs joint genotyping across the aggregated GVCFs to produce cohort VCFs and supports VQSR for cohort-aware quality modeling of those joint calls. The implementation expects GATK4 and Picard as runtime dependencies.
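The per-sample germline stage can be sketched as command construction in Python. This assumes the input BAM is already aligned, coordinate-sorted, and duplicate-marked, and that the reference FASTA has its `.fai` index and `.dict` sequence dictionary alongside it; the function name and output file naming are hypothetical, while the tool names and arguments follow the GATK4 command-line interface.

```python
from pathlib import Path


def germline_sample_commands(sample: str, bam: Path, ref: Path,
                             known_sites: list[Path], outdir: Path) -> list[list[str]]:
    """Build argv lists for BQSR followed by HaplotypeCaller in GVCF mode."""
    recal_table = outdir / f"{sample}.recal.table"
    recal_bam = outdir / f"{sample}.recal.bam"
    gvcf = outdir / f"{sample}.g.vcf.gz"

    base_recal = ["gatk", "BaseRecalibrator", "-R", str(ref), "-I", str(bam),
                  "-O", str(recal_table)]
    for vcf in known_sites:
        # Known variant sites are masked so real variation is not modeled as sequencing error
        base_recal += ["--known-sites", str(vcf)]

    apply_bqsr = ["gatk", "ApplyBQSR", "-R", str(ref), "-I", str(bam),
                  "--bqsr-recal-file", str(recal_table), "-O", str(recal_bam)]

    # -ERC GVCF emits reference-confidence blocks so the sample can be joint-genotyped later
    haplotype_caller = ["gatk", "HaplotypeCaller", "-R", str(ref), "-I", str(recal_bam),
                        "-O", str(gvcf), "-ERC", "GVCF"]

    return [base_recal, apply_bqsr, haplotype_caller]
```

Each returned list can be passed directly to `subprocess.run(..., check=True)` or wrapped in a pipeline step; building argv lists instead of shell strings avoids quoting problems with unusual file names.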

When to use it

  • Performing cohort-level germline variant discovery with scalable joint genotyping.
  • Calling somatic variants in tumor-normal pairs using Mutect2 and downstream filters.
  • Reprocessing BAMs to improve variant accuracy through BQSR and standardized preprocessing.
  • Building VQSR models when sample counts are sufficient for reliable recalibration.
  • Integrating into WGS, clinical interpretation, or tumor profiling pipelines that require reproducibility.

Best practices

  • Always apply BQSR before variant calling; it corrects systematic errors in the sequencer's reported base quality scores, which would otherwise inflate false-positive calls.
  • Produce per-sample GVCFs to enable efficient joint genotyping as cohort size grows.
  • Use VQSR only when you have enough variants/samples; otherwise apply hard filters.
  • Document GATK, Picard, and reference resource versions for reproducibility and audits.
  • Filter somatic calls against a panel of normals (PON) and apply contamination estimates to reduce artifacts.
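The panel-of-normals and contamination practice maps onto a short chain of GATK4 tools: Mutect2 with a germline resource and PON, GetPileupSummaries on tumor and normal, CalculateContamination, then FilterMutectCalls. The sketch below only builds the argument lists; the function name and output file names are hypothetical, and you must supply the inputs yourself (`germline_resource`, for example a gnomAD allele-frequency VCF; `common_sites`, a VCF of common biallelic germline variants; and a prebuilt PON). FilterMutectCalls also reads the `.stats` file that Mutect2 writes next to its output VCF, so keep the two together.

```python
from pathlib import Path


def somatic_pair_commands(tumor_bam: Path, normal_bam: Path, normal_sample: str,
                          ref: Path, germline_resource: Path, pon: Path,
                          common_sites: Path, outdir: Path) -> list[list[str]]:
    unfiltered = outdir / "somatic.unfiltered.vcf.gz"
    tumor_pileups = outdir / "tumor.pileups.table"
    normal_pileups = outdir / "normal.pileups.table"
    contamination = outdir / "contamination.table"
    filtered = outdir / "somatic.filtered.vcf.gz"

    mutect2 = ["gatk", "Mutect2", "-R", str(ref),
               "-I", str(tumor_bam), "-I", str(normal_bam),
               # Must match the SM tag in the normal BAM's read groups
               "-normal", normal_sample,
               "--germline-resource", str(germline_resource),
               "--panel-of-normals", str(pon),
               "-O", str(unfiltered)]

    def pileups(bam: Path, out: Path) -> list[str]:
        # Summarize allele counts at common germline sites for contamination estimation
        return ["gatk", "GetPileupSummaries", "-I", str(bam),
                "-V", str(common_sites), "-L", str(common_sites), "-O", str(out)]

    contam = ["gatk", "CalculateContamination", "-I", str(tumor_pileups),
              "-matched", str(normal_pileups), "-O", str(contamination)]

    filt = ["gatk", "FilterMutectCalls", "-R", str(ref), "-V", str(unfiltered),
            "--contamination-table", str(contamination), "-O", str(filtered)]

    return [mutect2, pileups(tumor_bam, tumor_pileups),
            pileups(normal_bam, normal_pileups), contam, filt]
```

The two GetPileupSummaries commands are independent of Mutect2 and can run in parallel with it; only FilterMutectCalls needs both branches to finish.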

Example use cases

  • Running a population study: generate GVCFs for 1,000 samples and perform joint genotyping to produce a high-quality cohort VCF.
  • Clinical pipeline: standardized preprocessing and germline calling for rare-disease diagnostics with traceable resource versions.
  • Tumor profiling: Mutect2 tumor-normal calling with PON and contamination filtering for actionable somatic variant detection.
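For a cohort on the scale of the population-study example, a common pattern is to consolidate per-sample GVCFs with GenomicsDBImport and then run GenotypeGVCFs against the resulting workspace. A minimal sketch follows. The sample-map layout (one tab-separated sample name and GVCF path per line) is what `--sample-name-map` expects; the function names, the single-interval shard, and the default batch size of 50 are illustrative assumptions.

```python
from pathlib import Path


def write_sample_map(gvcfs: dict[str, Path], map_path: Path) -> None:
    # --sample-name-map expects one "sample<TAB>path" line per sample
    lines = [f"{sample}\t{path}" for sample, path in sorted(gvcfs.items())]
    map_path.write_text("\n".join(lines) + "\n")


def joint_genotyping_commands(map_path: Path, workspace: Path, interval: str,
                              ref: Path, out_vcf: Path,
                              batch_size: int = 50) -> list[list[str]]:
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--sample-name-map", str(map_path),
                  "--genomicsdb-workspace-path", str(workspace),
                  "-L", interval,
                  # Read GVCFs in batches to bound memory use for large cohorts
                  "--batch-size", str(batch_size)]
    genotype_cmd = ["gatk", "GenotypeGVCFs", "-R", str(ref),
                    "-V", f"gendb://{workspace}", "-O", str(out_vcf)]
    return [import_cmd, genotype_cmd]
```

In practice both steps are usually scattered across many intervals (for example, one workspace per chromosome) and the per-interval VCFs are gathered afterwards.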

FAQ

What dependencies are required?

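GATK4 and Picard are required. You also need the reference FASTA (with its .fai index and .dict sequence dictionary), known-sites VCFs, and resource bundles used in the GATK Best Practices documentation, all built against the same reference genome.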
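A pre-flight check can catch missing dependencies before a long run starts. This sketch is hypothetical: it checks that `gatk` and `java` are on `PATH` (GATK4 runs on the JVM and exposes Picard tools such as MarkDuplicates through the `gatk` launcher; add `"picard"` to `tools` only if your installation provides a standalone wrapper) and that an uncompressed reference FASTA has its `.fai` index and `.dict` dictionary beside it.

```python
import shutil
from pathlib import Path


def missing_prerequisites(ref: Path, tools: tuple[str, ...] = ("gatk", "java")) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"executable not on PATH: {t}" for t in tools if shutil.which(t) is None]
    # GATK expects ref.fasta.fai next to the FASTA and ref.dict named after its stem
    fai = ref.with_name(ref.name + ".fai")
    dict_file = ref.with_suffix(".dict")
    for companion in (ref, fai, dict_file):
        if not companion.exists():
            problems.append(f"missing file: {companion}")
    return problems
```

Returning a list rather than raising on the first problem lets the skill report every missing item in one pass.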

When should I use VQSR versus hard filters?

Use VQSR when you have both a sufficiently large cohort and appropriate truth/training resources; use carefully tuned hard filters for small datasets, where VQSR is unreliable.
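The VQSR-versus-hard-filter decision can be encoded as a simple rule with a hard-filter fallback. In this sketch the 30-sample cutoff is an assumed, tunable default rather than a fixed GATK rule, and the function names are hypothetical. The SNP expressions are the commonly cited starting thresholds from GATK's hard-filtering documentation and should be tuned to your data. Indels need their own thresholds and a separate `SelectVariants -select-type INDEL` pass, which this sketch omits.

```python
from pathlib import Path

# Starting SNP thresholds from GATK's hard-filtering guidance; tune for your data.
SNP_HARD_FILTERS = {
    "QD2": "QD < 2.0",
    "QUAL30": "QUAL < 30.0",
    "SOR3": "SOR > 3.0",
    "FS60": "FS > 60.0",
    "MQ40": "MQ < 40.0",
    "MQRankSum-12.5": "MQRankSum < -12.5",
    "ReadPosRankSum-8": "ReadPosRankSum < -8.0",
}


def choose_filtering(sample_count: int, has_training_resources: bool,
                     min_samples_for_vqsr: int = 30) -> str:
    """Return "VQSR" or "hard-filter"; the 30-sample cutoff is an assumed default."""
    if has_training_resources and sample_count >= min_samples_for_vqsr:
        return "VQSR"
    return "hard-filter"


def snp_hard_filter_commands(cohort_vcf: Path, outdir: Path) -> list[list[str]]:
    snps = outdir / "cohort.snps.vcf.gz"
    filtered = outdir / "cohort.snps.filtered.vcf.gz"
    select = ["gatk", "SelectVariants", "-V", str(cohort_vcf),
              "-select-type", "SNP", "-O", str(snps)]
    vf = ["gatk", "VariantFiltration", "-V", str(snps), "-O", str(filtered)]
    for name, expression in SNP_HARD_FILTERS.items():
        # Failing sites are kept but tagged with the filter name in the FILTER column
        vf += ["--filter-expression", expression, "--filter-name", name]
    return [select, vf]
```

Pairing each expression with its own `--filter-name` makes it visible in the output VCF which criterion a site failed, which helps when tuning thresholds.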