home / skills / gptomics / bioskills / methylation-pipeline

methylation-pipeline skill

/workflows/methylation-pipeline

This skill guides end-to-end bisulfite sequencing analysis from FASTQ to DMRs, coordinating Bismark, methylation calling, and methylKit workflows for robust

npx playbooks add skill gptomics/bioskills --skill methylation-pipeline

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
7.3 KB
---
name: bio-workflows-methylation-pipeline
description: End-to-end bisulfite sequencing workflow from FASTQ to differentially methylated regions. Covers Bismark alignment, methylation calling, and DMR detection with methylKit. Use when analyzing bisulfite sequencing data.
tool_type: mixed
primary_tool: Bismark
workflow: true
depends_on:
  - read-qc/fastp-workflow
  - methylation-analysis/bismark-alignment
  - methylation-analysis/methylation-calling
  - methylation-analysis/methylkit-analysis
  - methylation-analysis/dmr-detection
qc_checkpoints:
  - after_qc: "Q30 >80%, adapter content removed"
  - after_alignment: "Mapping efficiency >50%, bisulfite conversion >99%"
  - after_calling: "Coverage distribution reasonable, no biased positions"
---

## Version Compatibility

Reference examples tested with: Bismark 0.24+, Bowtie2 2.5.3+, FastQC 0.12+, Trim Galore 0.6.10+, fastp 0.23+, methylKit 1.28+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Methylation Pipeline

**"Analyze my bisulfite sequencing data from FASTQ to DMRs"** → Orchestrate Bismark alignment, methylation calling, methylKit analysis, DMR detection, annotation with genomic features, and visualization of methylation patterns.

Complete workflow from bisulfite sequencing FASTQ to differentially methylated regions.

## Workflow Overview

```
FASTQ files
    |
    v
[1. QC & Trimming] -----> fastp/Trim Galore
    |
    v
[2. Alignment] ---------> Bismark
    |
    v
[3. Deduplication] -----> deduplicate_bismark
    |
    v
[4. Methylation Calling] -> bismark_methylation_extractor
    |
    v
[5. Analysis] -----------> methylKit (R)
    |
    v
[6. DMR Detection] ------> methylKit/DSS
    |
    v
Differentially methylated regions
```

## Primary Path: Bismark + methylKit

### Step 1: Quality Control

```bash
# Trim Galore recommended for bisulfite data (handles adapter bias)
trim_galore --paired --fastqc \
    -o trimmed/ \
    sample_R1.fastq.gz sample_R2.fastq.gz

# Or fastp with conservative settings
fastp -i sample_R1.fastq.gz -I sample_R2.fastq.gz \
    -o trimmed/sample_R1.fq.gz -O trimmed/sample_R2.fq.gz \
    --detect_adapter_for_pe \
    --qualified_quality_phred 20 \
    --length_required 35 \
    --html qc/sample_fastp.html
```

### Step 2: Bismark Alignment

```bash
# Prepare genome (once)
bismark_genome_preparation --bowtie2 genome/

# Align
bismark --genome genome/ \
    -1 trimmed/sample_R1_val_1.fq.gz \
    -2 trimmed/sample_R2_val_2.fq.gz \
    -o aligned/ \
    --parallel 4 \
    --temp_dir tmp/

# Output: sample_R1_val_1_bismark_bt2_pe.bam
```

**QC Checkpoint:** Check Bismark report
- Mapping efficiency >50% (BS-seq has lower rates)
- Bisulfite conversion rate >99%

### Step 3: Deduplication

```bash
deduplicate_bismark \
    --bam \
    -p \
    -o deduplicated/ \
    aligned/sample_R1_val_1_bismark_bt2_pe.bam
```

### Step 4: Methylation Calling

```bash
bismark_methylation_extractor \
    --paired-end \
    --comprehensive \
    --bedGraph \
    --cytosine_report \
    --genome_folder genome/ \
    -o methylation/ \
    deduplicated/sample_R1_val_1_bismark_bt2_pe.deduplicated.bam

# Generate summary report
bismark2report
bismark2summary
```

### Step 5: Analysis with methylKit

```r
library(methylKit)

# Read methylation calls
files <- list(
    'methylation/control_1.CpG_report.txt',
    'methylation/control_2.CpG_report.txt',
    'methylation/treated_1.CpG_report.txt',
    'methylation/treated_2.CpG_report.txt'
)

sample_ids <- c('control_1', 'control_2', 'treated_1', 'treated_2')
treatment <- c(0, 0, 1, 1)

# Read cytosine reports
meth_obj <- methRead(
    location = as.list(files),
    sample.id = as.list(sample_ids),
    assembly = 'hg38',
    treatment = treatment,
    context = 'CpG',
    pipeline = 'bismarkCytosineReport'
)

# Filter by coverage
meth_filtered <- filterByCoverage(meth_obj, lo.count = 10, hi.perc = 99.9)

# Normalize coverage
meth_norm <- normalizeCoverage(meth_filtered)

# Merge samples (keep sites covered in all)
meth_merged <- unite(meth_norm, destrand = TRUE)

# Sample statistics
getMethylationStats(meth_obj[[1]], plot = TRUE)
getCoverageStats(meth_obj[[1]], plot = TRUE)
```

### Step 6: DMR Detection

```r
# Calculate differential methylation (per CpG)
diff_meth <- calculateDiffMeth(meth_merged)

# Get significant DMCs
dmc <- getMethylDiff(diff_meth, difference = 25, qvalue = 0.01)

# Tile into regions (DMRs)
tiles <- tileMethylCounts(meth_merged, win.size = 1000, step.size = 1000)
diff_tiles <- calculateDiffMeth(tiles)
dmr <- getMethylDiff(diff_tiles, difference = 25, qvalue = 0.01)

# Export
write.csv(as.data.frame(dmc), 'dmc_results.csv')
write.csv(as.data.frame(dmr), 'dmr_results.csv')

# Annotate with genomic features
library(genomation)
gene_obj <- readTranscriptFeatures('genes.bed')
annotateWithGeneParts(as(dmr, 'GRanges'), gene_obj)
```

## Parameter Recommendations

| Step | Parameter | Value |
|------|-----------|-------|
| Trim Galore | default | Recommended for BS-seq |
| Bismark | --parallel | 4 (per sample parallelization) |
| methylKit | lo.count | 10 (minimum coverage) |
| methylKit | difference | 25 (% methylation difference) |
| methylKit | qvalue | 0.01 |
| DMR tiles | win.size | 500-1000 bp |

## Troubleshooting

| Issue | Likely Cause | Solution |
|-------|--------------|----------|
| Low mapping rate | Normal for BS-seq | Expect 40-70% |
| Low conversion | Failed bisulfite treatment | Check spike-in controls |
| Few DMRs | Low coverage, small differences | Increase sequencing, relax thresholds |
| Biased positions | M-bias | Trim 10bp from read ends |

## Complete Pipeline Script

```bash
#!/bin/bash
set -e

THREADS=4
GENOME="genome/"
SAMPLES="control_1 control_2 treated_1 treated_2"
OUTDIR="methylation_results"

mkdir -p ${OUTDIR}/{trimmed,aligned,deduplicated,methylation,qc}

# Step 1: QC
for sample in $SAMPLES; do
    trim_galore --paired --fastqc -o ${OUTDIR}/trimmed/ \
        ${sample}_R1.fastq.gz ${sample}_R2.fastq.gz
done

# Step 2: Alignment
for sample in $SAMPLES; do
    bismark --genome ${GENOME} \
        -1 ${OUTDIR}/trimmed/${sample}_R1_val_1.fq.gz \
        -2 ${OUTDIR}/trimmed/${sample}_R2_val_2.fq.gz \
        -o ${OUTDIR}/aligned/ \
        --parallel ${THREADS} --temp_dir tmp/
done

# Step 3: Deduplication
for sample in $SAMPLES; do
    deduplicate_bismark --bam -p \
        -o ${OUTDIR}/deduplicated/ \
        ${OUTDIR}/aligned/${sample}_R1_val_1_bismark_bt2_pe.bam
done

# Step 4: Methylation calling
for sample in $SAMPLES; do
    bismark_methylation_extractor --paired-end --comprehensive \
        --bedGraph --cytosine_report \
        --genome_folder ${GENOME} \
        -o ${OUTDIR}/methylation/ \
        ${OUTDIR}/deduplicated/${sample}_R1_val_1_bismark_bt2_pe.deduplicated.bam
done

bismark2report
echo "Pipeline complete. Run R script for DMR analysis."
```

## Related Skills

- methylation-analysis/bismark-alignment - Bismark parameters
- methylation-analysis/methylation-calling - Calling details
- methylation-analysis/methylkit-analysis - methylKit functions
- methylation-analysis/dmr-detection - DMR algorithms

Overview

This skill packages an end-to-end bisulfite sequencing workflow from raw FASTQ files to differentially methylated regions (DMRs). It orchestrates adapter trimming/QC, Bismark alignment, deduplication, methylation calling, and downstream DMR detection and annotation with methylKit and genomation. Designed for practical, reproducible processing of BS-seq datasets.

How this skill works

The workflow runs QC and trimming (Trim Galore or fastp), aligns reads with Bismark (Bowtie2 backend), removes PCR duplicates, and calls methylation with bismark_methylation_extractor. Output cytosine reports are imported into methylKit in R for coverage filtering, normalization, per-CpG differential testing, tiling into regions, DMR calling, and annotation. The skill includes recommended parameters, example scripts, and checkpoints for conversion and mapping efficiency.

When to use it

  • You have paired-end bisulfite sequencing FASTQ files and need reproducible DMRs.
  • You want a tested pipeline combining Bismark alignment with methylKit statistical analysis.
  • You need end-to-end automation from trimming through annotation and export of DMR/ DMC tables.
  • You require recommended parameters and troubleshooting tips for common BS‑seq issues.

Best practices

  • Run bismark_genome_preparation once per reference genome and confirm Bowtie2 versions match examples.
  • Use Trim Galore for bisulfite-specific trimming; fastp is acceptable with conservative settings.
  • Check Bismark reports: mapping efficiency (expect 40–70%) and bisulfite conversion (>99%) before downstream analysis.
  • Filter coverage (lo.count ~10, hi.perc ~99.9) and normalize coverage in methylKit to reduce bias.
  • Use sensible DMR thresholds (e.g., difference 25%, qvalue 0.01) and test window sizes (500–1000 bp) for tile-based DMR detection.

Example use cases

  • Process a 4-sample case/control BS-seq experiment from FASTQ to annotated DMR CSVs.
  • Quickly validate bisulfite conversion and mapping efficiency across a sequencing run before committing to deep analysis.
  • Generate per-CpG methylation statistics and coverage plots for QC and reporting.
  • Tile methylation profiles into 1 kb windows to detect broader regulatory region changes between conditions.
  • Annotate discovered DMRs with gene parts to prioritize candidates for functional follow-up.

FAQ

What coverage thresholds should I use?

A common starting point is lo.count=10 and hi.perc=99.9 in methylKit, then adjust by data quality and sequencing depth.

My mapping rate is low — is that normal for BS‑seq?

Yes. Bisulfite conversion reduces complexity; expect 40–70% mapping. Check conversion controls and trimming (M-bias) before changing aligner settings.