longread-sv-pipeline skill

This skill guides end-to-end long-read SV analysis by aligning reads, calling SVs with sniffles or cuteSV, filtering, and annotating results.

npx playbooks add skill gptomics/bioskills --skill longread-sv-pipeline

---
name: bio-workflows-longread-sv-pipeline
description: End-to-end workflow for detecting structural variants from long-read sequencing data. Covers ONT/PacBio alignment with minimap2 and SV calling with Sniffles or cuteSV. Use when detecting structural variants from long reads.
tool_type: cli
primary_tool: Sniffles
workflow: true
depends_on:
  - long-read-sequencing/long-read-alignment
  - long-read-sequencing/long-read-qc
  - long-read-sequencing/structural-variants
qc_checkpoints:
  - after_qc: "Read N50 >10kb, quality score >Q10"
  - after_alignment: "Mapping rate >90%, coverage sufficient"
  - after_calling: "SV count reasonable, genotypes concordant"
---

## Version Compatibility

Reference examples tested with: bcftools 1.19+, minimap2 2.26+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:
- CLI: `<tool> --version` then `<tool> --help` to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
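
The version check described above can be scripted before the pipeline starts. The following is a minimal sketch and not part of the original skill: `missing_tools` and `version_ge` are hypothetical helper names, and `version_ge` assumes a `sort` that supports `-V` (GNU coreutils).

```bash
# Print the name of every requested tool that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed when version $1 is greater than or equal to version $2.
# Assumption: sort supports -V (version sort), as GNU sort does.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `missing_tools minimap2 samtools bcftools sniffles` prints nothing when all four are installed, and `version_ge "$(samtools --version | awk 'NR==1 {print $2}')" 1.19` succeeds when the installed samtools is at least 1.19.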

# Long-Read SV Pipeline

**"Detect structural variants from my long-read sequencing data"** → Orchestrate read QC (NanoPlot), minimap2 alignment, SV calling (Sniffles2 or cuteSV), bcftools filtering, and optional AnnotSV annotation for ONT or PacBio data.

## Workflow Overview

```
Long reads (ONT/PacBio)
    |
    v
[1. QC] ----------------> NanoPlot
    |
    v
[2. Alignment] ---------> minimap2
    |
    v
[3. SV Calling] --------> Sniffles / cuteSV
    |
    v
[4. Filtering] ---------> bcftools
    |
    v
[5. Annotation] --------> AnnotSV (optional)
    |
    v
Filtered SV VCF
```

## Primary Path: minimap2 + Sniffles

### Step 1: Quality Control

```bash
# ONT reads QC
NanoPlot --fastq reads.fastq.gz \
    --outdir nanoplot_output \
    --threads 8

# Check key metrics
# - Read N50 should be >10kb
# - Mean quality >Q10
# - Total bases sufficient for coverage
```
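
The N50 and coverage targets in the comments above can also be checked without NanoPlot. This is a rough sketch using plain awk, under two assumptions: the FASTQ uses four-line records (no wrapped sequences), and the genome size is supplied by the user. `read_n50` and `estimated_depth` are hypothetical helper names.

```bash
# Read N50: sort read lengths in descending order and report the length
# at which the running total first reaches half of all bases.
read_n50() {
    awk 'NR % 4 == 2 { print length($0) }' |
        sort -rn |
        awk '{ len[NR] = $1; total += $1 }
             END {
                 half = total / 2
                 for (i = 1; i <= NR; i++) {
                     acc += len[i]
                     if (acc >= half) { print len[i]; exit }
                 }
             }'
}

# Expected depth: total sequenced bases divided by genome size ($1, in bp).
estimated_depth() {
    awk -v g="$1" 'NR % 4 == 2 { bases += length($0) }
                   END { printf "%.1f\n", bases / g }'
}
```

Both functions read uncompressed FASTQ from stdin, for example `gzip -dc reads.fastq.gz | read_n50` or `gzip -dc reads.fastq.gz | estimated_depth 3100000000` for a genome of roughly 3.1 Gb.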

### Step 2: Alignment with minimap2

```bash
# Run only the block that matches your platform; each one writes aligned.bam.

# ONT reads
minimap2 -ax map-ont \
    -t 16 \
    --MD \
    -Y \
    reference.fa \
    reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam

# PacBio HiFi
minimap2 -ax map-hifi \
    -t 16 \
    --MD \
    -Y \
    reference.fa \
    reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam

# PacBio CLR
minimap2 -ax map-pb \
    -t 16 \
    --MD \
    -Y \
    reference.fa \
    reads.fastq.gz | \
    samtools sort -@ 4 -o aligned.bam

# Index the BAM produced above
samtools index aligned.bam
```
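
Because the three alignment commands differ only in the preset, the choice can be wrapped in a small helper. This is an illustrative sketch: the function name `minimap2_preset` and the platform keywords `ont`, `hifi`, and `clr` are assumptions made here, not part of minimap2.

```bash
# Map a sequencing platform keyword to the minimap2 preset used above.
minimap2_preset() {
    case "$1" in
        ont)  echo "map-ont" ;;
        hifi) echo "map-hifi" ;;
        clr)  echo "map-pb" ;;
        *)    echo "unknown platform: $1 (expected ont, hifi, or clr)" >&2
              return 1 ;;
    esac
}
```

It would be used as `minimap2 -ax "$(minimap2_preset hifi)" -t 16 --MD -Y reference.fa reads.fastq.gz | samtools sort -@ 4 -o aligned.bam`. An unrecognized platform returns a non-zero status so a script can stop early.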

**QC Checkpoint:** Check alignment stats
```bash
samtools flagstat aligned.bam
samtools depth -a aligned.bam | awk '{sum+=$3} END {print "Average coverage:",sum/NR}'
```
- Mapping rate >90%
- Average coverage >10x for SV calling (>20x preferred)
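
The mapping-rate checkpoint can be turned into an automatic gate by parsing the `samtools flagstat` text. The sketch below assumes the usual `N + M mapped (P% : N/A)` line format; `mapping_rate` and `passes_mapping_threshold` are hypothetical helper names.

```bash
# Extract the overall mapped percentage from flagstat text on stdin.
mapping_rate() {
    awk '/^[0-9]+ \+ [0-9]+ mapped \(/ {
             s = $0
             sub(/^[^(]*\(/, "", s)   # drop everything up to the "("
             sub(/%.*$/, "", s)       # drop the "%" and the rest
             print s
             exit
         }'
}

# Succeed when rate $1 (percent) is above the minimum $2 (default 90).
passes_mapping_threshold() {
    awk -v rate="$1" -v min="${2:-90}" 'BEGIN { exit !(rate + 0 > min + 0) }'
}
```

A typical call would be `rate=$(samtools flagstat aligned.bam | mapping_rate)` followed by `passes_mapping_threshold "$rate" || echo "Mapping rate too low: ${rate}%"`.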

### Step 3: SV Calling with Sniffles

```bash
# Sniffles2 (recommended)
sniffles \
    --input aligned.bam \
    --vcf svs.vcf.gz \
    --reference reference.fa \
    --threads 8 \
    --minsvlen 50

# With tandem repeat annotations (recommended)
sniffles \
    --input aligned.bam \
    --vcf svs.vcf.gz \
    --reference reference.fa \
    --tandem-repeats tandem_repeats.bed \
    --threads 8
```

### Alternative: cuteSV

```bash
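# cuteSV (faster, good for ONT)
# The working directory must exist before cuteSV runs
mkdir -p work_dir
cuteSV \
    aligned.bam \
    reference.fa \
    svs.vcf \
    work_dir/ \
    --threads 8 \
    --min_size 50 \
    --genotype

# Compress and index the VCF for bcftools
bgzip svs.vcf
tabix -p vcf svs.vcf.gz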
```

### Step 4: Filtering

```bash
# Filter by quality and size. Records with no SVLEN tag (often BND breakends)
# fail the ABS(SVLEN) test and are excluded.
bcftools view -i 'QUAL>=20 && ABS(SVLEN)>=50' svs.vcf.gz -Oz -o svs.filtered.vcf.gz

# Filter by SV type
bcftools view -i 'SVTYPE="DEL" || SVTYPE="INS"' svs.filtered.vcf.gz -Oz -o del_ins.vcf.gz

# Filter by genotype
bcftools view -i 'GT="1/1" || GT="0/1"' svs.filtered.vcf.gz -Oz -o genotyped.vcf.gz

# Stats
bcftools stats svs.filtered.vcf.gz > sv_stats.txt
```
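
To judge whether the SV count is reasonable after filtering, a per-type tally is often easier to read than the full `bcftools stats` report. The following sketch counts records by their `SVTYPE` INFO tag using awk. `count_svtypes` is a hypothetical helper name, and the input is assumed to be uncompressed VCF text on stdin.

```bash
# Tally VCF records by SVTYPE; records without the tag are counted as "NA".
count_svtypes() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            type = "NA"
            n = split($8, info, ";")           # column 8 is INFO
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^SVTYPE=/) { type = substr(info[i], 8) }
            }
            count[type]++
        }
        END { for (t in count) printf "%s\t%d\n", t, count[t] }
    ' | sort
}
```

For example, `bcftools view svs.filtered.vcf.gz | count_svtypes` prints one tab-separated line per SV type. An unusually large BND share may point to the mapping artifacts mentioned under Troubleshooting.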

### Step 5: Annotation (Optional)

```bash
# AnnotSV for gene/clinical annotations
AnnotSV -SVinputFile svs.filtered.vcf.gz \
    -outputFile annotated_svs \
    -genomeBuild GRCh38
```

## Multi-Sample SV Calling

```bash
# Call SVs per sample
for sample in sample1 sample2 sample3; do
    sniffles --input ${sample}.bam \
        --snf ${sample}.snf \
        --reference reference.fa
done

# Merge and joint genotype
sniffles --input sample1.snf sample2.snf sample3.snf \
    --vcf merged_svs.vcf.gz \
    --reference reference.fa
```
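
After merging, it can help to see how many samples carry each SV. The sketch below counts non-reference genotypes per record in a multi-sample VCF. `carrier_counts` is a hypothetical helper name, and it assumes biallelic records with GT as the first FORMAT field and uncompressed VCF text on stdin.

```bash
# For each record, print its ID and the number of samples whose genotype
# contains the alternate allele (for example 0/1 or 1/1).
carrier_counts() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            carriers = 0
            for (i = 10; i <= NF; i++) {       # sample columns start at 10
                split($i, f, ":")              # f[1] is the GT field
                if (f[1] ~ /1/) carriers++
            }
            printf "%s\t%d\n", $3, carriers
        }'
}
```

It would be run as `bcftools view merged_svs.vcf.gz | carrier_counts`. Records carried by a single sample are candidates for closer inspection.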

## Parameter Recommendations

| Tool | Parameter | ONT | PacBio HiFi |
|------|-----------|-----|-------------|
| minimap2 | -ax | map-ont | map-hifi |
| Sniffles | --minsvlen | 50 | 50 |
| Sniffles | --minsupport | auto | auto |
| cuteSV | --min_size | 50 | 50 |
| cuteSV | --min_support | 3 | 3 |

## SV Types Detected

| Type | Abbreviation | Description |
|------|--------------|-------------|
| Deletion | DEL | Sequence removed |
| Insertion | INS | Sequence added |
| Duplication | DUP | Sequence copied |
| Inversion | INV | Sequence reversed |
| Translocation | BND | Breakend (interchromosomal) |

## Troubleshooting

| Issue | Likely Cause | Solution |
|-------|--------------|----------|
| Few SVs | Low coverage | Increase sequencing depth |
| Many false positives | Low quality reads | Filter by QUAL, increase min support |
| Missing known SV | Repeat region | Use tandem repeat annotations |
| High breakend count | Mapping artifacts | Check alignment quality |

## Complete Pipeline Script

```bash
#!/bin/bash
# Exit on errors, unset variables, and failures anywhere in a pipe
set -euo pipefail

THREADS=16
READS="reads.fastq.gz"
REF="reference.fa"
SAMPLE="sample1"
OUTDIR="sv_results"

mkdir -p ${OUTDIR}/{qc,aligned,sv}

# Step 1: QC
echo "=== QC ==="
NanoPlot --fastq ${READS} --outdir ${OUTDIR}/qc -t ${THREADS}

# Step 2: Alignment
echo "=== Alignment ==="
minimap2 -ax map-ont -t ${THREADS} --MD -Y ${REF} ${READS} | \
    samtools sort -@ 4 -o ${OUTDIR}/aligned/${SAMPLE}.bam
samtools index ${OUTDIR}/aligned/${SAMPLE}.bam

echo "Alignment stats:"
samtools flagstat ${OUTDIR}/aligned/${SAMPLE}.bam

# Step 3: SV calling
echo "=== SV Calling ==="
sniffles --input ${OUTDIR}/aligned/${SAMPLE}.bam \
    --vcf ${OUTDIR}/sv/${SAMPLE}.vcf.gz \
    --reference ${REF} \
    --threads ${THREADS}

# Step 4: Filter
echo "=== Filtering ==="
bcftools view -i 'QUAL>=20' ${OUTDIR}/sv/${SAMPLE}.vcf.gz \
    -Oz -o ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz
bcftools index ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz

# Stats
bcftools stats ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz > ${OUTDIR}/sv/stats.txt

echo "=== Complete ==="
echo "SVs: $(bcftools view -H ${OUTDIR}/sv/${SAMPLE}.filtered.vcf.gz | wc -l)"
```

## Related Skills

- long-read-sequencing/long-read-alignment - minimap2 details
- long-read-sequencing/structural-variants - Sniffles, cuteSV options
- long-read-sequencing/long-read-qc - NanoPlot metrics
- variant-calling/structural-variant-calling - Short-read SV methods

Overview

This skill provides an end-to-end workflow to detect structural variants (SVs) from long-read sequencing data (ONT or PacBio). It orchestrates QC with NanoPlot, alignment with minimap2, SV calling with Sniffles or cuteSV, filtering with bcftools, optional annotation with AnnotSV, and summary statistics. The guidance focuses on practical commands, parameter recommendations, and troubleshooting checkpoints for reliable SV discovery.

How this skill works

The workflow runs NanoPlot for read-level QC, aligns reads to a reference using minimap2 with presets matched to platform, then calls SVs per-sample with Sniffles2 (recommended) or cuteSV. Resulting VCFs are filtered by quality, size, type, and genotype using bcftools, optionally annotated with AnnotSV, and merged across samples using Sniffles SNF merge for multi-sample analysis. Key checkpoints include mapping rate, coverage, and SV quality metrics.

When to use it

Detect structural variants from Nanopore (ONT) or PacBio reads
Single-sample SV discovery with short turnaround
Multi-sample joint SV discovery and genotyping
Projects needing annotation of SVs for gene/clinical interpretation
When caller comparison or merging across callers is required

Best practices

Run NanoPlot early and ensure read N50 and quality meet thresholds (e.g., N50>10 kb, mean Q>10)
Choose minimap2 preset by platform (map-ont, map-pb, map-hifi) and check mapping rate (>90%) and coverage (>10x; >20x preferred)
Prefer Sniffles2 for sensitivity and use tandem repeat BED input to reduce false positives in repetitive regions
Filter VCFs by QUAL, SVLEN, and genotype with bcftools (example: QUAL>=20 and ABS(SVLEN)>=50)
Keep tool versions consistent with tested examples (bcftools, minimap2, samtools) and adapt command flags if versions differ

Example use cases

Call and filter SVs from a single ONT run and produce an annotated VCF for interpretation
Process multiple PacBio samples, merge SNF files and produce a joint VCF for cohort analysis
Quick pipeline to screen for large insertions/deletions in a clinical research sample using Sniffles then AnnotSV
Compare cuteSV and Sniffles outputs to evaluate caller-specific SVs and merge high-confidence calls

FAQ

Which caller should I use: Sniffles or cuteSV?

Sniffles2 is recommended for overall sensitivity and multi-sample workflows; cuteSV is faster and performs well on ONT. Use both for caller comparison when needed.

What are minimum coverage and read-quality expectations?

Aim for average coverage >10x for basic SV calling and >20x for reliable detection. Read N50 >10 kb and mean read quality >Q10 reduce false positives.