home / skills / gptomics / bioskills / nucleosome-positioning
This skill helps identify nucleosome positions and occupancy from ATAC-seq data by integrating NucleoATAC, ATACseqQC, and fragment analysis.
npx playbooks add skill gptomics/bioskills --skill nucleosome-positioningReview the files below or copy the command above to add this skill to your agents.
---
name: bio-atac-seq-nucleosome-positioning
description: Extract nucleosome positions from ATAC-seq data using NucleoATAC, ATACseqQC, and fragment analysis. Use when analyzing chromatin organization, identifying nucleosome-free regions at promoters, or characterizing nucleosome occupancy patterns from ATAC-seq fragment size distributions.
tool_type: mixed
primary_tool: NucleoATAC
---
## Version Compatibility
Reference examples tested with: Rsamtools 2.18+, matplotlib 3.8+, numpy 1.26+, pyBigWig 0.3+, pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python: `pip show <package>` then `help(module.function)` to check signatures
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters
- CLI: `<tool> --version` then `<tool> --help` to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
# Nucleosome Positioning
**"Map nucleosome positions from ATAC-seq"** → Separate nucleosome-free and mono-nucleosome fragments by size, then call nucleosome center positions and occupancy scores.
- CLI: `nucleoatac run --bed peaks.bed --bam atac.bam --fasta ref.fa`
- R: `ATACseqQC::splitGAlignmentsByCut()` for fragment separation
Extract nucleosome positions and occupancy from ATAC-seq fragment size patterns.
## Background
ATAC-seq fragments reflect chromatin structure:
- **< 100 bp**: Nucleosome-free regions (NFR)
- **180-247 bp**: Mono-nucleosome
- **315-473 bp**: Di-nucleosome
- **558-615 bp**: Tri-nucleosome
## ATACseqQC (R)
### Installation
```r
BiocManager::install('ATACseqQC')
```
### Fragment Size Distribution
```r
library(ATACseqQC)
library(Rsamtools)
# Read BAM
bamfile <- 'sample.bam'
# Fragment size distribution
fragSize <- fragSizeDist(bamfile, 'sample')
# Nucleosome-free and mono-nucleosome ratios
# Automatic QC metrics
```
### Nucleosome Positioning
**Goal:** Map nucleosome positions around TSS using ATAC-seq fragment size classes.
**Approach:** Read BAM, apply Tn5 shift correction, split fragments into NFR and mono-nucleosome classes by size, then compute signal profiles around TSS.
```r
library(ATACseqQC)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(BSgenome.Hsapiens.UCSC.hg38)
# Get TSS regions
txs <- transcripts(TxDb.Hsapiens.UCSC.hg38.knownGene)
tss <- promoters(txs, upstream=1000, downstream=1000)
# Read BAM
gal <- readBamFile(bamfile, asMates=TRUE, bigFile=TRUE)
# Shift reads (Tn5 offset correction)
gal_shifted <- shiftGAlignmentsList(gal)
# Split by nucleosome-free and nucleosomal
objs <- splitGAlignmentsByCut(gal_shifted, txs=txs,
genome=BSgenome.Hsapiens.UCSC.hg38)
# nucleosome-free fragments
nfr <- objs$NussomeFree
# Mono-nucleosome fragments
mono <- objs$mononucleosome
# Signal around TSS
sigs <- featureAlignedSignal(cvglist=objs,
feature.gr=tss,
upstream=1000,
downstream=1000)
```
### V-Plot (Fragment Size vs Position)
```r
# V-plot showing nucleosome positioning around TSS
vp <- vPlot(gal_shifted, tss,
genome=BSgenome.Hsapiens.UCSC.hg38,
upstream=1000, downstream=1000)
```
### Footprinting
```r
# Transcription factor footprinting
library(MotifDb)
# Get motif
motif <- query(MotifDb, 'CTCF')[[1]]
# Find motif occurrences
library(motifmatchr)
motif_pos <- matchMotifs(motif, BSgenome.Hsapiens.UCSC.hg38,
genome='hg38', out='positions')
# Calculate footprint
fp <- factorFootprints(gal_shifted, motif_pos,
genome=BSgenome.Hsapiens.UCSC.hg38,
upstream=100, downstream=100)
```
## NucleoATAC (Python)
### Installation
```bash
pip install nucleoatac
```
### Run NucleoATAC
**Goal:** Call precise nucleosome center positions and occupancy scores from ATAC-seq data.
**Approach:** Run NucleoATAC on defined genomic regions with a reference genome, producing nucleosome position calls and occupancy tracks.
```bash
# Call nucleosomes
nucleoatac run --bed regions.bed --bam sample.bam --fasta reference.fa \
--out nucleoatac_output --cores 8
```
### Output Files
| File | Description |
|------|-------------|
| `.nucpos.bed` | Nucleosome positions |
| `.nucpos.redundant.bed` | All nucleosome calls |
| `.nfrpos.bed` | NFR positions |
| `.occ.bedgraph` | Nucleosome occupancy track |
| `.nucmap_combined.bed` | Combined nucleosome map |
### Visualize Output
```bash
# Convert to bigWig for visualization
bedGraphToBigWig nucleoatac_output.occ.bedgraph chrom.sizes nucleosome_occ.bw
```
## Fragment Analysis (Custom)
### Extract Fragment Sizes
**Goal:** Visualize ATAC-seq fragment size distribution to assess nucleosome periodicity.
**Approach:** Extract template lengths from properly paired reads, then plot the histogram with NFR and mono-nucleosome cutoff markers.
```python
import pysam
import numpy as np
import matplotlib.pyplot as plt
bam = pysam.AlignmentFile('sample.bam', 'rb')
fragment_sizes = []
for read in bam.fetch():
if read.is_proper_pair and read.is_read1:
frag_size = abs(read.template_length)
if 0 < frag_size < 1000:
fragment_sizes.append(frag_size)
bam.close()
# Plot distribution
plt.figure(figsize=(10, 6))
plt.hist(fragment_sizes, bins=200, edgecolor='none', alpha=0.7)
plt.axvline(100, color='red', linestyle='--', label='NFR cutoff')
plt.axvline(180, color='blue', linestyle='--', label='Mono-nuc start')
plt.xlabel('Fragment Size (bp)')
plt.ylabel('Count')
plt.legend()
plt.savefig('fragment_distribution.png', dpi=300)
```
### Split by Fragment Size
```bash
# Extract nucleosome-free reads
samtools view -h sample.bam | \
awk '$9 > -100 && $9 < 100 || $1 ~ /^@/' | \
samtools view -b > nfr.bam
# Extract mono-nucleosome reads
samtools view -h sample.bam | \
awk '($9 >= 180 && $9 <= 247) || ($9 <= -180 && $9 >= -247) || $1 ~ /^@/' | \
samtools view -b > mono_nuc.bam
```
### Signal Around Features
```python
import pysam
import numpy as np
import pyBigWig
def signal_around_sites(bam_file, sites, upstream=1000, downstream=1000):
bam = pysam.AlignmentFile(bam_file, 'rb')
window_size = upstream + downstream
signal = np.zeros(window_size)
for chrom, pos, strand in sites:
start = pos - upstream if strand == '+' else pos - downstream
end = pos + downstream if strand == '+' else pos + upstream
for read in bam.fetch(chrom, max(0, start), end):
if read.is_proper_pair and read.is_read1:
frag_center = read.reference_start + abs(read.template_length) // 2
rel_pos = frag_center - start
if 0 <= rel_pos < window_size:
signal[rel_pos] += 1
bam.close()
return signal / len(sites)
# Load TSS sites
tss_sites = [] # Load from GTF
nfr_signal = signal_around_sites('nfr.bam', tss_sites)
mono_signal = signal_around_sites('mono_nuc.bam', tss_sites)
```
## DANPOS
### Installation
```bash
conda install -c bioconda danpos
```
### Run DANPOS
```bash
# Single sample
danpos.py dpos sample.bam -o danpos_output
# Compare conditions
danpos.py dpeak -b treatment.bam -c control.bam -o danpos_diff
```
## Complete Workflow
**Goal:** Run end-to-end nucleosome positioning analysis from BAM to heatmaps and V-plots.
**Approach:** Read BAM, shift reads for Tn5 offset, split fragments by size class, compute signal profiles around TSS, and generate heatmaps and V-plots.
```r
library(ATACseqQC)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(BSgenome.Hsapiens.UCSC.hg38)
bamfile <- 'sample.bam'
# 1. Fragment size QC
fragSize <- fragSizeDist(bamfile, 'sample')
pdf('fragment_size.pdf')
plot(fragSize)
dev.off()
# 2. Read and shift
gal <- readBamFile(bamfile, asMates=TRUE, bigFile=TRUE)
gal_shifted <- shiftGAlignmentsList(gal)
# 3. Get TSS regions
txs <- transcripts(TxDb.Hsapiens.UCSC.hg38.knownGene)
tss <- promoters(txs, upstream=2000, downstream=2000)
# 4. Split by fragment size
objs <- splitGAlignmentsByCut(gal_shifted, txs=txs,
genome=BSgenome.Hsapiens.UCSC.hg38)
# 5. Calculate signals
sigs <- featureAlignedSignal(cvglist=objs,
feature.gr=tss,
upstream=2000,
downstream=2000)
# 6. Plot heatmap
pdf('nucleosome_heatmap.pdf', width=8, height=10)
featureAlignedHeatmap(sigs, tss, upstream=2000, downstream=2000)
dev.off()
# 7. V-plot
pdf('vplot.pdf')
vPlot(gal_shifted, tss, genome=BSgenome.Hsapiens.UCSC.hg38,
upstream=1000, downstream=1000)
dev.off()
# 8. Export nucleosome-free and nucleosomal BAMs
export(objs$NuclsomeFree, 'nfr.bam')
export(objs$mononucleosome, 'mono_nucleosome.bam')
```
## Related Skills
- atac-seq/atac-peak-calling - Call accessibility peaks
- atac-seq/atac-qc - Quality control metrics
- atac-seq/footprinting - TF footprinting
- chip-seq/peak-annotation - Annotate nucleosome positions
This skill extracts nucleosome positions and occupancy from ATAC-seq data using NucleoATAC, ATACseqQC, fragment-size analysis, and optional DANPOS. It provides end-to-end patterns: fragment QC, Tn5 shift correction, fragment-class splitting (NFR, mono-, di-nucleosome), nucleosome calling, and visualization (V-plots, heatmaps, occupancy tracks). Practical CLI, R, and Python snippets guide conversion from BAM to bed/bigWig outputs for downstream analysis and genome browser inspection.
The workflow inspects paired-end ATAC-seq BAM files to compute fragment size distributions, applies Tn5 offset correction, and divides fragments into size classes (e.g., <100 bp NFR, 180–247 bp mono-nucleosome). NucleoATAC is used to call precise nucleosome centers and occupancy scores on target regions; ATACseqQC and custom Python code generate feature-aligned signals, V-plots, and heatmaps. Outputs include nucpos beds, occupancy bedGraph/bigWig, and class-specific BAMs for visualization and comparative analysis.
What fragment size ranges should I use for nucleosome classes?
Typical cutoffs are <100 bp for NFR, 180–247 bp for mono-nucleosome, ~315–473 bp for di-nucleosome; inspect your fragment histogram and adjust as needed.
When should I run NucleoATAC versus ATACseqQC functions?
Use ATACseqQC and fragment analysis for QC, Tn5 shift, and feature-aligned signals. Run NucleoATAC to call precise nucleosome centers and occupancy on defined regions for high-resolution maps.