home / skills / gptomics / bioskills / multiome-pipeline

multiome-pipeline skill

safe

This skill guides end-to-end joint scRNA-seq and scATAC-seq analysis using Seurat and Signac, enabling WNN integration and multiome clustering.

npx playbooks add skill gptomics/bioskills --skill multiome-pipeline

Review the files below or copy the command above to add this skill to your agents.

Files (3)

SKILL.md

8.3 KB

---
name: bio-workflows-multiome-pipeline
description: End-to-end multiome workflow for joint scRNA-seq + scATAC-seq analysis. Covers data loading, separate modality processing, and WNN integration with Seurat/Signac. Use when analyzing joint scRNA+scATAC data.
tool_type: r
primary_tool: Seurat
workflow: true
depends_on:
  - single-cell/data-io
  - single-cell/preprocessing
  - single-cell/clustering
  - single-cell/multimodal-integration
  - single-cell/scatac-analysis
qc_checkpoints:
  - after_loading: "Both modalities detected per cell"
  - after_rna_qc: "RNA quality filters passed"
  - after_atac_qc: "TSS enrichment >2, nucleosome signal <4"
  - after_wnn: "Joint embedding separates cell types"
---

## Version Compatibility

Reference examples tested with: ggplot2 3.5+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Multiome Pipeline

**"Analyze my 10X Multiome data jointly"** → Orchestrate Cell Ranger ARC processing, Seurat/Signac scRNA+scATAC integration via WNN, chromatin accessibility peak calling, motif enrichment, and gene regulatory network inference.

Complete workflow for 10X Multiome (joint scRNA + scATAC) analysis using Seurat and Signac.

## Workflow Overview

```
10X Multiome data
    |
    v
[1. Load Data] ---------> Read RNA + ATAC
    |
    v
[2. RNA Processing] ----> Standard scRNA workflow
    |
    v
[3. ATAC Processing] ---> Peak calling, LSI
    |
    v
[4. WNN Integration] ---> Weighted nearest neighbors
    |
    v
[5. Joint Analysis] ----> Clustering, markers
    |
    v
[6. Linked Features] ---> Gene-peak links
    |
    v
Integrated multiome object
```

## Step 1: Load Multiome Data

```r
library(Seurat)
library(Signac)
library(EnsDb.Hsapiens.v86)
library(ggplot2)

# Load RNA
rna_counts <- Read10X_h5('filtered_feature_bc_matrix.h5')
# For multiome, this returns a list with 'Gene Expression' and 'Peaks'

# Create Seurat object with RNA
seurat_obj <- CreateSeuratObject(
    counts = rna_counts$`Gene Expression`,
    assay = 'RNA'
)

# Load ATAC
atac_counts <- rna_counts$Peaks
# Or from fragments file
frags <- CreateFragmentObject('atac_fragments.tsv.gz', cells = colnames(seurat_obj))

# Create ChromatinAssay
atac_assay <- CreateChromatinAssay(
    counts = atac_counts,
    sep = c(':', '-'),
    fragments = frags,
    annotation = GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
)

seurat_obj[['ATAC']] <- atac_assay
```

## Step 2: RNA Quality Control and Processing

```r
# QC metrics
seurat_obj[['percent.mt']] <- PercentageFeatureSet(seurat_obj, pattern = '^MT-')

# Filter
seurat_obj <- subset(seurat_obj,
    nCount_RNA > 1000 &
    nCount_RNA < 25000 &
    percent.mt < 20
)

# Normalize RNA
seurat_obj <- SCTransform(seurat_obj, assay = 'RNA', verbose = FALSE)

# PCA
seurat_obj <- RunPCA(seurat_obj, assay = 'SCT', verbose = FALSE)
```

## Step 3: ATAC Quality Control and Processing

```r
# ATAC QC metrics
DefaultAssay(seurat_obj) <- 'ATAC'

seurat_obj <- NucleosomeSignal(seurat_obj)
seurat_obj <- TSSEnrichment(seurat_obj)

# Visualize
VlnPlot(seurat_obj, features = c('nCount_ATAC', 'TSS.enrichment', 'nucleosome_signal'),
        pt.size = 0, ncol = 3)

# Filter ATAC
seurat_obj <- subset(seurat_obj,
    nCount_ATAC > 1000 &
    nCount_ATAC < 100000 &
    TSS.enrichment > 2 &
    nucleosome_signal < 4
)

# Normalize ATAC (TF-IDF + SVD = LSI)
seurat_obj <- RunTFIDF(seurat_obj)
seurat_obj <- FindTopFeatures(seurat_obj, min.cutoff = 'q0')
seurat_obj <- RunSVD(seurat_obj)

# Check LSI components (first often correlates with depth)
DepthCor(seurat_obj)
```

## Step 4: Weighted Nearest Neighbors (WNN)

```r
# Build WNN graph using both modalities
seurat_obj <- FindMultiModalNeighbors(
    seurat_obj,
    reduction.list = list('pca', 'lsi'),
    dims.list = list(1:30, 2:30),  # Skip LSI component 1 if depth-correlated
    modality.weight.name = 'RNA.weight'
)

# UMAP on WNN graph
seurat_obj <- RunUMAP(seurat_obj, nn.name = 'weighted.nn',
                       reduction.name = 'wnn.umap', reduction.key = 'wnnUMAP_')

# Cluster on WNN
seurat_obj <- FindClusters(seurat_obj, graph.name = 'wsnn',
                            algorithm = 3, resolution = 0.5, verbose = FALSE)
```

## Step 5: Visualization and Markers

```r
# Compare modality-specific and joint embeddings
p1 <- DimPlot(seurat_obj, reduction = 'pca', label = TRUE) + ggtitle('RNA PCA')
p2 <- DimPlot(seurat_obj, reduction = 'lsi', label = TRUE) + ggtitle('ATAC LSI')
p3 <- DimPlot(seurat_obj, reduction = 'wnn.umap', label = TRUE) + ggtitle('WNN UMAP')
p1 + p2 + p3

# Modality weights per cell
VlnPlot(seurat_obj, features = 'RNA.weight', group.by = 'seurat_clusters', pt.size = 0)

# Find markers (RNA)
DefaultAssay(seurat_obj) <- 'SCT'
rna_markers <- FindAllMarkers(seurat_obj, only.pos = TRUE, min.pct = 0.25)

# Find markers (ATAC - differentially accessible peaks)
DefaultAssay(seurat_obj) <- 'ATAC'
atac_markers <- FindAllMarkers(seurat_obj, only.pos = TRUE, min.pct = 0.05,
                                test.use = 'LR', latent.vars = 'nCount_ATAC')
```

## Step 6: Gene-Peak Linkage

```r
# Link peaks to genes
DefaultAssay(seurat_obj) <- 'ATAC'
seurat_obj <- RegionStats(seurat_obj, genome = BSgenome.Hsapiens.UCSC.hg38)

seurat_obj <- LinkPeaks(
    seurat_obj,
    peak.assay = 'ATAC',
    expression.assay = 'SCT',
    genes.use = c('CD8A', 'CD4', 'MS4A1', 'CD14')  # Example genes
)

# Visualize links
CoveragePlot(seurat_obj, region = 'CD8A', features = 'CD8A',
             expression.assay = 'SCT', extend.upstream = 10000, extend.downstream = 10000)
```

## Complete Workflow Script

```r
library(Seurat)
library(Signac)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)
library(ggplot2)

# Configuration
data_dir <- 'multiome_output'
output_dir <- 'multiome_results'
dir.create(output_dir, showWarnings = FALSE)

# === Load Data ===
cat('Loading data...\n')
counts <- Read10X_h5(file.path(data_dir, 'filtered_feature_bc_matrix.h5'))
frags <- file.path(data_dir, 'atac_fragments.tsv.gz')

seurat_obj <- CreateSeuratObject(counts = counts$`Gene Expression`, assay = 'RNA')
seurat_obj[['ATAC']] <- CreateChromatinAssay(
    counts = counts$Peaks,
    sep = c(':', '-'),
    fragments = frags,
    annotation = GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
)
cat('Cells:', ncol(seurat_obj), '\n')

# === RNA QC ===
cat('RNA QC...\n')
seurat_obj[['percent.mt']] <- PercentageFeatureSet(seurat_obj, pattern = '^MT-')
seurat_obj <- subset(seurat_obj, nCount_RNA > 1000 & nCount_RNA < 25000 & percent.mt < 20)

# === ATAC QC ===
cat('ATAC QC...\n')
DefaultAssay(seurat_obj) <- 'ATAC'
seurat_obj <- NucleosomeSignal(seurat_obj)
seurat_obj <- TSSEnrichment(seurat_obj)
seurat_obj <- subset(seurat_obj, nCount_ATAC > 1000 & TSS.enrichment > 2 & nucleosome_signal < 4)
cat('After QC:', ncol(seurat_obj), 'cells\n')

# === Process RNA ===
cat('Processing RNA...\n')
DefaultAssay(seurat_obj) <- 'RNA'
seurat_obj <- SCTransform(seurat_obj, verbose = FALSE)
seurat_obj <- RunPCA(seurat_obj, verbose = FALSE)

# === Process ATAC ===
cat('Processing ATAC...\n')
DefaultAssay(seurat_obj) <- 'ATAC'
seurat_obj <- RunTFIDF(seurat_obj)
seurat_obj <- FindTopFeatures(seurat_obj, min.cutoff = 'q0')
seurat_obj <- RunSVD(seurat_obj)

# === WNN Integration ===
cat('WNN integration...\n')
seurat_obj <- FindMultiModalNeighbors(seurat_obj,
    reduction.list = list('pca', 'lsi'),
    dims.list = list(1:30, 2:30),
    modality.weight.name = 'RNA.weight'
)
seurat_obj <- RunUMAP(seurat_obj, nn.name = 'weighted.nn',
    reduction.name = 'wnn.umap', reduction.key = 'wnnUMAP_')
seurat_obj <- FindClusters(seurat_obj, graph.name = 'wsnn', resolution = 0.5, verbose = FALSE)

# === Save ===
saveRDS(seurat_obj, file.path(output_dir, 'multiome_analyzed.rds'))

# === Plots ===
pdf(file.path(output_dir, 'wnn_umap.pdf'), width = 10, height = 8)
DimPlot(seurat_obj, reduction = 'wnn.umap', label = TRUE)
dev.off()

cat('Results saved to:', output_dir, '\n')
cat('Clusters:', length(unique(seurat_obj$seurat_clusters)), '\n')
```

## Related Skills

- single-cell/data-io - Loading 10X data
- single-cell/preprocessing - QC and normalization
- single-cell/multimodal-integration - WNN details
- single-cell/scatac-analysis - ATAC-specific processing

Overview

This skill implements an end-to-end multiome pipeline for joint single-cell RNA-seq and ATAC-seq (10X Multiome) analysis using Seurat and Signac. It guides loading Multiome output, modality-specific QC and processing, weighted nearest neighbor (WNN) integration, and downstream joint analyses such as clustering, marker discovery, and gene-peak linkage. The scripts are ready to run as a reproducible R workflow with sensible defaults and checkpoints.

How this skill works

The pipeline reads 10X Multiome HDF5 output (gene expression + peaks) and creates a Seurat object with separate RNA and ATAC assays. It performs RNA QC and SCTransform-based normalization, and ATAC QC with TF-IDF + LSI dimensionality reduction. It then builds a WNN graph combining PCA (RNA) and LSI (ATAC), computes a WNN UMAP, clusters cells on the multimodal graph, and identifies RNA markers and differentially accessible peaks. Gene-peak linking and coverage plots complete the joint regulatory analysis.

When to use it

You have 10X Multiome (joint scRNA + scATAC) sequencing data
You need a single-cell object integrating expression and chromatin accessibility
You want modality-aware clustering and visualization via WNN
You plan to identify gene-peak links or motif enrichment from linked peaks
You want a reproducible pipeline with common QC and filtering defaults

Best practices

Verify package versions (Seurat, Signac, BSgenome, EnsDb) and adapt API differences before running
Inspect and tune modality-specific QC thresholds (nCount, TSS enrichment, nucleosome signal) for your dataset
Exclude LSI component 1 if it correlates with sequencing depth when running WNN
Use SCTransform for RNA normalization and TF-IDF + SVD for ATAC to produce compatible embeddings
Save intermediate objects and summary plots (UMAP, QC violin plots) to help troubleshooting

Example use cases

Process 10X Multiome output to produce a joint Seurat object and publishable UMAPs
Identify cell-type marker genes (RNA) and differentially accessible peaks (ATAC) across clusters
Link peaks to nearby genes and visualize coverage for regulatory hypothesis testing
Compute modality weights per cell to quantify RNA vs ATAC contribution to cluster identity
Run motif enrichment and construct candidate gene regulatory networks from linked peaks

FAQ

Can I start from fragments instead of the HDF5 matrix?

Yes. The pipeline accepts peak counts plus a fragments file; create a Fragment object and pass it into CreateChromatinAssay as shown.

How should I select LSI dimensions for WNN?

Inspect DepthCor and examine correlation with QC metrics; skip LSI component 1 if depth-correlated and use a sensible range (e.g., 2:30) matched to PCA dims (e.g., 1:30).