home / skills / gptomics / bioskills / multiome-pipeline

multiome-pipeline skill

/workflows/multiome-pipeline

This skill guides end-to-end joint scRNA-seq and scATAC-seq analysis using Seurat and Signac, enabling WNN integration and multiome clustering.

npx playbooks add skill gptomics/bioskills --skill multiome-pipeline

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
8.3 KB
---
name: bio-workflows-multiome-pipeline
description: End-to-end multiome workflow for joint scRNA-seq + scATAC-seq analysis. Covers data loading, separate modality processing, and WNN integration with Seurat/Signac. Use when analyzing joint scRNA+scATAC data.
tool_type: r
primary_tool: Seurat
workflow: true
depends_on:
  - single-cell/data-io
  - single-cell/preprocessing
  - single-cell/clustering
  - single-cell/multimodal-integration
  - single-cell/scatac-analysis
qc_checkpoints:
  - after_loading: "Both modalities detected per cell"
  - after_rna_qc: "RNA quality filters passed"
  - after_atac_qc: "TSS enrichment >2, nucleosome signal <4"
  - after_wnn: "Joint embedding separates cell types"
---

## Version Compatibility

Reference examples tested with: ggplot2 3.5+

Before using code patterns, verify installed versions match. If versions differ:
- R: `packageVersion('<pkg>')` then `?function_name` to verify parameters

If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.

# Multiome Pipeline

**"Analyze my 10X Multiome data jointly"** → Orchestrate Cell Ranger ARC processing, Seurat/Signac scRNA+scATAC integration via WNN, chromatin accessibility peak calling, motif enrichment, and gene regulatory network inference.

Complete workflow for 10X Multiome (joint scRNA + scATAC) analysis using Seurat and Signac.

## Workflow Overview

```
10X Multiome data
    |
    v
[1. Load Data] ---------> Read RNA + ATAC
    |
    v
[2. RNA Processing] ----> Standard scRNA workflow
    |
    v
[3. ATAC Processing] ---> Peak calling, LSI
    |
    v
[4. WNN Integration] ---> Weighted nearest neighbors
    |
    v
[5. Joint Analysis] ----> Clustering, markers
    |
    v
[6. Linked Features] ---> Gene-peak links
    |
    v
Integrated multiome object
```

## Step 1: Load Multiome Data

```r
library(Seurat)
library(Signac)
library(EnsDb.Hsapiens.v86)
library(ggplot2)

# Load RNA
rna_counts <- Read10X_h5('filtered_feature_bc_matrix.h5')
# For multiome, this returns a list with 'Gene Expression' and 'Peaks'

# Create Seurat object with RNA
seurat_obj <- CreateSeuratObject(
    counts = rna_counts$`Gene Expression`,
    assay = 'RNA'
)

# Load ATAC
atac_counts <- rna_counts$Peaks
# Or from fragments file
frags <- CreateFragmentObject('atac_fragments.tsv.gz', cells = colnames(seurat_obj))

# Create ChromatinAssay
atac_assay <- CreateChromatinAssay(
    counts = atac_counts,
    sep = c(':', '-'),
    fragments = frags,
    annotation = GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
)

seurat_obj[['ATAC']] <- atac_assay
```

## Step 2: RNA Quality Control and Processing

```r
# QC metrics
seurat_obj[['percent.mt']] <- PercentageFeatureSet(seurat_obj, pattern = '^MT-')

# Filter
seurat_obj <- subset(seurat_obj,
    nCount_RNA > 1000 &
    nCount_RNA < 25000 &
    percent.mt < 20
)

# Normalize RNA
seurat_obj <- SCTransform(seurat_obj, assay = 'RNA', verbose = FALSE)

# PCA
seurat_obj <- RunPCA(seurat_obj, assay = 'SCT', verbose = FALSE)
```

## Step 3: ATAC Quality Control and Processing

```r
# ATAC QC metrics
DefaultAssay(seurat_obj) <- 'ATAC'

seurat_obj <- NucleosomeSignal(seurat_obj)
seurat_obj <- TSSEnrichment(seurat_obj)

# Visualize
VlnPlot(seurat_obj, features = c('nCount_ATAC', 'TSS.enrichment', 'nucleosome_signal'),
        pt.size = 0, ncol = 3)

# Filter ATAC
seurat_obj <- subset(seurat_obj,
    nCount_ATAC > 1000 &
    nCount_ATAC < 100000 &
    TSS.enrichment > 2 &
    nucleosome_signal < 4
)

# Normalize ATAC (TF-IDF + SVD = LSI)
seurat_obj <- RunTFIDF(seurat_obj)
seurat_obj <- FindTopFeatures(seurat_obj, min.cutoff = 'q0')
seurat_obj <- RunSVD(seurat_obj)

# Check LSI components (first often correlates with depth)
DepthCor(seurat_obj)
```

## Step 4: Weighted Nearest Neighbors (WNN)

```r
# Build WNN graph using both modalities
seurat_obj <- FindMultiModalNeighbors(
    seurat_obj,
    reduction.list = list('pca', 'lsi'),
    dims.list = list(1:30, 2:30),  # Skip LSI component 1 if depth-correlated
    modality.weight.name = 'RNA.weight'
)

# UMAP on WNN graph
seurat_obj <- RunUMAP(seurat_obj, nn.name = 'weighted.nn',
                       reduction.name = 'wnn.umap', reduction.key = 'wnnUMAP_')

# Cluster on WNN
seurat_obj <- FindClusters(seurat_obj, graph.name = 'wsnn',
                            algorithm = 3, resolution = 0.5, verbose = FALSE)
```

## Step 5: Visualization and Markers

```r
# Compare modality-specific and joint embeddings
p1 <- DimPlot(seurat_obj, reduction = 'pca', label = TRUE) + ggtitle('RNA PCA')
p2 <- DimPlot(seurat_obj, reduction = 'lsi', label = TRUE) + ggtitle('ATAC LSI')
p3 <- DimPlot(seurat_obj, reduction = 'wnn.umap', label = TRUE) + ggtitle('WNN UMAP')
p1 + p2 + p3

# Modality weights per cell
VlnPlot(seurat_obj, features = 'RNA.weight', group.by = 'seurat_clusters', pt.size = 0)

# Find markers (RNA)
DefaultAssay(seurat_obj) <- 'SCT'
rna_markers <- FindAllMarkers(seurat_obj, only.pos = TRUE, min.pct = 0.25)

# Find markers (ATAC - differentially accessible peaks)
DefaultAssay(seurat_obj) <- 'ATAC'
atac_markers <- FindAllMarkers(seurat_obj, only.pos = TRUE, min.pct = 0.05,
                                test.use = 'LR', latent.vars = 'nCount_ATAC')
```

## Step 6: Gene-Peak Linkage

```r
# Link peaks to genes
DefaultAssay(seurat_obj) <- 'ATAC'
seurat_obj <- RegionStats(seurat_obj, genome = BSgenome.Hsapiens.UCSC.hg38)

seurat_obj <- LinkPeaks(
    seurat_obj,
    peak.assay = 'ATAC',
    expression.assay = 'SCT',
    genes.use = c('CD8A', 'CD4', 'MS4A1', 'CD14')  # Example genes
)

# Visualize links
CoveragePlot(seurat_obj, region = 'CD8A', features = 'CD8A',
             expression.assay = 'SCT', extend.upstream = 10000, extend.downstream = 10000)
```

## Complete Workflow Script

```r
library(Seurat)
library(Signac)
library(EnsDb.Hsapiens.v86)
library(BSgenome.Hsapiens.UCSC.hg38)
library(ggplot2)

# Configuration
data_dir <- 'multiome_output'
output_dir <- 'multiome_results'
dir.create(output_dir, showWarnings = FALSE)

# === Load Data ===
cat('Loading data...\n')
counts <- Read10X_h5(file.path(data_dir, 'filtered_feature_bc_matrix.h5'))
frags <- file.path(data_dir, 'atac_fragments.tsv.gz')

seurat_obj <- CreateSeuratObject(counts = counts$`Gene Expression`, assay = 'RNA')
seurat_obj[['ATAC']] <- CreateChromatinAssay(
    counts = counts$Peaks,
    sep = c(':', '-'),
    fragments = frags,
    annotation = GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
)
cat('Cells:', ncol(seurat_obj), '\n')

# === RNA QC ===
cat('RNA QC...\n')
seurat_obj[['percent.mt']] <- PercentageFeatureSet(seurat_obj, pattern = '^MT-')
seurat_obj <- subset(seurat_obj, nCount_RNA > 1000 & nCount_RNA < 25000 & percent.mt < 20)

# === ATAC QC ===
cat('ATAC QC...\n')
DefaultAssay(seurat_obj) <- 'ATAC'
seurat_obj <- NucleosomeSignal(seurat_obj)
seurat_obj <- TSSEnrichment(seurat_obj)
seurat_obj <- subset(seurat_obj, nCount_ATAC > 1000 & TSS.enrichment > 2 & nucleosome_signal < 4)
cat('After QC:', ncol(seurat_obj), 'cells\n')

# === Process RNA ===
cat('Processing RNA...\n')
DefaultAssay(seurat_obj) <- 'RNA'
seurat_obj <- SCTransform(seurat_obj, verbose = FALSE)
seurat_obj <- RunPCA(seurat_obj, verbose = FALSE)

# === Process ATAC ===
cat('Processing ATAC...\n')
DefaultAssay(seurat_obj) <- 'ATAC'
seurat_obj <- RunTFIDF(seurat_obj)
seurat_obj <- FindTopFeatures(seurat_obj, min.cutoff = 'q0')
seurat_obj <- RunSVD(seurat_obj)

# === WNN Integration ===
cat('WNN integration...\n')
seurat_obj <- FindMultiModalNeighbors(seurat_obj,
    reduction.list = list('pca', 'lsi'),
    dims.list = list(1:30, 2:30),
    modality.weight.name = 'RNA.weight'
)
seurat_obj <- RunUMAP(seurat_obj, nn.name = 'weighted.nn',
    reduction.name = 'wnn.umap', reduction.key = 'wnnUMAP_')
seurat_obj <- FindClusters(seurat_obj, graph.name = 'wsnn', resolution = 0.5, verbose = FALSE)

# === Save ===
saveRDS(seurat_obj, file.path(output_dir, 'multiome_analyzed.rds'))

# === Plots ===
pdf(file.path(output_dir, 'wnn_umap.pdf'), width = 10, height = 8)
DimPlot(seurat_obj, reduction = 'wnn.umap', label = TRUE)
dev.off()

cat('Results saved to:', output_dir, '\n')
cat('Clusters:', length(unique(seurat_obj$seurat_clusters)), '\n')
```

## Related Skills

- single-cell/data-io - Loading 10X data
- single-cell/preprocessing - QC and normalization
- single-cell/multimodal-integration - WNN details
- single-cell/scatac-analysis - ATAC-specific processing

Overview

This skill implements an end-to-end multiome pipeline for joint single-cell RNA-seq and ATAC-seq (10X Multiome) analysis using Seurat and Signac. It guides loading Multiome output, modality-specific QC and processing, weighted nearest neighbor (WNN) integration, and downstream joint analyses such as clustering, marker discovery, and gene-peak linkage. The scripts are ready to run as a reproducible R workflow with sensible defaults and checkpoints.

How this skill works

The pipeline reads 10X Multiome HDF5 output (gene expression + peaks) and creates a Seurat object with separate RNA and ATAC assays. It performs RNA QC and SCTransform-based normalization, and ATAC QC with TF-IDF + LSI dimensionality reduction. It then builds a WNN graph combining PCA (RNA) and LSI (ATAC), computes a WNN UMAP, clusters cells on the multimodal graph, and identifies RNA markers and differentially accessible peaks. Gene-peak linking and coverage plots complete the joint regulatory analysis.

When to use it

  • You have 10X Multiome (joint scRNA + scATAC) sequencing data
  • You need a single-cell object integrating expression and chromatin accessibility
  • You want modality-aware clustering and visualization via WNN
  • You plan to identify gene-peak links or motif enrichment from linked peaks
  • You want a reproducible pipeline with common QC and filtering defaults

Best practices

  • Verify package versions (Seurat, Signac, BSgenome, EnsDb) and adapt API differences before running
  • Inspect and tune modality-specific QC thresholds (nCount, TSS enrichment, nucleosome signal) for your dataset
  • Exclude LSI component 1 if it correlates with sequencing depth when running WNN
  • Use SCTransform for RNA normalization and TF-IDF + SVD for ATAC to produce compatible embeddings
  • Save intermediate objects and summary plots (UMAP, QC violin plots) to help troubleshooting

Example use cases

  • Process 10X Multiome output to produce a joint Seurat object and publishable UMAPs
  • Identify cell-type marker genes (RNA) and differentially accessible peaks (ATAC) across clusters
  • Link peaks to nearby genes and visualize coverage for regulatory hypothesis testing
  • Compute modality weights per cell to quantify RNA vs ATAC contribution to cluster identity
  • Run motif enrichment and construct candidate gene regulatory networks from linked peaks

FAQ

Can I start from fragments instead of the HDF5 matrix?

Yes. The pipeline accepts peak counts plus a fragments file; create a Fragment object and pass it into CreateChromatinAssay as shown.

How should I select LSI dimensions for WNN?

Inspect DepthCor and examine correlation with QC metrics; skip LSI component 1 if depth-correlated and use a sensible range (e.g., 2:30) matched to PCA dims (e.g., 1:30).