spatial-tutorials skill

/.claude/skills/spatial-tutorials

This skill guides you through spatial transcriptomics tutorials using OmicVerse, covering preprocessing, deconvolution, and downstream modelling across Visium, Visium HD, Stereo-seq, and Slide-seq datasets.

npx playbooks add skill starlitnightly/omicverse --skill spatial-tutorials

---
name: spatial-transcriptomics-tutorials-with-omicverse
title: Spatial transcriptomics tutorials with omicverse
description: "Spatial transcriptomics: Visium/HD, Stereo-seq, Slide-seq preprocessing (crop, rotate, cellpose), deconvolution (Tangram, cell2location, Starfysh), clustering (GraphST, STAGATE), integration, trajectory, communication."
---

# Spatial Transcriptomics with OmicVerse

This skill covers spatial analysis workflows organized into three stages: Preprocessing, Deconvolution, and Downstream Analysis. Each stage includes the critical function calls, parameter guidance, and common pitfalls.

## Defensive Validation: Always Check Spatial Coordinates First

Before ANY spatial operation, verify that spatial coordinates exist and are numeric:

```python
# Required check before spatial analysis
assert 'spatial' in adata.obsm, \
    "Missing adata.obsm['spatial']. Load with ov.io.spatial.read_visium() or set manually."
# Cast to float64 to prevent coordinate precision issues during rotation/cropping
adata.obsm['spatial'] = adata.obsm['spatial'].astype('float64')
```
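If the coordinates instead live in `adata.obs` columns (common after a CSV import), a small helper can build the matrix. This is a sketch assuming the columns are named `x` and `y` — adjust the names for your data:

```python
import numpy as np

def coords_from_obs(obs, x_col="x", y_col="y"):
    """Stack two per-spot coordinate columns into an (n, 2) float64 array
    suitable for adata.obsm['spatial']."""
    coords = np.column_stack([obs[x_col], obs[y_col]]).astype("float64")
    if not np.isfinite(coords).all():
        raise ValueError("Non-finite spatial coordinates found")
    return coords

# Usage with a plain dict standing in for adata.obs:
obs = {"x": [10, 20, 30], "y": [5, 15, 25]}
spatial = coords_from_obs(obs)
```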

## Stage 1: Preprocessing

### Crop, Rotate, and Align Coordinates

Load Visium data and manipulate spatial coordinates for region selection and alignment:

```python
import scanpy as sc
import omicverse as ov
ov.plot_set()

adata = sc.datasets.visium_sge(sample_id="V1_Breast_Cancer_Block_A_Section_1")
library_id = list(adata.uns['spatial'].keys())[0]

# Cast coordinates before manipulation
adata.obsm['spatial'] = adata.obsm['spatial'].astype('float64')

# Crop to region of interest
adata_crop = ov.space.crop_space_visium(adata, crop_loc=(0, 0), crop_area=(1000, 1000),
                                         library_id=library_id, scale=1)

# Rotate and auto-align
adata_rot = ov.space.rotate_space_visium(adata, angle=45, library_id=library_id)
ov.space.map_spatial_auto(adata_rot, method='phase')
# For manual refinement: ov.space.map_spatial_manual(adata_rot, ...)
```
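Conceptually, cropping is a window mask on the coordinate matrix. A minimal numpy sketch of the idea (not the omicverse implementation, which also crops the underlying tissue image):

```python
import numpy as np

def crop_mask(spatial, crop_loc, crop_area):
    """Boolean mask of spots falling inside a (loc, loc + area) window."""
    x0, y0 = crop_loc
    w, h = crop_area
    x, y = spatial[:, 0], spatial[:, 1]
    return (x >= x0) & (x < x0 + w) & (y >= y0) & (y < y0 + h)

spatial = np.array([[100., 200.], [1500., 300.], [900., 900.]])
mask = crop_mask(spatial, crop_loc=(0, 0), crop_area=(1000, 1000))
# mask -> [True, False, True]; subset with adata[mask] in practice
```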

### Visium HD Cell Segmentation

Segment Visium HD bins into cells using cellpose:

```python
adata = ov.space.read_visium_10x(path="binned_outputs/square_002um/",
                                  source_image_path="tissue_image.btf")
ov.pp.filter_genes(adata, min_cells=3)
ov.pp.filter_cells(adata, min_counts=1)

# H&E-based segmentation
adata = ov.space.visium_10x_hd_cellpose_he(adata, mpp=0.3, gpu=True, buffer=150)
# Expand labels to neighboring bins
ov.space.visium_10x_hd_cellpose_expand(adata, labels_key='labels_he',
                                        expanded_labels_key='labels_he_expanded', max_bin_distance=4)
# Gene-expression-driven seeds
ov.space.visium_10x_hd_cellpose_gex(adata, obs_key="n_counts_adjusted", mpp=0.3, sigma=5)
# Merge labels and aggregate to cell-level
ov.space.salvage_secondary_labels(adata, primary_label='labels_he_expanded',
                                   secondary_label='labels_gex', labels_key='labels_joint')
cdata = ov.space.bin2cell(adata, labels_key='labels_joint')
```
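Conceptually, `bin2cell` sums bin-level counts per segmentation label to produce a cell-level matrix. A minimal numpy version of that aggregation (treating label `0` as unassigned, a common segmentation-mask convention and an assumption here):

```python
import numpy as np

def aggregate_bins(counts, labels):
    """Sum a bins-x-genes count matrix into a cells-x-genes matrix,
    grouping rows by their cell label. Label 0 = unassigned, dropped."""
    cell_ids = np.unique(labels[labels > 0])
    cell_counts = np.zeros((len(cell_ids), counts.shape[1]))
    for i, cid in enumerate(cell_ids):
        cell_counts[i] = counts[labels == cid].sum(axis=0)
    return cell_ids, cell_counts

counts = np.array([[1, 0], [2, 1], [0, 3], [4, 0]])
labels = np.array([1, 1, 2, 0])   # last bin unassigned
ids, agg = aggregate_bins(counts, labels)
# ids -> [1, 2]; agg -> [[3., 1.], [0., 3.]]
```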

## Stage 2: Deconvolution

### Critical API Reference: Method Selection

```python
# CORRECT — Tangram: method passed to deconvolution() call
decov_obj = ov.space.Deconvolution(adata_sc=sc_adata, adata_sp=sp_adata,
                                    celltype_key='Subset', result_dir='result/tangram')
decov_obj.preprocess_sc(max_cells=5000)
decov_obj.preprocess_sp()
decov_obj.deconvolution(method='tangram', num_epochs=1000)

# CORRECT — cell2location: method passed at INIT time
cell2_obj = ov.space.Deconvolution(adata_sc=sc_adata, adata_sp=sp_adata,
                                    celltype_key='Subset', result_dir='result/c2l',
                                    method='cell2location')
cell2_obj.deconvolution(max_epochs=30000)
cell2_obj.save_model('result/c2l/model')
```

Note: For cell2location, the `method` parameter is set at initialization, not at the `deconvolution()` call. For Tangram, it's passed to `deconvolution()`.
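Both methods degrade badly when gene identifiers barely overlap (see Troubleshooting), so it is worth checking overlap before training. A sketch using plain sets — pass `adata_sc.var_names` and `adata_sp.var_names` in practice; the 30% threshold is an arbitrary illustrative default:

```python
def check_gene_overlap(sc_genes, sp_genes, min_frac=0.3):
    """Report the shared-gene fraction between reference and spatial data.
    A tiny overlap usually means ENSEMBL IDs on one side, symbols on the other."""
    shared = set(sc_genes) & set(sp_genes)
    frac = len(shared) / max(len(set(sp_genes)), 1)
    if frac < min_frac:
        raise ValueError(
            f"Only {frac:.0%} of spatial genes found in the reference — "
            "harmonise gene identifiers before deconvolution.")
    return sorted(shared)

shared = check_gene_overlap(["CD3D", "CD8A", "MS4A1"], ["CD3D", "MS4A1", "EPCAM"])
# shared -> ['CD3D', 'MS4A1']
```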

### Starfysh Archetypal Deconvolution

```python
from omicverse.external.starfysh import AA, utils, plot_utils

visium_args = utils.prepare_data(adata_path="data/counts.h5ad",
                                  signature_path="data/signatures.csv",
                                  min_cells=10, filter_hvg=True, n_top_genes=3000)
adata, adata_normed = visium_args.get_adata()
aa_model = AA.ArchetypalAnalysis(adata_orig=adata_normed)
aa_model.fit(k=12, n_init=10)
visium_args = utils.refine_anchors(visium_args, aa_model, add_marker=True)
model, history = utils.run_starfysh(visium_args, poe=False, n_repeat=5, lr=5e-3, max_epochs=500)
```

## Stage 3: Downstream Analysis

### Spatial Clustering

```python
# GraphST + mclust
ov.utils.cluster(adata, use_rep='graphst|original|X_pca', method='mclust', n_components=7)

# STAGATE
ov.utils.cluster(adata, use_rep='STAGATE', method='mclust', n_components=7)

# Merge small clusters
ov.space.merge_cluster(adata, groupby='mclust', resolution=0.5)
```

Algorithm choice: GraphST and STAGATE require precalculated latent spaces. For standard clustering without spatial-aware embeddings, use Leiden/Louvain directly.
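Since `mclust` depends on working R bindings, a defensive pattern is to probe for `rpy2` and fall back to a graph-based method. A sketch — the probe is generic Python; wire the returned string into `ov.utils.cluster`:

```python
import importlib.util

def pick_cluster_method(preferred="mclust"):
    """Return 'mclust' only if rpy2 is importable; otherwise fall back to leiden."""
    if preferred == "mclust" and importlib.util.find_spec("rpy2") is None:
        return "leiden"
    return preferred

method = pick_cluster_method()
# ov.utils.cluster(adata, use_rep='STAGATE', method=method, n_components=7)
```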

### Multi-Slice Integration (STAligner)

```python
import anndata as ad
Batch_list = [ov.read(p) for p in slice_paths]
adata_concat = ad.concat(Batch_list, label='slice_name', keys=section_ids)
STAligner_obj = ov.space.pySTAligner(adata=adata_concat, batch_key='slice_name',
                                      hidden_dims=[256, 64], use_gpu=True)
STAligner_obj.train_STAligner_subgraph(nepochs=800, lr=1e-3)
STAligner_obj.train()
adata_aligned = STAligner_obj.predicted()
sc.pp.neighbors(adata_aligned, use_rep='STAligner')
```

### Spatial Trajectories

**SpaceFlow** — pseudo-spatial maps:
```python
sf_obj = ov.space.pySpaceFlow(adata)
sf_obj.train(spatial_regularization_strength=0.1, num_epochs=300, patience=50)
sf_obj.cal_pSM(n_neighbors=20, resolution=1.0)
```

**STT** — transition dynamics:
```python
STT_obj = ov.space.STT(adata, spatial_loc='xy_loc', region='Region', n_neighbors=20)
STT_obj.stage_estimate()
STT_obj.train(n_states=9, n_iter=15, weight_connectivities=0.5)
```

### Cell Communication (COMMOT + FlowSig)

```python
df_cellchat = ov.external.commot.pp.ligand_receptor_database(species='human', database='cellchat')
df_cellchat = ov.external.commot.pp.filter_lr_database(df_cellchat, adata, min_expr_frac=0.05)
ov.external.commot.tl.spatial_communication(adata, lr_database=df_cellchat,
                                              distance_threshold=500, result_prefix='cellchat')
# FlowSig network
adata.layers['normalized'] = adata.X.copy()
ov.external.flowsig.tl.construct_intercellular_flow_network(
    adata, commot_output_key='commot-cellchat',
    flowsig_output_key='flowsig-cellchat', edge_threshold=0.7)
```
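FlowSig's Moran's I filter measures how spatially autocorrelated each signal is. The statistic itself is simple — a minimal numpy version over a hand-built neighbour weight matrix (illustrative only; the library's implementation handles weight construction and normalisation):

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I for one feature given an (n, n) spatial weight matrix."""
    z = values - values.mean()
    s0 = weights.sum()
    return (len(values) / s0) * (z @ weights @ z) / (z @ z)

# Four spots on a line; neighbours = adjacent spots
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
smooth = np.array([1., 2., 3., 4.])   # spatially structured -> positive I
noisy = np.array([1., 4., 1., 4.])    # alternating -> negative I
```

Positive values indicate spatially smooth expression worth keeping; values near zero or below indicate noise.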

### Structural Layers (GASTON) and Slice Alignment (SLAT)

**GASTON** — iso-depth estimation:
```python
gas_obj = ov.space.GASTON(adata)
A = gas_obj.prepare_inputs(n_pcs=50)
gas_obj.load_rescale(A)
gas_obj.train(hidden_dims=[64, 32], dropout=0.1, max_epochs=2000)
gaston_isodepth, gaston_labels = gas_obj.cal_iso_depth(n_layers=5)
```

**SLAT** — cross-slice alignment:
```python
from omicverse.external.scSLAT.model import Cal_Spatial_Net, load_anndatas, run_SLAT, spatial_match
Cal_Spatial_Net(adata1, k_cutoff=20, model='KNN')
Cal_Spatial_Net(adata2, k_cutoff=20, model='KNN')
edges, features = load_anndatas([adata1, adata2], feature='DPCA', check_order=False)
embeddings, *_ = run_SLAT(features, edges, LGCN_layer=5)
best, index, distance = spatial_match(embeddings, adatas=[adata1, adata2])
```

## Troubleshooting

- **`ValueError: spatial coordinates out of bounds` after rotation**: Cast `adata.obsm['spatial']` to `float64` BEFORE calling `rotate_space_visium`. Integer coordinates lose precision during trigonometric rotation.
- **Cellpose segmentation fails with memory error**: For large `.btf` images, use `backend='tifffile'` to memory-map the image. Reduce `buffer` parameter if GPU memory is insufficient.
- **Gene ID overlap failure in Tangram/cell2location**: Harmonise identifiers (ENSEMBL vs gene symbols) between `adata_sc` and `adata_sp` before calling `preprocess_sc`/`preprocess_sp`. Drop non-overlapping genes.
- **`mclust` clustering error**: Requires `rpy2` and the R `mclust` package. If R bindings are unavailable, switch to `method='louvain'` or `method='leiden'`.
- **STAligner/SpaceFlow embeddings collapse to a single point**: Verify `adata.obsm['spatial']` exists and coordinates are scaled appropriately. Tune learning rate (try `lr=5e-4`) and regularisation strength.
- **FlowSig returns empty network**: Build spatial neighbor graphs before Moran's I filtering. Increase bootstraps or lower `edge_threshold` (try 0.5) if the network is too sparse.
- **GASTON `RuntimeError` in training**: Provide a writable `out_dir` path. PyTorch nondeterminism may cause variation between runs—set `torch.manual_seed()` for reproducibility.
- **SLAT alignment has many low-quality matches**: Regenerate spatial graphs with a higher `k_cutoff` value. Inspect `low_quality_index` flags and filter cells with high distance scores.
- **STT pathway enrichment fails**: `gseapy` needs network access for gene set downloads. Cache gene sets locally with `ov.utils.geneset_prepare()` and pass the dictionary directly.
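The rotation pitfall in the first bullet is easy to reproduce: applying a rotation matrix and writing the result back into an integer array silently truncates the fractional part. A minimal numpy demonstration of why the float64 cast matters (generic maths, not the omicverse call itself):

```python
import numpy as np

def rotate(coords, angle_deg):
    """Rotate (n, 2) coordinates about the origin by angle_deg."""
    t = np.deg2rad(angle_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return coords @ R.T

pts_int = np.array([[100, 0]])            # integer dtype
rotated = rotate(pts_int.astype("float64"), 45)   # [[70.71..., 70.71...]]

# Writing back into the int array silently truncates the fractional part:
pts_int[:] = rotated                      # -> [[70, 70]], precision lost
```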

## Dependencies
- Core: `omicverse`, `scanpy`, `anndata`, `squidpy`, `numpy`, `matplotlib`
- Segmentation: `cellpose`, `opencv-python`/`tifffile`, optional GPU PyTorch
- Deconvolution: `tangram-sc`, `cell2location`, `pytorch-lightning`; Starfysh needs `torch`, `scikit-learn`
- Downstream: `scikit-learn`, `commot`, `flowsig`, `gseapy`, torch-backed modules (STAligner, SpaceFlow, GASTON, SLAT)

## Examples
- "Crop and rotate my Visium slide, then run cellpose segmentation on the HD data and aggregate to cell-level AnnData."
- "Deconvolve my lymph node spatial data with Tangram and cell2location, compare proportions, and plot cell-type maps."
- "Integrate three DLPFC slices with STAligner, cluster with STAGATE, and infer communication with COMMOT+FlowSig."

## References
- Quick copy/paste commands: [`reference.md`](reference.md)

Overview

This skill guides users through omicverse spatial transcriptomics tutorials for preprocessing, deconvolution, and downstream modelling across Visium, Visium HD, Stereo‑seq, and Slide‑seq datasets. It organizes staged notebooks into Preprocess, Deconvolution, and Downstream workflows and surfaces key functions and troubleshooting tips for each step. Follow the recipes to prepare data, run Tangram/cell2location/Starfysh, and apply spatial clustering, alignment, and trajectory methods.

How this skill works

The skill points to notebook workflows that load spatial slides, rescale and align coordinates, and perform Visium HD segmentation with Cellpose or gene‑guided seeding. It wraps unified deconvolution via ov.space.Deconvolution (Tangram/cell2location) and archetypal deconvolution with Starfysh. Downstream modules include spatial clustering, multi‑slice alignment (STAligner), pseudo‑spatial modelling (SpaceFlow/STT), communication inference (COMMOT/FlowSig), GASTON layer extraction, and SLAT alignment.

When to use it

  • You need to crop, rotate, or manually align Visium coordinates before visualization or integration.
  • You have Visium HD tiles and want nucleus/cell segmentation then bin→cell aggregation.
  • You need reference‑based deconvolution (Tangram or cell2location) or archetypal deconvolution (Starfysh).
  • You want multi‑slice integration, latent embeddings, or slice alignment (STAligner).
  • You plan spatial trajectories, iso‑depth extraction, or intercellular signaling analyses (SpaceFlow, STT, GASTON, COMMOT/FlowSig).

Best practices

  • Always apply library scalefactors and cast spatial coordinates to float64 before automated mapping to avoid mismatches.
  • Filter low‑quality genes/cells before segmentation or deconvolution (min_cells/min_counts) and persist intermediate AnnData snapshots for reuse.
  • Harmonize gene identifiers (ENSEMBL vs symbols) before Tangram/cell2location and drop non‑overlapping genes.
  • Use memory‑mapped TIFF backends, adjust mpp/buffer, and verify .btf image paths for Cellpose; match CUDA/PyTorch for GPU runs.
  • Precompute spatial graphs and tune edge_threshold/bootstraps for FlowSig; tune learning rates and regularization for STAligner/SpaceFlow to prevent collapse.

Example use cases

  • Crop and rotate Visium slides, manually refine offsets, segment Visium HD tiles with Cellpose, and aggregate bins to cell‑level AnnData.
  • Run ov.space.Deconvolution with Tangram and cell2location on a lymph node reference and plot inferred cell‑type proportions across ROIs.
  • Train STAligner on Stereo‑seq slices, extract STAligner latent embeddings, cluster slices jointly, and inspect ARI between methods.
  • Apply SpaceFlow to build pseudo‑spatial maps, compute pSM trajectories, then run STT for pathway enrichment and lineage inference.
  • Construct FlowSig intercellular networks after COMMOT signaling calls, filter by Moran’s I, and validate edges with bootstrap thresholds.

FAQ

What causes coordinate mismatches after rotation?

Ensure scalefactors are applied when plotting and cast adata.obsm['spatial'] to float64 before running automatic mapping.

How do I fix Cellpose runtime errors on large images?

Verify image paths, use tifffile backend for memory mapping, adjust mpp and buffer, and match CUDA/PyTorch builds for GPU acceleration.

Why do Tangram/cell2location runs fail to align genes?

Harmonize gene IDs between scRNA and spatial matrices (ENS vs symbols) and remove non‑overlapping genes before preprocess steps.