
single-to-spatial-mapping skill

/.claude/skills/single-to-spatial-mapping

This skill maps single-cell references to spatial transcriptomics profiles, enabling spot-level reconstruction, marker visualization, and downstream reporting.

npx playbooks add skill starlitnightly/omicverse --skill single-to-spatial-mapping

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
3.9 KB
---
name: single2spatial-spatial-mapping
title: Single2Spatial spatial mapping
description: Map scRNA-seq atlases onto spatial transcriptomics slides using omicverse's Single2Spatial workflow for deep-forest training, spot-level assessment, and marker visualisation.
---

# Single2Spatial spatial mapping

## Overview
Apply this skill when converting single-cell references into spatially resolved profiles. It follows [`t_single2spatial.ipynb`](../../omicverse_guide/docs/Tutorials-bulk2single/t_single2spatial.ipynb), demonstrating how Single2Spatial trains on PDAC scRNA-seq and Visium data, reconstructs spot-level proportions, and visualises marker expression.

## Instructions
A runnable sketch consolidating steps 1–8 follows this list.
1. **Import dependencies and style**
   - Load `omicverse as ov`, `scanpy as sc`, `anndata`, `pandas as pd`, `numpy as np`, and `matplotlib.pyplot as plt`.
   - Call `ov.utils.ov_plot_set()` (or `ov.plot_set()` in newer releases) to align plots with omicverse styling.
2. **Load single-cell and spatial datasets**
   - Read processed matrices with `pd.read_csv(...)` then create AnnData objects (`anndata.AnnData(raw_df.T)`).
   - Attach metadata: `single_data.obs = pd.read_csv(...)[['Cell_type']]` and `spatial_data.obs = pd.read_csv(...)`, where the spatial table contains coordinates and slide metadata.
3. **Initialise Single2Spatial**
   - Instantiate `st_model = ov.bulk2single.Single2Spatial(single_data=single_data, spatial_data=spatial_data, celltype_key='Cell_type', spot_key=['xcoord','ycoord'], gpu=0)`.
   - Note that inputs should be normalised/log-scaled scRNA-seq matrices; ensure `spot_key` matches spatial coordinate columns.
4. **Train the deep-forest model**
   - Execute `st_model.train(spot_num=500, cell_num=10, df_save_dir='...', df_save_name='pdac_df', k=10, num_epochs=1000, batch_size=1000, predicted_size=32)` to fit the mapper and generate reconstructed spatial AnnData (`sp_adata`).
   - Explain that `spot_num` defines sampled pseudo-spots per iteration and `cell_num` controls per-spot cell draws.
5. **Load pretrained weights**
   - Use `st_model.load(modelsize=14478, df_load_dir='.../pdac_df.pth', k=10, predicted_size=32)` when checkpoints already exist to skip training.
6. **Assess spot-level outputs**
   - Call `st_model.spot_assess()` to compute aggregated spot AnnData (`sp_adata_spot`) for QC.
   - Plot marker genes with `sc.pl.embedding(sp_adata, basis='X_spatial', color=['REG1A', 'CLDN1', ...], frameon=False, ncols=4)`.
7. **Visualise proportions and cell-type maps**
   - Use `sc.pl.embedding(sp_adata_spot, basis='X_spatial', color=['Acinar cells', ...], frameon=False)` to highlight per-spot cell fractions.
   - Plot `sp_adata` coloured by `Cell_type` with `palette=ov.utils.ov_palette()[11:]` to show reconstructed assignments.
8. **Export results**
   - Encourage saving generated AnnData objects (`sp_adata.write_h5ad(...)`, `sp_adata_spot.write_h5ad(...)`) and derived CSV summaries for downstream reporting.
9. **Troubleshooting tips**
   - If training diverges, reduce `learning_rate` via keyword arguments or decrease `predicted_size` to stabilise training.
   - Ensure scRNA-seq inputs are log-normalised; raw counts can lead to scale mismatches and poor spatial predictions.
   - Verify GPU availability when `gpu` is non-negative; fall back to CPU by omitting the argument or setting `gpu=-1`.
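
The consolidated sketch below mirrors the tutorial's PDAC example; the CSV paths and output filenames are placeholders, not files shipped with this skill:

```python
import anndata
import pandas as pd
import scanpy as sc
import omicverse as ov

ov.utils.ov_plot_set()  # align figures with omicverse styling

# Steps 1-2: load the single-cell reference (genes x cells CSV, transposed
# into AnnData) and the spatial counts with coordinate metadata.
single_data = anndata.AnnData(pd.read_csv('data/pdac/sc_data.csv', index_col=0).T)
single_data.obs = pd.read_csv('data/pdac/sc_meta.csv', index_col=0)[['Cell_type']]

spatial_data = anndata.AnnData(pd.read_csv('data/pdac/st_data.csv', index_col=0).T)
spatial_data.obs = pd.read_csv('data/pdac/st_meta.csv', index_col=0)  # xcoord/ycoord

# Step 3: initialise the mapper; spot_key must name the coordinate columns.
st_model = ov.bulk2single.Single2Spatial(
    single_data=single_data,
    spatial_data=spatial_data,
    celltype_key='Cell_type',
    spot_key=['xcoord', 'ycoord'],
    gpu=0,  # set gpu=-1 or omit to run on CPU
)

# Step 4: train the deep-forest mapper (or call st_model.load(...) to reuse
# a checkpoint); returns the reconstructed per-cell spatial AnnData.
sp_adata = st_model.train(
    spot_num=500, cell_num=10,
    df_save_dir='models', df_save_name='pdac_df',
    k=10, num_epochs=1000, batch_size=1000, predicted_size=32,
)

# Step 6: aggregate to spot level for QC, then plot marker genes in space.
sp_adata_spot = st_model.spot_assess()
sc.pl.embedding(sp_adata, basis='X_spatial',
                color=['REG1A', 'CLDN1'], frameon=False, ncols=4)

# Step 8: persist results for downstream reporting.
sp_adata.write_h5ad('pdac_single2spatial_cells.h5ad')
sp_adata_spot.write_h5ad('pdac_single2spatial_spots.h5ad')
```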

## Examples
- "Train Single2Spatial on PDAC scRNA-seq and Visium slides, then visualise REG1A and CLDN1 spatial expression."
- "Load a saved Single2Spatial checkpoint to regenerate spot-level cell-type proportions for reporting."
- "Plot reconstructed cell-type maps with omicverse palettes to compare against histology."

## References
- Tutorial notebook: [`t_single2spatial.ipynb`](../../omicverse_guide/docs/Tutorials-bulk2single/t_single2spatial.ipynb)
- Example datasets and models: [`omicverse_guide/docs/Tutorials-bulk2single/data/pdac/`](../../omicverse_guide/docs/Tutorials-bulk2single/data/pdac/)
- Quick copy/paste commands: [`reference.md`](reference.md)

Overview

This skill maps single-cell RNA-seq atlases onto spatial transcriptomics slides using omicverse's Single2Spatial workflow. It trains a deep-forest mapper (or loads checkpoints) to reconstruct spot-level cell-type proportions and visualise marker expression on Visium-style spatial coordinates. The outcome is reconstructed spatial AnnData objects and spot-level QC summaries ready for downstream analysis and reporting.

How this skill works

The workflow takes a processed, log-normalised scRNA-seq reference and a spatial transcriptomics count matrix with x/y coordinates, then trains a deep-forest model that generates pseudo-spots and learns a mapping from cell mixtures to spot profiles. Training produces reconstructed per-cell and per-spot AnnData outputs and aggregated spot-level proportions. The skill includes utilities for loading pretrained weights, performing spot-level assessment, plotting marker genes and cell-type proportion maps, and exporting results.
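
As a concrete illustration of the plotting utilities, here is a minimal sketch assuming st_model, sp_adata, and sp_adata_spot from the workflow above:

    import scanpy as sc
    import omicverse as ov

    # Per-spot cell-type fractions computed by st_model.spot_assess().
    sc.pl.embedding(sp_adata_spot, basis='X_spatial',
                    color=['Acinar cells'], frameon=False)

    # Reconstructed per-cell assignments with an omicverse palette.
    sc.pl.embedding(sp_adata, basis='X_spatial', color='Cell_type',
                    frameon=False, palette=ov.utils.ov_palette()[11:])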

When to use it

  • You have a quality-controlled, log-normalised single-cell reference and Visium-style spatial data to map cell types spatially.
  • You want spot-level cell-type proportion estimates for integration with histology or downstream spatial analysis.
  • You need to visualise marker genes or reconstructed cell assignments across spatial coordinates.
  • You have pretrained Single2Spatial checkpoints to skip expensive retraining and quickly regenerate outputs.

Best practices

  • Ensure scRNA-seq input is log-normalised; raw counts cause scale mismatches and poor predictions.
  • Include accurate spatial coordinate columns (e.g., 'xcoord','ycoord') and set spot_key accordingly when initialising the model.
  • Start with moderate predicted_size and learning rate; reduce predicted_size or learning_rate if training diverges.
  • Save both reconstructed per-cell and aggregated per-spot AnnData (.h5ad) and export CSV summaries for reproducibility and reporting; a minimal export sketch follows this list.
  • When using a GPU, verify availability; fall back to CPU by omitting gpu or setting gpu=-1.
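
A minimal export sketch, assuming per-spot cell-type fractions are stored in sp_adata_spot.obs (which is how the proportion plots access them); the output paths are placeholders:

    # Save reconstructed AnnData objects for reproducibility.
    sp_adata.write_h5ad('outputs/pdac_cells.h5ad')
    sp_adata_spot.write_h5ad('outputs/pdac_spots.h5ad')

    # Flat CSV summary of per-spot composition for reporting.
    sp_adata_spot.obs.to_csv('outputs/pdac_spot_proportions.csv')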

Example use cases

  • Train Single2Spatial on PDAC single-cell and Visium slides, then plot REG1A and CLDN1 spatial expression to compare with histology.
  • Load a saved checkpoint to rapidly regenerate spot-level cell-type proportions for a manuscript figure or QC table.
  • Produce reconstructed cell-type maps with omicverse palettes to evaluate deconvolution quality across tissue regions.
  • Generate aggregated spot AnnData for downstream spatial statistics, spatially variable gene tests, or integration with image features.

FAQ

Can I use raw counts as input for Single2Spatial?

No. Inputs should be log-normalised or otherwise scaled. Raw counts often produce scale mismatches and poor spatial predictions.
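
One common way to prepare the reference with scanpy, shown as a sketch (the target_sum value is a conventional choice, not a Single2Spatial requirement):

    import scanpy as sc

    # Library-size normalise, then log1p-transform the single-cell reference.
    sc.pp.normalize_total(single_data, target_sum=1e4)
    sc.pp.log1p(single_data)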

How do I skip training and reuse a model?

Use the load method with the saved checkpoint path and matching modelsize/predicted_size to restore pretrained weights and skip retraining.
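
For example, following the checkpoint parameters used in the tutorial (the path is a placeholder):

    # k and predicted_size must match the values used during training.
    sp_adata = st_model.load(
        modelsize=14478,
        df_load_dir='models/pdac_df.pth',
        k=10,
        predicted_size=32,
    )
    sp_adata_spot = st_model.spot_assess()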