
bulk-stringdb-ppi skill

/.claude/skills/bulk-stringdb-ppi

This skill helps you query STRING for protein interactions, build PPI networks with pyPPI, and render styled network figures from gene lists.

npx playbooks add skill starlitnightly/omicverse --skill bulk-stringdb-ppi

Review the files below or copy the command above to add this skill to your agents.

Files (2)
SKILL.md
---
name: string-protein-interaction-analysis-with-omicverse
title: STRING protein interaction analysis with omicverse
description: Help Claude query STRING for protein interactions, build PPI graphs with pyPPI, and render styled network figures for bulk gene lists.
---

# STRING protein interaction analysis with omicverse

## Overview
Invoke this skill when the user has a list of genes and wants to explore STRING protein–protein interactions via omicverse. The workflow mirrors [`t_network.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_network.ipynb), covering species selection, STRING API queries, and quick visualisation of the resulting network.

## Instructions
1. **Set up libraries**
   - Import `omicverse as ov` and call `ov.utils.ov_plot_set()` (or `ov.plot_set()`) to match omicverse aesthetics.
2. **Collect gene inputs**
   - Accept a curated list of gene symbols (`gene_list = [...]`).
   - Encourage the user to flag priority genes or categories so you can colour-code groups in the plot.
3. **Assign metadata for plotting**
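   - Build dictionaries mapping genes to types and colours, e.g. `gene_type_dict = dict(zip(gene_list, ['Type1']*5 + ['Type2']*6))` and `gene_color_dict = {...}`. The label counts must sum to `len(gene_list)` (11 in this example), because `zip` silently drops unmatched items.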
   - Remind users that consistent group labels improve legend readability.
4. **Query STRING interactions**
   - Call `ov.bulk.string_interaction(gene_list, species_id)` where `species_id` is the NCBI taxonomy ID (e.g. 4932 for yeast).
   - Inspect the resulting DataFrame for combined scores and evidence channels to verify coverage.
5. **Construct the network object**
   - Initialise `ppi = ov.bulk.pyPPI(gene=gene_list, gene_type_dict=..., gene_color_dict=..., species=species_id)`.
   - Run `ppi.interaction_analysis()` to fetch and cache STRING edges.
6. **Visualise the network**
   - Generate a default plot with `ppi.plot_network()` to reproduce the notebook figure.
   - Mention that advanced styling (layout, node size, legends) can be tuned through `ov.utils.plot_network` keyword arguments if the user requests adjustments.
7. **Troubleshooting**
   - Ensure gene symbols match the species; STRING expects case-sensitive identifiers. Suggest mapping Ensembl IDs to symbols when queries fail.
   - If the API rate-limits, instruct the user to wait or provide a cached interaction table.
   - For missing interactions, recommend enabling STRING's "add_nodes" option via `ppi.interaction_analysis(add_nodes=...)` to expand the network.
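
The steps above can be sketched end to end as follows. `build_gene_metadata` and `run_string_ppi` are hypothetical helper names introduced here, and the group labels, genes other than FAA4, and hex colours are illustrative. The omicverse calls inside the wrapper are the ones named in the instructions; omicverse is imported lazily so the metadata helper can be tried without it or without network access.

```python
def build_gene_metadata(groups, palette):
    """Map each gene to its group label and to that group's colour.

    groups:  {group_label: [gene, ...]}
    palette: {group_label: colour}
    """
    gene_type_dict, gene_color_dict = {}, {}
    for label, genes in groups.items():
        for gene in genes:
            if gene in gene_type_dict:
                raise ValueError(f"{gene} is assigned to more than one group")
            gene_type_dict[gene] = label
            gene_color_dict[gene] = palette[label]
    return gene_type_dict, gene_color_dict


def run_string_ppi(groups, palette, species_id):
    """Query STRING and plot the network, following steps 1 to 6 above."""
    import omicverse as ov  # lazy import: needs omicverse and network access

    ov.utils.ov_plot_set()
    gene_type_dict, gene_color_dict = build_gene_metadata(groups, palette)
    gene_list = list(gene_type_dict)

    # Edge table for manual inspection of scores and evidence channels.
    edges = ov.bulk.string_interaction(gene_list, species_id)

    ppi = ov.bulk.pyPPI(
        gene=gene_list,
        gene_type_dict=gene_type_dict,
        gene_color_dict=gene_color_dict,
        species=species_id,
    )
    ppi.interaction_analysis()
    ppi.plot_network()
    return edges, ppi


# Illustrative grouping; replace with the user's own genes and labels.
groups = {"Type1": ["FAA4", "POX1"], "Type2": ["OLE1", "TGL3"]}
palette = {"Type1": "#F7828A", "Type2": "#9CCCA4"}
gene_type_dict, gene_color_dict = build_gene_metadata(groups, palette)
```

Building both dictionaries from one `groups` mapping keeps labels and colours in sync and avoids the length mismatch that `zip` would hide. Calling `run_string_ppi(groups, palette, 4932)` would then run the yeast workflow.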

## Examples
- "Retrieve STRING interactions for FAA4 and plot the network highlighting two gene classes."
- "Download the STRING edge table for my Saccharomyces cerevisiae gene panel and colour nodes by module."
- "Extend the network by adding the top five predicted partners before plotting."

## References
- Tutorial notebook: [`t_network.ipynb`](../../omicverse_guide/docs/Tutorials-bulk/t_network.ipynb)
- STRING background: [string-db.org](https://string-db.org/)
- Quick copy/paste commands: [`reference.md`](reference.md)

Overview

This skill helps you query STRING for protein–protein interactions using omicverse, build PPI objects with pyPPI, and render publication-ready network figures for bulk gene lists. It accepts a curated gene list, supports species selection via NCBI taxonomy IDs, and produces both edge tables and styled network plots. Use it to quickly inspect interaction evidence, extend networks with predicted partners, and colour-code genes by groups or priorities.

How this skill works

You provide a gene list and optional metadata mapping genes to types and colours. The skill uses ov.bulk.string_interaction to fetch STRING edges, then constructs a pyPPI network object (ov.bulk.pyPPI) and runs interaction_analysis() to cache edges and optionally expand nodes. Finally it renders a styled network via ppi.plot_network(), with further layout and legend adjustments available through ov.utils.plot_network keyword arguments.

When to use it

  • You have a curated list of gene symbols and want to visualise known/predicted protein interactions.
  • You need an interaction edge table with STRING combined scores and evidence channels for downstream analysis.
  • You want to colour or annotate network nodes by gene modules, priorities, or experimental groups.
  • You plan to extend a seed network by adding top predicted partners before plotting.
  • You need reproducible figures that follow omicverse aesthetics for reports or manuscripts.

Best practices

  • Pass gene symbols that match the target species, including capitalisation, since STRING is case-sensitive; map Ensembl IDs to symbols when necessary.
  • Provide a concise gene_type_dict and gene_color_dict to ensure clear legends and consistent colour mapping.
  • Inspect the returned DataFrame for combined_score and evidence channels to confirm interaction coverage.
  • If rate-limited by the STRING API, either wait or reuse a cached edge table returned by interaction_analysis().
  • Use add_nodes in interaction_analysis(add_nodes=...) to include predicted partners when seed lists have sparse connectivity.
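
A small pre-flight check can catch input problems before any query is sent. The sketch below is not part of omicverse: `preflight_gene_list` is a hypothetical helper, and the Ensembl pattern is a heuristic assumption (IDs such as `ENSG00000139618`) that will not match every species' identifier scheme. Case is deliberately left untouched because STRING identifiers are case-sensitive.

```python
import re

# Heuristic assumption: Ensembl gene IDs look like ENSG00000139618 or
# ENSMUSG00000017167, optionally followed by a version suffix.
ENSEMBL_LIKE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def preflight_gene_list(genes):
    """Strip whitespace, drop blanks and duplicates, and flag Ensembl-like IDs.

    Returns (cleaned, suspects): cleaned keeps the input order and case;
    suspects lists entries that look like Ensembl IDs rather than symbols.
    """
    cleaned, seen, suspects = [], set(), []
    for raw in genes:
        symbol = raw.strip()
        if not symbol or symbol in seen:
            continue
        seen.add(symbol)
        cleaned.append(symbol)
        if ENSEMBL_LIKE.match(symbol):
            suspects.append(symbol)
    return cleaned, suspects


cleaned, suspects = preflight_gene_list([" FAA4", "FAA4", "ENSG00000139618", ""])
```

Entries in `suspects` are candidates for mapping to gene symbols before the list is passed to the query.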

Example use cases

  • Retrieve STRING interactions for a yeast gene panel (NCBI 4932) and plot nodes coloured by module.
  • Download the STRING edge table for a bulk RNA-seq signature and filter edges by combined_score for enrichment tests.
  • Highlight a set of priority genes in a cancer signature network and export a high-resolution figure for a manuscript.
  • Expand a small seed list by the top five predicted partners and inspect how new nodes change network topology.
  • Share a cached interaction table with collaborators to let them reproduce or extend the analysis without re-querying STRING.
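
Filtering an edge table by score, as in the second use case, might look like the sketch below. It uses plain dictionaries so it runs without pandas. The column names `preferredName_A`, `preferredName_B`, and `score` are assumptions based on STRING's TSV network output, not a confirmed description of what `ov.bulk.string_interaction` returns, and the scores are made up for illustration. Set `score_key` to whichever column the real table uses, for example `combined_score`.

```python
def filter_edges(edges, min_score=0.7, score_key="score"):
    """Keep edges whose score is at least min_score (0.7 is STRING's 'high confidence')."""
    return [edge for edge in edges if float(edge[score_key]) >= min_score]


def degree_by_gene(edges, a_key="preferredName_A", b_key="preferredName_B"):
    """Count how many retained edges touch each gene."""
    degree = {}
    for edge in edges:
        for gene in (edge[a_key], edge[b_key]):
            degree[gene] = degree.get(gene, 0) + 1
    return degree


# Made-up rows for illustration only; these are not STRING results.
edges = [
    {"preferredName_A": "FAA4", "preferredName_B": "POX1", "score": 0.95},
    {"preferredName_A": "FAA4", "preferredName_B": "OLE1", "score": 0.45},
    {"preferredName_A": "OLE1", "preferredName_B": "TGL3", "score": 0.80},
]
strong = filter_edges(edges, min_score=0.7)
degree = degree_by_gene(strong)
```

With a pandas DataFrame the same filter is a boolean mask on the score column; the dictionary form is used here only to keep the example dependency-free.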

FAQ

What do I do if STRING returns no interactions for my genes?

Check that the species ID is correct and that symbol case matches STRING's identifiers; map other IDs to the correct gene symbols. If the network is still sparse, enable add_nodes to include predicted partners or provide a broader gene list.
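
To see what a request with added partners looks like at the STRING level, the sketch below builds a URL for STRING's public `network` endpoint without sending it. `build_network_url` and the `caller_identity` value are hypothetical. Whether `ppi.interaction_analysis(add_nodes=...)` forwards to this same parameter is an assumption not verified here.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def build_network_url(genes, species_id, add_nodes=0, required_score=None, fmt="tsv"):
    """Build (but do not send) a STRING network request URL."""
    params = {
        # STRING expects identifiers separated by a carriage return (%0D).
        "identifiers": "\r".join(genes),
        "species": species_id,       # NCBI taxonomy ID
        "add_nodes": add_nodes,      # number of extra partners to add
        "caller_identity": "my_analysis",  # hypothetical label for this client
    }
    if required_score is not None:
        params["required_score"] = required_score  # threshold on a 0 to 1000 scale
    return f"{STRING_API}/{fmt}/network?{urlencode(params)}"


url = build_network_url(["FAA4", "POX1"], 4932, add_nodes=5)
```

Opening the resulting URL in a browser is a quick way to check whether STRING recognises the symbols for the chosen species, independently of omicverse.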

How can I avoid STRING API rate limits?

Run queries in batches, wait between requests, or rely on the cached edge table produced by ppi.interaction_analysis() for repeated work.
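
One way to combine waiting and caching is sketched below. `cached_queries` is a hypothetical helper, and `fetch` stands for any callable that takes a gene list and returns a JSON-serialisable edge list, for example a thin wrapper around `ov.bulk.string_interaction`. Each named gene list is treated as one query, because splitting a single list into batches would lose edges between genes that land in different batches.

```python
import json
import tempfile
import time
from pathlib import Path


def cached_queries(gene_lists, fetch, cache_dir, pause_s=1.0):
    """Run one query per named gene list, pausing between live requests.

    Results are stored as <cache_dir>/<name>.json and reused on later calls.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    results, live_requests = {}, 0
    for name, genes in gene_lists.items():
        cache_file = cache_dir / f"{name}.json"
        if cache_file.exists():
            results[name] = json.loads(cache_file.read_text())
            continue
        if live_requests:
            time.sleep(pause_s)  # be polite to the API between requests
        edges = fetch(genes)
        cache_file.write_text(json.dumps(edges))
        results[name] = edges
        live_requests += 1
    return results


# Demonstration with a stand-in fetcher; no network access is involved.
calls = []


def fake_fetch(genes):
    calls.append(list(genes))
    return [[genes[0], genes[-1]]]


with tempfile.TemporaryDirectory() as tmp:
    panels = {"panel_a": ["FAA4", "POX1"], "panel_b": ["OLE1", "TGL3"]}
    first = cached_queries(panels, fake_fetch, tmp, pause_s=0)
    second = cached_queries(panels, fake_fetch, tmp, pause_s=0)  # served from cache
```

The second call never reaches `fake_fetch`, which is the behaviour wanted when re-running an analysis or sharing cached tables with collaborators.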