home / skills / starlitnightly / omicverse / biocontext-knowledge

biocontext-knowledge skill

safe

This skill helps you annotate gene results and explore pathways, literature, and drug associations using BioContext's unified Python API.

npx playbooks add skill starlitnightly/omicverse --skill biocontext-knowledge

Review the files below or copy the command above to add this skill to your agents.

Files (2)

SKILL.md

7.2 KB

---
name: biocontext-knowledge-queries
title: BioContext knowledge queries
description: "BioContext knowledge: UniProt, AlphaFold, STRING, Reactome, GO, PanglaoDB, PubMed, OpenTargets queries via ov.biocontext for gene annotation."
---

# BioContext Knowledge Queries

Use this skill when the user wants to look up gene/protein annotations, query pathway databases, find cell type markers, search biomedical literature, or explore drug-disease associations. BioContext provides programmatic access to 49 biomedical databases through a unified Python API.

This is a knowledge integration layer — use it to annotate analysis results (e.g., annotate DEG lists with protein function, find pathways for gene clusters, validate marker genes against PanglaoDB).

## Available Functions by Category

### Protein & Genomics

| Function | Database | Returns |
|----------|----------|---------|
| `query_uniprot(gene_symbol, species)` | UniProt | Protein function, domains, GO terms, references |
| `get_uniprot_id(protein_symbol, species)` | UniProt | UniProt accession ID |
| `query_alphafold(protein_symbol, species)` | AlphaFold | Predicted 3D structure, confidence scores |
| `get_ensembl_id(gene_symbol, species)` | Ensembl | Ensembl gene ID (ENSG...) |
| `query_interpro(protein_id, source_db)` | InterPro | Protein domains, families, structural info |
| `search_interpro(query, entry_type)` | InterPro | Domain search by keyword |
| `query_string(protein_symbol, species, min_score)` | STRING | Protein-protein interactions |
| `query_hpa(gene_symbol)` | Human Protein Atlas | Tissue expression, subcellular localization |

### Pathways & Functional

| Function | Database | Returns |
|----------|----------|---------|
| `query_reactome(identifier, species)` | Reactome | Pathway membership, reactions, disease links |
| `query_go(gene_name, size)` | Gene Ontology | GO term associations (BP, MF, CC) |

### Cell Biology

| Function | Database | Returns |
|----------|----------|---------|
| `query_panglaodb(species, cell_type, organ)` | PanglaoDB | Cell type marker genes with sensitivity/specificity |

### Literature

| Function | Database | Returns |
|----------|----------|---------|
| `search_literature(query, sort_by, page_size)` | Europe PMC | Publications matching keywords |
| `search_preprints(server, days, category)` | bioRxiv/medRxiv | Recent preprints |
| `get_fulltext(pmc_id)` | PubMed Central | Full-text article content |

### Drug & Clinical

| Function | Database | Returns |
|----------|----------|---------|
| `query_opentargets(query_string, variables)` | OpenTargets | Gene-disease-drug associations |
| `search_clinical_trials(condition, status)` | ClinicalTrials.gov | Active/completed trials |
| `search_drugs(brand_name, generic_name)` | openFDA | Drug information, approval status |

### Ontology

| Function | Database | Returns |
|----------|----------|---------|
| `query_efo(disease_name, size, exact_match)` | EFO | Experimental Factor Ontology terms |
| `query_chebi(chemical_name, size)` | ChEBI | Chemical entities, small molecules |
| `query_cell_ontology(cell_type, size)` | Cell Ontology | Standardized cell type hierarchy |

### Proteomics

| Function | Database | Returns |
|----------|----------|---------|
| `search_pride(keyword, page_size)` | PRIDE | Mass spectrometry proteomics datasets |

### Generic Access

```python
# List all 49 available tools with parameters
ov.biocontext.list_tools()

# Call any tool directly by name
result = ov.biocontext.call_tool("tool_name", param1=value1, ...)
```

## Usage Patterns

### Single gene lookup

```python
import omicverse as ov

# Get protein function and domains
info = ov.biocontext.query_uniprot(gene_symbol='TP53', species='9606')

# Get pathway membership
pathways = ov.biocontext.query_reactome(identifier='TP53', species='Homo sapiens')

# Get GO terms
go_terms = ov.biocontext.query_go(gene_name='TP53', size=20)
```

### Annotate a DEG list

```python
# After differential expression: annotate top genes with biological context
deg_genes = ['TP53', 'BRCA1', 'MYC', 'EGFR', 'KRAS']
annotations = {}
for gene in deg_genes:
    annotations[gene] = {
        'uniprot': ov.biocontext.query_uniprot(gene_symbol=gene),
        'pathways': ov.biocontext.query_reactome(identifier=gene),
        'go': ov.biocontext.query_go(gene_name=gene, size=5),
    }
```

### Find cell type markers

```python
# Get known markers for a cell type
markers = ov.biocontext.query_panglaodb(
    species='Hs',           # 'Hs' (human), 'Mm' (mouse), 'Dr' (zebrafish)
    cell_type='T cells',
    organ='Blood',
    min_sensitivity=0.5,
)
# Returns: DataFrame with gene symbols, sensitivity, specificity scores
```

### Drug target exploration

```python
# Find drugs targeting a gene
targets = ov.biocontext.query_opentargets(
    query_string='{ target(ensemblId: "ENSG00000141510") { associatedDiseases { rows { disease { name } score } } } }'
)

# Search clinical trials
trials = ov.biocontext.search_clinical_trials(condition='breast cancer', status='RECRUITING')
```

### Literature search

```python
# Search for papers
results = ov.biocontext.search_literature(
    query='single-cell RNA-seq BRCA1',
    sort_by='RELEVANCE',
    page_size=5,
)

# Get full text of a specific paper
text = ov.biocontext.get_fulltext(pmc_id='PMC1234567')
```

## Species Codes

Different databases use different species identifiers:

| Species | NCBI Taxon ID | Ensembl | PanglaoDB |
|---------|--------------|---------|-----------|
| Human | 9606 | homo_sapiens | Hs |
| Mouse | 10090 | mus_musculus | Mm |
| Zebrafish | 7955 | danio_rerio | Dr |
| Rat | 10116 | rattus_norvegicus | Rn |

Most functions default to human (9606 or homo_sapiens).

## Critical API Reference

### query_uniprot accepts multiple identifier types

```python
# By gene symbol (most common)
ov.biocontext.query_uniprot(gene_symbol='TP53')

# By UniProt accession
ov.biocontext.query_uniprot(protein_id='P04637')

# By protein name
ov.biocontext.query_uniprot(protein_name='Cellular tumor antigen p53')
```

### query_opentargets uses GraphQL

```python
# OpenTargets requires GraphQL query strings
# See OpenTargets Platform API docs for query syntax
result = ov.biocontext.query_opentargets(
    query_string='{ search(queryString: "BRCA1") { total hits { id name } } }'
)
```

## Troubleshooting

- **Empty results for a known gene**: Check species parameter. Default is human (9606) — pass `species='10090'` for mouse genes.
- **Timeout on large queries**: External API calls have network latency. For batch annotation, add small delays between calls to avoid rate limiting.
- **`ConnectionError`**: Requires internet access. BioContext queries external databases in real-time.
- **Gene symbol not found**: Some databases are case-sensitive. Human genes should be uppercase (TP53), mouse mixed-case (Tp53).
- **OpenTargets query fails**: GraphQL syntax must be exact. Use `ov.biocontext.list_tools()` to see available OpenTargets tool variants with example queries.

## Examples
- "Look up the protein function and pathways for my top 10 DEGs."
- "Find known T-cell markers from PanglaoDB for my annotation."
- "Search for recent papers about spatial transcriptomics and BRCA1."
- "What drugs target EGFR? Check clinical trials status."

## References
- Quick copy/paste commands: [`reference.md`](reference.md)

Overview

This skill provides programmatic access to integrated biomedical knowledge for gene and protein annotation. It queries UniProt, AlphaFold, STRING, Reactome, GO, PanglaoDB, PubMed/Europe PMC, OpenTargets and other resources through ov.biocontext. Use it to add biological context to omics results, validate markers, explore pathways, or investigate drug–disease links.

How this skill works

The skill calls BioContext tools via ov.biocontext to retrieve structured records from 49 biomedical databases. Functions accept common identifiers (gene symbol, UniProt accession, Ensembl ID) and species codes, returning JSON or DataFrame-ready results like protein domains, GO terms, pathway membership, marker scores, publications, and clinical links. You can list available tools or call any tool by name for custom GraphQL or REST queries.

When to use it

Annotating differential expression or cluster marker lists with protein function and pathways.
Validating candidate cell-type markers against PanglaoDB marker sets.
Finding protein interactions and network partners via STRING before network analysis.
Searching literature and preprints for gene-specific evidence and recent findings.
Exploring drug targets, associated diseases, and clinical trials for translational follow-up.

Best practices

Always set the correct species code (human default is 9606 or homo_sapiens) to avoid empty results.
Batch queries with small delays to reduce rate limiting and avoid timeouts when annotating large gene lists.
Use multiple identifier types (gene symbol, UniProt ID, Ensembl ID) if one returns no hits.
Limit GO and pathway sizes when inspecting many genes to keep responses lightweight.
For OpenTargets use valid GraphQL queries and test them on a single gene before batch execution.

Example use cases

Annotate top 50 DEGs with UniProt function, Reactome pathways, and top GO terms for reporting.
Confirm that scRNA-seq cluster markers match PanglaoDB T-cell markers prior to cell-type naming.
Retrieve AlphaFold structure confidence for candidate proteins before designing domain-focused experiments.
Search Europe PMC for recent papers linking a gene to a phenotype or assay used in your study.
Query OpenTargets to list diseases and drugs associated with a gene and check active clinical trials.

FAQ

What species identifiers should I use?

Species vary by source. Human defaults are 9606 or homo_sapiens and PanglaoDB uses 'Hs'. Use 10090/mus_musculus for mouse and check the function docs when in doubt.

Why am I getting empty results for a known gene?

Common causes are wrong species code, case-sensitive gene symbols, or API rate limits. Try alternate identifiers (UniProt/Ensembl) and add small delays between calls.

Can I run batch annotations for hundreds of genes?

Yes. Use loops or vectorized calls but add throttling to avoid rate limits and consider saving intermediate results to resume after failures.