
scientific-pkg-tooluniverse skill


This skill helps researchers discover, execute, and compose multi-step scientific tool workflows across bioinformatics and beyond using ToolUniverse.

npx playbooks add skill jackspace/claudeskillz --skill scientific-pkg-tooluniverse


---
name: tooluniverse
description: Use this skill when working with scientific research tools and workflows across bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery. This skill provides access to 600+ scientific tools including machine learning models, datasets, APIs, and analysis packages. Use when searching for scientific tools, executing computational biology workflows, composing multi-step research pipelines, accessing databases like OpenTargets/PubChem/UniProt/PDB/ChEMBL, performing tool discovery for research tasks, or integrating scientific computational resources into LLM workflows.
---

# ToolUniverse

## Overview

ToolUniverse is a unified ecosystem that enables AI agents to function as research scientists by providing standardized access to 600+ scientific resources. Use this skill to discover, execute, and compose scientific tools across multiple research domains including bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery.

**Key Capabilities:**
- Access 600+ scientific tools, models, datasets, and APIs
- Discover tools using natural language, semantic search, or keywords
- Execute tools through the standardized AI-Tool Interaction Protocol
- Compose multi-step workflows for complex research problems
- Integrate with Claude Desktop/Code via Model Context Protocol (MCP)

## When to Use This Skill

Use this skill when:
- Searching for scientific tools by function or domain (e.g., "find protein structure prediction tools")
- Executing computational biology workflows (e.g., disease target identification, drug discovery, genomics analysis)
- Accessing scientific databases (OpenTargets, PubChem, UniProt, PDB, ChEMBL, KEGG, etc.)
- Composing multi-step research pipelines (e.g., target discovery → structure prediction → virtual screening)
- Working with bioinformatics, cheminformatics, or structural biology tasks
- Analyzing gene expression, protein sequences, molecular structures, or clinical data
- Performing literature searches, pathway enrichment, or variant annotation
- Building automated scientific research workflows

## Quick Start

### Basic Setup
```python
from tooluniverse import ToolUniverse

# Initialize and load tools
tu = ToolUniverse()
tu.load_tools()  # Loads 600+ scientific tools

# Discover tools
tools = tu.run({
    "name": "Tool_Finder_Keyword",
    "arguments": {
        "description": "disease target associations",
        "limit": 10
    }
})

# Execute a tool
result = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000537"}  # Hypertension
})
```

### Model Context Protocol (MCP)
For Claude Desktop/Code integration:
```bash
tooluniverse-smcp
```

## Core Workflows

### 1. Tool Discovery

Find relevant tools for your research task:

**Three discovery methods:**
- `Tool_Finder` - Embedding-based semantic search (requires GPU)
- `Tool_Finder_LLM` - LLM-based semantic search (no GPU required)
- `Tool_Finder_Keyword` - Fast keyword search

**Example:**
```python
# Search by natural language description
tools = tu.run({
    "name": "Tool_Finder_LLM",
    "arguments": {
        "description": "Find tools for RNA sequencing differential expression analysis",
        "limit": 10
    }
})

# Review available tools
for tool in tools:
    print(f"{tool['name']}: {tool['description']}")
```

**See `references/tool-discovery.md` for:**
- Detailed discovery methods and search strategies
- Domain-specific keyword suggestions
- Best practices for finding tools

### 2. Tool Execution

Execute individual tools through the standardized interface:

**Example:**
```python
# Execute disease-target lookup
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}  # Neoplasm (confirm the EFO ID for your disease of interest)
})

# Get protein structure
structure = tu.run({
    "name": "AlphaFold_get_structure",
    "arguments": {"uniprot_id": "P12345"}
})

# Calculate molecular properties
properties = tu.run({
    "name": "RDKit_calculate_descriptors",
    "arguments": {"smiles": "CCO"}  # Ethanol
})
```

**See `references/tool-execution.md` for:**
- Real-world execution examples across domains
- Tool parameter handling and validation
- Result processing and error handling
- Best practices for production use
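Before passing identifiers to the execution calls above, a light format check can catch typos early. The sketch below is illustrative only: `is_uniprot_accession` and `is_efo_id` are hypothetical helpers, not part of ToolUniverse. The UniProt pattern follows the accession format published in the UniProt help pages; the EFO pattern is an assumption inferred from the two IDs used in this document and will reject other ontology prefixes.

```python
import re

# UniProt accession format (6 or 10 characters), per the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Assumption: EFO IDs look like "EFO_" followed by seven digits,
# as in the examples in this document (e.g. "EFO_0000616").
EFO_ID = re.compile(r"^EFO_\d{7}$")


def is_uniprot_accession(value: str) -> bool:
    """Return True if value is shaped like a UniProt accession."""
    return bool(UNIPROT_ACCESSION.match(value))


def is_efo_id(value: str) -> bool:
    """Return True if value is shaped like an EFO identifier."""
    return bool(EFO_ID.match(value))
```

These checks only confirm the shape of an identifier, not that the record exists. SMILES strings are harder to check with a regular expression; parsing them with a chemistry toolkit is the more reliable route.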

### 3. Tool Composition and Workflows

Compose multiple tools for complex research workflows:

**Drug Discovery Example:**
```python
# 1. Find disease targets
targets = tu.run({
    "name": "OpenTargets_get_associated_targets_by_disease_efoId",
    "arguments": {"efoId": "EFO_0000616"}
})

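# 2. Get protein structures
# Assumes `targets` is a list of records with a 'uniprot_id' key;
# inspect the actual result structure before relying on this.
structures = []
for target in targets[:5]:
    structure = tu.run({
        "name": "AlphaFold_get_structure",
        "arguments": {"uniprot_id": target['uniprot_id']}
    })
    structures.append(structure)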

# 3. Screen compounds
hits = []
for structure in structures:
    compounds = tu.run({
        "name": "ZINC_virtual_screening",
        "arguments": {
            "structure": structure,
            "library": "lead-like",
            "top_n": 100
        }
    })
    hits.extend(compounds)

# 4. Evaluate drug-likeness
drug_candidates = []
for compound in hits:
    props = tu.run({
        "name": "RDKit_calculate_drug_properties",
        "arguments": {"smiles": compound['smiles']}
    })
    if props['lipinski_pass']:
        drug_candidates.append(compound)
```

**See `references/tool-composition.md` for:**
- Complete workflow examples (drug discovery, genomics, clinical)
- Sequential and parallel tool composition patterns
- Output processing hooks
- Workflow best practices
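The loops in the drug discovery example run one call at a time. Where calls are independent (for instance, fetching structures for several targets), they could be dispatched concurrently. The following is a minimal sketch, assuming `tu.run` is safe to call from multiple threads, which this document does not confirm. `run_parallel` is a hypothetical helper, not part of ToolUniverse.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_parallel(
    run: Callable[[Dict[str, Any]], Any],
    calls: List[Dict[str, Any]],
    max_workers: int = 4,
) -> List[Any]:
    """Run independent tool calls concurrently.

    `run` is any callable that takes a {"name": ..., "arguments": ...}
    dict, such as `tu.run`. Results come back in the same order as `calls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order and re-raises the first
        # exception encountered when results are collected.
        return list(pool.map(run, calls))
```

Usage might look like `run_parallel(tu.run, calls, max_workers=4)`. Keeping `max_workers` small is one simple way to stay under remote API rate limits.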

## Scientific Domains

ToolUniverse supports 600+ tools across major scientific domains:

**Bioinformatics:**
- Sequence analysis, alignment, BLAST
- Gene expression (RNA-seq, DESeq2)
- Pathway enrichment (KEGG, Reactome, GO)
- Variant annotation (VEP, ClinVar)

**Cheminformatics:**
- Molecular descriptors and fingerprints
- Drug discovery and virtual screening
- ADMET prediction and drug-likeness
- Chemical databases (PubChem, ChEMBL, ZINC)

**Structural Biology:**
- Protein structure prediction (AlphaFold)
- Structure retrieval (PDB)
- Binding site detection
- Protein-protein interactions

**Proteomics:**
- Mass spectrometry analysis
- Protein databases (UniProt, STRING)
- Post-translational modifications

**Genomics:**
- Genome assembly and annotation
- Copy number variation
- Clinical genomics workflows

**Medical/Clinical:**
- Disease databases (OpenTargets, OMIM)
- Clinical trials and FDA data
- Variant classification

**See `references/domains.md` for:**
- Complete domain categorization
- Tool examples by discipline
- Cross-domain applications
- Search strategies by domain

## Reference Documentation

This skill includes reference files that cover specific topics in detail:

- **`references/installation.md`** - Installation, setup, MCP configuration, platform integration
- **`references/tool-discovery.md`** - Discovery methods, search strategies, listing tools
- **`references/tool-execution.md`** - Execution patterns, real-world examples, error handling
- **`references/tool-composition.md`** - Workflow composition, complex pipelines, parallel execution
- **`references/domains.md`** - Tool categorization by domain, use case examples
- **`references/api_reference.md`** - Python API documentation, hooks, protocols

**Workflow:** When helping with a specific task, consult the matching reference file for detailed instructions. For example, when searching for tools, consult `references/tool-discovery.md` for search strategies.

## Example Scripts

Two executable example scripts demonstrate common use cases:

**`scripts/example_tool_search.py`** - Demonstrates tool discovery and inspection:
- Keyword-based search
- LLM-based search
- Domain-specific searches
- Getting detailed tool information

**`scripts/example_workflow.py`** - Complete workflow examples:
- Drug discovery pipeline (disease → targets → structures → screening → candidates)
- Genomics analysis (expression data → differential analysis → pathways)

Run the examples to see typical usage patterns and workflow composition.

## Best Practices

1. **Tool Discovery:**
   - Start with broad searches, then refine based on results
   - Use `Tool_Finder_Keyword` for fast searches with known terms
   - Use `Tool_Finder_LLM` for complex semantic queries
   - Set an appropriate `limit` parameter (default: 10)

2. **Tool Execution:**
   - Always verify tool parameters before execution
   - Implement error handling for production workflows
   - Validate input data formats (SMILES, UniProt IDs, gene symbols)
   - Check result types and structures

3. **Workflow Composition:**
   - Test each step individually before composing full workflows
   - Implement checkpointing for long workflows
   - Consider rate limits for remote APIs
   - Use parallel execution when tools are independent

4. **Integration:**
   - Initialize ToolUniverse once and reuse the instance
   - Call `load_tools()` once at startup
   - Cache frequently used tool information
   - Enable logging for debugging
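The error-handling advice above can be sketched as a small retry wrapper. This is illustrative only: `run_with_retry` is a hypothetical helper, not part of ToolUniverse, and it assumes failures surface as Python exceptions. If a tool reports errors inside its return value instead, the result needs to be inspected as well.

```python
import time
from typing import Any, Callable, Dict


def run_with_retry(
    run: Callable[[Dict[str, Any]], Any],
    call: Dict[str, Any],
    retries: int = 3,
    base_delay: float = 1.0,
) -> Any:
    """Call run(call), retrying with exponential backoff on exceptions.

    `run` is any callable that accepts a {"name": ..., "arguments": ...}
    dict, such as `tu.run`.
    """
    # Verify the request shape before spending a network round trip.
    if "name" not in call or "arguments" not in call:
        raise ValueError("call must contain 'name' and 'arguments'")

    for attempt in range(retries):
        try:
            return run(call)
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == retries - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... between attempts.
            time.sleep(base_delay * 2 ** attempt)
```

Usage might look like `run_with_retry(tu.run, {"name": ..., "arguments": {...}})`. Catching every exception is a deliberate simplification; in production it is usually better to retry only on errors known to be transient.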

## Key Terminology

- **Tool**: A scientific resource (model, dataset, API, package) accessible through ToolUniverse
- **Tool Discovery**: Finding relevant tools using one of the search tools (`Tool_Finder`, `Tool_Finder_LLM`, `Tool_Finder_Keyword`)
- **Tool Execution**: Running a tool with specific arguments via `tu.run()`
- **Tool Composition**: Chaining multiple tools for multi-step workflows
- **MCP**: Model Context Protocol for integration with Claude Desktop/Code
- **AI-Tool Interaction Protocol**: Standardized interface for LLM-tool communication

## Resources

- **Official Website**: https://aiscientist.tools
- **GitHub**: https://github.com/mims-harvard/ToolUniverse
- **Documentation**: https://zitniklab.hms.harvard.edu/ToolUniverse/
- **Installation**: `uv pip install tooluniverse`
- **MCP Server**: `tooluniverse-smcp`

Overview

This skill provides unified access to 600+ scientific tools, models, datasets, and APIs across bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery. It lets you discover relevant tools, execute individual analysis modules, and compose multi-step research pipelines using a standardized AI-tool interface. Use it to accelerate computational research workflows and integrate scientific resources into LLM-driven automation.

How this skill works

ToolUniverse provides a Python API that loads a catalog of tools and exposes a single run(...) interface for both discovery and execution. Discovery supports keyword, LLM-based semantic, and embedding-based search; execution invokes individual tools (APIs, models, or scripts) with validated arguments. You can chain run calls to compose sequential or parallel workflows, and connect the tools to Claude Desktop/Code through the MCP server for agentic automation.

When to use it

  • Searching for tools by function or domain (e.g., protein structure prediction, RNA-seq analysis).
  • Executing computational biology tasks like target identification, structure retrieval, or descriptor calculation.
  • Composing multi-step research pipelines (target discovery → structure prediction → virtual screening).
  • Accessing domain databases (OpenTargets, PubChem, UniProt, PDB, ChEMBL, KEGG).
  • Automating repetitive analysis steps or integrating scientific tools into LLM-based agents.

Best practices

  • Initialize ToolUniverse and call load_tools() once at startup to avoid repeated loading.
  • Start with broad discovery queries, inspect results, then refine with keywords or LLM prompts.
  • Validate input formats (SMILES, UniProt IDs, gene symbols) and verify tool parameters before execution.
  • Test each workflow step independently and add checkpointing for long-running pipelines.
  • Handle API rate limits and enable logging and error handling for production runs.
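Checkpointing for long-running pipelines can be as simple as saving each result to disk as soon as it arrives. The sketch below is one possible approach under stated assumptions: `run_with_checkpoint` is a hypothetical helper, not part of ToolUniverse, and it assumes every tool result is JSON-serializable.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List


def run_with_checkpoint(
    run: Callable[[Dict[str, Any]], Any],
    calls: List[Dict[str, Any]],
    checkpoint_path: str,
) -> List[Any]:
    """Run calls in order, skipping any already stored in the checkpoint.

    `run` is any callable that accepts a {"name": ..., "arguments": ...}
    dict, such as `tu.run`. Results must be JSON-serializable.
    """
    path = Path(checkpoint_path)
    # Load results from a previous (possibly interrupted) run.
    done: Dict[str, Any] = json.loads(path.read_text()) if path.exists() else {}

    results = []
    for call in calls:
        # A canonical JSON string of the call serves as its cache key.
        key = json.dumps(call, sort_keys=True)
        if key not in done:
            done[key] = run(call)
            # Persist after every step so a crash loses at most one call.
            path.write_text(json.dumps(done))
        results.append(done[key])
    return results
```

Rerunning the same pipeline with the same checkpoint file then repeats only the calls that had not completed.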

Example use cases

  • Find tools for RNA-seq differential expression, run DESeq2, and perform pathway enrichment.
  • Identify disease-associated targets via OpenTargets, fetch AlphaFold structures, and run virtual screens.
  • Compute molecular descriptors for compound libraries with RDKit and filter drug-like candidates.
  • Retrieve UniProt entries and annotate variants using VEP or ClinVar integrations.
  • Assemble automated agent workflows that combine semantic tool discovery with programmatic execution via MCP.

FAQ

How do I search for tools when I don't know exact keywords?

Use the LLM-based Tool_Finder_LLM for semantic search; it accepts natural-language descriptions and returns relevant tools.

Can I run multiple tools in parallel?

Yes. Compose workflows that run independent tools in parallel, but respect remote API rate limits and add checkpointing.