
This skill searches arXiv for papers across physics, mathematics, computer science, and quantitative biology, and returns each paper's title and abstract for quick screening.

npx playbooks add skill spring-ai-alibaba/examples --skill arxiv-search

Copy the command above to add this skill to your agents.

---
name: arxiv-search
description: Search the arXiv preprint repository for papers in physics, mathematics, computer science, quantitative biology, and related fields. Provides instructions for searching arXiv with a Python script and the shell tool. Read SKILL.md to learn the workflow.
---

# arXiv Search Skill

This skill provides access to arXiv, a free distribution service and open-access archive for scholarly articles in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering, systems science, and economics.

## When to Use This Skill

Use this skill when you need to:
- Find preprints and recent research papers before journal publication
- Search for papers in computational biology, bioinformatics, or systems biology
- Access mathematical or statistical methods papers relevant to biology
- Find machine learning papers applied to biological problems
- Get the latest research that may not yet be in PubMed

## How to Use

The skill provides a Python script that searches arXiv and returns formatted results.

### Basic Usage

**Note:** Always reference the script by its absolute path under your skills directory (shown in the system prompt above).

If running deepagents from a virtual environment:
```bash
.venv/bin/python [YOUR_SKILLS_DIR]/arxiv-search/arxiv_search.py "your search query" [--max-papers N]
```

Or for system Python:
```bash
python3 [YOUR_SKILLS_DIR]/arxiv-search/arxiv_search.py "your search query" [--max-papers N]
```

Replace `[YOUR_SKILLS_DIR]` with the skills directory path from your system prompt (e.g., `~/.deepagents/agent/skills`, which the shell expands to an absolute path, or the full absolute path written out).

**Arguments:**
- `query` (required): The search query string (e.g., "neural networks protein structure", "single cell RNA-seq")
- `--max-papers` (optional): Maximum number of papers to retrieve (default: 10)
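
The source of `arxiv_search.py` is not shown here, so the following is only a sketch of how a command-line interface with these two arguments could be declared using `argparse`. The function names `build_parser` and `parse_cli` are hypothetical; only the argument names and the default of 10 come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Search arXiv for papers")
    # Required positional argument: the free-text search query.
    parser.add_argument("query", help="Search query string")
    # Optional cap on the number of papers; the documented default is 10.
    parser.add_argument(
        "--max-papers",
        type=int,
        default=10,
        help="Maximum number of papers to retrieve (default: 10)",
    )
    return parser


def parse_cli(argv: list[str]) -> argparse.Namespace:
    """Parse an explicit argument list rather than sys.argv, which keeps it testable."""
    return build_parser().parse_args(argv)
```

With this layout, `parse_cli(["protein folding prediction"])` yields `max_papers == 10`, and `argparse` exposes `--max-papers` as the attribute `max_papers`.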

### Examples

Search for machine learning papers:
```bash
.venv/bin/python ~/.deepagents/agent/skills/arxiv-search/arxiv_search.py "deep learning drug discovery" --max-papers 5
```

Search for computational biology papers:
```bash
.venv/bin/python ~/.deepagents/agent/skills/arxiv-search/arxiv_search.py "protein folding prediction"
```

Search for bioinformatics methods:
```bash
.venv/bin/python ~/.deepagents/agent/skills/arxiv-search/arxiv_search.py "genome assembly algorithms"
```

## Output Format

The script returns formatted results with:
- **Title**: Paper title
- **Summary**: Abstract/summary text

Each paper is separated by blank lines for readability.
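
If structured records are more useful than plain text, the output can be split back into papers. The sketch below assumes each paper is printed as a `Title:` line followed by a `Summary:` line (possibly wrapped over several lines), with no blank lines inside an abstract. The exact labels are an assumption about the script's output, and `parse_results` is a hypothetical helper, not part of the skill.

```python
import re


def parse_results(output: str) -> list[dict[str, str]]:
    """Split script output into {'title', 'summary'} records (assumed format)."""
    papers = []
    # Papers are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", output.strip()):
        title_lines: list[str] = []
        summary_lines: list[str] = []
        current = None  # which field the current line belongs to
        for line in block.splitlines():
            if line.startswith("Title:"):
                current = title_lines
                line = line[len("Title:"):]
            elif line.startswith("Summary:"):
                current = summary_lines
                line = line[len("Summary:"):]
            if current is not None:
                # Continuation lines are appended to the most recent field.
                current.append(line.strip())
        if title_lines:
            papers.append({
                "title": " ".join(title_lines).strip(),
                "summary": " ".join(summary_lines).strip(),
            })
    return papers
```

Blocks without a `Title:` line are skipped, so stray log or error lines do not produce empty records.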

## Features

- **Relevance sorting**: Results ordered by relevance to query
- **Fast retrieval**: Queries the arXiv API directly
- **Simple interface**: Clean, easy-to-parse output
- **No API key required**: Free access to arXiv database

## Dependencies

This skill requires the `arxiv` Python package. The script will detect if it's missing and show an error.

**If you see "Error: arxiv package not installed":**

If running deepagents from a virtual environment (recommended), use the venv's Python:
```bash
.venv/bin/python -m pip install arxiv
```

Or for system-wide install:
```bash
python3 -m pip install arxiv
```

The package is not included in deepagents by default because it is specific to this skill. Install it on demand the first time you use the skill.
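
How the script detects the missing package is not shown, so the following is one possible approach rather than the script's actual code. It checks for the package with the standard-library `importlib.util.find_spec` and builds an install command for whichever interpreter is running, so the hint is correct for both the venv and system Python cases. The helper names are hypothetical.

```python
import importlib.util
import sys


def package_available(name: str) -> bool:
    """Return True if the top-level package `name` is importable in this interpreter."""
    return importlib.util.find_spec(name) is not None


def install_hint(name: str) -> str:
    """Build a pip command that targets the interpreter currently running."""
    return f"{sys.executable} -m pip install {name}"


def require(name: str) -> None:
    """Exit with an error message and an install hint if `name` is missing."""
    if not package_available(name):
        print(f"Error: {name} package not installed", file=sys.stderr)
        print(f"Install it with: {install_hint(name)}", file=sys.stderr)
        raise SystemExit(1)
```

Calling `require("arxiv")` at the top of a script would stop early with a clear message instead of a traceback.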

## Notes

- arXiv is particularly strong for:
    - Computer science (cs.LG, cs.AI, cs.CV)
    - Quantitative biology (q-bio)
    - Statistics (stat.ML)
    - Physics and mathematics
- Papers are preprints and may not be peer-reviewed
- Results include both recent uploads and older papers
- Best for computational/theoretical work in biology

Overview

This skill lets you search the arXiv preprint repository for papers across physics, mathematics, computer science, quantitative biology, and related fields. It provides a lightweight Python script that you run from the shell to submit queries and receive formatted results. Results are sorted by relevance, each entry contains the paper's title and abstract, and no authentication is needed.

How this skill works

The script uses the arXiv API via the arxiv Python package to submit queries, retrieve matching preprints, and print each paper's title and abstract. Results are ordered by relevance and separated by blank lines for easy reading or downstream parsing. The script accepts a free-text query and an optional maximum number of papers to return.
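
The flow described above can be sketched with the `arxiv` package's `Client`, `Search`, and `SortCriterion` classes. This is an illustration under stated assumptions, not the skill's actual source: the function names are hypothetical, and the `Title:` and `Summary:` labels are an assumed output format.

```python
def search_arxiv(query: str, max_papers: int = 10) -> list:
    """Query arXiv, most relevant first. Needs the `arxiv` package and network access."""
    import arxiv  # imported lazily so format_papers works without the package

    client = arxiv.Client()
    search = arxiv.Search(
        query=query,
        max_results=max_papers,
        sort_by=arxiv.SortCriterion.Relevance,
    )
    return list(client.results(search))


def format_papers(papers) -> str:
    """Render each paper's title and summary, with a blank line between papers."""
    return "\n\n".join(
        f"Title: {paper.title}\nSummary: {paper.summary}" for paper in papers
    )


# Example (performs a live request, so it is left commented out):
# print(format_papers(search_arxiv("protein folding prediction", max_papers=5)))
```

Keeping the formatting separate from the network call means the output layout can be checked with stand-in objects that only have `title` and `summary` attributes.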

When to use it

  • When you need preprints or the latest research before journal publication
  • To find machine learning or computational-methods papers applied to biology
  • When searching for mathematical or statistical methods relevant to a problem
  • To compile literature for a review, experiment, or reproducibility check
  • When you want quick, unauthenticated access to arXiv search results

Best practices

  • Craft concise, specific queries with keywords and phrases (e.g., "deep learning protein structure")
  • Limit --max-papers to a reasonable number for faster responses and easier filtering
  • Run the script inside a Python virtual environment and install the arxiv package there
  • Check whether a preprint has been peer reviewed, and cite the final published version when one exists
  • Pipe or redirect output to files if you plan to parse results programmatically

Example use cases

  • Quickly find recent papers on neural network approaches for protein folding
  • Search for new algorithms in genome assembly or single-cell RNA-seq analysis
  • Gather candidate papers for a literature review on statistical learning methods
  • Monitor arXiv for new uploads in cs.LG, stat.ML, or q-bio categories
  • Fetch abstracts to screen relevance before downloading full PDFs

FAQ

Do I need an API key to use this skill?

No. arXiv access is free and the script uses unauthenticated API endpoints.

What dependency is required?

Install the arxiv Python package (pip install arxiv) in the environment you run the script from.

How do I limit the number of results?

Use the --max-papers argument to set the maximum number of papers returned.