This skill searches academic papers via Semantic Scholar API and returns structured notes with full text when available.
npx playbooks add skill bdambrosio/cognitive_workbench --skill semantic-scholar
---
name: semantic-scholar
type: python
description: "Search academic papers. Returns Collection of JSON Notes with fields text (full paper text via GROBID when PDF available, otherwise abstract), metadata.title, metadata.authors, metadata.year, metadata.citations, metadata.uri (alias: pdf_url), metadata.venue"
---
# semantic-scholar
Search academic papers using the Semantic Scholar API. Returns a Collection of structured Notes, with full paper text when a PDF is available.
## Input
- `query`: Query string (e.g., "attention mechanisms in neural networks")
- `limit`: Optional result limit (int, default: 10)
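A minimal sketch of how the two inputs could be assembled into a plan step of the kind shown under Common Workflows. The helper name `build_search_step` is hypothetical, and passing `limit` as a sibling key of `query` is an assumption based on the input list above, not confirmed behavior.

```python
import json


def build_search_step(query, limit=10, out="$papers"):
    """Build a semantic-scholar plan step as a dict (hypothetical helper)."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # Assumption: `limit` sits next to `query` in the step.
    return {"type": "semantic-scholar", "query": query, "limit": limit, "out": out}


step = build_search_step("attention mechanisms in neural networks", limit=5)
print(json.dumps(step))
```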
## Output
Success (`status: "success"`):
- `resource_id`: Collection ID containing structured Notes, each with:
  - `text`: Full paper text (via GROBID) or abstract
  - `format`: "paper"
  - `metadata.title`: Paper title
  - `metadata.authors`: List of authors
  - `metadata.year`: Publication year
  - `metadata.citations`: Citation count
  - `metadata.uri`: PDF URL (may be null for paywalled papers)
  - `metadata.venue`: Conference/journal name
  - `char_count`: Character count
## Behavior
- When GROBID is configured and a PDF is available, `text` contains the full paper content
- Otherwise `text` contains the abstract
- Requires `SEMANTIC_SCHOLAR_API_KEY` environment variable
- Requires `grobid_url` in YAML config for full text extraction
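The rules above can be read as a small decision procedure. The sketch below is an illustration only, not the skill's actual code: the function `expected_text_source` is hypothetical, and it assumes the YAML config has already been loaded into a plain dict.

```python
import os


def expected_text_source(config, pdf_url, env=None):
    """Predict whether `text` would hold full text or the abstract.

    Hypothetical helper: `config` is assumed to be the parsed YAML config
    as a dict, and `pdf_url` the paper's PDF link (or None).
    """
    env = os.environ if env is None else env
    # The API key is required for the search itself.
    if not env.get("SEMANTIC_SCHOLAR_API_KEY"):
        raise RuntimeError("SEMANTIC_SCHOLAR_API_KEY is not set")
    # Full text needs both a GROBID endpoint and a reachable PDF link.
    if config.get("grobid_url") and pdf_url:
        return "full_text"
    return "abstract"
```

For example, a config without `grobid_url`, or a paper whose PDF link is null, would yield `"abstract"` under these assumptions.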
## Content Structure
Each Note in the returned Collection has the following JSON structure:
```json
{
"text": "Full paper text or abstract...",
"format": "paper",
"metadata": {
"title": "Paper Title",
"authors": ["Author 1", "Author 2"],
"year": 2023,
"citations": 150,
"uri": "https://example.com/paper.pdf",
"venue": "NeurIPS"
},
"char_count": 5000
}
```
**Important:** All result data is in the Note's `content` field (a dict). Engine metadata (creation date, source tool, etc.) is separate and accessed via `get_resource_metadata()`, not via `content['metadata']`.
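As an illustration of reading result fields from a Note's `content` dict, the sketch below works on a plain Python dict shaped like the JSON structure above. The function `describe_paper` is hypothetical, and how the `content` dict is obtained from the engine is left out because it is not specified here.

```python
def describe_paper(content):
    """Format a one-line description from a Note's content dict."""
    # Result metadata lives under content["metadata"]; engine metadata
    # (creation date, source tool, ...) is separate and not read here.
    meta = content["metadata"]
    authors = ", ".join(meta["authors"])
    return (
        f'{meta["title"]} ({meta["year"]}, {meta["venue"]}) '
        f'by {authors}; {meta["citations"]} citations'
    )


# Sample content copied from the structure shown above.
example_content = {
    "text": "Full paper text or abstract...",
    "format": "paper",
    "metadata": {
        "title": "Paper Title",
        "authors": ["Author 1", "Author 2"],
        "year": 2023,
        "citations": 150,
        "uri": "https://example.com/paper.pdf",
        "venue": "NeurIPS",
    },
    "char_count": 5000,
}

print(describe_paper(example_content))
```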
## Key Principle
**Results already contain the paper text in the `text` field** (full text when GROBID is configured and a PDF is available, otherwise the abstract). Use extract/synthesize directly on the Collection; do NOT project `metadata.uri` for fetching. The URI is a PDF link for reference only; the text content is already loaded.
## Common Workflows
**Direct synthesis (preferred):**
```json
{"type":"semantic-scholar","query":"BERT model","out":"$papers"}
{"type":"synthesize","target":"$papers","focus":"key contributions of BERT","out":"$summary"}
```
**Per-paper extraction then synthesis:**
```json
{"type":"semantic-scholar","query":"attention mechanisms","out":"$papers"}
{"type":"map","target":"$papers","operation":"extract","instruction":"Extract the main architectural innovation","out":"$innovations"}
{"type":"synthesize","target":"$innovations","focus":"comparison of approaches","out":"$report"}
```
**Filter by year then analyze:**
```json
{"type":"filter-structured","target":"$papers","where":"metadata.year > 2020","out":"$recent_papers"}
{"type":"synthesize","target":"$recent_papers","focus":"recent advances","out":"$summary"}
```
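To make the `where` condition concrete, here is a plain-Python equivalent of `metadata.year > 2020` applied to a list of content dicts. It is a sketch of the semantics only, not the `filter-structured` implementation; skipping Notes with a missing or non-integer year is an assumption.

```python
def filter_by_year(notes, min_year):
    """Keep notes whose metadata.year is strictly greater than min_year."""
    kept = []
    for note in notes:
        year = note.get("metadata", {}).get("year")
        # Assumption: notes without a usable integer year are dropped.
        if isinstance(year, int) and year > min_year:
            kept.append(note)
    return kept


# Sample data for illustration only.
papers = [
    {"text": "...", "metadata": {"title": "Older paper", "year": 2018}},
    {"text": "...", "metadata": {"title": "Newer paper", "year": 2022}},
    {"text": "...", "metadata": {"title": "Undated paper", "year": None}},
]
recent = filter_by_year(papers, 2020)
print([p["metadata"]["title"] for p in recent])
```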
**Extract paper metadata:**
```json
{"type":"project","target":"$papers","fields":["metadata.title","metadata.year","metadata.citations"],"out":"$paper_info"}
```
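The dotted field names used by `project` can be understood as paths into each Note's content dict. The sketch below shows one way such a projection could work in plain Python; `get_path` and `project_fields` are hypothetical helpers, and returning `None` for a missing field is an assumption rather than documented behavior.

```python
def get_path(obj, dotted):
    """Follow a dotted path such as 'metadata.title' through nested dicts."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None  # Assumption: missing fields resolve to None.
        obj = obj[key]
    return obj


def project_fields(notes, fields):
    """Return one flat dict per note, keyed by the requested dotted paths."""
    return [{field: get_path(note, field) for field in fields} for note in notes]


# Sample data for illustration only.
notes = [
    {"text": "...", "metadata": {"title": "Paper Title", "year": 2023, "citations": 150}},
]
info = project_fields(notes, ["metadata.title", "metadata.year", "metadata.citations"])
print(info)
```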
This skill searches academic papers via the Semantic Scholar API and returns a Collection of structured Notes containing paper text (full text when available) and rich metadata. It is designed to feed downstream extraction and synthesis workflows with ready-to-use paper content. Use it to gather papers, analyze contributions, or build literature summaries quickly.
The skill queries Semantic Scholar with a user-provided query and an optional limit, then returns a Collection where each Note contains text (full paper via GROBID when a PDF is available, otherwise the abstract) plus metadata fields such as title, authors, year, citations, PDF URL, and venue. It requires SEMANTIC_SCHOLAR_API_KEY in the environment, and a grobid_url in the configuration if you want full-text extraction. Results are already loaded in Note.content.text, so downstream steps should operate on those texts rather than fetching metadata.uri.
Do I need to download PDFs to get the full text?
No. When GROBID is configured and a PDF is available, the skill returns the full paper text in Note.content.text. There is no need to fetch metadata.uri to obtain the content.
What configuration is required to get full texts?
Set the SEMANTIC_SCHOLAR_API_KEY environment variable and provide a grobid_url in the skill configuration. Without GROBID or a PDF link, the skill falls back to abstracts.