home / skills / bdambrosio / cognitive_workbench / extract-struct
This skill extracts structured metadata from academic paper text using an LLM and outputs JSON with title, authors, year, venue, and abstract.
npx playbooks add skill bdambrosio/cognitive_workbench --skill extract-structReview the files below or copy the command above to add this skill to your agents.
---
name: extract-struct
type: prompt_augmentation
description: "Extract structured metadata (title, authors, year) from paper text using LLM. Use to convert online search results metadata to JSON"
---
# extract-struct
Extract structured metadata from academic paper text using LLM analysis.
## Input
- `target`: Note ID or variable containing full text or first pages of academic paper
## Output
Returns JSON Note with:
- `title`: Paper title
- `authors`: List of author names
- `year`: Publication year
- `venue`: Conference/journal if identifiable
- `abstract`: Paper abstract if present
## Behavior
- Uses LLM to analyze paper text and extract structured fields
- Handles various paper formats and layouts
- Returns only JSON, no explanation text
## Planning Notes
- Provide full text or first few pages for best results
- Works best with academic papers that have clear title/author sections
- Use with `fetch-text` to get paper content first
## Example
```json
{"type":"extract-struct","target":"$paper_text","out":"$metadata"}
```
This skill extracts structured metadata (title, authors, year, venue, abstract) from academic paper text using a large language model. It converts unstructured paper content into clean JSON suitable for databases, search results, or bibliographic tools. The skill focuses on robustness across common paper layouts and noisy text extractions.
The skill analyzes the provided paper text (full text or first pages) with an LLM prompt tuned to identify title, authors, year, venue, and abstract. It handles variations in formatting, headers, and page artifacts, and normalizes the results into a single JSON object. The output is strictly JSON with the fields title, authors (array), year, venue, and abstract when present.
What input does the skill expect?
A Note ID or variable containing the paper text; ideally the first pages or full text.
What does the skill return?
A JSON object with title, authors (list), year, venue, and abstract when available.