This skill extracts references from PDFs using GROBID and returns a collection of notes with structured metadata for each citation.
`npx playbooks add skill bdambrosio/cognitive_workbench --skill extract-references`
---
name: extract-references
type: python
description: "Extract bibliography/references from PDF files using GROBID and return a Collection of Notes (one per reference)."
schema_hint: {"path": "string (PDF file path or Note ID)", "grobid_url": "string"}
---
# extract-references
Extract bibliography/references from PDF files using GROBID. Returns a Collection of Notes, where each Note contains structured metadata for one reference (compatible with format-citation).
## Input
- `path`: PDF file path (absolute) or Note ID containing PDF URL/metadata (required)
- `grobid_url`: GROBID server URL (optional; taken from world_config when omitted)
## Output
Success returns:
- `resource_id`: Collection ID containing Notes (one Note per reference)
- Each Note contains:
  - `data`: Structured reference metadata (title, authors, year, venue, doi, url)
  - `metadata`: Source PDF, reference index, raw citation text
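To make the shape of one reference Note concrete, here is a minimal sketch of the two parts described above. The `data` field names come from the list; the class names and the `metadata` key names (`source_pdf`, `reference_index`, `raw_citation`) are assumptions chosen for illustration, and the skill's actual keys may differ.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class ReferenceData:
    """Structured citation fields listed under `data`."""
    title: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    year: Optional[str] = None
    venue: Optional[str] = None
    doi: Optional[str] = None
    url: Optional[str] = None


@dataclass
class ReferenceMetadata:
    """Provenance fields; these key names are hypothetical."""
    source_pdf: str
    reference_index: int
    raw_citation: str = ""


@dataclass
class ReferenceNote:
    data: ReferenceData
    metadata: ReferenceMetadata


# Placeholder values, not taken from a real paper.
note = ReferenceNote(
    data=ReferenceData(title="A placeholder title", authors=["Ada Example"], year="2020"),
    metadata=ReferenceMetadata(source_pdf="/path/to/paper.pdf", reference_index=0),
)
note_dict = asdict(note)  # plain nested dict, ready to serialize as JSON
```

Fields that GROBID could not recover stay `None` (or an empty list for `authors`), so downstream formatting code should be prepared for missing values.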
## Behavior
- Uses GROBID to parse the PDF and extract references from `<bibl>` elements
- Creates one Note per reference with structured metadata
- Returns an empty Collection if no references are found
- Reference Notes are compatible with the `format-citation` tool
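The behavior above can be sketched as a small TEI parser. This is not the skill's own code: it is a standard-library illustration under stated assumptions. In GROBID's TEI output each parsed entry is typically a `biblStruct` element, which is what the sketch searches for; the function name `parse_references`, the output key names, and the sample XML are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text(elem):
    """Return whitespace-normalized text of an element, or None."""
    if elem is None:
        return None
    return " ".join("".join(elem.itertext()).split()) or None


def parse_references(tei_xml, source_pdf):
    """Build one dict per bibliography entry; an empty list if none are found."""
    root = ET.fromstring(tei_xml)
    notes = []
    for index, bibl in enumerate(root.iterfind(".//tei:biblStruct", TEI)):
        analytic_title = bibl.find("tei:analytic/tei:title", TEI)
        monogr_title = bibl.find("tei:monogr/tei:title", TEI)
        if analytic_title is not None:
            # Article-like entry: the containing journal or book is the venue.
            title, venue = _text(analytic_title), _text(monogr_title)
        else:
            title, venue = _text(monogr_title), None

        authors = []
        for pers in bibl.iterfind(".//tei:author/tei:persName", TEI):
            name = " ".join(p for p in (_text(c) for c in pers) if p)
            if name:
                authors.append(name)

        date = bibl.find(".//tei:date", TEI)
        year = None
        if date is not None:
            year = (date.get("when") or _text(date) or "")[:4] or None

        ptr = bibl.find(".//tei:ptr", TEI)
        notes.append({
            "data": {
                "title": title,
                "authors": authors,
                "year": year,
                "venue": venue,
                "doi": _text(bibl.find(".//tei:idno[@type='DOI']", TEI)),
                "url": ptr.get("target") if ptr is not None else None,
            },
            "metadata": {
                "source_pdf": source_pdf,
                "reference_index": index,
                "raw_citation": _text(bibl.find("tei:note[@type='raw_reference']", TEI)) or "",
            },
        })
    return notes


# Hand-written sample with placeholder values, shaped like GROBID TEI output.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><back><div><listBibl>
<biblStruct xml:id="b0">
  <analytic>
    <title level="a" type="main">A placeholder title</title>
    <author><persName><forename type="first">Ada</forename><surname>Example</surname></persName></author>
    <idno type="DOI">10.0000/placeholder</idno>
  </analytic>
  <monogr>
    <title level="j">Placeholder Journal</title>
    <imprint><date type="published" when="2020-05">2020</date></imprint>
  </monogr>
  <note type="raw_reference">A. Example. A placeholder title. Placeholder Journal, 2020.</note>
</biblStruct>
</listBibl></div></back></text></TEI>"""

references = parse_references(SAMPLE, "/path/to/paper.pdf")
```

A document without any entries yields an empty list, which mirrors the empty Collection described above.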
## Examples
```json
{"type":"extract-references","path":"/path/to/paper.pdf","out":"$refs"}
{"type":"extract-references","path":"$paper_note","out":"$refs"}
{"type":"extract-references","path":"$refs","format-citation":"bibtex","out":"$bibtex"}
```
This skill extracts bibliography and reference entries from PDF files using a GROBID server and returns a Collection of Notes, one per reference. Each Note contains structured citation metadata (title, authors, year, venue, DOI, URL) plus source metadata to support downstream citation formatting or linking. The Notes are compatible with the `format-citation` tool.
The skill sends the PDF (or a Note that contains a PDF URL/metadata) to a GROBID instance, which parses the document and locates `<bibl>` elements representing bibliography entries. For each extracted reference it builds a Note that includes structured fields and the raw citation text, then stores all Notes in a Collection and returns the Collection ID. If no references are found, an empty Collection is returned.
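The request step described above might look like the following standard-library sketch. It assumes the skill uses GROBID's `/api/processReferences` endpoint with the PDF in the `input` form field, which is how the GROBID REST service accepts reference-only requests; whether the skill calls this endpoint or a full-text one is not stated here. The function names and the default `http://localhost:8070` URL are illustrative assumptions.

```python
import urllib.request
import uuid


def build_grobid_request(pdf_bytes, grobid_url="http://localhost:8070"):
    """Build a multipart POST for GROBID's processReferences endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Ask GROBID to include the raw citation string for each entry.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="includeRawCitations"\r\n\r\n'
         "1\r\n").encode("ascii"),
        # The PDF itself goes in the `input` form field.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="input"; filename="paper.pdf"\r\n'
         "Content-Type: application/pdf\r\n\r\n").encode("ascii"),
        pdf_bytes,
        b"\r\n",
        f"--{boundary}--\r\n".encode("ascii"),
    ])
    return urllib.request.Request(
        grobid_url.rstrip("/") + "/api/processReferences",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def fetch_references_tei(pdf_path, grobid_url="http://localhost:8070", timeout=120):
    """Send a local PDF to GROBID and return the TEI XML response as text."""
    with open(pdf_path, "rb") as handle:
        request = build_grobid_request(handle.read(), grobid_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_references_tei("/path/to/paper.pdf")` requires a running GROBID server; the returned TEI string is what a parser would then split into one Note per entry.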
What input formats are supported?
Input must be a PDF file path (absolute) or a Note ID that contains a PDF URL/metadata. Other formats are not supported.
What happens if GROBID fails to parse references?
If GROBID cannot find `<bibl>` elements or parsing fails, the skill returns an empty Collection. Check logs, PDF quality, or try a different GROBID instance.
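Because an empty Collection can mean either "no references in the PDF" or "GROBID did not respond", a quick liveness probe helps tell the two apart. The sketch below assumes the server exposes GROBID's `/api/isalive` health endpoint; the function name and default URL are hypothetical and not part of the skill.

```python
import urllib.error
import urllib.request


def grobid_is_alive(grobid_url="http://localhost:8070", timeout=5):
    """Return True if the GROBID health endpoint answers with HTTP 200."""
    url = grobid_url.rstrip("/") + "/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

If the probe returns `False`, the empty result most likely reflects a server problem rather than the PDF's content, so trying a different `grobid_url` is the first thing to check.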