pdf-reading skill

/examples/pdf-server/plugin/skills/pdf-reading

This skill lets an agent open and navigate PDFs through a local PDF server, then offer to summarize the document, extract data, or answer questions once it is displayed.

npx playbooks add skill modelcontextprotocol/ext-apps --skill pdf-reading

SKILL.md

---
name: pdf-reading
description: Read and navigate PDF documents using the local PDF server. Reference this when the user wants to open, read, search, or interact with any PDF file or academic paper.
---

# PDF Reading

You have access to a local PDF server that provides interactive document viewing.

## Available Tools

- **list_pdfs** -- Show available PDFs. Call with no arguments.
- **display_pdf** -- Open a PDF in the interactive viewer.
  - `url` (string): URL or local file path
  - `page` (number, optional): Starting page number
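
As a rough illustration of the two tools above, the sketch below builds the request payloads a client might send. It assumes the server is reached through MCP's JSON-RPC `tools/call` method; `build_tool_call` is a hypothetical helper, not part of the PDF server.

```python
import json


def build_tool_call(name, arguments=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# list_pdfs takes no arguments.
list_request = build_tool_call("list_pdfs")

# display_pdf requires `url`; `page` is optional.
open_request = build_tool_call(
    "display_pdf",
    {"url": "https://arxiv.org/abs/2301.12345", "page": 3},
    request_id=2,
)

print(json.dumps(open_request, indent=2))
```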

## How to Use

**When the user mentions a PDF, paper, or document:**
1. If they give a URL or path, call `display_pdf` directly
2. If they say "open the paper" without specifying, call `list_pdfs` to show options
3. After displaying, offer to summarize, extract data, or answer questions

**arXiv shortcuts:**
- `arxiv.org/abs/2301.12345` is auto-converted to the PDF URL
- Users can just say "open arxiv 2301.12345"
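
The server performs this conversion itself, so the following is only a sketch of the mapping, not the server's code. It assumes new-style arXiv identifiers (`YYMM.NNNNN`, optionally with a version suffix) and the public `https://arxiv.org/pdf/<id>` URL form; `arxiv_pdf_url` is a hypothetical name.

```python
import re

# New-style arXiv identifier: YYMM.NNNN or YYMM.NNNNN, optional version (v2, v3, ...).
ARXIV_ID = re.compile(r"\b(\d{4}\.\d{4,5})(v\d+)?\b")


def arxiv_pdf_url(text):
    """Return the arXiv PDF URL for an ID or abs link in `text`, else None."""
    # Only treat the number as an arXiv ID when the text mentions arXiv.
    if "arxiv" not in text.lower():
        return None
    match = ARXIV_ID.search(text)
    if match is None:
        return None
    return "https://arxiv.org/pdf/" + match.group(0)


print(arxiv_pdf_url("arxiv.org/abs/2301.12345"))
print(arxiv_pdf_url("open arxiv 2301.12345"))
```

Both calls print `https://arxiv.org/pdf/2301.12345`.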

**Supported remote sources:**
arXiv, bioRxiv, medRxiv, chemRxiv, Zenodo, OSF, HAL Science, SSRN

## Best Practices

- Always display the PDF before trying to analyze it
- For multi-page documents, ask which section the user cares about
- When comparing papers, display them one at a time and note key differences

Overview

This skill lets the agent open, navigate, and interact with PDF documents via a local PDF server. It supports listing available PDFs, opening a specific file or URL, and starting on a given page. Use it as the primary tool whenever the user wants to read, search, or analyze a paper or document.

How this skill works

The skill exposes two core actions: list_pdfs to show available documents and display_pdf to open a PDF in an interactive viewer. If a user provides a URL or local path, display_pdf is called directly (optionally with a starting page). If the user asks to open a document without specifying which, the skill calls list_pdfs so the user can pick.
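
The routing described above can be sketched as a small function. This is an illustration under assumptions: in practice the agent makes this decision itself, and `plan_tool_call` is a hypothetical name that does not exist in the skill.

```python
from typing import Optional, Tuple


def plan_tool_call(locator: Optional[str] = None,
                   page: Optional[int] = None) -> Tuple[str, dict]:
    """Pick the tool and arguments for a request to open a PDF.

    `locator` is the URL or local path the user gave, if any.
    """
    # No document specified: list what is available so the user can pick.
    if locator is None or not locator.strip():
        return ("list_pdfs", {})

    # A URL or path was given: open it directly, optionally at a start page.
    arguments = {"url": locator.strip()}
    if page is not None:
        arguments["page"] = page
    return ("display_pdf", arguments)


print(plan_tool_call())
print(plan_tool_call("~/papers/report.pdf", page=10))
```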

When to use it

- User asks to open or read a specific PDF, paper, or report (provides URL or path).
- User requests to browse available local PDFs before selecting one to read.
- User asks to read a particular page, section, or start point in a PDF.
- User wants the agent to summarize, extract data, or answer questions about a document.
- User requests to open an arXiv/bioRxiv/medRxiv/Zenodo/other supported source by identifier.

Best practices

- Always display the PDF first so the agent and user reference the same document.
- For long or multi-section documents, ask which page range or section the user cares about.
- When given an arXiv ID or supported repository ID, convert it to the PDF URL automatically.
- Compare papers one at a time in the viewer and call out specific pages or figures when noting differences.
- Confirm file selection when list_pdfs returns multiple similarly named documents.
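
One way to decide when a confirmation is worth asking for is to flag near-duplicate file names. The sketch below assumes list_pdfs yields a plain list of file names, which the skill does not specify. The `similar_pairs` helper and the 0.8 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher


def similar_pairs(names, threshold=0.8):
    """Return pairs of names whose similarity ratio meets the threshold."""
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            ratio = SequenceMatcher(None, first.lower(), second.lower()).ratio()
            if ratio >= threshold:
                pairs.append((first, second))
    return pairs


# Assumed shape of a list_pdfs result: plain file names.
available = ["report-2023.pdf", "report-2024.pdf", "thesis.pdf"]
print(similar_pairs(available))
```

If the result is non-empty, the agent can ask the user which of the look-alike files they meant before calling display_pdf.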

Example use cases

- User says 'open arxiv 2301.12345': convert to PDF URL and call display_pdf starting at page 1.
- User asks 'open the latest report' with no path: call list_pdfs so the user can choose a file.
- User requests 'summarize pages 10–15' after a PDF is displayed: navigate to those pages and extract key points.
- User wants data extraction from a methods section: ask which section or page, then display and extract.
- User asks to compare two papers: display the first, note key findings, then display the second and highlight differences.

FAQ

What sources are supported?

arXiv, bioRxiv, medRxiv, chemRxiv, Zenodo, OSF, HAL Science, SSRN, plus local file paths and direct PDF URLs.

What if the user doesn’t specify which PDF to open?

Call list_pdfs to present available documents and prompt the user to choose.