
PDF MCP Server

An MCP server for reading PDFs

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "averagejoeslab-pdf-reader-mcp": {
      "url": "http://localhost:8000/sse"
    }
  }
}
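
If your client configuration file already lists other servers, the entry above has to be merged into the existing "mcpServers" object rather than replacing it. The following is a minimal sketch of that merge in Python, using only the standard library. The helper name `add_server` and the `other-server` entry are illustrative assumptions, not part of this server or any MCP client.

```python
import json


def add_server(config: dict, name: str, url: str) -> dict:
    """Return a copy of `config` with an MCP server entry added or replaced.

    Hypothetical helper for illustration; it does not modify `config` in place.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"url": url}
    merged["mcpServers"] = servers
    return merged


# "other-server" is a made-up entry standing in for whatever is already configured.
existing = {"mcpServers": {"other-server": {"url": "http://localhost:9000/sse"}}}

updated = add_server(
    existing,
    "averagejoeslab-pdf-reader-mcp",
    "http://localhost:8000/sse",
)
print(json.dumps(updated, indent=2))
```

The printed JSON can then be written back to the configuration file your client reads; where that file lives depends on the client.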

This MCP server reads and analyzes scholarly PDFs: it extracts their content, indexes it, and helps you reason about it. It provides full-text extraction, metadata enrichment, and awareness of academic structure, so you can process PDFs with natural language prompts and targeted analyses and build tooling that understands academic documents.

How to use

Connect your MCP client to the PDF reader server, then use natural language requests to load PDFs, extract content, and perform academic analyses. You can load a PDF, retrieve its metadata, extract images and tables, render pages as high-resolution images, and run specialized academic prompts that summarize a paper or analyze its methodology. You can also ask the server to detect sections such as Abstract, Introduction, and Methods to navigate long papers quickly.

Basic workflows with the PDF reader

  • Load a PDF and cache it for faster repeated access
  • Extract the full text from a PDF document
  • Get metadata for a specific PDF to understand its properties
  • Render a page as a high-resolution image for figures or diagrams
  • Extract images, tables, and annotations from a PDF
  • Identify academic sections to understand document structure
  • Use academic prompts to summarize or analyze research methodology
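
Under the hood, an MCP client turns each of these workflows into a JSON-RPC 2.0 `tools/call` request. The sketch below builds such request bodies for the first few workflows. The tool names come from this page, but the argument names (`path`, `page`) and the file name `paper.pdf` are assumptions for illustration; the real parameter names are defined by the server's tool schemas.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request body for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument names below are hypothetical; check the server's tool schemas.
workflow = [
    ("load-pdf", {"path": "paper.pdf"}),
    ("get-metadata", {"path": "paper.pdf"}),
    ("render-page", {"path": "paper.pdf", "page": 1}),
]

requests = [
    make_tool_call(request_id, tool, arguments)
    for request_id, (tool, arguments) in enumerate(workflow, start=1)
]

for request in requests:
    print(json.dumps(request))
```

In practice an MCP client library sends these messages for you; the sketch only shows the shape of what travels over the connection.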

Advanced prompts you can use

Summarize the PDF in a technical style focusing on methodology to capture how the study was designed and executed.

Analyze the structure of the PDF to understand its organization and how arguments are built across sections.

Extract citations and build a parsed references list to trace sources and credibility.

Available tools

load-pdf

Load and cache a PDF for processing and reuse in subsequent requests.

get-metadata

Retrieve document metadata and general information from the PDF.

extract-images

Extract embedded images from the PDF along with their metadata.

render-page

Render a specific PDF page as a high-resolution image.

extract-text

Extract the full text content from the PDF while preserving reading order.

extract-tables

Detect and extract table data from the document.

extract-annotations

Extract comments, highlights, and other annotations.

extract-academic-text

Extract text in proper reading order while preserving mathematical formulas.

detect-sections

Identify academic sections such as Abstract, Introduction, Methods, and Results.

extract-abstract

Extract only the abstract section from the document.

extract-key-sections

Return the key sections, such as Abstract, Methods, and Conclusions, in a form optimized for agent understanding.

extract-citations

Parse in-text citations and reference lists for easy tracing of sources.

chunk-content

Break content into semantic chunks suitable for agent processing.

analyze-document-structure

Perform a comprehensive analysis of the document’s structure and organization.
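
To confirm that a running server exposes the tools listed above, a client can compare them with the result of the MCP `tools/list` method, which returns the advertised tools under `result.tools`. The following is a minimal sketch of that check. The `sample` response is hand-written for illustration and is not output captured from this server, and `missing_tools` is a hypothetical helper name.

```python
# Tool names as listed on this page.
EXPECTED_TOOLS = {
    "load-pdf",
    "get-metadata",
    "extract-images",
    "render-page",
    "extract-text",
    "extract-tables",
    "extract-annotations",
    "extract-academic-text",
    "detect-sections",
    "extract-abstract",
    "extract-key-sections",
    "extract-citations",
    "chunk-content",
    "analyze-document-structure",
}


def missing_tools(tools_list_response: dict) -> set[str]:
    """Return the expected tool names absent from a `tools/list` response."""
    tools = tools_list_response.get("result", {}).get("tools", [])
    advertised = {tool["name"] for tool in tools}
    return EXPECTED_TOOLS - advertised


# Hand-written sample response, deliberately incomplete to show the check.
sample = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": "load-pdf"}, {"name": "extract-text"}]},
}

print(sorted(missing_tools(sample)))
```

An empty result means every tool named on this page is advertised by the server you are connected to.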