PDF MCP Server

Provides a production-ready MCP server for PDF processing with intelligent caching and specialized read, search, and extract tools.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "jztan-pdf-mcp": {
      "command": "pdf-mcp",
      "args": [],
      "env": {
        "PDF_MCP_CACHE_DIR": "path to cache directory (default: ~/.cache/pdf-mcp)",
        "PDF_MCP_CACHE_TTL": "TTL in hours (default: 24)"
      }
    }
  }
}

You can run the pdf-mcp server locally to process PDF documents with intelligent caching, enabling fast reads, searches, and image extraction for AI agents and applications. This MCP server provides specialized tools to read, navigate, and analyze PDFs efficiently, even when dealing with large files or repeated access.

How to use

You interact with pdf-mcp through an MCP client, using the server to access a suite of PDF-focused tools. Start the server locally, then connect your client to the server using the standard MCP connection method described for your client. Once connected, you can inspect a document, read specific page ranges in chunks, search for content before loading, extract images, and leverage a persistent cache to speed up subsequent accesses.
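Most MCP clients hide the wire format, but it can help to see roughly what a tool call looks like. In MCP, a client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for pdf_info. The helper name build_tool_call and the "path" argument are assumptions made for illustration; the real argument names come from the input schema the server advertises for each tool.

```python
import json
from itertools import count

# Simple counter for JSON-RPC request ids (illustrative only).
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "path" is a hypothetical argument name; check the tool's input schema.
request = build_tool_call("pdf_info", {"path": "/docs/report.pdf"})
print(json.dumps(request, indent=2))
```

In practice your MCP client constructs and sends these messages for you, so this is only useful for debugging or for understanding logs.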

How to install

Prerequisites: Python and pip (the Python package manager) must be available on your system.

pip install pdf-mcp

Caching and configuration

The server uses a persistent SQLite cache to accelerate repeated access and survive server restarts. The default cache location is ~/.cache/pdf-mcp, a hidden folder under your home directory. You can configure the cache directory and the time-to-live (TTL) for cached items using environment variables.

Important cache details include automatic invalidation when the document changes, a manual clear option, and a configurable TTL to balance freshness with speed.
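To make the invalidation and TTL behavior concrete, here is a toy cache built on Python's sqlite3 module. It is not the server's actual implementation: the class name PageCache, the table layout, and the use of modification time plus file size as a change detector are all assumptions chosen to illustrate the idea.

```python
import os
import sqlite3
import time


class PageCache:
    """Toy persistent cache: entries expire after a TTL or when the file changes."""

    def __init__(self, db_path: str, ttl_hours: float = 24):
        self.ttl_seconds = ttl_hours * 3600
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pages ("
            "path TEXT, page INTEGER, fingerprint TEXT, created REAL, text TEXT, "
            "PRIMARY KEY (path, page))"
        )

    @staticmethod
    def _fingerprint(path: str) -> str:
        # A cheap change detector (assumption): modification time plus size.
        st = os.stat(path)
        return f"{st.st_mtime_ns}:{st.st_size}"

    def put(self, path: str, page: int, text: str) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?, ?)",
            (path, page, self._fingerprint(path), time.time(), text),
        )
        self.db.commit()

    def get(self, path: str, page: int):
        row = self.db.execute(
            "SELECT fingerprint, created, text FROM pages "
            "WHERE path = ? AND page = ?",
            (path, page),
        ).fetchone()
        if row is None:
            return None
        fingerprint, created, text = row
        expired = time.time() - created > self.ttl_seconds
        changed = fingerprint != self._fingerprint(path)
        # Treat stale or outdated entries as cache misses.
        return None if expired or changed else text
```

Because the data lives in a SQLite file rather than in memory, entries written by one process are still available after a restart, which is the property the server's cache relies on.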

Two environment variables customize caching: PDF_MCP_CACHE_DIR sets the cache directory (default: ~/.cache/pdf-mcp), and PDF_MCP_CACHE_TTL sets the cache TTL in hours (default: 24).
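The configuration shown under Installation uses descriptive placeholders for the two environment variables. A small script can generate the same entry with concrete values. This is a sketch: the helper name build_client_config is hypothetical, and it assumes the server accepts the TTL as a whole number of hours passed as a string.

```python
import json
from pathlib import Path


def build_client_config(cache_dir: Path, ttl_hours: int) -> dict:
    """Return an MCP client config entry for pdf-mcp with concrete cache settings."""
    return {
        "mcpServers": {
            "jztan-pdf-mcp": {
                "command": "pdf-mcp",
                "args": [],
                "env": {
                    # Environment variable values are always strings.
                    "PDF_MCP_CACHE_DIR": str(cache_dir),
                    "PDF_MCP_CACHE_TTL": str(ttl_hours),
                },
            }
        }
    }


# Uses the documented defaults; substitute your own directory and TTL.
config = build_client_config(Path.home() / ".cache" / "pdf-mcp", ttl_hours=24)
print(json.dumps(config, indent=2))
```

Paste the printed JSON into your MCP client configuration file, merging it with any servers you already have defined.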

Using the PDF tools

The server exposes eight specialized tools to work with PDFs. You typically start by inspecting the document, then read specific pages, search within the document, and optionally extract images or inspect the table of contents. Each tool is designed to help you build concise, chunked workflows that keep context within reasonable limits.

Tool capabilities at a glance

- pdf_info: Gather document metadata, page count, and table of contents to plan reads. Always begin with this to understand the document.

- pdf_read_pages: Read defined page ranges or specific pages in manageable chunks.

- pdf_read_all: Read the entire document when it is small enough to fit within the server's safety limit.

- pdf_search: Find relevant sections before loading full content.

- pdf_get_toc: Retrieve the table of contents for quick navigation.

- pdf_extract_images: Extract images from specified pages as base64-encoded PNGs.

- pdf_cache_stats: View statistics about the cache.

- pdf_cache_clear: Clear expired or undesirable cache entries.

Example workflow

For a large document, start by inspecting the PDF, then read relevant page ranges in batches and finally synthesize a response from the gathered chunks.
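The batching step of this workflow can be planned up front once pdf_info has reported the page count. The sketch below computes the page ranges to request; the function name page_batches, the batch size of 10, and the 1-based inclusive page numbering are assumptions, so adjust them to match what pdf_read_pages actually expects.

```python
def page_batches(page_count: int, batch_size: int = 10):
    """Yield inclusive, 1-based (start, end) page ranges covering the document."""
    if page_count < 0 or batch_size < 1:
        raise ValueError("page_count must be >= 0 and batch_size >= 1")
    for start in range(1, page_count + 1, batch_size):
        # The last batch may be shorter than batch_size.
        yield start, min(start + batch_size - 1, page_count)


# A 25-page document read 10 pages at a time.
print(list(page_batches(25, batch_size=10)))  # [(1, 10), (11, 20), (21, 25)]
```

Each range would then become one pdf_read_pages call, with the results summarized or filtered before moving on so the accumulated context stays small.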

Development

If you plan to contribute or build locally, you can install the package in editable mode and run tests and checks as part of your workflow.

Available tools

pdf_info

Get document information including page count, metadata, table of contents, file size, and estimated tokens. This should be called first to understand the document before reading.

pdf_read_pages

Read specific page ranges or individual pages in manageable chunks to control context size during processing.

pdf_read_all

Read the entire document when it fits within the safety limit defined by the server.

pdf_search

Search within the PDF to locate relevant pages before loading the content.

pdf_get_toc

Retrieve the table of contents to navigate the document structure quickly.

pdf_extract_images

Extract images from specified pages and encode them as base64 PNGs.
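On the client side, a base64-encoded PNG has to be decoded before it can be saved or displayed. The sketch below shows one way to do that with the standard library. The helper name save_png is hypothetical, and how the encoded string is located in the tool's response is not covered here because the response structure is not documented on this page.

```python
import base64
from pathlib import Path

# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_png(encoded: str, dest: Path) -> int:
    """Decode a base64 string, confirm it looks like a PNG, and write it to dest.

    Returns the number of bytes written.
    """
    data = base64.b64decode(encoded)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    dest.write_bytes(data)
    return len(data)
```

Checking the signature before writing is a cheap guard against saving a truncated or mislabeled payload under a .png name.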

pdf_cache_stats

Show statistics about the PDF cache, including hit rate and size.

pdf_cache_clear

Clear expired or undesired cache entries to free space and refresh data.