PDFtotext MCP Server

A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.

Installation
Add the following to your MCP client configuration file.

Configuration

{
  "mcpServers": {
    "jpwebb-pdftotext-mcp": {
      "command": "pdftotext-mcp",
      "args": []
    }
  }
}
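
If your client configuration file already lists other servers, the entry above has to be merged into the existing "mcpServers" object rather than replacing it. The following is a minimal Python sketch of that merge. The location of the configuration file depends on your client and is not covered here, and the "other-server" entry is a hypothetical placeholder used only for illustration.

```python
import json

# Key and entry copied from the configuration shown above.
SERVER_NAME = "jpwebb-pdftotext-mcp"
SERVER_ENTRY = {"command": "pdftotext-mcp", "args": []}


def add_pdftotext_server(config):
    """Return a copy of `config` with the pdftotext server entry added.

    Existing entries under "mcpServers" are kept; the input is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[SERVER_NAME] = dict(SERVER_ENTRY)
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # "other-server" is a hypothetical, pre-existing entry.
    existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
    print(json.dumps(add_pdftotext_server(existing), indent=2))
```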

This MCP server extracts text from PDF documents using the pdftotext utility. It can extract the whole document or specific pages, preserves layout when requested, handles multiple encodings, and returns metadata that helps you verify results and troubleshoot issues quickly.

How to use

You connect to this server from any MCP-compatible client and call its single tool, read_pdf_text. Start the local server or point your client at a remote MCP endpoint, then issue read_pdf_text requests with the path to your PDF and any optional parameters. You can request the full document or a specific page, enable layout preservation for better readability, and choose an encoding such as UTF-8, Latin1, or ASCII. The server responds with the extracted text and detailed metadata that helps you verify size, timing, and page boundaries.
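
In MCP, a tool call is sent as a JSON-RPC 2.0 request with the method "tools/call". The Python sketch below builds such a request for read_pdf_text. The argument keys ("path", "page", "layout", "encoding") are assumptions made for illustration and are not taken from the server's schema, so check the tool's input schema in your client before relying on them.

```python
import json


def build_read_pdf_text_request(request_id, pdf_path, page=None,
                                layout=False, encoding="UTF-8"):
    """Build a JSON-RPC 2.0 `tools/call` request for read_pdf_text.

    The argument keys below are hypothetical; the server's actual input
    schema may use different names.
    """
    arguments = {"path": pdf_path, "layout": layout, "encoding": encoding}
    if page is not None:
        # Leave "page" out to request the whole document.
        arguments["page"] = page
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_pdf_text", "arguments": arguments},
    }


if __name__ == "__main__":
    # The path is a placeholder for a PDF on your own machine.
    request = build_read_pdf_text_request(1, "/tmp/report.pdf", page=3, layout=True)
    print(json.dumps(request, indent=2))
```

Most MCP clients construct this envelope for you, so in practice you usually supply only the tool name and its arguments.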

Practical outcomes you can achieve include verifying exact text from a page, comparing text between pages, and integrating the results into downstream processing pipelines, QA workflows, or searchable document indexes. If a request fails, you’ll receive structured error information to guide troubleshooting.

How to install

Prerequisites: you must have the pdftotext utility installed on your system.

# Prerequisites install (examples)
# Ubuntu/Debian
sudo apt update
sudo apt install poppler-utils

# macOS
brew install poppler

# Windows (use either Chocolatey or Scoop)
choco install poppler
# or
scoop install poppler

Additional sections

Configuration is done by providing a server entry to your MCP client configuration. You can run the server locally for testing or point clients at a remote MCP server if you deploy it elsewhere.

```
# Local development start (recommended for development)
npm install
npm start
```

```
# Optional global installation (exposes a binary)
npm install -g pdftotext-mcp
pdftotext-mcp --help
```

```
# Alternative using npx (no install required)
npx pdftotext-mcp
```

Troubleshooting

If you encounter issues, first check the common causes: a missing pdftotext tool, an incorrect file path, or a permission problem. Make sure the target PDF exists at the path you provided and is readable by the MCP server process.
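
The checks above can be sketched as a small Python helper. It is a standalone diagnostic, not part of the server, and the function name is made up for illustration. Run it as the same user that runs the MCP server process, because both the PATH lookup and the read-permission check depend on the current user.

```python
import os
import shutil


def diagnose(pdf_path):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # pdftotext ships with poppler-utils (Debian/Ubuntu) or poppler (Homebrew).
    if shutil.which("pdftotext") is None:
        problems.append("pdftotext not found on PATH")
    if not os.path.isfile(pdf_path):
        problems.append(f"no such file: {pdf_path}")
    elif not os.access(pdf_path, os.R_OK):
        problems.append(f"file is not readable by this user: {pdf_path}")
    return problems


if __name__ == "__main__":
    # The path is a placeholder; replace it with the PDF you are trying to read.
    for problem in diagnose("/tmp/report.pdf"):
        print(problem)
```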

If you see client connection issues, verify that the MCP server is running and that your client configuration points to the correct command or URL. For local testing, ensure you started the server from the expected working directory and that the process is still running.

Testing

Run the project’s tests and linting to ensure your environment is correctly set up and code quality checks pass.

npm test
npm run lint

Available tools

read_pdf_text

Extracts text from PDF files with options to target a specific page, preserve layout, and choose encoding; returns the extracted text along with rich metadata such as file info, page range, encoding, and timing.
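
To make the options concrete, here is a hedged Python sketch of how they could map onto the pdftotext command line. It is not the server's actual implementation, and the function and parameter names are assumptions. The flags themselves (-f and -l for the first and last page, -layout, -enc, and "-" to write to standard output) are standard pdftotext options.

```python
import subprocess


def build_pdftotext_argv(pdf_path, page=None, layout=False, encoding="UTF-8"):
    """Translate tool-style options into a pdftotext argument list."""
    argv = ["pdftotext"]
    if page is not None:
        # A single page is selected by using it as both first and last page.
        argv += ["-f", str(page), "-l", str(page)]
    if layout:
        argv.append("-layout")
    # "-" as the output file makes pdftotext write to standard output.
    argv += ["-enc", encoding, pdf_path, "-"]
    return argv


def extract_text(pdf_path, **options):
    """Run pdftotext and return its output as bytes in the requested encoding."""
    result = subprocess.run(build_pdftotext_argv(pdf_path, **options),
                            capture_output=True, check=True)
    return result.stdout
```

For example, extract_text("report.pdf", page=3, layout=True) runs pdftotext on page 3 only with layout preservation. It raises subprocess.CalledProcessError if pdftotext exits with an error, and FileNotFoundError if pdftotext is not installed.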