pdftotext MCP Server
A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.
Configuration
{
  "mcpServers": {
    "jpwebb-pdftotext-mcp": {
      "command": "pdftotext-mcp",
      "args": []
    }
  }
}

You have a dedicated MCP server that extracts text from PDF documents using the reliable pdftotext utility. It supports extracting the whole document or specific pages, preserves layout when requested, handles multiple encodings, and returns rich metadata to help you verify results and troubleshoot issues quickly.
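If your client keeps its configuration in a JSON file that already lists other servers, the entry from the Configuration section can be merged in rather than pasted by hand. The following is a minimal sketch of that merge. The function name `add_server_entry` is hypothetical, and where the config file lives depends on your client, so the sketch works on an already-loaded dictionary.

```python
from typing import Any, Dict

# Server name and command are taken from the Configuration section above.
SERVER_NAME = "jpwebb-pdftotext-mcp"


def add_server_entry(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with the pdftotext server registered.

    Hypothetical helper: the input dictionary is left unmodified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[SERVER_NAME] = {"command": "pdftotext-mcp", "args": []}
    merged["mcpServers"] = servers
    return merged
```

Load the client's config with `json.load`, pass it through `add_server_entry`, and write the result back with `json.dump` once you have confirmed the file location for your client.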
Connect to this server from any MCP-compatible client and call its single tool, read_pdf_text. Start the server locally or point your client at a remote MCP endpoint, then issue read_pdf_text requests with the path to your PDF and any optional parameters. You can request the full document or a specific page, enable layout preservation for better readability, and choose an encoding such as UTF-8, Latin1, or ASCII. The server responds with the extracted text and detailed metadata that helps you verify size, timing, and page boundaries.
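Most MCP clients build the tool call for you, but seeing the underlying message can help when debugging. MCP tool calls are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below serializes one for read_pdf_text. The argument names `path`, `page`, `layout`, and `encoding` are assumptions inferred from the description above, so check the tool schema the server advertises before relying on them.

```python
import json
from typing import Optional


def build_read_pdf_text_request(
    request_id: int,
    path: str,
    page: Optional[int] = None,
    layout: bool = False,
    encoding: str = "UTF-8",
) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for read_pdf_text."""
    # Assumed argument names; the server's real schema may differ.
    arguments = {"path": path, "layout": layout, "encoding": encoding}
    if page is not None:
        # Leaving `page` out is assumed to mean "extract the whole document".
        arguments["page"] = page
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_pdf_text", "arguments": arguments},
    }
    return json.dumps(request)
```

For example, `build_read_pdf_text_request(1, "/tmp/report.pdf", page=3, layout=True)` produces a single-line JSON string that a stdio client could write to the server followed by a newline.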
Practical outcomes you can achieve include verifying exact text from a page, comparing text between pages, and integrating the results into downstream processing pipelines, QA workflows, or searchable document indexes. If a request fails, you'll receive structured error information to guide troubleshooting.
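When feeding results into a pipeline, it helps to separate successful extractions from failures early. In MCP, a tool result carries a `content` list of typed parts and an optional `isError` flag. The sketch below relies only on that general shape. How this particular server lays out its metadata and error details inside the text is not specified here, so the helper treats the text as opaque. `PdfToolError` and `extract_text` are hypothetical names.

```python
from typing import Any, Dict


class PdfToolError(RuntimeError):
    """Raised when the tool result is flagged as an error."""


def extract_text(result: Dict[str, Any]) -> str:
    """Join the text parts of an MCP tool result, raising if isError is set."""
    parts = [
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    text = "\n".join(parts)
    if result.get("isError"):
        # The error text is passed through unchanged for troubleshooting.
        raise PdfToolError(text or "read_pdf_text failed without a message")
    return text
```

Raising an exception on `isError` keeps downstream steps such as indexing or page comparison from silently processing an error message as if it were document text.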
Prerequisites: you must have the pdftotext utility installed on your system.
# Prerequisites install (examples)
# Ubuntu/Debian
sudo apt update
sudo apt install poppler-utils
# macOS
brew install poppler
# Windows (Chocolatey or Scoop)
choco install poppler
scoop install poppler

Configure the server by adding a server entry to your MCP client configuration. You can run the server locally for testing or point clients at a remote MCP server if you deploy it elsewhere.
```
# Local start (recommended for development)
npm install
npm start
```

```
# Optional global installation (exposes a binary)
npm install -g pdftotext-mcp
pdftotext-mcp --help
```

```
# Alternative using npx (no install required)
npx pdftotext-mcp
```

If you encounter issues, check the common causes first: a missing pdftotext binary, an incorrect file path, or insufficient permissions. Make sure the target PDF exists and is readable by the MCP server process.
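The checks above can be scripted so they run the same way each time. This is a minimal sketch using only the Python standard library. The function name `diagnose` is hypothetical, and the script has to run as the same user as the server process for the permission check to be meaningful.

```python
import os
import shutil
from typing import List


def diagnose(pdf_path: str, tool: str = "pdftotext") -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    # 1. Is the pdftotext binary installed and on PATH?
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH (install poppler)")
    # 2. Does the path point at an existing regular file?
    if not os.path.isfile(pdf_path):
        problems.append(f"{pdf_path} does not exist or is not a regular file")
    # 3. Can the current user read it?
    elif not os.access(pdf_path, os.R_OK):
        problems.append(f"{pdf_path} is not readable by this process")
    return problems
```

Passing an absolute path avoids surprises when the server was started from a different working directory than the one you are testing from.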
If you see client connection issues, verify that the MCP server is running and that your client configuration points to the correct command or URL. For local testing, ensure you started the server from the expected working directory and that the process is still running.
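One quick way to confirm that a client configuration points at a usable command is to resolve each configured command against PATH. The sketch below assumes the config uses the `mcpServers` layout shown in the Configuration section. `resolve_server_commands` is a hypothetical helper, and it only checks that the executable can be found, not that the server starts successfully.

```python
import json
import shutil
from typing import Dict, Optional


def resolve_server_commands(config_text: str) -> Dict[str, Optional[str]]:
    """Map each configured server name to its resolved executable path.

    A value of None means the command could not be found on PATH.
    """
    config = json.loads(config_text)
    resolved = {}
    for name, entry in config.get("mcpServers", {}).items():
        command = entry.get("command")
        resolved[name] = shutil.which(command) if command else None
    return resolved
```

If `jpwebb-pdftotext-mcp` maps to `None`, the global install may be missing or the npm binary directory may not be on the PATH that the client process sees.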
Run the project's tests and linting to ensure your environment is correctly set up and code quality checks pass.
npm test
npm run lint

The read_pdf_text tool extracts text from PDF files with options to target a specific page, preserve layout, and choose encoding. It returns the extracted text along with rich metadata such as file info, page range, encoding, and timing.
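To reproduce a result outside the server, you can run pdftotext directly with equivalent options. The sketch below shows one plausible mapping from the tool's options to pdftotext flags (`-f` and `-l` for the page range, `-layout`, `-enc`). It is an illustration only, since the server's actual invocation is not documented here. The function name and its parameters are hypothetical.

```python
from typing import List, Optional


def build_pdftotext_command(
    pdf_path: str,
    page: Optional[int] = None,
    layout: bool = False,
    encoding: str = "UTF-8",
) -> List[str]:
    """Build a pdftotext argument list that writes the text to stdout."""
    cmd = ["pdftotext"]
    if page is not None:
        # First and last page set to the same number selects a single page.
        cmd += ["-f", str(page), "-l", str(page)]
    if layout:
        cmd.append("-layout")
    cmd += ["-enc", encoding]
    # A text-file argument of "-" sends the output to stdout.
    cmd += [pdf_path, "-"]
    return cmd
```

The list can be passed to `subprocess.run(cmd, capture_output=True, check=True)` to capture the text. Running `pdftotext -listenc` prints the encoding names your installed build accepts, which may not match the names used by the tool one-to-one.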