
pdf-to-markdown skill


This skill converts PDFs to clean Markdown with text-described images, preserving tables and formatting for AI processing.

npx playbooks add skill krishagel/geoffrey --skill pdf-to-markdown

Run the command above to add this skill to your agents.

---
name: pdf-to-markdown
description: Convert PDF to clean Markdown with image content described as text. Use when user wants to convert a PDF to markdown, extract content from PDF, or prepare PDF content for AI tools.
allowed-tools: Read, Bash
version: 1.0.0
---

# PDF to Markdown Converter

Convert PDF files to clean, well-structured Markdown. Tables become markdown tables. Images and graphics are described as text (no image files generated).

## Quick Start

```bash
uv run skills/pdf-to-markdown/scripts/convert_to_markdown.py input.pdf
```

Default output: `~/Desktop/{filename}.md`
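
As a rough illustration of how the default path relates to the input, the sketch below assumes `{filename}` stands for the input's base name with its `.pdf` extension removed. The `default_output_path` helper is hypothetical and only mirrors the documented pattern; the script's actual logic is not shown here and may differ.

```bash
# Hypothetical helper: mirrors the documented default, ~/Desktop/{filename}.md.
# Assumption: {filename} is the input's base name without the .pdf extension.
default_output_path() {
  name=$(basename "$1" .pdf)
  printf '%s\n' "$HOME/Desktop/$name.md"
}

# Prints the path where report.pdf would be expected to land by default.
default_output_path ~/Documents/report.pdf
```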

## Options

| Flag | Description |
|------|-------------|
| `--no-llm` | Skip LLM processing (faster, images become `[Image]` placeholders) |
| `--force-ocr` | Force OCR on all pages (for scanned PDFs) |
| `--page-range "0,5-10"` | Process specific pages only (comma-separated pages and ranges; this example selects page 0 and pages 5 through 10) |
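
To make the `--page-range` syntax concrete, here is a minimal sketch that expands a range string into the individual pages it selects. The `expand_page_range` helper is hypothetical and not part of the skill; it assumes ranges are inclusive on both ends and does no input validation.

```bash
# Hypothetical helper, not part of the skill: expands a page-range string
# such as "0,5-10" into the individual page numbers it selects.
# Assumption: ranges are inclusive on both ends.
expand_page_range() {
  old_ifs=$IFS
  IFS=,
  set -- $1          # split the argument on commas
  IFS=$old_ifs
  out=""
  for part in "$@"; do
    case $part in
      *-*) start=${part%-*}; end=${part#*-} ;;   # a range like 5-10
      *)   start=$part; end=$part ;;              # a single page like 0
    esac
    i=$start
    while [ "$i" -le "$end" ]; do
      out="$out $i"
      i=$((i + 1))
    done
  done
  printf '%s\n' "${out# }"
}

expand_page_range "0,5-10"   # prints: 0 5 6 7 8 9 10
```

Listing the pages this way before a long run is a cheap check that the range string selects what you intended.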

## Common Use Cases

### Convert a PDF with default settings
```bash
uv run skills/pdf-to-markdown/scripts/convert_to_markdown.py ~/Documents/report.pdf
```

### Specify output location
```bash
uv run skills/pdf-to-markdown/scripts/convert_to_markdown.py report.pdf ~/Documents/report.md
```

### Fast conversion (no image descriptions)
```bash
uv run skills/pdf-to-markdown/scripts/convert_to_markdown.py --no-llm report.pdf
```

### Scanned PDF (force OCR)
```bash
uv run skills/pdf-to-markdown/scripts/convert_to_markdown.py --force-ocr scanned_doc.pdf
```

### Extract specific pages
```bash
uv run skills/pdf-to-markdown/scripts/convert_to_markdown.py --page-range "0-5" large_report.pdf
```

## Output

- Pure Markdown text (no embedded images)
- Tables converted to Markdown table format
- Images/charts described as text using LLM
- Clean formatting suitable for AI processing

## Requirements

- **GEMINI_API_KEY**: Required for LLM image descriptions (loaded from 1Password)
- Use `--no-llm` flag if you don't have Gemini API access
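
The fallback described above can be sketched as a small shell helper. It assumes the key is visible to the shell as the `GEMINI_API_KEY` environment variable; because the skill loads the key from 1Password, this check may not reflect what the script itself sees. The `llm_flag` name is hypothetical.

```bash
# Hypothetical helper: prints --no-llm when GEMINI_API_KEY is unset or empty,
# and prints nothing when a key is present.
# Assumption: the key is exposed to the shell as an environment variable.
llm_flag() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    printf '%s\n' "--no-llm"
  fi
}

# Assumed usage, with the script path taken from the Quick Start section:
# uv run skills/pdf-to-markdown/scripts/convert_to_markdown.py $(llm_flag) report.pdf
```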

## First Run Note

The first run downloads ML models (~1-2GB), which are cached at `~/.cache/marker/`. Subsequent runs are faster.
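
One way to tell whether a first-run download is still ahead is to look at the cache directory named above. This is a minimal sketch with a hypothetical `marker_cache_ready` helper; a non-empty directory only suggests that models were fetched and does not prove the download completed.

```bash
# Hypothetical check: succeeds when the documented cache directory
# (~/.cache/marker/) exists and contains at least one entry.
marker_cache_ready() {
  cache_dir="$HOME/.cache/marker"
  [ -d "$cache_dir" ] && [ -n "$(ls -A "$cache_dir" 2>/dev/null)" ]
}

if marker_cache_ready; then
  echo "Model cache found; no first-run download expected."
else
  echo "Model cache missing or empty; expect a ~1-2GB download on the next run."
fi
```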

## Technical Details

Uses the [Marker](https://github.com/VikParuchuri/marker) library:
- 31k+ GitHub stars
- Best-in-class PDF conversion accuracy
- Surya OCR for 90+ languages
- Gemini LLM integration for image understanding

Overview

This skill converts PDF files into clean, well-structured Markdown, turning tables into markdown tables and producing text descriptions for images and charts. It is optimized for preparing PDF content for AI tools or text workflows and supports OCR for scanned documents.

How this skill works

The tool parses PDF pages, converts layout elements into Markdown syntax, and uses an LLM to generate textual descriptions for images and graphics. Optional OCR runs on scanned pages, and a --no-llm mode produces fast conversions with image placeholders instead of descriptions.

When to use it

  • Prepare PDF content for AI ingestion, summarization, or fine-tuning.
  • Convert reports with tables into editable Markdown.
  • Extract text from scanned PDFs using OCR.
  • Create Markdown versions of documents when images need textual descriptions.
  • Quickly convert selected page ranges from large PDFs.

Best practices

  • Provide the GEMINI_API_KEY when you want descriptive image captions; otherwise use --no-llm for speed.
  • Use --force-ocr for scanned or low-quality PDFs to improve text extraction.
  • Specify --page-range to limit processing time on large documents.
  • Let the first-run model download (~1-2GB) complete before starting batch jobs.
  • Review generated image descriptions for domain-specific accuracy and adjust prompts if needed.
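
The batch advice above can be sketched as a dry run that prints one conversion command per PDF without executing anything, so the list can be reviewed first. The `print_batch_commands` helper is hypothetical; the script path and the `--page-range` flag are taken from the skill's documented usage.

```bash
# Hypothetical dry run: prints one conversion command per PDF in a directory
# without executing anything. An optional second argument sets a page range.
print_batch_commands() {
  dir=$1
  range=${2:-}
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # skip the unexpanded pattern when no PDFs match
    if [ -n "$range" ]; then
      printf 'uv run skills/pdf-to-markdown/scripts/convert_to_markdown.py --page-range "%s" "%s"\n' "$range" "$pdf"
    else
      printf 'uv run skills/pdf-to-markdown/scripts/convert_to_markdown.py "%s"\n' "$pdf"
    fi
  done
}

# Example: list the commands for every PDF in ~/Documents, limited to pages 0-5.
print_batch_commands ~/Documents "0-5"
```

Running a single file by hand before the whole batch lets the model download finish once rather than stalling the loop.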

Example use cases

  • Convert a research paper with tables and figures into Markdown for note-taking.
  • Extract selected pages from a long report into a Markdown summary for stakeholders.
  • Turn scanned meeting handouts into searchable Markdown using OCR.
  • Prepare product catalogs with tables for import into documentation sites.
  • Produce text-only versions of PDFs for accessibility or downstream AI pipelines.

FAQ

What output do I get?

A pure Markdown (.md) file with tables converted, text extracted, and images described as text (no image files).

Do I need an API key?

An LLM API key (GEMINI_API_KEY) is required for image descriptions. Use --no-llm if you don't have one.