markitdown skill

This skill uses markitdown to convert local documents to clean Markdown: PDFs, Word, Excel, slides, images (via OCR), and audio.

npx playbooks add skill 0xdarkmatter/claude-mods --skill markitdown

---
name: markitdown
description: "Convert local documents to Markdown using Microsoft's markitdown CLI. Best for: PDF, Word, Excel, PowerPoint, images (OCR), audio. Can fetch URLs but Jina is faster for web. Triggers on: convert to markdown, read PDF, parse document, extract text from, docx, xlsx, pptx, OCR image, local file."
compatibility: "Requires markitdown. Install: pip install 'markitdown[all]' (the [all] extra adds the optional PDF, Office, and audio converters)"
allowed-tools: "Bash"
---

# markitdown - Document to Markdown

Convert local documents to clean Markdown. One tool for PDF, Word, Excel, PowerPoint, images, and more.

## When to Use markitdown

| Use Case | Recommendation |
|----------|----------------|
| **Local files (PDF, Word, Excel)** | ✅ **Use markitdown** - the only tool in this comparison that reads local files |
| **Web pages** | ❌ Use Jina (`r.jina.ai/`) - 5x faster |
| **Blocked/anti-bot sites** | ❌ Use Firecrawl |
| **OCR on images** | ✅ **Use markitdown** |
| **Audio transcription** | ✅ **Use markitdown** |
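
The routing in this table can be sketched as a small shell function. `pick_tool` is a hypothetical name, not part of markitdown, and it only separates URLs from local paths; deciding that a site is blocked and needs Firecrawl remains a manual call.

```bash
# Hypothetical helper: print the tool the table above recommends for an input.
# It only distinguishes URLs from local paths; it cannot detect anti-bot sites.
pick_tool() {
  case "$1" in
    http://*|https://*) echo "jina" ;;        # web pages
    *)                  echo "markitdown" ;;  # anything else is treated as a local file
  esac
}

pick_tool "https://example.com"   # prints: jina
pick_tool "report.pdf"            # prints: markitdown
```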

## Basic Usage

```bash
# Local files (primary use case)
markitdown document.pdf
markitdown report.docx
markitdown data.xlsx
markitdown slides.pptx
markitdown screenshot.png    # OCR

# URLs (works, but Jina is faster)
markitdown https://example.com

# Save output
markitdown document.pdf > document.md
```
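
As an alternative to redirection, the markitdown CLI accepts `-o` to name the output file. The sketch below wraps that in two helpers, `md_path` and `convert_to_md`; both names are hypothetical, and the refusal to overwrite existing output is a design choice of this sketch, not markitdown behavior.

```bash
# Hypothetical helper: derive the output path by swapping the extension for .md.
md_path() {
  dir=$(dirname "$1")
  base=$(basename "$1")
  printf '%s/%s.md\n' "$dir" "${base%.*}"
}

# Hypothetical wrapper: convert one file and write the result next to it.
# Uses markitdown's -o flag and refuses to overwrite an existing .md file.
convert_to_md() {
  out=$(md_path "$1")
  if [ -e "$out" ]; then
    echo "skip: $out already exists" >&2
    return 1
  fi
  markitdown "$1" -o "$out"
}

md_path "report.docx"   # prints: ./report.md
```

Calling `convert_to_md document.pdf` would then write `./document.md`, provided markitdown is installed and on the `PATH`.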

## Supported Formats

| Format | Extensions | Notes |
|--------|------------|-------|
| PDF | `.pdf` | Text extraction, tables |
| Word | `.docx` | Formatting preserved |
| Excel | `.xlsx` | Tables to markdown |
| PowerPoint | `.pptx` | Slides as sections |
| Images | `.jpg`, `.png` | OCR text extraction |
| HTML | `.html` | Clean conversion |
| Audio | `.mp3`, `.wav` | Speech-to-text |
| Text | `.txt`, `.csv`, `.json`, `.xml` | Pass-through/structure |
| URLs | `https://...` | Works but slower than Jina |

## Benchmarked Performance (URLs)

| Tool | Avg Time | Successes |
|------|-----------|--------------|
| Jina | **0.5s** | 10/10 |
| markitdown | 2.5s | 9/10 |
| Firecrawl | 4.5s | 10/10 |

**Verdict**: For URLs, use Jina. For local files, markitdown is the only one of these three tools that applies.
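
The source does not say how these figures were measured. One way to gather comparable numbers yourself is a small harness like the sketch below; `bench` is a hypothetical helper, and `date +%s` only has whole-second resolution, so run it over many inputs before comparing averages.

```bash
# Hypothetical harness: run one converter command over several inputs and
# report how many runs succeeded plus the total wall-clock seconds.
# Usage: bench <command> <input>...
bench() {
  cmd=$1; shift
  ok=0; total=0
  start=$(date +%s)
  for input in "$@"; do
    total=$((total + 1))
    # A run counts as a success when the command exits with status 0.
    if "$cmd" "$input" >/dev/null 2>&1; then
      ok=$((ok + 1))
    fi
  done
  end=$(date +%s)
  echo "$cmd: $ok/$total succeeded in $((end - start))s"
}
```

For example, `bench markitdown https://example.com https://example.org` would print a line in the form `markitdown: N/2 succeeded in Ts`.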

## Examples

```bash
# PDF to markdown (primary use case)
markitdown report.pdf > report.md

# Excel spreadsheet
markitdown financials.xlsx

# Image with text (OCR)
markitdown screenshot.png

# PowerPoint deck
markitdown presentation.pptx > slides.md

# Audio transcription
markitdown meeting.mp3 > transcript.md
```

## Comparison with Alternatives

| Task | markitdown | Alternative |
|------|------------|-------------|
| PDF text | `markitdown file.pdf` | PyMuPDF, pdfplumber |
| Word docs | `markitdown file.docx` | python-docx |
| Excel | `markitdown file.xlsx` | pandas, openpyxl |
| OCR | `markitdown image.png` | Tesseract |
| Web pages | Use Jina instead | `r.jina.ai/URL` (5x faster) |

**markitdown's advantage**: One CLI for all local document formats. No code needed.

Overview

This skill converts local documents into clean, readable Markdown using the markitdown CLI. It handles PDFs, Word, Excel, PowerPoint, images (with OCR), and audio transcription without requiring code. Use it when you need fast, reliable local file conversion into Markdown for notes, publishing, or downstream processing.

How this skill works

markitdown inspects the input file type and applies format-specific extraction: text and tables from PDFs and Office files, slide sections from PPTX, OCR for images, and speech-to-text for audio. It outputs structured Markdown with headings, lists, and tables where appropriate. For web URLs it can fetch and convert pages, but local file handling is its core strength and is optimized for fidelity over raw crawling speed.
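
To make the per-format behavior concrete, the mapping described above can be written as a lookup. This is illustrative only: `extraction_for` is a hypothetical function, and matching on the file extension alone is an assumption of this sketch, not a description of how markitdown detects file types internally.

```bash
# Illustrative only: map an input name to the kind of extraction the
# Supported Formats table describes. This is not markitdown's actual code.
extraction_for() {
  # Lower-case the name so .PDF and .pdf match the same branch.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    http://*|https://*)        echo "fetch and convert page" ;;
    *.pdf)                     echo "text and tables" ;;
    *.docx)                    echo "formatted text" ;;
    *.xlsx)                    echo "tables" ;;
    *.pptx)                    echo "slide sections" ;;
    *.jpg|*.png)               echo "OCR" ;;
    *.mp3|*.wav)               echo "speech-to-text" ;;
    *.html)                    echo "clean conversion" ;;
    *.txt|*.csv|*.json|*.xml)  echo "pass-through" ;;
    *)                         echo "unknown" ;;
  esac
}

extraction_for "Slides.PPTX"   # prints: slide sections
```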

When to use it

- Convert local PDFs, DOCX, XLSX, PPTX to Markdown
- Extract OCR text from screenshots, scanned images, or photos
- Transcribe local audio files (MP3, WAV) into Markdown transcripts
- Quickly convert structured spreadsheets into Markdown tables
- When you want a single CLI tool to handle many document formats locally

Best practices

- Run markitdown on local copies of files to avoid network overhead and improve reliability
- Redirect output to a .md file (`markitdown file.pdf > file.md`) to preserve results and allow edits
- For complex tables or heavy layout, review the Markdown and adjust formatting manually after conversion
- Prefer markitdown for local documents; for URLs, use a web-focused tool such as Jina, which is faster
- When OCR is critical, provide high-quality images (clear text, high resolution) for better results

Example use cases

- Convert meeting PDFs and slide decks into editable Markdown notes
- Turn exported Excel reports into Markdown tables for documentation or blogging
- Extract text from scanned receipts or screenshots using OCR for expense tracking
- Transcribe recorded interviews or meetings to generate searchable Markdown transcripts
- Batch-convert a project folder of DOCX and PPTX files into Markdown for migration or archival
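
The batch-conversion case can be sketched with `find` and a loop. `batch_convert` is a hypothetical helper, and the sketch assumes file names contain no newlines; each result is written next to its source as `<name>.md`.

```bash
# Hypothetical batch helper: convert every .docx and .pptx under a folder,
# writing each result next to its source file.
batch_convert() {
  find "$1" -type f \( -name '*.docx' -o -name '*.pptx' \) | while IFS= read -r src; do
    out="${src%.*}.md"
    # Redirect stdin so the converter cannot consume the file list from find.
    if markitdown "$src" > "$out" < /dev/null; then
      echo "ok:     $src"
    else
      echo "failed: $src" >&2
      rm -f "$out"   # drop the empty or partial file left by the redirect
    fi
  done
}
```

Running `batch_convert ./project-docs` would walk that folder recursively; files that fail to convert are reported on standard error and leave no `.md` behind.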

FAQ

Can markitdown handle web pages as well as local files?

Yes, it can fetch and convert URLs, but in the benchmark above it averaged 2.5s per page against 0.5s for Jina. Use a web-focused tool like Jina for pages and keep markitdown for local files.

How accurate is the OCR and audio transcription?

OCR and transcription work well on clear inputs: high-resolution images and clean audio yield the best results. Complex layouts or noisy audio may require manual cleanup.