markdown-converter skill

This skill converts documents and media to Markdown using uvx markitdown, so content can be prepared for LLM processing without installing markitdown separately.

npx playbooks add skill steipete/agent-scripts --skill markdown-converter

SKILL.md
---
name: markdown-converter
description: Convert documents and files to Markdown using markitdown. Use when converting PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls), HTML, CSV, JSON, XML, images (with EXIF/OCR), audio (with transcription), ZIP archives, YouTube URLs, or EPubs to Markdown format for LLM processing or text analysis.
---

# Markdown Converter

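Convert files to Markdown using `uvx markitdown`. `uvx` downloads and runs markitdown on demand, so no separate installation of the tool is required (only `uv` itself must be available).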

## Basic Usage

```bash
# Convert to stdout
uvx markitdown input.pdf

# Save to file
uvx markitdown input.pdf -o output.md
uvx markitdown input.docx > output.md

# From stdin (the file name is lost, so hint the format with -x)
cat input.pdf | uvx markitdown -x .pdf
```

## Supported Formats

- **Documents**: PDF, Word (.docx), PowerPoint (.pptx), Excel (.xlsx, .xls)
- **Web/Data**: HTML, CSV, JSON, XML
- **Media**: Images (EXIF + OCR), Audio (EXIF + transcription)
- **Other**: ZIP (iterates contents), YouTube URLs, EPub

## Options

```bash
-o OUTPUT      # Output file
-x EXTENSION   # Hint file extension (for stdin)
-m MIME_TYPE   # Hint MIME type
-c CHARSET     # Hint charset (e.g., UTF-8)
-d             # Use Azure Document Intelligence
-e ENDPOINT    # Document Intelligence endpoint
--use-plugins  # Enable 3rd-party plugins
--list-plugins # Show installed plugins
```

## Examples

```bash
# Convert Word document
uvx markitdown report.docx -o report.md

# Convert Excel spreadsheet
uvx markitdown data.xlsx > data.md

# Convert PowerPoint presentation
uvx markitdown slides.pptx -o slides.md

# Convert with file type hint (for stdin)
cat document | uvx markitdown -x .pdf > output.md

# Use Azure Document Intelligence for better PDF extraction
uvx markitdown scan.pdf -d -e "https://your-resource.cognitiveservices.azure.com/"
```

## Notes

- Output preserves document structure: headings, tables, lists, links
- First run caches dependencies; subsequent runs are faster
- For complex PDFs with poor extraction, use `-d` with Azure Document Intelligence

Overview

This skill converts a wide range of document and media formats to structured Markdown using the markitdown tool, run through uvx. It is meant for repeatable conversion when preparing content for LLM processing, text analysis, or publishing. Headings, tables, lists, links, and metadata are preserved where the source format allows.

How this skill works

The converter accepts files, streams, archives, URLs, and media, then runs format-specific extraction to emit Markdown. It supports hints for extension, MIME type, and charset, can use OCR and audio transcription for images and audio, and offers an option to leverage Azure Document Intelligence for improved PDF extraction. Outputs can be written to stdout or saved to a file.

When to use it

  • Preparing PDFs, Word, PowerPoint, Excel, or EPUB for LLM input or analysis.
  • Extracting text from images (OCR) or audio (transcription) into Markdown.
  • Converting HTML, CSV, JSON, or XML to human-readable Markdown for review.
  • Batch-processing ZIP archives containing mixed file types into Markdown.
  • Pulling and converting content from YouTube URLs into text for summarization.

Best practices

  • Provide a file extension or MIME hint (-x or -m) when piping data via stdin to ensure correct detection.
  • In reproducible pipelines, write output with the -o option rather than relying on shell redirection.
  • Enable Azure Document Intelligence (-d and -e) for complex or scanned PDFs to improve layout and text fidelity.
  • Before bulk processing, run one conversion so the dependencies are cached; the remaining runs then start faster.
  • Use --use-plugins only when you trust installed third-party plugins; list plugins first with --list-plugins.
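
Several of the practices above can be combined in a small batch helper. The following is a minimal sketch, not part of the skill itself: the function name convert_dir and the choice to name each output after the full input file name (report.docx becomes report.docx.md, which avoids collisions between report.docx and report.pdf) are assumptions made for illustration. It uses only the -o option listed under Options.

```bash
# Hypothetical helper: convert every regular file in SRC_DIR to Markdown in OUT_DIR.
# The first conversion also warms the uvx cache for the rest of the loop.
convert_dir() {
  local src_dir="$1" out_dir="$2" file failed=0
  mkdir -p "$out_dir" || return 1

  for file in "$src_dir"/*; do
    [ -f "$file" ] || continue   # skip subdirectories and an unmatched glob
    # Write with -o instead of shell redirection, one .md file per input.
    if ! uvx markitdown "$file" -o "$out_dir/$(basename "$file").md"; then
      echo "failed: $file" >&2
      failed=1                   # remember the failure but keep converting
    fi
  done

  return "$failed"
}
```

Called as convert_dir ./inbox ./markdown, the helper returns a non-zero status if any file failed, so it can gate later pipeline steps.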

Example use cases

  • Convert a research report PDF to Markdown for citation extraction and summarization.
  • Transform a client’s PowerPoint deck into a Markdown outline for drafting a blog post.
  • Batch-convert an archive of mixed files (ZIP) to Markdown for ingestion into a knowledge base.
  • OCR scanned receipts and invoices (images) into Markdown for bookkeeping or expense analysis.
  • Transcribe and convert podcast audio to Markdown for show notes and SEO-friendly transcripts.

FAQ

How do I convert streamed input or stdin?

Pipe data to the tool and supply a file extension hint with -x or a MIME hint with -m so markitdown can detect the format correctly.
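
When the data comes from another command, the original file name is often still known even though markitdown never sees it. The sketch below derives the -x hint from that name. The helper names ext_hint and convert_stdin_as are hypothetical and exist only for this example; when the name has no usable extension, no hint is passed and detection is left to markitdown.

```bash
# Hypothetical helper: print an extension hint such as ".pdf" for a file name,
# or nothing when the name has no extension (or is a dotfile like ".bashrc").
ext_hint() {
  local name
  name="$(basename "$1")"
  case "$name" in
    ?*.?*) printf '.%s\n' "${name##*.}" ;;
  esac
}

# Hypothetical helper: convert stdin, using NAME only to derive the -x hint.
convert_stdin_as() {
  local hint
  hint="$(ext_hint "$1")"
  if [ -n "$hint" ]; then
    uvx markitdown -x "$hint"
  else
    uvx markitdown
  fi
}
```

For example, a single member of an archive could be converted without unpacking it to disk: unzip -p bundle.zip docs/report.pdf | convert_stdin_as docs/report.pdf > report.md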

When should I use Azure Document Intelligence?

Enable -d with an endpoint (-e) for scanned or complex PDFs where native extraction misses layout, tables, or embedded text.
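
One way to apply this selectively is to try native extraction first and retry with -d only when the result looks empty. This is a rough sketch under stated assumptions: the function name convert_pdf, the DOCINTEL_ENDPOINT environment variable, and the 200-byte threshold are all hypothetical, and output size is only a crude signal of a failed extraction. Authentication for the Azure resource is not covered here.

```bash
# Hypothetical helper: native extraction first, Azure Document Intelligence as a retry.
# DOCINTEL_ENDPOINT and the default 200-byte threshold are assumptions for this sketch.
convert_pdf() {
  local input="$1" output="$2" min_bytes="${3:-200}" size
  uvx markitdown "$input" -o "$output" || return 1

  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$output") ))

  # Retry only when the output is suspiciously small and an endpoint is configured.
  if [ "$size" -lt "$min_bytes" ] && [ -n "${DOCINTEL_ENDPOINT:-}" ]; then
    uvx markitdown "$input" -o "$output" -d -e "$DOCINTEL_ENDPOINT"
  fi
}
```

With DOCINTEL_ENDPOINT unset, the helper behaves like a plain conversion, so the same script runs on machines without an Azure resource.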