MarkItDown is a lightweight Python utility that converts various file formats to Markdown for use with LLMs and text analysis pipelines. It preserves important document structure like headings, lists, tables, and links while converting documents such as PDFs, PowerPoint presentations, Word documents, Excel spreadsheets, images, audio files, and more into Markdown format.
MarkItDown requires Python 3.10 or higher. It's recommended to use a virtual environment:
With standard Python:
python -m venv .venv
source .venv/bin/activate
With uv:
uv venv --python=3.12 .venv
source .venv/bin/activate
# NOTE: Be sure to use 'uv pip install' rather than just 'pip install'
With Anaconda:
conda create -n markitdown python=3.12
conda activate markitdown
Install MarkItDown with pip:
pip install 'markitdown[all]'
Or install from source:
git clone git@github.com:microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'
Convert a file to Markdown:
markitdown path-to-file.pdf > document.md
Specify an output file:
markitdown path-to-file.pdf -o document.md
Pipe content:
cat path-to-file.pdf | markitdown
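The same pipe can be driven from a script. The following is a minimal sketch, assuming the `markitdown` executable is on your PATH and reads the document from standard input as in the command above. The function name `convert_bytes_via_cli` and its `command` parameter are hypothetical names chosen for illustration.

```python
import subprocess
import sys
from typing import Sequence


def convert_bytes_via_cli(data: bytes, command: Sequence[str] = ("markitdown",)) -> str:
    """Pipe raw file bytes to a command's stdin and return its stdout as text."""
    completed = subprocess.run(
        list(command),
        input=data,
        capture_output=True,
        check=True,  # raise CalledProcessError if the command exits non-zero
    )
    return completed.stdout.decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python pipe_to_markitdown.py path-to-file.pdf > document.md
    with open(sys.argv[1], "rb") as handle:
        sys.stdout.write(convert_bytes_via_cli(handle.read()))
```

Passing the command as a parameter keeps the helper easy to test and lets you swap in a different invocation, such as a containerized one.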
You can install only the dependencies you need:
pip install 'markitdown[pdf,docx,pptx]'
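If you are unsure which extras a set of files needs, the requirement string can be derived from the file extensions. This is a rough sketch rather than part of MarkItDown: the helper `pip_requirement` and the `EXTRA_BY_SUFFIX` table are hypothetical, the extra names are the ones this document lists, and mapping `.msg` to `outlook` is an assumption.

```python
from pathlib import Path
from typing import Iterable

# Extension-to-extra mapping. The extra names come from the optional
# dependencies listed in this document; ".msg" -> "outlook" is an assumption.
EXTRA_BY_SUFFIX = {
    ".pptx": "pptx",
    ".docx": "docx",
    ".xlsx": "xlsx",
    ".xls": "xls",
    ".pdf": "pdf",
    ".msg": "outlook",
}


def pip_requirement(paths: Iterable[str]) -> str:
    """Return a pip requirement such as 'markitdown[docx,pdf]' for the given files."""
    suffixes = (Path(p).suffix.lower() for p in paths)
    extras = sorted({EXTRA_BY_SUFFIX[s] for s in suffixes if s in EXTRA_BY_SUFFIX})
    return f"markitdown[{','.join(extras)}]" if extras else "markitdown"


print(pip_requirement(["report.pdf", "notes.DOCX", "readme.txt"]))
# → markitdown[docx,pdf]
```

Pass the resulting string to `pip install`, quoted so the shell does not interpret the brackets.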
Available optional dependencies:
[all]: Installs all optional dependencies
[pptx]: For PowerPoint files
[docx]: For Word files
[xlsx]: For Excel files
[xls]: For older Excel files
[pdf]: For PDF files
[outlook]: For Outlook messages
[az-doc-intel]: For Azure Document Intelligence
[audio-transcription]: For audio transcription
[youtube-transcription]: For YouTube video transcription
List installed plugins:
markitdown --list-plugins
Enable plugins:
markitdown --use-plugins path-to-file.pdf
Use Microsoft Document Intelligence for conversion:
markitdown path-to-file.pdf -o document.md -d -e "<document_intelligence_endpoint>"
More information about setting up an Azure Document Intelligence Resource can be found at Microsoft Learn.
Basic usage:
from markitdown import MarkItDown
md = MarkItDown(enable_plugins=False) # Set to True to enable plugins
result = md.convert("test.xlsx")
print(result.text_content)
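The basic call above extends naturally to a whole directory. The sketch below is one possible arrangement, not an official API: `convert_tree` and `markitdown_converter` are hypothetical helper names, and only `MarkItDown(enable_plugins=False)`, `convert(...)`, and `text_content` are taken from the example above. A production script would likely also catch conversion errors for unsupported files.

```python
from pathlib import Path
from typing import Callable, List


def convert_tree(src_dir: Path, out_dir: Path, convert: Callable[[Path], str]) -> List[Path]:
    """Convert every file under src_dir and mirror the results under out_dir."""
    written = []
    # Materialize the file list first so newly written outputs are never picked up.
    for source in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(src_dir)
        # Keep the original extension (report.pdf -> report.pdf.md) to avoid
        # collisions between files that share a stem.
        target = out_dir / relative.parent / (relative.name + ".md")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(convert(source), encoding="utf-8")
        written.append(target)
    return written


def markitdown_converter() -> Callable[[Path], str]:
    """Build a converter backed by MarkItDown (imported lazily)."""
    from markitdown import MarkItDown

    md = MarkItDown(enable_plugins=False)
    return lambda path: md.convert(str(path)).text_content
```

A typical call would be `convert_tree(Path("docs"), Path("markdown"), markitdown_converter())`, where both directory names are placeholders.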
Document Intelligence conversion:
from markitdown import MarkItDown
md = MarkItDown(docintel_endpoint="<document_intelligence_endpoint>")
result = md.convert("test.pdf")
print(result.text_content)
Using LLMs for image descriptions:
from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o", llm_prompt="optional custom prompt")
result = md.convert("example.jpg")
print(result.text_content)
Build the Docker image, then convert a file by piping it through a container:
docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md
MarkItDown offers an MCP (Model Context Protocol) server for integration with LLM applications like Claude Desktop. To use the MCP server, you'll need to install the markitdown-mcp package, which is available in the main MarkItDown repository.
pip install 'markitdown-mcp'
markitdown-mcp
The server will start on the default port 8080. You can specify a different port using the --port option:
markitdown-mcp --port 9000
Once running, the MCP server can be integrated with LLM applications that support the Model Context Protocol to provide document conversion capabilities.
To add this MCP server to Claude Code, run this command in your terminal:
claude mcp add-json "markitdown-mcp" '{"command":"npx","args":["-y","markitdown-mcp"]}'
See the official Claude Code MCP documentation for more details.
There are two ways to add an MCP server to Cursor. The most common way is to add the server globally in the ~/.cursor/mcp.json file so that it is available in all of your projects.
If you only need the server in a single project, add it to that project's .cursor/mcp.json file instead, creating the file if it does not exist.
To add a global MCP server, go to Cursor Settings > Tools & Integrations and click "New MCP Server".
Clicking that button opens the ~/.cursor/mcp.json file, where you can add your server like this:
{
  "mcpServers": {
    "markitdown-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "markitdown-mcp"
      ]
    }
  }
}
To add an MCP server to a single project, create a .cursor/mcp.json file in the project or add the entry to the existing one. The entry is identical to the global example above.
Once the server is installed, you might need to head back to Settings > MCP and click the refresh button.
The Cursor agent can then see the tools the MCP server exposes and will call them when needed.
You can also explicitly ask the agent to use the tool by mentioning the tool name and describing what the function does.
To add this MCP server to Claude Desktop:
1. Find your configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
2. Add this to your configuration file:
{
  "mcpServers": {
    "markitdown-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "markitdown-mcp"
      ]
    }
  }
}
3. Restart Claude Desktop for the changes to take effect.
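If the configuration file already lists other servers, the manual edit in step 2 can be scripted so that existing entries are preserved. The following is a minimal sketch under that assumption. The function `add_mcp_server` is a hypothetical helper, and the server entry is copied from the JSON shown above.

```python
import json
from pathlib import Path

# Server entry copied from the configuration shown in this document.
MARKITDOWN_ENTRY = {"command": "npx", "args": ["-y", "markitdown-mcp"]}


def add_mcp_server(config_path: Path, name: str = "markitdown-mcp",
                   entry: dict = MARKITDOWN_ENTRY) -> dict:
    """Insert or replace one server under "mcpServers", keeping all other settings."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    config.setdefault("mcpServers", {})[name] = dict(entry)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_mcp_server(Path.home() / ".config/Claude/claude_desktop_config.json")` would update the third path listed in step 1. Back up the file first, since the script rewrites it in full.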