MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines.
Microsoft has now released an MCP (Model Context Protocol) server for integration with LLM applications.
Why Markdown?
Markdown is extremely close to plain text, with minimal markup or formatting, but still provides a way to represent important document structure. Mainstream LLMs, such as OpenAI's GPT-4o, natively "speak" Markdown and often incorporate Markdown into their responses unprompted.
This suggests that they have been trained on vast amounts of Markdown-formatted text and understand it well. As a side benefit, Markdown conventions are also highly token-efficient.
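To make the token-efficiency point concrete, here is a minimal sketch that writes the same small document as HTML and as Markdown and compares their sizes. The sample document is invented for illustration, and character count is only a crude stand-in for token count; a real tokenizer will give different numbers, though markup-heavy formats generally cost more.

```python
# The same invented sample document, once as HTML and once as Markdown.
html_doc = (
    "<h1>Quarterly Report</h1>\n"
    "<p>Revenue grew <strong>12%</strong> year over year.</p>\n"
    "<ul>\n"
    "<li>Cloud</li>\n"
    "<li>Devices</li>\n"
    "</ul>\n"
)
markdown_doc = (
    "# Quarterly Report\n"
    "\n"
    "Revenue grew **12%** year over year.\n"
    "\n"
    "- Cloud\n"
    "- Devices\n"
)


def rough_size(text: str) -> int:
    """Return the character count, used here only as a rough proxy for tokens."""
    return len(text)


html_size = rough_size(html_doc)
markdown_size = rough_size(markdown_doc)

# Fraction of characters saved by the Markdown form relative to HTML.
savings = 1 - markdown_size / html_size

print(f"HTML: {html_size} chars")
print(f"Markdown: {markdown_size} chars")
print(f"Markdown is about {savings:.0%} smaller for this sample")
```

The heading, emphasis, and list structure survive in both versions; the Markdown form simply spends fewer characters on delimiters.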