
microsoft/markitdown

markitdown

Python tool for converting files and office documents to Markdown.

Builder

Microsoft

microsoft • big-tech

Stars

93,202

Using upstream star count

Forks

5,614

Using upstream fork count

Open Issues

0

Activity Score

0/100

2 commits in 30d

Created

Nov 13, 2024

Project creation date

README Summary

MarkItDown is a Python utility that converts a range of file formats, including PDF, PowerPoint, Word, Excel, images, audio, HTML, and plain text, to Markdown. It offers both a Python API and a command-line interface, so it can be integrated into existing workflows. Format-specific conversion is delegated to separate libraries and services, and the output aims to preserve the structure and content of the source document.
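
The Python API mentioned in the summary can be sketched roughly as follows. This is a minimal sketch, assuming the `markitdown` package is installed; `convert_to_markdown` and `output_path_for` are hypothetical helper names introduced here for illustration, while `MarkItDown`, `convert`, and `text_content` follow the usage shown in the upstream README.

```python
from pathlib import Path


def output_path_for(path):
    """Return the path the Markdown output would be written to.

    Hypothetical helper: keeps the directory and file stem and swaps the
    extension for ".md".
    """
    return Path(path).with_suffix(".md")


def convert_to_markdown(path):
    """Convert one file to Markdown text using MarkItDown.

    Assumes the third-party `markitdown` package is installed. The import
    is done lazily so this module still loads when it is not.
    """
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(str(path))
    # The conversion result exposes the Markdown as `text_content`.
    return result.text_content


def convert_file(path):
    """Convert `path` and write the Markdown next to the source file."""
    target = output_path_for(path)
    target.write_text(convert_to_markdown(path), encoding="utf-8")
    return target
```

Calling `convert_file("report.docx")` would write `report.md` alongside the source, where `report.docx` is a placeholder file name. The command-line interface covers the same case without any code; the upstream README shows the form `markitdown report.docx > report.md`.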

AI Dev Skills

Unmapped

Document Processing, Text Extraction, File Format Conversion, Data Preprocessing, Content Pipeline Engineering

Tags

Document Processing, Text Extraction, File Format Conversion, Data Preprocessing, Content Pipeline Engineering, Text Extraction for Analysis, Knowledge Base Conversion, Content Management, Image, Document Format Standardization, Video, Tabular, Document Preprocessing for RAG Systems, Education, Content Migration, Document AI, On-premise, Self-hosted, Retrieval-Augmented Generation, Cloud API, Document Management, Content Processing Pipelines, Publishing, Audio, Legal Tech, Text, Python, CLI

Taxonomy

Recent Activity

Updated 28 days ago

7 Days

0

30 Days

2

90 Days

7

Quality

beta
Quality
medium
Maturity
beta

Categories

MLOps & Infrastructure (Primary), Dev Tools & Automation, RAG & Retrieval, Evals & Benchmarking, ML Platform & Infrastructure, Coding & Dev Tools, Finance & Legal, Edge & Mobile AI, Search & Knowledge, Other AI / ML, Generative Media

PM Skills

Scale & Reliability, Developer Platform

Languages

Python 100.0%

Timeline

Project created
Nov 13, 2024
Forked
Mar 13, 2026
Your last push
28 days ago
Upstream last push
14 days ago
Tracked since
Mar 17, 2026

Similar Repos

pgvector cosine similarity · $0
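
The similarity measure named above can be illustrated with a small sketch. It assumes each repository is represented by an embedding vector, which this page does not confirm; the function names and the sample vectors in the usage note are hypothetical. pgvector itself exposes cosine distance (one minus the similarity computed here) through its `<=>` operator.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_similar(query, candidates):
    """Return (name, score) pairs sorted from most to least similar.

    `candidates` maps a repository name to its (hypothetical) embedding.
    """
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `rank_similar([1.0, 0.0], {"repo-a": [0.0, 1.0], "repo-b": [1.0, 0.1]})` lists `repo-b` first, because its vector points in nearly the same direction as the query.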
