Reporium

microsoft/markitdown

markitdown

Python tool for converting files and office documents to Markdown.

Upstream: microsoft/markitdown

Builder

Microsoft

microsoft • big-tech

Stars

130,562

Using upstream star count

Forks

8,953

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Nov 13, 2024

Project creation date

README Summary

[![PyPI](https://img.shields.io/pypi/v/markitdown.svg)](https://pypi.org/project/markitdown/) ![PyPI - Downloads](https://img.shields.io/pypi/dd/markitdown) [![Built by AutoGen Team](https://img.shields.io/badge/Built%20by-AutoGen%20Team-blue)](https://github.com/microsoft/autogen)

AI Dev Skills

Unmapped

Content Pipeline Engineering • Data Preprocessing • Document Processing • File Format Conversion • Text Extraction

Tags

Content Pipeline Engineering • Data Preprocessing • Document Processing • File Format Conversion • Text Extraction • Anthropic / Claude • AutoGen • Backend • CLI Tool • Claude • Course • Docker • Forked • Large Language Models • MCP • Open Source • OpenAI • Python • Speech to Text

Taxonomy

AI Trends

Retrieval-Augmented Generation • Document AI • Content Processing Pipelines

Category

Foundation Models • AI Agents • Generative Media • MLOps & Infrastructure • Dev Tools & Automation • Learning Resources

Deployment Context

Self-hosted • On-premise • Cloud API

Industries

Content Management • Document Management • Legal Tech • Education • Publishing

Modalities

Text • Image • Audio • Video • Tabular

Skill Areas

Document Processing • Text Extraction • File Format Conversion • Data Preprocessing • Content Pipeline Engineering

Tag

Anthropic / Claude • AutoGen • Backend • CLI Tool • Claude • Course • Docker • Forked • Large Language Models • MCP • Open Source • OpenAI • Python • Speech to Text

Use Cases

Document Preprocessing for RAG Systems • Content Migration • Text Extraction for Analysis • Document Format Standardization • Knowledge Base Conversion

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

2

Fix O(n) memory growth in PDF conversion by calling page.close() afte… (#1612)

lesyk • Mar 16, 2026

a6c8ac4

[MS] Add OCR layer service for embedded images and PDF scans (#1541)

lesyk • Mar 10, 2026

c6308dc

Bump version for release. (#1564)

afourney • Feb 20, 2026

4a5340f

Quality

Quality: medium
Maturity: beta

Categories

MLOps & Infrastructure (Primary) • Dev Tools & Automation • Learning Resources • Foundation Models • AI Agents • Generative Media • Other AI / ML

PM Skills

User Experience • Scale & Reliability • Developer Platform

Languages

Python 100.0%

Timeline

Project created: Nov 13, 2024
Forked: Mar 13, 2026
Your last push: 2 months ago
Upstream last push: 1 month ago
Tracked since: Mar 17, 2026

Similar Repos

pgvector cosine similarity · $0
