opendatalab/MinerU

MinerU

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Builder

opendatalab

opendatalab • individual

Stars

57,939

Using upstream star count

Forks

4,793

Using upstream fork count

Open Issues

0

Activity Score

0/100

190 commits in 30d

Created

Feb 29, 2024

Project creation date

README Summary

MinerU is a Python tool that transforms complex documents like PDFs into clean, structured markdown and JSON formats optimized for Large Language Model workflows. It provides robust document parsing capabilities to extract text, tables, images, and other elements from various document types. The tool is designed to prepare documents for AI agent workflows and LLM processing pipelines.

AI Dev Skills

Unmapped

Document Processing, Computer Vision for Document Understanding, OCR (Optical Character Recognition), Layout Analysis, Text Extraction, Document Structure Recognition, Data Pipeline Engineering, Multimodal Document Parsing

Tags

Document Processing, Computer Vision for Document Understanding, OCR (Optical Character Recognition), Layout Analysis, Text Extraction, Document Structure Recognition, Data Pipeline Engineering, Multimodal Document Parsing, Contract Analysis, Legal Document Review, Regulatory Document Compliance, Multimodal, Academic Literature Mining, Research Paper Processing, Agentic AI, On-premise, Document Question Answering, Cloud, Text, Document AI, Financial Services, Publishing, Multimodal Reasoning, Education, Intelligent Document Processing, Financial Report Analysis, Retrieval-Augmented Generation, Healthcare, Image, Self-hosted, Government, Legal Tech, Python

Taxonomy

Recent Activity

Updated 1 month ago

7 Days

0

30 Days

190

90 Days

414

Quality

Quality: medium
Maturity: beta

Categories

MLOps & Infrastructure (Primary), Learning Resources, RAG & Retrieval, Evals & Benchmarking, NLP & Text, ML Platform & Infrastructure, Healthcare & Biology, Finance & Legal, Multimodal AI, Search & Knowledge, Other AI / ML, Computer Vision, Foundation Models, AI Agents

PM Skills

Scale & Reliability, Developer Platform

Languages

Python 100.0%

Timeline

Project created: Feb 29, 2024
Forked: Mar 16, 2026
Your last push: 1 month ago
Upstream last push: 10 days ago
Tracked since: Mar 7, 2026
