Reporium

opendatalab/MinerU

MinerU

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Builder

opendatalab • individual

Stars

65,656

Using upstream star count

Forks

5,538

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Feb 29, 2024

Project creation date

AI Dev Skills

Unmapped

Computer Vision for Document Understanding, Data Pipeline Engineering, Document Processing, Document Structure Recognition, Layout Analysis, Multimodal Document Parsing, OCR (Optical Character Recognition), Text Extraction

Tags

Computer Vision for Document Understanding, Data Pipeline Engineering, Document Processing, Document Structure Recognition, Layout Analysis, Multimodal Document Parsing, OCR (Optical Character Recognition), Text Extraction, AWS, Benchmarking, Computer Vision, Docker, Evals, Forked, GPU / CUDA, HuggingFace, Jupyter, LLM Serving, Large Language Models, Multimodal AI, OpenAI, Python, Research / Papers, SGLang, vLLM

Taxonomy

AI Trends

Agentic AI, Retrieval-Augmented Generation, Document AI, Multimodal Reasoning

Category

Foundation Models, RAG & Retrieval, Evals & Benchmarking, Inference & Serving, Computer Vision, MLOps & Infrastructure, Cloud & Platforms, Learning Resources, Data Science & Analytics

Deployment Context

Self-hosted, On-premise, Cloud

Industries

Legal Tech, Healthcare, Financial Services, Education, Government, Publishing

Modalities

Text, Image, Multimodal

Skill Areas

Document Processing, Computer Vision for Document Understanding, OCR (Optical Character Recognition), Layout Analysis, Text Extraction, Document Structure Recognition, Data Pipeline Engineering, Multimodal Document Parsing

Tag

AWS, Benchmarking, Computer Vision, Docker, Document Processing, Evals, Forked, GPU / CUDA, HuggingFace, Jupyter, LLM Serving, Large Language Models, Multimodal AI, OpenAI, Python, Research / Papers, SGLang, vLLM

Use Cases

Document Question Answering, Intelligent Document Processing, Contract Analysis, Research Paper Processing, Legal Document Review, Financial Report Analysis, Academic Literature Mining, Regulatory Document Compliance

Recent Activity

Updated 2 months ago

7 Days: 0
30 Days: 0
90 Days: 0

Update base image in mlu.Dockerfile

Xiaomeng Zhao • Mar 2, 2026

077b310

Quality

Quality: medium
Maturity: beta

Categories

RAG & Retrieval (Primary), Evals & Benchmarking, Inference & Serving, MLOps & Infrastructure, Cloud & Platforms, Learning Resources, Data Science & Analytics, Foundation Models, Computer Vision, Multimodal AI, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, User Experience, Scale & Reliability, Data & Evaluation, Product Discovery

Languages

Python 100.0%

Timeline

Project created: Feb 29, 2024
Forked: Mar 16, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 7, 2026
