datalab-to/marker

marker

Convert PDF to markdown + JSON quickly with high accuracy

Builder

datalab-to • individual

Stars

33,292

Using upstream star count

Forks

2,305

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Oct 30, 2023

Project creation date

README Summary

Marker is a Python tool that converts PDF documents to Markdown and JSON quickly and accurately. It uses OCR and document layout parsing to extract text, tables, and structural elements from PDFs while preserving formatting. It is designed for batch processing and supports several kinds of PDF, including scanned documents.

AI Dev Skills

Unmapped

Document AI, Computer Vision, Optical Character Recognition, Document Layout Analysis, PDF Processing, Text Extraction, Document Structure Recognition

Tags

Document AI, Computer Vision, Optical Character Recognition, Document Layout Analysis, PDF Processing, Text Extraction, Document Structure Recognition, Cloud API, Document Processing Pipeline, Research, Self-hosted, Text Mining from PDFs, Academic Paper Processing, Compound AI Systems, Document Management, Text, PDF Content Extraction, Publishing, Image, Document Format Conversion, Document Digitization, On-premise, Knowledge Management, Legal Tech, Python

Taxonomy

Recent Activity

Updated 1 month ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: medium
Maturity: prototype

Categories

MLOps & Infrastructure (Primary), Dev Tools & Automation, Learning Resources, RAG & Retrieval, ML Platform & Infrastructure, Finance & Legal, Edge & Mobile AI, Search & Knowledge, Other AI / ML, Computer Vision

PM Skills

Scale & Reliability, Developer Platform

Languages

Python 100.0%

Timeline

Project created: Oct 30, 2023
Forked: Mar 16, 2026
Your last push: 1 month ago
Upstream last push: 9 days ago
Tracked since: Mar 10, 2026
