ocrmypdf/OCRmyPDF

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Builder

ocrmypdf

ocrmypdf • individual

Stars

33,103

Using upstream star count

Forks

2,295

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Dec 20, 2013

Project creation date

README Summary

OCRmyPDF is a Python tool that adds an OCR text layer to scanned PDF files, making them searchable and selectable while preserving the original appearance. It uses Tesseract as its default OCR engine and offers extensive options for image preprocessing, OCR language selection, and output formatting. It can be run as a command-line utility or called from Python applications.
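
As a rough illustration of the command-line usage mentioned in the summary, the sketch below assembles an `ocrmypdf` invocation and runs it with `subprocess`. The helper names `build_ocrmypdf_command` and `run_ocr` are hypothetical and not part of OCRmyPDF; the sketch assumes the `ocrmypdf` executable is installed and relies only on the `--language` and `--deskew` options and the `input output` positional order.

```python
import shutil
import subprocess
from typing import List, Optional


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Optional[List[str]] = None,
    deskew: bool = False,
) -> List[str]:
    """Assemble an ocrmypdf command line (hypothetical helper, not library API)."""
    cmd = ["ocrmypdf"]
    if languages:
        # Tesseract-style language codes are joined with '+', e.g. "eng+deu".
        cmd += ["--language", "+".join(languages)]
    if deskew:
        # Ask ocrmypdf to straighten skewed pages before OCR.
        cmd.append("--deskew")
    # Positional arguments: input file first, then output file.
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf: str, output_pdf: str, **options) -> int:
    """Run ocrmypdf as a subprocess and return its exit code."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, **options)
    return subprocess.run(cmd, check=False).returncode
```

For example, `run_ocr("scan.pdf", "searchable.pdf", languages=["eng"], deskew=True)` would execute `ocrmypdf --language eng --deskew scan.pdf searchable.pdf`. When calling from Python directly, the package also exposes `ocrmypdf.ocr(input_file, output_file, ...)`, which may be preferable to spawning a subprocess.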

AI Dev Skills

Unmapped

Optical Character Recognition (OCR), Document Image Processing, PDF Manipulation, Computer Vision for Text Recognition, Image Preprocessing and Enhancement

Tags

Optical Character Recognition (OCR), Document Image Processing, PDF Manipulation, Computer Vision for Text Recognition, Image Preprocessing and Enhancement, Text, Legal Discovery, Cloud, Finance, Government, Document Digitization, Legal Tech, PDF Text Extraction, Records Management, On-premise, Compliance Document Processing, Education, Legacy Document Search, Healthcare, Archive Digitization, Publishing, Image, On-device AI, Self-hosted, Document AI, Python, CLI

Recent Activity

Updated 28 days ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: high
Maturity: production

Categories

RAG & Retrieval (primary), Healthcare & Biology, Finance & Legal, Edge & Mobile AI, Search & Knowledge, Other AI / ML, Computer Vision, Robotics

PM Skills

Product Discovery, Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Dec 20, 2013
Forked: Mar 16, 2026
Your last push: 28 days ago
Upstream last push: 7 days ago
Tracked since: Mar 16, 2026
