
tesseract-ocr/tesseract

tesseract

Tesseract Open Source OCR Engine (main repository)

Builder

tesseract-ocr

tesseract-ocr • individual

Stars

73,290

Using upstream star count

Forks

10,573

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in the last 30 days

Created

Aug 12, 2014

Project creation date

README Summary

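Tesseract is an open-source Optical Character Recognition (OCR) engine. It was originally developed at Hewlett-Packard, open-sourced by HP in 2005, developed by Google from 2006 to 2018, and is now maintained by the open-source community. It recognizes more than 100 languages out of the box and extracts text from images, writing results as plain text, hOCR, PDF, TSV, ALTO, or PAGE. The project ships both a command-line program (`tesseract`) and a library (`libtesseract`) for integration into applications.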

AI Dev Skills

Unmapped

Optical Character Recognition, Computer Vision, Image Preprocessing, Feature Extraction, Pattern Recognition, Neural Networks for Text Recognition, Language Modeling, Deep Learning for OCR

Tags

Optical Character Recognition, Computer Vision, Image Preprocessing, Feature Extraction, Pattern Recognition, Neural Networks for Text Recognition, Language Modeling, Deep Learning for OCR, Text Extraction from Images, Automated Data Entry, Document Digitization, Accessibility Text Reading, Government, On-premise, Edge Computing, Self-hosted, Cloud API, Traditional Machine Learning, Education, Edge/Mobile, Historical Document Preservation, On-device AI, Document Management, Publishing, Form Processing, License Plate Recognition, Healthcare, Financial Services, Image, Legal Tech, Archival Services, Invoice Processing, Text, C++, CLI

Recent Activity

Updated 28 days ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality
high
Maturity
production

Categories

Healthcare & Biology (primary), Finance & Legal, Edge & Mobile AI, Other AI / ML, Computer Vision

PM Skills

Product Discovery

Languages

C++ 100.0%

Timeline

Project created
Aug 12, 2014
Forked
Mar 16, 2026
Your last push
28 days ago
Upstream last push
15 days ago
Tracked since
Mar 16, 2026
