Reporium

datalab-to/surya

surya

OCR, layout analysis, reading order, table recognition in 90+ languages

Upstream: datalab-to/surya

Builder

datalab-to • individual

Stars

19,805

Using upstream star count

Forks

1,370

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in the last 30 days

Created

Jan 10, 2024

Project creation date

README Summary

- OCR in 90+ languages that benchmarks favorably against cloud services
- Line-level text detection in any language
- Layout analysis (detection of tables, images, headers, etc.)
- Reading order detection
- Table recognition (detecting rows and columns)
- LaTeX OCR


AI Dev Skills

Unmapped

Computer Vision, Deep Learning for Vision, Document Layout Analysis, Document Structure Understanding, Multilingual Text Processing, Optical Character Recognition, Reading Order Prediction, Table Recognition, Text Detection and Recognition, Transformer Architecture

Tags

Computer Vision, Deep Learning for Vision, Document Layout Analysis, Document Structure Understanding, Multilingual Text Processing, Optical Character Recognition, Reading Order Prediction, Table Recognition, Text Detection and Recognition, Transformer Architecture, Benchmarking, DeepSpeed, Evals, Fine-Tuning, Forked, GPU / CUDA, Google Cloud, HuggingFace, Open Source, PyTorch, Python, Research / Papers, Segmentation, Synthetic Data, Transformers

Taxonomy

AI Trends

Multimodal Reasoning, On-device AI, Small Language Models

Category

Model Training, Foundation Models, Evals & Benchmarking, Inference & Serving, Computer Vision, Cloud & Platforms, Learning Resources

Deployment Context

Cloud API, Self-hosted, On-premise

Industries

Legal Tech, FinTech, Healthcare, Education, Publishing, Government, Insurance

Modalities

Image, Text, Multimodal

Skill Areas

Computer Vision, Optical Character Recognition, Document Layout Analysis, Multilingual Text Processing, Deep Learning for Vision, Transformer Architecture, Text Detection and Recognition, Document Structure Understanding, Table Recognition, Reading Order Prediction

Tag

Benchmarking, DeepSpeed, Evals, Fine-Tuning, Forked, GPU / CUDA, Google Cloud, HuggingFace, Open Source, PyTorch, Python, Research / Papers, Segmentation, Synthetic Data, Transformers

Use Cases

Document Digitization, PDF Text Extraction, Form Processing, Invoice Processing, Receipt Analysis, Academic Paper Processing, Legal Document Analysis, Multi-language Document Processing, Table Data Extraction, Document Layout Understanding

Recent Activity

Updated 3 months ago

7 Days

0

30 Days

0

90 Days

0

@Br1an67 has signed the CLA in datalab-to/surya#489

github-actions[bot] • Mar 1, 2026

e735028

@bailey-coding has signed the CLA in datalab-to/surya#487

github-actions[bot] • Feb 24, 2026

2add5da

Quality

high

Maturity

beta

Categories

Evals & Benchmarking (primary), Inference & Serving, Cloud & Platforms, Learning Resources, Foundation Models, Model Training, Computer Vision, Search & Knowledge

PM Skills

Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Jan 10, 2024
Forked: Mar 22, 2026
Your last push: 3 months ago
Upstream last push: 28 days ago
Tracked since: Mar 1, 2026
