PaddlePaddle/PaddleOCR
PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Builder: PaddlePaddle • individual
Stars: 74,770 (upstream star count)
Forks: 10,170 (upstream fork count)
Open Issues: 0
Activity Score: 0/100 (0 commits in the last 30 days)
Created: May 8, 2020 (project creation date)
README Summary
PaddleOCR is a comprehensive OCR (Optical Character Recognition) toolkit developed by PaddlePaddle that can extract text from images and PDF documents. It supports over 100 languages and provides both text detection and recognition capabilities with pretrained models. The toolkit is designed to convert visual documents into structured text data that can be easily processed by AI systems and large language models.
AI Dev Skills: Unmapped
Recent Activity (updated 28 days ago)
- 7 days: 0
- 30 days: 0
- 90 days: 0
Quality
- Quality: high
- Maturity: production
Categories
PM Skills
Timeline
- Project created: May 8, 2020
- Forked: Mar 16, 2026
- Your last push: 28 days ago
- Upstream last push: 7 days ago
- Tracked since: Mar 16, 2026