Reporium

opendataloader-project/opendataloader-pdf

opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

Upstream: opendataloader-project/opendataloader-pdf

Builder

opendataloader-project • individual

Stars

21,807

Using upstream star count

Forks

2,039

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

May 13, 2025

Project creation date

README Summary

name: opendataloader-pdf
category: PDF data extraction, PDF accessibility automation
license: Apache-2.0
solves: [PDF to structured data for RAG/LLM pipelines, automate PDF accessibility compliance: layout analysis + auto-tagging to Tagged PDF (first open-source end-to-end)]
input: PDF files (digital, scanned, tagged)
output: Markdown, JSON (with bounding boxes), HTML, Tagged PDF, PDF/UA (enterprise)
sdk: Python, Node.js, Java
requirements: Java 11+
pricing: open-source co


AI Dev Skills

Unmapped

Data Extraction, Data Pipeline Engineering, Document Processing, Document Understanding, Text Processing

Tags

Data Extraction, Data Pipeline Engineering, Document Processing, Document Understanding, Text Processing, AI Safety, Automation, Benchmarking, Evals, FinTech, Forked, Healthcare AI, Java, LangChain, Large Language Models, Multimodal AI, Node.js, Prompt Injection, Python, Roadmap, TypeScript, Watermarking

Taxonomy

AI Trends

Document AI, Data Pipeline Automation, AI Data Preparation

Category

Foundation Models, AI Agents, Evals & Benchmarking, Dev Tools & Automation, Learning Resources, Industry: Healthcare, Industry: FinTech, Security & Safety

Deployment Context

Self-hosted, On-premise

Industries

Legal Tech, Healthcare, Financial Services, Education, Government

Modalities

Text, Document

Skill Areas

Document Processing, Data Extraction, Text Processing, Document Understanding, Data Pipeline Engineering


Use Cases

Document Content Extraction, PDF Accessibility Enhancement, Document Preprocessing for AI, Structured Data Generation, Document Digitization

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

20

refactor: address code review feedback

Bundo Lee • Mar 20, 2026

6b42c7a

fix: address code review feedback on narrow outlier filtering

Bundo Lee • Mar 20, 2026

8e3f74a

fix: filter narrow outlier elements in vertical gap detection

Bundo Lee • Mar 20, 2026

9f6f3c6

Quality

Quality: medium
Maturity: prototype

Categories

Evals & Benchmarking (Primary), Dev Tools & Automation, Learning Resources, Industry: Healthcare, Industry: FinTech, Security & Safety, Foundation Models, AI Agents, Safety & Alignment, Healthcare & Biology, Finance & Legal, Multimodal AI, Other AI / ML

PM Skills

Safety & Alignment, User Experience, Data & Evaluation, Developer Platform

Languages

Java 100.0%

Timeline

Project created: May 13, 2025
Forked: Mar 22, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 20, 2026

Similar Repos

pgvector cosine similarity · $0
