Reporium

Unstructured-IO/unstructured

unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Upstream: Unstructured-IO/unstructured

Builder

Unstructured-IO • individual

Stars

14,808

Using upstream star count

Forks

1,241

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Sep 26, 2022

Project creation date

AI Dev Skills

Unmapped

Data Pipeline Engineering, Document Parsing, Document Processing, ETL Development, Information Retrieval, Metadata Extraction, Natural Language Processing, OCR Integration, Table Extraction, Text Extraction

Tags

Data Pipeline Engineering, Document Parsing, Document Processing, ETL Development, Information Retrieval, Metadata Extraction, Natural Language Processing, OCR Integration, Table Extraction, Text Extraction, Computer Vision, Deep Learning, Docker, Embeddings, Forked, Open Source, Planning / CoT, Python, Security

Taxonomy

AI Trends

Retrieval-Augmented Generation, Document AI, Compound AI Systems, Enterprise AI

Category

RAG & Retrieval, AI Agents, Computer Vision, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety

Deployment Context

Cloud API, Self-hosted, On-premise, Serverless

Industries

Legal Tech, Healthcare, Finance, Insurance, Government, Education, Knowledge Management

Modalities

Text, Image, Tabular, Multimodal

Skill Areas

Document Processing, Text Extraction, Data Pipeline Engineering, ETL Development, Information Retrieval, Document Parsing, Natural Language Processing, OCR Integration, Table Extraction, Metadata Extraction

Tag

Computer Vision, Deep Learning, Docker, Document Processing, Embeddings, Forked, Open Source, Planning / CoT, Python, Security

Use Cases

Document Question Answering, RAG Data Preparation, Document Search and Retrieval, Automated Document Processing, Content Migration, Document Classification, Information Extraction, Knowledge Base Construction, Legal Document Analysis, Research Paper Processing

Recent Activity

Updated 2 months ago

Commits in the last 7 days: 0
Commits in the last 30 days: 0
Commits in the last 90 days: 4

fix(deps): Update security vulnerability in pypdf to v6.9.1 [SECURITY] (#4248)

utic-renovate[bot] • Mar 20, 2026

cc89c8c

feat: make telemetry off by default (#4281)

Clayton • Mar 16, 2026

cb16853

chore: disable fail-build on Anchore container scan (#4285)

Lawrence Elitzer (LoLo) • Mar 16, 2026

5585e98

Quality

Quality: high
Maturity: production

Categories

RAG & Retrieval (Primary), MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety, AI Agents, Computer Vision, Other AI / ML

PM Skills

Scale & Reliability, Product Discovery, AI-Native Architecture

Languages

HTML 100.0%

Timeline

Project created: Sep 26, 2022
Forked: Mar 21, 2026
Your last push: 2 months ago
Upstream last push: 17 days ago
Tracked since: Mar 20, 2026

Similar Repos

pgvector cosine similarity · $0
