Unstructured-IO/unstructured

unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

Builder

Unstructured-IO • individual

Stars

14,383

Using upstream star count

Forks

1,209

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Sep 26, 2022

Project creation date

README Summary

Unstructured is an open-source ETL solution that transforms complex documents into clean, structured formats optimized for language models and AI applications. The platform provides tools for document partitioning, text extraction, chunking, and data enrichment across various file formats including PDFs, images, and office documents. It offers both open-source tools and enterprise-grade platform solutions for production workflows.

AI Dev Skills

Unmapped

Document Processing · Text Extraction · Data Pipeline Engineering · ETL Development · Information Retrieval · Document Parsing · Natural Language Processing · OCR Integration · Table Extraction · Metadata Extraction

Tags

Document Processing · Text Extraction · Data Pipeline Engineering · ETL Development · Information Retrieval · Document Parsing · Natural Language Processing · OCR Integration · Table Extraction · Metadata Extraction · On-premise · Document Classification · Retrieval-Augmented Generation · Insurance · Automated Document Processing · Legal Tech · Text · Compound AI Systems · Education · Self-hosted · Knowledge Management · Image · Legal Document Analysis · Document Question Answering · Enterprise AI · Document Search and Retrieval · RAG Data Preparation · Knowledge Base Construction · Information Extraction · Government · Finance · Healthcare · Serverless · Research Paper Processing · Tabular · Multimodal · Content Migration · Document AI · Cloud API · HTML

Recent Activity

Updated 24 days ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: high
Maturity: production

Categories

Evals & Benchmarking (Primary) · ML Platform & Infrastructure · Healthcare & Biology · Finance & Legal · Multimodal AI · Edge & Mobile AI · Search & Knowledge · Other AI / ML · MLOps & Infrastructure · Dev Tools & Automation · Learning Resources · RAG & Retrieval · NLP & Text · Foundation Models

PM Skills

Scale & Reliability · Developer Platform

Languages

HTML 100.0%

Timeline

Project created
Sep 26, 2022
Forked
Mar 21, 2026
Your last push
24 days ago
Upstream last push
7 days ago
Tracked since
Mar 20, 2026
