Reporium

cleanlab/cleanlab

cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

View on GitHub ↗ · Upstream: cleanlab/cleanlab ↗

Builder

cleanlab • individual

Stars

11,490

Using upstream star count

Forks

895

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

May 11, 2018

Project creation date



AI Dev Skills

Unmapped

Confident Learning, Cross-Validation Techniques, Data-Centric AI, Data Quality Assessment, Duplicate Detection, Ensemble Methods, Label Noise Detection, Outlier Detection, Probabilistic Classification, Statistical Learning Theory

Tags

Confident Learning, Cross-Validation Techniques, Data-Centric AI, Data Quality Assessment, Duplicate Detection, Ensemble Methods, Label Noise Detection, Outlier Detection, Probabilistic Classification, Statistical Learning Theory, Computer Vision, Course, Embeddings, Forked, HuggingFace, Jupyter, Keras, Machine Learning, OpenAI, PyTorch, Python, Research / Papers, Scikit-learn, Segmentation, TensorFlow, Tutorial

Taxonomy

AI Trends

Data-Centric AI, AI Safety, Trustworthy AI, ML Observability

Category

Model Training, Foundation Models, RAG & Retrieval, Computer Vision, Learning Resources, Data Science & Analytics

Deployment Context

Self-hosted, Cloud API, On-premise, Jupyter Notebooks

Modalities

Tabular, Text, Image, Audio

Skill Areas

Data Quality Assessment, Label Noise Detection, Outlier Detection, Duplicate Detection, Data-Centric AI, Confident Learning, Statistical Learning Theory, Cross-Validation Techniques, Ensemble Methods, Probabilistic Classification

Tag

Computer Vision, Course, Embeddings, Forked, HuggingFace, Jupyter, Keras, Machine Learning, OpenAI, PyTorch, Python, Research / Papers, Scikit-learn, Segmentation, TensorFlow, Tutorial

Use Cases

Mislabeled Data Detection, Dataset Quality Auditing, Training Data Cleaning, Outlier Identification, Near-Duplicate Removal, Model Performance Improvement, Data Validation Pipeline

Recent Activity

Updated 4 months ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality
high
Maturity
production

Categories

Foundation Models (Primary), RAG & Retrieval, Model Training, Computer Vision, Data Science & Analytics, Search & Knowledge, Other AI / ML, Learning Resources

PM Skills

Product Discovery

Languages

Python 100.0%

Timeline

Project created
May 11, 2018
Forked
Mar 22, 2026
Your last push
4 months ago
Upstream last push
4 months ago
Tracked since
Jan 13, 2026

Similar Repos

pgvector cosine similarity · $0
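
The label above suggests that similar repos are ranked by cosine similarity between embedding vectors (pgvector's `<=>` operator returns cosine distance, which is 1 minus this similarity). A minimal sketch of that ranking in plain Python follows. The repo names, the three-dimensional vectors, and the function names `cosine_similarity` and `rank_similar` are made-up placeholders for illustration, not Reporium data or its actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_similar(query, candidates, top_k=3):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings; real ones would have hundreds of dimensions.
query_embedding = [0.9, 0.1, 0.3]
candidate_embeddings = {
    "repo-a": [0.8, 0.2, 0.4],
    "repo-b": [0.1, 0.9, 0.0],
    "repo-c": [0.9, 0.1, 0.3],
}

ranking = rank_similar(query_embedding, candidate_embeddings, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity depends only on direction and not on vector length, a candidate whose embedding points the same way as the query scores close to 1 regardless of magnitude.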
