cleanlab/cleanlab

cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Builder

cleanlab

cleanlab • individual

Stars

11,407

Using upstream star count

Forks

884

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

May 11, 2018

Project creation date

README Summary

Cleanlab is an open-source Python library that helps improve machine learning models by automatically detecting and handling data quality issues in real-world datasets. It focuses on identifying problematic data points like label errors, outliers, and near-duplicates to enable more reliable model training and evaluation. The library provides a data-centric approach to AI that works with any ML framework and dataset type.
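To make the idea of flagging likely label errors concrete, here is a minimal pure-Python sketch of a confident-learning-style check. It is an illustration under stated assumptions, not cleanlab's own implementation or API: the function name `flag_label_issues`, the toy data, and the simplified thresholding rule (per-class mean self-confidence) are all hypothetical choices made for this example. It assumes you already have out-of-sample predicted probabilities for every example, for instance from cross-validation.

```python
from typing import List, Sequence


def flag_label_issues(
    labels: Sequence[int],
    pred_probs: Sequence[Sequence[float]],
) -> List[int]:
    """Return indices of examples whose given label looks suspicious.

    Hypothetical helper for illustration only (not part of cleanlab).
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the mean predicted probability of class k
    # over the examples that were labeled k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        # A class with no labeled examples gets threshold 1.0 (assumed default).
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    flagged = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model predicts at or above that class's threshold.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class
        best = max(confident, key=lambda k: p[k])
        if best != y:
            flagged.append(i)  # confident prediction disagrees with the label
    return flagged


# Toy data: example 2 is labeled 0 but the model strongly predicts class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.20, 0.80],
    [0.05, 0.95],
    [0.40, 0.60],
]
suspects = flag_label_issues(labels, pred_probs)
print(suspects)
```

On this toy data only index 2 is flagged; index 5 is skipped because neither class probability reaches its threshold. A real workflow would obtain `pred_probs` from a held-out or cross-validated model rather than hard-coding them.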

AI Dev Skills

Unmapped

Data Quality Assessment, Label Noise Detection, Outlier Detection, Duplicate Detection, Data-Centric AI, Confident Learning, Statistical Learning Theory, Cross-Validation Techniques, Ensemble Methods, Probabilistic Classification

Tags

Data Quality Assessment, Label Noise Detection, Outlier Detection, Duplicate Detection, Data-Centric AI, Confident Learning, Statistical Learning Theory, Cross-Validation Techniques, Ensemble Methods, Probabilistic Classification, Dataset Quality Auditing, Trustworthy AI, Mislabeled Data Detection, Outlier Identification, ML Observability, Audio, Near-Duplicate Removal, Training Data Cleaning, Text, Tabular, AI Safety, Data Validation Pipeline, On-premise, Model Performance Improvement, Jupyter Notebooks, Cloud API, Self-hosted, Image, Python

Taxonomy

Recent Activity

Updated 3 months ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality
high
Maturity
production

Categories

Robotics (Primary), Model Training, Observability & Monitoring, Generative Media, ML Platform & Infrastructure, MLOps & Infrastructure, Safety & Alignment, Coding & Dev Tools, Data Science & Analytics, Other AI / ML

PM Skills

Scale & Reliability, Developer Platform

Languages

Python 100.0%

Timeline

Project created
May 11, 2018
Forked
Mar 22, 2026
Your last push
3 months ago
Upstream last push
3 months ago
Tracked since
Jan 13, 2026
