huggingface/datasets

datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Builder

HuggingFace

huggingface • ai-lab

Stars

21,360

Using upstream star count

Forks

3,152

Using upstream fork count

Open Issues

0

Activity Score

0/100

20 commits in 30d

Created

Mar 26, 2020

Project creation date

README Summary

Hugging Face Datasets is a Python library that provides access to thousands of datasets for machine learning research and applications. It offers fast, memory-efficient data loading and processing, with support for common data formats such as CSV, JSON, Parquet, and plain text. The library integrates with popular ML frameworks such as PyTorch, TensorFlow, and JAX, and provides tools for dataset manipulation, streaming, and sharing.

AI Dev Skills

Unmapped

Dataset Curation and Management • Data Preprocessing and Feature Engineering • Large-scale Data Processing • Machine Learning Pipeline Development • Data Versioning and Reproducibility • Distributed Data Loading • Memory-efficient Data Handling • Cross-platform Data Compatibility

Tags

Dataset Curation and Management • Data Preprocessing and Feature Engineering • Large-scale Data Processing • Machine Learning Pipeline Development • Data Versioning and Reproducibility • Distributed Data Loading • Memory-efficient Data Handling • Cross-platform Data Compatibility • Multi-task Learning • Computer Vision Model Development • Data Augmentation Workflows • Tabular • Cloud API • Dataset Benchmarking and Evaluation • Reproducible AI Research • On-premise • Open Source AI • Speech Recognition Training • Text • Video • Transfer Learning Experiments • Audio • Research Reproducibility • Self-hosted • Community-driven AI Development • Training Language Models • Image • Standardized ML Workflows • Python

Taxonomy

Recent Activity

Updated 25 days ago

7 Days

1

30 Days

20

90 Days

67

Quality

Quality
high
Maturity
production

Categories

MLOps & Infrastructure (Primary) • Learning Resources • Evals & Benchmarking • ML Platform & Infrastructure • Coding & Dev Tools • Search & Knowledge • Other AI / ML • Model Training • Generative Media • Robotics • Computer Vision

PM Skills

Scale & Reliability • Developer Platform

Languages

Python 100.0%

Timeline

Project created
Mar 26, 2020
Forked
Mar 22, 2026
Your last push
25 days ago
Upstream last push
11 days ago
Tracked since
Mar 19, 2026
