huggingface/tokenizers

tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Builder

HuggingFace

huggingface • ai-lab

Stars

10,591

Using upstream star count

Forks

1,066

Using upstream fork count

Open Issues

0

Activity Score

0/100

33 commits in 30d

Created

Nov 1, 2019

Project creation date

README Summary

Hugging Face Tokenizers is a high-performance library of fast, state-of-the-art tokenizers built for both research and production use. The core is implemented in Rust, with bindings for Python and Node.js, and it supports widely used tokenization models such as BPE, WordPiece, and Unigram (the model popularized by SentencePiece). The upstream README reports tokenizing a gigabyte of text in under 20 seconds on a server CPU, while the library remains flexible and easy to use for NLP tasks.
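
As a rough illustration of the BPE method named in the summary, the sketch below learns merge rules from a tiny word list in plain Python. It is a toy, not the library's Rust implementation or its API: the function names (`learn_bpe_merges`, `merge_pair`, `encode`), the sample corpus, and the alphabetical tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse each non-overlapping occurrence of `pair` in `symbols` into one symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE training)."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent pair wins; ties go to the alphabetically smallest pair
        # (an assumption of this sketch, chosen only to keep results deterministic).
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize one word by applying the merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["low", "low", "lower", "lowest", "newer", "newest"]
merges = learn_bpe_merges(corpus, num_merges=3)
print(merges)                    # [('l', 'o'), ('lo', 'w'), ('e', 'r')]
print(encode("lowest", merges))  # ['low', 'e', 's', 't']
```

With three merges the frequent stem `low` becomes a single token while rarer suffixes stay split into characters, which is the behavior subword tokenizers rely on to keep vocabularies small.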

AI Dev Skills

Unmapped

Natural Language Processing • Text Preprocessing • Tokenization Algorithms • Transformer Architecture • Machine Learning Infrastructure • Performance Optimization • Rust Programming

Tags

Natural Language Processing • Text Preprocessing • Tokenization Algorithms • Transformer Architecture • Machine Learning Infrastructure • Performance Optimization • Rust Programming • High-Performance ML Infrastructure • Large Language Models • Production NLP Applications • On-premise • Research Text Processing • Self-hosted • Text Tokenization for Language Models • NLP Data Preprocessing • Machine Learning Pipeline Optimization • Cloud API • Text • Rust

Taxonomy

Recent Activity

Updated 25 days ago

7 Days

0

30 Days

33

90 Days

48

Quality

Quality: high
Maturity: production

Categories

Foundation Models (Primary) • NLP & Text • MLOps & Infrastructure • Learning Resources • ML Platform & Infrastructure • Search & Knowledge • Other AI / ML

PM Skills

Scale & Reliability • Developer Platform

Languages

Rust 100.0%

Timeline

Project created: Nov 1, 2019
Forked: Mar 22, 2026
Your last push: 25 days ago
Upstream last push: 11 days ago
Tracked since: Mar 20, 2026
