huggingface/tokenizers
tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Builder

HuggingFace
huggingface • ai-lab
Stars
10,591
Using upstream star count
Forks
1,066
Using upstream fork count
Open Issues
0
Activity Score
0/100
33 commits in 30d
Created
Nov 1, 2019
Project creation date
README Summary
Hugging Face Tokenizers is a library of fast tokenizer implementations intended for both research and production use. The core is written in Rust, with Python bindings, and it supports widely used tokenization models such as BPE, WordPiece, and Unigram (the default model in SentencePiece). It is designed for speed while remaining flexible and easy to use for NLP tasks.
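To make the BPE method named in the summary concrete, here is a minimal pure-Python sketch of byte-pair-encoding merge learning. It is a toy illustration of the general technique, not the library's Rust implementation or its API. The function names (`learn_bpe_merges`, `merge_pair`, `encode`) and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse each left-to-right occurrence of `pair` in `symbols` into one symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each distinct word becomes a tuple of characters, weighted by its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties go to the lexicographically largest pair
        # (an arbitrary choice here, made only so the result is deterministic).
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize one word by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)
```

With the tiny corpus `["aa", "aa", "ab"]` and two merges, the learned rules are `("a", "a")` followed by `("a", "b")`, so `encode("aab", merges)` yields `["aa", "b"]`. A production tokenizer adds pre-tokenization, special tokens, and a far more efficient merge loop on top of this idea.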
AI Dev Skills
Unmapped
Recent Activity
Updated 25 days ago
7 Days
0
30 Days
33
90 Days
48
Quality
- Quality: high
- Maturity: production
Timeline
- Project created: Nov 1, 2019
- Forked: Mar 22, 2026
- Your last push: 25 days ago
- Upstream last push: 11 days ago
- Tracked since: Mar 20, 2026