Reporium

huggingface/tokenizers

tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Builder

HuggingFace

huggingface • ai-lab

Stars

10,779

Using upstream star count

Forks

1,108

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Nov 1, 2019

Project creation date

AI Dev Skills

Unmapped

Machine Learning Infrastructure, Natural Language Processing, Performance Optimization, Rust Programming, Text Preprocessing, Tokenization Algorithms, Transformer Architecture

Tags

Machine Learning Infrastructure, Natural Language Processing, Performance Optimization, Rust Programming, Text Preprocessing, Tokenization Algorithms, Transformer Architecture, Forked, HuggingFace, Node.js, Python, Rust

Taxonomy

AI Trends

Large Language Models, Transformer Architecture, High-Performance ML Infrastructure

Category

Foundation Models, Dev Tools & Automation

Deployment Context

Self-hosted, Cloud API, On-premise

Modalities

Text

Skill Areas

Natural Language Processing, Text Preprocessing, Tokenization Algorithms, Transformer Architecture, Machine Learning Infrastructure, Performance Optimization, Rust Programming

Tag

Forked, HuggingFace, Node.js, Python, Rust

Use Cases

Text Tokenization for Language Models, NLP Data Preprocessing, Machine Learning Pipeline Optimization, Research Text Processing, Production NLP Applications

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

0

Fix multithreaded concurrency test to use shared tokenizer instance (#1950)

Shintaro Murakami • Feb 27, 2026

c4e27cf

Bump minimatch from 3.1.2 to 3.1.3 in /bindings/node (#1955)

dependabot[bot] • Feb 25, 2026

c370063

Update to PyO3 0.28 to automatically disable GIL (#1948)

Nathan Goldbaum • Feb 25, 2026

4c2e48a

Quality

Quality: high
Maturity: production

Categories

Foundation Models (Primary), Dev Tools & Automation

PM Skills

Scale & Reliability, Developer Platform

Languages

Rust 100.0%

Timeline

Project created: Nov 1, 2019
Forked: Mar 22, 2026
Your last push: 2 months ago
Upstream last push: 20 days ago
Tracked since: Mar 20, 2026

Similar Repos

pgvector cosine similarity · $0
