Reporium

huggingface/text-generation-inference

text-generation-inference

Large Language Model Text Generation Inference

Builder

HuggingFace

huggingface • ai-lab

Stars

10,857

Using upstream star count

Forks

1,268

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in the last 30 days

Created

Oct 8, 2022

Project creation date

README Summary

> [!CAUTION]
> text-generation-inference is now in maintenance mode. Going forward, we will accept pull requests for minor bug fixes, documentation improvements and lightweight maintenance tasks.
>
> TGI has initiated the movement for optimized inference engines to rely on `transformers` model architectures. This approach is now adopted by downstream inference engines, which we contribute to and recommend using going forward: [vllm](https://github.com/vllm-project/vllm), [SGLang](https://githu

AI Dev Skills

Unmapped

Distributed Inference, GPU Acceleration, Large Language Model Deployment, Model Quantization, Model Serving Optimization, Production ML Systems, Text Generation Inference, Transformer Architecture

Tags

Distributed Inference, GPU Acceleration, Large Language Model Deployment, Model Quantization, Model Serving Optimization, Production ML Systems, Text Generation Inference, Transformer Architecture, API, Batching, C++, Deep Learning, Docker, Forked, GPU / CUDA, HuggingFace, Kubernetes, LLM Serving, Large Language Models, Mistral, OpenAI, OpenTelemetry, PyTorch, Python, Quantization, Real-Time / Streaming, Research / Papers, Rust, SGLang, TGI, Transformers, Tutorial, Watermarking, llama.cpp, vLLM

Taxonomy

AI Trends

Large Language Models, Production AI Systems, AI Infrastructure, Model Optimization

Category

Foundation Models, Model Training, Inference & Serving, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety

Deployment Context

Self-hosted, Cloud API, On-premise

Modalities

Text

Skill Areas

Large Language Model Deployment, Text Generation Inference, Model Serving Optimization, Production ML Systems, Transformer Architecture, GPU Acceleration, Model Quantization, Distributed Inference

Tag

API, Batching, C++, Deep Learning, Docker, Forked, GPU / CUDA, HuggingFace, Kubernetes, LLM Serving, Large Language Models, Mistral, OpenAI, OpenTelemetry, PyTorch, Python, Quantization, Real-Time / Streaming, Research / Papers, Rust, SGLang, TGI, Transformers, Tutorial, Watermarking, llama.cpp, vLLM

Use Cases

Large Language Model API Serving, High-throughput Text Generation, Production LLM Deployment, Scalable AI Text Services, Custom LLM Hosting

Recent Activity

Updated 4 months ago

7 Days

0

30 Days

0

90 Days

0

Quality

medium

Maturity

production

Categories

Foundation Models (primary), Model Training, Inference & Serving, Search & Knowledge, Other AI / ML, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety

PM Skills

Cost & Efficiency, Scale & Reliability, Developer Platform

Languages

Python 100.0%

Timeline

Project created
Oct 8, 2022
Forked
Mar 13, 2026
Your last push
4 months ago
Upstream last push
2 months ago
Tracked since
Jan 8, 2026
