Reporium

NVIDIA/FasterTransformer

FasterTransformer

Transformer-related optimization, including BERT and GPT

Builder

NVIDIA • big-tech

Stars

6,416

Using upstream star count

Forks

933

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Apr 2, 2021

Project creation date

README Summary

**Note: FasterTransformer development has transitioned to [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/tree/release/0.5.0). All developers are encouraged to leverage TensorRT-LLM to get the latest improvements on LLM Inference. The NVIDIA/FasterTransformer repo will stay up, but will not have further development.**

AI Dev Skills

Unmapped

BERT Optimization, CUDA Kernel Development, Deep Learning Systems Engineering, GPT Optimization, GPU Memory Management, High-Performance Computing, Model Inference Acceleration, Model Serving Infrastructure, Transformer Architecture Optimization

Tags

BERT Optimization, CUDA Kernel Development, Deep Learning Systems Engineering, GPT Optimization, GPU Memory Management, High-Performance Computing, Model Inference Acceleration, Model Serving Infrastructure, Transformer Architecture Optimization, Benchmarking, C++, Embeddings, Evals, Forked, GPU / CUDA, HuggingFace, Large Language Models, Model Optimization, OpenAI, PyTorch, Quantization, Real-Time / Streaming, Research / Papers, TensorFlow, TensorRT, Transformers

Taxonomy

AI Trends

Model Optimization, Efficient AI Infrastructure, Production AI Systems

Category

Foundation Models, RAG & Retrieval, Model Training, Evals & Benchmarking, Inference & Serving, Learning Resources

Deployment Context

Cloud API, Self-hosted, On-premise

Modalities

Text

Skill Areas

Transformer Architecture Optimization, CUDA Kernel Development, GPU Memory Management, Model Inference Acceleration, BERT Optimization, GPT Optimization, Deep Learning Systems Engineering, High-Performance Computing, Model Serving Infrastructure

Tag

Benchmarking, C++, Embeddings, Evals, Forked, GPU / CUDA, HuggingFace, Large Language Models, Model Optimization, OpenAI, PyTorch, Quantization, Real-Time / Streaming, Research / Papers, TensorFlow, TensorRT, Transformers

Use Cases

High-throughput Language Model Serving, Real-time Text Generation, Large-scale BERT Inference, Production GPT Deployment, Latency-critical NLP Applications

Recent Activity

Updated 2 years ago

7 Days: 0
30 Days: 0
90 Days: 0

Quality

Quality: high
Maturity: production

Categories

RAG & Retrieval (Primary), Evals & Benchmarking, Inference & Serving, Learning Resources, Foundation Models, Model Training, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, Scale & Reliability, Data & Evaluation, Product Discovery

Languages

C++ 100.0%

Timeline

Project created: Apr 2, 2021
Forked: Mar 14, 2026
Your last push: 2 years ago
Upstream last push: 2 years ago
Tracked since: Mar 27, 2024

Similar Repos

pgvector cosine similarity · $0
