NVIDIA/TransformerEngine

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit and 4-bit floating-point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

Builder

NVIDIA • big-tech

Stars

3,253

Using upstream star count

Forks

684

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Sep 20, 2022

Project creation date

README Summary

NVIDIA TransformerEngine is a library that accelerates training and inference of Transformer models on NVIDIA GPUs by using the low-precision floating-point formats FP8 (8-bit) and FP4 (4-bit). It targets the Hopper, Ada, and Blackwell GPU architectures, where the lower precision improves performance and reduces memory consumption.
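
To make the memory claim concrete, here is a minimal back-of-the-envelope sketch in plain Python. It does not use TransformerEngine. It only multiplies a parameter count by the bit width of each storage format. The helper name `weight_bytes` and the 7-billion-parameter model size are hypothetical, chosen for illustration. The estimate ignores scaling factors, optimizer state, and activations, so real savings will differ.

```python
# Bit width of one stored element per format (assumption: raw element width only,
# no per-tensor or per-block scaling metadata).
BITS_PER_ELEMENT = {"fp32": 32, "fp16": 16, "bf16": 16, "fp8": 8, "fp4": 4}


def weight_bytes(num_params: int, fmt: str) -> int:
    """Return the bytes needed to store num_params values in the given format."""
    bits = num_params * BITS_PER_ELEMENT[fmt]
    return (bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    n = 7_000_000_000  # hypothetical 7B-parameter model
    for fmt in ("fp16", "fp8", "fp4"):
        print(f"{fmt}: {weight_bytes(n, fmt) / 1e9:.1f} GB")
```

Under these assumptions, weight storage halves with each step from 16-bit to 8-bit to 4-bit elements.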

AI Dev Skills

Unmapped

Transformer Architecture, GPU Computing and CUDA, Mixed Precision Training, Model Optimization, Deep Learning Acceleration, Memory Optimization, Numerical Computing, Hardware-Software Co-design, Distributed Training

Tags

Transformer Architecture, GPU Computing and CUDA, Mixed Precision Training, Model Optimization, Deep Learning Acceleration, Memory Optimization, Numerical Computing, Hardware-Software Co-design, Distributed Training, Hardware Optimization, Production ML Model Serving, Multimodal, Model Efficiency, Large Language Model Training, Self-hosted, Text, Large Language Models, Scaling AI Training, Cloud API, High-Performance Computing for AI, Transformer Model Inference Acceleration, Memory-Constrained Model Training, On-premise, Python

Taxonomy

Recent Activity

Updated 27 days ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality
high
Maturity
production

Categories

Inference & Serving (Primary), Multimodal AI, Other AI / ML, Foundation Models, Model Training

PM Skills

Scale & Reliability

Languages

Python 100.0%

Timeline

Project created
Sep 20, 2022
Forked
Mar 14, 2026
Your last push
27 days ago
Upstream last push
6 days ago
Tracked since
Mar 17, 2026
