Reporium

NVIDIA/TransformerEngine

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs. It supports 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, providing better performance with lower memory utilization in both training and inference.

Builder

NVIDIA • big-tech

Stars

3,362

Using upstream star count

Forks

733

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Sep 20, 2022

Project creation date

README Summary

.. Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

AI Dev Skills

Unmapped

Deep Learning Acceleration, Distributed Training, GPU Computing and CUDA, Hardware-Software Co-design, Memory Optimization, Mixed Precision Training, Model Optimization, Numerical Computing, Transformer Architecture

Tags

Deep Learning Acceleration, Distributed Training, GPU Computing and CUDA, Hardware-Software Co-design, Memory Optimization, Mixed Precision Training, Model Optimization, Numerical Computing, Transformer Architecture, Backend, Benchmarking, C++, Deep Learning, DeepSpeed, Docker, Evals, Forked, GPU / CUDA, HuggingFace, JAX, Large Language Models, NumPy, OpenAI, PyTorch, Python, Research / Papers, SageMaker, Transformers, Tutorial

Taxonomy

AI Trends

Large Language Models, Model Efficiency, Hardware Optimization, Scaling AI Training

category

Foundation Models, Model Training, Evals & Benchmarking, Inference & Serving, MLOps & Infrastructure, Dev Tools & Automation, Cloud & Platforms, Learning Resources, Data Science & Analytics

Deployment Context

Cloud API, Self-hosted, On-premise

Modalities

Text, Multimodal

Skill Areas

Transformer Architecture, GPU Computing and CUDA, Mixed Precision Training, Model Optimization, Deep Learning Acceleration, Memory Optimization, Numerical Computing, Hardware-Software Co-design, Distributed Training

tag

Backend, Benchmarking, C++, Deep Learning, DeepSpeed, Docker, Evals, Forked, GPU / CUDA, HuggingFace, JAX, Large Language Models, NumPy, OpenAI, PyTorch, Python, Research / Papers, SageMaker, Transformers, Tutorial

Use Cases

Large Language Model Training, Transformer Model Inference Acceleration, Memory-Constrained Model Training, High-Performance Computing for AI, Production ML Model Serving

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

20

[PyTorch] Backwards compatible single param checkpointing in `GroupedLinear` (#2761)

Kirthi Shankar Sivamani • Mar 16, 2026

4017565

[NVFP4][Dense/MoE] Integrate Cutlass NVFP4 Row-Cast-Col-RHT-Transpose-Cast Fusion Kernel (#2555)

Zhongbo Zhu • Mar 16, 2026

523801d

[Common] Fix linker error for to_string(DType) in distributed tests (#2757)

vcherepanov-nv • Mar 16, 2026

a945846

Quality

Quality: high
Maturity: production

Categories

Evals & Benchmarking (Primary), Inference & Serving, MLOps & Infrastructure, Dev Tools & Automation, Cloud & Platforms, Learning Resources, Data Science & Analytics, Foundation Models, Model Training, Search & Knowledge, Other AI / ML

PM Skills

Scale & Reliability, Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Sep 20, 2022
Forked: Mar 14, 2026
Your last push: 2 months ago
Upstream last push: 19 days ago
Tracked since: Mar 17, 2026

Similar Repos

pgvector cosine similarity · $0