NVIDIA/TransformerEngine

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

Builder

NVIDIA • big-tech

Stars

3,362

Using upstream star count

Forks

733

Using upstream fork count

Open Issues

Activity Score

0/100

0 commits in 30d

Created

Sep 20, 2022

Project creation date

AI Dev Skills

Unmapped

Deep Learning Acceleration, Distributed Training, GPU Computing and CUDA, Hardware-Software Co-design, Memory Optimization, Mixed Precision Training, Model Optimization, Numerical Computing, Transformer Architecture

Taxonomy

AI Trends

Large Language Models, Model Efficiency, Hardware Optimization, Scaling AI Training

Recent Activity

Updated 2 months ago

[PyTorch] Backwards compatible single param checkpointing in `GroupedLinear` (#2761)

Kirthi Shankar Sivamani • Mar 16, 2026

4017565

[NVFP4][Dense/MoE] Integrate Cutlass NVFP4 Row-Cast-Col-RHT-Transpose-Cast Fusion Kernel (#2555)

Zhongbo Zhu • Mar 16, 2026

523801d

[Common] Fix linker error for to_string(DType) in distributed tests (#2757)

vcherepanov-nv • Mar 16, 2026

a945846

Quality

Quality: high
Maturity: production

PM Skills

Scale & Reliability, Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Sep 20, 2022
Forked: Mar 14, 2026
Your last push: 2 months ago
Upstream last push: 19 days ago
Tracked since: Mar 17, 2026

