NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit and 4-bit floating-point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, providing better performance and lower memory utilization in both training and inference.
Builder
NVIDIA
NVIDIA • big-tech
Stars
3,362
Using upstream star count
Forks
733
Using upstream fork count
Open Issues
0
Activity Score
0/100
0 commits in 30d
Created
Sep 20, 2022
Project creation date
.. Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Updated 2 months ago
7 Days
0
30 Days
0
90 Days
20
[PyTorch] Backwards compatible single param checkpointing in `GroupedLinear` (#2761)
Kirthi Shankar Sivamani • Mar 16, 2026
[NVFP4][Dense/MoE] Integrate Cutlass NVFP4 Row-Cast-Col-RHT-Transpose-Cast Fusion Kernel (#2555)
Zhongbo Zhu • Mar 16, 2026
[Common] Fix linker error for to_string(DType) in distributed tests (#2757)
vcherepanov-nv • Mar 16, 2026