NVIDIA/TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Builder: NVIDIA (big-tech)
Stars: 13,767 (upstream count)
Forks: 2,415 (upstream count)
Open Issues: 0
Activity Score: 0/100 (0 commits in the last 30 days)
Created: Aug 16, 2023
Updated 2 months ago
Commits: 0 in the last 7 days, 0 in the last 30 days, 20 in the last 90 days
Recent commits:
[https://nvbugs/5944411][fix] Handle anyOf parameter schemas in Qwen3Coder tool parser (#12173) • Joyjit Daw • Mar 13, 2026
[None][feat] Add mix-precision checkpoint support in AutoDeploy (#12175) • Frida Hou • Mar 13, 2026
[None][feat] Qwen3.5 perf optimizations (#11581) • Suyog Gupta • Mar 13, 2026