NVIDIA/TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Builder
NVIDIA • big-tech
Stars
13,252
Using upstream star count
Forks
2,244
Using upstream fork count
Open Issues
0
Activity Score
0/100
0 commits in 30d
Created
Aug 16, 2023
Project creation date
README Summary
TensorRT-LLM is NVIDIA's Python API framework for defining and optimizing Large Language Models for efficient inference on NVIDIA GPUs. It provides both Python and C++ runtimes with state-of-the-art optimizations to orchestrate high-performance LLM inference execution.
AI Dev Skills
Unmapped
Recent Activity
Updated 1 month ago
- 7 Days: 0
- 30 Days: 0
- 90 Days: 0
Quality
- Quality: high
- Maturity: production
Timeline
- Project created: Aug 16, 2023
- Forked: Mar 14, 2026
- Your last push: 1 month ago
- Upstream last push: 6 days ago
- Tracked since: Mar 13, 2026