Reporium

NVIDIA/TensorRT-LLM

TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
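The Python API described above can be illustrated with a short sketch of batched text generation through the high-level `LLM` class. It assumes a recent `tensorrt_llm` release that exports `LLM` and `SamplingParams`, a machine with an NVIDIA GPU, and network access to download the model. The model id, the prompts, and the helper name `generate_completions` are illustrative choices and are not taken from this page.

```python
"""Sketch of offline batched generation with the TensorRT LLM Python (LLM) API.

Assumptions: tensorrt_llm is installed, an NVIDIA GPU is available, and the
illustrative Hugging Face model id below can be downloaded.
"""
from typing import List

# Illustrative model id (assumption); any supported checkpoint could be used.
MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

PROMPTS: List[str] = [
    "Hello, my name is",
    "The capital of France is",
]


def generate_completions(prompts: List[str], model_id: str = MODEL_ID) -> List[str]:
    """Hypothetical helper: run batched generation, return one completion per prompt."""
    # Imported lazily so this module can be loaded without the library installed.
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model=model_id)  # loads the model and prepares it for GPU inference
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
    outputs = llm.generate(prompts, sampling_params)
    # Each result holds one or more candidate outputs; keep the first one.
    return [out.outputs[0].text for out in outputs]


if __name__ == "__main__":
    try:
        for prompt, text in zip(PROMPTS, generate_completions(PROMPTS)):
            print(f"{prompt!r} -> {text!r}")
    except ImportError:
        print("tensorrt_llm is not installed; run this on a host with an NVIDIA GPU.")
```

Deferring the import into the helper keeps the module importable on machines without the library; on a GPU host with `tensorrt_llm` installed, running the script prints one completion per prompt.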


Builder: NVIDIA (big-tech)

Stars: 13,767 (upstream star count)

Forks: 2,415 (upstream fork count)

Open Issues: 0

Activity Score: 0/100 (0 commits in the last 30 days)

Created: Aug 16, 2023 (project creation date)



AI Dev Skills

Unmapped

CUDA Programming, GPU Computing, Large Language Model Inference, Memory Management, Model Optimization, Neural Network Compilation, Performance Engineering, Runtime Systems, Tensor Operations, Transformer Architecture

Tags

CUDA Programming, GPU Computing, Large Language Model Inference, Memory Management, Model Optimization, Neural Network Compilation, Performance Engineering, Runtime Systems, Tensor Operations, Transformer Architecture, Benchmarking, Caching, ComfyUI, Curated List, Deep Learning, DeepSeek, Docker, Fine-Tuning, Forked, GPU / CUDA, Google Cloud, HuggingFace, Image Generation, Inference, Jupyter, KV Cache, Kubernetes, LLM Serving, Large Language Models, Llama, LlamaIndex, LoRA / PEFT, Mistral, ONNX, OpenAI, PyTorch, Python, Quantization, Real-Time / Streaming, Research / Papers, Roadmap, SageMaker, Speculative Decoding, Stable Diffusion, TensorRT, Tool Use, Triton, Tutorial

Taxonomy

AI Trends

Large Language Models, Model Optimization, GPU Acceleration, Inference Efficiency, Production AI Systems

Category

Inference & Serving, Foundation Models, AI Agents, RAG & Retrieval, Model Training, Evals & Benchmarking, Generative Media, MLOps & Infrastructure, Cloud & Platforms, Learning Resources, Data Science & Analytics

Deployment Context

Self-hosted, Cloud API, On-premise, GPU Clusters

Industries

Developer Tools, Cloud Computing, Enterprise Software

Modalities

Text

Skill Areas

GPU Computing, Large Language Model Inference, Model Optimization, CUDA Programming, Transformer Architecture, Tensor Operations, Performance Engineering, Runtime Systems, Neural Network Compilation, Memory Management

Tag

Benchmarking, Caching, ComfyUI, Curated List, Deep Learning, DeepSeek, Docker, Fine-Tuning, Forked, GPU / CUDA, Google Cloud, HuggingFace, Image Generation, Inference, Jupyter, KV Cache, Kubernetes, LLM Serving, Large Language Models, Llama, LlamaIndex, LoRA / PEFT, Mistral, Model Optimization, ONNX, OpenAI, PyTorch, Python, Quantization, Real-Time / Streaming, Research / Papers, Roadmap, SageMaker, Speculative Decoding, Stable Diffusion, TensorRT, Tool Use, Triton, Tutorial

Use Cases

LLM Serving, Real-time Text Generation, Conversational AI Deployment, Model Inference Optimization, GPU-accelerated NLP Applications

Recent Activity

Updated 2 months ago

Commits: 0 in the last 7 days, 0 in the last 30 days, 20 in the last 90 days

[https://nvbugs/5944411][fix] Handle anyOf parameter schemas in Qwen3Coder tool parser (#12173)
Joyjit Daw • Mar 13, 2026 • 9a9dc3c

[None][feat] Add mix-precision checkpoint support in AutoDeploy (#12175)
Frida Hou • Mar 13, 2026 • 7754c66

[None][feat] Qwen3.5 perf optimizations (#11581)
Suyog Gupta • Mar 13, 2026 • 390a7fd

Quality

Quality: high
Maturity: production

Categories

Inference & Serving (primary), RAG & Retrieval, Evals & Benchmarking, MLOps & Infrastructure, Cloud & Platforms, Learning Resources, Data Science & Analytics, Foundation Models, AI Agents, Model Training, Generative Media, Edge & Mobile AI, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, Scale & Reliability, Data & Evaluation, Developer Platform

Languages

Python: 100.0%

Timeline

Project created: Aug 16, 2023
Forked: Mar 14, 2026
Last push to fork: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 13, 2026
