NVIDIA/TensorRT-LLM

TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

Builder

NVIDIA • big-tech

Stars

13,252

Using upstream star count

Forks

2,244

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Aug 16, 2023

Project creation date

README Summary

TensorRT-LLM is NVIDIA's Python API framework for defining and optimizing Large Language Models for efficient inference on NVIDIA GPUs. It provides both Python and C++ runtimes with state-of-the-art optimizations to orchestrate high-performance LLM inference execution.

AI Dev Skills

Unmapped

GPU Computing, Large Language Model Inference, Model Optimization, CUDA Programming, Transformer Architecture, Tensor Operations, Performance Engineering, Runtime Systems, Neural Network Compilation, Memory Management

Tags

GPU Computing, Large Language Model Inference, Model Optimization, CUDA Programming, Transformer Architecture, Tensor Operations, Performance Engineering, Runtime Systems, Neural Network Compilation, Memory Management, Developer Tools, Cloud Computing, Large Language Models, Model Inference Optimization, Text, Self-hosted, GPU Acceleration, GPU Clusters, On-premise, Cloud API, Inference Efficiency, LLM Serving, Conversational AI Deployment, Enterprise Software, Production AI Systems, GPU-accelerated NLP Applications, Real-time Text Generation, Python

Taxonomy

Recent Activity

Updated 1 month ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: high
Maturity: production

Categories

Foundation Models (primary), Other AI / ML, Dev Tools & Automation, Inference & Serving, NLP & Text

PM Skills

Developer Platform

Languages

Python 100.0%

Timeline

Project created: Aug 16, 2023
Forked: Mar 14, 2026
Your last push: 1 month ago
Upstream last push: 6 days ago
Tracked since: Mar 13, 2026
