NVIDIA/TensorRT-LLM

TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

Builder

NVIDIA

NVIDIA • big-tech

Stars

13,767

Using upstream star count

Forks

2,415

Using upstream fork count

Open Issues

Activity Score

0/100

0 commits in 30d

Created

Aug 16, 2023

Project creation date

README Summary

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.

AI Dev Skills

Unmapped

CUDA Programming, GPU Computing, Large Language Model Inference, Memory Management, Model Optimization, Neural Network Compilation, Performance Engineering, Runtime Systems, Tensor Operations, Transformer Architecture

Taxonomy

AI Trends

Large Language Models, Model Optimization, GPU Acceleration, Inference Efficiency, Production AI Systems

Recent Activity

Updated 2 months ago

[https://nvbugs/5944411][fix] Handle anyOf parameter schemas in Qwen3Coder tool parser (#12173)

Joyjit Daw • Mar 13, 2026

9a9dc3c

[None][feat] Add mix-precision checkpoint support in AutoDeploy (#12175)

Frida Hou • Mar 13, 2026

7754c66

[None][feat] Qwen3.5 perf optimizations (#11581)

Suyog Gupta • Mar 13, 2026

390a7fd

Quality

Quality: high
Maturity: production

PM Skills

Cost & Efficiency, Scale & Reliability, Data & Evaluation, Developer Platform

Languages

Python: 100.0%

Timeline

Project created: Aug 16, 2023
Forked: Mar 14, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 13, 2026


