Reporium

vllm-project/vllm

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

View on GitHub ↗ • Upstream: vllm-project/vllm ↗

Builder

vLLM

vllm-project • startup

Stars

81,406

Using upstream star count

Forks

17,421

Using upstream fork count

Open Issues

0

Activity Score

0/100

1138 commits in 30d

Created

Feb 9, 2023

Project creation date

AI Dev Skills

Unmapped

Batching Algorithms, Distributed Systems, GPU Acceleration, High-Throughput Computing, Large Language Model Inference, Memory Optimization, Model Deployment, Model Serving Architecture, Performance Optimization, Transformer Architecture

Tags

Batching Algorithms, Distributed Systems, GPU Acceleration, High-Throughput Computing, Large Language Model Inference, Memory Optimization, Model Deployment, Model Serving Architecture, Performance Optimization, Transformer Architecture, Backend, Batching, Caching, DeepSeek, Embeddings, Forked, GPU / CUDA, HuggingFace, LLM Serving, Large Language Models, LoRA / PEFT, MLOps, Mistral, OpenAI, Python, Real-Time / Streaming, Research / Papers, Security, Speculative Decoding, Transformers, Tutorial, vLLM

Taxonomy

AI Trends

Large Language Models, Model Serving Infrastructure, Efficient AI Deployment, Production AI Systems

Category

Inference & Serving, Foundation Models, RAG & Retrieval, Model Training, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety

Deployment Context

Cloud API, Self-hosted, On-premise

Industries

Developer Tools, Cloud Services, Enterprise Software

Modalities

Text

Skill Areas

Large Language Model Inference, Model Serving Architecture, Memory Optimization, High-Throughput Computing, Distributed Systems, GPU Acceleration, Transformer Architecture, Model Deployment, Performance Optimization, Batching Algorithms

Tag

Backend, Batching, Caching, DeepSeek, Embeddings, Forked, GPU / CUDA, HuggingFace, LLM Serving, Large Language Models, LoRA / PEFT, MLOps, Mistral, OpenAI, Python, Real-Time / Streaming, Research / Papers, Security, Speculative Decoding, Transformers, Tutorial, vLLM

Use Cases

LLM API Serving, High-Volume Text Generation, Production Language Model Deployment, Scalable AI Application Backend, Efficient Model Inference, Batch Processing of Language Tasks

Recent Activity

Updated 2 months ago

7 Days

237

30 Days

1138

90 Days

3769

Add ability to replace oot ops when using lora (#37181)

Kyuyeun Kim • Mar 17, 2026

0a0a1a1

Support non-contiguous KV cache in TRTLLM fp8 dequant kernel (#36867)

Vadim Gimpelson • Mar 17, 2026

6c1cfba

[BugFix] Correct max memory usage for multiple KV-cache groups (#36030)

Harry Huang • Mar 17, 2026

45f526d

Quality

Quality: high
Maturity: production

Categories

Foundation Models (Primary), RAG & Retrieval, Model Training, Inference & Serving, ML Platform & Infrastructure, Search & Knowledge, Other AI / ML, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety

PM Skills

Cost & Efficiency, Scale & Reliability, Product Discovery

Languages

Python 100.0%

Timeline

Project created: Feb 9, 2023
Forked: Mar 13, 2026
Your last push: 2 months ago
Upstream last push: 15 days ago
Tracked since: Mar 17, 2026

Similar Repos

pgvector cosine similarity · $0
