
vllm-project/vllm

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Builder

vLLM

vllm-project • startup

Stars

75,076

Using upstream star count

Forks

15,115

Using upstream fork count

Open Issues

0

Activity Score

0/100

1038 commits in 30d

Created

Feb 9, 2023

Project creation date

README Summary

vLLM is a fast, memory-efficient inference and serving engine for Large Language Models (LLMs). It provides high-throughput serving through continuous batching, optimized CUDA kernels, and support for popular model families such as Llama and GPT. It offers both an offline inference API and online serving with OpenAI-compatible endpoints, making LLM applications easy to deploy and scale. The engine is designed to maximize GPU utilization and minimize memory usage through techniques such as PagedAttention.
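As a minimal sketch of the OpenAI-compatible serving path described above: a vLLM server started with `vllm serve <model>` accepts standard `/v1/chat/completions` requests. The snippet below only builds the JSON request body; the base URL is the server's default for a local deployment, and the model name is illustrative, not taken from this page.

```python
import json

# Default base URL for a locally running `vllm serve` instance (an assumption;
# adjust host/port for your deployment).
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model: str, user_message: str, max_tokens: int = 64) -> str:
    """Build the JSON body for an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


# Illustrative model name -- any model the server was launched with works.
body = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(body)
```

The same body can be POSTed to `BASE_URL + "/chat/completions"` with any HTTP client, or used via the official `openai` Python client pointed at the vLLM server.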

AI Dev Skills

Unmapped

Large Language Model Inference, Model Serving Architecture, Memory Optimization, High-Throughput Computing, Distributed Systems, GPU Acceleration, Transformer Architecture, Model Deployment, Performance Optimization, Batching Algorithms

Tags

Large Language Model Inference, Model Serving Architecture, Memory Optimization, High-Throughput Computing, Distributed Systems, GPU Acceleration, Transformer Architecture, Model Deployment, Performance Optimization, Batching Algorithms, On-premise, Cloud Services, Production Language Model Deployment, LLM API Serving, High-Volume Text Generation, Efficient Model Inference, Model Serving Infrastructure, Enterprise Software, Production AI Systems, Cloud API, Text, Efficient AI Deployment, Scalable AI Application Backend, Batch Processing of Language Tasks, Large Language Models, Developer Tools, Self-hosted, Python

Recent Activity

Updated 27 days ago

7 Days

194

30 Days

1038

90 Days

3592

Quality

production

Quality
high

Maturity
production

Categories

Dev Tools & Automation (Primary), Inference & Serving, ML Platform & Infrastructure, Other AI / ML, Foundation Models

PM Skills

Developer Platform

Languages

Python 100.0%

Timeline

Project created
Feb 9, 2023

Forked
Mar 13, 2026

Your last push
27 days ago

Upstream last push
6 days ago

Tracked since
Mar 17, 2026
