Reporium

mitkox/vllm-turboquant

vllm-turboquant

vLLM 0.18.1rc1 with TurboQuant

Upstream: mitkox/vllm-turboquant

Builder

mitkox • individual

Stars

592

Using upstream star count

Forks

104

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Mar 25, 2026

Project creation date

README Summary

[Image: vLLM logo]

AI Dev Skills

Unmapped

Batch Processing, CUDA/GPU Programming, Distributed Inference, Large Language Model Inference, LLM Inference Optimization, Memory Efficiency, Memory Optimization, Model Compression, Model Optimization, Model Serving, Performance Optimization, Performance Tuning, Quantization, Quantization Techniques, Transformer Model Optimization, TurboQuant Algorithm, vLLM Framework, vLLM Framework Architecture

Tags

Batch Processing, CUDA/GPU Programming, Distributed Inference, Large Language Model Inference, LLM Inference Optimization, Memory Efficiency, Memory Optimization, Model Compression, Model Optimization, Model Serving, Performance Optimization, Performance Tuning, Quantization, Quantization Techniques, Transformer Model Optimization, TurboQuant Algorithm, vLLM Framework, vLLM Framework Architecture, Batching, Caching, DeepSeek, Embeddings, Forked, GPU / CUDA, HuggingFace, LLM Serving, Large Language Models, LoRA / PEFT, MLOps, Mistral, OpenAI, Real-Time / Streaming, Research / Papers, Security, Speculative Decoding, Transformers, Tutorial, vLLM

Taxonomy

AI Trends

Model Quantization, On-device AI, Small Language Models, Inference Optimization, Efficient AI, Model Compression, Quantization-Aware Optimization

category

Inference & Serving, Foundation Models, RAG & Retrieval, Model Training, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety

Deployment Context

Self-hosted, On-premise, Cloud API, Edge/Mobile, Cloud

Modalities

Text

Skill Areas

Quantization Techniques, Model Compression, LLM Inference Optimization, CUDA/GPU Programming, vLLM Framework Architecture, Transformer Model Optimization, Batch Processing, Memory Optimization, Quantization, Model Serving, vLLM Framework, Performance Tuning, Distributed Inference, Model Optimization, Large Language Model Inference, Performance Optimization, Memory Efficiency, TurboQuant Algorithm

tag

Active, Batching, Caching, DeepSeek, Embeddings, Forked, GPU / CUDA, HuggingFace, LLM Serving, Large Language Models, LoRA / PEFT, MLOps, Mistral, OpenAI, Real-Time / Streaming, Research / Papers, Security, Speculative Decoding, Transformers, Tutorial, vLLM

Use Cases

Efficient LLM Inference, Cost-optimized Model Deployment, Memory-constrained LLM Serving, Low-latency Language Model Inference, High-throughput Batch Processing, Memory-constrained LLM deployment, Latency-optimized inference serving, Cost-reduced model serving, Edge and resource-limited environment LLM inference, Low-latency LLM inference, Memory-constrained deployment, Cost-optimized model serving, Real-time language model inference

Recent Activity

Updated 2 months ago

7 Days

0

30 Days

0

90 Days

2

turboquant: correctness fixes, prefill fast path, and metadata improvements

mitkox • Mar 27, 2026

5fc73a3

Initial commit

mitkox • Mar 26, 2026

7a8a095

Quality

Quality: medium
Maturity: beta

Categories

Inference & Serving (Primary), RAG & Retrieval, MLOps & Infrastructure, Dev Tools & Automation, Learning Resources, Security & Safety, Foundation Models, Model Training, ML Platform & Infrastructure, Search & Knowledge, Other AI / ML

PM Skills

Cost & Efficiency, Scale & Reliability, Product Discovery

Languages

Python 100.0%

Timeline

Project created: Mar 25, 2026
Forked: Mar 28, 2026
Your last push: 2 months ago
Upstream last push: 1 month ago
Tracked since: Mar 27, 2026
