mitkox/vllm-turboquant
vLLM 0.18.1rc1 with TurboQuant
Builder
mitkox • individual
Stars
403 (upstream count)
Forks
81 (upstream count)
Open Issues
0
Activity Score
0/100 (0 commits in 30d)
Created
Mar 25, 2026
README Summary
This repository packages vLLM 0.18.1rc1 with TurboQuant, a quantization-based optimization for large language model inference. The integration aims to improve inference speed and memory efficiency for LLM deployments.
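As a rough illustration of the workflow the summary describes, the sketch below runs offline quantized inference through vLLM's Python API. The model name is a placeholder, and the "turboquant" quantization identifier is an assumption about this fork's integration point; stock vLLM exposes methods such as "awq" and "gptq" through the same argument.

```python
# Minimal sketch: offline quantized inference with vLLM's Python API.
# Assumption: this fork registers TurboQuant under quantization="turboquant";
# stock vLLM accepts methods like "awq" or "gptq" via the same argument.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    quantization="turboquant",                 # hypothetical, fork-specific
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize weight quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```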
AI Dev Skills
Unmapped
Quantization Techniques · Model Compression · LLM Inference Optimization · CUDA/GPU Programming · vLLM Framework Architecture · Transformer Model Optimization · Batch Processing · Memory Optimization · Quantization · Model Serving · vLLM Framework · Performance Tuning · Distributed Inference · Model Optimization · Large Language Model Inference · Performance Optimization · Memory Efficiency · TurboQuant Algorithm
Tags
Quantization Techniques · Model Compression · LLM Inference Optimization · CUDA/GPU Programming · vLLM Framework Architecture · Transformer Model Optimization · Batch Processing · Memory Optimization · Quantization · Model Serving · vLLM Framework · Performance Tuning · Distributed Inference · Model Optimization · Large Language Model Inference · Performance Optimization · Memory Efficiency · TurboQuant Algorithm
Taxonomy
AI Trends
(none)
Deployment Context
(none)
Modalities
(none)
Skill Areas
Quantization Techniques · Model Compression · LLM Inference Optimization · CUDA/GPU Programming · vLLM Framework Architecture · Transformer Model Optimization · Batch Processing · Memory Optimization · Quantization · Model Serving · vLLM Framework · Performance Tuning · Distributed Inference · Model Optimization · Large Language Model Inference · Performance Optimization · Memory Efficiency · TurboQuant Algorithm
Use Cases
Efficient LLM Inference · Cost-optimized Model Deployment · Memory-constrained LLM Serving · Low-latency Language Model Inference · High-throughput Batch Processing · Memory-constrained LLM deployment · Latency-optimized inference serving · Cost-reduced model serving · Edge and resource-limited environment LLM inference · Low-latency LLM inference · Memory-constrained deployment · Cost-optimized model serving · Real-time language model inference
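For the memory-constrained use cases above, a deployment typically caps the engine's GPU memory share and context length. The sketch below uses stock vLLM engine arguments with illustrative, untuned values; nothing here is TurboQuant-specific.

```python
# Sketch of a memory-constrained vLLM configuration; values are illustrative.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small placeholder model
    gpu_memory_utilization=0.70,         # cap vLLM's fraction of GPU memory
    max_model_len=4096,                  # shorter context shrinks the KV cache
    swap_space=2,                        # GiB of CPU swap for preempted requests
)
```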
Recent Activity
Updated 17 days ago
7 Days
0
30 Days
0
90 Days
0
Quality
- Quality: medium
- Maturity: beta
Categories
Foundation Models (Primary) · Inference & Serving · Dev Tools & Automation
PM Skills
Developer Platform
Languages
Python 100.0%
Timeline
- Project created: Mar 25, 2026
- Forked: Mar 28, 2026
- Your last push: 17 days ago
- Upstream last push: 7 days ago
- Tracked since: Mar 27, 2026