
mitkox/vllm-turboquant


vLLM 0.18.1rc1 with TurboQuant

Builder

mitkox • individual

Stars

403

Using upstream star count

Forks

81

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Mar 25, 2026

Project creation date

README Summary

This repository contains vLLM 0.18.1rc1 integrated with TurboQuant, a quantization-based optimization for large language model inference. The integration aims to improve inference speed and memory efficiency for LLM deployments.
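
As a rough sketch of the workflow the summary describes, the snippet below loads a pre-quantized checkpoint through the standard upstream vLLM Python API (LLM and SamplingParams). The model name and the "awq" quantization method are illustrative placeholders; the method name or flag that the TurboQuant integration actually registers is not documented in this card and would need to be substituted.

from vllm import LLM, SamplingParams

# Load a quantized checkpoint; "awq" is a stock upstream method used here as a
# stand-in for whatever method name the TurboQuant integration registers.
llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",   # hypothetical pre-quantized checkpoint
    quantization="awq",                # placeholder quantization method
    gpu_memory_utilization=0.90,       # fraction of GPU memory vLLM may use
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain weight quantization in one sentence."], params)
print(outputs[0].outputs[0].text)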

AI Dev Skills

Unmapped

Quantization Techniques, Model Compression, LLM Inference Optimization, CUDA/GPU Programming, vLLM Framework Architecture, Transformer Model Optimization, Batch Processing, Memory Optimization, Quantization, Model Serving, vLLM Framework, Performance Tuning, Distributed Inference, Model Optimization, Large Language Model Inference, Performance Optimization, Memory Efficiency, TurboQuant Algorithm

Tags

Quantization Techniques, Model Compression, LLM Inference Optimization, CUDA/GPU Programming, vLLM Framework Architecture, Transformer Model Optimization, Batch Processing, Memory Optimization, Quantization, Model Serving, vLLM Framework, Performance Tuning, Distributed Inference, Model Optimization, Large Language Model Inference, Performance Optimization, Memory Efficiency, TurboQuant Algorithm


Recent Activity

Updated 17 days ago

7 Days: 0
30 Days: 0
90 Days: 0

Quality

Quality: medium
Maturity: beta

Categories

Foundation Models (Primary), Inference & Serving, Dev Tools & Automation

PM Skills

Developer Platform

Languages

Python 100.0%

Timeline

Project created: Mar 25, 2026
Forked: Mar 28, 2026
Your last push: 17 days ago
Upstream last push: 7 days ago
Tracked since: Mar 27, 2026
