Reporium

casper-hansen/AutoAWQ

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Builder

casper-hansen • individual

Stars

2,340

Using upstream star count

Forks

302

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Aug 25, 2023

Project creation date

README Summary

It is no secret that maintaining a project such as AutoAWQ that has 2+ million downloads, 7000+ models on Huggingface, and 2.1k stars is hard for a solo developer who is doing this in their free time.

AI Dev Skills

Unmapped

Hardware-Aware Optimization, Inference Acceleration, Large Language Model Deployment, Memory Optimization, Model Compression, Model Quantization, Neural Network Optimization, Transformer Architecture

Tags

Hardware-Aware Optimization, Inference Acceleration, Large Language Model Deployment, Memory Optimization, Model Compression, Model Quantization, Neural Network Optimization, Transformer Architecture, Benchmarking, DeepSeek, Evals, ExLlama, Forked, Gemma, GPU / CUDA, HuggingFace, Large Language Models, LLM Serving, LoRA / PEFT, Mistral, Music Tech, Phi, Python, PyTorch, Quantization, Qwen, Research / Papers, Roadmap, Transformers, vLLM

Taxonomy

AI Trends

Model Efficiency, On-device AI, Green AI, Democratized AI Access, Edge Computing

Category

Foundation Models, Model Training, Evals & Benchmarking, Inference & Serving, Learning Resources, Industry: Audio & Music

Deployment Context

Self-hosted, Cloud API, Edge/Mobile, On-premise

Modalities

Text

Skill Areas

Model Quantization, Neural Network Optimization, Large Language Model Deployment, Transformer Architecture, Memory Optimization, Inference Acceleration, Model Compression, Hardware-Aware Optimization

Tag

Benchmarking, DeepSeek, Evals, ExLlama, Forked, GPU / CUDA, Gemma, HuggingFace, LLM Serving, Large Language Models, LoRA / PEFT, Mistral, Music Tech, Phi, PyTorch, Python, Quantization, Qwen, Research / Papers, Roadmap, Transformers, vLLM

Use Cases

Large Language Model Deployment, Memory-Constrained Model Serving, Real-time Text Generation, Edge AI Deployment, Cost-Optimized Model Inference, High-Throughput Text Processing

Recent Activity

Updated 1 year ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality: high
Maturity: production

Categories

Foundation Models (Primary), Model Training, Evals & Benchmarking, Inference & Serving, Learning Resources, Industry: Audio & Music, Generative Media, Search & Knowledge

PM Skills

Cost & Efficiency, Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Aug 25, 2023
Forked: Mar 22, 2026
Your last push: 1 year ago
Upstream last push: 1 year ago
Tracked since: May 11, 2025

Similar Repos

pgvector cosine similarity · $0
