casper-hansen/AutoAWQ

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Builder

casper-hansen • individual

Stars

2,319

Using upstream star count

Forks

298

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Aug 25, 2023

Project creation date

README Summary

AutoAWQ is a Python library that implements the AWQ (Activation-aware Weight Quantization) algorithm for 4-bit quantization of large language models. The project cites a 2x speedup during inference while maintaining model accuracy. The library offers easy-to-use APIs for quantizing popular transformer models and running inference on them.
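The idea behind activation-aware 4-bit quantization can be illustrated with a minimal pure-Python sketch. This is not AutoAWQ's code or API: the function names (`quantize_group`, `dequantize_group`, `awq_style_quantize`), the group size, and the fixed exponent `alpha` are all hypothetical choices made for illustration. As far as the AWQ method is generally described, the per-channel scaling is searched to minimize output error; here it is simply fixed.

```python
def quantize_group(weights, n_bits=4):
    """Map one group of floats to integer codes in [0, 2**n_bits - 1]."""
    q_max = (1 << n_bits) - 1                     # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    step = (w_max - w_min) / q_max or 1.0         # fall back to 1.0 for a constant group
    codes = [min(q_max, max(0, round((w - w_min) / step))) for w in weights]
    return codes, step, w_min


def dequantize_group(codes, step, w_min):
    """Recover approximate floats from the integer codes."""
    return [c * step + w_min for c in codes]


def awq_style_quantize(weights, act_magnitude, group_size=4, alpha=0.5):
    """Illustrative only: scale each input channel by its activation
    magnitude (assumed positive) before group-wise quantization, then
    undo the scaling. Channels with large activations get a finer
    effective step."""
    scales = [a ** alpha for a in act_magnitude]  # assumption: fixed alpha, no search
    scaled = [w * s for w, s in zip(weights, scales)]
    recon = []
    for start in range(0, len(scaled), group_size):
        codes, step, w_min = quantize_group(scaled[start:start + group_size])
        recon.extend(dequantize_group(codes, step, w_min))
    return [r / s for r, s in zip(recon, scales)]


# Toy usage: the third input channel sees much larger activations.
weights = [0.1, -0.2, 0.3, 0.05, 0.7, -0.6, 0.2, 0.0]
act_magnitude = [0.5, 0.5, 4.0, 0.5, 0.5, 0.5, 0.5, 0.5]
approx = awq_style_quantize(weights, act_magnitude)
print([round(a, 3) for a in approx])
```

Each group stores 4-bit codes plus one step size and one minimum, which is where the memory saving over 16-bit weights comes from. The scaling step is what makes the scheme activation-aware rather than plain round-to-nearest.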

AI Dev Skills

Unmapped

Model Quantization · Neural Network Optimization · Large Language Model Deployment · Transformer Architecture · Memory Optimization · Inference Acceleration · Model Compression · Hardware-Aware Optimization

Tags

Model Quantization · Neural Network Optimization · Large Language Model Deployment · Transformer Architecture · Memory Optimization · Inference Acceleration · Model Compression · Hardware-Aware Optimization · Memory-constrained Model Deployment · Cost-efficient Model Serving · Self-hosted · Activation-aware Quantization · Small Language Models · Text · Edge/Mobile · LLM Inference Acceleration · 4-bit Precision Computing · Weight Compression · Edge Device LLM Deployment · Efficient AI · On-premise · Cloud API · On-device AI · PyTorch Model Optimization · Production Model Optimization · CUDA Programming · Inference Optimization · Python

Taxonomy

Recent Activity

Updated 11 months ago

7 Days

0

30 Days

0

90 Days

0

Quality

Quality
high
Maturity
production

Categories

Foundation Models (Primary) · Model Training · Inference & Serving · Edge & Mobile AI · Other AI / ML

PM Skills

Scale & Reliability

Languages

Python 100.0%

Timeline

Project created
Aug 25, 2023
Forked
Mar 22, 2026
Your last push
11 months ago
Upstream last push
11 months ago
Tracked since
May 11, 2025
