Reporium

pytorch/ao

ao

PyTorch native quantization and sparsity for training and inference

View on GitHub · Upstream: pytorch/ao

Builder

pytorch

pytorch • individual

Stars: 2,841 (upstream star count)
Forks: 510 (upstream fork count)
Open Issues: 0
Activity Score: 0/100 (0 commits in 30d)
Created: Nov 3, 2023 (project creation date)

README Summary

PyTorch-Native Training-to-Serving Model Optimization

- Pre-train Llama-3.1-70B **1.5x faster** with float8 training
- Recover **67% of quantized accuracy degradation** on Gemma3-4B with QAT
- Quantize Llama-3-8B to int4 for **1.89x faster** inference with **58% less memory**
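Low-bit weight quantization, such as the int4 result in the summary, generally stores each weight as a small integer code and keeps one floating-point scale per group of weights. The sketch below illustrates that general idea in plain Python: per-group symmetric int4 quantization and dequantization, plus a count of the storage bits per weight. It is not torchao's implementation or API. The function names are hypothetical, and the group size and 16-bit scale are assumptions chosen for illustration.

```python
# Generic per-group symmetric int4 weight quantization (illustrative sketch,
# NOT torchao's implementation). Group size and 16-bit scale are assumptions.
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit integer range


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Quantize one group to int4 codes that share a single float scale."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude onto 7 so both signs fit in [-8, 7].
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    codes = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int4 codes."""
    return [c * scale for c in codes]


def quantize_per_group(weights: List[float],
                       group_size: int) -> List[Tuple[List[int], float]]:
    """Split a flat weight list into groups and quantize each independently."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


def bits_per_weight(group_size: int, scale_bits: int = 16) -> float:
    """Storage per weight: a 4-bit code plus the amortized per-group scale."""
    return 4 + scale_bits / group_size


if __name__ == "__main__":
    w = [0.12, -0.50, 0.33, 0.07, -0.91, 0.45, 0.02, -0.18]
    groups = quantize_per_group(w, group_size=4)
    recon = [x for codes, s in groups for x in dequantize_group(codes, s)]
    print(groups[0][0])          # [2, -7, 5, 1]
    print(bits_per_weight(128))  # 4.125 (vs. 16 bits for a bf16 weight)
```

With per-group scales, the rounding error in each group is at most half of that group's scale. Smaller groups follow outliers more closely but carry more scale overhead, and `bits_per_weight` makes that trade-off explicit. The bit count covers weight storage only, so it cannot be compared directly with the end-to-end memory figure reported in the summary.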


AI Dev Skills

Unmapped

Custom CUDA Kernels, Hardware-aware Training, Low-bit Inference, Model Compression, Model Sparsity, Neural Network Quantization, Post-training Quantization, Pruning Algorithms, PyTorch Optimization, Quantization-aware Training

Tags

Custom CUDA Kernels, Hardware-aware Training, Low-bit Inference, Model Compression, Model Sparsity, Neural Network Quantization, Post-training Quantization, Pruning Algorithms, PyTorch Optimization, Quantization-aware Training, Axolotl, C++, Course, Embeddings, Fine-Tuning, Forked, FSDP, Gemma, GPU / CUDA, HuggingFace, Jupyter, Large Language Models, Llama, LLM Serving, LoRA / PEFT, Mobile, Model Optimization, Python, PyTorch, Quantization, Qwen, Research / Papers, SageMaker, SGLang, TorchTune, Transformers, Tutorial, Unsloth, vLLM

Taxonomy

AI Trends

On-device AI, Edge Computing, Efficient AI, Green AI

Category

Model Training, Foundation Models, RAG & Retrieval, Inference & Serving, Cloud & Platforms, Learning Resources, Data Science & Analytics

Deployment Context

Edge/Mobile, Cloud API, On-premise, Self-hosted

Modalities

Any modality supported by PyTorch

Skill Areas

Neural Network Quantization, Model Sparsity, PyTorch Optimization, Low-bit Inference, Custom CUDA Kernels, Model Compression, Hardware-aware Training, Quantization-aware Training, Post-training Quantization, Pruning Algorithms

Tag

Axolotl, C++, Course, Embeddings, FSDP, Fine-Tuning, Forked, GPU / CUDA, Gemma, HuggingFace, Jupyter, LLM Serving, Large Language Models, Llama, LoRA / PEFT, Mobile, Model Optimization, PyTorch, Python, Quantization, Qwen, Research / Papers, SGLang, SageMaker, TorchTune, Transformers, Tutorial, Unsloth, vLLM

Use Cases

Model Size Reduction, Inference Speed Optimization, Memory-efficient Deployment, Edge Device Optimization, Production Model Serving, Quantized Model Training

Recent Activity

Updated 2 months ago

7 days: 0 · 30 days: 0 · 90 days: 20

Delete deprecated CutlassSemiSparseLayout and related code (#4126)
Jerry Zhang • Mar 22, 2026 • 4cef023

Delete deprecated autoquant v1 and all references (#4122)
Jerry Zhang • Mar 22, 2026 • 5812c35

Delete autoquant_v2 and subgraph_utils (#4121)
Jerry Zhang • Mar 22, 2026 • 8f566e2

Quality

Quality: high
Maturity: beta

Categories

Model Training (primary), Foundation Models, RAG & Retrieval, Inference & Serving, Cloud & Platforms, Learning Resources, Data Science & Analytics, Edge & Mobile AI, Search & Knowledge

PM Skills

Cost & Efficiency, Product Discovery

Languages

Python 100.0%

Timeline

Project created: Nov 3, 2023
Forked: Mar 22, 2026
Your last push: 2 months ago
Upstream last push: 16 days ago
Tracked since: Mar 22, 2026
