Reporium

IST-DASLab/gptq

gptq

Code for the ICLR 2023 paper "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".

Builder

IST-DASLab • individual

Stars

2,312

Using upstream star count

Forks

198

Using upstream fork count

Open Issues

0

Activity Score

0/100

0 commits in 30d

Created

Oct 19, 2022

Project creation date

README Summary

This repository contains the code for the ICLR 2023 paper [GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers](https://arxiv.org/abs/2210.17323). The current release includes the following features:

AI Dev Skills

Unmapped

GPU Acceleration, Large Language Model Optimization, Low-precision Arithmetic, Memory Optimization, Model Compression, Neural Network Pruning, Post-training Quantization, Transformer Architecture

Tags

GPU Acceleration, Large Language Model Optimization, Low-precision Arithmetic, Memory Optimization, Model Compression, Neural Network Pruning, Post-training Quantization, Transformer Architecture, Benchmarking, C++, Evals, Forked, GPU / CUDA, HuggingFace, Python, PyTorch, Quantization, Research / Papers, Transformers

Taxonomy

AI Trends

Small Language Models, On-device AI, Efficient AI, Model Compression

Category

Foundation Models, Model Training, Evals & Benchmarking, Inference & Serving, Learning Resources

Deployment Context

Edge/Mobile, Self-hosted, On-premise

Modalities

Text

Skill Areas

Post-training Quantization, Model Compression, Large Language Model Optimization, Transformer Architecture, Neural Network Pruning, Low-precision Arithmetic, GPU Acceleration, Memory Optimization

Tag

Benchmarking, C++, Evals, Forked, GPU / CUDA, HuggingFace, PyTorch, Python, Quantization, Research / Papers, Transformers

Use Cases

Large Language Model Deployment, Memory-constrained Inference, Mobile AI Applications, Cost-effective Model Serving, Edge Computing AI, Research in Model Compression

Recent Activity

Updated 2 years ago

7 Days: 0
30 Days: 0
90 Days: 0

Quality

Quality: high
Maturity: research

Categories

Foundation Models (Primary), Model Training, Evals & Benchmarking, Inference & Serving, Learning Resources, Search & Knowledge

PM Skills

Cost & Efficiency, Data & Evaluation

Languages

Python 100.0%

Timeline

Project created: Oct 19, 2022
Forked: Mar 22, 2026
Your last push: 2 years ago
Upstream last push: 2 years ago
Tracked since: Mar 27, 2024
