Top 23 Python quantization Projects
-
Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
-
xTuring
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
-
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
-
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
-
aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
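Most of the toolkits in this list implement variants of the same core operation: mapping float tensors onto a small integer grid. A minimal pure-Python sketch of uniform affine (asymmetric) int8 quantization, with hypothetical function names (not AIMET's API), looks like this:

```python
def affine_qparams(xmin, xmax, qmin=-128, qmax=127):
    """Compute scale and zero-point so the integer grid covers [xmin, xmax]."""
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)  # grid must represent 0.0 exactly
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

# Quantize one value from a tensor with observed range [-1.0, 2.0]
scale, zp = affine_qparams(-1.0, 2.0)
q = quantize(0.5, scale, zp)
x_hat = dequantize(q, scale, zp)  # recovers 0.5 within one quantization step
```

Real libraries add range calibration, per-channel parameters, and fused integer kernels on top, but the round-trip above is the primitive they all share.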
-
model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
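Pruning, the other technique this toolkit covers, is conceptually simple: zero out the weights with the smallest magnitude. A toy sketch of magnitude pruning (hypothetical names, not the tfmot API):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune get pruned
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(w, 0.5)  # 50% sparsity: the 3 smallest magnitudes go to zero
# → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

In practice the toolkit applies this gradually during training and uses structured sparsity patterns so hardware can actually exploit the zeros.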
-
intel-extension-for-pytorch
A Python package that extends official PyTorch to unlock extra performance on Intel platforms
Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26
I've been using this:
https://github.com/bugbakery/transcribee
It's noticeably work-in-progress, but it does the job and has a nice UI for editing transcriptions, speakers, etc.
It's running on the CPU for me; it would be nice to have something that can make use of a 4 GB Nvidia GPU, which faster-whisper is actually able to do [1].
https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23
Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse
Project mention: I'm developing an open-source AI tool called xTuring, enabling anyone to construct a Language Model with just 5 lines of code. I'd love to hear your thoughts! | /r/machinelearningnews | 2023-09-07
Explore the project on GitHub here.
Waiting for Mixed Quantization with HQQ and MoE Offloading [1]. With that I was able to run Mixtral 8x7B on my 10 GB VRAM RTX 3080... This should work for DBRX and should shave off a ton of the VRAM requirement.
1. https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-...
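The VRAM arithmetic behind that comment is easy to reproduce: Mixtral 8x7B has roughly 47B total parameters, so weight storage alone (a back-of-envelope estimate that ignores activations, KV cache, and per-group quantization metadata) scales like:

```python
def model_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 2**30 bytes)."""
    return n_params * bits_per_weight / 8 / 2**30

n = 46.7e9  # Mixtral 8x7B total parameter count (approx.)
for bits in (16, 4, 2):
    print(f"{bits:2d}-bit: {model_size_gb(n, bits):6.1f} GB")
```

At 16-bit that is ~87 GB of weights; 4-bit brings it under 22 GB, and only expert offloading plus aggressive quantization gets the working set near a 10 GB card.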
Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02
Shout out to Hugging Face's Optimum, which made it easier to quantize models.
I have been playing around in PyTorch with an A770 16 GB card and hit this error. The response (https://github.com/intel/intel-extension-for-pytorch/issues/...) seems to be that allocations larger than 4 GB aren't supported, even though the card has 16 GB. I haven't seen much about Intel Arc for machine learning, so I wanted to share my experience.
Project mention: Hi, What could be the best HLS tool for implementing neural networks on FPGA | /r/FPGA | 2023-06-13
FINN - https://github.com/Xilinx/finn
Using the currently popular GPTQ, 3-bit quantization hurts performance much more than 4-bit, but there are also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which manage 3-bit without as much of a performance drop. I hope to see them used more commonly.
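The 3-bit vs. 4-bit gap is intuitive from the numbers alone: each extra bit doubles the grid, roughly halving the round-trip error. A quick illustration on a symmetric uniform grid (a toy model of round-to-nearest, not GPTQ/AWQ/SqueezeLLM themselves, which use smarter error-compensating schemes):

```python
import random

def mean_roundtrip_error(values, bits):
    """Mean |x - dequant(quant(x))| on a symmetric uniform grid with 2**bits levels."""
    qmax = 2 ** (bits - 1) - 1          # 3 for 3-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    err = 0.0
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))
        err += abs(v - q * scale)
    return err / len(values)

random.seed(0)
w = [random.uniform(-1, 1) for _ in range(10_000)]  # stand-in for a weight tensor
e3 = mean_roundtrip_error(w, 3)
e4 = mean_roundtrip_error(w, 4)
assert e4 < e3  # one extra bit roughly halves the grid spacing and the error
print(f"3-bit: {e3:.4f}  4-bit: {e4:.4f}")
```

The relative error jump from 4-bit to 3-bit is much larger than from 8-bit to 4-bit, which is why naive round-to-nearest breaks down at 3 bits and the error-aware methods above become necessary.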
Project mention: Half-Quadratic Quantization of Large Machine Learning Models | news.ycombinator.com | 2024-03-14
You can find LLM models in the onnx format here: https://github.com/tpoisonooo/llama.onnx
Python quantization related posts
-
Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow
-
Apple Explores Home Robotics as Potential 'Next Big Thing'
-
Half-Quadratic Quantization of Large Machine Learning Models
-
New Mixtral HQQ Quantized 4-bit/2-bit configuration
-
[D] Which framework do you use for applying post-training quantization on image classification models?
-
Now I Can Just Print That Video
Index
What are some of the best open-source quantization projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | LLaMA-Factory | 22,989 |
2 | Chinese-LLaMA-Alpaca | 17,653 |
3 | faster-whisper | 9,424 |
4 | AutoGPTQ | 3,906 |
5 | Pretrained-Language-Model | 2,971 |
6 | deepsparse | 2,902 |
7 | xTuring | 2,532 |
8 | mixtral-offloading | 2,261 |
9 | optimum | 2,225 |
10 | neural-compressor | 2,016 |
11 | aimet | 1,956 |
12 | model-optimization | 1,473 |
13 | mmrazor | 1,387 |
14 | intel-extension-for-pytorch | 1,403 |
15 | nncf | 832 |
16 | finn | 673 |
17 | quanto | 611 |
18 | SqueezeLLM | 580 |
19 | fastT5 | 540 |
20 | qkeras | 527 |
21 | hqq | 515 |
22 | llama.onnx | 327 |
23 | Sparsebit | 321 |