Top 23 Python quantization Projects
-
Pretrained-Language-Model
Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
-
xTuring
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
-
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
-
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
-
aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
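Most of the toolkits in this list implement variants of the same core operation: mapping float tensors onto a small integer grid. A minimal pure-Python sketch of uniform affine (asymmetric) int8 quantization, with hypothetical function names (not AIMET's API), looks like this:

```python
def affine_qparams(xmin, xmax, qmin=-128, qmax=127):
    """Compute scale and zero-point so the integer grid covers [xmin, xmax]."""
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)  # grid must represent 0.0 exactly
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

# Quantize one value from a tensor with observed range [-1.0, 2.0]
scale, zp = affine_qparams(-1.0, 2.0)
q = quantize(0.5, scale, zp)
x_hat = dequantize(q, scale, zp)  # recovers 0.5 within one quantization step
```

Real libraries add range calibration, per-channel parameters, and fused integer kernels on top, but the round-trip above is the primitive they all share.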
-
model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
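Pruning, the other technique this toolkit covers, is conceptually simple: zero out the weights with the smallest magnitude. A toy sketch of magnitude pruning (hypothetical names, not the tfmot API):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune get pruned
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    keep = set(order[n_prune:])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
pruned = magnitude_prune(w, 0.5)  # 50% sparsity: the 3 smallest magnitudes go to zero
# → [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

In practice the toolkit applies this gradually during training and uses structured sparsity patterns so hardware can actually exploit the zeros.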
-
intel-extension-for-pytorch
A Python package that extends official PyTorch to unlock extra performance on Intel platforms
Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26
I've been using this:
https://github.com/bugbakery/transcribee
It's noticeably work-in-progress, but it does the job and has a nice UI for editing transcriptions, speakers, etc.
It's running on the CPU for me; it would be nice to have something that can make use of a 4 GB Nvidia GPU, which faster-whisper is actually able to do [1].
https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23
Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse
Project mention: I'm developing an open-source AI tool called xTuring, enabling anyone to construct a Language Model with just 5 lines of code. I'd love to hear your thoughts! | /r/machinelearningnews | 2023-09-07
Explore the project on GitHub here.
Waiting for Mixed Quantization with HQQ and MoE Offloading [1]. With that I was able to run Mixtral 8x7B on my 10 GB VRAM RTX 3080... This should work for DBRX and should shave off a ton of the VRAM requirement.
1. https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-...
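The VRAM arithmetic behind that comment is easy to reproduce: Mixtral 8x7B has roughly 47B total parameters, so weight storage alone (a back-of-envelope estimate that ignores activations, KV cache, and per-group quantization metadata) scales like:

```python
def model_size_gb(n_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 2**30 bytes)."""
    return n_params * bits_per_weight / 8 / 2**30

n = 46.7e9  # Mixtral 8x7B total parameter count (approx.)
for bits in (16, 4, 2):
    print(f"{bits:2d}-bit: {model_size_gb(n, bits):6.1f} GB")
```

At 16-bit that is ~87 GB of weights; 4-bit brings it under 22 GB, and only expert offloading plus aggressive quantization gets the working set near a 10 GB card.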
Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02
Shout out to Hugging Face's Optimum, which made it easier to quantize models.
I have been playing around in PyTorch with an A770 16 GB card and hit this error. The response (https://github.com/intel/intel-extension-for-pytorch/issues/...) seems to be that allocations larger than 4 GB aren't supported, even though the card has 16 GB. I haven't seen much about Intel Arc for machine learning, so I wanted to share my experience.
Project mention: Hi, What could be the best HLS tool for implementing neural networks on FPGA | /r/FPGA | 2023-06-13
FINN - https://github.com/Xilinx/finn
Using the currently popular GPTQ, 3-bit quantization hurts performance much more than 4-bit, but there are also AWQ (https://github.com/mit-han-lab/llm-awq) and SqueezeLLM (https://github.com/SqueezeAILab/SqueezeLLM), which manage 3-bit without as much of a performance drop. I hope to see them used more commonly.
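The 3-bit vs. 4-bit gap is intuitive from the numbers alone: each extra bit doubles the grid, roughly halving the round-trip error. A quick illustration on a symmetric uniform grid (a toy model of round-to-nearest, not GPTQ/AWQ/SqueezeLLM themselves, which use smarter error-compensating schemes):

```python
import random

def mean_roundtrip_error(values, bits):
    """Mean |x - dequant(quant(x))| on a symmetric uniform grid with 2**bits levels."""
    qmax = 2 ** (bits - 1) - 1          # 3 for 3-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    err = 0.0
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))
        err += abs(v - q * scale)
    return err / len(values)

random.seed(0)
w = [random.uniform(-1, 1) for _ in range(10_000)]  # stand-in for a weight tensor
e3 = mean_roundtrip_error(w, 3)
e4 = mean_roundtrip_error(w, 4)
assert e4 < e3  # one extra bit roughly halves the grid spacing and the error
print(f"3-bit: {e3:.4f}  4-bit: {e4:.4f}")
```

The relative error jump from 4-bit to 3-bit is much larger than from 8-bit to 4-bit, which is why naive round-to-nearest breaks down at 3 bits and the error-aware methods above become necessary.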
Project mention: Half-Quadratic Quantization of Large Machine Learning Models | news.ycombinator.com | 2024-03-14
You can find LLM models in the onnx format here: https://github.com/tpoisonooo/llama.onnx
Python quantization related posts
-
Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow
-
Apple Explores Home Robotics as Potential 'Next Big Thing'
-
Half-Quadratic Quantization of Large Machine Learning Models
-
New Mixtral HQQ Quantized 4-bit/2-bit configuration
-
[D] Which framework do you use for applying post-training quantization on image classification models?
-
Now I Can Just Print That Video
Index
What are some of the best open-source quantization projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | LLaMA-Factory | 22,989 |
2 | Chinese-LLaMA-Alpaca | 17,653 |
3 | faster-whisper | 9,424 |
4 | AutoGPTQ | 3,906 |
5 | Pretrained-Language-Model | 2,971 |
6 | deepsparse | 2,902 |
7 | xTuring | 2,532 |
8 | mixtral-offloading | 2,261 |
9 | optimum | 2,225 |
10 | neural-compressor | 2,016 |
11 | aimet | 1,956 |
12 | model-optimization | 1,473 |
13 | mmrazor | 1,387 |
14 | intel-extension-for-pytorch | 1,403 |
15 | nncf | 832 |
16 | finn | 673 |
17 | quanto | 611 |
18 | SqueezeLLM | 580 |
19 | fastT5 | 540 |
20 | qkeras | 527 |
21 | hqq | 515 |
22 | llama.onnx | 327 |
23 | Sparsebit | 321 |