GPTQ-for-LLaMa Alternatives

Similar projects and alternatives to GPTQ-for-LLaMa

text-generation-webui

877 37,401 9.9 Python GPTQ-for-LLaMa VS text-generation-webui

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
llama.cpp

788 58,856 10.0 C++ GPTQ-for-LLaMa VS llama.cpp

LLM inference in C/C++
Open-Assistant

329 36,738 8.3 Python GPTQ-for-LLaMa VS Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
whisper.cpp

187 31,969 9.8 C GPTQ-for-LLaMa VS whisper.cpp

Port of OpenAI's Whisper model in C/C++
llama

184 53,605 8.0 Python GPTQ-for-LLaMa VS llama

Inference code for Llama models
transformers

181 126,915 10.0 Python GPTQ-for-LLaMa VS transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
alpaca-lora

107 18,280 3.6 Jupyter Notebook GPTQ-for-LLaMa VS alpaca-lora

Instruct-tune LLaMA on consumer hardware
petals

99 8,763 8.3 Python GPTQ-for-LLaMa VS petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
alpaca.cpp

94 9,878 9.4 C GPTQ-for-LLaMa VS alpaca.cpp

Discontinued. Locally run an Instruction-Tuned Chat-Style LLM
SHARK

84 1,396 9.3 Python GPTQ-for-LLaMa VS SHARK

SHARK - High Performance Machine Learning Distribution
bitsandbytes

61 5,581 9.4 Python GPTQ-for-LLaMa VS bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
exllama

64 2,631 9.0 Python GPTQ-for-LLaMa VS exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
tinygrad

58 17,800 9.7 Python GPTQ-for-LLaMa VS tinygrad

Discontinued. You like pytorch? You like micrograd? You love tinygrad! ❤️ [Moved to: https://github.com/tinygrad/tinygrad] (by geohot)
llm

41 5,980 9.4 Rust GPTQ-for-LLaMa VS llm

An ecosystem of Rust libraries for working with large language models
GPTQ-for-LLaMa

19 130 7.7 Python GPTQ-for-LLaMa VS GPTQ-for-LLaMa

4 bits quantization of LLaMa using GPTQ (by oobabooga)
AutoGPTQ

19 3,906 9.3 Python GPTQ-for-LLaMa VS AutoGPTQ

An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
sd-webui-modelscope-text2video

17 479 10.0 Python GPTQ-for-LLaMa VS sd-webui-modelscope-text2video

Discontinued. Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies [Moved to: https://github.com/deforum-art/sd-webui-text2video]
erasing

11 479 5.0 Python GPTQ-for-LLaMa VS erasing

Erasing Concepts from Diffusion Models
qlora

80 9,562 7.4 Jupyter Notebook GPTQ-for-LLaMa VS qlora

QLoRA: Efficient Finetuning of Quantized LLMs
private-gpt

131 52,412 9.2 Python GPTQ-for-LLaMa VS private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a better GPTQ-for-LLaMa alternative or higher similarity.

GPTQ-for-LLaMa reviews and mentions

Posts with mentions or reviews of GPTQ-for-LLaMa. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-07-10.

[P] Early in 2023 I put in a lot of work on a new machine learning project. Now I'm not sure what to do with it.
1 project | /r/MachineLearning | 3 Dec 2023

First I want to make it clear this is not a self promotion post. I hope many machine learning people come at me with questions or comments about this project. A little background about myself. I did work on the 4 bits quantization of LLaMA using GPTQ. (https://github.com/qwopqwop200/GPTQ-for-LLaMa). I've been studying AI in-depth for many years now.
GPT-4 Details Leaked
3 projects | news.ycombinator.com | 10 Jul 2023

Deploying the 60B version is a challenge though and you might need to apply 4-bit quantization with something like https://github.com/PanQiWei/AutoGPTQ or https://github.com/qwopqwop200/GPTQ-for-LLaMa . Then you can improve the inference speed by using https://github.com/turboderp/exllama .
If you prefer to use an "instruct" model à la ChatGPT (i.e. that does not need few-shot learning to output good results) you can use something like this: https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored...
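
To give a rough sense of what "4-bit quantization" means in the post above, here is a minimal sketch of plain round-to-nearest quantization in Python. This is not the GPTQ algorithm, which additionally corrects quantization error using second-order information; it only illustrates how floating-point weights can be mapped to 16 integer levels and back. The function names `quantize_4bit` and `dequantize_4bit` are hypothetical and do not come from AutoGPTQ or GPTQ-for-LLaMa.

```python
def quantize_4bit(weights):
    """Map floats to integer codes 0..15 (round-to-nearest, NOT GPTQ).

    Returns the codes plus the scale and offset needed to restore them.
    """
    lo, hi = min(weights), max(weights)
    # 4 bits give 16 levels, so the range is split into 15 steps.
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_4bit(codes, scale, lo):
    """Approximate the original floats from the 4-bit codes."""
    return [c * scale + lo for c in codes]


# Illustrative weights only; real layers hold millions of values.
example = [-0.6, -0.1, 0.0, 0.25, 0.9]
codes, scale, lo = quantize_4bit(example)
restored = dequantize_4bit(codes, scale, lo)
print(codes)
print([round(r, 3) for r in restored])
```

Each restored value differs from the original by at most half a quantization step; reducing the impact of that error on model output is the part GPTQ-style methods address.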
Rambling
1 project | /r/PygmalionAI | 30 Jun 2023

I use gptq-for-llama - from this https://github.com/qwopqwop200/GPTQ-for-LLaMa and Pygmalion 7B.
Now that ExLlama is out with reduced VRAM usage, are there any GPTQ models bigger than 7b which can fit onto an 8GB card?
2 projects | /r/LocalLLaMA | 29 Jun 2023

exllama is an optimized implementation of GPTQ-for-LLaMa, allowing you to run 4-bit quantized language models with GPU at great speeds.
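
As a back-of-the-envelope check on the 8GB question in this thread, the sketch below estimates the memory taken by the weights alone at 4 bits each. It assumes nominal parameter counts (7e9, 13e9, 30e9) and ignores group-wise scales, the KV cache, activations, and framework overhead, so real VRAM usage will be higher. The helper name `weight_memory_gib` is hypothetical.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(n_params, bits_per_weight=4.0):
    """Memory for the weights alone, in GiB (no KV cache or overhead)."""
    return n_params * bits_per_weight / 8 / GIB


# Nominal parameter counts; actual model sizes differ slightly.
for name, n_params in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9)]:
    gib = weight_memory_gib(n_params)
    note = "weights under 8 GiB" if gib < 8 else "weights exceed 8 GiB"
    print(f"{name}: about {gib:.2f} GiB at 4 bits ({note})")
```

Under these assumptions a 13B model's weights come in under 8 GiB, but only with limited headroom for context, which is why reduced runtime overhead matters on small cards.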
GGML – AI at the Edge
11 projects | news.ycombinator.com | 6 Jun 2023

With a single NVIDIA 3090 and the fastest inference branch of GPTQ-for-LLAMA https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/fastest-i..., I get a healthy 10-15 tokens per second on the 30B models. IMO GGML is great (And I totally use it) but it's still not as fast as running the models on GPU for now.
New quantization method AWQ outperforms GPTQ in 4-bit and 3-bit with 1.45x speedup and works with multimodal LLMs
4 projects | /r/LocalLLaMA | 2 Jun 2023

And exactly what Triton version are they comparing against? I just tried the latest version of this, and on my 4090/12900K I get 77 tokens per second for Llama 7B-128g. My own GPTQ CUDA implementation gets 151 tokens/second on the same model, same hardware. That makes it 96% faster, whereas AWQ is only 79% faster. For 30B-128g I'm currently only getting a 110% speedup over Triton compared to their 178%, but it still seems a little disingenuous to compare against their own CUDA implementation only, when they're trying to present the quantization method as being faster for inference.
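
The "96% faster" figure in the comment above follows directly from the two throughput numbers it quotes (77 and 151 tokens per second). A short sketch of that arithmetic, with the hypothetical helper name `percent_faster`:

```python
def percent_faster(baseline_tps, candidate_tps):
    """How much faster the candidate is than the baseline, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate_tps / baseline_tps - 1.0) * 100.0


triton_tps = 77.0   # Triton figure quoted in the post
cuda_tps = 151.0    # the poster's own CUDA implementation
print(f"{percent_faster(triton_tps, cuda_tps):.0f}% faster")  # prints "96% faster"
```

The same formula applies to the other percentages in the post, provided both throughputs are measured on the same model and hardware.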
Introducing Basaran: self-hosted open-source alternative to the OpenAI text completion API
9 projects | /r/LocalLLaMA | 1 Jun 2023

Thanks for the explanation. I think some repos, like text-generation-webui, used GPTQ-for-LLaMa (I don't know if it's this repo or another one). Anyway, most repos that I saw rely on external projects (like GPTQ-for-LLaMa).
How to use AMD GPU?
4 projects | /r/LocalLLaMA | 1 Jun 2023

cd ../..
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b triton
cd GPTQ-for-LLaMa
pip install -r requirements.txt
mkdir -p ../text-generation-webui/repositories
ln -s ../../GPTQ-for-LLaMa ../text-generation-webui/repositories/GPTQ-for-LLaMa
Help needed with installing quant_cuda for the WebUI
2 projects | /r/LocalLLaMA | 31 May 2023

cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
pip install -r requirements.txt
The installed version of bitsandbytes was compiled without GPU support
2 projects | /r/Oobabooga | 29 May 2023

# To use the GPTQ models I need to Install GPTQ-for-LLaMa and the monkey patch
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa.git -b triton
cd GPTQ-for-LLaMa
pip install ninja
pip install -r requirements.txt
cd
cd text-generation-webui
# download random model
python download-model.py xxx/yyy
# try to start the gui
python server.py
# It returns this warning but it runs
bin /home/gm/miniconda3/envs/chat/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/gm/miniconda3/envs/chat/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/gm/miniconda3/envs/chat/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32