Ask HN: Cheapest hardware to run Llama 2 70B

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

  • petals

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

  • If you have a lot of money (but not H100/A100 money), get 4090s: according to George Hotz, they're currently the best bang for your buck on the CUDA side. If you're broke, get multiple second-hand 3090s (see Tim Dettmers' GPU guide: https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni....). If you're unwilling to spend any money at all and just want to play around with Llama 70B, look into petals: https://github.com/bigscience-workshop/petals
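
The hardware advice above can be sanity-checked with back-of-envelope arithmetic. This is an illustrative sketch: the 4.5 bits/weight figure for 4-bit group quantization is a common rule of thumb (scales and zero-points add a little over 4 bits), not the exact size of any particular format, and KV cache and runtime overhead are ignored.

```python
# Rough VRAM needed for the Llama 2 70B weights alone at common
# precisions, and how many 24 GB cards (3090/4090) that implies.
# Illustrative only: ignores KV cache and runtime overhead.

PARAMS = 70e9  # Llama 2 70B parameter count

def weight_gb(bits_per_param: float) -> float:
    """GB of memory for the weights alone at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("fp16", 16), ("int8", 8), ("4-bit (~4.5 b/w)", 4.5)]:
    gb = weight_gb(bits)
    cards = -(-gb // 24)  # ceiling division: 24 GB cards needed
    print(f"{name:>16}: {gb:6.1f} GB  (~{int(cards)}x 24 GB cards)")
```

At roughly 4-bit quantization the 70B weights come to about 39 GB, which fits across two 24 GB cards; that is why a pair of second-hand 3090s is the usual budget recommendation.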

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • The only info I can provide is the table on https://github.com/jmorganca/ollama, which states that you need "32 GB to run the 13B models." I would assume you may also need a GPU for this.

    Relatedly, could someone please point me in the right direction on how to run Wizard Vicuna Uncensored or Llama 2 13B locally on Linux? I've been searching for a guide and haven't found anything suitable for a beginner like myself. In the GitHub repo I referenced, the download is Mac-only at the moment. I have an M1 MacBook Pro I can use, though it's running Debian.

    Thank you.
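
For context on the "32 GB" figure quoted above, a quick calculation (illustrative arithmetic, not an official ollama sizing rule) shows roughly where it comes from:

```python
# Unquantized 13B memory estimate: at fp16, weights cost 2 bytes per
# parameter, before activations, KV cache, and OS overhead are added.
# Illustrative only, not an official sizing formula.

params_13b = 13e9

fp16_gb = params_13b * 2 / 1e9  # 2 bytes per weight at fp16
int8_gb = params_13b * 1 / 1e9  # 1 byte per weight at 8-bit

print(f"13B fp16 weights: {fp16_gb:.0f} GB")  # 26 GB of weights alone
print(f"13B int8 weights: {int8_gb:.0f} GB")
```

26 GB of fp16 weights plus working memory lands close to the quoted 32 GB; quantized variants need far less, which is why quantization comes up repeatedly in this thread.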

  • llama.cpp

    LLM inference in C/C++

  • Was it from here: https://github.com/ggerganov/llama.cpp ?

    Do you have a guide that you followed that you could link me to, or was it just from prior knowledge?

  • llama2.rs

    A fast llama2 decoder in pure Rust.

  • This code runs Llama 2 quantized and unquantized in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the 13B quantized model in ~10-11 GB of CPU memory.
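
The ~10-11 GB observation lines up with rough quantization arithmetic (a sketch; the 4.5 bits/weight here is a common rule of thumb for 4-bit group quantization once scales and zero-points are counted, not the exact size of any particular format):

```python
# Sanity check of the ~10-11 GB figure for a quantized 13B model.
# Illustrative only: bits-per-weight is an approximation.

params = 13e9  # 13B parameter count

def quantized_gb(bits_per_weight: float) -> float:
    """GB for the weights alone at a given effective bit width."""
    return params * bits_per_weight / 8 / 1e9

weights = quantized_gb(4.5)  # roughly 7.3 GB of weights
print(f"13B @ ~4.5 bits/weight: {weights:.1f} GB")
# The remaining few GB of the observed 10-11 GB go to the KV cache,
# runtime buffers, and general process overhead.
```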


