Ask HN: Cheapest hardware to run Llama 2 70B

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • exllama

    A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

  • petals

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

  • If you have a lot of money (but not H100/A100 money), get 4090s: according to George Hotz, they're currently the best bang for your buck on the CUDA side. If you're broke, get multiple second-hand 3090s (see Tim Dettmers' GPU guide: https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni....). If you're unwilling to spend any money at all and just want to play around with Llama 70B, look into petals: https://github.com/bigscience-workshop/petals
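
The hardware advice above can be sanity-checked with back-of-envelope arithmetic. This is an illustrative sketch: the 4.5 bits/weight figure for 4-bit group quantization is a common rule of thumb (scales and zero-points add a little over 4 bits), not the exact size of any particular format, and KV cache and runtime overhead are ignored.

```python
# Rough VRAM needed for the Llama 2 70B weights alone at common
# precisions, and how many 24 GB cards (3090/4090) that implies.
# Illustrative only: ignores KV cache and runtime overhead.

PARAMS = 70e9  # Llama 2 70B parameter count

def weight_gb(bits_per_param: float) -> float:
    """GB of memory for the weights alone at a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("fp16", 16), ("int8", 8), ("4-bit (~4.5 b/w)", 4.5)]:
    gb = weight_gb(bits)
    cards = -(-gb // 24)  # ceiling division: 24 GB cards needed
    print(f"{name:>16}: {gb:6.1f} GB  (~{int(cards)}x 24 GB cards)")
```

At roughly 4-bit quantization the 70B weights come to about 39 GB, which fits across two 24 GB cards; that is why a pair of second-hand 3090s is the usual budget recommendation.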

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • The only info I can provide is the table on https://github.com/jmorganca/ollama, which states that you need "32 GB to run the 13B models." I would assume you may also need a GPU for this.

    Relatedly, could someone please point me in the right direction on how to run Wizard Vicuna Uncensored or Llama 2 13B locally on Linux? I've been searching for a guide and haven't found anything suitable for a beginner like myself. In the GitHub repo I referenced, the download is Mac-only at the moment. I have an M1 MacBook Pro I can use, though it's running Debian.

    Thank you.
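
For context on the "32 GB" figure quoted above, a quick calculation (illustrative arithmetic, not an official ollama sizing rule) shows roughly where it comes from:

```python
# Unquantized 13B memory estimate: at fp16, weights cost 2 bytes per
# parameter, before activations, KV cache, and OS overhead are added.
# Illustrative only, not an official sizing formula.

params_13b = 13e9

fp16_gb = params_13b * 2 / 1e9  # 2 bytes per weight at fp16
int8_gb = params_13b * 1 / 1e9  # 1 byte per weight at 8-bit

print(f"13B fp16 weights: {fp16_gb:.0f} GB")  # 26 GB of weights alone
print(f"13B int8 weights: {int8_gb:.0f} GB")
```

26 GB of fp16 weights plus working memory lands close to the quoted 32 GB; quantized variants need far less, which is why quantization comes up repeatedly in this thread.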

  • llama.cpp

    LLM inference in C/C++

  • Was it from here: https://github.com/ggerganov/llama.cpp ?

    Do you have a guide that you followed that you could link me to, or was it just from prior knowledge?

  • llama2.rs

    A fast llama2 decoder in pure Rust.

  • This code runs Llama 2 quantized and unquantized in a roughly minimal way: https://github.com/srush/llama2.rs (though extracting the quantized 70B weights takes a lot of RAM). I'm running the 13B quantized model in ~10-11 GB of CPU memory.
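
The ~10-11 GB observation lines up with rough quantization arithmetic (a sketch; the 4.5 bits/weight here is a common rule of thumb for 4-bit group quantization once scales and zero-points are counted, not the exact size of any particular format):

```python
# Sanity check of the ~10-11 GB figure for a quantized 13B model.
# Illustrative only: bits-per-weight is an approximation.

params = 13e9  # 13B parameter count

def quantized_gb(bits_per_weight: float) -> float:
    """GB for the weights alone at a given effective bit width."""
    return params * bits_per_weight / 8 / 1e9

weights = quantized_gb(4.5)  # roughly 7.3 GB of weights
print(f"13B @ ~4.5 bits/weight: {weights:.1f} GB")
# The remaining few GB of the observed 10-11 GB go to the KV cache,
# runtime buffers, and general process overhead.
```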


