LLaMA: A foundational, 65B-parameter large language model

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama

    Inference code for Llama models

  • > To maintain integrity and prevent misuse, we are releasing our model under a noncommercial license focused on research use cases. Access to the model will be granted on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world. People interested in applying for access can find the link to the application in our research paper.

    The closest you are going to get to the source is here: https://github.com/facebookresearch/llama

It is still unclear whether you will even get access to the entire model. Even if you do, you can't use it in a commercial product anyway.

  • xformers

    Hackable and optimized Transformers building blocks, supporting a composable construction.
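
    To make "optimized building blocks" concrete, here is a minimal sketch (my own illustration, not from the post) of using xformers' memory-efficient attention in place of a naive attention computation. It assumes a CUDA GPU with xformers installed, and uses xformers' (batch, seq, heads, head_dim) tensor layout:

        import torch
        import xformers.ops as xops

        # Q/K/V in xformers' expected layout: (batch, seq_len, heads, head_dim).
        q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
        k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
        v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

        # Fused kernel that never materializes the full (seq x seq) attention
        # matrix, cutting activation memory versus naive softmax(Q @ K^T) @ V.
        out = xops.memory_efficient_attention(q, k, v)

    Kernels like this are a large part of why the commenter below says building on xformers will save you a lot of compute spend.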

  • As a simplifying assumption, I'm going to assume you already know how to stand up and manage a distributed training cluster. Note that this is an aggressive assumption.

    You would need to replicate the preprocessing steps, which will be tricky because they are not described in detail. Then you would need to implement the model using xformers [1], which will save you a lot of compute spend. You will also need to manually implement the backward pass to reduce recomputation of expensive activations.

    The model was trained on 2048 A100 GPUs with 80 GB of VRAM each. A single 8x A100 machine from Lambda Cloud costs $12.00/hr [2]. The team from Meta used 256 such machines, for a per-day cost of $73,728, and training takes 21 days. The upfront lower-bound cost estimate is ($12.00 * 24 hrs * 21 days * 256 machines) = $1,548,288, assuming everything goes smoothly and your model doesn't bite it during training (the script below double-checks this arithmetic). You may be able to negotiate bulk pricing for these types of workloads.

    That dollar value covers the compute resources alone. Given the compute costs required, you will probably also want a team of ML Ops engineers to monitor the training cluster and research scientists to help you with the preprocessing and model pipelines.

    [1] https://github.com/facebookresearch/xformers
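
    The cost arithmetic above is easy to double-check in plain Python; the figures are the commenter's, only the script is mine:

        # Back-of-the-envelope LLaMA training cost, using the numbers quoted above.
        hourly_rate = 12.00   # $ per 8x A100 machine on Lambda Cloud [2]
        machines = 256        # 256 machines x 8 GPUs = 2048 A100s
        days = 21             # quoted training duration

        per_day = hourly_rate * 24 * machines
        total = per_day * days
        print(f"per day: ${per_day:,.0f}")  # per day: $73,728
        print(f"total:   ${total:,.0f}")    # total:   $1,548,288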

  • gpt_index

    Discontinued: LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data. [Moved to: https://github.com/jerryjliu/llama_index] (A quick-start sketch follows the creator's comment below.)

  • (creator of gpt index / llamaindex here https://github.com/jerryjliu/gpt_index)

    Funny that we had just rebranded our tool from GPT Index to LlamaIndex about a week ago to avoid potential trademark issues with OpenAI, and turns out Meta has similar ideas around LLM+llama puns :). Must mean the name is good though!

    Also very excited to try plugging the LLaMA model into LlamaIndex; I'll report the results.
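
    For context, a minimal quick-start in the spirit of the early gpt_index README (class names are from memory of that era and may differ in later llama_index releases; assumes OPENAI_API_KEY is set and that ./data holds your documents):

        # Illustrative sketch only -- the API has likely evolved since the rename.
        from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

        documents = SimpleDirectoryReader("data").load_data()  # ingest local files
        index = GPTSimpleVectorIndex(documents)                # embed and index them
        print(index.query("Summarize these documents."))       # LLM answers over your data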

  • Quake-III-Arena

    Quake III Arena GPL Source Release

  • You mean this code?

    https://archive.softwareheritage.org/browse/content/sha1_git...

    Do you see that notice at the top of the file? It says:

    ==

    This file is part of Quake III Arena source code.

    Quake III Arena source code is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

  • FlexGen

    Running large language models on a single GPU for throughput-oriented scenarios.

  • If you're patient, https://github.com/FMInference/FlexGen lets you trade off GPU RAM for system RAM or even disk space.
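
    The underlying trick is offloading: keep most of the weights in CPU RAM (or on disk) and stream each layer onto the GPU just in time. A toy PyTorch illustration of the general idea follows; this shows the concept only, not FlexGen's actual API:

        import torch
        import torch.nn as nn

        # Toy "model": a stack of large linear layers resident in CPU RAM.
        layers = [nn.Linear(4096, 4096) for _ in range(16)]

        def offloaded_forward(x: torch.Tensor) -> torch.Tensor:
            """Stream layers through the GPU one at a time, trading transfer
            latency for a much smaller peak VRAM footprint."""
            x = x.to("cuda")
            for layer in layers:
                layer.to("cuda")   # copy this layer's weights into VRAM
                x = layer(x)
                layer.to("cpu")    # evict the weights to free VRAM again
            return x

        out = offloaded_forward(torch.randn(8, 4096))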


Related posts

  • Run 70B LLM Inference on a Single 4GB GPU with This New Technique

    3 projects | news.ycombinator.com | 3 Dec 2023
  • Colorful Custom RTX 4060 Ti GPU Clocks Outed, 8 GB VRAM Confirmed

    1 project | /r/hardware | 17 Apr 2023
  • FlexGen: Running large language models on a single GPU

    1 project | /r/hypeurls | 26 Mar 2023
  • FlexGen: Running large language models on a single GPU

    1 project | /r/patient_hackernews | 26 Mar 2023
  • FlexGen: Running large language models on a single GPU

    1 project | /r/hackernews | 26 Mar 2023