- jan: Jan is an open-source alternative to ChatGPT that runs 100% offline on your computer, with multi-engine support (llama.cpp, TensorRT-LLM).
- primeqa: The prime repository for state-of-the-art multilingual question answering research and development.
- GPU-Benchmarks-on-LLM-Inference: Multiple NVIDIA GPUs or Apple Silicon for large language model inference?
Seconded - IMHO Jan has the cleanest UI and the most straightforward setup of all the LLM frontends available right now.
https://jan.ai/
https://github.com/janhq/jan
I was able to successfully run Llama 3 8B, Mistral 7B, Phi, and other 7B models using Ollama [1] on my M1 MacBook Air.
[1] https://ollama.com
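If you want to script against it rather than use the CLI, here is a minimal sketch using the ollama Python client; the model tag and the exact response shape may differ slightly depending on the client version you have installed:

    # Minimal sketch: chat with a locally pulled model via the ollama
    # Python client (pip install ollama). Assumes the Ollama daemon is
    # running and `ollama pull llama3` has been done beforehand.
    import ollama

    response = ollama.chat(
        model="llama3",  # the 8B instruct variant by default
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response["message"]["content"])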
Here [1] is a reference for the tokens/sec of Llama 3 on different Apple hardware. You can evaluate whether that is acceptable performance for your agents. I would assume the tokens/sec would be much lower if the LLM agent is running alongside the game, since the game would also be using a portion of the CPU and GPU. I think this is something you will need to test yourself to determine its usability.
You can also look into lower-parameter models (3B, for example) to see whether the balance between accuracy and performance fits your use case.
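If it helps, here is a rough sketch of how you could measure tokens/sec on your own machine (ideally while the game is running alongside) using the ollama Python client; the model tags are just examples, and the eval_count field is taken from Ollama's API docs as I remember them:

    # Rough tokens/sec measurement while other workloads (e.g. the game)
    # are running. Compare a 3B-class model against a 7B/8B one.
    import time
    import ollama

    prompt = "Write a two-sentence description of a tavern keeper NPC."
    for model in ("phi3", "llama3"):  # example tags; pull them first
        start = time.time()
        response = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        elapsed = time.time() - start
        generated = response["eval_count"]  # tokens generated (per API docs)
        print(f"{model}: {generated} tokens in {elapsed:.1f}s "
              f"-> {generated / elapsed:.1f} tok/s")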
>Is there a way to reliably package these models with existing games and make them run locally? This would virtually make inference free right?
I don't have any knowledge of game dev, so I can't comment on that part, but yes, packaging it locally would make the inference free.
[1] https://github.com/ggerganov/llama.cpp/discussions/4167
See https://github.com/Mozilla-Ocho/llamafile, a standalone packaging of llama.cpp that runs an LLM locally. It will use the GPU, but it also falls back on the CPU. CPU performance of small, quantized models is still pretty decent, and the page has estimated memory requirements for different models.
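To sketch how a game might actually bundle and talk to one of these (the flag names and default port here are from memory, so treat this as an illustration rather than the documented interface): the game launcher can start the llamafile as a local server and hit its OpenAI-compatible endpoint.

    # Illustrative sketch: a game ships "model.llamafile" (hypothetical name),
    # starts it as a background server, and queries the OpenAI-compatible
    # endpoint that llama.cpp's server exposes. Flags/port may vary by version.
    import json
    import subprocess
    import time
    import urllib.request

    server = subprocess.Popen(["./model.llamafile", "--server", "--nobrowser"])
    time.sleep(5)  # crude wait; a real launcher would poll the endpoint

    payload = {
        "messages": [{"role": "user", "content": "Greet the player in character."}],
    }
    req = urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])

    server.terminate()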
There is actually a specific application of this concept for generating synthetic training data, called UDAPDR [0].
It, or something like it, could likely be applied to any form of generation, including what you are describing.
[0] - https://github.com/primeqa/primeqa/tree/4ae1b456dbe9f75276fe...
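As a toy illustration of the general idea (not UDAPDR itself), you can prompt a local model to produce synthetic question/answer pairs from your own passages and use those as training data; the model tag, passages, and prompt below are just placeholders:

    # Toy sketch: generate synthetic Q/A pairs from passages with a local
    # LLM via the ollama Python client. Passages and model are illustrative.
    import ollama

    passages = [
        "The inn's cellar was flooded during the spring thaw of 1843.",
        "Captain Mirelle charts smuggling routes through the northern fjords.",
    ]

    for passage in passages:
        prompt = (
            "Write one question that this passage answers, then the answer.\n\n"
            f"Passage: {passage}\n\nFormat:\nQ: ...\nA: ..."
        )
        reply = ollama.chat(
            model="llama3",
            messages=[{"role": "user", "content": prompt}],
        )
        print(reply["message"]["content"], "\n---")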
The LocalLLaMA subreddit usually has some interesting benchmarks and reports.
Here is one example, testing performance of different GPUs and Macs with various flavours of Llama:
https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inferen...