Ask HN: Which LLMs can run locally on most consumer computers

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • jan

    Jan is an open source alternative to ChatGPT that runs 100% offline on your computer, with support for multiple engines (llama.cpp, TensorRT-LLM).

  • seconded - IMHO Jan has the cleanest UI and most straightforward setup out of all LLM frontends available now.

    https://jan.ai/

    https://github.com/janhq/jan

  • ollama

    Get up and running with Llama 3, Mistral, Gemma, and other large language models.

  • I was able to successfully run Llama 3 8B, Mistral 7B, Phi, and other 7B models using Ollama [1] on my M1 MacBook Air; a minimal sketch of calling Ollama's local API follows below.

    [1] https://ollama.com
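
    As a rough illustration (not something from the thread itself), here is a minimal Python sketch of querying a locally running Ollama server over its HTTP API. The model name and prompt are placeholders, and it assumes "ollama pull llama3" has already been run.

        # Minimal sketch: query a local Ollama server (default port 11434).
        # Assumes the llama3 model has been pulled; model/prompt are examples.
        import json
        import urllib.request

        payload = json.dumps({
            "model": "llama3",
            "prompt": "In one sentence, what is a quantized model?",
            "stream": False,  # return one JSON object instead of a stream
        }).encode()

        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["response"])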

  • llama.cpp

    LLM inference in C/C++

  • Here [1] is a reference for the tokens/sec of Llama 3 on different Apple hardware, so you can evaluate whether the performance is acceptable for your agents. I would assume tokens/sec would be considerably lower if the LLM agent runs alongside the game, since the game would also be using a portion of the CPU and GPU. This is something you will need to test yourself to determine usability (a rough timing sketch is included at the end of this comment).

    You can also look into lower-parameter models (3B, for example) to see whether the balance between accuracy and performance fits your use case.

    >Is there a way to reliably package these models with existing games and make them run locally? This would virtually make inference free right?

    I don't have any knowledge of game dev, so I can't comment on that part, but yes, packaging the model locally would make inference free.

    [1] https://github.com/ggerganov/llama.cpp/discussions/4167
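
    To make the testing suggestion above concrete, here is a minimal sketch (not from the thread) using the llama-cpp-python bindings to load a small quantized GGUF model and print a rough tokens/sec figure; the model path and prompt are placeholders.

        # Sketch: time a short completion with llama-cpp-python
        # (pip install llama-cpp-python). The GGUF path is a placeholder.
        import time
        from llama_cpp import Llama

        llm = Llama(
            model_path="./phi-3-mini-4k-instruct-q4.gguf",  # any small quantized model
            n_ctx=2048,       # context window
            n_gpu_layers=-1,  # offload all layers to the GPU if one is available
            verbose=False,
        )

        start = time.time()
        out = llm("Describe the weather in one short sentence.", max_tokens=64)
        elapsed = time.time() - start

        n_tokens = out["usage"]["completion_tokens"]
        print(out["choices"][0]["text"].strip())
        print(f"{n_tokens} tokens in {elapsed:.1f}s, about {n_tokens / elapsed:.1f} tok/s")

    Running the same script while the game loop is active would give a more realistic number for the shared CPU/GPU case.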

  • llamafile

    Distribute and run LLMs with a single file.

  • See https://github.com/Mozilla-Ocho/llamafile, a standalone packaging of llama.cpp that runs an LLM locally. It will use the GPU, but it also falls back on the CPU. CPU performance of small, quantized models is still pretty decent, and the page has estimated memory requirements for different models.
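
    By way of illustration, and assuming the llamafile's built-in server is running on its default port 8080, it also exposes an OpenAI-compatible endpoint that can be called from your own code; the model name and prompt below are placeholders.

        # Sketch: call a running llamafile's OpenAI-compatible endpoint
        # (assumes the server is listening on the default port 8080).
        import json
        import urllib.request

        payload = json.dumps({
            "model": "local",  # llamafile serves whichever model it was packaged with
            "messages": [
                {"role": "user", "content": "Give one tip for running LLMs on a laptop."}
            ],
            "max_tokens": 64,
        }).encode()

        req = urllib.request.Request(
            "http://localhost:8080/v1/chat/completions",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["choices"][0]["message"]["content"])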

  • primeqa

    The prime repository for state-of-the-art Multilingual Question Answering research and development.

  • There is actually a specific approach built on this idea for generating synthetic training data, called UDAPDR [0].

    It, or something like it, could likely be applied to any form of generation, including what you are describing.

    [0] - https://github.com/primeqa/primeqa/tree/4ae1b456dbe9f75276fe...

  • GPU-Benchmarks-on-LLM-Inference

    Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?

  • LocalLLaMA subreddit usually has some interesting benchmarks and reports.

    Here is one example, testing performance of different GPUs and Macs with various flavours of Llama:

    https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inferen...

NOTE: The number of mentions on this list reflects mentions on common posts plus user-suggested alternatives; a higher number indicates a more popular project.


Related posts

  • Devoxx Genie Plugin : an Update

    6 projects | dev.to | 28 May 2024
  • Llama3.np: pure NumPy implementation of Llama3

    10 projects | news.ycombinator.com | 16 May 2024
  • Introducing Jan

    4 projects | dev.to | 5 May 2024
  • LocalAI: Self-hosted OpenAI alternative reaches 2.14.0

    1 project | news.ycombinator.com | 3 May 2024
  • Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B

    11 projects | news.ycombinator.com | 28 Apr 2024