Top 12 C++ llm Projects

llama.cpp

Mentions: 780 | Stars: 57,984 | Activity: 10.0 | Language: C++

LLM inference in C/C++

Project mention: New exponent functions that make SiLU and SoftMax 2x faster, at full acc | news.ycombinator.com | 2024-05-15
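
Both functions named in that post title call an exponential once per element, which is why a faster exponent routine speeds them up. As a point of reference, here is a minimal sketch of the two functions using plain `std::exp`. It is not the optimized code from the linked discussion, and the function names `silu` and `softmax` are chosen here for illustration only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// SiLU (also called swish): x * sigmoid(x) = x / (1 + exp(-x)).
inline float silu(float x) {
    return x / (1.0f + std::exp(-x));
}

// Numerically stable softmax: subtract the maximum logit before
// exponentiating so that exp() never overflows.
std::vector<float> softmax(const std::vector<float>& logits) {
    std::vector<float> out(logits.size());
    if (logits.empty()) return out;

    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - max_logit);  // one exp per element
        sum += out[i];
    }
    for (float& v : out) v /= sum;  // normalize so the outputs sum to 1
    return out;
}
```

Calling `softmax({1.0f, 2.0f, 3.0f})` returns three probabilities in increasing order that sum to 1 up to rounding.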

LocalAI

Mentions: 83 | Stars: 20,346 | Activity: 9.9 | Language: C++

The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many more model architectures. It can generate text, audio, video, and images, and also offers voice cloning.

Project mention: LocalAI: Self-hosted OpenAI alternative reaches 2.14.0 | news.ycombinator.com | 2024-05-03

PowerInfer

Mentions: 4 | Stars: 6,996 | Activity: 9.8 | Language: C++

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26

koboldcpp

Mentions: 180 | Stars: 3,951 | Activity: 10.0 | Language: C++

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

Project mention: Any Online Communities on Local/Home AI? | news.ycombinator.com | 2024-04-24

cortex

Mentions: 8 | Stars: 1,635 | Activity: 9.8 | Language: C++

Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan (by janhq)

Project mention: Introducing Jan | dev.to | 2024-05-05

Jan incorporates a lightweight, built-in inference server called Nitro. Nitro supports both llama.cpp and NVIDIA's TensorRT-LLM engines. This means many open LLMs in the GGUF format are supported. Jan's Model Hub is designed for easy installation of pre-configured models but it also allows you to install virtually any model from Hugging Face or even your own.

rwkv.cpp

Mentions: 12 | Stars: 1,111 | Activity: 6.8 | Language: C++

INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model

Project mention: Eagle 7B: Soaring past Transformers | news.ycombinator.com | 2024-01-28

There's https://github.com/saharNooby/rwkv.cpp, which is related-ish[0] to ggml/llama.cpp
[0]: https://github.com/ggerganov/llama.cpp/issues/846
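
The INT4/INT5/INT8 formats named in the rwkv.cpp description store weights as small integers plus a scale factor. As a rough illustration of the idea, the sketch below does symmetric INT8 quantization with a single scale per vector. It is a generic textbook scheme, not rwkv.cpp's actual storage format, and the names `QuantizedInt8`, `quantize_int8`, and `dequantize_int8` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration: one scale shared by all values.
struct QuantizedInt8 {
    float scale;                       // int8 value * scale ~= original float
    std::vector<std::int8_t> values;   // quantized weights in [-127, 127]
};

// Symmetric quantization: map [-max_abs, max_abs] onto [-127, 127].
QuantizedInt8 quantize_int8(const std::vector<float>& weights) {
    float max_abs = 0.0f;
    for (float w : weights) max_abs = std::max(max_abs, std::fabs(w));

    QuantizedInt8 q;
    // Fall back to a scale of 1 when every weight is zero.
    q.scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    q.values.reserve(weights.size());
    for (float w : weights) {
        long r = std::lround(w / q.scale);           // round to nearest integer
        r = std::max(-127L, std::min(127L, r));      // guard against rounding overshoot
        q.values.push_back(static_cast<std::int8_t>(r));
    }
    return q;
}

// Recover approximate floats; the error per value is at most about scale / 2.
std::vector<float> dequantize_int8(const QuantizedInt8& q) {
    assert(q.scale > 0.0f);
    std::vector<float> out;
    out.reserve(q.values.size());
    for (std::int8_t v : q.values) out.push_back(static_cast<float>(v) * q.scale);
    return out;
}
```

Each weight shrinks from 4 bytes to 1 at the cost of a small rounding error. Lower bit widths such as INT4 apply the same idea with a narrower integer range and usually a separate scale per small block of weights.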

distributed-llama

Mentions: 4 | Stars: 780 | Activity: 9.2 | Language: C++

Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.

Project mention: Distributed Grok-1 (314B) | news.ycombinator.com | 2024-04-15

awadb

Mentions: 3 | Stars: 159 | Activity: 8.9 | Language: C++

AI Native database for embedding vectors

Project mention: An AI Native database for embedding vectors | news.ycombinator.com | 2023-09-10

What is your novelty? I just see HNSW.
What is the relationship between awa and vearch? You both have a 'gamma' index with identical copy:
https://github.com/awa-ai/awadb/tree/main/awadb/db_engine/in...

llama_cpp.rb

Mentions: 2 | Stars: 144 | Activity: 9.6 | Language: C++

llama_cpp provides Ruby bindings for llama.cpp

Project mention: Llama.cpp: Full CUDA GPU Acceleration | news.ycombinator.com | 2023-06-12

Python sits in the C-glue segment of programming languages (where Perl, PHP, Ruby and Node are also notable members). Being a glue language means having APIs to a lot of external toolchains written not only in C/C++ but in many other compiled languages, APIs and system resources. Conda, virtualenv, etc. are godsend modules for making it all work, or even better, for freezing things once they all work, without resorting to Docker, VMs or shell scripts. It's meant for application and DevOps people who need to slap together, e.g., ML, Numpy, Elasticsearch, AWS APIs and REST endpoints and Get $hit Done.
It's annoying to see them "glueys" compared to the binary compiled segment where the heavy lifting is done. Python and others exist to latch on and assimilate. Resistance is futile:
https://pypi.org/project/pyllamacpp/
https://www.npmjs.com/package/llama-node
https://packagist.org/packages/kambo/llama-cpp-php
https://github.com/yoshoku/llama_cpp.rb

collider

Mentions: 1 | Stars: 117 | Activity: 9.4 | Language: C++

Large Model Collider - The Platform for serving LLM models

Project mention: Show HN: Collider – the platform for local LLM debug and inference at warp speed | news.ycombinator.com | 2023-11-30

redis-llm

Mentions: 1 | Stars: 46 | Activity: 7.7 | Language: C++

redis-llm integrates LLMs with Redis. It can help an LLM access private data and remember long chat histories.

Project mention: Show HN: Redis-LLM – Redis module integrates LLM with Redis | news.ycombinator.com | 2023-07-10

rendezllama

Mentions: 1 | Stars: 8 | Activity: 7.7 | Language: C++

CLI for llama.cpp with various commands to guide, edit, and regenerate tokens on the fly.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

C++ llm related posts

New exponent functions that make SiLU and SoftMax 2x faster, at full acc

2 projects | news.ycombinator.com | 15 May 2024

Gemini Flash

2 projects | news.ycombinator.com | 14 May 2024

Ggml: Add Flash Attention

1 project | news.ycombinator.com | 13 May 2024

Structured: Extract Data from Unstructured Input with LLM

3 projects | dev.to | 10 May 2024

IBM Granite: A Family of Open Foundation Models for Code Intelligence

3 projects | news.ycombinator.com | 7 May 2024

Ask HN: Affordable hardware for running local large language models?

1 project | news.ycombinator.com | 5 May 2024

Better and Faster Large Language Models via Multi-Token Prediction

1 project | news.ycombinator.com | 1 May 2024

Index

What are some of the best open-source llm projects in C++? This list will help you:

#	Project	Stars
1	llama.cpp	57,984
2	LocalAI	20,346
3	PowerInfer	6,996
4	koboldcpp	3,951
5	cortex	1,635
6	rwkv.cpp	1,111
7	distributed-llama	780
8	awadb	159
9	llama_cpp.rb	144
10	collider	117
11	redis-llm	46
12	rendezllama	8
