-
LocalAI
:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many other model architectures. It can generate text, audio, video, and images, and includes voice-cloning capabilities.
-
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), and Llama models.
https://ollama.ai. It's a menu-bar Mac app that runs the server, plus a CLI that lets you pull and run a variety of popular models from its library. No need to compile anything or install a bunch of dependencies. Support for Apple Silicon GPUs is enabled by default. I'd be surprised if anything else gets you up and running as quickly.
I like koboldcpp for its simplicity, but currently prefer the speed of exllamav2 (e.g. Goliath 120B at over 10 tokens per second), included with oobabooga's text-generation-webui, which I can remote-control easily from my browser.
Sorry, I'm only somewhat familiar with this term (I've seen it as a model loader in Oobabooga), but I'm still not following the connection here. Are you saying I should be using this project in lieu of llama.cpp? Or are you saying that there is, perhaps, an exllamav2 "extension" or similar within llama.cpp that I can use?
https://github.com/c0sogi/llama-api, right? This offers better performance on GPU-optimized models?
Fresh from the oven: someone just posted https://github.com/ghostpad/ghostpad and it seems great (from https://www.reddit.com/r/LocalLLaMA/comments/18crcms/ghostpad_now_supports_llamacpp/?sort=new)!
On VS Code I sometimes use continue.dev and refact.ai just for fun, and they are great!
Mainly the desire to control the exact prompt. Instead of the UI silently cutting it, I can comment blocks of text out of what gets fed to the model and rewrite them into shorter blocks (UIs don't support commenting out blocks). On long stories it's quite frustrating to have only a rough idea of what the model sees, especially in UIs with world info, where it can inject itself at will. So my tool panics if it sees too many tokens and calls vim over and over (hence the name, reaiterator) until the token count is reduced to the desired number. Also, vim is a better editor than a browser, especially with undotree. I also didn't like that ooba doesn't support several generations at the same time while kobold does; but kobold runs them in parallel like several batches of the same prompt, which causes OoM. Not sure if this behavior still persists in koboldcpp.
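The workflow described above (strip commented-out blocks, count tokens, bail out to an editor when over budget) can be sketched roughly like this. This is a minimal sketch, not the actual reaiterator code: the `//` comment marker, the whitespace-based token estimate, and all function names are assumptions for illustration.

```python
COMMENT_MARKER = "//"   # assumed marker for "don't feed this line to the model"
TOKEN_BUDGET = 4096     # assumed context budget

def strip_commented_blocks(text: str) -> str:
    """Drop lines the user commented out so they never reach the model."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith(COMMENT_MARKER)]
    return "\n".join(kept)

def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: whitespace-split word count."""
    return len(text.split())

def prepare_prompt(text: str, budget: int = TOKEN_BUDGET) -> str:
    """Return the prompt that would be sent, or raise if it is over budget.

    The real tool would re-open the file in vim and loop instead of raising.
    """
    prompt = strip_commented_blocks(text)
    n = estimate_tokens(prompt)
    if n > budget:
        raise RuntimeError(
            f"prompt is ~{n} tokens, over the {budget} budget: edit and retry")
    return prompt
```

A real implementation would call an actual tokenizer and shell out to `vim` in a loop, but the control flow is the same: nothing reaches the model until the edited prompt fits.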
If you're running this as a server, I would recommend LocalAI https://github.com/mudler/LocalAI
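LocalAI's appeal as a server is that it exposes an OpenAI-compatible HTTP API, so existing OpenAI clients can simply point at it. A minimal sketch of building such a request with only the standard library; the base URL, port, and model name here are assumptions for illustration, not LocalAI defaults you should rely on:

```python
import json
from urllib import request

def build_chat_request(base_url: str, model: str, user_message: str) -> request.Request:
    """Build an OpenAI-style /v1/chat/completions request for a local server."""
    url = f"{base_url}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.7,
    }
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending is left to the caller, e.g. (assuming a server on port 8080):
# resp = request.urlopen(build_chat_request("http://localhost:8080", "my-model", "Hello"))
```

Because the wire format matches OpenAI's, the same request builder works against any of the OpenAI-compatible backends mentioned in this thread.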
Finally, no matter what backend I use, I need it to be compatible with my power-user frontend, SillyTavern. That way I always use the same UI, with the characters I created and the extensions I want, e.g. web search, XTTS text-to-speech, and Whisper speech recognition for real-time voice chat - and all of that local!