-
LocalAI
:robot: The free, Open Source OpenAI alternative. Self-hosted, community-driven, and local-first. A drop-in replacement for OpenAI that runs on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many other model architectures. It can generate text, audio, video, and images, and includes voice-cloning capabilities.
-
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), and Llama models.
https://ollama.ai. It's a menu-bar Mac app that runs the server, plus a CLI that lets you pull and run a variety of popular models from its library. No need to compile anything or install a bunch of dependencies. Support for Apple Silicon GPUs is enabled by default. I'd be surprised if anything else gets you up and running as quickly.
I like koboldcpp for its simplicity, but currently prefer the speed of exllamav2 (e.g. Goliath 120B at over 10 tokens per second), included with oobabooga's text-generation-webui, which I can remote-control easily from my browser.
Sorry, I'm only somewhat familiar with this term (I've seen it as a model loader in Oobabooga), but I'm still not following the connection here. Are you saying I should be using this project in lieu of llama.cpp? Or are you saying that there is, perhaps, an exllamav2 "extension" or similar within llama.cpp that I can use?
https://github.com/c0sogi/llama-api, right? This offers better performance on GPU-optimized models?
Fresh from the oven: someone just posted https://github.com/ghostpad/ghostpad and it seems great (from https://www.reddit.com/r/LocalLLaMA/comments/18crcms/ghostpad_now_supports_llamacpp/?sort=new)!
On VS Code I sometimes use continue.dev and refact.ai just for fun, and they are great!
Mainly the desire to control the exact prompt. Instead of the UI silently cutting it, I can comment blocks of text out of what gets fed to the model and rewrite them into shorter blocks (UIs don't support commenting out blocks). On long stories it's quite frustrating to have only a rough idea of what the model sees, especially in UIs with world info, where it can inject itself at will. So my tool panics if it sees too many tokens and calls vim over and over (hence the name, reaiterator) until the token count is reduced to the desired number. Also, vim is a better editor than a browser, especially with undotree. I also didn't like that ooba doesn't support several generations at the same time while kobold does; but kobold runs them in parallel like several batches of the same prompt, which causes OoM. Not sure if this behavior still persists in koboldcpp.
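The workflow described above (strip commented-out blocks, count tokens, bail out to an editor when over budget) can be sketched roughly like this. This is a minimal sketch, not the actual reaiterator code: the `//` comment marker, the whitespace-based token estimate, and all function names are assumptions for illustration.

```python
COMMENT_MARKER = "//"   # assumed marker for "don't feed this line to the model"
TOKEN_BUDGET = 4096     # assumed context budget

def strip_commented_blocks(text: str) -> str:
    """Drop lines the user commented out so they never reach the model."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith(COMMENT_MARKER)]
    return "\n".join(kept)

def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: whitespace-split word count."""
    return len(text.split())

def prepare_prompt(text: str, budget: int = TOKEN_BUDGET) -> str:
    """Return the prompt that would be sent, or raise if it is over budget.

    The real tool would re-open the file in vim and loop instead of raising.
    """
    prompt = strip_commented_blocks(text)
    n = estimate_tokens(prompt)
    if n > budget:
        raise RuntimeError(
            f"prompt is ~{n} tokens, over the {budget} budget: edit and retry")
    return prompt
```

A real implementation would call an actual tokenizer and shell out to `vim` in a loop, but the control flow is the same: nothing reaches the model until the edited prompt fits.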
If you're running this as a server, I would recommend LocalAI https://github.com/mudler/LocalAI
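LocalAI's appeal as a server is that it exposes an OpenAI-compatible HTTP API, so existing OpenAI clients can simply point at it. A minimal sketch of building such a request with only the standard library; the base URL, port, and model name here are assumptions for illustration, not LocalAI defaults you should rely on:

```python
import json
from urllib import request

def build_chat_request(base_url: str, model: str, user_message: str) -> request.Request:
    """Build an OpenAI-style /v1/chat/completions request for a local server."""
    url = f"{base_url}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.7,
    }
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending is left to the caller, e.g. (assuming a server on port 8080):
# resp = request.urlopen(build_chat_request("http://localhost:8080", "my-model", "Hello"))
```

Because the wire format matches OpenAI's, the same request builder works against any of the OpenAI-compatible backends mentioned in this thread.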
Finally, no matter what backend I use, I need it to be compatible with my power-user frontend, SillyTavern. That way I always use the same UI, with the characters I created and the extensions I want, e.g. web search, XTTS text-to-speech, and Whisper speech recognition for real-time voice chat - and all of that local!