https://github.com/vllm-project/vllm is probably more optimized for that use case.
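For illustration, here is a minimal sketch of vLLM's offline batch API; the Code Llama checkpoint and prompts are placeholders, not anything from the original comment:

```python
from vllm import LLM, SamplingParams

# Example model only; use whichever code model you actually serve.
llm = LLM(model="codellama/CodeLlama-7b-hf")
params = SamplingParams(temperature=0.2, max_tokens=128)

# vLLM schedules these prompts with continuous batching, which is
# where most of its multi-user throughput advantage comes from.
outputs = llm.generate(["def fib(n):", "def quicksort(arr):"], params)
for out in outputs:
    print(out.outputs[0].text)
```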
Refact was made for this: https://github.com/smallcloudai/refact
Setting up a server for multiple users is very different from setting up an LLM for yourself. A safe bet would be to just use TGI, which supports continuous batching and is very easy to run via Docker on your server. https://github.com/huggingface/text-generation-inference
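As a sketch of how little client code that takes once the container is up (the port, model, and generation parameters below are assumptions, not part of the original comment):

```python
import requests

# Assumes TGI is already running, e.g. started roughly like:
#   docker run --gpus all -p 8080:80 \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id bigcode/starcoder
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "def hello_world():",
        "parameters": {"max_new_tokens": 64, "temperature": 0.2},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```

Each request like this gets batched with other in-flight requests on the server, so concurrent users share the GPU without blocking each other.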
I looked into how to deploy an open-source code LLM for a dev team a couple of months ago and identified five questions to answer:
Related posts
- Hugging Face reverts the license back to Apache 2.0
- Deploying Llama2 with vLLM vs TGI. Need advice
- Continuous batching enables 23x throughput in LLM inference and reduces p50 latency
- HuggingFace Text Generation License No Longer Open-Source
- HuggingFace Text Generation Library License Changed from Apache 2 to HFOIL