https://github.com/vllm-project/vllm is probably more optimized for that use case.
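For illustration, here is a minimal sketch of vLLM's offline batch API; the Code Llama checkpoint and prompts are placeholders, not anything from the original comment:

```python
from vllm import LLM, SamplingParams

# Example model only; use whichever code model you actually serve.
llm = LLM(model="codellama/CodeLlama-7b-hf")
params = SamplingParams(temperature=0.2, max_tokens=128)

# vLLM schedules these prompts with continuous batching, which is
# where most of its multi-user throughput advantage comes from.
outputs = llm.generate(["def fib(n):", "def quicksort(arr):"], params)
for out in outputs:
    print(out.outputs[0].text)
```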
Refact was made for this: https://github.com/smallcloudai/refact
Setting up a server for multiple users is very different from setting up an LLM for yourself. A safe bet would be to just use TGI, which supports continuous batching and is very easy to run via Docker on your server. https://github.com/huggingface/text-generation-inference
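As a sketch of how little client code that takes once the container is up (the port, model, and generation parameters below are assumptions, not part of the original comment):

```python
import requests

# Assumes TGI is already running, e.g. started roughly like:
#   docker run --gpus all -p 8080:80 \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id bigcode/starcoder
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "def hello_world():",
        "parameters": {"max_new_tokens": 64, "temperature": 0.2},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```

Each request like this gets batched with other in-flight requests on the server, so concurrent users share the GPU without blocking each other.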
I looked into how to deploy an open-source code LLM for a dev team a couple of months ago and identified five questions to answer:
Related posts
- Hugging Face reverts the license back to Apache 2.0
- Deploying Llama2 with vLLM vs TGI. Need advice
- Continuous batching enables 23x throughput in LLM inference and reduces p50 latency
- HuggingFace Text Generation License No Longer Open-Source
- HuggingFace Text Generation Library License Changed from Apache 2 to HFOIL