Vllm Alternatives

Similar projects and alternatives to vllm

Each entry below lists its number of mentions, stars, activity score, and primary language (where given).

llama.cpp

777 57,463 10.0 C++ vllm VS llama.cpp

LLM inference in C/C++
ROCm

198 3,637 0.0 Python vllm VS ROCm

Discontinued AMD ROCm™ Software - GitHub Home [Moved to: https://github.com/ROCm/ROCm]
mlc-llm

89 17,053 9.9 Python vllm VS mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
FastChat

83 34,514 9.6 Python vllm VS FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
FLiPStackWeekly

81 14 9.9 vllm VS FLiPStackWeekly

FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
bruno

55 19,743 9.9 JavaScript vllm VS bruno

Open-source IDE for exploring and testing APIs (lightweight alternative to Postman/Insomnia)
text-generation-inference

29 7,938 9.6 Python vllm VS text-generation-inference

Large Language Model Text Generation Inference
TensorRT

22 9,145 5.0 C++ vllm VS TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
axolotl

29 5,899 9.8 Python vllm VS axolotl

Go ahead and axolotl questions
nerd-dictation

28 1,171 2.9 Python vllm VS nerd-dictation

Simple, hackable offline speech to text - using the VOSK-API.
CTranslate2

14 2,825 8.9 C++ vllm VS CTranslate2

Fast inference engine for Transformer models
fiftyone

21 6,756 10.0 Python vllm VS fiftyone

The open-source tool for building high-quality datasets and computer vision models
AdaptiveCpp

19 1,046 9.7 C++ vllm VS AdaptiveCpp

Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!
LAVIS

18 8,781 6.3 Jupyter Notebook vllm VS LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
virtualagc

13 2,494 8.9 Assembly vllm VS virtualagc

Virtual Apollo Guidance Computer (AGC) software
OpenPipe

13 2,381 9.9 TypeScript vllm VS OpenPipe

Turn expensive prompts into cheap fine-tuned models
oasdiff

12 587 9.2 Go vllm VS oasdiff

OpenAPI Diff and Breaking Changes
lmdeploy

4 2,482 9.8 Python vllm VS lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
clip-retrieval

11 2,152 7.7 Jupyter Notebook vllm VS clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them
Llama-2-Onnx

3 987 6.7 Python vllm VS Llama-2-Onnx

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. A higher number therefore suggests a better vllm alternative or a more similar project.

vllm reviews and mentions

Posts with mentions or reviews of vllm. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-09.

AI leaderboards are no longer useful. It's time to switch to Pareto curves
1 project | news.ycombinator.com | 30 Apr 2024

I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT: there are currently floating-point issues in vllm that significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that one suggested fix is upcasting to float32. So it's possible that GPT-3.5 uses an especially low-precision float format to save on compute costs, and that this introduces the nondeterminism.
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
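
The floating-point effect described in the comment above can be illustrated with a toy example. This is a minimal sketch using only the Python standard library, not vllm's or OpenAI's code: the helper names, the input values, and the competing logit are all hypothetical, and the magnitudes are exaggerated so the effect is visible. It rounds every partial sum to float16 or float32 and shows that the order of accumulation can change which of two logits is larger.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Round a Python float to a narrower IEEE 754 format.

    'e' is float16 and 'f' is float32 in the struct module.
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def summed(values, fmt: str) -> float:
    """Accumulate left to right, rounding to `fmt` after every addition."""
    total = 0.0
    for v in values:
        total = round_to(fmt, total + round_to(fmt, v))
    return total

# Hypothetical contributions to a single logit (assumed values).
values = [1000.0, 0.3, 0.3, 0.3, 0.3]

forward_fp16 = summed(values, "e")            # 1002.0
reverse_fp16 = summed(reversed(values), "e")  # 1001.0
forward_fp32 = summed(values, "f")            # close to 1001.2
reverse_fp32 = summed(reversed(values), "f")  # close to 1001.2

# A hypothetical competing token's logit: in float16 the greedy
# (temp=0) choice depends on the order in which the terms were added.
competitor = 1001.5
print(forward_fp16 > competitor, reverse_fp16 > competitor)
print(forward_fp32, reverse_fp32)
```

In float16 the two summation orders land on opposite sides of the competing logit, so a greedy decoder would pick different tokens. In float32 both orders stay close to the exact sum, which is consistent with upcasting being suggested as a fix.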
Mistral AI Launches New 8x22B Moe Model
4 projects | news.ycombinator.com | 9 Apr 2024

The easiest way is to use vllm (https://github.com/vllm-project/vllm) to run it on a couple of A100s, and you can benchmark it using this library (https://github.com/EleutherAI/lm-evaluation-harness)
FLaNK AI for 11 March 2024
46 projects | dev.to | 11 Mar 2024
Show HN: We got fine-tuning Mistral-7B to not suck
4 projects | news.ycombinator.com | 7 Feb 2024

Great question! Scheduling workloads onto GPUs in a way where VRAM is being utilised efficiently was quite the challenge.
What we found was the IO latency for loading model weights into VRAM will kill responsiveness if you don't "re-use" sessions (i.e. where the model weights remain loaded and you run multiple inference sessions over the same loaded weights).
Obviously projects like https://github.com/vllm-project/vllm exist but we needed to build out a scheduler that can run a fleet of GPUs for a matrix of text/image vs inference/finetune sessions.
disclaimer: I work on Helix
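
The session re-use idea in the comment above can be sketched in a few lines. This is a toy illustration, not Helix's scheduler or vllm's API: `load_weights`, `run_session`, the model names, and the cache size of two are all hypothetical, and the "load" is a stand-in for the expensive copy of weights into VRAM.

```python
from functools import lru_cache

# Counts how many times the (simulated) expensive load actually runs.
LOAD_COUNT = {"n": 0}

@lru_cache(maxsize=2)  # assumed budget: at most two models resident at once
def load_weights(model_name: str) -> dict:
    """Stand-in for loading model weights from disk into VRAM."""
    LOAD_COUNT["n"] += 1
    return {"name": model_name}

def run_session(model_name: str, prompt: str) -> str:
    """Run one inference session, re-using weights that are already loaded."""
    weights = load_weights(model_name)  # cache hit after the first call
    return f"[{weights['name']}] {prompt}"

# Three sessions over the same model trigger only one load.
for prompt in ["first", "second", "third"]:
    run_session("mistral-7b", prompt)

print(LOAD_COUNT["n"])
```

With a least-recently-used policy, a model that keeps receiving requests stays resident, while an idle one is evicted when a third model needs the slot. A real scheduler would also have to account for actual VRAM sizes rather than a fixed count.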
Mistral CEO confirms 'leak' of new open source AI model nearing GPT4 performance
5 projects | news.ycombinator.com | 31 Jan 2024

FYI, vLLM also just added experimental multi-LoRA support: https://github.com/vllm-project/vllm/releases/tag/v0.3.0
Also check out the new prefix caching, I see huge potential for batch processing purposes there!
VLLM Sacrifices Accuracy for Speed
1 project | news.ycombinator.com | 23 Jan 2024
Easy, fast, and cheap LLM serving for everyone
1 project | news.ycombinator.com | 17 Dec 2023
vllm
1 project | news.ycombinator.com | 15 Dec 2023
Mixtral Expert Parallelism
1 project | news.ycombinator.com | 15 Dec 2023
Mixtral 8x7B Support
1 project | news.ycombinator.com | 11 Dec 2023

Stats

Basic vllm repo stats

Mentions

Stars

18,931

Activity

9.9

Last Commit

6 days ago

vllm-project/vllm is an open source project licensed under the Apache License 2.0, which is an OSI-approved license.

The primary programming language of vllm is Python.

Popular Comparisons