Top 23 Python Inference Projects
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
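A comment further down this page notes that DeepSpeed can offload optimizer state and parameters to RAM or even NVMe. As a hedged sketch (field names follow DeepSpeed's documented JSON config schema; the batch size and stage are illustrative choices, not a recommendation), a ZeRO config enabling CPU offload looks roughly like this:

```python
# Illustrative DeepSpeed config: ZeRO stage 2 with optimizer state
# offloaded to CPU RAM. Stage 3 can additionally offload parameters,
# and "device": "nvme" (plus an nvme_path) pushes state to disk.
ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # keep optimizer state in RAM
    },
}

# Typical usage (needs the deepspeed package and a torch model, so
# left commented here):
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config
# )
```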
-
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
-
adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
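The "evasion" attacks ART covers perturb an input so a trained model misclassifies it. A toy pure-Python sketch of the idea behind FGSM-style evasion (not ART's API): for a linear scorer s(x) = w·x, the gradient of the score with respect to the input is just w, so nudging each feature by a small eps in the sign of w raises the score enough to flip a decision near the boundary.

```python
def score(x, w):
    """Linear decision score w . x; positive -> class 1, negative -> class 0."""
    return sum(xi * wi for xi, wi in zip(x, w))

def fgsm_step(x, w, eps):
    """Perturb x by eps in the sign of the score gradient (which is w here)."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(wi) for xi, wi in zip(x, w)]

w = [0.5, -1.0, 2.0]
x = [1.0, 1.0, 0.1]              # score = -0.3 -> class 0
x_adv = fgsm_step(x, w, eps=0.3) # ~[1.3, 0.7, 0.4]; score ~ 0.75 -> class 1
```

Real attacks do the same thing against a neural network's input gradient; ART packages dozens of such attacks plus the corresponding defenses.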
-
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
-
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
-
uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
-
GenossGPT
One API for all LLMs either Private or Public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace ...) 🌈🐂 Replace OpenAI GPT with any LLMs in your app with one line.
-
filetype.py
Small, dependency-free, fast Python package to infer binary file types by checking magic number signatures
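The magic-number technique the description refers to can be sketched in a few lines of plain Python (a toy illustration, not filetype.py's actual API): match the first bytes of a file against known signatures instead of trusting the extension.

```python
# Well-known leading-byte signatures mapped to MIME types.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]

def sniff(data: bytes):
    """Return a MIME type if the leading bytes match a known signature."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None

print(sniff(b"%PDF-1.7 rest of file"))  # application/pdf
print(sniff(b"hello"))                  # None
```

filetype.py maintains a much larger signature table and exposes convenience helpers, but the core check is this same prefix match.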
-
pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

DeepSpeed can handle parallelism concerns, and can even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.
Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30

I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
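The FPU effect the comment above describes is easy to demonstrate: floating-point addition is not associative, so two mathematically identical reductions can disagree in the last bits. Any change in summation order (batching, parallel reduction, kernel choice) can therefore shift logits slightly, which is the mechanism behind nondeterminism even at temp=0:

```python
# Float addition is not associative: the same three numbers summed in a
# different order give different doubles. The same effect, scaled up to
# large matmul reductions, perturbs model logits between runs.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False
```

When two token probabilities are nearly tied, such last-bit differences are enough to change the argmax, so greedy decoding diverges; the upcast-to-float32 fix mentioned in the vllm issue shrinks (but does not eliminate) this error.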
Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26

I've been using this:
https://github.com/bugbakery/transcribee
It's noticeably work-in-progress but it does the job and has a nice UI to edit transcriptions and speakers etc.
It's running on the CPU for me; it would be nice to have something that can make use of a 4GB Nvidia GPU, which faster-whisper is actually able to do [1]
https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23

Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse
Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02Shout out to Huggingface's Optimum – which made it easier to quantize models.
Project mention: CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data | news.ycombinator.com | 2024-04-25question: any good on-device size image embedding models?
tried https://github.com/unum-cloud/uform which i do like, especially they also support languages other than English. Any recommendations on other alternatives?
Project mention: Drop-in replacement for the OpenAI API based on open source LLMs | news.ycombinator.com | 2024-01-17
Check out Hidet [1]. Not as well funded, but it delivers Python-based ML acceleration with GPU support (unlike Mojo).
[1] https://github.com/hidet-org/hidet
Python Inference related posts
-
Rete Algorithm
-
Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow
-
CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data
-
Multimodal Embeddings for JavaScript, Swift, and Python
-
FLaNK AI-April 22, 2024
-
Hugging Face reverts the license back to Apache 2.0
-
Apple Explores Home Robotics as Potential 'Next Big Thing'
Index
What are some of the best open-source Inference projects in Python? This list will help you:
| # | Project | Stars |
|---|---------|-------|
| 1 | ColossalAI | 38,081 |
| 2 | DeepSpeed | 33,122 |
| 3 | vllm | 20,017 |
| 4 | faster-whisper | 9,424 |
| 5 | text-generation-inference | 8,098 |
| 6 | server | 7,509 |
| 7 | adversarial-robustness-toolbox | 4,523 |
| 8 | torch2trt | 4,432 |
| 9 | open_model_zoo | 3,976 |
| 10 | AutoGPTQ | 3,906 |
| 11 | deepsparse | 2,902 |
| 12 | optimum | 2,225 |
| 13 | DeepSpeed-MII | 1,693 |
| 14 | transformer-deploy | 1,626 |
| 15 | budgetml | 1,334 |
| 16 | BERT-NER | 1,182 |
| 17 | uform | 913 |
| 18 | GenossGPT | 743 |
| 19 | hidet | 624 |
| 20 | filetype.py | 613 |
| 21 | pinferencia | 558 |
| 22 | fastT5 | 540 |
| 23 | emlearn | 446 |