Top 23 Python Inference Projects
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
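A comment further down this page notes that DeepSpeed can offload optimizer state and parameters to RAM or even NVMe. As a hedged sketch (field names follow DeepSpeed's documented JSON config schema; the batch size and stage are illustrative choices, not a recommendation), a ZeRO config enabling CPU offload looks roughly like this:

```python
# Illustrative DeepSpeed config: ZeRO stage 2 with optimizer state
# offloaded to CPU RAM. Stage 3 can additionally offload parameters,
# and "device": "nvme" (plus an nvme_path) pushes state to disk.
ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # keep optimizer state in RAM
    },
}

# Typical usage (needs the deepspeed package and a torch model, so
# left commented here):
# import deepspeed
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config
# )
```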
-
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)
-
adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
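The "evasion" attacks ART covers perturb an input so a trained model misclassifies it. A toy pure-Python sketch of the idea behind FGSM-style evasion (not ART's API): for a linear scorer s(x) = w·x, the gradient of the score with respect to the input is just w, so nudging each feature by a small eps in the sign of w raises the score enough to flip a decision near the boundary.

```python
def score(x, w):
    """Linear decision score w . x; positive -> class 1, negative -> class 0."""
    return sum(xi * wi for xi, wi in zip(x, w))

def fgsm_step(x, w, eps):
    """Perturb x by eps in the sign of the score gradient (which is w here)."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign(wi) for xi, wi in zip(x, w)]

w = [0.5, -1.0, 2.0]
x = [1.0, 1.0, 0.1]              # score = -0.3 -> class 0
x_adv = fgsm_step(x, w, eps=0.3) # ~[1.3, 0.7, 0.4]; score ~ 0.75 -> class 1
```

Real attacks do the same thing against a neural network's input gradient; ART packages dozens of such attacks plus the corresponding defenses.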
-
optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
-
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
-
uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
-
GenossGPT
One API for all LLMs either Private or Public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace ...) 🌈🐂 Replace OpenAI GPT with any LLMs in your app with one line.
-
filetype.py
Small, dependency-free, fast Python package to infer binary file types by checking magic number signatures
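The magic-number technique the description refers to can be sketched in a few lines of plain Python (a toy illustration, not filetype.py's actual API): match the first bytes of a file against known signatures instead of trusting the extension.

```python
# Well-known leading-byte signatures mapped to MIME types.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]

def sniff(data: bytes):
    """Return a MIME type if the leading bytes match a known signature."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None

print(sniff(b"%PDF-1.7 rest of file"))  # application/pdf
print(sniff(b"hello"))                  # None
```

filetype.py maintains a much larger signature table and exposes convenience helpers, but the core check is this same prefix match.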
-
pinferencia
Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

DeepSpeed can handle parallelism concerns, and can even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.
Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30

I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
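The FPU effect the comment above describes is easy to demonstrate: floating-point addition is not associative, so two mathematically identical reductions can disagree in the last bits. Any change in summation order (batching, parallel reduction, kernel choice) can therefore shift logits slightly, which is the mechanism behind nondeterminism even at temp=0:

```python
# Float addition is not associative: the same three numbers summed in a
# different order give different doubles. The same effect, scaled up to
# large matmul reductions, perturbs model logits between runs.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False
```

When two token probabilities are nearly tied, such last-bit differences are enough to change the argmax, so greedy decoding diverges; the upcast-to-float32 fix mentioned in the vllm issue shrinks (but does not eliminate) this error.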
Project mention: Self-hosted offline transcription and diarization service with LLM summary | news.ycombinator.com | 2024-05-26

I've been using this:
https://github.com/bugbakery/transcribee
It's noticeably work-in-progress but it does the job and has a nice UI to edit transcriptions and speakers etc.
It's running on the CPU for me; it would be nice to have something that can make use of a 4GB Nvidia GPU, which faster-whisper is actually able to do [1]
https://github.com/SYSTRAN/faster-whisper?tab=readme-ov-file...
Project mention: Fast Llama 2 on CPUs with Sparse Fine-Tuning and DeepSparse | news.ycombinator.com | 2023-11-23

Interesting company. Yannic Kilcher interviewed Nir Shavit last year and they went into some depth: https://www.youtube.com/watch?v=0PAiQ1jTN5k DeepSparse is on GitHub: https://github.com/neuralmagic/deepsparse
Project mention: FastEmbed: Fast and Lightweight Embedding Generation for Text | dev.to | 2024-02-02Shout out to Huggingface's Optimum – which made it easier to quantize models.
Project mention: CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data | news.ycombinator.com | 2024-04-25question: any good on-device size image embedding models?
tried https://github.com/unum-cloud/uform which i do like, especially they also support languages other than English. Any recommendations on other alternatives?
Project mention: Drop-in replacement for the OpenAI API based on open source LLMs | news.ycombinator.com | 2024-01-17
Check out Hidet [1]. Not as well funded, but it delivers Python-based ML acceleration with GPU support (unlike Mojo).
[1] https://github.com/hidet-org/hidet
Python Inference related posts
-
Rete Algorithm
-
Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow
-
CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data
-
Multimodal Embeddings for JavaScript, Swift, and Python
-
FLaNK AI-April 22, 2024
-
Hugging Face reverts the license back to Apache 2.0
-
Apple Explores Home Robotics as Potential 'Next Big Thing'
Index
What are some of the best open-source Inference projects in Python? This list will help you:
| # | Project | Stars |
|---|---------|-------|
| 1 | ColossalAI | 38,081 |
| 2 | DeepSpeed | 33,122 |
| 3 | vllm | 20,017 |
| 4 | faster-whisper | 9,424 |
| 5 | text-generation-inference | 8,098 |
| 6 | server | 7,509 |
| 7 | adversarial-robustness-toolbox | 4,523 |
| 8 | torch2trt | 4,432 |
| 9 | open_model_zoo | 3,976 |
| 10 | AutoGPTQ | 3,906 |
| 11 | deepsparse | 2,902 |
| 12 | optimum | 2,225 |
| 13 | DeepSpeed-MII | 1,693 |
| 14 | transformer-deploy | 1,626 |
| 15 | budgetml | 1,334 |
| 16 | BERT-NER | 1,182 |
| 17 | uform | 913 |
| 18 | GenossGPT | 743 |
| 19 | hidet | 624 |
| 20 | filetype.py | 613 |
| 21 | pinferencia | 558 |
| 22 | fastT5 | 540 |
| 23 | emlearn | 446 |