Top 23 MLOps Open-Source Projects

Made-With-ML

51 36,087 6.8 Jupyter Notebook

Learn how to design, develop, deploy and iterate on production-grade ML applications.

Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13

Made With ML

Airflow

170 34,877 10.0 Python

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Project mention: AI Strategy Guide: How to Scale AI Across Your Business | dev.to | 2024-05-11

Level 1 of MLOps is when you've put each lifecycle stage and its interfaces in an automated pipeline. The pipeline could be a Python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and dvc also feature pipeline capabilities.
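
As a rough illustration of the "directed acyclic graph" idea in the excerpt above, the sketch below wires a few lifecycle stages together using only the Python standard library. The stage names, the toy data, and the `run_pipeline` helper are all hypothetical; a real orchestrator such as Airflow or dagster adds scheduling, retries, and monitoring on top of this basic pattern.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical lifecycle stages: each reads and extends a shared context dict.
def ingest(ctx):
    ctx["rows"] = [1.0, 2.0, 3.0, 4.0]
    return ctx


def featurize(ctx):
    ctx["features"] = [x * 2 for x in ctx["rows"]]
    return ctx


def train(ctx):
    # Stand-in "model": simply the mean of the features.
    ctx["model"] = sum(ctx["features"]) / len(ctx["features"])
    return ctx


def evaluate(ctx):
    # Largest absolute deviation of any feature from the "model".
    ctx["error"] = max(abs(f - ctx["model"]) for f in ctx["features"])
    return ctx


STAGES = {
    "ingest": ingest,
    "featurize": featurize,
    "train": train,
    "evaluate": evaluate,
}

# Each stage maps to the set of stages that must finish before it runs.
DEPENDENCIES = {
    "ingest": set(),
    "featurize": {"ingest"},
    "train": {"featurize"},
    "evaluate": {"train"},
}


def run_pipeline():
    """Run every stage in an order that respects DEPENDENCIES."""
    ctx = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        ctx = STAGES[name](ctx)
    return order, ctx


if __name__ == "__main__":
    order, ctx = run_pipeline()
    print(order)
    print(ctx["model"], ctx["error"])
```

Declaring dependencies as data, separate from the stage functions, is the design choice that lets an orchestration framework decide the execution order itself.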

jina

126 20,235 9.1 Python

☁️ Build multimodal AI applications with cloud-native stack

Project mention: Jina.ai: Self-host Multimodal models | news.ycombinator.com | 2024-01-26

vllm

31 20,017 9.9 Python

A high-throughput and memory-efficient inference and serving engine for LLMs

Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30

I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
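
The "FPU stuff" this comment refers to can be illustrated with a small sketch. Floating-point addition is not associative, so summing the same numbers in a different order can change the last bits of the result, and a near-tie between two candidate tokens can then resolve differently. The token names and scores below are made up for illustration; nothing here reproduces vllm or any GPT model.

```python
# The same three values summed in two different orders.
values = [0.1, 0.2, 0.3]
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# The two results differ in the last bit.
print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)


def greedy_pick(scores):
    """Return the highest-scoring token; ties go to the first entry."""
    return max(scores, key=scores.get)


# Hypothetical near-tie: token "b" has a fixed score, while token "a" has a
# score that depends on the order in which its terms were accumulated.
pick_one = greedy_pick({"b": 0.6, "a": left_to_right})
pick_two = greedy_pick({"b": 0.6, "a": right_to_left})
print(pick_one, pick_two)
```

Greedy decoding at temp=0 is an argmax of this kind, which is why a change in accumulation order only matters when two candidates are almost exactly tied.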

qdrant

142 18,326 9.9 Rust

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13

AgentCloud uses Qdrant as the vector store to efficiently store and manage large sets of vector embeddings. For a given user query the RAG application fetches relevant documents from vector store by analyzing how similar their vector representation is compared to the query vector.
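
The retrieval step described above can be sketched in plain Python. This is a toy brute-force version under stated assumptions: the document ids, the three-dimensional "embeddings", and the `top_k` helper are invented for illustration, and it does not use Qdrant's API. Production vector stores typically rely on approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    scored = [
        (cosine_similarity(query, vector), doc_id)
        for doc_id, vector in documents.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings; real ones come from an embedding model
# and have hundreds or thousands of dimensions.
DOCUMENTS = {
    "billing": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.0],
    "returns": [0.2, 0.8, 0.1],
}

if __name__ == "__main__":
    query_vector = [0.0, 1.0, 0.0]
    print(top_k(query_vector, DOCUMENTS, k=2))
```

The ids returned by `top_k` are what a RAG application would use to look up the document text passed to the language model as context.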

label-studio

50 16,886 9.8 JavaScript

Label Studio is a multi-type data labeling and annotation tool with standardized output format

Project mention: Annotation is dead | dev.to | 2024-04-26

If instead you have a cohort on hand (i.e., you do not want to send your data to a third party for any reason, or perhaps you have energetic undergrads), then you could alternatively consider local, open-source annotation such as CVAT and Label Studio. Finally, nowadays, you might instead work with Large Multimodal Models to have them annotate your data; more on this awkward angle later.

awesome-production-machine-learning

9 16,485 7.5

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

Project mention: Exploring Open-Source Alternatives to Landing AI for Robust MLOps | dev.to | 2023-12-13

One trove of treasures is the awesome-production-machine-learning repository on GitHub. This curated list provides a multitude of frameworks, libraries, and software designed to facilitate various stages of the ML lifecycle.

argo

43 14,415 9.8 Go

Workflow Engine for Kubernetes

Project mention: StackStorm – IFTTT for Ops | news.ycombinator.com | 2023-11-05

Like Argo Workflows?
https://github.com/argoproj/argo-workflows

nni

5 13,813 4.5 Python

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
awesome-mlops

24 11,865 5.2

A curated list of references for MLOps
dagster

47 10,468 10.0 Python

An orchestration platform for the development, production, and observation of data assets.

Project mention: AI Strategy Guide: How to Scale AI Across Your Business | dev.to | 2024-05-11

Level 1 of MLOps is when you've put each lifecycle stage and its interfaces in an automated pipeline. The pipeline could be a Python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and dvc also feature pipeline capabilities.

Weaviate

77 9,865 10.0 Go

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

Project mention: Weaviate – A cloud-native, open-source vector database | news.ycombinator.com | 2024-05-07

amazon-sagemaker-examples

17 9,587 8.9 Jupyter Notebook

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

Project mention: Thesis Project Help Using SageMaker Free Tier | /r/aws | 2023-09-23

I need to use AWS Sagemaker (required, can't use easier services) and my adviser gave me this document to start with: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb

great_expectations

15 9,567 9.9 Python

Always know what to expect from your data.
Kedro

29 9,409 9.7 Python

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

Project mention: Nextflow: Data-Driven Computational Pipelines | news.ycombinator.com | 2023-08-10

Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake, it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow layering Snakemake pipelines on top of each other using semi-automatic data annotations, similar to how it is done in kedro (https://github.com/kedro-org/kedro).

mlops-zoomcamp

23 10,365 7.5 Jupyter Notebook

Free MLOps course from DataTalks.Club

Project mention: Where do I start to learn MLOPS? | /r/mlops | 2023-07-01

There is the MLOps Zoomcamp course (which shows the end-to-end MLOps process with open-source MLOps tools): https://github.com/DataTalksClub/mlops-zoomcamp.

Taipy

16 9,282 9.9 Python

Turns Data and AI algorithms into production-ready web applications in no time.

Project mention: Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript | dev.to | 2024-04-26

Taipy is an open-source Python library that enables data scientists and developers to build robust end-to-end data pipelines.

machine-learning-systems-design

7 8,346 0.0 HTML

A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems"

Project mention: Any recent E4 Meta onsite experiences? | /r/leetcode | 2023-12-07

huyenchip.com/machine-learning-systems-design/toc.html - another nice but compact resource

wandb

16 8,354 9.9 Python

🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.

Project mention: A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev | dev.to | 2024-02-05

Weights & Biases — The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management. Free tier for personal projects only, with 100 GB of storage included.

deeplake

13 7,799 9.8 Python

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Project mention: FLaNK AI Weekly 25 March 2025 | dev.to | 2024-03-25

metaflow

24 7,688 9.2 Python

:rocket: Build and manage real-life ML, AI, and data science projects with ease!

Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05

BentoML

16 6,650 9.8 Python

The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!

Project mention: Who's hiring developer advocates? (December 2023) | dev.to | 2023-12-04

Link to GitHub -->

feast

8 5,312 9.5 Python

The Open Source Feature Store for Machine Learning

Project mention: What's Happening with Feast? | news.ycombinator.com | 2023-12-07

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

MLOps related posts

AI leaderboards are no longer useful. It's time to switch to Pareto curves
1 project | news.ycombinator.com | 30 Apr 2024

Building an Email Assistant Application with Burr
6 projects | dev.to | 26 Apr 2024

Show HN: Evaluate LLM-based RAG Applications with automated test set generation
1 project | news.ycombinator.com | 11 Apr 2024

Show HN: Starwhale – An open source MLOps/LLMOps Platform
1 project | news.ycombinator.com | 30 Jan 2024

VLLM Sacrifices Accuracy for Speed
1 project | news.ycombinator.com | 23 Jan 2024

Detect, Defend, Prevail: Payments Fraud Detection using ML & Deepchecks
1 project | dev.to | 13 Jan 2024

Show HN: One-click machine learning deployment at scale on any cluster
1 project | news.ycombinator.com | 10 Jan 2024

Index

What are some of the best open-source MLOps projects? This list will help you:

#	Project	Stars
1	Made-With-ML	36,087
2	Airflow	34,877
3	jina	20,235
4	vllm	20,017
5	qdrant	18,326
6	label-studio	16,886
7	awesome-production-machine-learning	16,485
8	argo	14,415
9	nni	13,813
10	awesome-mlops	11,865
11	dagster	10,468
12	Weaviate	9,865
13	amazon-sagemaker-examples	9,587
14	great_expectations	9,567
15	Kedro	9,409
16	mlops-zoomcamp	10,365
17	Taipy	9,282
18	machine-learning-systems-design	8,346
19	wandb	8,354
20	deeplake	7,799
21	metaflow	7,688
22	BentoML	6,650
23	feast	5,312
