Top 23 MLOps Open-Source Projects
-
qdrant
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
-
label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
-
awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
-
nni
An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
-
Weaviate
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
-
amazon-sagemaker-examples
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
-
machine-learning-systems-design
A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems"
-
wandb
🔥 A tool for visualizing and tracking your machine learning experiments. This repo contains the CLI and Python API.
-
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
-
BentoML
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13
Made With ML
Level 1 of MLOps is when you've put each lifecycle stage and its interfaces in an automated pipeline. The pipeline could be a Python or Bash script, or it could be a directed acyclic graph run by an orchestration framework such as Airflow, dagster, or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML, and dvc also feature pipeline capabilities.
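As a rough illustration of the "pipeline as a Python script" end of that spectrum, here is a minimal sketch in which each lifecycle stage is a plain function and one entry point runs them in order. The stage names, the toy data, and the one-parameter model are all hypothetical stand-ins, not taken from any of the projects listed here.

```python
"""Toy pipeline: every lifecycle stage is a function, one script chains them."""


def load_data():
    # Hypothetical stand-in for reading from a warehouse or feature store.
    return [(float(x), 2.0 * x) for x in range(1, 6)]


def preprocess(rows):
    # Drop rows with missing values; the stage has a clear input/output interface.
    return [(x, y) for x, y in rows if x is not None and y is not None]


def train(rows):
    # Least-squares slope for the model y = w * x (no intercept).
    numerator = sum(x * y for x, y in rows)
    denominator = sum(x * x for x, _ in rows)
    return {"w": numerator / denominator}


def evaluate(model, rows):
    # Mean squared error of the fitted model on the given rows.
    errors = [(model["w"] * x - y) ** 2 for x, y in rows]
    return {"mse": sum(errors) / len(errors)}


def run_pipeline():
    # The automated part: stages run in a fixed order with no manual hand-offs.
    rows = preprocess(load_data())
    model = train(rows)
    metrics = evaluate(model, rows)
    return model, metrics


if __name__ == "__main__":
    print(run_pipeline())
```

An orchestration framework would typically wrap the same stage functions as tasks in a graph, adding scheduling, retries, and logging around them.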
Project mention: AI leaderboards are no longer useful. It's time to switch to Pareto curves | news.ycombinator.com | 2024-04-30
I guess the root cause of my claim is that OpenAI won't tell us whether or not GPT-3.5 is an MoE model, and I assumed it wasn't. Since GPT-3.5 is clearly nondeterministic at temp=0, I believed the nondeterminism was due to FPU stuff, and this effect was amplified with GPT-4's MoE. But if GPT-3.5 is also MoE then that's just wrong.
What makes this especially tricky is that small models are truly 100% deterministic at temp=0 because the relative likelihoods are too coarse for FPU issues to be a factor. I had thought 3.5 was big enough that some of its token probabilities were too fine-grained for the FPU. But that's probably wrong.
On the other hand, it's not just GPT, there are currently floating-point difficulties in vllm which significantly affect the determinism of any model run on it: https://github.com/vllm-project/vllm/issues/966 Note that a suggested fix is upcasting to float32. So it's possible that GPT-3.5 is using an especially low-precision float and introducing nondeterminism by saving money on compute costs.
Sadly I do not have the money[1] to actually run a test to falsify any of this. It seems like this would be a good little research project.
[1] Or the time, or the motivation :) But this stuff is expensive.
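The floating-point effect the comment refers to can be seen in a few lines. The sketch below is only an illustration of non-associative float addition and of how a near-tie can flip a greedy (temp=0) choice; the numbers are made up, and it says nothing about how GPT-3.5 or vLLM actually compute logits.

```python
from functools import reduce


def greedy_pick(logits):
    # Index of the largest logit; ties go to the lowest index.
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical partial sums that make up one token's logit.
contributions = [0.1, 0.2, 0.3]

# Same numbers, two accumulation orders (as could happen across
# different kernels or batch layouts).
forward = reduce(lambda a, b: a + b, contributions)
backward = reduce(lambda a, b: a + b, reversed(contributions))

# A competing token whose logit sits right at the tie point.
other_logit = 0.6

pick_forward = greedy_pick([other_logit, forward])
pick_backward = greedy_pick([other_logit, backward])

print("sums equal:", forward == backward)
print("greedy picks:", pick_forward, pick_backward)
```

Lower-precision formats have coarser rounding steps, which is consistent with the comment's note that upcasting to float32 was suggested as a fix.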
Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13
AgentCloud uses Qdrant as the vector store to efficiently store and manage large sets of vector embeddings. For a given user query, the RAG application fetches relevant documents from the vector store by comparing their vector representations with the query vector.
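The retrieval step described above can be sketched with a toy in-memory store. This deliberately does not use Qdrant's client API; the document IDs, the three-dimensional embeddings, and the `top_k` helper are hypothetical, and a real vector database would use an approximate nearest-neighbour index rather than a full scan.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    # Rank every stored vector by similarity to the query; keep the best k IDs.
    ranked = sorted(store.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical embeddings; real ones come from an embedding model.
store = {
    "postgres setup": [1.0, 0.0, 0.0],
    "vector search": [0.0, 1.0, 0.0],
    "chat ui": [0.0, 0.0, 1.0],
}

query_vector = [0.1, 0.9, 0.0]
print(top_k(query_vector, store))
```

The retrieved documents would then be passed to the language model as context for answering the user's query.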
If instead you have a cohort on hand (i.e., you do not want to send your data to a third party for any reason, or perhaps you have energetic undergrads), then you could alternatively consider local, open-source annotation such as CVAT and Label Studio. Finally, nowadays, you might instead work with Large Multimodal Models to have them annotate your data; more on this awkward angle later.
Project mention: Exploring Open-Source Alternatives to Landing AI for Robust MLOps | dev.to | 2023-12-13
One trove of treasures is the awesome-production-machine-learning repository on GitHub. This curated list provides a multitude of frameworks, libraries, and software designed to facilitate various stages of the ML lifecycle.
Like Argo Workflows?
https://github.com/argoproj/argo-workflows
Project mention: Weaviate – A cloud-native, open-source vector database | news.ycombinator.com | 2024-05-07
I need to use AWS Sagemaker (required, can't use easier services) and my adviser gave me this document to start with: https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/question_answering_retrieval_augmented_generation/question_answering_langchain_jumpstart.ipynb
Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake, it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow to lay Snakemake pipelines on top of each other using semi-automatic data annotations similar to how it is done in kedro (https://github.com/kedro-org/kedro).
There is the MLOps Zoomcamp course, which shows an end-to-end MLOps process with open-source MLOps tools: https://github.com/DataTalksClub/mlops-zoomcamp
Project mention: Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript | dev.to | 2024-04-26
Taipy is an open-source Python library that enables data scientists and developers to build robust end-to-end data pipelines.
huyenchip.com/machine-learning-systems-design/toc.html - another nice but compact resource
Project mention: A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev | dev.to | 2024-02-05
Weights & Biases: The developer-first MLOps platform. Build better models faster with experiment tracking, dataset versioning, and model management. Free tier for personal projects only, with 100 GB of storage included.
MLOps related posts
-
AI leaderboards are no longer useful. It's time to switch to Pareto curves
-
Building an Email Assistant Application with Burr
-
Show HN: Evaluate LLM-based RAG Applications with automated test set generation
-
Show HN: Starwhale – An open source MLOps/LLMOps Platform
-
VLLM Sacrifices Accuracy for Speed
-
Detect, Defend, Prevail: Payments Fraud Detection using ML & Deepchecks
-
Show HN: One-click machine learning deployment at scale on any cluster
-
Index
What are some of the best open-source MLOps projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | Made-With-ML | 36,087 |
2 | Airflow | 34,877 |
3 | jina | 20,235 |
4 | vllm | 20,017 |
5 | qdrant | 18,326 |
6 | label-studio | 16,886 |
7 | awesome-production-machine-learning | 16,485 |
8 | argo | 14,415 |
9 | nni | 13,813 |
10 | awesome-mlops | 11,865 |
11 | dagster | 10,468 |
12 | Weaviate | 9,865 |
13 | amazon-sagemaker-examples | 9,587 |
14 | great_expectations | 9,567 |
15 | Kedro | 9,409 |
16 | mlops-zoomcamp | 10,365 |
17 | Taipy | 9,282 |
18 | machine-learning-systems-design | 8,346 |
19 | wandb | 8,354 |
20 | deeplake | 7,799 |
21 | metaflow | 7,688 |
22 | BentoML | 6,650 |
23 | feast | 5,312 |