Top 23 ML Open-Source Projects
-
dopamine
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
-
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba.
-
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
-
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Project mention: Rebuilding TensorFlow 2.8.4 on Ubuntu 22.04 to patch vulnerabilities | dev.to | 2024-06-02
The official 2.8.4 container was published in Nov 2022. That's 1.5 years of OS updates at least. I looked up the 2.8.4 source and found that it's using Ubuntu 20.04 as the base OS. Of note, we're using the x86_64 architecture according to the container image layer: `ENV NVARCH=x86_64`.
- https://github.com/microsoft/ML-For-Beginners
Also check out this list Pitt puts out every year:
Docs
Use handy visualizers, for example, https://netron.app/
Project mention: How to build your Developer Portfolio with MindsDB: The symbiotic relationship between developers and Opensource in 2024. | dev.to | 2024-05-23
Developers can check for issues to fix on MindsDB's GitHub Issues page. The issues are marked with labels that indicate what you can work on, which you can find here. Fixing bugs shows that you are a problem solver who can resolve issues. Companies value this capability because it affects the quality of their product and the user experience.
Project mention: Mlflow: Open-source platform for the machine learning lifecycle | news.ycombinator.com | 2024-05-16
Project mention: The Era of 1-bit LLMs: ternary parameters for cost-effective computing | news.ycombinator.com | 2024-02-28
https://github.com/Stability-AI/StableLM?tab=readme-ov-file#...
Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07
This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.
Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?
Would love to see more progress toward this area!
Project mention: [D][R] Deploying deep models on memory constrained devices | /r/MachineLearning | 2023-10-03
However, I am looking at this subject through the problem of training/fine-tuning deep models on edge devices, which is becoming increasingly feasible. Looking at tflite, Alibaba's MNN, mit-han-lab's tinyengine, etc.
Be careful with unstructured:
https://github.com/Unstructured-IO/unstructured/blob/d11c70c...
from: https://github.com/open-webui/open-webui/issues/687
Yet another TEDIOUS BATTLE: Python vs. C++/C stack.
This project gained popularity due to the HIGH DEMAND for running large models with 1B+ parameters, like `llama`. Python dominates the interface and training ecosystem, but prior to llama.cpp, non-ML professionals showed little interest in a fast C++ interface library. While existing solutions like tensorflow-serving [1] in C++ were sufficiently fast with GPU support, llama.cpp took the initiative to optimize for CPU and trim unnecessary code, essentially code-golfing and sacrificing some algorithm correctness for improved performance, which isn't favored by "ML research".
NOTE: In my opinion, a true pioneer was DarkNet, which implemented the YOLO model series and significantly outperformed others [2]. It used basically the same trick as llama.cpp.
[1] https://github.com/tensorflow/serving
Project mention: Open-sourcing a simple automation/agent workflow builder | /r/ChatGPTPro | 2023-10-07
We're open-sourcing a project that lets you build simple automations/agent workflows that use LLMs for different tasks. Kinda like Zapier or IFTTT but focused on using natural language to accomplish your tasks.
It's super early but we'd love to start getting feedback to steer it in the right direction. It currently supports OpenAI and local models through llm.
ML related posts
-
How to build your Developer Portfolio with MindsDB: The symbiotic relationship between developers and Opensource in 2024.
-
Mlflow: Open-source platform for the machine learning lifecycle
-
Show HN: LLM-powered NPCs running on your hardware
-
Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations
-
Machine Learning with PHP
-
Show HN: Open-source Google Docs for audio transcriptions (Whisper)
-
What’s the Difference Between Fine-tuning, Retraining, and RAG?
Index
What are some of the best open-source ML projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | tensorflow | 183,162 |
2 | ML-For-Beginners | 67,497 |
3 | yolov5 | 47,719 |
4 | netron | 26,489 |
5 | handson-ml | 25,111 |
6 | MindsDB | 21,531 |
7 | MLflow | 17,475 |
8 | StableLM | 15,859 |
9 | best-of-ml-python | 15,869 |
10 | kubeflow | 13,778 |
11 | awesome-mlops | 11,865 |
12 | ludwig | 10,893 |
13 | dopamine | 10,397 |
14 | ML.NET | 8,879 |
15 | pycaret | 8,553 |
16 | MNN | 8,373 |
17 | deeplake | 7,799 |
18 | metaflow | 7,688 |
19 | unstructured | 7,017 |
20 | CoreML-Models | 6,274 |
21 | serving | 6,101 |
22 | llm | 5,980 |
23 | oneflow | 5,759 |