Top 23 scikit-learn Open-Source Projects
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
-
python-machine-learning-book
The "Python Machine Learning (1st edition)" book code repository and info resource
-
machine_learning_complete
A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.
-
superduperdb
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
-
python-machine-learning-book-3rd-edition
The "Python Machine Learning (3rd edition)" book code repository
- https://github.com/microsoft/ML-For-Beginners
Also check out this list Pitt puts out every year:
5. Avik-Jain/100-Days-Of-ML-Code - As the name implies, this repository offers a structured approach to learning machine learning with Python. It covers core ML principles and algorithms through real-world applications. https://github.com/Avik-Jain/100-Days-Of-ML-Code
Project mention: About Data analyst, data scientist and data engineer, resources and experiences | dev.to | 2024-03-26
Python Data Science Handbook
Project mention: New exponent functions that make SiLU and SoftMax 2x faster, at full acc | news.ycombinator.com | 2024-05-15
Project mention: Featuretools – A Python Library for Automated Feature Engineering | news.ycombinator.com | 2023-09-20
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
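The comment above does not say how to deal with null characters, so the following is only a minimal sketch of one way to clean a corpus before loading it into a tool such as Orange. It assumes the corpus is available as a list of Python strings; `strip_null_chars` is a hypothetical helper name for illustration and is not part of Orange.

```python
def strip_null_chars(documents):
    """Return a copy of the documents with every NUL character removed."""
    # str.replace removes all occurrences of "\x00" in each document.
    return [doc.replace("\x00", "") for doc in documents]


# Hypothetical sample corpus containing embedded NUL characters.
corpus = ["clean text", "bro\x00ken text", "\x00"]
cleaned = strip_null_chars(corpus)
print(cleaned)
```

Removing the character outright is the simplest choice; replacing it with a space may be preferable if the NUL was acting as a separator in the source data.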
Project mention: Pyenv – lets you easily switch between multiple versions of Python | news.ycombinator.com | 2024-03-25
We use Pyenv successfully for developing the Flower open-source project. We use a few simple Bash scripts to manage virtual environments with different Python versions via pyenv and the pyenv-virtualenv plugin.
The main scripts are `venv-create.sh`, `venv-delete.sh` and `bootstrap.sh`. `venv-reset.sh` pulls these three scripts together to make reinstalling your venv a single command.
Here's the link if anyone is interested: https://github.com/adap/flower/tree/main/dev
I really like the simplicity of this framework, and they hit on a lot of common problems found in other agent-based frameworks. Most intrigued by the RAG improvements.
Seems like Microsoft was frustrated with the pace of movement in this space and the shitty results of agents (which admittedly kept my interest turned away from agents for the last few months). I'm interested again because it makes practical sense, and from looking at the example notebooks, seems fairly easy to integrate into existing applications.
Maybe this is the 'low code' approach that might actually work, and bridge together engineering and non-engineering resources.
This example was what caught my eye: https://github.com/microsoft/FLAML/blob/main/notebook/autoge...
scikit-learn related posts
-
About Data analyst, data scientist and data engineer, resources and experiences
-
Show HN: Logistic Regression Training on Encrypted Data with FHE
-
Implementing a ChatGPT-like LLM from scratch, step by step
-
Training ML Models on Encrypted Data with Homomorphic Encryption (FHE)
-
AlphaPy: machine learning framework built on sklearn and pandas. Supports pyfolio/xgboost/lightgbm/catboost (gradient boosting on decision trees) etc. Examples include financial market prediction/sports prediction/kaggle. Configurations are set though
-
Tradero: A tool for achieving self-funding via trading
-
Scikit-learn Stock Prediction: using fundamental and pricing data to predict future stock returns. Sklearn's random forest classifier is trained, and the author claimed positive live trading results. Not actively maintained. Other Models - star count: 1520
Index
What are some of the best open-source scikit-learn projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | ML-For-Beginners | 67,267 |
2 | 100-Days-Of-ML-Code | 43,599 |
3 | PythonDataScienceHandbook | 41,635 |
4 | data-science-ipython-notebooks | 26,532 |
5 | handson-ml | 25,099 |
6 | best-of-ml-python | 15,672 |
7 | onnxruntime | 12,894 |
8 | python-machine-learning-book | 12,076 |
9 | Dask | 12,055 |
10 | mlcourse.ai | 9,454 |
11 | sktime | 7,454 |
12 | auto-sklearn | 7,422 |
13 | autogluon | 7,181 |
14 | featuretools | 7,064 |
15 | interpret | 6,022 |
16 | skorch | 5,648 |
17 | orange | 4,626 |
18 | machine_learning_complete | 4,520 |
19 | superduperdb | 4,415 |
20 | python-machine-learning-book-3rd-edition | 4,386 |
21 | flower | 4,251 |
22 | yellowbrick | 4,206 |
23 | FLAML | 3,701 |