Top 23 Data Science Open-Source Projects
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
-
Ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
-
pytorch-lightning
Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
-
Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
-
applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
-
awesome-datascience
:memo: An awesome Data Science repository to learn and apply for real world problems.
-
ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
-
d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
- https://github.com/microsoft/ML-For-Beginners
Also check out this list Pitt puts out every year:
Keras
Superset is absolutely phenomenal. I really hope Microsoft someday releases all of the customizations they made to it internally to the open-source community.
https://www.youtube.com/watch?v=RY0SSvSUkMA
https://github.com/apache/superset/discussions/20094
Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09
Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).
Project mention: AWS Serverless Diversity: Multi-Language Strategies for Optimal Solutions | dev.to | 2024-04-28
Python is a natural fit for serverless development. It boasts a vast array of libraries, including Powertools for AWS and robust libraries for data engineers. Its versatility and excellent developer experience make it a top choice for serverless projects, offering a seamless and enjoyable development experience.
Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13
Made With ML
Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12
Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
Project mention: Creating Automatic Subtitles for Videos with Python, Faster-Whisper, FFmpeg, Streamlit, Pillow | dev.to | 2024-04-29
Streamlit (https://streamlit.io/)
22. Ray | Github | tutorial
Gradio is a package developed to ease the development of app interfaces in Python and other languages (GitHub)
Project mention: Step by step guide to create customized chatbot by using spaCy (Python NLP library) | dev.to | 2024-03-23
Hi Community, in this article I will demonstrate the steps below to create your own chatbot using spaCy (spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython):
**[I.am.ai AI Expert Roadmap](https://i.am.ai/roadmap)**: This roadmap focuses more on AI and includes various aspects of machine learning and deep learning. It's suitable for those who want to delve deeper into AI, particularly in cutting-edge research and applications.
Project mention: SB-1047 will stifle open-source AI and decrease safety | news.ycombinator.com | 2024-04-29
It's very easy to get started, right in your Terminal, no fees! No credit card at all.
And there are cloud providers like https://replicate.com/ and https://lightning.ai/ that will let you use your LLM via an API key just like you did with OpenAI if you need that.
You don't need OpenAI - nobody does.
Get started with Data Science in the Data Science for Beginners curricula.
Project mention: Probabilistic Programming and Bayesian Methods for Hackers (2013) | news.ycombinator.com | 2024-02-10
Project mention: About Data analyst, data scientist and data engineer, resources and experiences | dev.to | 2024-03-26
Awesome Data Science by Academic
Project mention: The fastai book, published as Jupyter Notebooks | news.ycombinator.com | 2024-01-17
Project mention: How and where is matplotlib package making use of PySide? | /r/learnpython | 2023-12-07
Data Science related posts
-
Cold-(Brew) Outreach: Landing my first big client at a coffee shop
-
PySheets – Spreadsheet UI for Python
-
Building an Email Assistant Application with Burr
-
My Favorite DevTools to Build AI/ML Applications!
-
Release: Keras 3.3.0
-
Runhouse
-
Frawk: An efficient Awk-like programming language. (2021)
-
Index
What are some of the best open-source Data Science projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | ML-For-Beginners | 66,908 |
2 | Keras | 60,937 |
3 | superset | 58,852 |
4 | scikit-learn | 58,130 |
5 | Pandas | 41,983 |
6 | Made-With-ML | 35,656 |
7 | Airflow | 34,485 |
8 | streamlit | 31,717 |
9 | Ray | 31,101 |
10 | gradio | 28,987 |
11 | spaCy | 28,751 |
12 | AI-Expert-Roadmap | 28,418 |
13 | pytorch-lightning | 26,883 |
14 | data-science-ipython-notebooks | 26,490 |
15 | Data-Science-For-Beginners | 26,392 |
16 | Probabilistic-Programming-and-Bayesian-Methods-for-Hackers | 26,362 |
17 | applied-ml | 25,984 |
18 | awesome-datascience | 23,777 |
19 | ML-From-Scratch | 23,189 |
20 | d2l-en | 21,704 |
21 | fastbook | 20,749 |
22 | dash | 20,502 |
23 | matplotlib | 19,262 |