Python Data Science

Open-source Python projects categorized as Data Science

Top 23 Python Data Science Projects

  • Keras

    Deep Learning for humans

  • Project mention: Library for Machine learning and quantum computing | dev.to | 2024-04-27

    Keras

  • scikit-learn

    scikit-learn: machine learning in Python

  • Project mention: How to Build a Logistic Regression Model: A Spam-filter Tutorial | dev.to | 2024-05-05

    Online Courses:
      Coursera: "Machine Learning" by Andrew Ng
      edX: "Introduction to Machine Learning" by MIT

    Tutorials:
      Scikit-learn documentation: https://scikit-learn.org/
      Kaggle Learn: https://www.kaggle.com/learn

    Books:
      "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" by Aurélien Géron
      "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

    By understanding the core concepts of logistic regression and its limitations, and by exploring further resources, you'll be well-equipped to navigate the exciting world of machine learning!

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

  • Project mention: The ultimate guide to creating a secure Python package | dev.to | 2024-05-08

    It's also possible for you to give a package an alias by using the as keyword. For instance, you could use the pandas package as pd like this:
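    A minimal sketch of the aliasing described in this excerpt. The standard-library `statistics` module and the alias `stats` are stand-ins chosen so the example runs without extra installs; the pandas import is guarded because pandas may not be present.

```python
# The `as` keyword binds an imported module to a shorter local name.
import statistics as stats  # stand-in module; the alias name is arbitrary

values = [2, 4, 6]
mean_value = stats.mean(values)  # identical to calling statistics.mean(values)
print(mean_value)

# The pandas form mentioned in the excerpt follows the same pattern.
try:
    import pandas as pd  # conventional alias for pandas
    print(pd.Series(values).mean())
except ImportError:
    print("pandas is not installed; the alias syntax is unchanged")
```

    The alias only changes the local name; the module object itself is the same one a plain `import` would return.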

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

  • Project mention: AI Strategy Guide: How to Scale AI Across Your Business | dev.to | 2024-05-11

    Level 1 of MLOps is when you've put each lifecycle stage and its interfaces in an automated pipeline. The pipeline could be a Python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and dvc also feature pipeline capabilities.
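    The idea of a pipeline expressed as a directed acyclic graph can be sketched with the standard library alone. This is an illustration, not how Airflow or dagster work internally; the stage names and the `run_stage` and `run_pipeline` helpers are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical lifecycle stages; each key maps to the set of stages it depends on.
pipeline = {
    "ingest": set(),
    "validate": {"ingest"},
    "train": {"validate"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def run_stage(name, log):
    """Placeholder for real work; here it only records that the stage ran."""
    log.append(name)


def run_pipeline(dag):
    """Run every stage after all of its dependencies have run."""
    log = []
    for stage in TopologicalSorter(dag).static_order():
        run_stage(stage, log)
    return log


executed = run_pipeline(pipeline)
print(executed)
```

    An orchestration framework adds scheduling, retries, and monitoring on top of this ordering step; the sketch covers only the dependency resolution.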

  • streamlit

    Streamlit — A faster way to build and share data apps.

  • Project mention: Developing a Generic Streamlit UI to Test Amazon Bedrock Agents | dev.to | 2024-05-05

    I decided to use Streamlit to build the UI as it is a popular and fitting choice. Streamlit is an open-source Python library used for building interactive web applications, especially for AI and data applications. Since the application code is written only in Python, it is easy to learn and build with.

  • Ray

    Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

  • Project mention: Ray: Unified framework for scaling AI and Python applications | news.ycombinator.com | 2024-05-03
  • gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

  • Project mention: AI enthusiasm #9 - A multilingual chatbot📣🈸 | dev.to | 2024-05-01

    gradio is a package developed to ease the development of app interfaces in python and other languages (GitHub)

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

  • Project mention: How I discovered Named Entity Recognition while trying to remove gibberish from a string. | dev.to | 2024-05-06
  • pytorch-lightning

    Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

  • Project mention: SB-1047 will stifle open-source AI and decrease safety | news.ycombinator.com | 2024-04-29

    It's very easy to get started, right in your Terminal, no fees! No credit card at all.

    And there are cloud providers like https://replicate.com/ and https://lightning.ai/ that will let you use your LLM via an API key just like you did with OpenAI if you need that.

    You don't need OpenAI - nobody does.

  • data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • ML-From-Scratch

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

  • d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  • dash

    Data Apps & Dashboards for Python. No JavaScript Required.

  • Project mention: dash VS solara - a user suggested alternative | libhunt.com/r/dash | 2023-10-13
  • matplotlib

    matplotlib: plotting with Python

  • Project mention: How and where is matplotlib package making use of PySide? | /r/learnpython | 2023-12-07
  • recommenders

    Best Practices on Recommendation Systems

  • Project mention: My kernel dies when I fit my LightFm model from Microsoft Recommenders | /r/Jupyter | 2023-06-16
  • ipython

    Official repository for IPython itself. Other repos in the IPython organization contain things like the website, documentation builds, etc.

  • Project mention: The new pdbp (Pdb+) Python debugger! | dev.to | 2023-08-02

    If you’re already using ipython, this isn’t a problem because you’ll already need to download most of these dependencies anyway. But if you’re not using ipython… you’ll still need to download those dependencies.

  • best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

  • gensim

    Topic Modelling for Humans

  • Project mention: Aggregating news from different sources | /r/learnprogramming | 2023-07-08
  • Prefect

    The easiest way to build, run, and monitor data pipelines at scale.

  • Project mention: Prefect: A workflow orchestration tool for data pipelines | news.ycombinator.com | 2024-03-13
  • nni

    An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

  • dvc

    🦉 ML Experiments and Data Management with Git

  • Project mention: AI Strategy Guide: How to Scale AI Across Your Business | dev.to | 2024-05-11

    Level 1 of MLOps is when you've put each lifecycle stage and its interfaces in an automated pipeline. The pipeline could be a Python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and dvc also feature pipeline capabilities.

  • ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

  • Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26
  • seaborn

    Statistical data visualization in Python

  • Project mention: "No" is not an actionable error message | news.ycombinator.com | 2024-05-03
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Data Science related posts

  • Lessons learned reinventing the Python notebook

    3 projects | news.ycombinator.com | 11 May 2024
  • AI Strategy Guide: How to Scale AI Across Your Business

    4 projects | dev.to | 11 May 2024
  • How I discovered Named Entity Recognition while trying to remove gibberish from a string.

    1 project | dev.to | 6 May 2024
  • How to Build a Logistic Regression Model: A Spam-filter Tutorial

    1 project | dev.to | 5 May 2024
  • "No" is not an actionable error message

    1 project | news.ycombinator.com | 3 May 2024
  • Cold-(Brew) Outreach: Landing my first big client at a coffee shop

    1 project | news.ycombinator.com | 30 Apr 2024
  • PySheets – Spreadsheet UI for Python

    3 projects | news.ycombinator.com | 28 Apr 2024

Index

What are some of the best open-source Data Science projects in Python? This list will help you:

Rank  Project                         Stars
1     Keras                           61,044
2     scikit-learn                    58,265
3     Pandas                          42,104
4     Airflow                         34,705
5     streamlit                       32,051
6     Ray                             31,414
7     gradio                          29,400
8     spaCy                           28,849
9     pytorch-lightning               27,064
10    data-science-ipython-notebooks  26,532
11    ML-From-Scratch                 23,260
12    d2l-en                          21,858
13    dash                            20,583
14    matplotlib                      19,382
15    recommenders                    18,085
16    ipython                         16,146
17    best-of-ml-python               15,633
18    gensim                          15,289
19    Prefect                         14,780
20    nni                             13,777
21    dvc                             13,189
22    ydata-profiling                 12,085
23    seaborn                         11,994
