Data Analysis

Top 23 Data Analysis Open-Source Projects

  • superset

    Apache Superset is a Data Visualization and Data Exploration Platform

  • Project mention: Show HN: Open-source BI and analytics for engineers | news.ycombinator.com | 2024-05-15

    We are looking at moving our Power BI stuff to Apache Superset [1]. How does this compare to Superset?

    [1] https://superset.apache.org/

  • scikit-learn

    scikit-learn: machine learning in Python

  • Project mention: How to Build a Logistic Regression Model: A Spam-filter Tutorial | dev.to | 2024-05-05

    Online Courses:
      Coursera: "Machine Learning" by Andrew Ng
      edX: "Introduction to Machine Learning" by MIT
    Tutorials:
      Scikit-learn documentation: https://scikit-learn.org/
      Kaggle Learn: https://www.kaggle.com/learn
    Books:
      "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" by Aurélien Géron
      "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

    By understanding the core concepts of logistic regression, its limitations, and exploring further resources, you'll be well-equipped to navigate the exciting world of machine learning!

  • Pandas

    Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

  • Project mention: The ultimate guide to creating a secure Python package | dev.to | 2024-05-08

    It's also possible for you to give a package an alias by using the as keyword. For instance, you could use the pandas package as pd like this:
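A minimal sketch of the aliasing described in this mention. For pandas the conventional form is `import pandas as pd`; the standard-library `statistics` module is used here as a stand-in (an assumption made for illustration, so the snippet runs without installing anything), and the names `values` and `mean_value` are hypothetical.

```python
# The `as` keyword binds an imported module to a shorter local name.
# For pandas, the conventional form is:
#     import pandas as pd
# The stdlib `statistics` module stands in here so the sketch runs
# without third-party installs (an assumption made for illustration).
import statistics as st

values = [1, 2, 3, 4]
mean_value = st.mean(values)  # identical to calling statistics.mean(values)
print(mean_value)
```

The alias only changes the local name; `st` still refers to the same module object that a plain `import statistics` would load.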

  • Metabase

    The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:

  • Project mention: HackTheBox - Writeup Analytics | dev.to | 2024-03-30

    Remote Code Execution via H2

  • streamlit

    Streamlit — A faster way to build and share data apps.

  • Project mention: Developing a Generic Streamlit UI to Test Amazon Bedrock Agents | dev.to | 2024-05-05

    I decided to use Streamlit to build the UI as it is a popular and fitting choice. Streamlit is an open-source Python library for building interactive web applications, especially AI and data applications. Since the application code is written only in Python, it is easy to learn and build with.

  • gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

  • Project mention: AI enthusiasm #9 - A multilingual chatbot📣🈸 | dev.to | 2024-05-01

    gradio is a package developed to ease the development of app interfaces in Python and other languages (GitHub)

  • AI-Expert-Roadmap

    Roadmap to becoming an Artificial Intelligence Expert in 2022

  • Project mention: Best AI ML DL DS Roadmap | /r/deeplearning | 2023-12-07

    I.am.ai AI Expert Roadmap (https://i.am.ai/roadmap): This roadmap focuses more on AI and includes various aspects of machine learning and deep learning. It's suitable for those who want to delve deeper into AI, particularly in cutting-edge research and applications.

  • Data-Science-For-Beginners

    10 Weeks, 20 Lessons, Data Science for All!

  • Project mention: Welcome to 14 days of Data Science! | dev.to | 2024-03-07

    Get started with Data Science in the Data Science for Beginners curricula.

  • CyberChef

    The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis

  • Project mention: PicoCTF 2024: packer | dev.to | 2024-04-05

    Then we take the encrypted text and use CyberChef to decrypt it.

  • GoAccess

    GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.

  • Project mention: Ask HN: Interesting TUIs (text user interfaces), maybe forgotten ones? | news.ycombinator.com | 2024-05-06

    Not forgotten by any means but goaccess is nice and simple to use

    https://goaccess.io/

  • best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

  • Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13

    AgentCloud uses Airbyte to build data pipelines, which allow us to split, chunk, and embed data from over 300 data sources, including Postgres.

  • ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

  • Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26
  • pandas-ai

    Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

  • Project mention: PandasAI is great but is there a more general library? | news.ycombinator.com | 2023-08-23
  • OpenRefine

    OpenRefine is a free, open source power tool for working with messy data and improving it

  • Project mention: Ask HN: What Underrated Open Source Project Deserves More Recognition? | news.ycombinator.com | 2024-03-07

    "OpenRefine is a powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data." https://openrefine.org/

  • pandas_exercises

    Practice your pandas skills!

  • pygwalker

    PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

  • Project mention: Show HN: Use an "eraser" to clean data on flight without breaking your workflow | news.ycombinator.com | 2024-03-15
  • statsmodels

    Statsmodels: statistical modeling and econometrics in Python

  • mlcourse.ai

    Open Machine Learning Course

  • Project mention: Open Machine Learning Course | news.ycombinator.com | 2023-10-22
  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05

    We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.

  • akshare

    AKShare is an elegant and simple financial data interface library for Python, built for human beings! 开源财经数据接口库 (by akfamily)

  • pyod

    A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)

  • Project mention: A Comprehensive Guide for Building Rag-Based LLM Applications | news.ycombinator.com | 2023-09-13

    This is a feature in many commercial products already, as well as open source libraries like PyOD. https://github.com/yzhao062/pyod

  • cudf

    cuDF - GPU DataFrame Library

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Data Analysis related posts

  • Ask HN: Why all these GitHub fake accounts starring my project

    1 project | news.ycombinator.com | 9 May 2024
  • The Birth of Parquet

    3 projects | news.ycombinator.com | 8 May 2024
  • The ultimate guide to creating a secure Python package

    4 projects | dev.to | 8 May 2024
  • How to Build a Logistic Regression Model: A Spam-filter Tutorial

    1 project | dev.to | 5 May 2024
  • PDEP-13: The Pandas Logical Type System

    1 project | news.ycombinator.com | 4 May 2024
  • Cold-(Brew) Outreach: Landing my first big client at a coffee shop

    1 project | news.ycombinator.com | 30 Apr 2024
  • Using DuckDB-WASM for In-Browser Data Engineering

    1 project | news.ycombinator.com | 30 Apr 2024

Index

What are some of the best open-source Data Analysis projects? This list will help you:

#    Project                      Stars
1    superset                     59,473
2    scikit-learn                 58,344
3    Pandas                       42,104
4    Metabase                     36,784
5    streamlit                    32,222
6    gradio                       29,400
7    AI-Expert-Roadmap            28,527
8    Data-Science-For-Beginners   26,583
9    CyberChef                    25,819
10   GoAccess                     17,585
11   best-of-ml-python            15,672
12   airbyte                      14,296
13   ydata-profiling              12,101
14   pandas-ai                    11,140
15   OpenRefine                   10,527
16   pandas_exercises             10,276
17   pygwalker                    9,930
18   statsmodels                  9,591
19   mlcourse.ai                  9,454
20   cleanlab                     8,819
21   akshare                      8,479
22   pyod                         7,994
23   cudf                         7,333
