Top 23 Data Analysis Open-Source Projects
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
-
Metabase
The simplest, fastest way to get business intelligence and analytics to everyone in your company
-
CyberChef
The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis
-
GoAccess
GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
-
pandas-ai
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
-
OpenRefine
OpenRefine is a free, open source power tool for working with messy data and improving it
-
cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
Project mention: Show HN: Open-source BI and analytics for engineers | news.ycombinator.com | 2024-05-15
We are looking at moving our Power BI stuff to Apache Superset [1]. How does this compare to Superset?
[1] https://superset.apache.org/
Project mention: How to Build a Logistic Regression Model: A Spam-filter Tutorial | dev.to | 2024-05-05
Online Courses: Coursera: "Machine Learning" by Andrew Ng; edX: "Introduction to Machine Learning" by MIT.
Tutorials: Scikit-learn documentation: https://scikit-learn.org/; Kaggle Learn: https://www.kaggle.com/learn
Books: "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" by Aurélien Géron; "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.
By understanding the core concepts of logistic regression, its limitations, and exploring further resources, you'll be well-equipped to navigate the exciting world of machine learning!
It's also possible for you to give a package an alias by using the as keyword. For instance, you could use the pandas package as pd like this:
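The snippet that originally followed this sentence did not survive on this page. A minimal sketch of the aliasing syntax it describes might look like the following; the variable names and the sample values are illustrative assumptions, and the pandas import is guarded so the example still runs where pandas is not installed.

```python
# `import <module> as <alias>` binds the module to a shorter name.
# Shown first with a standard-library module so the sketch always runs.
import statistics as stats

mean_value = stats.mean([2, 4, 6])

# The same syntax gives pandas its conventional short name, `pd`.
# Guarded (an assumption for portability): pandas is a third-party package
# and may not be installed in every environment.
try:
    import pandas as pd
except ImportError:
    pd = None

if pd is not None:
    # Hypothetical sample data, used only to show the alias in action.
    frame = pd.DataFrame({"value": [2, 4, 6]})
    pandas_mean = float(frame["value"].mean())
else:
    pandas_mean = None
```

After the import, every pandas name is reached through the alias (for example `pd.DataFrame`), which is the convention most pandas tutorials follow.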
Remote Code Execution via H2
Project mention: Developing a Generic Streamlit UI to Test Amazon Bedrock Agents | dev.to | 2024-05-05
I decided to use Streamlit to build the UI as it is a popular and fitting choice. Streamlit is an open-source Python library used for building interactive web applications, especially for AI and data applications. Since the application code is written only in Python, it is easy to learn and build with.
gradio is a package developed to ease the development of app interfaces in python and other languages (GitHub)
**[I.am.ai AI Expert Roadmap](https://i.am.ai/roadmap)**: This roadmap focuses more on AI and includes various aspects of machine learning and deep learning. It's suitable for those who want to delve deeper into AI, particularly in cutting-edge research and applications.
Get started with Data Science in the Data Science for Beginners curriculum.
Then we take the encrypted text and use CyberChef to decrypt it.
Project mention: Ask HN: Interesting TUIs (text user interfaces), maybe forgotten ones? | news.ycombinator.com | 2024-05-06
Not forgotten by any means but goaccess is nice and simple to use
https://goaccess.io/
Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13
AgentCloud uses Airbyte to build data pipelines, which allow us to split, chunk, and embed data from over 300 data sources, including Postgres.
Project mention: PandasAI is great but is there a more general library? | news.ycombinator.com | 2023-08-23
Project mention: Ask HN: What Underrated Open Source Project Deserves More Recognition? | news.ycombinator.com | 2024-03-07
"OpenRefine is a powerful free, open source tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data." https://openrefine.org/
Project mention: Show HN: Use an "eraser" to clean data on flight without breaking your workflow | news.ycombinator.com | 2024-03-15
Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05
We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.
Project mention: A Comprehensive Guide for Building Rag-Based LLM Applications | news.ycombinator.com | 2023-09-13
This is a feature in many commercial products already, as well as open source libraries like PyOD. https://github.com/yzhao062/pyod
Data Analysis related posts
-
Ask HN: Why all these GitHub fake accounts starring my project
-
The Birth of Parquet
-
The ultimate guide to creating a secure Python package
-
How to Build a Logistic Regression Model: A Spam-filter Tutorial
-
PDEP-13: The Pandas Logical Type System
-
Cold-(Brew) Outreach: Landing my first big client at a coffee shop
-
Using DuckDB-WASM for In-Browser Data Engineering
Index
What are some of the best open-source Data Analysis projects? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | superset | 59,473 |
2 | scikit-learn | 58,344 |
3 | Pandas | 42,104 |
4 | Metabase | 36,784 |
5 | streamlit | 32,222 |
6 | gradio | 29,400 |
7 | AI-Expert-Roadmap | 28,527 |
8 | Data-Science-For-Beginners | 26,583 |
9 | CyberChef | 25,819 |
10 | GoAccess | 17,585 |
11 | best-of-ml-python | 15,672 |
12 | airbyte | 14,296 |
13 | ydata-profiling | 12,101 |
14 | pandas-ai | 11,140 |
15 | OpenRefine | 10,527 |
16 | pandas_exercises | 10,276 |
17 | pygwalker | 9,930 |
18 | statsmodels | 9,591 |
19 | mlcourse.ai | 9,454 |
20 | cleanlab | 8,819 |
21 | akshare | 8,479 |
22 | pyod | 7,994 |
23 | cudf | 7,333 |