data-analytics

Top 23 data-analytic Open-Source Projects

  • superset

    Apache Superset is a Data Visualization and Data Exploration Platform

  • Project mention: Show HN: Open-source BI and analytics for engineers | news.ycombinator.com | 2024-05-15

    We are looking at moving our Power BI stuff to Apache Superset [1]. How does this compare to Superset?

    [1] https://superset.apache.org/

  • awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  • Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13
  • danfojs

    Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

  • lightdash

    Self-serve BI to 10x your data team ⚡️

  • Project mention: Show HN: Open-source BI and analytics for engineers | news.ycombinator.com | 2024-05-15

    How is this different from Lightdash? https://github.com/lightdash/lightdash

  • lance

    Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.

  • Project mention: The Nimble File Format by Meta | news.ycombinator.com | 2024-04-25
  • diffgram

    The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.

  • zui

    Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.

  • dremio-oss

    Dremio - the missing link in modern data

  • insights

    Open Source Self-Hosted Business Intelligence Platform

  • data-science-with-ruby

    Practical Data Science with Ruby based tools.

  • Data-Analyst-Roadmap

    I am sharing my journey into data analytics through Ken Jee's #66daysofdata challenge.

  • isp-data-pollution

    ISP Data Pollution to Protect Private Browsing History with Obfuscation

  • bitcoin-etl

    ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, and Bitcoin Cash. The data is available in Google BigQuery: https://goo.gl/oY5BCQ

  • ethereum-etl-airflow

    Airflow DAGs for exporting, loading, and parsing Ethereum blockchain data. For how to get any Ethereum smart contract into BigQuery, see https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee

  • Project mention: ethereum-etl-airflow: NEW Data - star count:358.0 | /r/algoprojects | 2023-07-10
  • ActivitySchema

    Repository for the ActivitySchema spec and supporting materials

  • desbordante-core

    Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also allows users to run data cleaning scenarios based on these algorithms. Desbordante has a console version and an easy-to-use web application.

  • Project mention: Show HN: Desbordante 1.0.0 Released | news.ycombinator.com | 2023-12-11
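    The entry does not say which patterns Desbordante supports or how its algorithms work. The sketch below is therefore only a hedged illustration of one pattern a data profiler can look for: a functional dependency, where the values of one set of columns determine the value of another column. Written in plain Python, it checks a dependency, brute-forces all single-column dependencies over a list of dict rows, and lists the keys that violate a dependency, which are the kind of records a cleaning step might inspect. All function names and the sample data are hypothetical, and the exhaustive search only suits tiny tables.

```python
from collections import defaultdict


def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `rows`.

    rows: list of dicts, one per record; lhs: tuple of column names;
    rhs: a single column name. The dependency holds when each combination
    of lhs values always appears with the same rhs value.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        # setdefault keeps the first rhs value seen for this key.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def single_column_fds(rows):
    """Brute-force search for dependencies A -> B between single columns."""
    columns = list(rows[0])
    return [
        (a, b)
        for a in columns
        for b in columns
        if a != b and fd_holds(rows, (a,), b)
    ]


def violating_keys(rows, lhs, rhs):
    """Return the lhs value combinations that map to more than one rhs value."""
    values = defaultdict(set)
    for row in rows:
        values[tuple(row[c] for c in lhs)].add(row[rhs])
    return sorted(key for key, found in values.items() if len(found) > 1)


if __name__ == "__main__":
    people = [
        {"zip": "10001", "city": "New York", "name": "Ann"},
        {"zip": "10001", "city": "New York", "name": "Bob"},
        {"zip": "94105", "city": "San Francisco", "name": "Cy"},
    ]
    print(single_column_fds(people))
    # [('zip', 'city'), ('city', 'zip'), ('name', 'zip'), ('name', 'city')]
    dirty = people + [{"zip": "10001", "city": "NYC", "name": "Dee"}]
    print(violating_keys(dirty, ("zip",), "city"))  # [('10001',)]
```

    Dedicated discovery algorithms prune the search space instead of testing every column pair. This version trades speed for readability.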
  • tellery

    Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.

  • traffic

    A toolbox for processing and analysing air traffic data (by xoolive)

  • Project mention: FLaNK AI for 11 March 2024 | dev.to | 2024-03-11
  • data-drift

    Metrics Observability & Troubleshooting

  • Project mention: Open-Source Observability for the Semantic Layer | news.ycombinator.com | 2024-01-16

    Think of Datadrift as a simple & open-source Monte Carlo for the semantic layer era. The repo is at https://github.com/data-drift/data-drift

    Datadrift started as an internal tool built at our former company, a large European B2B Fintech. We had data reliability challenges impacting key metrics used for financial and regulatory reporting.

    However, when we tried existing data quality tools, we were always frustrated. They provide row-level static testing (e.g. uniqueness or nullness), which does not address time-varying metrics like revenue. And commercial observability solutions cost $manyK a month and bring compliance and security overhead.

    We designed Datadrift to solve these problems. Datadrift works by simply adding a monitor where your metric is computed. It then understands how your metric is computed and which upstream tables it depends on. When an issue occurs, it pinpoints exactly which rows were updated and introduced the change.

    You can also set up alerting and customise it. For example, you can decide to open a GitHub issue and assign it to the analyst who owns the revenue metric when a +10% change is detected. We tried to make it easy to customise and developer-friendly.

    We are thinking of adding features around root cause analysis automation and issue pattern analysis to help data teams improve metric quality over time. We’d love to hear your feature requests.

    Datadrift is built with Python and Go, and licensed under GPL. Our docs are here: https://github.com/data-drift/data-drift?tab=readme-ov-file#...

    Dev setup and demo: https://app.claap.io/sammyt/drift-db-demo-a18-c-ApwBh9kt4p-0...

    We’re very eager to get your feedback!
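    The post does not show Datadrift's code or API. The sketch below assumes the metric is a plain sum over one table and that two snapshots of that table are available. Under those assumptions, the idea it describes can be written in Python: recompute the metric, flag a large relative change, then point at the rows that changed. The names `check_metric`, `diff_snapshots`, `id`, and `amount` are hypothetical, and the 10% default threshold simply mirrors the alerting example above.

```python
def diff_snapshots(before, after, key="id"):
    """Compare two snapshots of a table (lists of dicts) by primary key.

    Returns (added, removed, updated): rows only in `after`, rows only in
    `before`, and a dict mapping key -> (old_row, new_row) for changed rows.
    """
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    added = [new[k] for k in sorted(new.keys() - old.keys())]
    removed = [old[k] for k in sorted(old.keys() - new.keys())]
    updated = {
        k: (old[k], new[k])
        for k in sorted(old.keys() & new.keys())
        if old[k] != new[k]
    }
    return added, removed, updated


def check_metric(before, after, field="amount", key="id", threshold=0.10):
    """Recompute a sum metric on both snapshots and explain a large change.

    Flags the metric when its relative change is at least `threshold`
    (10% by default) and, only then, attaches the rows that differ.
    """
    old_value = sum(row[field] for row in before)
    new_value = sum(row[field] for row in after)
    if old_value == 0:
        change = 0.0 if new_value == 0 else float("inf")
    else:
        change = (new_value - old_value) / abs(old_value)
    report = {
        "old": old_value,
        "new": new_value,
        "change": change,
        "alert": abs(change) >= threshold,
    }
    if report["alert"]:
        report["added"], report["removed"], report["updated"] = diff_snapshots(
            before, after, key
        )
    return report


if __name__ == "__main__":
    monday = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 50.0}]
    tuesday = [
        {"id": 1, "amount": 100.0},
        {"id": 2, "amount": 70.0},
        {"id": 3, "amount": 10.0},
    ]
    report = check_metric(monday, tuesday)
    print(report["change"], report["alert"])  # 0.2 True
    print(report["updated"])
```

    Diffing by primary key keeps the report focused on the records that moved the metric. Tracing a metric back through several upstream tables, as the post describes, is outside what this sketch attempts.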

  • SQL-for-Data-Analytics

    Perform fast and efficient data analysis with the power of SQL

  • dsensei

    AI-powered key driver analysis tool that pinpoints the root cause behind metric fluctuations in one minute.

  • Project mention: Show HN: Dsensei, pinpoint the root cause of metric change in one minute | news.ycombinator.com | 2023-08-03
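    The description does not explain how dsensei finds root causes. The Python below is a hedged sketch of the general idea behind key driver analysis, not dsensei's algorithm. It splits a metric's change between two periods across the values of each dimension and ranks the segments by how much of the change they account for. The row layout, the `value` field, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def segment_contributions(baseline, comparison, dimensions, value="value"):
    """Rank (dimension, segment) pairs by how much of a metric change they explain.

    baseline, comparison: lists of dicts holding dimension columns and a
    numeric `value` column for two periods. Returns tuples of
    (dimension, segment, delta, share_of_total_change), largest |delta| first.
    """
    total_delta = sum(r[value] for r in comparison) - sum(r[value] for r in baseline)
    results = []
    for dim in dimensions:
        # Per segment: [baseline total, comparison total].
        totals = defaultdict(lambda: [0.0, 0.0])
        for r in baseline:
            totals[r[dim]][0] += r[value]
        for r in comparison:
            totals[r[dim]][1] += r[value]
        for segment, (before, after) in totals.items():
            delta = after - before
            share = delta / total_delta if total_delta else 0.0
            results.append((dim, segment, delta, share))
    # sort() is stable, so ties keep their dimension/segment order.
    results.sort(key=lambda item: abs(item[2]), reverse=True)
    return results


if __name__ == "__main__":
    last_week = [
        {"country": "US", "device": "ios", "value": 100},
        {"country": "US", "device": "web", "value": 100},
        {"country": "DE", "device": "ios", "value": 50},
        {"country": "DE", "device": "web", "value": 50},
    ]
    this_week = [dict(r) for r in last_week]
    this_week[3]["value"] = 10  # DE/web drops from 50 to 10
    ranked = segment_contributions(last_week, this_week, ["country", "device"])
    for dim, segment, delta, share in ranked[:2]:
        print(dim, segment, delta, f"{share:.0%}")
    # country DE -40.0 100%
    # device web -40.0 100%
```

    Here both DE and web explain the whole drop because the change sits in their intersection. A fuller analysis would also score combinations of dimensions to isolate that segment.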
  • Morpheus

    The foundational library of the Morpheus data science framework

  • snowpark-python

    Snowflake Snowpark Python API

  • Project mention: Show HN: SQLFrame – I ran PySpark without Spark on a SQL database | news.ycombinator.com | 2024-05-20

    This is cool and in my mind super useful for migrations.

    It seems the main benefit of using something like that in daily life is that it's more convenient to generate complex SQL statements (like pivoting a table with a lot of columns).

    However, I never really liked the PySpark DataFrame API, and looking at the code examples, it has the same visual complexity as SQL.

    Snowflake has built something similar (just for Snowflake): Snowpark [1]. One promoted benefit there was that you could also inject native Python functions and "extend" the SQL dialect. However, I don't think it really took off.

    [1] https://github.com/snowflakedb/snowpark-python
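    The comment's two points can be illustrated without SQLFrame or Snowpark, whose APIs are not reproduced here. The Python sketch below uses the standard `sqlite3` module for both. First, it generates a wide pivot statement in a loop rather than writing one `CASE` expression per column by hand. Second, it registers a native Python function with `Connection.create_function` so the SQL can call it. The table, columns, and the names `pivot_sql`, `quarterly_pivot`, and `quarter_of` are hypothetical.

```python
import sqlite3


def pivot_sql(source, row_key, pivot_col, value_col, pivot_values):
    """Build a SUM(CASE ...) pivot query with one output column per pivot value.

    Identifiers are interpolated directly, so this is only meant for trusted
    table and column names; pivot values are escaped as SQL string literals.
    """
    columns = []
    for v in pivot_values:
        literal = str(v).replace("'", "''")
        alias = str(v).replace('"', '""')
        columns.append(
            f"SUM(CASE WHEN {pivot_col} = '{literal}' THEN {value_col} ELSE 0 END) AS \"{alias}\""
        )
    select_list = ",\n       ".join([row_key] + columns)
    return f"SELECT {select_list}\nFROM {source}\nGROUP BY {row_key}\nORDER BY {row_key}"


def quarterly_pivot(rows):
    """Load (region, month, amount) rows and pivot amounts into quarter columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # Register a plain Python function so SQL can call it like a built-in.
    conn.create_function("quarter_of", 1, lambda month: f"Q{(month - 1) // 3 + 1}")
    source = "(SELECT region, quarter_of(month) AS q, amount FROM sales) AS t"
    query = pivot_sql(source, "region", "q", "amount", ["Q1", "Q2", "Q3", "Q4"])
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sales = [("east", 1, 10.0), ("east", 4, 5.0), ("west", 2, 7.0), ("west", 11, 3.0)]
    print(pivot_sql("sales", "region", "q", "amount", ["Q1", "Q2"]))
    for row in quarterly_pivot(sales):
        print(row)
```

    Adding another pivot column means adding one value to the list passed to `pivot_sql`, not editing the SQL by hand.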

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

data-analytics related posts

  • Show HN: SQLFrame – I ran PySpark without Spark on a SQL database

    2 projects | news.ycombinator.com | 20 May 2024
  • Show HN: Open-source BI and analytics for engineers

    6 projects | news.ycombinator.com | 15 May 2024
  • Open-Source Observability for the Semantic Layer

    2 projects | news.ycombinator.com | 16 Jan 2024
  • Show HN: Desbordante 1.0.0 Released

    1 project | news.ycombinator.com | 11 Dec 2023
  • Explainable (Structured) Machine Learning Algorithm

    1 project | /r/Python | 5 Dec 2023
  • Would learn Go to contribute to an OS project ? Or should I stick to python ?

    1 project | /r/dataengineering | 29 Nov 2023
  • public-datasets: NEW Data - star count:181.0

    1 project | /r/algoprojects | 20 Nov 2023

Index

What are some of the best open-source data-analytic projects? This list will help you:

#   Project                  Stars
1   superset                59,473
2   awesome-bigdata         12,845
3   danfojs                  4,667
4   lightdash                3,479
5   lance                    3,328
6   diffgram                 1,801
7   zui                      1,743
8   dremio-oss               1,306
9   insights                 1,057
10  data-science-with-ruby     695
11  Data-Analyst-Roadmap       582
12  isp-data-pollution         566
13  bitcoin-etl                388
14  ethereum-etl-airflow       386
15  ActivitySchema             373
16  desbordante-core           357
17  tellery                    352
18  traffic                    347
19  data-drift                 302
20  SQL-for-Data-Analytics     252
21  dsensei                    251
22  Morpheus                   235
23  snowpark-python            231
