Top 23 data-analytic Open-Source Projects
-
SurveyJS
Open-Source JSON Form Builder to Create Dynamic Forms Right in Your App. With SurveyJS form UI libraries, you can build and style forms in a fully-integrated drag & drop form builder, render them in your JS app, and store form submission data in any backend, inc. PHP, ASP.NET Core, and Node.js.
-
danfojs
Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
-
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector indexing, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.
-
diffgram
The AI Datastore for Schemas, BLOBs, and Predictions. Use with your apps or integrate built-in Human Supervision, Data Workflow, and UI Catalog to get the most value out of your AI Data.
-
zui
Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
Data-Analyst-Roadmap
I am sharing my 66DaysOfData journey into data analytics by participating in Ken Jee's #66daysofdata challenge.
-
bitcoin-etl
ETL scripts for Bitcoin, Litecoin, Dash, Zcash, Doge, Bitcoin Cash. Available in Google BigQuery https://goo.gl/oY5BCQ
-
ethereum-etl-airflow
Airflow DAGs for exporting, loading, and parsing Ethereum blockchain data. How to get any Ethereum smart contract into BigQuery: https://towardsdatascience.com/how-to-get-any-ethereum-smart-contract-into-bigquery-in-8-mins-bab5db1fdeee
-
desbordante-core
Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets users run data cleaning scenarios based on these algorithms. Desbordante has a console version and an easy-to-use web application.
-
tellery
Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.
-
dsensei
AI-powered key driver analysis tool that pinpoints the root cause behind metric fluctuations in one minute.
Project mention: Show HN: Open-source BI and analytics for engineers | news.ycombinator.com | 2024-05-15
We are looking at moving our Power BI stuff to Apache Superset [1]. How does this compare to Superset?
[1] https://superset.apache.org/
Project mention: Show HN: Open-source BI and analytics for engineers | news.ycombinator.com | 2024-05-15
How is this different from Lightdash? https://github.com/lightdash/lightdash
Project mention: Open-Source Observability for the Semantic Layer | news.ycombinator.com | 2024-01-16
Think of Datadrift as a simple & open-source Monte Carlo for the semantic layer era. The repo is at https://github.com/data-drift/data-drift
Datadrift started as an internal tool built at our former company, a large European B2B Fintech. We had data reliability challenges impacting key metrics used for financial and regulatory reporting.
However, whenever we tried existing data quality tools, we were frustrated. They provide row-level static testing (e.g. uniqueness or nullness), which does not address time-varying metrics like revenue. And commercial observability solutions cost $manyK a month and bring compliance and security overhead.
We designed Datadrift to solve these problems. Datadrift works by simply adding a monitor where your metric is computed. It then understands how your metric is computed and which upstream tables it depends on. When an issue occurs, it pinpoints exactly which rows were updated and introduced the change.
You can also set up alerting and customise it. For example, you can decide to open a GitHub issue and assign it to the analyst who owns the revenue metric when a +10% change is detected. We tried to make it easy to customise and developer-friendly.
We are thinking of adding features around root cause analysis automation and issue pattern analysis to help data teams improve metric quality over time. We’d love to hear your feature requests.
Datadrift is built with Python and Go, and licensed under GPL. Our docs are here: https://github.com/data-drift/data-drift?tab=readme-ov-file#...
Dev setup and demo: https://app.claap.io/sammyt/drift-db-demo-a18-c-ApwBh9kt4p-0...
We’re very eager to get your feedback!
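The monitoring idea described in the Datadrift post (watch a metric, trace a change back to the rows that caused it, and alert on a +10% move) can be sketched in plain Python. This is a hypothetical illustration, not Datadrift's API or implementation: the snapshot format `{row_id: amount}`, the `revenue_drift` helper, and the `DriftReport` type are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    old_total: float
    new_total: float
    pct_change: float
    # row id -> (old value, new value); None marks a row missing from a snapshot
    changed_rows: dict
    alert: bool


def revenue_drift(old: dict, new: dict, threshold: float = 0.10) -> DriftReport:
    """Hypothetical monitor comparing two snapshots {row_id: amount} of a metric's source rows."""
    old_total = sum(old.values())
    new_total = sum(new.values())
    if old_total:
        pct = (new_total - old_total) / old_total
    else:
        # Zero baseline: any non-zero value counts as an unbounded change.
        pct = 0.0 if new_total == 0 else float("inf")
    # Rows inserted, deleted, or updated between the two snapshots.
    changed = {
        row_id: (old.get(row_id), new.get(row_id))
        for row_id in old.keys() | new.keys()
        if old.get(row_id) != new.get(row_id)
    }
    # A real tool would act on `alert` (e.g. notify the metric owner); omitted here.
    return DriftReport(old_total, new_total, pct, changed, abs(pct) >= threshold)


if __name__ == "__main__":
    yesterday = {"inv-1": 100.0, "inv-2": 250.0, "inv-3": 50.0}
    today = {"inv-1": 100.0, "inv-2": 320.0, "inv-4": 40.0}
    report = revenue_drift(yesterday, today)
    print(f"{report.pct_change:+.1%}", report.alert)
    for row_id, (before, after) in sorted(report.changed_rows.items()):
        print(row_id, before, "->", after)
```

Keeping the row-level diff next to the aggregate is what makes the alert actionable: the report says not only that revenue moved by 15% but which invoices moved it.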
Project mention: Show HN: Dsensei, pinpoint the root cause of metric change in one minute | news.ycombinator.com | 2023-08-03
Project mention: Show HN: SQLFrame – I ran PySpark without Spark on a SQL database | news.ycombinator.com | 2024-05-20
This is cool and in my mind super useful for migrations.
It seems the main benefit of using something like that in daily life is that it's more convenient to generate complex SQL statements (like pivoting a table with a lot of columns).
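To make the pivot point concrete, here is a minimal sketch in plain Python of generating a wide pivot query programmatically, one `SUM(CASE ...)` column per category. It is not SQLFrame's or PySpark's API and not how SQLFrame builds SQL; the `pivot_sql` helper and the `sales` table with `region`, `month`, and `amount` columns are hypothetical, and the query runs against an in-memory SQLite database.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pivot_sql(table: str, key: str, pivot_col: str, value_col: str,
              pivot_values: list[str]) -> str:
    """Hypothetical helper: one SUM(CASE ...) column per pivot value.

    Each pivot value is bound through a `?` placeholder, so callers must pass
    `pivot_values` (in the same order) as the query parameters.
    """
    columns = ",\n  ".join(
        f"SUM(CASE WHEN {quote_ident(pivot_col)} = ? "
        f"THEN {quote_ident(value_col)} ELSE 0.0 END) AS {quote_ident(v)}"
        for v in pivot_values
    )
    k = quote_ident(key)
    return (f"SELECT {k},\n  {columns}\nFROM {quote_ident(table)}\n"
            f"GROUP BY {k}\nORDER BY {k}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("EU", "jan", 10.0), ("EU", "feb", 5.0), ("US", "jan", 7.0), ("US", "jan", 3.0)],
    )
    months = ["jan", "feb", "mar"]
    query = pivot_sql("sales", "region", "month", "amount", months)
    print(query)
    print(conn.execute(query, months).fetchall())  # [('EU', 10.0, 5.0, 0.0), ('US', 10.0, 0.0, 0.0)]
```

With three months the hand-written SQL is still manageable; with hundreds of pivot values, generating it from a list is where the convenience shows.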
However, I never really liked the PySpark DataFrame API, and looking at the code examples, SQL has the same visual complexity.
Snowflake has built something similar (just for Snowflake), Snowpark [1]. One promoted benefit there was that you could also inject native Python functions and "extend" the SQL dialect. However, I don't think it really took off.
[1] https://github.com/snowflakedb/snowpark-python
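Injecting native Python functions into SQL is not specific to Snowpark. As a swapped-in illustration that uses SQLite from Python's standard library rather than Snowpark's API, `sqlite3.Connection.create_function` registers a Python callable that queries can then call like a built-in. The `fiscal_quarter` function and the `orders` table are hypothetical.

```python
import sqlite3
from datetime import date


def fiscal_quarter(iso_date: str, start_month: int) -> str:
    """Hypothetical UDF: fiscal quarter label for a date.

    `start_month` is the first month of the fiscal year (4 means April).
    """
    month = date.fromisoformat(iso_date).month
    # Months elapsed since the fiscal year started, in the range 0..11.
    offset = (month - start_month) % 12
    return f"Q{offset // 3 + 1}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Expose the Python callable to SQL under the same name, taking 2 arguments.
    conn.create_function("fiscal_quarter", 2, fiscal_quarter)
    conn.execute("CREATE TABLE orders (ordered_on TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2024-01-15", 20.0), ("2024-04-02", 30.0), ("2024-05-20", 10.0)],
    )
    rows = conn.execute(
        "SELECT fiscal_quarter(ordered_on, 4) AS q, SUM(amount) "
        "FROM orders GROUP BY q ORDER BY q"
    ).fetchall()
    print(rows)  # [('Q1', 40.0), ('Q4', 20.0)]
```

The trade-off is that the query now depends on Python-side code: in SQLite the function exists only on the connection where it was registered.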
data-analytics related posts
-
Show HN: SQLFrame – I ran PySpark without Spark on a SQL database
-
Show HN: Open-source BI and analytics for engineers
-
Open-Source Observability for the Semantic Layer
-
Show HN: Desbordante 1.0.0 Released
-
Explainable (Structured) Machine Learning Algorithm
-
Would learn Go to contribute to an OS project? Or should I stick to Python?
-
public-datasets: NEW Data - star count:181.0
-
A note from our sponsor - SurveyJS
surveyjs.io | 22 May 2024
Index
What are some of the best open-source data-analytic projects? This list will help you:
# | Project | Stars
---|---|---
1 | superset | 59,473
2 | awesome-bigdata | 12,845
3 | danfojs | 4,667
4 | lightdash | 3,479
5 | lance | 3,328
6 | diffgram | 1,801
7 | zui | 1,743
8 | dremio-oss | 1,306
9 | insights | 1,057
10 | data-science-with-ruby | 695
11 | Data-Analyst-Roadmap | 582
12 | isp-data-pollution | 566
13 | bitcoin-etl | 388
14 | ethereum-etl-airflow | 386
15 | ActivitySchema | 373
16 | desbordante-core | 357
17 | tellery | 352
18 | traffic | 347
19 | data-drift | 302
20 | SQL-for-Data-Analytics | 252
21 | dsensei | 251
22 | Morpheus | 235
23 | snowpark-python | 231