Top 9 Rust Big Data Projects
-
risingwave
SQL stream processing, analytics, and management. We decouple storage and compute to offer instant failover, dynamic scaling, speedy bootstrapping, and efficient joins.
-
quickwit
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
-
matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
-
blaze
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core (by Kwai).
-
renoir
Reactive Network of Operators In Rust. A framework for parallel and distributed computation inspired by the Dataflow model.
Project mention: Proton, a fast and lightweight alternative to Apache Flink | news.ycombinator.com | 2024-01-30
How does this compare to RisingWave and Materialize?
https://github.com/risingwavelabs/risingwave
Substrait seems like the biggest/most-used competitor-ish project out there. I'd love some compare & contrast; my sense is that Substrait has a smaller ambition and wants to be more of a language for talking about execution than a full-on execution engine. https://github.com/substrait-io/substrait
We can also see from the DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441
Project mention: Pg_lakehouse: Query Any Data Lake from Postgres | news.ycombinator.com | 2024-05-13
We're actually using pyiceberg to retrieve metadata! All our IO and decoding happens on the Rust side once the data has been passed through.
We expose something called a ScanOperator which allows integration into various catalogs through a thin layer that exposes ScanTasks.
Iceberg's impl: https://github.com/Eventual-Inc/Daft/blob/416009138359a9d410...
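The ScanOperator/ScanTask split described in that comment can be illustrated with a minimal Rust sketch. Only the two type names come from the quote; the fields, method names, `FileListScanOperator`, and `plan_scan` are hypothetical, and this is not Daft's actual implementation. What it tries to show is the shape of the design: a catalog integration only has to turn its metadata into a list of per-file tasks, while the engine's IO and decoding code consumes those tasks without knowing which catalog produced them.

```rust
// Hypothetical sketch of the ScanOperator / ScanTask split described above.
// The type names follow the quote, but the fields and method signatures are
// illustrative assumptions, not Daft's actual API.

/// One unit of scan work: a single data file plus the columns to read from it.
#[derive(Debug, Clone, PartialEq)]
struct ScanTask {
    path: String,
    /// Column projection pushed down from the query plan (None = all columns).
    columns: Option<Vec<String>>,
}

/// The thin integration layer: each catalog (Iceberg, Delta Lake, a plain
/// directory listing, ...) only has to turn its metadata into ScanTasks.
trait ScanOperator {
    fn name(&self) -> &str;
    fn to_scan_tasks(&self, columns: Option<Vec<String>>) -> Vec<ScanTask>;
}

/// Hypothetical operator over data files whose paths were already resolved
/// from catalog metadata (e.g. by a client library such as pyiceberg).
struct FileListScanOperator {
    files: Vec<String>,
}

impl ScanOperator for FileListScanOperator {
    fn name(&self) -> &str {
        "file-list"
    }

    fn to_scan_tasks(&self, columns: Option<Vec<String>>) -> Vec<ScanTask> {
        // One task per file; every task carries the same column projection.
        self.files
            .iter()
            .map(|path| ScanTask {
                path: path.clone(),
                columns: columns.clone(),
            })
            .collect()
    }
}

/// The engine side only sees the trait object, so the IO and decoding code
/// never depends on which catalog produced the tasks.
fn plan_scan(op: &dyn ScanOperator, columns: Option<Vec<String>>) -> Vec<ScanTask> {
    op.to_scan_tasks(columns)
}

fn main() {
    let op = FileListScanOperator {
        files: vec![
            "s3://bucket/a.parquet".to_string(),
            "s3://bucket/b.parquet".to_string(),
        ],
    };
    for task in plan_scan(&op, Some(vec!["id".to_string()])) {
        println!("[{}] read {} columns={:?}", op.name(), task.path, task.columns);
    }
}
```

Under this design, supporting another catalog would mean writing one more `impl ScanOperator`; `plan_scan` and everything downstream of it stay unchanged.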
Sorry, that's https://matano.dev
Not super on topic, because this is all immature and not yet integrated, but there is a scaled-out Rust data-frames-on-Arrow implementation called Ballista that could maybe form the backend of a Polars scale-out approach: https://github.com/apache/arrow-ballista
Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19
Since joining the ReductStore project, I've been exploring alternative solutions to get a better understanding of how the project fits into the current ecosystem.
Rust Big Data related posts
-
Pg_lakehouse: Query Any Data Lake from Postgres
-
Velox: Meta's Unified Execution Engine [pdf]
-
Apache Arrow DataFusion
-
Ballista (Rust) vs Apache Spark. A Tale of Woe.
-
GlareDB: An open source SQL database to query and analyze distributed data
-
Evolution and Trends of Data Engineering 2022/23
-
Polars: Computing a new column from multiple columns - there must be a better way
-
A note from our sponsor - SaaSHub
www.saashub.com | 26 May 2024
Index
What are some of the best open-source Big Data projects in Rust? This list will help you:
# | Project | Stars
---|---|---
1 | risingwave | 6,407
2 | quickwit | 6,244
3 | datafusion | 5,233
4 | Daft | 1,776
5 | matano | 1,367
6 | datafusion-ballista | 1,318
7 | blaze | 908
8 | ReductStore | 144
9 | renoir | 49