Top 9 Rust Big Data Projects

risingwave

Mentions: 27 | Stars: 6,407 | Activity: 10.0 | Language: Rust

SQL stream processing, analytics, and management. We decouple storage and compute to offer instant failover, dynamic scaling, speedy bootstrapping, and efficient joins.

Project mention: Proton, a fast and lightweight alternative to Apache Flink | news.ycombinator.com | 2024-01-30

How does this compare to RisingWave and Materialize?
https://github.com/risingwavelabs/risingwave

quickwit

Mentions: 64 | Stars: 6,244 | Activity: 9.8 | Language: Rust

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.

Project mention: Show HN: Search on S3 Using AWS Lambda | news.ycombinator.com | 2024-01-21

datafusion

Mentions: 55 | Stars: 5,233 | Activity: 9.9 | Language: Rust

Apache DataFusion SQL Query Engine

Project mention: Velox: Meta's Unified Execution Engine [pdf] | news.ycombinator.com | 2024-03-25

Python's Substrait seems like the biggest/most-used competitor-ish out there. I'd love some compare & contrast; my sense is that Substrait has a smaller ambition, and more wants to be a language for talking about execution rather than a full on execution engine. https://github.com/substrait-io/substrait
We can also see from the DataFusion discussion that they too see themselves as a bit of a Velox competitor. https://github.com/apache/arrow-datafusion/discussions/6441

Daft

Mentions: 9 | Stars: 1,776 | Activity: 9.8 | Language: Rust

Distributed DataFrame for Python designed for the cloud, powered by Rust

Project mention: Pg_lakehouse: Query Any Data Lake from Postgres | news.ycombinator.com | 2024-05-13

We're actually using pyiceberg to retrieve metadata! All our IO and decoding happens on the Rust side once the data has been passed through.
We expose something called a ScanOperator which allows integration into various catalogs through a thin layer that exposes ScanTasks.
Iceberg's impl: https://github.com/Eventual-Inc/Daft/blob/416009138359a9d410...

matano

Mentions: 38 | Stars: 1,367 | Activity: 7.0 | Language: Rust

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS

Project mention: Cisco Acquires Splunk | news.ycombinator.com | 2023-09-21

Sorry, that's https://matano.dev

datafusion-ballista

Mentions: 12 | Stars: 1,318 | Activity: 8.2 | Language: Rust

Apache Arrow Ballista Distributed Query Engine

Project mention: Polars | news.ycombinator.com | 2024-01-08

Not super on topic because this is all immature and not integrated with one another yet, but there is a scaled-out rust data-frames-on-arrow implementation called ballista that could maybe? form the backend of a polars scale out approach: https://github.com/apache/arrow-ballista

blaze

Mentions: 8 | Stars: 908 | Activity: 9.3 | Language: Rust

Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core. (by kwai)

Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19

ReductStore

Mentions: 45 | Stars: 144 | Activity: 9.3 | Language: Rust

A time series database for storing and managing large amounts of blob data

Project mention: How to Choose the Right MQTT Database | dev.to | 2024-05-17

Since joining ReductStore's project, I've been exploring alternative solutions to get a better understanding of how the project fits into the current ecosystem.

renoir

Mentions: 1 | Stars: 49 | Activity: 7.7 | Language: Rust

Reactive Network of Operators In Rust. Framework for parallel and distributed computation inspired by the DataFlow model

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Rust Big Data related posts

Pg_lakehouse: Query Any Data Lake from Postgres
7 projects | news.ycombinator.com | 13 May 2024

Velox: Meta's Unified Execution Engine [pdf]
2 projects | news.ycombinator.com | 25 Mar 2024

Apache Arrow DataFusion
1 project | news.ycombinator.com | 1 Oct 2023

Ballista (Rust) vs Apache Spark. A Tale of Woe.
1 project | /r/dataengineering | 7 Jul 2023

GlareDB: An open source SQL database to query and analyze distributed data
4 projects | /r/dataengineering | 8 Jun 2023

Evolution and Trends of Data Engineering 2022/23
1 project | /r/dataengineering | 19 May 2023

Polars: Computing a new column from multiple columns - there must be a better way
1 project | /r/rust | 4 May 2023

Index

What are some of the best open-source Big Data projects in Rust? This list will help you:

#	Project	Stars
1	risingwave	6,407
2	quickwit	6,244
3	datafusion	5,233
4	Daft	1,776
5	matano	1,367
6	datafusion-ballista	1,318
7	blaze	908
8	ReductStore	144
9	renoir	49
