Rust data-engineering

Open-source Rust projects categorized as data-engineering

Top 11 Rust data-engineering Projects

  • risingwave

    SQL stream processing, analytics, and management. We decouple storage and compute to offer instant failover, dynamic scaling, speedy bootstrapping, and efficient joins.

  • Project mention: Proton, a fast and lightweight alternative to Apache Flink | news.ycombinator.com | 2024-01-30

    How does this compare to RisingWave and Materialize?

    https://github.com/risingwavelabs/risingwave

  • paradedb

    Postgres for Search and Analytics

  • Project mention: Pg_lakehouse: Query Any Data Lake from Postgres | news.ycombinator.com | 2024-05-13
  • qsv

    CSVs sliced, diced & analyzed.

  • Project mention: Qsv: Efficient CSV CLI Toolkit | news.ycombinator.com | 2023-12-22

    Thanks for the detailed feedback @snidane!

    As maintainer of qsv, here's my reply:

    - Given qsv's rapid release cycle (173 releases over three years), the auto-update check is essential at the moment. Once we reach 1.0, I'll turn it off. For now, given your feedback, I've only made it check 10% of the time.

    - Pivot is in the backlog and I'll be sure to add unpivot when I implement it. (https://github.com/jqnatividad/qsv/issues/799)

    - I'll add a dedicated summing command with the group by (-by) and window by (-over) capability (https://github.com/jqnatividad/qsv/issues/1514). Do note that `stats` has basic sum as @ezequiel-garzon pointed out.

    - With the `enum` command, qsv can achieve what you proposed with `laminate`. E.g. qsv enum --new-column newcol --constant newconstant mydata.csv --output laminated-data.csv

    - With the cat rowskey command, qsv can already concatenate files with mismatched headers.

    - Other file formats: qsv supports parquet, csv, tsv, excel, ods, datapackage, sqlite and more (see https://github.com/jqnatividad/qsv/tree/master#file-formats). Fixed-format is not supported yet, but it is quite interesting, and I have added it to the backlog (https://github.com/jqnatividad/qsv/issues/1515)

    - as to "enable embedding outputs of commands", qsv is composable by design, so you can use standard stdin/stdout redirection/piping techniques to have it work with other CLI tools like jq, awk, etc.

    Finally, just released v0.120.0 that already incorporates the less aggressive self-update check. https://github.com/jqnatividad/qsv/releases/tag/0.120.0

  • Daft

    Distributed DataFrame for Python designed for the cloud, powered by Rust

  • Project mention: Pg_lakehouse: Query Any Data Lake from Postgres | news.ycombinator.com | 2024-05-13

    We're actually using pyiceberg to retrieve metadata! All our IO and decoding happens on the Rust side once the data has been passed through.

    We expose something called a ScanOperator which allows integration into various catalogs through a thin layer that exposes ScanTasks.

    Iceberg's impl: https://github.com/Eventual-Inc/Daft/blob/416009138359a9d410...

  • blaze

    Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core. (by kwai)

  • Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19
  • delta-sharing-rs

    A Minimalistic Rust Implementation of Delta Sharing Server.

  • grant-rs

    Manage Redshift/Postgres privileges in GitOps style written in Rust

  • xvc

    A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

  • pipebase

    data integration framework

  • ansilo

    Unlocking the power of SQL/MED to create data ecosystems from disparate data sources

  • pipebuilder

    pipebase app CI

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Rust data-engineering related posts

  • Pg_lakehouse: Query Any Data Lake from Postgres

    1 project | news.ycombinator.com | 12 May 2024
  • Ask HN: Best way to mirror a Postgres database to parquet?

    1 project | news.ycombinator.com | 10 Apr 2024
  • Daft: Distributed DataFrame for Python

    2 projects | news.ycombinator.com | 29 Feb 2024
  • Transforming Postgres into a Fast OLAP Database

    3 projects | news.ycombinator.com | 7 Feb 2024
  • Show HN: Pg_analytics – Speed Up Postgres Analytical Queries by 94x

    1 project | news.ycombinator.com | 31 Jan 2024
  • ParadeDB – PostgreSQL for Search

    1 project | news.ycombinator.com | 2 Jan 2024
  • Postgresql index

    1 project | /r/SQL | 11 Dec 2023

Index

What are some of the best open-source data-engineering projects in Rust? This list will help you:

#    Project            Stars
1    risingwave         6,394
2    paradedb           4,240
3    qsv                2,249
4    Daft               1,761
5    blaze                908
6    delta-sharing-rs      72
7    grant-rs              24
8    xvc                   22
9    pipebase               9
10   ansilo                 4
11   pipebuilder            1
