An Empirical Evaluation of Columnar Storage Formats [pdf]

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • nimble

    New file format for storage of large columnar datasets. (by facebookincubator)

  • Nice to see methodology here. Ideally LanceDB's Lance v2 and Nimble would both be represented here too. It feels like there's a huge appetite to do better than Parquet; ideally work like this would help inform where we go next.

    https://blog.lancedb.com/lance-v2/

    https://github.com/facebookincubator/nimble

  • vortex

    Discontinued. A toolkit for working with compressed array data [Moved to: https://github.com/spiraldb/vortex] (by fulcrum-so)

  • Lance v2 looks interesting. I like their meta-data + container story. Lacking SOTA encoding schemes though.

    There is also Vortex (https://github.com/fulcrum-so/vortex). That has modern encoding schemes that we want to use.

    BtrBlocks (https://github.com/maxi-k/btrblocks) from the Germans is another Parquet alternative.

    Nimble (formerly Alpha) is a complicated story. We worked with the Velox team for over a year to open-source and extend it, but plans got stymied by legal. This was in collaboration with Meta + CWI + Nvidia + Voltron. We decided to take a separate path because the Nimble code has no spec/docs and is too tightly coupled with Velox/Folly.

    Given that, we are working on a new file format. We hope to share our ideas/code later this year.

  • btrblocks

    BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
