Apache Arrow 3.0.0 Release

perspective

45 7,639 9.3 C++

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.

In Perspective (https://github.com/finos/perspective), we use Apache Arrow as a fast, cross-language/cross-network data encoding that is extremely useful for in-browser data visualization and analytics.
Some benefits:
- super fast read/write compared to CSV & JSON (Perspective and Arrow share an extremely similar column encoding scheme, so we can memcpy Arrow columns into Perspective wholesale instead of reading a dataset iteratively).
- the ability to send Arrow binaries as an ArrayBuffer between a Python server and a WASM client, which guarantees compatibility and removes the overhead of JSON serialization/deserialization.
- because Arrow columns are strictly typed, there's no need to infer data types - this helps with speed and correctness.
- compared to JSON/CSV, Arrow binaries have a super compact encoding that reduces network transport time.
For us, building on top of Apache Arrow (and using it wherever we can) reduces the friction of passing around data between clients, servers, and runtimes in different languages, and allows larger datasets to be efficiently visualized and analyzed in the browser context.
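
To make the typing and encoding points above a bit more concrete, here is a minimal Python sketch using only the standard library. It is not Arrow's actual format and uses no Arrow library; the `values` column and its contents are made up for illustration. It packs a column of floats into one contiguous typed buffer and compares that with the same data rendered as JSON text.

```python
import json
import math
from array import array

# Hypothetical sample column: 1,000 full-precision float64 values.
values = [math.sin(i) for i in range(1000)]

# Text encoding: every number is rendered as decimal text and must be parsed back.
json_bytes = json.dumps(values).encode("utf-8")

# Columnar binary encoding: one contiguous buffer of 8-byte doubles.
column = array("d", values)
packed = column.tobytes()

# Decoding the binary column is a bulk copy; the element type is already known,
# so there is no per-value parsing or type inference.
decoded = array("d")
decoded.frombytes(packed)

print("JSON bytes:", len(json_bytes))
print("packed bytes:", len(packed))
```

For full-precision floats like these, the packed column uses 8 bytes per value and the JSON text is noticeably larger. Columns of small integers can come out the other way, so treat this as an illustration rather than a benchmark.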

arquero

8 1,203 4.6 JavaScript

Query processing and transformation of array-backed data tables.

Take a look at the arquero library from a research group at the University of Washington (the same group that D3 came out of): https://github.com/uwdata/arquero

arrow-julia

4 279 5.6 Julia

Official Julia implementation of Apache Arrow

Excited to see this release's official inclusion of the pure Julia Arrow implementation [1]!
It's so cool to be able to mmap Arrow memory and natively manipulate it from within Julia with virtually no performance overhead. Since the Julia compiler can specialize on the layout of Arrow-backed types at runtime (just as it can with any other type), the notion of needing to build/work with a separate "compiler for fast UDFs" is rendered obsolete.
It feels pretty magical when two tools like this compose so well without either being designed with the other in mind - a testament to the thoughtful design of both :) mad props to Jacob Quinn for spearheading the effort to revive/restart Arrow.jl and get the package into this release.
[1] https://github.com/JuliaData/Arrow.jl
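
The memory-mapping idea can be sketched outside Julia as well. The following Python example is an illustration under stated assumptions, using only the standard library: it maps a raw file of float64 values (not a real Arrow file; the `column.bin` name and its contents are hypothetical) and reads it through a typed view without first copying the bytes into Python objects.

```python
import mmap
import os
import tempfile
from array import array

# Write a hypothetical column of float64 values to disk as raw bytes.
path = os.path.join(tempfile.mkdtemp(), "column.bin")
with open(path, "wb") as f:
    array("d", [1.5, 2.5, 3.5, 4.5]).tofile(f)

with open(path, "rb") as f:
    # Map the whole file read-only; the OS pages data in on demand.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        # cast("d") reinterprets the mapped bytes as doubles without copying them.
        with memoryview(mapped) as raw, raw.cast("d") as view:
            total = sum(view)
            first, last = view[0], view[-1]

print(total, first, last)
```

An Arrow file adds a schema and metadata around buffers like this one, which is what lets different runtimes agree on how to interpret the mapped bytes.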

go-py-arrow-bridge

1 18 0.0 Go

Bridge between Go and Python to facilitate zero-copy using Apache Arrow

Not only between processes, but also between languages in a single process. In this POC I spun up a Python interpreter in Go and can pass the Arrow data buffer between the two languages in constant time. https://github.com/nickpoorman/go-py-arrow-bridge

Apache Arrow

76 13,698 10.0 C++

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

https://github.com/apache/arrow/tree/master/format

vega-loader-arrow

1 53 7.4 JavaScript

Data loader for the Apache Arrow format.

Not sure I follow, that page indicates that JS support is pretty good for all but the more obscure features (e.g. decimals) and doesn't mention data visualization at all? Anyhow, I've successfully used https://github.com/vega/vega-loader-arrow for in-browser plots before, and Observable has a fair few notebooks showing how to use the JS API (e.g. https://observablehq.com/@theneuralbit/introduction-to-apach...)

cylon

3 295 4.3 C++

Cylon is a fast, scalable, distributed memory, parallel runtime with a Pandas like DataFrame. (by cylondata)

cuDF and Cylon are two execution engines that natively support the Arrow format: https://github.com/rapidsai/cudf https://github.com/cylondata/cylon

ClickHouse

211 34,836 10.0 C++

ClickHouse® is a real-time analytics DBMS

On a side note, ClickHouse had some Arrow support:
https://github.com/ClickHouse/ClickHouse/issues/12284

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

42.parquet – A Zip Bomb for the Big Data Age
1 project | news.ycombinator.com | 26 Mar 2024

DuckDB-WASM: WebAssembly Version of DuckDB
1 project | news.ycombinator.com | 22 Jan 2024

Show HN: DuckDB-WASM, execute queries in a browser, and share them as links
1 project | news.ycombinator.com | 19 Dec 2023

Show HN: Udsv.js – A faster CSV parser in 5KB (min)
3 projects | news.ycombinator.com | 4 Sep 2023

[Question] Using DuckDB to connect to (external/cloud) Postgres DB
1 project | /r/learnpython | 24 May 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.
Tags: Database, Arrow, Data Analytics, WebAssembly
Post date: 3 Feb 2021
