Rethinking string encoding: a 37.5% more space-efficient encoding than UTF-8 in Fury

incubator-fury

Mentions: 16 | Stars: 2,677 | Activity: 9.7 | Language: Java

A blazingly fast multi-language serialization framework powered by JIT and zero-copy.

For implementation details, https://github.com/apache/incubator-fury/blob/main/java/fury... can be taken as an example.
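The 37.5% figure in the title is consistent with storing each character in 5 bits instead of the 8 bits UTF-8 uses for an ASCII character: 1 - 5/8 = 0.375. Below is a minimal Python sketch of that kind of bit packing. It assumes a hypothetical 30-symbol alphabet (a-z plus `.`, `_`, `$`, `|`); the symbol table, the names `encode_5bit`/`decode_5bit`, and passing the character count to the decoder are illustrative assumptions, not Fury's actual meta-string format.

```python
# Hypothetical alphabet: 26 lowercase letters + 4 separators = 30 symbols,
# so every symbol index fits in 5 bits. Illustrative only, not Fury's table.
ALPHABET = "abcdefghijklmnopqrstuvwxyz._$|"


def encode_5bit(s: str) -> bytes:
    """Pack each character of s into 5 bits, MSB first."""
    bits = 0    # pending bits not yet written out
    nbits = 0   # how many pending bits are valid
    out = bytearray()
    for ch in s:
        idx = ALPHABET.index(ch)  # ValueError if ch is outside the alphabet
        bits = (bits << 5) | idx
        nbits += 5
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1  # drop bits already emitted
    if nbits:
        # Left-align the leftover bits in a final byte, zero-padded.
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)


def decode_5bit(data: bytes, length: int) -> str:
    """Unpack `length` characters; the count disambiguates trailing padding."""
    bits = 0
    nbits = 0
    chars = []
    for byte in data:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= 5 and len(chars) < length:
            nbits -= 5
            chars.append(ALPHABET[(bits >> nbits) & 0x1F])
        bits &= (1 << nbits) - 1
    return "".join(chars)


if __name__ == "__main__":
    name = "com_example_user"  # 16 characters, all in the alphabet
    packed = encode_5bit(name)
    print(len(name.encode("utf-8")), "bytes as UTF-8 vs", len(packed), "bytes packed")
    assert decode_5bit(packed, len(name)) == name
```

For a length that is a multiple of 8, the saving is exactly 37.5% (16 characters become 10 bytes instead of 16); other lengths lose a little to padding in the last byte. Because that padding could otherwise be misread as extra `a` characters, a real format has to record the length or an equivalent marker somewhere; this sketch simply passes it in. Strings containing characters outside the alphabet cannot use this packing and would need a fallback such as plain UTF-8.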

zstd

Mentions: 109 | Stars: 22,550 | Activity: 9.6 | Language: C

Zstandard - Fast real-time compression algorithm

> In such cases, the serialized binaries are mostly 200~1000 bytes, not big enough for zstd to work
You're not referring to the same dictionary that I am. Look at --train in [1].
If you have a training corpus of representative data, you can generate a dictionary that you pre-share on both sides, which will perform much better for very small binaries (including 200-1k bytes).
If you want maximum flexibility (i.e. you don't know the universe of representative messages ahead of time, or you want maximum compression performance), you can gather this corpus transparently as messages are generated, then generate a dictionary and attach it as sideband metadata to a message. You'll probably need to defer decoding if a message references a dictionary not yet received (i.e. when the transport delivers messages out of order relative to when they were generated). There are other techniques you can apply, but the general rule is that your custom encoding scheme is unlikely to outperform zstd plus a representative training corpus. If it does, you'd need to actually show this rather than try to argue from first principles.
[1] https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md
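The pre-shared dictionary idea in the comment above can be sketched with Python's standard-library `zlib`, which accepts a preset dictionary through the `zdict` argument of `compressobj`/`decompressobj`. This swaps in DEFLATE for zstd (the standard library has no zstd binding), and the dictionary here is just a concatenation of made-up sample messages rather than one produced by `zstd --train`; treat it as an illustration of the principle, not a benchmark.

```python
import zlib
from typing import Optional

# Hypothetical sample messages standing in for a "representative training corpus".
SAMPLES = [
    b'{"type":"order","user_id":1001,"item":"keyboard","qty":1,"status":"shipped"}',
    b'{"type":"order","user_id":1002,"item":"monitor","qty":2,"status":"pending"}',
    b'{"type":"order","user_id":1003,"item":"mouse","qty":1,"status":"delivered"}',
]

# Crude stand-in for a trained dictionary: just the concatenated samples.
# Sender and receiver must hold exactly the same bytes.
SHARED_DICT = b"".join(SAMPLES)


def compress(msg: bytes, zdict: Optional[bytes] = None) -> bytes:
    """DEFLATE-compress msg, optionally priming the window with a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return c.compress(msg) + c.flush()


def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Inverse of compress(); the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


if __name__ == "__main__":
    # A new, small message that resembles (but is not one of) the samples.
    msg = b'{"type":"order","user_id":1004,"item":"keyboard","qty":3,"status":"pending"}'
    plain = compress(msg)
    primed = compress(msg, SHARED_DICT)
    print(f"raw: {len(msg)} B, no dict: {len(plain)} B, preset dict: {len(primed)} B")
    assert decompress(primed, SHARED_DICT) == msg
    assert decompress(plain) == msg
```

Without a dictionary, a message this short has almost no internal repetition for the compressor to exploit; with the preset dictionary, most of it can be encoded as back-references into bytes both sides already hold. With zstd itself, the corresponding workflow is to build the dictionary with `zstd --train` over sample messages, as the comment suggests, and supply it on both ends; training should select content more carefully than simple concatenation.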

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Apache Fury Serialization 0.5.1 released | news.ycombinator.com | 29 May 2024
Apache Fury – fast serialization framework – 0.5.0 released | news.ycombinator.com | 6 May 2024
Fast Cloud Native Java Serialization: Fury JIT and GraalVM Native Image AOT | news.ycombinator.com | 1 Dec 2023
Fury Serialization Framework 0.3.1 Released: Support Python 3.11&3.12 | /r/Python | 23 Nov 2023
Fury Serialization 0.3.1 Released: support Python 3.11&12 | news.ycombinator.com | 21 Nov 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Post date: 7 May 2024
