incubator-fury: A blazingly fast multi-language serialization framework powered by JIT and zero-copy.
For implementation details, https://github.com/apache/incubator-fury/blob/main/java/fury... can serve as an example.
> In such cases, the serialized binaries are mostly 200~1000 bytes. Not big enough for zstd to work
You're not referring to the same dictionary that I am. Look at --train in [1].
If you have a training corpus of representative data, you can generate a dictionary that you pre-share on both sides, which will perform much better for very small binaries (including the 200-1k byte range).
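To make that concrete, here is a rough sketch against zstd's C API (zdict.h / zstd.h), which is the same machinery the --train flag drives from the CLI. The JSON samples are invented and far too few for a real training run; a real corpus needs many thousands of captured messages, so this is only meant to show the call shape:

```c
/*
 * Sketch only: train a zstd dictionary from representative samples, then
 * compress one small message against it. Real use needs a much larger
 * sample corpus; training on three tiny samples will normally fail.
 */
#include <stdio.h>
#include <string.h>
#include <zstd.h>
#include <zdict.h>

int main(void) {
    /* Placeholder "captured messages" standing in for real serialized payloads. */
    const char *samples[] = {
        "{\"user\":1,\"name\":\"alice\",\"active\":true}",
        "{\"user\":2,\"name\":\"bob\",\"active\":false}",
        "{\"user\":3,\"name\":\"carol\",\"active\":true}",
    };

    /* ZDICT_trainFromBuffer expects the samples concatenated into one buffer
     * plus an array of their individual sizes. */
    char corpus[4096];
    size_t sizes[3];
    size_t off = 0;
    for (int i = 0; i < 3; i++) {
        sizes[i] = strlen(samples[i]);
        memcpy(corpus + off, samples[i], sizes[i]);
        off += sizes[i];
    }

    char dict[16 * 1024];
    size_t dict_size = ZDICT_trainFromBuffer(dict, sizeof dict, corpus, sizes, 3);
    if (ZDICT_isError(dict_size)) {
        fprintf(stderr, "training failed: %s\n", ZDICT_getErrorName(dict_size));
        return 1;
    }
    /* Ship `dict` to both sides out of band; both must use the same dictionary. */

    /* Compress a single ~40 byte message with the pre-shared dictionary. */
    const char msg[] = "{\"user\":4,\"name\":\"dave\",\"active\":true}";
    char dst[256];
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    size_t csize = ZSTD_compress_usingDict(cctx, dst, sizeof dst,
                                           msg, sizeof msg - 1,
                                           dict, dict_size, 3 /* level */);
    ZSTD_freeCCtx(cctx);
    if (ZSTD_isError(csize)) {
        fprintf(stderr, "compress failed: %s\n", ZSTD_getErrorName(csize));
        return 1;
    }
    printf("compressed %zu -> %zu bytes\n", sizeof msg - 1, csize);
    return 0;
}
```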
If you want maximum flexibility (i.e. you don't know the universe of representative messages ahead of time, or you want maximum compression performance), you can gather this corpus transparently as messages are generated, then train a dictionary and attach it as sideband metadata to a message. You'll probably need to defer decoding when a message references a dictionary that hasn't arrived yet (i.e. the transport may deliver messages out of order relative to when they were generated); a rough sketch of that deferred-decode flow follows below.

There are other techniques you can apply, but the general rule is that a custom encoding scheme is unlikely to outperform zstd plus a representative training corpus. If it does, you'd need to actually demonstrate that rather than argue it from first principles.
[1] https://github.com/facebook/zstd/blob/dev/programs/zstd.1.md
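A rough sketch of the sideband-dictionary idea: each message names the dictionary it was compressed with by id, and a message whose dictionary hasn't arrived yet is deferred rather than decoded. The envelope layout, the in-memory dictionary store, and all the names here are made up for illustration; only the ZSTD_decompress_usingDict call is actual zstd API.

```c
/*
 * Sketch of "dictionary as sideband metadata": every message carries the id
 * of the dictionary it was compressed with; if that dictionary has not been
 * received yet, the message is deferred instead of decoded.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <zstd.h>

/* Tiny in-memory store for dictionaries received so far (placeholder). */
#define MAX_DICTS 16
static struct { uint32_t id; const uint8_t *data; size_t len; } dicts[MAX_DICTS];
static int n_dicts = 0;

void dict_store_put(uint32_t id, const uint8_t *data, size_t len) {
    dicts[n_dicts].id = id;
    dicts[n_dicts].data = data;
    dicts[n_dicts].len = len;
    n_dicts++;
}

static bool dict_store_get(uint32_t id, const uint8_t **data, size_t *len) {
    for (int i = 0; i < n_dicts; i++) {
        if (dicts[i].id == id) {
            *data = dicts[i].data;
            *len = dicts[i].len;
            return true;
        }
    }
    return false;
}

/* Wire envelope: which dictionary the payload needs, then the payload. */
typedef struct {
    uint32_t dict_id;
    size_t payload_len;
    const uint8_t *payload;   /* zstd-compressed serialized message */
} envelope_t;

/*
 * Decode immediately if the referenced dictionary has arrived; otherwise
 * return false so the caller can queue the envelope and retry once the
 * dictionary shows up (delivery may be out of order w.r.t. generation).
 */
bool try_decode(const envelope_t *msg, uint8_t *out, size_t out_cap,
                size_t *decoded_len) {
    const uint8_t *dict;
    size_t dict_len;
    if (!dict_store_get(msg->dict_id, &dict, &dict_len))
        return false;                      /* dictionary not received yet */

    ZSTD_DCtx *dctx = ZSTD_createDCtx();
    size_t n = ZSTD_decompress_usingDict(dctx, out, out_cap,
                                         msg->payload, msg->payload_len,
                                         dict, dict_len);
    ZSTD_freeDCtx(dctx);
    if (ZSTD_isError(n))
        return false;
    *decoded_len = n;
    return true;
}
```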