Query Engines: Push vs. Pull

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • arroyo

    Distributed stream processing engine in Rust

  • Interesting - I looked into your code a bit. I found your window aggregation library [1]. You may be interested in looking into the Rust implementation of some of the research work I've been a part of [2].

    In Flink, I believe the reason they need to implement their own backpressure system is that they multiplex TCP connections. That is, they have multiple logical streams flowing through a single TCP connection. If that's the case, you need to do some work to 1) detect which logical stream is the one that's blocking, and 2) don't block because other logical streams may be able to use the active TCP connection.

    Thinking it through, I think what Flink's approach buys is not necessarily better performance, but better just a manageable number of connections. That is, imagine you have a process P1 with operators A, B and C. And then P2 has D, E, F. Now imagine that this is a shuffle, where A, B and C are fully connected to D, E and F. In my old system, you would have 9 TCP connections. In Flink, you will have 1.

    [1] https://github.com/ArroyoSystems/arroyo/blob/master/arroyo-w...

  • ClickHouse

    ClickHouse® is a free analytics DBMS for big data

  • We initially had a pull-based query engine in ClickHouse, but then migrated to a dataflow graph query engine: https://github.com/ClickHouse/ClickHouse/blob/master/src/Pro...

    It allows decoupling of the control flow and the data flow. The movement of the data in the query pipeline is controlled explicitly.

    We did this migration a few years ago. Many database engines forked or influenced by ClickHouse still use pull-based query engines.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • sliding-window-aggregators

    Reference implementations of sliding window aggregation algorithms

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts