Pure Python Distributed SQL Engine

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • datafusion-python

    Apache DataFusion Python Bindings

  • Interesting, I was wondering if you considered building on top of https://github.com/apache/arrow-datafusion-python

    I really do think a distributed db with compute/storage separation and optimized for feature engineering/dataloading (for training NNs) is underserved.

    I'd be very interested in the time series aspects of what you're building.

  • quokka

    Making data lake work for time series (by marsupialtail)

  • datafusion-ballista

    Apache Arrow Ballista Distributed Query Engine

  • Can you explain how this might differ from something like Ballista (https://github.com/apache/arrow-ballista)?

    I've seen several variants of "next-gen" Spark, but I haven't really seen the tradeoffs, advantages, and disadvantages between them laid out anywhere.

  • sqlglot

    Python SQL Parser and Transpiler

  • pg8000

    A Pure-Python PostgreSQL Driver

  • When people say "pure X", to me, it normally means they didn't involve an FFI or external compiler. This is an often beneficial thing since it simplifies your build process.

    For example, here [0] is a "pure Python postgres driver" and the implication is that it doesn't use libpq.

    Or see also this discussion [1].

    [0] https://github.com/tlocke/pg8000

    [1] https://www.reddit.com/r/learnpython/comments/nktut1/eli5_th...
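
As a rough illustration of the "pure X" definition in the comment above (not part of the original discussion), the sketch below checks whether a Python module is backed by a source file rather than a compiled extension or a built-in. The function name `is_pure_python_module` is hypothetical, and the check is only a heuristic: it looks at a single module, so it cannot tell whether that module loads native code indirectly (for example through `ctypes` or a compiled dependency).

```python
import importlib.machinery
import importlib.util


def is_pure_python_module(name: str) -> bool:
    """Return True if `name` resolves to a Python source file.

    Hypothetical helper for illustration only. It inspects one module;
    it does not follow imports or detect FFI calls made at runtime.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(name)

    origin = spec.origin
    # Built-in and frozen modules are compiled into the interpreter;
    # namespace packages have no origin at all.
    if origin is None or origin in ("built-in", "frozen"):
        return False

    # Compiled extension modules end in a platform-specific suffix
    # such as ".so" or ".pyd".
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return False

    # Anything left counts as pure Python only if it is a source file.
    return origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES))
```

Calling `is_pure_python_module("colorsys")` inspects a standard-library module shipped as a `.py` file, while `is_pure_python_module("sys")` inspects one built into the interpreter.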

  • polars

    Dataframes powered by a multithreaded, vectorized query engine, written in Rust

  • Yes, we have basic SQL support.

    Here are some examples of how to use it in Python:

    https://github.com/pola-rs/polars/blob/91a419acaf024e64410e7...

    However, full SQL support is on the roadmap. It's just a matter of hours in a day...

  • opteryx

    🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.

  • Thanks for sharing.

    I have a SQL engine in Python too (https://github.com/mabel-dev/opteryx). I focused my initial effort on supporting SQL statements and making the usage feel like a database. That probably reflects the problem I had in front of me when I set out: handling only handfuls of gigabytes in a batch environment for ETLs, with a group of engineers new to data engineering. I have recently started looking more at real-time performance, such as distributing work, and am interested in how you've approached it.

  • sqlparser-rs

    Extensible SQL Lexer and Parser for Rust

  • It uses https://github.com/sqlparser-rs/sqlparser-rs as the lexer and parser. The binder, planner, optimizer, and executor are written in Python. The optimizer works only on the logical plan, and its rules are purely heuristic.
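
To make the last comment more concrete, here is a minimal sketch of what a heuristic rule over a logical plan can look like. It is not the project's code: the node classes (`Scan`, `Project`, `Filter`), the rule `push_filter_below_project`, and the toy `execute` function are all assumptions made up for illustration. The rule is heuristic in the sense that it is applied whenever its pattern matches, without consulting statistics or a cost model.

```python
from dataclasses import dataclass
from typing import Any, Union


# Hypothetical logical plan nodes (illustration only).
@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Project:
    child: "Plan"
    columns: tuple


@dataclass(frozen=True)
class Filter:
    child: "Plan"
    column: str
    value: Any  # keep rows where row[column] == value


Plan = Union[Scan, Project, Filter]


def push_filter_below_project(plan: Plan) -> Plan:
    """Rewrite Filter(Project(x)) into Project(Filter(x)).

    Applied unconditionally when the pattern matches, so fewer rows
    reach the projection. No cost estimate is involved.
    """
    if isinstance(plan, Filter):
        child = push_filter_below_project(plan.child)
        if isinstance(child, Project) and plan.column in child.columns:
            pushed = push_filter_below_project(
                Filter(child.child, plan.column, plan.value)
            )
            return Project(pushed, child.columns)
        return Filter(child, plan.column, plan.value)
    if isinstance(plan, Project):
        return Project(push_filter_below_project(plan.child), plan.columns)
    return plan


def execute(plan: Plan, tables: dict) -> list:
    """Toy executor: evaluate a plan over in-memory lists of dict rows."""
    if isinstance(plan, Scan):
        return list(tables[plan.table])
    if isinstance(plan, Filter):
        rows = execute(plan.child, tables)
        return [r for r in rows if r[plan.column] == plan.value]
    if isinstance(plan, Project):
        rows = execute(plan.child, tables)
        return [{c: r[c] for c in plan.columns} for r in rows]
    raise TypeError(f"unknown plan node: {plan!r}")
```

Because the rewrite only rearranges logical operators, running `execute` on the original plan and on the rewritten plan should return the same rows; a physical planner and a cost-based optimizer would sit after this stage in a fuller engine.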

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
