quokka vs Daft

Project	quokka	Daft
Mentions	23	9
Stars	1,088	1,792
Growth	-	6.0%
Activity	8.3	9.8
Latest Commit	8 months ago	6 days ago
Language	Python	Rust
License	Apache License 2.0	Apache License 2.0

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.

quokka

Posts with mentions or reviews of quokka. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-08.

How Query Engines Work
2 projects | news.ycombinator.com | 8 Sep 2023

An awesome read!
Something related that I found out about from HN a few months back is another engine called quokka. It's particularly interesting and applicable how quokka schedules distributed queries to outperform Spark https://github.com/marsupialtail/quokka/blob/master/blog/why...

Quokka – Distributed Polars on Ray
1 project | news.ycombinator.com | 30 Jun 2023

Algorithmic Trading with Go
6 projects | news.ycombinator.com | 30 Jun 2023

Hi Justin, you might be interested in my blog: https://github.com/marsupialtail/quokka/blob/master/blog/bac... advocating a cloud based approach.
You don't have to use the system I am building, but it's worth thinking about that design.

Daft: A High-Performance Distributed Dataframe Library for Multimodal Data
4 projects | news.ycombinator.com | 7 Jun 2023

SQL support is very challenging.
I work on Quokka (https://github.com/marsupialtail/quokka). I support Iceberg reads. Recently we are adding SQL support from just parsing the DuckDB logical plan, though that is very challenging as well.
The Python world lacks a standard for a plug and play SQL query optimizer. Apache Calcite is good for the JVM world, but not great if you are trying to cut out the JVM.

Why your dataframe library needs to understand vector embeddings
2 projects | news.ycombinator.com | 20 May 2023

The Inner Workings of Distributed Databases
1 project | news.ycombinator.com | 17 Apr 2023

In case people are interested, I wrote a post about fault tolerance strategies of data systems like Spark and Flink: https://github.com/marsupialtail/quokka/blob/master/blog/fau...
The key difference here is that these systems don't store data, so fault tolerance means recovering within a query instead of not losing data.

Launch HN: DAGWorks – ML platform for data science teams
7 projects | news.ycombinator.com | 7 Mar 2023

would love to collaborate on an integration with pyquokka (https://github.com/marsupialtail/quokka) once I put out a stable release end of this month :-)

is spark always your go to solution ?
1 project | /r/dataengineering | 14 Feb 2023

Then you should keep an eye on quokka. This may become the "Spark" for Polars/DuckDB. It seems to be under active development though I'm not sure how stable it is.

Distributed fault tolerance made simple
1 project | news.ycombinator.com | 12 Feb 2023

Fault tolerance for distributed data systems is quite simple
1 project | news.ycombinator.com | 31 Jan 2023

Daft

Posts with mentions or reviews of Daft. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-02-29.

Daft: Distributed DataFrame for Python
2 projects | news.ycombinator.com | 29 Feb 2024

There are benchmarks here - https://github.com/Eventual-Inc/Daft?tab=readme-ov-file#benc.... Seems to outperform Dask by a fair bit.

Daft: A High-Performance Distributed Dataframe Library for Multimodal Data
4 projects | news.ycombinator.com | 7 Jun 2023

Hi (one of the maintainers here), that is a good suggestion! I wasn't aware of that project. I went ahead and made an issue to add `export DO_NOT_TRACK=1` as one of the variables we track! https://github.com/Eventual-Inc/Daft/issues/1015
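
For readers unfamiliar with the opt-out variable mentioned in that comment, here is a minimal sketch of how a library might honor it, assuming `DO_NOT_TRACK=1` means "send no telemetry". The function name `telemetry_enabled` is hypothetical and chosen for illustration only; this is not Daft's actual implementation.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when the user has opted out via DO_NOT_TRACK=1.

    Hypothetical helper: the name and the exact matching rule are
    assumptions for illustration, not taken from any real library.
    """
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    # Treat only the literal value "1" (ignoring surrounding whitespace) as an opt-out.
    return env.get("DO_NOT_TRACK", "").strip() != "1"
```

Passing the environment as a parameter keeps the check easy to exercise without mutating `os.environ`; a caller would typically just write `if telemetry_enabled(): ...`.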

1 project | news.ycombinator.com | 6 Jun 2023

Daft: The Distributed Python Dataframe
4 projects | /r/Python | 23 Mar 2023

We are looking at supporting other distributed backends as well - please drop by our discussion forums (https://github.com/Eventual-Inc/Daft/discussions) and drop us a message if you have any suggestions! We’d love to hear from you :)

What are some alternatives?

When comparing quokka and Daft you can also consider the following projects:

opteryx - 🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.

polars - Dataframes powered by a multithreaded, vectorized query engine, written in Rust

cempaka - "Write a trading bot which buys low and sells high." Sounds simple enough, right?

xvc - A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

awesome-pipeline - A curated list of awesome pipeline toolkits inspired by Awesome Sysadmin

hamilton - A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

spyql - Query data on the command line with SQL-like SELECTs powered by Python expressions

deeplake - Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

pg8000 - A Pure-Python PostgreSQL Driver

lightflus - A Lightweight, Cloud-Native Stateful Distributed Dataflow Engine

blog - Some notes on things I find interesting and important.

hamilton - Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage and metadata. Runs and scales everywhere python does.
