Changing std::sort at Google’s Scale and Beyond

xeus-cling

16 2,985 4.2 C++

Jupyter kernel for the C++ programming language

awesome-algorithms

34 17,973 3.1

A curated list of awesome places to learn and/or practice algorithms.

https://github.com/tayllan/awesome-algorithms#github-librari...
awesome-theoretical-computer-science > Machine Learning Theory, Physics; Grover's; and surely something is faster than Timsort:

awesome-theoretical-computer-science

12 611 4.0 Python

The interdisciplinary field of Mathematics and Computer Science, distinguished by its emphasis on mathematical technique and rigour.

awesome-theoretical-computer

1 - -

fluxsort

12 664 6.4 C

A fast branchless stable quicksort / mergesort hybrid that is highly adaptive.

Any chance you could comment on fluxsort[0], another fast quicksort? It's stable and uses a buffer about the size of the original array, which sounds like it puts it in a similar category as glidesort. Benchmarks against pdqsort are at the end of that README; I can verify that it's faster on random data by 30% or so, and the stable partitioning should mean it's at least as adaptive. However, the current implementation deals with nearly-sorted data using an initial analysis pass followed by adaptive mergesort rather than optimistic insertion sort, which IMO is fragile. There's an in-place effort called crumsort[1] along similar lines, but it's not stable.
I've been doing a lot of work on sorting[2], in particular working to hybridize various approaches better. Very much looking forward to seeing how glidesort works.
[0] https://github.com/scandum/fluxsort
[1] https://github.com/scandum/crumsort
[2] https://mlochbaum.github.io/BQN/implementation/primitive/sor...
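For readers unfamiliar with the term, the "optimistic insertion sort" mentioned in the comment can be sketched roughly as follows. This is only an illustration of the general idea (try insertion sort, give up once it turns out to be too much work); the function name and the `move_limit` parameter are hypothetical, and this is not the code used by fluxsort, pdqsort, or glidesort.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: attempts to sort v with insertion sort, but bails out
// once more than move_limit element shifts have been performed.
// Returns true if v is fully sorted on return, false if it gave up early.
bool optimistic_insertion_sort(std::vector<int>& v, std::size_t move_limit) {
    std::size_t moves = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        const int value = v[i];
        std::size_t j = i;
        // Strict comparison: equal elements are never moved past each other.
        while (j > 0 && value < v[j - 1]) {
            v[j] = v[j - 1];  // shift the larger element one slot right
            --j;
            if (++moves > move_limit) {
                v[j] = value;  // fill the hole so v remains a permutation
                return false;  // too far from sorted; caller should fall back
            }
        }
        v[j] = value;
    }
    return true;
}
```

On nearly-sorted input the call finishes within the budget; otherwise the caller would fall back to its regular quicksort or mergesort path, having wasted only a bounded amount of work.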

crumsort

7 315 3.6 C

A branchless unstable quicksort / mergesort that is highly adaptive.


SHOGUN

1 3,011 4.8 C++

Shōgun

The function is trying to get the median, which is not defined for an empty set. With this particular implementation, there is an assert for that:
https://github.com/shogun-toolbox/shogun/blob/9b8d85/src/sho...
Unrelatedly, but from the same section:
> Fixes are trivial, access the nth element only after the call being made. Be careful.
Wouldn't the proper fix be to do the nth_element for the larger element first (for those cases that don't do that already) and then adjust the end to be begin + larger_n for the second nth_element call? Otherwise the second call will check [begin + larger_n, end) again for no reason at all.
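The ordering suggested in that comment might look something like the sketch below. The helper name and its parameters are made up for illustration and are not taken from Shogun; the assert stands in for whatever precondition check a real implementation would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the values that would sit at sorted positions
// smaller_n and larger_n, where smaller_n <= larger_n < v.size().
// Takes v by value so the caller's data is left untouched.
std::pair<int, int> two_order_statistics(std::vector<int> v,
                                         std::size_t smaller_n,
                                         std::size_t larger_n) {
    assert(smaller_n <= larger_n && larger_n < v.size());

    // Place the larger order statistic first. Afterwards everything in
    // [begin, begin + larger_n) is <= v[larger_n].
    std::nth_element(v.begin(), v.begin() + larger_n, v.end());
    const int larger = v[larger_n];  // read only after the call

    // The smaller order statistic now lies in the left part, so the second
    // call can stop at begin + larger_n instead of scanning to end again.
    if (smaller_n < larger_n) {
        std::nth_element(v.begin(), v.begin() + smaller_n,
                         v.begin() + larger_n);
    }
    const int smaller = v[smaller_n];  // read only after the call
    return {smaller, larger};
}
```

Both values are read only after the corresponding nth_element call has returned, which is the pitfall the quoted section warns about.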


NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Machine Learning c-plus-plus-14 Awesome Data Science jupyter-kernels
Post date: 20 Apr 2022
