Changing std::sort at Google’s Scale and Beyond

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • xeus-cling

    Jupyter kernel for the C++ programming language

  • awesome-algorithms

    A curated list of awesome places to learn and/or practice algorithms.

  • https://github.com/tayllan/awesome-algorithms#github-librari...

    awesome-theoretical-computer-science > Machine Learning Theory, Physics; Grover's; and surely something is faster than Timsort:

  • awesome-theoretical-computer-science

    The interdisciplinary field of Mathematics and Computer Science, distinguished by its emphasis on mathematical technique and rigour.

  • fluxsort

    A fast branchless stable quicksort / mergesort hybrid that is highly adaptive.

  • Any chance you could comment on fluxsort[0], another fast quicksort? It's stable and uses a buffer about the size of the original array, which sounds like it puts it in a similar category as glidesort. Benchmarks against pdqsort are at the end of that README; I can verify that it's faster on random data by 30% or so, and the stable partitioning should mean it's at least as adaptive (but the current implementation uses an initial analysis pass followed by adaptive mergesort rather than optimistic insertion sort to deal with nearly-sorted data, which IMO is fragile). There's an in-place effort called crumsort[1] along similar lines, but it's not stable.

    I've been doing a lot of work on sorting[2], in particular working to hybridize various approaches better. Very much looking forward to seeing how glidesort works.

    [0] https://github.com/scandum/fluxsort

    [1] https://github.com/scandum/crumsort

    [2] https://mlochbaum.github.io/BQN/implementation/primitive/sor...
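The comment above describes a stable partition that relies on a buffer about the size of the input. A minimal sketch of that general idea follows; it is not taken from fluxsort, and the function name `stable_partition_with_buffer`, the `int` element type, and the caller-supplied `scratch` vector are all assumptions made for illustration. Each element is written to both destinations and only one cursor advances, which gives the compiler a chance to avoid a data-dependent branch, though whether it does so is up to the compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not fluxsort's code: stable partition around `pivot`
// using a scratch buffer at least as long as the input.
// Returns the number of elements < pivot. Those end up first, in their
// original relative order, followed by the rest in their original order.
std::size_t stable_partition_with_buffer(std::vector<int>& a,
                                         std::vector<int>& scratch,
                                         int pivot) {
    assert(scratch.size() >= a.size());
    std::size_t left = 0;   // next slot for "small" elements, inside a
    std::size_t right = 0;  // next slot for "large" elements, inside scratch

    for (std::size_t i = 0; i < a.size(); ++i) {
        const int x = a[i];
        const bool is_small = x < pivot;
        // Write to both destinations unconditionally; left <= i always holds,
        // so a[left] has already been read and is safe to overwrite.
        a[left] = x;
        scratch[right] = x;
        // Advance exactly one cursor; the other slot is overwritten later.
        left += is_small;
        right += !is_small;
    }

    // Append the large elements after the small ones (left + right == size).
    std::copy(scratch.begin(),
              scratch.begin() + static_cast<std::ptrdiff_t>(right),
              a.begin() + static_cast<std::ptrdiff_t>(left));
    return left;
}
```

Calling it on `{5, 1, 9, 3, 7, 2, 8}` with pivot `5` leaves the three smaller values first in their original order, then the remaining four in theirs.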

  • crumsort

    A branchless unstable quicksort / mergesort that is highly adaptive.

  • SHOGUN

    Shōgun

  • The function is trying to get the median, which is not defined for an empty set. With this particular implementation, there is an assert for that:

    https://github.com/shogun-toolbox/shogun/blob/9b8d85/src/sho...

    Unrelatedly, but from the same section:

    > Fixes are trivial, access the nth element only after the call being made. Be careful.

    Wouldn't the proper fix be to do the nth_element for the larger element first (for those cases that don't do that already) and then adjust the end to be the begin + larger_n for the second nth_element call? Otherwise the second call will check [begin + larger_n, end) again for no reason at all.
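The ordering suggested in this comment can be sketched as follows. This is an illustration under stated assumptions, not the Shogun code or the article's fix: the function name `two_order_statistics`, the `int` element type, and the by-value parameter are hypothetical. It selects the larger position first, reads each element only after its `std::nth_element` call, and restricts the second call to the prefix `[begin, begin + hi_n)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the elements that would sit at sorted
// positions lo_n and hi_n (0-based, lo_n <= hi_n). The vector is taken by
// value so the caller's data is left untouched.
std::pair<int, int> two_order_statistics(std::vector<int> v,
                                         std::size_t lo_n,
                                         std::size_t hi_n) {
    // An order statistic is undefined on an empty range, so fail loudly.
    assert(!v.empty());
    assert(lo_n <= hi_n && hi_n < v.size());

    const auto first = v.begin();
    const auto hi_it = first + static_cast<std::ptrdiff_t>(hi_n);

    // Larger position first: afterwards every element in [first, hi_it)
    // is <= *hi_it, so the smaller order statistic lies in that prefix.
    std::nth_element(first, hi_it, v.end());
    const int hi = *hi_it;  // read only after the call has been made

    // The second call works on the prefix only and never revisits the tail.
    if (lo_n < hi_n) {
        std::nth_element(first, first + static_cast<std::ptrdiff_t>(lo_n), hi_it);
    }
    const int lo = v[lo_n];
    return {lo, hi};
}
```

For a nine-element input holding the values 1 through 9 in any order, positions 2 and 6 give 3 and 7, and passing 4 for both positions gives the median, 5, twice.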
