sb_lower_bound vs branchless-binary-search
Metric | sb_lower_bound | branchless-binary-search
---|---|---
Mentions | 8 | 2
Stars | 15 | 10
Growth | - | -
Activity | 3.9 | 10.0
Last commit | 11 months ago | over 4 years ago
Language | C++ | C
License | - | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
sb_lower_bound
- Fastest Branchless Binary Search
Then you'll want to look at https://mhdm.dev/posts/sb_lower_bound/#prefetching
100 MB is large enough that the branchy version turns out to have a slight advantage, due more to quirks of x86 (speculative execution) than to it being inherently better.
"very similar topic" is an understatement. Funnily enough, the "implementation to perform the best on Apple M1 after all micro-optimizations are applied" in the Conclusion is equivalent to sb_lower_bound in terms of how many comparisons are actually made. Out of curiosity I benchmarked the two, and orlp lower_bound seems to perform slightly worse: ~39ns average (using gcc) vs. ~33ns average for sb_lower_bound (using clang -cmov). I'm comparing the best runs for both; the usual disclaimer applies that this was tested on my machine.
branchless-binary-search
- Fastest Branchless Binary Search
I took a stab at the same problem a while ago. Since the upper bound on the number of iterations depends on the input length, if you write your search in a way that extra iterations don't change the result, you can use a switch fallthrough to "unroll" the loop and not have to branch.
https://github.com/ehrmann/branchless-binary-search/blob/mas...
- Beautiful branchless binary search
Shameless plug of my attempt at this: https://github.com/ehrmann/branchless-binary-search
What are some alternatives?
ThinkingInSimd - An essay comparing performance implications of ignoring AVX acceleration
amh-code - Complete implementations from "Algorithms for Modern Hardware"
tigerbeetle - The distributed financial transactions database designed for mission critical safety and performance.
Spreads - Series and Panels for Real-time and Exploratory Analysis of Data Streams
zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
Nim - Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
optimization-manual - Contains the source code examples described in the "Intel® 64 and IA-32 Architectures Optimization Reference Manual"
rust - Empowering everyone to build reliable and efficient software.