| | amh-code | 1brc |
|---|---|---|
| Mentions | 8 | 28 |
| Stars | 551 | 5,190 |
| Growth | - | - |
| Activity | 10.0 | 9.8 |
| Last commit | over 1 year ago | 20 days ago |
| Language | Jupyter Notebook | Java |
| License | - | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
amh-code
-
Ask HN: Recommendations for high quality, free CS books online
I recently stumbled on https://en.algorithmica.org/hpc/ which I absolutely loved. It's really well written, comprehensible and concise. It felt like a pleasure to read, which I find really rare with CS textbooks, and I feel like I've come out of it understanding how computers work a bit better.
Does anyone have any similar CS books they'd recommend? Ideally they'd be:
- Algorithms for Modern Hardware
-
Ask HN: How can I learn about performance optimization?
I admire Daniel Lemire’s work on SIMD implementations. [Lemire]
[Lemire] https://lemire.me/en/#publications
I learn a lot by reading my compiler’s and profiler’s documentation.
For Rust, the Rust Performance Book by Nicholas Nethercote et al. [Nethercote] seems like a nice place to start after reading the Cargo and rustc books.
[Nethercote] https://nnethercote.github.io/perf-book/
Algorithms for Modern Hardware by Sergey Slotin [Slotin] is a dense and approachable overview.
[Slotin] https://en.algorithmica.org/hpc/
Quantitative understanding of the underlying implementations and computer architecture has been invaluable for me. Computer Architecture: A Quantitative Approach by John L. Hennessy and David A. Patterson [H&P] and Computer Organization and Design: The Hardware/Software Interface by Patterson and Hennessy [P&H ARM, P&H RISC] are the two introductory books I like best. There are three editions of the second book: the ARM, MIPS and RISC-V editions.
[H&P] https://www.google.com/books/edition/_/cM8mDwAAQBAJ
- Algorithms for Modern Hardware – Algorithmica
-
Ask HN: Programming Courses for Experienced Coders?
Hello, recently I've enjoyed Casey Muratori's Performance-Aware Programming course[0]. You could read Algorithms for Modern Hardware[1] to learn a similar set of material, though. Casey's course is aimed at bringing beginners all the way to a nearly-industry-leading understanding of performance issues, while the book assumes a bit more knowledge, but I think a lot of people have trouble getting into this stuff using a book if they don't have related experience.
I've also found Hacker's Delight Second Edition[2] to be a useful reference, and I really wish that I would get around to reading What Every Programmer Should Know About Memory[3] in full, because I end up reading a bunch of other things[4] to learn stuff that's surely in there.
[0]: https://www.computerenhance.com/p/welcome-to-the-performance...
[1]: https://en.algorithmica.org/hpc/
[2]: https://github.com/lancetw/ebook-1/blob/80eccb7f59bf102586ba...
[3]: https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
[4]: https://danluu.com/3c-conflict/
-
SIMD Everywhere Optimization from ARM Neon to RISC-V Vector Extensions
https://en.algorithmica.org/hpc/ and http://0x80.pl/ have some stuff about this, but the latter can be dense. I've had fun getting my hands dirty with some problems at https://highload.fun/ but there's not much direction unless you go to the Telegram chat and ask people questions.
-
Fastest Branchless Binary Search
Other fast binary searches https://github.com/sslotin/amh-code/tree/main/binsearch
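For readers unfamiliar with the term, here is a minimal sketch of the "branchless" lower-bound formulation these posts refer to. It is written in Python purely for illustration and is not taken from the linked repository; the function name `lower_bound` is an assumption. The point is the structure: the comparison result is used as a number that scales the step, rather than steering an `if`. In a compiled language this shape lets the compiler emit a conditional move; Python itself gains no speed from it.

```python
def lower_bound(a, x):
    """Return the index of the first element of sorted list `a` that is >= x.

    Illustrative sketch only: the comparison result (False/True, i.e. 0/1)
    is multiplied into the step instead of being used in an if/else.
    """
    if not a:
        return 0
    base = 0          # start of the current search window
    n = len(a)        # size of the current search window
    while n > 1:
        half = n // 2
        # Advance by `half` only when the probed element is smaller than x.
        base += (a[base + half - 1] < x) * half
        n -= half
    # One final comparison decides between `base` and `base + 1`.
    return base + (a[base] < x)


if __name__ == "__main__":
    data = [1, 3, 5, 7, 9]
    print(lower_bound(data, 5))
    print(lower_bound(data, 10))
```

The loop always runs the same number of iterations for a given length, which is what makes the access pattern predictable.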
1brc
-
The One Billion Row Challenge in CUDA: from 17 minutes to 17 seconds
This would be the code to beat. Ideally with only 8 cores but any number of cores is also very interesting.
https://github.com/gunnarmorling/1brc/discussions/710
-
One Billion Row Challenge in Golang - From 95s to 1.96s
Given that a 1-billion-line file is approximately 13 GB, the official repository does not provide a fixed database; instead it offers a script to generate synthetic data with random readings. Just follow the instructions to create your own database.
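To give a feel for what such a generator does, here is a small Python sketch. It is not the official script: the function name `generate_rows`, the station names, their mean temperatures, and the spread of 10.0 are all assumptions chosen for illustration. It emits rows in the `station;reading` form with one decimal place.

```python
import random


def generate_rows(n, stations, seed=0):
    """Yield `n` synthetic 'station;reading' rows.

    `stations` maps a station name to a (hypothetical) mean temperature.
    Readings are drawn from a normal distribution around that mean; the
    standard deviation of 10.0 is an assumption, not the official value.
    """
    rng = random.Random(seed)      # seeded so runs are reproducible
    names = list(stations)
    for _ in range(n):
        name = rng.choice(names)
        reading = rng.gauss(stations[name], 10.0)
        yield f"{name};{reading:.1f}"


if __name__ == "__main__":
    # Hypothetical station means, for demonstration only.
    demo_stations = {"Hamburg": 9.7, "Abha": 18.0, "Bulawayo": 18.9}
    for row in generate_rows(5, demo_stations):
        print(row)
```

Writing the rows to a file and scaling `n` up gives a local test input of whatever size is convenient.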
-
1BRC Merykitty's Magic SWAR: 8 Lines of Code Explained in 3k Words
Local disk I/O is no longer the bottleneck on modern systems: https://benhoyt.com/writings/io-is-no-longer-the-bottleneck/
In addition, the official 1BRC explicitly evaluated results on a RAM disk to avoid I/O speed entirely: https://github.com/gunnarmorling/1brc?tab=readme-ov-file#eva... "Programs are run from a RAM disk (i.e. the IO overhead for loading the file from disk is not relevant)"
-
Processing One Billion Rows in PHP!
You may have heard of "The One Billion Row Challenge" (1brc); in case you haven't, go check out Gunnar Morling's 1brc repo.
-
The One Billion Row Challenge in Go: from 1m45s to 4s in nine solutions
Here’s a thread on results with DuckDB. I don’t mean to discourage you from taking a shot at all, though: https://github.com/gunnarmorling/1brc/discussions/39
-
Ask HN: How can I learn about performance optimization?
If you are in “javaland”, look at the One Billion Row Challenge; you will learn a lot: https://github.com/gunnarmorling/1brc
- Lessons Learned from Doing the One Billion Row Challenge
- 1B Row Challenge Shows Java Can Process 1B Rows File in 2 Seconds
-
From slow to SIMD: A Go optimization story
Even manual vectorization is a pain... writing ASM, really?
Rust has unstable portable SIMD and a few third-party crates, C++ has that as well, C# has stable portable SIMD and a very small BLAS-like library on top of it (hell, it even exercises PackedSIMD when run in a browser), and Java is getting stable Panama vectors some time in the future (though the question of codegen quality stands open given planned changes to the unsafe API).
Go among these is uniquely disadvantaged. And if that's not enough, you may want to visit the 1BRC challenge discussions and see that Go struggles to get anywhere close to the 2s mark, while both C# and C++ are blazing past it:
https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-...
https://github.com/gunnarmorling/1brc/discussions/67
-
JEP Draft: Deprecate Memory-Access Methods in Sun.misc.Unsafe for Removal
In terms of performance: I realize that this is a somewhat "toy" issue, and it's a sample size of 1, but for the ongoing "One Billion Row Challenge"[1] (a Java performance competition about parsing and aggregating a 13 GB file), all of the current top performers are using Unsafe. More specifically, the use of Unsafe appears to have been the change that allowed a few entries to get below the 3-second barrier in the test.
1. https://github.com/gunnarmorling/1brc
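For context on what "parsing and aggregating" means here, the following is a deliberately naive Python baseline, not any contestant's solution. It assumes the input rows have the form `station;reading` and that the goal is the minimum, mean and maximum reading per station, sorted by station name. The function name `aggregate` is hypothetical. The tuned entries discussed above replace each step (line splitting, number parsing, the hash map) with hand-optimised code.

```python
def aggregate(lines):
    """Return {station: (min, mean, max)} for 'station;reading' rows.

    Naive reference sketch: one dictionary lookup and one float() call
    per row, with no parallelism or custom parsing.
    """
    stats = {}  # station -> [min, max, total, count]
    for line in lines:
        station, sep, reading = line.strip().partition(";")
        if not sep:
            continue  # skip blank or malformed rows
        value = float(reading)
        entry = stats.get(station)
        if entry is None:
            stats[station] = [value, value, value, 1]
        else:
            if value < entry[0]:
                entry[0] = value
            if value > entry[1]:
                entry[1] = value
            entry[2] += value
            entry[3] += 1
    # Build the result in station-name order.
    return {
        name: (lo, total / count, hi)
        for name, (lo, hi, total, count) in sorted(stats.items())
    }


if __name__ == "__main__":
    sample = ["Hamburg;12.0", "Bulawayo;8.9", "Hamburg;-2.0"]
    for name, (lo, mean, hi) in aggregate(sample).items():
        print(f"{name}={lo:.1f}/{mean:.1f}/{hi:.1f}")
```

Passing an open file object as `lines` works as well, since the function only iterates over it.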
What are some alternatives?
sb_lower_bound - Fastest Branchless Binary Search
1brc - C99 implementation of the 1 Billion Rows Challenge. 1️⃣🐝🏎️ Runs in ~1.6 seconds on my not-so-fast laptop CPU w/ 16GB RAM.
branchless-binary-search - Binary search implementation that avoids branch instructions
yolov7-object-tracking - YOLOv7 Object Tracking Using PyTorch, OpenCV and Sort Tracking
Nim - Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
csvlens - Command line csv viewer
tigerbeetle - The distributed financial transactions database designed for mission critical safety and performance.
nodejs - 1️⃣🐝🏎️ The One Billion Row Challenge with Node.js -- A fun exploration of how quickly 1B rows from a text file can be aggregated with different languages.
ThinkingInSimd - An essay comparing performance implications of ignoring AVX acceleration
pocketbase - Open Source realtime backend in 1 file
std-simd - std::experimental::simd for GCC [ISO/IEC TS 19570:2018]
Apache Arrow - Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing