Mentions | Stars | Project | Description |
---|---|---|---|
5 | 17,441 | llm.c | LLM training in simple, raw C/CUDA |
4 | 294 | dietgpu | GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications. |
1 | 826 | gsplat | CUDA accelerated rasterization of gaussian splatting |
1 | 176 | CGBN | CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups |
1 | 74 | cuda-checkpoint | CUDA checkpoint and restore utility |
1 | 6 | cuda-1brc | My CUDA solution to the 1BRC |
Latest Mentions
Latest mentioned CUDA repos
Stars | Project |
---|---|
17,441 | llm.c |
176 | CGBN |
826 | gsplat |
74 | cuda-checkpoint |
6 | cuda-1brc |
294 | dietgpu |
414 | flash-attention-minimal |
0 | blog-code |
614 | raft |
5 | tuna |
278 | NATTEN |
5 | build-nccl-tests-with-pytorch |
23 | GPUODEBenchmarks |
190 | RWKV-CUDA |
168 | causal-conv1d |
67 | ABMGPU |
7 | gpu-desktop-calculator |
57 | gdlog |
2 | DOKSparse |
1,575 | cugraph |
Latest Discoveries
Latest discovered CUDA repos
Stars | Project |
---|---|
176 | CGBN |
826 | gsplat |
74 | cuda-checkpoint |
6 | cuda-1brc |
17,441 | llm.c |
414 | flash-attention-minimal |
0 | blog-code |
5 | tuna |
278 | NATTEN |
5 | build-nccl-tests-with-pytorch |
23 | GPUODEBenchmarks |
168 | causal-conv1d |
67 | ABMGPU |
7 | gpu-desktop-calculator |
57 | gdlog |
22 | Harmonia_for_B_plus_trees |
0 | MandelbrotExplorer |
124 | Parallel-Computing-Cuda-C |
7 | GCGT |
667 | nccl-tests |
Recently updated posts
- How hard can generating 1024-bit primes be?
- Llm.c State of the Union
- CUDA Checkpoint and Restore
- Ask HN: Yo Nephew, in E. Africa, wants to train an LLM with on disk Wikipedia
- Show HN: One Billion Rows in CUDA