Latest CUDA-related posts with mentions of open-source projects

Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20

1 project | news.ycombinator.com | 6 days ago
Jaxsplat: 3D Gaussian Splatting for Jax

2 projects | news.ycombinator.com | 14 days ago
Welcome to the Parallel Future of Computation

5 projects | news.ycombinator.com | 15 days ago
Bend a Parallel Language

1 project | news.ycombinator.com | 16 days ago
Bend: A higher order language for the GPU

1 project | news.ycombinator.com | 16 days ago
ThunderKittens: Tile Primitives for Speedy Kernels

1 project | news.ycombinator.com | 16 days ago
Bend: A High-Level GPU Language Powered by HVM2

1 project | news.ycombinator.com | 17 days ago
Bend: A Python-Like Parallel Language for GPUs and Multicore CPUs

1 project | news.ycombinator.com | 18 days ago
SimpleGEMM

1 project | news.ycombinator.com | 21 days ago
How hard can generating 1024-bit primes be?

4 projects | news.ycombinator.com | 29 days ago
Llm.c State of the Union

1 project | news.ycombinator.com | 29 days ago
CUDA Checkpoint and Restore

1 project | news.ycombinator.com | about 1 month ago
Ask HN: Yo Nephew, in E. Africa, wants to train an LLM with on disk Wikipedia

1 project | news.ycombinator.com | about 1 month ago
Show HN: One Billion Rows in CUDA

1 project | news.ycombinator.com | about 2 months ago
The Simple Beauty of XOR Floating Point Compression

1 project | news.ycombinator.com | about 2 months ago
Show HN: Faster sorting with register shuffling in CUDA

1 project | news.ycombinator.com | 3 months ago
Raft: Fundamental widely-used algorithms and primitives for machine learning

1 project | news.ycombinator.com | 3 months ago
A Fast FP16xFP4 Gemm CUDA Kernel

1 project | news.ycombinator.com | 4 months ago
Direct Pixel-Space Megapixel Image Generation with Diffusion Models

1 project | news.ycombinator.com | 4 months ago
Show HN: Build NCCL-Tests and Configure SSHD in PyTorch Container

1 project | news.ycombinator.com | 5 months ago
Show HN: Demo of Agent Based Model on GPU with CUDA and OpenGL (Windows/Linux)

1 project | /r/hypeurls | 6 months ago
Show HN: GPU Desktop Calculator

1 project | news.ycombinator.com | 6 months ago
Punica: Serving multiple LoRA finetuned LLM as one

1 project | news.ycombinator.com | 7 months ago
CuGraph – GPU-accelerated graph analytics

1 project | news.ycombinator.com | 8 months ago
A High Throughput B+tree for SIMD Architectures [pdf]

2 projects | news.ycombinator.com | 9 months ago
Parallel Computing Using Cuda-C

1 project | /r/CUDA | 11 months ago
I want a 3d scanner...

1 project | /r/3Dprinting | 11 months ago
Has anyone tried out Squeezellm?

1 project | /r/LocalLLaMA | 11 months ago
Scanning in real life environments to be viewed in VR >>> taking pictures. Simple process from video -> render, using instant-ngp

1 project | /r/virtualreality | 11 months ago
How about Ranger Green?

1 project | /r/airsoft | about 1 year ago
Roast my MC kit

1 project | /r/airsoft | about 1 year ago
I started reading about CUDA programming and I don't see what makes it better than CPU programming

1 project | /r/learnprogramming | about 1 year ago
Has anyone tried to generate images from enough angles to feed Nvidia Nerf to make 3D models?

1 project | /r/StableDiffusion | about 1 year ago
Instant NPG: how do minimize noise and maximize quality? Tips welcome!

1 project | /r/computervision | about 1 year ago
tensor.to_sparse() Memory Allocation

1 project | /r/pytorch | about 1 year ago
GPU implementation of shortest path?

1 project | /r/learnpython | about 1 year ago
I NeRF'd the new Taco Bell on Rt. 40

1 project | /r/Delaware | about 1 year ago
[Mediasynthesis] The best AI models for upscaling image resolution?

1 project | /r/enfrancais | about 1 year ago
Scalix: A Data Parallel Compute Framework w/ Automatic Scaling

1 project | /r/HPC | about 1 year ago
How ? Title: A glitch in the Matrix discovered.

1 project | /r/CaptainDisillusion | about 1 year ago
[P] Clustering face embeddings (512d) using GCN's (not knowing the amount of needed clusters)

1 project | /r/MachineLearning | about 1 year ago
Why this video is sooo good but he's sooo underrated... Y'all should watch it, it's perfect.

1 project | /r/AyyMD | about 1 year ago
Don't have a $5k MacBook to run LLAMA65B? MiniLLM runs LLMs on GPUs in <500 LOC

1 project | news.ycombinator.com | about 1 year ago
MiniLLM: A minimal system for running LLMs on consumer-grade Nvidia GPUs

1 project | news.ycombinator.com | about 1 year ago
Kobra: A 3D Rendering Engine

1 project | /r/programming | about 1 year ago
Path Tracing Engine

1 project | /r/GraphicsProgramming | about 1 year ago
Maxing out the device

1 project | /r/CUDA | about 1 year ago
A beginner looking to run NeRF studio on my laptop

1 project | /r/NeuralRadianceFields | over 1 year ago
Do you think neural rendering ready in production?

2 projects | /r/computergraphics | over 1 year ago
2 days ago I was struggling to figure NeRF out. Today I'm exporting these. Excited to see where this technology goes.

1 project | /r/photogrammetry | over 1 year ago
