-
Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20
-
Jaxsplat: 3D Gaussian Splatting for Jax
-
Welcome to the Parallel Future of Computation
-
Bend: A Parallel Language
-
Bend: A higher order language for the GPU
-
ThunderKittens: Tile Primitives for Speedy Kernels
-
Bend: A High-Level GPU Language Powered by HVM2
-
Bend: A Python-Like Parallel Language for GPUs and Multicore CPUs
-
SimpleGEMM
-
How hard can generating 1024-bit primes be?
-
llm.c State of the Union
-
CUDA Checkpoint and Restore
-
Ask HN: Yo Nephew, in E. Africa, wants to train an LLM with on disk Wikipedia
-
Show HN: One Billion Rows in CUDA
-
The Simple Beauty of XOR Floating Point Compression
-
Show HN: Faster sorting with register shuffling in CUDA
-
Raft: Fundamental widely-used algorithms and primitives for machine learning
-
A Fast FP16xFP4 Gemm CUDA Kernel
-
Direct Pixel-Space Megapixel Image Generation with Diffusion Models
-
Show HN: Build NCCL-Tests and Configure SSHD in PyTorch Container
-
Show HN: Demo of Agent Based Model on GPU with CUDA and OpenGL (Windows/Linux)
-
Show HN: GPU Desktop Calculator
-
Punica: Serving multiple LoRA finetuned LLM as one
-
CuGraph – GPU-accelerated graph analytics
-
A High Throughput B+tree for SIMD Architectures [pdf]
-
Parallel Computing Using Cuda-C
-
I want a 3d scanner...
-
Has anyone tried out Squeezellm?
-
Scanning in real life environments to be viewed in VR >>> taking pictures. Simple process from video -> render, using instant-ngp
-
How about Ranger Green?
-
Roast my MC kit
-
I started reading about CUDA programming and I don't see what makes it better than CPU programming
-
Has anyone tried to generate images from enough angles to feed Nvidia Nerf to make 3D models?
-
Instant NGP: how do I minimize noise and maximize quality? Tips welcome!
-
tensor.to_sparse() Memory Allocation
-
GPU implementation of shortest path?
-
I NeRF'd the new Taco Bell on Rt. 40
-
[Mediasynthesis] The best AI models for upscaling image resolution?
-
Scalix: A Data Parallel Compute Framework w/ Automatic Scaling
-
How? Title: A glitch in the Matrix discovered.
-
[P] Clustering face embeddings (512d) using GCNs (without knowing the number of clusters needed)
-
Why this video is sooo good but he's sooo underrated... Y'all should watch it, it's perfect.
-
Don't have a $5k MacBook to run LLAMA65B? MiniLLM runs LLMs on GPUs in <500 LOC
-
MiniLLM: A minimal system for running LLMs on consumer-grade Nvidia GPUs
-
Kobra: A 3D Rendering Engine
-
Path Tracing Engine
-
Maxing out the device
-
A beginner looking to run NeRF studio on my laptop
-
Do you think neural rendering is ready for production?
-
2 days ago I was struggling to figure NeRF out. Today I'm exporting these. Excited to see where this technology goes.