HIPIFY
OpenRAND
HIPIFY
- CUDA Is Still a Giant Moat for Nvidia
- Nvidia hits $2T valuation as AI frenzy grips Wall Street
- Hipify automatically translates CUDA source code into portable HIP C++
- Intel CEO: 'The entire industry is motivated to eliminate the CUDA market'
> what would be the point for someone to add ROCm support to various pieces of software which currently require CUDA
It isn't just old cards, though. CUDA is a point of centralization on a single provider at a time when access to that provider's higher-end cards isn't even available, and that is causing people to look elsewhere.
ROCm supports CUDA through the included HIP projects...
https://github.com/ROCm/HIP
https://github.com/ROCm/HIPCC
https://github.com/ROCm/HIPIFY
The latter will regex-replace your CUDA methods with HIP methods. If it is as easy as running hipify on your codebase (or just coding to the HIP APIs), it certainly makes sense to do so.
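To make the "regex replace" idea concrete, here is a toy sketch of that style of translation. It is emphatically not the real HIPIFY tool (which handles far more cases); it just substitutes a handful of well-known CUDA API names with their HIP counterparts:

```python
import re

# Toy illustration of the hipify-perl idea (NOT the real tool): rewrite a
# few well-known CUDA identifiers to their HIP counterparts by substitution.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(source: str) -> str:
    # Build one alternation matching any known CUDA token; longest names
    # first so a longer identifier is never clipped by a shorter prefix.
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, names)))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

cuda_src = '#include <cuda_runtime.h>\nfloat *d; cudaMalloc(&d, 4); cudaFree(d);'
print(toy_hipify(cuda_src))
```

Because the HIP runtime API deliberately mirrors the CUDA runtime API one-to-one for common calls, even this naive substitution gets simple code surprisingly far; the real tools add clang-based parsing for the cases where textual replacement is not enough.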
- AMD leaps after launching AI chip that could challenge Nvidia dominance
OpenRAND
- Intel CEO: 'The entire industry is motivated to eliminate the CUDA market'
> Generating random numbers is a bit complicated!
I know! I just wrote a whole paper and published a library on this!
But really, perhaps not as much as many outsiders might think. The core of a Philox implementation can be around 50 lines of C++ [1]; with all the bells and whistles, maybe 300-400. That implementation's performance equals cuRAND's, and sometimes even surpasses it! (The API is designed to avoid maintaining any RNG state in device memory, something cuRAND forces you to do.)
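To give a feel for why the core can be so small, here is a schematic Python sketch of the Philox-2x32-10 round structure (multiplier and Weyl constants from Salmon et al.'s Random123 paper). It is an illustration of the counter-based design, not the OpenRAND implementation, and is not claimed to be bit-compatible with any reference test vectors:

```python
# Schematic counter-based generator in the style of Philox-2x32-10.
# Illustrative sketch only; not the OpenRAND code, not test-vector exact.
MASK32 = 0xFFFFFFFF
MULT = 0xD256D193   # Philox-2x32 round multiplier
WEYL = 0x9E3779B9   # per-round key increment

def philox2x32(ctr0: int, ctr1: int, key: int, rounds: int = 10):
    for _ in range(rounds):
        prod = MULT * ctr0                  # full 64-bit product
        hi, lo = prod >> 32, prod & MASK32
        # Feistel-like bijective round: fold the high product bits
        # together with the key and the other counter word.
        ctr0, ctr1 = (hi ^ key ^ ctr1) & MASK32, lo
        key = (key + WEYL) & MASK32
    return ctr0, ctr1

# Stateless draws: the "state" is just (counter, key), a pure function of
# its inputs, which is why nothing has to live in device memory.
print(philox2x32(0, 0, 42))
print(philox2x32(1, 0, 42))
```

The counter-based structure is the whole trick: to get the n-th random number of stream k, you evaluate the bijection at counter n with key k, with no sequential state to store or update.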
> running the same PRNG with the same seed on all your cores will produce the same result
You're right. The solution here is to use multiple generator objects, one per thread, ensuring each produces a statistically independent random stream. Some good algorithms (Philox, for example) allow you to use any set of unique values as seeds for your threads (e.g. the thread id).
[1] https://github.com/msu-sparta/OpenRAND/blob/main/include/ope...
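The one-generator-per-thread pattern can be sketched with Python's stdlib `random` standing in for a device-side generator (an assumption for illustration; the stdlib Mersenne Twister gives no formal independence guarantee for nearby seeds, which is exactly the guarantee counter-based designs like Philox provide):

```python
import random
from concurrent.futures import ThreadPoolExecutor

def worker(thread_id: int, n: int = 3):
    # Each thread owns a private generator seeded with its unique id,
    # so streams don't collide and runs are reproducible.
    rng = random.Random(thread_id)
    return [rng.random() for _ in range(n)]

with ThreadPoolExecutor(max_workers=4) as pool:
    streams = list(pool.map(worker, range(4)))

# Distinct seeds -> distinct streams; same seed -> same stream on rerun.
assert len({tuple(s) for s in streams}) == 4
assert streams[0] == worker(0)
```

The design point is that the generator lives in thread-local scope rather than being shared: sharing one generator either serializes the threads behind a lock or, worse, silently corrupts its state.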
What are some alternatives?
ZLUDA - CUDA on AMD GPUs
Cgml - GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.
llama-cpp-python - Python bindings for llama.cpp
stable-diffusion - This version of CompVis/stable-diffusion features an interactive command-line script that combines text2img and img2img functionality in a "dream bot" style interface, a WebGUI, and multiple features and other enhancements. [Moved to: https://github.com/invoke-ai/InvokeAI]
stable-diffusion-webui - Stable Diffusion web UI
HIPIFY - HIPIFY: Convert CUDA to Portable C++ Code [Moved to: https://github.com/ROCm/HIPIFY]
ROCm - ROCm Website [Moved to: https://github.com/ROCm/ROCm.github.io]