Top 23 Python GPU Projects

PyTorch

341 78,642 10.0 Python

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Project mention: PyTorch 2.3: User-Defined Triton Kernels, Tensor Parallelism in Distributed | news.ycombinator.com | 2024-05-10

DeepSpeed

51 33,122 9.8 Python

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.

ivy

17 14,027 10.0 Python

The Unified AI Framework

Project mention: Keras 3.0 | news.ycombinator.com | 2023-11-28

See also https://github.com/unifyai/ivy which I have not tried but seems along the lines of what you are describing, working with all the major frameworks

tvm

16 11,272 9.9 Python

Open deep learning compiler stack for CPU, GPU and specialized accelerators

Project mention: Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU | news.ycombinator.com | 2024-05-03

Yes. Web-llm is a wrapper of tvmjs: https://github.com/apache/tvm
Just wrappers all the way down

scalene

32 11,240 9.2 Python

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

Project mention: Memray – A Memory Profiler for Python | news.ycombinator.com | 2024-02-10

I collected a list of profilers (also memory profilers, also specifically for Python) here: https://github.com/albertz/wiki/blob/master/profiling.md
Currently I actually need a Python memory profiler, because I want to figure out whether there is some memory leak in my application (PyTorch based training script), and where exactly (in this case, it's not a problem of GPU memory, but CPU memory).
I tried Scalene (https://github.com/plasma-umass/scalene), which seems to be powerful, but somehow the output it gives me is not useful at all? It doesn't really give me a flamegraph, or a list of the top lines with memory allocations, but instead it gives me a listing of all source code lines, and prints some (very sparse) information on each line. So I need to search through that listing now by hand to find the spots? Maybe I just don't know how to use it properly.
I tried Memray, but first ran into an issue (https://github.com/bloomberg/memray/issues/212). After using a workaround, it worked. I get a flamegraph out, but it doesn't really seem accurate? After a while, there don't seem to be any new memory allocations at all anymore, and I don't quite trust that this is correct.
There is also Austin (https://github.com/P403n1x87/austin), which I also wanted to try (have not yet).
Somehow this experience so far was very disappointing.
(Side note: I debugged some very strange memory allocation behavior of Python before, where all local variables were kept around after an exception, even though I made sure there was no reference anymore to the exception object, to the traceback, etc., and I even called frame.clear() for all frames to really clear it. It turns out frame.f_locals will create another copy of all the local variables, and the exception object and all the locals in the other frame still stay alive until you access frame.f_locals again. At that point, it will sync the f_locals again with the real (fast) locals, and then it can finally free everything. It was quite annoying to find the source of this problem and to find workarounds for it. https://github.com/python/cpython/issues/113939)
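The side note above is about locals that outlive an exception. As a minimal sketch of the basic mechanism, assuming CPython's reference counting: a saved exception holds its traceback, the traceback holds the frames, and the frames hold their locals until they are cleared. The names `Payload`, `failing`, and `locals_lifetime` are hypothetical. This does not reproduce the `frame.f_locals` copy issue from the linked CPython report, whose behavior depends on the Python version.

```python
import traceback
import weakref


class Payload:
    """Stand-in for a large object held in a local variable."""


def locals_lifetime():
    """Return (alive_before, alive_after) around traceback.clear_frames()."""
    probes = []

    def failing():
        payload = Payload()
        probes.append(weakref.ref(payload))  # observe without keeping it alive
        raise ValueError("boom")

    try:
        failing()
    except ValueError as exc:
        saved = exc  # keeping the exception keeps its traceback and frames

    alive_before = probes[0]() is not None
    # Clear the locals of every finished frame in the traceback; the frame
    # that is still executing (this one) is skipped by clear_frames().
    traceback.clear_frames(saved.__traceback__)
    alive_after = probes[0]() is not None
    return alive_before, alive_after
```

On CPython, `locals_lifetime()` should return `(True, False)`: the local survives as long as the saved traceback references its frame, and is released once the frame is cleared.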

ImageAI

12 8,439 4.5 Python

A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities

Project mention: Photo gallery frontend with encryption and search | /r/selfhosted | 2023-11-27

Hi. I want to implement an image server similar to Photoprism, using ImageAI to tag images based on objects and context. However, I don't want to spend too much time working on the frontend. At first I was thinking about using Danbooru with Flexbooru or the web interface on my phone, but it doesn't have any encryption or password protection (since its purpose is to be used as a public image board).

cupy

22 7,843 9.9 Python

NumPy & SciPy for GPU

Project mention: Mojo: Ownership and lifetime checks deep dive with Chris Lattner [video] | news.ycombinator.com | 2024-05-13

I think I would agree with you. In my opinion, that already exists and is decently mature. CuPy [0] for Python and CUDA.jl [1] for Julia are both excellent ways to interface with the GPU that don't require you to get into the nitty-gritty of CUDA. Both do their best to keep you at the array-level abstraction until you actually need to start writing kernels yourself, and even then it's pretty simple. They took a complete GPU novice like me and let me write pretty performant kernels without ever having to touch raw CUDA.
[0] https://cupy.dev/

catboost

8 7,795 9.9 Python

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05

AlphaPose

4 7,750 0.0 Python

Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System

server

24 7,452 9.5 Python

The Triton Inference Server provides an optimized cloud and edge inferencing solution. (by triton-inference-server)

Project mention: FLaNK Weekly 08 Jan 2024 | dev.to | 2024-01-08

chainer

2 5,868 0.0 Python

A flexible framework of neural networks for deep learning

Project mention: ChaiNNer – Node/Graph based image processing and AI upscaling GUI | news.ycombinator.com | 2023-07-19

There is already an AI framework named Chainer: https://github.com/chainer/chainer

skypilot

34 5,762 9.8 Python

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

Project mention: Alternative clouds are booming as companies seek cheaper access to GPUs | news.ycombinator.com | 2024-05-06

Skypilot is worth a mention here:
https://github.com/skypilot-org/skypilot
Open source CLI to deploy multiple GPU VMs on all major cloud providers, with an option to use spot pricing. One cheap VM acts as a controller that makes sure you always have the most inexpensive deployment available, with failover and load balancing.
It’s like beating the cloud providers at their own game; I wouldn’t be surprised if they banned it.

tf-quant-finance

133 4,318 2.9 Python

High-performance TensorFlow library for quantitative finance.

Project mention: tf-quant-finance: NEW Derivatives and Hedging - star count:3911.0 | /r/algoprojects | 2023-06-10

nvitop

5 4,068 7.3 Python

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Project mention: Nvtop: Linux Task Monitor for Nvidia, AMD and Intel GPUs | news.ycombinator.com | 2024-03-12

That's why the authors recommend pipx for installing nvitop. I am not a sysadmin, but I prefer pipx over relying on the (often outdated) distro sources.
https://github.com/XuehaiPan/nvitop?tab=readme-ov-file#insta...

gpustat

7 3,882 6.7 Python

📊 A simple command-line utility for querying and monitoring GPU status

Project mention: Nvtop: Linux Task Monitor for Nvidia, AMD and Intel GPUs | news.ycombinator.com | 2024-03-12

My favorite would be gpustat [1]. It shows the bare minimum amount of information to let me know whether the training has problems or is running well.
[1] https://github.com/wookayin/gpustat
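To illustrate the kind of "bare minimum" status line described above, here is a rough sketch that parses the CSV output of `nvidia-smi --query-gpu=... --format=csv,noheader,nounits`. It is not how gpustat itself is implemented. The helper names (`parse_gpu_csv`, `query_gpus`, `summarize`) are hypothetical, and the parser assumes every field is numeric, which will not hold on GPUs that report values such as `[N/A]`.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        index, name, util, used, total = (field.strip() for field in record)
        rows.append({
            "index": int(index),
            "name": name,
            "util_percent": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return rows


def query_gpus():
    """Run nvidia-smi (assumed to be on PATH) and return the parsed rows."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


def summarize(rows):
    """Format one short status line per GPU."""
    return [
        f"[{r['index']}] {r['name']} | {r['util_percent']}% | "
        f"{r['mem_used_mib']} / {r['mem_total_mib']} MiB"
        for r in rows
    ]
```

On a machine with an NVIDIA driver installed, `print("\n".join(summarize(query_gpus())))` would print one line per GPU. `parse_gpu_csv` can also be tried on a pasted sample of the command's output.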

pytorch-forecasting

9 3,660 8.6 Python

Time series forecasting with PyTorch

Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14

jittor

4 3,020 8.1 Python

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

Project mention: VSL; Vlang's Scientific Library | /r/vlang | 2023-06-09

Would it make sense to have backend support for OpenXLA, Apache TVM, Jittor or similar, to get GPU, TPU and other accelerator support for free?

asitop

18 3,023 0.0 Python

Perf monitoring CLI tool for Apple Silicon

Project mention: Nvtop: Linux Task Monitor for Nvidia, AMD and Intel GPUs | news.ycombinator.com | 2024-03-12

There’s also asitop https://github.com/tlkh/asitop

leptonai

2 2,474 9.7 Python

A Pythonic framework to simplify AI service building

Project mention: Show HN: Running LLMs in one line of Python without Docker | news.ycombinator.com | 2023-10-04

Hello Hacker News! We're Yangqing, Xiang and JJ from lepton.ai. We are building a platform to run any AI model as easily as writing local code, and to get your favorite models in minutes. It's like a container for AI, but without the hassle of actually building a docker image.
We built and contributed to some of the world's most popular AI software - PyTorch 1.0, ONNX, Caffe, etcd, Kubernetes, etc. We also managed hundreds of thousands of computers in our previous jobs. And we found that the AI software stack is usually unnecessarily complex - and we want to change that.
Imagine you are a developer who sees a good model on GitHub or HuggingFace. To make it a production-ready service, the current solution usually requires you to build a docker image. But think about it: I have a few lines of Python code and a few Python dependencies. That sounds like a huge overhead, right?
lepton.ai is really a pythonic way to free you from such difficulties. You write a simple python scaffold around your PyTorch / TensorFlow code, and lepton launches it as a full-fledged service callable via python, javascript, or any language that understands OpenAPI. We use containers under the hood, but you don't need to worry about all the infrastructure nuts and bolts.
We have made the python library open-source at https://github.com/leptonai/leptonai/. With it, launching a common HuggingFace model is as simple as a one-liner, for example Stable Diffusion XL if you have a GPU.

pygraphistry

9 2,074 9.2 Python

PyGraphistry is a Python library to quickly load, shape, embed, and explore big graphs with the GPU-accelerated Graphistry visual graph analyzer

Project mention: Graph Data Fits in Memory | news.ycombinator.com | 2024-04-15

Extra fun: We find most enterprise/gov graph analytics work only requires 1-2 attributes to go along with the graph index, and those attributes often are already numeric (time, $, ...) or can be dictionary-encoded as discussed here (categorical, ID, ...)... so even 'tough' billion scale graphs are fine on 1 gpu.
Early, but that's been the basic thinking behind our new GFQL system: slice into the columns you want, and then do all the in-GPU traversals you want. In our V1, we keep things dataframe-native, including the in-GPU data representation, and are already working on the first extensions to support switching to more graph-native indexing for steps as needed.
Ex: https://github.com/graphistry/pygraphistry/blob/master/demos...
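The dictionary encoding mentioned in the comment above can be sketched in plain Python. This is an illustration of the general technique, not PyGraphistry's or GFQL's implementation. The function names and the `edge_kinds` sample column are hypothetical.

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer code.

    Returns (codes, dictionary), where dictionary[code] recovers the value.
    """
    code_of = {}      # value -> code
    dictionary = []   # code -> value
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return codes, dictionary


def dictionary_decode(codes, dictionary):
    """Recover the original values from their integer codes."""
    return [dictionary[code] for code in codes]


# Hypothetical categorical edge attribute: one label per edge.
edge_kinds = ["payment", "login", "payment", "transfer", "login"]
codes, dictionary = dictionary_encode(edge_kinds)
```

After encoding, the per-edge column holds only small integers, and the distinct labels are stored once in the dictionary. That is why a categorical or ID attribute can sit next to the graph index as a compact numeric column.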

jetson_stats

2 2,041 8.6 Python

📊 Simple package for monitoring and controlling your NVIDIA Jetson [Orin, Xavier, Nano, TX] series

PyCUDA

0 1,762 5.4 Python

CUDA integration for Python, plus shiny features
torchrec

1 1,743 9.8 Python

PyTorch domain library for recommendation systems

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python GPU related posts

PyTorch 2.3: User-Defined Triton Kernels, Tensor Parallelism in Distributed

1 project | news.ycombinator.com | 10 May 2024

Image classifier with a convolutional neural network (CNN)

2 projects | dev.to | 1 May 2024

Graph Data Fits in Memory

1 project | news.ycombinator.com | 15 Apr 2024

Functions and operators for Dot and Matrix multiplication and Element-wise calculation in PyTorch

1 project | dev.to | 21 Mar 2024

Building a GPT Model from the Ground Up!

1 project | dev.to | 20 Mar 2024

Nvtop: Linux Task Monitor for Nvidia, AMD and Intel GPUs

10 projects | news.ycombinator.com | 12 Mar 2024

The "missing" graph datatype already exists. It was invented in the '70s

6 projects | news.ycombinator.com | 5 Mar 2024

Index

What are some of the best open-source GPU projects in Python? This list will help you:

#	Project	Stars
1	PyTorch	78,642
2	DeepSpeed	33,122
3	ivy	14,027
4	tvm	11,272
5	scalene	11,240
6	ImageAI	8,439
7	cupy	7,843
8	catboost	7,795
9	AlphaPose	7,750
10	server	7,452
11	chainer	5,868
12	skypilot	5,762
13	tf-quant-finance	4,318
14	nvitop	4,068
15	gpustat	3,882
16	pytorch-forecasting	3,660
17	jittor	3,020
18	asitop	3,023
19	leptonai	2,474
20	pygraphistry	2,074
21	jetson_stats	2,041
22	PyCUDA	1,762
23	torchrec	1,743
