Top 23 C++ Deep Learning Projects

tensorflow

223 182,857 10.0 C++

An Open Source Machine Learning Framework for Everyone

Project mention: Side Quest Devblog #1: These Fakes are getting Deep | dev.to | 2024-04-29

# L2-normalize the encoding tensors
image_encoding = tf.math.l2_normalize(image_encoding, axis=1)
audio_encoding = tf.math.l2_normalize(audio_encoding, axis=1)

# Find euclidean distance between image_encoding and audio_encoding
# Essentially trying to detect if the face is saying the audio
# Will return nan without the 1e-12 offset due to https://github.com/tensorflow/tensorflow/issues/12071
d = tf.norm((image_encoding - audio_encoding) + 1e-12, ord='euclidean', axis=1, keepdims=True)

discriminator = keras.Model(inputs=[image_input, audio_input], outputs=[d], name="discriminator")
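
The role of the 1e-12 offset in the snippet above can be sketched in plain C++, without TensorFlow. This is only an illustration under stated assumptions: the helper names `l2_normalize`, `euclidean_distance`, and `distance_gradient` are hypothetical, and the sketch assumes the NaN comes from the gradient of the Euclidean norm when both encodings are identical, where the analytic gradient evaluates to 0/0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: scale v to unit Euclidean length.
// The eps floor avoids dividing by zero for an all-zero input.
std::vector<double> l2_normalize(const std::vector<double>& v, double eps = 1e-12) {
    double sum_sq = 0.0;
    for (double x : v) sum_sq += x * x;
    const double inv_norm = 1.0 / std::sqrt(std::max(sum_sq, eps));
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i] * inv_norm;
    return out;
}

// Euclidean distance between a and b, with `offset` added to every component
// of the difference (mirroring the `+ 1e-12` in the snippet above).
double euclidean_distance(const std::vector<double>& a,
                          const std::vector<double>& b,
                          double offset = 0.0) {
    assert(a.size() == b.size());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = (a[i] - b[i]) + offset;
        sum_sq += d * d;
    }
    return std::sqrt(sum_sq);
}

// Analytic gradient of the distance with respect to a:
//   (a - b + offset) / distance.
// For identical inputs and offset == 0 this is 0.0 / 0.0, which is NaN.
std::vector<double> distance_gradient(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      double offset = 0.0) {
    const double dist = euclidean_distance(a, b, offset);
    std::vector<double> grad(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        grad[i] = ((a[i] - b[i]) + offset) / dist;
    }
    return grad;
}
```

Calling `distance_gradient(x, x, 0.0)` on any vector `x` yields NaN components, while `distance_gradient(x, x, 1e-12)` stays finite, which is the behaviour the offset is meant to buy.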

OpenCV

197 75,938 9.9 C++

Open Source Computer Vision Library

Project mention: FridgeBot — GPT-4o shopping list automation | dev.to | 2024-05-20

Open the camera feed — and use the OpenCV library for real-time computer vision processing

Caffe

6 33,891 0.0 C++

Caffe: a fast open framework for deep learning.
openpose

36 30,025 5.1 C++

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

Project mention: AI "Artists" Are Lazy, and the Ultimate Goal of AI Image Generation (hint: its sloth) | /r/ArtistHate | 2023-11-25

Open Pose, a multi-person keypoint detection library for body, face, hands, and foot estimation [10], is used for posing generated characters;

mediapipe

49 25,688 9.9 C++

Cross-platform, customizable ML solutions for live and streaming media.

Project mention: Mediapipe openpose Controlnet model for SD | /r/localdiffusion | 2023-11-15

mediapipe/docs/solutions/pose.md at master · google/mediapipe · GitHub

DeepSpeech

68 24,422 0.0 C++

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Project mention: ESpeak-ng: speech synthesizer with more than one hundred languages and accents | news.ycombinator.com | 2024-05-01

As I understand it DeepSpeech is no longer actively maintained by Mozilla: https://github.com/mozilla/DeepSpeech/issues/3693
For Text To Speech, I've found Piper TTS useful (for situations where "quality"=="realistic"/"natural"): https://github.com/rhasspy/piper
For Speech to Text (which AIUI DeepSpeech provided), I've had some success with Vosk: https://github.com/alphacep/vosk-api

PaddlePaddle

6 21,656 10.0 C++

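PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the 飞桨 core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)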
ncnn

12 19,352 9.4 C++

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Project mention: AMD Funded a Drop-In CUDA Implementation Built on ROCm: It's Open-Source | news.ycombinator.com | 2024-02-12

ncnn uses Vulkan for GPU acceleration, I've seen it used in a few projects to get AMD hardware support.
https://github.com/Tencent/ncnn

CNTK

1 17,435 0.0 C++

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
Dlib

33 13,084 8.2 C++

A toolkit for making real world machine learning and data analysis applications in C++

Project mention: Modern Image Processing Algorithms Implementation in C | news.ycombinator.com | 2023-06-06

onnxruntime

55 12,960 10.0 C++

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Project mention: New exponent functions that make SiLU and SoftMax 2x faster, at full acc | news.ycombinator.com | 2024-05-15

carla

22 10,601 9.4 C++

Open-source simulator for autonomous driving research.

Project mention: Tesla braces for its first trial involving Autopilot fatality | news.ycombinator.com | 2023-08-28

TensorRT

22 9,223 4.8 C++

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Project mention: AMD MI300X 30% higher performance than Nvidia H100, even with optimized stack | news.ycombinator.com | 2023-12-17

> It's not rocket science to implement matrix multiplication in any GPU.
You're right, it's harder. Saying this as someone who's done more work on the former than the latter. (I have, with a team, built a rocket engine. And not your school or backyard project size, but nozzle bigger than your face kind. I've also written CUDA kernels and boy is there a big learning curve to the latter that you gotta fundamentally rethink how you view a problem. It's unquestionable why CUDA devs are paid so much. Really it's only questionable why they aren't paid more)
I know it is easy to think this problem is easy, it really looks that way. But there's an incredible amount of optimization that goes into all of this and that's what's really hard. You aren't going to get away with just N for loops for a tensor rank N. You got to chop the data up, be intelligent about it, manage memory, how you load memory, handle many data types, take into consideration different results for different FMA operations, and a whole lot more. There's a whole lot of non-obvious things that result in high optimization (maybe obvious __after__ the fact, but that's not truthfully "obvious"). The thing is, the space is so well researched and implemented that you can't get away with naive implementations, you have to be on the bleeding edge.
Then you have to do that and make it reasonably usable for the programmer too, abstracting away all of that. CUDA also has a huge head start and momentum is not a force to be reckoned with (pun intended).
Look at TensorRT[0]. The software isn't even complete and it still isn't going to cover all neural networks on all GPUs. I've had stuff work on a V100 and H100 but not an A100, then later get fixed. They even have the "Apple Advantage" in that they have control of the hardware. I'm not certain AMD will have the same advantage. We talk a lot about the difficulties of being first mover, but I think we can also recognize that momentum is an advantage of being first mover. And it isn't one to scoff at.
[0] https://github.com/NVIDIA/TensorRT
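
The point about not getting away with plain nested loops, and having to "chop the data up", can be illustrated with a small CPU sketch. It is an assumption-laden stand-in, not how TensorRT or any CUDA library does it: loop tiling for CPU caches takes the place of the shared-memory tiling a GPU kernel would use, and the names `Matrix`, `matmul_naive`, `matmul_tiled`, and the default tile size of 32 are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Square n x n matrix stored row-major in a flat vector (assumption of this sketch).
using Matrix = std::vector<float>;

// Baseline: the straightforward triple loop, C = A * B.
Matrix matmul_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Tiled version: the same arithmetic, but the work is split into
// tile x tile blocks so each block of A, B, and C is reused while it is
// still likely to be in cache. Edge tiles are clipped with std::min.
Matrix matmul_tiled(const Matrix& a, const Matrix& b, std::size_t n,
                    std::size_t tile = 32) {
    assert(a.size() == n * n && b.size() == n * n);
    assert(tile > 0);
    Matrix c(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += tile) {
        const std::size_t i_end = std::min(ii + tile, n);
        for (std::size_t kk = 0; kk < n; kk += tile) {
            const std::size_t k_end = std::min(kk + tile, n);
            for (std::size_t jj = 0; jj < n; jj += tile) {
                const std::size_t j_end = std::min(jj + tile, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float a_ik = a[i * n + k];
                        // Innermost loop walks contiguous rows of B and C.
                        for (std::size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += a_ik * b[k * n + j];
                    }
                }
            }
        }
    }
    return c;
}
```

Both functions compute the same product; only the traversal order differs. Production kernels layer much more on top of this idea (vectorization, data-type handling, hardware-specific scheduling), which is the difficulty the comment describes.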

MNN

3 8,336 8.0 C++

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

Project mention: [D][R] Deploying deep models on memory constrained devices | /r/MachineLearning | 2023-10-03

However, I am looking at this subject through the problem of training/fine-tuning deep models on edge devices, which is becoming increasingly feasible. I am looking at tflite, Alibaba's MNN, mit-han-lab's tinyengine, etc.

jetson-inference

11 7,413 7.7 C++

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
wav2letter

3 6,339 4.6 C++

Facebook AI Research's Automatic Speech Recognition Toolkit
serving

12 6,092 9.8 C++

A flexible, high-performance serving system for machine learning models

Project mention: Llama.cpp: Full CUDA GPU Acceleration | news.ycombinator.com | 2023-06-12

Yet another TEDIOUS BATTLE: Python vs. C++/C stack.
This project gained popularity due to the HIGH DEMAND for running large models with 1B+ parameters, like `llama`. Python dominates the interface and training ecosystem, but prior to llama.cpp, non-ML professionals showed little interest in a fast C++ interface library. While existing solutions like tensorflow-serving [1] in C++ were sufficiently fast with GPU support, llama.cpp took the initiative to optimize for CPU and trim unnecessary code, essentially code-golfing and sacrificing some algorithm correctness for improved performance, which isn't favored by "ML research".
NOTE: In my opinion, a true pioneer was DarkNet, which implemented the YOLO model series and significantly outperformed others [2]. Same trick basically like llama.cpp
[1] https://github.com/tensorflow/serving

openvino

17 6,028 10.0 C++

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05

tiny-cnn

1 5,763 0.0 C++

header only, dependency-free deep learning framework in C++14
oneflow

32 5,748 8.4 C++

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
flashlight

16 5,169 7.4 C++

A C++ standalone library for machine learning (by flashlight)

Project mention: MatX: Efficient C++17 GPU numerical computing library with Python-like syntax | news.ycombinator.com | 2023-10-03

I think a comparison to PyTorch, TensorFlow and/or JAX is more relevant than a comparison to CuPy/NumPy.
And then maybe also a comparison to Flashlight (https://github.com/flashlight/flashlight) or other C/C++ based ML/computing libraries?
Also, there is no mention of it, so I suppose this does not support automatic differentiation?

DALI

5 4,931 9.6 C++

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Project mention: [D] Will data augmentations work faster on TPUs? | /r/MachineLearning | 2023-12-07

Another option is DALI: https://github.com/NVIDIA/DALI. For my project, while training EfficientNet2, it was a game changer. But it is way harder to implement in code than TorchVision or Kornia.

mace

1 4,882 2.5 C++

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

C++ Deep Learning related posts

FridgeBot — GPT-4o shopping list automation

2 projects | dev.to | 20 May 2024
New exponent functions that make SiLU and SoftMax 2x faster, at full acc

2 projects | news.ycombinator.com | 15 May 2024
ESpeak-ng: speech synthesizer with more than one hundred languages and accents

21 projects | news.ycombinator.com | 1 May 2024
Side Quest Devblog #1: These Fakes are getting Deep

3 projects | dev.to | 29 Apr 2024
Classifying mango varieties using Visual Geometry Group 16 (VGG16) in Python

1 project | dev.to | 16 Apr 2024
The Elements of Differentiable Programming

5 projects | news.ycombinator.com | 22 Mar 2024
Who uses Google TPUs for inference in production?

1 project | news.ycombinator.com | 11 Mar 2024

Index

What are some of the best open-source Deep Learning projects in C++? This list will help you:

#	Project	Stars
1	tensorflow	182,857
2	OpenCV	75,938
3	Caffe	33,891
4	openpose	30,025
5	mediapipe	25,688
6	DeepSpeech	24,422
7	PaddlePaddle	21,656
8	ncnn	19,352
9	CNTK	17,435
10	Dlib	13,084
11	onnxruntime	12,960
12	carla	10,601
13	TensorRT	9,223
14	MNN	8,336
15	jetson-inference	7,413
16	wav2letter	6,339
17	serving	6,092
18	openvino	6,028
19	tiny-cnn	5,763
20	oneflow	5,748
21	flashlight	5,169
22	DALI	4,931
23	mace	4,882
