Top 23 Python Machine Learning Projects

transformers

180 126,170 10.0 Python

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Project mention: Reading list to join AI field from Hugging Face cofounder | news.ycombinator.com | 2024-05-18

Not sure what you are implying. Thomas Wolf has the second highest number of commits on HuggingFace/transformers. He is clearly competent & deeply technical
https://github.com/huggingface/transformers/

Pytorch

341 78,642 10.0 Python

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Project mention: PyTorch 2.3: User-Defined Triton Kernels, Tensor Parallelism in Distributed | news.ycombinator.com | 2024-05-10

Keras

79 61,044 9.9 Python

Deep Learning for humans

Project mention: Side Quest #3: maybe the real Deepfakes were the friends we made along the way | dev.to | 2024-05-20

def batcher_from_directory(batch_size: int, dataset_path: str, shuffle=False, seed=None) -> tf.data.Dataset:
    """
    Return a tensorflow Dataset object that returns images and spectrograms as required.
    Partly inspired by https://github.com/keras-team/keras/blob/v3.3.3/keras/src/utils/image_dataset_utils.py

    Args:
        batch_size: The batch size.
        dataset_path: The path to the dataset folder which must contain the image folder and audio folder.
        shuffle: Whether to shuffle the dataset. Default to False.
        seed: The seed for the shuffle. Default to None.
    """
    image_dataset_path = os.path.join(dataset_path, "image")

    # create the foundation datasets
    og_dataset = tf.data.Dataset.from_generator(
        lambda: original_image_path_gen(image_dataset_path),
        output_signature=tf.TensorSpec(shape=(), dtype=tf.string))
    og_dataset = og_dataset.repeat(None)  # repeat indefinitely
    ref_dataset = tf.data.Dataset.from_generator(
        lambda: ref_image_path_gen(image_dataset_path),
        output_signature=(tf.TensorSpec(shape=(), dtype=tf.string),
                          tf.TensorSpec(shape=(), dtype=tf.bool)))
    ref_dataset = ref_dataset.repeat(None)  # repeat indefinitely

    # create the input datasets
    og_image_dataset = og_dataset.map(
        lambda x: tf.py_function(load_image, [x, tf.convert_to_tensor(False, dtype=tf.bool)], tf.float32),
        num_parallel_calls=tf.data.AUTOTUNE)
    masked_image_dataset = og_image_dataset.map(
        lambda x: tf.py_function(load_masked_image, [x], tf.float32),
        num_parallel_calls=tf.data.AUTOTUNE)
    ref_image_dataset = ref_dataset.map(
        lambda x, y: tf.py_function(load_image, [x, y], tf.float32),
        num_parallel_calls=tf.data.AUTOTUNE)
    audio_spec_dataset = og_dataset.map(
        lambda x: tf.py_function(load_audio_data, [x, dataset_path], tf.float64),
        num_parallel_calls=tf.data.AUTOTUNE)
    unsync_spec_dataset = ref_dataset.map(
        lambda x, _: tf.py_function(load_audio_data, [x, dataset_path], tf.float64),
        num_parallel_calls=tf.data.AUTOTUNE)

    # ensure shape as tensorflow does not accept unknown shapes
    og_image_dataset = og_image_dataset.map(lambda x: tf.ensure_shape(x, IMAGE_SHAPE))
    masked_image_dataset = masked_image_dataset.map(lambda x: tf.ensure_shape(x, MASKED_IMAGE_SHAPE))
    ref_image_dataset = ref_image_dataset.map(lambda x: tf.ensure_shape(x, IMAGE_SHAPE))
    audio_spec_dataset = audio_spec_dataset.map(lambda x: tf.ensure_shape(x, AUDIO_SPECTROGRAM_SHAPE))
    unsync_spec_dataset = unsync_spec_dataset.map(lambda x: tf.ensure_shape(x, AUDIO_SPECTROGRAM_SHAPE))

    # multi input using https://discuss.tensorflow.org/t/train-a-model-on-multiple-input-dataset/17829/4
    full_dataset = tf.data.Dataset.zip(
        (masked_image_dataset, ref_image_dataset, audio_spec_dataset, unsync_spec_dataset),
        og_image_dataset)

    # if shuffle:
    #     full_dataset = full_dataset.shuffle(buffer_size=batch_size * 8, seed=seed)  # not sure why buffer size is such

    # batch
    full_dataset = full_dataset.batch(batch_size=batch_size)
    return full_dataset
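
The zip-then-batch step at the end of the quoted function may be easier to follow in isolation. The sketch below is a plain-Python analogy only, not TensorFlow code: it pairs several input streams with one target stream and groups the pairs into batches, so each batch has the shape (per-input batches, target batch). The function name `batched_multi_input` and the sample values are hypothetical and do not come from the quoted post.

```python
from itertools import cycle, islice


def batched_multi_input(inputs, targets, batch_size):
    """Yield (per_input_batches, target_batch) pairs.

    `inputs` is a tuple of iterables, one per model input;
    `targets` is a single iterable of expected outputs.
    """
    # Pair each tuple of inputs with its target, similar in spirit to zipping datasets.
    elements = zip(zip(*inputs), targets)
    while True:
        chunk = list(islice(elements, batch_size))
        if not chunk:
            return  # every stream is exhausted
        xs, ys = zip(*chunk)
        # Regroup so that each model input gets its own batch list.
        per_input = tuple(list(column) for column in zip(*xs))
        yield per_input, list(ys)


if __name__ == "__main__":
    # Hypothetical endless streams, standing in for datasets that repeat indefinitely.
    masked = cycle(["m0", "m1"])
    reference = cycle(["r0", "r1"])
    original = cycle(["t0", "t1"])
    print(next(batched_multi_input((masked, reference), original, batch_size=3)))
```

Because the sample streams repeat forever, the generator never ends on its own; the caller decides how many batches to draw.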

scikit-learn

82 58,344 9.9 Python

scikit-learn: machine learning in Python

Project mention: How to Build a Logistic Regression Model: A Spam-filter Tutorial | dev.to | 2024-05-05

Online Courses:
  Coursera: "Machine Learning" by Andrew Ng
  edX: "Introduction to Machine Learning" by MIT

Tutorials:
  Scikit-learn documentation: https://scikit-learn.org/
  Kaggle Learn: https://www.kaggle.com/learn

Books:
  "Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow" by Aurélien Géron
  "The Elements of Statistical Learning" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman

By understanding the core concepts of logistic regression, its limitations, and exploring further resources, you'll be well-equipped to navigate the exciting world of machine learning!

Face Recognition

34 51,968 0.0 Python

The world's simplest facial recognition api for Python and the command line

Project mention: Security Image Recognition | /r/computervision | 2023-12-10

Camera connected to a PI? Something like this could run locally: https://github.com/ageitgey/face_recognition

faceswap

10 49,466 8.5 Python

Deepfakes Software For All

Project mention: faceswap VS facefusion - a user suggested alternative | libhunt.com/r/faceswap | 2024-01-30

yolov5

129 47,375 8.8 Python

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Project mention: Easily classifying dog and cat breeds with YOLOv5 | dev.to | 2024-04-15

Ref:
https://www.youtube.com/watch?v=0GwnxFNfZhM
https://github.com/ultralytics/yolov5
https://dev.to/gfstealer666/kaaraich-yolo-alkrithuemainkaartrwcchcchabwatthu-object-detection-3lef
https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/data

Open-Assistant

329 36,728 8.3 Python

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Project mention: Best open source AI chatbot alternative? | /r/opensource | 2023-12-08

For open assistant, the code: https://github.com/LAION-AI/Open-Assistant/tree/main/inference

Airflow

170 34,705 10.0 Python

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Project mention: AI Strategy Guide: How to Scale AI Across Your Business | dev.to | 2024-05-11

Level 1 of MLOps is when you've put each lifecycle stage and its interfaces in an automated pipeline. The pipeline could be a Python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, Dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and DVC also feature pipeline capabilities.
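
To make the "directed acyclic graph" idea concrete, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow code; the stage names and the `run_stage` placeholder are hypothetical. Each stage lists the stages it depends on, and the runner executes a stage only after all of its dependencies have run.

```python
from graphlib import TopologicalSorter

# Hypothetical lifecycle stages; each maps to the set of stages it depends on.
PIPELINE = {
    "ingest": set(),
    "validate": {"ingest"},
    "train": {"validate"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}


def run_stage(name, log):
    """Placeholder for real work; it only records the execution order."""
    log.append(name)


def run_pipeline(dag):
    """Run every stage after all of its dependencies have run."""
    log = []
    # static_order() yields nodes so that dependencies always come first,
    # and raises graphlib.CycleError if the graph is not acyclic.
    for stage in TopologicalSorter(dag).static_order():
        run_stage(stage, log)
    return log


if __name__ == "__main__":
    print(run_pipeline(PIPELINE))
```

An orchestration framework adds scheduling, retries and monitoring on top of this ordering step; the sketch covers the dependency ordering alone.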

gym

96 33,966 0.0 Python

A toolkit for developing and comparing reinforcement learning algorithms.

Project mention: OpenAI Acquires Global Illumination | news.ycombinator.com | 2023-08-16

A co-founder announced they disbanded their robots team a couple years ago: https://venturebeat.com/business/openai-disbands-its-robotic...
That was the same time they deprecated OpenAI Gym: https://github.com/openai/gym

DeepSpeed

51 33,018 9.8 Python

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06

DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.

streamlit

258 32,222 9.8 Python

Streamlit — A faster way to build and share data apps.

Project mention: Developing a Generic Streamlit UI to Test Amazon Bedrock Agents | dev.to | 2024-05-05

I decided to use Streamlit to build the UI as it is a popular and fitting choice. Streamlit is an open-source Python library used for building interactive web applications, especially for AI and data applications. Since the application code is written only in Python, it is easy to learn and build with.

Ray

43 31,414 10.0 Python

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Project mention: Ray: Unified framework for scaling AI and Python applications | news.ycombinator.com | 2024-05-03

gradio

116 29,400 9.9 Python

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Project mention: AI enthusiasm #9 - A multilingual chatbot📣🈸 | dev.to | 2024-05-01

gradio is a package developed to ease the development of app interfaces in python and other languages (GitHub)

spaCy

107 28,887 9.2 Python

💫 Industrial-strength Natural Language Processing (NLP) in Python

Project mention: How I discovered Named Entity Recognition while trying to remove gibberish from a string. | dev.to | 2024-05-06

pytorch-lightning

9 27,064 9.9 Python

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Project mention: SB-1047 will stifle open-source AI and decrease safety | news.ycombinator.com | 2024-04-29

It's very easy to get started, right in your Terminal, no fees! No credit card at all.
And there are cloud providers like https://replicate.com/ and https://lightning.ai/ that will let you use your LLM via an API key just like you did with OpenAI if you need that.
You don't need OpenAI - nobody does.

data-science-ipython-notebooks

1 26,545 0.0 Python

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
OpenBBTerminal

212 26,204 9.8 Python

Investment Research for Everyone, Everywhere.

Project mention: Where do you get your B2B news? | news.ycombinator.com | 2024-05-07

have you seen the https://openbb.co/ project? an open source Bloomberg Terminal project you may find interesting ;-)

ultralytics

27 23,574 9.8 Python

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Project mention: The CEO of Ultralytics (yolov8) using LLMs to engage with commenters on GitHub | news.ycombinator.com | 2024-02-12

Yep, I noticed this a while ago. It posts easily identifiable ChatGPT responses. It also posts garbage wrong answers which makes it worse than useless. Totally disrespectful to the userbase.
https://github.com/ultralytics/ultralytics/issues/5748#issue...

ML-From-Scratch

3 23,260 0.0 Python

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
NLP-progress

17 22,362 2.1 Python

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
EasyOCR

39 22,237 3.6 Python

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.

Project mention: I built an online PDF management platform using open-source software | news.ycombinator.com | 2024-05-12

Ok on cleaned aligned data, but there are a few newer ones like EasyOCR [0] that can deal with much less organized text (albeit more slowly)
[0] https://github.com/JaidedAI/EasyOCR

d2l-en

6 21,922 8.5 Python

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Machine Learning related posts

Side Quest #3: maybe the real Deepfakes were the friends we made along the way

3 projects | dev.to | 20 May 2024
Show HN: Anthropic's Prompt Engineering Interactive Tutorial (Web Version)

2 projects | news.ycombinator.com | 18 May 2024
Mlflow: Open-source platform for the machine learning lifecycle

1 project | news.ycombinator.com | 16 May 2024
Ask HN: Running LLMs Locally

2 projects | news.ycombinator.com | 15 May 2024
A Developer's Guide to Evaluating LLMs!

1 project | dev.to | 14 May 2024
River: Online Machine Learning in Python

1 project | news.ycombinator.com | 12 May 2024
AI Strategy Guide: How to Scale AI Across Your Business

4 projects | dev.to | 11 May 2024

Index

What are some of the best open-source Machine Learning projects in Python? This list will help you:

	Project	Stars
1	transformers	126,170
2	Pytorch	78,642
3	Keras	61,044
4	scikit-learn	58,344
5	Face Recognition	51,968
6	faceswap	49,466
7	yolov5	47,375
8	Open-Assistant	36,728
9	Airflow	34,705
10	gym	33,966
11	DeepSpeed	33,018
12	streamlit	32,222
13	Ray	31,414
14	gradio	29,400
15	spaCy	28,887
16	pytorch-lightning	27,064
17	data-science-ipython-notebooks	26,545
18	OpenBBTerminal	26,204
19	ultralytics	23,574
20	ML-From-Scratch	23,260
21	NLP-progress	22,362
22	EasyOCR	22,237
23	d2l-en	21,922
