Python Computer Vision

Open-source Python projects categorized as Computer Vision

Top 23 Python Computer Vision Projects

  • Face Recognition

    The world's simplest facial recognition api for Python and the command line

  • Project mention: Security Image Recognition | /r/computervision | 2023-12-10

    Camera connected to a Pi? Something like this could run locally: https://github.com/ageitgey/face_recognition

  • pytorch-CycleGAN-and-pix2pix

    Image-to-Image Translation in PyTorch

  • Project mention: List of AI-Models | /r/GPT_do_dah | 2023-05-16

  • EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

  • Project mention: Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide | dev.to | 2023-12-27

    PyTesseract Module [GitHub], EasyOCR Module [GitHub], PaddlePaddle OCR [GitHub]

  • d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

  • Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19
  • vit-pytorch

    Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

  • Project mention: Is it easier to go from Pytorch to TF and Keras than the other way around? | /r/pytorch | 2023-05-13

    I also need to learn PySpark, so right now I am going to download the Fashion-MNIST dataset, use PySpark to downsize each image and put them into separate folders according to their labels (just to show employers I can do some basic ETL with PySpark; not sure how I am going to load them for training in PyTorch yet, though). Then I am going to write the simplest LeNet to try to classify the Fashion-MNIST dataset (results will most likely be bad, but that's okay). Next, I will try to learn transfer learning in PyTorch for CNNs, or maybe skip ahead to ViT. Ideally, at this point I want to study the attention mechanism a bit more and try to implement Simple ViT, which I saw here: https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/simple_vit.py
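
    For readers who want to see what the attention mechanism mentioned in this post computes, here is a minimal sketch of scaled dot-product attention in plain Python. It is not taken from vit-pytorch; the function names `softmax` and `attention` are illustrative, and a real ViT adds learned projections, multiple heads and batching on top of this.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    queries, keys: lists of d-dimensional vectors (one per token or patch).
    values: list of vectors, one per key.
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

    A query that matches no key in particular averages the values evenly, while a query strongly aligned with one key returns (almost exactly) that key's value.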

  • vision

    Datasets, Transforms and Models specific to Computer Vision

  • Project mention: Transitioning From PyTorch to Burn | dev.to | 2024-02-14

    Let's start by defining the ResNet module according to the Residual Network architecture, as replicated[1] by the torchvision implementation of the model we will import. Detailed architecture variants with a depth of 18, 34, 50, 101 and 152 layers can be found in the table below.
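
    The depth variants named in this excerpt can be summarised in a few lines of Python. This is a sketch based on the stage layouts published for the standard ResNet family, not code from torchvision or Burn; `RESNET_CONFIGS` and `counted_depth` are illustrative names.

```python
# Block type and number of residual blocks in each of the four stages.
RESNET_CONFIGS = {
    18: ("basic", [2, 2, 2, 2]),
    34: ("basic", [3, 4, 6, 3]),
    50: ("bottleneck", [3, 4, 6, 3]),
    101: ("bottleneck", [3, 4, 23, 3]),
    152: ("bottleneck", [3, 8, 36, 3]),
}

# A basic block stacks two 3x3 convolutions; a bottleneck block stacks
# a 1x1, a 3x3 and another 1x1 convolution.
CONVS_PER_BLOCK = {"basic": 2, "bottleneck": 3}


def counted_depth(block_type, blocks_per_stage):
    """Count weighted layers: stem conv + block convs + final linear layer."""
    return 1 + CONVS_PER_BLOCK[block_type] * sum(blocks_per_stage) + 1


for depth, (block_type, stages) in RESNET_CONFIGS.items():
    print(depth, block_type, stages, counted_depth(block_type, stages))
```

    The nominal depth of each variant is the stem convolution, plus the convolutions inside the residual blocks, plus the final fully connected layer; shortcut projections are not counted.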

  • supervision

    We write your reusable computer vision tools. 💜

  • Project mention: Supervision: Reusable Computer Vision | news.ycombinator.com | 2024-03-24

    You can always slice the images into smaller ones, run detection on each tile, and combine results. Supervision has a utility for this - https://supervision.roboflow.com/latest/detection/tools/infe..., but it only works with detections. You can get a much more accurate result this way. Here is some side-by-side comparison: https://github.com/roboflow/supervision/releases/tag/0.14.0.
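
    The slice, detect and combine idea described in this comment can be sketched in plain Python as follows. This is not Supervision's utility or API: `detect_tile` is a hypothetical callback standing in for any detector, boxes are `(x0, y0, x1, y1, score)` tuples, and duplicates from overlapping tiles are merged with a simple greedy non-maximum suppression.

```python
def tile_starts(length, tile, step):
    """Start offsets of tiles along one axis, always covering the far edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1, ...) boxes."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold):
    """Greedy NMS: keep the highest-scoring box of each overlapping group."""
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(iou(det, k) < iou_threshold for k in kept):
            kept.append(det)
    return kept


def sliced_detect(width, height, detect_tile, tile=640, overlap=64,
                  iou_threshold=0.5):
    """Run detect_tile on overlapping tiles and merge results.

    detect_tile(x0, y0, x1, y1) is a hypothetical callback returning boxes
    in tile-local coordinates as (bx0, by0, bx1, by1, score) tuples.
    """
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than tile")
    detections = []
    for y0 in tile_starts(height, tile, step):
        for x0 in tile_starts(width, tile, step):
            x1, y1 = min(x0 + tile, width), min(y0 + tile, height)
            for bx0, by0, bx1, by1, score in detect_tile(x0, y0, x1, y1):
                # Shift tile-local boxes back into full-image coordinates.
                detections.append((bx0 + x0, by0 + y0, bx1 + x0, by1 + y0, score))
    return nms(detections, iou_threshold)
```

    The overlap between tiles is what lets an object on a tile boundary be seen whole by at least one tile; the NMS step then removes the copies reported by neighbouring tiles.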

  • facenet

    Face recognition using Tensorflow

  • labelme

    Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).

  • fashion-mnist

    A MNIST-like fashion product database. Benchmark 👇

  • Project mention: Logistic Regression for Image Classification Using OpenCV | news.ycombinator.com | 2023-12-31

    In this case there's no advantage to using logistic regression on an image other than the novelty. Logistic regression is excellent for feature explainability, but you can't explain anything from an image.

    Traditional, non-deep-learning classification algorithms such as SVMs and Random Forest perform a lot better on MNIST, reaching up to 97% accuracy compared to the 88% from logistic regression in this post. Check the original MNIST benchmarks here: http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/#
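
    To make the discussion concrete, here is a minimal sketch of logistic regression applied to flattened images, written in plain Python rather than OpenCV. The 2x2 toy "images" and the names `train_logistic` and `predict` are made up for illustration; the sketch only shows that the model learns one weight per pixel, and says nothing about the accuracy figures quoted above.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(images, labels, lr=0.5, epochs=200):
    """Batch gradient descent on the mean cross-entropy loss."""
    n_features = len(images[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(images)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(images, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the predicted probability is at least 0.5."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy data: flattened 2x2 images. Class 0 has a bright top row,
# class 1 has a bright bottom row.
toy_images = [
    [1.0, 1.0, 0.0, 0.0],
    [0.9, 0.8, 0.1, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.1, 0.2, 0.9, 1.0],
]
toy_labels = [0, 0, 1, 1]

weights, bias = train_logistic(toy_images, toy_labels)
print([predict(weights, bias, x) for x in toy_images])
```

    Each learned weight belongs to a single pixel position, so the model can say which pixels push a prediction towards a class, but it cannot describe shapes or textures that span many pixels.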

  • gaussian-splatting

    Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

  • Project mention: Show HN: Gaussian Splat renderer in VR with Unity | news.ycombinator.com | 2024-01-24

    Chris' post doesn't really give much background info, so here's what's going on here and why it's awesome.

    Real-time 3D rendering has historically been based on rasterisation of polygons. This has brought us a long way and has a lot of advantages, but making photorealistic scenes takes a lot of work from the artist. You can scan real objects using techniques like photogrammetry and then convert them to high-poly meshes, but photogrammetry rigs are pro-level tools, and the assets won't render at real-time speeds. Unreal 5 introduced Nanite, which is a very advanced LoD algorithm, and that helps a lot, but again, we seem to be hitting the limits of what can be done with polygon-based rendering.

    3D Gaussian Splats is a new AI-based technique that lets you render photorealistic 3D scenes in real time, captured with only a few photos taken using normal cameras. It replaces polygon-based rendering with radiance fields.

    https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/

    3DGS uses several advanced techniques:

    1. A 3D point cloud is estimated by using "structure from motion" techniques.

    2. The points are turned into "3D gaussians", which are sort of floating blobs of light where each one has a position, an opacity, a covariance matrix and a colour defined using "spherical harmonics" (no, me neither). They're ellipsoids, so they can be thought of as spheres that are stretched and rotated.

    3. Rendering works by projecting the 3D Gaussians onto the 2D screen (as "splats"), sorting them by depth so transparency works, and then rasterising them on the fly using custom shaders.

    The neural network isn't actually used at rendering time, so GPUs can render the scene nice and fast.

    In terms of what it can do the technique might be similar to Unreal's Nanite. Both are designed for static scenes. Whilst 3D Gaussians can be moved around on the fly, so the scene can be changed in principle, none of the existing animation, game engines or artwork packages know what to do without polygons. But this sort of thing could be used to rapidly create VR worlds based on only videos taken from different angles, which seems useful.
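
    The sorting and blending in step 3 of this comment can be illustrated for a single pixel. The sketch below is plain Python, not the reference implementation: it assumes each splat has already been projected to 2D and is described by a hypothetical dict with `depth`, `center`, `inv_cov` (inverse 2D covariance), `opacity` and a fixed RGB `color`. The spherical-harmonic colours and the GPU rasteriser are left out.

```python
import math


def splat_alpha(pixel, center, inv_cov, opacity):
    """Opacity of one 2D splat at a pixel: opacity * exp(-0.5 * d^T S^-1 d)."""
    dx, dy = pixel[0] - center[0], pixel[1] - center[1]
    (a, b), (c, d) = inv_cov
    power = -0.5 * (dx * (a * dx + b * dy) + dy * (c * dx + d * dy))
    return opacity * math.exp(power)


def composite_pixel(pixel, splats):
    """Blend splats front to back; return (rgb, remaining transmittance)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    # Sorting by depth (nearest first) is what makes transparency correct.
    for splat in sorted(splats, key=lambda s: s["depth"]):
        alpha = splat_alpha(pixel, splat["center"], splat["inv_cov"],
                            splat["opacity"])
        for i in range(3):
            color[i] += transmittance * alpha * splat["color"][i]
        transmittance *= 1.0 - alpha
    return color, transmittance


identity = ((1.0, 0.0), (0.0, 1.0))
splats = [
    # Listed far-to-near on purpose; composite_pixel sorts them.
    {"depth": 5.0, "center": (0.0, 0.0), "inv_cov": identity,
     "opacity": 1.0, "color": (0.0, 1.0, 0.0)},
    {"depth": 1.0, "center": (0.0, 0.0), "inv_cov": identity,
     "opacity": 0.5, "color": (1.0, 0.0, 0.0)},
]
print(composite_pixel((0.0, 0.0), splats))
```

    A half-transparent red splat in front of an opaque green one yields an even mix of both, and nothing behind the green splat would contribute because the transmittance has dropped to zero.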

  • ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

  • Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07

    This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.

    Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?

    Would love to see more progress toward this area!

  • Meshroom

    3D Reconstruction Software

  • Project mention: AI bots saying I made a fake post: NOT FAKE POST | /r/opensource | 2023-12-10
  • pytorch-grad-cam

    Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.

  • Project mention: Exploring GradCam and More with FiftyOne | dev.to | 2024-02-13

    For the two examples we will be looking at, we will be using pytorch_grad_cam, an incredible open source package that makes working with GradCam very easy. There are excellent other tutorials to check out on the repo as well.
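
    As background for what a Grad-CAM heatmap is, here is a minimal plain-Python sketch of the core computation. It does not use the pytorch_grad_cam API; it assumes the activations of a convolutional layer and the gradients of the class score with respect to them have already been extracted as nested lists indexed `[channel][row][col]`, and `grad_cam` is an illustrative name.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one layer's activations and gradients.

    Both arguments are nested lists shaped [channel][row][col].
    Returns a [row][col] heatmap scaled to the range 0..1.
    """
    rows, cols = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * cols for _ in range(rows)]
    for act, grad in zip(activations, gradients):
        # Channel weight: gradient averaged over all spatial positions.
        weight = sum(sum(row) for row in grad) / (rows * cols)
        for r in range(rows):
            for c in range(cols):
                heatmap[r][c] += weight * act[r][c]
    # ReLU keeps only regions that increase the class score.
    heatmap = [[max(v, 0.0) for v in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


activations = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 fires top-left
    [[0.0, 0.0], [0.0, 1.0]],  # channel 1 fires bottom-right
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],      # channel 0 supports the class
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel 1 counts against it
]
print(grad_cam(activations, gradients))
```

    In this toy input only the top-left location ends up highlighted, because the channel that fires there has a positive averaged gradient while the other channel is suppressed by the ReLU.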

  • Kornia

    Geometric Computer Vision Library for Spatial AI

  • nerfstudio

    A collaboration friendly studio for NeRFs

  • Project mention: Smerf: Streamable Memory Efficient Radiance Fields | news.ycombinator.com | 2023-12-13

    You’re under the right paper for doing this. Instead of one big model, they have several smaller ones for regions in the scene. This way rendering is fast for large scenes.

    This is similar to Block-NeRF [0]; on their project page they show some videos of what you’re asking about.

    As for an easy way of doing this, nothing out-of-the-box. You can keep an eye on nerfstudio [1], and if you feel brave you could implement this paper and make a PR!

    [0] https://waymo.com/intl/es/research/block-nerf/

    [1] https://github.com/nerfstudio-project/nerfstudio

  • RobustVideoMatting

    Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

  • Project mention: lineart_coarse + openpose, batch img2img | /r/StableDiffusion | 2023-05-10
  • U-2-Net

    The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."

  • Project mention: I used the ChatGPT API to create a proof-of-concept AI driven video game. Using generative AI for the images and dialogue and GPT-3.5 for narrative and game control. More info in comments. | /r/ChatGPT | 2023-06-17

    I use a fine-tuned custom Stable Diffusion model in combination with a style embedding for the characters for image generation and U²-Net for background removal.

  • deeplake

    Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

  • Project mention: FLaNK AI Weekly 25 March 2025 | dev.to | 2024-03-25
  • autogluon

    Fast and Accurate ML in 3 Lines of Code

  • fiftyone

    The open-source tool for building high-quality datasets and computer vision models

  • Project mention: May 8, 2024 AI, Machine Learning and Computer Vision Meetup | dev.to | 2024-05-01

    In this brief walkthrough, I will illustrate how to leverage open-source FiftyOne and Anomalib to build deployment-ready anomaly detection models. First, we will load and visualize the MVTec AD dataset in the FiftyOne App. Next, we will use Albumentations to test out augmentation techniques. We will then train an anomaly detection model with Anomalib and evaluate the model with FiftyOne.

  • BackgroundMattingV2

    Real-Time High-Resolution Background Matting

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Computer Vision related posts

  • Voxel51 Is Hiring AI Researchers and Scientists - What the New Open Science Positions Mean

    1 project | dev.to | 26 Apr 2024
  • Show HN: I made a ROS package for realtime semantic segmentation

    1 project | news.ycombinator.com | 26 Apr 2024
  • How to Estimate Depth from a Single Image

    8 projects | dev.to | 25 Apr 2024
  • How to Detect Small Objects

    3 projects | dev.to | 22 Apr 2024
  • How to Cluster Images

    5 projects | dev.to | 9 Apr 2024
  • Running OCR against PDFs and images directly in the browser

    7 projects | news.ycombinator.com | 30 Mar 2024
  • Supervision: Reusable Computer Vision

    5 projects | news.ycombinator.com | 24 Mar 2024

Index

What are some of the best open-source Computer Vision projects in Python? This list will help you:

 #  Project                        Stars
 1  Face Recognition               51,816
 2  pytorch-CycleGAN-and-pix2pix   22,029
 3  EasyOCR                        21,953
 4  d2l-en                         21,704
 5  datasets                       18,443
 6  vit-pytorch                    18,006
 7  vision                         15,454
 8  supervision                    14,068
 9  facenet                        13,507
10  labelme                        12,361
11  fashion-mnist                  11,439
12  gaussian-splatting             11,391
13  ludwig                         10,827
14  Meshroom                       10,599
15  pytorch-grad-cam                9,456
16  Kornia                          9,395
17  nerfstudio                      8,533
18  RobustVideoMatting              8,189
19  U-2-Net                         8,115
20  deeplake                        7,729
21  autogluon                       7,124
22  fiftyone                        6,712
23  BackgroundMattingV2             6,665
