Top 23 Python Computer Vision Projects
- EasyOCR: Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
- d2l-en: Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries, including Stanford, MIT, Harvard, and Cambridge.
- datasets: 🤗 The largest hub of ready-to-use datasets for ML models, with fast, easy-to-use and efficient data manipulation tools.
- vit-pytorch: Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch.
- labelme: Image polygonal annotation with Python (polygon, rectangle, circle, line, point, and image-level flag annotation).
- gaussian-splatting: Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering".
- pytorch-grad-cam: Advanced AI explainability for computer vision. Supports CNNs, Vision Transformers, classification, object detection, segmentation, image similarity, and more.
- U-2-Net: The code for the Pattern Recognition 2020 paper "U²-Net: Going Deeper with Nested U-Structure for Salient Object Detection".
- deeplake: Database for AI. Store vectors, images, texts, videos, etc. Use with LLMs/LangChain. Store, query, version, and visualize any AI data. Stream data in real time to PyTorch/TensorFlow. https://activeloop.ai
Camera connected to a Raspberry Pi? Something like this could run locally: https://github.com/ageitgey/face_recognition
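As a rough illustration of the idea, here is a minimal sketch using face_recognition's documented load_image_file/face_encodings/compare_faces calls (the file names are placeholders):

```python
import face_recognition

# Encode one known face (placeholder file names throughout).
known_image = face_recognition.load_image_file("known_person.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Check every face found in a captured frame against it.
frame = face_recognition.load_image_file("camera_frame.jpg")
for encoding in face_recognition.face_encodings(frame):
    match = face_recognition.compare_faces([known_encoding], encoding)[0]
    print("Recognised!" if match else "Unknown face")
```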
Project mention: Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide | dev.to | 2023-12-27
PyTesseract Module [GitHub], EasyOCR Module [GitHub], PaddlePaddle OCR [GitHub]
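For reference, basic EasyOCR usage looks roughly like this (a minimal sketch; "document.png" is a placeholder path):

```python
import easyocr

# Build a reader for English; the first run downloads the detection/recognition models.
reader = easyocr.Reader(["en"])

# readtext returns (bounding box, text, confidence) triples.
for bbox, text, confidence in reader.readtext("document.png"):
    print(f"{confidence:.2f}  {text}")
```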
Project mention: 23 issues to grow yourself as an exceptional open-source Python expert | dev.to | 2023-10-19
Project mention: Is it easier to go from Pytorch to TF and Keras than the other way around? | /r/pytorch | 2023-05-13
I also need to learn PySpark, so right now I am going to download the Fashion-MNIST dataset, use PySpark to downsize each image and put the images into separate folders according to their labels (just to show employers I can do some basic ETL with PySpark; I'm not sure how I am going to load it for training in PyTorch yet, though). Then I am going to write the simplest LeNet to try to categorize the Fashion-MNIST dataset (results will most likely be bad, but that's okay). Next, I'll try to learn transfer learning in PyTorch for CNNs, or maybe skip ahead to ViT. Ideally at that point I want to study the attention mechanism a bit more and try to implement SimpleViT, which I saw here: https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/simple_vit.py
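For anyone following a similar path, instantiating SimpleViT from vit-pytorch looks roughly like this (a sketch based on the repo's README; the hyperparameters are illustrative, sized for 28x28 grayscale Fashion-MNIST images):

```python
import torch
from vit_pytorch import SimpleViT

model = SimpleViT(
    image_size=28,   # Fashion-MNIST images are 28x28
    patch_size=7,    # yields a 4x4 grid of patches
    num_classes=10,
    dim=256,
    depth=4,
    heads=8,
    mlp_dim=512,
    channels=1,      # grayscale input
)

imgs = torch.randn(8, 1, 28, 28)  # a dummy batch
logits = model(imgs)              # shape: (8, 10)
```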
Let's start by defining the ResNet module according to the Residual Network architecture, as replicated[1] by the torchvision implementation of the model we will import. Detailed architecture variants with depths of 18, 34, 50, 101, and 152 layers can be found in the table below.
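Loading any of those variants from torchvision is a one-liner; a minimal sketch using torchvision's current weights-enum API:

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

# Swap in resnet34/50/101/152 (with their weight enums) for the deeper variants.
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
model.eval()

x = torch.randn(1, 3, 224, 224)  # one ImageNet-sized RGB image
with torch.no_grad():
    logits = model(x)            # shape: (1, 1000)
```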
You can always slice the images into smaller ones, run detection on each tile, and combine results. Supervision has a utility for this - https://supervision.roboflow.com/latest/detection/tools/infe..., but it only works with detections. You can get a much more accurate result this way. Here is some side-by-side comparison: https://github.com/roboflow/supervision/releases/tag/0.14.0.
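A minimal sketch of that slicing workflow, assuming a YOLOv8 model via ultralytics as the underlying detector (any callback returning sv.Detections works):

```python
import cv2
import numpy as np
import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

def callback(image_slice: np.ndarray) -> sv.Detections:
    # Run the detector on one tile and convert to supervision's format.
    result = model(image_slice)[0]
    return sv.Detections.from_ultralytics(result)

slicer = sv.InferenceSlicer(callback=callback)
image = cv2.imread("large_image.jpg")  # placeholder path
detections = slicer(image)             # tiled inference, merged results
```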
Project mention: Logistic Regression for Image Classification Using OpenCV | news.ycombinator.com | 2023-12-31
In this case there's no advantage to using logistic regression on an image other than the novelty. Logistic regression is excellent for feature explainability, but you can't explain anything from an image.
Traditional (non-deep-learning) classification algorithms such as SVMs and Random Forests perform a lot better on MNIST, reaching up to 97% accuracy compared to the 88% from logistic regression in this post. Check the original MNIST benchmarks here: http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/#
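To reproduce that kind of comparison, here is a rough scikit-learn sketch (subsampled so the SVM finishes quickly; "mnist_784" is the dataset's OpenML name):

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixels to [0, 1]

# Subsample: fitting an RBF SVM on all 70k samples is very slow.
X_train, X_test, y_train, y_test = train_test_split(
    X[:10000], y[:10000], test_size=0.2, random_state=0)

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("rbf svm", SVC(kernel="rbf"))]:
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))
```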
Project mention: Show HN: Gaussian Splat renderer in VR with Unity | news.ycombinator.com | 2024-01-24
Chris' post doesn't really give much background info, so here's what's going on here and why it's awesome.
Real-time 3D rendering has historically been based on rasterisation of polygons. This has brought us a long way and has a lot of advantages, but making photorealistic scenes takes a lot of work from the artist. You can scan real objects with photogrammetry and then convert the scans to high-poly meshes, but photogrammetry rigs are pro-level tools, and the assets won't render at real-time speeds. Unreal 5 introduced Nanite, which is a very advanced LoD algorithm, and that helps a lot, but again, we seem to be hitting the limits of what can be done with polygon-based rendering.
3D Gaussian Splatting is a new AI-based technique that lets you render, in real time, photorealistic 3D scenes that were captured with only a few photos taken using normal cameras. It replaces polygon-based rendering with radiance fields.
https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/
3DGS uses several advanced techniques:
1. A 3D point cloud is estimated using "structure from motion" techniques.
2. The points are turned into "3D Gaussians", which are sort of floating blobs of light, where each one has a position, an opacity, and a covariance matrix, plus view-dependent colour encoded using "spherical harmonics" (no, me neither). They're ellipsoids, so they can be thought of as spheres that are stretched and rotated.
3. Rendering is done via a form of point-based rasterisation in which the 3D Gaussians are projected to the 2D screen (into "splats"), sorted so transparency works, and then rasterised on the fly using custom shaders; a toy sketch of the covariance maths follows this list.
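To make step 3 concrete, here is a toy numpy sketch of the covariance maths from the paper: build an ellipsoid's 3D covariance as Sigma = R S S^T R^T, then project it to a 2D screen-space splat as Sigma' = J W Sigma W^T J^T (all values below are illustrative, not from the reference implementation):

```python
import numpy as np

def covariance_3d(scale, rotation):
    """Sigma = R S S^T R^T: an ellipsoid from per-axis scales and a rotation."""
    S = np.diag(scale)
    return rotation @ S @ S.T @ rotation.T

def covariance_2d(sigma3d, J, W):
    """Sigma' = J W Sigma W^T J^T: the screen-space (EWA) projection."""
    return J @ W @ sigma3d @ W.T @ J.T

scale = np.array([2.0, 0.5, 0.5])  # stretched along x
R = np.eye(3)                      # rotation (from a unit quaternion in practice)
W = np.eye(3)                      # camera rotation
J = np.array([[1.0, 0.0, 0.0],     # Jacobian of the perspective projection,
              [0.0, 1.0, 0.0]])    # linearised at the Gaussian's centre

splat = covariance_2d(covariance_3d(scale, R), J, W)
print(splat)  # 2x2 covariance of the screen-space ellipse
```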
The neural network isn't actually used at rendering time, so GPUs can render the scene nice and fast.
In terms of what it can do, the technique might be most similar to Unreal's Nanite. Both are designed for static scenes. Whilst 3D Gaussians can be moved around on the fly, so the scene can be changed in principle, none of the existing animation tools, game engines, or art packages know what to do without polygons. But this sort of thing could be used to rapidly create VR worlds based only on videos taken from different angles, which seems useful.
Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07
This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.
Questions regarding the LLM testing aspect: how extensive is the test coverage for LLM use cases, and what is the current state of this part of the project? Do you offer any guarantees, or is it considered an open-ended problem?
Would love to see more progress toward this area!
For the two examples we will be looking at, we will be using pytorch_grad_cam, an incredible open-source package that makes working with Grad-CAM very easy. There are other excellent tutorials to check out on the repo as well.
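A minimal sketch of that workflow with pytorch_grad_cam (the target layer and class index are illustrative choices; the calls follow the package's documented API):

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
target_layers = [model.layer4[-1]]  # last conv block is a common choice

input_tensor = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed image

cam = GradCAM(model=model, target_layers=target_layers)
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(281)])  # 281 = "tabby cat"
# grayscale_cam[0] is an HxW heatmap you can overlay on the input image,
# e.g. with pytorch_grad_cam.utils.image.show_cam_on_image.
print(grayscale_cam[0].shape)
```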
Project mention: Smerf: Streamable Memory Efficient Radiance Fields | news.ycombinator.com | 2023-12-13
You're under the right paper for doing this. Instead of one big model, they have several smaller ones for regions in the scene. This way rendering is fast for large scenes.
This is similar to Block-NeRF [0], in their project page they show some videos of what youβre asking.
As for an easy way of doing this, nothing out-of-the-box. You can keep an eye on nerfstudio [1], and if you feel brave you could implement this paper and make a PR!
[0] https://waymo.com/intl/es/research/block-nerf/
[1] https://github.com/nerfstudio-project/nerfstudio
Project mention: I used the ChatGPT API to create a proof-of-concept AI driven video game. Using generative AI for the images and dialogue and GPT-3.5 for narrative and game control. More info in comments. | /r/ChatGPT | 2023-06-17
I use a finetuned custom Stable Diffusion model in combination with a style embedding for the characters for image generation, and U²-Net for background removal.
In this brief walkthrough, I will illustrate how to leverage open-source FiftyOne and Anomalib to build deployment-ready anomaly detection models. First, we will load and visualize the MVTec AD dataset in the FiftyOne App. Next, we will use Albumentations to test out augmentation techniques. We will then train an anomaly detection model with Anomalib and evaluate the model with FiftyOne.
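For the augmentation step, an Albumentations pipeline looks roughly like this (a minimal sketch; the transforms and probabilities are illustrative, and "sample.png" is a placeholder):

```python
import albumentations as A
import cv2

# A small augmentation pipeline to preview on MVTec AD-style images.
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.GaussNoise(p=0.2),
])

image = cv2.imread("sample.png")             # placeholder path
augmented = transform(image=image)["image"]  # same HxWxC layout as the input
```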
Python Computer Vision related posts
- Voxel51 Is Hiring AI Researchers and Scientists: What the New Open Science Positions Mean
- Show HN: I made a ROS package for realtime semantic segmentation
- How to Estimate Depth from a Single Image
- How to Detect Small Objects
- How to Cluster Images
- Running OCR against PDFs and images directly in the browser
- Supervision: Reusable Computer Vision
Index
What are some of the best open-source Computer Vision projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Face Recognition | 51,816 |
2 | pytorch-CycleGAN-and-pix2pix | 22,029 |
3 | EasyOCR | 21,953 |
4 | d2l-en | 21,704 |
5 | datasets | 18,443 |
6 | vit-pytorch | 18,006 |
7 | vision | 15,454 |
8 | supervision | 14,068 |
9 | facenet | 13,507 |
10 | labelme | 12,361 |
11 | fashion-mnist | 11,439 |
12 | gaussian-splatting | 11,391 |
13 | ludwig | 10,827 |
14 | Meshroom | 10,599 |
15 | pytorch-grad-cam | 9,456 |
16 | Kornia | 9,395 |
17 | nerfstudio | 8,533 |
18 | RobustVideoMatting | 8,189 |
19 | U-2-Net | 8,115 |
20 | deeplake | 7,729 |
21 | autogluon | 7,124 |
22 | fiftyone | 6,712 |
23 | BackgroundMattingV2 | 6,665 |