Top 15 Python Deduplication Projects
-
splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
-
LSH
Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents
-
benji
Benji Backup: A block based deduplicating backup software for Ceph RBD images, iSCSI targets, image files and block devices
-
npbackup
A secure and efficient file backup solution that fits both system administrators (CLI) and end users (GUI)
-
unisim
UniSim is a package for efficient similarity computation, fuzzy matching, and clustering of data.
-
dude
Duplicates Detector is a cross-platform GUI utility for finding duplicate files, allowing you to delete or link them to save space. Duplicate files are displayed and processed on two synchronized panels for efficient and convenient operation. (by PJDude)
-
Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
-
Deduper
The goal of this project is to make a deduper program that anybody can run on their computer to save storage space. (by ThatOneShortGuy)
Project mention: Ask HN: Open-source Windows 11 backup solutions | news.ycombinator.com | 2024-04-04
I use - and recommend - "borgbackup", for example with the "vorta" graphical frontend:
* https://www.borgbackup.org/
* https://vorta.borgbase.com/install/windows/
just my 0.02€
Dupeguru
- for important files, a separate box where I have borgmatic [1] in deduplication mode installed; this is updated once in a while
Just curious: Do you have any reason to believe that such a data corruption bug is likely in ZFS? It seems like saying that ext4 could have a bug and you should also store stuff on NTFS, just in case (which I think does not make sense..).
[1]: https://github.com/borgmatic-collective/borgmatic
Project mention: Splink: Fast, accurate, scalable probabilistic data linkage | news.ycombinator.com | 2024-03-13
Project mention: Google UniSim for efficient similarity computation | news.ycombinator.com | 2023-11-30
Week 4: 🪞Image Deduplication
Python Deduplication related posts
-
Splink: Fast, accurate, scalable probabilistic data linkage
-
I Backup
-
Duplicity
-
How to use onedrive for culling photos
-
Does anyone know any freeware duplicate file checkers without an upsell similar to awesome duplicate photo finder?
-
Kopia: Open-Source, Fast and Secure Open-Source Backup Software
-
DupeGuru: Open-source, cross-platform GUI tool to find duplicate files
-
Index
What are some of the best open-source Deduplication projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | BorgBackup | 10,679 |
2 | dupeguru | 4,917 |
3 | borgmatic | 1,662 |
4 | splink | 1,126 |
5 | LSH | 274 |
6 | dduper | 163 |
7 | benji | 137 |
8 | npbackup | 131 |
9 | unisim | 84 |
10 | dude | 84 |
11 | Neural-Scam-Artist | 23 |
12 | dedup | 11 |
13 | image-deduplication-plugin | 9 |
14 | chunkdup | 1 |
15 | Deduper | 0 |