-
doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
-
PaddleOCR
Multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices.
Appears to be a nice wrapper around Tesseract:
https://github.com/tesseract-ocr/tessdata
https://en.wikipedia.org/wiki/Tesseract_(software)
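For anyone wondering what "a wrapper around Tesseract" amounts to in practice, here's a minimal Python sketch. It assumes the `tesseract` binary is installed and on your PATH. The function names `build_tesseract_command` and `ocr_image` are made up for illustration and are not taken from the project being discussed.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for a Tesseract CLI call.

    Passing `stdout` as the output base makes Tesseract print the
    recognized text to standard output instead of writing a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on an image file and return the recognized text.

    Assumption: the `tesseract` executable and the requested language
    data are installed locally.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout.strip()
```

Calling `ocr_image("screenshot.png")` would return whatever text Tesseract finds in that (hypothetical) file. A real tool would add the screenshot capture and clipboard handling on top of this.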
The demo, of course, works perfectly on a Mac, since this capability is already built into macOS Ventura.
In November 2020, Brewster Kahle from the Internet Archive praised Tesseract saying:
There's also docTR, which can do text detection and extraction out of the box.
It's command-line driven, but it can display the detected text as an overlay on the document.
https://github.com/mindee/doctr
I’ve had good results with PaddleOCR.
https://github.com/PaddlePaddle/PaddleOCR
Cool! I've seen similar ideas before and, inspired by them, made my own some years ago. It's a simple bash script that uses [flameshot](https://flameshot.org/) to take the screenshot and Tesseract to recognize the text:
#!/usr/bin/env bash