Frog: OCR Tool for Linux

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • normcap

    OCR powered screen-capture tool to capture information instead of images

  • tessdata

    Trained models with fast variant of the "best" LSTM models + legacy models

  • Appears to be a nice wrapper around Tesseract:

    https://github.com/tesseract-ocr/tessdata

    https://en.wikipedia.org/wiki/Tesseract_(software)

    The demo of course works perfectly on a Mac as this is already built into Ventura.

      In November 2020, Brewster Kahle from the Internet Archive praised Tesseract saying:

  • doctr

    docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

  • There's also DocTR which can do text detection and extraction out of the box.

    It's command line driven but can display the detected text as an overlay of the document.

    https://github.com/mindee/doctr

  • OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

  • PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

  • I’ve had good results from PaddleOCR.

    https://github.com/PaddlePaddle/PaddleOCR

  • flameshot

    Powerful yet simple to use screenshot software

  • Cool! I've seen similar ideas before and made my own, inspired by these, some years ago. It's a simple bash script that uses [flameshot](https://flameshot.org/) to take the screenshot and Tesseract to recognize the text:

        #!/usr/bin/env bash
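The commenter's script is cut off after the shebang in this copy, so the following is not their code. It is only a minimal sketch of the same idea, assuming `flameshot gui --raw` writes the selected region as PNG to standard output and `tesseract stdin stdout` reads an image from standard input and prints the recognized text. The variable names `SCREENSHOT_CMD` and `OCR_CMD` and the function name `capture_text` are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the original commenter's script, which is truncated in the source.
set -eu

# Hypothetical, overridable command strings (assumed defaults):
#   flameshot gui --raw     -> interactive region capture, PNG on stdout
#   tesseract stdin stdout  -> OCR an image from stdin, text on stdout
SCREENSHOT_CMD=${SCREENSHOT_CMD:-"flameshot gui --raw"}
OCR_CMD=${OCR_CMD:-"tesseract stdin stdout"}

# Pipe the screenshot straight into the OCR engine and print the text.
# The variables are intentionally unquoted so each command string is
# split into the program name and its arguments.
capture_text() {
    $SCREENSHOT_CMD | $OCR_CMD
}
```

One possible way to use it, assuming `xclip` is installed, is `capture_text | xclip -selection clipboard`, which would place the recognized text on the clipboard instead of printing it.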
