HTML OCR

Open-source HTML projects categorized as OCR

Top 4 HTML OCR Projects

  • unstructured

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    Project mention: LlamaCloud and LlamaParse | news.ycombinator.com | 2024-02-20

    Be careful with unstructured:

    https://github.com/Unstructured-IO/unstructured/blob/d11c70c...

    from: https://github.com/open-webui/open-webui/issues/687

  • mokuro

    Read Japanese manga inside browser with selectable text.

    Project mention: Show HN: Kimchi Reader – Immersive Korean Learning with a Popup Dictionary | news.ycombinator.com | 2023-10-29
  • documentation

    Documentation for Papermerge DMS - Installation, Help, User Manual, REST API (by papermerge)

  • Warframe-OCR

    A relic inventory recognition system for Warframe, based on experimental Rust bindings to Tesseract OCR. Supports detection in real-time. Very much WIP.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

HTML OCR related posts

  • Show HN: Kimchi Reader – Immersive Korean Learning with a Popup Dictionary

    2 projects | news.ycombinator.com | 29 Oct 2023
  • Unstructured – OSS libraries and APIs to build custom preprocessing pipelines

    1 project | news.ycombinator.com | 10 Jul 2023
  • More intelligent Pdf parsers

    1 project | /r/LocalLLaMA | 15 Jun 2023
  • Help extracting data from multiple PDF's

    1 project | /r/datascience | 6 Jun 2023
  • Any way to convert my handwritten diary to searchable PDFs?

    2 projects | /r/linuxquestions | 27 May 2023
  • Pre-processing text documents such as PDFs, HTML and Word Documents for LLMs

    1 project | news.ycombinator.com | 24 May 2023
  • Sites for anime or series sub japanese? or other forms of immersion.

    1 project | /r/LearnJapanese | 20 May 2023

Index

What are some of the best open-source OCR projects in HTML? This list will help you:

#  Project        Stars
1  unstructured   6,682
2  mokuro         728
3  documentation  13
4  Warframe-OCR   1
