Pdfsandwich

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ripgrep-all

    rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.

  • On an even more macro level I've had a great experience with ripgrep-all[0], which uses Tesseract internally.

    For example, I have a directory with all the weekly slides for one lecture, and can directly find where (both file and page) we learned something related to photosynthesis via `rga photosynthesis`.

    [0]: https://github.com/phiresky/ripgrep-all

  • InvoiceNet

    Deep neural network to extract intelligent information from invoice documents.

  • awesome-document-understanding

    A curated list of resources on the topic of Document Understanding (DU)

  • While trying to find a specific project I recalled, I encountered this list of projects which might be of interest: https://github.com/tstanislawek/awesome-document-understandi...

    The project I had in mind was similar to this one but I can't remember the name currently: https://github.com/tabulapdf/tabula

    However, if you're looking for an ML-based, invoice-specific project, it looks like the other comment to your reply might be more useful.

  • tabula

    Tabula is a tool for liberating data tables trapped inside PDF files

  • OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

  • A similar, perhaps more feature-rich tool is OCRmyPDF: https://github.com/jbarlow83/OCRmyPDF
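
The two command-line workflows described in the comments above (adding a searchable text layer with OCRmyPDF, then searching a directory with `rga`) could be combined roughly as follows. This is only a minimal sketch: the helper names and the file paths are hypothetical, and it assumes the `ocrmypdf` and `rga` executables are installed and on `PATH`. The `--skip-text` and `-l` options are standard OCRmyPDF flags; everything else is illustration.

```python
import shlex
import shutil
import subprocess


def ocrmypdf_command(input_pdf, output_pdf, language="eng"):
    """Build an OCRmyPDF call that writes a searchable copy of a scanned PDF."""
    # --skip-text leaves pages that already contain text untouched.
    return ["ocrmypdf", "--skip-text", "-l", language, input_pdf, output_pdf]


def rga_command(term, directory="."):
    """Build an rga call that searches `directory` (PDFs included) for `term`."""
    return ["rga", term, directory]


def run_if_installed(command):
    """Run `command` if its executable is on PATH; return None otherwise."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True, text=True)


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the order of the two steps.
    steps = [
        ocrmypdf_command("scans/week03.pdf", "searchable/week03.pdf"),
        rga_command("photosynthesis", "searchable"),
    ]
    for step in steps:
        # Print the commands instead of running them; pass a step to
        # run_if_installed() to execute it.
        print(shlex.join(step))
```

Building each command as a list rather than a single shell string avoids quoting problems when file names contain spaces.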

Related posts

  • LlamaCloud and LlamaParse

    9 projects | news.ycombinator.com | 20 Feb 2024
  • Show HN: How do you OCR on a Mac using the CLI or just Python for free

    6 projects | news.ycombinator.com | 2 Jan 2024
  • Extract informations from invoices with machine learning

    2 projects | /r/deeplearning | 7 Apr 2021
  • When Will the GenAI Bubble Burst?

    1 project | news.ycombinator.com | 4 Apr 2024
  • 🔍Underrated Open Source Projects You Should Know About 🧠

    9 projects | dev.to | 20 Mar 2024