-
pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
-
textract-ai
TextractAI: Extract and process text from PDFs using Python, OpenAI API, and OCR techniques.
Out of curiosity, have you tried ocrs by Robert Knight? https://github.com/robertknight/ocrs
I recently built a similar tool, except it’s configured to use some deep learning libraries for table extraction. I’m excited to integrate unitable, which has state-of-the-art performance, later this week.
I built this because most of the basic layout detection libraries perform terribly on anything nontrivial. Deep learning is really the long-term solution here.
https://github.com/Filimoa/open-parse
My s3-ocr tool can do that with quite a bit of extra configuration.
https://github.com/simonw/s3-ocr
This is cool! I built something similar, but it's CLI-based: https://github.com/lifeiswilde/textract-ai
Here's an EasyOCR service: https://github.com/MittaAI/mitta-community/tree/main/service.... A PDF-to-image processor is being built and should be out in a few weeks.
No docs, but happy to help anyone wanting to use it. Email is kord @ the company I'm working on.