parsee-datasets

Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai (by parsee-ai)


parsee-datasets reviews and mentions

Posts with mentions or reviews of parsee-datasets. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-05-07.
  • FinRAG Datasets and Study
    1 project | news.ycombinator.com | 7 May 2024
    To test this, we created 3 different datasets, all based on the same selection of 1,156 randomly selected annual reports for the year 2023 of publicly listed US companies.

    The resulting (fully labeled) datasets contain a combined total of 10,404 rows, 37,536,847 tokens and 1,156 images and can be found on GitHub and Hugging Face: https://github.com/parsee-ai/parsee-datasets/tree/main/datas...

    For our study, we are evaluating 8 state-of-the-art (M)LLMs on a subset of 100 reports with some interesting results.

  • Parsee.ai – a framework to easily extract complex structured data with LLMs
    2 projects | news.ycombinator.com | 31 Mar 2024
    Yes, another LLM framework. This one specializes in extracting structured data from various document types (mainly PDFs, images and HTML files).

    It comes with a new, separate PDF extraction library that focuses on extracting numeric tables (tables with numbers, so it is especially suited to the financial domain): https://github.com/parsee-ai/parsee-pdf-reader

    It also helps to easily set up a dataset for evaluating the performance of various LLMs on data extraction tasks, e.g. extracting revenue figures from financial reports: https://github.com/parsee-ai/parsee-datasets/tree/main/datas...
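
As a rough sanity check on the FinRAG figures quoted above, the combined totals can be turned into per-report averages. This is only a sketch: the constants are copied from the post, the helper name `per_report` is made up for illustration, and the totals are summed across all three datasets, so the averages say nothing about any single dataset.

```python
# Figures quoted in the FinRAG post (combined across the three datasets).
REPORTS = 1_156            # annual reports in the selection
TOTAL_ROWS = 10_404        # labeled rows, all datasets combined
TOTAL_TOKENS = 37_536_847  # tokens, all datasets combined


def per_report(total: int, reports: int = REPORTS) -> float:
    """Average of a combined total per annual report (hypothetical helper)."""
    return total / reports


rows_per_report = per_report(TOTAL_ROWS)
tokens_per_report = per_report(TOTAL_TOKENS)

print(f"rows per report:   {rows_per_report:.1f}")
print(f"tokens per report: {tokens_per_report:,.0f}")
```

The row count divides evenly by the number of reports, which is consistent with every report contributing the same number of labeled rows in total.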

Stats

Basic parsee-datasets repo stats
Mentions: 2
Stars: 61
Activity: 6.4
Last commit: 5 days ago

parsee-ai/parsee-datasets is an open source project licensed under the MIT License, which is an OSI-approved license.

The primary programming language of parsee-datasets is Jupyter Notebook.

