Paperetl Alternatives
Similar projects and alternatives to paperetl
-
txtai
All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
rdm
Our regulatory documentation manager. Streamlines IEC 62304, ISO 14971, and FDA 510(k) documentation for software projects. (by innolitics)
-
science-parse
Science Parse parses scientific papers (in PDF form) and returns them in structured form.
-
nlm-ingestor
This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats.
paperetl reviews and mentions
- Show HN: Open-source Rule-based PDF parser for RAG
-
Oracle of Zotero: LLM QA of Your Research Library
Nice project!
I've spent quite a lot of time in the medical/scientific literature space. With regard to LLMs, specifically RAG, how the data is chunked is quite important. With that in mind, I have a couple of projects that might be beneficial additions.
paperetl (https://github.com/neuml/paperetl) - supports parsing arXiv, PubMed and integrates with GROBID to handle parsing metadata and text from arbitrary papers.
paperai (https://github.com/neuml/paperai) - builds embeddings databases of medical/scientific papers. Supports LLM prompting, semantic workflows and vector search. Built with txtai (https://github.com/neuml/txtai).
While arbitrary chunking/splitting can work, I've found that integrating parsing that has knowledge of medical/scientific paper structure increases the overall accuracy and experience of downstream applications.
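The difference between arbitrary splitting and structure-aware chunking can be illustrated with a minimal Python sketch. It assumes a paper has already been parsed into (section title, text) pairs, for example by a structure-aware parser, and compares naive fixed-size splitting with chunking that respects section and sentence boundaries. The functions `fixed_size_chunks` and `section_chunks`, the sentence regex, and the sample text are hypothetical illustrations, not paperetl's or paperai's API.

```python
import re
from typing import List, Tuple

# Hypothetical input format: a paper already parsed into (section title, text)
# pairs by a structure-aware parser. This is NOT paperetl's actual output schema.
Section = Tuple[str, str]

# Sentence boundary: whitespace that follows ".", "!" or "?" (a simplification).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def fixed_size_chunks(text: str, size: int) -> List[str]:
    """Naive baseline: cut the raw text every `size` characters,
    ignoring sentence and section boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def section_chunks(sections: List[Section], max_sentences: int = 3) -> List[str]:
    """Structure-aware chunking: never cross a section boundary, cut only at
    sentence ends, and prefix each chunk with its section title for context."""
    chunks = []
    for title, text in sections:
        sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
        for i in range(0, len(sentences), max_sentences):
            body = " ".join(sentences[i:i + max_sentences])
            chunks.append(f"{title}: {body}")
    return chunks


# Usage with a toy two-section paper (sample text is made up).
paper = [
    ("Abstract", "We study X. Results show Y."),
    ("Methods", "Patients were enrolled in 2020. Dosage was 5 mg daily."),
]
raw = " ".join(text for _, text in paper)

print(fixed_size_chunks(raw, 40))  # first chunk splits mid-word and mixes sections
print(section_chunks(paper, max_sentences=2))
```

Section-aware chunks vary in length, so a real pipeline would likely also cap chunk size; the point here is only that each chunk stays within one section and carries its title as context.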
-
[P] Parse research papers into structured data
paperai | paperetl
- Parse research papers into a structured dataset
- ETL for medical and scientific papers
- Show HN: ETL for Medical and Scientific Papers
-
Seeking Advice: How to extract Abstract from scientific journals (.pdfs) 10k+.
paperai and paperetl are a set of projects to consider for this task.
- paperetl: ETL processes for medical and scientific papers
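For the abstract-extraction question, here is a minimal Python sketch. It assumes each PDF has already been converted to TEI XML by GROBID, whose output typically places the abstract under `teiHeader/profileDesc/abstract`. The `extract_abstract` function and the sample document are hypothetical and use only the standard library's `xml.etree.ElementTree`; this is not how paperetl itself is implemented.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in GROBID's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_abstract(tei_xml: str) -> str:
    """Return the abstract text from a TEI XML string, or "" if absent.
    Assumes the abstract sits under teiHeader/profileDesc/abstract."""
    root = ET.fromstring(tei_xml)
    abstract = root.find("./tei:teiHeader/tei:profileDesc/tei:abstract", TEI_NS)
    if abstract is None:
        return ""
    # Join the text of every <p> inside the abstract, preserving document order.
    paragraphs = ["".join(p.itertext()).strip()
                  for p in abstract.iterfind(".//tei:p", TEI_NS)]
    return " ".join(p for p in paragraphs if p)


# Hypothetical, heavily trimmed TEI document for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <abstract>
        <div><p>We study X.</p><p>Results show Y.</p></div>
      </abstract>
    </profileDesc>
  </teiHeader>
  <text><body><div><head>Introduction</head><p>Body text.</p></div></body></text>
</TEI>"""

print(extract_abstract(SAMPLE))
```

For a batch of 10k+ papers, the same function could be applied to each TEI file in a directory, for example one found with `pathlib.Path.glob`.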
-
A note from our sponsor - SaaSHub
www.saashub.com | 13 May 2024
Stats
neuml/paperetl is an open source project licensed under Apache License 2.0 which is an OSI approved license.
The primary programming language of paperetl is Python.