Top 14 unstructured-data Open-Source Projects
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
-
bootcamp
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc. (by milvus-io)
-
cursusdb
CursusDB is an open-source distributed in-memory yet persisted document oriented database system with real time capabilities.
-
base
Adansons Base is a data programming tool for error-analysis of training results. It organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps to find low-quality data by using the training results and improves AI performance. (by adansons)
-
html_tag_annotator
A Machine Learning tool to create the training dataset very quickly & easily by using a smart chrome extension
-
pipeline-docs-data-extractor
(Let's build a) Robust pipeline for extracting structured data from various documents
Project mention: Renumics/spotlight: Interactively explore unstructured datasets from dataframes | news.ycombinator.com | 2024-03-10
Project mention: Tantivy 0.20 is released: Schemaless column store, Schemaless aggregations, Phrase prefix queries, Percentiles, and more... | /r/rust | 2023-06-20
You have also NucliaDB that is built on top of tantivy and addresses vector search for documents and video search.
Project mention: CursusDB: Fast, open-source document oriented database with SQL like query | news.ycombinator.com | 2024-01-08
Project mention: Show HN: Generate JSON mock data for testing/initial app development | news.ycombinator.com | 2023-10-03
A friend of mine built a tool called Trex that you might find helpful, check it out here: https://github.com/automorphic-ai/trex
It's very consistent at generating templated data.
Project mention: Ask HN: I have many PDFs – what is the best local way to leverage AI for search? | news.ycombinator.com | 2024-05-30
Step 1: Log in to your InstillAI Cloud account. If you don't have an account yet, you can create one here for free using your Email or Google or GitHub ID.
unstructured-data related posts
-
Show HN: LLMWhisperer – Prep complex documents ready for use in LLMs
-
RAGFlow is an open-source RAG engine based on deep document understanding
-
CursusDB: Fast, open-source document oriented database with SQL like query
-
Milvus Adventures Jan 5, 2023
-
A new open-source distributed in-memory and persisted document oriented DBMS
-
CursusDB – Distributed document oriented DBMS with an SQL like query language
-
How to approach databases inside Next.js?
Index
What are some of the best open-source unstructured-data projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | towhee | 3,029 |
2 | bootcamp | 1,653 |
3 | awesome-document-understanding | 1,156 |
4 | spotlight | 1,020 |
5 | Nuclia DB | 585 |
6 | cursusdb | 417 |
7 | trex | 238 |
8 | unstract | 166 |
9 | relevanceai | 103 |
10 | dkm | 95 |
11 | base | 28 |
12 | deprecated-core | 13 |
13 | html_tag_annotator | 12 |
14 | pipeline-docs-data-extractor | 5 |