Top 14 unstructured-data Open-Source Projects

  • towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

  • Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14
  • bootcamp

    Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc. (by milvus-io)

  • Project mention: FLaNK-AIM: 20 May 2024 Weekly | dev.to | 2024-05-20
  • awesome-document-understanding

    A curated list of resources for Document Understanding (DU) topic

  • spotlight

    Interactively explore unstructured datasets from your dataframe. (by Renumics)

  • Project mention: Renumics/spotlight: Interactively explore unstructured datasets from dataframes | news.ycombinator.com | 2024-03-10
  • Nuclia DB

    NucliaDB: the AI search database for RAG

  • Project mention: Tantivy 0.20 is released: Schemaless column store, Schemaless aggregations, Phrase prefix queries, Percentiles, and more... | /r/rust | 2023-06-20

    You have also NucliaDB that is built on top of tantivy and addresses vector search for documents and video search.

  • cursusdb

    CursusDB is an open-source, distributed, in-memory yet persisted, document-oriented database system with real-time capabilities.

  • Project mention: CursusDB: Fast, open-source document oriented database with SQL like query | news.ycombinator.com | 2024-01-08
  • trex

    Enforce structured output from LLMs 100% of the time (by automorphic-ai)

  • Project mention: Show HN: Generate JSON mock data for testing/initial app development | news.ycombinator.com | 2023-10-03

    A friend of mine built a tool called Trex that you might find helpful, check it out here: https://github.com/automorphic-ai/trex

    It's very consistent at generating templated data.

  • unstract

    No-code LLM platform for launching APIs and ETL pipelines that structure unstructured documents

  • Project mention: Ask HN: I have many PDFs – what is the best local way to leverage AI for search? | news.ycombinator.com | 2024-05-30
  • relevanceai

    Home of the AI workforce - Multi-agent system, AI agents & tools

  • dkm

    Dynamic Kernel Matching (DKM) for Classifying Data with Non-conforming Features

  • base

    Adansons Base is a data programming tool for error analysis of training results. It organizes metadata for unstructured data and creates and organizes datasets. It makes dataset creation more effective, uses training results to help find low-quality data, and improves AI performance. (by adansons)

  • deprecated-core

    🔮 Instill Core contains components for supporting Instill VDP and Instill Model

  • Project mention: Building an Instill AI Pipeline in 5 minutes | dev.to | 2023-10-22

    Step 1: Log in to your InstillAI Cloud account. If you don't have an account yet, you can create one here for free using your Email or Google or GitHub ID.

  • html_tag_annotator

    A machine learning tool for creating training datasets quickly and easily with a smart Chrome extension

  • pipeline-docs-data-extractor

    (Let's build a) Robust pipeline for extracting structured data from various documents

  • Project mention: ETL Texts | news.ycombinator.com | 2024-01-14

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

unstructured-data related posts

  • Show HN: LLMWhisperer – Prep complex documents ready for use in LLMs

    1 project | news.ycombinator.com | 10 Apr 2024
  • RAGFlow is an open-source RAG engine based on deep document understanding

    8 projects | news.ycombinator.com | 1 Apr 2024
  • CursusDB: Fast, open-source document oriented database with SQL like query

    1 project | news.ycombinator.com | 8 Jan 2024
  • Milvus Adventures Jan 5, 2023

    1 project | dev.to | 5 Jan 2024
  • A new open-source distributed in-memory and persisted document oriented DBMS

    1 project | news.ycombinator.com | 2 Jan 2024
  • CursusDB – Distributed document oriented DBMS with an SQL like query language

    1 project | news.ycombinator.com | 9 Dec 2023
  • How to approach databases inside Next.js?

    2 projects | /r/nextjs | 9 Dec 2023

Index

What are some of the best open-source unstructured-data projects? This list will help you:

#   Project                         Stars
1   towhee                          3,029
2   bootcamp                        1,653
3   awesome-document-understanding  1,156
4   spotlight                       1,020
5   Nuclia DB                         585
6   cursusdb                          417
7   trex                              238
8   unstract                          166
9   relevanceai                       103
10  dkm                                95
11  base                               28
12  deprecated-core                    13
13  html_tag_annotator                 12
14  pipeline-docs-data-extractor        5
