Pg_vectorize: The simplest way to do vector search and RAG on Postgres

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • pg_vectorize

    The simplest way to orchestrate vector search on Postgres

  • Sorry if I'm completely missing it, but I noticed there is something in the code around chat:

    https://github.com/tembo-io/pg_vectorize/blob/main/src/chat....

    This would lead me to believe there is some way to actually invoke not just embeddings, but querying an LLM... which would be crazy powerful. Are there any examples on how to do this?

  • nlm-ingestor

    This repo provides the server side code for llmsherpa API to connect. It includes parsers for various file formats.

  • >tree-based approach to organize and summarize text data, capturing both high-level and low-level details.

    https://twitter.com/parthsarthi03/status/1753199233241674040

    nlm-ingestor processes documents, organizing content and improving readability by handling sections, paragraphs, links, tables, lists, and page continuations, removing redundancies and watermarks, and applying OCR, with additional support for HTML and other formats through Apache Tika:

    https://github.com/nlmatics/nlm-ingestor

  • pgvector

    Open-source vector similarity search for Postgres

  • There's an issue in the pgvector repo about someone with several tables of ~10-20 million rows getting acceptable performance with the right hardware and some performance tuning: https://github.com/pgvector/pgvector/issues/455

    I'm in the early stages of evaluating pgvector myself, but having used Pinecone, I currently like pgvector better because it is open source. The indexing algorithm is clear, and one can understand and modify its parameters. Furthermore, the database is PostgreSQL, not a proprietary document store. When the other data in the problem is stored relationally, it is very convenient to have the vectors stored the same way. PostgreSQL also has good observability and metrics. When it comes to flexibility for specialized applications, pgvector seems like the clear winner. But I can definitely see Pinecone's appeal if vector search is not a core component of the problem or business, as it is very easy to use and scales very easily.

  • SemanticSlicer

    A recursive text chunker that attempts to preserve context.

  • I wrote a C# library to do this. It is similar to other common chunking approaches, such as the way LangChain does it: https://github.com/drittich/SemanticSlicer

    Given a list of separators (regexes), it goes through them in order and keeps splitting the text by them until the chunk fits within the desired size. By putting the higher-level separators first (e.g., for HTML, splitting by a higher-level tag before a lower-level one), it's a pretty good proxy for maintaining context.
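
The separator-ordered splitting described in this comment can be sketched in Python as follows. This is only an illustration of the general idea, not SemanticSlicer's actual code or API: the function name `chunk_text`, the separator list, and the choice to measure size in characters (rather than tokens) are all assumptions, and unlike a production chunker it drops the matched separators and does not merge small neighbouring pieces back together.

```python
import re
from typing import List, Sequence


def chunk_text(text: str, separators: Sequence[str], max_len: int) -> List[str]:
    """Recursively split text until every chunk has at most max_len characters.

    separators is an ordered list of regexes, highest-level first. They
    should not contain capturing groups, because re.split would then
    return the captured text as extra pieces.
    """
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_len:
        # Small enough already: keep this piece whole.
        return [text]
    if not separators:
        # No separators left: fall back to hard cuts of max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    first, rest = separators[0], separators[1:]
    chunks: List[str] = []
    for piece in re.split(first, text):
        # Pieces that are still too long are split by the next, lower-level separator.
        chunks.extend(chunk_text(piece, rest, max_len))
    return chunks


# Hypothetical separator order: blank lines, line breaks, sentence ends, whitespace.
SEPARATORS = [r"\n\n+", r"\n", r"(?<=[.!?])\s+", r"\s+"]

sample = "First paragraph. It has two sentences.\n\nSecond paragraph is short."
chunks = chunk_text(sample, SEPARATORS, 30)
for chunk in chunks:
    print(repr(chunk))
```

With a 30-character limit, the first paragraph is too long and is split again at the sentence level, while the second paragraph fits and stays intact, which is how ordering the separators from coarse to fine helps keep related text together.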

Related posts

  • Integrate txtai with Postgres

    2 projects | dev.to | 25 Apr 2024
  • Vector Database solutions on AWS

    1 project | dev.to | 28 Mar 2024
  • Using pgvector To Locate Similarities In Enterprise Data

    2 projects | dev.to | 21 Mar 2024
  • pgvector vs. pgvecto.rs in 2024: A Comprehensive Comparison for Vector Search in PostgreSQL

    1 project | dev.to | 19 Mar 2024
  • Simplifying the Milvus Selection Process

    3 projects | dev.to | 19 Feb 2024