nlm-ingestor
This repo provides the server-side code for the llmsherpa API to connect to. It includes parsers for various file formats.
Sorry if I'm completely missing it, but I noticed in the code there is something around chat:
https://github.com/tembo-io/pg_vectorize/blob/main/src/chat....
This would lead me to believe there is some way to invoke not just embeddings but also query an LLM... which would be crazy powerful. Are there any examples of how to do this?
>tree-based approach to organize and summarize text data, capturing both high-level and low-level details.
https://twitter.com/parthsarthi03/status/1753199233241674040
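The tree-based approach quoted above can be sketched roughly as follows: leaf chunks are grouped and summarized, and the summaries are grouped and summarized again until a single root remains, so retrieval can match both low-level detail and high-level summaries. This is only an illustrative sketch; the `summarize()` stub stands in for an LLM call, and the group size and join-based "summary" are assumptions, not the paper's actual method.

```python
def summarize(texts):
    # Placeholder for an LLM summarization call (illustrative only).
    return " / ".join(t[:20] for t in texts)

def build_tree(chunks, group_size=2):
    """Build summary levels bottom-up until one root summary remains."""
    levels = [chunks]
    while len(levels[-1]) > 1:
        current = levels[-1]
        # Group neighboring nodes and summarize each group into a parent.
        parents = [
            summarize(current[i:i + group_size])
            for i in range(0, len(current), group_size)
        ]
        levels.append(parents)
    return levels  # levels[0] = leaf chunks, levels[-1] = [root summary]

levels = build_tree(["chunk a", "chunk b", "chunk c"])
```

At query time, one could embed and search across all levels, not just the leaves, which is what lets this capture both granularities.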
processes documents, organizing content and improving readability: it handles sections, paragraphs, links, tables, lists, and page continuations; removes redundancies and watermarks; and applies OCR, with additional support for HTML and other formats through Apache Tika:
https://github.com/nlmatics/nlm-ingestor
There's an issue in the pgvector repo about someone having several ~10-20million row tables and getting acceptable performance with the right hardware and some performance tuning: https://github.com/pgvector/pgvector/issues/455
I'm in the early stages of evaluating pgvector myself, but having used Pinecone, I currently prefer pgvector because it is open source. The indexing algorithm is clear, and one can understand and modify its parameters. Furthermore, the database is PostgreSQL, not a proprietary document store. When the other data in the problem is stored relationally, it is very convenient to have the vectors stored that way as well, and PostgreSQL has good observability and metrics. When it comes to flexibility for specialized applications, pgvector seems like the clear winner. But I can definitely see Pinecone's appeal if vector search is not a core component of the problem/business, as it is very easy to use and scales very easily.
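For context, the "understand and modify the parameters" point refers to DDL like the following. This is a sketch with hypothetical table and column names; the statements are just built as strings here to show pgvector's tunable knobs (`m` and `ef_construction` at index build time, `hnsw.ef_search` at query time).

```python
dim = 1536  # embedding dimension; depends on your embedding model

# A table with a pgvector column alongside ordinary relational data.
create_table = f"""
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    body text,
    embedding vector({dim})
);
"""

# HNSW index: m and ef_construction trade build time and memory for recall.
create_index = """
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
"""

# At query time, ef_search controls the recall/latency trade-off;
# <=> is pgvector's cosine-distance operator.
query = """
SET hnsw.ef_search = 100;
SELECT id, body
FROM documents
ORDER BY embedding <=> %(query_embedding)s
LIMIT 10;
"""
```

Being able to read and adjust these settings per-table, in plain SQL, is exactly the kind of transparency a managed vector store doesn't give you.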
I wrote a C# library to do this, similar to other common chunking approaches, like the one LangChain uses: https://github.com/drittich/SemanticSlicer
Given a list of separators (regexes), it goes through them in order and keeps splitting the text until each chunk fits within the desired size. Putting the higher-level separators first (e.g., for HTML, splitting on block-level tags before inline ones) makes it a pretty good proxy for maintaining context.
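The recursive splitting idea reads roughly like this in Python. This is a minimal sketch of the approach, not SemanticSlicer's actual API; the function name, separator list, and hard-split fallback are all illustrative.

```python
import re

def slice_text(text, separators, max_size):
    """Recursively split text by a prioritized list of separator regexes
    until every chunk fits within max_size characters."""
    if len(text) <= max_size:
        return [text] if text.strip() else []
    # Try separators in priority order; use the first one that splits.
    for sep in separators:
        parts = [p for p in re.split(sep, text) if p and p.strip()]
        if len(parts) > 1:
            chunks = []
            for part in parts:
                chunks.extend(slice_text(part, separators, max_size))
            return chunks
    # No separator matched: fall back to a hard character split.
    return [text[i:i + max_size] for i in range(0, len(text), max_size)]

# Higher-level separators first: paragraphs, then sentences, then whitespace.
separators = [r"\n\n+", r"(?<=[.!?])\s+", r"\s+"]
chunks = slice_text("First paragraph.\n\nSecond one is a bit longer.", separators, 30)
```

Because the paragraph separator is tried first, whole paragraphs survive intact whenever they fit, and finer splits only happen inside oversized pieces.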