Introduction to Vector Similarity Search

faiss

71 28,637 9.4 C++

A library for efficient similarity search and clustering of dense vectors.

https://github.com/facebookresearch/faiss

pgvector

79 9,842 9.9 C

Open-source vector similarity search for Postgres

https://github.com/pgvector/pgvector
The `ankane/pgvector` Docker image is a drop-in replacement for the postgres image, so you can fire this up with Docker very quickly.
It's a normal Postgres database with a vector data type. It can index the vectors, which allows efficient retrieval. Both AWS RDS and Google Cloud now support this in their managed Postgres offerings, so Postgres + pgvector is a viable managed production vector database.
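As a rough illustration of the "vector datatype" point above, the sketch below builds the SQL one might send to a Postgres database with pgvector enabled. The table name `items`, its columns, and the 3-dimensional vectors are hypothetical, and the `%s` placeholders assume a psycopg-style driver. The code only constructs strings and parameters; it does not connect to a database.

```python
from typing import Sequence, Tuple


def to_vector_literal(values: Sequence[float]) -> str:
    """Format a sequence of numbers as a pgvector text literal, e.g. '[1.0,2.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Hypothetical schema: table/column names and the dimension (3) are assumptions.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(3)
);
"""

# `<->` is pgvector's L2 distance operator (`<=>` is cosine distance).
NEAREST_SQL = (
    "SELECT id, content FROM items "
    "ORDER BY embedding <-> %s::vector LIMIT %s;"
)


def nearest_query(query_vector: Sequence[float], k: int = 5) -> Tuple[str, tuple]:
    """Return (sql, params) for a k-nearest-neighbour lookup."""
    return NEAREST_SQL, (to_vector_literal(query_vector), k)
```

With a driver such as psycopg, the returned pair could be passed to `cursor.execute(sql, params)` after running `SETUP_SQL` once.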
> Also, how granular should the text chunks be?
That depends on the use case, the size of your corpus, the context window of the model you are using, and how much money you are willing to spend.
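To make the granularity trade-off concrete, here is a minimal sketch of one common approach: fixed-size word chunks with some overlap. The function name `chunk_words` and the default sizes are assumptions for illustration, not a recommendation from the thread; the right values depend on the factors listed above.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of up to `chunk_size` words, each sharing
    `overlap` words with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Smaller chunks give more precise matches but produce more vectors to embed and store; larger chunks carry more context per hit but use more of the model's context window.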
> Has anyone been able to achieve reliable results from these? Preferably w/o using Langchain.
Definitely. We use Postgres + pgvector with PHP.

Milvus

108 27,510 10.0 Go

A cloud-native vector database, storage for next generation AI applications

If you're just starting out, I'd use sentence-transformers for calculating embeddings. You'll want a bi-encoder model, since bi-encoders produce an embedding for each input text. As the author of the blog, I'm partial towards Milvus (https://github.com/milvus-io/milvus) due to its enterprise focus and scale, but FAISS is a great option too if you're just looking for something more local and contained.
Milvus will perform vector search for you - all you need to do is give it a query vector.
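For intuition about what "give it a query vector" means, the sketch below does an exact, brute-force search by cosine similarity in plain Python. This is not the Milvus or FAISS API, and those systems typically rely on index structures to avoid scanning every vector; the helper names and toy 2-dimensional vectors are assumptions for illustration. In practice the vectors would come from an embedding model.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k stored vectors most similar to query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [(i, score) for score, i in scored[:k]]


if __name__ == "__main__":
    stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy "embeddings"
    print(top_k([1.0, 0.1], stored, k=2))
```

The scan is linear in the number of stored vectors, which is fine for small collections and is the baseline that dedicated vector search libraries aim to improve on.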

chroma

32 12,771 9.8 Rust

the AI-native open-source embedding database

ah sorry, i should read OP better - chroma's default embedding model is sentence transformers - and we have many others integrated - https://github.com/chroma-core/chroma/blob/main/chromadb/uti...
> It would be wonderful if there were a simpler (single file, SQLite or DuckDB like) database for vectors than the complex (and in some cases, unfortunately cloud-based) ones available now.
This is literally chroma!

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Simplifying the Milvus Selection Process
3 projects | dev.to | 19 Feb 2024

Milvus Adventures Dec 15, 2023
1 project | dev.to | 15 Dec 2023

GPU-Accelerated Indexing in LanceDB
1 project | news.ycombinator.com | 3 Nov 2023

Code Search with Vector Embeddings: A Transformer's Approach
3 projects | dev.to | 27 Aug 2023

Implementing Vector Database for AI
1 project | dev.to | 23 Aug 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
nearest-neighbor-search approximate-nearest-neighbor-search Database Embeddings anns
Post date: 11 Jul 2023
