Python Retrieval

Open-source Python projects categorized as Retrieval

Top 16 Python Retrieval Projects

  • mteb

    MTEB: Massive Text Embedding Benchmark

  • Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

    RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:

    - Chunking can interfere with context boundaries

    - Content vectors can differ vastly from question vectors; to bridge this you have to use hypothetical embeddings (generate artificial questions for each chunk and store their embeddings)

    - Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical question embeddings, metadata)

    - RAG will fail miserably on requests like "summarize the whole document"

    - To my knowledge, OpenAI embeddings don't perform well; use an embedding model that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb

    [1] https://github.com/underlines/awesome-marketing-datascience/...
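The points above about hypothetical questions and multiple embeddings per chunk can be sketched roughly as follows. This is a minimal illustration, not code from any project on this list: `embed` is a toy bag-of-words stand-in for a real embedding model, and `MultiVectorIndex`, the chunk ids, the example texts, and the metadata are all hypothetical. Each chunk is indexed under several vectors (its own text plus artificial questions), and a query is scored against the best-matching vector of each chunk.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector. Assumption: a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MultiVectorIndex:
    """Hypothetical index that stores several vectors per text chunk."""

    def __init__(self):
        self.entries = []  # list of (vector, chunk_id) pairs
        self.chunks = {}   # chunk_id -> {"text": ..., "metadata": ...}

    def add(self, chunk_id, text, questions=(), metadata=None):
        self.chunks[chunk_id] = {"text": text, "metadata": metadata or {}}
        # One vector for the chunk itself, plus one per artificial question.
        for source in (text, *questions):
            self.entries.append((embed(source), chunk_id))

    def search(self, query, k=3):
        query_vec = embed(query)
        best = {}
        for vec, chunk_id in self.entries:
            score = cosine(query_vec, vec)
            # Keep only the best-scoring vector for each chunk.
            if score > best.get(chunk_id, -1.0):
                best[chunk_id] = score
        return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


index = MultiVectorIndex()
index.add(
    "s3-lifecycle",
    "Objects are stored in buckets with a configurable lifecycle policy.",
    questions=["How do I delete old files automatically?"],
    metadata={"source": "hypothetical-doc.md"},
)
index.add(
    "ec2-resize",
    "Instances can be resized by stopping them and changing the instance type.",
    questions=["How do I change the size of a server?"],
)

query = "how do I delete old files"
print(index.search(query, k=1))
```

In this toy setup the query shares no words with the chunk text it should retrieve, so only the stored question vector produces a match. A real embedding model would narrow that gap but, per the comment above, not necessarily close it.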

  • beir

    A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

  • Project mention: On building a semantic search engine | news.ycombinator.com | 2024-01-06

    The BEIR project might be what you're looking for: https://github.com/beir-cellar/beir/wiki/Leaderboard

  • R2R

    The framework for fast development and deployment of RAG systems.

  • Project mention: Show HN: Ellipsis – Automated PR reviews and bug fixes | news.ycombinator.com | 2024-05-09

    Hi HN, hunterbrooks and nbrad here from Ellipsis (https://www.ellipsis.dev). Ellipsis automatically reviews your PRs when opened and on each new commit. If you tag @ellipsis-dev in a comment, it can make changes to the PR (via direct commit or side PR) and answer questions, just like a human.

    Demo video: https://www.youtube.com/watch?v=X61NGZpaNQA

    So far, we have dozens of open source projects and companies using Ellipsis. We seem to have landed in a kind of sweet spot where there’s a good match between the current capabilities of AI tools and the actual needs of software engineers - this doesn’t replace human review, but it saves you time by catching/fixing lots of small silly stuff.

    Here’s an example in the wild: https://github.com/relari-ai/continuous-eval/pull/38, where Ellipsis (1) adds a PR summary; (2) finds a bug and adds a review comment; (3) after a (human) user comments, generates a side PR with the fix; and (4) after a (human) user merges the side PR and adds another commit, re-reviews the PR and approves it.

    Here’s another example: https://github.com/SciPhi-AI/R2R/pull/350#pullrequestreview-..., where Ellipsis adds several comments with inline suggestions that were directly merged by the developer.

    You can configure Ellipsis in natural language to enforce custom rules, style guides, or conventions. For example, here’s how the `jxnl/instructor` repo uses natural language rules to make sure that docs are kept in sync: https://github.com/jxnl/instructor/blob/main/ellipsis.yaml#L..., and here’s an example PR that Ellipsis came up with based on those rules: https://github.com/jxnl/instructor/pull/346.

    Don’t worry, your code is never stored or used to train models (https://docs.ellipsis.dev/security).

    Installing into your repo takes 2 clicks at https://www.ellipsis.dev. We’d really appreciate your feedback, thoughts, and ideas!

  • RETRO-pytorch

    Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

  • fastembed

    Fast, Accurate, Lightweight Python library to make State of the Art Embedding

  • Project mention: FastLLM by Qdrant – lightweight LLM tailored For RAG | news.ycombinator.com | 2024-04-01
  • NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

  • Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | news.ycombinator.com | 2023-11-21

    Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. Asks GPT for the python code and executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...

  • memorizing-transformers-pytorch

    Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

  • Project mention: What can LLMs never do? | news.ycombinator.com | 2024-04-27

    At one point I experimented a little with transformers that had access to external memory searchable via KNN lookups https://github.com/lucidrains/memorizing-transformers-pytorc... or via routed queries with https://github.com/glassroom/heinsen_routing . Both approaches seemed to work for me, but I had to put that work on hold for reasons outside my control.
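As a rough illustration of the "external memory searchable via KNN lookups" idea mentioned above, the sketch below stores key/value vectors and, for a query, blends the values of the k best-matching keys with softmax weights. It is an assumption-laden toy and not the code from either linked repository: `KNNMemory` is a hypothetical name, the search is exact brute force rather than approximate nearest neighbors, and plain Python lists stand in for tensors.

```python
import math


class KNNMemory:
    """Hypothetical external memory of (key, value) vector pairs."""

    def __init__(self):
        self.keys = []
        self.values = []

    def add(self, key, value):
        self.keys.append(list(key))
        self.values.append(list(value))

    def lookup(self, query, k=2):
        """Return a softmax-weighted mix of the values of the top-k keys."""
        if not self.keys:
            return None
        # Dot-product score of the query against every stored key (brute force).
        scores = [sum(q * x for q, x in zip(query, key)) for key in self.keys]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        # Softmax over the retrieved scores only, shifted for numerical stability.
        max_score = max(scores[i] for i in top)
        weights = [math.exp(scores[i] - max_score) for i in top]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * self.values[i][d] for w, i in zip(weights, top)) / total
            for d in range(dim)
        ]


memory = KNNMemory()
memory.add([1.0, 0.0], [10.0, 0.0])
memory.add([0.0, 1.0], [0.0, 10.0])
memory.add([-1.0, 0.0], [-10.0, 0.0])

print(memory.lookup([5.0, 0.0], k=1))
print(memory.lookup([5.0, 0.0], k=2))
```

With `k=1` the lookup returns the single closest memory unchanged; with larger `k` the result is dominated by the closest key but mixes in a little of the others.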

  • searchGPT

    Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.

  • raptor

    The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

  • Project mention: Show HN: A phone number to text with questions about current events | news.ycombinator.com | 2024-05-10

    Hi HN! For my senior thesis in CS, I built an SMS-based application to make journalism more accessible. It works like this:

    1) You text the topics you're interested in to my phone number. Every day, you'll receive a text with 5 headlines from The Associated Press (https://apnews.com/) related to those topics.

    2) If you have questions about any of the current events the headlines describe, you just text them back. A response is generated from the contents of the articles using the RAPTOR retrieval framework (https://github.com/parthsarthi03/raptor) and texted right back to you.

    The repo can be found here: https://github.com/tdh15/pressText

    I'd really appreciate any and all feedback. Whatever you got, I'd love to hear it :)

  • cherche

    Neural Search

  • ACT

    Atmospheric data Community Toolkit - A python based toolkit for exploring and analyzing time series atmospheric datasets (by ARM-DOE)

  • icl-ceil

    [ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”.

  • retomaton

    PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022)

  • ragswift

    🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform

  • Project mention: Show HN: Ragswift – Scalable embeddings platform powered by distributed compute | news.ycombinator.com | 2024-01-22
  • SHREC2023-ANIMAR

    Source codes of team TikTorch (1st place solution) for track 2 and 3 of the SHREC2023 Challenge

  • FloridaPropertyData

    A Python-based tool for retrieving and processing property data for specific counties in Florida using Parcel ID numbers. Simplifies data retrieval and offers customization options for real estate agents, investors, and government officials.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Retrieval related posts

  • Show HN: A phone number to text with questions about current events

    2 projects | news.ycombinator.com | 10 May 2024
  • RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

    1 project | news.ycombinator.com | 30 Apr 2024
  • [D] Any pre trained retrieval based language models available?

    3 projects | /r/MachineLearning | 22 Oct 2022

Index

What are some of the best open-source Retrieval projects in Python? This list will help you:

#   Project                          Stars
1   mteb                             1,448
2   beir                             1,407
3   R2R                              1,232
4   RETRO-pytorch                      832
5   fastembed                          822
6   NeumAI                             785
7   memorizing-transformers-pytorch    614
8   searchGPT                          570
9   raptor                             491
10  cherche                            313
11  ACT                                127
12  icl-ceil                            81
13  retomaton                           64
14  ragswift                            33
15  SHREC2023-ANIMAR                     6
16  FloridaPropertyData                  2
