The BEIR project might be what you're looking for: https://github.com/beir-cellar/beir/wiki/Leaderboard
One issue I always run into when implementing these approaches is the embedding model's context window being too small to represent what I need.
For example, in this project, looking at the generation of training data [1], it seems what's actually being embedded is a string concatenated from each book's reviews, title, description, etc. [2]. With max_seq_length set to 200, wouldn't books with long reviews end up with their description text never being encoded? Wouldn't that result in queries failing to match potentially similar descriptions when the reviews are topically dissimilar (e.g., discussing the author's style, the book's flow, etc. instead of the plot)?
[1] https://github.com/veekaybee/viberary/blob/main/src/model/ge...
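The truncation concern above can be sketched with a toy example (a crude whitespace "tokenizer" stands in for the model's real one, and the review/description strings are made up for illustration):

```python
# Hypothetical illustration: if fields are concatenated review-first and the
# encoder truncates at max_seq_length tokens, the description may never be
# seen by the model at all.

MAX_SEQ_LENGTH = 200  # same limit as discussed above

def truncate_tokens(text: str, max_len: int = MAX_SEQ_LENGTH) -> list[str]:
    """Crude whitespace 'tokenizer' standing in for the model's real one."""
    return text.split()[:max_len]

long_review = "great pacing loved the style " * 60   # ~300 "tokens"
description = "a detective hunts a serial killer in 1890s Chicago"

concatenated = long_review + " " + description
kept = truncate_tokens(concatenated)

# Every description token falls past the 200-token cutoff, so none survive.
print("description encoded:", "detective" in kept)  # → description encoded: False
```

A query like "detective novel in Chicago" would then be matched only against the review text, never the description.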
e5-mistral is essentially a distillation from GPT-4 into a smaller model. You can see here https://github.com/microsoft/unilm/blob/16da2f193b9c1dab0a69... that they actually use custom prompts for each dataset being tested.
The question is: if you haven't seen the task before, what's a good prompt to prepend for your task?
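The per-task prompting pattern looks roughly like this (the "Instruct:/Query:" format follows e5-mistral's model card; the task descriptions and dataset keys here are hypothetical examples, not taken from the linked file):

```python
# Sketch of per-task instruction prompts for an instruction-tuned embedding
# model. Each benchmark dataset gets its own instruction string; for an
# unseen task you are left guessing which phrasing the model was tuned on.

def build_query(task_description: str, query: str) -> str:
    """Prepend a one-line task instruction, e5-mistral style."""
    return f"Instruct: {task_description}\nQuery: {query}"

# Hypothetical per-dataset instructions (illustrative only).
task_prompts = {
    "nq": "Given a question, retrieve Wikipedia passages that answer it",
    "books": "Given a query, retrieve relevant book descriptions",
}

print(build_query(task_prompts["books"], "detective novel set in Chicago"))
```

Since retrieval quality shifts with the instruction wording, a model whose prompts were tuned per benchmark dataset can look stronger on that benchmark than it is on a genuinely new task.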
IMO e5-mistral is overfit to MTEB