Top 13 Python preprocessing Projects
-
ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
-
igel
A delightful machine learning tool that allows you to train, test, and use models without writing code.
-
NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep-learning-based recommender systems.
-
courlan
Clean, filter, and sample URLs to optimize data collection. Includes spam, content-type, and language filters.
-
pytorch-VideoDataset
Tools for loading video datasets and applying transforms to videos in PyTorch. You can load video files directly, without preprocessing.
Project mention: Better RAG Results with Reciprocal Rank Fusion and Hybrid Search | news.ycombinator.com | 2024-05-30
Within our open-source RAG product RAGFlow (https://github.com/infiniflow/ragflow), Elasticsearch is currently used instead of general-purpose vector databases, because it can provide hybrid search right now. In the default case, an embedding-based reranker is not required; RRF alone is enough. Even if a reranker is used, keyword-based retrieval MUST still be hybridized with embedding-based retrieval, which is exactly what RAGFlow's latest 0.7 release provides.
On the other hand, let me introduce another database we developed, Infinity (https://github.com/infiniflow/infinity), which can provide the fastest hybrid search; you can see the performance here (https://github.com/infiniflow/infinity/blob/main/docs/refere...). Both vector search and full-text search perform much faster than in other open-source alternatives.
From the next version (weeks later), Infinity will also provide more comprehensive hybrid search capabilities: the 3-way recall you mentioned (dense vector, sparse vector, keyword search) could be provided within a single request.
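The comment above relies on Reciprocal Rank Fusion (RRF) to merge keyword-based and embedding-based results. Below is a minimal sketch of the standard RRF formula, score(d) = Σ 1 / (k + rank(d)), in plain Python. It is not RAGFlow's or Infinity's implementation: the three retriever outputs are hypothetical document IDs, and k=60 is an assumed default taken from the commonly cited RRF formulation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document's fused score is the sum of 1 / (k + rank) over every list
    it appears in, where rank is 1-based. k=60 is an assumed, tunable default.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical results from three retrievers for the same query
    # (the "3-way recall": keyword, dense vector, sparse vector).
    keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. full-text / BM25 search
    dense_hits = ["doc1", "doc4", "doc3"]    # e.g. embedding similarity
    sparse_hits = ["doc1", "doc3", "doc9"]   # e.g. a sparse-vector model

    fused = reciprocal_rank_fusion([keyword_hits, dense_hits, sparse_hits])
    for doc_id, score in fused:
        print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, it needs no normalization between retrievers whose raw scores live on different scales (BM25 scores versus cosine similarities, for example). Here doc1 ends up first because it sits near the top of all three lists, even though the keyword retriever ranked doc3 higher.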
What are some of the best open-source preprocessing projects in Python? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | ragflow | 8,245 |
| 2 | igel | 3,080 |
| 3 | MLBox | 1,481 |
| 4 | NVTabular | 1,016 |
| 5 | nnAudio | 968 |
| 6 | voicesmith | 207 |
| 7 | courlan | 70 |
| 8 | pytorch-VideoDataset | 67 |
| 9 | podium | 60 |
| 10 | cpip | 39 |
| 11 | VHDLproc | 24 |
| 12 | riffusion-scripts | 0 |
| 13 | videotooimage | 0 |