Top 23 Python Search Projects

Each entry below lists, in order: mentions, GitHub stars, activity score, and language.

algorithms

3 23,624 4.4 Python

Minimal examples of data structures and algorithms in Python

whoogle-search

146 8,868 8.1 Python

A self-hosted, ad-free, privacy-respecting metasearch engine

Project mention: So I deployed Whoogle on my NAS.... | /r/selfhosted | 2023-12-08

searxng

121 8,813 9.7 Python

SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.
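
As a rough illustration of what "aggregating results from various search services" can involve, the sketch below merges ranked URL lists from several engines using reciprocal rank fusion. This is not SearXNG's actual scoring code; the engine names, the example URLs, and the constant k=60 are assumptions chosen for the example.

```python
from collections import defaultdict


def merge_results(results_by_engine, k=60):
    """Merge ranked URL lists with reciprocal rank fusion (RRF).

    results_by_engine maps a (hypothetical) engine name to its ranked URLs.
    Each URL earns 1 / (k + rank) from every engine that returned it.
    """
    scores = defaultdict(float)
    for ranked_urls in results_by_engine.values():
        for rank, url in enumerate(ranked_urls, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest combined score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    # Hypothetical engines and URLs, for illustration only.
    sample = {
        "engine_a": ["https://example.org/a", "https://example.org/b"],
        "engine_b": ["https://example.org/b", "https://example.org/c"],
    }
    for url in merge_results(sample):
        print(url)
```

A URL returned by several engines accumulates score from each of them, so it tends to rise above results that only one engine reported.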

Project mention: Mobile Ad Blocker Will No Longer Stop YouTube's Ads | news.ycombinator.com | 2024-04-16

Don't use Youtube without going through a proxy like Invidious [1] or Newpipe
Don't use {site} Search without going through a proxy like SearxNG [2]
Don't use TwiXXer without going through a proxy like Nitter [3] - this has gotten more difficult lately but it still works as long as you feed the daemon some registered accounts. Video does not work at the moment but that seems to be fixable.
Don't use Reddit without going through a proxy like libreddit [4]
Start noticing the pattern? Maybe it is time to start producing promotional posters:
The only thing to come between you and ADS could be a proxy / ADS. It's just not worth the risk
ADS / New rules for a sane net / Sane net protects you, your partner and your community
A proxy here and a filter there, ADS nowhere
The more you tighten your grip, ${site}, the more viewers will slip through your fingers
[1] https://github.com/iv-org/invidious
[2] https://github.com/searxng/searxng
[3] https://github.com/zedeus/nitter
[4] https://github.com/libreddit/libreddit

txtai

356 7,111 9.3 Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

Project mention: Show HN: FileKitty – Combine and label text files for LLM prompt contexts | news.ycombinator.com | 2024-05-01

buku

48 6,172 7.0 Python

🔖 Personal mini-web in text

Project mention: Enlightenmentware | news.ycombinator.com | 2024-05-20

I really like the buku terminal bookmark manager. https://github.com/jarun/buku I like that I can just `man buku` when I don't understand something and I can actually find the answer I'm looking for.

tribler

65 4,702 9.8 Python

Privacy enhanced BitTorrent client with P2P content discovery

Project mention: Tribler: An attack-resilient micro-economy for media | news.ycombinator.com | 2024-04-25

I noticed that too:
https://github.com/Tribler/tribler/wiki/%22TrustChain%22-arc...
But not much else about it. Would be interested to read more. Using torrent seeding as a form of Proof-of-Work that rewards tokens is actually an interesting use case for cryptocurrency, and not as energy-hungry.

elasticsearch-py

21 4,147 8.9 Python

Official Python client for Elasticsearch
elasticsearch-dsl-py

3 3,780 8.3 Python

High level Python client for Elasticsearch

search-plugins

118 3,598 3.9 Python

Search plugins for the search feature

Project mention: Whats the best browser for torrenting? | /r/torrents | 2023-12-10

django-haystack

5 3,552 8.4 Python

Modular search for Django

image-match

5 2,911 0.0 Python

🎇 Quickly search over billions of images

datasketch

1 2,362 6.4 Python

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
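
To give a feel for the first technique in that list, here is a minimal, standard-library-only sketch of MinHash for estimating Jaccard similarity between two token sets. It is an illustration of the idea, not datasketch's implementation or API; the function names and the choice of salted BLAKE2b as the hash family are assumptions made for this example.

```python
import hashlib


def minhash_signature(tokens, num_perm=64):
    """Return a MinHash signature: the minimum hash value under each salted hash.

    `tokens` must be a non-empty collection of strings.
    """
    signature = []
    for seed in range(num_perm):
        # A distinct salt per position stands in for a distinct hash function.
        salt = seed.to_bytes(8, "big")
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(tok.encode("utf-8"), digest_size=8, salt=salt).digest(),
                    "big",
                )
                for tok in tokens
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = {"python", "search", "engine", "index"}
    doc_b = {"python", "search", "engine", "query"}
    print(estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b)))
```

The estimate gets closer to the true Jaccard similarity as `num_perm` grows, at the cost of a longer signature per set.
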
JobFunnel

1 1,740 0.0 Python

Scrape job websites into a single spreadsheet with no duplicates.

swirl-search

32 1,552 9.8 Python

Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously and return AI-ranked results. It also provides summaries of your answers from searches using LLMs. It's a one-click, easy-to-use Retrieval Augmented Generation (RAG) solution.

Project mention: GitHub - swirlai/swirl-search: Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously, finds the best results using a reader LLM, then prompts Generative AI, enabling you to get answers based on your data. | /r/programming | 2023-12-05

twitter-api-client

24 1,380 7.4 Python

Implementation of X/Twitter v1, v2, and GraphQL APIs (by trevorhobenshield)

Project mention: Reverse Engineering Twitter Spaces - Capture 500 Audio Streams/Live Transcripts per IP | /r/programming | 2023-06-11

R2R

4 1,255 9.7 Python

The framework for fast development and deployment of RAG systems.

Project mention: Show HN: Ellipsis – Automated PR reviews and bug fixes | news.ycombinator.com | 2024-05-09

Hi HN, hunterbrooks and nbrad here from Ellipsis (https://www.ellipsis.dev). Ellipsis automatically reviews your PRs when opened and on each new commit. If you tag @ellipsis-dev in a comment, it can make changes to the PR (via direct commit or side PR) and answer questions, just like a human.
Demo video: https://www.youtube.com/watch?v=X61NGZpaNQA
So far, we have dozens of open source projects and companies using Ellipsis. We seem to have landed in a kind of sweet spot where there’s a good match between the current capabilities of AI tools and the actual needs of software engineers - this doesn’t replace human review, but it saves you time by catching/fixing lots of small silly stuff.
Here’s an example in the wild: https://github.com/relari-ai/continuous-eval/pull/38, where Ellipsis (1) adds a PR summary; (2) finds a bug and adds a review comment; (3) after a (human) user comments, generates a side PR with the fix; and (4) after a (human) user merges the side PR and adds another commit, re-reviews the PR and approves it.
Here’s another example: https://github.com/SciPhi-AI/R2R/pull/350#pullrequestreview-..., where Ellipsis adds several comments with inline suggestions that were directly merged by the developer.
You can configure Ellipsis in natural language to enforce custom rules, style guides, or conventions. For example, here’s how the `jxnl/instructor` repo uses natural language rules to make sure that docs are kept in sync: https://github.com/jxnl/instructor/blob/main/ellipsis.yaml#L..., and here’s an example PR that Ellipsis came up with based on those rules: https://github.com/jxnl/instructor/pull/346.
Don’t worry, your code is never stored or used to train models (https://docs.ellipsis.dev/security).
Installing into your repo takes 2 clicks at https://www.ellipsis.dev. We’d really appreciate your feedback, thoughts, and ideas!

paperai

19 1,206 5.9 Python

📄 🤖 Semantic search and workflows for medical/scientific papers

Project mention: Oracle of Zotero: LLM QA of Your Research Library | news.ycombinator.com | 2023-11-26

Nice project!
I've spent quite a lot of time in the medical/scientific literature space. With regards to LLMs, specifically RAG, how the data is chunked is quite important. With that, I have a couple projects that might be beneficial additions.
paperetl (https://github.com/neuml/paperetl) - supports parsing arXiv, PubMed and integrates with GROBID to handle parsing metadata and text from arbitrary papers.
paperai (https://github.com/neuml/paperai) - builds embeddings databases of medical/scientific papers. Supports LLM prompting, semantic workflows and vector search. Built with txtai (https://github.com/neuml/txtai).
While arbitrary chunking/splitting can work, I've found that integrating parsing that has knowledge of medical/scientific paper structure increases the overall accuracy and experience of downstream applications.
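
The contrast drawn above between arbitrary chunking and structure-aware parsing can be sketched as follows. This is a toy example, not how paperetl or paperai work: it assumes plain-text input where each section heading sits alone on its own line, and the heading list is a hypothetical one chosen for illustration.

```python
import re

# Assumed heading names; real papers vary, and tools like GROBID do far richer parsing.
SECTION_HEADINGS = ("Abstract", "Introduction", "Methods", "Results", "Discussion", "Conclusion")
_ALTERNATION = "|".join(SECTION_HEADINGS)
HEADING_RE = re.compile(rf"^({_ALTERNATION})[ \t]*$", re.MULTILINE)


def chunk_fixed(text, size=200):
    """Arbitrary chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_section(text):
    """Structure-aware chunking: one (heading, body) pair per recognized section."""
    matches = list(HEADING_RE.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # A section runs until the next recognized heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append((match.group(1), text[match.end():end].strip()))
    return chunks


if __name__ == "__main__":
    paper = "Abstract\nWe study X.\nMethods\nWe did Y.\nResults\nZ improved.\n"
    print(chunk_fixed(paper, size=20))   # cuts can land mid-sentence or mid-heading
    print(chunk_by_section(paper))       # each chunk keeps its section label
```

Keeping the section label with each chunk lets a downstream retrieval step tell, for example, a Methods passage from a Results passage, which fixed-size cuts cannot do.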

RecoverPy

22 1,176 9.3 Python

Interactively find and recover deleted or 👉 overwritten 👈 files from your terminal

Project mention: RecoverPy 2.1.3: A Linux tool to recover deleted or overwritten files | /r/opensource | 2023-10-23

Memacs

20 968 2.7 Python

What did I do on February 14th 2007? Visualize your (digital) life in Org-mode

Project mention: Show HN: Khoj – Chat Offline with Your Second Brain Using Llama 2 | news.ycombinator.com | 2023-07-30

Might look into some of the tools like novoid's Memacs. Notion here is to build tools that push feeds, history data, into Emacs. Using org in your use case with the Khoj tool, could be the "glue" you need to tie it all together. https://github.com/novoid/Memacs#readme.

notion-search-alfred-workflow

4 816 5.1 Python

An Alfred workflow to search Notion with instant results

twikit

2 730 9.7 Python

Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot

Project mention: Show HN: Twitter API Wrapper for Python – No API Keys Needed | news.ycombinator.com | 2024-02-03

pysolr

0 659 8.2 Python

Pysolr — Python Solr client
stweet

3 570 0.0 Python

Advanced Python library to scrape Twitter (tweets, users) from unofficial API

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Search related posts

Show HN: FileKitty – Combine and label text files for LLM prompt contexts
5 projects | news.ycombinator.com | 1 May 2024

What contributing to Open-source is, and what it isn't
1 project | news.ycombinator.com | 27 Apr 2024

DuckDuckGo Privacy Pro
1 project | news.ycombinator.com | 12 Apr 2024

SearXNG is a free internet metasearch engine
3 projects | news.ycombinator.com | 5 Apr 2024

Google will start showing AI-powered search results for users who didn't opt-in
1 project | news.ycombinator.com | 23 Mar 2024

YaCy, a distributed Web Search Engine, based on a peer-to-peer network
9 projects | news.ycombinator.com | 5 Mar 2024

Build knowledge graphs with LLM-driven entity extraction
1 project | dev.to | 21 Feb 2024

Index

What are some of the best open-source Search projects in Python? This list will help you:

#	Project	Stars
1	algorithms	23,624
2	whoogle-search	8,868
3	searxng	8,813
4	txtai	7,111
5	buku	6,172
6	tribler	4,702
7	elasticsearch-py	4,147
8	elasticsearch-dsl-py	3,780
9	search-plugins	3,598
10	django-haystack	3,552
11	image-match	2,911
12	datasketch	2,362
13	JobFunnel	1,740
14	swirl-search	1,552
15	twitter-api-client	1,380
16	R2R	1,255
17	paperai	1,206
18	RecoverPy	1,176
19	Memacs	968
20	notion-search-alfred-workflow	816
21	twikit	730
22	pysolr	659
23	stweet	570
