Top 23 Python web-scraping Projects
-
changedetection.io
The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Designed for simplicity: monitor which websites had a text change, for free. Also covers website defacement monitoring and price change notification.
-
Douyin_TikTok_Download_API
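🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.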
-
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
-
curl_cffi
Python binding for curl-impersonate via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP2 fingerprints.
-
web-scraping
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
-
dude
dude uncomplicated data extraction: A simple framework for writing web scrapers using Python decorators
-
wayback-machine-scraper
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
-
letterboxd_recommendations
Scraping publicly-accessible Letterboxd data and creating a movie recommendation model with it that can generate recommendations when provided with a Letterboxd username
-
facebook_page_scraper
Scrapes the front end of Facebook pages with no limitations and can turn the data into structured JSON or CSV
-
scrapper
Web scraper with a simple REST API, running in Docker and using a headless browser and Readability.js for parsing.
Scrapy is an open-source Python framework for web scraping. With Scrapy, you write spiders: self-contained scripts that download and process web content. Its main limitation is that it handles JavaScript-rendered websites poorly, because it was designed for static HTML pages.
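The static-HTML limitation can be illustrated with a minimal sketch. It deliberately uses the standard library's `html.parser` instead of Scrapy, so it is not Scrapy's API; the page content, the `TitleParser` class, and the `extract_titles` helper are all hypothetical names made up for this example. The point is only that a parser which reads the downloaded markup, without running scripts, never sees elements that a browser would create later.

```python
from html.parser import HTMLParser

# Hypothetical page: one headline is in the static markup, the other
# would only exist after a browser executes the script.
PAGE = """
<html><body>
  <h2 class="title">Static headline</h2>
  <div id="app"></div>
  <script>
    var h = document.createElement("h2");
    h.className = "title";
    h.textContent = "Rendered headline";
    document.getElementById("app").appendChild(h);
  </script>
</body></html>
"""


class TitleParser(HTMLParser):
    """Collects the text of every <h2 class="title"> found in the markup."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Only headings carrying class="title" are of interest.
        if tag == "h2" and dict(attrs).get("class") == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Script bodies also arrive here as plain text, but they are
        # ignored because no matching <h2> is open at that point.
        if self._in_title and data.strip():
            self.titles.append(data.strip())


def extract_titles(html):
    """Return the title texts present in the raw HTML (no JavaScript is run)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


print(extract_titles(PAGE))
```

Only the headline present in the raw markup is found; the one the script would add is missed. Getting that second headline typically requires a tool that executes JavaScript, such as a headless browser.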
Project mention: Google have removed RSS support from their developer blogs | news.ycombinator.com | 2023-12-11
I use ChangeDetection:
- https://changedetection.io/#features
- https://github.com/dgtlmoon/changedetection.io
Project mention: Osint update of the Snoop Project tool search for user by nickname | news.ycombinator.com | 2024-01-02
curl_cffi – A http client that can impersonate browser tls/ja3/http2 fingerprints
botasaurus – The All in One Framework to build Awesome Scrapers
Project mention: wayback-machine-scraper: NEW Data - star count:380.0 | /r/algoprojects | 2023-12-10
Use an existing self-hostable tool for getting recommendations from there, such as letterboxd_recommendations
Python web-scraping related posts
-
Claude is now available in Europe
-
wayback-machine-scraper: NEW Data - star count:380.0
-
Trafilatura: Python tool to gather text on the Web
-
Self Hosted Content Recommender?
-
Show HN: Build AI Dags with Memory; Run and Validate LLM Tools in Containers
-
Powerful and free scraper with a headless browser under the hood and Readability for parsing
-
No more rec requests
-
A note from our sponsor - Scout Monitoring
www.scoutapm.com | 6 Jun 2024
Index
What are some of the best open-source web-scraping projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | Scrapy | 51,343 |
2 | changedetection.io | 15,285 |
3 | Douyin_TikTok_Download_API | 7,357 |
4 | autoscraper | 6,007 |
5 | trafilatura | 3,071 |
6 | snoop | 2,753 |
7 | Grab | 2,363 |
8 | curl_cffi | 1,530 |
9 | botasaurus | 1,036 |
10 | scrapy-fake-useragent | 681 |
11 | web-scraping | 678 |
12 | google-search-results-python | 532 |
13 | dude | 412 |
14 | basketball_reference_web_scraper | 414 |
15 | wayback-machine-scraper | 408 |
16 | twitter-scraper-selenium | 285 |
17 | letterboxd_recommendations | 223 |
18 | facebook_page_scraper | 201 |
19 | saveddit | 165 |
20 | rymscraper | 157 |
21 | stock_screener | 128 |
22 | GoodreadsScraper | 119 |
23 | scrapper | 123 |