Top 23 Python Scraper Projects

newspaper

13 13,808 0.0 Python

newspaper3k is a library for news, full-text, and article metadata extraction in Python 3.
chinese-xinhua

1 10,694 0.0 Python

📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and Chinese characters.
Douyin_TikTok_Download_API

3 7,263 9.2 Python

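🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.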
autoscraper

9 5,998 0.0 Python

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
myGPTReader

6 4,407 6.4 Python

A community-driven way to read and chat with AI bots - powered by chatGPT.
snscrape

29 4,269 7.3 Python

A social networking service scraper in Python

Project mention: Can someone walk me through this? | /r/learnpython | 2023-11-21

Here's what I'm trying to use: https://github.com/JustAnotherArchivist/snscrape What do I need to open/run any of this? My goal with this is to extract my follower list off Twitter, and I'd very much like to know how to run it on my machine instead of having someone run it for me on theirs. I can't even figure out what I need to open the Readme file.

Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

8 3,070 3.5 Python

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!
bulk-downloader-for-reddit

80 2,218 0.0 Python

Downloads and archives content from reddit

Project mention: BDFR skipping Reddit hosted videos | /r/DataHoarder | 2023-10-21

JobFunnel

1 1,740 0.0 Python

Scrape job websites into a single spreadsheet with no duplicates.
linkedin_scraper

2 1,765 2.3 Python

A library that scrapes Linkedin for user data
mlscraper

10 1,235 0.6 Python

🤖 Scrape data from HTML websites automatically by just providing examples
animdl

6 1,227 6.7 Python

A highly efficient, fast, powerful and light-weight anime downloader and streamer for your favorite anime.

Project mention: Does anyone have anime websites recommendations without ads? | /r/anime | 2023-07-10

If you're tech-savvy, you can try animdl, which is a command line tool. No browser, no ads. To directly stream with the default provider AllAnime: animdl stream 'Your Anime Name'

cinemagoer

3 1,198 7.1 Python

Cinemagoer is a Python package useful to retrieve and manage the data of the IMDb (to which we are not affiliated in any way) movie database about movies, people, characters and companies
RedditDownloader

19 1,102 3.2 Python

Scrapes Reddit to download media of your choice.

Project mention: Reddit downloader that works after API upgrade | /r/DataHoarder | 2023-10-10

Is there a functioning tool to download the saved posts / upvotes that you do on reddit? This tool: https://github.com/shadowmoose/RedditDownloader was perfect, but it got rekt by the API changes and has been discontinued.

finviz

9 1,018 4.9 Python

Unofficial API for finviz.com
Scweet

5 978 0.0 Python

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, user info, images...
GramAddict bot

4 944 8.4 Python

Completely free and open-source human-like Instagram bot. Powered by UIAutomator2 and compatible with basically any Android device 5.0+ that can run Instagram - real or emulated. (by GramAddict)
scrapyrt

3 816 6.8 Python

HTTP API for Scrapy spiders
bookcorpus

4 784 3.1 Python

Crawl BookCorpus

Project mention: The Internet Archive is under a DDoS attack | news.ycombinator.com | 2024-05-28

It is apparently widely suspected that a certain "Books2" dataset mentioned by OpenAI is basically just LibGen:
https://blusharkmedia.medium.com/the-ongoing-battle-against-...
https://techhq.com/2023/09/can-libgen-shadow-library-survive...
https://www.twitter.com/theshawwn/status/1320282152689336320
https://qz.com/openai-books-piracy-microsoft-meta-google-cha...
https://qz.com/shadow-libraries-are-at-the-heart-of-the-moun...
https://goodereader.com/blog/e-book-news/authors-file-lawsui...
When asked about whether this was true, they refused to answer based on confidentiality concerns, then said they had deleted all copies of the dataset, stopped using it, and no longer employed the individuals that compiled it:
https://www.businessinsider.com/openai-destroyed-ai-training...
We do know for a fact that the (non-OpenAI-controlled) "books3" dataset is just "all of bibliotik":
https://www.twitter.com/theshawwn/status/1320282149329784833
https://github.com/soskek/bookcorpus/issues/27
And we also apparently know for a fact that this was included in the datasets used to train LLAMA:
https://en.wikipedia.org/wiki/The_Pile_(dataset)
https://aicopyright.substack.com/p/the-books-used-to-train-l...
https://aicopyright.substack.com/p/has-your-book-been-used-t...

google-maps-scraper

3 784 7.4 Python

👋 HOLA 👋 HOLA 👋 HOLA ! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, REVIEWS, WEBSITES, AND RATINGS FROM GOOGLE MAPS WITH EASE! 🤖

Project mention: I create a google maps scraper, let me know your thoughts | /r/webscraping | 2023-07-06

My scraper runs at 120 listings per 10 minutes, so yours is quite fast. You can see my scraper at https://github.com/omkarcloud/google-maps-scraper. It is quite popular, with 95 stars.

URS

11 740 7.5 Python

Universal Reddit Scraper - A comprehensive Reddit scraping/archival command-line tool.

Project mention: Nitter Shutting Down | news.ycombinator.com | 2024-01-27

If they don't want you to use their API just respect their wishes and scrape Reddit. https://github.com/JosephLai241/URS it's the only moral thing we can do.

TikTokLive

7 743 8.6 Python

Python library to receive live stream events (comments, gifts, etc.) in realtime from TikTok LIVE.

Project mention: Can someone help me modify this project | /r/learnpython | 2023-09-22

google-play-scraper

3 707 5.7 Python

Google play scraper for Python inspired by <facundoolano/google-play-scraper> (by JoMingyu)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Scraper related posts

Can someone walk me through this?

1 project | /r/learnpython | 21 Nov 2023
What’s the coolest things you’ve done with python?

3 projects | /r/Python | 15 Nov 2023
BDFR skipping Reddit hosted videos

1 project | /r/DataHoarder | 21 Oct 2023
Updated Drexel Scheduler to Winter Quarter

1 project | /r/Drexel | 19 Oct 2023
Show HN: New AI Dataset Based on LibGen and Sci-Hub

2 projects | news.ycombinator.com | 8 Sep 2023
Exporting a telegram chat without Telegram Desktop?

1 project | /r/DataHoarder | 17 Aug 2023
cryptoCMD: NEW Data - star count:456.0

1 project | /r/algoprojects | 8 Aug 2023

Index

What are some of the best open-source Scraper projects in Python? This list will help you:

#	Project	Stars
1	newspaper	13,808
2	chinese-xinhua	10,694
3	Douyin_TikTok_Download_API	7,263
4	autoscraper	5,998
5	myGPTReader	4,407
6	snscrape	4,269
7	Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE	3,070
8	bulk-downloader-for-reddit	2,218
9	JobFunnel	1,740
10	linkedin_scraper	1,765
11	mlscraper	1,235
12	animdl	1,227
13	cinemagoer	1,198
14	RedditDownloader	1,102
15	finviz	1,018
16	Scweet	978
17	GramAddict bot	944
18	scrapyrt	816
19	bookcorpus	784
20	google-maps-scraper	784
21	URS	740
22	TikTokLive	743
23	google-play-scraper	707
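
The note above says the list is ordered by GitHub stars. As a rough illustration of that ordering rule, here is a minimal sketch that parses the displayed star counts and ranks a few rows taken from the index. The names `rows`, `parse_stars`, and `rank_by_stars` are hypothetical helpers made up for this example, not anything the site itself exposes.

```python
# Illustrative rows copied from the index above: (project, stars as displayed).
# They are deliberately listed out of order so the sort has something to do.
rows = [
    ("autoscraper", "5,998"),
    ("newspaper", "13,808"),
    ("snscrape", "4,269"),
    ("chinese-xinhua", "10,694"),
]


def parse_stars(text: str) -> int:
    """Convert a displayed star count such as '13,808' to an integer."""
    return int(text.replace(",", ""))


def rank_by_stars(rows):
    """Return (rank, project, stars) tuples, highest star count first."""
    ordered = sorted(rows, key=lambda row: parse_stars(row[1]), reverse=True)
    return [
        (rank, name, parse_stars(stars))
        for rank, (name, stars) in enumerate(ordered, start=1)
    ]


# Print the ranked rows in the same tab-separated layout as the index.
for rank, name, stars in rank_by_stars(rows):
    print(f"{rank}\t{name}\t{stars:,}")
```

Because the star counts are stored as strings with thousands separators, they need converting to integers before sorting; sorting the raw strings would compare them character by character instead.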
