Top 23 web-scraper Open-Source Projects

awesome-crawler

7 6,167 1.5

A collection of awesome web crawlers and spiders in different languages
100ProjectsOfCode

26 2,965 0.0

A list of practical knowledge-building projects.

Project mention: Fired from an internship after 2 weeks | /r/cscareerquestions | 2023-06-02

Work on a personal project. There's a list of 100 sample projects at https://github.com/arpit-omprakash/100ProjectsOfCode

soup

4 2,133 0.0 Go

Web Scraper in Go, similar to BeautifulSoup
lightnovel-crawler

20 1,304 9.7 Python

Generate and download e-books from online sources.

Project mention: Help with Paperback IOS. | /r/mangapiracy | 2023-06-18

Use lightnovel-crawler in a terminal on a computer, or through their Discord bot, to find series across multiple LN / webnovel sites, then choose the format to download (EPUB, PDF, TXT, and many more).

stealth

26 997 0.0 JavaScript

Stealth - Secure, Peer-to-Peer, Private and Automatable Web Browser/Scraper/Proxy
Monkey-DL (Anime Downloader)

5 810 0.0 Python

Bulk download your favourite anime episodes from your favourite anime websites
spidr

0 793 6.4 Ruby

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use. (by postmodern)
web-scraping

43 678 0.0 Python

Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, MacroTrends, SHFE and alternative data crawlers on Tomtom, BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

Project mention: web-scraping: NEW Data - star count:554.0 | /r/algoprojects | 2023-09-25

google-maps-scraper

5 649 7.3 Go

Scrapes data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email and more for each place (by gosom)

Project mention: Show HN: A Google Maps Scraper | news.ycombinator.com | 2023-12-03

PHP Scraper

1 498 5.9 PHP

A universal web-util for PHP.
basketball_reference_web_scraper

2 411 7.2 Python

NBA Stats API via Basketball Reference
crawler

0 300 8.1 PHP

Library for Rapid (Web) Crawler and Scraper Development (by crwlrsoft)
summarizer

4 267 0.0 Python

A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom-built algorithm to rank words and sentences.
awesome-web-scraper

0 240 4.9

A collection of awesome web scrapers and crawlers.
facebook_page_scraper

1 200 6.8 Python

Scrapes the front end of Facebook pages with no limitations and provides a feature to turn the data into structured JSON or CSV
cascadia

1 134 4.7 Go

Command-line CSS selector tool based on the Go cascadia package
Senpwai

2,176 132 9.3 Python

A desktop app for tracking and batch downloading anime

Project mention: Building W-9 Crafter | dev.to | 2024-03-28

It's been a cool learning experience making a Product Hunt listing, a small demo video, and allll the social posts (Twitter, LinkedIn, etc).

get-sauce

1 113 7.9 Go

A command line program to download Hentai videos and images from multiple websites
public-roadmap

15 42 4.0

Public roadmap for SerpApi, LLC (https://serpapi.com) (by serpapi)

Project mention: AI Report #4: AutoGPT And Open-source lags behind Part 2 | news.ycombinator.com | 2023-06-15

> The google search function is also limited. For comparison, SerpAPI masterfully scrapes Google Search using a proxy network and very intelligent parsing. In experiments using SerpAPI in combination with Microsoft’s guidance module, I got much farther than AutoGPT.
Thanks for your kind words. We are working on SerpApi integration for Auto-GPT: https://github.com/serpapi/public-roadmap/issues/905

CobWeb-lnx

5 38 4.7 Python

CobWeb is a Python library for web scraping. The library consists of two classes: Spider and Scraper.

Project mention: Who has contributed to and who has used open-source projects? | /r/devpt | 2023-06-30

yast

1 29 1.7 Go

Yet Another Streaming Tool

Project mention: [OpenSource] I am building high performance Plex alternative in Go for Movies and TV Show | /r/golang | 2023-06-02

I also built a similar tool; it lets you choose and play movies. I used webtorrent behind the scenes. https://github.com/qascade/yast

reddit-bots

3 23 0.0 Python

A collection of Reddit bots that I use to enhance the subreddits I manage.
tagalog-dictionary-scraper

1 23 0.0 Python

Builds a Tagalog dictionary by collecting Tagalog words from tagalog.pinoydictionary.com

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

web-scraper related posts

Show HN: A Google Maps Scraper
1 project | news.ycombinator.com | 3 Dec 2023

Google Maps Scraper in Golang
1 project | news.ycombinator.com | 27 Jul 2023

I'm trying and failing to compile someone else's project to wasm.
2 projects | /r/learnrust | 17 Jul 2023

Help with Paperback IOS.
1 project | /r/mangapiracy | 18 Jun 2023

Fired from an internship after 2 weeks
1 project | /r/cscareerquestions | 2 Jun 2023

Need help thinking of a personal project
1 project | /r/csMajors | 2 Jun 2023

Multiparadigmatic Web Scraping Tool!
1 project | /r/computerscience | 14 May 2023

Index

What are some of the best open-source web-scraper projects? This list will help you:

	Project	Stars
1	awesome-crawler	6,167
2	100ProjectsOfCode	2,965
3	soup	2,133
4	lightnovel-crawler	1,304
5	stealth	997
6	Monkey-DL (Anime Downloader)	810
7	spidr	793
8	web-scraping	678
9	google-maps-scraper	649
10	PHP Scraper	498
11	basketball_reference_web_scraper	411
12	crawler	300
13	summarizer	267
14	awesome-web-scraper	240
15	facebook_page_scraper	200
16	cascadia	134
17	Senpwai	132
18	get-sauce	113
19	public-roadmap	42
20	CobWeb-lnx	38
21	yast	29
22	reddit-bots	23
23	tagalog-dictionary-scraper	23