Beautiful Soup: We called him Tortoise because he taught us

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • html5-parser

    Fast C-based HTML 5 parsing for Python

  • You want a proper HTML5 parser that can handle invalid documents. The fastest one is https://github.com/kovidgoyal/html5-parser, which is over 30x faster than html5lib.

  • SeleniumBase

    📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.

  • In those cases you might want to check out SeleniumBase: https://seleniumbase.io/

  • colly

    Elegant Scraper and Crawler Framework for Golang

  • shot-scraper

    A command-line utility for taking automated screenshots of websites

  • Playwright for Python has really good documentation: https://playwright.dev/python/

    I used it for my https://shot-scraper.datasette.io/ tool, and wrote a bit about CLI-driven scraping using that tool here: https://simonwillison.net/2022/Mar/14/scraping-web-pages-sho...

  • playwright-python

    Python version of the Playwright testing and automation library.

  • soup

    Web Scraper in Go, similar to BeautifulSoup

  • > Does anyone know if there is a good equivalent for Go?

    Yes: https://github.com/anaskhan96/soup

    It works well.

Related posts

  • Show HN: Flyscrape – A standalone and scriptable web scraper in Go

    6 projects | news.ycombinator.com | 11 Nov 2023
  • New modern web crawling tool

    2 projects | news.ycombinator.com | 30 Apr 2023
  • No code command line webscraper

    3 projects | /r/webscraping | 9 Mar 2023
  • Go for web scraping

    5 projects | /r/golang | 18 Nov 2022
  • And it happened again

    3 projects | /r/indonesia | 16 Oct 2022