You want a proper HTML5 parser that can handle invalid documents. The fastest one is https://github.com/kovidgoyal/html5-parser, which is described as over 30x faster than html5lib.
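To illustrate what "handling invalid documents" means in practice, here is a minimal sketch that pulls links out of deliberately broken markup. It uses Python's standard-library `html.parser` as a stand-in so it runs with no dependencies. It is not html5-parser, and it does not implement the HTML5 tree-building rules. The `LinkCollector` class, the `extract_links` helper, and the `BROKEN` sample are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, ignoring how well-formed the page is."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(markup):
    """Return every link target found in the markup, in document order."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


# Hypothetical sample: unclosed <p>, <a> and <div>, plus a stray </b>.
BROKEN = (
    '<html><body><p>Unclosed paragraph'
    '<a href="/one">one<div><a href="/two">two</b></div>'
)

if __name__ == "__main__":
    print(extract_links(BROKEN))
```

The stand-in only reports tags as it encounters them and never builds a tree. A real HTML5 parser goes further and repairs the document the way a browser would, which is why it is the better choice once you need to query the structure and not just scan for tags.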
In those cases you might want to check out SeleniumBase: https://seleniumbase.io/
Playwright for Python has really good documentation: https://playwright.dev/python/
I used it for my https://shot-scraper.datasette.io/ tool, and wrote a bit about CLI-driven scraping using that tool here: https://simonwillison.net/2022/Mar/14/scraping-web-pages-sho...
> Does anyone know if there is a good equivalent for Go?
Yes: https://github.com/anaskhan96/soup
It works well.