-
crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
If you like the posts on the Crawlee blog so far, please consider giving Crawlee a star on GitHub, it helps us to reach and help more developers.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
-
In this tutorial, we can extract data from the HTML structure, so we will go with Cheerio, but for extracting data from SPAs or JavaScript-rendered websites, Crawlee also supports headless browser libraries like Playwright and Puppeteer
-
Make sure to update CSS on the App.css file from here: https://github.com/ayush2390/web-show-recommender/blob/main/src/App.css
-
Playwright
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
In this tutorial, we can extract data from the HTML structure, so we will go with Cheerio, but for extracting data from SPAs or JavaScript-rendered websites, Crawlee also supports headless browser libraries like Playwright and Puppeteer