Scrapy vs. Crawlee

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • crawlee

    Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. It extracts data for AI, LLMs, RAG, or GPTs; downloads HTML, PDF, JPG, PNG, and other files from websites; works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP; runs in both headful and headless mode; and supports proxy rotation.

  • Crawlee is one of the few web scraping and automation libraries that supports JavaScript and TypeScript. Like Scrapy, Crawlee provides a CLI, but it also offers pre-built project templates in TypeScript and JavaScript, including templates for Playwright and Puppeteer. These templates help beginners quickly understand the project's file structure and how a crawler works.

  • Scrapy

    Scrapy, a fast high-level web crawling & scraping framework for Python.

  • Scrapy is an open-source, Python-based web scraping framework for extracting data from websites. With Scrapy, you write spiders: self-contained classes that define how to download and process web content. Scrapy's main limitation is that it does not handle JavaScript-rendered websites well, because it was designed for static HTML pages. We compare the two on this point later in the article.

  • scrapy-playwright

    🎭 Playwright integration for Scrapy

  • Scrapy does not support headless browsers natively, but its plugin system adds that capability. Likewise, it cannot scrape JavaScript-rendered websites out of the box, but plugins make this possible. One of the best examples is the Playwright plugin, scrapy-playwright.

  • apify-sdk-js

    Apify SDK monorepo

  • Crawlee is also an open-source library; it originated as the Apify SDK. As the newer of the two libraries, Crawlee already has many features that Scrapy lacks out of the box, such as autoscaling, headless browsing, and scraping JavaScript-rendered websites without plugins. We explain these later on.

  • puppeteer

    Node.js API for Chrome

  • In Crawlee, you can scrape JavaScript-rendered websites using its built-in support for headless Puppeteer and Playwright browsers. Note that Crawlee runs browsers in headless mode by default; to see the browser window, set headless: false.

  • Playwright

    Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

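Scrapy's limitation with JavaScript-rendered websites comes down to where the data lives. A plain HTTP client receives the HTML the server sends, before any JavaScript runs; on a client-side-rendered page, the content only appears after a browser executes the page's scripts. The minimal sketch below shows this with nothing but Python's standard html.parser. Both HTML strings are hypothetical: RAW_HTML stands in for a server response, and RENDERED_HTML is a simplified stand-in for the DOM a headless browser would produce after running the script. The ListItemCollector and extract_items helpers are illustrations, not part of Scrapy or Crawlee.

```python
from html.parser import HTMLParser

# Hypothetical server response for a client-side-rendered page: the list is
# empty until a browser runs the <script>, which fills it in.
RAW_HTML = """
<html>
  <body>
    <h1>Products</h1>
    <ul id="products"></ul>
    <script>
      document.getElementById("products").innerHTML =
        "<li>Widget</li><li>Gadget</li>";
    </script>
  </body>
</html>
"""

# Simplified stand-in for the DOM after a (headless) browser ran the script.
RENDERED_HTML = """
<html>
  <body>
    <h1>Products</h1>
    <ul id="products"><li>Widget</li><li>Gadget</li></ul>
  </body>
</html>
"""


class ListItemCollector(HTMLParser):
    """Collects the text of every <li> element the parser sees as markup."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        # Script contents arrive here as raw text, never as tags, so the
        # <li> strings inside the JavaScript are not collected.
        if self._in_li:
            self.items[-1] += data.strip()


def extract_items(html):
    parser = ListItemCollector()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    print(extract_items(RAW_HTML))       # → []
    print(extract_items(RENDERED_HTML))  # → ['Widget', 'Gadget']
```

A static-HTML parser working on RAW_HTML finds nothing, which is why such pages need a real browser, either through a plugin such as scrapy-playwright or through Crawlee's Puppeteer and Playwright support.
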
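To make the scrapy-playwright plugin more concrete, here is a sketch of the project settings it typically needs. The setting names follow the scrapy-playwright README; treat them as assumptions to check against the version you install. The settings route HTTP and HTTPS downloads through Playwright, switch Twisted to the asyncio reactor the plugin requires, and pass launch options to Playwright, including the headless flag.

```python
# settings.py (sketch): route Scrapy downloads through scrapy-playwright.
# Setting names are taken from the scrapy-playwright README; verify them
# against the plugin version you actually install.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright runs on asyncio, so Twisted must use the asyncio reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Browser engine for Playwright to launch.
PLAYWRIGHT_BROWSER_TYPE = "chromium"

# Playwright launches headless by default; False opens a visible window,
# which is handy while debugging a spider.
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": False}
```

With these settings, only requests created with meta={"playwright": True}, for example scrapy.Request(url, meta={"playwright": True}), are rendered in the browser; all other requests still go through Scrapy's regular download handler.
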
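Autoscaling, one of the Crawlee features listed above, means adjusting how many requests run in parallel to match the machine's current load. The sketch below is a hypothetical, simplified controller written only to illustrate the idea; it is not Crawlee's own implementation. It raises concurrency by a small step while the system is healthy and lowers it when an overload flag is set. The function names, the 5% step ratio, and the concurrency bounds are all assumptions.

```python
import math


def next_concurrency(current, overloaded, min_concurrency=1,
                     max_concurrency=50, step_ratio=0.05):
    """Return the next concurrency level for a hypothetical autoscaler.

    Steps down by a fraction of the current level when overloaded and steps
    up by the same fraction otherwise (at least 1), clamped to the bounds.
    """
    step = max(1, math.ceil(current * step_ratio))
    if overloaded:
        return max(min_concurrency, current - step)
    return min(max_concurrency, current + step)


def simulate(load_signals, start=1, **kwargs):
    """Trace the concurrency level over a sequence of overload flags."""
    levels = [start]
    for overloaded in load_signals:
        levels.append(next_concurrency(levels[-1], overloaded, **kwargs))
    return levels


if __name__ == "__main__":
    # Three healthy ticks followed by two overloaded ones.
    print(simulate([False, False, False, True, True], start=4))
    # → [4, 5, 6, 7, 6, 5]
```

Here concurrency climbs from 4 to 7 over three healthy ticks, then backs off to 5 after two overloaded ones. A real autoscaler would derive the overload signal from measurements such as CPU and memory usage instead of taking it as input.
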