Webrecorder: Capture interactive websites and replay them at a later time

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • browsertrix-crawler

    Run a high-fidelity browser-based crawler in a single Docker container

  • (Disclaimer: I work at Webrecorder)

    Our automated crawler browsertrix-crawler (https://github.com/webrecorder/browsertrix-crawler) uses Puppeteer to drive the browsers we archive in: it loads pages, runs behaviors such as auto-scroll, and records the request/response traffic. We have custom behaviors for some social media and video sites to make sure that content is captured appropriately. It is a bit of a cat-and-mouse game, as we have to keep updating these behaviors as sites change, but for the most part it works pretty well.

    The trickier part is replaying the archived websites, as a certain amount of rewriting has to happen to make sure the HTML and JS work with archived assets rather than the live web. One implementation of this is replayweb.page (https://github.com/webrecorder/replayweb.page), which does all of the rewriting client-side in the browser. This lets you interact with archived websites in WARC or WACZ format as if you were interacting with the original site.
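
To make the rewriting idea in the comment above concrete, here is a minimal sketch of pointing a page's `href` and `src` attributes at archived copies instead of the live web. It is not how replayweb.page works: that tool rewrites client-side in the browser and also has to handle JS, CSS, and more. The `/replay` prefix, the `timestamp/url` layout, and the function name are assumptions made up for illustration.

```python
import re
from urllib.parse import urljoin

# Hypothetical replay prefix; real replay systems define their own URL schemes.
REPLAY_PREFIX = "/replay"

# Matches href="..." or src="..." with either double or single quotes.
_ATTR_RE = re.compile(r'''\b(href|src)=(["'])(.*?)\2''', re.IGNORECASE)


def rewrite_html(html: str, page_url: str, timestamp: str) -> str:
    """Rewrite href/src attributes so they resolve inside the archive."""

    def _rewrite(match: re.Match) -> str:
        attr, quote, value = match.group(1), match.group(2), match.group(3)
        # Leave fragments and non-HTTP schemes untouched.
        if value.startswith(("#", "data:", "javascript:", "mailto:")):
            return match.group(0)
        # Resolve relative links against the URL the page was captured from.
        absolute = urljoin(page_url, value)
        return f"{attr}={quote}{REPLAY_PREFIX}/{timestamp}/{absolute}{quote}"

    return _ATTR_RE.sub(_rewrite, html)
```

For example, `rewrite_html('<a href="/about">x</a>', "https://example.com/index.html", "20220927000000")` turns the link into `/replay/20220927000000/https://example.com/about`. A regex is enough for a sketch, but a real rewriter would need a proper HTML parser and would also have to intercept URLs built at runtime by JavaScript.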

  • replayweb.page

    Serverless replay of web archives directly in the browser

  • archiveweb.page-site

    The ArchiveWeb.page Site

  • This is actually an issue with their docs that I encountered a few weeks ago when I was first experimenting with this tool. They apparently added a Spanish-language version of the docs, including an associated extra directory tree in the URL, but they failed to set up redirects or even update the existing links in the documentation.

    So those two pages are actually located at https://archiveweb.page/en/troubleshooting/errors/ and https://archiveweb.page/en/contact/ respectively.

    It looks like their docs site is open source at https://github.com/webrecorder/archiveweb.page-site, so I may try and send a pull request later today to go ahead and correct those links, and possibly also try and deploy some redirects to fix any existing links.
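
The redirect fix described in the comment above comes down to mapping each old, unprefixed docs path to its counterpart under a language directory. A minimal sketch of that mapping follows. It assumes the old links were the same paths without the `/en/` segment and that the Spanish tree lives under `es`. Neither detail is confirmed in the comment, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the docs have an English tree and a Spanish tree.
KNOWN_LANGUAGES = ("en", "es")
DEFAULT_LANGUAGE = "en"


def redirect_target(url: str) -> str:
    """Return the language-prefixed URL for an old, unprefixed docs link."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Links that already start with a language directory need no redirect.
    if segments and segments[0] in KNOWN_LANGUAGES:
        return url
    new_path = "/" + "/".join([DEFAULT_LANGUAGE, *segments]) + "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
    )
```

With these assumptions, `redirect_target("https://archiveweb.page/contact/")` gives `https://archiveweb.page/en/contact/`. The sketch always appends a trailing slash, so it only suits directory-style page URLs, not static assets such as stylesheets or images.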

  • archiveweb.page

    A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!

  • Playwright

    Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

  • See: https://github.com/microsoft/playwright/issues/6319

  • readability

    A standalone version of the readability lib

  • I wonder if Firefox's "reader mode as a utility" might be a viable alternative for Pinboard-like, "content-oriented" archiving?

    https://github.com/mozilla/readability

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • r18 database of metadata

    3 projects | /r/DataHoarder | 30 Sep 2022
  • Ask HN: What is going on at archive.ph?

    2 projects | news.ycombinator.com | 27 Sep 2022
  • "scrape" a javascript object from a website?

    1 project | /r/webscraping | 9 Sep 2022
  • Archiveweb.page – A High-Fidelity Web Archiving Extension for Chromium Browsers

    1 project | /r/CKsTechNews | 9 Oct 2021
  • Archiveweb.page – A High-Fidelity Web Archiving Extension for Chromium Browsers

    1 project | news.ycombinator.com | 9 Oct 2021