Content Parser – Extract Markdown, HTML or text from content-heavy websites

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • readability

    A standalone version of the readability lib

  • Readability (https://github.com/mozilla/readability) to strip the page's HTML down to a bare minimum.

  • to-markdown (since renamed turndown)

    🛏 An HTML to Markdown converter written in JavaScript

  • puppeteer

    Node.js API for Chrome

  • Puppeteer (https://github.com/puppeteer/puppeteer) to download the page.

    It costs me only a few cents to parse an entire page, and I think OP could make some money from this if they get the pricing right.

    Some unsolicited feedback on the API:

  • clippy

    Open-source command-line web clipper (by benprew)

  • I wrote something similar so I could save recipes and web pages for offline reading. If you save as HTML, it inlines images so you get a single file; in Markdown, it just links to the images.

    It also uses turndown and readability.

    It's pretty finicky: Readability doesn't always identify the correct content, and it sometimes misses pieces of it. If you want to charge for it, you'd have to fix some of those edge cases.

    Also, I don't think the value in this product is turning web pages into Markdown; there are many free web clippers and archive sites that already do this. I see it more as an "extra" in a product, like how Evernote has a web clipper built into its note-taking product.

    Also, it's cool to see other people care about a stripped down web reading experience too!

    https://github.com/benprew/clippy
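The clippy comment says that saving as HTML inlines images so the page becomes a single file, while Markdown output only links to them. A common way to inline images is to replace each image URL with a base64 data: URI. The following is a minimal Python sketch of that idea under stated assumptions, not clippy's actual implementation: inline_images and the fetch callback are hypothetical names, and the regex is a simplification standing in for a proper HTML parser.

```python
import base64
import re
from typing import Callable, Tuple

# Matches the src attribute of an <img> tag, quoted with " or '. A regex is a
# simplification for illustration; a real clipper would walk a parsed DOM.
IMG_SRC = re.compile(r"""(<img\b[^>]*?\ssrc=)(["'])(.*?)\2""", re.IGNORECASE)


def inline_images(html: str, fetch: Callable[[str], Tuple[bytes, str]]) -> str:
    """Rewrite every <img src="URL"> as a base64 data: URI so the page is one file.

    `fetch` is a hypothetical callback that returns (image_bytes, mime_type)
    for a URL; how the bytes are downloaded is left to the caller.
    """

    def to_data_uri(match: "re.Match[str]") -> str:
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        if url.startswith("data:"):
            return match.group(0)  # already inlined, leave untouched
        data, mime = fetch(url)
        encoded = base64.b64encode(data).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{encoded}{quote}"

    return IMG_SRC.sub(to_data_uri, html)


if __name__ == "__main__":
    # Fake in-memory "downloads" so the example runs offline.
    store = {"https://example.com/cake.png": (b"\x89PNG", "image/png")}
    page = '<p>Recipe</p><img alt="cake" src="https://example.com/cake.png">'
    print(inline_images(page, lambda url: store[url]))
    # prints <p>Recipe</p><img alt="cake" src="data:image/png;base64,iVBORw==">
```

The trade-off matches the comment: the HTML file grows by roughly a third of each image's size because of base64 encoding, but it no longer depends on the original image URLs staying online, whereas a Markdown export that keeps plain links stays small but breaks if the images disappear.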

Related posts

  • The Puppeteer Language Experiment

    3 projects | dev.to | 21 May 2024
  • Hacking out an AI spider with Node

    1 project | dev.to | 11 May 2024
  • Learn Automated Testing At Home: A Beginner's Guide

    4 projects | dev.to | 4 Apr 2024
  • Show HN: Quetta – A privacy-first web browser with enhanced ad blocker inside

    2 projects | news.ycombinator.com | 18 Jan 2024
  • How To Enable Hardware Acceleration on Chrome, Chromium & Puppeteer on AWS in Headless mode

    4 projects | dev.to | 25 Oct 2023