38% of webpages that existed in 2013 are no longer accessible a decade later

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • ArchiveBox

    🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

  • There's also https://archivebox.io, which can take your bookmarks and archive them in many ways. Unfortunately, when I last tried it, it was a bit buggy. I wish there were a better solution for building a nice archive of the sites I visit most often, just in case.

  • john.soban.ski

    Pelican documents for https://john.soban.ski

  • I post my site's Markdown content to an open Git repo for this reason. Anyone can pull and build my pages. I think Git should stay around for at least another 100 years. https://github.com/hatdropper1977/john.soban.ski

  • Internet-Places-Database

    Database of Internet places. Mostly domains

  • Attention is limited. We cannot see everything on the Internet. We do not have enough time for that.

    There is a lot of valuable and interesting data on the Internet, but it is not visible. A high-quality, low-profile blog that stopped being updated in 2015 will certainly not rank high in Google.

    Media platforms and search engines monetize content. YouTube channels need to churn out new content every week or so to stay relevant and watchable.

    Our society produces content, not quality, not products.

    SEO can be gamed; it is impossible to create an objective index of valuable content. Bad actors will game the system, spam results, and destroy quality for profit.

    Google's search engine most often connects users with media sites, news sites, and other middlemen. More often than not, it does not connect users with the product directly. Type "search engine" into the search box and you may find not only search engines but also articles such as "Best search engines in 2024" or "best SEO tricks to boost your page".

    Google does not have any incentive to fix this. Search engines are dead tech. They will be replaced by chatbots in a few years. People will not search for content; content will be generated on demand.

    Some time ago I created my own repository of domain names: https://github.com/rumca-js/Internet-Places-Database

    I wanted to find "wargames"-related pages. It is nearly impossible to find anything interesting about Warhammer on the normie internet (not Facebook).

    The second thing is that I could not find anything "Amiga"-related.

    The repository solved my initial problem. I have also found out that many interesting pages are gone. I think that Google directing our attention toward "content" broke good-quality pages.

    Right now I use Google less and less, because I use my bookmark manager more and more.

    https://github.com/rumca-js/Django-link-archive

    My solutions may not be as complex as Common Crawl, but they are enough for me, for now. I am still working on my program. It has been a fun and interesting experience, and I learned a lot: about the Open Graph protocol, about schema, about web scraping, and so on. Maybe this will inspire people to be more self-sufficient and to self-host more.

    In times of walled gardens we need more standards and more open data to keep what remains of the old wild west of the Internet.

  • linkrot

    Linkrot checks for broken links on a given website
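
For readers curious what checking a page for broken links involves, here is a minimal sketch using only the Python standard library. It is not how the linkrot project is implemented; the function names (`extract_links`, `http_status`, `find_broken_links`) and the rule that a status of 400 or above counts as broken are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import urllib.error
import urllib.request


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) URLs found in the page, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    absolute = []
    for href in parser.links:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        if urlparse(url).scheme in ("http", "https"):  # skip mailto:, javascript:, etc.
            absolute.append(url)
    return absolute


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...


def find_broken_links(html, base_url, fetch_status=http_status):
    """Return (url, status) pairs for links that look broken (assumed: status >= 400 or unreachable)."""
    broken = []
    for url in extract_links(html, base_url):
        status = fetch_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

`fetch_status` is injectable so the logic can be exercised without network access. Some servers reject `HEAD` requests, so a real checker would probably fall back to `GET` before declaring a link dead.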

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Any options available to organize and save (may be) reddit saved posts?

    4 projects | /r/selfhosted | 20 Apr 2023
  • Two servers on the same machine?

    2 projects | /r/jellyfin | 25 Oct 2022
  • So many feed readers, so many behaviors

    4 projects | news.ycombinator.com | 28 May 2024
  • RSS and why I believe most people should come back using them

    2 projects | news.ycombinator.com | 15 May 2024
  • A self-hosted dashboard that puts all your feeds in one place

    1 project | news.ycombinator.com | 10 May 2024