38% of webpages that existed in 2013 are no longer accessible a decade later

Scout Monitoring - Free Django app performance insights with Scout Monitoring

Get Scout set up in minutes, and let us sweat the small stuff. A couple of lines in settings.py are all you need to start monitoring your apps. Sign up for our free tier today.

www.scoutapm.com

featured

InfluxDB - Power Real-Time Data Analytics at Scale

Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

www.influxdata.com

featured

ArchiveBox

Mentions: 250 · Stars: 20,023 · Activity: 9.8 · Language: Python

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

There's also https://archivebox.io, which can take your bookmarks and archive them in many ways. Unfortunately, when I last tried it, it was a bit buggy. I wish there were a better solution for building a nice archive of the sites I visit most often, just in case.

john.soban.ski

Mentions: 1 · Stars: 4 · Activity: 6.7 · Language: Python

Pelican documents for https://john.soban.ski

I post my site's Markdown content to an open Git repo for this reason. Anyone can pull it and build my pages. I think Git will stay around for at least another 100 years. https://github.com/hatdropper1977/john.soban.ski

Internet-Places-Database

Mentions: 17 · Stars: 27 · Activity: 9.3

Database of Internet places, mostly domains.

Attention is limited. We cannot see everything on the Internet; we do not have enough time for that.
There is a lot of valuable and interesting data on the Internet, but it is not visible. A high-quality, low-profile blog that stopped being updated in 2015 will certainly not rank high in Google.
Media platforms and search engines monetize content. YouTube channels need to churn out new content every week or so to stay relevant and watchable.
Our society produces content, not quality, not products.
SEO can be gamed, so it is impossible to create an objective index of valuable content. Bad actors will hack the game, spam results, and destroy quality for profit.
Google most often connects users with media sites, news sites, the middlemen. More often than not, it does not connect users with products directly. Type "search engine" as a query and you may find not only search engines but also articles like "Best search engines in 2024" or "best SEO tricks to boost your page".
Google does not have any incentive to fix this. Search engines are dead tech. They will be replaced by chatbots in a few years. People will not search for content; content will be generated on demand.
Some time ago I created my own repository of domain names: https://github.com/rumca-js/Internet-Places-Database
I wanted to find pages related to "wargames". It is nearly impossible to find anything interesting about Warhammer on the normie Internet (not Facebook).
I also could not find anything "amiga"-related.
This solved my initial problem. I also found out that many interesting pages are gone. I think Google directing our attention toward "content" broke good-quality pages.
Right now I use Google less and less, because I rely more and more on my bookmark manager.
https://github.com/rumca-js/Django-link-archive
My solutions may not be as complex as Common Crawl, but they are enough for me, for now. I am still working on my program. It has been a fun and interesting experience, and I learned a lot: about the Open Graph protocol, about schema, about web scraping, and so on. Maybe this will inspire people to be more self-sufficient and to self-host more.
In times of walled gardens, we need more standards and more open data to keep what remains of the old wild west of the Internet.
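
The comment mentions learning about the Open Graph protocol while building a link archive. As a rough illustration only (this is not code from Django-link-archive), a page's Open Graph metadata can be pulled out with Python's standard-library `html.parser`. `OpenGraphParser` and `extract_open_graph` are hypothetical names, and the sketch assumes the page uses the usual `<meta property="og:..." content="...">` form.

```python
from __future__ import annotations

from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property="og:..." content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        # Keep only the first value seen for each og: property.
        if prop.startswith("og:") and content is not None:
            self.properties.setdefault(prop, content)


def extract_open_graph(html: str) -> dict[str, str]:
    """Return a mapping such as {"og:title": ..., "og:url": ...} for one page."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


page = (
    '<html><head><meta property="og:title" content="Link rot">'
    '<meta property="og:url" content="https://example.com/"></head></html>'
)
print(extract_open_graph(page))  # {'og:title': 'Link rot', 'og:url': 'https://example.com/'}
```

Storing these few fields alongside each bookmark gives an archive a title and canonical URL even after the original page disappears.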

Django-link-archive

Mentions: 14 · Stars: 14 · Activity: 9.6 · Language: Python

Link archive for a NAS drive

linkrot

Mentions: 4 · Stars: 14 · Activity: 5.1 · Language: Go

Linkrot checks for broken links on a given website
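
linkrot itself is written in Go, and its internals are not described here. As a hedged sketch of the general idea behind a broken-link checker, the Python below uses only the standard library. `check_link` and `find_dead_links` are hypothetical names, and the sketch assumes a HEAD request is a good-enough liveness probe, retrying with GET when a server answers 405 or 501 because it does not support HEAD.

```python
from __future__ import annotations

import urllib.error
import urllib.request

USER_AGENT = "link-check-sketch/0.1"  # hypothetical identifier


def _fetch_status(url: str, method: str, timeout: float) -> int:
    """Send one request and return the final HTTP status (redirects are followed)."""
    request = urllib.request.Request(url, method=method, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def check_link(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Return (alive, detail); detail is a status code or an error message."""
    try:
        try:
            return True, str(_fetch_status(url, "HEAD", timeout))
        except urllib.error.HTTPError as err:
            if err.code not in (405, 501):
                raise
            # The server does not support HEAD; retry the same URL with GET.
            return True, str(_fetch_status(url, "GET", timeout))
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 410.
        return False, str(err.code)
    except OSError as err:
        # DNS failures, refused connections and timeouts (URLError is an OSError).
        return False, str(err)


def find_dead_links(urls: list[str], timeout: float = 10.0) -> dict[str, str]:
    """Map each unreachable URL to the reason it failed."""
    dead: dict[str, str] = {}
    for url in urls:
        alive, detail = check_link(url, timeout)
        if not alive:
            dead[url] = detail
    return dead
```

A single failed request does not prove a page is gone for good, so a real checker would likely add retries and rate limiting before reporting a link as rotten.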

NOTE: The number of mentions on this list indicates mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Any options available to organize and save (may be) reddit saved posts?
4 projects | /r/selfhosted | 20 Apr 2023

Two servers on the same machine?
2 projects | /r/jellyfin | 25 Oct 2022

So many feed readers, so many behaviors
4 projects | news.ycombinator.com | 28 May 2024

RSS and why I believe most people should come back using them
2 projects | news.ycombinator.com | 15 May 2024

A self-hosted dashboard that puts all your feeds in one place
1 project | news.ycombinator.com | 10 May 2024

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Tags: RSS · Archiving and Digital Preservation (DP) · Golang · Aggregator · Pocket
Post date: 18 May 2024
