-
ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
There's also https://archivebox.io, which can take your bookmarks and archive them in many ways. Unfortunately, when I last tried it, it was a bit buggy. I wish there were a better way to build a nice archive of the sites I visit most often, just in case.
I post my site's Markdown content to an open Git repo for this reason. Anyone can pull it and build my pages. I think Git should stay around for at least another 100 years. https://github.com/hatdropper1977/john.soban.ski
Attention is limited. We cannot see everything on the Internet. We do not have enough time for that.
There is a lot of valuable and interesting data on the Internet, but it is not visible. A high-quality, low-profile blog that stopped being updated in 2015 will certainly not rank high in Google.
Media platforms and search engines monetize content. YouTube channels need to churn out new content every week or so to stay relevant and watchable.
Our society produces content, not quality, not products.
SEO can be gamed, so it is impossible to create an objective index of valuable content. Bad actors will game the system, spam the results, and destroy quality for profit.
Google most often connects users with media sites, with news sites, with the middlemen. More often than not, it does not connect users with the product directly. Type "search engine" into the search box and you may find not only search engines but also articles such as "Best search engines in 2024" or "best SEO tricks to boost your page".
Google does not have any incentive to fix this. Search engines are dead tech. They will be replaced by chatbots in a few years. People will not search for content; content will be generated on demand.
Some time ago I created my own repository of domain names: https://github.com/rumca-js/Internet-Places-Database
I wanted to find "wargames"-related pages. It is nearly impossible to find anything interesting about Warhammer on the normie internet (not Facebook).
The second thing is that I cannot find anything "Amiga"-related.
The repository solved my initial problem. I have also found out that many interesting pages are gone. I think that Google directing our attention toward "content" broke good-quality pages.
Right now I use Google less and less, because I use my bookmark manager more and more.
https://github.com/rumca-js/Django-link-archive
My solutions may not be as complex as Common Crawl, but they are enough for me, for now. I am still working on my program. It has been a fun and interesting experience, and I have learned a lot: about the Open Graph protocol, about schema, about web scraping, and so on. Maybe this will inspire people to be more self-sufficient and to self-host more.
In times of walled gardens we need more standards and more open data to keep what remains of the old wild west of the Internet.
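For anyone curious what reading Open Graph data from a page looks like, here is a minimal sketch using only the Python standard library. It is not taken from Django-link-archive; the names `OpenGraphParser`, `extract_open_graph` and `SAMPLE_PAGE` are made up for illustration, and the sample HTML is invented. It assumes the page exposes its metadata as `<meta property="og:..." content="...">` tags, which is how the Open Graph protocol describes them.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        name = attributes.get("property") or ""
        content = attributes.get("content")
        if name.startswith("og:") and content is not None:
            # Keep only the first value seen for each property.
            self.properties.setdefault(name, content)


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in an HTML string."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Invented sample page, used only to demonstrate the parser.
SAMPLE_PAGE = """
<html><head>
<title>Example page</title>
<meta name="description" content="Not an Open Graph tag">
<meta property="og:title" content="Amiga demo scene archive">
<meta property="og:type" content="website">
<meta property="og:url" content="https://example.com/amiga">
</head><body></body></html>
"""

if __name__ == "__main__":
    for key, value in extract_open_graph(SAMPLE_PAGE).items():
        print(f"{key}: {value}")
```

A real bookmark manager would also have to fetch the page, deal with character encodings, and fall back to `<title>` when a site has no Open Graph tags. This sketch skips all of that.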