WikiReader

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • Wikireader-Adventures

    Developing useless widgets in Forth with the Wikireader.

  • I've written a few applications for WikiReaders using the built-in Forth interpreter: https://github.com/JohnEarnest/Wikireader-Adventures

  • libzim

    Reference implementation of the ZIM specification

  • I meant the Kiwix dump (https://download.kiwix.org/zim/wikipedia_en_all_nopic.zim – careful, 60GB!).

    At first glance, the Wikimedia XML dump does not look substantially different from what Kiwix/ZIM does with compressed HTML: both are compressed (bz2 for the Wikimedia dump, zstd or LZMA for Kiwix/ZIM), and both compress multiple articles together, so redundancy between articles should hopefully be significantly reduced.

    HTML seems a bit more verbose than the Mediawiki syntax (plus the XML header for each article), but I'd be surprised if that actually accounted for a 3x difference in size.

    Then again, Kiwix seems to have experimented with shared-dictionary Brotli compression, which supposedly yields a >2x improvement: https://github.com/openzim/libzim/issues/144

    I wonder if their current zstd implementation also uses shared dictionaries. If not, that might just be the reason: If ZIM compression chunks are much smaller than the bz2 streams of the Wikimedia dumps, there would still be a lot of redundancy between chunks.

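The point in the libzim comment about chunk size and shared dictionaries can be illustrated with a rough sketch. It uses Python's standard-library zlib (DEFLATE) as a stand-in for zstd, LZMA, or Brotli, which is an assumption made only to keep the example dependency-free; it is not how libzim itself compresses. The synthetic articles, the helper names, and the choice of the first article as the dictionary are all hypothetical.

```python
import zlib


def make_articles(n=200):
    """Build n small synthetic 'articles' that share HTML boilerplate (made-up data)."""
    template = (
        "<html><head><title>{title}</title></head><body>"
        "<div class='mw-parser-output'><p>{body}</p></div>"
        "<div class='footer'>Text is available under a free license.</div>"
        "</body></html>"
    )
    return [
        template.format(
            title=f"Article {i}",
            body=f"Entry number {i} discusses topic {i * 7919 % 1000}.",
        ).encode()
        for i in range(n)
    ]


def size_independent(chunks):
    """Total size when every chunk is compressed on its own (tiny chunks)."""
    return sum(len(zlib.compress(c, 9)) for c in chunks)


def size_single_stream(chunks):
    """Size when all chunks are compressed together as one stream."""
    return len(zlib.compress(b"".join(chunks), 9))


def size_with_dictionary(chunks, zdict):
    """Total size when each chunk is compressed alone but with a preset dictionary."""
    total = 0
    for c in chunks:
        comp = zlib.compressobj(level=9, zdict=zdict)
        total += len(comp.compress(c) + comp.flush())
    return total


def roundtrip_with_dictionary(chunk, zdict):
    """Compress and decompress one chunk; the same dictionary is needed on both sides."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    data = comp.compress(chunk) + comp.flush()
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(data) + decomp.flush()


if __name__ == "__main__":
    articles = make_articles()
    shared = articles[0]  # hypothetical dictionary: one representative article
    print("independent chunks:", size_independent(articles))
    print("single stream:     ", size_single_stream(articles))
    print("shared dictionary: ", size_with_dictionary(articles, shared))
```

On data like this, independently compressed small chunks come out much larger than either a single stream or dictionary-assisted chunks, which is the effect the comment speculates about. Real Wikipedia articles are far less uniform, so the actual ratios will differ.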
NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Libzim now has an official WebAssembly build target... why this is big (for us)!

    1 project | /r/Kiwix | 8 Dec 2022
  • Seeking help

    1 project | /r/Kiwix | 28 May 2023
  • Recent Wiktionary ZIM files don't show a search bar

    3 projects | /r/Kiwix | 25 Apr 2023
  • A new version of Wikipedia_en_all_maxi is available! (link below)

    1 project | /r/Kiwix | 20 Feb 2023
  • How to serve content on website over open WiFi for neighborhood ?

    1 project | /r/computerhelp | 8 Nov 2022