Downloading All of Wikipedia

March 25, 2022

Why?

I recently came across this article explaining why people in Russia are downloading copies of Wikipedia “just in case”.

I thought it might be a good idea to capture all of the information on Wikipedia because... gestures broadly around me, sort of like a digital prepper.

Also, this is more of an exercise in “why not?” than anything else. I’ll probably delete the 87GB when this 14TB drive gets close to running out of space.

Try It Out:

Go to https://wikipedia.michaellunzer.com to see what it's like to browse my offline copy of Wikipedia.

How:

This project leans heavily on the following page, which provides a docker-compose file and instructions for spinning up the Kiwix server in a container:

https://thehomelab.wiki/books/docker/page/setup-and-install-kiwix-serve-on-debian-systems

The Homelab Wiki does a great job explaining each .zim file's contents:

What do mini, nopic and maxi mean in the Wikipedia zim files?

File size is always an issue when downloading such big content, so Kiwix produces each Wikipedia file in three flavours:

  • Mini: only the introduction of each article, plus the infobox. Saves about 95% of space vs. the full version.
  • nopic: full articles, but no images. About 75% smaller than the full version.
  • Maxi: the default full version.

Setup comes down to three steps:

  1. cd into the data directory mapped into the Docker container and download the zim files with wget.
  2. Add the name of each zim file to the command section of your stack.
  3. Start or deploy the docker stack and enjoy!
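
As a rough illustration, the download step might look like the shell sketch below. The helper names zim_url and download_zim are hypothetical, and the file name in the example is borrowed from my compose file further down. Older files are eventually removed from the server, so check the listing for current names before copying it.

```shell
#!/bin/sh
# Sketch only: zim_url and download_zim are hypothetical helper names.
ZIM_BASE_URL="http://download.kiwix.org/zim/wikipedia"

# Print the download URL for a given zim file name.
zim_url() {
    printf '%s/%s\n' "$ZIM_BASE_URL" "$1"
}

# cd into the mapped data directory ($1) and fetch one zim file ($2).
# The subshell keeps the caller's working directory unchanged, and
# --continue lets wget resume a partially downloaded file.
download_zim() {
    (
        cd "$1" || exit 1
        wget --continue "$(zim_url "$2")"
    )
}

# Example (not run automatically, since these downloads are large):
# download_zim /srv/dev-disk-by-label-Atlantic14TB/wikipedia wikipedia_en_top_mini_2022-02.zim
```

Resuming matters here: a dropped connection partway through a file this size would otherwise mean starting over.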

Zim Files:

All of Wikipedia is contained in a single .zim file. Browse the available files here:

http://download.kiwix.org/zim/wikipedia/

Please note that the zim file containing all of Wikipedia was last updated in December 2021. I’m not sure what the update cadence is, but I’ll check back in a few months and update my copy accordingly. I also don’t know whether there is a way to fetch only the incremental changes rather than downloading the entire file again.
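
One low-tech way to check for a newer file is to scan the directory listing for dated file names and pick the most recent one. This is a minimal sketch, assuming the listing contains names of the same form as the ones in my compose file (a prefix followed by _YYYY-MM.zim). latest_zim is a hypothetical helper that reads the listing on standard input.

```shell
#!/bin/sh
# Sketch only: latest_zim is a hypothetical helper name.
# Reads an HTML directory listing on stdin and prints the newest file
# name that starts with the given prefix ($1). Because the date part is
# YYYY-MM, a plain lexical sort puts the newest file last.
latest_zim() {
    grep -o "${1}_[0-9]\{4\}-[0-9]\{2\}\.zim" | sort -u | tail -n 1
}

# Example (needs network access, so it is left commented out):
# curl -sL http://download.kiwix.org/zim/wikipedia/ | latest_zim wikipedia_en_all_maxi
```

If the name printed is newer than the one on disk, it is time to download again.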

Docker Compose:

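version: "3.9"
services:
  kiwix-serve:
    image: kiwix/kiwix-serve
    volumes:
      # Host directory holding the .zim files, mounted at /data in the container
      - /srv/dev-disk-by-label-Atlantic14TB/wikipedia:/data
    ports:
      # Host port 8411 -> container port 80
      - '8411:80'
    # Zim files to serve (file names inside the mounted data directory)
    command:
      wikipedia_en_top_mini_2022-02.zim
      wikipedia_en_all_maxi_2021-12.zim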
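
Before starting the stack, it can be worth confirming that every file named under command is actually present in the data directory. This is a small sketch: check_zims is a hypothetical helper, and the docker-compose call in the comment assumes the compose file sits in the current directory.

```shell
#!/bin/sh
# Sketch only: check_zims is a hypothetical helper name.
# Usage: check_zims DATA_DIR FILE...
# Returns 0 if every FILE exists in DATA_DIR; otherwise reports each
# missing file on stderr and returns 1.
check_zims() {
    dir=$1
    shift
    missing=0
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (left commented out; paths and names come from the compose file):
# check_zims /srv/dev-disk-by-label-Atlantic14TB/wikipedia \
#     wikipedia_en_top_mini_2022-02.zim wikipedia_en_all_maxi_2021-12.zim \
#     && docker-compose up -d
```

With the port mapping above, the server should then answer on port 8411 of the host.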