Debian Server Setup

Jason Sherman edited this page Aug 15, 2025 · 12 revisions

Setting up Wikilink (externallinks) on a Wikimedia Debian server, using staging as an example.

  1. Log into Horizon and create a new instance with image: Debian and flavour: g3.cores4.ram8.disk20
  2. Attach the wikilink-backup and docker-data-root volumes to the instance
  3. Create a web proxy to enable HTTP access to the instance.
  4. Shell into the instance
  5. Create a swap file:
    1. fallocate -l 8G /swapfile
    2. chmod 600 /swapfile
    3. mkswap /swapfile
    4. swapon /swapfile
    5. echo "/swapfile none swap sw 0 0">>/etc/fstab
  6. Run the Cinder configuration scripts to mount the volumes. See https://wikitech.wikimedia.org/wiki/Help:Adding_Disk_Space_to_Cloud_VPS_instances#Cinder:_Attachable_Block_Storage_for_cloud-vps for more information.
    1. mount docker-data-root to /usr/local/docker-data
    2. mount wikilink-backup to /usr/local/backup
  7. Install docker (https://docs.docker.com/engine/install/debian)
  8. Create the file /etc/docker/daemon.json with the following content to configure Docker to use the data volume. The data-root path must be the directory where the docker-data-root volume is mounted.
{
  "data-root": "/mnt/docker-data"
}
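Before restarting Docker, it can help to confirm that the data-root in daemon.json points at a directory that actually exists. The following is a minimal sketch, not part of the original setup: the function names docker_data_root and check_data_root are hypothetical, and the sed expression assumes the simple one-key-per-line layout shown above rather than arbitrary JSON.

```shell
#!/usr/bin/env bash
# Print the "data-root" value from a Docker daemon.json file.
# Hypothetical helper: relies on sed, so it assumes one key per line.
docker_data_root() {
  local file="${1:-/etc/docker/daemon.json}"
  sed -n 's/^[[:space:]]*"data-root"[[:space:]]*:[[:space:]]*"\([^"]*\)".*$/\1/p' "$file"
}

# Succeed only if a data-root is configured and exists as a directory.
check_data_root() {
  local root
  root="$(docker_data_root "$1")"
  [ -n "$root" ] && [ -d "$root" ]
}
```

A possible use is `check_data_root /etc/docker/daemon.json && sudo systemctl restart docker`, since Docker reads daemon.json only when the daemon starts.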
  1. Install docker-compose (https://docs.docker.com/compose/install/) version 1.25.5. Release assets for 1.x versions are named with a capitalised OS name (Linux).
    sudo curl -SL https://github.com/docker/compose/releases/download/1.25.5/docker-compose-Linux-x86_64 -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose
    sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
    
    1. docker-compose versions newer than 1.25.5 have a bug in the config subcommand: it requires the cpus values under resources to be numbers, whereas Docker swarm requires them to be strings. This breaks our deployment.
    2. docker-compose versions newer than 1.27.1 have a bug in the config subcommand: it outputs depends_on elements as a dict instead of a list. This also breaks our deployment.
  2. Become root
    1. sudo su root
  3. Create a shared user to manage docker services.
    1. adduser wikilink --disabled-password --quiet ||:
    2. usermod -aG docker wikilink
  4. Clone the externallinks repo
    1. cd /srv
    2. git clone https://github.com/WikipediaLibrary/externallinks.git
  5. Check out appropriate branch.
    1. cd /srv/externallinks
    2. git checkout staging
  6. Copy template.env to .env, and edit to add real values.
    1. cp template.env .env
    2. vim .env
  7. Set permissions.
    1. chown -R wikilink:wikilink /srv/externallinks
  8. Become service user.
    1. su wikilink
  9. Create a swarm.
    1. docker swarm init
  10. Deploy. The docker-compose config command renders the compose file with the values from .env substituted, which is what allows .env to be used with swarm.
    1. cd /srv/externallinks
    2. docker stack deploy -c <(docker-compose config 2>/dev/null) staging
  11. Set up cron tasks
    1. crontab -e
    2. Enter the following:
# Run django_cron tasks every 5 minutes.
*/5 * * * *  docker exec -t $(docker ps -q -f name=staging_externallinks) python manage.py runcrons
# Check for and apply externallinks updates every 5 minutes.
*/5 * * * *  /srv/externallinks/bin/swarm_update.sh
# Prune containers and volumes weekly.
0 0 * * 0 docker system prune -a -f; docker volume rm $(docker volume ls -qf dangling=true)
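
Because the deploy step depends on docker-compose being no newer than 1.25.5, a version check can be run before deploying. The sketch below is an illustration only: the function names version_le and compose_version are hypothetical, `sort -V` assumes GNU coreutils (as shipped with Debian), and the parsed format assumes the 1.x output "docker-compose version X.Y.Z, build ...".

```shell
#!/usr/bin/env bash
# Return 0 if dotted version $1 is less than or equal to version $2.
# Uses GNU sort -V (version sort) to compare.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Read `docker-compose --version` output on stdin and print only the
# version number, e.g. "1.25.5".
compose_version() {
  sed -n 's/^docker-compose version \([0-9][0-9.]*\).*$/\1/p'
}

# Usage sketch (commented out so that sourcing this file has no side effects):
# v="$(docker-compose --version | compose_version)"
# version_le "$v" 1.25.5 || { echo "docker-compose $v is too new" >&2; exit 1; }
```

Placing such a check in front of `docker stack deploy` makes an accidental docker-compose upgrade fail loudly instead of producing a config that swarm rejects.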

Backup and restore

Backups should happen automatically every week in the production environment.

To back up manually, exec the backup script:

docker exec -it $(docker ps -aq -f name=staging_externallinks | head -n 1) bin/backup.sh

To restore a backup, exec the restore script with the desired backup as the first and only argument:

docker exec -it $(docker ps -aq -f name=staging_externallinks | head -n 1) bin/restore.sh backup/202009181532.tar.gz
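
When the most recent backup is the one to restore, its name can be looked up rather than typed by hand. This is a sketch under assumptions: the helper name latest_backup is hypothetical, and it assumes archives are named by timestamp (as in 202009181532.tar.gz) so that the last name in sorted order is the newest.

```shell
#!/usr/bin/env bash
# Print the newest backup archive in a directory (default: backup).
# Assumes timestamp-based names, so sorted order equals chronological order.
latest_backup() {
  local dir="${1:-backup}"
  local newest=""
  local f
  for f in "$dir"/*.tar.gz; do
    [ -e "$f" ] || continue   # no match: the glob pattern stays literal
    newest="$f"               # globs expand in sorted order; keep the last
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

The function returns a non-zero status when the directory holds no archives, so a caller can stop before invoking bin/restore.sh with an empty argument. It has to run where the backup directory is visible, for example inside the container.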

Some helpful commands:

  1. watch -n 30 docker logs -f $(docker ps -a -q -f name=<staging_migrate> | head -n 1) - show the logs of the first container matching the given service name, refreshed every 30 seconds (replace <staging_migrate> with the service name)
  2. docker ps - lists all running containers
  3. docker service ls - lists all services
  4. docker exec -it $(docker ps -aq -f name=staging_externallinks | head -n 1) bash -c "date --rfc-3339=seconds && echo 'select timestamp from links_linkevent order by timestamp desc limit 1;' | python manage.py dbshell" - show current timestamp and the latest event timestamp
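
Several of the commands above repeat the same `docker ps ... | head -n 1` lookup. A small wrapper can factor that out; the sketch below uses hypothetical function names (first_container, in_container) and omits the -it flags so that it also works from non-interactive contexts such as scripts.

```shell
#!/usr/bin/env bash
# Print the ID of the first container whose name matches the given filter.
first_container() {
  docker ps -aq -f "name=$1" | head -n 1
}

# Run a command in that container, failing clearly when none matches.
in_container() {
  local name="$1"; shift
  local id
  id="$(first_container "$name")"
  if [ -z "$id" ]; then
    echo "no container matching '$name'" >&2
    return 1
  fi
  docker exec "$id" "$@"
}
```

For example, `in_container staging_externallinks bin/backup.sh` would run the backup script in the first matching container.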

Pipeline – GitHub Workflows to Docker Hub

  1. The workflow performs two distinct sets of operations – testing and pushing.
  2. The push runs only when the tests pass and the commit is on one of the deployment branches (staging or master).
  3. The push step tags two images (externallinks and eventstream) and pushes them to Docker Hub.
  4. Each image is tagged twice, once with the branch name and once with the sha of the commit. Images tagged with a branch name are overwritten on Docker Hub, so the sha tag is what allows us to revert to a previous image.
  5. Credentials are stored as GitHub secrets. DOCKER_USERNAME and DOCKER_PASSWORD are the credentials of the shared user: wikipedialibrarybot
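
The push gate and the double tagging described above can be sketched in shell as follows. This is an illustration, not the actual workflow definition: the function names should_push and image_tags, the example repository name, and the exact tag format are assumptions.

```shell
#!/usr/bin/env bash
# Succeed only when the tests passed and the branch is allowed to push.
should_push() {
  local branch="$1" tests_passed="$2"
  [ "$tests_passed" = "true" ] || return 1
  case "$branch" in
    staging|master) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the two tags applied to one image: branch name, then commit sha.
# The branch tag is overwritten on each push; the sha tag stays available.
image_tags() {
  local repo="$1" branch="$2" sha="$3"
  printf '%s:%s\n' "$repo" "$branch"
  printf '%s:%s\n' "$repo" "$sha"
}
```

Calling `image_tags example/externallinks staging 0123abc` prints `example/externallinks:staging` and `example/externallinks:0123abc`, where example/externallinks is a placeholder repository name.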
