Kubic:Update and Reboot


Update and Reboot of openSUSE Kubic and openSUSE MicroOS

Package Format

openSUSE Kubic and openSUSE MicroOS use RPM packages.

There is absolutely no reason to switch to another package format for a minimal system or transactional updates. RPM as a package format is well known, since it is used by several major distributions. There are proven, working toolchains from building RPMs to delivering the resulting packages to users. Additionally, many users already have policies and toolchains in place for RPM updates.

Other RPM advantages are:

  • Signed, easy to verify
  • Verification of installed system possible
  • Delta-RPM to save bandwidth

As a result, all updates delivered for openSUSE Kubic are RPM based, which allows you to continue using your existing tools for mirroring and verification. The only exception is that the RPMs are not installed with zypper, but with transactional-update.

Update and Reboot Strategy

For security and stability reasons, the Operating System and the application stack should always be up-to-date. While this is not a problem with single machines, where you can apply all updates by running the commands manually, this can become a real burden in a big cluster. For this reason we believe that automatic updates are the right thing to do.

To update the system fully automatically, transactional updates are used. The automatic update process can be disabled, or a maintenance window can be configured during which the update and, if necessary, the reboot of the server are performed. Standard RPMs are used for updates, and they are delivered in the same way as for openSUSE Tumbleweed. If needed, something like SMT or RMT can be used as a local proxy.

Transactional Updates

To limit the risk for your machines, updates are applied as transactional updates. This means:

  • They are atomic
    • Either fully applied or not at all. At any point during the update you can switch off the machine, and at the next boot either the old, unmodified installation or the new one boots, never a mix. This also means that old snapshots are not destroyed (as would happen if you used two partitions and switched between them), so you can still do a rollback if needed.
    • The update does not influence the running system: running processes do not see that an update is happening and are not restarted. Restarting them would be pointless anyway, as the restarted daemons would only run the old binaries; the new ones are only available after a reboot.
  • Can be rolled back
    • If the update fails or if the update is not compatible, you can quickly restore the situation as it was before the update.
  • The system needs to be rebooted to activate the changes

This part is handled by transactional-update(8), which is called once a day by a systemd.timer. The schedule is configurable by creating a file /etc/systemd/system/transactional-update.timer.d/timer.conf containing:

 [Timer]
 # the first, empty entry clears the existing schedule; otherwise the new entry is only added in addition
 OnCalendar=
 OnCalendar=date/time

For more information about which options can be configured and possible values, please see systemd.unit(5) and systemd.time(5).

Make sure that not all machines start the update at the same time. Depending on the network infrastructure and the number of machines, starting all updates simultaneously could create a very high (possibly too high) load.
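As a sketch of how to spread the load, the drop-in above can be combined with systemd's standard RandomizedDelaySec= timer option, which delays each activation by a random amount. The time values here are only examples:

```ini
[Timer]
# clear the existing schedule first, then set a new one
OnCalendar=
OnCalendar=*-*-* 02:00
# delay each run by a random amount of up to one hour,
# so that not all machines hit the mirrors at the same time
RandomizedDelaySec=3600
```

See systemd.timer(5) for the exact semantics of these options.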

This script first checks if updates are available. If so, a new snapshot of the root filesystem is created and updated with zypper dup, so all RPMs that are released at that point in time and not yet installed are applied. Afterwards, the snapshot is marked as active and is only used after the next reboot. For this reason, the script can either reboot the machine itself afterwards or inform another service to do the reboot.

After the next reboot, the system verifies itself and if mandatory daemons were not started correctly, a rollback to the last known working snapshot is done automatically.

Cleanup of old snapshots

As every update creates a new snapshot, snapper cleanup policies and rules are used to remove the no longer needed snapshots later:

  • A configured number of snapshots is kept, the rest are removed
  • Unimportant snapshots will be deleted first

The command responsible for this is transactional-update cleanup.

  • Runs regularly via a systemd.timer.
  • Snapshots that were booted in the past are marked as important and as deletable.
  • Snapshots that were never booted are marked as deletable, but not as important, so they are removed first.

Disable Automatic Updates

Automatic updates can be disabled with:

 systemctl --now disable transactional-update.timer

Rebooting

There are several ways to reboot a system after an update:

  • systemctl reboot - immediate hard reboot
  • rebootmgr - policy based reboot by a daemon
  • Kubernetes Reboot Daemon - container interacting with kubernetes to reboot the system

Which method should be used to reboot the machine has to be configured in transactional-update.conf(5) for transactional-update. The default is to use rebootmgr, and if that is not running, use systemctl reboot.

rebootmgr

rebootmgr is a daemon which can be configured to reboot the machine according to special policies. It is controlled with rebootmgrctl(1). rebootmgr does not drain any Kubernetes nodes; it does a hard reboot at the configured time. It is therefore more a solution for openSUSE MicroOS (as container host OS) than for a Kubernetes cluster.

Reboot Strategy Options

rebootmgr supports different strategies for when a reboot should be done:

  • instantly - when the signal arrives, other services are informed of the planned reboot, and the reboot is done without acquiring any locks or waiting for a maintenance window.
  • maint-window - reboot only during a specified maintenance window. If no window is specified, reboot immediately.
  • etcd-lock - acquire a lock at etcd for the specified lock-group before reboot. If a maintenance window is specified, acquire the lock only during this window.
  • best-effort - this is the default. If etcd is running, use etcd-lock. If no etcd is running, but a maintenance window is specified, use maint-window. If no maintenance window is specified, reboot immediately (instantly).
  • off - rebootmgr continues to run, but ignores all reboot signals. Setting the strategy to `off` does not clear the maintenance window; if rebootmgr is enabled again, it continues to use the previously configured maintenance window.
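The fallback order of the best-effort strategy described above can be sketched as a small function. The names here are illustrative and not taken from the rebootmgr sources:

```python
# Sketch of the "best-effort" strategy selection as described above.
# Function and parameter names are illustrative, not rebootmgr's own.

def effective_strategy(etcd_running: bool, has_maint_window: bool) -> str:
    """Return the strategy that best-effort falls back to."""
    if etcd_running:
        return "etcd-lock"      # etcd available: coordinate via locks
    if has_maint_window:
        return "maint-window"   # no etcd, but a window is configured
    return "instantly"          # neither: reboot right away
```

For example, a machine without etcd but with a configured maintenance window ends up using the maint-window strategy.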

The reboot strategy can be configured via /etc/rebootmgr.conf and adjusted at runtime via rebootmgrctl. These changes are written to the configuration file and survive the next reboot. A default configuration file would be:

 [rebootmgr]
 window-start=03:30
 window-duration=1h30m
 strategy=best-effort
 lock-group=default

This means the machine is only allowed to reboot in the night between 3:30 and 5:00. If etcd is running, rebootmgr tries to get a lock during that time and only reboots after acquiring it. If no lock could be acquired during this timeframe, no reboot is done. The format of window-start is the same as described in systemd.time(7); the format of window-duration is [XXh][YYm].
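As an illustration of these window semantics, the following sketch (all names are illustrative) parses a window-duration in the [XXh][YYm] format and checks whether a given time of day falls into the window, including windows that wrap past midnight:

```python
# Illustrative sketch of maintenance-window semantics; this is not
# rebootmgr code, just a model of the behaviour described above.
import re
from datetime import time, timedelta

def parse_duration(spec: str) -> timedelta:
    """Parse a window-duration like '1h30m' ([XXh][YYm])."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", spec)
    if not m or spec == "":
        raise ValueError(f"bad duration: {spec!r}")
    return timedelta(hours=int(m.group(1) or 0),
                     minutes=int(m.group(2) or 0))

def in_window(now: time, start: time, duration: timedelta) -> bool:
    """True if 'now' falls inside the maintenance window.

    Works in whole minutes; the modulo handles windows that
    wrap past midnight (e.g. 23:30 plus 1h).
    """
    start_min = start.hour * 60 + start.minute
    now_min = now.hour * 60 + now.minute
    dur_min = int(duration.total_seconds() // 60)
    return (now_min - start_min) % (24 * 60) < dur_min
```

With the default configuration above (window-start=03:30, window-duration=1h30m), 4:00 is inside the window and 6:00 is not.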

Locking via etcd

To make sure that not all machines reboot at the same time, the machines can be sorted into groups, and the number of machines of a group which are allowed to reboot at the same time can be configured and controlled via etcd. So you can create a group etcd, which contains all machines running etcd, and specify that only one etcd server is allowed to reboot at a time, and a second group worker, in which a higher number of machines is allowed to reboot simultaneously.

The etcd path to the directory containing data for a group is: /opensuse.org/rebootmgr/locks/<group>/

This directory contains two variables: mutex, which is 0 by default and can be set to 1 via atomic compare-and-swap to make sure that only one machine has write access, and a variable `data` containing the following JSON structure:

 {
   "max":1,
   "holders":[]
 }

holders contains the unique IDs of the machines currently holding a lock, in this case taken from /etc/machine-id.

So a record containing two locks out of ten possible ones would look like:

 {
   "max":10,
   "holders":[
     "3cb8c701b4d3474d99a7e88b31dd3439",
     "71c8efe539b280af2fe09b3b5771345e"
   ]
 }


A typical workflow of a client which tries to reboot would look like:

  • check for free locks; if none are free, watch the data variable until it changes
  • get the mutex
  • add yourself to the holders list
  • release the mutex
  • reboot
  • on boot, check if we hold a lock. If yes:
    • get the mutex
    • remove the ID from the holders list
    • release the mutex

Disable Rebootmgr

rebootmgr can be disabled with:

 systemctl --now disable rebootmgr

or

 rebootmgrctl off

Kubernetes Reboot Daemon

The Kubernetes Reboot Daemon kured is a Kubernetes daemonset that performs safe automatic node reboots when the need to do so is indicated by the package management system of the underlying OS. It runs inside a container on every node, so this option is only for openSUSE Kubic with the kubeadm system role.

To enable this method, tell kubernetes to run the container on all nodes:

 kubectl apply -f /usr/share/kured/kured-<version>.yaml

With kubectl get pods --namespace=kube-system you can verify that the container is running on all nodes.

Afterwards, configure transactional-update on every node to use kured. If /etc/transactional-update.conf does not exist (which is the default):

 echo "REBOOT_METHOD=kured" > /etc/transactional-update.conf

otherwise, add this entry to the existing config file.