SDB:Disaster Recovery



Goal: Recreate a destroyed system

You want to be prepared so that if your system gets destroyed you can recreate it as closely as possible to what it was before, regardless of what exactly was destroyed: from messed-up software (like deleted essential files, corrupted file systems, or destroyed disk partitioning) up to broken hardware (like defective hard disks or a completely destroyed computer).

From the goal "recreate a system" it follows:

Disaster recovery means installation (reinstalling from scratch)

The core of disaster recovery is an installer that reinstalls the system from scratch.

The fundamental steps of system installation are:

  • Boot an installation system on the "bare metal", where "bare metal" can also be a bare virtual machine.
  • In the installation system run an installer that performs the following fundamental steps:
  1. Prepare persistent storage (disk partitioning with filesystems and mount points).
  2. Store the payload in the persistent storage (install files).
  3. Install a boot loader.
  • Finally reboot so that the installation system is shut down and the installed system is booted.
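The three fundamental installer steps above can be sketched as shell commands. This is a hypothetical single-disk BIOS example: the device name, mount point, backup path, and tool choices are illustrative assumptions, not prescribed by any particular installer. With DRY_RUN=echo the commands are only printed, not executed.

```shell
# Hypothetical sketch of the three fundamental installer steps.
# DRY_RUN=echo prints the commands instead of executing them.
DRY_RUN=echo
DISK=/dev/sda

installer_steps() {
    # 1. Prepare persistent storage: partitioning, filesystem, mount point
    $DRY_RUN parted -s "$DISK" mklabel msdos mkpart primary ext4 1MiB 100%
    $DRY_RUN mkfs.ext4 "${DISK}1"
    $DRY_RUN mount "${DISK}1" /mnt/local
    # 2. Store the payload: for disaster recovery, restore files from a backup
    $DRY_RUN tar -C /mnt/local -xzf /backup/backup.tar.gz
    # 3. Install a boot loader
    $DRY_RUN grub2-install --boot-directory=/mnt/local/boot "$DISK"
}

installer_steps
```

For an initial installation, step 2 would instead install RPM software packages into /mnt/local; everything else stays the same.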

In case of an initial system installation "store the payload" usually means to install RPM software packages.

In case of disaster recovery "store the payload" means to restore the files from a backup.

The only real difference between a usual system installation and disaster recovery is the way the payload is stored.

System configuration (e.g. network, users, services, language, keyboard, and so on) is a separate task that is not mixed up with what is meant by "installation" here. (For example, in YaST the actual installation is also kept separate from the configuration, at least to a certain extent.)

System configuration is another difference:

A usual initial system installation requires an additional subsequent system configuration step because the installed RPM software packages result in a pristine "raw" system with the default configuration files from the software packages, which must then be configured as needed.

In contrast, disaster recovery results in a configured system because the configuration files are restored from the backup.

Image installation:

Another way of system installation (with or without configuration) is the so-called "image installation". In case of image installation "store the payload" usually means to store an image of the files (whatever "image" exactly means) in the persistent storage, which can result in a "raw" system with default configuration files or in a configured system with adapted configuration files as needed.

For example KIWI is an image build system that can be used for an image installation (see https://en.wikipedia.org/wiki/KIWI_%28computing%29).

Basics

This article is only about how to recover from a disaster by recreating the destroyed system. Other ways how to cope with a disaster (like high availability solutions via redundantly replicated systems) are not described here.

In this particular case "disaster recovery" means to recreate the basic operating system (i.e. what you had initially installed from an openSUSE or SUSE Linux Enterprise install medium).

In particular, special third-party applications (e.g. a third-party database system, which often requires special actions to get it installed and set up) must usually be recreated in an additional, separate step.

The basic operating system can be recreated on the same hardware or on fully compatible replacement hardware so that "bare metal recovery" is possible.

Fully compatible replacement hardware is needed

"Fully compatible replacement hardware" basically means that the replacement hardware works with the same kernel modules that are used on the original hardware and that it boots the same way. For example one cannot have a replacement network interface card that needs a different kernel module (possibly with different additional firmware) and one cannot switch from BIOS to UEFI or switch to a different architecture (e.g. from 32-bit to 64-bit) and things like that.

In general "hardware" could be also virtual hardware like a virtual machine with virtual harddisks (cf. the section about "Virtual machines" below). But "fully compatible replacement hardware" means that one cannot switch from real hardware to virtual hardware or from one kind of virtualization environment to a different kind of virtualization environment (e.g. from KVM/QEMU to XEN).

Furthermore "fully compatible replacement hardware" means that one cannot switch from one kind of storage structure to a different kind of storage structure (e.g. from a single disk to two disks or vice versa). Also one cannot switch from one kind of storage hardware to a different kind of storage hardware (e.g. from a SATA disk to a NVMe disk or from local disks to remote storage like SAN). Ideally the disks on replacement hardware should have exactly same size as on the original system (usually a bit bigger disk size should not cause big issues but smaller disk sizes are often rather problematic).

Provisions while your system is up and running

  1. Create a backup of all your files
  2. Create a bootable recovery medium that contains a recovery installation system plus a recovery installer for your system
  3. Have replacement hardware available
  4. Verify that it works to recreate your system on your replacement hardware

After your system was destroyed

  1. If needed: Replace broken hardware with your replacement hardware
  2. Recreate your system with your recovery medium plus your backup

Inappropriate expectations

Words like "just", "simple", "easy" are inappropriate for disaster recovery.

  • Disaster recovery is not "easy".
  • Disaster recovery is not at all "simple".
  • There is no such thing as a disaster recovery solution that "just works".

There is an exception where such words are appropriate:

The simpler the system, the simpler and easier the recovery.

Disaster recovery does not just work

Even if you created the recovery medium without an error or warning, there is no guarantee that it will work in your particular case to recreate your system with your recovery medium.

The basic reason why there is no disaster recovery solution that "just works" is that it is practically impossible to reliably autodetect all information that is needed to recreate a particular system:

  • Information regarding hardware like required kernel modules, kernel parameters,...
  • Information regarding storage like partitioning, filesystems, mount points,...
  • Information regarding bootloader
  • Information regarding network

For example there is the general problem that it is impossible to determine in a reliable way how a running system was actually booted. Imagine during the initial system installation GRUB was installed in the boot sector of the active partition like /dev/sda1 and afterwards LILO was installed manually in the master boot record of the /dev/sda harddisk. Then actually LILO is used to boot the system but the GRUB installation is still there. Also in case of UEFI, things like "BootCurrent" (e.g. in the 'efibootmgr -v' output) are not reliable to find out how a system was booted (cf. https://github.com/rear/rear/issues/2276#issuecomment-559865674). Or the bootloader installation on the harddisk may not work at all and the system was actually booted from a removable medium (like CD or USB stick/disk), or the currently running system was not even normally booted but directly launched from another running system via 'kexec' (cf. the below section "Launching the ReaR recovery system via kexec").

In "sufficiently simple" cases disaster recovery might even "just work".

When it does not work, you might change your system configuration to be simpler, or you have to manually adapt and enhance the disaster recovery framework to make it work for your particular case.

No disaster recovery without testing and continuous validation

You must test in advance that it works in your particular case to recreate your particular system with your particular recovery medium and that the recreated system can boot on its own and that the recreated system with all its system services still works as you need it in your particular case.

You must have replacement hardware available on which your system can be recreated and you must try out if it works to recreate your system with your recovery medium on your replacement hardware.

You must continuously validate that the recovery still works on the replacement hardware in particular after each change of the basic system.

Recommendations

Prepare for disaster recovery from the very beginning

Prepare your disaster recovery procedure that fits your particular needs at the same time when you prepare your initial system installation.

See also "Help and Support - Feasible in advance - Hopeless in retrospect" below.

In the end, when it comes to the crunch, your elaborate and sophisticated system becomes useless if you cannot recreate it with your disaster recovery procedure.

The simpler your system, the simpler and easier your recovery (cf. above).

Bottom line what matters in the end:

Regardless how a system was installed and
regardless what is used for disaster recovery
eventually a disaster recovery installation
will be the final system installation.

Cf. the "Essentials about disaster recovery with Relax-and-Recover presentation PDF" link below.

Deployment via recovery installation

After the initial installation (plus configuration) from an openSUSE or SUSE Linux Enterprise install medium, set up your system recovery and then reinstall the system via your recovery procedure for the actual productive deployment.

This way you know that your system recovery works at least on the exact hardware which you use for your production system.

Furthermore deployment via recovery installation ensures that in case of a disaster your particular disaster recovery reinstallation does not recreate your system with (possibly subtle but severe) differences (cf. below "The limitation is what the special ReaR recovery system can do") because this way you use one same installer (the recovery installer) both for deployment and for disaster recovery (cf. "Disaster recovery is installation" above).

Prepare replacement hardware for disaster recovery

You must have fully compatible replacement hardware available that is ready to use for disaster recovery.

See above what "fully compatible replacement hardware" means.

"Replacement hardware that is ready to use for disaster recovery" means in particular that its storage devices (harddisks or virtual disks) are fully clean (i.e. your replacement storage must behave same as pristine new storage).

When your replacement storage is not pristine new storage (i.e. when it had ever been used before), you must completely zero out your replacement storage. Otherwise, when you try to reinstall your system (cf. "Disaster recovery means ... reinstalling" above) on a disk that had been used before, various kinds of unexpected, weird issues can get in your way because of whatever leftover data remains on an already used disk (for example remainders of RAID or partition-table signatures and other kinds of "magic strings" like LVM metadata and whatever else).
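The effect of zeroing can be demonstrated safely on a throwaway image file instead of a real disk; the file stands in for a device like /dev/sdb (the device name is an assumption). On a real replacement disk you would additionally run tools like "wipefs --all" to remove known filesystem, RAID, and LVM signatures.

```shell
# Demonstration on a scratch file, NOT a real disk:
IMG=$(mktemp)
printf 'LVM2 001 leftover metadata' > "$IMG"   # simulate old "magic string" remainders
# Zero out the whole (tiny) stand-in "disk":
dd if=/dev/zero of="$IMG" bs=1M count=8 conv=notrunc 2>/dev/null
# On a real big disk, fully zeroing takes long; at minimum zero the first
# and last megabytes, where most signatures and metadata live, e.g.
#   dd if=/dev/zero of=/dev/sdb bs=1M count=100
```

After the dd run no trace of the old metadata string remains in the stand-in "disk".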

On non PC compatible architectures there could be issues to boot the Relax-and-Recover (ReaR) recovery system, cf. the below section "Recovery medium compatibility". You may need to prepare your replacement hardware with a small and simple system that supports kexec so that you can launch the ReaR recovery system via kexec, see the below section "Launching the ReaR recovery system via kexec".

Because zeroing out storage space may take a long time on big disks, and preparing a system that supports kexec also needs time and testing, you should prepare your replacement hardware in advance when you have time to do it in a 'relaxed' way (cf. "Notes on the meaning of 'Relax' in 'Relax-and-Recover'" below).

Be prepared for the worst case

Be prepared that your system recovery fails to recreate your system. Be prepared for a manual recreation from scratch. Always have all information available that you need to recreate your particular system manually. Manually recreate your system on your replacement hardware as an exercise.

Let's face it: Deployment via the recovery installer is a must

The above "recommendations" are actually no nice to have recommendations but mandatory requirements.

As long as you install your system with a different toolset (installation system plus installer) than what you intend to use later in case of emergency and time pressure for disaster recovery, you do not have a real disaster recovery procedure.

It cannot really work when you install your system with a different toolset than what you use to re-install it.

For a real disaster recovery procedure you should use one same installation system and one same installer for all kinds of your installations.

At least you must use the same installation system and the same installer for your productive deployment installation and for your disaster recovery re-installation.

And even then still know your system well so that you are always prepared for a manual recreation from scratch.

Maintain a real disaster recovery procedure for your mission critical systems

When you have mission critical systems but you do not have a real disaster recovery procedure for them as described above (i.e. deployment via recovery installation plus continuous validation that your disaster recovery procedure actually works on your replacement hardware), then your so-called "mission critical systems" cannot really be mission critical for you, because in fact you do not sufficiently care about your systems. It is usually a dead end if your disaster recovery procedure fails to recreate your system on your replacement hardware after your original system was destroyed; see also the section about "Help and Support" below.

Relax-and-Recover (ReaR) RPM packages for disaster recovery

Relax-and-Recover (ReaR) is the de facto standard disaster recovery framework on Linux.

The software in those packages is intended for experienced users and system admins. There is no easy user frontend (in particular there is no GUI) and in general software for disaster recovery is not really foolproof (it runs as 'root' and you need to know what it does).

rear / rear116 / rear1172a / rear118a / rear23a / rear27a

SUSE Linux Enterprise "rear..." RPM packages for disaster recovery with Relax-and-Recover (ReaR) via the SUSE Linux Enterprise High Availability Extension (SLE-HA):

  • "rear116" contains ReaR upstream version 1.16 plus special adaptions for btrfs in SUSE Linux Enterprise 12 GA
  • "rear1172a" contains ReaR upstream version 1.17.2 plus special adaptions for btrfs in SUSE Linux Enterprise 12 SP1 (rear116 does not work with the default btrfs setup in SUSE Linux Enterprise 12 SP1 because there are substantial changes compared to the default btrfs setup in SUSE Linux Enterprise 12 GA)
  • "rear118a" contains ReaR upstream version 1.18 plus lots of ReaR upstream commits towards version 1.19 (basically rear118a is almost ReaR upstream version 1.19) in particluar it contains a SLE12-SP2-btrfs-example.conf file to support btrfs quota setup for snapper that is used since SUSE Linux Enterprise 12 SP2 and another main feature is UEFI support together with the new package ebiso that is used since ReaR version 1.18 for making a UEFI bootable ReaR recovery system ISO image on SUSE Linux Enterprise systems (all ReaR versions we provide up to SUSE Linux Enterprise 12 SP1 only support traditional BIOS, see the "Recovery medium compatibility" section below)
  • "rear23a" contains ReaR upstream version 2.4 plus some later ReaR upstream commits. The rear23a RPM package originated from ReaR version 2.3 (therefore its RPM package name) and was initially provided only in SUSE Linux Enterprise 15 GA. Meanwhile a major rear23a RPM package update was done which contains now ReaR upstream version 2.4 plus some later ReaR upstream commits (up to the ReaR upstream git commit cc9e76872fb7de5ddd6be72d4008a3753046a528 cf. the rear23a RPM changelog). The RPM package name rear23a and version 2.3.a are kept so that an installed rear23a RPM package can be updated. In practice ReaR version 2.4 is basically required on POWER architectures (i.e. on 64-bit PPC64 and PPC64LE but not on the old 32-bit PPC which is not supported). ReaR version 1.18/1.19 should somewhat work on POWER but compared to what was enhanced and fixed at ReaR upstream since that time (March 2016) the rear118a RPM package behaves poor on POWER compared to the current (September 2018) rear23a RPM package.
  • "rear27a" contains the ReaR upstream version 2.7 release.

See the ReaR upstream release notes at http://relax-and-recover.org/documentation/ for new features, bigger enhancements, and possibly backward incompatible changes in the various ReaR versions. The ReaR upstream release notes for ReaR version 2.7 https://relax-and-recover.org/documentation/release-notes-2-7 also contain the changes of former ReaR versions.

For one SUSE Linux Enterprise product we provide several ReaR versions in parallel, so that users for whom version N does not support their particular needs can upgrade to version M, while users who have a working disaster recovery procedure with version N do not need to upgrade. Therefore the package name contains the version, and all packages conflict with each other to avoid that an installed version gets accidentally replaced with another version. See also the "Version upgrades for Relax-and-Recover" section below.

What "rear..." RPM packages are provided for what SUSE Linux Enterprise version:

SUSE Linux Enterprise 12:
rear116
rear1172a (since SLE12-SP1)
rear118a (since SLE12-SP2)
rear23a (as additional package via maintenance update for SLE12-SP2)
rear27a (since SLE12-SP5)

SUSE Linux Enterprise 15:
rear23a (plus maintenance update to its latest content)
rear27a (since SLE15-SP2)

The current openSUSE "rear" RPM package is provided in the openSUSE Build Service project "Archiving", see https://build.opensuse.org/package/show/Archiving/rear and see also the "Version upgrades for Relax-and-Recover" section below.

Disaster recovery with Relax-and-Recover (ReaR)

Relax-and-Recover is a disaster recovery framework.

Relax-and-Recover is the de facto standard disaster recovery framework on Linux in particular for enterprise Linux distributions like Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES) via the SUSE Linux Enterprise High Availability Extension (SLE-HA).

Relax-and-Recover is abbreviated ReaR (often misspelled 'Rear' or 'rear'). Regarding Relax-and-Recover the word 'rear' is mainly used when the program /usr/sbin/rear is meant (e.g. program calls like "rear mkbackup" and "rear recover") and in ReaR file and directory names (e.g. /etc/rear/local.conf). Also RPM package names are usually lowercase, so ReaR RPM packages are named 'rear'.

Relax-and-Recover is written entirely in the native language for system administration: as shell (bash) scripts.

Experienced users and system admins can adapt or extend the ReaR scripts if needed to make things work for their specific cases and - if the worst comes to the worst - even temporary quick-and-dirty workarounds are relatively easily possible - provided you know ReaR and your system well, so that you are prepared for appropriate manual ad hoc intervention.

Professional services and support from Relax-and-Recover upstream are available (see http://relax-and-recover.org/support/).

How disaster recovery with ReaR basically works

Specify the ReaR configuration in /etc/rear/local.conf (cf. "man rear" and /usr/share/rear/conf/examples/ and /usr/share/rear/conf/default.conf) and then run "rear mkbackup" to create a backup.tar.gz on a NFS server and a bootable ReaR recovery ISO image for your system.
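The configuration mentioned above could look like this minimal hypothetical /etc/rear/local.conf for the NFS-based setup (the server name and export path are made-up examples; OUTPUT, BACKUP, and BACKUP_URL are standard ReaR configuration variables, cf. /usr/share/rear/conf/default.conf):

```shell
# Minimal hypothetical /etc/rear/local.conf for an NFS-based setup:
OUTPUT=ISO                                  # build a bootable recovery ISO image
BACKUP=NETFS                                # ReaR's internal backup method (by default tar via a network file system)
BACKUP_URL=nfs://nfs.example.com/srv/rear   # where backup.tar.gz and the ISO are written
```

With such a local.conf, "rear mkbackup" builds the recovery ISO and writes it together with backup.tar.gz to the NFS share.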

The ReaR recovery ISO image contains the ReaR recovery installation system with the ReaR installer that is specific for the system where it was made.

A recovery medium which is made from the ISO image is used to boot the ReaR recovery system on your replacement hardware (the ReaR recovery system runs in main memory via a ramdisk).

In the ReaR recovery system log in as root and run the ReaR installer via "rear recover" which does the following:

  1. Recreate the basic system storage, in particular disk partitioning with filesystems and mount points.
  2. Restore the backup from the NFS server.
  3. Install the boot loader.

Finally remove the ReaR recovery medium and reboot the recreated system.

In "sufficiently simple" cases it "just works" (provided you specified the right configuration in /etc/rear/local.conf for your particular case). But remember: There is no such thing as a disaster recovery solution that "just works". Therefore: When it does not work, you might perhaps change your system configuration to be more simple or you have to manually adapt and enhance the various bash scripts of ReaR to make it work for your particular case.

Notes on the meaning of 'Relax' in 'Relax-and-Recover'

Because there is no such thing as a disaster recovery solution that "just works" and because there is no disaster recovery without testing on actually available replacement hardware, the meaning of 'Relax' is not that one could easily configure /etc/rear/local.conf, just run "rear mkbackup", and simply relax.

The meaning of 'Relax' is: After an experienced admin had set it up (possibly with some needed adaptions) and after it was thoroughly tested and as long as it is continuously validated that the recovery actually works on the replacement hardware (in particular after each change of the basic system), then one can relax.

Additionally the meaning of 'Relax' is that you can spend your time to carefully set up, test, and continuously validate your particular disaster recovery procedure in advance before a disaster happens when you have time for it and when you can do it step by step in a 'relaxed' way.

When later a real disaster happens, even a relatively inexperienced person can do the recovery on the replacement hardware (boot the ReaR recovery system, log in as root, run "rear recover", and finally reboot).

Furthermore the meaning of 'Relax' is that you need to be 'relaxed' and take your time to carefully set up (with some trial-and-error legwork) and properly validate your particular disaster recovery procedure, because eventually your particular disaster recovery installation will become your final system installation, cf. the "Essentials about disaster recovery with Relax-and-Recover presentation PDF" link below.

The limitation is what the special ReaR recovery system can do

The ReaR recovery system with the ReaR installer is totally different from the installation system on an openSUSE or SUSE Linux Enterprise install medium with the YaST installer and AutoYaST. This means that when ReaR is used to recover your system, a totally different installer recreates your system. Therefore, even when the initial installation of the basic operating system from an openSUSE or SUSE Linux Enterprise install medium worked, the special ReaR recovery system may not work in your particular case, or it may work but recreate your system with some (possibly subtle but severe) differences.

For example:

The following is only an example for the general kind of issue. That particular issue does no longer happen with current ReaR versions - nowadays other issues of that general kind appear.

In current SUSE systems disks are referenced by persistent storage device names like /dev/disk/by-id/ata-ACME1234_567-part1 instead of traditional device nodes like /dev/sda1 (see /etc/fstab /boot/grub/menu.lst /boot/grub/device.map).

If "rear recover" is run on a system with a new harddisk (e.g. after the disk had failed and was replaced) the reboot may fail because the persistent storage device names are different.

In this case ReaR shows a warning like "Your system contains a reference to a disk by UUID, which does not work".

The fix in the running ReaR recovery system is to switch to the recovered system via "chroot /mnt/local" and therein check in particular the files /etc/fstab, /boot/grub/menu.lst and /boot/grub/device.map and adapt their content (e.g. by replacing names like /dev/disk/by-id/ata-ACME1234_567-part1 with the matching device node like /dev/sda1). After changes in /boot/grub/menu.lst and /boot/grub/device.map the Grub boot loader should be re-installed via "/usr/sbin/grub-install".

See https://github.com/rear/rear/issues/22
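The adaption described above can be sketched on a scratch copy of etc/fstab. The by-id name is the example name from above; /dev/sda1 as the matching device node is an assumption. In the ReaR recovery system you would do this inside "chroot /mnt/local" on the real /etc/fstab.

```shell
# Rewrite a by-id reference to a plain device node in a scratch fstab copy:
FSTAB=$(mktemp)
cat > "$FSTAB" <<'EOF'
/dev/disk/by-id/ata-ACME1234_567-part1  /  ext4  defaults  1 1
EOF
sed -i 's|/dev/disk/by-id/ata-ACME1234_567-part1|/dev/sda1|g' "$FSTAB"
cat "$FSTAB"
```

The same replacement would then be applied to /boot/grub/menu.lst and /boot/grub/device.map, followed by re-installing Grub.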

Alternatively: If your harddisk layout is sufficiently simple so that you do not need disks referenced by persistent storage device names, you could change your system configuration to be simpler by using traditional device nodes (in particular in /etc/fstab, /boot/grub/menu.lst and /boot/grub/device.map).

Relax-and-Recover versus backup and restore

Relax-and-Recover (ReaR) complements backup and restore of files but ReaR is neither a backup software nor a backup management software and it is not meant to be one.

In general backup and restore of the files is external functionality for ReaR.

I.e. neither backup nor restore functionality is actually implemented in ReaR.

ReaR only calls an external tool that does the backup of the files during "rear mkbackup" and its counterpart to do the restore of the files during "rear recover" (by default that tool is 'tar').

It is your task to verify your backup is sufficiently complete to restore your system as you need it.

It is your task to verify your backup can be read completely to restore all your files.

It is your task to ensure your backup is consistent. First and foremost the files in your backup must be consistent with the data that is stored in your ReaR recovery system (in particular what is stored to recreate the storage in files like var/lib/rear/layout/disklayout.conf), because ReaR recreates the storage (disk partitions with filesystems and mount points) from the data that is stored in the ReaR recovery system and then restores the files from the backup into the recreated storage. This means in particular that restored config files must match the actually recreated system (e.g. the contents of the restored etc/fstab must match the actually recreated disk layout).

Therefore after each change of the basic system (in particular after a change of the disk layout) "rear mkbackup" needs to be run to create a new ReaR recovery system together with a matching new backup of the files (or, when third party backup software is used, "rear mkrescue" needs to be run to create a new ReaR recovery system and additionally a matching new backup of the files must be created). See also below "What 'consistent backup' means".

Additionally, before your backup tool is run you may have to stop certain running programs (e.g. whatever services or things like that) that could change files that get included in your backup.

It is your task to ensure your backups are kept safe at a sufficiently secure place. In particular the place where ReaR writes a new backup (e.g. a NFS share or a USB disk) is not a really safe place to also keep old backups (arbitrary things might go wrong when writing there).

Regarding "counterpart to do the restore": To be able to call the restore tool during "rear recover" the restore tool and all what it needs to run (libraries, config files, whatever else) must be included in the ReaR recovery system where "rear recover" is run. For several backup and restore tools/solutions ReaR has already built-in functionality to get the restore tool and all what it needs to run in the ReaR recovery system.

Usually only basic support for the various backup tools is implemented in ReaR (e.g. plain making a backup during "rear mkbackup" and plain restore during "rear recover"). The extent to which support for each individual backup tool is implemented in ReaR varies a lot, because support for each individual backup tool is implemented separately from the others. Therefore for some particular backup tools the current support in ReaR could even be only "very basic" (cf. the below sections "How to adapt and enhance Relax-and-Recover" and "How to contribute to Relax-and-Recover").

In particular for most third party backup tools there is only support for plain backup restore during "rear recover" but no support at all to make a backup during "rear mkbackup", so "rear mkbackup" is useless and only "rear mkrescue" is useful for most third party backup tools ("rear mkbackup" creates the ReaR recovery system and makes a backup while "rear mkrescue" only creates the ReaR recovery system), see "man rear" for more information.

To ensure your backup is consistent with the data that is stored in your ReaR recovery system, you should make a new backup each time you create a new ReaR recovery system and also the other way round: each time you make a new backup you should also create a new ReaR recovery system. You must create a new ReaR recovery system when the basic system changed (in particular after a change of the disk layout).

There is basically nothing in ReaR that deals in any further way with what to do with the backup except small things like NETFS_KEEP_OLD_BACKUP_COPY, see below.

Regarding NETFS_KEEP_OLD_BACKUP_COPY:

With the NETFS backup method ReaR writes its files (in particular the backup.tar.gz and the ReaR recovery system ISO image) into a mounted directory that belongs to a network file system (usually NFS).

With empty NETFS_KEEP_OLD_BACKUP_COPY="" a second "rear mkbackup" run overwrites the files in the NFS directory from a previous "rear mkbackup" run. This means that if, after a successful "rear mkbackup" run, one does not save the files in the NFS directory to a permanently safe place, one has no backup if a second "rear mkbackup" run fails. In particular one has no backup while a second "rear mkbackup" run is overwriting the old backup.

With non-empty NETFS_KEEP_OLD_BACKUP_COPY="yes" a second "rear mkbackup" run will not overwrite the files in the NFS directory from a previous "rear mkbackup" run. Instead the second "rear mkbackup" run renames an existing NFS directory into *.old before it writes its files.
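In /etc/rear/local.conf this looks like (the variable is the one described above, from ReaR's default.conf):

```shell
# Keep the previous backup: the next "rear mkbackup" run renames the
# existing NFS directory to *.old before writing its new files.
NETFS_KEEP_OLD_BACKUP_COPY="yes"
```

Leaving the variable empty (the default) restores the overwrite behavior described above.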

Note that "KEEP_OLD_BACKUP_COPY" functionality is not generally available for the various backup and restore tools/solutions where ReaR has built-in support to call their backup and restore functionality.

This means in general:

After a "rear mkbackup" run the user has to do on his own whatever is appropriate in his particular environment how to further deal with the backup and the ReaR recovery system ISO image and the ReaR log file and so on.

Version upgrades for Relax-and-Recover

When you have a working disaster recovery procedure, do not upgrade ReaR and do not change the basic software that is used by ReaR (like partitioning tools, filesystem tools, bootloader tools, ISO image creating tools, and so on).

For each ReaR version upgrade and for each change of a software that is used by ReaR you must carefully and completely re-validate that your particular disaster recovery procedure still works for you.

In contrast when a particular ReaR version does not work for you, try a newer version.

See the section "First steps with Relax-and-Recover" below for where to get the newest ReaR versions, or see Downloads at Relax-and-Recover upstream.

Because ReaR is only bash scripts (plus documentation), in the end it does not really matter which version of those bash scripts you use. What matters is that the particular subset of ReaR's bash scripts that are actually run for your particular disaster recovery procedure works for you, or can be adapted or extended to make it work with as little effort as possible.

When it does not work with an up-to-date ReaR release, either change your basic system configuration to be more traditional (if possible, avoid using the newest features for your basic system) or manually adapt and enhance ReaR to make it work for your particular case.

For any kind of innovation that belongs to the basic system (e.g. kernel, storage, bootloader, init) the new kind (e.g. udev, btrfs, Grub2 / UEFI / secure boot, systemd) will be there first and afterwards ReaR can adapt step by step to support it.

On the other hand this means: When you have a working disaster recovery procedure running and you upgrade software that is related to the basic system or you do other changes in your basic system, you must also carefully and completely re-validate that your particular disaster recovery procedure still works for you.

Testing current ReaR upstream GitHub master code

It is possible to have several ReaR versions in parallel, each in its own separate directory, without conflicts between each other or with a ReaR version that is normally installed via RPM package.

Accordingly you could try out the current ReaR upstream GitHub master code from within a separate directory, as a test to find out if things work better with the current upstream master code compared to your installed ReaR version.

Basically, "git clone" the current ReaR upstream GitHub master code into a separate directory and then configure and run ReaR from within that directory like:

git clone https://github.com/rear/rear.git
mv rear rear.github.master
cd rear.github.master
vi etc/rear/local.conf
usr/sbin/rear -D mkbackup

Note the relative paths "etc/rear/" and "usr/sbin/".

In case of issues with ReaR it is recommended to try out the current ReaR upstream GitHub master code because that is the only place where ReaR upstream fixes bugs, i.e. bugs in released ReaR versions are not fixed by ReaR upstream, cf. the sections "Debugging issues with Relax-and-Recover" and "How to adapt and enhance Relax-and-Recover" below.

First steps with Relax-and-Recover

To get a sufficient understanding of how ReaR works you need to use it yourself, play around with it, and do some trial and error experiments with it. Getting used to ReaR is a mandatory precondition for being able to do a real disaster recovery in a 'relaxed' way, even in case of emergency and time pressure, cf. the section "Notes on the meaning of 'Relax' in 'Relax-and-Recover'" above.

For documentation about ReaR see "man rear" and /usr/share/rear/conf/default.conf and the files in /usr/share/doc/packages/rear*/ when you have a rear* RPM package installed or the files in doc/user-guide/ and "man -l doc/rear.8" when you use the current ReaR upstream GitHub master code (cf. above) or online for example:

It is recommended to do your first steps with ReaR as follows:

1.
You need an NFS server machine where ReaR can store its backup and ISO image. For example create a /nfs/ directory and export it via NFS. To set up an NFS server you may use the YaST "NFS Server" module (provided by the "yast2-nfs-server" RPM package). You need to adapt the defaults in /etc/exports so that ReaR, which runs as root, can write its backup and ISO image there, as in the following example. In particular export it as "rw":

/nfs    *(rw,root_squash,sync,no_subtree_check)

Usually it should work with "root_squash" (so that on the NFS server a non-root user and group like "nobody:nogroup" is used), but in some cases you may need "no_root_squash" (so that on the NFS server the user root can do anything with unlimited permissions). In case of "root_squash" the exported '/nfs' directory on the NFS server must have sufficient permissions so that "nobody:nogroup" can create/write files and sub-directories there and access files in those sub-directories. For example you may allow any user and group to do that with these permissions:

drwxrwxrwx ... /nfs
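The export setup can also be scripted. The following sketch (the helper function name is made up for illustration) appends the export entry from the example above to an exports file only if it is not already there:

```shell
#!/bin/bash
# Hypothetical helper: make sure an NFS export entry exists in an exports
# file. Pass /etc/exports on a real NFS server; any file works for testing.
ensure_nfs_export() {
    local dir="$1" opts="$2" exports_file="${3:-/etc/exports}"
    # Do nothing if an entry for this directory already exists:
    grep -q "^${dir}[[:space:]]" "$exports_file" 2>/dev/null && return 0
    echo "${dir}    *(${opts})" >> "$exports_file"
}

# Example usage (run as root on the NFS server):
# mkdir -p /nfs && chmod 0777 /nfs
# ensure_nfs_export /nfs "rw,root_squash,sync,no_subtree_check"
# exportfs -ra
```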

2.
On a sufficiently powerful host (minimum: a dual core CPU and 4GB main memory) create two identical virtual KVM/QEMU machines with full hardware virtualization (except for the CPU, because emulating the CPU would make it far too slow) as follows:

  • a single CPU
  • 1-2 GB memory (RAM)
  • a single 10-20 GB virtual harddisk
  • a single virtual network interface card
  • standard PC compatible architecture with traditional BIOS

Use standard PC compatible architecture (cf. the section about "Non PC compatible architectures" below) - in particular do not use UEFI (unless you want to see how your first steps with ReaR will fail). Both virtual machines must be the "same" in the sense of the "fully compatible replacement hardware is needed" section above. In particular have the same virtual harddisk size on both machines and the same type of virtual network interface card.

If you use "virt-manager" to create virtual KVM/QEMU machines, set the default "OS type" explicitly to "Generic" to get full hardware virtualization, so that the virtual harddisk appears like a real harddisk as /dev/sda (if you use some kind of paravirtualization the harddisk appears as /dev/vda or similar).

It depends on the virtual network settings for KVM/QEMU on the host to what extent you can access remote machines from virtual machines. At least the NFS server whereto ReaR will store its backup and ISO image and wherefrom ReaR will restore its backup must be accessible from the virtual machines. The NFS server must run on the host if the virtual network is in so-called "isolated mode" (using private IP addresses of the form 192.168.nnn.mmm), where virtual machines can only communicate with each other and with the host but are cut off from the outer/physical network.

Actually no physical network is needed for a virtual network in "isolated mode", which means you can do your first steps with ReaR on a single isolated computer that acts as host for the virtual machines and as NFS server. In such an isolated private network the IP address of the host is usually something like 192.168.100.1 or 192.168.122.1, and virtual machines usually get their IP addresses automatically assigned via a special DNS+DHCP server "dnsmasq" that is automatically configured and started as needed and is only used for such virtual networks.
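If you prefer the command line over "virt-manager", a virt-install invocation matching the machine description above might look like the following sketch (the VM name and the install ISO path are placeholders, and option behavior can vary between virt-install versions):

```shell
# Sketch only - VM name and install ISO path are placeholders.
# Creates a fully virtualized BIOS guest: 1 CPU, 1.5 GB RAM, 15 GB disk,
# one NIC on the default virtual network.
virt-install \
    --name rear-test-1 \
    --vcpus 1 \
    --memory 1536 \
    --disk size=15 \
    --network network=default \
    --os-variant generic \
    --cdrom /path/to/SLES-installation.iso
```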

3.
On one virtual machine install at least SLES12-SP5 or SLES15-SP2 into a single ext3 or ext4 filesystem. For your first steps with ReaR keep it simple and do not use several partitions (like one for the system and an additional "/home" partition). In particular do not use the complicated btrfs default structure in SLES12 and SLES15 - unless you prefer to deal with complicated issues during your first steps with ReaR.

4.
For your first steps with ReaR use a test system that is small (for a small and fast backup and restore) but still normally usable with the X Window System. For example install only these software patterns:

  • Base System
  • Minimal System (Appliances)
  • X Window System

Additionally install the package "MozillaFirefox" as an example application to check that the system is normally usable before and after recovery. In particular have a package installed that provides the 'dhclient' program that is needed for DHCP in the ReaR recovery system. In recent openSUSE and SLES versions 'dhclient' is provided by the "dhcp-client" RPM package. Furthermore install the package "lsb-release" which is used by ReaR. It is recommended to also install a package that provides the traditional (meanwhile deprecated) networking tools like 'ifconfig / netstat / route' and so on. In recent openSUSE and SLES versions those tools are still provided by the "net-tools-deprecated" RPM package. Finally the plain text editor 'vi' is needed by ReaR which is provided by the "vim" RPM package.

5.
For your first steps with ReaR keep it simple and use only a single network interface "eth0" with DHCP.

6.
Install an up-to-date ReaR version (at least ReaR version 2.7 preferably via the "rear27a" RPM package in SLES12-SP5 or SLES15-SP2). Current ReaR versions are available via the openSUSE build service projects "Archiving" and "Archiving:Backup:Rear" for direct RPM download from

Alternatively you may use the current ReaR upstream GitHub master code as described in the above section "Testing current ReaR upstream GitHub master code". Of course the current ReaR upstream GitHub master code is under continuous development so sometimes it may not work.

7.
Set up /etc/rear/local.conf by using /usr/share/rear/conf/examples/SLE11-ext3-example.conf as a template (copy it to /etc/rear/local.conf) and adapt it as you need (read the comments in that file and see "man rear" and /usr/share/rear/conf/default.conf). If you do not have a /usr/share/rear/conf/examples/SLE11-ext3-example.conf file, install a more up-to-date ReaR version (at least ReaR version 2.7).

8.
Check that MozillaFirefox (or whatever example application you use) is normally usable on the system and do some user-specific settings (e.g. save some bookmarks in MozillaFirefox). It depends on the networking settings for KVM/QEMU on the host of the virtual machine whether or not you can access outer networks (like https://www.suse.com on the Internet) from within the virtual machine or if you can only access files locally on the virtual machine (like /usr/share/pixmaps).

9.
Now run

rear -d -D mkbackup

On your NFS server machine you should get a 'backup.tar.gz' and a 'rear-hostname.iso' in a 'hostname' sub-directory of its exported '/nfs' directory (cf. above about the NFS server setup).

10.
Shut down the system where "rear -d -D mkbackup" was run to simulate that this system got destroyed.

11.
Boot the other virtual machine with that rear-hostname.iso and select on ReaR's boot screen "recover hostname" (i.e. use the manual recovery - not the automated recovery) and log in as root (no password).

12.
Now the ReaR recovery system runs on the other virtual machine. Therein run the ReaR recovery installer with

rear -d -D recover

You should get the system recreated on the other virtual machine.

13.
Shut down the ReaR recovery system and reboot the recreated system.

14.
Check that MozillaFirefox (or whatever example application you use) is still normally usable in the recreated system and check that your user-specific settings (e.g. the bookmarks in MozillaFirefox) still exist.

You can run "rear recover" from remote via ssh as follows:

In /etc/rear/local.conf set

USE_DHCLIENT="yes"

and something like

SSH_ROOT_PASSWORD="rear"

Never use your original root password here.
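Taken together, the corresponding /etc/rear/local.conf fragment for remote recovery via ssh could look like this sketch:

```shell
# Sketch of a /etc/rear/local.conf fragment for ssh access to the
# ReaR recovery system. Use a throwaway password here - never your
# original root password:
USE_DHCLIENT="yes"
SSH_ROOT_PASSWORD="rear"
```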

On your first virtual machine run

rear -d -D mkbackup

Boot the other virtual machine with the rear-hostname.iso and select on ReaR's boot screen "recover hostname" (i.e. use the manual recovery - not the automated recovery) and log in as root (no password).

Type "ifconfig" or "ip addr" to see the IP address of the ReaR recovery system, then log in from remote via ssh using the SSH_ROOT_PASSWORD value and run

rear -d -D recover

Debugging issues with Relax-and-Recover

Because ReaR is written entirely as bash scripts, debugging ReaR is usual bash debugging.

To only show what scripts would be run (i.e. what scripts would be "sourced" by the ReaR main script /usr/sbin/rear) for a particular rear command (without actually executing them), use the '-s' option (simulation mode), e.g.: "rear -s mkbackup" and "rear -s recover".

To debug ReaR run it both with the '-d' option (log debug messages) and with the '-D' option (debugscript mode) to log commands and their arguments as they are executed (via 'set -x'), e.g.: "rear -d -D mkbackup" and "rear -d -D recover".

Afterwards inspect the ReaR log file for further analysis.

The ReaR log files get stored in the /var/log/rear/ directory.
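When looking for the right log file to inspect, a tiny helper like the following sketch (the function name is made up for illustration) picks the newest one:

```shell
#!/bin/bash
# Hypothetical helper: print the path of the newest ReaR log file in a
# directory (default /var/log/rear) so it can be inspected or grepped.
latest_rear_log() {
    ls -t "${1:-/var/log/rear}"/rear-*.log 2>/dev/null | head -n1
}

# Example usage: show error and failure lines from the newest log:
# log="$(latest_rear_log)" && grep -iE 'error|fail' "$log"
```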

When "rear -d -D recover" finishes, the log file is copied into the recovered system, either into the /root/ directory or, for newer ReaR versions, into the separate /var/log/rear/recover/ directory. This keeps the log file from the last recovery safe and separated from other ReaR log files, so that it can be analyzed at any later time if needed.

When "rear -d -D recover" fails, you need to save the log file out of the ReaR recovery system (where "rear -d -D recover" was run and where it had failed) before you shut down the ReaR recovery system - otherwise the log file would be lost (because the ReaR recovery system runs in a ramdisk). Additionally the files in the /var/lib/rear/ directory and in its sub-directories in the ReaR recovery system (in particular /var/lib/rear/layout/disklayout.conf and /var/lib/rear/layout/diskrestore.sh) are needed to analyze a "rear -d -D recover" failure. See the "First steps with Relax-and-Recover" section above how to access the ReaR recovery system from remote via ssh so that you can use 'scp' to get files out of the ReaR recovery system.
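Getting the needed files out of the running ReaR recovery system could look like the following sketch (the IP address is a placeholder for the recovery system; ssh access must have been enabled via SSH_ROOT_PASSWORD as described in "First steps with Relax-and-Recover" above):

```shell
# Sketch only - 192.168.100.42 is a placeholder for the recovery system's IP.
# Run on another machine while the ReaR recovery system is still running:
scp root@192.168.100.42:/var/log/rear/rear-*.log .
scp -r root@192.168.100.42:/var/lib/rear .
```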

To analyze and debug a "rear mkrescue/mkbackup" failure the following information is mandatory:

  • ReaR version ("/usr/sbin/rear -V")
  • Operating system version ("cat /etc/os-release" or "lsb_release -a" or "cat /etc/rear/os.conf")
  • ReaR configuration files ("cat /etc/rear/local.conf" and/or "cat /etc/rear/site.conf")
  • Hardware (PC or PowerNV BareMetal or ARM) or virtual machine (KVM guest or PowerVM LPAR) of the original system
  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device) of the original system
  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot) of the original system
  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe) of the original system
  • The output of "lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT" on the original system
  • The output of "findmnt -a -o SOURCE,TARGET,FSTYPE -t btrfs,ext2,ext3,ext4,xfs,reiserfs,vfat" on the original system
  • Debug log file of "rear -d -D mkbackup" or "rear -d -D mkrescue" (in /var/log/rear/) on the original system
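The information from the original system listed above can be collected in one go with a script like the following sketch (the report file name is made up; "|| true" lets the script keep going when a command is unavailable, leaving its error message in the report):

```shell
#!/bin/bash
# Hypothetical collection script for the support information listed above.
# Each command's output (or error message) goes into one report file.
report="rear-support-info.txt"
{
    echo "### rear -V";    /usr/sbin/rear -V 2>&1 || true
    echo "### os-release"; cat /etc/os-release 2>&1 || true
    echo "### local.conf"; cat /etc/rear/local.conf 2>&1 || true
    echo "### lsblk";      lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT 2>&1 || true
    echo "### findmnt";    findmnt -a -o SOURCE,TARGET,FSTYPE -t btrfs,ext2,ext3,ext4,xfs,reiserfs,vfat 2>&1 || true
} > "$report"
echo "wrote $report"
```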

To analyze and debug a "rear recover" failure the following additional information is also mandatory:

  • Replacement hardware (PC or PowerNV BareMetal or ARM) or replacement virtual machine (KVM guest or PowerVM LPAR), see the above section "Fully compatible replacement hardware is needed"
  • System architecture (x86 compatible or PPC64/PPC64LE or what exact ARM device) of the replacement system (must be same as what is used on the original system)
  • Firmware (BIOS or UEFI or Open Firmware) and bootloader (GRUB or ELILO or Petitboot) on the replacement system (must be same as what is used on the original system)
  • Storage (local disk or SSD) and/or SAN (FC or iSCSI or FCoE) and/or multipath (DM or NVMe) on the replacement hardware (should match the original system as closely as possible). Replacement storage must behave the same as pristine new storage (see the above section "Prepare replacement hardware for disaster recovery")
  • Two times the output of "lsblk -ipo NAME,KNAME,PKNAME,TRAN,TYPE,FSTYPE,LABEL,SIZE,MOUNTPOINT" on the replacement system from within the ReaR recovery system before "rear -d -D recover" is called and also after it had failed (see above how to access the ReaR recovery system from remote via ssh so that you can more easily copy&paste command output)
  • Two times the output of "findmnt -a -o SOURCE,TARGET,FSTYPE -t btrfs,ext2,ext3,ext4,xfs,reiserfs,vfat" on the replacement system from within the ReaR recovery system before "rear -d -D recover" is called and also after it had failed (see above how to access the ReaR recovery system from remote via ssh so that you can more easily copy&paste command output)
  • Debug log file of "rear -d -D mkbackup" or "rear -d -D mkrescue" (in /var/log/rear/) that matches the "rear recover" failure (i.e. debug log from the original system of the "rear -d -D mkbackup" or "rear -d -D mkrescue" command that created the ReaR recovery system where "rear recover" failed)
  • Debug log file of "rear -d -D recover" (in /var/log/rear/) on the replacement system from within the ReaR recovery system where it had failed (see above how to save the log file out of the ReaR recovery system)
  • Contents of the /var/lib/rear/ directory and in its sub-directories on the replacement system from within the ReaR recovery system where "rear -d -D recover" had failed (see above how to save files out of the ReaR recovery system)

How to adapt and enhance Relax-and-Recover

Because ReaR is written entirely as bash scripts, adapting and enhancing ReaR is basically "usual bash scripting".

Often bash scripting is used primarily as some kind of workaround to get something done in a quick and dirty way.

Do not adapt and enhance ReaR in this way - unless you know for sure that your particular adaptions and enhancements could never be useful for anybody else, because nobody else could be hit by the particular issue that made you create them.

You got ReaR as free software and you benefit from it.

If you have an issue with ReaR, adapt and enhance it so that also others could benefit from your adaptions and enhancements.

This means you should contribute to ReaR upstream (see How to contribute to Relax-and-Recover) as follows:

How to contribute to Relax-and-Recover

  • report your issue at ReaR upstream so that also others know about it: https://github.com/rear/rear/issues
  • implement your adaptions and enhancements in a backward compatible way so that your changes do not cause regressions for others
  • provide comments in the source code of your adaptions and enhancements that explain what you did and why you did it so that others can easily understand the reasons behind your changes (even if all is totally obvious for you, others who do not know about your particular use case or do not have your particular environment may understand nothing at all about your changes)
  • follow the ReaR coding style guide: https://github.com/rear/rear/wiki/Coding-Style
  • submit your adaptions and enhancements to ReaR upstream so that others benefit from your work: http://relax-and-recover.org/development/

When you submit your adaptions and enhancements so that ReaR upstream can accept them, you will benefit even more from ReaR: this is the only way that your adaptions and enhancements become available in future ReaR releases, and that others who also benefit from them can keep them up to date for future ReaR releases.

In contrast when you do your adaptions and enhancements only on your own, you are left on your own.

If you have an issue with ReaR, but you are unable to adapt and enhance it yourself, you may let others do it for you: http://relax-and-recover.org/support/sponsors

Virtual machines

Usually the virtualization host software provides a snapshot functionality so that a whole virtual machine (guest) can be saved and restored. Using the snapshot functionality means that the virtual machine is saved in files which are specific to the virtualization host software in use, and those files are usually stored on the virtualization host itself. Therefore those files must be saved in an additional step (usually the complete virtualization host must be saved) to protect the virtual machine against failure of the virtualization host.

In contrast when using ReaR the virtual machine is saved as backup and ISO image which are independent of the virtualization host.

Full/hardware virtualization

With ReaR it is possible to save a fully virtual machine which runs in a particular full/hardware virtualization software environment on one physical machine and restore it in the same full/hardware virtualization software environment on another physical machine. This way it should be possible to restore a fully virtual machine on different replacement hardware, which mitigates the requirement to have same or compatible replacement hardware available. Nevertheless you must test whether this works in your particular case with your particular replacement hardware.

Usually it is not possible to save a fully virtual machine which runs in one full/hardware virtualization software environment and restore it in a different full/hardware virtualization software environment because different full/hardware virtualization software environments emulate different machines which are usually not compatible.

Paravirtualization

Paravirtualized virtual machines are a special case, in particular paravirtualized XEN guests.

A paravirtualized XEN guest needs a special XEN kernel (vmlinux-xen) and also a special XEN initrd (initrd-xen). The XEN host software which launches a paravirtualized XEN guest expects the XEN kernel and the XEN initrd in specific file names "/boot/ARCH/vmlinuz-xen" and "/boot/ARCH/initrd-xen" where ARCH is usually i386 or i586 or x86_64.

Furthermore a paravirtualized XEN guest needs in particular the special kernel modules xennet and xenblk to be loaded. This can be specified in /etc/rear/local.conf with a line "MODULES_LOAD=( xennet xenblk )", which lets the ReaR recovery system autoload these modules in the given order (see /usr/share/rear/conf/default.conf).
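As a copy-and-paste sketch, the corresponding /etc/rear/local.conf fragment would be:

```shell
# Sketch of a /etc/rear/local.conf fragment for a paravirtualized XEN guest:
# autoload the XEN network and block device modules in the ReaR recovery
# system, in this order (see /usr/share/rear/conf/default.conf):
MODULES_LOAD=( xennet xenblk )
```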

ReaR does not provide functionality to create a special "medium" that can be used directly to launch a paravirtualized XEN guest. ReaR creates an ordinary bootable ISO image which boots on ordinary PC hardware, with kernel and initrd located in the root directory of the ISO image.

To use ReaR to recreate a paravirtualized XEN guest, the configuration of the XEN host must be adapted so that it can launch the ReaR recovery system on a paravirtualized XEN guest. Basically this means launching a paravirtualized XEN guest from an ordinary bootable ISO image.

Remember: There is no such thing as a disaster recovery solution that "just works". Therefore, when it does not work, you might change your system to be simpler (e.g. use full/hardware virtualization instead of paravirtualization) or you have to manually adapt and enhance the disaster recovery framework to make it work for your particular case.

Non PC compatible architectures

Non PC compatible architectures are those that are neither x86/i386/i486/i586 (32-bit) nor x86_64 (64-bit) like ppc, ppc64, ia64, s390, s390x.

Recovery medium compatibility

ReaR (up to version 1.17.2) creates a traditional El Torito bootable ISO image which boots on PC hardware in the traditional way (i.e. it boots in BIOS mode - not in UEFI mode). For a UEFI bootable ISO image one needs at least ReaR version 1.18 plus the ebiso package (cf. "rear118a" in the "rear / rear116 / rear1172a / rear118a / rear23a / rear27a" section above), or a sufficiently recent ReaR version (at least ReaR version 2.7) plus a sufficiently recent xorriso version (cf. https://github.com/rear/rear/issues/3084, therein in particular https://github.com/rear/rear/issues/3084#issuecomment-1891492816).

ReaR does not provide special functionality to create whatever kind of special bootable "medium" that can be used to boot on non PC compatible architectures.

Therefore ReaR cannot be used without appropriate enhancements and/or modifications on hardware architectures that cannot boot an El Torito bootable medium.

Launching the ReaR recovery system via kexec

It should always be possible to launch the ReaR recovery system via kexec from an arbitrary already running system on your replacement hardware (provided that already running system supports kexec).

On your replacement hardware have an already running small and simple system which can be any system that supports kexec and which can be installed as you like (e.g. from an openSUSE or SUSE Linux Enterprise install medium), cf. the above section "Prepare replacement hardware for disaster recovery".

That already running small and simple system does not need to be compatible with the system that you intend to recreate. For example when you intend to recreate a SUSE Linux Enterprise 12 system with a possibly complicated structure of various filesystems, the already running system could be a minimal openSUSE Tumbleweed system with only one single ext4 root filesystem.

It is recommended that the already running system is up-to-date, because a more up-to-date kernel should be better prepared against possible kexec failures caused by hardware operations from the old kernel that are still ongoing after kexec (e.g. old DMA operations that still write somewhere in memory) while the new kernel is already running.

On the original system that you intend to recreate, let ReaR copy its plain initrd (which contains the ReaR recovery system) plus the original kernel of that system to ReaR's output location by using 'OUTPUT=RAMDISK'.

On ReaR versions before ReaR 2.6 'OUTPUT=RAMDISK' may not yet work well, cf. https://github.com/rear/rear/pull/2149 and https://github.com/rear/rear/issues/2148 so you should use ReaR version 2.6 or alternatively you may try out current ReaR upstream GitHub master code, cf. the section "Testing current ReaR upstream GitHub master code" above.

In the already running system on your replacement hardware kexec that original kernel of the system that you intend to recreate plus the matching ReaR recovery system initrd to launch the ReaR recovery system.

An example etc/rear/local.conf could be like (excerpts):

OUTPUT=RAMDISK
BACKUP=NETFS
BACKUP_URL=nfs://your.NFS.server.IP/path/to/your/rear/backup

Via OUTPUT=RAMDISK "rear mkrescue/mkbackup" copies the kernel (by default named kernel-HOSTNAME) plus the ReaR recovery system initrd (by default named initramfs-HOSTNAME.img) to the output location at the same place where the backup gets stored (i.e. what is specified by BACKUP_URL).

To recreate that system on your replacement hardware boot the already installed system on your replacement hardware and therein do those steps:

1.)
Copy the kernel and the ReaR recovery system initrd from ReaR's output location into your already running system on your replacement hardware.

2.)
Load the kernel and ReaR's initrd and provide an appropriate kernel command line to run the ReaR recovery system in a ramdisk using standard VGA 80x25 text mode (you may have to add special hardware-dependent parameters from the original kernel command line in /proc/cmdline of the system that you intend to recreate), for example:

kexec -l kernel-HOSTNAME --initrd=initramfs-HOSTNAME.img --command-line='root=/dev/ram0 vga=normal rw'

3.)
Kexec the loaded kernel (plus ReaR's initrd) which will instantly reboot into the loaded kernel without a clean shutdown of the currently running system:

kexec -e

4.)
In the booted ReaR recovery system log in as 'root' (this may only work directly on the system console) and recover your system, which will completely replace the system from before, so it does not matter that it was not cleanly shut down:

rear -D recover

See also the issue "Alternatively do kexec instead of regular boot to speed up booting" https://github.com/rear/rear/issues/2186 at ReaR upstream.

Bootloader compatibility

Basically GRUB as used on usual PC hardware is the only supported bootloader.

There might be some kind of limited support for special bootloader configurations but one cannot rely on it.

Therefore it is recommended to use GRUB with a standard configuration.

If GRUB with a standard configuration cannot be used on non PC compatible architectures, appropriate enhancements are needed to add support for special bootloader configurations.

It is crucial to check in advance whether or not it is possible to recreate your particular non PC compatible systems with ReaR or any other disaster recovery procedure that you use.

All filesystems are equal, but some are more equal than others

ext2 ext3 ext4

When the standard Linux filesystems ext2, ext3, ext4 are used with standard settings there should be no issues, but special filesystem tuning settings may need manual adaptions to make things work for your particular special case.

btrfs

First and foremost: As of this writing (October 2013) the btrfs code base is still under heavy development, see btrfs.wiki.kernel.org which means: When you use btrfs do not expect that any kind of disaster recovery framework "just works" for btrfs.

Since ReaR version 1.12 only basic btrfs filesystem backup and restore may work (but no subvolumes). When btrfs is used with subvolumes that contain normal data (no snapshots), at least ReaR version 1.15 is required that provides some first basic support to recreate some kind of btrfs subvolume structure so that backup and restore of the data could work. Since ReaR version 1.17 there is generic basic support for btrfs with normal subvolumes (but no snapshot subvolumes). Note the "basic support". In particular there is no support for "interwoven btrfs subvolume mounts" which means when subvolumes of one btrfs filesystem are mounted at mount points that belong to another btrfs filesystem and vice versa (cf. the Relax-and-Recover upstream issue 497 and the openSUSE Bugzilla bug 908854).

When btrfs is used with snapshots (i.e. with subvolumes that contain btrfs snapshots), usual backup and restore cannot work. The main reason is: when there are btrfs snapshot subvolumes and you run usual file-based backup software (e.g. 'tar') to back up the whole data of the btrfs filesystem, then everything in the snapshot subvolumes gets backed up as complete files. For example assume there is an 8GB disk with a btrfs filesystem where 5GB disk space is used and there is a recent snapshot. A recent snapshot needs almost no disk space because of btrfs' copy-on-write functionality. But 'tar' would create a backup with an uncompressed size of about 10GB, because the files appear twice: under their regular path and additionally under the snapshot subvolume's path. It would be impossible to restore that tarball on the disk. This means btrfs snapshot subvolumes cannot be backed up and restored as usual with file-based backup software.

The same kind of issue can happen with all filesystems that implement copy on write functionality (e.g. OCFS2). For background information you may read about "reflink" (versus "hardlink").

For disaster recovery with Relax-and-Recover on SUSE Linux Enterprise 12 GA the special RPM package rear116 is required. It provides ReaR version 1.16 with special adaptions and enhancements for the default btrfs subvolume structure in SUSE Linux Enterprise 12 GA, but restore of btrfs snapshot subvolumes cannot be supported (see above and see the "Fundamentals about Relax-and-Recover presentation PDF" below). There are changes in the default btrfs subvolume structure in SUSE Linux Enterprise 12 SP1 that require further adaptions and enhancements in ReaR, see the Relax-and-Recover upstream issue 556. Therefore for disaster recovery with Relax-and-Recover on SUSE Linux Enterprise 12 SP1 the special RPM package rear1172a is required, and for SUSE Linux Enterprise 12 SP2 at least the RPM package rear118a should be used - preferably use the highest available ReaR version (cf. above "rear / rear116 / rear1172a / rear118a / rear23a / rear27a" and "Version upgrades with Relax-and-Recover").

Help and Support

Feasible in advance

Help and support is feasible only "in advance", while your system is still up and running, when something does not work during your testing on replacement hardware: whether you can recreate your system with your recovery medium, whether the recreated system boots on its own, and whether all its system services still work as you need them in your particular case.

Hopeless in retrospect

Help and support is usually hopeless "in retrospect" when it fails to recreate your system on replacement hardware after your system was destroyed.

The special ReaR recovery system provides a minimal set of tools which could help in some cases to fix issues while it recreates your system, see the "Relax-and-Recover" section above. A precondition is that the ReaR recovery system at least boots correctly on your replacement hardware. If the ReaR recovery system fails to boot, it is usually a dead end.

Be prepared for manual recreation

If recreating your system finally fails, you have to manually recreate your basic system - in particular your disk partitioning with filesystems and mount points - and afterwards manually restore your backup, e.g. from your NFS server or from wherever else your backup is stored. Therefore you must have at least your partitioning, filesystem, mount point, and networking information available so that you can manually recreate your system. It is recommended to manually recreate your system on your replacement hardware as an exercise, so that you are prepared.
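That layout and networking information can be collected with standard tools while the system is still up and running. A minimal sketch; the output directory /root/layout-info is only an example - keep a copy of these files together with your backup (e.g. on your NFS server), not only on the system itself:

```shell
# Collect the information needed for manual recreation while the
# system is still running (run as root):
mkdir -p /root/layout-info

# Disk partitioning, filesystems, and mount points:
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT > /root/layout-info/lsblk.txt
cat /etc/fstab > /root/layout-info/fstab

# Networking setup:
ip addr show  > /root/layout-info/ip-addr.txt
ip route show > /root/layout-info/ip-route.txt
```

With those files at hand you can redo the partitioning, filesystems, and mount points by hand and then restore the backup into the recreated filesystem tree.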

It is crucial to have replacement hardware available in advance to verify that you can recreate your system because there is no such thing as a disaster recovery solution that "just works".

SUSE support for Relax-and-Recover

SUSE provides Relax-and-Recover (ReaR) via the SUSE Linux Enterprise High Availability Extension (SLE-HA) which is the only SUSE product/extension where SUSE officially supports ReaR.

For other SUSE products (like plain SLES without the HA extension) SUSE supports, on a voluntary basis, only the newest ReaR upstream version - or preferably the ReaR upstream GitHub master code directly from ReaR upstream at https://github.com/rear/rear - see the above sections "Version upgrades with Relax-and-Recover", "Testing current ReaR upstream GitHub master code", "Debugging issues with Relax-and-Recover", and "How to contribute to Relax-and-Recover".

ReaR is not, and is not meant to be, a "ready to use solution for disaster recovery", see the above section "Inappropriate expectations". Instead ReaR is a framework that experienced users and system admins can use to build up their particular disaster recovery procedure. Therefore ReaR is written entirely in the native language of system administration: as shell (bash) scripts. The intent is that ReaR users can and should adapt or extend the ReaR scripts as needed to make things work for their specific case, see the above section "Disaster recovery with Relax-and-Recover (ReaR)".

So in general there is no such thing as a "ready to use solution for disaster recovery" that is provided and/or supported by SUSE. In particular, SUSE does not provide RPM package updates with adaptations and/or enhancements for particular use cases, because in general it is almost impossible to foresee whether regressions could happen for other use cases. Instead SUSE provides from time to time ReaR version upgrades as separate RPM packages (cf. the above section "RPM packages for disaster recovery").

On the other hand "SUSE supports ReaR" in the sense that SUSE may offer help so that users can adapt or extend their ReaR scripts to make things work for their particular case (for reasonable use cases, as far as possible with reasonable effort).

Therefore all ReaR scripts are marked as "config(noreplace)" in SUSE's rear* RPM packages since ReaR version 1.17.1 (cf. the above section "RPM packages for disaster recovery"), so that ReaR scripts that were changed by the user are not overwritten by an RPM package update. Scripts in an RPM package update that differ from scripts changed by the user get installed as '.rpmnew' files. After an RPM package update the user must compare each '.rpmnew' script with his changed script to find out whether the changed script still works for his particular use case or whether further adaptations are needed, followed by a careful and complete re-validation that his particular disaster recovery procedure still works, cf. the above section "Version upgrades with Relax-and-Recover".
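Such a review after an RPM package update can be sketched as follows, assuming ReaR's usual configuration and script locations (/etc/rear and /usr/share/rear):

```shell
# List every '.rpmnew' file that an RPM package update left behind in
# ReaR's directories and show how it differs from the user's
# (possibly modified) script of the same name:
find /etc/rear /usr/share/rear -name '*.rpmnew' 2>/dev/null |
while read -r new ; do
    old="${new%.rpmnew}"
    echo "=== $old versus $new ==="
    diff -u "$old" "$new" || true   # diff exits non-zero when files differ
done
```

After merging any needed changes into the user-modified scripts, the '.rpmnew' files can be removed - and the whole disaster recovery procedure must be re-validated.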

See also

Related articles

External links