openSUSE:High Availability


Overview

openSUSE includes a suite of high availability clustering software, notably Corosync, Pacemaker, DRBD, OCFS2 and various related management tools. All of this software is included in the base openSUSE distribution, but we also have repositories on the Open Build Service (OBS) that provide newer versions of these packages for various distributions.

Development of these packages takes place in the network:ha-clustering:Factory repository on the Open Build Service, with periodic submissions to openSUSE:Tumbleweed.

The packages in network:ha-clustering:Stable are largely out of date and should not be relied on; the repository mainly exists to provide stable builds of crmsh for other distributions such as Fedora. For stable versions of the HA packages, use those available for openSUSE:Leap. The packages in openSUSE:Tumbleweed are tested by openQA and generally work well, but they are development versions, so they are not recommended for production use.

Moving from openSUSE 12.x to 13.x/Leap/Tumbleweed

In openSUSE 13.x we moved from corosync 1.4.x to corosync 2.x, and from the v0 Pacemaker plugin to the Pacemaker master control process (MCP). As a result, both the corosync configuration and parts of the cluster protocol changed, which breaks online rolling upgrades.

If you are using anything that requires DLM, or are using a clustered filesystem such as OCFS2, you will have to take the entire cluster offline for the duration of the upgrade.

If you're not using a clustered filesystem or DLM, it's probably viable to put the cluster into maintenance mode, shut down the cluster, and upgrade each node in turn, but note that as of 2013-08-16 this has not been tested for upgrades from openSUSE 12.x to Factory.
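The maintenance-mode step above can be sketched with crmsh, which sets the Pacemaker cluster property maintenance-mode. This is a minimal sketch, not a tested upgrade procedure: the helper names run, maintenance_on and maintenance_off and the DRY_RUN variable are hypothetical, and stopping the stack and upgrading each node are deliberately left out because the service names differ between releases.

```shell
#!/bin/sh
# Sketch only: the 12.x -> 13.x rolling procedure described above is untested.
# run is a hypothetical helper: with DRY_RUN set, it prints each command
# instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Once, on any node, before stopping the cluster stack: Pacemaker stops
# starting, stopping and monitoring resources but leaves them running.
maintenance_on() {
    run crm configure property maintenance-mode=true
}

# Once, after every node has been upgraded and has rejoined the cluster.
maintenance_off() {
    run crm configure property maintenance-mode=false
}
```

Between the two calls, each node still needs its corosync configuration migrated to the corosync 2.x format before the new stack is started. Running DRY_RUN=1 maintenance_on prints the command without touching the cluster.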

Known Issues

If you're running a cluster of openSUSE 13.1 VMs on an openSUSE 12.3 KVM host, multicast may stop working after a while, which will break your cluster. This problem is described in detail at https://bugzilla.redhat.com/show_bug.cgi?id=880035, and it can be worked around by running the following command on the VM host:

echo 1 > /sys/class/net/virbr0/bridge/multicast_querier

Substitute your bridge interface name for "virbr0" as appropriate. Alternatively, configure the cluster to use udpu (UDP unicast) instead of multicast.
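The workaround above can be wrapped in a small function that takes the bridge name as an argument and fails loudly if the sysfs file is missing, for example because of a misspelt bridge name. This is a sketch: enable_multicast_querier and the SYSFS_ROOT override are hypothetical names, and SYSFS_ROOT exists only so the function can be exercised against a scratch directory. Setting multicast_querier to 1 makes the bridge itself send IGMP queries, so group memberships learned through multicast snooping keep being refreshed instead of timing out. Like any sysfs write, the setting is lost when the host reboots or the bridge is recreated.

```shell
#!/bin/sh
# Enable the multicast querier on a Linux bridge on the VM host (run as root).
# $1 is the bridge name (default: virbr0). SYSFS_ROOT is a hypothetical
# override for testing and normally stays unset, meaning /sys.
enable_multicast_querier() {
    bridge="${1:-virbr0}"
    knob="${SYSFS_ROOT:-/sys}/class/net/$bridge/bridge/multicast_querier"
    # A missing or read-only file usually means a wrong bridge name or no root.
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob" >&2
        return 1
    fi
    echo 1 > "$knob"
}
```

For example, run enable_multicast_querier br0 on a host whose cluster VMs are attached to br0.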

Hawk

There is a usage guide for Hawk (the HA Web Konsole) which includes a Vagrantfile and Chef recipes for configuring an HA cluster running on openSUSE Leap. The resulting cluster is meant for development and is not recommended for production use, but it can serve as an introduction to using HA with openSUSE Leap.

For more information, see the Hawk Guide.

Discussion

All upstream discussion regarding the use of HA has been brought together in the ClusterLabs users mailing list:

For HA on openSUSE specifically, please subscribe to:

You'll also find us in the #clusterlabs channel on freenode.

Reporting Bugs

If you find bugs in the high availability packages in openSUSE Tumbleweed, please file them against the High Availability component.

Fedora/Red Hat/CentOS Packages in OBS

The Fedora, Red Hat and CentOS builds of corosync, openais and pacemaker in network:ha-clustering:Stable exist largely to make it possible to build crmsh and hawk. crmsh and hawk are expected to be stable, so by all means install and use these tools! For the underlying cluster stack (corosync and pacemaker), however, it's recommended to use whatever official builds are available for your distribution.

Important Note for Package Maintainers

To move a known good package from network:ha-clustering:Factory to network:ha-clustering:Stable, use osc submitreq:

osc submitreq network:ha-clustering:Factory <package> network:ha-clustering:Stable

As of 2013-08-06 I'm uncertain what effect this has if the package in network:ha-clustering:Stable has extra patches, spec files, etc., so take the same care as described in the note below about using osc copypac.

You can also use osc copypac if for some reason you need to move a specific older version of a package:

osc copypac -e -r<N> network:ha-clustering:Factory <package> network:ha-clustering:Stable

<N> is the revision to use (check osc log if in doubt) and <package> is the package name. For example:

osc copypac -e -r53 network:ha-clustering:Factory hawk network:ha-clustering:Stable

Note that this will completely clobber whatever package already exists in network:ha-clustering:Stable, so BEFORE RUNNING osc copypac, make sure the target package doesn't have any extra patches, spec files, updated change logs or anything else that would be destroyed. If in doubt, check out the Factory and Stable packages to your local machine and run a recursive diff on them (meld is particularly handy for this, but there are plenty of good diff tools out there, so take your pick).
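The check described above can be scripted. This sketch checks out both copies of a package into a scratch directory with osc checkout and compares them with diff -ru, skipping the .osc metadata directories that osc keeps in each working copy. The function name compare_factory_stable and the DRY_RUN switch are hypothetical. Review every difference diff reports: anything present only in the network:ha-clustering:Stable copy is what osc copypac would destroy.

```shell
#!/bin/sh
# Compare the Factory and Stable copies of one package before "osc copypac".
# $1: package name; $2: optional work directory (default: a new temp dir).
# run is a hypothetical helper: with DRY_RUN set it prints commands instead.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The body runs in a subshell so the cd does not affect the caller.
compare_factory_stable() (
    pkg="$1"
    workdir="${2:-$(mktemp -d)}"
    cd "$workdir" || exit 1
    run osc checkout network:ha-clustering:Factory "$pkg" || exit 1
    run osc checkout network:ha-clustering:Stable "$pkg" || exit 1
    # osc checkout creates PROJECT/PACKAGE directories; .osc holds metadata.
    # diff exits with status 1 when the two trees differ.
    run diff -ru -x .osc \
        "network:ha-clustering:Factory/$pkg" \
        "network:ha-clustering:Stable/$pkg"
)
```

For example, compare_factory_stable hawk ~/tmp/hawk-check leaves both working copies in ~/tmp/hawk-check, where you can also open the two package directories side by side in meld.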