openSUSE:High Availability

Overview

openSUSE includes a suite of high availability clustering software, notably Corosync, Pacemaker, DRBD, OCFS2 and various related management tools. All of this software is included in the base openSUSE distribution, but we also have repositories on OBS to provide newer versions of these packages for various distros.

Historically these packages were split into two projects: network:ha-clustering (stable packages) and network:ha-clustering:Factory (unstable/development packages for the next openSUSE release). As of August 2013, we have transitioned to the following project structure:

The packages in network:ha-clustering:Unstable link to the packages in network:ha-clustering:Factory, which in turn link to those in openSUSE:Factory. New development will generally occur in network:ha-clustering:Factory, with periodic submissions to openSUSE:Factory.

The packages in network:ha-clustering:Stable were manually copied from (older) stable versions in network:ha-clustering:Factory, from previously existing packages in network:ha-clustering, or both. In future, once the development packages stabilize again, they will be copied to network:ha-clustering:Stable for general use.

Using Pacemaker on openSUSE 12.x

Pacemaker on openSUSE 12.x runs in corosync + pacemaker plugin (v0) mode. Largely to avoid breaking rolling upgrades, we still use the openais init script. To bring the cluster up, install corosync, pacemaker and openais (plus resource-agents, etc.), copy /etc/corosync/corosync.conf.example to /etc/corosync/corosync.conf, adjust bindnetaddr to match your network, and run rcopenais start.
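The setup steps above can be sketched in shell. This is a minimal sketch, not an official procedure: set_bindnetaddr is a hypothetical helper written for this page, 192.168.1.0 is a placeholder network address, and the package list simply mirrors the one named above. The commands that touch the system are left as comments so the snippet has no side effects when sourced.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with corosync): set the bindnetaddr value
# in a corosync.conf-style file, leaving indentation and other lines alone.
set_bindnetaddr() {
    conf=$1
    addr=$2
    tmp="${conf}.tmp.$$"
    # Replace everything after "bindnetaddr:" on any line that starts with it.
    # Write back with cat so the file keeps its ownership and permissions.
    sed "s/^\([[:space:]]*bindnetaddr:\).*/\1 ${addr}/" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf" &&
        rm -f "$tmp"
}

# Sequence described above, run as root on each node (192.168.1.0 is a
# placeholder; use the network address of your cluster interface):
#   zypper install corosync pacemaker openais resource-agents
#   cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
#   set_bindnetaddr /etc/corosync/corosync.conf 192.168.1.0
#   rcopenais start
```

Editing the file by hand works just as well; the helper only makes the bindnetaddr change repeatable across nodes.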

This will change in openSUSE 13.1 when we move to corosync 2.x.

What to Expect in openSUSE 13.1 (Factory)

We're moving from corosync 1.4.x to corosync 2.x, and from the v0 pacemaker plugin to the pacemaker MCP. Several things change as a result (the corosync configuration and parts of the protocol), which breaks online rolling upgrades.

If you are using anything that requires DLM, or are using a clustered filesystem such as OCFS2, you will have to take the entire cluster offline for the duration of the upgrade.

If you're not using a clustered filesystem or DLM, it's probably viable to put the cluster into maintenance mode, shut down the cluster, and upgrade each node in turn, but note that as of 2013-08-16 this has not been tested for upgrades from openSUSE 12.x to Factory.

Known Issues

If you're running a cluster of openSUSE 13.1 VMs on an openSUSE 12.3 KVM host, multicast may stop working after a while, which will break your cluster. This problem is covered in quite some detail at https://bugzilla.redhat.com/show_bug.cgi?id=880035, but can be worked around by running the following command on the VM host:

echo 1 > /sys/class/net/virbr0/bridge/multicast_querier

Substitute your bridge interface name for "virbr0" as appropriate. Alternatively, configure the cluster to use udpu instead of multicast.
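If the VM host has several bridges, the workaround can be wrapped in a small function that takes the bridge name as an argument. This is only a sketch: enable_multicast_querier is a hypothetical name, and the optional second argument (an alternative sysfs root) is an assumption added so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: turn on the multicast querier for a named Linux bridge.
# $1 = bridge name, $2 = sysfs root (optional, defaults to /sys/class/net).
enable_multicast_querier() {
    bridge=$1
    sysfs_root=${2:-/sys/class/net}
    knob="${sysfs_root}/${bridge}/bridge/multicast_querier"
    if [ ! -e "$knob" ]; then
        echo "no multicast_querier setting for bridge '${bridge}'" >&2
        return 1
    fi
    echo 1 > "$knob"
}

# Example, as root on the VM host (br0 is a placeholder bridge name):
#   enable_multicast_querier br0
```

A value written to sysfs does not survive a reboot, so the call has to be repeated after the host restarts.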

Discussion

For HA on openSUSE specifically, please subscribe to:

For more general (non-openSUSE-specific) discussion, you should also check out the relevant upstream mailing lists; everybody working on HA on openSUSE should be subscribed to these too:

You'll also find us on the #linux-ha and #linux-cluster channels on freenode.

Reporting Bugs

If you find bugs in the high availability packages in openSUSE Factory, please file them against the High Availability component.

Fedora/RedHat/CentOS Packages in OBS

The Fedora/RedHat/CentOS builds of corosync, openais and pacemaker in network:ha-clustering:Stable exist largely to facilitate builds of crmsh and hawk. crmsh and hawk are expected to be stable, so by all means please install and use these tools! But for the underlying cluster stack (corosync+pacemaker), it's recommended to use whatever official builds are available for your distro.

Important Note for Package Maintainers

To move a known good package from network:ha-clustering:Factory to network:ha-clustering:Stable, use osc submitreq:

osc submitreq network:ha-clustering:Factory <package> network:ha-clustering:Stable

As of 2013-08-06 I'm uncertain what effect this has if a package in network:ha-clustering:Stable has extra patches, spec files, etc., so take the same care described in the note below about osc copypac.

You can also use osc copypac if for some reason you need to move a specific older version of a package:

osc copypac -e -r<N> network:ha-clustering:Factory <package> network:ha-clustering:Stable

<N> is the revision to use (check osc log if in doubt) and <package> is the package name. For example:

osc copypac -e -r53 network:ha-clustering:Factory hawk network:ha-clustering:Stable

Note that this will completely clobber whatever package already exists in network:ha-clustering:Stable, so BEFORE RUNNING osc copypac ensure that the target package doesn't have any extra patches, spec files, updated change logs or anything else that will be destroyed. If in doubt, check out the Factory and Stable packages to your local machine, then do a recursive diff on them (meld is a particularly handy tool for this, but there are plenty of good diff tools out there, so take your pick).
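The pre-copy check can be sketched with plain diff, for anyone without a graphical tool to hand. compare_packages is a hypothetical helper, hawk is reused from the example above, and the directory names assume that osc checkout places each package under a directory named after its project. The -x option (exclude) is available in GNU diff.

```shell
#!/bin/sh
# Hypothetical helper: recursively diff two checked-out package directories,
# skipping osc's .osc metadata directories.
# Returns diff's exit status: 0 = identical, 1 = differences, 2 = trouble.
compare_packages() {
    diff -ru -x .osc "$1" "$2"
}

# Typical use (not executed here):
#   osc checkout network:ha-clustering:Factory hawk
#   osc checkout network:ha-clustering:Stable hawk
#   compare_packages network:ha-clustering:Factory/hawk \
#                    network:ha-clustering:Stable/hawk
```

Any file that shows up only on the Stable side of the diff is something osc copypac would destroy.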