- 1 HA Storage
- 2 Definitions
- 3 Used Technologies
- 4 Configuration
- 5 Communication
Linux is well suited for many server tasks; NAS operating systems in particular often rely on a specially crafted Linux. From a Linux perspective, openSUSE and SLES are top performers with regard to performance and stability, and most configuration tasks are easy to manage.
This project goes a step further. It aims to combine high-availability features like drbd or a HA NFS server with new performance features like bcache, all on top of SLES 11 SP1. All of the code will be publicly available in the openSUSE Build Service in the project network:storage.
The main paradigm to keep in mind is that any hardware and any software may fail under certain conditions. Whenever possible, we therefore want redundancy that fails over as quickly as possible. On the hardware side, everything is duplicated: the disks are mirrored, and two controllers are responsible for exporting the block devices.
The software is selected as follows (best solution first):
- active/active systems. With every piece of software that can accomplish the task this way, we simply run the services on both controllers. This holds at least for all lower-level block devices, the device aggregation, and the volume split-up, up to mirroring the block devices with drbd.
- active/passive systems. The plan is to set up LIO as active/passive. This is still untested, and we will see how it works out.
- active on only one controller. Some services like NFS or FTP cannot run on multiple controllers at the same time. For this kind of service, openAIS will be used to restart the needed service if it fails on one controller. However, because openAIS requires a STONITH device and typically does not care about other services on a node, it will run in a virtual guest only. This also means that the controllers will serve as virtualization hosts.
For our project, we have two machines available. Both have QLogic FC cards and a direct interconnect over a dedicated GBit Ethernet cable. Storage comes partly from local disks and partly from FC storage.
The controller is set up twice. We assume that both controllers have the same amount and quality of storage available. Mirroring of the storage devices is done by the controllers via drbd.
The controller does the following tasks:
- import all physical block devices (local, iSCSI, FC).
- aggregate the devices.
- split the storage up into volumes.
- mirror the volumes to the other controller.
- export the devices via protocols that allow either active/active or active/passive configurations.
- provide those volumes to the virtual appliance in cases where the export server can run only once (NFS, FTP).
- (optional) run bcache on mirrored devices to increase performance.
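The task list above can be sketched as a command sequence. This is only an illustration: the device names (/dev/sdb, /dev/sdc), sizes, and the drbd resource name "r0" are placeholder assumptions, not project settings.

```sh
# Aggregate local disks into a volume group (device names are examples)
vgcreate L0J_0 /dev/sdb /dev/sdc

# Split the storage up into volumes
lvcreate -n L0J_0_001 -L 100G L0J_0

# Mirror the volume to the other controller with drbd
# (resource "r0" must be defined in /etc/drbd.conf on both controllers)
drbdadm create-md r0
drbdadm up r0
```

The same sequence runs on both controllers; drbd then keeps the two copies of each volume in sync over the dedicated interconnect.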
File Server Appliance
For NFS and FTP exports, an openAIS setup is required. These services can run on only one node at a time and should be controlled by the cluster software. We plan to use a SUSE Studio appliance to set up this part of the system.
- receive block devices from the controller
- make sure NFS and FTP always run on exactly one appliance
- make sure an extra IP address is migrated accordingly
- enable webyast on the Appliance
- openAIS configuration
- configuration of hastor via webyast and dbus
- config database structure to be served from configuration vm
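A possible openAIS/Pacemaker resource setup for the appliance could look like the following crm shell fragment. The resource names, the service IP 192.168.0.67, the mount point, and the filesystem type are illustrative assumptions, not taken from the project:

```
primitive p_ip ocf:heartbeat:IPaddr2 \
    params ip=192.168.0.67 cidr_netmask=24
primitive p_fs ocf:heartbeat:Filesystem \
    params device=/dev/xvdb directory=/srv/volume/xvdb fstype=ext3
primitive p_nfs lsb:nfsserver
group g_fileserver p_fs p_ip p_nfs
```

Grouping the filesystem, the extra IP address, and the NFS service ensures that all three always move together to whichever appliance is active.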
There is only one virtual machine on each controller, named fileserver. The fileserver VM runs from a local disk image, and its root filesystem is /dev/xvda1.
The configuration is stored on a drbd device (drbd0). It is exported read-write to each fileserver VM, but only the VM that holds the NFS address may write to it. This device is available as /dev/xvdb in each VM and as the volume "config" on each host.
fileserver root: /var/lib/xen/images/fileserver/disk0
Paths are only used inside the fileserver VM. Each of the NFS- or FTP-exported volumes is mounted by openAIS to /srv/volume/xvd??. The mapping between volumes and exports is stored on a special config volume (/dev/xvdb).
Path to volumes: /srv/volume/xvd?
Root filesystem: /dev/xvda1
Configuration: /dev/xvdb
Drbd uses a global configuration. The connection between the two controllers runs over a direct network link, either bonded or plain. Drbd uses the IP address 192.168.1.1 for controller a and 192.168.1.2 for controller b.
IP controllera: 192.168.1.1
IP controllerb: 192.168.1.2
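A drbd resource definition matching these addresses might look like the following. The resource name "config" (for the configuration device drbd0), the backing disk path, and the port are assumptions for illustration:

```
resource config {
    device    /dev/drbd0;
    disk      /dev/L0J_0/L0J_0_001;   # backing LV, name is an example
    meta-disk internal;
    on controllera {
        address 192.168.1.1:7788;
    }
    on controllerb {
        address 192.168.1.2:7788;
    }
}
```

The same file is deployed on both controllers; only the `on <host>` sections distinguish the two sides.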
Volume Names and Logical Volumes
The hosts use the following naming schema:
vgcreate LR0J_0 /dev/????
         |  | |
         |  | '- number of volume group of this type
         |  '--- 0J: JBOD, else: RAID level
         '------ L: local storage, R: remote storage

lvcreate -n LR0J_0_001 LR0J_0
            |      |
            |      '- number of volume
            '-------- ID of volume group
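The schema can be sketched as a small shell snippet; the helper functions below are hypothetical and only demonstrate how the names are composed:

```shell
# Compose a VG name from location (L: local, R: remote), layout
# (0J for JBOD, else the RAID level), and the group number.
# Helper names are hypothetical, not part of the project.
vg_name() { printf '%s%s_%s' "$1" "$2" "$3"; }
# Compose an LV name from the VG ID and the volume number.
lv_name() { printf '%s_%03d' "$1" "$2"; }

vg=$(vg_name L 0J 0)   # local JBOD storage, group 0 -> L0J_0
lv=$(lv_name "$vg" 1)  # first volume in that group  -> L0J_0_001
echo "$vg $lv"
```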
A management bridge is implemented. It uses vlan8 over the drbd network connection to the other host. The management bridge is available to the respective fileserver VMs.
Connection between hosts: vlan8
IP controllera (on management bridge): 192.168.0.1
IP controllerb (on management bridge): 192.168.0.2
IP fileserver VM a: 192.168.0.65
IP fileserver VM b: 192.168.0.66
The management bridge is also used by openAIS to communicate between the two fileserver nodes.
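On SLES 11 SP1, the openAIS communication layer is configured via /etc/corosync/corosync.conf. A totem section bound to the management network could look like the following sketch; the multicast address and port are common defaults, not project settings:

```
totem {
    version: 2
    secauth: off
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
```

With bindnetaddr set to the management network, both fileserver nodes join the same cluster ring over the bridge.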
The following protocols and features are supported:
- local disks (SATA, SCSI)
- iSCSI disks
- FC disks
- RAID levels
FC and iSCSI distribution
- targets that are available with LIO
NFS and FTP distribution
- File Server Appliance