What is Ceph

Ceph is a unified, distributed storage system designed for excellent performance, reliability and scalability.


If you've got 20 minutes to spare, you might enjoy Tim Serong's Gentle Introduction to Ceph talk. Slides and some more links can be found here.

Upstream Ceph documentation can be found here.

The SUSE Enterprise Storage documentation should also be generally applicable to running Ceph on openSUSE.

Installing Ceph on openSUSE

Available Versions

Ceph is included in openSUSE Leap and Tumbleweed, but some of the packages needed to deploy it are still missing from the distributions. So, while no extra repositories are needed to *install* Ceph, one extra repository is needed to *deploy* it. As of 2020-01-30, installation and deployment have been tested on:

A very recent version of Ceph Octopus (15.x), which is still under active development, is also available:

openSUSE Tumbleweed is currently tracking the Ceph Octopus branch.

The OBS projects will shift as upstream releases occur; filesystems:ceph is the devel project for Ceph in Tumbleweed, and will generally track the latest release. LTS Ceph releases are from subprojects as mentioned above, and will go out with particular Leap releases. Similarly, upcoming releases will be staged in subprojects (e.g. Nautilus is in filesystems:ceph:nautilus).

Deploying Ceph

Ordinarily a Ceph cluster would consist of at least several physical hosts, each containing many disks (OSDs). But if you just want to play, you can create a toy Ceph cluster on a few VMs. If you're doing this on a laptop or small desktop system, and the VMs are backed by qcow2 volumes on the same disk, you really want to be using an SSD, not spinning rust.

Manual Setup

  • Install openSUSE Leap 15.x (or Tumbleweed, if you prefer to live on the edge) on at least three VMs.
  • You probably want to give each VM 2GB RAM and 2 CPU cores, and enable KSM on the VM host.
  • Give each of your test VMs at least one additional disk, at least 20GB in size. These will be your OSDs.
  • Make absolutely sure that hostname resolution works, i.e. $(hostname) and $(hostname -f) need to show something sensible when you're logged into your VMs. Your VM host will also want to be able to resolve these hostnames.
  • Make sure the clocks on all VMs are synchronized (Ceph monitors are sensitive to clock skew).
  • Add the filesystems:ceph:nautilus OBS repo to all the VMs.
  • Proceed to set up a Salt cluster and deploy Ceph using DeepSea (read on).
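
The hostname requirement in the list above can be checked with a few lines of shell before going any further. This is only a sketch: the helper name check_hosts is made up for illustration, and it assumes getent is available on the VMs.

```shell
# Preflight sketch: report whether each given host name resolves.
# check_hosts is a hypothetical helper, not part of any Ceph tooling.
check_hosts() {
    local failed=0 host
    for host in "$@"; do
        if getent hosts "$host" >/dev/null; then
            echo "ok: $host"
        else
            echo "FAIL: $host does not resolve"
            failed=1
        fi
    done
    # Non-zero exit status if any name failed to resolve.
    return "$failed"
}
```

On each VM you might run check_hosts "$(hostname)" "$(hostname -f)", and on the VM host pass the names of all the guests.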

Using Salt/DeepSea

For any reasonably sized Ceph cluster, you'll want to automate deployment somehow. For that matter, automation is a win even if you're only setting up a toy test cluster. On openSUSE (and on SLES with SUSE Enterprise Storage), we've got DeepSea, which is a collection of Salt state files, runners and modules for deploying Ceph using Salt.

For an introduction to DeepSea, including a walkthrough of setting up a small test cluster, see this blog post.

The latest DeepSea packages for openSUSE can be found in filesystems:ceph:nautilus/deepsea on OBS.

For more information about DeepSea, check out the wiki. For assistance, please join the mailing list. We'd love to get your feedback.

Using Rook

Rook is an open source cloud-native storage orchestrator, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

Please refer to the rook documentation on the Rook web page.

To use Rook for Ceph you need a Kubernetes cluster. One method to set one up is described in section "Using Rook in Vagrant cluster".

Deploying Ceph in Vagrant cluster

There are currently two tools available to assist in deploying Ceph in VMs using Vagrant: the newer sesdev and the older vagrant-ceph.


The sesdev tool is hosted on GitHub. The project is documented in its README file (which is also rendered on the project's GitHub page).

It allows the user to deploy and manage one or more Ceph clusters running in Vagrant.


If you don't care to deal with setting up VMs yourself, you can use vagrant-ceph to automate the process.

# curl -o
# chmod +x
# sudo ./

# vagrant up
# vagrant provision

This will deploy a 3-node development cluster on Leap 15.1.

# vagrant ssh admin
# su

The password is the standard one: vagrant

# deepsea stage run ceph.stage.0
# deepsea stage run ceph.stage.1
# deepsea stage run ceph.stage.2
# deepsea stage run ceph.stage.3
# deepsea stage run ceph.stage.4
# deepsea stage run ceph.stage.5

# ceph -s

So, your small development cluster is ready.
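
The six stage commands above can also be driven from a small loop that stops at the first failure. The following is a sketch only: run_stages and the STAGE_CMD override are names invented here so the loop can be dry-run without DeepSea installed.

```shell
# Run DeepSea stages 0 to 5 in order, stopping at the first failure.
# STAGE_CMD is a hypothetical override; by default the same
# "deepsea stage run" command shown above is used.
run_stages() {
    local runner="${STAGE_CMD:-deepsea stage run}"
    local n
    for n in 0 1 2 3 4 5; do
        echo "running ceph.stage.$n"
        # $runner is left unquoted on purpose so that it splits into
        # the command and its arguments.
        $runner "ceph.stage.$n" || {
            echo "ceph.stage.$n failed" >&2
            return 1
        }
    done
}
```

On the admin node, run_stages on its own runs the real stages; STAGE_CMD=echo run_stages only prints the stage names.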

For different boxes and more useful configurations, please refer to:

Using Rook in Vagrant cluster

To set up a Vagrant cluster, you can use vagrant-ceph to automate the process.

Set up the vagrant tool for your distro. For Tumbleweed it is as easy as:

# zypper in vagrant vagrant-libvirt

For Leap, you might need to add a repository.

Now you need to add the Kubic box:

# vagrant box add --provider libvirt --name opensuse/MicroOS-Kubic-kubeadm

Clone vagrant-ceph and start a Kubernetes cluster on top of Kubic:

$ git clone
$ cd vagrant-ceph
$ BOX="opensuse/MicroOS-Kubic-kubeadm" vagrant up

This will bootstrap a Kubernetes cluster with a predefined insecure token, so make sure it is used only for local development purposes. You might see some SSH connection issues during the "vagrant up" phase; you can ignore them and proceed with the "provision" step.

These steps bring up a 3-node cluster. Check the vagrant-ceph README to learn how to bring up more complex environments.

Now let's deploy Ceph with Rook.

$ vagrant ssh admin
admin$ sudo su
admin# kubectl get nodes

You should see 3 nodes in the cluster. Please wait for the nodes to reach the "Ready" state. If there are any issues, please report them on the vagrant-ceph GitHub project.
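
Rather than eyeballing the node list, the number of nodes that are not yet ready can be counted from the kubectl output. This is a rough sketch: not_ready_count is a made-up helper, and it assumes the usual column layout of kubectl get nodes, with a header line and STATUS in the second column.

```shell
# Count nodes whose STATUS column is not exactly "Ready".
# Reads "kubectl get nodes" output on stdin and skips the header line.
# A status such as "Ready,SchedulingDisabled" is counted as not ready.
not_ready_count() {
    awk 'NR > 1 && $2 != "Ready" { n++ } END { print n + 0 }'
}

# Intended use on the admin node:
#   kubectl get nodes | not_ready_count
```

A result of 0 means every listed node reports "Ready".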

The following commands create the Rook operator and RBAC resources:

admin# kubectl apply -f /usr/share/k8s-yaml/rook/ceph/common.yaml
admin# kubectl apply -f /usr/share/k8s-yaml/rook/ceph/operator.yaml

Optionally, you can check that "cephclusters" is present in the list of CRDs:

admin# kubectl get crd

Now we can create the Ceph cluster and the toolbox container:

admin# kubectl apply -f /usr/share/k8s-yaml/rook/ceph/cluster.yaml -f /usr/share/k8s-yaml/rook/ceph/toolbox.yaml

You can check the pods that were created in the rook-ceph namespace:

admin# kubectl -n rook-ceph get pod

It will take some time to create all the needed roles. You can check the Ceph status once the toolbox container has been created:

admin# kubectl -n rook-ceph exec $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph -s

Follow the Rook documentation for other commands.

Deploying Ceph with terraform

Use Terraform for deployment, together with SaltStack and DeepSea:

Deploying Ceph with ceph-deploy

NOTE: ceph-deploy is still relevant for Ceph "Jewel" (openSUSE Leap 42.2), but not for anything newer than that. Use DeepSea instead.

ceph-deploy can be used to deploy small clusters, but it rapidly becomes cumbersome for any real-world deployment. Please seriously consider using Salt/DeepSea instead as mentioned above (or, indeed, any other serious configuration management tool or automation framework). ceph-deploy will be deprecated and eventually disappear from openSUSE Tumbleweed and SUSE Enterprise Storage.

That said, here's how to use ceph-deploy:

  • Make sure passwordless ssh login works for a regular user on the VM host to root on the VM guests.
  • For example if you've named your VM guests "leap1", "leap2" and "leap3", your ~/.ssh/config could include:
   Host leap*
           User root
  • And you'll want to run:
# ssh-copy-id leap1
# ssh-copy-id leap2
# ssh-copy-id leap3
  • On your VM host (as the user who can do passwordless ssh to the VM guests) run the following commands. Replace leap1, leap2 and leap3 with your actual hostnames:
# zypper in ceph-deploy
# mkdir leap-test
# cd leap-test/
# ceph-deploy install leap1 leap2 leap3
# ceph-deploy new leap1 leap2 leap3
# ceph-deploy mon create-initial
# ceph-deploy admin leap1 leap2 leap3
# ceph-deploy mgr create leap1 leap2 leap3
  • Now, if you ssh in to one of your VMs and run ceph -s, you should see something like this:
# ceph -s
    cluster ad773abb-4063-416c-ad42-ecba87880b6a
     health HEALTH_ERR
            64 pgs stuck inactive
            64 pgs stuck unclean
            no osds
     monmap e1: 3 mons at {leap1=,leap2=,leap3=}
            election epoch 6, quorum 0,1,2 leap2,leap1,leap3
     osdmap e1: 0 osds: 0 up, 0 in
            flags sortbitwise
      pgmap v2: 64 pgs, 1 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                  64 creating
  • Back on your VM host, create some OSDs. You have to run ceph-deploy osd prepare and ceph-deploy osd activate for each of the OSD disks for each of the VMs. For example:
# ceph-deploy osd prepare leap1:sdb
# ceph-deploy osd activate leap1:sdb1
# ceph-deploy osd prepare leap2:sdb
# ceph-deploy osd activate leap2:sdb1
# ceph-deploy osd prepare leap3:sdb
# ceph-deploy osd activate leap3:sdb1
  • Now, ceph -s on the VMs should give something like:
# ceph -s
    cluster ad773abb-4063-416c-ad42-ecba87880b6a
     health HEALTH_OK
     monmap e1: 3 mons at {leap1=,leap2=,leap3=}
            election epoch 6, quorum 0,1,2 leap2,leap1,leap3
     osdmap e13: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v26: 64 pgs, 1 pools, 0 bytes data, 0 objects
            100 MB used, 58234 MB / 58334 MB avail
                  64 active+clean

And you're done.
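
With more than a handful of hosts, typing the prepare and activate pair for every disk gets error-prone. The sketch below only prints the commands so they can be reviewed first. osd_commands is a hypothetical helper, and it assumes one disk with the same name on every host and that the first partition is named by appending 1 to the disk name, as with sdb and sdb1 above.

```shell
# Print the ceph-deploy OSD commands for one disk on each given host.
# Usage: osd_commands DISK HOST...
osd_commands() {
    local disk="$1" host
    shift
    for host in "$@"; do
        echo "ceph-deploy osd prepare ${host}:${disk}"
        echo "ceph-deploy osd activate ${host}:${disk}1"
    done
}

# Example with the host names used in this walkthrough.
osd_commands sdb leap1 leap2 leap3
```

Once the printed commands look right, they can be run by hand or piped to sh from the leap-test directory.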


Daemon Startup

If your guest VMs are configured via DHCP, the default timeouts may not be sufficient for the network to be configured correctly before the various Ceph daemons start. If this happens, the Ceph MONs and OSDs will not start correctly (systemctl status ceph\* will show "unable to bind" errors). This can be avoided by increasing the DHCP client timeout to at least 30 seconds on each node in your storage cluster. This can be done by changing the following settings on each node:

  • In /etc/sysconfig/network/dhcp set DHCLIENT_WAIT_AT_BOOT="30"
  • In /etc/sysconfig/network/config set WAIT_FOR_INTERFACES="60"
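
If you have several nodes to change, the two settings can be applied with a small helper instead of editing by hand. This is a sketch under assumptions: set_sysconfig_var is an invented name, it expects simple VAR="value" lines, and it is meant for plain values such as the numbers used here.

```shell
# Set VAR="value" in a sysconfig-style file: replace an existing
# assignment, or append one if the variable is not present yet.
set_sysconfig_var() {
    local file="$1" var="$2" value="$3"
    if grep -q "^${var}=" "$file"; then
        # Rewrite via a temporary file rather than editing in place.
        sed "s|^${var}=.*|${var}=\"${value}\"|" "$file" > "${file}.tmp" &&
            mv "${file}.tmp" "$file"
    else
        printf '%s="%s"\n' "$var" "$value" >> "$file"
    fi
}

# Intended use, as root on each node:
#   set_sysconfig_var /etc/sysconfig/network/dhcp DHCLIENT_WAIT_AT_BOOT 30
#   set_sysconfig_var /etc/sysconfig/network/config WAIT_FOR_INTERFACES 60
```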

Old ceph-deploy Versions

Prior to ceph-deploy 1.5.31, there was a bug which could result in ceph-deploy mon create-initial stalling while trying to create some keys. The workaround is to SSH to each of your Ceph nodes and run chown -R ceph:ceph /var/lib/ceph/mon as root, then re-run ceph-deploy mon create-initial.

Replication / Stuck PGs

By default, pools will have min_size=2 and size=3. This means that your data will be replicated across a minimum of two OSDs (on two separate hosts, with the default CRUSH map), but preferably across three OSDs on three hosts. So, if you've got three hosts with one OSD each (i.e. a total of three OSDs), at least two of the OSDs must be up in order for everything to work.
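
The rule above can be written out as a tiny calculation. This is a deliberately simplified sketch, not how Ceph computes PG states: pg_state is a made-up function, it treats "OSDs up" as the number of replicas available for a PG, and the three labels are shorthand for the fuller states Ceph reports.

```shell
# Simplified view of a replicated pool's availability.
# Usage: pg_state UP [MIN_SIZE] [SIZE]   (defaults: min_size=2, size=3)
pg_state() {
    local up="$1" min_size="${2:-2}" size="${3:-3}"
    if [ "$up" -ge "$size" ]; then
        echo "active+clean"        # all replicas present
    elif [ "$up" -ge "$min_size" ]; then
        echo "active, degraded"    # I/O continues with fewer replicas
    else
        echo "inactive"            # below min_size, I/O is blocked
    fi
}

for up in 3 2 1; do
    echo "$up of 3 OSDs up: $(pg_state "$up")"
done
```

To see the values a real pool uses, ceph osd pool get with the pool name followed by size or min_size reports them.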

If ceph status perpetually shows stuck PGs, make sure all the OSDs are up, and make sure your CRUSH map actually looks like a tree, for example:

# ceph osd tree
-1 0.05548 root default
-2 0.01849     host leap1
 0 0.01849         osd.0       up  1.00000          1.00000
-3 0.01849     host leap2
 1 0.01849         osd.1       up  1.00000          1.00000
-4 0.01849     host leap3
 2 0.01849         osd.2       up  1.00000          1.00000

In the above, "root" contains three hosts (leap1, leap2 and leap3), and each host contains one OSD (osd.0, osd.1, osd.2), all of which are up.

If you've got a CRUSH map that doesn't look like a tree, something is wrong. For example:

# ceph osd tree
-1      0 root default
 0      0 osd.0             up  1.00000          1.00000
 1      0 osd.1             up  1.00000          1.00000
 2      0 osd.2             up  1.00000          1.00000 

Here, there are no hosts, and the tree is completely flat. This happened on a test system where hostname resolution wasn't working, so none of the hosts knew what their names were, and thus couldn't be added to the CRUSH map.
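
A quick way to tell the two cases apart is to count the host buckets in the tree output. The helper below is a sketch: count_host_buckets is an invented name, and it simply counts lines of ceph osd tree output that contain the word "host" as a separate field.

```shell
# Count host buckets in "ceph osd tree" output read from stdin.
count_host_buckets() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "host") { n++; break }
    } END { print n + 0 }'
}

# Intended use on a cluster node:
#   ceph osd tree | count_host_buckets
```

For the healthy tree shown earlier this gives 3, and for the flat one it gives 0.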


Mailing Lists, IRC, etc.

Team members