Appendix M: Ceph Erasure Coding and Device Classing


This appendix is intended as a post-deployment guide to reconfiguring RADOS gateway pools to use erasure coding rather than replication. It also covers the use of a specific device class (NVMe, SSD or HDD) when creating the erasure coding profile, as well as other configuration options that need to be considered during deployment.


Any existing data is preserved by following this process. However, reconfiguration should take place immediately after deployment, while the pools are still small, to avoid prolonged ‘copy-pool’ operations.

RADOS Gateway bucket weighting

The weighting of the various pools in a deployment drives the number of placement groups (PGs) created to support each pool. In the ceph-radosgw charm this is configured for the data bucket using:

juju config ceph-radosgw rgw-buckets-pool-weight=20

Note the default of 20%. If the deployment is a pure ceph-radosgw deployment, this value should be increased to the percentage of storage that the data pool is expected to use. The device class also needs to be taken into account, but for erasure coding it must be specified post-deployment by running an action.
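As a rough illustration of how the weight feeds into the PG count, the sketch below applies the general Ceph sizing rule of thumb: target PGs per OSD, multiplied by the OSD count and the pool's share of data, divided by the replica count, then rounded up to a power of two. The function name estimate_pgs, the target of 100 PGs per OSD and the round-up policy are assumptions made for illustration; the charm's own calculation may round differently.

```shell
#!/bin/sh
# Rough PG estimate for a pool (a sketch, not the charm's exact algorithm).
# Usage: estimate_pgs OSD_COUNT WEIGHT_PERCENT REPLICAS [TARGET_PGS_PER_OSD]
estimate_pgs() {
    osds=$1; weight=$2; replicas=$3; target=${4:-100}
    # Raw estimate using integer arithmetic.
    raw=$(( target * osds * weight / 100 / replicas ))
    # Round up to the next power of two (minimum 1).
    pgs=1
    while [ "$pgs" -lt "$raw" ]; do
        pgs=$(( pgs * 2 ))
    done
    echo "$pgs"
}

# Nine OSDs with 3x replication: compare a 20% weight with a 90% weight.
estimate_pgs 9 20 3   # raw estimate 60, prints 64
estimate_pgs 9 90 3   # raw estimate 270, prints 512
```

Under these assumptions, raising the weight from 20% to 90% moves the data pool from 64 to 512 PGs, which is why the weight should reflect the expected share of storage.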

Ceph automatic device classing

Newer versions of Ceph (Luminous and later) classify OSD devices automatically. Each OSD is placed into the ‘nvme’, ‘ssd’ or ‘hdd’ device class. These classes can be used when creating erasure profiles or new CRUSH rules (see following sections).

The classes can be inspected using:

sudo ceph osd crush tree

-1       8.18729 root default
-5       2.72910     host node-laveran
 2  nvme 0.90970         osd.2
 5   ssd 0.90970         osd.5
 7   ssd 0.90970         osd.7
-7       2.72910     host node-mees
 1  nvme 0.90970         osd.1
 6   ssd 0.90970         osd.6
 8   ssd 0.90970         osd.8
-3       2.72910     host node-pytheas
 0  nvme 0.90970         osd.0
 3   ssd 0.90970         osd.3
 4   ssd 0.90970         osd.4
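
Before choosing a device class for an erasure profile, it can help to know how many OSDs each class contains. The following is a minimal sketch that counts OSDs per class from the tree output; the helper name count_osds_by_class is hypothetical, and it assumes OSD rows have the four-field layout shown in the listing above (ID, class, weight, name).

```shell
#!/bin/sh
# Count OSDs per device class from `ceph osd crush tree` output on stdin.
# OSD rows have four fields: ID, CLASS, WEIGHT, NAME (e.g. "2 nvme 0.90970 osd.2").
# Root and host rows are skipped because their fourth field is not an OSD name.
count_osds_by_class() {
    awk '$4 ~ /^osd\./ { n[$2]++ } END { for (c in n) print c, n[c] }' | sort
}

# Sample rows taken from the listing above (one host only).
count_osds_by_class <<'EOF'
-1       8.18729 root default
-5       2.72910     host node-laveran
 2  nvme 0.90970         osd.2
 5   ssd 0.90970         osd.5
 7   ssd 0.90970         osd.7
EOF
```

On a live cluster the same helper could be fed directly, for example: sudo ceph osd crush tree | count_osds_by_class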

Configuring erasure coding

The RADOS gateway makes use of a number of pools, but the only pool that should be converted to use erasure coding (EC) is the data pool:

All other pools should be replicated as they are by default.

To create a new EC profile and pool:

juju run-action --wait ceph-mon/0 create-erasure-profile \
    name=nvme-ec device-class=nvme

juju run-action --wait ceph-mon/0 create-pool \ \
    pool-type=erasure \
    erasure-profile-name=nvme-ec \

The percent-data option should be set based on the type of deployment, but if the RADOS gateway is the only target for the NVMe storage class then 90% is appropriate (the other RADOS gateway pools are tiny and use between 0.10% and 3% of storage).


The create-erasure-profile action has a number of other options including adjustment of the K/M values which affect the computational overhead and underlying storage consumed per MB stored. Sane defaults are provided but they require a minimum of five hosts with block devices of the right class.
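
To make the K/M trade-off concrete, the sketch below computes the raw storage consumed per unit stored, (K + M) / K, and the minimum number of hosts, K + M, when every chunk must land on a different host. The values K=3 and M=2 are an assumption chosen because they match the five-host minimum mentioned above; check the action's parameters for the actual defaults. The function names are hypothetical.

```shell
#!/bin/sh
# Raw storage consumed per unit stored for a K+M erasure code: (K + M) / K.
ec_overhead() {
    awk -v k="$1" -v m="$2" 'BEGIN { printf "%.2f\n", (k + m) / k }'
}

# Minimum hosts when each chunk is placed on a different host: K + M.
ec_min_hosts() {
    echo $(( $1 + $2 ))
}

ec_overhead 3 2    # prints 1.67 (MB of raw storage per MB stored)
ec_min_hosts 3 2   # prints 5
```

For comparison, a pool with three replicas consumes 3 MB of raw storage per MB stored, so a 3+2 profile under these assumptions stores the same data in a little over half the space while still tolerating the loss of two hosts.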

To avoid any creation or mutation of stored data during migration, shut down all RADOS gateway instances:

juju run --application ceph-radosgw \
    "sudo systemctl stop"

The existing pool can then be copied and switched:

juju run-action --wait ceph-mon/0 rename-pool \ \

juju run-action --wait ceph-mon/0 rename-pool \ \

At this point the RADOS gateway instances can be restarted:

juju run --application ceph-radosgw \
    "sudo systemctl start"

Once successful operation of the deployment has been confirmed, the old pool can be deleted:

juju run-action --wait ceph-mon/0 delete-pool \

Moving other RADOS gateway pools to NVMe storage

The data pool is the largest pool and the only one that should use EC. The other pools can also be migrated to the same storage class for consistent performance. First create a replicated CRUSH rule for that class:

juju run-action --wait ceph-mon/0 create-crush-rule \
    name=replicated_nvme device-class=nvme

The CRUSH rule for the other RADOS gateway pools can then be updated:


for pool in $pools; do
    juju run-action --wait ceph-mon/0 pool-set \
        name="$pool" key=crush_rule value=replicated_nvme
done
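
Because changing a CRUSH rule triggers data movement, it may be worth reviewing the exact commands before running them. The following is a small dry-run sketch that prints the pool-set command for each pool instead of executing it. The helper name print_pool_set_cmds and the pool names example-pool-a and example-pool-b are hypothetical placeholders, not real RADOS gateway pool names.

```shell
#!/bin/sh
# Print (rather than run) the pool-set command for each pool name given.
# Usage: print_pool_set_cmds CRUSH_RULE POOL [POOL...]
print_pool_set_cmds() {
    rule=$1; shift
    for pool in "$@"; do
        echo "juju run-action --wait ceph-mon/0 pool-set name=$pool key=crush_rule value=$rule"
    done
}

# Placeholder pool names; substitute the pools of the actual deployment.
print_pool_set_cmds replicated_nvme example-pool-a example-pool-b
```

Once the printed commands look right, they can be run one at a time so that each pool finishes rebalancing before the next is changed.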