Host Failure

Test Environment

  • Cluster size: 4 host machines

  • Number of disks: 24 (= 6 disks per host * 4 hosts)

  • Kubernetes version: 1.10.5

  • Ceph version: 12.2.3

  • OpenStack-Helm commit: 25e50a34c66d5db7604746f4d2e12acbdd6c1459
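
If the environment needs to be re-verified before running the failure cases, the versions and osd count can be checked with standard commands (a sketch only; output is omitted and will vary per deployment):

$ kubectl version --short                 # Kubernetes client/server versions (v1.10.5 here)
$ kubectl get nodes                       # the 4 host machines in the cluster
(mon-pod):/# ceph versions                # Ceph daemon versions (12.2.3 here)
(mon-pod):/# ceph osd stat                # expected: 24 osds: 24 up, 24 in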

Case: One host machine where ceph-mon is running is rebooted

Symptom:

After rebooting node voyager3, its status changes to NotReady.

$ kubectl get nodes
NAME       STATUS     ROLES     AGE       VERSION
voyager1   Ready      master    6d        v1.10.5
voyager2   Ready      <none>    6d        v1.10.5
voyager3   NotReady   <none>    6d        v1.10.5
voyager4   Ready      <none>    6d        v1.10.5
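
To see why the node is reported NotReady, its conditions can be inspected (a hedged check, not part of the original log; the Ready condition typically reports that the kubelet stopped posting node status while the host was rebooting):

$ kubectl describe node voyager3          # see the Conditions section for the Ready status message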

Ceph status shows that the ceph-mon running on voyager3 is out of quorum. Also, the 6 osds running on voyager3 are down; i.e., 18 of the 24 osds are up.

(mon-pod):/# ceph -s
  cluster:
    id:     9d4d8c61-cf87-4129-9cef-8fbf301210ad
    health: HEALTH_WARN
            6 osds down
            1 host (6 osds) down
            Degraded data redundancy: 195/624 objects degraded (31.250%), 8 pgs degraded
            too few PGs per OSD (17 < min 30)
            mon voyager1 is low on available space
            1/3 mons down, quorum voyager1,voyager2

  services:
    mon: 3 daemons, quorum voyager1,voyager2, out of quorum: voyager3
    mgr: voyager1(active), standbys: voyager3
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
    osd: 24 osds: 18 up, 24 in
    rgw: 2 daemons active

  data:
    pools:   18 pools, 182 pgs
    objects: 208 objects, 3359 bytes
    usage:   2630 MB used, 44675 GB / 44678 GB avail
    pgs:     195/624 objects degraded (31.250%)
             126 active+undersized
             48  active+clean
             8   active+undersized+degraded

Recovery:

The node status of voyager3 changes to Ready after the node is up again, and the Ceph pods on it are restarted automatically (a quick pod check is sketched after the status output below). Ceph status shows that the monitor running on voyager3 has rejoined the quorum.

$ kubectl get nodes
NAME       STATUS    ROLES     AGE       VERSION
voyager1   Ready     master    6d        v1.10.5
voyager2   Ready     <none>    6d        v1.10.5
voyager3   Ready     <none>    6d        v1.10.5
voyager4   Ready     <none>    6d        v1.10.5
(mon-pod):/# ceph -s
  cluster:
    id:     9d4d8c61-cf87-4129-9cef-8fbf301210ad
    health: HEALTH_WARN
            too few PGs per OSD (22 < min 30)
            mon voyager1 is low on available space

  services:
    mon: 3 daemons, quorum voyager1,voyager2,voyager3
    mgr: voyager1(active), standbys: voyager3
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
    osd: 24 osds: 24 up, 24 in
    rgw: 2 daemons active

  data:
    pools:   18 pools, 182 pgs
    objects: 208 objects, 3359 bytes
    usage:   2635 MB used, 44675 GB / 44678 GB avail
    pgs:     182 active+clean
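
To confirm that the Ceph pods on voyager3 really were restarted rather than left in a failed state, the same pod listing used in the next case can be run; every pod on voyager3 should be back to Running:

$ kubectl get pods -n ceph -o wide|grep voyager3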

Case: A host machine where ceph-mon is running is down

This case covers a host machine (where ceph-mon is running) that goes down and stays down, rather than simply being rebooted.

Symptom:

After the host (node voyager3) goes down, its status changes to NotReady.

$ kubectl get nodes
NAME       STATUS     ROLES     AGE       VERSION
voyager1   Ready      master    14d       v1.10.5
voyager2   Ready      <none>    14d       v1.10.5
voyager3   NotReady   <none>    14d       v1.10.5
voyager4   Ready      <none>    14d       v1.10.5

Ceph status shows that the ceph-mon running on voyager3 is out of quorum and the 6 osds running on voyager3 are down (i.e., 18 of the 24 osds are up). Some placement groups become degraded and undersized.

(mon-pod):/# ceph -s
  cluster:
    id:     9d4d8c61-cf87-4129-9cef-8fbf301210ad
    health: HEALTH_WARN
            6 osds down
            1 host (6 osds) down
            Degraded data redundancy: 227/720 objects degraded (31.528%), 8 pgs degraded
            too few PGs per OSD (17 < min 30)
            mon voyager1 is low on available space
            1/3 mons down, quorum voyager1,voyager2

  services:
    mon: 3 daemons, quorum voyager1,voyager2, out of quorum: voyager3
    mgr: voyager1(active), standbys: voyager3
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
    osd: 24 osds: 18 up, 24 in
    rgw: 2 daemons active

  data:
    pools:   18 pools, 182 pgs
    objects: 240 objects, 3359 bytes
    usage:   2695 MB used, 44675 GB / 44678 GB avail
    pgs:     227/720 objects degraded (31.528%)
             126 active+undersized
             48  active+clean
             8   active+undersized+degraded

The ceph-mon and ceph-osd pods on voyager3 show as NodeLost (and the ceph-mgr pod as Unknown).

$ kubectl get pods -n ceph -o wide|grep voyager3
ceph-mgr-55f68d44b8-hncrq                  1/1       Unknown     6          8d        135.207.240.43   voyager3
ceph-mon-6bbs6                             1/1       NodeLost    8          8d        135.207.240.43   voyager3
ceph-osd-default-64779b8c-lbkcd            1/1       NodeLost    1          6d        135.207.240.43   voyager3
ceph-osd-default-6ea9de2c-gp7zm            1/1       NodeLost    2          8d        135.207.240.43   voyager3
ceph-osd-default-7544b6da-7mfdc            1/1       NodeLost    2          8d        135.207.240.43   voyager3
ceph-osd-default-7cfc44c1-hhk8v            1/1       NodeLost    2          8d        135.207.240.43   voyager3
ceph-osd-default-83945928-b95qs            1/1       NodeLost    2          8d        135.207.240.43   voyager3
ceph-osd-default-f9249fa9-n7p4v            1/1       NodeLost    3          8d        135.207.240.43   voyager3

After 10+ minutes, Ceph starts rebalancing with one node lost (i.e., 6 osds down) and the status stabilizes with 18 osds.

(mon-pod):/# ceph -s
  cluster:
    id:     9d4d8c61-cf87-4129-9cef-8fbf301210ad
    health: HEALTH_WARN
            mon voyager1 is low on available space
            1/3 mons down, quorum voyager1,voyager2

  services:
    mon: 3 daemons, quorum voyager1,voyager2, out of quorum: voyager3
    mgr: voyager1(active), standbys: voyager2
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
    osd: 24 osds: 18 up, 18 in
    rgw: 2 daemons active

  data:
    pools:   18 pools, 182 pgs
    objects: 240 objects, 3359 bytes
    usage:   2025 MB used, 33506 GB / 33508 GB avail
    pgs:     182 active+clean

Recovery:

The node status of voyager3 changes to Ready after the node is up again, and Ceph pods are restarted automatically. The Ceph status shows that the monitor running on voyager3 is back in quorum and the 6 osds come back up (i.e., all 24 osds are up).

(mon-pod):/# ceph -s
  cluster:
    id:     9d4d8c61-cf87-4129-9cef-8fbf301210ad
    health: HEALTH_WARN
            too few PGs per OSD (22 < min 30)
            mon voyager1 is low on available space

  services:
    mon: 3 daemons, quorum voyager1,voyager2,voyager3
    mgr: voyager1(active), standbys: voyager2
    mds: cephfs-1/1/1 up  {0=mds-ceph-mds-65bb45dffc-cslr6=up:active}, 1 up:standby
    osd: 24 osds: 24 up, 24 in
    rgw: 2 daemons active

  data:
    pools:   18 pools, 182 pgs
    objects: 240 objects, 3359 bytes
    usage:   2699 MB used, 44675 GB / 44678 GB avail
    pgs:     182 active+clean

Also, the pod status of ceph-mon and ceph-osd changes from NodeLost back to Running.

$ kubectl get pods -n ceph -o wide|grep voyager3
ceph-mon-6bbs6                             1/1       Running     9          8d        135.207.240.43   voyager3
ceph-osd-default-64779b8c-lbkcd            1/1       Running     2          7d        135.207.240.43   voyager3
ceph-osd-default-6ea9de2c-gp7zm            1/1       Running     3          8d        135.207.240.43   voyager3
ceph-osd-default-7544b6da-7mfdc            1/1       Running     3          8d        135.207.240.43   voyager3
ceph-osd-default-7cfc44c1-hhk8v            1/1       Running     3          8d        135.207.240.43   voyager3
ceph-osd-default-83945928-b95qs            1/1       Running     3          8d        135.207.240.43   voyager3
ceph-osd-default-f9249fa9-n7p4v            1/1       Running     4          8d        135.207.240.43   voyager3
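
As a final check (not part of the original test log), the recovered state can be confirmed from a mon pod; only the pre-existing PG-count and mon-disk-space warnings should remain:

(mon-pod):/# ceph health detail
(mon-pod):/# ceph osd stat                # expected: 24 osds: 24 up, 24 in
(mon-pod):/# ceph osd tree | grep down    # expected to return nothing once all osds have rejoined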