The deletion policy is provided to help users control the election of victim nodes when a cluster is about to be shrunk. In other words, it determines which node(s) should be removed first when the size of a cluster is to be decreased.
Currently, this policy is applicable to clusters of all profile types and it is enforced when the cluster’s size is about to be reduced.
The policy's configurable properties include the following:
- Number of seconds before the actual deletion happens.
- Type of lifecycle hook.
- Zaqar queue to receive the lifecycle hook message.
- URL sink to which the lifecycle hook message is sent.
Below is a typical spec for a deletion policy:
    # Sample deletion policy that can be attached to a cluster.
    type: senlin.policy.deletion
    version: 1.0
    description: A policy for choosing victim node(s) from a cluster for deletion.
    properties:
      # The valid values include:
      # OLDEST_FIRST, OLDEST_PROFILE_FIRST, YOUNGEST_FIRST, RANDOM
      criteria: OLDEST_FIRST

      # Whether deleted node should be destroyed
      destroy_after_deletion: True

      # Length in number of seconds before the actual deletion happens
      # This param buys an instance some time before deletion
      grace_period: 60

      # Whether the deletion will reduce the desired capacity of
      # the cluster as well
      reduce_desired_capacity: False
The valid values for the criteria property include:
- OLDEST_FIRST: always select the node(s) that were created earlier than the other nodes.
- YOUNGEST_FIRST: always select the node(s) that were created most recently rather than those created earlier.
- OLDEST_PROFILE_FIRST: compare the profile used by each individual node and select the node(s) whose profile(s) were created earlier than the others.
- RANDOM: randomly select node(s) from the cluster for deletion. This is the default if criteria is omitted.
There is an implicit rule (criterion) when electing victim nodes. The Senlin engine always ranks nodes that are not in ACTIVE state or that are marked as tainted ahead of the others.
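The criteria and the implicit rule above can be pictured with a short sketch. This is not the Senlin engine's implementation: the Node class, its fields, and the elect_victims function are hypothetical names chosen for illustration, and creation times are modeled as plain integers.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node; not Senlin's node object."""
    name: str
    created_at: int          # node creation time as a sortable number
    profile_created_at: int  # creation time of the profile the node uses
    status: str = "ACTIVE"
    tainted: bool = False


def elect_victims(nodes, count, criteria="RANDOM", rng=None):
    """Pick up to `count` nodes to remove, unhealthy or tainted nodes first."""
    rng = rng or random.Random()
    pool = list(nodes)
    if criteria == "RANDOM":
        rng.shuffle(pool)
    elif criteria == "OLDEST_FIRST":
        pool.sort(key=lambda n: n.created_at)
    elif criteria == "YOUNGEST_FIRST":
        pool.sort(key=lambda n: n.created_at, reverse=True)
    elif criteria == "OLDEST_PROFILE_FIRST":
        pool.sort(key=lambda n: n.profile_created_at)
    else:
        raise ValueError(f"unknown criteria: {criteria}")
    # Stable sort: nodes that are not ACTIVE or are tainted move to the front,
    # keeping the order chosen above within each group.
    pool.sort(key=lambda n: n.status == "ACTIVE" and not n.tainted)
    return pool[:count]
```

Because the second sort is stable, the chosen criteria only breaks ties within the group of unhealthy nodes and within the group of healthy ones.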
There are several actions that can trigger a deletion policy. Some of them may already carry a list of candidates to remove, e.g. NODE_DELETE; others may only carry a number of nodes to remove, e.g. CLUSTER_RESIZE. For actions that already have a list of candidates, the deletion policy will respect the action inputs. The election of victims only happens when no such candidates have been identified.
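The precedence between explicit candidates and elected victims might be sketched as follows. The resolve_victims function, the shape of action_inputs, and the elect callback are assumptions made for this example and do not mirror Senlin's internal data structures.

```python
def resolve_victims(action_inputs, cluster_nodes, elect):
    """Return the nodes to remove for one action.

    action_inputs: dict with an optional "candidates" list and/or a "count"
                   (hypothetical keys used only in this sketch).
    elect: callable (nodes, count) -> list, used only when the action
           did not name any candidates.
    """
    candidates = action_inputs.get("candidates")
    if candidates:
        # The action already identified its victims: respect them as given.
        return list(candidates)
    # Otherwise fall back to electing victims by the policy's criteria.
    return elect(cluster_nodes, action_inputs.get("count", 1))
```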
Deletion vs Destroy
There are cases where you don’t want the node(s) removed from a cluster to be destroyed. Instead, you prefer them to become “orphan” nodes so that in future you can quickly add them back to the cluster without having to create new nodes.
If this is your situation, you may want to set destroy_after_deletion to false. The Senlin engine won't delete the node(s) after removing them from the cluster.
The default behavior is to delete (destroy) the node(s) after they are deprived of their cluster membership.
Another common scenario is to grant a node a period of time to shut down gracefully. Even if a node doesn't have built-in logic to perform a graceful shutdown, granting it some extra time may still help ensure that the resources it was using have been properly released.
The default value for the grace_period property is 0, which means the node deletion happens as soon as the node is removed from the cluster. You can customize this value according to your needs. Note that the grace period is granted to every node deleted. When setting this value to a large number, be sure it does not exceed the typical timeout value for action execution; otherwise the node deletion will fail.
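A simple pre-flight check can catch a grace period that would outlast the action timeout. The function below is a minimal sketch; check_grace_period is a hypothetical helper, and action_timeout stands for whatever action timeout applies in your deployment.

```python
def check_grace_period(grace_period, action_timeout):
    """Return True if the grace period fits within the action timeout.

    Both values are in seconds. `action_timeout` is an assumed input here:
    look up the value actually configured for your deployment.
    """
    if grace_period < 0:
        raise ValueError("grace_period must not be negative")
    return grace_period < action_timeout


# Usage: the sample spec's 60-second grace period against an assumed
# 3600-second action timeout.
fits = check_grace_period(60, 3600)
```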
Reduce Desired Capacity or Not
In most cases, users would expect the “desired_capacity” of a cluster to be reduced when nodes are removed from it. Since the victim selection algorithm always picks nodes in non-ACTIVE status over ACTIVE ones, you can actually remove erroneous nodes by taking advantage of this rule.
For example, there are 4 nodes in a cluster and 2 of them are known to be in inactive status. You can use the command openstack cluster members del to remove the bad nodes. If you have a deletion policy attached to the cluster, you get a chance to tell the Senlin engine that you don’t want to change the capacity of the cluster. Instead, you only want the bad nodes removed. With the help of other cluster health related commands, you can quickly recover the cluster to a healthy status. You don’t have to change the desired capacity of the cluster to a smaller value and then change it back.
If this is your use case, you can set reduce_desired_capacity to False in the policy spec. The cluster's desired capacity won't be changed after cluster membership is modified.
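The bookkeeping behind the four-node example can be worked through in a few lines. This is only an arithmetic illustration: after_removal and its return values are hypothetical and do not correspond to a Senlin API.

```python
def after_removal(desired_capacity, current_nodes, removed, reduce_desired_capacity):
    """Return (new desired capacity, remaining nodes, nodes still to recover)."""
    remaining = [n for n in current_nodes if n not in removed]
    if reduce_desired_capacity:
        desired_capacity -= len(removed)
    # Nodes the cluster is short of relative to its desired capacity.
    shortfall = max(desired_capacity - len(remaining), 0)
    return desired_capacity, remaining, shortfall


nodes = ["n1", "n2", "bad1", "bad2"]

# Desired capacity kept at 4: two nodes remain, two are to be recovered.
keep = after_removal(4, nodes, ["bad1", "bad2"], reduce_desired_capacity=False)

# Desired capacity reduced with the removal: the cluster simply shrinks to 2.
shrink = after_removal(4, nodes, ["bad1", "bad2"], reduce_desired_capacity=True)
```

With the capacity kept, the shortfall of two is what the health related commands can then fill with fresh nodes.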
If there is a need to receive notification of a node deletion, you can specify a lifecycle hook in the deletion policy:
    type: senlin.policy.deletion
    version: 1.1
    properties:
      hooks:
        type: 'zaqar'
        timeout: 120
        params:
          queue: 'my_queue'
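The valid values for the type property include:
- zaqar: send the message to a Zaqar queue. The name of the Zaqar queue must be specified in the queue property under params.
- webhook: send the message to a webhook URL. The URL of the webhook must be specified in the url property under params.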
The timeout property specifies the number of seconds to wait before the actual node deletion happens. This timeout can be preempted by calling the complete lifecycle hook API.
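The timeout semantics resemble a wait that ends early once the hook is completed. The sketch below models this with Python's threading.Event; wait_for_hook is a hypothetical function, and the event merely stands in for a call to the complete lifecycle hook API. It is not how Senlin implements the hook.

```python
import threading


def wait_for_hook(completed, timeout):
    """Block until the hook is completed or `timeout` seconds elapse.

    completed: threading.Event set by whoever finishes handling the
               lifecycle hook message (a stand-in for the completion call).
    Returns "completed" if the wait was preempted, "timed_out" otherwise.
    In both cases the node deletion would proceed afterwards.
    """
    if completed.wait(timeout):
        return "completed"
    return "timed_out"
```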
Hooks of type webhook will be supported in a future version. Currently only hooks of type zaqar are supported.
Deleting Nodes Across Regions
With the help of the Region Placement Policy, you can distribute a cluster's nodes into different regions as instructed. When you remove nodes from more than one region, the same distribution rule has to be respected as well.
When there is a region placement policy in effect, the deletion policy will first determine the number of nodes to be removed from each region. Then in each region, the policy performs a victim election based on the criteria you specified in the policy spec.
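One way to picture the first step is to compute, per region, how many nodes must go so that the remaining nodes stay close to the intended distribution. The sketch below assumes the distribution is expressed as per-region weights and uses a simple greedy rule; the function name, the weights input, and the rule itself are assumptions for illustration, and Senlin's actual computation may differ.

```python
def removals_per_region(current, weights, count):
    """Decide how many nodes to remove from each region.

    current: {region: number of nodes currently in it}
    weights: {region: relative weight} (assumed form of the distribution rule)
    count:   total number of nodes to remove
    """
    total_after = sum(current.values()) - count
    if total_after < 0:
        raise ValueError("cannot remove more nodes than the cluster has")
    total_weight = sum(weights.values())
    # Ideal (possibly fractional) share of the remaining nodes per region.
    ideal = {r: total_after * weights[r] / total_weight for r in current}
    removals = {r: 0 for r in current}

    def surplus(region):
        left = current[region] - removals[region]
        # Regions with no nodes left can never be chosen.
        return left - ideal[region] if left > 0 else float("-inf")

    for _ in range(count):
        # Take one node from the region furthest above its ideal share.
        removals[max(current, key=surplus)] += 1
    return removals
```

Victim election by the configured criteria would then run inside each region for the number returned here.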
Deleting Nodes Across Availability Zones
Similarly, when there is a zone placement policy attached to the cluster in question, nodes in the cluster may get distributed across a few availability zones based on a preset algorithm.
The deletion policy, when triggered, will first determine the number of nodes to be removed from each availability zone. Then, within each availability zone, it proceeds to elect victim nodes based on the criteria specified in the policy spec.
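The second step, electing victims zone by zone once the per-zone counts are known, could look like the sketch below. The elect_per_zone function and its inputs are hypothetical; the key argument plays the role of the configured criteria (for instance, sorting by creation time for OLDEST_FIRST).

```python
def elect_per_zone(nodes_by_zone, removals, key):
    """Elect victims inside each availability zone.

    nodes_by_zone: {zone: list of nodes in that zone}
    removals:      {zone: number of nodes to remove from that zone}
    key:           sort key; nodes ranked first are removed first
    """
    victims = []
    for zone, count in removals.items():
        ranked = sorted(nodes_by_zone.get(zone, []), key=key)
        victims.extend(ranked[:count])
    return victims


# Usage: nodes are (name, created_at) pairs; oldest nodes go first.
zones = {"az1": [("n1", 10), ("n2", 5)], "az2": [("n3", 7)]}
chosen = elect_per_zone(zones, {"az1": 1, "az2": 1}, key=lambda n: n[1])
```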