Bare Metal State Machine
State Machine Diagram
The diagram below shows the provisioning states that an Ironic node goes through during its lifetime, along with the events that move the node between states.
Stable states are highlighted with a thicker border. All transitions from stable states are initiated by API requests. A few other API-initiated transitions are possible from non-stable states. The events for these API-initiated transitions are indicated with ‘(via API)’. The conductor initiates the remaining transitions internally (depicted in gray).
Click the diagram to view it at its full size; it is scaled down when embedded in the documentation.
Note
There are aliases for some transitions:
- deploy is an alias for active.
- undeploy is an alias for deleted.
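The aliases can be treated as a simple lookup applied before a verb is interpreted. The following is a minimal sketch; the names VERB_ALIASES and canonical_verb are hypothetical and are not part of Ironic.

```python
# Hypothetical helper, not Ironic's own code: maps each alias to the
# verb it stands for, as listed in the note above.
VERB_ALIASES = {
    "deploy": "active",
    "undeploy": "deleted",
}


def canonical_verb(verb: str) -> str:
    """Return the verb an alias stands for, or the verb itself."""
    return VERB_ALIASES.get(verb, verb)
```

With this helper, canonical_verb("undeploy") and canonical_verb("deleted") resolve to the same verb, so later logic only needs to handle one spelling.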
Enrollment and Preparation
- enroll (stable state)
- This is the state that all nodes start off in when created using API version 1.11 or newer. When a node is in the enroll state, the only thing ironic knows about it is that it exists, and ironic cannot take any further action by itself. Once a node has its driver/interfaces and their required information set in node.driver_info, the node can be transitioned to the verifying state by setting the node’s provision state using the manage verb.
  See Enrolling hardware with Ironic for information on enrolling nodes.
- verifying
- ironic will validate that it can manage the node using the information given in node.driver_info and the driver/hardware type and interfaces it has been assigned. This involves confirming that the credentials work to access whatever node control mechanism they talk to.
- manageable (stable state)
- Once ironic has verified that it can manage the node using the driver/interfaces and credentials passed in at node create time, the node will be transitioned to the manageable state. From manageable, nodes can transition to:
  - manageable (through cleaning) by setting the node’s provision state using the clean verb.
  - manageable (through inspecting) by setting the node’s provision state using the inspect verb.
  - available (through cleaning if automatic cleaning is enabled) by setting the node’s provision state using the provide verb.
  - active (through adopting) by setting the node’s provision state using the adopt verb.
  manageable is the state that a node should be moved into when any updates need to be made to it, such as changes to fields in driver_info and updates to networking information on ironic ports assigned to the node.
  manageable is also the only stable state that can be transitioned to from these failure states:
  - adopt failed
  - clean failed
  - inspect failed
- inspecting
- inspecting will utilize node introspection to update hardware-derived node properties to reflect the current state of the hardware. Typically, the node will transition to manageable if inspection is synchronous, or inspect wait if asynchronous. The node will transition to inspect failed if an error occurs.
  See Hardware Inspection for information about inspection.
- inspect wait
- This is the provision state used when an asynchronous inspection is in progress. A successfully inspected node will transition to the manageable state.
- inspect failed
- This is the state a node will move into when inspection of the node fails. From here the node can be transitioned to:
  - inspecting by setting the node’s provision state using the inspect verb.
  - manageable by setting the node’s provision state using the manage verb.
- cleaning
- Nodes in the cleaning state are being scrubbed and reprogrammed into a known configuration.
  When a node is in the cleaning state, the conductor is either executing the clean step (for out-of-band clean steps) or preparing the environment (building PXE configuration files, configuring DHCP, etc.) to boot the ramdisk for running in-band clean steps.
- clean wait
- Just like the cleaning state, the nodes in the clean wait state are being scrubbed and reprogrammed. The difference is that in the clean wait state the conductor is waiting for the ramdisk to boot or for the clean step that is running in-band to finish.
  The cleaning process of a node in the clean wait state can be interrupted by setting the node’s provision state using the abort verb if the task that is running allows it.
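Each verb mentioned above is applied by setting the node’s provision state. As a rough sketch, assuming the Bare Metal REST API’s provision-state endpoint (PUT /v1/nodes/{node_ident}/states/provision with a target field), a request for the manage verb could be built as shown below. The endpoint URL, the node name, and the helper name are hypothetical. The request is only constructed, not sent, and authentication is left out.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the Bare Metal API URL of your deployment.
IRONIC_URL = "http://ironic.example.com:6385"


def build_provision_request(node_ident, verb, api_version="1.11"):
    """Build (but do not send) a request that sets a node's provision state.

    The default API version is an assumption taken from the enroll
    description above; other verbs may require a newer version.
    """
    body = json.dumps({"target": verb}).encode("utf-8")
    return urllib.request.Request(
        url=f"{IRONIC_URL}/v1/nodes/{node_ident}/states/provision",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-OpenStack-Ironic-API-Version": api_version,
        },
    )


# Move a node out of enroll by requesting the manage verb.
request = build_provision_request("node-0", "manage")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it straightforward to inspect or log what would be submitted before any state change is attempted.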
Deploy and Undeploy
- available (stable state)
- After nodes have been successfully preconfigured and cleaned, they are moved into the available state and are ready to be provisioned. From available, nodes can transition to:
  - active (through deploying) by setting the node’s provision state using the active or deploy verbs.
  - manageable by setting the node’s provision state using the manage verb.
- deploying
- Nodes in deploying are being prepared to run a workload on them. This consists of running a series of tasks, such as:
  - Setting appropriate BIOS configurations.
  - Partitioning drives and laying down file systems.
  - Creating any additional resources (node-specific network config, a config drive partition, etc.) that may be required by additional subsystems.
  See Deploying with Bare Metal service and Using deploy steps and templates for information about deploying nodes.
- wait call-back
- Just like the deploying state, the nodes in wait call-back are being deployed. The difference is that in wait call-back the conductor is waiting for the ramdisk to boot or execute parts of the deployment which need to run in-band on the node (for example, installing the bootloader, or writing the image to the disk).
  The deployment of a node in wait call-back can be interrupted by setting the node’s provision state using the deleted or undeploy verbs.
- deploy failed
- This is the state a node will move into when a deployment fails, for example because of a timeout waiting for the ramdisk to PXE boot. From here the node can be transitioned to:
  - active (through deploying) by setting the node’s provision state using the active, deploy or rebuild verbs.
  - available (through deleting and cleaning) by setting the node’s provision state using the deleted or undeploy verbs.
- active (stable state)
- Nodes in active have a workload running on them. ironic may collect out-of-band sensor information (including power state) on a regular basis. Nodes in active can transition to:
  - available (through deleting and cleaning) by setting the node’s provision state using the deleted or undeploy verbs.
  - active (through deploying) by setting the node’s provision state using the rebuild verb.
  - rescue (through rescuing) by setting the node’s provision state using the rescue verb.
- deleting
- Nodes in the deleting state are being torn down from running an active workload. In deleting, ironic tears down and removes any configuration and resources it added in deploying or rescuing.
- error (stable state)
- This is the state a node will move into when deleting an active deployment fails. From error, nodes can transition to:
  - available (through deleting and cleaning) by setting the node’s provision state using the deleted or undeploy verbs.
- adopting
- This state allows ironic to take over management of a baremetal node with an existing workload on it. Ordinarily when a baremetal node is enrolled and managed by ironic, it must transition through cleaning and deploying to reach the active state. However, baremetal nodes that already have a workload on them do not need to be deployed or cleaned again, so this transition allows these nodes to move directly from manageable to active.
  See Node adoption for information about this feature.
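Because every transition out of these states is triggered by a verb, the descriptions above can be read as a small graph. The sketch below is illustrative only: the table and function names are hypothetical, the table covers only the transitions listed in this section, the intermediate states (deploying, deleting, cleaning, rescuing) are collapsed, and each verb is assumed to succeed. It uses a breadth-first search to find a shortest sequence of verbs between two states.

```python
from collections import deque

# (state, verb) -> state reached on success, transcribed from this section.
# The aliases deploy and undeploy are omitted; they behave like active
# and deleted respectively.
API_TRANSITIONS = {
    ("available", "active"): "active",
    ("available", "manage"): "manageable",
    ("deploy failed", "active"): "active",
    ("deploy failed", "rebuild"): "active",
    ("deploy failed", "deleted"): "available",
    ("active", "deleted"): "available",
    ("active", "rebuild"): "active",
    ("active", "rescue"): "rescue",
    ("error", "deleted"): "available",
}


def verbs_to_reach(start, goal):
    """Return a shortest list of verbs leading from start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, verbs = queue.popleft()
        if state == goal:
            return verbs
        for (src, verb), dst in API_TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, verbs + [verb]))
    return None


print(verbs_to_reach("error", "active"))
```

For a node in error, the search goes through available, which matches the text: the node has to be deleted before it can be deployed again.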
Rescue
- rescuing
- Nodes in rescuing are being prepared to perform rescue operations. This consists of running a series of tasks, such as:
  - Setting appropriate BIOS configurations.
  - Creating any additional resources (node-specific network config, etc.) that may be required by additional subsystems.
  See Rescue Mode for information about this feature.
- rescue wait
- Just like the rescuing state, the nodes in rescue wait are being rescued. The difference is that in rescue wait the conductor is waiting for the ramdisk to boot or execute parts of the rescue which need to run in-band on the node (for example, setting the password for the user named rescue).
  The rescue operation of a node in rescue wait can be aborted by setting the node’s provision state using the abort verb.
- rescue failed
- This is the state a node will move into when a rescue operation fails, for example because of a timeout waiting for the ramdisk to PXE boot. From here the node can be transitioned to:
  - rescue (through rescuing) by setting the node’s provision state using the rescue verb.
  - active (through unrescuing) by setting the node’s provision state using the unrescue verb.
  - available (through deleting) by setting the node’s provision state using the deleted verb.
- rescue (stable state)
- Nodes in rescue have a rescue ramdisk running on them. Ironic may collect out-of-band sensor information (including power state) on a regular basis. Nodes in rescue can transition to:
  - active (through unrescuing) by setting the node’s provision state using the unrescue verb.
  - available (through deleting) by setting the node’s provision state using the deleted verb.
- unrescuing
- Nodes in unrescuing are being prepared to transition from the rescue state to the active state. This consists of running a series of tasks, such as setting appropriate BIOS configurations (for example, changing the boot device).
- unrescue failed
- This is the state a node will move into when an unrescue operation fails. From here the node can be transitioned to:
  - rescue (through rescuing) by setting the node’s provision state using the rescue verb.
  - active (through unrescuing) by setting the node’s provision state using the unrescue verb.
  - available (through deleting) by setting the node’s provision state using the deleted verb.
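The three rescue-related states that accept API requests (rescue failed, rescue, unrescue failed) share most of their exits. A minimal sketch of this as a lookup table follows; the names are hypothetical, and each verb maps to the intermediate state and the resulting state exactly as described in this section.

```python
# Hypothetical table transcribed from the descriptions above:
# state -> verb -> (intermediate state, resulting state).
_RESCUE = ("rescuing", "rescue")
_UNRESCUE = ("unrescuing", "active")
_DELETE = ("deleting", "available")

RESCUE_EXITS = {
    "rescue failed": {"rescue": _RESCUE, "unrescue": _UNRESCUE, "deleted": _DELETE},
    "rescue": {"unrescue": _UNRESCUE, "deleted": _DELETE},
    "unrescue failed": {"rescue": _RESCUE, "unrescue": _UNRESCUE, "deleted": _DELETE},
}


def resolve(state, verb):
    """Return (intermediate, result) for a verb, or raise ValueError."""
    try:
        return RESCUE_EXITS[state][verb]
    except KeyError:
        raise ValueError(f"verb {verb!r} is not listed for state {state!r}") from None
```

A client could call resolve before issuing a request to reject verbs that this section does not list for the node’s current state, such as the rescue verb on a node that is already in rescue.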
 
Servicing
- servicing
- Nodes in the servicing state are having service performed on them. This service is similar to cleaning, but is performed on nodes currently in the active state and returns them to the active state when complete.
  When a node is in the servicing state, the conductor is executing the service step or preparing the environment to execute the step.
  See Node servicing for more details.
- service wait
- Just like the servicing state, the nodes in the service wait state are being serviced with service steps. The difference is that in the service wait state the conductor is waiting for the ramdisk to boot or for the service step that is running in-band to finish.
  The servicing of a node in the service wait state can be interrupted by setting the node’s provision state using the abort verb if the task that is running allows it.
- service failed
- This is the state a node will move into when a service operation fails, for example because of a timeout waiting for the ramdisk to PXE boot. From here the node can be transitioned to:
  - active (through servicing) by setting the node’s provision state using the service verb.
  - rescue (through rescuing) by setting the node’s provision state using the rescue verb.
  - active by setting the node’s provision state using the abort verb.
Note
Before attempting to abort a servicing operation on a node in either the
service wait or service failed state, the user must check the
remote console of the machine as a precaution and ensure that no
firmware update is running on the node. Aborting service may result in a power
cycle, which may interrupt the running update and cause
irreversible damage to the hardware.
Once it is confirmed that no update operation is running, using the abort verb
is not expected to cause any issues.
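The precaution in the note can be encoded as a guard in tooling that issues abort requests. The following is a sketch only: the function, the set names, and the confirmation flag are hypothetical, and the sets list only the states this document describes as accepting the abort verb.

```python
# States this document describes as accepting the abort verb.
ABORTABLE_STATES = {"clean wait", "rescue wait", "service wait", "service failed"}
# States for which the note asks for a firmware-update check first.
FIRMWARE_CHECK_STATES = {"service wait", "service failed"}


def may_abort(state, no_firmware_update_confirmed=False):
    """Return True if tooling should go ahead and send the abort verb."""
    if state not in ABORTABLE_STATES:
        return False
    if state in FIRMWARE_CHECK_STATES:
        # Require an explicit operator confirmation for servicing states.
        return no_firmware_update_confirmed
    return True
```

A True result only means the guard has no objection. The request may still be refused if the task that is running does not allow interruption.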
