2023.2 (21.5.x - 23.0.x) Series Release Notes

23.0.0-29

Bug Fixes

  • Fixes an issue where unit tests showed this DeprecationWarning: The metaschema specified by $schema was not found. Using the latest draft to validate, but this will raise an error in the future. cls = validator_for(schema). The warning has been removed by using a new template.

  • Fixes the issue of service steps not starting due to servicing states (states.SERVICING and states.SERVICEWAIT) missing from _FASTTRACK_HEARTBEAT_ALLOWED constant.

  • Firmware components are now also cached on the transition to the manageable state in addition to cleaning. This is consistent with how BIOS settings, vendor and boot mode are cached.

  • Fixes the behavior of file:/// image URLs pointing at a symlink. Ironic no longer creates a hard link to the symlink, which could cause confusing FileNotFoundError to happen if the symlink is relative.

  • Nodes no longer get stuck in cleaning when the firmware components caching code raises an unexpected exception.

  • Prevents a database constraints error on caching firmware components when a supported component does not have the current version.

  • Fixes an issue when listing allocations as a project scoped user with the legacy RBAC policies disabled, in which an HTTP 406 error was erroneously raised. Users attempting to list allocations with a specific owner, different from their own, will now receive an HTTP 403 error.

  • Properly eject the virtual media from a DVD device in case this is the only MediaType available from the Hardware, and Ironic requested CD as the device to be used. See bug 2039042 for details.

  • Fixes an issue where a System Scoped user could not trigger a node into a manageable state with cleaning enabled, as the Neutron client would attempt to utilize their user’s token to create the Neutron port for the cleaning operation, as designed. This is because with requests made in the system scope, there is no associated project and the request fails.

    Ironic now checks if the request has been made with a system scope, and if so it utilizes the internal credential configuration to communicate with Neutron.

  • When configured to listen on a unix socket, Ironic will now properly clean up the unix socket on a clean service stop.

  • The idrac hardware type is now compatible with the redfish firmware interface. The link between them was missing initially.

  • Fixes issues with Lenovo hardware where the system firmware may display a blue “Boot Option Restoration” screen after the agent writes an image to the host in UEFI boot mode, requiring manual intervention before the deployed node boots. This issue is rooted in multiple changes being made to the underlying NVRAM configuration of the node. Lenovo engineers have suggested changing only the UEFI NVRAM and not performing any further changes via the BMC to configure the next boot. Ironic now follows this approach on Lenovo hardware. More information and background on this issue can be found in bug 2053064.

  • When Ironic hits the limit on the number of the concurrent deploys (specified in the [conductor]max_concurrent_deploy option), the resulting HTTP code is now 503 instead of the more generic 500.

  • The per-node external_http_url setting in the driver info is now used for a boot ISO. Previously this setting was only used for a config floppy.

  • Fixes an issue where the conductor service would fail to launch when the neutron network_interface setting was enabled, and no global cleaning_network or provisioning_network is set in ironic.conf. These settings have long been able to be applied on a per-node basis via the API. As such, the service can now be started and will error on node validation calls, as designed for drivers missing networking parameters.

  • When configuring secure boot via Redfish, internal server errors are now retried for a longer period than by default, accounting for the SecureBoot resource unavailability during configuration on some hardware.

  • Fixes a RAID creation issue in iLO6 and other BMCs with the latest schema by removing ‘VolumeType’ and ‘Encrypted’, and by moving ‘Drives’ inside ‘Links’.

  • Provides a fix for service role support to enable the use case where a dedicated service project is used for cloud service operation to facilitate actions as part of the operation of the cloud infrastructure.

    OpenStack clouds can take a variety of configuration models for service accounts. It is now possible to utilize the [DEFAULT] rbac_service_role_elevated_access setting to enable users with a service role in a dedicated service project to act upon the API similar to a “System” scoped “Member”, where resources are available regardless of owner or lessee settings. This is needed to enable synchronization processes, such as nova-compute or the networking-baremetal ML2 plugin, to perform actions across the whole of an Ironic deployment in cases where this is desirable and a “System” scoped user is undesirable.

    This functionality can be tuned to utilize a customized project name aside from the default convention service, for example baremetal or admin, utilizing the [DEFAULT] rbac_service_project_name setting.

    Operators can alternatively entirely override the service_role RBAC policy rule, if so desired, however Ironic feels the default is both reasonable and delineates sufficiently for the variety of Role Based Access Control usage cases which can exist with a running Ironic deployment.
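
As an illustration of the service role behavior described above, the following minimal Python sketch models the decision the two settings control. It is not Ironic's policy code: RequestContext and has_elevated_service_access are hypothetical names, and the assumption that elevated access is off unless enabled is ours.

```python
from dataclasses import dataclass


@dataclass
class RequestContext:
    """Hypothetical stand-in for the credentials attached to an API request."""
    roles: tuple
    project_name: str


def has_elevated_service_access(ctx, elevated_access=False,
                                service_project_name="service"):
    """Return True when a service-role user should see all resources.

    elevated_access mirrors [DEFAULT] rbac_service_role_elevated_access and
    service_project_name mirrors [DEFAULT] rbac_service_project_name.
    Assumption: elevated access stays off unless explicitly enabled.
    """
    if not elevated_access:
        return False
    return ("service" in ctx.roles
            and ctx.project_name == service_project_name)


# A nova-compute style account living in the dedicated service project.
nova = RequestContext(roles=("service",), project_name="service")
# The same role held in an ordinary tenant project.
tenant = RequestContext(roles=("service",), project_name="demo")

print(has_elevated_service_access(nova, elevated_access=True))
print(has_elevated_service_access(tenant, elevated_access=True))
```

Only the account in the configured service project is treated like a “System” scoped “Member” in this sketch; changing service_project_name to baremetal or admin models the customized project name case.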

23.0.0

Prelude

Ironic is proud to announce the release of 23.0, the capstone release of a six month OpenStack 2023.2 (Bobcat) cycle.

Our focus this cycle has been on improving the ability for operators to secure and service their Ironic nodes. There are also, as always, a myriad of quality of life fixes, including improvements to sqlite support, and graceful shutdown of conductors.

We hope the latest release of Ironic serves you well!

New Features

  • Adds inspection hooks in the agent inspect interface for processing data received from the ramdisk at the /v1/continue_inspection endpoint. The four default configuration hooks ramdisk-error, validate-interfaces, ports and architecture are added. Two new configuration options default_hooks and hooks are added in the inspector configuration section to allow configuring the default enabled hooks and optional additional hooks, respectively.

  • Adds a new Ironic capability called service_steps which allows a deployed ACTIVE node to be modified utilizing a new API provision state verb of service which can include a list of service_steps to be performed. This work is inspired by clean_steps and deploy_steps and similar to those efforts, this functionality will continue to evolve as new features, functionality, and capabilities are added.

  • Adds a new driver method decorator base.service_step which operates exactly like the existing base.clean_step and base.deploy_step decorators. Driver methods which are decorated can be invoked utilizing the service steps.

  • Adds Firmware Interface support to Ironic. Because this is a new feature and the developer community has limited hardware access, feedback is welcome; please reach out to us in case of any unexpected behavior.

    • Adds version 1.86 of the Bare Metal API, which includes:

      • List all firmware components of a node via the GET /v1/nodes/{node_ident}/firmware API.

      • The firmware_interface field of the node resource. A firmware interface can be set when creating or updating a node.

      • The default_firmware_interface and enabled_firmware_interface fields of the driver resource.

    • Adds new configuration options for the firmware interface feature:

      • Firmware interfaces are enabled via [DEFAULT]/enabled_firmware_interfaces. A default firmware interface to use when creating or updating nodes can be specified with [DEFAULT]/default_firmware_interface.

    • Available interfaces: redfish, no-firmware and fake.

    • Supports updating the firmware of the BIOS and BMC via the update step, which can be run as a clean or deploy step. The node must use the redfish driver and have the firmware_interface set.

  • Introduce new config parameters in the conductor group. The deploy_kernel_by_arch, deploy_ramdisk_by_arch, rescue_kernel_by_arch, and rescue_ramdisk_by_arch are dictionaries allowing operators to specify parameters of kernel and ramdisk by the architecture of the node.

  • Adds an [agent]allow_md5_checksum configuration option which can be used to tell ironic-python-agent versions newer than version 9.4.0 if MD5 is a permitted algorithm.

  • Adds the storage of the [json_rpc]port configuration value to the internal conductor hostname field when the [DEFAULT]rpc_transport setting is set to “json-rpc”. This allows deployments to utilize varying port configurations for JSON-RPC. As a result of this change, the RPC API version has been incremented to 1.57 and the feature is not available until any [DEFAULT]pin_release_version setting is removed.
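
To make the service_steps feature above more concrete, here is a minimal sketch of the request body that the service provision state verb could carry. The step layout (interface, step, args) is assumed to mirror clean_steps, as the note suggests; the step name my_service_step and its arguments are hypothetical placeholders, and sending the body to the node's provision state endpoint is left out.

```python
import json


def build_service_request(steps):
    """Build a provision state body that uses the service verb."""
    return {"target": "service", "service_steps": list(steps)}


# Hypothetical step: a driver method decorated with base.service_step.
steps = [
    {
        "interface": "deploy",        # assumed interface name
        "step": "my_service_step",    # hypothetical step name
        "args": {"example": "value"}, # hypothetical arguments
    }
]

body = build_service_request(steps)
print(json.dumps(body, indent=2))
```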

Known Issues

  • When boot mode needs to be changed during provisioning, an additional reboot may happen on certain hardware. This is to ensure consistent behavior when any boot setting change results in a separate internal job.

Upgrade Notes

  • Ironic 23.0 is part of the OpenStack 2023.2 (Bobcat) release. This is a non-SLURP release, meaning users of a 2023.1 (Antelope) cycle Ironic release can upgrade directly to the release accompanying 2024.1 (Caracal) when available. For more information, please visit Release Cadence Adjustment.

  • Changing the boot mode or the secure boot state via the direct API (/v1/nodes/{node_ident}/states/boot_mode and /v1/nodes/{node_ident}/states/secure_boot accordingly) may now result in a reboot. This happens when the change cannot be applied immediately. Previously, the change would be applied whenever the next reboot happens for any unrelated reason, causing inconsistent behavior.

  • Operators utilizing JSON-RPC transport to conductors with a non-default port configuration should expect to see the hash ring layout change as the port number is now included in the hash ring calculation. This will only occur once the hash ring pin has been removed.

  • Requires ironic-lib version 5.5.0 for the json-rpc port to be properly set and utilized.
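
The hash ring note above can be illustrated with a toy example. This sketch is not Ironic's hash ring implementation: it only shows that once the port becomes part of the conductor identifier, the identifier hashes to a different position, so node-to-conductor mappings can move. The host:port format and the port value 9999 are assumptions for illustration.

```python
import hashlib


def conductor_key(hostname, port=None):
    """Return the identifier placed on the ring, optionally with the port."""
    return hostname if port is None else f"{hostname}:{port}"


def ring_position(key):
    """Map an identifier to an integer position on a toy hash ring."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16)


old_key = conductor_key("conductor-1")        # before the change
new_key = conductor_key("conductor-1", 9999)  # port now included

print(old_key, ring_position(old_key))
print(new_key, ring_position(new_key))
```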

Deprecation Notes

  • The deploy_kernel, deploy_ramdisk, rescue_kernel, and rescue_ramdisk parameters have been marked as deprecated as the new parameters allow more configuration options.
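
A rough sketch of how a per-architecture dictionary such as deploy_kernel_by_arch can be consulted is shown below. The helper name pick_deploy_kernel, the image paths, and the fallback to the deprecated single-value option are assumptions for illustration, not a description of Ironic's internals.

```python
def pick_deploy_kernel(cpu_arch, deploy_kernel_by_arch, deploy_kernel=None):
    """Pick a kernel for the node's architecture.

    Falls back to the deprecated single-value option when the architecture
    has no entry (assumed behavior for this sketch).
    """
    return deploy_kernel_by_arch.get(cpu_arch, deploy_kernel)


# Hypothetical image locations keyed by node architecture.
by_arch = {
    "x86_64": "file:///images/ipa-x86_64.kernel",
    "aarch64": "file:///images/ipa-aarch64.kernel",
}

print(pick_deploy_kernel("aarch64", by_arch))
print(pick_deploy_kernel("ppc64le", by_arch, "file:///images/ipa.kernel"))
```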

Bug Fixes

  • Fixes an issue where inspection would fail if an IPv6 address wrapped in brackets is used for the redfish BMC address. See bug: 2036455.

  • Fixes an issue where lookups to generate an agent token would stack up as the internal lock upgrade logic silently holds on to the request while trying to obtain a lock. The task creation will now immediately fail with a NodeLocked exception, which the agent will retry.

  • While updating boot mode or secure boot state in the Redfish driver, the node is now rebooted if the change is not detected on the System resource refresh. Ironic then waits up to [redfish]boot_mode_config_timeout seconds until the change is applied.
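
For the bracketed IPv6 fix listed above, the following sketch shows one way to obtain a bare host from a BMC address using only the Python standard library. The helper name bmc_host and the example address are hypothetical; this is not the code used by Ironic.

```python
import ipaddress
from urllib.parse import urlsplit


def bmc_host(address):
    """Return the bare host from a BMC address, dropping IPv6 brackets."""
    if "//" not in address:
        # urlsplit only recognizes a network location after "//".
        address = "//" + address
    return urlsplit(address).hostname


host = bmc_host("https://[2001:db8::1]/redfish/v1")
# The bracket-free form is what address parsing expects.
print(host, ipaddress.ip_address(host).version)
```

Passing the bracketed form straight to ipaddress.ip_address() raises ValueError, which is the kind of failure a bracketed redfish address can trigger if the brackets are not removed first.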

Other Notes

  • While investigating bug 2033430 we discovered we were emitting DHCP option 210 only with OVN, and never emitted it with dnsmasq because it was not being set previously. Our internal notes also indicated this was for PXELinux support, but was never actually needed. As it was excess, redundant configuration being provided to Neutron, it has been removed.

22.1.0

New Features

  • Adds a capability for synchronous steps to be executed through the cleaning and deployment steps framework upon child nodes, as associated through the parent_node field. The new, optional step arguments are a boolean value of execute_on_child_nodes, and limit_child_node_execution which consists of a list of node UUIDs. The ability to invoke this permission requires the ability to set a provision state action upon the child node in the RBAC model.

  • Adds power_on, power_off, and reboot reserved step names which toggle power through the conductor. This allows embedded devices such as child nodes to have their power state toggled as part of the parent node’s cleaning or deployment sequence, if so stated through the supplied configuration or deployment template.

  • Adds clean hold and deploy hold provision states in which baremetal nodes can be placed using specialized hold cleaning and deployment steps. This allows for patterns and processes where Ironic’s work is intentionally paused so that external or operator processes can take place. In these new states, an unhold provision state verb can be used to inform Ironic to proceed. The abort verb is also available should operators wish to start over.

  • Adds the ability to send an unhold provision state verb utilizing API version 1.85.

  • Uses Redfish to collect the available hardware inventory information and stores it in the right format. Information collected includes cpu information including “count”, “architecture”, “model_name”, and “frequency”, disk “size” (in bytes), interface “mac_address”, “system_vendor” information including “product_name”, “serial_number” and “manufacturer”, and “current_boot_mode”.

  • Adds a wait clean/deploy step, which takes an optional argument, passed in a step definition of seconds to force an explicit pause of the current process. Otherwise the next heartbeat action triggers resumption of the process.

  • The ilo hardware type firmware upgrade steps now support checksum determination by length in order to allow SHA256 and SHA512 checksums to be supplied by the step caller.

  • Methods in vendor interfaces may now be decorated with clean_step and deploy_step decorators.

  • The ipmitool vendor interface’s send_raw method can now be called as a part of cleaning or deployment steps with a “raw_bytes” argument matching how it can be called with the vendor passthru interface.
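
As a sketch of the send_raw step mentioned above, a clean step definition could look like the following. The byte string is purely illustrative and validate_steps is a hypothetical helper added only to keep the example checkable; consult the hardware documentation before sending raw IPMI bytes.

```python
import json

# A clean step list invoking the vendor interface's send_raw method.
clean_steps = [
    {
        "interface": "vendor",
        "step": "send_raw",
        "args": {"raw_bytes": "0x00 0x00 0x00 0x00"},  # illustrative bytes
    }
]


def validate_steps(steps):
    """Check that every step names an interface and a step (sketch only)."""
    for step in steps:
        missing = {"interface", "step"} - step.keys()
        if missing:
            raise ValueError(f"step is missing keys: {sorted(missing)}")
    return steps


print(json.dumps(validate_steps(clean_steps), indent=2))
```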

Upgrade Notes

  • This release removes two internal foreign key constraints which were redundant and which SQLAlchemy indicated may result in an error at some point in time. No action is required by an operator for this.

Deprecation Notes

  • The use of a SQLite database with multi-process (i.e. ironic-api and ironic-conductor services) is not supported, and the ability to launch a dedicated ironic-api process with a SQLite database backend will be an error in the future. In this case, the single process combined API and Conductor service should be utilized.

  • The default value of the [inspector]require_managed_boot option will change from False to True in the future, causing in-band inspection to fail if the boot interface cannot prepare the ramdisk boot (e.g. in case of missing ports). Please set this option to an explicit value to avoid the behavior change.

Bug Fixes

  • Adds a database write retry decorator for SQLite failures reporting “database is locked”. By default, through the new configuration parameter [database]sqlite_max_wait_for_retry, retries will be performed on failing write operations for up to 30 seconds.

    This value can be tuned, but be warned it is an exponential backoff retry model, and HTTP requests can give up if no response is received in a reasonable time, thus 30 seconds was deemed a reasonable default.

    The retry logic can be disabled using the [database]sqlite_retries option, which defaults to True. Users of other, multi-threaded/concurrent-write database platforms are not impacted by this change, as the retry logic recognizes if another database is in use and bypasses the retry logic in that case. A similar retry logic concept already exists with other databases in the form of a “Database Deadlock” retry where two writers conflict on the same row or table. The database abstraction layer already handles such deadlock conditions. The SQLite locking issue is unfortunately more common with file based write locking, as the entire file, in other words the entire database, is locked to perform the write operation.

  • Fixes issues with locks related to the execution of periodic tasks where the task has a lingering transaction. For more information please see bug 2027405.

  • Fixes a bug that occurs when a node is inspected more than once and the database is configured as a storage backend: a new node inventory entry is added in the database for each inspection result, causing more than one inventory to exist for the node in the node_inventory table.

    This is handled by:

    • Deleting any previous inventory entries for a node before adding a new entry in the database.

    • Retrieving the most recent node inventory from the database when the database is queried. (To cater for databases that already contain duplicate node inventories due to the bug.)

  • Fixes the bug where provisioning a Redfish managed node fails if the BMC only supports virtual media devices limited to MediaType of DVD (and not CD). Also adds handling of BadRequest exceptions while iterating through the list of virtual media devices. This fix is needed to successfully provision machines such as Cisco UCSB and UCSX.

  • Fixes the bug where provisioning a Redfish managed node fails if changing BIOS settings is attempted on a BMC that doesn’t provide supportedApplyTime information. This is done by adding handling of AttributeError exception in apply_configuration() method.

  • Database locks with a sqlite database backend should now be lessened as the conductor will no longer perform a keepalive heartbeat operation when the use of SQLite has been detected.
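
The exponential backoff retry model described in the SQLite fix above can be sketched as a small decorator. This is a simplified illustration under our own assumptions, not Ironic's implementation: retry_on_locked, its parameters, and the doubling factor are hypothetical.

```python
import functools
import sqlite3
import time


def retry_on_locked(max_wait=30.0, initial_delay=0.1):
    """Retry a write when SQLite reports "database is locked".

    The delay doubles after each failed attempt, and the error is re-raised
    once another sleep would pass max_wait seconds.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + max_wait
            delay = initial_delay
            while True:
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError as exc:
                    if "database is locked" not in str(exc):
                        raise  # unrelated failure, do not retry
                    if time.monotonic() + delay > deadline:
                        raise  # out of time, surface the lock error
                    time.sleep(delay)
                    delay *= 2
        return wrapper
    return decorator


attempts = []


@retry_on_locked(max_wait=1.0, initial_delay=0.01)
def flaky_write():
    """Simulated write that hits a locked database twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise sqlite3.OperationalError("database is locked")
    return "written"


print(flaky_write(), "after", len(attempts), "attempts")
```

Because each wait is twice as long as the previous one, raising the maximum wait adds only a few extra attempts, which is why a larger value mostly lengthens how long an HTTP caller may be kept waiting.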

Other Notes

  • Fixes the generated state machine diagram and updates it to match the current state of the code.

  • The ipmitool vendor passthrough interface method no longer requires a http_method parameter. This is optional in the code base, but included on all API initiated vendor passthru method calls. The value was not utilized.

22.0.0

New Features

  • Adds a new conductor configuration option: [conductor]poweroff_in_cleanfail (default: False). When True, nodes entering the clean failed state will be powered off. This option may be unsafe when using cleaning to perform hardware-transformative actions such as firmware upgrades.

  • Adds the concept of parent_node which allows a “child node”, such as an independently managed BMC controlled device deployed within a parent_node as part of API version 1.83. Child nodes are hidden from normal node lists as they are not “general purpose” machines, but have a specific embedded usage. In this model, RBAC rules also apply so if you wish an owner or lessee to have the child node visible, they must also have the appropriate owner or lessee value set matching the parent node.

  • Adds a /v1/nodes/?include_children=True parameter to get a list of all nodes and their children.

  • Adds a /v1/nodes/?parent_node=<node_ident> query parameter to permit retrieval of a list of child nodes assigned to the parent denoted by <node_ident>.

  • On shutdown the conductor will wait for at most [DEFAULT]graceful_shutdown_timeout seconds for existing lock node reservations to clear. Previously lock reservations were cleared immediately, which in some cases would result in nodes going into a failed state.

  • The Redfish firmware upgrade interface now supports checksum determination by length, and sha256 and sha512 checksums may now be supplied to the step arguments.
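
Checksum determination by length, as mentioned above, relies on each algorithm producing a hex digest of a fixed size. The sketch below shows the idea; the function names are hypothetical, and treating a 32 character value as MD5 is our assumption about the legacy case rather than something stated in these notes.

```python
import hashlib

# Hex digest length for each algorithm.
_ALGORITHM_BY_HEX_LENGTH = {
    32: "md5",      # assumed legacy case
    64: "sha256",
    128: "sha512",
}


def algorithm_for(checksum):
    """Guess the hash algorithm from the length of a hex checksum."""
    try:
        return _ALGORITHM_BY_HEX_LENGTH[len(checksum)]
    except KeyError:
        raise ValueError(
            f"unsupported checksum length: {len(checksum)}") from None


def verify(data, checksum):
    """Hash data with the inferred algorithm and compare digests."""
    digest = hashlib.new(algorithm_for(checksum), data).hexdigest()
    return digest == checksum.lower()


payload = b"firmware image bytes"
expected = hashlib.sha512(payload).hexdigest()
print(algorithm_for(expected), verify(payload, expected))
```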

Upgrade Notes

  • This upgrade contains an additional field for the nodes table, named parent_node. This update also indexes the parent_node database column to prevent performance issues in large deployments.

  • [DEFAULT]graceful_shutdown_timeout defaults to 60s. Systemd TimeoutStopSec defaults to 30s. Kubernetes terminationGracePeriodSeconds defaults to 90s. It is recommended to align the value of [DEFAULT]graceful_shutdown_timeout with the graceful timeout of the process manager of the conductor process.

  • Fully removes the cpus property from the documentation and inspect interface implementations. It was never used internally by Ironic, and is no longer used by Nova.

  • The Linux kernel command line parameter nomodeset has been removed from the defaults for the kernel_append_params settings. The nomodeset option is for troubleshooting and changes the behavior of the graphics interface such that memory can be locked upon graphical updates on physical servers with BMC graphical interfaces, which results in spikes in latency and packet loss whenever graphics updates occur. Operators may add the option to their local configuration, but should be aware that large image transfers or other high IO operations can be impacted.

Bug Fixes

  • [bug 2010613] Fixes an issue with the SNMP v3 auth protocol and priv protocol set in driver info not being retrieved correctly when an SNMP client is initialized.

  • Fixes Ironic integration with Cinder because of changes which resulted as part of the recent Security related fix in bug 2004555. The work in Ironic to track this fix was logged in bug 2019892. Ironic now sends a service token to Cinder, which allows for access restrictions added as part of the original CVE-2023-2088 fix to be appropriately bypassed. Ironic was not vulnerable, but the restrictions added as a result did impact Ironic’s usage. This is because Ironic volume attachments are not on a shared “compute node”, but instead mapped to the physical machines and Ironic handles the attachment life-cycle after initial attachment.

  • Fixes Invalid cross-device link in some cases when using file:// image URLs.

  • Fixes issues in Ironic’s use of SQLAlchemy with SQLite Databases, which is common with users like Metal3, which prevented Ironic from supporting SQLAlchemy 2.0 properly, as autocommit was re-enabled.

  • Fixes bug of iRMC driver in parse_driver_info where, if FIPS is enabled, SNMP version is always required to be version 3 even though iRMC driver’s xxx_interface doesn’t use SNMP actually.

  • Fixes a bug in the iRMC driver where the irmc power_interface sets and updates the irmc_ipmi_succeed flag, which the rest of the iRMC driver code uses to deal with the iRMC firmware’s IPMI incompatibility, but the ipmitool power_interface neither sets nor updates that flag. As a result, the rest of the iRMC driver code failed to handle the iRMC firmware’s IPMI incompatibility correctly.

  • Fixes an issue where an agent token could be inadvertently orphaned if a node is already in the target power state when we attempt to turn the node off.

  • Fixes the scope classification check in the “self_owned_node” policy check, which was limited to execution with project scoped users only, so system scoped users who triggered the policy check would get an incorrect error.

  • Enables boot mode switching during anaconda deploy for ilo and ilo5 hardware types.

  • Fixes secure boot with anaconda deploy.

  • Fixes the bug where provisioning a Redfish managed node fails if the BMC doesn’t support EthernetInterfaces attribute, even if MAC address information is provided manually. This is done by handling of MissingAttributeError sushy exception in get_mac_addresses() method. This fix is needed to successfully provision machines such as Cisco UCSB and UCSX.

  • No longer re-calculates checksums for images that are already raw. Previously, it would cause significant delays in deploying raw images.

  • Fixes an issue where the database upgrade can hang on Python 3.10. This was because open transactions could become orphaned awaiting the Python runtime to clean up their memory references due to the way the overall database query was being initiated to pre-flight check the upgrade. We have structurally changed the behavior to remedy this case.

  • Agents deploying on physical servers with default kernel arguments were susceptible to packet loss if a Matrox VGA/Aspeed BMC Graphics interface is present on the machine. The defaults have been changed to remove the use of the nomodeset kernel command line parameter, which should only be used for troubleshooting, as it has been determined that the memory updates can lock all of the kernel memory upon any console graphics update, which can negatively impact IO for Networking or Disk interactions.

  • Fixes an issue where an agent token was being orphaned if a baremetal node timed out during cleaning operations, leading to cases where the node would not be able to establish a new token with Ironic in the future. We now always wipe the token in this case.
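
The Invalid cross-device link error mentioned in the file:// fix above is what the operating system reports when a hard link is attempted across filesystems. A common way to cope, sketched here under our own assumptions and not taken from Ironic's code, is to fall back to a copy; link_or_copy is a hypothetical helper name.

```python
import errno
import os
import shutil
import tempfile


def link_or_copy(source, dest):
    """Hard-link source to dest, copying when they are on different filesystems."""
    try:
        os.link(source, dest)
        return "linked"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # only the cross-device case is handled here
        shutil.copyfile(source, dest)
        return "copied"


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "image.raw")
    dst = os.path.join(tmp, "cached.raw")
    with open(src, "wb") as image:
        image.write(b"image-bytes")
    print(link_or_copy(src, dst))
```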