2023.2 Series (21.5.0 - 23.0.x) Release Notes

23.0.3

Security Issues

  • An issue in Ironic has been resolved where image checksums would not be checked prior to the conversion of an image to a raw format image from another image format.

    With default settings, this would not normally take place. However, it could occur when the image_download_source option was set to local. That option can be set at the node level for a single deployment, as the default for that baremetal node in all cases, or via the [agent]image_download_source configuration option. By default, this setting is http.

    This occurred in concert with the [DEFAULT]force_raw_images option when set to True, which caused Ironic to download and convert the file.

    In a fully integrated context of Ironic’s use in a larger OpenStack deployment, where images are coming from the Glance image service, the previous pattern was not problematic. The overall issue was introduced as a result of the capability to supply, cache, and convert a disk image provided as a URL by an authenticated user.

    Ironic will now validate the user supplied checksum prior to image conversion on the conductor. This can be disabled using the [conductor]disable_file_checksum configuration option.
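
The ordering described above, where the user-supplied checksum is verified before any conversion runs, can be sketched roughly as follows. This is a minimal illustration and not Ironic's implementation: the names file_checksum and verify_before_convert and the convert callback are hypothetical, and SHA-256 is assumed as the algorithm.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_before_convert(path, expected, convert, algorithm="sha256"):
    """Call convert(path) only if the file matches the supplied checksum.

    Hypothetical helper: `convert` stands in for whatever performs the
    image conversion. It is never called when the checksum mismatches.
    """
    actual = file_checksum(path, algorithm)
    if actual != expected.lower():
        raise ValueError(
            f"checksum mismatch: expected {expected}, got {actual}")
    return convert(path)
```

The point of the sketch is only the order of operations: the untrusted file is hashed and compared first, and the conversion step is unreachable on a mismatch.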

Bug Fixes

  • Fixes inspection failure when bmc_address or bmc_v6address is null in the inventory received from the ramdisk.

  • Fixes a security issue where Ironic would fail to checksum disk image files it downloads when Ironic had been requested to download and convert the image to a raw image format. This required the image_download_source to be explicitly set to local, which is not the default.

    This fix can be disabled by setting [conductor]disable_file_checksum to True; however, this option will be removed in a future major Ironic release.

    As a result of this, parity has been introduced to align Ironic to Ironic-Python-Agent’s support for checksums used by standalone users of Ironic. This includes support for remote checksum files to be supplied by URL, in order to prevent breaking existing users which may have inadvertently been leveraging the prior code path. This support can be disabled by setting [conductor]disable_support_for_checksum_files to True.

  • Fixes aborting in-band inspection. Previously, it would fail with Can not transition from state 'inspect failed' on event 'abort'.
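
For the remote checksum file support mentioned in the list above, a rough sketch of selecting the right entry from such a file might look like the following. The file layout is an assumption (either a single bare hash, or sha256sum-style lines of hash followed by filename), and find_checksum is a hypothetical name, not an Ironic or ironic-python-agent function.

```python
def find_checksum(checksum_file_text, image_filename):
    """Pick the checksum for image_filename out of checksum-file text.

    Assumed layout: either one bare hash on its own, or lines of the
    form "<hash>  <filename>" (a leading "*" marks binary mode).
    """
    lines = [ln.strip() for ln in checksum_file_text.splitlines()]
    lines = [ln for ln in lines if ln]
    # A file holding a single bare hash applies to whatever image is used.
    if len(lines) == 1 and len(lines[0].split()) == 1:
        return lines[0].lower()
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].lstrip("*") == image_filename:
            return parts[0].lower()
    raise ValueError(f"no checksum found for {image_filename}")
```

Raising when no entry matches, rather than skipping verification, keeps the sketch in line with the intent of the fix.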

23.0.2

Upgrade Notes

  • When upgrading Ironic to address the qemu-img image conversion security issues, the ironic-python-agent ramdisks will also need to be upgraded.

  • When upgrading Ironic to address the qemu-img image conversion security issues, the [conductor]conductor_always_validates_images setting may be set to True as a short term remedy while ironic-python-agent ramdisks are being updated. Alternatively it may be advisable to also set the [agent]image_download_source setting to local to minimize redundant network data transfers.

  • As a result of security fixes to address qemu-img image conversion security issues, a new configuration parameter has been added to Ironic, [conductor]permitted_image_formats, with a default value of “raw,qcow2,iso”. Raw and qcow2 format disk images are the image formats the Ironic community has consistently stated as supported and expected for use with Ironic. These formats also match the formats which the Ironic community tests. Operators who leverage other disk image formats may need to modify this setting further.
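
As an illustration of the settings named in these upgrade notes, the sketch below parses a sample configuration fragment. The option names and the default format list come from the notes; the chosen values are only an example, and Python's configparser is used here purely for demonstration (Ironic itself reads its configuration through oslo.config).

```python
import configparser

SAMPLE = """
[conductor]
# Short-term remedy while ironic-python-agent ramdisks are being updated.
conductor_always_validates_images = True
# Default value per the notes; extend only if other formats are required.
permitted_image_formats = raw,qcow2,iso

[agent]
# Optional: let the conductor download images to limit redundant transfers.
image_download_source = local
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

always_validate = parser.getboolean(
    "conductor", "conductor_always_validates_images")
formats = parser.get("conductor", "permitted_image_formats").split(",")
source = parser.get("agent", "image_download_source")
print(always_validate, formats, source)
```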

Security Issues

  • Ironic now checks the supplied image format value against the detected format of the image file, and will prevent deployments should the values mismatch. If being used with Glance and a mismatch in metadata is identified, it will require images to be re-uploaded with a new image ID to represent corrected metadata. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic always inspects the supplied user image content for safety prior to deployment of a node should the image pass through the conductor, even if the image is supplied in raw format. This is utilized to identify the format of the image and the overall safety of the image, such that source images with unknown or unsafe feature usage are explicitly rejected. This can be disabled by setting [conductor]disable_deep_image_inspection to True. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic can also inspect images which would normally be provided as a URL for direct download by the ironic-python-agent ramdisk. This is not enabled by default as it will increase the overall network traffic and disk space utilization of the conductor. This level of inspection can be enabled by setting [conductor]conductor_always_validates_images to True. Once the ironic-python-agent ramdisk has been updated, it will perform similar image security checks independently, should an image conversion be required. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic now explicitly enforces a list of permitted image types for deployment via the [conductor]permitted_image_formats setting, which defaults to “raw”, “qcow2”, and “iso”. While the project has classically always declared permissible images as “qcow2” and “raw”, it was previously possible to supply other image formats known to qemu-img, and the utility would attempt to convert the images. The “iso” support is required for “boot from ISO” ramdisk support.

  • Ironic now explicitly passes the source input format to executions of qemu-img to limit the permitted qemu disk image drivers which may evaluate an image to prevent any mismatched format attacks against qemu-img.

  • The ansible deploy interface example playbooks now supply an input format to executions of qemu-img. If you are using customized playbooks, please add “-f {{ ironic.image.disk_format }}” to your invocations of qemu-img. If you do not do so, qemu-img will automatically try to guess the format, which can lead to known security issues when the incorrect source format driver is selected.

  • Operators who have implemented any custom deployment drivers or additional functionality like machine snapshot, should review their downstream code to ensure they are properly invoking qemu-img. If there are any questions or concerns, please reach out to the Ironic project developers.

  • Operators are reminded that they should utilize cleaning in their environments. Disabling any security feature, such as cleaning or image inspection, is at your own risk. Should you have any issues with security related features, please don’t hesitate to open a bug with the project.

  • The [conductor]disable_deep_image_inspection setting is conveyed to the ironic-python-agent ramdisks automatically, and will prevent those operating ramdisks from performing deep inspection of images before they are written.

  • The [conductor]permitted_image_formats setting is conveyed to the ironic-python-agent ramdisks automatically. Should a need arise to explicitly permit an additional format, that should take place in the Ironic service configuration.
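
Two of the behaviors in this list, rejecting a mismatch between the declared and detected image format and naming the input format explicitly when calling qemu-img, can be sketched as below. The function names are hypothetical and the logic is a simplification under those assumptions, not the code Ironic ships; only the -f and -O options of qemu-img convert are relied upon.

```python
PERMITTED_FORMATS = ("raw", "qcow2", "iso")  # default list from the notes


def check_image_format(declared, detected, permitted=PERMITTED_FORMATS):
    """Reject images that are not permitted or whose formats disagree."""
    if detected not in permitted:
        raise ValueError(f"image format {detected!r} is not permitted")
    if declared != detected:
        raise ValueError(
            f"declared format {declared!r} does not match "
            f"detected format {detected!r}")
    return detected


def build_convert_command(source, dest, source_format):
    """Build a qemu-img call that states the input format with -f.

    Passing -f stops qemu-img from probing the file and choosing a
    format driver on its own.
    """
    return ["qemu-img", "convert", "-f", source_format,
            "-O", "raw", source, dest]
```

Operators reviewing custom playbooks or downstream drivers can compare their own invocations against the shape returned by build_convert_command.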

Bug Fixes

  • Fixes issue with configuring virtual media boot for executing service steps by adding missing entries for states.SERVICING and states.SERVICEWAIT in the whitelist of the states allowed by this method.

  • Fixes multiple issues in the handling of images as it relates to the execution of the qemu-img utility, which is used for image format conversion, where a malicious user could craft a disk image to potentially extract information from an ironic-conductor process’s operating environment.

    Ironic now explicitly enforces a list of approved image formats as a [conductor]permitted_image_formats list, which mirrors the image formats the Ironic project has historically tested and expressed as known working. Testing is not based upon file extension, but upon content fingerprinting of the disk image files. This is tracked as CVE-2024-44082 via bug 2071740.

  • Service step validation no longer requires a priority field, which is not supported for servicing.

  • Adds an ISO publisher value to ISO images which are mastered as part of cleaning/deployment/service operations in support of a fix for bug 2032377.
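
To make the idea of content fingerprinting (as opposed to trusting a file extension) concrete, here is a deliberately simplified sketch. It only recognizes the qcow2 header magic and the ISO 9660 volume identifier, and returns None for anything else; the real checks in Ironic are considerably more thorough, and the function name fingerprint is hypothetical.

```python
QCOW2_MAGIC = b"QFI\xfb"   # first four bytes of a qcow2 file
ISO9660_MAGIC = b"CD001"   # standard identifier of an ISO 9660 volume
ISO9660_OFFSET = 0x8001    # byte offset of that identifier


def fingerprint(path):
    """Guess an image format from file content, ignoring the file name.

    Returns "qcow2", "iso", or None when neither signature is present.
    """
    with open(path, "rb") as f:
        if f.read(4) == QCOW2_MAGIC:
            return "qcow2"
        f.seek(ISO9660_OFFSET)
        if f.read(5) == ISO9660_MAGIC:
            return "iso"
    return None
```

A file named disk.iso that begins with the qcow2 magic is reported as qcow2 by this sketch, which is the kind of mismatch the format checks are meant to surface.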

23.0.1

Bug Fixes

  • Fixes an issue with unit tests that showed this DeprecationWarning: The metaschema specified by $schema was not found. Using the latest draft to validate, but this will raise an error in the future. cls = validator_for(schema). The warning for the deprecated schema was removed by using a new template.

  • Fixes the issue of service steps not starting due to servicing states (states.SERVICING and states.SERVICEWAIT) missing from _FASTTRACK_HEARTBEAT_ALLOWED constant.

  • Firmware components are now also cached on the transition to the manageable state in addition to cleaning. This is consistent with how BIOS settings, vendor and boot mode are cached.

  • Fixes the behavior of file:/// image URLs pointing at a symlink. Ironic no longer creates a hard link to the symlink, which could cause confusing FileNotFoundError to happen if the symlink is relative.

  • Nodes no longer get stuck in cleaning when the firmware components caching code raises an unexpected exception.

  • Prevents a database constraints error on caching firmware components when a supported component does not have the current version.

  • Fixes an issue when listing allocations as a project scoped user when the legacy RBAC policies have been disabled, which caused an HTTP 406 error to be erroneously raised. Users attempting to list allocations with a specific owner, different from their own, will now receive an HTTP 403 error.

  • Properly eject the virtual media from a DVD device in case this is the only MediaType available from the Hardware, and Ironic requested CD as the device to be used. See bug 2039042 for details.

  • Fixes an issue where a System Scoped user could not trigger a node into a manageable state with cleaning enabled, as the Neutron client would attempt to utilize their user’s token to create the Neutron port for the cleaning operation, as designed. This is because with requests made in the system scope, there is no associated project and the request fails.

    Ironic now checks if the request has been made with a system scope, and if so it utilizes the internal credential configuration to communicate with Neutron.

  • When configured to listen on a unix socket, Ironic will now properly clean up the unix socket on a clean service stop.

  • The idrac hardware type is now compatible with the redfish firmware interface. The link between them was missing initially.

  • Fixes issues with Lenovo hardware where the system firmware may display a blue “Boot Option Restoration” screen after the agent writes an image to the host in UEFI boot mode, requiring manual intervention before the deployed node boots. This issue is rooted in multiple changes being made to the underlying NVRAM configuration of the node. Lenovo engineers have suggested changing only the UEFI NVRAM and not performing any further changes via the BMC to configure the next boot. Ironic now does so on Lenovo hardware. More information and background on this issue can be found in bug 2053064.

  • When Ironic hits the limit on the number of the concurrent deploys (specified in the [conductor]max_concurrent_deploy option), the resulting HTTP code is now 503 instead of the more generic 500.

  • The per-node external_http_url setting in the driver info is now used for a boot ISO. Previously this setting was only used for a config floppy.

  • Fixes an issue where the conductor service would fail to launch when the neutron network_interface setting was enabled, and no global cleaning_network or provisioning_network is set in ironic.conf. These settings have long been able to be applied on a per-node basis via the API. As such, the service can now be started and will error on node validation calls, as designed for drivers missing networking parameters.

  • When configuring secure boot via Redfish, internal server errors are now retried for a longer period than by default, accounting for the SecureBoot resource unavailability during configuration on some hardware.

  • Fixes a RAID creation issue in iLO6 and other BMCs with the latest schema by removing ‘VolumeType’ and ‘Encrypted’ and moving ‘Drives’ inside ‘Links’.

  • Provides a fix for service role support to enable the use case where a dedicated service project is used for cloud service operation to facilitate actions as part of the operation of the cloud infrastructure.

    OpenStack clouds can take a variety of configuration models for service accounts. It is now possible to utilize the [DEFAULT] rbac_service_role_elevated_access setting to enable users with a service role in a dedicated service project to act upon the API similar to a “System” scoped “Member”, where resources are available regardless of owner or lessee settings. This is needed to enable synchronization processes, such as nova-compute or the networking-baremetal ML2 plugin, to perform actions across the whole of an Ironic deployment in cases where using a “System” scoped user is undesirable.

    This functionality can be tuned to utilize a customized project name aside from the default convention service, for example baremetal or admin, utilizing the [DEFAULT] rbac_service_project_name setting.

    Operators can alternatively entirely override the service_role RBAC policy rule, if so desired, however Ironic feels the default is both reasonable and delineates sufficiently for the variety of Role Based Access Control usage cases which can exist with a running Ironic deployment.

  • Fixes service steps that rely on a reboot. Previously, the reboot was not properly recognized in the conductor logic.
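
The file:/// symlink fix in the list above can be pictured with a short sketch: resolve the symlink first, then hard link to the real file rather than to the symlink itself. The function name link_image is hypothetical and this is not Ironic's code, only an illustration of the approach.

```python
import os


def link_image(source, dest):
    """Hard link dest to the real file behind source.

    Resolving the path first avoids hard linking to a symlink, whose
    relative target would no longer resolve from another directory.
    """
    real_source = os.path.realpath(source)
    os.link(real_source, dest)
    return real_source
```

As with any hard link, the source file and the destination need to live on the same filesystem for this to succeed.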

23.0.0

Prelude

Ironic is proud to announce the release of 23.0, the capstone release of a six month OpenStack 2023.2 (Bobcat) cycle.

Our focus this cycle has been on improving the ability for operators to secure and service their Ironic nodes. There are also, as always, a myriad of quality of life fixes, including improvements to sqlite support, and graceful shutdown of conductors.

We hope the latest release of Ironic serves you well!

New Features

  • Adds inspection hooks in the agent inspect interface for processing data received from the ramdisk at the /v1/continue_inspection endpoint. The four default configuration hooks ramdisk-error, validate-interfaces, ports and architecture are added. Two new configuration options default_hooks and hooks are added in the inspector configuration section to allow configuring the default enabled hooks and optional additional hooks, respectively.

  • Adds a new Ironic capability called service_steps which allows a deployed ACTIVE node to be modified utilizing a new API provision state verb of service which can include a list of service_steps to be performed. This work is inspired by clean_steps and deploy_steps and similar to those efforts, this functionality will continue to evolve as new features, functionality, and capabilities are added.

  • Adds a new driver method decorator base.service_step which operates exactly like the existing base.clean_step and base.deploy_step decorators. Driver methods which are decorated can be invoked utilizing the service steps.

  • Adds Firmware Interface support to Ironic. As this is a new feature and we as a developer community have limited hardware access, we would like to receive feedback. Please reach out to us in case of any unexpected behavior.

    • Adds version 1.86 of the Bare Metal API, which includes:

      • List all firmware components of a node via the GET /v1/nodes/{node_ident}/firmware API.

      • The firmware_interface field of the node resource. A firmware interface can be set when creating or updating a node.

      • The default_firmware_interface and enabled_firmware_interface fields of the driver resource.

    • Adds new configuration options for the firmware interface feature:

      • Firmware interfaces are enabled via [DEFAULT]/enabled_firmware_interfaces. A default firmware interface to use when creating or updating nodes can be specified with [DEFAULT]/default_firmware_interface.

    • Available interfaces: redfish, no-firmware and fake.

    • Adds support for updating BIOS and BMC firmware via the update step, which can be run as a clean or deploy step. The node should be using the redfish driver and have its firmware_interface set.

  • Introduce new config parameters in the conductor group. The deploy_kernel_by_arch, deploy_ramdisk_by_arch, rescue_kernel_by_arch, and rescue_ramdisk_by_arch are dictionaries allowing operators to specify parameters of kernel and ramdisk by the architecture of the node.

  • Adds a [agent]allow_md5_checksum configuration option which can be used to tell ironic-python-agent versions newer than version 9.4.0 if MD5 is a permitted algorithm.

  • Adds the storage of the [json_rpc]port configuration value to the internal conductor hostname field when the [DEFAULT]rpc_transport setting is set to “json-rpc”. This allows deployments to utilize varying port configurations for JSON-RPC. As a result of this change, the RPC API version has been incremented to 1.57 and the feature is not available until any [DEFAULT]pin_release_version setting is removed.
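
To illustrate the service_steps capability introduced in this list, the sketch below builds a request body for the node provision state API using the service verb. The step entry follows the interface/step/args shape used by clean and deploy steps; the step name example_step and the helper build_service_request are hypothetical placeholders, so consult the API reference for the exact fields and required API version.

```python
import json


def build_service_request(service_steps):
    """Build the JSON body for a provision state change to "service"."""
    body = {"target": "service", "service_steps": service_steps}
    return json.dumps(body, sort_keys=True)


# Hypothetical step: interface and step name are placeholders only.
steps = [{"interface": "deploy", "step": "example_step", "args": {}}]
payload = build_service_request(steps)
print(payload)
```

Such a body would be sent with a PUT to /v1/nodes/{node_ident}/states/provision for a node that is currently active.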

Known Issues

  • When boot mode needs to be changed during provisioning, an additional reboot may happen on certain hardware. This is to ensure consistent behavior when any boot setting change results in a separate internal job.

Upgrade Notes

  • Ironic 23.0 is part of the OpenStack 2023.2 (Bobcat) release. This is a non-SLURP release, meaning users of a 2023.1 (Antelope) cycle Ironic release can upgrade directly to the release accompanying 2024.1 (Caracal) when available. For more information, please visit Release Cadence Adjustment.

  • Changing the boot mode or the secure boot state via the direct API (/v1/nodes/{node_ident}/states/boot_mode and /v1/nodes/{node_ident}/states/secure_boot accordingly) may now result in a reboot. This happens when the change cannot be applied immediately. Previously, the change would be applied whenever the next reboot happens for any unrelated reason, causing inconsistent behavior.

  • Operators utilizing JSON-RPC transport to conductors with a non-default port configuration should expect to see the hash ring layout change as the port number is now included in the hash ring calculation. This will only occur once the hash ring pin has been removed.

  • Requires ironic-lib version 5.5.0 for the json-rpc port to be properly set and utilized.

Deprecation Notes

  • The deploy_kernel, deploy_ramdisk, rescue_kernel, and rescue_ramdisk parameters have been marked as deprecated as the new parameters allow more configuration options.
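
As a rough picture of how a per-architecture dictionary such as deploy_kernel_by_arch can coexist with the deprecated single-value option, consider the sketch below. The lookup order (architecture entry first, then the deprecated value) is an assumption made for illustration, and the function name and file paths are hypothetical.

```python
def pick_deploy_kernel(cpu_arch, by_arch, fallback=None):
    """Choose a deploy kernel for a node's CPU architecture.

    Assumed order: the per-architecture mapping wins, and the
    deprecated single value is used only when no entry matches.
    """
    kernel = by_arch.get(cpu_arch)
    if kernel is not None:
        return kernel
    if fallback is not None:
        return fallback
    raise LookupError(f"no deploy kernel configured for {cpu_arch}")


# Hypothetical values standing in for deploy_kernel_by_arch.
BY_ARCH = {
    "x86_64": "file:///images/ipa-x86_64.kernel",
    "aarch64": "file:///images/ipa-aarch64.kernel",
}
print(pick_deploy_kernel("aarch64", BY_ARCH))
```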

Bug Fixes

  • Fixes an issue where inspection would fail if an IPv6 address wrapped in brackets is used for the redfish BMC address. See bug: 2036455.

  • Fixes an issue where lookups to generate an agent token would stack up as the internal lock upgrade logic silently holds on to the request while trying to obtain a lock. The task creation will now immediately fail with a NodeLocked exception, which the agent will retry.

  • While updating boot mode or secure boot state in the Redfish driver, the node is now rebooted if the change is not detected on the System resource refresh. Ironic then waits up to [redfish]boot_mode_config_timeout seconds until the change is applied.
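
For the bracketed IPv6 BMC address case in the list above, the standard library already separates the host from its brackets. The sketch below shows that behavior with urllib.parse; the helper name bmc_host and the documentation-range addresses are illustrative assumptions, and this is not the code path Ironic uses.

```python
from urllib.parse import urlsplit


def bmc_host(redfish_address):
    """Return the bare host of a BMC URL, without IPv6 brackets."""
    return urlsplit(redfish_address).hostname


print(bmc_host("https://[2001:db8::1]/redfish/v1"))
print(bmc_host("https://192.0.2.10:8443/redfish/v1"))
```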

Other Notes

  • While investigating bug 2033430 we discovered we were emitting DHCP option 210 only with OVN, and never emitted it with dnsmasq because it was not being set previously. Our internal notes also indicated this was for PXELinux support, but was never actually needed. As it was excess, and redundant configuration being provided to Neutron, it has been removed.