2024.1 Series Release Notes

24.1.1-3

Bug Fixes

  • Fixes issue with configuring virtual media boot for executing service steps by adding missing entries for states.SERVICING and states.SERVICEWAIT in the whitelist of the states allowed by this method.

  • Service step validation no longer requires a priority field, which is not supported for servicing.

  • Fixes generated URL when using the virtual media attachment API. Previously, it missed the node UUID, causing conflicts between different nodes.

24.1.0

Prelude

Ironic contributors are thrilled to present the release of 24.1.0, tested as part of OpenStack 2024.1 (Caracal) throughout the last six months. This release can be upgraded to directly from Ironic 21.4 as part of a SLURP upgrade from OpenStack 2023.1 (Antelope). Ironic’s first release came during the 2014.1 (Icehouse) cycle, a decade ago. In those ten years, Redfish has been created, the default deploy driver has been replaced, and Ironic has expanded into the CNCF community with Metal3. Thanks for making us a part of your cloud!

New Features

  • Adds an http boot interface, based upon the pxe boot interface, which informs the DHCP server of an HTTP URL to boot the machine from, and then requests that the BMC boot the machine in UEFI HTTP mode.

  • Adds an http-ipxe boot interface, based upon the ipxe boot interface, which informs the DHCP server of an HTTP URL to boot the machine from, and then requests that the BMC boot the machine in UEFI HTTP mode.

  • Adds node auto-discovery support to the agent inspection implementation.

  • Adds support for OVN VTEP switches. Operators will be able to use logical and physical switches. Minimally tested in production.

  • Adds a new service ironic-pxe-filter that is designed to work with the agent inspect interface to conduct “unmanaged” inspection. It is adapted from the ironic-inspector’s dnsmasq PXE filter and can be used as its replacement. See documentation for more details.

  • Adds implementation of attach/detach generic virtual media device to the Redfish driver.

Known Issues

  • Testing of the http boot interface with the GRUB2 provided by Ubuntu 22.04 yielded some intermittent failures which appear to be environmental in nature: the signed Shim loader would start and load the GRUB loader, GRUB would then attempt to access some of the expected files, and the access would fail due to an apparent transfer timeout. Consultation with some GRUB developers concurs that this is likely environmental, meaning related to the specific GRUB build or to CI performance. If you encounter any issues, please do not hesitate to reach out to the Ironic developer community.

Upgrade Notes

  • Adds an online migration to the new inspection interface. If the agent inspection is enabled and the inspector inspection is disabled, the inspect_interface field will be updated for all nodes that use inspector and are not currently being inspected (i.e. not in the inspect wait or inspecting states).

    If some nodes may be inspecting during the upgrade, you may want to run the online migrations several times with a delay to finish migrating all nodes.
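
The selection rule described above can be sketched in Python. This is a hypothetical illustration, not Ironic's migration code: the function name, the dictionary layout of a node, and the assumption that migrated nodes are switched to the agent interface are all made up for the example.

```python
# Hypothetical sketch of the migration's selection rule; not Ironic code.
IN_PROGRESS_STATES = {"inspect wait", "inspecting"}


def migrate_nodes(nodes, agent_enabled=True, inspector_enabled=False):
    """Switch eligible nodes to 'agent'; return (migrated, pending) UUIDs."""
    migrated, pending = [], []
    # The migration only applies when agent is enabled and inspector is not.
    if not agent_enabled or inspector_enabled:
        return migrated, pending
    for node in nodes:
        if node["inspect_interface"] != "inspector":
            continue  # nodes on other interfaces are left alone
        if node["provision_state"] in IN_PROGRESS_STATES:
            pending.append(node["uuid"])  # retry on a later run
        else:
            node["inspect_interface"] = "agent"  # assumed target value
            migrated.append(node["uuid"])
    return migrated, pending
```

A non-empty pending list corresponds to the case where the online migrations need to be run again after a delay.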

Deprecation Notes

  • The redfish vendor eject vmedia action is now deprecated and it will be removed during the next cycle in favor of the generic API.

Bug Fixes

  • Fixes Redfish virtual media boot on BMCs that only expose the VirtualMedia resource on Systems instead of Managers. For more information, see bug 2039458.

  • Fixes a vague error when attempting to use the ilo hardware type with iLO6 hardware, by returning a more specific error suggesting action to take in order to remedy the issue. Specifically, one of the APIs used by the ilo hardware type is disabled in iLO6 BMCs in favor of users utilizing Redfish. Operators are advised to utilize the redfish hardware type for these machines.

  • Some of Ironic’s API endpoints, when the new RBAC policy is being enforced, were previously emitting 500 error codes when insufficient access rights were being used, specifically because the policy required system scope. This has been corrected, and the endpoints should now properly signal a 403 error code if insufficient access rights are present for an authenticated requestor.

  • Increases the 32-character limit of the user column in the NodeHistory model to support up to 64-character-long values. For more information, see bug.

  • Fixes issues with Lenovo hardware where the system firmware may display a blue “Boot Option Restoration” screen after the agent writes an image to the host in UEFI boot mode, requiring manual intervention before the deployed node boots. This issue is rooted in multiple changes being made to the underlying NVRAM configuration of the node. Lenovo engineers have suggested changing only the UEFI NVRAM and not performing any further changes via the BMC to configure the next boot. Ironic now does so on Lenovo hardware. More information and background on this issue can be found in bug 2053064.

  • Fixes an issue where the conductor service would fail to launch when the neutron network_interface setting was enabled and no global cleaning_network or provisioning_network was set in ironic.conf. These settings have long been able to be applied on a per-node basis via the API. As such, the service can now be started and will error on node validation calls, as designed for drivers missing networking parameters.

  • Each conductor now reserves a small proportion of its worker threads (5% by default) for API requests and other critical tasks. This ensures that the API stays responsive even under extreme internal load.

  • Provides a fix for service role support to enable the use case where a dedicated service project is used for cloud service operation to facilitate actions as part of the operation of the cloud infrastructure.

    OpenStack clouds can take a variety of configuration models for service accounts. It is now possible to utilize the [DEFAULT] rbac_service_role_elevated_access setting to enable users with a service role in a dedicated service project to act upon the API similarly to a “System” scoped “Member”, for whom resources are available regardless of owner or lessee settings. This is needed to enable synchronization processes, such as nova-compute or the networking-baremetal ML2 plugin, to perform actions across the whole of an Ironic deployment when a “System” scoped user is undesirable.

    This functionality can be tuned to utilize a customized project name aside from the default convention service, for example baremetal or admin, utilizing the [DEFAULT] rbac_service_project_name setting.

    Operators can alternatively entirely override the service_role RBAC policy rule, if so desired, however Ironic feels the default is both reasonable and delineates sufficiently for the variety of Role Based Access Control usage cases which can exist with a running Ironic deployment.

  • Query parameters in the API that expect lists now accept repeated arguments (param=value1&param=value2) in addition to comma-separated strings (param=value1,value2). The former seems to be more common and is actually (incorrectly) used in GopherCloud.

  • Fixes error handling in the virtual media attachment API when the image downloading fails. Now the last_error field is populated correctly and the error is logged.
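
The two list encodings mentioned above for query parameters can be normalized as in the following Python sketch. It only illustrates the accepted input forms using the standard library; the function name is hypothetical and this is not how Ironic's API layer is implemented.

```python
from urllib.parse import parse_qs


def get_list_param(query_string, name):
    """Return the values of `name`, accepting both list encodings."""
    values = []
    # parse_qs collects repeated arguments into a list of raw strings.
    for raw in parse_qs(query_string).get(name, []):
        # Each raw string may itself be a comma-separated list.
        values.extend(part for part in raw.split(",") if part)
    return values
```

With this helper, `param=value1&param=value2` and `param=value1,value2` yield the same list.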

24.0.0

New Features

  • Adds the capability to define a default_conductor_group setting which allows operators to assign a default conductor group to new nodes created in Ironic if they do not otherwise have a conductor_group set upon creation. By default, this setting has no value.

  • Adds support for Redfish based HTTPBoot, which leverages the DMTF Redfish HttpBootUri ComputerSystem resource in a BMC to assert the URL for the next boot operation. This requires Sushy 4.7.0 as the minimum version.

  • Adds a new capability allowing generic ISO images to be attached or detached as virtual media devices after a node has been provisioned.

  • Previously the key for building temporary URLs from Swift was taken from the x-account-meta-temp-url-key header in the object store account. Now the header x-account-meta-temp-url-key-2 is also checked, which allows password rotation to occur without breaking old URLs.

    This applies to the following temporary URL scenarios:

    • Temp URL image transfer from Glance (when [glance]swift_temp_url_key is not set)

    • Publishing an image with the Swift publisher ([redfish]use_swift=True or [ilo]use_web_server_for_images=False)

    • Storing the config drive in Swift ([deploy]configdrive_use_object_store=True)

    • Fetching Swift stored firmware update payloads.

  • Introduces basic authentication and configurable authentication strategy support for image and image checksum download processes. This feature introduces 3 new configuration variables that can be used to select the authentication strategy and provide credentials for it. One of them, [deploy]image_server_auth_strategy (string), selects between authentication strategies by name.

    Currently the only supported authentication strategy is http-basic, which makes IPA use HTTP(S) basic authentication, also known as the RFC 7617 standard. The other 2 variables, [deploy]image_server_password and [deploy]image_server_user, provide username and password credentials for image download processes. They are not strategy specific and could be reused for any username + password based authentication strategy, but for the moment they are only used for the http-basic strategy.

    [deploy]image_server_auth_strategy doesn’t just enable the feature but enforces checks on the values of the 2 related credentials. When the http-basic strategy is enabled for image server download workflow the download logic will make sure to raise an exception in case any of the credentials are None or an empty string.

    An example of activating the http-basic strategy can be found in the “HTTP(s) Authentication strategy for user image servers” section of the admin guide.
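
To make the http-basic behaviour concrete, the following Python sketch builds an RFC 7617 Authorization header and rejects missing credentials, mirroring the check on None and empty-string values described above. The function name and the exception type are assumptions made for illustration; they are not taken from Ironic or IPA.

```python
import base64


def basic_auth_header(user, password):
    """Build an RFC 7617 Authorization header from the two credentials."""
    checks = (("image_server_user", user),
              ("image_server_password", password))
    for option, value in checks:
        # Both credentials are mandatory once the strategy is selected.
        if value is None or value == "":
            raise ValueError(
                f"[deploy]{option} must be set for the http-basic strategy")
    # RFC 7617: base64 of "user-id:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return {"Authorization": "Basic " + token.decode("ascii")}
```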

Upgrade Notes

  • The Ironic service API Role Based Access Control policy has been updated to disable the legacy RBAC policy by default. The effect of this is that deprecated legacy roles of baremetal_admin and baremetal_observer are no longer functional by default, and policy checks may prevent actions such as viewing nodes when access rights do not exist by default.

    This change is a result of the new policy which was introduced as part of Secure Role Based Access Control effort along with the Consistent and Secure RBAC community goal and the underlying [oslo_policy] enforce_scope and [oslo_policy] enforce_new_defaults settings being changed to True.

    The Ironic project believes most operators will observe no direct impact from this change, unless they are specifically running legacy access configurations utilizing the legacy roles for access.

    Operators who are suddenly unable to list or deploy nodes may have a misconfiguration in credentials, or need to allow the user’s project the ability to view and act upon the node through the node owner or lessee fields. By default, the Ironic API policy permits authenticated requests with a system scoped token to access all resources, and applies a finer grained access model across the API for project scoped users.

    Ironic users who have not already changed their nova-compute service settings for connecting to Ironic may also have issues scheduling Bare Metal nodes. Use of a system scoped user is available, by setting [ironic] system_scope to a value of all in your nova-compute service configuration, which can be done independently of other services, as long as the credentials supplied are also valid with Keystone for system scoped authentication.

    Heat users who encounter any issues after this upgrade should check their user’s roles. Heat’s execution and model is entirely project scoped, which means users will need to have access granted through the owner or lessee field to work with a node.

    Operators wishing to revert to the old policy configuration may do so by setting the following values in ironic.conf:

    [oslo_policy]
    enforce_new_defaults=False
    enforce_scope=False
    

    Operators who revert the configuration are encouraged to make the necessary changes to their configuration, as the legacy RBAC policy will be removed at some point in the future in alignment with 2024.1-Release Timeline. Failure to do so may force operators to craft custom policy override configuration.

  • Removes the sphinxcontrib-seqdiag dependency as the Pillow upgrade to version 10.x (from OpenStack upper constraints) breaks its usage. seqdiag has not been maintained for the last 3 years, hence the upgrade causes it to break. In the ironic docs (source) rst files, adds references to svg files, and keeps the svg files in the doc/source/images/ directory, alongside their associated .diag files as backup.

  • The default value of the configuration option [inspector]require_managed_boot is now True for the newer agent inspect interface. The older inspector implementation is not affected. Operators with deployments that support unmanaged inspection must set this value to False explicitly.

  • python-swiftclient is no longer a dependency; all OpenStack Swift operations are now done using openstacksdk.

    Configuration option [swift]swift_max_retries has been removed and any custom value will no longer have any effect on failed object-store operations.

Deprecation Notes

  • The deploy_kernel, deploy_ramdisk, rescue_kernel and rescue_ramdisk configuration options, incorrectly deprecated in the 2023.2 release series, are no longer deprecated.

  • The idrac hardware type management interface steps import_configuration and export_configuration are deprecated, and will be removed once a formalized generic step templating mechanism has been created within Ironic. The Ironic community is open to reconsidering this decision should the overall bulk configuration reset/templating model become adopted by DMTF Redfish as a standardized cross-vendor feature.

  • The ibmc hardware type is deprecated due to a lack of upstream communication, driver maintenance, and a recognition that the Redfish hardware type likely works for the users at this point. This driver is expected to be removed during the 2024.2 development cycle.

  • The xclarity hardware type is deprecated due to a lack of upstream communication, driver maintenance, and a recognition that the Redfish hardware type is suitable for Lenovo hardware users moving forward. This driver is expected to be removed during the 2024.2 development cycle.

  • The idrac-wsman interfaces on the idrac hardware type are deprecated due to a lack of upstream communication, and the past decision of the driver’s maintainer to move in the direction of using Redfish for driver interactions. These driver interfaces are expected to be removed during the 2024.2 development cycle.

  • Rootwrap support is deprecated since Ironic no longer runs any commands as root. Files /etc/ironic/rootwrap.conf, /etc/ironic/rootwrap.d and the ironic-rootwrap command will be removed in a future release.

Bug Fixes

  • Firmware components are now also cached on the transition to the manageable state in addition to cleaning. This is consistent with how BIOS settings, vendor and boot mode are cached.

  • Fixes the behavior of file:/// image URLs pointing at a symlink. Ironic no longer creates a hard link to the symlink, which could cause confusing FileNotFoundError to happen if the symlink is relative.

  • Nodes no longer get stuck in cleaning when the firmware components caching code raises an unexpected exception.

  • Prevents a database constraints error on caching firmware components when a supported component does not have the current version.

  • Fixes an issue when listing allocations as a project scoped user when the legacy RBAC policies have been disabled, which caused an HTTP 406 error to be erroneously raised. Users attempting to list allocations with a specific owner, different from their own, will now receive an HTTP 403 error.

  • When the LLDP raw data collected by the inspection process included non-UTF-8 information, the parser failed, breaking the inspection process. This patch works around that by excluding the malformed data and adding a log entry with information on the failed TLV.

  • Fixes an issue where a System Scoped user could not trigger a node into a manageable state with cleaning enabled, as the Neutron client would attempt to utilize their user’s token to create the Neutron port for the cleaning operation, as designed. This is because with requests made in the system scope, there is no associated project and the request fails.

    Ironic now checks if the request has been made with a system scope, and if so it utilizes the internal credential configuration to communicate with Neutron.

  • When configured to listen on a unix socket, Ironic will now properly cleanup the unix socket on a clean service stop.

  • The idrac hardware type is now compatible with the redfish firmware interface. The link between them was missing initially.

  • Fixes the inspection lookup to consider all nodes with the same BMC hostname, as can happen with Redfish. In this case, the nodes are distinguished by MAC addresses.

  • Fixes getting details of a conductor if it uses a non-standard JSON RPC port or an IPv6 address as the name, e.g. GET /v1/conductors/[2001:db8::1]:8090. Previously, it would result in an HTTP 400 error.

  • Fixes enable_netboot_fallback to write out pxe config on adopt.

  • When configuring secure boot via Redfish, internal server errors are now retried for a longer period than by default, accounting for the SecureBoot resource unavailability during configuration on some hardware.

  • Fixes a RAID creation issue in iLO6 and other BMCs with the latest schema by removing ‘VolumeType’ and ‘Encrypted’ and moving ‘Drives’ inside ‘Links’.

  • Fixes the payload format required to query physical storage drives using redfish, when configuring RAID using redfish.

  • Uses the volume_name provided in the target_raid_config field of a node to set the storage volume name when configuring RAID with the redfish driver (instead of discarding the volume_name given in target_raid_config).

  • Uses the ‘volume_name’ field from the logical_disk in the target_raid_config field of a node, instead of just ‘name’ (which is incorrect as per the Ironic API expectation), to create the RAID volume using the Redfish driver.
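
As an illustration of the two RAID notes above, the Python sketch below shows a target_raid_config whose logical disks carry a volume_name, plus a hypothetical helper that reads that key rather than a bare name key. The helper is made up for the example and does not reproduce the Redfish driver's code.

```python
def requested_volume_names(target_raid_config):
    """Return the operator-supplied volume name of each logical disk."""
    names = []
    for disk in target_raid_config.get("logical_disks", []):
        # 'volume_name' is the key the API expects; a bare 'name' key
        # is deliberately not consulted in this sketch.
        names.append(disk.get("volume_name"))
    return names


# Example configuration; sizes and names are illustrative only.
target_raid_config = {
    "logical_disks": [
        {"size_gb": 100, "raid_level": "1", "volume_name": "root_volume"},
        {"size_gb": "MAX", "raid_level": "5"},
    ]
}
```

A disk without a volume_name yields None here, leaving the choice of name to whatever creates the volume.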

Other Notes

  • The classic ilo hardware types may be deprecated in the future for removal or major changes, however our last communication with the maintainers as of the 2024.1 Project Teams Gathering sessions indicated they were still working to determine their own forward path with a strong emphasis on the use of Redfish.

23.1.0

New Features

  • Sending signal SIGUSR2 to a conductor process will now trigger a drain shutdown. This is similar to a SIGTERM graceful shutdown but the timeout is determined by [DEFAULT]drain_shutdown_timeout which defaults to 1800 seconds. This is enough time for running tasks on existing reserved nodes to either complete or reach their own failure timeout.

    During the drain period the conductor will be removed from the hash ring to prevent new tasks from starting. Other conductors will no longer fail reserved nodes on the draining conductor, which previously appeared to be orphaned. This is achieved by running the conductor keepalive heartbeat for this period, but setting the online state to False.

  • While Ironic has not explicitly added support for OVN, because that is in theory a Neutron implementation detail, we have added some basic testing and are pleased to announce that you can use OVN’s DHCP service for IPv4 based provisioning with OVN v23.06.00 and beyond. This is not without issues, and we’ve added ovn documentation as a result to help provide as much Ironic operator clarity as possible.

Known Issues

  • Use of OVN may require disabling SNAT for provisioning with IPv4 when using TFTP. This is due to the Linux Kernel, and how IP packet handling occurs with OVN. No solution to this issue is known, and use of provisioning technologies which do not use TFTP is also advisable.

  • Use of OVN may require careful attention to the MTUs of networks. Oversized packets may be dropped. That being said, this is more likely an issue for testing than with actual physical baremetal in a production deployment.

  • Use of OVN for IPv6 based PXE/iPXE is not supported by Neutron. The Ironic project expects this to be addressed during the Caracal (2024.1) development cycle.

  • When configuring a single-conductor environment, make sure the worker pool size ([conductor]worker_pool_size) is larger than the maximum number of parallel deployments ([conductor]max_concurrent_deploy). This was not the case by default previously (the options used to be set to 100 and 250 respectively).
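
The worker pool sizing advice above can be checked with a few lines of Python. This is a minimal sketch using the standard configparser module rather than oslo.config; the function name is hypothetical, and the fallback values 300 and 250 are the defaults mentioned in these notes.

```python
import configparser


def pool_is_large_enough(conf_text):
    """Check that worker_pool_size exceeds max_concurrent_deploy."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Fallbacks follow the defaults stated in these release notes.
    pool_size = parser.getint(
        "conductor", "worker_pool_size", fallback=300)
    max_deploys = parser.getint(
        "conductor", "max_concurrent_deploy", fallback=250)
    return pool_size > max_deploys
```

For example, a file that still pins worker_pool_size to the old value of 100 would fail this check.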

Upgrade Notes

  • Because of a fix in the internal worker pool handling, you may now start seeing requests rejected with HTTP 503 under a very high load earlier than before. In this case, try increasing the [conductor]worker_pool_size option or consider adding more conductors.

  • The default worker pool size (the [conductor]worker_pool_size option) has been increased from 100 to 300. You may want to consider increasing it even further if your environment allows that.

Bug Fixes

  • The parent_node field, a newly added API field, has been constrained to store UUIDs over the names of nodes. When names are used, the value is changed to the UUID of the node.

  • Properly ejects the virtual media from a DVD device when this is the only MediaType available from the hardware and Ironic requested CD as the device to be used. See bug 2039042 for details.

  • When Ironic hits the limit on the number of the concurrent deploys (specified in the [conductor]max_concurrent_deploy option), the resulting HTTP code is now 503 instead of the more generic 500.

  • The per-node external_http_url setting in the driver info is now used for a boot ISO. Previously this setting was only used for a config floppy.

  • Fixes an issue with changing or getting the state of the indicator LED of an attached disk. The issue was caused by the incorrect assumption that the SimpleStorage resource provides this functionality, when it is actually provided by the Storage resource.

  • Fixes handling new requests when the maximum number of internal workers is reached. Previously, after reaching the maximum number of workers (100 by default), we would queue the same number of requests (100 again). This was not intentional, and now Ironic no longer queues requests if there are no free threads to run them.
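
The non-queueing behaviour described in the last note can be sketched as follows. This is a simplified Python illustration built on the standard library, not the conductor's actual worker pool; the class and exception names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class NoFreeWorker(Exception):
    """Raised instead of queueing; an API layer could map it to HTTP 503."""


class NonQueueingPool:
    """Run tasks on a fixed number of threads and never queue extras."""

    def __init__(self, size):
        self._free = threading.Semaphore(size)
        self._executor = ThreadPoolExecutor(max_workers=size)

    def spawn(self, func, *args):
        # Refuse immediately when every worker thread is busy.
        if not self._free.acquire(blocking=False):
            raise NoFreeWorker("no free worker threads")
        try:
            future = self._executor.submit(func, *args)
        except BaseException:
            self._free.release()
            raise
        # Give the slot back once the task has finished.
        future.add_done_callback(lambda _f: self._free.release())
        return future

    def shutdown(self):
        self._executor.shutdown(wait=True)
```

Because a slot is taken before each submit, the executor's internal queue never holds more tasks than there are threads to run them.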