Wallaby Series (16.1.0 - 17.0.x) Release Notes

17.1.0-19

Upgrade Notes

  • When upgrading Ironic to address the qemu-img image conversion security issues, the ironic-python-agent ramdisks will also need to be upgraded.

  • As a result of security fixes to address qemu-img image conversion security issues, a new configuration parameter has been added to Ironic, [conductor]permitted_image_formats, with a default value of “raw,qcow2,iso”. Raw and qcow2 are the disk image formats the Ironic community has consistently stated are supported and expected for use with Ironic. These formats also match the formats which the Ironic community tests. Operators who leverage other disk image formats may need to modify this setting further.

  • Adds sha256, sha384 and sha512 as supported SNMPv3 authentication protocols to the iRMC driver.

Security Issues

  • Ironic now checks the supplied image format value against the detected format of the image file, and will prevent deployments should the values mismatch. If being used with Glance and a mismatch in metadata is identified, it will require images to be re-uploaded with a new image ID to represent corrected metadata. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic always inspects the supplied user image content for safety prior to deployment of a node should the image pass through the conductor, even if the image is supplied in raw format. This is utilized to identify the format of the image and the overall safety of the image, such that source images with unknown or unsafe feature usage are explicitly rejected. This can be disabled by setting [conductor]disable_deep_image_inspection to True. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic also inspects images which would normally be provided as a URL for direct download by the ironic-python-agent ramdisk. This is enabled by default and increases the overall network traffic and disk space utilization of the conductor. This level of inspection can be disabled by setting [conductor]conductor_always_validates_images to False. Doing so is not advisable, as ironic-python-agent ramdisks for the Zed release and earlier will not be made available due to backport regression risk. This is the result of CVE-2024-44082 tracked as bug 2071740.

  • Ironic now explicitly enforces a list of permitted image types for deployment via the [conductor]permitted_image_formats setting, which defaults to “raw”, “qcow2”, and “iso”. While the project has classically always declared permissible images as “qcow2” and “raw”, it was previously possible to supply other image formats known to qemu-img, and the utility would attempt to convert the images. The “iso” support is required for “boot from ISO” ramdisk support.

  • Ironic now explicitly passes the source input format to executions of qemu-img to limit the permitted qemu disk image drivers which may evaluate an image to prevent any mismatched format attacks against qemu-img.

  • The ansible deploy interface example playbooks now supply an input format to execution of qemu-img. If you are using customized playbooks, please add “-f {{ ironic.image.disk_format }}” to your invocations of qemu-img. If you do not do so, qemu-img will automatically try to guess the format, which can lead to known security issues with the incorrect source format driver.

  • Operators who have implemented any custom deployment drivers or additional functionality like machine snapshot should review their downstream code to ensure they are properly invoking qemu-img. If there are any questions or concerns, please reach out to the Ironic project developers.

  • Operators are reminded that they should utilize cleaning in their environments. Disabling any security features such as cleaning or image inspection is at your own risk. Should you have any issues with security related features, please don’t hesitate to open a bug with the project.

  • The [conductor]disable_deep_image_inspection setting is conveyed to the ironic-python-agent ramdisks automatically, and will prevent those operating ramdisks from performing deep inspection of images before they are written.

  • The [conductor]permitted_image_formats setting is conveyed to the ironic-python-agent ramdisks automatically. Should a need arise to explicitly permit an additional format, that should take place in the Ironic service configuration.
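
The conductor-side options named in the notes above can be collected into one configuration fragment. This is only a sketch, assuming the usual ironic.conf layout: the first value is the default stated in these notes, and the commented-out lines illustrate the opt-outs described above, which are not recommended.

```ini
[conductor]
# Image formats accepted for deployment (default stated in these notes).
permitted_image_formats = raw,qcow2,iso

# Deep inspection of image content is performed unless this is set to True.
# The setting is also conveyed to the ironic-python-agent ramdisk.
#disable_deep_image_inspection = True

# The conductor also inspects images the agent would otherwise download
# directly from a URL; setting this to False turns that check off.
#conductor_always_validates_images = False
```

If an additional format has to be permitted, it would be appended to the comma-separated permitted_image_formats list in the Ironic service configuration, since the value is conveyed to the ramdisk automatically.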

Bug Fixes

  • Fixes multiple issues in the handling of images as it relates to the execution of the qemu-img utility, which is used for image format conversion, where a malicious user could craft a disk image to potentially extract information from an ironic-conductor process’s operating environment.

    Ironic now explicitly enforces a list of approved image formats as a [conductor]permitted_image_formats list, which mirrors the image formats the Ironic project has historically tested and expressed as known working. Testing is not based upon file extension, but upon content fingerprinting of the disk image files. This is tracked as CVE-2024-44082 via bug 2071740.

  • Fixes Ironic integration with Cinder, which was affected by changes made as part of the recent security-related fix in bug 2004555. The work in Ironic to track this fix was logged in bug 2019892. Ironic now sends a service token to Cinder, which allows the access restrictions added as part of the original CVE-2023-2088 fix to be appropriately bypassed. Ironic was not vulnerable, but the restrictions added as a result did impact Ironic’s usage. This is because Ironic volume attachments are not on a shared “compute node”, but are instead mapped to the physical machines, and Ironic handles the attachment life-cycle after initial attachment.

  • Fixes a bug in the iRMC driver’s parse_driver_info where, if FIPS is enabled, the SNMP version was always required to be version 3 even though the iRMC driver’s xxx_interface does not actually use SNMP.

  • Fixes an issue where a System Scoped user could not trigger a node into a manageable state with cleaning enabled, as the Neutron client would attempt to utilize their user’s token to create the Neutron port for the cleaning operation, as designed. This is because with requests made in the system scope, there is no associated project and the request fails.

    Ironic now checks if the request has been made with a system scope, and if so it utilizes the internal credential configuration to communicate with Neutron.

  • Fixes SNMPv3 message authentication and encryption functionality of the iRMC driver. Previously, SNMPv3 authentication between the iRMC driver and iRMC relied only on the security name, with no passwords or encryption. To increase security, the following parameters are now added to the node’s driver_info, and can be used for authentication:

    • irmc_snmp_user

    • irmc_snmp_auth_password

    • irmc_snmp_priv_password

    • irmc_snmp_auth_proto (Optional, defaults to sha)

    • irmc_snmp_priv_proto (Optional, defaults to aes)

    irmc_snmp_user replaces irmc_snmp_security. irmc_snmp_security will be ignored if irmc_snmp_user is set. irmc_snmp_auth_proto and irmc_snmp_priv_proto can also be set through the following options in the [irmc] section of /etc/ironic/ironic.conf:

    • snmp_auth_proto

    • snmp_priv_proto

  • Modifies the iRMC driver to use the [deploy]default_boot_mode option in ironic.conf to determine the default boot_mode.

  • Fixes issues with Lenovo hardware where the system firmware may display a blue “Boot Option Restoration” screen after the agent writes an image to the host in UEFI boot mode, requiring manual intervention before the deployed node boots. This issue is rooted in multiple changes being made to the underlying NVRAM configuration of the node. Lenovo engineers have suggested changing only the UEFI NVRAM and not performing any further changes via the BMC to configure the next boot. Ironic now does so on Lenovo hardware. More information and background on this issue can be found in bug 2053064.

  • Fixes a race condition in PXE initialization where the logic to retry potentially failed PXE boot operations did not check whether an agent token had been established, which is the very first step in agent initialization.
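
The iRMC-related options mentioned in the fixes above could be set as in the following fragment. It is a minimal sketch, assuming /etc/ironic/ironic.conf as the configuration file: sha and aes are the defaults stated in these notes, while uefi is only an illustrative value for default_boot_mode, not a recommendation.

```ini
[irmc]
# SNMPv3 authentication protocol; "sha" is the default stated above.
snmp_auth_proto = sha
# SNMPv3 privacy (encryption) protocol; "aes" is the default stated above.
snmp_priv_proto = aes

[deploy]
# Consulted by the iRMC driver to determine the default boot mode.
# "uefi" is an example value only.
default_boot_mode = uefi
```

The per-node parameters (irmc_snmp_user, irmc_snmp_auth_password, irmc_snmp_priv_password and the optional protocol overrides) are set in the node’s driver_info rather than in this file.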

Other Notes

  • Updates the minimum version of python-scciclient library to 0.10.1.

17.1.0

Upgrade Notes

  • On the Wallaby release, to use a certificate file on HTTPS connections, the iRMC driver requires the python-scciclient version to be one of >=0.8.2,<0.9.0, >=0.9.5,<0.10.0 or >=0.10.1,<0.11.0, and packaging >=16.5.

Security Issues

  • Modifies the irmc hardware type to include a capability to control enforcement of HTTPS certificate verification. By default this is enforced. The python-scciclient version must be one of >=0.8.2,<0.9.0, >=0.9.5,<0.10.0, or >=0.10.1,<0.11.0, or certificate verification will not occur.

Bug Fixes

  • Fixes the logic for the anaconda deploy interface. If the ironic node’s instance_info doesn’t have both ‘stage2’ and ‘ks_template’ specified, we weren’t using the instance_info at all. This has been fixed to use the instance_info if it was specified. Otherwise, ‘stage2’ is taken from the image’s properties (assumed that it is set there). ‘ks_template’ value is from the image properties if specified there (since it is optional); else we use the config setting ‘[anaconda] default_ks_template’.

  • For the anaconda deploy interface, the ‘stage2’ directory was incorrectly being created using the full path of the stage2 file; this has been fixed.

  • The anaconda deploy interface expects the node’s instance_info to be populated with the ‘image_url’; this is now populated (via PXEAnacondaDeploy’s prepare() method).

  • For the anaconda deploy interface, when the deploy was finished and the bare metal node was being rebooted, the node’s provision state was incorrectly being set to ‘active’; the provisioning state-machine mechanism now handles that.

  • For the anaconda deploy interface, the code that was doing the validation of the kickstart file was incorrect and resulted in errors; this has been addressed.

  • For the anaconda deploy interface, the ‘%traceback’ section in the packaged ‘ks.cfg.template’ file is deprecated and fails validation, so it has been removed.

  • The anaconda deploy interface was saving internal information in the node’s instance_info, in the user-facing ‘stage2’ and ‘ks_template’ fields. This broke rebuilds using a different image with different stage2 or template specified in the image properties. This has been fixed by saving the information in the node’s driver_internal_info instead.

  • Fixes rebooting into the agent after changing BIOS settings in fast-track mode with the redfish-virtual-media boot interface. Previously, the ISO would not be configured.

  • Fixes a bug in the anaconda deploy interface where the ‘ks_options’ key was not found when rendering the default kickstart template.

  • Fixes an issue where the PXEAnacondaDeploy interface’s deploy() method did not return states.DEPLOYWAIT, so the instance went straight to ‘active’ instead of ‘wait call-back’.

  • Fixes an issue where the anaconda deploy interface mistakenly expected ‘squashfs_id’ instead of ‘stage2_id’ property on the image.

  • Fixes the heartbeat mechanism in the default kickstart template ks.cfg.template as the heartbeat API only accepts ‘POST’ and expects a mandatory ‘callback_url’ parameter.

  • Fixes handling of tarball images in anaconda deploy interface. Allows user specified file extensions to be appended to the disk image symlink. Users can now set the file extensions by setting the ‘disk_file_extension’ property on the OS image. This enables users to deploy tarballs with anaconda deploy interface.

  • Fixes an issue where automated cleaning was not supported when the anaconda deploy interface is used.

  • Fixed an issue where duplicate extra DHCP options were passed in the port update request to the Networking service. The duplicate DHCP options caused an error in the Networking service and node provisioning would fail. See bug: 2009774.

  • Fixes the idrac-wsman management interface set_boot_device method, which would fail deployment when there are existing jobs present with the error “Failed to change power state to ‘’power on’’ by ‘’rebooting’’. Error: DRAC operation failed. Reason: Unfinished config jobs found: <list of existing jobs>. Make sure they are completed before retrying.”. Non-BIOS jobs can now be present during deployment. Deployment will still fail when BIOS jobs are present. In such cases, consider moving to idrac-redfish, which does not have this limitation when setting the boot device.

  • Fixed an issue where provisioning/cleaning would fail on IPv6 routed provider networks. See bug: 2009773.

  • Fixes redfish and idrac-redfish RAID create_configuration, apply_configuration, delete_configuration clean and deploy steps to update node’s raid_config field at the end of the steps.

  • Fixes the determination of a failed RAID configuration task in the redfish hardware type. Prior to this fix the tasks that have failed were reported as successful.

  • Fixes the redfish hardware type RAID device creation and deletion when creating or deleting more than 1 logical disk on RAID controllers that require rebooting and do not allow more than 1 running task per RAID controller. Before this fix, the second logical disk would fail to be created or deleted. With this change it is now possible to use the redfish raid interface on iDRAC systems.

  • Fixes the redfish-virtual-media boot interface to allow its use with iDRAC firmware from 6.00.00.00 (released June 2022), which fixes the virtual media boot issue that previously prevented iDRAC firmware from working with redfish-virtual-media. Consider upgrading the iDRAC firmware if not done already; otherwise an error will still be returned when trying to use redfish-virtual-media with iDRAC.

  • Fixes an issue where clients would get a 404 due to the node pagination breaking at max_limit due to an uninitialised resource_url.

  • Fixes an issue where clients would get a 404 due to the port and portgroups pagination breaking at max_limit due to an uninitialised resource_url.

  • Fixes a “File name too long” error in the image caching code when a URL contains a long query string.

  • Fixes the initrd kernel parameter when booting ramdisk directly from Swift/RadosGW using iPXE. Previously it was always deploy_ramdisk, even when the actual file name is different.

  • Adds the driver_info/irmc_verify_ca option to specify a certificate file. The default value of driver_info/irmc_verify_ca is True.

  • Fixes an issue with installation of Ansible in driver-requirements.txt on Python 3.8. Since the release of Ansible 6.0.0, significant backtracking occurred in the Pip resolver.

  • Fixes connection caching issues with Redfish BMCs where AccessErrors were previously not disqualifying the cached connection from being re-used. Ironic will now explicitly open a new connection instead of using the previous connection in the cache. Under normal circumstances, the sushy redfish library would detect and refresh sessions, however a prior case exists where it may not detect a failure and contain cached session credential data which is ultimately invalid, blocking future access to the BMC via Redfish until the cache entry expired or the ironic-conductor service was restarted. For more information please see story 2009719.

17.0.4

Upgrade Notes

  • The query pattern for the database when lists of nodes are retrieved has been changed to a pattern that is more efficient at scale, where a list of nodes is generated, and then additional queries are executed to composite this data together. This replaces a model where the database client in the conductor had to deduplicate the resulting data set, which is less efficient overall.

Critical Issues

  • Fixes upgrade failure caused by the missing version of BIOSSetting database objects.

Bug Fixes

  • Skips port creation during redfish inspect for devices reported without a MAC address.

  • Fixes potential cache coherency issues by caching the AgentClient per task, rather than globally.

  • Fixes a regression in the ramdisk deploy where custom kernel parameters were not used during inspection and cleaning.

  • Slow database retrieval of nodes has been addressed at the lower layer by explicitly passing and handling only the requested fields. As a result, excess work that would later be discarded is no longer performed, making the overall process more efficient. This is particularly beneficial for OpenStack Nova’s synchronization with Ironic.

  • Fixes configuring Redfish RAID using interface_type when error “failed to find matching physical disks for all logical disks” occurs.

  • Fixes issue in idrac-redfish clean/deploy step import_configuration where partially successful jobs were treated as fully successful. Such jobs, completed with errors, are now treated as failures.

  • Fixes the idrac-redfish clean/deploy step import_configuration to handle completed import configuration tasks that are deleted by iDRAC before Ironic has checked the task’s status. Prior to iDRAC firmware version 5.00.00.00, completed tasks are deleted after 1 minute in iDRAC Redfish. That is not always sufficient to check their status in the periodic check, which runs every minute by default. Before this fix, the node got stuck in wait mode forever. This is fixed by failing the step with an error advising to decrease the periodic check interval or upgrade the iDRAC firmware if not done already.

  • Fixes idrac-wsman BIOS and RAID interface steps to correctly check the status of an iDRAC job that completed with errors. These jobs are now treated as failures. Before this fix, the node stayed in the wait state because only the “Completed” and “Failed” job statuses were checked, but not “Completed with Errors”.

  • Fixes the idrac-wsman power interface to wait for the hardware to reach the target state before returning. For systems where soft power off at the end of deployment to boot to instance failed and forced hard power off was used, this left the node successfully deployed but in the off state without any errors. This broke other workflows expecting the node to be on and booted into the OS at the end of deployment. Additional information can be found in story 2009204.

  • When an http(s):// image is used, the cached copy of the image will always be updated if the HTTP server does not provide the last modification date and time. Previously the cached image would be considered up-to-date, which could cause invalid behavior if the image is generated on the fly or was modified while being served.

  • Improves record retrieval performance for baremetal nodes by enabling ironic to not make redundant calls as part of generating API result sets for the baremetal nodes endpoint.

  • Fixes the pattern of execution for periodic tasks such that the majority of drivers now evaluate if work needs to be performed in advance of creating a node task. Depending on the individual driver query pattern, this prevents excess database queries from being triggered with every task execution.

  • Removes unused local images after ejecting a virtual media device via the eject_vmedia vendor passthru call of the redfish vendor interface.

  • In Redfish RAID clean and deploy steps skip non-RAID storage controllers for RAID operations. In Redfish systems that do not implement SupportedRAIDTypes they are still processed and could result in unexpected errors.

  • Retries ssl.SSLError when connecting to the agent.

  • Fixes an issue of powering off with the idrac-wsman management interface while the execution of a clear job queue cleaning step is proceeding. Prior to this fix, the clean step would fail when powering off a node.

Other Notes

  • The default database query pattern has been changed which will result in additional database queries when compositing lists of nodes by separately querying traits and tags. Previously this was a joined query which requires deduplication of the result set before building composite objects.

17.0.3

Security Issues

  • Fixes an issue with the /v1/nodes/detail endpoint where an authenticated user could explicitly ask for an instance_uuid lookup and the associated node would be returned to the user with sensitive fields redacted in the result payload if the user did not explicitly have owner or lessee permissions over the node. This is considered a low-impact low-risk issue as it requires the API consumer to already know the UUID value of the associated instance, and the returned information is mainly metadata in nature. More information can be found in Storyboard story 2008976.

Bug Fixes

  • If the agent accepts a command, but is unable to reply to Ironic (which sporadically happens because of the eventlet’s TLS implementation), Ironic previously retried the request and failed because the command was already executing. Ironic now detects this situation by checking the list of executing commands after receiving a connection error. If the requested command is the last one, we assume that the command request succeeded.

  • When local boot is used (e.g. by default), the instance image validation now happens only in the deploy interface, not in the boot interface (as before). This means that the boot interface validation will now pass in many cases where it would previously fail.

  • Fixes an issue with the /v1/nodes/detail endpoint where requests for an explicit instance_uuid match would not follow the standard query handling path and thus not be filtered based on policy determined access level and node level owner or lessee fields appropriately. Additional information can be found in story 2008976.

  • No longer masks configdrive when sending the node’s record to in-band deploy steps.

  • Fixes handling of single-value (non-key-value) parameters in the [inspector]extra_kernel_params configuration options.

  • The behavior when a bootable iso ramdisk is provided behind an http server is to download and serve the image from the conductor; the image is removed only when the node is undeployed. In certain cases, for example on large deployments, this could cause undesired behaviors, like the conductor nodes running out of disk storage. To avoid this event we provide an option [deploy]ramdisk_image_download_source to be able to tell the ramdisk interface to directly use the bootable iso url from its original source instead of downloading it and serving it from the conductor node. The default behavior is unchanged.

  • Fixes sub-optimal Ironic API performance where Secure RBAC related field level policy checks were executing without first checking if there were field results. This helps improve API performance when only specific columns have been requested by the API consumer.
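
Two of the fixes above refer to configuration options, [deploy]ramdisk_image_download_source and [inspector]extra_kernel_params. The fragment below is a hedged sketch of how they might be set. The value names are assumptions not given in these notes: “http” is assumed to select direct use of the original ISO URL, and the kernel parameters are illustrative only.

```ini
[deploy]
# Assumption: "http" makes the ramdisk interface use the bootable ISO URL
# from its original source instead of caching it on the conductor.
# Leaving the option unset keeps the unchanged default behavior.
ramdisk_image_download_source = http

[inspector]
# Illustrative mix of a key-value parameter and a single-value parameter.
extra_kernel_params = ipa-debug=1 nofb
```

Check the configuration reference for your release before relying on these values.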

17.0.2

Bug Fixes

  • Fixes the idrac-wsman BIOS factory_reset clean and deploy step to indicate success and update the cached BIOS settings to their defaults only when the BIOS settings have actually been reset. See story 2008058 for more details.

  • Removes temporary cleaning information on starting or restarting cleaning.

  • Removes unnecessary delay before the start of the cleaning process when fast-track is used.

  • Correctly processes in-band deploy steps on fast-track deployment.

  • Correctly wipes agent token on inspection start and abort.

  • Fixes providing agent tokens with pre-built ISO images and the redfish-virtual-media boot interface.

17.0.0

Prelude

The Ironic community is proud to release Ironic 17.0!

Where if it were developer years instead of major versions, we would all be very afraid since it already has access to the car keys.

This release of Ironic includes numerous advancements which extend an operator’s ability to customize and further extend their deployment to meet their needs.

  • Redfish enhancements including Out of Band RAID configuration management and automatic setting of Secure Boot on nodes deployed using redfish.

  • Deployment enhancements including UEFI Partition Image handling, per-instance per-deployments of default interface selections, user requestable deploy_steps at deploy time, IPA file injection, and support for setting a node’s boot mode via instance_info.

  • Support for system scoped Role Based Access controls and project scoped access is available by default for associated nodes when the node owner or lessee fields are set. This effort alone added over 1,500 new unit tests.

  • Operator friendly fixes such as memory over-consumption guard for memory intensive tasks, vendor hardware aware handling to help address issues such as different settings being needed to invoke UEFI, and “lazy” loading of database attributes to reduce the overall database load.

Along with all of this massive amount of work, a number of bugs were fixed while we were along the road trip of this development cycle.

We sincerely hope you enjoy it!

New Features

  • It is now possible to configure a priority for both the delete and create configuration RAID cleaning steps which are disabled by default.

  • Adds import_configuration, export_configuration and import_export_configuration steps to the idrac-redfish management interface. These steps allow the configuration from another system to be used as a template and replicated to other, similarly capable, systems. Currently, this feature is experimental.

  • Adds support for passing a kernel_append_param setting to the ilo-virtual-media and ilo-uefi-https boot interfaces using the configuration parameter [ilo]/kernel_append_param with the ilo and ilo5 hardware types.

  • Adds support for the discovery of PXE Enabled NICs using the idrac-redfish inspect interface with the idrac hardware type. With this feature, a port’s pxe_enabled status will be recorded on the bare metal port.

  • Adds support to manage certificates to the ilo5 hardware type. A new optional boolean driver_info parameter ilo_add_certificates is introduced which can be used by the user to request addition of certificates to the iLO with ilo-uefi-https boot interface.

  • Adds the [deploy]enable_nvme_secure_erase option which allows the operator to enable NVMe format option for all nodes being managed by the conductor.

  • Add anaconda deploy interface to Ironic. This driver will deploy the OS using anaconda installer and kickstart file instead of IPA. To support this feature a new configuration group anaconda is added to Ironic configuration file along with default_ks_template configuration option.

    The deploy interface uses the heartbeat API to communicate. The kickstart template must include %pre, %post, %onerror and %traceback sections that should send the status of the deployment back to the Ironic API using heartbeats. An example of such calls to the heartbeat API can be found in the default kickstart template. To enable anaconda to send status back to the Ironic API via heartbeat, agent_status and agent_status_message are added to the heartbeat API. Use of these new parameters requires API microversion 1.72 or greater.

  • Adds support for fast-tracking to ansible deploy interface.

  • Allows providing a list of IPMI cipher suite versions via the new configuration option [ipmi]/cipher_suite_versions. The configuration is only used when ipmi_cipher_suite is not set in driver_info.

  • Adds a new disable_ramdisk parameter to the manual cleaning API. If set to true, IPA won’t get booted for cleaning. Only steps explicitly marked as compatible can be executed this way.

    The parameter is available in the API version 1.70.

  • Provides operators the ability to override URL settings required for provisioning/cleaning in the event of virtual media based deployment. These scenarios tend to require more delineation than more traditional deployments as they often have different environmental security requirements. Set these two new configuration options using an IP address that is available to these nodes (both the ramdisk and the BMCs):

    [deploy]
    external_http_url = <routable URL of the HTTP server>
    external_callback_url = <routable URL of bare metal API>
    
  • Adds new GPU dynamic capabilities to ilo drivers inspection:

    • gpu_<vendor>_count: Integer

    • gpu_<gpu_device_name>_count: Integer

    • gpu_<gpu_device_name>: Boolean

  • Enhances the idrac-wsman inspect hardware interface to report an additional GPU device, namely GV100GL [Tesla V100 PCIe 16GB]. With this enhancement, the following GPU devices are reported:

    • TU104GL [Tesla T4]

    • GV100GL [Tesla V100 PCIe 16GB]

  • Adds basic support for managing RAID configuration via the Redfish out-of-band (OOB) management protocol to the idrac hardware type by adding a new interface named idrac-redfish. This requires iDRAC firmware greater than 4.40.00.00. The idrac hardware type now supports the idrac-wsman, idrac, idrac-redfish, and no-raid interfaces, in the given priority order.

  • Allows node *_interface values to be overridden by values in a node instance_info field. This gives non-administrative users a temporary method of setting interface values.

  • The network data schema is now configurable via the new configuration option [api]network_data_schema.

  • Adds capability to use project scoped requests in concert with system scoped requests for a composite Role Based Access Control (RBAC) model. As Ironic is mainly an administrative service, this capability has only been extended to API endpoints which are not purely administrative in nature. This consists of the following API endpoints: nodes, ports, portgroups, volume connectors, volume targets, and allocations.

  • Project scoped requests for baremetal allocations will automatically record the project_id of the requestor as the owner of the node.

  • Adds support for automatic creation of ports for redfish enabled bare metal nodes prior to ironic-inspector introspection. This feature is a part of the redfish management interface.

  • Supplying configuration to the agent using the redfish-virtual-media boot interface now works through USB instead of floppy by default. Modern hardware (and even virtual machines) has limited support for floppies.

  • Adds support for pre-built ISO images to the redfish-virtual-media boot interface and its derivatives.

  • Adds a redfish native raid_interface to the redfish hardware type. See story 2003514 for details.

    Note that common RAID cases have been tested, but cases that are more complex or rely on vendor-specific implementation details may not work as desired due to capability limitations.

  • Adds support for managing an iDRAC – reset, clear job queue, and reset to known good state – via the Redfish out-of-band (OOB) management protocol to the idrac hardware type. This is offered by new idrac-redfish management hardware interface implementation cleaning steps: reset_idrac, clear_job_queue, and known_good_state. known_good_state both resets an iDRAC and clears its job queue.

  • Adds [conductor]clean_step_priority_override configuration parameter which allows the operator to define a custom order in which the cleaning steps are to run.

  • The Baremetal API, provided by the ironic-api process, now supports use of system scoped keystone authentication for the following endpoints: nodes, ports, portgroups, chassis, drivers, driver vendor passthru, volume targets, volume connectors, conductors, allocations, events, and deploy templates.

  • Introduces lazy-loading of ports, portgroups, volume connections and volume targets in task manager. For periodic tasks which create a task manager object but don’t require the aforementioned data (e.g. power sync), this change should reduce the number of database interactions by around two thirds, speeding up overall execution.

  • Adds support for multipath volumes. If the volume properties have multiple portals, then it will generate multiple iscsi urls and append them together for use in the generated ipxe file.
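
Several of the features above are controlled through ironic.conf options. The fragment below gathers them as a rough sketch. The option names come from these notes, but every value is illustrative: the kernel parameter string, the cipher suite numbers, the “interface.step_name:priority” format and the template path are assumptions rather than documented defaults.

```ini
[deploy]
# Enable the NVMe format option for nodes managed by this conductor.
enable_nvme_secure_erase = True

[ilo]
# Kernel parameters for the ilo-virtual-media and ilo-uefi-https boot
# interfaces (example string only).
kernel_append_param = nofb nomodeset vga=normal

[ipmi]
# Only consulted when ipmi_cipher_suite is not set in a node's driver_info.
cipher_suite_versions = 3,17

[conductor]
# Assumed format: interface.step_name:priority
clean_step_priority_override = deploy.erase_devices_metadata:123

[anaconda]
# Hypothetical path to a customized kickstart template.
default_ks_template = /etc/ironic/ks.cfg.template
```

Only the options relevant to the hardware types and interfaces in use need to be set.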

Known Issues

  • The addition of both project and system scoped Role Based Access controls adds database queries when linked resources are accessed. For example, when attempting to access a port or portgroup, the associated node needs to be checked, as this helps govern overall access to the object for project scoped requests. This does not impact system scoped requests. Operators who adopt project scoped access may find it necessary to verify or add additional database indexes in relation to the node uuid column as well as the node_id field in any table which may receive heavy project scoped query activity. The ironic project anticipates that this will be a future work item of the project to help improve database performance.

Upgrade Notes

  • The ilo-virtual-media and ilo-uefi-https boot interfaces no longer use [pxe]pxe_append_params. To pass kernel parameters, use the new configuration parameter [ilo]kernel_append_param.

  • Legacy policy rules have been deprecated. Operators are advised to review and update any custom policy files in use. Please see Secure Role Based Access Controls for more information.

  • The functionality of using a port.extra vif_port_id value to signal and control a VIF attachment has been removed to support changing the permission model and access control policy. Use of vif_port_id outside of the VIF attachment/detachment workflow has been deprecated since the Ocata development cycle.

  • Deprecated policy rules are not expressed in a default policy file generated from the source code. The generated default policy file lists the new default policies, with notes on the deprecated rules that oslo.policy falls back to until [oslo_policy]enforce_scope and [oslo_policy]enforce_new_defaults have both been set to True. Please see the Victoria policy configuration documentation to reference prior policy configuration.

  • Operators are encouraged to move to system scope based authentication by setting [oslo_policy]enforce_scope and [oslo_policy]enforce_new_defaults to True. This requires a migration away from using an admin project with the baremetal_admin and baremetal_observer roles. System wide administrators using system scoped admin and reader accounts supersede the deprecated model.
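
The two [oslo_policy] options named above could be switched on with a small script such as the following. This is only a sketch using Python's standard configparser module; the helper name enable_secure_rbac is hypothetical, and because configparser drops comments and does not handle repeated options, it should not be pointed at a production ironic.conf without review.

```python
import configparser
import io


def enable_secure_rbac(conf_text):
    """Return configuration text with both oslo.policy options set to True.

    Hypothetical helper for illustration only: comments in the input are
    not preserved, and repeated (multi-valued) options are not supported.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("oslo_policy"):
        parser.add_section("oslo_policy")
    # Both options must be enabled to stop falling back to deprecated rules.
    parser.set("oslo_policy", "enforce_scope", "True")
    parser.set("oslo_policy", "enforce_new_defaults", "True")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

The function takes and returns text, so the result can be inspected or diffed before anything is written to disk.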

Deprecation Notes

  • Deprecates the ATA-specific agent option agent_continue_if_ata_erase_failed, which is replaced by agent_continue_if_secure_erase_failed. The new option supports both ATA and NVMe secure erase. To ensure a smooth migration to the new configuration option, operators need to upgrade the Ironic Python Agent image to the Wallaby release before upgrading the Ironic conductor to Xena.

  • Pre-RBAC support rules have been deprecated. These consist of:
    • admin_api

    • is_member

    • is_observer

    • is_node_owner

    • is_node_lessee

    • is_allocation_owner

    These rules will likely be removed in the Xena development cycle. Operators are advised to review any custom policy files for use of these rules and move to the Secure Role Based Access Controls model.

  • The node’s driver_info parameter config_via_floppy of the redfish-virtual-media boot interface has been renamed to config_via_removable. The old alias is deprecated.

  • Use of an admin project with ironic is deprecated. With this, the custom roles baremetal_admin and baremetal_observer are also deprecated. Please migrate to using a system scoped account with the admin and reader roles, respectively.
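
For the config_via_floppy rename noted above, operators who keep node driver_info in their own tooling may want to move to the new key ahead of the alias being removed. The sketch below shows one way to do that on a plain dictionary; the helper name migrate_driver_info is hypothetical and the code does not call any Ironic API.

```python
def migrate_driver_info(driver_info):
    """Return a copy of driver_info that uses config_via_removable.

    Hypothetical helper: operates on a plain dict, not on a live node.
    """
    updated = dict(driver_info)
    if "config_via_floppy" in updated:
        old_value = updated.pop("config_via_floppy")
        # Keep an explicitly set new-style value if both keys are present.
        updated.setdefault("config_via_removable", old_value)
    return updated
```

The input dictionary is left unchanged, so the result can be compared against the original before it is applied to a node.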

Security Issues

  • The ability to create an allocation has been restricted by a new policy rule, baremetal::allocation::create_pre_rbac, which prevents creation of allocations by any project administrator when operating with the new Role Based Access Control model. Use and enforcement of this rule are disabled when [oslo_policy]enforce_new_defaults is set, which also causes the owner field of allocations to be populated automatically. Most deployments should not encounter any issues with this security change, and the policy rule will be removed when support for the legacy baremetal_admin custom role has been removed.

  • Fixes an issue where ironic was not properly labeling dynamically built virtual media ramdisks with the signifier flag so the ramdisk understands it was booted from virtual media.

Bug Fixes

  • When using the Neutron DHCP driver, Ironic would only use the first fixed IP address to determine which IP versions are in use on the port. Now, it checks all the IP addresses and adds DHCP options for all IP versions.

  • Rejects a configdrive that is not JSON, a URL or a base64 string. Previously, invalid JSON supplied to ironicclient could end up accepted as a configdrive, which would cause a failure much later.

  • Fixes the [deploy]configdrive_use_object_store option that was broken during the Python 3 transition.

  • Fixes a problem with the grub2 config file path. Some newer versions of grub2 (e.g. 2.05 or 2.06-rc1) use grub.cfg-01-MAC, while older versions (e.g. 2.04) use MAC.conf, so Ironic now generates both paths in order to be compatible with both.

  • Fixes the missing boot_method ramdisk parameter for dynamically built virtual media payloads. This value must be set to vmedia for the ramdisk running on virtual media to understand it is executing from virtual media. This was fixed for cases where it is used with the redfish-virtual-media based boot interfaces as well as the ilo-virtual-media boot interface, which is where dynamic virtual media deployment/cleaning ramdisk generation is supported.

  • Fixes the idrac-wsman BIOS apply_configuration and factory_reset clean and deploy steps to fail correctly when an error occurs while checking completed jobs. Before the fix, when a BIOS job failed, node cleaning or deployment failed with a timeout instead of the actual error in the cleaning or deploy step.

  • Adds handling of Redfish BMCs which lack a BootSourceOverrideMode flag, such that it is no longer a fatal error for a deployment if the BMC does not support this field. This is most common on BMCs which feature only a partial implementation of the ComputerSystem resource boot, but may also be observable on some older generations of BMCs which received updates to have partial Redfish support.

  • The fix for story 2008252 synced the boot mode after changing the boot device because Supermicro nodes reset the boot mode if not included in the boot device set. However, this can cause a problem on Dell nodes when changing the mode uefi->bios or bios->uefi; see story 2008712 for details. The syncing of the boot mode is now restricted to Supermicro nodes.
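
To illustrate the Neutron DHCP fix listed first in this section, the sketch below derives the set of IP versions from every fixed IP on a port rather than only the first one. It uses Python's standard ipaddress module and the fixed_ips list-of-dicts shape found on Neutron ports; the function name ip_versions is hypothetical and this is not the code Ironic uses.

```python
import ipaddress


def ip_versions(fixed_ips):
    """Return the sorted IP versions found across all fixed IPs of a port."""
    # Inspect every address; looking only at fixed_ips[0] would miss
    # the second address family on a dual-stack port.
    return sorted(
        {ipaddress.ip_address(ip["ip_address"]).version for ip in fixed_ips}
    )
```

For a dual-stack port, the result covers both address families, so DHCP options can be added for each version.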

Other Notes

  • Clean steps can now be marked with requires_ramdisk=False to make them compatible with the new disable_ramdisk argument of the manual cleaning API.

  • The API version of the Bare Metal API provided by the ironic-api service has been incremented to 1.71 to signify that the API supports System and Project scoped Role Based Access Controls. This is purely informational in nature, as the version itself cannot be used to change the API behavior for access controls. More than 1500 unit tests were added as part of the effort to implement Role Based Access Controls to help ensure the effort did not break the API behavior.