Wallaby Series (16.1.0 - 17.0.x) Release Notes

17.1.0-6

Upgrade Notes

  • Adds sha256, sha384 and sha512 as supported SNMPv3 authentication protocols to the iRMC driver.

Bug Fixes

  • Fixes SNMPv3 message authentication and encryption in the iRMC driver. Previously, SNMPv3 authentication between the iRMC driver and iRMC relied only on the security name, with no passwords or encryption. To increase security, the following parameters have been added to the node’s driver_info and can be used for authentication:

    • irmc_snmp_user

    • irmc_snmp_auth_password

    • irmc_snmp_priv_password

    • irmc_snmp_auth_proto (Optional, defaults to sha)

    • irmc_snmp_priv_proto (Optional, defaults to aes)

    irmc_snmp_user replaces irmc_snmp_security. irmc_snmp_security will be ignored if irmc_snmp_user is set. irmc_snmp_auth_proto and irmc_snmp_priv_proto can also be set through the following options in the [irmc] section of /etc/ironic/ironic.conf:

    • snmp_auth_proto

    • snmp_priv_proto
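
As a rough illustration of how the SNMPv3 parameters above fit together, the following sketch resolves the effective settings for one node. The helper name resolve_irmc_snmpv3 and the sample values are hypothetical, and the assumption that per-node driver_info values take precedence over the [irmc] configuration values is ours, not stated in the note.

```python
def resolve_irmc_snmpv3(driver_info, conf_auth_proto="sha", conf_priv_proto="aes"):
    """Return the effective SNMPv3 settings for one node (illustrative helper)."""
    # irmc_snmp_user replaces irmc_snmp_security; the old key is only a fallback.
    user = driver_info.get("irmc_snmp_user") or driver_info.get("irmc_snmp_security")
    return {
        "user": user,
        "auth_password": driver_info.get("irmc_snmp_auth_password"),
        "priv_password": driver_info.get("irmc_snmp_priv_password"),
        # Assumption: a per-node value wins over the [irmc] section value.
        "auth_proto": driver_info.get("irmc_snmp_auth_proto", conf_auth_proto),
        "priv_proto": driver_info.get("irmc_snmp_priv_proto", conf_priv_proto),
    }


# Sample driver_info with placeholder credentials.
node_driver_info = {
    "irmc_snmp_user": "admin",
    "irmc_snmp_security": "legacy-name",  # ignored because irmc_snmp_user is set
    "irmc_snmp_auth_password": "auth-secret",
    "irmc_snmp_priv_password": "priv-secret",
    "irmc_snmp_auth_proto": "sha256",
}
settings = resolve_irmc_snmpv3(node_driver_info)
```

Here the node overrides only the authentication protocol, so the privacy protocol falls back to the documented default of aes.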

  • Fixes a race condition in PXE initialization where the logic that retries suspected failed PXE boot operations did not check whether an agent token had been established, which is the very first step in agent initialization.

Other Notes

  • Updates the minimum version of python-scciclient library to 0.10.1.

17.1.0

Upgrade Notes

  • In the Wallaby release, to use a certificate file for HTTPS connections, the iRMC driver requires the python-scciclient version to be one of >=0.8.2,<0.9.0, >=0.9.5,<0.10.0 or >=0.10.1,<0.11.0, and packaging >=16.5.
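
The three python-scciclient ranges above can be checked with a few lines of Python. This is a minimal sketch that assumes plain numeric versions such as 0.9.5 (no pre-release suffixes); the function names are hypothetical and the packaging requirement is not checked here.

```python
# Half-open ranges [low, high) taken from the note above.
ALLOWED_RANGES = [
    ((0, 8, 2), (0, 9, 0)),
    ((0, 9, 5), (0, 10, 0)),
    ((0, 10, 1), (0, 11, 0)),
]


def parse_version(text):
    """Turn '0.9.5' into (0, 9, 5); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def supports_cert_verification(version_text):
    """Return True if the version falls inside one of the allowed ranges."""
    version = parse_version(version_text)
    return any(low <= version < high for low, high in ALLOWED_RANGES)
```

For example, 0.9.5 and 0.10.1 pass the check, while 0.9.0 and 0.10.0 fall into the gaps between the ranges.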

Security Issues

  • Modifies the irmc hardware type to include a capability to control enforcement of HTTPS certificate verification. By default this is enforced. The python-scciclient version must be one of >=0.8.2,<0.9.0, >=0.9.5,<0.10.0, or >=0.10.1,<0.11.0; otherwise certificate verification will not occur.

Bug Fixes

  • Fixes the logic for the anaconda deploy interface. If the ironic node’s instance_info doesn’t have both ‘stage2’ and ‘ks_template’ specified, we weren’t using the instance_info at all. This has been fixed to use the instance_info if it was specified. Otherwise, ‘stage2’ is taken from the image’s properties (assumed that it is set there). ‘ks_template’ value is from the image properties if specified there (since it is optional); else we use the config setting ‘[anaconda] default_ks_template’.
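
One way to read the lookup order described above is the sketch below. It is not the actual Ironic code: the function name, the per-field fallback, and the sample values are assumptions made for illustration.

```python
def resolve_anaconda_inputs(instance_info, image_properties, default_ks_template):
    """Pick 'stage2' and 'ks_template' following the precedence in the note."""
    # instance_info wins when a value is specified there.
    stage2 = instance_info.get("stage2") or image_properties.get("stage2")
    # ks_template is optional on the image, so fall back to the
    # '[anaconda] default_ks_template' setting as a last resort.
    ks_template = (
        instance_info.get("ks_template")
        or image_properties.get("ks_template")
        or default_ks_template
    )
    return stage2, ks_template


# Placeholder values: only 'stage2' is given in instance_info.
stage2, ks_template = resolve_anaconda_inputs(
    {"stage2": "http://example.com/stage2"},
    {"stage2": "image-stage2-ref"},
    "default-ks.cfg.template",
)
```

In this example the stage2 value comes from instance_info, and because neither instance_info nor the image properties specify a kickstart template, the configured default is used.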

  • For the anaconda deploy interface, the ‘stage2’ directory was incorrectly being created using the full path of the stage2 file; this has been fixed.

  • The anaconda deploy interface expects the node’s instance_info to be populated with the ‘image_url’; this is now populated (via PXEAnacondaDeploy’s prepare() method).

  • For the anaconda deploy interface, when the deploy was finished and the bare metal node was being rebooted, the node’s provision state was incorrectly being set to ‘active’; the provisioning state-machine mechanism now handles that.

  • For the anaconda deploy interface, the code that was doing the validation of the kickstart file was incorrect and resulted in errors; this has been addressed.

  • For the anaconda deploy interface, the ‘%traceback’ section in the packaged ‘ks.cfg.template’ file is deprecated and fails validation, so it has been removed.

  • The anaconda deploy interface was saving internal information in the node’s instance_info, in the user-facing ‘stage2’ and ‘ks_template’ fields. This broke rebuilds using a different image with different stage2 or template specified in the image properties. This has been fixed by saving the information in the node’s driver_internal_info instead.

  • Fixes rebooting into the agent after changing BIOS settings in fast-track mode with the redfish-virtual-media boot interface. Previously, the ISO would not be configured.

  • Fixes a bug in the anaconda deploy interface where the ‘ks_options’ key was not found when rendering the default kickstart template.

  • Fixes an issue where the PXEAnacondaDeploy interface’s deploy() method did not return states.DEPLOYWAIT, so the instance went straight to ‘active’ instead of ‘wait call-back’.

  • Fixes an issue where the anaconda deploy interface mistakenly expected ‘squashfs_id’ instead of ‘stage2_id’ property on the image.

  • Fixes the heartbeat mechanism in the default kickstart template ks.cfg.template as the heartbeat API only accepts ‘POST’ and expects a mandatory ‘callback_url’ parameter.

  • Fixes handling of tarball images in anaconda deploy interface. Allows user specified file extensions to be appended to the disk image symlink. Users can now set the file extensions by setting the ‘disk_file_extension’ property on the OS image. This enables users to deploy tarballs with anaconda deploy interface.

  • Fixes an issue where automated cleaning was not supported when the anaconda deploy interface is used.

  • Fixed an issue where duplicate extra DHCP options were passed in the port update request to the Networking service. The duplicate DHCP options caused an error in the Networking service and node provisioning would fail. See bug: 2009774.

  • Fixes the idrac-wsman management interface set_boot_device method, which would fail deployment when existing jobs were present, with the error “Failed to change power state to ‘power on’ by ‘rebooting’. Error: DRAC operation failed. Reason: Unfinished config jobs found: <list of existing jobs>. Make sure they are completed before retrying.” Non-BIOS jobs may now be present during deployment. Deployment will still fail when BIOS jobs are present. In such cases, consider moving to idrac-redfish, which does not have this limitation when setting the boot device.

  • Fixed an issue where provisioning/cleaning would fail on IPv6 routed provider networks. See bug: 2009773.

  • Fixes redfish and idrac-redfish RAID create_configuration, apply_configuration, delete_configuration clean and deploy steps to update node’s raid_config field at the end of the steps.

  • Fixes the determination of a failed RAID configuration task in the redfish hardware type. Prior to this fix the tasks that have failed were reported as successful.

  • Fixes the redfish hardware type RAID device creation and deletion when creating or deleting more than one logical disk on RAID controllers that require rebooting and do not allow more than one running task per RAID controller. Before this fix, the second logical disk would fail to be created or deleted. With this change it is now possible to use the redfish raid interface on iDRAC systems.

  • Fixes the redfish-virtual-media boot interface to allow its use with iDRAC firmware 6.00.00.00 (released June 2022) and later, which fixes the virtual media boot issue that previously prevented iDRAC from working with redfish-virtual-media. Consider upgrading the iDRAC firmware if this has not been done already; otherwise an error will still occur when trying to use redfish-virtual-media with iDRAC.

  • Fixes an issue where clients would get a 404 due to the node pagination breaking at max_limit due to an uninitialised resource_url.

  • Fixes an issue where clients would get a 404 due to the port and portgroups pagination breaking at max_limit due to an uninitialised resource_url.

  • Fixes a File name too long error in the image caching code when a URL contains a long query string.

  • Fixes the initrd kernel parameter when booting ramdisk directly from Swift/RadosGW using iPXE. Previously it was always deploy_ramdisk, even when the actual file name is different.

  • Adds the driver_info/irmc_verify_ca option to specify a certificate file. The default value of driver_info/irmc_verify_ca is True.

  • Fixes an issue with installation of Ansible in driver-requirements.txt on Python 3.8. Since the release of Ansible 6.0.0, significant backtracking occurred in the Pip resolver.

  • Fixes connection caching issues with Redfish BMCs where AccessErrors were previously not disqualifying the cached connection from being re-used. Ironic will now explicitly open a new connection instead of using the previous connection in the cache. Under normal circumstances, the sushy redfish library would detect and refresh sessions, however a prior case exists where it may not detect a failure and contain cached session credential data which is ultimately invalid, blocking future access to the BMC via Redfish until the cache entry expired or the ironic-conductor service was restarted. For more information please see story 2009719.

17.0.4

Upgrade Notes

  • The database query pattern used when lists of nodes are retrieved has been changed to one that is more efficient at scale: a list of nodes is generated, and additional queries are then executed to composite the data together. Previously, the database client in the conductor had to deduplicate the resulting data set, which is less efficient overall.

Critical Issues

  • Fixes upgrade failure caused by the missing version of BIOSSetting database objects.

Bug Fixes

  • Skips port creation during redfish inspect for devices reported without a MAC address.

  • Fixes potential cache coherency issues by caching the AgentClient per task, rather than globally.

  • Fixes a regression in the ramdisk deploy where custom kernel parameters were not used during inspection and cleaning.

  • Slow database retrieval of nodes has been addressed at the lower layer by explicitly passing and handling only the requested fields. As a result, excess work that would be discarded is no longer performed, making the overall process more efficient. This is particularly beneficial for OpenStack Nova’s synchronization with Ironic.

  • Fixes configuring Redfish RAID using interface_type when error “failed to find matching physical disks for all logical disks” occurs.

  • Fixes issue in idrac-redfish clean/deploy step import_configuration where partially successful jobs were treated as fully successful. Such jobs, completed with errors, are now treated as failures.

  • Fixes the idrac-redfish clean/deploy step import_configuration to handle completed import configuration tasks that are deleted by iDRAC before Ironic has checked the task’s status. Prior to iDRAC firmware version 5.00.00.00, completed tasks are deleted after 1 minute in iDRAC Redfish. That is not always sufficient for the periodic check, which runs every minute by default, to check their status. Before this fix, the node got stuck in the wait state forever. The step now fails with an error advising to decrease the periodic check interval or to upgrade the iDRAC firmware if this has not been done already.

  • Fixes idrac-wsman BIOS and RAID interface steps to correctly check the status of iDRAC jobs that completed with errors. These jobs are now treated as failures. Before this fix, the node stayed in the wait state because only the “Completed” and “Failed” job statuses were checked, not “Completed with Errors”.

  • Fixes the idrac-wsman power interface to wait for the hardware to reach the target state before returning. For systems where the soft power off at the end of deployment, used to boot into the instance, failed and a forced hard power off was used, this left the node successfully deployed but powered off, without any errors. This broke other workflows that expect the node to be powered on and booted into the OS at the end of deployment. Additional information can be found in story 2009204.

  • When an http(s):// image is used, the cached copy of the image will always be updated if the HTTP server does not provide the last modification date and time. Previously the cached image would be considered up-to-date, which could cause invalid behavior if the image is generated on the fly or was modified while being served.
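
The cache decision described above can be sketched as follows. This is an illustration only: the function name is hypothetical, timestamps are assumed to be seconds since the epoch, and the comparison used when a modification time is available is an assumption rather than the documented behavior.

```python
def cached_image_is_current(cached_mtime, last_modified):
    """Decide whether a cached http(s):// image can be reused.

    cached_mtime: modification time of the cached copy (epoch seconds).
    last_modified: time reported by the HTTP server, or None if absent.
    """
    if last_modified is None:
        # No modification time from the server: always refresh the cache.
        return False
    # Assumption: the cache is current if it is at least as new as the source.
    return cached_mtime >= last_modified
```

With this logic, a server that omits the modification time causes the image to be downloaded again on every use, which matches the behavior change in the note.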

  • Improves record retrieval performance for baremetal nodes by enabling ironic to not make redundant calls as part of generating API result sets for the baremetal nodes endpoint.

  • Fixes the pattern of execution for periodic tasks such that the majority of drivers now evaluate if work needs to be performed in advance of creating a node task. Depending on the individual driver query pattern, this prevents excess database queries from being triggered with every task execution.

  • Removes unused local images after ejecting a virtual media device via the eject_vmedia vendor passthru call of the redfish vendor interface.

  • Redfish RAID clean and deploy steps now skip non-RAID storage controllers for RAID operations. On Redfish systems that do not implement SupportedRAIDTypes, such controllers are still processed and could result in unexpected errors.

  • Retries ssl.SSLError when connecting to the agent.

  • Fixes an issue of powering off with the idrac-wsman management interface while the execution of a clear job queue cleaning step is proceeding. Prior to this fix, the clean step would fail when powering off a node.

Other Notes

  • The default database query pattern has been changed which will result in additional database queries when compositing lists of nodes by separately querying traits and tags. Previously this was a joined query which requires deduplication of the result set before building composite objects.

17.0.3

Security Issues

  • Fixes an issue with the /v1/nodes/detail endpoint where an authenticated user could explicitly ask for an instance_uuid lookup and the associated node would be returned to the user with sensitive fields redacted in the result payload if the user did not explicitly have owner or lessee permissions over the node. This is considered a low-impact low-risk issue as it requires the API consumer to already know the UUID value of the associated instance, and the returned information is mainly metadata in nature. More information can be found in Storyboard story 2008976.

Bug Fixes

  • If the agent accepts a command but is unable to reply to Ironic (which sporadically happens because of eventlet’s TLS implementation), Ironic previously retried the request and failed because the command was already executing. Ironic now detects this situation by checking the list of executing commands after receiving a connection error. If the requested command is the last one, it assumes that the command request succeeded.
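
The detection step described above amounts to a check of the most recent entry in the agent's command list. The sketch below assumes each entry is a dictionary with a command_name key; the function name and the sample command names are hypothetical.

```python
def command_already_running(executing_commands, requested_name):
    """Return True if the last listed command is the one that was requested.

    Intended to be called after a connection error, before retrying.
    """
    if not executing_commands:
        return False
    # Only the most recent command is considered, as in the note above.
    return executing_commands[-1].get("command_name") == requested_name


# Hypothetical command list as it might be fetched after a connection error.
commands = [
    {"command_name": "example.first_step"},
    {"command_name": "example.do_work"},
]
accepted = command_already_running(commands, "example.do_work")
```

If the check returns True, the caller can treat the original request as accepted instead of sending it again.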

  • When local boot is used (e.g. by default), the instance image validation now happens only in the deploy interface, not in the boot interface (as before). This means that the boot interface validation will now pass in many cases where it would previously fail.

  • Fixes an issue with the /v1/nodes/detail endpoint where requests for an explicit instance_uuid match would not follow the standard query handling path and thus not be filtered based on policy determined access level and node level owner or lessee fields appropriately. Additional information can be found in story 2008976.

  • No longer masks configdrive when sending the node’s record to in-band deploy steps.

  • Fixes handling of single-value (non-key-value) parameters in the [inspector]extra_kernel_params configuration options.

  • The behavior when a bootable iso ramdisk is provided behind an http server is to download and serve the image from the conductor; the image is removed only when the node is undeployed. In certain cases, for example on large deployments, this could cause undesired behaviors, like the conductor nodes running out of disk storage. To avoid this event we provide an option [deploy]ramdisk_image_download_source to be able to tell the ramdisk interface to directly use the bootable iso url from its original source instead of downloading it and serving it from the conductor node. The default behavior is unchanged.

  • Fixes sub-optimal Ironic API performance where Secure RBAC related field level policy checks were executing without first checking if there were field results. This helps improve API performance when only specific columns have been requested by the API consumer.

17.0.2

Bug Fixes

  • Fixes the idrac-wsman BIOS factory_reset clean and deploy step to indicate success and update the cached BIOS settings to their defaults only when the BIOS settings have actually been reset. See story 2008058 for more details.

  • Removes temporary cleaning information on starting or restarting cleaning.

  • Removes unnecessary delay before the start of the cleaning process when fast-track is used.

  • Correctly processes in-band deploy steps on fast-track deployment.

  • Correctly wipes agent token on inspection start and abort.

  • Fixes providing agent tokens with pre-built ISO images and the redfish-virtual-media boot interface.

17.0.0

Prelude

The Ironic community is proud to release Ironic 17.0!

Where if it were developer years instead of major versions, we would all be very afraid since it already has access to the car keys.

This release of Ironic includes numerous advancements which extend an operator’s ability to customize and further extend their deployment to meet their needs.

  • Redfish enhancements including Out of Band RAID configuration management and automatic setting of Secure Boot on nodes deployed using redfish.

  • Deployment enhancements including UEFI Partition Image handling, per-instance per-deployments of default interface selections, user requestable deploy_steps at deploy time, IPA file injection, and support for setting a node’s boot mode via instance_info.

  • Support for system scoped Role Based Access controls and project scoped access is available by default for associated nodes when the node owner or lessee fields are set. This effort alone added over 1,500 new unit tests.

  • Operator friendly fixes such as memory over-consumption guard for memory intensive tasks, vendor hardware aware handling to help address issues such as different settings being needed to invoke UEFI, and “lazy” loading of database attributes to reduce the overall database load.

Along with all of this massive amount of work, a number of bugs were fixed while we were along the road trip of this development cycle.

We sincerely hope you enjoy it!

New Features

  • It is now possible to configure a priority for both the delete and create configuration RAID cleaning steps which are disabled by default.

  • Adds import_configuration, export_configuration and import_export_configuration steps to the idrac-redfish management interface. These steps allow using the configuration from one system as a template and replicating it to other, similarly capable, systems. Currently, this feature is experimental.

  • Adds support for passing a kernel_append_param setting to the ilo-virtual-media and ilo-uefi-https boot interfaces using the configuration parameter [ilo]/kernel_append_param with the ilo and ilo5 hardware types.

  • Adds support for the discovery of PXE Enabled NICs using the idrac-redfish inspect interface with the idrac hardware type. With this feature, a port’s pxe_enabled status will be recorded on the bare metal port.

  • Adds support to manage certificates to the ilo5 hardware type. A new optional boolean driver_info parameter ilo_add_certificates is introduced which can be used by the user to request addition of certificates to the iLO with ilo-uefi-https boot interface.

  • Adds the [deploy]enable_nvme_secure_erase option which allows the operator to enable NVMe format option for all nodes being managed by the conductor.

  • Adds the anaconda deploy interface to Ironic. This driver deploys the OS using the anaconda installer and a kickstart file instead of IPA. To support this feature, a new configuration group anaconda is added to the Ironic configuration file along with the default_ks_template configuration option.

    The deploy interface uses the heartbeat API to communicate. The kickstart template must include %pre, %post, %onerror and %traceback sections that send the status of the deployment back to the Ironic API using heartbeats. An example of such calls to the heartbeat API can be found in the default kickstart template. To enable anaconda to send status back to the Ironic API via heartbeat, agent_status and agent_status_message are added to the heartbeat API. Use of these new parameters requires API microversion 1.72 or greater.

  • Adds support for fast-tracking to ansible deploy interface.

  • Allows providing a list of IPMI cipher suite versions via the new configuration option [ipmi]/cipher_suite_versions. The configuration is only used when ipmi_cipher_suite is not set in driver_info.

  • Adds a new disable_ramdisk parameter to the manual cleaning API. If set to true, IPA won’t get booted for cleaning. Only steps explicitly marked as compatible can be executed this way.

    The parameter is available in the API version 1.70.
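
A manual cleaning request that uses disable_ramdisk might be assembled as in the sketch below. The helper name and the clean step shown are hypothetical; the body is assumed to be sent to the node's provision state endpoint, and the version header requests API version 1.70 as noted above.

```python
import json


def build_manual_clean_request(clean_steps, disable_ramdisk=False):
    """Build headers and a JSON body for a manual cleaning request."""
    body = {"target": "clean", "clean_steps": clean_steps}
    if disable_ramdisk:
        # Only steps explicitly marked as compatible can run without IPA.
        body["disable_ramdisk"] = True
    headers = {
        "X-OpenStack-Ironic-API-Version": "1.70",
        "Content-Type": "application/json",
    }
    return headers, json.dumps(body)


# 'example_step' is a placeholder, not a real clean step name.
headers, payload = build_manual_clean_request(
    [{"interface": "management", "step": "example_step"}],
    disable_ramdisk=True,
)
```

The resulting payload would be submitted with whichever HTTP client is in use; authentication is omitted from this sketch.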

  • Provides operators the ability to override the URL settings required for provisioning/cleaning for virtual media based deployments. These scenarios tend to require more delineation than traditional deployments, as they often have different environmental security requirements. Set these two new configuration options using an IP address that is available to these nodes (both the ramdisk and the BMCs):

    [deploy]
    external_http_url = <routable URL of the HTTP server>
    external_callback_url = <routable URL of bare metal API>
    
  • Adds new GPU dynamic capabilities to ilo drivers inspection:

    • gpu_<vendor>_count: Integer

    • gpu_<gpu_device_name>_count: Integer

    • gpu_<gpu_device_name>: Boolean

  • Enhances the idrac-wsman inspect hardware interface to report an additional GPU device, namely GV100GL [Tesla V100 PCIe 16GB]. With this enhancement, the following GPU devices are reported:

    • TU104GL [Tesla T4]

    • GV100GL [Tesla V100 PCIe 16GB]

  • Adds basic support for managing RAID configuration via the Redfish out-of-band (OOB) management protocol to the idrac hardware type by adding a new interface named idrac-redfish. This requires iDRAC firmware greater than 4.40.00.00. The idrac hardware type now supports the idrac-wsman, idrac, idrac-redfish, and no-raid interfaces, in the given priority order.

  • Allows node *_interface values to be overridden by values in a node instance_info field. This gives non-administrative users a temporary method of setting interface values.

  • The network data schema is now configurable via the new configuration option [api]network_data_schema.

  • Adds capability to use project scoped requests in concert with system scoped requests for a composite Role Based Access Control (RBAC) model. As Ironic is mainly an administrative service, this capability has only been extended to API endpoints which are not purely administrative in nature. This consists of the following API endpoints: nodes, ports, portgroups, volume connectors, volume targets, and allocations.

  • Project scoped requests for baremetal allocations will automatically record the project_id of the requestor as the owner of the node.

  • Adds support for automatic creation of ports for redfish enabled bare metal nodes prior to ironic-inspector introspection. This feature is part of the redfish management interface.

  • Supplying configuration to the agent using the redfish-virtual-media boot interface now works through USB instead of floppy by default. Modern hardware (and even virtual machines) has limited support for floppies.

  • Adds support for pre-built ISO images to the redfish-virtual-media boot interface and its derivatives.

  • Adds a redfish native raid_interface to the redfish hardware type. See story 2003514 for details.

    Note that common RAID cases have been tested, but cases that are more complex or rely on vendor-specific implementation details may not work as desired due to capability limitations.

  • Adds support for managing an iDRAC – reset, clear job queue, and reset to known good state – via the Redfish out-of-band (OOB) management protocol to the idrac hardware type. This is offered by new idrac-redfish management hardware interface implementation cleaning steps: reset_idrac, clear_job_queue, and known_good_state. known_good_state both resets an iDRAC and clears its job queue.

  • Adds [conductor]clean_step_priority_override configuration parameter which allows the operator to define a custom order in which the cleaning steps are to run.

  • The Baremetal API, provided by the ironic-api process, now supports use of system scoped keystone authentication for the following endpoints: nodes, ports, portgroups, chassis, drivers, driver vendor passthru, volume targets, volume connectors, conductors, allocations, events, deploy templates

  • Introduces lazy-loading of ports, portgroups, volume connections and volume targets in task manager. For periodic tasks which create a task manager object but don’t require the aforementioned data (e.g. power sync), this change should reduce the number of database interactions by around two thirds, speeding up overall execution.

  • Adds support for multipath volumes. If the volume properties have multiple portals, then it will generate multiple iscsi urls and append them together for use in the generated ipxe file.

Known Issues

  • The addition of both project and system scoped Role Based Access controls adds database queries when linked resources are accessed. For example, when attempting to access a port or portgroup, the associated node needs to be checked, as this helps govern access to the object for project scoped requests. This does not impact system scoped requests. Operators who adopt project scoped access may find it necessary to verify or add additional database indexes in relation to the node uuid column as well as the node_id field in any table which may receive heavy project query scope activity. The ironic project anticipates that this will be a future work item of the project to help improve database performance.

Upgrade Notes

  • The ilo-virtual-media and ilo-uefi-https boot interfaces do not use [pxe]pxe_append_params anymore. To pass kernel parameters, use the new configuration parameter [ilo]/kernel_append_param.

  • Legacy policy rules have been deprecated. Operators are advised to review and update any custom policy files in use. Please see Secure Role Based Access Controls for more information.

  • The functionality of using a port.extra vif_port_id value to signal and control a VIF attachment has been removed to support changing the permission model and access control policy. Use of vif_port_id outside of the VIF attachment/detachment workflow has been deprecated since the Ocata development cycle.

  • Deprecated policy rules are not expressed via a default policy file generation from the source code. The generated default policy file indicates the new default policies with notes on the deprecation to which oslo.policy falls back to, until the [oslo_policy]enforce_scope and [oslo_policy]enforce_new_defaults have been set to True. Please see the Victoria policy configuration documentation to reference prior policy configuration.

  • Operators are encouraged to move to system scope based authentication by setting [oslo_policy]enforce_scope and [oslo_policy]enforce_new_defaults. This requires a migration away from using an admin project with the baremetal_admin and baremetal_observer roles. System wide administrators using system scoped admin and reader accounts supersede the deprecated model.

Deprecation Notes

  • Deprecates the ATA-specific agent option agent_continue_if_ata_erase_failed, which is replaced by agent_continue_if_secure_erase_failed. The new option supports both ATA and NVMe secure erase. To ensure a smooth migration to the new configuration option, operators need to upgrade the Ironic Python Agent image to the Wallaby release prior to upgrading the Ironic Conductor to Xena.

  • Pre-RBAC support rules have been deprecated. These consist of:

    • admin_api

    • is_member

    • is_observer

    • is_node_owner

    • is_node_lessee

    • is_allocation_owner

    These rules will likely be removed in the Xena development cycle. Operators are advised to review any custom policy rules for these rules and move to the Secure Role Based Access Controls model.

  • The node’s driver_info parameter config_via_floppy of the redfish-virtual-media boot interface has been renamed to config_via_removable. The old alias is deprecated.

  • Use of an admin project with ironic is deprecated. With this the custom roles, baremetal_admin and baremetal_observer are also deprecated. Please migrate to using a system scoped account with the admin and reader roles, respectively.

Security Issues

  • The ability to create an allocation has been restricted by a new policy rule baremetal::allocation::create_pre_rbac, which prevents creation of allocations by any project administrator when operating with the new Role Based Access Control model. The use and enforcement of this rule is disabled when [oslo_policy]enforce_new_defaults is set, which also causes the owner field for allocations to be populated automatically. Most deployments should not encounter any issues with this security change, and the policy rule will be removed when support for the legacy baremetal_admin custom role has been removed.

  • Fixes an issue where ironic was not properly labeling dynamically built virtual media ramdisks with the signifier flag so the ramdisk understands it was booted from virtual media.

Bug Fixes

  • When using the Neutron DHCP driver, Ironic would only use the first fixed IP address to determine what IP versions are in use on the port. Now, it checks all the IP addresses and adds DHCP options for all IP versions.
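
The difference between inspecting only the first fixed IP and inspecting all of them can be illustrated with the standard ipaddress module. This is a minimal sketch, not the driver code: the function name is hypothetical and each fixed IP entry is assumed to be a dictionary with an ip_address key.

```python
import ipaddress


def ip_versions_in_use(fixed_ips):
    """Return the sorted IP versions found across all fixed IPs of a port."""
    versions = {
        ipaddress.ip_address(entry["ip_address"]).version for entry in fixed_ips
    }
    return sorted(versions)


# A dual-stack port using documentation address ranges.
port_fixed_ips = [
    {"ip_address": "192.0.2.10"},
    {"ip_address": "2001:db8::10"},
]
versions = ip_versions_in_use(port_fixed_ips)
```

Looking only at the first entry would yield IPv4 alone here, so no IPv6 DHCP options would be generated for the port.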

  • Rejects configdrive that is not a JSON, a URL or a base64 string. Previously invalid JSON supplied to ironicclient could end up accepted as a configdrive, which would cause a failure much later.

  • Fixes the [deploy]configdrive_use_object_store option that was broken during the Python 3 transition.

  • Fixes a problem with the grub2 configuration file path. Some newer versions of grub2 (e.g. 2.05 or 2.06-rc1) use grub.cfg-01-MAC, while older versions (e.g. 2.04) use MAC.conf, so both paths are now generated for compatibility.
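
Generating both file names for a given MAC address could look like the sketch below. The exact MAC formatting (lowercase, dash-separated for the grub.cfg-01- form and colon-separated for the .conf form) is an assumption for illustration, as is the function name.

```python
def grub_config_names(mac):
    """Return both grub2 config file names for one MAC address."""
    mac = mac.lower()
    return [
        # Form looked up by some newer grub2 versions (assumed dash-separated).
        "grub.cfg-01-" + mac.replace(":", "-"),
        # Form looked up by older grub2 versions (assumed colon-separated).
        mac + ".conf",
    ]


names = grub_config_names("AA:BB:CC:DD:EE:FF")
```

Writing the same configuration under both names lets either grub2 generation find it without needing to know which version the node runs.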

  • Fixes the missing boot_method ramdisk parameter for dynamically built virtual media payloads. This value must be set to vmedia for the ramdisk running on virtual media to understand it is executing from virtual media. This was fixed for cases where it is used with the redfish-virtual-media based boot interfaces as well as the ilo-virtual-media boot interface, which is where dynamic virtual media deployment/cleaning ramdisk generation is supported.

  • Fixes idrac-wsman BIOS apply_configuration and factory_reset clean and deploy steps to fail correctly in case of an error when checking completed jobs. Before this fix, when a BIOS job failed, the node clean or deploy failed with a timeout instead of the actual error in the cleaning or deploying step.

  • Adds handling of Redfish BMCs which lack a BootSourceOverrideMode flag, such that it is no longer a fatal error for a deployment if the BMC does not support this field. This is most common on BMCs which feature only a partial implementation of the ComputerSystem resource boot, but may also be observable on some older generations of BMCs which received updates to have partial Redfish support.

  • The fix for story 2008252 synced the boot mode after changing the boot device because Supermicro nodes reset the boot mode if not included in the boot device set. However this can cause a problem on Dell nodes when changing the mode uefi->bios or bios->uefi, see story 2008712 for details. Restrict the syncing of the boot mode to Supermicro.

Other Notes

  • Clean steps can now be marked with requires_ramdisk=False to make them compatible with the new disable_ramdisk argument of the manual cleaning API.

  • The API version of the Bare Metal API provided by the ironic-api service has been incremented to 1.71 to signify that the API supports System and Project scoped Role Based Access Controls, which is purely informational in nature, as the version itself cannot be used to change the API behavior for access controls. In excess of 1500 unit tests were added as part of the effort to implement Role Based Access Controls to help ensure the effort did not break the API behavior.

16.2.0

New Features

  • Adds support for deploy_steps parameter to provisioning endpoint /v1/nodes/{node_ident}/states/provision. Available and optional when target is ‘active’ or ‘rebuild’. When overlapping, these steps override deploy template and driver steps. deploy_steps is a list of dictionaries with required keys ‘interface’, ‘step’, ‘priority’ and ‘args’.

  • By default, Ironic will no longer start new memory-intensive work if insufficient system memory is available. This can be disabled by setting the [DEFAULT]minimum_memory_warning_only value to True.

  • The force_persistent_boot_device parameter now consistently applies to all boot interfaces, rather than only PXE and iPXE.

  • Supports setting boot mode via an instance_info capability.

  • The ironic-conductor process now has a concept of an internal memory limit. The intent of this is to prevent the conductor from running the host out of memory when a large number of deployments have been requested.

    These settings can be tuned using [DEFAULT]minimum_required_memory, [DEFAULT]minimum_memory_wait_time, [DEFAULT]minimum_memory_wait_retries, and [DEFAULT]minimum_memory_warning_only.

    Where possible, Ironic will attempt to wait out the time window, thus consuming the conductor worker thread which will resume if the memory becomes available. This will effectively rate limit concurrency.

    If raw image conversions within the conductor are required, and insufficient memory is available and cannot be waited for, the deployment operation will fail. For the iscsi deployment interface, which is the other location in ironic that may consume large amounts of memory, the conductor will wait until the next agent heartbeat.

  • Supports attaching configdrives when doing ramdisk deploy with the redfish-virtual-media boot. A configdrive is attached to a free USB slot.

  • Adds the [DEFAULT]raw_image_growth_factor configuration option which is a scale factor used for estimating the size of a raw image converted from compact image formats such as QCOW2. By default this is set to 2.0.

    When clearing the cache to make space for a converted raw image, the full virtual size is attempted first, and if not enough space is available a second attempt is made with the (smaller) estimated size.

  • Adds support for automatically configuring secure boot for nodes using the redfish management interface.

  • The pxe and ipxe boot interfaces now automatically configure secure boot if the management interface supports it.
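
The conductor memory limit described above (check, wait, retry, then give up) can be sketched as follows. This is an illustration under stated assumptions, not Ironic's code: wait_for_memory and its parameters are hypothetical names that loosely mirror the minimum_required_memory, minimum_memory_wait_time, minimum_memory_wait_retries and minimum_memory_warning_only options, and the memory readings are supplied by a caller-provided function.

```python
import time


def wait_for_memory(get_available_mib, required_mib, wait_time, wait_retries,
                    warning_only=False, sleep=time.sleep):
    """Return True if memory-intensive work may start, False otherwise."""
    for attempt in range(wait_retries + 1):
        if get_available_mib() >= required_mib:
            return True
        if warning_only:
            # Warning-only mode never blocks work, it only reports.
            print("warning: available memory is below the configured minimum")
            return True
        if attempt < wait_retries:
            # Holding the worker here is what limits concurrency.
            sleep(wait_time)
    return False


# Simulated readings: memory frees up on the third check.
readings = iter([512, 700, 2048])
ok = wait_for_memory(lambda: next(readings), required_mib=1024,
                     wait_time=1, wait_retries=6, sleep=lambda _s: None)
print(ok)
```

In the example the sleep function is replaced with a no-op so the sketch runs instantly; a real caller would keep the default.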

Upgrade Notes

  • The default value of the [oslo_policy]policy_file config option has been changed from policy.json to policy.yaml. Operators who are utilizing customized policy files or previously generated static policy files (which are not needed by default) should generate new policy files and modify them to meet their needs in the event that any new policies or rules have been added. Please use the oslopolicy-convert-json-to-yaml tool to convert a JSON policy file to YAML format in a backward-compatible way.

Deprecation Notes

  • Use of the legacy policy format was deprecated by the oslo.policy library during the Victoria development cycle. As a result, this deprecation is being noted in the Wallaby release with an anticipated future removal of support by oslo.policy. As such, operators will need to convert to YAML policy files. Please see the upgrade notes for details on migration of any custom policy files.

  • Using instance_info/deploy_boot_mode is deprecated; use the boot_mode capability in instance_info/capabilities instead.

  • Currently the bare metal API permits setting the secure_boot capability for nodes whose driver does not support setting secure boot. This is deprecated and will become a failure in the Xena cycle.
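
The move from instance_info/deploy_boot_mode to the boot_mode capability noted above could look like the following sketch. The helper name migrate_boot_mode is hypothetical, and the sketch assumes that instance_info is a plain dictionary whose capabilities entry is itself a dictionary.

```python
import copy


def migrate_boot_mode(instance_info: dict) -> dict:
    """Return a copy with deploy_boot_mode moved into capabilities."""
    updated = copy.deepcopy(instance_info)
    legacy = updated.pop("deploy_boot_mode", None)
    if legacy is not None:
        capabilities = updated.setdefault("capabilities", {})
        # A capability that is already set wins over the deprecated field.
        capabilities.setdefault("boot_mode", legacy)
    return updated


old_style = {"image_source": "http://example.com/image.qcow2",
             "deploy_boot_mode": "uefi"}
print(migrate_boot_mode(old_style))
```

The helper works on a deep copy so the caller's original dictionary is left untouched.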

Bug Fixes

  • Fixes fast-track to prevent marking the agent as alive if trying to rebuild a node before the fast-track timeout has expired.

  • Fixes redfish firmware update for ilo5 hardware type by fixing the Redfish task message detection and correctly preparing the ramdisk before rebooting.

  • Boot mode is now correctly handled when using redfish-virtual-media boot with locally booted images.

  • The redfish-virtual-media boot interface now makes fewer calls to the BMC when preparing boot.

  • The redfish-virtual-media boot interface no longer passes validation for Dell nodes. The idrac-redfish-virtual-media boot interface must be used for these nodes instead.

  • Failed cleaning no longer results in maintenance mode if no clean step is running, e.g. on PXE timeout or failed clean steps validation.

  • Retries virtual media insert on failure to allow for an eject that may not have finished (see story 2008504).

  • When Ironic configures the BootSourceOverrideTarget setting via Redfish, on Supermicro BMCs it must always configure BootSourceOverrideEnabled or that will revert to default (Once) on the BMC, see story 2008547 for details. This is different than what is currently implemented for other BMCs in which the BootSourceOverrideEnabled is not configured if it matches the current setting (see story 2007355).

    This requires that node.properties['vendor'] be supermicro which will be set on transition to manageable based on the Redfish system object or can be set manually.
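
The Supermicro-specific handling of BootSourceOverrideEnabled can be sketched as below. This is an illustration, not Ironic's code: boot_override_patch is a hypothetical helper, the case-insensitive vendor comparison is an assumption, and the Pxe, Once and Continuous values are taken from the Redfish ComputerSystem schema.

```python
def boot_override_patch(vendor, target, enabled, current_enabled):
    """Build the Boot attributes for a ComputerSystem PATCH request."""
    boot = {"BootSourceOverrideTarget": target}
    is_supermicro = (vendor or "").lower() == "supermicro"
    # Supermicro BMCs revert BootSourceOverrideEnabled to Once unless it is
    # sent with every change; other BMCs only get it when the value differs.
    if is_supermicro or enabled != current_enabled:
        boot["BootSourceOverrideEnabled"] = enabled
    return {"Boot": boot}


print(boot_override_patch("supermicro", "Pxe", "Continuous", "Continuous"))
print(boot_override_patch("other-vendor", "Pxe", "Continuous", "Continuous"))
```

With identical inputs, only the Supermicro payload carries BootSourceOverrideEnabled.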

Other Notes

  • Registers all conductor hardware interfaces together. All conductor hardware interfaces are now added to the database in a single transaction, and the register_hardware_interfaces API has been updated to allow this. This allows RESTful API consumers to understand if the conductor is fully online via the presence of driver entries. Previously this was done one driver at a time.

  • Extends ManagementInterface with two new calls: get_secure_boot_state and set_secure_boot_state. They are optional and may be implemented for hardware that supports dynamically enabling/disabling secure boot.
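
The two optional secure boot calls can be illustrated with a self-contained stand-in. The classes below are hypothetical and do not import Ironic; the method signatures (a task argument, plus a boolean state for the setter) and the exception type are assumptions made for the sketch.

```python
class UnsupportedSecureBoot(Exception):
    """Hypothetical error for hardware without secure boot control."""


class ManagementInterface:
    """Simplified stand-in for a management interface base class."""

    def get_secure_boot_state(self, task):
        # Optional call: the default reports that it is unsupported.
        raise UnsupportedSecureBoot("get_secure_boot_state")

    def set_secure_boot_state(self, task, state):
        raise UnsupportedSecureBoot("set_secure_boot_state")


class FakeManagement(ManagementInterface):
    """Example implementation that keeps the state in memory."""

    def __init__(self):
        self._secure_boot = False

    def get_secure_boot_state(self, task):
        return self._secure_boot

    def set_secure_boot_state(self, task, state):
        self._secure_boot = bool(state)


mgmt = FakeManagement()
mgmt.set_secure_boot_state(task=None, state=True)
print(mgmt.get_secure_boot_state(task=None))
```

Because the base methods raise by default, hardware types that cannot toggle secure boot need no changes.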

16.1.0

New Features

  • Allows disabling automated cleaning per node if it is enabled globally. An existing automated_clean field will allow disabling of automated cleaning on the node object. A new baremetal:node:disable_cleaning policy is added which defaults to baremetal:node:update.

  • Retrieves BIOS configuration settings when moving a node to manageable. This allows the settings to be used when choosing which node to deploy. For more details, see story 2008326.

  • When deploying a node with software RAID with an image not from Glance, the new instance_info field image_rootfs_uuid can be used to specify the UUID of the root partition to install the bootloader on.

  • The ramdisk log file name now contains the node name when it is set.

  • Provides a new vendor passthru method for Redfish to eject a virtual_media device. A specific device can be given (either cd, dvd, floppy, or usb), or if no device is provided then all attached devices will be ejected.

  • A new option [agent]api_ca_file allows passing a CA file to the ramdisk when redfish-virtual-media boot is used. Requires ironic-python-agent from the Wallaby cycle.
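
The device selection rule of the virtual media eject method (a named device, or every attached device when none is given) can be sketched as follows. The function devices_to_eject and the attached list are hypothetical; only the four device names come from the note above.

```python
SUPPORTED_DEVICES = ("cd", "dvd", "floppy", "usb")


def devices_to_eject(attached, device=None):
    """Return the attached virtual media devices that should be ejected."""
    if device is None:
        # No device requested: eject everything that is attached.
        return list(attached)
    if device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported device: {device!r}")
    return [d for d in attached if d == device]


attached = ["cd", "usb"]
print(devices_to_eject(attached))
print(devices_to_eject(attached, "usb"))
```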

Known Issues

  • Building ramdisks for DHCP-less deploy using the simple-init element is known not to work for distributions using NetworkManager. The debian-minimal element seems to work.

  • When redfish-virtual-media is used, fast-track mode will not work as expected: nodes will be rebooted between operations.

Upgrade Notes

  • The default value of [api]api_workers is now limited to 4. Set it explicitly if you need a higher value.

  • Automated detection of the IPMI BMC hardware vendor has been added to appropriately handle IPMI BMC variations. Ironic will now query this and save this value if not already set in order to avoid querying for every single operation. Operators upgrading should expect an elongated first power state synchronization for nodes with the ipmi hardware type.

  • The agent RAID interface now removes any root device hint after the RAID configuration is successfully deleted.
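
The capped default for [api]api_workers can be sketched as below. The helper effective_api_workers is hypothetical, and deriving the uncapped default from the CPU count is an assumption based on the related bug fix note about systems with many CPU cores.

```python
import os

MAX_DEFAULT_API_WORKERS = 4


def effective_api_workers(configured=None, cpu_count=None):
    """Return the number of API workers to launch."""
    if configured is not None:
        # An explicit setting is honoured even above the default cap.
        return configured
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    return min(cpu_count, MAX_DEFAULT_API_WORKERS)


print(effective_api_workers(cpu_count=64))
print(effective_api_workers(configured=16, cpu_count=64))
```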

Bug Fixes

  • No longer launches too many API workers on systems with a lot of CPU cores by default.

  • Fixes the logic which determines the partition table type to utilize with partition images to account for the boot mode of the machine. If no value is set by the API user, Ironic now correctly defaults to GPT if the node has been set in UEFI mode.

  • It is no longer possible to set a port’s physical_network to an empty string, which made the port unusable.

  • Fixes recognition of a busy agent to also handle recognition during deployment steps by more uniformly detecting and identifying when the ironic-python-agent service is busy.

  • Fixes inspection with the idrac-redfish-virtual-media boot interface.

  • Correctly handles the node’s custom network data when the noop network interface is used. Previously it was ignored.

  • Fixes incorrect injected network data location when using virtual media.

  • Fixes redfish BIOS apply_configuration clean and deploy step to fail correctly in case of error when checking if BIOS updates are successfully applied. Before the fix, when BIOS updates were unsuccessful, node cleaning or deployment failed with a timeout instead of the actual error from the clean or deploy step.

  • Fixes idrac-wsman RAID create_configuration clean step, apply_configuration deploy step and delete_configuration clean and deploy step to fail correctly in case of error when checking completed jobs. Before the fix, when a RAID job failed, node cleaning or deployment failed with a timeout instead of the actual error from the clean or deploy step.

  • Fixes issues when UEFI boot mode has been requested with persistent boot to DISK where some versions of ipmitool do not properly handle multiple options being set at the same time. While some of this logic was addressed in upstream ipmitool development, new versions are not released and vendors maintain downstream forks of the ipmitool utility. When considering vendor specific selector differences along with the current stance of new versions from the upstream ipmitool community, it only made sense to handle this logic within Ironic. In part this was because the selector value would not be updated if it was already set. Now ironic always transmits the selector value for UEFI.

  • Fixes handling of Supermicro UEFI supporting BMCs with the ipmi hardware type such that an appropriate boot device selector value is sent to the remote BMC to indicate boot from local storage. This is available for both persistent and one-time boot applications. For more information, please consult story 2008241.

  • Fixes handling of the ipmi hardware type where UEFI boot mode and “one-time” boot to PXE has been requested. As Ironic now specifically transmits the raw commands, this setting should be properly applied, where previously PXE boot operations may have occurred in Legacy BIOS mode.

  • Calculating the ipmitool -N and -R arguments from the configuration options [ipmi]command_retry_timeout and [ipmi]min_command_interval now takes into account the 1 second interval increment that ipmitool adds on each retry event.

    Failure-path ipmitool run duration will now be just less than command_retry_timeout instead of much longer.

  • When configured to use JSON RPC, the [DEFAULT]host configuration option can now be set to an IPv6 address. Previously it could only be an IPv4 address or a DNS name.

  • No longer tries to pass BOOTIF=None as a kernel parameter when using virtual media. This could break inspection.

  • Fixes the issue that when the MAC address of a port group is not set and the port group has been attached to an instance, the landed bond port cannot get an IP address due to an inconsistent MAC address between the tenant port and the initially allocated one in the config drive.

  • Fixes the issue that the root device hint is not removed after the agent RAID interface has successfully deleted the RAID configuration. The previous hint is not guaranteed to be valid and may cause a deployment failure.

  • Fixes cleaning with the ramdisk deploy interface by reusing the same procedure as for the direct deploy interface.

  • Fixes a bug where a conductor could fail to complete a deployment if there was contention on a shared lock. This would manifest as an instance being stuck in the “deploying” state, though the node had in fact started or even completed its final boot.

  • After changing the boot device via Redfish, checks that the boot mode being reported matches what is configured and, if not, sets it to the configured value. Some BMCs change the boot mode when the device is set via Redfish, see story 2008252 for details.

  • Fixes wiping agent token on rebooting via API.

  • Adds secure boot support to the ilo-uefi-https boot interface. Secure boot support already existed for other boot interfaces but was missing for this interface.

  • The virtual media ISO image building process now respects the default_boot_mode configuration option.

  • Fixes timeout in fast-track mode with redfish-virtual-media when running one operation after another (e.g. cleaning after inspection).

  • Fixes permission issues when injecting network data into virtual media.

  • Adds timeout to HTTP image validation and downloading operations, so that the direct deploy does not hang when the remote server is not responsive. The default timeout is 60 seconds and can be changed via the new webserver_connection_timeout option.
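
The ipmitool -N and -R calculation mentioned above can be sketched as follows. This is a minimal illustration and not necessarily Ironic's exact code: ipmitool_timing_args is a hypothetical helper that counts how many attempts fit into command_retry_timeout when each retry waits one second longer than the previous one.

```python
def ipmitool_timing_args(command_retry_timeout, min_command_interval):
    """Return the ipmitool timing arguments and the worst-case duration."""
    interval = min_command_interval
    duration = 0
    tries = 0
    while duration + interval <= command_retry_timeout:
        duration += interval
        # ipmitool adds one second to the interval on each retry.
        interval += 1
        tries += 1
    args = ["-R", str(tries), "-N", str(min_command_interval)]
    return args, duration


args, duration = ipmitool_timing_args(command_retry_timeout=60,
                                      min_command_interval=5)
print(args, duration)  # ['-R', '7', '-N', '5'] 56
```

With a 60 second timeout and a 5 second starting interval, the attempts wait 5, 6, 7, 8, 9, 10 and 11 seconds, for a total of 56 seconds, which stays just under the configured timeout.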

Other Notes

  • Adds a detect_vendor management interface method to the ipmi hardware type. This method is being promoted as a higher level interface because vendor agnostic drivers fundamentally need logic that is aware of the hardware vendor wherever slight hardware differences require slightly different behavior.

  • The configdrive argument to some utils in ironic.common.images and ironic.drivers.modules.image_utils has been replaced with a new inject_files argument. The previous approach did not really work in all situations and we don’t expect 3rd party drivers to use it.