Rocky Series (11.0.0 - 11.1.x) Release Notes

11.1.3-9

Bug Fixes

  • Fixes an issue during the configdrive partition creation step of deployment. On some devices, such as NVMe drives, the newly created configdrive partition could not be identified correctly, which is required to write data to it afterward. https://storyboard.openstack.org/#!/story/2005764

  • Fixes an issue with using serial number as root device hints with the ansible deploy interface.

  • Fixes an issue regarding the ansible deploy interface. Node deployment was broken for any image that was not public because the original request context was not available anymore at the time some image information was fetched.

  • Fixes an issue where baremetal node deployment would fail on clouds with a high number of security groups. Listing the security groups took too long. Instead of listing all security groups, a query filter was added to list only the security groups to be used for the network. (See bug 2006256.)

  • Fixes an issue where updating firmware with the update_firmware_sum clean step from the management interface of the ilo hardware type failed with an error stating that it was unable to connect to the iLO address due to an authentication failure. See story 2006223 for details.

11.1.3

Deprecation Notes

  • Using the fake management interface with the manual-management hardware type is deprecated, please use noop instead. Existing nodes will have to be updated after the upgrade.

Bug Fixes

  • Fixes an issue regarding the ansible deployment interface cleaning workflow. Handling the error in the driver and returning nothing caused the manager to consider the step done and go to the next one instead of interrupting the cleaning workflow.

  • Fixes an issue with the ansible deployment interface where raw images could not be streamed correctly to the host.

  • Fixes deployment with the ansible deploy interface and instance images with GPT partition table.

  • Fixes an issue where the sensor data parsing method for the ipmitool interface lacked the ability to handle the automatically included ipmitool debugging information when the debug option is set to True in the ironic.conf file. As such, extra debugging information supplied by the underlying ipmitool command is disregarded. More information can be found in story 2005331.

  • Fixes an issue where deploy fails during node preparation if the node capabilities are passed as a string.

  • Fixes an issue where validating an image checksum failed with a UnicodeDecodeError while calculating the actual checksum. The fix uses the oslo_utils library to calculate the actual checksum.

  • The manual-management hardware type now defaults to the noop management interface. Unlike the fake management interface, it does not fail on attempt to set the boot device to the local disk.

  • Fixes a bug where cinder block storage service volumes fail to attach because the mountpoint is expected to be a valid string. See story 2004864 for additional information.

  • Returns the correct error message on providing an invalid reference to image_source. Previously an internal error was raised.

  • Reverts the fix to the idrac hardware type creating port objects during inspection with pxe_enabled fields not set to reflect the configuration of the physical ports. The fix is inconsistent with the stable branch policy [1]: it requires python-dracclient version 1.5.0 or greater, whereas driver-requirements.txt specifies that version 1.3.0 and greater can be used on this branch.

    [1] https://docs.openstack.org/project-team-guide/stable-branches.html
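
The checksum fix in the list above comes down to hashing image data as bytes rather than as decoded text. The following is a minimal sketch of that idea using only the Python standard library. It is not the oslo_utils call that ironic uses; the function name, the SHA-256 default, and the chunk size are assumptions chosen for illustration.

```python
import hashlib


def compute_checksum(path, algorithm="sha256", chunk_size=64 * 1024):
    """Hash a file in binary mode so arbitrary bytes never need decoding."""
    digest = hashlib.new(algorithm)
    # "rb" yields bytes; reading image data in text mode is what can raise
    # UnicodeDecodeError when the content is not valid text.
    with open(path, "rb") as image_file:
        for chunk in iter(lambda: image_file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in fixed-size chunks keeps memory use flat for large images.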

11.1.2

Bug Fixes

  • A bug has been fixed in the node update code that could cause the nodes to become not updatable if their driver is no longer available.

  • Fixes an issue where the master instance image cache could not be disabled. The configuration option [pxe]/instance_master_path may now be set to the empty string to disable the cache.

  • Fixes an issue where the master TFTP image cache could not be disabled. The configuration option [pxe]/tftp_master_path may now be set to the empty string to disable the cache. For more information, see story 2004608.

  • Fixes a bug where ironic ports were not updated during node introspection to reflect the PXE enabled setting for the idrac hardware type. See bug 2004340 for details.

11.1.1

New Features

  • Setting these configuration options to 0 will disable the periodic tasks:

    • [conductor]sync_power_state_interval: sync power states for the nodes

    • [conductor]check_provision_state_interval:

      • check deployments and time out if the deployment takes too long

      • check the status of cleaning a node and time out if it takes too long

      • check the status of inspecting a node and time out if it takes too long

      • check for and handle nodes that are taken over by new conductors (if an old conductor disappeared)

    • [conductor]send_sensor_data_interval: send sensor data to ceilometer

    • [conductor]sync_local_state_interval: refresh a conductor’s copy of the consistent hash ring. If any mappings have changed, determines which, if any, nodes need to be “taken over”. The ensuing actions could include preparing a PXE environment, updating the DHCP server, and so on.

    • [oneview]periodic_check_interval:

      • check for nodes taken over by OneView users

      • check for nodes freed by OneView users
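
As a rough illustration of the options listed above, the sketch below renders an ironic.conf-style fragment that sets each interval to 0. It uses Python's configparser only to produce the text. Disabling every task at once is done here purely for demonstration; a real deployment would normally pick only the options it needs.

```python
import configparser
import io

# Options named in the list above; 0 disables the associated periodic task.
DISABLED_INTERVALS = {
    "conductor": [
        "sync_power_state_interval",
        "check_provision_state_interval",
        "send_sensor_data_interval",
        "sync_local_state_interval",
    ],
    "oneview": ["periodic_check_interval"],
}


def render_disabled_intervals():
    """Return ironic.conf-style text that sets every listed interval to 0."""
    config = configparser.ConfigParser()
    for section, options in DISABLED_INTERVALS.items():
        config[section] = {option: "0" for option in options}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_disabled_intervals())
```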

Known Issues

  • Building RAID1 is known to not work with Dell BOSS cards using python-dracclient 1.4.0 or earlier. Upgrade to python-dracclient 1.5.0 to use this feature.

Upgrade Notes

  • The hash_ring_reset_interval configuration option was changed from 180 to 15 seconds. Previously, this option was essentially ignored on the API side, because the hash ring was reset on each API access. The lower value minimizes the probability of a request being routed to the wrong conductor when the ring needs rebalancing.

  • If you are doing a minor version upgrade, please re-run the ironic-dbsync online_data_migrations command to properly update the versions of the Objects in the database. Otherwise, the next major upgrade may fail.

Critical Issues

  • The ironic-dbsync online_data_migrations command was not updating the objects to their latest versions, which could prevent upgrades from working (i.e. when running the next release’s ironic-dbsync upgrade). Objects are updated to their latest versions now when running that command. See story 2004174 for more information.

Bug Fixes

  • Fixes an issue with a baremetal node that times out during cleaning. The ironic-conductor was attempting to change the node’s provision state to ‘clean failed’ twice, resulting in the node’s last_error being set incorrectly. This no longer happens. For more information, see story 2004299.

  • Fixes an issue where setting these configuration options to 0 caused a ValueError exception to be raised. You can now set them to 0 to disable the associated periodic tasks. (For more information, see story 2002059.):

    • [conductor]sync_power_state_interval: sync power states for the nodes

    • [conductor]check_provision_state_interval:

      • check deployments and time out if the deployment takes too long

      • check the status of cleaning a node and time out if it takes too long

      • check the status of inspecting a node and time out if it takes too long

      • check for and handle nodes that are taken over by new conductors (if an old conductor disappeared)

    • [conductor]send_sensor_data_interval: send sensor data to ceilometer

    • [conductor]sync_local_state_interval: refresh a conductor’s copy of the consistent hash ring. If any mappings have changed, determines which, if any, nodes need to be “taken over”. The ensuing actions could include preparing a PXE environment, updating the DHCP server, and so on.

    • [oneview]periodic_check_interval:

      • check for nodes taken over by OneView users

      • check for nodes freed by OneView users

  • Fixes an issue where Neutron ports would be left with a baremetal MAC address associated after an instance is deleted from a baremetal host. This caused problems with MAC address conflicts in follow-up deployments to the same baremetal host. See bug 2004428 for details.

  • Fixes an issue where a flat Neutron port would be left with a host ID associated with it after an instance is deleted from a baremetal host. This caused problems with reusing the same port for a new instance as it is already bound to the old instance.

  • Fixes a bug where the number of CPU sockets was being returned by the idrac hardware type during introspection, instead of the number of virtual CPUs. See bug 2004155 for details.

  • Fixes a race condition in the hash ring implementation that could cause an internal server error on any request. See story 2003966 for details.

  • Properly reports an error when the image cache and the image HTTP or TFTP location are on different file systems, causing hard linking to fail.

  • Fixes an issue where iSCSI based deployments fail if the cpu_arch property is not specified on a node.

  • Fixes the redfish hardware type to reuse HTTP session tokens when talking to the BMC using session authentication. Prior to this fix, the redfish hardware type never tried to reuse the session token given out by the BMC during a previous connection, which could sometimes lead to session pool exhaustion with some BMC implementations.

  • Fixes an issue wherein provisioning fails if an ironic node is configured with the ramdisk deploy interface. See bug 2003532 for more details.

  • The IPMI hardware type unconditionally instructed the BMC not to automatically clear the boot flag valid bit if a Chassis Control command is not received within the 60-second timeout (the countdown restarts when a Chassis Control command is received). Some BMCs do not support this setting; if it is sent, it causes the boot to be aborted instead. For the IPMI hardware type, a new driver option node['driver_info']['ipmi_disable_boot_timeout'] can be specified. It is True by default; set it to False to bypass sending this command. See story 2004266 for additional information.
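
For the ipmi_disable_boot_timeout option described above, a node's driver_info can be changed with a JSON Patch document. The sketch below only builds such a document, assuming the node is updated through a PATCH request to the Bare Metal API. The helper name is hypothetical and nothing is sent.

```python
import json


def build_boot_timeout_patch(disable_boot_timeout):
    """Build a JSON Patch body that sets ipmi_disable_boot_timeout."""
    return json.dumps([
        {
            "op": "add",
            "path": "/driver_info/ipmi_disable_boot_timeout",
            "value": disable_boot_timeout,
        }
    ])


# False bypasses the command for BMCs that abort the boot when they receive it.
print(build_boot_timeout_patch(False))
```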

11.1.0

Prelude

Ironic 11.1… Where the volume dial turned more!

While Pixie Boots has rocked out to Rock and Roll, the Bare Metal as a Service team has wrapped up our Rocky release with 11.1. This new release contains a number of major features that we hope will improve the lives of bare metal operators everywhere!

  • Conductor grouping enabling nodes to be assigned to groups of different conductors.

  • Deployment steps framework enabling greater flexibility for deployers to request specific steps.

  • BIOS setting interfaces for the ilo and irmc hardware types.

  • Ramdisk deployment interface for disk-less deployments.

  • Capability to reset nodes to their default interfaces via the API when resetting the node’s driver.

New Features

  • Added support for local booting a partition image for ppc64* hardware. If a PReP partition is detected when deploying to a ppc64* machine, the partition will be specified to IPA, causing the bootloader to be installed there directly. This feature requires an ironic-python-agent ramdisk with ironic-lib >=2.14.

  • Adds new optional snmp_community_read and snmp_community_write properties to snmp driver configuration (specified via a node’s driver_info field). If present, the value(s) will be used respectively for SNMP reads and/or writes to the PDU. When not present, snmp_community value will be used instead.

  • The iRMC driver can now automatically update the node.traits field with CUSTOM_CPU_FPGA value based on information provided by the node during node inspection.

  • Adds a ramdisk deploy interface for deployments that wish to network boot to a ramdisk, as opposed to performing a complete traditional deployment to physical media. This may be useful in scientific use cases or where ephemeral baremetal machines are desired.

    The ramdisk deploy interface is intended for advanced users and has some particular operational caveats that users should be aware of prior to use, such as network access list requirements, configuration drive architectural restrictions, and the inability to leverage configuration drives.

  • Adds a new configuration option [pxe]pxe_config_subdir to allow operators to define the specific directory that may be used inside of /tftpboot or /httpboot for a boot loader to locate the configuration file for the node. This option defaults to pxelinux.cfg, which is the directory that the Syslinux pxelinux.0 bootloader utilizes. Operators may wish to change the directory name if they are using other boot loaders such as GRUB or iPXE.

  • Conductors and nodes may be arbitrarily grouped to provide a basic level of affinity between conductors and nodes. Conductors use the [conductor]/conductor_group configuration option to set the group which they belong to. The same value may be set on one or more nodes in the conductor_group field (available in API version 1.46), and these will be matched such that only conductors with a given group will manage nodes with the same group.

    A group name may be up to 255 characters containing a-z, 0-9, _, -, and .. The group is case-insensitive. The default group is the empty string ("").

    The “node list” API endpoint (GET /v1/nodes) may also be filtered by conductor group in API version 1.46.

  • The framework for deployment steps is in place. All in-tree drivers (DeployInterfaces) have one (big) deploy step; the conductor executes this step when deploying a node.

    Starting with the Bare Metal REST API version 1.44, the current deploy step (if any) being executed is available in a node’s deploy_step field in the responses for the following queries:

    • GET /v1/nodes/<node identifier>

    • GET /v1/nodes/detail

    • GET /v1/nodes?fields=deploy_step,...

  • Implements the bios interface for the ilo hardware type. Adds the list of supported bios interfaces for the ilo hardware type. Adds the manual cleaning steps apply_configuration and factory_reset, which support managing the BIOS settings for iLO servers using the ilo hardware type.

  • Adds support for the new noop interface to the ipmi hardware type. This interface targets hardware that does not correctly change boot mode via the IPMI protocol. Using it requires pre-configuring the boot order on a node to try PXE, then fall back to local booting.

  • Adds a new bios interface to the irmc hardware type. This provides an out-of-band BIOS configuration solution for the iRMC driver, which makes the functionality available via manual cleaning.

  • Adds an out-of-band RAID configuration solution for the iRMC driver, which makes the functionality available via manual cleaning. See the iRMC hardware type documentation for more details.

  • Starting with API version 1.45, PATCH requests to /v1/nodes/<NODE> accept the new query parameter reset_interfaces. It can be provided whenever the driver field is updated. If set to ‘true’, all hardware interfaces will be reset to their defaults, except for ones updated in the same request.
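
The reset_interfaces parameter described above can be sketched as a request that changes a node's driver and asks for the remaining interfaces to be reset. The example builds the request without sending it. The endpoint URL, node name, and helper name are hypothetical, and authentication is omitted.

```python
import json
import urllib.request

# Hypothetical endpoint, used only for illustration.
BAREMETAL_ENDPOINT = "http://ironic.example.com:6385"


def build_driver_change_request(node, new_driver):
    """Build (without sending) a PATCH that changes the driver and resets interfaces."""
    body = json.dumps(
        [{"op": "replace", "path": "/driver", "value": new_driver}]
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{BAREMETAL_ENDPOINT}/v1/nodes/{node}?reset_interfaces=true",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # reset_interfaces is accepted starting with API version 1.45.
            "X-OpenStack-Ironic-API-Version": "1.45",
        },
    )


request = build_driver_change_request("node-1", "ipmi")
print(request.get_method(), request.full_url)
```

Interfaces that are updated in the same patch document would keep the values given there, as the note above describes.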

Upgrade Notes

  • Operators utilizing grub for PXE booting, typically with UEFI, should change their deployed master PXE configuration file provided for nodes PXE booting using grub. Ironic 11.1 now writes both MAC address and IP address based PXE configuration links for network booting via grub. The grub variable should be changed from $net_default_ip to $net_default_mac. IP address support is deprecated and will be removed in the Stein release.

  • The minimum required version of pysnmp has been bumped to 4.3. This pysnmp version introduces a simpler, faster and more functional high-level SNMP API, to which the ironic snmp driver has been migrated.

  • The minimum required version of the osprofiler library is now 1.5.0. This is not a new dependency: ironic has not been able to start with 1.4.0 since the Pike release, when this dependency was introduced.

  • The swift/endpoint_type configuration option is now removed. python-swiftclient 3.2.0 (Ocata) and above removed support for the native URL type used by radosgw. Since using a swift/endpoint_type value of radosgw would fail anyway, it is removed. Deployers must now configure ceph with rgw swift account in url = True. This must be set before upgrading to this release.

  • The snmp hardware type now uses the noop management interface instead of fake used previously. Support for fake is left for backward compatibility.

Deprecation Notes

  • All drivers must implement their deployment process using deploy steps. Out-of-tree drivers without deploy steps will be supported until the Stein release. For more details, see story 1753128.

  • The xclarity hardware type, as well as the supporting driver interfaces have been deprecated and are scheduled to be removed from ironic in the Stein development cycle. This is due to the lack of operational Third Party testing to help ensure that the support for Lenovo XClarity is functional.

    The xclarity hardware type was introduced at the end of the Queens development cycle. During implementation of Third Party CI, the Lenovo team encountered some unforeseen delays. Lenovo is continuing to work towards Third Party CI, and upon establishment and verification of functional Third Party CI, this deprecation will be rescinded.

  • Support for ironic to link PXE boot configuration files via the assigned interface IP address has been deprecated. This was only the case when [pxe]ipxe_enabled was set to false and the node was being deployed using UEFI.

  • Using the fake management interface with the snmp hardware type is now deprecated; please use noop instead.

Bug Fixes

  • Better handles the case when an operator attempts to perform an upgrade from a release older than Pike, directly to a release newer than Pike, skipping one or more releases in between (i.e. a “skip version upgrade”). Instead of crashing, the operator will be informed that upgrading from a version older than the previous release is not supported (skip version upgrades) and that (as of Pike) all database migrations need to be performed using the previous releases for a fast-forward upgrade. [Bug 2002558]

  • Fixes support for grub based UEFI PXE booting by enabling links to the PXE configuration files to be written using the MAC address of the node in addition to the interface IP address. If the [dhcp]dhcp_provider option is set to none, only the MAC based links will be created.

  • Fixes an issue that caused the integrated Dell Remote Access Controller (iDRAC) management hardware interface implementation, idrac, to fail to boot nodes in Unified Extensible Firmware Interface (UEFI) boot mode. That interface is supported by the idrac hardware type. The issue is resolved for Dell EMC PowerEdge 13th and 14th generation servers. It is not resolved for PowerEdge 12th generation and earlier servers. For more information, see story 1656841.

  • If a node gets stuck in one of the states deploying, cleaning, verifying, inspecting, adopting, rescuing, or unrescuing for some reason (e.g. the conductor goes down while executing a task), it will be moved to an appropriate failure state the next time the conductor starts.

  • Changes the iPXE behavior to retry a total of 10 times with an increasing backoff time between each retry in order to not create a Denial of Service situation with the iPXE HTTP server. Should the retries fail, the node will be powered-off after a warning is displayed on the console for 30 seconds. For more information, see story.

  • The cleaning operation could fail if an in-band clean step executed after the completion of an out-of-band clean step that reboots the node. The failure was caused by a race condition wherein cleaning is resumed before the Ironic Python Agent (IPA) is ready to execute clean steps. This has been fixed. For more information, see bug 2002731.
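
The iPXE retry behavior noted in the list above (10 attempts with an increasing backoff between them) follows a common pattern that can be sketched in Python. This is only an illustration of the pattern, not the iPXE script that ironic ships. The linear delay schedule and the function name are assumptions, and the power-off after the console warning is not modeled.

```python
import time


def retry_with_backoff(action, attempts=10, base_delay=1.0, sleep=time.sleep):
    """Call action up to `attempts` times, waiting longer after each failure.

    Returns True as soon as action() succeeds, or False once every attempt
    has failed.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            # Wait base_delay, then 2 * base_delay, 3 * base_delay, and so on.
            sleep(base_delay * attempt)
    return False
```

Growing the delay between attempts spreads retries from many booting nodes over time, which is the point of the change: it avoids hammering the HTTP server that serves the boot files.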

Other Notes

  • The deprecated configuration option [ipmi]retry_timeout was removed, use [ipmi]command_retry_timeout instead.

11.0.0

Prelude

I R O N I C turns the dial to 11

In preparation for the OpenStack Rocky development cycle release, the “ironic” Bare Metal as a Service team announces the release of version 11.0. While it is not quite like a volume knob, this release lays the foundation for features coming in future releases and user experience enhancements.

Some of these include the BIOS configuration framework, power fault recovery, additional error handling, refactoring, removal of classic drivers, and many bug fixes.

New Features

  • Adds the healthcheck middleware from oslo, configurable via the [healthcheck]/enabled option. This middleware adds a status check at /healthcheck. This is useful for load balancers to determine if a service is up (and add or remove it from rotation), or for monitoring tools to see the health of the server. This endpoint is unauthenticated, as not all load balancers or monitoring tools support authenticating with a health check endpoint.

  • Adds support to abort the inspection of a node in the inspect wait state, as long as this operation is supported by the inspect interface in use. A node in the inspect wait state accepts the abort provisioning verb to initiate the abort process. This feature is supported by the inspector inspect interface and is available starting with API version 1.41.

  • Adds support for reading and changing the node’s bios_interface field and enables the GET endpoints to check BIOS settings, if they have already been cached. This requires a compatible bios_interface to be set. This feature is available starting with API version 1.40.

  • The new ironic configuration setting [deploy]/default_boot_mode allows the operator to set the default boot mode when ironic can’t pick boot mode automatically based on node configuration, hardware capabilities, or bare-metal machine configuration.

  • Adds support to the redfish management interface for reading and setting bare metal node’s boot mode.

  • Adds new Power Distribution Unit (PDU) snmp driver type - BayTech MRP27.

  • Adds new auto type of the driver_info/snmp_driver setting which makes ironic automatically select a suitable SNMP driver type based on the SNMPv2-MIB::sysObjectID value as reported by the PDU being managed.

  • Adds SNMPv3 message authentication and encryption features to ironic snmp hardware type. To enable these features, the following parameters should be used in the node’s driver_info:

    • snmp_user

    • snmp_auth_protocol

    • snmp_auth_key

    • snmp_priv_protocol

    • snmp_priv_key

    Also adds support for the context_engine_id and context_name parameters of SNMPv3 message at ironic snmp hardware type. They can be configured in the node’s driver_info.

  • Add ?detail= boolean query to the API list endpoints to provide a more RESTful alternative to the existing /nodes/detail and similar endpoints. The default is False. Now these API requests are possible:

    • /nodes?detail=True

    • /ports?detail=True

    • /chassis?detail=True

    • /portgroups?detail=True

  • Adds external storage interface which is short for “externally managed”. This adds logic to allow the Bare Metal service to identify when a BFV scenario is being requested based upon the configuration set for volume targets.

    The user must create the entry, and no synchronization with a Block Storage service will occur. Documentation has been updated to reflect how to use this interface.

  • Adds the [deploy]enable_ata_secure_erase option which allows an operator to disable ATA Secure Erase for all nodes being managed by the conductor. This setting defaults to True which aligns with the prior behavior of the Bare Metal service.

  • Adds new parameter fields to driver_info, which will become mandatory in Stein release:

    • xclarity_manager_ip: IP address of the XClarity Controller.

    • xclarity_username: Username for the XClarity Controller.

    • xclarity_password: Password for XClarity Controller username.

    • xclarity_port: Port to be used for XClarity Controller connection.

  • Adds support for the ipmitool power interface to the irmc hardware type.

  • Adds support for the fault field in the node, beginning with API version 1.42. This field records the fault, if any, detected by ironic for a node. If no fault is detected, the fault is None. The fault field value is set to one of following values according to different circumstances:

    • power failure: when a node is put into maintenance due to power sync failures that exceed max retries.

    • clean failure: when a node is put into maintenance due to failure of a cleaning operation.

    • rescue abort failure: when a node is put into maintenance due to failure of cleaning up during rescue abort.

    The fault field will be set to None if an operator manually set maintenance to False. The fault field can be used as a filter for querying nodes.

  • Adds power failure recovery to ironic. For nodes that ironic had put into maintenance mode due to power failure, ironic periodically checks their power state, and moves them out of maintenance mode when power state can be retrieved. The interval of this check is configured via [conductor]power_failure_recovery_interval configuration option, the default value is 300 (seconds). Set to 0 to disable this behavior.

  • Adds support for RAID 1 creation on Dell Boot Optimized Storage Solution (BOSS).

  • Adds support for rescue interface agent for the ilo hardware type when the corresponding boot interface being used is ilo-virtual-media. The supported values of the rescue interface for the ilo hardware type are agent and no-rescue. The default value is no-rescue.

  • Adds support for rescue interface agent for the irmc hardware type when the corresponding boot interface is irmc-virtual-media. The supported values of rescue interface for irmc hardware type are agent and no-rescue. The default value is no-rescue.

  • Issuing a SIGHUP (e.g. pkill -HUP ironic) to an ironic-api or ironic-conductor service will cause the service to reload and use any changed values for mutable configuration options. The mutable configuration options are:

    • [DEFAULT]/debug

    • [DEFAULT]/log_config_append

    • [DEFAULT]/pin_release_version

    Mutable configuration options are indicated as such in the sample configuration file by Note: This option can be changed without restarting.

    A warning is logged for any changes to immutable configuration options.
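
The SIGHUP reload described above can also be triggered from Python for a single, known process ID. This is a minimal sketch for POSIX systems. The helper name is hypothetical, and how the PID of the ironic-api or ironic-conductor process is obtained is left out.

```python
import os
import signal


def request_config_reload(pid):
    """Send SIGHUP to one process so that it reloads mutable configuration options."""
    os.kill(pid, signal.SIGHUP)
```

A caller would pass the PID of the running service, for example request_config_reload(conductor_pid), where conductor_pid is a placeholder for a value looked up elsewhere.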

Upgrade Notes

  • Adds an inspect wait state to handle asynchronous hardware introspection. Take caution, because timeout monitoring is shifted from inspecting to inspect wait: stop all running asynchronous hardware inspection, or wait until it is finished, before upgrading to the Rocky release. Otherwise, nodes in asynchronous inspection will be left in the inspecting state forever unless the database is manually updated.

  • Extends the instance_info column in the nodes table for MySQL/MariaDB from up to 64KiB to up to 4GiB (type is changed from TEXT to LONGTEXT). This upgrade will not be executed on PostgreSQL as its TEXT is unlimited.

  • To use CoreOS based deploy/cleaning ramdisk built using Ironic Python Agent from the Rocky release, Ironic should be upgraded to the Rocky release if PXE is used. Otherwise, a node cannot be deployed or cleaned because the IPA fails to boot due to an unsupported parameter passed via PXE. See bug 2002093 for details.

  • With a deploy ramdisk based on Ironic Python Agent version 3.1.0 or later, drivers using the direct deploy interface perform netboot or local boot for whole disk images based on the value of the boot option setting. When you upgrade Ironic Python Agent in your deploy ramdisk, ensure that the boot option is set appropriately for the node. The boot option can be set using the configuration option [deploy]/default_boot_option or as a boot_option capability in the node’s properties['capabilities']. Also note that this functionality requires the hexdump command in the ramdisk.

  • ironic-dbsync online_data_migrations will migrate any port’s and port group’s extra['vif_port_id'] value to their internal_info['tenant_vif_port_id']. For API versions >= 1.28, the ability to attach/detach the VIF via the port’s or port group’s extra['vif_port_id'] will not be supported starting with the Stein release.

    Any out-of-tree network interface implementations that had a different behavior in support of attach/detach VIFs via the port or port group’s extra['vif_port_id'] must be updated appropriately.

  • It is no longer possible to load a classic driver. Only hardware types are supported from now on.

  • The /v1/drivers/?type=classic API always returns an empty list since classic drivers can no longer be loaded.

  • The deprecated iDRAC classic drivers pxe_drac and pxe_drac_inspector have been removed. Please use the idrac hardware type.

  • The deprecated iLO classic drivers pxe_ilo, iscsi_ilo and agent_ilo have been removed. Please use the ilo hardware type.

  • The deprecated classic drivers pxe_ipmitool and agent_ipmitool have been removed. Please use the ipmi hardware type instead.

  • The deprecated classic drivers pxe_irmc, agent_irmc and iscsi_irmc have been removed. Please use the irmc hardware type.

  • The deprecated classic drivers iscsi_pxe_oneview and agent_pxe_oneview have been removed. Please use the oneview hardware type.

  • The deprecated pxe_snmp classic driver has been removed. Please use the snmp hardware type instead.

  • The deprecated classic drivers pxe_ucs and agent_ucs have been removed. Please use the cisco-ucs-managed hardware type.

  • The deprecated classic drivers pxe_iscsi_cimc and pxe_agent_cimc have been removed. Please use the cisco-ucs-standalone hardware type.

  • All fake classic drivers, deprecated in the Queens release, have been removed. This includes:

    • fake

    • fake_agent

    • fake_cimc

    • fake_drac

    • fake_ilo

    • fake_inspector

    • fake_ipmitool

    • fake_ipmitool_socat

    • fake_irmc

    • fake_oneview

    • fake_pxe

    • fake_snmp

    • fake_soft_power

    • fake_ucs

    Please use the fake-hardware hardware type instead (you can combine it with any other interfaces, fake or real).

  • Adds a new configuration option [disk_utils]partprobe_attempts which defaults to 10. This is the maximum number of times to try to read a partition (if creating a config drive) via a partprobe command. Set it to 1 if you want the previous behavior, where no retries were done.

  • Power failure recovery introduces a new configuration option [conductor]power_failure_recovery_interval, which is enabled and set to 300 seconds by default. If the default value is not suitable for the needs or scale of a deployment, adjust it or turn it off during the upgrade.

  • Power failure recovery does not apply to nodes that were in maintenance mode due to power failure before the upgrade; they have to be manually moved out of maintenance mode.

  • Deprecated options ansible_deploy_username and ansible_deploy_key_file in node driver_info for the ansible deploy interface were removed and will be ignored. Use ansible_username and ansible_key_file options in the node driver_info respectively.

  • The behavior for retention of VIF interface attachments has changed.

    If your use of the Bare Metal service is reliant upon the behavior of the VIFs being retained, which was introduced as a behavior change during the Ocata cycle, then you must update your tooling to explicitly re-add the VIF attachments prior to deployment.

  • Deprecated option [keystone]/region_name was removed and will be ignored. Instead use the region_name option in other sections related to contacting other services ([service_catalog], [cinder], [glance], [neutron], [swift] and [inspector]).

    As the option [keystone]/region_name was the only option in the [keystone] section of the ironic configuration file, this section was removed as well.
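
The online_data_migrations note in the list above describes moving a VIF ID from extra['vif_port_id'] to internal_info['tenant_vif_port_id']. The sketch below illustrates that mapping on a plain dictionary and is not ironic's migration code. It leaves extra untouched, because the note does not say whether the original value is removed, and the rule of never overwriting an existing internal_info value is an assumption.

```python
import copy


def migrate_vif_port_id(port):
    """Return a copy of a port-like dict with the VIF ID copied to internal_info."""
    migrated = copy.deepcopy(port)
    vif_id = migrated.get("extra", {}).get("vif_port_id")
    internal_info = migrated.setdefault("internal_info", {})
    # Assumption: a value already recorded in internal_info takes precedence.
    if vif_id and "tenant_vif_port_id" not in internal_info:
        internal_info["tenant_vif_port_id"] = vif_id
    return migrated
```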

Deprecation Notes

  • Adds an inspect wait state to handle asynchronous hardware introspection. The [conductor]inspect_timeout configuration option is deprecated for removal; please use [conductor]inspect_wait_timeout instead to specify the timeout of the inspection process.

  • Deprecates the snmp_security field in driver_info for the ironic snmp hardware type; it will be removed in the Stein release. Please use the snmp_user field instead.

  • The [inspector]enabled configuration option is deprecated. It only affected classic drivers, and with their removal it no longer has any effect. Use the enabled_inspect_interfaces option to enable/disable support for ironic-inspector.

  • The oneview hardware type, as well as the supporting driver interfaces, have been deprecated and are scheduled to be removed from ironic in the Stein development cycle. This is due to the lack of operational Third Party testing to help ensure that the support for OneView is functional. OneView Third Party CI was shut down just prior to the start of the Rocky development cycle, and at the time of this deprecation the Ironic community has no indication that testing will be reestablished. Should testing be reestablished, this deprecation shall be rescinded.

  • Configuration options [xclarity]/manager_ip, [xclarity]/username, and [xclarity]/password are deprecated and will be removed in the Stein release.

  • The enabled_drivers option is now deprecated. Since classic drivers can no longer be loaded, setting this option to anything non-empty will result in the conductor failing to start.
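
The deprecations above that concern configuration options can be checked for mechanically. The following is a minimal sketch, assuming the configuration is plain INI text and that enabled_drivers lives in the [DEFAULT] section (an assumption, as the note above does not name the section). The function find_deprecated_options and the advice strings are hypothetical and only restate the notes above.

```python
import configparser

# (section, option) -> short advice taken from the deprecation notes.
DEPRECATED_OPTIONS = {
    ("conductor", "inspect_timeout"): "use [conductor]inspect_wait_timeout",
    ("inspector", "enabled"): "use enabled_inspect_interfaces",
    ("xclarity", "manager_ip"): "to be removed in the Stein release",
    ("xclarity", "username"): "to be removed in the Stein release",
    ("xclarity", "password"): "to be removed in the Stein release",
    # Assumption: enabled_drivers is set in [DEFAULT].
    ("DEFAULT", "enabled_drivers"): "leave unset or empty",
}


def find_deprecated_options(config_text):
    """Return (section, option, advice) for each deprecated option set."""
    # A dummy default_section makes [DEFAULT] parse as an ordinary
    # section, so its options are not inherited by every other section.
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None, strict=False
    )
    parser.read_string(config_text)
    return [
        (section, option, advice)
        for (section, option), advice in DEPRECATED_OPTIONS.items()
        if parser.has_option(section, option)
    ]


if __name__ == "__main__":
    sample = (
        "[DEFAULT]\nenabled_drivers = pxe_ipmitool\n"
        "[conductor]\ninspect_timeout = 1800\n"
    )
    for section, option, advice in find_deprecated_options(sample):
        print("[%s]%s is deprecated: %s" % (section, option, advice))
```

The sketch only reports that an option is present; it does not inspect values or rewrite the file.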

Security Issues

  • Fixes an issue where an enabled console could be left running after a node was unprovisioned. This allowed a user to view the console even after the instance was gone. Ironic now stops the console during unprovisioning to block this.

  • The XClarity password specified in the configuration file is now properly masked during logging.
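
The password masking mentioned above can be illustrated in general terms. The following is a minimal sketch using only the standard library; it is not ironic's implementation, and the names SECRET_KEYS and mask_secrets are hypothetical. In oslo.config, the same effect is usually obtained by declaring an option with secret=True.

```python
import logging

# Assumption: option names whose values must never be logged.
SECRET_KEYS = frozenset({"password"})


def mask_secrets(options, mask="***"):
    """Return a copy of options that is safe to write to a log."""
    return {
        key: (mask if key in SECRET_KEYS else value)
        for key, value in options.items()
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Example values are made up for illustration.
    xclarity_options = {
        "manager_ip": "192.0.2.20",
        "username": "admin",
        "password": "s3cret",
    }
    logging.debug("xclarity options: %s", mask_secrets(xclarity_options))
```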

Bug Fixes

  • Fixes bug 1749755, which caused timeouts to not work properly because an unsupported SQLAlchemy filter was being used.

  • Adds more ipmitool error messages to be treated as retryable by the ipmitool interfaces (such as power and management hardware interfaces). Specifically, the Node busy, Timeout, Out of space and BMC initialization in progress messages emitted by ipmitool will cause ironic to retry the IPMI command. This change should improve the reliability of IPMI-based communication with the BMC.

  • If the bare metal machine’s boot mode differs from the requested one, ironic will now attempt to set the requested boot mode on the bare metal machine and fail explicitly if the driver does not support setting the boot mode on the node.

  • The config drive passed to the node can now contain more than 64KiB when MySQL/MariaDB is used. For more details see bug 1596421.

  • Fixes a bug preventing a node from booting into the user instance after unrescuing if instance netboot is used. See bug 1749433 for details.

  • Fixes rescue timeout due to incorrect kernel parameter in the iPXE script. See bug 1749860 for details.

  • Fixes a bug where a node’s hardware type could not be changed to another hardware type which doesn’t support any hardware interface currently used. See bug 2001832 for details.

  • Fixes a bug that exposes an internal node ID in an error message when requested to delete a trait which doesn’t exist. See bug 2002062 for details.

  • When a conductor managing a node died mid-cleaning, the node would get stuck in the CLEANING state. Now, upon conductor startup, nodes in the CLEANING state will be moved to the CLEANFAIL state.

  • Fixes an issue where parameters required in driver_info and descriptions in documentation are different.

  • Fixes an issue with validation of Infiniband ports. Infiniband ports do not require the local_link_connection field to be populated as the network topology is discoverable by the Infiniband Subnet Manager. See bug 1753222 for details.

  • Fixes an issue where RAID 10 creation fails with greater than 16 drives when using the idrac hardware type. See bug 2002771 for details.

  • Adds missing noop implementations (e.g. no-inspect) to the fake-hardware hardware type. This fixes enabling this hardware type without enabling all (even optional) fake interfaces.

  • Fixes an issue seen during cleaning when the node being cleaned has one or more traits assigned. This issue caused cleaning to fail, and the node to enter the clean failed state. See bug 1750027 for details.

  • Fixes an issue with iPXE where the incorrect iscsi volume authentication data was being used with boot from volume when multi-attach volumes were present.

  • Fixes direct deploy interface to invoke boot.prepare_instance irrespective of image type being provisioned. It was calling boot.prepare_instance only if the image being provisioned is a partition image. See bugs 1713916 and 1750958 for details.

  • Fixes the HTTP response code for a validation failure when attempting to move an ironic node to the active state. A validation failure in this scenario now responds with a 400 status code, correctly indicating a user input error.

  • Fixes an issue where node ramdisk heartbeat operations would collide with conductor locks and erroneously record an error in the node’s last_error field.

  • Fixes collection of periodic tasks from hardware interfaces that are not used in any enabled classic drivers. See bug 2001884 for details.

  • The periodic tasks for the inspector inspect interface are no longer disabled if the [inspector]enabled option is not set to True. The help string of this option claims that it does not apply to hardware types. In any case, the periodic tasks are only run if any enabled classic driver or hardware interface requires them.

  • Fixes a compatibility issue where the iPXE kernel command line was no longer compatible with dracut. The ip parameter has been removed as it is incompatible with the BOOTIF and missing autoconf parameters when dracut is used. Further details can be found in storyboard.

  • Fixes empty last_error field on cleaning failures.

  • Fixes an issue where only nodes in DEPLOYING state would have locks cleared for the nodes. Now upon node take over, any locks that are left from the old conductor are cleared by the new one.

  • Adds a new configuration option [disk_utils]partprobe_attempts which defaults to 10. This is the maximum number of times to try to read a partition (if creating a config drive) via a partprobe command. Previously, no retries were done which caused failures. This addresses bug 1756760.

  • Fixes a rare race condition which resulted in the port list API returning HTTP 400 (bad request) if some nodes were being removed in parallel. See bug 1748893 for details.

  • Fixes an issue where no error was raised if there were no PXE-enabled ports available for the node, when creating a neutron port. See bug 2001811 for more details.

  • Fixes a potential case of VIF records being orphaned: the service now removes all records of VIF attachments upon the teardown of a deployed node. In some circumstances it is operationally impossible to remove a VIF attachment while a node is being undeployed, because the Compute service will only attempt to remove the VIF for five minutes.

    See bug 1743652 for more details.

  • The ironic API now returns 503 Service Unavailable for actions requiring a conductor when no conductors are online. See bug 2002600 for details.

  • Fixes an issue seen during node tear down where a port being deleted by the Bare Metal service could be deleted by the Compute service, leading to an unhandled error from the Networking service. See story 2002637 for further details.

  • Fixes an issue where the ilo hardware type would not properly update the boot mode on the bare metal machine for cleaning as specified by boot_mode in the node’s properties/capabilities. See bug 1559835 for more details.

  • During node cleaning, the conductor was using a cached copy of the node’s driver_internal_info field. It is possible that the copy is outdated, which would cause issues with the state of the node. This has been fixed. For more information, see bug 2002688.

  • Fixes an issue where a node’s instance_info.traits field could be incorrectly formatted, or contain traits that are not traits of the node. When validating drivers and prior to deployment, the Bare Metal service now validates that a node’s traits include all the traits in its instance_info.traits field. See bug 1755146 for details.

  • Reverts the fix for orphaned VIF records from the previous release, as it causes a regression. See bug 1750785 for details.
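
The retry behaviour for ipmitool errors described in the list above can be sketched as follows. This is a minimal illustration, not ironic's code: the message strings come from the note above, while CommandError, is_retryable, run_with_retries and the default attempt count are hypothetical.

```python
import time

# Messages that the note above lists as retryable.
RETRYABLE_MESSAGES = (
    "Node busy",
    "Timeout",
    "Out of space",
    "BMC initialization in progress",
)


class CommandError(Exception):
    """Hypothetical error carrying the text emitted by a failed command."""


def is_retryable(error_text):
    """Return True if the error text contains a retryable message."""
    return any(message in error_text for message in RETRYABLE_MESSAGES)


def run_with_retries(command, max_attempts=3, delay=0.0):
    """Call command() until it succeeds or a non-retryable error occurs."""
    for attempt in range(1, max_attempts + 1):
        try:
            return command()
        except CommandError as exc:
            # Give up on the last attempt or on an unknown error.
            if attempt == max_attempts or not is_retryable(str(exc)):
                raise
            time.sleep(delay)


if __name__ == "__main__":
    attempts = []

    def flaky_power_status():
        # Fails twice with a retryable message, then succeeds.
        attempts.append(1)
        if len(attempts) < 3:
            raise CommandError("Node busy")
        return "power on"

    print(run_with_retries(flaky_power_status))
```

Matching on substrings keeps the sketch short; a real implementation would also need to bound the total time spent retrying.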

Other Notes

  • Adds an inspect wait state to handle asynchronous hardware introspection. Returning INSPECTING from the inspect_hardware method of the inspect interface is deprecated; INSPECTWAIT should be returned instead.

  • Adds get_boot_mode, set_boot_mode and get_supported_boot_modes methods to the driver management interface. Drivers can override these methods to implement boot mode management calls to the BMC of the bare metal nodes being managed.

  • Adds a new method validate_rescue() to the boot interface to validate the node’s properties related to the rescue operation. This method is called by the validate() method of the rescue interface.

  • For out-of-tree drivers that have vendor passthru methods: the async parameter of the passthru and driver_passthru decorators is deprecated and will be removed in the Stein cycle. Please use its replacement, the async_call parameter, instead. For more information, see bug 1751306.

  • The conductor no longer tries to collect or report sensors data for nodes in maintenance mode. See bug 1652741.

  • On taking over nodes in the CLEANING state, the new conductor moves them to the CLEANFAIL state and sets maintenance.

  • Removes the software metric named validate_boot_option_for_trusted_boot. This was the timing for a short-lived, internal function that is already included in the PXEBoot.validate metric.
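
The switch from async to async_call noted above matters because async is a reserved keyword from Python 3.7 onward, so it can no longer be written as a keyword argument. The following is a minimal sketch of how a decorator can accept both spellings during a deprecation period. The passthru function here is a hypothetical stand-in, not ironic's decorator, and its default of async_call=True is an assumption.

```python
import warnings


def passthru(http_methods, async_call=True, **kwargs):
    """Stand-in decorator that records metadata on the wrapped function."""
    if "async" in kwargs:
        # Legacy spelling: still honoured, but flagged as deprecated.
        warnings.warn(
            "The 'async' parameter is deprecated; use 'async_call'.",
            DeprecationWarning,
            stacklevel=2,
        )
        async_call = kwargs.pop("async")
    if kwargs:
        raise TypeError("unexpected arguments: %s" % sorted(kwargs))

    def decorator(func):
        func.passthru_metadata = {
            "http_methods": list(http_methods),
            "async_call": async_call,
        }
        return func

    return decorator


# New-style usage.
@passthru(["POST"], async_call=False)
def new_style_method():
    return "done"


# Legacy callers can only pass the old name through a dict on Python 3.7+.
def legacy_method():
    return "done"


with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    legacy_method = passthru(["POST"], **{"async": False})(legacy_method)

if __name__ == "__main__":
    print(new_style_method.passthru_metadata)
    print(legacy_method.passthru_metadata)
```

Out-of-tree drivers that still pass async directly will fail to parse on Python 3.7 and later, which is one reason to rename the argument at the call site rather than rely on a compatibility path.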