Train Series (12.2.0 - 13.0.x) Release Notes

13.0.2

Security Issues

  • Node secrets (such as BMC credentials) are no longer logged when JSON RPC is used and DEBUG logging is enabled.
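
To illustrate the kind of masking this implies, here is a minimal sketch, not ironic's actual implementation. The helper name mask_secrets, the SECRET_MARKERS list, and the sample payload are all hypothetical.

```python
# Hypothetical substrings that mark a dictionary key as holding a secret.
SECRET_MARKERS = ("password", "secret", "token")


def mask_secrets(value, placeholder="***"):
    """Return a copy of ``value`` with secret-looking dict entries masked."""
    if isinstance(value, dict):
        masked = {}
        for key, item in value.items():
            if any(marker in str(key).lower() for marker in SECRET_MARKERS):
                masked[key] = placeholder
            else:
                # Recurse so nested structures such as driver_info are covered.
                masked[key] = mask_secrets(item, placeholder)
        return masked
    if isinstance(value, (list, tuple)):
        return [mask_secrets(item, placeholder) for item in value]
    return value


# Made-up RPC payload; the original object is left untouched.
payload = {
    "node": {
        "uuid": "example-node",
        "driver_info": {"ipmi_username": "admin", "ipmi_password": "s3cret"},
    }
}
print(mask_secrets(payload))
```

Masking a copy rather than the original keeps the real credentials available to the code that needs them while only the sanitized copy reaches the log.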

Bug Fixes

  • Fixes a bug in the idrac hardware type where a race condition can occur on a host that has a mix of controllers where some support realtime mode and some do not. The fix uses realtime mode only if all controllers support it, which removes the race condition. See bug 2006502 https://storyboard.openstack.org/#!/story/2006502 for details.

  • Fixes an issue where the resource list API honored the user-requested fields only up to the API MAX_LIMIT. Once MAX_LIMIT was reached, the API ignored the requested fields. The next URL generated by the pagination code now includes the user-requested fields as a query parameter.

  • Fixes collection of drive sensor information in the redfish management interface. Prior to this fix, the wrong Redfish schema was used for the Drive resource, which caused an exception and ultimately a failure of sensor data collection.

  • Fixes a possible console lockup when the PID file has not yet been created although the daemon start call has already returned a success return code.

  • Fixes a bug in the idrac hardware type where, when executing the clear_job_queue clean step, pending non-BIOS config jobs (e.g. create/delete virtual disk) were not deleted before job execution.

    See bug 2006580 https://storyboard.openstack.org/#!/story/2006580 for details.

  • Fixes a bug with the grub ramdisk boot template handling, such that the template now properly references the user-provided kernel and ramdisk. Previously the deployment ramdisk and kernel were referenced in the template.
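
The pagination fix in this list can be pictured with a small sketch. It is not ironic's API code; the function name build_next_url, its parameters, and the example host are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_next_url(base_url, resource, marker, limit, fields=None):
    """Build a 'next' link that carries over the user-requested fields."""
    params = {"limit": limit, "marker": marker}
    if fields:
        # Without this, the next page would fall back to the default fields.
        params["fields"] = ",".join(fields)
    # Keep commas unescaped so the fields list stays readable.
    return f"{base_url}/{resource}?{urlencode(params, safe=',')}"


url = build_next_url(
    "http://ironic.example/v1", "nodes",
    marker="node-2", limit=2, fields=["uuid", "name"],
)
print(url)
print(parse_qs(urlparse(url).query))
```

A client that follows the returned link therefore keeps receiving only the fields it asked for on every page.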

13.0.1

Bug Fixes

  • Fixes a bug in the idrac hardware type where a configuration job for the RAID delete_configuration cleaning step was created even when no virtual disks or hotspares/dedicated hotspares were present on any controller. See bug 2006562 https://storyboard.openstack.org/#!/story/2006562 for details.

13.0.0

Prelude

“Choooooo! Choooooo!” The Train is now departing the station. The OpenStack Bare Metal as a service team is proud to announce the release of Ironic 13.0.0. This release brings the long-desired feature of software RAID configuration, along with Redfish virtual media boot support, sensor data improvements, and numerous bug fixes. We hope you enjoy your ride on the OpenStack Ironic Train.

New Features

  • Adds support for deploy steps to the idrac-wsman raid interface. The methods apply_configuration and delete_configuration can be used as deploy steps.

  • Adds a new delete_existing argument to the create_configuration clean step on the idrac-wsman raid interface which can be used to delete existing virtual disks. The default for this argument is False.

  • Adds support for deploy steps to the bios interface of the ilo hardware type. The methods factory_reset and apply_configuration can be used as deploy steps.

  • Adds support for deploy steps to the management interface of the ilo hardware type. The methods reset_ilo, reset_ilo_credential, reset_bios_to_default, reset_secure_boot_keys_to_default, clear_secure_boot_keys and update_firmware can be used as deploy steps.

  • Adds support for deploy steps to the raid interface of the ilo5 hardware type. The methods apply_configuration and delete_configuration can be used as deploy steps.

  • Adds support for deploy steps to the bios interface of the redfish hardware type. The methods factory_reset and apply_configuration can be used as deploy steps.

  • Adds the redfish-virtual-media boot interface to the redfish hardware type, enabling virtual media boot. The redfish-virtual-media boot interface operates on the same kernel/ramdisk as, for example, the PXE boot interface does. However, it can additionally require an EFI system partition image (ESP) when performing UEFI boot. Either the [conductor]bootloader configuration option or the [driver_info]/bootloader node attribute can be used to convey the ESP location to ironic. Bootable ISO images can be served to BMCs either from Swift or from an HTTP server running on an ironic conductor machine. This is controlled by the [redfish]use_swift ironic configuration option.

  • Adds a sensor data collector to the redfish management interface. Temperature, power, cooling and drive health metrics are collected.

  • Adds target_raid_config data to the ironic variable under the raid_config top-level key, which exposes the RAID configuration to the ansible driver. See story 2006417 for details.

  • Adds a clear_job_queue cleaning step to the idrac-wsman management interface. The clear_job_queue cleaning step clears the Lifecycle Controller job queue including any pending jobs.

  • Adds an ilo-ipxe boot interface to ilo hardware type which allows for instance level iPXE enablement as opposed to conductor-wide enablement of iPXE. To perform iPXE boot with ilo-ipxe boot interface:

  • Adds power state change callbacks of an instance to the Compute service by performing API notifications. This feature is enabled by default and can be disabled via the new [nova]send_power_notifications configuration option.

    Whenever there is a change in the power state of a physical instance, the Bare Metal service will send a power-update external event to the Compute service which will cause the power state of the instance to be updated in the Compute database. It also adds the possibility of bringing up/down a physical instance through the Bare Metal service API even if it was put down/up through the Compute service API.

  • The deploy and/or rescue kernel and ramdisk can now be configured via the new configuration options deploy_kernel, deploy_ramdisk, rescue_kernel and rescue_ramdisk respectively.

  • Adds a new configuration option [drac]boot_device_job_status_timeout that specifies the maximum amount of time (in seconds) to wait for the boot device configuration job to transition to the scheduled state to allow a reboot or power on action to complete.

  • Adds initial idrac hardware type support of interface implementations that utilize the Redfish out-of-band (OOB) management protocol and are compatible with the integrated Dell Remote Access Controller (iDRAC) baseboard management controller (BMC), presently those of the management and power hardware interfaces. They are named idrac-redfish.

    Introduces a new name for the idrac interface implementations, idrac-wsman, and deprecates idrac. They both use the Web Services Management (WS-Man) OOB management protocol.

    The idrac hardware type declares support for those new interface implementations, in addition to all interface implementations it has been supporting. The priority order of supported interfaces remains the same. Interface implementations which rely on WS-Man continue to have the highest priority, and the new idrac-wsman is listed before the deprecated idrac. It now supports the following interface implementations, which are listed in priority order from highest to lowest:

    • bios: no-bios

    • boot: ipxe, pxe

    • console: no-console

    • deploy: iscsi, direct, ansible, ramdisk

    • inspect: idrac-wsman, idrac, inspector, no-inspect

    • management: idrac-wsman, idrac, idrac-redfish

    • network: flat, neutron, noop

    • power: idrac-wsman, idrac, idrac-redfish

    • raid: idrac-wsman, idrac, no-raid

    • rescue: no-rescue, agent

    • storage: noop, cinder, external

    • vendor: idrac-wsman, idrac, no-vendor

    For more information, see story 2004592.

  • Adds idrac hardware type support of an inspect interface implementation that utilizes the Redfish out-of-band (OOB) management protocol and is compatible with the integrated Dell Remote Access Controller (iDRAC) baseboard management controller (BMC). It is named idrac-redfish.

    The idrac hardware type declares support for that new interface implementation, in addition to all inspect interface implementations it has been supporting. The highest priority inspect interfaces remain the same, those which rely on the Web Services Management (WS-Man) OOB management protocol. The new ‘idrac-redfish’ immediately follows those. It now supports the following inspect interface implementations, listed in priority order from highest to lowest: idrac-wsman, idrac, idrac-redfish, inspector, and no-inspect.

  • Adds functionality to perform out-of-band sanitize disk-erase operation for iLO5 based HPE Proliant servers. Management interface ilo5 has been added to ilo5 hardware type. A clean step erase_devices has been added to management interface ilo5 to support this operation.

  • Adds support for the Intel IPMI Hardware with a new hardware type intel-ipmitool. This hardware type is the same as the ipmi hardware type with additional support of Intel Speed Select Performance Profile Technology. It uses the intel-ipmitool management interface, which supports setting the desired configuration level for Intel SST-PP.

  • Ironic API service now supports HTTP proxy headers parsing with the help of oslo.middleware package, enabled via new option [oslo_middleware]/enable_proxy_headers_parsing (False by default).

    This enables more complex setups of Ironic API service, for example when the same service instance serves both internal and public API endpoints via separate proxies.

    When proxy headers parsing is enabled, the value of [api]/public_endpoint option is ignored.

  • Allows retrying PXE/iPXE boot during deployment, cleaning and rescuing. This feature is disabled by default and can be enabled by setting [pxe]boot_retry_timeout to the timeout (in seconds) after which the boot should be retried.

    The new option [pxe]boot_retry_check_interval defines how often to check the nodes for timeout and defaults to 90 seconds.

  • Adds support for software RAID via the generic hardware manager when using a Train release ironic-python-agent deployment or cleaning ramdisk.

    By means of the target_raid_config, either a single RAID-1 or one RAID-1 plus one RAID-N can be configured (where N can be 0, 1, or 1+0). The RAID is created/deleted during manual cleaning. Note that this initial implementation will use all available devices for the setup of the software RAID device(s). More information is available in the Ironic Administrator documentation.

  • Foreign drives and global and dedicated hot spares will be freed up during the RAID delete_configuration cleaning step.
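
As a rough sketch of the software RAID constraint described in this list (a single RAID-1, or one RAID-1 plus one RAID-N), the check below validates a target_raid_config dictionary. The function name check_software_raid is hypothetical, and the size_gb values and the "software" controller value in the example are assumptions based on the usual target_raid_config layout; consult the Ironic Administrator documentation for the authoritative format.

```python
# RAID levels these notes allow for the optional second device.
ALLOWED_SECOND_LEVELS = {"0", "1", "1+0"}


def check_software_raid(target_raid_config):
    """Raise ValueError if the config breaks the constraint in these notes."""
    disks = target_raid_config.get("logical_disks", [])
    if not 1 <= len(disks) <= 2:
        raise ValueError("expected one or two logical disks")
    if disks[0]["raid_level"] != "1":
        raise ValueError("the first logical disk must be RAID-1")
    if len(disks) == 2 and disks[1]["raid_level"] not in ALLOWED_SECOND_LEVELS:
        raise ValueError("the second logical disk must be RAID 0, 1 or 1+0")
    return True


# Assumed example: a RAID-1 device followed by a RAID-0 device.
example = {
    "logical_disks": [
        {"size_gb": 100, "raid_level": "1", "controller": "software"},
        {"size_gb": "MAX", "raid_level": "0", "controller": "software"},
    ]
}
print(check_software_raid(example))
```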

Upgrade Notes

  • In order to support power state change callbacks to nova, the [nova] section must be configured in the Bare Metal service configuration. As the functionality to process the event is new in nova’s Train release, [nova]send_power_notifications should only be set to True in ironic once ALL nova-compute instances have been upgraded to the Train release of nova.

  • The Cisco cisco-ucs-managed and cisco-ucs-standalone hardware types and cimc and ucsm hardware interfaces which were deprecated in the 12.1.0 release have now been removed.

    After upgrading, if any of these hardware types or interfaces are specified in ironic’s configuration options, the ironic-conductor service will fail to start. Any existing ironic nodes with these hardware types or interfaces will become inoperable via ironic after the upgrade. If these hardware types or interfaces are being used, either change the affected nodes to use other hardware types or interfaces, or install these hardware types (and interfaces) separately from elsewhere. For more information, see story 2005033.

  • The deprecated configuration options enabled and service_url from the inspector section have been removed.

  • The python-ironic-inspector-client package is no longer required for the inspector inspect interface (openstacksdk is used instead).

  • The deprecated options url, url_timeout and auth_strategy from the [neutron] section have been removed. Use endpoint_override, timeout and auth_type respectively.

  • When a failure occurs during cleaning, nodes will no longer be shut down. The behaviour was changed to prevent harm and allow for an admin intervention when sensitive operations, such as firmware upgrades, are performed and fail during cleaning.

  • The deprecated options glance_api_servers, glance_api_insecure, glance_cafile and auth_strategy from the [glance] section have been removed. Please use the corresponding keystoneauth options instead.

  • The do_disk_erase, has_disk_erase_completed and get_available_disk_types interfaces of the ‘proliantutils’ library have been enhanced to support the out-of-band sanitize disk-erase operation for the ilo5 hardware type. To leverage this feature, the ‘proliantutils’ library needs to be upgraded to version ‘2.9.0’.

  • Users of the irmc hardware type with iPXE should switch to the ipxe boot interface from the deprecated [pxe]ipxe_enabled option.

  • Explicit support for CoreOS Ironic Python Agent images has been removed. If you use a ramdisk based on CoreOS, you may want to re-add coreos.configdrive=0 to your PXE templates, see story 1433812 for the background.

  • The deprecated ironic/api/app.wsgi script has been removed. The automatically generated ironic-api-wsgi script must be used instead.

  • Support for elilo has been removed as support was deprecated and elilo has been dropped by most Linux distributions. Users should migrate to another PXE loader.
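
Before upgrading, it may help to scan an existing configuration file for the options these notes list as removed. The following is a minimal sketch under stated assumptions: the function name find_removed_options and the sample configuration are hypothetical, and the table only covers the [neutron], [glance] and inspector options named above.

```python
import configparser

KEYSTONEAUTH = "the corresponding keystoneauth options"

# (section, option) -> replacement named in these notes (None if none is named).
REMOVED_OPTIONS = {
    ("neutron", "url"): "[neutron]endpoint_override",
    ("neutron", "url_timeout"): "[neutron]timeout",
    ("neutron", "auth_strategy"): "[neutron]auth_type",
    ("glance", "glance_api_servers"): KEYSTONEAUTH,
    ("glance", "glance_api_insecure"): KEYSTONEAUTH,
    ("glance", "glance_cafile"): KEYSTONEAUTH,
    ("glance", "auth_strategy"): KEYSTONEAUTH,
    ("inspector", "enabled"): None,
    ("inspector", "service_url"): None,
}


def find_removed_options(conf_text):
    """Return (section, option, replacement) for each removed option in use."""
    # A dummy default_section stops [DEFAULT] values leaking into other sections.
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None, strict=False
    )
    parser.read_string(conf_text)
    return [
        (section, option, replacement)
        for (section, option), replacement in REMOVED_OPTIONS.items()
        if parser.has_option(section, option)
    ]


SAMPLE = """
[DEFAULT]
debug = True

[neutron]
url = http://neutron.example:9696
auth_type = password

[glance]
glance_cafile = /etc/ssl/ca.pem
"""

for section, option, replacement in find_removed_options(SAMPLE):
    print(f"[{section}]{option} was removed; use {replacement}")
```

Note that configparser is only an approximation of how the services parse their configuration, so treat the result as a hint rather than a definitive audit.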

Deprecation Notes

  • The configuration option [glance]glance_num_retries has been renamed to [glance]num_retries. The old name will be removed in a future release.

  • The idrac interface implementation name is deprecated in favor of a new name, idrac-wsman, and may be removed in a future release. A deprecation warning will be logged for every loaded idrac interface implementation. Use idrac-wsman instead.

  • The ironic-lib configuration option [disk_utils]iscsi_verify_attempts has been deprecated in favor of:

    • [iscsi]verify_attempts to specify the number of attempts to establish an iSCSI connection.

    • [disk_utils]partition_detection_attempts to specify the number of attempts to find a newly created partition.

Bug Fixes

  • Fixes an issue where a pending BIOS config job in the job queue caused ironic to abandon an introspection attempt for the node, which in turn caused overall introspection to fail.

  • Allows deleting unbound ports on an active node. See story 2006385 for details.

  • Fixes a confusing AttributeError if an adapter returns None for the bare metal API.

  • Prevents the adapter configuration options from getting ignored if a matching endpoint cannot be found. An error is now raised.

  • By immediately conveying power state changes of a node through external events to the Compute service, the Bare Metal service becomes the source of truth about the node’s power state, preventing the Compute service from forcing wrong power states on instances during the periodic power state synchronization between the Compute and Bare Metal services.

    Note

    There is a possibility of a race condition due to the nova-ironic power sync task happening during or right before the power state change event is received from the Bare Metal service, in which case the instance state will be forced on the baremetal node.

  • Fixes an issue in the discovery playbook for the ansible deploy interface that prevented gathering WWN and serial numbers under Python 3.

  • Fixes an issue with using serial number as root device hints with the ansible deploy interface.

  • Fixes an issue regarding the ansible deploy interface, where the configdrive partition could not be correctly built if the node root device was set to some logical device (like an md array, /dev/md0). https://storyboard.openstack.org/#!/story/2006334

  • Fixes deploying non-public images using the ansible deploy interface.

  • Currently Ironic allows entering deployment or cleaning for nodes in maintenance mode. However, heartbeats do not cause any actions for such nodes, thus deployment or cleaning will never finish if the nodes are not moved out of maintenance. A new configuration option [conductor]allow_provisioning_in_maintenance (defaulting to True) is added to configure this behavior. If it is set to False, deployment and cleaning will be prevented for nodes in maintenance mode.

  • Fixes an issue with asynchronous deploy steps that poll for completion where the step could fail to execute. The deployment_polling and cleaning_polling flags may be used by driver implementations to signal that the driver is polling for completion. See story 2003817 for details.

  • Fixes an issue in the idrac hardware type where a configuration job does not transition to the correct state and start execution during a power on or reboot operation. If the boot device is being changed, the system might complete its POST before the job is ready, leaving the job in the queue, and the system will boot from the wrong device. See bug 2004909 for details.

  • Fixes a bug where ironic would shut a node down upon cleaning failure. Now, the node stays powered on (as documented and intended).

  • Fixes an issue where baremetal node deployment would fail on clouds with a high number of security groups. Listing the security groups took too long. Instead of listing all security groups, a query filter was added to list only the security groups to be used for the network. (See bug 2006256.)

  • Fixes an issue where a node stayed locked for longer than [console]subprocess_timeout seconds when the shellinabox process failed to start before the specified timeout elapsed.

  • Fixes a bug when executing the create_configuration cleaning step for disks of a PERC H740P controller, where the first disks were created but the controller then refused to create the next few disks because it was busy.

  • Fixes an issue wherein asynchronous out-of-band deploy steps in a deployment template fail to execute. See story 2006342 for details.

  • Fixes an issue where users attempting to leverage non-iPXE UEFI booting would experience failures when their dhcp_provider was set to none.

  • Fixes a bug in iLO UEFI iSCSI Boot where it fails if a server has multiple NIC adapters. Proliant Servers are limited to four iSCSI NIC sources, and the existing implementation tried to create more and failed accordingly.

  • Adds the missing ipxe boot interface to the irmc hardware type. It is supposed to be used instead of the deprecated [pxe]ipxe_enabled configuration option.

  • Fixes an issue where clean steps of redfish BIOS interface do not boot up the IPA ramdisk after cleaning reboot. See story 2006217 for details.

  • Fixes an issue in ISO creation for UEFI boot mode when an efiboot.img file is provided and the directory containing the grub.cfg file, set using the [DEFAULT]/grub_config_path option, is not the same as that of the efiboot.img file. See story 2006218 for details.

  • Fixes an issue where updating firmware using the update_firmware_sum clean step of the management interface of the ilo hardware type failed with an error stating that it was unable to connect to the iLO address due to authentication failure. See story 2006223 for details.

  • Fixes an issue with powering on a server in the ilo hardware type. The server failed to return success for the power-on operation if no bootable device was found. See story 2006288 for details.

  • Fixes an issue in creation of RAID if none of the ‘logical_disks’ in ‘target_raid_config’ have ‘controller’ parameter. See story 2006316 for details.

  • Fixes an issue in creation of RAID for the ilo5 RAID interface wherein a second RAID creation attempt fails. See story 2006321 for details.

  • Provides an opt-in fix to change the default port attachment behavior for deployment and cleaning operations through a new configuration option, [neutron]add_all_ports. This option causes ironic to transmit all port information to neutron as opposed to only a single physical network port. This enables operators to successfully operate static Port Group configurations with Neutron ML2 drivers, where previously configuration of networking would fail.

    When these ports are configured with pxe_enabled set to False, neutron will be requested not to assign an IP address to the port. This is to prevent additional issues that may occur depending on physical switch configuration with static Port Group configurations.

  • Fixes an issue during provisioning network attachment where neutron ports were being created with the same data structure being re-used.
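
The last fix in this list concerns a shared data structure being re-used across port creations. The sketch below illustrates that general class of bug and one way to avoid it; it is not ironic's code, and the function names, the request body layout, and the MAC addresses are made up for illustration.

```python
import copy


def build_port_bodies_shared(template, macs):
    """Buggy variant: every entry is the same dict object."""
    bodies = []
    for mac in macs:
        template["port"]["mac_address"] = mac  # mutates the shared template
        bodies.append(template)
    return bodies


def build_port_bodies(template, macs):
    """Fixed variant: each port gets its own independent copy."""
    bodies = []
    for mac in macs:
        body = copy.deepcopy(template)
        body["port"]["mac_address"] = mac
        bodies.append(body)
    return bodies


macs = ["52:54:00:00:00:01", "52:54:00:00:00:02"]

shared = build_port_bodies_shared({"port": {"network_id": "net-1"}}, macs)
print([body["port"]["mac_address"] for body in shared])

fixed = build_port_bodies({"port": {"network_id": "net-1"}}, macs)
print([body["port"]["mac_address"] for body in fixed])
```

In the shared variant every list entry ends up with the last MAC address, because all entries point at one dictionary; copying the template per port keeps them distinct.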

Other Notes

  • This release allows configuring retryable ipmitool exceptions via [ipmi]additional_retryable_ipmi_errors so that, depending on the environment, operators can allow retrying ipmitool commands whose errors contain the specified substrings.
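
A substring-based retry decision of this kind could look roughly like the sketch below. The function name is_retryable and the built-in pattern are hypothetical; only the idea of operator-supplied substrings comes from the note above.

```python
# Hypothetical built-in pattern; the real list lives in ironic itself.
BUILTIN_RETRYABLE = ("insufficient resources for session",)


def is_retryable(error_output, additional_retryable_ipmi_errors=()):
    """Return True if the ipmitool error text matches a retryable substring."""
    patterns = BUILTIN_RETRYABLE + tuple(additional_retryable_ipmi_errors)
    return any(pattern in error_output for pattern in patterns)


# Made-up error text and operator-supplied substring.
error = "Error: Unable to establish IPMI v2 / RMCP+ session"
print(is_retryable(error))
print(is_retryable(error, ["Unable to establish IPMI"]))
```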

12.2.0

New Features

  • Adds option allow_deleting_available_nodes to control whether nodes in state available should be deletable (which is and stays the default). Setting this option to False will remove available from the list of states in which nodes can be deleted from ironic. It hence provides protection against accidental removal of nodes which are ready for allocation (and is meant as a safeguard for the operational effort to bring nodes into this state). For backwards compatibility reasons, the default value for this option is True. The other states in which nodes can be deleted from ironic (manageable, enroll, and adoptfail) remain unchanged. This option can be changed without service restart.

  • Adds capability to hardware type idrac for creating and deleting RAID sets without rebooting the baremetal node. This realtime mechanism is supported on PERC H730 and H740 RAID controllers that are running firmware version 25.5.5.0005 or later.

  • Adds reset_idrac and known_good_state cleaning steps to hardware type idrac. reset_idrac resets the iDRAC; known_good_state also resets the iDRAC and additionally clears the Lifecycle Controller job queue to make sure the iDRAC is in a good state.

  • API version 1.58 allows backfilling allocations for existing deployed nodes by providing node to POST /v1/allocations.

  • API version 1.57 adds a REST API endpoint for updating an existing allocation. Only name and extra fields are allowed to be updated.

  • Adds a new option enable_mdns which enables publishing the baremetal API endpoint via mDNS as specified in the API SIG guideline.

  • Adds a [conductor]send_sensor_data_for_undeployed_nodes option to enable ironic to collect and transmit sensor data for all nodes for which sensor data collection is available. By default, this option is not enabled which aligns with the prior behavior of sensor data collection and transmission where such data was only collected if an instance_uuid was present to signify that the node has been or is being deployed. With this option set to True, operators may be able to identify hardware in a faulty state through the sensor data and take action before an instance workload is deployed.

  • The Smart-Nic functionality that was added to the Bare Metal Service during the Stein cycle can now be used with a Train version of the Networking Service (neutron) as Smart-Nic support merged into that project during the Train development cycle.
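
The effect of the allow_deleting_available_nodes option described at the top of this list can be summarized in a short sketch. The function name deletable_states is hypothetical, and the state names are written as they appear in these notes.

```python
def deletable_states(allow_deleting_available_nodes=True):
    """Return the states in which a node may be deleted, per these notes."""
    states = ["manageable", "enroll", "adoptfail"]
    if allow_deleting_available_nodes:
        # Default behaviour: available nodes remain deletable.
        states.insert(0, "available")
    return states


print(deletable_states())
print(deletable_states(allow_deleting_available_nodes=False))
```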

Upgrade Notes

  • Updates the minimum required version of python-dracclient to 3.0.0 when using the idrac hardware type.

  • Removes commit_required from the dictionary returned by the set_bios_config vendor passthru call in the idrac hardware type. commit_required was split into two keys: is_commit_required and is_reboot_required, which indicate the actions necessary to complete setting the BIOS settings. commit_required was removed in python-dracclient version 3.0.0.

  • Removes deprecated option [ilo]/power_retry. Please use [conductor]/soft_power_off_timeout instead.

  • Removes the configuration option [DEFAULT]/hash_distribution_replicas which was deprecated in the Stein cycle.

  • Removes the configuration option [DEFAULT]enabled_drivers. The option was deprecated in Rocky, and setting this option has raised an exception preventing conductor from starting since then. [DEFAULT]enabled_hardware_types should be used instead.

  • Updates the minimum required version of ironic-lib to 2.17.1.
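
Callers that previously read commit_required from the set_bios_config result need to look at the two new keys instead. A minimal sketch, assuming the result is a plain dictionary; the function name pending_actions and the action labels are hypothetical.

```python
def pending_actions(result):
    """List the follow-up actions signalled by a set_bios_config result."""
    actions = []
    if result.get("is_commit_required"):
        actions.append("commit")
    if result.get("is_reboot_required"):
        actions.append("reboot")
    return actions


# Made-up result values for illustration.
print(pending_actions({"is_commit_required": True, "is_reboot_required": True}))
print(pending_actions({"is_commit_required": False, "is_reboot_required": False}))
```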

Deprecation Notes

  • The configuration option [DEFAULT]/fatal_exception_format_errors is now deprecated. Please use the configuration option [ironic_lib]/fatal_exception_format_errors instead.

Bug Fixes

  • Fixes an issue where the Networking Service performs a pre-flight operation which can exceed the prior default timeout of 30 seconds. The new default is 45 seconds, and operators can tune the setting via the [neutron]request_timeout setting.

  • Fixes overflowing of the node fields last_error and maintenance_reason, which would prevent the object from being correctly committed to the database. The maximum message length can be customized through a new configuration parameter, [DEFAULT]/log_in_db_max_size (default, 4096 characters).

  • Fixes an issue encountered during deployment, more precisely during the configdrive partition creation step. On some specific devices like NVMe drives, the created configdrive partition could not be correctly identified (required to dump data onto it afterward). See story 2005764.

  • Fixes an issue regarding the ansible deploy interface cleaning workflow. Handling the error in the driver and returning nothing caused the manager to consider the step done and go to the next one instead of interrupting the cleaning workflow.

  • Fixes an issue with the ansible deploy interface where raw images could not be streamed correctly to the host.

  • Fixes deployment with the ansible deploy interface and instance images with GPT partition table.

  • Fixes traceback on cleaning of nodes with the redfish hardware type if their BMC does not support BIOS settings.

  • Fixes an issue where the sensor data parsing method for the ipmitool interface lacked the ability to handle the automatically included ipmitool debugging information when the debug option is set to True in the ironic.conf file. As such, extra debugging information supplied by the underlying ipmitool command is disregarded. More information can be found in story 2005331.

  • Fixes an issue where deploy fails during node preparation if the node capabilities are passed as string.

  • Fixes GRUB configuration file generation procedure when building bootable ISO images that include user EFI boot loader image. Prior to this fix, no bootable ISO image could be generated unless EFI boot loader is extracted from deploy ISO image.

  • Fixes an issue where, when the image source is a local file, the image was truncated to 2G and deployment failed due to image corruption.

  • Fixes binary file upload to Swift. Prior to this fix, binary file upload to Swift might fail while interpreting unicode characters.

  • The internal JSON RPC server now binds to :: by default, allowing it to work correctly with IPv6.

  • Binds the RAID schema validation to the jsonschema draft-04 validator. jsonschema 3.0.1 supports draft-03, draft-04, draft-06 and draft-07, and by default its validate function uses the latest draft validator. Draft-04 is the latest draft supported by jsonschema 2.6. Binding the schema to the draft-04 validator therefore makes it compliant with both jsonschema 2.6 and jsonschema 3.0.1.

  • Fixes the duplication of the “ipxe” tag when using IPv6, which leads to the dhcp server possibly returning an incorrect response to the DHCPv6 client.

  • Fixes an issue where nodes in the process of deployment may have metrics data collected and transmitted during the deployment process which may erroneously generate alarms depending on the operator’s monitoring configuration. This was due to a database filter relying upon the indicator of an instance_uuid as opposed to the state of a node.

  • No longer tries to create a temporary URL with zero lifetime if the deploy_callback_timeout option is set to zero. The default of 1800 seconds is used in that case. Use the new configdrive_swift_temp_url_duration option to override.
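
The fallback described in the last note can be sketched as follows. The function name temp_url_duration is hypothetical, and the precedence of configdrive_swift_temp_url_duration over deploy_callback_timeout is an assumption drawn from the wording "to override".

```python
DEFAULT_DURATION = 1800  # seconds, the default named in the note above


def temp_url_duration(configdrive_swift_temp_url_duration=None,
                      deploy_callback_timeout=1800):
    """Pick a non-zero lifetime (in seconds) for the temporary URL."""
    return (configdrive_swift_temp_url_duration
            or deploy_callback_timeout
            or DEFAULT_DURATION)


print(temp_url_duration(deploy_callback_timeout=0))
print(temp_url_duration(configdrive_swift_temp_url_duration=600,
                        deploy_callback_timeout=0))
```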