Train Series (3.7.0 - 5.0.x) Release Notes

5.0.4-17

New Features

  • Adds a configuration option, which can be encoded into the ramdisk itself or passed via the PXE parameters, to instruct the agent to ignore bootloader installation or configuration failures. This functionality is useful to work around well-intentioned hardware which auto-populates all possible devices into the UEFI NVRAM in order to help ensure the machine boots; a side effect is that any explicit configuration attempt can fail. Operators needing this bypass can use the ipa-ignore-bootloader-failure configuration option on the PXE command line or the ignore_bootloader_failure option in the ramdisk configuration. In a future version of ironic, it may become possible to override this setting through ironic node-level configuration.

  • Adds the capability into the agent to read and act upon bootloader CSV files which serve as authoritative indicators of what bootloader to load instead of leaning towards utilizing the default.
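
As a rough illustration of how a flag such as ipa-ignore-bootloader-failure could be read from the PXE kernel command line, the sketch below parses a command-line string into key/value pairs. The helper names parse_kernel_params and ignore_bootloader_failure, and the set of accepted truthy spellings, are assumptions made for this example; the agent's actual parsing may differ.

```python
def parse_kernel_params(cmdline):
    """Split a kernel command line into a dict of key/value pairs."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # A bare flag (no "=") is stored with a value of None.
        params[key] = value if sep else None
    return params


def ignore_bootloader_failure(cmdline):
    """Return True when the option is present with a truthy value."""
    value = parse_kernel_params(cmdline).get("ipa-ignore-bootloader-failure")
    # The accepted spellings below are an assumption for this sketch.
    return value is not None and value.lower() in {"1", "true", "yes", "on"}
```

For example, ignore_bootloader_failure("nofb ipa-ignore-bootloader-failure=true") evaluates to True, while a command line without the option evaluates to False.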

Known Issues

  • If multiple bootloader CSV files are present on the EFI filesystem, the first CSV file discovered will be utilized. The Ironic team considers multiple files to be a defect in the image being deployed. This may be changed in the future.
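
The "first CSV file discovered" behaviour can be sketched as follows. This is a minimal illustration, not the agent's code: the function names are hypothetical, the sorted traversal order is a choice made here for determinism, and the file layout (UTF-16 text whose first line starts with the loader file name followed by a label) is assumed from the format commonly written by distribution bootloader packages.

```python
import os


def find_first_boot_csv(efi_root):
    """Return the path of the first *.CSV file under efi_root, or None."""
    for dirpath, dirnames, filenames in os.walk(efi_root):
        dirnames.sort()  # sorted traversal keeps the result deterministic
        for name in sorted(filenames):
            if name.upper().endswith(".CSV"):
                return os.path.join(dirpath, name)
    return None


def parse_boot_csv(path):
    """Return (loader_file, label) from the first line of a boot CSV file."""
    # Assumption: the file is UTF-16 encoded, so the encoding is explicit.
    with open(path, encoding="utf-16") as csv_file:
        fields = csv_file.readline().strip().split(",")
    label = fields[1] if len(fields) > 1 else ""
    return fields[0], label
```

Any additional CSV files in the tree are simply never reached, which matches the known issue described above.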

Bug Fixes

  • Setting the new ipa-ignore-bootloader-failure configuration option prevents errors caused by bootloader installation failures that arise when boot entries are automatically configured for multiple attached devices.

  • The /etc/fstab file, the file system configuration file on Linux machines, is now updated to include a reference to the EFI partition in the case of a partition image based deployment. Without this reference, machines deployed using partition images could end up in situations where upgrading the bootloader could fail.

  • Fixes an error with UEFI based deployments where deploying a partition image to an NVMe device was previously failing due to the different device name pattern.

  • Fixes failures with disk image conversions which result in memory allocation or input/output errors due to memory limitations by limiting the number of available memory allocation pools to a non-dynamic reasonable number which should not exceed the available system memory.

  • The lshw package version B.02.19.2-5 on CentOS 8.4 and 8.5 contains a bug that prevents the size of individual memory banks from being reported, with the result that the total memory size would be reported as 0 in some places. The total memory size is now taken from lshw’s total memory size output (which does not suffer from the same problem) when available.

  • Fixes the agent’s EFI boot handling such that EFI assets from a partition image are preserved and used instead of overridden. This should permit operators to use Secure Boot with partition images IF the assets are already present in the partition image.

  • Fixes nodes failing after deployment completes due to issues in the Grub2 EFI loader entry addition where a BOOT.CSV file provides the authoritative pointer to the bootloader to be used for booting the OS. The base issue with Grub2 is that it would update the UEFI bootloader NVRAM entries with whatever is present in a vendor specific BOOT.CSV or BOOTX64.CSV file. In some cases, a baremetal machine can crash when this occurs. More information can be found at story 2008962.

  • Fixes Python 3 support for loading bootloader configuration files, which requires an explicit open operation with a unicode indicator and was inadvertently broken while backporting for Python 2 support.

  • Increases the memory usage limit for the qemu-img convert command to 2 GiB. See Story 2008667 for details.
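
The lshw fix noted above amounts to preferring the total reported for the memory node and only summing the individual banks when no total is available. A minimal sketch, assuming a simplified dictionary shaped like an lshw JSON memory node (a "size" key on the node and a "children" list of banks); the function name is hypothetical:

```python
def total_memory_bytes(memory_node):
    """Return the memory size in bytes for one lshw-style memory node."""
    # Prefer the node-level total, which is unaffected by the bank-size bug.
    total = memory_node.get("size")
    if total:
        return total
    # Otherwise fall back to summing whatever bank sizes were reported.
    banks = memory_node.get("children", [])
    return sum(bank.get("size", 0) for bank in banks)
```

With banks that report no size, the node-level total is still returned instead of 0.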

5.0.4

Bug Fixes

  • Fixes an issue where the bootloader installation can fail on a software RAID volume when no root_device hint is set. See Story 2007905.

  • Fixes an issue with the IntelCnaHardwareManager which prevented hardware managers with lower priority from being executed and may therefore have blocked the initialization and collection of hardware these managers are supposed to take care of.

  • Fixes retry logic issues with the Agent Lookup which can result in the lookup failing prematurely before being completed, typically resulting in an abrupt end to the agent logging and potentially weird errors like TypeError being reported on the agent process standard error output. For more information see bug 2007968.

  • Fixes an issue with the ironic-python-agent where we would call to setup the bootloader, which is necessary with software raid, but also attempt to clean up iSCSI. This can cause issues when using the direct deploy_interface. Now the agent will only clean up iSCSI connections if iSCSI was explicitly started. For more information, please see story 2007937.

5.0.3

Bug Fixes

  • Fixes deployment failures when the image download is interrupted mid-stream while the contents are being downloaded. Previously retries were limited to only opening the initial connection.

  • Lengthens the interval between retries, which was previously 5 seconds, so that the agent can retry after a network interruption. The time between retries is now 10 seconds, and the number of retries is set to 9 to help ensure intermittent network outages do not cause failures that would otherwise be recoverable.

  • Speeds up going from inspection to cleaning with fast-track enabled by caching hardware information between the steps.

  • Fixes serializing exceptions originating from ironic-lib. Previously an attempt to do so would result in a TypeError, for example: Object of type ‘InstanceDeployFailure’ is not JSON serializable.

  • Fixes failure to detect a hung file download connection in the event that the kernel has not rapidly detected that the remote server has hung up the socket. This can happen when there are intermittent and transient connectivity issues such as those that can occur due to LACP failure response hold-down timers in switching fabrics.
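
The retry behaviour described in these notes (10 seconds between retries, 9 retries) can be sketched as a small wrapper. This is an illustration only: the function name is hypothetical, catching IOError is a simplification, and treating "9 retries" as nine additional attempts after the first is an assumption. The sleep function is injectable so the logic can be exercised without waiting.

```python
import time


def call_with_retries(operation, retries=9, interval=10, sleep=time.sleep):
    """Call operation(), retrying on IOError with a fixed pause in between."""
    last_error = None
    for attempt in range(retries + 1):  # first attempt plus the retries
        try:
            return operation()
        except IOError as exc:
            last_error = exc
            if attempt < retries:
                sleep(interval)  # wait before the next attempt
    raise last_error
```

With these defaults, an outage can last roughly 90 seconds before the final attempt fails.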

5.0.2

Bug Fixes

  • Fixes an issue with deployment ramdisks running in UEFI boot mode where dual-boot images may cause the logic to prematurely exit before UEFI parameters can be updated. Internal checks for a BIOS bootloader will always return False now when the machine is in UEFI mode.

  • Fixes error handling if efibootmgr is not present in the ramdisk. See story for more details.

  • Provides timeout and retries when establishing a connection to download an image in the standby extension. Reduces probability of an image download getting stuck in the event of network problems.

    The default timeout is 60 seconds and can be set via the ipa-image-download-connection-timeout kernel parameter. The default number of retries is 2 and can be set via the ipa-image-download-connection-retries parameter.

  • Fixes an issue where the agent was failing to rescan the device deployed upon before checking UEFI contents. This would occur with an iSCSI based deployment, as partition management operations are performed by the conductor, and not locally.

  • No longer tries to use GRUB2 for configuring boot for whole disk images with an EFI partition present but only marked as boot (not esp).
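
The connection timeout and retry parameters described above could be resolved against their defaults roughly as follows. The parameter names and default values come from these notes; the function name and the parsing approach are assumptions made for the sketch.

```python
DEFAULTS = {
    "ipa-image-download-connection-timeout": 60,  # seconds
    "ipa-image-download-connection-retries": 2,
}


def download_settings(cmdline):
    """Return download settings, overriding defaults from a kernel cmdline."""
    settings = dict(DEFAULTS)
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in settings:
            settings[key] = int(value)
    return settings
```

Parameters that are absent from the command line keep their default values.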

5.0.1

Bug Fixes

  • Fixes the workflow for wholedisk images when using UEFI boot mode: when possible, the agent uses efibootmgr instead of GRUB2 to update the NVRAM.

  • Fixes an issue with the tinyIPA CI testing image by providing a fallback root volume uuid detection method via the findfs utility, which is also already packaged in most distributions with lsblk.

    This fallback was necessary as the lsblk command in TinyCore Linux, upon which TinyIPA is built, does not return data as expected for volume UUID values.

  • Fixes an issue where metadata erasure cleaning would fail on devices that are read-only at the hardware level. Typically these are virtual devices being offered to the operating system for purposes like OS self-installation.

    In the case of full device erasure, this is explicitly raised as a hard failure requiring operator intervention.

  • Skips NUMA node discovery for a NIC that is not assigned to a numa_node. In some rare cases, such as a VM with a virtual NUMA node, NICs might not be in a NUMA node, which breaks numa-topology discovery.

  • Fixes the numa-topology inspection collector to be compatible with Pint < 0.5.2.

  • Fixes an issue where wholedisk images are requested for deployment and the bootloader is overridden. IPA now explicitly looks for the boot partition, and examines the contents if the disk appears to be MBR bootable. The override/skip of bootloader installation does not apply if UEFI or PREP boot partitions are present on the disk.
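
The NIC numa_node fix above can be pictured as a filter over the values read from sysfs, where the Linux kernel reports -1 for a device with no NUMA node. A minimal sketch, assuming the raw file contents have already been read into a dictionary keyed by NIC name; the function name is hypothetical:

```python
def nics_with_numa_node(raw_numa_nodes):
    """Map NIC name to NUMA node, skipping NICs without an assigned node.

    raw_numa_nodes maps a NIC name to the text read from its numa_node file.
    """
    assigned = {}
    for nic, raw in raw_numa_nodes.items():
        node = int(raw.strip())
        if node < 0:
            # -1 means the NIC is not attached to any NUMA node; skip it.
            continue
        assigned[nic] = node
    return assigned
```

A NIC reporting -1 is left out of the result rather than causing the whole collection to fail.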

Other Notes

  • Bumps up ipa-ip-lookup-attempts to 6, adding extra time for networking to be set up before giving up.

5.0.0

New Features

  • Adds support for creating software RAID on NVMe drives.

Upgrade Notes

  • Images based on CoreOS are no longer supported and built. They were deprecated in the Stein cycle and an alternative based on diskimage-builder is being developed.

Bug Fixes

  • Fixes detection of the physical memory amount on AArch64, which failed because of the different output of the lshw utility.

  • Fixes an issue where the disk holders of md devices could not be listed correctly if they were NVMe drives.

  • Fixes cleaning operations when floppy disk devices are present on the baremetal node. Floppy disk devices are now explicitly ignored.

  • No longer tries to use zRAM devices for anything.

  • Fixes size conversion when creating software RAID with size_gb provided. According to the RAID documentation, the size_gb unit is GiB, but parted defaults to MB.

  • Fixes creating software RAID when several logical drives have a size specified (i.e. not ‘MAX’). See story 2006352.

  • Fixes creating software RAID when a logical drive with size ‘MAX’ is not the last in the list of logical drives.

  • Zaps superblocks from all block devices when calling delete_configuration, in an attempt to erase any software RAID hint from devices, including drives that are no longer members of any RAID.

  • Tries to assemble software RAID automatically on start up to avoid problems with ramdisks that don’t do it automatically (like tinyipa).
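
Two of the software RAID fixes above lend themselves to a short worked example: converting size_gb (GiB) into MiB so the unit passed to parted is unambiguous, and handling a ‘MAX’ logical drive that is not last in the list. The sketch below is illustrative only; the function names are hypothetical, and moving ‘MAX’ entries to the end is one possible way to handle the ordering, not necessarily what the agent does.

```python
def size_gb_to_mib(size_gb):
    """Convert a size_gb value (GiB) to MiB: 1 GiB = 1024 MiB."""
    return size_gb * 1024


def order_logical_disks(logical_disks):
    """Return the disks with fixed sizes first and 'MAX' entries last."""
    # sorted() is stable, so the relative order within each group is kept.
    return sorted(logical_disks, key=lambda disk: disk["size_gb"] == "MAX")
```

For a 100 GiB logical drive, size_gb_to_mib(100) yields 102400 MiB.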

Other Notes

  • The default list_all_block_devices hardware manager method has been changed to ignore floppy disk devices, introducing an argument ignore_floppy with a default value of True. A value of False may be passed to the list_all_block_devices method to include such devices.

4.0.0

New Features

  • Adds a new CLI command ironic-collect-introspection-data to enable manually publishing into the baremetal-introspection service. Executing this command on a system unknown to the Bare Metal service will likely result in the machine becoming registered to Ironic, and as such this command should be used with caution.

    If the capability to update introspection data for running machines has been enabled in the Bare Metal introspection service, then an operator may use this command in the active or rescue states to update introspection data.

Bug Fixes

  • The lshw output no longer pollutes the debug logging; instead, it’s now stored as part of the ramdisk logs.

  • Fixes the missing ipv6 module for TinyCore based IPA images which are used in CI testing.

3.7.0

New Features

  • Adds the hostname to the introspection data. This will likely be the hostname as set by the DHCP server.

  • IPv6 BMC address is now discovered during inspection and sent as a new bmc_v6address inventory field.

  • Supports fetching baremetal and baremetal introspection endpoints from mDNS instead of providing them via kernel parameters or a configuration file. See story 2005393 for more details.

  • Adds support for software RAID via the generic hardware manager. By means of the target_raid_config, a single RAID-1 or one RAID-1 plus one RAID-N can be configured (where N can be 0, 1, or 1+0). The RAID is created/deleted during manual cleaning. Note that this initial implementation will use all available devices for the setup of the software RAID device(s).
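
The software RAID constraint described above (a single RAID-1, or one RAID-1 plus one RAID-N) can be expressed as a small check on a target_raid_config. This is a sketch under assumptions: the function name is hypothetical, the logical_disks layout follows the form used in the Bare Metal RAID documentation, and requiring the RAID-1 to come first is an assumption made here.

```python
ALLOWED_SECOND_LEVELS = {"0", "1", "1+0"}


def validate_software_raid(target_raid_config):
    """Raise ValueError if the config is outside the supported layouts."""
    disks = target_raid_config.get("logical_disks", [])
    if not 1 <= len(disks) <= 2:
        raise ValueError("expected one or two logical disks")
    if disks[0]["raid_level"] != "1":
        raise ValueError("the first logical disk must be RAID-1")
    if len(disks) == 2 and disks[1]["raid_level"] not in ALLOWED_SECOND_LEVELS:
        raise ValueError("the second logical disk must be RAID 0, 1 or 1+0")
    return True


example = {
    "logical_disks": [
        {"size_gb": 100, "raid_level": "1", "controller": "software"},
        {"size_gb": "MAX", "raid_level": "0", "controller": "software"},
    ]
}
```

Calling validate_software_raid(example) returns True, while a configuration whose first logical disk is RAID-0 raises ValueError.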

Upgrade Notes

  • When no baremetal API URL is provided (e.g. via the ipa-api-url kernel parameter), ironic-python-agent now tries to get the URL using mDNS service discovery.

Bug Fixes

  • Supports channel numbers 1 to 11 when looking for a BMC address. This is consistent with the IPMI specification v2.0. Previously, only channels 1 to 7 were considered.

  • Mounts /run into chroot when installing bootloader to prevent timeouts.

  • Fixes an issue with retrieving all available physical memory. For more details see story 2005308.

  • Fixes an issue where md5 checksum is still required in the image information when os_hash_algo and os_hash_value are present. The checksum field is now optional, while os_hash_algo and os_hash_value fields must be set if the checksum field is not provided.
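
The checksum rule above can be sketched as a validation step followed by a verification step. The field names (checksum, os_hash_algo, os_hash_value) come from these notes; the function names and the in-memory verification are assumptions made for illustration, not the agent's implementation.

```python
import hashlib


def validate_image_info(image_info):
    """Require either an md5 checksum or both os_hash_* fields."""
    has_os_hash = bool(image_info.get("os_hash_algo")) and bool(
        image_info.get("os_hash_value"))
    if not image_info.get("checksum") and not has_os_hash:
        raise ValueError(
            "image_info needs checksum or os_hash_algo and os_hash_value")
    return has_os_hash


def verify_image(data, image_info):
    """Return True if data matches the hash described by image_info."""
    if validate_image_info(image_info):
        # Prefer the stronger os_hash_* pair when it is present.
        digest = hashlib.new(image_info["os_hash_algo"], data).hexdigest()
        return digest == image_info["os_hash_value"]
    return hashlib.md5(data).hexdigest() == image_info["checksum"]
```

An image_info that carries only os_hash_algo and os_hash_value passes validation, while one with neither a checksum nor the hash pair is rejected.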