Victoria Series (6.2.0 - 6.4.x) Release Notes

6.4.3-22

New Features

  • Adds a configuration option, which can be encoded into the ramdisk itself or passed in the PXE parameters, to instruct the agent to ignore bootloader installation or configuration failures. This functionality is useful to work around well-intentioned hardware which auto-populates all possible devices into the UEFI NVRAM in order to help ensure the machine boots. However, this can also mean that any explicit configuration attempt will fail. Operators needing this bypass can use the ipa-ignore-bootloader-failure configuration option on the PXE command line or utilize the ignore_bootloader_failure option for the ramdisk configuration. In a future version of ironic, this setting may be overridable by ironic node level configuration.

  • Adds the capability into the agent to read and act upon bootloader CSV files which serve as authoritative indicators of what bootloader to load instead of leaning towards utilizing the default.
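
As an illustration of the PXE command line option described above, the following minimal sketch shows how a boolean kernel parameter could be read. The helper names and the set of accepted truthy values are assumptions made for this example and are not taken from the agent's code.

```python
def parse_kernel_params(cmdline):
    """Split a kernel command line into a dict of key/value parameters."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Flags given without "=value" are recorded as an empty string.
        params[key] = value if sep else ""
    return params


def ignore_bootloader_failure(cmdline):
    """Return True when the parameter is present with a truthy value.

    The accepted spellings below are an assumption for illustration.
    """
    params = parse_kernel_params(cmdline)
    value = params.get("ipa-ignore-bootloader-failure", "")
    return value.strip().lower() in ("1", "true", "yes", "on")
```

For example, a command line containing ipa-ignore-bootloader-failure=1 would enable the bypass in this sketch, while omitting the parameter leaves it disabled.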

Known Issues

  • If multiple bootloader CSV files are present on the EFI filesystem, the first CSV file discovered will be utilized. The Ironic team considers multiple files to be a defect in the image being deployed. This may be changed in the future.
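
The "first CSV file wins" behaviour can be sketched as follows. This is only an illustration: it assumes the CSV content is UTF-16 text laid out as loader file name, label, options, description (the layout commonly written for the shim fallback loader), and both function names are hypothetical.

```python
def pick_first_csv(paths):
    """Return the first path that looks like a bootloader CSV file, else None."""
    for path in paths:
        if path.lower().endswith(".csv"):
            return path
    return None


def parse_bootloader_csv(raw):
    """Decode CSV bytes and return (loader_file, label).

    Assumes UTF-16 text laid out as: file,label,options,description
    """
    text = raw.decode("utf-16").strip("\x00\r\n ")
    # Split at most three times so commas in the description are preserved.
    fields = text.split(",", 3)
    if len(fields) < 2:
        raise ValueError("unexpected bootloader CSV layout")
    return fields[0], fields[1]
```

Any additional CSV files after the first match are simply never consulted in this sketch, which mirrors the limitation described in the note.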

Bug Fixes

  • Setting the new ipa-ignore-bootloader-failure config option prevents errors caused by bootloader installation failures that result from bootloader entries being automatically configured for multiple attached devices.

  • The /etc/fstab file, the file system configuration file for Linux machines, is now updated to include a reference to the EFI partition in the case of a partition image based deployment. Without this reference, deployments using partition images could end up in situations where upgrading the bootloader could fail.

  • IPA now properly checks if the root partition is already mounted. See Story 2008631 for details.

  • Fixes an error with UEFI based deployments where deploying a partition image to an NVMe device previously failed due to the different device name pattern.

  • Fixes an issue where the NTP time sync at IPA startup via chronyd was not immediate (which can break time sensitive components such as the generation of a TLS certificate).

  • Fixes failures with disk image conversions which result in memory allocation or input/output errors due to memory limitations by limiting the number of available memory allocation pools to a non-dynamic reasonable number which should not exceed the available system memory.

  • The lshw package version B.02.19.2-5 on CentOS 8.4 and 8.5 contains a bug that prevents the size of individual memory banks from being reported, with the result that the total memory size would be reported as 0 in some places. The total memory size is now taken from lshw’s total memory size output (which does not suffer from the same problem) when available.

  • No longer crashes if MAC address cannot be determined for one of the network interfaces.

  • Fixes an issue where metadata erasure cleaning fails for partitions because the read-only file isn’t found, while it is available at the base device. Adds a check for the base device file on failure. See story 2008696.

  • Fixes the agent’s EFI boot handling such that EFI assets from a partition image are preserved and used instead of overridden. This should permit operators to use Secure Boot with partition images IF the assets are already present in the partition image.

  • Mirrors the previously disconnected EFI system partitions (ESPs) in UEFI software RAID setups. Disconnected ESPs can lead to nodes booting with outdated kernel parameters or the UEFI firmware not finding bootable kernels at all.

  • Fixes incorrect root partition UUID after streaming a raw partition image.

  • Fixes nodes failing after deployment completes due to issues in the Grub2 EFI loader entry addition where a BOOT.CSV file provides the authoritative pointer to the bootloader to be used for booting the OS. The base issue with Grub2 is that it would update the UEFI bootloader NVRAM entries with whatever is present in a vendor specific BOOT.CSV or BOOTX64.CSV file. In some cases, a baremetal machine can crash when this occurs. More information can be found at story 2008962.

  • Adds a call to “udevadm settle” in write_image.sh. After the GPT and MBR are destroyed, systemd-udevd gets triggered, which may hold /dev/sda open and prevent qemu-img from writing its image.

  • Provides a more specific error message if a UEFI-incompatible image is used in the UEFI mode.

  • Increases the memory usage limit for the qemu-img convert command to 2 GiB. See Story 2008667 for details.
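
The NVMe fix in the list above concerns how partition device names are derived. On Linux, a disk whose name ends in a digit gets a "p" before the partition number. The following sketch illustrates that naming rule; the function name is hypothetical and this is not the agent's implementation.

```python
def partition_device(disk, number):
    """Return the device path of partition `number` on `disk`."""
    # Linux inserts a "p" before the partition number when the disk name
    # itself ends in a digit (nvme0n1 -> nvme0n1p1), but not otherwise
    # (sda -> sda1).
    separator = "p" if disk[-1].isdigit() else ""
    return f"{disk}{separator}{number}"
```

Code that assumes the sda1 style for every disk would look for /dev/nvme0n11 instead of /dev/nvme0n1p1, which is the kind of mismatch the fix addresses.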

6.4.3

New Features

  • Adds the ability to bring up VLAN interfaces and include them in the introspection report. This is needed in environments that require an IP address to be configured on tagged VLANs. A new kernel parameter, ipa-enable-vlan-interfaces, is added; it defines either the VLAN interface to enable, the interface to use, or ‘all’, which indicates all interfaces. If the particular VLAN is not provided, IPA will use the LLDP information for the interface to determine which VLANs should be enabled. See story 2008298.
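
A minimal sketch of how a value for ipa-enable-vlan-interfaces might be interpreted is shown below. The comma-separated interface.vlan syntax (for example eth0.100,eth1) and the function name are assumptions for illustration. Consult the agent documentation for the exact format.

```python
def parse_vlan_interfaces(value):
    """Parse the parameter value into 'all' or a list of (interface, vlan).

    A vlan of None means "no VLAN given"; per the note above, the agent
    would then rely on LLDP information for that interface.
    """
    if value.strip().lower() == "all":
        return "all"
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        interface, sep, vlan = item.partition(".")
        entries.append((interface, int(vlan) if sep else None))
    return entries
```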

Bug Fixes

  • Automatically generated TLS certificates now have their validity starting in the past (1 hour by default) to allow for clock skew.

  • Fixes the agent process for determining what partition label type to utilize when writing partition images. In many cases, this could fall back to msdos if the instance flavor was not properly labeled.

  • Correctly decodes error messages from ironic API.
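
The backdated certificate validity described above can be illustrated with a small calculation. The one hour backdate comes from the note; the 90 day lifetime and the function name are placeholders chosen for this sketch.

```python
from datetime import datetime, timedelta, timezone


def validity_window(now, backdate=timedelta(hours=1),
                    lifetime=timedelta(days=90)):
    """Return (not_before, not_after) for a freshly generated certificate.

    Starting the validity in the past tolerates a verifier whose clock
    runs slightly behind the agent's clock.
    """
    return now - backdate, now + lifetime
```

With this window, a peer whose clock is up to one hour behind the agent still sees the certificate as already valid.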

6.4.2

Bug Fixes

  • The mdadm utility is no longer a hard requirement. It’s still required if software RAID is used (even when not managed by ironic).

6.4.1

Bug Fixes

  • Fixes the write_image deploy step to actually check and return any errors during its execution.

  • Avoids a traceback when using install_bootloader with whole disk images. If the root UUID cannot be detected, don’t try to call grub.

6.4.0

New Features

  • Enables support in IPA for hosting the API server over TLS. Using this support requires setting [DEFAULT]listen_tls to True, and then setting [ssl]cert_file, [ssl]key_file, and optionally [ssl]ca_file to files embedded in the ramdisk IPA runs inside.

  • When a recent enough version of ironic is detected and listen_tls is False, the agent will now generate a self-signed TLS certificate and send it to ironic on heartbeat. This ensures encrypted communication from ironic to the agent. Set enable_auto_tls to False to disable this behavior.

  • The logs inspection collector is now enabled by default; change ipa-inspection-collectors to disable it.
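
To illustrate the TLS options named above, the following sketch parses a sample configuration and reports which required [ssl] options are missing when listen_tls is enabled. The option names come from the note; the file paths and the helper are hypothetical, and the agent itself does not necessarily validate its configuration this way.

```python
import configparser

# Sample configuration; the certificate and key paths are placeholders.
SAMPLE = """
[DEFAULT]
listen_tls = True

[ssl]
cert_file = /etc/ironic-python-agent/tls/agent.crt
key_file = /etc/ironic-python-agent/tls/agent.key
"""


def missing_tls_options(text):
    """Return the required [ssl] options that are unset while TLS is on."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.getboolean("DEFAULT", "listen_tls", fallback=False):
        return []
    required = ("cert_file", "key_file")  # ca_file is optional
    return [opt for opt in required
            if not parser.get("ssl", opt, fallback="")]
```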

Upgrade Notes

  • IPA heartbeat intervals now rely on accurate clock time. Any clean or deploy steps which attempt to sync the clock may cause heartbeats to not be emitted. IPA syncs time at startup and shutdown, so these steps should not be required.

Bug Fixes

  • Fixes an issue with nodes undergoing fast-track from introspection to deployment where the agent internal cache of the node may be stale. In particular, this can be observed if the node does not honor a root device hint which is saved to Ironic’s API after the agent was started. More information can be found in story 2008039.

  • Fixes a minor issue with an incorrect keyword argument that matched between the method caller and the unit test but did not match the actual method. This was a non-fatal issue, and the fix should now permit the agent to attempt to look up the node one last time before deploying the instance image in order to pick up a root device hint.

  • Fixes an issue with the IntelCnaHardwareManager which prevented hardware managers with lower priority from being executed and therefore may have blocked the initialization and collection of hardware these managers are supposed to take care of.

  • Fixes a bug where the partitions created during software RAID setup are cleaned too early and therefore may prevent the proper cleaning of the md superblocks. Leaving superblocks behind will impact the creation of new md devices later on.

  • Detects md component devices by their UUID, rather than by scanning the output of mdadm. This prevents devices from missing md superblock cleanup when they are not currently part of an array.

Other Notes

  • Adds an explicit capture of connectivity failures in the heartbeat process to provide a more verbose error message in line with what is occurring, as opposed to just indicating that an error occurred. This new exception is called HeartbeatConnectionError and is likely only going to be visible if there is a local connectivity failure such as a router failure, a switchport in a blocking state, or a connection centered transient failure.

6.3.0

New Features

  • The new kernel parameter ipa-advertise-protocol can be used to change the protocol of the callback URL to https.

  • The deploy.erase_devices_metadata clean step can now also be used as a deploy step.

  • Introspection of PCI devices now collects PCI class, revision and PCI bus.

  • Adds a Poll extension which provides the ability to retrieve hardware information as well as set node data from API. This feature is required for poll mode deployment driven by ironic.
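
As a rough illustration of what the ipa-advertise-protocol parameter changes, the sketch below builds a callback URL from a protocol, host, and port. The function name and the default port of 9999 are assumptions for this example.

```python
def callback_url(host, port=9999, protocol="http"):
    """Build the URL that would be advertised for calling back the agent."""
    if protocol not in ("http", "https"):
        raise ValueError(f"unsupported protocol: {protocol}")
    # IPv6 literals must be wrapped in brackets inside a URL.
    if ":" in host and not host.startswith("["):
        host = f"[{host}]"
    return f"{protocol}://{host}:{port}"
```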

Bug Fixes

  • Fixes the return value of the apply_configuration deploy step: the agent RAID interface expects the final RAID configuration to be returned.

  • Fixes an issue where the bootloader installation can fail on a software RAID volume when no root_device hint is set. See Story 2007905.

  • Fixes retry logic issues with the Agent Lookup which can result in the lookup failing prematurely before being completed, typically resulting in an abrupt end to the agent logging and potentially weird errors like TypeError being reported on the agent process standard error output. For more information see bug 2007968.

  • Fixes an issue where the ironic-python-agent would set up the bootloader, which is necessary with software RAID, but also attempt to clean up iSCSI. This can cause issues when using the direct deploy_interface. Now the agent will only clean up iSCSI connections if iSCSI was explicitly started. For more information, please see story 2007937.

6.2.0

Bug Fixes

  • Fixes deployment failures when the image download is interrupted mid-stream while the contents are being downloaded. Previously retries were limited to only opening the initial connection.

  • Lengthens the short retry interval, which was previously 5 seconds, so that the agent can retry after a network interruption. The time between retries is now 10 seconds, and the number of retries is set to 9 to help ensure intermittent network outages do not cause otherwise recoverable failures.

  • Fixes an issue with high CPU usage caused by the ironic-python-agent greenthread eventlet implementation.

    Using eventlet.sleep(0.1) instead of eventlet.sleep(0) gives other processes of IPA more CPU time to run.

  • Speeds up going from inspection to cleaning with fast-track enabled by caching hardware information between the steps.

  • Fixes serializing exceptions originating from ironic-lib. Previously an attempt to do so would result in a TypeError, for example: Object of type ‘InstanceDeployFailure’ is not JSON serializable.

  • Fixes failure to detect a hung file download connection in the event that the kernel has not rapidly detected that the remote server has hung up the socket. This can happen when there are intermittent and transient connectivity issues such as those that can occur due to LACP failure response hold-down timers in switching fabrics.
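
The retry behaviour described in the list above (10 seconds between retries, 9 retries) can be sketched as a simple loop. The function name, the choice of OSError as the retryable error, and the interpretation of "9 retries" as one initial attempt plus up to nine further attempts are assumptions for this example.

```python
import time


def fetch_with_retries(fetch, retries=9, interval=10, sleep=time.sleep):
    """Call fetch() until it succeeds or the retries are exhausted.

    `sleep` is injectable so the waiting can be observed or skipped.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError as exc:  # includes ConnectionError and timeouts
            last_error = exc
            if attempt < retries:
                sleep(interval)
    raise last_error
```

With these defaults, a transient outage of up to roughly 90 seconds can be ridden out before the operation is reported as failed.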