Wallaby Series (6.5.0 - 7.0.x) Release Notes
Fixes UEFI NVRAM record handling with efibootmgr so we can accept and handle UTF-16 encoded data which is to be expected in UEFI NVRAM as the records are UTF-16 encoded.
Fixes handling of UEFI NVRAM records to allow for unexpected characters in the response, so it is non-fatal to Ironic.
Fixes a minor issue with the regular expression used for UEFI duplicate entry cleanup which was introduced in a prior change to refactor the cleanup operation to avoid UEFI firmware which treats deletion of entries after addition as an invalid operation.
Fixes cases where duplicates may not be found in the UEFI firmware NVRAM boot entry table by explicitly looking for, and deleting, entries with matching labels before creating the EFI boot loader entry.
If the CSV file used for the bootloader hint does not have a BOM, reading its content fails because the utf-16 codec is too generic. The agent now falls back to utf-16-le, as little-endian encoding is the most common.
Fixes configuring UEFI boot when the EFI partition is located on a devicemapper device.
Fixes GenericHardwareManager to find network information for bonded interfaces if they exist.
Fixes a race on software RAID creation: since the creation of partitions is asynchronous, we need to wait for all udev events to be processed before we can use the partitions to create an md device.
Fixes an issue where partitions are not visible due to an incorrect call to have the partition table re-read.
Fixes an issue where partitions are not visible due to an incorrect call to have the partition table re-read during raid configuration creation.
Fixes handling of Software RAID device discovery so RAID device Events field values do not inadvertently cause the command to return unexpected output. Previously this could cause a deployment to fail when handling UEFI partitions.
Fixes an issue when the EFI partition UUID is not set and an attempt to edit /etc/fstab is made.
Fixes handling of a Partition UUID being returned instead of a Partition’s UUID when the OS may not return the Partition’s UUID in time. These two fields are typically referred to as PARTUUID and UUID, respectively. Often these sorts of issues arise under heavy IO load. We now scan, and identify which “UUID” we identified, and update a Linux fstab entry appropriately. For more information, please see story #2009881.
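As a rough illustration of the distinction above, the sketch below decides which tag an identifier corresponds to and builds a matching fstab line. The function names, the device dictionaries (keyed like lsblk's UUID and PARTUUID columns), and the mount options are assumptions made for this example, not the agent's actual implementation.

```python
def identify_tag(value, devices):
    """Return "UUID" or "PARTUUID" depending on which field matches.

    `devices` is assumed to be a list of dicts such as those parsed
    from lsblk output, with "UUID" and "PARTUUID" keys.
    """
    for dev in devices:
        if value == dev.get("UUID"):
            return "UUID"
        if value == dev.get("PARTUUID"):
            return "PARTUUID"
    return None


def fstab_efi_line(value, devices):
    """Build an fstab entry for the EFI partition (illustrative options)."""
    tag = identify_tag(value, devices)
    if tag is None:
        raise ValueError(f"identifier {value!r} not found on any device")
    # fstab accepts both UUID= and PARTUUID= as device specifiers.
    return f"{tag}={value}\t/boot/efi\tvfat\tumask=0077\t0\t1\n"
```

Called with a file system UUID, the sketch emits a UUID= entry; called with a partition UUID, it emits a PARTUUID= entry instead.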
Recent releases of Red Hat grub2 will always fail when installing to EFI paths, to encourage a transition to the signed shim bootloader. Partition image deploys avoid calling grub2-install with the preserve-efi-assets functions. Deploying whole disk images doesn’t require grub2-install. This leaves whole disk images installed onto softraid devices, which still call grub2-install. Running grub2-install is still attempted in this one remaining case, but any failures are now ignored.
Fixes failures with handling of Multipath IO devices where Active/Passive storage arrays are in use. Previously, “standby” paths could result in IO errors causing cleaning to terminate. The agent now explicitly attempts to handle and account for multipaths based upon the MPIO data available. This requires the multipathd utility to be present in the ramdisk. These are supplied by the multipath-tools packages, and are not required for the agent’s use.
Fixes non-ideal behavior when performing cleaning where Active/Active MPIO devices would ultimately be cleaned once per IO path, instead of once per backend device.
Fixes discovering WWN/serial numbers for devicemapper devices.
The agent will now attempt to collect any multipath path information and upload it to the agent ramdisk, if the tooling is present.
Heartbeats to the conductor are grouped when they are scheduled or requested within a time interval of five seconds to avoid sending them in quick succession.
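One way to picture this grouping is a small throttle that suppresses a heartbeat if another one was sent less than five seconds earlier. The class below is a hypothetical sketch under that assumption; the name HeartbeatThrottle and the injectable clock are illustrative and not taken from the agent.

```python
import time


class HeartbeatThrottle:
    """Group heartbeat requests that arrive within `min_interval` seconds."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock  # injectable so the logic can be tested
        self._last_sent = None

    def should_send(self):
        """Return True if a heartbeat should actually be sent now."""
        now = self._clock()
        if (self._last_sent is not None
                and now - self._last_sent < self._min_interval):
            # A heartbeat went out recently; fold this request into it.
            return False
        self._last_sent = now
        return True
```

A caller would check should_send() before each scheduled or requested heartbeat and skip the request when it returns False.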
Adds the capability into the agent to read and act upon bootloader CSV files which serve as authoritative indicators of what bootloader to load instead of leaning towards utilizing the default.
If multiple bootloader CSV files are present on the EFI filesystem, the first CSV file discovered will be utilized. The Ironic team considers multiple files to be a defect in the image being deployed. This may be changed in the future.
Fixes an issue with bootloader installation on a software RAID by checking if the ESP is already mounted.
Fixes an issue where a quick succession of heartbeats exposes a race condition in the conductor’s RPC handling.
Fixes fall-back to sysrq when powering off or rebooting the node from inside a container.
Fixes an error with UEFI based deployments where deploying a partition image to an NVMe device was previously failing due to the different device name pattern.
Fixes an issue where the NTP time sync at the IPA startup via chronyd is not immediate (which can break time sensitive components such as the generation of a TLS certificate).
Fixes failures with disk image conversions which result in memory allocation or input/output errors due to memory limitations by limiting the number of available memory allocation pools to a non-dynamic reasonable number which should not exceed the available system memory.
The lshw package version B.02.19.2-5 on CentOS 8.4 and 8.5 contains a bug that prevents the size of individual memory banks from being reported, with the result that the total memory size would be reported as 0 in some places. The total memory size is now taken from lshw’s total memory size output (which does not suffer from the same problem) when available.
Mirrors the previously disconnected EFI system partitions (ESPs) in UEFI software RAID setups. Disconnected ESPs can lead to nodes booting with outdated kernel parameters or the UEFI firmware not finding bootable kernels at all.
Fixes nodes failing after deployment completes due to issues in the Grub2 EFI loader entry addition where a BOOT.CSV file provides the authoritative pointer to the bootloader to be used for booting the OS. The base issue with Grub2 is that it would update the UEFI bootloader NVRAM entries with whatever is present in a vendor specific BOOTX64.CSV file. In some cases, a baremetal machine can crash when this occurs. More information can be found at story 2008962.
Fixes initial logging before configuration is loaded to re-log anything recorded for the purposes of troubleshooting. This is necessary as systemd does not report stdout from a process launch as part of the process’s logging. Now messages will be re-logged once the configuration has been loaded.
No longer crashes if MAC address cannot be determined for one of the network interfaces.
Adds a call to “udevadm settle” in write_image.sh. After GPT and MBR are destroyed, systemd-udevd gets triggered, which may hold /dev/sda open and prevent qemu-img from writing its image.
Adds support for NVMe-specific storage cleaning to IPA. Currently this is implemented by using nvme-cli format functionality. Crypto Erase is used if supported by the device, otherwise the code falls back to User Data Erase. Operators can control NVMe cleaning by using the deploy.enable_nvme_erase config option, which controls the agent_enable_nvme_erase internal setting in driver_internal_info.
Adds a new deploy step deploy.inject_files to inject arbitrary files into the instance. See the hardware managers documentation for details.
Logic around virtual media device validation is now much more strict, and may not work in all cases. Should you discover such a case, please provide the output of lsblk -P -O, taken with a virtual media device attached, to the Ironic development community via Storyboard.
Internal logic to copy configuration data from virtual media now requires the boot_method=vmedia flag to be set on the kernel command line of the bootloader for the virtual media. Operators crafting custom boot ISOs should ensure that the appropriate command line is being added in any custom build processes.
It is no longer possible to enable the so-called standalone mode, in which the agent does not communicate with ironic. This mode is only useful for local testing; enabling it in production is always wrong. The ironic team does not support using ironic-python-agent as a standalone application outside of the normal workflow.
Addresses a potential vector in which a system-authenticated malicious actor could leverage data left on disk in some limited cases to make the API of the ironic-python-agent attackable, or possibly break cleaning processes to prevent the machine from being returned to the available pool. Please see story 2008749 for more information.
Adds validation of Virtual Media devices in order to prevent existing partitions on the system from being considered as potential sources of IPA configuration data.
Adds check into the configuration load from virtual media, to ensure it only occurs when the machine booted from virtual media.
IPA will now successfully clean configuration when it encounters a software RAID array that was previously created using entire devices instead of partitions.
IPA now properly checks if the root partition is already mounted. See Story 2008631 for details.
Fixes an issue where metadata erasure cleaning fails for partitions because the read-only file isn’t found, while it is available at the base device. Adds a check for the base device file on failure. See story 2008696.
Fixes incorrect root partition UUID after streaming a raw partition image.
Increase memory usage limit for qemu-img convert command to 2 GiB. See Story 2008667 for details.
The kernel parameter lldp-timeout (deprecated during the Newton development cycle) has been removed, please use
Fix UEFI boot entry creation for aarch64 when using diskimage-builder created whole disk images.
Provides a more specific error message if a UEFI-incompatible image is used in the UEFI mode.
Adds UUID of the disks to the inventory of block devices that is collected during inspection.
Adds the ability to bring up VLAN interfaces and include them in the introspection report. A new kernel parameter, ipa-enable-vlan-interfaces, defines either the VLAN interface to enable, the interface to use, or ‘all’, which indicates all interfaces. If the particular VLAN is not provided, IPA will use the LLDP information for the interface to determine which VLANs should be enabled. See story 2008298.
Adds a clean step to erase the Linux kernel’s pstore. The step is disabled by default.
Adds a configuration option which can be encoded into the ramdisk itself or the PXE parameters being provided to instruct the agent to ignore bootloader installation or configuration failures. This functionality is useful to work around well-intentioned hardware which auto-populates all possible devices into the UEFI NVRAM firmware in order to try and help ensure the machine boots. However, this can also mean that any explicit configuration attempt will fail. Operators needing this bypass can use the ipa-ignore-bootloader-failure configuration option on the PXE command line or utilize the ignore_bootloader_failure option for the Ramdisk configuration. In a future version of ironic, this setting may be able to be overridden by ironic node level configuration.
Deployers in highly-secure environments can now manually set the Ironic API version via ipa-ironic-api-version on the kernel command line instead of relying on unauthenticated autodetection. This is not a recommended configuration.
For Software RAID, the IPA will use partition LABEL along with UUID and PARTUUID passed from the conductor to identify the root partition. The root file system LABEL can be set as value of the rootfs_uuid image metadata property.
If enabled, the new clean step ‘erase_pstore’ removes all pstore entries (the oops/panic logs from a failing kernel) upon cleaning. This is to reduce the risk that potentially sensitive data is preserved across instantiations (and therefore different users) of a bare metal node.
Fixes an issue where intermittent or transitory connection issues can cause inspection to fail. The ramdisk now retries to report to inspector a total of five times.
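The retry behaviour can be pictured with a small loop like the one below. It is only a sketch: the function name, the use of the built-in ConnectionError, and the delay parameter are assumptions for illustration, and the ramdisk's real retry logic may differ.

```python
import time


def report_with_retries(send, attempts=5, delay=0.0, sleep=time.sleep):
    """Call `send` up to `attempts` times, retrying on connection errors.

    `send` is a hypothetical callable that posts the collected data
    and raises ConnectionError on a transient failure.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                # Wait before the next attempt; no wait after the last one.
                sleep(delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With the default of five attempts, a call that fails four times and then succeeds still returns normally, while a persistently failing call raises the last error.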
The system file system configuration file for Linux machines, the /etc/fstab file, is now updated to include a reference to the EFI partition in the case of a partition image based deployment. Without this reference, images deployed using partition images could end up in situations where upgrading the bootloader could fail.
Automatically generated TLS certificates now have their validity starting in the past (1 hour by default) to allow for clock skew.
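The effect of backdating can be shown with a short calculation. In this sketch only the one-hour skew allowance comes from the note above; the function name and the 90-day lifetime are placeholders chosen for the example.

```python
import datetime


def validity_window(now,
                    allow_skew=datetime.timedelta(hours=1),
                    valid_for=datetime.timedelta(days=90)):
    """Return (not_before, not_after) for a certificate issued at `now`.

    `not_before` is moved into the past so that a peer whose clock is
    slightly behind still considers the certificate valid.
    The `valid_for` default is an assumed value for illustration.
    """
    not_before = now - allow_skew
    not_after = now + valid_for
    return not_before, not_after
```

A peer whose clock runs up to an hour behind the agent's would therefore still see the current time as falling inside the validity window.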
Fixes the agent process for determining what partition label type to utilize when writing partition images. In many cases, this could fallback to msdos if the instance flavor was not properly labeled.
Fixes an issue where the running system operating mode was not taken into account when writing partition images. The agent now utilizes a helper instead of explicitly expecting the flavor derived information to supply all deployment context.
Fixes an issue where deployments of Fedora or CentOS can hang when using grub2, with the execution of the grub2-mkconfig command not returning before the deployment process times out. This is because os-prober can take an extended period of time to evaluate additional unrelated devices for dual-boot scenarios. Since operators are not dual booting their machines enrolled in ironic, this scan is unnecessary and has been disabled.
Correctly decodes error messages from ironic API.
The mdadm utility is no longer a hard requirement. It’s still required if software RAID is used (even when not managed by ironic).
Fixes the write_image deploy step to actually check and return any errors during its execution.
Fixes the agent’s EFI boot handling such that EFI assets from a partition image are preserved and used instead of overridden. This should permit operators to use Secure Boot with partition images IF the assets are already present in the partition image.
Upon the creation of Software RAID devices, component devices are sometimes kicked out immediately (for no apparent reason). This fix re-adds devices in such cases in order to prevent the component from being missing the next time the device is assembled, which, for instance, may prevent the UEFI ESPs from being installed properly.
Avoids a traceback when using install_bootloader with whole disk images. If the root UUID cannot be detected, don’t try to call grub.
Agent configuration files found on attached virtual media or config drive devices are now copied to the ramdisk and loaded on start up.