Xena Series (18.0.0 - 18.2.x) Release Notes¶
18.3.0-13¶
Upgrade Notes¶
When upgrading Ironic to address the qemu-img image conversion security issues, the ironic-python-agent ramdisks will also need to be upgraded.
As a result of security fixes to address qemu-img image conversion security issues, a new configuration parameter has been added to Ironic, [conductor]permitted_image_formats, with a default value of “raw,qcow2,iso”. Raw and qcow2 are the disk image formats the Ironic community has consistently stated as supported and expected for use with Ironic. These are also the formats which the Ironic community tests. Operators who rely on other disk image formats may need to modify this setting.
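The effect of this setting can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: it reads an ironic.conf-style fragment with the standard library's configparser rather than the configuration machinery Ironic itself uses, the helper name permitted_formats is hypothetical, and the extra "vmdk" entry is included purely to show how an operator might extend the list.

```python
import configparser

# Hypothetical ironic.conf fragment; "vmdk" is added purely for illustration.
IRONIC_CONF = """
[conductor]
permitted_image_formats = raw,qcow2,iso,vmdk
"""

DEFAULT_FORMATS = "raw,qcow2,iso"  # default value stated in the note above


def permitted_formats(conf_text):
    """Return the set of permitted formats from an ironic.conf-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Fall back to the documented default when the option is not set.
    raw_value = parser.get("conductor", "permitted_image_formats",
                           fallback=DEFAULT_FORMATS)
    return {item.strip() for item in raw_value.split(",") if item.strip()}


formats = permitted_formats(IRONIC_CONF)
print(sorted(formats))
```

With an empty configuration the helper falls back to the documented default of raw, qcow2 and iso.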
Security Issues¶
Ironic now checks the supplied image format value against the detected format of the image file, and will prevent deployments should the values mismatch. If being used with Glance and a mismatch in metadata is identified, it will require images to be re-uploaded with a new image ID to represent corrected metadata. This is the result of CVE-2024-44082 tracked as bug 2071740.
Ironic always inspects the supplied user image content for safety prior to deployment of a node should the image pass through the conductor, even if the image is supplied in raw format. This is utilized to identify the format of the image and the overall safety of the image, such that source images with unknown or unsafe feature usage are explicitly rejected. This can be disabled by setting [conductor]disable_deep_image_inspection to True. This is the result of CVE-2024-44082 tracked as bug 2071740.
Ironic also inspects images which would normally be provided as a URL for direct download by the ironic-python-agent ramdisk. This is enabled by default and increases the overall network traffic and disk space utilization of the conductor. This level of inspection can be disabled by setting [conductor]conductor_always_validates_images to False. Doing so is not advisable as Zed release and earlier ironic-python-agent ramdisks will not be made available due to backport regression risk. This is the result of CVE-2024-44082 tracked as bug 2071740.
Ironic now explicitly enforces a list of permitted image types for deployment via the [conductor]permitted_image_formats setting, which defaults to “raw”, “qcow2”, and “iso”. While the project has classically always declared permissible images as “qcow2” and “raw”, it was previously possible to supply other image formats known to qemu-img, and the utility would attempt to convert the images. The “iso” support is required for “boot from ISO” ramdisk support.
Ironic now explicitly passes the source input format to executions of qemu-img to limit the permitted qemu disk image drivers which may evaluate an image, to prevent any mismatched format attacks against qemu-img.
The ansible deploy interface example playbooks now supply an input format to executions of qemu-img. If you are using customized playbooks, please add “-f {{ ironic.image.disk_format }}” to your invocations of qemu-img. If you do not do so, qemu-img will automatically try to guess the format, which can lead to known security issues with the incorrect source format driver.
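For custom tooling that shells out to qemu-img, the advice above amounts to always passing the source format explicitly. The following Python sketch only builds the argument list and does not run qemu-img. The helper name build_convert_command, the file paths, and the small set of accepted formats are all assumptions made for illustration and are not part of Ironic.

```python
PERMITTED_SOURCE_FORMATS = {"raw", "qcow2"}  # illustrative subset, not read from Ironic


def build_convert_command(source, dest, source_format, out_format="raw"):
    """Return a qemu-img argv that pins the source format with -f."""
    if source_format not in PERMITTED_SOURCE_FORMATS:
        raise ValueError(f"refusing unexpected source format: {source_format}")
    # -f names the input format so qemu-img does not probe the file itself.
    return ["qemu-img", "convert", "-f", source_format,
            "-O", out_format, source, dest]


# Hypothetical paths, used only to show the resulting command line.
cmd = build_convert_command("/tmp/user-image.qcow2", "/tmp/user-image.raw", "qcow2")
print(" ".join(cmd))
```

Passing the list to a process runner without a shell keeps the format and path arguments from being reinterpreted.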
Operators who have implemented any custom deployment drivers or additional functionality like machine snapshot should review their downstream code to ensure they are properly invoking qemu-img. If there are any questions or concerns, please reach out to the Ironic project developers.
Operators are reminded that they should utilize cleaning in their environments. Disabling any security features such as cleaning or image inspection is at your own risk. Should you have any issues with security related features, please don’t hesitate to open a bug with the project.
The [conductor]disable_deep_image_inspection setting is conveyed to the ironic-python-agent ramdisks automatically, and will prevent those operating ramdisks from performing deep inspection of images before they are written.
The [conductor]permitted_image_formats setting is conveyed to the ironic-python-agent ramdisks automatically. Should a need arise to explicitly permit an additional format, that should take place in the Ironic service configuration.
Bug Fixes¶
Fixes multiple issues in the handling of images as it relates to the execution of the qemu-img utility, which is used for image format conversion, where a malicious user could craft a disk image to potentially extract information from an ironic-conductor process’s operating environment.
Ironic now explicitly enforces a list of approved image formats as a [conductor]permitted_image_formats list, which mirrors the image formats the Ironic project has historically tested and expressed as known working. Testing is not based upon file extension, but upon content fingerprinting of the disk image files. This is tracked as CVE-2024-44082 via bug 2071740.
Fixes Ironic integration with Cinder because of changes which resulted as part of the recent Security related fix in bug 2004555. The work in Ironic to track this fix was logged in bug 2019892. Ironic now sends a service token to Cinder, which allows for access restrictions added as part of the original CVE-2023-2088 fix to be appropriately bypassed. Ironic was not vulnerable, but the restrictions added as a result did impact Ironic’s usage. This is because Ironic volume attachments are not on a shared “compute node”, but instead mapped to the physical machines and Ironic handles the attachment life-cycle after initial attachment.
Fixes a bug in the iRMC driver’s parse_driver_info where, if FIPS is enabled, SNMP version 3 was always required even when the iRMC driver’s xxx_interface does not actually use SNMP.
Fixes an issue in the online upgrade logic where database models for Node Traits and BIOS Settings resulted in an error when performing the online data migration. This was because these tables were originally created as extensions of the Nodes database table, and the schema of the database was different enough to result in an error if there was data to migrate in these tables upon upgrade, which would have occurred if an early BIOS Setting adopter had data in the database prior to upgrading to the Yoga release of Ironic.
The online upgrade parameter now substitutes an alternate primary key name when applicable.
Fixes an issue where a System Scoped user could not trigger a node into a manageable state with cleaning enabled, as the Neutron client would attempt to utilize their user’s token to create the Neutron port for the cleaning operation, as designed. This is because with requests made in the system scope, there is no associated project and the request fails.
Ironic now checks if the request has been made with a system scope, and if so it utilizes the internal credential configuration to communicate with Neutron.
Modifies the iRMC driver to use the [deploy]default_boot_mode option in ironic.conf to determine the default boot_mode.
Fixes issues with Lenovo hardware where the system firmware may display a blue “Boot Option Restoration” screen after the agent writes an image to the host in UEFI boot mode, requiring manual intervention before the deployed node boots. This issue is rooted in multiple changes being made to the underlying NVRAM configuration of the node. Lenovo engineers have suggested to only change the UEFI NVRAM and not perform any further changes via the BMC to configure the next boot. Ironic now does such on Lenovo hardware. More information and background on this issue can be discovered in bug 2053064.
18.3.0¶
Upgrade Notes¶
Adds sha256, sha384 and sha512 as supported SNMPv3 authentication protocols to the iRMC driver.
Bug Fixes¶
Fixes an issue where if selinux is enabled and enforcing, and the published image is a hardlink, the source selinux context is preserved, causing access denied when retrieving the image using hardlink URL.
Fixes SNMPv3 message authentication and encryption functionality of the iRMC driver. The SNMPv3 authentication between the iRMC driver and iRMC was only by the security name, with no passwords and no encryption. To increase security, the following parameters are now added to the node’s driver_info, and can be used for authentication:
irmc_snmp_user
irmc_snmp_auth_password
irmc_snmp_priv_password
irmc_snmp_auth_proto (Optional, defaults to sha)
irmc_snmp_priv_proto (Optional, defaults to aes)
irmc_snmp_user replaces irmc_snmp_security. irmc_snmp_security will be ignored if irmc_snmp_user is set. irmc_snmp_auth_proto and irmc_snmp_priv_proto can also be set through the following options in the [irmc] section of /etc/ironic/ironic.conf:
snmp_auth_proto
snmp_priv_proto
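The precedence rules described in this note can be sketched in Python as follows. This is a simplified illustration, not the driver's code: the helper name effective_snmp_settings is hypothetical, all values are placeholders, and the protocol defaults are hard-coded here even though they can also come from the [irmc] section of ironic.conf.

```python
def effective_snmp_settings(driver_info):
    """Resolve SNMPv3 settings following the precedence described above."""
    # irmc_snmp_user wins; irmc_snmp_security is only consulted when it is unset.
    user = driver_info.get("irmc_snmp_user") or driver_info.get("irmc_snmp_security")
    return {
        "user": user,
        "auth_proto": driver_info.get("irmc_snmp_auth_proto", "sha"),
        "priv_proto": driver_info.get("irmc_snmp_priv_proto", "aes"),
    }


# Hypothetical node driver_info; every value here is a placeholder.
driver_info = {
    "irmc_snmp_user": "monitor",
    "irmc_snmp_security": "legacy-name",
    "irmc_snmp_auth_password": "auth-secret",
    "irmc_snmp_priv_password": "priv-secret",
}
settings = effective_snmp_settings(driver_info)
print(settings)
```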
Fixes a race condition in PXE initialization where logic to retry what we suspect as potentially failed PXE boot operations was not consulting if an agent token had been established, which is the very first step in agent initialization.
Other Notes¶
Updates the minimum version of the python-scciclient library to 0.11.3.
18.2.2¶
Known Issues¶
When using jsonschema 4.0.0 or newer, make sure to include a proper $schema field in your custom network data or RAID schemas.
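As an illustration, a custom schema carrying an explicit $schema field might look like the Python dictionary below. The property names are placeholders rather than Ironic's actual network data schema, and Draft-07 is chosen as an assumption because it is the draft that the jsonschema compatibility fix listed for this release mentions.

```python
import json

# Hypothetical custom network data schema; only the "$schema" key is the point here.
NETWORK_DATA_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "links": {"type": "array"},
        "networks": {"type": "array"},
        "services": {"type": "array"},
    },
}

# Serialize to the JSON form that would be stored in a schema file.
schema_text = json.dumps(NETWORK_DATA_SCHEMA, indent=2)
print(schema_text)
```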
Security Issues¶
Modifies the irmc hardware type to include a capability to control enforcement of HTTPS certificate verification. By default this is enforced. The python-scciclient version must be one of >=0.8.2,<0.9.0, >=0.9.4,<0.10.0, >=0.10.1,<0.11.0 or >=0.11.3,<0.12.0; otherwise certificate verification will not occur.
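The version ranges above can be checked mechanically. The sketch below is a hypothetical helper, not part of Ironic or python-scciclient, and it assumes plain numeric versions of the form X.Y.Z.

```python
# (inclusive lower bound, exclusive upper bound) pairs taken from the note above.
VERIFYING_RANGES = [
    ((0, 8, 2), (0, 9, 0)),
    ((0, 9, 4), (0, 10, 0)),
    ((0, 10, 1), (0, 11, 0)),
    ((0, 11, 3), (0, 12, 0)),
]


def supports_cert_verification(version):
    """Return True if this python-scciclient version falls in a listed range."""
    parts = tuple(int(piece) for piece in version.split("."))
    return any(low <= parts < high for low, high in VERIFYING_RANGES)


for candidate in ("0.8.2", "0.9.0", "0.10.0", "0.11.3"):
    print(candidate, supports_cert_verification(candidate))
```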
Bug Fixes¶
Fixes detecting of allowable values for a BIOS settings enumeration in the redfish BIOS interface when only ValueDisplayName is provided.
The anaconda deploy interface was treating the config drive as a dict, whereas it could be a dict or in ISO 9660 format, gzipped and base64-encoded. This has been fixed.
The anaconda deploy interface was adding commands that deal with the config drive to the end of the kickstart config file. As a result, they were handled after an Ironic API request was sent (to the conductor) to indicate that the node has been provisioned and is ready to be rebooted, creating a possible race condition between these commands completing and the node being powered off. A sync is now added to ensure that all modifications have been written to disk before the API request is sent as the last step.
Extra newlines (‘\n’) were incorrectly added to the user data content. This broke the content-type decoding and cloud-init was unable to process it. The extra newlines have been removed.
Fixes the logic for the anaconda deploy interface. If the ironic node’s instance_info doesn’t have both ‘stage2’ and ‘ks_template’ specified, we weren’t using the instance_info at all. This has been fixed to use the instance_info if it was specified. Otherwise, ‘stage2’ is taken from the image’s properties (assumed that it is set there). ‘ks_template’ value is from the image properties if specified there (since it is optional); else we use the config setting ‘[anaconda] default_ks_template’.
For the anaconda deploy interface, the ‘stage2’ directory was incorrectly being created using the full path of the stage2 file; this has been fixed.
The anaconda deploy interface expects the node’s instance_info to be populated with the ‘image_url’; this is now populated (via PXEAnacondaDeploy’s prepare() method).
For the anaconda deploy interface, when the deploy was finished and the bare metal node was being rebooted, the node’s provision state was incorrectly being set to ‘active’; the provisioning state-machine mechanism now handles that.
For the anaconda deploy interface, the code that was doing the validation of the kickstart file was incorrect and resulted in errors; this has been addressed.
For the anaconda deploy interface, the ‘%traceback’ section in the packaged ‘ks.cfg.template’ file is deprecated and fails validation, so it has been removed.
The anaconda deploy interface was saving internal information in the node’s instance_info, in the user-facing ‘stage2’ and ‘ks_template’ fields. This broke rebuilds using a different image with different stage2 or template specified in the image properties. This has been fixed by saving the information in the node’s driver_internal_info instead.
Fixes pagination for the following collections:
/v1/allocations
/v1/chassis
/v1/conductors
/v1/deploy_templates
/v1/nodes/{node}/history
The next link now contains a valid URL.
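A client typically relies on the next link by following it until it is absent. The Python sketch below shows that loop against canned pages instead of a live API. The host name, UUIDs, and the iter_collection helper are hypothetical, and the response shape (a list under the collection name plus a top-level next key) is an assumption about the collection format.

```python
def iter_collection(fetch_page, first_url, key):
    """Yield every item of a paginated collection by following 'next' links."""
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page.get(key, [])
        # The last page carries no "next" link, which ends the loop.
        url = page.get("next")


# Canned two-page response standing in for GET /v1/allocations.
PAGES = {
    "http://ironic.example/v1/allocations?limit=2": {
        "allocations": [{"uuid": "a1"}, {"uuid": "a2"}],
        "next": "http://ironic.example/v1/allocations?limit=2&marker=a2",
    },
    "http://ironic.example/v1/allocations?limit=2&marker=a2": {
        "allocations": [{"uuid": "a3"}],
    },
}

uuids = [item["uuid"] for item in iter_collection(
    PAGES.__getitem__, "http://ironic.example/v1/allocations?limit=2", "allocations")]
print(uuids)
```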
Fixes rebooting into the agent after changing BIOS settings in fast-track mode with the redfish-virtual-media boot interface. Previously, the ISO would not be configured.
Fixes redfish and idrac-redfish RAID create_configuration, apply_configuration, delete_configuration clean and deploy steps to update the node’s raid_config field at the end of the steps.
Fixes the determination of a failed RAID configuration task in the redfish hardware type. Prior to this fix, failed tasks were reported as successful.
Fixes the redfish hardware type RAID device creation and deletion when creating or deleting more than 1 logical disk on RAID controllers that require rebooting and do not allow more than 1 running task per RAID controller. Before this fix, the second logical disk would fail to be created or deleted. With this change it is now possible to use the redfish raid interface on iDRAC systems.
Fixes the redfish-virtual-media boot interface to allow its use with iDRAC firmware 6.00.00.00 (released June 2022) and later, as that firmware fixes the virtual media boot issue that previously prevented iDRAC from working with redfish-virtual-media. Consider upgrading iDRAC firmware if not done already; otherwise an error will still be raised when trying to use redfish-virtual-media with iDRAC.
Fixes the initrd kernel parameter when booting ramdisk directly from Swift/RadosGW using iPXE. Previously it was always deploy_ramdisk, even when the actual file name is different.
Adds the driver_info/irmc_verify_ca option to specify a certificate file. The default value of driver_info/irmc_verify_ca is True.
Fixes compatibility with jsonschema package version 4.0.0 or newer by providing a proper schema version (Draft-07 currently).
The image cache now respects the Cache-Control: no-store header for HTTP(s) images.
File images are no longer cached in the image cache to avoid unnecessary consumption of the disk space.
18.2.1¶
Bug Fixes¶
No longer validates boot interface parameters when adopting a node that uses local boot.
Fixes installation and unit testing of ironic when using the sushy library by setting an appropriate upper constraint. This version of Ironic is not compatible with Sushy 4.0.0.
Fixes a bug in the anaconda deploy interface where the ‘ks_options’ key was not found when rendering the default kickstart template.
Fixes issue where PXEAnacondaDeploy interface’s deploy() method did not return states.DEPLOYWAIT so the instance went straight to ‘active’ instead of ‘wait call-back’.
Fixes an issue where the anaconda deploy interface mistakenly expected ‘squashfs_id’ instead of ‘stage2_id’ property on the image.
Fixes the heartbeat mechanism in the default kickstart template ks.cfg.template as the heartbeat API only accepts ‘POST’ and expects a mandatory ‘callback_url’ parameter.
Fixes handling of tarball images in anaconda deploy interface. Allows user specified file extensions to be appended to the disk image symlink. Users can now set the file extensions by setting the ‘disk_file_extension’ property on the OS image. This enables users to deploy tarballs with anaconda deploy interface.
Fixes issue where automated cleaning was not supported when anaconda deploy interface is used.
Fixed an issue where duplicate extra DHCP options was passed in the port update request to the Networking service. The duplicate DHCP options caused an error in the Networking service and node provisioning would fail. See bug: 2009774.
Fixes the idrac-wsman management interface set_boot_device method, which would fail deployment when there were existing jobs present, with the error “Failed to change power state to ‘’power on’’ by ‘’rebooting’’. Error: DRAC operation failed. Reason: Unfinished config jobs found: <list of existing jobs>. Make sure they are completed before retrying.”. Non-BIOS jobs may now be present during deployment. Deployment will still fail when BIOS jobs are present. In such cases, consider moving to idrac-redfish, which does not have this limitation when setting the boot device.
Fixed an issue where provisioning/cleaning would fail on IPv6 routed provider networks. See bug: 2009773.
Fixes validation of the input argument firmware_images of the redfish hardware type clean step update_firmware. The argument is now validated at the beginning of the clean step. Prior to this fix, issues were detected at the time of executing the firmware update or not at all (for example, mistyping the optional field ‘wait’).
Fixes the redfish hardware type update_firmware cleaning step to work with Sushy version 4.0.0 or greater.
Fixes an issue where clients would get a 404 due to the node pagination breaking at max_limit due to an uninitialised resource_url.
Fixes an issue where clients would get a 404 due to the port and portgroups pagination breaking at max_limit due to an uninitialised resource_url.
Fixes the “File name too long” error in the image caching code when a URL contains a long query string.
Inspection no longer fails when one of the NICs reports NIC address that is not a valid MAC (e.g. a WWN).
Fixed the bug of repeated resume cleaning due to the value of fgi_status not being updated correctly when obtaining the RAID configuration status of the node managed by the irmc hardware type.
When configuring RAID on iRMC machines through Ironic, polling was not set up when the RAID was created. Polling is now set up after RAID creation, so that Ironic waits for the RAID configuration to complete before proceeding to the next step instead of checking IPA.
Fixes connection caching issues with Redfish BMCs where AccessErrors were previously not disqualifying the cached connection from being re-used. Ironic will now explicitly open a new connection instead of using the previous connection in the cache. Under normal circumstances, the sushy redfish library would detect and refresh sessions, however a prior case exists where it may not detect a failure and contain cached session credential data which is ultimately invalid, blocking future access to the BMC via Redfish until the cache entry expired or the ironic-conductor service was restarted. For more information please see story 2009719.
Removes the ?filename=file.iso suffix from the virtual media image URL when the image is a regular file, due to incompatibility with SuperMicro X12 machines which do not accept special characters such as = or ? in the URL. Historically, this suffix was added to improve compatibility with those BMCs which require an .iso suffix in the URL while using swift as the image store. The old behaviour remains for swift backed images.
18.2.0¶
Prelude¶
The Ironic team hereby announces the release of Ironic 18.2.
During the Xena development cycle, thirty eight contributors collaborated together, and with our adjacent communities to support the needs of our end users in all the many forms they take. Over 48,000 lines of code were modified, and twenty two new features made it into Ironic along with a number of bug fixes. We sincerely hope you enjoy!
New Features¶
Adds support for fields selector in driver api. See story 1674775.
GET /v1/drivers?fields=...
GET /v1/drivers/{driver_name}?fields=...
Adds API version 1.78, which provides the capability to retrieve node history events recorded in the process of managing the node. These may aid in troubleshooting or identifying a problem area with a specific node or supplied configuration.
Adds a capability to allow bootloaders to be copied into the configured network boot path. This capability can be opted into by setting [pxe]loader_file_paths to a list of key, value pairs of destination filename and source file path.
[pxe]
loader_file_paths = bootx64.efi:/path/to/shimx64.efi,grubx64.efi:/path/to/grubx64.efi
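To make the destination:source pairing concrete, the Python sketch below splits the example value from this note into a mapping. The parse_loader_file_paths helper is hypothetical and is not how Ironic parses the option internally.

```python
def parse_loader_file_paths(value):
    """Split 'dest:source,dest:source' into a {destination: source} mapping."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first colon only; the source path follows it.
        dest, sep, source = pair.partition(":")
        if not sep or not dest or not source:
            raise ValueError(f"expected destination:source, got {pair!r}")
        mapping[dest] = source
    return mapping


# Example value taken from the note above.
loaders = parse_loader_file_paths(
    "bootx64.efi:/path/to/shimx64.efi,grubx64.efi:/path/to/grubx64.efi")
print(loaders)
```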
Manual clean step clear_ca_certificates is added to remove the CA certificates from iLO.
Adds endpoints to change boot mode and secure boot state of node.
PUT /v1/nodes/{node_ident}/states/boot_mode
PUT /v1/nodes/{node_ident}/states/secure_boot
The API will respond with 202 (Accepted) on validating the request and accepting to process it. Changes occur asynchronously in a background task. The user can then poll the states endpoint /v1/nodes/{node_ident}/states to observe the current status of the requested change.
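A request against these endpoints could be constructed as in the Python sketch below, which builds the request object without sending it. Several details are assumptions not stated in this note and should be checked against the API reference: the JSON body shape with a target key, the 1.76 microversion header value, and the token header. The host, node name, token, and the build_state_request helper are placeholders.

```python
import json
from urllib import request


def build_state_request(base_url, node_ident, state, target, token):
    """Build (but do not send) a PUT request for boot_mode or secure_boot."""
    url = f"{base_url}/v1/nodes/{node_ident}/states/{state}"
    # Assumed body shape: {"target": <desired value>}.
    body = json.dumps({"target": target}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": token,                      # placeholder credential
        "X-OpenStack-Ironic-API-Version": "1.76",   # assumed minimum microversion
    }
    return request.Request(url, data=body, headers=headers, method="PUT")


req = build_state_request("http://ironic.example:6385", "node-1",
                          "boot_mode", "uefi", "placeholder-token")
print(req.get_method(), req.full_url)
```

After a 202 response, a client would poll /v1/nodes/{node_ident}/states until the reported value matches the requested target.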
Allows limiting the number of parallel downloads for cached images (instance and TFTP images currently).
Adds support to specify HttpHeaders when creating a subscription via redfish vendor passthru.
Upgrade Notes¶
The parallel_image_downloads option is now set to True by default. Use the new image_download_concurrency option to tune the behavior; the default concurrency is 20.
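Conceptually, a concurrency limit of this kind caps how many downloads are in flight at once. The Python sketch below illustrates the idea with a semaphore and placeholder work instead of real downloads. It is not Ironic's implementation, and run_downloads and the image names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

IMAGE_DOWNLOAD_CONCURRENCY = 20  # default stated in the note above


def run_downloads(image_refs, limit=IMAGE_DOWNLOAD_CONCURRENCY):
    """Process image_refs with at most `limit` in flight; return (results, peak)."""
    slots = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fetch(ref):
        nonlocal in_flight, peak
        with slots:  # blocks while `limit` downloads are already running
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            # Placeholder: a real implementation would download the image here.
            with lock:
                in_flight -= 1
        return ref

    with ThreadPoolExecutor(max_workers=max(len(image_refs), 1)) as pool:
        results = list(pool.map(fetch, image_refs))
    return results, peak


results, peak = run_downloads([f"image-{i}" for i in range(8)], limit=3)
print(len(results), "downloads finished")
```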
In-band cleaning has been fixed for ramdisk and anaconda deploy interfaces. If you rely on actual clean steps not running, you need to disable cleaning instead for the relevant nodes:
baremetal node set <node> --no-automated-clean
Deprecation Notes¶
Ironic previously announced the default for the [deploy]default_boot_mode would be changing “in a future release”. This was announced during the Stein development cycle. Ironic will change this default to uefi during the Yoga development cycle.
The parallel_image_downloads option is deprecated in favour of the new image_download_concurrency option that allows more precise tuning.
Bug Fixes¶
Fixes a regression in the ramdisk deploy where custom kernel parameters were not used during inspection and cleaning.
Resolves an issue where [conductor]clean_step_priority_override values were applied too late, after disabled steps had already been filtered out. With this change, priority overrides are applied prior to filtering out disabled steps, so that this configuration option can be used to enable or disable steps (in particular clean steps) in addition to changing the priorities they are run with.
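The ordering matters because a step that is disabled by default can only be re-enabled if the override is seen before filtering. The Python sketch below illustrates that ordering under several assumptions: steps are plain dictionaries, a priority of 0 means disabled, overrides are keyed as interface.step, and the step names and priorities are illustrative. It is not Ironic's conductor code.

```python
def effective_steps(steps, overrides):
    """Apply priority overrides first, then drop disabled (priority 0) steps."""
    adjusted = []
    for step in steps:
        key = f"{step['interface']}.{step['step']}"
        priority = overrides.get(key, step["priority"])
        adjusted.append({**step, "priority": priority})
    # Filtering happens after the overrides, so an override can enable a step.
    enabled = [step for step in adjusted if step["priority"] > 0]
    return sorted(enabled, key=lambda step: step["priority"], reverse=True)


# Illustrative steps and priorities, not values read from a real deployment.
steps = [
    {"interface": "deploy", "step": "erase_devices", "priority": 0},
    {"interface": "deploy", "step": "erase_devices_metadata", "priority": 99},
]
overrides = {"deploy.erase_devices": 50, "deploy.erase_devices_metadata": 0}
names = [step["step"] for step in effective_steps(steps, overrides)]
print(names)
```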
The validation for create_subscription now uses the default values from Redfish for Context and Protocol to avoid None. The fields returned by create_subscription and get_subscription are now filtered by the common fields between vendors. Deleting a subscription that doesn’t exist will return 404 instead of 500.
Fixes an issue in db schema version testing where objects with an initial version, e.g. “1.0”, are allowed to not already have their DB tables pre-exist when performing the pre-upgrade compatibility check for the database. This allows the upgrade to proceed and update the database schema without an explicit known list having to be maintained in Ironic.
Handles excessively long errors when the status upgrade check is executed, and simply indicates now if a table is missing, suggesting to update the database schema before proceeding.
Fixes an issue in the idrac-redfish clean/deploy step import_configuration where partially successful jobs were treated as fully successful. Such jobs, completed with errors, are now treated as failures.
Fixes the idrac-redfish clean/deploy step import_configuration to handle completed import configuration tasks that are deleted by iDRAC before Ironic has checked the task’s status. Prior to iDRAC firmware version 5.00.00.00, completed tasks are deleted after 1 minute in iDRAC Redfish. That is not always sufficient for the periodic check, which runs every minute by default, to observe their status. Before this fix, the node got stuck in wait mode forever. The step now fails with an error advising to decrease the periodic check interval or upgrade iDRAC firmware if not done already.
Fixes the idrac-redfish RAID interface delete_configuration clean/deploy step for controllers having foreign physical disks. Now foreign configuration is cleared after deleting virtual disks.
Fixes the idrac-redfish RAID interface in the create_configuration clean step and apply_configuration deploy step when there are drives in non-RAID mode. With this fix, non-RAID drives are converted to RAID mode before creating virtual disks.
Fixes idrac-wsman BIOS and RAID interface steps to correctly check the status of an iDRAC job that completed with errors. These jobs are now treated as failures. Before this fix, the node stayed in wait state as the check only looked for “Completed” or “Failed” job status, but not “Completed with Errors”.
Fixes the idrac-wsman power interface to wait for the hardware to reach the target state before returning. For systems where the soft power off at the end of deployment (to boot to the instance) failed and a forced hard power off was used, this left the node successfully deployed in the off state without any errors. This broke other workflows expecting the node to be on and booted into the OS at the end of deployment. Additional information can be found in story 2009204.
When an http(s):// image is used, the cached copy of the image will always be updated if the HTTP server does not provide the last modification date and time. Previously the cached image would be considered up-to-date, which could cause invalid behavior if the image is generated on the fly or was modified while being served.
Fixes the pattern of execution for periodic tasks such that the majority of drivers now evaluate if work needs to be performed in advance of creating a node task. Depending on the individual driver query pattern, this prevents excess database queries from being triggered with every task execution.
Fixes in-band cleaning for the ramdisk and anaconda deploy interfaces. Previously no in-band steps were fetched from the ramdisk.
Retries ssl.SSLError when connecting to the agent.
Other Notes¶
Removes a NEW_MODELS internal list from the dbsync utility which helped the tool navigate new models, however it was never used. Instead the tool now utilizes the database version and appropriate base version to make the appropriate decision in pre-upgrade checks.
The cleaning code has been moved from AgentDeployMixin to AgentBaseMixin. Most third-party deploy interfaces will need to include both anyway.