Wallaby Series (16.1.0 - 17.0.x) Release Notes¶
17.1.0-6¶
Upgrade Notes¶
Adds sha256, sha384 and sha512 as supported SNMPv3 authentication protocols to the iRMC driver.
Bug Fixes¶
Fixes SNMPv3 message authentication and encryption functionality of the iRMC driver. Previously, SNMPv3 authentication between the iRMC driver and iRMC relied only on the security name, with no passwords or encryption. To increase security, the following parameters are now added to the node’s driver_info and can be used for authentication:
irmc_snmp_user
irmc_snmp_auth_password
irmc_snmp_priv_password
irmc_snmp_auth_proto (Optional, defaults to sha)
irmc_snmp_priv_proto (Optional, defaults to aes)
irmc_snmp_user replaces irmc_snmp_security. irmc_snmp_security will be ignored if irmc_snmp_user is set. irmc_snmp_auth_proto and irmc_snmp_priv_proto can also be set through the following options in the [irmc] section of /etc/ironic/ironic.conf:
snmp_auth_proto
snmp_priv_proto
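The precedence rules above can be sketched roughly as follows. This is only an illustration, not the iRMC driver's code: the function name resolve_snmpv3_settings and its keyword arguments are hypothetical, and the assumption that a per-node value wins over the [irmc] option is ours.

```python
# Hypothetical helper for illustration; not part of the iRMC driver.
def resolve_snmpv3_settings(driver_info, conf_auth_proto="sha", conf_priv_proto="aes"):
    """Return the effective SNMPv3 settings for a node's driver_info dict."""
    # irmc_snmp_user replaces irmc_snmp_security; the old key is only
    # consulted when the new one is not set.
    user = driver_info.get("irmc_snmp_user") or driver_info.get("irmc_snmp_security")
    return {
        "user": user,
        "auth_password": driver_info.get("irmc_snmp_auth_password"),
        "priv_password": driver_info.get("irmc_snmp_priv_password"),
        # Assumption: per-node values take priority over the [irmc] options,
        # which themselves default to sha and aes.
        "auth_proto": driver_info.get("irmc_snmp_auth_proto", conf_auth_proto),
        "priv_proto": driver_info.get("irmc_snmp_priv_proto", conf_priv_proto),
    }
```

With both irmc_snmp_user and irmc_snmp_security present, only the former is used.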
Fixes a race condition in PXE initialization where the logic that retries suspected failed PXE boot operations did not check whether an agent token had been established, which is the very first step in agent initialization.
Other Notes¶
Updates the minimum version of the python-scciclient library to 0.10.1.
17.1.0¶
Upgrade Notes¶
On the Wallaby release, to use a certificate file for HTTPS connections, the iRMC driver requires the python-scciclient version to be one of >=0.8.2,<0.9.0, >=0.9.5,<0.10.0 or >=0.10.1,<0.11.0, and packaging >=16.5.
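As a quick way to reason about the three version windows above, the following minimal sketch checks a plain dotted version string against them. The helper names are hypothetical, and the parser only handles purely numeric versions such as 0.10.1 (no pre-release or local suffixes).

```python
# Version windows taken from the note above: (inclusive low, exclusive high).
SUPPORTED_RANGES = [
    ((0, 8, 2), (0, 9, 0)),
    ((0, 9, 5), (0, 10, 0)),
    ((0, 10, 1), (0, 11, 0)),
]


def parse_version(text):
    """Turn '0.10.1' into (0, 10, 1); numeric components only."""
    return tuple(int(part) for part in text.split("."))


def scciclient_supports_cert_file(version_text):
    """Hypothetical check: is the version inside one of the listed windows?"""
    version = parse_version(version_text)
    return any(low <= version < high for low, high in SUPPORTED_RANGES)
```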
Security Issues¶
Modifies the irmc hardware type to include a capability to control enforcement of HTTPS certificate verification. By default this is enforced. The python-scciclient version must be one of >=0.8.2,<0.9.0, >=0.9.5,<0.10.0, or >=0.10.1,<0.11.0; otherwise certificate verification will not occur.
Bug Fixes¶
Fixes the logic for the anaconda deploy interface. If the ironic node’s instance_info doesn’t have both ‘stage2’ and ‘ks_template’ specified, we weren’t using the instance_info at all. This has been fixed to use the instance_info if it was specified. Otherwise, ‘stage2’ is taken from the image’s properties (assumed that it is set there). ‘ks_template’ value is from the image properties if specified there (since it is optional); else we use the config setting ‘[anaconda] default_ks_template’.
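One reading of the fallback order described above, sketched per field. The function name and the flat dictionary keys are hypothetical simplifications; the real image property names may differ.

```python
# Hypothetical sketch of the lookup order; not the deploy interface's code.
def resolve_anaconda_sources(instance_info, image_properties, default_ks_template):
    """Return (stage2, ks_template) following the documented fallback order."""
    # stage2: instance_info first, then the image properties
    # (the note assumes it is set there).
    stage2 = instance_info.get("stage2") or image_properties.get("stage2")
    # ks_template: instance_info, then the optional image property,
    # then the '[anaconda] default_ks_template' setting.
    ks_template = (
        instance_info.get("ks_template")
        or image_properties.get("ks_template")
        or default_ks_template
    )
    return stage2, ks_template
```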
For the anaconda deploy interface, the ‘stage2’ directory was incorrectly being created using the full path of the stage2 file; this has been fixed.
The anaconda deploy interface expects the node’s instance_info to be populated with the ‘image_url’; this is now populated (via PXEAnacondaDeploy’s prepare() method).
For the anaconda deploy interface, when the deploy was finished and the bare metal node was being rebooted, the node’s provision state was incorrectly being set to ‘active’; the provisioning state-machine mechanism now handles that.
For the anaconda deploy interface, the code that was doing the validation of the kickstart file was incorrect and resulted in errors; this has been addressed.
For the anaconda deploy interface, the ‘%traceback’ section in the packaged ‘ks.cfg.template’ file is deprecated and fails validation, so it has been removed.
The anaconda deploy interface was saving internal information in the node’s instance_info, in the user-facing ‘stage2’ and ‘ks_template’ fields. This broke rebuilds using a different image with different stage2 or template specified in the image properties. This has been fixed by saving the information in the node’s driver_internal_info instead.
Fixes rebooting into the agent after changing BIOS settings in fast-track mode with the redfish-virtual-media boot interface. Previously, the ISO would not be configured.
Fixes a bug in the anaconda deploy interface where the ‘ks_options’ key was not found when rendering the default kickstart template.
Fixes issue where PXEAnacondaDeploy interface’s deploy() method did not return states.DEPLOYWAIT so the instance went straight to ‘active’ instead of ‘wait call-back’.
Fixes an issue where the anaconda deploy interface mistakenly expected ‘squashfs_id’ instead of ‘stage2_id’ property on the image.
Fixes the heartbeat mechanism in the default kickstart template ks.cfg.template as the heartbeat API only accepts ‘POST’ and expects a mandatory ‘callback_url’ parameter.
Fixes handling of tarball images in anaconda deploy interface. Allows user specified file extensions to be appended to the disk image symlink. Users can now set the file extensions by setting the ‘disk_file_extension’ property on the OS image. This enables users to deploy tarballs with anaconda deploy interface.
Fixes issue where automated cleaning was not supported when anaconda deploy interface is used.
Fixed an issue where duplicate extra DHCP options were passed in the port update request to the Networking service. The duplicate DHCP options caused an error in the Networking service and node provisioning would fail. See bug: 2009774.
Fixes the idrac-wsman management interface set_boot_device method, which would fail deployment when existing jobs were present, with the error “Failed to change power state to ‘’power on’’ by ‘’rebooting’’. Error: DRAC operation failed. Reason: Unfinished config jobs found: <list of existing jobs>. Make sure they are completed before retrying.”. Non-BIOS jobs can now be present during deployment. Deployment will still fail when BIOS jobs are present. In such cases, consider moving to idrac-redfish, which does not have this limitation when setting the boot device.
Fixed an issue where provisioning/cleaning would fail on IPv6 routed provider networks. See bug: 2009773.
Fixes redfish and idrac-redfish RAID create_configuration, apply_configuration, delete_configuration clean and deploy steps to update the node’s raid_config field at the end of the steps.
Fixes the determination of a failed RAID configuration task in the redfish hardware type. Prior to this fix, tasks that had failed were reported as successful.
Fixes the redfish hardware type RAID device creation and deletion when creating or deleting more than 1 logical disk on RAID controllers that require rebooting and do not allow more than 1 running task per RAID controller. Before this fix, the 2nd logical disk would fail to be created or deleted. With this change it is now possible to use the redfish raid interface on iDRAC systems.
Fixes the redfish-virtual-media boot interface to allow its use with iDRAC firmware from 6.00.00.00 (released June 2022) onward, which fixes the virtual media boot issue that previously prevented iDRAC from working with redfish-virtual-media. Consider upgrading iDRAC firmware if not done already; otherwise an error will still be raised when trying to use redfish-virtual-media with iDRAC.
Fixes an issue where clients would get a 404 due to the node pagination breaking at max_limit due to an uninitialised resource_url.
Fixes an issue where clients would get a 404 due to the port and portgroups pagination breaking at max_limit due to an uninitialised resource_url.
Fixes File name too long in the image caching code when a URL contains a long query string.
Fixes the initrd kernel parameter when booting ramdisk directly from Swift/RadosGW using iPXE. Previously it was always deploy_ramdisk, even when the actual file name is different.
Adds the driver_info/irmc_verify_ca option to specify a certificate file. The default value of driver_info/irmc_verify_ca is True.
Fixes an issue with installation of Ansible in driver-requirements.txt on Python 3.8. Since the release of Ansible 6.0.0, significant backtracking occurred in the Pip resolver.
Fixes connection caching issues with Redfish BMCs where AccessErrors were previously not disqualifying the cached connection from being re-used. Ironic will now explicitly open a new connection instead of using the previous connection in the cache. Under normal circumstances, the sushy redfish library would detect and refresh sessions; however, a prior case exists where it may not detect a failure and may contain cached session credential data which is ultimately invalid, blocking future access to the BMC via Redfish until the cache entry expired or the ironic-conductor service was restarted. For more information please see story 2009719.
17.0.4¶
Upgrade Notes¶
The query pattern for the database when lists of nodes are retrieved has been changed to a more efficient pattern at scale, where a list of nodes is generated, and then additional queries are executed to composite this data together. This replaces a model where the database client in the conductor had to deduplicate the resulting data set, which is overall less efficient.
Critical Issues¶
Fixes upgrade failure caused by the missing version of BIOSSetting database objects.
Bug Fixes¶
Skips port creation during redfish inspect for devices reported without a MAC address.
Fixes potential cache coherency issues by caching the AgentClient per task, rather than globally.
Fixes a regression in the ramdisk deploy where custom kernel parameters were not used during inspection and cleaning.
Slow database retrieval of nodes has been addressed at the lower layer by explicitly passing and handling only the requested fields. The result is that excess discarded work is not performed, making the overall process more efficient. This is particularly beneficial for OpenStack Nova’s synchronization with Ironic.
Fixes configuring Redfish RAID using interface_type when error “failed to find matching physical disks for all logical disks” occurs.
Fixes issue in idrac-redfish clean/deploy step import_configuration where partially successful jobs were treated as fully successful. Such jobs, completed with errors, are now treated as failures.
Fixes the idrac-redfish clean/deploy step import_configuration to handle completed import configuration tasks that are deleted by iDRAC before Ironic has checked the task’s status. Prior to iDRAC firmware version 5.00.00.00, completed tasks are deleted after 1 minute in iDRAC Redfish. That is not always sufficient to check their status in the periodic check that runs every minute by default. Before this fix, the node got stuck in wait mode forever. This is fixed by failing the step with an error advising to decrease the periodic check interval or upgrade iDRAC firmware if not done already.
Fixes idrac-wsman BIOS and RAID interface steps to correctly check the status of an iDRAC job that completed with errors. These jobs are now treated as failures. Before this fix, the node stayed in wait state because only the “Completed” or “Failed” job status was checked, but not “Completed with Errors”.
Fixes the idrac-wsman power interface to wait for the hardware to reach the target state before returning. For systems where soft power off at the end of deployment to boot to instance failed and forced hard power off was used, this left the node successfully deployed in off state without any errors. This broke other workflows expecting the node to be on and booted into the OS at the end of deployment. Additional information can be found in story 2009204.
When an http(s):// image is used, the cached copy of the image will always be updated if the HTTP server does not provide the last modification date and time. Previously the cached image would be considered up-to-date, which could cause invalid behavior if the image is generated on the fly or was modified while being served.
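The freshness rule above can be sketched as follows. This is an illustration under stated assumptions, not Ironic's image cache code: the function name is hypothetical, the cache timestamp is assumed to be a timezone-aware datetime, and a header without a timezone is assumed to be UTC.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def cached_image_is_fresh(cache_mtime, last_modified_header):
    """Hypothetical check for a cached http(s):// image.

    cache_mtime: timezone-aware datetime of the cached copy.
    last_modified_header: value of the Last-Modified header, or None.
    """
    # No modification time from the server: never trust the cache.
    if not last_modified_header:
        return False
    try:
        server_time = parsedate_to_datetime(last_modified_header)
    except (TypeError, ValueError):
        # Unparseable header: treat it the same as a missing one.
        return False
    if server_time.tzinfo is None:
        # Assumption: a header without a zone is interpreted as UTC.
        server_time = server_time.replace(tzinfo=timezone.utc)
    return server_time <= cache_mtime
```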
Improves record retrieval performance for baremetal nodes by enabling ironic to not make redundant calls as part of generating API result sets for the baremetal nodes endpoint.
Fixes the pattern of execution for periodic tasks such that the majority of drivers now evaluate if work needs to be performed in advance of creating a node task. Depending on the individual driver query pattern, this prevents excess database queries from being triggered with every task execution.
Removes unused local images after ejecting a virtual media device via the eject_vmedia vendor passthru call of the redfish vendor interface.
In Redfish RAID clean and deploy steps, skips non-RAID storage controllers for RAID operations. On Redfish systems that do not implement SupportedRAIDTypes, these controllers are still processed and could result in unexpected errors.
Retries ssl.SSLError when connecting to the agent.
Fixes an issue of powering off with the idrac-wsman management interface while the execution of a clear job queue cleaning step is proceeding. Prior to this fix, the clean step would fail when powering off a node.
Other Notes¶
The default database query pattern has been changed, which will result in additional database queries when compositing lists of nodes by separately querying traits and tags. Previously this was a joined query which required deduplication of the result set before building composite objects.
17.0.3¶
Security Issues¶
Fixes an issue with the /v1/nodes/detail endpoint where an authenticated user could explicitly ask for an instance_uuid lookup and the associated node would be returned to the user with sensitive fields redacted in the result payload if the user did not explicitly have owner or lessee permissions over the node. This is considered a low-impact low-risk issue as it requires the API consumer to already know the UUID value of the associated instance, and the returned information is mainly metadata in nature. More information can be found in Storyboard story 2008976.
Bug Fixes¶
If the agent accepts a command, but is unable to reply to Ironic (which sporadically happens because of the eventlet’s TLS implementation), we currently retry the request and fail because the command is already executing. Ironic now detects this situation by checking the list of executing commands after receiving a connection error. If the requested command is the last one, we assume that the command request succeeded.
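The detection described above amounts to a small check after a connection error. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the command list is assumed to be ordered oldest first with a command_name key on each entry.

```python
# Hypothetical sketch; not the AgentClient implementation.
def command_request_succeeded(requested_name, executing_commands):
    """Decide, after a connection error, whether the agent accepted the command.

    executing_commands: list of dicts as reported by the agent, oldest first.
    """
    if not executing_commands:
        # Nothing is running, so the request never reached the agent.
        return False
    # If the most recent command is the one we asked for, assume the
    # request was accepted and only the reply was lost.
    return executing_commands[-1].get("command_name") == requested_name
```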
When local boot is used (e.g. by default), the instance image validation now happens only in the deploy interface, not in the boot interface (as before). This means that the boot interface validation will now pass in many cases where it would previously fail.
Fixes an issue with the /v1/nodes/detail endpoint where requests for an explicit instance_uuid match would not follow the standard query handling path and thus not be filtered based on policy determined access level and node level owner or lessee fields appropriately. Additional information can be found in story 2008976.
No longer masks configdrive when sending the node’s record to in-band deploy steps.
Fixes handling of single-value (non-key-value) parameters in the [inspector]extra_kernel_params configuration options.
The behavior when a bootable iso ramdisk is provided behind an http server is to download and serve the image from the conductor; the image is removed only when the node is undeployed. In certain cases, for example on large deployments, this could cause undesired behaviors, like the conductor nodes running out of disk storage. To avoid this event we provide an option [deploy]ramdisk_image_download_source to be able to tell the ramdisk interface to directly use the bootable iso url from its original source instead of downloading it and serving it from the conductor node. The default behavior is unchanged.
Fixes sub-optimal Ironic API performance where Secure RBAC related field level policy checks were executing without first checking if there were field results. This helps improve API performance when only specific columns have been requested by the API consumer.
17.0.2¶
Bug Fixes¶
Fixes the idrac-wsman BIOS factory_reset clean and deploy step to indicate success and update the cached BIOS settings to their defaults only when the BIOS settings have actually been reset. See story 2008058 for more details.
Removes temporary cleaning information on starting or restarting cleaning.
Removes unnecessary delay before the start of the cleaning process when fast-track is used.
Correctly processes in-band deploy steps on fast-track deployment.
Correctly wipes agent token on inspection start and abort.
Fixes providing agent tokens with pre-built ISO images and the redfish-virtual-media boot interface.
17.0.0¶
Prelude¶
The Ironic community is proud to release Ironic 17.0!
Where if it were developer years instead of major versions, we would all be very afraid since it already has access to the car keys.
This release of Ironic includes numerous advancements which extend an operator’s ability to customize and further extend their deployment to meet their needs.
Redfish enhancements including Out of Band RAID configuration management and automatic setting of Secure Boot on nodes deployed using redfish.
Deployment enhancements including UEFI Partition Image handling, per-instance per-deployments of default interface selections, user requestable deploy_steps at deploy time, IPA file injection, and support for setting a node’s boot mode via instance_info.
Support for system scoped Role Based Access controls and project scoped access is available by default for associated nodes when the node owner or lessee fields are set. This effort alone added over 1,500 new unit tests.
Operator friendly fixes such as memory over-consumption guard for memory intensive tasks, vendor hardware aware handling to help address issues such as different settings being needed to invoke UEFI, and “lazy” loading of database attributes to reduce the overall database load.
Along with all of this massive amount of work, a number of bugs were fixed while we were along the road trip of this development cycle.
We sincerely hope you enjoy it!
New Features¶
It is now possible to configure a priority for both the delete and create configuration RAID cleaning steps which are disabled by default.
Adds import_configuration, export_configuration and import_export_configuration steps to the idrac-redfish management interface. These steps allow using the configuration from one system as a template and replicating it to other, similarly capable, systems. Currently, this feature is experimental.
Adds support for passing a kernel_append_param setting to the ilo-virtual-media and ilo-uefi-https boot interfaces using the configuration parameter [ilo]/kernel_append_param with the ilo and ilo5 hardware types.
Adds support for the discovery of PXE Enabled NICs using the idrac-redfish inspect interface with the idrac hardware type. With this feature, a port’s pxe_enabled status will be recorded on the bare metal port.
Adds support to manage certificates to the ilo5 hardware type. A new optional boolean driver_info parameter ilo_add_certificates is introduced which can be used by the user to request addition of certificates to the iLO with the ilo-uefi-https boot interface.
Adds the [deploy]enable_nvme_secure_erase option which allows the operator to enable the NVMe format option for all nodes being managed by the conductor.
Adds the anaconda deploy interface to Ironic. This driver deploys the OS using the anaconda installer and a kickstart file instead of IPA. To support this feature, a new configuration group anaconda is added to the Ironic configuration file along with the default_ks_template configuration option.
The deploy interface uses the heartbeat API to communicate. The kickstart template must include %pre, %post, %onerror and %traceback sections that send the status of the deployment back to the Ironic API using heartbeats. An example of such calls to the heartbeat API can be found in the default kickstart template. To enable anaconda to send status back to the Ironic API via heartbeat, agent_status and agent_status_message are added to the heartbeat API. Use of these new parameters requires API microversion 1.72 or greater.
Adds support for fast-tracking to the ansible deploy interface.
Allows providing a list of IPMI cipher suite versions via the new configuration option [ipmi]/cipher_suite_versions. The configuration is only used when ipmi_cipher_suite is not set in driver_info.
Adds a new disable_ramdisk parameter to the manual cleaning API. If set to true, IPA won’t get booted for cleaning. Only steps explicitly marked as compatible can be executed this way.
The parameter is available in the API version 1.70.
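A minimal sketch of what a manual cleaning request using disable_ramdisk might look like. It only builds the headers and JSON body for a provision state change and sends nothing; the helper name and the example step are hypothetical, and the step shown is not claimed to be ramdisk-free in any real driver.

```python
import json


def build_manual_clean_request(clean_steps, disable_ramdisk=False):
    """Build headers and JSON body for a manual cleaning request (illustrative)."""
    body = {"target": "clean", "clean_steps": clean_steps}
    if disable_ramdisk:
        # Only steps explicitly marked as compatible can run without IPA.
        body["disable_ramdisk"] = True
    headers = {
        # The parameter is available starting with API version 1.70.
        "X-OpenStack-Ironic-API-Version": "1.70",
        "Content-Type": "application/json",
    }
    return headers, json.dumps(body)


# Hypothetical step name, used purely for illustration.
headers, payload = build_manual_clean_request(
    [{"interface": "management", "step": "example_step"}],
    disable_ramdisk=True,
)
```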
Provides operators the ability to override URL settings required for provisioning/cleaning in the event of virtual media based deployment. These scenarios tend to require more delineation than more traditional deployments as they often have different environmental security requirements. Set these two new configuration options using an IP address that is available to these nodes (both the ramdisk and the BMCs):
[deploy]
external_http_url = <routable URL of the HTTP server>
external_callback_url = <routable URL of bare metal API>
Adds new GPU dynamic capabilities to ilo drivers inspection:
gpu_<vendor>_count: Integer
gpu_<gpu_device_name>_count: Integer
gpu_<gpu_device_name>: Boolean
Enhances the idrac-wsman inspect hardware interface to report an additional GPU device, namely GV100GL [Tesla V100 PCIe 16GB]. With this enhancement, the following GPU devices are reported:
TU104GL [Tesla T4]
GV100GL [Tesla V100 PCIe 16GB]
Adds basic support for managing RAID configuration via the Redfish out-of-band (OOB) management protocol to the idrac hardware type by adding a new interface named idrac-redfish. This requires iDRAC firmware greater than 4.40.00.00. The idrac hardware type now supports the idrac-wsman, idrac, idrac-redfish, and no-raid interfaces in the given priority order.
Allows node *_interface values to be overridden by values in a node instance_info field. This gives non-administrative users a temporary method of setting interface values.
The network data schema is now configurable via the new configuration option [api]network_data_schema.
Adds capability to use project scoped requests in concert with system scoped requests for a composite Role Based Access Control (RBAC) model. As Ironic is mainly an administrative service, this capability has only been extended to API endpoints which are not purely administrative in nature. This consists of the following API endpoints: nodes, ports, portgroups, volume connectors, volume targets, and allocations.
Project scoped requests for baremetal allocations will automatically record the project_id of the requestor as the owner of the node.
Adds support for automatic creation of ports for redfish enabled bare metal nodes prior to ironic-inspector introspection. This feature is a part of the redfish management interface.
Supplying configuration to the agent using the redfish-virtual-media boot interface now works through USB instead of floppy by default. Modern hardware (and even virtual machines) has limited support for floppies.
Adds support for pre-built ISO images to the redfish-virtual-media boot interface and its derivatives.
Adds a redfish native raid_interface to the redfish hardware type. See story 2003514 for details.
Note that common RAID cases have been tested, but cases that are more complex or rely on vendor-specific implementation details may not work as desired due to capability limitations.
Adds support for managing an iDRAC (reset, clear job queue, and reset to known good state) via the Redfish out-of-band (OOB) management protocol to the idrac hardware type. This is offered by the new idrac-redfish management hardware interface implementation cleaning steps: reset_idrac, clear_job_queue, and known_good_state. known_good_state both resets an iDRAC and clears its job queue.
Adds the [conductor]clean_step_priority_override configuration parameter which allows the operator to define a custom order in which the cleaning steps are to run.
The Baremetal API, provided by the ironic-api process, now supports use of system scoped keystone authentication for the following endpoints: nodes, ports, portgroups, chassis, drivers, driver vendor passthru, volume targets, volume connectors, conductors, allocations, events, deploy templates.
Introduces lazy-loading of ports, portgroups, volume connections and volume targets in task manager. For periodic tasks which create a task manager object but don’t require the aforementioned data (e.g. power sync), this change should reduce the number of database interactions by around two thirds, speeding up overall execution.
Adds support for multipath volumes. If the volume properties have multiple portals, then it will generate multiple iscsi urls and append them together for use in the generated ipxe file.
Known Issues¶
The addition of both project and system scoped Role Based Access controls does add additional database queries when linked resources are accessed. For example, when attempting to access a port or portgroup, the associated node needs to be checked as this helps govern overall access to the object for project scoped requests. This does not impact system scoped requests. Operators who adopt project scoped access may find it necessary to verify or add additional database indexes in relation to the node uuid column as well as the node_id field in any table which may receive heavy project query scope activity. The ironic project anticipates that this will be a future work item of the project to help improve database performance.
Upgrade Notes¶
The ilo-virtual-media and ilo-uefi-https boot interfaces do not use [pxe]pxe_append_params anymore. To pass kernel parameters, use the new configuration parameter [ilo]/kernel_append_param.
Legacy policy rules have been deprecated. Operators are advised to review and update any custom policy files in use. Please see Secure Role Based Access Controls for more information.
The functionality of using a port.extra vif_port_id value to signal and control a VIF attachment has been removed to support changing the permission model and access control policy. Use of vif_port_id outside of the VIF attachment/detachment workflow has been deprecated since the Ocata development cycle.
Deprecated policy rules are not expressed via a default policy file generation from the source code. The generated default policy file indicates the new default policies, with notes on the deprecated rules that oslo.policy falls back to until [oslo_policy]enforce_scope and [oslo_policy]enforce_new_defaults have been set to True. Please see the Victoria policy configuration documentation to reference prior policy configuration.
Operators are encouraged to move to system scope based authentication by setting [oslo_policy]enforce_scope and [oslo_policy]enforce_new_defaults. This requires a migration from using an admin project with the baremetal_admin and baremetal_observer roles. System wide administrators using system scoped admin and reader accounts supersede the deprecated model.
Deprecation Notes¶
Deprecates the ATA specific agent_continue_if_ata_erase_failed agent option, which is replaced with agent_continue_if_secure_erase_failed. The new option supports both ATA and NVMe secure erase. In order to ensure a smooth migration to the new configuration option, operators need to upgrade the Ironic Python Agent image to the Wallaby release prior to upgrading Ironic Conductor to Xena.
Pre-RBAC support rules have been deprecated. These consist of:
admin_api
is_member
is_observer
is_node_owner
is_node_lessee
is_allocation_owner
These rules will likely be removed in the Xena development cycle. Operators are advised to review any custom policy rules for these rules and move to the Secure Role Based Access Controls model.
The node’s driver_info parameter config_via_floppy of the redfish-virtual-media boot interface has been renamed to config_via_removable. The old alias is deprecated.
Use of an admin project with ironic is deprecated. With this, the custom roles baremetal_admin and baremetal_observer are also deprecated. Please migrate to using a system scoped account with the admin and reader roles, respectively.
Security Issues¶
Ability to create an allocation has been restricted by a new policy rule baremetal::allocation::create_pre_rbac which prevents creation of allocations by any project administrator when operating with the new Role Based Access Control model. The use and enforcement of this rule is disabled when [oslo_policy]enforce_new_defaults is set, which also causes the owner field for allocations to be populated automatically. Most deployments should not encounter any issues with this security change, and the policy rule will be removed when support for the legacy baremetal_admin custom role has been removed.
Fixes an issue where ironic was not properly labeling dynamically built virtual media ramdisks with the signifier flag so the ramdisk understands it was booted from virtual media.
Bug Fixes¶
When using the Neutron DHCP driver, Ironic would only use the first fixed IP address to determine what IP versions are in use on the port. Now, it checks all the IP addresses and adds DHCP options for all IP versions.
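The change above boils down to inspecting every fixed IP rather than only the first. A minimal sketch, assuming the port's fixed IPs are a list of dicts with an ip_address key; the function name is hypothetical and this is not the DHCP driver's code.

```python
import ipaddress


def ip_versions_on_port(fixed_ips):
    """Return the sorted set of IP versions (4 and/or 6) used by a port."""
    # Look at every fixed IP, not just the first one.
    return sorted(
        {ipaddress.ip_address(item["ip_address"]).version for item in fixed_ips}
    )
```

A dual-stack port then yields both versions, so DHCP options can be generated for each.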
Rejects configdrive that is not a JSON, a URL or a base64 string. Previously invalid JSON supplied to ironicclient could end up accepted as a configdrive, which would cause a failure much later.
Fixes the [deploy]configdrive_use_object_store option that was broken during the Python 3 transition.
Fixes a problem with the grub2 config file. Some higher versions of grub2 (e.g. 2.05 or 2.06-rc1) use grub.cfg-01-MAC, while lower versions of grub2 (e.g. 2.04) use MAC.conf, so we generate both paths in order to be compatible with both.
Fixes the missing boot_method ramdisk parameter for dynamically built virtual media payloads. This value must be set to vmedia for the ramdisk running on virtual media to understand it is executing from virtual media. This was fixed for cases where it is used with the redfish-virtual-media based boot interfaces as well as the ilo-virtual-media boot interface, which is where dynamic virtual media deployment/cleaning ramdisk generation is supported.
Fixes idrac-wsman BIOS apply_configuration and factory_reset clean and deploy steps to fail correctly in case of error when checking completed jobs. Before the fix, when a BIOS job failed, the node clean or deploy failed with a timeout instead of the actual error in the cleaning or deploying step.
Adds handling of Redfish BMCs which lack a BootSourceOverrideMode flag, such that it is no longer a fatal error for a deployment if the BMC does not support this field. This is most common on BMCs which feature only a partial implementation of the ComputerSystem resource boot, but may also be observable on some older generations of BMCs which received updates to have partial Redfish support.
The fix for story 2008252 synced the boot mode after changing the boot device because Supermicro nodes reset the boot mode if not included in the boot device set. However this can cause a problem on Dell nodes when changing the mode uefi->bios or bios->uefi, see story 2008712 for details. Restrict the syncing of the boot mode to Supermicro.
Other Notes¶
Clean steps can now be marked with requires_ramdisk=False to make them compatible with the new disable_ramdisk argument of the manual cleaning API.
The API version of the Bare Metal API provided by the ironic-api service has been incremented to 1.71 to signify that the API supports System and Project scoped Role Based Access Controls, which is purely informational in nature, as the version itself cannot be used to change the API behavior for access controls. In excess of 1500 unit tests were added as part of the effort to implement Role Based Access Controls to help ensure the effort did not break the API behavior.
16.2.0¶
New Features¶
Adds support for the deploy_steps parameter to the provisioning endpoint /v1/nodes/{node_ident}/states/provision. Available and optional when target is ‘active’ or ‘rebuild’. When overlapping, these steps override deploy template and driver steps. deploy_steps is a list of dictionaries with required keys ‘interface’, ‘step’, ‘priority’ and ‘args’.
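The shape of the deploy_steps parameter can be illustrated with a small client-side sketch. The helper names and the example step are hypothetical; only the required keys and the allowed targets come from the note above, and this is not the API's own validation code.

```python
REQUIRED_STEP_KEYS = {"interface", "step", "priority", "args"}


def build_provision_body(target, deploy_steps):
    """Build a request body for the provision endpoint (illustrative only)."""
    # deploy_steps is only accepted for these two targets.
    if target not in ("active", "rebuild"):
        raise ValueError("deploy_steps requires target 'active' or 'rebuild'")
    for index, step in enumerate(deploy_steps):
        missing = REQUIRED_STEP_KEYS - step.keys()
        if missing:
            raise ValueError(
                "step %d is missing keys: %s" % (index, sorted(missing))
            )
    return {"target": target, "deploy_steps": deploy_steps}


# Hypothetical step, used purely for illustration.
body = build_provision_body(
    "active",
    [{"interface": "deploy", "step": "example_step", "priority": 50, "args": {}}],
)
```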
By default Ironic will now not start new memory intensive work if insufficient system memory exists. This can be disabled by setting the [DEFAULT]minimum_memory_warning_only value to True.
The force_persistent_boot_device parameter now consistently applies to all boot interfaces, rather than only PXE and iPXE.
Supports setting boot mode via an instance_info capability.
The ironic-conductor process now has a concept of an internal memory limit. The intent of this is to prevent the conductor from running the host out of memory when a large number of deployments have been requested.
These settings can be tuned using [DEFAULT]minimum_required_memory, [DEFAULT]minimum_memory_wait_time, [DEFAULT]minimum_memory_wait_retries, and [DEFAULT]minimum_memory_warning_only.
Where possible, Ironic will attempt to wait out the time window, thus consuming the conductor worker thread which will resume if the memory becomes available. This will effectively rate limit concurrency.
If raw image conversions with-in the conductor is required, and a situation exists where insufficent memory exists and it cannot be waited, the deployment operation will fail. For the
iscsi
deployment interface, which is the other location in ironic that may consume large amounts of memory, the conductor will wait until the next agent heartbeat.
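The wait-and-retry behavior described above can be pictured with the following Python sketch. It is not Ironic's implementation: the function wait_for_memory and its callable arguments are hypothetical, and the parameter names merely mirror the configuration options named in the note.

```python
import time


def wait_for_memory(get_available_mib, required_mib, wait_time, wait_retries,
                    warning_only=False, sleep=time.sleep):
    """Return True when work may start, False when it should be refused.

    get_available_mib is a hypothetical callable reporting free memory in MiB.
    """
    for attempt in range(wait_retries + 1):
        if get_available_mib() >= required_mib:
            return True
        if warning_only:
            # Warning-only mode: report the condition but never block the work.
            print("warning: available memory is below the configured minimum")
            return True
        if attempt < wait_retries:
            # The calling worker thread is held here, which limits concurrency.
            sleep(wait_time)
    return False
```

Because the waiting happens in the worker thread itself, the number of simultaneously running memory intensive operations is bounded by the worker pool, which is the rate limiting effect the note mentions.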
Supports attaching configdrives when doing ramdisk deploy with the redfish-virtual-media boot. A configdrive is attached to a free USB slot.
Adds the [DEFAULT]raw_image_growth_factor configuration option which is a scale factor used for estimating the size of a raw image converted from compact image formats such as QCOW2. By default this is set to 2.0.
When clearing the cache to make space for a converted raw image, the full virtual size is attempted first, and if not enough space is available a second attempt is made with the (smaller) estimated size.
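The two-attempt approach can be sketched as below. This is an illustration under stated assumptions, not the actual cache code: make_room and try_free are hypothetical names, and the estimate is assumed to be the on-disk size multiplied by the growth factor, capped at the virtual size.

```python
def make_room(try_free, disk_size, virtual_size, growth_factor=2.0):
    """Return the number of bytes reserved, or None if neither attempt fits.

    try_free is a hypothetical callable that tries to free the given number
    of bytes in the cache and returns True on success.
    """
    # Assumed estimate: compact image size scaled up, never above virtual size.
    estimate = min(int(disk_size * growth_factor), virtual_size)
    # First attempt uses the full virtual size, second the smaller estimate.
    for amount in (virtual_size, estimate):
        if try_free(amount):
            return amount
    return None


GIB = 1024 ** 3
# Pretend the cache can free at most 5 GiB.
reserved = make_room(lambda n: n <= 5 * GIB, disk_size=2 * GIB, virtual_size=10 * GIB)
print(reserved // GIB)  # prints 4
```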
Adds support for automatically configuring secure boot for nodes using the redfish management interface.
The pxe and ipxe boot interfaces now automatically configure secure boot if the management interface supports it.
Upgrade Notes¶
The default value of the [oslo_policy]policy_file config option has been changed from policy.json to policy.yaml. Operators who are utilizing customized policy files or previously generated static policy files (which are not needed by default) should generate new policy files and modify them to meet their needs in the event that any new policies or rules have been added. Please consult the oslopolicy-convert-json-to-yaml tool to convert a JSON formatted policy file to YAML in a backward compatible way.
Deprecation Notes¶
Use of legacy policy format was deprecated by the oslo.policy library during the Victoria development cycle. As a result, this deprecation is being noted in the Wallaby cycle with an anticipated future removal of support by oslo.policy. As such, operators will need to convert to YAML policy files. Please see the upgrade notes for details on migration of any custom policy files.
Using instance_info/deploy_boot_mode is deprecated, use the boot_mode capability in instance_info/capabilities instead.
Currently the bare metal API permits setting the secure_boot capability for nodes whose driver does not support setting secure boot. This is deprecated and will become a failure in the Xena cycle.
Bug Fixes¶
Fixes fast-track to prevent marking the agent as alive if trying to rebuild a node before the fast-track timeout has expired.
Fixes redfish firmware update for ilo5 hardware type by fixing the Redfish task message detection and correctly preparing the ramdisk before rebooting.
Boot mode is now correctly handled when using redfish-virtual-media boot with locally booted images.
The redfish-virtual-media boot interface now makes fewer calls to the BMC when preparing boot.
The redfish-virtual-media boot interface no longer passes validation for Dell nodes. The idrac-redfish-virtual-media boot interface must be used for these nodes instead.
Failed cleaning no longer results in maintenance mode if no clean step is running, e.g. on PXE timeout or failed clean steps validation.
Retries virtual media insert on failure to allow for an eject that may not have finished (see story 2008504).
When Ironic configures the BootSourceOverrideTarget setting via Redfish, on Supermicro BMCs it must always configure BootSourceOverrideEnabled or that will revert to default (Once) on the BMC, see story 2008547 for details. This is different than what is currently implemented for other BMCs in which the BootSourceOverrideEnabled is not configured if it matches the current setting (see story 2007355).
This requires that node.properties['vendor'] be supermicro, which will be set on transition to manageable based on the Redfish system object or can be set manually.
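The difference between the two behaviors can be illustrated with a small Python sketch that builds the Redfish PATCH payload. The helper boot_override_patch is hypothetical and the case-insensitive vendor comparison is an assumption; only the BootSourceOverrideTarget and BootSourceOverrideEnabled property names and the supermicro vendor value come from the note.

```python
def boot_override_patch(target, desired_enabled, current_enabled, vendor=None):
    """Build a Redfish system PATCH body for a boot device change (sketch)."""
    boot = {"BootSourceOverrideTarget": target}
    # Assumption: the vendor value is compared case-insensitively.
    is_supermicro = (vendor or "").lower() == "supermicro"
    # Supermicro: always send the enabled value, otherwise the BMC reverts
    # it to the default (Once). Other vendors: send it only when it changes.
    if is_supermicro or desired_enabled != current_enabled:
        boot["BootSourceOverrideEnabled"] = desired_enabled
    return {"Boot": boot}


print(boot_override_patch("Pxe", "Continuous", "Continuous", vendor="supermicro"))
print(boot_override_patch("Pxe", "Continuous", "Continuous", vendor="dell"))
```

The first call includes BootSourceOverrideEnabled even though it matches the current setting; the second omits it.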
Other Notes¶
Registers all conductor hardware interfaces together. All conductor hardware interfaces are now added to the database in a single transaction, and the register_hardware_interfaces API has been updated to allow this. This allows RESTful API consumers to understand if the conductor is fully online via the presence of driver entries. Previously this was done one driver at a time.
Extends ManagementInterface with two new calls: get_secure_boot_state and set_secure_boot_state. They are optional and may be implemented for hardware that supports dynamically enabling/disabling secure boot.
16.1.0¶
New Features¶
Allows disabling automated cleaning per node if it is enabled globally. An existing automated_clean field will allow disabling of automated cleaning on the node object. A new baremetal:node:disable_cleaning policy is added which defaults to baremetal:node:update.
Retrieves BIOS configuration settings when moving a node to manageable. This allows the settings to be used when choosing which node to deploy. For more details, see story 2008326.
When deploying a node with software RAID with an image not from Glance, the new instance_info field image_rootfs_uuid can be used to specify the UUID of the root partition to install the bootloader on.
The ramdisk log file name now contains the node name when it is set.
Provides a new vendor passthru method for Redfish to eject a virtual_media device. A specific device can be given (either cd, dvd, floppy, or usb), or if no device is provided then all attached devices will be ejected.
A new option [agent]api_ca_file allows passing a CA file to the ramdisk when redfish-virtual-media boot is used. Requires ironic-python-agent from the Wallaby cycle.
Known Issues¶
Building ramdisks for DHCP-less deploy using the simple-init element is known not to work for distributions using NetworkManager. The debian-minimal element seems to work.
When redfish-virtual-media is used, fast-track mode will not work as expected: nodes will be rebooted between operations.
Upgrade Notes¶
The default value of [api]api_workers is now limited to 4. Set it explicitly if you need a higher value.
Automated detection of the IPMI BMC hardware vendor has been added to appropriately handle IPMI BMC variations. Ironic will now query this and save the value if not already set in order to avoid querying for every single operation. Operators upgrading should expect a longer first power state synchronization for nodes with the ipmi hardware type.
The agent RAID interface now removes any root device hint after the RAID configuration is successfully deleted.
Bug Fixes¶
No longer launches too many API workers on systems with a lot of CPU cores by default.
Fixes the logic which determines the partition table type to utilize with partition images to account for the boot mode of the machine. If no value is set by the API user, Ironic now correctly defaults to GPT if the node has been set in UEFI mode.
It is no longer possible to set a port’s physical_network to an empty string, which made the port unusable.
Fixes recognition of a busy agent to also handle recognition during deployment steps by more uniformly detecting and identifying when the ironic-python-agent service is busy.
Fixes inspection with the idrac-redfish-virtual-media boot interface.
Correctly handles the node’s custom network data when the noop network interface is used. Previously it was ignored.
Fixes incorrect injected network data location when using virtual media.
Fixes redfish BIOS apply_configuration clean and deploy step to fail correctly in case of error when checking if BIOS updates are successfully applied. Before the fix, when BIOS updates were unsuccessful, node cleaning or deploying failed with a timeout instead of the actual error in the clean or deploy step.
Fixes idrac-wsman RAID create_configuration clean step, apply_configuration deploy step and delete_configuration clean and deploy step to fail correctly in case of error when checking completed jobs. Before the fix, when a RAID job failed, node cleaning or deploying failed with a timeout instead of the actual error in the clean or deploy step.
Fixes issues when UEFI boot mode has been requested with persistent boot to DISK where some versions of ipmitool do not properly handle multiple options being set at the same time. While some of this logic was addressed in upstream ipmitool development, new versions are not released and vendors maintain downstream forks of the ipmitool utility. When considering vendor specific selector differences along with the current stance of new versions from the upstream ipmitool community, it only made sense to handle this logic within Ironic. In part this was because if already set the selector value would not be updated. Now ironic always transmits the selector value for UEFI.
Fixes handling of Supermicro UEFI supporting BMCs with the ipmi hardware type such that an appropriate boot device selector value is sent to the remote BMC to indicate boot from local storage. This is available for both persistent and one-time boot applications. For more information, please consult story 2008241.
Fixes handling of the ipmi hardware type where UEFI boot mode and “one-time” boot to PXE has been requested. As Ironic now specifically transmits the raw commands, this setting should be properly applied where previously PXE boot operations may have occurred in Legacy BIOS mode.
Calculating the ipmitool -N and -R arguments from the configuration options [ipmi]command_retry_timeout and [ipmi]min_command_interval now takes into account the 1 second interval increment that ipmitool adds on each retry event.
Failure-path ipmitool run duration will now be just less than command_retry_timeout instead of much longer.
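The arithmetic behind this fix can be sketched in Python. The function ipmitool_timing is hypothetical and the timing model is an assumption drawn from the note: each successive attempt is taken to wait the base interval plus one additional second per earlier attempt.

```python
def ipmitool_timing(command_retry_timeout, min_command_interval):
    """Return (retries, worst_case_seconds) under the assumed timing model."""
    retries = 0
    duration = 0
    # Attempt k is assumed to wait min_command_interval + k seconds.
    # Keep adding attempts while the next one still fits in the timeout.
    while command_retry_timeout > duration + min_command_interval + retries:
        duration += min_command_interval + retries
        retries += 1
    return retries, duration


retries, duration = ipmitool_timing(command_retry_timeout=60, min_command_interval=5)
# Arguments as they would be passed to ipmitool: -R retries, -N interval.
args = ["-R", str(retries), "-N", "5"]
print(retries, duration)  # prints 7 56
```

For comparison, dividing the timeout by the interval without the increment would give 12 retries, which under the same model would run for 126 seconds rather than staying under 60.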
When configured to use JSON RPC, the [DEFAULT]host configuration option can now be set to an IPv6 address. Previously it could only be an IPv4 address or a DNS name.
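One common reason an IPv6 literal needs special handling is that it must be wrapped in square brackets when placed in a URL. The following Python sketch shows that general technique only; it is not taken from Ironic, and the helper names and the port number are hypothetical.

```python
import ipaddress


def format_host_for_url(host):
    """Wrap IPv6 literals in brackets; leave IPv4 addresses and names alone."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as a DNS name.
        return host
    return f"[{host}]" if address.version == 6 else host


def rpc_url(host, port, use_ssl=False):
    """Build a base URL for an RPC endpoint (hypothetical helper)."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{format_host_for_url(host)}:{port}"


print(rpc_url("2001:db8::1", 8089))  # prints http://[2001:db8::1]:8089
print(rpc_url("192.0.2.10", 8089))   # prints http://192.0.2.10:8089
```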
No longer tries to pass BOOTIF=None as a kernel parameter when using virtual media. This could break inspection.
Fixes an issue where, when a port group with no MAC address set was attached to an instance, the resulting bond port could not get an IP address due to an inconsistent MAC address between the tenant port and the one initially allocated in the config drive.
Fixes the issue that the root device hint is not removed after the agent RAID interface has successfully deleted the RAID configuration. The previous hint is not guaranteed to be valid and may cause a deployment failure.
Fixes cleaning with the ramdisk deploy interface by reusing the same procedure as for the direct deploy interface.
Fixes a bug where a conductor could fail to complete a deployment if there was contention on a shared lock. This would manifest as an instance being stuck in the “deploying” state, though the node had in fact started or even completed its final boot.
After changing the boot device via Redfish, checks that the boot mode being reported matches what is configured and, if not, sets it to the configured value. Some BMCs change the boot mode when the device is set via Redfish, see story 2008252 for details.
Fixes wiping agent token on rebooting via API.
Adds secure boot support to the ilo-uefi-https boot interface. Secure boot support already existed for other boot interfaces but was missing for this interface.
The virtual media ISO image building process now respects the default_boot_mode configuration option.
Fixes timeout in fast-track mode with redfish-virtual-media when running one operation after another (e.g. cleaning after inspection).
Fixes permission issues when injecting network data into a virtual media.
Adds timeout to HTTP image validation and downloading operations, so that the direct deploy does not hang when the remote server is not responsive. The default timeout is 60 seconds and can be changed via the new webserver_connection_timeout option.
Other Notes¶
Adds a detect_vendor management interface method to the ipmi hardware type. This method is being promoted as a higher level interface because vendor agnostic drivers fundamentally need logic that is aware of the hardware vendor, where slight differences between vendors require slightly different behavior.
The configdrive argument to some utils in ironic.common.images and ironic.drivers.modules.image_utils has been replaced with a new inject_files argument. The previous approach did not really work in all situations and we don’t expect 3rd party drivers to use it.