Current Series Release Notes

24.0.0.0rc1

Prelude

The 24.0.0 release includes many new features and bug fixes. Please be sure to read the upgrade section which describes the required actions to upgrade your cloud from 23.0.0 (Wallaby) to 24.0.0 (Xena).

There are a few major changes worth mentioning. This is not an exhaustive list:

  • The latest Compute API microversion supported for Xena is v2.90. Details on REST API microversions added since the 23.0.0 Wallaby release can be found in the REST API Version History page.

  • Support for accelerators in Nova servers has been improved. Now Cyborg-managed SmartNICs can be attached as SR-IOV devices.

  • Two new nova-manage CLI commands can be used for checking the volume attachment connection information and for refreshing it if the connection is stale (for example with a Ceph backing store and MON IP addresses). Some documentation on how to use them can be found here.

  • Instance hostnames published by the metadata API service or config drives can be explicitly defined at instance creation time thanks to the new 2.90 API microversion. See the hostname field documentation on the API docs for further details.

  • The libvirt virt driver now supports any PCI device that uses the VFIO-mdev virtualization framework, not just virtual GPUs, for example network adapters or compute accelerators. See more in the spec.

New Features

  • Microversion 2.89 has been added. It includes the attachment_id of the volume attachment and the bdm_uuid of the block device mapping record, and removes the duplicated id field, in the responses for GET /servers/{server_id}/os-volume_attachments and GET /servers/{server_id}/os-volume_attachments/{volume_id}.

  • A number of commands have been added to nova-manage to help update stale volume attachment connection info for a given volume and instance.

    • The nova-manage volume_attachment show command can be used to show the current volume attachment information for a given volume and instance.

    • The nova-manage volume_attachment get_connector command can be used to get an updated host connector for the local host.

    • Finally, the nova-manage volume_attachment refresh command can be used to update the volume attachment with this updated connection information.
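    Taken together, a typical repair workflow might look like the following sketch (the UUIDs and connector file name are illustrative placeholders; get_connector reports the connector of the host it runs on, so it should be run on the instance's compute host):

      $ nova-manage volume_attachment show <instance_uuid> <volume_uuid>
      $ nova-manage volume_attachment get_connector > connector.json
      $ nova-manage volume_attachment refresh <instance_uuid> <volume_uuid> connector.json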

  • A --sleep option has been added to the nova-manage db archive_deleted_rows CLI. When this command is run with the --until-complete option, the process will archive rows in batches in a tight loop, which can cause problems in busy environments where the aggressive archiving interferes with other requests trying to write to the database. The --sleep option can be used to specify a time to sleep between batches of rows while archiving with --until-complete, allowing the process to be throttled.
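    For example, a full archive run throttled with a five-second pause between batches could look like this (the sleep value is illustrative):

      $ nova-manage db archive_deleted_rows --until-complete --sleep 5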

  • A --task-log option has been added to the nova-manage db archive_deleted_rows CLI. When --task-log is specified, task_log table records will be archived while archiving the database. The --task-log option works in conjunction with --before if operators desire archiving only records that are older than <date>. The updated_at field is used by --task-log --before <date> to determine the age of a task_log record for archival.

    The task_log database table contains instance usage audit records if nova-compute has been configured with [DEFAULT]instance_usage_audit = True. This will be the case if OpenStack Telemetry is being used in the deployment, as the option causes Nova to generate audit notifications that Telemetry consumes from the message bus.

    Usage data can also be later retrieved by calling the /os-instance_usage_audit_log REST API [1].

    Historically, there has been no way to delete task_log table records other than manual database modification. Because of this, task_log records could pile up over time and operators were forced to perform manual steps to periodically truncate the task_log table.

    [1] https://docs.openstack.org/api-ref/compute/#server-usage-audit-log-os-instance-usage-audit-log
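    As a sketch, archiving task_log records older than a given date alongside the rest of the database could look like this (the date is illustrative):

      $ nova-manage db archive_deleted_rows --task-log --before 2021-06-01 --until-complete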

  • A new configuration option is now available for supporting PCI devices that use the VFIO-mdev kernel framework and are stateless. Instead of using the VGPU resource class for both the inventory and the related allocations, the operator can now ask for another custom resource class for a specific mdev type by using the dynamic mdev_class option.
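    As a sketch, a nova.conf fragment requesting a custom resource class for one mdev type might look like this (the type names, PCI address, and CUSTOM_MTTY class are illustrative):

      [devices]
      enabled_mdev_types = nvidia-35, mtty-2

      [mdev_mtty-2]
      device_addresses = 0000:84:00.0
      mdev_class = CUSTOM_MTTY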

  • When using the libvirt virt driver with the QEMU or KVM backends, instances will now be created with the vmcoreinfo feature enabled by default. This creates a fw_cfg entry for a guest to store dump details, necessary to process kernel dump with KASLR enabled and providing additional kernel details. For more information, refer to the libvirt documentation.

  • The 2.90 microversion has been added. This microversion allows users to specify a requested hostname to be configured for the instance metadata when creating an instance (POST /servers), updating an instance (PUT /servers/{id}), or rebuilding an instance (POST /servers/{server_id}/action (rebuild)). When specified, this hostname replaces the hostname that nova auto-generates from the instance display name. As with the auto-generated hostnames, a service such as cloud-init can automatically configure the hostname in the guest OS using this information retrieved from the metadata service.

    In addition, starting with the 2.90 microversion, the OS-EXT-SRV-ATTR:hostname field is now returned for all users. Previously this was restricted to admin users.
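    For example, assuming a python-openstackclient release new enough to expose the --hostname option (the image, flavor, and network names are illustrative):

      $ openstack --os-compute-api-version 2.90 server create \
          --image cirros --flavor m1.tiny --network private \
          --hostname db01 my-server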

  • Add support for the bochs libvirt video model. This is a legacy-free video model that is best suited for UEFI guests. In limited cases (e.g. if the guest does not depend on direct VGA hardware access), it can be usable for BIOS guests as well.
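    The model is requested through the existing hw_video_model image metadata property, for example (the image UUID is a placeholder):

      $ openstack image set --property hw_video_model=bochs <image_uuid>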

  • Add support for SmartNICs via Cyborg device profiles in Neutron ports with vnic type accelerator-direct. When such a port is used, Cyborg will manage the SmartNIC and Nova will pass through the SmartNIC VF to the server. Note that while the vnic type accelerator-direct-physical also exists in Neutron, it is not yet supported by Nova and a server create request with such a port will fail.
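    A sketch of booting a server with such a port, assuming a client new enough to expose the port's device_profile attribute (all names are illustrative):

      $ openstack port create --network private --vnic-type accelerator-direct \
          --device-profile my-smartnic-profile smartnic-port
      $ openstack server create --flavor m1.small --image cirros \
          --port smartnic-port smartnic-server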

Known Issues

  • Linux guest images that have known kernel bugs related to virtualized APIC initialization would previously hang sporadically. For images where the kernel cannot be upgraded, a [workarounds] config option has been introduced:

    [workarounds]libvirt_disable_apic

    This option is primarily intended for CI and development clouds as a bridge for operators to mitigate the issue while they work with their upstream image vendors.
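    For example, in nova.conf on the affected compute nodes:

      [workarounds]
      libvirt_disable_apic = True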

Upgrade Notes

  • As part of the fix for bug 1910466, code that attempted to optimize VM CPU thread assignment based on the host CPU topology has been removed, as it was determined to be buggy, undocumented, and to reject valid virtual CPU topologies while also producing different behavior when CPU pinning was enabled versus disabled. The optimization may be reintroduced in the future with a more generic implementation that works for both pinned and unpinned VMs.

  • The return code of a few APIs was not consistent for operations or features that are not implemented or supported: 403, 400, or 409 was returned (the latter for the Operation Not Supported For SEV and Operation Not Supported For VTPM cases). These APIs have now been made consistent and always return 400 when an operation or feature is not implemented or supported.

  • Support for automatically retrying all database interactions by configuring the [database] use_db_reconnect config option has been removed. This behavior was only ever supported for interactions with the main database and was generally not necessary as a number of lookups were already explicitly wrapped in retries. The [database] use_db_reconnect option is provided by oslo.db and will now be ignored by nova.

  • Experimental support for thread pooling of DB API calls has been removed. This feature was first introduced in the 2014.2 (Juno) release but has not graduated to fully-supported status since nor was it being used for any API DB calls. The [oslo_db] use_tpool config option used to enable this feature will now be ignored by nova.

  • The [workarounds]disable_native_luksv1 workaround config option has been removed after previously being deprecated during the Wallaby (23.0.0) release.

  • The [workarounds]rbd_volume_local_attach workaround config option has been removed after previously being deprecated in the Wallaby (23.0.0) release.

  • A number of scheduler-related config options were renamed during the 15.0.0 (Ocata) release. The deprecated aliases have now been removed. These are:

    • [DEFAULT] scheduler_max_attempts (now [scheduler] max_attempts)

    • [DEFAULT] scheduler_host_subset_size (now [scheduler] host_subset_size)

    • [DEFAULT] max_io_ops_per_host (now [scheduler] max_io_ops_per_host)

    • [DEFAULT] max_instances_per_host (now [scheduler] max_instances_per_host)

    • [DEFAULT] scheduler_tracks_instance_changes (now [scheduler] track_instance_changes)

    • [DEFAULT] scheduler_available_filters (now [scheduler] available_filters)

  • Nova now requires that the Placement API supports at least microversion 1.36, added in Train. The related nova-status upgrade check has been modified to warn if this prerequisite is not fulfilled.

  • The database migration engine has changed from sqlalchemy-migrate to alembic. For most deployments this should have minimal to no impact and the switch should be mostly transparent. The main user-facing impact is the change in schema versioning. While sqlalchemy-migrate used a linear, integer-based versioning scheme, which required placeholder migrations to allow for potential migration backports, alembic uses a distributed version control-like scheme where a migration’s ancestor is encoded in the file and branches are possible. The alembic migration files therefore use an arbitrary UUID-like naming scheme, and the nova-manage db sync and nova-manage api_db sync commands now expect such a version when manually specifying the version that should be applied. For example:

    $ nova-manage db sync 8f2f1571d55b
    

    It is no longer possible to specify an sqlalchemy-migrate-based version. When the nova-manage db sync and nova-manage api_db sync commands are run, all remaining sqlalchemy-migrate-based migrations will be automatically applied. Attempting to specify an sqlalchemy-migrate-based version will result in an error.

Deprecation Notes

  • The AvailabilityZoneFilter scheduler filter is now deprecated for removal in a future release. The functionality of the AvailabilityZoneFilter has been replaced by the map_az_to_placement_aggregate pre-filter which was introduced in 18.0.0 (Rocky). This pre-filter is now enabled by default and will be mandatory in a future release.

  • The existing config options in the [devices] group for managing virtual GPUs have been renamed in order to be more generic, since the mediated devices framework from the Linux kernel can support other devices:

    • enabled_vgpu_types is now deprecated in favour of enabled_mdev_types

    • Dynamic configuration groups called [vgpu_*] are now deprecated in favour of [mdev_*]

    Support for the deprecated options will be removed in a future release.

  • The os_compute_api:os-extended-server-attributes policy controls which users can see a number of extended server attributes. Configuring the visibility of the OS-EXT-SRV-ATTR:hostname attribute via this policy has now been deprecated and will be removed in a future release. Upon removal, this attribute will be shown to all users regardless of policy configuration.

Security Issues

  • A vulnerability in the console proxies (novnc, serial, spice) that allowed open redirection has been patched. The novnc, serial, and spice console proxies are implemented as websockify servers and the request handler inherits from the python standard SimpleHTTPRequestHandler. There is a known issue in the SimpleHTTPRequestHandler which allows open redirects by way of URLs in the following format:

    http://vncproxy.my.domain.com//example.com/%2F..
    

    which if visited, will redirect a user to example.com.

    The novnc, serial, and spice console proxies will now reject requests that pass a redirection URL beginning with “//” with a 400 Bad Request.
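    The rejected pattern can be approximated with a simple prefix check; the sketch below is illustrative only and is not the proxies' actual implementation:

```shell
# Illustrative guard: a request path beginning with "//" is treated as an
# open-redirect attempt and refused, mirroring the proxies' new behavior.
path='//example.com/%2F..'
case "$path" in
  //*) echo "400 Bad Request" ;;
  *)   echo "OK" ;;
esac
```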

  • In this release OVS port creation has been delegated to os-vif when the noop or openvswitch security group firewall drivers are enabled in Neutron. Those options, and others that disable the hybrid_plug mechanism, will now use os-vif instead of libvirt to plug VIFs into the bridge. By delegating port plugging to os-vif we can use the isolate_vif config option to ensure VIFs are plugged securely preventing guests from accessing other tenants’ networks before the neutron ovs agent can wire up the port. See bug #1734320 for details. Note that OVN, ODL and other SDN solutions also use hybrid_plug=false but they are not known to be affected by the security issue caused by the previous behavior. As such the isolate_vif os-vif config option is only used when deploying with ml2/ovs.
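    With ml2/ovs, the corresponding os-vif option lives in the [os_vif_ovs] group of nova.conf, for example:

      [os_vif_ovs]
      isolate_vif = True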

Bug Fixes

  • Improved detection of anti-affinity policy violations when performing live and cold migrations. Most violations caused by races between concurrent live or cold migrations should now be addressed by extra checks in the compute service. Upon detection, cold migration operations are automatically rescheduled. Live migrations have two checks: a violation caught by the first check causes the migration to be rescheduled, while one caught by the second causes the live migration to fail cleanly and the instance state to revert to its previous value.

  • Bug 1851545, wherein unshelving an instance with SRIOV Neutron ports did not update the port binding’s pci_slot and could cause libvirt PCI conflicts, has been fixed.

    Important

    Constraints in the fix’s implementation mean that it only applies to instances booted after it has been applied. Existing instances will still experience bug 1851545 after being shelved and unshelved, even with the fix applied.

  • Fixes an issue with multiple nova-compute services used with Ironic, where a rebalance operation could result in a compute node being deleted from the database and not recreated. See bug 1853009 for details.

  • The nova libvirt driver supports two independent features, virtual CPU topologies and virtual NUMA topologies. Previously, when hw:cpu_max_sockets, hw:cpu_max_cores and hw:cpu_max_threads were specified for pinned instances (hw:cpu_policy=dedicated) without explicit hw:cpu_sockets, hw:cpu_cores, hw:cpu_threads extra specs or their image equivalent, nova failed to generate a valid virtual CPU topology. This has now been fixed and it is now possible to use max CPU constraints with pinned instances. e.g. a combination of hw:numa_nodes=2, hw:cpu_max_sockets=2, hw:cpu_max_cores=2, hw:cpu_max_threads=8 and hw:cpu_policy=dedicated can now generate a valid topology using a flavor with 8 vCPUs.
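    The combination above can be expressed as flavor extra specs, for example (the flavor name and sizing are illustrative):

      $ openstack flavor create --vcpus 8 --ram 8192 --disk 20 pinned.large
      $ openstack flavor set pinned.large \
          --property hw:cpu_policy=dedicated \
          --property hw:numa_nodes=2 \
          --property hw:cpu_max_sockets=2 \
          --property hw:cpu_max_cores=2 \
          --property hw:cpu_max_threads=8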

  • Addressed an issue that prevented instances with 1 vCPU using the multiqueue feature from being created successfully when their vif_type is TAP.

  • On some hardware platforms, an SR-IOV virtual function for a NIC port may exist without being associated with a parent physical function that has an associated netdev. In such a case the PF interface name lookup will fail. Because the PciDeviceNotFoundById exception was not handled, this would prevent the nova-compute agent from starting on affected hardware. See https://bugs.launchpad.net/nova/+bug/1915255 for more details. This edge case has now been addressed; however, features that depend on the PF name, such as minimum bandwidth based QoS, cannot be supported on these platforms.

  • In this release we delegate port plugging to os-vif for all OVS interface types. This allows os-vif to create the OVS port before libvirt creates a tap device during a live migration, preventing the loss of the MAC learning frames generated by QEMU. This resolves a long-standing race condition between libvirt creating the OVS port, Neutron wiring up the OVS port, and QEMU generating RARP packets to populate the vswitch MAC learning table. As a result the interval during a live migration in which packets can be lost is reduced. See bug #1815989 for details.

  • To fix device detach issues in the libvirt driver, the detach logic has been changed from a sleep-based retry loop to waiting for libvirt domain events. During this change we also introduced two new config options to allow fine-tuning the retry logic. For details, see the description of the new [libvirt]device_detach_attempts and [libvirt]device_detach_timeout config options.
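    For example, in nova.conf (the values shown are illustrative, not recommendations):

      [libvirt]
      device_detach_attempts = 8
      device_detach_timeout = 20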

  • Minimizes a race condition window when using the ironic virt driver where the data generated for the Resource Tracker may attempt to compare potentially stale instance information with the latest known baremetal node information. While this doesn’t completely prevent nor resolve the underlying race condition identified in bug 1841481, this change allows Nova to have the latest state information, as opposed to state information which may be out of date due to the time which it may take to retrieve the status from Ironic. This issue was most observable on baremetal clusters with several thousand physical nodes.