Victoria Series Release Notes
Fixes slow compute restart when using the
nova.virt.ironic compute driver, where the driver previously attempted to attach VIFs on start-up via the
plug_vifs driver method. This method has otherwise gone unused since the introduction of the
attach_interface method of attaching VIFs, as Ironic manages the attachment of VIFs to baremetal nodes in order to align with the security requirements of a physical baremetal node’s lifecycle. The ironic driver now ignores calls to the
If the compute service is down on the source node and a user tries to stop an instance, the instance gets stuck in the powering-off task state, so evacuation fails with the message: Cannot ‘evacuate’ instance <instance-id> while it is in task_state powering-off. It is now possible for evacuation to ignore the VM task state. For more details see: bug 1978983
Minimizes a race condition window when using the
ironic virt driver where the data generated for the Resource Tracker may attempt to compare potentially stale instance information with the latest known baremetal node information. While this doesn’t completely prevent nor resolve the underlying race condition identified in bug 1841481, this change allows Nova to have the latest state information, as opposed to state information which may be out of date due to the time which it may take to retrieve the status from Ironic. This issue was most observable on baremetal clusters with several thousand physical nodes.
Support for cold migration and resize between hosts with different network backends was previously incomplete. If the os-vif plugins for all network backends available in the cloud are not installed on all nodes, unplugging will fail while confirming the resize. The issue was caused by the VIF unplug happening during the resize confirm action on the source host, when the original backend information of the VIF was no longer available. The fix moves the unplug to the resize action, when such information is still available. See bug #1895220 for more details.
The libvirt virt driver in Nova implements power on and hard reboot by destroying the domain first and unplugging the vifs, then recreating the domain and replugging the vifs. However, nova does not wait for the network-vif-plugged event before unpausing the domain. This can cause the domain to start running and requesting an IP via DHCP before the networking backend has finished plugging the vifs. The config option [workarounds]wait_for_vif_plugged_event_during_hard_reboot has been added, defaulting to an empty list, that can be used to ensure that the libvirt driver waits for the network-vif-plugged event for vifs with specific
vnic_type before it unpauses the domain during hard reboot. This should only be used if the deployment uses a networking backend that sends such an event for the given
vif_type at vif plug time. The ml2/ovs and the networking-odl Neutron backends are known to send plug time events for ports with
vnic_type. For more information see https://bugs.launchpad.net/nova/+bug/1946729
A vulnerability in the console proxies (novnc, serial, spice) that allowed open redirection has been patched. The novnc, serial, and spice console proxies are implemented as websockify servers and the request handler inherits from the python standard SimpleHTTPRequestHandler. There is a known issue in the SimpleHTTPRequestHandler which allows open redirects by way of URLs in the following format:
which if visited, will redirect a user to example.com.
The novnc, serial, and spice console proxies will now reject requests that pass a redirection URL beginning with “//” with a 400 Bad Request.
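As an illustration of the rejection rule described above, the following minimal sketch shows how a handler derived from SimpleHTTPRequestHandler could refuse such requests. It is not Nova's actual code: the class name SafeProxyRequestHandler and the helper is_open_redirect_path are hypothetical names chosen for this example.

```python
from http import HTTPStatus
from http.server import SimpleHTTPRequestHandler


def is_open_redirect_path(path):
    """Return True if the request path begins with "//".

    Hypothetical helper: a path starting with "//" can end up in a
    redirect that browsers read as an absolute URL, so it is rejected.
    """
    return path.startswith("//")


class SafeProxyRequestHandler(SimpleHTTPRequestHandler):
    """Hypothetical handler that answers "//..." paths with 400."""

    def send_head(self):
        # Reject the request before the base class can emit a redirect.
        if is_open_redirect_path(self.path):
            self.send_error(HTTPStatus.BAD_REQUEST,
                            "URI must not start with //")
            return None
        return super().send_head()
```

Performing the check in send_head covers both GET and HEAD requests, since the base class routes both through that method.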
Addressed an issue that prevented instances with 1 vcpu using multiqueue feature from being created successfully when their vif_type is TAP.
Improved detection of anti-affinity policy violation when performing live and cold migrations. Most of the violations caused by race conditions due to performing concurrent live or cold migrations should now be addressed by extra checks in the compute service. Upon detection, cold migration operations are automatically rescheduled, while live migrations have two checks and will be rescheduled if detected by the first one, otherwise the live migration will fail cleanly and revert the instance state back to its previous value.
The libvirt virt driver will no longer attempt to fetch volume encryption metadata or the associated secret key when attaching
LUKSv1 encrypted volumes if a libvirt secret already exists on the host.
This resolves bug 1905701 where instances with
LUKSv1 encrypted volumes could not be restarted automatically by the
nova-compute service after a host reboot when the
[DEFAULT]/resume_guests_state_on_host_boot configurable was enabled.
When the tempest test coverage was added for resize and cold migrate with neutron ports having QoS minimum bandwidth policy rules we discovered that the cross cell resize code path cannot handle such ports. See bug https://bugs.launchpad.net/nova/+bug/1907522 for details. A fix was implemented that makes sure that Nova falls back to same-cell resize if the server has such ports.
Nova services only support old computes if the compute is not older than the previous major nova release. From now on nova services will emit a warning at startup if the deployment contains too old compute services. From the 23.0.0 (Wallaby) release nova services will refuse to start if the deployment contains too old compute services to prevent compatibility issues.
Fixes bug 1892361 in which the pci stat pools are not updated when an existing device is enabled with SRIOV capability. Restarting the nova-compute service updates the pci device type from type-PCI to type-PF, but the pools still maintain the device type as type-PCI, so the PF is considered for allocation to instances that request vnic_type=direct. With this fix, the pci device type updates are detected and the pci stat pools are updated properly.
When upgrading compute services from Ussuri to Victoria one by one, the Compute RPC API was pinned to 5.11 (either automatically or by using the specific rpc version in the option), but when rebuilding an instance, a TypeError was raised as an argument was not provided. This error is now fixed; see bug 1902925.
The 22.0.0 release includes many new features and bug fixes. Please be sure to read the upgrade section which describes the required actions to upgrade your cloud from 21.0.0 (Ussuri) to 22.0.0 (Victoria).
There are a few major changes worth mentioning. This is not an exhaustive list:
The latest Compute API microversion supported for Victoria is v2.87. No new microversions were added during this cycle but you can find all of them in the REST API Version History page.
Support for a new
mixed flavor CPU allocation policy that allows both pinned and floating CPUs within the same instance.
Custom Placement resource inventories and traits can now be described using a single providers configuration file.
Glance multistore configuration with multiple RBD backends is now supported within Nova for libvirt RBD-backed images using
An emulated Virtual Trusted Platform Module can be exposed to instances running on a
Support for the
parallels libvirt backends has been deprecated.
The XenAPI virt driver has been removed, including the related configuration options.
The VMWare virt driver is now supported again in Victoria after being deprecated during the Ussuri release, as testing issues have been addressed.
It is now possible to allocate all cores in an instance to realtime and omit the
hw:cpu_realtime_mask extra spec. This requires specifying the
It is now possible to specify a mask in
hw:cpu_realtime_mask without a leading
^. When this is omitted, the value will specify the cores that should be included in the set of realtime cores, as opposed to those that should be excluded.
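To make the include/exclude distinction concrete, here is a minimal sketch of how such a mask could be interpreted. It is an illustration only, not Nova's parser: the function names are hypothetical, and the comma-separated list with "a-b" ranges is an assumed mask syntax.

```python
def _expand(item):
    """Expand "3" or "0-2" (optionally prefixed with "^") into core IDs."""
    item = item.strip().lstrip("^")
    if "-" in item:
        start, end = item.split("-", 1)
        return set(range(int(start), int(end) + 1))
    return {int(item)}


def realtime_cores(mask, vcpus):
    """Return the set of realtime cores selected by mask.

    Hypothetical helper. A mask with a leading "^" lists the cores to
    exclude; without it, the mask lists the cores to include.
    """
    listed = set()
    for item in mask.split(","):
        if item.strip():
            listed |= _expand(item)

    all_cores = set(range(vcpus))
    # Reject cores outside the range of valid instance cores.
    if not listed <= all_cores:
        raise ValueError("mask refers to cores outside 0-%d" % (vcpus - 1))

    if mask.strip().startswith("^"):
        return all_cores - listed
    return listed
```

With four instance cores, the masks "^0-1" and "2-3" select the same realtime set, cores 2 and 3.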
Nova now supports adding an emulated virtual Trusted Platform Module to libvirt guests with a
qemu. Not all server operations are fully supported yet. See the documentation for details.
The [glance]/enable_rbd_download config option was introduced. The option allows for the configuration of direct downloads of Ceph hosted glance images into the libvirt image cache via rbd when
[glance]/rbd_ceph_conf are correctly configured.
Added a --force option to the
nova-manage placement heal_allocations command to forcefully heal allocations for a specific instance.
The libvirt RBD image backend module can now handle a Glance multistore environment where multiple RBD clusters are in use across a single Nova/Glance deployment, configured as independent Glance stores. In the case where an instance is booted with an image that does not exist in the RBD cluster that Nova is configured to use, Nova can ask Glance to copy the image from whatever store it is currently in to the one that represents its RBD cluster. To enable this feature, set
[libvirt]/images_rbd_glance_store_name to tell Nova the Glance store name of the RBD cluster it uses.
A new configuration option,
[DEFAULT]/max_concurrent_snapshots, has been added. This allows operators to configure the maximum number of concurrent snapshots on a compute host and prevent resource overuse related to snapshots.
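One common way to enforce this kind of limit is a counting semaphore. The sketch below illustrates the idea and is not Nova's implementation: the names MAX_CONCURRENT_SNAPSHOTS, _snapshot_slots and run_snapshot are hypothetical, and the value 5 mirrors the default stated elsewhere in these notes.

```python
import threading

# Mirrors the documented default of [DEFAULT]/max_concurrent_snapshots.
MAX_CONCURRENT_SNAPSHOTS = 5

# Hypothetical limiter shared by all snapshot requests on the host.
_snapshot_slots = threading.BoundedSemaphore(MAX_CONCURRENT_SNAPSHOTS)


def run_snapshot(take_snapshot):
    """Run take_snapshot() while holding one of the limited slots.

    Callers beyond the limit block here until a slot is released.
    """
    with _snapshot_slots:
        return take_snapshot()
```

Requests above the limit are queued rather than rejected, so every snapshot still runs once a slot becomes free.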
Nova now supports defining additional resource provider traits and inventories by way of YAML configuration files. The location of these files is defined by the new config option
[compute]provider_config_location. Nova will look in this directory for
*.yaml files. See the specification and admin guide for more details.
Add the ability to use
vmxnet3 NIC on a host using the QEMU/KVM driver. This allows the migration of an ESXi VM to QEMU/KVM, without any driver changes.
vmxnet3 comes with better performance and lower latency compared to an emulated driver like
Added the options [libvirt]/rbd_destroy_volume_retries, defaulting to 12, and
[libvirt]/rbd_destroy_volume_retry_interval, defaulting to 5, that Nova will use when trying to remove a volume from Ceph in a retry loop that combines these parameters. Thus, the maximum elapsed time is 60 seconds by default.
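The 60 second figure follows from 12 attempts × 5 seconds. The sketch below shows one way such a retry loop could combine the two parameters. It is an assumption-laden illustration, not Nova's code: remove_with_retries, max_wait_seconds and the remove callable are hypothetical, and the loop is assumed to sleep after every failed attempt.

```python
import time

RETRIES = 12        # documented default of rbd_destroy_volume_retries
RETRY_INTERVAL = 5  # documented default of rbd_destroy_volume_retry_interval


def remove_with_retries(remove, retries=RETRIES, interval=RETRY_INTERVAL,
                        sleep=time.sleep):
    """Call remove() until it returns True or the attempts run out.

    Returns True on success and False once every attempt has failed.
    The sleep function is injectable so the loop can be tested quickly.
    """
    for _ in range(retries):
        if remove():
            return True
        sleep(interval)
    return False


def max_wait_seconds(retries=RETRIES, interval=RETRY_INTERVAL):
    """Upper bound on the time spent waiting between attempts."""
    return retries * interval
```

With the defaults, max_wait_seconds() evaluates to 12 × 5 = 60.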
Nova now supports attaching and detaching PCI device backed Neutron ports to running servers.
Add the mixed instance CPU allocation policy for instance mixing with both
VCPU resources. This is useful for applications that wish to schedule the CPU intensive workload on the
PCPU and the other workloads on
VCPU. The mixed policy avoids the need to pin all instance CPUs and, as a result, reduces the consumption of pinned CPUs and increases the instance density.
Extend the real-time instance with the
mixed CPU allocation policy. In comparison with a
dedicated policy real-time instance, the non-real-time CPUs are no longer required to be pinned on dedicated host CPUs, but float on a range of host CPUs shared with other instances.
Add the extra spec
hw:cpu_dedicated_mask to set the pinned CPUs for the mixed instance. This is a core mask and can be used to include or exclude CPUs. Any core not included or explicitly excluded is treated as a shared CPU.
Export instance pinned CPU list through the
dedicated_cpus section in the metadata service API.
bug 1894804 documents a known device detachment issue with QEMU
4.2.0 as shipped by the Focal
20.04 Ubuntu release. This can lead to the failure to detach devices from the underlying libvirt domain of an instance as QEMU never emits the correct
DEVICE_DELETED event to libvirt. This in turn leaves the device attached within libvirt and OpenStack Nova while it has been detached from the underlying QEMU process. Subsequent attempts to detach the device will also fail as it is no longer found within the QEMU process.
There is no known workaround within OpenStack Nova to this issue.
All APIs except deprecated APIs were modified to implement
scope_type and use new defaults in 21.0.0 (Ussuri). The remaining APIs have now been updated.
Refer to the Nova Policy Concepts for details and migration plan.
Nova policies implemented the
scope_type and new defaults provided by keystone. Old defaults are deprecated and still work if rules are not overridden in the policy file. If you don’t override any policies at all, then you don’t need to do anything different until the W release when old deprecated rules are removed and tokens need to be scoped to work with new defaults and scope of policies. For migration to new policies you can refer to this document.
If you are overwriting the policy rules (all or some of them) in the policy file with new default values or any new value that requires scoped tokens, then non-scoped tokens will not work. Also if you generate the policy file with ‘oslopolicy-sample-generator’ json format or any other tool, you will get rules defaulted in the new format, which examines the token scope. Unless you turn on
oslo_policy.enforce_scope, scope-checking rules will fail. Thus, be sure to enable
oslo_policy.enforce_scope and educate end users on how to request scoped tokens from Keystone, or use a pre-existing sample config file from the Train release until you are ready to migrate to scoped policies. Another way is to generate the policy file in yaml format as described here and update the policy.yaml location in
For more background about the possible problem, check this bug. An upgrade check has been added to the
nova-status upgrade check command for this.
The default value of
[oslo_policy] policy_file config option has been changed from
policy.yaml. The new Nova policy defaults introduced in 21.0.0 and the current default value of the
[oslo_policy] policy_file config option (
policy.json) do not work together when
policy.json is generated by the oslopolicy-sample-generator tool. Refer to bug 1875418 for more details. Also check the oslopolicy-convert-json-to-yaml tool to convert the JSON policy file to YAML in a backward compatible way.
When using file-backed memory, the
nova-compute service will now fail to start if the amount of reserved memory configured using
[DEFAULT] reserved_host_memory_mb is equal to or greater than the total amount of memory configured using
[libvirt] file_backed_memory. Where reserved memory is less than the total amount of memory configured, a warning will be raised. This warning will become an error in a future release.
The former combination is invalid as it would suggest reserved memory is greater than total memory available, while the latter is considered incorrect behavior as reserving of file-backed memory can and should be achieved by reducing the filespace allocated as memory by modifying
The default for
[glance] num_retries has changed from
3. The option controls how many times to retry a Glance API call in response to an HTTP connection failure. When deploying Glance behind HAproxy it is possible for a response to arrive just after the HAproxy idle time. As a result, an exception will be raised when the connection is closed, resulting in a failed request. By increasing the default value, Nova can be more resilient to this scenario, where HAproxy is misconfigured, by retrying the request.
Previously, the number of concurrent snapshots was unlimited; it is now limited via
[DEFAULT]/max_concurrent_snapshots, which currently defaults to 5.
Support for hooks has been removed. In previous versions of nova, these provided a mechanism to extend nova with custom code through a plugin mechanism. However, they were deprecated in 13.0.0 (Mitaka) as unmaintainable long-term. Versioned notifications and vendordata should be used instead. For more information, refer to this thread.
The nova.image.download entry point hook has been removed, per the deprecation announcement in the 17.0.0 (Queens) release.
Intel CMT perf events -
mbml - are no longer supported by the
[libvirt] enabled_perf_events config option. These event types were broken by design and are not supported in recent Linux kernels (4.14+).
[spice] keymap configuration options, first deprecated in 18.0.0 (Rocky), have now been removed. The VNC option affected the libvirt and VMWare virt drivers, while the SPICE option only affected libvirt. For the libvirt driver, configuring these options resulted in lossy keymap conversions for the given graphics method. Users can replace this host-level configuration with guest-level configuration. This requires noVNC 1.0.0 or greater, which provides support for QEMU’s Extended Key Event messages. Refer to bug #1682020 and the QEMU RFB pull request for more information.
For the VMWare driver, only the VNC option applied. However, the
[vmware] vnc_keymap option was introduced in 18.0.0 (Rocky) and can be used to replace
The following deprecated scheduler filters have been removed.
Deprecated in Train (20.0.0). The RetryFilter has not been required since Queens, following the completion of the return-alternate-hosts blueprint.
- Aggregatefilter, AggregateRAMFilter, AggregateDiskFilter
Deprecated in Train (20.0.0). These filters have not worked correctly since the introduction of placement in Ocata.
On upgrade, operators should ensure they have not configured any of the now removed filters and should instead use placement to control cpu, ram and disk allocation ratios.
Refer to the config reference documentation for more information.
The XenAPI driver, which was deprecated in 20.0.0 (Train), has now been removed.
The following config options only apply when using the
XenAPI virt driver, which has now been removed. The config options have therefore also been removed.
The minimum required version of libvirt used by the nova-compute service is now 5.0.0. The minimum required version of QEMU used by the nova-compute service is now 4.0.0. Failing to meet these minimum versions when using the libvirt compute driver will result in the nova-compute service not starting.
Support for the
parallels libvirt backends, configured via the
[libvirt] virt_type config option, has been deprecated. None of these drivers have upstream testing and the
uml backends specifically have never been considered production ready. With this change, only the
qemu backends are considered supported when using the libvirt virt driver.
The vmwareapi driver was deprecated in Ussuri due to missing third-party CI coverage and a clear maintainer. These issues have been addressed during the Victoria cycle and the driver is now undeprecated.
Since Libvirt v.1.12.0 and the introduction of the libvirt issue, setting a cache mode whose write semantic is not O_DIRECT (i.e. “unsafe”, “writeback” or “writethrough”) causes a problem with the volume drivers (e.g. LibvirtISCSIVolumeDriver, LibvirtNFSVolumeDriver and so on), which designate native io explicitly.
When the driver_cache (default is none) has been configured as neither “none” nor “directsync”, the libvirt driver will set driver_io to “threads” to avoid an instance spawning failure.
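The rule described above can be sketched as a small selection function. This is a hypothetical illustration of the stated behaviour, not the libvirt driver's code; the function name and the O_DIRECT_CACHE_MODES constant are assumptions made for the example.

```python
# Cache modes with O_DIRECT write semantics, per the note above.
O_DIRECT_CACHE_MODES = ("none", "directsync")


def effective_driver_io(driver_cache="none", driver_io="native"):
    """Return the io mode to use for a disk.

    Hypothetical helper: native io is only kept when the cache mode is
    "none" or "directsync"; any other cache mode forces "threads".
    """
    if driver_cache not in O_DIRECT_CACHE_MODES:
        return "threads"
    return driver_io
```

For example, a volume driver that asks for native io with a "writeback" cache mode would end up with "threads", while the same request with the default "none" cache mode keeps native io.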
Add support for the
hw:hide_hypervisor_id extra spec. This is an alias for the
hide_hypervisor_id extra spec, which was not compatible with the
AggregateInstanceExtraSpecsFilter scheduler filter. See bug 1841932 for more details.
This release contains a fix for bug 1874032 which delegates snapshot upload to a dedicated thread. This ensures nova compute service stability in busy environments during snapshots, when concurrent snapshots or any other tasks slow down storage performance.
Bug 1875418 is fixed by changing the default value of
[oslo_policy] policy_file config option to YAML format.
The [workarounds]/reserve_disk_resource_for_image_cache config option was added to fix bug 1878024, where the images in the compute image cache overallocate the local disk. If this new config option is set, the libvirt driver will reserve DISK_GB resources in placement based on the actual disk usage of the image cache.
Previously, attempting to configure an instance with the
VirtualE1000e VIF types on a host using the QEMU/KVM driver would result in an incorrect
UnsupportedHardware exception. These interfaces are now correctly marked as supported.
Previously, it was possible to specify values for the
hw:cpu_realtime_mask extra spec that were not within the range of valid instance cores. This value is now correctly validated.
Bug #1888022: An issue that prevented detach of multi-attached fs-based volumes is resolved.
An issue that could result in instances with the
isolate thread policy (
hw:cpu_thread_policy=isolate) being scheduled to hosts with SMT (HyperThreading) and consuming
PCPU has been resolved. See bug #1889633 for more information.
Resolved a race condition that may occur during concurrent
interface detach/attach, resulting in an interface being accidentally unbound after it is attached. See bug 1892870 for more details.
Addressed an issue that prevented instances using multiqueue feature from being created successfully when their vif_type is TAP.
Resolved an issue whereby providing an empty list for the
policies field in the request body of the
POST /os-server-groups API would result in a server error. This only affects the 2.1 to 2.63 microversions, as the 2.64 microversion replaces the
policies list field with a
policy string field. See bug #1894966 for more information.
Since the 16.0.0 (Pike) release, nova has collected NIC feature flags via libvirt. To look up the NIC feature flags for a whitelisted PCI device the nova libvirt driver computed the libvirt nodedev name by rendering a format string using the netdev name associated with the interface and its current MAC address. In some environments the libvirt nodedev list can become out of sync with the current MAC address assigned to a netdev and as a result the nodedev look up can fail. Nova now uses PCI addresses, rather than MAC addresses, to look up these PCI network devices.
Previously, Nova tried to remove a volume from Ceph in a retry loop of 10 attempts at 1 second intervals, totaling 10 seconds overall, which, due to the 30 second ceph watcher timeout, might result in intermittent object removal failures on the Ceph side (bug 1856845). Setting default values for
[libvirt]/rbd_destroy_volume_retries to 12 and
[libvirt]/rbd_destroy_volume_retry_interval to 5 now gives Ceph a reasonable amount of time to complete the operation successfully.
In the Rocky (18.0.0) release support was added to nova to use neutron’s multiple port binding feature when the binding-extended API extension is available. In the Train (20.0.0) release the SR-IOV live migration feature broke the semantics of the vifs field in the
migration_data object that signals if the new multiple port binding workflow should be used by always populating it even when the
binding-extended API extension is not present. This broke live migration for any deployment that did not support the optional
binding-extended API extension. The Rocky behavior has now been restored, enabling live migration using the single port binding workflow when multiple port bindings are not available.