Zed Series Release Notes

26.0.0-4

Other Notes

  • A workaround has been added to the libvirt driver to catch and pass migrations that were previously failing with the error:

    libvirt.libvirtError: internal error: migration was active, but no RAM info was set

    See bug 1982284 for more details.

26.0.0

Prelude

The 26.0.0 release includes many new features and bug fixes. Please be sure to read the upgrade section which describes the required actions to upgrade your cloud from 25.0.0 (Yoga) to 26.0.0 (Zed).

There are a few major changes worth mentioning. This is not an exhaustive list:

  • The latest Compute API microversion supported for Zed is v2.93.

  • Virtual IOMMU devices can now be created and attached to an instance when running on an x86 host and using the libvirt driver.

  • Improved behavior for Windows guests: the following Hyper-V enlightenments are now added by default on all libvirt guests: vpindex, runtime, synic, reset, frequencies, reenlightenment, tlbflush, ipi and evmc.

  • All lifecycle actions are now fully supported for instances with vDPA ports, including vDPA hot-plug live migration, suspend and attach/detach.

  • Volume-backed instances (instances with root disk attached as a volume) can now be rebuilt by specifying microversion 2.93, rather than the request being rejected with an HTTP 400 error.

  • The unshelve instance API action now accepts a new host parameter with microversion 2.91 (admins only).

  • With microversion 2.92, you can only import a public key and not generate a keypair. You can also use an extended name pattern.

  • The default system scope has been removed from all APIs, completing phase 1 of the new RBAC guidelines, which are opt-in.

New Features

  • Added support for rebuilding a volume-backed instance with a different image. This is achieved by reimaging the boot volume, i.e. writing the new image to the boot volume on the Cinder side. Previously, rebuilding a volume-backed instance was possible only with the same image; this feature allows rebuilding with an image different from the one in the boot volume. This is supported starting from API microversion 2.93.

  • The 2.92 microversion makes the following changes:

    • Make public_key a mandatory parameter for keypair creation. From this microversion on, Nova no longer supports automatic keypair generation; only imports are possible.

    • Allow 2 new special characters in keypair names: ‘@’ and ‘.’ (dot), in addition to the existing constraints of [a-z][A-Z][0-9][_- ]
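
    As an illustration of the 2.92 rules above, the following Python sketch checks a keypair name against the extended character set and builds an import request body. The regular expression and the helper names are assumptions made for this example; they are not Nova's actual validation schema.

```python
import re

# Hypothetical pattern for illustration only: letters, digits, underscore,
# hyphen and space, plus the '@' and '.' characters allowed from 2.92.
KEYPAIR_NAME_2_92 = re.compile(r"[A-Za-z0-9_\- @.]+")


def is_valid_keypair_name(name: str) -> bool:
    """Return True if the whole name uses only the allowed characters."""
    return KEYPAIR_NAME_2_92.fullmatch(name) is not None


def build_keypair_request(name: str, public_key: str) -> dict:
    """Build a keypair import body; public_key is mandatory from 2.92."""
    if not public_key:
        raise ValueError("public_key is required from microversion 2.92")
    if not is_valid_keypair_name(name):
        raise ValueError(f"invalid keypair name: {name!r}")
    return {"keypair": {"name": name, "public_key": public_key}}
```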

  • Nova now tracks PCI devices in Placement. This is an optional feature, disabled by default, while we are implementing inventory tracking and scheduling support for both PCI passthrough devices and SR-IOV devices consumed via Neutron ports. Please read our documentation for more details on what is supported and how this feature can be enabled.

  • Microversion 2.91 adds the optional parameter host to the unshelve server action API. Specifying a destination host is only allowed for admin users, and the server status must be SHELVED_OFFLOADED; otherwise an HTTP 400 (bad request) response is returned. It also allows setting availability_zone to None to unpin a server from an availability zone.
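
    A minimal sketch of how a client might build the 2.91 unshelve request body. The helper name and the sentinel object are hypothetical; only the host and availability_zone parameters come from the note above.

```python
from typing import Optional

# Header used to request a specific Compute API microversion.
MICROVERSION_HEADERS = {"OpenStack-API-Version": "compute 2.91"}

# Sentinel so callers can tell "leave the AZ alone" apart from "unpin" (None).
_UNSET = object()


def build_unshelve_body(host: Optional[str] = None,
                        availability_zone=_UNSET) -> dict:
    """Build the body for the unshelve server action (hypothetical helper)."""
    params = {}
    if host is not None:
        # Admin only; the server must be SHELVED_OFFLOADED.
        params["host"] = host
    if availability_zone is not _UNSET:
        # None serializes to JSON null, which unpins the server from its AZ.
        params["availability_zone"] = availability_zone
    # With no parameters the action keeps its plain form: {"unshelve": null}.
    return {"unshelve": params or None}
```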

  • The Libvirt driver can now add a virtual IOMMU device to all created guests, when running on an x86 host and using the Q35 machine type or on AArch64.

    To enable this, provide hw:viommu_model in the flavor extra specs or the equivalent image metadata property hw_viommu_model. If the guest CPU architecture and OS allow it, the libvirt driver will enable vIOMMU. Supported values are intel, smmuv3, virtio and auto; the default is auto. With auto, virtio is selected automatically if libvirt supports it, otherwise intel on x86 (Q35) and smmuv3 on AArch64. The vIOMMU configuration raises an invalid exception if the guest architecture is neither x86 (Q35) nor AArch64.

    Note that enabling vIOMMU might introduce significant performance overhead; see the performance comparison table in the AMD vIOMMU session at KVM Forum 2021 (https://static.sched.com/hosted_files/kvmforum2021/da/vIOMMU%20KVM%20Forum%202021%20-%20v4.pdf). For this reason, vIOMMU should only be enabled for workflows that require it.
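
    The model selection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the libvirt driver's code: the function name, the architecture strings and the use of ValueError are all hypothetical.

```python
SUPPORTED_MODELS = {"intel", "smmuv3", "virtio", "auto"}


def select_viommu_model(requested: str, arch: str, machine_type: str,
                        libvirt_supports_virtio: bool) -> str:
    """Resolve the vIOMMU model for a guest (hypothetical helper)."""
    if requested not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported vIOMMU model: {requested}")

    # Assumed architecture names; Q35 is matched by substring, e.g. "pc-q35-6.2".
    is_x86_q35 = arch == "x86_64" and "q35" in machine_type
    is_aarch64 = arch == "aarch64"
    if not (is_x86_q35 or is_aarch64):
        raise ValueError("vIOMMU requires x86 (Q35) or AArch64")

    if requested != "auto":
        return requested
    # auto: prefer virtio when libvirt supports it, else pick per architecture.
    if libvirt_supports_virtio:
        return "virtio"
    return "intel" if is_x86_q35 else "smmuv3"
```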

  • Added the new hw:locked_memory extra spec and hw_locked_memory image property to lock memory on libvirt guests. Locking memory marks the guest memory allocations as unmovable and unswappable. Both accept boolean values in string format, such as ‘Yes’ or ‘false’. A LockMemoryForbidden exception is raised if locked memory is requested without also setting either the flavor extra spec hw:mem_page_size or the image property hw_mem_page_size; this ensures that the scheduler can account for the memory correctly and prevents out-of-memory events.
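
    The validation rule above can be sketched as follows. The exception class is a local stand-in for the one named in the note, the set of accepted boolean strings is an assumption, and letting the flavor value take precedence over the image value is a simplification made for this example.

```python
class LockMemoryForbidden(Exception):
    """Local stand-in for the exception named in the release note."""


_TRUE = {"1", "t", "true", "on", "y", "yes"}
_FALSE = {"0", "f", "false", "off", "n", "no"}


def parse_bool(value) -> bool:
    """Parse a boolean given in string format, e.g. 'Yes' or 'false'."""
    lowered = str(value).strip().lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError(f"not a boolean string: {value!r}")


def check_locked_memory(extra_specs: dict, image_props: dict) -> bool:
    """Return whether memory locking is requested, validating page size."""
    # Assumption: the flavor extra spec wins over the image property.
    raw = extra_specs.get("hw:locked_memory",
                          image_props.get("hw_locked_memory", "false"))
    locked = parse_bool(raw)
    has_page_size = ("hw:mem_page_size" in extra_specs
                     or "hw_mem_page_size" in image_props)
    if locked and not has_page_size:
        raise LockMemoryForbidden(
            "locked memory requires hw:mem_page_size or hw_mem_page_size")
    return locked
```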

  • The Nova policies have been modified to drop the system scope. Every API policy is scoped to project. This means that system-scoped users will get a 403 permission denied error.

    Also, the project reader role is ready to use. Users with the reader role can only perform read-only operations within their project. This role can be used for audit purposes.

    Currently, nova supports the following roles:

    • admin (Legacy admin)

    • project member

    • project reader

    For details on what changed from the existing policy, please refer to the new RBAC guidelines. We have implemented only phase 1 of the new RBAC guidelines. Currently, scope checks and new defaults are disabled by default. You can enable them by setting the following config options in the nova.conf file:

    [oslo_policy]
    enforce_new_defaults=True
    enforce_scope=True
    

    We recommend enabling both scope checks and new defaults together; otherwise you may experience some late failures with unclear error messages.

    Please refer to Policy New Defaults for details about the new policy defaults and the migration plan.
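
    Because the two options are best switched together, an operator might want a quick consistency check on a configuration file. The sketch below uses Python's standard configparser module; the function name is hypothetical and this check is not part of Nova or oslo.policy.

```python
import configparser


def policy_flags_consistent(conf_text: str) -> bool:
    """Return True if enforce_new_defaults and enforce_scope agree."""
    # Interpolation is disabled so that '%' characters in values are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Both options are treated as False when absent, matching the note above
    # that scope checks and new defaults are disabled by default.
    new_defaults = parser.getboolean(
        "oslo_policy", "enforce_new_defaults", fallback=False)
    scope = parser.getboolean("oslo_policy", "enforce_scope", fallback=False)
    return new_defaults == scope
```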

  • The following enlightenments are now added by default to the libvirt XML for Windows guests:

    • vpindex

    • runtime

    • synic

    • reset

    • frequencies

    • reenlightenment

    • tlbflush

    • ipi

    • evmc

    This adds to the list of already existing enlightenments, namely:

    • relaxed

    • vapic

    • spinlocks retries

    • vendor_id spoofing

  • vDPA support was first introduced in the 23.0.0 (Wallaby) release with limited instance lifecycle operations. Nova now supports all instance lifecycle operations including suspend, attach/detach and hot-plug live migration.

    QEMU and the Linux kernel do not currently support transparent live migration of vDPA devices. Hot-plug live migration unplugs the vDPA device on the source host before the VM is live migrated and automatically hot-plugs the device on the destination after the migration. While this can lead to packet loss, it enables live migration to be used when needed until transparent live migration can be added in a future release.

    vDPA hot-plug live migration requires all compute services to be upgraded to service level 63 to be enabled. Similarly, suspend/resume needs service level 63 and attach/detach requires service level 62. As such, these operations will not be available during a rolling upgrade but will become available when all hosts are upgraded to the 26.0.0 (Zed) release.

    With the addition of these features, all instance lifecycle operations are now valid for VMs with vDPA neutron ports.
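
    The service level gating described above can be pictured with a short Python sketch. The operation names and the helper are hypothetical; only the service level numbers come from this note.

```python
from typing import Iterable, Set

# Minimum compute service level required by each vDPA operation.
REQUIRED_SERVICE_LEVEL = {
    "hotplug_live_migration": 63,
    "suspend_resume": 63,
    "attach_detach": 62,
}


def available_vdpa_operations(compute_service_levels: Iterable[int]) -> Set[str]:
    """Return the operations usable given every compute's service level."""
    levels = list(compute_service_levels)
    if not levels:
        return set()
    # The oldest compute service decides what the whole deployment can use.
    minimum = min(levels)
    return {op for op, needed in REQUIRED_SERVICE_LEVEL.items()
            if minimum >= needed}
```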

Known Issues

  • Nova’s use of libvirt’s compareCPU() API served its purpose over the years, but its design limitations break live migration in subtle ways. For example, the compareCPU() API compares against the host physical CPUID. Some of the features from this CPUID are not exposed by KVM, and then there are some features that KVM emulates that are not in the host CPUID. The latter can cause bogus live migration failures.

    With QEMU >= 2.9 and libvirt >= 4.4.0, libvirt will do the right thing in terms of CPU compatibility checks on the destination host during live migration. Nova satisfies these minimum version requirements by a good margin. So, this workaround provides a way to skip the CPU comparison check on the destination host before migrating a guest, and let libvirt handle it correctly.

    This workaround will be deprecated and removed once Nova replaces the older libvirt APIs with their newer counterparts. The work is being tracked via this blueprint cpu-selection-with-hypervisor-consideration.

Upgrade Notes

  • During the triage of https://bugs.launchpad.net/nova/+bug/1978372 we compared the performance of nova’s NUMA allocation strategies as applied to large numbers of host and guest NUMA nodes. Prior to Xena, nova only supported a linear packing strategy. In Xena, [compute]/packing_host_numa_cells_allocation_strategy was introduced, maintaining the previous packing behavior by default. The NUMA allocation strategy now defaults to spread. The old behavior can be restored by setting [compute]/packing_host_numa_cells_allocation_strategy to true.
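
    To give an intuition for the difference between the two strategies, the Python sketch below orders host NUMA cells by free memory. It is an illustration only and not Nova's algorithm: the function name and the choice of free memory as the single ordering criterion are assumptions.

```python
from typing import Dict, List


def order_host_cells(free_memory_mb: Dict[int, int], packing: bool) -> List[int]:
    """Return host NUMA cell IDs in the order they would be tried.

    packing=True tries the fullest cells first, so guests are packed
    together; packing=False tries the emptiest cells first, so guests
    are spread out.
    """
    return sorted(free_memory_mb,
                  key=lambda cell: free_memory_mb[cell],
                  reverse=not packing)
```

    For example, with free memory of 1024, 4096 and 2048 MB on cells 0, 1 and 2, the packing order is 0, 2, 1 and the spread order is 1, 2, 0.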

  • The default api-paste.ini file has been updated and now the Metadata API pipeline includes the HTTPProxyToWSGI middleware.

  • Python 3.6 & 3.7 support has been dropped. The minimum version of Python now supported by nova is Python 3.8.

  • In the libvirt driver, the default value of the <cputune><shares> element has been removed, and is now left to libvirt to decide. This is because allowed values are platform dependent, and the previous code was not guaranteed to be supported on all platforms. If any of your flavors are using the quota:cpu_shares extra spec, you may need to resize to a supported value before upgrading.

    To facilitate the transition to no Nova default for <cputune><shares>, its value will be removed during live migration unless a value is set in the quota:cpu_shares extra spec. This can cause temporary CPU starvation for the live migrated instance if other instances on the destination host still have the old default <cputune><shares> value. To fix this, hard reboot, cold migrate, or live migrate the other instances.

  • The powervm virt driver has been removed. The driver was not tested by the OpenStack project nor did it have clear maintainers and thus its quality could not be ensured.

  • The upgrade check tooling now returns a non-zero exit code in the presence of compute node services that are too old. This is to avoid situations in which Nova control services fail to start after an upgrade.

Deprecation Notes

  • The [pci]passthrough_whitelist config option is renamed to [pci]device_spec. The old name is deprecated and aliased to the new one. The old name will be removed in a future release.

  • The [api] use_forwarded_for parameter has been deprecated. Instead of using this parameter, add the HTTPProxyToWSGI middleware to api pipelines, and [oslo_middleware] enable_proxy_headers_parsing = True to nova.conf.

Bug Fixes

  • As a fix for bug 1942329, nova now updates the MAC address of direct-physical ports during move operations to reflect the MAC address of the physical device on the destination host. Servers that were created before this fix need to be moved, or the port needs to be detached and then re-attached, to synchronize the MAC address.

  • Instances with hardware-offloaded OVS ports no longer lose connectivity after failed live migrations. The driver.rollback_live_migration_at_source function is no longer called during pre_live_migration rollback, which previously resulted in connectivity loss following a failed live migration. See Bug 1944619 for more details.

  • Extending attached encrypted volumes that are not decrypted by libvirt (any format other than LUKS) previously failed. This now works as expected and the new size is visible within the instance. See Bug 1967157 for more details.

  • Bug #1970383: Fixes a permissions error when using the ‘query_placement_for_routed_network_aggregates’ scheduler variable, which caused a traceback on instance creation for non-admin users.

  • The algorithm used to determine whether a multi-NUMA guest fits on a multi-NUMA host has been optimized to speed up the decision on hosts with a high number of NUMA nodes (> 8). For details see bug 1978372.

  • Bug #1978444: Now nova retries deleting a volume attachment in case Cinder API returns 504 Gateway Timeout. Also, 404 Not Found is now ignored and leaves only a warning message.
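
    The retry behavior described for bug 1978444 follows a common pattern, sketched below in Python. The exception classes, the function name and the retry parameters are hypothetical stand-ins; they are not Nova's or the Cinder client's actual interfaces.

```python
import time


class GatewayTimeout(Exception):
    """Stand-in for an HTTP 504 error from the volume API."""


class NotFound(Exception):
    """Stand-in for an HTTP 404 error from the volume API."""


def delete_attachment_with_retry(delete_fn, attachment_id,
                                 attempts=3, delay=1.0, log=print):
    """Call delete_fn, retrying on 504 and ignoring 404 with a warning."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            delete_fn(attachment_id)
            return True
        except NotFound:
            # Already gone: log a warning and treat it as done.
            log(f"warning: attachment {attachment_id} not found, ignoring")
            return False
        except GatewayTimeout:
            if attempt == attempts:
                raise  # out of retries, surface the timeout
            time.sleep(delay)
```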

  • Bug #1981813: Nova now detects if the vnic_type of a bound port has been changed in neutron and leaves an ERROR message in the compute service log, as such a change on a bound port is not supported. Also, restarting the nova-compute service no longer crashes after such a port change. Nova will log an ERROR and skip the initialization of the instance with such a port during startup.

  • Bug #1941005 is fixed. During resize Nova now uses the PCI requests from the new flavor to select the destination host.

  • Bug #1986838: Nova now correctly schedules an instance that requests multiple PCI devices via multiple PCI aliases in the flavor extra_spec when multiple similar devices are requested but the compute host has only one such device matching with each request individually.

  • When the server group policy validation upcall is enabled, nova will assert that the policy is not violated on move operations and initial instance creation. As noted in bug 1890244, if a server was created in a server group and that group was later deleted, the validation upcall would fail due to an uncaught exception. This prevented evacuate and other move operations from functioning. This has now been fixed and nova will ignore deleted server groups.

  • If the compute service is down on the source node and a user tries to stop an instance, the instance gets stuck in the powering-off task state, and evacuation then fails with the message: Cannot ‘evacuate’ instance <instance-id> while it is in task_state powering-off. It is now possible for evacuation to ignore the VM task state. For more details see bug 1978983.

  • Added validation for the image machine type property. APIs that use the machine type for server creation, resize or rebuild will now raise an InvalidMachineType exception with the message “provided machine type is not supported by host” and suggest possible/valid machine types in the compute logs. For more details see bug 1933097.

  • When vDPA was first introduced, move operations were implemented in the code but untested either in a real environment or in functional tests. Due to this gap, nova elected to block move operations for instances with vDPA devices. All move operations except for live migration have now been tested and found to indeed work, so the API blocks have now been removed and functional tests introduced. Other operations such as suspend and live migration require code changes to support and will be enabled as new features in the future.

  • On VMware ESXi, VM memory must be a multiple of 4. Otherwise, creating an instance on ESXi fails with the error “VimFaultException: Memory (RAM) size is invalid.”. Instances will now fail to spawn if the flavor memory is not a multiple of 4.
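
    The check amounts to a simple modulo test, sketched below in Python. The function name and the use of ValueError are assumptions for this example and do not reflect the VMware driver's actual code.

```python
def validate_vmware_memory(flavor_memory: int) -> None:
    """Reject a flavor memory size that is not a multiple of 4."""
    if flavor_memory % 4 != 0:
        raise ValueError(
            f"flavor memory {flavor_memory} is not a multiple of 4")
```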