Yoga Series Release Notes

25.3.0

Upgrade Notes

  • In the libvirt driver, the default value of the <cputune><shares> element has been removed and is now left to libvirt to decide. This is because allowed values are platform dependent, and the value previously hardcoded by Nova was not guaranteed to be supported on all platforms. If any of your flavors are using the quota:cpu_shares extra spec, you may need to resize to a supported value before upgrading.

    To facilitate the transition to no Nova default for <cputune><shares>, its value will be removed during live migration unless a value is set in the quota:cpu_shares extra spec. This can cause temporary CPU starvation for the live migrated instance if other instances on the destination host still have the old default <cputune><shares> value. To fix this, hard reboot, cold migrate, or live migrate the other instances.
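
    For illustration, the extra spec can be inspected and updated with the openstack client before upgrading; the flavor name and value below are placeholders, and the value must be one your platform supports:

    # Show current extra specs, including quota:cpu_shares if set
    openstack flavor show my-flavor -f value -c properties
    # Update the extra spec to a supported value
    openstack flavor set my-flavor --property quota:cpu_shares=1024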

25.2.1

Bug Fixes

  • Bug #1941005 is fixed. During resize Nova now uses the PCI requests from the new flavor to select the destination host.

25.2.0

Upgrade Notes

  • Configuration of service user tokens is now required for all Nova services to ensure security of block-storage volume data.

    All Nova configuration files must configure the [service_user] section as described in the documentation.

    See https://bugs.launchpad.net/nova/+bug/2004555 for more details.
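
    A minimal sketch of the [service_user] section in nova.conf, assuming the password auth type; the URL and credentials below are placeholders for your deployment:

    [service_user]
    send_service_user_token = true
    auth_type = password
    auth_url = https://keystone.example.com/identity
    username = nova
    password = secret
    project_name = service
    project_domain_name = Default
    user_domain_name = Default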

25.1.1

Bug Fixes

  • When the server group policy validation upcall is enabled, nova will assert that the policy is not violated on move operations and initial instance creation. As noted in bug 1890244, if a server was created in a server group and that group was later deleted, the validation upcall would fail due to an uncaught exception, preventing evacuate and other move operations from functioning. This has now been fixed and nova will ignore deleted server groups.

  • Fixed rescuing a volume-based instance by adding a check for the hw_rescue_disk and hw_rescue_device properties in the image metadata before attempting to rescue the instance.

25.1.0

Known Issues

  • Nova’s use of libvirt’s compareCPU() API served its purpose over the years, but its design limitations break live migration in subtle ways. For example, the compareCPU() API compares against the host physical CPUID. Some of the features from this CPUID are not exposed by KVM, and there are some features that KVM emulates that are not in the host CPUID. The latter can cause bogus live migration failures.

    With QEMU >=2.9 and libvirt >= 4.4.0, libvirt will do the right thing in terms of CPU compatibility checks on the destination host during live migration. Nova satisfies these minimum version requirements by a good margin. So, this workaround provides a way to skip the CPU comparison check on the destination host before migrating a guest, and let libvirt handle it correctly.

    This workaround will be deprecated and removed once Nova replaces the older libvirt APIs with their newer counterparts. The work is being tracked via the cpu-selection-with-hypervisor-consideration blueprint.
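
    A sketch of enabling the workaround in nova.conf; the option name [workarounds]/skip_cpu_compare_on_dest is our assumption of the option this note refers to, so verify it against your release’s configuration reference:

    [workarounds]
    # Skip Nova's CPU comparison on the destination host and let
    # libvirt handle the check during live migration.
    skip_cpu_compare_on_dest = true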

Bug Fixes

  • As a fix for bug 1942329, nova now updates the MAC address of direct-physical ports during move operations to reflect the MAC address of the physical device on the destination host. Servers that were created before this fix need to be moved, or the port needs to be detached and then re-attached, to synchronize the MAC address.

  • Bug #1978444: Nova now retries deleting a volume attachment in case the Cinder API returns 504 Gateway Timeout. Also, a 404 Not Found response is now ignored, leaving only a warning message.

  • Bug #1981813: Nova now detects if the vnic_type of a bound port has been changed in neutron and logs an ERROR in the compute service log, as such a change on a bound port is not supported. Also, the nova-compute service will no longer crash on restart after such a port change; Nova will log an ERROR and skip the initialization of any instance with such a port during startup.

  • If the compute service is down on the source node and a user tries to stop an instance, the instance gets stuck in the powering-off task state, and evacuation then fails with the message: Cannot ‘evacuate’ instance <instance-id> while it is in task_state powering-off. It is now possible for evacuation to ignore the VM task state. For more details see: bug 1978983

  • When vDPA was first introduced, move operations were implemented in the code but untested, either in a real environment or in functional tests. Due to this gap, nova elected to block move operations for instances with vDPA devices. All move operations except for live migration have now been tested and found to work, so the API blocks have been removed and functional tests introduced. Other operations, such as suspend and live migration, require code changes to support and will be enabled as new features in the future.

Other Notes

  • A workaround has been added to the libvirt driver to catch and ignore the following error, allowing live migrations that previously failed with it to complete:

    libvirt.libvirtError: internal error: migration was active, but no RAM info was set

    See bug 1982284 for more details.

25.0.1

Bug Fixes

  • Instances with hardware-offloaded OVS ports no longer lose connectivity after failed live migrations. The driver.rollback_live_migration_at_source function is no longer called during pre_live_migration rollback, which previously resulted in connectivity loss following a failed live migration. See bug 1944619 for more details.

  • Bug #1970383: Fixes a permissions error when using the query_placement_for_routed_network_aggregates scheduler configuration option, which caused a traceback on instance creation for non-admin users.

25.0.0

Prelude

The 25.0.0 release includes many new features and bug fixes. Please be sure to read the upgrade section which describes the required actions to upgrade your cloud from 24.0.0 (Xena) to 25.0.0 (Yoga).

There are a few major changes worth mentioning. This is not an exhaustive list:

  • The latest Compute API microversion supported for Yoga is v2.90 (same as the Xena release).

  • Experimental support is added for Keystone’s unified limits. This will allow operators to test this feature in non-production systems so we can collect early feedback about performance.

  • Keystone’s policy concepts of system vs. project scope and roles have been implemented in Nova, and default roles and scopes have been defined, while legacy policies continue to be enabled by default. Operators are encouraged to familiarize themselves with the new policies and enable them in advance, before Nova switches from the legacy roles in a later release.

  • Support is added for network backends that leverage SmartNICs to offload the control plane from the host server. Accordingly, Neutron needs to be configured to enable this correctly. Security is increased by removing the control plane from the host server, and overhead is reduced by leveraging the CPU and RAM resources on modern SmartNIC DPUs.

  • Experimental support for emulated architectures is now implemented. AArch64, PPC64LE, MIPS, and s390x guest architectures are available independent of the host architecture. This is strictly not intended for production use, for various reasons including the lack of security guarantees.

New Features

  • Added support for VMware VStorageObject based volumes in the VMware vCenter driver. vSphere version 6.5 is required.

  • Added a new configuration option, [workarounds]/enable_qemu_monitor_announce_self, that when enabled causes the libvirt driver to send an announce_self QEMU monitor command after live migration. Please see bug 1815989 for more details. Please note that this causes the domain to be considered tainted by libvirt.
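
    A minimal nova.conf sketch enabling the workaround on compute nodes:

    [workarounds]
    # Send announce_self via the QEMU monitor after live migration so the
    # guest re-announces its MAC addresses; taints the libvirt domain.
    enable_qemu_monitor_announce_self = true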

  • Nova now allows creating an instance with a non-deferred port that has no fixed IP address if the network backend has level-2 connectivity.

  • Image metadata now includes the hw_emulation_architecture property. This allows an operator to define the emulated CPU architecture for an image, and nova will deploy accordingly.

    See the spec for more details and reasoning.
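
    For illustration, the property can be set on an image with the openstack client; the image name and architecture value below are placeholders:

    openstack image set --property hw_emulation_architecture=aarch64 my-image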

  • The Nova policies have been modified to isolate the system-level and project-level API policies. This means system users will be allowed to perform operations on system-level resources and will not be allowed any operation on project-level resources. Project-level API operations will be performed by project-scoped users. Currently, nova supports:

    • system admin

    • project admin

    • project member

    • project reader

    For the details on what changed from the existing policy, please refer to the RBAC new guidelines. We have implemented only phase-1 of the RBAC new guidelines. Currently, scope checks and new defaults are disabled by default. You can enable them by switching the below config options in the nova.conf file:

    [oslo_policy]
    enforce_new_defaults=True
    enforce_scope=True
    

    Please refer to Policy New Defaults for details about the new policy defaults and the migration plan.

  • A new [cinder]/debug configuration option has been introduced to enable DEBUG logging for both the python-cinderclient and os-brick libraries, independently of the rest of Nova.
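
    A minimal nova.conf sketch:

    [cinder]
    # Emit DEBUG logs from python-cinderclient and os-brick even when the
    # rest of Nova logs at a higher level.
    debug = true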

  • Extra sorting was added to the numa_fit_instance_to_host function to balance usage of the hypervisor’s NUMA cells. NUMA cells with more free resources (CPU, RAM, and PCI if requested) will be used first (spread strategy) when the configuration option packing_host_numa_cells_allocation_strategy is set to False. The default value of the packing_host_numa_cells_allocation_strategy option is True, which results in the packing strategy.
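
    A nova.conf sketch selecting the spread strategy; the [compute] section is our assumption about where the option lives, so check your configuration reference:

    [compute]
    # False = spread instances across the NUMA cells with the most free
    # resources; True (the default) = pack instances onto the same cells.
    packing_host_numa_cells_allocation_strategy = false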

  • The hw:vif_multiqueue_enabled flavor extra spec has been added. This is a boolean option that, when set, can be used to enable or disable multiqueue for virtio-net VIFs. It complements the equivalent image metadata property, hw_vif_multiqueue_enabled. If both values are set, they must be identical or an error will be raised.
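
    For illustration, enabling multiqueue via a flavor; the flavor name is a placeholder:

    openstack flavor set my-flavor --property hw:vif_multiqueue_enabled=true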

  • Nova now supports integration with the Lightbits Labs (http://www.lightbitslabs.com) LightOS storage solution. LightOS is software-defined, cloud-native, high-performance, clustered scale-out and redundant NVMe/TCP storage that performs like local NVMe flash.

  • New nova-manage image_property commands have been added to help update instance image properties that have become invalidated by a change of instance machine type.

    • The nova-manage image_property show command can be used to show the current stored image property value for a given instance and property.

    • The nova-manage image_property set command can be used to update the image properties stored in the database for a given instance.

    For more detail on command usage, see the machine type documentation:

    https://docs.openstack.org/nova/latest/admin/hw-machine-type.html#device-bus-and-model-image-properties
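
    A hypothetical invocation, assuming the syntax described in the linked documentation; the instance UUID and property values below are placeholders:

    # Show the stored value of one image property for an instance
    nova-manage image_property show <instance_uuid> hw_disk_bus
    # Update one or more stored image properties for the instance
    nova-manage image_property set <instance_uuid> --property hw_disk_bus=virtio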

  • Added VPD capability parsing support when a PCI VPD capability is exposed via the node device XML in libvirt. The XML data from libvirt is parsed and formatted into a PCI device JSON dict that is sent to the Nova API and stored in the extra_info column of a PciDevice.

    The code gracefully handles the absence of the capability, since it is optional and libvirt may not support it in a particular release.

    A serial number is extracted from PCI VPD of network devices (if present) and is sent to Neutron in port updates.

    Libvirt supports parsing the VPD capability from PCI/PCIe devices and exposing it via nodedev XML as of 7.9.0.

  • Added support for ports with minimum guaranteed packet rate QoS policy rules. Support is provided for all server operations including cold migration, resize, interface attach/detach, etc. This feature required adding support for the port-resource-request-groups neutron API extension, as ports with such a QoS policy will have multiple rules, each requesting resources. For more details see the admin guide.

  • The nova-manage placement heal_allocations CLI now allows regenerating the placement allocation of servers with ports using minimum guaranteed packet rate QoS policy rules.
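
    For illustration, a dry run against a single server; the UUID is a placeholder and the flags are our assumption of the heal_allocations interface:

    nova-manage placement heal_allocations --instance <instance_uuid> --dry-run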

  • The libvirt driver now allows using native NVMeoF multipathing for the NVMeoF connector, via the [libvirt]/volume_use_multipath configuration attribute in nova-cpu.conf, defaulting to False (disabled).
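
    A minimal sketch in nova-cpu.conf:

    [libvirt]
    # Use native NVMeoF multipathing for NVMeoF volume connections
    # instead of the default (disabled).
    volume_use_multipath = true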

  • From this release, Nova instances will get virtio as the default display device (instead of cirrus, which has many limitations). If your guest has a native kernel driver (called “virtio-gpu” in Linux; available since Linux 4.4), it will be used; otherwise, the ‘virtio’ model will gracefully fall back to VGA compatibility mode, which is still better than cirrus.

  • Added support for off-path networking backends where devices exposed to the hypervisor host are managed remotely (which is the case, for example, with various SmartNIC DPU devices). VNIC_TYPE_REMOTE_MANAGED ports can now be added to Nova instances as soon as all compute nodes are upgraded to the new compute service version. In order to use this feature, VF PCI/PCIe devices need to be tagged as remote_managed: "true" in the passthrough_whitelist option in the Nova config (see the sketch after this note).

    This feature relies on Neutron being upgraded to the corresponding release of OpenStack and having an appropriate backend capable of binding VNIC_TYPE_REMOTE_MANAGED ports (at the time of writing, ML2 with the OVN ML2 mechanism driver is the only supported backend, see the Neutron documentation for more details).

    Note that the PCI devices (VFs or, alternatively, their PF) must have a valid PCI Vital Product Data (VPD) with a serial number present in it for this feature to work properly. Also note that only VFs can be tagged as remote_managed: "true" and they cannot be used for legacy SR-IOV use-cases.

    Nova operations on instances with VNIC_TYPE_REMOTE_MANAGED ports follow the same logic as the operations on direct SR-IOV ports.

    This feature is only supported with the Libvirt driver.
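
    A hypothetical whitelist entry in the Nova config on the compute node; the [pci] section placement and the vendor/product IDs below are illustrative assumptions, not recommendations:

    [pci]
    # Tag matching VFs as remotely managed by the DPU.
    passthrough_whitelist = {"vendor_id": "15b3", "product_id": "101e", "remote_managed": "true"}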

Known Issues

  • The libvirt virt driver in Nova implements power-on and hard reboot by first destroying the domain and unplugging the VIFs, then recreating the domain and replugging the VIFs. However, nova does not wait for the network-vif-plugged event before unpausing the domain. This can cause the domain to start running and requesting IP addresses via DHCP before the networking backend has finished plugging the VIFs. The config option [workarounds]wait_for_vif_plugged_event_during_hard_reboot has been added, defaulting to an empty list, that can be used to ensure that the libvirt driver waits for the network-vif-plugged event for VIFs with specific vnic_type values before it unpauses the domain during hard reboot. This should only be used if the deployment uses a networking backend that sends such events for the given vif_type at VIF plug time. The ml2/ovs and networking-odl Neutron backends are known to send plug-time events for ports with the normal vnic_type. For more information see https://bugs.launchpad.net/nova/+bug/1946729
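
    A minimal nova.conf sketch that waits for plug-time events for normal ports only:

    [workarounds]
    # vnic_type values to wait for during hard reboot; only use this with
    # backends that send network-vif-plugged at plug time for these types.
    wait_for_vif_plugged_event_during_hard_reboot = normal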

Upgrade Notes

  • The bandwidth field has been removed from the instance.exists and instance.update versioned notifications and the version for both notifications has been bumped to 2.0. The bandwidth field was only relevant when the XenAPI virt driver was in use, but this driver was removed in the Victoria (22.0.0) release and the field has been a no-op since.

  • VNC-related config options that were deprecated in the Pike release have now been removed:

    • The vncserver_listen option has been removed; use only the server_listen option to set the VNC bind address.

    • The vncserver_proxyclient_address option has been removed; use only the server_proxyclient_address option.
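
    A sketch of the replacement options, assuming they live in the [vnc] section of nova.conf as in current releases; the addresses are placeholders:

    [vnc]
    server_listen = 0.0.0.0
    server_proxyclient_address = 192.0.2.10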

  • Support for the qos-queue extension provided by the vmware-nsx neutron plugin for the VMware NSX Manager has been removed. This extension was removed from the vmware-nsx project when support for NSX-MH was removed in 15.0.0.

Deprecation Notes

  • The powervm virt driver is deprecated and may be removed in a future release. The driver is not tested by the OpenStack project, nor does it have clear maintainers, and thus its quality cannot be ensured.

Bug Fixes

  • The POST /servers (create server) API will now reject attempts to create a server with the same port specified multiple times. This was previously accepted by the API but the instance would fail to spawn and would instead transition to the error state.

  • Bug #1829479: Now deleting a nova-compute service removes allocations of successfully evacuated instances. This allows the associated resource provider to be deleted automatically even if the nova-compute service cannot recover after all instances on the node have been successfully evacuated.

  • Amended the guest resume operation to support mediated devices. Libvirt added support for hot-plugging/unplugging mediated devices in v4.3.0, which is well below Nova’s minimum required libvirt version (v6.0.0).

  • Bug 1952941 is fixed, where a pre-Victoria server with pinned CPUs could not be migrated or evacuated after the cloud was upgraded to Victoria or newer, as scheduling failed with the NotImplementedError: Cannot load 'pcpuset' error.

  • [bug 1958636] Explicitly check for and enable SMM when firmware requires it. Previously we assumed libvirt would do this for us but this is not true in all cases.

  • Fixed bug 1960230 that prevented resize of instances that had previously failed and not been cleaned up.

  • Bug 1960401 is fixed, which could cause invalid BlockDeviceMappings to accumulate in the database, preventing the respective volumes from being attached to the instance again.

  • Bug 1950657 is fixed: nova-compute now retries an image download when it gets a “Corrupt image download” error from glanceclient and the num_retries config option is set.

  • Fixes slow compute restart when using the nova.virt.ironic compute driver, where the driver was previously attempting to attach VIFs on start-up via the plug_vifs driver method. This method has grown otherwise unused since the introduction of the attach_interface method, and Ironic itself manages the attachment of VIFs to baremetal nodes in order to align with the security requirements of a physical baremetal node’s lifecycle. The ironic driver now ignores calls to the plug_vifs method.

  • During the Havana cycle it was discovered that eventlet monkey patching of greendns broke IPv6 (https://bugs.launchpad.net/nova/+bug/1164822). Since then, nova has been disabling eventlet monkey patching of greendns. Eventlet addressed the IPv6 limitation in v0.17 with the introduction of Python 3 support in 2015. Nova, however, continued to disable it, which can result in slow DNS queries blocking the entire nova-api or other binaries, because socket.getaddrinfo becomes a blocking call into glibc. See https://bugs.launchpad.net/nova/+bug/1964149 for more details.

Other Notes

  • This release includes work in progress support for Keystone’s unified limits. This should not be used in production. It is included so we can collect early feedback from operators around the performance of the new limits system. There is currently no way to export your existing quotas and import them into Keystone. There is also no proxy API to allow you to update unified limits via Nova APIs. All the update APIs behave as if you are using the noop driver when the unified limits quota driver is configured.

    When you enable unified limits, the limits are configured in Keystone against the Nova endpoint, using the following resource names:

    • class:VCPU

    • servers

    • class:MEMORY_MB

    • server_metadata_items

    • server_injected_files

    • server_injected_file_content_bytes

    • server_injected_file_path_bytes

    • server_key_pairs

    • server_groups

    • server_group_members

    All other resource classes requested via flavors are also now supported as unified limits. Note that nova configuration is ignored, as the default limits come from the limits registered for the Nova endpoint in Keystone.

    All previous quotas other than cores, instances and ram are still enforced, but the limit can only be changed globally in Keystone as registered limits. There are no per project or per user overrides possible.

    Work in progress support for Keystone’s unified limits can be enabled via [quota]/driver=nova.quota.UnifiedLimitsDriver, as sketched at the end of this note.

    A config option [workarounds]unified_limits_count_pcpu_as_vcpu is available for operators who require the legacy quota usage behavior where VCPU = VCPU + PCPU. Note that if PCPU is specified in the flavor explicitly, it will be expected to have its own unified limit registered and PCPU usage will not be merged into VCPU usage.
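
    A minimal sketch of enabling the driver in nova.conf:

    [quota]
    driver = nova.quota.UnifiedLimitsDriver

    A default limit can then be registered in Keystone for each resource name listed above; the value and region below are placeholders, assuming a recent python-openstackclient:

    openstack registered limit create --service nova --default-limit 10 --region RegionOne servers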

  • Default image properties for device buses and models are now persisted in the instance system metadata for the following image properties:

    • hw_cdrom_bus

    • hw_disk_bus

    • hw_input_bus

    • hw_pointer_model

    • hw_video_model

    • hw_vif_model

    Instance device buses and models will now remain stable across reboots and will not be changed by new defaults in libosinfo or the OpenStack Nova libvirt driver.