Wallaby Series Release Notes

23.0.0

Prelude

The 23.0.0 release includes many new features and bug fixes. Please be sure to read the upgrade section which describes the required actions to upgrade your cloud from 22.0.0 (Victoria) to 23.0.0 (Wallaby).

There are a few major changes worth mentioning. This is not an exhaustive list:

New Features

  • A new image metadata property, hw_input_bus, has been added. This allows you to specify the bus used for input devices - a pointer and keyboard - attached to the instance when graphics are enabled on compute nodes using the libvirt virt driver. Two values are currently accepted: usb and virtio. This image metadata property effectively replaces the hw_pointer_model image metadata property, which is nonetheless retained for backwards compatibility purposes.
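    As an illustration, the property can be set on an existing image with the standard OpenStackClient --property syntax (the image name "my-image" is a placeholder):

```shell
# Request virtio-attached input devices for guests booted from this image
openstack image set --property hw_input_bus=virtio my-image
```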

  • The libvirt driver now allows explicitly disabling CPU flags for guests via the [libvirt]cpu_model_extra_flags config attribute. This is possible via a + / - notation: a CPU flag prefixed with a + sign (without quotes) will be enabled for the guest, while a prefix of - will disable it. If neither + nor - is specified, the CPU flag will be enabled, which is the default behaviour.

    Refer to the [libvirt]cpu_model_extra_flags documentation for more information.
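    A minimal nova.conf sketch of the notation; the CPU model and the specific flags chosen here are illustrative, not recommendations:

```ini
[libvirt]
cpu_mode = custom
cpu_model = Cascadelake-Server
; Disable the TSX flags (hle, rtm) and explicitly enable pdpe1gb
cpu_model_extra_flags = -hle, -rtm, +pdpe1gb
```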

  • Add Cyborg shelve/unshelve support.

    After shelve, the ARQs remain bound to the instance.

    After shelve offload, the ARQs of the instance will be freed in Cyborg.

    During unshelve, the ARQs will be reallocated and bound to the instance if needed.

  • Added IP addresses to the instance metadata in the libvirt XML. If an instance has more than one IP address, all of them are enumerated. Ports may be attached or detached dynamically after the instance is created; every such change is reflected in the contents of the XML.

  • The “API unexpected exception” message can now be configured by the cloud provider to point to a custom support page. By default it continues to show “http://bugs.launchpad.net/nova/”. It can be configured using the release file.

  • A [compute]image_type_exclusion_list configuration option was added to remove supported image types from being advertised by a compute node as supported. This is to be used in conjunction with [scheduler]query_placement_for_image_type_support to prevent instances from booting on a compute node with a given image type, even if the underlying hypervisor supports it.
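    An illustrative pairing of the two options (the excluded image types shown are examples only; the [compute] option is set on the compute node and the [scheduler] option on the scheduler node):

```ini
[compute]
image_type_exclusion_list = vhd,vhdx

[scheduler]
query_placement_for_image_type_support = True
```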

  • Support was added to specify a port NUMA affinity policy for SR-IOV ports. This feature allows users to set a NUMA affinity policy between a neutron port and a NUMA guest’s CPUs and memory. This feature supports the same policies as the existing VM Scoped PCI NUMA Affinity policy and takes precedence over the flavor and image policies. This allows operators to set a default affinity policy in the flavor or image while end users can express a more granular affinity policy. To use this feature, operators must enable the port-numa-affinity-policy neutron extension and configure the service plugin in neutron. By default, the extension is listed as available but is not enabled.

  • The Hyper-V driver can now attach Cinder RBD volumes. The minimum requirements are Ceph 16 (Pacific) and Windows Server 2016.

  • The scheduler can now determine whether the requested networks or ports belong to Neutron routed networks with specific segments to use. In this case, the routed networks prefilter requires the related aggregates to be reported in Placement, so only hosts within the requested aggregates are accepted. To enable this behaviour, operators need to set the [scheduler]/query_placement_for_routed_network_aggregates configuration option, which defaults to False.
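    Enabling the prefilter is a single nova.conf change on the scheduler node:

```ini
[scheduler]
query_placement_for_routed_network_aggregates = True
```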

  • A new PCI NUMA affinity policy is available. The hw:pci_numa_affinity_policy flavor extra spec and hw_pci_numa_affinity_policy image metadata property now accept a socket policy value. This value indicates that the PCI device must be affined to the same host socket as at least one of the guest NUMA nodes. For more information, see the PCI Passthrough guide.
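    For example, the socket policy can be requested via a flavor extra spec (the flavor name "my-flavor" is a placeholder):

```shell
# Require the PCI device to share a host socket with at least one guest NUMA node
openstack flavor set --property hw:pci_numa_affinity_policy=socket my-flavor
```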

  • The POST /servers/{server_id}/os-interface API now supports attaching neutron ports with QoS minimum bandwidth rules attached.

  • The nova-api and nova-api-metadata WSGI services now support command line arguments similarly to other nova services. For example, these services now support specifying multiple config files via the --config-file parameter. Please note that passing command line arguments to WSGI apps depends on the given WSGI runner; for example, uwsgi supports this via the --pyargv parameter of the uwsgi binary.
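    A sketch of the relevant uwsgi configuration fragment; the config file paths are illustrative placeholders:

```ini
[uwsgi]
; Pass nova command line arguments through to the WSGI application
pyargv = --config-file /etc/nova/nova.conf --config-file /etc/nova/nova-api.conf
```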

  • The libvirt driver has added support for hardware-offloaded OVS with vDPA (vhost Data Path Acceleration) type interfaces. vDPA allows virtio net interfaces to be presented to the guest while the datapath can be offloaded to a software or hardware implementation. This enables high performance networking with the portability of standard virtio interfaces.

Known Issues

  • When tempest test coverage was added for resize and cold migrate with neutron ports having QoS minimum bandwidth policy rules, we discovered that the cross-cell resize code path cannot handle such ports. See bug https://bugs.launchpad.net/nova/+bug/1907522 for details. A fix was implemented that ensures Nova falls back to a same-cell resize if the server has such ports.

  • The dnspython 2.0.0 package is incompatible with even the latest eventlet package version. This causes the nova-novncproxy service to fail if the installed dnspython version is equal to or greater than 2.0.0. See eventlet issue 619 for more details.

  • Nova currently does not support the following lifecycle operations when combined with an instance using vDPA ports: shelve, resize, cold migration, live migration, evacuate, suspend, or interface attach/detach. Attempting any of these operations will result in an HTTP 409 (Conflict) error. While some operations such as “resize to same host”, shelve or interface attach technically work, they have been blocked since unshelve and interface detach currently do not. Resize to a different host has been blocked because it is untested, and evacuate has been blocked for the same reason. These limitations may be removed in the future as testing improves. Live migration with vDPA interfaces is currently not supported by QEMU and therefore cannot be enabled in OpenStack at this time.

    Like SR-IOV, vDPA leverages DMA transfers between the guest and hardware, which requires the DMA buffers to be locked in memory. As the DMA buffers are allocated by the guest and can be located anywhere in guest RAM, QEMU locks all guest RAM. By default, RLIMIT_MEMLOCK for a normal QEMU instance is set to 0 and QEMU is not allowed to lock guest memory. In the case of SR-IOV, libvirt automatically sets the limit to guest RAM + 1G, which enables QEMU to lock the memory. This does not currently happen with vDPA ports. As a result, if you use vDPA ports without enabling locking of the guest memory you will get DMA errors. To work around this issue until libvirt is updated, you must set hw:cpu_realtime=yes and define a valid real-time CPU mask, e.g. hw:cpu_realtime_mask=^0, or define hw:emulator_threads_policy=share|isolate. Note that since hw:cpu_realtime is used only for its side effect of locking the guest memory, this usage does not require the guest or host to use real-time kernels. However, all other requirements of hw:cpu_realtime, such as hw:cpu_policy=dedicated, still apply. It is also strongly recommended that hugepages be enabled for all instances with locked memory. This can be done by setting hw:mem_page_size, which enables nova to correctly account for the fact that the memory is unswappable.
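    Pulling the extra specs named above together, a flavor for vDPA guests might be configured as follows (the flavor name "my-vdpa-flavor" and the choice of hw:mem_page_size=large are illustrative):

```shell
# Force guest memory locking via the hw:cpu_realtime side effect,
# with hugepages so nova accounts for the unswappable memory
openstack flavor set \
  --property hw:cpu_policy=dedicated \
  --property hw:cpu_realtime=yes \
  --property hw:cpu_realtime_mask=^0 \
  --property hw:mem_page_size=large \
  my-vdpa-flavor
```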

Upgrade Notes

  • Be sure to read the Security release notes about upgrade impacts for resolving bug 1552042.

  • Support for the libvirt+UML hypervisor model has been removed. This has not been validated in some time and was never intended for production use.

  • Support for the libvirt+xen hypervisor model has been removed. This has not been validated in some time and was not supported.

  • The [libvirt] xen_hvmloader_path config option has been removed. This was only used with the libvirt+xen hypervisor, which is no longer supported.

  • The libvirt virt driver will now attempt to record the machine type of an instance at startup and when launching an instance if the machine type is not already recorded in the image metadata associated with the instance.

    This machine type will then be used when the instance is restarted or migrated as it will now appear as an image metadata property associated with the instance.

    The following new nova-manage commands have been introduced to help operators manage the hw_machine_type image property:

    nova-manage libvirt get_machine_type

    This command will print the current machine type if set in the image metadata of the instance.

    nova-manage libvirt set_machine_type

    This command will set or update the machine type of the instance assuming the following criteria are met:

    • The instance must have a vm_state of STOPPED, SHELVED or SHELVED_OFFLOADED.

    • The machine type is supported. The supported list includes alias and versioned types of pc, pc-i440fx, pc-q35, q35, virt, s390-ccw-virtio, hyperv-gen1 and hyperv-gen2 as supported by the hyperv driver.

    • The update will not move the instance between underlying machine types. For example, pc to q35.

    • The update will not move the instance between an alias and versioned machine type or vice versa. For example, pc to pc-1.2.3 or pc-1.2.3 to pc.

    A --force flag is provided to skip the above checks but caution should be taken as this could easily lead to the underlying ABI of the instance changing when moving between machine types.

    nova-manage libvirt list_unset_machine_type

    This command will list instance UUIDs that do not have a machine type recorded. An optional cell UUID can be provided to list only instances without a machine type from that cell.

    A new nova-status check has been introduced to help operators identify if any instances within their environment have hw_machine_type unset before they attempt to change the [libvirt]hw_machine_type configurable.

  • Nova services only support old compute services if the compute service is not older than the previous major nova release. To prevent compatibility issues at run time, nova services will refuse to start if the deployment contains compute services that are too old.

  • Support for custom scheduler drivers, deprecated since the 21.0.0 (Ussuri) release, has been removed. The default filter_scheduler is now considered performant enough to suit all use cases. Users with specific requirements that they feel are not met by the filter scheduler should contact the nova developers to discuss their issue.

  • The [scheduler] scheduler_driver config option has been removed, along with the nova.scheduler.driver setuptools entrypoint.

  • The [scheduler] periodic_task_interval config option has been removed. It was no longer used by any supported scheduler drivers.

  • The [libvirt] use_usb_tablet config option, which was first deprecated in the 14.0.0 (Newton) release, has now been removed. It has been replaced by the [DEFAULT] pointer_model config option.
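    For reference, the replacement option is set in the [DEFAULT] section; the usbtablet value shown here corresponds to the behaviour the removed option used to enable:

```ini
[DEFAULT]
pointer_model = usbtablet
```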

  • The [glance]/allowed_direct_url_schemes config option, which was first deprecated in the 17.0.0 (Queens) release, has now been removed.

  • The nova-manage db ironic_flavor_migration command has been removed. This command could be used to assist users skipping the 16.0.0 (Pike) release, which is now in the distant past.

  • The Ironic Flavor Migration upgrade check has been removed. It is no longer necessary.

  • The nova-manage db null_instance_uuid_scan command has been removed. A blocking migration has been in place since the 12.0.0 (Liberty) release making this check unnecessary.

  • The minimum required version of libvirt used by the nova-compute service is now 6.0.0. The next minimum required version to be used in a future release is 7.0.0.

    The minimum required version of QEMU used by the nova-compute service is now 4.2.0. The next minimum required version to be used in a future release is 5.2.0.

    Failing to meet these minimum versions when using the libvirt compute driver will result in the nova-compute service not starting.

Deprecation Notes

  • The [libvirt]live_migration_tunnelled option is deprecated as of the 23.0.0 (Wallaby) release.

    The “tunnelled live migration” has two inherent limitations: (a) it cannot handle live migration of disks in a non-shared storage setup, and (b) it has a huge performance overhead and latency, because it burns more CPU and memory during live migration.

    Both these problems are addressed by the QEMU-native TLS support in Nova; this is the recommended approach for securing all live migration streams (guest RAM, device state, and disks). Assuming a TLS environment is set up, this can be enabled by setting the [libvirt]live_migration_with_native_tls config attribute.
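    The recommended replacement is a single boolean in nova.conf:

```ini
[libvirt]
live_migration_with_native_tls = true
```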

  • The [workarounds]rbd_volume_local_attach and [workarounds]disable_native_luksv1 options have been deprecated as of the 23.0.0 release ahead of removal in the future as the underlying libgcrypt performance regressions that prompted their introduction have been resolved.

    Any remaining users of these workarounds should plan to disable these workarounds as soon as possible. Note that this requires that any instances on compute hosts using the workaround be shutdown ahead of the value of the workaround changing, before being restarted.

  • The 2.88 API microversion has been added. This microversion removes a number of fields from the GET /os-hypervisors/detail (detailed list) and GET /os-hypervisors/{hypervisor_id} (show) APIs:

    - ``current_workload``
    - ``cpu_info``
    - ``vcpus``
    - ``vcpus_used``
    - ``free_disk_gb``
    - ``local_gb``
    - ``local_gb_used``
    - ``disk_available_least``
    - ``free_ram_mb``
    - ``memory_mb``
    - ``memory_mb_used``
    - ``running_vms``
    

    The fields have been removed as the information they provided was frequently misleading or outright wrong, and more accurate information can now be queried from placement.

    In addition, the GET /os-hypervisors/statistics API, which provided a summary view with just the fields listed above, has been removed entirely and will now raise an HTTP 404 with microversion 2.88 or greater.

    Finally, the GET /os-hypervisors/{hypervisor}/uptime API, which provided a similar response to the GET /os-hypervisors/{hypervisor} API but with an additional uptime field, has been removed in favour of including this field in the primary GET /os-hypervisors/{hypervisor} API.

Security Issues

Bug Fixes

  • Nova will now replace periods (.) with dashes (-) when sanitizing an instance’s display name for use as a hostname.

    Nova publishes hostnames for instances via the metadata service and config drives. This hostname is based on a sanitized version of the instance name combined with the domain value specified in [api] dhcp_domain. The previous sanitization of the hostname included the replacement of whitespace and underscores with dashes and the stripping of unicode characters along with leading and trailing periods and dashes. It did not, however, include the removal of periods in the name. Periods are not valid in the hostname or, more specifically, in the host-specific or leaf label (the host in host.example.com), and their presence can cause conflicts when [api] dhcp_domain is configured, leading to instances being mistakenly configured with hostnames like host.example.com.example.com. More pressingly, their use can result in a failure to boot instances if DNS integration is enabled in neutron, likely via designate, as the hostname is identified as an FQDN (fully-qualified domain name) by neutron, and reasonable instance names like test-ubuntu20.04 will be rejected as invalid FQDNs, in this case because the name would yield a TLD (top-level domain) of 04 and TLDs cannot be entirely numerical. To avoid these issues, periods are now replaced with dashes.
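    The sanitization rules described above can be sketched as follows. This is an illustrative simplification, not Nova's actual implementation (it omits, for example, the unicode stripping):

```python
import re


def sanitize_hostname(name: str) -> str:
    """Sketch of the sanitization: whitespace, underscores and (now)
    periods become dashes; leading/trailing periods and dashes are
    stripped."""
    name = re.sub(r"[ _.]", "-", name)
    return name.strip("-.")


# A name like test-ubuntu20.04 no longer yields an invalid FQDN
print(sanitize_hostname("test-ubuntu20.04"))  # test-ubuntu20-04
```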

  • Fixes bug 1892361, in which the PCI stat pools were not updated when an existing device was enabled with SR-IOV capability. A restart of the nova-compute service updated the PCI device type from type-PCI to type-PF, but the pools still recorded the device type as type-PCI, so the PF was considered for allocation to instances requesting vnic_type=direct. With this fix, PCI device type updates are detected and the PCI stat pools are updated properly.

  • Bug 1882521 has now been resolved by increasing the incremental and maximum sleep times between device detach attempts. This works around some undefined QEMU behaviour documented in bug 1894804, where overlapping device_del requests would cancel the initial call, leading to a situation where the device was never fully detached.

  • When upgrading compute services from Ussuri to Victoria one by one, the Compute RPC API was pinned to 5.11 (either automatically or by setting the specific RPC version in the config option), but when rebuilding an instance, a TypeError was raised as an argument was not provided. This error is fixed; see bug 1902925 for details.

  • The libvirt virt driver will no longer attempt to fetch volume encryption metadata or the associated secret key when attaching LUKSv1 encrypted volumes if a libvirt secret already exists on the host.

    This resolves bug 1905701 where instances with LUKSv1 encrypted volumes could not be restarted automatically by the nova-compute service after a host reboot when the [DEFAULT]/resume_guests_state_on_host_boot configurable was enabled.

  • Previously, when using the libvirt driver on x86 hosts, a USB controller was added by default to all instances even if no guest device actually required this controller. This has been resolved. A USB controller will now only be added if an input or disk device requires a USB bus.

  • Support for cold migration and resize between hosts with different network backends was previously incomplete. If the os-vif plugins for all network backends available in the cloud were not installed on all nodes, unplugging would fail while confirming the resize. The issue was caused by the VIF unplug that happened during the resize confirm action on the source host, when the original backend information of the VIF was no longer available. The fix moves the unplug to the resize action, when such information is still available. See bug #1895220 for more details.

Other Notes

  • The old bindir config option has been removed since it was only used by nova-network, which has itself been removed.