Rocky Series Release Notes

18.0.3-8

Upgrade Notes

  • The nova-manage db online_data_migrations command now returns exit status 2 when some migrations failed (raised exceptions) and no others completed successfully in the last batch attempted. This should be considered a fatal condition that requires intervention. Exit status 1 is returned when the --max-count option was used and some migrations failed but others succeeded (updated at least one row), because more work may remain for the non-failing migrations, and their completion may be a dependency for the failing ones. The command should be re-run while it returns exit status 1, and considered successfully completed only when it returns exit status 0.
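
    The re-run rule above can be scripted. The following is a minimal sketch, not part of nova: the function name run_online_migrations, the injectable runner argument, the batch size of 50, and the max_rounds safety limit are illustrative assumptions. Only the command, the --max-count option, and the meaning of exit statuses 0, 1, and 2 come from the note above.

```python
import subprocess


def run_online_migrations(max_count=50, runner=None, max_rounds=1000):
    """Re-run the migration command until it reports completion.

    Returns the number of invocations on success (exit status 0).
    Raises RuntimeError on exit status 2 or any unexpected status.
    """
    cmd = ["nova-manage", "db", "online_data_migrations",
           "--max-count", str(max_count)]
    if runner is None:
        # Default runner: invoke nova-manage and return its exit status.
        def runner(argv):
            return subprocess.run(argv).returncode

    for attempt in range(1, max_rounds + 1):
        status = runner(cmd)
        if status == 0:
            return attempt      # all migrations are complete
        if status == 1:
            continue            # more work may remain; run another batch
        # Exit status 2 (or anything else) needs operator intervention.
        raise RuntimeError(
            "online_data_migrations exited with status %d" % status)
    raise RuntimeError("gave up after %d rounds" % max_rounds)
```

    Passing a runner callable keeps the loop testable without a nova installation; in a deployment the default runner would be used.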

Bug Fixes

  • The os-simple-tenant-usage pagination has been fixed. In some cases, nova usage-list would have returned incorrect results because of this. See bug https://launchpad.net/bugs/1796689 for details.

18.0.3

Upgrade Notes

  • A new check has been added to the nova-status upgrade check CLI to detect use of the nova-consoleauth service. If the service is in use, the check warns and provides additional instructions to set [workarounds]enable_consoleauth = True while performing a live/rolling upgrade.

18.0.1

Bug Fixes

  • A change has been introduced in the libvirt driver to correctly handle IPv6 addresses for live migration.

18.0.0

Prelude

The 18.0.0 release includes many new features and bug fixes. It is difficult to cover all the changes that have been introduced. Please at least read the upgrade section which describes the required actions to upgrade your cloud from 17.0.0 (Queens) to 18.0.0 (Rocky).

That said, a few major changes are worth mentioning. This is not an exhaustive list:

  • The latest Compute API microversion supported for Rocky is v2.65. Details on REST API microversions added since the 17.0.0 Queens release can be found in the REST API Version History page.
  • Nova is now using the new Neutron port binding API to minimize network downtime during live migrations. See the related spec for more details.
  • Volume-backed instances will no longer report root_gb usage for new instances and existing instances will heal during move operations.
  • Several REST APIs specific to nova-network were removed and the core functionality of nova-network is planned to be removed in the 19.0.0 Stein release.
  • A nova-manage db purge command to purge archived shadow table data is now available. A new --purge option is also available for the nova-manage db archive_deleted_rows command.
  • It is now possible to disable a cell to stop scheduling to a cell by using the nova-manage cell_v2 update_cell command.
  • The libvirt compute driver now supports trusted image certificates when using the 2.63 compute API microversion. See the image signature certificate validation documentation for more details.
  • It is now possible to configure a separate database for the placement service, which could help in easing the eventual placement service extraction from Nova and data migration associated with it.
  • A nova-manage placement heal_allocations command is now available to allow users of the CachingScheduler to get the placement service populated for their eventual migration to the FilterScheduler. The CachingScheduler is deprecated and could be removed as early as Stein.
  • The placement service now supports granular RBAC policy rules configuration. See the placement policy documentation for details.
  • A new zVM virt driver is now available.
  • The nova-consoleauth service has been deprecated.

New Features

  • AArch64 architecture is supported by Nova with libvirt min version 3.6.0. See the Nova support matrix for more details.
  • The support to abort live migrations with queued and preparing status using DELETE /servers/{server_id}/migrations/{migration_id} API has been added in microversion 2.65.
  • Instance action versioned notifications now contain action_initiator_user and action_initiator_project fields to distinguish between the owner of the instance and who initiated the action upon the instance, for example an administrator or another user within the same project.
  • Add CPUWeigher weigher. This can be used to spread (default) or pack workloads on hosts based on their vCPU usage. This can be configured using the [filter_scheduler] cpu_weight_multiplier configuration option.
  • A new --disabled option has been added to the nova-manage cell_v2 create_cell command, allowing users to create pre-disabled cells. No VMs will be spawned on the hosts in such cells until the cells are enabled.
  • Two new options --enable and --disable have been added to the nova-manage cell_v2 update_cell command. Using these flags users can enable or disable scheduling to a cell.
  • Flavor extra_specs are exposed in the flavor representation starting with microversion 2.61. They are included in the response body of the following APIs:

    • GET /flavors/detail
    • GET /flavors/{flavor_id}
    • POST /flavors
    • PUT /flavors/{flavor_id}

    Users can now see the flavor extra_specs in the flavor API responses and do not need to call the GET /flavors/{flavor_id}/os-extra_specs API. The visibility of the flavor extra_specs within the flavor resource is controlled by the same policy rules as are used for showing the flavor extra_specs. If the user has no access to query extra_specs, flavor.extra_specs will not be included.

  • A new traceback field has been added to each versioned instance notification. In an error notification this field contains the full traceback string of the exception which caused the error notification. See the notification dev ref for the sample file of instance.create.error as an example.
  • The microversion 2.62 adds host (hostname) and hostId (an obfuscated hashed host id string) fields to the instance action GET /servers/{server_id}/os-instance-actions/{req_id} API. The display of the newly added host field will be controlled via policy rule os_compute_api:os-instance-actions:events, which is the same policy used for the events.traceback field. If the user is prevented by policy, only hostId will be displayed.
  • The request_id field has been added to all instance action and instance update versioned notification payloads. Note that notifications triggered by periodic tasks will have the request_id field set to be None.
  • Add support, in a new placement microversion 1.21, for the member_of query parameter, representing one or more aggregate UUIDs. When supplied, it will filter the returned allocation candidates to only those resource_providers that are associated with (“members of”) the specified aggregate(s). This parameter can have a value of either a single aggregate UUID, or a comma-separated list of aggregate UUIDs. When specifying more than one aggregate, a resource provider needs to be associated with at least one of the aggregates in order to be included; it does not have to be associated with all of them. Because of this, the list of UUIDs must be prefixed with in: to represent the logical OR of the selection.
  • Introduces new placement API version 1.26. Starting with this version, resource provider inventories may be defined with a reserved value equal to total.
  • The scheduler can now use placement to more efficiently query for hosts within an availability zone. This requires that a host aggregate is created in nova with the availability_zone key set, and the same aggregate is created in placement with an identical UUID. The [scheduler]/query_placement_for_availability_zone config option enables this behavior and, if enabled, eliminates the need for the AvailabilityZoneFilter to be enabled.
  • It is now possible to configure granular policy rules for placement REST API operations.

    By default, all operations continue to use the role:admin check string so there is no upgrade impact.

    A new configuration option is introduced, [placement]/policy_file, which is used to configure the location of the placement policy file. By default, the placement-policy.yaml file may live alongside the nova policy file, e.g.:

    • /etc/nova/policy.yaml
    • /etc/nova/placement-policy.yaml

    However, if desired, [placement]/policy_file makes it possible to package and deploy the placement policy file separately to make the future split of placement and nova packages easier, e.g.:

    • /etc/placement/policy.yaml

    All placement policy rules are defined in code so by default no extra configuration is required and the default rules will be used on start of the placement service.

    For more information about placement policy including a sample file, see the configuration reference documentation:

    https://docs.openstack.org/nova/latest/configuration/index.html#placement-policy

  • Supports instance rescue and unrescue with ironic virt driver. This feature requires an ironic service supporting API version 1.38 or later, which is present in ironic releases >= 10.1. It also requires python-ironicclient >= 2.3.0.
  • libvirt: add support for virtio-net rx/tx queue sizes

    Add support for configuring the rx_queue_size and tx_queue_size options in the QEMU virtio-net driver by way of nova.conf. This is only supported for vhost/vhostuser interfaces.

    Currently, valid values for the ring buffer sizes are 256, 512, and 1024.

    Adjustable RX queue sizes require QEMU 2.7.0 and libvirt 2.3.0 (or newer). Adjustable TX queue sizes require QEMU 2.10.0 and libvirt 3.7.0 (or newer).

  • Added support for nvmeof type volumes to the libvirt driver.
  • The URLs in cell mapping records may now include variables that are filled from the corresponding default URL specified in the host’s configuration file. This allows per-host credentials, as well as other values to be set in the config file which will affect the URL of a cell, as calculated when loading the record. For database_connection, the [database]/connection URL is used as the base. For transport_url, the [DEFAULT]/transport_url is used. For more information, see the cells configuration docs: https://docs.openstack.org/nova/latest/user/cells.html
  • Microversion 2.64 is added and enables users to define rules on server group policy to meet more advanced policy requirements. This microversion brings the following changes in server group APIs:
    • Add policy and rules fields in the request of POST /os-server-groups. The policy field is the name of the policy. The rules field, which is a dict, holds rules applied to the policy; currently the only supported rule is max_server_per_host for the anti-affinity policy.
    • The policy and rules fields will be returned in response body of POST, GET /os-server-groups API and GET /os-server-groups/{server_group_id} API.
    • The policies and metadata fields have been removed from the response body of POST, GET /os-server-groups API and GET /os-server-groups/{server_group_id} API.
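
    As an illustration of the request shape described above, here is a minimal sketch that builds a POST /os-server-groups body for microversion 2.64. The helper name build_server_group_request, the group name, and the limit of 2 are illustrative assumptions, and the "server_group" envelope is assumed from the Compute API reference rather than stated in this note.

```python
import json


def build_server_group_request(name, policy, max_server_per_host=None):
    """Build a POST /os-server-groups body for microversion 2.64."""
    group = {"name": name, "policy": policy}
    if max_server_per_host is not None:
        # Per the note above, this rule only applies to anti-affinity.
        if policy != "anti-affinity":
            raise ValueError(
                "max_server_per_host requires the anti-affinity policy")
        group["rules"] = {"max_server_per_host": max_server_per_host}
    return {"server_group": group}


body = build_server_group_request(
    "db-replicas", "anti-affinity", max_server_per_host=2)
print(json.dumps(body, indent=2, sort_keys=True))
```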
  • The number of PCI Express ports (slots in the virtual motherboard) can now be configured using the num_pcie_ports option in the libvirt section of the nova.conf file. This affects x86-64 instances with hw_machine_type set to ‘pc-q35’ and AArch64 instances with the ‘virt’ hw_machine_type (which is the default for that architecture). Due to QEMU’s memory map limits on aarch64/virt, the maximum value is 28.
  • Adds a new generation column to the consumers table. This value is incremented every time allocations are made for a consumer. The new placement microversion 1.28 requires that all POST /allocations and PUT /allocations/{consumer_uuid} requests now include the consumer_generation parameter to ensure that if two processes are allocating resources for the same consumer, the second one to complete doesn’t overwrite the first. If there is a mismatch between the consumer_generation in the request and the current value in the database, the allocation will fail, and a 409 Conflict response will be returned. The calling process must then get the allocations for that consumer by calling GET /allocations/{consumer}. That response will now contain, in addition to the allocations, the current generation value for that consumer. Depending on the use case, the calling process may error; or it may wish to combine or replace the existing allocations with the ones it is trying to post, and re-submit with the current consumer_generation.
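
    The conflict handling described above can be sketched as a small retry loop. This is an illustration under stated assumptions, not nova or placement code: Conflict, FakePlacement, put_allocations_with_retry, and the client method names are hypothetical, the request body is reduced to the two fields discussed here (a real request carries further fields), and the caller is assumed to want to replace the existing allocations.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 Conflict response."""


class FakePlacement:
    """In-memory stand-in for the placement service (illustration only)."""

    def __init__(self):
        self.generation = 0
        self.allocations = {}

    def get_allocations(self, consumer_uuid):
        return {"allocations": self.allocations,
                "consumer_generation": self.generation}

    def put_allocations(self, consumer_uuid, body):
        if body["consumer_generation"] != self.generation:
            raise Conflict("consumer generation mismatch")
        self.allocations = body["allocations"]
        self.generation += 1


def put_allocations_with_retry(client, consumer_uuid, allocations,
                               generation, max_attempts=3):
    """PUT allocations, refreshing consumer_generation after each 409.

    Returns the generation value that was finally accepted.
    """
    for _ in range(max_attempts):
        try:
            client.put_allocations(consumer_uuid, {
                "allocations": allocations,
                "consumer_generation": generation,
            })
            return generation
        except Conflict:
            # Another writer got there first: re-read the consumer and
            # pick up its current generation before trying again.
            current = client.get_allocations(consumer_uuid)
            generation = current["consumer_generation"]
    raise Conflict("still conflicting after %d attempts" % max_attempts)
```

    A caller that needs to merge rather than replace would combine the allocations returned by the GET with its own before re-submitting.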
  • The nova-manage discover_hosts command now has a --by-service option which allows discovering hosts in a cell purely by the presence of a nova-compute binary. At this point, there is no need to use this unless you’re using ironic, as it is less efficient. However, if you are using ironic, this allows discovery and mapping of hosts even when no ironic nodes are present.
  • Introduces [compute]/cpu_shared_set option for compute nodes. Some workloads run best when the hypervisor overhead processes (emulator threads in libvirt/QEMU) can be placed on different physical host CPUs than other guest CPU resources. This allows those workloads to prevent latency spikes for guest vCPU threads.

    To place a workload’s emulator threads on a set of isolated physical CPUs, set the [compute]/cpu_shared_set configuration option to the set of host CPUs that should be used for best-effort CPU resources. Then set a flavor extra spec to hw:emulator_threads_policy=share to instruct nova to place that workload’s emulator threads on that set of host CPUs.

  • The libvirt driver now supports additional Cinder front-end QoS specs, allowing the specification of additional IO burst limits applied for each attached disk, individually.

    • quota:read_bytes_sec_max
    • quota:write_bytes_sec_max
    • quota:total_bytes_sec_max
    • quota:read_iops_sec_max
    • quota:write_iops_sec_max
    • quota:total_iops_sec_max
    • quota:size_iops_sec

    For more information, see the Cinder admin guide:

    https://docs.openstack.org/cinder/latest/admin/blockstorage-basic-volume-qos.html

  • The shutdown retry interval in powering off instances can now be set using the configuration setting shutdown_retry_interval, in the compute configuration group.
  • Added support for forbidden traits to the scheduler. A flavor extra spec is extended to support specifying the forbidden traits. The syntax of extra spec is trait:<trait_name>=forbidden, for example:

    • trait:HW_CPU_X86_AVX2=forbidden
    • trait:STORAGE_DISK_SSD=forbidden

    The scheduler will pass the forbidden traits to the GET /allocation_candidates endpoint in the Placement API to include only resource providers that do not include the forbidden traits. Currently the only valid values are required and forbidden. Any other values will be considered invalid.

    This requires that the Placement API version 1.22 is available before the nova-scheduler service can use this feature.

    The FilterScheduler is currently the only scheduler driver that supports this feature.
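
    To make the mapping concrete, the sketch below turns trait extra specs into the value of a placement required query parameter, using the ! prefix that placement microversion 1.22 defines for forbidden traits. The function name required_param_from_extra_specs is an illustrative assumption and this is not the scheduler's actual code.

```python
def required_param_from_extra_specs(extra_specs):
    """Translate trait:<name>=required|forbidden into a `required` value."""
    terms = []
    for key, value in sorted(extra_specs.items()):
        if not key.startswith("trait:"):
            continue  # not a trait extra spec
        trait = key[len("trait:"):]
        if value == "required":
            terms.append(trait)
        elif value == "forbidden":
            terms.append("!" + trait)
        else:
            # Only required and forbidden are valid values.
            raise ValueError("invalid value %r for %s" % (value, key))
    return ",".join(terms)


specs = {
    "trait:HW_CPU_X86_AVX2": "forbidden",
    "trait:STORAGE_DISK_SSD": "forbidden",
}
print(required_param_from_extra_specs(specs))
```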

  • Added support for granular resource and traits requests to the scheduler. A flavor extra spec is extended to support specifying numbered groupings of resources and required/forbidden traits. A resources key with a positive integer suffix (e.g. resources42:VCPU) will be logically associated with trait keys with the same suffix (e.g. trait42:HW_CPU_X86_AVX). The resources and required/forbidden traits in that group will be satisfied by the same resource provider on the host selected by the scheduler. When more than one numbered grouping is supplied, the group_policy extra spec is required to indicate how the groups should interact. With group_policy=none, separate groupings - numbered or unnumbered - may or may not be satisfied by the same provider. With group_policy=isolate, numbered groups are guaranteed to be satisfied by different providers - though there may still be overlap with the unnumbered group.

    trait keys for a given group are optional. That is, you may specify resources42:XXX without a corresponding trait42:YYY. However, the reverse (specifying trait42:YYY without resources42:XXX) will result in an error.

    The semantic of the (unnumbered) resources and trait keys is unchanged: the resources and traits specified thereby may be satisfied by any provider on the same host or associated via aggregate.

  • Added a new flavor extra_spec, hide_hypervisor_id, which hides the hypervisor signature for the guest when true (‘kvm’ won’t appear in lscpu). This acts exactly like and in parallel to the image property img_hide_hypervisor_id and is useful for running the nvidia drivers in the guest. Currently, this is only supported in the libvirt driver.
  • The libvirt driver now allows specifying individual CPU feature flags for guests, via a new configuration attribute [libvirt]/cpu_model_extra_flags. This is valid in combination with all three possible values for [libvirt]/cpu_mode: custom, host-model, or host-passthrough. The cpu_model_extra_flags attribute also allows specifying multiple CPU flags. Refer to its documentation in nova.conf for usage details.

    One of the motivations for this is to alleviate the performance degradation (caused as a result of applying the “Meltdown” CVE fixes) for guests running with certain Intel-based virtual CPU models. This guest performance impact is reduced by exposing the CPU feature flag ‘PCID’ (“Process-Context ID”) to the guest CPU, assuming that it is available in the physical hardware itself.

    Note that besides custom, Nova’s libvirt driver has two other CPU modes: host-model (which is the default), and host-passthrough. Refer to the [libvirt]/cpu_model_extra_flags documentation for what to do when you are using either of those CPU modes in context of ‘PCID’.

  • The libvirt driver now allows utilizing file backed memory for qemu/KVM virtual machines, via a new configuration attribute [libvirt]/file_backed_memory, defaulting to 0 (disabled).

    [libvirt]/file_backed_memory specifies the available capacity in MiB for file backed memory, at the directory configured for memory_backing_dir in libvirt’s qemu.conf. When enabled, the libvirt driver will report the configured value for the total memory capacity of the node, and will report used memory as the sum of all configured guest memory.

    Live migrations from nodes not compatible with file backed memory to nodes with file backed memory are not allowed, and will result in an error. It’s recommended to upgrade all nodes before enabling file backed memory.

    Note that file backed memory is not compatible with hugepages, and is not compatible with memory overcommit. If file backed memory is enabled, ram_allocation_ratio must be configured to 1.0.

    For more details, see the admin guide documentation:

    https://docs.openstack.org/nova/latest/admin/file-backed-memory.html

  • We now attempt to mirror the association of compute host to host aggregate into the placement API. When administrators use the POST /os-aggregates/{aggregate_id}/action Compute API call to add or remove a host from an aggregate, the nova-api service will attempt to ensure that a corresponding record is created in the placement API for the resource provider (compute host) and host aggregate UUID.

    The nova-api service needs to understand how to connect to the placement service in order for this mirroring process to work. Administrators should ensure that there is a [placement] section in the nova.conf file which is used by the nova-api service, and that credentials for interacting with placement are contained in that section.

    If the [placement] section is missing from the nova-api’s nova.conf file, nothing will break, but warnings will be generated in the nova-api’s log file when administrators associate a compute host with a host aggregate. This will become a failure starting in the 19.0.0 Stein release.

  • A new 1.24 placement API microversion adds the ability to specify multiple member_of query parameters for the GET /resource_providers and GET /allocation_candidates endpoints. When multiple member_of query parameters are received, the placement service will return resource providers that match all of the requested aggregate memberships. The member_of=in:<agg uuids> format is still supported and continues to indicate an IN() operation for aggregate membership. Some examples for using the new functionality:

    • Get all providers that are associated with BOTH agg1 and agg2: ?member_of=agg1&member_of=agg2
    • Get all providers that are associated with agg1 OR agg2: ?member_of=in:agg1,agg2
    • Get all providers that are associated with agg1 and ANY OF (agg2, agg3): ?member_of=agg1&member_of=in:agg2,agg3
    • Get all providers that are associated with ANY OF (agg1, agg2) AND are also associated with ANY OF (agg3, agg4): ?member_of=in:agg1,agg2&member_of=in:agg3,agg4
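
    The AND/OR semantics above can be expressed as a small query builder. This is a minimal sketch: the helper name member_of_query and the agg1..agg3 placeholders are illustrative assumptions, and only the member_of and in: syntax comes from the note above.

```python
from urllib.parse import urlencode


def member_of_query(*groups):
    """Build a member_of query string.

    Each group is a list of aggregate UUIDs. Groups are ANDed together
    (one member_of parameter per group); UUIDs inside a group are ORed
    via the in: prefix.
    """
    params = []
    for group in groups:
        if not group:
            raise ValueError("each group needs at least one aggregate")
        if len(group) == 1:
            params.append(("member_of", group[0]))
        else:
            params.append(("member_of", "in:" + ",".join(group)))
    # Keep ':' and ',' readable instead of percent-encoding them.
    return urlencode(params, safe=":,")


print(member_of_query(["agg1"], ["agg2", "agg3"]))
```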
  • It is now possible to configure multiple nova-scheduler workers via the [scheduler]/workers configuration option. By default, the scheduler runs ncpu workers when using the filter_scheduler scheduler driver; otherwise the default is 1.

    Since blueprint placement-claims in Pike, the FilterScheduler uses the Placement service to create resource allocations (claims) against a resource provider (i.e. compute node) chosen by the scheduler. That reduces the risk of scheduling collisions when running multiple schedulers.

    Since other scheduler drivers, like the CachingScheduler, do not use Placement, it is recommended to set workers=1 (default) for those other scheduler drivers.

  • From microversion 1.29, we support allocation candidates with nested resource providers. The following features are added:

    • GET /allocation_candidates is aware of nested providers. When provider trees are present, allocation_requests in the response of GET /allocation_candidates can include allocations on combinations of multiple resource providers in the same tree.
    • root_provider_uuid and parent_provider_uuid are added to provider_summaries in the response of GET /allocation_candidates.

  • The following legacy notifications have been transformed to a new versioned payload:

    • aggregate.updatemetadata
    • aggregate.updateprop
    • instance.exists
    • instance.live_migration._post
    • instance.live.migration.force.complete
    • instance.live_migration.post.dest
    • instance.live_migration.rollback.dest
    • instance.rebuild.scheduled
    • metrics.update
    • servergroup.addmember

    Consult https://docs.openstack.org/nova/latest/reference/notifications.html for more information including payload samples.

  • A nova-manage placement sync_aggregates command has been added which can be used to mirror nova host aggregates to resource provider aggregates in the placement service. This is a useful tool if you are using aggregates in placement to optimize scheduling:

    https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregates-in-placement

    The os-aggregates compute API add_host and remove_host actions will automatically add/remove compute node resource providers from resource provider aggregates in the placement service if the nova-api service is configured to communicate with the placement service, so this command is mostly useful for existing deployments with host aggregates which are not yet mirrored in the placement service.

    For more details, see the command documentation:

    https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement

  • It is now possible to configure NUMA affinity for most neutron networks. This is available for networks that use a provider:network_type of flat or vlan and a provider:physical_network (L2 networks) or networks that use a provider:network_type of vxlan, gre or geneve (L3 networks).

    For more information, refer to the spec and documentation.

  • Placement API microversion 1.19 enhances the payloads for the GET /resource_providers/{uuid}/aggregates response and the PUT /resource_providers/{uuid}/aggregates request and response to be identical, and to include the resource_provider_generation. As with other generation-aware APIs, if the resource_provider_generation specified in the PUT request does not match the generation known by the server, a 409 Conflict error is returned.
  • An optional configuration group placement_database can be used in nova.conf to configure a separate database for use with the placement API.

    If placement_database.connection has a value then the placement_database configuration group will be used to configure a separate placement database, including using connection to identify the target database. That database will have a schema that is a replica of all the tables used in the API database. The new database schema will be created and synchronized when the nova-manage api_db sync command is run.

    When the placement_database.connection setting is omitted the existing settings for the api_database will be used for hosting placement data.

    Setting placement_database.connection and calling nova-manage api_db sync will only create tables. No data will be migrated. In an existing OpenStack deployment, if there is existing placement data in the nova_api database this will not be copied. It is up to the deployment to manually replicate that data in a fashion that works best for the environment.

  • In microversion 1.23 of the placement service, JSON formatted error responses gain a new attribute, code, with a value that identifies the type of this error. This can be used to distinguish errors that are different but use the same HTTP status code. Any error response which does not specifically define a code will have the code placement.undefined_code.
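
    A client could read the new attribute as sketched below. The helper name error_codes and the sample payload are illustrative assumptions; in particular, the top-level "errors" list and the status, title, and detail fields are assumed here and are not described in this note.

```python
import json


def error_codes(response_text):
    """Return the code of every error in a placement JSON error response."""
    body = json.loads(response_text)
    # .get() yields None when the attribute is absent, for example in
    # responses from microversions older than 1.23.
    return [err.get("code") for err in body.get("errors", [])]


# Hypothetical 409 payload, for illustration only.
sample = json.dumps({"errors": [{
    "status": 409,
    "title": "Conflict",
    "detail": "illustrative detail text",
    "code": "placement.undefined_code",
}]})
print(error_codes(sample))
```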
  • Placement microversion ‘1.22’ adds support for expressing traits which are forbidden when filtering GET /resource_providers or GET /allocation_candidates. A forbidden trait is a properly formatted trait in the existing required parameter, prefixed by a !. For example required=!STORAGE_DISK_SSD asks that the results not include any resource providers that provide solid state disk.
  • In placement API microversion 1.20, a successful POST /resource_providers returns 200 with a payload representing the newly-created resource provider. The format is the same format as the result of the corresponding GET /resource_providers/{uuid} call. This is to allow the caller to glean automatically-set fields, such as UUID and generation, without a subsequent GET.
  • In version 1.25 of the Placement API, GET /allocation_candidates is enhanced to accept numbered groupings of resource, required/forbidden trait, and aggregate association requests. A resources query parameter key with a positive integer suffix (e.g. resources42) will be logically associated with required and/or member_of query parameter keys with the same suffix (e.g. required42, member_of42). The resources, required/forbidden traits, and aggregate associations in that group will be satisfied by the same resource provider in the response. When more than one numbered grouping is supplied, the group_policy query parameter is required to indicate how the groups should interact. With group_policy=none, separate groupings - numbered or unnumbered - may or may not be satisfied by the same provider. With group_policy=isolate, numbered groups are guaranteed to be satisfied by different providers - though there may still be overlap with the unnumbered group. In all cases, each allocation_request will be satisfied by providers in a single non-sharing provider tree and/or sharing providers associated via aggregate with any of the providers in that tree.

    The required and member_of query parameters for a given group are optional. That is, you may specify resources42=XXX without a corresponding required42=YYY or member_of42=ZZZ. However, the reverse (specifying required42=YYY or member_of42=ZZZ without resources42=XXX) will result in an error.

    The semantic of the (unnumbered) resources, required, and member_of query parameters is unchanged: the resources, traits, and aggregate associations specified thereby may be satisfied by any provider in the same non-sharing tree or associated via the specified aggregate(s).
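
    The grouping rules above can be checked client-side before a request is sent. The sketch below is an illustration, not placement code: the function name granular_query, the resource amounts, and the CUSTOM_PHYSNET_A trait are assumptions. It enforces the two rules stated above (a numbered required or member_of key needs a matching resources key, and group_policy is required with more than one numbered group) and then encodes the query.

```python
import re
from urllib.parse import urlencode

# Matches numbered keys such as resources42, required42, member_of42.
_NUMBERED = re.compile(r"^(resources|required|member_of)([1-9]\d*)$")


def granular_query(params):
    """Validate numbered request groups and return the encoded query."""
    groups = {}
    for key in params:
        match = _NUMBERED.match(key)
        if match:
            groups.setdefault(match.group(2), set()).add(match.group(1))
    for suffix, kinds in groups.items():
        if "resources" not in kinds:
            raise ValueError("group %s has no resources%s key"
                             % (suffix, suffix))
    if len(groups) > 1 and "group_policy" not in params:
        raise ValueError(
            "group_policy is required with more than one numbered group")
    # Sort for a stable result; keep ':', ',' and '!' readable.
    return urlencode(sorted(params.items()), safe=":,!")


query = granular_query({
    "resources": "VCPU:2",
    "resources1": "SRIOV_NET_VF:1",
    "required1": "CUSTOM_PHYSNET_A",
    "resources2": "SRIOV_NET_VF:1",
    "group_policy": "isolate",
})
print(query)
```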

  • Prior to microversion 1.8 of the placement API, one could create allocations and not supply a project or user ID for the consumer of the allocated resources. While this is no longer allowed after placement API 1.8, older allocations exist and we now ensure that a consumer record is created for these older allocations. Use the two new CONF options CONF.placement.incomplete_consumer_project_id and CONF.placement.incomplete_consumer_user_id to control the project and user identifiers that are written for these incomplete consumer records.
  • Placement API microversion 1.18 adds support for the required query parameter to the GET /resource_providers API. It accepts a comma-separated list of string trait names. When specified, the API results will be filtered to include only resource providers marked with all the specified traits. This is in addition to (logical AND) any filtering based on other query parameters.

    Trait names which are empty, do not exist, or are otherwise invalid will result in a 400 error.

  • From microversion 1.27, the provider_summaries field in the response of the GET /allocation_candidates API includes all the resource class inventories, while it had only requested resource class inventories with older microversions. Now callers can use this additional inventory information in making further sorting or filtering decisions.
  • The PowerVM driver now supports hot plugging/unplugging of network interfaces.
  • The PowerVM virt driver now supports booting from local ephemeral disk. Two new configuration options have been introduced to the powervm configuration group, disk_driver and volume_group_name. The former allows the selection of either ssp or localdisk for the PowerVM disk driver. The latter specifies the name of the volume group when using the localdisk disk driver.
  • The PowerVM virt driver now supports instance snapshot.
  • The PowerVM virt driver now supports vSCSI Fibre Channel cinder volumes. PowerVM now supports attaching, detaching, and extending the size of vSCSI FC cinder volumes.
  • The nova-manage command now has a ‘db purge’ command that will delete data from the shadow tables after ‘db archive_deleted_rows’ has been run. There is also now a --purge option for ‘db archive_deleted_rows’ that will automatically do a full purge after archiving.
  • Introduces the powervm configuration group which contains the proc_units_factor configuration option. This allows the operator to specify the physical processing power to assign per vCPU.
  • The nova-manage cell_v2 map_instances command uses a marker so that, by default, repeated runs of the command start from where the last run finished. A --reset option has been added to this command that resets the marker, so users can restart the process from the beginning if needed.
  • Utilizing recent changes in oslo.messaging, the rpc_response_timeout value can now be increased significantly if needed or desired to solve issues with long-running RPC calls timing out before completing due to legitimate reasons (such as live migration prep). If rpc_response_timeout is increased beyond the default, nova will request active call monitoring from oslo.messaging, which will effectively heartbeat running activities to avoid a timeout, while still detecting failures related to service outages or message bus congestion in a reasonable amount of time. Further, the [DEFAULT]/long_rpc_timeout option has been added which allows setting an alternate timeout value for longer-running RPC calls which are known to take a long time. The default for this is 1800 seconds, and the rpc_response_timeout value will be used for the heartbeat frequency interval, providing a similar failure-detection experience for these calls despite the longer overall timeout. Currently, only the live migration RPC call uses this longer timeout value.
  • Added ability to extend an attached ScaleIO volume when using the libvirt compute driver.
  • Support for filtering out disabled cells during scheduling for server create requests has been added. Firstly the concept of disabled cells has been introduced which means such disabled cells will not be candidates for the scheduler. Secondly changes have been made to the filter scheduler to ensure that it chooses only the enabled cells for scheduling and filters out the disabled ones. Note that operations on existing instances already inside a disabled cell like move operations will not be blocked.
  • The scheduler can now use placement to more efficiently query for hosts within a tenant-restricted aggregate. This requires that a host aggregate is created in nova with the filter_tenant_id key (optionally suffixed with any string for multiple tenants, like filter_tenant_id3=$tenantid) and the same aggregate is created in placement with an identical UUID. The [scheduler]/limit_tenants_to_placement_aggregate config option enables this behavior and [scheduler]/placement_aggregate_required_for_tenants makes it either optional or mandatory, allowing only some tenants to be restricted. For more information, see the schedulers section of the administration guide.
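
    The suffixed-key convention can be pictured with a small sketch that collects tenant IDs from aggregate metadata. The function name tenants_for_aggregate and the sample metadata are hypothetical; this is a simplified picture of the key matching, not the scheduler's actual code.

```python
def tenants_for_aggregate(metadata):
    """Collect tenant IDs from keys that start with 'filter_tenant_id'."""
    prefix = "filter_tenant_id"
    # Keys such as filter_tenant_id, filter_tenant_id3 or filter_tenant_id_b
    # all match; unrelated keys are ignored.
    return sorted({value for key, value in metadata.items()
                   if key.startswith(prefix)})


# Hypothetical aggregate metadata for illustration.
aggregate_metadata = {
    "filter_tenant_id": "tenant-a",
    "filter_tenant_id3": "tenant-b",
    "availability_zone": "az1",
}

print(tenants_for_aggregate(aggregate_metadata))
```
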
  • The 2.63 compute REST API microversion adds support for the trusted_image_certificates parameter, which is used to define a list of trusted certificate IDs that can be used during image signature verification and certificate validation. The list is restricted to a maximum of 50 IDs. Note that this parameter is not supported for volume-backed servers.

    The trusted_image_certificates request parameter can be passed to the server create and rebuild APIs (if allowed by policy):

    • POST /servers
    • POST /servers/{server_id}/action (rebuild)

    The following policy rules were added to restrict the usage of the trusted_image_certificates request parameter in the server create and rebuild APIs:

    • os_compute_api:servers:create:trusted_certs
    • os_compute_api:servers:rebuild:trusted_certs

    The trusted_image_certificates parameter will be in the response body of the following APIs (not restricted by policy):

    • GET /servers/detail
    • GET /servers/{server_id}
    • PUT /servers/{server_id}
    • POST /servers/{server_id}/action (rebuild)

    The payloads of the instance.create.start, instance.create.end and instance.create.error versioned notifications have been extended with the trusted_image_certificates field that contains the list of trusted certificate IDs used when the instance is created.

    The payloads of the instance.rebuild.start, instance.rebuild.end and instance.rebuild.error versioned notifications have been extended with the trusted_image_certificates field that contains the list of trusted certificate IDs used when the instance is rebuilt. This change also causes the type of the payload object to change from InstanceActionPayload version 1.6 to InstanceActionRebuildPayload version 1.7. See the notification dev reference for the sample file of instance.rebuild.start as an example.
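
    A minimal sketch of a POST /servers request carrying the new parameter is shown below. The helper name build_create_request and the sample IDs are hypothetical, and the client-side length check merely mirrors the 50-ID limit described above; the API performs its own validation.

```python
import json

MAX_TRUSTED_CERTS = 50  # limit stated in the note above


def build_create_request(name, image_ref, flavor_ref, cert_ids):
    """Return (headers, body) for a server create request at 2.63."""
    if len(cert_ids) > MAX_TRUSTED_CERTS:
        raise ValueError("at most %d certificate IDs are allowed"
                         % MAX_TRUSTED_CERTS)
    headers = {
        # Request the microversion that introduced the parameter.
        "OpenStack-API-Version": "compute 2.63",
        "Content-Type": "application/json",
    }
    body = {
        "server": {
            "name": name,
            "imageRef": image_ref,
            "flavorRef": flavor_ref,
            "trusted_image_certificates": list(cert_ids),
        }
    }
    return headers, json.dumps(body)


# Hypothetical identifiers for illustration only.
headers, body = build_create_request(
    "signed-vm", "image-uuid", "flavor-id", ["cert-id-1", "cert-id-2"])
print(headers["OpenStack-API-Version"])
print(body)
```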

  • As of the 2018-08-27 metadata API version, a boolean vf_trusted key appears for all network interface devices in meta_data.json, indicating whether the device is a trusted virtual function or not.
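
    A guest could read the new key roughly as follows. The sample document is a hypothetical excerpt: only the vf_trusted key comes from this note, and the surrounding layout (a devices list with type and mac fields) is an assumption made for the example.

```python
import json

# Hypothetical excerpt of meta_data.json; only vf_trusted is taken from the
# note above, the rest of the layout is assumed for illustration.
SAMPLE_META_DATA = """
{
  "devices": [
    {"type": "nic", "mac": "fa:16:3e:00:00:01", "vf_trusted": true},
    {"type": "nic", "mac": "fa:16:3e:00:00:02", "vf_trusted": false}
  ]
}
"""


def trusted_nic_macs(meta_data_text):
    """Return the MAC addresses of NICs flagged as trusted VFs."""
    data = json.loads(meta_data_text)
    return [
        dev["mac"]
        for dev in data.get("devices", [])
        if dev.get("type") == "nic" and dev.get("vf_trusted") is True
    ]


print(trusted_nic_macs(SAMPLE_META_DATA))
```
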
  • The libvirt compute driver now allows users to create instances with SR-IOV virtual functions which will be configured as trusted.

    The operator will have to create pools of devices with tag trusted=true.

    For example, modify /etc/nova/nova.conf and set:

    [pci]
    passthrough_whitelist = {"devname": "eth0", "trusted": "true",
                             "physical_network":"sriovnet1"}
    

    Where “eth0” is the interface name related to the physical function.

    Ensure that the version of ip-link on the compute host supports setting the trust mode on the device.

    Ports from the physical network will have to be created with a binding profile to match the trusted tag. Only ports with binding:vif_type=hw_veb and binding:vnic_type=direct are supported.

    $ neutron port-create <net-id> \
                          --name sriov_port \
                          --vnic-type direct \
                          --binding:profile type=dict trusted=true
    
  • The new style policy field has been added to ServerGroupPayload. The server_group.create, server_group.delete and server_group.add_member versioned notifications will be updated to include the new policy and rules fields. The policies field is deprecated for removal but is still put into the notification payload for backward compatibility.
  • Add a new option of image_handler in the xenapi section for configuring the image handler plugin which will be used by XenServer to download or upload images. The value for this option should be a short name representing a supported handler.

    The following are the short names and description of the plugins which they represent:

    • direct_vhd

      This plugin directly processes the VHD files in the XenServer SR (Storage Repository), so it only works when the host’s SR type is file system based, e.g. ext or nfs. This is the default plugin.

    • vdi_local_dev

      This plugin implements an image upload method which attaches the VDI as a local disk in the VM in which the OpenStack Compute service runs. It uploads the raw disk to glance when creating an image. When booting an instance from a glance image, it downloads the image and streams it into the disk which is attached to the compute VM.

    • vdi_remote_stream

      This plugin implements an image proxy in nova compute service.

      For image upload, the proxy will export a data stream for a VDI from XenServer via the remote API supplied by XAPI; convert the stream to the image format supported by glance; and upload the image to glance.

      For image download, the proxy downloads an image stream from glance; extracts the data stream from the image stream; and then remotely imports the data stream to XenServer’s VDI via the remote API supplied by XAPI.

      Note: Under this implementation, the image data may reside in one or more pieces of storage of various formats on the host, but the import and export operations interact with a single, proxied VDI object independent of the underlying structure.

Known Issues

  • The initial implementation of native LUKS decryption within Libvirt 2.2.0 had a known issue with the use of passphrases that were a multiple of 16 bytes in size. This was resolved in the upstream 3.3.0 release of Libvirt and has been backported to various downstream distribution specific versions.

    A simple warning will reference the above if Nova encounters this issue. However, operators of the environment will still need to update Libvirt to a version in which the issue has been fixed.

Upgrade Notes

  • The minimum version of libvirt on AArch64 architecture that nova compute will interoperate with is now 3.6.0. Deployments using older versions of libvirt on AArch64 should upgrade.
  • The nova-network service has been deprecated since the 14.0.0 Newton release and now the following nova-network specific REST APIs have been removed along with their related policy rules. Calling these APIs will now result in a 410 HTTPGone error response.

    • GET /os-fping
    • GET /os-fping/{server_id}
    • GET /servers/{server_id}/os-virtual-interfaces
    • GET /os-fixed-ips/{fixed_ip}
    • POST /os-fixed-ips/{fixed_ip}/action (reserve)
    • POST /os-fixed-ips/{fixed_ip}/action (unreserve)
    • GET /os-floating-ips-bulk
    • GET /os-floating-ips-bulk/{host_name}
    • POST /os-floating-ips-bulk
    • PUT /os-floating-ips-bulk/delete
    • GET /os-floating-ip-dns
    • PUT /os-floating-ip-dns/{domain}
    • DELETE /os-floating-ip-dns/{domain}
    • GET /os-floating-ip-dns/{domain}/entries/{ip}
    • GET /os-floating-ip-dns/{domain}/entries/{name}
    • PUT /os-floating-ip-dns/{domain}/entries/{name}
    • DELETE /os-floating-ip-dns/{domain}/entries/{name}

    In addition, the following configuration options have been removed.

    • [api]/fping_path
  • The nova-api service now requires the [placement] section to be configured in nova.conf if you are using a separate config file just for that service. This is because the nova-api service now needs to talk to the placement service in order to delete resource provider allocations when deleting an instance and the nova-compute service on which that instance is running is down. This change is idempotent if [placement] is not configured in nova-api but it will result in new warnings in the logs until configured. See bug https://bugs.launchpad.net/nova/+bug/1679750 for more details.
  • The image_ref_url entry in legacy instance notification payloads will be just the instance image id if [glance]/api_servers is not set and the notification is being sent from a periodic task. In this case the periodic task does not have a token to get the image service endpoint URL from the identity service so only the image id is in the payload. This does not affect versioned notifications.
  • A new check is added to nova-status upgrade check which will scan all cells looking for nova-osapi_compute service versions which are from before Ocata and which may cause issues with how the compute API finds instances. This will result in a warning if:

    • No cell mappings are found
    • The minimum nova-osapi_compute service version is less than 15 in any given cell

    See https://bugs.launchpad.net/nova/+bug/1759316 for more details.

  • noVNC 1.0.0 introduced a breaking change in the URLs used to access the console. Previously, the vnc_auto.html path was used but it is now necessary to use the vnc_lite.html path. When noVNC is updated to 1.0.0, [vnc] novncproxy_base_url configuration value must be updated on each compute node to reflect this change.
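
    The required change to the configured URL amounts to swapping the final path component. The sketch below shows that rewrite; the helper name update_novnc_url and the example host and port are assumptions, and the result still has to be written to [vnc] novncproxy_base_url on each compute node by whatever tooling the deployment uses.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_PAGE = "vnc_auto.html"
NEW_PAGE = "vnc_lite.html"


def update_novnc_url(url):
    """Replace a trailing vnc_auto.html path component with vnc_lite.html."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/" + OLD_PAGE):
        path = path[: -len(OLD_PAGE)] + NEW_PAGE
    # URLs that do not end in the old page are returned unchanged.
    return urlunsplit(parts._replace(path=path))


# Example URL for illustration only.
print(update_novnc_url("http://127.0.0.1:6080/vnc_auto.html"))
```
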
  • Deployments with custom scheduler filters (or weighers) that rely on the HostState.instances dict to contain full Instance objects will now hit a performance penalty because the Instance values in that dict are no longer fully populated objects. The in-tree filters that do rely on HostState.instances only care about (1) the uuids of the instances per host, which are the keys in the dict, and (2) the number of instances per host, which can be determined via len(host_state.instances).

    Custom scheduler filters and weighers should continue to function since the Instance objects will lazy-load any accessed fields, but this means a round-trip to the database to re-load the object per instance, per host.

    If this is an issue for you, you have three options:

    • Accept this change along with the performance penalty
    • Revert change I766bb5645e3b598468d092fb9e4f18e720617c52 and carry the fork in the scheduler code
    • Contribute your custom filter/weigher upstream (this is the best option)
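
    A custom filter that stays on the cheap path uses only the dict keys and its length, as in the sketch below. FakeHostState and MaxInstancesFilter are hypothetical stand-ins written for this example; they are not nova classes, and a real filter would subclass nova's filter base class instead.

```python
class FakeHostState:
    """Stand-in for the scheduler's HostState; not the real nova class."""

    def __init__(self, host, instances):
        self.host = host
        # Dict keyed by instance UUID. The values are deliberately unused so
        # that no lazy-load (and no database round-trip) is triggered.
        self.instances = instances


class MaxInstancesFilter:
    """Hypothetical filter that passes hosts below an instance-count limit."""

    def __init__(self, max_instances):
        self.max_instances = max_instances

    def host_passes(self, host_state, spec_obj=None):
        # len() and the keys are available without touching Instance fields.
        return len(host_state.instances) < self.max_instances


def instance_uuids(host_state):
    """Return the UUIDs hosted on a host using only the dict keys."""
    return set(host_state.instances)


busy = FakeHostState("node1", {"uuid-1": object(), "uuid-2": object()})
idle = FakeHostState("node2", {})
flt = MaxInstancesFilter(max_instances=2)
print(flt.host_passes(busy), flt.host_passes(idle), instance_uuids(busy))
```
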
  • A new online data migration has been added to populate missing instance.availability_zone values for instances older than Pike whose availability_zone was not specified during boot time. This can be run during the normal nova-manage db online_data_migrations routine. This fixes Bug 1768876.
  • This release moves the libvirt driver IVS VIF plug-unplug to a separate package called os-vif-bigswitch. This package is a requirement on compute nodes when using networking-bigswitch as the neutron ML2 and L3 driver. Releases are available on https://pypi.org/project/os-vif-bigswitch/. The major version of the package matches the upstream neutron version number. The minor version tracks compatibility with Big Cloud Fabric (BCF) releases, and is typically set to the lowest supported BCF release.
  • The new [scheduler]workers configuration option defaults to ncpu workers if using the filter_scheduler scheduler driver. If you are running nova-scheduler on the same host as other services, you may want to change this default value, or to otherwise account for running other instances of the nova-scheduler service.
  • A new check is added to the nova-status upgrade check CLI which can assist with determining if ironic instances have had their embedded flavor migrated to use the corresponding ironic node custom resource class.
  • A new check is added to the nova-status upgrade check CLI to make sure request spec online migrations have been run per-cell. Missing request spec compatibility code is planned to be removed in the Stein release.
  • The PowerVM virt driver previously used the PowerVM Shared Storage Pool disk driver by default. The default disk driver for PowerVM is now localdisk. See configuration option [powervm]/disk_driver for usage details.
  • The following commands are no longer required to be listed in your rootwrap configuration: e2fsck; mkfs; tune2fs; xenstore_read.
  • Previously the PowerVM driver would default to 0.5 physical processors per vCPU, which is the default from the pypowervm library. The default will now be 0.1 physical processors per vCPU, from the proc_units_factor configuration option in the powervm configuration group.
  • The [filter_scheduler]/use_baremetal_filters and [filter_scheduler]/baremetal_enabled_filters configuration options were deprecated in the 16.0.0 Pike release since deployments serving baremetal instances should be scheduling based on resource classes. Those options have now been removed.

    Similarly, the ironic_host_manager choice for the [scheduler]/host_manager configuration option was deprecated in the 17.0.0 Queens release because ironic_host_manager is only useful when using use_baremetal_filters=True and baremetal_enabled_filters. Now that those options are gone, the deprecated ironic_host_manager host manager choice has also been removed. As a result, the [scheduler]/host_manager configuration option has also been removed since there is only one host manager now and no need for an option.

    Remember to run nova-status upgrade check before upgrading to 18.0.0 Rocky to ensure baremetal instances have had their embedded flavor migrated to use the corresponding ironic node custom resource class.

  • The following options, previously found in the [crypto] group, have been removed:

    • ca_file
    • key_file
    • crl_file
    • keys_path
    • ca_path
    • use_project_ca
    • user_cert_subject
    • project_cert_subject

    These have not been used in recent releases.

  • The db_driver configuration option was deprecated in a previous release and has now been removed. This option allowed you to replace the SQLAlchemy database layer with one of your own. The approach was deprecated and unsupported, and it is now time to remove it completely.
  • The following deprecated configuration options have been removed from the api section of nova.conf:

    • allow_instance_snapshots

    These were deprecated in the 16.0.0 release as they allowed inconsistent API behavior across deployments. To disable snapshots in the createImage server action API, change the os_compute_api:servers:create_image and os_compute_api:servers:create_image:allow_volume_backed policies.

  • The following deprecated configuration options have been removed from the compute section of nova.conf:

    • multi_instance_display_name_template

    These were deprecated in the 15.0.0 release as they allowed for inconsistent API behavior across deployments.

  • The following deprecated options have been removed from the placement group of nova.conf:

    • os_region_name (use region_name instead)
    • os_interface (use valid_interfaces instead)

    These were deprecated in 17.0.0 as they have been superseded by their respective keystoneauth1 Adapter configuration options.

  • The following configuration options were deprecated for removal in the 17.0.0 Queens release and have now been removed:

    • [DEFAULT]/monkey_patch
    • [DEFAULT]/monkey_patch_modules
    • [notifications]/default_publisher_id

    Monkey patching nova is not tested, not supported, and is a barrier to interoperability. If you have code which relies on monkey patching decorators, for example, for notifications, please propose those changes upstream.

  • The [DEFAULT]/scheduler_driver_task_period configuration option, which was deprecated in the 15.0.0 Ocata release, has now been removed. Use the [scheduler]/periodic_task_interval option instead.
  • The [conductor] topic configuration option was previously deprecated and is now removed from nova. There was no need to let users choose the RPC topics for all services. There was little benefit from this and it made it really easy to break nova by changing the value of topic options.
  • The [xenserver]/vif_driver configuration option was deprecated in the 15.0.0 Ocata release and has now been removed. The only supported vif driver is now XenAPIOpenVswitchDriver used with Neutron as the backend networking service configured to run the neutron-openvswitch-agent service. See the XenServer configuration guide for more details on networking setup.
  • The minimum required version of libvirt used by the nova-compute service is now 1.3.1, and the minimum required version of QEMU used by the nova-compute service is now 2.5.0. Failing to meet these minimum versions when using the libvirt compute driver will result in the nova-compute service not starting.
  • The [quota]/driver configuration option is no longer deprecated but now only allows one of two possible values:

    • nova.quota.DbQuotaDriver
    • nova.quota.NoopQuotaDriver

    This means it is no longer possible to class-load custom out-of-tree quota drivers.

  • If the scheduler service is started before the cell mappings are created or setup, nova-scheduler needs to be restarted or SIGHUP-ed for the newly added cells to get registered in the scheduler cache.
  • The default value of the configuration attribute [libvirt]/rng_dev_path is now set to /dev/urandom. Refer to the documentation of rng_dev_path for details.
  • The nova-consoleauth service has been deprecated and new consoles will have their token authorizations stored in cell databases. With this, console proxies are required to be deployed per cell. All existing consoles will be reset. For most operators, this should be a minimal disruption as the default TTL of a console token is 10 minutes.

    There is a new configuration option [workarounds]/enable_consoleauth for use by operators who:

    • Are performing a live, rolling upgrade and not all compute hosts are currently running Rocky code
    • Have not yet deployed console proxies per cell
    • Have configured a much longer token TTL
    • Otherwise wish to avoid immediately resetting all existing consoles

    When the option is set to True, the console proxy will fall back on the nova-consoleauth service to locate existing console authorizations. The option defaults to False.

    Operators may unset the configuration option when:

    • The live, rolling upgrade has all compute hosts running Rocky code
    • Console proxies have been deployed per cell
    • All of the existing consoles have expired. For example, if a deployment has configured a token TTL of one hour, the operator may disable the [workarounds]/enable_consoleauth option one hour after deploying the new code.

    Note

    Cells v1 was not converted to use the database backend for console token authorizations. Cells v1 console token authorizations will continue to be supported by the nova-consoleauth service and use of the [workarounds]/enable_consoleauth option does not apply to Cells v1 users.

Deprecation Notes

  • Support to monitor performance events for Intel CMT (Cache Monitoring Technology, or “CQM” in Linux kernel parlance) – namely cmt, mbm_local and mbm_total – via the config attribute [libvirt]/enabled_perf_events is now deprecated from Nova, and will be removed in the “Stein” release. If you have enabled those events and upgraded to Linux kernel 4.14 (or a suitable downstream version), instances will fail to boot.

    That is because the Linux kernel has deleted the perf framework integration with Intel CMT, as the feature was broken by design – an incompatibility between Linux’s perf infrastructure and Intel CMT. It was removed in upstream Linux version v4.14; but bear in mind that downstream Linux distributions with lower kernel versions than 4.14 have backported the said change.

  • Running API services (nova-osapi_compute or nova-metadata) with eventlet is now deprecated. Deploy with a WSGI server such as uwsgi or mod_wsgi.
  • Two keymap-related configuration options have been deprecated:

    • [vnc] keymap
    • [spice] keymap

    The VNC option affects the libvirt and VMWare virt drivers, while the SPICE option only affects libvirt. For the libvirt driver, configuring these options resulted in lossy keymap conversions for the given graphics method. It is recommended that users should unset these options and configure their guests as necessary instead. In the case of noVNC, noVNC 1.0.0 should be used as this provides support for QEMU’s Extended Key Event messages. Refer to bug #1682020 and the QEMU RFB pull request for more information.

    For the VMWare driver, only the VNC option applies. However, this option is deprecated and will not affect any other driver in the future. A new option has been added to the [vmware] group to replace this:

    • [vmware] vnc_keymap

    The [vnc] keymap and [spice] keymap options will be removed in a future release.

  • The following options, found in DEFAULT, were only used for configuring nova-network and are, like nova-network itself, now deprecated.
    • network_manager
  • The nova-consoleauth service has been deprecated. Console token authorization storage is moving from the nova-consoleauth service backend to the database backend, with storage happening in both, in Rocky. In Stein, only the database backend will be used for console token authorization storage.

    Note

    Cells v1 was not converted to use the database backend for console token authorizations. Cells v1 console token authorizations will continue to be supported by the nova-consoleauth service.

  • The fping_path configuration option has been deprecated. /os-fping is used by nova-network and nova-network itself is deprecated and will be removed in the future.
  • The [libvirt]/sparse_logical_volumes configuration option is now deprecated. Sparse logical volumes were never verified by tests in Nova, and some bugs were found that have no fixes, so the feature is being deprecated. By default, the LVM image backend allocates all the disk size to a logical volume. If you want thin-provisioned logical volumes in the volume group, use Cinder with volume-backed instances.
  • The following configuration options in the [upgrade_levels] group have been deprecated:
    • network - The nova-network service was deprecated in the 14.0.0 Newton release and will be removed in an upcoming release.
    • cert - The nova-cert service was removed in the 16.0.0 Pike release so this option is no longer used.
    • consoleauth - The nova-consoleauth service was deprecated in the 18.0.0 Rocky release and will be removed in an upcoming release.
  • The image_upload_handler option in the xenserver conf section has been deprecated. Please use the new option of image_handler to configure the image handler which is used to download or upload images.

Security Issues

  • A new policy rule, os_compute_api:servers:create:zero_disk_flavor, has been introduced which defaults to rule:admin_or_owner for backward compatibility, but can be configured to make the compute API enforce that server create requests using a flavor with zero root disk must be volume-backed or fail with a 403 HTTPForbidden error.

    Allowing image-backed servers with a zero root disk flavor can be potentially hazardous if users are allowed to upload their own images, since an instance created with a zero root disk flavor gets its size from the image, which can be unexpectedly large and exhaust local disk on the compute host. See https://bugs.launchpad.net/nova/+bug/1739646 for more details.

    While this is introduced in a backward-compatible way, the default will be changed to rule:admin_api in a subsequent release. It is advised that you communicate this change to your users before turning on enforcement since it will result in a compute API behavior change.
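
    The decision the rule controls can be sketched as a small function. The name check_zero_disk_flavor and the allow_zero_disk flag (standing in for the outcome of the policy check for the requesting user) are hypothetical; this is a simplified picture of the behavior described above, not the API code.

```python
def check_zero_disk_flavor(root_gb, is_volume_backed, allow_zero_disk):
    """Return the HTTP status a server create request would get.

    allow_zero_disk stands in for the result of evaluating the
    os_compute_api:servers:create:zero_disk_flavor policy rule.
    """
    if root_gb == 0 and not is_volume_backed and not allow_zero_disk:
        return 403  # HTTPForbidden: image-backed with a zero root disk
    return 202  # request accepted


# Image-backed server, zero root disk flavor, policy does not allow it.
print(check_zero_disk_flavor(0, is_volume_backed=False, allow_zero_disk=False))
# Same flavor but volume-backed: the rule does not block the request.
print(check_zero_disk_flavor(0, is_volume_backed=True, allow_zero_disk=False))
```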

  • To mitigate potential issues with compute nodes disabling themselves in response to failures that were either non-fatal or user-generated, the consecutive build failure counter functionality in the compute service has been changed to advise the scheduler of the count instead of self-disabling the service upon exceeding the threshold. The [compute]/consecutive_build_service_disable_threshold configuration option still controls whether the count is tracked, but the action taken on this value has been changed to a scheduler weigher. This allows the scheduler to be configured to weigh hosts with consecutive failures lower than other hosts, configured by the [filter_scheduler]/build_failure_weight_multiplier option. If the compute threshold option is nonzero, computes will report their failure count for the scheduler to consider. If the threshold value is zero, then computes will not report this value and the scheduler will assume the number of failures for non-reporting compute nodes to be zero. By default, the scheduler weigher is enabled and configured with a very large multiplier to ensure that hosts with consecutive failures are scored low by default.
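
    The effect of the weigher can be approximated as a negative weight proportional to the reported failure count. In the sketch below, the function name weigh_hosts, the host names and the multiplier value are illustrative assumptions, and the normalization that the scheduler applies across weighers is left out.

```python
def weigh_hosts(failed_builds_by_host, multiplier=1000000.0):
    """Order hosts from most to least preferred by build-failure weight."""
    # Each consecutive failure lowers the weight; hosts that do not report
    # a count are expected to appear here with a count of zero.
    weights = {host: -multiplier * failures
               for host, failures in failed_builds_by_host.items()}
    return sorted(weights, key=lambda host: weights[host], reverse=True)


# Hypothetical failure counts reported by three compute hosts.
print(weigh_hosts({"node-a": 2, "node-b": 0, "node-c": 1}))
```

    Unlike the previous self-disabling behavior, a host with failures remains a candidate and is simply ranked behind hosts with fewer failures.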

Bug Fixes

  • The nova-compute service now allows specifying the interval for updating nova-compute-side cache of the compute node resource provider’s aggregates and traits info via a new config option called [compute]/resource_provider_association_refresh which defaults to 300. This was previously hard-coded to run every 300 seconds which may be too often in a large deployment.
  • Booting volume-backed instances no longer includes an incorrect allocation against the compute node for the root disk. Historically, this has been quite broken behavior in Nova, where volume-backed instances would count against available space on the compute node, even though their storage was provided by the volume service. Now, newly-booted volume-backed instances will not create allocations of DISK_GB against the compute node for the root_gb quantity in the flavor. Note that if you are still using a scheduler configured with the (now deprecated) DiskFilter (including deployments using CachingScheduler), the above change will not apply to you.
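
    The change can be illustrated with a sketch of the DISK_GB amount requested for an instance. The helper name disk_gb_request is hypothetical, and the way ephemeral and swap space are added here is an assumption for the example; only the exclusion of root_gb for volume-backed instances comes from this note.

```python
import math


def disk_gb_request(root_gb, ephemeral_gb, swap_mb, is_volume_backed):
    """Return the DISK_GB amount to allocate against the compute node."""
    # Assumed accounting: ephemeral disk plus swap rounded up to whole GB.
    disk_gb = ephemeral_gb + math.ceil(swap_mb / 1024)
    if not is_volume_backed:
        # The root disk lives on the compute node only when the instance
        # is image-backed; for volume-backed instances it is skipped.
        disk_gb += root_gb
    return disk_gb


print(disk_gb_request(20, 10, 512, is_volume_backed=False))
print(disk_gb_request(20, 10, 512, is_volume_backed=True))
```
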
  • Listing server and migration records used to give a 500 to users when a cell database was unreachable. Now only records from available cells are included to avoid the 500 error. The down cells are basically skipped when forming the results and this solution is planned to be further enhanced through the blueprint handling-down-cell.
  • The SchedulerReportClient (nova.scheduler.client.report.SchedulerReportClient) sends requests with the global request ID in the X-Openstack-Request-Id header to the placement service. Bug 1734625
  • The DELETE /os-services/{service_id} compute API will now return a 409 HTTPConflict response when trying to delete a nova-compute service which is still hosting instances. This is because doing so would orphan the compute node resource provider in the placement service on which those instances have resource allocations, which affects scheduling. See https://bugs.launchpad.net/nova/+bug/1763183 for more details.
  • The behaviour of ImagePropertiesFilter when using multiple architectures in a cloud can be unpredictable for a user if they forget to set the architecture property in their image. Nova now allows the deployer to specify a fallback in [filter_scheduler]image_properties_default_architecture to use a default architecture if none is specified. Without this, it is possible that a VM would get scheduled on a compute node that does not support the image.
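
    The fallback can be pictured as follows. The function names, the use of a plain architecture key in the image properties and the sample hosts are assumptions for this sketch; it simplifies the matching to a single property and is not the filter's actual code.

```python
def effective_architecture(image_props, default_arch=None):
    """Return the image architecture, falling back to the configured default."""
    return image_props.get("architecture") or default_arch


def host_supports_image(image_props, host_archs, default_arch=None):
    """Return True if the host can run the image's (or default) architecture."""
    arch = effective_architecture(image_props, default_arch)
    if arch is None:
        # No property and no fallback: nothing constrains the choice, which
        # is how an image can land on an incompatible host.
        return True
    return arch in host_archs


untagged_image = {}
print(host_supports_image(untagged_image, {"aarch64"}))
print(host_supports_image(untagged_image, {"aarch64"}, default_arch="x86_64"))
```
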
  • Note that the original fix for bug 1414559, committed early in Rocky, was automatic and always enabled. Because of bug 1786346, that fix has since been reverted and superseded by an opt-in mechanism which must be enabled. Setting [compute]/live_migration_wait_for_vif_plug=True will restore the behavior of waiting for neutron events during the live migration process.

Other Notes

  • The [api]/instance_list_per_project_cells configuration option was added, which controls whether or not an instance list for non-admin users checks all cell databases for results. If disabled (the default), then a list will always contact each cell database looking for instances. This is appropriate if you have a small number of cells, and/or if you spread instances from tenants evenly across cells. If you confine tenants to a subset of cells, then enabling this will result in fewer cell database calls, as nova will only query the cells for which the tenant has instances mapped. Doing this requires one more (fast) call to the API database to get the relevant subset of cells, so if that is likely to always be the same, disabling this feature will provide better performance.
  • A new configuration option, [compute]/live_migration_wait_for_vif_plug, has been added which can be used to configure compute services to wait for network interface plugging to complete on the destination host before starting the guest transfer on the source host during live migration.

    Note that this option is read on the destination host of a live migration. If you set this option the same on all of your compute hosts, which you should do if you use the same networking backend universally, you do not have to worry about this.

    This is disabled by default for backward compatibility and because the compute service cannot reliably determine which types of virtual interfaces (port.binding:vif_type) will send network-vif-plugged events without an accompanying port binding:host_id change. Open vSwitch and linuxbridge should be OK, but OpenDaylight is at least one known backend that will not currently work in this case; see bug https://launchpad.net/bugs/1755890 for more details.

  • A new nova-manage placement heal_allocations CLI has been added to help migrate users from the deprecated CachingScheduler. Starting in 16.0.0 (Pike), the nova-compute service no longer reports instance allocations to the Placement service because the FilterScheduler does that as part of scheduling. However, the CachingScheduler does not create the allocations in the Placement service, so any instances created using the CachingScheduler after Ocata will not have allocations in Placement. The new CLI allows operators using the CachingScheduler to find all instances in all cells which do not have allocations in Placement and create those allocations. The CLI will skip any instances that are undergoing a task state transition, so ideally this would be run when the API is down but it can be run, if necessary, while the API is up. For more details on CLI usage, see the man page entry:

    https://docs.openstack.org/nova/latest/cli/nova-manage.html#placement

Creative Commons Attribution 3.0 License

Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License.