Resize and cold migrate

The resize and cold migrate APIs are commonly confused in nova because they share the same methods in the internal API, conductor and compute code. This document explains some of the differences between what happens during a resize and what happens during a cold migrate operation.

For the most part this document describes same-cell resize. For details on cross-cell resize, refer to Cross-cell resize.

High level

Cold migrate is an operation performed by an administrator to power off a server and move it from one host to a different host while keeping the same flavor. Volumes and network interfaces are disconnected from the source host and connected on the destination host. Whether the server files and disks have to be copied depends on the type of file system shared between the hosts and on the image backend. If copying is necessary, root and ephemeral disks are copied and swap disks are re-created.
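The disk handling described above boils down to a per-disk decision. The sketch below is a simplified illustration, not nova's code: `plan_disk_moves`, the `shared_storage` flag (an assumed stand-in for "the file system and image backend already make the disks visible on the destination") and the disk names and labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    kind: str  # hypothetical labels: "root", "ephemeral" or "swap"


def plan_disk_moves(disks, shared_storage):
    """Return a hypothetical per-disk action for a cold migration.

    shared_storage is an assumed flag meaning the destination host can
    already see the server's disk files, so nothing needs copying.
    """
    plan = {}
    for disk in disks:
        if shared_storage:
            # No copy needed: the destination uses the existing files.
            plan[disk.name] = "reuse"
        elif disk.kind == "swap":
            # Swap disks are re-created on the destination instead of copied.
            plan[disk.name] = "recreate"
        else:
            # Root and ephemeral disks hold data, so they are copied.
            plan[disk.name] = "copy"
    return plan


disks = [Disk("disk", "root"), Disk("disk.eph0", "ephemeral"), Disk("disk.swap", "swap")]
print(plan_disk_moves(disks, shared_storage=False))
# {'disk': 'copy', 'disk.eph0': 'copy', 'disk.swap': 'recreate'}
```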

Resize is an operation that moves a server to a different flavor and can be performed by a non-administrative owner of the server (the user). The new flavor can change certain aspects of the server, such as the number of CPUs, the amount of RAM and the disk size. Otherwise, the internal details are for the most part the same as for a cold migration.

Scheduling

Depending on how the API is configured for allow_resize_to_same_host, the server may be resized on its current host. All compute drivers support resizing to the same host, but only the vCenter driver supports cold migrating to the same host. Enabling resize to the same host is necessary for features such as strict affinity server groups that contain more than one server.
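The same-host rules can be expressed as a small decision function. This is a hedged sketch, not nova's implementation: it assumes that allow_resize_to_same_host gates both operations and that same-host cold migration additionally depends on a driver capability. `same_host_allowed` and `driver_supports_migrate_to_same_host` are hypothetical names.

```python
def same_host_allowed(operation, allow_resize_to_same_host,
                      driver_supports_migrate_to_same_host):
    """Hypothetical check: may the server land on its current host?"""
    if operation not in ("resize", "cold_migrate"):
        raise ValueError(f"unknown operation: {operation!r}")
    if not allow_resize_to_same_host:
        # Assumption: with the option disabled, the current host is
        # excluded for both operations.
        return False
    if operation == "resize":
        # Every compute driver supports resizing on the same host.
        return True
    # Same-host cold migration also needs driver support (only vCenter).
    return driver_supports_migrate_to_same_host


print(same_host_allowed("cold_migrate", True, False))  # False
```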

Starting with microversion 2.56, an administrator can specify a destination host for the cold migrate operation. Resize does not allow specifying a destination host.
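To make the difference concrete, here is a sketch of the request bodies a client might send to `POST /servers/{server_id}/action`. The `migrate` and `resize` actions and the `OpenStack-API-Version` header are part of the compute API. The helper functions and the host name `compute-02` are hypothetical, and the code only builds the request without sending it.

```python
import json


def migrate_request(host=None, microversion=(2, 56)):
    """Build headers and a JSON body for a cold migrate server action.

    Hypothetical helper; a target host is only accepted from 2.56 onwards.
    """
    if host is not None and microversion < (2, 56):
        raise ValueError("a target host requires compute API microversion >= 2.56")
    headers = {
        "Content-Type": "application/json",
        "OpenStack-API-Version": "compute %d.%d" % microversion,
    }
    # Without a host the body is {"migrate": null} and the scheduler
    # picks the destination.
    body = {"migrate": {"host": host} if host is not None else None}
    return headers, json.dumps(body)


def resize_request(flavor_ref):
    """Resize takes a new flavor; there is no field for a destination host."""
    headers = {"Content-Type": "application/json"}
    return headers, json.dumps({"resize": {"flavorRef": flavor_ref}})


headers, body = migrate_request(host="compute-02")
print(headers["OpenStack-API-Version"], body)
# compute 2.56 {"migrate": {"host": "compute-02"}}
```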

Flavor

As noted above, with resize the flavor must change and with cold migrate the flavor will not change.
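The flavor rule can be written as a validation step. This sketch only mirrors the rule stated in this document. `check_flavor_change` and the flavor identifiers are hypothetical and do not correspond to nova's own validation code or exceptions.

```python
def check_flavor_change(operation, current_flavor_id, new_flavor_id=None):
    """Hypothetical validation: return the flavor the server ends up with."""
    if operation == "cold_migrate":
        # Cold migration always keeps the server's current flavor.
        if new_flavor_id not in (None, current_flavor_id):
            raise ValueError("cold migrate cannot change the flavor")
        return current_flavor_id
    if operation == "resize":
        # Resize must move the server to a different flavor.
        if new_flavor_id is None or new_flavor_id == current_flavor_id:
            raise ValueError("resize requires a different flavor")
        return new_flavor_id
    raise ValueError(f"unknown operation: {operation!r}")


print(check_flavor_change("resize", "small", "medium"))  # medium
```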

Resource claims

Both resize and cold migration perform a resize claim on the destination node. Historically, the resize claim was meant as a safety check on the selected node to work around race conditions in the scheduler. Since the scheduler started atomically claiming VCPU, MEMORY_MB and DISK_GB allocations using Placement, the role of the resize claim has been reduced to detecting the same race conditions for resources such as PCI devices and NUMA topology. At least as of the 20.0.0 (Train) release, these resources are not modeled in Placement and therefore are not claimed atomically.
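Here is a toy model of why a compute-side claim is still needed for resources that Placement does not track. `ToyPciTracker` is hypothetical and much simpler than nova's resource tracker. It counts free PCI devices on one node and claims them under a lock, so that two operations the scheduler sent to the same node cannot both take the last device.

```python
import threading


class ToyPciTracker:
    """Hypothetical compute-side claim for a resource not modeled in placement."""

    def __init__(self, free_devices):
        self._free = free_devices
        self._lock = threading.Lock()

    def resize_claim(self, requested):
        # The lock makes check-and-decrement atomic on this node.
        with self._lock:
            if requested > self._free:
                # A failed claim is what leads to a reschedule.
                raise RuntimeError(
                    f"claim failed: {requested} requested, {self._free} free")
            self._free -= requested
            return self._free

    def abort(self, requested):
        # Give the devices back, e.g. when the operation fails.
        with self._lock:
            self._free += requested


tracker = ToyPciTracker(free_devices=1)
print(tracker.resize_claim(1))  # 0: the first operation gets the only device
try:
    tracker.resize_claim(1)  # a second operation racing for the same node
except RuntimeError as exc:
    print(exc)  # claim failed: 1 requested, 0 free
```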

If this claim fails, the operation can be rescheduled to an alternative host, if one is available. The number of possible alternative hosts is determined by the scheduler.max_attempts configuration option.
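Here is a minimal sketch of the reschedule behavior. It assumes that the scheduler returns the selected host plus at most max_attempts - 1 alternates, so max_attempts bounds the total number of hosts tried. `try_hosts`, `flaky_claim` and the host names are hypothetical, and the default of 3 used here is only illustrative.

```python
def try_hosts(selected_host, alternates, claim, max_attempts=3):
    """Hypothetical loop: try the selected host, then alternates.

    claim is any callable that raises if the resize claim fails on a host.
    """
    # At most max_attempts - 1 alternates, so at most max_attempts hosts.
    candidates = [selected_host] + list(alternates)[: max_attempts - 1]
    errors = {}
    for host in candidates:
        try:
            claim(host)
            return host
        except Exception as exc:  # the claim failed on this host
            errors[host] = str(exc)
    raise RuntimeError(f"no valid host after {len(candidates)} attempts: {errors}")


def flaky_claim(host):
    # Hypothetical claim that fails on the first host, e.g. no free PCI device.
    if host == "compute-01":
        raise RuntimeError("PCI device claim failed")


print(try_hosts("compute-01", ["compute-02", "compute-03"], flaky_claim))  # compute-02
```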

Allocations

Since the 16.0.0 (Pike) release, the scheduler has used the placement service to filter compute nodes (resource providers) based on information in the flavor and image used to build the server. Once the scheduler has run through its filters and weighers and picked a host, resource class allocations are atomically consumed in placement with the server as the consumer.
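For illustration, here is how a flavor might translate into a placement allocation for one consumer. The body follows the dictionary format of `PUT /allocations/{consumer_uuid}` from placement microversion 1.12 onwards (the `consumer_generation` field added in 1.28 is omitted). `allocation_body` and the placeholder UUID strings are hypothetical. This sketch counts only root and ephemeral disk toward DISK_GB, while nova's real conversion also handles swap and volume-backed servers.

```python
import json


def allocation_body(flavor, rp_uuid, project_id, user_id):
    """Build a placement-style allocation body from a flavor dict (sketch)."""
    resources = {
        "VCPU": flavor["vcpus"],
        "MEMORY_MB": flavor["ram"],
        # Simplification: swap and boot-from-volume are ignored here.
        "DISK_GB": flavor["root_gb"] + flavor["ephemeral_gb"],
    }
    return {
        "allocations": {
            rp_uuid: {
                # Placement only accepts positive amounts, so skip zeros.
                "resources": {rc: amount for rc, amount in resources.items() if amount > 0}
            }
        },
        "project_id": project_id,
        "user_id": user_id,
    }


flavor = {"vcpus": 2, "ram": 4096, "root_gb": 20, "ephemeral_gb": 0}
body = allocation_body(flavor, "SOURCE_RP_UUID", "PROJECT_ID", "USER_ID")
print(json.dumps(body, indent=2))
```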

During both resize and cold migrate operations, the allocations held by the server consumer against the source compute node resource provider are moved to a migration record, and the scheduler creates allocations, held by the instance consumer, on the selected destination compute node resource provider. This is commonly referred to as migration-based allocations, which were introduced in the 17.0.0 (Queens) release.
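The ownership swap can be modeled with plain dictionaries that map a consumer to its allocations per resource provider. This is an in-memory sketch only, since nova performs the equivalent through the placement service. `start_move` and all identifiers are hypothetical.

```python
def start_move(allocations, instance_uuid, migration_uuid, dest_rp, new_resources):
    """Hypothetical model of migration-based allocations.

    allocations maps consumer -> {resource provider -> {resource class: amount}}.
    """
    # The migration record takes over what the instance held on the source node.
    allocations[migration_uuid] = allocations.pop(instance_uuid)
    # The instance now holds the new claim against the destination node
    # (the new flavor's resources for a resize, the same ones for a cold migrate).
    allocations[instance_uuid] = {dest_rp: dict(new_resources)}
    return allocations


allocations = {"INSTANCE_UUID": {"SOURCE_RP": {"VCPU": 2, "MEMORY_MB": 4096}}}
start_move(allocations, "INSTANCE_UUID", "MIGRATION_UUID", "DEST_RP",
           {"VCPU": 4, "MEMORY_MB": 8192})
print(allocations)
# {'MIGRATION_UUID': {'SOURCE_RP': {'VCPU': 2, 'MEMORY_MB': 4096}}, 'INSTANCE_UUID': {'DEST_RP': {'VCPU': 4, 'MEMORY_MB': 8192}}}
```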

If the operation is successful and confirmed, the source node allocations held by the migration record are dropped. If the operation fails or is reverted, the source compute node resource provider allocations held by the migration record are moved back to the instance consumer, and the allocations against the destination compute node resource provider are dropped.
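Continuing the dictionary model, here is how the two outcomes might play out. This is a sketch, not nova's code. `finish_move` and the identifiers are hypothetical, and the input is the state after the move has started: the migration consumer holds the source allocations and the instance consumer holds the destination allocations.

```python
def finish_move(allocations, instance_uuid, migration_uuid, confirmed):
    """Hypothetical model of how a resize or cold migrate ends."""
    source_allocs = allocations.pop(migration_uuid)
    if not confirmed:
        # Failure or revert: the instance gets its source allocations back,
        # which replaces (drops) what it held on the destination node.
        allocations[instance_uuid] = source_allocs
    # On confirm, the source allocations disappear with the migration
    # consumer and the instance keeps its destination allocations.
    return allocations


def moved_state():
    # State after the move started (hypothetical identifiers).
    return {"MIGRATION_UUID": {"SOURCE_RP": {"VCPU": 2}},
            "INSTANCE_UUID": {"DEST_RP": {"VCPU": 4}}}


print(finish_move(moved_state(), "INSTANCE_UUID", "MIGRATION_UUID", confirmed=True))
# {'INSTANCE_UUID': {'DEST_RP': {'VCPU': 4}}}
print(finish_move(moved_state(), "INSTANCE_UUID", "MIGRATION_UUID", confirmed=False))
# {'INSTANCE_UUID': {'SOURCE_RP': {'VCPU': 2}}}
```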

Summary of differences

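=======================  ==========================================  =====================================================
                         Resize                                      Cold migrate
=======================  ==========================================  =====================================================
New flavor               Yes                                         No
Authorization (default)  Admin or owner (user)                       Admin only
                         Policy rule: os_compute_api:servers:resize  Policy rule: os_compute_api:os-migrate-server:migrate
Same host                Maybe (allow_resize_to_same_host)           Only vCenter
Can specify target host  No                                          Yes (microversion >= 2.56)
=======================  ==========================================  =====================================================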

Sequence Diagrams

The following diagrams are current as of the 21.0.0 (Ussuri) release.

Resize

This is the sequence of calls to get the server to VERIFY_RESIZE status.

Diagram: Resize standard workflow
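Because the resize request returns before the operation finishes, a client typically waits for the server to reach VERIFY_RESIZE. Here is a minimal polling sketch. `wait_for_status` is hypothetical, and `get_status` stands in for whatever client call returns the server's status string. The clock and sleep functions are injectable, which makes the helper easy to test.

```python
import time


def wait_for_status(get_status, target="VERIFY_RESIZE", timeout=300.0,
                    interval=5.0, clock=time.monotonic, sleep=time.sleep):
    """Poll a hypothetical get_status() until the server reaches target."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == target:
            return status
        if status == "ERROR":
            # Stop early instead of waiting out the timeout.
            raise RuntimeError("server went to ERROR while waiting for " + target)
        if clock() >= deadline:
            raise TimeoutError(f"server still {status!r} after {timeout}s")
        sleep(interval)


statuses = iter(["RESIZE", "RESIZE", "VERIFY_RESIZE"])
print(wait_for_status(lambda: next(statuses), interval=0))  # VERIFY_RESIZE
```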

Confirm resize

This is the sequence of calls when confirming or deleting a server in VERIFY_RESIZE status.

Note that in the diagram below, when a resize is confirmed as part of deleting the server, the API calls the source compute service synchronously.

Diagram: Resize confirm workflow

Revert resize

This is the sequence of calls when reverting a server in VERIFY_RESIZE status.

Diagram: Resize revert workflow
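Both workflows are triggered through server actions on a server in VERIFY_RESIZE status: `confirmResize` and `revertResize`, each sent to `POST /servers/{server_id}/action` with no parameters. The helper below is hypothetical and only builds the JSON bodies.

```python
import json


def resize_action_body(action):
    """Build the body for confirming or reverting a resize (hypothetical helper)."""
    bodies = {
        "confirm": {"confirmResize": None},
        "revert": {"revertResize": None},
    }
    if action not in bodies:
        raise ValueError(f"unknown action: {action!r}")
    return json.dumps(bodies[action])


print(resize_action_body("confirm"))  # {"confirmResize": null}
print(resize_action_body("revert"))   # {"revertResize": null}
```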