Provide resource provider - request group mapping in allocation candidates

https://blueprints.launchpad.net/nova/+spec/placement-resource-provider-request-group-mapping-in-allocation-candidates

To support the QoS minimum bandwidth policy during server scheduling, Neutron needs to know which resource provider (RP) provides the bandwidth resource for each port in the server create request. Similar needs arise when handling VGPUs and accelerator devices.

Problem description

Placement supports granular request groups in the GET /allocation_candidates query, but the returned allocation candidates do not contain explicit information about which granular request group is fulfilled by which RP in the candidate. For example, Nova maps the resource request of a Neutron port to a granular request group in its Placement query during scheduling. After scheduling, Neutron needs to know which port got its allocation from which RP so that it can set up the proper port binding towards those network device RPs. Similar examples can be constructed with VGPUs and accelerator devices.

Doing this mapping in Nova is possible (see the current implementation) but scales poorly even for a small number of ports in a single server create request. See the Non-scalable Nova based solution section for detailed examples and analysis.

On the other hand, when Placement builds an allocation candidate, it does so by building allocations for each granular request group. Therefore Placement could include the necessary mapping information in the response with significantly less effort.

So doing the mapping in Nova also duplicates logic that is already implemented in Placement.

Use Cases

The use case of the bandwidth resource provider spec applies here, because fulfilling that use case in a scalable way depends on the change proposed in this spec. Similarly, handling VGPUs and accelerator devices requires this mapping information as well.

Proposed change

Extend the response of the GET /allocation_candidates API with an extra field, mappings, for each candidate. This field contains a mapping between resource request group names and RP UUIDs for each candidate to express which RP provides the resources for which request group.

Alternatives

For alternatives to the proposed REST API change, see the REST API impact section.

Non-scalable Nova based solution

Given a single compute with the following inventories:

Compute RP (name=compute1, uuid=compute_uuid)
 +    CPU = 1
 |    MEMORY = 1024
 |    DISK = 10
 |
 +--+Network agent RP (for SRIOV agent),
        +    uuid=sriov_agent_uuid
        |
        |
        +--+Physical network interface RP
        |    uuid = uuid5(compute1:eth0)
        |    resources:
        |        NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND=2000
        |        NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND=2000
        |    traits:
        |        CUSTOM_PHYSNET_1
        |        CUSTOM_VNIC_TYPE_DIRECT
        |
        +--+Physical network interface RP
             uuid = uuid5(compute1:eth1)
             resources:
                 NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND=2000
                 NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND=2000
             traits:
                 CUSTOM_PHYSNET_1
                 CUSTOM_VNIC_TYPE_DIRECT

Example 1 - boot with a single port having bandwidth request

Neutron port:

{
    'id': 'da941911-a70d-4aac-8be0-c3b263e6fd4f',
    'resource_request': {
        "resources": {
            "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND": 1000,
            "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND": 1000},
        "required": ["CUSTOM_PHYSNET_1",
                     "CUSTOM_VNIC_TYPE_DIRECT"]
    }
}

Placement request during scheduling:

GET /placement/allocation_candidates?
    limit=1000&
    resources=DISK_GB=1,MEMORY_MB=512,VCPU=1&
    required1=CUSTOM_PHYSNET_1,CUSTOM_VNIC_TYPE_DIRECT&
    resources1=NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND=1000,
               NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND=1000

Placement response:

{
   "allocation_requests":[
      {
         "allocations":{
            uuid5(compute1:eth0):{
              "resources":{
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000
               }
            },
            compute_uuid:{
               "resources":{
                  "MEMORY_MB":512,
                  "DISK_GB":1,
                  "VCPU":1
               }
            }
         }
      },
      // ... another similar allocation_request with uuid5(compute1:eth1)
   ],
   "provider_summaries":{
       // ...
   }
}

The filter scheduler selects the first candidate, which points to uuid5(compute1:eth0).

The nova-compute service needs to pass the UUID of the RP that provides the resources for each port to Neutron in the port binding. To be able to do that, Nova (in the current implementation, the nova-conductor) needs to find the RP in the selected allocation candidate that provides the resources the Neutron port requested. The current implementation does this by checking which RP provides the matching resource classes and resource amounts.
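The matching step described above can be sketched as follows. This is only an illustration and not the actual Nova code: the function name providers_matching_port is hypothetical, the UUIDs are placeholder strings, and it assumes that "matching" means the RP's allocation contains at least the requested amount of every requested resource class.

```python
def providers_matching_port(allocations, port_resources):
    """Return the UUIDs of RPs whose allocation covers the port's request.

    Assumption: an RP matches when it holds at least the requested amount
    of every resource class the port asks for.
    """
    return [
        rp_uuid
        for rp_uuid, allocation in allocations.items()
        if all(
            allocation["resources"].get(rc, 0) >= amount
            for rc, amount in port_resources.items()
        )
    ]


# The selected candidate from Example 1, with placeholder UUID strings.
allocations = {
    "uuid5(compute1:eth0)": {
        "resources": {
            "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND": 1000,
            "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND": 1000,
        }
    },
    "compute_uuid": {
        "resources": {"MEMORY_MB": 512, "DISK_GB": 1, "VCPU": 1}
    },
}
port_resources = {
    "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND": 1000,
    "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND": 1000,
}

matching = providers_matching_port(allocations, port_resources)
print(matching)  # ['uuid5(compute1:eth0)']
```

With a single bandwidth-requesting port the result has exactly one entry, so the lookup is unambiguous.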

During port binding nova updates the port with that network device RP:

{
  "id":"da941911-a70d-4aac-8be0-c3b263e6fd4f",
  "resource_request":{
      "resources":{
         "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":1000,
         "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000
      },
      "required":[
         "CUSTOM_PHYSNET_1",
         "CUSTOM_VNIC_TYPE_DIRECT"
      ]
   },
   "binding:host_id":"compute1",
   "binding:profile":{
      "allocation": uuid5(compute1:eth0)
   },
}

This scenario is easy: only one port is requesting bandwidth resources, so there will be only one RP in each allocation candidate that provides such resources.

Example 2 - boot with two ports having bandwidth request

Neutron port1:

{
    'id': 'da941911-a70d-4aac-8be0-c3b263e6fd4f',
    'resource_request': {
        "resources": {
            "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND": 1000,
            "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND": 1000},
        "required": ["CUSTOM_PHYSNET_1",
                     "CUSTOM_VNIC_TYPE_DIRECT"]
    }
}

Neutron port2:

{
    'id': '2f2613ce-95a9-490a-b3c4-5f1c28c1f886',
    'resource_request': {
        "resources": {
            "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND": 1000,
            "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND": 2000},
        "required": ["CUSTOM_PHYSNET_1",
                     "CUSTOM_VNIC_TYPE_DIRECT"]
    }
}

Placement request during scheduling:

GET /placement/allocation_candidates?
    group_policy=isolate&
    limit=1000&
    resources=DISK_GB=1,MEMORY_MB=512,VCPU=1&
    required1=CUSTOM_PHYSNET_1,CUSTOM_VNIC_TYPE_DIRECT&
    resources1=NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND=1000,
               NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND=1000&
    required2=CUSTOM_PHYSNET_1,CUSTOM_VNIC_TYPE_DIRECT&
    resources2=NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND=1000,
               NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND=2000

In the above request the granular request group1 is generated from port1 and granular request group2 is generated from port2.
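How the numbered groups could be derived from the ports is sketched below. This is a hedged illustration, not Nova's code: build_candidates_query is a hypothetical helper, the group numbering simply follows the order of the ports, and group_policy=isolate is added only when there is more than one port, mirroring the two examples in this spec.

```python
from urllib.parse import urlencode


def build_candidates_query(flavor_resources, ports, limit=1000):
    """Build the allocation candidates URL and the group -> port id map."""

    def fmt(resources):
        # "RC=amount,RC=amount", sorted for a stable result
        return ",".join(f"{rc}={amt}" for rc, amt in sorted(resources.items()))

    params = {"limit": limit, "resources": fmt(flavor_resources)}
    if len(ports) > 1:
        params["group_policy"] = "isolate"

    group_to_port = {}
    for index, port in enumerate(ports, start=1):
        request = port["resource_request"]
        params[f"required{index}"] = ",".join(request["required"])
        params[f"resources{index}"] = fmt(request["resources"])
        group_to_port[str(index)] = port["id"]

    # Keep "=" and "," readable instead of percent-encoding them.
    url = "/placement/allocation_candidates?" + urlencode(params, safe="=,")
    return url, group_to_port


traits = ["CUSTOM_PHYSNET_1", "CUSTOM_VNIC_TYPE_DIRECT"]
ports = [
    {"id": "da941911-a70d-4aac-8be0-c3b263e6fd4f",
     "resource_request": {
         "resources": {"NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND": 1000,
                       "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND": 1000},
         "required": traits}},
    {"id": "2f2613ce-95a9-490a-b3c4-5f1c28c1f886",
     "resource_request": {
         "resources": {"NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND": 1000,
                       "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND": 2000},
         "required": traits}},
]
url, group_to_port = build_candidates_query(
    {"DISK_GB": 1, "MEMORY_MB": 512, "VCPU": 1}, ports)
print(url)
print(group_to_port)
```

The caller has to keep the group number to port id map around, because the response carries no reference back to the ports.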

Placement response:

{
   "allocation_requests":[
      {
         "allocations":{
            uuid5(compute1:eth0):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":1000
               }
            },
            uuid5(compute1:eth1):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":2000
               }
            },
            compute_uuid:{
               "resources":{
                  "MEMORY_MB":512,
                  "DISK_GB":1,
                  "VCPU":1
               }
            }
         }
      },
      // ... another similar allocation_request where the allocated
      // amounts are reversed between uuid5(compute1:eth0) and
      // uuid5(compute1:eth1)
   ],
   "provider_summaries":{
       // ...
   }
}

Filter scheduler selects the first candidate.

Nova needs to find the RP in the selected allocation candidate which provides the resources for each Neutron port request.

For the selected allocation candidate there are two possible port - RP mappings but only one valid mapping if we consider the bandwidth amounts:

port1 - uuid5(compute1:eth0)
port2 - uuid5(compute1:eth1)

When Nova tries to map the first port, port1, both uuid5(compute1:eth0) and uuid5(compute1:eth1) still have enough resources in the allocation request to match the request of port1. So at that point Nova can map port1 to uuid5(compute1:eth1). However, this means that Nova will not find any viable mapping later for port2, and therefore Nova has to go back and retry creating the mapping with port1 mapped to the other alternative. This means that Nova needs to implement a full backtracking algorithm to find the proper mapping.
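A minimal sketch of such a backtracking search is shown below. It is an illustration under stated assumptions, not the Nova implementation: map_ports and fits are hypothetical names, the RP keys are shortened placeholders, and a port is assumed to fit an RP when the RP still has at least the requested amount of each resource class left in the allocation request.

```python
def fits(request, remaining):
    """True if the RP still has enough of every requested resource class."""
    return all(remaining.get(rc, 0) >= amt for rc, amt in request.items())


def map_ports(port_requests, allocations):
    """Return {port_id: rp_uuid}, or None if no consistent mapping exists."""
    # Working copy of the allocated amounts, consumed as ports are mapped.
    remaining = {rp: dict(a["resources"]) for rp, a in allocations.items()}
    ports = list(port_requests.items())
    mapping = {}

    def backtrack(index):
        if index == len(ports):
            return True
        port_id, request = ports[index]
        for rp_uuid, left in remaining.items():
            if not fits(request, left):
                continue
            for rc, amt in request.items():  # tentatively consume
                left[rc] -= amt
            mapping[port_id] = rp_uuid
            if backtrack(index + 1):
                return True
            for rc, amt in request.items():  # dead end: undo and try next RP
                left[rc] += amt
            del mapping[port_id]
        return False

    return mapping if backtrack(0) else None


EGRESS = "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND"
INGRESS = "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND"
# eth1 is listed first so that the first attempt (port1 -> eth1) dead-ends.
allocations = {
    "eth1": {"resources": {EGRESS: 1000, INGRESS: 2000}},
    "eth0": {"resources": {EGRESS: 1000, INGRESS: 1000}},
    "compute": {"resources": {"MEMORY_MB": 512, "DISK_GB": 1, "VCPU": 1}},
}
port_requests = {
    "port1": {EGRESS: 1000, INGRESS: 1000},
    "port2": {EGRESS: 1000, INGRESS: 2000},
}
result = map_ports(port_requests, allocations)
print(result)  # {'port1': 'eth0', 'port2': 'eth1'}
```

In this run port1 is first placed on eth1, port2 then finds no RP, and the search has to undo the first choice before it reaches the valid mapping.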

Scaling considerations

With 4 RPs and 4 ports, in the worst case there are 4! (24) possible mappings, and each mapping needs 4 steps to be generated (assuming that in the worst case the mapping of the 4th port is the one that fails). So this backtracking takes 4! * 4 = 96 steps. Because of the factorial term, this code will scale poorly as the number of ports grows.

Note that our example uses the group_policy=isolate query param, so the RPs in the allocation candidate cannot overlap. If we set group_policy=none and therefore allow RP overlapping, then the number of necessary calculation steps could grow even more.

Note that even if having more than 4 ports for a server is considered unrealistic, additional granular request groups can appear in the allocation candidate request from sources other than Neutron, e.g. from flavor extra_spec due to VGPUs or from Cyborg due to accelerators.
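The growth of this worst-case estimate can be checked with a few lines of arithmetic. The helper name worst_case_steps is hypothetical; it only encodes the n! mappings times n steps per mapping estimate used above, for n ports and n RPs.

```python
import math


def worst_case_steps(n):
    """Worst-case steps for n ports and n RPs: n! mappings, n steps each."""
    return math.factorial(n) * n


for n in (2, 4, 6, 8):
    print(n, worst_case_steps(n))
# 2 4
# 4 96
# 6 4320
# 8 322560
```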

Data model impact

None

REST API impact

Extend the response of the GET /allocation_candidates API with an extra field, mappings, for each candidate in a new microversion. This field contains a mapping between resource request group names and RP UUIDs for each candidate to express which RP provides the resources for which request group.

For the request:

GET /placement/allocation_candidates?
    resources=DISK_GB=1,MEMORY_MB=512,VCPU=1&
    required1=CUSTOM_PHYSNET_1,CUSTOM_VNIC_TYPE_DIRECT&
    resources1=NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND=1000,
               NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND=1000&
    required2=CUSTOM_PHYSNET_1,CUSTOM_VNIC_TYPE_DIRECT&
    resources2=NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND=1000,
               NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND=2000

Placement would return the response:

{
   "allocation_requests":[
      {
         "allocations":{
            uuid5(compute1:eth0):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":1000
               },
            },
            uuid5(compute1:eth1):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":2000
               },
            },
            compute_uuid:{
               "resources":{
                  "MEMORY_MB":512,
                  "DISK_GB":1,
                  "VCPU":1
                },
            }
         },
         "mappings": {
             "1": [uuid5(compute1:eth0)],
             "2": [uuid5(compute1:eth1)],
             "": [compute_uuid],
         },
      },
      {
         "allocations":{
            uuid5(compute1:eth1):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":1000
               },
            },
            uuid5(compute1:eth0):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":2000
               },
            },
            compute_uuid:{
               "resources":{
                  "MEMORY_MB":512,
                  "DISK_GB":1,
                  "VCPU":1
                },
            }
         },
         "mappings": {
             "1": [uuid5(compute1:eth1)],
             "2": [uuid5(compute1:eth0)],
             "": [compute_uuid],
         },
      },
   ],
   "provider_summaries":{
       // unchanged
   }
}

The numbered groups are always satisfied by a single RP, so the length of the mapping value will always be 1. However, the unnumbered group might be satisfied by more than one RP, so the length of the mapping value there can be greater than 1.

This new field will be added to the schema for POST /allocations, PUT /allocations/{consumer_uuid}, and POST /reshaper so the client does not need to strip it from the candidate before posting it back to Placement to make the allocation. The contents of the field will be ignored by these operations.
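With the mappings field in place, the client-side lookup becomes a direct dictionary access. The sketch below assumes the response shape proposed above; ports_to_providers and the group_to_port dictionary (group number to port id, kept by the caller when it built the query) are hypothetical names used for illustration.

```python
def ports_to_providers(candidate, group_to_port):
    """Return {port_id: rp_uuid} using the candidate's "mappings" field."""
    result = {}
    for group, port_id in group_to_port.items():
        providers = candidate["mappings"][group]
        # A numbered group is satisfied by exactly one RP.
        if len(providers) != 1:
            raise ValueError(f"group {group!r} maps to {len(providers)} RPs")
        result[port_id] = providers[0]
    return result


# First candidate of the proposed response, with placeholder UUID strings.
candidate = {
    "allocations": {},  # omitted here, not needed for the lookup
    "mappings": {
        "1": ["uuid5(compute1:eth0)"],
        "2": ["uuid5(compute1:eth1)"],
        "": ["compute_uuid"],
    },
}
group_to_port = {
    "1": "da941911-a70d-4aac-8be0-c3b263e6fd4f",
    "2": "2f2613ce-95a9-490a-b3c4-5f1c28c1f886",
}
port_to_rp = ports_to_providers(candidate, group_to_port)
print(port_to_rp)
```

No comparison of resource classes or amounts is needed, so the cost is linear in the number of ports.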

Alternatively the mapping can be added as a separate top level key to the response.

Response:

{
   "allocation_requests":[
      {
         "allocations":{
            uuid5(compute1:eth0):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":1000
               },
            },
            uuid5(compute1:eth1):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":2000
               },
            },
            compute_uuid:{
               "resources":{
                  "MEMORY_MB":512,
                  "DISK_GB":1,
                  "VCPU":1
                },
            }
         }
      },
      {
         "allocations":{
            uuid5(compute1:eth0):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":2000
               },
            },
            uuid5(compute1:eth1):{
              "resources":{
                  "NET_BANDWIDTH_EGRESS_KILOBITS_PER_SECOND":1000,
                  "NET_BANDWIDTH_INGRESS_KILOBITS_PER_SECOND":1000
               },
            },
            compute_uuid:{
               "resources":{
                  "MEMORY_MB":512,
                  "DISK_GB":1,
                  "VCPU":1
                },
            }
         }
      },
   ],
   "provider_summaries":{
       // unchanged
   },

   "resource_provider-request_group-mappings":[
       {
           "1": [uuid5(compute1:eth0)],
           "2": [uuid5(compute1:eth1)],
           "": [compute_uuid],
       },
       {
           "1": [uuid5(compute1:eth1)],
           "2": [uuid5(compute1:eth0)],
           "": [compute_uuid],
       }
   ]
}

This has the advantage that the allocation requests are unchanged and can therefore still be sent back to Placement transparently to make the allocation.

This has the disadvantage that each mapping in resource_provider-request_group-mappings is connected to its candidate in the allocation_requests list by the list index only.
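The index coupling of this alternative can be illustrated with a short sketch. It assumes the alternative response shape shown above; pair_candidates_with_mappings is a hypothetical helper and the RP names are placeholders.

```python
def pair_candidates_with_mappings(response):
    """Pair each allocation request with its mapping by list position."""
    candidates = response["allocation_requests"]
    mappings = response["resource_provider-request_group-mappings"]
    if len(candidates) != len(mappings):
        raise ValueError("candidate and mapping lists differ in length")
    return list(zip(candidates, mappings))


response = {
    "allocation_requests": [
        {"allocations": {"eth0": {}, "eth1": {}, "compute": {}}},
        {"allocations": {"eth0": {}, "eth1": {}, "compute": {}}},
    ],
    "resource_provider-request_group-mappings": [
        {"1": ["eth0"], "2": ["eth1"], "": ["compute"]},
        {"1": ["eth1"], "2": ["eth0"], "": ["compute"]},
    ],
}
pairs = pair_candidates_with_mappings(response)
print(pairs[1][1]["1"])  # ['eth1']
```

Any client that filters or reorders allocation_requests would have to apply the same operation to the mappings list, otherwise the pairing silently becomes wrong.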

We decided to go with the primary proposal.

Security impact

None

Notifications impact

None

Other end user impact

None

Performance Impact

None

Other deployer impact

None

Developer impact

None

Upgrade impact

None

Implementation

Assignee(s)

Primary assignee: None

Work Items

Extend the Placement allocation candidate generation algorithm to return the mapping that is internally calculated.
Extend the API with a new microversion to return the mapping to the API client as well.
Within the same microversion, extend the JSON schema for POST /allocations, PUT /allocations/{consumer_uuid}, and POST /reshaper to accept (and ignore) the mappings key.

Dependencies

None

Testing

New gabbi tests for the new API microversion and unit tests to cover the unhappy path.

Documentation Impact

The Placement API reference needs to be updated with the new microversion.

References

History

Revisions
Release Name	Description
Stein	Proposed in nova spec repo but was not approved
Train	Re-proposed in the placement repo
