5.1. 1000 Compute nodes resource consumption/scalability testing

Abstract: This document describes a test plan for measuring the resource consumption of OpenStack services along with their scalability potential. It also provides results that can be used to find bottlenecks and/or potential pain points when scaling standalone OpenStack services and the OpenStack cloud itself.

5.1.1. Test Plan

Most current OpenStack users wonder how it will behave at scale with a lot of compute nodes. This is a valid concern because OpenStack has a lot of services, each with different load and resource consumption patterns. Most cloud operations relate to two things: workload placement and simple control/data plane management for those workloads. So the main idea of this test plan is to create simple workloads (10-30k VMs) and observe how the core services handle them and what the resource consumption is during active workload placement and for some time after that.

Test Environment

The test assumes that each service is monitored separately for resource consumption using known techniques such as atop, Nagios, containerization, or any other toolkit or solution that makes it possible to:

  1. Measure CPU/RAM consumption of process/set of processes.
  2. Separate services and give each of them as many of the available resources as possible to fulfill its needs.
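
As an illustration of the first requirement, the sketch below sums CPU time and resident memory over the set of processes that belong to one service by reading the Linux `/proc` filesystem. It is a minimal example under stated assumptions, not the tooling used for this test plan: the function names are hypothetical, the list of PIDs for a service is expected to come from the caller, and summing RSS across processes counts shared pages more than once.

```python
import os
from typing import Iterable, Tuple


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The process name sits in parentheses and may contain spaces,
    # so split on the last ')' before tokenizing the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, i.e. indexes 11 and 12 here.
    return int(fields[11]) + int(fields[12])


def parse_rss_kb(status_text: str) -> int:
    """Return resident memory in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0  # kernel threads expose no VmRSS line


def sample_service(pids: Iterable[int]) -> Tuple[int, int]:
    """Sum CPU ticks and RSS (kB) over the live processes of one service."""
    ticks = rss_kb = 0
    for pid in pids:
        try:
            with open(f"/proc/{pid}/stat") as f:
                ticks += parse_cpu_ticks(f.read())
            with open(f"/proc/{pid}/status") as f:
                rss_kb += parse_rss_kb(f.read())
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process exited between discovery and sampling
    return ticks, rss_kb


def cpu_percent(ticks_before: int, ticks_after: int, interval_s: float) -> float:
    """CPU usage over the interval, where 100.0 means one fully busy core."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    return 100.0 * (ticks_after - ticks_before) / ticks_per_second / interval_s
```

Calling `sample_service` twice for the same set of PIDs and passing the two tick counts to `cpu_percent` gives the CPU usage of that service over the interval between the calls.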

List of mandatory services for OpenStack testing:
nova-api nova-scheduler nova-conductor nova-compute glance-api glance-registry neutron-server keystone-all

List of replaceable but still mandatory services:
neutron-dhcp-agent neutron-ovs-agent rabbitmq libvirtd mysqld openvswitch-vswitch

List of optional services which may be omitted with a performance decrease:

List of optional services which may be omitted:

Rally fits here as a stable and reliable load runner. Monitoring can be done by any suitable software that provides results in a form that allows graphs to be built or resource consumption to be visualized for analysis, or that performs the analysis automatically.

Preparation

Common preparation steps

To begin testing, the environment should have all the OpenStack services up and running. They should be configured according to the recommended settings for the release and/or for your specific environment or use case. To obtain real-world RPS/TPS/etc. metrics, all the services (including compute nodes) should run on separate physical servers, but again this depends on the setup and requirements. For simplicity, and to test only the control plane, the Fake compute driver can be used.

Environment description

The environment description includes the hardware specification of the servers, network parameters, operating system, and OpenStack deployment characteristics.

Hardware

This section contains a list of all types of hardware nodes.

Parameter  Value  Comments
model             e.g. Supermicro X9SRD-F
CPU               e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz

Network

This section contains a list of interfaces and network parameters. For complicated cases this section may include a topology diagram and switch parameters.

Parameter   Value  Comments
card model         e.g. Intel
driver             e.g. ixgbe
speed              e.g. 10G or 1G

Software

This section describes installed software.

Parameter          Value  Comments
OS                        e.g. Ubuntu 14.04.3
DB                        e.g. MySQL 5.6
MQ broker                 e.g. RabbitMQ v3.4.25
OpenStack release         e.g. Liberty

Configuration

This section describes the configuration of OpenStack and core services.

Parameter  File
Keystone   ./results/keystone.conf
Nova-api   ./results/nova-api.conf
...

Test Case 1: Resources consumption under severe load

Description

This test should spawn a number of instances in n parallel threads while recording all CPU/RAM metrics from all the OpenStack and core services, such as MQ brokers and the DB server. As the test itself is fairly long, there is no need for very high measurement resolution: one measurement every 5 seconds should be more than enough.
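
The load pattern for this test case could be expressed as a Rally task along the following lines. This is a hedged sketch, not the scenario file shipped with the test plan: it uses Rally's `NovaServers.boot_and_list_server` scenario with a `constant` runner, while the helper name, the flavor and image names, the tenant layout, and the default iteration count (taken from the lower bound of the 10-30k VM range) are assumptions to adapt to the deployment.

```python
import json


def build_boot_and_list_task(times: int = 10000, concurrency: int = 50,
                             flavor: str = "m1.tiny",
                             image: str = "cirros") -> dict:
    """Build a Rally task that boots servers and lists them after each boot."""
    return {
        "NovaServers.boot_and_list_server": [
            {
                "args": {
                    # Flavor and image names are placeholders (assumptions).
                    "flavor": {"name": flavor},
                    "image": {"name": image},
                    "detailed": True,
                },
                "runner": {
                    "type": "constant",
                    "times": times,              # total number of boot iterations
                    "concurrency": concurrency,  # parallel threads
                },
                "context": {
                    # One temporary tenant with one user (assumption).
                    "users": {"tenants": 1, "users_per_tenant": 1},
                },
            }
        ]
    }


if __name__ == "__main__":
    # Print the task as JSON so it can be saved to a file for Rally.
    print(json.dumps(build_boot_and_list_task(), indent=2))
```

The default instance quotas of a tenant are usually far below the number of iterations used here, so a real task would also need to raise them.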

A Rally scenario that creates a load of 50 parallel threads spawning VMs and requesting the VM list can be found in the test plan folder and can be used for testing purposes. It can be modified to fit specific deployment needs.

Parameters

Parameter name           Value
OpenStack release        Liberty, Mitaka
Compute nodes amount     50,100,200,500,1000,2000,5000,10000
Services configurations  Configuration for each OpenStack and core service

List of performance metrics

The test case result is presented as a weighted tree structure with operations as nodes and the time spent on them as node weights, for every control plane operation under test. This information is automatically gathered in Ceilometer and can be transformed into a human-friendly report via OSprofiler.
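
To make the shape of such a result concrete, the following sketch models a weighted operation tree in which each node stores the time spent in the operation itself and the total is accumulated over its children. It is an illustration only and does not reproduce the OSprofiler data format: the class, the operation names, and the timings are all hypothetical, and treating the node weight as self time rather than total time is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpNode:
    """One operation in the trace tree."""
    name: str
    self_ms: float  # time spent in this operation, excluding children
    children: List["OpNode"] = field(default_factory=list)

    def total_ms(self) -> float:
        """Time spent in this operation including all nested operations."""
        return self.self_ms + sum(c.total_ms() for c in self.children)


def render(node: OpNode, depth: int = 0) -> List[str]:
    """Return an indented, human-readable report, one line per operation."""
    lines = [f"{'  ' * depth}{node.name}: {node.total_ms():.1f} ms"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical trace of a single instance boot request.
    trace = OpNode("boot request", 10.0, [
        OpNode("schedule", 5.0),
        OpNode("database", 2.5, [OpNode("query", 1.5)]),
    ])
    print("\n".join(render(trace)))
```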

Priority  Value             Measurement Units  Description
1         CPU load          MHz                CPU load for each OpenStack service
2         RAM consumption   GB                 RAM consumption for each OpenStack service
3         Instances amount  Amount             Max number of instances spawned
4         Operation time    milliseconds       Time spent for every instance spawn
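
One possible way to reduce raw monitoring samples to the CPU and RAM metrics in this table is sketched below. The test plan does not say how CPU load in MHz is derived, so the conversion used here (cores in use multiplied by the nominal core frequency) is an assumption, as are reporting the peak rather than the average, treating 1 GB as 1024 * 1024 kB, and every name in the code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Sample:
    """One monitoring sample for one service (hypothetical record layout)."""
    service: str
    cpu_cores: float  # CPU usage in cores, 1.0 means one fully busy core
    rss_kb: int       # resident memory in kB


def peak_usage(samples: Iterable[Sample],
               core_mhz: float) -> Dict[str, Dict[str, float]]:
    """Return the peak CPU load (MHz) and RAM consumption (GB) per service."""
    peaks: Dict[str, Dict[str, float]] = {}
    for s in samples:
        cpu_mhz = s.cpu_cores * core_mhz     # assumed conversion to MHz
        ram_gb = s.rss_kb / (1024 * 1024)    # kB to GB, binary units
        entry = peaks.setdefault(s.service, {"cpu_mhz": 0.0, "ram_gb": 0.0})
        entry["cpu_mhz"] = max(entry["cpu_mhz"], cpu_mhz)
        entry["ram_gb"] = max(entry["ram_gb"], ram_gb)
    return peaks
```

With samples taken every 5 seconds, one such summary per compute node count makes the runs directly comparable across the scale steps.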