.. _1000_nodes:

===========================================================
1000 Compute nodes resource consumption/scalability testing
===========================================================

:status: **ready**
:version: 1.0

:Abstract:

  This document describes a test plan for measuring resource consumption of
  OpenStack services along with their scalability potential. It also provides
  results that can be used to find bottlenecks and potential pain points when
  scaling both standalone OpenStack services and the OpenStack cloud as a
  whole.

Test Plan
=========

Most OpenStack users wonder how the platform will behave at scale with a
large number of compute nodes. This is a valid concern, because OpenStack
consists of many services with different load and resource consumption
patterns. Most cloud operations come down to two things: workload placement
and simple control/data plane management for those workloads. The main idea
of this test plan is therefore to create simple workloads (10-30k VMs),
observe how the core services handle them, and measure resource consumption
during active workload placement and for some time afterwards.

Test Environment
----------------

The test assumes that each and every service is monitored separately for
resource consumption using known techniques such as atop, Nagios,
containerization, or any other toolkit/solution that makes it possible to:

1. Measure CPU/RAM consumption of a process or a set of processes.
2. Separate services and give each of them as many of the available resources
   as it needs.

List of mandatory services for OpenStack testing:

* nova-api
* nova-scheduler
* nova-conductor
* nova-compute
* glance-api
* glance-registry
* neutron-server
* keystone-all

List of replaceable but still mandatory services:

* neutron-dhcp-agent
* neutron-ovs-agent
* rabbitmq
* libvirtd
* mysqld
* openvswitch-vswitch

List of optional services which may be omitted at the cost of reduced
performance:

* memcached

List of optional services which may be omitted:

* horizon

Rally fits here as a stable and reliable load runner. Monitoring can be done
by any suitable software that presents the results in a form that allows
building graphs and visualizing resource consumption, so that they can be
analyzed manually or automatically.

Preparation
^^^^^^^^^^^

**Common preparation steps**

Before testing begins, the environment should have all OpenStack services up
and running. They should be configured according to the settings recommended
for the release and/or for your specific environment or use case.

To obtain real-world RPS/TPS/etc. metrics, all services (including compute
nodes) should run on separate physical servers, although this again depends
on the setup and requirements. For simplicity, and for testing the control
plane only, the fake compute driver can be used.
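For illustration only, the snippet below is a minimal sketch of how the fake
driver could be enabled in ``nova.conf`` on the "compute" hosts. It is not the
reference configuration of this test plan; the option value assumes a
Liberty/Mitaka-era Nova and should be verified against the release under test.

.. code-block:: ini

   [DEFAULT]
   # Assumption: Liberty/Mitaka-style option name. The fake driver reports
   # synthetic CPU/RAM/disk capacity and never touches a hypervisor, so only
   # the control plane is exercised by the test.
   compute_driver = fake.FakeDriver

With the fake driver it is also possible to run several nova-compute
processes on a single physical host to emulate a larger number of compute
nodes.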
Environment description
^^^^^^^^^^^^^^^^^^^^^^^

The environment description includes the hardware specification of the
servers, network parameters, operating system, and OpenStack deployment
characteristics.

Hardware
~~~~~~~~

This section contains the list of all types of hardware nodes.

+-----------+-------+----------------------------------------------------+
| Parameter | Value | Comments                                           |
+-----------+-------+----------------------------------------------------+
| model     |       | e.g. Supermicro X9SRD-F                            |
+-----------+-------+----------------------------------------------------+
| CPU       |       | e.g. 6 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz |
+-----------+-------+----------------------------------------------------+

Network
~~~~~~~

This section contains the list of interfaces and network parameters. For
complicated cases it may also include a topology diagram and switch
parameters.

+------------------+-------+-------------------------+
| Parameter        | Value | Comments                |
+------------------+-------+-------------------------+
| card model       |       | e.g. Intel              |
+------------------+-------+-------------------------+
| driver           |       | e.g. ixgbe              |
+------------------+-------+-------------------------+
| speed            |       | e.g. 10G or 1G          |
+------------------+-------+-------------------------+

Software
~~~~~~~~

This section describes the installed software.

+-------------------+--------+---------------------------+
| Parameter         | Value  | Comments                  |
+-------------------+--------+---------------------------+
| OS                |        | e.g. Ubuntu 14.04.3       |
+-------------------+--------+---------------------------+
| DB                |        | e.g. MySQL 5.6            |
+-------------------+--------+---------------------------+
| MQ broker         |        | e.g. RabbitMQ v3.4.25     |
+-------------------+--------+---------------------------+
| OpenStack release |        | e.g. Liberty              |
+-------------------+--------+---------------------------+

Configuration
~~~~~~~~~~~~~

This section describes the configuration of OpenStack and core services.

+-------------------+-------------------------------+
| Parameter         | File                          |
+-------------------+-------------------------------+
| Keystone          | ./results/keystone.conf       |
+-------------------+-------------------------------+
| Nova-api          | ./results/nova-api.conf       |
+-------------------+-------------------------------+
| ...               |                               |
+-------------------+-------------------------------+

Test Case 1: Resources consumption under severe load
-----------------------------------------------------

Description
^^^^^^^^^^^

This test should spawn a number of instances in N parallel threads and, along
with that, record CPU/RAM metrics from all OpenStack and core services such
as the MQ broker and the DB server. As the test itself is rather long, there
is no need for a very high sampling resolution; one measurement every 5
seconds should be more than enough.

A Rally scenario that creates load from 50 parallel threads spawning VMs and
listing them can be found in the test plan folder and can be used for testing
purposes. It can be modified to fit the needs of a specific deployment.
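As an illustration of such a scenario, the sketch below uses Rally's
``NovaServers.boot_and_list_server`` scenario with a constant runner and a
concurrency of 50. The flavor, image, iteration count, and tenant layout are
placeholders rather than the values from the reference task file, so adjust
them to the deployment under test.

.. code-block:: yaml

   ---
     NovaServers.boot_and_list_server:
       -
         args:
           flavor:
             name: "m1.tiny"      # placeholder flavor
           image:
             name: "cirros"       # placeholder image
           detailed: true
         runner:
           type: "constant"
           times: 20000           # total number of VMs to boot (placeholder)
           concurrency: 50        # 50 parallel threads, as described above
         context:
           users:
             tenants: 2           # placeholder tenant/user layout
             users_per_tenant: 1

Running such a task while the monitoring described above samples per-service
CPU/RAM consumption produces the data for the metrics listed below.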
Parameters
^^^^^^^^^^

============================ ====================================================
Parameter name               Value
============================ ====================================================
OpenStack release            Liberty, Mitaka
Compute nodes amount         50, 100, 200, 500, 1000, 2000, 5000, 10000
Services configurations      Configuration for each OpenStack and core service
============================ ====================================================

List of performance metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The test case result is presented as a weighted tree structure, with
operations as nodes and the time spent on them as node weights, for every
control plane operation under test. This information is automatically
gathered in Ceilometer and can be transformed into a human-friendly report
via OSprofiler.

======== ================ ================= ===========================================
Priority Value            Measurement Units Description
======== ================ ================= ===========================================
1        CPU load         MHz               CPU load for each OpenStack service
2        RAM consumption  GB                RAM consumption for each OpenStack service
3        Instances amount Amount            Maximum number of instances spawned
4        Operation time   milliseconds      Time spent on every instance spawn
======== ================ ================= ===========================================

Reports
=======

Test plan execution reports:

* :ref:`1000_nodes_report`
* :ref:`1000_nodes_fake_driver_report`