3. Results of measuring performance of Kargo


This document presents performance test results for Kargo as a Kubernetes deployment solution. All tests were performed according to the Measuring performance of Kargo test plan.

Kargo sets up Kubernetes in the following way:

  • master: Calico, Kubernetes API services
  • minion: Calico, Kubernetes minion services
  • etcd: etcd service

Kargo deploys a Kubernetes cluster with the following mapping of hostnames to roles:

  • node1: minion+master+etcd
  • node2: minion+master+etcd
  • node3: minion+etcd
  • all other nodes: minion
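The mapping above can be expressed as a small lookup. This is only an illustrative sketch: `roles_for_node` is a hypothetical helper that is not part of Kargo, and it assumes nodes are identified by their 1-based index.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kargo): print the roles assigned to the
# N-th node (1-based), following the hostname-to-role mapping listed above.
roles_for_node() {
    local index="$1"
    case "$index" in
        1|2) echo "minion+master+etcd" ;;  # node1 and node2
        3)   echo "minion+etcd" ;;         # node3
        *)   echo "minion" ;;              # all other nodes
    esac
}

# Show the roles for a few sample node indexes.
for i in 1 2 3 4 50; do
    echo "node${i}: $(roles_for_node "$i")"
done
```

With this layout the cluster always has two masters and three etcd members, regardless of the total number of nodes.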

3.1. Environment description

3.1.1. Hardware configuration of each server

Description of servers hardware
server   name              node-{1..500}                      node-{1..500}
         role              kubernetes cluster                 kubernetes cluster
         vendor,model      Dell, R630                         Lenovo, RD550-1U
CPU      vendor,model      Intel, E5-2680 v3                  Intel, E5-2680 v3
         processor_count   2                                  2
         core_count        12                                 12
         frequency_MHz     2500                               2500
RAM      vendor,model      Hynix HMA42GR7MFR4N-TF             Samsung M393A2G40DB0-CPB
         amount_MB         262144                             262144
NETWORK  interface_name    bond0                              bond0
         vendor,model      Intel, X710 Dual Port              Intel, X710 Dual Port
         interfaces_count  2                                  2
         bandwidth         10G                                10G
STORAGE  dev_name          /dev/sda                           /dev/sda
         vendor,model      raid1 PERC H730P Mini,             raid1 MegaRAID 3108,
                           2 disks Intel S3610                2 disks Intel S3610
         size              800GB                              800GB

3.1.2. Network scheme and part of configuration of hardware network switches

Network scheme of the environment:

[Figure: network scheme of the environment]

The following is the part of the switch configuration applied to each switch port that belongs to the bond0 interface of a server:

show run int et1
interface Ethernet1
   description - r02r13c33
   switchport trunk native vlan 4
   switchport trunk allowed vlan 4
   switchport mode trunk
   channel-group 133 mode active
   lacp port-priority 16384
   spanning-tree portfast

show run int po1
interface Port-Channel1
   description osscr02r13c21
   switchport trunk native vlan 131
   switchport trunk allowed vlan 130-159
   switchport mode trunk
   port-channel lacp fallback static
   port-channel lacp fallback timeout 30
   mlag 1

3.1.3. Software configuration of Kargo

Setting up Kargo:

Kargo was installed on bare-metal Ubuntu Xenial servers. Kargo requires a dedicated non-root user to exist on the target nodes. The script from the Launcher script section was used to configure and launch Kargo.

Versions of some software
Software            Version
Ubuntu              Ubuntu 16.04.1 LTS
fuel-ccp-installer  6b26170f70e523fb04bda8d6f15077d461fba9de
kargo               016b7893c64fede07269c01cac31e96c8ee0d257

Test tool:

We used the Dstat utility as the main tool for collecting timing and system performance data during the tests. The script from the Script for parsing collected metrics section was used to parse the performance metrics after the installation tests.

Operating system configuration:

The contents of the /etc folder from one of the target servers where the K8S cluster was deployed are available here: etc_tarball_of_node1

3.2. Testing process

  1. The Kargo launcher script was set up and executed on the node1 server as described in the Setting up Kargo: section.
  2. During the Kargo run, the dstat tool was launched on node1 with the following options:
     root@node1:~# dstat --nocolor --time --cpu --mem --net -N bond0 --io --output /root/dstat.csv
  3. After the Kargo run finished, we parsed the resulting "dstat.csv" files with the Script for parsing collected metrics.

The above steps were repeated with the following numbers of nodes: 50, 150, and 350.
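The source does not show how dstat was started and stopped around each deployment run. One way to keep the monitor alive exactly as long as the run is sketched below. `run_monitored`, `MONITOR_CMD` and `MONITOR_PID` are hypothetical names introduced for illustration; only the dstat options are taken from the step above.

```shell
#!/bin/bash
# Monitor command, with the options from the testing process above.
# Kept as a plain string so that it can be swapped out easily.
MONITOR_CMD="dstat --nocolor --time --cpu --mem --net -N bond0 --io --output /root/dstat.csv"

# Hypothetical helper: run the given command while the monitor collects
# metrics in the background, then stop the monitor and return the
# command's exit status.
run_monitored() {
    # Intentionally unquoted so that the string splits into arguments.
    $MONITOR_CMD > /dev/null &
    MONITOR_PID=$!

    # Run the job (all arguments) and remember how it ended.
    local status=0
    "$@" || status=$?

    # Stop the monitor once the job has finished and reap the process.
    kill "$MONITOR_PID" 2>/dev/null || true
    wait "$MONITOR_PID" 2>/dev/null || true
    return "$status"
}

# Example invocation (the launcher path is an assumption):
#   run_monitored bash -xe ./launcher.sh
```

Stopping the monitor right after the job ends keeps idle samples out of the CSV, so the averages reflect only the deployment itself.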

As a result of this step, we obtained a CSV file of parsed metrics for each run.

3.3. Results

After simple processing of the results, the following plots were created. They show the performance metrics collected during provisioning of the nodes as a function of time:

Number of nodes Plot CPU(TIME) Plot RAM(TIME)
150 ../../../../_images/150_nodes_-_CPU.png ../../../../_images/150_nodes_-_RAM.png
350 ../../../../_images/350_nodes_-_CPU.png ../../../../_images/350_nodes_-_RAM.png
Number of nodes Plot NET(TIME) Plot DISK(TIME)
150 ../../../../_images/150_nodes_-_net.png ../../../../_images/150_nodes_-_disk.png
350 ../../../../_images/350_nodes_-_net.png ../../../../_images/350_nodes_-_disk.png

The following table shows how the performance metrics and the deployment time depend on the number of nodes.

number of nodes              50           150           350
deployment time              2049.00      3922.00       13065.00
cpu_usage_max                99.0210      99.56         99.06
cpu_usage_min                0            0             0
cpu_usage_average            7.2920       10.03         12.63
cpu_usage_percentile 90%     19.6495      24.92         29.12
ram_usage_max                4466.10      13859.56      112079.57
ram_usage_min                1061.51      1033.32       1075.16
ram_usage_average            2121.20      4335.69       31288.94
ram_usage_percentile 90%     2876.33      8570.32       79915.96
net_all_max                  3864760.75   20996615.75   60130883.88
net_all_min                  0            0             0
net_all_average              70602.55     102913.32     177943.40
net_all_percentile 90%       253590.90    263933.25     180409.81
dsk_io_all_max               3503         3196          3470
dsk_io_all_min               0            0             0
dsk_io_all_average           26           37            56
dsk_io_all_percentile 90%    58           14            8
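The source does not state how the max, min, average and 90th percentile values were derived from the parsed CSV files. A minimal sketch of one way to compute them is given below. `column_stats` is a hypothetical helper, and the nearest-rank definition of the percentile is an assumption, so its output may differ slightly from the figures in the table.

```shell
#!/bin/bash
# Hypothetical helper: column_stats FILE COLUMN_NAME
# Prints max, min, average and 90th percentile (nearest-rank method,
# an assumption) for the named column of a CSV file with a header row.
column_stats() {
    local file="$1" column="$2"
    # First awk: find the column by its header name and print its values.
    awk -F "," -v col="$column" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
        idx     { print $idx }
    ' "$file" | sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1              # column missing or no data rows
            rank = int((9 * NR + 9) / 10)    # ceil(0.9 * NR) in integer arithmetic
            printf "max=%.2f min=%.2f average=%.2f p90=%.2f\n", v[NR], v[1], sum / NR, v[rank]
        }'
}

# Example invocation (the file name is an assumption):
#   column_stats ./kargo-test/dstat.csv cpu_usage
```

Column names such as cpu_usage, ram_usage, net_all and dsk_all match the header written by the Script for parsing collected metrics.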

3.4. Issues that have been found during the tests

During testing we found several issues that prevented us from obtaining test results at scale:

Issue Link
etcd list sometimes hangs https://github.com/kubespray/kargo/pull/448
K8S DNS services not working correctly https://github.com/kubespray/kargo/pull/458
Calico creates extra pool during run https://github.com/kubespray/kargo/pull/462
Timeout to quay.io to fetch etcd image https://github.com/kubespray/kargo/pull/481
Downloading images doesn’t scale well https://github.com/kubespray/kargo/pull/488
Kargo is too slow on scale https://github.com/kubespray/kargo/issues/478

3.5. Applications

3.5.1. Launcher script

#!/bin/bash -xe

# Remove any previous checkout so that the clone below starts clean.
if [[ -d ./fuel-ccp-installer ]] ; then
    rm -rf ./fuel-ccp-installer
fi

git clone https://review.openstack.org/openstack/fuel-ccp-installer

export ENV_NAME="kargo-test"
export DEPLOY_METHOD="kargo"
# Use $HOME here: a quoted "~" is not expanded by bash.
export WORKSPACE="$HOME/workspace"
export ADMIN_USER="vagrant"
export ADMIN_PASSWORD="kargo"

# for 50 nodes
#export SLAVES_COUNT=50
#export ADMIN_IP=""
#export SLAVE_IPS=""

# for 150 nodes:
#export SLAVES_COUNT=150
#export ADMIN_IP=""
#export SLAVE_IPS=""

# for 350 nodes:
#export SLAVES_COUNT=350
#export ADMIN_IP=""
#export SLAVE_IPS=""

export CUSTOM_YAML='docker_version: 1.12
hyperkube_image_repo: "quay.io/coreos/hyperkube"
hyperkube_image_tag: "v1.3.5_coreos.0"
etcd_image_repo: "quay.io/coreos/etcd"
etcd_image_tag: "v3.0.1"
calicoctl_image_repo: "calico/ctl"
#calico_node_image_repo: "calico/node"
calico_node_image_repo: "l23network/node"
calico_node_image_tag: "v0.20.0"
calicoctl_image_tag: "v0.20.0"
kube_apiserver_insecure_bind_address: ""'

mkdir -p "$WORKSPACE"
# NODE_NAME is not set in this script; it is assumed to come from the calling environment.
echo "Running on $NODE_NAME: $ENV_NAME"
cd ./fuel-ccp-installer

bash -xe "./utils/jenkins/run_k8s_deploy_test.sh"

3.5.2. Script for parsing collected metrics

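#!/bin/bash -e

# Both arguments are required.
if [[ ! $1 ]] || [[ ! $2 ]] ; then
    echo \$1 = kargo_env_name, \$2 = csv file path
    exit 1
fi

# Assumption: the definition of cur_dir is missing from the source; here the
# output directory is named after the environment passed as $1.
cur_dir="./$1"
csv_name=$(basename "$2")
if [[ ! -d $cur_dir ]] ; then mkdir -p "$cur_dir" ; fi

# Skip the 7 header lines of the dstat CSV, print a new header, then convert each row.
awk -F "," 'BEGIN {getline;getline;getline;getline;getline;getline;getline;
    print "time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all"}
    {printf "%s,%0.3f,%0.3f,%0.3f,%0.3f,%0.3f,%d,%d,%d\n", $1,100-$4,$8/1048576,$12/8,$13/8,($12+$13)/8,$14,$15,$14+$15 }' "$2" > "$cur_dir/${csv_name}"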
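To make the field arithmetic in the script above easier to follow, the sketch below applies the same conversion to a single made-up row. `convert_row` is a hypothetical wrapper, the sample values are invented, and the column names in the comment assume the layout that dstat 0.7.x produces for the options used in the tests (other dstat versions may order the memory columns differently).

```shell
#!/bin/bash
# Hypothetical wrapper: apply the same per-row conversion as the parsing
# script to dstat CSV data rows read from standard input.
convert_row() {
    awk -F "," '{
        printf "%s,%0.3f,%0.3f,%0.3f,%0.3f,%0.3f,%d,%d,%d\n",
            $1, 100-$4, $8/1048576, $12/8, $13/8, ($12+$13)/8, $14, $15, $14+$15
    }'
}

# Made-up sample row. Assumed column order:
# time, usr, sys, idl, wai, hiq, siq, used, buff, cach, free, recv, send, read, writ
sample='01-01 00:00:01,10,5,80,5,0,0,2097152,0,0,0,800,1600,3,4'
echo "$sample" | convert_row
# prints: 01-01 00:00:01,20.000,2.000,100.000,200.000,300.000,3,4,7
```

Here CPU usage is 100 minus the idle percentage (field 4), RAM usage is the used memory (field 8) converted from bytes to MiB, the network fields (12 and 13) are divided by 8, and the disk I/O fields (14 and 15) are passed through and summed.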