.. _Results_of_measuring_performance_of_Kargo: ******************************************* Results of measuring performance of Kargo ******************************************* :Abstract: This document includes performance test results of Kargo_ as for Kubernetes deployment solution. All tests have been performed regarding :ref:`Measuring_performance_of_Kargo`. Kargo sets up Kubernetes in the following way: - master: Calico, Kubernetes API services - minion: Calico, Kubernetes minion services - etcd: etcd service Kargo deploys Kubernetes cluster with the following matching hostnames and roles: - node1: minion+master+etcd - node2: minion+master+etcd - node3: minion+etcd - all other nodes: minion Environment description ======================= Hardware configuration of each server ------------------------------------- .. table:: Description of servers hardware +-------+----------------+------------------------+------------------------+ |server |name |node-{1..500} |node-{1..500} | | +----------------+------------------------+------------------------+ | |role |kubernetes cluster | kubernetes cluster | | +----------------+------------------------+------------------------+ | |vendor,model |Dell, R630 |Lenovo, RD550-1U | | +----------------+------------------------+------------------------+ | |operating_system| | 4.4.0-36-generic | | 4.4.0-36-generic | | | | | Ubuntu-xenial | | Ubuntu-xenial | | | | | x86_64 | | x86_64 | +-------+----------------+------------------------+------------------------+ |CPU |vendor,model |Intel, E5-2680v3 |Intel, E5-2680 v3 | | +----------------+------------------------+------------------------+ | |processor_count |2 |2 | | +----------------+------------------------+------------------------+ | |core_count |12 |12 | | +----------------+------------------------+------------------------+ | |frequency_MHz |2500 |2500 | +-------+----------------+------------------------+------------------------+ |RAM |vendor,model |Hynix HMA42GR7MFR4N-TF |Samsung M393A2G40DB0-CPB| | +----------------+------------------------+------------------------+ | |amount_MB |262144 |262144 | +-------+----------------+------------------------+------------------------+ |NETWORK|interface_name |bond0 |bond0 | | +----------------+------------------------+------------------------+ | |vendor,model |Intel, X710 Dual Port |Intel, X710 Dual Port | | +----------------+------------------------+------------------------+ | |interfaces_count|2 |2 | | +----------------+------------------------+------------------------+ | |bandwidth |10G |10G | +-------+----------------+------------------------+------------------------+ |STORAGE|dev_name |/dev/sda |/dev/sda | | +----------------+------------------------+------------------------+ | |vendor,model | | raid1 PERC H730P Mini| | raid1 MegaRAID 3108 | | | | | 2 disks Intel S3610 | | 2 disks Intel S3610 | | +----------------+------------------------+------------------------+ | |SSD/HDD |SSD |SSD | | +----------------+------------------------+------------------------+ | |size | 800GB | 800GB | +-------+----------------+------------------------+------------------------+ Network scheme and part of configuration of hardware network switches --------------------------------------------------------------------- Network scheme of the environment: .. image:: images/Network_Scheme.png :alt: Network Scheme of the environment :scale: 80 Here is the piece of switch configuration for each switch port which is a part of bond0 interface of a server: .. code:: bash show run int et1 interface Ethernet1 description - r02r13c33 switchport trunk native vlan 4 switchport trunk allowed vlan 4 switchport mode trunk channel-group 133 mode active lacp port-priority 16384 spanning-tree portfast show run int po1 interface Port-Channel1 description osscr02r13c21 switchport trunk native vlan 131 switchport trunk allowed vlan 130-159 switchport mode trunk port-channel lacp fallback static port-channel lacp fallback timeout 30 mlag 1 Software configuration of Kargo ----------------------------------------- Setting up Kargo: ^^^^^^^^^^^^^^^^^^^^^^^^ Kargo installation was performed on the bare metal Ubuntu Xenial servers. Kargo requires dedicated user (not root) to exist on the target nodes. To configure and launch Kargo section `Launcher script`_ has been used. .. table:: Versions of some software +--------------------+-----------------------------------------+ |Software |Version | +--------------------+-----------------------------------------+ |Ubuntu |Ubuntu 16.04.1 LTS | +--------------------+-----------------------------------------+ |fuel-ccp-installer |6b26170f70e523fb04bda8d6f15077d461fba9de | | | | +--------------------+-----------------------------------------+ |kargo |016b7893c64fede07269c01cac31e96c8ee0d257 | +--------------------+-----------------------------------------+ Test tool: ^^^^^^^^^^ We were using `Dstat`_ utility as main tool for collecting timing and system performarce durring tests. `Script for parsing collected metrics`_ was used to parse performance metrics after installation tests. Operating system configuration: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ You can find /etc folder contents from the one of the target servers where K8S cluster was deployed: :download:`etc_tarball_of_node1 ` Testing process =============== 1. Kargo launcher script was set up and executed on node1 server as described in `Setting up Kargo:`_ section. 2. During Kargo run dstat tool was launched on the node1 with the following options: .. code:: bash root@node1:~# dstat --nocolor --time --cpu --mem --net -N bond0 --io --output /root/dstat.csv 3. After finishing of Kargo run we parsed resulted "dstat.csv" files with `Script for parsing collected metrics`_. The above steps were repeated with the following numbers of nodes: 50,150,350 As a result of this part we got the following CSV files: :download:`METRICS(NUMBER_OF_NODES=50) <./results/50_node1.csv>` :download:`METRICS(NUMBER_OF_NODES=150) <./results/150_node1.csv>` :download:`METRICS(NUMBER_OF_NODES=350) <./results/350_node1.csv>` Results ======= After simple processing results the following plots for performance metrics collected during provisioning of the nodes in depend on time created (click to expand an image): CPU(TIME), RAM(TIME) ^^^^^^^^^^^^^^^^^^^^ +---------+---------------------------------------+---------------------------------------+ |Number of|Plot CPU(TIME) |Plot RAM(TIME) | |nodes | | | +=========+=======================================+=======================================+ |50 |.. image:: images/50_nodes_-_CPU.png |.. image:: images/50_nodes_-_RAM.png | | | :alt: CPU_USAGE(TIME, NODES=50) | :alt: RAM_USAGE(TIME, NODES=50) | | | :scale: 6 | :scale: 6 | +---------+---------------------------------------+---------------------------------------+ |150 |.. image:: images/150_nodes_-_CPU.png |.. image:: images/150_nodes_-_RAM.png | | | :scale: 6 | :scale: 6 | +---------+---------------------------------------+---------------------------------------+ |350 |.. image:: images/350_nodes_-_CPU.png |.. image:: images/350_nodes_-_RAM.png | | | :scale: 6 | :scale: 6 | +---------+---------------------------------------+---------------------------------------+ NET(TIME), DISK(TIME) ^^^^^^^^^^^^^^^^^^^^^ +---------+---------------------------------------+---------------------------------------+ |Number of|Plot NET(TIME) |Plot DISK(TIME) | |nodes | | | +=========+=======================================+=======================================+ |50 |.. image:: images/50_nodes_-_net.png |.. image:: images/50_nodes_-_disk.png | | | :alt: CPU_USAGE(TIME, NODES=50) | :alt: RAM_USAGE(TIME, NODES=50) | | | :scale: 6 | :scale: 6 | +---------+---------------------------------------+---------------------------------------+ |150 |.. image:: images/150_nodes_-_net.png |.. image:: images/150_nodes_-_disk.png | | | :scale: 6 | :scale: 6 | +---------+---------------------------------------+---------------------------------------+ |350 |.. image:: images/350_nodes_-_net.png |.. image:: images/350_nodes_-_disk.png | | | :scale: 6 | :scale: 6 | +---------+---------------------------------------+---------------------------------------+ The following table shows how performance metrics and deployment time parameters depend on a number of nodes. +----------------------+---------------+---------------+---------------+ |number of nodes |50 |150 |350 | +======================+===============+===============+===============+ |deployment time |2049.00 |3922.00 |13065.00 | +----------------------+---------------+---------------+---------------+ |cpu_usage_max |99.0210 |99.56 |99.06 | +----------------------+---------------+---------------+---------------+ |cpu_usage_min |0 |0 |0 | +----------------------+---------------+---------------+---------------+ |cpu_usage_average |7.2920 |10.03 |12.63 | +----------------------+---------------+---------------+---------------+ |cpu_usage_percentile |19.6495 |24.92 |29.12 | |90% | | | | +----------------------+---------------+---------------+---------------+ |ram_usage_max |4466.10 |13859.56 |112079.57 | +----------------------+---------------+---------------+---------------+ |ram_usage_min |1061.51 |1033.32 |1075.16 | +----------------------+---------------+---------------+---------------+ |ram_usage_average |2121.20 |4335.69 |31288.94 | +----------------------+---------------+---------------+---------------+ |ram_usage_percentile |2876.33 |8570.32 |79915.96 | |90% | | | | +----------------------+---------------+---------------+---------------+ |net_all_max |3864760.75 |20996615.75 |60130883.88 | +----------------------+---------------+---------------+---------------+ |net_all_min |0 |0 |0 | +----------------------+---------------+---------------+---------------+ |net_all_average |70602.55 |102913.32 |177943.40 | +----------------------+---------------+---------------+---------------+ |net_all_percentile |253590.90 |263933.25 |180409.81 | |90% | | | | +----------------------+---------------+---------------+---------------+ |dsk_io_all_max |3503 |3196 |3470 | +----------------------+---------------+---------------+---------------+ |dsk_io_all_min |0 |0 |0 | +----------------------+---------------+---------------+---------------+ |dsk_io_all_average |26 |37 |56 | +----------------------+---------------+---------------+---------------+ |dsk_io_all_percentile |58 |14 |8 | |90% | | | | +----------------------+---------------+---------------+---------------+ Issues that have been found during the tests ============================================= During the testing we've found several issues that prevented us from achieving test results at scale: +---------------------------------------+-----------------------------------------------+ |Issue |Link | +=======================================+===============================================+ |etcd list sometimes hangs | https://github.com/kubespray/kargo/pull/448 | +---------------------------------------+-----------------------------------------------+ |K8S DNS services not working correctly | https://github.com/kubespray/kargo/pull/458 | +---------------------------------------+-----------------------------------------------+ |Calico creates extra pool during run | https://github.com/kubespray/kargo/pull/462 | +---------------------------------------+-----------------------------------------------+ |Timeout to quay.io to fetch etcd image | https://github.com/kubespray/kargo/pull/481 | +---------------------------------------+-----------------------------------------------+ |Downloading images doesn't scale well | https://github.com/kubespray/kargo/pull/488 | +---------------------------------------+-----------------------------------------------+ |Kargo is too slow on scale | https://github.com/kubespray/kargo/issues/478 | +---------------------------------------+-----------------------------------------------+ Applications ============ Launcher script ------------------------------------ .. literalinclude:: scripts/create_kargo_env.sh :language: bash Script for parsing collected metrics ------------------------------------------------ .. literalinclude:: scripts/parse_dstat.sh :language: bash .. references: .. _Fuel-ccp-installer: https://github.com/openstack/fuel-ccp-installer .. _Kargo: https://github.com/kubespray/kargo .. _Dstat: http://dag.wiee.rs/home-made/dstat/