Vexxhost Magnum Cluster API driver

About

Magnum can be deployed with support for the Kubernetes Cluster API using this repository. This page describes the Vexxhost Magnum Cluster API driver.

The role builds upon a control plane Kubernetes cluster which is instantiated during the OSA setup-infrastructure stage, adding driver support into Magnum.

The following architectural features are present:

  • The control plane k8s cluster is an integral part of the openstack-ansible deployment, and forms part of the foundational components alongside mariadb and rabbitmq.

  • The control plane k8s cluster is deployed on the infra hosts and integrated with the haproxy loadbalancer and OpenStack internal API endpoint, and is not exposed outside of the deployment.

  • SSL is supported between all components, and different certificate authorities can be configured for the internal and external loadbalancer endpoints.

  • Control plane traffic can stay entirely within the management network if required.

  • The magnum-cluster-api-proxy service is deployed to allow communication between the control plane and workload clusters when a floating IP is not attached to the workload cluster.

  • It is possible to do a completely offline install for airgapped environments.

The magnum-cluster-api driver for Magnum can be found at https://github.com/vexxhost/magnum-cluster-api

Documentation for the Vexxhost magnum-cluster-api driver is at https://vexxhost.github.io/magnum-cluster-api/

The Ansible collection used to deploy the control plane k8s cluster is at https://github.com/adriacloud/ansible-collection-kubernetes

The Ansible collection used to deploy the container runtime for the control plane k8s cluster is at https://github.com/vexxhost/ansible-collection-containers

These playbooks require OpenStack-Ansible Flamingo or later. An earlier version was provided in the openstack-ansible-ops repository.

High-level overview of the Magnum infrastructure that these playbooks build and operate against:

OSA Magnum Cluster API Architecture

Pre-requisites

  • An existing openstack-ansible deployment

  • A control plane using LXC containers; bare metal deployment is not tested

  • Core OpenStack services plus Octavia

OpenStack-Ansible Integration

OpenStack-Ansible configuration for magnum-cluster-api driver

Define the physical hosts that will host the control plane k8s cluster in /etc/openstack_deploy/conf.d/k8s.yml. This example is for an all-in-one deployment; adjust it to list multiple hosts if high availability is required.

cluster_api_hosts:
  aio1:
    ip: 172.29.236.100
    management_ip: 172.29.236.100
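For a deployment with several infra hosts, the same structure can be repeated once per host. The sketch below writes a three-host definition to a scratch file for review. The host names infra1 to infra3, the addresses, and the K8S_CONF variable are placeholders chosen for illustration, not values from a real deployment.

```shell
# Write to a scratch file first; copy it to
# /etc/openstack_deploy/conf.d/k8s.yml once it has been reviewed.
out=${K8S_CONF:-./k8s.yml}

cat > "$out" <<'EOF'
---
# Hypothetical three-host layout; names and addresses are placeholders.
cluster_api_hosts:
  infra1:
    ip: 172.29.236.11
    management_ip: 172.29.236.11
  infra2:
    ip: 172.29.236.12
    management_ip: 172.29.236.12
  infra3:
    ip: 172.29.236.13
    management_ip: 172.29.236.13
EOF

echo "wrote $out"
```

An odd number of hosts is the usual choice for a highly available Kubernetes control plane, which is why three are shown.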

You can set config-overrides for the control plane of the k8s cluster in /etc/openstack_deploy/group_vars/k8s_all/main.yml.

---
# Pick a range of addresses for cilium that do not collide with anything else
cilium_ipv4_cidr: 172.29.200.0/22

# Set a clusterctl version. Supported list can be found in defaults:
# https://github.com/adriacloud/ansible-collection-kubernetes/blob/main/roles/clusterctl/defaults/main.yml
clusterctl_version: 1.10.5
cluster_api_version: 1.10.5
cluster_api_infrastructure_provider: openstack
cluster_api_infrastructure_version: 0.12.4

# Define k8s version for the control cluster
kubernetes_version: 1.33.5

# Enable periodic cluster API state collection (note: this is not a guaranteed functional backup)
# See https://cluster-api.sigs.k8s.io/clusterctl/commands/move
cluster_api_backups_enabled: False
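The comment on cilium_ipv4_cidr asks for a range that does not collide with anything else. One way to sanity-check a candidate range is a small shell helper such as the sketch below. The function names are illustrative, and the three networks in the loop are ranges commonly seen in OpenStack-Ansible example configurations, assumed here only for the sake of the example; substitute the networks from your own openstack_user_config.yml.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address of a CIDR block as integers.
cidr_bounds() {
  base=${1%/*}
  prefix=${1#*/}
  size=$(( 1 << (32 - $prefix) ))
  start=$(( $(ip_to_int "$base") & ~($size - 1) ))
  echo "$start $(( $start + $size - 1 ))"
}

# Succeed (exit 0) when the two CIDR blocks share at least one address.
cidr_overlaps() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

cilium_cidr=172.29.200.0/22
# Assumed example networks (management, tunnel, storage); replace with yours.
for net in 172.29.236.0/22 172.29.240.0/22 172.29.244.0/22; do
  if cidr_overlaps "$cilium_cidr" "$net"; then
    echo "collision: $cilium_cidr overlaps $net"
  else
    echo "ok: $cilium_cidr does not overlap $net"
  fi
done
```

The check only compares address ranges. It does not know about networks that exist outside the listed ones, such as tenant or provider networks, so those still need to be reviewed by hand.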

Next, set up config-overrides for the magnum service in /etc/openstack_deploy/group_vars/magnum_all/main.yml. You should ensure suitable images are uploaded for tenants’ k8s cluster hosts.

Attention must be given to the SSL configuration. Users and workload clusters interact with the external endpoint and must trust its SSL certificate. The magnum service and cluster-api can be configured to interact with either the external or the internal endpoint and must trust the corresponding SSL certificate. Depending on the environment, these certificates may be issued by different certificate authorities.

---
magnum_k8s_driver: "vexxhost"

magnum_capi_vexxhost_git_install_branch: v0.33.0
magnum_capi_vexxhost_git_repo: "{{ openstack_github_base_url | default('https://github.com') ~ '/vexxhost/magnum-cluster-api' }}"

Run the deployment

For a new deployment

Run the OSA setup playbooks as usual, following the normal deployment guide.

For an existing deployment

Create the k8s control plane containers

openstack-ansible openstack.osa.containers_lxc_create --limit k8s_all

Run the magnum-cluster-api deployment

openstack-ansible openstack.osa.k8s

Add the magnum-cluster-api driver to the magnum service

openstack-ansible openstack.osa.magnum

Optionally run a functional test of magnum-cluster-api

TODO: This is currently available to zuul CI only

Use Magnum to create a workload cluster

Upload Images

Create a cluster template

Create a workload cluster
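The three steps above (upload an image, create a cluster template, create a workload cluster) can be sketched with the OpenStack CLI as follows. This is an illustration under stated assumptions, not a tested procedure for this deployment: the image name, Kubernetes tag, flavor, external network name, and cluster name are hypothetical, and the os_distro value and the kube_tag label should be checked against the magnum-cluster-api documentation for the driver version in use. The OPENSTACK variable is an addition of this sketch that makes it a dry run by default.

```shell
# Dry run by default: each command is printed rather than executed.
# Export OPENSTACK=openstack to run against a real cloud.
OPENSTACK=${OPENSTACK:-echo openstack}

# Hypothetical values; replace them with ones valid in your cloud.
KUBE_TAG=v1.33.5                        # Kubernetes version baked into the image
IMAGE_NAME=ubuntu-2204-kube-$KUBE_TAG   # Glance image name
TEMPLATE_NAME=k8s-$KUBE_TAG
CLUSTER_NAME=demo-cluster

# 1. Upload an image. The os_distro value is an assumption; confirm it
#    in the driver documentation.
$OPENSTACK image create "$IMAGE_NAME" \
  --disk-format qcow2 \
  --container-format bare \
  --property os_distro=ubuntu \
  --file "$IMAGE_NAME.qcow2"

# 2. Create a cluster template that references the image.
#    "public" and "m1.medium" are placeholder network and flavor names.
$OPENSTACK coe cluster template create "$TEMPLATE_NAME" \
  --coe kubernetes \
  --image "$IMAGE_NAME" \
  --external-network public \
  --master-flavor m1.medium \
  --flavor m1.medium \
  --master-lb-enabled \
  --labels kube_tag="$KUBE_TAG"

# 3. Create a workload cluster from the template.
$OPENSTACK coe cluster create "$CLUSTER_NAME" \
  --cluster-template "$TEMPLATE_NAME" \
  --master-count 1 \
  --node-count 2
```

Progress of a real run can be followed with `openstack coe cluster list` until the cluster reaches a completed state.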

Optional Components

Use of magnum-cluster-api-proxy

Because the control plane k8s cluster needs to access the k8s control plane of a tenant cluster for its further configuration, the only way to do this out of the box is through the public network (floating IP). This means that the API of the tenant k8s control plane must be globally reachable, which poses a security threat to such tenant clusters.

To solve this and give the control plane k8s cluster access to tenant clusters inside their internal networks, a proxy service is introduced.

Cluster Network Connectivity (diagram: assets/magnum_capi_proxy.drawio.png)

The proxy service must be deployed on the hosts where the Neutron metadata agents run. For LXB/OVS these are the members of neutron-agent_hosts, while for OVN the service should be installed on all compute_hosts (or neutron_ovn_controller).

The service configures its own HAProxy instance and creates backends for the managed k8s clusters that point inside the corresponding network namespaces. The service does not spawn its own namespaces; it uses the existing metadata namespaces to reach the load balancer inside the tenant network.

Configuration of the service is relatively trivial:

# Define a group of hosts where to install the service.
# OVN: compute_hosts / neutron_ovn_controller
# OVS/LXB: neutron_metadata_agent
mcapi_vexxhost_proxy_hosts: compute_hosts
# Define the address and port on which the HAProxy instance listens
mcapi_vexxhost_proxy_environment:
  PROXY_BIND: "{{ management_address }}"
  PROXY_PORT: 44355

Also, when deploying the proxy service, ensure that the variable magnum_magnum_cluster_api_git_install_branch is defined for the mcapi_vexxhost_proxy_hosts as well, or align the value of magnum_magnum_cluster_api_git_install_branch with mcapi_vexxhost_proxy_install_branch, to avoid conflicts caused by different driver versions.

Once configuration is complete, you can run the playbook:

openstack-ansible osa_ops.mcapi_vexxhost.mcapi_proxy

Deploy the workload clusters with a local registry

TODO - describe how to do this

Deploy the control plane cluster from a local registry

TODO - describe how to do this

Troubleshooting

Local testing

An OpenStack-Ansible all-in-one configured with Magnum and Octavia is capable of running a functioning magnum-cluster-api deployment.

Sufficient memory should be available beyond the minimum 8G usually required for an all-in-one. A multinode workload cluster may require nova to boot several Ubuntu images in addition to an Octavia loadbalancer instance. 64G would be an appropriate amount of system RAM.

There must also be sufficient disk space in /var/lib/nova/instances to support the required number of instances. The normal minimum of 60G required for an all-in-one deployment will be insufficient; 500G would be plenty.
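A quick way to compare a test host against these figures is a small preflight script such as the sketch below. It assumes a Linux host with /proc/meminfo, and the variable names, the report helper, and the 10% tolerance are choices made for this example rather than part of OpenStack-Ansible.

```shell
# Suggested figures from the text above, in GiB.
want_ram=64
want_disk=500
instances_dir=/var/lib/nova/instances

# Total memory in GiB as reported by the kernel (Linux only).
have_ram=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)

# Walk up to the nearest existing directory in case nova is not installed yet.
dir=$instances_dir
while [ ! -d "$dir" ]; do
  dir=$(dirname "$dir")
done

# Free space in GiB on the filesystem that holds that directory.
have_disk=$(df -Pk "$dir" | awk 'NR == 2 { printf "%d", $4 / 1024 / 1024 }')

# report <label> <have> <want>: values within 10% of the suggestion count
# as ok, because the kernel reports slightly less memory than is installed.
report() {
  if [ $(( $2 * 10 )) -ge $(( $3 * 9 )) ]; then
    echo "ok:  $1 ${2}G available, ${3}G suggested"
  else
    echo "low: $1 ${2}G available, ${3}G suggested"
  fi
}

report "RAM" "$have_ram" "$want_ram"
report "disk under $dir" "$have_disk" "$want_disk"
```

The disk figure is the free space on whichever filesystem currently backs the instances directory, so it should be re-run after any dedicated volume is mounted there.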