Train Series Release Notes

9.1.0

New Features

  • Add fedora coreos driver. To deploy clusters with fedora coreos, operators or users need to add os_distro=fedora-coreos to the image. The scripts that deploy kubernetes on top are the same as for fedora atomic. Note that this driver has selinux enabled.

  • Along with the recently released kubernetes version upgrade support, the operating system of the k8s cluster (both master and worker nodes) can now be upgraded. This is an in-place upgrade that leverages the atomic/ostree upgrade capability.

  • The cluster upgrade API supports upgrading specific nodegroups in kubernetes clusters. If a user chooses a default nodegroup to be upgraded, both default nodegroups will be upgraded, since they are in one stack. For non-default nodegroups, users are allowed to use only the cluster template already set in the cluster. This means that the cluster (default nodegroups) has to be upgraded first. For now, the only label taken into consideration during upgrades is kube_tag. All other labels are ignored.

  • A new label, use_podman, chooses whether the system containers (etcd, kubernetes and the heat-agent) are installed with podman or atomic. This label is relevant for the k8s_fedora drivers.

    k8s_fedora_atomic_v1 defaults to use_podman=false, meaning atomic will be used, pulling containers from docker.io/openstackmagnum. use_podman=true is accepted as well, which will pull containers from k8s.gcr.io.

    k8s_fedora_coreos_v1 defaults to and accepts only use_podman=true.

    Note that to use a kubernetes version greater than or equal to v1.16.0 with the k8s_fedora_atomic_v1 driver, you need to set use_podman=true. This is necessary because v1.16 dropped the --containerized flag in kubelet. https://github.com/kubernetes/kubernetes/pull/80043/files
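
The use_podman rules above can be summarized in a short sketch. This is an illustration only, not Magnum code: the helper names parse_kube_tag and resolve_use_podman are hypothetical, and the sketch assumes kube_tag values of the form v<major>.<minor>.<patch>.

```python
import re

# Hypothetical helpers for illustration; not part of Magnum.
ATOMIC = "k8s_fedora_atomic_v1"
COREOS = "k8s_fedora_coreos_v1"


def parse_kube_tag(kube_tag):
    """Return (major, minor, patch) from a tag such as 'v1.16.0'."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", kube_tag)
    if match is None:
        raise ValueError("unrecognized kube_tag: %r" % kube_tag)
    return tuple(int(part) for part in match.groups())


def resolve_use_podman(driver, kube_tag, use_podman=None):
    """Return the effective use_podman value, or raise if the combination is invalid."""
    if driver == COREOS:
        # The coreos driver defaults to and accepts only use_podman=true.
        if use_podman is False:
            raise ValueError("%s accepts only use_podman=true" % COREOS)
        return True
    if driver == ATOMIC:
        # The atomic driver defaults to use_podman=false.
        effective = False if use_podman is None else use_podman
        # v1.16 dropped the kubelet --containerized flag, so podman is required.
        if parse_kube_tag(kube_tag) >= (1, 16, 0) and not effective:
            raise ValueError("kube_tag >= v1.16.0 requires use_podman=true")
        return effective
    raise ValueError("unknown driver: %r" % driver)
```

For example, resolve_use_podman(ATOMIC, "v1.15.7") returns False, while the same call with "v1.16.0" raises until use_podman=True is passed explicitly.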

Known Issues

  • The startup of the heat-container-agent uses a workaround to copy the SoftwareDeployment credentials to /var/lib/cloud/data/cfn-init-data. The fedora coreos driver requires heat from the train release in order to support ignition.

9.0.0

New Features

  • Add information about the cluster in magnum event notifications. Previously the CADF notification’s target ID was randomly generated and no other relevant info about the cluster was sent. Cluster details are now included in the notifications. This is useful for other OpenStack projects like Searchlight, or third-party projects that cache information about OpenStack objects or run custom actions on notification. Caching systems can now efficiently update a single object (e.g. a cluster), whereas without notifications they need to periodically retrieve the object list, which is inefficient.

  • When using a public cluster template, users still need to be able to reuse their existing network and subnet, and to turn the floating IP on or off, overriding the setting in the public template. This is now supported by adding those three items as parameters when creating a cluster.

  • Support boot from volume for all Kubernetes nodes (master and worker) so that users can create a large root volume, which can be more flexible than using docker_volume_size. Users can also specify the volume type in order to leverage high-performance storage, e.g. NVMe. A new label, etcd_volume_type, is added as well so that users can set the volume type for the etcd volume. If boot_volume_type or etcd_volume_type is not passed via labels, Magnum will try to read them from the config options default_boot_volume_type and default_etcd_volume_type. A random volume type from Cinder will be used if those options are not set.

  • Add nginx as an additional Ingress controller option for Kubernetes. Installation is done via the upstream nginx-ingress helm chart, and selection can be done via label ingress_controller=nginx.

  • The fedora atomic Kubernetes driver now supports rolling upgrades for a k8s version change or an image change. Users can run the command openstack coe cluster upgrade <cluster ID> <new cluster template ID> to upgrade the current cluster to the new version defined in the new cluster template. At the moment, only the image change and the kube_tag change are supported.

  • k8s_fedora_atomic_v1: Add PodSecurityPolicy for privileged pods. Use privileged PSP for calico and node-problem-detector. Add PSP for flannel from upstream.

  • Added label traefik_ingress_controller_tag to enable specifying traefik container version.

  • Node Problem Detector, Draino and AutoScaler are used to support auto healing for K8s clusters. Users can turn it on or off with a new label, “auto_healing_enabled”.

    Meanwhile, a new label “auto_scaling_enabled” is also introduced to let the k8s cluster auto scale based on its workload.

  • A new label, auto_healing_controller, is introduced to allow the user to choose the auto-healing service when auto_healing_enabled is specified in the labels. draino and magnum-auto-healer are supported for now. Another label, magnum_auto_healer_tag, is also added to specify the magnum-auto-healer image tag.

  • Support multiple DNS servers when creating a cluster template. Users can specify them as a comma-delimited list of IPv4 addresses, for example “8.8.8.8,114.114.114.114”.

  • A new API endpoint, <ClusterID>/actions/upgrade, is added to support rolling upgrades of the nodes’ base OS and of the Kubernetes version. For more details, please refer to the API Reference document.
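
The volume type fallback described in the boot-from-volume note above (label first, then the config option, then a random Cinder volume type) can be sketched as follows. This is a minimal illustration, not Magnum's implementation: resolve_volume_type is a hypothetical helper, and the volume type names "nvme", "ssd" and "hdd" are made up for the example.

```python
import random


def resolve_volume_type(label_value, config_default, cinder_volume_types, rng=random):
    """Pick a volume type: the label wins, then the config default, then a random Cinder type."""
    if label_value:
        return label_value
    if config_default:
        return config_default
    if not cinder_volume_types:
        raise ValueError("no volume types available")
    # Neither the label nor the config option is set: pick any available type.
    return rng.choice(list(cinder_volume_types))


# Example: boot_volume_type comes from the labels, while etcd_volume_type
# falls back to the (assumed) value of default_etcd_volume_type.
labels = {"boot_volume_type": "nvme"}
boot_type = resolve_volume_type(labels.get("boot_volume_type"), "ssd", ["ssd", "hdd"])
etcd_type = resolve_volume_type(labels.get("etcd_volume_type"), "ssd", ["ssd", "hdd"])
```

Because the last fallback is random, operators who care about which storage backs the root and etcd volumes would normally set either the labels or the two config options.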

Known Issues

  • With the new config option keystone_auth_default_policy, cloud admins can set a default keystone auth policy for k8s clusters when keystone auth is enabled. As a result, users can use their current keystone user to access a k8s cluster as long as they are assigned the correct roles, and they will get the pre-defined permissions set by the cloud provider.

  • There is a known issue when doing an image (operating system) upgrade for a k8s cluster. When the image of a server resource changes, Heat triggers a Nova rebuild of the instance, and there is no chance to call kubectl drain to drain the node first. As a result, there can be a very minor downtime if a request is routed to that node just as the rebuild starts.

  • Minion is no longer a good name for a k8s worker node, so it has been replaced with ‘node’ in the fedora atomic driver to align with k8s terminology. The server name of a worker will now be something like k8s-1-lnveovyzpreg-node-0 instead of k8s-1-lnveovyzpreg-minion-0.

Security Issues

  • Passwords can be guessed if no fail2ban-like solution is in place, so password authentication is disabled for security reasons. This only affects fedora atomic images.

Bug Fixes

  • There shouldn’t be a default value for floating_ip_enabled when creating a cluster. By default, when it is not set, the cluster’s floating_ip_enabled attribute should take the value from the cluster template. This is fixed by removing the default value from the Magnum API.

  • The coe_version was out of sync with the k8s version deployed for the cluster. Now it is fixed by making sure the kube_version is consistent with the kube_tag when creating the cluster and upgrading the cluster.

  • Fixed an issue where applications running on master nodes that rely on a network connection kept restarting because of timeouts or lost connections. The fix makes calico devices unmanaged in the NetworkManager config on master nodes.

  • The resize and upgrade actions of a cluster now return the cluster ID, to be consistent with other Magnum cluster actions.

  • The Traefik container now defaults to a fixed tag (v1.7.10) instead of the latest tag.
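
The intended behavior of the floating_ip_enabled fix above can be illustrated with a short sketch. It assumes that an unset value is represented as None; effective_floating_ip_enabled is a hypothetical helper, not a Magnum function.

```python
def effective_floating_ip_enabled(cluster_value, template_value):
    """Use the cluster's own setting when given; otherwise inherit from the template."""
    if cluster_value is None:
        # Not set on the cluster: fall back to the cluster template's value.
        return template_value
    return cluster_value
```

The check is against None rather than truthiness, so an explicit False on the cluster still overrides a template that enables floating IPs. An API-level default would make "not set" indistinguishable from an explicit choice, which is why the fix removes it.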

Other Notes

  • The heat-container-agent default tag for the Train release is now train-dev.