Train Series Release Notes

9.4.0

New Features

  • The default 10 seconds health polling interval is too frequent for most of the cases. Now it has been changed to 60s. A new config health_polling_interval is supported to make the interval configurable. Cloud admin can totally disable the health polling by set a negative value for the config.

Upgrade Notes

  • If it’s still preferred to have 10s health polling interval for Kubernetes cluster. It can be set by config health_polling_interval under kubernetes section.

Bug Fixes

  • Fix an issue with private clusters getting stuck in CREATE_IN_PROGRESS status where floating_ip_enabled=True in the cluster template but this is disabled when the cluster is created.

  • The taint of master node kubelet has been improved to get the conformance test (sonobuoy) passed.

9.3.0

New Features

  • Add cinder_csi_enabled label to support out of tree Cinder CSI.

  • Now the Fedora CoreOS driver can support the sha256 verification for the hyperkube image when bootstraping the Kubernetes cluster.

Bug Fixes

  • Fixed the usage of cert_manager_api=true making cluster creation fail due to a logic lock between kubemaster.yaml and kubecluster.yaml

  • This proxy issue of Prometheus/Grafana script has been fixed.

9.2.0

New Features

  • Added label heapster_enabled to control heapster installation in the cluster.

  • Installs the metrics-server service that is replacing kubernetes deprecated heapster as a cluster wide metrics reporting service used by schedulling, HPA and others. This service is installed and configured using helm and so tiller_enabled flag must be True. The label metrics_server_chart_tag can be used to specify the stable/metrics-server chart tag to be used. The label metrics_server_enabled is used to enable disable the installation of the metrics server (default: true).

  • Added custom.metrics.k8s.io API installer by means of stable/prometheus-adapter helm chart. The label prometheus_adapter_enabled (default: true) controls configuration. You can also use prometheus_adapter_chart_tag to select helm chart version, and prometheus_adapter_configmap if you would like to setup your own metrics (specifying this other than default overwrites default configurations). This feature requires the usage of label monitoring_enabled=true.

Upgrade Notes

  • nginx-ingress-controller QoS changed from Guaranteed to Burstable. Priority class ‘system-cluster-critical’ or higher for nginx-ingress-controller.

Bug Fixes

  • A regression issue about downloading images has been fixed. Now both Fedora Atomic driver and Fedora CoreOS driver can support using proxy in template to create cluster.

  • nginx-ingress-controller requests.memory increased to 256MiB. This is a result of tests that showed the pod getting oom killed by the node on a relatively generic use case.

  • k8s-keystone-auth now uses the upstream k8scloudprovider docker repo instead of the openstackmagnum repo.

  • Fixes the next url in the list nodegroups API response.

  • Bump up prometheus operator chart version to 8.2.2 so that it is compatible with k8s 1.16.x.

  • Bump up traefik to 1.7.19 for compatibility with Kubernetes 1.16.x.

9.1.0

New Features

  • Add fedora coreos driver. To deploy clusters with fedora coreos operators or users need to add os_distro=fedora-coreos to the image. The scripts to deploy kubernetes on top are the same with fedora atomic. Note that this driver has selinux enabled.

  • Along with the kubernetes version upgrade support we just released, we’re adding the support to upgrade the operating system of the k8s cluster (including master and worker nodes). It’s an inplace upgrade leveraging the atomic/ostree upgrade capability.

  • Cluster upgrade API supports upgrading specific nodegroups in kubernetes clusters. If a user chooses a default nodegroup to be upgraded, then both of the default nodegroups will be upgraded since they are in one stack. For non-default nodegroups users are allowed to use only the cluster template already set in the cluster. This means that the cluster (default nodegroups) has to be upgraded on the first hand. For now, the only label that is taken into consideration during upgrades is the kube_tag. All other labels are ignored.

  • Choose whether system containers etcd, kubernetes and the heat-agent will be installed with podman or atomic. This label is relevant for k8s_fedora drivers.

    k8s_fedora_atomic_v1 defaults to use_podman=false, meaning atomic will be used pulling containers from docker.io/openstackmagnum. use_podman=true is accepted as well, which will pull containers by k8s.gcr.io.

    k8s_fedora_coreos_v1 defaults and accepts only use_podman=true.

    Note that, to use kubernetes version greater or equal to v1.16.0 with the k8s_fedora_atomic_v1 driver, you need to set use_podman=true. This is necessary since v1.16 dropped the –containerized flag in kubelet. https://github.com/kubernetes/kubernetes/pull/80043/files

Known Issues

  • The startup of the heat-container-agent uses a workaround to copy the SoftwareDeployment credentials to /var/lib/cloud/data/cfn-init-data. The fedora coreos driver requires heat train to support ignition.

Bug Fixes

  • For k8s_coreos set REQUESTS_CA for heat-agent. The heat-agent as a python service needs to use the ca bundle of the host.

  • core-podman Mount os-release properly To display the node OS-IMAGE in k8s properly we need to mount /usr/lib/os-release, /ets/os-release is just a symlink.

9.0.0

New Features

  • Add information about the cluster in magnum event notifications. Previously the CADF notification’s target ID was randomly generated and no other relevant info about the cluster was sent. Cluster details are now included in the notifications. This is useful for other OpenStack projects like Searchlight or third party projects that cache information regarding OpenStack objects or have custom actions running on notification. Caching systems can now efficiently update one single object (e.g. cluster), while without notifications they need to periodically retrieve object list, which is inefficient.

  • When using a public cluster template, user still need the capability to reuse their existing network/subnet, and they also need to be able to turn of/off the floating IP to overwrite the setting in the public template. Now this is supported by adding those three items as parameters when creating cluster.

  • Support boot from volume for Kubernetes all nodes (master and worker) so that user can create a big size root volume, which could be more flexible than using docker_volume_size. And user can specify the volume type so that user can leverage high performance storage, e.g. NVMe etc. And a new label etcd_volme_type is added as well so that user can set volume type for etcd volume. If the boot_volume_type or etcd_volume_type are not passed by labels, Magnum will try to read them from config option default_boot_volume_type and default_etcd_volume_type. A random volume type from Cinder will be used if those options are not set.

  • Add nginx as an additional Ingress controller option for Kubernetes. Installation is done via the upstream nginx-ingress helm chart, and selection can be done via label ingress_controller=nginx.

  • Now the fedora atomic Kubernetes driver can support rolling upgrade for k8s version change or the image change. User can call command openstack coe cluster upgrade <cluster ID> <new cluster template ID> to upgrade current cluster to the new version defined in the new cluster template. At this moment, only the image change and the kube_tag change are supported.

  • k8s_fedora_atomic_v1 Add PodSecurityPolicy for privileged pods. Use privileged PSP for calico and node-problem-detector. Add PSP for flannel from upstream.

  • Added label traefik_ingress_controller_tag to enable specifying traefik container version.

  • Using Node Problem Detector, Draino and AutoScaler to support auto healing for K8s cluster, user can use a new label “auto_healing_enabled’ to turn on/off it.

    Meanwhile, a new label “auto_scaling_enabled” is also introduced to enable the capability to let the k8s cluster auto scale based its workload.

  • A new tag auto_healing_controller is introduced to allow the user to choose the auto-healing service when auto_healing_enabled is specified in the labels, draino and magnum-auto-healer are supported for now. Another label magnum_auto_healer_tag is also added to specify the magnum-auto-healer image tag.

  • Support multi DNS server when creating template. User can use a comma delimited ipv4 address list to specify multi dns server, for example “8.8.8.8,114.114.114.114”

  • A new API endpoint <ClusterID>/actions/upgrade is added to support rolling upgrade the base OS of nodes and the version of Kubernetes. More details please refer the API Refreence document.

Known Issues

  • With the new config option keystone_auth_default_policy, cloud admin can set a default keystone auth policy for k8s cluster when the keystone auth is enabled. As a result, user can use their current keystone user to access k8s cluster as long as they’re assigned correct roles, and they will get the pre-defined permissions defined by the cloud provider.

  • There is a known issue when doing image(operating system) upgrade for k8s cluster. Because when doing image change for a server resource, Heat will trigger the Nova rebuild to rebuild the instnace and there is no chance to call kubectl drain to drain the node, so there could be a very minior downtime when doing(starting to do) the rebuild and meanwhile a request is routed to that node.

  • Minion is not a good name for k8s worker node anymore, now it has been replaced in the fedora atomic driver with ‘node’ to align with the k8s terminologies. So the server name of a worker will be something like k8s-1-lnveovyzpreg-node-0 instead of k8s-1-lnveovyzpreg-worker-0.

Security Issues

  • Regarding passwords, they could be guessed if there is no faild-to-ban-like solution. So it’d better to disable it for security reasons. It’s only effected for fedora atomic images.

Bug Fixes

  • There shouldn’t be a default value for floating_ip_enabled when creating cluster. By default, when it’s not set, the cluster’s floating_ip_enabled attribute should be set with the value of cluster template. It’s fixed by removing the default value from Magnum API.

  • The coe_version was out of sync with the k8s version deployed for the cluster. Now it is fixed by making sure the kube_version is consistent with the kube_tag when creating the cluster and upgrading the cluster.

  • Fixed an issue that applications running on master nodes which rely on network connection keep restarting because of timeout or connection lost, by making calico devices unmanaged in NetworkManager config on master nodes.

  • Now the resize and upgrade action of cluster will return cluster ID to be consistent with other actions of Magnum cluster.

  • Traefik container now defaults to a fixed tag (v1.7.10) instead of tag (latest)

Other Notes

  • Now the heat-container-agent default tag for Train release is train-dev.