Zed Series Release Notes
etcd is now exposed internally via HAProxy on
Services using etcd3gw via tooz now reach etcd through HAProxy. This removes a single point of failure that came from hardcoding the first etcd host in backend_url.
Default tags of
glance_tls_proxy have been changed to
haproxy_tag, as both services use the
haproxy container image. Any custom tag overrides for those services should be altered before upgrade.
Sets the etcd internal hostname and cacert for deployments with internal TLS enabled. This allows services to work with etcd when coordination is enabled in internal TLS deployments. Without this fix, the coordination backend fails to connect to etcd and the service itself crashes.
Adds the flag
om_enable_rabbitmq_high_availablity. Setting this to
true will enable both durable queues and classic mirrored queues in RabbitMQ. Note that classic queue mirroring and transient (aka non-durable) queues are deprecated and subject to removal in RabbitMQ version 4.0 (date of release unknown). Changes the pattern used in classic mirroring to exclude some queue types. This pattern is
ovn-monitor-all variable. A boolean value that tells whether ovn-controller should unconditionally monitor all records in OVS databases. Setting the
ovn-monitor-all variable to ‘true’ will remove some CPU load from the OVN Southbound DB but will result in more updates coming to ovn-controller. This might be helpful in large deployments with many compute hosts.
kolla_docker module, which did not take into account the common_options parameter, so the module’s default values were always used. LP#2003079
The value of
[oslo_messaging_rabbit] heartbeat_in_pthread is explicitly set to either
true for WSGI applications, or
Fixes an issue with Octavia config generation when using the
genconfig command. Note that access to the OpenStack API is necessary for Octavia auto configuration to work, even when generating config. See LP#1987299 for more details.
Fixes the OVN deployment order to follow the one recommended in the OVN docs. LP#1979329
Fixes an issue where some prechecks would fail or not run when running in check mode. LP#2002657
Prevent haproxy-config role from attempting to configure firewalld during a kolla-ansible genconfig. LP#2002522
Adds a set of variables to control the cinder backend name, as used in cinder.conf. This is the name you use when setting the volume_backend_name property on volume types. Details are in the cinder guide section of the documentation.
Enables configuring firewalld for external API services. Extracts the required services and checks the external port, then adds the ports to a firewalld zone. Assumes that firewalld has been installed and configured beforehand. The variable disable_firewall is disabled by default to preserve backwards compatibility, but it is good practice to have the system firewall configured.
Adds support for deploying OpenSearch and OpenSearch dashboards. These services directly replace ElasticSearch and Kibana which are now end-of-life. Support for sending logs to a remote ElasticSearch (or OpenSearch) cluster is maintained.
Allow cinder-volume to be configured to use Pure Storage FlashArray with either the iSCSI or FC driver.
Adds the possibility of including custom alert notification templates with Prometheus Alertmanager.
Adds a new, disabled by default, option for Prometheus OpenStack exporter, named “enable_prometheus_openstack_exporter_external”. This option allows exposing OpenStack exporter through HAProxy, and may be used to expose OpenStack metrics to an existing Prometheus server outside the OpenStack cloud, instead of using the default one provided by OpenStack.
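As a minimal sketch, exposing the exporter through HAProxy might look like this in globals.yml. The variable name comes from the note above; the "yes" value follows the usual Kolla Ansible boolean convention and is an assumption here.

```yaml
# globals.yml (sketch)
# Expose the Prometheus OpenStack exporter externally through HAProxy.
# This option is disabled by default.
enable_prometheus_openstack_exporter_external: "yes"
```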
Adds a new flag,
openvswitch_ovs_vsctl_wrapper_enabled, which will install a wrapper script to
/usr/bin/ovs-vsctl to docker exec into the openvswitchd container.
prometheus_scrape_interval configuration option. The default is set to
60s. This configures the default scrape interval for all jobs.
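For illustration, a shorter interval could be set in globals.yml as sketched below. The 30s value is only an example and not a recommendation.

```yaml
# globals.yml (sketch)
# Default scrape interval applied to all Prometheus jobs (default: 60s).
# "30s" is an illustrative value.
prometheus_scrape_interval: "30s"
```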
bifrost_deploy_verbosity parameter. It allows changing the verbosity of the Bifrost bootstrap task.
-vvvv is the default value.
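A possible override, assuming the parameter accepts the same verbosity flag format as its default, might look like this:

```yaml
# globals.yml (sketch)
# Reduce the verbosity of the Bifrost bootstrap task (default: -vvvv).
# "-vv" is an illustrative value.
bifrost_deploy_verbosity: "-vv"
```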
Adds support for configuring the CloudKitty fetcher using
New switches have been added to control deployment of the Masakari monitors. The deployment of each type of monitor can be controlled individually via
enable_masakari_hostmonitor. By default, both are set to
true when the deployment of Masakari is enabled via
Sanity checks have been removed. These “smoke tests” were originally implemented for barbican, cinder, glance and keystone.
Kolla Ansible now supports failing execution early if fact collection fails on any of the hosts. This is to avoid late failures due to missing facts (especially cross-host). This is possible by setting
kolla_ansible_setup_any_errors_fatal: true. Do note this still supports host fact caching and it will not affect scenarios with all facts cached (as there is no task to fail).
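In practice this is a one-line setting. A sketch of how it might appear in globals.yml:

```yaml
# globals.yml (sketch)
# Abort the run early if fact collection fails on any host,
# instead of failing later on missing (cross-host) facts.
kolla_ansible_setup_any_errors_fatal: true
```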
Adds a new variable
It is a dictionary whose keys and respective values are added to the pushgateway’s URL as query parameters. Keys with a “None” value are left out.
For example, the following configurations:
    ceilometer_prometheus_pushgateway_host: "127.0.0.1"
    ceilometer_prometheus_pushgateway_port: "9091"
    ceilometer_prometheus_pushgateway_options:
      timeout: 180
      max_retries:
      verify_ssl: yes
Result in the following URL:
    prometheus://127.0.0.1:9091/metrics/job/openstack-telemetry/?timeout=180&verify_ssl=True
Adds support for managing resource providers via config files.
Adds support for setting up arbitrary HAProxy services in active/passive mode.
Implements container healthchecks for mariadb-server service. See blueprint
Adds support for configuring a coordination backend for Ironic Inspector via the
ironic_coordination_backend variable. Possible values are
Adds support for multiple DHCP ranges in the Ironic Inspector DHCP server.
ironic_http_interface/ironic_http_interface_address parameters to set the addresses for the
Adds support for enabling both PXE and iPXE in Ironic at the same time.
Adds variables to configure whether monitoring services should be exposed externally:
Adds support for configuring a number of UDP workers for Designate’s bind9 backend via the
Adds support for configuring the OpenStack Compute API microversion used by the OpenStack exporter for Prometheus using the
prometheus_openstack_exporter_compute_api_version variable. The default value is
latest, matching the default behaviour of the exporter.
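A sketch of pinning the microversion in globals.yml follows. The value "2.79" is purely illustrative; which microversions are available depends on the deployed Nova release.

```yaml
# globals.yml (sketch)
# Pin the Compute API microversion used by the exporter (default: latest).
# "2.79" is an illustrative value only.
prometheus_openstack_exporter_compute_api_version: "2.79"
```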
ovn-openflow-probe-interval variable. It sets the inactivity probe interval of the OpenFlow connection to the Open vSwitch integration bridge, in seconds. If the value is zero, it disables the connection keepalive feature. The default value is 60 seconds.
Adds support for deploying
prometheus-msteams, which can be used to forward Prometheus Alertmanager notifications to Microsoft Teams. It is enabled by setting
Adds the ability to configure ProxySQL’s max replication lag via the configuration value
proxysql_backend_max_replication_lag, which is set to the default value as per documentation. If it is greater than 0, ProxySQL will regularly monitor replication lag, and if the lag goes beyond the configured threshold it will temporarily shun the host until replication catches up. Please see the official upgrade notes for more detail.
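To illustrate the behaviour described above, a non-zero threshold might be set as follows. The value 30 is hypothetical, and the unit is assumed to be seconds, as with ProxySQL's own max_replication_lag setting.

```yaml
# globals.yml (sketch)
# A value greater than 0 enables replication lag monitoring; a backend
# whose lag exceeds the threshold is shunned until it catches up.
# 30 is a hypothetical value; the unit is assumed to be seconds.
proxysql_backend_max_replication_lag: 30
```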
If you are currently deploying ElasticSearch with Kolla Ansible, you should back up the data before starting the upgrade. The contents of the ElasticSearch data volume will be automatically moved to the OpenSearch volume. The ElasticSearch, ElasticSearch Curator and Kibana containers will be removed automatically. The inventory must be updated so that the
elasticsearch group is renamed to
opensearch, and the kibana group is renamed to
Enable TLS by default in Bifrost. Bifrost is now configured to enable TLS for the services it deploys, and generate self-signed certificates for them. TLS may be disabled by setting
image_upload_use_cinder_backend = True is no longer set on Cinder’s default Ceph RBD backend; the common upstream default is now used (
False currently). See also LP#1991516
Kolla Ansible no longer sets
show_multiple_locations = True by default when Glance’s Ceph RBD backend is enabled. This was applied as a fix, but operators must note that this, in turn, disables Cinder’s and Nova’s optimisations. On the other hand, these optimisations might have been causing trouble for other operators. Please see the linked bug report. Operators relying on this feature can set the flag themselves using service config overrides. LP#1992153
Modifies the default value of
masakari-hostmonitor is enabled. LP#1934149
Sanity checks have been removed because they were broken.
The Nova legacy service and its endpoints are no longer advertised by default. To revert to the old behaviour, please set
keystone_token_provider does not exist anymore, because there is no alternative.
OpenStack Monasca is no longer supported by Kolla Ansible. Support for deploying
zookeeper has been dropped since they have been used only with Monasca. Post-upgrade cleanup of those services can be done using
kolla-ansible monasca_cleanup; for details, please see the Monasca guide.
Modifies the default lease time of the Ironic Inspector DHCP server to 10 minutes. This is small enough to use small pools of IP addresses for inspection but gives more room for the inspection to succeed. This default can be changed globally via
ironic_dnsmasq_dhcp_default_lease_time variable or per range via
ironic_dnsmasq_default_gateway in favour of
ironic_dnsmasq_dhcp_ranges. For example, if you have:
    ironic_dnsmasq_dhcp_range: "10.42.0.2,10.42.0.254,255.255.255.0"
    ironic_dnsmasq_default_gateway: "10.42.0.1"
replace it with:
    ironic_dnsmasq_dhcp_ranges:
      - range: "10.42.0.2,10.42.0.254,255.255.255.0"
        routers: "10.42.0.1"
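Because the new variable is a list, the same structure extends to several ranges. The following is a sketch only: it reuses the range and routers keys from the example above, and the second subnet is hypothetical.

```yaml
# globals.yml (sketch): two inspection ranges, each with its own router.
# The second range and its router address are hypothetical.
ironic_dnsmasq_dhcp_ranges:
  - range: "10.42.0.2,10.42.0.254,255.255.255.0"
    routers: "10.42.0.1"
  - range: "10.43.0.2,10.43.0.254,255.255.255.0"
    routers: "10.43.0.1"
```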
Ironic volumes related to PXE (TFTP) and iPXE & direct deploy (HTTP) are refactored to share a common parent path at
/var/lib/ironic. This is done to support both PXE and iPXE at the same time. Operators doing advanced customisations might need to review the relevant defaults section.
Upgrades of Ironic will now wait for nodes in
wait states to change their state. This improves the user experience by not breaking processes that are being waited on. This can be disabled by setting
Ironic containers related to PXE (TFTP) and iPXE & direct deploy (HTTP) are renamed to better reflect their role:
ironic_http. Operators doing advanced customisations might need to review the relevant defaults section. Additionally, their respective host groups have changed analogously:
The Keystone’s admin endpoint is no longer created by default. Operators of existing deployments may wish to remove it after the upgrade completes. Operators having external services relying on the availability of the Keystone’s admin endpoint may set
true to keep creating the admin endpoint, but such support will be removed after Zed.
Keystone’s admin interface no longer points to a separate port. On upgrade, the port is preserved to maintain intermediate compatibility. Users are advised to run the deploy and post-deploy commands afterwards to ensure the port is cleaned up. For more information, please refer to the docs. Please note that the relevant variables
admin_protocol are no longer used and are deprecated for removal after Zed. Please cease their usage in your customisations.
Starting with Zed, Neutron marked the
linuxbridge ML2 driver as experimental. The Kolla team has decided to honour the upstream’s decision and make sure users are aware they are using a badly supported driver instead of having it configured out of the box. Thus, all users of this driver are advised to get acquainted with Neutron docs and proceed accordingly.
ovn role has been split into
ovn-db roles; therefore, users that have
ovn_extra_volumes configured need to adapt their config to use
For OVN, the default value of openflow-probe-interval was changed to 60 seconds. Use the
ovn-openflow-probe-interval variable to override it.
Prometheus has been switched to active/passive mode. This is enabled by default but can be turned off by setting
no. See bug 1928193.
Prometheus Alertmanager has been switched to active/passive mode. This is enabled by default but can be turned off by setting
enable_ironic_ipxe variable has been removed. iPXE still works by default and can be disabled by setting the more aptly named
storage_interface variable has been removed. Please set the
Deprecated sysctl knobs related to
infuxdb_internal_endpoint has been fixed to
influxdb_internal_endpoint. Operators might need to review the relevant variable.
enable_ironic_ipxe is deprecated in favour of
ironic_dnsmasq_serve_ipxe, which reflects the effect better.
enable_ironic_ipxe will be removed in Zed.
enable_ironic_pxe_uefi is deprecated and will be removed in Zed. This variable is not documented and results in a broken PXE setup for Ironic Inspector. The recommended way to support EFI/UEFI deployments in Ironic Inspector is to stay with the recommended default of iPXE in Ironic Inspector (see docs on
At the April 2022 PTG, the deprecation and removal of the sanity checks was confirmed. Therefore the usage of
is not possible any more.
admin_protocol are deprecated for removal after Zed.
Kolla Ansible used to run Ironic’s tftpd as an (unprivileged) root user. Now, it will explicitly use the nobody user.
The scrape interval for the Prometheus data source in Grafana is now set to
prometheus_scrape_interval. This fixes issues with dashboards that use the
$__rate_interval Grafana variable, as the default scrape interval of 60s does not match the Grafana default of 15s.
Fixes an issue in the
bifrost_deploy container where passwords generated by Bifrost were not persistent beyond the lifetime of the container. This is generally not a problem unless you access the Ironic or Inspector APIs outside of the Bifrost playbooks. LP#1983356
Fixes the issue of exponential growth of /run/openvswitch mounts when kolla-toolbox container is restarted. LP#1979295
Fixes LP#1982777. Set multipathd user_friendly_names to “no” to make os-brick able to resize volumes online. Adds ability to override multipathd config.
Fixed bug #1987982. This bug caused the database log_bin_trust_function_creators variable not to be set back to “OFF” after a keystone upgrade.
image_upload_use_cinder_backend = True is no longer set on Cinder’s default Ceph RBD backend. Related ERRORs and WARNINGs in Cinder and Glance logs are prevented. LP#1991516
Kolla Ansible no longer sets
show_multiple_locations = True by default when Glance’s Ceph RBD backend is enabled. This caused various issues with the services running with the recommended Ceph permissions. LP#1992153
Fixes missing logrotate configuration for proxysql logs. LP#1995248
Fixes an issue when
masakari-hostmonitor is enabled while corosync/pacemaker is not deployed. LP#1934149
Fixes an issue with recovering multi-node MariaDB Galera cluster.
Adds configuration necessary for application credential access rules to properly function. LP#1965111
Fixes an issue with AlertManager external Web URL being unconfigurable. A new variable
prometheus_alertmanager_external_url has been introduced, which users can use to set Alertmanager’s web.external-url to a public URL.
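A sketch of setting the new variable in globals.yml is shown below. The hostname and port are hypothetical placeholders for whatever public address fronts Alertmanager in a given deployment.

```yaml
# globals.yml (sketch)
# Public URL for Alertmanager, passed through to web.external-url.
# The hostname and port below are hypothetical.
prometheus_alertmanager_external_url: "https://alertmanager.example.com:9093"
```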
Fixes an issue where Ironic Inspector could be configured without authentication in a multi-region environment in a region without a local Keystone service.
Fixes Keystone OIDC failing to validate JWTs because of a missing key on the Azure auth-oidc endpoint. Adds a new variable containing the JWKS URI that delivers the missing keys. LP#1990375
[taskflow] section in masakari.conf.j2. LP#1966536
Fixes Zun capsules losing network namespaces after restarting the zun_cni_daemon container.
After an extended disruption to the Fluentd-ElasticSearch central logging pipeline, the buffer of unsent log data can grow large enough that transferring it takes longer than the Fluentd request timeout (default 5 seconds). The default request timeout value is raised to
60s, and made configurable using new parameter
prometheus_openstack_exporter_timeout to 45 seconds to reduce the odds of scrape failures on deployments with a large number of OpenStack resources. LP#1976629
Fixes Ironic API healthchecks when backend TLS encryption is enabled. LP#1990819
dhcp-sequential-ip configuration option from
ironic_dnsmasqto avoid a race condition offering the same IP address to multiple hosts being inspected at the same time.
Fixes an issue with
ironic-inspector using the wrong option to configure the interface used to communicate with the Ironic API. LP#1995246
Fixes an issue with
ironic-neutron-agent using the wrong option to configure the interface used to communicate with the Ironic API. LP#1990675
ironic_enabled_notification_topics is set to
ironic_notification_level is set to
info in order to ensure that Ironic actually sends out notifications.
See bug 1969826 for details.
Fixes “monitor: kolla” being added to external_labels by default. The default Prometheus config should not include environment-specific details. With this change, external_labels is optional, and any <labelname>: <labelvalue> pair can be added to it. LP#1944699
Fixes an issue with Masakari instance monitor when libvirt SASL is enabled. libvirt SASL was enabled by default in a recent change to Kolla Ansible. LP#1965754
Fixes an issue where a failure of any Nova compute service to register itself would cause only the host querying the nova API to fail. Now, only hosts that fail to register will fail the Kolla Ansible run. Alternatively, to fail all hosts in a cell when any compute service fails to register, set
The Prometheus OpenStack exporters are now behind HAProxy, providing a unique time series in the Prometheus database. This also ensures that only one exporter queries the OpenStack APIs at any given time interval. With the previous behaviour, each OpenStack exporter was scraped at the same time. This caused each exporter to query the OpenStack APIs simultaneously, introducing unnecessary load and duplicate time series in the Prometheus database due to the instance label being unique for each exporter. LP#1972818
Fixes an issue with misaligned data points in Grafana when load balancing over multiple Prometheus server instances. See bug 1928193.
Fixes an issue with Alertmanager silence creation leading to a 404 page. LP#1987866
Sets the balancing algorithm to round-robin for Horizon if memcached is enabled. LP#1990523
Rocky Linux 9 based images are now recommended (instead of CentOS Stream ones).