Xena Series Release Notes


New Features

  • Added two new flags to alter behaviour in RabbitMQ:

    • rabbitmq_message_ttl_ms, which lets you set a TTL on messages.

    • rabbitmq_queue_expiry_ms, which lets you set an expiry time on queues.

    See https://www.rabbitmq.com/ttl.html for more information on both.
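As a rough illustration, the two RabbitMQ flags could be set in globals.yml as sketched below. The values are assumptions chosen only for the example, and the unit is assumed to be milliseconds because of the _ms suffix.

```yaml
# globals.yml (sketch; values are illustrative, not recommended defaults)
# Both values are assumed to be milliseconds, per the _ms suffix.
rabbitmq_message_ttl_ms: 600000     # messages expire after 10 minutes
rabbitmq_queue_expiry_ms: 3600000   # unused queues expire after 1 hour
```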

Upgrade Notes

  • The ironic_tftp service no longer binds to all IP addresses; by default it uses the IP address of the api_interface. To revert to the old behaviour, set ironic_tftp_interface_address in globals.yml.

  • The misspelled InfluxDB variable infuxdb_internal_endpoint has been renamed to influxdb_internal_endpoint. Operators who reference the old name should review their configuration.

Security Issues

  • The kolla-genpwd, kolla-mergepwd, kolla-readpwd and kolla-writepwd commands now create or update passwords.yml with correct permissions. They also display a warning message about incorrect permissions.

  • Restricts access to the /server-status path exposed by HTTP OpenStack services through HAProxy on the public endpoint by default. Fixes the issue for Ubuntu/Debian installations; Rocky Linux/CentOS are not affected. LP#1996913

Bug Fixes

  • The precheck for RabbitMQ failed incorrectly when kolla_externally_managed_cert was set to true. LP#1999081

  • Fixes an issue where the SASL account was created before the config file was ready. LP#2015589

  • Fixes an issue where Kolla set the producer tasks to None, which disabled all Designate producer tasks. LP#1879557

  • Configuration of service user tokens for all Nova and Cinder services is now done automatically, to ensure security of block-storage volume data.

    See LP#2004555 for more details.

  • Fixes ironic_tftp binding to all IP addresses on the system. Adds the ironic_tftp_interface, ironic_tftp_address_family and ironic_tftp_interface_address parameters to set the address for the ironic_tftp service. LP#2024664

  • Adds configuration necessary for application credential access rules to properly function. LP#1965111

  • Fixes deployment when using Ansible check mode. LP#2002661

  • Fixes the incorrect endpoint URLs and service type information for the Cyborg service in Keystone. LP#2020080

  • When upgrading or deploying RabbitMQ, the policy ha-all is cleared if om_enable_rabbitmq_high_availability is set to false.


New Features

  • Since the fix for CVE-2022-29404, the default value of the LimitRequestBody directive in the Apache HTTP Server has changed from 0 (unlimited) to 1073741824 (1 GiB). This limits the size of uploads through Horizon, such as images. The limit can now be configured via horizon_httpd_limitrequestbody. LP#2012588

  • etcd is now exposed internally via HAProxy on etcd_client_port.

  • The config option rabbitmq_ha_replica_count has been added to allow changing the replication factor of mirrored queues in RabbitMQ. While the flag is unset, the queues are mirrored across all nodes using “ha-mode”: “all”. Note that this only has an effect if the flag om_enable_rabbitmq_high_availability is set to true, as otherwise queues are not mirrored.

  • The config option rabbitmq_ha_promote_on_shutdown has been added, which allows changing the RabbitMQ definition ha-promote-on-shutdown. By default ha-promote-on-shutdown is “when-synced”. We recommend changing this to “always”, which prioritises RabbitMQ availability over avoiding the loss of some messages. This is most relevant when restarting RabbitMQ, such as when upgrading. Note that setting the value of this flag, even to the default value of “when-synced”, will cause RabbitMQ to be restarted on the next deploy. For more details see: https://www.rabbitmq.com/ha.html#cluster-shutdown

  • Services using etcd3gw via tooz now use etcd via haproxy. This removes a single point of failure, where we hardcoded the first etcd host for backend_url.
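A minimal globals.yml sketch of the RabbitMQ and Horizon options described in this list follows. Only the variable names come from the notes above; the replica count and the 2 GiB limit are assumptions picked for illustration.

```yaml
# globals.yml (sketch; values are illustrative)
# Mirroring options only take effect when high availability is enabled.
om_enable_rabbitmq_high_availability: true
rabbitmq_ha_replica_count: 2                # assumed value; unset mirrors to all nodes
rabbitmq_ha_promote_on_shutdown: "always"   # default is "when-synced"

# Raise the Apache LimitRequestBody for Horizon from 1 GiB to 2 GiB.
horizon_httpd_limitrequestbody: 2147483648
```

Keep in mind that setting rabbitmq_ha_promote_on_shutdown at all, even to its default, restarts RabbitMQ on the next deploy.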

Upgrade Notes

  • ironic.conf now sets [pxe]\kernel_append_params instead of [pxe]\pxe_append_params which has been deprecated. Please override the new config option if you are overriding the old one.

  • Default tags of neutron_tls_proxy and glance_tls_proxy have been changed to haproxy_tag, as both services are using haproxy container image. Any custom tag overrides for those services should be altered before upgrade.

Bug Fixes

  • Sets the etcd internal hostname and cacert for deployments with internal TLS enabled. This allows services to work with etcd when coordination is enabled for TLS internal deployments. Without this fix, the coordination backend fails to connect to etcd and the service itself crashes.

  • Fixes the missing [taskflow] section in masakari.conf.j2. LP#1966536

  • When upgrading RabbitMQ, the policy ha-all was cleared only if rabbitmq_remove_ha_all_policy was set to true. Now, om_enable_rabbitmq_high_availability must also be set to false.


New Features

  • Adds the flag om_enable_rabbitmq_high_availability. Setting this to true will enable both durable queues and classic mirrored queues in RabbitMQ. Note that classic queue mirroring and transient (aka non-durable) queues are deprecated and subject to removal in RabbitMQ version 4.0 (date of release unknown). Changes the pattern used in classic mirroring to exclude some queue types. This pattern is ^(?!(amq\\.)|(.*_fanout_)|(reply_)).*.

Bug Fixes

  • Fixes the kolla_docker module, which did not take the common_options parameter into account, so the module’s default values were always used. LP#2003079

  • Fixes the baremetal role to avoid the error “apparmor_parser apparmor_parser --version failed” by installing the apparmor package on Debian-like systems. LP#2004583

  • The value of [oslo_messaging_rabbit] heartbeat_in_pthread is explicitly set to either true for wsgi applications, or false otherwise.

  • Fixes an issue with Octavia config generation when using octavia_auto_configure and the genconfig command. Note that access to the OpenStack API is necessary for Octavia auto configuration to work, even when generating config. See LP#1987299 for more details.

  • Fixes an issue where some prechecks would fail or not run when running in check mode. LP#2002657


Bug Fixes

  • Fixes an issue with ironic-inspector using the wrong option to configure the interface used to communicate with the Ironic API. LP#1995246


Upgrade Notes

  • image_upload_use_cinder_backend = True is no longer set on Cinder’s default Ceph RBD backend; the common upstream default is now used (currently False). See also LP#1991516

Bug Fixes

  • image_upload_use_cinder_backend = True is no longer set on Cinder’s default Ceph RBD backend. This prevents related ERRORs and WARNINGs in the Cinder and Glance logs. LP#1991516

  • Fixes Keystone OIDC failing to validate JWTs because of a missing key on the Azure auth-oidc endpoint. Adds a new variable containing the JWKS URI that delivers the missing keys. LP#1990375

  • Removes the dhcp-sequential-ip configuration option from ironic_dnsmasq to avoid a race condition offering the same IP address to multiple hosts being inspected at the same time.


Bug Fixes

  • Fixes an issue with AlertManager external Web URL being unconfigurable. A new variable prometheus_alertmanager_external_url has been introduced that users can use to set web.external-url to public.

  • After an extended disruption to the Fluentd-ElasticSearch central logging pipeline, the buffer of unsent log data can grow large enough that transferring it takes longer than the Fluentd request timeout (5 seconds by default). The default request timeout is raised to 60s and made configurable using the new parameter fluentd_elasticsearch_request_timeout. LP#1983031

  • Fixes Ironic API healthchecks when backend TLS encryption is enabled. LP#1990819

  • Fixes an issue with ironic-neutron-agent using the wrong option to configure the interface used to communicate with the Ironic API. LP#1990675
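The two new variables mentioned in this list might be overridden in globals.yml roughly as follows. The URL is a hypothetical placeholder, and the timeout format is an assumption that mirrors the “60s” notation used above.

```yaml
# globals.yml (sketch; both values are assumptions for illustration)
# Hypothetical public URL passed to Alertmanager's web.external-url.
prometheus_alertmanager_external_url: "https://alertmanager.example.com"

# Raise the Fluentd request timeout beyond the new 60s default.
fluentd_elasticsearch_request_timeout: "120s"
```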


Security Issues

  • Kolla Ansible used to run Ironic’s tftpd as an (unprivileged) root user. Now, it will explicitly use the nobody user.

Bug Fixes

  • Sets multipathd user_friendly_names to “no” so that os-brick can resize volumes online. Adds the ability to override the multipathd config. LP#1982777

  • Fixes bug #1987982, which caused the database log_bin_trust_function_creators variable not to be set back to “OFF” after a Keystone upgrade.

  • Fixes an issue where ping might not be installed on some systems, causing HAProxy prechecks to fail.

  • If ironic_enabled_notification_topics is set to true, ironic_notification_level is set to info in order to ensure that Ironic actually sends out notifications.

    See bug 1969826 for details.


New Features

  • Adds variables to configure whether monitoring services should be exposed externally:

    • enable_grafana_external

    • enable_kibana_external

    • enable_prometheus_alertmanager_external
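For example, an operator who wants to keep the monitoring services off the external endpoint could set something like the following in globals.yml. That the variables take boolean values is an assumption; the notes above do not state their defaults.

```yaml
# globals.yml (sketch; boolean values are assumed)
enable_grafana_external: false
enable_kibana_external: false
enable_prometheus_alertmanager_external: false
```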

Bug Fixes

  • Fixes an issue where Ironic Inspector could be configured without authentication in a multi-region environment in a region without a local Keystone service.


New Features

  • Adds support for configuring the OpenStack Compute API microversion used by the OpenStack exporter for Prometheus using the prometheus_openstack_exporter_compute_api_version variable. The default value is 2.1 to keep metrics unchanged when using recent exporter releases.
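A hedged sketch of overriding the microversion in globals.yml is shown below. The chosen microversion is only an illustration; whether it changes the exported metrics depends on the exporter release in use.

```yaml
# globals.yml (sketch; the microversion shown is an illustrative choice)
# The default is 2.1. Quoted so YAML does not parse it as a float.
prometheus_openstack_exporter_compute_api_version: "2.79"
```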

Bug Fixes

  • Fixes the issue of exponential growth of /run/openvswitch mounts when kolla-toolbox container is restarted. LP#1979295

  • Fixes an issue with recovering multi-node MariaDB Galera cluster.

  • Increases prometheus_openstack_exporter_timeout to 45 seconds to reduce the odds of scrape failures on deployments with a large number of OpenStack resources. LP#1976629


New Features

  • Deploys and configures a prometheus-libvirt-exporter image as part of the Prometheus monitoring stack.

  • Adds a tls_connect module to the Prometheus blackbox exporter. This can be used to test connectivity of TLS servers.

  • Adds new switches to control deployment of the Masakari monitors. The deployment of each type of monitor can be controlled individually via enable_masakari_instancemonitor and enable_masakari_hostmonitor. By default, both are set to true when the deployment of Masakari is enabled via enable_masakari.

  • Implements container healthchecks for ironic-neutron-agent service. See blueprint

  • Adds support for libvirt SASL authentication. It is enabled by default. LP#1964013

  • Adds support for Rocky Linux 8 as Host OS.
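The Masakari monitor switches described in this list could be combined as in the sketch below, which deploys Masakari with only the instance monitor. The "yes"/"no" spelling follows the style used elsewhere in these notes and is otherwise an assumption.

```yaml
# globals.yml (sketch; deploys Masakari without the host monitor)
enable_masakari: "yes"
enable_masakari_instancemonitor: "yes"   # defaults to enabled with Masakari
enable_masakari_hostmonitor: "no"        # override the default
```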

Known Issues

  • Existing fluentd log rotation failed to delete old haproxy, swift, glance-tls-proxy and neutron-tls-proxy logs. These will not be deleted by the new logrotate config and will have to be removed manually.

Upgrade Notes

  • RabbitMQ’s Prometheus plugin is no longer enabled by default if Prometheus is not deployed. If an external Prometheus is used, turn on rabbitmq_enable_prometheus_plugin to restore the old behaviour.

  • The addition of libvirt SASL authentication requires a new password in passwords.yml, libvirt_sasl_password. This may be generated using the existing kolla-genpwd and kolla-mergepwd tooling.

  • The addition of libvirt SASL authentication requires both the nova_libvirt and nova_compute containers to be updated simultaneously, using new images with the necessary Cyrus SASL dependencies, as well as configuration containing the SASL credentials.

  • It is no longer possible to override the removal of the Monasca Log Metrics service and it will be removed automatically if it hasn’t already been removed in the Wallaby release. It is up to the operator to remove any associated docker volumes.

  • Updates the default value of node_custom_config to {{ node_config }}/config when specified using --configdir.
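For deployments that scrape RabbitMQ from an external Prometheus, the old behaviour described in this list could be restored with a globals.yml entry along these lines. The boolean value is an assumption about the variable's type.

```yaml
# globals.yml (sketch; assumes the variable is a boolean)
# Keep RabbitMQ's Prometheus plugin enabled even though Kolla Ansible
# does not deploy Prometheus itself.
rabbitmq_enable_prometheus_plugin: true
```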

Security Issues

  • Explicitly removes the net.ipv4.ip_forward sysctl from /etc/sysctl.conf on hosts with Neutron L3 Agent. In the absence of another source for this sysctl, it should revert to the default of 0 after the next reboot. This is a follow up to a previous change which stopped setting the sysctl, but leaves existing systems with the original value of 1 set.

    A deployer looking to more aggressively change the value may set neutron_l3_agent_host_ipv4_ip_forward to 0 using a Yoga release of Kolla Ansible. This option will be removed in future. Any deployments still relying on the previous value may set neutron_l3_agent_host_ipv4_ip_forward to 1. LP#1945453

  • Fixes an issue where the default configuration of libvirt did not use authentication for the API exposed over TCP on the internal API network. This allowed anyone with access to the internal API network read-write access to libvirt. While the internal API network is typically trusted, other services on this network generally at least require authentication.

    SASL authentication is now enabled for libvirt by default. Kolla Ansible supports libvirt TLS since the Train release, and this is recommended to provide a higher level of security. LP#1964013
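As a sketch of the sysctl override mentioned in this section, a deployment that still relies on forwarding in the default network namespace might set the following in globals.yml. Only the variable name and the values 0 and 1 come from the notes above.

```yaml
# globals.yml (sketch)
# 1 keeps the previous behaviour; 0 disables forwarding more aggressively.
# The notes state this option will be removed in future.
neutron_l3_agent_host_ipv4_ip_forward: 1
```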

Bug Fixes

  • Fixes an issue with an OIDC authentication flow requiring unnecessary action from the user. Redirecting to the target IdP page now happens automatically. LP#930055

  • Removes the custom value of max_allowed_secret_in_bytes in barbican.conf. The default maximum size in Barbican was doubled to avoid issues with some certificates. LP#1957795

  • Fixes deploying Zun with Cinder Ceph support. Adds support for Zun to access Cinder volumes when external Ceph is configured for Cinder. LP#1848934

  • Fixes the deployment failure of outward_rabbitmq by customising RabbitMQ’s prometheus.tcp.port to resolve port conflicts. LP#1885106

  • Use Volume V3 API in OpenStack exporter. Volume V2 API has been removed since OpenStack Wallaby. LP#1938194

  • Fixes the copy job for the Grafana custom home dashboard file. The copy job needs to run privileged, otherwise a permission denied error occurs. LP#1947710

  • Fixes Octavia’s “Connection refused” errors by adding ovn_sb_connection to octavia.conf. LP#195011

  • Ironic API and Ironic Inspector API use separate policy files. Ironic role was updated to be able to handle both policies separately. LP#1952948

  • Continue to run all actions if one action failed in Elasticsearch curator. LP#1954720

  • Fixes the missing logrotate configuration for Placement. LP#1954723

  • Fixes Nova resize failing when migration_interface is customised. LP#1956976

  • Fixes being unable to connect to the Zun console when kolla_enable_tls_external is true. Previously, access to the console of any Zun container failed in this case. This fix sets the protocol for the wsproxy base_url in zun.conf according to the value of kolla_enable_tls_external. LP#1957117

  • Fixes Register Identity Providers in OpenStack task which was missing an = in the openstack command causing the task to fail to register an IDP with Keystone. LP#1959022

  • Fixes Glance with Cinder iSCSI backend failing due to lack of lock_path setting. LP#1959663

  • Fixes logrotate config missing for openvswitch and prometheus services. LP#1961795

  • Fixes an issue with Ironic’s PXE components not getting updated on upgrade. LP#1963752

  • Fixes configuration of the Prometheus HTTP API URL when using the Prometheus collector in CloudKitty. LP#1961615

  • Fixes an issue with Prometheus scraping when targets’ Ansible inventory hostnames (inventory_hostname) do not resolve to reachable IP addresses. Reverts to the previous behaviour of using IP addresses to communicate with targets. The side effect of this is that targets instances will again be labelled using IP addresses rather than hostnames. LP#1955563

  • Fixes the Apache WSGI configuration for the Aodh service in Debian/Ubuntu binary flavours. LP#1953059

  • Fixes the baremetal role to avoid the error “Unable to remove "libvirtd"”. The symlink /etc/apparmor.d/disable/usr.sbin.libvirtd is now created by the role. LP#1960302

  • Existing fluentd log rotation failed to delete old haproxy, swift, glance-tls-proxy and neutron-tls-proxy logs. Standardise rotation and deletion of logs using logrotate.

  • Fixes an issue with setting up OIDC based Keystone federation against IDP that has a different response type than id_token. This can now be set using a new variable keystone_federation_oidc_response_type. LP#1959781

  • Adds back the option to configure the RabbitMQ clustering interface via Kolla. LP#1900160 (https://bugs.launchpad.net/kolla-ansible/+bug/1900160)

  • Fixes an issue seen when using Jinja2 3.1.0.

  • Fixes an issue with Masakari instance monitor when libvirt SASL is enabled. libvirt SASL was enabled by default in a recent change to Kolla Ansible. LP#1965754

  • Fixes the configuration option setting the type of endpoint used by Neutron to send requests to Placement. LP#1960503

  • Fixes a configuration issue with Node Exporter causing all file system metrics of a host to be identical. LP#1961438

  • Fixes an issue where a failure of any Nova compute service to register itself would cause only the host querying the nova API to fail. Now, only hosts that fail to register will fail the Kolla Ansible run. Alternatively, to fail all hosts in a cell when any compute service fails to register, set nova_compute_registration_fatal to true. LP#1940119

  • The Prometheus OpenStack exporters are now behind HAProxy, providing a unique time series in the Prometheus database. This also ensures that only one exporter queries the OpenStack APIs at any given time interval. With the previous behaviour, each OpenStack exporter was scraped at the same time. This caused each exporter to query the OpenStack APIs simultaneously, introducing unnecessary load and duplicate time series in the Prometheus database due to the instance label being unique for each exporter. LP#1972818

  • Fixes an issue where RabbitMQ was configured to mirror classic transient queues for all services. According to the RabbitMQ documentation this is not a supported configuration, and contributed to numerous bug reports. In order to avoid making unexpected changes to the RabbitMQ cluster, it is necessary to set rabbitmq_remove_ha_all_policy to yes in order to apply this fix. This variable will be removed in the Yoga release. LP#1954925

  • Fixes an issue with Cinder upgrade where Cinder services would remain pinned to the previous release’s RPC & object versions. LP#1954932
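Several fixes in this list are controlled by variables. A globals.yml sketch that opts in to them might look like the following. The response type value is a hypothetical example, since the notes only say that it differs from id_token.

```yaml
# globals.yml (sketch; values are illustrative)
# Fail every host in a cell when any compute service fails to register.
nova_compute_registration_fatal: true

# Opt in to removing the ha-all policy that mirrored transient queues.
rabbitmq_remove_ha_all_policy: "yes"

# Hypothetical response type for an IdP that does not use id_token.
keystone_federation_oidc_response_type: "code"
```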


Security Issues

  • Adds mitigation for the Apache Log4j2 Remote Code Execution (RCE) Vulnerability in Elasticsearch - CVE-2021-44228.

Bug Fixes

  • Only runs the “configure ovn in ovsdb” task on ovn-controller hosts. The task would fail on hosts (such as controller nodes) without a tunnel interface. LP#1953367

  • Fixes an issue where the Nova API logs were written to files ending with -wsgi.log which affected the processing of these logs in the Fluentd pipeline. LP#1950185

  • On slower nodes, the initial grafana startup could experience a timeout failure when the migrations for setting up the database took longer than expected. This has been fixed by increasing the default timeout. The timeout settings can be changed via new parameters grafana_start_first_node_delay and grafana_start_first_node_retries for the grafana role. LP#1769962
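On particularly slow nodes, the two Grafana parameters could be raised further, as in the sketch below. The numbers are assumptions for illustration, as is the reading of the delay as seconds between retries; the notes above do not state the new defaults.

```yaml
# globals.yml (sketch; numbers are illustrative, units are assumed)
grafana_start_first_node_delay: 10     # assumed: seconds between checks
grafana_start_first_node_retries: 30   # assumed: number of checks before failing
```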

Other Notes

  • The ironic-dnsmasq container now creates dnsmasq.log, just as the neutron-dhcp-agent container does. For both log files, verbosity can be increased globally via openstack_logging_debug or per service via the ironic_logging_debug or neutron_logging_debug variables.
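To illustrate the per-service switch, the following globals.yml sketch raises verbosity for Ironic only. That the variables take boolean values is an assumption.

```yaml
# globals.yml (sketch; boolean values are assumed)
openstack_logging_debug: false   # leave global debug logging off
ironic_logging_debug: true       # more verbose logs for Ironic, including dnsmasq.log
```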


New Features

  • Add support for Alertmanager metrics scraping in Prometheus.

  • Adds support for integrating Fluentd metrics into Prometheus. By default this is now enabled when Prometheus is enabled. This behaviour can be overridden via the enable_prometheus_fluentd_integration flag. By default the integration provides metrics relating to the processing of logs by Fluentd. These metrics can be useful for monitoring the status of the Fluentd service. Additional metrics can also be extracted from logs via custom Fluentd config.

  • Adds config parameter haproxy_nova_spicehtml5_proxy_tunnel_timeout to configure the Tunnel TimeOut directive for spicehtml5proxy haproxy service.

  • Adds support for CentOS Stream 8 as a host Operating System and base container image. This is the only distribution of CentOS supported from the Wallaby release. The Victoria release will support both CentOS Linux 8 and CentOS Stream 8 hosts and images, and provides a route for migration.

  • Adds support for integration with Ceph RadosGW.

  • Supports Debian Bullseye (11) as host distribution.

  • Adds a new variable, disable_firewall, which defaults to true. If set to false, then the host firewall will not be disabled during kolla-ansible bootstrap-servers.

  • Disables usage collection (telemetry) in Kibana by default. Users can still enable it via the GUI.

  • Adds support in kolla_docker module to set CgroupnsMode for Docker containers (via cgroupns_mode module param). Requires Docker 20.10. Note that pre-20.10 all containers behave as if they were run with mode host.

  • Added a new HAProxy configuration variable, haproxy_host_ipv4_tcp_retries2, which allows users to modify this kernel option. This option sets the maximum number of times a TCP packet is retransmitted in the established state before giving up. The default kernel value is 15, which corresponds to a duration of between approximately 13 and 30 minutes, depending on the retransmission timeout. This variable can be used to mitigate an issue with stuck connections in case of VIP failover; see bug 1917068 for details.

  • Adds a kolla-ansible gather-facts command that may be used to gather Ansible host facts.

  • The haproxy-config role now allows users to set a weight per HAProxy backend. This can be achieved by setting a hostvar haproxy_{{ service }}_weight in the inventory file to any integer value in the range from 1 to 256; the higher the weight, the higher the load. This can be set per {{ service }}. If the hostvar is not specified, the backend’s weight is not rendered in the final HAProxy configuration.

  • Adds two new variables service_images_pull_retries and service_images_pull_delay which control the behaviour of image pulling tasks. These are useful if your registry is not 100% reliable (usually due to load). The defaults have been set to 3 retries and 5 seconds delay to ensure a better default experience (these are actually Ansible defaults when task retries are enabled).

  • Implements container healthchecks for memcached services. See blueprint

  • Implements container healthchecks for rabbitmq services. See blueprint

  • Implemented container healthchecks for the following services: ceilometer, kafka, keystone-fernet, kuryr, mistral, nova-spicehtml5proxy, qdrouterd, zun. See blueprint

  • Adds two new arguments to the kolla-ansible command, --check and --diff. They are passed through directly to ansible-playbook.

  • Transitions to using system-scoped tokens when authenticating as the Keystone admin user. This is a necessary step towards being able to enable the updated oslo policies in services that allow finer grained access to system-level resources and APIs. Since Queens, the admin role is assigned to the admin user with system scope as well as in the admin project.

  • Add ability to use and enable the neutron packet logging framework.

  • Adds the ability to override the automatic detection of fluentd_version and fluentd_binary. These can now be defined as extra variables. This removes the dependency of having docker configured for config generation.

  • Added support to override rabbitmq config (erl_inetrc and rabbitmq-env.conf) in the kolla-toolbox container.

  • OVN deployment will now configure external_ids:ovn-chassis-mac-mappings to make DVR work on VLAN tenant networks.

  • Changes target names in Prometheus to user-friendly, Ansible inventory based values.

  • Adds support for passing extra runtime options to cAdvisor via prometheus_cadvisor_cmdline_extras new variable. By default system cgroups’ metrics are disabled, plus container labels don’t get exposed to Prometheus. Expensive metrics that usually should not be exported are also enforced to be disabled - consult https://github.com/google/cadvisor/blob/master/docs/runtime_options.md#metrics for a list. These defaults create savings in resources usage by both cAdvisor and Prometheus.

  • Due to the removal of the Monasca Grafana fork, the Monasca datasource is now configured in vanilla Grafana.

  • Adds support for configuring the filter and gather_subset arguments for the setup module via kolla_ansible_setup_filter and kolla_ansible_setup_gather_subset respectively. These can be used to reduce the number of facts, which can have a significant effect on performance of Ansible.

  • Adds functionality to allow passwords that are generated for Kolla Ansible to be stored in Hashicorp Vault. Use new CLI commands kolla-readpwd and kolla-writepwd to read and write Kolla Ansible passwords to a configured Hashicorp Vault kv secrets engine.

  • Adds the manila_cephfs_filesystem_name variable to support multi-fs Ceph Pacific+ deployments.

  • It is now possible to pass multiple inventories to kolla-ansible. To do so you should specify --inventory multiple times.

  • Adds a new variable, ironic_enable_keystone_integration. It adds Keystone connection information to ironic.conf when connecting to an existing Keystone (one that is not being installed at the same time).
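A few of the new variables in this list can be sketched together in globals.yml. The retry and delay values shown are the defaults stated above; that the delay is in seconds follows the wording of the note, and keeping the firewall enabled is just an example choice.

```yaml
# globals.yml (sketch)
# Leave the host firewall alone during kolla-ansible bootstrap-servers.
disable_firewall: false

# Image pull behaviour; these are the defaults described above.
service_images_pull_retries: 3
service_images_pull_delay: 5   # seconds between retries
```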

Upgrade Notes

  • Minimum supported Ansible version is now 2.10 and maximum supported is 4 (ansible-core 2.11).

  • Updates all references to Ansible facts within Kolla Ansible from using individual fact variables to using the items in the ansible_facts dictionary. This allows users to disable fact variable injection in their Ansible configuration, which may provide some performance improvement. Check for facts referenced in local configuration files, and update to use ansible_facts before disabling fact variable injection.

  • rp_filter is no longer set by Kolla Ansible by default. Users may wish to remove the related setting from kolla_sysctl_conf_path (/etc/sysctl.conf by default).

  • Kolla Ansible now defaults docker_registry_insecure to false. If you relied on the previous behaviour, please switch it back on but bear in mind the consequences as discussed in the related security note as well as the linked bug report. LP#1940547

  • To fix LP#1941940, nova_libvirt_dimensions now by default combines with nova_libvirt_default_dimensions. Please consider this when customising that variable.

  • Bumps minimum required Docker version to 18.09 and minimum required Docker Python SDK version to 3.4.1. These two are checked in prechecks.

  • CentOS Linux 8 is no longer supported as a host Operating System or base container image. CentOS users should migrate to CentOS Stream 8. The Victoria release will support both CentOS Linux 8 and CentOS Stream 8 hosts and images, and provides a route for migration.

  • Updates the default image type to source. Users wishing to deploy binary type images should set kolla_install_type to binary in globals.yml. This change is to reflect the reality that source images are tested more thoroughly and we (as OpenStack community) have better control over them.

  • Adds a new flag, docker_disable_ip_forward, which defaults to docker_disable_default_iptables_rules and is used to disable docker’s ip-forward option which makes docker set net.ipv4.ip_forward sysctl to 1. By default, docker_disable_default_iptables_rules is true, in which case docker’s ip-forward option is disabled.

    For existing hosts, this configuration change is applied when configuring docker via kolla-ansible bootstrap-servers. Docker changes the sysctl in a non-persistent manner, so it will revert to the default of 0 after a reboot, if not configured elsewhere. This should not cause a problem, since Kolla Ansible applies the sysctl where necessary. Operators may wish to perform a proactive reboot, or apply the default through other means.

  • enable_host_ntp variable is dropped per the deprecation process.

  • A new group loadbalancer is required in inventory file prior to upgrade. The loadbalancer group is a replacement for the haproxy group.

  • An HTTP server is now always deployed for Ironic conductor, while previously it was only deployed when iPXE is enabled.

    In the Xena release, Ironic removed the iSCSI driver. The recommended deploy driver is direct, which uses HTTP to transfer the disk image. This requires an HTTP server, and the simplest option is to use the one previously deployed when enable_ironic_ipxe is set to true.

  • The haproxy_single_service_listen.cfg.j2 template is not supported in haproxy roles and has been deleted.

  • Changes the default of mariadb_clustercheck_tag and mariabackup_tag from openstack_tag to mariadb_tag. This allows one variable to set the tag for all MariaDB images.

  • Updates the default value of monasca_ntp_server from external_ntp_servers[0] to 0.pool.ntp.org. This is due to the removal of the external_ntp_servers variable as part of the removal of Chrony deployment.

  • Modifies the default value of ceph_nova_user from nova to the value of ceph_cinder_user, in line with the default for ceph_nova_keyring. Users who have overridden ceph_nova_keyring to use separate keyrings for Nova and Cinder should also override ceph_nova_user to match the Nova keyring. LP#1934145

  • Changes Prometheus targets naming. This makes their names more user friendly but also creates a completely new set of a time series data. New target names are taken from Ansible inventory and have the exporter port number stripped off. Any Grafana dashboard that relies on a specific, hard-coded naming pattern for the targets will stop showing metrics after the upgrade.

  • cAdvisor now exports a reduced number of Prometheus metrics and labels by default. This means that the corresponding time series will no longer be created. If an existing setup relies on these, e.g. for visualisation or alerting, they can be explicitly enabled prior to upgrading with the new prometheus_cadvisor_cmdline_extras variable. Reference for the possible options: https://github.com/google/cadvisor/blob/master/docs/runtime_options.md#metrics.

  • Modifies the default value of rabbitmq_server_additional_erl_args from an empty string to +S 2:2 +sbwt none +sbwtdcpu none +sbwtdio none.

  • Support for deployment of chrony has been removed.

  • Service containers and configuration for the Monasca Grafana service will be removed automatically. It is up to the operator to remove the related HAProxy configuration, the Monasca Grafana database, and associated Docker volumes.

  • Support for panko has been removed due to upstream retirement.

  • Removes support for Prometheus v1 deployment. Any previously deployed Prometheus v1 instances will create a conflict during an upgrade. They should be either manually stopped/removed or Prometheus v2 deployment should be disabled by setting enable_prometheus to no.

  • The Rally and Tempest projects are not OpenStack services, but clients. Their images and support are removed as of the Xena cycle.

  • The wsrep-notify.sh script has been removed (following deprecation in Wallaby).

  • Switches default images source (docker_registry) to quay.io. The docker_namespace is also changed to openstack.kolla to match. This is to make the default experience better, especially for users in China, those deploying more than once and/or beyond the all-in-one (AIO) environment used for development, testing and evaluation. Do note for multinode and production deployments it is still recommended to use a local registry as docs suggest. LP#1942134
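Operators who want to keep some of the previous defaults after this upgrade could start from a sketch like the one below. The variable my_custom_node_name is hypothetical and only illustrates the move from injected fact variables to the ansible_facts dictionary in local configuration.

```yaml
# globals.yml (sketch; restores selected pre-upgrade defaults)
kolla_install_type: "binary"     # the default is now source
docker_registry_insecure: "yes"  # see the related security note first

# Hypothetical custom variable showing the fact reference change.
# Before: my_custom_node_name: "{{ ansible_hostname }}"
my_custom_node_name: "{{ ansible_facts.hostname }}"
```

References of the ansible_facts form keep working once fact variable injection is disabled in the Ansible configuration.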

Deprecation Notes

  • Setting rp_filter via Kolla Ansible is deprecated.

  • Support for configuration of NTP daemon (via enable_host_ntp) is deprecated and will be removed in the next Kolla Ansible release (Xena). Please use other means of configuring NTP.

  • The Monasca Fork of Grafana is deprecated due to lack of maintenance and will be removed in the Xena release. Instead, support will be provided to allow Monasca users to migrate to the vanilla Grafana service with the Monasca datasource.

  • Support for deploying tempest and rally is deprecated and will be removed in the Xena cycle. The reason is that these are not services of an OpenStack cloud but its clients.

Critical Issues

  • Fixes a critical bug which caused Nova instances (VMs) using libvirtd (the default/usual choice) to get killed on libvirtd (nova_libvirt) container stop (and thus any restart - either manual or done by running Kolla Ansible). It was affecting Wallaby+ on CentOS, Ubuntu and Debian Buster (not Bullseye). If your deployment is also affected, please read the referenced Launchpad bug report, comment #22, for how to fix it without risking data loss. In short: fixing requires redeploying and this will trigger the bug so one has to first migrate important VMs away and only then redeploy empty compute nodes. LP#1941706

Security Issues

  • Previously, Kolla Ansible, by default (as documented in several places), configured Docker to use insecure mode for the configured registry (i.e., if not using the default one). This is controlled by the docker_registry_insecure variable. Operators who did not notice this quirk could have exposed their deployments to potential MITM attacks. See the bug report for more discussion. LP#1940547

  • Fixes net.ipv4.ip_forward not to be enabled by Kolla Ansible on the default network namespace. It was enabled on hosts with Neutron L3 Agent (thus in most common setups with OVS and/or Linux Bridge, but not OVN) and allowed, unless users had extra iptables rules to avoid that, any traffic to be accepted for forwarding (as long as it was routable and passed other checks). Users of existing setups are advised to re-evaluate whether they need this sysctl enabled and to disable it if not necessary. Kolla Ansible will simply no longer try to set this sysctl at all. Neutron L3 Agent handles forwarding enablement per managed namespace. LP#1945453
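
  For the registry issue above, one way to avoid depending on the default is to state the intended mode explicitly in globals.yml. This is a minimal sketch; the registry host name is hypothetical, and it assumes the registry serves TLS with a certificate the Docker hosts trust.

  ```yaml
  # globals.yml (illustrative fragment)

  # Hypothetical local registry, assumed to serve HTTPS with a trusted certificate.
  docker_registry: "registry.example.local:4000"

  # Explicitly keep Docker out of insecure mode for this registry.
  docker_registry_insecure: "no"
  ```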

Bug Fixes

  • Fixes monasca-thresh to correctly submit the topology to Storm. The previous container ran the topology in local mode (within the container), and didn’t use the Storm cloud. The new container handles submitting the topology to Storm and also handles killing and replacing the topology when its configuration has changed. As a result, the monasca-thresh container is only used for submission, and exits after that’s completed. The logs for the topology will now be available in the storm worker-artifact logs. LP#1808805

  • Works around rp_filter setting issues by skipping it by default. LP#1837551

  • Fixes an issue where configuration in containers could become stale. This prevented containers with updated configuration from being restarted, e.g., if the kolla-ansible genconfig and kolla-ansible deploy-containers commands were used together. LP#1848775

  • Fixes a chronyd crash loop when the server is rebooted (Debian). LP#1915528

  • Fixed an issue when Docker was configured after startup on Debian/Ubuntu, which resulted in iptables rules being created before they were disabled. LP#1923203

  • Fixes an issue with Octavia SSH key copying if the user disabled Octavia auto configuration. LP#1927727

  • Fixes elasticsearch fluentd output being enabled when elasticsearch is not enabled. LP#1927880

  • Fixed an issue where Docker Python SDK 5.0.0 was failing due to missing six. A constraint was introduced to install a version lower than 5.x. LP#1928915

  • Fixes RabbitMQ upgrades with more than two nodes failing randomly. LP#1930293

  • Fixes Swift deployment when TLS is enabled. Added the missing handler and corrected the container name. LP#1931097

  • Fixes missing region_name in keystone_auth sections. See bug 1933025 for details.

  • Fixes iscsid failing in current CentOS 8 based images due to pid file being needlessly set. LP#1933033

  • Fixes host bootstrap on Debian not removing the conflicting packages. It now behaves in accordance with the docs. LP#1933122

  • Fixes default Masakari host monitor config to work with other config that Kolla Ansible sets. This sets disable_ipmi_check due to restrict_to_remotes being set. It prevents the TypeError that happened when host monitor had to take action. This does not affect any functionality so far as Kolla Ansible does not manage IPMI credentials in Pacemaker. LP#1933209

  • Fixes an issue with timesync checks on deployment host. See bug 1933347 for details.

  • Fixes horizon’s healthcheck when SSL is turned on. LP#1933846

  • Fixes an issue seen when customising the Docker Yum repository URL on CentOS, where the docker_yum_gpgkey variable is not used consistently. LP#1934913

  • Fixes an issue where the SPICE console freezes after a while. LP#1938549

  • Fixes Masakari in multi-region deployments to query Nova API in its own region. LP#1939291

  • Fixes nova’s healthchecks when upgrading from the previous version. LP#1939679

  • Fixed broken kolla-toolbox container when RabbitMQ is disabled and IPv6 is used. LP#1939883

  • Fixes inability to attach devices (e.g., volumes via iSCSI/FC) to instances on Debian Bullseye. LP#1941940

  • Fixes kolla-toolbox ansible.log logging for users other than ansible. LP#1942846

  • Fixes mariadb-clustercheck not to run when there is no HAProxy. LP#1944114

  • No longer creates directories for haproxy and swift logs where they are not needed. LP#1945070

  • Fixes an issue with multinode MariaDB deployments which could fail the playbook execution on the WSREP check due to the new behaviour of Galera 4. LP#1947485

  • Fixes an issue with single node MariaDB deployments with HAProxy disabled. See bug 1947534 for details.

  • Fixes the generation of wsrep_cluster_address in galera.cnf when --limit is used while deploying MariaDB nodes. LP#1947589

  • Fixes an error in the placement role which prevented deployment of the placement service when a custom policy file was used. LP#1948835

  • Fixes missing current Ansible version in the error message. LP#1948979

  • Fixes the octavia role not setting the amphora network’s gateway_ip. LP#1949260

  • Fixes an issue where kolla-ansible exits with a zero exit code when executed with a bogus command name. LP#1929397

  • Fixes an issue with Cyborg deployment. LP#1937911

  • Removes deprecated export_synchronous option from Designate config.

  • Fixes a potential issue with Alertmanager in non-HA deployments. In this scenario, the peer gossip protocol is now disabled and Alertmanager won’t try to form a cluster with non-existent other instances. LP#1926463

  • Adds a new flag, docker_disable_ip_forward, which defaults to the value of docker_disable_default_iptables_rules and is used to disable Docker’s ip-forward option, which makes Docker set the net.ipv4.ip_forward sysctl to 1. This is to protect against creating all-forwarding hosts. LP#1931615

  • Fixes an issue when generating /etc/hosts during kolla-ansible bootstrap-servers when one or more hosts have an api_interface with dashes (-) in its name. LP#1927357

  • Fixes the container health check for the ironic_ipxe container on Debian and Ubuntu systems. LP#1937037

  • Fixes an issue with Gnocchi when gnocchi-statsd is disabled. LP#1926914

  • Fixes HAProxy prechecks when kolla_externally_managed_cert is used.

  • Fixes an issue with Magnum when TLS is enabled. LP#781062

  • Fixes an issue with config.json for neutron-server when a VMware plugin agent is used.

  • Stops Fluentd warning message when posting to Elasticsearch 7 bulk API.

  • Fixes an issue with Neutron linuxbridge ML2 agent when neutron_external_interface includes multiple interfaces. LP#1863935

  • Fixes an issue with Manila configuration which was missing a [glance] section, preventing some drivers from operating.

  • Fixes the container image used by mariabackup. It was using the mariadb image, which was deprecated in Victoria and removed in Wallaby. LP#1928129

  • Fixes an issue with default Nova configuration for Ceph where the RBD user is set to nova, but only a cinder keyring is copied. The default value of ceph_nova_user is changed to the value of ceph_cinder_user, in line with the default for ceph_nova_keyring. LP#1934145

  • Fixes an issue with Octavia deployment when using a custom service auth project. If octavia_service_auth_project is set to a project that does not exist, Octavia deployment would fail. The project is now created. LP#1922100

  • Fixes an issue where Libvirt secrets were not persisted. There are no known negative side-effects to this, however it was fixed as a precaution. LP#1821696

  • Removes “fix_cephfs_owner.yaml”, which related to pre-Wallaby Manila’s use of subfolders. Post-Wallaby Manila uses CephFS volumes instead, so this file is no longer required. LP#1938285 LP#1935784

  • Removes use of “cephfs_enable_snapshots” in Manila config as this option was removed from Manila in the Wallaby release.
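
  Two of the fixes above change or introduce defaults that operators may want to pin explicitly. The fragment below is a hedged sketch for globals.yml: the variable names come from the notes above, while the values, in particular the keyring file name, are assumptions for illustration.

  ```yaml
  # globals.yml (illustrative fragment)

  # Stop Docker from setting the net.ipv4.ip_forward sysctl to 1, regardless of
  # how docker_disable_default_iptables_rules is set.
  docker_disable_ip_forward: "yes"

  # Keep a dedicated Ceph user for Nova instead of the new default
  # (the value of ceph_cinder_user). This only makes sense if a matching
  # keyring is supplied; the file name below is an assumed example.
  ceph_nova_user: "nova"
  ceph_nova_keyring: "ceph.client.nova.keyring"
  ```

  Deployments that only ship a cinder keyring should need neither of the Ceph lines, since the new defaults already point Nova at the Cinder user and keyring.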

Other Notes

  • Following Cinder upstream, support for using ZFSSA with Cinder has been removed. ZFSSA was unsupported in Train and later removed in Ussuri.

  • Optimised image pulling to avoid looping over disabled services.