Zed Series Release Notes


Upgrade Notes

  • If credentials are updated in passwords.yml, kolla-ansible can now update these credentials in the Keystone database and in the on-disk config files.

    The changes to passwords.yml are applied once kolla-ansible -i INVENTORY reconfigure has been run.

    If you want to revert to the old behavior - credentials not automatically updating during reconfigure if they changed in passwords.yml - you can specify this by setting update_keystone_service_user_passwords: false in your globals.yml.

    Note that passwords are only changed if you change them in passwords.yml. This mechanism is not a complete solution for automatic credential rollover.
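
  A minimal globals.yml sketch of the opt-out described above. The variable name comes from this note; the surrounding file layout is assumed.

    ```yaml
    # globals.yml
    # Keep the old behaviour: changes in passwords.yml are NOT pushed to
    # Keystone or the on-disk config files during reconfigure.
    update_keystone_service_user_passwords: false
    ```

  With the option left at its default, changed passwords are applied the next time kolla-ansible -i INVENTORY reconfigure is run.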

Bug Fixes

  • Fixes mariadb role deployment when using Ansible check mode. LP#2052501

  • Updated the configuration of service user tokens for all Nova and Cinder services to stop using the admin role for service_token and use the service role instead.

    See LP#2004555 and LP#2049762 for more details.

  • Adds the Keystone service role. Keystone has created the service role during bootstrap since Bobcat. The service role is needed for SLURP upgrades from Antelope to work. This role is also needed in Antelope and Zed for Cinder for proper service token support. LP#2049762

  • Changes to service user passwords in passwords.yml will now be applied when reconfiguring services.

    This behaviour can be reverted by setting update_keystone_service_user_passwords: false.

    Fixes LP#2045990


Bug Fixes

  • Fixes usage audit notifications being enabled when they are not needed. See LP#2049503.


New Features

  • Updates apache grok pattern to match the size of response in bytes, time taken to serve the request and user agent.

  • Masakari coordination backend can now be configured via masakari_coordination_backend variable. Coordination is optional and can now be set to either redis or etcd.

  • Sets a log retention policy for OpenSearch via Index State Management (ISM). See the documentation for details.

  • Adds the ability to configure RabbitMQ via rabbitmq_extra_config, which can be overridden in globals.yml.

  • In the configuration template of the Senlin service, the cafile parameter is now set by default in the authentication section. This allows the Senlin service to work with self-signed certificates on the internal Keystone endpoint.
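
  As an illustration of the Masakari coordination setting above, a globals.yml fragment might look like the following. The variable name and the two allowed values come from this note; the chosen value is only an example.

    ```yaml
    # globals.yml
    # Coordination is optional; per the note, valid backends are redis or etcd.
    masakari_coordination_backend: "redis"
    ```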

Upgrade Notes

  • Added log retention in OpenSearch, previously handled by Elasticsearch Curator. By default the soft and hard retention periods are 30 and 60 days respectively. If you are upgrading from Elasticsearch, and have previously configured elasticsearch_curator_soft_retention_period_days or elasticsearch_curator_hard_retention_period_days, those variables will be used instead of the defaults. You should migrate your configuration to use the new variable names before the Caracal release.

Bug Fixes

  • Fixes MariaDB backup when enable_proxysql is enabled.

  • Fixes a Keystone task that was connecting via SSH instead of running locally. LP#2004224

  • Fixes 504 timeout when scraping openstack exporter. Ensures that HAProxy server timeout is the same as the scrape timeout for the openstack exporter backend. LP#2006051

  • Fixes non-persistent Neutron agent state data. LP#2009884

  • Fix issue with octavia security group rules creation when using IPv6 configuration for octavia management network. See LP#2023502 for more details.

  • Fixes glance-api failing to start the privsep daemon when cinder_backend_ceph is set to true. See LP#2024541 for more details.

  • Adds host and mariadb_port to the wsrep sync status check so that non-standard ports can be used for MariaDB deployments. LP#2024554

  • Fixes an issue with high CPU usage of the cAdvisor container by setting the per-container housekeeping interval to the same value as the Prometheus scrape interval. LP#2048223

  • Fixes an issue where Prometheus would fail to scrape the OpenStack exporter when using internal TLS with an FQDN. LP#2008208

  • Fixes Docker health check for the sahara_engine container. LP#2046268

  • Fixes an issue where Fluentd was parsing Horizon WSGI application logs incorrectly. Horizon error logs are now written to horizon-error.log instead of horizon.log. See LP#1898174

  • Added log retention in OpenSearch, previously handled by Elasticsearch Curator, now using Index State Management (ISM) OpenSearch bundled plugin. LP#2047037.

  • Fixes an issue where Prometheus scraping of Etcd metrics would fail if Etcd TLS is enabled. LP#2036950


New Features

  • Added the capability to specify custom kernel modules for Neutron:

    • neutron_modules_default: lists the default modules.

    • neutron_modules_extra: for custom modules and parameters.

  • Added a Neutron check for ML2/OVS and ML2/OVN presence at the start of the deploy phase. It fails if neutron_plugin_agent is set to ovn and an ML2/OVS container is detected. If neutron_plugin_agent is set to openvswitch, the check fails when it detects an ML2/OVN container or any of the OVN-specific volumes.
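
  A hedged sketch of how an extra kernel module might be declared. Only the variable name neutron_modules_extra comes from this note; the name/params list format and the module shown are assumptions, so check the Neutron role defaults before relying on them.

    ```yaml
    # globals.yml
    # ASSUMPTION: each entry is a mapping with a module name and optional
    # parameters. The module and parameter below are purely illustrative.
    neutron_modules_extra:
      - name: "nf_conntrack_tftp"
        params: "hashsize=4096"
    ```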

Upgrade Notes

  • Default keystone user role has been changed from deprecated role _member_ to member role.

  • The ironic_tftp service no longer binds to all IP addresses on the system; by default it uses the IP address of the api_interface. To revert to the old behaviour, set ironic_tftp_interface_address in globals.yml.

  • Before upgrading to the Zed release of Kolla-Ansible on Ubuntu, ensure that Elasticsearch indexes created in version 6 or earlier are reindexed. OpenSearch 2.x does not support these older indexes. A precheck for this scenario has now been introduced.

  • Configure Nova libvirt.num_pcie_ports to 16 by default. Nova currently sets ‘num_pcie_ports’ to “0” (defaults to libvirt’s “1”), which is not sufficient for hotplug use with ‘q35’ machine type.

  • Changes default value of nova libvirt driver setting skip_cpu_compare_on_dest to true. With the libvirt driver, during live migration, skip comparing guest CPU with the destination host. When using QEMU >= 2.9 and libvirt >= 4.4.0, libvirt will do the correct thing with respect to checking CPU compatibility on the destination host during live migration.

Security Issues

  • Restricts access, by default, to the /server-status page that OpenStack HTTP services expose through HAProxy on the public endpoint. Fixes the issue for Ubuntu/Debian installations; RockyLinux/CentOS installations are not affected. LP#1996913

Bug Fixes

  • Fixes issues with OVN NB/SB DB deployment, where first node needs to be rebootstrapped. LP#1875223

  • enable_keystone_federation and keystone_enable_federation_openid are now explicitly handled as booleans in the templates of the keystone role; previously they were not. LP#2036390

  • Fixes an issue where Kolla set the producer tasks to None, which disabled all Designate producer tasks. LP#1879557

  • Fixes ironic_tftp which binds to all ip addresses on the system. Added ironic_tftp_interface, ironic_tftp_address_family and ironic_tftp_interface_address parameters to set the address for the ironic_tftp service. LP#2024664

  • Fixes the OpenSearch migration process by adding a precheck for Elasticsearch indexes whose version is too old for OpenSearch 2.x.

  • Fixes an issue where a Docker health check wasn’t configured for the OpenSearch Dashboards container. See bug 2028362.

  • Fixes an issue where ‘q35’ libvirt machine type VM could not hotplug more than one PCIe device at a time.

  • Fixes an issue where the keepalived track script fails in a single-controller environment and the keepalived VIP goes into BACKUP state. The keepalived_track_script_enabled variable has been introduced (default: true), which can be used to disable track scripts in the keepalived configuration. LP#2025219

  • Fixes an issue where an OVS-DPDK task had a different name from the one used to notify it.

  • When upgrading Nova to a new release, we use the tool nova-status upgrade check to make sure that there are no nova-compute that are older than N-1 releases. This was performed using the current nova-api container, so computes which will be too old after the upgrade were not caught. Now the upgraded nova-api container image is used, so older computes are identified correctly. LP#1957080
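
  For the single-controller keepalived case above, the workaround could be expressed in globals.yml as in this minimal sketch, which uses only the variable named in the note:

    ```yaml
    # globals.yml
    # Disable keepalived track scripts (the default is true).
    keepalived_track_script_enabled: false
    ```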


New Features

  • Since the fix for CVE-2022-29404, the default value for the LimitRequestBody directive in the Apache HTTP Server has been changed from 0 (unlimited) to 1073741824 (1 GiB). This limits the size of uploads in Horizon, for example images. The limit can now be configured via horizon_httpd_limitrequestbody. LP#2012588

  • etcd is now exposed internally via HAProxy on etcd_client_port.

  • Added two new flags to alter behaviour in RabbitMQ:

    • rabbitmq_message_ttl_ms, which lets you set a TTL on messages.

    • rabbitmq_queue_expiry_ms, which lets you set an expiry time on queues.

    See https://www.rabbitmq.com/ttl.html for more information on both.

  • The config option rabbitmq_ha_replica_count has been added to allow changing the replication factor of mirrored queues in RabbitMQ. While the flag is unset, the queues are mirrored across all nodes using “ha-mode”:“all”. Note that this only has an effect if the flag om_enable_rabbitmq_high_availability is set to True, as otherwise queues are not mirrored.

  • The config option rabbitmq_ha_promote_on_shutdown has been added, which allows changing the RabbitMQ definition ha-promote-on-shutdown. By default ha-promote-on-shutdown is “when-synced”. We recommend changing this to be “always”. This basically means we don’t mind losing some messages, instead we give priority to rabbitmq availability. This is most relevant when restarting rabbitmq, such as when upgrading. Note that setting the value of this flag, even to the default value of “when-synced”, will cause RabbitMQ to be restarted on the next deploy. For more details please see: https://www.rabbitmq.com/ha.html#cluster-shutdown

  • Services using etcd3gw via tooz now use etcd via haproxy. This removes a single point of failure, where we hardcoded the first etcd host for backend_url.
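
  The RabbitMQ and Horizon options introduced above might be combined in globals.yml roughly as follows. All variable names come from these notes; every value is an illustrative assumption, not a recommendation.

    ```yaml
    # globals.yml
    # Raise the Horizon upload limit from the 1 GiB default to 2 GiB.
    horizon_httpd_limitrequestbody: 2147483648

    # Expire messages after 10 minutes and unused queues after 1 hour.
    rabbitmq_message_ttl_ms: 600000
    rabbitmq_queue_expiry_ms: 3600000

    # Mirroring options only take effect when high availability is enabled.
    om_enable_rabbitmq_high_availability: true
    rabbitmq_ha_replica_count: 2

    # Setting this flag, even to the default "when-synced", causes RabbitMQ
    # to be restarted on the next deploy.
    rabbitmq_ha_promote_on_shutdown: "always"
    ```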

Upgrade Notes

  • Default tags of neutron_tls_proxy and glance_tls_proxy have been changed to haproxy_tag, as both services are using haproxy container image. Any custom tag overrides for those services should be altered before upgrade.

Security Issues

  • The kolla-genpwd, kolla-mergepwd, kolla-readpwd and kolla-writepwd commands now create or update passwords.yml with correct permissions. They also display a warning message about incorrect permissions.

Bug Fixes

  • The precheck for RabbitMQ failed incorrectly when kolla_externally_managed_cert was set to true. LP#1999081

  • Fixes removal of Elasticsearch and Kibana loadbalancer configs during migration to OpenSearch, when those services are running on a dedicated monitoring node.

  • Fixes the SASL account being created before the config file is ready. LP#2015589

  • Set correct permissions for opensearch-dashboard data location LP#2020152 https://bugs.launchpad.net/kolla-ansible/+bug/2020152

  • Configuration of service user tokens for all Nova and Cinder services is now done automatically, to ensure security of block-storage volume data.

    See LP#2004555 for more details.

  • Fixes deployment when using Ansible check mode. LP#2002661

  • Fixes the incorrect endpoint URLs and service type information for the Cyborg service in Keystone. LP#2020080

  • Sets the etcd internal hostname and cacert for deployments with internal TLS enabled. This allows services to work with etcd when coordination is enabled for internal TLS deployments. Without this fix, the coordination backend fails to connect to etcd and the service itself crashes.

  • Fixes the OpenSearch migration process, including the case where Elasticsearch data is located in a regular folder instead of a Docker volume. The migration now also checks whether there is data to migrate.

  • When upgrading or deploying RabbitMQ, the policy ha-all is cleared if om_enable_rabbitmq_high_availability is set to false.


New Features

  • Adds the flag om_enable_rabbitmq_high_availability. Setting this to true will enable both durable queues and classic mirrored queues in RabbitMQ. Note that classic queue mirroring and transient (aka non-durable) queues are deprecated and subject to removal in RabbitMQ version 4.0 (date of release unknown). Changes the pattern used in classic mirroring to exclude some queue types. This pattern is ^(?!(amq\\.)|(.*_fanout_)|(reply_)).*.

  • Adds the ovn-monitor-all variable, a boolean value that tells whether ovn-controller should unconditionally monitor all records in OVS databases. Setting ovn-monitor-all to ‘true’ removes some CPU load from the OVN SouthBound DB but results in more updates coming to ovn-controller. This might be helpful in large deployments with many compute hosts.

Bug Fixes

  • Fixes the kolla_docker module, which did not take the common_options parameter into account, so the module’s default values were always used. LP#2003079

  • The value of [oslo_messaging_rabbit] heartbeat_in_pthread is explicitly set to either true for wsgi applications, or false otherwise.

  • Fix issue with octavia config generation when using octavia_auto_configure and the genconfig command. Note that access to the OpenStack API is necessary for Octavia auto configuration to work, even when generating config. See LP#1987299 for more details.

  • Fixes OVN deployment order - as recommended in OVN docs. LP#1979329

  • Fixes an issue where some prechecks would fail or not run when running in check mode. LP#2002657

  • Prevent haproxy-config role from attempting to configure firewalld during a kolla-ansible genconfig. LP#2002522


New Features

  • Adds a set of variables to control the cinder backend name, as used in cinder.conf. This is the name you use when setting the volume_backend_name property on volume types. Details are in the cinder guide section of the documentation.

  • Enables configuring firewalld for external API services. Extracts the required services and checks the external port, then adds the ports to a firewalld zone. Assumes that firewalld has been installed and configured beforehand. The variable disable_firewall is disabled by default to preserve backwards compatibility, but it is good practice to have the system firewall configured.

  • Adds support for deploying OpenSearch and OpenSearch dashboards. These services directly replace ElasticSearch and Kibana which are now end-of-life. Support for sending logs to a remote ElasticSearch (or OpenSearch) cluster is maintained.

  • Allow cinder-volume to be configured to use Pure Storage FlashArray with either the iSCSI or FC driver.

  • Adds the possibility of including custom alert notification templates with Prometheus Alertmanager.

  • Adds a new, disabled by default, option for Prometheus OpenStack exporter, named “enable_prometheus_openstack_exporter_external”. This option allows exposing OpenStack exporter through HAProxy, and may be used to expose OpenStack metrics to an existing Prometheus server outside the OpenStack cloud, instead of using the default one provided by OpenStack.

  • Adds a new flag, openvswitch_ovs_vsctl_wrapper_enabled which will install a wrapper script to /usr/bin/ovs-vsctl to docker exec into the openvswitchd container.

  • Adds the prometheus_scrape_interval configuration option. The default is set to 60s. This configures the default scrape interval for all jobs.

  • Adds the bifrost_deploy_verbosity parameter. It allows changing the verbosity of the Bifrost bootstrap task. The default value is -vvvv.

  • Adds support for configuring the CloudKitty fetcher using cloudkitty_fetcher_backend.

  • New switches added to control deployment of the Masakari monitors. The deployment of each type of monitors can be controlled individually via enable_masakari_instancemonitor and enable_masakari_hostmonitor. By default, both are set to true when the deployment of the Masakari is enabled via enable_masakari.

  • Sanity checks have been removed. These “smoke tests” were originally implemented for barbican, cinder, glance and keystone.

  • Kolla Ansible now supports failing execution early if fact collection fails on any of the hosts. This is to avoid late failures due to missing facts (especially cross-host). This is possible by setting kolla_ansible_setup_any_errors_fatal: true. Do note this still supports host fact caching and it will not affect scenarios with all facts cached (as there is no task to fail).

  • Adds a new variable ceilometer_prometheus_pushgateway_options.

    It is a dictionary whose keys and respective values are added to the pushgateway’s URL, checking that no “None” value is being set.

    For example, the following configurations:

    ceilometer_prometheus_pushgateway_host: ""
    ceilometer_prometheus_pushgateway_port: "9091"
    ceilometer_prometheus_pushgateway_options:
        timeout: 180
        verify_ssl: yes

    Result in the following URL: prometheus:// \ metrics/job/openstack-telemetry/?timeout=180&verify_ssl=True


  • Adds support for managing resource providers via config files.

  • Adds support for setting up arbitrary HAProxy services in active/passive mode.

  • Implements container healthchecks for mariadb-server service. See blueprint

  • Adds support for configuring a coordination backend for Ironic Inspector via the ironic_coordination_backend variable. Possible values are redis or etcd.

  • Adds support for multiple DHCP ranges in the Ironic Inspector DHCP server.

  • Adds ironic_http_interface/ironic_http_interface_address parameters to set the addresses for the ironic_http service.

  • Adds support for enabling both PXE and iPXE in Ironic at the same time.

  • Adds variables to configure whether monitoring services should be exposed externally:

    • enable_grafana_external

    • enable_kibana_external

    • enable_prometheus_alertmanager_external

  • Adds support for configuring a number of UDP workers for Designate’s bind9 backend via the designate_backend_bind9_workers variable.

  • Adds support for configuring the OpenStack Compute API microversion used by the OpenStack exporter for Prometheus using the prometheus_openstack_exporter_compute_api_version variable. The default value is latest, matching the default behaviour of the exporter.

  • Adds the ovn-openflow-probe-interval variable. It sets the inactivity probe interval of the OpenFlow connection to the Open vSwitch integration bridge, in seconds. If the value is zero, it disables the connection keepalive feature. The default value is 60 seconds.

  • Adds support for deploying prometheus-msteams, which can be used to forward Prometheus Alertmanager notifications to Microsoft Teams. It is enabled by setting enable_prometheus_msteams to true.

  • Adds the ability to configure ProxySQL’s max replication lag via the proxysql_backend_max_replication_lag variable, which defaults to the value given in the documentation. If it is greater than 0, ProxySQL will regularly monitor replication lag and, if it goes beyond the configured threshold, temporarily shun the host until replication catches up. Please see the official upgrade notes for more detail.
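
  Several of the switches introduced in this list are plain globals.yml variables. The fragment below is a sketch that uses only names taken from these notes; the values are illustrative assumptions.

    ```yaml
    # globals.yml
    # Scrape every 30 seconds instead of the 60s default.
    prometheus_scrape_interval: "30s"

    # Reduce Bifrost bootstrap verbosity from the -vvvv default.
    bifrost_deploy_verbosity: "-vv"

    # Deploy only the Masakari instance monitor (both default to true
    # when enable_masakari is set).
    enable_masakari_instancemonitor: true
    enable_masakari_hostmonitor: false

    # Fail early if fact collection fails on any host.
    kolla_ansible_setup_any_errors_fatal: true

    # Coordination backend for Ironic Inspector: redis or etcd.
    ironic_coordination_backend: "etcd"

    # Forward Alertmanager notifications to Microsoft Teams.
    enable_prometheus_msteams: true
    ```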

Upgrade Notes

  • If you are currently deploying ElasticSearch with Kolla Ansible, you should back up the data before starting the upgrade. The contents of the ElasticSearch data volume will be automatically moved to the OpenSearch volume. The ElasticSearch, ElasticSearch Curator and Kibana containers will be removed automatically. The inventory must be updated so that the elasticsearch group is renamed to opensearch, and the kibana group is renamed to opensearch-dashboards.

  • Enable TLS by default in Bifrost. Bifrost is now configured to enable TLS for the services it deploys, and generate self-signed certificates for them. TLS may be disabled by setting enable_tls to false in /etc/kolla/config/bifrost/bifrost.yml.

  • image_upload_use_cinder_backend = True is no longer set on the Cinder’s default Ceph RBD backend, the common upstream default is now used (False currently). See also LP#1991516

  • Kolla Ansible no longer sets show_multiple_locations = True by default when Glance’s Ceph RBD backend is enabled. This was applied as a fix but operators must note that this, in turn, disables the Cinder’s and Nova’s optimisations. On the other hand, these optimisations might have been causing other operators’ trouble. Please see the linked bug report. Operators relying on this feature can set the flag themselves using service config overrides. LP#1992153

  • Modifies the default value of enable_hacluster from no to yes if masakari-hostmonitor is enabled. LP#1934149

  • Sanity checks have been removed because they were broken.

  • The Nova legacy service and its endpoints are no longer advertised by default. To revert to the old behaviour, please set nova_enable_nova_legacy_service: true in globals.yml.

  • The variable keystone_token_provider does not exist anymore, because there is no alternative.

  • OpenStack Monasca is no longer supported by Kolla Ansible. Support for deploying kafka, storm and zookeeper has been dropped since they were used only with Monasca. Post-upgrade cleanup of those services can be done using kolla-ansible monasca_cleanup. For details, please see the Monasca guide.

  • Modifies the default lease time of the Ironic Inspector DHCP server to 10 minutes. This is small enough to use small pools of IP addresses for inspection but gives more room for the inspection to succeed. This default can be changed globally via ironic_dnsmasq_dhcp_default_lease_time variable or per range via lease_time parameter.

  • Replaced ironic_dnsmasq_dhcp_range and ironic_dnsmasq_default_gateway with ironic_dnsmasq_dhcp_ranges. For example, if you have:

    ironic_dnsmasq_dhcp_range: ",,"
    ironic_dnsmasq_default_gateway: ""

    replace it with:

    ironic_dnsmasq_dhcp_ranges:
      - range: ",,"
        routers: ""

  • Ironic volumes related to PXE (TFTP) and iPXE & direct deploy (HTTP) are refactored to share a common parent path at /var/lib/ironic. This is done to support both PXE and iPXE at the same time. Operators doing advanced customisations might need to review the relevant defaults section.

  • Upgrades of Ironic will now wait for nodes in wait states to change their state. This is to improve the user experience by avoiding breaking processes being waited on. This can be disabled by setting ironic_upgrade_skip_wait_check to yes.

  • Ironic containers related to PXE (TFTP) and iPXE & direct deploy (HTTP) are renamed to better reflect their role: ironic_pxe is now ironic_tftp, while ironic_ipxe is now ironic_http. Operators doing advanced customisations might need to review the relevant defaults section. Additionally, their respective host groups have changed analogously: ironic-pxe is now ironic-tftp, and ironic-ipxe is now ironic-http.

  • The Keystone’s admin endpoint is no longer created by default. Operators of existing deployments may wish to remove it after the upgrade completes. Operators having external services relying on the availability of the Keystone’s admin endpoint may set keystone_create_admin_endpoint to true to keep creating the admin endpoint but such support will be removed after Zed.

  • Keystone’s admin interface no longer points to a separate port. On upgrade, the port is preserved to maintain intermediate compatibility. Users are advised to run the deploy and post-deploy commands afterwards to ensure the port is cleaned up. For more information, please refer to the docs. Please note that the relevant variables keystone_admin_port, keystone_admin_url and admin_protocol are no longer used and are deprecated for removal after Zed. Please cease their usage in your customisations.

  • Starting with Zed, Neutron marked the linuxbridge ML2 driver experimental. The Kolla team has decided to honour the upstream’s decision and make sure users are aware they are using a badly supported driver instead of having it configured out of the box. Thus, all users of this driver are advised to get acquainted with Neutron docs and proceed accordingly.

  • ovn role has been split into ovn-controller and ovn-db roles, therefore users that have ovn_extra_volumes configured need to adapt their config to use ovn_db_extra_volumes or ovn_controller_extra_volumes.

  • For ovn the default value of openflow-probe-interval was changed to 60 seconds. Use the ovn-openflow-probe-interval variable to override.

  • Prometheus has been switched to active/passive mode. This is enabled by default but can be turned off by setting prometheus_active_passive to no. See bug 1928193.

  • Prometheus Alertmanager has been switched to active/passive mode. This is enabled by default but can be turned off by setting prometheus_alertmanager_active_passive to no.

  • The deprecated enable_ironic_ipxe variable has been removed. The iPXE still works by default and it can be disabled by setting the more-aptly-named ironic_dnsmasq_serve_ipxe to false.

  • The deprecated storage_interface variable has been removed. Please set the swift_storage_interface directly.

  • Deprecated sysctl knobs related to ip_forward and rp_filter were removed.

  • Influxdb variable infuxdb_internal_endpoint has been fixed to influxdb_internal_endpoint. Operators might need to review the relevant variable.
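
  For operators who want to keep some of the previous behaviours during the upgrade, the opt-outs named in the notes above could be collected in globals.yml as in this sketch. The names are taken from the notes; whether each one is appropriate depends on the deployment.

    ```yaml
    # globals.yml
    # Keep advertising the Nova legacy service and its endpoints.
    nova_enable_nova_legacy_service: true

    # Keep creating the Keystone admin endpoint (support ends after Zed).
    keystone_create_admin_endpoint: true

    # Do not wait for Ironic nodes in wait states during upgrade.
    ironic_upgrade_skip_wait_check: yes

    # Turn off active/passive mode for Prometheus and Alertmanager.
    prometheus_active_passive: no
    prometheus_alertmanager_active_passive: no

    # Disable iPXE (replaces the removed enable_ironic_ipxe variable).
    ironic_dnsmasq_serve_ipxe: false
    ```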

Deprecation Notes

  • enable_ironic_ipxe is deprecated in favour of ironic_dnsmasq_serve_ipxe which reflects the effect better. enable_ironic_ipxe will be removed in Zed.

  • enable_ironic_pxe_uefi is deprecated and will be removed in Zed. This variable is not documented and results in a broken PXE setup for Ironic Inspector. The recommended way to support EFI/UEFI deployments in Ironic Inspector is to stay with the recommended default of iPXE in Ironic Inspector (see docs on ironic_dnsmasq_serve_ipxe).

  • In the April 2022 PTG the deprecation and removal of the sanity checks has been confirmed. Therefore the usage of

    kolla-ansible check

    is not possible any more.

  • Variables keystone_admin_port, keystone_admin_url and admin_protocol are deprecated for removal after Zed.

Security Issues

  • Kolla Ansible used to run Ironic’s tftpd as an (unprivileged) root user. Now, it will explicitly use the nobody user.

Bug Fixes

  • The scrape interval for the Prometheus data source in Grafana is now set to prometheus_scrape_interval. This fixes issues with dashboards that use the $__rate_interval Grafana variable, as the default scrape interval of 60s does not match the Grafana default of 15s.

  • Fixes an issue in the bifrost_deploy container where passwords generated by Bifrost were not persistent beyond the lifetime of the container. This is generally not a problem unless you access the Ironic or Inspector APIs outside of the Bifrost playbooks. LP#1983356

  • Fixes the issue of exponential growth of /run/openvswitch mounts when kolla-toolbox container is restarted. LP#1979295

  • Fixes LP#1982777. Set multipathd user_friendly_names to “no” to make os-brick able to resize volumes online. Adds ability to override multipathd config.

  • Fixed bug #1987982. This bug caused the database log_bin_trust_function_creators variable not to be set back to “OFF” after a keystone upgrade.

  • image_upload_use_cinder_backend = True is no longer set on the Cinder’s default Ceph RBD backend. Related ERRORs and WARNINGs in Cinder and Glance logs are prevented. LP#1991516

  • Kolla Ansible no longer sets show_multiple_locations = True by default when Glance’s Ceph RBD backend is enabled. This caused various issues with the services running with the recommended Ceph permissions. LP#1992153

  • Fixes missing logrotate configuration for proxysql logs. LP#1995248

  • Fixes an issue when masakari-hostmonitor is enabled while corosync/pacemaker is not deployed. LP#1934149

  • Fixes an issue with recovering multi-node MariaDB Galera cluster.

  • Adds configuration necessary for application credential access rules to properly function. LP#1965111

  • Fixes an issue with AlertManager external Web URL being unconfigurable. A new variable prometheus_alertmanager_external_url has been introduced that users can use to set web.external-url to public.

  • Fixes an issue where Ironic Inspector could be configured without authentication in a multi-region environment in a region without a local Keystone service.

  • Fixes Keystone OIDC failing to validate JWT because of missing key on Azure auth-oidc endpoint. Adds new variable containing JWKS uri that delivers missing keys. LP#1990375

  • Fixes missing [taskflow] section in masakari.conf.j2 LP#1966536

  • Fixes Zun capsules losing network namespaces after restarting the zun_cni_daemon container.

  • Under circumstances of extended disruption to the Fluentd-ElasticSearch central logging pipeline, it is possible to generate a sufficient buffer of unsent log data that takes longer than the default Fluentd request timeout (default 5 seconds) to transfer the buffer. The default request timeout value is raised to 60s, and made configurable using new parameter fluentd_elasticsearch_request_timeout. LP#1983031

  • Increases prometheus_openstack_exporter_timeout to 45 seconds to reduce the odds of scrape failures on deployments with large number of OpenStack resources. LP#1976629

  • Fixes Ironic API healthchecks when backend TLS encryption is enabled. LP#1990819

  • Removes the dhcp-sequential-ip configuration option from ironic_dnsmasq to avoid a race condition offering the same IP address to multiple hosts being inspected at the same time.

  • Fixes an issue with ironic-inspector using the wrong option to configure the interface used to communicate with the Ironic API. LP#1995246

  • Fixes an issue with ironic-neutron-agent using the wrong option to configure the interface used to communicate with the Ironic API. LP#1990675

  • If ironic_enabled_notification_topics is set to true, ironic_notification_level is set to info in order to ensure that Ironic actually sends out notifications.

    See bug 1969826 for details.

  • Fixes monitor: kolla being added to external_labels by default. The default Prometheus config should not include environment-specific details. external_labels is now optional, and any <labelname>: <labelvalue> pair can be added to it. LP#1944699

  • Fixes an issue with Masakari instance monitor when libvirt SASL is enabled. libvirt SASL was enabled by default in a recent change to Kolla Ansible. LP#1965754

  • Fixes an issue where a failure of any Nova compute service to register itself would cause only the host querying the nova API to fail. Now, only hosts that fail to register will fail the Kolla Ansible run. Alternatively, to fail all hosts in a cell when any compute service fails to register, set nova_compute_registration_fatal to true. LP#1940119

  • The Prometheus OpenStack exporters are now behind HAProxy, providing a unique time series in the Prometheus database. This also ensures that only one exporter queries the OpenStack APIs at any given time interval. With the previous behaviour each OpenStack exporter was scraped at the same time. This caused each exporter to query the OpenStack APIs simultaneously, introducing unnecessary load and duplicate time series in the Prometheus database due to the instance label being unique for each exporter. LP#1972818

  • Fixes an issue with misaligned data points in grafana when loadbalancing over multiple prometheus server instances. See bug 1928193.

  • Fixes an issue with Alertmanager silence creation leading to a 404 page. LP#1987866
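
  Two of the fixes above introduce tunables. A globals.yml fragment using them might look like this; the variable names come from the notes, and the timeout value (written in the same "60s" style the note uses) is only an example.

    ```yaml
    # globals.yml
    # Allow more time to flush a large Fluentd buffer (default raised to 60s).
    fluentd_elasticsearch_request_timeout: "120s"

    # Fail all hosts in a cell when any compute service fails to register.
    nova_compute_registration_fatal: true
    ```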

Other Notes

  • Sets the balancing algorithm to round-robin for Horizon if memcached is enabled. LP#1990523

  • tools/ovs-dpdkctl.sh moved to ansible/roles/ovs-dpdk/files/ovs-dpdkctl.sh

  • Rocky Linux 9 based images are now recommended (instead of CentOS Stream ones).

New Features

  • Adds support for the VMware NSX Policy plugin

  • Adds support for openEuler 20.03-LTS-SP2 as a host OS distribution.

  • Deploys and configures a prometheus-libvirt-exporter image as part of the Prometheus monitoring stack.

  • Adds Venus deployment support. The project provides a solution for log collection.

  • Adds support for the VMware FCD as Cinder volumes.

  • Adds a tls_connect module to the Prometheus blackbox exporter. This can be used to test connectivity of TLS servers.

  • Adds the ability to use Prometheus as the metrics database for Ceilometer.

    Adapts Ceilometer configurations so metrics can be pushed to a Prometheus Pushgateway. LP#1964135

  • Adds new variables to be used by the common role, cron_logrotate_log_minsize and cron_logrotate_log_maxsize. They allow configuring the global logrotate minsize and maxsize options.

  • Allows disabling the Designate Sink service (and notifications to/from it) by setting designate_enable_notifications_sink to no.

  • Introduces nova_enable_external_metadata, which defaults to no, to control whether the external-facing metadata HAProxy frontend is configured.

  • With this release, kolla-ansible no longer creates admin endpoints for any service other than Keystone. Make sure that you only reference public or internal endpoints in your applications and configurations.

  • Allows the use of variables in ceph configuration and keyring files. This includes but is not limited to ansible lookup expressions. LP#1959565

  • Implements the HAProxy admin socket. Operators can set the flag haproxy_socket_level_admin (default: “no”), which adds level admin to the socket created at /var/lib/kolla/haproxy/haproxy.sock inside the HAProxy container. This allows operators to interact with HAProxy, including but not limited to disabling backend servers for controlled maintenance operations. See bug 1960215.

  • Horizon deployment now supports custom themes.

  • Implements container healthchecks for ironic-neutron-agent service. See blueprint

  • Implements container healthchecks for neutron-bgp-dragent service. See blueprint

  • Implements container healthchecks for solum services. See blueprint

  • Implements container healthchecks for storm services. See blueprint

  • Implements container healthchecks for zookeeper services. See blueprint

  • Adds support for running a libvirt daemon on the host, rather than in a container. This is done by setting enable_nova_libvirt_container to false. Currently this is only supported for fresh deployments without an existing nova_libvirt container.

  • Adds support for libvirt SASL authentication. It is enabled by default. LP#1964013

  • Adds support to the kolla-ansible certificates command for generating certificates for libvirt TLS, when libvirt_tls is true. The same certificate and key are used for the libvirt client and server.

    The certificates use the same root CA as the other generated certificates, and are written to {{ node_custom_config }}/nova/nova-libvirt/, ready to be picked up by nova-libvirt and nova-compute.

  • Adds a new variable to be used by the common role, cron_logrotate_schedule. This allows configuring how often cron runs logrotate.

  • Adds an SSH key for Neutron server which can be used for passwordless public key authentication in external systems (e.g. for networking-generic-switch managed switches).

  • Adds a kolla-ansible nova-libvirt-cleanup command, which may be used to clean up the nova_libvirt container. This may be useful if switching to a host libvirt daemon.

  • Keystone OIDC integration now uses memcached for the caching backend if enable_memcached is True. This can be disabled by setting keystone_oidc_enable_memcached to False.

  • Adds functionality to enable hardware offload in Open vSwitch using the openvswitch_hw_offload variable.

  • Adds variables to define extra command-line parameters to be passed to Prometheus exporters:

    • prometheus_blackbox_exporter_cmdline_extras

    • prometheus_elasticsearch_exporter_cmdline_extras

    • prometheus_haproxy_exporter_cmdline_extras

    • prometheus_memcached_exporter_cmdline_extras

    • prometheus_mysqld_exporter_cmdline_extras

    • prometheus_node_exporter_cmdline_extras

    • prometheus_openstack_exporter_cmdline_extras

  • Adds the enable_prometheus_etcd_integration configuration parameter, which can be used to configure Prometheus to scrape etcd metrics endpoints. The default value of enable_prometheus_etcd_integration is set to the combined values of enable_prometheus and enable_etcd.

  • Adds “manila_cephfs_filesystem_name” variable to support multi-fs Ceph Pacific+ deployments.

  • Adds support for Rocky Linux 8 as Host OS.

  • Adds support for configuring a Vendordata file for Nova. This allows users to pass through arbitrary data to instances.
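
    Several of the features above are toggled through globals.yml. The fragment below is a sketch of how a few of them might be set together. It uses only variable names and values mentioned in the notes above, and the combination shown is an assumption for illustration rather than a recommended configuration.

    ```yaml
    # globals.yml (fragment; values mirror the notes above)

    # Disable the Designate Sink service and notifications to/from it.
    designate_enable_notifications_sink: "no"

    # Add "level admin" to the HAProxy socket (default: "no").
    haproxy_socket_level_admin: "yes"

    # Run the libvirt daemon on the host rather than in a container.
    # Only supported for fresh deployments without a nova_libvirt container.
    enable_nova_libvirt_container: false

    # Opt out of memcached as the caching backend for Keystone OIDC.
    keystone_oidc_enable_memcached: false
    ```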

Known Issues

  • Existing fluentd log rotation failed to delete old haproxy, swift, glance-tls-proxy and neutron-tls-proxy logs. These will not be deleted by the new logrotate config and will have to be removed manually.

Upgrade Notes

  • Minimum supported Ansible version is now 4 (ansible-core 2.11) and maximum supported is 5 (ansible-core 2.12).

  • Restores upstream default value for max_allowed_request_size_in_bytes in barbican.conf. It was set to 1000000 bytes instead of the upstream default of 25000 bytes.

  • RabbitMQ’s Prometheus plugin is no longer enabled by default if Prometheus is not deployed. If an external Prometheus is used, you need to turn on rabbitmq_enable_prometheus_plugin to restore the old behaviour.

  • External Nova metadata service is now disabled by default. It can be enabled by setting nova_enable_external_metadata to yes.

  • With this release, kolla-ansible no longer creates admin endpoints for any service other than Keystone. Existing endpoints will not be removed automatically. If you want to clean up your existing cloud, you can use a command like:

    openstack endpoint list --interface admin -f value | \
    awk '!/keystone/ {print $1}' | xargs openstack endpoint delete

  • The enable_host_ntp variable has been dropped per the deprecation process.

  • Support for deploying vmtp has been dropped. The vmtp project is no longer buildable, is outside of the OpenStack namespace and appears to be abandoned. See the mailing list notice.

  • fluentd_binary and fluentd_version variables are no longer in use as Kolla Ansible supports a single fluentd version across all supported Kolla image flavours.

  • Starting with Yoga, Ironic has changed the default PXE from plain PXE to iPXE. Kolla Ansible follows this upstream decision but allows users to revert to the previous default of plain PXE. For details, please refer to Kolla Ansible’s documentation.

  • The bootloader used to boot Ironic nodes in UEFI boot mode during inspection when iPXE is enabled has been changed from ipxe.efi to snponly.efi. This is in line with the default UEFI iPXE bootloader used in Ironic since the Xena release. The bootloader may be changed via ironic_dnsmasq_uefi_ipxe_boot_file.

  • ironic.conf now sets [pxe]\kernel_append_params instead of [pxe]\pxe_append_params which has been deprecated. Please override the new config option if you are overriding the old one.

  • The addition of libvirt SASL authentication requires a new password in passwords.yml, libvirt_sasl_password. This may be generated using the existing kolla-genpwd and kolla-mergepwd tooling.

  • The addition of libvirt SASL authentication requires both the nova_libvirt and nova_compute containers to be updated simultaneously, using new images with the necessary Cyrus SASL dependencies, as well as configuration containing the SASL credentials.

  • If both Designate and Neutron are enabled, Neutron now uses the subnet_dns_publish_fixed_ip extension instead of the simpler dns extension in order to support more features in the DNS integration. Override via the neutron_extension_drivers config option if this is not suitable for your deployment.

  • It is no longer possible to override the removal of the Monasca Log Metrics service and it will be removed automatically if it hasn’t already been removed in the Wallaby release. It is up to the operator to remove any associated docker volumes.

  • The policy for classic transient mirrored queues in RabbitMQ has been removed from the RabbitMQ configuration. The policy will be removed automatically during upgrade of the RabbitMQ service.

  • The wsrep-notify.sh script has been removed (following deprecation in Wallaby).

  • Updates the default value of node_custom_config to {{ node_config }}/config when the configuration directory is specified using --configdir.
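
    Two of the upgrade notes above describe defaults that changed and can be restored. A minimal globals.yml sketch for operators who depend on the previous behaviour, using only the variables named in those notes (whether you need either override depends on your deployment):

    ```yaml
    # globals.yml (fragment)

    # Keep the RabbitMQ Prometheus plugin enabled when Prometheus is not
    # deployed by Kolla Ansible, e.g. when an external Prometheus scrapes it.
    rabbitmq_enable_prometheus_plugin: true

    # Re-enable the external Nova metadata service (now disabled by default).
    nova_enable_external_metadata: "yes"
    ```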

Deprecation Notes

  • The storage_interface variable is deprecated and will be removed in the next release as it was causing confusion. The variable only sets the default for swift_storage_interface, which we now recommend setting directly instead.
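
    As a sketch of the recommended replacement, the Swift variable can be set directly in globals.yml. The interface name below is a hypothetical placeholder, not a default.

    ```yaml
    # globals.yml (fragment)
    # Set swift_storage_interface directly rather than relying on the
    # deprecated storage_interface. "eth1" is an illustrative interface name.
    swift_storage_interface: "eth1"
    ```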

Security Issues

  • Explicitly removes the net.ipv4.ip_forward sysctl from /etc/sysctl.conf on hosts with Neutron L3 Agent. In the absence of another source for this sysctl, it should revert to the default of 0 after the next reboot. This is a follow up to a previous change which stopped setting the sysctl, but leaves existing systems with the original value of 1 set.

    A deployer looking to change the value more aggressively may set neutron_l3_agent_host_ipv4_ip_forward to 0 using a Yoga release of Kolla Ansible. This option will be removed in the future. Any deployments still relying on the previous value may set neutron_l3_agent_host_ipv4_ip_forward to 1. LP#1945453

  • Fixes an issue where the default configuration of libvirt did not use authentication for the API exposed over TCP on the internal API network. This allowed anyone with access to the internal API network read-write access to libvirt. While the internal API network is typically trusted, other services on this network generally at least require authentication.

    SASL authentication is now enabled for libvirt by default. Kolla Ansible has supported libvirt TLS since the Train release, and this is recommended to provide a higher level of security. LP#1964013

  • Adds mitigation for the Apache Log4j2 Remote Code Execution (RCE) Vulnerability in Elasticsearch - CVE-2021-44228.
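
    For the net.ipv4.ip_forward note above, the override it describes might look like the following in globals.yml. This sketch assumes a Yoga release of Kolla Ansible, since the note states the option will be removed later.

    ```yaml
    # globals.yml (fragment)
    # Actively set net.ipv4.ip_forward to 0 on Neutron L3 agent hosts instead
    # of waiting for the next reboot. Deployments that still rely on the
    # previous value would use 1 here instead.
    neutron_l3_agent_host_ipv4_ip_forward: 0
    ```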

Bug Fixes

  • Fixes an issue with an OIDC authentication flow requiring unnecessary action from the user. Redirecting to the target IdP page now happens automatically. LP#930055

  • Removes custom value of max_allowed_secret_in_bytes in barbican.conf. The default maximum size in Barbican was doubled to avoid issues with some certificates. LP#1957795

  • Fixes deployment of Zun with Cinder Ceph support. Adds support for Zun to access Cinder volumes when external Ceph is configured for Cinder. LP#1848934

  • Fixes the deployment failure of outward_rabbitmq by customizing RabbitMQ’s prometheus.tcp.port to resolve port conflicts. LP#1885106

  • Use Volume V3 API in OpenStack exporter. Volume V2 API has been removed since OpenStack Wallaby. LP#1938194

  • Adds the node parameter when using the rabbitmq_user Ansible module. LP#1946506

  • Fixes an issue with multinode MariaDB deployments which could fail the playbook execution on WSREP check due to the new behaviour of Galera 4. LP#1947485.

  • Fixes an issue with single node MariaDB deployments with HAProxy disabled. See bug 1947534 for details.

  • Fixes the generation of wsrep_cluster_address in galera.cnf when --limit is used while deploying MariaDB nodes. LP#1947589

  • Fixes the copy job for the Grafana custom home dashboard file. The copy job needs to run privileged, otherwise a permission denied error occurs. LP#1947710

  • Fixes an error in the placement role which prevented deployment of the Placement service when a custom policy file is used. LP#1948835

  • Fixes missing current Ansible version in the error message. LP#1948979

  • Fixes the Octavia role not setting the amphora network’s gateway_ip. LP#1949260

  • Fixes Octavia’s “Connection refused” errors by adding ovn_sb_connection to octavia.conf. LP#195011

  • Ironic API and Ironic Inspector API use separate policy files. Ironic role was updated to be able to handle both policies separately. LP#1952948

  • Only runs the “configure ovn in ovsdb” task on ovn-controller hosts. The task would fail on hosts (such as controller nodes) without a tunnel interface. LP#1953367

  • Continues to run all actions if one action fails in Elasticsearch Curator. LP#1954720

  • Fixes missing logrotate configuration for Placement. LP#1954723

  • Fixes Nova resize failing when migration_interface is customised. LP#1956976

  • Fixes being unable to connect to the Zun console when kolla_enable_tls_external is true. Previously, access to the console of any Zun container failed in that case. This fix sets the protocol for the wsproxy base_url in zun.conf according to the value of kolla_enable_tls_external. LP#1957117

  • Fixes the Register Identity Providers in OpenStack task, which was missing an = in the openstack command, causing the task to fail to register an IdP with Keystone. LP#1959022

  • Fixes Glance with Cinder iSCSI backend failing due to lack of lock_path setting. LP#1959663

  • Fixes missing logrotate configuration for the openvswitch and prometheus services. LP#1961795

  • Fixes an issue with Ironic’s PXE components not getting updated on upgrade. LP#1963752

  • Adds Fluentd configurations to allow matching OpenvSwitch logs. LP#1965815

  • Fixes an issue where the Nova API logs were written to files ending with -wsgi.log which affected the processing of these logs in the Fluentd pipeline. LP#1950185

  • Fixes configuration of the Prometheus HTTP API URL when using the Prometheus collector in CloudKitty. LP#1961615

  • Fixes an issue with Prometheus scraping when targets’ Ansible inventory hostnames (inventory_hostname) do not resolve to reachable IP addresses. Reverts to the previous behaviour of using IP addresses to communicate with targets. The side effect of this is that target instances will again be labelled using IP addresses rather than hostnames. LP#1955563

  • Fixes the Apache WSGI configuration for the aodh service in Debuntu binary flavours. LP#1953059

  • Fixes the baremetal role to avoid the error “Unable to remove "libvirtd"”. Now the symlink /etc/apparmor.d/disable/usr.sbin.libvirtd is created by the role. LP#1960302

  • Existing fluentd log rotation failed to delete old haproxy, swift, glance-tls-proxy and neutron-tls-proxy logs. Standardise rotation and deletion of logs using logrotate.

  • Fixes an issue with setting up OIDC based Keystone federation against an IdP that has a different response type than id_token. This can now be set using a new variable keystone_federation_oidc_response_type. LP#1959781

  • Adds back the option to configure the RabbitMQ clustering interface via Kolla. LP#1900160

  • On slower nodes, the initial Grafana startup could experience a timeout failure when the migrations for setting up the database took longer than expected. This has been fixed by increasing the default timeout. The timeout settings can be changed via the new parameters grafana_start_first_node_delay and grafana_start_first_node_retries for the grafana role. LP#1769962

  • Fixes an issue seen when using Jinja2 3.1.0.

  • Fixes the configuration option setting the type of endpoint used by Neutron to send requests to Placement. LP#1960503

  • Fixes a configuration issue with Node Exporter causing all file system metrics of a host to be identical. LP#1961438

  • Fixes an issue where RabbitMQ was configured to mirror classic transient queues for all services. According to the RabbitMQ documentation this is not a supported configuration, and contributed to numerous bug reports. LP#1954925

  • Removes “fix_cephfs_owner.yaml”, which related to pre-Wallaby Manila’s use of subfolders. Since Wallaby, Manila uses CephFS volumes instead, so this file is no longer required. LP#1938285 LP#1935784

  • Removes use of “cephfs_enable_snapshots” in Manila config as this option was removed from Manila in the Wallaby release.

  • Fixes an issue with Cinder upgrade where Cinder services would remain pinned to the previous release’s RPC & object versions. LP#1954932
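
    Two of the fixes above introduce new tunables. The fragment below sketches how they might be overridden in globals.yml. The variable names come from the notes, while every value is an illustrative assumption and not a documented default.

    ```yaml
    # globals.yml (fragment; all values are illustrative assumptions)

    # Response type expected by the OIDC identity provider. "code" is one
    # standard OpenID Connect response type, chosen here only as an example.
    keystone_federation_oidc_response_type: "code"

    # Allow more time for the first Grafana node to finish its database
    # migrations on slow hosts. These numbers are placeholders.
    grafana_start_first_node_delay: 10
    grafana_start_first_node_retries: 12
    ```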

Other Notes

  • The ironic-dnsmasq container now creates dnsmasq.log, just as the neutron-dhcp-agent container does. For both log files, verbosity can be increased globally via openstack_logging_debug or per service via the ironic_logging_debug or neutron_logging_debug variables.
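
    A sketch of the two ways to raise dnsmasq.log verbosity described above, as a globals.yml fragment. The variable names are taken from the note, and the boolean values are assumed.

    ```yaml
    # globals.yml (fragment)

    # Option 1: increase verbosity globally for all services.
    openstack_logging_debug: true

    # Option 2: increase verbosity only for the services that write
    # dnsmasq.log (uncomment instead of the global setting above).
    # ironic_logging_debug: true
    # neutron_logging_debug: true
    ```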