Zed Series Release Notes

11.0.2

Upgrade Notes

  • A patch that fixes an issue making the VIP port unreachable because of missing IP rules requires an update of the Amphora image.

Bug Fixes

  • Fixed the ability to use the ‘text/plain’ mime type with the healthcheck endpoint.

  • Fixed an issue where deleting the last listener from a load balancer could trigger a failover.

  • Fixed an issue when using certificates with a blank subject or missing CN.

  • The validation for the allowed_cidr parameter only took into account the IP version of the primary VIP. CIDRs which only matched the version of an additional VIP were rejected. This is fixed and CIDRs are now matched against the IP version of all VIPs.

  • Fix amphora haproxy_count to return the number of haproxy processes that are running.

  • Fixed a bug in amphorav1 where the subnet of a member that was being deleted was not immediately unplugged from the amphora, but only during the next update of the members.

  • Fixed an issue where adding or deleting a member might have caused Octavia to reconfigure the management port of the amphora by adding or removing additional subnets. Octavia no longer updates the management port during those tasks.

  • Fixed a bug that could have made the VIP port unreachable because of the removal of some IP rules in the Amphora. It could have been triggered only when sending a request from a subnet that is not the VIP subnet but that is plugged as a member subnet.

  • Fixed an issue with load balancers stuck in a PENDING_* state during database outages. Now when a task fails in Octavia, it retries updating the provisioning_status of the load balancer until the database is back (or it gives up after a very long timeout, around 2h45).

  • Fixed a bug in octavia-status which reported an incorrect status for the amphorav2 driver when using the default amphora alias.

  • Modified the default Keepalived LVS persistence granularity configuration value so that it is IPv6 compatible.

  • Fixed a race condition in the members batch update API call. The data passed to the Octavia worker service may have been incorrect when successive API calls were sent quickly, leaving the load balancer stuck in the PENDING_UPDATE provisioning_status.

  • Fixed an excessively long timeout when attempting to start the VRRP service in an unreachable amphora during a failover. A specific, shorter timeout is now used during failovers.

  • Reduced the duration of the failovers of ACTIVE_STANDBY load balancers. Many updates of an unreachable amphora may have been attempted during a failover. Now, if an amphora is not reachable at the first update, the other updates are skipped.

  • Reduce the duration of the failovers of ACTIVE_STANDBY load balancers when both amphorae are unreachable.
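
The allowed_cidr validation fix in the list above can be illustrated with a minimal sketch. This is not Octavia's actual validation code: the function name cidr_matches_any_vip and the sample addresses are hypothetical, and only the Python standard library ipaddress module is used. It shows the idea of accepting a CIDR when its IP version matches the version of any VIP rather than only the primary one.

```python
import ipaddress


def cidr_matches_any_vip(cidr, vip_addresses):
    """Return True if the CIDR's IP version matches at least one VIP.

    Hypothetical helper for illustration only, not the Octavia API code.
    """
    cidr_version = ipaddress.ip_network(cidr, strict=False).version
    vip_versions = {ipaddress.ip_address(vip).version for vip in vip_addresses}
    return cidr_version in vip_versions


# Hypothetical load balancer: IPv4 primary VIP plus an IPv6 additional VIP.
vips = ["203.0.113.10", "2001:db8::10"]
print(cidr_matches_any_vip("2001:db8:1::/64", vips))
print(cidr_matches_any_vip("198.51.100.0/24", ["2001:db8::10"]))
```

Before the fix, the equivalent of the first call would have compared the IPv6 CIDR against the IPv4 primary VIP only and rejected it.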

Other Notes

  • Noop certificate manager was added. Now any Octavia certificate operations using noop drivers will be faster (as they won’t be validated).

11.0.1

Security Issues

  • Filter out private information from the taskflow logs when ‘INFO’ level messages are enabled and when jobboard is enabled. Logs might have included TLS certificates and private_key. By default, in Octavia only WARNING and above messages are enabled in taskflow and jobboard is disabled.

Bug Fixes

  • Added a filter to hide a bogus ComputeWaitTimeoutException exception when creating an amphora when jobboard is disabled. This exception is part of the flow when creating a load balancer or an amphora and should not be shown to the user.

  • The parameters of a taskflow Flow were logged in ‘INFO’ level messages by taskflow. They included TLS-enabled listener and pool parameters, such as certificates and private_key.

  • Fix an authentication error with Barbican when creating a TERMINATED_HTTPS listener with application credential tokens or trust IDs.

  • Fixed a potential race condition in the member batch update API call. The load balancers might not have been locked properly.

  • Fixed a “corrupted global server state file” error in CentOS 9 Stream when reloading the state of the servers after restarting haproxy. It also fixed the recovery of the operational state of the servers in haproxy after its restart.

  • Fixed a bug where the full graph of a load balancer was created without listeners if jobboard_enabled=False.

  • Fixed a bug that prevented Octavia from creating listeners with the fully-populated load balancer API in SINGLE topology mode.

  • Fixed a backwards compatibility issue with the feature that preserves HAProxy server states between reloads. HAProxy versions 1.5 and below do not support this feature, so Octavia will not activate it on amphorae with those versions.

  • Fixed a bug that didn’t set all the active load balancer Health Monitors ONLINE in populated LB single-create calls.

  • Fix a bug that prevented the operating_status of a health-monitor from being set to ONLINE when IPv6 addresses were enclosed within square brackets in controller_ip_port_list.

  • Fixed a potential error when plugging a member from a new network after deleting another member and unplugging its network. Octavia may have tried to plug the new network to a new interface but with an already existing name. This fix requires updating the Amphora image.

  • Fix an issue with PING health-monitors on CentOS 8 Stream. Changes in CentOS and systemd prevent an unprivileged user from sending ping requests from a network namespace.

  • Fixed a bug that didn’t set the correct provisioning_status for unattached pools when creating a fully-populated load balancer.

  • Fixed an SELinux issue with TCP-based health-monitors on UDP pools, where some specific monitoring ports were denied by SELinux. The Amphora image now enables the keepalived_connect_any SELinux boolean that allows connections to any ports.

  • When plugging a new member subnet, the amphora sends an IP advertisement of the newly allocated IP. It allows the servers on the same L2 network to flush the ARP entries of a previously allocated IP address.
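
The controller_ip_port_list fix in the list above concerns entries whose IPv6 address is wrapped in square brackets. The following is a minimal sketch of how such entries can be split, assuming each entry has the form host:port or [ipv6]:port. The helper name split_host_port is hypothetical and this is not Octavia's own parsing code.

```python
def split_host_port(entry):
    """Split 'host:port' or '[ipv6]:port' into a (host, port) tuple.

    Hypothetical helper for illustration only.
    """
    # The port always follows the last colon, even for IPv6 addresses.
    host, _, port = entry.rpartition(":")
    # Strip the optional square brackets around an IPv6 address.
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    return host, int(port)


# Sample entries using documentation address ranges.
for item in ["192.0.2.1:5555", "[2001:db8::1]:5555", "2001:db8::2:5555"]:
    print(split_host_port(item))
```

With this approach the bracketed and unbracketed forms of the same IPv6 address resolve to the same host string, so a comparison against the address reported by the amphora is not affected by the brackets.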

11.0.0

New Features

  • Configuration of the amphora’s timezone is now possible using the new configuration setting “amp_timezone” in the controller_worker options group.

  • Octavia now supports oslo.messaging notifications for loadbalancer create, delete, and update operations.

  • A new configuration option failover_threshold can be set to limit the number of amphorae simultaneously pending failover before halting the automatic failover process. This should help prevent unwanted mass failover events that can happen in cases like network interruption to an AZ or the database becoming read-only. This feature is not enabled by default, and it should be configured carefully based on the size of the environment. For example, with 100 amphorae a good threshold might be 20 or 30, or a value greater than the typical number of amphorae that would be expected on a single host.

  • It is now possible to create a loadbalancer with more than one VIP. There is a new structure additional_vips in the create body, which allows a subnet, and optionally an IP, to be specified. All VIP subnets must be part of the same network.
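
As an illustration of the multi-VIP feature in the list above, the sketch below builds a create request body containing additional_vips. Only the additional_vips name comes from the note. The builder function, the subnet_id and ip_address key names, the surrounding loadbalancer wrapper, and all identifiers are assumptions made for this example and should be checked against the Octavia API reference.

```python
import json


def build_multi_vip_body(name, vip_subnet_id, additional_vips):
    """Build a load balancer create body with additional VIPs.

    Hypothetical helper. Each item of additional_vips is a dict with a
    'subnet_id' (assumed key name) and an optional 'ip_address'
    (assumed key name).
    """
    entries = []
    for vip in additional_vips:
        entry = {"subnet_id": vip["subnet_id"]}
        # The IP is optional; omit the key when no address is requested.
        if vip.get("ip_address"):
            entry["ip_address"] = vip["ip_address"]
        entries.append(entry)
    return {
        "loadbalancer": {
            "name": name,
            "vip_subnet_id": vip_subnet_id,
            "additional_vips": entries,
        }
    }


# Placeholder subnet identifiers; all subnets must be on the same network.
body = build_multi_vip_body(
    "lb1",
    "subnet-ipv4",
    [{"subnet_id": "subnet-ipv6"},
     {"subnet_id": "subnet-ipv6-b", "ip_address": "2001:db8::5"}],
)
print(json.dumps(body, indent=2))
```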

Known Issues

  • When using a distribution with a recent SELinux release such as CentOS 8 Stream, PING health-monitor does not work as shell_exec_t calls are denied by SELinux.

  • Fixed configuration issue which allowed authenticated and authorized users to inject code into HAProxy configuration using API requests. Octavia API no longer accepts unencoded whitespace characters in url_path values in update requests for healthmonitors.

Upgrade Notes

  • A new option is provided in the oslo_messaging namespace to disable event_notifications.

  • The default for the output file has been changed in diskimage-create.sh. It is now amphora-x64-haproxy.qcow2 instead of amphora-x64-haproxy.

  • The fix that updates the Netfilter Conntrack Sysfs variables requires rebuilding the amphora image in order to be effective.

  • Update Python base version from 3.6 to 3.8. As per the OpenStack Python runtime versions policy, Python 3.8 will be the minimum Python version in the Zed release cycle.

  • To support multi-VIP loadbalancers, a new amphora image must be built. It is safe to upload the new image before the upgrade, as it is fully backwards compatible.

  • diskimage-create now defaults to distribution release 9 when selecting RHEL as base OS and to release 9-stream when selecting CentOS as base OS.
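
The event_notifications option mentioned in the list above can be illustrated with a small sketch that reads a configuration fragment using the standard library configparser module. Octavia itself uses oslo.config, so this is only an approximation. The sample file content and the assumption that notifications are enabled when the option is absent are hypothetical.

```python
import configparser

# Hypothetical octavia.conf fragment that disables event notifications.
SAMPLE_CONF = """
[oslo_messaging]
event_notifications = False
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)

# Assumption: notifications are treated as enabled when the option is unset.
notifications_enabled = parser.getboolean(
    "oslo_messaging", "event_notifications", fallback=True
)
print("event notifications enabled:", notifications_enabled)
```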

Deprecation Notes

  • The ‘amphorav1’ provider is deprecated and will be removed in a future release. Use the ‘amphora’ provider (an alias for ‘amphorav2’) instead.

Bug Fixes

  • To avoid hitting the Neutron API hard when a batch update creates many new members, the subnet validation results are now cached in the batch update members API call. Only new members are now validated during a batch update, since the subnet ID is immutable.

  • diskimage-create.sh used $AMP_OUTPUTFILENAME.$AMP_IMAGETYPE for constructing the image file path when checking the file size, which was not correct and caused a “No such file or directory” error.

  • Ensure that the provided rsyslog configuration file is used by rsyslog in the amphora by restarting the service when using the amphorav1 provider. This fixes the log offloading feature on distributions that start rsyslog before cloud-init.

  • Fix an issue that may have occurred when running amphorav2 with persistence, where the ComputeActiveWait task was incorrectly executed twice on different controllers.

  • Fix disabled UDP pools. Disabled UDP pools were marked as “OFFLINE” but the requests were still forwarded to the members of the pool.

  • Fix the shutdown of the driver-agent. The process might have been stuck while waiting for threads to finish. Systemd would have killed the process after a timeout, but some child processes might have leaked on the controllers.

  • Enable required SELinux booleans for CentOS or RHEL amphora image.

  • Fix a bug that prevented the provisioning_status of a health-monitor from being set to ERROR when an error occurred while creating, updating or deleting a health-monitor.

  • Fixed a listener creation failure when the protocol used is PROXY or PROXYV2, which are pool protocols and not listener protocols.

  • Fixed a failure when updating listener certificates. The fix ensures that an existing certificate gets overwritten properly.

  • Netfilter Conntrack Sysfs variables net.netfilter.nf_conntrack_max and nf_conntrack_expect_max get set to sensible values on the amphora now. Previously, kernel default values were used which were much too low for the configured net.netfilter.nf_conntrack_buckets value. As a result packets could get dropped because the conntrack table got filled too quickly. Note that this affects only UDP and SCTP protocol listeners. Connection tracking is disabled for TCP-based connections on the amphora including HTTP(S).

  • Now the [nova] service_name parameter is effectively used to find the nova endpoint in keystone catalog. The parameter had no effect before it was fixed.

  • Fix PING health-monitors with recent haproxy releases (>=2.2). haproxy now requires an additional “insecure-fork-wanted” option to authorize the Octavia PING healthcheck.

  • Fix a bug where adding a member on a subnet that belongs to a network with multiple subnets may have plugged an incorrect subnet in the amphora.

  • Fix a bug where, when deleting the last member plugged on a network, the port that was no longer used was not deleted.

  • Fix a bug where, when updating a load balancer with a QoS policy after a failover, Octavia attempted to update the VRRP ports of the deleted amphorae, moving the provisioning status of the load balancer to ERROR.

  • Fix a potential race condition when updating a resource in the amphorav2 worker. The worker was not waiting for the resource to be set to PENDING_UPDATE, so the resource may have been updated with old data from the database, resulting in a no-op update.

  • Fix the rescheduling of taskflow tasks that have been resumed after being interrupted.

  • Fixed issue with SELinux and the lvs-masquerade.sh script on the amphora. The script already runs with root permissions, so the use of sudo inside the script is unneeded.

  • Fix an issue when Octavia performs a failover of an ACTIVE-STANDBY load balancer that has both amphorae missing. Some tasks in the controller took too long to time out because the timeout value defined in [haproxy_amphora].active_connection_max_retries and [haproxy_amphora].active_connection_rety_interval was not used.

  • Fix a serialization issue when using TLSContainer with the amphorav2 driver with persistence. A list of bytes type in the data model was not correctly converted to serializable data.

  • Fixed “Could not retrieve certificate” error when updating/deleting the client_ca_tls_container_ref field of a listener after a CA/CRL was deleted.

  • Fix a python3 error that prevented the use of the [controller_worker]/user_data_config_drive option when building amphorae.

  • Fixed validations in L7 rule and session cookie APIs in order to prevent authenticated and authorized users from injecting code into HAProxy configuration. CR and LF (\r and \n) are no longer allowed in L7 rule keys and values. The session persistence cookie names must follow the rules described in https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie.

  • Fix load balancers stuck in PENDING_UPDATE issues for some API calls (POST /l7rule, PUT /pool) when a provider denied the call.

  • The Octavia API returned an unhelpful message when a constraint failed while creating an object in the DB. The error now contains the name and the value of the parameter that breaks the constraints.

  • Validate that the creation of L7 policies is compatible with the protocol of the listener in the Amphora driver. L7 policies are allowed for Terminated HTTPS or HTTP protocol listeners, but not for HTTPS, TCP or UDP protocol listeners.
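
The L7 policy compatibility check in the last item can be sketched as a simple protocol lookup. The function name, the exception type, and the constant below are hypothetical, and the set of protocols is limited to the ones named in the note. This is not the Amphora driver's actual code.

```python
# Listener protocols for which the note says L7 policies are allowed.
L7_CAPABLE_PROTOCOLS = frozenset({"HTTP", "TERMINATED_HTTPS"})


def validate_l7policy_listener(listener_protocol):
    """Raise ValueError if L7 policies are not allowed on this protocol.

    Hypothetical helper for illustration only.
    """
    if listener_protocol not in L7_CAPABLE_PROTOCOLS:
        raise ValueError(
            "L7 policies are not supported on %s listeners" % listener_protocol
        )


for protocol in ("HTTP", "TERMINATED_HTTPS", "HTTPS", "TCP", "UDP"):
    try:
        validate_l7policy_listener(protocol)
        print(protocol, "accepted")
    except ValueError as exc:
        print(protocol, "rejected:", exc)
```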

Other Notes

  • The netaddr python module has been removed as an Octavia requirement. It has been replaced with the python standard library ‘ipaddress’ module.

  • Admin documentation page has been added to explain the available events, the notification format, and how to disable event notifications.

  • The string representation of database model objects has been improved. Calling str() on them will return a certain subset of fields and calling repr() on them will return all fields. This is helpful for debugging, but it may also change some of the log messages that Octavia emits.
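
The str() and repr() behaviour described in the last item can be sketched with a small base class. The class names, field names, and output format below are hypothetical and do not reproduce Octavia's model code. The sketch only shows the pattern of a short str() and an exhaustive repr().

```python
class ModelBase:
    """Hypothetical base class: str() shows a subset, repr() shows all fields."""

    # Fields included in the short string form; subclasses may override.
    _str_fields = ("id", "name")

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def _format(self, names):
        pairs = ", ".join("%s=%r" % (n, getattr(self, n)) for n in names)
        return "%s(%s)" % (type(self).__name__, pairs)

    def __str__(self):
        return self._format(n for n in self._str_fields if hasattr(self, n))

    def __repr__(self):
        return self._format(sorted(vars(self)))


class Listener(ModelBase):
    _str_fields = ("id", "name", "protocol")


listener = Listener(id="l1", name="web", protocol="HTTP", protocol_port=80)
print(str(listener))
print(repr(listener))
```

A log statement that formats an object with %s picks up the short form, while %r gives the full field list, which is why some log messages may look different after this change.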

10.0.0

New Features

  • Added a new PROMETHEUS listener that exposes a prometheus exporter endpoint.

Known Issues

  • PROMETHEUS listeners will not report information for UDP or SCTP listeners.

Upgrade Notes

  • PROMETHEUS listeners require an amphora image with HAProxy 2.0 or newer.

  • The [haproxy_amphora].active_connection_rety_interval configuration option has been renamed to [haproxy_amphora].active_connection_retry_interval. An alias for the old name is in place to maintain compatibility with old configuration files.
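
The renamed option and its compatibility alias in the last item can be illustrated with a lookup that prefers the new name and falls back to the old one. Octavia relies on oslo.config for this, so the standard library configparser code below is only an approximation of the behaviour. The helper name and the default value are hypothetical.

```python
import configparser

SECTION = "haproxy_amphora"
NEW_NAME = "active_connection_retry_interval"
OLD_NAME = "active_connection_rety_interval"  # old, misspelled option name


def get_retry_interval(parser, default=5):
    """Return the retry interval, preferring the new option name.

    Hypothetical helper; the default of 5 is an assumption for this sketch.
    """
    for name in (NEW_NAME, OLD_NAME):
        if parser.has_option(SECTION, name):
            return parser.getint(SECTION, name)
    return default


# An old configuration file that still uses the misspelled name.
old_conf = configparser.ConfigParser()
old_conf.read_string("[haproxy_amphora]\nactive_connection_rety_interval = 7\n")
print(get_retry_interval(old_conf))
```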

Bug Fixes

  • Increased the TCP buffer memory maximum and enabled MTU ICMP black hole detection.

  • The generated RSyslog configuration on the amphora now supports RSyslog failover with TCP if multiple RSyslog servers were specified.

  • Ensure that the provided rsyslog configuration file is used by rsyslog by restarting the service. This fixes the log offloading feature on distributions that start rsyslog before cloud-init.

  • The [haproxy_amphora].active_connection_rety_interval configuration option has been renamed to [haproxy_amphora].active_connection_retry_interval.

  • Fixed issues when building amphora image for CentOS Stream 9.

  • Fixed issues when building amphora image for RHEL 9.

  • Correctly detect the member operating status “drain” when querying status data from HAProxy.

  • Fix an issue with IPv6 members that could have been set in operating_status ERROR just after being added.

  • Fix an issue with amphorav2 and persistence, where some long tasks executed by a controller might have been released in taskflow and rescheduled on another controller. Octavia now ensures that a task is never released early by using a keepalive mechanism to notify taskflow (and its redis backend) that a job is still running.

  • Fixed an issue with members in ERROR operating status that may have been updated briefly to ONLINE during a Load Balancer configuration change.

  • Fix an issue with the provisioning status of a load balancer that was set to ERROR too early when an error occurred, making the load balancer mutable while the execution of the tasks for this resource had not finished yet.

  • Fix an issue that could set the provisioning status of a load balancer to a PENDING_UPDATE state when an error occurred in the amphora failover flow.

  • Fix a bug that could have triggered a race condition when configuring a member interface in the amphora. Due to a race condition, a network interface might have been deleted from the amphora, leading to a loss of connectivity.