Queens Series Release Notes


New Features

  • A new variable has been added to allow a deployer to control the restart of containers from common-tasks/os-lxc-container-setup.yml. This new option is lxc_container_allow_restarts and has a default of true. If a deployer wishes to disable the auto-restart functionality they can set this value to false and automatic container restarts will be disabled. This is a complement to the same option already present in the lxc_container_create role. This option is useful to avoid uncoordinated restarts of galera or rabbitmq containers if the LXC container configuration changes in a way that requires a restart.
  • OpenStack-Ansible now supports the openSUSE Leap 42.X distributions, mainly targeting the latest 42.3 release.
  • The Ceph stable release used by openstack-ansible and its ceph-ansible integration has been changed to the recent Ceph LTS Luminous release.
  • The galera cluster now supports cluster health checks over HTTP using port 9200. The new cluster check ensures a node is healthy by running a simple query against the wsrep sync status using the monitoring user. This change provides a more robust cluster check, helping keep the galera cluster as fault tolerant as possible.
  • A typical OSA install will put the neutron and octavia queues on different vhosts, thus preventing the event streamer from working. While octavia is streaming to its own queue, the consumer on the neutron side listens to the neutron queue. With a recent octavia enhancement, a separate queue for the event streamer can be configured. This patch will set up the event streamer to post into the neutron queue using neutron’s credentials, thus reaching the consumer on the neutron-lbaas side and allowing for streaming.
  • Generating and validating checksums for all files installed by packages is now disabled by default. The check causes delays in playbook runs and it can consume a significant amount of CPU and I/O resources. Deployers can re-enable the check by setting security_check_package_checksums to yes.
  • Deployers of CentOS 7 environments can use the openstack_hosts_enable_yum_fastestmirror variable to enable or disable yum’s fastestmirror plugin. The default setting of yes ensures that fastestmirror is enabled.
  • New hypervisor groups have been added allowing deployers to better define their compute workloads. While the generic “compute_hosts” group will still work, explicit definitions for compute hosts can now be defined using the ironic-compute_hosts, kvm-compute_hosts, lxd-compute_hosts, qemu-compute_hosts, and powervm-compute_hosts groups accordingly.
  • An option has been added allowing the user to define the user_group LBaaSv2 uses. The new option is neutron_lbaasv2_user_group and defaults to an OS-specific value.
  • The maximum amount of time to wait until forcibly failing the LXC cache preparation process is now configurable using the lxc_cache_prep_timeout variable. The value is specified in seconds, with the default being 20 minutes.
  • A new variable has been added which allows deployers to set the container technology OSA will use when running a deployment in containers. This new variable is container_tech which has a default value of “lxc”.
  • The lxcbr0 bridge now allows NetworkManager to control it, which allows for networks to start in the correct order when the system boots. In addition, the NetworkManager-wait-online.service is enabled to ensure that all services that require networking to function, such as keepalived, will only start when network configuration is complete. These changes are only applied if a deployer is actively using NetworkManager in their environment.
  • Neutron connectivity agents will now be deployed on baremetal on the hosts in the “network_hosts” group defined within openstack_user_config.yml.
  • Galera healthcheck has been improved, and relies on an xinetd service. By default, the service is inaccessible (filtered with the no_access directive). You can override the directive by setting galera_monitoring_allowed_source to any valid xinetd value.
  • HAProxy services that use backend nodes that are not in the Ansible inventory can now be specified manually by setting haproxy_backend_nodes to a list of name and ip_addr settings.
  • Open vSwitch dataplane with NSH support has been implemented. This feature may be activated by setting ovs_nsh_support: True in /etc/openstack_deploy/user_variables.yml.
  • A new variable, tempest_roles, has been added to the os_tempest role allowing users to define keystone roles to be used during tempest testing.
  • The security_sshd_permit_root_login setting can now be set to change the PermitRootLogin setting in /etc/ssh/sshd_config to any of the possible options. Set security_sshd_permit_root_login to one of without-password, prohibit-password, forced-commands-only, yes or no.
  • Persistent systemd journals are now enabled. This allows deployers to keep older systemd journals on disk for review. The disk space requirements are extremely low since the journals are stored in binary format. The default location for persistent journals is in /var/log/journal.

    Deployers can opt out of this change by setting openstack_host_keep_journals to no.

  • The extra percona packages used on ppc64le are now downloaded by the Ansible deployment host by default, as opposed to the target hosts. Once downloaded, the packages are pushed up to the target hosts. This behaviour may be adjusted by setting galera_server_extra_package_downloader to target-host. The packages are downloaded to the path set in galera_server_extra_package_path.
  • The repo server now implements nginx as a reverse proxy for python packages sourced from pypi. The initial query will be to a local deployment of pypiserver in order to serve any locally built packages, but if the package is not available locally it will retry the query against the upstream pypi mirror set in the variable repo_nginx_pypi_upstream (defaults to pypi) and cache the response.
  • Deployers can set a refresh interval for haproxy’s stats page by setting the haproxy_stats_refresh_interval variable. The default value is 60, which causes haproxy to refresh the stats page every 60 seconds.
  • The tempest_images data structure for the os_tempest role now expects the values for each image to include name (optionally) and format (the disk format). Also, the optional variable checksum may be used to set the checksum expected for the file in the format <algorithm>:<checksum>.
  • The default location for the image downloads in the os_tempest role set by the tempest_image_dir variable has now been changed to be /opt/cache/files in order to match the default location in nodepool. This improves the reliability of CI testing in OpenStack CI as it will find the file already cached there.
  • A new variable has been introduced into the os_tempest role named tempest_image_downloader. When set to deployment-host (which is the default) it uses the deployment host to handle the download of images to be used for tempest testing. The images are then uploaded to the target host for uploading into Glance.
  • The tasks within the ansible-hardening role are now based on Version 1, Release 3 of the Red Hat Enterprise Linux Security Technical Implementation Guide.
  • The sysctl parameter kernel.randomize_va_space is now set to 2 by default. This matches the default of most modern Linux distributions and it ensures that Address Space Layout Randomization (ASLR) is enabled.
  • The Datagram Congestion Control Protocol (DCCP) kernel module is now disabled by default, but a reboot is required to make the change effective.
  • An option to disable the machinectl quota system has been added. The variable lxc_host_machine_quota_disabled is a Boolean with a default of true. When this option is set to true it will disable the machinectl quota system.
  • Enable Kernel Shared Memory support by setting nova_compute_ksm_enabled to True.
  • When using Glance and NFS the NFS mount point will now be managed using a systemd mount unit file. This change ensures the deployment of glance is not making potentially system impacting changes to the /etc/fstab and modernizes how we deploy glance when using shared storage.
  • New variables have been added to the glance role allowing a deployer to set the UID and GID of the glance user. The new options are glance_system_user_uid and glance_system_group_uid. These options are useful when deploying glance with shared storage as the back-end for images and will only set the UID and GID of the glance user when defined.
  • Searching for world-writable files is now disabled by default. The search causes delays in playbook runs and it can consume a significant amount of CPU and I/O resources. Deployers can re-enable the search by setting security_find_world_writable_dirs to yes.
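
As a rough illustration, several of the options described above could be combined in /etc/openstack_deploy/user_variables.yml as follows. This is a minimal sketch, not a recommended configuration: the variable names come from the notes above, while the specific values chosen (for example the 30 second refresh interval and the 1800 second timeout) are assumptions picked for the example.

```yaml
# /etc/openstack_deploy/user_variables.yml (illustrative sketch only)

# Avoid uncoordinated restarts of galera or rabbitmq containers.
lxc_container_allow_restarts: false

# Re-enable the package checksum check and the world-writable search
# (both are now disabled by default).
security_check_package_checksums: yes
security_find_world_writable_dirs: yes

# Refresh the haproxy stats page every 30 seconds (assumed value; default is 60).
haproxy_stats_refresh_interval: 30

# Fail the LXC cache preparation after 30 minutes
# (value in seconds, assumed; the default is 20 minutes).
lxc_cache_prep_timeout: 1800

# Activate the Open vSwitch dataplane with NSH support.
ovs_nsh_support: True

# Opt out of persistent systemd journals.
openstack_host_keep_journals: no
```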

Known Issues

  • Ceph storage backend is known not to work on openSUSE Leap 42.X yet. This is due to missing openSUSE support in the upstream Ceph Ansible playbooks.

Upgrade Notes

  • The ceph-ansible integration has been updated to support the ceph-ansible v3.0 series tags. The new v3.0 series brings a significant refactoring of the ceph-ansible roles and vars, so it is strongly recommended to consult the upstream ceph-ansible documentation to perform any required vars migrations before you upgrade.
  • The ceph-ansible common roles are no longer namespaced with a galaxy-style ‘.’ (i.e. ceph.ceph-common is now cloned as ceph-common), due to a change in the way upstream meta dependencies are handled in the ceph roles. The roles will be cloned according to the new naming, and an upgrade playbook ceph-galaxy-removal.yml has been added to clean up the stale galaxy-named roles.
  • The Ceph stable release used by openstack-ansible and its ceph-ansible integration has been changed to the recent Ceph LTS Luminous release.
  • KSM configuration is changed to disabled by default on Ubuntu. If you overcommit the RAM on your hypervisor it’s a good idea to set nova_compute_ksm_enabled to True.
  • The glance v1 API is now disabled by default as the API is scheduled to be removed in Queens.
  • The glance registry service is now disabled by default as it is not required for the v2 API and is scheduled to be removed in the future. The service can be enabled by setting glance_enable_v2_registry to True.
  • When upgrading there is nothing a deployer must immediately do to run neutron agent services on hosts within the network_hosts group. Simply executing the playbooks will deploy the neutron servers on the baremetal machines and will leave all existing agent containers alone.
  • It is recommended for deployers to clean up the neutron_agents container(s) after an upgrade is complete and the cluster has been verified as stable. This can be done by simply disabling the neutron agents running in the neutron_agent container(s), re-balancing the agent services targeting the new baremetal agents, deleting the container(s), and finally removing the container(s) from inventory.
  • Default quotas were bumped for the following resources: networks (from 10 to 100), subnets (from 10 to 100), ports (from 50 to 500) to match upstream defaults.
  • Any tooling using the Designate v1 API needs to be reworked to use the v2 API.
  • The variable lxc_host_machine_volume_size now accepts any valid size modifier acceptable by truncate -s and machinectl set-limit. Prior to this change the option assumed an integer was set for some value in gigabytes. All acceptable values can be seen within the documentation for machinectl.
  • If you have overridden your openstack_host_specific_kernel_modules, please remove its group matching, and move that override directly to the appropriate group.

    Example, for an override like:

    - name: "ebtables"
      group: "network_hosts"

    You can create a file for the network_host group, inside its group vars folder /etc/openstack_deploy/group_vars/network_hosts, with the content:

    - name: "ebtables"
  • Any user coming from Pike or below on Ubuntu should modify their user_external_repos_list, switching the Ubuntu Cloud Archive repository from state: present to state: absent. From now on, UCA will be defined with the filename uca. Deployers who want to use their own mirror can still override the variable uca_repo to point to it. Alternatively, the deployer can completely define which repos to add and remove, ignoring our defaults, by overriding openstack_hosts_package_repos.
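
The following sketch shows how some of the upgrade-related overrides described above might look in /etc/openstack_deploy/user_variables.yml. The repo key and the repository string in user_external_repos_list are assumptions for illustration only (the notes above name only the state setting), so reuse the exact entry from your existing override rather than copying this one.

```yaml
# /etc/openstack_deploy/user_variables.yml (illustrative sketch only)

# Switch the previously added Ubuntu Cloud Archive entry to absent.
# ASSUMPTION: the "repo" key and the repository line are examples;
# keep whatever entry your existing override already contains.
user_external_repos_list:
  - repo: "deb http://ubuntu-cloud.archive.canonical.com/ubuntu xenial-updates/pike main"
    state: absent

# Keep the glance registry service enabled for the v2 API.
glance_enable_v2_registry: True

# Re-enable KSM on hypervisors where RAM is overcommitted.
nova_compute_ksm_enabled: True
```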

Deprecation Notes

  • The galera_percona_xtrabackup_repo_url variable which was used on Ubuntu distributions to select the upstream Percona repository has been dropped and the default upstream repository is always used from now on.
  • The variables keystone_memcached_servers and keystone_cache_backend_argument have been deprecated in favor of keystone_cache_servers, a list of servers for caching purposes.
  • In OSA deployments prior to Queens, if repo_git_cache_dir was set to a folder which existed on a repo container host then that folder would be symlinked to the repo container bind mount instead of synchronising its contents to the repo container. This functionality is deprecated in Queens and will be removed in Rocky. The ability to make use of the git cache still exists, but the folder contents will be synchronised from the deploy host to the repo container. If you have made use of the symlink functionality previously, please move the contents to a standard folder and remove the symlink.
  • The Ceilometer API is no longer available in the Queens release of OpenStack. This patch removes all references to API-related configurations as they are no longer needed.
  • The galera_client_opensuse_mirror_obs_url variable has been removed since the OBS repository is no longer used to install the MariaDB packages.
  • The glance_enable_v1_registry variable has been removed. When using the glance v1 API the registry service is required, so having a variable to disable it makes little sense. The service is now enabled/disabled for the v1 API using the glance_enable_v1_api variable.
  • The nova_placement database which was implemented in the Ocata release of OpenStack-Ansible was never actually used for anything due to reverts in the upstream code. The database should be empty and can be deleted. With this, the following variables also no longer have any function and have been removed.
    • nova_placement_galera_user
    • nova_placement_galera_database
    • nova_placement_db_max_overflow
    • nova_placement_db_max_pool_size
    • nova_placement_db_pool_timeout
  • The following variables have been removed as they no longer serve any purpose.

    • galera_package_arch
    • percona_package_download_validate_certs
    • percona_package_url
    • percona_package_fallback_url
    • percona_package_sha256
    • percona_package_path
    • qpress_package_download_validate_certs
    • qpress_package_url
    • qpress_package_fallback_url
    • qpress_package_sha256
    • qpress_package_path

    The functionality previously using these variables has been transitioned to using a simpler data structure.

  • The following variables have been removed from the os_tempest role to simplify it. They have been replaced through the use of the data structure tempest_images which now has equivalent variables per image.

    • cirros_version
    • tempest_img_url
    • tempest_image_file
    • tempest_img_disk_format
    • tempest_img_name
    • tempest_images.sha256 (replaced by checksum)
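
To illustrate the replacement data structures named above, here is a hedged sketch of what the overrides might look like. The name, format and checksum keys come from these notes; the url key, the image URL, the host:port form of the cache server entries and the addresses are assumptions made for the example, and the checksum is left as a placeholder rather than a real value.

```yaml
# Illustrative sketch only; see the assumptions listed above.

# Replaces keystone_memcached_servers and keystone_cache_backend_argument.
# ASSUMPTION: entries are host:port strings; the addresses are placeholders.
keystone_cache_servers:
  - "192.0.2.11:11211"
  - "192.0.2.12:11211"

# Replaces cirros_version, tempest_img_url, tempest_img_name and related variables.
tempest_images:
  - url: "http://download.cirros-cloud.net/0.3.5/cirros-0.3.5-x86_64-disk.img"  # ASSUMPTION: key name and URL
    name: "cirros"                    # optional
    format: "qcow2"                   # the disk format
    checksum: "sha256:<checksum>"     # optional, in the form <algorithm>:<checksum>
```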

Critical Issues

  • The ceph-ansible integration has been updated to support the ceph-ansible v3.0 series tags. The new v3.0 series brings a significant refactoring of the ceph-ansible roles and vars, so it is strongly recommended to consult the upstream ceph-ansible documentation to perform any required vars migrations before you upgrade.
  • The Designate V1 API has been removed, and cannot be enabled.

Security Issues

  • The PermitRootLogin setting in sshd_config changed from ‘yes’ to ‘prohibit-password’ in the containers. By default there is no password set in the containers, but the SSH public key from the deployment host is injected into the target nodes’ authorized_keys.
  • The following headers were added as additional default (and static) values: X-Content-Type-Options nosniff, X-XSS-Protection “1; mode=block”, and Content-Security-Policy “default-src ‘self’ https: wss:;”. Additionally, the X-Frame-Options header was added, defaulting to DENY. You may override the header via the keystone_x_frame_options variable.
  • Since we use neutron’s credentials to access the queue, security-conscious deployers might want to set up an extra user for octavia on the neutron queue, restricted to the topics octavia posts to.

Bug Fixes

  • When the glance_enable_v2_registry variable is set to True the corresponding data_api setting is now correctly set. Previously it was not set and therefore the API service was not correctly informed that the registry was operating.
  • The os_tempest role was downloading images twice - once arbitrarily, and once to use for testing. This has been consolidated into a single download to a consistent location.
  • SELinux policy for neutron on CentOS 7 is now provided to fix SELinux AVCs that occur when neutron’s agents attempt to start daemons such as haproxy and dnsmasq.

Other Notes

  • openSUSE Leap 42.X support is still a work in progress and not fully tested beyond basic coverage in the OpenStack CI and individual manual testing. Even though backporting fixes to the Pike release will be done on a best-effort basis, it’s advised to use the master branch when working on openSUSE hosts.
  • CentOS deployments require a special COPR repository for modern LXC packages. The COPR repository is not mirrored at this time and this causes failed gate tests and production deployments.

    The role now syncs the LXC packages down from COPR to each host and builds a local LXC package repository in /opt/thm-lxc2.0. This greatly reduces the number of times that packages must be downloaded from the COPR server during deployments, which will reduce failures until the packages can be hosted with a more reliable source.

    In addition, this should speed up playbook runs since yum can check a locally-hosted repository instead of a remote repository with availability and performance challenges.

  • Added support for specifying GID and UID for cinder system user by defining cinder_system_user_uid and cinder_system_group_gid. This setting is optional.
  • The variables nova_scheduler_use_baremetal_filters and nova_metadata_host have been removed, matching upstream nova changes. The nova_virt_types dict no longer needs the nova_scheduler_use_baremetal_filters and nova_firewall_driver keys as well.
  • The max_fail_percentage playbook option has been used with the default playbooks since the first release of the playbooks back in Icehouse. While the intention was to allow large-scale deployments to succeed in cases where a single node fails due to transient issues, this option has produced more problems than it solves. If a failure occurs that is transient in nature but is under the set failure percentage, the playbook will report a success, which can cause silent failures depending on where the failure happened. If a deployer finds themselves in this situation, the problems are then compounded because the tools will report there are no known issues. To ensure deployers have the best deployment experience and the most accurate information, a change has been made to remove the max_fail_percentage option from all of the default playbooks. The removal of this option has the side effect of requiring the deployer to skip specific hosts should one need to be omitted from a run, but has the benefit of eliminating silent, hard to track down failures. To skip a failing host for a given playbook run, use the --limit '!$HOSTNAME' CLI switch for the specific run. Once the issues have been resolved for the failing host, rerun the specific playbook without the --limit option to ensure everything is in sync.
  • The use_neutron option was marked to be removed in sahara.
  • The vars plugin override_folder.py has been removed. With the move to Ansible 2.4 (https://review.openstack.org/#/c/522778) this plugin is no longer required. The functionality this plugin provided has been replaced with the native Ansible inventory plugin.
  • The variable lxc_host_machine_volume_size is used to set the size of the default sparse file as well as define a limit within the machinectl quota system. When the machinectl quota system is enabled deployers should appropriately set this value to the size of the container volume, even when not using a sparse file.
  • The container image cache within machinectl has been set to “64G” by default.
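
For the cinder and machinectl settings mentioned in these notes, a minimal sketch of possible overrides is shown below. The variable names are taken from the notes; the numeric IDs and the "64G" size are hypothetical values chosen only for illustration.

```yaml
# /etc/openstack_deploy/user_variables.yml (illustrative sketch only)

# Optional fixed IDs for the cinder system user and group.
# ASSUMPTION: 2010 is a hypothetical value, not a project default.
cinder_system_user_uid: 2010
cinder_system_group_gid: 2010

# Keep the machinectl quota system enabled (the default of true disables it).
lxc_host_machine_quota_disabled: false

# With quotas enabled, set this to the size of the container volume.
# Any size accepted by truncate -s and machinectl set-limit should work.
lxc_host_machine_volume_size: "64G"
```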

Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License.