Queens Series Release Notes

17.0.8-13

New Features

  • It is now possible to specify a list of tests for tempest to blacklist when executing using the tempest_test_blacklist list variable.
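
    As a minimal sketch, the override might look like the following in /etc/openstack_deploy/user_variables.yml. The test names are hypothetical placeholders, and plain strings are assumed as the entry format; check the os_tempest role defaults for the exact form it expects.

    # Assumption: each entry is a test name (or pattern) given as a plain string.
    # Both names below are hypothetical and only illustrate the shape of the list.
    tempest_test_blacklist:
      - tempest.api.example.test_hypothetical_case
      - tempest.scenario.example.test_another_hypothetical_case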

17.0.8

Deprecation Notes

  • The repo server’s reverse proxy for pypi has now been removed, leaving only the pypiserver to serve packages already on the repo server. The attempt to reverse proxy upstream pypi turned out to be very unstable with increased complexity for deployers using proxies or offline installs. With this, the variables repo_nginx_pypi_upstream and repo_nginx_proxy_cache_path have also been removed.

17.0.7

Bug Fixes

  • The conditional that determines whether the sso_callback_template.html file is deployed for federated deployments has been fixed.

17.0.6

New Features

  • The option rabbitmq_erlang_version_spec has been added allowing deployers to set the version of erlang used on a given installation.

Known Issues

  • With the release of CentOS 7.5, all pike releases are broken due to a mismatch in version between the libvirt-python library specified by the OpenStack community, and the version provided in CentOS 7.5. As such, OSA is unable to build the appropriate python library for libvirt. The only recourse for this is to upgrade the environment to the latest queens release.

Deprecation Notes

  • The use of the apt_package_pinning role as a meta dependency has been removed from the rabbitmq_server role. While the package pinning role is still used, it will now only be executed when the apt task file is executed.
  • The variable nova_compute_pip_packages is no longer used and has been removed.

Bug Fixes

  • In order to prevent further issues with a libvirt and python-libvirt version mismatch, KVM-based compute nodes will now use the distribution package python library for libvirt. This should resolve the issue seen with pike builds on CentOS 7.5.

17.0.5

New Features

  • Octavia requires SSL certificates for communication with the amphora. This adds the automatic creation of self-signed certificates for this purpose. It uses different certificate authorities for amphora and control plane, thus ensuring maximum security.

Known Issues

  • All OSA releases earlier than 17.0.5, 16.0.4, and 15.1.22 will fail to build the rally venv due to the release of the new cmd2-0.9.0 python library. Deployers are encouraged to update to the latest OSA release which pins to an appropriate version which is compatible with python2.
  • Recently the spice-html5 git repository was entirely moved from https://github.com/SPICE/spice-html5 to https://gitlab.freedesktop.org/spice/spice-html5. This results in a failure in the git clone stage of the repo-build.yml playbook for OSA queens releases earlier than 17.0.5. To fix the issue, deployers may upgrade to the most recent release, or may implement the following override in user_variables.yml.

    nova_spicehtml5_git_repo: https://gitlab.freedesktop.org/spice/spice-html5.git
    

Upgrade Notes

  • The distribution package lookup and data output has been removed from the py_pkgs lookup so that the repo-build use of py_pkgs has reduced output and the lookup is purpose specific for python packages only.

Security Issues

  • It is recommended that the certificate generation is always reviewed by security professionals since algorithms and key-lengths considered secure change all the time.

Bug Fixes

  • Newer releases of CentOS ship a version of libnss that depends on the existence of /dev/random and /dev/urandom in the operating system in order to run. This causes a problem during the cache preparation process, which runs inside a chroot that does not contain these devices, resulting in errors with the following message:

    error: Failed to initialize NSS library

    This has been resolved by introducing a /dev/random and /dev/urandom inside the chroot-ed environment.

17.0.4

Known Issues

  • In the lxc_hosts role execution, we make use of the images produced on a daily basis by images.linuxcontainers.org. Recent changes in the way those images are produced have resulted in changes to the default /etc/resolv.conf in that default image. As a result, the cache preparation fails when executed. For queens releases prior to 17.0.4 the workaround to get past the error is to add the following to the /etc/openstack_deploy/user_variables.yml file.

    lxc_cache_prep_pre_commands: "rm -f /etc/resolv.conf || true"
    lxc_cache_prep_post_commands: "ln -s ../run/resolvconf/resolv.conf /etc/resolv.conf -f"
    

17.0.3

New Features

  • When venvwithindex=True and ignorerequirements=True are both specified in tempest_git_install_fragments (as was previously the default), this results in tempest being installed from PyPI without any constraints being applied. This could result in the version of tempest being installed in the integrated build being different than the version being installed in the independent role tests. Going forward, we remove the tempest_git_* overrides in playbooks/defaults/repo_packages/openstack_testing.yml so that the integrated build installs tempest from PyPI, but with appropriate constraints applied.
  • This consolidates the amphora image tasks in a common file and adds a way to download an amphora image from an artefact storage over http(s). With the Octavia team providing test images, the tests were modified to download images rather than build them.

Security Issues

  • It is commonly considered bad practice to download random images from the Internet, especially the test images the Octavia team provides, which could potentially include unpatched operating system packages. For any production deployment, adjust the download URL to an artifact storage your organization controls. The system also does not authenticate the image (e.g. with an md5 checksum), so it should only be used on networks your organization controls.

Other Notes

  • The internal variable python_ceph_package has been renamed to python_ceph_packages and is now a list instead of a string. If you are using gnocchi with ceph and are using this internal variable in your ceph_extra_components overrides, please update it to python_ceph_packages.

17.0.2

New Features

  • Adds support for the horizon octavia-ui dashboard. The dashboard will be automatically enabled if any octavia hosts are defined. If both Neutron LBaaSv2 and Octavia are enabled, two Load Balancer panels will be visible in Horizon.
  • Added the ability to configure vendor data for Nova in order to be able to push things via the metadata service or config drive.
  • The networking-bgpvpn ml2 neutron driver can now be enabled so that the OpenDaylight SDN Controller supports BGPVPN for external network connectivity. To enable BGPVPN with OpenDaylight, set neutron_plugin_type to ml2.opendaylight and neutron_plugin_base to odl-router_v2 and bgpvpn.
  • The default variable nova_default_schedule_zone was previously set by default to nova. This default has been removed to allow the default to be set by the nova code instead. Deployers wishing to maintain the default availability zone of nova must now set the variable as a user_variables.yml or group_vars override.
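
    The nova_default_schedule_zone and BGPVPN notes in this list might translate into overrides along these lines in user_variables.yml. This is a sketch only: it assumes neutron_plugin_base takes a list of plugin names, which the note does not state explicitly.

    # Keep the previous default availability zone of "nova".
    nova_default_schedule_zone: nova

    # Enable BGPVPN with OpenDaylight.
    # Assumption: neutron_plugin_base is a list of plugin names.
    neutron_plugin_type: ml2.opendaylight
    neutron_plugin_base:
      - odl-router_v2
      - bgpvpn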

Upgrade Notes

  • When upgrading from pike to queens there are the following changes to the container/service setup.

    • All cinder container services are consolidated into a single cinder_api_container. The previously implemented cinder_scheduler_container can be removed.
    • A new heat_api container is created with all heat services running in it. The previously implemented heat_apis_container and heat_engine_container can be removed.
    • The ironic conductor service has been consolidated into the ironic_api_container. The previously implemented ironic_conductor_container can be removed.
    • All nova services are consolidated into the nova_api_container and the rest of the nova containers can be removed.
    • All trove services have been consolidated into the trove_api_container. The previously implemented trove_conductor_container and trove_taskmanager_container can be removed.

    Playbooks have been added to facilitate this process through automation. Please see the Major upgrades chapter in the Operations Guide.

17.0.1

Upgrade Notes

  • Users should purge the ‘ntp’ package from their hosts if ceph-ansible is enabled. ceph-ansible previously was configured to install ntp by default which conflicts with the OSA ansible-hardening role chrony service.

Bug Fixes

  • ceph-ansible is no longer configured to install ntp by default. Installing ntp created a conflict with OSA’s ansible-hardening role, which implements NTP using ‘chrony’.

17.0.0

New Features

  • A new variable has been added to allow a deployer to control the restart of containers from common-tasks/os-lxc-container-setup.yml. This new option is lxc_container_allow_restarts and has a default of true. If a deployer wishes to disable the auto-restart functionality they can set this value to false and automatic container restarts will be disabled. This is a complement to the same option already present in the lxc_container_create role. This option is useful to avoid uncoordinated restarts of galera or rabbitmq containers if the LXC container configuration changes in a way that requires a restart.
  • OpenStack-Ansible now supports the openSUSE Leap 42.X distributions mainly targeting the latest 42.3 release.
  • The Ceph stable release used by openstack-ansible and its ceph-ansible integration has been changed to the recent Ceph LTS Luminous release.
  • The galera cluster now supports cluster health checks over HTTP using port 9200. The new cluster check ensures a node is healthy by running a simple query against the wsrep sync status using the monitoring user. This change will provide for a more robust cluster check ensuring we have the most fault tolerant galera cluster possible.
  • A typical OSA install will put the neutron and octavia queues on different vhosts, thus preventing the event streamer from working. While octavia is streaming to its own queue, the consumer on the neutron side listens to the neutron queue. With a recent octavia enhancement, a separate queue for the event streamer can be configured. This patch will set up the event streamer to post into the neutron queue using neutron’s credentials, thus reaching the consumer on the neutron-lbaas side and allowing for streaming.
  • Generating and validating checksums for all files installed by packages is now disabled by default. The check causes delays in playbook runs and it can consume a significant amount of CPU and I/O resources. Deployers can re-enable the check by setting security_check_package_checksums to yes.
  • Deployers of CentOS 7 environments can use the openstack_hosts_enable_yum_fastestmirror variable to enable or disable yum’s fastestmirror plugin. The default setting of yes ensures that fastestmirror is enabled.
  • New hypervisor groups have been added allowing deployers to better define their compute workloads. While the generic “compute_hosts” group will still work, explicit definitions for compute hosts can now be defined using the ironic-compute_hosts, kvm-compute_hosts, lxd-compute_hosts, qemu-compute_hosts, and powervm-compute_hosts groups.
  • An option has been added allowing the user to define the user_group LBaaSv2 uses. The new option is neutron_lbaasv2_user_group and is set within the OS specific value by default.
  • The maximum amount of time to wait until forcibly failing the LXC cache preparation process is now configurable using the lxc_cache_prep_timeout variable. The value is specified in seconds, with the default being 20 minutes.
  • A new variable has been added which allows deployers to set the container technology OSA will use when running a deployment in containers. This new variable is container_tech which has a default value of “lxc”.
  • The lxcbr0 bridge now allows NetworkManager to control it, which allows for networks to start in the correct order when the system boots. In addition, the NetworkManager-wait-online.service is enabled to ensure that all services that require networking to function, such as keepalived, will only start when network configuration is complete. These changes are only applied if a deployer is actively using NetworkManager in their environment.
  • Neutron connectivity agents will now be deployed on baremetal within the “network_hosts” defined within the openstack_user_config.yml.
  • Galera healthcheck has been improved, and relies on an xinetd service. By default, the service is inaccessible (filtered with the no_access directive). You can override the directive by setting galera_monitoring_allowed_source to any valid xinetd value.
  • HAProxy services that use backend nodes that are not in the Ansible inventory can now be specified manually by setting haproxy_backend_nodes to a list of name and ip_addr settings.
  • Open vSwitch dataplane with NSH support has been implemented. This feature may be activated by setting ovs_nsh_support: True in /etc/openstack_deploy/user_variables.yml.
  • A new variable, tempest_roles, has been added to the os_tempest role allowing users to define keystone roles to be used during tempest testing.
  • The security_sshd_permit_root_login setting can now be set to change the PermitRootLogin setting in /etc/ssh/sshd_config to any of the possible options. Set security_sshd_permit_root_login to one of without-password, prohibit-password, forced-commands-only, yes or no.
  • Persistent systemd journals are now enabled. This allows deployers to keep older systemd journals on disk for review. The disk space requirements are extremely low since the journals are stored in binary format. The default location for persistent journals is in /var/log/journal.

    Deployers can opt out of this change by setting openstack_host_keep_journals to no.

  • The extra percona packages used by the ppc64le architecture are now downloaded by the Ansible deployment host by default, as opposed to the target hosts. Once downloaded, the packages are pushed up to the target hosts. This behaviour may be adjusted by setting galera_server_extra_package_downloader to target-host. The packages are downloaded to the path set in galera_server_extra_package_path.
  • The repo server now implements nginx as a reverse proxy for python packages sourced from pypi. The initial query will be to a local deployment of pypiserver in order to serve any locally built packages, but if the package is not available locally it will retry the query against the upstream pypi mirror set in the variable repo_nginx_pypi_upstream (defaults to pypi) and cache the response.
  • Deployers can set a refresh interval for haproxy’s stats page by setting the haproxy_stats_refresh_interval variable. The default value is 60, which causes haproxy to refresh the stats page every 60 seconds.
  • The tempest_images data structure for the os_tempest role now expects the values for each image to include name (optionally) and format (the disk format). Also, the optional variable checksum may be used to set the checksum expected for the file in the format <algorithm>:<checksum>.
  • The default location for the image downloads in the os_tempest role set by the tempest_image_dir variable has now been changed to be /opt/cache/files in order to match the default location in nodepool. This improves the reliability of CI testing in OpenStack CI as it will find the file already cached there.
  • A new variable has been introduced into the os_tempest role named tempest_image_downloader. When set to deployment-host (which is the default) it uses the deployment host to handle the download of images to be used for tempest testing. The images are then uploaded to the target host for uploading into Glance.
  • The tasks within the ansible-hardening role are now based on Version 1, Release 3 of the Red Hat Enterprise Linux Security Technical Implementation Guide.
  • The sysctl parameter kernel.randomize_va_space is now set to 2 by default. This matches the default of most modern Linux distributions and it ensures that Address Space Layout Randomization (ASLR) is enabled.
  • The Datagram Congestion Control Protocol (DCCP) kernel module is now disabled by default, but a reboot is required to make the change effective.
  • An option to disable the machinectl quota system has been changed. The variable lxc_host_machine_quota_disabled is a Boolean with a default of false. When this option is set to true it will disable the machinectl quota system.
  • The options lxc_host_machine_qgroup_space_limit and lxc_host_machine_qgroup_compression_limit have been added allowing a deployer to set qgroup limits as they see fit. The default value for these options is “none” which is effectively unlimited. These options accept any nominal size value followed by the single letter type, for example 64G. These options are only effective when the option lxc_host_machine_quota_disabled is set to false.
  • Enable Kernel Shared Memory support by setting nova_compute_ksm_enabled to True.
  • When using Glance and NFS the NFS mount point will now be managed using a systemd mount unit file. This change ensures the deployment of glance is not making potentially system impacting changes to the /etc/fstab and modernizes how we deploy glance when using shared storage.
  • New variables have been added to the glance role allowing a deployer to set the UID and GID of the glance user. The new options are, glance_system_user_uid and glance_system_group_uid. These options are useful when deploying glance with shared storage as the back-end for images and will only set the UID and GID of the glance user when defined.
  • Searching for world-writable files is now disabled by default. The search causes delays in playbook runs and it can consume a significant amount of CPU and I/O resources. Deployers can re-enable the search by setting security_find_world_writable_dirs to yes.
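
    Several of the options in this list are plain variable overrides. The following sketch collects a few of them as they might appear in /etc/openstack_deploy/user_variables.yml. The values are illustrative choices rather than recommendations, and anything not named in the notes above (the addresses, the backend name, the image URL, and the url key) is a hypothetical placeholder.

    # Containers
    lxc_container_allow_restarts: false       # avoid uncoordinated galera/rabbitmq restarts
    lxc_cache_prep_timeout: 1800              # seconds; the default is 20 minutes
    lxc_host_machine_quota_disabled: false    # keep the machinectl quota system enabled
    lxc_host_machine_qgroup_space_limit: 64G  # size value followed by a single letter type

    # Host hardening and journals
    security_check_package_checksums: yes     # re-enable the package checksum check
    security_sshd_permit_root_login: prohibit-password
    openstack_host_keep_journals: no          # opt out of persistent systemd journals

    # Galera and HAProxy
    galera_monitoring_allowed_source: "192.0.2.0/24"  # hypothetical monitoring network
    haproxy_stats_refresh_interval: 30
    # Assumption: shown at the top level for brevity; this key may need to be
    # set inside the relevant HAProxy service definition instead.
    haproxy_backend_nodes:
      - name: external-backend-1              # hypothetical name
        ip_addr: 192.0.2.10                   # hypothetical address

    # Tempest images
    tempest_image_downloader: deployment-host
    tempest_images:
      - url: "https://example.com/images/test-image.img"  # assumed key, hypothetical URL
        name: "test-image"                    # optional
        format: "qcow2"                       # the disk format
        checksum: "sha256:<checksum>"         # <algorithm>:<checksum>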

Known Issues

  • Ceph storage backend is known not to work on openSUSE Leap 42.X yet. This is due to missing openSUSE support in the upstream Ceph Ansible playbooks.

Upgrade Notes

  • The ceph-ansible integration has been updated to support the ceph-ansible v3.0 series tags. The new v3.0 series brings a significant refactoring of the ceph-ansible roles and vars, so it is strongly recommended to consult the upstream ceph-ansible documentation to perform any required vars migrations before you upgrade.
  • The ceph-ansible common roles are no longer namespaced with a galaxy-style ‘.’ (ie. ceph.ceph-common is now cloned as ceph-common), due to a change in the way upstream meta dependencies are handled in the ceph roles. The roles will be cloned according to the new naming, and an upgrade playbook ceph-galaxy-removal.yml has been added to clean up the stale galaxy-named roles.
  • The Ceph stable release used by openstack-ansible and its ceph-ansible integration has been changed to the recent Ceph LTS Luminous release.
  • KSM configuration is changed to disabled by default on Ubuntu. If you overcommit the RAM on your hypervisor it’s a good idea to set nova_compute_ksm_enabled to True.
  • The glance v1 API is now disabled by default as the API is scheduled to be removed in Queens.
  • The glance registry service is now disabled by default as it is not required for the v2 API and is scheduled to be removed in the future. The service can be enabled by setting glance_enable_v2_registry to True.
  • When upgrading there is nothing a deployer must immediately do to run neutron agent services on hosts within the network_hosts group. Simply executing the playbooks will deploy the neutron servers on the baremetal machines and will leave all existing agents containers alone.
  • It is recommended for deployers to clean up the neutron_agents container(s) after an upgrade is complete and the cluster has been verified as stable. This can be done by simply disabling the neutron agents running in the neutron_agent container(s), re-balancing the agent services targeting the new baremetal agents, deleting the container(s), and finally removing the container(s) from inventory.
  • Default quotas were bumped for the following resources: networks (from 10 to 100), subnets (from 10 to 100), ports (from 50 to 500) to match upstream defaults.
  • Any tooling using the Designate v1 API needs to be reworked to use the v2 API.
  • If you have overridden your openstack_host_specific_kernel_modules, please remove its group matching, and move that override directly to the appropriate group.

    Example, for an override like:

    - name: "ebtables"
      pattern: "CONFIG_BRIDGE_NF_EBTABLES"
      group: "network_hosts"
    

    You can create a file for the network_hosts group, inside its group vars folder /etc/openstack_deploy/group_vars/network_hosts, with the content:

    - name: "ebtables"
      pattern: "CONFIG_BRIDGE_NF_EBTABLES"
    
  • Any user coming from Pike or earlier on Ubuntu should modify their user_external_repos_list, switching the Ubuntu Cloud Archive repository from state: present to state: absent. From now on, UCA will be defined with the filename uca. Deployers who want to use their own mirror can still override the variable uca_repo to point to it. Alternatively, the deployer can completely define which repos to add and remove, ignoring our defaults, by overriding openstack_hosts_package_repos.
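
    For the Ubuntu Cloud Archive change, the override might be sketched as follows. The repo line is a placeholder for whatever entry the deployment already carries, and the repo and state keys are assumed to follow the Ansible apt_repository module.

    # Assumption: entries use the same keys as the apt_repository module.
    user_external_repos_list:
      - repo: "<existing ubuntu cloud archive repo line>"
        state: absent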

Deprecation Notes

  • The galera_percona_xtrabackup_repo_url variable which was used on Ubuntu distributions to select the upstream Percona repository has been dropped and the default upstream repository is always used from now on.
  • The variables keystone_memcached_servers and keystone_cache_backend_argument have been deprecated in favor of keystone_cache_servers, a list of servers for caching purposes.
  • In OSA deployments prior to Queens, if repo_git_cache_dir was set to a folder which existed on a repo container host then that folder would be symlinked to the repo container bind mount instead of synchronising its contents to the repo container. This functionality is deprecated in Queens and will be removed in Rocky. The ability to make use of the git cache still exists, but the folder contents will be synchronised from the deploy host to the repo container. If you have made use of the symlink functionality previously, please move the contents to a standard folder and remove the symlink.
  • The Ceilometer API is no longer available in the Queens release of OpenStack. This patch removes all references to API related configurations as they are no longer needed.
  • The galera_client_opensuse_mirror_obs_url variable has been removed since the OBS repository is no longer used to install the MariaDB packages.
  • The glance_enable_v1_registry variable has been removed. When using the glance v1 API the registry service is required, so having a variable to disable it makes little sense. The service is now enabled/disabled for the v1 API using the glance_enable_v1_api variable.
  • The nova_placement database which was implemented in the ocata release of OpenStack-Ansible was never actually used for anything due to reverts in the upstream code. The database should be empty and can be deleted. With this the following variables also no longer have any function and have been removed.
    • nova_placement_galera_user
    • nova_placement_galera_database
    • nova_placement_db_max_overflow
    • nova_placement_db_max_pool_size
    • nova_placement_db_pool_timeout
  • The following variables have been removed as they no longer serve any purpose.

    • galera_package_arch
    • percona_package_download_validate_certs
    • percona_package_url
    • percona_package_fallback_url
    • percona_package_sha256
    • percona_package_path
    • qpress_package_download_validate_certs
    • qpress_package_url
    • qpress_package_fallback_url
    • qpress_package_sha256
    • qpress_package_path

    The functionality previously using these variables has been transitioned to using a simpler data structure.

  • The following variables have been removed from the os_tempest role to simplify it. They have been replaced through the use of the data structure tempest_images which now has equivalent variables per image.

    • cirros_version
    • tempest_img_url
    • tempest_image_file
    • tempest_img_disk_format
    • tempest_img_name
    • tempest_images.sha256 (replaced by checksum)

Critical Issues

  • The ceph-ansible integration has been updated to support the ceph-ansible v3.0 series tags. The new v3.0 series brings a significant refactoring of the ceph-ansible roles and vars, so it is strongly recommended to consult the upstream ceph-ansible documentation to perform any required vars migrations before you upgrade.
  • The Designate V1 API has been removed, and cannot be enabled.

Security Issues

  • The PermitRootLogin setting in sshd_config changed from ‘yes’ to ‘prohibit-password’ in the containers. By default there is no password set in the containers, but the SSH public key from the deployment host is injected into the target nodes’ authorized_keys.
  • The following headers were added as additional default (and static) values: X-Content-Type-Options nosniff, X-XSS-Protection “1; mode=block”, and Content-Security-Policy “default-src ‘self’ https: wss:;”. Additionally, the X-Frame-Options header was added, defaulting to DENY. You may override the header via the keystone_x_frame_options variable.
  • Since we use neutron’s credentials to access the queue, security conscious people might want to set up an extra user for octavia on the neutron queue restricted to the topics octavia posts to.
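
    For the X-Frame-Options note in this list, an override might look like the following in user_variables.yml. This is a sketch that assumes the variable takes the header value directly; SAMEORIGIN is just one standard value for that header, chosen for illustration.

    # Assumption: the variable holds the literal header value.
    keystone_x_frame_options: SAMEORIGIN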

Bug Fixes

  • When the glance_enable_v2_registry variable is set to True the corresponding data_api setting is now correctly set. Previously it was not set and therefore the API service was not correctly informed that the registry was operating.
  • The os_tempest role was downloading images twice - once arbitrarily, and once to use for testing. This has been consolidated into a single download to a consistent location.
  • SELinux policy for neutron on CentOS 7 is now provided to fix SELinux AVCs that occur when neutron’s agents attempt to start daemons such as haproxy and dnsmasq.

Other Notes

  • openSUSE Leap 42.X support is still a work in progress and not fully tested beyond basic coverage in the OpenStack CI and individual manual testing. Even though backporting fixes to the Pike release will be done on a best effort basis, it’s advised to use the master branch when working on openSUSE hosts.
  • CentOS deployments require a special COPR repository for modern LXC packages. The COPR repository is not mirrored at this time and this causes failed gate tests and production deployments.

    The role now syncs the LXC packages down from COPR to each host and builds a local LXC package repository in /opt/thm-lxc2.0. This greatly reduces the number of times that packages must be downloaded from the COPR server during deployments, which will reduce failures until the packages can be hosted with a more reliable source.

    In addition, this should speed up playbook runs since yum can check a locally-hosted repository instead of a remote repository with availability and performance challenges.

  • Added support for specifying GID and UID for cinder system user by defining cinder_system_user_uid and cinder_system_group_gid. This setting is optional.
  • The variables nova_scheduler_use_baremetal_filters and nova_metadata_host have been removed, matching upstream nova changes. The nova_virt_types dict no longer needs the nova_scheduler_use_baremetal_filters and nova_firewall_driver keys as well.
  • The max_fail_percentage playbook option has been used with the default playbooks since the first release of the playbooks back in Icehouse. While the intention was to allow large-scale deployments to succeed in cases where a single node fails due to transient issues, this option has produced more problems than it solves. If a failure occurs that is transient in nature but is under the set failure percentage, the playbook will report a success, which can cause silent failures depending on where the failure happened. If a deployer finds themselves in this situation, the problems are then compounded because the tools will report there are no known issues. To ensure deployers have the best deployment experience and the most accurate information, a change has been made to remove the max_fail_percentage option from all of the default playbooks. The removal of this option has the side effect of requiring the deployer to skip specific hosts should one need to be omitted from a run, but has the benefit of eliminating silent, hard to track down, failures. To skip a failing host for a given playbook run, use the --limit '!$HOSTNAME' CLI switch for the specific run. Once the issues have been resolved for the failing host, rerun the specific playbook without the --limit option to ensure everything is in sync.
  • The use_neutron option was marked to be removed in sahara.
  • The vars plugin override_folder.py has been removed. With the move to Ansible 2.4 (https://review.openstack.org/#/c/522778) this plugin is no longer required. The functionality this plugin provided has been replaced with the native Ansible inventory plugin.
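
    For the cinder system user note in this list, a sketch of the optional overrides follows. The numeric IDs are hypothetical; pick values that match the ownership expected on your storage.

    # Hypothetical IDs, shown only to illustrate the two variables.
    cinder_system_user_uid: 1600
    cinder_system_group_gid: 1600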

Except where otherwise noted, this document is licensed under Creative Commons Attribution 3.0 License. See all OpenStack Legal Documents.