Contributing Cloud Test Resources

OpenStack utilizes a “project gating” system based on Zuul to ensure that every change proposed to any OpenStack project passes tests before being added to its source code repository. Each change may run several jobs which test the change in various configurations, and each job may run thousands of individual tests. To ensure the overall security of the system as well as isolation between unrelated changes, each job is run on an OpenStack compute instance that is created specifically to run that job and is destroyed and replaced immediately after completing that task.

This system operates across multiple OpenStack clouds, making the OpenStack project infrastructure itself a substantial and very public cross-cloud OpenStack application.

The compute instances used by this system are generously donated by organizations that are contributing to OpenStack, and the project is very appreciative of this.

You can see the system in action at any time by visiting https://zuul.openstack.org/.

There you’ll see every job that’s currently running, as well as graphs showing activity over time. Each of those jobs runs on its own compute instance. We create and destroy a large number of these instances each day, since most last for only about one hour.
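To give a feel for that churn, here is a minimal sketch that estimates how many instances are created per day from the number of concurrent instance slots and their average lifetime. The function name and parameters are hypothetical, and it assumes every slot is kept busy around the clock, which real load is not, so treat the result as an upper bound.

```python
def estimate_daily_instances(concurrent_slots: int, avg_lifetime_hours: float = 1.0) -> int:
    """Estimate instances created (and destroyed) per day.

    Hypothetical helper: assumes every slot is busy 24 hours a day and each
    instance is replaced after avg_lifetime_hours, so this is an upper bound.
    """
    if concurrent_slots < 0:
        raise ValueError("concurrent_slots must be non-negative")
    if avg_lifetime_hours <= 0:
        raise ValueError("avg_lifetime_hours must be positive")
    return round(concurrent_slots * 24 / avg_lifetime_hours)


if __name__ == "__main__":
    # 100 slots, each instance living about one hour
    print(estimate_daily_instances(100))  # 2400
```

Halving the average lifetime doubles the number of instances created, which is why API responsiveness matters as much as raw capacity.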

Having resources from more providers will help us continue to grow the project and deliver test results to developers quickly. OpenStack has long since become too complicated for developers to test effectively on their own, even in the most common configurations, so this process is very important to them.

If you have some capacity on an OpenStack cloud that you are able to contribute to the project, it would be a big help. This is what we need:

  • Nova and Glance APIs (with the ability to upload images)

  • A single instance with 500GB of disk (a Cinder volume is preferred; local disk is acceptable) per cloud region, for our region-local mirror

Each test instance requires:

  • 8GB RAM

  • 8 vCPUs at 2.4GHz (or more or fewer vCPUs, depending on clock speed)

  • A public IP address (IPv4 and/or IPv6)

  • 80GB of storage
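The "more or fewer vCPUs depending on speed" trade-off can be roughed out with a simple calculation. This sketch assumes throughput scales linearly with clock speed, which is a crude approximation that ignores CPU generation and architecture; the function name is hypothetical and the result is a starting point for discussion with the infrastructure team, not a rule.

```python
import math

# Baseline from the requirements list: 8 vCPUs at 2.4GHz.
BASELINE_VCPUS = 8
BASELINE_GHZ = 2.4


def equivalent_vcpus(clock_ghz: float) -> int:
    """Estimate vCPUs needed at clock_ghz to roughly match the baseline.

    Assumes throughput scales linearly with clock speed, which ignores CPU
    generation, architecture and turbo behaviour; a crude guide only.
    """
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    # Round before ceil so float noise (e.g. 5.9999999) does not add a vCPU.
    return math.ceil(round(BASELINE_VCPUS * BASELINE_GHZ / clock_ghz, 6))


if __name__ == "__main__":
    for ghz in (2.0, 2.4, 3.0):
        print(ghz, equivalent_vcpus(ghz))
```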

If our instances are segregated onto their own hypervisors, our usage patterns will make us our own noisy neighbors at the worst times, so it would be best to plan for little or no overcommitment. In an unsegregated public cloud, our jobs are distributed over a larger number of hypervisors, which allows for more overcommitment.

Since there’s a bit of setup and maintenance involved in adding a new provider, a minimum of 100 instances would be helpful.
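Putting the per-instance requirements and the suggested minimum together, this sketch totals the resources needed for a given number of concurrent test instances. The cpu_overcommit parameter is a hypothetical vCPU-to-host-CPU ratio (1.0 meaning no overcommitment, as suggested for segregated setups); RAM and disk are deliberately not overcommitted here, and the region-local mirror host is not included.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """Per-instance requirements from the list above (GB and vCPUs)."""
    ram_gb: int = 8
    vcpus: int = 8
    disk_gb: int = 80


def aggregate_capacity(count: int, spec: InstanceSpec = InstanceSpec(),
                       cpu_overcommit: float = 1.0) -> dict:
    """Total resources for `count` concurrently running test instances.

    cpu_overcommit is a hypothetical vCPU-to-host-CPU ratio: 1.0 means no
    overcommitment. RAM and disk are not overcommitted in this sketch, and
    the 500GB region-local mirror host is not included.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if cpu_overcommit <= 0:
        raise ValueError("cpu_overcommit must be positive")
    return {
        "ram_gb": count * spec.ram_gb,
        "host_cpus": math.ceil(count * spec.vcpus / cpu_overcommit),
        "disk_gb": count * spec.disk_gb,
        "public_ips": count,  # one public IPv4 and/or IPv6 address each
    }


if __name__ == "__main__":
    print(aggregate_capacity(100))
    # {'ram_gb': 800, 'host_cpus': 800, 'disk_gb': 8000, 'public_ips': 100}
```

For the suggested minimum of 100 instances, that is roughly 800GB of RAM, 800 host CPUs with no overcommitment, 8000GB of disk and 100 public addresses, plus the mirror host.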

Benefits to Contributors

Since we continuously use the OpenStack APIs and are familiar with how they should operate, we occasionally discover potential problems with contributing clouds before many of their other users (or occasionally even ops teams). In these cases, we work with contacts on their operations teams to let them know and try to help fix problems before they become an issue for their customers.

We collect numerous metrics about the performance of the clouds we utilize. From these metrics we create dashboards which are freely accessible via the Internet to help providers see and debug performance issues.

The names and regions of providers are a primary component of hostnames on job workers, and as such are noticeable to those reviewing job logs from our CI system (for example, developers investigating test results on proposed source code changes). In this way, the names of providers contributing test resources become known to the technical community in their day-to-day interaction with our systems.

The OpenStack Foundation has identified Infrastructure Donors as a special category of sponsoring organization and prominently identifies those contributing a significant quantity of resources (as determined by the Infra team) at: https://www.openstack.org/foundation/companies/#infra-donors

If this sounds interesting, and you have some capacity to spare, it would be very much appreciated. You are welcome to contact the Infrastructure team on our public mailing list at <service-discuss@lists.opendev.org>, or in our IRC channel, #opendev on OFTC.

Contribution Workflow

After discussing your welcome contribution with the infrastructure team, it will be time to build and configure the cloud.

Initial setup

We require two projects to be provisioned:

  • A zuul project for infrastructure testing nodes

  • A ci project for control-plane services

The zuul project will be used by nodepool for running the testing nodes. Note that there may be references in configuration to projects named with jenkins; although Jenkins is no longer used, some of the original clouds named their projects after the CI system in use at the time.

At a minimum, the ci project has the region-local mirror host(s) for the cloud’s region(s). This will be named mirror.<region>.<cloud>.openstack.org and all jobs running in the zuul project will be configured to use it as much as possible (this might influence choices you make in network setup, etc.). Depending on the resources available and with prior co-ordination with the provider, the infrastructure team may also run other services in this project such as webservers, file servers or nodepool builders.
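As a small illustration of the naming pattern above, this sketch builds the mirror hostname from a region and cloud name. The function name is hypothetical, and lower-casing plus strict DNS-label validation are assumptions made for the example; the infrastructure team chooses the actual names.

```python
import re

# A DNS label: lowercase letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
_DNS_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def mirror_hostname(region: str, cloud: str, domain: str = "openstack.org") -> str:
    """Return mirror.<region>.<cloud>.<domain> for a region-local mirror."""
    labels = [region.lower(), cloud.lower()]
    for label in labels:
        if not _DNS_LABEL.fullmatch(label):
            raise ValueError(f"not a valid DNS label: {label!r}")
    return ".".join(["mirror", *labels, domain])


if __name__ == "__main__":
    print(mirror_hostname("RegionOne", "examplecloud"))
    # mirror.regionone.examplecloud.openstack.org
```

Here "RegionOne" and "examplecloud" are placeholder names, not real providers.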

The exact project and user names are not particularly important; usually something like openstack[ci|zuul] is chosen. As described below, these will appear as openstackci-<provider> and openstackzuul-<provider> in various clouds.yaml configuration files. To minimise the potential for problems, it is probably best that the provided users do not have “admin” credentials; for example, the infrastructure team does not require any particular access to subnet or router configuration in the cloud, although where requested we are happy to help with this level of configuration. In some clouds that are private to OpenStack infra, admin permissions may nevertheless be granted (or an alternative user with such permissions made available) to help with various self-service troubleshooting.

Add cloud configuration

After creating the two projects and users, the configuration and authentication details need to be added to configuration management. The public portions can be proposed via the standard review process at any time, by anyone. The exact details of cloud configuration change from time to time; the best way to begin is to clone the system-config repository (i.e. this repo) with git clone https://opendev.org/opendev/system-config, then grep for an existing cloud (or look through git log for the most recently added cloud) and follow the pattern. After the review is posted, CI tests and reviewers will help with any issues.
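Running git grep -l with the name of an existing cloud inside the checkout is the simplest way to do this. For a scripted equivalent, here is a minimal sketch that lists the files in a checkout mentioning a given string; the function name and the optional suffix filter are hypothetical, and which files hold cloud configuration varies over time, so treat the hits as starting points to read.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_references(repo_root: str, needle: str,
                    suffixes: Optional[Iterable[str]] = None) -> list:
    """Return repo-relative paths of files under repo_root containing needle.

    Roughly like `git grep -l needle`; files inside .git and files that are
    not valid UTF-8 are skipped. If suffixes is given, only files with one of
    those suffixes (e.g. ".yaml") are searched.
    """
    root = Path(repo_root)
    wanted = tuple(suffixes) if suffixes is not None else None
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue
        if wanted is not None and path.suffix not in wanted:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue
        if needle in text:
            hits.append(rel.as_posix())
    return hits


if __name__ == "__main__":
    # Placeholder cloud name; substitute a cloud that already exists.
    print(find_references("system-config", "openstackzuul-examplecloud"))
```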

These details largely consist of the public portions of the openstackclient configuration format, such as the endpoint and version details. Note that we require https communication with Keystone. Self-signed certificates can be used if required; some non-commercial clouds use Let’s Encrypt, while others use their preferred CA.
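To illustrate the split between public and secret values, here is a sketch that takes a clouds.yaml-style entry (using the standard auth_url, username, password, project_name, region_name and identity_api_version keys), checks that Keystone is reached over https, and separates the password from the rest. The helper names are hypothetical, the Keystone URL is a placeholder, and treating only the password as secret is an assumption for the example; system-config's own templates and secret storage on bridge define the real split.

```python
from urllib.parse import urlparse

# Assumption: only the password is secret; everything else is public.
SECRET_AUTH_KEYS = {"password"}


def cloud_names(provider: str) -> tuple:
    """Return the ci and zuul cloud names used in clouds.yaml files."""
    return f"openstackci-{provider}", f"openstackzuul-{provider}"


def split_cloud_entry(entry: dict) -> tuple:
    """Split a clouds.yaml-style cloud entry into (public, secret) parts.

    Raises ValueError unless auth.auth_url uses https, since https
    communication with Keystone is required.
    """
    auth = entry.get("auth", {})
    if urlparse(auth.get("auth_url", "")).scheme != "https":
        raise ValueError("Keystone auth_url must use https")
    public = {key: value for key, value in entry.items() if key != "auth"}
    public["auth"] = {k: v for k, v in auth.items() if k not in SECRET_AUTH_KEYS}
    secret = {k: v for k, v in auth.items() if k in SECRET_AUTH_KEYS}
    return public, secret


if __name__ == "__main__":
    _, zuul_cloud = cloud_names("examplecloud")
    entry = {
        "region_name": "RegionOne",
        "identity_api_version": 3,
        "auth": {
            "auth_url": "https://keystone.example.com:5000/v3",
            "username": "openstackzuul",
            "password": "not-a-real-password",
            "project_name": "openstackzuul",
        },
    }
    public, secret = split_cloud_entry(entry)
    print({"clouds": {zuul_cloud: public}})
    print(sorted(secret))  # ['password']
```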

Once the public review is ready, the secret values it references need to be entered manually by an infra-root member into the secret storage on bridge.openstack.org. You can send these via GPG-encrypted mail to an infra-root member (ping infra-root in #opendev and someone will appear). Unless told otherwise, you can find their preferred key through the OpenStack signing key, which most of them have signed. If the passwords can be changed afterwards, plain text is also fine. With the secrets in place, the public review will be committed and the cloud will become active.

Once the cloud is active, bridge.openstack.org will regularly run ansible-role-cloud-launcher against it to configure keys, upload base images, set up security groups and so on.

Activate in nodepool

After the cloud is configured, it can be added as a resource for nodepool to use for testing nodes.

First, an infra-root member will need to create the region-local mirror server, configure any storage it requires and set up DNS. Once the mirror is active, the cloud is ready to start running testing nodes.

At this point, the cloud needs to be added to the nodepool configuration in project-config. Again, existing entries provide useful templates for the initial review proposal, which anyone can make. Some clouds provision particular flavors for CI nodes; these need to exist at this point, and their names are conveyed via the nodepool configuration. As before, CI checks and reviewers will help with any fine details.
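The shape of a nodepool OpenStack provider entry can be sketched as a plain Python dict using nodepool's cloud, region-name, diskimages, pools, max-servers, labels, flavor-name and min-ram settings. This is an illustration under stated assumptions, not the project-config layout: the function name, the single "main" pool, reusing image names as label names and the 8000 MB min-ram fallback are all choices made for the example, and existing entries in project-config remain the template to follow.

```python
from typing import List, Optional


def nodepool_provider(name: str, cloud: str, region: str, max_servers: int,
                      images: List[str], flavor_name: Optional[str] = None) -> dict:
    """Return a nodepool OpenStack provider entry as a plain dict.

    Labels reuse the image names (a simplification). If the cloud provides a
    dedicated CI flavor, pass flavor_name; otherwise min-ram (in MB) asks
    nodepool for a flavor with at least 8000 MB of RAM.
    """
    size = {"flavor-name": flavor_name} if flavor_name else {"min-ram": 8000}
    return {
        "name": name,
        "cloud": cloud,  # the clouds.yaml entry name
        "region-name": region,
        "diskimages": [{"name": image} for image in images],
        "pools": [{
            "name": "main",
            "max-servers": max_servers,
            "labels": [{"name": image, "diskimage": image, **size} for image in images],
        }],
    }


if __name__ == "__main__":
    # All names below are placeholders.
    print(nodepool_provider("examplecloud-regionone", "openstackzuul-examplecloud",
                            "RegionOne", 100, ["ubuntu-jammy"]))
```

Serialised to YAML, a dict like this mirrors the nested structure of a provider entry; a dedicated CI flavor, if the cloud provides one, would be passed as flavor_name instead of relying on min-ram.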

Once this is committed, nodepool will upload images into the new region and start running nodes automatically. Don’t forget to add the region to the Grafana configuration so that there is a dashboard for the region’s health.

Ongoing operation

If at any point the cloud needs to be disabled for maintenance, a review can be proposed to set max-servers to zero in the nodepool configuration. We usually propose a revert of this at the same time, with a negative Workflow vote, as a reminder to turn the cloud back on when appropriate. In an emergency, an infra-root member can bypass the normal review process and apply such a change by hand.
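The disable-and-revert pattern can be modelled on a provider entry held as a dict: one function zeroes max-servers in every pool and remembers the previous values, and another restores them, standing in for the revert change. Both function names are hypothetical; in practice these are edits to the nodepool configuration in project-config, made through review.

```python
import copy


def disable_provider(provider: dict) -> tuple:
    """Return (disabled_copy, saved) with max-servers set to 0 in every pool.

    saved maps pool name to its previous max-servers value (None if unset)
    so the change can be reverted later. The input is not modified.
    """
    disabled = copy.deepcopy(provider)
    saved = {}
    for pool in disabled.get("pools", []):
        saved[pool["name"]] = pool.get("max-servers")
        pool["max-servers"] = 0
    return disabled, saved


def restore_provider(disabled: dict, saved: dict) -> dict:
    """Undo disable_provider, restoring each pool's previous max-servers."""
    restored = copy.deepcopy(disabled)
    for pool in restored.get("pools", []):
        if pool["name"] not in saved:
            continue
        previous = saved[pool["name"]]
        if previous is None:
            pool.pop("max-servers", None)  # it was unset originally
        else:
            pool["max-servers"] = previous
    return restored


if __name__ == "__main__":
    provider = {"name": "examplecloud-regionone",
                "pools": [{"name": "main", "max-servers": 100}]}
    disabled, saved = disable_provider(provider)
    print(disabled["pools"][0]["max-servers"])  # 0
    print(restore_provider(disabled, saved) == provider)  # True
```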