Database

Introduction

The general recommendation is to run a galera cluster (using MySQL or MariaDB) for your database. Its active-active features allow for fast and easy failover.

Note that OpenStack does not play well with writing to multiple galera nodes in parallel; see the config recommendation below.

One database to rule them all?

You can consider deploying the database in two ways:

  • one galera cluster for each OpenStack service

  • one single big galera cluster for all OpenStack services

The recommendation is to split your databases into separate galera clusters, for multiple reasons:

  • Reduce the impact when a galera cluster is down

  • Allow interventions on a smaller part of the infrastructure

Also, there is no benefit to colocating multiple services on the same galera cluster.

Config recommendation

This section is split into three parts:

  • the configuration for galera itself

  • the configuration for the reverse proxy in front of galera

  • the configuration for the OpenStack services

Galera configuration

All of these settings need to be consistent on all nodes of the galera cluster.

Note that this guide does not include the general requirements to get the galera cluster set up in the first place. For this please see https://mariadb.com/kb/en/getting-started-with-mariadb-galera-cluster/

General health configs

[mysqld]
max_connections=4000
max_statement_time=20

In order to ensure your cluster runs smoothly, we recommend limiting the number of connections and the time statements are allowed to run.

The value for max_connections should be set based on actual tests (testing with a lot of idle connections is fine).

The value of 20 seconds for max_statement_time is enough for all normal use-cases we know of. You might only run into issues with regular Nova cleanup jobs if they do not run often enough.
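To check how close a running node gets to these limits, you can look at the corresponding status counters. This is a minimal check assuming a MariaDB node (max_statement_time and its Max_statement_time_exceeded counter are MariaDB-specific):

SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL STATUS LIKE 'Max_statement_time_exceeded';

If Max_used_connections approaches max_connections, or the exceeded counter keeps growing, revisit the values above.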

Replication stability

[galera]
wsrep_provider_options=gcomm.thread_prio=rr:2;gcs.fc_limit=160;gcs.fc_factor=0.8;gcache.size=2G

When you have a large number of connections to your galera cluster, these connections might starve the galera replication thread. If the replication thread does not get enough CPU time, the galera cluster will lose its members and break.

The thread_prio option sets the replication thread to real-time round-robin scheduling (priority 2) on the kernel side. If you run galera as a non-privileged user (as you hopefully do), galera will need CAP_SYS_NICE in order to be allowed to change the priority. If you run inside a container environment, you might need to set kernel.sched_rt_runtime_us=-1 (although that is suboptimal).
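As an illustration, if MariaDB runs under systemd, a drop-in override along the following lines can grant that capability. The unit name mariadb.service and the override file path are assumptions for this sketch, adapt them to your deployment:

# /etc/systemd/system/mariadb.service.d/galera-rt.conf
[Service]
AmbientCapabilities=CAP_SYS_NICE

After adding the override, run systemctl daemon-reload and restart the service. The kernel.sched_rt_runtime_us sysctl mentioned above is set on the host (for example via /etc/sysctl.d/), not in the unit file.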

Performance

[mysqld]
tmp_table_size=64M
max_heap_table_size=64M
optimizer_switch=derived_merge=off

The temporary table sizes and the derived_merge optimizer switch are important settings if you have a large number of Neutron RBAC rules.

Reverse Proxy Configuration

You will need to run a reverse proxy in front of your galera cluster to ensure OpenStack only ever communicates with a single cluster node for write requests. This is required because OpenStack does not cope well with the deadlocks that occur when writing to different nodes in parallel.

If you choose to run haproxy for this, you can use something like the following config:

defaults
  timeout client 300s

listen db_master
  bind 0.0.0.0:3306
  balance first
  option mysql-check
  server server-1 server-1.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 1
  server server-2 server-2.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup id 2
  server server-3 server-3.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster backup id 3

listen db_slave
  bind 0.0.0.0:3308
  balance roundrobin
  option mysql-check
  server server-1 server-1.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 3 backup
  server server-2 server-2.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 1 weight 10
  server server-3 server-3.with.the.fqdn check inter 5s downinter 15s fastinter 2s resolvers cluster id 2 weight 10

By using two blocks, we can separate the read-only SQL requests (listening on port 3308 here) from the read/write requests (listening on port 3306 here) and slightly lower the load on the first (master) MySQL backend.

You should note the timeout client setting here, as it is relevant to the OpenStack configuration.

OpenStack Configuration

Database Connection Settings

The database settings are normally in the [database] section of each service's configuration. You should set the following:

connection = mysql+pymysql://login:pass@proxy:3306/db?charset=utf8
slave_connection = mysql+pymysql://login:pass@proxy:3308/db?charset=utf8
connection_recycle_time = 280
max_pool_size = 15
max_overflow = 25

The connection is used by OpenStack services to do read and write requests.

The slave_connection is used by OpenStack services to do read-only requests.

The connection_recycle_time should be 5% to 10% smaller than the timeout client in the reverse proxy. This ensures connections are recreated on the OpenStack side before the reverse proxy forces them to terminate.

The max_pool_size and max_overflow settings define the number of connections an individual service worker is allowed to open (up to max_pool_size + max_overflow, so 40 with the values above). You will need to set this based on experience (although the above should be a good start).

Database cleanup

Nova and Cinder use soft deletes inside their database. This means deleted entries remain in the database and only have a deleted flag set.

In order to prevent the database tables from growing forever, these deleted entries need to be removed regularly. For this you can use the Nova and Cinder database cleanup commands.
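As a sketch, the usual invocations look like the following; the purge age of 30 days is only an example value, adjust it to your retention needs:

nova-manage db archive_deleted_rows --until-complete
cinder-manage db purge 30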

If you have never run these cleanups before (or if your environment has a high number of resources being deleted), you might run into a timeout due to the max_statement_time on the database cluster. To work around this, the nova-manage commands support a --max-rows argument. For Cinder you might need to run the SQL statements manually and add a LIMIT 1000 clause to them (the statements are part of the error output of the command).