Rocky Series Release Notes
The Vitrage Rocky release contains significant infrastructure changes that bring a lot of value to the end user. The main ones are:
Graph fast-failover and better HA support.
High-scale support. The graph was tested to work with over 100,000 entities.
Alarm and RCA history.
In addition, we added Kubernetes and Prometheus datasources.
The Alarm and RCA History feature allows saving and querying historical alarms and exploring their root cause. A new set of parameters in the alarm list API and a new history API allow users to query the data saved in the Vitrage schema in the DB.
Added support for more Aodh alarm types: composite, gnocchi_aggregation_by_metrics_threshold and gnocchi_aggregation_by_resources_threshold.
High availability with an active-standby vitrage-graph is better supported. A fast fail-over is implemented by storing all the required in-memory state data in MySQL. Vitrage-graph initializes quickly upon fail-over without requesting any updates.
Added a new datasource for a Kubernetes cluster running as a workload on OpenStack. We support Kubernetes on top of Nova.
A Prometheus datasource was added to handle alerts coming from Prometheus. Prometheus is an open-source systems monitoring and alerting toolkit, with exporters that export different metrics to Prometheus and an Alertmanager that handles alerts sent by the Prometheus server.
Support for graphs with more than 100,000 vertices has been added and tested. See the high-scale configuration document.
As part of Rocky fast-failover support, vitrage-graph is now reloaded from the database. This causes an issue with datasources using caches that can become outdated after vitrage-graph restart, or if more than one vitrage-collector is used. Please avoid running multiple vitrage-collector services.
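The reload-from-database idea behind fast fail-over can be pictured with a small sketch. This is not Vitrage's implementation: it uses the standard-library `sqlite3` module as a stand-in for MySQL, and the table name `graph_snapshot` and the functions `init_schema`, `save_graph` and `load_graph` are hypothetical names chosen for illustration only.

```python
import json
import sqlite3


def init_schema(conn):
    """Create the (hypothetical) single-row snapshot table if missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS graph_snapshot "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), payload TEXT NOT NULL)"
    )
    conn.commit()


def save_graph(conn, vertices, edges):
    """Persist the whole in-memory graph as one JSON snapshot row."""
    payload = json.dumps({"vertices": vertices, "edges": edges})
    # Keep a single row: each save replaces the previous snapshot.
    conn.execute(
        "INSERT OR REPLACE INTO graph_snapshot (id, payload) VALUES (1, ?)",
        (payload,),
    )
    conn.commit()


def load_graph(conn):
    """Rebuild the graph from the snapshot, or return None if none exists."""
    row = conn.execute(
        "SELECT payload FROM graph_snapshot WHERE id = 1"
    ).fetchone()
    if row is None:
        return None
    data = json.loads(row[0])
    # JSON has no tuples, so edges are converted back after loading.
    return data["vertices"], [tuple(edge) for edge in data["edges"]]
```

In this sketch a standby process would call `load_graph` once on takeover and continue from the stored state instead of asking every datasource for a full update. It also shows why a datasource-side cache can go stale: the cache is not part of the stored snapshot.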
Added support for NetworkX version 2.1.
Added a command-line tool that serves as a scaffold for creating a new datasource.
Added a new Mock datasource, which can mock an entire graph and allows testing large-scale stability as well as performance.
The collector service was changed to run on demand instead of periodically, hence it can now be run in active-active mode. This is part of a larger design to improve high availability.
Oslo service was replaced by cotyledon, so Vitrage uses real threads and multiprocessing. This change removes unnecessary complications of using eventlets and timers.
Created a dedicated process for the API handler, to better handle API calls under stress.
Support get_changes in the static datasource
The static datasource now supports changes in existing YAML files and updates the graph accordingly.
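One way to picture change detection over a directory of static YAML files is to compare content hashes between two scans. The sketch below only illustrates that general technique and is not the static datasource's actual `get_changes` code; the function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(directory):
    """Map each YAML file name in the directory to a hash of its content."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(directory).glob("*.yaml"))
    }


def diff_snapshots(old, new):
    """Return (added, removed, modified) file names between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # A file counts as modified when it exists in both scans with different hashes.
    modified = sorted(
        name for name in set(old) & set(new) if old[name] != new[name]
    )
    return added, removed, modified
```

A caller would keep the previous snapshot, take a new one on each scan, and reprocess only the files reported as added or modified.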
Many bug fixes related to performance and stability.