Atom feed of this document
  
Juno -  Juno -  Juno -  Juno -  Juno -  Juno -  Juno -  Juno - 

 Cluster health

Use the swift-dispersion-report tool to measure overall cluster health. This tool checks if a set of deliberately distributed containers and objects are currently in their proper places within the cluster. For instance, a common deployment has three replicas of each object. The health of that object can be measured by checking if each replica is in its proper place. If only 2 of the 3 is in place the object’s health can be said to be at 66.66%, where 100% would be perfect. A single object’s health, especially an older object, usually reflects the health of that entire partition the object is in. If you make enough objects on a distinct percentage of the partitions in the cluster,you get a good estimate of the overall cluster health. In practice, about 1% partition coverage seems to balance well between accuracy and the amount of time it takes to gather results. The first thing that needs to be done to provide this health value is create a new account solely for this usage. Next, you need to place the containers and objects throughout the system so that they are on distinct partitions. The swift-dispersion-populate tool does this by making up random container and object names until they fall on distinct partitions. Last, and repeatedly for the life of the cluster, you must run the swift-dispersion-report tool to check the health of each of these containers and objects. These tools need direct access to the entire cluster and to the ring files (installing them on a proxy server suffices). The swift-dispersion-populate and swift-dispersion-report commands both use the same configuration file, /etc/swift/dispersion.conf. Example dispersion.conf file:

[dispersion]
auth_url = http://localhost:8080/auth/v1.0
auth_user = test:tester
auth_key = testing

There are also configuration options for specifying the dispersion coverage, which defaults to 1%, retries, concurrency, and so on. However, the defaults are usually fine. Once the configuration is in place, run swift-dispersion-populate to populate the containers and objects throughout the cluster. Now that those containers and objects are in place, you can run swift-dispersion-report to get a dispersion report, or the overall health of the cluster. Here is an example of a cluster in perfect health:

$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 19s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space

Queried 2619 objects for dispersion reporting, 7s, 0 retries
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space

Now, deliberately double the weight of a device in the object ring (with replication turned off) and re-run the dispersion report to show what impact that has:

$ swift-ring-builder object.builder set_weight d0 200
$ swift-ring-builder object.builder rebalance
...
$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 8s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space

Queried 2619 objects for dispersion reporting, 7s, 0 retries
There were 1763 partitions missing one copy.
77.56% of object copies found (6094 of 7857)
Sample represents 1.00% of the object partition space

You can see the health of the objects in the cluster has gone down significantly. Of course, this test environment has just four devices, in a production environment with many devices the impact of one device change is much less. Next, run the replicators to get everything put back into place and then rerun the dispersion report:

... start object replicators and monitor logs until they're caught up ...
$ swift-dispersion-report
Queried 2621 containers for dispersion reporting, 17s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space

Queried 2619 objects for dispersion reporting, 7s, 0 retries
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space

Alternatively, the dispersion report can also be output in JSON format. This allows it to be more easily consumed by third-party utilities:

$ swift-dispersion-report -j
{"object": {"retries:": 0, "missing_two": 0, "copies_found": 7863, "missing_one": 0,
"copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "missing_all": 0}, "container":
{"retries:": 0, "missing_two": 0, "copies_found": 12534, "missing_one": 0, "copies_expected":
12534, "pct_found": 100.0, "overlapping": 15, "missing_all": 0}}

Table 9.75. Description of configuration options for [dispersion] in dispersion.conf
Configuration option = Default value Description
auth_key = testing No help text available for this option.
auth_url = http://localhost:8080/auth/v1.0 Endpoint for auth server, such as keystone
auth_user = test:tester Default user for dispersion in this context
auth_version = 1.0 Indicates which version of auth
concurrency = 25 Number of replication workers to spawn
container_populate = yes No help text available for this option.
container_report = yes No help text available for this option.
dispersion_coverage = 1.0 No help text available for this option.
dump_json = no No help text available for this option.
endpoint_type = publicURL Indicates whether endpoint for auth is public or internal
keystone_api_insecure = no Allow accessing insecure keystone server. The keystone's certificate will not be verified.
object_populate = yes No help text available for this option.
object_report = yes No help text available for this option.
retries = 5 No help text available for this option.
swift_dir = /etc/swift Swift configuration directory

Questions? Discuss on ask.openstack.org
Found an error? Report a bug against this page

loading table of contents...