Partitioned Consistent Hash Ring

Ring

class swift.common.ring.ring.Ring(serialized_path, reload_time=15, ring_name=None, validation_hook=<function <lambda>>)

Bases: object

Partitioned consistent hashing ring.

Parameters:
  • serialized_path – path to serialized RingData instance
  • reload_time – time interval in seconds to check for a ring change
  • ring_name – ring name string (normally derived from the storage policy)
  • validation_hook – hook point to validate the ring configuration at load time
Raises:

RingLoadError if the loaded ring data violates its constraints

devs

devices in the ring

get_more_nodes(part)

Generator to get extra nodes for a partition for hinted handoff.

The handoff nodes will try to be in zones other than the primary zones, will take into account the device weights, and will usually keep the same sequences of handoffs even with ring changes.

Parameters: part – partition to get handoff nodes for
Returns: generator of node dicts

See get_nodes() for a description of the node dicts.
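
The zone preference described above can be illustrated with a much simplified generator. This is only a sketch and not Swift's actual handoff algorithm: it ignores device weights, and the function name, the rotation by partition number, and the assumption that devs is indexed by device id are all hypothetical.

```python
def sketch_more_nodes(devs, primary_ids, part):
    """Yield handoff candidates, preferring zones the primaries do not use.

    Hypothetical simplification: weights are ignored and the starting
    offset is derived from the partition number so the order is stable.
    """
    used_ids = set(primary_ids)
    used_zones = {devs[i]["zone"] for i in primary_ids}
    start = part % len(devs)
    order = devs[start:] + devs[:start]

    # First pass: devices in zones that hold no copy yet.
    for dev in order:
        if dev["id"] not in used_ids and dev["zone"] not in used_zones:
            used_ids.add(dev["id"])
            used_zones.add(dev["zone"])
            yield dev

    # Second pass: any remaining device, regardless of zone.
    for dev in order:
        if dev["id"] not in used_ids:
            used_ids.add(dev["id"])
            yield dev


devs = [{"id": i, "zone": z} for i, z in enumerate([1, 1, 2, 2, 3, 3])]
handoffs = list(sketch_more_nodes(devs, primary_ids=[0, 2], part=1))
```

Because it is a generator, a caller can stop after the first few handoffs without computing the rest.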

get_nodes(account, container=None, obj=None)

Get the partition and nodes for an account/container/object. If a node is responsible for more than one replica, it will only appear in the output once.

Parameters:
  • account – account name
  • container – container name
  • obj – object name
Returns:

a tuple of (partition, list of node dicts)

Each node dict will have at least the following keys:

  • id – unique integer identifier amongst devices
  • index – offset into the primary node list for the partition
  • weight – a float of the relative weight of this device as compared to others; this indicates how many partitions the builder will try to assign to this device
  • zone – integer indicating which zone the device is in; a given partition will not be assigned to multiple devices within the same zone
  • ip – the IP address of the device
  • port – the TCP port of the device
  • device – the device’s name on disk (sdb1, for example)
  • meta – general use ‘extra’ field; for example: the online date, the hardware description
get_part(account, container=None, obj=None)

Get the partition for an account/container/object.

Parameters:
  • account – account name
  • container – container name
  • obj – object name
Returns:

the partition number
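
A minimal sketch of how a path can be mapped to a partition number, assuming the partition is taken from the top bits of an MD5 digest and that part_shift equals 32 minus the partition power. The function name is hypothetical, and any cluster-specific hash prefix or suffix is deliberately omitted, so the numbers will not match a real cluster.

```python
import hashlib
import struct


def sketch_get_part(part_power, account, container=None, obj=None):
    """Map an account/container/object path to a partition number."""
    path = "/" + "/".join(p for p in (account, container, obj) if p)
    digest = hashlib.md5(path.encode("utf-8")).digest()
    part_shift = 32 - part_power  # assumption: 32-bit hash prefix
    # Keep only the top part_power bits of the first four digest bytes.
    return struct.unpack_from(">I", digest)[0] >> part_shift


part = sketch_get_part(10, "AUTH_test", "photos", "cat.jpg")
same = sketch_get_part(10, "AUTH_test", "photos", "cat.jpg")
```

The same path always yields the same partition, which is what lets every node locate data without a central lookup.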

get_part_nodes(part)

Get the nodes that are responsible for the partition. If one node is responsible for more than one replica of the same partition, it will only appear in the output once.

Parameters: part – partition to get nodes for
Returns: list of node dicts

See get_nodes() for a description of the node dicts.
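
The duplicate filtering can be sketched as follows, assuming replica2part2dev_id is a list with one row per replica, each row mapping partition number to device id, and that devs is indexed by device id. The function name is hypothetical.

```python
def sketch_part_nodes(replica2part2dev_id, devs, part):
    """Return the distinct devices holding a partition, in replica order."""
    seen = set()
    nodes = []
    for part2dev_id in replica2part2dev_id:
        if part >= len(part2dev_id):
            continue  # a partial replica row may not cover this partition
        dev_id = part2dev_id[part]
        if dev_id not in seen:
            seen.add(dev_id)
            # Copy the device dict and record its offset in the node list.
            nodes.append(dict(devs[dev_id], index=len(nodes)))
    return nodes


devs = [{"id": 0, "zone": 1}, {"id": 1, "zone": 2}, {"id": 2, "zone": 3}]
replica2part2dev_id = [[0, 1], [1, 1], [2, 0]]
nodes_p0 = sketch_part_nodes(replica2part2dev_id, devs, 0)
nodes_p1 = sketch_part_nodes(replica2part2dev_id, devs, 1)
```

Partition 1 is assigned to device 1 twice in this example, so only two nodes are returned for it.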

has_changed()

Check to see if the ring on disk is different than the current one in memory.

Returns: True if the ring on disk has changed, False otherwise
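
One plausible way to implement such a check is to compare the file's modification time with the one recorded at load time. The class below is a hypothetical sketch under that assumption, not the Ring class itself.

```python
import os
import tempfile


class MtimeWatcher:
    """Remember a file's mtime and report when it differs later."""

    def __init__(self, path):
        self.path = path
        self._mtime = os.path.getmtime(path)

    def has_changed(self):
        return os.path.getmtime(self.path) != self._mtime


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"ring data")
path = f.name

watcher = MtimeWatcher(path)
unchanged = watcher.has_changed()
# Simulate a new ring being written by moving the mtime forward.
os.utime(path, (watcher._mtime, watcher._mtime + 10))
changed = watcher.has_changed()
os.remove(path)
```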
partition_count

Number of partitions in the ring.

replica_count

Number of replicas (full or partial) used in the ring.

class swift.common.ring.ring.RingData(replica2part2dev_id, devs, part_shift)

Bases: object

Partitioned consistent hashing ring data (used for serialization).

classmethod deserialize_v1(gz_file, metadata_only=False)

Deserialize a v1 ring file into a dictionary with devs, part_shift, and replica2part2dev_id keys.

If the optional kwarg metadata_only is True, then the replica2part2dev_id is not loaded and that key in the returned dictionary just has the value [].

Parameters:
  • gz_file (file) – An opened file-like object which has already consumed the 6 bytes of magic and version.
  • metadata_only (bool) – If True, only load devs and part_shift
Returns:

A dict containing devs, part_shift, and replica2part2dev_id
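
The sketch below illustrates the calling convention described above: one step consumes the 6-byte header, and a second step reads the metadata from the same file object. The 4-byte magic plus 2-byte version split, the magic value, and the length-prefixed JSON layout are assumptions made for illustration; only the 6-byte total comes from the description.

```python
import gzip
import io
import json
import struct


def write_sketch(buf, meta):
    """Write a header followed by length-prefixed JSON metadata."""
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        gz.write(struct.pack("!4sH", b"R1NG", 1))  # 6 bytes in total
        payload = json.dumps(meta).encode("ascii")
        gz.write(struct.pack("!I", len(payload)))
        gz.write(payload)


def read_header(gz_file):
    """Consume the 6 bytes of magic and version."""
    return struct.unpack("!4sH", gz_file.read(6))


def sketch_deserialize_metadata(gz_file):
    """Read metadata only; the assignment table is left as []."""
    (length,) = struct.unpack("!I", gz_file.read(4))
    meta = json.loads(gz_file.read(length))
    meta["replica2part2dev_id"] = []
    return meta


buf = io.BytesIO()
write_sketch(buf, {"devs": [], "part_shift": 22})
buf.seek(0)
gz = gzip.GzipFile(fileobj=buf, mode="rb")
magic, version = read_header(gz)
meta = sketch_deserialize_metadata(gz)
```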

classmethod load(filename, metadata_only=False)

Load ring data from a file.

Parameters:
  • filename – Path to a file serialized by the save() method.
  • metadata_only (bool) – If True, only load devs and part_shift.
Returns:

A RingData instance containing the loaded data.

save(filename, mtime=1300507380.0)

Serialize this RingData instance to disk.

Parameters:
  • filename – File into which this instance should be serialized.
  • mtime – modification time written into the gzip header; the default is a fixed timestamp, pass None to record the current time instead
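
The effect of a fixed mtime can be seen with the standard gzip module alone: compressing the same bytes with the same mtime gives byte-identical output, whereas mtime=None would embed the current time. The helper name is hypothetical; the timestamp is the default from the signature above.

```python
import gzip
import io


def gzip_bytes(data, mtime):
    """Compress data in memory with an explicit gzip header mtime."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


first = gzip_bytes(b"ring data", 1300507380.0)
second = gzip_bytes(b"ring data", 1300507380.0)
# Bytes 4..7 of a gzip stream hold the mtime, little-endian.
header_mtime = int.from_bytes(first[4:8], "little")
```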
serialize_v1(file_obj)
to_dict()

Ring Builder

class swift.common.ring.builder.RingBuilder(part_power, replicas, min_part_hours)

Bases: object

Used to build swift.common.ring.RingData instances to be written to disk and used with swift.common.ring.Ring instances. See bin/swift-ring-builder for example usage.

The instance variable devs_changed indicates if the device information has changed since the last balancing. This can be used by tools to know whether a rebalance request is an isolated request or due to added, changed, or removed devices.

Parameters:
  • part_power – number of partitions = 2**part_power.
  • replicas – number of replicas for each partition
  • min_part_hours – minimum number of hours between partition changes
add_dev(dev)

Add a device to the ring. This device dict should have a minimum of the following keys:

  • id – unique integer identifier amongst devices. Defaults to the next id if the ‘id’ key is not provided in the dict
  • weight – a float of the relative weight of this device as compared to others; this indicates how many partitions the builder will try to assign to this device
  • region – integer indicating which region the device is in
  • zone – integer indicating which zone the device is in; a given partition will not be assigned to multiple devices within the same (region, zone) pair if there is any alternative
  • ip – the IP address of the device
  • port – the TCP port of the device
  • device – the device’s name on disk (sdb1, for example)
  • meta – general use ‘extra’ field; for example: the online date, the hardware description

Note

This will not rebalance the ring immediately as you may want to make multiple changes for a single rebalance.

Parameters: dev – device dict
Returns: id of device (not used in the tree anymore, but unknown users may depend on it)
change_min_part_hours(min_part_hours)

Changes the value used to decide if a given partition can be moved again. This restriction is to give the overall system enough time to settle a partition to its new location before moving it to yet another location. While no data would be lost if a partition is moved several times quickly, it could make that data unreachable for a short period of time.

This should be set to at least the average full partition replication time. Starting it at 24 hours and then lowering it to what the replicator reports as the longest partition cycle is best.

Parameters: min_part_hours – new value for min_part_hours
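
The restriction can be pictured as a per-partition counter of hours since the last move. The sketch below assumes such a counter exists; the function and argument names are hypothetical.

```python
def movable_parts(hours_since_move, min_part_hours):
    """Return the partitions that have rested long enough to move again."""
    return [
        part
        for part, hours in enumerate(hours_since_move)
        if hours >= min_part_hours
    ]


# Partitions 0 and 1 moved recently; partitions 2 and 3 have settled.
eligible = movable_parts([0, 5, 24, 255], min_part_hours=24)
# Lowering the limit frees more partitions for the next rebalance.
eligible_low = movable_parts([0, 5, 24, 255], min_part_hours=4)
```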
copy_from(builder)

Reinitializes this RingBuilder instance from data obtained from the builder dict given. Code example:

b = RingBuilder(1, 1, 1)  # Dummy values
b.copy_from(builder)

This is to restore a RingBuilder that has had its b.to_dict() previously saved.

debug(*args, **kwds)

Temporarily enables debug logging, useful in tests, e.g.

with rb.debug():
    rb.rebalance()
classmethod from_dict(builder_data)
get_balance()

Get the balance of the ring. The balance value is the largest percentage by which any device’s partition count deviates from the number of partitions it wants. For instance, if the “worst” device wants (based on its weight relative to the sum of all the devices’ weights) 123 partitions and it has 124 partitions, the balance value would be 0.81 (1 extra / 123 wanted * 100 for percentage).

Returns: balance of the ring
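
The arithmetic can be sketched as below, using the 123-wanted, 124-assigned device from the description as the worst device. The function names and the (wanted, assigned) tuple layout are hypothetical.

```python
def device_skew(wanted, assigned):
    """Percentage by which a device deviates from its desired count."""
    return abs(assigned - wanted) / wanted * 100.0


def sketch_balance(devices):
    """Balance is the skew of the worst device."""
    return max(device_skew(wanted, assigned) for wanted, assigned in devices)


# (wanted, assigned) pairs; the first device is the worst one.
balance = sketch_balance([(123, 124), (246, 245)])
```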
get_part_devices(part)

Get the devices that are responsible for the partition, filtering out duplicates.

Parameters: part – partition to get devices for
Returns: list of device dicts
get_required_overload(weighted=None, wanted=None)

Returns the minimum overload value required to make the ring maximally dispersed.

The required overload is the largest percentage change of any single device from its weighted replicanth to its wanted replicanth needed to achieve dispersion (note: underweighted devices have a negative percentage change). That is to say, a single device that must be overloaded by 5% is worse than 5 devices in a single tier overloaded by 1%.
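
A sketch of that calculation, assuming weighted and wanted are dicts keyed by device id that give each device's share of replicas, and that the result is expressed as a fraction (0.1 meaning 10%). The function name and the sample values are hypothetical.

```python
def sketch_required_overload(weighted, wanted):
    """Largest relative increase any device needs over its weighted share."""
    worst = 0.0
    for dev_id, share in weighted.items():
        if share <= 0:
            continue  # devices with no weighted share cannot be scaled
        change = (wanted[dev_id] - share) / share
        worst = max(worst, change)
    return worst


# Device 0 is heavier than dispersion allows; devices 1 and 2 must take more.
weighted = {0: 1.2, 1: 0.9, 2: 0.9}
wanted = {0: 1.0, 1: 1.0, 2: 1.0}
overload = sketch_required_overload(weighted, wanted)
```

Device 0 has a negative change here and so does not influence the result.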

get_ring()

Get the ring, or more specifically, the swift.common.ring.RingData. This ring data is the minimum required for use of the ring. The ring builder itself keeps additional data such as when partitions were last moved.

increase_partition_power()

Increases ring partition power by one.

Devices will be assigned to partitions like this:

OLD: 0, 3, 7, 5, 2, 1, ...
NEW: 0, 0, 3, 3, 7, 7, 5, 5, 2, 2, 1, 1, ...
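
The mapping shown above amounts to repeating each entry twice, so that new partitions 2p and 2p+1 inherit the device of old partition p. A minimal sketch for a single replica row (the function name is hypothetical):

```python
def double_partitions(part2dev_id):
    """Repeat each device id so every old partition becomes two new ones."""
    return [dev_id for dev_id in part2dev_id for _ in range(2)]


old = [0, 3, 7, 5, 2, 1]
new = double_partitions(old)
```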

classmethod load(builder_file, open=<built-in function open>)

Load a RingBuilder instance from the given builder file.

Parameters: builder_file – path to builder file to load
Returns: RingBuilder instance
min_part_seconds_left

Get the total seconds until a rebalance can be performed

pretend_min_part_hours_passed()

Override min_part_hours by marking all partitions as having been moved 255 hours ago and last move epoch to ‘the beginning of time’. This can be used to force a full rebalance on the next call to rebalance.

rebalance(seed=None)

Rebalance the ring.

This is the main work function of the builder, as it will assign and reassign partitions to devices in the ring based on weights, distinct zones, recent reassignments, etc.

The process doesn’t always assign partitions perfectly, since doing so would take a lot more analysis and therefore a lot more time. Because of this, it keeps rebalancing until the device skew (number of partitions a device wants compared to what it has) gets below 1% or doesn’t change by more than 1% (which only happens with a ring that can’t be balanced no matter what).

Returns: (number_of_partitions_altered, resulting_balance, number_of_removed_devices)
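
The stopping rule can be sketched as a loop around a single rebalancing pass. Here step is a hypothetical callable returning (partitions_moved, balance) for one pass, and max_rounds is an extra safety guard added for the sketch.

```python
def sketch_rebalance(step, max_rounds=100):
    """Repeat passes until balance is under 1% or stops improving by 1%."""
    changed = 0
    balance = None
    last_balance = None
    for _ in range(max_rounds):
        moved, balance = step()
        changed += moved
        if balance < 1:
            break  # good enough
        if last_balance is not None and abs(last_balance - balance) < 1:
            break  # no longer improving; the ring cannot be balanced further
        last_balance = balance
    return changed, balance


converging = iter([(100, 12.0), (40, 4.5), (10, 0.6)])
result_ok = sketch_rebalance(lambda: next(converging))

stuck = iter([(50, 30.0), (5, 29.8), (1, 29.7)])
result_stuck = sketch_rebalance(lambda: next(stuck))
```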
remove_dev(dev_id)

Remove a device from the ring.

Note

This will not rebalance the ring immediately as you may want to make multiple changes for a single rebalance.

Parameters: dev_id – device id
save(builder_file)

Serialize this RingBuilder instance to disk.

Parameters: builder_file – path to builder file to save
search_devs(search_values)

Search devices by parameters.

Parameters: search_values – a dictionary with search values to filter devices; supported parameters are id, region, zone, ip, port, replication_ip, replication_port, device, weight, meta
Returns: list of device dicts
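
A simplified sketch of this kind of filtering, assuming every key in search_values must match exactly. The function name is hypothetical, and the real method may treat some fields, such as meta, differently.

```python
def sketch_search_devs(devs, search_values):
    """Return the devices whose fields equal every given search value."""
    return [
        dev
        for dev in devs
        if dev is not None
        and all(dev.get(key) == value for key, value in search_values.items())
    ]


devs = [
    {"id": 0, "region": 1, "zone": 1, "ip": "10.0.0.1", "device": "sdb1"},
    {"id": 1, "region": 1, "zone": 2, "ip": "10.0.0.2", "device": "sdb1"},
    None,  # a removed device may leave a hole in the list
    {"id": 3, "region": 2, "zone": 1, "ip": "10.0.0.3", "device": "sdc1"},
]
in_region_1 = sketch_search_devs(devs, {"region": 1})
one_disk = sketch_search_devs(devs, {"zone": 1, "device": "sdc1"})
```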
set_dev_weight(dev_id, weight)

Set the weight of a device. This should be called rather than just altering the weight key in the device dict directly, as the builder will need to rebuild some internal state to reflect the change.

Note

This will not rebalance the ring immediately as you may want to make multiple changes for a single rebalance.

Parameters:
  • dev_id – device id
  • weight – new weight for device
set_overload(overload)
set_replicas(new_replica_count)

Changes the number of replicas in this ring.

If the new replica count is sufficiently different that self._replica2part2dev will change size, sets self.devs_changed. This is so tools like bin/swift-ring-builder can know to write out the new ring rather than bailing out due to lack of balance change.

to_dict()

Returns a dict that can be used later with copy_from to restore a RingBuilder. swift-ring-builder uses this to pickle.dump the dict to a file and later load that dict into copy_from.

validate(stats=False)

Validate the ring.

This is a safety function to try to catch any bugs in the building process. It ensures partitions have been assigned to real devices, aren’t doubly assigned, etc. It can also optionally check the even distribution of partitions across devices.

Parameters: stats – if True, check distribution of partitions across devices
Returns: if stats is True, a tuple of (device_usage, worst_stat), else (None, None). device_usage[dev_id] will equal the number of partitions assigned to that device. worst_stat will equal the number of partitions the worst device is skewed from the number it should have.
Raises: RingValidationError – problem was found with the ring.
weight_of_one_part()

Returns the weight of each partition as calculated from the total weight of all the devices.
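
One reading of this value is the number of partition replicas that a single unit of weight should receive, that is, partitions times replicas divided by the total weight. The sketch below rests on that interpretation, and the function name is hypothetical.

```python
def sketch_parts_per_weight(part_power, replicas, weights):
    """Partition replicas that one unit of device weight should hold."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("ring has no weighted devices")
    return (2 ** part_power) * replicas / total_weight


weights = [100.0, 100.0, 50.0]
per_weight = sketch_parts_per_weight(8, 3, weights)
# Multiplying by a device's weight gives the count that device wants.
wanted = [w * per_weight for w in weights]
```

With 256 partitions and 3 replicas, the wanted counts sum to the 768 replica assignments in the ring.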

exception swift.common.ring.builder.RingValidationWarning

Bases: exceptions.Warning