Object Audit Watchers

Dark Data

The name “Dark Data” alludes to the scientific hypothesis of Dark Matter, which supposes that the universe contains a lot of matter that we cannot observe. In Swift, Dark Data is the name for objects that are not accounted for in the containers.

Experience with running large-scale clusters suggests that Swift does not have any particular bugs that trigger the creation of dark data. So this is an exercise in writing watchers, with a plausible function.

When enabled, the Dark Data watcher definitely drags down the cluster’s overall performance. The load increase can be mitigated as usual, but at the expense of the total time taken by an auditor pass.

Because the watcher only deems an object dark when all container servers agree, it will silently fail to detect anything if even one of the container servers in the ring is down or unreachable. This is deliberate, in the interest of operators who run with action=delete: an unavailable server must never cause a valid object to be deleted.
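The unanimity rule described above can be sketched in a few lines of Python. This is a minimal illustration of the rule as stated in this section, not the watcher's actual code: the function name is_dark and the three-valued response encoding (True, False, None) are assumptions made for the example.

```python
from typing import Iterable, Optional


def is_dark(responses: Iterable[Optional[bool]]) -> bool:
    """Decide whether an object should be deemed dark.

    Hypothetical encoding, one entry per container server in the ring:
      True  - the server answered and lists the object
      False - the server answered and does not list the object
      None  - the server was down or unreachable
    """
    answers = list(responses)
    if not answers:
        # No servers consulted: there is nothing to agree on.
        return False
    if any(answer is None for answer in answers):
        # At least one server gave no answer, so agreement is
        # impossible and the object is left alone.
        return False
    # Dark only if every server answered and none lists the object.
    return all(answer is False for answer in answers)


print(is_dark([False, False, False]))  # all servers agree
print(is_dark([False, None, False]))   # one server unreachable
print(is_dark([False, True, False]))   # one server lists the object
```

Treating an unreachable server the same as a server that lists the object is what makes the check conservative: a missing answer can hide dark data, but it can never cause a deletion.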

If a container is sharded, there is a small edge case where an object row could be misplaced. So it is recommended to always start with action=log, and to switch to action=delete only once you are confident in the results.

Finally, keep in mind that the Dark Data watcher needs the container ring to operate, but runs on an object node. This can come up if the cluster has nodes separated by function.
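On a cluster with function-separated nodes, a quick pre-flight check on each object node can confirm that the container ring is actually present before the watcher is enabled. The sketch below is only an illustration: the helper name missing_rings is hypothetical, and it assumes the rings live in /etc/swift under the usual NAME.ring.gz file names.

```python
import os


def missing_rings(swift_dir="/etc/swift", ring_names=("container",)):
    """Return the names of rings whose files are absent from swift_dir.

    Assumes ring files are named "<name>.ring.gz"; adjust swift_dir
    if the cluster keeps its rings elsewhere.
    """
    return [
        name
        for name in ring_names
        if not os.path.isfile(os.path.join(swift_dir, name + ".ring.gz"))
    ]


if __name__ == "__main__":
    absent = missing_rings()
    if absent:
        print("missing ring files:", ", ".join(absent))
    else:
        print("container ring is present")
```

The check only verifies that the file exists; it does not validate the ring contents or test connectivity to the container servers.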