Weakly Reachable: Java Idioms for Hashing

2008-04-28

Java Idioms for Hashing

It is common practice in Java for the calculation of a hash code to take every field into account, recursively descending into referenced objects. In contexts where performance matters, this practice can be harmful. The computation of the hash code in such fashion may actually be more expensive than a full equality check considering that hashing involves a lot of multiplications where equality testing involves only comparisons.

Hashing is also used to divide objects into buckets. For this purpose the extra cost is justified since one hash computation may eliminate many equality tests. However, it is frequently not necessary to hash the entire reachable object graph to get a good hash code. If an object's local non-reference fields show good variance in values across objects, it is adequate to hash them alone -- or even just a subset. The most problematic case is "algebraic types", where the objects may large consist only of references to other objects. Even in this case, one might still be able to get a decent hash by hashing only the object types a few levels into the object graph. Limiting the depth of the traversal is also a reasonable strategy if the reachable object graph contains circular references.

Again, these observations only really matter when performance is a large consideration. For small collections of objects from a small space, the extra think time is probably not worth the effort. But, once again, it is shown that hashing is difficult.

Weakly Reachable

2008-04-28

Java Idioms for Hashing

Blog Archive