Java String – Why hashCode() Uses 31 as Multiplier

algorithmhashjavastring

Per the Java documentation, the hash code for a String object is computed as:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the
ith character of the string, n is the length of
the string, and ^ indicates exponentiation.

Why is 31 used as a multiplier?

I understand that the multiplier should be a relatively large prime number. So why not 29, or 37, or even 97?

Best Answer

According to Joshua Bloch's Effective Java, Second Edition (a book that can't be recommended enough, and which I bought thanks to continual mentions on Stack Overflow):

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.

(from Chapter 3, Item 9: Always override hashCode when you override equals, page 48)

Related Solutions

How hashCode() is Calculated in Java

The value returned by hashCode() is by no means guaranteed to be the memory address of the object. I'm not sure of the implementation in the Object class, but keep in mind most classes will override hashCode() such that two instances that are semantically equivalent (but are not the same instance) will hash to the same value. This is especially important if the classes may be used within another data structure, such as Set, that relies on hashCode being consistent with equals.

There is no hashCode() that uniquely identifies an instance of an object no matter what. If you want a hashcode based on the underlying pointer (e.g. in Sun's implementation), use System.identityHashCode() - this will delegate to the default hashCode method regardless of whether it has been overridden.

Nevertheless, even System.identityHashCode() can return the same hash for multiple objects. See the comments for an explanation, but here is an example program that continuously generates objects until it finds two with the same System.identityHashCode(). When I run it, it quickly finds two System.identityHashCode()s that match, on average after adding about 86,000 Long wrapper objects (and Integer wrappers for the key) to a map.

public static void main(String[] args) {
    Map<Integer,Long> map = new HashMap<>();
    Random generator = new Random();
    Collection<Integer> counts = new LinkedList<>();

    Long object = generator.nextLong();
    // We use the identityHashCode as the key into the map
    // This makes it easier to check if any other objects
    // have the same key.
    int hash = System.identityHashCode(object);
    while (!map.containsKey(hash)) {
        map.put(hash, object);
        object = generator.nextLong();
        hash = System.identityHashCode(object);
    }
    System.out.println("Identical maps for size:  " + map.size());
    System.out.println("First object value: " + object);
    System.out.println("Second object value: " + map.get(hash));
    System.out.println("First object identityHash:  " + System.identityHashCode(object));
    System.out.println("Second object identityHash: " + System.identityHashCode(map.get(hash)));
}

Example output:

Identical maps for size:  105822
First object value: 7446391633043190962
Second object value: -8143651927768852586
First object identityHash:  2134400190
Second object identityHash: 2134400190

Best Answer

Related Solutions

How hashCode() is Calculated in Java

Related Question