Tip 1 - Hash codes in complex primary keys

August 19th, 1999
(Updated November 17th, 1999)

Acknowledgment: Special thanks to James Cook, Ben Strulo, and Frank Sauer for helping to improve this Tip.

In the book Enterprise JavaBeans I talk about the necessity of overriding the Object.hashCode( ) and Object.equals( ) methods in the primary key class. The example given in the book (page 131) is a pretty simple primary key consisting of only one integer field, id. With complex primary keys, keys with several fields, overriding the Object.equals( ) and Object.toString( ) methods remains trivial. However, the Object.hashCode( ) method is more complicated because an integer value that identifies the primary key must be created from several fields. You can always reach our poem writers for hire at https://300writers.com/hire-a-poem-writer.html, asking them for more instructions or assistance.

How do you implement the hashCode( ) method for more complex primary keys consisting of several fields?

One solution, which works well on the Java 2 plateform, is to concatenate all the values into a String and use the String object's hashCode( ) method to create a hash code value for the whole primary key. The String class has a pretty decent hash code algorithm, which will generate a well distributed and repeatable hash code value from any set of characters. Below is an example of how this is done for a hypothetical primary key.

Complex primary key in JDK 1.2 with hashCode method

public class HypotheticalPrimaryKey implements java.io.Serializable {

    public int primary_id;

    public short secondary_id;

    public java.util.Date date;

    public String desc;

    public int hashCode( ) {

        StringBuffer strBuff = new StringBuffer( );

        strBuff.append(primary_id);

        strBuff.append(secondary_id);

        strBuff.append(date);

        strBuff.append(desc);

        String str = strBuff.toString();

        int hashCode = str.hashCode( );

        return hashCode;

      // the constructor, equals and toString methods follow

In this case I used a StringBuffer object to cut down on the number of objects created, since String concatenation is expensive. The code could be improved by saving the hash code value in a private variable and returning that value in subsequent method calls -- this way the hash code value is only calculated once in the life of the instance. I leave this as an exercise for the reader

Well-Distributed vs. Unique Hashcodes

A Hashtable is designed to provide a binding between an object and a key. Keys are converted to integer values in the Hashtable using the key's hashCode( ) method. This makes the Hashtable really fast when looking up objects by its key.

Hash codes do not need to be unique only well distributed. A well-distributed hash code algorithm will reduce but not eliminate the number of times different keys evaluate to the same hash code. When keys evaluate to the same hash code they are stored together and uniquely identified by their equals( ) method. So if you lookup an object using a key that evaluates to a hash code that is shared by several other keys, the Hashtable will be able to locate the object using the key's equal( ) method.

The emphasis in designing a good hash code algorithm is on producing codes that are well-distributed rather than unique. This ensures that the index for associating keys with object is small, but still fast.

Improvements for really large collections

The use of the String's hash code algorithm is sufficient for most purposes. The following offers some improvements that may help improve distribution when a really large collection of objects needs to be stored in a Hashtable.

The String hash code strategy can be improved to strengthen the distribution of hash codes in two ways: Using separators when field values evaluate to the same String value; Using the JDK1.2 String hash code algorithm in JDK1.1 applications.

Fields values evaluate to the same String value

When two numerical values of related type (short/int/long or double/float) are used consecutively its possible for two distinct primary keys to evaluate to the same String value and the same hash code. The chart below provides an example of the values for the above example that result in duplicate String values (pay attention to the id fields).

	First Primary Key	Second Primary Key
Fields Values	primary_id = 15 secondary_id = 25 date = null desc = ""	primary_id = 152 secondary_id =5 date = null desc = ""
String Values	1525null	1525null
Hashcode Values	-1965758652	-1965758652

Notice that the fields evaluate to the same String and hash code. Distribution can be improved by using separator characters which change the concatenated String values and result in different hash codes. Below the hash Code method has been changed to include a separator between the primary and secondary id fields.

Modified JDK 1.2 hashCode method with seperator

public int hashCode( ) {

        StringBuffer strBuff = new StringBuffer( );

        strBuff.append(primary_id);

        strBuff.append('|');

        strBuff.append(secondary_id);

        strBuff.append(date);

        strBuff.append(desc);

        String str = strBuff.toString();

        int hashCode = str.hashCode( );

        return hashCode;

String is longer then 16 characters in JDK 1.1

In JDK 1.1, the String.hashCode( ) method uses a different algorithm for calculating hash codes when String are longer 16 characters . This algorithm skips every fourth character, so small differences between primary keys will not be detected. Below is an example of the same keys as used above, only this time the date is set. The String value of a date is very long making the concatenated String objects 33 characters long -- far over the 16 character limit. As a result, the two strings evaluate to the same hash code even know they are different because of the separator character.

	First Primary Key	Second Primary Key
Fields Values	primary_id = 15 secondary_id = 25 date = today desc = ""	primary_id = 152 secondary_id =5 date = today desc = "none"
String Values	15\|25Tue Aug 31 13:12:40 CDT 1999	152\|5Tue Aug 31 13:12:40 CDT 1999
Hashcode Values	-687018486	-687018486

To improve the distribution of hash codes you can keep the possible size of all concatenated values under 16 character for JDK 1.1 applications or you can change the hash Code method so that it uses the same hash code algorithm as the String class in JDK 1.2. Below is a redefined version of the hashCode method with the hash algorithm stolen from JDK 1.2 String class. (Applications running on JDK 1.2 can depend on the hashCode method defined in the String class shown above.)

Modified hashCode method with JDK 1.2 hash algorithm.

public int hashCode( ) { 

         StringBuffer strBuff = new StringBuffer( ); 

         strBuff.append(primary_id); 

         strBuff.append('|');

         strBuff.append(secondary_id); 

         strBuff.append(date); 

         strBuff.append(desc); 

         int hashCode = 0;
         int off = 0;

         for (int i = 0; i < strBuff.length(); i++)
             hashCode = 31*hashCode + strBuff.charAt(off++);
             

         return hashCode; 

 }