In the book Enterprise JavaBeans I talk about the necessity of overriding the Object.hashCode( ) and Object.equals( ) methods in the primary key class. The example given in the book (page 131) is a pretty simple primary key consisting of only one integer field, id. With complex primary keys, keys with several fields, overriding the Object.equals( ) and Object.toString( ) methods remains trivial. However, the Object.hashCode( ) method is more complicated because an integer value that identifies the primary key must be created from several fields.
How do you implement the hashCode( ) method for more complex primary keys consisting of several fields?
One solution, which works well on the Java 2 plateform, is to concatenate all the values into a String and use the String object's hashCode( ) method to create a hash code value for the whole primary key. The String class has a pretty decent hash code algorithm, which will generate a well distributed and repeatable hash code value from any set of characters. Below is an example of how this is done for a hypothetical primary key.
Complex primary key in JDK 1.2 with hashCode method |
public class HypotheticalPrimaryKey implements java.io.Serializable {
public int primary_id;
public short secondary_id;
public java.util.Date date;
public String desc;
public int hashCode( ) {
StringBuffer strBuff = new StringBuffer( );
strBuff.append(primary_id);
strBuff.append(secondary_id);
strBuff.append(date);
strBuff.append(desc);
String str = strBuff.toString();
int hashCode = str.hashCode( );
return hashCode;
}
// the constructor, equals and toString methods follow
}
|
In this case I used a StringBuffer object to cut down on the number of objects created, since String concatenation is expensive. The code could be improved by saving the hash code value in a private variable and returning that value in subsequent method calls -- this way the hash code value is only calculated once in the life of the instance. I leave this as an exercise for the reader
Well-Distributed vs. Unique HashcodesA Hashtable is designed to provide a binding between an object and a key. Keys are converted to integer values in the Hashtable using the key's hashCode( ) method. This makes the Hashtable really fast when looking up objects by its key.
Hash codes do not need to be unique only well distributed. A well-distributed hash code algorithm will reduce but not eliminate the number of times different keys evaluate to the same hash code. When keys evaluate to the same hash code they are stored together and uniquely identified by their equals( ) method. So if you lookup an object using a key that evaluates to a hash code that is shared by several other keys, the Hashtable will be able to locate the object using the key's equal( ) method.
The emphasis in designing a good hash code algorithm is on producing codes that are well-distributed rather than unique. This ensures that the index for associating keys with object is small, but still fast.
Improvements for really large collectionsThe use of the String's hash code algorithm is sufficient for most purposes. The following offers some improvements that may help improve distribution when a really large collection of objects needs to be stored in a Hashtable.
The String hash code strategy can be improved to strengthen the distribution of hash codes in two ways: Using separators when field values evaluate to the same String value; Using the JDK1.2 String hash code algorithm in JDK1.1 applications.
Fields values evaluate to the same String value
When two numerical values of related type (short/int/long or double/float) are used consecutively its possible for two distinct primary keys to evaluate to the same String value and the same hash code. The chart below provides an example of the values for the above example that result in duplicate String values (pay attention to the id fields).
|
First Primary Key |
Second Primary Key |
Fields Values |
primary_id = 15
secondary_id = 25
date = null
desc = "" |
primary_id = 152
secondary_id =5
date = null
desc = "" |
String Values |
1525null |
1525null |
Hashcode Values |
-1965758652 |
-1965758652 |
Notice that the fields evaluate to the same String and hash code. Distribution can be improved by using separator characters which change the concatenated String values and result in different hash codes. Below the hash Code method has been changed to include a separator between the primary and secondary id fields.
Modified JDK 1.2 hashCode method with seperator |
public int hashCode( ) {
StringBuffer strBuff = new StringBuffer( );
strBuff.append(primary_id);
strBuff.append('|');
strBuff.append(secondary_id);
strBuff.append(date);
strBuff.append(desc);
String str = strBuff.toString();
int hashCode = str.hashCode( );
return hashCode;
}
|
String is longer then 16 characters in JDK 1.1
In JDK 1.1, the String.hashCode( ) method uses a different algorithm for calculating hash codes when String are longer 16 characters . This algorithm skips every fourth character, so small differences between primary keys will not be detected. Below is an example of the same keys as used above, only this time the date is set. The String value of a date is very long making the concatenated String objects 33 characters long -- far over the 16 character limit. As a result, the two strings evaluate to the same hash code even know they are different because of the separator character.
|
First Primary Key |
Second Primary Key |
Fields Values |
primary_id = 15
secondary_id = 25
date = today
desc = "" |
primary_id = 152
secondary_id =5
date = today
desc = "none" |
String Values |
15|25Tue Aug 31 13:12:40 CDT 1999 |
152|5Tue Aug 31 13:12:40 CDT 1999 |
Hashcode Values |
-687018486 |
-687018486 |
To improve the distribution of hash codes you can keep the possible size of all concatenated values under 16 character for JDK 1.1 applications or you can change the hash Code method so that it uses the same hash code algorithm as the String class in JDK 1.2. Below is a redefined version of the hashCode method with the hash algorithm stolen from JDK 1.2 String class. (Applications running on JDK 1.2 can depend on the hashCode method defined in the String class shown above.)
Modified hashCode method with JDK 1.2 hash algorithm. |
public int hashCode( ) {
StringBuffer strBuff = new StringBuffer( );
strBuff.append(primary_id);
strBuff.append('|');
strBuff.append(secondary_id);
strBuff.append(date);
strBuff.append(desc);
int hashCode = 0;
int off = 0;
for (int i = 0; i < strBuff.length(); i++)
hashCode = 31*hashCode + strBuff.charAt(off++);
return hashCode;
}
|
|