
Generating good hashCode() and equals(Object) methods

At 10:01 AM on Feb 24, 2006, Riyad Kalla Javalobby Editors wrote:

Before Eclipse 3.2M5, if you wanted a good hashCode() or equals(Object) method, your options were to read a good book on the subject and implement the methods carefully, to guess at an implementation and find out months later that it was riddled with bugs, or to use the Jakarta Commons-Lang utility classes for building hash codes and performing equals operations. But what if you didn't want to do any of that? What if you just wanted your IDE to be smart enough to generate these methods for you based on the properties of your class? This is where Eclipse 3.2M5 comes in.

Sample Project

Below is a screenshot of my sample dummy project and dummy class I created just for this tip:


I went ahead and modeled a poor man's version of a user with the properties I think you would need to get a relatively unique match for someone: age and first, middle and last name.

Generating our Identities

Now let's have Eclipse generate hashCode and equals methods for us based on these properties:


We want to leave all the properties selected so that they are all included in the hash and equals calculations. If you didn't want certain properties included (say each User had a list of the other Users they knew), you could uncheck them to exclude them from the calculation:

It is worth noting that the code generated by Eclipse 3.2M5 does take null values into account for you, so you don't have to worry about NullPointerExceptions at runtime because of this code.

Reviewing the Code

Now let's look at the code that was generated for us:


OK, now let's check this code. Everyone break out the copy of Effective Java that you all have sitting next to you and turn to page 36: "Always override hashCode when you override equals". You'll notice the similarities between the hashCode implementation suggested by the book and the one Eclipse 3.2M5 generated. You'll also notice that Eclipse takes null values into account so you don't have to. Now if you have a look at the equals method, you'll notice all the nice short-circuit code at the top: you don't have to write the null check or the same-reference check yourself. The one thing I've always wondered about equals methods, though, is why not just have a hashCode check right after the preliminary basic checks? Since your hashCode already considers all the properties of your class, why not just use it? I'll admit, I didn't re-read this section of Effective Java, so it might cover why. Regardless, this equals implementation looks good: it considers all the properties, short-circuits in case of any nulls and does just what we want.
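
For readers who want something to compare against, here is a rough sketch of the style of code described above. It is not the exact output of Eclipse 3.2M5; the class name User and the field names are assumptions based on the properties listed earlier (age, first, middle and last name).

```java
// Hypothetical class: the name and fields are assumptions based on the
// properties listed in the article (age, first, middle and last name).
class User {
    private int age;
    private String firstName;
    private String middleName;
    private String lastName;

    User(int age, String firstName, String middleName, String lastName) {
        this.age = age;
        this.firstName = firstName;
        this.middleName = middleName;
        this.lastName = lastName;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + age;
        // Null fields contribute 0 instead of throwing NullPointerException.
        result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
        result = prime * result + ((middleName == null) ? 0 : middleName.hashCode());
        result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;                      // same reference
        if (obj == null)
            return false;                     // nothing equals null
        if (getClass() != obj.getClass())
            return false;                     // must be exactly the same class
        User other = (User) obj;
        if (age != other.age)
            return false;
        if (firstName == null) {
            if (other.firstName != null)
                return false;
        } else if (!firstName.equals(other.firstName))
            return false;
        if (middleName == null) {
            if (other.middleName != null)
                return false;
        } else if (!middleName.equals(other.middleName))
            return false;
        if (lastName == null) {
            if (other.lastName != null)
                return false;
        } else if (!lastName.equals(other.lastName))
            return false;
        return true;
    }

    public static void main(String[] args) {
        User a = new User(30, "Ada", null, "Lovelace");
        User b = new User(30, "Ada", null, "Lovelace");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

Note that the null middle name in the example is handled by both methods without any extra work from the caller.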

Update #1: As Jacob Grydholt Jensen pointed out to me, you cannot use hashCode in your equals implementation, because a hashCode implementation that always returns 0 is completely valid per the javadoc:
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Thanks Jacob!
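
To make Jacob's point concrete, here is a minimal sketch. The class Word and the method brokenEquals are made up for illustration; the only thing taken from the javadoc is that a constant hashCode is legal.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: hashCode() always returns 0, which the contract allows.
class Word {
    private final String text;

    Word(String text) {
        this.text = text;
    }

    @Override
    public int hashCode() {
        return 0; // legal, but every instance lands in the same hash bucket
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        return text.equals(((Word) obj).text);
    }

    // The tempting shortcut from the article: treat equal hash codes as equal objects.
    boolean brokenEquals(Word other) {
        return other != null && hashCode() == other.hashCode();
    }

    public static void main(String[] args) {
        Word bill = new Word("Bill");
        Word bob = new Word("Bob");

        System.out.println(bill.equals(bob));       // false
        System.out.println(bill.brokenEquals(bob)); // true, which is wrong

        // Hash collections still work with the constant hash, just slowly.
        Set<Word> words = new HashSet<>();
        words.add(bill);
        words.add(bob);
        words.add(new Word("Bill"));
        System.out.println(words.size());           // 2
    }
}
```

So a hash comparison can at best be used to prove that two objects are different, never that they are equal.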

Update #2: Will Hartung gave a good reply explaining why you can never rely on hash values for equality testing:
As a very crude example, consider if the hashcode for a word was simply the first letter. Bill, Betty, and Bob would all share the same hashcode ('B'), but are clearly different objects. Even consider a simple String hash, where you multiply the value of each character by 31 and add it (basically just like this algorithm). After 6 characters, you're already "losing" information, assuming you have a 32-bit hashcode: as you multiply it by 31, you're shifting it roughly 5 bits. If you multiplied by 32, you would be doing exactly that, but by using the prime you preserve a bit more information. So, you can see how it's essentially impossible to get a "perfect hash" for any reasonably sized object, and that's why you cannot rely on hashcode to check for two objects being identical.
Thanks Will!
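
A small sketch of both collisions Will describes. The helper firstLetterHash is a made-up stand-in for his "first letter" hash; the second check uses the real String.hashCode().

```java
class HashCollisionDemo {
    // Will's crude hash: the hash of a word is just its first letter.
    static int firstLetterHash(String word) {
        return word.isEmpty() ? 0 : word.charAt(0);
    }

    public static void main(String[] args) {
        // Three different names, one hash value ('B').
        System.out.println(firstLetterHash("Bill") == firstLetterHash("Betty")); // true
        System.out.println(firstLetterHash("Betty") == firstLetterHash("Bob"));  // true

        // Even the real String hash collides on very short strings:
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112.
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false
    }
}
```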

Conclusion

Some people may wonder what happens when you add new properties to the class and want the hashCode and equals methods to reflect them. The answer is that you need to delete the old methods and regenerate them with the wizard, just as with the property (getter/setter) generation wizards already in Eclipse. That is all for this tip. This is a battle I have fought many times before, and I had an especially hard time finding a well-performing solution to the hashCode issue for my libraries that needed custom identity and equality methods. Thanks for reading.
1. At 10:38 AM on Feb 24, 2006, Wayne Beaton Javalobby Regulars wrote:

Re: Generating good hashCode() and equals(Object) methods

I know that this is splitting hairs but as a general rule, you shouldn't base these methods on values that can change. Certainly, the age of a user will change. Even the name can change. You really need to be careful: poorly chosen fields can have very odd effects at apparently random times.

You're right though... it's a cool feature. But you still need to be careful how you use it...
Wayne Beaton Eclipse Foundation http://wbeaton.blogspot.com
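
A minimal sketch of the kind of "odd effect" described above, assuming a hypothetical MutableUser class whose hashCode depends on a field that can change. The behaviour shown is what java.util.HashSet does in practice: it locates entries by the hash they had when they were added.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: identity is based on a mutable field (age).
class MutableUser {
    private final String name; // assumed non-null in this sketch
    private int age;

    MutableUser(String name, int age) {
        this.name = name;
        this.age = age;
    }

    void setAge(int age) {
        this.age = age;
    }

    @Override
    public int hashCode() {
        return 31 * age + name.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        MutableUser other = (MutableUser) obj;
        return age == other.age && name.equals(other.name);
    }

    public static void main(String[] args) {
        Set<MutableUser> users = new HashSet<>();
        MutableUser wayne = new MutableUser("Wayne", 30);
        users.add(wayne);
        System.out.println(users.contains(wayne)); // true

        // A birthday changes the hash code while the object sits in the set.
        wayne.setAge(31);
        System.out.println(users.contains(wayne)); // false: looked up under the new hash
        System.out.println(users.size());          // 1: the object is still in there
    }
}
```

The object is still in the set, but it can no longer be found or removed through the normal lookup path until its age is set back.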
2. At 10:50 AM on Feb 24, 2006, Bob Balfe Javalobby Newcomers wrote:

Re: Generating good hashCode() and equals(Object) methods

Wow, very cool! It is stuff like this that continues to make Eclipse stand above the other IDEs. Keep innovating!
3. At 11:33 AM on Feb 24, 2006, Jacob Grydholt Jensen Javalobby Newcomers wrote:

Re: Generating good hashCode() and equals(Object) methods

Are you trolling? IDEA has had this feature for ages. Not that I want to enter a religious war though :-)
4. At 11:40 AM on Feb 24, 2006, Jacob Grydholt Jensen Javalobby Newcomers wrote:

Re: Generating good hashCode() and equals(Object) methods

> The one
> thing I've always wondered about equals
> methods though is why not just have
> a hashCode check right after the
> preliminary basic checks? Since
> your hashCode already considers all the
> properties of your class, why not just use it?

hashCode cannot be used since the hashCode function that maps everything to 0 (zero) is always valid, but it can hardly be used to implement an equals method.

The last part of the hashCode contract from Object's javadoc explains this:

It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
5. At 11:42 AM on Feb 24, 2006, Riyad Kalla Javalobby Editors wrote:

Re: Generating good hashCode() and equals(Object) methods

Wayne,
You aren't splitting hairs at all. As many tutorials and books on this subject point out, how hashCode and equals should be implemented is widely misunderstood, and I just proved that. So your adding this note is great feedback.
Best, Riyad [kallasoft | The "Break it Down" Blog]
6. At 11:52 AM on Feb 24, 2006, Riyad Kalla Javalobby Editors wrote:

Re: Generating good hashCode() and equals(Object) methods

Jacob,
Thank you for the followup, that clarifies the point. I'll edit the original post to point that out.
Best, Riyad [kallasoft | The "Break it Down" Blog]
7. At 1:29 PM on Feb 24, 2006, Bob Balfe Javalobby Newcomers wrote:

Re: Generating good hashCode() and equals(Object) methods

No trolling here. Extensions like these allow newbies and others to see good patterns. It may be arguable whether this particular pattern is a good one, but for someone who wants a template or sample code it's great. We need editors that provide these types of code generation and templates.
8. At 2:16 PM on Feb 24, 2006, Mike Miller Blooming Javalobby Member wrote:

Re: Generating good hashCode() and equals(Object) methods

I posted a question on the eclipse jdt forum about the generated equals() method and only got a single hit (so far). Does anyone have an issue with their implementation of equals being different from the one in 'Effective Java'?

Below is the top of an equals() that I generated with M5:

public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    ...
}

I believe 'Effective Java' suggests the following:
public boolean equals(Object obj) {
    if (!(obj instanceof MyClass))
        return false;
    if (this == obj)
        return true;
    ...
}

His point was that, per the Java spec, instanceof returns false when its operand is null, so you don't need the additional test "if (obj == null)".

There's also the fact that the Eclipse version requires the two classes to be identical, whereas the other option also accepts a subclass.
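
A small sketch of that difference. All class names here (StrictId, LenientId and their subclasses) are hypothetical; StrictId uses the getClass() comparison from the generated code, LenientId the instanceof test.

```java
// Hypothetical base class using the getClass() comparison.
class StrictId {
    final int id;
    StrictId(int id) { this.id = id; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;                 // a subclass instance is never equal
        return id == ((StrictId) obj).id;
    }

    @Override
    public int hashCode() { return id; }
}

// Hypothetical base class using the instanceof test.
class LenientId {
    final int id;
    LenientId(int id) { this.id = id; }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof LenientId))
            return false;                 // also covers obj == null
        if (this == obj)
            return true;
        return id == ((LenientId) obj).id;
    }

    @Override
    public int hashCode() { return id; }
}

// Subclasses that add no state of their own.
class StrictSub extends StrictId {
    StrictSub(int id) { super(id); }
}

class LenientSub extends LenientId {
    LenientSub(int id) { super(id); }
}

class EqualsStyleDemo {
    public static void main(String[] args) {
        System.out.println(new StrictId(1).equals(new StrictSub(1)));   // false
        System.out.println(new LenientId(1).equals(new LenientSub(1))); // true
        System.out.println(new LenientId(1).equals(null));              // false, no explicit null check
    }
}
```

Which behaviour is right depends on whether a subclass instance should ever count as equal to a base class instance, so this is a question of semantics as much as style.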

I haven't looked at the Apache Commons source yet, but I would be willing to bet Apache followed Joshua Bloch's recipe for their EqualsBuilder class which means that library clashes with what Eclipse generates.

Am I "splitting hairs" and being too picky?
9. At 2:35 PM on Feb 24, 2006, Riyad Kalla Javalobby Editors wrote:

Re: Generating good hashCode() and equals(Object) methods

instanceof used to be a very expensive call. As the VMs have progressed it has gotten cheaper and cheaper, but it is still a much more expensive call than:
1) Reference check (3.2M5 check #1)
2) Null check (3.2M5 check #2)
3) Straight method call (3.2M5 check #3)

The instanceof call, AFAIK, will actually traverse the object hierarchy if necessary to see if a matching class can be found, so what happens behind the scenes is much more than those 3 simple checks. So instanceof is shorter and technically Bloch is right, and if we were comparing oranges to oranges speed-wise I would go Bloch's route. But given what can happen in an instanceof call, I like the fact that Eclipse generates the faster solution for now.
Best, Riyad [kallasoft | The "Break it Down" Blog]
10. At 2:41 PM on Feb 24, 2006, Jacob Grydholt Jensen Javalobby Newcomers wrote:

Re: Generating good hashCode() and equals(Object) methods

> I haven't looked at the Apache Commons source yet,
> but I would be willing to bet Apache followed Joshua
> Bloch's recipe for their EqualsBuilder class which
> means that library clashes with what Eclipse
> generates.
>
> Am I "splitting hairs" and being too picky?

Well, if you are splitting hairs then so is Josh Bloch and a lot of other people. For a short discussion of these two approaches to the equals method, see http://www.artima.com/intv/bloch17.html.
11. At 2:44 PM on Feb 24, 2006, Jacob Grydholt Jensen Javalobby Newcomers wrote:

Re: Generating good hashCode() and equals(Object) methods

> instanceof used to be a very expensive call, as the
> VM's have progressed it has gotten cheaper and
> cheaper but it is still a much more expensive call
> than:
> 1) Reference check (3.2M5 check #1)
> 2) Null check (3.2M5 check #2)
> 3) Straight method call (3.2M5 check #3)
>
> The instanceof call, AFAIK, will actually traverse
> the object hierarchy if necessary to see if matching
> classes can be found, what happens behind the scenes
> is much more than those 3 simple checks. So
> instanceof is shorter and technically Bloch is right
> and assuming we were comparing oranges to oranges
> here speed wise, I would go Bloch's route, but given
> what can happen in an instanceof call, I like the
> fact that Eclipse generates the faster solution for
> now.

As Bloch explains on http://www.artima.com/intv/bloch17.html, this is not merely a question of execution speed, but a question of semantics. Unless your application is very special, I don't think you should worry about the instanceof call, but rather worry about the semantics. As always, premature optimisation is the root of all evil.
12. At 2:54 PM on Feb 24, 2006, Riyad Kalla Javalobby Editors wrote:

Re: Generating good hashCode() and equals(Object) methods

Jacob,
Thanks for the links on the subject.
Best, Riyad [kallasoft | The "Break it Down" Blog]
13. At 3:07 PM on Feb 24, 2006, Mike Miller Blooming Javalobby Member wrote:

Re: Generating good hashCode() and equals(Object) methods

Thanks for the link - interesting reading.
14. At 3:14 PM on Feb 24, 2006, Will Hartung DeveloperZone Top 100 wrote:

Re: Generating good hashCode() and equals(Object) methods

In the article you say "I'll admit, I didn't re-read this section of Effective Java, so it might cover why."

The reason why hashCode can return the same value for two unequal objects is simply because that's what hash codes do.

There is no way you can take, for example, a large object (in this case one with several fields) and reduce it to a unique value only 32 bits long. You will inevitably have overlap where one hashcode maps to two or more distinct objects.

As a very crude example, consider if the hashcode for a word was simply the first letter. Bill, Betty, and Bob would all share the same hashcode ('B'), but are clearly different objects.

Even consider a simple String hash, where you multiply the value of each character by 31 and add it (basically just like this algorithm). After 6 characters, you're already "losing" information, assuming you have a 32-bit hashcode: as you multiply it by 31, you're shifting it roughly 5 bits. If you multiplied by 32, you would be doing exactly that, but by using the prime you preserve a bit more information. So, you can see how it's essentially impossible to get a "perfect hash" for any reasonably sized object, and that's why you cannot rely on hashcode to check for two objects being identical.
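
The 31 versus 32 remark can be checked with a short sketch. The helper hash(String, int) is made up for illustration; with multiplier 31 it follows the formula documented for String.hashCode(). With multiplier 32, every step is an exact 5-bit left shift, so in an 8-character string the first character has been shifted by 35 bits and no longer affects the 32-bit result at all.

```java
class MultiplierDemo {
    // Polynomial hash: h = multiplier * h + nextChar, with int overflow wrapping.
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "Axxxxxxx"; // 8 characters, differ only in the first one
        String b = "Bxxxxxxx";

        // Multiplier 31 reproduces the documented String.hashCode() formula.
        System.out.println(hash(a, 31) == a.hashCode());   // true

        // Multiplier 32: the first character is shifted out completely.
        System.out.println(hash(a, 32) == hash(b, 32));    // true (collision)

        // Multiplier 31: the first character still influences the result.
        System.out.println(hash(a, 31) == hash(b, 31));    // false
    }
}
```

The odd multiplier keeps early characters mixed into the low bits, but with only 32 bits available collisions between distinct strings remain unavoidable.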
