Before Eclipse 3.2M5, if you wanted a good hashCode() or equals(Object) method, your only options were to read a good book on the subject and implement the method, to guess and implement it yourself and find out months later that it was riddled with bugs, or to use the Jakarta Commons-Lang utility classes that build hashCodes or perform equals operations.
But what if you didn't want to do any of that? What if you just wanted your IDE to be smart enough to generate these methods for you based on the properties of your class? This is where Eclipse 3.2M5 comes in.
Sample Project
Below is a screenshot of my sample dummy project and dummy class I created just for this tip:
I went ahead and modeled a poor man's version of a user with all the properties I think you would need to get a relatively unique match for someone: age, first, middle and last name.
Generating our Identities
Now let's have Eclipse generate a hashCode and equals method for us based on these properties:
We want to leave all the properties selected, so they will all be included in the hash and equals calculations. If you didn't want certain properties included in the calculation (say each User had a list of the other Users they knew), you could uncheck them to exclude them from the calculation:
It is worth noting that the code generated by Eclipse 3.2M5 does take null values into account, so you don't have to worry about NullPointerExceptions at runtime because of this code.
Reviewing the Code
Now let's look at the code that was generated for us:
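The screenshot of the generated code is not reproduced in this text, so here is a rough sketch of the kind of output the wizard produces for a class like this one. The class name User and the field names age, firstName, middleName and lastName are assumptions based on the description above, and the exact code Eclipse emits may differ in details.

```java
// Hypothetical reconstruction of the sample class; names are assumed.
class User {
    private final int age;
    private final String firstName;
    private final String middleName;
    private final String lastName;

    User(int age, String firstName, String middleName, String lastName) {
        this.age = age;
        this.firstName = firstName;
        this.middleName = middleName;
        this.lastName = lastName;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + age;
        // A null field contributes 0 instead of throwing a NullPointerException.
        result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
        result = prime * result + ((middleName == null) ? 0 : middleName.hashCode());
        result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;   // same reference
        if (obj == null)
            return false;  // nothing equals null
        if (getClass() != obj.getClass())
            return false;  // must be exactly the same class
        User other = (User) obj;
        if (age != other.age)
            return false;
        if (firstName == null) {
            if (other.firstName != null)
                return false;
        } else if (!firstName.equals(other.firstName))
            return false;
        if (middleName == null) {
            if (other.middleName != null)
                return false;
        } else if (!middleName.equals(other.middleName))
            return false;
        if (lastName == null) {
            if (other.lastName != null)
                return false;
        } else if (!lastName.equals(other.lastName))
            return false;
        return true;
    }
}
```

Two User instances built from the same values compare equal and share a hash code, even when one of the name fields is null.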
OK, now let's check this code. Everyone break out the copy of Effective Java that you have sitting next to you and turn to page 36: "Always override hashCode when you override equals".
You'll notice the similarities between the implementation of the hashCode method suggested by this book and the one Eclipse 3.2M5 generated. You'll also notice that Eclipse takes null values into account so you don't have to.
Now if you have a look at the equals method, you'll notice all the nice short-circuit code at the top. No need to check if things are null, no need to check if they are already equal, etc. The one thing I've always wondered about equals methods though is: why not just have a hashCode check right after the preliminary basic checks? Since your hashCode already considers all the properties of your class, why not just use it? I'll admit, I didn't re-read this section of Effective Java, so it might cover why. Regardless, this equals implementation looks good, considers all the properties, short-circuits in case of any nulls and does just what we want.
Update #1: As Jacob Grydholt Jensen pointed out to me, you cannot use hashCode in your equals implementation, because a hashCode implementation that always returns 0 is completely valid per the javadoc:
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Thanks Jacob!
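To make Jacob's point concrete, here is a minimal sketch of a class whose hashCode always returns 0. The class names Name and ConstantHashDemo are made up for illustration. The class still honors the contract and still works in a HashSet, but an equals that trusted the hash would call every instance equal.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: a perfectly legal, but slow, constant hash code.
class Name {
    private final String value; // assumed non-null for this sketch

    Name(String value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        return 0; // valid per the javadoc: unequal objects may share a hash
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        return value.equals(((Name) obj).value);
    }
}

class ConstantHashDemo {
    public static void main(String[] args) {
        Name bill = new Name("Bill");
        Name bob = new Name("Bob");
        Set<Name> names = new HashSet<Name>();
        names.add(bill);
        names.add(bob);
        names.add(new Name("Bill")); // duplicate, rejected by equals
        System.out.println(names.size());                      // 2
        System.out.println(bill.hashCode() == bob.hashCode()); // true
        System.out.println(bill.equals(bob));                  // false
    }
}
```

The set ends up with two entries because equals does the real comparison; the shared hash code only means every entry lands in the same bucket.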
Update #2: Will Hartung gave a good reply as to why you can never really use hash values for equality testing:
As a very crude example, consider if the hashcode for a word was simply the first letter. Bill, Betty, and Bob would all share the same hashcode ('B'), but are clearly different objects.
Even consider a simple String hash, where you multiply the value of each character by 31 and add it (basically just like this algorithm). After 6 characters, you're already "losing" information: assuming you have a 32-bit hashcode, each time you multiply it by 31 you're shifting it roughly 5 bits. If you multiplied by 32, you would be doing exactly that, but by using the prime you preserve a bit more information. So you can see how it's essentially impossible to get a "perfect hash" for any reasonably sized object, and that's why you cannot rely on hashcode to check for two objects being identical.
Thanks Will!
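Will's two examples can be sketched in a few lines. The helper names firstLetterHash and polynomialHash are invented for this illustration; polynomialHash uses the h = 31 * h + c recurrence that the String.hashCode javadoc describes, so it should agree with the built-in method.

```java
class HashCollisionDemo {
    // The crude hash from the example: just the first letter.
    static int firstLetterHash(String word) {
        return word.isEmpty() ? 0 : word.charAt(0);
    }

    // Multiply the running value by 31 and add each character.
    // int arithmetic silently wraps at 32 bits, which is where
    // information gets lost on longer inputs.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Bill, Betty and Bob all hash to 'B' (66).
        System.out.println(firstLetterHash("Bill"));
        System.out.println(firstLetterHash("Betty"));
        System.out.println(firstLetterHash("Bob"));

        // Two different two-character strings already collide:
        // 65 * 31 + 97 = 2112 and 66 * 31 + 66 = 2112.
        System.out.println(polynomialHash("Aa"));
        System.out.println(polynomialHash("BB"));
        System.out.println("Aa".equals("BB")); // false
    }
}
```

So collisions do not even need long inputs: "Aa" and "BB" share a hash code while being clearly different strings.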
Conclusion
Some people may wonder what happens when you add new properties to the class and want the hashCode and equals methods to reflect that. The answer is that you need to erase your old methods and regenerate them using the wizard again. This is just like the property (getter/setter) generation wizards already in Eclipse.
Well, that is all for this tip. This is a battle I have fought many times before, and I had an especially hard time finding a good, performant solution to the hashCode issue for my libraries that needed custom identity and equality methods. Thanks for reading.
Re: Generating good hashCode() and equals(Object) methods
I know that this is splitting hairs but as a general rule, you shouldn't base these methods on values that can change. Certainly, the age of a user will change. Even the name can change. You really need to be careful: poorly chosen fields can have very odd effects at apparently random times.
You're right though... it's a cool feature. But you still need to be careful how you use it...
Wayne Beaton
Eclipse Foundation
http://wbeaton.blogspot.com
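Here is a small sketch of the kind of odd effect Wayne is warning about. The Member class and its setAge method are hypothetical; the point is only that changing a field that feeds hashCode while the object sits in a HashSet can make the set unable to find it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class whose hash code depends on a mutable field.
class Member {
    private int age;           // mutable, and part of the identity
    private final String name; // assumed non-null for this sketch

    Member(int age, String name) {
        this.age = age;
        this.name = name;
    }

    void setAge(int age) {
        this.age = age;
    }

    @Override
    public int hashCode() {
        return 31 * (31 + age) + name.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        Member other = (Member) obj;
        return age == other.age && name.equals(other.name);
    }
}

class MutableKeyDemo {
    // Returns true if the set can no longer find the object it holds.
    static boolean lostAfterMutation() {
        Set<Member> members = new HashSet<Member>();
        Member m = new Member(30, "Pat");
        members.add(m);
        m.setAge(31); // hash code changes while m is stored in the set
        return !members.contains(m);
    }

    public static void main(String[] args) {
        // On the usual JDK HashSet this reports that the member was lost,
        // even though the very same object is still inside the set.
        System.out.println(lostAfterMutation());
    }
}
```

The set remembers the hash computed at insertion time, so the lookup after the birthday goes looking under the new hash and misses.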
Re: Generating good hashCode() and equals(Object) methods
> The one thing I've always wondered about equals
> methods though is why not just have a hashCode
> check right after the preliminary basic checks?
> Since your hashCode already considers all the
> properties of your class, why not just use it?
hashCode cannot be used since the hashCode function that maps everything to 0 (zero) is always valid, but it can hardly be used to implement an equals method.
The last part of the hashCode contract from Object's javadoc explains this:
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Re: Generating good hashCode() and equals(Object) methods
Wayne,
You aren't splitting hairs at all. As many tutorials and books covering this subject point out, how a hashCode or equals should be implemented is widely misunderstood, and I just proved that. So your adding this note is great feedback.
Re: Generating good hashCode() and equals(Object) methods
No trolling here. Extensions like these allow newbies and others to see good patterns. It may be arguable that this particular pattern is questionable, but for someone who wants a template or sample code it's great. We need editors that provide these types of code generation and templates.
Re: Generating good hashCode() and equals(Object) methods
I posted a question on the Eclipse JDT forum about the generated equals() method and only got a single hit (so far). Does anyone have an issue with their implementation of equals being different from the one in 'Effective Java'?
Below is the top of an equals() that I generated with M5:
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    ...
}
I believe 'Effective Java' suggests the following:
public boolean equals(Object obj) {
    if (!(obj instanceof MyClass))
        return false;
    if (this == obj)
        return true;
    ...
}
Bloch's point was that the Java spec indicates that if the input parameter value is null, the instanceof test returns false, and thus you don't need the additional test "if (obj == null)".
There's also the fact that the Eclipse version expects the classes to be identical versus a possible subclass for the other option.
I haven't looked at the Apache Commons source yet, but I would be willing to bet Apache followed Joshua Bloch's recipe for their EqualsBuilder class, which means that library clashes with what Eclipse generates.
Am I "splitting hairs" and being too picky?
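The practical difference between the two styles shows up with subclasses. The sketch below uses made-up classes (StrictId, LenientId and their empty subclasses) that are not from Eclipse or 'Effective Java'; they only contrast a getClass() check with an instanceof check.

```java
// getClass() style, as in the Eclipse-generated method.
class StrictId {
    final int id;

    StrictId(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false; // a subclass instance never matches
        return id == ((StrictId) obj).id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}

class StrictIdSub extends StrictId {
    StrictIdSub(int id) {
        super(id);
    }
}

// instanceof style, as in the 'Effective Java' recipe.
class LenientId {
    final int id;

    LenientId(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof LenientId))
            return false; // also covers obj == null
        return id == ((LenientId) obj).id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}

class LenientIdSub extends LenientId {
    LenientIdSub(int id) {
        super(id);
    }
}

class EqualsStyleDemo {
    public static void main(String[] args) {
        System.out.println(new StrictId(1).equals(new StrictIdSub(1)));   // false
        System.out.println(new LenientId(1).equals(new LenientIdSub(1))); // true
        System.out.println(new LenientId(1).equals(null));                // false
    }
}
```

Both styles reject null, but only the instanceof version treats an instance of a trivial subclass as equal to an instance of its parent.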
Re: Generating good hashCode() and equals(Object) methods
instanceof used to be a very expensive call. As the VMs have progressed it has gotten cheaper and cheaper, but it is still a much more expensive call than:
1) Reference check (3.2M5 check #1)
2) Null check (3.2M5 check #2)
3) Straight method call (3.2M5 check #3)
The instanceof call, AFAIK, will actually traverse the object hierarchy if necessary to see if matching classes can be found; what happens behind the scenes is much more than those 3 simple checks. So instanceof is shorter and technically Bloch is right, and assuming we were comparing oranges to oranges here speed-wise, I would go Bloch's route. But given what can happen in an instanceof call, I like the fact that Eclipse generates the faster solution for now.
Re: Generating good hashCode() and equals(Object) methods
> I haven't looked at the Apache Commons source yet,
> but I would be willing to bet Apache followed Joshua
> Bloch's recipe for their EqualsBuilder class which
> means that library clashes with what Eclipse
> generates.
>
> Am I "splitting hairs" and being too picky?
Well, if you are splitting hairs then so is Josh Bloch and a lot of other people. For a short discussion of these two approaches to the equals method, see http://www.artima.com/intv/bloch17.html.
Re: Generating good hashCode() and equals(Object) methods
> instanceof used to be a very expensive call, as the
> VM's have progressed it has gotten cheaper and
> cheaper but it is still a much more expensive call
> than:
> 1) Reference check (3.2M5 check #1)
> 2) Null check (3.2M5 check #2)
> 3) Straight method call (3.2M5 check #3)
>
> The instanceof call, AFAIK, will actually traverse
> the object hierarchy if necessary to see if matching
> classes can be found, what happens behind the scenes
> is much more than those 3 simple checks. So
> instanceof is shorter and technically Bloch is right
> and assuming we were comparing oranges to oranges
> here speed wise, I would go Bloch's route, but given
> what can happen in an instanceof call, I like the
> fact that Eclipse generates the faster solution for
> now.
As Bloch explains on http://www.artima.com/intv/bloch17.html, this is not merely a question of execution speed, but a question of semantics. Unless your application is very special, I don't think you should worry about the instanceof call, but rather worry about the semantics. As always, premature optimisation is the root of all evil.
Re: Generating good hashCode() and equals(Object) methods
In the article you say "I'll admit, I didn't re-read this section of Effective Java, so it might cover why."
The reason why hashCode can return the same value for two unequal objects is simply because that's what hash codes do.
There is no way you can take, for example, a large object (in this case one with several fields) and reduce it into a unique value only 32 bits long. You will inevitably have overlap where one hashcode maps to two or more distinct objects.
As a very crude example, consider if the hashcode for a word was simply the first letter. Bill, Betty, and Bob would all share the same hashcode ('B'), but are clearly different objects.
Even consider a simple String hash, where you multiply the value of each character by 31 and add it (basically just like this algorithm). After 6 characters, you're already "losing" information: assuming you have a 32-bit hashcode, each time you multiply it by 31 you're shifting it roughly 5 bits. If you multiplied by 32, you would be doing exactly that, but by using the prime you preserve a bit more information. So you can see how it's essentially impossible to get a "perfect hash" for any reasonably sized object, and that's why you cannot rely on hashcode to check for two objects being identical.
Generating good hashCode() and equals(Object) methods
Originally posted by Riyad Kalla at 10:01 AM on Feb 24, 2006.
Re: Generating good hashCode() and equals(Object) methods
Wow, very cool! It is stuff like this that continues to make Eclipse stand above the other IDEs. Keep innovating!
Re: Generating good hashCode() and equals(Object) methods
Are you trolling? IDEA has had this feature for ages. Not that I want to enter a religious war though.
Re: Generating good hashCode() and equals(Object) methods
Jacob,
Thank you for the follow-up, that clarifies the point. I'll edit the original post to point that out.
Re: Generating good hashCode() and equals(Object) methods
Jacob,
Thanks for the links on the subject.
Re: Generating good hashCode() and equals(Object) methods
Thanks for the link - interesting reading.