|
View:
New views
6 Messages
—
Rating Filter:
Alert me
|
|
|
What’s the best general implementation of custom #hash methods?Hello, ruby-talk.
What’s the best (fast, yet with relatively few cases when equal hashes are generated for non-eql objects) way to implement a hash method? Is XOR-ing the hashes of an object’s instance variables (assuming that two objects are eql if their ivars are eql) a good general approach? I’m asking because I use this pattern in my code, and it seems quite fast; I needed to work around a bug in MRI’s Set#hash and this solution¹ seems about as fast as the original MRI’s Set#hash. I’m just not sure whether it’s hashy enough (so to speak) in that two un-eql objects hashed in this way produce different hashes often enough. ¹ http://github.com/Chastell/art-decomp/commit/b465660b9dcc8eada86f50bbcb8f951aa8145a19 — Shot -- Ill-informed qmail-bashing is better than no qmail-bashing at all. [Don Marti] |
|
|
Re: What’s the best general implementation of custom #hash methods?On Thu, Sep 10, 2009 at 11:56 AM, Shot (Piotr Szotkowski)<shot@...> wrote:
> Hello, ruby-talk. > > What’s the best (fast, yet with relatively few cases when equal hashes > are generated for non-eql objects) way to implement a hash method? > > Is XOR-ing the hashes of an object’s instance variables (assuming that > two objects are eql if their ivars are eql) a good general approach? > > I’m asking because I use this pattern in my code, and it seems quite > fast; I needed to work around a bug in MRI’s Set#hash and this solution¹ > seems about as fast as the original MRI’s Set#hash. I’m just not sure > whether it’s hashy enough (so to speak) in that two un-eql objects > hashed in this way produce different hashes often enough. > Well, let's look at how Array#hash works here's the 1.8 C code: static VALUE rb_ary_hash(ary) VALUE ary; { long i, h; VALUE n; h = RARRAY(ary)->len; for (i=0; i<RARRAY(ary)->len; i++) { h = (h << 1) | (h<0 ? 1 : 0); n = rb_hash(RARRAY(ary)->ptr[i]); h ^= NUM2LONG(n); } return LONG2FIX(h); } This looks like a typical cryptographic hash algorithm pattern combining bit rotation and xor. It seeds the value with the length of the array, then for each element, rotates the value 1 bit to the left, then xors in the elements hash. This scrambles the bits more than just a straight xor -- Rick DeNatale Blog: http://talklikeaduck.denhaven2.com/ Twitter: http://twitter.com/RickDeNatale WWR: http://www.workingwithrails.com/person/9021-rick-denatale LinkedIn: http://www.linkedin.com/in/rickdenatale |
|
|
Re: What’s the best general implementation of custom #hash methods?Shot (Piotr Szotkowski) wrote:
> Hello, ruby-talk. > > What’s the best (fast, yet with relatively few cases when equal hashes > are generated for non-eql objects) way to implement a hash method? One way is to just put everything you care about into an array and hash that, as in: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/230421 But it might not qualify as "fast", I suppose. -- vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407 |
|
|
RE: What’s the best general implementation of custom #hash methods?You could check out some of these (in several languages including Ruby): http://www.partow.net/programming/hashfunctions/#Download > Date: Fri, 11 Sep 2009 00:56:14 +0900 > From: shot@... > Subject: What’s the best general implementation of custom #hash methods? > To: ruby-talk@... > > Hello, ruby-talk. > > What’s the best (fast, yet with relatively few cases when equal hashes > are generated for non-eql objects) way to implement a hash method? > > Is XOR-ing the hashes of an object’s instance variables (assuming that > two objects are eql if their ivars are eql) a good general approach? > > I’m asking because I use this pattern in my code, and it seems quite > fast; I needed to work around a bug in MRI’s Set#hash and this solution¹ > seems about as fast as the original MRI’s Set#hash. I’m just not sure > whether it’s hashy enough (so to speak) in that two un-eql objects > hashed in this way produce different hashes often enough. > > ¹ http://github.com/Chastell/art-decomp/commit/b465660b9dcc8eada86f50bbcb8f951aa8145a19 > > — Shot > -- > Ill-informed qmail-bashing is better than > no qmail-bashing at all. [Don Marti] _________________________________________________________________ Get back to school stuff for them and cashback for you. http://www.bing.com/cashback?form=MSHYCB&publ=WLHMTAG&crea=TEXT_MSHYCB_BackToSchool_Cashback_BTSCashback_1x1 |
|
|
Re: What’s the best general implementation of custom #hash methods?On 9/10/09, Shot (Piotr Szotkowski) <shot@...> wrote:
> Hello, ruby-talk. > > What’s the best (fast, yet with relatively few cases when equal hashes > are generated for non-eql objects) way to implement a hash method? > > Is XOR-ing the hashes of an object’s instance variables (assuming that > two objects are eql if their ivars are eql) a good general approach? The drawback to this algorithm would be that if you have the same values but in different variables, they'll end up hashing the same. For example: class Pair def initialize(a,b) @a,@b=a,b end def hash @a.hash^@... end end Pair.new("foo","bar").hash==Pair.new("bar","foo").hash #=>true As a practical matter, this often doesn't matter. It depends on the application and if you think this kind of situation is likely. For general purpose use, it's better to have some way of encoding the 'location' of each sub-value in the hash as well. |
|
|
Re: What’s the best general implementation of cust om #hash methods?On Sep 10, 10:49 am, Rick DeNatale <rick.denat...@...> wrote:
> On Thu, Sep 10, 2009 at 11:56 AM, Shot (Piotr Szotkowski)<s...@...> wrote: > > Hello, ruby-talk. > > > What’s the best (fast, yet with relatively few cases when equal hashes > > are generated for non-eql objects) way to implement a hash method? > > > Is XOR-ing the hashes of an object’s instance variables (assuming that > > two objects are eql if their ivars are eql) a good general approach? > > > I’m asking because I use this pattern in my code, and it seems quite > > fast; I needed to work around a bug in MRI’s Set#hash and this solution¹ > > seems about as fast as the original MRI’s Set#hash. I’m just not sure > > whether it’s hashy enough (so to speak) in that two un-eql objects > > hashed in this way produce different hashes often enough. > > Well, let's look at how Array#hash works here's the 1.8 C code: > > static VALUE > rb_ary_hash(ary) > VALUE ary; > { > long i, h; > VALUE n; > > h = RARRAY(ary)->len; > for (i=0; i<RARRAY(ary)->len; i++) { > h = (h << 1) | (h<0 ? 1 : 0); > n = rb_hash(RARRAY(ary)->ptr[i]); > h ^= NUM2LONG(n); > } > return LONG2FIX(h); > > } > > This looks like a typical cryptographic hash algorithm pattern > combining bit rotation and xor. It seeds the value with the length of > the array, then for each element, rotates the value 1 bit to the left, > then xors in the elements hash. > > This scrambles the bits more than just a straight xor > > -- > Rick DeNatale I must take STRONG exception to your characterization of this algorithm. This method performs parallel xors of the data. Change one bit of one of the values, and you change one bit of the hash. A fundamental property of cryptographic hashes is that changing one bit of the input changes many bits of output. This is even true of crcs. Cryptographic hashes do rotate & xor (sometimes), but only as part of a larger operation. |
| Free embeddable forum powered by Nabble | Forum Help |