encapsulated hash codes

View: New views
4 Messages — Rating Filter:   Alert me  

encapsulated hash codes

by Allen Wirfs-Brock-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Some parts of this message have been removed. Learn more about Nabble's security policy.

A straw man proposal for  Harmony encapsulated hash codes  has been posted to the Wiki at http://wiki.ecmascript.org/doku.php?id=strawman:encapsulated_hashcodes

 

Allen


_______________________________________________
es-discuss mailing list
es-discuss@...
https://mail.mozilla.org/listinfo/es-discuss

Re: encapsulated hash codes

by Maciej Stachowiak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Nov 4, 2009, at 2:49 PM, Allen Wirfs-Brock wrote:

A straw man proposal for  Harmony encapsulated hash codes  has been posted to the Wiki athttp://wiki.ecmascript.org/doku.php?id=strawman:encapsulated_hashcodes

1) Wouldn't it be simpler to have a single Object.hash() function that returns a uniformly distributed number in the 0..2^32 range, and leave the mod operation up to code using the hash code? Making a new function that (essentially) encapsulates the hash operation and the mod does not seem to pull its weight in API complexity.

2) What about non-objects? It would be nice if strings, numbers, and perhaps other primitive types could hash by value rather than by identity, so you could reasonably mix them as hash keys with objects.

3) Security considerations: it should be recommended that the hash code, if computed from the address of the object, is not reversible. (Of course, with a copying collector you probably can't compute the hash code from the address).

Regards,
Maciej


_______________________________________________
es-discuss mailing list
es-discuss@...
https://mail.mozilla.org/listinfo/es-discuss

RE: encapsulated hash codes

by Allen Wirfs-Brock-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Some parts of this message have been removed. Learn more about Nabble's security policy.

 See below

 

From: Maciej Stachowiak [mailto:mjs@...]
Sent: Wednesday, November 04, 2009 4:50 PM
To: Allen Wirfs-Brock
Cc: es-discuss
Subject: Re: encapsulated hash codes

 

 

On Nov 4, 2009, at 2:49 PM, Allen Wirfs-Brock wrote:



A straw man proposal for  Harmony encapsulated hash codes  has been posted to the Wiki at.

 

1)     Wouldn't it be simpler to have a single Object.hash() function that returns a uniformly distributed number in the 0..2^32 range, and leave the mod operation up to code using the hash code? Making a new function that (essentially) encapsulates the hash operation and the mod does not seem to pull its weight in API complexity.

 

Yes, it would and that is the classic approach (perhaps excluding the uniformity of distribution)  used by most languages.  As an implementer I would be fairly happy with that approach.  However, some of our members have significant concerns about the possibility of hash codes being used as a covert communications canner.  This proposal tries to avoid that problem by giving each client  of identity hashes (most likely each hash table) their own hash function which yields a distinct sequences of hash values (for the same sequence of objects).  Since  the functions aren’t shared the sequence of values they generate can be used as a channel.

 

Requiring uniform distribution over a 32-bit range may not be practical.  If a client only wants hashes in the 0.255 range the obvious function of a 32-bit value (mod 256) ignores most of the bits that actually distinguish the values.

 

The individual functions probably do a bit more than a simple mod.  Typically, the seed that is used to generate per object hash values  (for example, the allocation address of an object) has fewer than 32 variable bits (for example, only the size range of the allocation nursery).  If the hash function know range of hash values it is expected to generate it can more effectively use the actual variable bits to generate a good distribution.

 

2) What about non-objects? It would be nice if strings, numbers, and perhaps other primitive types could hash by value rather than by identity, so you could reasonably mix them as hash keys with objects.

 

Sure this can be added, but isn’t an essential primitive. Hash values can be easily  generated today for all the built-in value types.  But object identity hash requires some sort of primitive support. 

 

2)     Security considerations: it should be recommended that the hash code, if computed from the address of the object, is not reversible. (Of course, with a copying collector you probably can't compute the hash code from the address).

Yes, that would be a good requirement.   In practice, it’s usually the case anyway.  Actually, the address is a good seed hash value even for copying collectors.  There are tricks that are known by lisp/smalltalk/etc. implementers for efficiently doing this.  Happy to share…  Ideally, identity hashes cost nothing to compute and require no additional per object storage.  The ideal isn’t obtainable, but you can get pretty close.

 

Regards,

Maciej

 


_______________________________________________
es-discuss mailing list
es-discuss@...
https://mail.mozilla.org/listinfo/es-discuss

Re: encapsulated hash codes

by Maciej Stachowiak :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Nov 4, 2009, at 6:43 PM, Allen Wirfs-Brock wrote:

 See below
 
From: Maciej Stachowiak [mjs@...] 
Sent: Wednesday, November 04, 2009 4:50 PM
To: Allen Wirfs-Brock
Cc: es-discuss
Subject: Re: encapsulated hash codes
 
 
On Nov 4, 2009, at 2:49 PM, Allen Wirfs-Brock wrote:


A straw man proposal for  Harmony encapsulated hash codes  has been posted to the Wiki at.
 
1)     Wouldn't it be simpler to have a single Object.hash() function that returns a uniformly distributed number in the 0..2^32 range, and leave the mod operation up to code using the hash code? Making a new function that (essentially) encapsulates the hash operation and the mod does not seem to pull its weight in API complexity.
 
Yes, it would and that is the classic approach (perhaps excluding the uniformity of distribution)  used by most languages.  As an implementer I would be fairly happy with that approach.  However, some of our members have significant concerns about the possibility of hash codes being used as a covert communications canner.  This proposal tries to avoid that problem by giving each client  of identity hashes (most likely each hash table) their own hash function which yields a distinct sequences of hash values (for the same sequence of objects).  Since  the functions aren’t shared the sequence of values they generate can be used as a channel.

How can you use it as a covert channel? If you can only produce a hash code given a value, you need an overt communication channel to transfer the value to use a hash function for covert communication. Is there a more complicated scenario that I'm not thinking of?

 
Requiring uniform distribution over a 32-bit range may not be practical.  If a client only wants hashes in the 0.255 range the obvious function of a 32-bit value (mod 256) ignores most of the bits that actually distinguish the values.

Thats ok. For a good quality hash function, this will be distributed just as uniformly as a hash that produces values 0..255. Uniform distribution over a 32-bit range is quite practical, there are many well-lnown algorithms.

 The individual functions probably do a bit more than a simple mod.  Typically, the seed that is used to generate per object hash values  (for example, the allocation address of an object) has fewer than 32 variable bits (for example, only the size range of the allocation nursery).  If the hash function know range of hash values it is expected to generate it can more effectively use the actual variable bits to generate a good distribution.

I don't believe that is the case. With a good hash function, changing any one bit in the input will on average change half the bits in the output (and similar properties for correlations of multiple bits). Thus, the low 28 bits of  32-bit hash code on values with only (say) 28 bits of entropy will be just as good as the best 28-bit hash code you could make.

See here for some references: http://burtleburtle.net/bob/hash/#lookup

 
2) What about non-objects? It would be nice if strings, numbers, and perhaps other primitive types could hash by value rather than by identity, so you could reasonably mix them as hash keys with objects.
 
Sure this can be added, but isn’t an essential primitive. Hash values can be easily  generated today for all the built-in value types.  But object identity hash requires some sort of primitive support. 

True, but that way is much less convenient and likely to be less performant.

 
2)     Security considerations: it should be recommended that the hash code, if computed from the address of the object, is not reversible. (Of course, with a copying collector you probably can't compute the hash code from the address).
Yes, that would be a good requirement.   In practice, it’s usually the case anyway. 

Actually, many popular non-cryptographic hash functions are fully or partially reversible. It's a desirable property that certain passes of computing the hash function are reversible.  To guarantee what I suggested you'd have to combine an unpredictable salt with the object address or use a cryptographic hash (which is more expensive) or do something more tricky.

Actually, the address is a good seed hash value even for copying collectors.  There are tricks that are known by lisp/smalltalk/etc. implementers for efficiently doing this.  Happy to share…  Ideally, identity hashes cost nothing to compute and require no additional per object storage.  The ideal isn’t obtainable, but you can get pretty close.

I would be interested to hear bout these tricks.

Regards,
Maciej


_______________________________________________
es-discuss mailing list
es-discuss@...
https://mail.mozilla.org/listinfo/es-discuss