Re: Binary Data - possible topic for joint session

by Maciej Stachowiak

+ es-discuss (since posting there seems to have piqued more interest)

 From reading over other proposals for binary data, I should mention  
the following operations that seem to be of interest to some  
communities but are not directly provided in this proposal (with ones  
I think are most appropriate for v1 first):

- Subrange/subdata/substring (get a Data that's a range from another's  
buffer - perhaps this could be optimized not to copy)
- Concatenation (specifically the ability to concatenate two immutable  
Data objects and get a new one back without having to go through the  
mutable type).
- Ability to convert to/from strings (with some hardcoded encoding or  
choice of encoding)
- Some or all of the operations of Array
- Base64 encode/decode
- Methods to compute various cryptographic hashes
- Find first or last occurrence of a byte or byte sequence (from a  
given offset)
- Split on a byte or byte sequence

I think it is possible to implement all of these with the primitives  
in my proposal, and in many cases the utility seems dubious (do you  
really want to map() or reduce() binary data one byte at a time?).  
Thus, I lean towards keeping the API relatively minimal, at least for  
starters.
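
To make that claim concrete, here is a minimal sketch of how subrange, concatenation and byte search could be layered on the proposal's primitives. The `Data` and `DataBuilder` classes are hypothetical stand-ins modelled on the strawman quoted below: the `Uint8Array` backing store, the exposed `bytes` field, and the exclusive `srcEnd` are assumptions of this sketch, not part of the proposal. `subrange`, `concat`, `indexOfByte` and `dataFrom` are illustrative helper names.

```javascript
// Hypothetical stand-in for the immutable type. Freezing stops `bytes` from
// being reassigned; the array contents are read-only by convention only.
class Data {
  constructor(bytes) {
    this.bytes = bytes;
    Object.freeze(this);
  }
  get length() { return this.bytes.length; }
}

// Hypothetical stand-in for the mutable builder, with only the members used here.
class DataBuilder {
  constructor(size) {
    this.bytes = new Uint8Array(size); // zero-filled
  }
  get length() { return this.bytes.length; }

  // Copy srcObject[srcStart, srcEnd) into this builder starting at dstStart.
  // Assumption: srcEnd is exclusive.
  copyRange(dstStart, srcObject, srcStart, srcEnd) {
    this.bytes.set(srcObject.bytes.subarray(srcStart, srcEnd), dstStart);
  }

  // Hand the buffer to a new Data and reset this builder to length 0.
  release() {
    const data = new Data(this.bytes);
    this.bytes = new Uint8Array(0);
    return data;
  }
}

// Build a Data from a plain array of byte values (helper for the examples).
function dataFrom(values) {
  const builder = new DataBuilder(values.length);
  builder.bytes.set(values);
  return builder.release();
}

// Subrange: copy [start, end) into a fresh builder, then release it.
function subrange(data, start, end) {
  const builder = new DataBuilder(end - start);
  builder.copyRange(0, data, start, end);
  return builder.release();
}

// Concatenation of two immutable objects via one temporary builder.
function concat(a, b) {
  const builder = new DataBuilder(a.length + b.length);
  builder.copyRange(0, a, 0, a.length);
  builder.copyRange(a.length, b, 0, b.length);
  return builder.release();
}

// Find the first occurrence of a byte at or after fromOffset, or -1.
function indexOfByte(data, byte, fromOffset = 0) {
  for (let i = fromOffset; i < data.length; i++) {
    if (data.bytes[i] === byte) return i;
  }
  return -1;
}

const header = dataFrom([0x89, 0x50]);
const body = dataFrom([0x4e, 0x47, 0x0d]);
const joined = concat(header, body);    // bytes 89 50 4e 47 0d
const middle = subrange(joined, 1, 4);  // bytes 50 4e 47
```

Each helper costs one copy into a fresh builder. A native subrange could presumably share the buffer instead, as suggested in the list above.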

Regards,
Maciej

On Nov 4, 2009, at 4:26 PM, Maciej Stachowiak wrote:

>
> Many APIs being developed for the Web platform would benefit from a  
> good way to store binary data. It would be useful for this to be  
> specified as part of the ECMAScript language, but it's also  
> plausible to make this a W3C spec that's only intended for use with  
> Web platform APIs. Here is an overview of some of the APIs that  
> could use such a data type, some notes on requirements and design  
> alternatives, and a strawman proposal.
>
> If there's time, I'd like to discuss this at the joint TC-39/HTML
> WG/Web Apps WG session.
>
> Some APIs that could use this:
>
>    XMLHttpRequest v2 - to receive and send binary data
>    WebSocket - to receive and send binary packets
>    File API - to read binary files
>    Canvas - to get image data in the binary form of an image format  
> (avoiding inefficiency of data: URLs)
>    various storage APIs - to store and retrieve binary data (in  
> combination with other APIs)
>    postMessage - to send binary data cross-window and cross-thread  
> (to Workers) efficiently
>
> I suspect there's more I am not thinking of. A convenient and  
> efficient way to represent binary data could also be useful for pure  
> ES programs.
>
>
> = Current de facto ways for Web apps to deal with binary data:
>
>    Array of numbers with one byte per entry
>    String with one byte stored per UTF-16 code unit
>    String with two bytes stored per UTF-16 code unit
>
> I hope it is obvious why these approaches are not great so I won't  
> go into detail.
>
>
> = Issues for the binary data API:
>
>     Name (potential bikeshed):
>         ByteArray
>         ByteVector
>         BinaryData
>         Data
>
> I like "Data" and similar names. Objective-C has NSData as a  
> distinct type for chunks of binary data - it's not treated as a type  
> of array. I think this makes sense. Often the fact that a chunk of  
> binary data can be treated as an octet sequence is incidental.
>
> ==  Mutable or Immutable (or both?)
>
> Immutable has a number of advantages:
>    - Can share backing store with chunks of binary data that the UA  
> already holds (e.g. in the network cache) without requiring copy-on-
> write
>    - Can be passed cross-thread without copying, and without  
> breaking shared-nothing semantics
>    - Has the right semantics for passing cross-window (can make a  
> copy in cross-process case, but avoid it in same-process case; or  
> use shared memory in cross-process case without worrying about  
> locking or races)
>    - Follows the approach of ES strings, which are immutable
>
> But there are some significant disadvantages too:
>    - What if you actually want to mutate some piece of binary data  
> you got before passing it along? How to do this efficiently?
>    - What if you want to build a new binary data item from scratch?
>
> With strings, the answer to both building and mutation is to extract  
> pieces and build a new string by concatenation. But that's probably  
> not efficient or convenient enough for the binary data case.
>
> Possible solution: provide immutable Data, but have a DataBuilder  
> class to allow creating new data items or mutating copies of  
> existing ones, which can then give a final immutable product.
>
>
> == What Operations?
>
> Operation set could be a full set of array-like operations,  
> absolutely minimal (just accessors for individual bytes), or middle  
> ground (byte-level accessors plus a few bulk operations like the  
> equivalent of memcpy). I like the middle ground.
>
> == Rough API Proposal
>
> Here's a sketch of a binary data API that's immutable (with mutable  
> builder class), and provides a middle-ground set of operations. The  
> basic idea is that binary data should be considered a first-class  
> datatype in its own right, just as strings are, rather than being  
> thought of as a kind of array.
>
> Data -- global constructor
>    When called or invoked as a constructor with a number parameter,  
> return a new Data object of the specified size, filled with all zero  
> bytes.
>
> Data.prototype -- the initial Data prototype
>
> Data.prototype.builderCopy()
>
>    When called with a Data instance as the this parameter, return a  
> new DataBuilder object starting with the same size and a copy of the  
> bytes in this Data object.
>
> Data instance properties:
>
>    length - size of the Data object - read-only
>    index properties - individual bytes of the Data (similar to array  
> access) - read-only
>
> DataBuilder -- global constructor
>    When called or invoked as a constructor with a number parameter,  
> return a new DataBuilder object of the specified size, filled with  
> all zero bytes.
>
> DataBuilder.prototype.builderCopy()
>
>    When called with a DataBuilder instance as the this parameter,
> return a new DataBuilder object starting with the same size and a
> copy of the bytes in this DataBuilder object.
>
> DataBuilder.prototype.copyRange(dstStart, srcObject, srcStart, srcEnd)
>
>    When called with a DataBuilder instance as the this parameter,  
> copy bytes from srcObject starting at offset srcStart up to offset  
> srcEnd. srcObject can be a Data or a DataBuilder, and can be the  
> same as the "this" object. Overlapping ranges are guaranteed to be  
> copied correctly. dstStart is the offset in this DataBuilder at
> which to begin writing.
>
> DataBuilder.prototype.fill(byte, dstStart, dstEnd)
>
>   Fill with "byte" from dstStart to dstEnd.
>
> DataBuilder.prototype.release()
>
>   Return a Data object of the same size and containing the same  
> bytes as this DataBuilder, and at the same time reset this  
> DataBuilder to 0 length. This is so that the new Data object can  
> adopt the buffer of this DataBuilder without copying, which is what  
> is commonly desired.
>
> DataBuilder instance properties:
>
>    length - size of the DataBuilder object - read-write
>    index properties - individual bytes of the DataBuilder (similar  
> to array access) - read-write
>
>
> Rationale:
>
>   - copyRange() and fill() are the only higher-level operations  
> provided, because they can be implemented much more efficiently for  
> large ranges in native code than in ECMAScript.
>
>   - Data would be returned and taken by all Web APIs; its
> immutability allows binary data to be passed around without copying.
>
>   - DataBuilder allows creation and mutation while minimizing copies  
> and letting most of a system maintain the benefits of immutability.
>
>   - DataBuilder.prototype.release() is specifically designed to let  
> a program use mutation to build up a chunk of binary data, then pass  
> it off to code that should not mutate it or across boundaries with  
> shared-nothing semantics (like Workers), without requiring a copy  
> after initially building.
>
> Sorry that this is so sketchy, but I thought this would make a good  
> starting point for discussion.
>
> Regards,
> Maciej

_______________________________________________
es-discuss mailing list
es-discuss@...
https://mail.mozilla.org/listinfo/es-discuss

Re: Binary Data - possible topic for joint session

by Brendan Eich-3

On Nov 6, 2009, at 1:34 AM, Maciej Stachowiak wrote:

> = Issues for the binary data API:
>
>    Name (potential bikeshed):
>        ByteArray
>        ByteVector
>        BinaryData
>        Data

This isn't just rank bikeshedding:

1. Data is so common a name that we can't confidently inject it into  
the global object without fear of breaking something. JSON, in spite  
of json2.js precedent, was implemented incompatibly and object-
detected insufficiently, although this was corrected by the  
implementors (Facebook folks, much appreciated). Google codesearch  
results:

http://www.google.com/codesearch?as_q=%22function+Data%28%22&btnG=Search+Code&hl=en&as_lang=javascript&as_case=y
http://www.google.com/codesearch?as_q=%22var+Data;%22&btnG=Search+Code&hl=en&as_lang=javascript&as_case=y
http://www.google.com/codesearch?as_q=%22var+Data%20=%22&btnG=Search+Code&hl=en&as_lang=javascript&as_case=y

2. Data is annoyingly close to Date.

3. Data is technically plural, and usage sometimes treats it as plural  
(ok, this is almost bikeshedding, I admit). For a String analogue this  
is awkward.
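
Point 1 can be illustrated with a small sketch. It assumes a legacy page script that uses the common object-detection idiom before defining its own `Data` constructor. `loadLegacyScript`, both fake global objects, and the shape of the built-in `Data` are hypothetical and only model the failure mode.

```javascript
// Hypothetical legacy page script, modelled as a function over a global object.
function loadLegacyScript(globalObject) {
  // Object detection: define Data only if nothing by that name exists yet.
  if (typeof globalObject.Data === "undefined") {
    globalObject.Data = function Data(fields) {
      this.fields = fields;
    };
  }
  // The script then assumes Data is its own constructor.
  return new globalObject.Data({ title: "report" });
}

// An engine without a built-in Data: the script's own constructor is used.
const oldGlobal = {};
const onOldEngine = loadLegacyScript(oldGlobal);

// An engine that injects a (hypothetical) built-in Data(size) constructor:
// detection now skips the script's definition, and the call misbehaves.
const newGlobal = {
  Data: function Data(size) {
    this.length = Number(size) || 0;
  },
};
const onNewEngine = loadLegacyScript(newGlobal);

console.log(onOldEngine.fields); // the object the script passed in
console.log(onNewEngine.fields); // undefined: the built-in ignored it
```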


> I like "Data" and similar names. Objective-C has NSData as a  
> distinct type for chunks of binary data - it's not treated as a type  
> of array. I think this makes sense. Often the fact that a chunk of  
> binary data can be treated as an octet sequence is incidental.

It's not incidental unless you provide wider-than-byte element access  
and address byte order. Let's not, in the interest of serving API  
simplicity and common octet-sequence use-cases first and only (if we  
can hold this line).

Therefore I think a concrete name such as ByteVector or ByteArray is  
better, all else equal.

Moreover a name such as ByteVector is much easier to inject as a  
global property. No hits for the obvious function and var forms of it,  
one hit for ByteArray:

http://www.google.com/codesearch?hl=en&lr=&q=%22function+ByteArray%28%22+lang%3Ajavascript&sbtn=Search

/be

Re: Binary Data - possible topic for joint session

by Maciej Stachowiak


On Nov 6, 2009, at 8:26 AM, Brendan Eich wrote:

> On Nov 6, 2009, at 1:34 AM, Maciej Stachowiak wrote:
>
>> = Issues for the binary data API:
>>
>>   Name (potential bikeshed):
>>       ByteArray
>>       ByteVector
>>       BinaryData
>>       Data
>
> This isn't just rank bikeshedding:
>
> 1. Data is so common a name that we can't confidently inject it into  
> the global object without fear of breaking something. JSON, in spite  
> of json2.js precedent, was implemented incompatibly and object-
> detected insufficiently, although this was corrected by the  
> implementors (Facebook folks, much appreciated). Google codesearch  
> results:
>
> http://www.google.com/codesearch?as_q=%22function+Data%28%22&btnG=Search+Code&hl=en&as_lang=javascript&as_case=y
> http://www.google.com/codesearch?as_q=%22var+Data;%22&btnG=Search+Code&hl=en&as_lang=javascript&as_case=y
> http://www.google.com/codesearch?as_q=%22var+Data%20=%22&btnG=Search+Code&hl=en&as_lang=javascript&as_case=y
>
> 2. Data is annoyingly close to Date.
>
> 3. Data is technically plural, and usage sometimes treats it as  
> plural (ok, this is almost bikeshedding, I admit). For a String  
> analogue this is awkward.

You're right that there are some objective factors which may rule out  
certain names, in addition to subjective taste concerns. I tried not  
to think too hard about the name in making the original proposal,  
since I figured there would be a range of opinion. Your stated reasons  
against Data seem decent.

>
>
>> I like "Data" and similar names. Objective-C has NSData as a  
>> distinct type for chunks of binary data - it's not treated as a  
>> type of array. I think this makes sense. Often the fact that a  
>> chunk of binary data can be treated as an octet sequence is  
>> incidental.
>
> It's not incidental unless you provide wider-than-byte element  
> access and address byte order. Let's not, in the interest of serving  
> API simplicity and common octet-sequence use-cases first and only  
> (if we can hold this line).

Indeed, I'd rather not propose APIs like that in the initial version
(though I think eventually we may want a way to copy sequences of 16-,
32-, or 64-bit values, swapping from network byte order to host byte
order or vice versa, to make it practical to interpret popular binary
formats).
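
Until such an API exists, script code with only byte-level access would have to assemble wider values by hand. A minimal sketch, assuming big-endian (network order) 32-bit unsigned fields: `readUint32BE` and `writeUint32BE` are hypothetical helper names, and `bytes` is any indexable sequence of byte values rather than a type from the proposal.

```javascript
// Read an unsigned 32-bit integer stored in network (big-endian) byte order.
function readUint32BE(bytes, offset) {
  // Multiply the top byte instead of using << 24, which would produce a
  // negative number whenever the high bit is set (shifts work on int32).
  return (
    bytes[offset] * 0x1000000 +
    ((bytes[offset + 1] << 16) | (bytes[offset + 2] << 8) | bytes[offset + 3])
  );
}

// Write an unsigned 32-bit integer in network (big-endian) byte order.
function writeUint32BE(bytes, offset, value) {
  bytes[offset] = (value >>> 24) & 0xff;
  bytes[offset + 1] = (value >>> 16) & 0xff;
  bytes[offset + 2] = (value >>> 8) & 0xff;
  bytes[offset + 3] = value & 0xff;
}

const packet = [0xde, 0xad, 0xbe, 0xef];
const value = readUint32BE(packet, 0); // 0xdeadbeef

const out = [0, 0, 0, 0];
writeUint32BE(out, 0, value);          // out now holds de ad be ef
```

Doing this one field at a time in script is the cost that a native bulk byte-swapping copy would avoid.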

However, I think a common use case for binary data is to pass it
around from point A to point B, without unpacking the internals at all,
just as for strings. For example, you may read a file in binary form,  
pass the binary data off to a Worker, and then have the Worker upload  
it to a server. This is part of why I leaned towards a name that does  
not overly emphasize the byte sequence nature.

>
> Therefore I think a concrete name such as ByteVector or ByteArray is  
> better, all else equal.
>
> Moreover a name such as ByteVector is much easier to inject as a  
> global property. No hits for the obvious function and var forms of  
> it, one hit for ByteArray:
>
> http://www.google.com/codesearch?hl=en&lr=&q=%22function+ByteArray%28%22+lang%3Ajavascript&sbtn=Search

Some other possible names (based in part on some other binary data  
proposals that I've seen):

BinaryData
BinData
ByteString
Binary
Blob

Good topic for in-person discussion maybe?

Regards,
Maciej

Re: Binary Data - possible topic for joint session

by Brendan Eich-3

On Nov 6, 2009, at 9:18 AM, Maciej Stachowiak wrote:

> On Nov 6, 2009, at 8:26 AM, Brendan Eich wrote:
>
> Indeed, I'd rather not propose APIs like that in the initial version
> (though I think eventually we may want a way to copy sequences of
> 16-, 32-, or 64-bit values, swapping from network byte order to
> host byte order or vice versa, to make it practical to interpret
> popular binary formats).

Could be -- I agree we should defer.


> However, I think a common use case for binary data is to pass it
> around from point A to point B, without unpacking the internals at
> all, just as for strings. For example, you may read a file in binary  
> form, pass the binary data off to a Worker, and then have the Worker  
> upload it to a server. This is part of why I leaned towards a name  
> that does not overly emphasize the byte sequence nature.

Yet the minimal API will give byte-element access, not variable-length  
bit string or any other such access. Concrete beats abstract every  
time in this level of discourse, in my experience.


>> Therefore I think a concrete name such as ByteVector or ByteArray  
>> is better, all else equal.
>>
>> Moreover a name such as ByteVector is much easier to inject as a  
>> global property. No hits for the obvious function and var forms of  
>> it, one hit for ByteArray:
>>
>> http://www.google.com/codesearch?hl=en&lr=&q=%22function+ByteArray%28%22+lang%3Ajavascript&sbtn=Search
>
> Some other possible names (based in part on some other binary data  
> proposals that I've seen):
>
> BinaryData
> BinData
> ByteString

These look free of conflicts from some quick codesearch'ing --  
ByteString is good, better than ByteVector if we do not allow bytes to  
be mutated.


> Binary

Existing uses:

http://www.google.com/codesearch?hl=en&lr=&q=%22function+Binary%28%22+lang%3Ajavascript&sbtn=Search

If any of these involve detection, we probably can't use Binary. I did  
not look further, but suggest we eliminate candidate names that are  
already in use according to Google codesearch.


> Blob

A few hits:

http://www.google.com/codesearch?hl=en&lr=&q=%22function+Blob%28%22+lang%3Ajavascript&sbtn=Search

/be
>
> Good topic for in-person discussion maybe?
>
> Regards,
> Maciej
>

Re: Binary Data - possible topic for joint session

by Dean Landolt

Just in case some of you weren't aware, the CommonJS group has done quite a bit of work (and bikeshedding) on this topic. Here's a link to the wiki:

http://wiki.commonjs.org/wiki/Binary

If nothing else, there's quite a bit of prior art collected which should inform the conversation. I know the Binary/B proposal has the implementation momentum, but I don't know exactly what the status is. I haven't been following the evolution of these binary specs too closely, but since it seems that nearly everyone else from the group is off to jsconf.eu, I figured I ought to toss this out there.

Re: Binary Data - possible topic for joint session

by Brendan Eich-3

On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:

> Just in case some of you weren't aware, the CommonJS group has done quite a bit of work (and bikeshedding) on this topic. Here's a link to the wiki:
>
> http://wiki.commonjs.org/wiki/Binary
>
> If nothing else, there's quite a bit of prior art collected which should inform the conversation. I know the Binary/B proposal has the implementation momentum, but I don't know exactly what the status is. I haven't been following the evolution of these binary specs too closely, but since it seems that nearly everyone else from the group is off to jsconf.eu, I figured I ought to toss this out there.

Thanks, I had forgotten about commonjs.org, having once paid better attention.

Kris did a good job with Binary/B (although I do not see the point of the .get method additions) -- I didn't look at the other proposals yet.

/be

Re: Binary Data - possible topic for joint session

by Ash Berlin-5


On 6 Nov 2009, at 19:24, Brendan Eich wrote:

> On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:
>
>> Just in case some of you weren't aware, the CommonJS group has done quite a bit of work (and bikeshedding) on this topic. Here's a link to the wiki:
>>
>> http://wiki.commonjs.org/wiki/Binary
>>
>> If nothing else, there's quite a bit of prior art collected which should inform the conversation. I know the Binary/B proposal has the implementation momentum, but I don't know exactly what the status is. I haven't been following the evolution of these binary specs too closely, but since it seems that nearly everyone else from the group is off to jsconf.eu, I figured I ought to toss this out there.
>
> Thanks, I had forgotten about commonjs.org, having once paid better attention.
>
> Kris did a good job with Binary/B (although I do not see the point of the .get method additions) -- I didn't look at the other proposals yet.
>
> /be

Binary/B feels largely right, but for my taste it has a few too many methods taken from Array simply because Array had them, specifically things like sort, reduce, shift, unshift, etc.

Conceptually: why would you want to sort an array of bytes? There are certainly classes of operations that I think should just be done via b.toArray().X rather than directly on the blob.

As a community (CommonJS) we'd be more than happy to go forward with a binary spec that came from (or at least has the blessing of) the ES groups.

-ash

Re: Binary Data - possible topic for joint session

by Maciej Stachowiak


On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:


> On 6 Nov 2009, at 19:24, Brendan Eich wrote:
>
>> On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:
>>
>>> Just in case some of you weren't aware, the CommonJS group has done quite a bit of work (and bikeshedding) on this topic. Here's a link to the wiki:
>>>
>>> http://wiki.commonjs.org/wiki/Binary
>>>
>>> If nothing else, there's quite a bit of prior art collected which should inform the conversation. I know the Binary/B proposal has the implementation momentum, but I don't know exactly what the status is. I haven't been following the evolution of these binary specs too closely, but since it seems that nearly everyone else from the group is off to jsconf.eu, I figured I ought to toss this out there.
>>
>> Thanks, I had forgotten about commonjs.org, having once paid better attention.
>>
>> Kris did a good job with Binary/B (although I do not see the point of the .get method additions) -- I didn't look at the other proposals yet.
>>
>> /be
>
> Binary/B feels largely right, but for my taste it has a few too many methods taken from Array simply because Array had them, specifically things like sort, reduce, shift, unshift, etc.
>
> Conceptually: why would you want to sort an array of bytes? There are certainly classes of operations that I think should just be done via b.toArray().X rather than directly on the blob.
>
> As a community (CommonJS) we'd be more than happy to go forward with a binary spec that came from (or at least has the blessing of) the ES groups.

Binary/B is the closest of the three proposals to mine, in that it has both mutable and immutable binary data containers. Here are a few key differences:

(1) Binary/B does not have a cheap way to convert from the immutable representation (ByteString) to the mutable representation (ByteArray)
(2) In Binary/B, Array-like index access to ByteString gives back one-byte ByteStrings instead of bytes, likely an over-literal copying of String
(3) There are some seemingly needless differences in the interfaces to ByteString and ByteArray that follow from modeling on String and Array 
(4) Binary/B has many more operations available in the base proposal (including charset transcoding and a generous selection of String and Array methods)
(5) Different names - Data/DataBuilder vs. ByteString/ByteArray

My initial impression is that (1), (2) and (3) are all points on which my proposal is better. On (1): cheap conversion from mutable to immutable (DataBuilder.prototype.release() in my proposal) lets binary data objects be built up with a convenient mutation-based idiom, but then passed around as immutable objects thereafter. On (2): I don't think a one-byte ByteString is ever useful; indexing to get the byte value would be much more helpful. On (3), I think it's good for the mutable interface to be a strict superset of the immutable interface.

(4) and (5) are both points where perhaps neither proposal is at the optimum yet. On (4), I suspect the sweet spot is somewhere between my spartan set of built-in operations and the very generous set in Binary/B. On (5), I'm not sure either set of names is the best possible, and I'm certainly not stuck on my own proposed names.

Regards,
Maciej


Re: Binary Data - possible topic for joint session

by Ash Berlin-5


On 8 Nov 2009, at 02:21, Maciej Stachowiak wrote:


> On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:
>
>> On 6 Nov 2009, at 19:24, Brendan Eich wrote:
>>
>>> On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:
>>>
>>>> http://wiki.commonjs.org/wiki/Binary
>>>>
>>>> [snip]
>>
>> [snip]
>> As a community (CommonJS) we'd be more than happy to go forward with a binary spec that came from (or at least has the blessing of) the ES groups
>
> Binary/B is the closest of the three proposals to mine, in that it has both mutable and immutable binary data containers. Here are a few key differences:
>
> (1) Binary/B does not have a cheap way to convert from the immutable representation (ByteString) to the mutable representation (ByteArray)
> (2) In Binary/B, Array-like index access to ByteString gives back one-byte ByteStrings instead of bytes, likely an over-literal copying of String
> (3) There are some seemingly needless differences in the interfaces to ByteString and ByteArray that follow from modeling on String and Array
> (4) Binary/B has many more operations available in the base proposal (including charset transcoding and a generous selection of String and Array methods)
> (5) Different names - Data/DataBuilder vs. ByteString/ByteArray
>
> On (1): cheap conversion from mutable to immutable (DataBuilder.prototype.release() in my proposal) lets binary data objects be built up with a convenient mutation-based idiom, but then passed around as immutable objects thereafter.

Mutable to immutable or immutable to mutable? Assuming the former, how do you handle the differences in API/behaviour? Does each function check whether it is now immutable?

> On (2): I don't think a one-byte ByteString is ever useful; indexing to get the byte value would be much more helpful.

Couldn't agree more with you here - if for whatever reason you do want a one-byte ByteString, there is always substr/substring. This is something that came up recently in IRC and prompted me to start looking at making changes to the proposal - I was going to do that next week, so this coming up now is very good timing.

> On (3), I think it's good for the mutable interface to be a strict superset of the immutable interface.

Seems like a reasonable thing to do.

> (4) and (5) are both points where perhaps neither proposal is at the optimum yet. On (4), I suspect the sweet spot is somewhere between my spartan set of built-in operations and the very generous set in Binary/B.

Agreed - this was the other thing I noticed - e.g. sorting a ByteArray isn't really an operation that makes a whole lot of sense to my mind.

> On (5), I'm not sure either set of names is the best possible, and I'm certainly not stuck on my own proposed names.

I'm not really bothered either way on this front, although 'Data' is much more likely to clash with existing code.

> Regards,
> Maciej

Something worth bearing in mind is that Binary/B is implemented in 2 or 3 CommonJS platforms already, but I don't think anyone is particularly attached to the behaviour so long as what comes out isn't wildly different.

-ash 

Re: Binary Data - possible topic for joint session

by Maciej Stachowiak


On Nov 7, 2009, at 6:53 PM, Ash Berlin wrote:


> On 8 Nov 2009, at 02:21, Maciej Stachowiak wrote:
>
>> On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:
>>
>>> On 6 Nov 2009, at 19:24, Brendan Eich wrote:
>>>
>>>> On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:
>>>>
>>>>> http://wiki.commonjs.org/wiki/Binary
>>>>>
>>>>> [snip]
>>>
>>> [snip]
>>> As a community (CommonJS) we'd be more than happy to go forward with a binary spec that came from (or at least has the blessing of) the ES groups
>>
>> Binary/B is the closest of the three proposals to mine, in that it has both mutable and immutable binary data containers. Here are a few key differences:
>>
>> (1) Binary/B does not have a cheap way to convert from the immutable representation (ByteString) to the mutable representation (ByteArray)
>> (2) In Binary/B, Array-like index access to ByteString gives back one-byte ByteStrings instead of bytes, likely an over-literal copying of String
>> (3) There are some seemingly needless differences in the interfaces to ByteString and ByteArray that follow from modeling on String and Array
>> (4) Binary/B has many more operations available in the base proposal (including charset transcoding and a generous selection of String and Array methods)
>> (5) Different names - Data/DataBuilder vs. ByteString/ByteArray
>>
>> On (1): cheap conversion from mutable to immutable (DataBuilder.prototype.release() in my proposal) lets binary data objects be built up with a convenient mutation-based idiom, but then passed around as immutable objects thereafter.
>
> Mutable to immutable or immutable to mutable? Assuming the former, how do you handle the differences in API/behaviour? Does each function check whether it is now immutable?

Mutable to immutable. Immutable to mutable has to copy (or at least copy-on-write).

My proposal does it like this (where DataBuilder is the mutable variant and Data is the immutable):

DataBuilder.prototype.release()

    Return a new Data with the same length and the same byte values as the DataBuilder passed as the this value. At the same time, the DataBuilder is reset to length 0.

Because the DataBuilder is reset to empty, the implementation can "steal" its underlying buffer for the new Data object, thus converting to immutable without a full copy. This matches the common pattern of assembling a new piece of binary data with mutation, then handing it out to possibly multiple other pieces of code as immutable.
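
The two conversion directions can be modelled roughly as follows. This is only a sketch: the private `Uint8Array` backing store, the number-or-bytes `DataBuilder` constructor, and the `byteAt`/`setByte` methods (standing in for index properties) are assumptions, not part of the proposal.

```javascript
class Data {
  #bytes;
  constructor(bytes) { this.#bytes = bytes; }
  get length() { return this.#bytes.length; }
  byteAt(index) { return this.#bytes[index]; } // stand-in for read-only data[index]

  // Immutable -> mutable: must copy, because this Data may be shared.
  builderCopy() {
    return new DataBuilder(this.#bytes.slice());
  }
}

class DataBuilder {
  #bytes;
  // Accepts a size (zero-filled) or, for this sketch only, bytes to adopt.
  constructor(sizeOrBytes) {
    this.#bytes = typeof sizeOrBytes === "number"
      ? new Uint8Array(sizeOrBytes)
      : sizeOrBytes;
  }
  get length() { return this.#bytes.length; }
  byteAt(index) { return this.#bytes[index]; }
  setByte(index, value) { this.#bytes[index] = value; }

  // Mutable -> immutable: the new Data adopts the buffer without copying,
  // which is safe only because the builder lets go of it at the same time.
  release() {
    const data = new Data(this.#bytes);
    this.#bytes = new Uint8Array(0);
    return data;
  }
}

const builder = new DataBuilder(3);
builder.setByte(0, 0xca);
builder.setByte(1, 0xfe);
builder.setByte(2, 0x42);

const data = builder.release(); // builder.length is now 0
const copy = data.builderCopy();
copy.setByte(0, 0x00);          // changes the copy, not data
```

Because nothing retains a writable reference to the released buffer, no per-call mutability check is needed: the two types simply expose different methods.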



>> On (2): I don't think a one-byte ByteString is ever useful; indexing to get the byte value would be much more helpful.
>
> Couldn't agree more with you here - if for whatever reason you do want a one-byte ByteString, there is always substr/substring. This is something that came up recently in IRC and prompted me to start looking at making changes to the proposal - I was going to do that next week, so this coming up now is very good timing.
>
>> On (3), I think it's good for the mutable interface to be a strict superset of the immutable interface.
>
> Seems like a reasonable thing to do.

I'm glad we agree on these two points.



>> (4) and (5) are both points where perhaps neither proposal is at the optimum yet. On (4), I suspect the sweet spot is somewhere between my spartan set of built-in operations and the very generous set in Binary/B.
>
> Agreed - this was the other thing I noticed - e.g. sorting a ByteArray isn't really an operation that makes a whole lot of sense to my mind.

Yep. I'm not even sure things like map(), filter() or reduce() are likely to work well. My own preference is to start the API very small, and add incrementally based on demonstrated need and clearly articulated use cases.

 

>> On (5), I'm not sure either set of names is the best possible, and I'm certainly not stuck on my own proposed names.
>
> I'm not really bothered either way on this front, although 'Data' is much more likely to clash with existing code.

Yes, Brendan made this point and presented some good evidence in that direction. I think 'Data' doesn't work but 'Binary' or 'BinData' might.


> Something worth bearing in mind is that Binary/B is implemented in 2 or 3 CommonJS platforms already, but I don't think anyone is particularly attached to the behaviour so long as what comes out isn't wildly different.

What kind of differences do you think they would tolerate? Renaming the classes? Dropping/changing some methods?

Regards,
Maciej



_______________________________________________
es-discuss mailing list
es-discuss@...
https://mail.mozilla.org/listinfo/es-discuss

Re: Binary Data - possible topic for joint session

by Daniel Friesen-4

Maciej Stachowiak wrote:

>
> On Nov 7, 2009, at 5:39 AM, Ash Berlin wrote:
>
>>
>> On 6 Nov 2009, at 19:24, Brendan Eich wrote:
>>
>>> On Nov 6, 2009, at 10:44 AM, Dean Landolt wrote:
>>>
>>>> Just in case some of you weren't aware, the CommonJS group has done
>>>> quite a bit of work and (bikeshedding) on this topic. Here's a link
>>>> to the wiki:
>>>>
>>>> http://wiki.commonjs.org/wiki/Binary
>>>>
>>>> ...
>
> Binary/B is the closest of the three proposals to mine, in that it has
> both mutable and immutable binary data containers. Here are a few key
> differences:
> ...
> Regards,
> Maciej
One note: Binary/C also originally had both a mutable and an immutable type.
The mutable type was moved to IO/B/Buffer
(http://wiki.commonjs.org/wiki/IO/B/Buffer). When comparing to Binary/B,
Binary/C together with IO/B/Buffer is the more equivalent comparison.

--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


Re: Binary Data - possible topic for joint session

by Kris Kowal-2

On Fri, Nov 6, 2009 at 11:24 AM, Brendan Eich <brendan@...> wrote:
> Kris did a good job with Binary/B (although I do not see the point of the
> .get method additions) -- I didn't look at the other proposals yet.

Thanks.  The .get method is certainly not relevant for an ECMAScript
spec, where you have the luxury of specifying [[Get]] and [[Put]].
The .get method in the CommonJS proposal is intended to serve as a
stop-gap for implementations that cannot provide properties.

Kris Kowal

Re: Binary Data - possible topic for joint session

by Ash Berlin-5

On Sat, 07 Nov 2009 19:17:48 -0800, Maciej Stachowiak
wrote:


> Mutable to immutable. Immutable to mutable has to copy (or at least
> copy-on-write).
>
> My proposal does it like this (where DataBuilder is the mutable
> variant and Data is the immutable):
>
> DataBuilder.prototype.release()
>
> Return a new Data with the same length and the same byte values
> as the DataBuilder passed as the this value. At the same time, the
> DataBuilder is reset to length 0.
>
> Because the DataBuilder is reset to empty, the implementation can
> "steal" its underlying buffer for the new Data object, thus converting
> to immutable without a full copy. This matches the common pattern of
> assembling a new piece of binary data with mutation, then handing it
> out to possibly multiple other pieces of code as immutable.
>

Seems like a good idea, but is this a case of baking too much
implementation detail in the spec?


> Yes, Brendan made this point and presented some good evidence in that
> direction. I think 'Data' doesn't work but 'Binary' or 'BinData' might.

Yay naming bike-shedding. Perhaps postponing the naming until later on in
the process once the rest of the API is more concrete?

>> Something worth bearing in mind is that Binary/B is implemented in 2
>> or 3 CommonJS platforms already, but I don't think any one is
>> particularly attached to the behaviour so long as what comes out
>> isn't wildly different.
>
> What kind of differences do you think they would tolerate? Renaming
> the classes? Dropping/changing some methods?

I haven't checked with anyone, but so long as there is a clear migration
path I can't see anyone complaining too vocally.


Re: Binary Data - possible topic for joint session

by Kris Kowal-2

[+ commonjs]

On Sat, Nov 7, 2009 at 6:21 PM, Maciej Stachowiak <mjs@...> wrote:
> If nothing else there's quite a bit of prior art collected which should
> inform the conversation. I know the Binary/B proposal has the implementation
> momentum, but I don't know exactly what the status is. I haven't been
> following the evolution of these binary specs too closely but since
> it seems that nearly everyone else from the group is off to jsconf.eu I
> figured I ought to toss this out there.

Thanks, we're back, and convergence on binary data APIs is our next
big thrust.  I spoke with Brian Mitchell at jsconf.eu who has a
significant interest in our binary proposals.  Particularly there was
a lot of interest in bit-quantized data types, which in my opinion
would complement but not replace byte-quantized data types.  At this
point, I imagine that we will eventually have ByteString, ByteArray,
ByteStream, BitString, BitArray, and BitStream types, between our
"binary" and "io" module specifications.

> Binary/B feels largely right, but for my taste it has a few too many
> methods from Array simply because Array had them, specifically things like
> sort, reduce, shift, unshift etc.

In retrospect, I agree.  I think our ByteArray could survive with a
very small subset of the Array API.  Would anyone miss any of: push,
pop, shift, unshift, sort, reverse, splice, indexOf, lastIndexOf,
split, filter, forEach, every, some, map, reduce, reduceRight,
displace, extendLeft, extendRight?  I imagine that the primary use
cases for ByteArray would be pipes and buffers: fixed-width, but
explicitly growable with length assignment.  For those, the most
common operations would be copy(target, start, stop, targetStart) and
conversion to other types.
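
As a rough illustration of the two operations named above, here is a sketch using Uint8Array as a stand-in for ByteArray. The function names copyBytes and resized are hypothetical, and since typed arrays cannot be grown by assigning to length, "growing" is emulated by allocating a new array.

```javascript
// Hypothetical helpers; Uint8Array stands in for ByteArray, and the argument
// order follows the copy(target, start, stop, targetStart) shape from the text.
function copyBytes(source, target, start = 0, stop = source.length, targetStart = 0) {
  const chunk = source.subarray(start, stop); // view over the range, no copy yet
  if (targetStart + chunk.length > target.length) {
    throw new RangeError("target is too small for the copied range");
  }
  target.set(chunk, targetStart);
  return chunk.length; // number of bytes copied
}

// Emulates growing or shrinking by length assignment: bytes that fit are kept,
// and any newly added bytes are zero.
function resized(bytes, newLength) {
  const out = new Uint8Array(newLength);
  out.set(bytes.subarray(0, Math.min(bytes.length, newLength)));
  return out;
}
```

For example, copyBytes(src, dst, 1, 4, 1) copies src[1] through src[3] into dst starting at index 1.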

> (1) Binary/B does not have a cheap way to convert from the immutable
> representation (ByteString) to the mutable representation (ByteArray)

Apart from .toByteArray()?  I imagine that implementations would be
able to track whether underlying buffer blocks are shared by multiple
ByteString or ByteArray data instances and support copy-on-write for
ByteArrays.  I'm probably missing something.  Perhaps you envision
something lower-level?
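
One way the copy-on-write idea above might look, sketched in plain JavaScript. The class names follow Binary/B, but the _bytes and _shared fields and the get()/set() accessors are assumptions made for illustration; a real implementation would do this bookkeeping natively.

```javascript
// Hypothetical sketch: class names follow Binary/B, internals are assumptions.
class ByteString {
  constructor(bytes) {
    this._bytes = Uint8Array.from(bytes); // private copy, never mutated afterwards
  }
  get length() { return this._bytes.length; }
  get(i) { return this._bytes[i]; }
  toByteArray() {
    // Share the underlying storage; the copy is deferred until a write happens.
    return new ByteArray(this._bytes, true);
  }
}

class ByteArray {
  constructor(bytes, shared = false) {
    this._bytes = bytes;
    this._shared = shared; // true while storage is shared with a ByteString
  }
  get length() { return this._bytes.length; }
  get(i) { return this._bytes[i]; }
  set(i, value) {
    if (this._shared) {
      this._bytes = this._bytes.slice(); // first write: take a private copy
      this._shared = false;
    }
    this._bytes[i] = value & 0xff;
  }
}
```

Under this scheme toByteArray() is cheap as long as the result is only read, and the full copy is paid at most once, on the first mutation.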

> (2) In Binary/B, Array-like index access to ByteString gives back one-byte
> ByteStrings instead of bytes, likely an over-literal copying of String

This has been mentioned, but there is a certain value in over-literal
copying; the notion is that algorithms written for Strings (really
byte-string algorithms that had to make do with Strings) should
continue to function with ByteString.  To that end, it may be
desirable for certain idioms to continue to function properly:

  string[0].concat(string[1])
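
To make the trade-off concrete, here is a sketch of a byte string whose index access returns a one-byte string, so the idiom above keeps working. MiniByteString and byteAt are hypothetical names, and the Proxy is only a stand-in for what a spec could express directly through [[Get]].

```javascript
// Hypothetical sketch; MiniByteString is an illustrative name, not Binary/B.
class MiniByteString {
  constructor(bytes) {
    this.bytes = Uint8Array.from(bytes);
    // The Proxy intercepts numeric property reads such as bs[0].
    return new Proxy(this, {
      get(target, key, receiver) {
        if (typeof key === "string" && /^(0|[1-9]\d*)$/.test(key)) {
          const i = Number(key);
          return i < target.bytes.length
            ? new MiniByteString([target.bytes[i]]) // String-like: one-byte string
            : undefined;
        }
        return Reflect.get(target, key, receiver);
      },
    });
  }
  get length() { return this.bytes.length; }
  byteAt(i) { return this.bytes[i]; } // the alternative: a plain number
  concat(other) {
    return new MiniByteString([...this.bytes, ...other.bytes]);
  }
}
```

With this shape, s[0].concat(s[1]) yields a two-byte string, while s.byteAt(0) gives the numeric value that the competing proposal would return from s[0].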

> (3) There are some seemingly needless differences in the interfaces to
> ByteString and ByteArray that follow from modeling on String and Array

I am not sure.

> (4) Binary/B has many more operations available in the base proposal
> (including charset transcoding and a generous selection of String and Array
> methods)

I think it will be desirable to trim down the ByteArray proposal.

I don't recall where, but there's also some hint that it would be good
to support conversion to various radix string representations,
certainly 16 and 64, but possibly also 2, 8, and 32 (either to the RFC
or Doug Crockford's proposal for human-error-resistant license keys).
I think that these ought to be folded into .toString(radix:Number) in
a future draft.
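
A sketch of what a radix-based conversion could look like for radix 2, 16 and 64. The function name bytesToString and the set of accepted radices are assumptions; radix 8 and 32 are left out here because their grouping and padding rules would need to be pinned down first.

```javascript
// Hypothetical helper; the name and the accepted radix set are assumptions.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function bytesToString(bytes, radix) {
  if (radix === 2 || radix === 16) {
    const width = radix === 2 ? 8 : 2; // digits needed per byte
    return Array.from(bytes, (b) => b.toString(radix).padStart(width, "0")).join("");
  }
  if (radix === 64) {
    let out = "";
    for (let i = 0; i < bytes.length; i += 3) {
      const remaining = bytes.length - i;
      // Pack up to three bytes into one 24-bit group, zero-filling the tail.
      const group = (bytes[i] << 16) | ((bytes[i + 1] || 0) << 8) | (bytes[i + 2] || 0);
      out += BASE64_ALPHABET[(group >> 18) & 63];
      out += BASE64_ALPHABET[(group >> 12) & 63];
      out += remaining > 1 ? BASE64_ALPHABET[(group >> 6) & 63] : "=";
      out += remaining > 2 ? BASE64_ALPHABET[group & 63] : "=";
    }
    return out;
  }
  throw new RangeError("unsupported radix: " + radix);
}
```

Folding this into toString(radix) would mirror Number.prototype.toString(radix), although for byte data each radix implies a fixed digit width rather than a variable-length numeral.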

> (5) Different names - Data/DataBuilder vs. ByteString/ByteArray

I like ByteString. ByteArray is tending toward not being as strictly
Array-like, but I think it's also apt, from the perspective of users
implicitly understanding what kinds of operations are permitted on
ByteArrays based on their understanding of Arrays, like mutability and
resizability.  I definitely don't like Data and DataBuilder for the
reasons Brendan outlined, but I definitely could see cases for Buffer
and Blob.

> On (1): cheap conversion from mutable to immutable
> (DataBuilder.prototype.release() in my proposal) lets binary data objects be
> built up with a convenient mutation-based idiom, but then passed around as
> immutable objects thereafter.

Ah, sure.  That makes sense.  My instinct is that under the hood, the
original byte array would not actually disappear but switch to
copy-on-write and transfer ownership of its underlying buffer to the
new ByteString.  However, could this behavior not be folded up
transparently by toByteString()?

> On (2): I don't think a one-byte ByteString is
> ever useful, indexing to get the byte value would be much more helpful.

I agree this is debatable. I'm not ready to embark on a case study of
existing uses of Strings for binary data in JavaScript to explore what
methods are used, but there certainly is a corpus.  The works of Jacob
Seidelin and Ama Chang come to mind; I've seen and massaged code for
most radix encodings, charset encodings, hashing algorithms, EXIF,
ID3, binary AJAX, ZIP archives, and the itinerant compression
algorithms like LZ77.  They all use a combination of Array and String
operations, all operating on the octet invariant.  It might be worth
looking into how easily these projects can be ported to these APIs.

> On (3), I think it's good for the mutable interface to be
> a strict superset of
> the immutable interface.

Also, not sure.  I'm certain that there should be a body of common
methods so they can be used generically, but I'm not sure that it
should be exhaustive one way or the other.  Perhaps in the course of
pruning ByteArray we'll converge on something a step away from
ByteString.

> My initial impression is that (1), (2) and (3) are all points on which my
> proposal is better.
> (4) and (5) are all points where perhaps neither proposal is at the optimum
> yet.

I think we can address (1) under the hood.  I'm not sure about (2) and
(3); I've hitherto assumed that String/Array genericity would be
valuable. (4) is also contentious; Binary/B does "entrain" a lot of
necessary specification for charsets and radix encodings, although it
rather deliberately avoids specifying APIs for structure packing and
unpacking.

> On (4), I suspect the sweet spot is somewhere between my spartan set of
> built-in operations and the very generous set in Binary/B.

Agreed.

> On (5), I'm not
> sure either set of names is the best possible, and I'm certainly not stuck
> on my own proposed names.

Yes. It might be best to revisit nomenclature after the APIs settle.

Thanks,
Kris Kowal