json_to_term EEP


Re: json_to_term EEP

by Richard O'Keefe

On 31 Jul 2008, at 6:06 pm, Willem de Jong wrote:
> Of course, but if the Erlang team creates special, fast support in C it
> would be good if it could be used by as many people as possible.

Agreed.

But remember, the existence of an EEP is no guarantee whatever that
the proposal will ever be accepted.  Even the publication of an EEP
is simply acceptance by the moderator that the proposal meets the
formal criteria and isn't ragingly insane.  Rival EEPs addressing
the same area are allowed to exist, and are even a good idea.

To put it bluntly, there is no way I am ever going to put a SAX-like
interface in *my* EEP for JSON.  (I am not rejecting the idea that a
JSON converter might accept or deliver JSON forms incrementally; the
issues there are rather different.)

Anyone who thinks differently is not only free to write their own
EEP, they are *welcome* to do so.  It will be *good* for Erlang if
different ideas about how to do things are clearly written up and
available for discussion.

> I personally like working with a SAX parser.
> See the example below - I quite enjoyed writing it.

I'm sure you did, but the example does not in fact
work with a SAX parser.  It works with an (apparently non-existent)
parser that delivers a *data structure*, and it consumes the data
structure.  If you are going to work with a data structure, why
not the very same data structure that you are supposed to be getting?
It's like saying "well, I could have a pizza delivered to my door,
but instead I'll have all the ingredients delivered in separate
deliveries and then I'll make the pizza".
>
> The question is whether the things that an ESIS/SAX-like interface
> let you do are things that people particularly *want* to do with JSON.
> I have no idea.
>
> The point is, that the Erlang team would probably like to implement only 1
> very fast JSON parser in C.

The snag is that you CAN have a "very fast" JSON->term parser,
but you CAN'T have a "very fast" JSON->event stream parser,
because you have the extra overhead of creating event terms
and either calling a handler function (which then has to go to
all the trouble of decoding what the parser _knew_) or sending
messages to another process (ditto).  The people who *want* a JSON
form as an Erlang term would be very ill served by a SAX-like
interface, and the intrinsic overheads are such that the people
who want a SAX-like interface would get little benefit from an
implementation in C.

> In my opinion, that should be a SAX-like parser, because it is easy to
> create DVM output based on SAX output, but pointless to do it the other
> way around.

JSON is so simple that producing a term from a sequence of events
is scarcely any easier than writing a parser in the first place.
Really, the only thing you are spared is handling UTF-8.

As for it being pointless to turn DOM (or DVM) into SAX,
opinions may vary.  I've had good reason to do it several times.
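
Going from a term to an event stream can be sketched in a few lines.
This is only an illustration, assuming the term shape used by the dvm
callback quoted further down ({Props} for objects, lists for arrays),
strings as binaries, and the event names from the quoted example; the
module and function names are hypothetical.

```erlang
-module(term_to_events).
-export([events/1]).

%% Hypothetical helper: flatten a decoded JSON term into the SAX-like
%% events used in this thread.  Assumed term shape: {Props} is an object
%% (Props is a list of {Key, Value} pairs), a list is an array, and
%% strings are binaries so they cannot be mistaken for arrays.
events(Term) ->
    [startDocument | value(Term, [endDocument])].

%% value(Term, Tail) prepends the events for Term to Tail.
value({Props}, Tail) when is_list(Props) ->
    [startObject | members(Props, [endObject | Tail])];
value(Items, Tail) when is_list(Items) ->
    [startArray | elements(Items, [endArray | Tail])];
value(Scalar, Tail) ->
    [{value, Scalar} | Tail].

%% One {key, K} event followed by the events for the member's value.
members([], Tail) ->
    Tail;
members([{Key, Val} | Rest], Tail) ->
    [{key, Key} | value(Val, members(Rest, Tail))].

%% Array elements are emitted in order, with no key events.
elements([], Tail) ->
    Tail;
elements([Item | Rest], Tail) ->
    value(Item, elements(Rest, Tail)).
```

Calling term_to_events:events({[{<<"id">>, <<"file">>}]}) yields
startDocument, startObject, a key event, a value event, endObject and
endDocument, in that order.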

I find it telling that all the JSON parsers for Erlang that I've looked
at generate terms; not one of them offers a SAX-like interface.
Doubtless there are many more that I haven't looked at, so I cannot
claim that there are no JSON/SAX parsers for Erlang, or that nobody
has a need for one.  I certainly can claim that if anyone did want a
JSON/SAX parser, it would be quite easy to take one of the existing
freely available JSON parsers and modify it to send events instead of
building a result.

If people were routinely pumping Brobdingnagian JSON messages around
the Web, it would be important to use an event stream interface to
keep process sizes reasonable.  It does not appear that they are.
The Agile slogan YAGNI! applies, I think.


> A SAX parser may create the following events (that is: call its callback
> function with the following arguments, while parsing):
>
> E = [startDocument, startObject, {key,"menu"}, startObject, {key,"id"},
>  {value,"file"}, {key,"popup"}, startObject, {key,"menuitem"},
>  startArray, startObject, {key,"value"}, {value,"New"}, {key,"onclick"},
>  {value,"CreateNewDoc()"}, endObject, startObject, {key,"value"},
>  {value,"Close"}, {key,"onclick"}, {value,"CloseDoc()"}, endObject,
>  endArray, endObject, endObject, endObject, endDocument].
>
As a data structure, this is far bigger than the simple term would be.
It *has* to be more expensive to create this.
It becomes clear later in your message that this is not what you
really mean:  you mean something like
        json_event_stream_parser(IO_Data, Handler, Initial_State)
where
        Handler :: JSON_Event -> State -> State
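
A rough sketch of that calling convention, under loud assumptions:
fold_events/3 below is a stand-in that walks a ready-made list of events
instead of parsing IO data, and the handler is an ordinary two-argument
fun rather than a curried one.  The module name and the depth-tracking
handler are invented for illustration.

```erlang
-module(event_fold).
-export([fold_events/3, depth/2, max_depth/1]).

%% Stand-in for the hypothetical json_event_stream_parser/3: it does no
%% parsing, but it calls the handler the same way, as
%% Handler(Event, State) -> NewState, once per event.
fold_events(Events, Handler, InitialState) ->
    lists:foldl(Handler, InitialState, Events).

%% Example handler: track the current and the maximum nesting depth.
%% The state is {Current, Max}; no JSON term is ever built.
depth(startObject, {Cur, Max}) -> {Cur + 1, max(Max, Cur + 1)};
depth(startArray,  {Cur, Max}) -> {Cur + 1, max(Max, Cur + 1)};
depth(endObject,   {Cur, Max}) -> {Cur - 1, Max};
depth(endArray,    {Cur, Max}) -> {Cur - 1, Max};
depth(_Other,      State)      -> State.

%% Run the handler over a whole event list; a balanced document must
%% end at depth 0, which the pattern match checks.
max_depth(Events) ->
    {0, Max} = fold_events(Events, fun depth/2, {0, 0}),
    Max.
```

Every event costs one handler call and one pattern match on a term that
the producer had to build first, which is the overhead described above.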

> Below an example of a callback function to process these events - this
> function would be called by the SAX parser when it has processed another
> relevant part of the JSON document. The parser passes the value returned
> by the function to the next invocation (second argument of the function,
> the first argument is the SAX event).
>
> dvm(startDocument, _) ->
>   start;
> dvm(startObject, Stack) ->
>   [[]| Stack];
> dvm(startArray, Stack) ->
>   [[]| Stack];
> dvm({key, _} = Event, Stack) ->
>   [Event|Stack];
> dvm({value, Value}, start) ->
>   {value, Value};
>
Technically, the JSON RFC does not allow this: a JSON text has to be
an object or an array, not a bare value.
It does seem sensible to handle it though.

>
> dvm({value, Value}, [{key, Key}, List | T]) ->
>   [[{Key, Value} | List] | T];
> dvm({value, Value}, [List | T]) ->
>   [[Value | List] | T];
> dvm(endObject, [List | T]) ->
>   dvm({value, {lists:reverse(List)}}, T);
> dvm(endArray, [List | T]) ->
>   dvm({value, lists:reverse(List)}, T);
> dvm(endDocument, {value, R}) ->
>   R.
>
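
For anyone who wants to try the quoted callback, here is a minimal
sketch that threads it over the quoted event list E.  The dvm/2 clauses
are copied from the quoted message; the module name and the
lists:foldl/3 driver are assumptions standing in for a real event
parser.

```erlang
-module(dvm_demo).
-export([run/0, build/1]).

%% Driver (an assumption, not part of the quoted code): feed the events
%% to dvm/2 one by one, passing each result on as the next state.
build(Events) ->
    lists:foldl(fun dvm/2, undefined, Events).

%% The event list E from the quoted example.
run() ->
    build([startDocument, startObject, {key,"menu"}, startObject,
           {key,"id"}, {value,"file"}, {key,"popup"}, startObject,
           {key,"menuitem"}, startArray,
           startObject, {key,"value"}, {value,"New"},
           {key,"onclick"}, {value,"CreateNewDoc()"}, endObject,
           startObject, {key,"value"}, {value,"Close"},
           {key,"onclick"}, {value,"CloseDoc()"}, endObject,
           endArray, endObject, endObject, endObject, endDocument]).

%% Callback as quoted.  The stack bottoms out in the atom 'start', so it
%% is an improper list; the clauses only ever match on its head.
dvm(startDocument, _) ->
    start;
dvm(startObject, Stack) ->
    [[] | Stack];
dvm(startArray, Stack) ->
    [[] | Stack];
dvm({key, _} = Event, Stack) ->
    [Event | Stack];
dvm({value, Value}, start) ->
    {value, Value};
dvm({value, Value}, [{key, Key}, List | T]) ->
    [[{Key, Value} | List] | T];
dvm({value, Value}, [List | T]) ->
    [[Value | List] | T];
dvm(endObject, [List | T]) ->
    dvm({value, {lists:reverse(List)}}, T);
dvm(endArray, [List | T]) ->
    dvm({value, lists:reverse(List)}, T);
dvm(endDocument, {value, R}) ->
    R.
```

dvm_demo:run() returns the nested term, with objects as {Props} tuples
and keys in document order.
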
In short, you are proposing that an interface that most Erlang
JSON users do not appear to have a need for should be
privileged so that an interface that there IS a demonstrated
need for can be implemented on top of it much more expensively.

I do not find this convincing.

That does not matter.
Write an EEP of your own.  Spell out the details.
Put it on the supermarket shelf and see if anyone
makes chop suey with it.

_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: json_to_term EEP

by David Mercer

I notice keys and values are stored as binaries rather than lists of Unicode
code points (Erlang "strings").  Won't this result in different keys and
values for the same JSON object encoded in UTF-8 vs. UTF-16?

David


Re: json_to_term EEP

by Richard O'Keefe


On 1 Aug 2008, at 11:16 am, David Mercer wrote:

> I notice keys and values are stored as binaries rather than lists of
> Unicode code points (Erlang "strings").  Won't this result in different
> keys and values for the same JSON object encoded in UTF-8 vs. UTF-16?

No it won't.  The EEP *specifically* says that the binaries
will *always* use UTF-8, whatever the source or destination
encoding.
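
The effect can be sketched as follows.  This is an illustration only: it
uses the unicode module from current Erlang/OTP, which is not something
the EEP draft refers to, and the module and function names are made up.

```erlang
-module(utf_demo).
-export([to_utf8/2, same_key/0]).

%% Re-encode Bytes from the given source encoding into a UTF-8 binary.
%% unicode:characters_to_binary/3 takes (Data, InEncoding, OutEncoding).
to_utf8(Bytes, Encoding) ->
    unicode:characters_to_binary(Bytes, Encoding, utf8).

%% The same four-character key (c, a, f, e with an acute accent, U+00E9)
%% as it would arrive in two source encodings.
same_key() ->
    Utf8  = <<"caf", 16#C3, 16#A9>>,               % UTF-8 bytes
    Utf16 = <<0, $c, 0, $a, 0, $f, 0, 16#E9>>,     % UTF-16, big-endian
    to_utf8(Utf8, utf8) =:= to_utf8(Utf16, {utf16, big}).
```

Both inputs end up as the same binary, so they compare equal as keys.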

There is a 4th draft which is sitting waiting for moderator
approval before going out to the mailing list, because it is
44kB.  But this was already in the 1st draft.

Re: json_to_term EEP

by David Mercer

Does this signal that the Erlang community is moving away from
strings-as-lists to strings-as-binaries?  Should we have an option in the
JSON functionality to return keys and values as strings-as-lists?  Should
this be the default?  Not advocating, just asking.

Cheers,

David

> -----Original Message-----
> From: Richard A. O'Keefe [mailto:ok@...]
> Sent: Thursday, July 31, 2008 19:20
> To: dmercer@...
> Cc: 'Erlang Questions'
> Subject: Re: [erlang-questions] json_to_term EEP
>
>
> On 1 Aug 2008, at 11:16 am, David Mercer wrote:
>
> > I notice keys and values are stored as binaries rather than lists of
> > Unicode code points (Erlang "strings").  Won't this result in different
> > keys and values for the same JSON object encoded in UTF-8 vs. UTF-16?
>
> No it won't.  The EEP *specifically* says that the binaries
> will *always* use UTF-8, whatever the source or destination
> encoding.
>
> There is a 4th draft which is sitting waiting for moderator
> approval before going out to the mailing list, because it is
> 44kB.  But this was already in the 1st draft.

Re: json_to_term EEP

by Gustavo Niemeyer

> Does this signal that the Erlang community is moving away from
> strings-as-lists to strings-as-binaries?  Should we have an option in the
> JSON functionality to return keys and values as strings-as-lists?  Should
> this be the default?  Not advocating, just asking.

Note that doing something like this for that specific case would require
additional syntax to be able to distinguish a list of ints in JSON from a
plain string.  Not sure if it was taken in consideration when writing the
EEP, but it likely was.

--
Gustavo Niemeyer
http://niemeyer.net

Re: json_to_term EEP

by Chris Anderson

On Thu, Jul 31, 2008 at 4:10 PM, Richard A. O'Keefe <ok@...> wrote:

> To put it bluntly, there is no way I am ever going to put a SAX-like
> interface in *my* EEP for JSON.  (I am not rejecting the idea that a
> JSON converter might accept or deliver JSON forms incrementally; the
> issues there are rather different.)
>
> Anyone who thinks differently is not only free to write their own
> EEP, they are *welcome* to do so.  It will be *good* for Erlang if
> different ideas about how to do things are clearly written up and
> available for discussion.
>>
>
>> I personally like working with a SAX parser.
>> See the example below - I quite enjoyed writing it.
>

Just checked through the Yecc documentation - it looks like the
example code I posted has both a DVM and a SAX-like API. It's nice
that the same code base can serve both purposes. Now to make it fast!

from the Yecc docs: http://www.erlang.org/doc/man/yecc.html

====

It is also possible to make the parser ask for more input tokens when
needed if the following call format is used:

myparser:parse_and_scan({Function, Args})
myparser:parse_and_scan({Mod, Tokenizer, Args})

The tokenizer Function is either a fun or a tuple {Mod, Tokenizer}.

The call apply(Function, Args) or apply({Mod, Tokenizer}, Args) is
executed whenever a new token is needed. This, for example, makes it
possible to parse from a file, token by token.



--
Chris Anderson
http://jchris.mfdz.com

Re: json_to_term EEP

by Willem de Jong



On Fri, Aug 1, 2008 at 1:10 AM, Richard A. O'Keefe <ok@...> wrote:
> On 31 Jul 2008, at 6:06 pm, Willem de Jong wrote:
>> I personally like working with a SAX parser.
>> See the example below - I quite enjoyed writing it.
>
> I'm sure you did, but the example does not in fact
> work with a SAX parser.  It works with an (apparently non-existent)
> parser that delivers a *data structure*, and it consumes the data
> structure.  If you are going to work with a data structure, why
> not the very same data structure that you are supposed to be getting?
> It's like saying "well, I could have a pizza delivered to my door,
> but instead I'll have all the ingredients delivered in separate
> deliveries and then I'll make the pizza".

Or something else, if I want to eat something else. Or, if I am hungry, I
can start eating when the first tomato arrives. Or, if I actually want to
feed an elephant, I don't have to worry about the pizza fitting through the
door.

I think I'll just finish the parser that I have, and then see how it
compares to other parsers.

Regards,
Willem.

Re: json_to_term EEP

by Richard O'Keefe


On 2 Aug 2008, at 2:21 am, David Mercer wrote:

> Does this signal that the Erlang community is moving away from
> strings-as-lists to strings-as-binaries?

Basically, I looked at what other people were doing with JSON->Erlang
conversion, and strings-as-binaries seemed to be the most popular choice.

Strings as binaries have some advantages.

(1) They are MUCH closer in spirit to what people expect strings to
     be.  I've lost count of the number of times people have said in
     this mailing list "Erlang is no XXXXXXX good because it doesn't
     have strings."  I'm sick of explaining why this is wrong.

(2) They _are_ more compact than lists, and if you want to pump data
     _through_ Erlang, reducing space turnover is a help.
     This is why I think strings-as-binaries with labels-as-atoms is
     a good balance; the part you expect to look inside using Erlang
     is easy to look at, the rest is cheap to hold and pass on.

(3) They offer constant-time slicing.

With the addition of <<"...">> syntax to Erlang they are almost
readable.

With Unicode, the major snag is that lists can represent one Unicode
character per element, whereas binary matching counts bytes.  Regular
expressions are coming, and if they can handle UTF-8 binaries we need
not worry too much about counting bytes.
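
The byte-versus-character difference and the slicing point can be
sketched like this.  It is a minimal illustration, assuming the unicode
and binary modules and the /utf8 segment type from current Erlang/OTP;
the module and function names are hypothetical.

```erlang
-module(bin_strings).
-export([chars/1, first_char/1, slice/3]).

%% Number of Unicode characters in a UTF-8 binary.  This can be smaller
%% than byte_size/1, which counts bytes.
chars(Bin) when is_binary(Bin) ->
    length(unicode:characters_to_list(Bin, utf8)).

%% Split off the first character by matching a UTF-8 segment rather
%% than a single byte.  Char is the code point as an integer.
first_char(<<Char/utf8, Rest/binary>>) ->
    {Char, Rest}.

%% Slice by byte offset and length.  binary:part/3 returns a sub-binary
%% that refers to the original data instead of copying it.
slice(Bin, Start, Len) ->
    binary:part(Bin, Start, Len).
```

For <<"caf", 16#C3, 16#A9>> the byte size is 5 but chars/1 gives 4,
which is the counting mismatch described above.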

>  Should we have an option in the
> JSON functionality to return keys and values as strings-as-lists?

This now leads us to the reason WHY other people are mapping JSON
strings to Erlang binaries.
        "ABC"
is a legal JSON string.
        [65,66,67]
is a legal JSON "array".  If we turned JSON strings into Erlang
strings, how would we tell them from "arrays"?  At least one of them
would have to be flagged somehow.

It might be pleasantly symmetric to have
        [array,65,66,67]
        [string,65,66,67]
        [object,......]
The thing is, you can't -just- say "let's have strings as lists",
you have to do something different with arrays/lists as well.

This really doesn't make a good default.
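
The clash is easy to see in code.  A small sketch, with a made-up module
name, comparing the three representations discussed above:

```erlang
-module(string_or_array).
-export([ambiguous/0, distinct/0, tagged/0]).

%% A JSON string decoded to an Erlang string and a JSON array of the
%% same numbers are the very same Erlang term.
ambiguous() ->
    "ABC" =:= [65, 66, 67].

%% With strings as binaries the two stay apart.
distinct() ->
    <<"ABC">> =/= [65, 66, 67].

%% The tagged-list alternative also keeps them apart, but then arrays
%% have to carry a tag as well.
tagged() ->
    [string, 65, 66, 67] =/= [array, 65, 66, 67].
```

All three functions return true.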


Re: json_to_term EEP

by Richard O'Keefe


On 2 Aug 2008, at 6:20 am, Chris Anderson wrote:

> Just checked through the Yecc documentation - it looks like the
> example code I posted has both a DVM and a SAX-like API. It's nice
> that the same code base can serve both purposes. Now to make it fast!
>
> from the Yecc docs: http://www.erlang.org/doc/man/yecc.html
>
> ====
>
> It is also possible to make the parser ask for more input tokens when
> needed if the following call format is used:
>
> myparser:parse_and_scan({Function, Args})
> myparser:parse_and_scan({Mod, Tokenizer, Args})
>
> The tokenizer Function is either a fun or a tuple {Mod, Tokenizer}.
>
> The call apply(Function, Args) or apply({Mod, Tokenizer}, Args) is
> executed whenever a new token is needed. This, for example, makes it
> possible to parse from a file, token by token.

This is not a SAX-like API.  A SAX-like API is one where you do NOT
get a data structure as *output*, but instead get a stream of
parsing events.  Taking Yecc as an example, imagine all your Erlang
code being ripped out of a Yecc file, and having to put it inside a
giant receive.  The interface you are talking about is all about
the INPUT of the parser, not the output.  And even then it is the
opposite of a SAX-like API, because it is a "pull" interface (the
token consumer tells the token producer "give me another token"),
whereas SAX is a "push" interface ("here is another event, like it
or not").
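
The difference in who drives can be sketched with a toy example.  The
module, the list-based token source and the summing consumer are all
invented for illustration; nothing here is Yecc's or any JSON library's
API.

```erlang
-module(push_pull).
-export([pull_sum/1, push_sum/1, next_token/1]).

%% Toy tokenizer over a list: returns {Token, Rest}, or eof when empty.
next_token([]) -> eof;
next_token([T | Rest]) -> {T, Rest}.

%% Pull: the consumer is in control and asks the producer for the next
%% token whenever it wants one.
pull_sum(Source) ->
    pull_sum(Source, 0).

pull_sum(Source, Acc) ->
    case next_token(Source) of          % "give me another token"
        eof -> Acc;
        {N, Rest} -> pull_sum(Rest, Acc + N)
    end.

%% Push: the producer is in control and hands each event to a handler,
%% which can only update its state and return.
push_sum(Source) ->
    produce(Source, fun(N, Acc) -> Acc + N end, 0).

produce([], _Handler, State) ->
    State;
produce([Event | Rest], Handler, State) ->
    produce(Rest, Handler, Handler(Event, State)).   % "here is an event"
```

Both functions compute the same sum; what changes is which side owns
the loop.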

It is agreed that any network system is likely to receive data in
chunks so that a JSON->Erlang converter that can accept input in
chunks might be useful.

Re: json_to_term EEP

by David Mercer

Thanks for the outstanding explanation.

If the Erlang community decides to represent strings as UTF-8-encoded
binaries, we should probably add I/O support for that...

Cheers,

David

> -----Original Message-----
> From: Richard A. O'Keefe [mailto:ok@...]
> Sent: Sunday, August 03, 2008 21:24
> To: dmercer@...
> Cc: 'Erlang Questions'
> Subject: Re: [erlang-questions] json_to_term EEP
>
>
> On 2 Aug 2008, at 2:21 am, David Mercer wrote:
>
> > Does this signal that the Erlang community is moving away from
> > strings-as-lists to strings-as-binaries?
>
> Basically, I looked at what other people were doing with JSON->Erlang
> conversion, and strings-as-binaries seemed to be the most popular
> choice.
>
> Strings as binaries have some advantages.
>
> (1) They are MUCH closer in spirit to what people expect strings to
>      be.  I've lost count of the number of times people have said in
>      this mailing list "Erlang is no XXXXXXX good because it doesn't
>      have strings."  I'm sick of explaining why this is wrong.
>
> (2) They _are_ more compact than lists, and if you want to pump data
>      _through_ Erlang, reducing space turnover is a help.
>      This is why I think strings-as-binaries with labels-as-atoms is
>      a good balance; the part you expect to look inside using Erlang
>      is easy to look at, the rest is cheap to hold and pass on.
>
> (3) They offer constant-time slicing.
>
> With the addition of <<"...">> syntax to Erlang they are almost
> readable.
>
> With Unicode, the major snag is that lists can represent one Unicode
> character per element, whereas binary matching counts bytes.  Regular
> expressions are coming, and if they can handle UTF-8 binaries we need
> not worry too much about counting bytes.
>
> >  Should we have an option in the
> > JSON functionality to return keys and values as strings-as-lists?
>
> This now leads us to the reason WHY other people are mapping JSON
> strings to Erlang binaries.
> "ABC"
> is a legal JSON string.
> [65,66,67]
> is a legal JSON "array".  If we turned JSON strings into Erlang
> strings, how would we tell them from "arrays"?  At least one of them
> would have to be flagged somehow.
>
> It might be pleasantly symmetric to have
> [array,65,66,67]
> [string,65,66,67]
> [object,......]
> The thing is, you can't -just- say "let's have strings as lists",
> you have to do something different with arrays/lists as well.
>
> This really doesn't make a good default.
