Re: json_to_term EEP

On 31 Jul 2008, at 6:06 pm, Willem de Jong wrote:

> Of course, but if the Erlang team creates special, fast support in C
> it would be good if it could be used by as many people as possible.

Agreed. But remember, the existence of an EEP is no guarantee whatever that the proposal will ever be accepted. Even the publication of an EEP is simply acceptance by the moderator that the proposal meets the formal criteria and isn't ragingly insane. Rival EEPs addressing the same area are allowed to exist, and are even a good idea.

To put it bluntly, there is no way I am ever going to put a SAX-like interface in *my* EEP for JSON. (I am not rejecting the idea that a JSON converter might accept or deliver JSON forms incrementally; the issues there are rather different.)

Anyone who thinks differently is not only free to write their own EEP, they are *welcome* to do so. It will be *good* for Erlang if different ideas about how to do things are clearly written up and available for discussion.

> I personally like working with a SAX parser.
> See the example below - I quite enjoyed writing it.

I'm sure you did, but the example does not in fact work with a SAX parser. It works with an (apparently non-existent) parser that delivers a *data structure*, and it consumes the data structure. If you are going to work with a data structure, why not the very same data structure that you are supposed to be getting? It's like saying "well, I could have a pizza delivered to my door, but instead I'll have all the ingredients delivered in separate deliveries and then I'll make the pizza".

>
> The question is whether the things that an ESIS/SAX-like interface
> let you do are things that people particularly *want* to do with JSON.
> I have no idea.
>
> The point is, that the Erlang team would probably like to implement
> only 1 very fast JSON parser in C.

The snag is that you CAN have a "very fast" JSON->term parser, but you CAN'T have a "very fast" JSON->event stream parser, because you have the extra overhead of creating event terms and either calling a handler function (which then has to go to all the trouble of decoding what the parser _knew_) or sending messages to another process (ditto). The people who *want* a JSON form as an Erlang term would be very ill served by a SAX-like interface, and the intrinsic overheads are such that the people who want a SAX-like interface would get little benefit from an implementation in C.

> In my opinion, that should be a SAX-like parser, because
> it is easy to create DVM output based on SAX output, but pointless
> to do it the other way around.

JSON is so simple that producing a term from a sequence of events is scarcely any easier than writing a parser in the first place. Really, the only thing you are spared is handling UTF-8.

As for it being pointless to turn DOM (or DVM) into SAX, opinions may vary. I've had good reason to do it several times.

I find it telling that all the JSON parsers for Erlang that I've looked at generate terms; not one of them offers a SAX-like interface. Doubtless there are many more that I haven't looked at, so I cannot claim that there are no JSON/SAX parsers for Erlang, or that nobody has a need for one. I certainly can claim that if anyone did want a JSON/SAX parser, it would be quite easy to take one of the existing freely available JSON parsers and modify it to send events instead of building a result.

If people were routinely pumping Brobdingnagian JSON messages around the Web, it would be important to use an event stream interface to keep process sizes reasonable. It does not appear that they are. The Agile slogan YAGNI! applies, I think.
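
The "DOM into SAX" direction mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the module name `json_events` and the function `term_to_events/1` are hypothetical, the event names are the ones used in this thread, objects are assumed to be `{[{Key, Value}]}` and arrays plain lists (the shape the quoted `dvm/2` example builds), and strings are assumed to be binaries so that they cannot be mistaken for arrays.

```erlang
-module(json_events).
-export([term_to_events/1]).

%% Hypothetical helper: flatten an already-decoded JSON term into the
%% event list notation used in this thread.
%% Assumed term shape: object -> {[{Key, Value}]}, array -> list,
%% anything else -> scalar. A string given as a list would be
%% treated as an array here, which is why binaries are assumed.
term_to_events(Term) ->
    [startDocument | events(Term, [endDocument])].

%% events(Term, Tail) prepends the events for Term to Tail.
events({Members}, Tail) when is_list(Members) ->
    [startObject | members(Members, [endObject | Tail])];
events(Items, Tail) when is_list(Items) ->
    [startArray | items(Items, [endArray | Tail])];
events(Scalar, Tail) ->
    [{value, Scalar} | Tail].

%% One {key, K} event per member, followed by the events of its value.
members([], Tail) ->
    Tail;
members([{Key, Value} | Rest], Tail) ->
    [{key, Key} | events(Value, members(Rest, Tail))].

%% Array elements are emitted in order with no key events.
items([], Tail) ->
    Tail;
items([Item | Rest], Tail) ->
    events(Item, items(Rest, Tail)).
```

The traversal builds the list back to front by passing the tail along, so no appends or reversals are needed.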

> A sax parser may create the following events (that is: call its
> callback function with the following arguments, while parsing):
>
> E = [startDocument,startObject, {key,"menu"}, startObject, {key,"id"},
> {value,"file"}, {key,"popup"}, startObject, {key,"menuitem"},
> startArray,startObject, {key,"value"}, {value,"New"},
> {key,"onclick"},
> {value,"CreateNewDoc()"}, endObject,startObject, {key,"value"},
> {value,"Close"}, {key,"onclick"}, {value,"CloseDoc()"}, endObject,
> endArray,endObject,endObject,endObject, endDocument].

It *has* to be more expensive to create this. It becomes clear later in your message that this is not what you really mean: you mean something like

    json_event_stream_parser(IO_Data, Handler, Initial_State)
    where Handler :: JSON_Event -> State -> State

> Below an example of a callback function to process these events -
> this function would be called by the SAX parser when it has
> processed another relevant part of the JSON document. The parser
> passes the value returned by the function to the next invocation
> (second argument of the function, the first argument is the SAX event).
>
> dvm(startDocument, _) ->
>   start;
> dvm(startObject, Stack) ->
>   [[]| Stack];
> dvm(startArray, Stack) ->
>   [[]| Stack];
> dvm({key, _} = Event, Stack) ->
>   [Event|Stack];
> dvm({value, Value}, start) ->
>   {value, Value};

It does seem sensible to handle it though.

> dvm({value, Value}, [{key, Key}, List | T]) ->
>   [[{Key, Value} | List] | T];
> dvm({value, Value}, [List | T]) ->
>   [[Value | List] | T];
> dvm(endObject, [List | T]) ->
>   dvm({value, {lists:reverse(List)}}, T);
> dvm(endArray, [List | T]) ->
>   dvm({value, lists:reverse(List)}, T);
> dvm(endDocument, {value, R}) ->
>   R.

So an interface that JSON users do not appear to have a need for should be privileged so that an interface that there IS a demonstrated need for can be implemented on top of it much more expensively. I do not find this convincing. That does not matter. Write an EEP of your own.
Spell out the details. Put it on the supermarket shelf and see if anyone makes chop suey with it.

_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions
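
The handler shape discussed in this message, `Handler :: JSON_Event -> State -> State`, is a fold over the event stream. A minimal sketch, assuming a hypothetical module `json_fold` in which `fold_events/3` stands in for the `json_event_stream_parser/3` named above: no parsing is done, it just walks an event list that has already been built, using the event names from this thread.

```erlang
-module(json_fold).
-export([fold_events/3, collect_keys/2, keys/1]).

%% Hypothetical stand-in for json_event_stream_parser/3: it does not
%% parse anything, it only threads a state through a ready-made
%% event list by calling Handler(Event, State) for each event.
fold_events(Events, Handler, InitialState) ->
    lists:foldl(Handler, InitialState, Events).

%% A handler that ignores everything except keys. Note that it has to
%% pattern-match on each event to rediscover what the producer of the
%% events already knew.
collect_keys({key, Key}, Keys) ->
    [Key | Keys];
collect_keys(_OtherEvent, Keys) ->
    Keys.

%% Return all keys in document order.
keys(Events) ->
    lists:reverse(fold_events(Events, fun collect_keys/2, [])).
```

A handler like this never needs the whole document in memory, which is the case where an event interface pays for its overhead.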
Re: json_to_term EEP

I notice keys and values are stored as binaries rather than lists of Unicode code points (Erlang "strings"). Won't this result in different keys and values for the same JSON object encoded in UTF-8 vs. UTF-16?

David
Re: json_to_term EEP

On 1 Aug 2008, at 11:16 am, David Mercer wrote:

> I notice keys and values are stored as binaries rather than lists of
> Unicode code points (Erlang "strings"). Won't this result in different
> keys and values for the same JSON object encoded in UTF-8 vs. UTF-16?

No it won't. The EEP *specifically* says that the binaries will *always* use UTF-8, whatever the source or destination encoding.

There is a 4th draft which is sitting waiting for moderator approval before going out to the mailing list, because it is 44kB. But this was already in the 1st draft.
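
The "always UTF-8, whatever the source encoding" rule can be illustrated with `unicode:characters_to_binary/3` from the `unicode` module shipped with current OTP releases. This is a sketch, not the EEP's implementation: the module name `utf8_keys` and the function `normalise/2` are made up for the example.

```erlang
-module(utf8_keys).
-export([normalise/2, demo/0]).

%% Convert text in the given source encoding to a UTF-8 binary.
%% unicode:characters_to_binary/3 is a standard OTP function.
normalise(Bin, SourceEncoding) ->
    unicode:characters_to_binary(Bin, SourceEncoding, utf8).

%% The same four-character string arriving in two encodings ends up
%% as the same binary, so it compares equal as a key or value.
demo() ->
    Utf8  = <<"caf", 16#C3, 16#A9>>,              % U+00E9 encoded in UTF-8
    Utf16 = <<0, $c, 0, $a, 0, $f, 0, 16#E9>>,    % same text, UTF-16 big-endian
    normalise(Utf8, utf8) =:= normalise(Utf16, {utf16, big}).
```

Once every string is normalised on the way in, equality of keys reduces to plain binary comparison.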
Re: json_to_term EEP

Does this signal that the Erlang community is moving away from strings-as-lists to strings-as-binaries? Should we have an option in the JSON functionality to return keys and values as strings-as-lists? Should this be the default? Not advocating, just asking.

Cheers,

David

> -----Original Message-----
> From: Richard A. O'Keefe [mailto:ok@...]
> Sent: Thursday, July 31, 2008 19:20
> To: dmercer@...
> Cc: 'Erlang Questions'
> Subject: Re: [erlang-questions] json_to_term EEP
>
> On 1 Aug 2008, at 11:16 am, David Mercer wrote:
>
> > I notice keys and values are stored as binaries rather than lists of
> > Unicode code points (Erlang "strings"). Won't this result in different
> > keys and values for the same JSON object encoded in UTF-8 vs. UTF-16?
>
> No it won't. The EEP *specifically* says that the binaries
> will *always* use UTF-8, whatever the source or destination
> encoding.
>
> There is a 4th draft which is sitting waiting for moderator
> approval before going out to the mailing list, because it is
> 44kB. But this was already in the 1st draft.
Re: json_to_term EEP

> Does this signal that the Erlang community is moving away from
> strings-as-lists to strings-as-binaries? Should we have an option in the
> JSON functionality to return keys and values as strings-as-lists? Should
> this be the default? Not advocating, just asking.

Note that doing something like this for that specific case would require additional syntax to be able to distinguish a list of ints in JSON from a plain string. Not sure if it was taken into consideration when writing the EEP, but it likely was.

--
Gustavo Niemeyer
http://niemeyer.net
Re: json_to_term EEP

On Thu, Jul 31, 2008 at 4:10 PM, Richard A. O'Keefe <ok@...> wrote:

> To put it bluntly, there is no way I am ever going to put a SAX-like
> interface in *my* EEP for JSON. (I am not rejecting the idea that a
> JSON converter might accept or deliver JSON forms incrementally; the
> issues there are rather different.)
>
> Anyone who thinks differently is not only free to write their own
> EEP, they are *welcome* to do so. It will be *good* for Erlang if
> different ideas about how to do things are clearly written up and
> available for discussion.
>
>> I personally like working with a SAX parser.
>> See the example below - I quite enjoyed writing it.

Just checked through the Yecc documentation - it looks like the example code I posted has both a DVM and a SAX-like API. It's nice that the same code base can serve both purposes. Now to make it fast!

from the Yecc docs: http://www.erlang.org/doc/man/yecc.html

====

It is also possible to make the parser ask for more input tokens when needed if the following call format is used:

    myparser:parse_and_scan({Function, Args})
    myparser:parse_and_scan({Mod, Tokenizer, Args})

The tokenizer Function is either a fun or a tuple {Mod, Tokenizer}.

The call apply(Function, Args) or apply({Mod, Tokenizer}, Args) is executed whenever a new token is needed. This, for example, makes it possible to parse from a file, token by token.

--
Chris Anderson
http://jchris.mfdz.com
Re: json_to_term EEP

On Fri, Aug 1, 2008 at 1:10 AM, Richard A. O'Keefe <ok@...> wrote:
Or something else, if I want to eat something else. Or, if I am hungry, I can start eating when the first tomato arrives. Or, if I actually want to feed an elephant, I don't have to worry about the pizza fitting through the door.
I think I'll just finish the parser that I have, and then see how it compares to other parsers.
Regards,
Willem.
Re: json_to_term EEP

On 2 Aug 2008, at 2:21 am, David Mercer wrote:

> Does this signal that the Erlang community is moving away from
> strings-as-lists to strings-as-binaries?

Basically, I looked at what other people were doing with JSON->Erlang conversion, and strings-as-binaries seemed to be the most popular choice.

Strings as binaries have some advantages.

(1) They are MUCH closer in spirit to what people expect strings to be. I've lost count of the number of times people have said in this mailing list "Erlang is no XXXXXXX good because it doesn't have strings." I'm sick of explaining why this is wrong.

(2) They _are_ more compact than lists, and if you want to pump data _through_ Erlang, reducing space turnover is a help. This is why I think strings-as-binaries with labels-as-atoms is a good balance; the part you expect to look inside using Erlang is easy to look at, the rest is cheap to hold and pass on.

(3) They offer constant-time slicing.

With the addition of <<"...">> syntax to Erlang they are almost readable.

With Unicode, the major snag is that lists can represent one Unicode character per element, whereas binary matching counts bytes. Regular expressions are coming, and if they can handle UTF-8 binaries we need not worry too much about counting bytes.

> Should we have an option in the
> JSON functionality to return keys and values as strings-as-lists?

This now leads us to the reason WHY other people are mapping JSON strings to Erlang binaries.
    "ABC"
is a legal JSON string.
    [65,66,67]
is a legal JSON "array". If we turned JSON strings into Erlang strings, how would we tell them from "arrays"? At least one of them would have to be flagged somehow.

It might be pleasantly symmetric to have
    [array,65,66,67]
    [string,65,66,67]
    [object,......]
The thing is, you can't -just- say "let's have strings as lists", you have to do something different with arrays/lists as well.

This really doesn't make a good default.
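
The string/array ambiguity described above is easy to reproduce. A minimal sketch with hypothetical names (`json_strings`, `ambiguous/0`, `classify/1`): it shows that the two spellings are the same Erlang term, and that mapping strings to binaries keeps them apart without any extra tagging.

```erlang
-module(json_strings).
-export([ambiguous/0, classify/1]).

%% An Erlang string literal *is* a list of code points, so these two
%% spellings denote the same term.
ambiguous() ->
    "ABC" =:= [65, 66, 67].

%% With JSON strings mapped to binaries, a type test is enough to
%% tell a string from an array. A list-string lands in 'array'.
classify(Term) when is_binary(Term) -> string;
classify(Term) when is_list(Term)   -> array;
classify(_Term)                     -> other.
```

Here `classify("ABC")` and `classify([65,66,67])` both give `array`, while `classify(<<"ABC">>)` gives `string`.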
Re: json_to_term EEP

On 2 Aug 2008, at 6:20 am, Chris Anderson wrote:

> Just checked through the Yecc documentation - it looks like the
> example code I posted has both a DVM and a SAX-like API. It's nice
> that the same code base can serve both purposes. Now to make it fast!
>
> from the Yecc docs: http://www.erlang.org/doc/man/yecc.html
>
> ====
>
> It is also possible to make the parser ask for more input tokens when
> needed if the following call format is used:
>
> myparser:parse_and_scan({Function, Args})
> myparser:parse_and_scan({Mod, Tokenizer, Args})
>
> The tokenizer Function is either a fun or a tuple {Mod, Tokenizer}.
>
> The call apply(Function, Args) or apply({Mod, Tokenizer}, Args) is
> executed whenever a new token is needed. This, for example, makes it
> possible to parse from a file, token by token.

This is not a SAX-like API. A SAX-like API is one where you do NOT get a data structure as *output*, but instead get a stream of parsing events. Taking Yecc as an example, imagine all your Erlang code being ripped out of a Yecc file, and having to put it inside a giant receive.

The interface you are talking about is all about the INPUT of the parser, not the output. And even then it is the opposite of a SAX-like API, because it is a "pull" interface (the token consumer tells the token producer "give me another token"), whereas SAX is a "push" interface ("here is another event, like it or not").

It is agreed that any network system is likely to receive data in chunks so that a JSON->Erlang converter that can accept input in chunks might be useful.
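
The pull/push distinction drawn above can be made concrete with a toy example. This is a sketch with made-up names (`push_pull`, `producer/1`, `pull_all/1`, `push/2`, `collect/0`), not Yecc's or any JSON library's API: the pull side asks a producer fun for one item at a time, the push side has items sent to it as messages and handles them in a receive.

```erlang
-module(push_pull).
-export([producer/1, pull_all/1, push/2, collect/0, demo/0]).

%% Pull: a producer is a fun() -> {Item, NextProducer} | done.
%% The consumer decides when the next item is fetched.
producer([]) ->
    fun() -> done end;
producer([Head | Tail]) ->
    fun() -> {Head, producer(Tail)} end.

pull_all(Producer) ->
    case Producer() of
        done -> [];
        {Item, Next} -> [Item | pull_all(Next)]
    end.

%% Push: the producer drives, sending one message per item to Pid
%% whether or not the consumer is ready for it.
push(Items, Pid) ->
    lists:foreach(fun(Item) -> Pid ! {event, Item} end, Items),
    Pid ! done,
    ok.

%% The consumer's logic lives inside a receive loop.
collect() ->
    receive
        {event, Item} -> [Item | collect()];
        done -> []
    end.

%% Run both styles over the same items, within a single process.
demo() ->
    Items = [a, b, c],
    Pulled = pull_all(producer(Items)),
    ok = push(Items, self()),
    Pushed = collect(),
    {Pulled, Pushed}.
```

Both styles deliver the same items; what differs is which side holds the control flow.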
Re: json_to_term EEP

Thanks for the outstanding explanation.

If the Erlang community decides to represent strings as UTF-8-encoded binaries, we should probably add I/O support for that...

Cheers,

David

> -----Original Message-----
> From: Richard A. O'Keefe [mailto:ok@...]
> Sent: Sunday, August 03, 2008 21:24
> To: dmercer@...
> Cc: 'Erlang Questions'
> Subject: Re: [erlang-questions] json_to_term EEP
>
> On 2 Aug 2008, at 2:21 am, David Mercer wrote:
>
> > Does this signal that the Erlang community is moving away from
> > strings-as-lists to strings-as-binaries?
>
> Basically, I looked at what other people were doing with JSON->Erlang
> conversion, and strings-as-binaries seemed to be the most popular
> choice.
>
> Strings as binaries have some advantages.
>
> (1) They are MUCH closer in spirit to what people expect strings to
> be. I've lost count of the number of times people have said in
> this mailing list "Erlang is no XXXXXXX good because it doesn't
> have strings." I'm sick of explaining why this is wrong.
>
> (2) They _are_ more compact than lists, and if you want to pump data
> _through_ Erlang, reducing space turnover is a help.
> This is why I think strings-as-binaries with labels-as-atoms is
> a good balance; the part you expect to look inside using Erlang
> is easy to look at, the rest is cheap to hold and pass on.
>
> (3) They offer constant-time slicing.
>
> With the addition of <<"...">> syntax to Erlang they are almost
> readable.
>
> With Unicode, the major snag is that lists can represent one Unicode
> character per element, whereas binary matching counts bytes. Regular
> expressions are coming, and if they can handle UTF-8 binaries we need
> not worry too much about counting bytes.
>
> > Should we have an option in the
> > JSON functionality to return keys and values as strings-as-lists?
>
> This now leads us to the reason WHY other people are mapping JSON
> strings to Erlang binaries.
>     "ABC"
> is a legal JSON string.
>     [65,66,67]
> is a legal JSON "array". If we turned JSON strings into Erlang
> strings, how would we tell them from "arrays"? At least one of them
> would have to be flagged somehow.
>
> It might be pleasantly symmetric to have
>     [array,65,66,67]
>     [string,65,66,67]
>     [object,......]
> The thing is, you can't -just- say "let's have strings as lists",
> you have to do something different with arrays/lists as well.
>
> This really doesn't make a good default.