json_to_term EEP


json_to_term EEP

by Chris Anderson

Richard,

Thanks again for your work on the EEP.

I've been communicating with Damien (CouchDB lead) about the shape of
objects as returned by json_to_term(). We think that returning a list
of tuples is preferable to returning a tuple of tuples.

Starting with a JSON object like:

{"key":"value", "key2":"value2"}

the two options in Erlang are:

Tuple of tuples (A):  {{<<"key">>, <<"value">>},{<<"key2">>, <<"value2">>}}

or

Tuple containing a list of tuples (B):  {[{<<"key">>, <<"value">>},{<<"key2">>, <<"value2">>}]}

We both have a preference for (B - list of tuples) because based on
current usage in CouchDB, (A - raw tuples) would have us calling
tuple_to_list() constantly when we need to interact with the data. I
don't see any big drawbacks to (B) and the ease-of-use argument is
important. Requiring less code for the most common use-cases is a big
win.
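To make the tuple_to_list() point concrete, here is a minimal sketch of a key lookup under each shape. The module and function names (shape_ab, get_a/2, get_b/2) are hypothetical, and lists:keyfind/3 is just one way to do the lookup:

```erlang
-module(shape_ab).
-export([get_a/2, get_b/2]).

%% (A): the object is a tuple of {Key, Value} pairs. The list
%% functions cannot be applied to it directly, so it is converted first.
get_a(Key, Object) when is_tuple(Object) ->
    case lists:keyfind(Key, 1, tuple_to_list(Object)) of
        {Key, Value} -> {ok, Value};
        false -> error
    end.

%% (B): the object is a 1-tuple wrapping a list of pairs. A pattern
%% match removes the wrapper and the list is searched as-is.
get_b(Key, {Pairs}) when is_list(Pairs) ->
    case lists:keyfind(Key, 1, Pairs) of
        {Key, Value} -> {ok, Value};
        false -> error
    end.
```

With (B) the wrapper costs one pattern match; with (A) every lookup first builds a new list with tuple_to_list/1.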

Chris

--
Chris Anderson
http://jchris.mfdz.com
_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: json_to_term EEP

by Kevin A. Smith

On Jul 28, 2008, at 2:42 PM, Chris Anderson wrote:

> We both have a preference for (B - list of tuples) because based on
> current usage in CouchDB, (A - raw tuples) would have us calling
> tuple_to_list() constantly when we need to interact with the data. I
> don't see any big drawbacks to (B) and the ease-of-use argument is
> important. Requiring less code for the most common use-cases is a big
> win.

Wouldn't B also allow us to access the data as a proplist? If so, that  
seems like another reason to vote for B.

--Kevin

Re: json_to_term EEP

by Bob Ippolito

On Mon, Jul 28, 2008 at 11:42 AM, Chris Anderson <jchris@...> wrote:

> We both have a preference for (B - list of tuples) because based on
> current usage in CouchDB, (A - raw tuples) would have us calling
> tuple_to_list() constantly when we need to interact with the data. I
> don't see any big drawbacks to (B) and the ease-of-use argument is
> important. Requiring less code for the most common use-cases is a big
> win.

I have a strong preference for B. I don't think I would use it if it
was implemented as a tuple of tuples just because it wouldn't be very
convenient.

-bob

Re: json_to_term EEP

by Paulo Sérgio Almeida

Hi,

Chris Anderson wrote:

> the two options in Erlang are:
>
> Tuple of tuples (A):  {{<<"key">>, <<"value">>},{<<"key2">>, <<"value2">>}}
>
> or
>
> Tuple containing a list of tuples (B):  {[{<<"key">>,
> <<"value">>},{<<"key2">>, <<"value2">>}]}

I think there is no doubt that lists will be more useful than tuples.
There is, however, another option, which I have been using in a JSON
parser I wrote:

(C) an object is simply a proplist, i.e. a list of tuples.

This is what one really wants to have in Erlang. The difference from
option (B) is that, while it is easy to discard the outer {} when a single
object is decoded, that is no longer the case when objects are nested
inside other structures. (C) gives a greater chance that a decoded
structure can be stored as is, with no post-processing, as a useful
Erlang structure.

The only problem (C) poses is distinguishing the empty object from an
empty array. My solution (which I am almost happy about) is to represent
the empty object as [{}]. This way:

- objects can be distinguished from arrays, e.g. by the following function:

is_object([T|_]) when is_tuple(T) -> true;
is_object(_) -> false.

- we can use objects as proplists, use functions like lists:keysearch or
list comprehensions like

Keys = [V || {V,_} <- Object]

which will work even for the special empty object [{}].

Anyway, the empty object is not a common case (at least for my
purposes), and the advantage of being able to store nested objects in
the most pleasant way is something that should make one consider option (C).
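A sketch of what storing nested (C)-form objects "as is" allows, assuming labels are binaries: a hypothetical get_path/2 walks nested objects with lists:keyfind/3 without any unwrapping step, and the empty object [{}] simply yields no match. The names shape_c and get_path/2 are illustrative only:

```erlang
-module(shape_c).
-export([is_object/1, get_path/2]).

%% (C): an object is a proplist; the empty object is [{}].
is_object([T | _]) when is_tuple(T) -> true;
is_object(_) -> false.

%% Follows a list of keys through nested (C)-form objects.
%% Returns {ok, Value}, or error if a key is missing or a
%% non-object is reached before the path ends.
get_path([], Value) ->
    {ok, Value};
get_path([Key | Rest], Object) ->
    case is_object(Object) of
        true ->
            %% lists:keyfind/3 skips the size-0 tuple in [{}].
            case lists:keyfind(Key, 1, Object) of
                {Key, Value} -> get_path(Rest, Value);
                false -> error
            end;
        false ->
            error
    end.
```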

As others have said, I also do not consider option (A) useful.

Regards,
Paulo

P.S. Given the sudden interest in JSON, I will describe the options I
took in my parser and make it available in a subsequent post, to further
the discussion.

Re: json_to_term EEP

by Chris Anderson

Here's an example of (C):

From the JSON: {"key":"value", "key2":"value2"}

(C - object as proplist): [{<<"key">>,<<"value">>}, {<<"key2">>, <<"value2">>}]

On Mon, Jul 28, 2008 at 2:51 PM, Paulo Sérgio Almeida <psa@...> wrote:

>
> The only problem (C) poses is distinguishing the empty object from an empty
> array. My solution (which I am almost happy about) is to represent the empty
> object as [{}]. This way:
>
> - objects can be distinguished from arrays, e.g. by the following function:
>
> is_object(O=[T|_]) when is_tuple(T) -> true;
> is_object(_) -> false.
>
> - we can use objects as proplists, use functions like lists:keysearch or
> list comprehensions like
>
> Keys = [V || {V,_} <- Object]
>
> which will work even for the special empty object [{}].

(C) seems promising to me now that I've convinced myself that there
aren't issues with nesting.

Thanks for the input, Paulo!

Re: json_to_term EEP

by Richard O'Keefe

On 29 Jul 2008, at 9:51 am, Paulo Sérgio Almeida wrote:
> I think there is no doubt that lists will be more useful than  
> tuples. There is, however another option, that I have been using in  
> a json parser I wrote:
>
> (C) an object is simply a proplist, i.e. a list of tuples.

This is in fact what I originally proposed,
the tricky point being that {} is a legal empty object in JSON,
and we can't map that to [] because that's the representation
for the empty sequence [].

(O) Original proposal: {} => {}, other objects => list of pairs
(A) Armstrong version: object => tuple of pairs, no exceptions.
(B) Object => {list of pairs}.
(C) Almeida proposal: as (O) but {} => [{}].

The arguments for usability of the result in Erlang are the
arguments that originally had me proposing (O).

However, I note that nothing stops us providing a range of
handy-dandy functions that work on tuples of pairs.

%(O)
is_object({})        -> true;
is_object([{_,_}|_]) -> true;
is_object(_)         -> false.

%(A)
is_object(T)         -> is_tuple(T).

%(B)
is_object({T})       -> is_list(T).

%(C)
is_object([T|_])     -> is_tuple(T);
is_object(_)         -> false.

It's rather annoying to be so bothered about empty objects;
do they occur in practical JSON?  Proposal (C) seems neat enough;
the main problem is fitting the results with @type.
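Since the four is_object/1 variants above cannot live in one module under the same name, here is a sketch that puts them side by side as o/1, a/1, b/1 and c/1 in a hypothetical module json_is_object. One liberty is taken: (B) gets a catch-all clause so that non-objects return false instead of raising function_clause.

```erlang
-module(json_is_object).
-export([o/1, a/1, b/1, c/1]).

%% (O): {} is the empty object; other objects are non-empty lists of pairs.
o({}) -> true;
o([{_, _} | _]) -> true;
o(_) -> false.

%% (A): every object is a tuple of pairs.
a(T) -> is_tuple(T).

%% (B): every object is a 1-tuple wrapping a list of pairs.
%% The catch-all returns false rather than crashing on other values.
b({T}) -> is_list(T);
b(_) -> false.

%% (C): as (O), but the empty object is [{}].
c([T | _]) -> is_tuple(T);
c(_) -> false.
```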

--
If stupidity were a crime, who'd 'scape hanging?

Re: json_to_term EEP

by Willem de Jong

Hi all,
 
How about a SAX-like API? See for example http://www.p6r.com/articles/2008/05/22/a-sax-like-parser-for-json/. I can imagine that it would be easy to create any of the forms proposed in this thread based on such an API. On the other hand it would allow you to do things that you wouldn't be able to do with a parser that produces a complete representation at once (in particular: parsing very big documents), and it would be better suited to support a 'data mapper' approach like the Erlang ASN.1 implementation, Google's Protocol Buffers or erlsom.
 
Regards,
Willem
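A rough sketch of the callback shape such an API might have. To stay short it walks an already-decoded (B)-form term rather than parsing text incrementally, so it shows the event interface only, not the memory savings; the module json_events, fold/3 and the event names are all hypothetical:

```erlang
-module(json_events).
-export([fold/3]).

%% Calls Fun(Event, Acc) for each event of a (B)-form term, in document
%% order, and returns the final accumulator. Events are:
%%   start_object, {key, K}, end_object, start_array, end_array, {value, V}
fold({Pairs}, Fun, Acc0) when is_list(Pairs) ->
    Acc1 = Fun(start_object, Acc0),
    Acc2 = lists:foldl(fun({Key, Value}, Acc) ->
                               fold(Value, Fun, Fun({key, Key}, Acc))
                       end, Acc1, Pairs),
    Fun(end_object, Acc2);
fold(Items, Fun, Acc0) when is_list(Items) ->
    Acc1 = Fun(start_array, Acc0),
    Acc2 = lists:foldl(fun(Item, Acc) -> fold(Item, Fun, Acc) end,
                       Acc1, Items),
    Fun(end_array, Acc2);
fold(Scalar, Fun, Acc) ->
    Fun({value, Scalar}, Acc).
```

For example, fun({key, _}, N) -> N + 1; (_, N) -> N end counts labels without building anything; a callback that keeps a stack of partially built containers could rebuild any of the proposed shapes.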

 
Re: json_to_term EEP

by Hynek Vychodil

On Tue, Jul 29, 2008 at 3:13 AM, Richard A. O'Keefe <ok@...> wrote:
> %(B)
> is_object({T})       -> is_list(T).

is_object({T})       -> is_list(T);
is_object(_)         -> false.   % avoid exception

> It's rather annoying to be so bothered about empty objects;
> do they occur in practical JSON?  Proposal (C) seems neat enough;
> the main problem is fitting the results with @type.

(C) seems good to me too, because the proplists module works fine with it.

> proplists:get_bool(a, [{}]).
false
> proplists:get_bool(a, [{a, true}]).
true
> proplists:get_value(a, [{a, true}]).
true
> proplists:get_value(a, [{a, heh}]).
heh
> proplists:get_value(a, [{}]).
undefined

Atoms are used here only for simplicity; this works with binary keys too. (I assume, of course, that JSON booleans map to the atoms true/false.)
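The same checks with binary keys, as a (C)-form decoder would produce them, can be sketched as below. The module name proplist_binary_keys is illustrative; the calls are the standard proplists:get_bool/2 and proplists:get_value/2:

```erlang
-module(proplist_binary_keys).
-export([demo/0]).

%% Repeats the shell checks with binary keys and the empty object [{}].
demo() ->
    Obj = [{<<"a">>, true}, {<<"b">>, <<"heh">>}],
    Empty = [{}],
    [proplists:get_bool(<<"a">>, Obj),      % true
     proplists:get_value(<<"b">>, Obj),     % <<"heh">>
     proplists:get_bool(<<"a">>, Empty),    % false
     proplists:get_value(<<"a">>, Empty)].  % undefined
```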


--
--Hynek (Pichi) Vychodil

Re: json_to_term EEP

by Robert Virding

You will have to forgive me, but I am now going to do something which I hate when others do it: comment without really knowing much about the topic. :-)

Why not just use option (B) and have the empty object as {[]}? It is always consistent, and the empty object is easily distinguished from the empty list and the empty string. I don't see why having the extra tuple should cause any problems, but then again I am no expert.
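A sketch of why (B) stays unambiguous with the empty object as {[]}: each JSON kind has a distinct outermost Erlang shape, so one function head per kind suffices. It assumes JSON null decodes to the atom null and booleans to true/false; the module b_kinds and kind/1 are hypothetical:

```erlang
-module(b_kinds).
-export([kind/1]).

%% Classifies a (B)-form value by its outermost shape.
kind({Pairs}) when is_list(Pairs) -> object;   % includes {[]}
kind(List) when is_list(List) -> array;        % includes []
kind(Bin) when is_binary(Bin) -> string;       % includes <<>>
kind(N) when is_number(N) -> number;
kind(B) when is_boolean(B) -> boolean;
kind(null) -> null.
```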

I would prefer to always have strings in *one* format and not special case keys with atoms sometimes. Otherwise to be certain you would have to match both atom and binary to find key. Unless you *always* use atoms for keys, which could easily explode.

Robert

2008/7/29 Hynek Vychodil <vychodil.hynek@...>
> (C) seems good for me too, because proplist works fine with it.
> [...]
> atom is used only for simplicity, but works with binaries too. (JSON's boolean should be true/false atom of course I assume.)

Re: json_to_term EEP

by Hynek Vychodil

On Tue, Jul 29, 2008 at 4:07 PM, Robert Virding <rvirding@...> wrote:
> You will have to forgive but I am now going to do something which I hate when others do it: comment without really knowing much about the topic. :-)

> Why not just use option (B) and have the empty object as {[]}? It is always consistent and the empty object is easily from the empty list and empty string. I don't see having the extra tuple should cause any problems, but then again I am no expert.

*I am no expert.* You are joking.

So on topic:

JSON: {"key":"value", "key2":{}, "key3":[{}, 3.14 , "val", true], "key4": {"a":false, "b":2} }

(B): {[
      {<<"key">>, <<"value">>},
      {<<"key2">>, {[]}},
      {<<"key3">>, [{[]}, 3.14, <<"val">>, true]},
      {<<"key4">>, {[{<<"a">>, false},{<<"b">>, 2}]}}
   ]}

(C): [
      {<<"key">>, <<"value">>},
      {<<"key2">>, [{}]},
      {<<"key3">>, [[{}], 3.14, <<"val">>, true]},
      {<<"key4">>, [{<<"a">>, false},{<<"b">>, 2}]}
   ]

(One can use it as simple test case ;-) )

I don't see why version (B) should be better than (C). It's true that (B) has minimal overhead and (C) has a slightly (really only slightly) more complicated object test, but in both variants objects and lists can be told apart exactly, and in both this can be done in a function or case guard expression. Notice the values of key2, key3 and key4.

Result:
(B) - one more structure level for each object - no problem in Erlang
(C) - one more check on the type of the first element - no problem in Erlang
Technically it's fifty-fifty, and only personal preference decides. (To my feeling one more structure level is worse, but ...)
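Because both shapes carry the same information, converting between them is mechanical. A minimal sketch of a (B)-to-(C) converter (the module b_to_c and convert/1 are hypothetical), which maps the test case above from its (B) form to its (C) form:

```erlang
-module(b_to_c).
-export([convert/1]).

%% Converts a decoded term from shape (B) to shape (C):
%%   {[]}           -> [{}]           (empty object)
%%   {[{K, V}, ...]} -> [{K, V'}, ...] (values converted recursively)
%%   [E, ...]        -> [E', ...]      (array elements converted)
%% Scalars (binaries, numbers, atoms) are returned unchanged.
convert({[]}) ->
    [{}];
convert({Pairs}) when is_list(Pairs) ->
    [{Key, convert(Value)} || {Key, Value} <- Pairs];
convert(Items) when is_list(Items) ->
    [convert(Item) || Item <- Items];
convert(Scalar) ->
    Scalar.
```

The converse direction needs the first-element check that (C) relies on, which is the "more complicated object detection" trade-off in miniature.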


> I would prefer to always have strings in *one* format and not special case keys with atoms sometimes. Otherwise to be certain you would have to match both atom and binary to find key. Unless you *always* use atoms for keys, which could easily explode.

I argue for unification, so transforming everything to atoms is insecure, and the conclusion is not to use that approach at all.
Aside from the non-uniformity of the list_to_existing_atom approach, there is a performance drawback too. For each key you must call list_to_existing_atom(binary_to_list(X)), and binary_to_list causes GC pressure in this usage. I would not use this variant either.
Having everything as binaries is best for me.

P.S.: Why non-uniformity is a problem: one can argue that it looks nicer. OK. One can argue that the binary->atom transformation is done only for existing atoms, and that all atoms used in comparisons already exist. BAD: imagine, for example, storing an Erlang term for a long time or sending it to other nodes ... It *can* complicate things, so avoid it if you can, and we *can*. I think it is dangerous.



Re: json_to_term EEP

by Chris Anderson

I find this discussion very interesting. Thanks to everyone who has spoken up.

2008/7/28 Willem de Jong <w.a.de.jong@...>:
> How about a SAX-like API? See for
> example http://www.p6r.com/articles/2008/05/22/a-sax-like-parser-for-json/

CouchDB will definitely need a streaming JSON processor if we are to
handle giant documents without building them in memory. The example
SAX/JSON parser in C++ is a good read, it's making me want to
prototype something like that in Ruby. A SAX-like streaming tokenizer
seems like it could lend itself to a nice, lean implementation.


On the question of formats, I think any of the proplist formats would
be a good choice. Here's a look at is_array() for the proplist
options.

%(O)
is_array([{_,_}|_])  -> false;
is_array(T)          -> is_list(T).

%(B)
is_array(T)          -> is_list(T).

%(C)
is_array([T|_])      -> not is_tuple(T);
is_array(T)          -> is_list(T).


(B) has the simplest array/object test-functions and has the
parsing/writing advantage that it doesn't require you to look inside
each Erlang list, to see if it corresponds to a JSON array or object.
This means reading left-to-right you know immediately when you've
encountered a JSON array or object.

I'm not sure how heavily to weight ease of reading (especially as some
people could find the {[]} format harder to read due to the
extra {}).
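To illustrate the writing side, here is a minimal (B)-form encoder sketch that chooses between "{" and "[" from the outermost constructor alone, without inspecting the first element of any list. The module b_encode is hypothetical; floats and control-character escaping are deliberately left out:

```erlang
-module(b_encode).
-export([encode/1]).

%% The clause is chosen from the outermost constructor alone:
%% a 1-tuple is an object, a list is an array.
encode({Pairs}) when is_list(Pairs) ->
    ["{", join([[encode_string(K), ":", encode(V)] || {K, V} <- Pairs]), "}"];
encode(Items) when is_list(Items) ->
    ["[", join([encode(I) || I <- Items]), "]"];
encode(Bin) when is_binary(Bin) -> encode_string(Bin);
encode(true) -> "true";
encode(false) -> "false";
encode(null) -> "null";
encode(N) when is_integer(N) -> integer_to_list(N).

%% Puts commas between already-encoded elements.
join([]) -> [];
join([H | T]) -> [H | [[",", X] || X <- T]].

%% Escapes only the double quote and backslash; control characters
%% are not handled in this sketch.
encode_string(Bin) -> ["\"", [escape(C) || <<C>> <= Bin], "\""].

escape($") -> "\\\"";
escape($\\) -> "\\\\";
escape(C) -> C.
```

Under (O) or (C), the list clause would first have to look at the head of each list to decide whether it is writing an array or an object.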


Chris

Re: json_to_term EEP

by Richard O'Keefe

On 29 Jul 2008, at 6:10 pm, Willem de Jong wrote:
> How about a SAX-like API?

(1) Anyone who wants such a design can produce their own design,
     AND their own code.  The EEP I am concerned with is a DVM-
     like design (Document *Value* Model).

(2) In the XML world, there are several reasons for being
     interested in SAX-like designs (why the H*LL they could
     not bring themselves to say ESIS-like, when ESIS was the
     traditional SGML model for the event stream, I cannot
     imagine, unless it was sheer NIH).

     (A) You can start processing a document without waiting for
         the end.  If people have JSON applications where they
         need to start, say, processing the properties of an
         "object" before knowing what other properties it may
         have, then such a design may be useful for them.  See
         JSON-RPC note below.

     (B) You can process a HUGE document without having to hold
         all of it in memory.  This was a major issue back in the
         days of 16-bit machines; one of the merits of Troff was
         that it produced pages "on-line", and pipelines
         involving SGML and Troff (or similar) made sense.  These
         days, there are some amazingly large RDF files around,
         so again, not having to hold the whole thing makes sense.
         If people have JSON applications where they want to send
         100s of MB of data as JSON, such a design may be useful
         for them.

         The 'man' documentation kit on Solaris works in very much
         this way:  SGML documentation => events => hacky program
         that converts element edges to Troff macros => Troff.

     (C) You may be able to filter an event stream so as to yield
         the effect of selecting (or removing) elements.  I've done
         more of this than I care to remember piping the output of
         nsgmls (or of the SWI Prolog SGML parser) through AWK
         scripts.  Think "subset of XPath" and you'll get the idea.
         This is really a special case of (A) and (B).  People who
         have a need for filtering lengthy JSON streams and want
         to reduce latency could use such a design.

(3) In the functional programming world, SAX is less attractive,
     because the usual techniques for using an ESIS/SAX-like interface
     are heavily stateful.

     Once I had my Document Value Model kit, I found doing things the
     "functional" way over documents as trees was so much easier than
     doing things the ESIS/SAX-like way that I now work with entire
     forms whenever I can, and this is *C* programming I'm talking
     about, where stateful is supposed to be easy.

(4) The JSON RFC makes it clear that JSON "messages", if I may call
     them that, may only be "arrays" or "objects"; a number or a
     string must be inside something else.  In cases where an ESIS/SAX-like
     interface might have made sense, it would be more usual, with JSON,
     to send a stream of self-contained forms that can easily be processed
     one at a time as entire things.

(5) The JSON-RPC 1.1 draft (I haven't looked at 1.0) hints at some
     kind of ESIS/SAX-like interface when it says that arguments
     should be sent in such an order that the receiver can process
     them when it gets them.  How are people actually using JSON-RPC?
     Is there that much to gain, in actual practice?

(6) Not on topic, but I can't help feeling that Linux D-Bus would be
     nicer if it used JSON...

> See for example http://www.p6r.com/articles/2008/05/22/a-sax-like-parser-for-json/.
> I can imagine that it would be easy to create any of the forms
> proposed in this thread based on such an API.

The thing is, it wouldn't be NEARLY as easy as NOT using such an API.
Several Erlang JSON implementations have been mentioned or displayed
in this thread already.  They are not particularly hard to write.
I'd say they are MUCH harder to design than to write!  And the ones
I have read would definitely have been *harder* to code using an
ESIS/SAX-like interface.

> On the other hand it would allow you to do things that you wouldn't  
> be able to do with a parser that produces a complete representation  
> at once (in particular: parsing very big documents), and it would be  
> better suitedt to support a 'data mapper' approach like the Erlang  
> ASN.1 implementation, Googles Protocol Buffers or erlsom.

The question is whether the things that an ESIS/SAX-like interface
let you do are things that people particularly *want* to do with JSON.
I have no idea.

The world has room for both "value" interfaces and "event stream"
interfaces.

Obviously an ESIS-like interface is possible
because we can trivially map JSON to XML:

        number => <number value="numeric string"/>
        string => <string value="string"/>
        array  => <array>e1...en</array>
        object => <object><slot name="n1">e1</slot>...</object>

So a JSON parser could simply emit the same event stream
(using *precisely* a SAX interface) as an XML parser
*would* have emitted given the equivalent XML.
That is, you would not have a new *interface*, just a new
*parser* that reused your existing "SAX" interface.
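The element mapping above can be sketched for (B)-form terms as follows, producing XML text rather than an event stream. The module name json_to_xml is hypothetical; numbers are limited to integers, and the <true/>, <false/> and <null/> elements for JSON literals are an assumption, since the mapping above does not cover them:

```erlang
-module(json_to_xml).
-export([to_xml/1]).

%% Renders a (B)-form term as iodata using the element mapping above.
to_xml({Pairs}) when is_list(Pairs) ->
    ["<object>",
     [["<slot name=\"", attr(K), "\">", to_xml(V), "</slot>"]
      || {K, V} <- Pairs],
     "</object>"];
to_xml(Items) when is_list(Items) ->
    ["<array>", [to_xml(I) || I <- Items], "</array>"];
to_xml(N) when is_integer(N) ->
    %% Assumption: integers only; floats are not handled in this sketch.
    ["<number value=\"", integer_to_list(N), "\"/>"];
to_xml(Bin) when is_binary(Bin) ->
    ["<string value=\"", attr(Bin), "\"/>"];
to_xml(Lit) when Lit =:= true; Lit =:= false; Lit =:= null ->
    %% Assumption: not covered by the mapping above.
    ["<", atom_to_list(Lit), "/>"].

%% Escapes the characters that are unsafe inside a double-quoted attribute.
attr(Bin) ->
    [esc(C) || <<C>> <= Bin].

esc($&) -> "&amp;";
esc($<) -> "&lt;";
esc($") -> "&quot;";
esc(C) -> C.
```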

Re: json_to_term EEP

by Richard O'Keefe

It would be nice if people would read the EEP.

On 30 Jul 2008, at 2:55 am, Hynek Vychodil wrote:
> I would prefer to always have strings in *one* format and not  
> special case keys with atoms sometimes. Otherwise to be certain you  
> would have to match both atom and binary to find key. Unless you  
> *always* use atoms for keys, which could easily explode.

In the EEP, json_to_term(IO_Data, Options) has an option
        {label,binary}
or {label,atom}
or {label,existing_atom}
There is no corresponding option for strings, which are
always binaries.  (The idea is that strings are
unpredictable data, whereas labels are predictable structure.)
{label,binary} says to leave all labels as binaries.
     This would have been intolerable before <<"...">> syntax
     was introduced; now the main thing is that it wastes space.
{label,atom} says to convert to an atom any label that CAN
     be converted to an atom, the main limitation being that
     Erlang atoms are not yet Unicode-ready.  (Someone else has
     an EEP about that, I believe.)  This is perfect for
     communicating with a TRUSTED source, just like receiving
     Erlang term_to_binary() values and decoding them.
{label,existing_atom} means that a module that mentions
     certain atoms in pattern matches against formerly-JSON
     labels can be confident of finding those atoms, while
     other labels may remain binaries.
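The three label options can be sketched as one function over a label that arrives as a UTF-8 binary. convert_label/2 and its option atoms are hypothetical names, not the EEP's API; binary_to_atom/2 and binary_to_existing_atom/2 are the standard BIFs:

```erlang
-module(json_labels).
-export([convert_label/2]).

%% {label,binary}: leave the label as it is.
convert_label(Label, binary) when is_binary(Label) ->
    Label;
%% {label,atom}: always create an atom. Only safe for trusted input,
%% because atoms are not garbage-collected.
convert_label(Label, atom) when is_binary(Label) ->
    binary_to_atom(Label, utf8);
%% {label,existing_atom}: use an atom only if one already exists,
%% otherwise keep the binary.
convert_label(Label, existing_atom) when is_binary(Label) ->
    try binary_to_existing_atom(Label, utf8)
    catch error:badarg -> Label
    end.
```

With existing_atom, the same label may come back as an atom on one node and as a binary on another, depending on which atoms that node already has.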

Options are a way of coping with different people's different
situations and needs; the trick is to have just enough of them.

> I argue unification,

Unification of what with what?

> so transforming all to atom is insecure and result is don't use this  
> way at all.

WITHIN a trust boundary, all is well.  Not all communication
crosses trust boundaries, otherwise term_to_binary() would be
of little or no use.

>
> Aside non-uniformity of  list_to_existing_atom way, there is  
> performance drawback too. For each key you must call  
> list_to_existing_atom(binary_to_list(X)) and binary_to_list causes  
> GC pressure in this usage. I would not have use this variant, too.

What performance drawback?  What call to binary_to_list()?  Whoever said
the binary EXISTED in the first place?  The EEP is a proposal for putting
these conversion functions in the Erlang core, eventually to be
implemented in C.  So implemented, the alleged performance drawback simply
does not exist.

> P.S.: Why non-uniform is problem.

It is a problem for people who EXPECT a uniform translation,
and not for people who don't.

> One can argue, it looks nicer. OK. One can argue, binary->atom  
> transformation is done only for exists atoms and all atoms which  
> used in comparisons are exists. BAD, imagine for example store  
> Erlang term for long time or send to other nodes

Again, you are overlooking the fact that different people have
different needs, and that the translation of labels can be (and
IS, in the EEP) an OPTION.  You are also overlooking the fact
that *considered as JSON*, the forms are entirely equivalent,
and that since JSON explicitly says that the order of key:value
pairs does not matter, there is uncertainty about precisely
what Erlang term you get anyway.

In fact, for binary storage, conversion to existing atoms is
*better* than conversion to binaries, because the Erlang
term-to-binary format uses a compression scheme for atoms
that it does not use for binaries.

Admittedly, the answer to that is to extend the compression
scheme to binaries as well.


Re: json_to_term EEP

by Jim Larson-3

In message <5B5B0B86-D10A-4B88-AA77-BD13381E1393@...>
Richard O'Keefe writes:
>On 29 Jul 2008, at 6:10 pm, Willem de Jong wrote:
>> How about a SAX-like API?
>
>(1) Anyone who wants such a design can produce their own design,
>     AND their own code.  The EEP I am concerned with is a DVM-
>     like design (Document *Value* Model).

Note: if anyone dislikes DVM because of difficulties
in editing large values - have a look at "zippers".
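One reading of the zipper suggestion, sketched in Erlang at a single level: a cursor into the pair list of an object in the {[{Key, Value}]} shape discussed in this thread. It lets you move along the pairs and replace the focused value, and it shares the untouched tail when the object is rebuilt. The module and function names are illustrative only; a full zipper over nested JSON would also keep a path of parent contexts.

```erlang
-module(kv_zipper).
-export([focus/2, next/1, replace/2, close/1]).

%% A one-level zipper over the pairs of an object {[{Key, Value}]},
%% represented as {LeftReversed, FocusedPair, Right}.
focus(Key, {Pairs}) when is_list(Pairs) ->
    focus(Key, [], Pairs).

focus(Key, Left, [{Key, _} = P | Right]) -> {ok, {Left, P, Right}};
focus(Key, Left, [P | Right]) -> focus(Key, [P | Left], Right);
focus(_Key, _Left, []) -> error.

%% Move the focus one pair to the right.
next({Left, P, [Q | Right]}) -> {ok, {[P | Left], Q, Right}};
next({_Left, _P, []}) -> error.

%% Replace the focused value; nothing else is copied yet.
replace(NewValue, {Left, {Key, _}, Right}) -> {Left, {Key, NewValue}, Right}.

%% Rebuild the object; the Right list is shared, not copied.
close({Left, P, Right}) -> {lists:reverse(Left, [P | Right])}.
```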


>(5) The JSON-RPC 1.1 draft (I haven't looked at 1.0) hints at some
>     kind of ESIS/SAX-like interface when it says that arguments
>     should be sent in such an order that the receiver can process
>     them when it gets them.  How are people actually using JSON-RPC?
>     Is there that much to gain, in actual practice?

I've used only JSON-RPC 1.0, which (as gratuitous exposition) was
essentially just:

        requests are JSON objects with the fields:
                - id: (term) a value to associate request and response
                - method: (string) the name of the procedure being called
                - params: (array) the arguments

        responses are JSON objects with the fields:
                - id: (term) to associate the response with the request
                - result: (term) the result of the procedure application
                - error: (term) an exceptional result
                - exactly one of result or error will be null
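A sketch of these messages in the {[{Label, Value}]} shape discussed in this thread, with binary labels. The sketch rests on several assumptions. JSON null decodes to the atom null. A missing field is treated as null. The argument array uses the label params, which is the name the JSON-RPC 1.0 specification gives it. The module name is hypothetical.

```erlang
-module(jsonrpc10_sketch).
-export([request/3, response/1]).

%% Build a JSON-RPC 1.0 request as a decoded-JSON term.
request(Id, Method, Params) when is_binary(Method), is_list(Params) ->
    {[{<<"id">>, Id}, {<<"method">>, Method}, {<<"params">>, Params}]}.

%% Classify a decoded response. Exactly one of result/error is expected
%% to be non-null; a response with both non-null crashes (case_clause).
response({Pairs}) when is_list(Pairs) ->
    Id = field(<<"id">>, Pairs),
    case {field(<<"result">>, Pairs), field(<<"error">>, Pairs)} of
        {Result, null} -> {ok, Id, Result};
        {null, Error} -> {error, Id, Error}
    end.

%% Look up a label; a missing field is treated like null (an assumption).
field(Label, Pairs) ->
    case lists:keyfind(Label, 1, Pairs) of
        {_, Value} -> Value;
        false -> null
    end.
```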

JSON-RPC could be layered directly over TCP, or any other bytestream
transport.  This means that the JSON parser is required to do proper
framing: it must be able to handle too much or too little input.  This
motivated my request for a continuation-based parser interface in
my feedback to the original EEP draft.
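To illustrate the framing problem, here is a small Erlang sketch that finds the end of one top-level JSON object or array in a byte stream. It is not the continuation interface that was actually proposed, whose details are not given here. It returns {ok, Value, Rest} when too much input has arrived, or {more, Cont} when too little has, where Cont is a fun to call with the next chunk. It assumes top-level values are objects or arrays, as JSON-RPC messages are. It only tracks brackets and string escapes and does not validate the JSON. The module name is hypothetical.

```erlang
-module(json_frame).
-export([frame/1]).

%% frame(Bytes) -> {ok, Value, Rest} | {more, Cont}
%% Value is the first complete top-level object or array in Bytes,
%% Rest is whatever follows it, and Cont takes the next chunk of input.
frame(Bin) when is_binary(Bin) ->
    scan(Bin, 0, 0, 0, normal).

%% Pos: next byte to examine; Start: where the current value began;
%% Depth: bracket nesting; Mode: normal | string | escape.
scan(Buf, Pos, Start, Depth, Mode) when Pos >= byte_size(Buf) ->
    %% Too little input: capture the scanner state and wait for more.
    {more, fun(More) when is_binary(More) ->
                   scan(<<Buf/binary, More/binary>>, Pos, Start, Depth, Mode)
           end};
scan(Buf, Pos, Start, Depth, Mode) ->
    Next = Pos + 1,
    case {Mode, binary:at(Buf, Pos)} of
        {escape, _} -> scan(Buf, Next, Start, Depth, string);
        {string, $\\} -> scan(Buf, Next, Start, Depth, escape);
        {string, $"} -> scan(Buf, Next, Start, Depth, normal);
        {string, _} -> scan(Buf, Next, Start, Depth, string);
        {normal, $"} -> scan(Buf, Next, Start, Depth, string);
        {normal, C} when C =:= ${; C =:= $[ ->
            NewStart = case Depth of 0 -> Pos; _ -> Start end,
            scan(Buf, Next, NewStart, Depth + 1, normal);
        {normal, C} when (C =:= $} orelse C =:= $]), Depth =:= 1 ->
            %% Closing bracket of the top-level value: split here.
            {ok, binary:part(Buf, Start, Next - Start),
             binary:part(Buf, Next, byte_size(Buf) - Next)};
        {normal, C} when C =:= $}; C =:= $] ->
            scan(Buf, Next, Start, Depth - 1, normal);
        {normal, _} -> scan(Buf, Next, Start, Depth, normal)
    end.
```

Once {ok, Value, Rest} comes back, Value can be handed to a value-model parser and Rest fed to frame/1 again.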

The direct layering of JSON-RPC over a stream transport allowed for
out-of-order responses over a single connection.  For reasonably-sized
requests and responses, this was almost as good as having channels
within the connection, as BEEP has.  Sadly, JSON-RPC 1.1 looks like
it is only layered on top of HTTP, losing this feature.

In answer to your question, I've used JSON-RPC (1.0) for a production
service, and I've been just fine with a value model for the parsed
results.  I kept the size of JSON terms small by design: if the
parsed terms were too big to conveniently handle as Erlang values,
they would have been clogging the transport too much.

Jim

Re: json_to_term EEP

by Jim Larson-3

In message <38C632F4-991C-4F8D-8694-8DE1066385FC@...>
Richard A. O'Keefe writes:

>On 30 Jul 2008, at 2:55 am, Hynek Vychodil wrote:
>> Aside non-uniformity of  list_to_existing_atom way, there is  
>> performance drawback too. For each key you must call  
>> list_to_existing_atom(binary_to_list(X)) and binary_to_list causes  
>> GC pressure in this usage. I would not have use this variant, too.
>
>What performance drawback?  What call to binary_to_list()?  Whoever said
>the binary EXISTED in the first place?  The EEP is a proposal for putting
>these conversion functions in the Erlang core, eventually to be
>implemented in C.  So implemented, the alleged performance drawback simply
>does not exist.

I may have been the source of the confusion here.  I mentioned
list_to_existing_atom/1 in my feedback to Richard's original draft.
I mentioned it only to a) point to existing semantics, and b) suggest
that the proposed parser interface would allow a pure Erlang
implementation in addition to one built into the runtime, though
I was not explicit about either reason.

Jim

Re: json_to_term EEP

by Anthony Shipman

On Wed, 30 Jul 2008 12:55:27 am Hynek Vychodil wrote:
> JSON: {"key":"value", "key2":{}, "key3":[{}, 3.14 , "val", true], "key4":
> {"a":false, "b":2} }
>
> (B): {[
>       {<<"key">>, <<"value">>},
>       {<<"key2">>, {[]}},
>       {<<"key3">>, [{[]}, 3.14, <<"val">>, true]},
>       {<<"key4">>, {[{<<"a">>, false},{<<"b">>, 2}]}}
>    ]}

How about

     {json, [ {...} ] }

so that we know what we are looking at and can check it in function argument
patterns etc.

--
Anthony Shipman                    Mamas don't let your babies
als@...                   grow up to be outsourced.

Re: json_to_term EEP

by Hynek Vychodil

On Wed, Jul 30, 2008 at 3:34 AM, Richard A. O'Keefe <ok@...> wrote:
> On 30 Jul 2008, at 2:55 am, Hynek Vychodil wrote:
>> Aside non-uniformity of  list_to_existing_atom way, there is performance drawback too. For each key you must call list_to_existing_atom(binary_to_list(X)) and binary_to_list causes GC pressure in this usage. I would not have use this variant, too.
>
> What performance drawback?  What call to binary_to_list()?  Whoever said
> the binary EXISTED in the first place?  The EEP is a proposal for putting
> these conversion functions in the Erlang core, eventually to be
> implemented in C.  So implemented, the alleged performance drawback simply
> does not exist.

All JSON data coming from outside Erlang is binary in its initial state; there are no Erlang lists outside Erlang.



> Again, you are overlooking the fact that different people have
> different needs, and that the translation of labels can be (and
> IS, in the EEP) an OPTION.  You are also overlooking the fact
> that *considered as JSON*, the forms are entirely equivalent,
> and that since JSON explicitly says that the order of key:value
> pairs does not matter, there is uncertainty about precisely
> what Erlang term you get anyway.

You are overlooking the fact that there are other scenarios. For example:

1/ Read and parse JSON {"a":1, "b":2, "c":3} on one Erlang node with one set of existing atoms (a,b).

2/ Store the Erlang term to a file: [{a,1}, {b,2}, {<<"c">>, 3}]

3/ On another Erlang node with existing atoms {a,c} (for example, some module wants to detect the c key of data taken from JSON), you load and parse the same JSON {"a":1, "b":2, "c":3} and from the parser you get [{a,1}, {<<"b">>,2}, {c, 3}].

4/ Then you load the stored Erlang term from the file, and two things happen: you get [{a,1}, {b,2}, {<<"c">>, 3}], and the existing atoms are now {a,b,c}.

5/ Read and parse the JSON {"a":1, "b":2, "c":3} again and you get [{a,1}, {b,2}, {c, 3}].

6/ Great: you have the terms [{a,1}, {b,2}, {c, 3}], [{a,1}, {b,2}, {<<"c">>, 3}] and [{a,1}, {<<"b">>,2}, {c, 3}] as Erlang terms representing the same JSON input {"a":1, "b":2, "c":3}. What the hell? Something is totally wrong, isn't it?
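What comparing such terms would take can be sketched in Erlang. The sketch assumes the {[{Label, Value}]} object shape rather than the bare lists written in the scenario. It converts atom labels back to binaries and sorts each object's pairs by label. The module is hypothetical, and this is one way to compare decoded terms, not a recommendation over keeping the data as JSON.

```erlang
-module(json_norm).
-export([normalise/1, same_json/2]).

%% Canonical form for comparison: labels as binaries, and each object's
%% pairs sorted by label (JSON says their order does not matter).
normalise({Pairs}) when is_list(Pairs) ->
    {lists:keysort(1, [{label(K), normalise(V)} || {K, V} <- Pairs])};
normalise(List) when is_list(List) ->
    [normalise(X) || X <- List];
normalise(Other) ->
    Other.

label(K) when is_atom(K) -> atom_to_binary(K, utf8);
label(K) when is_binary(K) -> K.

%% Numbers are left alone, so 1 and 1.0 still compare unequal.
same_json(A, B) ->
    normalise(A) =:= normalise(B).
```

Even after this normalisation, an integer 1 and a float 1.0 still differ under =:=, which is one of the round-trip issues raised in the reply further down.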

Erlang is a way to make things safe and reliable. Converting keys to atoms is not safe and reliable, so don't do it. It hurts you!

--
--Hynek (Pichi) Vychodil


Re: json_to_term EEP

by Tony Garnock-Jones-2

Anthony Shipman wrote:
> How about
>      {json, [ {...} ] }
> so that we know what we are looking at and can check it in function argument
> patterns etc.

rfc4627.erl uses {obj, [{Key, Value}, ...]}.

Personally, I'm in favour of the uniform option {[{Key, Value}, ...]},
with the empty object being {[]}. It permits uniform treatment of the
list of key-value pairs without a gratuitous special case. I find myself
reading it as if JSON objects are delimited by a new kind of brackets,
"{[" and "]}".

Tony
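The "new kind of brackets" reading works directly in function heads. Here is a minimal sketch that assumes the {[{Key, Value}]} shape for objects, true/false for booleans and the atom null for JSON null; the module name is made up. An object, including the empty one {[]}, is told apart from an array without any tag. With rfc4627.erl's {obj, Pairs} or the suggested {json, Pairs}, only the first clause's pattern would change.

```erlang
-module(json_kind).
-export([kind/1]).

%% Classify a decoded JSON value by clause matching alone.
kind({Pairs}) when is_list(Pairs) -> object;   % {[]} is the empty object
kind(List) when is_list(List) -> array;        % [] is the empty array
kind(Bin) when is_binary(Bin) -> string;
kind(N) when is_number(N) -> number;
kind(B) when is_boolean(B) -> boolean;
kind(null) -> null.
```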

Re: json_to_term EEP

by Richard O'Keefe


On 30 Jul 2008, at 10:07 pm, Hynek Vychodil wrote:
[it was rather hard to figure out what was just quoting
  and what was actual response]

> What performance drawback?  What call to binary_to_list()?  Whoever said
> the binary EXISTED in the first place?  The EEP is a proposal for putting
> these conversion functions in the Erlang core, eventually to be
> implemented in C.  So implemented, the alleged performance drawback simply
> does not exist.
>
> All JSON data coming from outside Erlang is binary in its initial state;
> there are no Erlang lists outside Erlang.

True and irrelevant:  the ONLY lists that json_to_term/[1,2] should
construct are the ones in the results.  NO list construction whatsoever
is implied in the handling of strings.  Remember, this is an EEP based
on Joe Armstrong's suggestion that there should be new built in
functions!


> You are overlooking the fact that there are other scenarios.

ABSOLUTELY NOT!  Remember, options are OPTIONS.

> For example:
>
> 1/ Read and parse JSON {"a":1, "b":2, "c":3} on one erlang node with  
> one set of existing atoms (a,b).
>
> 2/ Store Erlang term to file [{a,1}, {b,2}, {<<"c">>, 3}]

Remember, this does *NOT* happen by default.
For labels to be converted to existing atoms,
the programmer HAS TO ASK FOR IT EXPLICITLY.

You are 100% right that the DEFAULT options should be safe.

However, the real danger here has nothing to do with atoms.
The danger is this:  if you want to store JSON data, you should
store it *as* JSON, not as something else.  (I am counting
compressed JSON as JSON here.)  The EEP points out other ways
in which Erlang-encoded-JSON may vary:  numbers might be
integers or floats, {key,value} pairs may be reordered in
many ways.

Nor does this have anything to do with Erlang specifically.
For ALL languages, if you want to store JSON or transmit it
or in any way cause JSON data known to one node to become
known to another node you should store or transmit it
*AS* (possibly compressed) JSON, not as something else.

Anyone who keeps this straight will not run into trouble.


>
> 3/ On another Erlang node with existing atoms {a,c} (for example,
> some module wants to detect the c key of data taken from JSON), you
> load and parse the same JSON {"a":1, "b":2, "c":3} and from the parser
> you get [{a,1}, {<<"b">>,2}, {c, 3}].

Remember, {label,existing_atom} is meant for a module that
wants to receive a JSON term and process it, looking for keys that
are mentioned in that module.  If an Erlang process holds a JSON
term in Erlang form and wants to pass it to another node or
another time, it should send it AS JSON.

> 4/ Then you load the stored Erlang term from the file, and two things
> happen: you get [{a,1}, {b,2}, {<<"c">>, 3}], and the existing atoms
> are now {a,b,c}.
>
> 5/ Read and parse the JSON {"a":1, "b":2, "c":3} again and you get
> [{a,1}, {b,2}, {c, 3}].
>
> 6/ Great: you have the terms [{a,1}, {b,2}, {c, 3}], [{a,1}, {b,2},
> {<<"c">>, 3}] and [{a,1}, {<<"b">>,2}, {c, 3}] as Erlang terms
> representing the same JSON input {"a":1, "b":2, "c":3}. What the hell?
> Something is totally wrong, isn't it?

Yes, and what is wrong is seriously incompetent programming.
There are other round trip issues, including the handling of numbers,
and the order of {key,value} pairs.  Recall that the default is
{label,binary}.  So in one node we convert a JSON form to an
Erlang term.  Another node does the same.  One of the nodes then
sends its term to the other, which compares the two terms.
Are they guaranteed to be the same?  Nope.  [hint

FOR ANY PROGRAMMING LANGUAGE AND LIBRARY THE ONLY COMPLETELY
RELIABLE WAY TO TRANSMIT JSON DATA IS *AS* *JSON*.

Got that?

{label,existing_atom} simply is not meant for the use case you
present.
>
>
> Erlang is a way to make things safe and reliable. Converting keys
> to atoms is not safe and reliable, so don't do it. It hurts you!

No, it only hurts stupid people.
Converting keys to existing atoms is perfectly safe for SOME
uses, and there seems to be no good reason to forbid letting
people do that when they are willing to take responsibility
for it being safe.

Expecting JSON forms to convert to identical Erlang terms
at all times and in all places, now THAT is not safe and not
reliable and WILL hurt you.

I could make similar remarks about any language, and about
many formats including XML.


Re: json_to_term EEP

by Willem de Jong-2

On 7/30/08, Richard A. O'Keefe <ok@...> wrote:
> On 29 Jul 2008, at 6:10 pm, Willem de Jong wrote:
>> How about a SAX-like API?
>
> (1) Anyone who wants such a design can produce their own design,
>     AND their own code.  The EEP I am concerned with is a DVM-
>     like design (Document *Value* Model).

Of course, but if the Erlang team creates special, fast support in C it
would be good if it could be used by as many people as possible.

> (3) In the functional programming world, SAX is less attractive,
>     because the usual techniques for using an ESIS/SAX-like interface
>     are heavily stateful.
>
>     Once I had my Document Value Model kit, I found doing things the
>     "functional" way over documents as trees was so much easier than
>     doing things the ESIS/SAX-like way that I now work with entire
>     forms whenever I can, and this is *C* programming I'm talking
>     about, where stateful is supposed to be easy.

I personally like working with a SAX parser. See the example below; I quite enjoyed writing it.

> The question is whether the things that an ESIS/SAX-like interface
> lets you do are things that people particularly *want* to do with JSON.
> I have no idea.

The point is that the Erlang team would probably like to implement only one very
fast JSON parser in C. In my opinion, that should be a SAX-like parser, because
it is easy to create DVM output based on SAX output, but pointless to do it the
other way around.

To give an example:

A SAX parser may create the following events (that is, call its callback
function with the following arguments while parsing):

E = [startDocument,startObject, {key,"menu"}, startObject, {key,"id"},
 {value,"file"}, {key,"popup"}, startObject, {key,"menuitem"},
 startArray,startObject, {key,"value"}, {value,"New"}, {key,"onclick"},
 {value,"CreateNewDoc()"}, endObject,startObject, {key,"value"},
 {value,"Close"}, {key,"onclick"}, {value,"CloseDoc()"}, endObject,
 endArray,endObject,endObject,endObject, endDocument].

(This corresponds to a slightly shortened version of the second example found on json.org).

Below is an example of a callback function that processes these events. The SAX parser would call it each time it has processed another relevant part of the JSON document, passing the SAX event as the first argument and the value returned by the previous call as the second.

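%% Callback for a SAX-style parser. The second argument is the value the
%% previous call returned. The atom start serves as the bottom of the
%% stack (it ends up as the tail of an improper list); each open object
%% or array pushes an empty accumulator, and a pending key is pushed
%% as {key, Key}.
dvm(startDocument, _) ->
  start;
dvm(startObject, Stack) ->
  [[]| Stack];
dvm(startArray, Stack) ->
  [[]| Stack];
dvm({key, _} = Event, Stack) ->
  [Event|Stack];
%% A complete top-level value: keep it until endDocument.
dvm({value, Value}, start) ->
  {value, Value};
%% A value after a key: add the {Key, Value} pair to the enclosing object.
dvm({value, Value}, [{key, Key}, List | T]) ->
  [[{Key, Value} | List] | T];
%% A value inside an array: add it to the array's accumulator.
dvm({value, Value}, [List | T]) ->
  [[Value | List] | T];
%% Closing an object or array turns its reversed accumulator into a
%% value and feeds that value back in as if the parser had produced it.
dvm(endObject, [List | T]) ->
  dvm({value, {lists:reverse(List)}}, T);
dvm(endArray, [List | T]) ->
  dvm({value, lists:reverse(List)}, T);
dvm(endDocument, {value, R}) ->
  R.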

With the events given above, this gives the following output
(you can use lists:foldl(fun dvm/2, [], E) to try this):

{[{"menu",
   {[{"id","file"},
     {"popup",
      {[{"menuitem",
         [{[{"value","New"},{"onclick","CreateNewDoc()"}]},
          {[{"value","Close"},{"onclick","CloseDoc()"}]}]}]}}]}}]}

Regards,

Willem

