RDF Test Cases Bug: The N-Triples Grammar is Ambiguous

by Sean B. Palmer


According to the N-Triples grammar [1], the following is a valid
instance of the line production in an N-Triples document:

<p:> <> <q:> <> <r:> <> "s" .

But which part of the line matches the subject production, and which
part matches the predicate production? As far as I can tell, the
N-Triples specification does provide a means of interpretation. This
is a very major bug, if so; it means that N-Triples does not have a
usable grammar.

Cf. http://chatlogs.planetrdf.com/swig/2007-11-01.html#T09-28-40

Thanks,

[1] http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/#ntrip_grammar
- RDF Test Cases, 3.1. Extended Backus-Naur Form (EBNF) Grammar

--
Sean B. Palmer, http://inamidst.com/sbp/


Re: RDF Test Cases Bug: The N-Triples Grammar is Ambiguous

by Sean B. Palmer


On 11/1/07, Sean B. Palmer <sean@...> wrote:

> As far as I can tell, the N-Triples specification does provide a
> means of interpretation.

That was a typo: I meant of course that it *doesn't* provide a means
of interpretation. I've checked the specification of the EBNF grammar
that is used:

http://www.w3.org/TR/REC-xml/#sec-notation

It doesn't say whether productions are greedy or not, but clearly
any interpretation would depend on that, and on the fact that only
valid RDF URI references are allowed here.

So my test case was as follows:

<p:> <> <q:> <> <r:> <> "s" .

And there are at least four ways of interpreting this:

[<p:> <> <q:> <> <r:>] [<>] ["s"] .
- Greedy, invalid RDF URI references

[<p:> <> <q:> <>] [<r:> <>] ["s"] .
- Greedy and valid RDF URI references

[<p:>] [<> <q:> <> <r:> <>] ["s"] .
- Non-greedy, invalid RDF URI references

[<p:> <>] [<q:> <> <r:> <>] ["s"] .
- Non-greedy, valid RDF URI references
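
To make the greedy versus non-greedy distinction concrete, here is a
minimal Python sketch. The two regular expressions are illustrative
assumptions, not anything defined by the specification: they follow the
shape of the triple production, require at least one character between
the angle brackets, and differ only in the quantifier used for the URI
part.

```python
import re

LINE = '<p:> <> <q:> <> <r:> <> "s" .'

# Both patterns follow the shape: subject ws+ predicate ws+ object ws* '.'
# They differ only in whether the URI quantifier is greedy (.+) or lazy (.+?).
GREEDY = re.compile(r'^(<.+>)[ \t]+(<.+>)[ \t]+(".*")[ \t]*\.$')
LAZY = re.compile(r'^(<.+?>)[ \t]+(<.+?>)[ \t]+(".*")[ \t]*\.$')

def split_triple(pattern, line):
    # Return (subject, predicate, object) as matched, or None if no match.
    match = pattern.match(line)
    return match.groups() if match else None

print(split_triple(GREEDY, LINE))
print(split_triple(LAZY, LINE))
```

With these patterns the greedy match takes [<p:> <> <q:> <>] as the
subject, while the lazy match takes only [<p:>], so the same line yields
two different triples depending on a choice the grammar never states.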

The stakes here are that depending on what the interpretation is, it
mightn't be possible to express various RDF Graphs using N-Triples,
which, as far as I know, is supposed to be able to represent all
possible RDF Graphs.

For example, say we go with the following interpretation:

[<p:> <>] [<q:> <> <r:> <>] ["s"] .

Then how do you express the following?

[<p:> <> <q:> <>] [<r:> <>] ["s"] .

Whatever the resolution, this will undoubtedly make compliant
N-Triples parsing a lot harder than it prima facie appears if parsing
depends on checking whether potential resulting RDF URI references are
valid.
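
As a rough illustration of what such validity-dependent parsing might
involve, the Python sketch below enumerates every subject/predicate
boundary in the test line and keeps only the splits where both parts
start with a URI scheme. The helper names and the scheme test are
assumptions made for illustration; the scheme test is only a crude
stand-in for real RDF URI reference validation, and the sketch assumes a
simple quoted literal as the object and at least one character between
the angle brackets (so a split leaving a bare <> as predicate is
skipped).

```python
import re

LINE = '<p:> <> <q:> <> <r:> <> "s" .'

# Rough proxy for "absolute": the string starts with a URI scheme and a colon.
SCHEME = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*:')

def candidate_splits(line):
    # Peel off a simple quoted literal object and the final full stop.
    head = re.match(r'^(.*>)[ \t]+"[^"]*"[ \t]*\.$', line).group(1)
    splits = []
    # Every '>' followed by whitespace and '<' is a possible boundary.
    for m in re.finditer(r'>[ \t]+(?=<)', head):
        subject, predicate = head[:m.start() + 1], head[m.end():]
        # Assumption: at least one character between the angle brackets.
        if len(subject) > 2 and len(predicate) > 2:
            splits.append((subject, predicate))
    return splits

def looks_absolute(uriref):
    # Strip the surrounding '<' and '>' and test for a leading scheme.
    return SCHEME.match(uriref[1:-1]) is not None

splits = candidate_splits(LINE)
plausible = [s for s in splits if all(looks_absolute(t) for t in s)]
for subject, predicate in plausible:
    print('[%s] [%s]' % (subject, predicate))
```

Even with the scheme filter applied, two splits remain, which are the
two "valid RDF URI references" readings listed above, so checking
validity narrows the choice without settling it.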

--
Sean B. Palmer, http://inamidst.com/sbp/


Re: RDF Test Cases Bug: The N-Triples Grammar is Ambiguous

by Sean B. Palmer


On 11/1/07, Sean B. Palmer <sean@...> wrote:

> <p:> <> <q:> <> <r:> <> "s" .

Note the following response when we try to get rapper to emit
N-Triples corresponding as closely as possible to this edge case:

$ cat test/rdfxml006.rdf
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="p:> &lt;> &lt;q:> &lt;">
<s xmlns="r:> &lt;" rdf:resource="http://example.org/"/>
</rdf:Description>
</rdf:RDF>

$ rapper -i rdfxml test/rdfxml006.rdf
rapper: Parsing file test/rdfxml006.rdf
<p:\u003E <\u003E <q:\u003E <> <r:\u003E <s> <http://example.org/> .
rapper: Parsing returned 1 triple

rapper refuses to emit a literal > character in an absoluteURI, even
though according to the specification this character cannot be escaped.

All the same, escaping > is the obvious way to correct this problem. It
means that instead of the following:

"These are encoded in N-Triples using the escapes described in section Strings."
- http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/#sec-uri-encoding

The specification should instead say something like:

'These are encoded in N-Triples using the escapes described in section
Strings, with the extra proviso that unicode character #x3E, ">", must
be escaped as \u003E.'

This would mean that /[^>]+/ could be used as a regexp to get
absoluteURI production instances.
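
A minimal Python sketch of how the proposed rule might work in practice.
The function names are hypothetical, and the unescaping step handles
only the \u003E escape rather than the full set of string escapes. It
serialises the two competing readings of the test case with > escaped,
then recovers each one using the regexp above wrapped in angle brackets.

```python
import re

URIREF = re.compile(r'<([^>]+)>')

def escape_uri(uri):
    # Hypothetical helper: writes '>' as the six characters \u003E.
    return uri.replace('>', r'\u003E')

def unescape_uri(text):
    # Only undoes the one escape above; other string escapes are ignored here.
    return text.replace(r'\u003E', '>')

def serialize(subject, predicate, literal):
    # Emit one N-Triples style line with both URIs escaped.
    return '<%s> <%s> "%s" .' % (escape_uri(subject), escape_uri(predicate), literal)

def parse_urirefs(line):
    # With '>' always escaped, the first '>' after '<' ends the URI.
    return [unescape_uri(u) for u in URIREF.findall(line)]

first = serialize('p:> <', 'q:> <> <r:> <', 's')
second = serialize('p:> <> <q:> <', 'r:> <', 's')
print(first)
print(second)
print(parse_urirefs(first))
print(parse_urirefs(second))
```

The two graphs now serialise to different lines, and each line parses
back to exactly one subject and predicate.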

Thanks,

--
Sean B. Palmer, http://inamidst.com/sbp/