Returning un-escaped XML literals in SPARQL 1.1 XML results


by Stu Baurmann

Howdy Folks,

In response to the solicitation of suggestions for new features in SPARQL 1.1,
I would like to raise this horse from the dead for further beatings:

http://www.w3.org/2001/sw/DataAccess/issues#unescapedXml

This feature has significant utility in cases where a user wants
to store blocks of (well-formed!) XML in an RDF model, and then process
this XML in a pipelined context.  In my experience, this feature
makes it easier to gradually adopt RDF as part of an ongoing SOA
project.  The stored XML could be either document content or message
content.  I do not advocate retrieval of ill-formed legacy HTML through
this mechanism (which was a possibility raised in previous discussion).

I also think it is now relevant to consider the impact on XProc integration,
as raised by Paul Tyson on 2009-03-04.   I say this without being well
versed in XProc, but based on the assumption that un-escaped XML results
are useful in any pipelined processing context.   I welcome clarifications
from the more XProc-savvy.

http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2009Mar/0004.html

The implementation might also be linked to the use of the ReturnFormatKeyword:

http://www.w3.org/2009/sparql/wiki/Feature:ReturnFormatKeyword

I know there is some complexity involved in embedding arbitrary
XML into the results stream.   It might be sensible to make xml-literal
results an optional feature (both in the sense that SPARQL implementors
are not required to implement it, and in the sense that SPARQL
users are not required to use it).  I would also support placing
restrictions on the XML content that can be returned this way,
e.g. to address some of the encoding issues addressed by Eric
Prud'hommeaux here:

http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JulSep/0163.html

(Perhaps there's been some progress on c14n in recent months?)

Regarding XML schemas and implementation, one idea is that
the XML literal might come wrapped in a child tag of <binding> called
<xml-literal>, which has content type xsd:any.
This means the overall SPARQL-results schema would not be
weakened for any results that do not happen to include <xml-literal>.

Example of well-formed XHTML content (we could just as well use WSDL,
a SOAP message, etc.):

<binding name="o">
   <xml-literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
      <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>important</xh:em> paragraph</xh:p>
   </xml-literal>
</binding>
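
To make the difference concrete for a consumer, here is a minimal Python sketch, not a definitive implementation. It assumes the proposed <xml-literal> wrapper exactly as written above, and compares it with an escaped <literal> rendering of the same paragraph. The function names are hypothetical, and the SPARQL results namespace is left out, as in the example above.

```python
import xml.etree.ElementTree as ET

XH = "http://www.w3.org/1999/xhtml"

# Proposed form: the literal's markup appears as real child elements.
unescaped = """<binding name="o">
  <xml-literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
    <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>important</xh:em> paragraph</xh:p>
  </xml-literal>
</binding>"""

# Escaped form (assumed rendering): the same markup carried as character data.
escaped = """<binding name="o">
  <literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">&lt;xh:p xmlns:xh="http://www.w3.org/1999/xhtml"&gt;Contents of &lt;xh:em&gt;important&lt;/xh:em&gt; paragraph&lt;/xh:p&gt;</literal>
</binding>"""


def em_text_unescaped(doc):
    """One parse: the embedded markup is already part of the tree."""
    binding = ET.fromstring(doc)
    return binding.find(".//{%s}em" % XH).text


def em_text_escaped(doc):
    """Two parses: the literal's text must be parsed again as XML."""
    binding = ET.fromstring(doc)
    fragment = ET.fromstring(binding.find("literal").text)
    return fragment.find("{%s}em" % XH).text


print(em_text_unescaped(unescaped))
print(em_text_escaped(escaped))
```

Both calls reach the same <xh:em> element, but only the unescaped form lets an XML-aware tool get there without a second parsing step.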

I hope these thoughts are useful!

peace,

Stu


Re: Returning un-escaped XML literals in SPARQL 1.1 XML results

by Norman Walsh

Stu Baurmann <stub@...> writes:
> In response to the solicitation of suggestions for new features in SPARQL 1.1,
> I would like to raise this horse from the dead for further beatings:
>
> http://www.w3.org/2001/sw/DataAccess/issues#unescapedXml

I think I've made my feelings about escaped markup pretty clear:

  http://norman.walsh.name/2003/09/16/escmarkup

:-)

> I also think it is now relevant to consider the impact on XProc integration,
> as raised by Paul Tyson on 2009-03-04.   I say this without being well
> versed in XProc, but based on the assumption that un-escaped XML results
> are useful in any pipelined processing context.   I welcome clarifications
> from the more XProc-savvy.

If you've got XML and you want to pass XML through an XML pipeline,
starting with escaped XML is a damned inconvenience.

That said, XProc does have a step for unescaping markup, so it's not
fair to say that you can't deal with escaped markup (at least in XProc
pipelines).

But I'd much rather have a way of storing it unescaped, thank you very much.

> I know there is some complexity involved in embedding arbitrary
> XML into the results stream.   It might be sensible to make xml-literal
> results an optional feature (both in the sense that SPARQL implementors
> are not required to implement it, and in the sense that SPARQL
> users are not required to use it).  I would also support placing
> restrictions on the XML content that can be returned this way,
> e.g. to address some of the encoding issues addressed by Eric
> Prud'hommeaux here:
>
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2007JulSep/0163.html
>
> (Perhaps there's been some progress on c14n in recent months?)

I assume that this embedding is happening at the Infoset level (or in
some other data model abstraction), so losing the XML Declaration
isn't likely to be too problematic (XML 1.1 notwithstanding).

Yes, you have to lose the <!DOCTYPE declaration. So be it. I think
this effort should be described in terms of embedding XML content in
RDF, not in terms of embedding XML *documents* in RDF. Presumably you
can use some other triple to keep track of what its DTD was, and for
that matter what version of XML it was, if those things are important
to you.
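
As a rough illustration of "content, not documents", the following Python sketch parses a stored document and re-serializes only its root element, which drops the XML declaration and the DOCTYPE along the way. The sample document, the DTD name "note.dtd", and the function name are assumptions made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document; "note.dtd" is a made-up DTD name.
document = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note><to>Stu</to><body>Keep the <em>content</em>, drop the prolog.</body></note>"""


def content_only(doc_bytes):
    """Return the root element's markup without declaration or DOCTYPE."""
    root = ET.fromstring(doc_bytes)  # the external DTD is not fetched
    return ET.tostring(root, encoding="unicode")


content = content_only(document)
print(content)
```

Whatever the prolog said (DTD, XML version) is gone from the serialized content, so it would have to be recorded separately if it matters.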

Yes, xml:ids can collide. That's ok, the xml:id spec says they can
collide too. It means that you'll get funny results (potentially) if
you do id() queries on an XML serialization of your RDF store. But
really, you're going to get funny results anyway if you do that,
right?
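
A small Python sketch of the collision, assuming a simplified results document (no SPARQL results namespace) in which two embedded literals happen to carry the same xml:id. The id value "intro" and the element names inside the literals are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Two independent literals that both use xml:id="intro".
results = """<results>
  <result><binding name="o"><xml-literal><p xml:id="intro">first</p></xml-literal></binding></result>
  <result><binding name="o"><xml-literal><p xml:id="intro">second</p></xml-literal></binding></result>
</results>"""

root = ET.fromstring(results)

# A lookup by id over the whole serialization finds more than one element.
matches = [el.text for el in root.iter() if el.get(XML_ID) == "intro"]
print(matches)
```

The document is still well-formed and parses fine; it is only an id-based lookup across the whole serialization that becomes ambiguous.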

> Regarding XML schemas and implementation, one idea is that
> the XML literal might come wrapped in a child tag of <binding> called
> <xml-literal>, which has content type xsd:any.
> This means the overall SPARQL-results schema would not be
> weakened for any results that do not happen to include <xml-literal>.

Technically, the XML Core WG owns all element names that begin with
"x", "m", and "l" in that order, so some coordination might be
necessary (not that I expect it would be difficult).

> Example of well-formed XHTML content (we could just as well use WSDL,
> a SOAP message, etc.):
>
> <binding name="o">
>   <xml-literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">
>      <xh:p xmlns:xh="http://www.w3.org/1999/xhtml">Contents of <xh:em>important</xh:em> paragraph</xh:p>
>   </xml-literal>
> </binding>

I'm not close enough to SPARQL to have a good grasp on the relationship between
binding and xml-literal, but

  <xml-literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">

looks a little redundant. Wouldn't simply

  <literal datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral">

be sufficient?

                                        Be seeing you,
                                          norm

--
Norman Walsh <ndw@...> | Kinship is healing; we are physicians
http://nwalsh.com/            | to each other.--Oliver Sacks

