
|
Some strengths and weaknesses in Scala's XML capabilities
Today I needed to make a fairly minor transformation to some XML: data
serialized using the XStream library (task description in the footer). I
chose to try it using Scala's XML APIs, my first use of them.
I was surprised by what I discovered while tackling the task.
- XML pattern matching, which had sounded really cool when I'd read
about it, turned out to be of no use in practice. IMO it should be
de-emphasized or even deprecated.
- Much more useful are the \ and \\ operators. This seems to be
because they do what XPath does and are easy to invoke; however, they
are less feature-complete and less documented than XPath. Re: level of
docs, compare [eg
http://www.scala-lang.org/docu/files/api/scala/xml/NodeSeq.html#\(String)]
with [ http://www.w3.org/TR/xpath]
1. I suspect a nice integration between Scala XML and javax.xml.xpath
would prove very popular. "Nice" meaning (a) issuing a Java-backed XPath
query over a Scala NodeSeq is a single method call, and (b) where the
query result is a DOM NodeList, it is turned back into a Scala NodeSeq.
An easy way to do this would be with implicits, but this seemingly
requires duplicating the whole NodeSeq into a DOM tree. That would still be
welcome for non-performance-critical use cases, but it's kinda unsatisfying
if it won't scale out to industrial-grade use, as you'd have to switch
tactics mid-stream (which is against Scala's philosophy).
I just noticed Stephan Koltsov started something more elegant at
[ http://code.google.com/p/scala-xml-jaxen/] that appears to adapt the
Jaxen engine to understand Scala XML nodes. javax.xml.xpath seems to offer a
similar extension point. Sounds like the best long-term option?
Or, make the \ & \\ methods work more (or ideally, exactly) like XPath,
but that's a lot of work that duplicates working Java code.
2. I now feel that pages 524-526 of Programming in Scala (XML Pattern
Matching) don't give particularly useful guidance; IMO the sample
problems could be solved more easily using only the \ & \\ operators and
for comprehensions.
3. Scala XML literals allow <a>{ <b/> }</a> (an Elem), and <a/><a/> (a
NodeBuffer), but not <a/>{ <b/> }<a/>. While I (kind of) understand
why, the rules feel rather unintuitive and implementation-driven.
Proposal: could XML fragment literals, i.e. any NodeSeq and not just
an Elem, be supported by an enclosing delimiter, something similar to """
for strings?
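To make the \ versus \\ distinction concrete, and to illustrate point 2's claim that \, \\ and for comprehensions go a long way, here is a small sketch over made-up data. The object PathOps and the catalog document are hypothetical; only the scala.xml operators are real.

```scala
import scala.xml.{Elem, NodeSeq, XML}

object PathOps {
  // Hypothetical sample document: one book nested in a shelf, one at top level.
  val catalog: Elem = XML.loadString(
    "<catalog><shelf><book id='1'><title>A</title></book></shelf>" +
      "<book id='2'><title>B</title></book></catalog>")

  // \ only looks at the direct children of <catalog>: finds book 2 only.
  val topLevel: NodeSeq = catalog \ "book"

  // \\ searches the node and all its descendants: finds both books,
  // in document order.
  val anywhere: NodeSeq = catalog \\ "book"

  // "@id" selects an attribute; a for comprehension combines the pieces.
  val titlesById: Seq[(String, String)] =
    for (b <- anywhere) yield ((b \ "@id").text, (b \ "title").text)
}
```

Roughly, \ corresponds to an XPath child step and \\ to the descendant-or-self axis, but without predicates or upward axes.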
Apologies if these issues were all raked over in detail last week. I
did do a couple of list archive searches before posting.
-Ben
Problem description: Within an XML document, transform fragments of this form:
<vector><default>
<capacityIncrement>0</capacityIncrement>
<elementCount>0</elementCount>
<elementData id="1291">
<null/>
</elementData>
</default></vector>
...into this:
<size>0</size>
<elementData id="1291">
<null/>
</elementData>
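The footer task can be sketched with scala.xml's RewriteRule and RuleTransformer. This is a minimal sketch, not Ben's actual code: it assumes the whole <vector> element is replaced by the two nodes shown, that <size> takes its value from <elementCount>, and VectorRewrite is a hypothetical name.

```scala
import scala.xml.{Elem, Node}
import scala.xml.transform.RewriteRule

// Replaces <vector><default>...</default></vector> with
// <size>n</size> followed by the original <elementData> element(s).
object VectorRewrite extends RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case e: Elem if e.label == "vector" && (e \ "default").nonEmpty =>
      val d = e \ "default"
      // Assumption: the new size is the old elementCount.
      val size = (d \ "elementCount").text.trim
      <size>{size}</size> +: (d \ "elementData")
    case other => other // every other node is left as-is
  }
}
```

Usage: `new RuleTransformer(VectorRewrite).apply(doc)` rewrites every matching fragment anywhere below the root, where doc is the whole document (its root must not itself be a <vector>, since apply must return a single node).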
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
I agree that for a fixed-depth XML tree the XPath-like operators are more efficient. The power of pattern matching, I think, shows when nodes can be nested. Think of something like Jetty's XML format or Flex's: you aren't looking for a specific node; you have to recurse through the whole tree and handle each node for what it is.
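A small illustration of that point, using a made-up arithmetic format rather than Jetty's or Flex's: each case handles one kind of element, and recursion takes care of arbitrary nesting. ExprEval and the element names are hypothetical.

```scala
import scala.xml.{Elem, Node, Text}

object ExprEval {
  // Recursive XML pattern matching: each case handles one element kind,
  // whatever depth it appears at.
  def eval(n: Node): Int = n match {
    case <num>{ Text(v) }</num>   => v.trim.toInt
    // Whitespace text nodes between children are skipped by collecting Elems only.
    case <add>{ args @ _* }</add> => args.collect { case e: Elem => eval(e) }.sum
    case <mul>{ args @ _* }</mul> => args.collect { case e: Elem => eval(e) }.product
    case other => throw new IllegalArgumentException("unexpected node: " + other)
  }
}
```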
To write a few nodes as a literal, I believe you surround them with <xml:group>.
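A minimal sketch of the <xml:group> suggestion, assuming the compiler's XML literal support treats <xml:group> specially and builds a scala.xml.Group. It writes the fragment from point 3 that is rejected as a bare literal; Fragments is a hypothetical name.

```scala
import scala.xml.NodeSeq

object Fragments {
  // Three siblings with an embedded expression in the middle: rejected as a
  // bare literal, but accepted inside <xml:group>.
  val frag: NodeSeq = <xml:group><a/>{ <b/> }<a/></xml:group>

  // The group behaves as the sequence of its members.
  val labels: List[String] = frag.map(_.label).toList
}
```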
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
Ben Hutchison wrote:
> - XML pattern matching, which had sounded really cool when I'd read about it, turned out to be of no use in practice. IMO it should be de-emphasized or even deprecated.
>
There's been some talk of enhancing Scala's XML pattern matching recently; at the moment I'm not aware of any concrete, fully fleshed-out proposal though. http://thread.gmane.org/gmane.comp.lang.scala.xml/92/focus=103 is probably the most recent discussion of any substance on the subject.
> - Much more useful are the \ and \\ operators. This seems to be because they do what XPath does and are easy to invoke; however, they are less feature-complete and less documented than XPath. Re: level of
> docs, compare [eg http://www.scala-lang.org/docu/files/api/scala/xml/NodeSeq.html#\(String)]
> with [ http://www.w3.org/TR/xpath]
>
> 1. I suspect a nice integration between Scala XML and javax.xml.xpath
> would prove very popular. "Nice" meaning (a) issuing a Java-backed XPath
> query over a Scala NodeSeq is a single method call, and (b) where the
> query result is a DOM NodeList, it is turned back into a Scala NodeSeq.
>
> An easy way to do this would be with implicits, but this seemingly
> requires duplicating the whole NodeSeq into a DOM tree. That would still be
> welcome for non-performance-critical use cases, but it's kinda unsatisfying
> if it won't scale out to industrial-grade use, as you'd have to switch
> tactics mid-stream (which is against Scala's philosophy).
>
It would certainly be very nice to have some kind of native XPath support. It probably can't look much like the idiomatic XPath syntax, but I suspect it would be much better to try to leverage for comprehensions anyway. Also note that due to scala.xml's approach to immutability, nodes don't know their parent, so some types of expression will be significantly harder to implement than others.
And while I do think it would be quite useful to have round trip
org.w3c.dom conversion routines, I'm not sure how politically correct it
would be to couple against the Java DOM APIs in the Scala standard
library. I'd suggest filing an enhancement request anyway. :)
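Ben's (a)/(b) integration can be sketched with only the JDK's javax.xml.xpath plus scala.xml, using the full-copy approach (serialize, reparse as DOM, query, convert each hit back). This is a hedged sketch outside any standard library; DomXPath and select are hypothetical names, and only element results are handled.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.{OutputKeys, TransformerFactory}
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.{Document, NodeList, Node => DomNode}
import org.xml.sax.InputSource
import scala.xml.{Elem, NodeSeq, XML}

object DomXPath {
  // Copy the whole Scala tree into a fresh DOM document (the costly part).
  private def toDom(root: Elem): Document =
    DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new InputSource(new StringReader(root.toString)))

  // Serialize one DOM element and reparse it as a scala.xml Elem.
  private def toScala(n: DomNode): Elem = {
    val t = TransformerFactory.newInstance().newTransformer()
    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")
    val out = new StringWriter
    t.transform(new DOMSource(n), new StreamResult(out))
    XML.loadString(out.toString)
  }

  // Full XPath 1.0 via the JDK; text or attribute hits are dropped here.
  def select(root: Elem, expr: String): NodeSeq = {
    val hits = XPathFactory.newInstance().newXPath()
      .evaluate(expr, toDom(root), XPathConstants.NODESET)
      .asInstanceOf[NodeList]
    val elems = (0 until hits.getLength)
      .map(i => hits.item(i))
      .filter(n => n.getNodeType == DomNode.ELEMENT_NODE)
      .map(n => toScala(n))
    NodeSeq.fromSeq(elems)
  }
}
```

The results are copies, not the original scala.xml nodes, which is exactly the "switch tactics mid-stream" cost Ben describes.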
It would probably be nice to have connections between scala.xml and
existing XPath APIs, but the same concern about their suitability in the
Scala standard library applies. Maybe something like sbaz or scalax
(unfortunately neither seems particularly lively at the moment) is the
natural home for them?
> Or, make the \ & \\ methods work more (or ideally, exactly) like XPath,
> but that's a lot of work that duplicates working Java code.
>
I think at this point we need some very concrete suggestions to
stimulate further suggestion. Ordinarily I'm very generous with ideas
about how Somebody Really Oughta do things but I'm a bit pressed for
time these days. :/
> 3. Scala XML literals allow <a>{ <b/> }</a> (an Elem), and <a/><a/> (a NodeBuffer), but not <a/>{ <b/> }<a/>. While I (kind of) understand why, the rules feel rather unintuitive and implementation-driven.
> Proposal: could XML fragment literals, i.e. any NodeSeq and not just an Elem, be supported by an enclosing delimiter, something similar to """ for strings?
>
If you scroll down a bit at
http://burak.emir.googlepages.com/scalaxbook.docbk.html#id2893345 you'll
see <xml:group/> for precisely this purpose. :)
Thanks for your feedback!
-0xe1a
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
Like XML itself, XPath is a standard. An implementation that lets programmers from other languages use their existing knowledge without modification is better than one that's idiomatically Scalaish. Something like this:
for (n <- library \ "//book[author/@id=234]") ...
But I said the same thing about native regular expressions, and I'm not likely to get those either. :)
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
So maybe someone wants to implement a subset of xpath using parsers as a start? :) It could use an implicit to supply a new operator to NodeSeq.
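Taking up the suggestion, here is a deliberately tiny subset as a sketch: no parser combinators, just splitting on "/", supporting only child steps, "@attribute" steps and an optional leading "//". MiniPath, MiniPathOps and select are hypothetical names; predicates such as [author/@id=234] are not supported.

```scala
import scala.xml.NodeSeq

object MiniPath {
  // Implicit enrichment adding a hypothetical select operation to NodeSeq.
  implicit class MiniPathOps(val ns: NodeSeq) extends AnyVal {
    def select(path: String): NodeSeq = {
      val deep = path.startsWith("//")
      val steps = path.stripPrefix("//").split('/').toList
      // Leading "//": first step may match at any depth; otherwise children only.
      val first = if (deep) ns \\ steps.head else ns \ steps.head
      steps.tail.foldLeft(first) { (acc, step) =>
        // Apply each later step node by node, so "@attr" steps do not depend
        // on how \ treats attributes on a multi-node NodeSeq.
        NodeSeq.fromSeq(acc.flatMap(n => n \ step))
      }
    }
  }
}
```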
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
On Mon, Jun 29, 2009 at 12:15 PM, Marcus Downing <marcus@...> wrote:
>
> Like XML itself, XPath is a standard. An implementation that lets programmers
> from other languages use their existing knowledge without modification is
> better than one that's idiomatically Scalaish. Something like this:
>
> for (n <- library \ "//book[author/@id=234]") ...
I agree that supporting the standard is good, reuse of knowledge etc.
But, having done some more reading, I suspect Scala has made a
conscious, defensible choice to offer a different API.
AFAIK, the Scala XML representation only has links from each node to its
children, like a singly-linked list. So there are no parent refs, and you
cannot go back up the tree. I see now that this means supporting full
XPath was probably not practical without converting to a DOM representation.
Should Scala offer an XPath query method on Node if only parts of the
standard are supported, and/or it sometimes incurs severe performance
costs?
Maybe it should, but make the options clear: a high-performance default
implementation with only the "XPath-like" \ & \\ operators, OR a full XPath
query that forces the use of a more memory-hungry and less functional data
representation?
-Ben
PS: thanks, Naftoli and Alex, for the pointers to <xml:group>
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
I believe that using a zipper pattern [1] would allow most, though not all, XPath expressions to be handled. It would have some conditions for successful use, though.
The problem with using XPath for scala.xml is that an XPath rule is allowed to refer to parent nodes, while scala.xml's nodes don't contain links to their parents. What's needed is a way to access the known parents without compromising scala.xml's immutable nodes.
My functional-fu is weak, but here's my limited understanding. The zipper is a pattern for tree traversal where the current node is represented by a temporary object containing the node (and therefore its children), its siblings (a pair of lists, one for previous/already traversed, another for next/not yet traversed), and the path leading to this node from the root or start point.
From this point, you can step in any direction: up, down, left or right. Each step produces another zipper, but only levels down create a larger object.
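import scala.xml.Node

// The focus node, its left siblings (nearest first), its right siblings,
// and the zipper for the parent (None at the node the walk started from).
case class NodeZipper(node: Node, prev: List[Node], next: List[Node], parent: Option[NodeZipper]) {
  // Step right: the current node becomes the nearest left sibling.
  def nextNode: Option[NodeZipper] =
    next match {
      case n :: tail => Some(NodeZipper(n, node :: prev, tail, parent))
      case _ => None
    }
  // Step left: the current node becomes the nearest right sibling.
  def prevNode: Option[NodeZipper] =
    prev match {
      case n :: tail => Some(NodeZipper(n, tail, node :: next, parent))
      case _ => None
    }
}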
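The NodeZipper above only moves sideways. As a hedged sketch, here is the same idea with the other two moves (down to the first child, up to the stored parent) and an ancestor walk. XmlZipper, start, firstChild, up, left, right and ancestors are hypothetical names, and this zipper is read-only: nothing is rebuilt on the way up.

```scala
import scala.xml.Node

final case class XmlZipper(node: Node, prev: List[Node], next: List[Node],
                           parent: Option[XmlZipper]) {
  // Down: focus on the first child, remembering this zipper as the parent.
  def firstChild: Option[XmlZipper] = node.child.toList match {
    case c :: rest => Some(XmlZipper(c, Nil, rest, Some(this)))
    case Nil       => None
  }
  // Up: nodes are never edited here, so the stored parent is still valid.
  def up: Option[XmlZipper] = parent
  def right: Option[XmlZipper] = next match {
    case n :: tail => Some(XmlZipper(n, node :: prev, tail, parent))
    case Nil       => None
  }
  def left: Option[XmlZipper] = prev match {
    case n :: tail => Some(XmlZipper(n, tail, node :: next, parent))
    case Nil       => None
  }
  // Ancestor axis, nearest first. It stops at the node the walk started
  // from, which is the "start node acts as root" limitation in the text.
  def ancestors: List[Node] = parent match {
    case Some(p) => p.node :: p.ancestors
    case None    => Nil
  }
}

object XmlZipper {
  def start(n: Node): XmlZipper = XmlZipper(n, Nil, Nil, None)
}
```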
With an engine that used something like this to traverse the tree, you could successfully implement an XPath query that contained rules referring to parent nodes, provided they didn't step above the node from which the query was run. In other words, the starting node is assumed to be the root of the document, even when it isn't.
That means queries that should be equivalent can give different results depending on which node they are launched from.
Implementing a more thorough version of XPath, in which queries can correctly traverse the whole tree including its root, would require a more drastic change - one which I'm *not* seriously suggesting. When an XML document is loaded, its nodes are created immutably as at present; however, when that document is queried, the nodes returned to the user are in fact zippers, wrapping the nodes and providing information about where in the document each node belongs. This keeps a distinction between the data themselves and the zippers that client code works with. These zippers would have an implicit conversion to Node, allowing them to behave properly if you attempted to copy a node into another tree. To make them sound more Scalaish, you could call them something like RichNode. But like I said, I'm *not* seriously suggesting that.
In short, there are ways of implementing a reasonable approximation to XPath, even with immutable Nodes.
[1] http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
At any rate, I'd like to mention that you can use "filter", "map" and "flatMap" on the results of "\" and "\\", which makes it possible to query the XML, if not in the XPath-standard way, then in the Scala-standard way:
for (n <- library \\ "book" if !((n \ "author") filter (a => (a \ "@id").text == "234")).isEmpty) ...
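A runnable version of that query, as a sketch: it selects the books that XPath would express as //book[author/@id=234]. The comparison reads the attribute with (a \ "@id").text, because attribute("id") returns an Option[Seq[Node]], which never equals a String. BookQuery and booksByAuthor are hypothetical names.

```scala
import scala.xml.NodeSeq

object BookQuery {
  // Books (at any depth) with at least one <author> whose id attribute matches.
  def booksByAuthor(library: NodeSeq, id: String): NodeSeq =
    NodeSeq.fromSeq(for {
      book <- library \\ "book"
      if (book \ "author").exists(a => (a \ "@id").text == id)
    } yield book)
}
```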
-- Daniel C. Sobral Something I learned in academia: there are three kinds of academic reviews: review by name, review by reference and review by value.
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
May I suggest that we move this discussion to -xml? (cc'd)...
Marcus Downing wrote:
> I believe that using a zipper pattern [1] would allow not all, but most
> xpaths to be handled. It would have some conditions for successful use
> though.
>
> The problem with using xpath for scala.xml is that an xpath rule is allowed
> to refer to parent nodes, while scala.xml's nodes don't contain links to
> their parents. What's needed is a way to access the known parents, without
> compromising scala.xml's immutable nodes.
>
Indeed, I stumbled on http://www.kmonos.net/pub/Slit/index.en.html last
year and thought it might be interesting for an XML navigation idea I
was daydreaming about.
> To implement a more thorough version of xpath, in which queries can
> correctly traverse the whole tree including its root, would require a more
> drastic change - one which I'm *not* seriously suggesting. When an XML
> document is loaded, its nodes are made immutably as at present; however,
> when that document is queried, the nodes returned to the user are in fact
> zippers, wrapping around the nodes and providing information about where in
> the document the node belongs. This distinction is maintained between the
> data themselves, and the zippers that client code executes. These zippers
> would have an implicit conversion to Node, allowing them to behave properly
> if you attempted to copy a node into another tree. To make them sound more
> Scalaish, you could call them something like RichNode. But like I said, I'm
> *not* seriously suggesting that.
>
Most XPath libraries require the user to "compile" an expression down to
an AST before it can be used--interrogating the AST would let you know
fairly quickly whether the expression uses problematic axes.
I think the following axes are problematic to some degree with the
current scala.xml node classes:
ancestor, ancestor-or-self, following, following-sibling, parent,
preceding, preceding-sibling
The following should be uncomplicated:
attribute, child, descendant, descendant-or-self, namespace, self
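The AST check could look something like this, assuming the expression has already been parsed into a list of (axis, node test) steps, with abbreviations such as ".." expanded to parent::node(). AxisCheck and Step are hypothetical and not any XPath library's API.

```scala
object AxisCheck {
  // Axes that need upward or sideways navigation, i.e. a parent pointer.
  val problematic: Set[String] = Set(
    "ancestor", "ancestor-or-self", "following", "following-sibling",
    "parent", "preceding", "preceding-sibling")

  // Axes that only look at the node itself or downward.
  val uncomplicated: Set[String] = Set(
    "attribute", "child", "descendant", "descendant-or-self", "namespace", "self")

  // One location step of an already-parsed path, e.g. child::book.
  final case class Step(axis: String, nodeTest: String)

  // True if any step would need links to parent nodes.
  def needsParentLinks(path: List[Step]): Boolean =
    path.exists(s => problematic.contains(s.axis))
}
```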
To get around the inability to navigate to the root of a document, we
could just make the user remember and supply the root element in any
XPath API that uses an absolute expression... Or we could play games
with hidden mutation... Uncontended synchronization for objects that
don't escape their creating thread is getting very cheap. ;)
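The first workaround (the caller supplies the root) amounts to a downward search. A sketch with a hypothetical parentOf helper: it uses reference equality (eq), since structurally equal nodes can occur in several places, and each lookup walks the tree, so it costs O(n) per call, which is the price of having no parent pointers.

```scala
import scala.xml.Node

object ParentLookup {
  // Find the parent of `target` by searching down from `root`.
  def parentOf(root: Node, target: Node): Option[Node] =
    if (root.child.exists(_ eq target)) Some(root)
    else root.child.iterator.map(c => parentOf(c, target)).collectFirst { case Some(p) => p }
}
```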
-0xe1a
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
That's exactly the sort of thing I was talking about (but didn't know enough to come up with myself).
Arrgh wrote:
Or we could play games
with hidden mutation... Uncontended synchronization for objects that
don't escape their creating thread is getting very cheap. ;)
I'm against that approach. The immutability of scala.xml is one of its selling points compared to other XML libraries I've used: somehow or other I've always found a way to confuse them and produce errors that scala.xml should be immune to.
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
Marcus, +1 zipper-based approach. To do zippers more generically, however, we need a good implementation of data structure differentiation. I started to look at this, but then Martin published the new collections proposal and I wanted to wait until it stabilized. Maybe a good evaluation of Martin's proposal is to see how it supports an implementation of differentiation and zipper.
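For readers unfamiliar with the term: "differentiating" a data type, in the sense used in the functional-programming literature on zippers, yields its type of one-hole contexts. A worked example, treating types as formal power series. For lists, the context is a pair of lists, which corresponds roughly to NodeZipper's prev/next fields; for a rose tree (a label plus a list of children), a full zipper adds the list of steps back up, which is roughly what its parent field stores.

```latex
% A list of a's is empty (1) or an a followed by a list
L(a) = 1 + a\,L(a) \;\Longrightarrow\; L(a) = \frac{1}{1-a}

% Differentiating with respect to a gives the one-hole contexts of a list
\frac{\partial L}{\partial a} = \frac{1}{(1-a)^{2}} = L(a) \times L(a)

% For a rose tree T = a \times L(T), one step of context (by the chain rule)
\frac{\partial}{\partial T}\bigl(a \times L(T)\bigr) = a \times L(T) \times L(T)

% A zipper: the focused subtree plus the list of steps up to the root
Z = T \times L\bigl(a \times L(T) \times L(T)\bigr)
```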
Best wishes, --greg On Sun, Jun 28, 2009 at 9:14 PM, Marcus Downing <marcus@...> wrote:
I believe that using a zipper pattern [1] would allow not all, but most
xpaths to be handled. It would have some conditions for successful use
though.
The problem with using xpath for scala.xml is that an xpath rule is allowed
to refer to parent nodes, while scala.xml's nodes don't contain links to
their parents. What's needed is a way to access the known parents, without
compromising scala.xml's immutable nodes.
My functional-fu is weak, but here's my limited understanding. The zipper is
a pattern for tree traversal where the current node is represented by a
temporary object containing the node (and therefore its children), its
siblings (a pair of lists, one for previous/already traversed, another for
next/not yet traversed), and the path leading to this node from the root or
start point.
From this point, you can step in any direction: up, down, left or right.
Each step produces another zipper, but only levels down create a larger
object.
case class NodeZipper (node: Node, prev: List[Node], next: List[Node],
parent: Option[NodeZipper]) {
def nextNode: Option[NodeZipper] =
next match {
case n :: tail => Some(NodeZipper(n, node :: prev, tail, parent))
case _ => None
}
def prevNode: Option[NodeZipper] =
prev match {
case n :: tail => Some(NodeZipper(n, tail, node :: next, parent))
case _ => None
}
}
With an engine that used something like this to traverse the tree, you could
successfully implement an xpath query that contained rules referring to
parent nodes, provided they didn't step above the node from which the query
was run. In other words, the starting node is assumed to be the root of the
document, even when it isn't.
That means there's a difference between what should be equivalent queries
depending on which node they were launched on.
To implement a more thorough version of xpath, in which queries can
correctly traverse the whole tree including its root, would require a more
drastic change - one which I'm *not* seriously suggesting. When an XML
document is loaded, its nodes are made immutably as at present; however,
when that document is queried, the nodes returned to the user are in fact
zippers, wrapping around the nodes and providing information about where in
the document the node belongs. This distinction is maintained between the
data themselves, and the zippers that client code executes. These zippers
would have an implicit conversion to Node, allowing them to behave properly
if you attempted to copy a node into another tree. To make them sound more
Scalaish, you could call them something like RichNode. But like I said, I'm
*not* seriously suggesting that.
In short, there are ways of implementing a reasonable approximation to
xpath, even with immutable Nodes.
[1]
http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf
Ben Hutchison wrote:
>
> On Mon, Jun 29, 2009 at 12:15 PM, Marcus Downing< marcus@...>
> wrote:
>>
>> Like XML itself, XPath is a standard. An implementation that lets
>> programmers
>> from other languages use their existing knowledge without modification is
>> better than one that's idiomatically Scalaish. Something like this:
>>
>> for (n <- library \ "//book[author/@id=234]") ...
>
> I agree that supporting the standard is good, reuse of knowledge etc.
>
> But, having done some more reading, I suspect Scala has made a
> conscious, defensible choice to offer a different API.
>
> AFAIK, the Scala XML rep is a singly-linked list. So there are no parent
> refs, and you cannot go backwards through the tree. I see now that this
> means supporting full XPath was probably not practical without converting
> to a DOM rep.
>
> Should Scala offer an XPath query method on Node if it supports only parts
> of the standard and/or sometimes incurs severe performance costs?
>
> Maybe it should, but make the options clear: a high-performance default
> implementation with only the "XPath-like" \ and \\ operators, OR a full
> XPath query that forces use of a more memory-hungry and less functional
> data rep?
>
> -Ben
>
> PS: thanks, Naftoli and Alex, for the pointers to <xml:group>
>
>
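Ben's second option, full XPath at the cost of a DOM copy, can be sketched with nothing beyond the JDK's `javax.xml.parsers`, `javax.xml.transform` and `javax.xml.xpath` APIs plus the scala-xml module (assumed to be on the classpath). The approach here is an assumption for illustration: serialize the scala.xml tree, parse it into a DOM, evaluate the expression, then serialize each match and re-parse it as scala.xml. It assumes the expression selects element nodes (attribute or text results would not re-parse as XML). `XPathViaDom`, `query` and `XPathDemo` are hypothetical names.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.{OutputKeys, TransformerFactory}
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.{Document, NodeList}
import org.xml.sax.InputSource
import scala.xml.{Node, XML}

// Hypothetical helper: full XPath 1.0 over scala.xml by way of a DOM copy.
object XPathViaDom {
  // Copy the immutable scala.xml tree into a mutable DOM: this is the extra
  // memory (and the loss of structural sharing) that Ben is worried about.
  private def toDom(n: Node): Document = {
    val factory = DocumentBuilderFactory.newInstance()
    factory.setNamespaceAware(true)
    factory.newDocumentBuilder().parse(new InputSource(new StringReader(n.toString)))
  }

  // Serialize one DOM node and re-parse it as scala.xml.
  // Assumes the node is an element; attribute or text results would not parse.
  private def fromDom(d: org.w3c.dom.Node): Node = {
    val transformer = TransformerFactory.newInstance().newTransformer()
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes")
    val out = new StringWriter()
    transformer.transform(new DOMSource(d), new StreamResult(out))
    XML.loadString(out.toString)
  }

  // Evaluate an XPath expression expected to select a node-set of elements.
  def query(n: Node, expr: String): List[Node] = {
    val xpath = XPathFactory.newInstance().newXPath()
    val hits = xpath.evaluate(expr, toDom(n), XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until hits.getLength).map(i => fromDom(hits.item(i))).toList
  }
}

object XPathDemo {
  def main(args: Array[String]): Unit = {
    val library = XML.loadString(
      """<library><book title="A"><author id="234"/></book><book title="B"><author id="7"/></book></library>""")

    // Full XPath, predicate included, evaluated on the DOM copy.
    val viaXPath = XPathViaDom.query(library, "//book[author/@id=234]")

    // The same selection with the built-in \\ and \ operators: no DOM copy,
    // but the predicate has to be written as ordinary Scala code.
    val viaOperators = (library \\ "book")
      .filter(b => (b \ "author").exists(a => (a \ "@id").text == "234"))

    println(viaXPath.map(b => (b \ "@title").text))              // List(A)
    println(viaOperators.map(b => (b \ "@title").text).toList)   // List(A)
  }
}
```

This sketch rebuilds the DOM on every call; a real integration would presumably cache it, but it would still duplicate the whole tree, which is the trade-off Ben describes.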
--
View this message in context: http://www.nabble.com/Some-strengths-and-weaknesses-in-Scala%27s-XML-capabilities-tp24242655p24248167.html
Sent from the Scala - Debate mailing list archive at Nabble.com.
-- L.G. Meredith Managing Partner Biosimilarity LLC 1219 NW 83rd St Seattle, WA 98117 +1 206.650.3740 http://biosimilarity.blogspot.com
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
What do you mean by "data structure differentiation"? Any papers or references you could provide?
--j
On Tue, Jun 30, 2009 at 11:41 AM, Meredith Gregory <lgreg.meredith@...> wrote:
Marcus,
+1 zipper-based approach.
To do zipper more generically, however, we need a good implementation of data structure differentiation. i started to look at this, but then Martin published the new collections proposal and i wanted to wait until this stabilized. Maybe a good evaluation of Martin's proposal is to see how it supports an implementation of differentiation and zipper.
Best wishes,
--greg
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
Jorge,
Here are some links.
Best wishes,
--greg
On Thu, Jul 2, 2009 at 4:32 PM, Jorge Ortiz <jorge.ortiz@...> wrote:
What do you mean by "data structure differentiation"? Any papers or references you could provide?
--j
|

|
Re: Some strengths and weaknesses in Scala's XML capabilities
Jorge,
It's pretty cool stuff, eh? When I was working on BigTop/Highwire (an operating system/programming language based on the π-calculus), one of my central motivations was a characterization of data. All data is a form of program, but is there a clear dividing line that says exactly which programs are data? Is there a typing discipline that can pick out exactly which programs we might want to call data? The species-of-structures work is the beginning of a very interesting proposal for what constitutes such a dividing line, and it seems to come from such an unexpected angle: analysis.
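One standard way to make "differentiating a data structure" concrete is Conor McBride's observation that the derivative of a regular type, taken with respect to its element type, is the type of one-hole contexts in that structure. A sketch of the calculation, assuming lists L and rose trees T over a label type X (a rough model of an XML element and its children), treating types as formal power series and using the product and chain rules:

```latex
\begin{aligned}
L(X) &= 1 + X\,L(X) \quad\Longrightarrow\quad L(X) = \frac{1}{1 - X} \\
\partial_X L(X) &= \frac{1}{(1 - X)^2} = L(X)^2 \\[4pt]
T(X) &= X \cdot L\big(T(X)\big) \\
\partial_X T &= L(T) + X \cdot L(T)^2 \cdot \partial_X T \\
\partial_X T &= \frac{L(T)}{1 - X\,L(T)^2} = L(T) \cdot L\big(X \cdot L(T)^2\big)
\end{aligned}
```

Read back as data: a one-hole context in a list is a pair of lists (the elements before and after the hole), which is exactly the prev/next pair in Marcus's NodeZipper. For a tree, the context is the children of the node at the hole, L(T), together with a path L(X·L(T)²) whose steps each record an ancestor's label and the siblings to the left and right of the branch taken, which corresponds to the chain of parent zippers.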
Best wishes,
--greg
On Thu, Jul 2, 2009 at 7:20 PM, Meredith Gregory <lgreg.meredith@...> wrote:
Jorge,
Here are some links.
|