« Return to Thread: Some strengths and weaknesses in Scala's XML capabilities

Re: Some strengths and weaknesses in Scala's XML capabilities

by Meredith Gregory :: Rate this Message:

Reply to Author | View in Thread

Marcus,

+1 zipper-based approach.

To do zipper more generically, however, we need a good implementation of data structure differentiation. i started to look at this, but then Martin published the new collections proposal and i wanted to wait until this stabilized. Maybe a good evaluation of Martin's proposal is to see how it supports an implementation of differentiation and zipper.

Best wishes,

--greg

On Sun, Jun 28, 2009 at 9:14 PM, Marcus Downing <marcus@...> wrote:

I believe that using a zipper pattern [1] would allow not all, but most
xpaths to be handled. It would have some conditions for successful use
though.

The problem with using xpath for scala.xml is that an xpath rule is allowed
to refer to parent nodes, while scala.xml's nodes don't contain links to
their parents. What's needed is a way to access the known parents, without
compromising scala.xml's immutable nodes.

My functional-fu is weak, but here's my limited understanding. The zipper is
a pattern for tree traversal where the current node is represented by a
temporary object containing the node (and therefore its children), its
siblings (a pair of lists, one for previous/already traversed, another for
next/not yet traversed), and the path leading to this node from the root or
start point.

From this point, you can step in any direction: up, down, left or right.
Each step produces another zipper, but only levels down create a larger
object.

 case class NodeZipper (node: Node, prev: List[Node], next: List[Node],
parent: Option[NodeZipper]) {
   def nextNode: Option[NodeZipper] =
     next match {
       case n :: tail => Some(NodeZipper(n, node :: prev, tail, parent))
       case _ => None
     }

   def prevNode: Option[NodeZipper] =
     prev match {
       case n :: tail => Some(NodeZipper(n, tail, node :: next, parent))
       case _ => None
     }
 }

With an engine that used something like this to traverse the tree, you could
successfully implement an xpath query that contained rules referring to
parent nodes, provided they didn't step above the node from which the query
was run. In other words, the starting node is assumed to be the root of the
document, even when it isn't.

That means there's a difference between what should be equivalent queries
depending on which node they were launched on.

To implement a more thorough version of xpath, in which queries can
correctly traverse the whole tree including its root, would require a more
drastic change - one which I'm *not* seriously suggesting. When an XML
document is loaded, its nodes are made immutably as at present; however,
when that document is queried, the nodes returned to the user are in fact
zippers, wrapping around the nodes and providing information about where in
the document the node belongs. This distinction is maintained between the
data themselves, and the zippers that client code executes. These zippers
would have an implicit conversion to Node, allowing them to behave properly
if you attempted to copy a node into another tree. To make them sound more
Scalaish, you could call them something like RichNode. But like I said, I'm
*not* seriously suggesting that.

In short, there are ways of implementing a reasonable approximation to
xpath, even with immutable Nodes.



[1]
http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf



Ben Hutchison wrote:
>
> On Mon, Jun 29, 2009 at 12:15 PM, Marcus Downing<marcus@...>
> wrote:
>>
>> Like XML itself, XPath is a standard. An implementation that lets
>> programmers
>> from other languages use their existing knowledge without modification is
>> better than one that's idiomatically Scalaish. Something like this:
>>
>>  for (n <- library \ "//book[author/@id=234]") ...
>
> I agree that supporting the standard is good, reuse of knowledge etc.
>
> But, having done some more reading, I suspect Scala has made a
> conscious, defensible choice to offer a different API.
>
> AFAIK, the Scala XML rep is a singly-linked list. So, no parent refs
> and cannot go backwards through tree. I see now that this means
> supporting full XPath was probably not practical without converting to
> a DOM rep.
>
> Should Scala offer a XPath query method on Node, if only parts of the
> standard are supported, and/or sometimes incur severe performance
> costs?
>
> Maybe it should, but make the options clear: High performance default
> impl with only "XPath-like" \ & \\ operators, OR, full XPath query
> which will force use of more memory hungry & less functional data rep?
>
> -Ben
>
> PS thanks Naftoli, Alex for pointers to <xml:group>
>
>

--
View this message in context: http://www.nabble.com/Some-strengths-and-weaknesses-in-Scala%27s-XML-capabilities-tp24242655p24248167.html
Sent from the Scala - Debate mailing list archive at Nabble.com.




--
L.G. Meredith
Managing Partner
Biosimilarity LLC
1219 NW 83rd St
Seattle, WA 98117

+1 206.650.3740

http://biosimilarity.blogspot.com

 « Return to Thread: Some strengths and weaknesses in Scala's XML capabilities