« Return to Thread: Some strengths and weaknesses in Scala's XML capabilities

Re: Some strengths and weaknesses in Scala's XML capabilities

by Marcus Downing :: Rate this Message:

Reply to Author | View in Thread

I believe that using a zipper pattern [1] would allow not all, but most xpaths to be handled. It would have some conditions for successful use though.

The problem with using xpath for scala.xml is that an xpath rule is allowed to refer to parent nodes, while scala.xml's nodes don't contain links to their parents. What's needed is a way to access the known parents, without compromising scala.xml's immutable nodes.

My functional-fu is weak, but here's my limited understanding. The zipper is a pattern for tree traversal where the current node is represented by a temporary object containing the node (and therefore its children), its siblings (a pair of lists, one for previous/already traversed, another for next/not yet traversed), and the path leading to this node from the root or start point.

From this point, you can step in any direction: up, down, left or right. Each step produces another zipper, but only levels down create a larger object.

  case class NodeZipper (node: Node, prev: List[Node], next: List[Node], parent: Option[NodeZipper]) {
    def nextNode: Option[NodeZipper] =
      next match {
        case n :: tail => Some(NodeZipper(n, node :: prev, tail, parent))
        case _ => None
      }

    def prevNode: Option[NodeZipper] =
      prev match {
        case n :: tail => Some(NodeZipper(n, tail, node :: next, parent))
        case _ => None
      }
  }

With an engine that used something like this to traverse the tree, you could successfully implement an xpath query that contained rules referring to parent nodes, provided they didn't step above the node from which the query was run. In other words, the starting node is assumed to be the root of the document, even when it isn't.

That means there's a difference between what should be equivalent queries depending on which node they were launched on.

To implement a more thorough version of xpath, in which queries can correctly traverse the whole tree including its root, would require a more drastic change - one which I'm *not* seriously suggesting. When an XML document is loaded, its nodes are made immutably as at present; however, when that document is queried, the nodes returned to the user are in fact zippers, wrapping around the nodes and providing information about where in the document the node belongs. This distinction is maintained between the data themselves, and the zippers that client code executes. These zippers would have an implicit conversion to Node, allowing them to behave properly if you attempted to copy a node into another tree. To make them sound more Scalaish, you could call them something like RichNode. But like I said, I'm *not* seriously suggesting that.

In short, there are ways of implementing a reasonable approximation to xpath, even with immutable Nodes.



[1]  http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf


Ben Hutchison wrote:
On Mon, Jun 29, 2009 at 12:15 PM, Marcus Downing<marcus@minotaur.it> wrote:
>
> Like XML itself, XPath is a standard. An implementation that lets programmers
> from other languages use their existing knowledge without modification is
> better than one that's idiomatically Scalaish. Something like this:
>
>  for (n <- library \ "//book[author/@id=234]") ...

I agree that supporting the standard is good, reuse of knowledge etc.

But, having done some more reading, I suspect Scala has made a
conscious, defensible choice to offer a different API.

AFAIK, the Scala XML rep is a singly-linked list. So, no parent refs
and cannot go backwards through tree. I see now that this means
supporting full XPath was probably not practical without converting to
a DOM rep.

Should Scala offer a XPath query method on Node, if only parts of the
standard are supported, and/or sometimes incur severe performance
costs?

Maybe it should, but make the options clear: High performance default
impl with only "XPath-like" \ & \\ operators, OR, full XPath query
which will force use of more memory hungry & less functional data rep?

-Ben

PS thanks Naftoli, Alex for pointers to <xml:group>

 « Return to Thread: Some strengths and weaknesses in Scala's XML capabilities