anything about it. Saxon uses the underlying regex engine in Java or .NET,
and these seems to perform reasonably well most of the time. I could raise
> -----Original Message-----
> From:
saxon-help-bounces@...
> [mailto:
saxon-help-bounces@...] On Behalf
> Of Abel Braaksma
> Sent: 23 January 2007 16:19
> To: Mailing list for SAXON XSLT queries
> Subject: [saxon] Certain non-zero-length non-matching regexes
> run forever onSaxon
>
> Hi Saxon users,
>
> I don't usually do any cross-posts, but this time I think I
> have to, my apologies beforehand. I abridged the text a bit,
> sorry if it is still rather wieldy. Original subject line:
> "Backtracking and eternal loops caused by regular expressions
> matching: what to expect from implementations?"
>
> [original, altered message]
>
> I happen to run into an XSLT 2 behavior that I am uncertain
> of if it is desired or expected behavior of implementations,
> if it is undesired, but still expected, or if it should be prohibited.
>
> Now, suppose this smallest possible match is zero length. In
> this situation, with classic regex parsers, the engine will
> try forever on a non-matching string (or at least very long).
> In XSLT, this behavior seems to happen even when the
> "smallest possible match" is of non-zero length. Note that
> this applies to subexpressions only, because zero-length
> matching regex (in its entirety) is not allowed in XSLT.
>
> In XSLT this is shown as follows, where I use the example of
> matching a CSV quoted string which may escape the quote by
> doubling it:
> Good csv string examples:
> "well quoted csv string"
> "well ""quoted"" csv string
>
> Bad csv string example:
> not well "quoted csv string
>
> The regular expression to match a quoted CSV string is
> (include quotes):
> "([^"]+|"")*"
>
> This gives trouble in the following XSLT (showing a
> non-matching string), which runs for a very long time
> (exponential performance chart)
>
> <xsl:variable name="ltr">
> before " after and some more text after </xsl:variable>
> <xsl:variable name="regex">"([^"]+|"")*"</xsl:variable>
>
> <xsl:analyze-string select="$ltr" regex="{$regex}">
> <xsl:matching-substring>
> <found><xsl:value-of select="." /></found>
> </xsl:matching-substring>
> <xsl:non-matching-substring>
> <not-found><xsl:value-of select="." /></not-found>
> </xsl:non-matching-substring>
> </xsl:analyze-string>
>
>
> The above example ran for > 140 minutes and still runs (it
> does not give any problems with matching strings, but does
> give problems with strings that have a large non-matching
> part with a unequal quote count in it). A simpler regex (but
> with less practicability) that shows the same behavior
> (quotes are again part of the regex): "([^"]+)*"
>
> I am under the impression that such behavior is not
> desirable, but I am unsure if there is anything in the specs
> that says something about how implementations should/must
> deal with this. As a comparison, I tried the example with
> Perl, which gave no noticeable performance troubles. I am
> aware of the fact that Saxon relies havily on Sun's Java
> regex engine, but I also know that it does a translation beforehand.
>
> Note that repeating a possible-empty match is dangerous and
> should be avoided. But in the example above, I use a
> subexpression that never matches an empty string (the +
> quantifier means 1 or more) and as such should not happen to
> fall into the eternal-loop problem.
>
> I use Saxon 8.8 with Java 1.5. I did not yet try 8.8.0.4
>
> Note that it may or may not be possible to optimize the regex
> in such a way that it fails earlier on the given string. But
> my problem is with the (imo) unpredictability of the
> performance (or even lockup) with any less-then-very-trivial regex.
>
>
> Cheers,
> -- Abel Braaksma
>
http://www.nuntia.nl>
> --------------------------------------------------------------
> -----------
> Take Surveys. Earn Cash. Influence the Future of IT Join
> SourceForge.net's Techsay panel and you'll get the chance to
> share your opinions on IT & business topics through brief
> surveys - and earn cash
>
http://www.techsay.com/default.php?page=join.php&p=sourceforge> &CID=DEVDEV
> _______________________________________________
> saxon-help mailing list
>
saxon-help@...
>
https://lists.sourceforge.net/lists/listinfo/saxon-helpTake Surveys. Earn Cash. Influence the Future of IT