Order of definitions in source-highlight 2.10

I just upgraded source-highlight to 2.10 and I am noticing some strange
behavior.

Suppose we have the file foo.lang:

    symbol = "/"
    comment start "//"

And the file test.foo:

    // foo

The language definition is taken from the source-highlight manual,
section 7.4: "Order of definitions". Note that the definitions are in
the wrong order, according to the manual: "The first expression will
always be matched first, and the second expression will never be
matched." And yet:

    $ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
    <!-- Generator: GNU source-highlight 2.10
    by Lorenzo Bettini
    http://www.lorenzobettini.it
    http://www.gnu.org/software/src-highlite -->
    <pre><tt><span class="comment">// foo</span>
    </tt></pre>

This was different with version 2.9:

    $ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
    <!-- Generator: GNU source-highlight 2.9
    by Lorenzo Bettini
    http://www.lorenzobettini.it
    http://www.gnu.org/software/src-highlite -->
    <pre><tt><span class="symbol">//</span><span class="normal"> foo</span>
    </tt></pre>

What has changed between version 2.9 and 2.10?

_______________________________________________
Help-source-highlight mailing list
Help-source-highlight@...
http://lists.gnu.org/mailman/listinfo/help-source-highlight
Re: Order of definitions in source-highlight 2.10

gnombat@... wrote:
> I just upgraded source-highlight to 2.10 and I am noticing some strange
> behavior.
>
> Suppose we have the file foo.lang:
>
> symbol = "/"
> comment start "//"
>
> And the file test.foo:
>
> // foo
>
> The language definition is taken from the source-highlight manual,
> section 7.4: "Order of definitions". Note that the definitions are in
> the wrong order, according to the manual: "The first expression will
> always be matched first, and the second expression will never be
> matched." And yet:
>
> $ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
> <!-- Generator: GNU source-highlight 2.10
> by Lorenzo Bettini
> http://www.lorenzobettini.it
> http://www.gnu.org/software/src-highlite -->
> <pre><tt><span class="comment">// foo</span>
> </tt></pre>
>
> This was different with version 2.9:
>
> $ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
> <!-- Generator: GNU source-highlight 2.9
> by Lorenzo Bettini
> http://www.lorenzobettini.it
> http://www.gnu.org/software/src-highlite -->
> <pre><tt><span class="symbol">//</span><span class="normal"> foo</span>
> </tt></pre>
>
> What has changed between version 2.9 and 2.10?

Hi,

yes, the strategy for regular expression matching has changed. Before,
it used to build one huge regular expression with many alternatives;
however, this made the handling of things such as backreferences a real
nightmare (the backreference numbers had to be updated, and the number
of backreferences is limited to 9). In particular, it required splitting
regular expressions, and the code was really buggy.

So in 2.10 I completely rewrote the handling of regular expressions
(http://www.gnu.org/software/src-highlite/source-highlight.html#fn-29).
In particular, each element now has its own regular expression and the
engine tests each expression, as explained in 7.12:

"As hinted at the beginning of Language Definitions, source-highlight
uses the definitions in the language definition file to internally
create, on-the-fly, regular expressions that are used to highlight the
tokens of an input file. Here we provide some internal details that are
crucial to understand how to write language definition files correctly.

First of all, for each element definition, an highlighting rule is
created by source-highlight (even if they correspond to the same
language element); thus, each language definition file will correspond
to a list of highlighting rules.

For each line of the input file, source-highlight will try to match all
these rules against the whole line (more formally, against the part of
the line that has not been highlighted yet). It will not stop as soon as
an highlighting rule matched, since there might be another rule that
matches "better". The strategy used by source-highlight is to select the
first rule that matches the longest part of the text with the smallest
prefix (i.e., the initial part of the line that contains no language
element). (Thus, as already noted in the previous sections, the order of
language definitions is crucial.) Then, it will continue to search for
another matching rule for the remaining part of the line."

So the case of / and // respects this rule, since // matches better
than /.

Of course, you're right: the example of 7.4 does not work anymore and I
have to update the documentation with a better example! Sorry about
that, and thanks for the bug report.

Does this new strategy pose problems for your language definition?

Hope to hear from you soon.

cheers
    Lorenzo

--
Lorenzo Bettini, PhD in Computer Science, DI, Univ. Torino
ICQ# lbetto, 16080134 (GNU/Linux User # 158233)
HOME: http://www.lorenzobettini.it
MUSIC: http://www.purplesucker.com
       http://www.myspace.com/supertrouperabba
BLOGS: http://tronprog.blogspot.com
       http://longlivemusic.blogspot.com
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.gnu.org/software/gengen
http://doublecpp.sourceforge.net
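The selection rule quoted from section 7.12 can be sketched roughly as follows. This is only an illustration in Python, not source-highlight's actual code (which is C++ on top of Boost.Regex). The function name `select_rule_2_10`, the rule list, and the simplified comment regex are assumptions made for the example, and the tie-breaking order (smallest prefix, then longest match, then definition order) is one reading of the manual's wording.

```python
import re

# Hypothetical rules in definition order: (element name, regex).
# They mirror foo.lang from the first message; "//.*" stands in for
# 'comment start "//"'.
RULES = [
    ("symbol", r"/"),
    ("comment", r"//.*"),
]

def select_rule_2_10(rules, text):
    """Test every rule, then prefer the smallest prefix, then the
    longest match, then the rule that was defined first."""
    best = None
    for order, (name, pattern) in enumerate(rules):
        m = re.search(pattern, text)
        if m is None:
            continue
        # m.start() is the length of the unhighlighted prefix.
        key = (m.start(), -len(m.group()), order)
        if best is None or key < best[0]:
            best = (key, name, m.group())
    return None if best is None else (best[1], best[2])

print(select_rule_2_10(RULES, "// foo"))  # -> ('comment', '// foo')
```

Both rules match at prefix length 0, so the longer match wins, which is why the comment rule is chosen even though the symbol rule is defined first.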
Re: Order of definitions in source-highlight 2.10

Lorenzo Bettini wrote:
> Does this new strategy pose problems for your language definition?

I was using the old definition from function.lang, which can cause
problems (as explained in section 7.12). Changing to the new definition
seems to fix things. Thanks!
Re: Order of definitions in source-highlight 2.10

gnombat@... wrote:
> Lorenzo Bettini wrote:
>> Does this new strategy pose problems for your language definition?
>
> I was using the old definition from function.lang, which can cause
> problems (as explained in section 7.12). Changing to the new definition
> seems to fix things. Thanks!
>

OK, happy to hear that :-)

However, you're right about the documentation: it must be updated, since
that example does not hold anymore.

cheers
    Lorenzo
Re: Order of definitions in source-highlight 2.10

gnombat@... wrote:
> I just upgraded source-highlight to 2.10 and I am noticing some strange
> behavior.
>
> Suppose we have the file foo.lang:
>
> symbol = "/"
> comment start "//"
>
> And the file test.foo:
>
> // foo
>
> The language definition is taken from the source-highlight manual,
> section 7.4: "Order of definitions". Note that the definitions are in
> the wrong order, according to the manual: "The first expression will
> always be matched first, and the second expression will never be
> matched." And yet:
>
> $ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
> <!-- Generator: GNU source-highlight 2.10
> by Lorenzo Bettini
> http://www.lorenzobettini.it
> http://www.gnu.org/software/src-highlite -->
> <pre><tt><span class="comment">// foo</span>
> </tt></pre>
>
> This was different with version 2.9:
>
> $ source-highlight --lang-def=foo.lang -c foo.css --no-doc -i test.foo
> <!-- Generator: GNU source-highlight 2.9
> by Lorenzo Bettini
> http://www.lorenzobettini.it
> http://www.gnu.org/software/src-highlite -->
> <pre><tt><span class="symbol">//</span><span class="normal"> foo</span>
> </tt></pre>
>
> What has changed between version 2.9 and 2.10?

As I had already written in the previous email, the matching strategy
changed between 2.9 and 2.10:

"The strategy used by source-highlight is to select the first rule that
matches the longest part of the text with the smallest prefix (i.e., the
initial part of the line that contains no language element). (Thus, as
already noted in the previous sections, the order of language
definitions is crucial.)"

However, when working on the documentation, I realized that this
strategy is too involved and a little bit confusing, not to mention that
it has a lot of overhead, since it tests ALL the rules in a state.

Then I realized that, basically, the rule that should be selected is the
one with the smallest prefix, but we can stop testing rules as soon as
we find a rule that matches and whose prefix (i.e., the part of the
string before the matched one) contains only spaces (or is empty). I
think this is also the strategy used by standard regular expression
engines, or at least this one seems to be enough for programming
languages.

Thus, for instance, if I have

    i = null;

and I match null as a keyword, its prefix is "i = " and I should not
stop testing other rules, since otherwise I would not test the symbol
rule (which is defined later). Whereas, if I have

     if (exp)

as soon as I match "if" as a keyword, since its prefix is " ", I can
stop testing other rules. This way, I don't even risk matching "if(exp)"
as a function call (note that with the previous strategy this would
match better, since it matches more characters).

I think this is the right strategy, and it makes the example in the
documentation work again as described.

I've uploaded a temporary version that uses this strategy (and it also
performs faster, as expected) here:

http://gdn.dsi.unifi.it/~bettini/source-highlight-2.10.1.tar.gz

I'd really appreciate some feedback; in particular, do you think that
this new strategy makes sense?

There's also a new test in the tests directory, test_string_stop.lang:

    keyword = "if|class"
    type = 'int'

    comment delim "/*" "*/"

    # thus this won't catch "/* */ /" as a regexp,
    # since comment elem definition comes first
    regexp = '/.*/.*/'

    # this won't match if ( ) as a function,
    # since keyword elem definition comes first
    function = '([[:alpha:]]|_)[[:word:]]*[[:blank:]]*\(*[[:blank:]]*\)'

    # the following order is conceptually wrong,
    # since "//" won't be highlighted as a comment, but as two symbols
    symbol = "/"
    comment start "//"

which can be used with the input file test_string_stop.java; this
produces the attached output, which is the one expected with the new
strategy.

cheers
    Lorenzo

    /* comment */ final /
    /my/regexp/
    if ( ) {
    class;
    myfun ( );
    }
    int i;
    int ( );
    // comment? or two symbols?
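The revised 2.10.1 strategy described above (smallest prefix wins, but stop at the first rule whose prefix is empty or only whitespace) can be sketched like this. It is an illustrative Python approximation, not the real implementation: the function name `select_rule_2_10_1` and the rule list are assumptions, and the regexes are simplified because Python's `re` does not support POSIX classes such as `[[:alpha:]]`.

```python
import re

# Hypothetical rules in definition order: (element name, regex).
RULES = [
    ("keyword", r"\b(?:if|class|null)\b"),
    ("function", r"[A-Za-z_]\w*\s*\(.*?\)"),
    ("symbol", r"[/=;]"),
    ("comment", r"//.*"),
]

def select_rule_2_10_1(rules, text):
    """Pick the rule with the smallest prefix, stopping early as soon
    as a rule matches with an empty or whitespace-only prefix."""
    best = None  # (prefix length, element name, matched text)
    for name, pattern in rules:
        m = re.search(pattern, text)
        if m is None:
            continue
        prefix = text[:m.start()]
        if prefix.strip() == "":
            # Nothing but blanks before the match: later rules are skipped.
            return name, m.group()
        if best is None or m.start() < best[0]:
            best = (m.start(), name, m.group())
    return None if best is None else (best[1], best[2])

print(select_rule_2_10_1(RULES, " if (exp)"))  # -> ('keyword', 'if')
print(select_rule_2_10_1(RULES, "i = null;"))  # -> ('symbol', '=')
print(select_rule_2_10_1(RULES, "// foo"))     # -> ('symbol', '/')
```

The three calls correspond to the cases in the message: the keyword stops the search before the function rule is tried, the `null` keyword does not hide the earlier `=` symbol, and with `symbol` defined before `comment` the text `//` is again taken as symbols.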
Re: Order of definitions in source-highlight 2.10

Lorenzo Bettini wrote:
> I've uploaded a temporary version that uses this strategy (and it also
> performs faster as expected) here:
>
> http://gdn.dsi.unifi.it/~bettini/source-highlight-2.10.1.tar.gz
>
> I'd really appreciate to get some feedback, especially do you think that
> this new strategy makes sense?

On a totally unrelated note, I notice that this new version has some
changes to the search for Boost in the configure script - which is good,
because I've never been able to get it to compile without passing a
bunch of options to configure :) But shouldn't there be a reference to
BOOST_CPPFLAGS in src/Makefile.am (and src/lib/Makefile.am)?
Re: Order of definitions in source-highlight 2.10

gnombat@... wrote:
> Lorenzo Bettini wrote:
>> I've uploaded a temporary version that uses this strategy (and it also
>> performs faster as expected) here:
>>
>> http://gdn.dsi.unifi.it/~bettini/source-highlight-2.10.1.tar.gz
>>
>> I'd really appreciate to get some feedback, especially do you think
>> that this new strategy makes sense?
>
> On a totally unrelated note, I notice that this new version has some
> changes to the search for Boost in the configure script - which is good,
> because I've never been able to get it to compile without passing a

Yes, I didn't mention that :-) The autoconf macro that searches for that
library has finally improved and can find it in a smarter way :-)

http://autoconf-archive.cryp.to/ax_boost_regex.html

> bunch of options to configure :) But shouldn't there be a reference to
> BOOST_CPPFLAGS in src/Makefile.am (and src/lib/Makefile.am)?

Yes, you're probably right, I should add that! The documentation found
here

http://randspringer.de/boost/ucl-sbs.html

did not mention that, but I think I should add BOOST_CPPFLAGS to
AM_CPPFLAGS.

What about the other changes? Do you think they make sense?

cheers
    Lorenzo
Re: Order of definitions in source-highlight 2.10

Lorenzo Bettini wrote:
> What about the other changes?
> Do you think they make sense?

It seems to work well. If my understanding is correct, the end result
should be the same as it was with versions 2.9 and earlier (although the
implementation is now quite different).
Re: Order of definitions in source-highlight 2.10

gnombat@... wrote:
> Lorenzo Bettini wrote:
>> What about the other changes?
>> Do you think they make sense?
>
> It seems to work well. If my understanding is correct, the end result
> should be the same as it was with versions 2.9 and earlier (although the
> implementation is now quite different).
>

Actually, it might not be exactly the same (but I've also updated many
.lang files).

Previously, everything relied on one big regular expression with many
alternatives, and so on the regular expression machine to select the
most appropriate alternative, which I don't think is the same selection
as the current one.

However, the current one (which is a manual implementation of the
selection of the right regular expression) is targeted at programming
languages (hence the condition that the prefix contains only spaces).

cheers
    Lorenzo
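For comparison, the older "one big regular expression with many alternatives" approach can be approximated with Python's `re` module. This is only a sketch: the helper name `tokenize_with_alternation` and the rule list are assumptions, and Python's engine tries alternatives from left to right, which may differ in detail from how Boost.Regex was configured in source-highlight 2.9.

```python
import re

# Hypothetical rules in definition order, as in foo.lang.
RULES = [
    ("symbol", r"/"),
    ("comment", r"//.*"),
]

def tokenize_with_alternation(rules, text):
    """Join all rules into one alternation with named groups and let
    the regex engine decide which alternative matches at each position."""
    big = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in rules))
    # lastgroup is the name of the alternative that matched.
    return [(m.lastgroup, m.group()) for m in big.finditer(text)]

print(tokenize_with_alternation(RULES, "// foo"))
# -> [('symbol', '/'), ('symbol', '/')]
```

With this ordering the comment alternative is never reached, which is consistent with the 2.9 output quoted at the start of the thread; swapping the two rules yields a single comment token instead.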