regexp in javascript.lang (3rd try!)


by gnombat

Last time I suggested an ugly regexp definition for
javascript.lang to avoid matching /* */ comments:

http://lists.gnu.org/archive/html/help-source-highlight/2008-09/msg00000.html

On second thought (or third thought) I don't like this because it
matches cases where there are two division operators in a single
expression, such as:

document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');

Here is a proposed javascript.lang to fix these problems.  It does the
following:
* first check if the input matches a comment
* next check if it matches a division operator, which can occur only
after a number, an identifier, or certain symbols
* finally check if it matches a regular expression

Note that it is no longer based on the java.lang because the order of
the definitions is important.  (Hence, this would not work with
source-highlight 2.10, where the matching algorithm was different, but
does work with source-highlight 2.11.)

The disadvantages:
* it no longer reuses java.lang
* the division operator definitions are ugly
The advantages:
* it works in all possible cases (I hope)
* it simplifies the regexp definition

What do you think?

include "c_comment.lang"

keyword =
"abstract|break|case|catch|class|const|continue|debugger|default|delete|do|else|enum|export|extends|false|final|finally|for|function|goto|if|implements|in|instanceof|interface|native|new|null|private|protected|prototype|public|return|static|super|switch|synchronized|throw|throws|this|transient|true|try|typeof|var|volatile|while|with"

(symbol,normal,symbol) = `(\+\+|--|\)|\])(\s*)(/=?(?![*/]))`
(number,normal,symbol) =
`(0x[[:xdigit:]]+|(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?)(\s*)(/(?![*/]))`
(normal,symbol) = `([[:alpha:]$_][[:alnum:]$_]*\s*)(/=?(?![*/]))`

regexp = '/(\\.|[^*\\/])(\\.|[^\\/])*/[gim]*'

include "number.lang"

include "c_string.lang"

include "symbols.lang"

cbracket = "{|}"

include "function.lang"
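
To see what the three division rules accept and reject, here is a minimal sketch that translates them into JavaScript regular expressions. The translation is an assumption and not part of source-highlight: the POSIX classes are replaced with ASCII ranges, and `divisionAt` is a hypothetical helper that tries each rule at the start of a string.

```javascript
// Division rules from the proposed javascript.lang, rewritten for JavaScript.
// Assumption: [[:digit:]], [[:xdigit:]], [[:alpha:]] and [[:alnum:]] are
// approximated with ASCII ranges.
const divisionRules = [
  // (symbol,normal,symbol): "/" or "/=" after ++, --, ) or ]
  /(\+\+|--|\)|\])(\s*)(\/=?(?![*\/]))/,
  // (number,normal,symbol): "/" after a numeric literal
  /(0x[0-9A-Fa-f]+|(?:[0-9]*\.)?[0-9]+(?:[eE][+-]?[0-9]+)?)(\s*)(\/(?![*\/]))/,
  // (normal,symbol): "/" or "/=" after an identifier
  /([A-Za-z$_][A-Za-z0-9$_]*\s*)(\/=?(?![*\/]))/,
];

// Hypothetical helper: return the text matched by the first division rule
// that applies at the very start of `text`, or null if none applies.
function divisionAt(text) {
  for (const rule of divisionRules) {
    const anchored = new RegExp("^(?:" + rule.source + ")");
    const match = anchored.exec(text);
    if (match) return match[0];
  }
  return null;
}

console.log(divisionAt("25/100"));         // "25/"
console.log(divisionAt("] / 2"));          // "] /"
console.log(divisionAt("total /= count")); // "total /="
console.log(divisionAt("x /* note */"));   // null, the lookahead rejects "/*"
console.log(divisionAt("x // note"));      // null, the lookahead rejects "//"
console.log(divisionAt("/foo/"));          // null, no operand before the slash
```

The negative lookahead `(?![*/])` is what keeps comments out of the division rules, so the regexp rule that follows only sees a slash that is neither a comment opener nor preceded by an operand.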



_______________________________________________
Help-source-highlight mailing list
Help-source-highlight@...
http://lists.gnu.org/mailman/listinfo/help-source-highlight

Re: regexp in javascript.lang (3rd try!)

by Lorenzo Bettini

gnombat@... wrote:

> Last time I suggested an ugly regexp definition for
> javascript.lang to avoid matching /* */ comments:
>
> http://lists.gnu.org/archive/html/help-source-highlight/2008-09/msg00000.html 
>
>
> On second thought (or third thought) I don't like this because it
> matches cases where there are two division operators in a single
> expression, such as:
>
> document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
>

mh... I'm not sure I understand: why does this happen?  The other / are
in strings delimited by '', aren't they?

--
Lorenzo Bettini, PhD in Computer Science, DI, Univ. Torino
ICQ# lbetto, 16080134     (GNU/Linux User # 158233)
HOME: http://www.lorenzobettini.it MUSIC: http://www.purplesucker.com
http://www.myspace.com/supertrouperabba
BLOGS: http://tronprog.blogspot.com  http://longlivemusic.blogspot.com
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.gnu.org/software/gengen http://doublecpp.sourceforge.net



Re: regexp in javascript.lang (3rd try!)

by gnombat

Lorenzo Bettini wrote:

> mh... I'm not sure I understand: why does this happen?  The other / are
> in strings delimited by '', aren't they?
>

document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
                                          ^
When the highlighter reaches this point in the line, the "regexp" rule
matches instead of the "string" rule.  That is, the "regexp" rule matches
with an empty prefix, while the "string" rule would have a nonempty
prefix before the string starts:

document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
                                          _________
                                           regexp

document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
                                          ------____________________
                                          prefix      string
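
The two candidate matches can be reproduced with a small sketch. It assumes a JavaScript translation of the regexp rule from the proposed javascript.lang and a simplified single-quoted string pattern standing in for c_string.lang; neither is the actual source-highlight implementation.

```javascript
// Assumed JavaScript translation of the regexp rule from javascript.lang.
const regexpRule = /\/(?:\\.|[^*\\\/])(?:\\.|[^\\\/])*\/[gim]*/;
// Simplified stand-in for the single-quoted string rule (an assumption).
const stringRule = /'(?:\\.|[^'\\])*'/;

const line =
  "document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');";

// Start at the division operator, i.e. where the caret points.
const rest = line.slice(line.indexOf("/"));

const asRegexp = regexpRule.exec(rest);
const asString = stringRule.exec(rest);

// The regexp rule matches right away (empty prefix) ...
console.log(asRegexp.index, asRegexp[0]); // 0 /100)+'</
// ... while the string rule only matches after the prefix "/100)+".
console.log(asString.index, asString[0]); // 6 '</td></tr></table>'
```

Because the regexp candidate starts earlier, it wins, and the closing `'</td>` of the real string is swallowed as part of a bogus regular expression.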



Re: regexp in javascript.lang (3rd try!)

by Lorenzo Bettini

yes, sure, I should have guessed this by myself :-)

I'll take a look at your solution, which seems to make sense

cheers
        Lorenzo


Re: regexp in javascript.lang (3rd try!)

by Lorenzo Bettini

gnombat@... wrote:

> include "c_comment.lang"
>
> keyword =
> "abstract|break|case|catch|class|const|continue|debugger|default|delete|do|else|enum|export|extends|false|final|finally|for|function|goto|if|implements|in|instanceof|interface|native|new|null|private|protected|prototype|public|return|static|super|switch|synchronized|throw|throws|this|transient|true|try|typeof|var|volatile|while|with"
>
>
> (symbol,normal,symbol) = `(\+\+|--|\)|\])(\s*)(/=?(?![*/]))`
> (number,normal,symbol) =
> `(0x[[:xdigit:]]+|(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?)(\s*)(/(?![*/]))`
>
> (normal,symbol) = `([[:alpha:]$_][[:alnum:]$_]*\s*)(/=?(?![*/]))`
>
> regexp = '/(\\.|[^*\\/])(\\.|[^\\/])*/[gim]*'
>
> include "number.lang"
>
> include "c_string.lang"
>
> include "symbols.lang"
>
> cbracket = "{|}"
>
> include "function.lang"
>
Actually, it also works this way, and it reuses most of java.lang (see
the attached file);

what do you think?

cheers
        Lorenzo


# Javascript lang definition file

# first check if the input matches a comment
include "c_comment.lang"

# next check if it matches a division operator, which can occur only
# after a number, an identifier, or certain symbols
(symbol,normal,symbol) =
        `(\+\+|--|\)|\])(\s*)(/=?(?![*/]))`
(number,normal,symbol) =
        `(0x[[:xdigit:]]+|(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?)(\s*)(/(?![*/]))`
(normal,symbol) =
        `([[:alpha:]$_][[:alnum:]$_]*\s*)(/=?(?![*/]))`

# finally check if it matches a regular expression
regexp = '/(\\.|[^*\\/])(\\.|[^\\/])*/[gim]*'

include "java.lang"

subst keyword = "abstract|break|case|catch|class|const|continue|debugger|default|delete|do|else|enum|export|extends|false|final|finally|for|function|goto|if|implements|in|instanceof|interface|native|new|null|private|protected|prototype|public|return|static|super|switch|synchronized|throw|throws|this|transient|true|try|typeof|var|volatile|while|with"


Re: regexp in javascript.lang (3rd try!)

by gnombat

Lorenzo Bettini wrote:

> gnombat@... wrote:
>> include "c_comment.lang"
>>
>> keyword =
>> "abstract|break|case|catch|class|const|continue|debugger|default|delete|do|else|enum|export|extends|false|final|finally|for|function|goto|if|implements|in|instanceof|interface|native|new|null|private|protected|prototype|public|return|static|super|switch|synchronized|throw|throws|this|transient|true|try|typeof|var|volatile|while|with"
>>
>>
>> (symbol,normal,symbol) = `(\+\+|--|\)|\])(\s*)(/=?(?![*/]))`
>> (number,normal,symbol) =
>> `(0x[[:xdigit:]]+|(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?)(\s*)(/(?![*/]))`
>>
>> (normal,symbol) = `([[:alpha:]$_][[:alnum:]$_]*\s*)(/=?(?![*/]))`
>>
>> regexp = '/(\\.|[^*\\/])(\\.|[^\\/])*/[gim]*'
>>
>> include "number.lang"
>>
>> include "c_string.lang"
>>
>> include "symbols.lang"
>>
>> cbracket = "{|}"
>>
>> include "function.lang"
>>
>
> Actually, it also works this way, and it reuses most of java.lang (see
> the attached file);
>
> what do you think?

The keyword definition has to occur before the definitions with the
division operator in order to correctly match things like this:

/* unusual, but valid JavaScript */
throw /foo/;

/* this is more likely to occur in practice */
function f() {
   return /foo/;
}

/* or this */
function g(bar) {
   return /foo/.test(bar);
}
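
The effect of rule order can be sketched with a toy tokenizer. This is only an illustration under stated assumptions: `tokenize` is a hypothetical helper that takes the first rule matching at the current position, which simplifies how source-highlight actually selects rules; the keyword list is shortened; and the patterns are JavaScript translations of the ones in the thread.

```javascript
// Shortened keyword list (assumption: three keywords are enough to illustrate).
const keywordRule = /^(?:return|throw|typeof)\b/;
// Identifier followed by a division operator, as in the (normal,symbol) rule.
const divisionRule = /^[A-Za-z$_][A-Za-z0-9$_]*\s*\/=?(?![*\/])/;
// Regular expression literal, as in the regexp rule.
const regexpLiteralRule = /^\/(?:\\.|[^*\\\/])(?:\\.|[^\\\/])*\/[gim]*/;

// Hypothetical helper: at each position, apply the first rule that matches
// there; characters matched by no rule are skipped (left unhighlighted).
function tokenize(text, rules) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    for (const [name, pattern] of rules) {
      const m = pattern.exec(text.slice(pos));
      if (m && m[0].length > 0) {
        tokens.push([name, m[0]]);
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) pos += 1;
  }
  return tokens;
}

const sample = "return /foo/.test(bar);";

// Keywords first: "return" is consumed, so "/foo/" is seen as a regexp.
const keywordFirst = [
  ["keyword", keywordRule],
  ["division", divisionRule],
  ["regexp", regexpLiteralRule],
];
console.log(tokenize(sample, keywordFirst));
// [ [ 'keyword', 'return' ], [ 'regexp', '/foo/' ] ]

// Division rules first: "return /" looks like identifier-then-division.
const divisionFirst = [
  ["division", divisionRule],
  ["regexp", regexpLiteralRule],
  ["keyword", keywordRule],
];
console.log(tokenize(sample, divisionFirst));
// [ [ 'division', 'return /' ], [ 'division', 'foo/' ] ]
```

With the division rules ahead of the keywords, `return` is treated as an ordinary identifier and the regular expression literal is never recognized.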



Re: regexp in javascript.lang (3rd try!)

by Lorenzo Bettini

OK

so the only solution is the one you had proposed :-)

it's quite a pity not re-using java.lang, but that's not a big deal, is
it? ;-)

cheers
        Lorenzo
