regexp in javascript.lang (3rd try!)

Last time I suggested an ugly regexp definition for javascript.lang to
avoid matching /* */ comments:

http://lists.gnu.org/archive/html/help-source-highlight/2008-09/msg00000.html

On second thought (or third thought) I don't like this because it
matches cases where there are two division operators in a single
expression, such as:

    document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');

Here is a proposed javascript.lang to fix these problems. It does the
following:
* first check if the input matches a comment
* next check if it matches a division operator, which can occur only
  after a number, an identifier, or certain symbols
* finally check if it matches a regular expression

Note that it is no longer based on the java.lang because the order of
the definitions is important. (Hence, this would not work with
source-highlight 2.10, where the matching algorithm was different, but
does work with source-highlight 2.11.)

The disadvantages:
* it no longer reuses java.lang
* the division operator definitions are ugly

The advantages:
* it works in all possible cases (I hope)
* it simplifies the regexp definition

What do you think?

    include "c_comment.lang"

    keyword = "abstract|break|case|catch|class|const|continue|debugger|default|delete|do|else|enum|export|extends|false|final|finally|for|function|goto|if|implements|in|instanceof|interface|native|new|null|private|protected|prototype|public|return|static|super|switch|synchronized|throw|throws|this|transient|true|try|typeof|var|volatile|while|with"

    (symbol,normal,symbol) = `(\+\+|--|\)|\])(\s*)(/=?(?![*/]))`
    (number,normal,symbol) = `(0x[[:xdigit:]]+|(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?)(\s*)(/(?![*/]))`
    (normal,symbol) = `([[:alpha:]$_][[:alnum:]$_]*\s*)(/=?(?![*/]))`

    regexp = '/(\\.|[^*\\/])(\\.|[^\\/])*/[gim]*'

    include "number.lang"

    include "c_string.lang"

    include "symbols.lang"

    cbracket = "{|}"

    include "function.lang"

_______________________________________________
Help-source-highlight mailing list
Help-source-highlight@...
http://lists.gnu.org/mailman/listinfo/help-source-highlight
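One rough way to experiment with the three division-operator definitions above is to translate them into JavaScript regular expressions. This is only a sketch: JavaScript has no POSIX classes such as [[:digit:]], so ASCII ranges are substituted here, and the names `divisionRules` and `matchesDivision` are made up for illustration. They are not part of source-highlight.

```javascript
// Hypothetical JavaScript translations of the three division-operator
// definitions; POSIX classes are replaced by ASCII ranges (an assumption).
const divisionRules = [
  // "/" or "/=" after ++, --, ) or ]
  /(\+\+|--|\)|\])(\s*)(\/=?(?![*\/]))/,
  // "/" after a number (hex, or decimal with optional fraction and exponent)
  /(0x[0-9A-Fa-f]+|(?:[0-9]*\.)?[0-9]+(?:[eE][+-]?[0-9]+)?)(\s*)(\/(?![*\/]))/,
  // "/" or "/=" after an identifier
  /([A-Za-z$_][A-Za-z0-9$_]*\s*)(\/=?(?![*\/]))/,
];

// Returns the text matched by the first rule that matches at the very
// start of `text`, or null if no rule matches there.
function matchesDivision(text) {
  for (const rule of divisionRules) {
    const m = rule.exec(text);
    if (m !== null && m.index === 0) {
      return m[0];
    }
  }
  return null;
}

console.log(matchesDivision("25/100)"));         // "25/"
console.log(matchesDivision("total /= 2"));      // "total /="
console.log(matchesDivision("a /* c */ b"));     // null: the lookahead rejects "/*"
console.log(matchesDivision("/foo/.test(bar)")); // null: nothing precedes the "/"
```

The negative lookahead `(?![*\/])` is what keeps a comment opener from being taken as a division operator, and a slash with no operand before it falls through to the later regexp definition.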
Re: regexp in javascript.lang (3rd try!)

gnombat@... wrote:
> Last time I suggested an ugly regexp definition for
> javascript.lang to avoid matching /* */ comments:
>
> http://lists.gnu.org/archive/html/help-source-highlight/2008-09/msg00000.html
>
> On second thought (or third thought) I don't like this because it
> matches cases where there are two division operators in a single
> expression, such as:
>
> document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');

Hmm... I'm not sure I understand: why does this happen? The other / are
in strings delimited by '', aren't they?

--
Lorenzo Bettini, PhD in Computer Science, DI, Univ. Torino
ICQ# lbetto, 16080134 (GNU/Linux User # 158233)
HOME: http://www.lorenzobettini.it
MUSIC: http://www.purplesucker.com http://www.myspace.com/supertrouperabba
BLOGS: http://tronprog.blogspot.com http://longlivemusic.blogspot.com
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.gnu.org/software/gengen
http://doublecpp.sourceforge.net
Re: regexp in javascript.lang (3rd try!)

Lorenzo Bettini wrote:
> gnombat@... wrote:
>> On second thought (or third thought) I don't like this because it
>> matches cases where there are two division operators in a single
>> expression, such as:
>>
>> document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
>
> Hmm... I'm not sure I understand: why does this happen? The other / are
> in strings delimited by '', aren't they?

document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
                                          ^
When at this point in the line, the "regexp" rule will match instead of
the "string" rule. I.e., the "regexp" rule will match with an empty
prefix, while the "string" rule would have a nonempty prefix before the
string starts:

document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
                                          _________
                                          regexp

document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
                                          ------____________________
                                          prefix string
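The prefix comparison described above can be reproduced with a small experiment. The following is a minimal sketch, assuming simplified JavaScript stand-ins for the "regexp" and "string" definitions; the `rules` object and the `prefixes` helper are hypothetical names, and the string pattern is a guess at what c_string.lang roughly does rather than a copy of it.

```javascript
// Simplified stand-ins for two source-highlight definitions, translated to
// JavaScript syntax (assumptions for illustration, not the shipped rules).
const rules = {
  regexp: /\/(?:\\.|[^*\\\/])(?:\\.|[^\\\/])*\/[gim]*/,
  string: /'(?:\\.|[^'\\])*'/,
};

// For each rule, report the unmatched prefix and the matched text,
// or null when the rule does not match anywhere in `text`.
function prefixes(text) {
  const result = {};
  for (const [name, rule] of Object.entries(rules)) {
    const m = rule.exec(text);
    result[name] =
      m === null ? null : { prefix: text.slice(0, m.index), match: m[0] };
  }
  return result;
}

// The remainder of the example line, starting at the marked "/".
const rest = "/100)+'</td></tr></table>');";
console.log(prefixes(rest));
// regexp: prefix "" and match "/100)+'</"
// string: prefix "/100)+" and match "'</td></tr></table>'"
```

Because the regexp candidate starts exactly at the current position, it has the empty prefix and is preferred, even though a human reader would see a division followed by a string.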
Re: regexp in javascript.lang (3rd try!)

gnombat@... wrote:
> document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
>                                           ^
> When at this point in the line, the "regexp" rule will match instead of
> the "string" rule. I.e., the "regexp" rule will match with an empty
> prefix, while the "string" rule would have a nonempty prefix before the
> string starts:
>
> document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
>                                           _________
>                                           regexp
>
> document.write('<table><tr><td>25% = '+(25/100)+'</td></tr></table>');
>                                           ------____________________
>                                           prefix string

Yes, sure, I should have guessed this myself :-)

I'll take a look at your solution, which seems to make sense.

cheers
    Lorenzo
Re: regexp in javascript.lang (3rd try!)

gnombat@... wrote:
> Note that it is no longer based on the java.lang because the order of
> the definitions is important. (Hence, this would not work with
> source-highlight 2.10, where the matching algorithm was different, but
> does work with source-highlight 2.11.)
>
> The disadvantages:
> * it no longer reuses java.lang
> * the division operator definitions are ugly

Actually it works also this way, and it reuses most of java.lang (see
the attached file);

what do you think?

cheers
    Lorenzo

Attached file:

    # Javascript lang definition file

    # first check if the input matches a comment
    include "c_comment.lang"

    # next check if it matches a division operator, which can occur only
    # after a number, an identifier, or certain symbols
    (symbol,normal,symbol) = `(\+\+|--|\)|\])(\s*)(/=?(?![*/]))`
    (number,normal,symbol) = `(0x[[:xdigit:]]+|(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?)(\s*)(/(?![*/]))`
    (normal,symbol) = `([[:alpha:]$_][[:alnum:]$_]*\s*)(/=?(?![*/]))`

    # finally check if it matches a regular expression
    regexp = '/(\\.|[^*\\/])(\\.|[^\\/])*/[gim]*'

    include "java.lang"

    subst keyword = "abstract|break|case|catch|class|const|continue|debugger|default|delete|do|else|enum|export|extends|false|final|finally|for|function|goto|if|implements|in|instanceof|interface|native|new|null|private|protected|prototype|public|return|static|super|switch|synchronized|throw|throws|this|transient|true|try|typeof|var|volatile|while|with"
Re: regexp in javascript.lang (3rd try!)

Lorenzo Bettini wrote:
> Actually it works also this way, and it reuses most of java.lang (see
> the attached file);
>
> what do you think?

The keyword definition has to occur before the definitions with the
division operator in order to correctly match things like this:

    /* unusual, but valid JavaScript */
    throw /foo/;

    /* this is more likely to occur in practice */
    function f() {
      return /foo/;
    }

    /* or this */
    function g(bar) {
      return /foo/.test(bar);
    }

Otherwise the (normal,symbol) definition matches "throw" or "return" as
an identifier followed by a division operator, and the regular
expression is never recognized.
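To see why the order matters, here is a toy model of the matching in JavaScript. It is a sketch under stated assumptions: the selection policy (smallest prefix wins, ties go to the definition that comes first) is inferred from the descriptions in this thread rather than taken from the source-highlight code, the three patterns are cut-down stand-ins for the real definitions, and `tokenize` is a hypothetical helper.

```javascript
// Hypothetical JavaScript stand-ins for three of the definitions.
const keyword = { name: "keyword", re: /\b(?:return|throw)\b/ };
const division = {
  name: "division",
  re: /[A-Za-z$_][A-Za-z0-9$_]*\s*\/=?(?![*\/])/,
};
const regexp = {
  name: "regexp",
  re: /\/(?:\\.|[^*\\\/])(?:\\.|[^\\\/])*\/[gim]*/,
};

// Assumed selection policy: the rule with the smallest prefix wins, and
// ties go to the rule listed first. Unmatched prefixes are skipped.
function tokenize(rules, text) {
  const tokens = [];
  let rest = text;
  while (rest.length > 0) {
    let best = null;
    for (const rule of rules) {
      const m = rule.re.exec(rest);
      if (m !== null && (best === null || m.index < best.m.index)) {
        best = { rule, m };
      }
    }
    if (best === null) break; // no rule matches the remaining text
    tokens.push(`${best.rule.name}:${best.m[0]}`);
    rest = rest.slice(best.m.index + best.m[0].length);
  }
  return tokens;
}

// Keyword defined before the division rule: the regexp literal is found.
console.log(tokenize([keyword, division, regexp], "return /foo/;"));
// [ 'keyword:return', 'regexp:/foo/' ]

// Division rule defined first: "return /" is taken as identifier + division.
console.log(tokenize([division, keyword, regexp], "return /foo/;"));
// [ 'division:return /', 'division:foo/' ]
```

In the second ordering both the keyword and the division stand-in match at position 0, so the tie-break decides the outcome, which is the situation the attached file runs into by including java.lang after the division definitions.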
Re: regexp in javascript.lang (3rd try!)

OK, so the only solution is the one you had proposed :-)

It's quite a pity not to reuse java.lang, but that's not a big deal, is it? ;-)

cheers
    Lorenzo

gnombat@... wrote:
> The keyword definition has to occur before the definitions with the
> division operator in order to correctly match things like this:
>
> /* unusual, but valid JavaScript */
> throw /foo/;
>
> /* this is more likely to occur in practice */
> function f() {
>   return /foo/;
> }
>
> /* or this */
> function g(bar) {
>   return /foo/.test(bar);
> }