SableCC 4-beta.1
Hi all,
It is with great pleasure that I announce the release of the first beta version of SableCC 4. It is far from complete, but I wanted to let you start playing with some of the new features. This version generates lexers (and related token classes) using the new and powerful lexical-expression-based engine. It also supports the new flexible syntax for defining tokens. I'll discuss all of these things below. First, let me warn you of a few limitations:
OK, so what's so neat about SableCC 4 lexers? There's nothing like an example to show you.
Language demo;
Lexer
// some named expressions
letter = upper_case | 'a'..'z';
upper_case = 'A'..'Z'; // defined after letter
digit = '0'..'9';
blank = (' ' | #xA | #13)+;
number = digit+;
while = 'while'; // some keyword
identifier = letter (letter | digit)*;
for = 'for'; // another keyword (defined after identifier)
ip_num = digit^(1..3);
ip_address = (ip_num Separator '.')^4;
zero = '0'+ | 'zero';
Token
for, number, identifier, while, 'else', ip_address, zero;
Ignored
blank;
Priority
zero > number;
zero > identifier;
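As a rough illustration of how longest match and the Priority section could interact, here is a small Python sketch. It is not SableCC-generated code and not how SableCC is implemented. It assumes that the longest match always wins and that priorities only break ties between tokens matching the same text. The Python regexes and the RANK order are stand-ins invented for this sketch; in particular, ranking `for` ahead of `identifier` is an assumption, since the spec above declares no priority between them.

```python
import re

# Hypothetical stand-ins for a few tokens of the demo spec, as Python regexes.
TOKENS = {
    "number": r"[0-9]+",
    "identifier": r"[A-Za-z][A-Za-z0-9]*",
    "zero": r"0+|zero",
    "for": r"for",
}

# Assumed tie-break order: earlier names win when match lengths are equal.
# "zero" first mirrors `zero > number` and `zero > identifier`; placing "for"
# ahead of "identifier" is an assumption of this sketch.
RANK = ["zero", "for", "number", "identifier"]


def next_token(text, pos=0):
    """Return (name, lexeme) for the longest match at pos, or None."""
    best = None
    for name in RANK:
        m = re.compile(TOKENS[name]).match(text, pos)
        # A strictly longer match replaces the current best; on equal length
        # the earlier (higher-ranked) name is kept.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


if __name__ == "__main__":
    for sample in ["000", "007", "zero", "zeroes", "for", "fort"]:
        print(sample, "->", next_token(sample))
```

Under these assumptions, "000" and "zero" come out as zero, "007" as number (a longer match beats priority), and "zeroes" as identifier.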
So, here are some of the features above:
These two operators can be used anywhere within an expression, not only at the top level, and they do deliver the expected semantics: the longest token is always matched (without taking lookahead into account), and scanning is done in linear time O(input length). How can they be used? Here are some examples:
Language demo2;
Lexer
// line comment
line_comment = '//' (Shortest (Any* Look (eol | '' Look End))) (eol | '' Look End);
// c comment
c_comment = '/*' (Any* - '*/') '*/';
The lexical subtraction operator '-' is not a regular expression "difference" operator. e.g. on the input abc*/defg..., a regular expression difference would match 'abc*', as it is the longest matching string! In general, we would like to match 'abc'; that's what the lexical subtraction operator does.
You can use the operators as you want:
dummy = ('a' Look 'a'* 'b')*;
This expression will match 0 or more 'a', where each 'a' is followed by 0 or more 'a' and then 'b'.
There's also a 'Look Not' to match when not followed by an expression.
dummy2 = 'b' Look Not End; // b, when not followed by EOF
Oh yes... SableCC automatically generates a small Test.java application to test your lexer (in the language_languageName package), to save you some typing.
So, there it is. Please play with it and report back your comments and suggestions to this list.
Have fun!
Etienne
--
Etienne M. Gagnon, Ph.D.
SableCC: http://sablecc.org
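For readers who want to poke at the semantics described in this message without installing the beta, here is a rough Python sketch. It is not SableCC output and says nothing about how SableCC is implemented. It assumes that Any* - '*/' stops just before the first '*/', contrasts that with a regular expression difference read as "longest prefix that does not contain */", and uses Python's regex lookahead, (?=...) and (?!...), as a loose analogue of Look and Look Not. All function and variable names are made up for illustration.

```python
import re


def regex_difference_prefix(text, stop):
    """Difference reading: longest prefix of text that does not contain stop."""
    for end in range(len(text), -1, -1):
        if stop not in text[:end]:
            return text[:end]
    return ""


def lexical_subtraction_prefix(text, stop):
    """Assumed reading of Any* - stop: everything before the first stop."""
    index = text.find(stop)
    return text if index < 0 else text[:index]


# Loose analogue of: dummy = ('a' Look 'a'* 'b')*;
# Each 'a' is consumed only if zero or more 'a' and then 'b' follow it.
DUMMY = re.compile(r"(?:a(?=a*b))*")

# Loose analogue of: dummy2 = 'b' Look Not End;
# \Z matches only at the very end of the input.
DUMMY2 = re.compile(r"b(?!\Z)")


if __name__ == "__main__":
    print(regex_difference_prefix("abc*/defg", "*/"))     # abc*
    print(lexical_subtraction_prefix("abc*/defg", "*/"))  # abc
    print(repr(DUMMY.match("aaab").group()))              # 'aaa'
    print(DUMMY2.match("b"))                              # None
```

The lookahead is never part of the match, which is why DUMMY stops before the 'b' and why DUMMY2 consumes only the 'b' itself.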