SableCC 4 and old grammar code base

by Jean-Baptiste BRIAUD -- Novlog

Hi,

The one problem I have found with SableCC, the only one in fact, is the
small base of available grammars.
Finding a JavaScript grammar was really hard compared to other
lexer/parser/walker generators.

If SableCC 4 changes the grammar syntax, what is the future of that
already small grammar base?
Will SableCC 4 be backward compatible with old grammars?
Will a converter (written with SableCC, of course) from the old grammar
syntax to the new one be available?
Any other ideas?

Thanks !

_______________________________________________
SableCC-Discussion mailing list
SableCC-Discussion@...
http://lists.sablecc.org/listinfo/sablecc-discussion

Re: SableCC 4 and old grammar code base

by Etienne M. Gagnon

The SableCC 4 syntax is quite close to the previous syntax. I needed to
change the syntax a little to fix problems in the old one and to
allow for new features.

As an illustration, I just converted, in less than 5 minutes, the
SableCC 3 MiniBasic grammar to SableCC 4 syntax. I have attached both
grammars so that you can look at the differences.

In the attached SableCC 4 version:
1- Both Helpers and Tokens sections are merged into a single Lexer
section. Ignored is a subsection of Lexer.
2- Numeric character literals use a # (e.g. #10).
3- No more sets. [[32..127] - [cr + lf]] becomes (#32..#127) - (cr | lf) .
4- The Productions section is now called Parser.
5- Alternative and element names are written as {alt_name:} and
[elem_name:].
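
To make item 3 concrete, here is a small Python sketch (not SableCC code; the
variable names are made up for this illustration) that computes the character
set both notations describe. It assumes the intervals are inclusive, as in
'A'..'Z'. Since CR (13) and LF (10) lie outside 32..127, the subtraction
leaves the interval unchanged in this particular grammar.

```python
# Character codes, as in the grammar's cr and lf definitions.
CR, LF = 13, 10

# SableCC 3: [[32..127] - [cr + lf]]
# SableCC 4: (#32..#127) - (cr | lf)
# Both read as "codes 32 to 127 inclusive, minus CR and LF".
not_cr_lf = set(range(32, 128)) - {CR, LF}

# Characters allowed between the quotes of a MiniBasic string token:
# (not_cr_lf - '"') in the SableCC 4 notation.
string_body = not_cr_lf - {ord('"')}

if __name__ == "__main__":
    print(len(not_cr_lf), len(string_body))
```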

As you can see, a direct conversion should be simple enough. It could
probably be automated, except for grammars that use lexer states. However,
the resulting grammar is not as elegant as SableCC 4 allows.
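
As a rough illustration of how rules 2, 4 and 5 above could be automated,
here is a minimal Python sketch. It is not part of SableCC and is not an
existing converter: the function name convert_line and the regular
expressions are assumptions made for this example. It works line by line,
does not merge the Helpers and Tokens sections (rule 1), does not rewrite
sets (rule 3), ignores lexer states, and would be confused by a string
literal that contains braces or brackets.

```python
import re

# Hypothetical helper, not part of SableCC: rewrites one line of a
# SableCC 3 grammar using rules 2, 4 and 5 from the list above.
NAME = r"[a-z][a-z0-9_]*"  # old-style pure ASCII identifier


def convert_line(line: str) -> str:
    # Rule 4: the Productions section header becomes Parser.
    if line.strip() == "Productions":
        return line.replace("Productions", "Parser")
    # Rule 5: {alt_name} becomes {alt_name:}.
    line = re.sub(r"\{(" + NAME + r")\}", r"{\1:}", line)
    # Rule 5: [elem_name]:element becomes [elem_name:]element.
    line = re.sub(r"\[(" + NAME + r")\]:", r"[\1:]", line)
    # Rule 2: a definition that is a bare decimal code gets a '#'.
    line = re.sub(r"^(\s*" + NAME + r"\s*=\s*)(\d+)(\s*;)", r"\1#\2\3", line)
    return line


if __name__ == "__main__":
    for old in ["  cr = 13;",
                "Productions",
                "    {plus}  [left]:value plus  [right]:value |"]:
        print(convert_line(old))
```

Lines that none of the three rules apply to, such as "  number = digit+;",
pass through unchanged.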

I'll send another message with a more elegant MiniBasic grammar, using
the new syntax.

Etienne

--
Etienne M. Gagnon, Ph.D.
SableCC:                                            http://sablecc.org


Package org.sablecc.minibasic;

Helpers
  letter = ['A'..'Z'];
  digit = ['0'..'9'];
  cr = 13;
  lf = 10;
  not_cr_lf = [[32..127] - [cr + lf]];

Tokens
  if = 'IF';
  then = 'THEN';
  else = 'ELSE';
  endif = 'ENDIF';

  for = 'FOR';
  to = 'TO';
  next = 'NEXT';

  read = 'READ';
  print = 'PRINT';
  println = 'PRINTLN';

  assign = ':=';

  less_than = '<';
  greater_than = '>';
  equal = '=';

  plus = '+';
  minus = '-';
  mult = '*';
  div = '/';
  mod = 'MOD';

  l_par = '(';
  r_par = ')';

  identifier = letter (letter | digit)*;
  number = digit+;
  string = '"' [not_cr_lf - '"']* '"';

  new_line = cr | lf | cr lf;

  blank = ' '*;

Ignored Tokens
  blank;

Productions
  statements =
    {list}  statement statements |
    {empty} ;

  statement =
    {if}         if condition then [nl1]:new_line
                   statements
                   optional_else
                 endif [nl2]:new_line |

    {for}        for identifier assign [from_exp]:expression to [to_exp]:expression [nl1]:new_line
                   statements
                 next [nl2]:new_line |

    {read}       read identifier new_line |

    {print_exp}  print expression new_line |
    {print_str}  print string new_line |
    {println}    println new_line |

    {assignment} identifier assign expression new_line;

  optional_else =
    {else}  else new_line
              statements |
    {empty} ;
 
  condition =
    {less_than}    [left]:expression less_than    [right]:expression |
    {greater_than} [left]:expression greater_than [right]:expression |
    {equal}        [left]:expression equal        [right]:expression;

  expression =
    {value} value |
    {plus}  [left]:value plus  [right]:value |
    {minus} [left]:value minus [right]:value |
    {mult}  [left]:value mult  [right]:value |
    {div}   [left]:value div   [right]:value |
    {mod}   [left]:value mod   [right]:value;

  value =
    {constant}   number |
    {identifier} identifier |
    {expression} l_par expression r_par;

Language minibasic;

Lexer

  letter = 'A'..'Z';
  digit = '0'..'9';
  cr = #13;
  lf = #10;
  not_cr_lf = (#32..#127) - (cr | lf);

  if = 'IF';
  then = 'THEN';
  else = 'ELSE';
  endif = 'ENDIF';

  for = 'FOR';
  to = 'TO';
  next = 'NEXT';

  read = 'READ';
  print = 'PRINT';
  println = 'PRINTLN';

  assign = ':=';

  less_than = '<';
  greater_than = '>';
  equal = '=';

  plus = '+';
  minus = '-';
  mult = '*';
  div = '/';
  mod = 'MOD';

  l_par = '(';
  r_par = ')';

  identifier = letter (letter | digit)*;
  number = digit+;
  string = '"' (not_cr_lf - '"')* '"';

  new_line = cr | lf | cr lf;

  blank = ' '*;

 Ignored
  blank;

Parser

  statements =
    {list:}  statement statements |
    {empty:} ;

  statement =
    {if:}         if condition then [nl1:]new_line
                   statements
                   optional_else
                 endif [nl2:]new_line |

    {for:}        for identifier assign [from_exp:]expression to [to_exp:]expression [nl1:]new_line
                   statements
                 next [nl2:]new_line |

    {read:}       read identifier new_line |

    {print_exp:}  print expression new_line |
    {print_str:}  print string new_line |
    {println:}    println new_line |

    {assignment:} identifier assign expression new_line;

  optional_else =
    {else:}  else new_line
              statements |
    {empty:} ;
 
  condition =
    {less_than:}    [left:]expression less_than    [right:]expression |
    {greater_than:} [left:]expression greater_than [right:]expression |
    {equal:}        [left:]expression equal        [right:]expression;

  expression =
    {value:} value |
    {plus:}  [left:]value plus  [right:]value |
    {minus:} [left:]value minus [right:]value |
    {mult:}  [left:]value mult  [right:]value |
    {div:}   [left:]value div   [right:]value |
    {mod:}   [left:]value mod   [right:]value;

  value =
    {constant:}   number |
    {identifier:} identifier |
    {expression:} l_par expression r_par;


Re: SableCC 4 and old grammar code base

by Etienne M. Gagnon

Here's a more elegant version of the MiniBasic grammar.

Note that I didn't change the language. I could have used new features
to get more powerful expressions, for example. But that was not the
objective here. The objective was only to reassure you that the new syntax
is close enough to the old one to make the transition very, very easy,
both for current SableCC users and for converting old grammars, and that
it is actually much better.

Have fun!

Etienne

Language minibasic;

Lexer

  letter = 'A'..'Z';
  digit = '0'..'9';
  cr = #13;
  lf = #10;
  not_cr_lf = (#32..#127) - (cr | lf);

  identifier = letter (letter | digit)*;
  number = digit+;
  string = '"' (not_cr_lf - '"')* '"';

  new_line = cr | lf | cr lf;

  blank = ' '+;

 Ignored
  blank;

Parser

  statements =
    statement*;

  statement =
    {if:} 'IF' condition 'THEN' new_line
            statements
            else_part?
          'ENDIF' new_line |

    {for:} 'FOR' identifier ':=' [from_exp:]exp 'TO' [to_exp:]exp new_line
                   statements
           'NEXT' new_line |

    {read:} 'READ' identifier new_line |

    {print_exp:} 'PRINT' exp new_line |
    {print_str:} 'PRINT' string new_line |
    {println:}   'PRINTLN' new_line |

    {assignment:} identifier ':=' exp new_line;

  else_part =
    'ELSE' new_line
      statements;
 
  condition =
    {less_than:}    [left_exp:]exp '<' [right_exp:]exp |
    {greater_than:} [left_exp:]exp '>' [right_exp:]exp |
    {equal:}        [left_exp:]exp '=' [right_exp:]exp;

  exp =
    {value:} value |
    {plus:}  [left_exp:]value '+'   [right_exp:]value |
    {minus:} [left_exp:]value '-'   [right_exp:]value |
    {mult:}  [left_exp:]value '*'   [right_exp:]value |
    {div:}   [left_exp:]value '/'   [right_exp:]value |
    {mod:}   [left_exp:]value 'MOD' [right_exp:]value;

  value =
    {constant:}   number |
    {identifier:} identifier |
    {expression:} '(' exp ')';


Re: SableCC 4 and old grammar code base

by Etienne M. Gagnon

Just a note to those who participated in the Unicode identifiers
discussion: I have not forgotten about it. I just changed the proposed
solution. Read below.

The old SableCC approach for identifiers and keywords was simply too
useful to throw away. Using Lexer instead of $lexer is more visually
attractive, for one thing. But mostly, the camel case conversion of old
identifiers was just too useful. Being able to convert some_name to
SomeName without problems (e.g. ambiguous upper case, or no concept of
lower/upper case in some scripts) is very convenient.

So, I decided to retain the old "pure ASCII" identifiers (with the old
rules: no upper case, etc.). But, I also allow for rich identifiers,
made up of Unicode characters. A rich identifier is enclosed within "<"
and ">", and it may not contain the underscore "_" character.

This way, I can get unambiguous conversions and I am also able to
concatenate identifiers to create new names. e.g.

  prod_name = {alt_name:} ... | ...;

Generates: PProdName, AProdName_AltName  (yes, different from SableCC3,
but it eliminates name conflicts).

  <Gagnon> = {<Étienne>:} ... | ...;

Generates: P_Gagnon, A_Gagnon__Étienne. In other words, rich identifiers
are converted by adding a "_" prefix.
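
The naming scheme can be sketched in a few lines of Python. This is only a
model inferred from the four generated names quoted above, not SableCC's
actual implementation; the function names are invented for this example, and
how a plain production combines with a rich alternative name (or the
reverse) is an assumption.

```python
def camel(identifier: str) -> str:
    """Convert a pure ASCII identifier: some_name -> SomeName."""
    return "".join(part.capitalize() for part in identifier.split("_"))


def is_rich(identifier: str) -> bool:
    """A rich identifier is enclosed within '<' and '>'."""
    return identifier.startswith("<") and identifier.endswith(">")


def name_part(identifier: str) -> str:
    # Rich identifiers keep their spelling and get a "_" prefix;
    # pure ASCII identifiers are converted to camel case.
    if is_rich(identifier):
        return "_" + identifier[1:-1]
    return camel(identifier)


def production_class(prod: str) -> str:
    return "P" + name_part(prod)


def alternative_class(prod: str, alt: str) -> str:
    # The "_" separator keeps production and alternative names apart.
    return "A" + name_part(prod) + "_" + name_part(alt)


if __name__ == "__main__":
    print(production_class("prod_name"), alternative_class("prod_name", "alt_name"))
    print(production_class("<Gagnon>"), alternative_class("<Gagnon>", "<Étienne>"))
```

Because a rich identifier may not contain "_", the underscores in a
generated name can only come from the prefix or the separator, which is what
keeps the conversion unambiguous.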

This way, we (hopefully) get to please everybody. We make things easy
for normal uses, and possible for complex uses.

Etienne

RE: SableCC 4 and old grammar code base

by Bergmann, Seth

Wow, that's interesting! That means I'll need to update my Compiler Design textbook to use SableCC version 4, but I'll also need to maintain the existing book for people who are still using SableCC version 3. It's a good thing the whole book is available on the web, so that users can select whichever version they wish.



Seth D. Bergmann             Associate Professor
Computer Science             bergmann@...
Rowan University               856-256-4500  ext. 3197
Glassboro NJ 08028          Fax: 856-256-4741
-----Original Message-----
From: sablecc-discussion-bounces+bergmann=rowan.edu@... [mailto:sablecc-discussion-bounces+bergmann=rowan.edu@...] On Behalf Of Etienne M. Gagnon
Sent: Friday, March 06, 2009 10:22 AM
To: Discussion mailing list for the SableCC project
Subject: Re: SableCC 4 and old grammar code base

[...]

Re: SableCC 4 and old grammar code base

by Etienne M. Gagnon

Hi Seth,

You should definitely add a link to your book on
http://sablecc.org/wiki/DocumentationPage . You should probably create a
"Books" section.

Have fun!

Etienne

RE: SableCC 4 and old grammar code base

by Bergmann, Seth

Great idea, Etienne!  

I also included Andrew Appel's book, which uses SableCC, on the wiki
page.

Sincerely,


Seth D. Bergmann             Associate Professor
Computer Science             bergmann@...
Rowan University               856-256-4500  ext. 3197
Glassboro NJ 08028          Fax: 856-256-4741
-----Original Message-----
From: sablecc-discussion-bounces+bergmann=rowan.edu@...
[mailto:sablecc-discussion-bounces+bergmann=rowan.edu@...]
On Behalf Of Etienne M. Gagnon
Sent: Friday, March 06, 2009 3:49 PM
To: Discussion mailing list for the SableCC project
Subject: Re: SableCC 4 and old grammar code base

[...]