problem where I can't ignore text / tokens


by twashing

Hey there, 

I'm having a problem where I want the same behaviour from the 'xpath_chars' token as from the 'word' token. Both 'word' and 'xpath_chars' are used in my grammar file (see fig. 0). 

GOOD -> In the Productions below, when I use a 'word' token between quotes, all the characters are ignored (see fig. 1), or at least given no meaning in the 'freetext' state.

BAD -> But when I try the same thing with my xpath_chars token, the parser fails (see fig. 2). It does work when I only use 'word' tokens (see fig. 3).

QUESTION -> Is there a way to ignore any character between backquotes ' ` '? I just can't get this one to work (grammar file attached).


Tokens
...
{bkeeping -> freetext, freetext -> bkeeping} quote = ( '"' | ''' ); 
{bkeeping -> freetext, freetext -> bkeeping} backquote = '`'; 
word = ( lowercase | uppercase | dash | underscore | digit | dot )+; 
xpath_chars = ( ats | forwardslash | colon_helper | left_bracket | right_bracket | lsquare_bracket | rsquare_bracket | equals_helper | double_quote | single_quote ); 
xmlns = 'xmlns'; 
decl_xml = 'xml';
decl_dtd = 'DOCTYPE'; 
eoll = (cr | lf | cr lf)?;
Productions 
... 

fig. 0


create ( <journal id="thing-thing" /> ); // [ok] word is used between the double quotes "
fig. 1



load ( `/system[@id='main.system']` ); // [x] xpath goes in between the back quotes `

ERROR [Thread-3] (Bkell.java:161) - [11,9] expecting: '`', word, xpath chars
com.interrupt.bookkeeping.cc.parser.ParserException: [11,9] expecting: '`', word, xpath chars
at com.interrupt.bookkeeping.cc.parser.Parser.parse(Parser.java:998)
at com.interrupt.bookkeeping.cc.bkell.Bkell.run(Bkell.java:135)
at java.lang.Thread.run(Thread.java:613)
fig. 2 


load ( `` ); 
load ( `asdf` ); 
fig. 3


Thanks in advance
Tim




_______________________________________________
SableCC-Discussion mailing list
SableCC-Discussion@...
http://lists.sablecc.org/listinfo/sablecc-discussion

bkeeping.cc (24K)

Re: problem where I can't ignore text / tokens

by Etienne M. Gagnon

Hi Timothy,

The easiest way to investigate lexer problems is to use a debug lexer, such as the one proposed in:
 http://lists.sablecc.org/pipermail/sablecc-discussion/msg00311.html

I've attached a minimal Main.java file to test your grammar with an anonymous debug lexer. Running it on your example reveals the problem: the "/" is matched to a "TFslash" instead of a "TXpathChars".

Do not forget that, with SableCC 2 & 3, if no state is specified before a token, then this token is matched in all states. Also, the lexer always takes the longest match, and if two tokens match the same longest string, the one declared first wins.
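These two rules can be modelled in a few lines. The Java sketch below is a hypothetical toy, not SableCC's generated lexer: each token is a regex plus an optional set of states (an empty set means "no state specified", so the token is active everywhere), the longest match wins, and ties go to the token declared first. The regexes given for fslash, xpath_chars and word are simplified assumptions loosely based on the grammar in the first message; only the token names come from the thread.

```java
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of SableCC-style token selection; hypothetical, not generated SableCC code.
class TokenChoice {

    // A token definition: name, regex and the lexer states it is active in.
    // An empty state set models "no state specified", i.e. active in all states.
    static final class Rule {
        final String name;
        final Pattern pattern;
        final Set<String> states;

        Rule(String name, String regex, Set<String> states) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.states = states;
        }

        boolean activeIn(String state) {
            return states.isEmpty() || states.contains(state);
        }
    }

    // Returns the token chosen at the start of `input` in lexer state `state`:
    // the longest match wins; on equal length, the rule declared first wins.
    static String pick(List<Rule> rules, String state, String input) {
        String best = null;
        int bestLength = 0;
        for (Rule rule : rules) { // rules are visited in declaration order
            if (!rule.activeIn(state)) {
                continue;
            }
            Matcher m = rule.pattern.matcher(input);
            // Strict '>' keeps the earlier rule when a later one only ties.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.name;
                bestLength = m.end();
            }
        }
        return best;
    }

    // Simplified stand-ins for three tokens from the question (the regexes are assumptions).
    static List<Rule> exampleRules(Set<String> fslashStates) {
        return List.of(
                new Rule("fslash", "/", fslashStates),
                new Rule("xpath_chars", "[@/:()\\[\\]=\"']", Set.of()),
                new Rule("word", "[A-Za-z0-9._-]+", Set.of()));
    }

    public static void main(String[] args) {
        // fslash is declared first and active everywhere, so it wins the tie on "/".
        System.out.println(pick(exampleRules(Set.of()), "freetext", "/system"));
        // Limited to the "bkeeping" state, fslash no longer competes in "freetext".
        System.out.println(pick(exampleRules(Set.of("bkeeping")), "freetext", "/system"));
    }
}
```

Running main prints fslash and then xpath_chars. Restricting fslash to a state is only one possible fix; the thread does not say which change was finally made to the grammar.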

Have fun!

Etienne


-- 
Etienne M. Gagnon, Ph.D.
SableCC: http://sablecc.org

import com.interrupt.bookkeeping.cc.parser.*;
import com.interrupt.bookkeeping.cc.lexer.*;
import com.interrupt.bookkeeping.cc.node.*;
import com.interrupt.bookkeeping.cc.analysis.*;

import java.io.*;

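public class Main {

    public static void main(
            String[] args)
            throws Exception {

        // Wrap the generated Lexer in an anonymous subclass that reads from
        // standard input, and hand it to the generated Parser.
        new Parser(new Lexer(new PushbackReader(
                new InputStreamReader(System.in), 1024)) {

            // filter() is called for every token the lexer produces; here it
            // prints the token class, the current lexer state and the matched text.
            protected void filter() {

                System.out.println(token.getClass() + ", state : " + state.id()
                        + ", text : [" + token.getText() + "]");
            }
        }).parse();
    }
}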



Re: problem where I can't ignore text / tokens

by twashing
Yessss, thanks very much. I can now mix the languages ('mylang', XML and XPath), as in the statements below. It turns out that the debug lexer gave me the information I needed to sort out the token mismatches.

login ( user -username root -password password ); 
create ( <journal xmlns='com/interrupt/bookkeeping/journal' id='new.journal' /> ); 
load ( `/` ); 
load ( `/system[ @id='main.system' and date='01/01/2009'  ]/groups[@id='main.groups']` ); 
load ( <journal xmlns='com/interrupt/bookkeeping/journal' id='new.journal' /> ); 

commit ( 
(`/system[@id='main.system']/groups[@id='main.groups']/group[@id='webkell']/bookkeeping[@id='main.bookkeeping']/journals[@id='main.journals']`) 
<journal xmlns='com/interrupt/bookkeeping/journal' id='new.journal' /> ); 

Cheers
Tim




From: Etienne M. Gagnon <egagnon@...>
To: Discussion mailing list for the SableCC project <sablecc-discussion@...>
Sent: Wednesday, February 11, 2009 4:40:13 PM
Subject: Re: problem where I can't ignoring text / tokens


Shift reduce problem?

by Christopher Van Kirk


Hi there.

I'm trying to implement the null coalescing operator in a little expression language we have, and I'm running into a shift/reduce conflict that I don't understand.

The grammar in question is below, shortened a bit to focus on the problem at hand. We're still on the old SableCC 3.2 release, if that helps.

The error message follows:

shift/reduce conflict in state [stack: PCNullCoalesceExpr *] on TTQuestion in {
        [ PCExpr = PCNullCoalesceExpr * TTQuestion PCExpr TTColon PCExpr ] (shift),
        [ PCExpr = PCNullCoalesceExpr * ] followed by TTQuestion (reduce)
}
        at org.sablecc.sablecc.GenParser.caseStart(GenParser.java:227)
        at org.sablecc.sablecc.node.Start.apply(Start.java:33)
        at org.sablecc.sablecc.SableCC.processGrammar(SableCC.java:391)
        at org.sablecc.sablecc.SableCC.processGrammar(SableCC.java:280)
        at org.sablecc.sablecc.SableCC.main(SableCC.java:220)


and the grammar:

--------------------------

Package Expressions;

Helpers

unicode_input_character  =  [ 0 .. 0xffff ];

tab=0x0009;
lf=0x000a;
cr=0x000d;
eol = [[cr + lf] + [cr + lf]];
white = [[' ' + tab] + eol];

input_character  =  [ unicode_input_character - [ cr + lf ] ];

escape_sequence  =  '\b' | '\t' | '\n' | '\f' | '\r' | '\"' | '\' | ''' | '\\';

string_character = [ input_character - [ '"' + '\' ] ] | escape_sequence;

single_character  =  [ input_character - [ ''' + '\' ] ]  ;

alpha = [['A'..'Z'] + ['a'..'z']];
numeral = ['0'..'9'];
alphanumeric = [numeral + alpha];

// override each letter of the alphabet to ensure case insensitivity for keywords.

a = 'a' | 'A';
b = 'b' | 'B';
c = 'c' | 'C';
d = 'd' | 'D';
e = 'e' | 'E';
f = 'f' | 'F';
g = 'g' | 'G';
h = 'h' | 'H';
i = 'i' | 'I';
j = 'j' | 'J';
k = 'k' | 'K';
l = 'l' | 'L';
m = 'm' | 'M';
n = 'n' | 'N';
o = 'o' | 'O';
p = 'p' | 'P';
q = 'q' | 'Q';
r = 'r' | 'R';
s = 's' | 'S';
t = 't' | 'T';
u = 'u' | 'U';
v = 'v' | 'V';
w = 'w' | 'W';
x = 'x' | 'X';
y = 'y' | 'Y';
z = 'z' | 'Z';

States

base;

Tokens

white = white+;

t_double_question = '??';
t_question = '?';
t_colon = ':';
t_shim = s h i m;

Ignored Tokens

white, line_comment, multiline_comment;

Productions

c_expr {-> a_expr }
        = [expr]:c_conditional_expr
                                                        {-> New a_expr( expr.a_conditional_expr ) }
;

c_conditional_expr {-> a_conditional_expr }
        = { q_passthrough } [expr]:c_null_coalesce_expr
                                                        {-> New a_conditional_expr.passthrough( expr.a_null_coalesce_expr ) }
        | { q_conditional } [if_expr]:c_null_coalesce_expr t_question [true_expr]:c_expr t_colon [false_expr]:c_expr
                                                        {-> New a_conditional_expr.conditional( if_expr.a_null_coalesce_expr, true_expr.a_expr, false_expr.a_expr ) }
;

c_null_coalesce_expr {-> a_null_coalesce_expr }
        = { q_passthrough } [expr]:c_conditional_or_expr
                                                        {-> New a_null_coalesce_expr.passthrough( expr.a_conditional_or_expr ) }
        | { q_null_coalesce } [left]:c_conditional_or_expr t_double_question [right]:c_expr
                                                        {-> New a_null_coalesce_expr.coalesce( left.a_conditional_or_expr, right.a_expr ) }
;

c_conditional_or_expr {-> a_conditional_or_expr }
        = { temp } t_shim
                                                        {-> New a_conditional_or_expr.shim( t_shim ) }
;

Abstract Syntax Tree

a_expr
        = [expr]:a_conditional_expr
;

a_conditional_expr
        = { passthrough } [expr]:a_null_coalesce_expr
        | { conditional } [if_expr]:a_null_coalesce_expr [true_expr]:a_expr [false_expr]:a_expr
;

a_null_coalesce_expr
        = { passthrough } [expr]:a_conditional_or_expr
        | { coalesce } [left]:a_conditional_or_expr [right]:a_expr
;

a_conditional_or_expr
        = { shim } t_shim
;


_______________________________________________
SableCC-Discussion mailing list
SableCC-Discussion@...
http://lists.sablecc.org/listinfo/sablecc-discussion

How to write an unambiguous expression grammar [was: Shift reduce problem?]

by Etienne M. Gagnon

Hi Chris,

The conflict is due to the well-known expression grammar ambiguity.

Here's the idea. A typical expression grammar looks like:

exp =
 {add} exp plus exp |
 {sub} exp minus exp |
 {mul} exp star exp |
 {div} exp slash exp |
 {num} number;

This grammar is ambiguous for 2 reasons:
1- operator precedence: 5 + 2 * 3 usually means 5 + (2 * 3), not (5 + 2) * 3
2- associativity: 5 - 3 - 2 usually means (5 - 3) - 2, not 5 - (3 - 2)

The grammar allows for all these interpretations (resulting in different
syntax trees for the same input text). This is obviously undesirable! We
don't want the parser to randomly select one interpretation: 5 + 2 * 3
== 11 or 21 depending on some random parsing choice...
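The arithmetic behind this is easy to check. The Java sketch below is hypothetical and not part of SableCC: it builds by hand the two trees that the ambiguous grammar allows for each input and evaluates them, showing that a parser free to pick either tree could produce either result.

```java
// Two different syntax trees for the same token string, evaluated to show that the
// ambiguous grammar's choice of tree changes the result (hypothetical sketch).
class AmbiguityDemo {

    // Minimal expression tree: every node can evaluate itself.
    interface Node {
        int eval();
    }

    static Node num(int value) {
        return () -> value;
    }

    static Node bin(Node left, char op, Node right) {
        return () -> {
            int a = left.eval();
            int b = right.eval();
            if (op == '+') return a + b;
            if (op == '-') return a - b;
            if (op == '*') return a * b;
            throw new IllegalArgumentException("unsupported operator: " + op);
        };
    }

    // "5 + 2 * 3" read as 5 + (2 * 3): the usual precedence.
    static final Node MUL_FIRST = bin(num(5), '+', bin(num(2), '*', num(3)));
    // "5 + 2 * 3" read as (5 + 2) * 3: also accepted by the ambiguous grammar.
    static final Node ADD_FIRST = bin(bin(num(5), '+', num(2)), '*', num(3));
    // "5 - 3 - 2" read as (5 - 3) - 2: left associativity.
    static final Node LEFT_ASSOC = bin(bin(num(5), '-', num(3)), '-', num(2));
    // "5 - 3 - 2" read as 5 - (3 - 2): right associativity.
    static final Node RIGHT_ASSOC = bin(num(5), '-', bin(num(3), '-', num(2)));

    public static void main(String[] args) {
        System.out.println(MUL_FIRST.eval() + " vs " + ADD_FIRST.eval());     // 11 vs 21
        System.out.println(LEFT_ASSOC.eval() + " vs " + RIGHT_ASSOC.eval()); // 0 vs 4
    }
}
```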

To solve the ambiguity, operator precedence and associativity must first
be determined. So, let's decide them for the above grammar.

Priority  // highest to lowest precedence
  Left mul, div;
  Left add, sub;

The trick, now, is to rewrite the grammar using one production per
precedence level, starting with the LOWEST priority. Left associativity
corresponds to left recursion, and right associativity to right recursion.

exp =  // lowest priority, left associative
  {add} exp plus factor |
  {sub} exp minus factor |
  {simple} factor;

factor = // left associative
  {mul} factor star term |
  {div} factor slash term |
  {simple} term;

term =
  {num} number |
  {par} l_par exp r_par;

Notes:
1- You must add a "simple" alternative at each precedence level for
expressions that do not use the current level's operators. E.g. 5 * 2 is
an expression with neither addition nor subtraction.
2- The atomic production "term" may be neither left- nor right-recursive.
3- Usually, the parenthesized expression is added as a term, for
expressiveness. If I want to express (5 + 2) * 3, I need parentheses.
Conveniently, "l_par exp r_par" is neither left- nor right-recursive. So,
it is a valid term.
4- The leftmost and rightmost elements of an alternative may only be the
current production (recursion) or the next-level production.
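SableCC builds an LALR parser directly from such a grammar, so the following is only a hand-written analogue, not SableCC output: a recursive-descent evaluator in Java with one method per precedence level (exp, factor, term). Recursive descent cannot use left recursion directly, so the left-recursive alternatives are written as loops, which give the same left-associative grouping. The tokenizer and the integer-only arithmetic are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written recursive-descent analogue of the layered grammar (hypothetical, not SableCC output).
class LayeredEval {
    private final List<String> tokens;
    private int pos = 0;

    private LayeredEval(List<String> tokens) {
        this.tokens = tokens;
    }

    // Splits the input into integers, the operators + - * / and parentheses.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                out.add(s.substring(start, i));
            } else if ("+-*/()".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    static int eval(String input) {
        LayeredEval p = new LayeredEval(tokenize(input));
        int value = p.exp();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("trailing input");
        return value;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    // exp = exp plus factor | exp minus factor | factor  (lowest precedence, left associative)
    private int exp() {
        int value = factor();
        while ("+".equals(peek()) || "-".equals(peek())) {
            String op = next();
            int rhs = factor();
            value = op.equals("+") ? value + rhs : value - rhs;
        }
        return value;
    }

    // factor = factor star term | factor slash term | term  (left associative)
    private int factor() {
        int value = term();
        while ("*".equals(peek()) || "/".equals(peek())) {
            String op = next();
            int rhs = term();
            value = op.equals("*") ? value * rhs : value / rhs;
        }
        return value;
    }

    // term = number | l_par exp r_par  (neither left- nor right-recursive)
    private int term() {
        String t = next();
        if (t.equals("(")) {
            int value = exp();
            if (!")".equals(next())) throw new IllegalArgumentException("expected )");
            return value;
        }
        return Integer.parseInt(t);
    }

    public static void main(String[] args) {
        System.out.println(eval("5 + 2 * 3"));   // 11
        System.out.println(eval("5 - 3 - 2"));   // 0
        System.out.println(eval("(5 + 2) * 3")); // 21
    }
}
```

Each method only calls itself (through the loop) or the next level down, mirroring rule "4-", and the parenthesized term is the only place where the lowest level is re-entered.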

Your grammar breaks rule "4-", so you are getting a conflict message
related to some resulting ambiguity.

By applying the above approach, the conflict should disappear (e.g.
decide on precedence and associativity of t_question and
t_double_question and probably add a parenthesized expression term, in
addition to t_shim, for expressiveness).
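Applied to this case, one possible layering is sketched below under assumptions the thread leaves open: the conditional operator gets the lowest precedence, '??' the next one, and both are right-associative (the choice C# makes). This is a Java recursive-descent illustration, not a SableCC grammar; plain identifiers stand in for the t_shim operand, and all method names are hypothetical. It returns a fully parenthesized string so the chosen grouping is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recursive-descent sketch (not SableCC output) with two precedence levels:
//   conditional = coalesce '?' conditional ':' conditional | coalesce  (lowest, right assoc)
//   coalesce    = operand '??' coalesce | operand                      (right assoc)
//   operand     = identifier | '(' conditional ')'
class CoalesceGrouping {
    private final List<String> tokens;
    private int pos = 0;

    private CoalesceGrouping(List<String> tokens) {
        this.tokens = tokens;
    }

    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (s.startsWith("??", i)) { // longest match: "??" is tried before "?"
                out.add("??");
                i += 2;
            } else if ("?:()".indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
                out.add(s.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }

    static String group(String input) {
        CoalesceGrouping p = new CoalesceGrouping(tokenize(input));
        String tree = p.conditional();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("trailing input");
        return tree;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String t = next();
        if (!t.equals(expected)) throw new IllegalArgumentException("expected " + expected + ", found " + t);
    }

    // Leftmost element is the next level down; rightmost is this level (right recursion).
    private String conditional() {
        String test = coalesce();
        if (!"?".equals(peek())) return test;
        next();
        String ifTrue = conditional();
        expect(":");
        String ifFalse = conditional();
        return "(" + test + " ? " + ifTrue + " : " + ifFalse + ")";
    }

    // The right operand recurses onto coalesce itself, not back to the top-level expression.
    private String coalesce() {
        String left = operand();
        if (!"??".equals(peek())) return left;
        next();
        String right = coalesce();
        return "(" + left + " ?? " + right + ")";
    }

    private String operand() {
        String t = next();
        if (t.equals("(")) {
            String inner = conditional();
            expect(")");
            return inner;
        }
        if (!Character.isLetter(t.charAt(0))) throw new IllegalArgumentException("unexpected token: " + t);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(group("a ?? b ? c : d"));    // ((a ?? b) ? c : d)
        System.out.println(group("a ?? b ?? c"));       // (a ?? (b ?? c))
        System.out.println(group("a ? b : c ? d : e")); // (a ? b : (c ? d : e))
    }
}
```

The key point is that the right operand of '??' recurses onto the coalesce level itself rather than back to the top-level expression, which is what rule "4-" requires.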

Have fun!

Etienne


Re: How to write an unambiguous expression grammar [was: Shift reduce problem?]

by Christopher Van Kirk


Thanks Etienne, that solved it. The null coalesce production looped back to expr when it should have recursed onto itself. Much appreciated as usual!


--- On Thu, 2/26/09, Etienne M. Gagnon <egagnon@...> wrote:

> From: Etienne M. Gagnon <egagnon@...>
> Subject: How to write an unambiguous expression grammar [was: Shift reduce problem?]
> To: "Discussion mailing list for the SableCC project" <sablecc-discussion@...>
> Date: Thursday, February 26, 2009, 11:14 PM