LOOKAHEAD problem (long)

by Kenneth R Beesley


Dear JavaCC Users,

I thought that I understood JavaCC LOOKAHEAD pretty well, but lately
I've been stumped by a problem.  I have a LOOKAHEAD at a decision
point, and it seems to be failing when (as far as I can see) it should
be succeeding.  I wonder if someone on this list could help me spot or
better trace the problem.

Background:

I'm using JavaCC 4.2 on OS X.  I've been using JavaCC successfully for
years.

My new programming language (dubbed Kleene) has a 'return' statement
that optionally contains an expression(), which might be a regular
expression (regexp()), an arithmetic expression (numexp()), or one of
several other types of expression, e.g.

return ;                    // void return (no expression)
return 2 + 2 ;              // return with a numexp()
return a*b+(c|d|e|f)? ;     // return with a regexp()

The JavaCC production for return_statement() (with a printout for
tracing) looks like this:

void return_statement() #return_statement :
{ System.out.println("***** In return_statement() *****") ; }
{
     <RETURN_RW> ( expression() <SEMICOLON>
                 | <SEMICOLON>
                 )
}

There are various kinds of expression, so the production for
expression() represents a decision point and contains a number of
LOOKAHEADs.  The key LOOKAHEAD in this production is the first one:

void expression(): { System.out.println("***** In expression() *****") ; }
{

     // if a regexp() can be parsed, then commit the parser to parsing a regexp()
     LOOKAHEAD(regexp())
     regexp() { System.out.println("***** After regexp() yy *****") ; }
|   LOOKAHEAD(numexp())
     numexp()
|   LOOKAHEAD(net_func_exp())
     net_func_exp()  { System.out.println("***** After net_func_exp() yy *****") ; }
|   LOOKAHEAD(num_func_exp())
     num_func_exp()
|   LOOKAHEAD(net_arr_exp())
     net_arr_exp()
|   LOOKAHEAD(num_arr_exp())
     num_arr_exp()
|   LOOKAHEAD(net_func_func_exp())
     net_func_func_exp()
|   LOOKAHEAD(num_func_func_exp())
     num_func_func_exp()
|   LOOKAHEAD(net_arr_func_exp())
     net_arr_func_exp()
|   LOOKAHEAD(num_arr_func_exp())
     num_arr_func_exp()
}

The first LOOKAHEAD(regexp()) in expression() is a syntactic
lookahead, saying "if lookahead determines that a full regular
expression _can_ be parsed from this point, then commit the parser to
parsing a regular expression".  Regular expressions are rather
complicated in Kleene, but they generally parse correctly.  One type of
primary regular expression is a net_func_call() (i.e. a function call
that returns a finite-state network) of the form

net_func_exp() arg_list()

e.g.

$&myFunctionName($myArg1, $myArg2)

where $&myFunctionName is a net_func_id(), the name of a function,
which is the simplest kind of net_func_exp(), and
($myArg1, $myArg2) is an arg_list().
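
To make the intended reading of LOOKAHEAD(regexp()) concrete, here is a
minimal Java sketch of scan-then-commit behavior at an ordered choice.
It is hand-written for illustration only: it is not JavaCC-generated
code and not the Kleene parser.  The class LookaheadSketch, its string
tokens, and the drastically simplified regexp() (only a net_func_id()
followed by a parenthesized arg_list()) are assumptions made for the
example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: hand-written scan-then-commit logic,
// not JavaCC-generated code and not the Kleene grammar.
class LookaheadSketch {
    private final List<String> tokens;
    private int pos = 0;   // index of the next unconsumed token

    LookaheadSketch(String... tokens) { this.tokens = Arrays.asList(tokens); }

    // Scan routines return the index just past the construct, or -1 if it
    // cannot be matched starting at 'from'.  They never move 'pos'.
    private int scanNetFuncId(int from) {
        return from < tokens.size() && tokens.get(from).startsWith("$&") ? from + 1 : -1;
    }

    // Simplified arg_list(): "(" followed by anything up to the first ")".
    private int scanArgList(int from) {
        if (from >= tokens.size() || !tokens.get(from).equals("(")) return -1;
        for (int i = from + 1; i < tokens.size(); i++) {
            if (tokens.get(i).equals(")")) return i + 1;
        }
        return -1;
    }

    // Simplified regexp(): only the net_func_call form, net_func_id arg_list.
    private int scanRegexp(int from) {
        int afterId = scanNetFuncId(from);
        return afterId < 0 ? -1 : scanArgList(afterId);
    }

    // Mirrors:  LOOKAHEAD(regexp()) regexp() | LOOKAHEAD(net_func_exp()) net_func_exp()
    String expression() {
        int end = scanRegexp(pos);                      // speculative scan, nothing consumed
        if (end >= 0) { pos = end; return "regexp"; }   // scan succeeded: commit
        end = scanNetFuncId(pos);                       // later alternative, tried only on failure
        if (end >= 0) { pos = end; return "net_func_exp"; }
        throw new IllegalStateException("no alternative matches at token " + pos);
    }

    int consumed() { return pos; }

    public static void main(String[] args) {
        LookaheadSketch full =
            new LookaheadSketch("$&f1", "(", "$a1", ",", "$opt", "=", "$a2", ")", ";");
        System.out.println(full.expression() + " consumed " + full.consumed() + " tokens");

        // Same prefix, but the argument list cannot be scanned to a ")".
        LookaheadSketch prefixOnly = new LookaheadSketch("$&f1", "(", "$a1", ";");
        System.out.println(prefixOnly.expression() + " consumed " + prefixOnly.consumed() + " tokens");
    }
}
```

In this sketch the complete call commits to the first alternative and
consumes all eight tokens up to the semicolon.  When the argument list
cannot be scanned, the choice falls through to the later alternative,
which matches only the prefix $&f1.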

Here is the production for primary_regexp(), a kind of regexp(), with  
net_func_call() at the end:


void primary_regexp(): { System.out.println("***** In primary_regexp() *****") ; }
{

   lit_char()                    // e.g.  a b c
| multichar_symbol()            // e.g.  '[Noun]'   a single symbol
                                //       with a multicharacter print name
| net_id()                      // e.g.  $mynet     a variable with a
                                //       finite-state-network value

// $>Foo can appear only in rrprod_definition
| LOOKAHEAD({ getToken(1).kind == RRPROD_ID && parsing_rrprod_def == true })
   rrprod_id()

| any()                       // "dot" (.) match any char
| <LPAREN> regexp() <RPAREN>
| double_quoted_string()
| epsilon()                   // U+03F5 GREEK LUNATE EPSILON SYMBOL
                               //    to represent the empty string
| char_union() // [a-z]
| complement_char_union() // [^a-z]

| net_func_call()    { System.out.println("***** After net_func_call() *****") ; }
// $&foo(args)      $&lambda(params){block}(args)

}

And here are the productions for net_func_call() and net_func_exp():

void net_func_call() #net_func_call :
{ System.out.println("***** In net_func_call() *****") ; }
{
    net_func_exp()  { System.out.println("***** Found net_func_exp() *****") ; }
    arg_list()      { System.out.println("***** Found arg_list() *****") ; }
}

void net_func_exp() #net_func_exp :
{ System.out.println("***** In net_func_exp() *****") ; }
{
  net_func_id()  { System.out.println("***** After net_func_id() xx *****") ; }  // e.g.  $&myFuncName
| net_func_lambda_exp()
| net_func_func_call()
}

With the following Kleene script as input, the first
LOOKAHEAD(regexp()), seen at the top of the expression() production
above, should succeed after the second 'return' keyword, but it is
failing:

// Start of script

$&f1($req, $opt=a) { return $req $opt ; }

// This first statement defines a function, named $&f1, that returns a
// finite-state network.
// This first statement parses perfectly.  The expression() after the
// 'return' keyword is correctly determined, via LOOKAHEAD(regexp()),
// to be a regular expression, the parser is committed to parsing a
// regexp(), and all works perfectly.

// But in the following second function definition (for a function
// named $&f2)

$&f2($a1, $a2) { return $&f1($a1, $opt=$a2) ; }

//  LOOKAHEAD(regexp()) should succeed after the 'return' keyword, but
//  doesn't.
//  End of Script

In the second function definition, the expression after 'return'

$&f1($a1, $opt=$a2)

is a regexp(), being a net_func_call(), consisting of

1.  a net_func_exp(), here a net_func_id(), which is '$&f1', and
2.  an arg_list(), which is '($a1, $opt=$a2)'.

As best I can tell, the LOOKAHEAD(regexp()) in expression() is failing
where it is expected (by me) to succeed, and a later
LOOKAHEAD(net_func_exp()) is succeeding, as shown in the ***** comments
below:


void expression(): { System.out.println("***** In expression() *****") ; }
{

     // *****  this first LOOKAHEAD is failing for  $&f1($a1, $opt=$a2)

     // if a regexp() can be parsed, then commit the parser to parsing a regexp()
     LOOKAHEAD(regexp())
     regexp() { System.out.println("***** After regexp() yy *****") ; }
|   LOOKAHEAD(numexp())
     numexp()

     // *****  and this later LOOKAHEAD(net_func_exp()) is succeeding for
     // the prefix  $&f1,  which is a net_func_id(), which is indeed a
     // kind of net_func_exp()

|   LOOKAHEAD(net_func_exp())
     net_func_exp()  { System.out.println("***** After net_func_exp() yy *****") ; }
|   LOOKAHEAD(num_func_exp())
     num_func_exp()
|   LOOKAHEAD(net_arr_exp())
     net_arr_exp()
|   LOOKAHEAD(num_arr_exp())
     num_arr_exp()
|   LOOKAHEAD(net_func_func_exp())
     net_func_func_exp()
|   LOOKAHEAD(num_func_func_exp())
     num_func_func_exp()
|   LOOKAHEAD(net_arr_func_exp())
     net_arr_func_exp()
|   LOOKAHEAD(num_arr_func_exp())
     num_arr_func_exp()
}

As best I can tell, LOOKAHEAD(regexp()) in expression() is failing for

$&f1($a1, $opt=$a2)

because it fails to match the arg_list()  ($a1, $opt=$a2).  Here are the
productions for arg_list(), positional_args() and named_args():

void arg_list() #arg_list :
{ System.out.println("***** In arg_list() *****") ; }
{
    <LPAREN> ( LOOKAHEAD(1)
               (
                 LOOKAHEAD( { getToken(2).kind != EQUAL_SIGN } )
                     positional_args() ( <COMMA_OP> named_args() )?

               | named_args()
               )
             )?
    <RPAREN>
}

void positional_args() #positional_args :
{ System.out.println("***** In positional_args() *****") ; }
{
    expression() ( LOOKAHEAD( { getToken(1).kind == COMMA_OP &&
                                getToken(3).kind != EQUAL_SIGN } )
                   <COMMA_OP> expression()
                 )*
}

void named_args() #named_args :
{ System.out.println("***** In named_args() *****") ; }
{
    id_with_assignment() ( LOOKAHEAD(1) <COMMA_OP> id_with_assignment() )*
}

The arg_list()  ($a1, $opt=$a2)  consists of a positional_args()
(containing one expression(), $a1) and a named_args() containing one
id_with_assignment(), namely  $opt=$a2.
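
The decomposition just described can be checked by hand-tracing the
semantic lookaheads.  The Java sketch below does that for the token
sequence  ( $a1 , $opt = $a2 ).  It is an illustration under stated
assumptions, not the generated parser: the class ArgListTrace and its
Kind enum are hypothetical, each expression() is reduced to a single
NET_ID token, the LOOKAHEAD(1) on the optional group is reduced to a
check for ")", and getToken(n) is assumed to count from the last
consumed token, as it does during normal (non-lookahead) parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the arg_list() decisions.  Token kinds are assigned by
// hand, and getToken(n) is assumed to count from the last consumed token.
class ArgListTrace {
    enum Kind { LPAREN, RPAREN, NET_ID, COMMA_OP, EQUAL_SIGN, EOF }

    private final Kind[] toks;
    private int next = 0;                        // index of the next unconsumed token
    final List<String> trace = new ArrayList<>();

    ArgListTrace(Kind... toks) { this.toks = toks; }

    // getToken(1) is the next unconsumed token, getToken(2) the one after it, etc.
    private Kind getToken(int n) {
        int i = next + n - 1;
        return i < toks.length ? toks[i] : Kind.EOF;
    }

    private void consume(Kind expected) {
        if (getToken(1) != expected) {
            throw new IllegalStateException("expected " + expected + " but saw " + getToken(1));
        }
        next++;
    }

    void argList() {
        consume(Kind.LPAREN);
        if (getToken(1) != Kind.RPAREN) {              // optional group is present
            if (getToken(2) != Kind.EQUAL_SIGN) {      // semantic lookahead in arg_list()
                positionalArgs();
                if (getToken(1) == Kind.COMMA_OP) {    // ( <COMMA_OP> named_args() )?
                    consume(Kind.COMMA_OP);
                    namedArgs();
                }
            } else {
                namedArgs();
            }
        }
        consume(Kind.RPAREN);
    }

    // Each expression() is simplified to a single NET_ID token such as $a1.
    private void positionalArgs() {
        consume(Kind.NET_ID);
        trace.add("positional");
        // semantic lookahead in positional_args()
        while (getToken(1) == Kind.COMMA_OP && getToken(3) != Kind.EQUAL_SIGN) {
            consume(Kind.COMMA_OP);
            consume(Kind.NET_ID);
            trace.add("positional");
        }
    }

    private void namedArgs() {
        idWithAssignment();
        while (getToken(1) == Kind.COMMA_OP) {
            consume(Kind.COMMA_OP);
            idWithAssignment();
        }
    }

    // Simplified id_with_assignment(): NET_ID EQUAL_SIGN NET_ID, e.g. $opt=$a2.
    private void idWithAssignment() {
        consume(Kind.NET_ID);
        consume(Kind.EQUAL_SIGN);
        consume(Kind.NET_ID);
        trace.add("named");
    }

    public static void main(String[] args) {
        // ( $a1 , $opt = $a2 )
        ArgListTrace t = new ArgListTrace(Kind.LPAREN, Kind.NET_ID, Kind.COMMA_OP,
                Kind.NET_ID, Kind.EQUAL_SIGN, Kind.NET_ID, Kind.RPAREN);
        t.argList();
        System.out.println(t.trace);   // [positional, named]
    }
}
```

Under those assumptions the trace is one positional argument followed
by one named argument, which is the reading I expect.  The sketch says
nothing about how the same semantic lookaheads behave while a
syntactic LOOKAHEAD such as LOOKAHEAD(regexp()) is being evaluated.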

My thanks and congratulations to anyone who got this far.

Question:  Am I doing anything obviously wrong?  Is there some obvious
reason why LOOKAHEAD(regexp()) is failing after the 'return' keyword in
the following statement?

return  $&f1($a1, $opt=$a2)   ;

Many thanks,

Ken


******************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT
84054  USA





