Dear JavaCC Users,
I thought that I understood JavaCC LOOKAHEAD pretty well, but lately
I've been stumped by a problem. I have a syntactic LOOKAHEAD at a
decision point, and it seems to be failing when (as far as I can see)
it should be succeeding.
I wonder if someone on this list could help me spot, or better trace,
the problem.
Background:
I'm using JavaCC 4.2 on OS X. I've been using JavaCC successfully for
years.
My new programming language (dubbed Kleene) has a 'return' statement
that optionally contains an expression(), which might be a regular
expression (regexp()), an arithmetic expression (numexp()), or several
other types of expression, e.g.
    return ;                    // void return (no expression)
    return 2 + 2 ;              // return with a numexp()
    return a*b+(c|d|e|f)? ;     // return with a regexp()
The JavaCC production for return_statement (with a printout for
tracing) looks like this:
void return_statement() #return_statement :
{ System.out.println("***** In return_statement() *****") ; }
{
    <RETURN_RW> ( expression() <SEMICOLON>
                | <SEMICOLON>
                )
}
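One caveat about the printouts: as far as I understand it, JavaCC does not execute Java actions while it is evaluating a syntactic lookahead, so these println calls only show what happens during the real parse, not what the lookahead itself tried. A minimal sketch of the options I could turn on to trace the lookahead as well, assuming the standard DEBUG_PARSER and DEBUG_LOOKAHEAD options (they would be merged into the grammar's existing options block, which is not shown here):

```
options {
    DEBUG_PARSER = true;      // trace entry to and exit from each production
    DEBUG_LOOKAHEAD = true;   // additionally trace the steps taken during lookahead
}
```

The resulting trace is verbose, so a script containing only the failing statement is probably the most practical input.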
There are various kinds of expression, so the production for
expression() represents a decision point and contains a number of
LOOKAHEADs. The key LOOKAHEAD in this production is the first one:
void expression() : { System.out.println("***** In expression() *****") ; }
{
    // if a regexp() can be parsed, then commit the parser to parsing a regexp()
    LOOKAHEAD(regexp())
    regexp() { System.out.println("***** After regexp() yy *****") ; }
  | LOOKAHEAD(numexp())
    numexp()
  | LOOKAHEAD(net_func_exp())
    net_func_exp() { System.out.println("***** After net_func_exp() yy *****") ; }
  | LOOKAHEAD(num_func_exp())
    num_func_exp()
  | LOOKAHEAD(net_arr_exp())
    net_arr_exp()
  | LOOKAHEAD(num_arr_exp())
    num_arr_exp()
  | LOOKAHEAD(net_func_func_exp())
    net_func_func_exp()
  | LOOKAHEAD(num_func_func_exp())
    num_func_func_exp()
  | LOOKAHEAD(net_arr_func_exp())
    net_arr_func_exp()
  | LOOKAHEAD(num_arr_func_exp())
    num_arr_func_exp()
}
The first LOOKAHEAD(regexp()) in expression() is a syntactic
lookahead, saying "if lookahead determines that a full regular
expression _can_ be parsed from this point, then commit the parser to
parsing a regular expression". Regular expressions are rather
complicated in Kleene, but they generally parse correctly. One type of
primary regular expression is a 'net_func_call' (i.e. a function call
that returns a finite-state network) of the form
    net_func_exp() arg_list()
e.g.
    $&myFunctionName($myArg1, $myArg2)
where $&myFunctionName is a net_func_id(), the name of a function,
which is the simplest kind of net_func_exp().
Here ($myArg1, $myArg2) is an arg_list().
Here is the production for primary_regexp(), a kind of regexp(), with
net_func_call() at the end:
void primary_regexp() : { System.out.println("***** In primary_regexp() *****") ; }
{
    lit_char()                  // e.g. a b c
  | multichar_symbol()          // e.g. '[Noun]' a single symbol with a multicharacter print name
  | net_id()                    // e.g. $mynet a variable with a finite-state-network value
    // $>Foo can appear only in rrprod_definition
  | LOOKAHEAD({ getToken(1).kind == RRPROD_ID && parsing_rrprod_def == true })
    rrprod_id()
  | any()                       // "dot" (.) match any char
  | <LPAREN> regexp() <RPAREN>
  | double_quoted_string()
  | epsilon()                   // U+03F5 GREEK LUNATE EPSILON SYMBOL
                                // to represent the empty string
  | char_union()                // [a-z]
  | complement_char_union()     // [^a-z]
  | net_func_call() { System.out.println("***** After net_func_call() *****") ; }
    // $&foo(args) $&lambda(params){block}(args)
}
And here are the productions for net_func_call() and net_func_exp():
void net_func_call() #net_func_call : { System.out.println("***** In net_func_call() *****") ; }
{
    net_func_exp() { System.out.println("***** Found net_func_exp() *****") ; }
    arg_list()     { System.out.println("***** Found arg_list() *****") ; }
}
void net_func_exp() #net_func_exp : { System.out.println("***** In net_func_exp() *****") ; }
{
    net_func_id() { System.out.println("***** After net_func_id() xx *****") ; }  // e.g. $&myFuncName
  | net_func_lambda_exp()
  | net_func_func_call()
}
With the following Kleene script input, the first LOOKAHEAD(regexp()),
seen at the top of the expression() production above, should be
succeeding after the second 'return' keyword, but it is failing:
// Start of script
$&f1($req, $opt=a) { return $req $opt ; }
// This first statement defines a function, named $&f1, that returns a
// finite-state network.
// This first statement parses perfectly. The expression() after the
// 'return' keyword is correctly determined, via LOOKAHEAD(regexp()),
// to be a regular expression, the parser is committed to parsing a
// regexp(), and all works perfectly.
// But in the following second function definition (for a function
// named $&f2)
$&f2($a1, $a2) { return $&f1($a1, $opt=$a2) ; }
// LOOKAHEAD(regexp()) should succeed after the 'return' keyword, but
// doesn't.
// End of Script
In the second function definition, the expression after 'return'
    $&f1($a1, $opt=$a2)
is a regexp(), being a net_func_call(), consisting of
1. a net_func_exp(), here a net_func_id() which is '$&f1', and
2. an arg_list(), here ($a1, $opt=$a2).
As best I can tell, the LOOKAHEAD(regexp()) in expression() is failing
where it is expected (by me) to succeed, and a later
LOOKAHEAD(net_func_exp()) is succeeding, as shown in
the ***** comments below:
void expression() : { System.out.println("***** In expression() *****") ; }
{
    // ***** this first LOOKAHEAD is failing for $&f1($a1, $opt=$a2)
    // if a regexp() can be parsed, then commit the parser to parsing a regexp()
    LOOKAHEAD(regexp())
    regexp() { System.out.println("***** After regexp() yy *****") ; }
  | LOOKAHEAD(numexp())
    numexp()
    // ***** and this later LOOKAHEAD(net_func_exp()) is succeeding for the
    // prefix $&f1, which is a net_func_id(), which is indeed a kind of
    // net_func_exp()
  | LOOKAHEAD(net_func_exp())
    net_func_exp() { System.out.println("***** After net_func_exp() yy *****") ; }
  | LOOKAHEAD(num_func_exp())
    num_func_exp()
  | LOOKAHEAD(net_arr_exp())
    net_arr_exp()
  | LOOKAHEAD(num_arr_exp())
    num_arr_exp()
  | LOOKAHEAD(net_func_func_exp())
    net_func_func_exp()
  | LOOKAHEAD(num_func_func_exp())
    num_func_func_exp()
  | LOOKAHEAD(net_arr_func_exp())
    net_arr_func_exp()
  | LOOKAHEAD(num_arr_func_exp())
    num_arr_func_exp()
}
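To make the symptom concrete, here is a small Java model of the ordered choice in expression(). It is only a sketch of my understanding, not JavaCC's generated code: tokens are plain strings, only the net_func_id() case of net_func_exp() is modelled, and the argListOk flag is a hypothetical switch that simulates the arg_list() scan failing for whatever reason.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of ordered choice with syntactic lookahead.
// NOT JavaCC's generated code; token values are illustrative strings.
class LookaheadPrefixSketch {

    // Position after a net_func_exp() starting at pos, or -1 on failure.
    // Only the simplest case is modelled: a net_func_id such as "$&f1".
    static int scanNetFuncExp(List<String> toks, int pos) {
        return pos < toks.size() && toks.get(pos).startsWith("$&") ? pos + 1 : -1;
    }

    // Position after "( ... )" or -1. argListOk is a hypothetical switch
    // that simulates the arg_list() scan failing.
    static int scanArgList(List<String> toks, int pos, boolean argListOk) {
        if (!argListOk || pos >= toks.size() || !toks.get(pos).equals("(")) return -1;
        int close = toks.subList(pos, toks.size()).indexOf(")");
        return close < 0 ? -1 : pos + close + 1;
    }

    // net_func_call() := net_func_exp() arg_list()
    static int scanNetFuncCall(List<String> toks, int pos, boolean argListOk) {
        int afterExp = scanNetFuncExp(toks, pos);
        return afterExp < 0 ? -1 : scanArgList(toks, afterExp, argListOk);
    }

    // Mirrors the order of alternatives in expression(): the first
    // lookahead that succeeds decides which alternative is taken.
    static String chooseAlternative(List<String> toks, int pos, boolean argListOk) {
        if (scanNetFuncCall(toks, pos, argListOk) >= 0) return "regexp";
        if (scanNetFuncExp(toks, pos) >= 0) return "net_func_exp";
        return "none";
    }

    public static void main(String[] args) {
        List<String> toks =
            Arrays.asList("$&f1", "(", "$a1", ",", "$opt", "=", "$a2", ")", ";");
        System.out.println(chooseAlternative(toks, 0, true));   // arg_list() scans
        System.out.println(chooseAlternative(toks, 0, false));  // arg_list() scan fails
    }
}
```

In this model, a failure anywhere inside the arg_list() scan is enough to make the regexp() lookahead fail, after which the net_func_exp() alternative is chosen because it only needs the prefix $&f1. That matches what I observe, which is why I suspect arg_list().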
As best I can tell, LOOKAHEAD(regexp()) in expression() is failing for
    $&f1($a1, $opt=$a2)
because it fails to match the arg_list ($a1, $opt=$a2). Here are the
productions for arg_list(), positional_args() and named_args():
void arg_list() #arg_list : { System.out.println("***** In arg_list() *****") ; }
{
    <LPAREN> ( LOOKAHEAD(1)
               (
                   LOOKAHEAD( { getToken(2).kind != EQUAL_SIGN } )
                   positional_args() ( <COMMA_OP> named_args() )?
                 | named_args()
               )
             )?
    <RPAREN>
}
void positional_args() #positional_args : { System.out.println("***** In positional_args() *****") ; }
{
    expression() ( LOOKAHEAD( { getToken(1).kind == COMMA_OP &&
                                getToken(3).kind != EQUAL_SIGN } )
                   <COMMA_OP> expression()
                 )*
}
void named_args() #named_args : { System.out.println("***** In named_args() *****") ; }
{
    id_with_assignment() ( LOOKAHEAD(1) <COMMA_OP> id_with_assignment() )*
}
The arg_list() ($a1, $opt=$a2) consists of a positional_args()
(containing one expression(), $a1) and a named_args() containing one
id_with_assignment(), being $opt=$a2.
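As a sanity check on that reading, here is a hand trace in Java of the two semantic-lookahead conditions over the token kinds of ($a1, $opt=$a2). It is a sketch under assumptions: NET_ID is a hypothetical kind name for $a1, $opt and $a2, and getToken(n) is taken to count from the token last consumed at each choice point. Whether that is also how getToken(n) behaves while the outer LOOKAHEAD(regexp()) is scanning is something I have not verified.

```java
import java.util.Arrays;
import java.util.List;

// Hand trace of the semantic-lookahead conditions in arg_list() and
// positional_args(). Token kinds are plain strings; NET_ID is a
// hypothetical kind name, the others come from the grammar above.
class ArgListTraceSketch {

    // Token kinds of: ( $a1 , $opt = $a2 )
    static final List<String> KINDS = Arrays.asList(
        "LPAREN", "NET_ID", "COMMA_OP", "NET_ID", "EQUAL_SIGN", "NET_ID", "RPAREN");

    // Kind of the n-th token after the last consumed one (1-based, in the
    // manner of getToken(n)); "EOF" past the end of the list.
    static String kindAt(int lastConsumed, int n) {
        int i = lastConsumed + n;
        return i < KINDS.size() ? KINDS.get(i) : "EOF";
    }

    // arg_list(): LOOKAHEAD( { getToken(2).kind != EQUAL_SIGN } )
    static boolean startsWithPositional(int lastConsumed) {
        return !kindAt(lastConsumed, 2).equals("EQUAL_SIGN");
    }

    // positional_args():
    //   getToken(1).kind == COMMA_OP && getToken(3).kind != EQUAL_SIGN
    static boolean anotherPositionalFollows(int lastConsumed) {
        return kindAt(lastConsumed, 1).equals("COMMA_OP")
            && !kindAt(lastConsumed, 3).equals("EQUAL_SIGN");
    }

    public static void main(String[] args) {
        // After <LPAREN> (index 0) has been consumed:
        System.out.println("positional first?   " + startsWithPositional(0));
        // After the expression $a1 (index 1) has been consumed:
        System.out.println("another positional? " + anotherPositionalFollows(1));
    }
}
```

Under those assumptions the first condition holds and the second does not, so positional_args() stops after $a1 and the optional <COMMA_OP> named_args() picks up $opt=$a2, which is the decomposition I expect.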
My thanks and congratulations to anyone who got this far.
Question: Am I doing anything obvious wrong? Is there some obvious
reason why LOOKAHEAD(regexp()) is failing after the 'return'
keyword in
return $&f1($a1, $opt=$a2) ;
??
Many thanks,
Ken
******************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT
84054 USA