LOOKAHEAD problem (long)

Dear JavaCC Users,

I thought that I understood JavaCC LOOKAHEAD pretty well, but lately I've been stumped by a problem. I have a LOOKAHEAD at a decision point, and it seems to be failing when (as far as I can see) it should be succeeding. I wonder if someone on this list could help me spot or better trace the problem.

Background: I'm using JavaCC 4.2 on OS X. I've been using JavaCC successfully for years.

My new programming language (dubbed Kleene) has a 'return' statement that optionally contains an expression(), which might be a regular expression (regexp()), an arithmetic expression (numexp()), or several other types of expression, e.g.

    return ;                    // void return (no expression)
    return 2 + 2 ;              // return with a numexp()
    return a*b+(c|d|e|f)? ;     // return with a regexp()

The JavaCC production for return_statement (with a printout for tracing) looks like this:

    void return_statement() #return_statement:
    {
        System.out.println("***** In return_statement() *****") ;
    }
    {
        <RETURN_RW>
        (
            expression() <SEMICOLON>
        |
            <SEMICOLON>
        )
    }

There are various kinds of expression, so the production for expression() represents a decision point and contains a number of LOOKAHEADs.

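To make the idea of a lookahead-guarded decision point concrete, here is a minimal sketch in plain Java. It is only a model of the behaviour expected from a syntactic LOOKAHEAD, not JavaCC's generated code. The class LookaheadSketch, its pre-split token array, and the toy regexp() and numexp() recognizers are all hypothetical, and far simpler than Kleene's real productions.

```java
// Hypothetical illustration: a hand-written toy parser, not JavaCC output
// and not the real Kleene grammar.
final class LookaheadSketch {
    private final String[] tokens;
    private int pos = 0; // index of the next unconsumed token

    LookaheadSketch(String... tokens) {
        this.tokens = tokens;
    }

    /** Shape shared by the toy recognizers below. */
    private interface Recognizer {
        boolean match();
    }

    private boolean at(String t) {
        return pos < tokens.length && tokens[pos].equals(t);
    }

    private boolean atMatching(String regex) {
        return pos < tokens.length && tokens[pos].matches(regex);
    }

    // Toy stand-in for regexp(): one or more letters, each optionally followed by * + or ?
    private boolean regexp() {
        if (!atMatching("[a-z]")) return false;
        while (atMatching("[a-z]")) {
            pos++;
            if (at("*") || at("+") || at("?")) pos++;
        }
        return true;
    }

    // Toy stand-in for numexp(): NUMBER ( "+" NUMBER )*
    private boolean numexp() {
        if (!atMatching("[0-9]+")) return false;
        pos++;
        while (at("+")) {
            pos++;
            if (!atMatching("[0-9]+")) return false;
            pos++;
        }
        return true;
    }

    // Syntactic lookahead: run the recognizer speculatively, then rewind.
    private boolean lookahead(Recognizer r) {
        int saved = pos;
        boolean ok = r.match();
        pos = saved;
        return ok;
    }

    // Decision point: commit to the first alternative whose lookahead succeeds.
    private String expression() {
        if (lookahead(this::regexp)) { regexp(); return "regexp"; }
        if (lookahead(this::numexp)) { numexp(); return "numexp"; }
        throw new IllegalStateException("no expression alternative matches at token " + pos);
    }

    // return_statement: "return" ( expression ";" | ";" )
    String returnStatement() {
        if (!at("return")) throw new IllegalStateException("expected 'return'");
        pos++;
        String kind = "void";
        if (!at(";")) kind = expression();
        if (!at(";")) throw new IllegalStateException("expected ';' at token " + pos);
        pos++;
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(new LookaheadSketch("return", ";").returnStatement());                     // void
        System.out.println(new LookaheadSketch("return", "2", "+", "2", ";").returnStatement());      // numexp
        System.out.println(new LookaheadSketch("return", "a", "*", "b", "+", ";").returnStatement()); // regexp
    }
}
```

In this sketch the order of the alternatives matters: once a lookahead succeeds, the parser commits to that alternative and never tries the later ones, which is the behaviour assumed for the LOOKAHEADs in expression().
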
The key LOOKAHEAD in the expression() production is the first one:

    void expression():
    {
        System.out.println("***** In expression() *****") ;
    }
    {
        LOOKAHEAD(regexp())   // if a regexp() can be parsed, then commit the parser to parsing a regexp()
        regexp()
        { System.out.println("***** After regexp() yy *****") ; }
    |
        LOOKAHEAD(numexp()) numexp()
    |
        LOOKAHEAD(net_func_exp()) net_func_exp()
        { System.out.println("***** After net_func_exp() yy *****") ; }
    |
        LOOKAHEAD(num_func_exp()) num_func_exp()
    |
        LOOKAHEAD(net_arr_exp()) net_arr_exp()
    |
        LOOKAHEAD(num_arr_exp()) num_arr_exp()
    |
        LOOKAHEAD(net_func_func_exp()) net_func_func_exp()
    |
        LOOKAHEAD(num_func_func_exp()) num_func_func_exp()
    |
        LOOKAHEAD(net_arr_func_exp()) net_arr_func_exp()
    |
        LOOKAHEAD(num_arr_func_exp()) num_arr_func_exp()
    }

The first LOOKAHEAD(regexp()) in expression() is a syntactic lookahead, saying "if lookahead determines that a full regular expression _can_ be parsed from this point, then commit the parser to parsing a regular expression".

Regular expressions are rather complicated in Kleene, but they generally parse correctly. One type of primary regular expression is a 'net_func_call' (i.e. a function call that returns a finite-state network) of the form

    net_func_exp() arg_list()

e.g.

    $&myFunctionName($myArg1, $myArg2)

where $&myFunctionName is a net_func_id(), the name of a function, which is the simplest kind of net_func_exp(). Here ($myArg1, $myArg2) is an arg_list().

Here is the production for primary_regexp(), a kind of regexp(), with net_func_call() at the end:

    void primary_regexp():
    {
        System.out.println("***** In primary_regexp() *****") ;
    }
    {
        lit_char()                // e.g. a b c
    |
        multichar_symbol()        // e.g. '[Noun]' a single symbol with a multicharacter print name
    |
        net_id()                  // e.g. $mynet a variable with a finite-state-network value
    |
        // $>Foo can appear only in rrprod_definition
        LOOKAHEAD({ getToken(1).kind == RRPROD_ID && parsing_rrprod_def == true })
        rrprod_id()
    |
        any()                     // "dot" (.) match any char
    |
        <LPAREN> regexp() <RPAREN>
    |
        double_quoted_string()
    |
        epsilon()                 // U+03F5 GREEK LUNATE EPSILON SYMBOL
                                  // to represent the empty string
    |
        char_union()              // [a-z]
    |
        complement_char_union()   // [^a-z]
    |
        net_func_call()
        { System.out.println("***** After net_func_call() *****") ; }
        // $&foo(args) $&lambda(params){block}(args)
    }

And here are the productions for net_func_call() and net_func_exp():

    void net_func_call() #net_func_call:
    {
        System.out.println("***** In net_func_call() *****") ;
    }
    {
        net_func_exp()
        { System.out.println("***** Found net_func_exp() *****") ; }
        arg_list()
        { System.out.println("***** Found arg_list() *****") ; }
    }

    void net_func_exp() #net_func_exp :
    {
        System.out.println("***** In net_func_exp() *****") ;
    }
    {
        net_func_id()
        { System.out.println("***** After net_func_id() xx *****") ; }   // e.g. $&myFuncName
    |
        net_func_lambda_exp()
    |
        net_func_func_call()
    }

With the following Kleene script input, this first LOOKAHEAD(regexp()) should be succeeding after the second 'return' keyword, but LOOKAHEAD(regexp()), seen at the top of the expression() production above, is failing:

    // Start of script

    $&f1($req, $opt=a) { return $req $opt ; }

    // This first statement defines a function, named $&f1, that returns a finite-state network.
    // This first statement parses perfectly. The expression() after the 'return' keyword is
    // correctly determined, via LOOKAHEAD(regexp()), to be a regular expression, the parser is
    // committed to parsing a regexp(), and all works perfectly.

    // But in the following second function definition (for a function named $&f2)

    $&f2($a1, $a2) { return $&f1($a1, $opt=$a2) ; }

    // LOOKAHEAD(regexp()) should succeed after the 'return' keyword, but doesn't.

    // End of Script

In the second function definition, the expression after 'return'

    $&f1($a1, $opt=$a2)

is a regexp(), being a net_func_call(), consisting of

1. a net_func_exp(), here a net_func_id() which is '$&f1', and
2. an arg_list()

As best I can tell, the LOOKAHEAD(regexp()) in expression() is failing where it is expected (by me) to succeed, and a later LOOKAHEAD(net_func_exp()) is succeeding, as shown in the ***** comments below:

    void expression():
    {
        System.out.println("***** In expression() *****") ;
    }
    {
        // ***** this first LOOKAHEAD is failing for $&f1($a1, $opt=$a2)
        LOOKAHEAD(regexp())   // if a regexp() can be parsed, then commit the parser to parsing a regexp()
        regexp()
        { System.out.println("***** After regexp() yy *****") ; }
    |
        LOOKAHEAD(numexp()) numexp()
    |
        // and this later LOOKAHEAD(net_func_exp()) is succeeding for the prefix $&f1,
        // which is a net_func_id(), which is indeed a kind of net_func_exp()
        LOOKAHEAD(net_func_exp()) net_func_exp()
        { System.out.println("***** After net_func_exp() yy *****") ; }
    |
        LOOKAHEAD(num_func_exp()) num_func_exp()
    |
        LOOKAHEAD(net_arr_exp()) net_arr_exp()
    |
        LOOKAHEAD(num_arr_exp()) num_arr_exp()
    |
        LOOKAHEAD(net_func_func_exp()) net_func_func_exp()
    |
        LOOKAHEAD(num_func_func_exp()) num_func_func_exp()
    |
        LOOKAHEAD(net_arr_func_exp()) net_arr_func_exp()
    |
        LOOKAHEAD(num_arr_func_exp()) num_arr_func_exp()
    }

As best I can tell, LOOKAHEAD(regexp()) in expression() is failing for $&f1($a1, $opt=$a2) because it fails to match the arg_list ($a1, $opt=$a2). Here are the productions for arg_list() and the two kinds of argument it contains:

    void arg_list() #arg_list :
    {
        System.out.println("***** In arg_list() *****") ;
    }
    {
        <LPAREN>
        (
            LOOKAHEAD(1)
            (
                LOOKAHEAD( { getToken(2).kind != EQUAL_SIGN } )
                positional_args() ( <COMMA_OP> named_args() )?
            |
                named_args()
            )
        )?
        <RPAREN>
    }

    void positional_args() #positional_args :
    {
        System.out.println("***** In positional_args() *****") ;
    }
    {
        expression()
        (
            LOOKAHEAD( { getToken(1).kind == COMMA_OP && getToken(3).kind != EQUAL_SIGN } )
            <COMMA_OP> expression()
        )*
    }

    void named_args() #named_args :
    {
        System.out.println("***** In named_args() *****") ;
    }
    {
        id_with_assignment()
        (
            LOOKAHEAD(1)
            <COMMA_OP> id_with_assignment()
        )*
    }

The arg_list() ($a1, $opt=$a2) consists of a positional_args() (containing one expression()) and a named_args() containing one id_with_assignment(), being $opt=$a2.

My thanks and congratulations to anyone who got this far.

Question: Am I doing anything obviously wrong? Is there some obvious reason why LOOKAHEAD(regexp()) is failing after the 'return' keyword in

    return $&f1($a1, $opt=$a2) ;

??

Many thanks,

Ken

******************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT 84054
USA