Lexical state switching issue

Hello, all,

I have encountered an issue that I could not solve with my limited JavaCC knowledge. The real grammar is too complicated to include, but I managed to reproduce the issue with a simplified example.

My parser parses comma-separated expressions consisting of identifiers and numbers. A number can end with a "K" character to indicate that the value is expressed in thousands. After matching a number I switch to the AFTER_CONSTANT state, and if the number is followed by "K" all is fine (we switch back to DEFAULT and continue parsing). The problem appears when the number is not followed by "K": the lexer stays in AFTER_CONSTANT, where no other token is defined. Is there a cleaner solution than adding <DEFAULT, AFTER_CONSTANT> before every token that can follow a number?

Here's the grammar I'm using:

    options {
      STATIC = false;
    }

    PARSER_BEGIN(Parser)
    package com.test;

    public class Parser {
    }
    PARSER_END(Parser)

    SimpleNode Start() #Start : {}
    {
      Expression() <EOF>
      { return jjtThis; }
    }

    void Expression() #void : {}
    {
      Value() ( <COMMA> Value() )*
    }

    void Value() #void : {}
    {
      Identifier() | Constant()
    }

    void Identifier() : { Token t; }
    {
      t = <IDENTIFIER>
      { jjtThis.setText(t.image); }
    }

    void Constant() : { Token t; ASTConstant.Type type = ASTConstant.Type.SIMPLE; }
    {
      t = <INTEGER_LITERAL>
      [ <THOUSANDS> { type = ASTConstant.Type.MODIFIED; } ]
      { jjtThis.setText(t.image); jjtThis.setType(type); }
    }

    SKIP : { " " | "\t" | "\n" | "\r" | "\r\n" }

    TOKEN : { < COMMA : "," > }
    TOKEN : { < IDENTIFIER : <LETTER> (<LETTER> | <DIGIT>)* > }
    TOKEN : { < INTEGER_LITERAL : <DIGIT> (<DIGIT>)* > : AFTER_CONSTANT }
    <AFTER_CONSTANT> TOKEN : { < THOUSANDS : "K" > : DEFAULT }
    TOKEN : { < #DIGIT : ["0"-"9"] > }
    TOKEN : { < #LETTER : ["_", "a"-"z", "A"-"Z"] > }
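To make the failure concrete, here is a hand-written Java approximation of the token manager this grammar describes. It is only a sketch, not the code JavaCC generates, and the class and method names are made up for illustration. Because AFTER_CONSTANT defines nothing but THOUSANDS, any character other than "K" after a number (the comma, or even a space) is a lexical error, while end of input is still accepted, since JavaCC recognizes <EOF> in every lexical state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written approximation (not JavaCC output) of the token
// manager described by the original grammar. Token kinds are returned as strings.
class OriginalGrammarLexer {
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static boolean isLetter(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        boolean afterConstant = false; // false: DEFAULT, true: AFTER_CONSTANT
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (afterConstant) {
                // The only rule in AFTER_CONSTANT: < THOUSANDS : "K" > : DEFAULT
                if (c != 'K') {
                    throw new IllegalStateException("Lexical error in AFTER_CONSTANT at index " + i);
                }
                out.add("THOUSANDS");
                afterConstant = false;
                i++;
            } else if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++; // SKIP is declared for DEFAULT only
            } else if (c == ',') {
                out.add("COMMA");
                i++;
            } else if (isDigit(c)) {
                while (i < in.length() && isDigit(in.charAt(i))) i++;
                out.add("INTEGER_LITERAL");
                afterConstant = true; // : AFTER_CONSTANT
            } else if (isLetter(c)) {
                while (i < in.length() && (isLetter(in.charAt(i)) || isDigit(in.charAt(i)))) i++;
                out.add("IDENTIFIER");
            } else {
                throw new IllegalStateException("Lexical error in DEFAULT at index " + i);
            }
        }
        return out; // end of input is accepted in either state
    }

    public static void main(String[] args) {
        System.out.println(tokenize("10K, x")); // [INTEGER_LITERAL, THOUSANDS, COMMA, IDENTIFIER]
        try {
            tokenize("10, x");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Lexical error in AFTER_CONSTANT at index 2
        }
    }
}
```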
RE: Lexical state switching issue

Hi,

You can drop your states: the [<THOUSANDS> {...}] option will do the job. Just declare the THOUSANDS token before the IDENTIFIER token (otherwise "K" will always be recognized as an IDENTIFIER):

    SKIP : { " " | "\t" | "\n" | "\r" | "\r\n" }
    TOKEN : { < COMMA : "," > }
    TOKEN : { < THOUSANDS : "K" > }
    TOKEN : { < IDENTIFIER : <LETTER> (<LETTER> | <DIGIT>)* > }
    TOKEN : { < INTEGER_LITERAL : <DIGIT> (<DIGIT>)* > }
    TOKEN : { < #DIGIT : ["0"-"9"] > }
    TOKEN : { < #LETTER : ["_", "a"-"z", "A"-"Z"] > }

Marc MAZAS

-----Original Message-----
From: Domas Savickas [mailto:dsavickas@...]
Sent: Monday, 28 September 2009 14:17
To: users@...
Subject: [JavaCC] Lexical state switching issue

View this message in context: http://www.nabble.com/Lexical-state-switching-issue-tp25644632p25644632.html
Sent from the java.net - javacc users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
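Marc's point about declaration order follows from how a JavaCC token manager chooses among the rules of one lexical state: the longest match wins, and when two rules match the same length, the one declared first wins. The Java sketch below is a hypothetical regex-based imitation of that rule (not generated JavaCC code; OrderedLexer, Rule, rules and tokenize are made-up names). It also shows the limitation raised in the follow-up: with THOUSANDS declared first, a lone "K" can never become an IDENTIFIER, although a longer word such as "KX" still does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of token selection within a single lexical state:
// longest match wins; on a tie, the rule declared first wins.
class OrderedLexer {
    static final class Rule {
        final String kind;
        final Pattern pattern;

        Rule(String kind, String regex) {
            this.kind = kind;
            this.pattern = Pattern.compile(regex);
        }
    }

    // Rules of the suggested grammar in declaration order; thousandsFirst
    // decides whether THOUSANDS is declared before or after IDENTIFIER.
    static List<Rule> rules(boolean thousandsFirst) {
        Rule thousands = new Rule("THOUSANDS", "K");
        Rule identifier = new Rule("IDENTIFIER", "[_a-zA-Z][_a-zA-Z0-9]*");
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("SKIP", "[ \\t\\n\\r]"));
        rules.add(new Rule("COMMA", ","));
        rules.add(thousandsFirst ? thousands : identifier);
        rules.add(thousandsFirst ? identifier : thousands);
        rules.add(new Rule("INTEGER_LITERAL", "[0-9]+"));
        return rules;
    }

    static List<String> tokenize(String in, List<Rule> rules) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule r : rules) {
                // For these simple patterns the greedy match is also the longest one.
                Matcher m = r.pattern.matcher(in).region(i, in.length());
                // ">" rather than ">=" keeps the earlier-declared rule on a tie.
                if (m.lookingAt() && m.end() - i > bestLen) {
                    best = r;
                    bestLen = m.end() - i;
                }
            }
            if (best == null) throw new IllegalStateException("Lexical error at index " + i);
            if (!best.kind.equals("SKIP")) out.add(best.kind);
            i += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("10K, 10, K", rules(true)));
        // [INTEGER_LITERAL, THOUSANDS, COMMA, INTEGER_LITERAL, COMMA, THOUSANDS]
        System.out.println(tokenize("K", rules(false))); // [IDENTIFIER]
    }
}
```

The final THOUSANDS in the first line is what the parser then rejects, since after a COMMA it expects an IDENTIFIER or an INTEGER_LITERAL.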
RE: Lexical state switching issue

Thank you for your help!

The problem is that K can be either an IDENTIFIER or THOUSANDS, depending on whether it follows a number. I guess I wasn't clear enough. If I modify the grammar according to your suggestions, I cannot parse the following expressions:

1. "K"
2. "10K, 10, K"

Between them, they should yield {IDENTIFIER, MODIFIED_NUMBER, NUMBER, IDENTIFIER}. I expect the first expression to be parsed as {IDENTIFIER}, but I get a ParseException. The second one should be parsed as {NUMBER, THOUSANDS, NUMBER, IDENTIFIER}, but I get a ParseException too.

The following token definitions seem to do the trick, but I thought there might be a better way to do it:

    <DEFAULT, AFTER_CONSTANT> SKIP : { " " | "\t" | "\n" | "\r" | "\r\n" : DEFAULT }
    <DEFAULT, AFTER_CONSTANT> TOKEN : { < COMMA : "," > : DEFAULT }
    TOKEN : { < IDENTIFIER : <LETTER> (<LETTER> | <DIGIT>)* > }
    TOKEN : { < INTEGER_LITERAL : <DIGIT> (<DIGIT>)* > : AFTER_CONSTANT }
    <AFTER_CONSTANT> TOKEN : { < THOUSANDS : "K" > : DEFAULT }
    TOKEN : { < #DIGIT : ["0"-"9"] > }
    TOKEN : { < #LETTER : ["_", "a"-"z", "A"-"Z"] > }

Domas Savickas
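A hand-written Java approximation of the token manager these definitions describe can be used to check the expected token sequences. Again, this is a sketch rather than JavaCC output, and WorkaroundLexer and tokenize are made-up names. It follows the posted rules literally: in JavaCC a trailing ": DEFAULT" belongs only to the alternative it follows, so in the SKIP production only "\r\n" switches back to DEFAULT. A space after a number therefore leaves the lexer in AFTER_CONSTANT, so "10 K" still yields THOUSANDS, while "10" followed by CRLF and "K" yields an IDENTIFIER. If whitespace should end the number in every case, each SKIP alternative would need its own ": DEFAULT".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written approximation (not JavaCC output) of the
// workaround's token manager. SKIP and COMMA are active in both states;
// among the SKIP alternatives only "\r\n" carries ": DEFAULT".
class WorkaroundLexer {
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static boolean isLetter(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        boolean afterConstant = false; // false: DEFAULT, true: AFTER_CONSTANT
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (in.startsWith("\r\n", i)) {
                i += 2;                 // longest SKIP match; switches to DEFAULT
                afterConstant = false;
            } else if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++;                    // SKIP without a state change
            } else if (c == ',') {
                out.add("COMMA");
                i++;
                afterConstant = false;  // < COMMA : "," > : DEFAULT
            } else if (afterConstant) {
                // Besides SKIP and COMMA, AFTER_CONSTANT only knows THOUSANDS.
                if (c != 'K') {
                    throw new IllegalStateException("Lexical error in AFTER_CONSTANT at index " + i);
                }
                out.add("THOUSANDS");
                afterConstant = false;
                i++;
            } else if (isDigit(c)) {
                while (i < in.length() && isDigit(in.charAt(i))) i++;
                out.add("INTEGER_LITERAL");
                afterConstant = true;   // : AFTER_CONSTANT
            } else if (isLetter(c)) {
                while (i < in.length() && (isLetter(in.charAt(i)) || isDigit(in.charAt(i)))) i++;
                out.add("IDENTIFIER");
            } else {
                throw new IllegalStateException("Lexical error in DEFAULT at index " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("K"));    // [IDENTIFIER]
        System.out.println(tokenize("10K, 10, K"));
        // [INTEGER_LITERAL, THOUSANDS, COMMA, INTEGER_LITERAL, COMMA, IDENTIFIER]
        System.out.println(tokenize("10 K")); // [INTEGER_LITERAL, THOUSANDS]
    }
}
```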