Possible Lex Bug

I think I may have a possible bug. I cut it down to a small parser that still shows the problem (below). The problem seems to be in switching lexical states with MORE (or I just don't understand MORE).

Here's the input file:

---- start -----
(1) GENERAL INFORMATION:
(i) APPLICANT: kurt guenther
(ii) TITLE OF INVENTION: fusion reactor
---- end -----

The output and lexer are below. The problem arises on line #3, with the current character being "i". On line #2, the lexer correctly enters the TEXT state to parse <TEXT_LINE> and <TEXT_EOL>. It looks like MORE detects the parenthesis correctly and switches state to DEFAULT. But the lexer doesn't seem to use the open paren "(" for the token match.

I have a JUnit test, parser, and sample file. Let me know and I'll send a zip.

I'll take out the MORE and use skips for the time being.

--Kurt

Here is the output:

------- start ------------------
****** FOUND A <TEXT_EOL> MATCH (\r\n) ******

Consumed token: <<TEXT_EOL>: " " at line 2 column 35>
<TEXT>Skipping character : (32)
<TEXT>Skipping character : (32)
<TEXT>Skipping character : (32)
<TEXT>Skipping character : (32)
<TEXT>Current character : ( (40) at line 3 column 5
   No more string literal token matches are possible.
   Currently matched the first 1 characters as a "(" token.
****** FOUND A "(" MATCH (() ******

>>>>>>>>>>>Switching to DEFAULT
<DEFAULT>Current character : i (105) at line 3 column 6
<DEFAULT>Current character : i (105) at line 3 column 6
   No string literal matches possible.
   Starting NFA to match one of : { <EOL>, <TAG_GENERAL_INFO> }
<DEFAULT>Current character : i (105) at line 3 column 6
Return: parseText
Return: parseGeneralInfo
Return: Document
-----------------------end-----------------------

Here's the lexer:

--------------- start ------------------
<*> TOKEN :
{
  <EOF>
}

<DEFAULT> SKIP :
{
  " "
| "\t"
}

<DEFAULT> TOKEN :
{
  <EOL: "\n" | "\r" | "\r\n" >
| <#INTEGER: ["1"-"9"] (["0"-"9"])* >
| <#WHITESPACE: ([" ","\t"])+ >
}

< DEFAULT > TOKEN :
{
  <TAG_COLON: ":" >
| <TAG_GENERAL_INFO: "(1)" <WHITESPACE> "GENERAL INFORMATION:" > : DEFAULT
| <TAG_APPLICANT: "(i)" <WHITESPACE> "APPLICANT:" > : TEXT
| <TAG_TITLE_OF_INVENTION: "(ii)" <WHITESPACE> "TITLE OF INVENTION:" > : TEXT
}

< TEXT > SKIP :
{
  " "
| "\t"
}

<TEXT> MORE :
{
  // this should only trigger when found as a first character on the line
  "("
  {
    System.out.println(">>>>>>>>>>>Switching to DEFAULT");
    SwitchTo(DEFAULT);
  }
}

<TEXT> TOKEN :
{
  <TEXT_LINE: ~["("," ","\n","\r"] (~["\n","\r"])* >
| <TEXT_EOL: <EOL> >
}
--------------- end ------------------