
RE: Grammar for parsing regular expression

by Laughing Man

One option is to simply treat a regex like any other value and leave the handling of regular expressions for later. In other words, just treat anything after an equals sign as a generic "assignedValue" token (or something with a better name).

This does mean that many invalid assignments will tokenize successfully, but you can catch those later, by some means other than the tokenizer.

Date: Thu, 16 Apr 2009 17:08:48 +0200
From: klangner@...
To: users@...
Subject: Re: [JavaCC] Grammar for parsing regular expression


You will never, ever be able to write a lexeme / token definition (a
regular expression) to recognize the syntax of regular expressions.
Parsing regular expressions requires a pushdown automaton, owing to the
balanced, nestable constructs such as parentheses and (in certain
dialects) character classes, and to their use of infix operators (i.e.,
alternation).

I know. I don't want to parse the RE with the lexer (tokens). It would be enough for me to just represent the whole RE as a single token.
It could almost be done with this simple definition:

< REGEX_LITERAL:
      "/"
      (   (~["/","\n","\r"]) | ("\\" "/")
      )*
      "/"
  >

The problem is not with the RE syntax but with math expressions, because now, in this expression:

my_variable = 2/3/4;

I'll get a REGEX_LITERAL token.
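
To make the clash concrete, here is a small sketch in plain Java (not JavaCC) that applies a java.util.regex approximation of the token definition above to that assignment. The class and method names are made up for illustration. One assumption to note: java.util.regex tries alternatives in order instead of picking the longest match the way the JavaCC token manager does, so the escaped-slash alternative is listed first here to get the same result on escaped slashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLiteralAmbiguity {
    // Approximation of REGEX_LITERAL: "/" ( "\/" | any char except "/", LF, CR )* "/"
    // The escaped-slash alternative comes first because Java alternation is ordered.
    private static final Pattern REGEX_LITERAL =
            Pattern.compile("/(?:\\\\/|[^/\\n\\r])*/");

    // Returns the first substring that the pattern would accept as a regex literal,
    // or null if there is none.
    static String firstMatch(String input) {
        Matcher m = REGEX_LITERAL.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Two divisions are swallowed as one "regex literal".
        System.out.println(firstMatch("my_variable = 2/3/4;")); // prints /3/
        // A genuine literal with an escaped slash is matched whole.
        System.out.println(firstMatch("pattern = /a\\/b/;"));   // prints /a\/b/
    }
}
```

The first call shows why the token definition alone is not enough: nothing in the pattern can tell a pair of division operators from the delimiters of a literal.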

But Tom Copeland helped me with this problem by sending me an ECMAScript grammar. It looks like the problem is solved there at the parser level.
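
For anyone reading this later, here is a rough sketch of the general idea of using context to decide what a "/" means. This is not taken from the ECMAScript grammar mentioned above; it is a hand-written Java tokenizer with a simplifying assumption: a "/" is a division operator when the previous token is an operand (identifier, number, regex literal, or closing parenthesis), and otherwise it may start a regex literal. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SlashDisambiguator {
    private static final Pattern REGEX_LITERAL =
            Pattern.compile("/(?:\\\\/|[^/\\n\\r])*/");
    private static final Pattern OPERAND =
            Pattern.compile("[A-Za-z_][A-Za-z_0-9]*|[0-9]+");

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        boolean prevIsOperand = false; // did the previous token end an operand?
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }

            // Identifiers and integer literals.
            Matcher operand = OPERAND.matcher(src).region(i, src.length());
            if (operand.lookingAt()) {
                tokens.add("OPERAND:" + operand.group());
                i = operand.end();
                prevIsOperand = true;
                continue;
            }

            // A slash can only start a regex literal where an operand is expected.
            if (c == '/' && !prevIsOperand) {
                Matcher re = REGEX_LITERAL.matcher(src).region(i, src.length());
                if (re.lookingAt()) {
                    tokens.add("REGEX:" + re.group());
                    i = re.end();
                    prevIsOperand = true;
                    continue;
                }
            }

            // Everything else is a single-character operator or punctuation.
            tokens.add("OP:" + c);
            prevIsOperand = (c == ')');
            i++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("my_variable = 2/3/4;"));
        System.out.println(tokenize("re = /a\\/b/;"));
    }
}
```

In a JavaCC grammar the same decision would more likely be made with lexical states or in the parser productions; the sketch only illustrates why the previous token is enough context for simple cases like the one above.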

Thank you all for help :-)
Krzysztof

