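Given that strings are delimited by "s, the only way to define them as containing a literal " is to escape it somehow; otherwise the lexer cannot tell an embedded " from the closing one.
E.g. in C and similar languages, an internal quote is escaped with a \ (written \").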
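Then your definition can become something like this (where NOT_QUOTE_CHAR is any character other than a "):
<#NOT_QUOTE_CHAR: ~["\""]>
| <STRING_LITERAL: "\"" ("\\" "\"" | <NOT_QUOTE_CHAR>)* "\"">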
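To make the escape semantics concrete, here is a rough Java sketch of what that token accepts, written as a hand-rolled decoder rather than a JavaCC token. It assumes, as the token does, that \" is the only escape and any other backslash is an ordinary character; the class name BackslashStringLexer is made up for illustration. It always reads \" as an escaped quote, as C does, which can differ from JavaCC's longest-match choice in edge cases such as a literal ending in a backslash.

```java
// Hypothetical hand-written decoder for the STRING_LITERAL token above:
// opening ", then any mix of \" (escaped quote) or other characters, then closing ".
final class BackslashStringLexer {

    /** Decodes input, which must be exactly one literal; throws otherwise. */
    static String decode(String input) {
        if (input.isEmpty() || input.charAt(0) != '"') {
            throw new IllegalArgumentException("literal must start with \"");
        }
        StringBuilder value = new StringBuilder();
        int i = 1;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length() && input.charAt(i + 1) == '"') {
                value.append('"');   // the escape branch: \" stands for a literal "
                i += 2;
            } else if (c == '"') {
                // an unescaped " is always the closing delimiter
                if (i != input.length() - 1) {
                    throw new IllegalArgumentException("unexpected text after closing \" at " + i);
                }
                return value.toString();
            } else {
                value.append(c);     // NOT_QUOTE_CHAR, including a lone backslash
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated literal");
    }

    public static void main(String[] args) {
        // Terry's example with the inner quote escaped as \"
        System.out.println(decode("\"m>\\AF\\A4}\\\"\\B7p\\28^\\?\""));
    }
}
```

With that scheme your example would have to be written with the inner quote escaped, i.e. "m>\AF\A4}\"\B7p\28^\?", which decodes to m>\AF\A4}"\B7p\28^\?. Written as you have it, the unescaped inner " closes the literal early and the rest is left over.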
Or, if you escape it by doubling it, as Pascal or Delphi do, it is even simpler, because you can leave out the escape branch above. You just need your grammar to allow a string to appear as a sequence of one or more <STRING_LITERAL>s, since "xxx""yyy" is the representation of the literal value xxx"yyy but will lex as 2 consecutive <STRING_LITERAL>s.
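A rough Java sketch of the doubling approach, split the way a JavaCC grammar would split it: the lexing step only knows plain quote-to-quote tokens (no escape branch), and the parsing step glues adjacent tokens together with one " between each. The class and method names are hypothetical, and for simplicity the sketch assumes the whole input is one string value whose tokens are directly adjacent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the doubled-delimiter scheme, split into a lexing step
// (plain "..." tokens with no escape branch) and a parsing step (concatenation).
final class DoubledQuoteLexer {

    /** Lexing: splits input into directly adjacent "..." tokens and returns their bodies. */
    static List<String> lex(String input) {
        List<String> bodies = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) != '"') {
                throw new IllegalArgumentException("expected \" at " + i);
            }
            int close = input.indexOf('"', i + 1);   // first " after the opening one
            if (close < 0) {
                throw new IllegalArgumentException("unterminated literal at " + i);
            }
            bodies.add(input.substring(i + 1, close));
            i = close + 1;                             // next token starts right after
        }
        return bodies;
    }

    /** Parsing: one or more adjacent tokens form one value, with a " between each pair. */
    static String value(String input) {
        List<String> bodies = lex(input);
        if (bodies.isEmpty()) {
            throw new IllegalArgumentException("no literal");
        }
        return String.join("\"", bodies);
    }

    public static void main(String[] args) {
        System.out.println(value("\"xxx\"\"yyy\""));   // the xxx"yyy example
    }
}
```

In a real JavaCC grammar, where whitespace is usually SKIPped, "xxx" "yyy" would also reach the parser as two consecutive tokens, so you may want to check that they actually touch (e.g. by comparing the Token fields endColumn and beginColumn) before inserting the quote.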
If you can't double it, you need to define some other escape semantics (e.g. a " is not a delimiter unless it has whitespace next to it, or something like that). Generally, however, don't reinvent the wheel: this problem has been around for ages and is usually solved in one of 3 standard ways:
- no escapes, but multiple delimiters, e.g. " and ', so you can make a string containing a " by delimiting it with 's instead (and vice versa);
- an escape character, usually \, which makes the following character non-special;
- doubling the delimiter to include it once in the value (as CSV files do), implemented by letting the grammar accept a sequence of string literal tokens, with the concatenation rule that one delimiter character is inserted into the value between each pair of adjacent tokens.
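The first of those options needs no escape handling at all. A minimal Java sketch (the class name MultiDelimiterLexer is hypothetical) just remembers which delimiter opened the literal and looks for the same one to close it.

```java
// Hypothetical sketch of the "multiple delimiters, no escapes" scheme:
// a literal may start with " or ', and only the same character closes it.
final class MultiDelimiterLexer {

    /** Decodes input, which must be exactly one literal; throws otherwise. */
    static String decode(String input) {
        if (input.length() < 2) {
            throw new IllegalArgumentException("too short to be a literal");
        }
        char open = input.charAt(0);
        if (open != '"' && open != '\'') {
            throw new IllegalArgumentException("literal must start with \" or '");
        }
        // No escapes: the first later occurrence of the opening character closes it,
        // so the other kind of quote can appear freely inside.
        int close = input.indexOf(open, 1);
        if (close != input.length() - 1) {
            throw new IllegalArgumentException("closing " + open + " must be the last character");
        }
        return input.substring(1, close);
    }

    public static void main(String[] args) {
        // Terry's example, delimited with ' so the inner " needs no escaping
        System.out.println(decode("'m>\\AF\\A4}\"\\B7p\\28^\\?'"));
    }
}
```

Your example could then be written as 'm>\AF\A4}"\B7p\28^\?', at the cost of never being able to put a literal's own delimiter inside it.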
On Fri, May 8, 2009 at 9:32 AM, Terry Gardner <Terry.Gardner@...> wrote:
I need a way to parse this string:
"m>\AF\A4}"\B7p\28^\?"
where all of the double quote characters are present, that is, the string is delimited by the quotes at the beginning and end, and a double quote appears as part of the string. I had hoped for a simple TOKEN but cannot make that work. My current TOKEN looks like this:
| <#CHARACTER: ["a"-"z","A"-"Z","0"-"9","/","'","`","="," ",",","(",")","*","-",";","|","&","\\",".",":","$","!","@","#","%","^","_","+","?","<",">","~","{","}"] >
| <STRING_LITERAL: ( "\"" <CHARACTER> (<CHARACTER>)* "\"" ) | "\"\"" >
Is there a way to have the string above matched by <STRING_LITERAL>, or am I off-track and need to do something else? If so, what?
--
- J.Chris Findlay
(c: