Scanning an identifier with same value as a token

by Henrik Wahlberg

Hi all.

I have a beginner's problem that I'm not able to search for, as I don't
know how to express it :-(

I'm trying to parse RCS files using the work already done in the
Apache Jakarta project, now hosted at svn.codehaus.org.
It has the majority of the RCS file grammar defined.

Right now I'm trying to make the parser more CVSNT compatible, so I
started adding parsing of "deltatype", as seen in the following excerpt
from a CVSNT file:
---------------
1.1
date    2009.07.13.04.34.34;    author hew;    state Exp;
branches
    1.1.2.1;
next    ;
deltatype    text;
---------------
In this context the "text" parameter on deltatype conflicts with the
predefined token "text".
TOKEN definitions:
.....
 < STRICT: "strict">
| < SYMBOLS: "symbols" >
| < TEXT: "text" >
| < DELTATYPE: "deltatype" >}


Parsing:
    <AUTHOR>   s   = id()   { node.setAuthor(s); }    ";"
    <STATE>    [ s = id() { node.setState(s);   } ] ";"
    <BRANCHES> ( v = version() { node.addBranch(arc.newBranchNode(v)); } )* ";"
    <NEXT>     [ v = version() { node.setRCSNext(arc.newNode(v));} ]  ";"
    [<DELTATYPE> s = id() { node.setDeltaType(s); } ";" ]

where the last line is my addition.
This worked fine until I discovered that an author cannot have the
initials "text", or any other predefined keyword
(and my deltatype likewise cannot be "text").

The id() method is defined like this:

String id()      : {Token t; } { t = <ID>  { return t.image; } }

And my basic tokens like this:

TOKEN :
{   < ID:  (<DIGIT>|".")? <IDCHAR> (<IDCHAR>|<DIGIT>|".")*>
|    < SYM: (<DIGIT>)* <IDCHAR> (<IDCHAR>|<DIGIT>)* >
|    < STRING: "@" ( ~["@"] | "@@" )* "@" >
|    < #IDCHAR:        ["A"-"Z","a"-"z","-","_"]    >
|    < #DIGIT: ["0"-"9"]  >
|    < NUM: ( <DIGIT> )+ >
}

The grammar without my modifications is here:
http://svn.codehaus.org/jrcs/trunk/jrcs/src/java/org/apache/commons/jrcs/rcs/ArchiveParser.jj

Am I mixing up what in Java would be called identifiers and reserved words?
Am I trying to parse something which isn't parseable with this kind of
parser?

So please tell me what I have misunderstood.

Med venlig hilsen / Best regards / Mit freundlichen Grüssen
Henrik Wahlberg

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...


Re: Scanning an identifier with same value as a token

by Tom Copeland


On Jul 15, 2009, at 9:34 AM, Henrik Wahlberg wrote:

>
> In this context the "text" parameter on deltatype conflicts with the  
> predefined token "text"
> TOKEN definitions:

Hi Henrik -

Yup, the problem is that the characters "text" will be matched as a  
TEXT token, not as an ID token.  Thus this production:

========
[<DELTATYPE> s = id() { node.setDeltaType(s);   } ";"]
========

fails because the id() nonterminal expects an ID token, not a TEXT  
token.

Is there a limited number of delta types?  For example, if there are  
only "text" and "binary", you could add a "binary" token:

========
  < TEXT: "text" >
|
  < BINARY: "binary" >
|
  < DELTATYPE: "deltatype" >
========

and then add a new nonterminal:

========
String delta_type() : {}
{
    <TEXT>   { return token.image; }
  | <BINARY> { return token.image; }
}
========

and then modify the delta() production to call that new nonterminal:

========
[<DELTATYPE> s = delta_type() { node.setDeltaType(s);   } ";"]
========
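
If the set of delta types turns out to be open-ended, another common
approach is to let an identifier-like nonterminal accept the keyword
tokens as well as ID.  Below is a minimal, self-contained sketch of that
idea.  It is not the JRCS grammar: the parser class DeltaTypeDemo, the
nonterminal idOrKeyword() and the cut-down token set are invented for
illustration.  Widening id() like this in the full grammar may produce
choice-conflict warnings wherever an optional identifier can be followed
by a keyword, so treat it as a starting point only.

========
// DeltaTypeDemo.jj -- hypothetical stand-alone demo, not part of JRCS
options { STATIC = false; }

PARSER_BEGIN(DeltaTypeDemo)
import java.io.StringReader;

public class DeltaTypeDemo {
    public static void main(String[] args) throws ParseException {
        DeltaTypeDemo p =
            new DeltaTypeDemo(new StringReader("deltatype text;"));
        System.out.println(p.deltaType());
    }
}
PARSER_END(DeltaTypeDemo)

SKIP : { " " | "\t" | "\n" | "\r" }

// Keywords come first: when two expressions match the same longest
// string, the one declared earlier in the file wins.
TOKEN :
{
  < TEXT: "text" >
| < DELTATYPE: "deltatype" >
}

TOKEN :
{
  < ID: (<DIGIT>|".")? <IDCHAR> (<IDCHAR>|<DIGIT>|".")* >
| < #IDCHAR: ["A"-"Z","a"-"z","-","_"] >
| < #DIGIT: ["0"-"9"] >
}

// Accepts a plain ID or any keyword that happens to look like one.
String idOrKeyword() : { Token t; }
{
  ( t = <ID> | t = <TEXT> | t = <DELTATYPE> ) { return t.image; }
}

String deltaType() : { String s; }
{
  <DELTATYPE> s = idOrKeyword() ";" { return s; }
}
========

Running javacc on this file, compiling the generated sources and
starting DeltaTypeDemo should print "text".  The token manager still
produces a TEXT token for that input; the difference is that the parser
now accepts it wherever an identifier is allowed.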

Yours,

Tom
http://generatingparserswithjavacc.com/

