Dynamic tokenizing

Dynamic tokenizing

by Bachelier, Georges

Hi!

I have been searching for three days for a solution to the following problem:

I am building a JavaCC parser for a language with an unusual feature: it lets you define user keywords, which may then appear anywhere in the input file after their declaration.

User keywords are declared with the following statement: UserKeywords <my_user_kw1> … <my_user_kwn> ;

Once keywords have been declared, they may be used like this:

<my_user_kw1> {
   <any_sequence_of_characters>
}

<my_user_kwn> {
   <any_sequence_of_characters>
}

The sequences of characters do not matter; we do not use them, but they are surrounded by braces.

I have written a parser production to collect the user keywords, and I have tried to use the token manager's CommonTokenAction method, but without success.

Could someone help me with this issue, please?

Georges


Re: Dynamic tokenizing

by Farrukh Najmi


This sounds doable. You can probably use LOOKAHEAD with a function call
as its parameter (a semantic lookahead) instead of a constant value.
See an example here:

<http://old.nabble.com/Problem:-semantic-lookahead-calling-boolean-functions-td23087841.html>
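A minimal sketch of the boolean function such a lookahead could call. The class name UserKeywordTable, its methods, and the production names in the comment are hypothetical and do not come from the thread or from JavaCC; only LOOKAHEAD({ ... }), getToken(1) and the token's image field are JavaCC features.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper: remembers declared user keywords so that a
 * semantic LOOKAHEAD in the grammar can test the upcoming word.
 */
class UserKeywordTable {
    private final Set<String> declared = new HashSet<>();

    /** Meant to be called from the production that parses "UserKeywords ... ;". */
    void declare(String word) {
        declared.add(word);
    }

    /** Boolean function intended for use inside LOOKAHEAD({ ... }). */
    boolean isUserKeyword(String image) {
        return declared.contains(image);
    }

    // In the .jj file, a choice could then be guarded roughly like this
    // (grammar fragment, not Java; "keywords", UserBlock and OtherStatement
    // are assumed names):
    //
    //   LOOKAHEAD({ keywords.isUserKeyword(getToken(1).image) }) UserBlock()
    // | OtherStatement()
}
```

With this shape the lexer keeps producing one generic word token, and the decision about whether that word is a keyword is made in the parser at the choice point.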



--
Regards,
Farrukh

Web: http://www.wellfleetsoftware.com



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...


RE: Dynamic tokenizing

by Mazas Marc

I do not believe you can do dynamic tokenizing directly in JavaCC.

I would consider writing a first parser that recognizes your UserKeywords syntax and that, in the corresponding production, a) generates a second parser (through println() statements or a template engine) containing token definitions for all the keywords, b) compiles it, and c) executes it on the remainder of the input file.
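Step a) could be sketched as follows. This only builds the TOKEN section of the second grammar as a string; the class name, the USER_KW_n token names, and the layout are assumptions for illustration, and running JavaCC on the result, compiling it, and executing it (steps b and c) are not shown.

```java
import java.util.List;

/** Hypothetical generator for the keyword tokens of the second grammar. */
class KeywordGrammarGenerator {

    /**
     * Builds a JavaCC TOKEN section with one string literal per keyword.
     * Assumes the list holds at least one keyword.
     */
    static String tokenSection(List<String> keywords) {
        StringBuilder sb = new StringBuilder("TOKEN :\n{\n");
        for (int i = 0; i < keywords.size(); i++) {
            // Alternatives after the first are separated by '|'.
            sb.append(i == 0 ? "  " : "| ");
            sb.append("< USER_KW_").append(i).append(" : \"")
              .append(escape(keywords.get(i))).append("\" >\n");
        }
        return sb.append("}\n").toString();
    }

    /** Escapes backslashes and double quotes for use inside a string literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The first parser would collect the keywords while parsing the UserKeywords statement and pass them to tokenSection, then write the returned text into the generated grammar file alongside the fixed productions.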
 
Marc
 


From: Bachelier, Georges [mailto:georges.bachelier@...]
Sent: Friday, 6 November 2009 20:14
To: users@...
Subject: [JavaCC] Dynamic tokenizing



Re: Dynamic tokenizing

by Bill Fenlason-2

Can't you just use a symbol table?  Macro processors often do this kind
of thing. Another similar example is "typedef" in C.

I haven't used COMMON_TOKEN_ACTION since I use USER_TOKEN_MANAGER, but I
would think that you can update a symbol table as the "UserKeywords"
statement is parsed.  Then in the common action code you can check
generic word tokens in the symbol table, and if found, change the "kind"
field in the token to a different token type (e.g. USER_KW).  The actual
keyword text is in the "image" field.  In the grammar you can specify
the <USER_KW> "{" ... "}" sequence anywhere it may occur and process it
as necessary as it is parsed.
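The retagging step described above might look roughly like this. SketchToken is a minimal stand-in for the Token class that JavaCC generates (only the kind and image fields are modelled), and the WORD and USER_KW constants are hypothetical; in a real grammar the kinds come from the generated constants interface and the check would live in the token manager's common token action.

```java
import java.util.HashSet;
import java.util.Set;

/** Minimal stand-in for JavaCC's generated Token class (kind and image only). */
class SketchToken {
    int kind;
    String image;

    SketchToken(int kind, String image) {
        this.kind = kind;
        this.image = image;
    }
}

/** Hypothetical symbol table plus the check a common token action could run. */
class KeywordRetagger {
    // Assumed token kinds, for illustration only.
    static final int WORD = 1;
    static final int USER_KW = 2;

    private final Set<String> symbolTable = new HashSet<>();

    /** Called while the "UserKeywords" statement is being parsed. */
    void declare(String word) {
        symbolTable.add(word);
    }

    /** Changes a generic word token into USER_KW if its text was declared. */
    void commonTokenAction(SketchToken t) {
        if (t.kind == WORD && symbolTable.contains(t.image)) {
            t.kind = USER_KW;
        }
    }
}
```

One thing to watch with this approach: a token that was already fetched for lookahead before its keyword was recorded would keep its original kind, so the declaration has to be entered into the table before the first use is tokenized.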

Depending on what you are doing with the content within the braces, you
may have to do some additional processing in the common action code.  If
you were trying to do text substitution or macro processing (i.e.
replacing the USER_KW {...} sequence with a different string of tokens
to be parsed), that would be a whole different ballgame.

Bill

