Boost.Lex ... how is a token value initialized ?


by Jean-Francois Ostiguy

I am back again ;-(

I am now trying to understand something that should be simple,
but so far, I have been unable to find a satisfactory answer just by
consulting the documentation. I admit that I did not read _all_ the  
documentation ... but I am trying to avoid diving into all of  
Phoenix , Variant etc, ... at least for now.

So the question is this:

I have a token defined as follows

integer    = "[0-9]+";
....

lex::token_def<int>   integer;

When the token is matched, something must set its value attribute to the
binary representation of an int. This requires calling a function to
translate the matched ASCII string that represents the integer.
In flex, one would do something like this.

   
{integer}     {     yylval->ival=atoi(yytext);
                    return token::INT_TOKEN;  
              }


In this case, I think I am supposed to use a semantic action of some kind,

integer[ val_ = ??? ]  

but what goes on the rhs (at this point I am not even sure about the lhs)?
Am I supposed to write my own lambda/Phoenix expression? Do I use the _start
and _end iterators to copy the matched string into a stringstream and read it
back into an int? Is there a better, predefined solution?

This is not clear at all. The "quick start" examples are not very useful,
since they conveniently avoid the issue by merely counting tokens, never
initializing any value attribute. So far, all my attempts at writing a
suitable semantic action have resulted in an orgy of template errors.

An example would go a long way ...  
   
-Francois




------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: Boost.Lex ... how is a token value initialized ?

by Hartmut Kaiser

Francois

> I am back again ;-(

Welcome back! :-)

> I am now trying to understand something that should be simple,
> but so far, I have been unable to find a satisfactory answer just by
> consulting the documentation. I admit that I did not read _all_ the
> documentation ... but I am trying to avoid diving into all of
> Phoenix , Variant etc, ... at least for now.
>
> So the question is this:
>
> I have a token defined as follows
>
> integer    = "[0-9]+";
> ....
>
> lex::token_def<int>   integer;
>
> When the token is matched, something must set its value attribute to the
> binary representation of an int. This requires calling a function to
> translate the matched ASCII string that represents the integer.
> In flex, one would do something like this.
>
>
> {integer}     {     yylval->ival=atoi(yytext);
>                     return token::INT_TOKEN;
>               }
>
> In this case, I think I am supposed to use a semantic action of some kind,
>
> integer[ val_ = ??? ]
>
> but what goes on the rhs (at this point I am not even sure about the lhs)?
> Am I supposed to write my own lambda/Phoenix expression? Do I use the _start
> and _end iterators to copy the matched string into a stringstream and read it
> back into an int? Is there a better, predefined solution?

The trick is that you don't have to do anything. By specifying the token
value type to be int, you're also defining the attribute type of this token
definition when it is used as a parser. Spirit.Lex knows how to convert all
built-in types from the matched input, so there is no need to attach any
semantic actions to the lexer.

Let's have a look at an example:

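// Headers and namespace aliases assumed by this example
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>

namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;

// Note: the token type used with this lexer must list int among its
// attribute types, e.g. lex::lexertl::token<Iterator, boost::mpl::vector<int> >.
template <typename Lexer>
struct print_numbers_tokens : lex::lexer<Lexer>
{
    print_numbers_tokens()
      : print_numbers_tokens::base_type()
    {
        integer = "[1-9][0-9]*";

        // Match any other character and drop it. A lexer semantic action
        // requires an actor lexer (lex::lexertl::actor_lexer<>).
        this->self
            =   integer
            |   lex::string(".")[lex::_pass = lex::pass_flags::pass_ignore]
            ;
    }
    lex::token_def<int> integer;
};

template <typename Iterator>
struct print_numbers_grammar : qi::grammar<Iterator>
{
    // Templated on the token definition type, because
    // print_numbers_tokens is itself a class template.
    template <typename TokenDef>
    print_numbers_grammar(TokenDef const& def)
      : print_numbers_grammar::base_type(start)
    {
        start = *def.integer[std::cout << qi::_1 << "\n"];
    }

    qi::rule<Iterator> start;
};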

This will print all integer numbers in a file, ignoring everything else. The
qi::_1 in the semantic action of the start rule refers to the 'int'
attribute exposed by the token definition (and it actually is of type
'int').

The conversion of the matched input sequence to int is executed on
demand only, when the value is accessed for the first time. If the same token
happens to be inspected a second time (because of backtracking in the
parser), the integer is still available without having to be converted
from the input string again.
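
To illustrate the on-demand conversion described above, here is a minimal
sketch in plain C++. It is not Spirit.Lex's actual token class: the name
lazy_int_token and its members are hypothetical, and std::stoi merely stands
in for whatever conversion the library performs internally. The token keeps
the iterator pair of the matched text and converts it on first access only.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a lexer token (NOT the Spirit.Lex token type).
// It stores the matched range and converts it to int lazily.
class lazy_int_token
{
public:
    typedef std::string::const_iterator iterator;

    lazy_int_token(iterator first, iterator last)
      : first_(first), last_(last), has_value_(false), value_(0), conversions_(0)
    {
        assert(first != last);  // a matched integer token is never empty
    }

    // Convert the matched text on first access; reuse the cached value afterwards.
    int value()
    {
        if (!has_value_) {
            value_ = std::stoi(std::string(first_, last_));
            has_value_ = true;
            ++conversions_;
        }
        return value_;
    }

    // Number of conversions actually performed (for illustration only).
    int conversions() const { return conversions_; }

private:
    iterator first_, last_;  // the matched input sequence
    bool has_value_;         // true once the conversion has happened
    int value_;              // cached result
    int conversions_;
};
```

Constructing a lazy_int_token over the text "1234" and calling value() twice
yields 1234 both times, while conversions() stays at 1. This mirrors the
backtracking case, where the parser looks at the same token again.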

> This is not clear at all. The "quick start" examples are not very useful,
> since they conveniently avoid the issue by merely counting tokens, never
> initializing any value attribute. So far, all my attempts at writing a
> suitable semantic action have resulted in an orgy of template errors.

Again, I'm sorry for the incomplete documentation, I'll try to catch up as
soon as possible.

Regards Hartmut

-------------------
Meet me at BoostCon
http://boostcon.com






Re: Boost.Lex ... how is a token value initialized?

by Jean-Francois Ostiguy-2

Hartmut Kaiser wrote:

> The trick is that you don't have to do anything. By specifying the token
> value type to be int you're defining the attribute type of this token
> definition if used as a parser as well. Spirit.Lex knows how to convert
> all built-in types from the matched input. So no need to attach any
> semantic actions to the lexer:
>

Hartmut -

Thank you for your careful and detailed explanation ... There are things
that remain rather nebulous.

I used 'integer' as an example because it does not hold a string as a value
attribute. While having automatic conversion for some built-in types is
nice, what happens when the value attribute is to be an instance of some
unspecified (user-defined) class?

Surely, in general, I need to provide some code to explain to the lexer how
the conversion is done. To fix ideas, suppose I have a class for rational
numbers and each rational number is represented by { n, m }, i.e. { 3,4 }
would be 3/4. I want to tokenize "{3,4}" and store an attribute of type
Rational in my token value attribute, using the constructor Rational(3,4)
to initialize it. In that case, I think I would have to write a custom
semantic action. I am not sure how I would do this ... How does my custom
semantic action get access to the token's string representation and to the
value attribute?

Again, thanks for your patience.

-Francois





Re: Boost.Lex ... how is a token value initialized?

by Hartmut Kaiser

> > The trick is that you don't have to do anything. By specifying the token
> > value type to be int you're defining the attribute type of this token
> > definition if used as a parser as well. Spirit.Lex knows how to convert
> > all built-in types from the matched input. So no need to attach any
> > semantic actions to the lexer:
>
> Hartmut -
>
> Thank you for your careful and detailed explanation ... There are things
> that remain rather nebulous.
>
> I used 'integer' as an example because it does not hold a string as a value
> attribute. While having automatic conversion for some built-in types is
> nice, what happens when the value attribute is to be an instance of some
> unspecified (user-defined) class?
>
> Surely, in general, I need to provide some code to explain to the lexer how
> the conversion is done. To fix ideas, suppose I have a class for rational
> numbers and each rational number is represented by { n, m }, i.e. { 3,4 }
> would be 3/4. I want to tokenize "{3,4}" and store an attribute of type
> Rational in my token value attribute, using the constructor Rational(3,4)
> to initialize it. In that case, I think I would have to write a custom
> semantic action. I am not sure how I would do this ... How does my custom
> semantic action get access to the token's string representation and to the
> value attribute?

Good question, and I have to admit this is not documented yet (it is a
missing paragraph in the section 'Customization of Spirit's Attribute
Handling', I'll add it asap).

For user-defined types you need to specialize the following template:

// this is the default/main template definition (contained in Spirit)
namespace boost { namespace spirit { namespace traits
{
    template <typename Attribute, typename Iterator
      , typename Enable /* = void*/>
    struct assign_to_attribute_from_iterators
    {
        static void
        call(Iterator const& first, Iterator const& last, Attribute& attr)
        {
            attr = Attribute(first, last);
        }
    };
}}}

// this is an example for a user-defined type foo
namespace boost { namespace spirit { namespace traits
{
    template <typename Iterator>
    struct assign_to_attribute_from_iterators<foo, Iterator>
    {
        static void
        call(Iterator const& first, Iterator const& last, foo& attr)
        {
            attr = foo(first, last); // construct foo from iterators
        }
    };
}}}

The iterators passed to call() point to the matched input sequence. Spirit
will use this specialization for conversion of the iterator pair to your
data type.
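
Applied to the Rational example from the question, the specialization might
look like the sketch below. Everything in it is an assumption made for
illustration: the Rational struct, the helper rational_from_range, and the
namespace demo::traits, which only mirrors the primary template quoted above
so that the snippet compiles without Boost. With Spirit itself, the
specialization would go into boost::spirit::traits instead.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical user-defined attribute type: {n,m} stands for n/m.
struct Rational
{
    int num;
    int den;
    Rational(int n = 0, int m = 1) : num(n), den(m)
    {
        assert(m != 0);  // precondition: non-zero denominator
    }
};

// Hypothetical helper: build a Rational from the matched text "{n,m}".
template <typename Iterator>
Rational rational_from_range(Iterator first, Iterator last)
{
    std::string const text(first, last);  // e.g. "{3,4}"
    std::string::size_type const comma = text.find(',');
    if (text.size() < 5 || text[0] != '{' || text[text.size() - 1] != '}'
        || comma == std::string::npos)
        throw std::invalid_argument("expected {n,m}");

    int const n = std::stoi(text.substr(1, comma - 1));
    int const m = std::stoi(text.substr(comma + 1, text.size() - comma - 2));
    return Rational(n, m);
}

// Stand-in namespace; the real customization point is in boost::spirit::traits.
namespace demo { namespace traits
{
    // Mirror of the primary template shown above.
    template <typename Attribute, typename Iterator, typename Enable = void>
    struct assign_to_attribute_from_iterators
    {
        static void
        call(Iterator const& first, Iterator const& last, Attribute& attr)
        {
            attr = Attribute(first, last);
        }
    };

    // Specialization for Rational: parse the matched range instead of
    // relying on an iterator-pair constructor.
    template <typename Iterator>
    struct assign_to_attribute_from_iterators<Rational, Iterator>
    {
        static void
        call(Iterator const& first, Iterator const& last, Rational& attr)
        {
            attr = rational_from_range(first, last);
        }
    };
}}
```

The parsing is kept in a free function so that it can be tested without a
lexer; the specialization then only forwards the iterator pair to it.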

Regards Hartmut

-------------------
Meet me at BoostCon
http://boostcon.com



