[lex] Parser primitives

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 | Next >

[lex] Parser primitives

by Kay-Michael Wuerzner-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

three questions:
1. After playing around with the 'Counting Words Using a Parser'
example from the spirit::lex tutorials, I wonder how one can use
parser primitives like 'alnum' in such a combined framework?
2. If they can not be used, how would you use predefined character
classes as 'alnum' in the lexer specification?
3. Are the parser primitives in general sensible for the current
'locale' (e.g. en_US.UTF-8 etc.)?

Many thanks in advance,
Kay

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by Hartmut Kaiser :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> three questions:
> 1. After playing around with the 'Counting Words Using a Parser'
> example from the spirit::lex tutorials, I wonder how one can use
> parser primitives like 'alnum' in such a combined framework?

You can't use alnum et.al. while parsing based on lexer input. Although this
might be something to think about.

> 2. If they can not be used, how would you use predefined character
> classes as 'alnum' in the lexer specification?

Character classes are currently not supported by the used Lexertl library,
IIUC. Ben?
You'll need to resolve to constructs like "[a-zA-Z]" for now.

> 3. Are the parser primitives in general sensible for the current
> 'locale' (e.g. en_US.UTF-8 etc.)?

The lexer doesn't support character sets either. Everything is implemented
based on the standard locale (namespace boost::spirit::standard). This is
something we want to look into in the future.

Regards Hartmut

-------------------
Meet me at BoostCon
http://boostcon.com





------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by Ben.Hanson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hartmut Kaiser wrote:
> three questions:
> 1. After playing around with the 'Counting Words Using a Parser'
> example from the spirit::lex tutorials, I wonder how one can use
> parser primitives like 'alnum' in such a combined framework?

You can't use alnum et.al. while parsing based on lexer input. Although this
might be something to think about.
> 2. If they can not be used, how would you use predefined character
> classes as 'alnum' in the lexer specification?

Character classes are currently not supported by the used Lexertl library,
IIUC. Ben?
You'll need to resolve to constructs like "[a-zA-Z]" for now.
I looked into supporting POSIX charsets (e.g. [[:alnum:]]), but I couldn't come up with an efficient way to implement it. You've got the \w and \W support, but this doesn't help much. I did however include an imbue() method on the lexertl rules class so you can change the default locale. This locale is used for case conversions. Again not a great deal of help, but possibly useful to somebody.
> 3. Are the parser primitives in general sensible for the current
> 'locale' (e.g. en_US.UTF-8 etc.)?

The lexer doesn't support character sets either. Everything is implemented
based on the standard locale (namespace boost::spirit::standard). This is
something we want to look into in the future.

Regards Hartmut
I notice someone actually mentioned patches to boost::regex for EBCDIC too..! Heh.

Regards,

Ben

Re: [lex] Parser primitives

by Ben.Hanson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hartmut Kaiser wrote:
> three questions:
> 1. After playing around with the 'Counting Words Using a Parser'
> example from the spirit::lex tutorials, I wonder how one can use
> parser primitives like 'alnum' in such a combined framework?

You can't use alnum et.al. while parsing based on lexer input. Although this
might be something to think about.
> 2. If they can not be used, how would you use predefined character
> classes as 'alnum' in the lexer specification?

Character classes are currently not supported by the used Lexertl library,
IIUC. Ben?
You'll need to resolve to constructs like "[a-zA-Z]" for now.
I looked into supporting POSIX charsets (e.g. [[:alnum:]]), but I couldn't come up with an efficient way to implement it. You've got the \w and \W support, but this doesn't help much. I did however include an imbue() method on the lexertl rules class so you can change the default locale. This locale is used for case conversions. Again not a great deal of help, but possibly useful to somebody.
> 3. Are the parser primitives in general sensible for the current
> 'locale' (e.g. en_US.UTF-8 etc.)?

The lexer doesn't support character sets either. Everything is implemented
based on the standard locale (namespace boost::spirit::standard). This is
something we want to look into in the future.

Regards Hartmut
I notice someone actually mentioned patches to boost::regex for EBCDIC too..! Heh.

Regards,

Ben

Re: [lex] Parser primitives

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Nov 4, 2009 at 8:46 AM, Ben.Hanson <jamin.hanson@...> wrote:

>
>
> Hartmut Kaiser wrote:
>>
>>> three questions:
>>> 1. After playing around with the 'Counting Words Using a Parser'
>>> example from the spirit::lex tutorials, I wonder how one can use
>>> parser primitives like 'alnum' in such a combined framework?
>>
>> You can't use alnum et.al. while parsing based on lexer input. Although
>> this
>> might be something to think about.
>>> 2. If they can not be used, how would you use predefined character
>>> classes as 'alnum' in the lexer specification?
>>
>> Character classes are currently not supported by the used Lexertl library,
>> IIUC. Ben?
>> You'll need to resolve to constructs like "[a-zA-Z]" for now.
>>
> I looked into supporting POSIX charsets (e.g. [[:alnum:]]), but I couldn't
> come up with an efficient way to implement it. You've got the \w and \W
> support, but this doesn't help much. I did however include an imbue() method
> on the lexertl rules class so you can change the default locale. This locale
> is used for case conversions. Again not a great deal of help, but possibly
> useful to somebody.
>
>
>>> 3. Are the parser primitives in general sensible for the current
>>> 'locale' (e.g. en_US.UTF-8 etc.)?
>>
>> The lexer doesn't support character sets either. Everything is implemented
>> based on the standard locale (namespace boost::spirit::standard). This is
>> something we want to look into in the future.
>>
>> Regards Hartmut
>>
> I notice someone actually mentioned patches to boost::regex for EBCDIC
> too..! Heh.

I have been curious about that, why does not lexertl not use
Boost.Xpressive for its regex since it supports the normal dynamic
regex, as well as the faster static xpressive regex, and have more
consistent regex support across boost?

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by CARL BARRON-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Nov 5, 2009, at 1:05 AM, OvermindDL1 wrote:

> On Wed, Nov 4, 2009 at 8:46 AM, Ben.Hanson <jamin.hanson@...
> > wrote:
>>
>>
>> Hartmut Kaiser wrote:
>>>
>>>> three questions:
>>>> 1. After playing around with the 'Counting Words Using a Parser'
>>>> example from the spirit::lex tutorials, I wonder how one can use
>>>> parser primitives like 'alnum' in such a combined framework?
>>>
>>> You can't use alnum et.al. while parsing based on lexer input.  
>>> Although
>>> this
>>> might be something to think about.
>>>> 2. If they can not be used, how would you use predefined character
>>>> classes as 'alnum' in the lexer specification?
>>>
>>> Character classes are currently not supported by the used Lexertl  
>>> library,
>>> IIUC. Ben?
>>> You'll need to resolve to constructs like "[a-zA-Z]" for now.
>>>
>> I looked into supporting POSIX charsets (e.g. [[:alnum:]]), but I  
>> couldn't
>> come up with an efficient way to implement it. You've got the \w  
>> and \W
>> support, but this doesn't help much. I did however include an imbue
>> () method
>> on the lexertl rules class so you can change the default locale.  
>> This locale
>> is used for case conversions. Again not a great deal of help, but  
>> possibly
>> useful to somebody.
>>
>>
>>>> 3. Are the parser primitives in general sensible for the current
>>>> 'locale' (e.g. en_US.UTF-8 etc.)?
>>>
>>> The lexer doesn't support character sets either. Everything is  
>>> implemented
>>> based on the standard locale (namespace boost::spirit::standard).  
>>> This is
>>> something we want to look into in the future.
>>>
>>> Regards Hartmut
>>>
>> I notice someone actually mentioned patches to boost::regex for  
>> EBCDIC
>> too..! Heh.
>
> I have been curious about that, why does not lexertl not use
> Boost.Xpressive for its regex since it supports the normal dynamic
> regex, as well as the faster static xpressive regex, and have more
> consistent regex support across boost?
Xpressive is a regular expression search engine.  Search strings for  
regular expressions.
The match can occur any place in the string.

Lex is a lexer ,  translate input into tokens which are regular  
expressions. The match can only
occurr starting at the current input character.

These are two different tools for two different problems.



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Carl Barron wrote:

>> I have been curious about that, why does not lexertl not use
>> Boost.Xpressive for its regex since it supports the normal dynamic
>> regex, as well as the faster static xpressive regex, and have more
>> consistent regex support across boost?
> Xpressive is a regular expression search engine.  Search strings for  
> regular expressions.
> The match can occur any place in the string.
>
> Lex is a lexer ,  translate input into tokens which are regular  
> expressions. The match can only
> occurr starting at the current input character.
>
> These are two different tools for two different problems.

Also, Xpressive and regex are NFAs. Lex is a DFA. Both have its
place, but for lexing, you need a DFA. NFAs won't cut it in terms
of performance.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
http://www.facebook.com/djowel

Meet me at BoostCon
http://www.boostcon.com/home
http://www.facebook.com/boostcon



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Wed, Nov 4, 2009 at 11:35 PM, Carl Barron <cbarron413@...> wrote:

>
> On Nov 5, 2009, at 1:05 AM, OvermindDL1 wrote:
>
>> On Wed, Nov 4, 2009 at 8:46 AM, Ben.Hanson <jamin.hanson@...
>> > wrote:
>>>
>>>
>>> Hartmut Kaiser wrote:
>>>>
>>>>> three questions:
>>>>> 1. After playing around with the 'Counting Words Using a Parser'
>>>>> example from the spirit::lex tutorials, I wonder how one can use
>>>>> parser primitives like 'alnum' in such a combined framework?
>>>>
>>>> You can't use alnum et.al. while parsing based on lexer input.
>>>> Although
>>>> this
>>>> might be something to think about.
>>>>> 2. If they can not be used, how would you use predefined character
>>>>> classes as 'alnum' in the lexer specification?
>>>>
>>>> Character classes are currently not supported by the used Lexertl
>>>> library,
>>>> IIUC. Ben?
>>>> You'll need to resolve to constructs like "[a-zA-Z]" for now.
>>>>
>>> I looked into supporting POSIX charsets (e.g. [[:alnum:]]), but I
>>> couldn't
>>> come up with an efficient way to implement it. You've got the \w
>>> and \W
>>> support, but this doesn't help much. I did however include an imbue
>>> () method
>>> on the lexertl rules class so you can change the default locale.
>>> This locale
>>> is used for case conversions. Again not a great deal of help, but
>>> possibly
>>> useful to somebody.
>>>
>>>
>>>>> 3. Are the parser primitives in general sensible for the current
>>>>> 'locale' (e.g. en_US.UTF-8 etc.)?
>>>>
>>>> The lexer doesn't support character sets either. Everything is
>>>> implemented
>>>> based on the standard locale (namespace boost::spirit::standard).
>>>> This is
>>>> something we want to look into in the future.
>>>>
>>>> Regards Hartmut
>>>>
>>> I notice someone actually mentioned patches to boost::regex for
>>> EBCDIC
>>> too..! Heh.
>>
>> I have been curious about that, why does not lexertl not use
>> Boost.Xpressive for its regex since it supports the normal dynamic
>> regex, as well as the faster static xpressive regex, and have more
>> consistent regex support across boost?
> Xpressive is a regular expression search engine.  Search strings for
> regular expressions.
> The match can occur any place in the string.
>
> Lex is a lexer ,  translate input into tokens which are regular
> expressions. The match can only
> occurr starting at the current input character.
>
> These are two different tools for two different problems.

I know, I did not say to replace the lexing functions with xpressive,
I know it cannot do that, just that lexertl uses a regex parser
internally to parse a set of text to see what to lex it as, instead of
being internal it could use xpressive as a backend regex parser
instead and get some speed enhancments.

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Nov 5, 2009 at 12:38 AM, Joel de Guzman
<joel@...> wrote:

> Carl Barron wrote:
>>> I have been curious about that, why does not lexertl not use
>>> Boost.Xpressive for its regex since it supports the normal dynamic
>>> regex, as well as the faster static xpressive regex, and have more
>>> consistent regex support across boost?
>> Xpressive is a regular expression search engine.  Search strings for
>> regular expressions.
>> The match can occur any place in the string.
>>
>> Lex is a lexer ,  translate input into tokens which are regular
>> expressions. The match can only
>> occurr starting at the current input character.
>>
>> These are two different tools for two different problems.
>
> Also, Xpressive and regex are NFAs. Lex is a DFA. Both have its
> place, but for lexing, you need a DFA. NFAs won't cut it in terms
> of performance.

Ah, quite true.  Well what about writing an xpressive like language
for Boost.Lex to allow for faster lexing?

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

OvermindDL1 wrote:

> On Thu, Nov 5, 2009 at 12:38 AM, Joel de Guzman
> <joel@...> wrote:
>> Carl Barron wrote:
>>>> I have been curious about that, why does not lexertl not use
>>>> Boost.Xpressive for its regex since it supports the normal dynamic
>>>> regex, as well as the faster static xpressive regex, and have more
>>>> consistent regex support across boost?
>>> Xpressive is a regular expression search engine.  Search strings for
>>> regular expressions.
>>> The match can occur any place in the string.
>>>
>>> Lex is a lexer ,  translate input into tokens which are regular
>>> expressions. The match can only
>>> occurr starting at the current input character.
>>>
>>> These are two different tools for two different problems.
>> Also, Xpressive and regex are NFAs. Lex is a DFA. Both have its
>> place, but for lexing, you need a DFA. NFAs won't cut it in terms
>> of performance.
>
> Ah, quite true.  Well what about writing an xpressive like language
> for Boost.Lex to allow for faster lexing?

Sure. It's called Spirit.Lex ;-)

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
http://www.facebook.com/djowel

Meet me at BoostCon
http://www.boostcon.com/home
http://www.facebook.com/boostcon



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by Ben.Hanson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

OvermindDL1 wrote:
I know, I did not say to replace the lexing functions with xpressive,
I know it cannot do that, just that lexertl uses a regex parser
internally to parse a set of text to see what to lex it as, instead of
being internal it could use xpressive as a backend regex parser
instead and get some speed enhancments.
The problem is that the two are very tightly coupled. The regex syntax is different between a straight up regex engine and a lexer due to the extra 'non-regular expression' features you tend to get with an NFA implementation quite apart from the fact that lexertl builds a syntax tree of all the regexes (with its own specific requirements).

Eric was interested in having a DFA module for Xpressive, but that is another discussion! Of course if that was an option, you would still need to support two different regex syntaxes.

Overall these are two very different problems solved in a significantly different way. Yes, there is an overlap but it's not trivial to factor out common code.

A meta regex library for boost could be really interesting, as indeed would a meta parsing library supporting LL, LR and recursive descent etc. As far as doing compile time conversion of regex *strings* to a regex representation, well, maybe D would be better for this job.

Regards,

Ben

Re: [lex] Parser primitives

by Hartmut Kaiser :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> On Thu, Nov 5, 2009 at 12:38 AM, Joel de Guzman
> <joel@...> wrote:
> > Carl Barron wrote:
> >>> I have been curious about that, why does not lexertl not use
> >>> Boost.Xpressive for its regex since it supports the normal dynamic
> >>> regex, as well as the faster static xpressive regex, and have more
> >>> consistent regex support across boost?
> >> Xpressive is a regular expression search engine.  Search strings for
> >> regular expressions.
> >> The match can occur any place in the string.
> >>
> >> Lex is a lexer ,  translate input into tokens which are regular
> >> expressions. The match can only
> >> occurr starting at the current input character.
> >>
> >> These are two different tools for two different problems.
> >
> > Also, Xpressive and regex are NFAs. Lex is a DFA. Both have its
> > place, but for lexing, you need a DFA. NFAs won't cut it in terms
> > of performance.
>
> Ah, quite true.  Well what about writing an xpressive like language
> for Boost.Lex to allow for faster lexing?

Spirit.Lex _is_ fast. Actually it's a lot faster than anything I saw (including flex, xpressive, boost.regex, etc.), especially the static lexer support

Regards Hartmut

-------------------
Meet me at BoostCon
http://boostcon.com




------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by OvermindDL1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, Nov 5, 2009 at 5:38 AM, Hartmut Kaiser <hartmut.kaiser@...> wrote:

>> On Thu, Nov 5, 2009 at 12:38 AM, Joel de Guzman
>> <joel@...> wrote:
>> > Carl Barron wrote:
>> >>> I have been curious about that, why does not lexertl not use
>> >>> Boost.Xpressive for its regex since it supports the normal dynamic
>> >>> regex, as well as the faster static xpressive regex, and have more
>> >>> consistent regex support across boost?
>> >> Xpressive is a regular expression search engine.  Search strings for
>> >> regular expressions.
>> >> The match can occur any place in the string.
>> >>
>> >> Lex is a lexer ,  translate input into tokens which are regular
>> >> expressions. The match can only
>> >> occurr starting at the current input character.
>> >>
>> >> These are two different tools for two different problems.
>> >
>> > Also, Xpressive and regex are NFAs. Lex is a DFA. Both have its
>> > place, but for lexing, you need a DFA. NFAs won't cut it in terms
>> > of performance.
>>
>> Ah, quite true.  Well what about writing an xpressive like language
>> for Boost.Lex to allow for faster lexing?
>
> Spirit.Lex _is_ fast. Actually it's a lot faster than anything I saw (including flex, xpressive, boost.regex, etc.), especially the static lexer support

It outperforms Boost.Xpressive static grammars?  Nice, but that
indicates a problem in Boost.Xpressive static grammars then.  :)

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by Hartmut Kaiser :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> On Thu, Nov 5, 2009 at 5:38 AM, Hartmut Kaiser
> <hartmut.kaiser@...> wrote:
> >> On Thu, Nov 5, 2009 at 12:38 AM, Joel de Guzman
> >> <joel@...> wrote:
> >> > Carl Barron wrote:
> >> >>> I have been curious about that, why does not lexertl not use
> >> >>> Boost.Xpressive for its regex since it supports the normal
> dynamic
> >> >>> regex, as well as the faster static xpressive regex, and have
> more
> >> >>> consistent regex support across boost?
> >> >> Xpressive is a regular expression search engine.  Search strings
> for
> >> >> regular expressions.
> >> >> The match can occur any place in the string.
> >> >>
> >> >> Lex is a lexer ,  translate input into tokens which are regular
> >> >> expressions. The match can only
> >> >> occurr starting at the current input character.
> >> >>
> >> >> These are two different tools for two different problems.
> >> >
> >> > Also, Xpressive and regex are NFAs. Lex is a DFA. Both have its
> >> > place, but for lexing, you need a DFA. NFAs won't cut it in terms
> >> > of performance.
> >>
> >> Ah, quite true.  Well what about writing an xpressive like language
> >> for Boost.Lex to allow for faster lexing?
> >
> > Spirit.Lex _is_ fast. Actually it's a lot faster than anything I saw
> (including flex, xpressive, boost.regex, etc.), especially the static
> lexer support
>
> It outperforms Boost.Xpressive static grammars?  Nice, but that
> indicates a problem in Boost.Xpressive static grammars then.  :)

Well, let's be fair. I didn't compare single regex's. I was trying to use xpressive as a lexer, which is slow by design (retrying different xpressive regex's one by one instead of lexertl's DFA approach). It would be interesting to see how single regex's compare.

Ben, did you have to time to work on generating re2c style static lexers, BTW?

Regards Hartmut



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by Ben.Hanson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hartmut Kaiser wrote:
Well, let's be fair. I didn't compare single regex's. I was trying to use xpressive as a lexer, which is slow by design (retrying different xpressive regex's one by one instead of lexertl's DFA approach). It would be interesting to see how single regex's compare.

Ben, did you have to time to work on generating re2c style static lexers, BTW?
I haven't got a re2c style code generator yet. If someone is prepared to work on this with me I could probably write one quite quickly. I printed out the design for re2c a long time ago, but if I understand correctly the original design targeted weaknesses in the C compilers of that era, so I wonder how careful a library targeting new compilers would have to be regarding the ordering of switch items.

If anyone has any suggestions I'm all ears. Alternatively, if this is going to be a hot topic I can try to pore over the re2c docs again.

Regards,

Ben

Re: [lex] Parser primitives

by Ben.Hanson :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ben.Hanson wrote:
Hartmut Kaiser wrote:
Well, let's be fair. I didn't compare single regex's. I was trying to use xpressive as a lexer, which is slow by design (retrying different xpressive regex's one by one instead of lexertl's DFA approach). It would be interesting to see how single regex's compare.

Ben, did you have to time to work on generating re2c style static lexers, BTW?
I haven't got a re2c style code generator yet. If someone is prepared to work on this with me I could probably write one quite quickly. I printed out the design for re2c a long time ago, but if I understand correctly the original design targeted weaknesses in the C compilers of that era, so I wonder how careful a library targeting new compilers would have to be regarding the ordering of switch items.

If anyone has any suggestions I'm all ears. Alternatively, if this is going to be a hot topic I can try to pore over the re2c docs again.
Actually thinking about this a bit more, maybe I should just do a simple generator that checks chars in code and we can just improve the code as we go. It's actually *really* simple to do the basics as you just iterate over the state machine and spit out some switch statements and some gotos. I'll have a go with that and then we can look at why the generated code is too slow! :-D

Regards,

Ben

Re: [lex] Parser primitives

by Hartmut Kaiser :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> >> Well, let's be fair. I didn't compare single regex's. I was trying
> to use
> >> xpressive as a lexer, which is slow by design (retrying different
> >> xpressive regex's one by one instead of lexertl's DFA approach). It
> would
> >> be interesting to see how single regex's compare.
> >>
> >> Ben, did you have to time to work on generating re2c style static
> lexers,
> >> BTW?
> >>
> > I haven't got a re2c style code generator yet. If someone is prepared
> to
> > work on this with me I could probably write one quite quickly. I
> printed
> > out the design for re2c a long time ago, but if I understand
> correctly the
> > original design targeted weaknesses in the C compilers of that era,
> so I
> > wonder how careful a library targeting new compilers would have to be
> > regarding the ordering of switch items.
> >
> > If anyone has any suggestions I'm all ears. Alternatively, if this is
> > going to be a hot topic I can try to pore over the re2c docs again.
> >
> Actually thinking about this a bit more, maybe I should just do a
> simple
> generator that checks chars in code and we can just improve the code as
> we
> go. It's actually *really* simple to do the basics as you just iterate
> over
> the state machine and spit out some switch statements and some gotos.
> I'll
> have a go with that and then we can look at why the generated code is
> too
> slow! :-D

That would be great starting point. We could run direct comparing
measurements with the current statically generated lexer, then.

Would it be possible to target the 'old' state machine (the one used in
Spirit)?

Regards Hartmut

-------------------
Meet me at BoostCon
http://boostcon.com




------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ben.Hanson wrote:

>
> Ben.Hanson wrote:
>>
>> Hartmut Kaiser wrote:
>>> Well, let's be fair. I didn't compare single regex's. I was trying to use
>>> xpressive as a lexer, which is slow by design (retrying different
>>> xpressive regex's one by one instead of lexertl's DFA approach). It would
>>> be interesting to see how single regex's compare.
>>>
>>> Ben, did you have to time to work on generating re2c style static lexers,
>>> BTW?
>>>
>> I haven't got a re2c style code generator yet. If someone is prepared to
>> work on this with me I could probably write one quite quickly. I printed
>> out the design for re2c a long time ago, but if I understand correctly the
>> original design targeted weaknesses in the C compilers of that era, so I
>> wonder how careful a library targeting new compilers would have to be
>> regarding the ordering of switch items.
>>
>> If anyone has any suggestions I'm all ears. Alternatively, if this is
>> going to be a hot topic I can try to pore over the re2c docs again.
>>
> Actually thinking about this a bit more, maybe I should just do a simple
> generator that checks chars in code and we can just improve the code as we
> go. It's actually *really* simple to do the basics as you just iterate over
> the state machine and spit out some switch statements and some gotos. I'll
> have a go with that and then we can look at why the generated code is too
> slow! :-D

IMO, we shouldn't follow re2c strategy. Re2c is a code generator and
you'll end up like lex/yacc, which is not the spirit of spirit. A
long time ago, I've demonstrated code using perfect hashing that
generates code, using just the compiler, that beats most switch
implementations at the time. I'd say that's the right way to go:
beat Re2C performance without ever using a switch. A humongous
switch statement is just so, ehmmm... stone age. Ditto for offline
code generators.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
http://www.facebook.com/djowel

Meet me at BoostCon
http://www.boostcon.com/home
http://www.facebook.com/boostcon



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by CARL BARRON-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Nov 6, 2009, at 6:39 PM, Joel de Guzman wrote:

>>>
>> Actually thinking about this a bit more, maybe I should just do a  
>> simple
>> generator that checks chars in code and we can just improve the  
>> code as we
>> go. It's actually *really* simple to do the basics as you just  
>> iterate over
>> the state machine and spit out some switch statements and some  
>> gotos. I'll
>> have a go with that and then we can look at why the generated code  
>> is too
>> slow! :-D
>
> IMO, we shouldn't follow re2c strategy. Re2c is a code generator and
> you'll end up like lex/yacc, which is not the spirit of spirit. A
> long time ago, I've demonstrated code using perfect hashing that
> generates code, using just the compiler, that beats most switch
> implementations at the time. I'd say that's the right way to go:
> beat Re2C performance without ever using a switch. A humongous
> switch statement is just so, ehmmm... stone age. Ditto for offline
> code generators.

    Stone age?   Well even  lexer::lexertl's static generation is  
'offline' in that it needs a small C++ program to generate a header  
file.

    I think that a requiring a random access iterator for the source  
iterator is a bigger problem, than providing an re2c style code  
generator, but more options is better than one option.

     Boy this thread is getting on a tangent:)



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Parser primitives

by Joel de Guzman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Carl Barron wrote:

> On Nov 6, 2009, at 6:39 PM, Joel de Guzman wrote:
>>> Actually thinking about this a bit more, maybe I should just do a  
>>> simple
>>> generator that checks chars in code and we can just improve the  
>>> code as we
>>> go. It's actually *really* simple to do the basics as you just  
>>> iterate over
>>> the state machine and spit out some switch statements and some  
>>> gotos. I'll
>>> have a go with that and then we can look at why the generated code  
>>> is too
>>> slow! :-D
>> IMO, we shouldn't follow re2c strategy. Re2c is a code generator and
>> you'll end up like lex/yacc, which is not the spirit of spirit. A
>> long time ago, I've demonstrated code using perfect hashing that
>> generates code, using just the compiler, that beats most switch
>> implementations at the time. I'd say that's the right way to go:
>> beat Re2C performance without ever using a switch. A humongous
>> switch statement is just so, ehmmm... stone age. Ditto for offline
>> code generators.
>
>     Stone age?   Well even  lexer::lexertl's static generation is  
> 'offline' in that it needs a small C++ program to generate a header  
> file.
>
>     I think that a requiring a random access iterator for the source  
> iterator is a bigger problem, than providing an re2c style code  
> generator, but more options is better than one option.
>
>      Boy this thread is getting on a tangent:)

It also highlights my lack of knowledge of lexertl :P Hmmm...
Man, I should be paying more attention. Why does it require
the header file generation?

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
http://www.facebook.com/djowel

Meet me at BoostCon
http://www.boostcon.com/home
http://www.facebook.com/boostcon



------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general
< Prev | 1 - 2 | Next >