[lex] Character classes


[lex] Character classes

by Kay-Michael Wuerzner-2

Another question related to this topic:
> The lexer doesn't support character sets either. Everything is implemented
> based on the standard locale (namespace boost::spirit::standard). This is
> something we want to look into in the future.
Do you think it would be possible to add another charset (comparable
to 'ascii.hpp' and 'iso-8859-1.hpp') let's say 'unicode.hpp' based on
this listing:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
and integrate it in spirit (based on wchar_t)? All the necessary
information (small, capital, control etc.) seems to be in there, so I
volunteer to script a conversion ;).
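
A minimal sketch of what the first step of such a conversion script might look like, assuming the semicolon-separated, 15-field record layout of UnicodeData.txt. The names `CharInfo` and `parse_record` are hypothetical and not part of Spirit, and the sample lines are illustrative records in that layout.

```python
from typing import NamedTuple, Optional


class CharInfo(NamedTuple):
    """Subset of one UnicodeData.txt record (hypothetical helper type)."""
    code_point: int
    name: str
    category: str                # general category, e.g. "Lu", "Ll", "Nd"
    simple_upper: Optional[int]  # simple uppercase mapping, if any
    simple_lower: Optional[int]  # simple lowercase mapping, if any


def parse_record(line: str) -> CharInfo:
    """Parse one semicolon-separated record into a CharInfo."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")

    def hex_or_none(text: str) -> Optional[int]:
        return int(text, 16) if text else None

    return CharInfo(
        code_point=int(fields[0], 16),
        name=fields[1],
        category=fields[2],
        simple_upper=hex_or_none(fields[12]),
        simple_lower=hex_or_none(fields[13]),
    )


# Illustrative records in the UnicodeData.txt field layout.
SAMPLE = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "2162;ROMAN NUMERAL THREE;Nl;0;L;<compat> 0049 0049 0049;;;3;N;;;;2172;",
]

table = {info.code_point: info for info in map(parse_record, SAMPLE)}
for info in table.values():
    print(f"U+{info.code_point:04X} {info.category} {info.name}")
```

The sketch ignores the `<..., First>` / `<..., Last>` range pairs that the real file uses for large blocks, so a complete script would have to expand those as well.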

Cheers,
Kay

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

Re: [lex] Character classes

by Hartmut Kaiser

> Another question related to this topic:
> > The lexer doesn't support character sets either. Everything is implemented
> > based on the standard locale (namespace boost::spirit::standard). This is
> > something we want to look into in the future.
> Do you think it would be possible to add another charset (comparable
> to 'ascii.hpp' and 'iso-8859-1.hpp') let's say 'unicode.hpp' based on
> this listing:
> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
> and integrate it in spirit (based on wchar_t)? All the necessary
> information (small, capital, control etc.) seems to be in there, so I
> volunteer to script a conversion ;).

That actually has been the plan from the beginning, but it has not been
implemented yet.
We wanted to 'wait' for Boost to get a Unicode library, but if you have a
quicker solution be our guest to come up with a patch!

Thanks!
Regards Hartmut

-------------------
Meet me at BoostCon
http://boostcon.com





Re: [lex] Character classes

by Joel de Guzman-2

Hartmut Kaiser wrote:

>> Another question related to this topic:
>>> The lexer doesn't support character sets either. Everything is implemented
>>> based on the standard locale (namespace boost::spirit::standard). This is
>>> something we want to look into in the future.
>> Do you think it would be possible to add another charset (comparable
>> to 'ascii.hpp' and 'iso-8859-1.hpp') let's say 'unicode.hpp' based on
>> this listing:
>> http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
>> and integrate it in spirit (based on wchar_t)? All the necessary
>> information (small, capital, control etc.) seems to be in there, so I
>> volunteer to script a conversion ;).
>
> That actually has been the plan from the beginning, but it has not been
> implemented yet.
> We wanted to 'wait' for Boost to get a Unicode library, but if you have a
> quicker solution be our guest to come up with a patch!

Alas, it's not that simple. You'll find out as you dig deeper into
UnicodeData.txt and its required semantics.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net
http://www.facebook.com/djowel

Meet me at BoostCon
http://www.boostcon.com/home
http://www.facebook.com/boostcon




Re: [lex] Character classes

by Kay-Michael Wuerzner-2

>> That actually has been the plan from the beginning, but it has not been
>> implemented yet.
>> We wanted to 'wait' for Boost to get a Unicode library, but if you have a
>> quicker solution be our guest to come up with a patch!

I'll try my best.

> Alas, it's not that simple. You'll find out as you dig deeper into
> UnicodeData.txt and its required semantics.

Granted, but there are Unicode libraries (based on UnicodeData.txt)
available for other languages, let's say python. One could use the
included 'upper', 'lower', 'digit', etc. classification to generate a
'wchar_t unicode_char_types[]'. From my experience, the python Unicode
support is very good. 'Upper'->'Lower' mappings are included even for
really weird characters such as 'Ⅲ', for example.
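
A rough sketch of the generation step described above, using Python's standard `unicodedata` module. The flag values, the helper names `char_flags` and `emit_c_table`, and the `unsigned char` element type are assumptions made for illustration; they are not taken from Spirit's existing charset headers.

```python
import unicodedata

# Hypothetical classification bits for the generated table.
UPPER, LOWER, DIGIT, ALPHA = 1, 2, 4, 8


def char_flags(cp: int) -> int:
    """Derive classification bits from the general category of a code point."""
    category = unicodedata.category(chr(cp))
    flags = 0
    if category == "Lu":
        flags |= UPPER
    if category == "Ll":
        flags |= LOWER
    if category == "Nd":
        flags |= DIGIT
    if category.startswith("L"):
        flags |= ALPHA
    return flags


def emit_c_table(first: int, last: int, name: str = "unicode_char_types") -> str:
    """Render the flags for the inclusive range [first, last] as a C array."""
    values = ", ".join(str(char_flags(cp)) for cp in range(first, last + 1))
    return f"static const unsigned char {name}[] = {{ {values} }};"


print(emit_c_table(0x41, 0x43))

# U+2162 ROMAN NUMERAL THREE has a lowercase mapping to U+2172 ...
roman_three = "\u2162"
print(roman_three.lower() == "\u2172")
# ... but its general category is Nl (letter number), not Lu.
print(unicodedata.category(roman_three))
```

Note that a table built from the general category alone would not flag 'Ⅲ' as upper case even though it has a lowercase mapping, so a real generator would need more than this one field.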

Cheers,
Kay


Re: [lex] Character classes

by Joel de Guzman-2

Kay-Michael Wuerzner wrote:

>>> That actually has been the plan from the beginning, but it has not been
>>> implemented yet.
>>> We wanted to 'wait' for Boost to get a Unicode library, but if you have a
>>> quicker solution be our guest to come up with a patch!
>
> I'll try my best.
>
>> Alas, it's not that simple. You'll find out as you dig deeper into
>> UnicodeData.txt and its required semantics.
>
> Granted, but there are Unicode libraries (based on UnicodeData.txt)
> available for other languages, let's say python. One could use the
> included 'upper', 'lower', 'digit', etc. classification to generate a
> 'wchar_t unicode_char_types[]'. From my experience, the python Unicode
> support is very good. 'Upper'->'Lower' mappings are included even for
> really weird characters such as 'Ⅲ', for example.

What about UTF-7, UTF-8, UTF-16 (or UCS-2), UTF-32 (UCS-4)? wchar_t
alone won't cut it. It can't even represent Unicode by itself.
In UTF-8, each Unicode character (code point) takes 1 to 4 octets
(8-bit bytes). Code points run up to U+10FFFF, so a fixed-width
unit needs at least 21 bits (32 in practice), and wchar_t is not
guaranteed to have 32 bits. It is 16 bits on some platforms
(and can be as small as 8 bits). uint32_t is sufficient,
but it is very wasteful of memory. UTF-8 is very efficient
on memory for mostly-ASCII text, but its variable-width decoding
can have an impact on performance. The only acceptable strategy
is to be generic and not fix the data type.
It is hairy to implement, but it is the right way to go.
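
To make the size trade-off concrete, here is a small sketch that reports how many bytes one code point occupies in each encoding form, plus the width of wchar_t on the machine running it. The helper name `encoded_sizes` is hypothetical, and the sample characters are arbitrary picks from different code point ranges.

```python
import ctypes


def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes a single code point takes per encoding."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The -le variants avoid counting a byte order mark.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


# ASCII, Latin-1 range, rest of the BMP, and a code point beyond the BMP.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Width of wchar_t as seen by the C toolchain Python was built with.
print("wchar_t bytes on this platform:", ctypes.sizeof(ctypes.c_wchar))
```

A code point beyond the BMP needs two 16-bit units in UTF-16, which is why a 16-bit wchar_t cannot hold every code point in a single element.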

Sure, we can hack it, but I'd rather wait for a more robust
solution. The Boost unicode project is close to becoming
useful.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

