Case and accent insensitive string search

I'm trying to set up a string search where a collation, text and search string are passed in and the position of the match is passed out.

Simplifying the code, I have:

search(collName, text, search)
   ucol = ucol_openFromShortString(collName)
   locale = ucol_getLocaleByType(ucol)
   ubrk = ubrk_openFromCollator(UBRK_CHARACTER, locale, text)
   usearch = usearch_openFromCollator(search, text, ucol, ubrk)
   return(usearch_first(usearch))

When I first tried this, I didn't have the character break iterator and I was getting incorrect results for accented characters and strength 3 collations. I saw the comments in the ICU documentation and added in the character break iterator.

Now, however, I'm having trouble with combining characters and strength 1 collations.

If I pass in:
   collName = LEN_S1
   text = 0043 00D4 0054 00C9 (CÔTÉ)
   search = 004F (O)
the function returns offset 1 as expected.

However if I pass in:
   collName = LEN_S1
   text = 0043 004F 0302 0054 00C9 (CO^TÉ)
   search = 004F (O)
the search fails and returns -1.

So, can anyone tell me what I'm missing here?

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support

Re: Case and accent insensitive string search

On 4/24/07, Doug Doole <doole@...> wrote:
> I've been playing with this a bit more, but I'm still stumped.

It is, I'm afraid, a bug. There are several known problems with string search in this general area. See Trac tickets #5420, #5382, #5024, #4279, #4038 and #3536 at http://bugs.icu-project.org/trac.

The problems, unfortunately, do not have ready fixes. You will need to evaluate whether ICU string search can meet your needs in its current form.

 -- Andy

> A couple things I checked:
> - The character break iterator is properly determining the character
>   boundaries
> - The language for the collator doesn't seem to make any difference
>
> So, if anyone has a suggestion on how I can get this working properly, I
> would really appreciate it.
>
> > I'm trying to set up a string search where a collation, text and
> > search string are passed in and the position of the match is passed out.
> [snip]

Re: Case and accent insensitive string search

On 4/25/07, Andy Heninger <andy.heninger@...> wrote:
> On 4/24/07, Doug Doole <doole@...> wrote:
> > I've been playing with this a bit more, but I'm still stumped.
>
> It is, I'm afraid, a bug.

Doug wrote
> However if I pass in:
>    collName = LEN_S1
>    text = 0043 004F 0302 0054 00C9 (CO^TÉ)
>    search = 004F (O)
> the search fails and returns -1.

A question: does it still fail to find the match if you don't use the character break iterator with strength 1 matching? That would be the case if the underlying match failed to include the combining mark 0302 character.

If that's the case, a workaround might be to use the break iterator only when the strength is greater than 1.

 -- Andy

Re: Case and accent insensitive string search

Andy wrote:
> Doug wrote
> > However if I pass in:
> >    collName = LEN_S1
> >    text = 0043 004F 0302 0054 00C9 (CO^TÉ)
> >    search = 004F (O)
> > the search fails and returns -1.
>
> A question: does it still fail to find the match if you don't use the
> character break iterator with strength 1 matching? That would be the
> case if the underlying match failed to include the combining mark 0302
> character.
>
> If that's the case, a workaround might be to use the break iterator
> only when the strength is greater than 1.

If I drop the character break iterator, then it does find the matching character. However, it reports the match length as 1 (the combining accent is not considered part of the match), which causes problems for subsequent processing that my code needs to do.

Re: Case and accent insensitive string search

I think a workaround is to
- use the search without the character break iterator,
- then use the character break iterator to extend the start and end boundaries to include whole grapheme clusters,
- then retest that result with the collator set to whatever your strength is,
- then skip to the next if it doesn't match.

Let me know if that works.

Mark

On 4/27/07, Doug Doole <doole@...> wrote:
[snip]

UnicodeString comparison to char*

I've got the following code fragment (cut down here), and it works, but I don't understand why.

const std::string &DesiredTimezone;
UnicodeString TimeZoneID;
TimeZone *timezone = TimeZone::createTimeZone (DesiredTimezone.c_str());
if (timezone->getID (TimeZoneID) != DesiredTimezone.c_str())
{
    printf ("not the selected timezone\n");
}

What I don't understand is how the returned value from TimeZone::getID(), which is a reference to a UnicodeString, is correctly being compared to the "char*" string (using the obvious string-comparison semantics). Looking in the UnicodeString Class Reference page, I don't see operators that would perform the string comparison I want and which is, in fact, actually happening. I could just accept this, but I want to know how it works so I know I'm not making some serious mistake. What am I missing?

Re: UnicodeString comparison to char*

icu-support-bounces@... wrote on 04/27/2007 10:39:24 AM:
> I've got the following code fragment (cut down here), and it works,
> but I don't understand why.
>
> const std::string &DesiredTimezone;
> UnicodeString TimeZoneID;
> TimeZone *timezone = TimeZone::createTimeZone (DesiredTimezone.c_str());
> if (timezone->getID (TimeZoneID) != DesiredTimezone.c_str())
> {
>     printf ("not the selected timezone\n");
> }
>
> What I don't understand is how the returned value from TimeZone::getID(),
> which is a reference to a UnicodeString, is correctly being compared to
> the "char*" string (using the obvious string-comparison semantics).
> Looking in the UnicodeString Class Reference page, I don't see operators
> that would perform the string comparison I want and which is, in fact,
> actually happening. I could just accept this, but I want to know how
> it works so I know I'm not making some serious mistake. What am I missing?

UnicodeString has a constructor that takes a const char*, and there's an operator== for two UnicodeString instances, so the compiler constructs a temporary to do the comparison.

Dave
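
The mechanism Dave describes can be reproduced without ICU. The following is only a minimal sketch: WideText is a hypothetical toy class, not ICU's UnicodeString, and its byte-by-byte widening is an assumption that holds only for ASCII input. It shows that a non-explicit const char* constructor lets the compiler build a temporary for the right-hand side of the comparison, and the counter makes that hidden conversion visible.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a UTF-16 string class. This is NOT ICU's UnicodeString.
class WideText {
public:
    // Non-explicit converting constructor: the compiler may call it
    // implicitly wherever a WideText is required and a const char* is given.
    // Each byte is widened as-is, which is only correct for ASCII input.
    WideText(const char* narrow) {
        for (const char* p = narrow; *p != '\0'; ++p) {
            units_.push_back(static_cast<char16_t>(static_cast<unsigned char>(*p)));
        }
        ++conversions;  // count every narrow-to-wide conversion
    }

    // Comparison operators exist only for WideText vs. WideText.
    bool operator==(const WideText& other) const { return units_ == other.units_; }
    bool operator!=(const WideText& other) const { return !(*this == other); }

    static int conversions;

private:
    std::u16string units_;
};

int WideText::conversions = 0;

// Mirrors the shape of the condition in the question: the const char* on
// the right is silently converted to a temporary WideText before operator!=.
bool differs(const WideText& id, const char* desired) {
    return id != desired;
}
```

Calling differs(id, "America/Toronto") increments WideText::conversions once per call, which is the cost the later replies in this thread are concerned about. Declaring the constructor explicit would turn the comparison into a compile error instead.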

Re: UnicodeString comparison to char*

From: <david_n_bertoni@...>
[...]
>> What I don't understand is how the returned value from TimeZone::getID(),
>> which is a reference to a UnicodeString, is correctly being compared to
>> the "char*" string (using the obvious string-comparison semantics).
>> Looking in the UnicodeString Class Reference page, I don't see operators
>> that would perform the string comparison I want and which is, in fact,
>> actually happening. I could just accept this, but I want to know how
>> it works so I know I'm not making some serious mistake. What am I missing?
>
> UnicodeString has a constructor that takes a const char*, and there's an
> operator== for two UnicodeString instances, so the compiler constructs a
> temporary to do the comparison.

But what does it do to the narrow const char* to make it 'wide'? Use the default system code page? Sounds like a constructor that shouldn't exist, to avoid unintended side effects...

Bob

Re: UnicodeString comparison to char*

> But what does it do to the narrow const char* to make it 'wide'? Use the
> default system code page?

Yes, that is how char * strings are converted to UChar * strings in the code snippet. The string is also converted to UTF-16 on the createTimeZone line.

If you use "#define UCONFIG_NO_CONVERSION 1" in your application, you can ensure that you're not using codepage conversion. This will also make it more difficult to use ICU with the non-Unicode OS functions, but it will help you find the lines that are using codepage conversion. You can then reduce the number of times you reconvert between char * and UChar *, or use alternate, faster ICU functions to do the conversion.

Re: UnicodeString comparison to char*

icu-support-bounces@... wrote on 04/27/2007 08:35:28 PM:
> From: <david_n_bertoni@...>
> [...]
> > UnicodeString has a constructor that takes a const char*, and there's an
> > operator== for two UnicodeString instances, so the compiler constructs a
> > temporary to do the comparison.
>
> But what does it do to the narrow const char* to make it 'wide'? Use the
> default system code page? Sounds like a constructor that shouldn't exist to
> avoid unintended side effects...

Well, the documentation makes this pretty clear:

/**
 * char* constructor.
 * @param codepageData an array of bytes, null-terminated
 * @param codepage the encoding of <TT>codepageData</TT>. The special
 * value 0 for <TT>codepage</TT> indicates that the text is in the
 * platform's default codepage.
 *
 * If <code>codepage</code> is an empty string (<code>""</code>),
 * then a simple conversion is performed on the codepage-invariant
 * subset ("invariant characters") of the platform encoding. See utypes.h.
 * Recommendation: For invariant-character strings use the constructor
 * UnicodeString(const char *src, int32_t length, enum EInvariant inv)
 * because it avoids object code dependencies of UnicodeString on
 * the conversion code.
 *
 * @stable ICU 2.0
 */

And, as George said, you can avoid implicit conversions if you want to.

Dave
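
To make the "invariant characters" idea from the documentation comment concrete, here is a minimal sketch that does not use ICU at all. The helper names isInvariantChar and widenInvariant are hypothetical, the character set is my approximation of the list in utypes.h (treat it as an assumption and check the header), and rejecting other input with a false return is this toy's own choice rather than documented ICU behavior.

```cpp
#include <cassert>
#include <string>

// Approximation (assumption) of the invariant set: letters, digits, and a
// small group of punctuation that has the same meaning in every codepage.
// The range checks assume an ASCII-based platform.
bool isInvariantChar(char c) {
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
        return true;
    }
    const std::string punct = " \"%&'()*+,-./:;<=>?_";
    return punct.find(c) != std::string::npos;
}

// Widens src to UTF-16 without consulting any codepage converter.
// Returns false and leaves out untouched if src contains a character
// outside the invariant subset.
bool widenInvariant(const std::string& src, std::u16string& out) {
    std::u16string result;
    result.reserve(src.size());
    for (char c : src) {
        if (!isInvariantChar(c)) {
            return false;
        }
        result.push_back(static_cast<char16_t>(c));
    }
    out = result;
    return true;
}
```

A time zone ID such as "America/Toronto" stays inside this subset, so it can be widened without any codepage conversion; a string containing '@' or accented bytes cannot.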

Re: Case and accent insensitive string search

On 4/27/07, Mark Davis <mark.davis@...> wrote:
> I think a workaround is to
>
> - use the search without the character break iterator,
> - then use the character break iterator to extend the start and end
>   boundaries to include whole grapheme clusters,
> - then retest that result with the collator set to whatever your
>   strength is,
> - then skip to the next if it doesn't match.
>
> Let me know if that works.

Mark, a question about what StringSearch should really be doing (as opposed to what it does now):

In what cases should a match not extend to a grapheme cluster boundary anyhow, without the extra complication of a user supplied break iterator?

I'm thinking of something along these lines: after finding a match, continue to iterate collation elements in the string until a grapheme cluster boundary is reached, and any CEs encountered along the way had better be ignorable.

 -- Andy

> On 4/27/07, Doug Doole <doole@...> wrote:
> [snip]

Re: Case and accent insensitive string search

One may or may not want to test for that boundary.

Vladimir and I have been discussing some different approaches, and he'll see if we can mock something up to see if it is worth pursuing.

Mark

On 4/29/07, Andy Heninger <andy.heninger@...> wrote:
[snip]

Re: Case and accent insensitive string search

On 5/1/07, Mark Davis <mark.davis@...> wrote:
> One may or may not want to test for that boundary.
>
> Vladimir and I have been discussing some different approaches, and he'll see
> if we can mock something up to see if it is worth pursuing.

I've been poking around more in String Search - the bugs are becoming more of an issue here.

But, separate from the bugs, the choice of match options provided now seems way too confusing. I can't imagine ordinary developers sorting them out without an excess of pain and annoyance.

Maybe something along these lines would produce usable, non-surprising results without needing a bunch of options:

- Strength 1 matches always absorb all combining stuff following the match, up to the next base character.
- Strength > 1 matches require a perfect collation element match through all combining stuff at the end of the match in the text being searched. The final grapheme cluster of the match is considered as a whole - it either matches the pattern entirely, or the match fails. No more tearing it apart and reordering it looking for a match.
- Get rid of the canonical vs exact match distinction that exists in the current API.
- Get rid of the break iterator option.
- Make a pattern that starts with unattached combining marks either an error or a warning. It's probably not going to do what the user expected - it won't match normal (attached) combining marks from the text being searched.

If we could redo the API, I'd be inclined towards something much much simpler - forward searching only, no retained index position, results returned only as start and limit indexes (compatible with UText, could be non-UTF-16), nothing explicit for iteration, overlapping vs. non-overlapping and the like - just a starting index parameter on the search function that the caller provides & manages.

 -- Andy
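
As a rough illustration of the API shape Andy proposes (forward only, caller-managed start index, result as start and limit), here is a toy sketch. It is not ICU code and not a proposed ICU signature: findPrimary and Match are hypothetical names, "combining" is approximated as the range U+0300..U+036F, and there is no collation, normalization or case folding, so a precomposed Ô would not match O here. It only models the first rule, that a strength 1 match absorbs the combining marks that follow it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Result as start/limit indexes only; no iterator state is retained.
struct Match {
    std::size_t start;
    std::size_t limit;
    bool found;
};

// Toy approximation: only the Combining Diacritical Marks block.
bool isCombining(char32_t c) { return c >= 0x0300 && c <= 0x036F; }

// Forward search from startIndex, ignoring combining marks (toy "strength 1").
Match findPrimary(const std::u32string& text, const std::u32string& pattern,
                  std::size_t startIndex) {
    // Accents are ignorable at this strength, so drop them from the pattern.
    std::u32string bases;
    for (char32_t c : pattern) {
        if (!isCombining(c)) bases.push_back(c);
    }
    if (bases.empty()) return {0, 0, false};

    for (std::size_t i = startIndex; i < text.size(); ++i) {
        if (isCombining(text[i])) continue;  // never start inside a cluster
        std::size_t t = i;
        std::size_t p = 0;
        while (p < bases.size() && t < text.size()) {
            if (isCombining(text[t])) { ++t; continue; }  // ignorable in text
            if (text[t] != bases[p]) break;
            ++t;
            ++p;
        }
        if (p == bases.size()) {
            // Absorb everything up to the next base character.
            while (t < text.size() && isCombining(text[t])) ++t;
            return {i, t, true};
        }
    }
    return {0, 0, false};
}
```

The caller iterates by passing the previous limit as the next startIndex, which is where the overlapping vs. non-overlapping decision would live in this design.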

Re: Case and accent insensitive string search

I agree with your analysis, Andy. It appears that the current API is too complicated and that it should be redone. Another point there is that the only reason for backward iteration over CEs in collation is string search's need to go back. I'm thinking about this in a background thread and I hope to have some ideas figured out soon.

Regards,
v.

On May 2, 2007, at 10:07 AM, Andy Heninger wrote:
> On 5/1/07, Mark Davis <mark.davis@...> wrote:
>> One may or may not want to test for that boundary.
[snip]

Re: Case and accent insensitive string search

Mark, I finally got a chance to try this and it seems to work. Thanks!

A bit of follow-up: Why is the last comparison needed? (I'm concerned about performance.) Can you think of a case where your suggestion would fail without the extra check?

> I think a workaround is to
> - use the search without the character break iterator,
> - then use the character break iterator to extend the start and end
>   boundaries to include whole grapheme clusters,
> - then retest that result with the collator set to whatever your
>   strength is,
> - then skip to the next if it doesn't match.
>
> Let me know if that works.
>
> Mark

> On 4/27/07, Doug Doole <doole@...> wrote:
[snip]

Re: Case and accent insensitive string search

Suppose that you have primary+secondary on, and search for "abc". What you find is "...[abc]^..." (where the ^ is some accent, and [...] means what you found). When you extend the boundary to a grapheme cluster, you get [abc^], but with secondary differences on that is not a match with the original, because the accent now counts. The same thing could happen with Korean. The cost of comparing the substring again should be a small percentage of the total, for this workaround.

Mark

(Also, some of us are now taking a look at this, and we should be able to come up with a more reliable (and higher performance) mechanism.)

On 5/10/07, Doug Doole <doole@...> wrote:
> Mark, I finally got a chance to try this and it seems to work. Thanks!
[snip]
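
The four workaround steps, and the reason the retest matters, can be sketched without ICU. Everything here is a simplified stand-in and an assumption: searchWithRetest, Range, stripMarks and equalAtStrength are hypothetical names, the raw search is a plain code point find instead of a collation-based search, grapheme clusters are approximated as a base plus marks in U+0300..U+036F, and "strength" is reduced to 1 (ignore accents) versus anything higher (exact comparison).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Range {
    std::size_t start;
    std::size_t limit;
    bool found;
};

// Toy approximation of a combining mark.
bool isCombining(char32_t c) { return c >= 0x0300 && c <= 0x036F; }

std::u32string stripMarks(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        if (!isCombining(c)) out.push_back(c);
    }
    return out;
}

// Stand-in for "compare with the collator at the requested strength".
bool equalAtStrength(const std::u32string& a, const std::u32string& b, int strength) {
    if (strength == 1) return stripMarks(a) == stripMarks(b);
    return a == b;
}

Range searchWithRetest(const std::u32string& text, const std::u32string& pattern,
                       int strength) {
    std::size_t from = 0;
    while (true) {
        // Step 1: raw search, no break iterator involved.
        std::size_t pos = text.find(pattern, from);
        if (pos == std::u32string::npos) return {0, 0, false};

        // Step 2: extend both ends to (toy) grapheme cluster boundaries.
        std::size_t start = pos;
        std::size_t limit = pos + pattern.size();
        while (start > 0 && isCombining(text[start])) --start;
        while (limit < text.size() && isCombining(text[limit])) ++limit;

        // Step 3: retest the extended span at the requested strength.
        if (equalAtStrength(text.substr(start, limit - start), pattern, strength)) {
            return {start, limit, true};
        }

        // Step 4: not a real match once extended, so skip to the next one.
        from = pos + 1;
    }
}
```

With the text from the original question, a strength 1 search for O reports the range covering both O and the combining circumflex, while a higher strength search rejects it after the retest, which is the case Mark describes.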

Re: Case and accent insensitive string search

> Suppose that you have primary+secondary on, and search for "abc".
> What you find is "...[abc]^..." (where the ^ is some accent, and
> [...] means what you found). When you extend the boundary to a
> grapheme cluster, you get [abc^], but that is not a match with the
> original. The same thing could happen with Korean. The cost of
> comparing the substring again should be a small percentage of the
> total, for this workaround.

Geez, that's obvious - apparently my brain is on vacation without me today.

> (Also, some of us are now taking a look at this, and we should be
> able to come up with a more reliable (and higher performance) mechanism.)

That's good to hear.

However, I'm stuck with the broken code for now. You wouldn't happen to know a good workaround for the bugs where leading/trailing characters can't be matched (see bugs 5024, 5420)? (It seems that adding a space at the beginning and end of the text to be searched solves the problem, but that's a big pain.)

Re: Case and accent insensitive string search

If the only remaining failure cases are at the beginning and end of the searched-in text, I'd suggest -- as a workaround --

On 5/10/07, Doug Doole <doole@...> wrote:
> > Suppose that you have primary+secondary on, and search for "abc".

-- Mark

Re: Case and accent insensitive string search

On 5/10/07, Doug Doole <doole@...> wrote:
[snip]
> However I'm stuck with the broken code for now. You wouldn't happen to know
> a good work around for the unable to match leading/trailing characters bugs
> (see bugs 5024, 5420)? (It seems that adding a space at the beginning and
> ending of the text to be searched solves the problem, but that's a big
> pain.)

I'm continuing to look into alternate C/C++ search implementations that should eliminate the problems with the existing one. But there are some things going on that I don't understand, having to do with expansions: matching the German ß with ss, for example. The issues here do affect the current implementation and are not restricted to the ends of the string.

I'll let you know what comes of it.

 -- Andy