|
View:
New views
2 Messages
—
Rating Filter:
Alert me
|
|
|
ucol_getSortKey, with a special EOL unicode character ?Hi,
I'm trying to using ICU inside IBM DB2 and i create like the IBM example a function ICU.SORTKEY that use ucol_getSortKey. This case work fine: (check a value on a column with the collation LFR_AS_S1 (ignore case,ponctuation,accents) SELECT .... FROM ... WHERE ICU.SORTKEY(MyField,'LFR_AS_S1') = ICU.SORTKEY('motif','LFR_AS_S1'); But this case doesn't work (check a portion value on a column) SELECT .... FROM ... WHERE ICU.SORTKEY(MyField,'LFR_AS_S1') LIKE ('%' || ICU.SORTKEY('ot','LFR_AS_S1') || '%'); ucol_getSortKey for "motif" with a "LFR_AS_S1" collation return "AEO9?" ucol_getSortKey for "ot" with a "LFR_AS_S1" collation return "E?" I don't know what is really the ? character, but by the way "E?" ins'nt included into "AEO9?" and the LIKE statement fails For test purpose, i compile my dll with a modified source that remove the 2 last chars in the ucol_getSortKey results and it seems to work. But i want know is it's the good way for handle this task What is for this character ? A String's part has always the same key that does'nt depends of the rest of the string ? Best regards, ANDRE Sébastien. ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ icu-support mailing list - icu-support@... To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support |
|
|
Re: ucol_getSortKey, with a special EOL unicode character ?> But this case doesn't work (check a portion value on a column)
> SELECT .... FROM ... WHERE ICU.SORTKEY(MyField,'LFR_AS_S1') LIKE ('%' || > ICU.SORTKEY('ot','LFR_AS_S1') || '%'); Andre, You cannot use the ICU.SORTKEY function in a LIKE predicate in this manner. When ICU builds a sort key, it generates one or more weights per character and then concatenates them together. (Unfortunately, I can't do subscripts here, so please bear with some ugly syntax. I'm also going to simplify a little bit - you can get the full details of ICU sort keys from their documentation.) For a character X, let the corresponding weights be denoted by X1 X2 X3 X4. Let | represent the ICU field separator and $ represent the end-of-key indicator. Most ICU collations default to weight 3, so your search string "OT" would get the sort key: O1 T1 | O2 T2 | O3 T3 $ The target string "MOTIF" would get the key: M1 O1 T1 I1 F1 | M2 O2 T2 I2 F2 | M3 O3 T3 I3 F3 $ DB2's LIKE predicate does a simple binary match, and you can see that the search sort key does not match any substring of the target sort key. In your case, you chose a strength 1 collation so the search sort key is simply "O1 T1 $" and the target sort key is "M1 O1 T1 I1 F1 $". In theory, stripping the end-of-key indicator of the search sort key would allow DB2 to find a binary match. ICU, however, in a effort to reduce memory consumption compresses the generated sort key. As you have seen, the compression prevents the match from being found. You've got a couple options: - In the UDF, rather than using ucol_getSortKey(), try using ucol_getNextSortKeyPart() to get the sort key. This functions doesn't compress the sort key so, in theory, you could make it work for LIKE predicates. ** This will only work for strength 1 collations. ** If you use a strength > 1 you get into my first case and you won't be able to find a match with either API. - You could try creating a similar UDF based on the ICU string search API. (Note that there are some known issues with the string search API though. See this mailing list's archives and/or bug list for details.) Hope this helps. ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ icu-support mailing list - icu-support@... To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support |
| Free embeddable forum powered by Nabble | Forum Help |