ucol_getSortKey, with a special EOL unicode character ?

View: New views
2 Messages — Rating Filter:   Alert me  

ucol_getSortKey, with a special EOL unicode character ?

by s.andre :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

I'm trying to using ICU inside IBM DB2 and i create like the IBM example a
function ICU.SORTKEY that use ucol_getSortKey.

This case work fine: (check a value on a column with the collation LFR_AS_S1
(ignore case,ponctuation,accents)

SELECT .... FROM ... WHERE ICU.SORTKEY(MyField,'LFR_AS_S1') =
ICU.SORTKEY('motif','LFR_AS_S1');

But this case doesn't work (check a portion value on a column)
SELECT .... FROM ... WHERE ICU.SORTKEY(MyField,'LFR_AS_S1') LIKE ('%' ||
ICU.SORTKEY('ot','LFR_AS_S1') || '%');


ucol_getSortKey for "motif" with a "LFR_AS_S1" collation return "AEO9?"
ucol_getSortKey for "ot" with a "LFR_AS_S1" collation return "E?"

I don't know what is really the ? character, but by the way "E?"  ins'nt
included into "AEO9?" and the LIKE statement fails

For test purpose, i compile my dll with a modified source that remove the 2
last chars in the ucol_getSortKey  results and it seems to work.
But i want know is it's the good way for handle this task
What is for this character ?
A String's part has always the same key that does'nt depends of the rest of
the string ?

Best regards,
ANDRE Sébastien.



-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support

Re: ucol_getSortKey, with a special EOL unicode character ?

by Doug Doole :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> But this case doesn't work (check a portion value on a column)
> SELECT .... FROM ... WHERE ICU.SORTKEY(MyField,'LFR_AS_S1') LIKE ('%' ||
> ICU.SORTKEY('ot','LFR_AS_S1') || '%');

Andre,
You cannot use the ICU.SORTKEY function in a LIKE predicate in this manner.

When ICU builds a sort key, it generates one or more weights per character
and then concatenates them together. (Unfortunately, I can't do subscripts
here, so please bear with some ugly syntax. I'm also going to simplify a
little bit - you can get the full details of ICU sort keys from their
documentation.)

For a character X, let the corresponding weights be denoted by X1 X2 X3 X4.
Let | represent the ICU field separator and $ represent the end-of-key
indicator.

Most ICU collations default to weight 3, so your search string "OT" would
get the sort key:
   O1 T1 | O2 T2 | O3 T3 $

The target string "MOTIF" would get the key:

   M1 O1 T1 I1 F1 | M2 O2 T2 I2 F2 | M3 O3 T3 I3 F3 $

DB2's LIKE predicate does a simple binary match, and you can see that the
search sort key does not match any substring of the target sort key.

In your case, you chose a strength 1 collation so the search sort key is
simply "O1 T1 $" and the target sort key is "M1 O1 T1 I1 F1 $". In theory,
stripping the end-of-key indicator of the search sort key would allow DB2
to find a binary match. ICU, however, in a effort to reduce memory
consumption compresses the generated sort key. As you have seen, the
compression prevents the match from being found.

You've got a couple options:
- In the UDF, rather than using ucol_getSortKey(), try using
ucol_getNextSortKeyPart() to get the sort key. This functions doesn't
compress the sort key so, in theory, you could make it work for LIKE
predicates. ** This will only work for strength 1 collations. ** If you use
a strength > 1 you get into my first case and you won't be able to find a
match with either API.

- You could try creating a similar UDF based on the ICU string search API.
(Note that there are some known issues with the string search API though.
See this mailing list's archives and/or bug list for details.)

Hope this helps.


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
icu-support mailing list - icu-support@...
To Un/Subscribe: https://lists.sourceforge.net/lists/listinfo/icu-support