http://sourceware.org/bugzilla/show_bug.cgi?id=13547
Bug #: 13547
Summary: Different strings collate as equal in Hungarian
Product: glibc
Version: 2.14
Status: NEW
Severity: normal
Priority: P2
Component: localedata
AssignedTo: libc-locales@...
ReportedBy: egmont@...
Classification: Unclassified
Created attachment 6139
  --> http://sourceware.org/bugzilla/attachment.cgi?id=6139
collate fix for Hungarian
Please apply the attached patch to the Hungarian locale definition.
Using the current definition, certain strings collate as equal, e.g.
strcoll("ccs", "cscs") returns zero. This confuses programs such as sort (the
relative order of such lines is undefined and may vary from run to run) and
uniq (different lines are reported as equal).
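
A minimal sketch for checking the symptom from C. It assumes that a Hungarian
locale is installed under the name hu_HU.UTF-8 (the name may differ on your
system). The helper name collates_equal_but_differs is made up for this
illustration and is not part of glibc.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a glibc function.
 * Returns  1 if a and b differ byte-wise but strcoll() reports them as
 *            equal under the named locale,
 *          0 if that is not the case,
 *         -1 if the locale could not be selected (e.g. not installed). */
int collates_equal_but_differs(const char *locale_name,
                               const char *a, const char *b)
{
    /* Only the collation category is switched; other categories are
     * left untouched. */
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    return strcmp(a, b) != 0 && strcoll(a, b) == 0;
}
```

With the locale definition described in this report,
collates_equal_but_differs("hu_HU.UTF-8", "ccs", "cscs") should return 1, and
0 once the two strings collate as different. A return value of -1 means the
locale is not available under that name.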
The given patch addresses this problem and makes them collate as different,
without modifying the actual sorting order of valid Hungarian words.

The problem in more detail:
Hungarian has compound letters such as "cs", comparable to "sh" in English.
When such a letter is pronounced long, it is written in a shorthand form,
"ccs" (only the first character is doubled), rather than as "cscs".

Currently "ccs" is tokenized as <cs><cs>, which is correct, but "cscs" (not
used in valid Hungarian words, but it might occur in text files anyway) is
also tokenized as <cs><cs>, hence the two collate as equal.

The solution is to tokenize "ccs" as <c_or_cs><cs>, and to order the tokens as
follows:
  <a> <b> <c> <c_or_cs> <cs> <d> ...
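
The idea can be sketched with a toy, single-level model in C. This is not the
locale definition syntax and not the attached patch: the alphabet is cut down
to a, b, c, cs, d, and the names tokenize, compare_tokens and toy_collate are
invented for this example. It only shows that the two tokenizations give equal
keys for "ccs" and "cscs" before the change and different keys after it.

```c
#include <stddef.h>
#include <string.h>

/* Toy weights in the order <a> <b> <c> <c_or_cs> <cs> <d>.
 * 0 is reserved as the end-of-key marker. */
enum token { TOK_A = 1, TOK_B, TOK_C, TOK_C_OR_CS, TOK_CS, TOK_D };

/* Tokenize s into out[], terminated by 0. Only a, b, c, cs, ccs and d
 * are understood. If fixed is nonzero, "ccs" becomes <c_or_cs><cs>;
 * otherwise it becomes <cs><cs>, the same tokens as "cscs".
 * Returns the token count, or -1 on unsupported input or overflow. */
int tokenize(const char *s, int fixed, int *out, size_t cap)
{
    size_t n = 0;
    while (*s) {
        if (n + 2 >= cap)   /* room for up to two tokens plus the marker */
            return -1;
        if (strncmp(s, "ccs", 3) == 0) {
            out[n++] = fixed ? TOK_C_OR_CS : TOK_CS;
            out[n++] = TOK_CS;
            s += 3;
        } else if (strncmp(s, "cs", 2) == 0) {
            out[n++] = TOK_CS;
            s += 2;
        } else if (*s == 'a') { out[n++] = TOK_A; s++; }
        else if (*s == 'b') { out[n++] = TOK_B; s++; }
        else if (*s == 'c') { out[n++] = TOK_C; s++; }
        else if (*s == 'd') { out[n++] = TOK_D; s++; }
        else return -1;
    }
    out[n] = 0;
    return (int)n;
}

/* Compare two 0-terminated token sequences; returns -1, 0 or 1. */
static int compare_tokens(const int *x, const int *y)
{
    while (*x && *x == *y) { x++; y++; }
    return (*x > *y) - (*x < *y);
}

/* Returns -1, 0 or 1 like a collation function, or 2 on bad input. */
int toy_collate(const char *a, const char *b, int fixed)
{
    int ta[64], tb[64];
    if (tokenize(a, fixed, ta, 64) < 0 || tokenize(b, fixed, tb, 64) < 0)
        return 2;
    return compare_tokens(ta, tb);
}
```

In this model toy_collate("ccs", "cscs", 0) returns 0, because both strings
become <cs><cs>, while toy_collate("ccs", "cscs", 1) returns -1, because
<c_or_cs> sorts before <cs>. The model has a single comparison level, so it
says nothing about how the real definition keeps the order of valid words
unchanged.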

The problem was originally discovered at http://hup.hu/node/110267 (forum in
Hungarian).