[Bug localedata/13547] New: Different strings collate as equal in Hungarian


[Bug localedata/13547] New: Different strings collate as equal in Hungarian

by Bugzilla from sourceware-bugzilla@sourceware.org


http://sourceware.org/bugzilla/show_bug.cgi?id=13547

             Bug #: 13547
           Summary: Different strings collate as equal in Hungarian
           Product: glibc
           Version: 2.14
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
        AssignedTo: libc-locales@...
        ReportedBy: egmont@...
    Classification: Unclassified


Created attachment 6139
  --> http://sourceware.org/bugzilla/attachment.cgi?id=6139
collate fix for Hungarian

Please apply the attached patch to the Hungarian locale definition.

With the current definition, certain distinct strings collate as equal; for
example, strcoll("ccs", "cscs") returns zero. This confuses programs such as
sort (the relative order of such lines is undefined and might vary from run to
run) and uniq (different lines are reported as equal).
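One way to check whether a given system shows this behaviour is sketched below.
It is only an illustration, not part of the report: it relies on Python's
locale.strcoll, which wraps the C library's strcoll, and it assumes the
Hungarian locale is installed under the name "hu_HU.UTF-8". The helper name
hungarian_strcoll is made up for this example. A result of 0 would mean the
two strings still collate as equal on that system.

```python
import locale


def hungarian_strcoll(a, b, locale_name="hu_HU.UTF-8"):
    """Compare a and b with strcoll under locale_name.

    Returns the strcoll result, or None if the locale is not installed.
    The locale name is an assumption; adjust it to match the local system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not available on this machine
    try:
        return locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore previous setting


result = hungarian_strcoll("ccs", "cscs")
print("locale not installed" if result is None else result)
```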

The attached patch addresses this problem and makes such strings collate as
different, without modifying the actual sorting order of valid Hungarian words.

The problem in more detail:

Hungarian has compound letters (digraphs) such as "cs", comparable to "sh" in
English. When such a letter is pronounced long, it is written in the shorthand
form "ccs" (only the first character is doubled) rather than "cscs".

Currently "ccs" is tokenized as <cs><cs>, which is correct, but "cscs" (not
used in valid Hungarian words, though it might still occur in text files) is
also tokenized as <cs><cs>; hence the two collate as equal.

The solution is to tokenize "ccs" as <c_or_cs><cs> and to order the tokens as
<a> <b> <c> <c_or_cs> <cs> <d> ...
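
The effect of the change can be illustrated with a small toy model. This is a
sketch under stated assumptions, not the glibc implementation and not the
contents of the attached patch: the tables OLD_TABLE and NEW_TABLE, the
greedy longest-match tokenizer, and the single-level sort key are all
hypothetical simplifications. Real locale definitions use several weight
levels, so the model only shows that the two strings stop comparing equal; it
says nothing about how the patch preserves the order of valid words.

```python
# Hypothetical tables: how "ccs" and "cs" map to tokens before and after
# the proposed change.
OLD_TABLE = {"ccs": ("cs", "cs"), "cs": ("cs",)}
NEW_TABLE = {"ccs": ("c_or_cs", "cs"), "cs": ("cs",)}

# Token order quoted in the report: <a> <b> <c> <c_or_cs> <cs> <d>
ORDER = ["a", "b", "c", "c_or_cs", "cs", "d"]
RANK = {token: index for index, token in enumerate(ORDER)}


def tokenize(text, table):
    """Greedy longest-match split of text into collation tokens."""
    keys = sorted(table, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                tokens.extend(table[key])
                i += len(key)
                break
        else:
            tokens.append(text[i])  # no multi-letter match: single character
            i += 1
    return tokens


def sort_key(text, table):
    """Single-level key: one rank per token; unknown tokens sort after ORDER."""
    return [RANK.get(t, len(ORDER) + ord(t[0])) for t in tokenize(text, table)]


print(tokenize("ccs", OLD_TABLE), tokenize("cscs", OLD_TABLE))
print(tokenize("ccs", NEW_TABLE), tokenize("cscs", NEW_TABLE))
```

With OLD_TABLE both strings yield the same token list, so any key derived from
the tokens is identical. With NEW_TABLE the first token differs, so the keys
differ as well.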

The problem was originally discovered at http://hup.hu/node/110267 (forum in
Hungarian).

--
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug localedata/13547] Different strings collate as equal in Hungarian

by Bugzilla from sourceware-bugzilla@sourceware.org


http://sourceware.org/bugzilla/show_bug.cgi?id=13547

Egmont Koblinger <egmont at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #6139|0                           |1
        is obsolete|                            |

--- Comment #1 from Egmont Koblinger <egmont at gmail dot com> 2012-01-03 00:28:36 UTC ---
Created attachment 6140
  --> http://sourceware.org/bugzilla/attachment.cgi?id=6140
collate fix for Hungarian


[Bug localedata/13547] Different strings collate as equal in Hungarian

by Bugzilla from sourceware-bugzilla@sourceware.org


http://sourceware.org/bugzilla/show_bug.cgi?id=13547

Ulrich Drepper <drepper.fsp at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |drepper.fsp at gmail dot
                   |                            |com
         Resolution|                            |FIXED

--- Comment #2 from Ulrich Drepper <drepper.fsp at gmail dot com> 2012-01-07 16:05:07 UTC ---
I added the patch.
