[patch #6869] fgrep/egrep returns wrong matched none UTF-8 chars

[patch #6869] fgrep/egrep returns wrong matched none UTF-8 chars

by Phil Carmody-5


URL:
  <http://savannah.gnu.org/patch/?6869>

                 Summary: fgrep/egrep returns wrong matched none UTF-8 chars
                 Project: grep
            Submitted by: fujiwara
            Submitted on: 2009-07-17 08:20:26
                Category: None
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email:
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

fgrep/egrep compare individual bytes as if they were ASCII characters, so
the commands return wrong matches in non-UTF-8 multi-byte locales.

For example, in the GB18030 encoding the four bytes 0x81 0x30 0x89 0x38
encode a single multi-byte character, 'beta'.

/bin/echo -e "\x81\x30\x89\x38" | fgrep '0'

echo outputs that one multi-byte character, yet fgrep reports a match on
its second byte, 0x30, which is the single-byte character '0'.
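
The mismatch can be illustrated outside grep. The following is a minimal
Python sketch, not grep's code: it searches the same four bytes once as raw
bytes and once after decoding them with Python's built-in gb18030 codec. The
variable names are illustrative only.

```python
# Bytes from the report: one GB18030 four-byte character, then a newline.
data = b"\x81\x30\x89\x38\n"

# Byte-oriented search, roughly what a matcher does if it ignores the encoding.
byte_hit = data.find(b"0")

# Character-oriented search: decode first, then look for the character '0'.
text = data.decode("gb18030")
char_hit = text.find("0")

print(byte_hit)   # 1: the second byte of the character is 0x30
print(char_hit)   # -1: the decoded text contains no '0' character
print(len(text))  # 2: one multi-byte character plus the newline
```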

The attached patch fixes bmexec() so that it handles multi-byte characters.
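
The patch itself is not reproduced in this thread. As a rough illustration of
the general idea only, the Python sketch below accepts a byte-level match only
when it starts on a character boundary. The helper names
gb18030_char_starts and find_on_boundary are hypothetical, the boundary walk
is a simplified reading of GB18030 (it does not validate trail bytes), and the
end of the match is not checked.

```python
def gb18030_char_starts(data: bytes) -> set:
    """Return the offsets where a character starts (simplified GB18030 walk)."""
    starts = set()
    i = 0
    n = len(data)
    while i < n:
        starts.add(i)
        lead = data[i]
        if 0x81 <= lead <= 0xFE and i + 1 < n:
            # A second byte in 0x30..0x39 marks a four-byte sequence;
            # any other second byte is treated as a two-byte sequence.
            if 0x30 <= data[i + 1] <= 0x39:
                i += 4
            else:
                i += 2
        else:
            i += 1
    return starts


def find_on_boundary(haystack: bytes, needle: bytes) -> int:
    """Return the first byte match that starts on a character boundary, or -1."""
    starts = gb18030_char_starts(haystack)
    pos = haystack.find(needle)
    while pos != -1:
        if pos in starts:
            return pos
        # Match began inside a multi-byte character: skip it and keep looking.
        pos = haystack.find(needle, pos + 1)
    return -1


print(find_on_boundary(b"\x81\x30\x89\x38\n", b"0"))    # -1
print(find_on_boundary(b"\x81\x30\x89\x38 0\n", b"0"))  # 5
```

In the first call the only 0x30 byte lies inside the four-byte character, so
no match is reported; in the second call the standalone '0' at offset 5 is
found.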



    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: 2009-07-17 08:20:26  Name:
grep-508811-head-fgrep-bmexec.diff  Size: 2kB   By: fujiwara
Patch for src/kwset.c
<http://savannah.gnu.org/patch/download.php?file_id=18429>

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/patch/?6869>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/




[patch #6869] fgrep/egrep returns wrong matched none UTF-8 chars

by Phil Carmody-5


Follow-up Comment #1, patch #6869 (project grep):

The usage of `echo' is incorrect.  Instead, use `printf'.

$ /bin/echo -e "\x81\x30\x89\x38"
\x81\x30\x89\x38
$ /bin/echo -e "\x81\x30\x89\x38" | od -tx1 -Ax
000000 5c 78 38 31 5c 78 33 30 5c 78 38 39 5c 78 33 38
000010 0a
000011
$ printf "\x81\x30\x89\x38" | od -tx1 -Ax
000000 81 30 89 38
000004

$ printf "\x81\x30\x89\x38" | LANG=zh_CN.gb18030 grep -o '0'
$ printf "\x81\x30\x89\x38" | LANG=C grep -o '0'
0
$ grep --version
GNU grep 2.5.4

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
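
The two od dumps in this comment differ only in whether the \x escapes were
interpreted. A small Python sketch of the same two byte sequences follows; it
only reproduces the dumps and says nothing about why a particular echo leaves
the escapes alone. The variable names are illustrative.

```python
# Escapes left uninterpreted: backslash, 'x', and two hex digits as text,
# followed by the newline that echo appends.
literal = rb"\x81\x30\x89\x38" + b"\n"

# Escapes interpreted: four raw bytes and no trailing newline.
interpreted = b"\x81\x30\x89\x38"

print(literal.hex(" "))      # 5c 78 38 31 5c 78 33 30 5c 78 38 39 5c 78 33 38 0a
print(interpreted.hex(" "))  # 81 30 89 38
```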






[patch #6869] fgrep/egrep returns wrong matched none UTF-8 chars

by Phil Carmody-5


Follow-up Comment #2, patch #6869 (project grep):

> The usage of `echo' is incorrect. Instead, use `printf'.
> $ /bin/echo -e "\x81\x30\x89\x38"
> \x81\x30\x89\x38

No, it's not different of my result.

% /bin/echo -e "\x81\x30\x89\x38" | od -tx1 -Ax
000000 81 30 89 38 0a
000005

Did you run the terminal in the zh_CN.GB18030 locale?

Also, the problem is with egrep/fgrep, not grep.





[patch #6869] fgrep/egrep returns wrong matched none UTF-8 chars

by Phil Carmody-5


Follow-up Comment #3, patch #6869 (project grep):

My typo:

> No, it's not different of my result.
No, it's different from my result.

I can reproduce this problem with either echo or printf.





[patch #6869] fgrep/egrep returns wrong matched none UTF-8 chars

by Phil Carmody-5


Follow-up Comment #4, patch #6869 (project grep):

I can reproduce it with the egrep/fgrep bundled with CentOS etc.,
but not with the original version of egrep/fgrep.

Have you downloaded the grep program from http://ftp.gnu.org/gnu/grep/
and not applied any patches to it?






Re: [patch #6869] fgrep/egrep returns wrong matched none UTF-8 chars

by Takao Fujiwara

(07/22/09 01:30), Norihirio Tanaka-san wrote:
> Follow-up Comment #4, patch #6869 (project grep):
>
> I can reproduce it with the egrep/fgrep bundled with CentOS etc.,
> but not with the original version of egrep/fgrep.

Thanks very much for pointing that out.
I hadn't noticed the internal patch to search.c.

Please close this issue. I'll update the internal patch.

>
> Have you downloaded the grep program from http://ftp.gnu.org/gnu/grep/
> and not applied any patches to it?




[patch #6869] fgrep/egrep returns wrong matched none UTF-8 chars

by Phil Carmody-5


Follow-up Comment #5, patch #6869 (project grep):

> Follow-up Comment #4, patch #6869 (project grep):
>
> I can reproduce it with the egrep/fgrep bundled with CentOS etc.,
> but not with the original version of egrep/fgrep.

Thanks very much for pointing that out.
I hadn't noticed the internal patch to search.c.

Please close this issue. I'll update the internal patch.





[patch #6869] fgrep/egrep returns wrong matched none UTF-8 chars

by Phil Carmody-5


Update of patch #6869 (project grep):

                  Status:                    None => Invalid                
             Open/Closed:                    Open => Closed                

    _______________________________________________________

Follow-up Comment #6:

Closing as requested by reporter.
