Incorrect setting of some '$' fields when FS = "" and LINT = 1

by Nick Hobson-2

Hi,

I think I've found a gawk bug.  I have demonstrated it in gawk 3.1.7 under Arch Linux and in gawk 3.1.6 under Xubuntu 8.10.

The program is:

#! /usr/bin/gawk -f

BEGIN {LINT = 1; FS = ""}

{
    for (i = 1; i <= NF; i++) {
        a[$i]++
        print FNR, i, $i
    }
}

The input data file is:

abc xyz

When run as gawk-bug test-input > test-output, the expected output is:

1 1 a
1 2 b
1 3 c
1 4  
1 5 x
1 6 y
1 7 z

The actual output is:

1 1 a
1 2 \0
1 3 \0
1 4  
1 5 \0
1 6 y
1 7 \0

(where '\0' is a binary zero.)

So the problem is that $2, $3, $5 and $7 are incorrectly being set to binary zero.

Curiously, the bug goes away if:
(a) LINT = 1 is removed (or changed to LINT = 0),
(b) a[$i]++ is removed, or
(c) FS = "" is removed -- in which case the actual output is as expected:
1 1 abc
1 2 xyz

Attached: gawk-bug, test-input, test-output

Regards,
Nick Hobson




gawk-bug (158 bytes)
test-input (14 bytes)
test-output (58 bytes)

Re: Incorrect setting of some '$' fields when FS = "" and LINT = 1

by Aharon Robbins

Greetings.  Re this:

> Date: Thu, 1 Oct 2009 19:41:01 -0700 (PDT)
> From: Nick Hobson <nick.hobson@...>
> Subject: Incorrect setting of some '$' fields when FS = "" and LINT = 1
> To: bug-gawk@...
>
> Hi,
>
> I think I've found a gawk bug.  I have demonstrated it in gawk 3.1.7
> under Arch Linux and in gawk 3.1.6 under Xubuntu 8.10.
>
> The program is:
>
> #! /usr/bin/gawk -f
>
> BEGIN {LINT = 1; FS = ""}
>
> {
>     for (i = 1; i <= NF; i++) {
>         a[$i]++
>         print FNR, i, $i
>     }
> }
>
> The input data file is:
>
> abc xyz
>
> When run as gawk-bug test-input > test-output, the expected output is:
>
> 1 1 a
> 1 2 b
> 1 3 c
> 1 4  
> 1 5 x
> 1 6 y
> 1 7 z
>
> The actual output is:
>
> 1 1 a
> 1 2 \0
> 1 3 \0
> 1 4  
> 1 5 \0
> 1 6 y
> 1 7 \0
>
> (where '\0' is a binary zero.)
>
> So the problem is that $2, $3, $5 and $7 are incorrectly being set to
> binary zero.
>
> Curiously, the bug goes away if:
> (a) LINT = 1 is removed (or changed to LINT = 0),
> (b) a[$i]++ is removed, or
> (c) FS = "" is removed -- in which case the actual output is as expected:
> 1 1 abc
> 1 2 xyz
>
> Attached: gawk-bug, test-input, test-output
>
> Regards,
> Nick Hobson

Hi.  This is indeed a bug. Thank you for the bug report.
Here is a patch.

Sun Oct  4 18:45:06 2009  Arnold D. Robbins  <arnold@...>

        * array.c (assoc_lookup): In lint warning, don't clobber
        the character at the end of the subscript; instead use the
        length to limit the number of characters printed. Thanks to
        Nick Hobson <nick.hobson@...>.

Index: array.c
===================================================================
RCS file: /d/mongo/cvsrep/gawk-stable/array.c,v
retrieving revision 1.5
diff -u -r1.5 array.c
--- array.c 9 Jul 2009 19:54:38 -0000 1.5
+++ array.c 4 Oct 2009 16:44:20 -0000
@@ -510,9 +510,8 @@
  }
 
  if (do_lint && reference) {
- subs->stptr[subs->stlen] = '\0';
- lintwarn(_("reference to uninitialized element `%s[\"%s\"]'"),
-      array_vname(symbol), subs->stptr);
+ lintwarn(_("reference to uninitialized element `%s[\"%.*s\"]'"),
+      array_vname(symbol), subs->stlen, subs->stptr);
  }
 
  /* It's not there, install it. */