fix <ctype.h> bugs on 64-bit machine

View: New views
6 Messages — Rating Filter:   Alert me  

fix <ctype.h> bugs on 64-bit machine

by Eric Blake :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Bruno Haible pointed out to me that on 64-bit machines, the use of isalpha
(0x100000001LL) will cause an out-of-bounds array dereference, rather than
returning the correct result of isalpha(1).  32-bit machines are immune.  All
of this stems from our QoI effort to warn about raw char arguments.  OK to
apply this fix?  I've verified that on 32-bit architectures, there is no
difference in the generated code, with or without optimization.  I've also
verified that 'gcc -Wall' still gives the desired warning.

diff --git i/newlib/libc/include/ctype.h w/newlib/libc/include/ctype.h
index e1d2d01..abc080e 100644
--- i/newlib/libc/include/ctype.h
+++ w/newlib/libc/include/ctype.h
@@ -47,24 +47,32 @@ extern __IMPORT char *__ctype_ptr__;
 #ifndef __cplusplus
 /* These macros are intentionally written in a manner that will trigger
    a gcc -Wall warning if the user mistakenly passes a 'char' instead
-   of an int containing an 'unsigned char'.  */
-#define isalpha(__c) ((__ctype_ptr__+1)[__c]&(_U|_L))
-#define isupper(__c) (((__ctype_ptr__+1)[__c]&(_U|_L))==_U)
-#define islower(__c) (((__ctype_ptr__+1)[__c]&(_U|_L))==_L)
-#define isdigit(__c) ((__ctype_ptr__+1)[__c]&_N)
-#define isxdigit(__c) ((__ctype_ptr__+1)[__c]&(_X|_N))
-#define isspace(__c) ((__ctype_ptr__+1)[__c]&_S)
-#define ispunct(__c) ((__ctype_ptr__+1)[__c]&_P)
-#define isalnum(__c) ((__ctype_ptr__+1)[__c]&(_U|_L|_N))
-#define isprint(__c) ((__ctype_ptr__+1)[__c]&(_P|_U|_L|_N|_B))
-#define isgraph(__c) ((__ctype_ptr__+1)[__c]&(_P|_U|_L|_N))
-#define iscntrl(__c) ((__ctype_ptr__+1)[__c]&_C)
+   of an int containing an 'unsigned char'.  Note that the sizeof will
+   always be 1, which is what we want for mapping EOF to __ctype_ptr__[0];
+   the use of a raw index inside the sizeof triggers the gcc warning if
+   __c was of type char, and sizeof masks side effects of the extra __c.
+   Meanwhile, the real index to __ctype_ptr__+1 must be cast to int,
+   since isalpha(0x100000001LL) must equal isalpha(1), rather than being
+   an out-of-bounds reference on a 64-bit machine.  */
+#define __ctype_lookup(__c) ((__ctype_ptr__+sizeof""[__c])[(int)__c])
+
+#define isalpha(__c) (__ctype_lookup(__c)&(_U|_L))
+#define isupper(__c) ((__ctype_lookup(__c)&(_U|_L))==_U)
+#define islower(__c) ((__ctype_lookup(__c)&(_U|_L))==_L)
+#define isdigit(__c) (__ctype_lookup(__c)&_N)
+#define isxdigit(__c) (__ctype_lookup(__c)&(_X|_N))
+#define isspace(__c) (__ctype_lookup(__c)&_S)
+#define ispunct(__c) (__ctype_lookup(__c)&_P)
+#define isalnum(__c) (__ctype_lookup(__c)&(_U|_L|_N))
+#define isprint(__c) (__ctype_lookup(__c)&(_P|_U|_L|_N|_B))
+#define isgraph(__c) (__ctype_lookup(__c)&(_P|_U|_L|_N))
+#define iscntrl(__c) (__ctype_lookup(__c)&_C)

 #if defined(__GNUC__) && \
     (!defined(__STRICT_ANSI__) || __STDC_VERSION__ >= 199901L)
 #define isblank(__c) \
   __extension__ ({ __typeof__ (__c) __x = (__c); \
-      ((__ctype_ptr__+1)[__x]&_B) || (__x) == '\t';})
+        (__ctype_lookup(__x)&_B) || (int) (__x) == '\t';})
 #endif





RE: fix <ctype.h> bugs on 64-bit machine

by Howland Craig D (Craig) :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

One thought for a minor tweak:  since this is rather a unique construct,
add () to make more clear to the general population (i.e. later
maintainers) what is intended:
#define __ctype_lookup(__c) ((__ctype_ptr__+sizeof(""[__c]))[(int)__c])
                                                  ^       ^
(While most people probably know the general precedence rules, it's
hard to tell how many know that array dereferencing is higher than
sizeof.  And using "" as a shortcut to a character pointer is a bit
esoteric when coupled with [].  My compliments on a nice compact trick.)
 
I'm not normally a fan of () simply to make things easier to read for
people who don't know the precedence rules, but think that this is an
odd enough case to warrant considering it as a supplement to the
comments.
 
Craig

-----Original Message-----
From: newlib-owner@... [mailto:newlib-owner@...]
On Behalf Of Eric Blake
Sent: Monday, October 19, 2009 4:22 PM
To: newlib@...
Subject: fix <ctype.h> bugs on 64-bit machine

Bruno Haible pointed out to me that on 64-bit machines, the use of
isalpha
(0x100000001LL) will cause an out-of-bounds array dereference, rather
than
returning the correct result of isalpha(1).  32-bit machines are immune.
All
of this stems from our QoI effort to warn about raw char arguments.  OK
to
apply this fix?  I've verified that on 32-bit architectures, there is no

difference in the generated code, with or without optimization.  I've
also
verified that 'gcc -Wall' still gives the desired warning.

...
+#define __ctype_lookup(__c) ((__ctype_ptr__+sizeof""[__c])[(int)__c])
...

Re: fix <ctype.h> bugs on 64-bit machine

by Eric Blake :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Eric Blake on 10/19/2009 2:21 PM:
> Bruno Haible pointed out to me that on 64-bit machines, the use of isalpha
> (0x100000001LL) will cause an out-of-bounds array dereference, rather than
> returning the correct result of isalpha(1).

Ping.

> 2009-10-19  Eric Blake  <ebb9@...>
>
> * libc/include/ctype.h (__ctype_lookup): New macro.
> (isalpha, isupper, islower, isdigit, isxdigit, isspace, ispunct)
> (isalnum, isprint, isgraph, iscntrl, isblank): Use it to fix bug
> on 64-bit machines.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9@...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkrhmI8ACgkQ84KuGfSFAYCqIQCfSyILZv46HL8Gj//vvXgjRhU/
AdUAn2H+djL1JO6gcVBi1ON04y+XSSt7
=3hVr
-----END PGP SIGNATURE-----

Re: fix <ctype.h> bugs on 64-bit machine

by Jeff Johnston :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Go ahead.  You might as well put in the parentheses as suggested by Craig.

-- Jeff J.

On 19/10/09 04:21 PM, Eric Blake wrote:

> Bruno Haible pointed out to me that on 64-bit machines, the use of isalpha
> (0x100000001LL) will cause an out-of-bounds array dereference, rather than
> returning the correct result of isalpha(1).  32-bit machines are immune.  All
> of this stems from our QoI effort to warn about raw char arguments.  OK to
> apply this fix?  I've verified that on 32-bit architectures, there is no
> difference in the generated code, with or without optimization.  I've also
> verified that 'gcc -Wall' still gives the desired warning.
>
> diff --git i/newlib/libc/include/ctype.h w/newlib/libc/include/ctype.h
> index e1d2d01..abc080e 100644
> --- i/newlib/libc/include/ctype.h
> +++ w/newlib/libc/include/ctype.h
> @@ -47,24 +47,32 @@ extern __IMPORT char *__ctype_ptr__;
>   #ifndef __cplusplus
>   /* These macros are intentionally written in a manner that will trigger
>      a gcc -Wall warning if the user mistakenly passes a 'char' instead
> -   of an int containing an 'unsigned char'.  */
> -#define isalpha(__c) ((__ctype_ptr__+1)[__c]&(_U|_L))
> -#define isupper(__c) (((__ctype_ptr__+1)[__c]&(_U|_L))==_U)
> -#define islower(__c) (((__ctype_ptr__+1)[__c]&(_U|_L))==_L)
> -#define isdigit(__c) ((__ctype_ptr__+1)[__c]&_N)
> -#define isxdigit(__c) ((__ctype_ptr__+1)[__c]&(_X|_N))
> -#define isspace(__c) ((__ctype_ptr__+1)[__c]&_S)
> -#define ispunct(__c) ((__ctype_ptr__+1)[__c]&_P)
> -#define isalnum(__c) ((__ctype_ptr__+1)[__c]&(_U|_L|_N))
> -#define isprint(__c) ((__ctype_ptr__+1)[__c]&(_P|_U|_L|_N|_B))
> -#define isgraph(__c) ((__ctype_ptr__+1)[__c]&(_P|_U|_L|_N))
> -#define iscntrl(__c) ((__ctype_ptr__+1)[__c]&_C)
> +   of an int containing an 'unsigned char'.  Note that the sizeof will
> +   always be 1, which is what we want for mapping EOF to __ctype_ptr__[0];
> +   the use of a raw index inside the sizeof triggers the gcc warning if
> +   __c was of type char, and sizeof masks side effects of the extra __c.
> +   Meanwhile, the real index to __ctype_ptr__+1 must be cast to int,
> +   since isalpha(0x100000001LL) must equal isalpha(1), rather than being
> +   an out-of-bounds reference on a 64-bit machine.  */
> +#define __ctype_lookup(__c) ((__ctype_ptr__+sizeof""[__c])[(int)__c])
> +
> +#define isalpha(__c) (__ctype_lookup(__c)&(_U|_L))
> +#define isupper(__c) ((__ctype_lookup(__c)&(_U|_L))==_U)
> +#define islower(__c) ((__ctype_lookup(__c)&(_U|_L))==_L)
> +#define isdigit(__c) (__ctype_lookup(__c)&_N)
> +#define isxdigit(__c) (__ctype_lookup(__c)&(_X|_N))
> +#define isspace(__c) (__ctype_lookup(__c)&_S)
> +#define ispunct(__c) (__ctype_lookup(__c)&_P)
> +#define isalnum(__c) (__ctype_lookup(__c)&(_U|_L|_N))
> +#define isprint(__c) (__ctype_lookup(__c)&(_P|_U|_L|_N|_B))
> +#define isgraph(__c) (__ctype_lookup(__c)&(_P|_U|_L|_N))
> +#define iscntrl(__c) (__ctype_lookup(__c)&_C)
>
>   #if defined(__GNUC__)&&  \
>       (!defined(__STRICT_ANSI__) || __STDC_VERSION__>= 199901L)
>   #define isblank(__c) \
>     __extension__ ({ __typeof__ (__c) __x = (__c); \
> -      ((__ctype_ptr__+1)[__x]&_B) || (__x) == '\t';})
> +        (__ctype_lookup(__x)&_B) || (int) (__x) == '\t';})
>   #endif
>
>
>
>


Re: Re: fix <ctype.h> bugs on 64-bit machine

by Jon TURNEY :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


This change seems to introduce a regression if there is an assignment inside
the ctype macro argument, e.g.

#include <ctype.h>

int main()
{
         int c;
         int t = isspace(c = ' ');
}

after this ctype.h change, with the current cygwin gcc4, compilation fails
'error: lvalue required as left operand of assignment'.

A trivial patch to fix this attached.

On 19:59, Jeff Johnston wrote:

> Go ahead.  You might as well put in the parentheses as suggested by Craig.
>
> -- Jeff J.
>
> On 19/10/09 04:21 PM, Eric Blake wrote:
>> Bruno Haible pointed out to me that on 64-bit machines, the use of
>> isalpha
>> (0x100000001LL) will cause an out-of-bounds array dereference, rather
>> than
>> returning the correct result of isalpha(1). 32-bit machines are
>> immune. All
>> of this stems from our QoI effort to warn about raw char arguments. OK to
>> apply this fix? I've verified that on 32-bit architectures, there is no
>> difference in the generated code, with or without optimization. I've also
>> verified that 'gcc -Wall' still gives the desired warning.
>>
>> diff --git i/newlib/libc/include/ctype.h w/newlib/libc/include/ctype.h
>> index e1d2d01..abc080e 100644
>> --- i/newlib/libc/include/ctype.h
>> +++ w/newlib/libc/include/ctype.h
>> @@ -47,24 +47,32 @@ extern __IMPORT char *__ctype_ptr__;
>> #ifndef __cplusplus
>> /* These macros are intentionally written in a manner that will trigger
>> a gcc -Wall warning if the user mistakenly passes a 'char' instead
>> - of an int containing an 'unsigned char'. */
>> -#define isalpha(__c) ((__ctype_ptr__+1)[__c]&(_U|_L))
>> -#define isupper(__c) (((__ctype_ptr__+1)[__c]&(_U|_L))==_U)
>> -#define islower(__c) (((__ctype_ptr__+1)[__c]&(_U|_L))==_L)
>> -#define isdigit(__c) ((__ctype_ptr__+1)[__c]&_N)
>> -#define isxdigit(__c) ((__ctype_ptr__+1)[__c]&(_X|_N))
>> -#define isspace(__c) ((__ctype_ptr__+1)[__c]&_S)
>> -#define ispunct(__c) ((__ctype_ptr__+1)[__c]&_P)
>> -#define isalnum(__c) ((__ctype_ptr__+1)[__c]&(_U|_L|_N))
>> -#define isprint(__c) ((__ctype_ptr__+1)[__c]&(_P|_U|_L|_N|_B))
>> -#define isgraph(__c) ((__ctype_ptr__+1)[__c]&(_P|_U|_L|_N))
>> -#define iscntrl(__c) ((__ctype_ptr__+1)[__c]&_C)
>> + of an int containing an 'unsigned char'. Note that the sizeof will
>> + always be 1, which is what we want for mapping EOF to __ctype_ptr__[0];
>> + the use of a raw index inside the sizeof triggers the gcc warning if
>> + __c was of type char, and sizeof masks side effects of the extra __c.
>> + Meanwhile, the real index to __ctype_ptr__+1 must be cast to int,
>> + since isalpha(0x100000001LL) must equal isalpha(1), rather than being
>> + an out-of-bounds reference on a 64-bit machine. */
>> +#define __ctype_lookup(__c) ((__ctype_ptr__+sizeof""[__c])[(int)__c])
>> +
>> +#define isalpha(__c) (__ctype_lookup(__c)&(_U|_L))
>> +#define isupper(__c) ((__ctype_lookup(__c)&(_U|_L))==_U)
>> +#define islower(__c) ((__ctype_lookup(__c)&(_U|_L))==_L)
>> +#define isdigit(__c) (__ctype_lookup(__c)&_N)
>> +#define isxdigit(__c) (__ctype_lookup(__c)&(_X|_N))
>> +#define isspace(__c) (__ctype_lookup(__c)&_S)
>> +#define ispunct(__c) (__ctype_lookup(__c)&_P)
>> +#define isalnum(__c) (__ctype_lookup(__c)&(_U|_L|_N))
>> +#define isprint(__c) (__ctype_lookup(__c)&(_P|_U|_L|_N|_B))
>> +#define isgraph(__c) (__ctype_lookup(__c)&(_P|_U|_L|_N))
>> +#define iscntrl(__c) (__ctype_lookup(__c)&_C)
>>
>> #if defined(__GNUC__)&& \
>> (!defined(__STRICT_ANSI__) || __STDC_VERSION__>= 199901L)
>> #define isblank(__c) \
>> __extension__ ({ __typeof__ (__c) __x = (__c); \
>> - ((__ctype_ptr__+1)[__x]&_B) || (__x) == '\t';})
>> + (__ctype_lookup(__x)&_B) || (int) (__x) == '\t';})
>> #endif


ctype.h.patch (798 bytes) Download Attachment

Re: fix <ctype.h> bugs on 64-bit machine

by Eric Blake :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Jon TURNEY on 11/5/2009 5:11 PM:
> int main()
> {
>         int c;
>         int t = isspace(c = ' ');
> }
>
> after this ctype.h change, with the current cygwin gcc4, compilation
> fails 'error: lvalue required as left operand of assignment'.

Thanks for spotting this.  Your patch is correct, so I applied it as obvious.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9@...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkrzl5MACgkQ84KuGfSFAYCuagCg0uxzgTxt4Tl1Frdm4CfwHwo2
RAEAoMrS3s0iEa9OP67TUajMbw04+/uZ
=MTMl
-----END PGP SIGNATURE-----