library/6254: i386 libm long double sqrt function sqrtl() is very inaccurate

View: New views
3 Messages — Rating Filter:   Alert me  

library/6254: i386 libm long double sqrt function sqrtl() is very inaccurate

by Jonathan Thornburg-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>Number:         6254
>Category:       library
>Synopsis:       i386 libm long double sqrt function sqrtl() is very inaccurate
>Confidential:   yes
>Severity:       serious
>Priority:       medium
>Responsible:    bugs
>State:          open
>Quarter:        
>Keywords:      
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   unknown
>Arrival-Date:   Thu Nov 05 04:20:01 GMT 2009
>Closed-Date:
>Last-Modified:
>Originator:    
>Release:        
>Organization:
>Environment:
        System      : OpenBSD 4.6
        Details     : OpenBSD 4.6-stable (GENERIC) #1: Mon Nov  2 14:30:55 EST 2009
                         root@...:/usr/src/sys/arch/i386/compile/GENERIC

        Architecture: OpenBSD.i386
        Machine     : i386
>Description:
        In some cases the long double libm function
          long double sqrtl(long double x)
        has a relative error of over 1e-10, i.e., about 10 million times
        *worse* than that of the IEEE-double sqrt function.
       
>How-To-Repeat:
        Here's a trivial test program demonstrating the error:
                % cat sqrt2-error.c
                #include <stdio.h>
                #include <math.h>

                /*
                 * test the accuracy of the libm sqrt() and sqrtl() routines
                 */
                int main(void)
                {
                const      double  dx = sqrt (2.0);
                const long double ldx = sqrtl(2.0L);
                const      double error_double      =  dx* dx - 2.0;
                const long double error_long_double = ldx*ldx - 2.0L;

                printf("double\n");
                printf("   sqrt(2) = %.16g\n", dx);
                printf("   error   = %g\n", error_double);
                printf("long double\n");
                printf("   sqrt(2) = %.20Lg\n", ldx);
                printf("           = %.16g\n", (double)ldx);
                printf("   error   = %g\n", (double)error_long_double);
                }
                % /usr/bin/gcc --version
                gcc (GCC) 3.3.5 (propolice)
                Copyright (C) 2003 Free Software Foundation, Inc.
                This is free software; see the source for copying conditions.  There is NO
                warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

                % /usr/bin/gcc -o sqrt2-error sqrt2-error.c -lm
                % ./sqrt2-error
                double
                   sqrt(2) = 1.414213562373095
                   error   = 2.73436e-16
                long double
                   sqrt(2) = 1.4142135626059256924
                           = 1.414213562605926
                   error   = 6.58545e-10
                %
        Notice that sqrtl() gives a result which is wrong in the 10th decimal
        place.  Notice also that printf() is *not* the culprit here -- even
        if the long-double result is converted back to double before printing,
        it still prints as wrong in the 10th decimal place.

        Interestingly, if I compile with gcc 4.2.4 (from packages), then
        the long-double sqrt works fine (at least in my limited tests so far):
                % /usr/local/bin/egcc --version
                egcc (GCC) 4.2.4
                Copyright (C) 2007 Free Software Foundation, Inc.
                This is free software; see the source for copying conditions.  There is NO
                warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

                % /usr/local/bin/egcc -o sqrt2-error sqrt2-error.c -lm
                % ./sqrt2-error
                double
                   sqrt(2) = 1.414213562373095
                   error   = 2.73436e-16
                long double
                   sqrt(2) = 1.4142135623730950488
                           = 1.414213562373095
                   error   = -1.0842e-19
                %

>Fix:
        The Cephes math library
          http://www.moshier.net/#Cephes
          http://www.moshier.net/ldouble.zip
          http://www.moshier.net/ldoubdoc.html
        includes source code for a full long-double libm.

SENDBUG: dmesg is attached.

dmesg:
OpenBSD 4.6-stable (GENERIC) #1: Mon Nov  2 14:30:55 EST 2009
    root@...:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) M processor 1.80GHz ("GenuineIntel" 686-class) 1.80 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,SBF,EST,TM2
real mem  = 2146398208 (2046MB)
avail mem = 2066685952 (1970MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 06/18/04, BIOS32 rev. 0 @ 0xfd750, SMBIOS rev. 2.33 @ 0xe0010 (61 entries)
bios0: vendor IBM version "1RETCDWW (3.06f)" date 06/18/2004
bios0: IBM 23738ZU
apm0 at bios0: Power Management spec V1.2
apm0: battery life expectancy 100%
apm0: AC on, battery charge high
acpi at bios0 function 0x0 not configured
pcibios0 at bios0: rev 2.1 @ 0xfd6e0/0x920
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdea0/272 (15 entries)
pcibios0: PCI Interrupt Router at 000:31:0 ("Intel 82371FB ISA" rev 0x00)
pcibios0: PCI bus #6 is the last bus
bios0: ROM list: 0xc0000/0x10000 0xe0000/0x10000
cpu0 at mainbus0: (uniprocessor)
cpu0: Enhanced SpeedStep 1799 MHz: speeds: 1800, 1600, 1400, 1200, 1000, 800, 600 MHz
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
io address conflict 0x5800/0x8
io address conflict 0x5808/0x4
io address conflict 0x5810/0x8
io address conflict 0x580c/0x4
pchb0 at pci0 dev 0 function 0 "Intel 82855PM Host" rev 0x03
intelagp0 at pchb0
agp0 at intelagp0: aperture at 0xd0000000, size 0x10000000
ppb0 at pci0 dev 1 function 0 "Intel 82855PM AGP" rev 0x03
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 "ATI Radeon Mobility M10" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
radeondrm0 at vga1: irq 11
drm0 at radeondrm0
uhci0 at pci0 dev 29 function 0 "Intel 82801DB USB" rev 0x01: irq 11
uhci1 at pci0 dev 29 function 1 "Intel 82801DB USB" rev 0x01: irq 11
uhci2 at pci0 dev 29 function 2 "Intel 82801DB USB" rev 0x01: irq 11
ehci0 at pci0 dev 29 function 7 "Intel 82801DB USB" rev 0x01: irq 11
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "Intel EHCI root hub" rev 2.00/1.00 addr 1
ppb1 at pci0 dev 30 function 0 "Intel 82801BAM Hub-to-PCI" rev 0x81
pci2 at ppb1 bus 2
mem address conflict 0xb0000000/0x1000
mem address conflict 0xb1000000/0x1000
cbb0 at pci2 dev 0 function 0 "TI PCI4520 CardBus" rev 0x01: irq 11
cbb1 at pci2 dev 0 function 1 "TI PCI4520 CardBus" rev 0x01: irq 11
em0 at pci2 dev 1 function 0 "Intel PRO/1000MT (82540EP)" rev 0x03: irq 11, address 00:0d:60:fc:eb:f8
ath0 at pci2 dev 2 function 0 "Atheros AR5212 (IBM MiniPCI)" rev 0x01: irq 11
ath0: AR5213 5.6 phy 4.1 rf5111 1.7 rf2111 2.3, WOR1W, address 00:05:4e:4b:1a:8f
cardslot0 at cbb0 slot 0 flags 0
cardbus0 at cardslot0: bus 3 device 0 cacheline 0x8, lattimer 0xb0
pcmcia0 at cardslot0
cardslot1 at cbb1 slot 1 flags 0
cardbus1 at cardslot1: bus 6 device 0 cacheline 0x8, lattimer 0xb0
pcmcia1 at cardslot1
ichpcib0 at pci0 dev 31 function 0 "Intel 82801DBM LPC" rev 0x01: 24-bit timer at 3579545Hz
pciide0 at pci0 dev 31 function 1 "Intel 82801DBM IDE" rev 0x01: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
wd0 at pciide0 channel 0 drive 0: <Hitachi HTS541616J9AT00>
wd0: 16-sector PIO, LBA48, 152627MB, 312581808 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 5
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <HL-DT-ST, RW/DVD GCC-4242N, 0201> ATAPI 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
ichiic0 at pci0 dev 31 function 3 "Intel 82801DB SMBus" rev 0x01: irq 11
iic0 at ichiic0
spdmem0 at iic0 addr 0x50: 1GB DDR SDRAM non-parity PC2700CL2.5
spdmem1 at iic0 addr 0x51: 1GB DDR SDRAM non-parity PC2700CL2.5
auich0 at pci0 dev 31 function 5 "Intel 82801DB AC97" rev 0x01: irq 11, ICH4 AC97
ac97: codec id 0x41445374 (Analog Devices AD1981B)
ac97: codec features headphone, 20 bit DAC, No 3D Stereo
audio0 at auich0
"Intel 82801DB Modem" rev 0x01 at pci0 dev 31 function 6 not configured
usb1 at uhci0: USB revision 1.0
uhub1 at usb1 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb2 at uhci1: USB revision 1.0
uhub2 at usb2 "Intel UHCI root hub" rev 1.00/1.00 addr 1
usb3 at uhci2: USB revision 1.0
uhub3 at usb3 "Intel UHCI root hub" rev 1.00/1.00 addr 1
isa0 at ichpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
spkr0 at pcppi0
lpt2 at isa0 port 0x3bc/4: polled
aps0 at isa0 port 0x1600/31
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
biomask efed netmask efed ttymask ffff
mtrr: Pentium Pro MTRR support
softraid0 at root
root on wd0a swap on wd0b dump on wd0b


>Release-Note:
>Audit-Trail:
>Unformatted:


Re: library/6254: i386 libm long double sqrt function sqrtl() is very inaccurate

by Mark Kettenis :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> Date: Wed, 4 Nov 2009 23:07:23 -0500 (EST)
> From: Jonathan Thornburg <jthorn@...>
>
> Interestingly, if I compile with gcc 4.2.4 (from packages), then
> the long-double sqrt works fine (at least in my limited tests so far):

What must be happening there is that gcc 4.2.4 has sqrtl(3) as a
builtin that directly emits a fsqrt instruction on i386.  Not exactly
sure why gcc 3.3.5 doesn't do that.  But we'll have to provide a
proper implementation anyway.  Attached is a diff that provides an
assembly implementation on i386.  I'm working on the equivalent for
amd64 as well.  Obviously I'll need to take a look at the generic C
implementation as well, to see why it doesn't work as advertised.

Cheers,

Mark

Index: Makefile
===================================================================
RCS file: /cvs/src/lib/libm/Makefile,v
retrieving revision 1.72
diff -u -p -r1.72 Makefile
--- Makefile 26 Oct 2009 21:06:19 -0000 1.72
+++ Makefile 5 Nov 2009 10:44:14 -0000
@@ -26,6 +26,7 @@ ARCH_SRCS = s_copysign.S s_copysignf.S
 .PATH: ${.CURDIR}/arch/i387
 ARCH_SRCS = e_acos.S e_asin.S e_atan2.S e_exp.S e_fmod.S e_log.S e_log10.S \
     e_remainder.S e_remainderf.S e_scalb.S e_sqrt.S e_sqrtf.S \
+    e_sqrtl.S \
     invtrig.c \
     s_atan.S s_atanf.S s_ceil.S s_ceilf.S s_copysign.S s_copysignf.S \
     s_cos.S s_cosf.S s_floor.S s_floorf.S \
Index: arch/i387/e_sqrtl.S
===================================================================
RCS file: arch/i387/e_sqrtl.S
diff -N arch/i387/e_sqrtl.S
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ arch/i387/e_sqrtl.S 5 Nov 2009 10:44:14 -0000
@@ -0,0 +1,12 @@
+/* $OpenBSD: e_sqrt.S,v 1.3 2008/09/07 20:36:08 martynas Exp $ */
+/*
+ * Written by J.T. Conklin <jtc@...>.
+ * Public domain.
+ */
+
+#include <machine/asm.h>
+
+ENTRY(sqrtl)
+ fldt 4(%esp)
+ fsqrt
+ ret


Parent Message unknown Re: library/6254: i386 libm long double sqrt function sqrtl() is very inaccurate

by Mark Kettenis :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

The following reply was made to PR library/6254; it has been noted by GNATS.

From: Mark Kettenis <mark.kettenis@...>
To: jthorn@...
Cc: gnats@..., bugs@...
Subject: Re: library/6254: i386 libm long double sqrt function sqrtl() is very inaccurate
Date: Thu, 5 Nov 2009 14:51:21 +0100 (CET)

 > Date: Wed, 4 Nov 2009 23:07:23 -0500 (EST)
 > From: Jonathan Thornburg <jthorn@...>
 >
 > Interestingly, if I compile with gcc 4.2.4 (from packages), then
 > the long-double sqrt works fine (at least in my limited tests so far):
 
 What must be happening there is that gcc 4.2.4 has sqrtl(3) as a
 builtin that directly emits a fsqrt instruction on i386.  Not exactly
 sure why gcc 3.3.5 doesn't do that.  But we'll have to provide a
 proper implementation anyway.  Attached is a diff that provides an
 assembly implementation on i386.  I'm working on the equivalent for
 amd64 as well.  Obviously I'll need to take a look at the generic C
 implementation as well, to see why it doesn't work as advertised.
 
 Cheers,
 
 Mark
 
 Index: Makefile
 ===================================================================
 RCS file: /cvs/src/lib/libm/Makefile,v
 retrieving revision 1.72
 diff -u -p -r1.72 Makefile
 --- Makefile 26 Oct 2009 21:06:19 -0000 1.72
 +++ Makefile 5 Nov 2009 10:44:14 -0000
 @@ -26,6 +26,7 @@ ARCH_SRCS = s_copysign.S s_copysignf.S
  .PATH: ${.CURDIR}/arch/i387
  ARCH_SRCS = e_acos.S e_asin.S e_atan2.S e_exp.S e_fmod.S e_log.S e_log10.S \
     e_remainder.S e_remainderf.S e_scalb.S e_sqrt.S e_sqrtf.S \
 +    e_sqrtl.S \
     invtrig.c \
     s_atan.S s_atanf.S s_ceil.S s_ceilf.S s_copysign.S s_copysignf.S \
     s_cos.S s_cosf.S s_floor.S s_floorf.S \
 Index: arch/i387/e_sqrtl.S
 ===================================================================
 RCS file: arch/i387/e_sqrtl.S
 diff -N arch/i387/e_sqrtl.S
 --- /dev/null 1 Jan 1970 00:00:00 -0000
 +++ arch/i387/e_sqrtl.S 5 Nov 2009 10:44:14 -0000
 @@ -0,0 +1,12 @@
 +/* $OpenBSD: e_sqrt.S,v 1.3 2008/09/07 20:36:08 martynas Exp $ */
 +/*
 + * Written by J.T. Conklin <jtc@...>.
 + * Public domain.
 + */
 +
 +#include <machine/asm.h>
 +
 +ENTRY(sqrtl)
 + fldt 4(%esp)
 + fsqrt
 + ret