On Mon, Jun 02, 2008 at 05:25:32PM -0400, Wietse Venema wrote:
> If your system library reports SERVFAIL errors as EAI_NONAME, then
> there is no way to report this as a recoverable error.
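To illustrate the point quoted above: a caller can only choose between deferring (retry later) and rejecting based on the code getnameinfo() returns. Below is a minimal sketch of such a mapping. It assumes a "defer when in doubt" policy, and the enum and function names are hypothetical. This is not Postfix's actual code.

```c
#include <netdb.h>

/* Hypothetical outcome categories for a reverse lookup. */
enum lookup_outcome {
    LOOKUP_OK,       /* a name was found */
    LOOKUP_TEMPFAIL, /* try again later (e.g. DNS SERVFAIL, timeout) */
    LOOKUP_PERMFAIL  /* the name definitely does not exist */
};

/* Map a getnameinfo() return value to an outcome. This only works if
 * the library reports a DNS SERVFAIL as EAI_AGAIN; a library that
 * returns EAI_NONAME instead makes a temporary failure look permanent. */
enum lookup_outcome classify_gni_rv(int rv)
{
    switch (rv) {
    case 0:
        return LOOKUP_OK;
    case EAI_NONAME:
        return LOOKUP_PERMFAIL;
    case EAI_AGAIN:
        return LOOKUP_TEMPFAIL;
    default:
        /* Assumption of this sketch: treat any other error as
         * temporary so that the caller defers rather than rejects. */
        return LOOKUP_TEMPFAIL;
    }
}
```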
For the record, after spending hours barking up the wrong trees (or at
least the wrong branches of the right tree), I have finally resolved
this problem. Executive summary: this is/was indeed a bug in the system
library.
We originally observed this problem on SLES 10.1, which includes glibc
2.4. After you pointed to an erroneous return value from getnameinfo(),
I ran some tests on my workstation (Ubuntu Hardy, glibc 2.7) and found
that it was affected as well. Since that code had not changed in glibc
CVS since that version, I concluded that the bug was still present in
current glibc.
This assumption was wrong. The bug was fixed in glibc 2.5 by the
following commit:
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/inet/getnameinfo.c.diff?r1=1.34&r2=1.35&cvsroot=glibc&f=h
My Hardy workstation (glibc 2.7) was still broken because of an
unrelated problem with the mDNS/Avahi module that Ubuntu installs by
default:
bschmidt@lxbsc01:~$ grep ^hosts: /etc/nsswitch.conf
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
bschmidt@lxbsc01:~$ ./getnameinfo 62.85.116.236
rv:Name or service not known(-2)
bschmidt@lxbsc01:~$ sudo vim /etc/nsswitch.conf
bschmidt@lxbsc01:~$ grep ^hosts: /etc/nsswitch.conf
hosts: files dns
bschmidt@lxbsc01:~$ ./getnameinfo 62.85.116.236
rv:Temporary failure in name resolution(-3)
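The source of the `./getnameinfo` test program used above isn't part of this message. Below is a minimal sketch of its core, assuming it reverse-resolves an IPv4 address and prints the return code in the same "rv:<message>(<code>)" format. The names `reverse_lookup()` and `format_rv()` are hypothetical.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Reverse-resolve an IPv4 address with getnameinfo(). Pass NI_NAMEREQD
 * in flags to make the call fail instead of falling back to the numeric
 * form when no name can be obtained; that failure code is what shows
 * how the library reports a SERVFAIL. Returns 0 or an EAI_* code. */
int reverse_lookup(struct in_addr addr, int flags, char *host, size_t hostlen)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr = addr;
    /* No service name is requested (serv == NULL, servlen == 0). */
    return getnameinfo((const struct sockaddr *)&sin, sizeof sin,
                       host, (socklen_t)hostlen, NULL, 0, flags);
}

/* Format a result like the transcript, e.g.
 * "rv:Temporary failure in name resolution(-3)" on glibc. */
void format_rv(char *buf, size_t len, int rv)
{
    snprintf(buf, len, "rv:%s(%d)", gai_strerror(rv), rv);
}
```

A driver would parse its argument with inet_pton(), call `reverse_lookup(addr, NI_NAMEREQD, host, sizeof host)`, and print either the host name or the `format_rv()` output. The numeric codes in the transcript are glibc's (EAI_NONAME is -2, EAI_AGAIN is -3). Other systems use different values, so compare against the EAI_* constants, not the numbers.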
After recompiling glibc 2.4 on SLES 10 with the patch applied,
getnameinfo() (and thus Postfix) behaves as expected.
Would it be unreasonable to add a heads-up to the manpage? Definitely
affected are:
* SLES 10 (including the recently released SP2), shipping glibc 2.4
* Debian Etch, shipping glibc 2.3
* FreeBSD 7.0-RELEASE (which does not ship glibc, but is broken as well
  according to my tests)
I'll file the appropriate bug reports with Novell and Debian in the next
couple of days. However, it will probably take years rather than months
to fix all the systems out there, so a small note in the manpage would
probably be a good idea. Alternatively (or additionally), Postfix could
ship a small test program that determines whether the system library is
broken. If necessary, I can provide an IP address whose reverse lookup
always fails.
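A self-test along those lines might reduce to the verdict below. This is a sketch under stated assumptions. `judge_servfail_result()` and its enum are hypothetical names. The caller is assumed to have already done a reverse lookup with NI_NAMEREQD against an address whose reverse zone answers SERVFAIL. As the Ubuntu case shows, a non-default nsswitch.conf (e.g. the mdns4 modules) can also turn the error into EAI_NONAME, so a "broken" verdict should be checked against the NSS configuration.

```c
#include <netdb.h>

/* Hypothetical verdicts for a lookup against an address whose reverse
 * zone is known (by assumption) to answer SERVFAIL. */
enum servfail_verdict {
    LIBRARY_OK,          /* SERVFAIL reported as a temporary failure */
    LIBRARY_BROKEN,      /* SERVFAIL reported as "name does not exist" */
    VERDICT_INCONCLUSIVE /* success or some other error: test says nothing */
};

/* rv is the getnameinfo() return value from the NI_NAMEREQD lookup. */
enum servfail_verdict judge_servfail_result(int rv)
{
    switch (rv) {
    case EAI_AGAIN:
        return LIBRARY_OK;
    case EAI_NONAME:
        /* May also be caused by NSS modules, not only the library. */
        return LIBRARY_BROKEN;
    default:
        return VERDICT_INCONCLUSIVE;
    }
}
```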
Regards,
Bernhard