dccifd: restart after signal 6

dccifd: restart after signal 6

by Petar Bogdanovic-6

Hi,

I just noticed the following irregularity since we replaced 1.3.103 with
1.3.105 on a NetBSD 4.0 machine (no virtualization):

Jun  4 16:10:14 dccifd: restart after signal 6
Jun  4 16:10:14 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  4 16:10:14  spamd: dcc: dccifd -> check skipped: failed to read header (...)
Jun  4 18:20:12 dccifd: restart after signal 6
Jun  4 18:20:12 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  4 18:20:12  spamd: dcc: dccifd -> check skipped: failed to read header (...)
Jun  4 20:30:19 dccifd: restart after signal 6
Jun  4 20:30:19 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  4 20:30:19  spamd: dcc: dccifd -> check skipped: failed to read header (...)
Jun  4 22:31:11 dccifd: restart after signal 6
Jun  4 22:31:11 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  4 22:31:11  spamd: dcc: dccifd -> check skipped: failed to read header (...)
Jun  5 00:41:07 dccifd: restart after signal 6
Jun  5 00:41:07 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  5 00:41:07  spamd: dcc: dccifd -> check skipped: failed to read header (...)
Jun  5 03:14:51 dccifd: restart after signal 6
Jun  5 03:14:51 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  5 03:14:51  spamd: dcc: dccifd -> check skipped: failed to read header (...)
Jun  5 06:37:53 dccifd: restart after signal 6
Jun  5 06:37:53 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  5 06:37:53  spamd: dcc: dccifd -> check skipped: failed to read header (...)
Jun  5 08:53:25 dccifd: restart after signal 6
Jun  5 08:53:25 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  5 08:53:25  spamd: dcc: dccifd -> check skipped: failed to read header (...)
Jun  5 12:35:06 dccifd: restart after signal 6
Jun  5 12:35:06 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  5 12:35:06  spamd: dcc: dccifd -> check skipped: failed to read header (...)
Jun  5 15:01:19 dccifd: restart after signal 6
Jun  5 15:01:19 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
Jun  5 15:01:19  spamd: dcc: dccifd -> check skipped: failed to read header (...)


According to kill(1) on NetBSD, signal 6 is ``ABRT (abort)''.  What
could that be?  It certainly never occurred when we used 1.3.103.

Thanks,



   Petar Bogdanovic



_______________________________________________
DCC mailing list      DCC@...
http://www.rhyolite.com/mailman/listinfo/dcc

Re: dccifd: restart after signal 6

by MrC-10

On 6/5/2009 7:10 AM, Petar Bogdanovic wrote:
> Hi,
>
> I just noticed the following irregularity since we replaced 1.3.103 with
> 1.3.105 on a NetBSD 4.0 machine (no virtualization):
>
> Jun  4 16:10:14 dccifd: restart after signal 6
> Jun  4 16:10:14 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
> Jun  4 16:10:14  spamd: dcc: dccifd ->  check skipped: failed to read header (...)
> Jun  4 18:20:12 dccifd: restart after signal 6

>
>
> According to kill(1) on NetBSD, signal 6 is ``ABRT (abort)''.  What
> could that be?  It certainly never occurred when we used 1.3.103.
>

I had the same experience, and didn't have time to track the issue.  I
reverted to dccproc for the time being.  I'd be curious about the
situation too.

Regards,
Mike

Re: dccifd: restart after signal 6

by Vernon Schryver

> From: Petar Bogdanovic <petar@...>

> I just noticed the following irregularity since we replaced 1.3.103 with
> 1.3.105 on a NetBSD 4.0 machine (no virtualization):
>
> Jun  4 16:10:14 dccifd: restart after signal 6
> Jun  4 16:10:14 dccifd: 1.3.105 listening to /var/dcc/dccifd (...)
> Jun  4 16:10:14  spamd: dcc: dccifd -> check skipped: failed to read header (...)
> ...

> According to kill(1) on NetBSD, signal 6 is ``ABRT (abort)''.  What
> could that be?  It certainly never occurred when we used 1.3.103.

Signal 6 generally comes from the abort() library function.  That
ought to be associated with a system log complaint about a major problem.
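
To make the "signal 6" part concrete, here is a minimal sketch (not DCC code) of how a parent process can tell that a child died in abort().  The function name signal_from_abort is made up for illustration, and the core file is suppressed only so that the demo leaves nothing behind.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that calls abort() and return the number of the signal
 * that terminated it, or -1 if fork/waitpid failed or the child did
 * not die from a signal.
 */
int signal_from_abort(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: disable core dumps for this demo, then abort. */
        struct rlimit no_core = { 0, 0 };
        (void)setrlimit(RLIMIT_CORE, &no_core);
        abort();            /* raises SIGABRT and does not return */
    }

    /* Parent: collect the exit status and extract the signal number. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The value returned is SIGABRT, which kill(1) on NetBSD lists as signal 6.  A supervising process that logs that number would presumably produce a line like the "restart after signal 6" message quoted above.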

There should also be a core file in the DCC home directory.  That core
file might be useful with gdb if dccifd has been built with
debugging information.  
To rebuild the DCC software with debugging information, run

   .../libexec/updatedcc -e DBGFLAGS=-g

If the core file is for a version of dccifd built with -g,
then the following will get the stack trace on NetBSD with the DCC
home and libexec directories set to the defaults:

   % gdb /var/dcc/libexec/dccifd /var/dcc/dccifd.core
   (gdb) bt
   (gdb) quit



} From: MrC <lists-dcc@...>

} I had the same experience, and didn't have time to track the issue.  I
} reverted to dccproc for the time being.  I'd be curious about the
} situation too.

Is that also with NetBSD?


Vernon Schryver    vjs@...

Re: dccifd: restart after signal 6

by MrC-10

On 6/5/2009 11:14 AM, Vernon Schryver wrote:

> Signal 6 generally comes from the abort() library function.  That
> ought to be associated with a system log complaint about a major problem.
>
Yup.  The only other log message I have is generated by amavisd-new,
which just passed on the info from SpamAssassin:

May 30 16:57:33 glacier amavis[14242]: (14242-02) _WARN: dcc: dccifd -> check skipped:  failed to read header at /usr/pkg/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/Plugin/DCC.pm line 471.
May 30 16:57:33 glacier dccifd[1484]: restart after signal 6
May 30 16:57:33 glacier dccifd[21099]: 1.3.105 listening to /var/dcc/dccifd for ASCII protocol

> There should also be a core file in the DCC home directory.  That core
> file might be useful with gdb if dccifd has been built with
> debugging information.

There was, but I had not built w/debug.

> To rebuild the DCC software with debugging information, run
>
>     .../libexec/updatedcc -e DBGFLAGS=-g

Done - now awaiting core file production.

>
> Is that also with NetBSD?

Yup:

$ uname -smr
NetBSD 4.0.1 i386

Re: dccifd: restart after signal 6

by MrC-10

On 6/5/2009 11:38 AM, MrC wrote:

> On 6/5/2009 11:14 AM, Vernon Schryver wrote:
>
>> Signal 6 generally comes from the abort() library function. That
>> ought to be associated with a system log complaint about a major problem.
>>
> Yup. The only other log message I have is generated by amavisd-new,
> which just passed on the info from SpamAssassin:
>
> May 30 16:57:33 glacier amavis[14242]: (14242-02) _WARN: dcc: dccifd ->
> check skipped: failed to read header at
> /usr/pkg/lib/perl5/vendor_perl/5.10.0/Mail/SpamAssassin/Plugin/DCC.pm
> line 471.
> May 30 16:57:33 glacier dccifd[1484]: restart after signal 6
> May 30 16:57:33 glacier dccifd[21099]: 1.3.105 listening to
> /var/dcc/dccifd for ASCII protocol
>
>> There should also be a core file in the DCC home directory. That core
>> file might be useful with gdb if dccifd has been built with
>> debugging information.
>
> There was, but I had not built w/debug.
>
>> To rebuild the DCC software with debugging information, run
>>
>> .../libexec/updatedcc -e DBGFLAGS=-g
>
> Done - now awaiting core file production.

As expected, this is the result of a kill(2) call.

#0  0xbbaf923f in kill () from /usr/lib/libc.so.12
#1  0xbbb95a64 in abort () from /usr/lib/libc.so.12
#2  0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
#3  0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
#4  0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000, clnt_fgs=8 '\b') at clnt_send.c:1740
#5  0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
#6  0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
#7  0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12


This comes from Mail/SpamAssassin/Plugin/DCC.pm :

     # send the options and other parameters to the daemon
     $sock->print("header " . $opts . "\n") || dbg("dcc: failed write") && die; # options
     $sock->print($client . "\n") || dbg("dcc: failed write") && die; # client
     $sock->print($helo . "\n") || dbg("dcc: failed write") && die; # HELO value
     $sock->print("\n") || dbg("dcc: failed write") && die; # sender
     $sock->print("unknown\r\n") || dbg("dcc: failed write") && die; # recipients
     $sock->print("\n") || dbg("dcc: failed write") && die; # recipients

     $sock->print($$fulltext);

     $sock->shutdown(1) || dbg("dcc: failed socket shutdown: $!") && die;

     $sock->getline() || dbg("dcc: failed read status") && die;
     $sock->getline() || dbg("dcc: failed read multistatus") && die;

 >>>> so we failed to getlines() from the socket ...

     my @null = $sock->getlines();

 >>>> and then die... :

     if (!@null) {
       # no facility prefix on this
       die("failed to read header\n");
     }


     # the first line will be the header we want to look at
     chomp($response = shift @null);
     # but newer versions of DCC fold the header if it's too long...
     while (my $v = shift @null) {
       last unless ($v =~ s/^\s+/ /);  # if this line wasn't folded, stop
       chomp $v;
       $response .= $v;
     }

     dbg("dcc: dccifd got response: $response");

   });

   $permsgstatus->leave_helper_run_mode();

   if ($timer->timed_out()) {
     dbg("dcc: dccifd check timed out after $timeout secs.");
     return 0;
   }

 >>>> but the die is caught, and this message output:

   if ($err) {
     chomp $err;
     warn("dcc: dccifd -> check skipped: $! $err");
     return 0;
   }
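
For readers who do not speak Perl, the lines the plugin writes ahead of the message body can be sketched in C as below.  This is derived only from the DCC.pm excerpt above, not from the dccifd documentation; build_dccifd_preamble and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format the lines sent ahead of the message body, in the same order
 * as the Perl print() calls quoted above.  Returns the number of bytes
 * written (excluding the terminating NUL), or -1 if the buffer is too
 * small or formatting fails.
 */
int build_dccifd_preamble(char *buf, size_t size,
                          const char *opts, const char *client,
                          const char *helo)
{
    int n = snprintf(buf, size,
                     "header %s\n"   /* options */
                     "%s\n"          /* client */
                     "%s\n"          /* HELO value */
                     "\n"            /* sender (left empty) */
                     "unknown\r\n"   /* placeholder recipient */
                     "\n",           /* blank line ends the recipients */
                     opts, client, helo);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

After this preamble the plugin sends the message itself, shuts down the write side of the socket, and reads the reply.  The "failed to read header" warning means that final read came back empty, which fits dccifd having aborted in the middle of the request.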


Re: dccifd: restart after signal 6

by Vernon Schryver

> From: MrC <lists-dcc@...>

> As expected, this is the result of a kill(2) call.
>
> #0  0xbbaf923f in kill () from /usr/lib/libc.so.12
> #1  0xbbb95a64 in abort () from /usr/lib/libc.so.12
> #2  0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
> #3  0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
> #4  0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000,
> clnt_fgs=8 '\b') at clnt_send.c:1740
> #5  0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
> #6  0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
> #7  0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12

Now that you mention it, I saw an instance of it a week or two ago,
but hoped it was a fluke.  I've been unable to reproduce it then or today.

That's an ugly one, because it's not in my code.
This is the relevant part of my get_port.c:

        if (!dcc_host_locked)
                dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");

        /* get the current value */
        if (!(_res.options & RES_INIT))
                res_init();

dcc_logbad() calls abort() after syslog().
Because I assume the resolver is not thread safe and check that it's
locked, it can't be a simple, valid locking problem.

I guess I'll have to look for NetBSD's version of the resolver library
to see what NetBSD has done to it.  There are no abort() calls in the
FreeBSD 7.1 version of res_state.c


Have I mentioned that I'm not a fan of the clean target in
the NetBSD bsd.prog.mk because it deletes .gdbinit?


thanks,
Vernon Schryver    vjs@...

Re: dccifd: restart after signal 6

by Petar Bogdanovic-6

On Fri, Jun 05, 2009 at 09:52:05PM +0000, Vernon Schryver wrote:

> > From: MrC <lists-dcc@...>
>
> > As expected, this is the result of a kill(2) call.
> >
> > #0  0xbbaf923f in kill () from /usr/lib/libc.so.12
> > #1  0xbbb95a64 in abort () from /usr/lib/libc.so.12
> > #2  0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
> > #3  0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
> > #4  0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000,
> > clnt_fgs=8 '\b') at clnt_send.c:1740
> > #5  0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
> > #6  0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
> > #7  0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12
>
> Now that you mention it, I saw an instance of it a week or two ago,
> but hoped it was a fluke.  I've been unable to reproduce it then or today.
>
> That's an ugly one, because it's not in my code.
> This is the relevant part of my get_port.c:
>
> if (!dcc_host_locked)
> dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");
>
> /* get the current value */
> if (!(_res.options & RES_INIT))
> res_init();
>
> dcc_logbad() calls abort() after syslog().
> Because I assume the resolver is not thread safe and check that it's
> locked, it can't be a simple, valid locking problem.
>
> I guess I'll have to look for NetBSD's version of the resolver library
> to see what NetBSD has done to it.  There are no abort() calls in the
> FreeBSD 7.1 version of res_state.c

        # find /usr/src/ -name 'res_state*'
        /usr/src/lib/libc/resolv/res_state.c
        /usr/src/lib/libpthread/res_state.c
        # grep -ri abort\( /usr/src/lib/libc/resolv/res_state.c
        # grep -ri abort\( /usr/src/lib/libpthread/res_state.c
                abort();



   Petar Bogdanovic




Re: dccifd: restart after signal 6

by MrC-10



On 6/5/2009 2:52 PM, Vernon Schryver wrote:

>> From: MrC<lists-dcc@...>
>
>> As expected, this is the result of a kill(2) call.
>>
>> #0  0xbbaf923f in kill () from /usr/lib/libc.so.12
>> #1  0xbbb95a64 in abort () from /usr/lib/libc.so.12
>> #2  0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
>> #3  0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
>> #4  0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000,
>> clnt_fgs=8 '\b') at clnt_send.c:1740
>> #5  0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
>> #6  0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
>> #7  0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12
>
> Now that you mention it, I saw an instance of it a week or two ago,
> but hoped it was a fluke.  I've been unable to reproduce it then or today.
>
> That's an ugly one, because it's not in my code.
> This is the relevant part of my get_port.c:
>
> if (!dcc_host_locked)
> dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");
>
> /* get the current value */
> if (!(_res.options&  RES_INIT))
> res_init();
>
> dcc_logbad() calls abort() after syslog().
> Because I assume the resolver is not thread safe and check that it's
> locked, it can't be a simple, valid locking problem.
>
> I guess I'll have to look for NetBSD's version of the resolver library
> to see what NetBSD has done to it.  There are no abort() calls in the
> FreeBSD 7.1 version of res_state.c
>

I don't see anything that immediately jumps out in the changes between 4.0
and 4.0.1, but it has been a long time since I was familiar with the libc code.


Here's the abort() in the libpthread version of res_state:

libpthread/res_state.c

     /*
      * This is aliased via a macro to _res; don't allow multi-threaded programs
      * to use it.
      */
     res_state
     __res_state(void)
     {
             static const char res[] = "_res is not supported for multi-threaded"
                 " programs.\n";
             (void)write(STDERR_FILENO, res, sizeof(res) - 1);
             abort();
             return NULL;
     }


The libc version uses weak aliases:

     #include <sys/cdefs.h>
     #if defined(LIBC_SCCS) && !defined(lint)
     __RCSID("$NetBSD: res_state.c,v 1.5.10.1 2007/05/17 21:25:19 jdc Exp $");
     #endif

     #include <sys/types.h>
     #include <arpa/inet.h>
     #include <arpa/nameser.h>
     #include <netdb.h>
     #include <resolv.h>

     struct __res_state _nres
     # if defined(__BIND_RES_TEXT)
             = { .retrans = RES_TIMEOUT, }   /*%< Motorola, et al. */
     # endif
             ;

     res_state __res_get_state_nothread(void);
     void __res_put_state_nothread(res_state);

     #ifdef __weak_alias
     __weak_alias(__res_get_state, __res_get_state_nothread)
     __weak_alias(__res_put_state, __res_put_state_nothread)
     /* Source compatibility; only for single threaded programs */
     __weak_alias(__res_state, __res_get_state_nothread)
     #endif

     res_state
     __res_get_state_nothread(void)
     {
             if ((_nres.options & RES_INIT) == 0 && res_ninit(&_nres) == -1) {
                     h_errno = NETDB_INTERNAL;
                     return NULL;
             }
             return &_nres;
     }
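
A minimal sketch of the weak-alias idea, assuming GCC or Clang on an ELF platform.  All names here (alias_state, alias_get_state, alias_get_state_nothread) are made up for illustration, and the attribute syntax stands in for NetBSD's __weak_alias macro rather than reproducing it.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct __res_state. */
struct alias_state {
    unsigned long options;
};

static struct alias_state alias_nres;

/* The real, single-threaded implementation. */
struct alias_state *alias_get_state_nothread(void)
{
    return &alias_nres;
}

/*
 * Weak alias: alias_get_state is a second name for the function above.
 * Because the symbol is weak, a non-weak definition of alias_get_state
 * supplied elsewhere at link time may be used in its place.
 */
struct alias_state *alias_get_state(void)
    __attribute__((weak, alias("alias_get_state_nothread")));
```

With nothing else defining alias_get_state, calling it returns the same pointer as alias_get_state_nothread().  In the backtrace above it is the libpthread definition of __res_state that got called rather than the libc alias; exactly how the linker picks between the two is not shown by this sketch.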

And dccifd is linked against libpthread:

     $ ldd /var/dcc/libexec/dccifd
     /var/dcc/libexec/dccifd:
             -lpthread.0 => /usr/lib/libpthread.so.0
             -lm.0 => /usr/lib/libm387.so.0
             -lm.0 => /usr/lib/libm.so.0
             -lc.12 => /usr/lib/libc.so.12


Let me know if there is something I can do to help.


> Have I mentioned that I'm not a fan of the clean target in
> the NetBSD bsd.prog.mk because it deletes .gdbinit?
>

Oh, boy.  They really mean *squeaky clean*.

Thanks for your time,
Mike

Re: dccifd: restart after signal 6

by Petar Bogdanovic-6

On Sat, Jun 06, 2009 at 12:17:25AM +0200, Petar Bogdanovic wrote:

> On Fri, Jun 05, 2009 at 09:52:05PM +0000, Vernon Schryver wrote:
> > > From: MrC <lists-dcc@...>
> >
> > > As expected, this is the result of a kill(2) call.
> > >
> > > #0  0xbbaf923f in kill () from /usr/lib/libc.so.12
> > > #1  0xbbb95a64 in abort () from /usr/lib/libc.so.12
> > > #2  0xbbbdc60c in __res_state () from /usr/lib/libpthread.so.0
> > > #3  0x0806b9ca in dcc_res_delays (budget=4) at get_port.c:476
> > > #4  0x080660f2 in dcc_clnt_rdy (emsg=0xb91fff50 "", ctxt=0x80c9000,
> > > clnt_fgs=8 '\b') at clnt_send.c:1740
> > > #5  0x080544dc in clnt_resolve_thread (arg=0x0) at clnt_threaded.c:394
> > > #6  0xbbbe562d in pthread_join () from /usr/lib/libpthread.so.0
> > > #7  0xbbb1aa2c in swapcontext () from /usr/lib/libc.so.12
> >
> > Now that you mention it, I saw an instance of it a week or two ago,
> > but hoped it was a fluke.  I've been unable to reproduce it then or today.
> >
> > That's an ugly one, because it's not in my code.
> > This is the relevant part of my get_port.c:
> >
> > if (!dcc_host_locked)
> > dcc_logbad(EX_SOFTWARE, "dcc_get_host() not locked");
> >
> > /* get the current value */
> > if (!(_res.options & RES_INIT))
> > res_init();
> >
> > dcc_logbad() calls abort() after syslog().
> > Because I assume the resolver is not thread safe and check that it's
> > locked, it can't be a simple, valid locking problem.
> >
> > I guess I'll have to look for NetBSD's version of the resolver library
> > to see what NetBSD has done to it.  There are no abort() calls in the
> > FreeBSD 7.1 version of res_state.c
>
> # find /usr/src/ -name 'res_state*'
> /usr/src/lib/libc/resolv/res_state.c
> /usr/src/lib/libpthread/res_state.c
> # grep -ri abort\( /usr/src/lib/libc/resolv/res_state.c
> # grep -ri abort\( /usr/src/lib/libpthread/res_state.c
> abort();

/usr/src/include/resolv.h:
        /*
         * Source and Binary compatibility; _res will not work properly
         * with multi-threaded programs.
         */
        extern struct __res_state *__res_state(void);
        #define _res (*__res_state())

/usr/src/lib/libpthread/res_state.c:
        /*
         * This is aliased via a macro to _res; don't allow multi-threaded programs
         * to use it.
         */
        res_state
        __res_state(void)
        {
                static const char res[] = "_res is not supported for multi-threaded"
                    " programs.\n";
                (void)write(STDERR_FILENO, res, sizeof(res) - 1);
                abort();
                return NULL;
        }


I won't pretend that I have an idea about what's going on, but something
tells me that using _res has something to do with it.
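
A small sketch of why merely reading _res can end up in abort(): the macro turns what looks like a variable access into a function call.  Everything below uses hypothetical stand-in names (macro_state, macro_state_get, macro_res); it mimics the shape of the resolv.h definition and is not the real resolver code.

```c
/* Hypothetical stand-in for the resolver state; not the real struct. */
struct macro_state {
    unsigned long options;
};

#define MACRO_INIT 0x1UL            /* plays the role of RES_INIT */

static struct macro_state macro_storage;
static int macro_accessor_calls;    /* counts the hidden function calls */

struct macro_state *macro_state_get(void)
{
    macro_accessor_calls++;
    return &macro_storage;
}

/* Same trick as "#define _res (*__res_state())" in resolv.h. */
#define macro_res (*macro_state_get())

/* Reads like a plain variable access, but calls macro_state_get(). */
int macro_needs_init(void)
{
    return !(macro_res.options & MACRO_INIT);
}
```

Each use of macro_res bumps the counter, so a test such as "if (!(_res.options & RES_INIT))" is really a call to __res_state().  That would explain why the backtrace shows __res_state directly below dcc_res_delays.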



   Petar Bogdanovic




Re: dccifd: restart after signal 6

by Vernon Schryver

> From: MrC <lists-dcc@...>

> Here's the abort() in the libpthread version of res_state:

>              static const char res[] = "_res is not supported for
> multi-threaded"
>                  " programs.\n";
>              (void)write(STDERR_FILENO, res, sizeof(res) - 1);
>              abort();
>              return NULL;
>      }

That is utterly lame, overprotective mommy-ism.  Besides, it stupidly
assumes that stderr has not been closed, which is what a competent
programmer does in a daemon.

A competent programmer knows to suspect that a library is not thread
safe unless it explicitly says it is.  More than that, you assume that
a library that keeps internal state like _res.options and res_init()
is not thread safe unless it both claims to be thread safe and you can't
find any evidence about problems.

And to omit any hint of the nonsense in the resolver man page!
There is this passing mention in resolv.h:
 * Source and Binary compatibility; _res will not work properly
 * with multi-threaded programs.


> And dccifd is linked against libpthread:

Because dccifd and dccm use threads


> Let me know if there is something I can do to help.

Any suggestions on the least nasty kludge to link dccifd and dccm
to the libc resolver instead of the broken-by-design libpthread
resolver?  I'd have to force not only the res_state hooks, but the
whole resolver edifice including anything called inside gethostby*().

Should I change the Makefiles to treat NetBSD like Windows and not build
dccifd and dccm under the toy-applications-for-toy-operating-systems rule?

Maybe I can arrange to not tweak the resolver timeouts for the threaded
DCC programs to limit the total DCC delays and so keep SpamAssassin and
MTAs from being unhappy.


Have I mentioned I'm becoming ever less enamored of recent versions of
NetBSD?  The Linux experts only gratuitously, incompatibly changed the
names of the resolver hooks.


thanks any way,
Vernon Schryver    vjs@...

Re: dccifd: restart after signal 6

by Paul Vixie-4

the strange part of all this is, there has been a thread safe superset
of libresolv for about 11 years.  just call res_ninit instead of
res_init, and res_nsend instead of res_send.  netbsd doesn't know this?
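
a minimal sketch of that approach, assuming a resolver that provides res_ninit() and res_nclose() and exposes the retrans and retry fields in struct __res_state.  the function name private_resolver_delays is made up, and whether every lookup path on NetBSD 4.0 is safe this way is not something this sketch verifies.

```c
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <string.h>

/*
 * Initialise a caller-private resolver state and report its timeout
 * settings.  The global _res is never touched, so the __res_state()
 * accessor is never called.  Returns 0 on success, -1 if res_ninit()
 * fails.
 */
int private_resolver_delays(int *retrans, int *retry)
{
    struct __res_state state;

    memset(&state, 0, sizeof(state));
    if (res_ninit(&state) != 0)
        return -1;

    *retrans = state.retrans;   /* retransmission interval */
    *retry = state.retry;       /* number of retransmissions */

    res_nclose(&state);         /* release anything res_ninit opened */
    return 0;
}
```

a threaded program would keep one such state per thread (or guard a shared one with its own lock) and hand it to res_nsend() or res_nquery() instead of relying on _res.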

Re: dccifd: restart after signal 6

by Vernon Schryver

> From: Paul Vixie <vixie@...>
> To: Vernon Schryver <vjs@...>
> cc: dcc@..., lists-dcc@...

> the strange part of all this is, there has been a thread safe superset
> of libresolv for about 11 years.  just call res_ninit instead of
> res_init, and res_nsend instead of res_send.  netbsd doesn't know this?

I didn't know that, but now that you mention it, it's in non-NetBSD
versions of res_state.c, and so probably in NetBSD versions.
I'd not trust it without reading the whole resolver library because
"thread" does appear in res_state.c.  But there are plenty of
pthread.h and similar stains in other files in the
FreeBSD version of /usr/src/lib/libc/resolv/*


While I'm ranting about NetBSD, I wish they'd get with the program in
minor ways that wouldn't break anything.  I can't think of an excuse
for the spews of compiler warnings like these:
    sign.c:77: warning: pointer targets in passing argument 2 of 'MD5Update' differ in signedness
Declaring your MD5Update() to take a const void * instead of a const
unsigned char * should be done even before you tune it.  It should be
done when you add "const" to the ancient code from RFC 1321.
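
A small sketch of the point about the prototype.  demo_update is a hypothetical byte-summing stand-in, not the real MD5Update(); it only shows that a const void * parameter accepts both char and unsigned char buffers without a cast and without a signedness warning.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a digest update function.  Taking the
 * data as const void * lets callers pass any object pointer; the
 * conversion to unsigned char * happens once, inside the function.
 */
unsigned long demo_update(unsigned long sum, const void *data, size_t len)
{
    const unsigned char *p = data;  /* implicit conversion from void * */
    size_t i;

    for (i = 0; i < len; i++)
        sum += p[i];
    return sum;
}
```

Calling demo_update(0, "abc", 3) and calling it with an unsigned char array holding the same three bytes both compile cleanly and give the same result.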


Vernon Schryver    vjs@...

Re: dccifd: restart after signal 6

by MrC-10

On 6/5/2009 6:02 PM, Vernon Schryver wrote:

>> From: Paul Vixie<vixie@...>
>> To: Vernon Schryver<vjs@...>
>> cc: dcc@..., lists-dcc@...
>
>> the strange part of all this is, there has been a thread safe superset
>> of libresolv for about 11 years.  just call res_ninit instead of
>> res_init, and res_nsend instead of res_send.  netbsd doesn't know this?
>
> I didn't know that, but now that you mention it, it's in non-NetBSD
> versions of res_state.c, and so probably in NetBSD versions.
> I'd not trust it without reading the whole resolver library because
> "thread" does appear in res_state.c.  But there are plenty of
> pthread.h and similar stains in other files in the
> FreeBSD version of /usr/src/lib/libc/resolv/*
>

Here's Christos Zoulas' mention of using res_ninit() and deprecating
usage of _res :

http://www.nabble.com/_res-value-related-query-td22433804.html

Seems the changes aren't too bad.  I can test if you want.

> While I'm ranting about NetBSD, I wish they'd get with the program in
> minor ways that wouldn't break anything.  I can't think of an excuse
> for the spews of compiler warnings like these:
>      sign.c:77: warning: pointer targets in passing argument 2 of 'MD5Update' differ in signedness
> Declaring your MD5Update() to take a const void * instead of a const
> unsigned char * should be done even before you tune it.  It should be
> done when you add "const" to the ancient code from the RFC 1321.
>

I want to go off on pkgsrc... but I'll resist and slog instead.

-m

Re: dccifd: restart after signal 6

by Petar Bogdanovic-6

On Sat, Jun 06, 2009 at 01:02:56AM +0000, Vernon Schryver wrote:

> > From: Paul Vixie <vixie@...>
> > To: Vernon Schryver <vjs@...>
> > cc: dcc@..., lists-dcc@...
>
> > the strange part of all this is, there has been a thread safe superset
> > of libresolv for about 11 years.  just call res_ninit instead of
> > res_init, and res_nsend instead of res_send.  netbsd doesn't know this?
>
> I didn't know that, but now that you mention it, it's in non-NetBSD
> versions of res_state.c, and so probably in NetBSD versions.
> I'd not trust it without reading the whole resolver library because
> "thread" does appear in res_state.c.  But there are plenty of
> pthread.h and similar stains in other files in the
> FreeBSD version of /usr/src/lib/libc/resolv/*
>
>
> While I'm ranting about NetBSD, I wish they'd get with the program in
> minor ways that wouldn't break anything.  I can't think of an excuse
> for the spews of compiler warnings like these:
>     sign.c:77: warning: pointer targets in passing argument 2 of 'MD5Update' differ in signedness
> Declaring your MD5Update() to take a const void * instead of a const
> unsigned char * should be done even before you tune it.  It should be
> done when you add "const" to the ancient code from the RFC 1321.


I don't understand the need for drama here.  Ever since I joined this
list, you've been hammering NetBSD again and again for various reasons.
In the end it's a volunteer project with a much smaller developer and
user base than FreeBSD, let alone Linux or any bigger Linux distribution.

If all the details you mentioned really anger you that much, feel free
to open a PR or post to one of the available NetBSD mailing lists (you
may cc them without being subscribed, btw) and I'm pretty sure you'll
get a reasonable answer like: ``yes, it's suboptimal and insufficiently
documented because nobody had the time to work on it yet''.

In general, the NetBSD developers are a tribe of highly qualified and
friendly perfectionists willing to help whenever they can.  Admitting
obvious and less obvious flaws is also one of their strengths so it's
unlikely that you'll ever have to face lengthy and pointless discussions
about unimportant details.



   Petar Bogdanovic




Re: dccifd: restart after signal 6

by Petar Bogdanovic-6

On Fri, Jun 05, 2009 at 06:26:50PM -0700, MrC wrote:
>
> I want to go off on pkgsrc... but I'll resist and slog instead.

What is so wrong with pkgsrc that it would justify such tough talk?



   Petar Bogdanovic


