Douglas E. Engert wrote:
> We have seen a number of issues with nss-ldap when going from
> Ubuntu Dapper to Ubuntu Hardy. (Intrepid has shown similar problems.)
> Dapper clients and Solaris 9 and 10 using Sun's nss ldap work
> fine with our two ldap servers.
>
> Hardy, based on nss-ldap_258, has the problems. The code for 260
> and 264 appears to have the same problems.
Your analysis makes sense to me, but at the moment I'm no longer interested in
nss-ldap, since nss-ldapd (+ slapd nssov) works better and offers easier
administration.
> First problem:
>
> The /etc/ldap.conf file implies the default for timeout is 30 seconds.
> But it is unlimited in the code. This has caused nscd to lock up as it
> keeps accepting requests, with all its worker threads waiting on the
> nss-ldap lock and one thread blocked in ldap_result waiting for the
> response. netstat -a shows the connection is in CLOSE_WAIT. The systems
> keep running slow as each caller of nscd times out waiting for nscd,
> then goes ahead and does the LDAP request. Nscd uses one file descriptor
> for each request and eventually runs out of file descriptors and starts
> using 100% CPU.
>
> Setting timeout 30 at least helps get out of this situation.
> Suggestion: in util.c: set result->ldc_timelimit = 30; (See attachment)
>
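The suggested default can be sketched in isolation as follows. This is a
minimal standalone illustration, not the attached patch: struct demo_config,
demo_init_defaults and demo_apply_line are hypothetical names, and only the
field name ldc_timelimit and the value 30 come from the message above. The
keyword "timelimit" used by the parser is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the parsed configuration; only the field
 * name ldc_timelimit is taken from the message above. */
struct demo_config {
    int ldc_timelimit;   /* search time limit in seconds; 0 = unlimited */
};

#define DEMO_DEFAULT_TIMELIMIT 30

/* Start from a bounded default instead of "no limit". */
void demo_init_defaults(struct demo_config *cfg)
{
    assert(cfg != NULL);
    cfg->ldc_timelimit = DEMO_DEFAULT_TIMELIMIT;
}

/* Apply one "key value" line. Returns 1 if the line set the time limit,
 * 0 otherwise. The keyword "timelimit" is an assumption. */
int demo_apply_line(struct demo_config *cfg, const char *line)
{
    int seconds;

    assert(cfg != NULL && line != NULL);
    if (sscanf(line, "timelimit %d", &seconds) == 1 && seconds >= 0) {
        cfg->ldc_timelimit = seconds;
        return 1;
    }
    return 0;   /* comments and unrelated keys leave the default alone */
}
```

With this shape, a configuration file that never mentions the limit still
ends up with a 30-second bound, which matches what the comments in
/etc/ldap.conf lead an administrator to expect.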
> Second problem:
>
> In ldap-nss.c if the do_result gets a timeout (or error), it writes to
> syslog: "nss_ldap: could not get LDAP result" and sets stat = NSS_UNAVAIL;
>
> But the __session.ls_state is still set to LS_CONNECTED_TO_DSA
> and the next operation tries to use the same connection which will also
> time out.
>
> Suggestion: in ldap-nss.c (see attachment)
> Add call to do_close() in two places where do_result gets a timeout or
> other connection error. This change will cause the next request to
> reconnect. It may take 30 seconds, but the new connection will not time out
> again.
>
>
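The effect of that change can be shown with a small simulation. This is only
a sketch under stated assumptions, not the nss-ldap code: every demo_* name
is hypothetical, the dead link is simulated with a flag, and the real code
tracks the state in __session.ls_state (LS_CONNECTED_TO_DSA) and closes the
link with do_close().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified session states. */
enum demo_state { DEMO_UNCONNECTED, DEMO_CONNECTED };

enum demo_status { DEMO_SUCCESS, DEMO_UNAVAIL };

struct demo_session {
    enum demo_state state;
    int link_is_dead;   /* simulates a connection stuck in CLOSE_WAIT */
    int connects;       /* counts how many times we (re)connected */
};

/* Simulated connect: always yields a fresh, working link. */
void demo_connect(struct demo_session *s)
{
    assert(s != NULL);
    s->state = DEMO_CONNECTED;
    s->link_is_dead = 0;
    s->connects++;
}

/* Counterpart of the suggested do_close() call: forget the link. */
void demo_close(struct demo_session *s)
{
    assert(s != NULL);
    s->state = DEMO_UNCONNECTED;
}

/* One lookup. On a timeout or connection error the session is closed,
 * so the next call reconnects instead of reusing the dead link. */
enum demo_status demo_lookup(struct demo_session *s)
{
    assert(s != NULL);
    if (s->state != DEMO_CONNECTED)
        demo_connect(s);
    if (s->link_is_dead) {
        demo_close(s);   /* the key step */
        return DEMO_UNAVAIL;
    }
    return DEMO_SUCCESS;
}
```

Without the demo_close() call the state would stay connected after the
failure, and every later lookup would run into the same dead link, which is
the behaviour described above.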
> These problems may be related to the Ubuntu conversion from using OpenSSL
> to using GnuTLS. It may be that OpenSSL or GnuTLS fails to shut down the
> connection correctly, or fails to tell ldap_search that the connection is
> down.
>
> In any case if the do_result fails with some timeout or connection problem,
> the conservative thing to do is to go through do_with_reconnect and try
> a different server.
>
> Has anyone seen any similar problems?
>
> What we are testing now is using the Intrepid version of nss-ldap based on
> 260 on Hardy with the attached changes.
>
> Packages being used:
> libnss-ldap 260-1ubuntu2-dee1 (-dee1 has my changes)
> libldap-2.4-2 2.4.9-0ubuntu0.8.04.2
> libgnutls13 2.0.4-1ubuntu2.3
> nscd 2.7-10ubuntu4
>
--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/