nss-ldap timeouts when used with nscd and gnutls

We have seen a number of issues with nss-ldap when going from
Ubuntu Dapper to Ubuntu Hardy. (Intrepid has shown similar problems.)
Dapper clients, and Solaris 9 and 10 clients using Sun's nss ldap, work
fine with our two ldap servers.

Hardy, based on nss-ldap_258, has the problems. The code for 260
and 264 appears to have the same problems.

First problem:

The /etc/ldap.conf file implies the default for timeout is 30 seconds,
but it is unlimited in the code. This has caused nscd to lock up: it
keeps accepting requests while all its worker threads wait on the
nss-ldap lock and one thread sits in ldap_result waiting for the
response. netstat -a shows the connection is in CLOSE_WAIT. The systems
keep running slowly, as each caller of nscd times out waiting on nscd,
then goes ahead and does the LDAP request itself. Nscd uses one file
descriptor for each request, eventually runs out of file descriptors,
and starts using 100% CPU.

Setting the timeout to 30 at least helps get out of this situation.
Suggestion: in util.c: set result->ldc_timelimit = 30; (See attachment)

Second problem:

In ldap-nss.c, if do_result gets a timeout (or error), it writes to
syslog: "nss_ldap: could not get LDAP result" and sets stat = NSS_UNAVAIL;

But __session.ls_state is still set to LS_CONNECTED_TO_DSA,
so the next operation tries to use the same connection, which will also
time out.

Suggestion: in ldap-nss.c (see attachment)
Add a call to do_close() in the two places where do_result gets a timeout
or other connection error. This change will cause the next request to
reconnect. It may take 30 seconds, but the new connection will not time
out again.

These problems may be related to the Ubuntu conversion from using OpenSSL
to using GnuTLS. It may be that OpenSSL or GnuTLS fails to shut down the
connection correctly, or fails to tell ldap_search that the connection is
down.

In any case, if do_result fails with some timeout or connection problem,
the conservative thing to do is to go through do_with_reconnect and try
a different server.

Has anyone seen any similar problems?

What we are testing now is the Intrepid version of nss-ldap, based on
260, on Hardy with the attached changes.

Packages being used:
  libnss-ldap    260-1ubuntu2-dee1 (-dee1 has my changes)
  libldap-2.4-2  2.4.9-0ubuntu0.8.04.2
  libgnutls13    2.0.4-1ubuntu2.3
  nscd           2.7-10ubuntu4

--
 Douglas E. Engert  <DEEngert@...>
 Argonne National Laboratory
 9700 South Cass Avenue
 Argonne, Illinois  60439
 (630) 252-5444

diff -u -r nss_ldap-260/ldap-nss.c nss_ldap-260-dee1/ldap-nss.c
--- nss_ldap-260/ldap-nss.c	2009-04-15 10:13:08.000000000 -0500
+++ nss_ldap-260-dee1/ldap-nss.c	2009-04-20 14:32:28.000000000 -0500
@@ -1577,6 +1577,7 @@
 	}
       else
 	{
+	  syslog (LOG_ERR, "nss_ldap: do_open: do_start_tls failed:stat=%d", stat);
 	  do_close ();
 	  debug ("<== do_open (TLS startup failed)");
 	  return stat;
@@ -2472,6 +2473,7 @@
 #endif /* LDAP_OPT_ERROR_NUMBER */
 	  syslog (LOG_AUTHPRIV | LOG_ERR,
 		  "nss_ldap: could not get LDAP result - %s", ldap_err2string (rc));
+	  do_close();
 	  stat = NSS_UNAVAIL;
 	  break;
 	case LDAP_RES_SEARCH_ENTRY:
@@ -2507,6 +2509,7 @@
 		  syslog (LOG_AUTHPRIV | LOG_ERR,
 			  "nss_ldap: could not get LDAP result - %s",
 			  ldap_err2string (rc));
+		  do_close();
 		}
 	      else if (resultControls != NULL)
 		{
Only in nss_ldap-260-dee1: ldap-nss.o
Binary files nss_ldap-260/nss_ldap.so and nss_ldap-260-dee1/nss_ldap.so differ
diff -u -r nss_ldap-260/util.c nss_ldap-260-dee1/util.c
--- nss_ldap-260/util.c	2008-03-04 04:05:12.000000000 -0600
+++ nss_ldap-260-dee1/util.c	2009-04-15 12:40:26.000000000 -0500
@@ -625,7 +625,7 @@
 #else
   result->ldc_version = LDAP_VERSION2;
 #endif /* LDAP_VERSION3 */
-  result->ldc_timelimit = LDAP_NO_LIMIT;
+  result->ldc_timelimit = 30;
   result->ldc_bind_timelimit = 30;
   result->ldc_ssl_on = SSL_OFF;
   result->ldc_sslpath = NULL;
Only in nss_ldap-260-dee1: util.o
Re: nss-ldap timeouts when used with nscd and gnutls

Douglas E. Engert wrote:
> We have seen a number of issues with nss-ldap when going from
> Ubuntu Dapper to Ubuntu Hardy. (Intrepid has shown similar problems.)
> Dapper clients and Solaris 9 and 10 using Sun's nss ldap work
> fine with our two ldap servers.
>
> Hardy, based on nss-ldap_258, has the problems. The code for 260
> and 264 appears to have the same problems.

Your analysis makes sense to me. But at the moment I'm no longer
interested in nss-ldap since nss-ldapd ( + slapd nssov) works better and
offers easier administration.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
Re: nss-ldap timeouts when used with nscd and gnutls

Howard Chu wrote:
> Your analysis makes sense to me. But at the moment I'm no longer
> interested in nss-ldap since nss-ldapd ( + slapd nssov) works better and
> offers easier administration.

Sounds interesting, but we are trying to stick with what is offered by
Ubuntu.

--
 Douglas E. Engert  <DEEngert@...>
 Argonne National Laboratory
 9700 South Cass Avenue
 Argonne, Illinois  60439
 (630) 252-5444
Re: nss-ldap timeouts when used with nscd and gnutls

On Tue, 2009-04-21 at 15:22 -0500, Douglas E. Engert wrote:
> > Your analysis makes sense to me. But at the moment I'm no longer
> > interested in nss-ldap since nss-ldapd ( + slapd nssov) works better
> > and offers easier administration.
>
> Sounds interesting, but we are trying to stick with what is offered by
> Ubuntu.

FWIW some releases of Ubuntu have nss-ldapd (libnss-ldapd) but I would
avoid version 0.5. The 0.6.7 release is known to work quite well and is
included in Debian stable. There is however no packaged version of the
nssov in slapd as far as I know (but you can use nss-ldapd without it).

Since we're working hard on a PAM module (actually Howard Chu is doing
all the hard work at the moment) as a side effect we may also make it
more easily possible to use the nss-ldapd NSS module together with a
packaged slapd-nssov package (if such a package would be made).

(it's a bit awkward to post a more or less nss-ldapd promotional message
on the nss_ldap list)

--
-- arthur - arthur@... - http://ch.tudelft.nl/~arthur --
Re: nss-ldap timeouts when used with nscd and gnutls

Arthur de Jong wrote:
> FWIW some releases of Ubuntu have nss-ldapd (libnss-ldapd) but I would
> avoid version 0.5. The 0.6.7 release is known to work quite well and is
> included in Debian stable. There is however no packaged version of the
> nssov in slapd as far as I know (but you can use nss-ldapd without it).

Thanks, we will have to look at that.

I did see in the archives that Howard Wilkinson, on Dec 9, 2008, in
"Mega patch against nss_ldap 264", said:

 "My intention with this is to make the critical path through the code run
 the minimal code when a connection to the LDAP server exists, make
 recovery on failure more resilient, and provide for multiple SASL mechs
 without need to alter the ldap-nss code."

If it handles the cases where do_result fails, so that timeouts and
connection errors cause a reconnect to any server, that may fix the issue
I have seen.

--
 Douglas E. Engert  <DEEngert@...>
 Argonne National Laboratory
 9700 South Cass Avenue
 Argonne, Illinois  60439
 (630) 252-5444
Re: nss-ldap timeouts when used with nscd and gnutls

Douglas E. Engert wrote:
> I did see in the archives that Howard Wilkinson on Dec 9, 2008
> "Mega patch against nss_ldap 264" said:
>
> "My intention with this is to make the critical path through the code run
> the minimal code when a connection to the LDAP server exists, make
> recovery on failure more resilient, and provide for multiple SASL mechs
> without need to alter the ldap-nss code."

Yes I said this but I have yet to finish this piece of code. What I have
done runs better than it did before but it does not address some of the
stability issues I found.

You will need to apply the patch and see how you get on. I am hoping to
find time next month to revisit this, but as I am having trouble finding
paying work (as most of the UK seems to be) this may slip if somebody
finds something else for me to do.

The major piece of work that is needed, apart from fixing my patch to be
style compatible with the rest of nss_ldap, is to remove some recursion
from the code that breaks if the underlying connection to the LDAP server
disconnects. This needs to be replaced with a list walking operation so
that the reconnects can recover and continue if the remote server has
gone away. I forget which piece of code this is, but I think it was in
the groups generation operation.

> If it handles the cases where do_result fails, and timeout and connection
> errors reconnect to any server that may fix the issue I have seen.
>
>> Since we're working hard on a PAM module (actually Howard Chu is doing
>> all the hard work at the moment) as a side effect we may also make it
>> more easily possible to use the nss-ldapd NSS module together with a
>> packaged slapd-nssov package (if such a package would be made).

I had intended to get the nss_ldap work finished and then look at
porting the functionality into the nss-ldapd environment. But again time
has not been on my side.

If I can help then please feel free to ping me.

Howard.
Re: nss-ldap timeouts when used with nscd and gnutls

Howard Wilkinson wrote:
> Yes I said this but I have yet to finish this piece of code. What I have
> done runs better than it did before but it does not address some of the
> stability issues I found.
>
> You will need to apply the patch and see how you get on. I am hoping to
> find time next month to revisit this, but as I am having trouble finding
> paying work (as most of the UK seems to be) this may slip if somebody
> finds something else for me to do.

OK, I was not sure where this major modification stood.

> The major piece of work that is needed, apart from fixing my patch to be
> style compatible with the rest of nss_ldap, is to remove some recursion
> from the code that breaks if the underlying connection to the LDAP
> disconnects. This needs to be replaced with a list walking operation so
> that the reconnects can recover and continue if the remote server has
> gone away. I forget which piece of code this is, but I think it was in
> the groups generation operation.

Your change may address the two bugs I turned in today, #391 and #392.
If so that would be great. I was hoping to get #392 into the code upstream
of Debian and Ubuntu, so they would pick it up.

The #392 change is really just adding two calls to do_close() for when a
connection has an error or times out. This is not a perfect fix, as the
active request may still fail. But what we see is that nscd stops working,
and callers such as sshd, cron, ls, etc. detect that nscd is not working
and make calls to LDAP directly, bypassing nscd. So nothing appears to
fail, but an ls can take 15 seconds, or a login 30 seconds more than
expected.

> I had intended to get the nss_Ldap work finished and then look at
> porting the functionality into the nss-ldapd environment. But again time
> has not been on my side.
>
> If I can help then please feel free to ping me.
>
> Howard.

--
 Douglas E. Engert  <DEEngert@...>
 Argonne National Laboratory
 9700 South Cass Avenue
 Argonne, Illinois  60439
 (630) 252-5444