Multiple LDAP servers, single URI, server shutting down, hangs or fails!



by Howard Wilkinson


We have what I think of as a 'standard' mixed environment set up, and
everything works under normal operation, BUT when one of our LDAP servers
is shutting down we get failures. I think this is a shortcoming in the
OpenLDAP library's handling of the 'uri' setting, but would like some
more info and wondered if anybody can shed some more light on this. I
have traced through the library code to the 'ldap_connect_to_host'
routine in the os-ip.c file in the OpenLDAP library and think this is
where the problem arises, but have no direct evidence.

Our set-up is as follows. We use Active Directory as our LDAP/KDC
supplier (these are Win2K3R2 boxes but I have seen this with other
flavours). In this particular environment we have 2 servers, both fairly
lightly loaded most of the time. However, one of these servers runs
Exchange 2000 and when shutting down can take up to 25 minutes to get to
the point where the network interface stops responding to pings.

The Unix side is configured with nss_ldap (264 + my Kerberos patches)
and uses Kerberos SASL connections to the LDAP service under AD.

The system is also configured to use pam_krb5 as the authenticator, which
may amplify the problem as the KDC seems to shut down before the LDAP
service.

The ldap.conf file contains a single 'uri' statement which looks like this.

uri ldap://active-directory-domain

A lookup of the domain returns multiple addresses, in our case
192.168.10.1 and 192.168.10.3. (The second is our Exchange server.)
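
To illustrate what "one name, several addresses" means for a client, here is a minimal Python sketch that lists every address the resolver returns for a host. The function name resolve_all is made up for this illustration; it shows the idea only and is not how the OpenLDAP library itself resolves names.

```python
import socket

def resolve_all(host, port=389):
    """Return every distinct IP address the resolver gives for host, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of sockaddr is the address string
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling resolve_all("active-directory-domain") in the environment described above would be expected to return both domain controller addresses, so a client that only ever tries the first entry has no fallback.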

While the Exchange server machine is shutting down we get login failures
(pam_krb5 reports incorrect password) and 'getent passwd' does not
report user entries.

We run NSCD on our boxes just to complicate matters.

It looks to me like the LDAP code will connect to the LDAP server on the
machine that is closing down, but as it cannot get service it reports a
failure, which results in the upper-level code not listing the users.
That is, the socket is still accepting connections but the LDAP server
has already died on the Active Directory box. This is potentially a
Microsoft bug, but we should be working around it, as a partially crashed
server would give the same results elsewhere.
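
The workaround I have in mind can be sketched roughly as follows in Python: treat a server as usable only if it both accepts the TCP connection and answers a cheap request, otherwise move on to the next address. The names first_working_server and probe are hypothetical, and the probe is left to the caller; this is a sketch of the idea, not what the OpenLDAP library does today.

```python
import socket

def first_working_server(addresses, probe, timeout=3.0):
    """Return the first (host, port) that connects AND passes probe(sock).

    probe is a caller-supplied callable (hypothetical here); it should
    send a cheap request and return True only on a sane reply.
    """
    for host, port in addresses:
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                # The same timeout bounds reads, so a server that accepts
                # the connection but never answers cannot stall us forever.
                sock.settimeout(timeout)
                if probe(sock):
                    return (host, port)
        except OSError:
            # Refused, unreachable, or timed out: try the next address.
            continue
    return None
```

The point of the design is that a successful connect() alone proves nothing about the service behind the socket, which is exactly the half-shut-down state described above.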

Now I could use a URI with multiple host names, and it looks like this
might work a bit better, as the code seems to have a mechanism to
iterate through the hosts. But I was wondering if this should be fixed
in the OpenLDAP library, especially as listing the domain name allows us
to add and remove AD servers dynamically while DNS provides the lookup.

As an alternative or an addition, should we be handling the sites and
services information in the DNS and binding via SRV lookups? Again, is
this a job for the OpenLDAP library or should nss_ldap handle this?

I am struggling to work out which mailing list in the OpenLDAP fora
would be appropriate to try to discuss this and was hoping somebody here
could also point me down that path.

Regards, Howard.



Re: Multiple LDAP servers, single URI, server shutting down, hangs or fails!

by Howard Wilkinson

Mathieu MILLET wrote:
> On Sat, 15 Nov 2008 18:54:35 +0000, Howard Wilkinson howard@...
> wrote:
>
> Hi,
>
>> [snip]
>> this.
>>
>> uri ldap://active-directory-domain
>>
>> The look up of the domain will give multiple addresses in our case
>> 192.168.10.1 and 192.168.10.3! (The second is our Exchange Server)
>
> Well, maybe the first problem to address relates to DNS.
>
> What does the following command give you?
> $ host active-directory-domain
  
This is Active Directory, so we get the IP addresses of all interfaces on all domain controllers! In this case that is 2 addresses!

> Depending on your DNS server and your DNS resolver (client), the upper
> OpenLDAP library may get a single IP address as an answer.
>
> And since the DNS resolution was successful, there is no reason not to keep
> this "correct" IP address.
> So the LDAP library will keep this IP address for the sole purpose of
> establishing the connection. If the connection fails and there is no other
> "IP address" to try, ... no luck.
 
  
>> While the exchange server machine is shutting down we get login failures
>> [snip]
>> Now I could use a url with multiple host names and it looks like this
>> might work a bit better, as the code seems to have a mechanism to
>> iterate through the hosts.
>
> Using multiple URIs definitely works (we use it here); just be sure to
> adjust the timers. By default, on a Linux system, a complete login
> (including typing the user name and password and granting access) must be
> performed in less than 60 seconds. So you may have to lower bind_timelimit
> and timelimit, but that could become tricky with the "sometimes loaded"
> server.

  
Yes, I am sure it works, but it looks from the code that it may be subject to a number of "stalls", as the 'closing' server may accept the connection but not respond to the LDAP interaction correctly. (I am still trying to confirm whether the server just sends empty responses during this time window or sends some error status; I need to capture a network trace here.)
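
For reference, the multiple-URI set-up with tightened timers discussed above might look something like the following ldap.conf fragment. The host names are placeholders, and the numbers are illustrative guesses rather than recommendations; they would need tuning against the 60-second login window and the slow server.

```
# Hypothetical host names; substitute your own domain controllers.
uri ldap://dc1.example.com ldap://dc2.example.com

# Seconds to wait for a connect/bind before giving up on a server.
bind_timelimit 5

# Seconds to wait for a search to complete.
timelimit 10
```

Note that with two servers and these values a single lookup could still spend several seconds on the dying server before moving on, so the stalls are shortened rather than removed.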

  
>> But I was wondering if this should be fixed
>> in the OpenLDAP library especially as listing the Domain Name allows us
>> to add and remove AD servers dynamically and the DNS provides the lookup.
>
> Sorry, I don't understand what you mean (maybe my English isn't so good :-( ).
>
> But about DNS (servers and clients), don't forget that there are some
> timing issues regarding the propagation of new "updated" information.

The problem is not at the DNS level. The LDAP servers stay listed unless you demote the domain controller, so a shut-down server is still in the list returned from DNS, but it will not accept a connection, so that times out fast!

>> As an alternative or an addition should we be handling the sites and
>> services information in the DNS and binding via SRV lookups? Again is
>> this a job for the OpenLDAP library or should nss_ldap handle this.
>
> Well, in fact, the OpenLDAP library already implements this. With the
> standard LDAP client library, if you don't specify any uri or host, the
> client will use the base 'dn' suffix (if it is built from "dc" components)
> and make DNS SRV requests based on the "suffix" domain to retrieve the
> LDAP servers' addresses.

This is only half of what is needed to do sites and services, but is better than nothing! At least the server round robin would be attempted!
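
As a rough illustration of the mapping Mathieu describes, the following Python sketch turns a base DN made only of "dc" components into the DNS SRV name a client would query. The function name srv_name_from_base_dn is invented here, and the comma splitting is deliberately naive (it ignores escaped commas); it is meant to show the idea, not to mirror the library's code.

```python
def srv_name_from_base_dn(base_dn, service="ldap", proto="tcp"):
    """Map 'dc=example,dc=com' to '_ldap._tcp.example.com'.

    Returns None if any RDN is not a 'dc' component, since the DN then
    has no obvious DNS domain.
    """
    labels = []
    for rdn in base_dn.split(","):
        attr, sep, value = rdn.strip().partition("=")
        if not sep or attr.strip().lower() != "dc" or not value.strip():
            return None
        labels.append(value.strip())
    if not labels:
        return None
    return "_%s._%s.%s" % (service, proto, ".".join(labels))
```

For example, a base of dc=example,dc=com would give the query name _ldap._tcp.example.com, whose SRV records then supply the host names, ports, priorities and weights of the servers.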
> Concerning nss_ldap, it seems (I have never tested this personally) that
> SRV records can be used to determine LDAP servers. See the private function
> _nss_ldap_mergeconfigfromdns in ldap-nss.c. To know precisely which DNS
> requests are performed, just run tcpdump and listen for everything passing
> on UDP port 53.

Thanks, I will look at this, but I still think the problem lies in the one-at-a-time nature of the OpenLDAP library's connection attempts to the servers.

  
>> I am struggling to work out which mailing list in the OpenLDAP fora
>> would be appropriate to try to discuss this and was hoping somebody here
>> could also point me down that path.
>>
>> Regards, Howard.
>
> I hope my explanations will help you.
>
> Sincerely yours, Mathieu MILLET.

Thanks for this; it has clarified some of my concerns, but still leaves me with an issue with this set-up.

Howard.


Re: Multiple LDAP servers, single URI, server shutting down, hangs or fails!

by Howard Wilkinson


Howard Wilkinson wrote:
> Mathieu MILLET wrote:
> [snip]
>> Concerning nss_ldap, it seems (I have never tested this personally) that
>> SRV records can be used to determine LDAP servers. See the private function
>> _nss_ldap_mergeconfigfromdns in ldap-nss.c. To know precisely which DNS
>> requests are performed, just run tcpdump and listen for everything passing
>> on UDP port 53.
>
> Thanks, I will look at this, but I still think the problem lies in the
> one-at-a-time nature of the OpenLDAP library's connection attempts to the
> servers.

Have tried this out and it half fixes my problem! The issue now definitely lies in the pam_krb5 operation. I suspect the KDC is getting half shut down and reporting the wrong things, but that is for someone else's list to discuss!

This still leaves a problem with sites and services to be fixed. I will do a hack for the nss code to accept a site name option for now, and look at doing a proper extension to do subnet matching!
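
For anyone following along, the site name option I have in mind would roughly change which SRV names get queried. Active Directory registers both domain-wide and per-site locator records under _msdcs, so a Python sketch of the candidate names might look like this. The function name ad_ldap_srv_names is made up for illustration, and this is not code from nss_ldap.

```python
def ad_ldap_srv_names(domain, site=None):
    """Return SRV names to try, site-specific first, then domain-wide."""
    domain = domain.strip(".")
    names = []
    if site:
        # Per-site domain controller locator record.
        names.append("_ldap._tcp.%s._sites.dc._msdcs.%s" % (site, domain))
    # Domain-wide domain controller locator record (fallback).
    names.append("_ldap._tcp.dc._msdcs.%s" % domain)
    return names
```

With a hypothetical site called London in example.com, the site-scoped name is tried first and the domain-wide name acts as the fallback. Subnet matching would then just be a way of working out the site name automatically.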

Thanks for the pointer, got me some forward progress.

Howard.