NLM and CTDB recovery master node failure

Hi all,

I'm trying to implement clustered Samba on my cluster file system using Samba+CTDB (version 3.4.2). I noticed the following sentence on the CTDB wiki page (http://wiki.samba.org/index.php/CTDB_Project):

"To become a recovery master, a node must be able to acquire an exclusive lock on that file."

So I am wondering how CTDB deals with recovery master failure. What happens if the node the CTDB recovery master is running on has a hardware failure and doesn't come up for a very long time (or ever)? The NLM server of the underlying clustered file system will hold the lock until the client comes back up, which might never happen, so the remaining nodes will not be able to select a new leader because none of them will be able to acquire an exclusive lock. Am I missing something?

Thank you in advance,
Sergey

Re: NLM and CTDB recovery master node failure

On Thu, Oct 29, 2009 at 10:41:01AM +0200, Sergey Kleyman wrote:
> I'm trying to implement clustered Samba on my cluster file system by
> using Samba+CTDB (version 3.4.2). I noticed on CTDB wiki page
> (http://wiki.samba.org/index.php/CTDB_Project) the following sentence:
>
> "To become a recovery master, a node must be able to acquire an
> exclusive lock on that file."
>
> So I am wondering how CTDB deals with recovery master failure. What
> happens if the node, CTDB recovery master is running on, has hardware
> failure and doesn't come up for a very long time (or even never)? NLM
> server of the underlying clustered file system will hold the lock until
> the client comes back up which might never happen so remaining nodes
> will not be able to select a new leader because none of them will be
> able to acquire an exclusive lock. Am I missing something?

So you're saying that a node takes a lock, the node dies and until that node comes back up, nobody will be able to take that lock? Our assumption so far is that shared fcntl locks behave like local fcntl locks: if a process that holds a lock dies, then the lock is released. It should not matter for what reason that process dies. A node being killed is a particularly nasty death for a process, but the lock must nevertheless be released.

You *can* run ctdb without that shared lock. But the shared lock is there for a reason: we need to make sure that we have the same view of cluster membership as the cluster fs below has.

You should look at

    ctdb setvar VerifyRecoveryLock 0

to work without a recovery lock. But be aware that this is NOT recommended.

Volker

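The local behaviour Volker describes can be observed directly. The following is a minimal Python sketch, not CTDB code: it assumes a Linux or other POSIX system and a local temporary file standing in for the reclock file, and the helper names (`try_exclusive_lock`, `demo`) are made up for illustration. A child process takes an exclusive fcntl lock, the parent confirms it cannot get the same lock, kills the child with SIGKILL, and then finds the lock free again.

```python
import fcntl
import os
import signal
import tempfile
import time


def try_exclusive_lock(path):
    """Try a non-blocking exclusive fcntl lock; return the open fd, or None if busy."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except OSError:
        os.close(fd)
        return None


def demo():
    """Return (child_got_lock, parent_blocked_while_child_alive, lock_free_after_kill)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reclock")  # stand-in for the shared reclock file
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: take the lock, report the result, then idle until killed.
            try:
                os.close(r)
                fd = try_exclusive_lock(path)
                os.write(w, b"1" if fd is not None else b"0")
                time.sleep(60)
            finally:
                os._exit(0)
        os.close(w)
        child_locked = os.read(r, 1) == b"1"
        os.close(r)
        blocked = try_exclusive_lock(path) is None
        os.kill(pid, signal.SIGKILL)  # the "nasty death" of the lock holder
        os.waitpid(pid, 0)
        fd = try_exclusive_lock(path)
        released = fd is not None
        if fd is not None:
            os.close(fd)
        return child_locked, blocked, released


if __name__ == "__main__":
    print(demo())
```

This only demonstrates the single-host case. Whether a lock held by a process on a dead node is released just as promptly depends on the cluster file system, which is the question the rest of the thread turns on.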
RE: NLM and CTDB recovery master node failure

> -----Original Message-----
> From: Volker Lendecke [mailto:Volker.Lendecke@...]
> Sent: Thursday, October 29, 2009 11:21
> To: Sergey Kleyman
> Cc: samba-technical@...
> Subject: Re: NLM and CTDB recovery master node failure
>
> On Thu, Oct 29, 2009 at 10:41:01AM +0200, Sergey Kleyman wrote:
> > I'm trying to implement clustered Samba on my cluster file system by
> > using Samba+CTDB (version 3.4.2). I noticed on CTDB wiki page
> > (http://wiki.samba.org/index.php/CTDB_Project) the following sentence:
> >
> > "To become a recovery master, a node must be able to acquire an
> > exclusive lock on that file."
> >
> > So I am wondering how CTDB deals with recovery master failure. What
> > happens if the node, CTDB recovery master is running on, has hardware
> > failure and doesn't come up for a very long time (or even never)? NLM
> > server of the underlying clustered file system will hold the lock
> > until the client comes back up which might never happen so remaining
> > nodes will not be able to select a new leader because none of them
> > will be able to acquire an exclusive lock. Am I missing something?
>
> So you're saying that a node takes a lock, the node dies and until that
> node comes back up, nobody will be able to take that lock? Our
> assumption so far is that shared fcntl locks behave like local fcntl
> locks: If a process that holds a lock dies, then the lock is released.
> It should not matter for what reason that process dies. A node being
> killed is a particularly nasty death for a process, but the lock must
> nevertheless be released.
>
> You *can* run ctdb without that shared lock. But the shared lock was
> there for a reason: We need to make sure that we have the same view of
> cluster membership as the cluster fs below has.
>
> You should look at
>
> ctdb setvar VerifyRecoveryLock 0
>
> to work without a recovery lock. But be aware that this is NOT
> recommended.
>
> Volker

Thanks for the reply, but allow me to disagree that "shared fcntl locks behave like local fcntl locks".

According to the "Client Failure and Restart" section of
http://www.opengroup.org/onlinepubs/009629799/chap9.htm#tagcjh_10

"... the client NSM issues an SM_NOTIFY RPC to the NSM on the named host. In this example it will issue an SM_NOTIFY to the server NSM, including the client name and the new client state... The callback procedure in the server NLM notes that the client state has changed and releases all locks held on behalf of the client."

So the NLM server releases locks only when notified by the client (in our case the NLM client in the Linux kernel), but obviously this happens only when the node that was holding the lock comes back up. The problem is that the NLM server has no way to distinguish between a failed client and a client that holds a lock for a very long time. There's no proactive heartbeat like the one CTDB has. The document even says so explicitly (section "NSM Protocol"):

"... The NSM does not actively "probe" hosts it has been asked to monitor; instead it waits for the monitored host to notify it that the monitored host's status has changed (that is, crashed and rebooted)."

This is not the case for the kernel, which can easily distinguish between a process that died (and so should have all its locks released automatically) and a process that is still running and holding a lock. Please correct me if I'm wrong.

As for your advice about running CTDB without a recovery lock: I would obviously prefer to use the recommended configuration, but I wonder what functionality will suffer from this choice?

Thanks,
Sergey

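The failure mode Sergey describes can be made concrete with a toy model. The Python sketch below is not an NLM or NSM implementation and uses no real RPC; the class `ToyLockServer` and the node names are invented for illustration. It only encodes the rule quoted from the specification: the server drops a client's locks when it receives that client's SM_NOTIFY, and never on its own initiative.

```python
class ToyLockServer:
    """Toy model of an NLM-style server that never probes its clients."""

    def __init__(self):
        self.owner = {}  # file name -> client host holding the exclusive lock

    def lock(self, client, name):
        """Grant the lock if it is free or already held by this client."""
        holder = self.owner.setdefault(name, client)
        return holder == client

    def sm_notify(self, client):
        """Model of SM_NOTIFY: the client rebooted, so drop everything it held."""
        for name in [n for n, c in self.owner.items() if c == client]:
            del self.owner[name]


def demo():
    server = ToyLockServer()
    granted_a = server.lock("node-a", "reclock")

    # node-a now suffers a hardware failure. The server hears nothing,
    # so from its point of view node-a simply keeps holding the lock.
    granted_b_while_a_down = server.lock("node-b", "reclock")

    # Only if node-a reboots and its NSM sends SM_NOTIFY is the lock freed.
    server.sm_notify("node-a")
    granted_b_after_notify = server.lock("node-b", "reclock")
    return granted_a, granted_b_while_a_down, granted_b_after_notify


if __name__ == "__main__":
    print(demo())
```

In this model `node-b` stays locked out for as long as `node-a` stays down, which is exactly the situation in which a new recovery master could not be elected.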
Re: NLM and CTDB recovery master node failure

On Thu, Oct 29, 2009 at 04:11:01PM +0200, Sergey Kleyman wrote:
> Thanks for the reply but allow me to disagree about "shared fcntl locks
> behave like local fcntl locks"
>
> According to this
> http://www.opengroup.org/onlinepubs/009629799/chap9.htm#tagcjh_10
> "Client Failure and Restart"
>
> "... the client NSM issues an SM_NOTIFY RPC to the NSM on the named
> host. In this example it will issue an SM_NOTIFY to the server NSM,
> including the client name and the new client state... The callback
> procedure in the server NLM notes that the client state has changed and
> releases all locks held on behalf of the client."
>
> So NLM server releases locks only when notified by client (in our case
> NLM client in Linux kernel) but obviously this happens only when the
> node that was holding the lock comes back up. So the problem is that NLM
> server doesn't have an ability to distinguish between failed client and
> client that holds a lock for a very long time. There's no proactive
> heartbeat as CTDB has. The document even says so explicitly (section
> "NSM Protocol")

[...] expect is different. We view the cluster not as a group of NFS clients whose servers have to adhere to that standard behaviour. In fact, in Samba we definitely do not support re-exporting NFS imports, problems with locking being the main reason for this.

Please use a different cluster file system that does not exhibit this behaviour or run without the central reclockfile.

Volker

Re: NLM and CTDB recovery master node failure

On Thu, Oct 29, 2009 at 04:34:14PM +0100, Volker Lendecke wrote:
> Please use a different cluster file system that does not
> exhibit this behaviour or run without the central
> reclockfile.

Ok, I've got a question: can we achieve the result for which we use the fcntl lock on the reclockfile with another API on your system?

We need to very quickly determine correct cluster membership of all ctdb nodes: if nobody can get the reclock lock, then we're broken. If more than one can get it, we've got a split brain. How can we get that info reliably out of your cluster fs without using the fcntl lock?

Volker

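Volker's rule of thumb can be written down as a three-way check. This is only a sketch of the reasoning, not of how CTDB evaluates the lock: the function name `classify_reclock` and the idea of collecting the set of nodes that currently manage to take the lock are assumptions made for illustration.

```python
def classify_reclock(nodes_that_got_lock):
    """Classify cluster health from the nodes that succeeded in taking the reclock."""
    winners = set(nodes_that_got_lock)
    if not winners:
        return "broken"        # nobody can take the lock
    if len(winners) == 1:
        return "ok"            # exactly one recovery master
    return "split-brain"       # the lock is not actually exclusive


if __name__ == "__main__":
    for case in ([], ["node-a"], ["node-a", "node-b"]):
        print(case, "->", classify_reclock(case))
```

Any replacement for the fcntl lock would have to give the same answers in all three cases, and give them quickly after a node failure.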
RE: NLM and CTDB recovery master node failure

> -----Original Message-----
> From: Volker Lendecke [mailto:Volker.Lendecke@...]
> Sent: Thursday, October 29, 2009 17:48
> To: Sergey Kleyman
> Cc: samba-technical@...
> Subject: Re: NLM and CTDB recovery master node failure
>
> On Thu, Oct 29, 2009 at 04:34:14PM +0100, Volker Lendecke wrote:
> > Please use a different cluster file system that does not exhibit this
> > behaviour or run without the central reclockfile.
>
> Ok, I've got a question: Can we achieve the same result we use the
> fcntl lock on the reclockfile for with another API on your system?
>
> We need to very quickly determine correct cluster membership of all
> ctdb nodes: If nobody can get the reclock lock, then we're broken. If
> more than one can get it, we've got a split brain. How can we get that
> info reliably out of your cluster fs without using the fcntl lock?
>
> Volker

We have an internal API that is implemented on top of the Spread Toolkit (http://www.spread.org/), but our goal is to make as few changes to Samba as possible, so changing the election code to use our API is not the optimal solution. I guess it will be easier to adhere to Samba's assumptions about NLM and provide automatic lock clean-up in case of node failure. Are you sure that GPFS and/or GFS have this capability?

As a side note: if I understand you correctly, CTDB is assumed to be running on the same machines as the underlying file system. I was under the impression that it's possible to run the file system on machines A and B, while Samba+CTDB runs on different machines C and D that see the clustered file system through NFS mounts, in which case C and D are just NLM clients of the file system.

One more point I wanted to inquire about: if an smbd daemon dies for some reason (abnormal exit, panic, etc.), what happens to the CIFS locks it was holding? Are those locks automatically cleaned up?

Thanks,
Sergey

Re: NLM and CTDB recovery master node failure

On Thu, Oct 29, 2009 at 09:20:30PM +0200, Sergey Kleyman wrote:
> We have our internal API that are implemented on top of Spread Toolkit
> (http://www.spread.org/) but our goal is to make as less changes to
> Samba as possible so changing election code to use our API is not the
> optimal solution. I guess it'll be easier to adhere to Samba's
> assumptions about NLM and provide automatic lock clean-up in case of the
> node failure. Are you sure that GPFS and/or GFS have this capability?

I haven't tested it myself, but this is a basic assumption in ctdb. Tridge might answer this authoritatively.

> As a side note: if I understand you correctly CTDB is assumed to be
> running on the same machines as underlying file system. I was under the
> impression that it's possible to run file system on machines A and B,
> while Samba+CTDB will run on different machines C and D that will see
> clustered file system through NFS mounts in which case C and D are just
> NLM clients to the file system.

Why would you want to do that? Going through the network twice is a very bad idea for performance. And as I said, the fcntl locking problems plus very frequent client lockups due to buggy NFS clients under CIFS load really tell us that you are asking for more trouble than you will appreciate.

> One more point I wanted to inquire about: if smbd daemons dies for some
> reason (abnormal exit - panic, etc.) what happens to CIFS locks it was
> holding? Are those locks automatically cleaned up?

They are cleaned up. Look for example at the for-loop in source3/locking/locking.c:650ff in current master. We also send immediate retry messages to all processes in case the parent smbd detects a child has died.

Volker

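The general idea behind that kind of cleanup can be sketched in a few lines. The Python below is not a translation of the loop in source3/locking/locking.c; the record layout (`pid`, `file`) and the function names are hypothetical. It only shows the technique of walking a table of lock records and dropping those whose owning process no longer exists, assuming a POSIX system where signal 0 can be used as an existence check.

```python
import os


def process_exists(pid):
    """Return True if a process with this pid exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def prune_dead_owners(entries):
    """Keep only the lock records whose owning process is still alive."""
    return [e for e in entries if process_exists(e["pid"])]


def demo():
    # Create a process that exits at once, and reap it so its pid is gone.
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    os.waitpid(pid, 0)
    entries = [
        {"pid": os.getpid(), "file": "alive.txt"},
        {"pid": pid, "file": "orphaned.txt"},
    ]
    return prune_dead_owners(entries)


if __name__ == "__main__":
    print(demo())
```

A pid check of this kind only works for processes on the local host; in a cluster, the liveness of processes on other nodes has to come from the clustering layer.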
Re: NLM and CTDB recovery master node failure

On Fri, Oct 30, 2009 at 6:20 AM, Sergey Kleyman <Sergey.Kleyman@...> wrote:
>> -----Original Message-----
>> From: Volker Lendecke [mailto:Volker.Lendecke@...]
>> Sent: Thursday, October 29, 2009 17:48
>> To: Sergey Kleyman
>> Cc: samba-technical@...
>> Subject: Re: NLM and CTDB recovery master node failure
>>
>> On Thu, Oct 29, 2009 at 04:34:14PM +0100, Volker Lendecke wrote:
>> > Please use a different cluster file system that does not exhibit this
>> > behaviour or run without the central reclockfile.
>>
>> Ok, I've got a question: Can we achieve the same result we use the
>> fcntl lock on the reclockfile for with another API on your system?
>>
>> We need to very quickly determine correct cluster membership of all
>> ctdb nodes: If nobody can get the reclock lock, then we're broken. If
>> more than one can get it, we've got a split brain. How can we get that
>> info reliably out of your cluster fs without using the fcntl lock?
>>
>> Volker
>
> We have our internal API that are implemented on top of Spread Toolkit
> (http://www.spread.org/) but our goal is to make as less changes to
> Samba as possible so changing election code to use our API is not the
> optimal solution. I guess it'll be easier to adhere to Samba's
> assumptions about NLM and provide automatic lock clean-up in case of the
> node failure. Are you sure that GPFS and/or GFS have this capability?

Yes. Locks and open files need to be recovered by the cluster filesystem very promptly anyway, since if an i/o is blocked for 40 seconds or more, you are very likely causing the redirector to time out, with data corruption as a result.

> As a side note: if I understand you correctly CTDB is assumed to be
> running on the same machines as underlying file system. I was under the
> impression that it's possible to run file system on machines A and B,
> while Samba+CTDB will run on different machines C and D that will see
> clustered file system through NFS mounts in which case C and D are just
> NLM clients to the file system.

Do not re-export NFS; bad things happen, which is why knfsd, for example, refuses to re-export NFS shares. Also, do not use NFS for locking or to store the reclock file. NFS file locking in v2/v3 is very unreliable and will break things.

Instead, if you do need split-brain protection but cannot use open()/fcntl() on a reclock file due to cluster filesystem semantics, you can either run without a reclockfile, which opens the possibility of split brain and so is probably sub-optimal, or replace the recovery lock with a different mechanism that uses some other type of shared resource as arbitrator. The latter should be reasonably easy: most of what you need is to replace ctdb_recovery_lock() with an alternative function that uses something else. Perhaps have a shared dedicated SCSI device and use persistent reservations? That would be useful.

(Just don't use NFS; NFS file locking is broken by design, so this will cause more problems than it is worth.)

> One more point I wanted to inquire about: if smbd daemons dies for some
> reason (abnormal exit - panic, etc.) what happens to CIFS locks it was
> holding? Are those locks automatically cleaned up?
>
> Thanks, Sergey

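The suggestion to swap the recovery lock for another arbitrator amounts to putting the lock behind a small interface. The Python sketch below illustrates that design only. It is not CTDB code and does not reflect the real ctdb_recovery_lock() signature; `FcntlArbitrator`, `FixedArbitrator` and `may_become_recovery_master` are hypothetical names, and `FixedArbitrator` is merely a placeholder where something like a SCSI persistent reservation would go.

```python
import fcntl
import os
import tempfile


class FcntlArbitrator:
    """Arbitrate with an exclusive fcntl lock on a file (the reclock approach)."""

    def __init__(self, path):
        self.path = path
        self.fd = None

    def try_acquire(self):
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)
            return False
        self.fd = fd
        return True

    def release(self):
        if self.fd is not None:
            os.close(self.fd)  # closing the descriptor drops the lock
            self.fd = None


class FixedArbitrator:
    """Placeholder for another shared resource; always answers `granted`."""

    def __init__(self, granted):
        self.granted = granted

    def try_acquire(self):
        return self.granted

    def release(self):
        pass


def may_become_recovery_master(arbitrator):
    """A node may lead only if the arbitrator grants it exclusive ownership."""
    return arbitrator.try_acquire()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        lock = FcntlArbitrator(os.path.join(tmp, "reclock"))
        with_file_lock = may_become_recovery_master(lock)
        lock.release()
    denied_elsewhere = may_become_recovery_master(FixedArbitrator(False))
    return with_file_lock, denied_elsewhere


if __name__ == "__main__":
    print(demo())
```

Whatever sits behind the interface has to provide the two properties discussed in the thread: at most one holder at a time, and prompt release when the holder's node dies.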