Split brain in a dual primary configuration

by Jean-Francois Chevrette

Hello,

We have a two-node Citrix XenServer cluster in which each node has a
local partition configured as a DRBD resource. The resource is set to
become primary on both nodes simultaneously. XenServer uses LVM, and it
is my understanding that it works in such a way that no LV will ever be
in use on both hosts at the same time, thus ensuring consistency
between our dual-primary hosts.

For the DRBD connectivity, both nodes are connected directly through a
cross-over cable.

For testing purposes, we unplugged the network interfaces and thus
forced both nodes into WFConnection, each in a Primary/Unknown state.
VMs on each node kept working as usual.

However, after we reconnected the network interfaces, both nodes became
StandAlone and the logs showed that a split brain had been detected.
It was my understanding that DRBD would be able to sync the out-of-sync
(OOS) blocks from each node to the other properly.

What is supposed to happen when nodes in a dual-primary configuration
reconnect to each other?

Our configuration is as follows:

global {
   usage-count no;
}

common {
   protocol C;

   startup {
     become-primary-on both;
   }

   syncer {
     rate 33M;
     verify-alg crc32c;
     al-extents 1801;
   }
   net {
     cram-hmac-alg sha1;
     max-epoch-size 8192;
     max-buffers 8192;
     after-sb-0pri discard-zero-changes;
     after-sb-1pri discard-secondary;
     after-sb-2pri disconnect;
     allow-two-primaries;
   }

   disk {
     on-io-error detach;
     no-disk-flushes;
     no-disk-barrier;
     no-md-flushes;
   }
}

resource drbd0 {
   disk /dev/sda3;
   device /dev/drbd0;
   flexible-meta-disk internal;
   on node1 {
     address 10.10.0.1:7788;
   }
   on node2 {
     address 10.10.0.2:7788;
   }
}


Logs from when we reconnected both nodes:
block drbd0: Handshake successful: Agreed network protocol version 91
block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
block drbd0: conn( WFConnection -> WFReportParams )
block drbd0: Starting asender thread (from drbd0_receiver [7644])
block drbd0: data-integrity-alg: <not-used>
block drbd0: drbd_sync_handshake:
block drbd0: self 95BA39C140141F17:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57 bits:160 flags:0
block drbd0: peer F83F651106A22A31:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57 bits:51795 flags:0
block drbd0: uuid_compare()=100 by rule 90
block drbd0: Split-Brain detected, dropping connection!
block drbd0: helper command: /sbin/drbdadm split-brain minor-0
block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
block drbd0: conn( WFReportParams -> Disconnecting )
block drbd0: error receiving ReportState, l: 4!
block drbd0: asender terminated
block drbd0: Terminating asender thread
block drbd0: Connection closed
block drbd0: conn( Disconnecting -> StandAlone )
block drbd0: receiver terminated
block drbd0: Terminating receiver thread
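
Reading the two UUID lines in the log above by hand shows what the handshake tripped over. The following is a minimal Python sketch, not DRBD's actual code: it assumes the four colon-separated fields are the current, bitmap and two history UUIDs, that the lowest bit is a flag to be ignored, and it models only the single condition the log reports as rule 90. The function names are made up for illustration.

```python
def parse_uuids(line):
    """Split a 'CURRENT:BITMAP:HISTORY1:HISTORY2' string into four integers."""
    return [int(field, 16) for field in line.strip().split(":")]


def shares_bitmap_uuid(self_line, peer_line):
    """Return True when both nodes carry the same non-zero bitmap UUID.

    Assumption: the lowest bit is a flag and is masked off before comparing.
    """
    mask = ~1
    self_bitmap = parse_uuids(self_line)[1] & mask
    peer_bitmap = parse_uuids(peer_line)[1] & mask
    return self_bitmap == peer_bitmap and self_bitmap != 0


# UUID strings copied from the "self" and "peer" log lines.
SELF = "95BA39C140141F17:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57"
PEER = "F83F651106A22A31:ADE0E340AD8230BB:0CAA835AA97548CC:CF72ED70E8F22F57"

# Different current UUIDs plus a shared bitmap UUID: both sides moved on
# independently from the same earlier data generation.
current_differs = parse_uuids(SELF)[0] != parse_uuids(PEER)[0]
split_brain_suspected = current_differs and shares_bitmap_uuid(SELF, PEER)
print(split_brain_suspected)  # True
```

Both nodes share the bitmap UUID but have different current UUIDs, which is consistent with each node having written on its own while the link was down.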



Can anyone tell me why I am not getting the behavior I am expecting?


Regards,
--
Jean-François Chevrette [iWeb]


_______________________________________________
drbd-user mailing list
drbd-user@...
http://lists.linbit.com/mailman/listinfo/drbd-user

Re: Split brain in a dual primary configuration

by Gianluca Cecchi


On Fri, Oct 30, 2009 at 7:43 PM, Jean-Francois Chevrette <jfchevrette@...> wrote:
> Hello,
>
> We have a two-node Citrix XenServer cluster in which each node has a
> local partition configured as a DRBD resource. The resource is set to
> become primary on both nodes simultaneously. XenServer uses LVM, and it
> is my understanding that it works in such a way that no LV will ever be
> in use on both hosts at the same time, thus ensuring consistency
> between our dual-primary hosts.
>
> For the DRBD connectivity, both nodes are connected directly through a
> cross-over cable.
>
> For testing purposes, we unplugged the network interfaces and thus
> forced both nodes into WFConnection, each in a Primary/Unknown state.
> VMs on each node kept working as usual.
>
> However, after we reconnected the network interfaces, both nodes became
> StandAlone and the logs showed that a split brain had been detected.
> It was my understanding that DRBD would be able to sync the out-of-sync
> (OOS) blocks from each node to the other properly.
>
> What is supposed to happen when nodes in a dual-primary configuration
> reconnect to each other?
>
>    after-sb-2pri disconnect;
>    allow-two-primaries;
>  }
>
> Logs from when we reconnected both nodes:
>
> block drbd0: Split-Brain detected, dropping connection!
>
> Can anyone tell me why I am not getting the behavior I am expecting?

Hello,
the message is self-explanatory: in drbd.conf you set the policy to "disconnect" for a split brain (sb) arising from a two-primary scenario,
and that is what DRBD does.
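
To make the mapping concrete: the three after-sb-* options in the posted net section apply according to how many nodes are in the Primary role when the split brain is detected. Below is a minimal Python sketch of that lookup, using the values from the posted configuration. The dictionary and function names are hypothetical, and nothing here calls DRBD itself.

```python
# Policies copied from the net section posted at the top of the thread.
AFTER_SB_POLICIES = {
    0: ("after-sb-0pri", "discard-zero-changes"),
    1: ("after-sb-1pri", "discard-secondary"),
    2: ("after-sb-2pri", "disconnect"),
}


def policy_for(primaries_at_detection):
    """Return the (option, value) pair that applies for 0, 1 or 2 primaries."""
    if primaries_at_detection not in AFTER_SB_POLICIES:
        raise ValueError("expected 0, 1 or 2 primaries")
    return AFTER_SB_POLICIES[primaries_at_detection]


# Both nodes were Primary when they reconnected, so the 2pri policy applies.
option, value = policy_for(2)
print(option, value)  # after-sb-2pri disconnect
```

With two primaries the configured action is "disconnect", which matches the StandAlone state seen in the log.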

btw: since you are running dual-primary with LVM, are you also using CLVMD? Otherwise, if you modify a VG on one node (for example, by adding an LV), the other node doesn't see the change immediately, because there is no cluster locking...

Bye,
Gianluca


Re: Split brain in a dual primary configuration

by Jean-Francois Chevrette

Hello,

On 09-10-30 3:07 PM, Gianluca Cecchi wrote:
> Hello,
> the message is self-explanatory: in drbd.conf you set the policy to
> "disconnect" for a split brain (sb) arising from a two-primary
> scenario, and that is what DRBD does.

But what else would be more appropriate in such a situation? In fact,
that is what we want: for both nodes to disconnect. We don't want
either of them to become secondary.

Is it acceptable to have both nodes remain primaries while they are
disconnected and expect them to sync to each other properly when they
are connected again?
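
On the question of whether two disconnected primaries can simply sync to each other: a DRBD resync moves data in one direction, from a sync source to a sync target, so changes made on the target during the disconnect are overwritten rather than merged. The sketch below is a toy Python model of that behaviour under those assumptions. The block contents, the function name and the 4-block device are invented for illustration.

```python
def resync(source_blocks, target_blocks, dirty_on_source, dirty_on_target):
    """Model a one-directional resync.

    Every block that either side marked as changed is copied from the
    source to the target; nothing flows from the target to the source.
    """
    result = dict(target_blocks)
    for block in dirty_on_source | dirty_on_target:
        result[block] = source_blocks[block]
    return result


# Hypothetical 4-block device; both nodes start out identical.
node1 = {0: "a", 1: "b", 2: "c", 3: "d"}
node2 = dict(node1)

# While disconnected, each primary writes to a different block.
node1[0] = "written-on-node1"
node2[3] = "written-on-node2"

# node1 is chosen as the sync source, node2 as the sync target.
node2_after = resync(node1, node2, dirty_on_source={0}, dirty_on_target={3})
print(node2_after[0])  # written-on-node1
print(node2_after[3])  # d
```

Even though the two writes touched different blocks, the write made on node2 is gone after the resync, which is why a split brain with two primaries is left for the administrator to resolve.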

> btw: since you are running dual-primary with LVM, are you also using
> CLVMD? Otherwise, if you modify a VG on one node (for example, by
> adding an LV), the other node doesn't see the change immediately,
> because there is no cluster locking...

We are not using clvm. When a new VG or LV is created, we see it
immediately on the second node. Maybe Citrix XenServer has a mechanism
so that LVM is reloaded on both nodes when a new VM is created on the
cluster?


Regards,
--
Jean-François Chevrette [iWeb]


Re: Split brain in a dual primary configuration

by Martin Gombac

Hi,

Mind that I'm no expert and could be completely wrong, but...

LVM works on top of DRBD in active/passive mode only.
For active/active you need CLVM (and all of that RH Cluster Suite S**t).

It's the same as with filesystems: if you want one mounted in several
places at the same time, you need locking so that no two nodes write to
the same spot/block at the same time. LVM by itself doesn't guarantee
that.

For what it's worth, you're not the only one who has tried that setup. :-)
I'm currently looking for an appropriate solution.

One might be:
Disk <-> LVM <-> DRBD[X] <-> domU[X].
Where each DRBD instance is one virtual machine.
It would work only if, during live migration of a virtual machine from
host A to host B, writing on host B starts _after_ all writing on host A
has ceased.
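
The ordering condition above can be stated as a small check. This is only an illustrative Python sketch: it assumes each write is recorded as a (start, end) pair of timestamps, which is a hypothetical representation, and it says nothing about what Xen actually does during live migration.

```python
def handover_is_safe(writes_on_a, writes_on_b):
    """Return True when every write on host B starts after the last write
    on host A has finished. Each write is a (start, end) timestamp pair."""
    if not writes_on_a or not writes_on_b:
        return True
    last_end_on_a = max(end for _, end in writes_on_a)
    first_start_on_b = min(start for start, _ in writes_on_b)
    return first_start_on_b > last_end_on_a


# Host A finishes at t=3, host B starts at t=4: no overlap.
print(handover_is_safe([(0, 1), (2, 3)], [(4, 5)]))  # True
# Host A is still writing until t=5 when host B starts at t=4: overlap.
print(handover_is_safe([(0, 1), (2, 5)], [(4, 6)]))  # False
```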

Does anyone know whether this would work, and whether Xen can or will
write concurrently to both backing devices (DRBD[X]) during live
migration?

Regards,
M.



Re: Split brain in a dual primary configuration

by Martin Gombac

Well,

it's actually being done that way. Apparently it even has its own
manual chapter. :-)
http://www.drbd.org/users-guide/ch-xen.html
And a blog entry:
http://blogs.linbit.com/florian/2007/09/03/drbd-806-brings-full-live-migration-for-xen-on-drbd/

I guess that's the way to go. :-)

Regards,
M.
