Skipping initial sync, and full sync after node failure

Hi, All --
I'm working on getting DRBD up and running for a large storage array -- around 10TB. I'm having two issues that I suspect are related, and I'm hoping that someone can help me out with them. Specifically, I'm wondering whether the issues are related, and if so, whether there is a method that allows the desired behavior.

First of all, 10TB takes a fair amount of time to sync. Even on a 10Gbps network, drbd's limit of 650MB/s means that a full sync would take between 4 and 5 hours. On a 1Gbps network, where the effective speed is about 125MB/s, that time rises to roughly 22 hours.

Because of this, I'm hoping to avoid the initial sync phase. I'm starting with empty disks, so I don't need the disks' contents to be synchronized for data preservation or anything like that. In googling around, I found the following command, which does in fact cause both nodes to report that they are UpToDate:

drbdadm -- 6::::1 set-gi resource

So, the first question is whether this is, in fact, the appropriate command to use to avoid the initial sync. Is there another method that's preferred? Or is it simply no longer possible to skip the initial sync? What I really want is a way to tell drbd to sync the bitmap without actually syncing data, since there isn't any data that I care about.

The second issue is this: after establishing both nodes as UpToDate with the command above, and successfully swapping the Primary role back and forth between the hosts, if one node fails (or is rebooted), it requires a full sync when it comes back online. This happens even if the other node was Primary at the time the local machine went down, and even if no changes have been made to the local node.
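As a rough cross-check of the sync-time estimates above, here is a minimal Python sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB/s = 10^6 bytes/s) and a constant, sustained resync rate; `full_sync_hours` is an illustrative helper, not anything DRBD provides.

```python
"""Back-of-the-envelope full-resync time for a DRBD device.

Assumptions (not taken from DRBD): decimal units (1 TB = 10**12 bytes,
1 MB/s = 10**6 bytes/s) and a constant, sustained resync rate.
"""

TB = 10**12
MB = 10**6


def full_sync_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return how many hours a full resync takes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    cases = [
        ("650 MB/s (10 Gbps case)", 650 * MB),
        ("125 MB/s (1 Gbps case)", 125 * MB),  # 1 Gbps / 8 bits per byte
    ]
    for label, rate in cases:
        print(f"{label}: {full_sync_hours(10 * TB, rate):.1f} h")
```

Under those assumptions it prints 4.3 h for 650 MB/s and 22.2 h for 125 MB/s. Plugging in the size DRBD itself logs further down (10731110524 KB, where DRBD's KB are 1024-byte units) stretches the 125 MB/s case to roughly 24.4 h.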
Based on the logs, it appears that something is going on with the size of the bitmap changing on the rebooted host, though that doesn't make much sense to me:

Oct 21 17:16:11 scurry4 kernel: drbd0: No usable activity log found.
Oct 21 17:16:11 scurry4 kernel: drbd0: max_segment_size ( = BIO size ) = 32768
Oct 21 17:16:11 scurry4 kernel: drbd0: drbd_bm_resize called with capacity == 21462221048
Oct 21 17:16:11 scurry4 kernel: drbd0: resync bitmap: bits=2682777631 words=41918401
Oct 21 17:16:11 scurry4 kernel: drbd0: size = 10 TB (10731110524 KB)
Oct 21 17:16:11 scurry4 kernel: drbd0: Writing the whole bitmap, size changed
Oct 21 17:16:11 scurry4 kernel: drbd0: writing of bitmap took 468 jiffies
Oct 21 17:16:11 scurry4 kernel: drbd0: 10 TB (2682777631 bits) marked out-of-sync by on disk bit-map.
Oct 21 17:16:12 scurry4 kernel: drbd0: reading of bitmap took 289 jiffies
Oct 21 17:16:12 scurry4 kernel: drbd0: recounting of set bits took additional 271 jiffies
Oct 21 17:16:12 scurry4 kernel: drbd0: 10 TB (2682777631 bits) marked out-of-sync by on disk bit-map.
Oct 21 17:16:12 scurry4 kernel: drbd0: disk( Attaching -> Inconsistent )
Oct 21 17:16:12 scurry4 kernel: drbd0: Writing meta data super block now.

The remote host, which remained up, shows this in its logs:

Oct 21 17:16:48 scurry24 kernel: drbd0: Becoming sync source due to disk states.
Oct 21 17:16:48 scurry24 kernel: drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
Oct 21 17:16:48 scurry24 kernel: drbd0: Writing meta data super block now.

I'm wondering whether this behavior could be related to the skipped sync described above, or whether it may be related in some way to the size of the device being synced. Has anyone seen this before, or can anyone shed some light on it?

Basic info: OS is CentOS x86_64. Kernel version is 2.6.18-128.1.10.el5. DRBD is version 8.2.6-2.
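The bitmap figures in the scurry4 log are internally consistent, which a short Python sketch can confirm. It assumes (to the best of my knowledge this matches DRBD 8, but treat it as an assumption) one bitmap bit per 4 KiB block, i.e. 8 sectors of 512 bytes, and 64-bit bitmap words on x86_64; `bitmap_geometry` is a hypothetical helper. With all 2682777631 bits set, the entire device is flagged out of sync, so a full resync follows. The arithmetic only shows the numbers agree; it does not explain why DRBD concluded the size had changed.

```python
"""Cross-check the bitmap numbers from the scurry4 log.

Assumptions: one bitmap bit covers 4 KiB (8 sectors of 512 bytes), and the
bitmap is stored in 64-bit words (unsigned long on x86_64).
"""

SECTOR_BYTES = 512
BM_BLOCK_BYTES = 4096   # bytes tracked by one bitmap bit (assumed)
WORD_BITS = 64          # bits per bitmap word on x86_64 (assumed)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, exact for arbitrarily large ints."""
    return -(-a // b)


def bitmap_geometry(capacity_sectors: int) -> tuple[int, int, int]:
    """Return (bits, words, size in 1024-byte KB) for a device capacity."""
    sectors_per_bit = BM_BLOCK_BYTES // SECTOR_BYTES  # 8
    bits = ceil_div(capacity_sectors, sectors_per_bit)
    words = ceil_div(bits, WORD_BITS)
    size_kb = capacity_sectors * SECTOR_BYTES // 1024
    return bits, words, size_kb


if __name__ == "__main__":
    # "drbd_bm_resize called with capacity == 21462221048" (in sectors)
    bits, words, size_kb = bitmap_geometry(21462221048)
    print(f"bits={bits} words={words} size={size_kb} KB")
```

Run as-is, it prints `bits=2682777631 words=41918401 size=10731110524 KB`, matching the "resync bitmap" and "size =" lines in the log.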
Thanks for any help,
Ian

_______________________________________________
drbd-user mailing list
drbd-user@...
http://lists.linbit.com/mailman/listinfo/drbd-user
Re: Skipping initial sync, and full sync after node failure

Hi Ian,
when creating a new resource which doesn't have any data that you want to keep on either node, you can use the following:

# drbdadm -- --clear-bitmap new-current-uuid drbd0

You can see the documentation here, to make sure this applies to your situation:
http://www.drbd.org/users-guide/re-drbdadm.html

Then your second issue should not occur, as both nodes will now have a synchronized bitmap. At least that's how I understand it.

Regards,
--
Jean-François Chevrette [iWeb]

On 09-10-22 8:57 AM, Ian Marlier wrote:
> Hi, All --
>
> I'm working on getting DRBD up and running for a large storage array --
> around 10TB. [...]
Re: Skipping initial sync, and full sync after node failure

On Thu, Oct 22, 2009 at 02:44:45PM -0400, Jean-Francois Chevrette wrote:
> Hi Ian,
>
> when creating a new resource which doesn't have any data that you want
> to keep on either node, you can use the following:
>
> # drbdadm -- --clear-bitmap new-current-uuid drbd0
>
[...]
> Then, your second issue should not occur as both nodes will now have a
> synchronized bitmap. At least that's how I understand it.

Right.
The necessary procedure is best documented in the drbdsetup man page, which is online as well:
http://www.drbd.org/users-guide/re-drbdsetup.html

Currently, the section on new-current-uuid is
http://www.drbd.org/users-guide/re-drbdsetup.html#id1229962
but I'm not sure how stable those "id" tags are when the DocBook sources change ;)

--
: Lars Ellenberg
: LINBIT HA-Solutions GmbH
: DRBD®/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
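The sequence Lars points to can be sketched as a dry-run Python script that only prints the drbdadm invocations. The ordering (create metadata and bring the resource up on both nodes, wait until they are Connected Secondary/Secondary Inconsistent/Inconsistent, then run new-current-uuid with --clear-bitmap on one node only) is my reading of the drbdsetup man page, not a verified transcript of it, so check it against the page for your DRBD version. The resource name `r0` and the helper names are hypothetical. Clearing the bitmap declares both disks identical, so this only fits a freshly created resource whose contents nobody cares about.

```python
"""Dry-run sketch: skip DRBD's initial sync on a freshly created resource.

Assumed ordering (verify against the drbdsetup man page for your version):
metadata + "up" on BOTH nodes, wait for Connected Secondary/Secondary
Inconsistent/Inconsistent, then clear the bitmap on ONE node only.
The resource name "r0" is hypothetical. Nothing here executes drbdadm.
"""
import shlex


def skip_initial_sync_plan(resource: str) -> list[tuple[str, list[str]]]:
    """Return (where to run it, argv) pairs in execution order."""
    return [
        # Initialize fresh metadata (overwrites any existing metadata).
        ("both nodes", ["drbdadm", "create-md", resource]),
        # Attach the backing disk and connect to the peer.
        ("both nodes", ["drbdadm", "up", resource]),
        # New current UUID with a cleared bitmap: both sides should report
        # UpToDate without a resync. Run after the nodes have connected.
        ("one node", ["drbdadm", "--", "--clear-bitmap",
                      "new-current-uuid", resource]),
    ]


def print_plan(plan: list[tuple[str, list[str]]]) -> None:
    """Print each step with the host(s) it belongs on."""
    for where, argv in plan:
        print(f"[{where}] {shlex.join(argv)}")


if __name__ == "__main__":
    print_plan(skip_initial_sync_plan("r0"))
```

Printing rather than executing is deliberate: the steps are split across two hosts, and the last one must wait for the connection, so a single local script cannot safely run them all.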
Re: Skipping initial sync, and full sync after node failure

http://www.drbd.org/users-guide/s-using-truck-based-replication.html
Still no guarantee that this applies to you, but it's probably a bit more verbose than the man page.

Florian

On 10/22/2009 10:21 PM, Lars Ellenberg wrote:
> On Thu, Oct 22, 2009 at 02:44:45PM -0400, Jean-Francois Chevrette wrote:
[...]
> Right.
> The necessary procedure is best documented in the drbdsetup manpage,
> which is online as well:
> http://www.drbd.org/users-guide/re-drbdsetup.html
Re: Skipping initial sync, and full sync after node failure

On Thu, Oct 22, 2009 at 4:21 PM, Lars Ellenberg <lars.ellenberg@...> wrote:
Lars and others who replied --

This procedure was exactly what I was looking for. It was quick and painless. Following it (instead of the set-gi approach I had mentioned previously) also fixed the issue I had been having with a full sync being required after a reboot. Which is excellent!

For what it's worth, I would suggest making this documentation more prominent than it is at the moment. While it is in one of the man pages, I suspect I am not the only person who will overlook it, because we're used to using drbdadm as the interface to drbdsetup. I had looked through the drbdadm man page in detail, on top of Google and everything else, but hadn't found this because it never occurred to me to look at the drbdsetup page.

Thanks again for the pointers!
- Ian