FAS2050C questions (clustering)

I'm the proud new owner of an IBM N3600 A20 (rebranded FAS2050C) with 20x30GB SAS disks. I'm trying to determine the best way to get this thing set up, and realized I have only a fuzzy understanding of how the clustering or failover filer head should work.

My initial thoughts were to aim for the following setup:

- Set up all 20 disks in a RAID-DP aggregate with one spare (17 data, 2 parity and one spare, or maybe two spares).
- Bond a NIC from the first controller with a NIC from the second controller to give us a 2Gbps connection to our "storage network".
- Third and fourth NICs would go to our regular network.

My hope was that I could lose one filer head and the other would take over seamlessly. We'd lose half of our network bandwidth but still be up and running.

However, it sounds like my understanding of how the clustering works might have been a bit flawed and that I actually need to treat the filer heads as two separate filers. So I may be forced to do something like the following:

- Split my disks up between the two filers (7 data, 2 parity, one spare -- or maybe I can have one spare available to both heads).
- Probably can't team NICs from multiple filer heads, meaning if I team the two NICs on the filer I can no longer connect to my management network. I probably need to order more NICs :(
- If I lose one head, I lose one aggregate unless manual intervention is taken.
- Each filer has a different hostname/IP for network access.

This maybe gives me better performance, but at the expense of total disk space and flexibility, if my understanding is correct.

Maybe someone could help clear this up. It doesn't appear IBM has a RedBook on clustering... I'm searching around in NOW and have come across the Data ONTAP 7.3 Active/Active Configuration Guide, which I am now reading. Is there something similar for Active/Passive setups (which seems to be more what I am after), or other documents that would be recommended reading?

Any advice or best practices? This filer will be serving NFS to a pair of ESX servers. We plan to add a second shelf of disks later this year.

Thanks in advance. No sales inquiries please.

Ray
|
|
RE: FAS2050C questions (clustering)

You are correct that clustering is treated as two separate controllers which can take over for each other. You cannot vif across NICs on different controllers.

The closest thing to active/passive would be to allocate at least 2 (possibly 3 if you want a spare) disks to the "passive" controller and the rest to the active one. I'd set up a raid4 trad vol or aggregate for it: since you are only going to use 2 disks, you don't need raid_dp. Definitely use raid_dp on the active controller.

Under this scenario, you can lose either controller head and still be running.

-- Adam Fox
Systems Engineer
adamfox@...

-----Original Message-----
From: Ray Van Dolson [mailto:rvandolson@...]
Sent: Tuesday, June 02, 2009 11:44 AM
To: toasters@...
Subject: FAS2050C questions (clustering)
|
|
Re: FAS2050C questions (clustering)

On Tue, Jun 02, 2009 at 08:54:35AM -0700, Fox, Adam wrote:
> You are correct that clustering is treated as two separate controllers
> which can take over for each other. You cannot vif across NICs on
> different controllers.
>
> If you want to do the closest thing to active/passive would be to
> allocate at least 2 (possible 3 if you want a spare) disks to the
> "passive" controller and the rest to the active one. I'd set up a raid4
> trad vol or aggregate for it since you only are going to use 2 disks,
> you don't need raid_dp. Definitely use raid_dp on the active
> controller.
>
> Under this scenario, you can lose either controller head and still be
> running.

Ah, so we need to have disks assigned to the "passive" controller in an aggregate configuration?

What if I just split the disks up evenly? Would the aggregate on the "active" controller shift down to be controlled by the "passive" controller automatically? Maybe this would be preferable to having 2 or 3 disks doing "nothing" on the second head.

Thanks for the response.

Ray
|
|
RE: FAS2050C questions (clustering)

You can split them if you like. I only suggested giving 2 disks to the "passive" side if you wanted an active/passive config. If you want to go active/active, then split them up. Just be aware that, depending on your load, you may get spindle-bound at some point with that few disks in your aggregate, but you may be fine until you get your new disks later.

-- Adam Fox
Systems Engineer
adamfox@...

-----Original Message-----
From: Ray Van Dolson [mailto:rvandolson@...]
Sent: Tuesday, June 02, 2009 12:01 PM
To: Fox, Adam
Cc: toasters@...
Subject: Re: FAS2050C questions (clustering)
|
|
Re: FAS2050C questions (clustering)

You need more NICs (we hit this issue all the time).

Basically, set both filers up on both networks as if they were separate filers. Give them separate IP addresses. If one head dies, the other head will take over all connections and assume the "personality" of the dead filer. So the IPs and the disks of the dead filer will all be visible on the live one. The heads have an internal interconnect that makes this possible.

And everything Adam said about the disks.

Peta
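To make the takeover idea concrete, here is a minimal Python sketch. It is only a toy model of the behaviour described above, not anything Data ONTAP exposes: the `Head` class, the network names and the IP addresses are all made up for illustration. It shows why each head needs an interface on every network its partner serves, since otherwise the survivor has nowhere to bring up the failed head's address.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Head:
    """One filer head: its name and the IP it serves on each network (hypothetical model)."""
    name: str
    addresses: Dict[str, str] = field(default_factory=dict)  # network name -> IP


def networks_missing_on(survivor: Head, failed: Head) -> Set[str]:
    """Networks the failed head served that the survivor has no interface on."""
    return set(failed.addresses) - set(survivor.addresses)


def addresses_after_takeover(survivor: Head, failed: Head) -> Dict[str, List[str]]:
    """IPs the survivor answers on, per network, once it has taken over its partner."""
    missing = networks_missing_on(survivor, failed)
    if missing:
        raise ValueError(f"{survivor.name} cannot host IPs on: {sorted(missing)}")
    return {
        net: [ip] + ([failed.addresses[net]] if net in failed.addresses else [])
        for net, ip in survivor.addresses.items()
    }


# Made-up addresses from the documentation ranges.
head_a = Head("filer-a", {"storage": "192.0.2.11", "mgmt": "198.51.100.11"})
head_b = Head("filer-b", {"storage": "192.0.2.12", "mgmt": "198.51.100.12"})
print(addresses_after_takeover(head_b, head_a))
```

Running `networks_missing_on` against a planned configuration is a cheap way to spot a network that only one head is cabled to before relying on failover.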
|
|
RE: FAS2050C questions (clustering)

Depending on your workload, you could get a decent boost by splitting it up, since that means you'll have twice as much cache servicing your requests. Keep in mind it's not just about the disks :)

-----Original Message-----
From: owner-toasters@... [mailto:owner-toasters@...] On Behalf Of Fox, Adam
Sent: Tuesday, June 02, 2009 12:03 PM
To: Ray Van Dolson
Cc: toasters@...
Subject: RE: FAS2050C questions (clustering)
|
|
RE: FAS2050C questions (clustering)

The reason is that the cluster is built as an active/active cluster, so each controller must have a root volume, and thus each needs a trad vol or 1 aggregate. So the "active/passive" config isn't truly active/passive, it's just functionally so.

There is currently no way to automatically float a spare. You could manually change ownership of a spare disk, but ONTAP doesn't do this automatically. It will whine in the messages file if there are no spares, and you will probably want to increase the raid.timeout option to give you more time to move the spare over.

-- Adam Fox
Systems Engineer
adamfox@...

-----Original Message-----
From: Ray Van Dolson [mailto:rvandolson@...]
Sent: Tuesday, June 02, 2009 12:59 PM
To: Steve Francis
Cc: Page, Jeremy; Fox, Adam; toasters@...
Subject: Re: FAS2050C questions (clustering)

On Tue, Jun 02, 2009 at 09:26:17AM -0700, Steve Francis wrote:
> You can get a performance boost by splitting it up.
> Which is why I don't like to do that, in general, for performance
> sensitive workloads. :-)
>
> Otherwise, in the event of a head failure, the surviving head may not
> have the CPU/cache to deal with the extra work - even if both heads
> are normally under 50% load.
> Most workloads grow linearly - until they hit an elbow and don't grow linearly.
> The only true way to be sure you have failover capacity is to run that
> way all the time.
>
> Your mileage/budget/workload/performance requirements may vary.

Thanks all. Great advice. So, my question is: if the second head can take over the personality of the failed head, why do I need to allocate any disks at all to the second head to begin with? Just a design thing?

I'll probably do the even split thing... and look into ordering additional NICs.

Can I have a "spare" that is available to either aggregate on either filer? This way I could do two RAID-DPs on each head with one common "spare" disk and rely on 4hr support to get me replacement disks quickly.

Ray
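Steve's failover-capacity point quoted above can be put as a back-of-the-envelope check. The sketch below is Python and deliberately naive: it assumes load adds linearly across heads and that performance degrades sharply past some utilisation "knee". Both the function names and the knee value are assumptions for illustration, not measured figures for a 2050.

```python
def failover_headroom(load_a: float, load_b: float, knee: float = 1.0) -> float:
    """Capacity left on the surviving head after takeover, as a fraction of one head.

    load_a, load_b: steady-state utilisation of each head (0.0 to 1.0).
    knee: utilisation beyond which response time is assumed to degrade sharply
          (hypothetical; a real figure has to come from measurement).
    """
    return knee - (load_a + load_b)


def survives_takeover(load_a: float, load_b: float, knee: float = 1.0) -> bool:
    """True if the combined load still fits under the knee on a single head."""
    return failover_headroom(load_a, load_b, knee) >= 0


# Both heads "under 50%" fits on paper, but not if the elbow sits at 80%.
print(survives_takeover(0.45, 0.45))            # knee assumed at 100%
print(survives_takeover(0.45, 0.45, knee=0.8))  # knee assumed at 80%
# Active/passive: the passive head carries no load, so nothing changes on takeover.
print(survives_takeover(0.60, 0.0, knee=0.8))
```

The active/passive case is the one Steve describes as running "that way all the time": the load after a failure is the same load that was already being served by one head.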
|
|
Re: FAS2050C questions (clustering)

On Tue, Jun 02, 2009 at 10:57:07AM -0700, Fox, Adam wrote:
> The reason is that the cluster is built as an active/active cluster and
> so each controller must have a root volume, thus each needs a trad vol
> or 1 aggregate. So the "active/passive" config isn't truly
> active/passive, it's just functionally so.

Ah-ha... the light has turned on in my head. Makes sense.

> There is no way currently to automatically float a spare. You could
> manually change ownership of a spare disk, but it doesn't do this
> automatically. ONTAP will whine in the messages file if there are no
> spares, and you will probably want to up the raid.timeout option to give
> you more time to move the spare over.

Gotcha. I'll just have to think about the RAID-4 vs RAID-6 thing then. Extra 300GB of space might be nice...

Thanks,
Ray
|
|
Re: FAS2050C questions (clustering)

We have the exact same unit, except NetApp branded and with 147GB disks. No extra shelves at this stage either.

It does take a bit of time to get your head around how cluster configurations work with the NetApp filers if you've never dealt with them before. It sounds like quite a few of your questions have been answered already by others in this thread, but I thought it may be useful to add our experience and decisions on how we set things up.

Firstly, regarding clustering - the best way to think of the 2050C is as two separate filer heads that each have their own hardware, own disks, own spares, own OS, own config, etc, but are capable of "taking over" the disks, volumes, exports, IP addresses, etc, from one another in the event that one of them dies. So, as you've discovered, each filer does need to have dedicated disks assigned to it, you can't share network cards across the filers, etc, but you CAN expect them to seamlessly and automatically fail over to each other in the event of a problem (and this process works well - we have simulated it several times). The same fail-over functionality lets you non-disruptively upgrade to new Data ONTAP versions as well (we recently upgraded from 7.2.4L1 to 7.3.1 without any disruption to the data being served up to our VM hosts).

In our case, we decided to mimic an "active/passive" setup as closely as possible by allocating 17 disks to the first head (16 disk RAID-DP with 1 spare), and 3 disks (2 disk RAID-4 with 1 spare) to the second head. In this setup, all of our data is stored on and served from the first head - the second head does nothing except sit there watching for a failure of the first head, and takes over if this were to occur.

The logic behind this choice was:

- to maintain a reasonably good-practice setup (RAID-DP + hotspares) we would "lose" more disks by splitting them evenly across both heads than by putting as many disks as possible on one head:
  - a 9-disk RAID-DP group + 1 hotspare on each head is 6 disks "lost", whereas a 16-disk RAID-DP group + 1 hotspare & 2-disk RAID-4 group + 1 hotspare is 5 disks "lost",
  - we could reduce this to 4 by removing the hotspare from the second head,
- it gave us more active spindles (and hence higher performance),
- it would be easier to expand from - e.g. if we bought a shelf of SATA disks it would be much easier to implement them efficiently.

There are some cons - namely less usable cache available and no distribution of CPU load across the two heads - but for our workload we felt the pros outweighed these cons.

Also, to touch on your network cards - do you know that you need 2 Gbps of bandwidth, or are you just thinking the more the merrier? If it's the latter, you may want to measure your network usage before going to great lengths to obtain that extra bandwidth. We have each of the two NICs on each head in our 2050 connected to a different switch and then joined into a single-mode VIF, so we get switch fail-over with total bandwidth of 1Gbps per head. Average bandwidth usage on the main head is < 10MB/s while serving up VMDKs for 30-odd ESX VMs over NFS. Of course it does spike higher, but pretty much the only times I've seen it max out the network connection is when doing NDMP back-ups to a separate backup proxy, and once or twice while doing mass-patching of our Virtual Machines during a maintenance window (so we had 30-odd Windows VMs all trying to install, say, .NET 3.5 at the same time).

Hope that helps! If you have any questions feel free to post them - I went from knowing nothing about SANs (at all) to selecting and purchasing one to having it running in production in the space of a couple of months - so I know what the learning curve feels like!! :-)

Cheers,
Matt
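The disk-count arithmetic in Matt's post is easy to check with a few lines of Python. This is only a sketch: the `HeadLayout` tuple and function names are invented for illustration, it assumes one RAID group per head, and it counts whole disks (parity plus spares) without any right-sizing or WAFL overhead.

```python
from typing import List, NamedTuple

PARITY_DISKS = {"raid4": 1, "raid_dp": 2}  # parity disks per RAID group


class HeadLayout(NamedTuple):
    """Disk allocation for one head (hypothetical helper type)."""
    raid_type: str    # "raid4" or "raid_dp"
    group_disks: int  # disks in the RAID group, parity included
    spares: int       # hot spares owned by this head


def lost_disks(layout: List[HeadLayout]) -> int:
    """Disks not holding data: parity plus spares, summed over both heads."""
    return sum(PARITY_DISKS[h.raid_type] + h.spares for h in layout)


def data_disks(layout: List[HeadLayout]) -> int:
    """Disks holding data, summed over both heads."""
    return sum(h.group_disks - PARITY_DISKS[h.raid_type] for h in layout)


even_split = [HeadLayout("raid_dp", 9, 1), HeadLayout("raid_dp", 9, 1)]
active_passive = [HeadLayout("raid_dp", 16, 1), HeadLayout("raid4", 2, 1)]
no_passive_spare = [HeadLayout("raid_dp", 16, 1), HeadLayout("raid4", 2, 0)]

for name, layout in [("even split", even_split),
                     ("active/passive", active_passive),
                     ("active/passive, no spare on head 2", no_passive_spare)]:
    print(f"{name}: {lost_disks(layout)} lost, {data_disks(layout)} data")
```

Note that in the active/passive layouts one of the data disks is the second head's root volume disk, so the spindles actually serving data are the 14 data disks on the first head, compared with 7 per head in the even split.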
|
|
Re: FAS2050C questions (clustering)

Be careful about running without a spare on each head. If the filer panics, you will not get the core dump unless you enable the 'wait until core dump writes to the file system before failing over' option, which then makes failover take a long time and will probably cause other problems.

I am not sure about the 2050 series, but I assume this is the same as on the 3000 series.

Jack
|
|
Re: FAS2050C questions (clustering)

On Tue, Jun 02, 2009 at 05:04:29PM +0100, Peta Spies wrote:
> You need more NICs (we hit this issue all the time).

Not necessarily. If you now have 2 NICs on both filers, simply VIF those NICs, then configure a vlan trunk over that etherchannel, and create vlan interfaces. Make sure to create interfaces for all vlans on both filers, so they can fail over to each other.

This assumes that your management network is actually just a vlan on the same infrastructure. If it's separate hardware, you do need more NICs.

Ray Van Dolson wrote:
> Gotcha. I'll just have to think about the RAID-4 vs RAID-6 thing then.
> Extra 300GB of space might be nice...

Don't. The extra headache you get when your performance goes down the drain whenever you have a high priority RAID rebuild, or the chance of data loss because there's a second failure during rebuild, REALLY outweighs the cost of a single extra disk. RAID-DP uses low priority rebuilds which take longer, but don't have such a big impact on your production data (unless you get into double degraded mode - which would have meant data loss on RAID-4).

Also, RAID-DP raid groups can be larger than RAID-4 raid groups, so you can more easily extend your RAID-DP based volumes (or aggregates), and if you do, RAID-DP is just as efficient as RAID-4, just vastly more reliable.

It does add a little more overhead, but if you want raw performance, just use RAID-0, and recover from disk crashes using "newfs" (we do that - but not on netapp hardware obviously - and you'll have to add redundancy at another level).

--
Jan-Pieter Cornet <johnpc@...>
!! Disclamer: The addressee of this email is not the intended recipient. !!
!! This is only a test of the echelon and data retention systems. Please !!
!! archive this message indefinitely to allow verification of the logs. !!
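The "just as efficient" claim comes down to the parity fraction of a RAID group, which a short Python sketch can show. The group sizes used below (8 disks for RAID-4, 16 for RAID-DP) are illustrative assumptions chosen to make the comparison visible; check the Data ONTAP documentation for the actual limits and defaults on your disk type.

```python
PARITY_DISKS = {"raid4": 1, "raid_dp": 2}  # parity disks per RAID group


def parity_fraction(group_size: int, raid_type: str) -> float:
    """Fraction of a RAID group's disks that hold parity rather than data."""
    parity = PARITY_DISKS[raid_type]
    if group_size <= parity:
        raise ValueError("group must contain at least one data disk")
    return parity / group_size


# Assumed group sizes, for illustration only.
print(parity_fraction(8, "raid4"))     # one parity disk in a group of 8
print(parity_fraction(16, "raid_dp"))  # two parity disks in a group of 16
print(parity_fraction(9, "raid_dp"))   # the small per-head group from an even split
```

Doubling the group size while doubling the parity count leaves the overhead unchanged, which is the sense in which a large RAID-DP group costs no more than a RAID-4 group of half the size. The small 9-disk group in the last line shows why RAID-DP looks expensive on a single half-populated shelf.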