colonialone: 2nd disk of /dev/md2 dead + filesystem errors on newly created LV

by Sylvain Beucler

Hi,

2 issues on colonialone:

- First, /dev/md2 is down to a single RAID disk

- Second, I got a filesystem error on an LV I had just created
  (on vg_1.0tb, on md1), which was then remounted read-only.  It's
  pretty strange, so I wonder if you have anything to say / suggest
  about this.

--
Sylvain



Re: [gnu.org #494104] colonialone: 2nd disk of /dev/md2 dead + filesystem errors on newly created LV

by Sylvain Beucler

Hi,

On Wed, Oct 21, 2009 at 11:49:30AM -0400, Daniel Clark via RT wrote:

> > [beuc - Wed Oct 21 03:17:03 2009]:
> > - Secondly I got a filesystem error on a LV disk I had just created
> >   (on vg_1.0tb, on md1), which was then remounted read-only.  It's
> >   pretty strange, so I wonder if you have anything to say / suggest
> >   about this.
>
> Maybe we should just stop using the 1TB disks, which are a bit older
> than the 1.5TB disks, and which we do not use in other places.
>
> I think colonialone now has far more disk than it needs, so unless there
> are objections I'll just bring colonialone down and swap the 2 1TB
> drives in the computer with 2 1.5TB drives in the external enclosure,
> and then savannah-hackers can migrate everything off of the 1TB disks,
> and I'll just remove the external enclosure during the next GNAPS visit,
> which will remove a possible point of failure.
>
> Also since there is now so much more disk space than you need, I would
> highly suggest doing quad redundancy in RAID1, as there is often a delay
> of several days before we can get out to GNAPS and replace disks, and
> also disks tend to fail more often when being used to reconstruct a
> broken array. FSF is using triple redundancy RAID1 in all new systems at
> the moment, but will be moving to quad (sort of; 2 clustered RAID1s) soon.
>
> Please tell me if I'm wrong in thinking that a total of 1.5TB of drive
> space is enough for the new savannah. The current savannah seems to use
> less than 200GB.

- Now that the cause of the failure is known to be a software failure,
  do we forget about this, or still pursue the plan to remove the 1.0TB
  disks that are used nowhere else at the FSF?

- If not, do you recommend moving everything (but '/') to the 1.5TB
  disks?

- About UUIDs, everything in fstab is using mdX, which I'd rather not
  mess with.

--
Sylvain



Re: [gnu.org #494104] colonialone: 2nd disk of /dev/md2 dead + filesystem errors on newly created LV

by Sylvain Beucler

On Mon, Oct 26, 2009 at 03:19:41PM -0400, Daniel Clark via RT wrote:
> BTW I'm guessing you already know this, but 2 of the RAID arrays on colonialone are still
> down a disk:
>
> md3 : active raid1 sda4[0] sdb4[2](F)
>       955128384 blocks [2/1] [U_]
>      
> md2 : active raid1 sda3[0] sdb3[2](F)
>       19534976 blocks [2/1] [U_]

I didn't know about the 2nd disk failing :/

Please replace them ASAP! :)
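A degraded mirror shows up in /proc/mdstat as an `_` in the member-status field (`[U_]` above). A minimal sketch of a check that flags such arrays, assuming the /proc/mdstat layout quoted above; the function name `md_degraded` is hypothetical:

```sh
#!/bin/sh
# md_degraded (hypothetical helper): read /proc/mdstat-formatted text on
# stdin and print the name of every array whose status field, e.g. "[U_]",
# contains "_" (a missing or failed member).
md_degraded() {
    awk '
        # Array header, e.g. "md3 : active raid1 sda4[0] sdb4[2](F)"
        /^md[0-9]+ :/ { name = $1; next }
        # The next line ends with the status field, e.g. "[2/1] [U_]"
        name != "" && $NF ~ /^\[[U_]+]$/ {
            if ($NF ~ /_/) print name
            name = ""
        }
    '
}
```

For example, `md_degraded < /proc/mdstat` would print `md3` and `md2` for the output quoted above. An array that is still resyncing also counts as degraded until recovery finishes. For ongoing alerts, mdadm's own `--monitor` mode is the more usual tool.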

--
Sylvain



Re: [gnu.org #498996] Hard-disk failures on colonialone

by Sylvain Beucler

Hi,

As far as the hardware is concerned, I think it is best that we do
what the FSF sysadmins think is best.

We don't have access to the computer, don't really know anything about
what it's made of, don't understand the eSATA/internal
differences. We're even using Xen as you do, to ease this kind of
interaction. In short, you're more often than not in a better position
to judge the hardware issues.


So:

If you think it's safer to use 4x1.5TB RAID-1, then let's do that.

However, we need to discuss how to migrate the current data, since
colonialone is already in production.

In particular, fixing the DNS issues I reported would help if
temporary relocation is needed.

--
Sylvain

On Thu, Oct 29, 2009 at 01:20:55PM -0400, Daniel Clark via RT wrote:

> Ah I see, I was waiting for comments on this - should be able to go out this weekend to do
> replacements / reshuffles / etc, but I need to know if savannah-hackers has a strong
> opinion on how to proceed:
>
> (1) Do we keep the 1TB disks?
> > - Now that the cause of the failure is known to be a software failure,
> > do we forget about this, or still pursue the plan to remove 1.0TB
> > disks that are used nowhere else at the FSF?
>
> That was mostly a "this makes no sense, but that's the only thing that's different about
> that system" type of response; it is true they are not used elsewhere, but if they are
> actually working fine I am fine with doing whatever savannah-hackers wants to do.
>
> (2) Do we keep the 2 eSATA drives connected?
> > - If not, do you recommend moving everything (but '/') to the 1.5TB
> > disks?
>
> Again if they are working fine it's your call; however the bigger issue is if you want to
> keep the 2 eSATA / external drives connected, since that is a legitimate extra point of
> failure, and there are some cases where errors in the external enclosure can bring a system
> down (although it's been up and running fine for several months now).
>
> (3) Do we make the switch to UUIDs now?
> > - About UUIDs, everything in fstab is using mdX, which I'd rather not
> > mess with.
>
> IMHO it would be better to mess with this when the system is less critical; not using UUIDs
> everywhere tends to screw you during recovery from hardware failures.
>
> And BTW totally off-topic, but eth1 on colonialone is now connected via crossover ethernet
> cable to eth1 on savannah (and colonialone is no longer on fsf 10. management network,
> which I believe we confirmed no one cared about)
>
> (4) We need to change to some technique that will give us RAID1 redundancy even if one
> drive dies. I think the safest solution would be to not use eSATA, and use 4 1.5TB drives
> all inside the computer in a 1.5TB quad RAID1 array, so all 4 drives would need to fail to
> bring savannah down. The other option would be 2 triple RAID1s using eSATA, each with 2 disks
> inside the computer and the 3rd disk in the external enclosure.
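On point (3), one low-risk way to prepare the switch is to generate UUID-based fstab lines and review them before touching /etc/fstab. This is a sketch, assuming util-linux `blkid` is available; `resolve_uuid` and `fstab_by_uuid` are hypothetical helper names:

```sh
#!/bin/sh
# resolve_uuid (hypothetical helper): print the filesystem UUID of $1
# using util-linux blkid.
resolve_uuid() {
    blkid -s UUID -o value "$1"
}

# fstab_by_uuid (hypothetical helper): read fstab lines on stdin and print
# them with /dev/... sources rewritten as UUID=... whenever a UUID resolves.
# Comments, blank lines and other sources are passed through unchanged.
# The body runs in a subshell so that "set -f" (no globbing) stays local.
fstab_by_uuid() (
    set -f
    while IFS= read -r line; do
        set -- $line
        case ${1:-} in
            /dev/*)
                uuid=$(resolve_uuid "$1" 2>/dev/null)
                if [ -n "$uuid" ]; then
                    shift
                    printf 'UUID=%s %s\n' "$uuid" "$*"
                    continue
                fi
                ;;
        esac
        printf '%s\n' "$line"
    done
)
```

For example, run `fstab_by_uuid < /etc/fstab > /tmp/fstab.uuid`, then `diff` the result against the current file. Lines whose source is not a /dev path, or whose UUID cannot be read, are passed through unchanged, and field alignment is collapsed to single spaces.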



Re: [gnu.org #498996] Hard-disk failures on colonialone

by Sylvain Beucler

On Thu, Oct 29, 2009 at 07:29:50PM +0100, Sylvain Beucler wrote:
> [...]

I see that there are currently 4x 1.5TB disks.


sda 1TB   inside
sdb 1TB   inside
sdc 1.5TB inside?
sdd 1.5TB inside?
sde 1.5TB external/eSATA?
sdf 1.5TB external/eSATA?
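To settle the "inside?" and "external/eSATA?" question marks remotely, udev's /dev/disk/by-path symlinks show which controller port each disk is attached to. An eSATA port may sit on a different controller than the internal ones, but that still needs confirming by someone who can see the hardware. A sketch, assuming udev populates /dev/disk/by-path; `disks_by_path` is a hypothetical name:

```sh
#!/bin/sh
# disks_by_path (hypothetical helper): for each whole-disk symlink in $1
# (default /dev/disk/by-path), print "<kernel name> <by-path name>".
disks_by_path() {
    dir=${1:-/dev/disk/by-path}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        name=${link##*/}
        # Skip partition links such as "...-part1"; keep whole disks only.
        case $name in *-part[0-9]*) continue ;; esac
        target=$(readlink "$link")        # e.g. "../../sdc"
        printf '%s %s\n' "${target##*/}" "$name"
    done
}
```

For example, `disks_by_path | sort` would list each kernel name (sda to sdf) next to its controller path, so disks sharing a controller prefix can be grouped together.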


Here's what I started doing:

- recreated 4 partitions on sdc and sdd (2 of them inside an extended
  partition)

- added sdc and sdd to the current RAID-1 arrays

  mdadm /dev/md0 --add /dev/sdc1
  mdadm /dev/md0 --add /dev/sdd1
  mdadm /dev/md1 --add /dev/sdc2
  mdadm /dev/md1 --add /dev/sdd2
  mdadm /dev/md2 --add /dev/sdc5
  mdadm /dev/md2 --add /dev/sdd5
  mdadm /dev/md3 --add /dev/sdc6
  mdadm /dev/md3 --add /dev/sdd6
  mdadm /dev/md0 --grow -n 4
  mdadm /dev/md1 --grow -n 4
  mdadm /dev/md2 --grow -n 4
  mdadm /dev/md3 --grow -n 4

colonialone:~# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdd6[4] sdc6[5] sdb4[1] sda4[0]
      955128384 blocks [4/2] [UU__]
      [>....................]  recovery =  0.0% (43520/955128384) finish=730.1min speed=21760K/sec
     
md2 : active raid1 sdc5[2] sdd5[3] sdb3[1] sda3[0]
      19534976 blocks [4/4] [UUUU]
     
md1 : active raid1 sdd2[2] sdc2[3] sda2[0] sdb2[1]
      2000000 blocks [4/4] [UUUU]
     
md0 : active raid1 sdd1[2] sdc1[3] sda1[0] sdb1[1]
      96256 blocks [4/4] [UUUU]

- installed GRUB on sdc and sdd
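The mdadm commands above follow a two-step pattern: `--add` on an array whose slots are all in use attaches the new partition as a spare, and `--grow` with a larger device count then promotes the spares to active mirrors, which starts the resync visible in /proc/mdstat. A dry-run sketch of that pattern; `run` and `add_mirrors` are hypothetical helpers, and nothing is executed unless `RUN=1` is set:

```sh
#!/bin/sh
# run (hypothetical helper): print the command string (dry run, the
# default) or execute it with sh when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

# add_mirrors ARRAY COUNT PART... (hypothetical helper): add each PART to
# ARRAY (as a spare at first), then grow ARRAY to COUNT active mirrors so
# that the spares get synced.
add_mirrors() {
    array=$1 count=$2
    shift 2
    for part in "$@"; do
        run "mdadm $array --add $part"
    done
    run "mdadm --grow $array --raid-devices=$count"
}
```

For example, `add_mirrors /dev/md2 4 /dev/sdc5 /dev/sdd5` prints the md2 commands used above. A call with the sde/sdf partitions and a count of 6 would be one way to cover the "extend this to sde and sdf" step, with the count lowered again (`--raid-devices=4`) once sda and sdb are removed.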


With this setup, the data is on both the 1TB and the 1.5TB disks.

If you confirm that this is OK, we can:

* extend this to sde and sdf,

* unplug sda+sdb and plug all the 1.5TB disks internally

* reboot while you are at the colo, and ensure that there's no device
  renaming mess

* add the #7 partitions in sdc/d/e/f as a new RAID device / LVM
  Physical Volume and get the remaining 500GB
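The last step of the plan could look like this dry-run sketch: mirror the #7 partitions into a new 4-way RAID1, make it an LVM Physical Volume, and add it to a volume group. The array name `/dev/md4` and the volume group name in the usage line are assumptions, not from this thread, and `vgextend` assumes the space should join an existing group (`vgcreate` would create a new one); `run` and `new_pv_plan` are hypothetical helpers:

```sh
#!/bin/sh
# run (hypothetical helper): print the command string (dry run, the
# default) or execute it with sh when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

# new_pv_plan MD VG PART... (hypothetical helper): mirror all PARTs into a
# new RAID1 array MD, make MD an LVM Physical Volume and add it to volume
# group VG. MD and VG are placeholders: pick a free md name and the real VG.
new_pv_plan() {
    md=$1 vg=$2
    shift 2
    run "mdadm --create $md --level=1 --raid-devices=$# $*"
    run "pvcreate $md"
    run "vgextend $vg $md"
    # Print ARRAY lines (with UUIDs) to review before updating mdadm.conf.
    run "mdadm --detail --scan"
}
```

For example, `new_pv_plan /dev/md4 VG /dev/sdc7 /dev/sdd7 /dev/sde7 /dev/sdf7`, with VG replaced by the real volume group name and the partition names checked again after the re-plug in case the disks were renamed.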


Can you let me know if this sounds reasonable?

--
Sylvain



Re: [gnu.org #498996] Hard-disk failures on colonialone

by Sylvain Beucler

On Sat, Oct 31, 2009 at 11:13:51AM +0100, Sylvain Beucler wrote:

> [...]

up!

--
Sylvain



Re: [gnu.org #498996] Hard-disk failures on colonialone

by Sylvain Beucler

On Thu, Nov 12, 2009 at 12:33:17PM +0100, Sylvain Beucler wrote:

> On Sat, Oct 31, 2009 at 11:13:51AM +0100, Sylvain Beucler wrote:
> > [...]
>
> up!

Seriously, can you tell me whether it's OK to move the RAID from the 1TB
to the 1.5TB disks, and plan a disk re-plug soon?

--
Sylvain




Re: [gnu.org #498996] Hard-disk failures on colonialone

by Sylvain Beucler

On Wed, Nov 25, 2009 at 02:42:57PM -0500, Daniel Clark via RT wrote:
> Is this Saturday or Sunday, sometime between 11am and 5pm New York time,
> good for you for a physical disk swap etc.?

Hi,

I should be around on Sunday, yes. Do you have a more specific time?

--
Sylvain



Re: [gnu.org #498996] Hard-disk failures on colonialone

by Sylvain Beucler

On Mon, Nov 30, 2009 at 01:41:35PM -0500, Daniel Clark via RT wrote:
> My next openings for off-hours maintenance (all New_York time) are:
>
> Sat Dec 5 6pm-8pm
> Sun Dec 6 2pm-8pm

This one (Dec 6, 7pm, i.e. 1am UTC+1) is the only one when I'm available
(and awake enough :)).

> Mon Dec 7 6pm-8pm
> Sat Dec 12 10:30am-8pm
> Sun Dec 13 6pm-8pm
>
> Please tell me if any of these are good for you.
>
> We'll of course want to form a plan about what to do differently before
> the maintenance. I'll also try to get the PXE boot stuff doc'ed on our
> side so we can use that as a possible workaround if things should not
> work again.

http://www.gnu.org/software/grub/manual/html_node/Stage1-errors.html#Stage1-errors
"Hard Disk Error"
    The stage2 or stage1.5 is being read from a hard disk, and the
    attempt to determine the size and geometry of the hard disk
    failed.

My only guess is that grub-install used the geometry of the 1.0TB
disks, which isn't the same as that of the 1.5TB disks.

I'll check if there's a way to make grub-install use the 1.5TB disk
when locating /boot; otherwise, the easiest way is to boot a rescue
system as you suggested, chroot into the real system and re-run
grub-install from there.
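The rescue-system route could look roughly like this dry-run sketch. It assumes / is on /dev/md2 and /boot on /dev/md0 (the GRUB session later in this thread finds /grub on partition 1, the md0 member, but the / array is a guess), and that the rescue system provides mdadm; `run` and `rescue_grub_install` are hypothetical helpers:

```sh
#!/bin/sh
# run (hypothetical helper): print the command string (dry run, the
# default) or execute it with sh when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

# rescue_grub_install ROOT_MD BOOT_MD DISK... (hypothetical helper): mount
# the installed system on /mnt, chroot into it and re-run grub-install on
# each DISK. ASSUMPTION: ROOT_MD holds / and BOOT_MD holds /boot.
rescue_grub_install() {
    root_md=$1 boot_md=$2
    shift 2
    run "mdadm --assemble --scan"          # bring up the md arrays
    run "mount $root_md /mnt"
    run "mount $boot_md /mnt/boot"
    for fs in dev proc sys; do
        run "mount --bind /$fs /mnt/$fs"   # device nodes etc. inside the chroot
    done
    for disk in "$@"; do
        # --recheck re-probes the drive map instead of trusting device.map
        run "chroot /mnt grub-install --recheck $disk"
    done
}
```

For example, `rescue_grub_install /dev/md2 /dev/md0 /dev/sdc /dev/sdd`, adjusting the disk names to whatever the rescue kernel reports.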

--
Sylvain



Re: [gnu.org #498996] Hard-disk failures on colonialone

by Sylvain Beucler

On Wed, Dec 02, 2009 at 10:17:14AM -0500, Daniel Clark via RT wrote:
> > This one (Dec 6, 7pm, i.e. 1am UTC+1) is the only one when I'm available
> > (and awake enough :)).
>
> Cool, I'll try to get to GNAPS by 7pm this Sunday (shouldn't be a
> problem unless there is a cooking disaster :-)

:)

> As mentioned on IRC, using the grub shell worked for me on the laptop
> when grub-install did not. This also seems to be echoed by some docs, e.g.:
>
> http://wiki.archlinux.org/index.php/GRUB#Installing_to_the_MBR

Yes, I just did that to force using the current /dev/sdc, with an
alternate device.map.


colonialone:~# cat /boot/grub/device.map-next
(hd0)          /dev/sdc
(hd1)          /dev/sdd
(hd2)          /dev/sde
(hd3)          /dev/sdf
(hd4)          /dev/sda
(hd5)          /dev/sdb

colonialone:~# grub --device-map /boot/grub/device.map


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

       [ Minimal BASH-like line editing is supported.   For
         the   first   word,  TAB  lists  possible  command
         completions.  Anywhere else TAB lists the possible
         completions of a device/filename. ]
grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"...  17 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+17 p (hd0,0)/grub/stage2 /grub/menu.lst"... succeeded
Done.
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"...  17 sectors are embedded.
succeeded
 Running "install /grub/stage1 d (hd1) (hd1)1+17 p (hd0,0)/grub/stage2 /grub/menu.lst"... succeeded
Done.
grub> setup (hd2)
setup (hd2)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd2)"...  17 sectors are embedded.
succeeded
 Running "install /grub/stage1 d (hd2) (hd2)1+17 p (hd0,0)/grub/stage2 /grub/menu.lst"... succeeded
Done.
grub> setup (hd3)
setup (hd3)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd3)"...  17 sectors are embedded.
succeeded
 Running "install /grub/stage1 d (hd3) (hd3)1+17 p (hd0,0)/grub/stage2 /grub/menu.lst"... succeeded
Done.

colonialone:~# grub-install /dev/sda
Searching for GRUB installation directory ... found: /boot/grub
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(hd0)  /dev/sda
(hd1)  /dev/sdb
(hd2)  /dev/sdc
(hd3)  /dev/sdd
(hd4)  /dev/sde
(hd5)  /dev/sdf
colonialone:~# grub-install /dev/sdb
Searching for GRUB installation directory ... found: /boot/grub
Installation finished. No error reported.
This is the contents of the device map /boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(hd0)  /dev/sda
(hd1)  /dev/sdb
(hd2)  /dev/sdc
(hd3)  /dev/sdd
(hd4)  /dev/sde
(hd5)  /dev/sdf
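In the grub shell session above, `setup (hd1)` to `setup (hd3)` wrote boot blocks that still load stage2 from `(hd0,0)` (see the `d` flag and the `(hd0,0)/grub/stage2` path in the install lines), so those disks may not boot on their own if the first BIOS disk is missing. A common GRUB Legacy practice for a RAID1 /boot is to map each disk to `(hd0)` with the `device` command before running `setup`, so each disk's boot code refers only to itself. A sketch that generates such a batch script, assuming /boot is the first partition on every disk as in the session above; `grub_raid1_script` is a hypothetical name:

```sh
#!/bin/sh
# grub_raid1_script DISK... (hypothetical helper): print a GRUB Legacy
# batch script that, for each DISK, maps it to (hd0) and installs the boot
# blocks using its first partition as /boot, so every disk boots by itself.
grub_raid1_script() {
    for disk in "$@"; do
        printf 'device (hd0) %s\n' "$disk"
        printf 'root (hd0,0)\n'
        printf 'setup (hd0)\n'
    done
    printf 'quit\n'
}
```

For example, `grub_raid1_script /dev/sdc /dev/sdd /dev/sde /dev/sdf | grub --batch`, ideally tried first on a non-critical machine.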

--
Sylvain