Massive ext4 filesystem corruption after a failed s2disk/ram cycle

Hi,

Just prior to 2.6.32 cycle I tried -next tree and noticed that after a failed s2ram (here it works only once, and I test once in a while to see if fixed accidentally) I got a minor filesystem corruption. I am sorry I didn't report that back then.

Now I have installed 2.6.32-rc2 (well -rc1...) and things were sort of ok, I have even thought that hibernation is once again stable (somewhere in the not that distant past the hibernation which used to work, began to fail randomly on resume)

Few days ago, I got a read-only filesystem again, an fsck, few more corrupted files..., It should have rung a bell for me (I have still used hibernation, trying to understand why it fails sometimes)

Yesterday, however, I have decided to fix that once and for all, and for that I have set up a loop + rtc wakealarm to make it cycle through hibernation.

Needless to say I didn't run that loop more than maybe 3 cycles (and no failures), but noticed that rtc clock is dead on resume.

I sort of fixed that (this is hpet emulation that strikes again), I will post when I test the fix (trivial), because when I had rebooted the system into the modified kernel, I got that readonly filesystem again, and this time the damage had spread over lots of files.
(I have even lost most of dpkg database..., many programs, libraries,..., settings)

Yet, thanks to Linux flexibility, after a day, and some study of nautilus source, I had the system recovered fully.
(Now am doing backups.....)

But I don't want that to happen again...

Another clue that I have seen was that ext4 driver reported that it aborts journal replay.

I know that for now there is not much you can do, but just to let you know that something is there...

What is especially interesting is that there was no s2ram/disk failure preceding the corruption, but my theory is that corruption wasn't detected for a while from last failure, probably giving such bad consequences.

You do sync file-systems before entering the hibernation, don't you?

Best regards,
Maxim Levitsky

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
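For anyone who wants to reproduce a "loop + rtc wakealarm" test like the one mentioned above, here is a minimal sketch. It is not Maxim's actual script. It assumes the wake-capable clock is `rtc0`, that the kernel exposes `/sys/class/rtc/rtc0/wakealarm` and `/sys/power/state`, and that it runs as root; the function names and the 60/30 second delays are made up for illustration.

```python
import time
from pathlib import Path

# Assumption: rtc0 is the RTC that can wake the machine.
RTC_WAKEALARM = Path("/sys/class/rtc/rtc0/wakealarm")
POWER_STATE = Path("/sys/power/state")


def plan_cycle(now: int, sleep_seconds: int) -> list[tuple[Path, str]]:
    """Return the sysfs writes for one hibernate/resume cycle, in order."""
    return [
        (RTC_WAKEALARM, "0"),                       # clear any pending alarm
        (RTC_WAKEALARM, str(now + sleep_seconds)),  # absolute wake time (epoch seconds)
        (POWER_STATE, "disk"),                      # "disk" = hibernate, "mem" = suspend to RAM
    ]


def run_cycles(count: int, sleep_seconds: int = 60, settle_seconds: int = 30) -> None:
    """Hibernate `count` times. Needs root; only run on a disposable system."""
    for _ in range(count):
        for path, value in plan_cycle(int(time.time()), sleep_seconds):
            path.write_text(value)
        time.sleep(settle_seconds)  # give the system time to settle after resume


if __name__ == "__main__":
    # Dry run: show what would be written instead of writing it.
    for path, value in plan_cycle(int(time.time()), 60):
        print(f"echo {value} > {path}")
```

Running the file directly only prints the planned writes; calling `run_cycles(3)` is what actually hibernates the machine, so given the corruption reported in this thread it is best tried with backups in place.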
Re: Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Tue, Oct 06, 2009 at 11:06:55PM +0200, Maxim Levitsky wrote:
>
> Just prior to 2.6.32 cycle I tried -next tree and noticed that after a
> failed s2ram (here it works only once, and I test once in a while to see
> if fixed accidentally) I got a minor filesystem corruption. I am sorry I
> didn't report that back then.

When you say filesystem corruption, it's important to indicate whether you meant that (a) you noticed that some files had corrupted contents, (b) the kernel complained that the filesystem was corrupted, and remounted the filesystem read-only, or (c) e2fsck found and fixed errors.

Also, when you found errors of either class (a) or (b), did you run e2fsck to find and fix any potential errors? In a few places it sounded like the kernel had complained about errors, but you had ignored them and hadn't run e2fsck to fix them. I hope that was just me misunderstanding what you wrote! Can you clarify?

- Ted
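One cheap way to make case (c) unambiguous in a report is to record e2fsck's exit status, which is a bitmask documented in the e2fsck(8) manual page. The following is a small sketch that decodes it; the bit meanings are taken from the man page, while the function and dictionary names are only illustrative.

```python
# Exit status bits as documented in e2fsck(8).
E2FSCK_FLAGS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(status: int) -> list[str]:
    """Translate an e2fsck exit status into the conditions it encodes."""
    if status == 0:
        return ["no errors"]
    # The status is a sum of flags, so test each bit separately.
    return [text for bit, text in E2FSCK_FLAGS.items() if status & bit]


if __name__ == "__main__":
    for status in (0, 1, 4, 12):
        print(status, decode_e2fsck_status(status))
```

A status with bit 4 set after a run means errors remain on disk, which is exactly the situation where continuing to use the filesystem makes later damage hard to attribute.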
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Tuesday 06 October 2009, Maxim Levitsky wrote:
> Hi,
>
> Just prior to 2.6.32 cycle I tried -next tree and noticed that after a
> failed s2ram (here it works only once, and I test once in a while to see
> if fixed accidentally) I got a minor filesystem corruption. I am sorry I
> didn't report that back then.
>
> Now I have installed 2.6.32-rc2 (well -rc1...) and things were sort of
> ok, I have even thought that hibernation is once again stable
> (somewhere in the not that distant past the hibernation which used to
> work, began to fail randomly on resume)
>
> Few days ago, I got a read-only filesystem again, an fsck, few more
> corrupted files..., It should have rung a bell for me (I have
> still used hibernation, trying to understand why it fails sometimes)
>
> Yesterday, however, I have decided to fix that once and for all, and for
> that I have set up a loop + rtc wakealarm to make it cycle through
> hibernation.
>
> Needless to say I didn't run that loop more than maybe 3 cycles (and no
> failures), but noticed that rtc clock is dead on resume.
>
> I sort of fixed that (this is hpet emulation that strikes again), I will
> post when I test the fix (trivial), because when I had rebooted the
> system into the modified kernel, I got that readonly filesystem again,
> and this time the damage had spread over lots of files.
> (I have even lost most of dpkg database..., many programs,
> libraries,..., settings)
>
> Yet, thanks to Linux flexibility, after a day, and some study of
> nautilus source, I had the system recovered fully.
> (Now am doing backups.....)
>
> But I don't want that to happen again...
>
> Another clue that I have seen was that ext4 driver reported that it
> aborts journal replay.
>
> I know that for now there is not much you can do, but just to let you
> know that something is there...
>
> What is especially interesting is that there was no s2ram/disk failure
> preceding the corruption, but my theory is that corruption wasn't
> detected for a while from last failure, probably giving such bad
> consequences.
>
> You do sync file-systems before entering the hibernation, don't you?

Yes, a sync is there, but it is not effective on some filesystems.

Thanks,
Rafael
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Tue, 06 Oct 2009, Rafael J. Wysocki wrote:
> > You do sync file-systems before entering the hibernation, don't you?
>
> Yes, a sync is there, but it is not effective on some filesystems.

Which ones?

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
Re: Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Tue, 2009-10-06 at 17:42 -0400, Theodore Tso wrote:
> On Tue, Oct 06, 2009 at 11:06:55PM +0200, Maxim Levitsky wrote:
> >
> > Just prior to 2.6.32 cycle I tried -next tree and noticed that after a
> > failed s2ram (here it works only once, and I test once in a while to see
> > if fixed accidentally) I got a minor filesystem corruption. I am sorry I
> > didn't report that back then.
>
> When you say filesystem corruption, it's important to indicate whether
> you meant that (a) you noticed that some files had corrupted
> contents, (b) the kernel complained that the filesystem was corrupted,
> and remounted the filesystem read-only, or (c) e2fsck found and fixed
> errors.
>
> Also, when you found errors of either class (a) or (b), did you run
> e2fsck to find and fix any potential errors? In a few places it
> sounded like the kernel had complained about errors, but you had
> ignored them and hadn't run e2fsck to fix them. I hope that was just
> me misunderstanding what you wrote! Can you clarify?

Sure, kernel noticed errors, and remounted the filesystem R/O (I didn't write anything down. really sorry)

I had rebooted the system.
Then startup scripts had booted the system to root shell

I had run fsck on the filesystem. It had plenty of files with shared blocks, many orphaned inodes, errors in free bitmaps.

Then, after the fsck, I got many missing files (many probably went to lost+found), some had garbage, some became truncated (0 size)

Mostly were affected files that were from recent dpkg update.

I use ubuntu 9.10, and (almost) latest -git of kernel tree.

Best regards,
Maxim Levitsky
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Wednesday 07 October 2009, Henrique de Moraes Holschuh wrote:
> On Tue, 06 Oct 2009, Rafael J. Wysocki wrote:
> > > You do sync file-systems before entering the hibernation, don't you?
> >
> > Yes, a sync is there, but it is not effective on some filesystems.
>
> Which ones?

XFS for one example.
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Wed, 07 Oct 2009, Rafael J. Wysocki wrote:
> On Wednesday 07 October 2009, Henrique de Moraes Holschuh wrote:
> > On Tue, 06 Oct 2009, Rafael J. Wysocki wrote:
> > > > You do sync file-systems before entering the hibernation, don't you?
> > >
> > > Yes, a sync is there, but it is not effective on some filesystems.
> >
> > Which ones?
>
> XFS for one example.

Interesting. So XFS is not only a Bad Idea for /, but also for anything that might enter S3/S4. Not nice. I sure hope it doesn't do a half-assed job of flushing and checkpointing itself during machine shutdown/restart like it apparently does when told to "sync" before S3/S4...

Would you be so kind to disclose to us, the uninitiated, which other filesystems are unsafe when faced with a sleep/suspend request?

--
Henrique Holschuh
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

Henrique de Moraes Holschuh <hmh@...> writes:
> On Wed, 07 Oct 2009, Rafael J. Wysocki wrote:
>> On Wednesday 07 October 2009, Henrique de Moraes Holschuh wrote:
>> > On Tue, 06 Oct 2009, Rafael J. Wysocki wrote:
>> > > > You do sync file-systems before entering the hibernation, don't you?
>> > >
>> > > Yes, a sync is there, but it is not effective on some filesystems.
>> >
>> > Which ones?
>>
>> XFS for one example.
>
> Interesting. So XFS is not only a Bad Idea for /, but also for anything
> that might enter S3/S4. Not nice. I sure hope it doesn't do a
> half-assed job of flushing and checkpointing itself during machine
> shutdown/restart like it apparently does when told to "sync" before
> S3/S4...

For what it is worth, I would also be quite interested to know /why/ XFS is bad in this regard. Is it just the previously stated "XFS writes to disk despite freezing kernel threads" issue, or something deeper?

Daniel

--
✣ Daniel Pittman ✉ daniel@...
♽ made with 100 percent post-consumer electrons
Re: Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Wed, 07 Oct 2009 01:02:25 +0200 Maxim Levitsky <maximlevitsky@...> wrote:
> On Tue, 2009-10-06 at 17:42 -0400, Theodore Tso wrote:
> > On Tue, Oct 06, 2009 at 11:06:55PM +0200, Maxim Levitsky wrote:
> > >
> > > Just prior to 2.6.32 cycle I tried -next tree and noticed that
> > > after a failed s2ram (here it works only once, and I test once in
> > > a while to see if fixed accidentally) I got a minor filesystem
> > > corruption. I am sorry I didn't report that back then.
> >
>
> Sure, kernel noticed errors, and remounted the filesystem R/O (I
> didn't write anything down. really sorry)
>
> I had rebooted the system.
> Then startup scripts had booted the system to root shell
>
> I had run fsck on the filesystem. It had plenty of files with shared
> blocks, many orphaned inodes, errors in free bitmaps.
>
>
> Then, after the fsck, I got many missing files (many probably went to
> lost+found), some had garbage, some became truncated (0 size)
>
> Mostly were affected files that were from recent dpkg update.
>
>
> I use ubuntu 9.10, and (almost) latest -git of kernel tree.

I encountered something very similar yesterday, with 2.6.32-rc3. When doing sync after accidentally removing a mounted USB stick, sync got stuck, so I resorted to SysRq+S/U/B. Unfortunately this was also just after an apt-get upgrade.

The result was the same, corrupted ext4 partitions, shared blocks, orphaned inodes, free bitmap errors. Only recently written files seem to be affected, in my case upgraded stuff in / and configuration files in /home.

--
Jindrich Makovicka
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Wed, Oct 07, 2009 at 01:14:10PM +1100, Daniel Pittman wrote:
> For what it is worth, I would also be quite interested to know /why/ XFS is
> bad in this regard. Is it just the previously stated "XFS writes to disk
> despite freezing kernel threads" issue, or something deeper?

sync pushes out all data to disk, but in a journaling filesystem that might just be the log, not the "normal" place on disk. For a boot loader to deal with it properly it actually needs to do a replay of the log. Grub does so for reiserfs but not for XFS for some reason. I don't know why problems don't trigger more often with ext3, though.
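The point about sync landing in the log rather than in the "normal" place can be illustrated with a toy model. This is only a sketch of the idea, not how XFS, ext3 or GRUB are implemented: the class, its method names and the `menu.lst` block name are invented for the example, and real journals log metadata changes in far more involved ways.

```python
class ToyJournaledDisk:
    """Toy model: sync makes updates durable in the log, not in place."""

    def __init__(self):
        self.home = {}   # blocks at their "normal" on-disk location
        self.log = []    # journal records that are already on disk
        self.dirty = []  # updates that exist only in memory

    def write(self, block, data):
        self.dirty.append((block, data))

    def sync(self):
        # After sync everything is on disk, but only as log records.
        self.log.extend(self.dirty)
        self.dirty.clear()

    def checkpoint(self):
        # Later, logged updates are copied to their home location.
        for block, data in self.log:
            self.home[block] = data
        self.log.clear()

    def read_without_replay(self, block):
        # What a reader sees if it ignores the log entirely.
        return self.home.get(block)

    def read_with_replay(self, block):
        # What a reader sees if it applies the log in memory first.
        view = dict(self.home)
        for logged_block, data in self.log:
            view[logged_block] = data
        return view.get(block)


if __name__ == "__main__":
    disk = ToyJournaledDisk()
    disk.write("menu.lst", "old kernel")
    disk.sync()
    disk.checkpoint()
    disk.write("menu.lst", "new kernel")
    disk.sync()  # durable, but still only in the log
    print(disk.read_without_replay("menu.lst"))
    print(disk.read_with_replay("menu.lst"))
```

In this model a reader that skips the log keeps seeing the old contents until a checkpoint happens, which is the situation a boot loader without log replay is in after a sync.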
ext4 filesystem corruption

I have more information on that issue.

First of all this isn't related to s2ram/disk.
Second, this happened here again 3 times.

Now kernel complains loudly about access to freed inode.
After a reboot fsck tells the following:

- Some directory entries point to freed inodes, which means these files are gone, but I never deleted some of them
- Some inodes have shared blocks
- Some orphaned inodes found
- Free block counts/bitmaps corrupted.

That all happens without any s2ram/disk cycle.

However, yet an unusual situation did happen today. I had installed an update to mountall ubuntu package, and it hosed all boot process. I had to reboot many times, and once did hold the power button for 4 seconds. I also used often the SYSRQ+U/SYSRQ+B tool.

On the contrary, I did several s2disk cycles, and one did fail, but there was no corruption.

I must say that until now, I had never seen any ext3/ext4 corruption, even though there were many many crashes, power failures, forced reboots, etc...

Best regards,
Maxim Levitsky
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

> On Wed, Oct 07, 2009 at 01:14:10PM +1100, Daniel Pittman wrote:
> > For what it is worth, I would also be quite interested to know /why/ XFS is
> > bad in this regard. Is it just the previously stated "XFS writes to disk
> > despite freezing kernel threads" issue, or something deeper?
>
> sync pushes out all data to disk, but in a journaling filesystem that
> might just be the log, not the "normal" place on disk. For a boot
> loader to deal with it properly it actually needs to do a replay of
> the log. Grub does so for reiserfs but not for XFS for some reason.
> I don't know why problems don't trigger more often with ext3, though.

I'm sorry for the long delayed and offtopic response. I discussed this issue with okuji-san (GRUB2 maintainer) several months ago.
He really wishes linux implemented real sync.

A bootloader is much more constrained than an OS (mainly caused by size constraint). It can't implement journal log replay logic for _all_ filesystems. Why can't we implement a strong sync syscall? I don't think this is a PM or bootloader fault.
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Wed, 04 Nov 2009, KOSAKI Motohiro wrote:
> > On Wed, Oct 07, 2009 at 01:14:10PM +1100, Daniel Pittman wrote:
> > > For what it is worth, I would also be quite interested to know /why/ XFS is
> > > bad in this regard. Is it just the previously stated "XFS writes to disk
> > > despite freezing kernel threads" issue, or something deeper?
> >
> > sync pushes out all data to disk, but in a journaling filesystem that
> > might just be the log, not the "normal" place on disk. For a boot
> > loader to deal with it properly it actually needs to do a replay of
> > the log. Grub does so for reiserfs but not for XFS for some reason.
> > I don't know why problems don't trigger more often with ext3, though.
>
> I'm sorry for the long delayed and offtopic response. I discussed this
> issue with okuji-san (GRUB2 maintainer) several months ago.
> He really wishes linux implemented real sync.

This is not about real sync. It is about the box being able to reboot after a crash or power failure.

GRUB2 is broken in that regard, at least in its peecee-BIOS version: last time I checked, it doesn't sort RAID components so that it won't boot from failed or out-of-sync older components, it can't deal with some of the filesystems being unclean...

> A bootloader is much more constrained than an OS (mainly caused by size constraint).
> It can't implement journal log replay logic for _all_ filesystems. Why can't we
> implement a strong sync syscall? I don't think this is a PM or bootloader fault.

A bootloader that can't boot a system that went through an unclean shutdown is quite broken.

--
Henrique Holschuh
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Thu November 5 2009, Henrique de Moraes Holschuh wrote:
> On Wed, 04 Nov 2009, KOSAKI Motohiro wrote:
> > > On Wed, Oct 07, 2009 at 01:14:10PM +1100, Daniel Pittman wrote:
> > > > For what it is worth, I would also be quite interested to know
> > > > /why/ XFS is bad in this regard. Is it just the previously stated
> > > > "XFS writes to disk despite freezing kernel threads" issue, or
> > > > something deeper?
> > >
> > > sync pushes out all data to disk, but in a journaling filesystem that
> > > might just be the log, not the "normal" place on disk. For a boot
> > > loader to deal with it properly it actually needs to do a replay of
> > > the log. Grub does so for reiserfs but not for XFS for some reason.
> > > I don't know why problems don't trigger more often with ext3, though.
> >
> > I'm sorry for the long delayed and offtopic response. I discussed this
> > issue with okuji-san (GRUB2 maintainer) several months ago.
> > He really wishes linux implemented real sync.
>
> This is not about real sync. It is about the box being able to reboot
> after a crash or power failure.
>
> GRUB2 is broken in that regard, at least in its peecee-BIOS version:
> last time I checked, it doesn't sort RAID components so that it won't
> boot from failed or out-of-sync older components, it can't deal with
> some of the filesystems being unclean...
>
> > A bootloader is much more constrained than an OS (mainly caused by size
> > constraint). It can't implement journal log replay logic for _all_
> > filesystems. Why can't we implement a strong sync syscall? I don't think
> > this is a PM or bootloader fault.
>
> A bootloader that can't boot a system that went through an unclean
> shutdown is quite broken.
>

It can barely boot a system that's gone through a clean shutdown. "bios read error" and all that.

--
Thomas Fjellstrom
tfjellstrom@...
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Wed, Nov 04, 2009 at 11:18:05AM +0900, KOSAKI Motohiro wrote:
> > On Wed, Oct 07, 2009 at 01:14:10PM +1100, Daniel Pittman wrote:
> > > For what it is worth, I would also be quite interested to know
> > > /why/ XFS is bad in this regard. Is it just the previously
> > > stated "XFS writes to disk despite freezing kernel threads"
> > > issue, or something deeper?
> >
> > sync pushes out all data to disk, but in a journaling filesystem
> > that might just be the log, not the "normal" place on disk. For
> > a boot loader to deal with it properly it actually needs to do
> > a replay of the log. Grub does so for reiserfs but not for XFS
> > for some reason. I don't know why problems don't trigger more
> > often with ext3, though.
>
> I'm sorry for the long delayed and offtopic response. I discussed
> this issue with okuji-san (GRUB2 maintainer) several months ago.
> He really wishes linux implemented real sync.
>
> A bootloader is much more constrained than an OS (mainly caused by size
> constraint). It can't implement journal log replay logic for _all_
> filesystems. Why can't we implement a strong sync syscall? I don't
> think this is a PM or bootloader fault.

We already have an ioctl that does what you want: FIFREEZE.

Cheers,

Dave.
--
Dave Chinner
david@...
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Sun, Nov 08, 2009 at 07:29:05PM +1100, Dave Chinner wrote:
> We already have an ioctl that does what you want: FIFREEZE.

Doesn't really help as there is no guarantee important metadata isn't modified again before the bootloader accesses it, but that's a fate shared with any other kind of super sync.

The only way to really fix the problem is to implement proper (in-memory) log recovery in the bootloader, especially as it doesn't only have to deal with the relatively easy case of clean shutdowns but also needs to deal with the case of an unclean shutdown with major amounts of updates to the lookup and allocation data structures in the log.

IMHO the best option is to have a separate partition for /boot with a very simple filesystem that we can expect boot loader developers to implement fully and correctly.
Re: [linux-pm] Massive ext4 filesystem corruption after a failed s2disk/ram cycle

On Sun, 08 Nov 2009, Christoph Hellwig wrote:
> IMHO the best option is to have a separate partition for /boot with a
> very simple filesystem that we can expect boot loader developers to
> implement fully and correctly.

Agreed. And one that is not all but abandoned kernel-side, or you will risk bugs there.

--
Henrique Holschuh