zfs panic mounting fs after crash with RC2

Hi,

unfortunately I got no answer concerning this problem so far on -stable and -current (apart from the suggestion to try it again here :-). I can reproduce the panic, and if someone can guide me what to do with kdb, gdb, zdb or whatever tool might be needed to get the information needed to fix this, I'm all ears...

cu
  Gerrit

Begin forwarded message:

Date: Wed, 4 Nov 2009 09:29:00 +0100
From: Gerrit Kühn <gerrit@...>
To: freebsd-stable@...
Cc:
Subject: zfs panic mounting fs after crash with RC2

Hi,

Yesterday I had the opportunity to play around with my yet-to-become new fileserver a bit more. Originally I had installed 7.2-R, which I upgraded to 8.0-RC2 yesterday. After that I upgraded my zpool, consisting of 4 disks in a raidz1 constellation, to v13.

Some time later I tried to use powerd, which was obviously a bad idea: it crashed the machine immediately. I will give a separate report on that later, as it is probably related to the hardware, which is a bit exotic (VIA VB8001 board with 64bit Via Nano processor).

However, the worst thing for me is that after rebooting from that crash, one of my zfs fs cannot be mounted anymore. As soon as I try to mount it I get a kernel panic. I can still access the properties (I made use of "canmount=noauto" for the first time :-), but I cannot do a snapshot of the fs (funnily enough, zfs complains that the fs is busy, while in reality it is not even mounted - so how could it be busy?).

I took a picture of the kernel panic and put it here (don't know if there is any useful information in it):

<http://www.pmp.uni-hannover.de/test/Mitarbeiter/g_kuehn/data/zfs-panic.jpg>

The pool as such seems to be fine, all other fs in it can be mounted and used, only trying to mount tank/sys/var triggers this panic. Are there any suggestions what I could do to get my fs back? Please let me know if (and how) I can provide more debugging information.

cu
  Gerrit
_______________________________________________
freebsd-fs@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-unsubscribe@..."
|
|
Re: zfs panic mounting fs after crash with RC2

On 2009-11-06 09:47, Gerrit Kühn wrote:
> unfortunately I got no answer concerning this problem so far on -stable
> and -current (apart from the suggestion to try it again here :-).
> I can reproduce the panic, and if someone can guide me what to do with kdb,
> gdb, zdb or whatever tool might be needed to get the information needed to
> fix this, I'm all ears...

At least a backtrace would be nice. :)
|
|
Re: zfs panic mounting fs after crash with RC2

On Fri, 06 Nov 2009 13:10:34 +0100 Dimitry Andric <dimitry@...> wrote about Re: zfs panic mounting fs after crash with RC2:

DA> > unfortunately I got no answer concerning this problem so far on
DA> > -stable and -current (apart from the suggestion to try it again
DA> > here :-). I can reproduce the panic, and if someone can guide me
DA> > what to do with kdb, gdb, zdb or whatever tool might be needed to
DA> > get the information needed to fix this, I'm all ears...
DA> At least a backtrace would be nice. :)

I know. Unfortunately I do not know much about debugging the kernel. I read <http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html>, but I do not get a kernel core file, because I run the system from a CF card and use the hard disks completely for zfs. I have no swap partition I could dump to. Is it possible to dump onto a zfs fs? Or is there any other way to debug this?

cu
  Gerrit
|
|
Re: zfs panic mounting fs after crash with RC2

On Fri, 06 Nov 2009 13:10:34 +0100 Dimitry Andric <dimitry@...> wrote about Re: zfs panic mounting fs after crash with RC2:

DA> > unfortunately I got no answer concerning this problem so far on
DA> > -stable and -current (apart from the suggestion to try it again
DA> > here :-). I can reproduce the panic, and if someone can guide me
DA> > what to do with kdb, gdb, zdb or whatever tool might be needed to
DA> > get the information needed to fix this, I'm all ears...
DA> At least a backtrace would be nice. :)

Thinking about my situation, and assuming that I cannot dump directly onto a zfs fs, I could probably either plug in a USB stick and try to dump onto that, or recompile the kernel with ddb to try online debugging. Any suggestions?

cu
  Gerrit
|
|
trace for zfs panic mounting fs after crash with RC2

On Fri, 06 Nov 2009 13:10:34 +0100 Dimitry Andric <dimitry@...> wrote about Re: zfs panic mounting fs after crash with RC2:

DA> On 2009-11-06 09:47, Gerrit Kühn wrote:
DA> > unfortunately I got no answer concerning this problem so far on
DA> > -stable and -current (apart from the suggestion to try it again
DA> > here :-). I can reproduce the panic, and if someone can guide me
DA> > what to do with kdb, gdb, zdb or whatever tool might be needed to
DA> > get the information needed to fix this, I'm all ears...
DA> At least a backtrace would be nice. :)

I recompiled the kernel with ddb support and got the following trace (using mount -t zfs instead of zfs mount this time, but getting the same panic):

<http://www.pmp.uni-hannover.de/test/Mitarbeiter/g_kuehn/data/zfs-panic2.jpg>

I have the system still sitting at this point and can also 100% reproduce the panic. Please let me know if (and how) any further information can get pulled out of the debugger.

cu
  Gerrit
|
|
Re: trace for zfs panic mounting fs after crash with RC2

Gerrit Kühn wrote:
> I recompiled the kernel with ddb support and got the following trace
> (using mount -t zfs instead of zfs mount this time, but getting the same
> panic):

You may be able to recover your pool by changing the line below, but I have never tried it: it may clobber the pool. You definitely don't want this change normally! It may be necessary to avoid calling zil_destroy here too.

How the ZIL got corrupted - if it did - is a harder question. What kind of hard disk is this, and how is it connected to the system? Was there any redundancy (mirror, raidz)?

void
zil_replay(objset_t *os, void *arg, uint64_t *txgp,
    zil_replay_func_t *replay_func[TX_MAX_TYPE],
    zil_replay_cleaner_t *replay_cleaner)
{
        zilog_t *zilog = dmu_objset_zil(os);
        const zil_header_t *zh = zilog->zl_header;
        zil_replay_arg_t zr;

==>     if (1 || zil_empty(zilog)) {
                zil_destroy(zilog, B_TRUE);
                return;
        }
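To make the effect of the "1 ||" change in the fragment above concrete, here is a minimal userland sketch in C. It is a toy model under stated assumptions, not kernel code and not the actual ZFS implementation: the toy_zilog_t type, the toy_* functions, the force_discard flag and the result codes are all hypothetical stand-ins invented for illustration. The only thing it mirrors from the quoted fragment is the control flow: when the condition is forced true, the log is destroyed and replay is never attempted.

```c
/*
 * Toy userland model of the control flow in the quoted zil_replay()
 * fragment. Every name below is hypothetical; none of this is ZFS code.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an intent log. */
typedef struct toy_zilog {
    size_t nrecords;  /* log records still waiting to be replayed */
    bool   corrupt;   /* true if replaying them would "panic" */
} toy_zilog_t;

/* Hypothetical outcome codes for the model. */
typedef enum {
    TOY_NOTHING_TO_DO,  /* log was already empty */
    TOY_REPLAYED,       /* records were applied normally */
    TOY_DISCARDED,      /* records were thrown away without replay */
    TOY_PANIC           /* replay hit a corrupt record */
} toy_result_t;

static bool
toy_zil_empty(const toy_zilog_t *zl)
{
    return zl->nrecords == 0;
}

static void
toy_zil_destroy(toy_zilog_t *zl)
{
    /* Dropping the log also drops whatever was wrong with it. */
    zl->nrecords = 0;
    zl->corrupt = false;
}

/*
 * force_discard == true models the "1 ||" edit: the early-return branch
 * is always taken, so a corrupt record is never touched, and the
 * pending records are lost instead of being replayed.
 */
toy_result_t
toy_zil_replay(toy_zilog_t *zl, bool force_discard)
{
    assert(zl != NULL);

    if (force_discard || toy_zil_empty(zl)) {
        bool had_records = !toy_zil_empty(zl);

        toy_zil_destroy(zl);
        return had_records ? TOY_DISCARDED : TOY_NOTHING_TO_DO;
    }

    if (zl->corrupt)
        return TOY_PANIC;  /* the real system panics at this point */

    zl->nrecords = 0;      /* pretend every record was applied */
    return TOY_REPLAYED;
}
```

In this model, calling toy_zil_replay() with force_discard set on a log marked corrupt returns TOY_DISCARDED and leaves nrecords at 0. That is the trade-off the suggestion implies: the mount can get past the replay step, but whatever the log still held is gone, which is why the change is not something to leave in place.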
|
|
Re: trace for zfs panic mounting fs after crash with RC2

On Fri, 06 Nov 2009 17:02:23 -0600 "James R. Van Artsdalen" <james-freebsd-fs2@...> wrote about Re: trace for zfs panic mounting fs after crash with RC2:

JRVA> > I recompiled the kernel with ddb support and got the following
JRVA> > trace (using mount -t zfs instead of zfs mount this time, but
JRVA> > getting the same panic):
JRVA> You may be able to recover your pool by changing the line below, but
JRVA> I have never tried it: it may clobber the pool. You definitely
JRVA> don't want this change normally! It may be necessary to avoid
JRVA> calling zil_destroy here too.

Well, as I said before, the pool itself and all other filesystems in it are fine. The pool can be imported and all other filesystems can be mounted and used. Just one of them panics the system when I try to mount it.

JRVA> How the ZIL got corrupted - if it did - is a harder question. What
JRVA> kind of hard disk is this, and how is it connected to the system?
JRVA> Was there any redundancy (mirror, raidz)?

These are 4x2.5" 400GB drives (WD4000BEVT) in a RAID-Z1 setup on a Supermicro AOC-USAS-L8i controller (LSI chip, mpt driver) in a VIA VB8001 board (powered by a Via Nano 1.6GHz) with 4GB of memory. The system panicked when I tried to run powerd. After reboot the pool came back fine, but the system panicked again when trying to mount this particular fs (tank/sys/var).

Before this happened I had one similar issue when the system crashed (probably because I was mechanically pushing the controller card a bit too hard during operation when trying to fix some SATA cables). However, after that crash the whole pool did not come back and the system panicked when trying to import the pool - but also with ZIL-replay problems. As this happened right after installing the base system, I simply re-did the pool and re-installed the system. However, with this similar problem after a quite "normal" crash (hey, I only started powerd) my confidence is a bit low and I would like to have this fixed before I really put data on the machine.

cu
  Gerrit
|
|
Re: trace for zfs panic mounting fs after crash with RC2

On Fri, 06 Nov 2009 17:02:23 -0600 "James R. Van Artsdalen" <james-freebsd-fs2@...> wrote about Re: trace for zfs panic mounting fs after crash with RC2:

JRVA> How the ZIL got corrupted - if it did - is a harder question.

I think it is. Otherwise zfs would not crash while trying to replay the ZIL, would it?

It seems that this happens rather easily with the system I have at hand (it happened twice to me so far - and I crashed the system only twice, that makes 100%, although I doubt that it is that reproducible). Searching around I found some reports of the same or similar issues (but no solution). So apart from recovering my fs (I did not try your suggested patch yet), there are two things I regard as very important:

1. Find out why the ZIL gets corrupted under some circumstances.
2. Find a safe way to recover a fs with a corrupted ZIL.

I guess I could live with a corrupted ZIL after a crash, if there was some kind of --ignore-zil switch to get my data back online. In any case, zfs should not panic on corrupted ZIL data, should it?

As I do not dare to use the system for storing data until this is sorted out, I can try out almost anything to get more information about the problem. Please let me know what I should do to support debugging.

cu
  Gerrit
|
|
Re: trace for zfs panic mounting fs after crash with RC2

Gerrit Kühn wrote:
> On Fri, 06 Nov 2009 17:02:23 -0600 "James R. Van Artsdalen"
> <james-freebsd-fs2@...> wrote about Re: trace for zfs panic mounting
> fs after crash with RC2:
>
> JRVA> How the ZIL got corrupted - if it did - is a harder question.
>
> I think it is. Otherwise zfs would not crash while trying to replay the
> ZIL, would it?
> It seems that this happens rather easily with the system I have at hand
> (it happened twice to me so far - and I crashed the system only twice,
> that makes 100%, although I doubt that it is that reproducible). Searching
> around I found some reports of the same or similar issues (but no
> solution). So apart from recovering my fs (I did not try your suggested
> patch yet), there are two things I regard as very important:
>
> 1. Find out why the ZIL gets corrupted under some circumstances.
> 2. Find a safe way to recover a fs with a corrupted ZIL.
>
> I guess I could live with a corrupted ZIL after a crash, if there was some
> kind of --ignore-zil switch to get my data back online. In any case, zfs
> should not panic on corrupted ZIL data, should it?

Is there a way to "manually" use zdb to mimic the "zpool clear" command introduced in OpenSolaris's ZFS with PSARC-2009479?

http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-a.html

I have no idea if this would help: in fact it might very well be dangerous for the pool that Gerrit is trying to recover. Are you able to copy the pool somehow before trying experiments?

I think the current state of "disaster recovery" tools and methods for ZFS makes some folks nervous. With so much error checking "built in" there are fewer tried and true "old school" sysadmin approaches to recovering lost data after the fact. So thanks for debugging your problem in public. I hope you can resolve things and document how you did it for everyone.

Good luck.
|
|
Re: trace for zfs panic mounting fs after crash with RC2

On Tue, 10 Nov 2009 10:01:06 -0500 Graham Todd <gtodd@...> wrote about Re: trace for zfs panic mounting fs after crash with RC2:

GT> > I guess I could live with a corrupted ZIL after a crash, if there
GT> > was some kind of --ignore-zil switch to get my data back online. In
GT> > any case, zfs should not panic on corrupted ZIL data, should it?
GT>
GT> Is there a way to "manually" use zdb to mimic the "zpool clear"
GT> command introduced in OpenSolaris's ZFS with PSARC-2009479?
GT>
GT> http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-a.html

FYI: Meanwhile I opened a PR for the issue (kern/140433) and got a request for additional zdb input (which I will hopefully be able to provide later this evening).

The page above looks interesting, though. There it is mentioned (in the comments) that you can achieve the same thing zpool clear does... but it is not mentioned how. Does anyone here know?

GT> I have no idea if this would help: in fact it might very well be
GT> dangerous for the pool that Gerrit is trying to recover. Are you able
GT> to copy the pool somehow before trying experiments?

I do not care that much about this specific pool, since I only installed the system and some software. But I want to know I can handle this situation before I put data on the disks. :-)

GT> I think the current state of "disaster recovery" tools and methods for
GT> ZFS makes some folks nervous. With so much error checking "built in"
GT> there's fewer tried and true "old school" sysadmin approaches to
GT> recovering lost data after the fact.

As long as these situations do not happen, it's ok for me to have no way to recover. :-) I have been using zfs since Pawel made the first patchset available in autumn 2006 and never had to face a situation like this so far. As it happened two times in a row now on this new machine, I guess it must have something to do with the hardware. OTOH, everything seems to run fine, unless the machine crashes and corrupts the zil in some strange way.

GT> So thanks for debugging your
GT> problem in public. I hope you can resolve things and document how you
GT> did it for everyone.

I hope we get this resolved, too. As long as I do not have to fear losing important data, I can do almost anything with the machine for debugging.

cu
  Gerrit
|
|
Re: trace for zfs panic mounting fs after crash with RC2

> GT> http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-a.html
>
> The page above looks interesting, though. There it is mentioned (in the
> comments) that you can achieve the same thing zpool clear does... but it
> is not mentioned how. Does anyone here know?

Perhaps some of the links in the following post on zfs-discuss may help:

http://www.mail-archive.com/zfs-discuss@.../msg26704.html

Another option would be to boot from an OpenSolaris LiveCD that contains the latest zfs changes, import your pool there, fix it, export it and then re-import it on FreeBSD. Make sure you don't upgrade your pool while running OpenSolaris.

--Artem
|
|
Re: trace for zfs panic mounting fs after crash with RC2

On Tue, 10 Nov 2009 08:17:55 -0800 Artem Belevich <fbsdlist@...> wrote about Re: trace for zfs panic mounting fs after crash with RC2:

AB> Perhaps some of the links on the following post on zfs-discuss may
AB> help:
AB> http://www.mail-archive.com/zfs-discuss@.../msg26704.html

Interesting stuff, thanks. At first glance I do not see an easy way to roll back my pool to a slightly earlier (consistent) state, but all the posts state that it is possible. I guess I have to dive into this a bit deeper. "zpool clear -F" definitely would be the easier-to-use solution.

AB> Another option would be to boot from OpenSolaris LiveCD that contains
AB> latest zfs changes, import your pool there, fix, export and then
AB> re-import it on FreeBSD. Make sure you don't upgrade your pool while
AB> running OpenSolaris.

Uh, yes, not really an option in this case, I guess. Unless I buy an additional external CD drive and stuff. But thanks for the hint, anyway. I will have a look around to see how difficult it is to get a recent OpenSolaris onto a USB stick...

cu
  Gerrit
|
|
Re: trace for zfs panic mounting fs after crash with RC2

On Fri, 06 Nov 2009 17:02:23 -0600 "James R. Van Artsdalen" <james-freebsd-fs2@...> wrote about Re: trace for zfs panic mounting fs after crash with RC2:

JRVA> How the ZIL got corrupted - if it did - is a harder question. What
JRVA> kind of hard disk is this, and how is it connected to the system?
JRVA> Was there any redundancy (mirror, raidz)?

I have been thinking about this for some time now. I have almost the same controller (low-profile version, different bios, but otherwise identical) in use without these problems. Can the 2.5" disks cause any problems? The problematic system is the only one I have with the small drives. Maybe they somehow "lie" to the system about the data actually being written?

I remember that a long time ago (about 10 years?) FreeBSD people suggested turning off the write cache of disk drives to prevent data loss. I see that the sysctl hw.ata.wc is still there. Do people here think that this is worth a try? Are there any recent experiences concerning the performance impact on zfs when turning off wc?

cu
  Gerrit