zfs panic mounting fs after crash with RC2

zfs panic mounting fs after crash with RC2

by Gerrit Kühn

Hi,

unfortunately I got no answer concerning this problem so far on -stable
and -current (apart from the suggestion to try it again here :-).
I can reproduce the panic, and if someone can guide me what to do with kdb,
gdb, zdb or whatever tool might be needed to get the information needed to
fix this, I'm all ears...


cu
  Gerrit



Begin forwarded message:

Date: Wed, 4 Nov 2009 09:29:00 +0100
From: Gerrit Kühn <gerrit@...>
To: freebsd-stable@...
Cc:
Subject: zfs panic mounting fs after crash with RC2


Hi,

Yesterday I had the opportunity to play around with my yet-to-become new
fileserver a bit more. Originally I had installed 7.2-R, which I upgraded
to 8.0-RC2 yesterday. After that I upgraded my zpool consisting of 4 disks
in a raidz1 configuration to v13.
Some time later I tried to use powerd which was obviously a bad idea: it
crashed the machine immediately. I will give a separate report on that
later as it is probably related to the hardware, which is a bit exotic (VIA
VB8001 board with 64bit Via Nano processor).
However, the worst thing for me is that after rebooting from that crash,
one of my zfs fs cannot be mounted anymore. As soon as I try to mount it I
get a kernel panic. I can still access the properties (I made use of
"canmount=noauto" for the first time :-), but I cannot do a snapshot of
the fs (funny enough, zfs complains that the fs is busy, while in reality
it is not even mounted - so how could it be busy?).

I took a picture of the kernel panic and put it here (don't know if there
is any useful information in it):
<http://www.pmp.uni-hannover.de/test/Mitarbeiter/g_kuehn/data/zfs-panic.jpg>

The pool as such seems to be fine, all other fs in it can be mounted and
used, only trying to mount tank/sys/var triggers this panic.
Are there any suggestions what I could do to get my fs back? Please let me
know if (and how) I can provide more debugging information.


cu
  Gerrit
_______________________________________________
freebsd-fs@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-unsubscribe@..."

Re: zfs panic mounting fs after crash with RC2

by Dimitry Andric

On 2009-11-06 09:47, Gerrit Kühn wrote:
> unfortunately I got no answer concerning this problem so far on -stable
> and -current (apart from the suggestion to try it again here :-).
> I can reproduce the panic, and if someone can guide me what to do with kdb,
> gdb, zdb or whatever tool might be needed to get the information needed to
> fix this, I'm all ears...

At least a backtrace would be nice. :)


Re: zfs panic mounting fs after crash with RC2

by Gerrit Kühn

On Fri, 06 Nov 2009 13:10:34 +0100 Dimitry Andric <dimitry@...>
wrote about Re: zfs panic mounting fs after crash with RC2:

DA> > unfortunately I got no answer concerning this problem so far on
DA> > -stable and -current (apart from the suggestion to try it again
DA> > here :-). I can reproduce the panic, and if someone can guide me
DA> > what to do with kdb, gdb, zdb or whatever tool might be needed to
DA> > get the information needed to fix this, I'm all ears...

DA> At least a backtrace would be nice. :)

I know. Unfortunately I do not know much about debugging the kernel. I read
<http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html>,
but I do not get a kernel core file, because I run the system from a CF
card and use the hds completely for zfs. I have no swap partition I could
dump to.
Is it possible to dump onto a zfs fs? Or is there any other way for
debugging?


cu
  Gerrit

Re: zfs panic mounting fs after crash with RC2

by Gerrit Kühn

On Fri, 06 Nov 2009 13:10:34 +0100 Dimitry Andric <dimitry@...>
wrote about Re: zfs panic mounting fs after crash with RC2:

DA> > unfortunately I got no answer concerning this problem so far on
DA> > -stable and -current (apart from the suggestion to try it again
DA> > here :-). I can reproduce the panic, and if someone can guide me
DA> > what to do with kdb, gdb, zdb or whatever tool might be needed to
DA> > get the information needed to fix this, I'm all ears...

DA> At least a backtrace would be nice. :)

Thinking about my situation and assuming that I cannot dump directly onto
a zfs fs, I could probably either plug in a USB stick and try to dump onto
that or recompile the kernel with ddb to try online debugging. Any
suggestions?


cu
  Gerrit

trace for zfs panic mounting fs after crash with RC2

by Gerrit Kühn

On Fri, 06 Nov 2009 13:10:34 +0100 Dimitry Andric <dimitry@...>
wrote about Re: zfs panic mounting fs after crash with RC2:

DA> On 2009-11-06 09:47, Gerrit Kühn wrote:
DA> > unfortunately I got no answer concerning this problem so far on
DA> > -stable and -current (apart from the suggestion to try it again
DA> > here :-). I can reproduce the panic, and if someone can guide me
DA> > what to do with kdb, gdb, zdb or whatever tool might be needed to
DA> > get the information needed to fix this, I'm all ears...

DA> At least a backtrace would be nice. :)

I recompiled the kernel with ddb support and got the following trace
(using mount -t zfs instead of zfs mount this time, but getting the same
panic):

<http://www.pmp.uni-hannover.de/test/Mitarbeiter/g_kuehn/data/zfs-panic2.jpg>


I have the system still sitting at this point and can also 100% reproduce
the panic. Please let me know if (and how) any further information can get
pulled out of the debugger.


cu
  Gerrit

Re: trace for zfs panic mounting fs after crash with RC2

by James R. Van Artsdalen

Gerrit Kühn wrote:
> I recompiled the kernel with ddb support and got the following trace
> (using mount -t zfs instead of zfs mount this time, but getting the same
> panic):

You may be able to recover your pool by changing the line below, but I
have never tried it: it may clobber the pool.  You definitely don't want
this change normally!  It may be necessary to avoid calling zil_destroy
here too.

How the ZIL got corrupted - if it did - is a harder question.  What kind
of hard disk is this, and how is it connected to the system?  Was there
any redundancy (mirror, raidz)?

void
zil_replay(objset_t *os, void *arg, uint64_t *txgp,
        zil_replay_func_t *replay_func[TX_MAX_TYPE],
        zil_replay_cleaner_t *replay_cleaner)
{
        zilog_t *zilog = dmu_objset_zil(os);
        const zil_header_t *zh = zilog->zl_header;
        zil_replay_arg_t zr;

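        /*
         * Recovery hack: "1 ||" makes this branch unconditional, so the
         * log is destroyed and never replayed.
         */
==>     if (1 || zil_empty(zilog)) {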
                zil_destroy(zilog, B_TRUE);
                return;
        }
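
The effect of the "1 ||" prefix can be shown in isolation. The following is
only a toy sketch, not kernel code: toy_log_is_empty() and skips_replay() are
hypothetical stand-ins invented for illustration, and nothing here touches a
real pool. It relies only on the fact that C's || operator short-circuits, so
the right-hand check is never evaluated when the left-hand side is true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for zil_empty(); counts how often it is called. */
static int empty_checks = 0;

static bool toy_log_is_empty(void)
{
    empty_checks++;
    return false;               /* pretend the log still holds records */
}

/*
 * Returns true when the "destroy and return" branch would be taken,
 * i.e. when replay would be skipped.
 */
static bool skips_replay(bool force)
{
    /* With force set, || short-circuits: the emptiness check never runs. */
    return force || toy_log_is_empty();
}
```

Calling skips_replay(true) models the patched line: the branch is always
taken and the log is never inspected. Calling skips_replay(false) models the
unpatched code, where the outcome depends on the emptiness check.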



Re: trace for zfs panic mounting fs after crash with RC2

by Gerrit Kühn

On Fri, 06 Nov 2009 17:02:23 -0600 "James R. Van Artsdalen"
<james-freebsd-fs2@...> wrote about Re: trace for zfs panic mounting
fs after crash with RC2:

JRVA> > I recompiled the kernel with ddb support and got the following
JRVA> > trace (using mount -t zfs instead of zfs mount this time, but
JRVA> > getting the same panic):

JRVA> You may be able to recover your pool by changing the line below, but
JRVA> I have never tried it: it may clobber the pool.  You definitely
JRVA> don't want this change normally!  It may be necessary to avoid
JRVA> calling zil_destroy here too.

Well, as I said before, the pool itself and all other filesystems in it
are fine. The pool can be imported and all other filesystems can be
mounted and used. Just one of them panics the system when I try to mount it.

JRVA> How the ZIL got corrupted - if it did - is a harder question.  What
JRVA> kind of hard disk is this, and how is it connected to the system?
JRVA> Was there any redundancy (mirror, raidz)?

These are 4x2.5" 400GB drives (WD4000BEVT) in a RAID-Z1 setup on a
Supermicro AOC-USAS-L8i controller (LSI chip, mpt driver) in a VIA VB8001
board (powered by a Via Nano 1.6GHz) with 4GB of memory.
The system panicked when I tried to run powerd; after reboot the pool came
back fine, but the system panicked again when trying to mount this
particular fs (tank/sys/var).
Before this happened I had one similar issue when the system crashed
(probably because I was mechanically pushing the controller card a bit too
hard during operation when trying to fix some SATA cables). However, after
this crash the whole pool did not come back and the system panicked when
trying to import the pool - but also with ZIL-replay problems. As this
happened right after installing the base system, I simply re-did the pool
and re-installed the system.
However, with this similar problem after a quite "normal" crash (hey, I
only started powerd) my confidence is a bit low and I would like to have
this fixed before I really put data on the machine.


cu
  Gerrit

Re: trace for zfs panic mounting fs after crash with RC2

by Gerrit Kühn

On Fri, 06 Nov 2009 17:02:23 -0600 "James R. Van Artsdalen"
<james-freebsd-fs2@...> wrote about Re: trace for zfs panic mounting
fs after crash with RC2:

JRVA> How the ZIL got corrupted - if it did - is a harder question.

I think it is corrupted. Otherwise zfs would not crash while trying to
replay the ZIL, would it?
It seems that this happens rather easily with the system I have at hand
(it happened twice to me so far - and I crashed the system only twice,
that makes 100%, although I doubt that it is that reproducible). Searching
around I found some reports of the same or similar issues (but no
solution). So apart from recovering my fs (I did not try your suggested
patch yet), there are two things I regard as very important:

1. Find out why the ZIL gets corrupted under some circumstances.
2. Find a safe way to recover a fs with a corrupted ZIL.

I guess I could live with a corrupted ZIL after a crash, if there was some
kind of --ignore-zil switch to get my data back online. In any case, zfs
should not panic on corrupted ZIL data, should it?
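
To make that request concrete, here is a toy sketch of a log replay that
stops at the first damaged record instead of aborting. This is not how ZFS
implements ZIL replay, and it says nothing about the actual cause of the
panic; struct toy_rec, toy_sum() and toy_replay() are all hypothetical names
invented for illustration, and the "checksum" is a trivial XOR chosen only
to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy record; real ZIL records look nothing like this. */
struct toy_rec {
    uint32_t payload;
    uint32_t checksum;          /* expected to equal toy_sum(payload) */
};

/* Trivial stand-in for a real checksum function. */
static uint32_t toy_sum(uint32_t payload)
{
    return payload ^ 0xA5A5A5A5u;
}

/*
 * Apply records in order, adding each payload to *state. Stops at the
 * first record whose checksum does not match and returns the number of
 * records applied, so the caller can decide what to do with the rest.
 */
static size_t toy_replay(const struct toy_rec *log, size_t n, uint64_t *state)
{
    size_t applied = 0;

    for (size_t i = 0; i < n; i++) {
        if (log[i].checksum != toy_sum(log[i].payload))
            break;              /* damaged record: stop, do not abort */
        *state += log[i].payload;
        applied++;
    }
    return applied;
}
```

With a three-record log whose second record is damaged, toy_replay() applies
only the first record and reports 1, leaving the decision about the remaining
records to the caller rather than bringing the whole program down.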

As I do not dare to use the system for storing data until this is sorted
out, I can try out almost anything to get more information about the
problem. Please let me know what I should do to support debugging.


cu
  Gerrit

Re: trace for zfs panic mounting fs after crash with RC2

by Graham Todd

Gerrit Kühn wrote:

> On Fri, 06 Nov 2009 17:02:23 -0600 "James R. Van Artsdalen"
> <james-freebsd-fs2@...> wrote about Re: trace for zfs panic mounting
> fs after crash with RC2:
>
> JRVA> How the ZIL got corrupted - if it did - is a harder question.
>
> I think it is. Otherwise zfs would not crash while trying to replay the
> ZIL, wouldn't it?
> It seems that this happens rather easily with the system I have at hand
> (it happened twice to me so far - and I crashed the system only twice,
> that makes 100%, although I doubt that it is that reproducible). Searching
> around I found some reports of the same or similar issues (but no
> solution). So apart from recovering my fs (I did not try your suggested
> patch yet), there are two things I regard as very important:
>
> 1. Find out why the ZIL gets corrupted under some circumstances.
> 2. Find a safe way to recover a fs with a corrupted ZIL.
>
> I guess I could live with a corrupted ZIL after a crash, if there was some
> kind of --ignore-zil switch to get my data back online. In any case, zfs
> should not panic on corrupted ZIL data, should it?

Is there a way to "manually" use zdb to mimic the "zpool clear" command
introduced in OpenSolaris's ZFS with PSARC-2009479?

http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-a.html

I have no idea if this would help: in fact it might very well be dangerous
for the pool that Gerrit is trying to recover.  Are you able to copy the
pool somehow before trying experiments?

I think the current state of "disaster recovery" tools and methods for ZFS
makes some folks nervous. With so much error checking "built in" there's
fewer tried and true "old school" sysadmin approaches to recovering lost
data after the fact. So thanks for debugging your problem in public. I
hope you can resolve things and document how you did it for everyone.

Good luck.





Re: trace for zfs panic mounting fs after crash with RC2

by Gerrit Kühn

On Tue, 10 Nov 2009 10:01:06 -0500 Graham Todd <gtodd@...> wrote
about Re: trace for zfs panic mounting fs after crash with RC2:

GT> > I guess I could live with a corrupted ZIL after a crash, if there
GT> > was some kind of --ignore-zil switch to get my data back online. In
GT> > any case, zfs should not panic on corrupted ZIL data, should it?
GT>
GT> Is there a way to "manually" use zdb to mimic the "zpool clear"
GT> command introduced in OpenSolaris's ZFS with PSARC-2009479?
GT>
GT> http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-a.html

FYI: Meanwhile I opened a PR for the issue (kern/140433) and got some
request for additional zdb input (that I will hopefully be able to provide
later this evening).
The page above looks interesting, though. There it is mentioned (in the
comments) that you can achieve the same thing zpool clear does... but it
is not mentioned how. Does anyone here know?

GT> I have no idea if this would help: in fact it might very well be
GT> dangerous for the pool that Gerrit is trying to recover.  Are you able
GT> to copy the pool somehow before trying experiments?

I do not care that much about this specific pool, since I only installed
the system and some software. But I want to know I can handle this
situation before I put data on the disks. :-)

GT> I think the current state of "disaster recovery" tools and methods for
GT> ZFS makes some folks nervous. With so much error checking "built in"
GT> there's fewer tried and true "old school" sysadmin approaches to
GT> recovering lost data after the fact.

As long as these situations do not happen, it's ok for me to have no way
to recover. :-)
I have been using zfs since Pawel made the first patchset available in autumn
2006 and never had to face a situation like this so far. As it happened two
times in a row now on this new machine, I guess it must have something to
do with the hardware. OTOH, everything seems to run fine, unless the
machine crashes and corrupts the zil in some strange way.

GT> So thanks for debugging your
GT> problem in public. I hope you can resolve things and document how you
GT> did it for everyone.

I hope we get this resolved, too. As long as I do not have to fear
losing important data, I can do almost anything with the machine for
debugging.


cu
  Gerrit

Re: trace for zfs panic mounting fs after crash with RC2

by Artem Belevich

> GT> http://www.c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-a.html
>
> The page above looks interesting, though. There it is mentioned (in the
> comments) that you can achieve the same thing zpool clear does... but it
> is not mentioned how. Does anyone here know?

Perhaps some of the links on the following post on zfs-discuss may help:
http://www.mail-archive.com/zfs-discuss@.../msg26704.html

Another option would be to boot from OpenSolaris LiveCD that contains
latest zfs changes, import your pool there, fix, export and then
re-import it on FreeBSD. Make sure you don't upgrade your pool while
running OpenSolaris.

--Artem

Re: trace for zfs panic mounting fs after crash with RC2

by Gerrit Kühn

On Tue, 10 Nov 2009 08:17:55 -0800 Artem Belevich <fbsdlist@...> wrote
about Re: trace for zfs panic mounting fs after crash with RC2:

AB> Perhaps some of the links on the following post on zfs-discuss may
AB> help:
AB> http://www.mail-archive.com/zfs-discuss@.../msg26704.html

Interesting stuff, thanks.
At first glance I do not see an easy way to roll back my pool to a
slightly previous (consistent) state, but all the posts state that it is
possible. I guess I have to dive into this a bit deeper. "zpool clear -F"
definitely would be the easier-to-use solution.

AB> Another option would be to boot from OpenSolaris LiveCD that contains
AB> latest zfs changes, import your pool there, fix, export and then
AB> re-import it on FreeBSD. Make sure you don't upgrade your pool while
AB> running OpenSolaris.

Uh, yes, not really an option in this case, I guess. Unless I buy an
additional external CD drive and stuff. But thanks for the hint, anyway. I
will have a look around how difficult it is to get recent OpenSolaris on a
USB stick...


cu
  Gerrit

Re: trace for zfs panic mounting fs after crash with RC2

by Gerrit Kühn

On Fri, 06 Nov 2009 17:02:23 -0600 "James R. Van Artsdalen"
<james-freebsd-fs2@...> wrote about Re: trace for zfs panic mounting
fs after crash with RC2:

JRVA> How the ZIL got corrupted - if it did - is a harder question.  What
JRVA> kind of hard disk is this, and how is it connected to the system?
JRVA> Was there any redundancy (mirror, raidz)?

I have been thinking about this for some time now. I have almost the same
controller (low-profile version, different bios, but otherwise identical)
in use without these problems.
Can the 2.5" disks cause any problems? The problematic system is the only
one I have with the small drives. Maybe they somehow "lie" to the system
about the data actually being written? I remember that a long time ago
(about 10 years?) FreeBSD people suggested to turn off the write cache of
disk drives to prevent data losses. I see that the sysctl hw.ata.wc is
still there. Do people here think that this is worth giving a try? Are
there any recent experiences concerning the performance impact on zfs
when turning off wc?


cu
  Gerrit