hammer errors

Hi,

ever since the last time I had CRC problems on my router box, I've
developed the habit of doing a daily 'hammer -f /dev/ad4s1d show |& grep
"^B"' to see if any new errors crept up, and today I found:

yoyodyne# hammer -f /dev/ad4s1d show |& grep "^B"
B dataoff=a00000714d120000/65536 crc=7e4f7545
B dataoff=a000007171380000/65536 crc=616b1cc1

Console log for the recent days is:

Nov 7 03:15:19 <kern.crit> yoyodyne kernel: HAMMER: Warning: rebalance caught race against propagate
Nov 7 03:15:19 <kern.crit> yoyodyne last message repeated 2 times
Nov 8 03:05:33 <kern.crit> yoyodyne kernel: bio_page_alloc: WARNING emergency page allocation
Nov 8 03:19:41 <kern.info> yoyodyne kernel: nfs send error 32 for server 192.168.0.10:/backup
Nov 8 03:19:41 <kern.info> yoyodyne kernel: receive error 54 from nfs server 192.168.0.10:/backup
Nov 9 03:56:32 <kern.crit> yoyodyne kernel: Warning: vfsync_bp skipping dirty buffer 0xc2706098
Nov 9 03:57:03 <kern.crit> yoyodyne kernel: Warning: vfsync_bp skipping dirty buffer 0xc26eb26c

smartctl -a /dev/ad4 doesn't report any problems. The box is running
2.4.1 (v2.4.1.8.g93de5-RELEASE, to be specific).

So my question is: What are my next steps in order to help resolve this
issue? Is there any way to get e.g. to the names of the files affected
by this problem from the data which is output by 'hammer show'?

So far the only thing I've done is to disable nightly hammer cleanup
because DragonFly, upon encountering a CRC error, will unfortunately
simply drop to the debugger without panicking, so this doesn't get caught
by DDB_UNATTENDED as far as I can tell (Matt, are there any plans to
change this unpleasant behavior?). And I won't be near that box until
next weekend.

Regards,
Sascha

--
http://yoyodyne.ath.cx

Re: hammer errors

:Hi,
:
:ever since the last time I had CRC problems on my router box, I've
:developed the habit of doing a daily 'hammer -f /dev/ad4s1d show |& grep
:"^B"' to see if any new errors crept up, and today I found:
:
:yoyodyne# hammer -f /dev/ad4s1d show |& grep "^B"
:B dataoff=a00000714d120000/65536 crc=7e4f7545
:B dataoff=a000007171380000/65536 crc=616b1cc1

The question is whether it is real or not. If the filesystem is
mounted live then the show command could be catching things in odd
states.

:Console log for the recent days is:
:
:Nov 7 03:15:19 <kern.crit> yoyodyne kernel: HAMMER: Warning: rebalance
:caught race against propagate
:...

None of those are serious. Basically just debug messages that will
be removed soon. The emergency page allocation for BIO is unrelated
to the filesystem code. It's also actually just a warning (telling me
that something is eating too many free VM pages).

:So my question is: What are my next steps in order to help resolve this
:issue? Is there any way to get e.g. to the names of the files affected
:by this problem from the data which is output by 'hammer show'?
:
:So far the only thing I've done is to disable nightly hammer cleanup
:because DragonFly, upon encountering a CRC error, will unfortunately
:simply drop to the debugger without panicking, so this doesn't get caught
:by DDB_UNATTENDED as far as I can tell (Matt, are there any plans to
:change this unpleasant behavior?). And I won't be near that box until
:next weekend.
:
:Regards,
:Sascha

I fixed the behavior in current. There is now a sysctl which
controls whether it drops into the debugger or not (and it does not
by default). Though it doesn't panic... maybe the sysctl should be
modified to give it the ability to panic instead of propagating an
error code up the call chain. The filesystem still drops into
read-only mode if an error is encountered.

What you want to do now is run 'hammer -f ... show | less -B' and
search for B, as in '/^B'. less -B uses a fixed buffer so if you
scroll down you basically cannot scroll back up (by much), which allows
you to pipe gigabytes and gigabytes of text through it without it
malloc()ing itself into oblivion. You want to try to find the problem
area and get more context out of it, such as the object id. And also
to determine whether the problem area is real or not.

Again, the filesystem has to be idle, and it would be even better if it
were offline entirely.

-Matt
Matthew Dillon <dillon@...>
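
The search-and-get-context step described above can also be scripted. What follows is only a rough sketch, not part of the hammer utility: the function name flagged_records and the choice of three context lines are made up for illustration, and the sample lines are the ones posted in this thread (their exact line breaks and spacing are assumed). It keeps a small fixed window of recent lines, in the same spirit as 'less -B', so memory use stays constant however much output is piped through.

```python
import re
from collections import deque

# Matches the obj= field seen on ELM lines in the 'hammer show' output above.
OBJ_RE = re.compile(r"\bobj=([0-9a-f]+)")

def flagged_records(lines, before=3):
    """Yield (obj_id, context) for each line that starts with 'B'.

    obj_id is taken from the most recent line carrying an obj= field;
    context is up to `before` preceding lines plus the flagged line.
    """
    recent = deque(maxlen=before)  # fixed-size window, like 'less -B'
    last_obj = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = OBJ_RE.search(line)
        if match:
            last_obj = match.group(1)
        if line.startswith("B"):
            yield last_obj, list(recent) + [line]
        recent.append(line)

# Sample taken from this thread; line layout is an assumption.
SAMPLE = [
    "G------ ELM 24 R obj=000000011164da5c key=000000000c250000 lo=00040002 rt=10 ot=02",
    "        tids 0000000111655c50:0000000000000000",
    "B       dataoff=a00000714d120000/65536 crc=7e4f7545",
    "        fills=z10:58010=100%",
]

for obj_id, context in flagged_records(SAMPLE):
    print("obj=%s" % obj_id)
    for line in context:
        print("    " + line)
```

To run it against a real filesystem, replace SAMPLE with sys.stdin (after 'import sys') and pipe the output of 'hammer -f ... show' into the script. As with the manual approach, the result is only meaningful if the filesystem is idle or, better, offline.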

Re: hammer errors

Matthew Dillon wrote:
> :So my question is: What are my next steps in order to help resolve this
> :issue? Is there any way to get e.g. to the names of the files affected
> :by this problem from the data which is output by 'hammer show'?
> :
> :So far the only thing I've done is to disable nightly hammer cleanup
> :because DragonFly, upon encountering a CRC error, will unfortunately
> :simply drop to the debugger without panicking, so this doesn't get caught
> :by DDB_UNATTENDED as far as I can tell (Matt, are there any plans to
> :change this unpleasant behavior?). And I won't be near that box until
> :next weekend.
> :
> :Regards,
> :Sascha
>
> I fixed the behavior in current. There is now a sysctl which
> controls whether it drops into the debugger or not (and it does not
> by default). Though it doesn't panic... maybe the sysctl should be
> modified to give it the ability to panic instead of propagating an
> error code up the call chain. The filesystem still drops into
> read-only mode if an error is encountered.
>
> What you want to do now is run 'hammer -f ... show | less -B' and
> search for B, as in '/^B'. less -B uses a fixed buffer so if you
> scroll down you basically cannot scroll back up (by much), which allows
> you to pipe gigabytes and gigabytes of text through it without it
> malloc()ing itself into oblivion. You want to try to find the problem
> area and get more context out of it, such as the object id. And also
> to determine whether the problem area is real or not.

OK, here's some more context from the errors. Is that enough? I'm
afraid I'm not yet used to reading hammer show output. I will re-check
the filesystem in an unmounted state on the weekend.

G------ ELM 24 R obj=000000011164da5c key=000000000c250000 lo=00040002 rt=10 ot=02
        tids 0000000111655c50:0000000000000000
B       dataoff=a00000714d120000/65536 crc=7e4f7545
        fills=z10:58010=100%

G------ ELM 0 R obj=000000011164da5c key=00000000302e0000 lo=00040002 rt=10 ot=02
        tids 0000000111656eb0:0000000000000000
B       dataoff=a000007171380000/65536 crc=616b1cc1
        fills=z10:58082=100%

obj is the same for both, even though they are in different parts of the
hammer show output.

Sascha

--
http://yoyodyne.ath.cx
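
For anyone else not used to reading this output, the dataoff values can be taken apart a little further. The sketch below rests on an assumption that is not stated anywhere in this thread: that a HAMMER offset is a 64-bit value with a 4-bit zone on top, an 8-bit volume number below it, a 52-bit offset in the remainder, and 8 MB big-blocks. The helper name decode_dataoff is made up. The posted data does at least agree with that layout: both offsets decode to zone 10 and to the big-block numbers shown in the fills= fields.

```python
# Assumed layout of a 64-bit HAMMER offset (not confirmed in this thread):
#   bits 60-63: zone, bits 52-59: volume, bits 0-51: offset within volume
ZONE_SHIFT = 60
VOLUME_SHIFT = 52
SHORT_MASK = (1 << 52) - 1
BIGBLOCK_SHIFT = 23  # assumed 8 MB big-blocks (2**23 bytes)

def decode_dataoff(field):
    """Split a 'dataoff=' value such as 'a00000714d120000/65536'."""
    offset_hex, length = field.split("/")
    value = int(offset_hex, 16)
    return {
        "zone": value >> ZONE_SHIFT,
        "volume": (value >> VOLUME_SHIFT) & 0xFF,
        "bigblock": (value & SHORT_MASK) >> BIGBLOCK_SHIFT,
        "length": int(length),
    }

# The two flagged records from the output above.
for field in ("a00000714d120000/65536", "a000007171380000/65536"):
    print(field, decode_dataoff(field))
```

This only locates the two 64 KB data blocks within the volume. It does not say which file they belong to; for that, the obj= value is still the thing to follow up on.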