/bin/sh core dumps on FreeBSD 7.2

Hi!

Suddenly /bin/sh started to crash all the time with core dumps. I'm running FreeBSD 7.2-RELEASE-p4 (i386) and I have not updated anything lately. The /bin/sh binary seems to be untouched. It might be some hardware trouble, but the machine seems to run OK now. (I had to replace /bin/sh with a symlink to /rescue/sh.)

I would like to track down the problem, but running sh I only get "Segmentation fault: 11 (core dumped)". I would be happy to run gdb and give you a backtrace. Any clues?

Hans

PS! I tried to run "freebsd-update IDS" to see if any files are broken, but it stops at

Inspecting system... sha256: ///boot/kernel/utopia.ko.symbols: Input/output error

_______________________________________________
freebsd-stable@... mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@..."
|
|
Re: /bin/sh core dumps on FreeBSD 7.2

Hans F. Nordhaug wrote:
> Hi!
>
> Suddenly /bin/sh started to crash all the time with core dumps. I'm
> running FreeBSD 7.2-RELEASE-p4 (i386) and I have not updated anything
> lately. The /bin/sh binary seems to be untouched. It might be some
> hardware trouble, but the machine seems to run OK now. (I had to
> replace /bin/sh with a symlink to /rescue/sh.)
>
> I would like to track down the problem, but running sh I only get
> "Segmentation fault: 11 (core dumped)". I would be happy to run
> gdb and give you a backtrace. Any clues?
>
> Hans
>
> PS! I tried to run "freebsd-update IDS" to see if any files are
> broken, but it stops at
> Inspecting system... sha256: ///boot/kernel/utopia.ko.symbols: Input/output error

All of this points to a hardware problem.

I think the best thing you can try is to manually get a hash fingerprint of your sh and compare it with another, known good copy.
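The hash comparison suggested here can be sketched in a few lines of Python. This is only an illustration, not something posted in the thread: the helper names `sha256_of` and `same_content` are made up, and the path of the known good copy is whatever trusted source you have (another machine, install media).

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Calling `same_content("/bin/sh", known_good_path)` would then answer the question directly. If reading the file itself raises an `OSError` (as the sha256 run on the kernel symbols file did with "Input/output error"), that already points at the disk rather than at the binary's contents.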
|
|
Re: /bin/sh core dumps on FreeBSD 7.2

Ivan Voras wrote:
> Hans F. Nordhaug wrote:
>> Hi!
>>
>> Suddenly /bin/sh started to crash all the time with core dumps. I'm
>> running FreeBSD 7.2-RELEASE-p4 (i386) and I have not updated anything
>> lately. The /bin/sh binary seems to be untouched. It might be some
>> hardware trouble, but the machine seems to run OK now. (I had to
>> replace /bin/sh with a symlink to /rescue/sh.)
>>
>> I would like to track down the problem, but running sh I only get
>> "Segmentation fault: 11 (core dumped)". I would be happy to run
>> gdb and give you a backtrace. Any clues?
>>
>> Hans
>>
>> PS! I tried to run "freebsd-update IDS" to see if any files are
>> broken, but it stops at
>> Inspecting system... sha256: ///boot/kernel/utopia.ko.symbols:
>> Input/output error
>
> All of this points to a hardware problem.
>

Last time I saw things like this it was either a hard drive on the way out, or a PSU dying. Run some pre-OS tests (Ultimate Boot CD or something) to try and get some results outside of the OS.

> I think the best thing you can try is to manually get a hash fingerprint
> of your sh and compare it with another, known good copy.
|
|
Re: /bin/sh core dumps on FreeBSD 7.2

On Thu, Nov 12, 2009 at 11:33:08AM +0100, Hans F. Nordhaug wrote:
> Suddenly /bin/sh started to crash all the time with core dumps. I'm
> running FreeBSD 7.2-RELEASE-p4 (i386) and I have not updated anything
> lately. The /bin/sh binary seems to be untouched. It might be some
> hardware trouble, but the machine seems to run OK now. (I had to
> replace /bin/sh with a symlink to /rescue/sh.)
>
> I would like to track down the problem, but running sh I only get
> "Segmentation fault: 11 (core dumped)". I would be happy to run
> gdb and give you a backtrace. Any clues?
>
> PS! I tried to run "freebsd-update IDS" to see if any files are
> broken, but it stops at
> Inspecting system... sha256: ///boot/kernel/utopia.ko.symbols: Input/output error

Hardware problem. Take your pick: bad RAM, bad hard disk, bad motherboard, bad PSU, bad cabling.

You can rule out hard disk problems by installing smartmontools from ports (sysutils/smartmontools). Please provide output from the following command:

smartctl -a /dev/{disk}

Where {disk} is "ad4", "da0", or similar -- and NOT something like "ad8s1" or "da0s1d". If multiple disks are in your machine -- the one you want is the disk you boot from (where /boot exists, and/or root filesystem).

I can teach you how to decode/read SMART statistics correctly.

--
| Jeremy Chadwick                                   jdc@... |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
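The whole-disk versus slice distinction above is easy to get wrong when scripting, so here is a small guard, offered as a sketch only. The helper `smartctl_command` and the naming pattern are assumptions made for illustration: the pattern accepts names of the form letters plus unit number (ad4, da0) and rejects anything with a slice or partition suffix (ad8s1, da0s1d). It only builds the argument list; it does not run smartctl.

```python
import re

# Assumption: a whole-disk device name is lowercase letters followed by a
# unit number (ad4, da0). Names with a trailing slice/partition suffix such
# as ad8s1 or da0s1d do not match.
WHOLE_DISK = re.compile(r"[a-z]+[0-9]+")


def smartctl_command(disk):
    """Build the argument list for `smartctl -a /dev/<disk>`."""
    name = disk[len("/dev/"):] if disk.startswith("/dev/") else disk
    if not WHOLE_DISK.fullmatch(name):
        raise ValueError(f"{disk!r} looks like a slice or partition, not a whole disk")
    return ["smartctl", "-a", f"/dev/{name}"]
```

The returned list can be handed to `subprocess.run(..., capture_output=True, text=True)` on a machine where smartmontools is installed.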
|
|
SMART

Jeremy Chadwick wrote:
> I can teach you how to decode/read SMART statistics correctly.
>

Actually, it would be good if you taught more than him :)

I've always wondered how important are each of the dozen or so statistics and what indicates what...

Here is for example my desktop drive:

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   087   083   006    Pre-fail  Always       -       45398197
  3 Spin_Up_Time            0x0003   096   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       64
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       247407473
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10155
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       64
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   058   055   045    Old_age   Always       -       42 (Lifetime Min/Max 37/44)
194 Temperature_Celsius     0x0022   042   045   000    Old_age   Always       -       42 (0 20 0 0)
195 Hardware_ECC_Recovered  0x001a   062   059   000    Old_age   Always       -       45398197
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

I see many values exceeding threshold but since I see it so often on other drives I don't know what the threshold is for.
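For anyone who wants to work with a table like this programmatically, the rows split cleanly on whitespace into ten columns, with the raw value last (it may contain spaces). The following is a rough sketch, not part of smartmontools: `Attribute` and `parse_attributes` are hypothetical names, and the sample is three rows copied from the output above.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    id: int
    name: str
    value: int        # normalized VALUE column
    worst: int        # normalized WORST column
    thresh: int       # THRESH column
    kind: str         # TYPE column: "Pre-fail" or "Old_age"
    when_failed: str  # "-" when nothing is recorded
    raw: str          # RAW_VALUE, kept as text because it can hold extra fields


def parse_attributes(text):
    """Collect the attribute rows found in `smartctl -a` style output."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 9)  # at most 10 columns; the rest stays in RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header lines and anything that is not an attribute row
        rows.append(Attribute(int(fields[0]), fields[1], int(fields[3]),
                              int(fields[4]), int(fields[5]), fields[6],
                              fields[8], fields[9].strip()))
    return rows


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   087   083   006    Pre-fail  Always       -       45398197
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10155
190 Airflow_Temperature_Cel 0x0022   058   055   045    Old_age   Always       -       42 (Lifetime Min/Max 37/44)
"""

for attr in parse_attributes(SAMPLE):
    print(attr.id, attr.name, attr.value, attr.thresh, attr.raw)
```

Keeping RAW_VALUE as a string is deliberate: as the rest of the thread notes, its meaning is vendor specific, so converting it to a number would suggest more than the data supports.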
|
|
Re: SMART

On Nov 12, 2009, at 1:25 PM, Ivan Voras wrote:
> Actually, it would be good if you taught more than him :)
>
> I've always wondered how important are each of the dozen or so statistics and what indicates what...
>
> Here is for example my desktop drive:
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   087   083   006    Pre-fail  Always       -       45398197
>   3 Spin_Up_Time            0x0003   096   093   000    Pre-fail  Always       -       0
>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       64
>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       247407473
>   9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10155
>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       64
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
> 190 Airflow_Temperature_Cel 0x0022   058   055   045    Old_age   Always       -       42 (Lifetime Min/Max 37/44)
> 194 Temperature_Celsius     0x0022   042   045   000    Old_age   Always       -       42 (0 20 0 0)
> 195 Hardware_ECC_Recovered  0x001a   062   059   000    Old_age   Always       -       45398197
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
> 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
>
> I see many values exceeding threshold but since I see it so often on other drives I don't know what the threshold is for.

None of your values are exceeding the threshold - it works backwards. If the value is LOWER than the threshold, you might be in trouble.

Also, judging by the raw read error rate, seek error rate and hardware ECC recovered, allow me to guess that this is a Seagate drive. :-)
(Seagate drives, perhaps among others, use these raw values way differently than others. My Hitachi 7K1000.B has 0 on those.)

Regards,
Thomas
|
|
Re: SMART

Thomas Backman wrote:
> On Nov 12, 2009, at 1:25 PM, Ivan Voras wrote:
>> Actually, it would be good if you taught more than him :)
>>
>> I've always wondered how important are each of the dozen or so statistics and what indicates what...
>>
>> Here is for example my desktop drive:
>>
>> SMART Attributes Data Structure revision number: 10
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   087   083   006    Pre-fail  Always       -       45398197
>>   3 Spin_Up_Time            0x0003   096   093   000    Pre-fail  Always       -       0
>>   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       64
>>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
>>   7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       247407473
>>   9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10155
>>  10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
>>  12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       64
>> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
>> 189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
>> 190 Airflow_Temperature_Cel 0x0022   058   055   045    Old_age   Always       -       42 (Lifetime Min/Max 37/44)
>> 194 Temperature_Celsius     0x0022   042   045   000    Old_age   Always       -       42 (0 20 0 0)
>> 195 Hardware_ECC_Recovered  0x001a   062   059   000    Old_age   Always       -       45398197
>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
>> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
>> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
>> 200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
>> 202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
>>
>> I see many values exceeding threshold but since I see it so often on other drives I don't know what the threshold is for.
> None of your values are exceeding the threshold - it works backwards. If the value is LOWER than the threshold, you might be in trouble.

Good to know.

> Also, judging by the raw read error rate, seek error rate and hardware ECC recovered, allow me to guess that this is a Seagate drive. :-)
> (Seagate drives, perhaps among others, use these raw values way differently than others. My Hitachi 7K1000.B has 0 on those.)

Yes, it's Seagate. Statistically I have the least problems with their drives. But I imagine that lack of standardization about these statistics very much limits the usability of SMART, right?
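The "it works backwards" rule is simple enough to write down. The sketch below is an editorial illustration with one assumption called out: it treats VALUE equal to THRESH as failed as well as VALUE below it, which is how I understand smartmontools to report it; the thread itself only says "lower". The function name `is_failing` is made up.

```python
def is_failing(value, thresh):
    """Assumed rule: an attribute has failed once VALUE drops to THRESH or below.

    Higher normalized values are better, so a VALUE far above THRESH is healthy.
    """
    return value <= thresh


# (name, VALUE, THRESH) for a few rows of the Seagate output quoted above.
ROWS = [
    ("Raw_Read_Error_Rate", 87, 6),
    ("Seek_Error_Rate", 84, 30),
    ("Spin_Retry_Count", 100, 97),
    ("Airflow_Temperature_Cel", 58, 45),
]

failed = [name for name, value, thresh in ROWS if is_failing(value, thresh)]
print(failed)  # prints []

# The row reported at the end of this thread for a different drive
# (VALUE 001, THRESH 045) is the opposite case.
print(is_failing(1, 45))  # prints True
```

Only the normalized VALUE and THRESH columns enter the comparison; the RAW_VALUE column plays no part in it.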
|
|
Re: SMART

On Thu, 12 Nov 2009 13:56:16 +0100
Ivan Voras <ivoras@...> wrote:

> Yes, it's Seagate. Statistically I have the least problems with their
> drives. But I imagine that lack of standardization about these
> statistics very much limits the usability of SMART, right?
>

The main problem with SMART appears to be that it's not an accurate predictor of drive failure, according to a study done at Google - see http://labs.google.com/papers/disk_failures.pdf

--
Bruce Cran
|
|
Re: SMART

Bruce Cran wrote:
> On Thu, 12 Nov 2009 13:56:16 +0100
> Ivan Voras <ivoras@...> wrote:
>
>> Yes, it's Seagate. Statistically I have the least problems with their
>> drives. But I imagine that lack of standardization about these
>> statistics very much limits the usability of SMART, right?
>
> The main problem with SMART appears to be that it's not an accurate
> predictor of drive failure, according to a study done at Google - see
> http://labs.google.com/papers/disk_failures.pdf

I've seen it. But I don't remember if they addressed the problem of nonstandard interpretations of statistics? I do remember they said they buy from multiple drive vendors.
|
|
Re: SMART

On 2009-11-12 14:35, Ivan Voras wrote:
> I've seen it. But I don't remember if they addressed the problem of
> nonstandard interpretations of statistics?

Note the statistics you quoted are "Vendor Specific SMART Attributes", so it is quite logical for different vendors to have different statistics. :)
|
|
Re: SMART

Dimitry Andric wrote:
> On 2009-11-12 14:35, Ivan Voras wrote:
>> I've seen it. But I don't remember if they addressed the problem of
>> nonstandard interpretations of statistics?
>
> Note the statistics you quoted are "Vendor Specific SMART Attributes",
> so it is quite logical for different vendors to have different
> statistics. :)

I see your point :)

Though I would hope that a statistic like:

  1 Raw_Read_Error_Rate     0x000f   087   083   006    Pre-fail  Always       -       45398197

would have an equivalent across vendors :) I know, it's too much to ask :)
|
|
Re: SMART

On Thu, 12 Nov 2009, Ivan Voras wrote:
> Dimitry Andric wrote:
> > On 2009-11-12 14:35, Ivan Voras wrote:
> > > I've seen it. But I don't remember if they addressed the problem of
> > > nonstandard interpretations of statistics?
> >
> > Note the statistics you quoted are "Vendor Specific SMART Attributes",
> > so it is quite logical for different vendors to have different
> > statistics. :)
>
> I see your point :)
>
> Though I would hope that a statistic like:
>
>   1 Raw_Read_Error_Rate     0x000f   087   083   006    Pre-fail  Always       -       45398197
>
> would have an equivalent across vendors :) I know, it's too much to ask :)

True .. but all you really need to know is that as far as your disk vendor is concerned, your error rate is 87 (somethings), the worst it's ever been is 83 and if it were nearer 6 somethings, you should worry :)

  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10155

Seagate says you're only 11% on the way to (mean) oblivion .. if you believe it should run 11.4 years. We had one 4GB IBM drive that did!

cheers, Ian
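The 11% and 11.4-year figures can be reproduced with a little arithmetic. The sketch below rests on two assumptions that the thread does not state: that the normalized Power_On_Hours value falls linearly from 100 to 0 over the design life and is rounded down, and that the design life is 100,000 hours (which happens to equal about 11.4 years). The names are illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365.25      # 8766 hours
ASSUMED_DESIGN_LIFE_H = 100_000   # assumption: not stated anywhere in the thread


def consumed_percent(normalized_value):
    """Share of the normalized scale already used up (100 means brand new)."""
    return 100 - normalized_value


def expected_normalized(raw_hours, design_life_h=ASSUMED_DESIGN_LIFE_H):
    """Normalized value a linear, round-down mapping would give for raw_hours."""
    return math.floor(100 - 100 * raw_hours / design_life_h)


raw_hours = 10155                                          # RAW_VALUE of Power_On_Hours
print(consumed_percent(89))                                # 11 (percent used)
print(round(ASSUMED_DESIGN_LIFE_H / HOURS_PER_YEAR, 1))    # 11.4 (years)
print(round(raw_hours / HOURS_PER_YEAR, 2))                # 1.16 (years powered on)
print(expected_normalized(raw_hours))                      # 89, matching the VALUE column
```

Under those assumptions the reported VALUE of 089 and the raw 10155 hours are consistent with each other, which is about as much as can be read out of a single vendor-specific attribute.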
|
|
Re: SMART

On Thu, Nov 12, 2009 at 01:25:12PM +0100, Ivan Voras wrote:
> Jeremy Chadwick wrote:
>
> >I can teach you how to decode/read SMART statistics correctly.
> >
>
> Actually, it would be good if you taught more than him :)
>
> I've always wondered how important are each of the dozen or so
> statistics and what indicates what...

I started a write-up but after writing about 300 lines realised that if I continued the details would eventually be lost in the Sea of Information Chaos that is a mailing list. :-) I've gone over how to read SMART data 3 separate times in the past 2 months (at work, on a public forum, and in private mail), so this would be the 4th...

I'll work on writing an actual HTML document to put up on my web site and will respond with the URL once I finish it.

Sorry for the "yeah sure I can help you read this data" response followed by what will probably be labelled as an excuse by some. Admittedly reading the output is pretty simple, but "getting familiar" with what the output looks like (on a per-vendor basis) takes exposure to all sorts of drives, ditto with F/W bugs and so on.

In general though, don't let anyone tell you SMART is worthless. The "overall health assessment" status is generally worthless, but the per-attribute data is of great use. Don't let anyone tell you the weighted/adjusted values (VALUE/WORST/THRESH) are useless either; in some cases they're all you can safely rely on. Don't damn SMART when it's actually the manufacturers which need to be spanked for setting such unreasonable health failure thresholds.

--
| Jeremy Chadwick                                   jdc@... |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
|
|
Re: SMART

On Thu, Nov 12, 2009 at 09:44:28AM -0800, Jeremy Chadwick wrote:
> On Thu, Nov 12, 2009 at 01:25:12PM +0100, Ivan Voras wrote:
> > Jeremy Chadwick wrote:
> >
> > >I can teach you how to decode/read SMART statistics correctly.
> >
> > Actually, it would be good if you taught more than him :)
> >
> > I've always wondered how important are each of the dozen or so
> > statistics and what indicates what...
>
> I'll work on writing an actual HTML document to put up on my web site
> and will respond with the URL once I finish it.

Isn't this sufficient?

http://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes

If not, could you make the changes on Wikipedia? This isn't a FreeBSD-specific topic, and the larger community would benefit from such documentation.

--
Rick C. Petty
|
|
Re: /bin/sh core dumps on FreeBSD 7.2

* Jeremy Chadwick <freebsd@...> [2009-11-12]:
> On Thu, Nov 12, 2009 at 11:33:08AM +0100, Hans F. Nordhaug wrote:
> > Suddenly /bin/sh started to crash all the time with core dumps. I'm
> > running FreeBSD 7.2-RELEASE-p4 (i386) and I have not updated anything
> > lately. The /bin/sh binary seems to be untouched. It might be some
> > hardware trouble, but the machine seems to run OK now. (I had to
> > replace /bin/sh with a symlink to /rescue/sh.)
> >
> > I would like to track down the problem, but running sh I only get
> > "Segmentation fault: 11 (core dumped)". I would be happy to run
> > gdb and give you a backtrace. Any clues?
> >
> > PS! I tried to run "freebsd-update IDS" to see if any files are
> > broken, but it stops at
> > Inspecting system... sha256: ///boot/kernel/utopia.ko.symbols: Input/output error
>
> Hardware problem. Take your pick: bad RAM, bad hard disk, bad
> motherboard, bad PSU, bad cabling.
>
> You can rule out hard disk problems by installing smartmontools from
> ports (sysutils/smartmontools). Please provide output from the
> following command:
>
> smartctl -a /dev/{disk}

Thx for the info about smartmontools. The only problem I found was:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   001   001   045    Old_age   Always   FAILING_NOW 253

Don't know if this is a serious problem.

Hans

PS! The disk is of type

Model Family: Western Digital Caviar Second Generation Serial ATA family
Device Model: WDC WD2500JS-55NCB1