Disable alerting for watchdog timer expiration

Hello all,

I would like to change the default behavior for our Dell servers (mostly blades) to stop alerting at all when the watchdog timer expires. Our HP ProLiant BL460c G1 servers don't alert on timer expiration. I was hoping to see if there was a difference between the configs, but the HP servers don't work with ipmi-pef-config ("Unable to get Number of Alert Policy Entries") and have very few entries in ipmi-sensors, none of which are related to the watchdog.

What I would like to happen when a watchdog timer expires:
1) The system will reboot
2) *No* SNMP trap sent by the server itself
3) *No* SNMP trap sent by the chassis (if the server is a blade)
4) *No* event inserted in the SEL
5) *No* amber lights on the server or chassis

What I have accomplished:
1) The system will reboot
2) *No* SNMP trap sent by the server itself (the following worked: "ipmi-pef-config -c -e Event_Filter_17:Enable_Filter=No")

The SEL is populated and an alert sent whether the action is to reboot the server or do nothing.

What I have tried:
I set everything in "ipmi-sensors-config -S 44_OS_Watch" to be "No":

Section 44_OS_Watch
    ## Possible values: Yes/No
    Enable_All_Event_Messages                 No
    ## Possible values: Yes/No
    Enable_Scanning_On_This_Sensor            No
    ## Possible values: Yes/No
    Enable_Assertion_Event_Timer_Expired      No
    ## Possible values: Yes/No
    Enable_Assertion_Event_Hard_Reset         No
    ## Possible values: Yes/No
    Enable_Assertion_Event_Power_Down         No
    ## Possible values: Yes/No
    Enable_Assertion_Event_Power_Cycle        No
    ## Possible values: Yes/No
    Enable_Deassertion_Event_Timer_Expired    No
    ## Possible values: Yes/No
    Enable_Deassertion_Event_Hard_Reset       No
    ## Possible values: Yes/No
    Enable_Deassertion_Event_Power_Down       No
    ## Possible values: Yes/No
    Enable_Deassertion_Event_Power_Cycle      No
EndSection

This changes the output of ipmi-sensors for that host to:

44 | OS Watch | Watchdog 2 | N/A | N/A | N/A

An unmodified host has this:

44 | OS Watch | Watchdog 2 | N/A | N/A | 'OK'

After the timer expires, this shows up in the SEL:

ID | Date        | Time     | Name     | Type                   | Event Direction | Event
1  | Feb-01-2012 | 07:39:18 | SEL      | Event Logging Disabled | Assertion Event | Log Area Reset/Cleared
2  | Feb-01-2012 | 07:39:23 | OS Watch | Watchdog 2             | Assertion Event | Timer expired, status only
3  | Feb-01-2012 | 07:39:23 | OS Watch | Watchdog 2             | Assertion Event | Timer expired, status only

If I don't disable the SNMP traps from the server for watchdog timer expiration, I get a trap for DELL-ASF-MIB::asfTrapASRTimeout. A blade chassis will always send a trap stating that the blade changed from normal to critical.

Any other ideas? Is this something I need to ask Dell about?

Thanks,
Ryan

--
Ryan Cox
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University

_______________________________________________
Freeipmi-users mailing list
Freeipmi-users@...
https://lists.gnu.org/mailman/listinfo/freeipmi-users
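For anyone who wants to check goal 4 ("no event inserted in the SEL") across several hosts, here is a rough Python sketch that scans a pipe-delimited SEL listing for watchdog entries. The sample rows are copied from the listing in the message above; the function names `parse_sel` and `watchdog_events` are made up for illustration, and the column layout is assumed to match that listing (other tool versions may print different columns).

```python
# Sample SEL rows copied from the message above. In practice the text would
# come from whatever tool printed the SEL; that step is not shown here.
SEL_TEXT = """\
ID | Date | Time | Name | Type | Event Direction | Event
1 | Feb-01-2012 | 07:39:18 | SEL | Event Logging Disabled | Assertion Event | Log Area Reset/Cleared
2 | Feb-01-2012 | 07:39:23 | OS Watch | Watchdog 2 | Assertion Event | Timer expired, status only
3 | Feb-01-2012 | 07:39:23 | OS Watch | Watchdog 2 | Assertion Event | Timer expired, status only
"""


def parse_sel(text):
    """Split a pipe-delimited SEL listing into one dict per event row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        fields = [f.strip() for f in ln.split("|")]
        if len(fields) != len(header):
            continue  # skip rows that do not match the assumed layout
        rows.append(dict(zip(header, fields)))
    return rows


def watchdog_events(rows):
    """Return only the rows whose sensor type starts with 'Watchdog'."""
    return [r for r in rows if r["Type"].startswith("Watchdog")]


rows = parse_sel(SEL_TEXT)
hits = watchdog_events(rows)
print(f"{len(hits)} watchdog event(s) found in {len(rows)} SEL entries")
```

An empty `hits` list after a forced timer expiration would suggest the event is no longer being logged on that host.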
Re: Disable alerting for watchdog timer expiration

Okay... so I figured it out after looking at the IPMI spec.

    ipmi-raw 0 6 0x24 0x80 0x01 0x00 0x00 0x96 0x00

The 0x80 is the trick. The bit that is set is a "don't log" bit. That takes care of it properly. The command above uses a 15 second timer, don't log, and hard reset.

The fields of the Set Watchdog Timer command are documented at ftp://download.intel.com/design/servers/ipmi/IPMIv2_0rev1_0.pdf on page 378.

Ryan
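To make the byte layout of the ipmi-raw command above easier to follow, here is a small Python sketch that assembles the same six data bytes from named parameters and prints the command line without running it. The helper names (`set_watchdog_data`, `ipmi_raw_command`) are hypothetical. The meaning of the first, second, and last two bytes follows the message above (don't log, hard reset, 15 seconds); treating bytes 3 and 4 as the pre-timeout interval and the expiration-flags-clear field, and the countdown as 100 ms per count with the low byte first, is my reading of the Set Watchdog Timer table in the IPMI spec, so check it against the spec before relying on it.

```python
DONT_LOG = 0x80          # bit 7 of the first data byte: suppress SEL logging
ACTION_HARD_RESET = 0x01  # timeout action used in the thread


def set_watchdog_data(timeout_seconds, dont_log=True, timer_use=0,
                      action=ACTION_HARD_RESET):
    """Build the six data bytes for Set Watchdog Timer (assumed layout)."""
    counts = round(timeout_seconds * 10)  # countdown is in 100 ms units
    if not 0 <= counts <= 0xFFFF:
        raise ValueError("timeout does not fit in 16 bits")
    byte1 = (DONT_LOG if dont_log else 0) | (timer_use & 0x07)
    byte2 = action & 0x07
    byte3 = 0x00  # pre-timeout interval (unused in this sketch)
    byte4 = 0x00  # expiration flags to clear (none in this sketch)
    return [byte1, byte2, byte3, byte4, counts & 0xFF, counts >> 8]


def ipmi_raw_command(data):
    """Format the bytes as the ipmi-raw invocation used in the thread."""
    return "ipmi-raw 0 6 0x24 " + " ".join(f"0x{b:02x}" for b in data)


data = set_watchdog_data(15)
print(ipmi_raw_command(data))
# prints: ipmi-raw 0 6 0x24 0x80 0x01 0x00 0x00 0x96 0x00
```

The default `timer_use=0` is there only to reproduce the exact bytes from the message; the spec defines specific timer-use values (for example 0x04 for SMS/OS), and a BMC may or may not accept 0.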
Re: Disable alerting for watchdog timer expiration

Hi Ryan,

Do the options in bmc-watchdog for turning off logging not work? Or perhaps you're using the IPMI kernel driver's BMC watchdog?

Al

Albert Chu
chu11@...
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
Re: Disable alerting for watchdog timer expiration

Al,

I had trouble getting bmc-watchdog to work the first few times I tried it and then forgot about it. I ended up using the kernel module, which doesn't have the "don't log" feature. A newer version of bmc-watchdog does work using "-l 1".

By the way, the naming of that option is a little confusing since it's called the Set Log Flag and a "1" disables logging. The spec refers to it as "don't log", so it may be better to name it the "Don't Log" flag.

Ryan
Re: Disable alerting for watchdog timer expiration

Al,

Or I should say that when that bit is 1, it is "don't log". Either way, the usage in bmc-watchdog is as a "don't log" flag where 1 means "don't log".

Ryan
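Since the inverted sense of the flag is easy to trip over, here is a tiny Python sketch of the point being made: a set bit (or a flag value of 1) means the expiration is NOT logged. The names `logging_enabled` and `log_flag_value` are hypothetical helpers for illustration, not part of bmc-watchdog or any FreeIPMI API.

```python
DONT_LOG_BIT = 0x80  # bit 7 of the first Set Watchdog Timer data byte


def logging_enabled(timer_use_byte):
    """True when the BMC is expected to log the expiration (bit clear)."""
    return not (timer_use_byte & DONT_LOG_BIT)


def log_flag_value(want_logging):
    """Flag value in the thread's sense: 1 means "don't log", 0 means log."""
    return 0 if want_logging else 1


for byte in (0x00, 0x80):
    state = "logged" if logging_enabled(byte) else "not logged"
    print(f"timer use byte 0x{byte:02x}: expiration {state}")
```

Reading the flag as "don't log" rather than "log" keeps the code and the spec wording pointing the same way.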