Watchdog Debugging

Watchdog Debugging

by Tim DeBaillie

I was thinking the other day that it would be useful to be able to
determine which thread or interrupt caused a watchdog reset.  We have done
this in our offices before by toggling some output lines after every Yield,
Sleep, EventWait, EventPost, fflush, and other OS function that yields.
I have even recompiled the OS to do the same thing so that I could trace
the problem down well enough.

One idea that crossed my mind would be very simple to implement across
the OS and user code.  If you could assign a specific piece of memory
(say 4 bytes of high heap memory) to keep thread flags, upon reboot,
your program could detect a watchdog reboot and then report the 4 bytes
back to the user.
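
A minimal sketch of this idea, written so that it runs on a host. The names
wd_crumb, crumb_mark and crumb_report are hypothetical and not part of
Nut/OS. On a real target the structure would have to live in RAM that
neither the startup code nor the heap touches, and was_watchdog_reset would
come from the MCU's reset status register.

```c
#include <stdint.h>
#include <stdio.h>

#define CRUMB_MAGIC 0xC0DEu   /* hypothetical marker: "contents are valid" */

/* Four bytes in total, as in the proposal above. Here it is an ordinary
 * static so that the sketch runs anywhere (assumption, see lead-in). */
struct wd_crumb {
    uint16_t magic;   /* CRUMB_MAGIC once crumb_mark() has run */
    uint16_t flags;   /* ID of the thread or ISR that was marked last */
};

static volatile struct wd_crumb wd_crumb;

/* Call after every yielding OS function and at ISR entry. */
static void crumb_mark(uint16_t id)
{
    wd_crumb.flags = id;
    wd_crumb.magic = CRUMB_MAGIC;
}

/* Call early after reset. Returns 1 and stores the last marked ID in *id
 * if the reset was a watchdog reset and the crumb is valid, otherwise 0.
 * The crumb is invalidated either way so stale data is not reported twice. */
static int crumb_report(int was_watchdog_reset, uint16_t *id)
{
    int valid = was_watchdog_reset && wd_crumb.magic == CRUMB_MAGIC;

    if (valid) {
        *id = wd_crumb.flags;
        printf("watchdog reset, last context: 0x%04x\n", (unsigned)*id);
    }
    wd_crumb.magic = 0;
    return valid;
}
```

The magic value is only there to tell a marked crumb from RAM that was
never written; after a power-on it makes a false report unlikely, not
impossible.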

The only thing that really keeps this from being super simple to
implement is that it requires making sure the heap manager (NutHeapAlloc
/ malloc) NEVER uses this area of memory.  I'm sure we could rewrite the
DEBUG version of the OS to do that!

Any thoughts or improvements out there in the ether?

Another thought from looking through the OS: the _putf function in
crt/putf.c should probably keep all of the variables it declares in a
structure and access them through a pointer.  It could dynamically
allocate and free the memory needed for this structure.  This is the part
of fprintf that "explodes" your memory usage and causes oh so many stack
overflows.

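The _putf suggestion can be sketched roughly as follows. This is not the
actual _putf code; struct fmt_work and format_unsigned are hypothetical
stand-ins that only show the pattern of moving scratch variables from the
stack into a heap-allocated structure. Standard malloc/free are used here;
in the OS the heap functions of the OS would take their place.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the locals that a printf-style formatter
 * would otherwise keep on the caller's stack. */
struct fmt_work {
    char digits[32];   /* scratch buffer for number conversion */
    int  len;          /* number of digits produced so far */
};

/* Converts val to decimal text in out (capacity out_size, including the
 * terminating NUL). Returns the text length, or -1 if the allocation
 * fails or out is too small. Only a pointer and two ints live on the
 * stack; the scratch data is on the heap. */
static int format_unsigned(unsigned long val, char *out, size_t out_size)
{
    struct fmt_work *w = malloc(sizeof *w);
    int result = -1;

    if (w == NULL)
        return -1;

    /* Build the digits in reverse order. */
    w->len = 0;
    do {
        w->digits[w->len++] = (char)('0' + val % 10);
        val /= 10;
    } while (val != 0 && w->len < (int)sizeof w->digits);

    /* Copy them out in the right order if everything fits. */
    if (val == 0 && (size_t)w->len < out_size) {
        for (int i = 0; i < w->len; i++)
            out[i] = w->digits[w->len - 1 - i];
        out[w->len] = '\0';
        result = w->len;
    }

    free(w);
    return result;
}
```

The trade-off is that every call now pays for an allocation and can fail
when the heap is exhausted, so the caller has to handle the -1 case.
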
Thanks,

Tim DeBaillie
_______________________________________________
http://lists.egnite.de/mailman/listinfo/en-nut-discussion

Re: Watchdog Debugging

by Nathan Moore-5

I don't see why you would consider this memory area to be on the heap in
the first place.  I tried something similar by chopping down the size of
RAM that the OS was configured to use and using the extra for keeping a
thread history (last 10 threads), but something funky kept happening to
that RAM -- either a bug in my code or something the OS did that I didn't
know about.
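
A thread history of this kind could look roughly like the sketch below.
The names thread_hist, hist_push and hist_recent are made up for
illustration, and thread IDs are assumed to fit in one byte. On the target
the structure would sit in the RAM carved out of the OS configuration; here
it is a normal static so the sketch runs on a host.

```c
#include <stdint.h>

#define HIST_LEN 10   /* "last 10 threads", as described above */

/* Hypothetical layout of the history area. */
struct thread_hist {
    uint8_t next;            /* index of the slot that is written next */
    uint8_t count;           /* valid entries, saturates at HIST_LEN */
    uint8_t ids[HIST_LEN];   /* thread IDs, oldest entry overwritten first */
};

static volatile struct thread_hist hist;

/* Record a context switch to the thread with the given ID. */
static void hist_push(uint8_t id)
{
    hist.ids[hist.next] = id;
    hist.next = (uint8_t)((hist.next + 1) % HIST_LEN);
    if (hist.count < HIST_LEN)
        hist.count++;
}

/* Return the n-th most recent ID (0 = newest), or -1 if there is no
 * such entry yet. */
static int hist_recent(uint8_t n)
{
    if (n >= hist.count)
        return -1;
    return hist.ids[(hist.next + HIST_LEN - 1 - n) % HIST_LEN];
}
```

After a reset, walking hist_recent(0) up to hist_recent(9) gives the order
in which the threads ran before the lock-up, newest first.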

I did a similar thing to your first suggestion by giving each thread
structure a bit pattern to put on the IO lines while that thread was
running, but resources (available IO) limited my ability to go beyond one
pattern per thread.

Nathan

Re: Watchdog Debugging

by Bernard Fouché

Timothy M. De Baillie wrote:
>
> Any thoughts or improvements out there in the ether?
>  
What about:

- with a configuration option, remove usage of the hardware watchdog and
provide a software one, based on an available timer/counter. When the
counter reaches its max value, an interrupt is fired. One must choose a
max value that gives the timer/counter the same period as the real
watchdog (or the closest possible).

- the function that resets the hardware watchdog then just resets the
timer/counter.

- the interrupt fired when the counter reaches its max value can write
information to EEPROMs, banked RAM, on the serial port, etc.
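
The three points above could be sketched as follows. This is a variant that
counts periodic timer ticks in software instead of letting a hardware
counter run to its max value. All names (swdt_*) and the timeout value are
assumptions for illustration, not existing Nut/OS functions.

```c
#include <stdint.h>

/* Assumed: the tick rate is chosen so that this many ticks match the
 * period of the hardware watchdog as closely as possible. */
#define SWDT_TIMEOUT_TICKS 2000u

typedef void (*swdt_handler_t)(void);

static volatile uint16_t swdt_ticks;
static swdt_handler_t swdt_on_expire;

/* Example diagnostic handler: just counts how often the watchdog expired.
 * A real one could write to EEPROM, banked RAM or the serial port. */
static volatile unsigned swdt_expired;
static void swdt_record_expiry(void)
{
    swdt_expired++;
}

/* Install the handler that runs when the software watchdog expires. */
static void swdt_init(swdt_handler_t handler)
{
    swdt_on_expire = handler;
    swdt_ticks = 0;
}

/* Takes the place of the hardware watchdog reset call. */
static void swdt_restart(void)
{
    swdt_ticks = 0;
}

/* Call from the periodic timer interrupt. */
static void swdt_tick_isr(void)
{
    if (++swdt_ticks >= SWDT_TIMEOUT_TICKS) {
        swdt_ticks = 0;
        if (swdt_on_expire)
            swdt_on_expire();
    }
}
```

Usage would be swdt_init(swdt_record_expiry) at startup, swdt_restart()
wherever the hardware watchdog used to be reset, and swdt_tick_isr() from
the timer interrupt.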

I use a similar scheme on targets without any OS (for debugging), or if
the maximum time allowed by the hardware watchdog is too short for some
processing. In that case I have to reset the hardware watchdog at
different points in the code, but I keep the software watchdog running
with a longer period and reset it only in the main application cycle.
(Again, this is on targets without an OS; there the scheme is kept even
when debugging is not needed, just to make sure there is some kind of
watchdog.)

Regards,

 Bernard


Re: Watchdog Debugging

by Ethernut

Nathan Moore wrote:
> I don't see why you would consider this memory area to be on the heap in the
> first place.
> I tried something similar by chopping down the size of RAM that the Os was
> configured to use,
> and using that extra for keeping a thread history (last 10 threads), but
> something funky kept
> happening to that RAM -- either a bug in my code or something the Os did
> that I didn't know about.

Without reading the details of the original post, I may be able to help
here. Depending on the platform, the runtime initialization may make use
of the stack before/while entering NutInit. To reserve some RAM at the
top, it may be required to inform the linker as well:

- GNU AVR: linker option --defsym __stack=0x10FF (passed through avr-gcc
as -Wl,--defsym,__stack=0x10FF)

- GNU ARM: linker script section MEMORY (reduce the len parameter of the
ram region)

Harald



Re: Watchdog Debugging

by Nathan Moore-5

On Wed, Sep 24, 2008 at 12:31 PM, Harald Kipp <harald.kipp@...> wrote:

> Nathan Moore wrote:
> > I don't see why you would consider this memory area to be on the heap
> > in the first place.
> > I tried something similar by chopping down the size of RAM that the Os
> > was configured to use, and using that extra for keeping a thread
> > history (last 10 threads), but something funky kept happening to that
> > RAM -- either a bug in my code or something the Os did that I didn't
> > know about.
>
> Without reading the details of the original post, I may be able to help
> here. Depending on the platform, the runtime initialization may make use
> of the stack before/while entering NutInit. To reserve some RAM on the
> top, it may be required to inform the linker as well.
> GNU AVR linker option --defsym,__stack=0x10FF
> GNU ARM linker script section MEMORY (reducing the len parameter of the
> ram)


Wouldn't changing the upper limit in the configurator result in this being
done?

Re: Watchdog Debugging

by Nathan Moore-5

On Wed, Sep 24, 2008 at 12:15 PM, Bernard Fouché <bernard.fouche@...> wrote:

> Timothy M. De Baillie wrote:
> >
> > Any thoughts or improvements out there in the ether?
> >
> What about:
>
> - with a configuration option, remove usage of the hardware watchdog and
> provide a software one, based on a available timer/counter. When the
> counter reaches its max value, an interrupt is fired. One must choose a
> max value that makes the timer/counter with a timing equals to the real
> watchdog (or the closest possible).
>
> - the function to reset the hardware watchdog then just reset the
> timer/counter.
>
> - the interrupt fired when the counter reaches its max value can write
> information to EEPROMs, banked RAM, on the serial port, etc.
>

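Just keep in mind that if a hang-up happens within an ISR or critical
section, this method won't catch it.

NutEnterCritical();
/* Hang-up example: (i = 4) is an assignment and always true, so the
   loop never ends while interrupts are disabled. */
for(i = 0; (i = 4); i++) {
   f(i);
}
NutExitCritical();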

Nathan

Re: Watchdog Debugging

by Ethernut

Nathan Moore wrote:
> On Wed, Sep 24, 2008 at 12:31 PM, Harald Kipp <harald.kipp@...> wrote:
>> GNU AVR linker option --defsym,__stack=0x10FF
>> GNU ARM linker script section MEMORY (reducing the len parameter of the
>> ram)
>
>
> Wouldn't changing the upper limit in the configurator result in this being
> done?

So far the Configurator modifies only a few compiler/linker options,
mainly the compiler option -D.

On GNU AVR the standard linker scripts are used, which set the stack
pointer to the end of internal RAM by default. ImageCraft uses the end of
external RAM to initialize the early stack pointer. For the ARM, the
linker script is used.

Harald



Re: Watchdog Debugging

by Nathan Moore-5

On Wed, Sep 24, 2008 at 1:29 PM, Harald Kipp <harald.kipp@...> wrote:

> Nathan Moore wrote:
> > On Wed, Sep 24, 2008 at 12:31 PM, Harald Kipp <harald.kipp@...> wrote:
> >> GNU AVR linker option --defsym,__stack=0x10FF
> >> GNU ARM linker script section MEMORY (reducing the len parameter of the
> >> ram)
> >
> > Wouldn't changing the upper limit in the configurator result in this
> > being done?
>
> Until today the Configurator modifies a few compiler/linker options
> only, mainly compiler option -D.
>
> On GNU AVR the standard linker scripts are used, which will set the
> stack pointer to the end of internal RAM by default.


Ok, we were using the end of external RAM on AVR with GCC.
It did turn out that the problem I was looking for also had its way with
RAM before killing the board, so it's likely that that is what caused that
area of RAM to be overwritten.

Nathan

Re: Watchdog Debugging

by Ethernut

Timothy M. De Baillie wrote:

> One idea that crossed my mind would be very simple to implement across
> the OS and user code.  If you could assign a specific piece of memory
> (say 4 bytes of high heap memory) to keep thread flags, upon reboot,
> your program could detect a watchdog reboot and then report the 4 bytes
> back to the user.

If you search in the Configurator for NUTMEM_RESERVED, you'll find a
default of 64, which creates an array in arch/avr/os/nutinit.c:
uint8_t nutmem_onchip[NUTMEM_RESERVED];

If I remember correctly, this had been used to reserve some memory in
internal AVR RAM, which can be used while manipulating the address
lines. One application is to access the hidden external RAM that
overlaps the internal addresses. The implementation is so awful that I
would like to delete it immediately.

However, the right way may be to put nutmem_onchip in a different
segment. I remember that avr-libc offers an uninitialized data segment,
which won't be touched by the runtime initialization. If we manage to
force it into internal RAM, this array could provide both features.
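
As a rough sketch of how such a segment can be used: with avr-gcc and
avr-libc, variables placed in the .noinit section are left alone by the
startup code. The names debug_magic, debug_bytes and the debug_area_*
functions are hypothetical, and the sketch does not deal with forcing the
section into internal RAM, which would have to be done at link time. On
non-AVR builds the attribute is dropped so the code also runs on a host.

```c
#include <stdint.h>

#if defined(__AVR__)
/* avr-libc: objects in .noinit are not cleared or initialized at startup. */
#define DEBUG_NOINIT __attribute__((section(".noinit")))
#else
#define DEBUG_NOINIT   /* host build: ordinary zero-initialized storage */
#endif

#define NOINIT_MAGIC 0xA5u   /* hypothetical validity marker */

/* Hypothetical debug area: one marker byte plus a few payload bytes. */
static volatile uint8_t debug_magic DEBUG_NOINIT;
static volatile uint8_t debug_bytes[4] DEBUG_NOINIT;

/* Returns 1 if the area holds data written before the last reset. After
 * power-up the RAM content is undefined, so the marker only makes a false
 * positive unlikely. */
static int debug_area_valid(void)
{
    return debug_magic == NOINIT_MAGIC;
}

/* Store one payload byte and mark the area as valid. Out-of-range
 * indexes are ignored. */
static void debug_area_write(uint8_t index, uint8_t value)
{
    if (index < sizeof debug_bytes) {
        debug_bytes[index] = value;
        debug_magic = NOINIT_MAGIC;
    }
}

/* Invalidate the area, for example after it has been reported. */
static void debug_area_clear(void)
{
    debug_magic = 0;
}
```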

In any case I agree, that your suggestions would be most helpful for
debugging.

Harald



Re: Watchdog Debugging

by Tim DeBaillie

Nathan Moore wrote:

> On Wed, Sep 24, 2008 at 12:15 PM, Bernard Fouché <bernard.fouche@...> wrote:
>
>> Timothy M. De Baillie wrote:
>>> Any thoughts or improvements out there in the ether?
>>
>> What about:
>>
>> - with a configuration option, remove usage of the hardware watchdog and
>> provide a software one, based on a available timer/counter. When the
>> counter reaches its max value, an interrupt is fired. One must choose a
>> max value that makes the timer/counter with a timing equals to the real
>> watchdog (or the closest possible).
>>
>> - the function to reset the hardware watchdog then just reset the
>> timer/counter.
>>
>> - the interrupt fired when the counter reaches its max value can write
>> information to EEPROMs, banked RAM, on the serial port, etc.
>
> Just keep in mind that if a hang-up happens within an ISR or critical
> section
> this method won't catch it.
>
> NutEnterCritical();
> for(i = 0; (i = 4); i++) {
>    f(i);
> }
> NutExitCritical();
>
> Nathan

Sure it will.

void MyInterrupt(void * arg){

    //save the interrupted thread's information
    unsigned int thread_information = _my_memory_location;

    //set the memory location to the interrupt's flag
    _my_memory_location = MY_INTERRUPT_FLAG;

    for(;;); //causes lockup

    //under normal circumstances, you would then restore the interrupted
    //thread's information before returning from the interrupt
    _my_memory_location = thread_information;

}

Now I know that's not 100% reliable, but should get the job done.

Criticals don't change threads (or shouldn't) so you don't have to do
anything special.

Tim

Re: Watchdog Debugging

by Tim DeBaillie

Bernard Fouché wrote:

> Timothy M. De Baillie wrote:
>> Any thoughts or improvements out there in the ether?
>
> What about:
>
> - with a configuration option, remove usage of the hardware watchdog and
> provide a software one, based on a available timer/counter. When the
> counter reaches its max value, an interrupt is fired. One must choose a
> max value that makes the timer/counter with a timing equals to the real
> watchdog (or the closest possible).
>
> - the function to reset the hardware watchdog then just reset the
> timer/counter.
>
> - the interrupt fired when the counter reaches its max value can write
> information to EEPROMs, banked RAM, on the serial port, etc.
>
> I use a similar scheme on targets without any OS (for debugging), or if
> I the maximum allowed time by the hardware watchdog is too short for
> some processing: in that case I have to reset the hardware watchdog at
> different points in the code but I keep the software watchdog running
> with a longer period and reset it only in the main application cycle
> (again on targets without OS and in that case this system is kept even
> when debugging is not needed but only to ensure to have some kind of
> watchdog).
>
> Regards,
>
>  Bernard

This isn't a BAD solution. I think it is worth considering.

Tim

Re: Watchdog Debugging

by Nathan Moore-5

Tim,
My remark that it wouldn't work was referring to the inability of a
software-based watchdog's timer ISR to interrupt critical sections or
other ISRs.  A hardware watchdog will, and when debugging at my desk with
a debugger, lock-up asserts are great.

Nathan

Re: Watchdog Debugging

by Bernard Fouché

Nathan Moore wrote:
>
> Just keep in mind that if a hang-up happens within an ISR or critical
> section
> this method won't catch it.
>  
Sure, this is not a perfect solution, it's just a tool among others that
can help in some situations.

I've also tried other ways, for instance storing in RAM not just a few
bytes but a circular buffer with some debug info, to be able to retrieve
a history of the events that made the whole system fail. Sometimes it is
not a particular function that is broken; the flaw is in the way a chain
of events is handled.

Also, one has to keep in mind that things that first appear to be a
hardware watchdog reset may be related to brown-out detection (or any
other hardware-related origin). Before spending time digging into the
software, one must be sure that the hardware is totally clean. (Some MCUs
allow one to read the cause of the previous reset.)
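
As an illustration of checking the reset origin first, here is a small
decoder for a reset status byte. The bit positions follow the reset status
register found on many AVR devices, but they are an assumption here and
must be checked against the data sheet of the actual MCU. The function
names are hypothetical, and reading and clearing the real register is left
out.

```c
#include <stdint.h>

/* Assumed bit positions of the reset flags (check your data sheet). */
#define RST_POWER_ON  (1u << 0)
#define RST_EXTERNAL  (1u << 1)
#define RST_BROWN_OUT (1u << 2)
#define RST_WATCHDOG  (1u << 3)

/* Returns a short name for the most significant reset cause. Several
 * flags can be set at once if the register was never cleared, so
 * power-on and brown-out are checked first: after either of them the
 * RAM content cannot be trusted. */
static const char *reset_cause_name(uint8_t status)
{
    if (status & RST_POWER_ON)  return "power-on";
    if (status & RST_BROWN_OUT) return "brown-out";
    if (status & RST_WATCHDOG)  return "watchdog";
    if (status & RST_EXTERNAL)  return "external";
    return "unknown";
}

/* Returns 1 only if the watchdog flag is set and neither the power-on
 * nor the brown-out flag is set. */
static int is_pure_watchdog_reset(uint8_t status)
{
    return (status & RST_WATCHDOG) &&
           !(status & (RST_POWER_ON | RST_BROWN_OUT));
}
```

Only when is_pure_watchdog_reset() returns 1 does it make sense to trust
debug data that was left in RAM before the reset.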

There are also situations where the hardware watchdog fires because some
processing is too long by a very little bit, or because processing
depends on events external to the MCU, and changing the application
timing with debug features makes the problem disappear or happen
earlier. (sometimes a very difficult case to fix)

Finally, using a version control system like CVS or SVN may help a lot: if
you are serious about your commit/comment policy, you have a big advantage
when bizarre things start to show up in a project that was previously
working correctly. (In my experience this is one of the most efficient
debugging tools, whatever kind of bug is being chased.)

So I think that there is no miracle solution that handles all possible
cases but a set of tools/habits/experience to choose from.

  Bernard