Watchdog Timers

Jeffrey Schwentner, EEL6897, Fall 2007


Page 1: Watchdog Timers

Watchdog Timers

Jeffrey Schwentner, EEL6897, Fall 2007

Page 2: Watchdog Timers

Software Reliability

Embedded systems must be able to cope with both hardware and software anomalies to be truly robust.

In many cases, embedded devices operate in total isolation and are not accessible to an operator.

Manually resetting a device in this scenario when its software “hangs” is not possible.

In extreme cases, this can result in damaged hardware or loss of life and incur significant cost impact.

Page 3: Watchdog Timers

The Clementine

In 1994, a deep space probe, the Clementine, was launched to make observations of the moon and a large asteroid (1620 Geographos).

After months of operation, a software exception caused a control thruster to fire for 11 minutes, which depleted most of the remaining fuel and caused the probe to rotate at 80 RPM.

Control was eventually regained, but it was too late to successfully complete the mission.

Page 4: Watchdog Timers

Watchdog Timers

While it is not possible to cope with all hardware and software anomalies, the developer can employ watchdog timers to help mitigate the risks.

A watchdog timer is a hardware timing device that triggers a system reset, or similar operation, after a designated amount of time has elapsed.

A watchdog timer can be either a stand-alone hardware component or built into the processor itself.

To avoid a reset, an application must periodically reset the watchdog timer before this interval elapses. This is also known as “kicking the dog”.

Page 5: Watchdog Timers

External Watchdogs

External watchdog timers are integrated circuits that physically assert the reset pin of the processor.

The processor must assert an output pin in some fashion to reset the timing mechanism of the watchdog.

This type of watchdog is generally considered the most appropriate because of the complete independence of the watchdog from the processor.

Some external watchdogs feature a “windowed” reset.

◦ Enforces timing constraints for a proper watchdog reset.

◦ Minimizes likelihood of errant software resetting the watchdog.

Page 6: Watchdog Timers

External Watchdog Schematic

Page 7: Watchdog Timers

Windowed Watchdog Operation

Maxim MAX6323

Page 8: Watchdog Timers

Internal Watchdogs

Many processors and microcontrollers have built-in watchdog circuitry available to the programmer.

This typically consists of a memory-mapped counter that triggers a non-maskable interrupt (NMI), or reset, when the counter reaches a predefined value.

Instead of kicking the watchdog via an I/O pin assertion, the software resets an internal counter to its initial value.

Watchdog configuration is controlled by user software.

The watchdog may even be used as a general purpose timer in some cases.

Page 9: Watchdog Timers

Internal Watchdog Considerations

Internal watchdogs are not as “safe” as watchdog circuits external to the processor.

◦ Watchdogs that issue an NMI instead of a reset may not properly reinitialize the system.

◦ Watchdog control registers may be inadvertently overwritten by runaway code, disabling the watchdog altogether.

◦ Reset is limited to the processor itself (no outside peripherals).

To mitigate these issues, most built-in watchdogs have extra safety steps designed to prevent errant code from interfering with the operation of the watchdog timer.

On-chip solutions have a significant cost and space advantage over their external counterparts.

Page 10: Watchdog Timers

MSP430 Watchdog

The Texas Instruments MSP430 family of microcontrollers has a built-in 16-bit watchdog timer featuring:

◦ Configurable clock source and prescaler

◦ Two interrupt options (Reset or NMI)

◦ Isolated watchdog counter

Access to the watchdog counter requires a unique binary code, or password.

◦ The code must be written to the password register prior to resetting the watchdog timer.

◦ An invalid password attempt causes a key violation interrupt.

Page 11: Watchdog Timers

MSP430 Watchdog Timer Block DiagramMSP430 Watchdog Timer Block Diagram

Page 12: Watchdog Timers

Design Considerations

The effectiveness of the watchdog is a function of how it is used within the application software.

Simply issuing a watchdog reset in every iteration of the program loop may be insufficient.

Take a more proactive approach.

◦ Periodically assess the state and health of the system. Only issue a reset if all processes are deemed normal.

◦ Employ a state-based approach when resetting the watchdog timer.

◦ Should a watchdog failure occur, provide an indication and/or capture debugging information.

Page 13: Watchdog Timers

System Health Assessment

As the size and complexity of software increase, so does the likelihood of introducing code that may be detrimental to the system.

Software may not be the only cause of system corruption. A spike in the power supply, for example, may corrupt data in memory, or even system registers (program counter, stack pointer, etc.).

Check for things like stack overflows and validate memory wherever possible.
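Two common checks of this kind are a guard word at the far end of the stack and a checksum over data that should not change. The sketch below shows both under simplifying assumptions: the guard word is a struct field, not a real stack location, the checksum is a plain 16-bit sum, and the names `health_ctx_t`, `health_seal`, and `health_ok` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_GUARD_PATTERN 0xDEADBEEFu
#define CONFIG_SIZE         16u

/* Illustrative health context. On a target, stack_guard would be a word
 * placed at the limit of the stack region by the linker or startup code. */
typedef struct {
    uint32_t stack_guard;         /* overwritten if the stack overflows */
    uint8_t  config[CONFIG_SIZE]; /* data expected to stay constant     */
    uint16_t config_sum;          /* checksum taken when config was set */
} health_ctx_t;

/* Simple additive checksum; a CRC would catch more error patterns. */
static uint16_t checksum16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + data[i]);
    }
    return sum;
}

/* Records the known-good state: guard pattern and config checksum. */
void health_seal(health_ctx_t *h)
{
    assert(h != NULL);
    h->stack_guard = STACK_GUARD_PATTERN;
    h->config_sum = checksum16(h->config, sizeof h->config);
}

/* Returns true only if the guard word and the checksum are both intact. */
bool health_ok(const health_ctx_t *h)
{
    assert(h != NULL);
    return h->stack_guard == STACK_GUARD_PATTERN
        && checksum16(h->config, sizeof h->config) == h->config_sum;
}
```

A main loop would call `health_ok()` before each kick and simply skip the kick when it returns false.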

Page 14: Watchdog Timers

System Health Assessment

If the state of the system is compromised, let the watchdog timer perform the reset. This is a better approach than an application “pseudo-reset”.

Watchdog timers, themselves, can also adversely affect the system.

◦ Setting a watchdog interval too short will generate a premature reset.

◦ If a critical section of code takes 80 milliseconds to complete, do not set the watchdog interval for 60 milliseconds.

Page 15: Watchdog Timers

State-based Watchdog

To guarantee that the software executes as intended, incorporate a simple state machine.

◦ This involves adjusting a state variable at the beginning of a program iteration.

◦ Prior to resetting the watchdog timer at the end of the program iteration, verify that the state is “correct”.

Prevents random code from wandering into the main loop and kicking the dog.

Enforces a constraint on program sequence.

Page 16: Watchdog Timers

State-based Watchdog Example

    void watchdog_state_advance(void)
    {
        g_usWatchdogState += 0x1111;
    }

    void watchdog_state_validate(void)
    {
        g_usWatchdogStatePrev += 0x1111;

        if (g_usWatchdogState != g_usWatchdogStatePrev) {
            // State is invalid, allow watchdog to reset.
            SLEEP();
        } else {
            // Reset the watchdog timer.
            WDT_RESET();
        }
    }

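Note: Calling the validate function more than once per iteration, without a matching advance, puts the two state variables out of step, so the watchdog is no longer kicked and a watchdog reset follows.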
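The behavior in the note can be checked on a host with a stand-in for the two routines. This sketch replaces `WDT_RESET()` and `SLEEP()` with a boolean return value so the outcome is observable. The names `state_reset`, `state_advance`, `state_validate`, and `run_iterations` are hypothetical, and the 16-bit width of the state variables is an assumption based on the `us` prefix in the original identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins for the watchdog state variables. */
static uint16_t state;
static uint16_t state_prev;

void state_reset(void)
{
    state = 0;
    state_prev = 0;
}

/* Called at the start of each program iteration. */
void state_advance(void)
{
    state = (uint16_t)(state + 0x1111u);
}

/* Called at the end of each iteration. Returns true where the real
 * code would kick the watchdog, false where it would let it expire. */
bool state_validate(void)
{
    state_prev = (uint16_t)(state_prev + 0x1111u);
    return state == state_prev;
}

/* Runs n well-formed iterations and returns how many kicks occurred. */
unsigned run_iterations(unsigned n)
{
    unsigned kicks = 0;
    for (unsigned i = 0; i < n; i++) {
        state_advance();
        if (state_validate()) {
            kicks++;
        }
    }
    return kicks;
}
```

Both variables wrap around together, so well-formed iterations keep kicking indefinitely. A single unmatched call to `state_validate()` leaves the pair permanently out of step, which is the point: stray code that reaches the validate routine cannot keep the system alive.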

Page 17: Watchdog Timers

Debugging Information

If software detects a fault condition, log the error information prior to allowing the watchdog to reset the system.

◦ Allows the cause of the failure to be addressed.

◦ A report of the error should be attempted when the system resets (part of initialization perhaps).

In addition to reporting errors after reset, it is a good idea to indicate that the device has been reset.

◦ If the software was unable to catch the error, it will still attempt to notify of the reset event.

◦ Systems that appear “sluggish” may actually be experiencing frequent watchdog resets.
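One way to carry error information across a reset is a small record in memory that the startup code does not clear, validated by a magic number. The sketch below models that record as an ordinary struct. The names `reset_record_t`, `record_error`, and `record_boot` are hypothetical, and placing the record in non-initialized RAM is toolchain-specific and not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RECORD_MAGIC 0xB007u

/* Illustrative record that is assumed to survive a watchdog reset. */
typedef struct {
    uint16_t magic;         /* RECORD_MAGIC once the record is valid   */
    uint16_t boot_count;    /* boots seen since power was applied      */
    uint16_t pending_error; /* nonzero: error logged before last reset */
} reset_record_t;

/* Called when a fault is detected, before letting the watchdog expire. */
void record_error(reset_record_t *r, uint16_t code)
{
    assert(r != NULL);
    r->pending_error = code;
}

/* Called during initialization. Returns the error logged before the
 * reset (0 if none) and counts the boot, so that even an uncaught
 * failure shows up as boot_count greater than 1. */
uint16_t record_boot(reset_record_t *r)
{
    assert(r != NULL);
    uint16_t code = 0;

    if (r->magic != RECORD_MAGIC) {
        /* Cold power-up: contents are garbage, start a fresh record. */
        r->magic = RECORD_MAGIC;
        r->boot_count = 0;
    } else {
        code = r->pending_error;
    }
    r->boot_count++;
    r->pending_error = 0;
    return code;
}
```

During initialization the application would report the returned code, and report `boot_count` whenever it exceeds 1, which also exposes the "sluggish" systems that are quietly resetting.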

Page 18: Watchdog Timers

Single-threaded Implementation

Single-threaded implementations should reset the watchdog timer in the main software loop.

To determine the proper watchdog timeout duration, the programmer must determine the amount of time it takes to execute the code, using worst case scenarios.

◦ Many systems do not require “tight” timing.

◦ In these cases, setting the timeout to a very large safe value may be acceptable, just to provide a protection against deadlocks.

Prior to resetting the watchdog, verify that the state of the system is valid, and system health is normal.

Page 19: Watchdog Timers

Single-threaded Example

    int main(void)
    {
        hwinit();

        for (;;) {
            watchdog_state_advance();

            read_sensors();
            control_motor();
            display_status();

            if (system_check() == S_OK) {
                // Kick the dog.
                watchdog_state_validate();
            } else {
                // Do not kick; the watchdog will reset the system.
                flash_led();
                report_error(E_FAIL);
            }
        }
    }

Page 20: Watchdog Timers

Multi-threaded Implementation

The same concepts used in a single-threaded design are also applicable for multi-threaded implementations.

Avoid creating a thread that simply resets the watchdog timer at regular intervals.

◦ Other threads could fail, and the watchdog thread would keep kicking the dog.

Generate a set of flags or data from each thread that can be validated in a “monitoring” thread.

◦ The monitoring thread should reset the watchdog at regular intervals only if the data produced by the other threads is acceptable.
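The flag-based variant can be sketched with one bit per thread: each worker sets its bit when it completes an iteration, and the monitor kicks only if every bit was set since the last check. This is an illustration under stated assumptions: the task names and the functions `task_checkin` and `monitor_should_kick` are hypothetical, and C11 `<stdatomic.h>` is assumed to be available so the read-modify-write operations are safe across threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One check-in bit per supervised thread (names are illustrative). */
#define TASK_SENSOR  (1u << 0)
#define TASK_MOTOR   (1u << 1)
#define TASK_DISPLAY (1u << 2)
#define TASK_ALL     (TASK_SENSOR | TASK_MOTOR | TASK_DISPLAY)

/* Shared check-in flags; static storage starts at zero. */
static atomic_uint g_checkins;

/* Called by a worker thread after it completes an iteration. */
void task_checkin(unsigned task_bit)
{
    assert(task_bit != 0 && (task_bit & ~TASK_ALL) == 0);
    atomic_fetch_or(&g_checkins, task_bit);
}

/* Called by the monitoring thread once per period. Reads and clears
 * the flags in one atomic step, then reports whether every thread
 * checked in during the period that just ended. */
bool monitor_should_kick(void)
{
    unsigned seen = atomic_exchange(&g_checkins, 0u);
    return (seen & TASK_ALL) == TASK_ALL;
}
```

The monitoring thread would kick the watchdog only when `monitor_should_kick()` returns true. Flags confirm that each thread is alive, but not how often it runs; that requires counters.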

Page 21: Watchdog Timers

Multi-threaded Monitoring

[Figure: Monitoring Task and System Tasks]

Page 22: Watchdog Timers

Multi-threaded Frequency

An important criterion that can be used to validate the health of the system is the execution frequency of the worker threads.

This can be accomplished by incorporating a simple counter that is incremented on each iteration of a worker thread.

These counters are then monitored by the monitoring thread and compared with threshold values.

◦ If the execution period of the monitoring task is significantly longer than that of the worker threads, the monitoring task can check each counter against a window of thresholds.

◦ This allows the software to validate timing constraints.

Page 23: Watchdog Timers

Multi-threaded Example: System Threads

    void thread_read_sensor(void)
    {
        for (;;) {
            read_sensors();
            thread_sensor_cnt++;
            sleep(50);
        }
    }

    void thread_control_motor(void)
    {
        for (;;) {
            control_motor();
            thread_motor_cnt++;
            sleep(100);
        }
    }

    void thread_display_status(void)
    {
        for (;;) {
            display_status();
            thread_display_cnt++;
            sleep(125);
        }
    }

Note: Each thread maintains a unique execution counter.

Page 24: Watchdog Timers

Multi-threaded Example: Monitoring Thread

    int main(void)
    {
        hwinit();
        launch_threads();

        for (;;) {
            watchdog_state_advance();

            // Sleep monitoring task for 1 sec so the system
            // threads can accumulate their counts before the check.
            sleep(1000);

            if (system_check() == S_OK
                && thread_sensor_cnt > 18 && thread_sensor_cnt < 22
                && thread_motor_cnt > 8 && thread_motor_cnt < 12
                && thread_display_cnt > 6 && thread_display_cnt < 10) {
                // Kick the dog.
                watchdog_state_validate();

                // Reset counters.
                thread_sensor_cnt = 0;
                thread_motor_cnt = 0;
                thread_display_cnt = 0;
            } else {
                flash_led();
                report_error(E_FAIL);
            }
        }
    }

Note: The relative frequency of each system thread is checked here. A small window is applied to each.

Page 25: Watchdog Timers

Mars Pathfinder

In July of 1997, a priority inversion occurred on the Mars Pathfinder mission, after the craft had landed on the Martian surface.

A high priority communications task was forced to wait on a mutex held by a lower priority “science” task.

The timing of the software was compromised, and a system reset issued by its watchdog timer brought the system back to normal operating conditions.

On Earth, scientists were able to identify the problem and upload new code to fix it.

Thus, the rest of the $265 million mission could be completed successfully.

Page 26: Watchdog Timers

Conclusion

Watchdog timers can add a great deal of reliability to embedded systems if used properly.

To do so requires a good overall approach. Resetting the watchdog timer must be part of the overall design.

Verify the operational integrity of the system, and use this as a criterion for resetting the watchdog timer.

In addition to validating that the software “does the right thing”, verify that it does so in the time expected.

Assume the software will experience a hardware malfunction or software fault. Add enough debugging information to help debug the situation.

Page 27: Watchdog Timers

Questions?

Page 28: Watchdog Timers

References

[1] Barr, M. (2001). Introduction to Watchdog Timers, http://www.netrino.com/Publications/Glossary/WatchdogTimer.php

[2] Barr, M. (2002). Introduction to Priority Inversion, http://www.netrino.com/Publications/Glossary/PriorityInversion.php

[3] Ganssle, J. (2004, January). Great Watchdogs, http://darwin.bio.uci.edu/~sustain/bio65/Titlpage.htm

[4] Murphy, N. Watchdog Timers, Embedded Systems Programming, http://www.embedded.com/2000/0011/0011feat4.htm

[5] Maxim Integrated Products, Inc. (2005, December). Supervisory Circuits with Windowed (Min/Max) Watchdog and Manual Reset, http://datasheets.maxim-ic.com/en/ds/MAX6323-MAX6324.pdf

[6] Texas Instruments, Inc. (2006). MSP430x1xx Family User’s Guide, http://focus.ti.com/lit/ug/slau049f/slau049f.pdf