22
Continuously Recording Program Execution for Deterministic Replay Debugging

Continuously Recording Program Execution for Deterministic Replay Debugging

  • Upload
    roxy

  • View
    43

  • Download
    1

Embed Size (px)

DESCRIPTION

Continuously Recording Program Execution for Deterministic Replay Debugging. Motivation. Current Scenario Increasing Software Complexity Difficult to guarantee correctness Released software contain bugs Problem Bugs manifest at customer site Difficult to reproduce bugs at developer site - PowerPoint PPT Presentation

Citation preview

Page 1: Continuously Recording Program Execution for Deterministic Replay Debugging

Continuously Recording Program Execution for Deterministic Replay Debugging

Page 2: Continuously Recording Program Execution for Deterministic Replay Debugging

Motivation Current Scenario

Increasing Software Complexity Difficult to guarantee correctness Released software contain bugs

Problem Bugs manifest at customer site Difficult to reproduce bugs at developer site

Solution Continuously record information about program execution, even

during production runs SW vs. HW for Recording

Page 3: Continuously Recording Program Execution for Deterministic Replay Debugging

Conventional Debugging

Examine core dump Developer can examine final system state just before the crash Very challenging to determine the root cause

Customer site (or even during testing)

Debugging at developer’s site

Core dump

Core dump

Program ExecutionBug source

Crash

Page 4: Continuously Recording Program Execution for Deterministic Replay Debugging

Deterministic Replay Debugging

Developer Site

Crash

Replay

Replay Window

Program Execution

Bug occurs

Crash

Customer Site

ContinuousRecording

What is Deterministic Replay?

Executing same sequence of instructions with same input operandslike in original execution

Page 5: Continuously Recording Program Execution for Deterministic Replay Debugging

Deterministic Replay Debugging

Developer Site

Crash

Replay

Replay Window

Program Execution

Bug occurs

Crash

Customer Site

ContinuousRecording

Deterministic Replay Debugging

• Debugger can examine variable values

•Helps figuring out root cause of bug

•Reproduce even non-deterministic Bugs

Page 6: Continuously Recording Program Execution for Deterministic Replay Debugging

Overview

Program ExecutionCheckpointCheckpoint

Interval~10 million instr

Log Header• Program Counter• Arch Register Values• Process ID, Thread ID• Checkpoint ID• ..

Load Values

Only output of loads need to be logged

Input and output values of other instructions can be regenerated during replay

Page 7: Continuously Recording Program Execution for Deterministic Replay Debugging

First Load Log

Log load value only if the load is the first memory access to a location

HW Support: “FLL bits” for every

word in L1 and L2 caches Reset at the

beginning of a checkpoint interval

Set on access

Program Execution

First LoadLog (FLL)

Load A

Load A

Load B

Load A

Logged Not Logged

Page 8: Continuously Recording Program Execution for Deterministic Replay Debugging

Load B

First Load Log

Store values neverneed to be logged

Regenerated during replay

Load A

Store B

Program Execution

First LoadLog (FLL)

PROBLEMS

Memory location can be modified by stores in

• Interrupts, system calls

• Other threads in multithreaded programs

• DMA transfers

Page 9: Continuously Recording Program Execution for Deterministic Replay Debugging

Interrupts

Interrupt,System Call,Context Switch

PrematurelyTerminate checkpoint(FLL bits are reset)

New checkpoint startedAfter servicing interrupt(Start logging First loads)

Interrupts, system calls, I/O, DMA NOT tracked

BUT any values consumed later by the application will be logged, ON DEMAND, in the new checkpoint

Page 10: Continuously Recording Program Execution for Deterministic Replay Debugging

Support for Multi-threaded Programs

Page 11: Continuously Recording Program Execution for Deterministic Replay Debugging

Assumptions for Multithreaded Programs

Shared Memory Multi-threaded processors

Sequential Consistency Memory operations form a total order

Directory based Cache Coherence protocol

Page 12: Continuously Recording Program Execution for Deterministic Replay Debugging

Shared Memory Communication

Thread 1

Processor 1

A First Load Log (FLL) for each thread is collected locally

Problem : Shared memory

communication between threads Affects First Load

optimization

Thread 2

Processor 2

Page 13: Continuously Recording Program Execution for Deterministic Replay Debugging

Shared Memory CommunicationThread 1 Thread 2

Store A

Time

Load A

Load A

Load AInvalidate MessageResets FLL (First-Load Log) bitsfor the word A in Thread 1

Logged Not Logged

DMA are handled similarly as they use same coherence protocol

Page 14: Continuously Recording Program Execution for Deterministic Replay Debugging

Independently Replaying Threads

Thread 1 A thread can be replayed using its local FLL, independent of other threads

FLL checkpoints in different threads need not begin at the same time Prematurely terminating

checkpoints for interrupts becomes easier

Thread 2

Processor 1 Processor 2

Page 15: Continuously Recording Program Execution for Deterministic Replay Debugging

Logging Memory Order

Infer and debug data races Log order of memory operations executed across all the threads

SW vs. HW Locks associated with data locations Piggyback coherence replies with execution states (Thread-ID,

Checkpoint-ID, Inst Count) of sender thread

Page 16: Continuously Recording Program Execution for Deterministic Replay Debugging

Memory Race LogThread X Thread Y

Resets first-loadbit for A

ICx Store AInvalidate

Invalidate Ack (Y, CP_ID1, ICy)

For Thread XLog (ICx, Y, CP_ID1, ICy)

Will be used to determineorder of Store A wrt memory operations in other threads

CP_ID 1

CP_ID 2

CP_ID 3

CP_ID 1

CP_ID 2

Executing STORE

(ICx)

Page 17: Continuously Recording Program Execution for Deterministic Replay Debugging

Memory Race LogThread X Thread Y

ICy Load AWrite update request

(X, CPId3, ICx)

For Thread YLog (ICy, X, CPid3, ICx)

cid 3

cid 4

cid 5

cid 3

cid 4

Write update reply

Executing LOAD

(ICy)

Page 18: Continuously Recording Program Execution for Deterministic Replay Debugging

Checkpoint Management

Oldest checkpoint discarded when allocated main memory space is full

Checkpoint Interval length chosen based on available main memory space

Tradeoff

Smaller the checkpoint interval lesser the information loss when a checkpoint is discarded

Larger the checkpoint interval lesser the information/instruction that need to be logged Reason: First-Load optimization

Page 19: Continuously Recording Program Execution for Deterministic Replay Debugging

How to replay a checkpoint?

Replay using a functional simulator – eg: Simics Can be integrated into conventional debuggers

Steps: Load the binaries into the same address locations like in the

original location

Initialize state of PC and architectural registers

Start emulating instructions

For first loads, get the value from FLL, else get value from simulated memory

Core Dump Not Required

Page 20: Continuously Recording Program Execution for Deterministic Replay Debugging

Replay Window Length

Program Execution

Execution ofLatest instance of“buggy instruction”

Crash

Lower Bound on Replay Window Length

Number of dynamic instructions between the latest execution of the buggy instruction and the crash

Page 21: Continuously Recording Program Execution for Deterministic Replay Debugging

Limitation

Debugging ability

Debugging OS code not possible BUT, memory values modified during interrupts, I/O and DMA will be

captured in FLL Hence, the application with limited interactions with OS can be

debugged

No Core Dump Values of data structures untouched during replay window are

unknown BUT, values responsible for bug can be found in the log or

reproduced during replay if the replay window is large enough to capture the source of bug

If a variable is not accessed between the source of bug and the crashthen it should not be a reason for the crash

Page 22: Continuously Recording Program Execution for Deterministic Replay Debugging

Limitation

Replay window not long enough Problem:

Cause of bug lie outside replay window

Reason: Limited storage space -- Depends on amount of main memory

to devote to capture logs