Towards Adaptive Caching for Parallel and Distributed Simulation



Abhishek Chugh & Maria Hybinette

Computer Science Department

The University of Georgia

WSC-2004

Simulation Model Assumptions

[Figure: example airspace simulation spanning Atlanta and Munich, modeled as LPs exchanging messages]

Collection of Logical Processes (LPs)
LPs do not share state variables
LPs communicate by exchanging time-stamped messages
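As a small illustration of these assumptions, a time-stamped LP message might look like the following; the type and field names are hypothetical, not from the paper:

    /* Hypothetical time-stamped message exchanged between LPs. */
    typedef struct {
        double timestamp;   /* simulation time at which the event occurs */
        int    src_lp;      /* sending logical process                   */
        int    dst_lp;      /* receiving logical process                 */
        void  *payload;     /* application-specific event data           */
    } Message;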


Problem & Goal

Problem: inefficiency in PDES due to redundant computations

Observation: computations repeat in:
» long runs of simulations
» cyclic systems
» communication network simulations

Goal: increase efficiency by reusing computations


Approach

Cache computations and, when they repeat, re-use them instead of re-computing.

[Figure: LPs exchanging time-stamped messages, with a cache serving repeated computations]


Approach: Adaptive Caching

Cache computations and, when they repeat, re-use them instead of re-computing.

Generic caching mechanism independent of simulation engine and application

Caveat: several factors affect the effectiveness of caching
» Proposal: an adaptive approach


Factors Affecting Caching Effectiveness

Cache size
Cost of looking up into the cache and updating the cache
Execution time of the computation
Probability of a hit (hit rate)


Effective Caching Cost

E(Cost_use_cache) = hit_rate * Cost_lookup_hit
                  + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert)


Caching is Not Always a Good Idea

E(Cost_use_cache) = hit_rate * Cost_lookup_hit
                  + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert)

Caching hurts when the hit rate is low or the computation is very fast: it is worthwhile only when E(Cost_use_cache) < Cost_computation.
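Rearranging the inequality makes the break-even point explicit: assuming Cost_computation > Cost_lookup_hit, caching pays off exactly when

hit_rate > (Cost_lookup_miss + Cost_insert) / (Cost_lookup_miss + Cost_insert + Cost_computation - Cost_lookup_hit)

so the more expensive the computation, the lower the hit rate needed to justify the cache.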


How Much Speedup is Possible?

Neglecting cache warm-up and fixed costs:

Expected Speedup = Cost_computation / Cost_use_cache

Upper bound (hit_rate = 1):

Expected Speedup = Cost_computation / Cost_lookup

In our experiments, Cost_computation / Cost_lookup ≈ 3.5
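As a worked example using the figures above (ignoring Cost_insert and taking Cost_lookup_hit ≈ Cost_lookup_miss, both simplifying assumptions): with hit_rate = 0.7 and Cost_lookup = Cost_computation / 3.5, E(Cost_use_cache) = 0.7 * Cost_lookup + 0.3 * (Cost_lookup + Cost_computation) ≈ 0.59 * Cost_computation, for an expected speedup of about 1.7.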


Related Work

Function caching: replace application-level function calls with cache queries
» Introduced by Bellman (1957); Michie (1968)
» Incremental computations:
– Pugh & Teitelbaum (1989); Liu & Teitelbaum (1995)
» Sequential discrete event simulation:
– Staged Simulation: Walsh & Sirer (2003): function caching plus currying (breaking up computations), re-ordering, and pre-computation

Decision tool techniques for PADS: multiple runs of similar simulations
» Simulation Cloning: Hybinette & Fujimoto (1998); Chen, Turner et al. (2002); Strassburger (2000)
» Updateable Simulations: Ferenci et al. (2002)

Related optimization techniques
» Lazy re-evaluation: West (1988)


Overview of Adaptive Caching

Execution time:

1. Warm-up execution phase, for each function:
   a) Monitor: hit rate, query time, function run time
   b) Determine the utility of using the cache
2. Main execution phase, for each function:
   a) Use the cache (or not), depending on the results from phase 1
   b) Randomly sample: hit rate, query time, function run time
      » Revise the decision if conditions change

A sketch of this decision logic follows the list.
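A minimal sketch of how the per-function decision might be computed from the monitored quantities, using the cost model above; the struct and names are illustrative assumptions, not the paper's API:

    #include <stdbool.h>

    /* Illustrative per-function statistics gathered during warm-up
     * and refreshed by random sampling. */
    typedef struct {
        double hit_rate;      /* observed fraction of cache hits          */
        double lookup_hit;    /* avg. cost of a lookup that hits          */
        double lookup_miss;   /* avg. cost of a lookup that misses        */
        double insert_cost;   /* avg. cost of inserting a result          */
        double compute_cost;  /* avg. cost of running the function itself */
        bool   use_cache;     /* current per-function decision            */
    } FuncStats;

    /* E(Cost_use_cache), exactly as in the cost model above. */
    static double expected_cache_cost(const FuncStats *s)
    {
        return s->hit_rate * s->lookup_hit
             + (1.0 - s->hit_rate)
               * (s->lookup_miss + s->compute_cost + s->insert_cost);
    }

    /* Called at the end of warm-up, and again whenever a random
     * sample suggests conditions have changed. */
    static void revise_decision(FuncStats *s)
    {
        s->use_cache = expected_cache_cost(s) < s->compute_cost;
    }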


What’s New

The decision to use the cache is made dynamically
» in response to unpredictable local conditions, for each LP, at execution time

Relieves the user of having to know whether something is worth caching
» the adaptive method automatically identifies caching opportunities and rejects poor caching choices

Easy-to-use caching API
» independent of the application and the simulation kernel
» cache middleware

Distributed cache
» each LP maintains its own independent cache


Pseudo-Code Example

// LP CODE WITH CACHING

LP_init()
{
    cacheInitialize( argc, argv );
}

Proc( state, msg, MyPE )
{
    retval = cacheCheckStart( currentstate, event );
    if ( retval == NULL )
    {
        /* original LP code: compute the new state and
           the events to be scheduled */

        /* allow the cache to save the results */
        cacheCheckEnd( newstate, newevents );
    }
    else
    {
        newstate  = retval.state;
        newevents = retval.events;
    }
    schedule( newevents );
}
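Bracketing the event handler between cacheCheckStart() and cacheCheckEnd() is what lets the middleware capture the computation's outputs on a miss and replay them on a hit; the body of the LP code itself needs no structural revision.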


Implementation


Caching Middleware

Simulation Application

Cache Middleware

Simulation Kernel


Caching Middleware (Hit)

Simulation Application

Cache Middleware

Simulation Kernel

Check the cache with the current state/message: on a cache hit, the cached result is returned without invoking the computation.


Caching Middleware (Miss)

Simulation Application

Cache Middleware

Simulation Kernel

Check the cache with the current state/message: cache miss
On a miss (or when the cache lookup is too expensive), the original computation runs
The new state & messages are then inserted into the cache


Cache Implementation

Hash table with separate chaining
Input: current state & message
Output: new state and output message(s)
Hash function: djb2 (by Dan Bernstein; a variant is used in Perl)
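A minimal sketch of such a table, assuming the key is the serialized bytes of the (state, message) pair; the layout and names are illustrative, not the paper's implementation:

    #include <string.h>

    #define NUM_BUCKETS 4096

    /* One cached computation: key = (state, message), value = results. */
    typedef struct CacheEntry {
        void  *key;               /* serialized (state, message) bytes */
        size_t key_len;
        void  *value;             /* saved new state + output messages */
        struct CacheEntry *next;  /* separate chaining on collision    */
    } CacheEntry;

    static CacheEntry *buckets[NUM_BUCKETS];

    /* djb2 by Dan Bernstein: hash = hash * 33 + c */
    static unsigned long djb2(const unsigned char *p, size_t n)
    {
        unsigned long h = 5381;
        while (n--)
            h = ((h << 5) + h) + *p++;
        return h;
    }

    /* Return the cached value for a key, or NULL on a miss. */
    static void *cache_lookup(const void *key, size_t key_len)
    {
        unsigned long b = djb2(key, key_len) % NUM_BUCKETS;
        for (CacheEntry *e = buckets[b]; e != NULL; e = e->next)
            if (e->key_len == key_len && memcmp(e->key, key, key_len) == 0)
                return e->value;
        return NULL;
    }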


Memory Management

Distributed cache: one for each LP
Pre-allocate a memory pool for the cache in each LP during the initialization phase
Upper limit is parameterized
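A sketch of the pre-allocated pool, assuming a simple bump allocator capped at the parameterized limit; the interface is an assumption for illustration:

    #include <stddef.h>
    #include <stdlib.h>

    /* Per-LP memory pool, reserved once during initialization. */
    typedef struct {
        char  *base;    /* start of the pre-allocated block   */
        size_t limit;   /* parameterized upper bound in bytes */
        size_t used;    /* bump-pointer offset                */
    } CachePool;

    static int pool_init(CachePool *p, size_t limit)
    {
        p->base  = malloc(limit);   /* reserve the whole pool up front */
        p->limit = limit;
        p->used  = 0;
        return p->base != NULL;
    }

    /* Hand out cache entries from the pool; returns NULL once the
     * limit is reached, at which point the cache stops growing. */
    static void *pool_alloc(CachePool *p, size_t n)
    {
        n = (n + 15) & ~(size_t)15;   /* keep allocations 16-byte aligned */
        if (p->used + n > p->limit)
            return NULL;
        void *out = p->base + p->used;
        p->used += n;
        return out;
    }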


Experiments

3 sets of experiments with P-Hold:
» Proof of concept (no adaptive caching): hit rate
» Impact of cache size and simulation running time on speedup (no caching / caching)
» Adaptive caching with regard to the cost of event computation

16-processor SGI Origin 2000
» 4 processors used

Time stamps "curried" out

Hit Rate versus Progress

[Figure: hit rate (%) vs. progress in simulated time, for three cache sizes up to 10000 KB (100%)]

As expected, the hit rate increases as the cache size increases; the largest cache reaches the maximum hit rate. The hit rate sets an upper bound on the achievable speedup.


Speedup vs Cache Size

[Figure: speedup (no caching / caching) vs. size of cache (KB), for event computation times of 5 msec and 3 msec]

Speedup improves as the size of the cache increases. Beyond roughly 9,000 KB the speedup declines and then levels off. Performance is better for simulations whose computations have higher latency.

Speedup vs Cost_computation

Non-adaptive caching suffers a slowdown (speedup of 0.82) for low-latency computations, improving to a speedup of 1 as the computational latency approaches 1.5 msec.

[Figure: speedup (caching / no caching) vs. computational latency (msec), non-adaptive caching]

Speedup vs Cost_computation

Adaptive caching tracks the cost of consulting the cache against the cost of running the actual computation.

Adaptive caching holds the speedup at 1 for small computational latencies (it selects performing the computation instead of consulting the cache).

[Figure: speedup (caching / no caching) vs. computational latency (msec), non-adaptive vs. adaptive caching]


Summary & Future Work

Summary:
Middleware implementation that requires no major structural revision of application code
Best-case speedup approaches 3.5; worst-case speedup is 1 (speedup is limited by a hit rate of 70%)
With randomly generated information (such as time stamps), caching may become ineffective unless precautions are taken

Future Work:
Function caching instead of LP caching
Look at series of functions to jump forward
Adaptive replacement strategies


Closing

“A sword wielded poorly will kill its owner”

-- Ancient Proverb


Pseudo-Code Example

// ORIGINAL LP CODE

LP_init()
{
    //
    //
}

Proc( state, msg, MyPE )
{
    val1 = fancy_function( msg->param1, state->key_part );
    val2 = fancier_function( msg->param3 );
    state->key_part = val1 + val2;
}

// LP CODE WITH CACHING

LP_init()
{
    cache_init( FF1, SIZE1, 2, fancy_function );
    cache_init( FF2, SIZE2, 1, fancier_function );
}

Proc( state, msg, MyPE )
{
    val1 = cache_query( FF1, msg->param1, state->key_part );
    val2 = cache_query( FF2, msg->param3 );
    state->key_part = val1 + val2;
}
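A reading of this example (an inference, not stated in the deck): the third argument to cache_init() appears to be the arity of the cached function (2 for fancy_function, 1 for fancier_function), while FF1/FF2 and SIZE1/SIZE2 name the per-function cache handles and their size limits. Each original call site then becomes a cache_query() on the corresponding handle.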
