Dancing With Uncertainty

Saša Misailović, Stelios Sidiroglou, Martin Rinard
MIT CSAIL


Example: Water

Simulates a system of water molecules.

[Figure: a system of H2O molecules]


Dubstep

Explores the effects of selectively removing synchronization.

Dubstep Highlights

1. Removing locks and introducing opportunistic barriers trade accuracy for performance
2. Automatically explores the tradeoff space induced by candidate transformations
3. Uses statistical analysis to characterize the impact of transformations on accuracy

Dubstep Workflow

Prepare → Find → Transform → Analyze → Navigate

Dubstep Workflow: Prepare

1. Prepare representative inputs
2. Prepare an accuracy model:
– Output abstraction (the important parts of the output)
– Accuracy bound (the amount of tolerable error)


Dubstep Workflow: Find

• Loops with parallel constructs
• Profiling: performance & memory
– Interf (56.4%)
– Poteng (43.4%)

Dubstep Workflow: Transform (removing synchronization)

Original code:

    void scratchPad::updateForces(double R[3][3]) {
      mutex_lock(this->lock);
      // accumulate force contributions on the two hydrogen atoms and the oxygen atom
      this->H1force.vecAdd(R[0]);
      this->Oforce.vecAdd(R[1]);
      this->H2force.vecAdd(R[2]);
      mutex_unlock(this->lock);
    }


Transformed code (lock acquire and release removed):

    void scratchPad::updateForces(double R[3][3]) {
      // no lock: concurrent calls may now interleave their updates
      this->H1force.vecAdd(R[0]);
      this->Oforce.vecAdd(R[1]);
      this->H2force.vecAdd(R[2]);
    }
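Why removing the lock can change the result: assuming each vecAdd is an in-place read-modify-write of a shared force vector, two unsynchronized calls can interleave so that one update overwrites the other. The sketch below is a hypothetical, deterministic model of that race, not Water or Dubstep code: two threads each perform `shared += delta` as a separate load and store, and `schedule` chooses which thread takes its next step. Modeling the interleaving explicitly avoids running an actual data race, which is undefined behavior in C++.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical model of an unsynchronized "shared += delta" executed by two threads.
// Each thread takes two steps: a load of the shared value into a private copy,
// then a store of (private copy + delta). `schedule` lists the thread (0 or 1)
// that takes its next step; steps by a thread that has already stored are ignored.
double simulate_unsynchronized_add(double shared,
                                   const std::array<double, 2>& delta,
                                   const std::vector<int>& schedule) {
  std::array<double, 2> local{};  // each thread's private copy after its load
  std::array<int, 2> step{0, 0};  // 0 = load next, 1 = store next, 2 = done
  for (int t : schedule) {
    assert(t == 0 || t == 1);
    if (step[t] == 0) {
      local[t] = shared;             // read the shared value
      step[t] = 1;
    } else if (step[t] == 1) {
      shared = local[t] + delta[t];  // write back, overwriting any update made since the read
      step[t] = 2;
    }
  }
  return shared;
}
```

Starting from 0 with deltas {1.0, 2.0}, the serialized schedule {0, 0, 1, 1} (the order the lock enforces) returns 3.0, while the interleaved schedule {0, 1, 0, 1} returns 2.0: thread 0's update is lost.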


Dubstep Workflow: Transform (opportunistic barriers)

    void ensemble::interf() {
      parallel_for(interf_body, 0, NumMol-1);
    }

parallel_for:
• Schedule threads
• Execute interf_body in parallel
• Wait for all threads to complete

Transformed code (parallel_for* denotes parallel_for with an opportunistic barrier):

    void ensemble::interf() {
      parallel_for*(interf_body, 0, NumMol-1);
    }

parallel_for*:
• Schedule threads
• Execute interf_body in parallel
• Wait for half of the threads to complete
• Instruct the remaining threads to stop

[Rinard, OOPSLA 2007]
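The sketch below shows one way an opportunistic barrier could be built from standard C++ threads. It is an illustration under stated assumptions, not the Dubstep or Water runtime: the name `parallel_for_opportunistic`, the half-open range [lo, hi), the contiguous chunking, and the cooperative stop flag (threads check it before each iteration instead of being forcibly terminated) are all assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct BarrierStats {
  unsigned finished_threads;   // threads that ran their whole chunk
  std::size_t iterations_run;  // loop iterations actually executed
};

// Hypothetical parallel_for* sketch: split [lo, hi) into nthreads contiguous chunks,
// wait until half of the threads (rounded up) have finished, then raise a stop flag
// that the remaining threads check before each iteration.
// `body` must be safe to call concurrently.
BarrierStats parallel_for_opportunistic(const std::function<void(int)>& body,
                                        int lo, int hi, unsigned nthreads) {
  assert(nthreads > 0 && lo <= hi);
  std::atomic<bool> stop{false};
  std::atomic<std::size_t> iterations{0};
  std::mutex m;
  std::condition_variable cv;
  unsigned finished = 0;  // guarded by m

  const long long n = static_cast<long long>(hi) - lo;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nthreads; ++t) {
    const int begin = static_cast<int>(lo + n * t / nthreads);
    const int end = static_cast<int>(lo + n * (t + 1) / nthreads);
    workers.emplace_back([&, begin, end] {
      for (int i = begin; i < end; ++i) {
        if (stop.load()) return;  // told to stop: abandon the rest of this chunk
        body(i);
        iterations.fetch_add(1);
      }
      std::lock_guard<std::mutex> lock(m);
      ++finished;
      cv.notify_one();
    });
  }

  {
    // Opportunistic barrier: wait for half of the threads, not all of them.
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [&] { return finished >= (nthreads + 1) / 2; });
  }
  stop.store(true);  // instruct the remaining threads to stop
  for (std::thread& w : workers) w.join();
  return {finished, iterations.load()};
}
```

The stop flag is only raised after the wait succeeds, so no thread is interrupted before half of them have completed; the returned counters show how much of the loop actually ran, i.e. how much work was skipped in exchange for accuracy.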

Dubstep Workflow: Analyze

Analyze the transformed program:
• Criticality: memory safety, integrity
• Performance: speedup comparison
• Accuracy: statistical analysis

[Figure: the same input is run through the original program and the transformed program; an application-specific output abstraction is applied to both outputs, and the difference between the abstracted outputs must be less than the bound δ.]
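A minimal sketch of this comparison, assuming the output abstraction reduces each program's output to a vector of numbers and that each component's relative difference |(o − t)/o| is compared against δ, as in the reliability definition used later. The function name and the handling of zero-valued components are illustrative assumptions; the abstraction itself remains application-specific.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical check: the transformed output is acceptable when every component of its
// abstracted output is within relative distance delta of the original's component.
bool within_bound(const std::vector<double>& abstract_original,
                  const std::vector<double>& abstract_transformed, double delta) {
  assert(abstract_original.size() == abstract_transformed.size());
  for (std::size_t i = 0; i < abstract_original.size(); ++i) {
    const double o = abstract_original[i];
    const double t = abstract_transformed[i];
    if (o == 0.0) {
      // Relative error is undefined here; this sketch accepts only an exact match.
      if (t != 0.0) return false;
      continue;
    }
    if (std::fabs((o - t) / o) > delta) return false;  // relative difference exceeds the bound
  }
  return true;
}
```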

Dubstep Workflow: Navigate

Navigate the tradeoff space:
• Transform and analyze one location at a time
– 3 locations in Water
• Transform multiple locations in the same candidate program
– Guided by the results of the previous step
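One way to realize this two-phase search, sketched under assumptions: locations are numbered 0..n−1, a candidate program is a bitmask of the locations to transform, and `evaluate` is a hypothetical callback standing in for transforming that candidate and running the analysis. Phase 1 evaluates each location alone; phase 2 evaluates every combination of the locations that were acceptable on their own. Exhaustive enumeration in phase 2 is an assumption that is cheap for Water's 3 locations; the slide says only that multi-location candidates are guided by the single-location results.

```cpp
#include <cassert>
#include <functional>

// Result of transforming and analyzing one candidate program (hypothetical type).
struct Evaluation {
  double speedup;   // relative to the original parallel program
  bool acceptable;  // passed the accuracy analysis
};

// Returns the bitmask of the fastest acceptable candidate,
// or 0 (the original program) if no candidate is acceptable and faster.
unsigned navigate(unsigned n, const std::function<Evaluation(unsigned)>& evaluate) {
  assert(n < 32);
  unsigned passing = 0;  // locations that are acceptable on their own
  unsigned best = 0;
  double best_speedup = 1.0;
  // Phase 1: transform and analyze one location at a time.
  for (unsigned i = 0; i < n; ++i) {
    const Evaluation e = evaluate(1u << i);
    if (!e.acceptable) continue;
    passing |= 1u << i;
    if (e.speedup > best_speedup) { best = 1u << i; best_speedup = e.speedup; }
  }
  // Phase 2: combine only locations that passed phase 1 (all subsets of `passing`).
  for (unsigned s = passing; s != 0; s = (s - 1) & passing) {
    if ((s & (s - 1)) == 0) continue;  // single location: already evaluated
    const Evaluation e = evaluate(s);
    if (e.acceptable && e.speedup > best_speedup) { best = s; best_speedup = e.speedup; }
  }
  return best;
}
```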

Search Space Exploration

[Figure: Average Accuracy Loss vs. Speedup. Relative speedup (y-axis, 1.0 to 1.25) against accuracy loss (x-axis, 0 to 0.07) for the candidate transformations LI, BI, BR, LI+BI, LI+BP, BI+BP, and LI+BI+BP.]

LI: synchronization removal in Interf; BI: opportunistic barrier in Interf; BP: opportunistic barrier in Poteng

Baseline: the original parallel program runs 6.2 times faster than the sequential program on 8 cores.


How confident can we be about these observations?


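Execution Reliability

Execution reliability is the probability p that the transformed program, on a given input, produces a result whose relative error is at most the bound δ:

$$ p = \Pr\left[ \left| \frac{\mathit{Res}_O - \mathit{Res}_T}{\mathit{Res}_O} \right| \le \delta \right] $$

where Res_O and Res_T are the results of the original and transformed programs. While we cannot model p, we can specify a minimum acceptable reliability r.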

Determine whether the program's reliability p > r:
• Repeat the execution N times:
– Observation x_i = 1 if |(Res_O − Res_T)/Res_O| ≤ δ, else 0
• Compute the statistic p′
• Return Yes if p′ > r + ε
• Return No otherwise
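A minimal sketch of this fixed-N procedure, assuming p′ is the fraction of acceptable runs. The callback `run_is_acceptable` is a hypothetical stand-in for executing the transformed program once and applying the accuracy check. Note that the procedure itself gives no guidance on how large N must be for the answer to be trustworthy.

```cpp
#include <cassert>
#include <functional>

// Hypothetical fixed-N reliability check: run N times, record x_i in {0, 1},
// estimate p' as the fraction of acceptable runs, and answer Yes iff p' > r + epsilon.
bool reliability_exceeds(int n_runs, double r, double epsilon,
                         const std::function<bool()>& run_is_acceptable) {
  assert(n_runs > 0);
  int acceptable = 0;
  for (int i = 0; i < n_runs; ++i) {
    if (run_is_acceptable()) ++acceptable;  // observation x_i = 1
  }
  const double p_hat = static_cast<double>(acceptable) / n_runs;  // statistic p'
  return p_hat > r + epsilon;  // Yes if p' > r + epsilon, No otherwise
}
```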


How to pick N?

How Many Runs Are Enough?

A procedure that determines whether p > r should:
• Return the correct result most of the time, as controlled by
– the wrong decision rate
– the tolerance region
• Quickly identify extreme (very good or very bad) transformations

Statistical Analysis: Sequential Probability Ratio Test

Two hypotheses:
H0: p > r + ε
H1: p < r

• Collects one observation in every iteration
• Updates the likelihoods of H0 and H1 based on the most recent observation
• Stops when the probability of a wrong decision is below the specified wrong decision rate
• N is not fixed; it depends on the observations
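The slide does not say which SPRT variant Dubstep uses. The sketch below assumes Wald's classic test for a Bernoulli success probability, with the boundary hypotheses p = r + ε (H0) and p = r (H1) and both error probabilities set to the wrong decision rate; `observe` is a hypothetical callback that runs the transformed program once and reports whether the result was acceptable.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct SprtDecision {
  bool yes;  // true: accept H0 (reliable enough); false: accept H1
  int runs;  // number of observations consumed before deciding
};

// Sketch of Wald's SPRT with alpha = beta = wrong_decision_rate.
SprtDecision sprt(double r, double epsilon, double wrong_decision_rate,
                  const std::function<bool()>& observe) {
  const double p0 = r + epsilon;  // success probability under H0
  const double p1 = r;            // success probability under H1
  const double a = wrong_decision_rate;
  assert(0.0 < p1 && p1 < p0 && p0 < 1.0 && 0.0 < a && a < 0.5);
  const double upper = std::log((1.0 - a) / a);  // LLR at or above: accept H1 ("No")
  const double lower = std::log(a / (1.0 - a));  // LLR at or below: accept H0 ("Yes")
  double llr = 0.0;  // log( Pr[observations | H1] / Pr[observations | H0] )
  for (int n = 1;; ++n) {
    const bool acceptable = observe();  // one new observation per iteration
    llr += acceptable ? std::log(p1 / p0) : std::log((1.0 - p1) / (1.0 - p0));
    if (llr >= upper) return {false, n};
    if (llr <= lower) return {true, n};
  }
}
```

With r = 0.90, ε = 0.02, and a wrong decision rate of 0.10, this sketch answers Yes after 100 runs when every run is acceptable and No after 10 runs when none is. An unacceptable run moves the log-likelihood ratio about ten times as far as an acceptable one, which is why bad transformations are rejected so quickly.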

Example parameters:
• r (acceptable reliability) = 0.90
• Wrong decision rate = 0.10
• ε (tolerance region) = 0.02

• If the program always produces an acceptable result, the test says Yes after 100 runs
• If the program never produces an acceptable result, the test says No after 10 runs
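Where these run counts can come from, assuming the test is Wald's SPRT with the simple hypotheses p₀ = r + ε = 0.92 (H0) and p₁ = r = 0.90 (H1), and both error probabilities α and β equal to the wrong decision rate 0.10. The test stops as soon as the log-likelihood ratio ln(L₁/L₀) leaves the interval between the two thresholds:

```latex
\begin{aligned}
&\text{Decide No when } \ln\tfrac{L_1}{L_0} \ge \ln\tfrac{1-\beta}{\alpha} = \ln 9 \approx 2.1972,
 \quad \text{decide Yes when } \ln\tfrac{L_1}{L_0} \le \ln\tfrac{\beta}{1-\alpha} = -\ln 9\\
&\text{Every run acceptable: each run adds } \ln\tfrac{0.90}{0.92} \approx -0.021979,
 \text{ so } n \ge \tfrac{2.1972}{0.021979} \approx 99.97 \;\Rightarrow\; n = 100\\
&\text{No run acceptable: each run adds } \ln\tfrac{1-0.90}{1-0.92} = \ln 1.25 \approx 0.22314,
 \text{ so } n \ge \tfrac{2.1972}{0.22314} \approx 9.85 \;\Rightarrow\; n = 10
\end{aligned}
```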

Best transformation for each accuracy bound δ, with the same parameters (r = 0.90, wrong decision rate = 0.10, ε = 0.02):

Bound (δ)   Best transformation
0.01        LI
0.05        LI
0.10        LI+BI+BR
0.15        LI+BI+BR


Exploring Tradeoff Space

Quickstep [MIT-TR-2010-38, TECS/PEC 2012]
Start: sequential program with for loops
Transformations:
• Parallel loop introduction
• Synchronization, replication

Loop Perforation [ICSE 2010, ONWARD 2010, SAS 2011, FSE 2011]
Start: program with for loops
Transformations:
• Skip loop iterations (multiple forms)
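To make "skip loop iterations" concrete, the sketch below shows one form, stride-based perforation (executing every k-th iteration), applied to a hypothetical loop that averages an array. The function and the choice of computation are illustrative assumptions, and the other forms of perforation are not shown. The mean over the executed iterations stands in for the mean over all of them, so the result is exact only when the skipped values are representative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: the original loop visits every element (i += 1);
// the perforated loop visits only every `stride`-th element.
double perforated_mean(const std::vector<double>& xs, std::size_t stride) {
  assert(stride >= 1 && !xs.empty());
  double sum = 0.0;
  std::size_t executed = 0;
  for (std::size_t i = 0; i < xs.size(); i += stride) {  // skip stride - 1 iterations each time
    sum += xs[i];
    ++executed;
  }
  return sum / static_cast<double>(executed);  // average over the iterations that ran
}
```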


Dynamic Knobs [ASPLOS 2011]
Start: program with command-line parameters
Transformations:
• Alternate function versions activated by command-line parameters

NapRed [POPL 2012]
Start: program is a tree of Map-Reduce-style tasks
Transformations:
• Function substitution
• Reduction sampling

Dubstep [Today: RACES 2012]
Start: parallel program with for loops
Transformations:
• Removing locks
• Opportunistic barriers

Reasoning About Accuracy

Exploring levels of accuracy guarantees:
• Logic-based
• Probabilistic
• Statistical
• Empirical
