Korali - CSE-Lab


Korali: a High-Performance Multi-Intrusive Bayesian Inference Software for Large-Scale Scientific Models

G. Arampatzis, S. Martin, D. Wälchli, and P. Koumoutsakos

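[Figure: hierarchical model. Data sets 1 and 2 (stretching) have parameters $\vartheta_1, \vartheta_2$ with quantities $y^{ax}_i$, $y^{tr}_i$, $y^{ax,new}_i$, $y^{tr,new}_i$; data sets 3, ..., 7 (shear) have parameters $\vartheta_3, \dots, \vartheta_7$ with quantities $y^{sh}_i$, $y^{sh,new}_i$; the node $\vartheta_{new}$ is linked to $y_{new}$.]

$\vartheta_i = (Q_{1,i}, Q_{2,i}, \mu_{0,i}, \sigma_{st,i}), \quad i = 1, 2$

$\vartheta_i = (Q_{1,i}, Q_{2,i}, \mu_{0,i}, Q_{3,i}, Q_{4,i}, \sigma_{sh,i}), \quad i = 3, \dots, 7$

$\vartheta_{new} = (Q_1, Q_2, \mu_0, Q_3, Q_4, \sigma_{sh}, \sigma_{st})$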


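• Hierarchical Bayesian Inference

$p(\vartheta_1 \mid d_1, \mathcal{M}_1)$

• Optimal Sensor Placement

$U(s) = \int_{\mathcal{Y}} \int_{\mathcal{R}} \log \frac{p(y \mid r, s)}{p(y \mid s)} \, p(r) \, p(y \mid r, s) \, dr \, dy$

$s^\star = \arg\max_s U(s)$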
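The expected utility U(s) above can be estimated by nested Monte Carlo sampling. The sketch below is only an illustration under stated assumptions: the toy model y = r sin(πs) + noise, the standard normal prior on r, the noise level `sigma`, and all function names are hypothetical and are not taken from Korali.

```python
import numpy as np

def log_likelihood(y, r, s, sigma):
    # Assumed toy model: y = r * sin(pi * s) + Gaussian noise with std sigma
    mean = r * np.sin(np.pi * s)
    return -0.5 * ((y - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def expected_utility(s, n_outer=1000, n_inner=1000, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Outer samples: r ~ p(r), y ~ p(y | r, s)
    r_outer = rng.standard_normal(n_outer)
    y = r_outer * np.sin(np.pi * s) + sigma * rng.standard_normal(n_outer)
    # Inner samples estimate the evidence p(y | s) = E_r[p(y | r, s)]
    r_inner = rng.standard_normal(n_inner)
    ll_inner = log_likelihood(y[:, None], r_inner[None, :], s, sigma)
    shift = ll_inner.max(axis=1, keepdims=True)  # log-sum-exp stabilisation
    log_evidence = shift[:, 0] + np.log(np.mean(np.exp(ll_inner - shift), axis=1))
    # Average of log p(y | r, s) - log p(y | s) over the outer samples
    return float(np.mean(log_likelihood(y, r_outer, s, sigma) - log_evidence))

# Evaluate U(s) on a few candidate sensor locations and pick the maximiser
candidates = np.linspace(0.0, 1.0, 5)
utilities = [expected_utility(s) for s in candidates]
s_star = float(candidates[int(np.argmax(utilities))])
print("utilities:", utilities)
print("s* =", s_star)
```

For this linear-Gaussian toy model the utility has the closed form 0.5 log(1 + sin²(πs)/σ²), which gives a convenient check on the estimator. Reusing the same seed for every candidate keeps the comparison between locations less noisy.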

[Figure: Korali module tree]

Korali
  Problem
    Evaluation: Direct, GaussianProcess, Bayesian Inference (Reference, Custom), Hierarchical (Psi, Theta)
    Execution: Model, GaussianProcess
  Solver
    Executor
    Optimiser: CMA-ES, CCMA-ES, LM-CMA-ES, DEA, Rprop
    Sampler: MCMC, DRAM, TMCMC
  Conduit: Concurrent, Distributed, Sequential

Extensible

Ease of Use

Motivation

Workflow

Worker 0: Rank 0 (Core 0), Rank 1 (Core 1)
Worker 1: Rank 0 (Core 2), Rank 1 (Core 3)
Worker 2: Rank 0 (Core 4), Rank 1 (Core 5)
Worker 3: Rank 0 (Core 6), Rank 1 (Core 7)
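The worker layout above (four workers with two ranks each on eight cores) is consistent with a contiguous block assignment of cores to workers. A minimal sketch of that mapping follows; the function name and the contiguous-assignment rule are assumptions inferred from the figure labels, not a documented Korali interface.

```python
def core_layout(num_workers, ranks_per_worker):
    """Map (worker, local rank) to a global core index, assuming contiguous blocks."""
    layout = {}
    for worker in range(num_workers):
        for rank in range(ranks_per_worker):
            layout[(worker, rank)] = worker * ranks_per_worker + rank
    return layout

layout = core_layout(num_workers=4, ranks_per_worker=2)
for (worker, rank), core in layout.items():
    print(f"Worker {worker}: Rank {rank} (Core {core})")
```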

[Figure: Korali workflow between the Korali Supervisor and the Supercomputer.
Engine: Start Generation, Run Experiment, Check Termination, Engine finished.
Experiment I (Solver, Problem): Generate Samples, Preprocess Samples, Postprocess Results, Update State.
Experiment II (Solver, Problem): Generate Samples, Preprocess Samples, Postprocess Results, Update State.
Conduit: Distribute Samples, Collect Results.
Worker states: waiting for sample, busy.]
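The workflow labels suggest a generation loop in which a solver proposes samples, a conduit evaluates them, and the solver state is updated until a termination check passes. The sketch below is one possible reading of that loop, not Korali's implementation: the toy objective, the elite-averaging update rule, and every name in it are hypothetical, and the conduit is replaced by a plain sequential loop.

```python
import random

def model(x):
    # Hypothetical computational model: objective with its maximum at x = 2
    return -(x - 2.0) ** 2

def run_engine(max_generations=50, population=16, seed=1):
    rng = random.Random(seed)
    state = {"mean": 0.0, "step": 1.0, "best": None, "generation": 0}
    while True:
        # Generate Samples (solver)
        samples = [rng.gauss(state["mean"], state["step"]) for _ in range(population)]
        # Preprocess Samples (problem): clip to an assumed bounded domain
        samples = [min(max(x, -10.0), 10.0) for x in samples]
        # Distribute Samples / Collect Results (conduit): sequential stand-in
        results = [model(x) for x in samples]
        # Postprocess Results (problem): rank samples by objective value
        ranked = sorted(zip(results, samples), reverse=True)
        # Update State (solver): move towards the best quarter, shrink the step
        elite = [x for _, x in ranked[: population // 4]]
        state["mean"] = sum(elite) / len(elite)
        state["step"] *= 0.9
        state["generation"] += 1
        if state["best"] is None or ranked[0][0] > state["best"][0]:
            state["best"] = ranked[0]
        # Check Termination
        if state["generation"] >= max_generations or state["step"] < 1e-3:
            return state

final = run_engine()
print(final["generation"], final["mean"], final["best"])
```

In a distributed setting the list comprehension that evaluates `model` would be the part handed to the workers, which then alternate between "waiting for sample" and "busy".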

• Bayesian Inference

$p(\vartheta \mid d) = \frac{p(d \mid \vartheta) \, p(\vartheta)}{p(d)}$
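As a small worked instance of Bayes' rule for the posterior p(ϑ | d), the sketch below evaluates prior times likelihood on a grid and normalises by a numerical estimate of the evidence p(d). The data values, the Gaussian likelihood with known noise `sigma`, and the standard normal prior are all assumptions chosen for illustration.

```python
import numpy as np

data = np.array([1.8, 2.1, 2.4, 1.9])   # hypothetical observations d
sigma = 0.5                              # assumed known noise standard deviation
theta = np.linspace(-5.0, 5.0, 2001)     # grid over the scalar parameter
dtheta = theta[1] - theta[0]

# log p(theta): standard normal prior, up to an additive constant
log_prior = -0.5 * theta**2
# log p(d | theta): independent Gaussian errors, up to an additive constant
log_like = -0.5 * np.sum((data[:, None] - theta[None, :]) ** 2, axis=0) / sigma**2

# Posterior = likelihood * prior / evidence, computed stably in log space
log_post = log_prior + log_like
unnormalised = np.exp(log_post - log_post.max())
evidence = np.sum(unnormalised) * dtheta          # proportional to p(d)
posterior = unnormalised / evidence

post_mean = np.sum(theta * posterior) * dtheta
print("posterior mean:", post_mean)
```

For this conjugate setup the posterior mean is available in closed form as (Σd/σ²) / (1 + n/σ²) = 32.8/17, which the grid estimate should reproduce closely.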

Load Balance

[Figure: Unbalanced and Balanced panels]
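One common way to move from an unbalanced to a balanced load is to hand each sample to whichever worker becomes idle first, rather than fixing the assignment in advance. The sketch below compares the two strategies on made-up sample costs; it illustrates the general idea only, and the poster does not state which scheduling rule Korali uses.

```python
import heapq

def static_makespan(costs, num_workers):
    # Round-robin assignment fixed before any sample runs
    loads = [0.0] * num_workers
    for i, cost in enumerate(costs):
        loads[i % num_workers] += cost
    return max(loads)

def dynamic_makespan(costs, num_workers):
    # Each sample goes to the worker that becomes idle first
    finish_times = [0.0] * num_workers
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

costs = [8, 1, 1, 1, 8, 1, 1, 1]   # hypothetical per-sample run times
unbalanced = static_makespan(costs, num_workers=4)
balanced = dynamic_makespan(costs, num_workers=4)
print("static:", unbalanced, "dynamic:", balanced)
```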

Scalable

Computational

Model Relaxation

Fault Tolerant

[Figure: run on O(1000) nodes along a time axis in hours. Job 1 writes CheckPoint 1, CheckPoint 2, ..., CheckPoint N; after a system error, Job 2 resumes from a checkpoint.]
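The checkpoint and resume pattern in the fault-tolerance figure can be sketched as a loop that saves its state after every generation and reloads it on restart. The file name, the JSON state layout, and the simulated failure below are hypothetical and stand in for whatever a real run would store.

```python
import json
import os
import tempfile

def run(checkpoint_path, total_generations, fail_at=None):
    # Resume from the last checkpoint if one exists, else start fresh
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"generation": 0, "accumulated": 0}
    while state["generation"] < total_generations:
        if fail_at is not None and state["generation"] == fail_at:
            raise RuntimeError("simulated system error")
        # Stand-in for one generation of work
        state["accumulated"] += state["generation"]
        state["generation"] += 1
        # Write to a temporary file, then rename, so a crash cannot
        # leave a half-written checkpoint behind
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run(path, total_generations=10, fail_at=6)   # Job 1 stops with an error
except RuntimeError:
    pass
final = run(path, total_generations=10)          # Job 2 resumes
print(final)
```

Because Job 2 starts from the saved generation counter, the completed generations of Job 1 are not repeated.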
