Gengbin Zheng, Xiang Ni, Laxmikant V. Kale. Parallel Programming Lab, University of Illinois at Urbana-Champaign

A Scalable Double In-memory Checkpoint and Restart Scheme Towards Exascale


  • Gengbin Zheng, Xiang Ni, Laxmikant V. Kale

    Parallel Programming Lab, University of Illinois at Urbana-Champaign

    Charm++ Workshop 2012

  • Motivation
      • As machines grow in size, MTBF decreases
          • Jaguar had 2.33 average failures/day from 2008 to 2010
          • Applications have to tolerate faults
      • Challenges for exascale:
          • Disk-based (NFS reliable disk) checkpointing is slow
          • System-level checkpointing can be expensive
          • Scalable checkpointing/restart can be a communication-intensive process
          • Job schedulers prevent fault tolerance support in the runtime

  • Motivation (cont.)
      • Applications on future exascale machines need fast, low-cost and scalable fault tolerance support
      • Previous work: double in-memory checkpoint/restart scheme
          • In production version of Charm++ since 2004

  • Double in-memory Checkpoint/Restart Protocol

    [Figure: objects A through J distributed over PE0, PE1, PE2 and PE3. Legend: object, checkpoint 1, checkpoint 2, restored object. PE1 crashed (lost 1 processor); the remaining processors are PE0, PE2 and PE3.]
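
The protocol on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Charm++ implementation: the `PE` class, the `buddy_of` mapping (each processor keeps its second copy on the next processor) and the object layout are all hypothetical, chosen to show how two in-memory copies let the job survive the loss of one processor.

```python
import copy


class PE:
    """A simulated processing element (hypothetical helper, not a Charm++ type)."""

    def __init__(self, rank):
        self.rank = rank
        self.objects = {}     # live objects: name -> state
        self.local_ckpt = {}  # checkpoint 1: copy of this PE's own objects
        self.buddy_ckpt = {}  # checkpoint 2: copy of the objects of the PE we back up


def buddy_of(rank, num_pes):
    # Assumed mapping: the second copy lives on the next PE (wrapping around).
    return (rank + 1) % num_pes


def checkpoint(pes):
    """Store two in-memory copies of every object: one local, one on the buddy."""
    n = len(pes)
    for pe in pes:
        snapshot = copy.deepcopy(pe.objects)
        pe.local_ckpt = snapshot
        pes[buddy_of(pe.rank, n)].buddy_ckpt = copy.deepcopy(snapshot)


def restart_after_crash(pes, crashed_rank):
    """Roll survivors back to the checkpoint and restore the lost objects on the buddy."""
    buddy = pes[buddy_of(crashed_rank, len(pes))]
    lost = copy.deepcopy(buddy.buddy_ckpt)
    survivors = [pe for pe in pes if pe.rank != crashed_rank]
    for pe in survivors:
        pe.objects = copy.deepcopy(pe.local_ckpt)
    buddy.objects.update(lost)  # the buddy now also hosts the crashed PE's objects
    return survivors


# Hypothetical layout of objects A..J over four PEs.
pes = [PE(rank) for rank in range(4)]
for rank, names in {0: "ABC", 1: "DE", 2: "FG", 3: "HIJ"}.items():
    for name in names:
        pes[rank].objects[name] = {"step": 0}

checkpoint(pes)
survivors = restart_after_crash(pes, crashed_rank=1)
for pe in survivors:
    print(f"PE{pe.rank}: {sorted(pe.objects)}")
```

In this sketch the buddy ends up holding the crashed processor's objects in addition to its own, so it carries extra load after the restart. A full implementation would also have to recreate the checkpoint copies that were stored on the crashed processor, which the sketch omits.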

  • Runtime Support for FT
      • Automatically checkpointing threads
          • Including stack and heap (isomalloc)
      • User helper functions
          • To pack and unpack data
          • Checkpointing only the live variables
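
The idea of user helper functions that pack and unpack only the live variables can be sketched as follows. This is a Python analogy, not Charm++ code: the `Particles` class, its fields and the use of `pickle` are assumptions made for illustration. The point is that the scratch buffer, which is recomputed every step, is left out of the checkpoint and rebuilt on restore.

```python
import pickle


class Particles:
    """Hypothetical application object with user-written pack/unpack helpers."""

    def __init__(self, positions, velocities):
        self.positions = positions
        self.velocities = velocities
        # Scratch buffer recomputed every step; not live across a checkpoint.
        self.force_scratch = [0.0] * len(positions)

    def pack(self):
        # Serialize only the live variables needed to resume the computation.
        live = {"positions": self.positions, "velocities": self.velocities}
        return pickle.dumps(live)

    @classmethod
    def unpack(cls, blob):
        # Rebuild the object; the scratch buffer is recreated, not restored.
        state = pickle.loads(blob)
        return cls(state["positions"], state["velocities"])


original = Particles([1.0, 2.0], [0.5, 0.25])
original.force_scratch = [9.0, 9.0]  # temporary data that should not be saved
restored = Particles.unpack(original.pack())
print(restored.positions, restored.velocities, restored.force_scratch)
```

Leaving temporaries out of the packed state keeps the checkpoint small, which matters when two copies of it are held in memory.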

  • Local Disk-Based Protocol
      • Double in-memory checkpointing
          • Memory concern
          • Pick checkpointing time where global state is small
          • MD, N-body, quantum chemistry
      • Double in-disk checkpointing
          • Make use of local disk (or SSD)
          • Also does not rely on any reliable storage
          • Useful for applications with very big memory footprint

  • Previous Results: Performance Comparisons with Traditional Disk-based Checkpointing

    Checkpoint overhead (s) by problem size (MB):

    Problem size (MB) | double in-memory (Myrinet) | double in-memory (100Mb) | Local Disk | double in-disk (Myrinet) | NFS disk
                  6.4 |                      0.004 |                    0.042 |      0.218 |                    0.387 |    2.196
                 12.8 |                      0.008 |                    0.045 |      0.244 |                    0.405 |    2.234
                 25.6 |                      0.016 |                    0.097 |      0.623 |                    0.78  |    2.353
                 51.2 |                      0.032 |                    0.171 |      1.198 |                    1.148 |    2.546
                102.4 |                      0.061 |                    0.304 |      1.216 |                    1.585 |    3.52
                204.8 |                      0.14  |                    0.6   |      1.598 |                    2.164 |    8.3
                409.6 |                      0.27  |                    1.188 |      2.112 |                    3.582 |   17.65
                819.2 |                      0.517 |                    2.37  |      4.669 |                    6.854 |   33.2
               1638.4 |                      1.035 |                    4.71  |      6.901 |                   14.012 |   79.71
               3276.8 |                      2.029 |                    9.43  |     11.762 |                   26.629 |  129.87
               6553.6 |                      3.845 |                   18.83  |     21.481 |                   47.06  |  215.78
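
To make the comparison on this slide concrete, the short script below computes how many times larger the NFS disk overhead is than the double in-memory (Myrinet) overhead at each problem size. It is a sketch only: the values are transcribed from the chart data for this slide, and the assignment of columns to schemes follows the legend order, which is an assumption. The names `SCHEMES`, `ROWS` and `overhead_ratio_vs_nfs` are introduced here for illustration.

```python
# Column 0 is the problem size (MB); the remaining columns are checkpoint
# overheads (s). The column-to-scheme mapping follows the chart legend order
# and is an assumption.
SCHEMES = [
    "double in-memory (Myrinet)",
    "double in-memory (100Mb)",
    "Local Disk",
    "double in-disk (Myrinet)",
    "NFS disk",
]
ROWS = [
    (6.4, 0.004, 0.042, 0.218, 0.387, 2.196),
    (12.8, 0.008, 0.045, 0.244, 0.405, 2.234),
    (25.6, 0.016, 0.097, 0.623, 0.78, 2.353),
    (51.2, 0.032, 0.171, 1.198, 1.148, 2.546),
    (102.4, 0.061, 0.304, 1.216, 1.585, 3.52),
    (204.8, 0.14, 0.6, 1.598, 2.164, 8.3),
    (409.6, 0.27, 1.188, 2.112, 3.582, 17.65),
    (819.2, 0.517, 2.37, 4.669, 6.854, 33.2),
    (1638.4, 1.035, 4.71, 6.901, 14.012, 79.71),
    (3276.8, 2.029, 9.43, 11.762, 26.629, 129.87),
    (6553.6, 3.845, 18.83, 21.481, 47.06, 215.78),
]


def overhead_ratio_vs_nfs(rows, scheme_index=0):
    """Return (problem size, NFS overhead / chosen scheme's overhead) per row."""
    return [(row[0], row[-1] / row[1 + scheme_index]) for row in rows]


for size_mb, ratio in overhead_ratio_vs_nfs(ROWS):
    print(f"{size_mb:8.1f} MB: NFS disk / {SCHEMES[0]} = {ratio:7.1f}x")
```

Passing a different `scheme_index` (0 to 4, in the order of `SCHEMES`) compares another scheme against NFS disk.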

  • Previous Results: Restart with Load Balancing

    LeanMD, Apoa1, 128 processors

    [Figure: simulation time per step (s) vs. timestep, without LB]

    Chart2

    2.060572

    1.931205

    1.878904

    1.627035

    1.54573

    1.838356

    1.940753

    1.66237

    1.798899

    1.699308

    1.609657

    1.582982

    1.725091

    1.724436

    1.416855

    1.912544

    1.992336

    1.843522

    1.787968

    1.578881

    1.749074

    1.721319

    1.720792

    1.548338

    1.404611

    1.852854

    1.768577

    1.598787

    1.687954

    1.706513

    1.620017

    1.255407

    1.233622

    1.127641

    1.234462

    1.287475

    1.193388

    1.189892

    0.99324

    0.962672

    0.992411

    1.006699

    1.04138

    0.969301

    1.113824

    0.916239

    0.936094

    0.954022

    0.989555

    1.02479

    1.005211

    1.163595

    1.201799

    1.102465

    1.098228

    1.028322

    1.036103

    0.972076

    1.065422

    1.138067

    0.986625

    1.005647

    1.095719

    0.958778

    0.998048

    1.010745

    1.11048

    0.983394

    0.993608

    1.083521

    1.074188

    1.038286

    1.088879

    1.148243

    1.041047

    1.119522

    1.069704

    1.069352

    0.985704

    0.960018

    1.025316

    0.988069

    0.978957

    1.035497

    0.976275

    1.068464

    0.967931

    0.974109

    0.985639

    0.977586

    1.080294

    0.95472

    1.010183

    1.008256

    1.042001

    1.000356

    1.04785

    0.989684

    0.950716

    0.907331

    0.879343

    0.887231

    0.874751

    0.856354

    0.920724

    0.900419

    0.945531

    1.044399

    0.935695

    0.918593

    1.049466

    1.081448

    1.116128

    1.116112

    1.180923

    1.246483

    1.292915

    1.214981

    1.314288

    1.305134

    1.386524

    1.285486

    1.339381

    1.21084

    1.229857

    1.248031

    1.224514

    1.168148

    1.29184

    1.127459

    1.076682

    1.115381

    1.119832

    1.165949

    1.225126

    1.175402

    0.972285

    0.999218

    1.049842

    1.089163

    1.0472

    1.019105

    1.02164

    1.067423

    0.980737

    1.012048

    1.05805

    1.130666

    1.120378

    1.065913

    1.076211

    1.076157

    1.019195

    1.08021

    1.108113

    1.052279

    1.011755

    1.004593

    1.012997

    1.083555

    1.146402

    1.169461

    1.085392

    1.134367

    1.098053

    1.073385

    1.104686

    1.069164

    1.099568

    1.184347

    1.113693

    1.190831

    1.255191

    1.038505

    1.073652

    1.106564

    1.078479

    1.102903

    1.079685

    1.122704

    1.213064

    1.12968

    1.08834

    1.083566

    1.071111

    1.102088

    1.148053

    1.143092

    0.98726

    1.034385

    1.074711

    1.149926

    1.121928

    1.114652

    1.232812

    1.12267

    1.07861

    1.065033

    1.019278

    0.893063

    0.864822

    0.875184

    0.872714

    0.867169

    0.914183

    2.51822

    3.274045

    3.38535

    3.327303

    3.326231

    1.572805

    1.291528

    1.282563

    1.195413

    1.281697

    1.351278

    1.41182

    1.420893

    1.35073

    1.207345

    1.423579

    1.215017

    1.169013

    1.101311

    0.984618

    1.002282

    1.009545

    0.940676

    0.930696

    1.089722

    1.12236

    1.094353

    1.02755

    1.10523

    1.078062

    1.185435

    1.172529

    1.180859

    1.135149

    1.215596

    1.146032

    1.173749

    1.160676

    1.114597

    1.098378

    1.20156

    1.126391

    1.218134

    1.365837

    1.112039

    1.179607

    1.352579

    1.134288

    1.149921

    1.091482

    1.230376

    1.3414

    1.270711

    1.321813

    1.387056

    1.179084

    1.071516

    1.080868

    1.194492

    1.003742

    1.153284

    1.288704

    1.218278

    1.10546

    1.107355

    1.082475

    1.126309

    1.127273

    1.260399

    1.205362

    1.23908

    1.176623

    1.151854

    1.142528

    1.084323

    1.031449

    1.047176

    1.096269

    1.068707

    1.075525

    1.020311

    1.11261

    1.063544

    1.040865

    1.123212

    1.004555

    1.057848

    1.149934

    1.080261

    1.188141

    1.201995

    1.203672

    1.232376

    1.20621

    1.268279

    1.249325

    1.142575

    1.075439

    1.066331

    0.939303

    1.028284

    1.150136

    0.975997

    1.072057

    1.032319

    1.047562

    1.021373

    0.991336

    0.978471

    1.077165

    1.074773

    1.065461

    1.097531

    1.154555

    1.069852

    1.096352

    1.145798

    1.194215

    1.259295

    1.22811

    1.387169

    1.400844

    1.296343

    1.161405

    1.207471

    1.096671

    1.099053

    1.116261

    1.224266

    1.173608

    1.107899

    1.170175

    1.134412

    1.289924

    1.14338

    1.157075

    1.023501

    1.171553

    1.208359

    1.264599

    1.318647

    1.381978

    1.313919

    1.408652

    1.355754

    1.19197

    1.227243

    1.186915

    1.073975

    1.111496

    1.159998

    1.049054

    1.053717

    1.087436

    1.1362

    1.146585

    1.171901

    1.304624

    [Figure: simulation time per step (s) vs. timestep, series "With LB"]

  • Previous Result: Recovery Performance
    10 crashes
    128 processors
    Checkpoint every 10 time steps


  • LeanMD with ApoA1 benchmark
    90K atoms
    8498 objects


  • FT on MPI-based Charm++
    Practical challenge: job scheduler
      The job scheduler kills the entire job when a process fails
    MPI-based Charm++ is portable on major supercomputers
    A fault injection scheme in the MPI machine layer
      DieNow(): the MPI process stops responding
      Fault detection by keep-alive messages
      Spare processors to replace failed ones
    Demonstrated on 64K cores of a BG/P machine
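
The keep-alive detection and spare-processor replacement listed above can be illustrated with a minimal sketch. This is not the Charm++ machine-layer code: the `HeartbeatMonitor` type, its member names, and the timeout value are all hypothetical, and time is passed in explicitly so the logic can be followed without a real network.

```cpp
#include <cassert>
#include <vector>

// Hypothetical heartbeat monitor (an assumption for illustration,
// not the Charm++ implementation).
struct HeartbeatMonitor {
    std::vector<double> lastSeen;  // time of the last keep-alive per logical PE
    std::vector<int> physical;     // logical PE -> physical process rank
    std::vector<int> spares;       // idle physical ranks held in reserve
    double timeout;                // silence longer than this counts as a failure

    HeartbeatMonitor(int numPes, int numSpares, double timeoutSec)
        : lastSeen(numPes, 0.0), physical(numPes), timeout(timeoutSec) {
        for (int i = 0; i < numPes; ++i) physical[i] = i;
        for (int s = 0; s < numSpares; ++s) spares.push_back(numPes + s);
    }

    // Record a keep-alive message from a logical PE at time `now`.
    void keepAlive(int pe, double now) {
        assert(pe >= 0 && pe < static_cast<int>(lastSeen.size()));
        lastSeen[pe] = now;
    }

    // Return the logical PEs that have been silent for longer than the timeout.
    std::vector<int> detectFailures(double now) const {
        std::vector<int> failed;
        for (int pe = 0; pe < static_cast<int>(lastSeen.size()); ++pe)
            if (now - lastSeen[pe] > timeout) failed.push_back(pe);
        return failed;
    }

    // Map a failed logical PE onto a spare rank; false if no spare is left.
    bool replaceWithSpare(int pe, double now) {
        if (spares.empty()) return false;
        physical[pe] = spares.back();
        spares.pop_back();
        lastSeen[pe] = now;  // the spare starts with a fresh heartbeat
        return true;
    }
};
```

In this sketch a process that has "stopped responding" is simply one whose `keepAlive` is no longer called; once `detectFailures` reports it, `replaceWithSpare` remaps the logical PE so the rest of the job keeps its numbering.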


  • Performance at Large Scale


  • Optimization for scalability
    Communication bottlenecks
      Checkpoint/restart takes O(P) time
    Optimizations:
      Collectives (barriers)
        Switch the O(P) barrier to a tree-based barrier
      Stale message handling
        Epoch number
        A phase to discard stale messages as quickly as possible
      Small messages
        Streaming optimization
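
Two of the optimizations above can be sketched in a few lines. The first part shows why a tree-based barrier scales better than an O(P) one: in a k-ary spanning tree the critical path grows with the tree depth rather than with P. The second part shows epoch-number filtering of stale messages. All names here (`treeParent`, `treeChildren`, `treeDepth`, `Message`, `EpochFilter`) are hypothetical and chosen for illustration; the heap-style rank numbering is an assumption, not necessarily the layout Charm++ uses.

```cpp
#include <cassert>
#include <vector>

// Parent of `rank` in a k-ary tree rooted at rank 0 (-1 for the root).
int treeParent(int rank, int k) { return rank == 0 ? -1 : (rank - 1) / k; }

// Children of `rank` in a k-ary tree over P ranks.
std::vector<int> treeChildren(int rank, int P, int k) {
    assert(k >= 2 && rank >= 0 && rank < P);
    std::vector<int> kids;
    for (int i = 1; i <= k; ++i) {
        long long c = static_cast<long long>(rank) * k + i;
        if (c < P) kids.push_back(static_cast<int>(c));
    }
    return kids;
}

// Hops from the deepest rank up to the root: the critical path of the
// barrier's reduction phase (the broadcast phase is symmetric).
int treeDepth(int P, int k) {
    int depth = 0;
    for (int r = P - 1; r > 0; r = treeParent(r, k)) ++depth;
    return depth;
}

// A message stamped with the epoch in which it was sent.
struct Message {
    int epoch;
    int payload;
};

// Discards messages sent before the most recent restart.
struct EpochFilter {
    int currentEpoch = 0;
    int discarded = 0;

    // Every recovery starts a new epoch.
    void onRestart() { ++currentEpoch; }

    // Accept only messages from the current epoch; count stale ones.
    bool accept(const Message& m) {
        if (m.epoch < currentEpoch) {
            ++discarded;
            return false;
        }
        return true;
    }
};
```

With this numbering, a binary tree over 65536 ranks has a depth of 16, so the root waits on a 16-hop path instead of receiving 65535 individual messages.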


  • LeanMD Checkpoint Time before/after Optimization


  • Checkpoint Time for Jacobi/AMPI (Kraken)


  • LeanMD Restart Time


  • Conclusions and Future work
    In-memory checkpointing after optimization is scalable towards exascale
    A short paper has been accepted at the 2nd Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS 2012)
    Future work:
      Non-blocking checkpointing


    Memory usage increases by a factor of 2.
    Log scale. Varied the problem size from 6.4 MB to as big as 6 GB; 32 processors.
    Run time with multiple

    The original spanning tree cannot handle failed processors.
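
To see why a static spanning tree breaks when a processor fails, consider the sketch below. In a fixed binary tree, losing an interior rank cuts off its whole subtree from the root. One possible remedy, shown here purely as an assumption and not as what Charm++ actually does, is to build the tree over the compacted list of surviving ranks. The function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Parent of `rank` in a static binary tree rooted at rank 0 (-1 for the root).
int staticParent(int rank) { return rank == 0 ? -1 : (rank - 1) / 2; }

// True if every ancestor of `rank` in the static tree is still alive.
bool reachableInStaticTree(int rank, const std::vector<bool>& alive) {
    for (int r = staticParent(rank); r != -1; r = staticParent(r))
        if (!alive[r]) return false;
    return true;
}

// Parent of `rank` in a binary tree built over surviving ranks only.
// Returns -1 if `rank` is the root of the rebuilt tree.
int liveParent(int rank, const std::vector<bool>& alive) {
    assert(rank >= 0 && rank < static_cast<int>(alive.size()) && alive[rank]);
    std::vector<int> live;  // compacted list of surviving ranks
    int pos = -1;           // position of `rank` within that list
    for (int r = 0; r < static_cast<int>(alive.size()); ++r) {
        if (!alive[r]) continue;
        if (r == rank) pos = static_cast<int>(live.size());
        live.push_back(r);
    }
    return pos == 0 ? -1 : live[(pos - 1) / 2];
}
```

With 8 ranks and rank 1 failed, ranks 3, 4, and 7 lose their path to the root in the static tree, while the rebuilt tree reattaches rank 3 directly under rank 0.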