23
Shortened presentation title Shortened presentation title Shortened presentation title Evaluating the Performance of Large Scale Data Assimilation on Modern Geophysical Models Julia Piscioniere College of Charleston SIParCS Intern Mentors: Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab

Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Evaluating the Performance of Large Scale Data Assimilation on Modern Geophysical Models

Julia PiscioniereCollege of Charleston

SIParCS Intern

Mentors:Brian Dobbins and John Dennis

August 3, 2018

Computational & Information Systems Lab

Page 2: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Overview

2

Plots of Results

OptimizationMotivationsDART

Page 3: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Introduction

• Data Assimilation Research TestbedDART?

• When information from new physical observations are used to update a numerical model in order to produce more accurate predictions

Data Assimilation?

3

https://www.image.ucar.edu/DAReS/DART/

Page 4: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Motivation

Overall Goal: Improve the performance of data assimilation

Optimization Approach: Reduce communication in DART

4

• Observation density is increasing

• CESM ensembles: hundreds or thousands of nodes

• Enable DART to scale to larger node counts

Page 5: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

How DART Currently Works

5

1. Statistics of observations are calculated

•Processes one observation at a time

3. Update State Phase

•Updates the local model states based on the observation

4. Update Observations Phase

•Updates the local observations with the statistics

2. Localization Phase

•Finds state variables that are close to the current observation being processed

Number of Iterations = The Number of Observations

Page 6: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

How the Optimized DART Works

6

1. Statistics of observations are calculated

•Processes groups of observations at a time (graph code)

3. Update State Phase

•Updates the local model states based on the observation

4. Update Observations Phase

•Updates the local observations with the statistics

2. Localization Phase

•Finds state variables that are close to the current observation being processed

Number of Iterations = The Number of Groups

Number of Communication Counts = 1-2% of the Number of Observations

Page 7: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Cutoff Distance

7

Page 8: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Cutoff Distance

8

Page 9: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

What is a Graph Coloring Algorithm?

9

Page 10: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

What is a Graph Coloring Algorithm?

10

Page 11: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

What is a Graph Coloring Algorithm?

11

Page 12: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

What is a Graph Coloring Algorithm?

12

Page 13: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

What is a Graph Coloring Algorithm?

13

Page 14: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

What is a Graph Coloring Algorithm?

14

Page 15: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

What is a Graph Coloring Algorithm?

15

Color 4 = RedColor 5 = Blue

Page 16: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Running Comparisons

Chunk Size

• The highest number of observations that will be in a chunk

• Used chunk size of 256

Ensemble Size

• Number of ensemble members DART will produce

• Used ensemble size 100

Number of Nodes

• Range from 4 - 384

Cutoff Distance

• Used cutoffs 0.1 and 0.2

16

• All runs were 1–degree CAM (Community Atmospheric Model) cases

Page 17: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Choosing Chunk Size

17Evaluating Performance of New DART

Chunk sizes of 128 and 256 were both viable options

Page 18: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Assimilation Time Plots

18Evaluating Performance of New DART

At 64 nodes, the graph code is 60% faster than the original code

Page 19: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Assimilation Time Plots

19Evaluating Performance of New DART

At 128 nodes, the graph code is 51% faster than the original

Page 20: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Communication Time Plot

20Evaluating Performance of New DART

The communication time is decreased by an average of 90% across nodes from 4-384

Page 21: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Cheyenne Over Gigabit Ethernet

21Evaluating Performance of New DART

The graph code may run faster on a slower network, meaning people with slower networks can run DART

more efficiently

Page 22: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Conclusions

•The graph code communication time is decreased by an average of 90%

The assimilation time is decreased by a minimum of 10% and -as the number of nodes increases - a maximum of 85%

The graph code may be a better option for people who want to run DART on systems with slower network links

22

Page 23: Evaluating the Performance of Large Scale Data ... · Brian Dobbins and John Dennis August 3, 2018 Computational & Information Systems Lab. Shortened presentation titleShortened presentation

Shortened presentation titleShortened presentation titleShortened presentation title

Thank you! [email protected]

AcknowledgementsMentors:

– Brian Dobbins– John Dennis

CODE Group:– AJ Lauer– Elliot Foust– Jenna Preston

DAReS Group:– Nancy Collins– Jeff Anderson – Tim Hoar

– Elizabeth Faircloth, admin– Rich Loft– Valerie Sloan– National Center for Atmospheric

Research– National Science Foundation– John Piscioniere– My fellow interns :)

23Evaluating Performance of New DART