Next Generation Computing: Needs and Opportunities for Weather, Climate, and Atmospheric Sciences, National Academy of Sciences
Next Generation Computing: Needs for the Atmospheric Sciences at NCAR
May 15, 2017, National Academy of Sciences, Washington DC
Anke Kamrath, [email protected]
Interim Director, Computing and Information Systems Laboratory
Director, Operations and Services Division in CISL
National Center for Atmospheric Research (NCAR)
* Thanks to Jim Hurrell, Rich Loft, Dave Hart, J-F Lamarque, and Ben Cash for their contributions to this slide deck.
Overview
• Community Earth System Model (CESM)
• State of Computing at NCAR Today and Future Needs
  – NCAR's Data-Intensive Computing Environment
  – Computing Roadmap
• The Challenges Ahead – "The Wall"
CESM2 – Community Earth System Model
• Fully coupled, community, global climate model
• ~60% of NCAR HPC usage
• Model of models: >1.5M lines of code
• ~2X more expensive than CESM1 due to the addition of more science
• Stringent verification criteria
• Community governance via working groups (Atmosphere, Biogeochemistry, Chemistry Climate, Climate Variability & Change, Land Model, Ice, Ocean, Paleoclimate, Polar Climate, Societal Dimensions, Software Engineering, Whole Atmosphere)
• Used by hundreds of scientists around the world
• Single code base across desktop, departmental, and HPC systems
Study of regional refinement in CAM6 (AMIP) with the Spectral Element (SE) and MPAS dynamical cores (A. Gettelman and C. Zarzycki)
CESM Development Process: ~5 years
Where to add code refactoring, optimization, parallelization, or even rearchitecting?
Development cycle (illustrated for the land model working group, LMWG):
• Model release (CESM1/CLM4)
• Detailed model assessment, informed by observations (identify strengths and weaknesses)
• LMWG members develop parameterizations or add features; present ideas/results at LMWG meetings; publish papers
• Plans for next (and next-next) model version discussed at LMWG meetings
• Build and test beta version of offline model; evaluate competing parameterizations
• Finalize and test within CESM
• Document; control integrations; model release (CESM2/CLM5)
• Use model for scientific studies
Supercomputing Environment at NCAR
• Cheyenne: 5.34 PFLOPS peak; SGI ICE, 145K Xeon Broadwell cores, 4K nodes, 331 TB RAM, EDR InfiniBand
• Yellowstone: 1.5 PFLOPS peak
• GLADE central disk resource: 37 PB, 90/200 GB/s, GPFS
• Geyser, Caldera DAV clusters; remote vis, partner sites, XSEDE sites
• HPSS archive: 175–190 PB capacity; 80 PB stored, 12.5 GB/s, >20 PB/yr growth
• Data transfer services: 40 GB/s; RDA, Climate Data Service; 40-Gb Ethernet
• High-bandwidth, low-latency HPC and I/O networks: EDR/FDR InfiniBand and 40-Gb Ethernet
5-Year Target: "Data-Friendly" Architecture
• Supercomputer: O(1M) cores, O(1 PB) DRAM; node islands with NVRAM crossconnect; O(10 sec) checkpoint
• Super-cache: NVRAM, 5x DRAM capacity
• Data analysis & vis: O(10^2) analysis nodes, viz/FPGA nodes, web servers
• SSD storage (collections/projects): 20x DRAM capacity
• Disk/tape: 100x DRAM capacity
The Wall
• Computing wall
• Data wall
• Complexity wall
• Efficiency wall
Computing Wall
• Processor trends
  – More transistors
  – More cores (∝ transistors)
  – Flat clock speeds and power
  – Slowing per-thread performance
  – Increasing flops/byte of memory bandwidth
    • Sunway processors: ~25 flops/byte
    • KNL processors: ~7 flops/byte
• Climate computing is not well matched to these trends
  – Climate applications are state-heavy with low computational intensity
    • ESMs typically run at <1 flops/byte over the entire application (e.g., MOM6 barotropic solver: 0.11 flops/byte)
  – Physics code is branchy, hard to vectorize, and has divides and load imbalances
Source: Karl Rupp
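The mismatch between machine balance and application intensity can be made concrete with a roofline-style estimate. In the sketch below, the node's peak and bandwidth numbers are hypothetical; only the 0.11 flops/byte figure for the MOM6 barotropic solver comes from this slide.

```python
# Roofline-style estimate: attainable performance is capped by
# min(peak flops, arithmetic intensity * memory bandwidth).
# Machine numbers are illustrative assumptions, not vendor specs.

def attainable_gflops(intensity, peak_gflops, bw_gbs):
    """Attainable GFLOP/s for a kernel with the given arithmetic
    intensity (flops per byte moved to/from memory)."""
    return min(peak_gflops, intensity * bw_gbs)

# A hypothetical node: 3000 GFLOP/s peak, 400 GB/s memory bandwidth,
# i.e. a machine "balance" of 7.5 flops/byte.
peak, bw = 3000.0, 400.0

# A kernel at ~0.11 flops/byte (like the MOM6 barotropic solver) is
# memory-bound and reaches only a tiny fraction of peak.
mom6 = attainable_gflops(0.11, peak, bw)   # 0.11 * 400 = 44 GFLOP/s
frac = mom6 / peak                         # roughly 1.5% of peak

print(f"attainable: {mom6:.0f} GFLOP/s ({frac:.1%} of peak)")
```

On these assumed numbers, no amount of code tuning lifts such a kernel above ~1.5% of peak; only more bandwidth (or fewer bytes moved) helps, which is the point of the slide.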
Difficult Road to Exascale
Efficiency Wall?
[Chart: NCAR's Yellowstone floating-point efficiency, January 2013 – May 2014: daily average floating-point efficiency (0–5%) and daily average TeraFLOP/s (0–70).]
1.57% lifetime average application floating-point efficiency
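A quick back-of-envelope calculation, using only the two numbers on this slide, shows what 1.57% efficiency means in absolute terms:

```python
# What 1.57% floating-point efficiency means on a 1.5 PFLOPS-peak
# system such as Yellowstone. Both inputs come from the slide.

peak_tflops = 1500.0   # Yellowstone peak: 1.5 PFLOPS
efficiency = 0.0157    # lifetime average application FP efficiency

# Sustained rate actually delivered to applications.
sustained = peak_tflops * efficiency          # ~23.6 TFLOP/s

# Conversely: peak PFLOPS needed to *sustain* 1 PFLOP/s at this efficiency.
needed_peak_pflops = 1.0 / efficiency         # ~64 PFLOPS

print(f"sustained: {sustained:.1f} TFLOP/s")
print(f"peak needed for 1 sustained PFLOP/s: {needed_peak_pflops:.0f} PFLOPS")
```

The second number is the sobering one: at this efficiency, an exascale-class sustained rate requires a peak machine roughly 64x larger, which motivates the algorithmic ideas on the next slide.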
Algorithmic ways around the wall?
• Bigger timesteps
  – Implicit integration
  – Parallel-in-time methods
• Fewer points
  – Adaptive mesh refinement
  – Numerical schemes with higher effective resolution
• Model emulators
  – Neural-network encoders
• Reduced precision
  – FPGA-based computation
"The energy liberated by not performing overly exact calculations could be put to more productive use." – Tim Palmer
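One way to build intuition for the reduced-precision idea is to truncate a float64 mantissa in software and watch the error grow as bits are discarded. This is only a crude sketch of the concept, not how FPGA-based or hardware reduced-precision arithmetic is actually implemented:

```python
import struct

def truncate_mantissa(x, keep_bits):
    """Zero all but the top `keep_bits` of a float64's 52-bit
    mantissa -- a software stand-in for reduced-precision storage."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    (y,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
    return y

x = 3.141592653589793
for keep in (52, 23, 10):   # full double, ~single, ~half mantissa widths
    y = truncate_mantissa(x, keep)
    print(f"{keep:2d} mantissa bits: {y!r}  rel. err = {abs(y - x) / x:.2e}")
```

The relative error stays below about 2^-keep_bits, which is the quantitative question behind Palmer's argument: how many of those bits does a given model variable actually need?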
Co-Design/Partnership between vendors, government, and ESM developers?
– Vendors are focused on the analytics and deep-learning market; simulation is secondary.
– Community-level coordination is needed.
– Purpose-built HPC for ESM: more thread concurrency, more memory bandwidth, lower latency to memory, configurability, fast I/O, resilience, efficient reductions, etc.
– Similar needs in other communities: geoscience (tectonics and magma flow), energy (power station design), aerospace (CFD), biomechanics (blood flow), solar physics and astrophysics.
Examples: CoDEx (Co-Design for Exascale); CRAFT; MDGrape-3 (RIKEN)
Data Wall
• Data volumes are exploding and the underlying technologies are rapidly changing. A radical rethinking of how data is produced, stored, analyzed, visualized, shared, and understood needs to occur.
• A shift from "computing campaigns" to "data campaigns"
• Technology challenges and opportunities
  – Storage costs are outpacing compute costs
  – New and emerging capabilities in the memory-storage hierarchy
• Science challenges
  – Ensembles (50–100 members) with data assimilation of billions of observations are just around the corner.
  – CMIP6 could be >30 PB: how to tackle data management and model intercomparison at this scale?
NCAR Strategies for Reducing Data Friction
• Provide "Big Data" community DAV
  – CMIP Analysis Platform
  – NCAR Research Data Archive (RDA) server-side subsetting and processing (processed 20 PB in 2016, delivered 0.2 PB)
• Better data management policies
  – Create a "storage economy" and policies that drive appropriate trade-offs between saving data and recomputing it
• Lossy compression
  – Seeing 80% reductions in size for some data types
  – With new data policies, more interest from scientists
• New and better storage technologies and hierarchy
  – Seeing 20X speedups for some workflows (disk-to-disk vs. SSD-to-SSD)
• Parallel climate analytics and automated workflow software
  – Focus on end-to-end workflow performance is vital.
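The mechanism behind such lossy schemes can be sketched with a "bit grooming"-style approach: zero low-order mantissa bits, then let a lossless coder exploit the resulting zeros. The synthetic field, the number of kept bits, and the use of zlib here are illustrative assumptions; the 80% figure above refers to NCAR's actual results, not to this toy.

```python
import math
import struct
import zlib

def groom(values, keep_bits):
    """Zero the low-order mantissa bits of each float64 (lossy),
    so a lossless coder can exploit the runs of zero bytes."""
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    out = []
    for v in values:
        (b,) = struct.unpack("<Q", struct.pack("<d", v))
        out.append(b & mask)
    return struct.pack(f"<{len(out)}Q", *out)

# A smooth synthetic "field"; many real model variables are similarly
# smooth, which is why grooming plus deflate works well on them.
field = [math.sin(i / 50.0) for i in range(10000)]
raw = struct.pack(f"<{len(field)}d", *field)

lossless = len(zlib.compress(raw, 9))              # deflate alone
lossy = len(zlib.compress(groom(field, 12), 9))    # groom to 12 bits, then deflate

print(f"lossless: {lossless / len(raw):.0%} of raw")
print(f"groomed : {lossy / len(raw):.0%} of raw")
```

Keeping 12 mantissa bits preserves roughly 3-4 significant decimal digits; the policy question on this slide is which variables can tolerate that.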
Complexity Wall
More $ buys more compute along three growing demands: complexity, resolution, and ensemble size. But only faster threads and/or better code efficiency improve simulated years per day (SYPD).
• Complexity
• Resolution
• Ensemble size
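The pressure on all three axes can be quantified with a rough cost model. The cubic scaling with resolution below assumes a CFL-limited explicit model (grid columns scale quadratically with refinement, and the timestep shrinks linearly) and ignores vertical resolution; it is a sketch, not a measurement of any particular ESM.

```python
# Rough cost multipliers for the three axes on this slide, relative
# to a baseline run. Assumption: explicit, CFL-limited dynamics, so
# halving the grid spacing costs ~(2^2 more columns) * (2x more steps).

def relative_cost(res_factor, ensemble_factor=1.0, complexity_factor=1.0):
    """Cost multiplier vs. a baseline run.
    res_factor: horizontal grid refinement (2.0 = half the spacing)."""
    return res_factor**3 * ensemble_factor * complexity_factor

print(relative_cost(2.0))             # halve grid spacing     -> 8x
print(relative_cost(1.0, 50.0))       # 50-member ensemble     -> 50x
print(relative_cost(2.0, 50.0, 2.0))  # all three axes at once -> 800x
```

Multiplying the axes together is what makes the wall: modest ambitions on each axis compound into orders of magnitude of required compute, while SYPD for any single member is unchanged.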
Conclusions
• Models are getting exponentially more complex. Without appropriate engineering and governance, the ability to produce codes that run well is in jeopardy.
• Current ESMs achieve <2% of peak performance.
• Low code efficiency means less science.
• Optimizing codes while science is being added is like tuning up a race car while it's driving.
• ESMs will not speed up (in SYPD) without real effort. New ideas are required to break through this barrier:
  – Exploring radically new algorithmic approaches
  – New types of parallelism
  – System co-design with vendors
• We must invest heavily in next-generation codes to keep science moving forward!
• Investment (pay, pipeline, diversity, training, etc.) in workforce, workforce, and workforce… is vital.
Questions? Comments?
Some References
• Carman, Jessie, Thomas Clune, Francis Giraldo, Mark Govett, Brian Gross, Anke Kamrath, Tsengdar Lee, David McCarren, John Michalakes, Scott Sandgathe, and Tim Whitcomb, 2017. Position Paper on High Performance Computing Needs in Earth System Prediction. National Earth System Prediction Capability. https://doi.org/10.7289/V5862DH3
• Next Generation Earth System Prediction: Strategies for Subseasonal to Seasonal Forecasts, April 2016. National Academies (NRC report).
• NSF RFI CI submissions from NCAR
  – HPC community
  – CESM community
Progression of Hero-Climate Runs
• Best atmospheric resolution on Yellowstone was 16 km; on Cheyenne, <10 km
• Simultaneous 4x increase in ocean resolution, to 0.25 degree
• ~2x increase in years of integration (note: years integrated are sharply limited by computing and storage constraints)
• 3X may allow for:
  – 5 km maximum atmospheric resolution
  – 0.1-degree ocean?
  – 1–1.5 PB of analyzable output
• Probably 2 supercomputer generations from explicit representation of convection

| NCAR Super | Project | Atmosphere model cycle | Atmosphere spectral truncation | Atmos vertical levels | Ocean model | Ocean horizontal res. | Ocean vertical levels |
| Yellowstone (1.5 PFLOPS, 2013) | MINERVA | IFS cy 38r1 | TL319 (64 km), TL639 (32 km), TL1279 (16 km) | 91 levels, top = 1 Pa | NEMO v3.0/3.1 | 1 degree | 42 levels |
| Cheyenne (5.34 PFLOPS, 2017) | METIS | IFS cy 43r1 | TCO199 (64 km), TCO639 (16 km), TCO1279 (9 km) | 91 levels, top = 1 Pa | NEMO v3.4.1 | TCO199: 1º; TCO639: 0.25º; TCO1279: 0.25º | TCO199: 42; TCO639: 75; TCO1279: 75 |