Fleet Numerical… Supercomputing Excellence for Fleet Safety and Warfighter Decision Superiority… Remote HPC Computing. Mr. Robert Burke

Remote HPC Computing

Page 1: Remote HPC Computing

Fleet Numerical…Supercomputing Excellence for Fleet Safety and Warfighter Decision Superiority…

Remote HPC Computing

Mr. Robert Burke

Page 2: Remote HPC Computing

Relevant FNMOC Projects

• Enterprise Operational Modeling (EOM)
  – Enable FNMOC exploitation of enterprise-wide HPC assets
  – Run models remotely at the Navy DSRC
  – Distribute data directly to customers from Navy DSRC

• Fully Coupled COAMPS-OS Modeling Capability Initiative

• Atmospheric Model Bridge Strategy
  – Interim solution until Earth System Prediction Capability (ESPC)
  – Anticipated by 2015
  – Needed until at least 2020 for ESPC implementation

Page 3: Remote HPC Computing

NOGAPS to ESPC Baseline and Assumptions

• NOGAPS will be replaced with the Navy Global Environmental Model (NAVGEM) in 2011
  – New Semi-Lagrangian dynamic core and new physics
  – Resolution upgrades will continue if computational resources allow
  – New data upgrades will continue if available and supported
  – NUOPC ensemble and common standards lead to national system

• ESPC (or other next generation system) is targeted for operational implementation by 2020-2025
  – Anticipate a national modeling capability with Navy as contributor
  – Development and schedule of ESPC is uncertain

• Bridge strategy required for Navy global NWP between 2013 and 2020
  – Based on NAVGEM data assimilation cycle run at FNMOC, with extended forecasts run at DSRC
  – Goal is to maintain Navy competence while investing in ESPC
  – Computational, manpower, and R&D resources will constrain COAs

Page 4: Remote HPC Computing

HPC Requirements Implied by Models Roadmap

Page 5: Remote HPC Computing

HPC Requirements Implied by Models Roadmap

Page 6: Remote HPC Computing

EOM Project Plans

• FY11 EOM Plans
  – Operationalize COAMPS-OS for NAVO regions at the Navy DSRC
    • Data Management and Transfer
    • Job Management and Control
    • Information Assurance (IA)
    • Documentation – Processes, Approvals, SOPs
  – Demonstrate NOGAPS Ensemble at the Navy DSRC

• FY12 and beyond EOM Plans
  – Optimize Operationalization among FNMOC, NAVO, and Navy DSRC
    • Data Management and Transfer
    • Job Management and Control
    • Information Assurance (IA)
    • Configuration Management
  – Operationalize other Compute Intensive Models at Navy DSRC
    • NAVGEM
    • Global ensemble
    • COAMPS-OS ensemble

Page 7: Remote HPC Computing

EOM Data Plan

• COAMPS-OS Operational Data Transfer Alternatives
  – Best solution: data transfer mechanism via ticketless, kerberized remote copy
    • Best data transfer performance
    • Can be completely automated with any scheduling mechanism
    • Bi-directional data transfer, either system can push or pull data
    • Requires one or more (scalable) A2 Emerald gateway nodes to be provisioned and kerberized
    • Navy ODAA (NAVNETWARCOM) waiver needed to address IA issues
  – Interim solution: data sources via CAGIPS and BFT
    • CAGIPS for all supported data types (currently NOGAPS initial and boundary conditions)
    • BFT for all data types
    • Backup data source: GODAE for NAVDAS atmospheric observations and NCODA ocean observations
  – Interim solution: data transfer back to FNMOC via DMZ
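
The interim plan names preferred data sources plus a backup, which amounts to trying sources in priority order. A minimal Python sketch of that idea follows. The source names (CAGIPS, BFT, GODAE) come from the slide; the `SOURCE_ORDER` table, its data-type keys, the ordering within each list, and the caller-supplied `fetch` function are all hypothetical and do not describe any actual FNMOC interface.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed priority order per data type, read off the slide. The keys are
# hypothetical labels, not real product identifiers.
SOURCE_ORDER: Dict[str, List[str]] = {
    "nogaps_ic_bc": ["CAGIPS", "BFT"],
    "atmos_obs": ["BFT", "GODAE"],
    "ocean_obs": ["BFT", "GODAE"],
}


def fetch_with_fallback(
    data_type: str,
    fetch: Callable[[str, str], Optional[bytes]],
) -> Tuple[str, bytes]:
    """Try each source in order; return (source, payload) for the first success.

    `fetch(source, data_type)` is a stand-in for the real transfer. It is
    expected to return None, or raise OSError, when a source is unavailable.
    """
    errors: List[str] = []
    for source in SOURCE_ORDER[data_type]:
        try:
            payload = fetch(source, data_type)
        except OSError as exc:
            errors.append(f"{source}: {exc}")
            continue
        if payload is not None:
            return source, payload
        errors.append(f"{source}: no data")
    raise RuntimeError(f"all sources failed for {data_type}: {errors}")
```

Keeping the ordering in a table rather than in control flow means a new source, or a changed priority, is a one-line edit.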

Page 8: Remote HPC Computing

EOM Data Transfer

[Diagram: COAMPS-OS Data Transfer to Navy DSRC (Primary Data Transfer Paths). Labeled elements: FNMOC BFTs; DPSR observation data; NAVO BASTION; sftp pull/push scripts; CAGIPS; DATMS-U; ocean and atmospheric observations; DREN; NOGAPS boundary conditions; firewalls; DMZs; DSRC Einstein (eslogin1); Rulebot. Zone labels: MAC II and MAC III.]

Page 9: Remote HPC Computing

EOM Data Transfer

[Diagram: Alternate Data Path. Ocean and atmospheric observations and NOGAPS boundary conditions leave FNMOC (CAGIPS, DPSR) via SFTP move scripts: DPSR and CAGIPS to BASTION, then BASTION SFTP to DSRC Einstein (eslogin1). Zone labels: inside firewall, inside DMZ, MAC II, MAC III.]
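
The diagram relies on scripted SFTP moves between hosts. One way such a script could be assembled is sketched below in Python: it only builds the text of an OpenSSH `sftp` batch file and the argument list for `sftp -b`, and runs nothing. The function names, the file paths, and the remote directory are hypothetical, and the sketch assumes paths contain no whitespace.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def build_sftp_batch(files: Iterable[str], remote_dir: str) -> str:
    """Return sftp batch-file text that uploads each file into remote_dir.

    Assumes paths contain no whitespace, so no quoting is applied.
    """
    lines: List[str] = []
    for path in files:
        target = PurePosixPath(remote_dir) / PurePosixPath(path).name
        lines.append(f"put {path} {target}")
    lines.append("bye")
    return "\n".join(lines) + "\n"


def sftp_command(batch_file: str, host: str) -> List[str]:
    """Argument list for a non-interactive OpenSSH sftp run.

    The `-b` option makes sftp read its commands from a batch file.
    """
    return ["sftp", "-b", batch_file, host]
```

Batch mode has no user interaction, so it depends on non-interactive authentication being available between the two hosts, which is the same constraint the data plan raises for fully automated transfers.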

Page 10: Remote HPC Computing

EOM Job Management

• COAMPS-OS Job Management Situation
  – NAVO runs COAMPS-dependent ocean models once daily
  – Model run mechanisms and paradigms
    • NAVO runs are time dependent, automated via script, and run generally without intervention
    • FNMOC runs are event dependent, tightly controlled and monitored

• COAMPS-OS Operational Job Management Alternatives
  – PBS Pro remote execution without Supervisor Monitor Scheduler (SMS)
    • PBS Pro unkerberized already in use at both FNMOC and Navy DSRC
    • Longer term plans for EOM should minimize software dependency
    • Alternate control mechanisms with greater operator activity for initiating and controlling run are possible and necessary
  – Rapid Ocean Assessment Model Environment Relocatable (ROAMER) System
    • Script-based job monitoring system
    • Could be tailored and extended for FNMOC usage
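
The contrast between time-dependent (NAVO) and event-dependent (FNMOC) runs can be illustrated with a small readiness check. This is a sketch only: the `Job` class, the `is_ready` function, and the event names are hypothetical, and it does not represent PBS Pro, SMS, or ROAMER behavior.

```python
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class Job:
    name: str
    start_after: Optional[time] = None       # time-dependent trigger
    requires: FrozenSet[str] = frozenset()   # event-dependent trigger


def is_ready(job: Job, now: time, events_seen: Set[str]) -> bool:
    """Ready once the clock trigger has passed and all required events have fired."""
    time_ok = job.start_after is None or now >= job.start_after
    events_ok = job.requires <= events_seen
    return time_ok and events_ok


# A clock-driven run and an event-driven run, with made-up names.
navo_run = Job("navo_ocean", start_after=time(6, 0))
fnmoc_run = Job("coamps_os", requires=frozenset({"nogaps_bc_arrived", "obs_arrived"}))
```

A clock-driven job needs nothing but the time of day, while an event-driven job waits on upstream signals regardless of the clock. That difference is why remote execution of FNMOC runs calls for more operator control than a cron-style script provides.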

Page 11: Remote HPC Computing

EOM IA

• COAMPS-OS IA Situation
  – EOM Framework uses three types of connectivity
    • FNMOC and Navy DSRC connectivity (logon, data transfer, run models)
    • Data transfer from FNMOC to NAVO and NAVO to Navy DSRC
    • FNMOC Job Initiation, Control and Monitoring of DSRC model runs
  – FNMOC and Navy DSRC Maintain Different Security Postures
    • FNMOC part of operational community requiring full C&A with necessary demilitarized zones (DMZ), firewall, and border routers
    • Navy DSRC an R&D HPC center bound by HPCMP and DOD IA policies for R&D systems – currently no DMZ or firewall
    • Navy DSRC does have NAVNETWARCOM ATO with residual risk rating of Low
  – EOM IA Special Requirements
    • Most Ports, Protocols and Services (PPS) required for connection of FNMOC workstations and FNMOC operational cluster to Navy DSRC are Navy network policy compliant
    • Data transfer and job management between MAC II (FNMOC) and MAC III (DSRC) systems

Page 12: Remote HPC Computing

EOM IA Issues Explored

• DoD IA Mission Assurance Categories
  – Mission Assurance Category I (MAC I)
    • Systems handling vital information to mission effectiveness of deployed or contingency forces in terms of both content and timeliness
    • Require most stringent protection measures
    • Not applicable to FNMOC
  – Mission Assurance Category II (MAC II)
    • Systems handling important information to support deployed or contingency forces
    • Consequences of loss of integrity are unacceptable
    • Loss of availability can only be tolerated for a short time
    • Require safeguards beyond best practices to ensure adequate assurance
    • FNMOC operational systems

  – Mission Assurance Category III (MAC III)
    • Systems handling information that is necessary for the conduct of day-to-day business but does not materially affect support to deployed or contingency forces in the short term
    • Consequences of loss of integrity could include delay or degradation of services or commodities enabling routine activities
    • Navy DSRC
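
Because FNMOC operational systems are MAC II and the Navy DSRC is MAC III, every EOM link between them crosses a category boundary. The Python sketch below shows one way to flag such links for IA review. The category assignments come from the slides; the `MAC` enum, the `SYSTEM_MAC` table, and both functions are hypothetical illustrations, not a DoD or Navy tool.

```python
from enum import IntEnum
from typing import Dict, Iterable, List, Tuple


class MAC(IntEnum):
    """DoD Mission Assurance Categories; a lower value is more stringent."""
    I = 1
    II = 2
    III = 3


# Category assignments as stated in the slides.
SYSTEM_MAC: Dict[str, MAC] = {
    "FNMOC": MAC.II,
    "Navy DSRC": MAC.III,
}


def crosses_mac_boundary(src: str, dst: str) -> bool:
    """True when the two endpoints sit in different assurance categories."""
    return SYSTEM_MAC[src] != SYSTEM_MAC[dst]


def links_needing_review(links: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the links whose endpoints differ in category, in input order."""
    return [(s, d) for s, d in links if crosses_mac_boundary(s, d)]
```

For example, `links_needing_review([("FNMOC", "Navy DSRC")])` returns that link unchanged, since the data transfer and job management paths both span MAC II and MAC III systems.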

Page 13: Remote HPC Computing

EOM Considerations with the DSRC

• EOM IA Strategy
  – Minimize software and PPS used exclusively for EOM
  – Trade off EOM functionality and ease-of-use to gain IA, maintainability, and mobility
  – Obtain ODAA approval for preferred data transfer alternatives

• DSRC Technology
  – Hardware technology refresh cycle
    • DSRC typically three years
    • FNMOC 5-7 years
  – Software availability
    • DSRC a compute-engine
    • FNMOC requirements
      • Job management
      • Process control
      • Configuration management

Page 14: Remote HPC Computing

Summary

• Leveraging remote HPC assets is part of a long-range strategy to deliver capability in a budget-constrained world

• There are unique challenges presented by DoD Information Assurance requirements

• By carefully choosing what jobs can run remotely, “cloud-like” computing is possible
