nci.org.au

@NCInews

Computational Environments and Analysis methods available on the NCI HPC & HPD Platform

IN53E – 01

Ben Evans¹, Lesley Wyborn¹, Adam Lewis², Clinton Foster², Stuart Minchin², Tim Pugh³, Alf Uhlerr⁴, Bradley Evans⁵

¹ANU, ²Geoscience Australia, ³Bureau of Meteorology, ⁴CSIRO, ⁵Macquarie University

Overview

• High Performance Data (HPD) - data that is carefully prepared, standardised and structured so that it can be used in Data-Intensive Science on HPC (Evans et al., in press)

– HPC – turning compute into IO-bound problems

– HPD – turning IO-bound into ontology + semantic problems

• What are the HPC and HPD drivers?

• Build re-usable/sustainable software for use in Virtual Laboratories – integrated set of software for science, a mix of new and familiar

• What have we done?

• What’s next?

© National Computational Infrastructure 2014

IN53E-01:“NCI Computational Environments and HPC/HPD” #AGU14, 19 December, 2014 @BenJKEvans

Numerical Weather Prediction Roadmap

Model Topography of Sydney, NSW

Sydney, NSW (research 1.5km topography)

2013:
• 40km Global Model: 2 x daily 10-day & 3-day forecast
• 12km Regional Model: 4 x daily 3-day forecast
• 4km City/State Model: 4 x daily 36-hour forecast

2020:
• 12km Global Model: 2 x daily 10-day & 3-day forecast
• 5km Regional Model: 8 x daily 3-day forecast
• 1.0km City/State Model: 24 x daily 18h or 36h forecast

[Figure label: TC]

Increasing model resolution for improved local information

Future model ensembles for likelihood of significant weather

C/- Tim Pugh, BoM

Capture, analysis & application of Earth Obs

c/- Adam Lewis, GA

Combining Satellite and Climate

How to bring as much observational scrutiny as possible to the CMIP/IPCC process?

How to best utilize the wealth of satellite observations for the CMIP/IPCC process?

c/- Robert Ferraro, NASA/JPL, ESGF F2F, 2014

Top 500 Super Computer list since 1990

• Fast and flexible access to structured data is required

• There needs to be a balance between processing power and the ability to access data (data scaling)

• The focus is on on-demand, direct access to large data sources

• Enabling high-performance analytics and analysis tools directly on that content

http://www.top500.org/statistics/perfdevel/

Current NCI

Next NCI

Elephant Flows Place Great Demands on Networks

• Physical pipe that leaks water at a rate of 0.0046% by volume. Result: 99.9954% of water transferred.

• Network ‘pipe’ that drops packets at a rate of 0.0046%. Result: 100% of data transferred, slowly, at <<5% of optimal speed.

[Figure annotations: "essentially fixed"; "determined by speed of light"]

With proper engineering, we can minimize packet loss.

Assumptions: 10Gbps TCP flow, 80ms RTT. See Eli Dart, Lauren Rotman, Brian Tierney, Mary Hester, and Jason Zurawski. The Science DMZ: A Network Design Pattern for Data-Intensive Science. In Proceedings of the IEEE/ACM Annual SuperComputing Conference (SC13), Denver CO, 2013.
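The "<<5% of optimal speed" figure can be sanity-checked with the widely used Mathis et al. approximation for loss-limited TCP throughput, rate ≤ (MSS / RTT) × (C / √p). The sketch below is an illustration only: the slide does not say which model produced its figure, and the MSS values (1460 and 8960 bytes) and the constant C = √(3/2) are assumptions, not numbers from the talk.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Upper bound on steady-state TCP throughput in bits/s (Mathis approximation)."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

LINK_BPS = 10e9          # 10 Gbps flow (from the slide)
RTT_S = 0.080            # 80 ms round-trip time (from the slide)
LOSS = 0.0046 / 100      # 0.0046% packet loss (from the slide)

for mss in (1460, 8960):  # assumed: standard and jumbo-frame segment sizes
    bps = mathis_throughput_bps(mss, RTT_S, LOSS)
    print(f"MSS {mss} B: {bps / 1e6:.0f} Mbps, {100 * bps / LINK_BPS:.2f}% of link")
```

Under these assumptions both cases land well below 5% of the 10 Gbps link, which is consistent with the slide's claim.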

Computational and Cloud Platforms

Raijin:
• 57,472 cores (Intel Xeon Sandy Bridge technology, 2.6 GHz) in 3592 compute nodes;
• 160 TBytes (approx.) of main memory;
• InfiniBand FDR interconnect; and
• 7 PBytes (approx.) of usable fast filesystem (for short-term scratch space).
• 1.5 MW power; 100 tonnes of water in cooling

Partner Cloud
• Same generation of technology as Raijin (Intel Xeon Sandy Bridge technology, 2.6 GHz) but only 1500 cores;
• InfiniBand FDR interconnect;
• Collaborative platform for services and
• The platform for hosting non-batch services

NCI Nectar Cloud
• Same generation as partner cloud
• Non-managed environment
• Weak integration
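As a quick check on the Raijin figures quoted above, the per-node numbers can be derived from the totals. This is only back-of-envelope arithmetic: the slide gives approximate totals, and decimal units (1 TByte = 1000 GBytes) are an assumption.

```python
# Totals quoted on the slide (the memory figure is approximate)
CORES = 57_472
NODES = 3_592
MEMORY_TB = 160

cores_per_node = CORES / NODES
memory_gb_per_node = MEMORY_TB * 1000 / NODES   # assumes decimal units
memory_gb_per_core = memory_gb_per_node / cores_per_node

print(f"{cores_per_node:.0f} cores per node")
print(f"~{memory_gb_per_node:.1f} GB of memory per node")
print(f"~{memory_gb_per_core:.1f} GB of memory per core")
```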

NCI Cloud

[Diagram: NCI Cloud architecture]
• Per-tenant public IP assignments (CIDR boundaries – typically /29)
• FDR IB interconnect (six links shown)
• OpenStack private IP (flat network*) - quota managed
• Storage: NFS, Lustre, NFS, and six SSDs
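For readers unfamiliar with CIDR notation, a /29 boundary gives each tenant a block of eight addresses. A minimal sketch using Python's standard `ipaddress` module follows; the 203.0.113.0/24 range is a reserved documentation range used purely as a placeholder, not an NCI allocation.

```python
import ipaddress

# Placeholder documentation range (TEST-NET-3), not a real NCI allocation
pool = ipaddress.ip_network("203.0.113.0/24")

# Carve the pool into per-tenant /29 blocks
tenant_blocks = list(pool.subnets(new_prefix=29))
first = tenant_blocks[0]

print(len(tenant_blocks), "tenant blocks in the pool")
print(first, "contains", first.num_addresses, "addresses")
print("conventional host addresses:", [str(h) for h in first.hosts()])
```

How many of the eight addresses a tenant can actually use depends on the network design, for example whether gateway or broadcast addresses are reserved.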

NCI’s integrated high-performance environment

[Diagram: storage, compute and network components]
• Networks: Internet; 10 GigE; /g/data 56Gb FDR IB Fabric; Raijin 56Gb FDR IB Fabric; link to second data centre
• Massdata (tape): Cache 1.0 PB, Tape 12.3 PB
• Persistent global parallel filesystem: /g/data1 ~7.4 PB; /g/data2 ~6.7 PB
• Raijin high-speed filesystem: /short 7.6 PB; /home, /system, /images, /apps
• Compute and data movement: Raijin HPC Compute; Raijin Login + Data movers; NCI data movers

Building The Platform for Earth System modeling & Analysis

10PB+ Research Data

Server-side analysis and visualization

Data Services

THREDDS

VDI: Cloud scale user desktops on data

Web-time analytics software

10 PB of Data for Interdisciplinary Science

BOM, GA, CSIRO, ANU, International, Other National

• CMIP5: 3 PB
• Astronomy (Optical): 200 TB
• Water, Ocean: 1.5 PB
• Atmosphere: 2.4 PB
• Earth Observ.: 2 PB
• Marine Videos: 10 TB
• Geophysics: 300 TB
• Weather: 340 TB
• Bathy, DEM: 100 TB

Mirrored from major science agencies and other sources

| Data Collections | Approx. Capacity |
|---|---|
| CMIP5, CORDEX | ~3 PBytes |
| ACCESS products | 2.4 PBytes |
| LANDSAT, MODIS, VIIRS, AVHRR, INSAR, MERIS | 1.5 PBytes |
| Digital Elevation, Bathymetry, Onshore Geophysics | 700 TBytes |
| Seasonal Climate | 700 TBytes |
| Bureau of Meteorology Observations | 350 TBytes |
| Bureau of Meteorology Ocean-Marine | 350 TBytes |
| Terrestrial Ecosystem | 290 TBytes |
| Reanalysis products | 100 TBytes |
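As a rough cross-check, the approximate capacities listed above can be totalled. This is only illustrative arithmetic over approximate figures, and it assumes decimal units (1 PByte = 1000 TBytes).

```python
# Approximate capacities from the slide, in TBytes (1 PByte taken as 1000 TBytes)
collections_tb = {
    "CMIP5, CORDEX": 3000,
    "ACCESS products": 2400,
    "LANDSAT, MODIS, VIIRS, AVHRR, INSAR, MERIS": 1500,
    "Digital Elevation, Bathymetry, Onshore Geophysics": 700,
    "Seasonal Climate": 700,
    "Bureau of Meteorology Observations": 350,
    "Bureau of Meteorology Ocean-Marine": 350,
    "Terrestrial Ecosystem": 290,
    "Reanalysis products": 100,
}

total_tb = sum(collections_tb.values())
print(f"Total: ~{total_tb / 1000:.1f} PBytes across {len(collections_tb)} collections")
```

The listed collections come to a little under 10 PBytes, in line with the headline figure used elsewhere in the talk.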

National Environment Research Data Collections (NERDC)
1. Climate/ESS Model Assets and Data Products
2. Earth and Marine Observations and Data Products
3. Geoscience Collections
4. Terrestrial Ecosystems Collections
5. Water Management and Hydrology Collections

Internationally sourced
• Satellite Data (USGS, NASA, JAXA, ESA, …)
• Reanalysis (ECMWF, NCEP, NCAR, …)
• Climate Data (CMIP5, AMIP, GeoMIP, CORDEX, …)
• Ocean Modelling (Earth Simulator, NOAA, GFDL, …)

These will only increase as we depend on more data, and some will be replicated.

How should we keep this in sync, versioned, and back-referenced for the supplier?

Some Data Challenges

• Allow multiple data types but convert proprietary ones
– Standardize record format and conventions
• Expose all attributes for search
– not just collection-level search, not just datasets, all data
– What are the handles we need to access the data?
• Provide more programmatic interfaces and link up data and compute resources
– More server side processing
• Add the semantic meaning to the data
– Is it scientifically appropriate for a data service to aggregate/interpolate?
– CMIP5 successful because we constrained the problem
• What unique identifiers do we need?
– DOI is only part of the story.
– Versioning is important.

Metadata Hierarchy for discovery

Recording Hierarchy in ISO 19139
1. Data collection – e.g. Climate and Weather modelling
2. Series – e.g. Landsat 7
3. Datasets – Semantically the same
4. Attributes – including variables (versions, errata)
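One way to picture the four-level hierarchy above is as nested records. The following Python sketch is purely illustrative: the class names, field names and example entries are hypothetical and are not taken from the metadata standard or from NCI's catalogue.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    title: str
    # Level 4: attributes, including variables, versions and errata
    attributes: dict = field(default_factory=dict)

@dataclass
class Series:
    title: str
    datasets: list = field(default_factory=list)

@dataclass
class DataCollection:
    title: str
    series: list = field(default_factory=list)

    def find_datasets(self, key, value):
        """Search below collection level: match on any dataset attribute."""
        return [d for s in self.series for d in s.datasets
                if d.attributes.get(key) == value]

# Hypothetical example entries
scene = Dataset("Example scene", {"variable": "surface_reflectance", "version": "1.0"})
collection = DataCollection("Earth Observation", [Series("Landsat 7", [scene])])
print([d.title for d in collection.find_datasets("variable", "surface_reflectance")])
```

Searching on attributes rather than only on collection titles is the kind of query the hierarchy is meant to support.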

Metadata Hierarchy implementation

[Diagram: NCI GeoNetwork architecture (basic catalogues)]
• GeoNetwork: Collection (and Series?)
• Dataset-specific GeoNetworks: Dataset 1, Dataset 2, Dataset 3, … Dataset n
• CSW Harvesting and Cross-walks (e.g. RIF-CS Adapter)
• Full harvest of the metadata
• Full Search GeoNetwork; Full Search GeoNetwork (or domain): Dataset 1, Dataset 2, Dataset 3, …
• Domain Specific or User deep query
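CSW harvesting and deep queries against a GeoNetwork catalogue are typically driven by GetRecords requests. The sketch below only builds such a request URL from OGC CSW 2.0.2 key-value parameters; the endpoint URL and the search term are hypothetical placeholders, and no request is sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint: substitute the CSW URL of a real catalogue
CSW_ENDPOINT = "https://catalogue.example.org/geonetwork/srv/eng/csw"

def build_getrecords_url(search_text, max_records=10):
    """Build a CSW 2.0.2 GetRecords URL (KVP encoding) for a full-text search."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "full",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{search_text}%'",
        "maxRecords": str(max_records),
    }
    return f"{CSW_ENDPOINT}?{urlencode(params)}"

url = build_getrecords_url("Landsat")
query = parse_qs(urlparse(url).query)   # decode again to inspect what would be sent
print(url)
print(query["constraint"][0])
```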

Finding data and services

[Diagram labels]
• GeoNetwork catalogue
• Lucene database
• Trialing Elasticsearch
• DAP, OGC, … Services
• /g/data1, /g/data2
• Supercomputer access
• Virtual lab

Recording full product description … now need to contextually embed for programs

Collaborating with Researchers/Developers

The selfish practical researcher:
• Not Virtual Organisations. Interoperable tools in virtual laboratories. Make seamless.
• Anti-collaboration: just apply standards
• Micro-ambition: did I get stuff done quicker/better
• Data handling (and particularly movement!) is a complete waste of time.

Sustainability:
• The system should capture my operations. Why am I a secretary? I can’t remember what I did? The system did things that I didn’t know anyway!
• Cite-me! People should recognise my genius! Do I have to be in PR?
• I’ve done my bit, and it’s really clever. Here you go, I am going to do something else.
(Actually same issue with sub-contracting work, and multiparty agreements)

What’s worse? Perhaps the opposite to all these items. Need a strategy to properly address this.

www.onlychild.org.uk

Collaborating with Researchers/Developers

• Project driven means:
– define a use-case
– end-date on the work
• The researcher / leading developers may be ahead of the curve
• We want to best tap this time and energy, … and to have a reasonable chance of converting for sustainability

Barclay: Computer, begin new program. Create as follows: workstation chair. Now, create a standard alphanumeric console, positioned for the left hand. Now an iconic display console, positioned for the right hand. Tie both consoles into the Enterprise main computer core, utilizing neural-scan interface.
Enterprise Computer: There is no such device on file.
Barclay: No problem. Here's how you build it.

The Nth Degree, ST-TNG

Prototype to Production - anti-Mine craft

Virtual Labs:
• Separating Researcher from Software builders
• Cloud is an enabler, but:
– don’t make researchers become full system admins.
– save developers from being operational

[Figure: Project lifecycle – and preparing success. Axis labels: Productivity, Perspiration. Timeline markers: Proj1:Start, Proj1:End, Proj2-4:Start, Proj2-4:End]

Prototype to Production - anti-Mine craft

[Figure: Development Phase in a project. Headspace hours of VL Managers and Developers in three cases: Poorly executed, Reasonably executed, Well executed. Final annotation: Changed Scope – adopted broadly]

Virtual Laboratory driven software patterns

[Diagram labels]
• Layers: Basic OS functions; Common Modules; Bespoke Services; Special config choices
• Stacks: Super Software Stack; NCI Stack 1; NCI Env Stack; WorkflowX; Analytics Stack; 2xStack1; Modify Stack1; Modify Stack 2; P2P; Vis Stack; Gridftp
• Take Stacks from Upstream and use as Bundles

Transition from developer, to prototype, to DevOps

Step 1: Development
• Get template for development
• What is special, separate out what is common
• Reuse other software stacks where possible

Step 2: Prototype
• Deploy in an isolated tenant of a cloud
• Determine dependencies.
• Test cases to demonstrate correct functioning.

Step 3: Sustainability
• Pull repo into operational tenant
• Prepare bundle for integration with rest of framework
• Hand back cleaned bundle
• Establish DevOps process

DevOps approach to building and operating environments

[Diagram labels]
• NCI Core Bundles
• Community1 repo
• Community2 repo
• Virtual Laboratory Operational Bundle
– Git controlled
– pull model
– continuous integration testing

Advantages

• Separates roles and responsibilities:
– Specialist on package
– VL managers
– system admin
• anti-architecture: “Architecture” to “framework”
– flexible with technology change
– makes handover easier
• Both Test/Dev/Ops and patches/rollback become BAU
• Sharable bundles
• Can tag release of software stacks
• Precondition for trusted software stacks
• Provenance - Scientific / gov policy scrutiny

Advantages cont…

• Transforms the system admins
– Role change, from gatekeeper to DevOps management
– New skills, new way of thinking
• Separates out root trust for global storage
– dev teams are limited to test areas
– Root access for ops but can be a limited group
• Only Operating System provided to boot from
– Remove old-style Golden (fragile) Images
– Easier to security patch
• Glue bundles together into different software stacks
– addresses the bloated node problem
– scale out generally easier
– Standard system configs go into “core” bundle (LDAP, logs, easter eggs)
– Recast project specific bundles to common, or core.
• Performance issues can be addressed across the Virtual Labs/in the core
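The idea of gluing bundles into different software stacks can be pictured as simple layering, where later bundles extend or override earlier ones. Everything in this Python sketch is hypothetical: the bundle names, package names and versions are invented for illustration and do not describe NCI's actual tooling.

```python
def compose_stack(*bundles):
    """Layer bundles in order; later bundles override earlier package pins."""
    stack = {}
    for bundle in bundles:
        stack.update(bundle)
    return stack

# Hypothetical bundles: package name -> pinned version
core = {"ldap-client": "1.0", "logging-agent": "2.1"}
analytics = {"python": "2.7", "netcdf": "4.3"}
vis = {"python": "2.7", "paraview": "4.1"}
project_overrides = {"netcdf": "4.4"}   # project-specific change on top of common bundles

analytics_stack = compose_stack(core, analytics, project_overrides)
vis_stack = compose_stack(core, vis)

print(analytics_stack)
print(vis_stack)
```

Keeping standard system configuration in a shared core bundle means each stack carries only what it needs, which is one way to read the point about the bloated node problem.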

A snapshot of layered bundles

Collaboration: Bureau of Meteorology, CSIRO, NCI, ARCCSS

Climate and Weather Science Lab

VDI - Virtualised Desktop Infrastructure

Timetable
- Early access started on 2 Sept; general release to CWSlab in late September
- Incorporate into all VLs (e.g. current AGDC Datacube to be upgraded)

VDI – cont …

Progress toward Major Milestones

• Trans-disciplinary science
To publish, catalogue and access self-documented data and software for enhancing trans-disciplinary, big data science within interoperable data services and protocols.

• Integrity of Science
Managed services to capture a workflow’s process as a comparable, traceable output. Ease-of-access to data and software for enhanced workflow development and repeatable science which can be conducted with less effort or an acceleration of outputs.

• Integrity of Data
The data repository services to ensure data integrity, provenance records, universal identifiers, repeatable data discovery and access from workflows or interactive users.

New Challenges

• Auth: Authentication and Authorisation
– Path forward … OAuth2-style model.
– How to enable at all service provider points?
– Attributes, not virtual organisations
• Trusted software
– Related to citation, but same issues as data
• Provenance
– Need well thought out complex graphs, not just pre-canned stacks
• Effectively using new data technology
– It’s no longer just POSIX
– Do we have to copy the same data into different forms?
– Libraries increasingly have a new role to play to hide complexity
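One reading of "attributes, not virtual organisations" is that each service decides access from attributes asserted about the user, rather than from membership of a fixed group. The Python sketch below illustrates that idea only; it is not an OAuth2 implementation, and the attribute names and policies are hypothetical.

```python
def is_authorised(user_attributes, required_attributes):
    """Grant access when the user holds every required attribute/value pair."""
    return required_attributes.items() <= user_attributes.items()

# Hypothetical attributes released by an identity provider
user = {"affiliation": "example-university", "project": "example-project", "role": "researcher"}

# Hypothetical policies attached to two services
dataset_policy = {"project": "example-project"}
admin_policy = {"role": "operator"}

print(is_authorised(user, dataset_policy))
print(is_authorised(user, admin_policy))
```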

@BenJKEvans