Early Access to NCI Climate Data & Analysis Systems

Ben Evans Ben.Evans@anu.edu.au

NCI: Vision and Role

• Vision
  – Provide Australian researchers with a world-class, high-end computing service

• Aim
  – To make a real difference to the conduct, ambition and outcomes of leading Australian research

• Role
  – Sustain / develop Australian capability computing targeted to Earth Systems Science
  – Provide comprehensive computational support for data-intensive research
  – Provide specialised support for key research communities
  – Develop and maintain a strategic plan to support the advancement of national high-end computing services
  – Provide access through partner shares and the national merit allocation scheme

• Partners
  – ANU, CSIRO, Geoscience Australia, Bureau of Meteorology (2012)

Providing capability in modelling & data-intensive science

• Peak Infrastructure
  – Vayu: Sun Constellation
  – Commissioned 2010
  – 140 Tflops, 240K SPEC, 36 TBytes/800 TBytes
  – Well balanced; good performance
  – Batch oriented, parallel processing

• Data Storage Cloud
  – Large/fast/reliable data storage
  – Persistent disk-resident or tape
  – Relational databases
  – National datasets and support for Virtual Observatories
  – New Storage Infrastructure (Nov 2011)
  – Dual site storage
  – Dual site data services

• Data Intensive Computing
  – Data analysis
  – Data Compute Cloud (2011)

Options for data services

• Data Cloud spans two physical sites
• Internal network completed June 2011
• Redundant 10 GigE network links to AARNet completed Aug 2011
• Data migration from old tape silo to disk mid-Sept 2011
• Floor replacement and additional pod cooling completed end Sept 2011
• Storage architecture physical layer acceptance due end Oct 2011

Options for data services

• HSM (tape): 1 or 2 copies at 2 locations
• Scratch disk: one location, 2 speed options
• Persistent disk: two locations, 2 speed options
• Persistent data services: two locations, movable/synchronised VMs
• Self-managed backup, synchronised data, or HSM

• Filesystems layered on top of hardware options
• Specialised: Databases, Storage Objects, HDFS, …

• Domain speciality: ESG, THREDDS/OPeNDAP, Workflows, Data Archives (OAIS)
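The storage options above differ mainly in how many physical sites hold the data. As an illustration only, the Python sketch below encodes the options as a small table and picks out those that would still hold data if one site were lost. The class and field names are hypothetical, HSM is entered as its two-copy variant, and it assumes that "two locations" means a copy at each site.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StorageOption:
    """One storage option from the slide (class and field names are hypothetical)."""
    name: str
    locations: int                       # physical sites that hold a copy
    speed_options: Optional[int] = None  # None where the slide gives no figure


# Transcribed from the "Options for data services" slide. HSM is entered as
# its two-copy variant, since only that can place data at both locations.
OPTIONS: List[StorageOption] = [
    StorageOption("HSM (tape, 2 copies)", locations=2),
    StorageOption("Scratch disk", locations=1, speed_options=2),
    StorageOption("Persistent disk", locations=2, speed_options=2),
    StorageOption("Persistent data services", locations=2),
]


def survives_site_loss(options: List[StorageOption]) -> List[str]:
    """Names of options that still hold data if one of the two sites is lost."""
    return [o.name for o in options if o.locations >= 2]


print(survives_site_loss(OPTIONS))
# ['HSM (tape, 2 copies)', 'Persistent disk', 'Persistent data services']
```

Scratch disk is the only single-site option, which is consistent with its role as working space rather than a place to keep data long term.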

NCI: Cloud Computing for Data Intensive Science

Special Features for the Data-Intensive Cloud
• Integration into NCI Environment
• Fast access to: Storage, Filesystems and Significant Datasets
• Compute (and some GPU)
• Cloud Environments: deployed on demand
• NF software: large repository
• Integrated data analysis tools and visualisation
• Networks
  – general (AARNet, CSIRO, GA, …)
  – direct (HPC Computing, Telescopes, Gene Sequencers, CT scanners, …, International Federations)
• More features …

Astronomy Virtual Observatory

[Figure panels: X-ray, Optical, Radio]

• An era of cross-fertilisation: matching surveys that span the electromagnetic spectrum

• ANU’s SkyMapper telescope: providing the world’s first digital map of the southern sky

• The measurements of brightness and position will form the basis of countless science programs and calibrate other future surveys

Less time at the telescope, more time in front of the computer!

Earth Observations Workflow

E.g. National Environmental Satellite Data Backbone at NCI (CSIRO, GA)

Earth Observing (EO) sensors carried on space-borne platforms produce large (multiple TB/year) data sets serving multiple research and application communities.

NCI has established a single national archive of raw (unprocessed) MODIS data for the Australian region, together with processing software (common to all users) and specialised tools for applications. LANDSAT data is now being processed.

The high-quality historical archive is complemented by exploiting NCI's network connectivity to download and merge, in real time, data acquired directly from the spacecraft by local reception stations all around Australia.

Data products and tools are available through web technologies and embedded workflows.

Collaborators: King, Evans, Lewis, Wu, Lymburner

An Earth Systems Virtual Laboratory
• ESG: internationally significant climate model data
• Data analysis capability
• …

Earth Systems Grid and Access to Analysis

Mission: The DOE SciDAC-2 Earth System Grid Center for Enabling Technologies (ESG-CET) project aims to provide climate researchers worldwide with access to the data, information, models, analysis tools, and computational resources required to make sense of enormous climate simulation datasets.

ESGF: a worldwide federation of sites providing data. Core nodes are PCMDI (LLNL), BADC (UK), DKRZ (MPI) and NCAR/JPL. NCI joined to provide an Australian node.

NCI: Support the ESG as the Australian node and subsequent processing
  – Support publishing of Australian data as a primary node
  – Store/replicate priority overseas data
  – Provide associated compute/storage services through NCI shares

Methods to get climate data

• Providing two main methods to access climate data:
  1. ESG portal web site
  2. Filesystem on the NCI data cloud

• Providing computational systems for analysing the data:
  1. dcc (data compute cloud)
  2. …
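For the filesystem route, it helps to know where a given variable lives. The Python sketch below builds a path following the CMIP5 Data Reference Syntax (DRS) directory ordering and file-name pattern. Whether the NCI data cloud uses exactly this layout is an assumption, and the root `/data`, the version directory `v20110518` and the time range are hypothetical example values.

```python
from pathlib import PurePosixPath


def cmip5_drs_path(root: str, institute: str, model: str, experiment: str,
                   frequency: str, realm: str, mip_table: str, ensemble: str,
                   version: str, variable: str, time_range: str,
                   activity: str = "cmip5", product: str = "output1") -> PurePosixPath:
    """Build a CMIP5 DRS-style path to one variable file.

    The directory ordering follows the CMIP5 DRS; that the NCI data cloud
    uses exactly this layout (and this root) is an assumption.
    """
    directory = PurePosixPath(root, activity, product, institute, model,
                              experiment, frequency, realm, mip_table,
                              ensemble, version, variable)
    # DRS file name: <variable>_<MIP table>_<model>_<experiment>_<ensemble>_<time range>.nc
    filename = f"{variable}_{mip_table}_{model}_{experiment}_{ensemble}_{time_range}.nc"
    return directory / filename


# Hypothetical example: monthly near-surface air temperature from CSIRO-Mk3.6.
path = cmip5_drs_path("/data", "CSIRO-QCCCE", "CSIRO-Mk3-6-0", "historical",
                      "mon", "atmos", "Amon", "r1i1p1", "v20110518", "tas",
                      "185001-200512")
print(path)
```

Because every path component is a facet (model, experiment, variable, …), the same function also describes the unit of a variable-level request: one variable for one ensemble member of one experiment.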

Current Status

Status of Publishing: CSIRO-QCCCE Mk3.6
• Models run at QCCCE facility and data transferred to NCI
• Data processed and passed through L1 checks (check)
• E.g. 7 Tbytes of final output required 90+ Tbytes of temporary space on Vayu and 200 Tbytes on the data cloud
• Moving data between Vayu and dc was time consuming
• Currently completing the rest of the data processing
• Expect final size to be ~30 Tbytes

Status of Publishing: CAWCR ACCESS
• Preindustrial runs commencing by early October
• 500 years at 250 year intervals
• First publish by end of Dec (20 Tbytes); second release around Feb

Status of ESG software
• Stable release and serving data
• Federated identity/authorisation systems are evolving
• Continual updates (data replication not yet working within ESG stack)
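The Mk3.6 example gives a sense of how much intermediate storage post-processing needed. The short calculation below just turns the quoted figures into ratios of temporary space to final output; the function name is hypothetical, and whether these ratios carry over to other models is not stated in the slides.

```python
def temp_to_final_ratio(temp_tb: float, final_tb: float) -> float:
    """TB of temporary space needed per TB of final output."""
    return temp_tb / final_tb


FINAL_TB = 7          # final Mk3.6 output in the example
VAYU_TEMP_TB = 90     # "90+" on the slide, so this ratio is a lower bound
CLOUD_TEMP_TB = 200   # temporary space on the data cloud

print(round(temp_to_final_ratio(VAYU_TEMP_TB, FINAL_TB), 1))   # 12.9
print(round(temp_to_final_ratio(CLOUD_TEMP_TB, FINAL_TB), 1))  # 28.6
```

In other words, each terabyte published needed on the order of tens of terabytes of working space while it was being produced.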

CMIP5 Modelling groups and data being generated

CAWCR - Centre for Australian Weather and Climate Research
CCCMA - Canadian Centre for Climate Modelling and Analysis
CCSM - Community Climate System Model
CMA-BCC - Beijing Climate Center, China Meteorological Administration
CMCC - Centro Euro-Mediterraneo per i Cambiamenti Climatici
CNRM-CERFACS - Centre National de Recherches Meteorologiques - Centre Europeen de Recherche et Formation Avancees en Calcul Scientifique
EC-Earth - Europe
FIO - The First Institute of Oceanography, SOA, China
GCESS - College of Global Change and Earth System Science, Beijing Normal University
GFDL - Geophysical Fluid Dynamics Laboratory
INM - Russian Institute for Numerical Mathematics
IPSL - Institut Pierre Simon Laplace
LASG - Institute of Atmospheric Physics, Chinese Academy of Sciences, China
MIROC - University of Tokyo, National Institute for Environmental Studies, and Japan Agency for Marine-Earth Science and Technology
MOHC - UK Met Office Hadley Centre
MPI-M - Max Planck Institute for Meteorology
MRI - Meteorological Research Institute, Japan
NASA GISS - NASA Goddard Institute for Space Studies, USA
NCAR - US National Centre for Atmospheric Research
NCAS - UK National Centre for Atmospheric Science
NCC - Norwegian Climate Centre
NIMR - Korean National Institute for Meteorological Research
QCCCE-CSIRO - Queensland Climate Change Centre of Excellence and Commonwealth Scientific and Industrial Research Organisation
RSMAS - University of Miami - RSMAS

24 modelling groups, 25 platforms being described, 44 models, 65 grids, and 223 simulations

Simulations:
• ~90,000 years
• ~60 experiments within CMIP5
• ~20 modelling centres (from around the world) using several model configurations each
• ~2 million output “atomic” datasets
• ~10s of petabytes of output
• ~2 petabytes of CMIP5 requested output
• ~1 petabyte of CMIP5 “replicated” output

The replicated output will be held at a number of sites (including ours), and is arriving now!

Of the replicated output:
• ~220 TB decadal
• ~540 TB long term
• ~220 TB atmos-only

• ~80 TB of 3-hourly data
• ~215 TB of ocean 3D monthly data!
• ~250 TB for the cloud feedbacks!
• ~10 TB of land-biochemistry (from the long term experiments alone)

Slide sourced from Metafor web site early 2011.
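A quick consistency check on these figures: the three experiment classes should add up to roughly the ~1 petabyte of replicated output. The sketch below assumes the decadal, long term and atmos-only volumes are disjoint, and that the later items (3-hourly, ocean 3D monthly, cloud feedbacks, land-biochemistry) cut across those classes, so they are not added again.

```python
# Approximate replicated CMIP5 volumes by experiment class, in TB (from the slide).
REPLICANT_TB = {"decadal": 220, "long term": 540, "atmos-only": 220}

total_tb = sum(REPLICANT_TB.values())
# Percentage share of each class, rounded to one decimal place.
shares = {name: round(100 * tb / total_tb, 1) for name, tb in REPLICANT_TB.items()}

print(total_tb)          # 980
print(total_tb / 1000)   # 0.98 (PB, taking 1 PB = 1000 TB)
print(shares)            # {'decadal': 22.4, 'long term': 55.1, 'atmos-only': 22.4}
```

The total of about 0.98 PB matches the ~1 PB headline, with the long term experiments making up just over half.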

CMIP5 Submission

CMIP5 Archive Status (automatically generated)

Last Update: Monday, 19 September 2011 03:21AM (UTC)

Summary
  Modeling centers   13
  Models             17
  Data nodes         13
  Gateways           5
  Datasets           15,950
  Size               177.4 TB
  Files              40,7792

Datasets by Access Protocol

Protocol   Number of Centers   Number of Models   Number of Datasets   Size (TB)
HTTP       13                  19                 15,958               177.4
GridFTP    6                   6                  1,819                41.6
OPeNDAP    3                   3                  408                  25
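Dividing size by dataset count gives a rough average dataset size per protocol. The sketch below does only that arithmetic on this single snapshot, using decimal units (1 TB = 1000 GB); it says nothing about individual datasets, which vary widely in size.

```python
# (number of datasets, size in TB) per access protocol, from the table.
PROTOCOLS = {
    "HTTP": (15_958, 177.4),
    "GridFTP": (1_819, 41.6),
    "OPeNDAP": (408, 25.0),
}


def mean_dataset_gb(datasets: int, size_tb: float) -> float:
    """Average dataset size in GB, using decimal units (1 TB = 1000 GB)."""
    return size_tb * 1000 / datasets


for name, (datasets, size_tb) in PROTOCOLS.items():
    print(f"{name:8s} {mean_dataset_gb(datasets, size_tb):5.1f} GB")
```

This prints about 11.1 GB for HTTP, 22.9 GB for GridFTP and 61.3 GB for OPeNDAP, so in this snapshot the datasets exposed through GridFTP and OPeNDAP were on average larger than the HTTP set as a whole.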

Datasets on our data cloud

Modelling Centre   Model                              Capacity (TB)
CSIRO-QCCCE        CSIRO-Mk3.6                        ~15
IPSL               IPSL-CM5A-LR                       9.4
INM                inmcm4                             6.3
CMIP3              All                                35.4
All                High Priority Variables - Lawson   7.14
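Summing the capacity column gives the volume these holdings occupy on the data cloud. The sketch below treats the approximate "~15" for CSIRO-Mk3.6 as exactly 15 TB, so both totals are approximate.

```python
# Capacity per entry in TB, from the table; CSIRO-Mk3.6 is given only as "~15".
CAPACITY_TB = {
    "CSIRO-Mk3.6": 15.0,
    "IPSL-CM5A-LR": 9.4,
    "inmcm4": 6.3,
    "CMIP3 (all)": 35.4,
    "High Priority Variables - Lawson": 7.14,
}

# The three individually listed models are CMIP5 models.
cmip5_models_tb = sum(CAPACITY_TB[m] for m in ("CSIRO-Mk3.6", "IPSL-CM5A-LR", "inmcm4"))
total_tb = sum(CAPACITY_TB.values())

print(f"CMIP5 model data: {cmip5_models_tb:.1f} TB")  # 30.7 TB
print(f"All listed data:  {total_tb:.2f} TB")         # 73.24 TB
```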

Status of Data Replication

Data Replication

Data replication is supported by two methods:
  1. Bulk, fast transfers by ESG nodes
  2. User-initiated data transfers at the variable level

• The first method is fast but requires coordination between the sites and across international networks

• The second method can be very slow but is relatively “simple”

• We will provide more details during our session today.
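To see why bulk replication needs coordination, it helps to estimate how long ~1 PB of replicated CMIP5 output takes to move over a single 10 GigE link like those NCI has to AARNet. The sketch below assumes decimal units, one link, and a sustained rate equal to a chosen fraction of line rate (the `efficiency` parameter is an assumption); real international paths may sustain far less.

```python
def transfer_days(volume_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days to move volume_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of line rate actually sustained.
    Uses decimal units: 1 TB = 1e12 bytes, 1 Gbit/s = 1e9 bits per second.
    """
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


print(round(transfer_days(1000, 10), 1))       # 9.3 days at full line rate
print(round(transfer_days(1000, 10, 0.5), 1))  # 18.5 days at half line rate
```

Even the ideal case is over a week of saturated link time, which is why bulk transfers are scheduled between sites rather than left to individual users.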

Data Processing Capability

• We have provided a first iteration of data processing capability (Early Access mode)

• dcc provides data processing directly on the data cloud
• Upgrades in place as more hardware becomes available
  – Filesystem upgrade: mid Nov
  – Data processing: mid Nov (pending order)
  – Some software licence issues being resolved

• Data processing pipelines being understood/established, e.g. CVC data pipeline

Data Replication

Issues:
• Slow links to some sites can get swamped
• ESG software not ready to manage official replicas
• Don’t have a clear view of when model data will be available
• Data may need to be revoked as errors are found
• Data capacity being closely managed/prioritised
• CAWCR, CoE, CSIRO, BoM and shareholders monitoring for future expansion

Keeping in touch

• ESG status messages / ESG Federation web site
• [email protected]
• Twitter feeds:
  – @NCIEarthSystems
  – @NCIdatacloud
  – @NCIpeaksystems

• Open weekly Townhall Q&A meeting: details to be established
• Regular meetings between CAWCR, CoE and NCI on status
• Keep your team leaders advised on your requirements/issues
• More developments planned through a VL proposal

THE END

NCI Data Cloud