32
Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Embed Size (px)

Citation preview

Page 1: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Using GENI for computational science

Ilya Baldin

RENCI, UNC – Chapel Hill

Page 2: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Networked Clouds

Cloud and Network Providers

Observatory

Wind tunnel

Science Workflows

Page 3: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

ExoGENI Testbed

Page 4: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

4

Computational/Data Science Projects on ExoGENI

• ADAMANT – Building tools for enabling workflow-based scientific applications on dynamic infrastructure (RENCI, Duke, USC/ISI)

• RADII – Building tools for supporting collaborative data-driven science (RENCI)

• GENI ScienceShakedown – ADCIRC storm surge modeling on GENI

• Goal of presentation to demonstrate some of the things that are possible with GENI today

Page 5: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Presentation title goes here 5

ADAMANT

Page 6: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Presentation title goes here 6

Scientific Workflows – Dynamic Use Case

Page 7: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

7

CC-NIE ADAMANT – Pegasus/ExoGENI

• Network Infrastructure-as-a-Service (NIaaS) for workflow-driven applications— Tools for workflows integrated with adaptive infrastructure

• Workflows triggering adaptive infrastructure— Pegasus workflows using ExoGENI— Adapt to application demands (compute, network, storage)— Integrate data movement into NIaaS (on-ramps)

• Target applications — Montage Galactic plane ensemble: Astronomy mosaics— Genomics: High-Throughput Sequencing

Page 8: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

8

ExoGENI: Enabling Features for Workflows

• On-Ramps / Stitchports— Connect ExoGENI to existing static

infrastructure to import/export

• Storage slivering— Networked storage: iSCSI target on

dataplane

— Neuca tools attach lun, format and mount filesystem

• Inter-domain links, multipoint broadcast networks

Page 9: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Computational workflows in Genomics

• Several versions as we scaled:

• Single machine• Cluster based• MapSeq: specialized

code & Condor• Pegasus & Condor

RNA-Seq

WGS

Page 10: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

10

VM

VMVMVMVMVMVM

VMVMVMVMVMVM

VMVMVMVMVMVM

Cloud providers (compute, data)

Goal: learning to use NIaaS for biomedical research

VMVMSlice 1

VMVM

VM

Slice 2

VM

User or workflow provisioned & isolated slices

VMVMVMVM

Network providers

Page 11: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Goal: Management of data flows in NIaaS

RENCI UNC

iRODS Data GridiCAT

RERE

VMVM

VM

Slice 2

VM

• Layer 2 connection within the slice

• Metadata control• Lab X can compute on Project

Y data in the cloud

• User X can move data from Study A to the cloud

• Data from Study W cannot remain on cloud resources

• Ease of access• Control over access• Auditing• Provenance

Page 12: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

12

Example ExoGENI requests auto-generated

Page 13: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Application to NIaaS - Architecture

Page 14: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Presentation title goes here 14

RADII

Page 15: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

RADII

• RADII: Resource Aware Data-centric Collaboration Infrastructure– Middleware to facilitate data-driven collaborations for domain

researchers and a commodity to the science community– Reducing the large gap between procuring the required

infrastructure and manage data transfers efficiently• Integration of data-grid (iRODS) and NIaaS (ORCA)

technologies on ExoGENI infrastructure – Novel tools to map data processes, computations, storage and

organization entities onto infrastructure with intuitive GUI based application

– Novel data-centric resource management mechanisms for provisioning and de-provisioning resources dynamically through out the lifecycle of collaborations

Page 16: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Why iRODS in RADII?

– RADII Policies to iRODS Rule Language• Easy to map policies to iRODS Dynamic PEP

• Reduced complexity for RADII

– Distributed and Elastic Data Grid

– Resource Monitoring Framework

– Geo-aware Resource hierarchy creation via

composable iRODS

– Metadata tagging

Page 17: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Resource Awareness

• iRODS RMS provides node specific resource utilization

• End-to-End parameters such as throughput, current network flow is important for judicious placement, replication and retrieval decision

• Created end-to-end Throughput, Latency and instantaneous transfer RX/TX per second monitoring.

• The best server selection based on end-to-end utility value:

Page 18: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Experiment Topology

Figure: Experimental Setup Topology

Page 19: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Experimental Setup

• The sites were : UCD, SL, UH, FIU• Parallel and multithreaded file ingestion from each of the clients• Total 400GB file ingestion from each client• One copy at the edge node and another replication based on utile

value.

Page 20: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Edge Put and Remote Replication Time

Figure: Edge Node Put Time Figure: Remote Replication Time

Page 21: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Presentation title goes here 21

ScienceShakedown

Page 22: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Motivation

Hurricane Sandy (2012)

Page 23: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Motivation

Real-time, on-demand computations of storm surge impacts• Hazards to coastal areas a major concern • Hazard/Threat Information needed ASAP (Urgently)• Critical need for:

– detailed high spatial resolution large compute resources

• Federal Forecast cycle every 6 hrs• Must be well within Cycle to be relevant/useful • I.e., New information at 5:59 is already old!!!

Page 24: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Computing Storm Surge

• ADCIRC Storm Surge Model– FEMA-approved for Coastal Flood Insurance Studies – Very high spatial resolution (millions of triangles)– Typically use 256-1024 cores for real-time (one simulation!)

ADCIRC grid for coastal North Carolina

Page 25: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Tackling Uncertainty

Research EnsembleNSF Hazards SEES project

22 members, H. Floyd (1999)

One simulation is NOT enough!Probabilistic Assessment of Hurricanes

A “few” likely hurricanesFully dynamic atmosphere (WRF)

Page 26: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Why GENI?

• Current limitations: Real-time demands for compute resource

– Large demands for real-time compute resources during storms

– Not enough demand to dedicate a cluster year-round

Page 27: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Why GENI?

• Current limitations: Real-time demands for compute resource

– Large demands for real-time compute resources during storms

– Not enough demand to dedicate a cluster year-round

• GENI enables

– Federation of resources

– Cloud bursting, urgent, on-demand

– High-speed data transfers to/from/between remote resources

– Replicate data/compute across geographic areas

• Resiliency, performance

Page 28: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Storm Surge Workflow

Parallel task (32 Core MPI)

• Each ensemble member is a high-performance parallel task that calculates one storm

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

ComputeCore

Page 29: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

Slice Topology

• 11 GENI sites (1 ensemble manager, 10 compute sites)• Topology: 92 VMs (368 cores), 10 inter-domain VLANs, 1 TB iSCSI storage• HPC compute nodes: 80 compute nodes (320 cores) from 10 sites

Page 30: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

ADCIRC Results from GENIStorm Surge for 6 simulations

N11 N17

N01 N14 N16

N20

Small Threat

Big Threat

Page 31: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

31

Conclusions

• GENI testbed represents a kind of shared infrastructure suitable for prototyping of solutions for some computational science domains

• GENI technologies represent a collection of enabling mechanisms that can provide foundation for the future federated science cyberinfrastructure

• Different members of GENI federations offer different capabilities for their users, suitable for a variety of problems

Page 32: Using GENI for computational science Ilya Baldin RENCI, UNC – Chapel Hill

32

Thank you!

• Funders

• Partners