Using GENI for computational science
Ilya Baldin
RENCI, UNC – Chapel Hill
Networked Clouds
Cloud and Network Providers
Observatory
Wind tunnel
Science Workflows
ExoGENI Testbed
Computational/Data Science Projects on ExoGENI
• ADAMANT – Building tools for enabling workflow-based scientific applications on dynamic infrastructure (RENCI, Duke, USC/ISI)
• RADII – Building tools for supporting collaborative data-driven science (RENCI)
• GENI ScienceShakedown – ADCIRC storm surge modeling on GENI
• Goal of this presentation: demonstrate some of the things that are possible with GENI today
ADAMANT
Scientific Workflows – Dynamic Use Case
CC-NIE ADAMANT – Pegasus/ExoGENI
• Network Infrastructure-as-a-Service (NIaaS) for workflow-driven applications
  – Tools for workflows integrated with adaptive infrastructure
• Workflows triggering adaptive infrastructure
  – Pegasus workflows using ExoGENI
  – Adapt to application demands (compute, network, storage)
  – Integrate data movement into NIaaS (on-ramps)
• Target applications
  – Montage Galactic plane ensemble: astronomy mosaics
  – Genomics: high-throughput sequencing
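To make the workflow side concrete, here is a minimal sketch using the Pegasus DAX3 Python API of the kind of two-step workflow ADAMANT maps onto ExoGENI slices. The job, file, and executable names below are placeholders, not the actual Montage or genomics pipelines, which contain thousands of such tasks.

```python
# Minimal Pegasus DAX3 sketch of a two-step workflow of the kind
# ADAMANT runs over ExoGENI. All names below are placeholders.
from Pegasus.DAX3 import ADAG, Job, File, Link

dax = ADAG("ensemble-member")

raw = File("region.fits")        # staged-in input (placeholder)
mosaic = File("mosaic.fits")     # intermediate product
stats = File("mosaic-stats.txt") # final output

project = Job(name="mProject")   # mosaic step
project.addArguments(raw, mosaic)
project.uses(raw, link=Link.INPUT)
project.uses(mosaic, link=Link.OUTPUT, transfer=False)
dax.addJob(project)

analyze = Job(name="mStats")     # placeholder analysis step
analyze.addArguments(mosaic, stats)
analyze.uses(mosaic, link=Link.INPUT)
analyze.uses(stats, link=Link.OUTPUT, transfer=True)
dax.addJob(analyze)

# Pegasus schedules the analysis only after the mosaic completes
dax.depends(parent=project, child=analyze)

with open("ensemble-member.dax", "w") as f:
    dax.writeXML(f)
```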
ExoGENI: Enabling Features for Workflows
• On-ramps / stitchports
  – Connect ExoGENI to existing static infrastructure to import/export data
• Storage slivering
  – Networked storage: iSCSI target on the dataplane
  – NEuca tools attach the LUN, format it, and mount the filesystem
• Inter-domain links, multipoint broadcast networks
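For the storage slivering item, a rough sketch of the steps the NEuca tools automate once the iSCSI LUN appears as a block device inside a VM. The device path and mount point are assumptions for illustration; the real tooling also handles the iSCSI attach itself.

```python
import subprocess

# Illustration only: what the NEuca tools automate after an ExoGENI
# storage sliver (iSCSI LUN) is attached to a node. The device path
# and mount point are assumptions for this sketch; requires root.
DEVICE = "/dev/sdb"          # block device exposed by the attached LUN (assumed)
MOUNT_POINT = "/mnt/sliver"  # scratch space the workflow will use (assumed)

def format_and_mount(device: str, mount_point: str) -> None:
    """Create a filesystem on the LUN and mount it."""
    subprocess.run(["mkfs.ext4", "-F", device], check=True)
    subprocess.run(["mkdir", "-p", mount_point], check=True)
    subprocess.run(["mount", device, mount_point], check=True)

if __name__ == "__main__":
    format_and_mount(DEVICE, MOUNT_POINT)
```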
Computational workflows in Genomics
• Several versions as we scaled:
  – Single machine
  – Cluster based
  – MapSeq: specialized code & Condor
  – Pegasus & Condor
Figure: RNA-Seq and WGS pipelines
Figure: user- or workflow-provisioned, isolated slices of VMs (Slice 1, Slice 2) drawn from cloud providers (compute, data) and network providers.
Goal: learning to use NIaaS for biomedical research.
Goal: management of data flows in NIaaS.
Figure: a slice spanning RENCI and UNC, connected to the iRODS Data Grid (iCAT).
• Layer 2 connection within the slice
• Metadata control
  – Lab X can compute on Project Y data in the cloud
  – User X can move data from Study A to the cloud
  – Data from Study W cannot remain on cloud resources
• Ease of access
• Control over access
• Auditing
• Provenance
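A minimal sketch, using the python-irodsclient library, of the kind of metadata-driven control listed above: registering a file into the data grid and tagging it so policies (for example, whether a study's data may remain on cloud resources) can act on it. The host, zone, credentials, paths, and attribute names are placeholders, not the actual RADII configuration.

```python
# Sketch with python-irodsclient: put a file into the data grid and
# tag it with metadata that policy rules can evaluate. Host, zone,
# paths, and attribute names are placeholders, not RADII's setup.
from irods.session import iRODSSession

with iRODSSession(host="irods.example.org", port=1247,
                  user="labx", password="secret", zone="radiiZone") as session:
    logical_path = "/radiiZone/home/labx/studyA/sample_001.fastq"
    session.data_objects.put("sample_001.fastq", logical_path)

    obj = session.data_objects.get(logical_path)
    obj.metadata.add("study", "StudyA")          # which study the data belongs to
    obj.metadata.add("cloud_eligible", "true")   # may this data stay on cloud resources?
```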
Example auto-generated ExoGENI requests
Application to NIaaS – Architecture
RADII
• RADII: Resource Aware Data-centric Collaboration Infrastructure
  – Middleware to facilitate data-driven collaborations for domain researchers and make them a commodity for the science community
  – Reducing the large gap between procuring the required infrastructure and managing data transfers efficiently
• Integration of data-grid (iRODS) and NIaaS (ORCA) technologies on ExoGENI infrastructure
  – Novel tools to map data processes, computations, storage, and organizational entities onto infrastructure through an intuitive GUI-based application
  – Novel data-centric resource management mechanisms for provisioning and de-provisioning resources dynamically throughout the lifecycle of collaborations
Why iRODS in RADII?
– RADII policies map to the iRODS Rule Language
  • Easy to map policies to iRODS dynamic PEPs
  • Reduced complexity for RADII
– Distributed and elastic data grid
– Resource monitoring framework
– Geo-aware resource hierarchy creation via composable iRODS
– Metadata tagging
Resource Awareness
• iRODS RMS provides node-specific resource utilization
• End-to-end parameters such as throughput and current network flow are important for judicious placement, replication, and retrieval decisions
• Created end-to-end monitoring of throughput, latency, and instantaneous RX/TX transfer rates per second
• The best server is selected based on an end-to-end utility value (sketched below)
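The exact utility function is not reproduced here; the sketch below assumes a simple weighted score that favors high measured throughput and low latency, with placeholder weights and measurements, just to show the selection mechanics.

```python
# Sketch of utility-based server selection. RADII's actual utility
# function is not given in this deck; this assumes a simple weighted
# combination of end-to-end throughput and latency (weights are placeholders).
from dataclasses import dataclass

@dataclass
class ServerStats:
    name: str
    throughput_mbps: float  # measured end-to-end throughput
    latency_ms: float       # measured end-to-end latency

def utility(s: ServerStats, w_tp: float = 1.0, w_lat: float = 0.1) -> float:
    # Higher throughput raises the score, higher latency lowers it.
    return w_tp * s.throughput_mbps - w_lat * s.latency_ms

def best_server(stats: list[ServerStats]) -> ServerStats:
    return max(stats, key=utility)

if __name__ == "__main__":
    # Site names come from the experiment below; the numbers are invented.
    candidates = [
        ServerStats("UCD", 850.0, 45.0),
        ServerStats("FIU", 920.0, 60.0),
        ServerStats("UH", 400.0, 20.0),
    ]
    print(best_server(candidates).name)
```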
Experiment Topology
Figure: Experimental Setup Topology
Experimental Setup
• The sites were UCD, SL, UH, and FIU
• Parallel and multithreaded file ingestion from each of the clients
• Total of 400 GB of files ingested from each client
• One copy at the edge node and another replica placed based on utility value
Edge Put and Remote Replication Time
Figures: Edge Node Put Time; Remote Replication Time
ScienceShakedown
Motivation
Hurricane Sandy (2012)
Real-time, on-demand computation of storm surge impacts
• Hazards to coastal areas are a major concern
• Hazard/threat information is needed ASAP (urgently)
• Critical need for:
  – detailed, high spatial resolution
  – large compute resources
• Federal forecast cycle every 6 hours
  – Must be well within the cycle to be relevant/useful
  – i.e., new information at 5:59 is already old!
Computing Storm Surge
• ADCIRC storm surge model
  – FEMA-approved for coastal flood insurance studies
  – Very high spatial resolution (millions of triangles)
  – Typically uses 256-1024 cores for real-time (one simulation!)
Figure: ADCIRC grid for coastal North Carolina
Tackling Uncertainty
Research ensemble (NSF Hazards SEES project): 22 members, Hurricane Floyd (1999)
One simulation is NOT enough! Probabilistic assessment of hurricanes:
• A "few" likely hurricanes
• Fully dynamic atmosphere (WRF)
Why GENI?
• Current limitation: real-time demands for compute resources
– Large demands for real-time compute resources during storms
– Not enough demand to dedicate a cluster year-round
• GENI enables
– Federation of resources
– Cloud bursting, urgent, on-demand
– High-speed data transfers to/from/between remote resources
– Replicate data/compute across geographic areas
• Resiliency, performance
Storm Surge Workflow
Parallel task (32 Core MPI)
• Each ensemble member is a high-performance parallel task that calculates one storm
Figure: 32 compute cores allocated to one ensemble member.
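As a rough illustration of the figure above, a sketch of how an ensemble manager might launch one 32-core MPI member. The executable name (padcirc, the parallel ADCIRC binary) and the run-directory layout are assumptions, and in the actual slice the members are dispatched across GENI sites by the workflow system rather than looped over locally.

```python
import subprocess
from pathlib import Path

# Sketch only: launch one 32-core MPI ensemble member. "padcirc" and
# the ensemble/member_* directory layout are assumptions for illustration.
def run_member(member_dir: Path, cores: int = 32) -> int:
    """Run one ADCIRC ensemble member in its prepared run directory."""
    cmd = ["mpirun", "-np", str(cores), "./padcirc"]
    result = subprocess.run(cmd, cwd=member_dir)
    return result.returncode

if __name__ == "__main__":
    # The real workflow runs members concurrently across sites;
    # this loop is sequential for simplicity.
    for member in sorted(Path("ensemble").glob("member_*")):
        print(member.name, "exit code:", run_member(member))
```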
Slice Topology
• 11 GENI sites (1 ensemble manager, 10 compute sites)
• Topology: 92 VMs (368 cores), 10 inter-domain VLANs, 1 TB iSCSI storage
• HPC compute nodes: 80 compute nodes (320 cores) from 10 sites
ADCIRC Results from GENI: Storm Surge for 6 Simulations
Figure: surge results for six ensemble members (N01, N11, N14, N16, N17, N20), ranging from a small threat to a big threat.
Conclusions
• The GENI testbed represents a kind of shared infrastructure suitable for prototyping solutions for some computational science domains
• GENI technologies represent a collection of enabling mechanisms that can provide a foundation for the future federated science cyberinfrastructure
• Different members of the GENI federation offer different capabilities to their users, suitable for a variety of problems
Thank you!
• Funders
• Partners