TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI) Diane Baxter, Ph.D. San Diego Supercomputer Center University of California, San Diego


TeraGrid Resources

Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Diane Baxter, Ph.D.

San Diego Supercomputer Center

University of California, San Diego


To clarify – “Cyberinfrastructure” . . .

• . . . is a coordinated set of hardware, software, and services, all integrated and working together

•“CI” encompasses networks, computers, data, sensors, handheld devices, other technologies, and the services or human “glue” that holds them all together.

Figure: The “computer” as an integrated set of resources (diagram labels: network, computer, data, storage, viz, field instrument, field sensors, wireless)


TeraGrid National Research Cyberinfrastructure includes:

•Computing systems,

•Data storage systems, and data repositories,

•Visualization environments,

•and People,

•all linked together by High Performance Networks.


TeraGrid . . . .

•Is an open scientific discovery infrastructure

•Provides leadership class resources at 11 partner sites

•Is an integrated, persistent computational resource

•Is the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research.


The National TeraGrid

Figure: map of TeraGrid sites. Legend: Resource Provider (RP); Software Integration Partner; Grid Infrastructure Group (UChicago). Sites shown: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC/ISI, UNC/RENCI, UW, LSU, U Tenn.


http://www.teragrid.org/

A complex collaboration of over a dozen organizations working together to provide cyberinfrastructure that goes beyond what can be provided by individual institutions, to improve research productivity and enable breakthroughs not otherwise possible.


TeraGrid . . .

•Uses high-performance network connections (10-30 Gb/sec)

• Integrates high-performance computers; data resources for analysis, visualization, and storage; data collection tools, high-end experimental facilities; and supporting expertise around the country

•Provides more than a petaflop of computing capability

•Consists of more than 30 petabytes of online and archival data storage, as well as systems to manage data acquisition and access

•Provides researchers access to over 100 discipline-specific databases.


What’s in it (TeraGrid) for me?

• An instrument that delivers high-end IT resources - computation, storage, visualization, and data/services

– A computational facility – over a PetaFLOP in parallel computing capability

– A data storage and management facility - over 30 PetaBytes of storage (disk and tape), over 100 scientific data collections

– A high-bandwidth national data network

•Services: help desk and consulting, Advanced Support for TeraGrid Applications (ASTA), education and training events and resources

•Access - without financial cost

– Research accounts allocated via peer review

– Startup and Education accounts automatic


TeraGrid Compute Power

Computational Resources (size approximate - not to scale)

Slide Courtesy Tommy Minyard, TACC

Sites shown: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Tennessee, LONI/LSU


TG Data storage and management.1 (tape)

• TeraGrid provides persistent storage on disk and tape

• Backups of critical data can be stored at a site remote from your home institution

• Allocatable tape-based storage systems:

• IU (Indiana University) - geographically distributed

• NCAR (National Center for Atmospheric Research) - also supports dual copy

• NCSA (National Center for Supercomputing Applications)

• SDSC (San Diego Supercomputer Center)

• Note: In addition, most sites have massive data storage systems that provide storage in support of computation

• Command line usage is reasonably straightforward with GridFTP, and very easy with the File Manager tool in the TeraGrid User Portal

©Trustees of Indiana University. May be reused so long as IU and TeraGrid logos remain, and any modifications to original are noted. Courtesy Craig A. Stewart, IU


Data storage and management.2 (Disk)

•GPFS-WAN (General Parallel File System Wide Area Network), ~1 petabyte

– Home at San Diego Supercomputer Center; may be accessed as if it were a local file system from NCAR, NCSA, IU, UC/ANL

• IU Data Capacitor

– 1 petabyte of spinning disk

– Primarily for short term storage of data

•Long term disk storage allocations

– Indiana University, National Center for Supercomputing Applications, San Diego Supercomputer Center

©Trustees of Indiana University. May be reused so long as IU and TeraGrid logos remain, and any modifications to original are noted. Courtesy Craig A. Stewart, IU


TeraGrid Architecture

Figure: architecture diagram. Labels: access points (POPS, Science Gateways, User Portal, Command Line); TeraGrid Infrastructure (Network, Authorization, Accounting, …); resource providers RP 1, RP 2, RP 3; services (Compute Service, Viz Service, Data Service; Network, Accounting, …)


????? Translation please!


Enter: Science Gateways

•A Science Gateway

– Enables scientific communities of users with a common scientific goal

– Has a common interface

– Leverages community investment

•Three common forms:

– Web-based Portals

– Application programs running on users' machines but accessing services in TeraGrid

– Coordinated access points enabling users to move seamlessly between TeraGrid and other grids.


Today, there are approximately 29 gateways using the TeraGrid


How do Gateways help?

•Make science more productive

– Researchers use same tools

– Complex workflows

– Common data formats

– Data sharing

•Bring TeraGrid capabilities to the broad science community

– Lots of disk space

– Lots of compute resources

– Powerful analysis capabilities

– A community-friendly interface to information and research tools


But it’s not just ease of use. What can scientists do that they couldn’t do previously?

• LEAD - access to radar data

• NVO – access to sky surveys

• OOI – access to sensor data

• PolarGrid – access to polar ice sheet data

• SIDGrid – analysis tools

• GridChem – developing multiscale coupling

How would this have been done before gateways?


Gateways can further investments in other projects

•Increase access

– To instruments

•Increase capabilities

– To data analysis tools

•Improve workforce development

– For underserved populations, through broad access to learning resources

•Increase outreach

•Increase public awareness

– Public sees value in investments in large facilities

•Slice bread


Gateways Greatly Expand Access

•Almost anyone can investigate scientific questions using high end resources

– Not just those in the research groups of those who request allocations

– Gateways allow anyone with a web browser to explore

•Fosters new ideas, cross-disciplinary approaches

•Encourages students to experiment

•But used in production too

– Significant number of papers resulting from gateways including GridChem, nanoHUB

– Scientists can focus on challenging science problems rather than challenging infrastructure problems


Advanced support for Gateway Development

•Same peer review process used to request resources

– 30,000 CPUs

– + 6 months of help from a TG Gateway Team member

– Reviews based on appropriate use of resources; science is not reviewed if already funded

•Petascale

•Multisite workflows

•Gateways

•Domain expertise


Support is Very Targeted

• Start with well-defined objectives

– Focus on efficient or novel use of national CI resources

• Minimum .25 FTE for months to a year

– Enough investment to really understand and help solve complex problems

• Must have commitment from PIs

– Want to make sure work is incorporated into production codes and gateways

• Good candidates for targeted support include:

– Large, high impact projects

– Ability to influence new communities

– Suggestions from NSF directorates on important projects

• Lessons learned move into training and documentation


When might a gateway be most appropriate?

•Researchers using defined sets of tools in different ways

– Same executables, different input

    •GridChem, CHARMM

– Creating multi-scale or complex workflows

– Shared datasets

•Common data formats

– National Virtual Observatory

– Earth System Grid

– Some groups have invested significant efforts here

    •caBIG, extensive discussions to develop common terminology and formats

    •BIRN, extensive data sharing agreements

•Difficult to access data/advanced workflows

– Sensor/radar input

    •LEAD, GEON


TeraGrid Pathways Activities

•2 Gateway components

– Adapt gateways for educational use by underrepresented communities

    •GEON – SDSC, Navajo Tech

– Teach participants from underrepresented communities how to build gateways

    •PolarGrid – IU, ECSU


Navajo Technical College and gateways

•Incorporating the use of gateways in their curricula

•GEON, GISolve areas of initial interest


What you can do with the TeraGrid: Simulation of cell membrane processes

Work by Emad Tajkhorshid and James Gumbart, of University of Illinois Urbana-Champaign.

– Mechanics of Force Propagation in TonB-Dependent Outer Membrane Transport. Biophysical Journal 93:496-504 (2007).

– Results of the simulation may be seen at www.life.uiuc.edu/emad/TonB-BtuB/btub-2.5Ans.mpg

• Modeled mechanisms for transport of molecules through cell membrane.

• Used 400,000 CPU hours [45 processor-years] on systems at National Center for Supercomputing Applications, IU, Pittsburgh Supercomputing Center

Image courtesy of Emad Tajkhorshid, UIUC
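As a rough check on the bracketed figure above, CPU hours can be converted to processor-years by dividing by the number of hours in a year. The sketch below is illustrative only; the function name and the 365-day year are assumptions, not part of the original slides.

```python
# Assumption: one processor-year = one processor running nonstop for 365 days.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def cpu_hours_to_processor_years(cpu_hours: float) -> float:
    """Convert CPU hours to the years a single processor would need."""
    return cpu_hours / HOURS_PER_YEAR


# Figure quoted on the slide for the membrane transport simulation.
years = cpu_hours_to_processor_years(400_000)
print(f"{years:.1f} processor-years")  # prints "45.7 processor-years"
```

The result, about 45.7, is consistent with the 45 processor-years quoted on the slide, which appears to round down.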


Predicting storms

• Hurricanes and tornadoes cause massive loss of life and damage to property

• TeraGrid supported spring 2007 NOAA and University of Oklahoma Hazardous Weather Testbed

– Major Goal: assess how well ensemble forecasting predicts thunderstorms, including supercells and tornadoes.

– Delivers “better than real time” prediction

– Used 675,000 CPU hours for the season

– Used 312 TB on HPSS storage at PSC

Slide courtesy of Dennis Gannon, IU, and LEAD Collaboration
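To give a sense of scale for the 312 TB stored for the season, the sketch below estimates how long moving that much data would take over a single sustained link. The 10 Gb/s rate, decimal terabytes (1 TB = 10^12 bytes), and the function name are assumptions for illustration; real transfers would also see protocol overhead and contention.

```python
def transfer_hours(terabytes: float, link_gbps: float) -> float:
    """Estimate hours to move `terabytes` of data over a link of `link_gbps`.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gb = 1e9 bits) and a
    perfectly sustained rate with no protocol overhead.
    """
    bits = terabytes * 1e12 * 8          # bytes -> bits
    seconds = bits / (link_gbps * 1e9)   # bits / (bits per second)
    return seconds / 3600


# 312 TB is the storage figure from the slide; 10 Gb/s is a hypothetical rate.
hours = transfer_hours(312, 10)
print(f"{hours:.1f} hours")  # prints "69.3 hours"
```

Under these assumptions the transfer takes roughly 69 hours, or nearly three days, which illustrates why storing and analyzing data near the compute resources matters.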


PolarGrid

•Cyberinfrastructure Center for Polar Science (CICPS)

– Experts in polar science, remote sensing and cyberinfrastructure

– Indiana, ECSU, CReSIS

•Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland

– Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes

http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v

Source: Geoffrey Fox


•Components of PolarGrid

– Expedition grid consisting of ruggedized laptops in a field grid linked to a low power multi-core base camp cluster

– Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State (ECSU) split between research, education and training.

– Gives ECSU a top-ranked 5 Teraflop MSI high performance computing system

•Access to expensive data

•High-end resources for analysis

•MSI student involvement

Source: Geoffrey Fox


Recent Gateways using TeraGrid Significantly

•SCEC

•SIDGrid

•CIG