TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI) Diane Baxter, Ph.D. San Diego Supercomputer Center University of California, San Diego


Page 1: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Diane Baxter, Ph.D.
San Diego Supercomputer Center
University of California, San Diego

Page 2: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

The National TeraGrid

[Map of partner sites: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC/ISI, UNC/RENCI, UW, LSU, U Tenn.]

Map legend: Resource Provider (RP); Software Integration Partner; Grid Infrastructure Group (UChicago)

Page 3: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

http://www.teragrid.org/

A complex collaboration of over a dozen organizations working together to provide cyberinfrastructure that goes beyond what can be provided by individual institutions, to improve research productivity and enable breakthroughs not otherwise possible.

Page 4: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

TeraGrid . . . .

•Deep - provides leadership class resources at 11 partner sites

•Wide - is an integrated, persistent computational resource for broad user communities

•Open - is an open scientific discovery infrastructure

•Is the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research.

Page 5: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

To be more specific, TeraGrid . . .

• Uses high-performance network connections (10-30 Gb/sec)

• Integrates high-performance computers; resources for data analysis, visualization, and storage; data collection tools, high-end experimental facilities; and supporting expertise around the country;

• Provides more than a petaflop of computing capability;

• Offers more than 30 petabytes of online and archival data storage, as well as systems to manage data acquisition and access; and

• Provides researchers access to over 100 discipline-specific databases.

Page 6: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

What’s in it (TeraGrid) for me?

• An instrument that delivers high-end IT resources - computation, storage, visualization, and data/service

– A computational facility – over a PetaFLOP in parallel computing capability

– A data storage and management facility - over 30 PetaBytes of storage (disk and tape), over 100 scientific data collections

– A high-bandwidth national data network

• Services: help desk and consulting, Advanced Support for TeraGrid Applications (ASTA), education and training events and resources

• Access - without financial cost
  – Research accounts allocated via peer review
  – Startup and Education accounts automatic

Page 7: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

TeraGrid Compute Power

[Figure: Computational resources by site (size approximate, not to scale): SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Tennessee, LONI/LSU]

Slide courtesy of Tommy Minyard, TACC

Page 8: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

TeraGrid Data Storage and Management

• Persistent storage on disk and tape

• Allocatable tape-based, geographically distributed storage systems for backups of critical data:
  » IU (Indiana University)
  » NCAR (National Center for Atmospheric Research)
  » NCSA (National Center for Supercomputing Applications)
  » SDSC (San Diego Supercomputer Center)

• Access from the command line with GridFTP, or through the File Manager tool in the TeraGrid User Portal

• GPFS-WAN (General Parallel File System Wide Area Network): ~1 petabyte

• IU Data Capacitor (1 PB of spinning disk for short-term data storage)

• Long term disk storage allocations
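As a rough illustration of the command-line route to TeraGrid storage, the sketch below builds (but does not run) an invocation of `globus-url-copy`, the GridFTP client shipped with the Globus Toolkit. The host names and paths are hypothetical placeholders rather than real TeraGrid endpoints, and the default of four parallel streams is only an assumption for the example.

```python
import shlex


def gridftp_command(src_url, dst_url, parallel_streams=4):
    """Return the argument list for a GridFTP copy with globus-url-copy.

    -vb reports transfer performance while copying;
    -p sets the number of parallel data streams.
    """
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    return [
        "globus-url-copy",
        "-vb",
        "-p", str(parallel_streams),
        src_url,
        dst_url,
    ]


# Hypothetical endpoints, for illustration only.
cmd = gridftp_command(
    "gsiftp://gridftp.site-a.example.org/scratch/run42/output.tar",
    "gsiftp://gridftp.site-b.example.org/archive/run42/output.tar",
)
print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the command before handing it to a shell or to `subprocess.run`.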

Page 9: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

TeraGrid Architecture

[Diagram labels: access points (POPS, Science Gateways, User Portal, Command Line); resource providers (RP 1, RP 2, RP 3); services (Compute Service, Viz Service, Data Service); TeraGrid Infrastructure (Network, Authorization, Accounting, …)]

Page 10: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)
Page 11: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

(Are your eyes glazing over?) Translation please!

Page 12: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Enter: Science Gateways

• A Science Gateway
  – Enables scientific communities of users with a common scientific goal and vocabulary
  – Has a common interface
  – Leverages community investment

• Three common forms:
  – Web-based portals
  – Application programs running on users' machines but accessing services in TeraGrid
  – Coordinated access points enabling users to move seamlessly between TeraGrid and other grids

Page 13: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Today, there are approximately 29 gateways using the TeraGrid

Page 14: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

How do Gateways help?

• Make science more productive
  – Researchers use same tools
  – Complex workflows
  – Common data formats
  – Data sharing

• Bring TeraGrid capabilities to the broad science community
  – Lots of disk space
  – Lots of compute resources
  – Powerful analysis capabilities
  – A community-friendly interface to information and research tools

Page 15: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

But it’s not just ease of use. What can scientists do that they couldn’t do previously?

• LEAD – access to radar data
• NVO – access to sky surveys
• OOI – access to sensor data
• PolarGrid – access to polar ice sheet data
• SIDGrid – analysis tools for social scientists
• GridChem – developing multiscale coupling

How would this have been done before gateways?

Page 16: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Gateways can enhance and support investments in other projects

• Increase access
  – To instruments

• Increase capabilities
  – To data analysis tools

• Improve workforce development
  – For underserved populations, through broad access to learning resources

• Increase outreach

• Increase public awareness
  – Public sees value in investments in large facilities

• Slice bread

Page 17: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Gateways Greatly Expand Access

• Almost anyone can investigate scientific questions using high end resources
  – Not restricted to those in research groups with allocations
  – Gateways allow anyone with a web browser to explore

• Fosters new ideas, cross-disciplinary approaches

• Encourages students to experiment

• But Gateways are used in production too
  – Significant number of papers resulting from gateways including GridChem, nanoHUB
  – Scientists can focus on challenging science problems rather than challenging infrastructure problems

Page 18: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

How do we develop a new gateway? Advanced support for Gateway Development

• Same peer review process used to request resources
  – 30,000 CPUs
  – Plus 6 months of help from a TG Gateway Team member
  – Reviews based on appropriate use of resources; science is not reviewed if already funded
    • Petascale
    • Multisite workflows
    • Gateways
    • Domain expertise

Page 19: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Support is Very Targeted

• Start with well-defined objectives

– Focus on efficient or novel use of national CI resources

• Minimum 0.25 FTE, for several months to a year

– Enough investment to really understand and help solve complex problems

• Must have commitment from PIs

– Want to make sure work is incorporated into production codes and gateways

• Good candidates for targeted support include:

– Large, high impact projects

– Ability to influence new communities

– Suggestions from NSF directorates on important projects

• Lessons learned move into training and documentation

Page 20: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

When is a gateway most appropriate?

• Researchers using defined sets of tools in different ways

– Same executables, different input

  • GridChem, CHARMM

– Creating multi-scale or complex workflows

– Shared datasets

• Common data formats

– National Virtual Observatory

– Earth System Grid

– Some groups have invested significant efforts already, e.g.:

•caBIG, extensive discussions to develop common terminology and formats

•BIRN, extensive data sharing agreements

• Difficult to access data/advanced workflows

– Sensor/radar input

•LEAD, GEON

Page 21: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Things you can do with the TeraGrid: Simulate cell membrane processes

Work by Emad Tajkhorshid and James Gumbart, of University of Illinois Urbana-Champaign.

– Mechanics of Force Propagation in TonB-Dependent Outer Membrane Transport. Biophysical Journal 93:496-504 (2007).

– Results of the simulation may be seen at www.life.uiuc.edu/emad/TonB-BtuB/btub-2.5Ans.mpg

• Modeled mechanisms for transport of molecules through cell membrane.

• Used 400,000 CPU hours [45 processor-years] on systems at National Center for Supercomputing Applications, IU, Pittsburgh Supercomputing Center

Image courtesy of Emad Tajkhorshid, UIUC
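The conversion behind the "45 processor-years" figure can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year; the processor count used for the wall-clock estimate is a hypothetical value chosen for illustration, not a number from the study.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def processor_years(cpu_hours):
    """Convert CPU hours into years of one processor running nonstop."""
    return cpu_hours / HOURS_PER_YEAR


def wall_clock_days(cpu_hours, processors):
    """Days of elapsed time if the work is spread evenly over `processors`."""
    return cpu_hours / processors / 24


years = processor_years(400_000)
print(f"{years:.1f} processor-years")  # between 45 and 46

# Hypothetical job size: 512 processors running concurrently.
print(f"{wall_clock_days(400_000, 512):.1f} days on 512 processors")
```

The same calculation applies to any of the CPU-hour totals quoted in the following slides.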

Page 22: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Predict storms

• Hurricanes and tornadoes cause massive loss of life and damage to property

• TeraGrid supported the spring 2007 NOAA and University of Oklahoma Hazardous Weather Testbed

– Major goal: assess how well ensemble forecasting predicts thunderstorms, including the supercells that spawn tornadoes.

– Delivers “better than real time” prediction

– Used 675,000 CPU hours for the season

– Used 312 TB on HPSS storage at PSC

Slide courtesy of Dennis Gannon, IU, and LEAD Collaboration

Page 23: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Watch Polar Ice Caps Melt (PolarGrid)

• Cyberinfrastructure Center for Polar Science (CICPS)
  – Experts in polar science, remote sensing and cyberinfrastructure
  – Indiana, ECSU, CReSIS

• Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland
  – Most existing ice sheet models, including those used by IPCC, cannot explain the rapid changes

http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v

Source: Geoffrey Fox

Page 24: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

CY2007 Usage by Discipline

3.95B SUs delivered in CY2007

[Pie chart: share of SUs by discipline]

• Molecular Biosciences: 31%
• Chemistry: 17%
• Physics: 17%
• Astronomical Sciences: 12%
• Materials Research: 6%
• Chemical, Thermal Systems: 5%
• Earth Sciences: 3%
• Atmospheric Sciences: 3%
• Advanced Scientific Computing: 2%
• All 19 Others: 4%
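The pie-chart shares for CY2007 can be turned into approximate SU totals per discipline. This is a small worked calculation using only the numbers on the slide; because the published percentages are rounded to whole numbers, the results are estimates rather than exact accounting figures.

```python
TOTAL_SUS_CY2007 = 3_950_000_000  # 3.95B SUs delivered in CY2007

# Percentage shares as shown on the slide (rounded to whole percents).
SHARES_PERCENT = {
    "Molecular Biosciences": 31,
    "Chemistry": 17,
    "Physics": 17,
    "Astronomical Sciences": 12,
    "Materials Research": 6,
    "Chemical, Thermal Systems": 5,
    "Earth Sciences": 3,
    "Atmospheric Sciences": 3,
    "Advanced Scientific Computing": 2,
    "All 19 Others": 4,
}


def sus_by_discipline(total_sus, shares_percent):
    """Return the approximate number of SUs for each discipline."""
    return {name: total_sus * pct / 100 for name, pct in shares_percent.items()}


usage = sus_by_discipline(TOTAL_SUS_CY2007, SHARES_PERCENT)
for name, sus in sorted(usage.items(), key=lambda item: -item[1]):
    print(f"{name:32s} {sus / 1e6:8.1f} million SUs")
```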

Page 25: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Do you want to see more Gateway examples?

•Yes • No

Page 26: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Recent Gateways using TeraGrid Significantly

• SCEC
• SIDGrid
• CIG

Page 27: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

SCEC using gateway to produce hazard map

•PSHA hazard map for California using newly released Earthquake Rupture Forecast (UCERF2.0) calculated using SCEC Science Gateway

•Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years.

•High resolution map, significant CPU use

Page 28: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

LEAD (portal.leadproject.org/)

• Simple enough that an undergraduate can use it! http://wxchallenge.com/

• National Center for Supercomputing Applications (NCSA) and IU teamed up to support the WxChallenge weather forecast competition: 64 teams, 1000 students, ~16,000 CPU hours on Big Red

• XBaya is available from http://www.collab-ogce.org/

Page 29: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

NanoHub Harnesses TeraGrid for Education

Nanotechnology education

•Used in dozens of courses at many universities

• Teaching materials
• Collaboration space
• Research seminars
• Modeling tools
• Access to cutting edge research software

Page 30: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Social Informatics Data Grid

• Heavy use of “multimodal” data.
  – Subject might be viewing a video, while a researcher collects heart rate and eye movement data.

• Events must be synchronized for analysis; large datasets result

• Extensive analysis capabilities are not something that each researcher should have to create for themselves.

http://www.ci.uchicago.edu/research/files/sidgrid.mov
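To make the synchronization requirement concrete, here is a minimal sketch of one common approach: pairing each annotated event with the sensor sample closest to it in time. It is not SIDGrid's actual implementation; the function names, the stream contents, and the nearest-sample rule are all assumptions made for illustration.

```python
from bisect import bisect_left


def nearest_sample(timestamps, values, t):
    """Return the value whose timestamp is closest to t.

    `timestamps` must be sorted in ascending order and aligned with `values`.
    Ties go to the earlier sample.
    """
    i = bisect_left(timestamps, t)
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    before, after = timestamps[i - 1], timestamps[i]
    return values[i] if after - t < t - before else values[i - 1]


def align(events, timestamps, values):
    """Attach the nearest sensor reading to each (time, label) event."""
    return [(t, label, nearest_sample(timestamps, values, t)) for t, label in events]


# Hypothetical streams: heart rate sampled once per second, plus video events.
hr_times = [0.0, 1.0, 2.0, 3.0]
hr_values = [62, 64, 71, 69]
video_events = [(0.4, "clip starts"), (2.6, "scene change")]

aligned = align(video_events, hr_times, hr_values)
print(aligned)
```

A real analysis would also need to reconcile clocks across recording devices, which this sketch leaves out.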

Page 31: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

• Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others.

• SIDGrid enables a number of capabilities.
  – Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact.
  – Geographically distant researchers can collaborate on the analysis of the same data set.
  – Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts.
  – All researchers now have access to the highest quality computational resources.

• SIDGrid uses TeraGrid resources for computationally intensive tasks such as media transcoding, pitch analysis of audio tracks, and fMRI image analysis

• SIDGrid is unique among social science data archive projects
  – Focused on streaming data which change over time
  – Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously

• Active users of the SIDGrid system include a human neuroscience group and linguistic research groups from the University of Chicago and the University of Nottingham, UK

Page 32: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

SIDGrid sidgrid.ci.uchicago.edu

Page 33: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

TeraGrid Pathways Activities

• 2 Gateway components
  – Adapt gateways for educational use by underrepresented communities
    • GEON – SDSC, Navajo Tech
  – Teach participants from underrepresented communities how to build gateways
    • PolarGrid – IU, ECSU

Page 34: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Navajo Technical College and gateways

• Incorporating the use of gateways in their curricula

• GEON, GISolve areas of initial interest

Page 35: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Menu TG Resources and Services

•Computing – over a petaflop of computing power and growing

• Data
  – Data storage facilities & management tools
  – Scientific data collections

•Over 30 Science Gateways

•Remote visualization servers and software

• Technical Support
  – Central point of contact for support of all systems
  – Advanced Support for TeraGrid Applications (ASTA)

• Education and training events and resources
  – K-12 Education
  – Pathways
  – Campus Champions

Page 36: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Human Connection: Your Campus Champion

• The Campus Champions program supports campus representatives as the local source of knowledge about high-performance computing opportunities and resources.

Knowledge plus assistance will empower campus researchers, educators, and students to advance scientific discovery.

• Your campus will benefit by having direct access to the TeraGrid and input to its staff, resource allocations awarded for their use, and assistance in using those resources.

• TeraGrid will support the Campus Champion. See

– http://www.teragrid.org/eot/campuschamps.html

– To join the Campus Champions program, contact the TeraGrid Campus Champions Program Coordinator, at [email protected].

Page 37: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Online Resources

•Online resources at www.teragrid.org

•TeraGrid User Portal for managing allocations and job flow

• Documentation
  – Knowledge Base for quick answers to FAQs
  – HPC University to increase general HPC knowledge

• Calendar of events including upcoming workshops and training
  – Annual conference - TG09
    • Arlington, VA
    • June 22-26, 2009

Page 38: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

TeraGrid: greater than the sum of its parts

•Leadership in cyberinfrastructure development, deployment and support

•Expertise in building national computing and data resources

• Leveraging extensive resources, expertise, R&D, and EOT
  – Leveraging activities at other participant sites
  – Learning from each other improves expertise of all TG staff
  – Shared training, education, and outreach resources benefit all

• Simplified access to high end resources
  – Single unified allocations process
  – Single point of contact for problem reporting
  – Coordinated software environments
  – Uniform access to heterogeneous resources to solve a single scientific problem

Page 39: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Would you like to learn more about getting a TeraGrid allocation?

Yes Not today

Page 40: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

How does the Allocations process work?

• Startup allocations: for code development, experimentation with TeraGrid platforms, and application testing. Startup requests may total up to 200,000 service units (SUs) of computation, plus up to 5 TB of disk storage and 25 TB of tape storage.

• Education allocations: for use in classroom instruction or training activities, with the same SU and storage limits as Startup allocations.

• Research allocations: require a detailed justification of resource usage. Requests are reviewed four times a year by the Resource Allocations Committee.

– National peer-review process

•allocates computational and data resources

•makes recommendations on allocation of advanced direct support services

•Currently awarding >1B Normalized Units of resources

– Principal investigator (PI) must be a researcher, educator, or postdoctoral researcher at a US academic or non-profit research institution.
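The Startup and Education limits above lend themselves to a quick self-check before filling in a request. The sketch below encodes only the three caps stated on this slide; the dictionary keys and function name are illustrative inventions and do not correspond to fields in POPS.

```python
# Caps for Startup and Education requests, as stated on the slide.
STARTUP_LIMITS = {
    "service_units": 200_000,
    "disk_tb": 5,
    "tape_tb": 25,
}


def over_limit(request, limits=STARTUP_LIMITS):
    """Return {resource: amount over the cap} for every exceeded limit.

    An empty dict means the request fits within a Startup/Education
    allocation; anything else suggests a Research request instead.
    """
    excess = {}
    for resource, cap in limits.items():
        asked = request.get(resource, 0)
        if asked > cap:
            excess[resource] = asked - cap
    return excess


# Hypothetical request for illustration.
request = {"service_units": 150_000, "disk_tb": 2, "tape_tb": 30}
print(over_limit(request))
```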

Page 41: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Go to the POPS page - https://pops-submit.teragrid.org


Page 42: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Create a POPS Login

Page 43: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Indicate that you are “New” to the TeraGrid

Page 44: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)


Indicate this is a “Start-up” Request

Page 45: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)


Select Startup or Educational

Page 46: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Fill out PI information

Page 47: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)


Skip Co-PIs info

Page 48: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)


Fill out info on your project

Page 49: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Fill out info on your funding

Page 50: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)


Estimate your computing need (reasonably)
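One simple way to arrive at a reasonable estimate is to multiply out the planned runs and add a margin for reruns and debugging. The sketch below assumes that one service unit corresponds to one core-hour, which is only a simplification (the actual conversion depends on the machine), and the 20% margin is likewise an arbitrary illustrative choice.

```python
import math


def estimate_service_units(cores, hours_per_run, runs, margin_percent=20):
    """Estimate SUs as cores x hours x runs, padded by a safety margin.

    Assumes 1 SU = 1 core-hour, which is a simplification.
    The result is rounded up to a whole number of SUs.
    """
    if cores <= 0 or hours_per_run <= 0 or runs <= 0:
        raise ValueError("cores, hours_per_run and runs must be positive")
    if margin_percent < 0:
        raise ValueError("margin_percent must not be negative")
    base = cores * hours_per_run * runs
    return math.ceil(base * (100 + margin_percent) / 100)


# Hypothetical project: 100 runs of 12 hours each on 64 cores.
needed = estimate_service_units(cores=64, hours_per_run=12, runs=100)
print(needed)  # 92160
```

Timing a few small test runs first gives a much better value for `hours_per_run` than guessing.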

Page 51: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Upload your CV and Submit when ready!

Page 52: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Acknowledgements

• This work is made possible by the dedicated efforts of the TeraGrid staff. In particular, slides came from Scott Lathrop, Craig Stewart, John Towns, Dane Skow, Daphne Siefert-Herron, Vickie Lynch, David Hart (Indiana Dave); David Hart (California Dave), Fran Berman, Nancy Wilkins-Diehr, Laura McGinnis and probably others.

• The Grid Infrastructure Group management of the TeraGrid is funded by NSF grant 0503697.

• The LEAD portal is developed under the leadership of IU Professors Dr. Dennis Gannon and Dr. Beth Plale, and supported by NSF grant 331480. Marcus Christie and Suresh Marru of the Extreme! Computing Lab contributed the LEAD graphics.

• The ChemBioGrid Portal is developed under the leadership of IU Professor Dr. Geoffrey C. Fox and Dr. Marlon Pierce and funded via the Pervasive Technology Labs (supported by the Lilly Endowment, Inc.) and the National Institutes of Health grant P20 HG003894-01.

• Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF), National Institutes of Health (NIH), Lilly Endowment, Inc., or any other funding agency.

Page 53: TeraGrid Resources Enabling Scientific Discovery Through Cyberinfrastructure (CI)

Thank you!

•Questions?