TeraGrid Science Gateways

Nancy Wilkins-Diehr
TeraGrid Area Director for Science Gateways
wilkinsn@sdsc.edu

University of Michigan CI Days, November 2, 2010

Entrance gate to the “Big House”, Ann Arbor, MI

You’ve heard a lot about the TeraGrid

Here’s a one-slide recap of the resources

TeraGrid resources today include:
•Tightly Coupled Distributed Memory Systems, 2 systems in the top 10 at top500.org
  –Kraken (NICS): Cray XT5, 99,072 cores, 1.03 Pflop
  –Ranger (TACC): Sun Constellation, 62,976 cores, 579 Tflop, 123 TB RAM
•Shared Memory Systems
  –Cobalt (NCSA): Altix, 8 Tflop, 3 TB shared memory
  –Pople (PSC): Altix, 5 Tflop, 1.5 TB shared memory
•Clusters with Infiniband
  –Abe (NCSA): 90 Tflops
  –Lonestar (TACC): 61 Tflops
  –QueenBee (LONI): 51 Tflops
•Condor Pool (Loosely Coupled)
  –Purdue: up to 22,000 CPUs
•Gateway hosting
  –Quarry (IU): virtual machine support
•Visualization Resources
  –TeraDRE (Purdue): 48 node nVIDIA GPUs
  –Spur (TACC): 32 nVIDIA GPUs
•Storage Resources
  –Wide area filesystems (Lustre, GPFS)
  –Archival storage
  –Data replication service

Source: Dan Katz, U Chicago

But change is constant - new systems:
•Data Analysis and Vis systems
  –Longhorn (TACC): Dell/NVIDIA, CPU and GPU
  –Nautilus (NICS): SGI UltraViolet, 1024 cores, 4TB global shared memory
•Data-Intensive Computing
  –Dash (SDSC): Intel Nehalem, 544 processors, 4TB flash memory, Gordon (SDSC):
•FutureGrid
  –Experimental computing grid and cloud test-bed to tackle research challenges in computer science
•Keeneland
  –Experimental, high-performance computing system with NVIDIA Tesla accelerators

So how do Gateways fit into this?
Gateways are a natural result of the impact of the internet on worldwide communication and information retrieval
•Implications on the conduct of science are still evolving
  –1980’s, Early gateways, National Center for Biotechnology Information BLAST server, search results sent by email, still a working portal today
  –1989 World Wide Web developed at CERN
  –1992 Mosaic web browser developed
  –1995 “International Protein Data Bank Enhanced by Computer Browser”
  –2004 TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program
  –Today, Web 3.0 and programmatic exchange of data between web pages
•Simultaneous explosion of digital information
  –Growing analysis needs in many, many scientific areas
  –Sensors, telescopes, satellites, digital images, video, genome sequencers
  –#1 machine on Top500 today over 1000x more powerful than all combined entries on the first list in 1993

Only 18 years since the release of Mosaic!

vt100 in the 1980s and a login window on Ranger today

Why are gateways worth the effort?

•Increasing range of expertise needed to tackle the most challenging scientific problems
  –How many details do you want each individual scientist to need to know?
    •PBS, RSL, Condor
    •Coupling multi-scale codes
    •Assembling data from multiple sources
    •Collaboration frameworks

#! /bin/sh
#PBS -q dque
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:02:00
#PBS -o pbs.out
#PBS -e pbs.err
#PBS -V
cd /users/wilkinsn/tutorial/exercise_3
../bin/mcell nmj_recon.main.mdl

+( &(resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
   (executable="/users/birnbaum/tutorial/bin/mcell")
   (arguments=nmj_recon.main.mdl)
   (count=128)
   (hostCount=10)
   (maxtime=2)
   (directory="/users/birnbaum/tutorial/exercise_3")
   (stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
   (stderr="/users/birnbaum/tutorial/exercise_3/globus.err"))

=======
# Full path to executable
executable=/users/wilkinsn/tutorial/bin/mcell

# Working directory, where Condor-G will write
# its output and error files on the local machine.
initialdir=/users/wilkinsn/tutorial/exercise_3

# To set the working directory of the remote job, we
# specify it in this globus RSL, which will be appended
# to the RSL that Condor-G generates
globusrsl=(directory='/users/wilkinsn/tutorial/exercise_3')

# Arguments to pass to executable.
arguments=nmj_recon.main.mdl

# Condor-G can stage the executable
transfer_executable=false

# Specify the globus resource to execute the job
globusscheduler=tg-login1.sdsc.teragrid.org/jobmanager-pbs

# Condor has multiple universes, but Condor-G always uses globus
universe=globus

# Files to receive stdout and stderr.
output=condor.out
error=condor.err

# Specify the number of copies of the job to submit to the condor queue.
queue 1
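One way a gateway can spare scientists these details is to generate the batch script from a handful of form fields. The following is a minimal sketch of that idea, not code from any TeraGrid gateway: the function name `render_pbs_script` and its parameters are hypothetical, and the directives simply mirror the PBS example above.

```python
def render_pbs_script(queue, nodes, ppn, walltime, workdir, command):
    """Return the text of a PBS batch script for a single job.

    Hypothetical helper: a gateway would fill these arguments from
    values the user entered in a web form.
    """
    lines = [
        "#!/bin/sh",
        f"#PBS -q {queue}",                  # queue to submit to
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # wall-clock limit, HH:MM:SS
        "#PBS -o pbs.out",                   # file for standard output
        "#PBS -e pbs.err",                   # file for standard error
        "#PBS -V",                           # export the submit environment
        f"cd {workdir}",
        command,
    ]
    return "\n".join(lines) + "\n"


# Reproduce the job from the slide with the same values.
script = render_pbs_script(
    queue="dque",
    nodes=1,
    ppn=2,
    walltime="00:02:00",
    workdir="/users/wilkinsn/tutorial/exercise_3",
    command="../bin/mcell nmj_recon.main.mdl",
)
print(script)
```

The user only supplies the input file and a size estimate; the scheduler syntax stays inside the gateway.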

Gateways democratize access to high end resources

•Almost anyone can investigate scientific questions using high end resources
  –Not just those in the research groups of those who request allocations
  –Gateways allow anyone with a web browser to explore
•Opportunities can be uncovered via Google
  –My then 11-year-old son discovered nanoHUB.org when his science class was studying Bucky Balls
•Foster new ideas, cross-disciplinary approaches
  –Encourage students to experiment
•But used in production too
  –Significant number of papers resulting from gateways including GridChem, nanoHUB
  –Scientists can focus on challenging science problems rather than challenging infrastructure problems

Today, there are approximately 35 gateways using the TeraGrid

This just in: 35% of TeraGrid users charging jobs (June-Sept, 2010) were gateway users!

Not just ease of use
What can scientists do that they couldn’t do previously?
•Linked Environments for Atmospheric Discovery (LEAD) – radar data coupled with on demand computing
•Large Synoptic Survey Telescope (LSST) – access to sky surveys
•Ocean Observatories Initiative (OOI) – access to sensor data
•PolarGrid – access to polar ice sheet data
•SIDGrid – expensive datasets, analysis tools
•GridChem – coupling multiscale codes
•How would this have been done before gateways?

3 steps to connect a gateway to TeraGrid

•Request an allocation
  –Only a 1 paragraph abstract required for up to 200k CPU hours
•Register your gateway
  –Visibility on public TeraGrid page
•Request a community account
  –Run jobs for others via your portal
•Staff support is available!
•www.teragrid.org/gateways
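Because a community account runs jobs for many portal users under one shared login, the gateway itself has to remember who each job was for. The sketch below illustrates that bookkeeping under stated assumptions: the account name, the `UsageLedger` class, and the user names are all hypothetical, and a real gateway would persist these records rather than keep them in memory.

```python
from collections import defaultdict

# Hypothetical name of the shared account that actually owns the jobs.
COMMUNITY_ACCOUNT = "gateway_community"


class UsageLedger:
    """Track CPU hours per portal user for jobs run under one shared account."""

    def __init__(self):
        self._hours = defaultdict(float)

    def record_job(self, portal_user, cpu_hours):
        """Attribute one finished job's CPU hours to the portal user."""
        if cpu_hours < 0:
            raise ValueError("cpu_hours must be non-negative")
        self._hours[portal_user] += cpu_hours

    def by_user(self):
        """Return a plain dict mapping portal user to CPU hours used."""
        return dict(self._hours)

    def total(self):
        """Return the CPU hours charged to the community account overall."""
        return sum(self._hours.values())


ledger = UsageLedger()
ledger.record_job("alice", 1.5)
ledger.record_job("bob", 2.0)
ledger.record_job("alice", 0.5)
print(COMMUNITY_ACCOUNT, ledger.total(), ledger.by_user())
```

The per-user breakdown is what lets a gateway report usage against its allocation and trace a job back to an individual if needed.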

Tremendous Opportunities Using the Largest Shared Resources - Challenges too!

•What’s different when the resource doesn’t belong just to me?
  –Resource discovery
  –Accounting
  –Security
  –Proposal-based requests for resources (peer-reviewed access)
    •Code scaling and performance numbers
    •Justification of resources
    •Gateway citations
•Tremendous benefits at the high end, but even more work for the developers
•Potential impact on science is huge
  –Small number of developers can impact thousands of scientists
  –But need a way to train and fund those developers

When is a gateway appropriate?

•Researchers using defined sets of tools in different ways
  –Same executables, different input
    •GridChem, CHARMM
  –Creating multi-scale workflows
  –Datasets
•Common data formats
  –National Virtual Observatory
  –Earth System Grid
  –Some groups have invested significant efforts here
    •caBIG, extensive discussions to develop common terminology and formats
    •BIRN, extensive data sharing agreements
•Difficult to access data/advanced workflows
  –Sensor/radar input
    •LEAD, GEON

How to get started?

•Conduct a needs assessment
  –Should I build a gateway?
  –Can I use an existing gateway?
  –What problems am I trying to solve?
    •Not all gateways need high end computing
•Decide on a software approach
  –Recommended software at www.teragrid.org
•Targeted effort by a few can benefit many
  –Could a pool of developers design gateways for different domain areas? Yes!
•TeraGrid staff assistance

Expressed Sequence Tag (EST) Pipeline

•Take raw genome data in the FASTA format and run a series of applications on it
  –RepeatMasker, PaCE, CAP3 and BLAST used to generate the final assembled output
  –Very variable run times (milliseconds to days)
•EST Pipeline based on the SWARM Web Service that provides a web service interface to clients and also manages the bulk job submission using the Birdbath API to submit to Condor
•2M jobs run in 49 hours, only a handful of failures
•Workflow is configured using a PHP based gateway that allows users to upload input data and select programs to run

Source: Archit Kulshrestha, IU
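To put the bulk-submission figure in perspective, the sustained rate implied by 2M jobs in 49 hours can be worked out directly. This is only arithmetic on the two numbers quoted above; the function name `throughput` is an illustrative choice.

```python
def throughput(jobs, hours):
    """Return (jobs per hour, jobs per second) for a completed batch."""
    seconds = hours * 3600
    return jobs / hours, jobs / seconds


# Figures from the slide: 2M jobs run in 49 hours.
jobs_per_hour, jobs_per_second = throughput(2_000_000, 49)
print(f"{jobs_per_hour:,.0f} jobs/hour, {jobs_per_second:.1f} jobs/second")
# prints "40,816 jobs/hour, 11.3 jobs/second"
```

A rate of roughly eleven completed jobs every second, sustained for two days, is well beyond what a user could manage by submitting jobs by hand.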

Cyberinfrastructure for Phylogenetic Research (CIPRES)

www.phylo.org
•Enables large-scale phylogenetic reconstructions
•Parallel “fastest in the west” versions of applications such as MrBayes, RAxML and GARLI
•Easy to use graphical user interface
•Over 800 users, June-Sept
  –27% of all active TG users!!
•5M CPU hours awarded

Intellectual Merit:

• the CIPRES portal is cited in at least 35 publications

• this includes publications in Nature, PNAS, and Cell.

• highlights of scientific findings:

New Family Tree for Arthropoda: A team of scientists compared genetic sequences from 75 arthropod species and drew a new family tree for the most successful phylum of animals on Earth. This work represents an important advance in the century-old problem of arthropod evolution.

Genome Sequence of a Transitional Eukaryote: A group of scientists sequenced the genome of Naegleria gruberi, a single-cell organism that is a key transitional species between prokaryotes and eukaryotes. This work provides new insights into the origins of subcellular organelles.

Co-evolution of Beetles and Flowering Plants: A group of researchers studied the evolutionary history of angiosperms and the beetles that interact with them. The work provided compelling experimental evidence for the long-postulated co-evolution of these two symbiotic groups.

Source: Mark Miller, SDSC

Broad Impacts:

• 77% of all jobs have been submitted from locations in the USA. Submissions are received regularly from researchers at top-tier institutions such as Harvard, Yale, and Stanford.

• Jobs are received regularly from academic institutions in 17 EPSCoR states.

• Job submissions have been received from 34 countries on 5 continents.

• At least 5 undergraduate classes are known to use the portal routinely. This is likely an underestimate (based on Web log patterns).

• More than 45,000 jobs have been run on the Portal over its lifetime. Between Dec 1, 2010 and June 30, 2010, users ran 6,108 parallel jobs on the TeraGrid.

Source: Mark Miller, SDSC

Additional Gateways for Biology

•www.teragrid.org/gateways
  –List of all TeraGrid gateways
•Biodrugscore
•RENCI Science Portal
•Open Life Sciences Gateway
•Robetta

Biodrugscore
www.biodrugscore.org
•Derive and validate scoring functions
•Create training sets using structural and binding data from multiple databases including PDBbind and PDBcal
•Define the components of scoring functions by picking from among a list of pre-computed terms
  –Partial least-squares regression analysis
•Validate scoring functions
•Apply custom scoring functions for the ranking of chemical libraries that are pre-docked against a large set of binding cavities from the human proteome
•If the receptor of interest is not available, Biodrugscore makes it possible for users to dock libraries against their target on the TeraGrid using their own account.

NBCR
www.nbcr.net
•Compute resources
•Service projects
  –Quantum to Continuum Mechanics Tools
  –Data Analysis Tools for Molecular Sequences
  –Heart Modeling
  –Visualization and multi-scale modeling
  –Grid services and Telescience
•Tools and downloads
  –40+ packages, databases, services

RENCI Science Portal
https://portal.renci.org/portal/
•125 biology applications
  –From Antigenic to WordMatch and everything in between
  –RENCI Science Desktop
  –BlastMaster desktop

Open Life Sciences Gateway
http://lsgw.uc.teragrid.org
•Bioinformatics applications and data collections
•Portal access, direct Web services calls, workflows with Taverna
  –And now Google gadgets!
  –igoogle.google.com, “add stuff”, search for TeraGrid

Robetta
http://www.robetta.org

•Protein structure prediction server
  –Rosetta code from the David Baker laboratory
•Also available
  –RosettaAntibody Server
  –RosettaDesign Server
  –RosettaDock Server
  –Rosetta Commons
  –FoldIt
  –Rosetta@home
  –Human Proteome Folding Project

Linked Environments for Atmospheric Discovery (LEAD)

• Providing tools that are needed to make accurate predictions of tornados and hurricanes

• Meteorological data
• Forecast models
• Analysis and visualization tools

• Data exploration and Grid workflow

Highlights: LEAD Inspires Students
Advanced capabilities regardless of location

•A student gets excited about what he was able to do with LEAD
•“Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy! Eric” (email, March 2007)

Community Climate System Model (CCSM)

•Makes a world-leading, fully coupled climate model easier to use and available to a wide audience
•Compose, configure, and submit CCSM simulations to the TeraGrid

•Used in Purdue’s POL 520/EAS 591: Models in Climate Change Science and Policy

–Semester-long projects, 100 year CCSM simulations, generate policy recommendations based on scientific, economic, and political models of climate change impacts

Analytical Ultracentrifugation
Emerging computational tool for the study of proteins

•The Center for Analytical Ultracentrifugation of Macromolecular Assemblies, UT Health Sciences
  –Major advances in the characterization of proteins and protein complexes as a result of new instrumentation and powerful software
  –Monitoring the sedimentation of macromolecules in real time in the centrifugal field allows their hydrodynamic and thermodynamic characterization in solution
  –Observations are electronically digitized and stored for further mathematical analysis
  –http://uslims.uthscsa.edu/

Source: Modern analytical ultracentrifugation in protein science: A tutorial review, Wikipedia

UltraScan provides a comprehensive data analysis environment

•Management of analytical ultracentrifugation data for single users or entire facilities
•Support for storage, editing, sharing and analysis of data
  –HPC facilities used for 2-D spectrum analysis and genetic algorithm analysis
    •TeraGrid (~2M CPU hours used)
    •Technical University of Munich
    •Juelich Supercomputing Center
•Portable graphical user interface
•MySQL database backend for data management
•Over 30 active institutions
•TeraGrid advanced support
  –Fault tolerance, workflows, use of multiple TG resources, community account implementation

Social Informatics Data Grid
Collaborative access to large, complex datasets

•SIDGrid is unique among social science data archive projects
  –Streaming data which change over time
    •Voice, video, images (e.g. fMRI), text, numerical (e.g. heart rate, eye movement)
  –Investigate multiple datasets, collected at different time scales, simultaneously
•Large data requirements
•Sophisticated analysis tools

http://www.ci.uchicago.edu/research/files/sidgrid.mov

Viewing multimodal data like a symphony conductor

•“Music-score” display and synchronized playback of video and audio files
  –Pitch tracks
  –Text
  –Head nods, pause, gesture references
•Central archive of multi-modal data, annotations, and analyses
  –Distributed annotation efforts by multiple researchers working on a common data set
    •History of updates
•Computational tools
  –Distributed acoustic analysis using Praat
  –Statistical analysis using R
  –Matrix computations using Matlab and Octave

Source: Studying Discourse and Dialog with SIDGrid, Levow, 2008

Future Technical Areas

•Web technologies change fast
  –Must be able to adapt quickly
•Gateways and gadgets
  –Gateway components incorporated into any social networking page
  –75% of 18 to 24 year-olds have social networking websites
•iPhone apps?
•Web 3.0
  –Beyond social networking and sharing content
  –Standards and querying interfaces to programmatically share data across sites
    •Resource Description Framework (RDF), SPARQL
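RDF describes data as subject-predicate-object triples, and a SPARQL query is essentially a triple pattern with variables to be filled in. The sketch below imitates that idea in plain Python so the mechanism is visible; it is a toy matcher, not an RDF library or a SPARQL engine, and the `ex:` names and the `match` function are hypothetical.

```python
def match(triples, pattern):
    """Yield variable bindings for every triple that fits the pattern.

    The pattern is a (subject, predicate, object) tuple. Terms that start
    with '?' are variables; every other term must match exactly.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Illustrative triples built from gateways named in this talk.
triples = [
    ("ex:CIPRES", "ex:domain", "phylogenetics"),
    ("ex:CIPRES", "ex:url", "www.phylo.org"),
    ("ex:Robetta", "ex:domain", "protein structure prediction"),
    ("ex:Robetta", "ex:url", "http://www.robetta.org"),
]

# Rough analogue of: SELECT ?gateway ?url WHERE { ?gateway ex:url ?url }
results = list(match(triples, ("?gateway", "ex:url", "?url")))
for row in results:
    print(row["?gateway"], row["?url"])
```

Because every site exposes the same triple shape, a second site can run the same query without knowing how the first one stores its data.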

Gateways can further investments in other projects

•Increase access
  –To instruments, expensive data collections
•Increase capabilities
  –To analyze data
•Improve workforce development
  –Can prepare students to function in today’s cross-disciplinary world
•Increase outreach
•Increase public awareness
  –Public sees value in investments in large facilities
  –Pew 2006 study indicates that half of all internet users have been to a site specializing in science
  –Those who seek out science information on the internet are more likely to believe that scientific pursuits have a positive impact on society

But gateways can only be truly effective if they are persistent

Gateway Sustainability Study
•Characteristics of short funding cycles
  –Build exciting prototypes with input from scientists
  –Work with early adopters to extend capabilities
  –Tools are publicized, more scientists interested
  –Funding ends
  –Scientists who invested their time to use new tools are disillusioned
    •Less likely to try something new again
  –Start again on new short-term project
•Need to break this cycle
•EAGER grant to look at characteristics of successful gateways and domain areas where a gateway could have a big impact
•Working with Katherine Lawrence, UM

4 focus group meetings over 2 years
First 2 held June, 2010

www.sciencegateways.org

Thank you for your attention!
Questions?

Nancy Wilkins-Diehr, wilkinsn@sdsc.edu