Science Gateways and their tremendous potential for science
Nancy Wilkins-Diehr
TeraGrid Area Director for Science Gateways
San Diego Supercomputer Center
[email protected]

Overview

•What are Science Gateways?
•What is TeraGrid?
•Why TeraGrid and Gateways?
•Examples of Success
•How Does This Help Me?

Phenomenal Impact of the Internet on Scientific Research

Only 15 years since the release of Mosaic!

•Very rapid changes in how science is conducted
–1988: National Center for Biotechnology Information (NCBI) BLAST server; search results sent by email; still a working portal today
–1992: Mosaic web browser developed
–1995: “International Protein Data Bank Enhanced by Computer Browser”
–2004: TeraGrid project director Rick Stevens recognized the growth in scientific portal development and proposed the Science Gateway Program

•Ensuing explosion of digital information
–Need for analysis in a growing number of scientific areas

Very Rapid Changes in Web Usability

•First generation
–Static Web pages

•Second generation
–Dynamic pages, database interfaces, CGI
–Lacked the ease of use of desktop applications

•Third generation
–True networked and internetworked applications that enable dynamic two-way, even multi-way, communication and collaboration on the Web
–These new applications will enable remarkable new uses of the Web in the organizational workplace and on the Internet

•Fourth generation
–Web 2.0

Source: Screen Porch White Paper, The University of Western Ontario (1998)

Gateways are a Natural Extension of Internet Developments

•Three common types of gateway
–Web portal with users in front and services in back
–Client-server model in which application programs running on users' machines (e.g., workstations and desktops) access services
–Bridges across multiple grids, allowing communities to use both community-developed grids and shared grids

•Continued rapid change lies ahead; gateways must be adaptable, and they can provide some nimbleness

Arden Bement, Senate Testimony, April 19, 2007

•“Virtual environments have the potential to enhance collaboration, education, and experimentation in ways that we are just beginning to explore.”
•“In every discipline, we need new techniques that can help scientists and engineers uncover fresh knowledge from vast amounts of data generated by sensors, telescopes, satellites, or even the media and the Internet.”

•Gateways are a terrific example of interfaces that can support transformative science

Gateway Idea Resonates with Scientists

•Capabilities provided by the Web are easy to envision because we use them in everyday life
•Researchers can imagine scientific capabilities provided through a familiar interface

•Groups appreciate that gateways are designed by communities and provide interfaces those communities understand
–But gateways also provide access to greater capabilities on the back end without the user needing to understand the details of those capabilities
–Scientists know they can undertake more complex analyses, and that’s all they want to focus on

•But this seamless access doesn’t come for free: it all hinges on very capable developers

Tremendous Opportunities Using the Largest Shared Resources - Challenges too!

•What’s different when the resource doesn’t belong just to me?
–Resource discovery
–Accounting
–Security
–Proposal-based requests for resources (peer-reviewed access)
  •Code scaling and performance numbers
  •Detailed justification of resource request
  •Citations, metrics of success

•Tremendous benefits at the high end, but even more work for the developers
•Potential impact on science is huge
–A small number of developers can impact thousands of scientists
–But we need a way to train and fund those developers and provide them with appropriate tools
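To make the accounting point concrete, here is a minimal sketch of how a gateway might track usage against a peer-reviewed allocation. The charging rule (service units = cores × wall-clock hours × a per-resource charge factor) and the `Allocation` class and its fields are illustrative assumptions, not TeraGrid's actual accounting system.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    """Hypothetical ledger: a fixed service-unit (SU) award drawn down by jobs."""
    project: str
    awarded_su: float
    used_su: float = 0.0
    charges: list = field(default_factory=list)  # (job_id, cost) history

    @property
    def remaining_su(self) -> float:
        return self.awarded_su - self.used_su

    def charge(self, job_id: str, cores: int, wall_hours: float,
               charge_factor: float = 1.0) -> float:
        """Charge one job; assumed rule: SUs = cores * wall_hours * charge_factor."""
        if cores <= 0 or wall_hours < 0:
            raise ValueError("cores must be positive and wall_hours non-negative")
        cost = cores * wall_hours * charge_factor
        if cost > self.remaining_su:
            # Refuse the job rather than overdraw the peer-reviewed award.
            raise RuntimeError(
                f"job {job_id} needs {cost} SUs, only {self.remaining_su} left")
        self.used_su += cost
        self.charges.append((job_id, cost))
        return cost


if __name__ == "__main__":
    alloc = Allocation(project="example-gateway", awarded_su=10_000)
    alloc.charge("job-1", cores=64, wall_hours=2.5)                       # 160 SUs
    alloc.charge("job-2", cores=128, wall_hours=1.0, charge_factor=1.5)  # 192 SUs
    print(alloc.remaining_su)  # 9648.0
```

Refusing an over-budget job up front, rather than discovering the overdraft afterward, is one simple policy; a real gateway would also need the per-user attribution discussed under community accounts.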

What is the TeraGrid?

•NSF-funded facility offering high-end compute, data, and visualization resources to the nation’s academic researchers

300+ Teraflops Computation

20+ Petabytes Storage

Dedicated cross-country network

Visualization

TeraGrid Resources Available to Academic Researchers at No Cost

•TeraGrid creates integrated, persistent, and pioneering computational resources that significantly improve our nation’s ability and capacity to gain new insights into our most challenging research questions and societal problems
•Proposal-based access: researchers can use resources at no cost
–Targeted support is available as well

Implementing Common Gateway Requirements

•Web Services– GT4 deployment, identification of remaining capabilities

– Information services, WebMDS

•Auditing– Need to retrieve job usage info on production resources

– GRAM audit deployed in test mode in September, inclusion in CTSSv4

•Community Accounts– Policy finalized, security approaches being tested by resource providers (RPs)

– Attribute-based authentication testing

•Allocations– Changes in allocation procedures, in the mechanisms used to evaluate science impact, and in models for identity management, authentication, and authorization that are better tuned to virtual organizations

•Scheduling– Metascheduling RAT

– On-demand computing via the SPRUCE framework

•Outreach– Talks, Schools/workshops (NVO, GISolve), major project demonstrations (LEAD)

– SURA, HASTAC, GEON, CI-Channel, SC, Grace Hopper, MSI-CI2, Lariat, Science Workflows and On Demand Computing for Geosciences Workshop

•Primer– Living document in wiki, provides up-to-date overview and instructions for new gateway developers (“how to make your portal a TeraGrid science gateway”)
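Community accounts and auditing go together: when many gateway users' jobs run under one shared account, the gateway itself must record which of its users submitted each job so that per-user usage can be reported later. The sketch below shows one way to keep such a record; the `AuditRecord` fields and the `CommunityAuditLog` class are hypothetical and are not the GRAM audit schema.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    """One job submitted under the shared community account (hypothetical schema)."""
    job_id: str        # identifier returned by the resource's job manager
    gateway_user: str  # the gateway's own end-user identity
    resource: str
    cpu_hours: float


class CommunityAuditLog:
    """Maps jobs run under a single community account back to gateway users."""

    def __init__(self, community_account: str):
        self.community_account = community_account
        self._records: dict[str, AuditRecord] = {}

    def record(self, rec: AuditRecord) -> None:
        if rec.job_id in self._records:
            raise ValueError(f"duplicate job id {rec.job_id}")
        self._records[rec.job_id] = rec

    def usage_by_user(self) -> dict[str, float]:
        """Total CPU hours per gateway user, e.g. for reporting to resource providers."""
        totals: dict[str, float] = defaultdict(float)
        for rec in self._records.values():
            totals[rec.gateway_user] += rec.cpu_hours
        return dict(totals)

    def who_ran(self, job_id: str) -> str:
        """Security question: which end user is behind this community-account job?"""
        return self._records[job_id].gateway_user


if __name__ == "__main__":
    log = CommunityAuditLog("example-community")
    log.record(AuditRecord("j1", "alice", "siteA", 10.0))
    log.record(AuditRecord("j2", "bob", "siteB", 5.0))
    print(log.usage_by_user())  # {'alice': 10.0, 'bob': 5.0}
```

Keying on the resource's job ID lets the gateway answer both the accounting question (usage per user) and the security question (who ran a given job) from the same table.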

Gateways are growing in numbers: success in a variety of domains

•10 initial projects as part of TG proposal
•>20 Gateway projects today
•No limit on how many gateways can use TG resources
– Prepare services and documentation so developers can work independently

•Open Science Grid (OSG)
•Special PRiority and Urgent Computing Environment (SPRUCE)
•National Virtual Observatory (NVO)
•Linked Environments for Atmospheric Discovery (LEAD)
•Computational Chemistry Grid (GridChem)
•Computational Science and Engineering Online (CSE-Online)
•GEON (GEOsciences Network)
•Network for Earthquake Engineering Simulation (NEES)
•SCEC Earthworks Project
•Network for Computational Nanotechnology and nanoHUB
•GIScience Gateway (GISolve)
•Biology and Biomedicine Science Gateway
•Open Life Sciences Gateway
•The Telescience Project
•Grid Analysis Environment (GAE)
•Neutron Science Instrument Gateway
•TeraGrid Visualization Gateway
•BIRN
•Gridblast Bioinformatics Gateway
•Earth Systems Grid
•Astrophysical Data Repository (Cornell)

•Many others interested
– SID Grid
– HASTAC

Mapping Tool Used on Large Data Sets to Spot Brain Disorders

•Large Deformation Diffeomorphic Metric Mapping (LDDMM), developed at the Center for Imaging Science at Johns Hopkins
•Computes a mathematical description of which shapes are similar and different by computing metric distances in the space of anatomical images

Source: SDSC Headlines, Paul Tooby

"Using TeraGrid resources at multiple sites, this research has been able to successfully distinguish diagnostic categories such as Alzheimer's and Semantic Dementia from control subjects," said Anthony Kolasny, JHU. "This can potentially lead to a powerful new cyberinfrastructure tool clinicians can use to make earlier, more accurate diagnoses."
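For readers wondering what "metric distances in the space of anatomical images" means, here is one common way the LDDMM formulation is stated in the literature (the notation is ours, not the slide's). A time-varying velocity field $v_t$, drawn from a Hilbert space $V$ of smooth vector fields, generates a flow of diffeomorphisms $\phi_t$; the distance between a template image $I_0$ and a target $I_1$ is the length of the shortest such flow that carries one onto the other, and in practice exact matching is relaxed to an energy with a data term weighted by $\sigma$.

```latex
\dot{\phi}_t = v_t \circ \phi_t, \qquad \phi_0 = \mathrm{id}, \qquad t \in [0, 1]

d(I_0, I_1) = \inf_{v \,:\, I_0 \circ \phi_1^{-1} = I_1} \int_0^1 \lVert v_t \rVert_V \, dt

E(v) = \int_0^1 \lVert v_t \rVert_V^2 \, dt
     + \frac{1}{\sigma^2} \bigl\lVert I_0 \circ \phi_1^{-1} - I_1 \bigr\rVert_{L^2}^2
```

Because each comparison requires solving an optimization over whole flows, computing these distances for every pair of subjects in a study is what makes the approach computationally demanding; the resulting pairwise distances can then serve as input to statistical comparisons between groups.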

BIRN uses SSHFS to mount TeraGrid filesystems locally

•220 TB available through the CIS portal using autofs, Samba, and smbwebclient
•CIS has 87 TB of local storage
•/cis/net lists network drives

Source: Anthony Kolasny, Johns Hopkins University

What is SSHFS and how can it help?

•SSHFS allows you to mount remote data through an ssh connection
–http://fuse.sourceforge.net/sshfs.html
–http://wikipedia.org/wiki/SSH_Filesystem

•Simple command line
–sshfs remoteuser@remotehost:/path/to/remote_dir local_dir

•Performance is limited by the speed of your ssh connection; performance tuning is possible
•Allows you to use local applications on remote data
–e.g., using ParaView to look at data processed on the TeraGrid and stored on GPFS-WAN

•You are directly accessing the remote files, so your changes are seen by everyone
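For scripted use (for example, a gateway back end that mounts a user's remote directory before launching a local viewer), the command line above can be wrapped in a small helper. This is a minimal sketch assuming sshfs and FUSE are installed locally and SSH key authentication is already configured; the function names are hypothetical. `reconnect` is an sshfs option, `ServerAliveInterval` is an ssh option that sshfs passes through to ssh, and unmounting uses `fusermount -u` on Linux and `umount` elsewhere.

```python
from __future__ import annotations

import platform
import shutil
import subprocess
from contextlib import contextmanager


def build_sshfs_command(user: str, host: str, remote_dir: str, local_dir: str,
                        reconnect: bool = True, keepalive_s: int = 15) -> list[str]:
    """Build: sshfs user@host:remote_dir local_dir [-o opt1,opt2]."""
    cmd = ["sshfs", f"{user}@{host}:{remote_dir}", local_dir]
    opts = []
    if reconnect:
        opts.append("reconnect")  # re-establish the session if the link drops
    if keepalive_s > 0:
        opts.append(f"ServerAliveInterval={keepalive_s}")  # passed through to ssh
    if opts:
        cmd += ["-o", ",".join(opts)]
    return cmd


def build_unmount_command(local_dir: str, system: str | None = None) -> list[str]:
    """fusermount -u on Linux (fusermount3 on some FUSE 3 installs); umount elsewhere."""
    system = system or platform.system()
    if system == "Linux":
        return ["fusermount", "-u", local_dir]
    return ["umount", local_dir]


@contextmanager
def sshfs_mount(user: str, host: str, remote_dir: str, local_dir: str):
    """Mount for the duration of a with-block, then always unmount."""
    if shutil.which("sshfs") is None:
        raise FileNotFoundError("sshfs is not installed or not on PATH")
    subprocess.run(build_sshfs_command(user, host, remote_dir, local_dir), check=True)
    try:
        yield local_dir
    finally:
        subprocess.run(build_unmount_command(local_dir), check=True)
```

Usage, with the placeholder names from the slide: `with sshfs_mount("remoteuser", "remotehost", "/path/to/remote_dir", "local_dir") as mnt: ...` runs any local tool against `mnt` and unmounts afterward, even if the tool fails.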

Source: Anthony Kolasny, Johns Hopkins University

TeraGrid Life Science Gateway
•Application services for bioinformaticians
•Ability for end users to apply the large-scale resources of the TeraGrid to their problems while leveraging local resources
•Featured apps
–InterProScan, version 4.2
–InterProScan Data, version 12.0
–HMMER, version 2.3.2
–Blastall (from InterProScan), version 2.2.6

•Plans to engage Bioinformatics Resource Centers (BRCs)
– Eight BRCs sponsored by the National Institute of Allergy and Infectious Diseases (NIAID)

– Funded to display sequencing and annotation data, comparative analysis, genome polymorphisms, gene expression, proteomics, host/pathogen interactions and pathways for the NIAID list of Category A-C priority pathogens and other pathogens causing emerging and re-emerging diseases.

TeraGrid Bioportal

•Access to over 140 computational tools and many biological data sets
•Collaborative workspace, simplified access to a diverse set of tools
•Database searching, alignment and phylogeny, pattern searching, DNA/RNA analysis, and protein analysis
•EMBOSS (European Molecular Biology Open Software Suite), GLIMMER (Gene Locator and Interpolated Markov Modeler), HMMER (Hidden Markov Modeler), the NCBI (National Center for Biotechnology Information) toolkit, and PHYLIP (PHYLogeny Inference Package)
•Standard databases include NCBI Aggregate, PDB, Prints, RepBase, UniProt, PFam, ProSite, and TransFac

GEON: Developing cyberinfrastructure in support of an environment for integrative geoscience research

•IT advances can significantly impact how geoscientists conduct their daily research activities
– Web/grid services, TeraGrid
– Semantic data integration
– Information management and ontologies

•Tremendous opportunities to conduct novel and efficient research in many areas of the geosciences
•SYNSEIS (SYNthetic SEISmogram generation tool)
– Helps seismologists calculate synthetic 3D regional seismic waveforms

– Accesses distributed data centers and large computational clusters

– Users only need to have access to the Internet and a browser. The entire system is web-based and is accessible from the GEONgrid portal web page.

GEON: LiDAR (Light Detection And Ranging) data

•Capable of generating digital elevation models (DEMs) more than an order of magnitude more accurate than those currently available
•Opportunity for geologists to study the processes that shape the Earth’s surface at resolutions not previously possible
•Distribution, interpolation, and analysis of large LiDAR datasets, which frequently exceed a billion data points, present significant computational challenges
•GEON tools begin with a user-defined subset of data and end with download and visualization of interpolated surfaces and derived products
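The interpolation step can be illustrated with local inverse-distance-weighted (IDW) gridding: each DEM node takes a weighted average of the LiDAR returns within a search radius, with weights $1/d^p$. This is a minimal pure-Python sketch of that general technique, not a description of GEON's own implementation; the function name, the search radius, and the power $p = 2$ are assumptions. Points are first bucketed into square bins whose side equals the radius, so each node inspects only nine bins instead of every point.

```python
import math
from collections import defaultdict


def idw_grid(points, x0, y0, cell, ncols, nrows, radius, power=2.0):
    """Interpolate scattered (x, y, z) points onto a regular grid by local IDW.

    Node (i, j) sits at (x0 + i*cell, y0 + j*cell). Nodes with no point within
    `radius` are left as None (no data). Returns grid[row][col].
    """
    # Bin side == radius, so any point within `radius` of a node lies in the
    # node's own bin or one of its eight neighbours.
    bins = defaultdict(list)
    for x, y, z in points:
        bins[(math.floor(x / radius), math.floor(y / radius))].append((x, y, z))

    grid = [[None] * ncols for _ in range(nrows)]
    for j in range(nrows):
        for i in range(ncols):
            gx, gy = x0 + i * cell, y0 + j * cell
            bx, by = math.floor(gx / radius), math.floor(gy / radius)
            nearby = [
                (x, y, z)
                for dx in (-1, 0, 1)
                for dy in (-1, 0, 1)
                for (x, y, z) in bins.get((bx + dx, by + dy), ())
                if math.hypot(x - gx, y - gy) <= radius
            ]
            if not nearby:
                continue  # leave None: no returns close enough
            num = den = 0.0
            for x, y, z in nearby:
                d = math.hypot(x - gx, y - gy)
                if d == 0.0:  # node coincides with a return: use its height
                    num, den = z, 1.0
                    break
                w = d ** -power
                num += w * z
                den += w
            grid[j][i] = num / den
    return grid
```

For example, two returns at heights 10 and 20, two units apart, gridded at unit spacing with a radius of 1.5 give the row `[10.0, 15.0, 20.0]`: the end nodes coincide with returns and the middle node weights both equally.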

Linked Environments for Atmospheric Discovery (LEAD)

•Providing the tools needed to make accurate predictions of tornadoes and hurricanes

•Meteorological data
•Forecast models
•Analysis and visualization tools

•Data exploration and Grid workflow
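The Grid workflow idea (chaining data ingest, a forecast model, and analysis/visualization) can be illustrated by running steps in dependency order. The minimal sketch below uses Python's standard `graphlib` module (Python 3.9+); the step names, loosely echoing the WRF-initialized-from-NAM run described on the next slide, and the dictionary-of-callables design are assumptions for illustration, not LEAD's workflow engine.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, dependencies):
    """Run each step only after every step it depends on has finished.

    steps: name -> callable receiving a dict {upstream_name: upstream_result}
    dependencies: name -> set of upstream step names (each must be a step)
    """
    sorter = TopologicalSorter()
    for name in steps:
        sorter.add(name, *dependencies.get(name, ()))
    order = list(sorter.static_order())  # raises graphlib.CycleError on cycles
    results = {}
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = steps[name](upstream)
    return order, results


if __name__ == "__main__":
    # Hypothetical steps: fetch inputs -> run forecast -> visualize.
    steps = {
        "fetch_observations": lambda up: "obs",
        "fetch_nam_output": lambda up: "nam",
        "run_wrf": lambda up: f"wrf({up['fetch_observations']}, {up['fetch_nam_output']})",
        "visualize": lambda up: f"plot({up['run_wrf']})",
    }
    deps = {
        "run_wrf": {"fetch_observations", "fetch_nam_output"},
        "visualize": {"run_wrf"},
    }
    order, results = run_workflow(steps, deps)
    print(results["visualize"])  # plot(wrf(obs, nam))
```

A production workflow system adds what this sketch omits: running independent steps in parallel on remote resources, retries, and provenance records for each product.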

LEAD Inspires Students

•“Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy!” Eric (email, March 2007)

nanoHUB: Explosive User Growth

•nanoHUB attracts thousands of users
•Over 2M hits in the last month
•In the past 12 months
– Over 21,000 users
– Almost 175,000 simulation runs

•Very full-featured
– Simulation tools
– Research proceedings
– Curricula content
– Collaboration spaces

nanoHUB is used by undergraduate and graduate students to complete coursework in dozens of courses at 10 universities.

GridChem - a desktop application gateway

•The Computational Chemistry Grid (CCG) science gateway GridChem has been using TeraGrid in production since April 2006
•Currently serves over 100 users and has delivered hundreds of thousands of CPU hours
•Many publications have resulted from GridChem use

CReSIS (Center for Remote Sensing of Ice Sheets)

•Awarded CI-TEAM funding to build a Polar Gateway
–International Polar Year 2007-2008

•CReSISGrid
–Build a TeraGrid Science Gateway
–Provide broad-based educational and training activity in cyberinfrastructure for remote sensing and ice sheet dynamics
–MSI impact through the leadership of Linda Hayden, Elizabeth City State University

Tremendous Potential for Gateways

•In only 15 years, the Web has fundamentally changed human communication
•Science Gateways can leverage this amazingly powerful tool to:
–transform the way scientists collaborate
  •tackle the toughest problems independent of location
–impact the amount of science that can result from each project
–influence the public’s perception of science

•High-end resources can have a profound impact
•The future is very exciting!
–Web 2.0
–Application Hosting
–Gateway-in-a-box

Would development of a gateway help your research?

•Researchers using defined sets of tools in different ways
–Same executables, different input
–Datasets
–Workflow creation

•Common data formats
•Large shared datasets
•[email protected] mailing list
–Email [email protected]
–<subscribe gateways> in body

•Biweekly telecons to get advice from others
•www.teragrid.org
–Details about current gateways
–Materials from June full-day tutorial at TG07
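The "same executables, different input" pattern is often the simplest case for a gateway: the portal turns a user's choices into many runs of one code. A minimal sketch of expanding parameter choices into individual job descriptions follows; the executable name `simulate`, its `--name=value` flag style, and the `JobSpec` fields are hypothetical placeholders.

```python
import itertools
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    name: str    # reproducible label, e.g. for output directories
    argv: tuple  # full command line for one run


def expand_sweep(executable: str, params: dict) -> list:
    """Cartesian product of parameter values -> one JobSpec per combination."""
    keys = sorted(params)  # fixed key order so names and argv are reproducible
    jobs = []
    for values in itertools.product(*(params[k] for k in keys)):
        argv = [executable] + [f"--{k}={v}" for k, v in zip(keys, values)]
        name = "_".join(f"{k}-{v}" for k, v in zip(keys, values))
        jobs.append(JobSpec(name=name, argv=tuple(argv)))
    return jobs


if __name__ == "__main__":
    for job in expand_sweep("simulate", {"temperature": [300, 350], "pressure": [1, 5]}):
        print(job.name, shlex.join(job.argv))
```

Sorting the parameter names keeps job names stable across submissions, which makes it easier to match outputs to inputs and to resubmit only the runs that failed.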

Thank you for your attention
Any questions?

Nancy [email protected]