Virtual Organizations: Building Interdisciplinary Collaborations Dan Reed [email protected] Chancellor’s Eminent Professor Vice Chancellor for IT University of North Carolina at Chapel Hill Director, Renaissance Computing Institute

Page 1: Virtual Organizations: Building Interdisciplinary Collaborations

Virtual Organizations: Building Interdisciplinary Collaborations

Dan Reed
[email protected]

Chancellor’s Eminent Professor
Vice Chancellor for IT

University of North Carolina at Chapel Hill

Director, Renaissance Computing Institute

Page 2: Virtual Organizations: Building Interdisciplinary Collaborations

Acknowledgments

• Funding agencies
  – NIH
    • Carolina Center for Exploratory Genetic Analysis (CCEGA)
  – NSF
    • TeraGrid Science Gateways
  – State of North Carolina
    • RENCI and ancillary Bioportal support
• RENCI staff
  – Alan Blatecky, Kevin Gamiel, Xiaojun Guan
  – Clark Jefferies, Howard Lander
  – John Magee, Ruth Marinshaw, Jeff Tilson
  – Lavanya Ramakrishnan
• And a host of others …

Page 3: Virtual Organizations: Building Interdisciplinary Collaborations

21st Century Challenges

• The three fold way
  – theory and scholarship
  – experiment and measurement
  – computation and analysis
• Supported by
  – distributed, multidisciplinary teams
  – multimodal collaboration systems
  – distributed, large scale data sources
  – leading edge computing systems
  – distributed experimental facilities
• Socialization and community
  – multidisciplinary groups
  – geographic distribution
  – new enabling technologies
  – creation of 21st century IT infrastructure
    • sustainable, multidisciplinary communities
• “Come as you are” response

[Figure labels: Theory, Experiment, Computation]

Page 4: Virtual Organizations: Building Interdisciplinary Collaborations

Exemplar 21st Century Challenges

• Population growth in sensitive areas
  – severe weather sensitivity
    • national impact
  – geobiology and environment
  – economics and finance
  – sociology and policy
• Economics and health care
  – longitudinal public health data
    • environmental interactions
  – genetic susceptibility
    • heart disease, cancer, Alzheimer's
  – privacy and insurance
  – public policy and coordination

Page 5: Virtual Organizations: Building Interdisciplinary Collaborations

Mean Onset of Alzheimer’s Disease

• apolipoprotein (apo)
  – apoE2, apoE3 and apoE4 alleles
    • on chromosome 19
  – apoE4 allele
    • 40% to 60% of Alzheimer's patients
    • not the only cause for Alzheimer’s
• apo gene inheritance
  – ~25% inherit 1 copy of apoE4 allele
    • Alzheimer's risk increases 4X
  – 2% inherit 2 copies of apoE4 allele
    • Alzheimer's risk increases 10X

[Figure: proportion of each genotype unaffected (0 to 1.0) versus age at onset (60 to 85), with curves for genotypes 2/3, 2/4, 3/3, 3/4 and 4/4]

Source: Alan Roses, GSK

Page 6: Virtual Organizations: Building Interdisciplinary Collaborations

Big Questions

[Figure labels: DNA sequence (TATA, promoter, message; CGT, TAC, CAG; Q, Y, R); sequence annotation; protein sequence and regulation; protein structure; homology based protein structure prediction; protein/enzyme function; molecular simulations; multi-protein machines; metabolic pathways and regulatory networks; pathway simulations; network analysis; bacteria and cells; data integration; organs, organisms and ecologies]

Page 7: Virtual Organizations: Building Interdisciplinary Collaborations

Genetics and Disease Susceptibility

[Figure labels: identify genes; Phenotype 1, Phenotype 2, Phenotype 3, Phenotype 4; predictive disease susceptibility; physiology; metabolism; endocrine; proteome; immune; transcriptome; biomarker signatures; morphometrics; pharmacokinetics; ethnicity; environment; age; gender]

Source: Terry Magnuson, UNC

Page 8: Virtual Organizations: Building Interdisciplinary Collaborations

PITAC Report Contents

• Computational Science: Ensuring America’s Competitiveness
  1. A Wake-up Call: The Challenges to U.S. Preeminence and Competitiveness
  2. Medieval or Modern? Research and Education Structures for the 21st Century
  3. Multi-decade Roadmap for Computational Science
  4. Sustained Infrastructure for Discovery and Competitiveness
  5. Research and Development Challenges
• Two key appendices
  – Examples of Computational Science at Work
  – Computational Science Warnings – A Message Rarely Heeded
• Available at www.nitrd.gov

Page 9: Virtual Organizations: Building Interdisciplinary Collaborations

Life Science Lessons from Astronomy

• Historically, discoveries accrued to those
  – with access to unique data
  – who built next generation telescopes
• Two things changed
  – growing costs and complexity of telescopes
  – emergence of whole sky surveys
• The result – virtual astronomy
  – discovering significant patterns
    • analysis of rich image/catalog databases
  – understanding complex astrophysical systems
    • integrated data/large numerical simulations

Page 10: Virtual Organizations: Building Interdisciplinary Collaborations

{Inter}national Virtual Observatory

Cluster Galaxy Morphology Analysis Portal

1. User’s machine (web browser): user selects a cluster
2. Look up cluster in internally stored catalog
3. X-ray and optical images retrieved via SIA interface
4. User launches distributed analysis
5. Initial galaxy catalog generated via Cone Search
6. Image cutout pointers merged into catalog
7. Morphological parameters calculated on grid for each galaxy
8. User downloads final table and images for analysis & visualization

Components in the diagram: cluster catalog, Chandra SIA, Skyview SIA, DSS SIA, CNOC SIA, NED Cone Search, CADC CNOC Cone Search, Morphology Calculation Service

Source: Ray Plante, NCSA

Page 11: Virtual Organizations: Building Interdisciplinary Collaborations

The Bioinformatics Challenge

• Challenge
  – the rise of quantitative biology
    • burgeoning bioinformatics data
  – complex analysis and modeling problems
  – education and training in new technologies
• Reality
  – diverse tools with idiosyncratic interfaces
    • steep learning curves
  – software development by diverse groups
  – distributed databases with diverse metadata
• Need
  – integrated, easy-to-use toolset with standard interfaces
  – extensible mechanisms that hide idiosyncrasies
  – tool and bioinformatics training
• The solution
  – bioinformatics infrastructure and coupled training

Page 12: Virtual Organizations: Building Interdisciplinary Collaborations

Need: Simple, Easy-To-Use Tools

“Genome. Bought the book. Hard to read.”

Eric Lander

Page 13: Virtual Organizations: Building Interdisciplinary Collaborations

Web and Social Processes

• Google
  – it’s a search engine, it’s a verb, …
• Blogs
  – published self-expression
• Instant Messenger
  – social networks
• Wireless messaging
  – semi-synchronous
• Internet commerce
  – the dot.com boom/bust
  – eBay, Amazon
• Spam, phishing, …
  – anti-social behavior

Page 14: Virtual Organizations: Building Interdisciplinary Collaborations

Benefits of Standards

• Interoperability
• Separation of concerns
• Reuse
• Independence
• Dependability
• Sharing
• Commonality
• Shared knowledge base
  – knowledge reuse
  – simplification (one hopes)

Page 16: Virtual Organizations: Building Interdisciplinary Collaborations

What’s A Grid/Web Service?

Web: Uniform access to documents (http://)

Grid/Web Services: Flexible, high-performance access to resources and services for distributed communities

[Figure labels: sensors and instruments, data archives, computers, software catalogs, colleagues]

It’s been 12 years!

Page 17: Virtual Organizations: Building Interdisciplinary Collaborations

Grid History: I-Way at SC’95

• A prototype national infrastructure
  – 17 sites, connected by
    • vBNS and six other ATM networks
  – 60 applications
• Features
  – I-POPs for site access
  – Kerberos authentication
  – manual scheduling
  – distributed communication libraries
• Experiences
  – led to Globus Grid toolkit
• Concurrent industry needs
  – led to web services for B2B interoperation

Page 18: Virtual Organizations: Building Interdisciplinary Collaborations

Web Services: “Commercial Grids”

• From browser-centric to service-centric
  – from human-computer to computer-computer
  – structured negotiation and response
• Workflow creation and management
  – end-to-end service negotiation
  – inter-organizational interaction
• Prerequisites
  – metadata standard for service descriptions
  – standard communication mechanisms
  – resource discovery and registration

Page 19: Virtual Organizations: Building Interdisciplinary Collaborations

eBay Web Services Architecture

• Over 40% of eBay's listings are now via API calls

Source: IBM

Page 20: Virtual Organizations: Building Interdisciplinary Collaborations

Web Services: A Definition

A web service is … designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact … [using] its description using SOAP-messages, … using HTTP with an XML serialization ....

W3C Working Draft, August 2003

• SOAP (Simple Object Access Protocol)
• WSDL (Web Services Description Language)
• UDDI (Universal Description, Discovery and Integration)

[Figure: a service provider publishes its WSDL description to a service broker (UDDI); a service consumer locates the service through the broker and then invokes the provider; each interaction is carried over SOAP]
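
As a rough illustration of the SOAP part of this definition, the sketch below builds a SOAP 1.1 envelope with Python's standard library and parses it back, which is the machine-to-machine exchange the W3C text describes. The operation name `GetSequence`, its `accession` parameter, the value `X00001`, and the `urn:example:bioportal` namespace are all hypothetical; a real service would declare its operations in its WSDL.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for this illustration.
SVC_NS = "urn:example:bioportal"


def build_request(operation: str, params: dict) -> bytes:
    """Serialize an operation call as a SOAP envelope (XML bytes)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(payload: bytes) -> tuple:
    """Recover the operation name and parameters from an envelope."""
    envelope = ET.fromstring(payload)
    call = envelope.find(f"{{{SOAP_NS}}}Body")[0]  # first child of Body
    operation = call.tag.split("}", 1)[1]          # strip the namespace
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_request("GetSequence", {"accession": "X00001"})
print(parse_request(message))
```

In practice the envelope would be sent as the body of an HTTP POST, and the consumer would generate the call from the provider's WSDL rather than hard-coding it.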

Page 21: Virtual Organizations: Building Interdisciplinary Collaborations

Technology Push

Source: Gartner Group

Page 22: Virtual Organizations: Building Interdisciplinary Collaborations

European myGrid Architecture

Source: www.mygrid.org

Page 23: Virtual Organizations: Building Interdisciplinary Collaborations

The Bioinformatics Challenges

• Complex, multilevel models
  – integration and in silico designs
• Information visualization
  – complexity and scale
• Data models and ontologies
  – community definition
• Data federation, storage and management
  – shared access and support
• User access portals
  – web-based tool and service interfaces
• Packaging, distribution and deployment
  – community building

Page 24: Virtual Organizations: Building Interdisciplinary Collaborations

Multilevel Cellular Models

• Signaling networks
  – environmental triggers and behavior
    • e.g., cell lifecycle
  – different pathways in each tissue type
• Metabolic networks
  – measurable products in pathway
  – many systems are steady state
  – negative feedback leads to stabilization
• Protein interaction networks
  – localization of proteins that interact for function
  – protein-protein interactions for specific actions
• Gene regulatory networks
  – many things affect gene product concentration
  – nucleic-nucleic, protein-nucleic interactions
• Computing, physics, engineering and biology
  – control theory, mathematical models, phase spaces
  – from biological cartoons to predictive models
    • e.g., microRNAs and gene expression controls
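
The point that negative feedback leads to stabilization can be sketched with a toy model. The equation below, dx/dt = k_in / (1 + x/K) - k_out * x, is an assumed textbook-style form of end-product inhibition and is not taken from the slides; the parameter values are chosen only so that the steady state is x = 1.

```python
def simulate(x0: float, k_in: float = 2.0, K: float = 1.0,
             k_out: float = 1.0, dt: float = 0.01, steps: int = 2000) -> float:
    """Forward-Euler integration of dx/dt = k_in / (1 + x/K) - k_out * x."""
    x = x0
    for _ in range(steps):
        production = k_in / (1.0 + x / K)  # the product inhibits its own synthesis
        x += dt * (production - k_out * x)  # production minus first-order removal
    return x


# Very different starting concentrations settle at the same steady state.
for start in (0.0, 5.0):
    print(start, round(simulate(start), 4))
```

With these values the steady state solves 2 / (1 + x) = x, so both runs approach x = 1 regardless of where they start.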

Page 25: Virtual Organizations: Building Interdisciplinary Collaborations

Biological Models

• Simulation and prediction
  – structures and dynamics
• Reasoning and discovery
  – reverse engineering

[Figure: temporal scale in seconds, 10^-12 to 10^6 (bond motion, catalysis, diffusion, transcription, translation, growth & division), against spatial scale in nm^3, 10^0 to 10^12 (metabolites, proteins, ribosomes, prokaryotes, eukaryotes)]

Page 26: Virtual Organizations: Building Interdisciplinary Collaborations

Biophysical and Environmental Modeling

[Figure labels: genomics, proteomics, cell biochemistry and structure, cilia, mucus, airway/flow]

Source: Ric Boucher, UNC

Page 27: Virtual Organizations: Building Interdisciplinary Collaborations

Data Heterogeneity and Complexity

[Figure labels: disease, drug, clinical trial, phenotype, protein, protein structure, protein sequence, P-P interactions, proteome, gene sequence, genome sequence, gene expression, homology]

Genomic, proteomic, transcriptomic, metabolomic, protein-protein interactions, regulatory bio-networks, alignments, disease, patterns and motifs, protein structure, protein classifications, specialist proteins (enzymes, receptors), …

Source: Carole Goble (Manchester)

Page 28: Virtual Organizations: Building Interdisciplinary Collaborations

Sensor Data Overload

• High resolution brain imaging
  – 4.5 petabytes (PB) per brain

Source: Robert Morris, IBM

Source: Chris Johnson, Utah; Art Toga, UCLA

Page 29: Virtual Organizations: Building Interdisciplinary Collaborations

RENCI: What Is It?

• Statewide objectives
  – create broad benefit in a competitive world
  – engage industry, academia, government and citizens
• Four target areas
  – public benefit
    • supporting urban planning, disaster response, …
  – economic development
    • helping companies and people with innovative ideas
  – research engagement across disciplines
    • catalyzing new projects and increasing success
    • building multidisciplinary partnerships
  – education and outreach
    • providing hands on experiences and broadening participation
• Mechanisms and approaches
  – partnerships and collaborations
  – infrastructure as needed to accomplish goals

Page 30: Virtual Organizations: Building Interdisciplinary Collaborations

Carolina Center for Exploratory Genetic Analysis (CCEGA)

[Figure labels: virtuous cycle; interdisciplinary research & education; faculty, staff & students; extant data models; statistical & computational techniques; experimental genetics portal; driving problems; analysis techniques; promoting mutual awareness; interoperable data management]

Page 31: Virtual Organizations: Building Interdisciplinary Collaborations

CCEGA Participants

• Coordination team
  – Dan Reed, RENCI
  – Terry Magnuson, CCGS
  – Alan Blatecky, RENCI
  – Kirk Wilhelmsen, CCGS
• Eleven departments/institutes
  – Biostatistics
  – Cancer Center
  – Genetics
  – Computer Science
  – Epidemiology
  – Genetics
  – Health Science Library
  – Information and Library Science
  – Pharmacy
  – RENCI
  – Statistics
• Campus wide support
  – from many sources
• Project participants
  – Brad Hemminger, Information & Library Science
  – James Evans, Genetics
  – Kevin Gamiel, RENCI
  – Xiaojun Guan, RENCI
  – Barrie Hays, Health Science Library
  – Clark Jefferies, RENCI
  – Ethan Lange, Genetics
  – Andrew Nobel, Statistics
  – Karen Mohlke, Genetics
  – Kari North, Epidemiology
  – Susan Paulsen, Computer Science
  – Fernando Manuel Pardo, Genetics
  – Charles Perou, Cancer Center
  – Lavanya Ramakrishnan, RENCI
  – Jan Prins, Computer Science
  – Patrick Sullivan, Genetics
  – Lisa Susswein, Cancer Center
  – David Threadgill, Genetics
  – Alexander Tropsha, Pharmacy
  – K.T.L. Vaughan, Health Science Library
  – Fred Wright, Biostatistics
  – Wei Wang, Computer Science
  – Fei Zou, Biostatistics

Page 32: Virtual Organizations: Building Interdisciplinary Collaborations

Data: From Lab and Clinic to Analysis

• Independent data management
  – data security
  – version control
  – redundancy
  – controlled access
• NIH CCEGA
  – Carolina Center for Exploratory Genetic Analysis

[Figure labels: clinic, lab, ELSI, integration & informatics, analysis]

Source: Brad Hemminger, UNC

Page 33: Virtual Organizations: Building Interdisciplinary Collaborations

Data Management and Information Viz

[Figure labels: information mining module; information visualization module; GenBank; taxonomy annotation; ontology annotation; DB schema; annotated domain literature; published domain literature]

Page 34: Virtual Organizations: Building Interdisciplinary Collaborations

From SNPs to HapMap

• Single Nucleotide Polymorphisms (SNPs)
  – one in ~1200 bases differs across individuals
  – SNPs act as markers to locate genes
• Common groups of SNPs are shared
  – i.e., form a haplotype
• HapMap data sources
  – 90 Yoruba individuals (30 trios) from Nigeria (YRI)
  – 90 individuals (30 trios) of European descent from Utah (CEU)
  – 45 Han Chinese individuals from Beijing (CHB)
  – 45 Japanese individuals from Tokyo (JPT)
• ~3,500,000 SNPs typed
  – basis for association studies for disease identification
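
To give a feel for the scale these figures imply, here is a back-of-the-envelope calculation. The sample counts, the SNP count and the one-in-1200 rate come from the slide; the genome length of about 3 billion bases is an added assumption, and the results are rough orders of magnitude only.

```python
# Sample counts taken from the HapMap data sources listed above.
samples = {"YRI": 90, "CEU": 90, "CHB": 45, "JPT": 45}
SNPS_TYPED = 3_500_000         # "~3,500,000 SNPs typed"
BASES_PER_DIFFERENCE = 1200    # "one in ~1200 bases"
GENOME_BASES = 3_000_000_000   # assumption: ~3 billion bases per haploid human genome

total_individuals = sum(samples.values())
genotype_calls = total_individuals * SNPS_TYPED          # one call per person per SNP
expected_differences = GENOME_BASES // BASES_PER_DIFFERENCE

print(total_individuals)     # 270
print(genotype_calls)        # 945000000
print(expected_differences)  # 2500000
```

Roughly a billion genotype calls across 270 individuals is the kind of data volume the association studies mentioned above have to manage.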

Page 35: Virtual Organizations: Building Interdisciplinary Collaborations

CCEGA HapMap Simulator

• Synthetic data
  – disease models
  – model testing
    • mining bakeoffs

Page 36: Virtual Organizations: Building Interdisciplinary Collaborations

Carolina Bioportal

• Three overlapping target groups
  – undergraduate education
  – graduate education and research
  – academic/industrial research
• Features
  – access to common bioinformatics tools
  – extensible toolkit and infrastructure
    • OGCE and National Middleware Initiative (NMI)
    • leverages emerging international standards
  – remotely accessible or locally deployable
  – packaged and distributed with documentation
• National reach and community
  – TeraGrid deployment
    • science gateway
• Education and training
  – hands-on workshops
    • clusters, Grids, portals and bioinformatics

Page 37: Virtual Organizations: Building Interdisciplinary Collaborations
Page 38: Virtual Organizations: Building Interdisciplinary Collaborations

Distributed Grid and Web Services

[Figure: layered Grid architecture. Layers and components shown:]
• Grid portals: launch, configure and control
• Application interface: workflow service, application instances
• Open Grid Service Architecture layer: security, data management service, accounting service, logging, event/message service, policy, administration & monitoring, Grid orchestration, registries and name binding, reservations and scheduling
• Open Grid Service Infrastructure (web service component model)
• Resource layer (from PCs to supercomputers), online instruments

Source: Dennis Gannon, Indiana

Page 39: Virtual Organizations: Building Interdisciplinary Collaborations

Bioportal Architecture

www.ncbioportal.org

[Figure labels: PISE application XML description; interface generator; HTML files; Velocity files; command files; application processing; Bioportal; OGCE; authentication, Grid credential; user profile; user databases; job submission; job records; job history database; remote file access; Gatekeeper, GridFTP, MyProxy; application databases; local cluster]

• OGCE toolkit
  – used by cyberinfrastructure projects
    • LEAD, NEES, PACI, DOE, TeraGrid …
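
The architecture relies on an XML description of each application from which interfaces and command files are generated. The sketch below shows that general idea: a tool description is parsed and combined with user-supplied values to produce a command line. The XML element and attribute names, the tool name `seqsearch`, and its flags are hypothetical and do not follow the actual PISE format.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool description; element names, attribute names, the tool
# name and its flags are illustrative and not the real PISE DTD.
DESCRIPTION = """
<tool name="seqsearch">
  <param name="program" flag="--program" default="protein"/>
  <param name="database" flag="--db"/>
  <param name="evalue" flag="--evalue" default="10"/>
</tool>
"""


def build_command(description: str, values: dict) -> list:
    """Turn a tool description plus user-supplied values into an argv list."""
    tool = ET.fromstring(description)
    argv = [tool.get("name")]
    for param in tool.findall("param"):
        # Use the user's value if given, otherwise fall back to the default.
        value = values.get(param.get("name"), param.get("default"))
        if value is None:
            raise ValueError(f"missing required parameter: {param.get('name')}")
        argv += [param.get("flag"), str(value)]
    return argv


print(build_command(DESCRIPTION, {"database": "nr", "evalue": 0.001}))
```

The same description could equally drive generation of a web form, which is why keeping one declarative description per application lowers the cost of adding tools to a portal.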

Page 40: Virtual Organizations: Building Interdisciplinary Collaborations

Putting the Technologies Together

[Figure: NC Bioportal software stack. Components shown:]
• NC Bioportal
• Bio applications
• Databases
• PISE (XML wrapper)
• OGCE Toolkit (Grid middleware)
• GridPortlets, CoG
• Chef (collaboration/standard portlets)
• Velocity (template engine)
• Jakarta Jetspeed (enterprise portal)
• Turbine (web app framework)
• Tomcat (Apache servlet container)
• VMC

Page 41: Virtual Organizations: Building Interdisciplinary Collaborations

Community Software Toolkit: Lessons

• NSF PACI Alliance “In a Box” toolkits
  – cluster software (aka OSCAR)
  – Grid infrastructure (aka NMI)
  – Access Grid for distributed collaboration
  – tiled display walls for visualization
• Distribution materials
  – software and training materials
    • CDs and web
• Community workshops and training
  – Linux Clusters Institute
  – MSI HPC workshops
  – hands on training
• Lowering the entry barrier
  – usage and deployment
• Bioportal distribution
  – workshops, tutorials
  – training materials
  – road shows

Page 42: Virtual Organizations: Building Interdisciplinary Collaborations

NC Bioportal: What’s Next

• Engagement
  – workshops, experiences and deployments
• Infrastructure
  – dynamic job scheduling across multiple sites
  – migration to OGCE 2.0
  – fully automated database updates
  – workflow construction and processing
• Portal tool suite
  – expanded applications and databases
    • phylogeny, morphology, microarray analysis, …
• Training materials
  – additional modules based on user feedback
  – workshop materials packaged for self-study
• Leverage national presence
  – TeraGrid/NCSA bioinformatics portal

Page 43: Virtual Organizations: Building Interdisciplinary Collaborations

The Vision of Grid/Web Services

“… Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do.”
– Book of Genesis

Peter Bruegel, The Tower of Babel (1563)

Page 44: Virtual Organizations: Building Interdisciplinary Collaborations

Interdisciplinary Collaborations

• Appropriate reward structures
  – well-matched time constants
• Intellectual equality
  – balanced recognition of contributions
• Research/infrastructure distinctions
  – timelines and people needs differ
• Confidentiality and openness
  – academic/industry collaboration perspectives
• Intellectual property
  – background IP and differential disciplinary models

Page 45: Virtual Organizations: Building Interdisciplinary Collaborations

Some Thoughts on the Future

• Grids/web services are not a panacea
  – we have seen this movie before
    • standards debates can be endless
    • make new mistakes, not the same old ones
  – code is shifted from modules to interfaces
• Danger of “Death by CS Abstraction”
  – “all problems can be solved by another level of indirection”
• Appropriate decomposition is a challenge
  – performance, usability, flexibility
• Generality and extensibility really matter
  – incremental aggregation and interoperability
  – data management and federation
• Better questions, not just private capabilities
  – limited by creativity not resources

Page 46: Virtual Organizations: Building Interdisciplinary Collaborations

The Cambrian Explosion

• Most phyla appear
  – sponges, archaeocyathids, brachiopods
  – trilobites, primitive mollusks, echinoderms
• Indeed, most appeared quickly!
  – Tommotian and Atdabanian – as little as five million years
• Lessons for computing
  – it doesn’t take long when conditions are right
    • raw materials and environment
  – leave fossil records if you want to be remembered!

Page 47: Virtual Organizations: Building Interdisciplinary Collaborations

Thanks for the Invitation!