Cyberinfrastructure Overview Core Cyberinfrastructure Team Matthew B. Jones National Center for Ecological Analysis and Synthesis (NCEAS) University of California, Santa Barbara DataONE Kick-off Meeting October 20-22, 2009

Page 1: Cyberinfrastructure Overview

Cyberinfrastructure Overview

Core Cyberinfrastructure Team

Matthew B. Jones
National Center for Ecological Analysis and Synthesis (NCEAS)

University of California, Santa Barbara

DataONE Kick-off Meeting October 20-22, 2009

Page 2: Cyberinfrastructure Overview

Cyberinfrastructure Objectives

Support synthesis in earth observation sciences

Support the full lifecycle of the scientific process:
- Data acquisition and management
- Data preservation
- Data discovery and access
- Data integration
- Data analysis and visualization
- Process management and preservation

Evolve to accommodate technology change

Page 3: Cyberinfrastructure Overview

Design goals

Distributed management at Member Nodes

Replication and caching for preservation and performance

Software must provide benefits for scientists today

Evolution of software and standards

Support and adapt existing community software efforts

Emphasize Free and Open Source Software

Page 4: Cyberinfrastructure Overview

What data are in scope?

Biological: e.g., gene, organism, population, species, community, biome, ecosystem

Environmental: e.g., atmospheric, chemical, ecological, hydrological, oceanographic, physical

Social: e.g., land use, human population

Economic: e.g., trade, ecosystem services, resource extraction

Page 5: Cyberinfrastructure Overview

Who are the providers and consumers?

Providers: academic and agency scientists, research networks, environmental observatories, citizen groups, students

Consumers: academic and agency scientists, research networks, environmental observatories, citizen groups, students

Same people, different roles, driving needs

Page 6: Cyberinfrastructure Overview

Metadata and data integration

Every community has multiple metadata schemas (Biological Data Profile, Darwin Core, Dublin Core, Ecological Metadata Language, Open GIS schemas) and multiple data formats (ASCII, NetCDF, HDF, GeoTIFF, ...)

Some communities have general and domain-specific ontologies

Addressing this heterogeneity is critical. Integrated analysis of datasets requires:
- Syntax mapping
- Semantics mapping
- Sophisticated integration tools that do not yet exist
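Syntax mapping can be illustrated with a metadata crosswalk: a table that maps fields in one schema onto the nearest fields in another. The sketch below maps a few Ecological Metadata Language (EML)-style fields onto Dublin Core element names. The flat field names, the `EML_TO_DC` table, and the `crosswalk` helper are simplified, hypothetical stand-ins, not an official EML-to-Dublin Core crosswalk; real EML records are nested XML, and semantic mapping (meaning, units, vocabularies) would still be needed on top of this.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified crosswalk from flat EML-style field names to
# Dublin Core element names. Not an official mapping.
EML_TO_DC: Dict[str, str] = {
    "title": "title",
    "creator": "creator",
    "abstract": "description",
    "keyword": "subject",
    "pubDate": "date",
    "intellectualRights": "rights",
    "geographicDescription": "coverage",
}


def crosswalk(record: Dict[str, str],
              mapping: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map a flat source record onto target fields.

    Returns the mapped record (each target field holds a list, because several
    source fields may map to the same target) and the source fields that had no
    mapping and would need manual or semantic handling.
    """
    mapped: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target is None:
            unmapped.append(source_field)
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


if __name__ == "__main__":
    eml_like = {
        "title": "Stream temperature, 2008",
        "abstract": "Hourly water temperature readings.",
        "methodStep": "Loggers calibrated monthly.",
    }
    dc, leftover = crosswalk(eml_like, EML_TO_DC)
    print(dc)        # {'title': [...], 'description': [...]}
    print(leftover)  # ['methodStep']
```

Fields with no counterpart, such as `methodStep` here, fall through to the unmapped list, which is where the heterogeneity described above shows up in practice.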

Page 7: Cyberinfrastructure Overview

Integrating with existing infrastructure

KNB, ESDIS, and WATERS networks

Page 8: Cyberinfrastructure Overview

Overview of Components

Member Nodes
- Earth observing institutions, projects, and networks
- Provide resources for their own data and for replicated data
- Focused on serving their constituencies

Coordinating Nodes
- Provide network-wide services to Member Nodes
- Geographically replicated services

Investigator Toolkit
- Tools for researchers to access DataNetONE
- General-purpose and discipline-specific tools
- Adapt existing tools where possible

Page 9: Cyberinfrastructure Overview

Node Design

Member Nodes
- Geographically distributed nodes
- Authoritative repository for many datasets
- Diversity tolerant (less tightly coordinated)
- Freedom to try new tools and methods, and to leapfrog forward
- Partial replication

Coordinating Nodes
- Completely replicated
- Complete metadata catalogue
- Subset of the data (initially a large fraction)
- Tightly coordinated, stable service platform
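The division of labor on this slide can be modeled in a few lines. This sketch assumes a coordinating node that records every object in its catalogue and places a fixed number of replicas on other member nodes using rendezvous (highest-random-weight) hashing. The class names, the `copies` parameter, and the placement strategy are illustrative assumptions, not DataONE's actual replication policy.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MemberNode:
    """Hypothetical member node: authoritative for the objects it creates and
    holding a partial set of replicas from other nodes."""
    node_id: str
    objects: Dict[str, bytes] = field(default_factory=dict)
    replicas: Dict[str, bytes] = field(default_factory=dict)


def placement_score(node_id: str, pid: str) -> int:
    # Rendezvous hashing: each (node, object) pair gets a stable
    # pseudo-random score that any process can recompute.
    digest = hashlib.sha256(f"{node_id}:{pid}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class CoordinatingNode:
    """Hypothetical coordinating node: keeps the complete metadata catalogue
    and decides where replicas of each object live."""
    catalogue: Dict[str, dict] = field(default_factory=dict)

    def register(self, origin: MemberNode, pid: str, data: bytes, metadata: dict,
                 members: List[MemberNode], copies: int = 2) -> List[str]:
        origin.objects[pid] = data
        candidates = [m for m in members if m.node_id != origin.node_id]
        # The highest-scoring nodes win. Placement is deterministic, so every
        # copy of the coordinating node computes the same answer.
        chosen = sorted(candidates, key=lambda m: placement_score(m.node_id, pid),
                        reverse=True)[:copies]
        for m in chosen:
            m.replicas[pid] = data
        locations = [origin.node_id] + [m.node_id for m in chosen]
        # Every object is catalogued, even though data is only partially replicated.
        self.catalogue[pid] = {**metadata, "authoritative": origin.node_id,
                               "locations": locations}
        return locations
```

Deterministic hashing is one way to keep completely replicated coordinating nodes in agreement without extra messaging; a real network would also weigh node capacity and policy.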

Page 10: Cyberinfrastructure Overview

DataONE Service Interface

Federated Identity and Authorization Services

Object Management Services

Discovery and Usage Services

Preservation Services

Network Services
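One way to picture these five service areas is as a single interface that every node implements. The sketch below uses a Python `Protocol` with one hypothetical method per area and a toy in-memory node; the method names, signatures, and the read-only authorization rule are assumptions for illustration, not the published DataONE API.

```python
import hashlib
from typing import Dict, List, Protocol, Set


class ServiceInterface(Protocol):
    """Hypothetical grouping of the five service areas. Method names and
    signatures are illustrative assumptions, not the DataONE API."""

    # Federated identity and authorization services
    def is_authorized(self, subject: str, pid: str, action: str) -> bool: ...

    # Object management services
    def get(self, pid: str) -> bytes: ...

    # Discovery and usage services
    def search(self, keyword: str) -> List[str]: ...

    # Preservation services
    def checksum(self, pid: str) -> str: ...

    # Network services
    def ping(self) -> bool: ...


class InMemoryNode:
    """Toy node that satisfies ServiceInterface structurally (no inheritance)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._descriptions: Dict[str, str] = {}
        self._readers: Dict[str, Set[str]] = {}

    def add(self, pid: str, data: bytes, description: str, readers: Set[str]) -> None:
        self._objects[pid] = data
        self._descriptions[pid] = description
        self._readers[pid] = set(readers)

    def is_authorized(self, subject: str, pid: str, action: str) -> bool:
        # Only "read" is modeled here; every other action is denied.
        return action == "read" and subject in self._readers.get(pid, set())

    def get(self, pid: str) -> bytes:
        return self._objects[pid]

    def search(self, keyword: str) -> List[str]:
        kw = keyword.lower()
        return sorted(p for p, d in self._descriptions.items() if kw in d.lower())

    def checksum(self, pid: str) -> str:
        return hashlib.sha256(self._objects[pid]).hexdigest()

    def ping(self) -> bool:
        return True


def fetch(node: ServiceInterface, subject: str, pid: str) -> bytes:
    """Client code written only against the interface."""
    if not node.is_authorized(subject, pid, "read"):
        raise PermissionError(f"{subject} may not read {pid}")
    return node.get(pid)
```

Because `fetch` depends only on the interface, any node that provides these methods can serve the same client.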

Page 11: Cyberinfrastructure Overview
Page 12: Cyberinfrastructure Overview

Service Interface for Interoperability

Create common access methods for different clients

Create a mechanism to map heterogeneous services

Provide an interface between nodes and service requests

Simplicity of construction
- Lightweight
- Ease of implementation
- Implementations are opaque to service consumers
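Mapping heterogeneous services onto common access methods is essentially the adapter pattern. In this sketch two stand-in repositories with different native APIs (a record catalog and a CSV manifest of files) are each wrapped in an adapter that implements the same hypothetical `ObjectService` interface, so the client never sees which backend it is using. All class names and the data are illustrative assumptions.

```python
import csv
import io
import json
from abc import ABC, abstractmethod
from typing import Dict, List


class ObjectService(ABC):
    """Common access method every client uses (hypothetical)."""

    @abstractmethod
    def list_ids(self) -> List[str]: ...

    @abstractmethod
    def read(self, pid: str) -> bytes: ...


class LegacyCatalogRepo:
    """Stand-in for an existing repository with its own native API."""

    def __init__(self) -> None:
        self.records = {"rec-7": {"values": [1.5, 2.0]}}

    def fetch_record(self, accession: str) -> dict:
        return self.records[accession]


class CsvFileRepo:
    """Stand-in for a repository that exposes a CSV manifest and raw files."""

    def __init__(self) -> None:
        self.manifest = "id,path\nobs-1,/data/obs1.txt\n"
        self.files = {"/data/obs1.txt": b"12.1\n12.4\n"}


class LegacyCatalogAdapter(ObjectService):
    def __init__(self, repo: LegacyCatalogRepo) -> None:
        self.repo = repo

    def list_ids(self) -> List[str]:
        return sorted(self.repo.records)

    def read(self, pid: str) -> bytes:
        # Serialize the native record so every adapter returns bytes.
        return json.dumps(self.repo.fetch_record(pid)).encode()


class CsvFileAdapter(ObjectService):
    def __init__(self, repo: CsvFileRepo) -> None:
        self.repo = repo
        rows = csv.DictReader(io.StringIO(repo.manifest))
        self.paths: Dict[str, str] = {row["id"]: row["path"] for row in rows}

    def list_ids(self) -> List[str]:
        return sorted(self.paths)

    def read(self, pid: str) -> bytes:
        return self.repo.files[self.paths[pid]]


def harvest(services: List[ObjectService]) -> Dict[str, bytes]:
    """A client that never sees which backend it is talking to."""
    return {pid: s.read(pid) for s in services for pid in s.list_ids()}


if __name__ == "__main__":
    print(harvest([LegacyCatalogAdapter(LegacyCatalogRepo()),
                   CsvFileAdapter(CsvFileRepo())]))
```

Each adapter stays small because it only translates; the backend keeps its own storage and conventions, which matches the goal of a lightweight interface whose implementations are opaque to consumers.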

Page 13: Cyberinfrastructure Overview

DataNetONE Components

Page 14: Cyberinfrastructure Overview

What is the Investigator Toolkit?

Suite of software tools for researchers
- Emphasize Free and Open Source, but support commercial tools
- General analysis frameworks (e.g., R, MATLAB)
- Domain-specific tools (e.g., GARP, Phylocom)
- Organized using scientific workflows

Supports the scientific lifecycle
- Data management and preservation
- Data query and access
- Data analysis and visualization
- Process management and preservation

Communication via the Service Interface

Page 15: Cyberinfrastructure Overview

Toolkit Functions

Supports the scientific lifecycle

Data management and preservation

Data query and access

Data analysis and visualization

Process management and preservation

Portal software

Page 16: Cyberinfrastructure Overview

Who will build the Toolkit?

Many open source efforts already exist
- Data management: MATT, uDig, Specify
- Analysis and modeling: R, Octave
- Workflow systems: Kepler, Taverna, Triana, Pegasus
- Grid systems: Condor, Globus, BOINC
- Data and workflow portals: VegBank, myExperiment

Commercial tools are important too: MATLAB, SAS, ArcGIS

DataONE: help communities build their own tools
- Integrate, interoperate, stabilize
- Create libraries for the DataONE Service Interface
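A library for the Service Interface could be a thin client that builds requests and decodes responses. This sketch assumes a REST-style service with hypothetical `/object/{pid}` and `/search` endpoints; those paths and the `ServiceClient` class are illustrative, not a published DataONE API. The HTTP transport is injected so that tools can substitute their own networking code and so the client can be tested offline.

```python
import json
from typing import Callable, Dict, List, Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# A transport takes a URL and returns the response body.
Transport = Callable[[str], bytes]


def _default_transport(url: str) -> bytes:
    with urlopen(url, timeout=30) as resp:
        return resp.read()


class ServiceClient:
    """Hypothetical thin client; the endpoint paths below are assumptions."""

    def __init__(self, base_url: str, transport: Optional[Transport] = None) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport or _default_transport

    def object_url(self, pid: str) -> str:
        # Identifiers such as DOIs contain "/" and ":", so encode every reserved character.
        return f"{self.base_url}/object/{quote(pid, safe='')}"

    def get(self, pid: str) -> bytes:
        return self.transport(self.object_url(pid))

    def search(self, **params: str) -> List[Dict]:
        # Sort parameters so the same query always produces the same URL.
        url = f"{self.base_url}/search?{urlencode(sorted(params.items()))}"
        return json.loads(self.transport(url))


if __name__ == "__main__":
    fake = {"https://cn.example.org/object/doi%3A10.1234%2Fabc": b"data"}
    client = ServiceClient("https://cn.example.org/", transport=fake.__getitem__)
    print(client.get("doi:10.1234/abc"))  # b'data'
```

Keeping the client this thin leaves analysis tools such as R, MATLAB, or Kepler free to wrap it in whatever idiom they already use.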

Page 17: Cyberinfrastructure Overview

Data Management and Preservation

Data management functions
- Data creation, input, editing, versioning
- Metadata creation, editing, annotation
- Local data storage, indexing, searching

Example applications
- Morpho metadata editor
- Mercury metadata editor
- MATT metadata editor
- ESRI ArcCatalog

Metacat data server: lab group data management
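The versioning, local storage, indexing, and searching functions listed above can be combined in a small local store. This sketch assumes that every edit creates a new immutable version with its own identifier that points back to the version it replaces, plus a keyword index for search; the `LocalStore` class, the `local.N` identifier scheme, and the `obsoletes` field name are hypothetical, not how Morpho or Metacat actually store data.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Version:
    pid: str        # identifier of this specific version
    checksum: str   # SHA-256 of the bytes, for later integrity checks
    obsoletes: str  # pid of the previous version, or "" for the first


class LocalStore:
    """Hypothetical lab-group store: immutable versions plus a keyword index."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.versions: Dict[str, Version] = {}
        self.index: Dict[str, Set[str]] = defaultdict(set)
        self._counter = 0

    def save(self, data: bytes, keywords: List[str], previous: str = "") -> str:
        # Every edit creates a new identifier instead of overwriting,
        # so earlier versions stay citable.
        self._counter += 1
        pid = f"local.{self._counter}"
        self.data[pid] = data
        self.versions[pid] = Version(pid, hashlib.sha256(data).hexdigest(), previous)
        for kw in keywords:
            self.index[kw.lower()].add(pid)
        return pid

    def history(self, pid: str) -> List[str]:
        """Walk back from a version to the first one."""
        chain = []
        while pid:
            chain.append(pid)
            pid = self.versions[pid].obsoletes
        return chain

    def search(self, keyword: str) -> List[str]:
        # .get avoids inserting empty entries into the defaultdict.
        return sorted(self.index.get(keyword.lower(), set()))


if __name__ == "__main__":
    store = LocalStore()
    v1 = store.save(b"1,2,3", ["soil", "nitrogen"])
    v2 = store.save(b"1,2,4", ["soil", "nitrogen"], previous=v1)
    print(store.history(v2))     # ['local.2', 'local.1']
    print(store.search("Soil"))  # ['local.1', 'local.2']
```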

Page 18: Cyberinfrastructure Overview

Data Analysis and Visualization

Need community-standard analysis frameworks: R, Octave, GRASS, S-PLUS, MATLAB, ArcGIS

Thousands of domain-specific analytical tools exist, e.g.:
- GARP: Genetic Algorithm for Rule-set Production
- BLAST search
- ClustalW
- Phylocom
- Mesquite

Page 19: Cyberinfrastructure Overview

Workflow system capabilities

Workflow systems:
- Enable communication
- Support preservation of scientific processes
- Enable component re-use
- Allow integration across many software frameworks

Example workflow engines: Kepler, Taverna, Pegasus, Triana
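The preservation and re-use points can be made concrete with a toy workflow runner. This sketch assumes each step is a plain Python function with recorded parameters and logs a provenance record (step name, parameters, and short hashes of its input and output) for every run. `Step`, `run_workflow`, and the two example components are hypothetical; this is not how Kepler, Taverna, Pegasus, or Triana work internally.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


def _digest(value: Any) -> str:
    # Short, stable fingerprint of JSON-serializable data.
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()[:12]


@dataclass
class Step:
    name: str                   # reusable component name
    func: Callable[..., Any]    # the component itself
    params: Dict[str, Any]      # recorded so the run can be re-executed


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[Dict[str, Any]]]:
    """Run steps in order and return the result plus a provenance log."""
    provenance: List[Dict[str, Any]] = []
    for step in steps:
        before = _digest(data)
        data = step.func(data, **step.params)
        provenance.append({"step": step.name, "params": step.params,
                           "input": before, "output": _digest(data)})
    return data, provenance


# Two reusable components that could appear in many workflows.
def drop_missing(rows: List[int], sentinel: int) -> List[int]:
    return [r for r in rows if r != sentinel]


def scale(rows: List[int], factor: int) -> List[int]:
    return [r * factor for r in rows]


if __name__ == "__main__":
    steps = [Step("drop_missing", drop_missing, {"sentinel": -999}),
             Step("scale", scale, {"factor": 2})]
    result, prov = run_workflow(steps, [10, -999, 20])
    print(result)  # [20, 40]
    for record in prov:
        print(record)
```

Because the log captures parameters and data fingerprints, re-running the same steps on the same input should reproduce an identical provenance record, which is one simple check that a process has been preserved.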

Page 20: Cyberinfrastructure Overview

Community tools have been successful

Investigator Toolkit will build upon these successes
- Adapt tools to work together with the Service Interface
- Support Free and Open Source Software

The set of supported tools will grow over time

Page 21: Cyberinfrastructure Overview

DataONE discovery portals

Data discovery portal at Coordinating Nodes

Workflow discovery portal at Coordinating Nodes

Other portals as needed

Page 22: Cyberinfrastructure Overview

Outstanding issues

Data Discovery, Access, and Availability

Federated Identity, Authentication, and Access Control

Metadata and data standards

Evolution of specifications

Data Integration and Interoperability

Data and Metadata preservation, longevity, and migration

Versioning and identifiers

Scalability

Page 23: Cyberinfrastructure Overview

NIH Syndrome

Lots of:
- Metadata catalogs and specifications
- Data standards
- Service definitions
- Architectures and protocols

Many communities of practice: GEOSS, KNB, CUAHSI, NBII, GBIF, TDWG, AmeriFlux, EOS, OGC, W3C, LTER, NEON, OOI, and on and on and on...

DataONE cannot just be community n+1
- Easy to get entrained in the details
- Have to save people work
- Have to engage groups early and earnestly

Page 24: Cyberinfrastructure Overview

[Diagram: DataONE with an "I am here" marker, shown among related communities and organizations: W3C, NCEAS, OGC, TDWG, LTER, Kepler, SONet, ME, KNB, GBIF, GEOSS, EOS]

Where are you?