ESG Overview
• Earth System Grid enables management, discovery, distributed access, processing, and analysis of distributed terascale climate research data
• A “Collaboratory Pilot Project” funded by the DOE (Department of Energy) SciDAC program
• Builds upon ESG-I, the Globus Toolkit, and DataGrid technologies
ESG Overview
• The main goal of ESG is to make climate data an easily accessible community resource.
• Enabling researchers to understand and make effective use of very large, distributed climate datasets is critical.
• The broad strategy is to develop a collection of server-side capabilities that minimize the amount of data movement.
• Multiple interfaces to ESG will allow researchers to focus on science rather than on issues of data transfer, format, and dataset manipulation.
ESG Participants
• ANL: Argonne National Laboratory (Argonne, IL)
• ISI: Information Sciences Institute (Marina del Rey, CA)
• LANL: Los Alamos National Laboratory (Los Alamos, NM)
• LBNL: Lawrence Berkeley National Laboratory (Berkeley, CA)
• LLNL: Lawrence Livermore National Laboratory (Livermore, CA)
• NCAR: National Center for Atmospheric Research (Boulder, CO)
• NERSC: National Energy Research Scientific Computing Center (Oakland, CA)
• ORNL: Oak Ridge National Laboratory (Oak Ridge, TN)
• USC: University of Southern California (Los Angeles, CA)
ESG History
• ESG-I: DOE NGI (Next Generation Internet) project
– Focus on high-performance data movement, Grid-enabled versions of LLNL tools
– Early successes include bandwidth challenge at SC’2001, significant technology output
– Experimental deployments only, at participating sites
• ESG-II: DOE SciDAC (Scientific Discovery through Advanced Computing) project
– “Smart servers” for server-side data reduction
– Integration with common “thin” clients, e.g. DODS and Data Portals
– Client software in the hands of environmental scientists
– Production deployments at participating instances
Climate Grid Example for Ocean Model
• Temperature(i,j)
• Latitude(i,j)
• Longitude(i,j)
• Lat_bounds(i,j,4)
• Lon_bounds(i,j,4)
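Because Latitude and Longitude are stored per (i,j) point, this is a curvilinear grid, and the trailing dimension of size 4 in Lat_bounds and Lon_bounds holds the four corner vertices of each cell. The sketch below shows one way such bounds could be built, assuming the cell corners are available as hypothetical (ni+1) by (nj+1) arrays and that vertices are listed counterclockwise, as the CF conventions ask for 2-D cell bounds. It uses plain Python lists rather than any particular netCDF library.

```python
from typing import List

Grid2D = List[List[float]]          # indexed [i][j]
Bounds3D = List[List[List[float]]]  # indexed [i][j][vertex]


def cell_bounds(corners: Grid2D) -> Bounds3D:
    """Turn an (ni+1) x (nj+1) corner array into an (ni, nj, 4) bounds array.

    Vertex order is (i,j), (i+1,j), (i+1,j+1), (i,j+1): counterclockwise
    when i increases eastward and j increases northward (an assumption
    about the grid orientation).
    """
    ni = len(corners) - 1
    nj = len(corners[0]) - 1
    return [
        [
            [corners[i][j], corners[i + 1][j],
             corners[i + 1][j + 1], corners[i][j + 1]]
            for j in range(nj)
        ]
        for i in range(ni)
    ]


def cell_centers(bounds: Bounds3D) -> Grid2D:
    """Naive cell centers: the mean of the four vertices.

    Ignores longitude wrap-around at the dateline, so it is only a sketch.
    """
    return [[sum(cell) / 4.0 for cell in row] for row in bounds]


# Hypothetical 2 x 1 grid of cells: corners at lon 10..12, lat 0..1.
corner_lon = [[10.0, 10.0], [11.0, 11.0], [12.0, 12.0]]
corner_lat = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]

lon_bounds = cell_bounds(corner_lon)   # plays the role of Lon_bounds(i,j,4)
lat_bounds = cell_bounds(corner_lat)   # plays the role of Lat_bounds(i,j,4)
longitude = cell_centers(lon_bounds)   # plays the role of Longitude(i,j)
latitude = cell_centers(lat_bounds)    # plays the role of Latitude(i,j)
```

Storing the bounds explicitly, instead of recomputing them from centers, matters on curvilinear ocean grids because cell shapes vary and cannot be inferred from 1-D coordinate axes.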
ESG Components
• User authentication
• Metadata search
• Replica location and transfer
• Data analysis and visualization
Demonstration Workflow:
• Globus Toolkit (ANL, ISI)
– GridFTP data transfer
– GRAM resource access
– Community Authorization Service (CAS)
– Replica Location Service (RLS)
– Metadata Catalog Service (MCS)
• Web interface (NCAR) and workflow manager
• Hierarchical Resource Manager (HRM) (LBNL)
• Storage Resource Manager
• Metadata (NCAR, LLNL, ISI)
• OpenDAP-G (NCAR, ANL)
• Live Access Server (NCAR)
The Globus Toolkit™
• An Open Source Project
• Security
• Directory, Metadata, and Replica Services
• Resource Management
• Data Access and Management
• Distributed Computation
• Open Grid Services Architecture (OGSA)
– Reliable, persistent web services
The Globus Toolkit™
• Globus middleware supports linkage of distributed data archives, supercomputers, workstations, and local disk caches into data/computational grids.
• GridFTP: high-performance, secure, robust data transfer mechanism: protocol, server, client library.
• ESG is integrating OpenDAP (DODS protocol) with the GridFTP protocol.
• Single sign-on using Grid Security Infrastructure
• Proxy certificates
• Community Authorization Service (CAS)
• Replica Location Service: manages copying and placement of files in a distributed environment.
• Logical vs. physical files
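The distinction between logical and physical files can be illustrated with a toy catalog: one logical file name (LFN) maps to any number of physical file names (PFNs), one per replica. This is a hypothetical sketch, not the Globus RLS interface; the class, method names, file names, and gsiftp:// URLs are invented for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class ReplicaCatalog:
    """Toy mapping from logical file names to physical replicas.

    Hypothetical: not the Globus RLS API. It only shows that one LFN can
    have several PFNs, e.g. a copy on mass storage and one on a disk cache.
    """

    def __init__(self) -> None:
        self._replicas: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        """Record that a physical copy of lfn exists at pfn."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn: str, pfn: str) -> None:
        """Forget one replica, dropping the LFN when no copies remain."""
        if lfn in self._replicas:
            self._replicas[lfn].discard(pfn)
            if not self._replicas[lfn]:
                del self._replicas[lfn]

    def lookup(self, lfn: str) -> Set[str]:
        """Return a copy of all known PFNs for lfn (empty if unknown)."""
        return set(self._replicas.get(lfn, ()))


catalog = ReplicaCatalog()
lfn = "climate/run1/tas.nc"  # hypothetical logical name
catalog.register(lfn, "gsiftp://mss.example.org/archive/run1/tas.nc")
catalog.register(lfn, "gsiftp://cache.example.org/scratch/tas.nc")
# The cached copy is evicted; only the archival replica remains.
catalog.unregister(lfn, "gsiftp://cache.example.org/scratch/tas.nc")
```

Applications name data by LFN only; choosing which PFN to read from (nearest site, disk before tape) is left to the replica management layer.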
Distributed Data Access Protocol
[Figure: three ways an application reaches its data]
• Typical application: Application → netCDF lib → Data (local)
• Distributed application: Application → OpenDAP client → OpenDAP via HTTP → OpenDAP server → Data (remote)
• Distributed application: Application → ESG client → ESG (Grid + DODS, i.e. OpenDAP via Grid) → ESG server → Big Data (remote)
Grid + OpenDAP adds: transparency, performance, security, resource management, analysis functions
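Much of the data reduction in this picture comes from asking the server for only a subset. In DAP2, a client does this by appending a constraint expression to the dataset URL, where each dimension is selected with a [start:stride:stop] hyperslab and stop is inclusive. A minimal sketch that builds such a URL and counts how many values it would return; the server URL, dataset path, and helper names are hypothetical.

```python
from typing import Sequence, Tuple

Hyperslab = Tuple[int, int, int]  # (start, stride, stop); stop is inclusive in DAP2


def dap_subset_url(dataset_url: str, variable: str,
                   slabs: Sequence[Hyperslab], response: str = "dods") -> str:
    """Build a DAP2 request for one variable, constrained per dimension."""
    for start, stride, stop in slabs:
        if stride < 1 or start < 0 or stop < start:
            raise ValueError(f"bad hyperslab {(start, stride, stop)}")
    constraint = "".join(f"[{start}:{stride}:{stop}]"
                         for start, stride, stop in slabs)
    # ".dods" asks for binary data; ".dds" and ".das" return structure/attributes.
    return f"{dataset_url}.{response}?{variable}{constraint}"


def values_requested(slabs: Sequence[Hyperslab]) -> int:
    """Number of array elements the constrained request returns."""
    count = 1
    for start, stride, stop in slabs:
        count *= (stop - start) // stride + 1
    return count


# Hypothetical server and file; Temperature(i,j) as on the ocean-grid slide.
slabs = [(0, 1, 9), (0, 2, 19)]
url = dap_subset_url("http://dods.example.org/ocean/temp.nc", "Temperature", slabs)
n = values_requested(slabs)
```

Here the server returns 100 values instead of the whole array, which is the "smart server" idea of reducing data before it crosses the network.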
ESG Metadata Services
[Figure: ESG metadata architecture]
• Layers: ESG clients API & user interfaces; high-level metadata services; core metadata services; metadata holdings
• User-facing functions: publishing; search & discovery; administration; browsing & display; analysis & visualization
• Metadata services: extraction, display, browsing, query, annotation, validation, aggregation, discovery, metadata & data registration, metadata access (update, insert, delete, query), service translation library
• Metadata holdings: data & metadata catalog, Dublin Core database, CF database, mirror Dublin Core XML files, comments XML files
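The metadata query service can be pictured as a filter over catalog records. A minimal sketch, assuming records are flat dictionaries keyed by Dublin Core element names (title, creator, subject, format, identifier) and that a query matches case-insensitive substrings; the records and the `query` helper are hypothetical, not the MCS interface.

```python
from typing import Dict, List

Record = Dict[str, str]  # Dublin Core element name -> value


def query(records: List[Record], **criteria: str) -> List[Record]:
    """Return records whose fields contain every criterion (case-insensitive)."""
    def matches(record: Record) -> bool:
        return all(value.lower() in record.get(element, "").lower()
                   for element, value in criteria.items())
    return [record for record in records if matches(record)]


catalog = [
    {"identifier": "ds-001", "title": "Ocean temperature, control run",
     "creator": "Example Modeling Group", "subject": "ocean; temperature",
     "format": "netCDF"},
    {"identifier": "ds-002", "title": "Surface air temperature, control run",
     "creator": "Example Modeling Group", "subject": "atmosphere; temperature",
     "format": "netCDF"},
]

ocean_hits = [r["identifier"] for r in query(catalog, subject="ocean", format="netcdf")]
temp_hits = [r["identifier"] for r in query(catalog, subject="temperature")]
```

A real catalog would query a database (such as the Dublin Core and CF databases above) rather than scan a list, but the contract is the same: criteria in, matching records out.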
Resource Management
• Hierarchical Resource Manager
– queuing of file transfer requests
– reordering of requests to optimize Parallel FTP
– monitoring of progress and error messages
– rescheduling of failed transfers
– enforcement of local resource policy
• Storage Resource Management
– Manage space
– Manage files on behalf of a user
– Manage file sharing
– Get files from remote locations when necessary
– Manage multi-file requests
– Provide grid access to/from mass storage
– Negotiate transfer protocols
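The queuing, reordering, and rescheduling duties of the Hierarchical Resource Manager can be sketched as a simple loop. This is a hypothetical illustration: it assumes requests are reordered by tape volume and on-tape position (to reduce mounts and seeks when staging from mass storage) and that a failed transfer is retried at the back of the queue up to a fixed limit. The `transfer` callback stands in for a real Parallel FTP or GridFTP call.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass
class TransferRequest:
    path: str
    tape: str        # hypothetical: tape volume holding the file
    position: int    # hypothetical: file's position on that tape
    attempts: int = 0


def run_queue(requests: List[TransferRequest],
              transfer: Callable[[TransferRequest], bool],
              max_attempts: int = 3) -> Tuple[List[str], List[str]]:
    """Process transfers in tape order; re-queue failures up to max_attempts.

    Returns (completed paths, paths that used up their retries).
    """
    # Reorder once: group by tape, then by position on the tape.
    queue: Deque[TransferRequest] = deque(
        sorted(requests, key=lambda r: (r.tape, r.position)))
    done: List[str] = []
    failed: List[str] = []
    while queue:
        req = queue.popleft()
        req.attempts += 1
        if transfer(req):
            done.append(req.path)
        elif req.attempts < max_attempts:
            queue.append(req)  # reschedule the failed transfer
        else:
            failed.append(req.path)  # report instead of retrying forever
    return done, failed


# Simulated transfers: "b.nc" fails once, "c.nc" always fails.
remaining_failures = {"b.nc": 1, "c.nc": 99}


def flaky_transfer(req: TransferRequest) -> bool:
    if remaining_failures.get(req.path, 0) > 0:
        remaining_failures[req.path] -= 1
        return False
    return True


done, failed = run_queue(
    [TransferRequest("c.nc", "T2", 1),
     TransferRequest("a.nc", "T1", 5),
     TransferRequest("b.nc", "T1", 2)],
    flaky_transfer)
```

Putting a failed request at the back of the queue lets other transfers proceed while a transient problem clears, instead of blocking the whole batch on one file.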
Live Access Server
• General-purpose Web server for geoscience data sets
• Directs communications between a user and an application running under a Web server
• Converts requests into a series of commands that actually perform the data access
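The request-to-commands step can be pictured as a small translator. A minimal sketch, assuming a request arrives as key/value pairs (as from a Web form) and that the backend application accepts one command per step; the field names and command verbs are hypothetical and do not reflect LAS's actual configuration or backend syntax.

```python
from typing import Dict, List


def request_to_commands(request: Dict[str, str]) -> List[str]:
    """Translate one user request into an ordered list of backend commands.

    The verbs used here ("open", "select", "subset", "render") are
    hypothetical stand-ins for whatever the configured backend expects.
    """
    missing = [key for key in ("dataset", "variable") if key not in request]
    if missing:
        raise ValueError(f"request is missing {missing}")
    commands = [f"open {request['dataset']}", f"select {request['variable']}"]
    if "region" in request:  # optional spatial subset
        commands.append(f"subset {request['region']}")
    commands.append(f"render {request.get('output', 'plot')}")
    return commands


commands = request_to_commands({
    "dataset": "ocean_model_run",
    "variable": "Temperature",
    "region": "lat=-10:10 lon=120:280",
})
```

Keeping the translation in the server means the user only fills in a form, while the data access itself runs next to the data.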
ESG Data Portal
Goal: Make large ESG data sets easily accessible to scientists for production use
[Figure: ESG Data Portal architecture]
• Portal and services: Tomcat servlet engine; MCS (Metadata Cataloguing Services); RLS (Replica Location Services); MyProxy server; CAS (Community Authorization Services); GRAM gatekeeper; LAS (Live Access Server)
• Clients: MCS, RLS, MyProxy, and CAS clients; striped gridFTP client
• Protocols: SOAP, RMI, gridFTP
• Storage and transfer: SRM (Storage Resource Management) in front of disk, MSS (Mass Storage System), and HPSS (High Performance Storage System); gridFTP servers; openDAPg servers; CAS-enabled striped gridFTP servers
• Sites: LBNL, LLNL, ISI, NCAR, ORNL, ANL
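The CAS-enabled striped gridFTP servers in the portal architecture move one file over several data movers at once. A minimal sketch of how a file might be divided among stripes, assuming a simple round-robin layout of fixed-size blocks; GridFTP lets a striped transfer choose its layout, and this shows only the partitioning idea, not the wire protocol. Sizes are in bytes and deliberately tiny; `stripe_layout` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

ByteRange = Tuple[int, int]  # (offset, length) in bytes


def stripe_layout(file_size: int, block_size: int,
                  stripes: int) -> Dict[int, List[ByteRange]]:
    """Assign fixed-size blocks of a file to stripes in round-robin order."""
    if file_size < 0 or block_size <= 0 or stripes <= 0:
        raise ValueError("need file_size >= 0 and positive block_size, stripes")
    layout: Dict[int, List[ByteRange]] = {s: [] for s in range(stripes)}
    for block, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)  # last block may be short
        layout[block % stripes].append((offset, length))
    return layout


layout = stripe_layout(file_size=10, block_size=3, stripes=2)
```

Each stripe can then send its byte ranges in parallel, and the receiver writes every block at its offset, so arrival order does not matter.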
ESG: Strategies & Goals
• Move data as little as possible; keep it close to its computational point of origin when possible
– Data access protocols, distributed analysis
• When we must move data, do it fast and with minimal human intervention
– Storage Resource Management, fast networks
• Keep track of what we have, particularly what’s on deep storage
– Metadata and Replica Catalogs
• Harness a federation of sites
– Globus Toolkit -> The Earth System Grid -> The UltraDataGrid
ESG Development in 2003
• Metadata Conventions and Services
– Application groups deciding on one (or more) metadata schemas
– Better MCS support for XML schema
– Distribution and federation of heterogeneous metadata catalogs
• Integration of DODS server and GridFTP data transport protocol
• Customization of Replica Location Service for ESG
• Storage Resource Manager (from LBNL) to optimize storage transfers
• Community authorization service to provide fine-grained access control
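In the CAS model, a resource provider grants rights to a community as a whole, and the community's CAS server decides which of those rights each member may exercise. The fine-grained check on the community side might look like the sketch below, assuming rights are expressed as (group, action, path prefix) grants; the policy format, group names, and paths are hypothetical and are not CAS's actual policy language or assertion format.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Grant:
    group: str            # community group holding the right
    action: str           # hypothetical action names, e.g. "read", "write"
    resource_prefix: str  # applies to this path and everything below it


def _under(resource: str, prefix: str) -> bool:
    """True if resource equals prefix or lies beneath it as a path."""
    prefix = prefix.rstrip("/")
    return resource == prefix or resource.startswith(prefix + "/")


def is_authorized(user: str, action: str, resource: str,
                  memberships: Dict[str, Set[str]],
                  grants: List[Grant]) -> bool:
    """Community-side check: does any of the user's groups hold this right?"""
    groups = memberships.get(user, set())
    return any(g.group in groups and g.action == action
               and _under(resource, g.resource_prefix)
               for g in grants)


memberships = {"alice": {"ocean-modelers"}, "bob": {"atmos-modelers"}}
grants = [Grant("ocean-modelers", "read", "/esg/ocean")]
```

Matching on path boundaries rather than raw string prefixes keeps a grant on /esg/ocean from leaking to a sibling such as /esg/oceanography.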