The Grid: a brief briefing
Carole Goble
Information Management Group
Roadmap
• What is the Grid?
• Example projects
• Relationship to the Semantic Web
• Example architectures
• The international programme
Take Home
• The Grid is an international activity
• The Grid has attracted high-profile industrial and government support and funding
• The Information/Knowledge (IK) Grid is in many ways indistinguishable from the Semantic Web
• The Grid Community’s understanding of generic and theoretical issues for the IK Grid is immature and relies on hackery.
So what’s the Grid?
Isn’t it just High Performance Computing for High Energy Physicists?
Why Grids?
• Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed.
• The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support large-scale science and engineering.
From Bill Johnston 27 July 01
CERN: Large Hadron Collider (LHC)
Raw data: 1 Petabyte / sec
Filtered: 100 Mbyte / sec = 1 Petabyte / year = 1 Million CD ROMs
CMS Detector
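The LHC figures above can be checked with a few lines of arithmetic. This is only a rough sketch: the slide does not say how many seconds per year the detector records data, so the value of about 10^7 seconds used here is an assumption, as is the 650 MB CD-ROM capacity. With continuous running for a full calendar year (about 3.15 x 10^7 seconds) the same filtered rate would give roughly 3 Petabytes instead.

```python
# Back-of-the-envelope check of the LHC data volumes quoted on the slide.
MB = 10**6   # bytes in a megabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)

raw_rate_bytes_per_s = 1 * PB          # from the slide: 1 Petabyte / sec
filtered_rate_bytes_per_s = 100 * MB   # from the slide: 100 Mbyte / sec

# Assumption (not on the slide): about 1e7 seconds of data taking per year.
live_seconds_per_year = 1e7
# Assumption (not on the slide): one CD-ROM holds about 650 MB.
cd_capacity_bytes = 650 * MB

bytes_per_year = filtered_rate_bytes_per_s * live_seconds_per_year
petabytes_per_year = bytes_per_year / PB
cds_per_year = bytes_per_year / cd_capacity_bytes
reduction_factor = raw_rate_bytes_per_s / filtered_rate_bytes_per_s

print(f"Filtered volume: {petabytes_per_year:.1f} PB / year")
print(f"Equivalent CD-ROMs: {cds_per_year:,.0f}")
print(f"Filtering reduces the raw rate by a factor of {reduction_factor:.0e}")
```

Under these assumptions the filtered stream comes to 1 Petabyte per year and on the order of a million CD-ROMs, which is consistent with the rounded figures on the slide.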
Why Grids?
• A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
• A biologist combines a range of diverse and distributed resources (databases, tools, instruments) to answer complex questions
• 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
• Civil engineers collaborate to design, execute, & analyze shake table experiments
From Steve Tuecke 12 Oct. 01
Why Grids? (contd.)
• Climate scientists visualize, annotate, & analyze terabyte simulation datasets
• An emergency response team couples real time data, weather model, population data
• A multidisciplinary analysis in aerospace couples code and data in four companies
• A home user invokes architectural design functions at an application service provider
From Steve Tuecke 12 Oct. 01
Why Grids? (contd.)
• An application service provider purchases cycles from compute cycle providers
• Scientists working for a multinational soap company design a new product
• A community group pools members’ PCs to analyze alternative designs for a local road
From Steve Tuecke 12 Oct. 01
The Grid Vision
“…flexible, secure, coordinated resource-sharing among dynamic collections of individuals, institutions, and resources – what we refer to as virtual organisations”
“The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, Foster, Kesselman and Tuecke, 2001
The Grid Problem
Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of…
• central location,
• central control,
• omniscience,
• existing trust relationships.
From Steve Tuecke 12 Oct. 01
Large scale multi-disciplinary simulation
Decision support and optimization
Virtual prototyping
Collaborative analysis and visualization
Large scale distributed data management
Large scale distributed computation
High speed communications
Dynamic collaborative virtual organisations
Visualisation
Data
Computation
stretch
interrogation
workflows
results
Collaboration Grid
Technology Grid
What is it? Where is it?
How to get it? When did it happen?
Who knows it? Why does it?
What are you doing?
Governance & Control
Online Access to Scientific Instruments
DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
tomographic reconstruction
real-time collection
wide-area dissemination
Advanced Photon Source
archival storage
From Steve Tuecke 12 Oct. 01
desktop & VR clients with shared controls
Supernova Cosmology
Network for Earthquake Engineering Simulation
• NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
• On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
From Steve Tuecke 12 Oct. 01
Home Computers Evaluate AIDS Drugs
• Community =
  - 1000s of home computer users
  - Philanthropic computing vendor (Entropia)
  - Research group (Scripps)
• Common goal = advance AIDS research
From Steve Tuecke 12 Oct. 01
myGrid
Personalised extensible environments for data-intensive in silico experiments in biology
Straightforward discovery, interoperation, sharing
Workflow oriented provenance propagating change
Individual creativity & collaborative working personalisation
myGrid resources
Question: Nucleotide binding protein in mouse
Answer:
• P12345 in Swiss-Prot is an ATPase
• Terri Attwood is an expert on this
• Jackson Labs have a database but you need to register
• A paper has just been published in Proteins by the Stanford lab on this.
GeoDISE – engineering design optimisation
• Access to knowledge repository
• Access to optimisation and search tools
• Industrial analysis codes
• Distributed computing and data resources in design optimisation
• Applied to industrial problems - large scale CFD codes
• Demonstrate scalability across distributed computational and data resources and teams of designers
GeoDISE
Modern engineering firms are global and distributed
“Not just a problem of using HPC”
CAD and analysis tools, user interfaces, PSEs, and visualization
Optimisation methods
Data archives (e.g. design/system usage)
Knowledge repositories & knowledge capture and reuse tools
Management of distributed compute and data resources
How to … ?
… improve design environments
… cope with legacy code / systems
… integrate large-scale systems in a flexible way
… produce optimized designs
… archive and re-use design history
… capture and re-use knowledge
Virtual Sky http://virtualsky.org/
Broader Context
• “Grid Computing” has much in common with major industrial thrusts
  - Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing…
• Sharing issues not adequately addressed by existing technologies
  - Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”
  - High performance: unique demands of advanced & high-performance systems
From Steve Tuecke 12 Oct. 01
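The “complicated requirements” example on this slide can be made concrete with a small sketch. Nothing here is a Globus API: the `Request` type, the two policy functions and the site, program and user names are all hypothetical, chosen only to show a request being allowed when both the community policy P and the data policy Q agree.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Request:
    """A hypothetical request: run `program` at `site`, reading data held at `data_location`."""
    user: str
    program: str
    site: str
    data_location: str


Policy = Callable[[Request], bool]


def community_policy_p(req: Request) -> bool:
    """Hypothetical community policy P: only approved (site, program) pairs may run."""
    approved = {("site-y", "program-x")}
    return (req.site, req.program) in approved


def data_policy_q(req: Request) -> bool:
    """Hypothetical data policy Q: only listed users may read data held at a location."""
    readers = {"site-z": {"alice"}}
    return req.user in readers.get(req.data_location, set())


def authorize(req: Request, policies: List[Policy]) -> bool:
    """Sharing is conditional: every applicable policy must allow the request."""
    return all(policy(req) for policy in policies)


policies = [community_policy_p, data_policy_q]
alice_ok = authorize(Request("alice", "program-x", "site-y", "site-z"), policies)
bob_ok = authorize(Request("bob", "program-x", "site-y", "site-z"), policies)
print(alice_ok, bob_ok)
```

The point of the sketch is that the policies belong to different owners (the community for P, the data holder for Q), so neither the site nor the user can decide alone whether the job may run.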
Elements of the Problem
• Resource sharing
  - Computers, storage, sensors, networks, …
  - Sharing always conditional: issues of trust, policy, negotiation, payment, …
• Coordinated problem solving
  - Beyond client-server: distributed data analysis, computation, collaboration, …
• Dynamic, multi-institutional virtual organisations
  - Community overlays on classic org structures
  - Large or small, static or dynamic
• Problem Solving Environments
From Steve Tuecke 12 Oct. 01
The Globus Project™
• Close collaboration with real Grid projects in science and industry
• Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
• Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
• The Globus Toolkit™: Open source, reference software base for building grid infrastructure and applications
• Global Grid Forum: Development of standard protocols and APIs for Grid computing
From Steve Tuecke 12 Oct. 01
Doesn’t Globus solve it all?
• Globus Toolkit is focused on the Data/Computational layer
• No database connectivity
• Little brokering, and static not dynamic
• Weak metadata management, workflow
• Trashes firewalls
• No, not everything is JCL, FTP and LDAP
• Distributed computation dominates
• etc…etc…
Is it done?
• NASA Information Power Grid (IPG) is the only one really working http://www.ipg.nasa.gov
  - Linking similar supercomputers owned by the same organisation
  - Computation-focused
• High Energy Physics is atypical
Example Application Projects
• AstroGrid: astronomy, etc. (UK)
• Earth Systems Grid: environment (US DOE)
• EU DataGrid: physics, environment, etc. (EU)
• EuroGrid: various (EU)
• Fusion Collaboratory (US DOE)
• GridLab: astrophysics, etc. (EU)
• Grid Physics Network (US NSF)
• MetaNEOS: numerical optimization (US NSF)
• NEESgrid: civil engineering (US NSF)
• RealityGrid (UK)
• DAME (UK)
• Comb-e-Chem (UK)
• GeoDISE (UK)
• iVDGL, StarLight (US/EU)
• DiscoveryNet (UK)
• myGrid (UK)
• GridPP (UK)
• Particle Physics Data Grid (US DOE)
• etc…
“… Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer ‘communities’. …”
Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems”, Ph.D. thesis, July 1983.
Every Community needs a Matchmaker!
Condor uses Matchmakers to build Computing Communities out of Commodity Components
… someone has to bring together community members who have requests for goods and services with members who offer them.
• Both sides are looking for each other
• Both sides have constraints
• Both sides have preferences
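A minimal sketch of two-sided matchmaking may help make the three points above concrete. This is not Condor's implementation or its ClassAd language: the `Party` type, the attribute names and the greedy matching loop are assumptions made purely for illustration. Each side carries a constraint on the other side (a hard requirement) and a preference (a ranking among acceptable partners).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Attrs = Dict[str, float]


@dataclass
class Party:
    """One side of a match: a job requesting resources or a machine offering them."""
    name: str
    attrs: Attrs
    constraint: Callable[[Attrs], bool]    # must hold for the other side's attributes
    preference: Callable[[Attrs], float]   # higher value = more preferred partner


def matchmake(requests: List[Party], offers: List[Party]) -> Dict[str, Optional[str]]:
    """Greedy matchmaker: each request takes its best still-free, mutually acceptable offer."""
    matches: Dict[str, Optional[str]] = {}
    free = list(offers)
    for req in requests:
        # Both sides have constraints: keep only offers acceptable in both directions.
        candidates = [o for o in free
                      if req.constraint(o.attrs) and o.constraint(req.attrs)]
        if not candidates:
            matches[req.name] = None
            continue
        # Both sides have preferences: rank by the request's, then by the offer's.
        best = max(candidates,
                   key=lambda o: (req.preference(o.attrs), o.preference(req.attrs)))
        matches[req.name] = best.name
        free.remove(best)
    return matches


# Hypothetical machines (offers) and jobs (requests).
machines = [
    Party("slow-pc", {"memory_mb": 256, "mips": 100},
          constraint=lambda job: job["owner_trusted"] == 1,
          preference=lambda job: -job["memory_mb"]),
    Party("fast-pc", {"memory_mb": 512, "mips": 400},
          constraint=lambda job: True,
          preference=lambda job: 0.0),
]
jobs = [
    Party("sim", {"memory_mb": 300, "owner_trusted": 1},
          constraint=lambda m: m["memory_mb"] >= 300,
          preference=lambda m: m["mips"]),
    Party("render", {"memory_mb": 128, "owner_trusted": 0},
          constraint=lambda m: m["memory_mb"] >= 128,
          preference=lambda m: m["mips"]),
]

result = matchmake(jobs, machines)
print(result)
```

In this example "sim" is matched to "fast-pc" (the only machine with enough memory), while "render" stays unmatched because the remaining machine only accepts jobs from trusted owners. A production matchmaker would also have to handle fairness, re-matching and changing offers, which this greedy loop ignores.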
Let’s look at some Architectures
Desiderata (adapted from Globus)
• Software development toolkits, e.g. Globus toolkit
• Standard protocols, services & APIs
• A modular “bag of technologies”
• Enable incremental development of grid-enabled tools and applications
• Reference implementations
• Learn through deployment and applications
• Open source
Applications
Diverse global services
Core services
Local OS
Globus Layered Grid Architecture
CERN - High Energy Physics
Application
Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
Resource: “Sharing single resources”: negotiating access, controlling use
Connectivity: “Talking to things”: communication (Internet protocols) & security
Fabric: “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture: Application, Transport, Internet, Link
From Steve Tuecke 12 Oct. 01
Keith Jeffery
Scientific Problems
Processes
Knowledge
Information
Jobs and Data
Data
Raw Resources
Knowledge / capability
Semantics / process
Data / applications
Value chain
Interoperability, higher level ontologies, reasoning, discovery, Reasoning services, Discovery services
Fulfillment Grid
"Reproduced by permission of the IT Innovation Centre, University of Southampton." http://www.it-innovation.soton.ac.uk
Three Layer Grid Abstraction
Grid Information Service
Uniform Resource Access
Brokering
Global Queuing
Global Event Services
Co-Scheduling
Data Cataloguing
Uniform Data Access
Collaboration and Remote Instrument Services
Network Cache
Communication Services
Authentication
Authorization
Security Services
Auditing
Fault Management
Monitoring
Grid Common Services: Standardized Services and Resources Interfaces
Applications: Simulations, Data Analysis, etc.
Toolkits: Visualization, Data Publication/Subscription, etc.
Discipline Specific Portals and Scientific Workflow Management Systems
Distributed Resources
Condor pools
network caches
tertiary storage
national user facilities
clusters
national supercomputer facilities
High-speed Networks and Communications Services
= Globus services
Architecture of a Grid
Architecture of a Grid – upper layers
Problem Solving Environments
• Knowledge based query
• Tools to implement the human interfaces, e.g. SciRun, ECCE, WebFlow, .....
• Mechanisms to express, organize, and manage the workflow of problem solutions (“frameworks”)
• Access control
application codes
visualization toolkits
collaboration toolkits
instrument management toolkits
data publish and subscribe toolkits
Applications and Supporting Tools
Grid enabled libraries (security, communication services, data access, global event management, etc.)
Globus MPI
CORBA
Condor-G
Java/Jini
DCOM
Application Development and Execution Support
Distributed Resources
Grid Common Services
“Knowledge Based” Data Grids
Attributes
Semantics
Knowledge
Information
Data
Ingest Services
Management
Access Services
(Model-based Access)
(Data Handling System - SRB)
MCAT/HDF
Grids
XML DTD
SDLIP
XTM DTD
Rules - KQL
Information Repository
Attribute-based Query
Feature-based Query
Knowledge or Topic-Based Query / Browse
Knowledge Repository for Rules
Relationships Between Concepts
Fields, Containers, Folders
Storage (Replicas, Persistent IDs)
National Partnership for Advanced Computational Infrastructure
Compute Resources
Catalogs
Data Archives
Information Discovery
Metadata delivery
Data Discovery
Data Delivery
Catalog Mediator
Data mediator
1. Portals and Workbenches
Bulk Data Analysis
Catalog Analysis
Metadata View
Data View
4. Grid
Security, Caching, Replication, Backup, Scheduling
2. Knowledge & Resource Management
Standard Metadata format, Data model, Wire format
Catalog/Image Specific Access
Standard APIs and Protocols
Concept space
3.
5.
6.
7. Derived Collections
Astronomy Sky Survey Data Grid
Referenced items & collections
NSDL Services
Other NSDL Services
CI Services: visualization..., discussion, personalization, topic-map registry, query transform
Core Services: annotation, metadata normalizing
Core Collection-Building Services: persistent storage, metadata harvesting
Portals & Clients
Usage Enhancement
Collection Building
User Interfaces
NSDL Collections
Metadata & data access-based services
Core NSDL Bus
Meta-data delivery
Data delivery
Query
Global Ids
Security
Network
Virtual Collections & Mediators
Information about collections
Delivery
Presentation
Aggregation - Channels
NSDL
ERA Concept model
Mediation of Information using XML
Storage Resource Broker/Extensible Meta-data CATalog
ERA: Archival Components Concept
Metadata
Archival Repository
Order Fulfillment System
Reference Workbench
Query
Rebuild
Present
Tapes
Accessioning Workbench
Accession
Verify
Wrap & Containerize
Describe
Collection
Disks
Internet
Archival Research Catalog
Records
Schedules
Grid Security Infrastructure
The De Roure Triangle
Agents
Web Services
Semantic Web
Grid Computing
e-Business
e-Science
?
California Institute of Technology
Roy Williams, Paul Messina
So what is going on?
UK: http://www.escience-grid.org.uk/
International: http://www.gridforum.org/
£80m Collaborative projects
E-Science Steering Committee
DG Research Councils
Director
Director’s Management Role
Director’s Awareness and Co-ordination Role
Generic Challenges: EPSRC (£15m), DTI (£15m)
Industrial Collaboration (£40m)
Academic Application Support Programme
Research Councils (£74m), DTI (£5m)
PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
Grid TAG
From Tony Hey 27 July 01
E-Science Programme
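The funding figures on this slide can be cross-checked with a short calculation. The sketch assumes that the per-council amounts are the breakdown of the £74m Research Councils line, and that the £80m headline refers to the application programme (Research Councils plus DTI); the slide layout suggests both readings but does not state them.

```python
# Cross-check of the funding figures on the E-Science Programme slide (in £ millions).
council_funding_m = {
    "PPARC": 26,
    "BBSRC": 8,
    "MRC": 8,
    "NERC": 7,
    "ESRC": 3,
    "EPSRC": 17,
    "CLRC": 5,
}

research_councils_total_m = sum(council_funding_m.values())

# Assumption: the DTI's £5m sits alongside the Research Councils' £74m in the
# Academic Application Support Programme.
dti_applications_m = 5
applications_total_m = research_councils_total_m + dti_applications_m

# Generic Challenges: EPSRC (£15m) plus DTI (£15m).
generic_challenges_m = 15 + 15

print(f"Research Councils total: £{research_councils_total_m}m")
print(f"Application programme total: £{applications_total_m}m")
print(f"Generic Challenges total: £{generic_challenges_m}m")
```

The per-council amounts do add up to the quoted £74m, and adding the DTI's £5m gives £79m, close to the rounded £80m headline.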
Key Elements of UK Grid Development Plan
• Development of Generic Grid Middleware
• Network of Grid Core Programme e-Science Centres
  - National Centre http://www.nesc.ac.uk/
  - Regional Centres http://www.esnw.ac.uk/
• Grid IRC Grand Challenge Project
• Support for e-Science Pilots
• Short term funding for e-Science demonstrators
• Grid Network Team
• Grid Engineering Team
• Grid Support Centre
• Task Forces
Adapted from Tony Hey 27 July 01
Take Home
• The Grid is an international activity
• The Grid has attracted high-profile industrial and government support and funding
• The Information/Knowledge (IK) Grid is in many ways indistinguishable from the Semantic Web
• The Grid Community’s understanding of generic and theoretical issues for the IK Grid is immature and relies on hackery.
Spares
Grid viewpoints
interrogation
workflows
results
Access Grid
New Biology
Technology Grid
private
public
What is it? Where is it?
How to get it? When did it happen?
Who knows it? Why does it?
What are you doing?
Governance & Control
Particle Physics and Astronomy Research Council (PPARC)
• GridPP (http://www.gridpp.ac.uk/)
  - to develop the Grid technologies required to meet the LHC computing challenge
• ASTROGRID (http://www.astrogrid.ac.uk/)
  - a ~£4M project aimed at building a data-grid for UK astronomy, which will form the UK contribution to a global Virtual Observatory
Infrastructure Deployments
• Institutional Grid deployments: deploying services and network infrastructure
  - DISCOM, IPG, TeraGrid, DOE Science Grid, DOD Grid, NEESgrid, ASCI (Netherlands)
• International deployments: supporting international experiments and science
  - iVDGL, StarLight
• Support centers
  - U.K. Grid Center
  - U.S. GRIDS Center