Upload
rosamond-mathews
View
221
Download
5
Tags:
Embed Size (px)
Citation preview
Brain Meets Brawn: Why Grid and Agents Need
Each Other
Ian Foster
Argonne National Laboratory
University of Chicago
Globus Alliance
In collaboration with Nick Jennings & Carl Kesselman
http://www.aamas2004.org/proceedings/003-foster_i_grid.pdf
What is the Grid?
“The Grid is an international project that looks in detail at a terrorist cell operating on a global level and a team of American and British counter-terrorists who are tasked to stop it”
Gareth Neame, BBC's head of drama
Well, Not Exactly!
“The Grid is an international project that looks in detail at scientific collaborations operating on a global level and a team of computer scientists who are tasked to enable it”
At least, that’s where it started …
[email protected] ARGONNE CHICAGO
Open Distributed Systems
Interoperable infrastructure & tools for secure and
reliable resource sharing
GridAgents
Methodologies, algorithms for autonomous problem solvers in open systems
(brain) (brawn)
The two needeach other!
[email protected] ARGONNE CHICAGO
Overview
Where’s the muscle? What Grid is about
The need for brains The importance of automation
Bringing the two together Working with the Open Science Grid Research problems
[email protected] ARGONNE CHICAGO
Overview
Where’s the muscle? What Grid is about
The need for brains The importance of automation
Bringing the two together Working with the Open Science Grid Research problems
[email protected] ARGONNE CHICAGO
Why the Grid?Origins: Revolution in Science
Pre-Internet Theorize &/or experiment, alone
or in small teams; publish paper Post-Internet
Construct and mine large databases of observational or simulation data
Develop simulations & analyses Access specialized devices remotely Exchange information within
distributed multidisciplinary teams
[email protected] ARGONNE CHICAGO
Origins in Science
Engage via telepresence in an experiment at a remote facility
Integrate data from multiple sources in support of global change research
Harness computers across sites to process data from a physics experiment
Discover & access a genome analysis service (running on high-end computer)
[email protected] ARGONNE CHICAGO
Why the Grid?New Driver: Revolution in Business
Pre-Internet Central data processing facility
Post-Internet Enterprise computing is highly distributed,
heterogeneous, inter-enterprise (B2B) Business processes increasingly
computing- & data-rich Outsourcing becomes feasible
service providers of various sorts Growing complexity & need for
more efficient management
[email protected] ARGONNE CHICAGO
Grid Hype—and Products
[email protected] ARGONNE CHICAGO
Common Requirements Dynamically link resources/services
From collaborators, customers, eUtilities, … (members of evolving “virtual organization”)
Into a “virtual computing system” Dynamic, multi-faceted system spanning
institutions and industries Configured to meet instantaneous needs, for:
Multi-faceted QoX for demanding workloads Security, performance, reliability, …
[email protected] ARGONNE CHICAGO
The Grid “Resource sharing & coordinated
problem solving in dynamic, multi-institutional virtual organizations”
1. Enable integration of distributed resources
2. Using general-purpose protocols & infrastructure
3. To achieve better-than-best-effort service
[email protected] ARGONNE CHICAGO
“Protocols & Infrastructure”:Open Standards & Software
Standardized & interoperable mechanisms for secure & reliable: Authentication, authorization, policy, … Representation & management of state Initiation & management of computation Data access & movement Communication & notification
Good quality open source implementations to accelerate adoption & development E.g., Globus Toolkit
[email protected] ARGONNE CHICAGO
Incr
ease
d fu
nctio
nalit
y,st
anda
rdiz
atio
n
Customsolutions
1990 1995 2000 2005
Open GridServices Arch
Real standardsMultiple implementations
Web services, etc.
Managed sharedvirtual systems
Computer science research
Globus Toolkit
Defacto standardSingle implementation
Internetstandards
Grid Infrastructure:Open Software and Standards
2010
[email protected] ARGONNE CHICAGO
WS Core Enables Frameworks:E.g., Resource Management
Web services(WSDL, SOAP, WS-Security, WS-ReliableMessaging, …)
WS-Resource Framework & WS-Notification(Resource identity, lifetime, inspection, subscription, …)
WS-Agreement(Agreement negotiation)
WS Distributed Management(Lifecycle, monitoring, …)
Applications of the framework(Compute, network, storage provisioning,
job reservation & submission, data management,application service QoS, …)
[email protected] ARGONNE CHICAGO
Web Servicesand Stateful Resources
“State” appears in almost all applications Data in a purchase order Current usage agreement for resources Metrics associated with work load on a server
Web services can model, access and manage state in many different ways Ad-hoc, per-application approaches OGSI/WSRF proposes a standard approach
[email protected] ARGONNE CHICAGO
Building systems by composition of heterogeneous components demands that we standardize common patterns Approach to resource identification Lifetime management interfaces Inspection & monitoring interfaces Base fault representation Service and resource groups Notification And many more…
Standards encourage tooling & code re-use Build services more quickly & reliably
Why Standardize An Approach?
[email protected] ARGONNE CHICAGO
WSRF & WS-Notification Naming and bindings (basis for virtualization)
Every resource can be uniquely referenced, and has one or more associated services for interacting with it
Lifecycle (basis for fault resilient state management) Resources created by services following factory pattern Resources destroyed immediately or scheduled
Information model (basis for monitoring & discovery) Resource properties associated with resources Operations for querying and setting this info Asynchronous notification of changes to properties
Service Groups (basis for registries & collective svcs) Group membership rules & membership management
Base Fault type
[email protected] ARGONNE CHICAGO
Grid in Practice:Earthquake Engineering Example
Secure, reliable, on-demand access to data,software, people, and other resources(ideally all via a Web Browser!)
[email protected] ARGONNE CHICAGO
How it Really Happens(with the Globus Toolkit)
WebBrowser
ComputeServer
GlobusMCS/RLS
DataViewer
Tool
CertificateAuthority
CHEF ChatTeamlet
MyProxy
CHEF
ComputeServer
Resources implement standard access & management interfaces
Collective services aggregate &/or
virtualize resources
Users work with client applications
Application services organize VOs & enable
access to other services
Databaseservice
Databaseservice
Databaseservice
SimulationTool
Camera
Camera
TelepresenceMonitor
Globus IndexService
GlobusGRAM
GlobusGRAM
GlobusDAI
GlobusDAI
GlobusDAI
Application Developer
2
Off the Shelf
9
Globus Toolkit
4
Grid Community
4
[email protected] ARGONNE CHICAGO
0
10
20
30
40
50
60
70
8:0
0
8:3
0
9:0
0
9:3
0
10
:00
10
:30
11
:00
11
:30
12
:00
12
:30
13
:00
13
:30
14
:00
14
:30
15
:00
15
:30
16
:00
16
:30
17
:00
17
:30
18
:00
18
:30
Nu
mb
er
of
Pa
rtic
ipa
nts
UIUC
Colorado
NEESgridMultisite OnlineSimulation Test
(July 2003)
Illin
ois
Colo
rado
Illinois (simulation)
[email protected] ARGONNE CHICAGO
Overview
Where’s the muscle? What Grid is about
The need for brains The importance of automation
Bringing the two together Working with the Open Science Grid Research problems
Grid2003: An Operational Grid 28 sites (2100-2800 CPUs) & growing 400-1300 concurrent jobs 7 substantial applications + CS experiments Running since October 2003
Korea
http://www.ivdgl.org/grid2003
[email protected] ARGONNE CHICAGO
Open Science Grid Components Computers & storage at 28 sites (to date)
2800+ CPUs Uniform service environment at each site
Globus Toolkit provides basic authentication, execution management, data movement
Pacman installation system enables installation of numerous other VDT and application services
Global & virtual organization services Certification & registration authorities, VO membership
services, monitoring services Client-side tools for data access & analysis
Virtual data, execution planning, DAG management, execution management, monitoring
IGOC: iVDGL Grid Operations Center
(Very Small)Example
Workflows
Genome sequence analysis
Physicsdata
analysis
Sloan digital sky
survey
Secure, Robust Grid Infrastructure (Mostly …)Intelligent Agents Also Play a Vital Role! E.g.:
Membership
Adaptivescheduling
SLAnegotiation
TroubleshootingSchema
mediation
Intrusiondetection
[email protected] ARGONNE CHICAGO
The Need for Automation:Critical if Grid is to Scale
Who contributes & gets to consume what? Policy negotiation, enforcement, auditing
How do I schedule jobs & data movement? Adaptive scheduling
Who can be trusted to do what? Community membership, reputation, trust
negotiation, intrusion detection Why do things fail?
Failure detection, problem determination, fault isolation, system adaptation
[email protected] ARGONNE CHICAGO
Adaptive Scheduling Adaptive placement
of data & computation Experiments on Grid3
0
10
20
30
40
50
60
70
80
22-Jan 23-Jan 24-Jan 25-Jan 26-Jan 27-Jan 28-Jan
Aver
age
num
ber o
f Idl
e C
PUs
FNAL_CMS
IU_ATLAS_Tier2 0
500
1000
1500
2000
2500
BN
L_
AT
LA
S
Ca
lTe
ch
-Gri
d3
UFlo
rid
a-P
G
IU_
AT
LA
S_
Tie
r2
UB
uff
alo
-CC
R
UC
Sa
nD
ieg
oP
G
UFlo
rid
a-G
rid
3
UM
_A
TLA
S
Va
nd
erb
ilt
AN
L_
HE
P
Ca
lTe
ch
-PG
Tota
l I/O
tra
ffic
(M
Byte
s)
0
200
400
600
800
Least Loaded At Data Cost Function
Tot
al R
espo
nse
Tim
e (s
econ
ds)
(K. Ranganathan)
0
50
100
150
200
250
300
350
400
0 50 100 150 200IP delay (ms)
Ove
rlay
del
ay (
ms)
.
AdaptiveUnstructured Multicast
“UMM: A dynamically adaptive, unstructured multicast overlay” M. Ripeanu et al.
A
E
B
D
C
A’
E’
B’
D’
C’
A”
E”
B”
D”
C”
Applicationoverlay
Baseoverlay
Physicaltopology
0
2
4
6
8
10
0
240
480
720
960
1200
1440
1680
1920
2160
2400
2640
2880
3120
3360
3600
3840
Time (sec)
RD
P
0
2
4
6
8
10
12
Max
xim
um li
nk s
tres
s .
MaxRDP
95% RDP
90%RDP
Stress
10 nodes fail then
rejoin 900s later
RDP=1
RDP=2
[email protected] ARGONNE CHICAGO
WS Core Enables Frameworks:E.g., Resource Management
Web services(WSDL, SOAP, WS-Security, WS-ReliableMessaging, …)
WS-Resource Framework & WS-Notification(Resource identity, lifetime, inspection, subscription, …)
WS-Agreement(Agreement negotiation)
WS Distributed Management(Lifecycle, monitoring, …)
Applications of the framework(Compute, network, storage provisioning,
job reservation & submission, data management,application service QoS, …)
[email protected] ARGONNE CHICAGO
Agreement Negotiation All interesting interactions will be based on
agreements between requestors & services Encode a negotiated quality of experience
Challenge is to establish the framework within which agreements can be: Established in an oversubscribed environment Transformed, composed, decomposed Managed like any other resource Evolved as a result of faults
WS-Agreement is the next step on this path FIPA Request Interaction Protocol?
[email protected] ARGONNE CHICAGO
WS-Agreement Model
Consumer Provider
create()
foo()Application Instance
Factory
Manager
create()Factory Agreement
Operations:Terminate()GetResourceProperty()...
Resource Properties:
Terms RelatedStatusAgrmts
inspect()
Agreement Layer
Service Layer
Consumer Provider
create()
foo()Application Instance
Factory
Manager
create()Factory Agreement
Operations:Terminate()GetResourceProperty()...
Resource Properties:
Terms RelatedStatusAgrmts
inspect()
Agreement Layer
Service Layer
[email protected] ARGONNE CHICAGO
WS-Agreement Applicability
WS-Agreement to be used/specialized by specs that define domain-specific services
Data management services Storage provisioning, transfer mgt, replication
Job/execution management services Cluster provisioning, image/service deployment, job
reservation and submission, specialized services, etc. Network provisioning services
Optical path, firewall traversal, etc. Application- and domain-specific services
[email protected] ARGONNE CHICAGO
Network
RRR
A
ServiceLevel
Bringing it All TogetherScenario: Resource management & scheduling
Storage
RRRIBM
IBM
Blades
RRR
Notification
GridScheduler
WS-Resource used to “model” physical
processor resources
WS-Resource Properties “project” processor status (like utilization)
Local processor manageris “front-ended” with A Web service interface
Other kinds of resources are also“modeled” as WS-Resources
JJ
J
WS-Notification can be used to “inform” the
scheduler when processor utilization
changes
Grid “Jobs” and “tasks” are also modeled using
WS-Resources and Resource Properties
Grid Scheduleris a
Web Service
Service Level Agreement
is modeled as a WS-Resource
Lifetime of SLA Resource tied to the duration
of the agreement
[email protected] ARGONNE CHICAGO
Overview
Where’s the muscle? What Grid is about
The need for brains The importance of automation
Bringing the two together Working with the Open Science Grid Research problems
[email protected] ARGONNE CHICAGO
Grid: Two Reasons to Care
Grid technology can ease creation of secure, robust, scalable agent systems E.g., Globus Toolkit
Grid deployments can serve as testbeds for agent concepts & algorithms E.g., Grid3 is evolving to Open Science Grid
with 1000s of CPUs & dozens of sites
Run ultra-large agent systems
Help tackle fundamental problems: trust, fault detection, adaptation, negotiation, etc.
[email protected] ARGONNE CHICAGO
Working on the
What it gives you Large collection of distributed resources Standard services Mechanisms for deploying further software
What you can do Use for large-scale agent experiments Define and deploy new services: e.g., intrusion
detection, fault determination Define challenge problems to motivate
development of adaptive techniques
[email protected] ARGONNE CHICAGO
10 Research Challenges (1)
Service architecture Robust foundation for autonomous behaviors
Trust negotiation & management Expressing & reasoning about trust
System management & troubleshooting Autonomic management of large systems
Negotiation Establishing, monitoring, evolving agreements
Service composition Describing, discovering, composing services
[email protected] ARGONNE CHICAGO
10 Research Challenges (2)
VO formation & management Lifecycle issues
System predictability Guarantees of emergent behavior
Human-computer collaboration Interactions in hybrid teams
Evaluation Benchmarks, challenge problems, testbeds
Semantic integration Ontology definition, schema mediation
[email protected] ARGONNE CHICAGO
Summary
Agents & Grid share a common interest in robust, scalable open distributed systems
Complementary foci in work to date Grid: secure & robust infrastructure (brawn) Agents: autonomous problem solvers (brain)
Both brain and brawn required for progress Specific proposals for convergence
Use Grids as testbeds for agent technology Coordinated attack on open problems
[email protected] ARGONNE CHICAGO
For More Information Globus Alliance
www.globus.org Global Grid Forum
www.ggf.org Open Science Grid
www.opensciencegrid.org Background information
www.mcs.anl.gov/~foster GlobusWORLD 2005
Feb 7-11, Boston
2nd Editionwww.mkp.com/grid2