Autonomic Computing
Omer F. Rana (Cardiff University)
Overview
• Illustrative example:
– Managing Web Servers
– Reference to IBM’s AC vision
• Use of SLAs to support system management
– SLA standards, use of SLA in adaptation
• Approaches to adaptation
– Stigmergy (social insects)
– Utility-based approaches
• Toolkits
Recap … AC
• Automating the management of computer resources
• System components more complex
– Better functionality
– Hard to appreciate functionality
– Interaction between components not always obvious
• System admins under increasing pressure to respond to complexity
AC … 2
• Manual tuning
– Generally script driven (requires updates to configuration files)
– Error-prone process (requires skilled personnel)
• Automated tuning
– Try to model behaviour of the system
– Use this behaviour as a “predictive” tool to determine likely response from system
– Design feedback control mechanisms (and use on-line operation to adjust control)
AC application
Can be applied at two levels:
• Individual component level
– Make each component more intelligent
– Provide support infrastructure around this intelligent component
• Interaction level
– Facilitate better interaction between components in some way
– Allow “useful” interactions to “emerge”
Four Concepts
• Self-configuring:
– Dynamic adaptation to changing environment
– Addition of new features dynamically
• Self-healing:
– Discover, diagnose and react to disruptions
– Handling failure and isolating a component
• Self-Optimising:
– Monitor and tune resource utilisation
– Includes: dynamic partitioning, workload management
• Self-Protecting:
– Anticipate/Identify, detect and protect from attacks
– Extend existing security infrastructure to achieve this
Relationship to other themes
• Machine Learning and AI
• Knowledge Management (Semantics)
• Coordination Mechanisms and Protocols
• System Administration
• Performance Engineering and Monitoring
• Related Emerging areas
– Ambient Intelligence
– Amorphous Computing
– Computational “Fabrics”
Figure sequence (from Alan Ganek, IBM). Plotted quantities: Response Time, Actual BOPS, Predicted BOPS, #Active Servers, #Requested Servers
1. Steady State
2. Monitor, Detect Surge
3. Forecast, Provision Servers
4. Monitor, Remove Servers
Apache Web Server Tuning
• Client-server model, with limits set by the MaxClients and KeepAlive parameters
– Tuning is equivalent to modifying MaxClients and KeepAlive
• Performance Metrics
– End-user response time
– Resource utilisation
– CPU and memory utilisation
• Measure parameters on server side
• Over-utilisation leads to thrashing and potential failure
Basis for Metrics
• Master process + pool of worker processes
• Each worker process handles interaction with a client
• Worker processes limited by MaxClients
• Worker process states: idle, waiting and busy
– Idle (no TCP connection made)
– Waiting (waiting for HTTP request from client)
– Busy (processing request)
• Persistent HTTP/1.1: TCP connection remains open between consecutive HTTP requests (reduces time to set up a connection)
• Persistent connection can be terminated by master or client process if the waiting time exceeds the maximum allowed by KeepAlive
Desired CPU level=0.5, and Memory=0.6
Manual Tuning
Dynamic Workload (additional requests at 20th Control Interval)
Manual Tuning … 2
Dynamic Workload
• To maintain CPU and Memory criteria, it is necessary to tune manually
• Achieved by adjusting MaxClients and KeepAlive parameters
• Dynamic workload (generally unpredictable) requires continuous re-tuning
• Trying to follow changes resulting from a dynamic workload can be a continuous process
AutoTune agents
• Autotune Adaptor Bean
– Interfaces with target system for service level metrics
– Sets tuning parameters
• Autotune Controller Bean
– Specifies control strategies (based on data captured)
– Interacts with system admin to configure control strategy
Manages (1) timer, (2) async events
Can set (1) control and (2) sample intervals
AutoTune Functionality
AutoTune Architecture Data set generator
AutoTune Agent Operations
• Three agents:
– Feedback controller design
  • Model based controller
  • Linear Quadratic Regulation (LQR) controller
– Modelling
  • Non-production/testing mode
  • Alters tuning parameters: MaxClients and KeepAlive
  • Records performance metrics: CPU and memory
  • Construct dynamic model (based on time series)
– Run-time control
  • Production mode
  • Uses output from controller: dynamically adjusts MaxClients and KeepAlive
Modelling agent
• Build a mathematical model of the system
– Queuing theory
– Data analysis based
• Mathematical model
– Requires understanding of inner workings of server
– May need to know about particular properties (exceptions) of the way the server operates
• Data-based model (“blackbox” approach)
– Gather data of system in the “wild”
– Assume a sufficient number of test cases has been covered
• User Input
– Range of Tuning Parameters: MaxClients [1,1024]; KeepAlive [1,50]
– Max delay required for tuning parameters to take effect on the performance metrics: MaxClients (10m); KeepAlive (20m)
LinearModel
Feedback Control
• PID (proportional-integral-derivative) control
– Correct error between a measured process variable and a desired point
– Calculating and outputting a corrective action to adjust process accordingly
From Wikipedia
Feedback Control … 2
• Proportional: reaction to current error
• Integral: reaction based on recent error (time based)
• Derivative: reaction based on rate by which error has been changing
• Use a weighted sum of the three modes
• Output as a corrective action to a control element
Proportional Mode
• Responds to a change in the process variable proportional to the current measured error value
• Multiply the error by a constant Kp (proportional gain)
m = Kp · e
m: output signal; Kp: proportional gain; e: error (expected – actual); PB: proportional band
Integral Mode
• Controller output is proportional to the amount and duration of the error
• Algorithm calculates the accumulated proportional offset over time
• Leads to controller approaching required value quicker – but contributes to system instability – may cause “overshoot”
m: output signal; Ti: integral time; e: error (expected – actual)
Derivative Mode
• Acts as a braking or damping action on the controller response as it overshoots
• Use of slope of error vs. time (rate of error change)
• Controller may be slower to reach required point (counters work of integral mode)
m: output signal; Td: derivative time; e: error (expected – actual)
Combining the three
• Output(t) = P + I + D
K_p = K; K_i = K / T_i; K_d = K · T_d
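The weighted sum of the three modes can be sketched as a small discrete-time controller. This is a minimal illustration, assuming a fixed sample period and the gain relations K_p = K, K_i = K/T_i, K_d = K·T_d; the class name PidSketch, its method names and the numeric values are hypothetical and do not come from the AutoTune implementation.

```java
/** Minimal discrete PID controller sketch; class and method names are illustrative. */
final class PidSketch {
    private final double kp;   // K_p = K
    private final double ki;   // K_i = K / T_i
    private final double kd;   // K_d = K * T_d
    private double integral = 0.0;        // running sum of error * dt
    private double previousError = 0.0;   // error seen at the previous sample

    PidSketch(double k, double ti, double td) {
        this.kp = k;
        this.ki = k / ti;
        this.kd = k * td;
    }

    /** One control step: returns P + I + D for a sample taken dt time units after the last one. */
    double update(double expected, double actual, double dt) {
        double error = expected - actual;                  // e = expected - actual
        integral += error * dt;                            // amount and duration of error
        double derivative = (error - previousError) / dt;  // rate of change of error
        previousError = error;
        return kp * error + ki * integral + kd * derivative;
    }

    public static void main(String[] args) {
        PidSketch pid = new PidSketch(2.0, 4.0, 0.5);      // made-up gains
        // Desired CPU utilisation 0.5, measured 0.3, one time unit per sample.
        System.out.println(pid.update(0.5, 0.3, 1.0));
        System.out.println(pid.update(0.5, 0.3, 1.0));
    }
}
```

The integral term keeps growing while the error persists, which is what drives the output towards the required value and also what can cause overshoot; the derivative term reacts only when the error changes.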
Run-time Control agent
• Implements an error feedback controller
• Makes use of a (1) desired, and (2) actual system utilisation
• Kp and Ki matrices obtained by the controller design agent
• Controller performance
– Time to recover from a workload change in the system
e(k) = error between actual and desired value at the kth interval
Accumulated error = sum of the errors over the intervals so far
Kp = proportional control gain; Ki = integral control gain (for steady-state error)
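A proportional-plus-integral error feedback law with matrix gains can be sketched as follows. This is only an illustration under stated assumptions: the error is taken as desired minus actual, the two inputs are CPU and memory utilisation, the two outputs are offsets for MaxClients and KeepAlive, and the gain values in main are invented rather than produced by the controller design agent.

```java
/** Sketch of u(k) = Kp * e(k) + Ki * (accumulated error); all names are illustrative. */
final class PiControlSketch {
    private final double[][] kp;   // 2x2 proportional gain matrix (hypothetical values)
    private final double[][] ki;   // 2x2 integral gain matrix (hypothetical values)
    private final double[] accumulated = new double[2];   // summed error per metric

    PiControlSketch(double[][] kp, double[][] ki) {
        this.kp = kp;
        this.ki = ki;
    }

    /** desired and actual are {cpu, memory}; returns offsets for {MaxClients, KeepAlive}. */
    double[] step(double[] desired, double[] actual) {
        double[] e = new double[2];
        for (int i = 0; i < 2; i++) {
            e[i] = desired[i] - actual[i];
            accumulated[i] += e[i];
        }
        double[] u = new double[2];
        for (int row = 0; row < 2; row++) {
            for (int col = 0; col < 2; col++) {
                u[row] += kp[row][col] * e[col] + ki[row][col] * accumulated[col];
            }
        }
        return u;
    }

    public static void main(String[] args) {
        PiControlSketch c = new PiControlSketch(
                new double[][] {{100, 0}, {0, 10}},
                new double[][] {{50, 0}, {0, 5}});
        double[] u = c.step(new double[] {0.5, 0.6}, new double[] {0.4, 0.6});
        System.out.println(u[0] + " " + u[1]);
    }
}
```

Because the accumulated error only stops changing when the measured utilisation equals the goal, the integral gain is what removes the steady-state error.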
Controller Design Agent
• Relies on output of modelling agent
• Aims to minimise a quadratic cost function (J(Kp,Ki))
• Q, R are weighting matrices: Q is a 4x4 matrix and R is a 2x2 matrix
• Q = diag(q1,q2,q3,q4), and R = diag(r1,r2)
– q1=1, q2=2, q3=(1/10^2), q4=(1/2^2) (10% random CPU fluctuation, and 2% memory)
– r1=(1/50^2), r2=(1/1000^2)
Implementation
• Undertaken with ABLE: extends the AutoTune agent
• Modelling agent
– Data generator extends AutotuneController bean (extends the process() method)
– ApacheAdaptor extends AutotuneAdaptor bean (implements socket connection with Apache Web server)
• Run-time Controller agent
– Extends the AutotuneController bean
– Also uses the ApacheAdaptor
• Controller Design agent
– Extends the AutotuneController bean
– Extends AutotuneAdaptor to read in model parameters from Modelling agent
Experiment setup
• Linux (v2.2.16), Apache HTTP v1.3.19
• MaxClients and KeepAlive parameters to be dynamically modifiable
• Multiple clients supporting workload generator
– WAGON (Web trAffic GeneratOr and beNchmark), Liu et al. (INRIA)
– Httperf to generate synthetic HTTP requests
– File access distributions from Webstone 2.5
• Static and Dynamic workloads used
– Static: Web page requests; session arrivals followed a Poisson distribution (20 sessions/second)
– Dynamic: Web page requests; session arrivals followed a Poisson distribution (10 sessions/second)
• Control Parameters
– Control interval (adaptation time): 5 seconds
– Goal: CPU=0.5 and Memory=0.6
Automatic tuning of Apache Web Server (about 50 control intervals to converge)
With Dynamic Workload (at 20th Interval) – takes 20 intervals to adjust
Types of system components
• Computer Servers
• Web Servers
• Database systems
• Devices
– Pervasive Computing
– Ubiquitous Computing
Upgrades and Problem Diagnosis
• Upgrade has 5 new autonomic modules
• Three modules found to be faulty (system reverts to old version)
• Analyse module dependencies
• Analyse log files to infer which of the three modules is the culprit
• Generate a “problem ticket” to software developer
QoS Management
• QoS has been explored in:
– Computer Networks
  • Bandwidth, Delay, Packet loss rate and Jitter.
– Multimedia Applications
  • Frame rate and computation resource.
– Grid Computing
  • Network QoS, computation and storage requirements.
Continue …
• QoS management:
– Covers a range of different activities, from resource specification, selection and allocation through to resource release.
• QoS system should address the following:
– Specifying QoS requirements
– Mapping of QoS requirements to resource capability
– Negotiating QoS with resource owners
– Establishing contracts / SLAs with clients
– Reserving and allocating resources
– Monitoring parameters associated with QoS sessions
– Adapting to varying resource quality characteristics
– Terminating QoS sessions
• User Expectations vs. Resource Management
When is QoS needed?
• Interactive sessions
– Computation steering (control parameters & data exchange)
– Interactive visualization (visualization & simulation services)
• Response within a limited time span
• Co-scheduling or co-location support
From SCIRun, University of Utah
– Application QoS: user perception, response time, application security, etc.
– Middleware QoS: computation, memory and storage
– Network QoS: bandwidth, packet loss, delay, jitter
What is a Service Level Agreement (SLA) and why is it useful for AC?
Client and Provider (diagram):
Client: “Can you do X for me, for Y in return?” (SLA-Offer)
Provider: “Yes” (SLA-Accept; the alternative reply is SLA-Reject)
Distinguish between:
• Discovery of suitable provider (P2P search, directory service)
• Establishment of an SLA
A relationship between a client and provider in the context of a particular capability (service) provision
SLA as a basis to support adaptive behaviour
What is an SLA?
Client and Provider (diagram):
Client: “Can you do X for me, for Y in return?” (SLA-Offer)
Provider: “No, but I can do Z for Y” (SLA-CounterOffer)
Client: “Accept” (SLA-Accept; the alternative reply is SLA-Reject)
What is an SLA?
Client and Provider (diagram):
Client: “Can you do X for me, for Y in return?” (SLA-Offer)
Provider: “No”
Client: “Can you do Z for me, for Y in return?”
Negotiation phase (single or multi-round)
Message labels in the diagram: SLA-Offer, SLA-CounterOffer, SLA-Offer, Dependency
Variations
• Multi-provider SLA: a single SLA is divided across multiple providers (e.g. workflow composition)
• SLA dependencies: for an SLA to be valid, another SLA has to be agreed (e.g. co-allocation)
What is an SLA?
• Dynamically established and managed relationship between two parties
• Objective is “delivery of a service” by one of the parties in the context of the agreement
• Delivery involves:
– Functional and non-functional properties of service
• Management of delivery:
– Roles, rights and obligations of parties involved
Forming the Agreement
• Distinguish between:
– Agreement itself
– Mechanisms that lead to the formation of the agreement
• Mechanisms that lead to agreement:
– Negotiation (single or multi-shot)
– One-shot creation
– Policy-based creation of agreements, etc.
SLA Life Cycle
• Identify Provider
– On completion of a discovery phase
• Define SLA
– Define what is being requested
• Agree on SLA terms
– Agree on Service Level Objectives
• Monitor SLA Violation
– Confirm whether SLOs are being violated
• Destroy SLA
– Expire SLA
• Penalty for SLA Violation
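The life cycle above can be read as a simple state machine. The following is a minimal sketch under stated assumptions: the phase names mirror the bullets, the single SLO is a hypothetical maximum response time, and a fixed penalty is charged per violation. None of these names or values come from WS-Agreement or any SLA toolkit.

```java
/** Sketch of the SLA life cycle as a state machine; all names and values are illustrative. */
final class SlaLifecycleSketch {
    enum Phase { IDENTIFY_PROVIDER, DEFINE, AGREE_TERMS, MONITOR, DESTROYED }

    private Phase phase = Phase.IDENTIFY_PROVIDER;
    private final double maxResponseTimeMs;     // hypothetical Service Level Objective
    private final double penaltyPerViolation;   // hypothetical penalty term
    private double penaltyOwed = 0.0;

    SlaLifecycleSketch(double maxResponseTimeMs, double penaltyPerViolation) {
        this.maxResponseTimeMs = maxResponseTimeMs;
        this.penaltyPerViolation = penaltyPerViolation;
    }

    Phase phase() { return phase; }

    double penaltyOwed() { return penaltyOwed; }

    /** Moves to the next phase; DESTROYED (expired) is terminal. */
    void advance() {
        if (phase != Phase.DESTROYED) {
            phase = Phase.values()[phase.ordinal() + 1];
        }
    }

    /** Records one measurement while monitoring; returns true if the SLO is violated. */
    boolean observe(double responseTimeMs) {
        if (phase != Phase.MONITOR) {
            throw new IllegalStateException("SLA is not being monitored");
        }
        boolean violated = responseTimeMs > maxResponseTimeMs;
        if (violated) {
            penaltyOwed += penaltyPerViolation;
        }
        return violated;
    }

    public static void main(String[] args) {
        SlaLifecycleSketch sla = new SlaLifecycleSketch(200.0, 5.0);
        sla.advance(); sla.advance(); sla.advance();   // now in MONITOR
        System.out.println(sla.observe(250.0) + " penalty=" + sla.penaltyOwed());
    }
}
```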
WS-Agreement
• Framework for SLA creation: interface conforming to Web Services standards
• Service Client/Provider does not need to be a Web Service
• Provides a two layered model:
– Agreement layer: Web Service-based interface to create, represent and monitor agreements
– Service layer: application-specific layer of the service being provided
WS-Agreement
Agreement Initiator may be Service Consumer or Service Provider
Diagram layers: Service Layer, Agreement Layer
WS-Agreement structure (diagram):
• Name/ID
• Context: information about the agreement (initiator, responder, expiration time)
• Terms Composition
– Service Terms: information about the service; Service Description Terms (generally, these are domain dependent)
– Guarantee Terms: information about the service level; Service Level Objectives, qualifying conditions for the agreement to be valid, penalty terms, etc.
WS-Agreement Terms
From: Viktor Yarmolenko (U Manchester)
WS-Agreement
• Specification for Service Level Agreements
– Developed through GRAAP WG at the Open Grid Forum
– Previous efforts: WSLA (from IBM)
• Provides:
– Schema for agreement terms
– A very simple protocol (two stage)
– A state sequence
– Support for penalty clauses
• No support for negotiation
WS-Agreement Specification Document (GFD.107)
Data Center Scenario … 1
• Identical servers, dynamically allocated among multiple Web apps
• For each application:
– Application Manager (performance optimisation) interacting with a Resource Arbiter (server allocation)
– Optimisation goal (“expected business value”) defined by an “objective function”
• Resource Arbiter goal:
– Allocate servers to maximise sum of expected business value over all applications
– Local value functions must share a common scale
Data Center Scenario … 2
Use of Reinforcement Learning
Resource Arbiter goal: allocate servers to maximize the sum of expected business value over all applications (assuming a common scale).
Vi(.): utility curve, an estimate of expected business value (e.g. payments minus penalties)
Arbiter assigns each application its list of servers
A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation, Gerald Tesauro et al., Proceedings of ICAC 2006, Dublin, Ireland.
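The arbiter's goal, maximising the sum of per-application value curves over a fixed pool of identical servers, can be illustrated with a small dynamic program. This is a sketch under assumptions: the value curves are supplied as known tables, whereas in the cited work they are estimated (for example by reinforcement learning); the class name, method name and the numbers in main are made up.

```java
/** Sketch of a resource arbiter that maximises the summed value curves; illustrative only. */
final class ArbiterSketch {
    /**
     * value[i][k] is the expected business value of application i when given k servers
     * (each row needs servers + 1 entries). Returns the number of servers per application.
     */
    static int[] allocate(double[][] value, int servers) {
        int apps = value.length;
        // best[i][s]: best total value using the first i applications and at most s servers.
        double[][] best = new double[apps + 1][servers + 1];
        int[][] choice = new int[apps + 1][servers + 1];
        for (int i = 1; i <= apps; i++) {
            for (int s = 0; s <= servers; s++) {
                best[i][s] = Double.NEGATIVE_INFINITY;
                for (int k = 0; k <= s; k++) {
                    double v = best[i - 1][s - k] + value[i - 1][k];
                    if (v > best[i][s]) {
                        best[i][s] = v;
                        choice[i][s] = k;
                    }
                }
            }
        }
        // Walk the recorded choices backwards to recover the allocation.
        int[] allocation = new int[apps];
        int remaining = servers;
        for (int i = apps; i >= 1; i--) {
            allocation[i - 1] = choice[i][remaining];
            remaining -= allocation[i - 1];
        }
        return allocation;
    }

    public static void main(String[] args) {
        double[][] value = {
            {0, 10, 15, 18},   // application 0: diminishing returns
            {0, 4, 12, 13}     // application 1: needs two servers to pay off
        };
        System.out.println(java.util.Arrays.toString(allocate(value, 3)));
    }
}
```

The sum only makes sense because both curves are expressed on a common scale, which is the requirement the slides place on the local value functions.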
Not all SLAs are equal
• App events for trade stock data
• Customer classes:
– Gold customers: pay for data
– Public customers: connected over Internet
• Public customers get less information than Gold
• Gold customers expect reliable delivery
– Need for acks, increasing overhead in system
• Cannot alter flow rate to tolerate delays
– But can support “admission” control
Utility Abstract measure of benefit to user (seek to maximize this given available resources)
SLA Classes
Risk-Aware Limited Lookahead Control for Dynamic Resource Provisioning in Enterprise Computing Systems, Dara Kusic and Nagarajan Kandasamy, Proceedings of ICAC 2006, Dublin, Ireland.
Assumes the existence of multiple QoS classes
Control System Architecture
• r_alloc: rate to a flow when it enters system
• n_alloc: number of consumers (admitted for each class)
Utility-aware Resource Allocation in an Event Processing System, Sumeer Bhola, Mark Astley, Robert Saccone and Michael Ward, Proceedings of ICAC 2006, Dublin, Ireland.
Control System Strategies
• Assumes knowledge of some “good” (ideal) state
• Move system towards the good/ideal state
• Impacted by:
– Response time (current good state transition)
– Variability in operational environment (stability of approach)
– Execution time
– Discrete domain (tuning options from a finite set)
• Feedback control
– PID
– Kalman filter
• Neural network-based control
– Use of learning approaches
• Rule-based approaches
– Use of event recognition and triggers
Kalman Filters
• Discrete time linear dynamic systems
• Modelled on a Markov chain (with noise)
• Linear operator applied to state to generate a new state
x_k = F_k x_{k-1} + B_k u_k + w_k
F_k = state transition model applied to previous state x_{k-1}
B_k = control input model applied to control vector u_k
w_k = process noise (normally distributed)
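For a single state variable the filter reduces to a few lines. The sketch below assumes constant scalar F and B, a process-noise variance q, and a direct noisy measurement of the state (observation model H = 1 with variance r); the measurement step is the standard Kalman update and is not shown on the slide, and all names are illustrative.

```java
/** Scalar Kalman filter sketch (constant F, B; direct measurement of the state). */
final class ScalarKalmanSketch {
    private final double f;   // state transition model F
    private final double b;   // control input model B
    private final double q;   // process noise variance (assumed known)
    private final double r;   // measurement noise variance (assumed known)
    private double x;         // current state estimate
    private double p;         // variance of the estimate

    ScalarKalmanSketch(double f, double b, double q, double r, double x0, double p0) {
        this.f = f; this.b = b; this.q = q; this.r = r;
        this.x = x0; this.p = p0;
    }

    /** Predict step: x = F x + B u, P = F P F + Q. */
    void predict(double u) {
        x = f * x + b * u;
        p = f * p * f + q;
    }

    /** Update step for a measurement z of the state itself (H = 1). */
    void update(double z) {
        double gain = p / (p + r);    // Kalman gain
        x = x + gain * (z - x);       // correct the estimate with the innovation
        p = (1.0 - gain) * p;         // uncertainty shrinks after a measurement
    }

    double state() { return x; }

    double variance() { return p; }

    public static void main(String[] args) {
        ScalarKalmanSketch kf = new ScalarKalmanSketch(1.0, 0.0, 0.0, 1.0, 0.0, 1.0);
        kf.predict(0.0);
        kf.update(2.0);
        System.out.println(kf.state() + " " + kf.variance());
    }
}
```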
Differentiated Quality of Service
Diagram labels: Silver, Gold and Platinum Customers; SAN Manager; Silver, Gold and Platinum Policies; SAN Storage
From Joe Bigus (IBM)
SAN Manager Scenario Overview
• Uses new AbleRuleAgent as rules-based policy manager
• Models multiple quality of service levels (represented by rule sets)
• N systems are defined, each with associated QoS levels
• Requests include system identifier and current utilization
• The SAN Manager:
– Looks up QoS for that system
– Invokes the corresponding QoS rule set
• Rule sets make recommendations that allocations are either unchanged, increased or decreased
• SAN Manager evaluates recommendations and changes allocations based on total capacity limit
From Joe Bigus (IBM)
Platinum QoS RuleSet
// Low allocation
: if Allocation is Low and Utilization is Low then RecommendedAction = NoAction;
: if Allocation is Low and Utilization is Normal then RecommendedAction = NoAction;
: if Allocation is Low and Utilization is High then RecommendedAction = IncreaseAllocation;
// Normal allocation
: if Allocation is Normal and Utilization is Low then RecommendedAction = DecreaseAllocation;
: if Allocation is Normal and Utilization is Normal then RecommendedAction = NoAction;
: if Allocation is Normal and Utilization is High then RecommendedAction = IncreaseAllocation;
// High allocation
: if Allocation is High and Utilization is Low then RecommendedAction = DecreaseAllocation;
: if Allocation is High and Utilization is Normal then RecommendedAction = DecreaseAllocation;
: if Allocation is High and Utilization is High then RecommendedAction = Send.Warning_LowMem;
: if Allocation is positively High and Utilization is positively High then RecommendedAction = Send.Warning_CritMem;
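The first nine rules form a 3x3 decision table, which can be mirrored in plain Java. This sketch uses crisp levels rather than the fuzzy sets of the rule language, so the final hedged rule (“positively High”) is left out; the enum and method names are hypothetical and are not part of ABLE.

```java
/** Crisp (non-fuzzy) sketch of the Platinum rule table; names are illustrative. */
final class PlatinumRulesSketch {
    enum Level { LOW, NORMAL, HIGH }

    enum Action { NO_ACTION, INCREASE_ALLOCATION, DECREASE_ALLOCATION, WARN_LOW_MEM }

    /** Returns the recommended action for the given allocation and utilization levels. */
    static Action recommend(Level allocation, Level utilization) {
        switch (allocation) {
            case LOW:
                // Only grow a low allocation when it is heavily used.
                return utilization == Level.HIGH ? Action.INCREASE_ALLOCATION : Action.NO_ACTION;
            case NORMAL:
                if (utilization == Level.LOW) return Action.DECREASE_ALLOCATION;
                if (utilization == Level.HIGH) return Action.INCREASE_ALLOCATION;
                return Action.NO_ACTION;
            default:
                // HIGH allocation: shrink unless it is also highly utilized, then warn.
                return utilization == Level.HIGH ? Action.WARN_LOW_MEM : Action.DECREASE_ALLOCATION;
        }
    }

    public static void main(String[] args) {
        System.out.println(recommend(Level.NORMAL, Level.HIGH));
        System.out.println(recommend(Level.HIGH, Level.HIGH));
    }
}
```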
From Joe Bigus (IBM)
Dynamic SLA
• Limitations of a single agreement
– Modifications since agreement was in place
• Cost of doing re-establishment
– Not fully aware of operating environment
• Flexibility in describing Service Level Objectives
– Not sure what to ask for (not fully aware of the environment in which operating)
– Too many violations
Dynamic WS-Agreement
• Case 1: Static Agreement
– Identify Service Description Terms,
– Guarantee Terms, and
– Service Level Objectives (SLOs)
• Case 2: Dynamic Agreement
– Identify Service Description Terms,
– Guarantee Terms: defined as ranges or as functions
– Service Level Objectives: defined as ranges or as functions
From: Viktor Yarmolenko
Function-based SLA (Yarmolenko et al.)
• Express initial SLA-Offer as a function of provider capability
From: Viktor Yarmolenko
Guarantee terms as functions
From: Viktor Yarmolenko
SLA Classes
• Guaranteed
– constraints to be exactly observed
– SLA is precisely/exactly defined
– adaptation algorithm/optimization heuristics
• Controlled-load
– some constraints may be observed
– Range-oriented SLA
– optimization heuristics
• Best-effort
– any resources will do
– no adaptation support
SLA Adaptation
• Assume total capacity: C = CG + CA + CB (guaranteed, adaptive and best-effort capacity)
• ‘Best effort’ can use the adaptive capacity, as long as it is not used by the ‘guaranteed’
• When QoS degrades for ‘guaranteed’, the adaptive capacity is utilized to compensate for the degradation
• ‘Best effort’ can still utilize the remaining adaptive capacity, as long as it is not used by the ‘guaranteed’
• When the congested capacity is restored, the adaptive capacity can be used entirely by the ‘best effort’
Before invoking the adaptive function:
o Ensure that the request at time (t) is within what was agreed upon in the SLA
o Ensure that the total capacity within all SLAs at time (t) does not exceed CG
Aim: compensation for QoS degradation for the ‘guaranteed’ class only
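The sharing rule for the adaptive capacity can be written as one small function. This is a sketch under assumptions: capacities are plain numbers in the same unit, QoS degradation is modelled as the guaranteed capacity that is currently available falling below the guaranteed demand, and the names are invented for illustration.

```java
/** Sketch of how adaptive capacity is split between 'guaranteed' and 'best effort'. */
final class SlaAdaptationSketch {
    /**
     * cgAvailable: guaranteed capacity that is currently usable (drops when congested)
     * ca, cb: adaptive and best-effort capacity
     * guaranteedDemand: capacity the 'guaranteed' class needs right now
     * Returns {adaptive capacity lent to 'guaranteed', capacity left for 'best effort'}.
     */
    static double[] share(double cgAvailable, double ca, double cb, double guaranteedDemand) {
        double shortfall = Math.max(0.0, guaranteedDemand - cgAvailable);
        double lent = Math.min(ca, shortfall);   // never lend more than the adaptive pool
        return new double[] { lent, cb + (ca - lent) };
    }

    public static void main(String[] args) {
        double[] normal = share(60.0, 20.0, 20.0, 50.0);     // no degradation
        double[] degraded = share(40.0, 20.0, 20.0, 50.0);   // guaranteed class is 10 short
        System.out.println(normal[0] + " " + normal[1]);
        System.out.println(degraded[0] + " " + degraded[1]);
    }
}
```

When the guaranteed capacity is restored the shortfall returns to zero, so the whole adaptive pool goes back to the ‘best effort’ class.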
Grid QoS service interface (diagram labels): Grid Node; QoS Grid Service; Reservation Manager; Allocation Manager; Policy Manager; Resources
Main components
• Policy Manager
– To provide dynamic info about the domain-specific resource characteristics and policy
• Reservation Manager
– To provide advance/immediate resource reservation
  • Data structure contains reservation entries
  • Interacts with policy manager for resource characteristics
• Allocation Manager
– To interact with the underlying resource manager for resource allocation (e.g. DSRT, Bandwidth Broker)
Diagram labels: Client's Application; QoS Broker; QoS Discovery; UDDIe; Grid nodes 1 to 3, each with a QoS service (Reservation, Allocation, Policy); an SLA per Grid node
Joint work with Argonne National Lab. (Gregor von Laszewski et al.)
Reservation Approaches
• Resource reservation / allocation based on two strategies:– Time-domain: reserve the whole ‘compute’
power of Grid node.• Guaranteed exclusive access
– Resource-domain: reserve a CPU slot of the Grid node.
• Shared access – guaranteed resource capacity• Suitable for light weight applications/services.
G-QoSM Architecture (diagram labels): Applications, Portals, Swing, Legacy; Java CoG Kit Core; CoG QoS Broker; UDDIe; QoS Handler; GT2 Handler; GT3 Handler; UDDIe Handler; Reput Handler; CoG Reputation Service; CoG QoS Grid Service (Allocation Manager, Reservation Manager, Policy Manager); Resource Managers; Resources (CPU, Network, Disk); Service Agreement; Client; Grid
G-QoSM
Implementation Status
• References:
– Rashid Al-Ali, Kaizar Amin, Gregor von Laszewski, Omer Rana and David Walker. An OGSA-Based Quality of Service Framework. Proceedings of the Second International Workshop on Grid and Cooperative Computing (GCC 2003), Shanghai, China, December 2003.
– Rashid Al-Ali, Omer Rana, David Walker, Sanjay Jha and Shaleeza Sohail. G-QoSM: Grid Service Discovery Using QoS Properties. Computing and Informatics Journal, Special Issue on Grid Computing, 21 (4), 2002.
• The QoS implementation is open source and available for download from the Java CoG site http://www.globus.org/cog/java
Application Integration
1. Prepare: QoS negotiation Task
   Returns: Agreement ID
2. Prepare: QoS job submission Task
3. Submit job to QoS service
QoS Job Submission Task
private void prepareQosJobSubmissionTask() {
    // create a QoS JobSubmission Task and keep it in the task field
    this.task = new TaskImpl("myTask", QoS.JOBSUBMISSION);
    this.task.setAttribute("agreementToken", token);

    // create a remote job specification
    JobSpecification spec = new JobSpecificationImpl();

    // set all the job related parameters
    spec.setExecutable("/rashid/myExecutable");
    spec.setRedirected(false);
    spec.setStdOutput("QosOutput");

    // associate the specification with the task
    task.setSpecification(spec);

    // create a Globus version of the security context
    SecurityContextImpl securityContext = new GlobusSecurityContextImpl();
    securityContext.setCredential(null);
    task.setSecurityContext(securityContext);

    // point the task at the QoS Grid Service
    Contact contact = new Contact("myQoScontact");
    ServiceContact service = new ServiceContactImpl(qosServiceURL);
    contact.setServiceContact("QGSurl", service);
    task.setContact(contact);
}
QoS Task Submission
/*** QoS: Task Submission to QoS Handler ***/
private void QosTaskSubmission(Task task) {
    TaskHandler handler = new QoSTaskHandlerImpl();

    // submit the task to the handler
    handler.submit(task);
}
With Globus Toolkit 2
Best Effort
Guaranteed
Web Services Distributed Management (WSDM)
• Management USING Web Services (MUWS)
– Web services to describe and access manageability of resources
– Management applications use Web services just like other applications use Web services
• Management OF Web Services (MOWS)
– An application of Management Using Web Services for the Web Service as the IT resource
• Use Web Services as the distributed computing platform to enable interoperability between managers and manageable resources
Disturbance Benchmarking
From Aaron Brown and Peter Shum (IBM)
Useful to compare this with performance benchmarks that we are much more aware of
Compare with automated testing mechanisms
Behaviours and Interactions
• Interactions not “hard coded” but expressed as high level objectives, e.g.
– Maximise this utility function
– Find a reputable message translation service
• Autonomic Service providers can say “No”
– Service provision must be consistent with local policy and long term goals
• Policies may be expressed using logic or other formalisms
Emergence and Self-Organisation
• Increased complexity and autonomy implies that “global” coherent behaviours may be hard to specify
• Concept of “Emergence”
• Interactions between autonomous systems that can lead to useful global behaviours
– How can we constrain each individual element within such a system?
– How can useful global behaviours be recognised effectively?
Self Organisation
• Self-Organisation is a set of dynamical processes whereby structure or order appears at the global level of a system from the interactions between its lower-level entities. The rules that underlie the behaviour and specify the interactions among the entities are implemented on the basis of local information, without any reference to the global pattern.
Emergence
• A dynamic, non-linear process that results in the formation of “macro-level” structures, based on interactions of system parts at the micro-level.
• Such emergence is “novel” – i.e. cannot be easily understood by taking the system apart and looking at the parts (reductionism)
Issues
• Macro-Micro effect
• Novelty
– Global behaviour is novel
• Coherence
– Emergence has some sense of identity (i.e. persists over some time)
• Dynamic
– Emergence arises as system evolves over time
• Non-Linear
• Distributed/Non-Centralised Control
– Not possible to control the entire system
Influences
• Social Societies
– Emerging area of “Socionics”
• Biological Paradigms (Stigmergy)
– Ant Colonies (Social Insects)
– Swarms
• Particle Systems (fluidity and elasticity)
– Chemical reactions
– Spin Glass theory (due to temperature changes)
Concepts of Utility
• What is considered “important”
• Value assigned to actions and operations
• Utility
– Cost
– Performance
– Availability
• Some kind of “measurable” metric
Utility … 2
• Payoff function
– assess behaviour of a particular action (reward signal)
• Analysis tool
– relationship between local utility vs. utility of the community
• Cost function
– success w.r.t. a particular task
• Trust measure
– measure of trust in a particular participant
Economic Utility: Metrics “Pyramid”
Utility Optimisation
Expected Utility – E(x)
Infinite Horizon
Finite Horizon
0 < γ < 1 (discount factor)
“U” may be negative
Long term rewards less useful
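The remark that long term rewards are less useful matches the usual discounted-sum formulation of utility, in which the reward t steps ahead is weighted by the t-th power of a discount factor between 0 and 1. The sketch below assumes that formulation (the slide's own formulas are not available), and the class and method names are illustrative.

```java
/** Sketch of discounted utility: sum of (gamma to the power t) times the reward at step t. */
final class DiscountedUtilitySketch {
    /** rewards[t] is the reward at step t (it may be negative); gamma is the discount factor. */
    static double discounted(double[] rewards, double gamma) {
        double total = 0.0;
        double weight = 1.0;          // gamma^0
        for (double reward : rewards) {
            total += weight * reward;
            weight *= gamma;          // later rewards count for less
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(discounted(new double[] {1.0, 1.0, 1.0}, 0.5));
        System.out.println(discounted(new double[] {10.0, -4.0}, 0.5));
    }
}
```

Passing gamma = 1 gives the plain finite-horizon sum; with gamma below 1 the sum stays bounded even over an unbounded horizon as long as the individual rewards are bounded.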
Social Insect Behaviour
• Self-organising Behaviour
• The idea of simple behaviours interacting in a manner that produces a range of interesting complex behaviours is very useful and exciting for designing complex systems:
– Positive Feedback (Autocatalytic): Recruitment and Reinforcement
– Negative Feedback: Saturation, Exhaustion, or Competition
– Fluctuations and Randomness: Random Walks, Errors, Random Task-Switching etc.
– Multiple Interactions
• Stigmergetic Behaviour
• Waggle and Tremble dances (Bees)
From: Ashish Umre
Stigmergy
• Indirect communication via interaction with environment [Grassé, 59]
– Sematectonic [Wilson, 75] stigmergy
  • action of agent directly related to problem solving and affects behavior of other agents.
– Sign-based stigmergy
  • action of agent affects environment not directly related to problem solving activity.
Self-organised behaviour can be characterised by key properties like -
• The creation of spatiotemporal structures in an initially homogeneous medium, e.g. Nest Architectures, foraging trails, or social organisation.
• Multistability - possible coexistence of several stable states
• Existence of Bifurcations when some parameters are varied. (“Snowball effect”).
From: Ashish Umre
What do Ants do?
• A few examples of collective behaviour that have been observed in several species of ants are: regulating nest temperature within limits of 1 °C; forming bridges; raiding particular areas of food; building and protecting their nest; sorting brood and food items; co-operating in carrying large items; emigration of a colony; complex patterns of egg and brood care; finding the shortest routes from nest to a food source; preferentially exploiting the richest available food source; task partitioning and division of labour.
From: Ashish Umre
Ants in Nature
From: Ashish Umre
Adapting to Environment Changes
Pheromone Trails
Figure: ant paths through nodes A, B, C, D, E and H, with branch lengths d = 0.5 and d = 1.0. At T = 0, groups of 30 ants split 15/15 between the two branches; at T = 1, they split 20/10.
What do Bees do?
• Foraging Behaviour (Waggle Dance)
• Task Partitioning and Division of Labour
• Scout-Recruit Concept (Tremble Dance)
• Group Decision Making and Colony Cooperation
• Regulating Hive temperature
• Communication : Food sources are exploited according to quality and distance from the hive
Waggle Dance
From: Ashish Umre
Wasps
• Pulp foragers, water foragers & builders
• Complex nests
– Horizontal columns
– Protective covering
– Central entrance hole
Pervasive Ants: Resource Discovery in Dynamic and Reconfigurable Networks using Artificial Ants
• Ants continuously explore new solutions
• Pulses (“drumming”) used to update resource tables. (Drumming is a modulatory communication signal in the European carpenter ants Camponotus herculeanus and C. ligniperda. The worker ants strike the surface of the wooden chambers and galleries in which they live with their mandibles and gasters, producing vibrations that can be perceived by nestmates for 20 centimetres or more. Much of the behaviour is classifiable as direct alarm communication. The behaviour of some categories is “tightened up”: transition probabilities are raised, and hence uncertainty is reduced. Modulatory communication appears to be a primitive phenomenon in ants and other social insects.)
• Adaptive to continuous node failure and addition of new nodes and resources, and change in traffic conditions
From: Ashish Umre
Ant-Based Control Introduction
• Ant Based Control (ABC) is introduced to route calls on a circuit-switched telephone network
– ABC is the first SI (swarm intelligence) routing algorithm for telecommunications networks
  • 1996
R. Schoonderwoerd, O. Holland, J. Bruten, L. Rothkrantz, Ant-based load balancing in telecommunications networks, 1996.
ABC: Overview
• Ant packets are control packets
• Ants discover and maintain routes
– Pheromone is used to identify routes to each node
– Pheromone determines path probabilities
• Calls are placed over routes managed by ants
• Each node has a pheromone table maintaining the amount of pheromone for each destination it has seen
– Pheromone Table is the Routing Table
ABC: Route Maintenance
• Ants are launched regularly to random destinations in the network
• Ants travel to their destination according to the next-hop probabilities at each intermediate node
– With a small exploration probability an ant will uniformly randomly choose a next hop
• Ants are removed from the network when they reach their destination
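Next-hop selection with occasional exploration can be sketched as follows. The sketch assumes that the pheromone-table row for the ant's destination is available as an array of probabilities summing to 1, indexed by neighbour; the class name, method name and parameters are hypothetical.

```java
/** Sketch of an ant's next-hop choice with a small exploration probability. */
final class AntNextHopSketch {
    /**
     * probabilities[n] is the pheromone-table probability of neighbour n for the ant's
     * destination. Returns the index of the chosen neighbour.
     */
    static int chooseNextHop(double[] probabilities, double explorationProbability,
                             java.util.Random rng) {
        if (rng.nextDouble() < explorationProbability) {
            return rng.nextInt(probabilities.length);   // explore: uniform over neighbours
        }
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int hop = 0; hop < probabilities.length; hop++) {
            cumulative += probabilities[hop];           // roulette-wheel selection
            if (u < cumulative) {
                return hop;
            }
        }
        return probabilities.length - 1;                // guard against rounding error
    }

    public static void main(String[] args) {
        java.util.Random rng = new java.util.Random(42);
        double[] row = {0.1, 0.7, 0.2};
        int[] counts = new int[row.length];
        for (int i = 0; i < 1000; i++) {
            counts[chooseNextHop(row, 0.05, rng)]++;
        }
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

The exploration branch keeps rarely used neighbours from being starved of ants, so the table can still react when a previously poor route becomes the better one.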
ABC: Routing Probability Update
• Ants traveling from source s to destination d lay s’s pheromone
– Ants lay a pheromone trail back to their source as they move
– Pheromone is unidirectional
• When a packet arrives at node n from previous hop r, and having source s, the routing probability to r from n for destination s increases
Ant Algorithm
An ant in the network is launched at node 3 with destination node 2, and has just travelled from node 4 to node 1. This ant will first alter node 1’s table corresponding to node 3 (its source node) by increasing the probability of selection of node 4; it will then select its next node randomly according to the probabilities in the table corresponding to its destination node, node 2.
• Every node has a pheromone table for every destination node in the network
• A node with four neighbours in a 30-node network has 29 pheromone tables with four entries each.
Ants going from node 1 to 3
Updating Pheromone table
• Ants can be launched from any node
• Select next node according to probabilities in the pheromone table for their destination nodes
• When ants arrive at a node – they update the probabilities of that node’s pheromone table (corresponding to their source node)
• Alter table to increase probability pointing to their previous node
• On reaching destination – ants die
Update law
• P = new probability (or pheromone) increase
• Probability can be reduced by operation of normalization (increase in another cell in table)
• Prob. can approach zero but never reaches it
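A minimal sketch of this update law, assuming the increase-then-normalise rule of the Schoonderwoerd et al. paper cited earlier: the entry for the ant's previous node becomes (p + Δp) / (1 + Δp) and every other entry becomes p / (1 + Δp). The function name `reinforce`, the node labels, and the value Δp = 0.25 are illustrative assumptions.

```python
def reinforce(row, previous_hop, delta):
    """Return an updated pheromone-table row after one ant arrives.

    row maps neighbour -> probability (entries sum to 1).
    previous_hop is the neighbour the ant has just come from.
    delta is the probability (pheromone) increase for that neighbour.
    """
    if previous_hop not in row:
        raise KeyError(previous_hop)
    updated = {}
    for neighbour, p in row.items():
        boost = delta if neighbour == previous_hop else 0.0
        # Dividing by (1 + delta) renormalises the row, which is how the
        # other entries are reduced without ever reaching zero.
        updated[neighbour] = (p + boost) / (1.0 + delta)
    return updated


# Usage: an ant arrives from n4 at a node with three neighbours.
row = {"n2": 0.25, "n4": 0.25, "n5": 0.5}
new_row = reinforce(row, "n4", delta=0.25)
```

The row still sums to 1 after the update, the reinforced entry rises, and the others shrink by the same factor, so an entry can approach zero but stays positive.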
Ant Algorithm
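Reinforced entry (for the neighbour m that the ant has just arrived from), where r^{i}_{m,s} is the weight held at node i for source s and neighbour m:
$$ r^{i}_{m,s}(t+1) = \frac{r^{i}_{m,s}(t) + \delta r}{1 + \delta r} $$
This equation specifies the new reinforced weight for the entry that corresponds to the ant’s last node.
All other entries (neighbours l ≠ m):
$$ r^{i}_{l,s}(t+1) = \frac{r^{i}_{l,s}(t)}{1 + \delta r} $$
This equation specifies the new weight for all other entries, which do not correspond to the ant's last node.
Reinforcement parameter: \delta r is set from the ant's age (the slide gives the constant 0.25).
This specifies the reinforcement parameter that is employed in the first two equations.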
From: Ashish Umre
Ageing
• Delta_p changes with the age of the ant
– Age == path length (each hop increases the ant's age)
– Ants moving along shorter routes arrive with a lower age, and so have a larger influence
– Age also includes the delay of ants at nodes that are congested
– A delayed ant's age increases more quickly
• Delay reduces the flow rate of ants to neighbouring nodes, which prevents those ants from affecting the pheromone tables
ABC: Route Selection (Call Placement)
• When a call is originated, a circuit must be established
• The highest probability next hop is followed to the destination from the source
• If no circuit can be established in this way, the call is blocked
• Calls operate independently of ants
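Call placement as described above can be sketched like this. It is an illustration under assumptions: the nested-dictionary table layout, the `has_capacity` callback, and the loop check are hypothetical choices for the example, not details given on the slides.

```python
def place_call(tables, source, destination, has_capacity):
    """Follow the highest-probability next hop from source to destination.

    tables[node][destination] maps neighbour -> routing probability.
    has_capacity(node) reports whether a node can carry one more call.
    Returns the circuit as a list of nodes, or None if the call is blocked.
    """
    path = [source]
    node = source
    while node != destination:
        row = tables[node][destination]
        next_hop = max(row, key=row.get)  # greedy choice, no sampling
        # Block the call on a routing loop or a saturated node (assumed rules).
        if next_hop in path or not has_capacity(next_hop):
            return None
        path.append(next_hop)
        node = next_hop
    return path


# Usage: a three-node network with tables for destination "C" only.
tables = {
    "A": {"C": {"B": 0.7, "C": 0.3}},
    "B": {"C": {"A": 0.1, "C": 0.9}},
}
circuit = place_call(tables, "A", "C", has_capacity=lambda n: True)
blocked = place_call(tables, "A", "C", has_capacity=lambda n: n != "B")
```

Calls read the tables deterministically while ants update them probabilistically, which is one way to read the statement that calls operate independently of ants.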
ABC: Initialization
• Pheromone Tables are randomly initialized
• Ants are released onto the network to establish routes
• When routes are sufficiently short, actual calls are placed onto the network
• Calls and ants dynamically interact
• New calls influence the load on nodes, which in turn influences the ants by means of a delay mechanism
Relationship between calls, node utilisation, pheromone tables and ants. An arrow indicates the direction of influence
From: Ashish Umre
Average Packet Delay (With the Algorithm)
From: Ashish Umre
Average Packet Delay (Without the Algorithm)
From: Ashish Umre
Packet and Pulse Loss (With the Algorithm)
From: Ashish Umre
Packet and Pulse Loss (Without the Algorithm)
From: Ashish Umre
Design Concerns
• Swarm Intelligent Systems are hard to
‘program’ since the problems are usually
difficult to define
– Solutions are emergent in the systems
– Solutions result from behaviors and
interactions among and between individual
agents
Summary of ABC
• Ants regularly launched with random destinations
• Ants walk randomly according to probabilities in pheromone tables for their particular destination
• Ants update the probabilities in the pheromone table for the location they were launched from, by increasing the probability of selection of their previous location by subsequent ants.
• The increase in these probabilities is a decreasing function of the age of the ant, and of the original probability.
• This probability increase could also be a function of penalties or rewards the ant has gathered on its way.
• The ants get delayed on parts of the system that are heavily used.
• The ants could eventually be penalised or rewarded as a function of local system utilisation.
• To avoid overtraining through freezing of pheromone trails, some noise can be added to the behaviour of the ants.
Possible Solutions to Create Swarm Intelligence Systems
• Create a catalog of the collective behaviours
• Model how social insects collectively perform tasks
– Use this model as a basis upon which artificial variations can be developed
– Model parameters can be tuned within a biologically relevant range or by adding non-biological factors to the model
What are Ad Hoc Networks?
• Ad Hoc networks are
– self-organising multi-hop wireless networks;
– no fixed infrastructure, such as base stations or routers, is required;
– ad hoc networks are rapidly deployable networks;
– all mobile hosts are embedded with packet forwarding capabilities;
From: Ashish Umre
Current Routing Algorithms for Ad hoc Mobile Wireless Networks
• Table Driven Routing Protocols:
– Destination-Sequenced Distance Vector Routing (DSDV)
– Clustered Gateway Switch Routing (CGSR)
– The Wireless Routing Protocol (WRP)
• Source-Initiated On-Demand Routing:
– Ad hoc On-Demand Distance Vector Routing (AODV)
– Dynamic Source Routing (DSR)
– Temporally-Ordered Routing Algorithm (TORA)
– Associativity-Based Routing (ABR)
– Signal Stability Routing (SSR)
From: Ashish Umre
Four Ingredients of Self Organization
• Positive Feedback
• Negative Feedback
• Amplification of Fluctuations - randomness
• Reliance on multiple interactions
Positive Feedback
Positive Feedback reinforces good solutions
• Ants are able to attract more help when a food source is found
• More ants on a trail increases pheromone and attracts even more ants
Negative Feedback
Negative Feedback removes bad or old solutions from the collective memory
• Pheromone Decay
• Distant food sources are exploited last
– Pheromone has less time to decay on closer solutions
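The interplay of positive feedback (deposit) and negative feedback (decay) can be sketched as below. This is a toy model under assumptions: the multiplicative decay rule, the rate of 0.1, and the visiting schedule (the near trail is reinforced every step, the far trail every third step) are all invented for the illustration.

```python
def evaporate(pheromone, rate):
    """Negative feedback: every trail loses a fixed fraction per step."""
    return {trail: level * (1.0 - rate) for trail, level in pheromone.items()}


def deposit(pheromone, trail, amount):
    """Positive feedback: an ant completing a trip reinforces its trail."""
    updated = dict(pheromone)
    updated[trail] = updated.get(trail, 0.0) + amount
    return updated


# Usage: ants finish round trips to the near source more often, so its
# trail has less time to decay between reinforcements.
trails = {"near": 0.0, "far": 0.0}
for step in range(1, 31):
    trails = evaporate(trails, rate=0.1)
    trails = deposit(trails, "near", 1.0)
    if step % 3 == 0:
        trails = deposit(trails, "far", 1.0)
```

After a few dozen steps the near trail holds clearly more pheromone than the far one, which is the mechanism by which closer solutions are exploited first.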
Randomness
Randomness allows new solutions to arise and directs current ones
• Ant decisions are random
– Exploration probability
• Food sources are found randomly
• Initially an ant will attempt to follow a random path to “explore” possible food sources
Multiple Interactions
No individual can solve a given problem. Only through the interaction of many can a solution be found
• One ant cannot forage for food; pheromone would decay too fast
• Many ants are needed to sustain the pheromone trail
• More food can be found faster
• “Swarm” behaviour
Stigmergy in Action
This general “Clustering” behaviour is a key theme in such approaches
Ants Agents
• Stigmergy can be operational
– Coordination by indirect interaction is more appealing than direct communication
– Stigmergy reduces (or eliminates) communications between agents
SI Advantages for Routing
SI based algorithms generally enjoy:
• Multipath routing
– Probabilistic routing will send packets all over the network
• Fast route recovery
– Packets can easily be sent to other neighbors by recomputing next-hop probabilities
• Low Complexity
– Little special purpose information must be maintained aside from pheromone/probability information
More SI Advantages for Routing
• Scalability
– As with ant colonies numbering in the millions, SI algorithms can potentially scale across several orders of magnitude
• Distributed Algorithm
– SI based algorithms are inherently distributed
SI Disadvantages for Routing
SI also suffers from:
• Directional Links
– Bidirectional links are generally assumed by using reverse paths
• Novelty
– SI is a relatively new approach to routing. It has not been characterized very well analytically
Pharaoh Ant (Monomorium pharaonis)
• Colony Behaviours
– Multiple Queening
– Nest Conflict and Cooperation
– Migration
– Clustering
• Analogies
– Resource Allocation, Discovery and Sharing
– Adaptive Clustering
From: Ashish Umre
Current Issues in Mobile Agent Technologies
• Application Issues
– Jumping Agents (Shopping, Taxi/Airport)
– Location Sensitive (Bluetooth, HomeRF)
– Profile Oriented
• Deployment Issues
– Is the Infrastructure ready?
• Security Issues
– Physical Mobility
– Logical Mobility
From: Ashish Umre
Mobile Agents
• Generalizing the “ant” based approach as a mobile agent
• A paradigm based on code mobility
– Remote Evaluation
– Code-on-demand (the Java Applet model)
– Peer-2-Peer
• Migrate from one host to another “autonomously”
– “Intelligent Viruses”? (do we really want these?)
– Lead to security nightmares
– Require writing in obscure languages (Tcl, Java etc)
• Provide an interesting paradigm for Grid computing
– Assuming other Grid infrastructure is there
How do they differ from other DC paradigms
• Host supported mobility vs. autonomous migration
– weak vs. strong mobility (Bradshaw and Suri’s work on Nomad, vs. Aglets or Voyager)
• What’s in a message?
– state
– code or data
• How large should a mobile agent be?
• Tracking a mobile agent (forwarders, location service, pheromone trails)
• Host assisted
– state persistence (vs. soft state)
– introspection
The overhyped differences between mobile objects and agents
• Mobile objects do not migrate autonomously
– control transfer issues
• Mobile objects generally part of some application
– limited or no access to a separate execution context
• Mobile object granularity is generally much finer
– agents must carry code to interact with host (context or place)
• Mobile objects do not support a well defined API
– such as moveTo, retract, dispatch etc
• Division of application into agents vs. objects will be different
• Absence of any standard framework
The overhyped reasons for why mobile agents are (apparently) useful
• Reduce network load
• Overcome network latency
• Can encapsulate a protocol
• Can execute autonomously and asynchronously
• Can dynamically adapt their itinerary
• May be heterogeneous
• Are robust and can sustain faults in their environment
and why not …
• all of the above can be done via messaging
• too many security issues to be useful
• unlikely to be supported by host platforms (standardisation has not resulted in anything useful)
• too hard to code, and abstraction is not obvious
Standardisation
• MASIF (Mobile Agent System Interoperability Facility)
– Crystaliz, General Magic, IBM, GMD Fokus, Open Group
• Address interface between agent systems, and not agent applications
• MASIF Aim: Enable mobile agents to travel across various hosts in an open environment
• Support for locating an agent (MAFFinder)
• Released via OMG
MASIF
Standardise on four areas:
• Agent Management
– use of standard operations to manage agents from different vendors
• Agent Transfer
– use of standard operations to create and migrate agents from different agent systems
• Agent and Agent System Naming
– use of standard Syntax and Semantics of parameters
– part of MAFFinder
• Agent System Type and Location Syntax
– use of standard syntax for location
– part of MAFFinder
MASIF … 2
void create_agent (
    in Name agent_name,
    in AgentProfile agent_profile,
    in OctetString agent,
    in string place_name,
    in Arguments arguments,
    in ClassNameList class_names,
    in string code_base,
    in MAFAgentSystem class_provider)
  raises (ClassUnknown, ArgumentInvalid,
          SerializationFailed, MAFExtendedException);
IDL Definition
MASIF … 3
Location find_nearby_agent_system_of_profile(
    in AgentProfile profile)
  raises (EntryNotFound);
void resume_agent(
    in Name agent_name)
  raises (NameInvalid, ResumeFailed);
void list_all_agents_of_authority(
    in Authority authority);
NameList list_all_agents();
Location list_all_places();
IDL Definition
MASIF … 4
interface MAFFinder {
    void register_agent(
        in Name agent_name,
        in Location agent_location,
        in AgentProfile agent_profile)
      raises (NameInvalid);
    void register_agent_system(
        in Name agent_system_name,
        in Location agent_system_location,
        in AgentSystemInfo agent_system_info)
      raises (NameInvalid);
IDL Definition
MASIF … 5
    // interface MAFFinder (continued)
    Location lookup_agent(
        in Name agent_name,
        in AgentProfile agent_profile)
      raises (EntryNotFound);
    Location register_place(
        in string place_name,
        in Location place_location)
      raises (NameInvalid);
};
IDL Definition
At each host ...
• An Agent Server
– one or more such servers can co-exist on a particular machine
– an agent server must be identifiable by a unique URL
– must also be able to launch and subsequently support tracking of the agent
• System support for migratable, non-persistent code
– memory, CPU
• System support for handling local security policy
– sandbox, authentication/access control mechanism, certificate verification mechanism, etc
MA Lifecycle
[Figure: mobile agent lifecycle, based on IBM Aglets. Operations shown: create (from a class file), dispatch, retract, deactivate, activate, dispose.]
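The lifecycle operations named on this slide can be read as a small state machine. The sketch below is an assumption-heavy illustration in Python, not the Aglets API: the class `MobileAgent`, the three states, and the rule that only an active agent may move are hypothetical choices made to keep the example small.

```python
class MobileAgent:
    """Toy model of the create/dispatch/retract/deactivate/activate/dispose cycle."""

    def __init__(self, name, home):
        # "create": the agent starts active on its home host.
        self.name = name
        self.home = home
        self.host = home
        self.state = "active"

    def _require(self, *allowed):
        if self.state not in allowed:
            raise RuntimeError(f"{self.name} is {self.state}")

    def dispatch(self, host):
        """Move the agent to a remote host."""
        self._require("active")
        self.host = host

    def retract(self):
        """Pull the agent back to its home host."""
        self._require("active")
        self.host = self.home

    def deactivate(self):
        """Suspend the agent (for example, to storage)."""
        self._require("active")
        self.state = "inactive"

    def activate(self):
        """Resume a suspended agent."""
        self._require("inactive")
        self.state = "active"

    def dispose(self):
        """End the agent's life; no further operations are allowed."""
        self._require("active", "inactive")
        self.state = "disposed"


# Usage: one trip out and back, then a suspend/resume cycle.
agent = MobileAgent("a1", home="hostA")
agent.dispatch("hostB")
agent.retract()
agent.deactivate()
agent.activate()
```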
Why are they useful in Grids?
• Important code delivery paradigm
• Must operate in the context of existing Grid systems
– may alleviate some issues with mobility
• Support essential needs of Grid computing
– software and protocol updates
– load balancing and migration
– user migration
• Most importantly -- they support a “Demand Oriented” style of computing
– move computation and data “on demand”
– move a limited set of functionality “on demand”
Achieving Parallelism
• Mobile Agents also useful to support parallelism at a coarser granularity
– simultaneous dispatch of agents to multiple sites
– simultaneous dispatch of messages to multiple sites via specialised group formation (aspect of “Spaces” -- formed through multicast groups)
– Integration with existing message passing libraries (MPI or PVM) via the host machine
• Achieved parallelism can be more dynamic
– Agents can decide where to migrate vs. pre-defined message transfer based on MPI or PVM
• May not be useful for “production grade” parallelism
Supporting Mobility
• Object Identity: killing the old object once a copy is sent to a remote host (address space) -- use of Java garbage collection when no references exist to the object
– mobile object pool
• Object Serialisation: what happens to private, transient and state variables -- when to move?
– java.io.Serializable
– serialization of threads?
• State synchronisation and sharing: HORUS -- object server?
• Concurrency through Actors (objects that own their own thread) -- Actors are non-blocking
Explicit Serialization
• Via the Externalizable interface in Java
– must be manually implemented by programmer
– can customise how an object’s fields are mapped to a stream
– means of checkpointing state (includes object’s field values + metadata about class version, and field types)
– Write out all visible states of a thread to a stream, read back state, initiate a thread
• Consider method invocation as a “single” unit of computation
– allow thread read only before or after a method invocation (i.e. no active threads)
• Access to stack variables
– stack variables made part of object’s state
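The idea of hand-written serialisation can be sketched in Python as a rough analogue of Java's Externalizable. Everything named here is hypothetical: the `Counter` class, the `write_external` / `read_external` method names (borrowed loosely from the Java interface), and the JSON stream format with a version field.

```python
import json


class Counter:
    """An object that decides for itself which fields go into the stream."""

    FORMAT_VERSION = 1

    def __init__(self, start=0):
        self.value = start
        self.scratch = []  # transient working data, deliberately not saved

    def step(self):
        """One method invocation, treated as a single unit of computation."""
        self.value += 1
        self.scratch.append(self.value)

    def write_external(self):
        # Checkpoint only between method invocations: field values plus
        # metadata about the format version.
        return json.dumps({"version": self.FORMAT_VERSION, "value": self.value})

    @classmethod
    def read_external(cls, stream):
        state = json.loads(stream)
        if state["version"] != cls.FORMAT_VERSION:
            raise ValueError("unsupported checkpoint version")
        return cls(start=state["value"])


# Usage: checkpoint after two steps, restore elsewhere, and continue.
original = Counter()
original.step()
original.step()
checkpoint = original.write_external()
restored = Counter.read_external(checkpoint)
restored.step()
```

Restricting checkpoints to the gaps between method calls avoids having to capture a live call stack, at the cost of coarser migration points.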
Custom Classloaders
• Can also implement custom classloaders
• Classloader used to:
– dynamically determine which code to migrate
– which code should be released
– how code interacts with the operating environment
• Classloaders are a useful way to extend existing Grid systems
– use of the CoG Java toolkit or OGSA to link to Globus
– interactivity between existing scheduling systems
• Offer class loading features as a Grid Service
– characterised by application features?
• Classloaders take away intelligence from migrating code -- hence not the ideal solution
Write your own Classloader()
• Extend “Primordial Classloader” in Java
– invoked after calling main() method
– Matrix m = new Matrix() ; -- execute “new” bytecode
– System.out.println() -- invoke static reference to class (putstatic, getstatic etc)
• Class loaders enable Java apps (EMACS or Scientific codes) to be dynamically extended
• Byte code verifier - defineClass, ClassFormatError
• Package over-write/addition: java.lang.hackit -- protect system namespace
• Multiple Classloaders can co-exist
Dynamic Itinerary
• A mobile agent may visit a number of hosts
• This itinerary may change over time
– based on data collected at intermediate hosts
– may not return to host machine
• Itinerary may be dictated by a particular host
– agent may override this
• Dynamic itinerary useful in Grid context
– load may not be known beforehand
– hosts may not always be available or reliable
– services may not always be present
– users/experts may migrate
Locating an agent
• Use of proxy
– local proxy to track agent
• Forwarders
– creating a chain of non-persistent forwarders
– pheromone based approaches
• A location service
– event notification service
– query service
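The forwarder approach can be sketched as follows: each host an agent leaves keeps a pointer to where the agent went, and a lookup walks the chain. The `Host` class, the `migrate` and `locate` functions, and the hop limit are assumptions made for this illustration, not part of any agent platform's API.

```python
class Host:
    """A host that may hold agents and forwarding pointers for departed ones."""

    def __init__(self, name):
        self.name = name
        self.agents = set()
        self.forwarders = {}  # agent name -> Host the agent moved to


def migrate(agent, source, target):
    """Move an agent and leave a forwarder behind on the source host."""
    source.agents.discard(agent)
    source.forwarders[agent] = target
    target.forwarders.pop(agent, None)  # a stale pointer would create a loop
    target.agents.add(agent)


def locate(agent, start, max_hops=100):
    """Follow forwarders from a known host; return the current Host or None."""
    host = start
    for _ in range(max_hops):
        if agent in host.agents:
            return host
        host = host.forwarders.get(agent)
        if host is None:
            return None  # chain broken, e.g. a forwarder was discarded
    return None


# Usage: agent "x" hops A -> B -> C and is then found starting from A.
a, b, c = Host("A"), Host("B"), Host("C")
a.agents.add("x")
migrate("x", a, b)
migrate("x", b, c)
found = locate("x", a)
```

Lookup cost grows with the length of the chain, and a lost forwarder breaks it, which is one reason a separate location service is listed as an alternative.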
Application scenario: Load gathering
• Sensors measure network load
– similar to SNMP
• Report this to an event gateway and monitor this at a given control site
• JAMM system an example
– other work taking place in the Global Grid Forum Network Monitoring group
• Mobile agent may be used to gather load
– carry a schema for gathering parameters
– interact via local host to SNMP gateway
– record local parameters and carry statistics
– pass through a given host to lodge results
– itinerary may be application dependent
Java Agent Measurement and Monitoring (JAMM) - LBNL
JAMM scenario
Load gathering
Application Profiles
• Application categories:
– restrict itinerary
– identify common patterns
• Resource suggestions
– identify common patterns
– resource characteristics
• MA-MA interaction
– used to inform about other resources
– share application requirements
– determine commonality in applications
Load imposed by Mobile Agents
• MA performance becomes an issue
• Issues
– Where should a mobile agent visit next?
– What should the mobile agent carry vs. leave behind?
– How long does a mobile agent spend on a given host?
– How long does it take for a mobile agent to travel from A->B?
• Need for tools that can help gather this data
– Recorded within each agent
– Support for specialised services which gather this
– Data can be queried based on MA authorisation
David Kotz, Guofei Jiang et al. (Dartmouth College)
Fernando Pinel, Omer F. Rana (Cardiff)
Benchmarking
• MA benchmarking efforts also important in this context.
• Benchmarks can be micro-
– create (locally or remotely) and dispatch an agent
– Retrieve an agent
– blocking and non-blocking message exchanges
• or macro-
– forwarding
– roaming
– proxy servers
M. Dikaiakos, M. Kyriakou, G. Samaras, "Performance Evaluation of Mobile-agent Middleware: A Hierarchical Approach." In Proceedings of the 5th IEEE International Conference on Mobile Agents, J.P. Picco (ed.), Lecture Notes of Computer Science series, vol. 2240, pages 244-259, Springer, Atlanta, USA, December 2001
Additional uses: Consumer Grids
• More open perspective on Grids
• Individuals and organisations can operate as suppliers of services/resources
• Service providers must be able to:
– Dynamically download software to participate on the Grid
– Varying resource capabilities
– Dynamically determine resource properties
• Resource aware visualisation
– Remotely configure resource
• Mobile agents provide an important abstraction
• Many existing technologies are useful contenders: Peer-2-Peer and Web Services
Resource sharing
• Peer-2-Peer
– CPU sharing (Entropia, Parabon, UD, SETI@HOME)
– File sharing (Napster, Gnutella, Freenet)
• CPU sharing
– Utilisation of free cycles via standard downloads
– Requires upload of data on which to operate
– Generally high redundancy and replication
• File Sharing
– Search for common file types, and support file placement
– Use of indexing or intermediate servers
• Development libraries: JXTA
Resource Sharing … 2
• In MA:
– CPU sharing: migration of mobile agent
– File sharing: migration of associated data and state
• Migration and execution can be more intelligent
• Use of forwarding and location services can be coupled with additional services:
– Work distribution and current state of computation
– Resource events to support migration
• P2P infrastructure also useful:
– Development of itineraries via overlay networks or index servers
– Security issues (?)
File Space Management
• Cache management
– migration support for files (temporary results, configuration etc)
• File space re-ordering
– sharing of directory space across machines
– virtual “file stores”
• Results to common queries
– file placement closer to computation
– file replication to support availability levels
• Managing user and project groups
Common Themes
• Load balancing and migration
• Data capture (especially performance related)
• Trigger and configuration
– set up of execution at remote sites
– updates to execution or changes
– user set up
• Establishing dynamic resource groups
• Resource provisioning beyond regional and national centres
Concerns
• Dealing with licensed software
– proprietary code or data
• Dealing with production codes
– highly tuned performance
– issues of Grid computing are questionable here
• Domain decomposition
– issues in translating large scale codes to mobile agents
– where is the abstraction most suitable/relevant
• Interfaces between Grid systems and Mobile Agent systems
Issues … Swarm/Ant Systems
• Tragedy of the Commons: Self Organisation does not always produce the desired outcome (Thomas Schelling's Micromotives and Macrobehavior):
– El Farol Bar problem
– Sheep Grazing problem
• Some individuals and organizations are more comfortable and more efficient with hierarchical organizations that are more centrally controlled
Issues … 2
• Useful in an “experiment” and “explorative” environment
• System must be “non-conservative” in its approach to experiment and evaluate different system behaviours
El Farol Bar … 2
• Agents select a night (1—7)
– based on expected attendance or reward (from prior experience)
• Agent attends the bar
– Attendance on selected night
– Output of the reward function
• Update agent’s model of the system
• Agents cannot communicate with each other
• Global objective: Maximise cumulative reward of entire system
Tragedy of the Commons
• Self-interested gain of one member of the community is to the detriment of the whole community
• Pasture on which each agent keeps cattle
– Utility increases as the number of animals increases
– Overgrazing affects all agents detrimentally
• Agent needs to decide whether to cooperate or defect
Braess’ Paradox
• Agents traverse a network consisting of a set of nodes
– and a number of connections between the nodes
• Aim: each agent must reach its destination as quickly as possible
– Traffic networks, water supply networks, electrical networks etc
• BP: Addition of an extra link has a detrimental effect on performance
• Introducing a shortest path link in a network that has reached equilibrium
[Figure: two diagrams of a four-node network with nodes A, B, C and D]
Occurs when a community of agents is unable to coordinate their activities to take advantage of changes in the environment.
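A worked instance of the paradox, using the classic textbook numbers rather than anything given on the slides. Assumptions: 4000 agents travel from A to D; links A-B and C-D take flow/100 minutes; links B-D and A-C take a fixed 45 minutes; the added link B-C costs nothing.

```python
AGENTS = 4000      # assumed number of agents travelling from A to D
FIXED = 45.0       # minutes on the uncongested links B-D and A-C


def congested(flow):
    """Travel time in minutes on a load-dependent link (A-B or C-D)."""
    return flow / 100.0


def times_without_link(via_b):
    """Route times when via_b agents use A-B-D and the rest use A-C-D."""
    via_c = AGENTS - via_b
    return {"A-B-D": congested(via_b) + FIXED,
            "A-C-D": FIXED + congested(via_c)}


def times_with_link(via_b, via_c, via_shortcut):
    """Route times once the free link B-C exists (route A-B-C-D)."""
    load_ab = via_b + via_shortcut
    load_cd = via_c + via_shortcut
    return {"A-B-D": congested(load_ab) + FIXED,
            "A-C-D": FIXED + congested(load_cd),
            "A-B-C-D": congested(load_ab) + congested(load_cd)}


# Before: an even split is the equilibrium (both routes cost the same).
before = times_without_link(via_b=2000)
# After: everyone taking the shortcut is the equilibrium, because any
# single agent switching to an old route would do worse.
after = times_with_link(via_b=0, via_c=0, via_shortcut=4000)
```

Every agent travels 65 minutes before the link is added and 80 minutes after, even though no individual can improve by deviating (the old routes would now cost 85 minutes).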
Collective Intelligence (COIN)
• Developed at NASA by Wolpert et al.
• Scalable coordination technique for adaptable, learning based multiagent systems (MAS).
• All agents strive to maximise their local utility function.
• The goal of the system is to maximise the global utility function.
Collective Intelligence (COIN)
Local utility functions are derived from the global utility functions so that:
• Maximisation of local utility functions maximises the global utility function – the global optimum ‘lines up’ with the Nash Equilibrium.
• Local utility functions are learnable: good signal-to-noise ratio for learning algorithms.
• Agents are coordinated indirectly. Emergent behaviour is still possible as agents are not given explicit instructions and behaviour is not predefined.
Adapting Collective Intelligence
• We are aiming to adapt this technique for agents that can be deployed via the internet.
• COIN concentrates on specific applications: coordinating communications satellites, robotic rovers.
• We want to apply this technique dynamically and concentrate on software agents.
LEAF – Learning Agent FIPA Compliant Community Toolkit
• Utility functions assigned dynamically.
• Utility extended to form two separate types: functional utility and performance utility.
• Assignment of multiple utility functions possible.
• Java API provided to support development of FIPA compliant agents.
FIPA - Foundation for Intelligent Physical Agents
• Standards for interoperable agent systems.
• FIPA ACL: conversations consisting of FIPA performatives such as inform, request, query etc.
• Agent management system (AMS) and directory facilitator (DF) part of the FIPA platform.
• LEAF utilises FIPA-OS implementation from Emorphia.
Community Building Kit: LEAF
Four core concepts:
– LEAF agents
– LEAF utility functions
– ESNs
– LEAF tasks
Provides support for:
– JESS based policy description
– Reinforcement learning
LEAF Agent
LEAF: Learning Agent FIPA-Compliant Community Toolkit
Implementation of LEAF is based on FIPA-OS
[Figure: class diagram relating FIPA-OS (FIPAOSAgent class, Task class) and LEAF (LeafNode class, ESN class, LeafTask class)]
• Coordination: utility functions are assigned to agents by an environment service node.
[Figure: an ESN assigning utility functions f1 and f2 to agents in a community]
[Figure: two ESNs serving Community a and Community b with utility functions f1, f2 and f3; one agent is labelled “sum f2,f3”]
Multiple utility functions can be assigned
• Utility functions can have parameters that are not available locally to the agent.
[Figure: an ESN assigning utility function f1 to an agent in a community. O: observable properties; R: remote parameters]
Performance and Functional Utility
• Performance utility (P): speed of execution, number of tasks, CPU usage etc.
• Functional utility (F): decision making, learning - high level behaviour.
Performance Utility
• Provides a utility measure based on performance engineering related aspects
– Comms metrics: number of messages exchanged, size of message, response time
– Execution metrics: execution time, time to convergence, queue time
– Memory and I/O metrics: memory access time, disk access time
• The effect of implementation decisions (algorithms; languages) and deployment decisions (platforms; networks), can be assessed.
Functional Utility
• Utility based on “problem solving” capability
• Statically defined
– related to service properties (capability based)
– degree of match between task properties and service capability: syntax match (exact match), range match, semantic match (subsumption/subclass)
• Dynamically defined
– related to execution output (MSE)
Utility Function Implementation
• Extend the LocalUtilityFunction abstract class.
• Implement the compute() method.
• Functions have access to remote parameters and observable properties.
Utility Function Implementation
Utility functions
• Global Utility (G) = Σ_i Local Utility (U_i)
• U = (jobs of type X processed)/(jobs of type X submitted)
• U = 1/(idle time)
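The utility functions listed above can be sketched as below. LEAF itself is a Java toolkit, so this Python version is only an analogue: the names `LocalUtilityFunction` and `compute` come from the slides, but the method signature, the dictionary keys, and the `global_utility` helper are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class LocalUtilityFunction(ABC):
    """Base class: subclasses implement compute()."""

    @abstractmethod
    def compute(self, observable, remote):
        """Return the utility as a float.

        observable: the agent's own observable properties (assumed dict).
        remote: parameters not available locally to the agent (assumed dict).
        """


class CompletionRatio(LocalUtilityFunction):
    """U = (jobs of type X processed) / (jobs of type X submitted)."""

    def compute(self, observable, remote):
        submitted = observable["jobs_submitted"]
        return observable["jobs_processed"] / submitted if submitted else 0.0


class InverseIdleTime(LocalUtilityFunction):
    """U = 1 / (idle time)."""

    def compute(self, observable, remote):
        idle = observable["idle_time"]
        return 1.0 / idle if idle > 0 else float("inf")


def global_utility(agents):
    """G = sum over agents of their local utility U_i."""
    return sum(fn.compute(obs, rem) for fn, obs, rem in agents)


# Usage: two agents, each with one assigned utility function.
agents = [
    (CompletionRatio(), {"jobs_processed": 3, "jobs_submitted": 4}, {}),
    (InverseIdleTime(), {"idle_time": 2.0}, {}),
]
G = global_utility(agents)
```

Note that 1/(idle time) is unbounded as idle time approaches zero, so a real utility of this shape would probably need a cap or an offset before being summed with bounded ratios.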
Can you consider other utility functions that may be relevant?
For students
Access to utility functions
double computeFunctionalUtility()
– Computes the sum of all currently assigned functional utility functions.
double computePerformanceUtility()
– Computes the sum of all currently assigned performance utility functions.
String[] getFunctionalUtilityRequiredProperties()
– Returns the observable properties required to compute the agent’s functional utility functions.
String[] getPerformanceUtilityRequiredProperties()
– Returns the observable properties required to compute the agent’s performance utility functions.
Resource management
• The objective is to provide users with on-demand access to resources needed to execute applications.
• Each peer/agent can undertake three different roles: application agent, resource agent, broker agent.
• Multiple roles may be undertaken by the same peer.
• Each peer is an autonomous agent capable of learning within its environment with the goal of local utility maximisation.
Application Agents
• Accept applications from users.
• Decompose applications into tasks.
• Identify suitable resources for task execution, via broker agents.
• Schedule and submit tasks to resource agents.
• Manage dynamic application execution process.
• Coordinated learning may be of benefit in resource selection.
Resource Agents
• Manage access to a particular resource.
• Resources may be computational, visualisation, scientific, or instrumentation based.
• Resource agents allow tasks to be submitted and executed on the resource.
• Coordinated learning may allow resource agents to optimise resource properties, and prioritise tasks.
Broker Agents
• Maintain information about discovered resource agents.
• Offer a matchmaking service, aimed at allowing application agents to discover resource agents.
• Coordinated learning may allow brokers to optimise their matchmaking service.
Agent based resource management
• Previous work used planning based BDI agents within the same framework.
• Current research involves investigating whether agents can benefit from coordinated learning.
• The eventual goal is to integrate the two techniques.
Agent Communities
• Communities are centred on the application/resource type: computational (C), visualisation (V), scientific (S), instrumentation (I) – there can be multiple communities of the same type.
• When an agent joins a community, it is assigned a local utility function.
• The agent learns to optimise this function to benefit the community.
• Agents are allowed to join multiple communities in an attempt to maximise their utility.
Agent Communities
Each community has a global utility function, based on community objectives:
1. Peers acting as application agents process as many applications as possible.
2. Peers acting as resource agents process as many tasks as possible.
3. Peers acting as broker agents facilitate (1) and (2).
Global Utility Functions
where A is the number of applications processed by the community, idlei is the amount of time agent i spends idle. c1,c2 are constants
Application agent utility functions
where Aa is the number of applications processed by agent a, and Ja is the total resource usage time used by a. c1,c2 are constants
Resource agent utility functions
where Tr is the number of tasks processed by resource agent r, and idler is the total time spent idle by the resource. c1,c2 are constants
Broker agent utility functions
where n resources have been recommended by the resource agent, and Ul(i)Ti is the local utility of the recommended resource at the time of recommendation.
Simulations
• 4 communities – (C,V,S,I)
• 10 resource agents
• 3 application agents
• 1 broker agent
• The current focus is on resource agent learning – joining communities and updating resource properties
• Peers attempt to join communities in order to increase their utility, and will only remain in the community as long as their utility is above a certain threshold.
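The join-and-stay rule above can be sketched as follows. This is a hypothetical illustration: the function names, the set-based membership record, and the threshold value of 0.5 are assumptions, and the slides do not say how utility is measured inside each community.

```python
def try_join(memberships, community):
    """A peer tentatively joins a community it hopes will raise its utility."""
    return set(memberships) | {community}


def review_memberships(memberships, utility_in, threshold):
    """Keep only communities where the peer's utility is above the threshold.

    utility_in maps community -> the peer's current utility there.
    """
    return {c for c in memberships if utility_in[c] > threshold}


# Usage: a resource agent in the computational community (C) tries the
# visualisation community (V), does poorly there, and leaves again.
memberships = {"C"}
memberships = try_join(memberships, "V")
memberships = review_memberships(
    memberships, utility_in={"C": 0.8, "V": 0.2}, threshold=0.5
)
```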
[Figures: Global Utility and Number of Members plotted against time for the computational, visualisation, storage and instrumentation communities]
Current research objectives
• The aim is to allow peers to form communities, around which the collection of peers is ‘greater than the sum of their parts’.
• Current work involves the engineering of this application, and the evolution of the utility functions to include a greater degree of social context
• Learning is currently very difficult for the agents – need to allow learning algorithms to converge.
Toolkits: ABLE
• ABLE (Agent Building and Learning Environment)
• Support use of Java Beans
• Provides a host of pre-built functionality
• Also provides Tuning agents for:
– Load Balancing
– System Control function
AbleBeans – Java Agent Building Blocks
[Figure: two AbleBeans interacting through direct method calls and through AbleEvents (Notification Events and Action Events)]
• AbleBean, AbleRemoteBean: a Java interface (local and remote)
• AbleObject: AbleBean instantiation with autonomous thread
• Bean interactions: direct method calls and event passing
• AbleEvents: Notification and Action events with synchronous and asynchronous event handling
• AbleBeanInfo and Customizer required for use in Agent Editor
• Set of core data access and algorithm beans supplied
From Joe Bigus (IBM)
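The difference between synchronous and asynchronous event handling can be illustrated with a minimal sketch. It is written in Python rather than Java and does not use the ABLE API: `Bean`, `notify`, `handle` and `process_inbox` are hypothetical names, and a queue drained on demand stands in for the bean's own thread.

```python
from collections import deque


class Bean:
    """Hypothetical stand-in for a bean that can send and receive events."""

    def __init__(self, name):
        self.name = name
        self.listeners = []   # beans notified by this one
        self.inbox = deque()  # events waiting for asynchronous handling
        self.handled = []     # record of processed events, for inspection

    def add_listener(self, bean):
        self.listeners.append(bean)

    def notify(self, payload, asynchronous=False):
        """Send an event to every listener."""
        for bean in self.listeners:
            if asynchronous:
                bean.inbox.append(payload)  # queued; the sender does not wait
            else:
                bean.handle(payload)        # handled before notify() returns

    def handle(self, payload):
        self.handled.append(payload)

    def process_inbox(self):
        """Drain queued events; return how many were handled."""
        count = 0
        while self.inbox:
            self.handle(self.inbox.popleft())
            count += 1
        return count
```

With synchronous delivery the listener has already processed the event when `notify` returns; with asynchronous delivery nothing happens until the listener drains its own inbox.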
AbleAgents – Intelligent JavaBeans
[Figure: an AbleAgent containing AbleBean A, AbleBean B and AbleBean C; a Sensor (get app data) and an Effector (call app action) connect it to App/Service 1 and App/Service 2]
• AbleAgent, AbleRemoteAgent: a Java interface (extends AbleBean)
• Composable: can contain other AbleBeans and AbleAgents
• Sensors and Effectors: allow agents to interface with apps
• Can be distributed, synchronous or asynchronous (autonomous)
From Joe Bigus (IBM)
ABLE Component Library
• Machine Learning: Back propagation, Self organizing maps, Radial Basis Functions, TD-Lambda, Decision Trees, Naive Bayes
• Machine Reasoning: Script (procedures), Forward / Backward chaining, Predicate logic (Prolog), Rete-based pattern match, Fuzzy systems, Planning (STRIPS)
• Data Access/Analysis: Text/DB read/write, Cache, Filter, Transform, Statistical routines, Genetic algorithms, other math analysis
• Agents: Classification, Clustering, Prediction, Autotune (closed loop control), Storage manager (multiple QoS)
From Joe Bigus (IBM)
ABLE Application Design
[Figure: blocks labelled Application, Agent, Custom Beans (domain-specific), ABLE Core Beans and ABLE Library]
From Joe Bigus (IBM)
AbleBean Wrapper Design Pattern
[Figure: myAlgorithmBean, accompanied by myAlgorithmCustomizer and myAlgorithmBeanInfo, wraps theAlgorithm. Bean methods: myAlgorithmBean(), init(), process(), processTimerEvent(), getters(), setters(). Algorithm methods: theAlgorithm(), init(), process(), getters(), setters()]
• Allows easy integration of existing Java algorithms into the Able environment
• Requires creation of 3 Java classes: Bean wrapper, BeanInfo and Customizer
• Bean contains an instance of the algorithm and calls methods on it
• No (or minimal) source changes required in the algorithm class
From Joe Bigus (IBM)
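The wrapper idea (a bean that holds an instance of an unmodified algorithm and delegates to it) can be sketched independently of ABLE. The following is an illustration in Python, not the ABLE classes: `MovingAverage` plays the role of an existing algorithm, `MovingAverageBean` the role of the wrapper, and the BeanInfo and Customizer classes are omitted.

```python
class MovingAverage:
    """An existing algorithm class that knows nothing about any agent framework."""

    def __init__(self, window=3):
        self.window = window
        self.values = []

    def update(self, x):
        self.values.append(x)
        recent = self.values[-self.window:]
        return sum(recent) / len(recent)


class MovingAverageBean:
    """Hypothetical wrapper exposing init(), process() and a getter/setter pair."""

    def __init__(self, window=3):
        self._window = window
        self.input = None
        self.output = None
        self.init()

    def init(self):
        # (Re)create the wrapped algorithm from the current settings.
        self._algorithm = MovingAverage(self._window)

    def get_window(self):
        return self._window

    def set_window(self, window):
        self._window = window
        self.init()

    def process(self):
        # Delegate to the wrapped algorithm; no algorithm logic is duplicated here.
        self.output = self._algorithm.update(self.input)
        return self.output
```

The algorithm class is used as-is, which is the point of the pattern: only the wrapper has to follow the framework's conventions.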
Rule Blocks
<type> <name>() using <engine> { ruleList } ;
• Semantically equivalent to Java methods
• Can specify a return data type
• Can use pre-defined or user-defined name
• No formal parameter lists, use global vars
• Specify inference engine via using <engine> clause
• <engine> can be any AbleInferenceEngine Java subclass
• Body of ruleblock contains one or more Rules
• Use setControlParameter() built-in function to set goals, options, etc.
• Ruleblock can have local or shared working memory
ARL Rule Syntax
<ruleLabel> { preConditions } [priority] : <ruleBody>;
• ruleLabel – unique identifier in ruleset
• preConditions – list of Java objects (e.g. TimePeriods)
• priority – used in conflict resolution during inferencing
• Rule body must be one of the ARL rule types
• Example: myRule { weekdaysOnly } [ 3.0 ] : println("wow");
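How a label, preconditions and a priority might work together can be sketched outside ARL. This is a Python illustration under assumptions, not the behaviour of any particular ABLE inference engine: `Rule`, `fire_one` and `weekdays_only` are hypothetical names, and "fire the enabled rule with the highest priority" is one assumed conflict-resolution strategy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    label: str
    preconditions: List[Callable[[dict], bool]]
    priority: float
    body: Callable[[dict], None]

    def is_enabled(self, context):
        # A rule is a candidate only if every precondition holds.
        return all(check(context) for check in self.preconditions)


def fire_one(rules, context):
    """Fire the enabled rule with the highest priority; return its label or None."""
    enabled = [r for r in rules if r.is_enabled(context)]
    if not enabled:
        return None
    chosen = max(enabled, key=lambda r: r.priority)
    chosen.body(context)
    return chosen.label


def weekdays_only(context):
    # Stand-in for a TimePeriod-style precondition; 0 = Monday ... 6 = Sunday.
    return context["weekday"] < 5
```

A rule built as `Rule("myRule", [weekdays_only], 3.0, ...)` mirrors the slide's example: it is skipped at weekends and otherwise wins over any enabled rule with priority below 3.0.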
ABLE Rule Templates
• Allow IT Developer or Programmer to create rulesets and templates using WSAD editor
• Minimize external meta-data or artifacts
• Business user can create rules from templates using web-based UI
• Allow easy parameterization of rules and rule logic, with constraints on parameter values
• Reuse existing ABLE data types and ARL syntax
• Allow users to customize rule templates and create new rules
• Variable values are constrained based on ruleset author constraints
• Can generate individual rules or entire rulesets via templates
• Can edit generated rules using same authoring environment
ARL Rule Template Syntax
Ruleset myRuleTemplateExample {
  import com.ibm.myclass.Customer;
  variables {
    Customer customer = new Customer() ; // myclass type
    template Categorical customerLevel = new Categorical("gold", "silver", "platinum");
    template String salesMsg = new String("Thank you for shopping IBM"); // example msg
    template Continuous discountValue = new Continuous(0.01, 0.50); // allow range from 1% to 50%
    Double discount = new Double(0.0) ;
  }
  inputs { customer } ;
  outputs { discount } ;
  void process() {
    Rule1: if (a > b) then println("regular old rule") ;
    Rule2: if (a <= b) then println("another regular old rule") ;
    template myRuleTemplate1: if ( customer.level == customerLevel ) // NOTE: Rule is a template
      then { discount = discountValue ; println( salesMsg ) ; }
  }
}
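The constraint idea behind template variables (a business user may only pick values the ruleset author allowed) can be sketched as follows. This is a Python illustration, not ABLE code: the `Categorical` and `Continuous` classes here only borrow the names from the template above, and `make_rule` and `TEMPLATE_PARAMETERS` are hypothetical.

```python
class Categorical:
    """Parameter restricted to a fixed set of values."""

    def __init__(self, *allowed):
        self.allowed = set(allowed)

    def validate(self, value):
        if value not in self.allowed:
            raise ValueError(f"{value!r} is not one of {sorted(self.allowed)}")
        return value


class Continuous:
    """Parameter restricted to a closed numeric range."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def validate(self, value):
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} is outside [{self.low}, {self.high}]")
        return value


# Constraints chosen by the ruleset author (values taken from the template above).
TEMPLATE_PARAMETERS = {
    "customerLevel": Categorical("gold", "silver", "platinum"),
    "discountValue": Continuous(0.01, 0.50),
}


def make_rule(customer_level, discount_value, sales_msg):
    """Instantiate the template as a function: customer level -> (discount, message)."""
    level = TEMPLATE_PARAMETERS["customerLevel"].validate(customer_level)
    value = TEMPLATE_PARAMETERS["discountValue"].validate(discount_value)

    def rule(level_of_customer):
        if level_of_customer == level:
            return value, sales_msg
        return 0.0, None

    return rule
```

For example, `make_rule("gold", 0.10, "Thank you for shopping IBM")` yields a concrete rule, while a discount of 0.75 or a level of "bronze" is rejected before any rule is generated.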
Agent Properties
• Flexible
• Autonomic
• Generic
Autotune Agent Web-Tuning Scenario
[Figure: Users place load on an Apache Web Server; the AutoTune Agent (Modeling, Run-time Control) works with the tuning parameters KeepAlive and MaxClients and the measurements CPU and MEM to reach a Desired Utilization Level]
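A closed loop of this kind can be sketched with a very simple controller. The code below is not the AutoTune agent's algorithm: it is a plain integral-style controller, written in Python, acting on a made-up linear server model. The names `tune_max_clients`, `toy_server` and `gain`, and all numeric values, are assumptions for illustration only.

```python
def tune_max_clients(measure_utilization, desired, max_clients=150,
                     gain=100.0, steps=50, low=1, high=1024):
    """Repeatedly adjust MaxClients so measured utilization approaches `desired`."""
    history = []
    for _ in range(steps):
        error = desired - measure_utilization(max_clients)
        # Integral action: each correction is accumulated into the setting itself.
        max_clients = min(high, max(low, round(max_clients + gain * error)))
        history.append(max_clients)
    return max_clients, history


def toy_server(max_clients):
    # Hypothetical plant: utilization grows linearly with MaxClients, capped at 1.0.
    return min(1.0, 0.002 * max_clients)
```

Calling `tune_max_clients(toy_server, desired=0.6)` moves the setting from 150 towards 300 and settles just short of it, because rounding to whole clients leaves a small dead band. In a real deployment the model of the server would come from the modeling phase rather than being assumed.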
Design Phase I: System Modeling
iSeries System Administration using ABLE
[Figure: a SysAdmin Agent (SysAdminBrainRuleSet, SysAdminActionsRuleSet) together with Task/Info Agents and Action Agents; component labels: CPUWatcher, DiskWatcher, DiskPredictor, NOJWatcher, FindLargeObjects, findDuplcateJobs, Cleanup, FindRunawayJobs]
Performance Prediction using Neural Networks
[Figure: Monitor Data feeding a Neural Prediction Agent]
• Web Server running on Windows 2000
• Hit with variable workload, seasonality
• Capture Performance Monitor Data
• Train neural network to predict future response time
WinGamma
• Data analysis toolkit – especially for time series data
• Can support identification of:
  – Time series “embedding” dimension
  – Level of noise present within data
  – Based on the “Gamma” statistic
• Can be used prior to training a neural network
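The Gamma statistic itself can be sketched in a few lines. The code below is not WinGamma; it is a minimal Python version of the usual Gamma test construction, assuming one-dimensional inputs and p = 5 nearest neighbours (both choices are assumptions; for a time series the inputs would be delay vectors of the chosen embedding dimension). For each k it computes the mean squared distance to the k-th nearest neighbour in input space (delta) and half the mean squared difference of the corresponding outputs (gamma), then fits a straight line; the intercept estimates the noise variance.

```python
def gamma_test(xs, ys, p=5):
    """Return (intercept, slope) of the Gamma regression for 1-D inputs.

    The intercept is the estimate of the variance of the noise on ys.
    """
    n = len(xs)
    # For each point, the indices of its p nearest neighbours in input space.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: abs(xs[j] - xs[i]))[:p]
        for i in range(n)
    ]
    deltas, gammas = [], []
    for k in range(p):
        delta = sum((xs[neighbours[i][k]] - xs[i]) ** 2 for i in range(n)) / n
        gamma = sum((ys[neighbours[i][k]] - ys[i]) ** 2 for i in range(n)) / (2 * n)
        deltas.append(delta)
        gammas.append(gamma)
    # Ordinary least-squares line: gamma = intercept + slope * delta.
    mean_d = sum(deltas) / p
    mean_g = sum(gammas) / p
    sxx = sum((d - mean_d) ** 2 for d in deltas)
    slope = sum((d - mean_d) * (g - mean_g) for d, g in zip(deltas, gammas)) / sxx
    return mean_g - slope * mean_d, slope
```

An intercept near zero suggests the outputs are a smooth function of the inputs, so a neural network has something to learn; a large intercept suggests that much of the variation is noise that no model of these inputs will remove.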
WEKA: Waikato Environment for Knowledge Analysis
Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  – Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  – Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
Monitoring Tools
• NWS (Network Weather Service)
  – Support a forecasting model
  – Work at “application-level” and not necessarily at the network (resource) level
• NetLogger
  – Now supports instrumentation for Globus calls
  – Useful data capture process (event based)
  – Manage level of data captured
• Specialist support via Apache Web Server
  – Messaging and Execution time
From Brian Tierney (LBNL)
From: G. Obertelli (UCSB)
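The forecasting model behind NWS is usually described as running several simple forecasters side by side and reporting the one with the lowest accumulated error so far. A minimal sketch of that selection idea follows. It is written in Python and does not use NWS code: the three forecasters, the absolute-error score and the names `FORECASTERS` and `forecast` are assumptions for illustration.

```python
from statistics import mean, median


# Each forecaster maps the measurements seen so far to a prediction of the next one.
FORECASTERS = {
    "last_value": lambda history: history[-1],
    "running_mean": lambda history: mean(history),
    "median_of_5": lambda history: median(history[-5:]),
}


def forecast(series):
    """Return (best_forecaster_name, next_value_forecast) for a measurement series."""
    errors = {name: 0.0 for name in FORECASTERS}
    # Replay the series, scoring each forecaster on what it would have predicted.
    for t in range(1, len(series)):
        history = series[:t]
        for name, predict in FORECASTERS.items():
            errors[name] += abs(predict(history) - series[t])
    best = min(errors, key=errors.get)
    return best, FORECASTERS[best](series)
```

On a steadily rising series the last-value forecaster accumulates the least error, while on measurements that fluctuate around a stable level the averaging forecasters do better, so the reported forecast adapts to the behaviour of the resource being monitored.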
Additional Info.
• IBM Autonomic Computing Web site
  – http://www.research.ibm.com/autonomic/
• IBM Autonomic Computing Library
  – http://www-03.ibm.com/autonomic/library.html
• LEAF project
  – http://users.cs.cf.ac.uk/O.F.Rana/leaf/
• DIPSO/FAEHIM project
  – http://users.cs.cf.ac.uk/Ali.Shaikhali/faehim/
• WinGamma
  – http://www.cs.cf.ac.uk/wingamma/
• WEKA
  – http://www.cs.waikato.ac.nz/ml/weka/
• ABLE Toolkit – Tutorial
  – http://www.cs.iastate.edu/~colloq/docs/able2_bigus.ppt