Autonomic Computing Omer F. Rana (Cardiff University)

Page 1: Autonomic Computing Omer F. Rana (Cardiff University)

Autonomic Computing

Omer F. Rana (Cardiff University)

Page 2: Autonomic Computing Omer F. Rana (Cardiff University)

Overview

• Illustrative example:
  – Managing Web servers
  – Reference to IBM’s AC vision
• Use of SLAs to support system management
  – SLA standards, use of SLAs in adaptation
• Approaches to adaptation
  – Stigmergy (social insects)
  – Utility-based approaches
• Toolkits

Page 3: Autonomic Computing Omer F. Rana (Cardiff University)

Recap … AC

• Automating the management of computer resources

• System components are becoming more complex
  – Better functionality
  – Functionality is hard to appreciate
  – Interaction between components is not always obvious

• System admins under increasing pressure to respond to complexity

Page 4: Autonomic Computing Omer F. Rana (Cardiff University)

AC … 2

• Manual tuning
  – Generally script driven (requires updates to configuration files)
  – Error-prone process (requires skilled personnel)
• Automated tuning
  – Try to model the behaviour of the system
  – Use this model as a “predictive” tool to determine the likely response of the system
  – Design feedback control mechanisms (and use on-line operation to adjust control)

Page 5: Autonomic Computing Omer F. Rana (Cardiff University)

AC application

Can be applied at two levels:

• Individual component level
  – Make each component more intelligent
  – Provide support infrastructure around this intelligent component
• Interaction level
  – Facilitate better interaction between components in some way
  – Allow “useful” interactions to “emerge”

Page 6: Autonomic Computing Omer F. Rana (Cardiff University)

Four Concepts

• Self-configuring:
  – Dynamic adaptation to changing environment
  – Addition of new features dynamically
• Self-healing:
  – Discover, diagnose and react to disruptions
  – Handling failure and isolating a component
• Self-optimising:
  – Monitor and tune resource utilisation
  – Includes: dynamic partitioning, workload management
• Self-protecting:
  – Anticipate/identify, detect and protect from attacks
  – Extend existing security infrastructure to achieve this

Page 7: Autonomic Computing Omer F. Rana (Cardiff University)

Relationship to other themes

• Machine Learning and AI
• Knowledge Management (Semantics)
• Coordination Mechanisms and Protocols
• System Administration
• Performance Engineering and Monitoring
• Related emerging areas
  – Ambient Intelligence
  – Amorphous Computing
  – Computational “Fabrics”

Page 8: Autonomic Computing Omer F. Rana (Cardiff University)

From: IBM

Page 9: Autonomic Computing Omer F. Rana (Cardiff University)

[Figure sequence, slides 9 to 12: provisioning servers in response to a workload surge]

Stages: 1. Steady state; 2. Monitor, detect surge; 3. Forecast, provision servers; 4. Monitor, remove servers

Plotted series: response time, actual BOPS, predicted BOPS, number of active servers, number of requested servers

From Alan Ganek, IBM

Page 13: Autonomic Computing Omer F. Rana (Cardiff University)

Apache Web Server Tuning

• Client-server model, with limits set by MaxClients and KeepAlive
  – Tuning is equivalent to modifying MaxClients and KeepAlive
• Performance metrics
  – End-user response time
  – Resource utilisation
  – CPU and memory utilisation
• Measure parameters on the server side
• Over-utilisation leads to thrashing and potential failure

Page 14: Autonomic Computing Omer F. Rana (Cardiff University)

Basis for Metrics

• Master process + pool of worker processes
• Each worker process handles interaction with a client
• Number of worker processes limited by MaxClients
• Worker process states: idle, waiting and busy
  – Idle (no TCP connection made)
  – Waiting (waiting for HTTP request from client)
  – Busy (processing request)
• Persistent HTTP/1.1: TCP connection remains open between consecutive HTTP requests (reduces time to set up a connection)
• A persistent connection can be terminated by the master or client process if the waiting time exceeds the maximum allowed by KeepAlive

Page 15: Autonomic Computing Omer F. Rana (Cardiff University)

Manual Tuning

Desired CPU level = 0.5 and memory level = 0.6

Page 16: Autonomic Computing Omer F. Rana (Cardiff University)

Manual Tuning … 2

Dynamic workload (additional requests at the 20th control interval)

Page 17: Autonomic Computing Omer F. Rana (Cardiff University)

Dynamic Workload

• To maintain CPU and Memory criteria, it is necessary to tune manually

• Achieved by adjusting MaxClients and KeepAlive parameters

• Dynamic workload (generally unpredictable) requires continuous re-tuning

• Trying to follow changes resulting from a dynamic workload can be a continuous process

Page 18: Autonomic Computing Omer F. Rana (Cardiff University)

AutoTune agents

• Autotune Adaptor Bean
  – Interfaces with the target system for service level metrics
  – Sets tuning parameters
• Autotune Controller Bean
  – Specifies control strategies (based on data captured)
  – Interacts with the system admin to configure the control strategy

Page 19: Autonomic Computing Omer F. Rana (Cardiff University)

AutoTune Functionality

• Manages (1) timer and (2) asynchronous events
• Can set (1) control and (2) sample intervals

Page 20: Autonomic Computing Omer F. Rana (Cardiff University)

AutoTune Architecture Data set generator

Page 21: Autonomic Computing Omer F. Rana (Cardiff University)

AutoTune Agent Operations

• Three agents:
  – Feedback controller design
    • Model-based controller
    • Linear Quadratic Regulation (LQR) controller
  – Modelling
    • Non-production/testing mode
    • Alters tuning parameters: MaxClients and KeepAlive
    • Records performance metrics: CPU and memory
    • Constructs dynamic model (based on time series)
  – Run-time control
    • Production mode
    • Uses output from controller to dynamically adjust MaxClients and KeepAlive

Page 22: Autonomic Computing Omer F. Rana (Cardiff University)

Modelling agent

• Build a mathematical model of the system
  – Queuing theory
  – Data analysis based
• Mathematical model
  – Requires understanding of the inner workings of the server
  – May need to know about particular properties (exceptions) of the way the server operates
• Data-based model (“blackbox” approach)
  – Gather data from the system in the “wild”
  – Assume a sufficient number of test cases has been covered
• User input
  – Range of tuning parameters: MaxClients [1,1024]; KeepAlive [1,50]
  – Maximum delay required for tuning parameters to take effect on the performance metrics: MaxClients (10m); KeepAlive (20m)

Linear model
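One way to read "construct dynamic model (based on time series)" is a least-squares fit of a linear difference equation to logged samples. The sketch below is an assumption, not the AutoTune code: it fits a scalar first-order model y(k+1) = a·y(k) + b·u(k), where y might be CPU utilisation and u a tuning parameter such as MaxClients. The class and method names are hypothetical.

```java
/** Least-squares fit of y(k+1) = a*y(k) + b*u(k) from logged samples (hypothetical helper). */
final class FirstOrderModelFit {
    /** Returns {a, b}. The array y must hold one more sample than u. */
    static double[] fit(double[] y, double[] u) {
        double syy = 0, syu = 0, suu = 0, syn = 0, sun = 0;
        for (int k = 0; k < u.length; k++) {
            syy += y[k] * y[k];
            syu += y[k] * u[k];
            suu += u[k] * u[k];
            syn += y[k] * y[k + 1];   // regressor y(k) against next output
            sun += u[k] * y[k + 1];   // regressor u(k) against next output
        }
        // Solve the 2x2 normal equations with Cramer's rule.
        double det = syy * suu - syu * syu;
        if (Math.abs(det) < 1e-12) {
            throw new IllegalArgumentException("inputs do not excite the system enough");
        }
        double a = (syn * suu - syu * sun) / det;
        double b = (syy * sun - syu * syn) / det;
        return new double[] {a, b};
    }
}
```

The modelling agent on the slide varies two parameters and records two metrics, so a fuller version would fit matrices rather than the two scalars used here.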

Page 23: Autonomic Computing Omer F. Rana (Cardiff University)

Feedback Control

• PID (proportional-integral-derivative) control
  – Corrects the error between a measured process variable and a desired set point
  – Calculates and outputs a corrective action to adjust the process accordingly

From Wikipedia

Page 24: Autonomic Computing Omer F. Rana (Cardiff University)

Feedback Control … 2

• Proportional: reaction to current error

• Integral: reaction based on recent error (time based)

• Derivative: reaction based on rate by which error has been changing

• Use a weighted sum of the three modes

• Output as a corrective action to a control element

Page 25: Autonomic Computing Omer F. Rana (Cardiff University)

Proportional Mode

• Responds to a change in the process variable proportional to the current measured error value

• Multiply the error by a constant Kp (proportional gain)

m: output signal; Kp: proportional gain; e: error (expected – actual); PB: proportional band

Page 26: Autonomic Computing Omer F. Rana (Cardiff University)

Integral Mode

• Controller output is proportional to the amount and duration of the error
• Algorithm calculates the accumulated proportional offset over time
• Leads to the controller approaching the required value more quickly, but contributes to system instability and may cause “overshoot”

m: output signal; Ti: integral time; e: error (expected – actual)

Page 27: Autonomic Computing Omer F. Rana (Cardiff University)

Derivative Mode

• Acts as a braking or damping action on the controller response as it overshoots
• Uses the slope of error vs. time (rate of error change)
• Controller may be slower to reach the required point (counters the work of the integral mode)

m: output signal; Td: derivative time; e: error (expected – actual)

Page 28: Autonomic Computing Omer F. Rana (Cardiff University)

Combining the three

• Output(t) = P + I + D

K_p = K; K_i = K/T_i; K_d = K T_d
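The weighted sum of the three modes can be sketched as a small discrete-time controller. This is the textbook form, assuming a fixed sample interval dt and a rectangular approximation of the integral. It is not taken from the AutoTune code, and the class name is hypothetical.

```java
/** Minimal discrete PID controller (textbook form, illustrative only). */
final class PidController {
    private final double kp, ki, kd, dt;
    private double integral = 0.0;
    private double previousError = 0.0;
    private boolean first = true;

    PidController(double k, double ti, double td, double dt) {
        this.kp = k;        // K_p = K
        this.ki = k / ti;   // K_i = K / T_i
        this.kd = k * td;   // K_d = K * T_d
        this.dt = dt;
    }

    /** Returns the corrective action for one sample interval. */
    double update(double setPoint, double measured) {
        double error = setPoint - measured;   // e = expected - actual
        integral += error * dt;               // accumulated error
        // No derivative term on the first sample: there is no previous error yet.
        double derivative = first ? 0.0 : (error - previousError) / dt;
        first = false;
        previousError = error;
        return kp * error + ki * integral + kd * derivative;
    }
}
```

A caller would invoke update once per control interval and apply the returned value to the control element.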

Page 29: Autonomic Computing Omer F. Rana (Cardiff University)

Run-time Control agent

• Implements an error feedback controller

• Makes use of a (1) desired, and (2) actual system utilisation

• Kp and Ki matrices obtained by the controller design agent

• Controller performance
  – Time to recover from a workload change in the system

e: error between actual and desired value at the kth interval

Accumulated error

Kp = proportional control gain, Ki = integral control gain (for steady-state error)
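The control law itself did not survive extraction. A common proportional-integral form that matches the quantities named on this slide is sketched below. It is an assumption, not the slide's equation: y(k) is the measured utilisation vector, v(k) the accumulated error, and u(k) the tuning parameters applied in interval k.

```latex
\begin{aligned}
e(k) &= y_{\text{desired}} - y(k) \\
v(k) &= v(k-1) + e(k) \\
u(k) &= K_p\, e(k) + K_i\, v(k)
\end{aligned}
```

With two metrics (CPU, memory) and two tuning parameters (MaxClients, KeepAlive), K_p and K_i would each be 2x2 matrices in this reading.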

Page 30: Autonomic Computing Omer F. Rana (Cardiff University)

Controller Design Agent

• Relies on output of modelling agent

• Aims to minimise a quadratic cost function (J(Kp,Ki))

• Q, R are weighting matrices: Q is a 4x4 matrix and R is a 2x2 matrix
• Q = diag(q1,q2,q3,q4), and R = diag(r1,r2)
  – q1=1, q2=2, q3=(1/10^2), q4=(1/2^2) (10% random CPU fluctuation, and 2% memory)
  – r1=(1/50^2), r2=(1/1000^2)
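The cost function is not reproduced in the extracted text. The standard LQR form consistent with Q = diag(q1,...,q4) and R = diag(r1,r2) is sketched here as an assumption: the four-entry vector stacks the two errors e(k) and the two accumulated errors v(k), and u(k) holds the two tuning parameters.

```latex
J(K_p, K_i) = \sum_{k=1}^{\infty}
\left(
\begin{bmatrix} e(k) \\ v(k) \end{bmatrix}^{\top} Q
\begin{bmatrix} e(k) \\ v(k) \end{bmatrix}
+ u(k)^{\top} R\, u(k)
\right)
```

Read this way, Q penalises deviation from the desired utilisation and R penalises large parameter moves, scaled by the parameter ranges (roughly 50 for KeepAlive and 1000 for MaxClients).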

Page 31: Autonomic Computing Omer F. Rana (Cardiff University)

Implementation

• Undertaken with ABLE (extends the AutoTune agent)
• Modelling agent
  – Data generator extends AutotuneController bean (extends the process() method)
  – ApacheAdaptor extends AutotuneAdaptor bean (implements socket connection with Apache Web server)
• Run-time Controller agent
  – Extends the AutotuneController bean
  – Also uses the ApacheAdaptor
• Controller Design agent
  – Extends the AutotuneController bean
  – Extends AutotuneAdaptor to read in model parameters from the Modelling agent

Page 32: Autonomic Computing Omer F. Rana (Cardiff University)

Experiment setup

• Linux (v2.2.16), Apache HTTP server v1.3.19
• MaxClients and KeepAlive parameters made dynamically modifiable
• Multiple clients supporting workload generator
  – WAGON (Web trAffic GeneratOr and beNchmark), Liu et al. (INRIA)
  – httperf to generate synthetic HTTP requests
  – File access distributions from WebStone 2.5
• Static and dynamic workloads used
  – Static: Web page requests, session arrivals followed a Poisson distribution (20 sessions/second)
  – Dynamic: Web page requests, session arrivals followed a Poisson distribution (10 sessions/second)
• Control parameters
  – Control interval (adaptation time): 5 seconds
  – Goal: CPU=0.5 and Memory=0.6

Page 33: Autonomic Computing Omer F. Rana (Cardiff University)

Automatic tuning of Apache Web Server (about 50 control intervals to converge)

Page 34: Autonomic Computing Omer F. Rana (Cardiff University)

With Dynamic Workload (at 20th Interval) – takes 20 intervals to adjust

Page 35: Autonomic Computing Omer F. Rana (Cardiff University)

Types of system components

• Computer Servers

• Web Servers

• Database systems

• Devices– Pervasive Computing– Ubiquitous Computing

Page 36: Autonomic Computing Omer F. Rana (Cardiff University)

Upgrades and Problem Diagnosis

Faulty modules

Page 37: Autonomic Computing Omer F. Rana (Cardiff University)

Upgrades and Problem Diagnosis

• Upgrade has 5 new autonomic modules

• Three modules found to be faulty (system reverts to old version)

• Analyse module dependencies

• Analyse log files to infer which of the three modules is the culprit

• Generate a “problem ticket” to software developer

Page 38: Autonomic Computing Omer F. Rana (Cardiff University)

QoS Management

• QoS has been explored in:
  – Computer networks
    • Bandwidth, delay, packet loss rate and jitter
  – Multimedia applications
    • Frame rate and computation resource
  – Grid computing
    • Network QoS, computation and storage requirements

Page 39: Autonomic Computing Omer F. Rana (Cardiff University)

Continue …

• QoS management:
  – Covers a range of different activities, from resource specification, selection and allocation through to resource release
• QoS system should address the following:
  – Specifying QoS requirements
  – Mapping of QoS requirements to resource capability
  – Negotiating QoS with resource owners
  – Establishing contracts / SLAs with clients
  – Reserving and allocating resources
  – Monitoring parameters associated with QoS sessions
  – Adapting to varying resource quality characteristics
  – Terminating QoS sessions

• User Expectations vs. Resource Management

Page 40: Autonomic Computing Omer F. Rana (Cardiff University)

When is QoS needed?

• Interactive sessions
  – Computational steering (control parameters & data exchange)
  – Interactive visualization (visualization & simulation services)
• Response within a limited time span
• Co-scheduling or co-location support

From SCIRun, University of Utah

– Application QoS: user perception, response time, application security, etc.
– Middleware QoS: computation, memory and storage
– Network QoS: bandwidth, packet loss, delay, jitter

Page 41: Autonomic Computing Omer F. Rana (Cardiff University)

What is a Service Level Agreement (SLA) and why is it useful for AC?

• A relationship between a client and a provider in the context of a particular capability (service) provision
• Distinguish between:
  – Discovery of a suitable provider (P2P search, directory service)
  – Establishment of an SLA
• Example exchange:
  – Client: “Can you do X for me for Y in return?” (SLA-Offer)
  – Provider: “Yes” (SLA-Accept; the alternative is SLA-Reject)
• SLA as a basis to support adaptive behaviour

Page 42: Autonomic Computing Omer F. Rana (Cardiff University)

What is an SLA?

• Client: “Can you do X for me for Y in return?” (SLA-Offer)
• Provider: “No, but I can do Z for Y” (SLA-CounterOffer)
• Client: “Accept” (SLA-Accept; the alternative is SLA-Reject)

Page 43: Autonomic Computing Omer F. Rana (Cardiff University)

What is an SLA?

• Client: “Can you do X for me for Y in return?” (SLA-Offer)
• Provider: “No”
• Client: “Can you do Z for me for Y in return?” (SLA-Offer)
• Negotiation phase (single or multi-round)
• Diagram labels: SLA-Offer, SLA-CounterOffer, SLA-Offer, Dependency
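The offer, counter-offer, accept and reject messages on these three slides can be sketched as a small state machine. This is an illustrative reading of the diagrams, not a protocol definition from any SLA standard; the type and method names are hypothetical.

```java
/** Illustrative SLA negotiation state machine (names are hypothetical). */
final class SlaNegotiation {
    enum State { OFFERED, COUNTER_OFFERED, AGREED, REJECTED }
    enum Message { SLA_OFFER, SLA_COUNTER_OFFER, SLA_ACCEPT, SLA_REJECT }

    /** Applies one message to the current state; a null state means no negotiation yet. */
    static State next(State state, Message message) {
        if (state == State.AGREED || state == State.REJECTED) {
            throw new IllegalStateException("negotiation already finished");
        }
        if (state == null) {
            if (message == Message.SLA_OFFER) {
                return State.OFFERED;
            }
            throw new IllegalStateException("negotiation must start with SLA-Offer");
        }
        switch (message) {
            case SLA_ACCEPT:
                return State.AGREED;
            case SLA_REJECT:
                return State.REJECTED;
            case SLA_COUNTER_OFFER:
                return State.COUNTER_OFFERED;
            default:
                return State.OFFERED;   // a fresh SLA-Offer opens another round
        }
    }
}
```

A multi-round negotiation is then a sequence of OFFERED and COUNTER_OFFERED states that ends in AGREED or REJECTED.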

Page 44: Autonomic Computing Omer F. Rana (Cardiff University)

Variations

• Multi-provider SLA
  – A single SLA is divided across multiple providers (e.g. workflow composition)
• SLA dependencies
  – For an SLA to be valid, another SLA has to be agreed (e.g. co-allocation)

Page 45: Autonomic Computing Omer F. Rana (Cardiff University)

What is an SLA?

• Dynamically established and managed relationship between two parties
• Objective is “delivery of a service” by one of the parties in the context of the agreement
• Delivery involves:
  – Functional and non-functional properties of the service
• Management of delivery:
  – Roles, rights and obligations of the parties involved

Page 46: Autonomic Computing Omer F. Rana (Cardiff University)

Forming the Agreement

• Distinguish between:
  – Agreement itself
  – Mechanisms that lead to the formation of the agreement
• Mechanisms that lead to agreement:
  – Negotiation (single or multi-shot)
  – One-shot creation
  – Policy-based creation of agreements, etc.

Page 47: Autonomic Computing Omer F. Rana (Cardiff University)

SLA Life Cycle

• Identify provider
  – On completion of a discovery phase
• Define SLA
  – Define what is being requested
• Agree on SLA terms
  – Agree on Service Level Objectives
• Monitor SLA violation
  – Confirm whether SLOs are being violated
• Destroy SLA
  – Expire SLA
• Penalty for SLA violation

Page 48: Autonomic Computing Omer F. Rana (Cardiff University)

WS-Agreement

• Framework for SLA creation, with an interface conforming to Web Services standards
• Service client/provider does not need to be a Web Service
• Provides a two-layered model:
  – Agreement layer: Web Service-based interface to create, represent and monitor agreements
  – Service layer: application-specific layer of the service being provided

Page 49: Autonomic Computing Omer F. Rana (Cardiff University)

WS-Agreement

Agreement Initiator may be Service Consumer or Service Provider

Service Layer

Agreement Layer

Page 50: Autonomic Computing Omer F. Rana (Cardiff University)

WS-Agreement

Agreement structure:

• Name/ID
• Context: information about the agreement (initiator, responder, expiration time)
• Terms Composition
  – Service Terms: information about the service; Service Description Terms (generally, these are domain dependent)
  – Guarantee Terms: information about the service level; Service Level Objectives, qualifying conditions for the agreement to be valid, penalty terms, etc.

Page 51: Autonomic Computing Omer F. Rana (Cardiff University)

WS-Agreement Terms

From: Viktor Yarmolenko (U Manchester)

Page 52: Autonomic Computing Omer F. Rana (Cardiff University)

WS-Agreement

• Specification for Service Level Agreements
  – Developed through the GRAAP WG at the Open Grid Forum
  – Previous efforts: WSLA (from IBM)
• Provides:
  – Schema for agreement terms
  – A very simple protocol (two stage)
  – A state sequence
  – Support for penalty clauses
• No support for negotiation

WS-Agreement Specification Document (GFD.107)

Page 53: Autonomic Computing Omer F. Rana (Cardiff University)

Data Center Scenario … 1

• Identical servers, dynamically allocated among multiple Web apps
• For each application:
  – Application Manager (performance optimisation) interacting with a Resource Arbiter (server allocation)
  – Optimisation goal (“expected business value”) defined by an “objective function”
• Resource Arbiter goal:
  – Allocate servers to maximise the sum of expected business value over all applications
  – Local value functions must share a common scale
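The arbiter's goal, maximising the sum of per-application values V_i(n_i) subject to a fixed number of identical servers, can be sketched with a small dynamic program. This is only an illustration of the objective, not the method used in the cited work; the class name and the value functions in the usage note are hypothetical.

```java
import java.util.function.IntToDoubleFunction;

/** Illustrative server allocation by dynamic programming (hypothetical helper). */
final class ResourceArbiter {
    /** Returns n[i] maximising the sum of value[i](n[i]) with the n[i] summing to at most servers. */
    static int[] allocate(IntToDoubleFunction[] value, int servers) {
        int apps = value.length;
        // best[i][s]: best total value for the first i applications using at most s servers.
        double[][] best = new double[apps + 1][servers + 1];
        int[][] choice = new int[apps + 1][servers + 1];
        for (int i = 1; i <= apps; i++) {
            for (int s = 0; s <= servers; s++) {
                best[i][s] = Double.NEGATIVE_INFINITY;
                for (int n = 0; n <= s; n++) {
                    double v = best[i - 1][s - n] + value[i - 1].applyAsDouble(n);
                    if (v > best[i][s]) {
                        best[i][s] = v;
                        choice[i][s] = n;
                    }
                }
            }
        }
        // Walk back through the recorded choices to recover the allocation.
        int[] allocation = new int[apps];
        int remaining = servers;
        for (int i = apps; i >= 1; i--) {
            allocation[i - 1] = choice[i][remaining];
            remaining -= allocation[i - 1];
        }
        return allocation;
    }
}
```

For example, with one application whose value saturates after two servers (10 per server) and another worth 3 per server, five servers are split as two and three. The comparison is only meaningful because both value functions use the same scale, which is the point of the last bullet above.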

Page 54: Autonomic Computing Omer F. Rana (Cardiff University)

Data Center Scenario … 2

Use of Reinforcement Learning

Resource Arbiter goal: allocate servers to maximize the sum of expected business value over all applications (assuming a common scale).

A Hybrid Reinforcement Learning Approach to Autonomic Resource Allocation, Gerald Tesauro et al., Proceedings of ICAC 2006, Dublin, Ireland.

Vi(.): utility curve; estimate of expected business value, e.g. payments minus penalties

Arbiter assigns list of assigned servers

Page 55: Autonomic Computing Omer F. Rana (Cardiff University)

Not all SLAs are equal

• Application events carry stock trade data
• Customer classes:
  – Gold customers: pay for data
  – Public customers: connected over the Internet
• Public customers get less information than Gold
• Gold customers expect reliable delivery
  – Need for acks, increasing overhead in the system
• Cannot alter flow rate to tolerate delays
  – But can support “admission” control

Utility: abstract measure of benefit to the user (seek to maximize this given available resources)

Page 56: Autonomic Computing Omer F. Rana (Cardiff University)

SLA Classes

Risk-Aware Limited Lookahead Control for Dynamic Resource Provisioning in Enterprise Computing Systems, Dara Kusic and Nagarajan Kandasamy, Proceedings of ICAC 2006, Dublin, Ireland.

Assumes the existence of multiple QoS classes

Page 57: Autonomic Computing Omer F. Rana (Cardiff University)

Control System Architecture

• r_alloc: rate allocated to a flow when it enters the system
• n_alloc: number of consumers (admitted for each class)

Utility-aware Resource Allocation in an Event Processing System, Sumeer Bhola, Mark Astley, Robert Saccone and Michael Ward, Proceedings of ICAC 2006, Dublin, Ireland.

Page 58: Autonomic Computing Omer F. Rana (Cardiff University)

Control System Strategies

• Assumes knowledge of some “good” (ideal) state
• Move system towards the good/ideal state
• Impacted by:
  – Response time (current good state transition)
  – Variability in operational environment (stability of approach)
  – Execution time
  – Discrete domain (tuning options from a finite set)
• Feedback control
  – PID
  – Kalman filter
• Neural network-based control
  – Use of learning approaches
• Rule-based approaches
  – Use of event recognition and triggers

Page 59: Autonomic Computing Omer F. Rana (Cardiff University)

Kalman Filters

• Discrete-time linear dynamic systems
• Modelled on a Markov chain (with noise)
• Linear operator applied to the state to generate a new state

Fk = state transition model applied to the previous state xk-1

Bk = control input model applied to the control vector uk

wk: process noise (normally distributed)
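The equation these symbols annotate is missing from the extracted text. The standard state-update equation of the Kalman filter model, which uses the same symbols, is given below; the covariance symbol Q_k is standard notation and does not appear on the slide.

```latex
x_k = F_k\, x_{k-1} + B_k\, u_k + w_k, \qquad w_k \sim \mathcal{N}(0, Q_k)
```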

Page 60: Autonomic Computing Omer F. Rana (Cardiff University)

Differentiated Quality of Service

[Figure: Silver, Gold and Platinum customers; a SAN Manager with Silver, Gold and Platinum policies; SAN storage]

From Joe Bigus (IBM)

Page 61: Autonomic Computing Omer F. Rana (Cardiff University)

SAN Manager Scenario Overview

• Uses the new AbleRuleAgent as a rules-based policy manager
• Models multiple quality of service levels (represented by rule sets)
• N systems are defined, each with an associated QoS level
• Requests include the system identifier and current utilization
• The SAN Manager:
  – Looks up the QoS level for that system
  – Invokes the corresponding QoS rule set
  – Rule sets recommend that allocations are unchanged, increased or decreased
  – SAN Manager evaluates recommendations and changes allocations based on the total capacity limit

From Joe Bigus (IBM)

Page 62: Autonomic Computing Omer F. Rana (Cardiff University)

Platinum QoS RuleSet

// Low allocation
: if Allocation is Low and Utilization is Low then RecommendedAction = NoAction;
: if Allocation is Low and Utilization is Normal then RecommendedAction = NoAction;
: if Allocation is Low and Utilization is High then RecommendedAction = IncreaseAllocation;

// Normal allocation
: if Allocation is Normal and Utilization is Low then RecommendedAction = DecreaseAllocation;
: if Allocation is Normal and Utilization is Normal then RecommendedAction = NoAction;
: if Allocation is Normal and Utilization is High then RecommendedAction = IncreaseAllocation;

// High allocation
: if Allocation is High and Utilization is Low then RecommendedAction = DecreaseAllocation;
: if Allocation is High and Utilization is Normal then RecommendedAction = DecreaseAllocation;
: if Allocation is High and Utilization is High then RecommendedAction = Send.Warning_LowMem;
: if Allocation is positively High and Utilization is positively High then RecommendedAction = Send.Warning_CritMem;

From Joe Bigus (IBM)
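The first nine Platinum rules form a 3x3 decision table, which can be sketched in plain Java as below. This is a simplification under stated assumptions: the levels are treated as crisp values, so the fuzzy "positively High" rule and its Warning_CritMem action are left out, and none of the names are ABLE APIs.

```java
/** Crisp version of the Platinum rule table (illustrative, not ABLE code). */
final class PlatinumRules {
    enum Level { LOW, NORMAL, HIGH }
    enum Action { NO_ACTION, INCREASE_ALLOCATION, DECREASE_ALLOCATION, WARN_LOW_MEM }

    static Action recommend(Level allocation, Level utilization) {
        switch (allocation) {
            case LOW:
                // Only grow a low allocation when it is heavily used.
                return utilization == Level.HIGH ? Action.INCREASE_ALLOCATION : Action.NO_ACTION;
            case NORMAL:
                if (utilization == Level.LOW) {
                    return Action.DECREASE_ALLOCATION;
                }
                return utilization == Level.HIGH ? Action.INCREASE_ALLOCATION : Action.NO_ACTION;
            default:
                // HIGH allocation: warn if it is also highly used, otherwise shrink it.
                return utilization == Level.HIGH ? Action.WARN_LOW_MEM : Action.DECREASE_ALLOCATION;
        }
    }
}
```

The Silver and Gold rule sets are not shown on the slides; under this sketch they would be further tables of the same shape with different entries.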

Page 64: Autonomic Computing Omer F. Rana (Cardiff University)

Dynamic SLA

• Limitations of a single agreement
  – Modifications since the agreement was put in place
• Cost of re-establishment
  – Not fully aware of the operating environment
• Flexibility in describing Service Level Objectives
  – Not sure what to ask for (not fully aware of the environment in which operating)
  – Too many violations

Page 65: Autonomic Computing Omer F. Rana (Cardiff University)

Dynamic WS-Agreement

• Case 1: Static Agreement
  – Identify Service Description Terms,
  – Guarantee Terms, and
  – Service Level Objectives (SLOs)
• Case 2: Dynamic Agreement
  – Identify Service Description Terms,
  – Guarantee Terms: defined as ranges or as functions
  – Service Level Objectives: defined as ranges or as functions

Page 67: Autonomic Computing Omer F. Rana (Cardiff University)

Function-based SLA (Yarmolenko et al.)

• Express initial SLA-Offer as a function of provider capability

From: Viktor Yarmolenko

Page 70: Autonomic Computing Omer F. Rana (Cardiff University)

Guarantee terms as functions

From: Viktor Yarmolenko

Page 75: Autonomic Computing Omer F. Rana (Cardiff University)

SLA Classes

• Guaranteed
  – Constraints to be exactly observed
  – SLA is precisely/exactly defined
  – Adaptation algorithm/optimization heuristics
• Controlled-load
  – Some constraints may be observed
  – Range-oriented SLA
  – Optimization heuristics
• Best-effort
  – Any resources will do
  – No adaptation support

Page 76: Autonomic Computing Omer F. Rana (Cardiff University)

SLA Adaptation

Aim: compensation for QoS degradation for the ‘guaranteed’ class only

• Assume total capacity: C = CG + CA + CB (guaranteed, adaptive and best-effort partitions)
• ‘Best effort’ can use the adaptive capacity, as long as it is not used by the ‘guaranteed’
• When QoS degrades for ‘guaranteed’, the adaptive capacity is utilized to compensate for the degradation
• ‘Best effort’ can still utilize the remaining adaptive capacity, as long as it is not used by the ‘guaranteed’
• When the congested capacity is restored, the adaptive capacity can be used entirely by the ‘best effort’

[Figure: five snapshots of how capacity is split between the G, A and B partitions]

Before invoking the adaptive function:
  – Ensure that the request at time t does not exceed what was agreed upon in the SLA
  – Ensure that the total capacity within all SLAs at time t does not exceed CG
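The borrowing scheme described on this slide can be sketched as a single function. It rests on an assumption the slide does not spell out: degradation is modelled as the guaranteed partition delivering less than its agreed capacity CG, and the shortfall is covered from the adaptive partition. All names are hypothetical.

```java
/** Illustrative split of capacity between the guaranteed and best-effort classes. */
final class AdaptiveCapacity {
    /**
     * cg, ca, cb are the agreed partition sizes; degradedCg is what the guaranteed
     * partition can actually deliver right now (equal to cg when nothing is congested).
     * Returns {capacity available to 'guaranteed', capacity available to 'best effort'}.
     */
    static double[] split(double cg, double ca, double cb, double degradedCg) {
        double shortfall = Math.max(0.0, cg - degradedCg);
        double borrowed = Math.min(ca, shortfall);   // adaptive capacity lent to 'guaranteed'
        double guaranteed = degradedCg + borrowed;
        double bestEffort = cb + (ca - borrowed);    // rest of the adaptive capacity
        return new double[] {guaranteed, bestEffort};
    }
}
```

With CG = 50, CA = 20 and CB = 30, a drop of the guaranteed partition to 40 is fully compensated, leaving 40 for best effort; a drop to 20 exhausts the adaptive capacity and the guaranteed class ends up with 40.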

Page 77: Autonomic Computing Omer F. Rana (Cardiff University)

Grid Node

Reservation Manager, Allocation Manager

Policy Manager

QoS Grid Service

Resources

Grid QoS service interface

Page 78: Autonomic Computing Omer F. Rana (Cardiff University)

Main components

• Policy Manager
  – Provides dynamic information about the domain-specific resource characteristics and policy
• Reservation Manager
  – Provides advance/immediate resource reservation
    • Data structure contains reservation entries
    • Interacts with the Policy Manager for resource characteristics
• Allocation Manager
  – Interacts with the underlying resource manager for resource allocation (e.g. DSRT, Bandwidth Broker)

Page 79: Autonomic Computing Omer F. Rana (Cardiff University)

[Figure: the client's application, a QoS Broker, QoS discovery through UDDIe, and three Grid nodes, each running a QoS service (reservation, allocation, policy) and linked to the broker by an SLA]

Joint work with Argonne National Lab. (Gregor von Laszewski et al.)

Page 80: Autonomic Computing Omer F. Rana (Cardiff University)

Reservation Approaches

• Resource reservation/allocation based on two strategies:
  – Time-domain: reserve the whole ‘compute’ power of a Grid node
    • Guaranteed exclusive access
  – Resource-domain: reserve a CPU slot of the Grid node
    • Shared access with guaranteed resource capacity
    • Suitable for lightweight applications/services

Page 81: Autonomic Computing Omer F. Rana (Cardiff University)

G-QoSM Architecture

[Figure: G-QoSM architecture. Components shown: clients (applications, portals, Swing, legacy); CoG QoS Broker on the Java CoG Kit core with QoS, GT2, GT3, UDDIe and reputation handlers; UDDIe; CoG reputation service; service agreements; CoG QoS Grid Service with Allocation Manager, Reservation Manager and Policy Manager; resource managers for CPU, network and disk resources on the Grid]

Page 83: Autonomic Computing Omer F. Rana (Cardiff University)

Implementation Status

• References:
  – Rashid Al-Ali, Kaizar Amin, Gregor von Laszewski, Omer Rana and David Walker. An OGSA-Based Quality of Service Framework. Proceedings of the Second International Workshop on Grid and Cooperative Computing (GCC 2003), Shanghai, China, December 2003.
  – Rashid Al-Ali, Omer Rana, David Walker, Sanjay Jha and Shaleeza Sohail. G-QoSM: Grid Service Discovery Using QoS Properties. Computing and Informatics Journal, Special Issue on Grid Computing, 21 (4), 2002.
• The QoS implementation is open source and available for download from the Java CoG site http://www.globus.org/cog/java

Page 84: Autonomic Computing Omer F. Rana (Cardiff University)

Application Integration

1. Prepare: QoS negotiation Task (returns: Agreement ID)

2. Prepare: QoS job submission Task

3. Submit job to QoS service

Page 85: Autonomic Computing Omer F. Rana (Cardiff University)

QoS Job Submission Task

private void prepareQosJobSubmissionTask() {
    // create a QoS job submission Task
    Task task = new TaskImpl("myTask", QoS.JOBSUBMISSION);
    this.task.setAttribute("agreementToken", token);

    // create a remote job specification
    JobSpecification spec = new JobSpecificationImpl();

    // set all the job related parameters
    spec.setExecutable("/rashid/myExecutable");
    spec.setRedirected(false);
    spec.setStdOutput("QosOutput");

    // associate the specification with the task
    task.setSpecification(spec);

    // create a Globus version of the security context
    SecurityContextImpl securityContext = new GlobusSecurityContextImpl();
    securityContext.setCredential(null);
    task.setSecurityContext(securityContext);

    // point the task at the QoS Grid service
    Contact contact = new Contact("myQoScontact");
    ServiceContact service = new ServiceContactImpl(qosServiceURL);
    contact.setServiceContact("QGSurl", service);
    task.setContact(contact);
}

Page 86: Autonomic Computing Omer F. Rana (Cardiff University)

QoS Task Submission

/*** QoS: Task Submission to QoS Handler ***/

private void QosTaskSubmission(Task task) {
    TaskHandler handler = new QoSTaskHandlerImpl();

    // submit the task to the handler
    handler.submit(task);
}

Page 87: Autonomic Computing Omer F. Rana (Cardiff University)

With Globus Toolkit 2

Page 88: Autonomic Computing Omer F. Rana (Cardiff University)

Best Effort

Page 89: Autonomic Computing Omer F. Rana (Cardiff University)

Guaranteed

Page 91: Autonomic Computing Omer F. Rana (Cardiff University)

Web Services Distributed Management (WSDM)

• Management USING Web Services (MUWS)
  – Web services to describe and access manageability of resources
  – Management applications use Web services just like other applications use Web services
• Management OF Web Services (MOWS)
  – An application of Management Using Web Services to the Web service as the IT resource

• Use Web Services as the distributed computing platform to enable interoperability between managers and manageable resources

WSDM Presentation WSMF Presentation

Page 92: Autonomic Computing Omer F. Rana (Cardiff University)

WSDM

Page 93: Autonomic Computing Omer F. Rana (Cardiff University)

Disturbance Benchmarking

From Aaron Brown and Peter Shum (IBM)

Page 96: Autonomic Computing Omer F. Rana (Cardiff University)

Useful to compare this with performance benchmarks that we are much more aware of

From Aaron Brown and Peter Shum (IBM)

Compare with automated testing mechanisms

Page 105: Autonomic Computing Omer F. Rana (Cardiff University)

Behaviours and Interactions

• Interactions not “hard coded” – but expressed as high level objectives, e.g.
  – Maximise this utility function
  – Find a reputable message translation service
• Autonomic Service providers can say “No”
  – Service provision must be consistent with local policy and long term goals

• Policies may be expressed using logic or other formalisms

Page 106: Autonomic Computing Omer F. Rana (Cardiff University)

Emergence and Self-Organisation

• Increased complexity and autonomy implies that “global” coherent behaviours may be hard to specify

• Concept of “Emergence”
• Interactions between autonomous systems that can lead to useful global behaviours
  – How can we constrain each individual element within such a system?
  – How can useful global behaviours be recognised effectively?

Page 107: Autonomic Computing Omer F. Rana (Cardiff University)

Self Organisation

• Self-organisation is a set of dynamical processes whereby structure or order appears at the global level of a system from the interactions between its lower-level entities. The rules that govern behaviour and specify the interactions among the entities are implemented on the basis of local information, without any reference to the global pattern.

Page 108: Autonomic Computing Omer F. Rana (Cardiff University)

Emergence

• A dynamic, non-linear process that causes “macro-level” structures to form, based on interactions of system parts at the micro-level.

• Such emergence is “novel” – i.e. cannot be easily understood by taking the system apart and looking at the parts (reductionism)

Page 109: Autonomic Computing Omer F. Rana (Cardiff University)

Issues
• Macro-Micro effect
• Novelty
  – Global behaviour is novel
• Coherence
  – Emergence has some sense of identity (i.e. persists over some time)
• Dynamic
  – Emergence arises as the system evolves over time
• Non-Linear
• Distributed/Non-Centralised Control
  – Not possible to control the entire system

Page 110: Autonomic Computing Omer F. Rana (Cardiff University)

Influences
• Social Societies
  – Emerging area of “Socionics”
• Biological Paradigms (Stigmergy)
  – Ant Colonies (Social Insects)
  – Swarms
• Particle Systems (fluidity and elasticity)
  – Chemical reactions
  – Spin Glass theory (due to temperature changes)

Page 111: Autonomic Computing Omer F. Rana (Cardiff University)

Concepts of Utility

• What is considered “important”

• Value assigned to actions and operations

• Utility
  – Cost
  – Performance
  – Availability

• Some kind of “measurable” metric

Page 112: Autonomic Computing Omer F. Rana (Cardiff University)

Utility … 2
• Payoff function
  – assess behaviour of a particular action (reward signal)
• Analysis tool
  – relationship between local utility vs. utility of the community
• Cost function
  – success w.r.t. a particular task
• Trust measure
  – measure of trust in a particular participant

Page 113: Autonomic Computing Omer F. Rana (Cardiff University)

Economic Utility: Metrics “Pyramid”

Page 114: Autonomic Computing Omer F. Rana (Cardiff University)

Utility Optimisation

Expected Utility – E(x)

Infinite Horizon

Finite Horizon

0 < γ < 1 (discount factor)

“U” may be negative

Long term rewards less useful
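The formulas on this slide did not survive extraction, so the following is only a sketch of the two cases it names, assuming the usual forms: an undiscounted sum of utilities over a finite horizon, and a discounted sum in which the utility at step t is weighted by the t-th power of a factor between 0 and 1. The function names and the parameter name `gamma` are illustrative assumptions, not taken from the slides.

```python
def finite_horizon_utility(utilities):
    """Plain sum of per-step utilities over a fixed number of steps."""
    return sum(utilities)


def discounted_utility(utilities, gamma):
    """Sum of gamma**t * U_t.

    With 0 < gamma < 1, utilities received later are weighted less,
    which is one reading of "long term rewards less useful".
    Individual utilities may be negative.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return sum((gamma ** t) * u for t, u in enumerate(utilities))


utilities = [1.0, -2.0, 4.0]          # "U" may be negative
print(finite_horizon_utility(utilities))
print(discounted_utility(utilities, gamma=0.5))
```

For an infinite horizon the discounted sum stays bounded whenever the per-step utilities are bounded, which is why the discount factor is required to be below 1.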

Page 115: Autonomic Computing Omer F. Rana (Cardiff University)

Social Insect Behaviour

• Self-organising Behaviour
• The idea of simple behaviours interacting in a manner that produces a range of interesting complex behaviours is very useful and exciting for designing complex systems:
  – Positive Feedback (Autocatalytic) - Recruitment and Reinforcement
  – Negative Feedback - Saturation, Exhaustion, or Competition
  – Fluctuations and Randomness - Random Walks, Errors, Random Task-Switching etc.
  – Multiple Interactions
• Stigmergic Behaviour
• Waggle and Tremble dances (Bees)

From: Ashish Umre

Page 116: Autonomic Computing Omer F. Rana (Cardiff University)

Stigmergy

• Indirect communication via interaction with environment [Grassé, 59]
  – Sematectonic [Wilson, 75] stigmergy
    • action of agent directly related to problem solving and affects behaviour of other agents.
  – Sign-based stigmergy
    • action of agent affects environment not directly related to problem solving activity.

Page 117: Autonomic Computing Omer F. Rana (Cardiff University)

Self-organised behaviour can be characterised by key properties like -

• The creation of spatiotemporal structures in an initially homogeneous medium, e.g. Nest Architectures, foraging trails, or social organisation.

• Multistability - possible coexistence of several stable states

• Existence of Bifurcations when some parameters are varied. (“Snowball effect”).

From: Ashish Umre

Page 118: Autonomic Computing Omer F. Rana (Cardiff University)

What do Ants do?
• A few examples of collective behaviour that have been observed in several species of ants are: regulating nest temperature within limits of 1°C; forming bridges; raiding particular areas of food; building and protecting their nest; sorting brood and food items; co-operating in carrying large items; emigration of a colony; complex patterns of egg and brood care; finding the shortest routes from nest to a food source; preferentially exploiting the richest available food source; task partitioning and division of labour.

From: Ashish Umre

Page 119: Autonomic Computing Omer F. Rana (Cardiff University)

Ants in Nature

From: Ashish Umre

Page 120: Autonomic Computing Omer F. Rana (Cardiff University)

Adapting to Environment Changes

Page 121: Autonomic Computing Omer F. Rana (Cardiff University)

Pheromone Trails

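[Figure: ant paths through nodes A, B, C, D, E and H, with two alternative branches (via H and via C) whose edges are labelled d = 1.0 and d = 0.5. Panels show 30 ants arriving at each branch point; at T = 0 they split 15 / 15 between the two branches, and at T = 1 they split 20 / 10.]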
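How a pheromone trail shifts ants toward the shorter of two branches can be sketched with a small deterministic model. This is a simplified mean-field illustration, not a reproduction of the slide's figure: it assumes ants choose a branch in proportion to its pheromone level and that each ant deposits pheromone inversely proportional to branch length. The function name, the deposit rule and the initial pheromone value are all assumptions made for the example.

```python
def simulate_two_branches(lengths, steps, ants=30.0):
    """Return the per-step split of ants across branches.

    lengths: branch lengths, e.g. [1.0, 2.0]
    Each step, ants divide in proportion to current pheromone, then
    every branch gains pheromone equal to (ants on it) / (its length).
    """
    pheromone = [1.0 for _ in lengths]   # equal trails at the start
    history = []
    for _ in range(steps):
        total = sum(pheromone)
        split = [ants * p / total for p in pheromone]
        history.append(split)
        pheromone = [p + n / length
                     for p, n, length in zip(pheromone, split, lengths)]
    return history


for step, split in enumerate(simulate_two_branches([1.0, 2.0], steps=4)):
    print(step, [round(n, 1) for n in split])
```

With equal initial trails the first split is even; after that the shorter branch accumulates pheromone faster and attracts a growing share of the ants.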

Page 122: Autonomic Computing Omer F. Rana (Cardiff University)

What do Bees do?

• Foraging Behaviour (Waggle Dance)

• Task Partitioning and Division of Labour

• Scout-Recruit Concept (Tremble Dance)

• Group Decision Making and Colony Cooperation

• Regulating Hive temperature

• Communication : Food sources are exploited according to quality and distance from the hive

Page 123: Autonomic Computing Omer F. Rana (Cardiff University)

Waggle Dance

From: Ashish Umre

Page 124: Autonomic Computing Omer F. Rana (Cardiff University)

Wasps

• Pulp foragers, water foragers & builders

• Complex nests

– Horizontal columns

– Protective covering

– Central entrance hole

Page 125: Autonomic Computing Omer F. Rana (Cardiff University)

Pervasive Ants: Resource Discovery in Dynamic and Reconfigurable Networks using Artificial Ants

• Ants continuously explore new solutions

• Pulses (“Drumming”) used to update resource tables (The modulatory communication signal category of drumming in the European carpenter ants Camponotus herculeanus and C. ligniperda. The worker ants strike the surface of the wooden chambers and galleries in which they live with their mandibles and gasters, producing vibrations that can be perceived by nestmates for 20 centimetres or more. Much of the behaviour is classifiable as direct alarm communication. The behaviour of some categories is “tightened up”: transition probabilities are raised, and hence uncertainty is reduced. Modulatory communication appears to be a primitive phenomenon in ants and other social insects.)

• Adaptive to continuous node failure and addition of new nodes and resources, and change in traffic conditions

From: Ashish Umre

Page 126: Autonomic Computing Omer F. Rana (Cardiff University)

Ant-Based Control Introduction

• Ant Based Control (ABC) was introduced to route calls on a circuit-switched telephone network
  – ABC is the first SI routing algorithm for telecommunications networks
  – 1996

R. Schoonderwoerd, O. Holland, J. Bruten, L. Rothkrantz, Ant-based load balancing in telecommunications networks, 1996.

Page 127: Autonomic Computing Omer F. Rana (Cardiff University)

ABC: Overview

• Ant packets are control packets
• Ants discover and maintain routes
  – Pheromone is used to identify routes to each node
  – Pheromone determines path probabilities
• Calls are placed over routes managed by ants
• Each node has a pheromone table maintaining the amount of pheromone for each destination it has seen
  – Pheromone Table is the Routing Table

Page 128: Autonomic Computing Omer F. Rana (Cardiff University)

ABC: Route Maintenance

• Ants are launched regularly to random destinations in the network

• Ants travel to their destination according to the next-hop probabilities at each intermediate node
  – With a small exploration probability an ant will uniformly randomly choose a next hop

• Ants are removed from the network when they reach their destination
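The next-hop choice described above can be sketched as follows. This is a minimal illustration, assuming a node's pheromone table is stored as a nested dictionary (destination, then neighbour, then probability); the function name, the table layout and the default exploration probability of 0.05 are assumptions for the example, not values from the slides.

```python
import random


def choose_next_hop(table, destination, exploration=0.05, rng=random):
    """Pick the neighbour an ant moves to from the current node.

    table[destination] maps each neighbour to its selection probability.
    With probability `exploration` the ant ignores the table and picks
    a neighbour uniformly at random.
    """
    probabilities = table[destination]
    neighbours = list(probabilities)
    if rng.random() < exploration:
        return rng.choice(neighbours)
    weights = [probabilities[n] for n in neighbours]
    return rng.choices(neighbours, weights=weights, k=1)[0]


# Pheromone table of one node: for destination 2, neighbours 4 and 5.
node_table = {2: {4: 0.7, 5: 0.3}}
print(choose_next_hop(node_table, destination=2, rng=random.Random(1)))
```

Passing an explicit `random.Random` instance keeps runs repeatable when experimenting with different exploration probabilities.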

Page 129: Autonomic Computing Omer F. Rana (Cardiff University)

ABC: Routing Probability Update

• Ants traveling from source s to destination d lay s’s pheromone
  – Ants lay a pheromone trail back to their source as they move
  – Pheromone is unidirectional

• When a packet arrives at node n from previous hop r, and having source s, the routing probability to r from n for destination s increases

Page 130: Autonomic Computing Omer F. Rana (Cardiff University)

Ant Algorithm

An ant in the network was launched at node 3 with destination node 2, and has just travelled from node 4 to node 1. This ant will first alter node 1’s table corresponding to node 3 (its source node) by increasing the probability of selection of node 4; it will then select its next node randomly according to the probabilities in the table corresponding to its destination node, node 2.

• Every node has a pheromone table for every destination node in the network
• A node with four neighbours in a 30-node network has 29 pheromone tables with four entries each.

Ants going from node 1 to 3

Page 131: Autonomic Computing Omer F. Rana (Cardiff University)

Updating Pheromone table

• Ants can be launched from any node

• Select next node according to probabilities in the pheromone table for their destination nodes

• When ants arrive at a node – they update the probabilities of that node’s pheromone table (corresponding to their source node)

• Alter table to increase probability pointing to their previous node

• On reaching destination – ants die

Page 132: Autonomic Computing Omer F. Rana (Cardiff University)

Update law

• Δp = probability (or pheromone) increase

• Probability can be reduced by operation of normalization (increase in another cell in table)

• Prob. can approach zero but never reaches it

Page 133: Autonomic Computing Omer F. Rana (Cardiff University)

Ant Algorithm

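$$ r^{i}_{m,s}(t+1) = \frac{r^{i}_{m,s}(t) + \delta r}{1 + \delta r} $$

$$ r^{i}_{l,s}(t+1) = \frac{r^{i}_{l,s}(t)}{1 + \delta r}, \qquad l \neq m $$

$$ \delta r = \frac{0.25}{\text{age}} $$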

This equation specifies the new reinforced weight for the relevant node that corresponds to the ant’s last node

This equation specifies the weight for all other weights that do not correspond to the ant's last node

This equation specifies the reinforcement parameter that is employed in first two equations
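The three captions above can be turned into a small update routine. This is a sketch under stated assumptions: the entry for the ant's previous node is reinforced and renormalised as (old + delta) / (1 + delta), every other entry is divided by (1 + delta), and delta is 0.25 divided by the ant's age. The function name, the nested-dictionary table layout and the treatment of `age` as a positive number are assumptions for illustration.

```python
def reinforce(table, source, previous_hop, age, k=0.25):
    """Update one node's pheromone table when an ant arrives.

    table[source] maps each neighbour to a probability; the entry for
    `previous_hop` is increased and all entries are renormalised so
    that they still sum to 1.
    """
    if age <= 0:
        raise ValueError("age must be positive")
    delta = k / age                       # older ants reinforce less
    row = table[source]
    for neighbour in row:
        bonus = delta if neighbour == previous_hop else 0.0
        row[neighbour] = (row[neighbour] + bonus) / (1.0 + delta)
    return delta


# Node 1's table for ants whose source is node 3, neighbours 4 and 2.
node1 = {3: {4: 0.5, 2: 0.5}}
reinforce(node1, source=3, previous_hop=4, age=1)
print(node1[3])
```

Because every entry is divided by the same factor, an entry can shrink toward zero through repeated normalisation but never reaches it, which matches the remark on the update-law slide.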

From: Ashish Umre

Page 134: Autonomic Computing Omer F. Rana (Cardiff University)

Ageing
• Delta_p changes with the age of the ant
  – Age == path length (each hop increases the ant’s age)
  – Ants moving along shorter routes are younger on arrival, so their updates are larger
  – Age == delay of ants at nodes that are congested
  – Delayed ants age more quickly
• Delay reduces the flow rate of ants from a congested node to its neighbours, which prevents those ants from affecting the pheromone tables

Page 135: Autonomic Computing Omer F. Rana (Cardiff University)

ABC: Route Selection (Call Placement)

• When a call is originated, a circuit must be established

• The highest probability next hop is followed to the destination from the source

• If no circuit can be established in this way, the call is blocked

• Calls operate independently of ants
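Call placement as described above can be sketched like this. It is an illustrative reading, assuming each node holds a table of the form destination, then neighbour, then probability, and that a call is blocked when the greedy route loops or reaches a node with no spare capacity. The function name, the `spare` capacity dictionary and the loop check are assumptions added for the example.

```python
def place_call(tables, spare, source, destination):
    """Follow the highest-probability next hop from source to destination.

    tables[node][destination] maps neighbours to probabilities.
    spare[node] is the number of calls the node can still carry.
    Returns the list of nodes on the circuit, or None if the call is blocked.
    """
    path = [source]
    node = source
    while node != destination:
        row = tables[node][destination]
        node = max(row, key=row.get)          # greedy: best next hop only
        if node in path or spare.get(node, 0) <= 0:
            return None                        # loop or congested node
        path.append(node)
    for hop in path[1:]:
        spare[hop] -= 1                        # reserve the circuit
    return path


tables = {
    1: {3: {2: 0.9, 4: 0.1}},
    2: {3: {3: 0.8, 1: 0.2}},
}
spare = {2: 1, 3: 1}
print(place_call(tables, spare, source=1, destination=3))
print(place_call(tables, spare, source=1, destination=3))
```

The second call is blocked because the first one used the only spare capacity on the route; ants play no part in this step, they only shape the tables beforehand.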

Page 136: Autonomic Computing Omer F. Rana (Cardiff University)

ABC: Initialization

• Pheromone Tables are randomly initialized
• Ants are released onto the network to establish routes
• When routes are sufficiently short, actual calls are placed onto the network
• Calls and ants dynamically interact
• New calls influence the load on nodes, which in turn influences the ants by means of a delay mechanism

Page 137: Autonomic Computing Omer F. Rana (Cardiff University)

Relationship between calls, node utilisation, pheromone tables and ants. An arrow indicates the direction of influence

Page 138: Autonomic Computing Omer F. Rana (Cardiff University)

From: Ashish Umre

Page 139: Autonomic Computing Omer F. Rana (Cardiff University)

Average Packet Delay (With the Algorithm)

From: Ashish Umre

Page 140: Autonomic Computing Omer F. Rana (Cardiff University)

Average Packet Delay (Without the Algorithm)

From: Ashish Umre

Page 141: Autonomic Computing Omer F. Rana (Cardiff University)

Packet and Pulse Loss (With the Algorithm)

From: Ashish Umre

Page 142: Autonomic Computing Omer F. Rana (Cardiff University)

Packet and Pulse Loss (Without the Algorithm)

From: Ashish Umre

Page 143: Autonomic Computing Omer F. Rana (Cardiff University)

Design Concerns

• Swarm Intelligent Systems are hard to ‘program’ since the problems are usually difficult to define

– Solutions are emergent in the systems

– Solutions result from behaviors and interactions among and between individual agents

Page 144: Autonomic Computing Omer F. Rana (Cardiff University)

Summary of ABC
• Ants are regularly launched with random destinations
• Ants walk randomly according to probabilities in pheromone tables for their particular destination
• Ants update the probabilities in the pheromone table for the location they were launched from, by increasing the probability of selection of their previous location by subsequent ants
• The increase in these probabilities is a decreasing function of the age of the ant, and of the original probability
• This probability increase could also be a function of penalties or rewards the ant has gathered on its way
• The ants get delayed on parts of the system that are heavily used
• The ants could eventually be penalised or rewarded as a function of local system utilisation
• To avoid overtraining through freezing of pheromone trails, some noise can be added to the behaviour of the ants

Page 145: Autonomic Computing Omer F. Rana (Cardiff University)

Possible Solutions to Create Swarm Intelligence Systems

• Create a catalog of the collective behaviours
• Model how social insects collectively perform tasks
  – Use this model as a basis upon which artificial variations can be developed
  – Model parameters can be tuned within a biologically relevant range or by adding non-biological factors to the model

Page 146: Autonomic Computing Omer F. Rana (Cardiff University)

What are Ad Hoc Networks?

• Ad Hoc networks are
  – self-organising multi-hop wireless networks;
  – no fixed infrastructure, such as base stations or routers, is required;
  – ad hoc networks are rapidly deployable networks;
  – all mobile hosts are embedded with packet forwarding capabilities;

From: Ashish Umre

Page 147: Autonomic Computing Omer F. Rana (Cardiff University)

Current Routing Algorithms for Ad hoc Mobile Wireless Networks

• Table-Driven Routing Protocols:
  – Destination-Sequenced Distance Vector Routing (DSDV)
  – Clustered Gateway Switch Routing (CGSR)
  – The Wireless Routing Protocol (WRP)
• Source-Initiated On-Demand Routing:
  – Ad hoc On-Demand Distance Vector Routing (AODV)
  – Dynamic Source Routing (DSR)
  – Temporally-Ordered Routing Algorithm (TORA)
  – Associativity-Based Routing (ABR)
  – Signal Stability Routing (SSR)

From: Ashish Umre

Page 148: Autonomic Computing Omer F. Rana (Cardiff University)

Four Ingredients of Self Organization

• Positive Feedback

• Negative Feedback

• Amplification of Fluctuations - randomness

• Reliance on multiple interactions

Page 149: Autonomic Computing Omer F. Rana (Cardiff University)

Positive Feedback

Positive Feedback reinforces good solutions

• Ants are able to attract more help when a food source is found

• More ants on a trail increases pheromone and attracts even more ants

Page 150: Autonomic Computing Omer F. Rana (Cardiff University)

Negative Feedback

Negative Feedback removes bad or old solutions from the collective memory

• Pheromone Decay

• Distant food sources are exploited last
  – Pheromone has less time to decay on closer solutions
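Pheromone decay as a form of negative feedback can be sketched as follows. This is an illustrative model only, assuming multiplicative evaporation at a fixed rate per step and a fixed deposit on trails that are still being used; the function names, the rate and the deposit amount are assumptions, and the ABC algorithm described earlier achieves a similar effect through normalisation rather than explicit decay.

```python
def evaporate(pheromone, rate):
    """Return new trail levels after a fraction `rate` has decayed."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return {trail: level * (1.0 - rate) for trail, level in pheromone.items()}


def deposit(pheromone, trail, amount):
    """Return new trail levels after adding `amount` to one trail."""
    updated = dict(pheromone)
    updated[trail] = updated.get(trail, 0.0) + amount
    return updated


trails = {"near": 1.0, "far": 1.0}
for _ in range(10):
    trails = evaporate(trails, rate=0.5)
    trails = deposit(trails, "near", 1.0)   # only the near trail is still used
print(trails)
```

The unused trail fades toward zero while the reinforced trail settles at a steady level, so stale solutions drop out of the collective memory without any agent deleting them explicitly.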

Page 151: Autonomic Computing Omer F. Rana (Cardiff University)

Randomness

Randomness allows new solutions to arise and directs current ones

• Ant decisions are random
  – Exploration probability

• Food sources are found randomly

• Initially an ant will attempt to follow a random path to “explore” possible food sources

Page 152: Autonomic Computing Omer F. Rana (Cardiff University)

Multiple Interactions

No individual can solve a given problem. Only through the interaction of many can a solution be found

• One ant cannot forage for food; pheromone would decay too fast

• Many ants are needed to sustain the pheromone trail

• More food can be found faster

• “Swarm” behaviour

Page 153: Autonomic Computing Omer F. Rana (Cardiff University)

Stigmergy in Action

This general “Clustering” behaviour is a key theme in such approaches

Page 154: Autonomic Computing Omer F. Rana (Cardiff University)

Ants Agents

• Stigmergy can be operational
  – Coordination by indirect interaction is more appealing than direct communication
  – Stigmergy reduces (or eliminates) communications between agents

Page 155: Autonomic Computing Omer F. Rana (Cardiff University)

SI Advantages for Routing

SI based algorithms generally enjoy:
• Multipath routing
  – Probabilistic routing will send packets all over the network
• Fast route recovery
  – Packets can easily be sent to other neighbors by recomputing next-hop probabilities
• Low Complexity
  – Little special purpose information must be maintained aside from pheromone/probability information

Page 156: Autonomic Computing Omer F. Rana (Cardiff University)

More SI Advantages for Routing

• Scalability
  – As with ant colonies numbering in the millions, SI algorithms can potentially scale across several orders of magnitude
• Distributed Algorithm
  – SI based algorithms are inherently distributed

Page 157: Autonomic Computing Omer F. Rana (Cardiff University)

SI Disadvantages for Routing

SI also suffers from:

• Directional Links
  – Bidirectional links are generally assumed by using reverse paths
• Novelty
  – SI is a relatively new approach to routing. It has not been characterized very well, analytically

Page 158: Autonomic Computing Omer F. Rana (Cardiff University)

Pharaoh Ant (Monomorium pharaonis)

• Colony Behaviours
  – Multiple Queening
  – Nest Conflict and Cooperation
  – Migration
  – Clustering
• Analogies
  – Resource Allocation, Discovery and Sharing
  – Adaptive Clustering

From: Ashish Umre

Page 159: Autonomic Computing Omer F. Rana (Cardiff University)

Current Issues in Mobile Agent Technologies

• Application Issues
  – Jumping Agents (Shopping, Taxi/Airport)
  – Location Sensitive (Bluetooth, HomeRF)
  – Profile Oriented
• Deployment Issues
  – Is the Infrastructure ready?
• Security Issues
  – Physical Mobility
  – Logical Mobility

From: Ashish Umre

Page 160: Autonomic Computing Omer F. Rana (Cardiff University)

Mobile Agents
• Generalizing the “ant” based approach as a mobile agent
• A paradigm based on code mobility
  – Remote Evaluation
  – Code-on-demand (the Java Applet model)
  – Peer-2-Peer
• Migrate from one host to another “autonomously”
  – “Intelligent Viruses”? (do we really want these?)
  – Lead to security nightmares
  – Require writing in obscure languages (Tcl, Java etc)
• Provide an interesting paradigm for Grid computing
  – Assuming other Grid infrastructure is there

Page 161: Autonomic Computing Omer F. Rana (Cardiff University)

How do they differ from other DC paradigms

• Host supported mobility vs. autonomous migration
  – weak vs. strong mobility (Bradshaw and Suri’s work on Nomad, vs. Aglets or Voyager)
• What’s in a message?
  – state
  – code or data
• How large should a mobile agent be?
• Tracking a mobile agent (forwarders, location service, pheromone trails)
• Host assisted
  – state persistence (vs. soft state)
  – introspection

Page 162: Autonomic Computing Omer F. Rana (Cardiff University)

The overhyped differences between mobile objects and agents

• Mobile objects do not migrate autonomously
  – control transfer issues
• Mobile objects generally part of some application
  – limited or no access to a separate execution context
• Mobile object granularity is generally much finer
  – agents must carry code to interact with host (context or place)
• Mobile objects do not support a well defined API
  – such as moveTo, retract, dispatch etc
• Division of application into agents vs. objects will be different
• Absence of any standard framework

Page 163: Autonomic Computing Omer F. Rana (Cardiff University)

The overhyped reasons for why mobile agents are (apparently) useful

• Reduce network load
• Overcome network latency
• Can encapsulate a protocol
• Can execute autonomously and asynchronously
• Can dynamically adapt their itinerary
• May be heterogeneous
• Are robust and can sustain faults in their environment

and why not …
• all of the above can be done via messaging
• too many security issues to be useful
• unlikely to be supported by host platforms (standardisation has not resulted in anything useful)
• too hard to code, and abstraction is not obvious

Page 164: Autonomic Computing Omer F. Rana (Cardiff University)

Standardisation

• MASIF (Mobile Agent System Interoperability Facility)
  – Crystaliz, General Magic, IBM, GMD Fokus, Open Group

• Address interface between agent systems, and not agent applications

• MASIF Aim: Enable mobile agents to travel across various hosts in an open environment

• Support for locating an agent (MAFFinder)

• Released via OMG

Page 165: Autonomic Computing Omer F. Rana (Cardiff University)

MASIF

Standardise on four areas:
• Agent Management
  – use of standard operations to manage agents from different vendors
• Agent Transfer
  – use of standard operations to create and migrate agents from different agent systems
• Agent and Agent System Naming
  – use of standard Syntax and Semantics of parameters
  – part of MAFFinder
• Agent System Type and Location Syntax
  – use of standard syntax for location
  – part of MAFFinder

Page 166: Autonomic Computing Omer F. Rana (Cardiff University)

MASIF … 2

    void create_agent (
        in Name agent_name,
        in AgentProfile agent_profile,
        in OctetString agent,
        in string place_name,
        in Arguments arguments,
        in ClassNameList class_names,
        in string code_base,
        in MAFAgentSystem class_provider)
      raises (ClassUnknown, ArgumentInvalid,
              SerializationFailed, MAFExtendedException);

IDL Definition

Page 167: Autonomic Computing Omer F. Rana (Cardiff University)

MASIF … 3

    Location find_nearby_agent_system_of_profile(
        in AgentProfile profile)
      raises (EntryNotFound);

    void resume_agent(
        in Name agent_name)
      raises (NameInvalid, ResumeFailed);

    void list_all_agents_of_authority(
        in Authority authority);

    NameList list_all_agents();

    Location list_all_places();

IDL Definition

Page 168: Autonomic Computing Omer F. Rana (Cardiff University)

MASIF … 4

    interface MAFFinder {

        void register_agent(
            in Name agent_name,
            in Location agent_location,
            in AgentProfile agent_profile)
          raises (NameInvalid);

        void register_agent_system(
            in Name agent_system_name,
            in Location agent_system_location,
            in AgentSystemInfo agent_system_info)
          raises (NameInvalid);

IDL Definition

Page 169: Autonomic Computing Omer F. Rana (Cardiff University)

MASIF … 5

    Location lookup_agent(
        in Name agent_name,
        in AgentProfile agent_profile)
      raises (EntryNotFound);

    Location register_place(
        in string place_name,
        in Location place_location)
      raises (NameInvalid);

IDL Definition

Page 170: Autonomic Computing Omer F. Rana (Cardiff University)

At each host ...
• An Agent Server
  – one or more such servers can co-exist on a particular machine
  – an agent server must be identifiable by a unique URL
  – must also be able to launch and subsequently support tracking of the agent
• System support for migratable, non-persistent code
  – memory, CPU
• System support for handling local security policy
  – sandbox, authentication/access control mechanism, certificate verification mechanism, etc

Page 171: Autonomic Computing Omer F. Rana (Cardiff University)

MA Lifecycle

[Figure: mobile agent lifecycle, based on IBM Aglets. Two agents (A) and their class files are shown, with the labelled operations create, dispatch, retract, deactivate, activate and dispose.]

Page 172: Autonomic Computing Omer F. Rana (Cardiff University)

Why are they useful in Grids?

• Important code delivery paradigm

• Must operate in the context of existing Grid systems

– may alleviate some issues with mobility

• Support essential needs of Grid computing

– software and protocol updates

– load balancing and migration

– user migration

• Most importantly -- they support a “Demand Oriented” style of computing

– move computation and data “on demand”

– move a limited set of functionality “on demand”

Page 173: Autonomic Computing Omer F. Rana (Cardiff University)

Achieving Parallelism

• Mobile Agents also useful to support parallelism at a coarser granularity

– simultaneous dispatch of agents to multiple sites

– simultaneous dispatch of messages to multiple sites via specialised group formation (aspect of “Spaces” -- formed through multicast groups)

– Integration with existing message passing libraries (MPI or PVM) via the host machine

• Achieved parallelism can be more dynamic

– Agents can decide where to migrate vs. pre-defined message transfer based on MPI or PVM

• May not be useful for “production grade” parallelism

Page 174: Autonomic Computing Omer F. Rana (Cardiff University)

Supporting Mobility

• Object Identity: killing the old object as a copy is sent to a remote host (address space) -- use of Java garbage collection when no references exist to the object

– mobile object pool

• Object Serialisation: what happens to private, transient and state variables -- when to move?

– java.io.Serializable

– serialization of threads?

• State synchronisation and sharing: HORUS -- object server?

• Concurrency through Actors (objects that own their own thread) -- Actors are non-blocking

Page 175: Autonomic Computing Omer F. Rana (Cardiff University)

Explicit Serialization

• Via the Externalizable interface in Java

– must be manually implemented by programmer

– can customise how an object’s fields are mapped to a stream

– means of checkpointing state (includes object’s field values + metadata about class version, and field types)

– Write out all visible states of a thread to a stream, read back state, initiate a thread

• Consider method invocation as a “single” unit of computation

– allow thread state to be read only before or after a method invocation (i.e. no active threads)

• Access to stack variables

– stack variables made part of object’s state

Page 176: Autonomic Computing Omer F. Rana (Cardiff University)

Custom Classloaders
• Can also implement custom classloaders
• Classloader used to:
  – dynamically determine which code to migrate
  – which code should be released
  – how code interacts with the operating environment
• Classloaders are a useful way to extend existing Grid systems
  – use of the CoG Java toolkit or OGSA to link to Globus
  – interactivity between existing scheduling systems
• Offer class loading features as a Grid Service
  – characterised by application features?
• Classloaders take away intelligence from migrating code -- hence not the ideal solution

Page 177: Autonomic Computing Omer F. Rana (Cardiff University)

Write your own Classloader()

• Extend “Primordial Classloader” in Java
  – invoked after calling main() method
  – Matrix m = new Matrix() ; -- execute “new” bytecode
  – System.out.println() -- invoke static reference to class (putstatic, getstatic etc)
• Class loaders enable Java apps (EMACS or Scientific codes) to be dynamically extended
• Byte code verifier - defineClass, ClassFormatError
• Package over-write/addition: java.lang.hackit -- protect system namespace
• Multiple Classloaders can co-exist

Page 178: Autonomic Computing Omer F. Rana (Cardiff University)

Dynamic Itinerary
• A mobile agent may visit a number of hosts
• This itinerary may change over time
  – based on data collected at intermediate hosts
  – may not return to host machine
• Itinerary may be dictated by a particular host
  – agent may override this
• Dynamic itinerary useful in Grid context
  – load may not be known beforehand
  – hosts may not always be available or reliable
  – services may not always be present
  – users/experts may migrate
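One way to picture an itinerary that changes with data collected along the way is the sketch below. It is purely illustrative: it assumes the agent carries a set of candidate hosts, learns load figures as it travels, and always moves to the least-loaded host it has heard about and not yet visited. The function names, the load-report format and the selection rule are assumptions, not part of any agent toolkit named in these slides.

```python
def next_host(candidates, visited, load_reports):
    """Choose the least-loaded unvisited host with a known load, or None."""
    options = [h for h in candidates
               if h not in visited and h in load_reports]
    if not options:
        return None
    return min(options, key=lambda h: load_reports[h])


def run_itinerary(start, candidates, reports_at):
    """Visit hosts one by one; reports_at[host] is what the agent learns there."""
    visited, known, current = [start], {}, start
    while current is not None:
        known.update(reports_at.get(current, {}))   # data gathered at this host
        current = next_host(candidates, visited, known)
        if current is not None:
            visited.append(current)
    return visited


reports_at = {
    "home": {"a": 0.9, "b": 0.2},   # home only knows about a and b
    "b": {"c": 0.1},                # c is discovered while at b
}
print(run_itinerary("home", ["a", "b", "c"], reports_at))
```

Hosts that are unavailable can be modelled by simply never appearing in any load report, in which case the agent skips them.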

Page 179: Autonomic Computing Omer F. Rana (Cardiff University)

Locating an agent
• Use of proxy
  – local proxy to track agent
• Forwarders
  – creating a chain of non-persistent forwarders
  – pheromone based approaches
• A location service
  – event notification service
  – query service
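The forwarder approach listed above can be sketched as a chain of pointers left behind at each host the agent departs from. This is a minimal in-memory illustration; the `Host` class, the `migrate` and `locate` functions and their behaviour are hypothetical names chosen for the example and do not correspond to any particular mobile agent platform.

```python
class Host:
    """A host that may hold agents and forwarding pointers for departed ones."""

    def __init__(self, name):
        self.name = name
        self.agents = set()
        self.forward = {}   # agent id -> host the agent moved to


def migrate(agent_id, src, dst):
    """Move an agent and leave a forwarder behind on the source host."""
    src.agents.discard(agent_id)
    src.forward[agent_id] = dst
    dst.agents.add(agent_id)
    dst.forward.pop(agent_id, None)   # a stale pointer here is no longer needed


def locate(agent_id, start):
    """Follow forwarders from `start`; return the current host or None."""
    host, seen = start, set()
    while agent_id not in host.agents:
        if host.name in seen or agent_id not in host.forward:
            return None               # broken chain or unknown agent
        seen.add(host.name)
        host = host.forward[agent_id]
    return host


a, b, c = Host("a"), Host("b"), Host("c")
a.agents.add("agent-1")
migrate("agent-1", a, b)
migrate("agent-1", b, c)
print(locate("agent-1", a).name)
```

The cost of a lookup grows with the length of the chain, which is one reason a location service or event notification is listed as an alternative.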

Page 180: Autonomic Computing Omer F. Rana (Cardiff University)

Application scenario: Load gathering
• Sensors measure network load
  – similar to SNMP
• Report this to an event gateway and monitor this at a given control site
• JAMM system an example
  – other work taking place in the Global Grid Forum Network Monitoring group
• Mobile agent may be used to gather load
  – carry a schema for gathering parameters
  – interact via local host to SNMP gateway
  – record local parameters and carry statistics
  – pass through a given host to lodge results
  – itinerary may be application dependent

Page 181: Autonomic Computing Omer F. Rana (Cardiff University)

Java Agent Measurement and Monitoring (JAMM) - LBNL

Page 182: Autonomic Computing Omer F. Rana (Cardiff University)

JAMM scenario

Page 183: Autonomic Computing Omer F. Rana (Cardiff University)

Load gathering

Page 184: Autonomic Computing Omer F. Rana (Cardiff University)

Application Profiles
• Application categories:
  – restrict itinerary
  – identify common patterns
• Resource suggestions
  – identify common patterns
  – resource characteristics
• MA-MA interaction
  – used to inform about other resources
  – share application requirements
  – determine commonality in applications

Page 185: Autonomic Computing Omer F. Rana (Cardiff University)

Load imposed by Mobile Agents
• MA performance becomes an issue
• Issues
  – Where should a mobile agent visit next?
  – What should the mobile agent carry vs. leave behind?
  – How long does a mobile agent spend on a given host?
  – How long does it take for a mobile agent to travel from A to B?
• Need for tools that can help gather this data
  – Recorded within each agent
  – Support for specialised services which gather this
  – Data can be queried based on MA authorisation

Page 186: Autonomic Computing Omer F. Rana (Cardiff University)

David Kotz, Guofei Jiang et al. (Dartmouth College)

Page 187: Autonomic Computing Omer F. Rana (Cardiff University)

Fernando Pinel, Omer F. Rana (Cardiff)

Page 188: Autonomic Computing Omer F. Rana (Cardiff University)

Benchmarking
• MA benchmarking efforts also important in this context.
• Benchmarks can be micro-
  – create (locally or remotely) and dispatch an agent
  – retrieve an agent
  – blocking and non-blocking message exchanges
• or macro-
  – forwarding
  – roaming
  – proxy servers

M. Dikaiakos, M. Kyriakou, G. Samaras, "Performance Evaluation of Mobile-agent Middleware: A Hierarchical Approach." In Proceedings of the 5th IEEE International Conference on Mobile Agents, J.P. Picco (ed.), Lecture Notes in Computer Science series, vol. 2240, pages 244-259, Springer, Atlanta, USA, December 2001

Page 189: Autonomic Computing Omer F. Rana (Cardiff University)

Additional uses: Consumer Grids

• More open perspective on Grids

• Individuals and organisations can operate as suppliers of services/resources

• Service providers must be able to:

– Dynamically download software to participate on the Grid

– Varying resource capabilities

– Dynamically determine resource properties

• Resource aware visualisation

– Remotely configure resource

• Mobile agents provide an important abstraction

• Many existing technologies are useful contenders: Peer-2-Peer and Web Services

Resource sharing
• Peer-2-Peer
– CPU sharing (Entropia, Parabon, UD, SETI@HOME)
– File sharing (Napster, Gnutella, Freenet)
• CPU sharing
– Utilisation of free cycles via standard downloads
– Requires upload of data on which to operate
– Generally high redundancy and replication
• File Sharing
– Search for common file types, and support file placement
– Use of indexing or intermediate servers
• Development libraries: JXTA

Resource Sharing … 2
• In MA:
– CPU sharing: migration of mobile agent
– File sharing: migration of associated data and state
• Migration and execution can be more intelligent
• Use of forwarding and location services can be coupled with additional services:
– Work distribution and current state of computation
– Resource events to support migration
• P2P infrastructure also useful:
– Development of itineraries via overlay networks or index servers
– Security issues (?)

File Space Management
• Cache management
– migration support for files (temporary results, configuration etc)
• File space re-ordering
– sharing of directory space across machines
– virtual “file stores”
• Results to common queries
– file placement closer to computation
– file replication to support availability levels
• Managing user and project groups

Common Themes
• Load balancing and migration
• Data capture (especially performance related)
• Trigger and configuration
– set up of execution at remote sites
– updates to execution or changes
– user set up

• Establishing dynamic resource groups

• Resource provisioning beyond regional and national centres

Concerns
• Dealing with licensed software

– proprietary code or data

• Dealing with production codes

– highly tuned performance

– issues of Grid computing are questionable here

• Domain decomposition

– issues in translating large scale codes to mobile agents

– where is the abstraction most suitable/relevant

• Interfaces between Grid systems and Mobile Agent systems

Issues … Swarm/Ant Systems

• Tragedy of the Commons: Self Organisation does not always produce the desired outcome (Thomas Schelling's Micromotives and Macrobehavior):
– El Farol Bar problem
– Sheep Grazing problem

• Some individuals and organizations are more comfortable and more efficient with hierarchical organizations that are more centrally controlled


Issues … 2

• Useful in an “experiment” and “explorative” environment

• System must be “non-conservative” in its approach to experiment and evaluate different system behaviours

El Farol Bar … 2
• Agents select a night (1 to 7)
– based on expected attendance or reward (from prior experience)
• Agent attends the bar
– Attendance on selected night
– Output of the reward function
• Update agent’s model of the system
• Agents cannot communicate with each other
• Global objective: Maximise cumulative reward of entire system
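The loop described on this slide (select a night, attend, observe the reward, update the model) can be sketched as a small simulation. This is only an illustrative Python sketch: the reward form `attendance * exp(-attendance / capacity)`, the epsilon-greedy choice rule, the learning rate of 0.1 and all parameter values are assumptions, not taken from the slides.

```python
import math
import random

NIGHTS = 7  # nights 1 to 7 are indexed 0 to 6 here


def night_reward(attendance, capacity):
    """Assumed reward for one night; it is largest when attendance == capacity."""
    return attendance * math.exp(-attendance / capacity)


def simulate(num_agents=60, capacity=6, weeks=200, epsilon=0.1, seed=0):
    """Run the select/attend/update loop and return (attendance, global reward) per week."""
    rng = random.Random(seed)
    # Each agent keeps a private estimate of the reward for every night.
    estimates = [[0.0] * NIGHTS for _ in range(num_agents)]
    history = []
    for _ in range(weeks):
        # Select a night from prior experience, with occasional exploration.
        choices = []
        for est in estimates:
            if rng.random() < epsilon:
                choices.append(rng.randrange(NIGHTS))
            else:
                choices.append(max(range(NIGHTS), key=est.__getitem__))
        # Attend: attendance on each night determines that night's reward.
        attendance = [choices.count(n) for n in range(NIGHTS)]
        rewards = [night_reward(a, capacity) for a in attendance]
        # Update each agent's model using only the night it attended.
        for est, night in zip(estimates, choices):
            est[night] += 0.1 * (rewards[night] - est[night])
        history.append((attendance, sum(rewards)))
    return history
```

Agents never exchange messages here; any coordination has to come from the reward signal alone, which is the point of the exercise.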

Tragedy of the Commons
• Self-interested gain of one member of the community is to the detriment of the whole community
• Pasture on which each agent keeps cattle
– Utility increases as number of animals increases
– Overgrazing affects all agents detrimentally
• Agent needs to decide whether to cooperate or defect
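A small numeric sketch can make the cooperate/defect tension concrete. The payoff model below (value per animal falls linearly as the total herd approaches the pasture's capacity) and the herd sizes are assumptions chosen for illustration; the slides do not specify a payoff function.

```python
def agent_utilities(herds, capacity):
    """Utility of each agent given every agent's herd size.

    Assumed model: the value of one animal drops linearly from 1 to 0
    as the total herd grows from 0 to the pasture's capacity.
    """
    total = sum(herds)
    value_per_animal = max(0.0, 1.0 - total / capacity)
    return [h * value_per_animal for h in herds]


# Ten agents sharing a pasture with capacity 100.
cooperate = agent_utilities([5] * 10, 100)           # everyone keeps 5 animals
one_defects = agent_utilities([10] + [5] * 9, 100)   # agent 0 doubles its herd
all_defect = agent_utilities([9] * 10, 100)          # everyone follows suit
```

Under these assumptions the single defector does better than before while every other agent does worse, and when all agents defect the community as a whole ends up with far less than under cooperation.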

Braess’ Paradox
• Agents traverse a network consisting of a set of nodes and a number of connections between the nodes
• Aim: each agent must reach its destination as quickly as possible
– Traffic networks, water supply networks, electrical networks etc
• BP: Addition of an extra link has a detrimental effect on performance
• Introducing a shortest path link in a network that has reached equilibrium

[Figure: two network diagrams, each with nodes A, B, C and D]

Occurs when a community of agents is unable to coordinate their activities to take advantage of changes in the environment.
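The paradox can be checked numerically on a four-node network A, B, C, D. The numbers below are the usual textbook illustration (4000 agents, two congestible links costing n/100 and two fixed links costing 45) and are assumptions for this sketch, not values from the slides.

```python
def route_times(n_abd, n_acd, n_abcd):
    """Travel time on each route, given how many agents use each one.

    Links A->B and C->D are congestible (time = users / 100);
    links B->D and A->C have a fixed time of 45;
    link B->C is the added shortcut with zero cost.
    """
    ab = (n_abd + n_abcd) / 100
    cd = (n_acd + n_abcd) / 100
    bd = ac = 45
    bc = 0
    return {"ABD": ab + bd, "ACD": ac + cd, "ABCD": ab + bc + cd}


# Without the shortcut, agents split evenly and each needs 65 time units.
before = route_times(2000, 2000, 0)
# With the shortcut, every agent is tempted onto A->B->C->D; once all 4000
# use it, no one can improve by switching, yet everyone needs 80 time units.
after = route_times(0, 0, 4000)
```

In the `before` state the shortcut would take a single agent only 40 time units, so uncoordinated agents drift onto it until the network settles at the worse equilibrium.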

Collective Intelligence (COIN)
• Developed at NASA by Wolpert et al.
• Scalable coordination technique for adaptable, learning based multiagent systems (MAS).
• All agents strive to maximise their local utility function.
• The goal of the system is to maximise the global utility function.


Collective Intelligence (COIN)

Local utility functions are derived from the global utility functions so that:

• Maximisation of local utility functions maximises the global utility function: the global optimum ‘lines up’ with the Nash Equilibrium.

• Local utility functions are learnable: good signal-to-noise ratio for learning algorithms.

• Agents are coordinated indirectly. Emergent behaviour is still possible as agents are not given explicit instructions and behaviour is not predefined.
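One way to derive a local utility from a global one is a difference utility: the global utility with the agent present minus the global utility with the agent removed. This construction appears in the COIN literature (often under the name "Wonderful Life Utility"), but the slides do not say which construction is used, so the sketch below is an assumption. The global utility used here (a bar-problem style sum over options) is also only illustrative.

```python
import math


def global_utility(choices, num_options=7, capacity=6):
    """Assumed global utility: sum over options of count * exp(-count / capacity)."""
    counts = [choices.count(k) for k in range(num_options)]
    return sum(c * math.exp(-c / capacity) for c in counts)


def difference_utility(choices, agent, num_options=7, capacity=6):
    """Local utility of one agent: G with the agent minus G without it."""
    without_agent = choices[:agent] + choices[agent + 1:]
    return (global_utility(choices, num_options, capacity)
            - global_utility(without_agent, num_options, capacity))


# Agent 0 considers moving from option 0 to option 1.
current = [0, 0, 0, 1, 2]
alternative = [1, 0, 0, 1, 2]
local_change = difference_utility(alternative, 0) - difference_utility(current, 0)
global_change = global_utility(alternative) - global_utility(current)
```

Because the "without the agent" term does not depend on the agent's own choice, any move that raises the agent's local utility raises the global utility by the same amount, which is the alignment property the slide asks for.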


Adapting Collective Intelligence

• We are aiming to adapt this technique for agents that can be deployed via the internet.

• COIN concentrates on specific applications: coordinating communications satellites, robotic rovers.

• We want to apply this technique dynamically and concentrate on software agents.


LEAF – Learning Agent FIPA Compliant Community Toolkit

• Utility functions assigned dynamically.

• Utility extended to form two separate types: functional utility and performance utility.

• Assignment of multiple utility functions possible.

• Java API provided to support development of FIPA compliant agents.

FIPA - Foundation for Intelligent Physical Agents

• Standards for interoperable agent systems.
• FIPA ACL: conversations consisting of FIPA performatives such as inform, request, query etc.

• Agent management system (AMS) and directory facilitator (DF) part of the FIPA platform.

• LEAF utilises FIPA-OS implementation from Emorphia.

Community Building Kit: LEAF
Four core concepts:
• LEAF agents
• LEAF utility functions
• ESNs
• LEAF tasks
Provides support for:
• JESS based policy description
• Reinforcement learning


LEAF Agent

LEAF: Learning Agent FIPA-Compliant Community Toolkit

Implementation of LEAF is based on FIPA-OS

[Figure: class diagram showing the FIPA-OS classes FIPAOSAgent and Task, and the LEAF classes LeafNode, ESN and LeafTask]

LEAF: Learning Agent FIPA-Compliant Community Toolkit

• Coordination: utility functions are assigned to agents by an environment service node (ESN).

[Figure: an ESN and a community of agents, with utility functions f1 and f2]

LEAF: Learning Agent FIPA-Compliant Community Toolkit

[Figure: two ESNs with communities a and b, utility functions f1, f2 and f3, and the label “sum f2,f3”]

Multiple utility functions can be assigned

LEAF: Learning Agent FIPA-Compliant Community Toolkit

• Utility functions can have parameters that are not available locally to the agent.

[Figure: an ESN and a community, with utility function f1]

LEAF: Learning Agent FIPA-Compliant Community Toolkit

• Utility functions can have parameters that are not available locally to the agent.

[Figure: an ESN and a community, with utility function f1; O: observable properties, R: remote parameters]

LEAF: Learning Agent FIPA-Compliant Community Toolkit

Performance and Functional Utility
• Performance (P): speed of execution, number of tasks, CPU usage etc.
• Functional (F): decision making, learning (high level behaviour).

Performance Utility

• Provides a utility measure based on performance engineering related aspects
– Comms metrics: number of messages exchanged, size of message, response time
– Execution metrics: execution time, time to convergence, queue time
– Memory and I/O metrics: memory access time, disk access time
• The effect of implementation decisions (algorithms, languages) and deployment decisions (platforms, networks) can be assessed.

Functional Utility
• Utility based on “problem solving” capability
• Statically defined
– related to service properties (capability based)
– degree of match between task properties and service capability: syntax match (exact match), range match, semantic match (subsumption/subclass)
• Dynamically defined
– related to execution output (MSE)
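The three kinds of static match listed above (exact, range, and semantic via subclassing) could be scored along the following lines. The scores 1.0, 0.8 and 0.6, the property names and the small class hierarchy are all hypothetical; the slides do not say how LEAF weights the different kinds of match.

```python
def is_subclass(cls, ancestor, parent_of):
    """Walk up a parent map to test whether cls is ancestor or a descendant of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = parent_of.get(cls)
    return False


def match_score(required, offered, parent_of=None):
    """Score one task property against the corresponding service capability."""
    if required == offered:
        return 1.0  # syntax (exact) match
    if (isinstance(required, tuple) and len(required) == 2
            and isinstance(offered, (int, float))):
        low, high = required
        return 0.8 if low <= offered <= high else 0.0  # range match
    if (parent_of and isinstance(required, str) and isinstance(offered, str)
            and is_subclass(offered, required, parent_of)):
        return 0.6  # semantic match: offered is a subclass of what was required
    return 0.0


def functional_utility(task, service, parent_of=None):
    """Average degree of match over all properties the task asks for."""
    scores = [match_score(req, service.get(name), parent_of)
              for name, req in task.items()]
    return sum(scores) / len(scores) if scores else 0.0


task = {"format": "Image", "memory_mb": (512, 2048), "os": "linux"}
service = {"format": "PNG", "memory_mb": 1024, "os": "linux"}
hierarchy = {"PNG": "RasterImage", "RasterImage": "Image"}
utility = functional_utility(task, service, hierarchy)
```

Here the service matches one property exactly, one by range and one by subsumption, so its utility is the mean of the three assumed scores.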


Utility Function Implementation

• Extend the LocalUtilityFunction abstract class.

• Implement the compute() method.

• Functions have access to remote parameters and observable properties.
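The pattern described here (extend an abstract class, implement `compute()`) can be sketched as follows. LEAF itself is a Java toolkit and the slides do not give the signature of `compute()` or show how remote parameters and observable properties are passed in, so this Python stand-in, its constructor, the `ThroughputUtility` subclass and the property names are all assumptions for illustration.

```python
from abc import ABC, abstractmethod


class LocalUtilityFunction(ABC):
    """Python stand-in for the abstract class named on the slide."""

    def __init__(self, remote_parameters=None):
        # Remote parameters: values supplied from outside the agent (assumed form).
        self.remote_parameters = dict(remote_parameters or {})

    @abstractmethod
    def compute(self, observable_properties):
        """Return the utility as a number, given the agent's observable properties."""


class ThroughputUtility(LocalUtilityFunction):
    """Hypothetical utility: weighted fraction of submitted jobs that were processed."""

    def compute(self, observable_properties):
        submitted = observable_properties["jobs_submitted"]
        processed = observable_properties["jobs_processed"]
        weight = self.remote_parameters.get("weight", 1.0)
        return weight * processed / submitted if submitted else 0.0


utility_fn = ThroughputUtility({"weight": 2.0})
value = utility_fn.compute({"jobs_submitted": 10, "jobs_processed": 5})
```

Keeping the function separate from the agent is what lets an ESN assign, replace or combine utility functions at run time.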


Utility Function Implementation

Utility functions

• Global utility: $G = \sum_i U_i$, the sum of the local utilities $U_i$

• U = (jobs of type X processed)/(jobs of type X submitted)

• U = 1/(idle time)

For students: can you consider other utility functions that may be relevant?

Access to utility functions

• double computeFunctionalUtility(): computes the sum of all currently assigned functional utility functions.
• double computePerformanceUtility(): computes the sum of all currently assigned performance utility functions.
• String[] getFunctionalUtilityRequiredProperties(): returns the observable properties required to compute the agent’s functional utility functions.
• String[] getPerformanceUtilityRequiredProperties(): returns the observable properties required to compute the agent’s performance utility functions.


Resource management

• The objective is to provide users with on-demand access to resources needed to execute applications.

• Each peer/agent can undertake three different roles: application agent, resource agent, broker agent.

• Multiple roles may be undertaken by the same peer.
• Each peer is an autonomous agent capable of learning within its environment with the goal of local utility maximisation.

Application Agents
• Accept applications from users.
• Decompose applications into tasks.
• Identify suitable resources for task execution, via broker agents.
• Schedule and submit tasks to resource agents.
• Manage dynamic application execution process.
• Coordinated learning may be of benefit in resource selection.

Resource Agents
• Manage access to a particular resource.
• Resources may be computational, visualisation, scientific, or instrumentation based.
• Resource agents allow tasks to be submitted and executed on the resource.
• Coordinated learning may allow resource agents to optimise resource properties, and prioritise tasks.


Broker Agents

• Maintain information about discovered resource agents.

• Offer a matchmaking service, aimed at allowing application agents to discover resource agents.

• Coordinated learning may allow brokers to optimise their matchmaking service.


Agent based resource management

• Previous work used planning based BDI agents within the same framework.

• Current research involves investigating whether agents can benefit from coordinated learning.

• The eventual goal is to integrate the two techniques.


Agent Communities

• Communities are centred on the application/resource type: computational (C), visualisation (V), scientific (S), instrumentation (I) – there can be multiple communities of the same type.

• When an agent joins a community, it is assigned a local utility function.

• The agent learns to optimise this function to benefit the community.

• Agents are allowed to join multiple communities in an attempt to maximise their utility.


Agent Communities

Each community has a global utility function, based on community objectives:

1. Peers acting as application agents process as many applications as possible.

2. Peers acting as resource agents process as many tasks as possible.

3. Peers acting as broker agents facilitate (1) and (2).

Global Utility Functions

where $A$ is the number of applications processed by the community and $\mathrm{idle}_i$ is the amount of time agent $i$ spends idle; $c_1$ and $c_2$ are constants

Application agent utility functions

where $A_a$ is the number of applications processed by agent $a$, and $J_a$ is the total resource usage time used by $a$; $c_1$ and $c_2$ are constants

Resource agent utility functions

where $T_r$ is the number of tasks processed by resource agent $r$, and $\mathrm{idle}_r$ is the total time spent idle by the resource; $c_1$ and $c_2$ are constants

Broker agent utility functions

where n resources have been recommended by the broker agent, and Ul(i)Ti is the local utility of the recommended resource at the time of recommendation.

Simulations
• 4 communities (C, V, S, I)
• 10 resource agents
• 3 application agents
• 1 broker agent
• The current focus is on resource agent learning: joining communities and updating resource properties
• Peers attempt to join communities in order to increase their utility, and will only remain in the community as long as their utility is above a certain threshold.
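The join/leave rule in the last bullet can be sketched as a single decision step for one peer. How a peer estimates its utility in a community it has not yet joined is not described in the slides, so the `utility_in` mapping below is an assumed input, and the function name and the "try one new community per step" policy are hypothetical.

```python
def update_memberships(member_of, utility_in, threshold):
    """One decision step for a peer.

    member_of:  set of community names the peer currently belongs to.
    utility_in: mapping from community name to the peer's (estimated) utility there.
    threshold:  utility a community must exceed for the peer to join or remain.
    """
    # Remain only where utility is still above the threshold.
    staying = {c for c in member_of if utility_in[c] > threshold}
    # Consider the most promising community the peer is not yet part of.
    outside = [c for c in utility_in if c not in member_of]
    if outside:
        best = max(outside, key=utility_in.get)
        if utility_in[best] > threshold:
            staying.add(best)
    return staying


utilities = {"C": 0.2, "V": 0.9, "S": 0.4, "I": 0.1}
step1 = update_memberships({"C"}, utilities, threshold=0.3)
step2 = update_memberships(step1, utilities, threshold=0.3)
```

With these example values the peer leaves community C, joins V, and then adds S on the following step.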

[Figure: computational community. Global Utility and Number of Members plotted against time (0 to 250).]

[Figure: visualisation community. Global Utility and Number of Members plotted against time (0 to 400).]

[Figure: storage community. Global Utility and Number of Members plotted against time (0 to 500).]

[Figure: instrumentation community. Global Utility and Number of Members plotted against time (0 to 600).]


Current research objectives

• The aim is to allow peers to form communities, around which the collection of peers is ‘greater than the sum of their parts’.

• Current work involves the engineering of this application, and the evolution of the utility functions to include a greater degree of social context

• Learning is currently very difficult for the agents – need to allow learning algorithms to converge.


Toolkits: ABLE

• ABLE (Agent Building and Learning Environment)

• Support use of Java Beans

• Provides a host of pre-built functionality

• Also provides Tuning agents for:
– Load Balancing
– System Control function

AbleBeans – Java Agent Building Blocks

[Figure: two AbleBeans connected by direct method calls and by AbleEvents (Notification Events and Action Events)]

• AbleBean, AbleRemoteBean: a Java interface (local and remote)
• AbleObject: AbleBean instantiation with autonomous thread
• Bean interactions: Direct method calls and event passing
• AbleEvents: Notification and Action events with synchronous and asynchronous event handling
• AbleBeanInfo and Customizer required for use in Agent Editor
• Set of core data access and algorithm beans supplied

From Joe Bigus (IBM)

AbleAgents – Intelligent JavaBeans

[Figure: an AbleAgent containing AbleBean A, AbleBean B and AbleBean C, with a Sensor (get app data) and an Effector (call app action), alongside App/Service 1 and App/Service 2]

• AbleAgent, AbleRemoteAgent: a Java interface (extends AbleBean)
• Composable: can contain other AbleBeans and AbleAgents
• Sensors and Effectors: Allow agents to interface with apps
• Can be distributed, synchronous or asynchronous (autonomous)

From Joe Bigus (IBM)

ABLE Component Library

• Machine Learning: Back propagation, Self organizing maps, Radial Basis Functions, TD-Lambda, Decision Trees, Naive Bayes
• Machine Reasoning: Script (procedures), Forward / Backward chaining, Predicate logic (Prolog), Rete'-based pattern match, Fuzzy systems, Planning (STRIPS)
• Agents: Classification, Autotune (closed loop control), Clustering, Storage manager (multiple QoS), Prediction
• Data Access/Analysis: Text/DB read/write; Cache, Filter, Transform; Statistical routines; Genetic algorithms; other math analysis

From Joe Bigus (IBM)

ABLE Application Design

[Figure: diagram with the labels ABLE Library, ABLE Core Beans, Custom Beans (domain-specific) and Application Agent]

From Joe Bigus (IBM)

AbleBean Wrapper Design Pattern

[Figure: myAlgorithmBean, with its myAlgorithmCustomizer and myAlgorithmBeanInfo, wrapping theAlgorithm; both the bean and the algorithm show init(), process(), getters() and setters(), and the diagram also shows processTimerEvent()]

• Allows easy integration of existing Java algorithms into the Able environment
• Requires creation of 3 Java classes: Bean wrapper, BeanInfo and Customizer
• Bean contains an instance of the algorithm and calls methods on it
• No (or minimal) source changes required in the algorithm class
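The wrapper idea (a bean that holds an instance of an existing algorithm and forwards calls to it) can be illustrated independently of ABLE. The sketch below is plain Python rather than the ABLE Java API: `MovingAverage` stands in for "theAlgorithm", `MovingAverageBean` for "myAlgorithmBean", and the buffer attributes and method names are assumptions. The BeanInfo and Customizer classes are omitted.

```python
class MovingAverage:
    """Existing algorithm, left unchanged (stands in for "theAlgorithm")."""

    def __init__(self, window=3):
        self.window = window
        self.values = []

    def init(self):
        self.values = []

    def process(self, x):
        self.values.append(x)
        recent = self.values[-self.window:]
        return sum(recent) / len(recent)


class MovingAverageBean:
    """Wrapper bean (stands in for "myAlgorithmBean")."""

    def __init__(self, window=3):
        self._algorithm = MovingAverage(window)  # the bean contains the algorithm
        self.input_buffer = []
        self.output_buffer = []

    def init(self):
        self._algorithm.init()
        self.output_buffer = []

    # Getter and setter forward to the wrapped algorithm's parameter.
    def get_window(self):
        return self._algorithm.window

    def set_window(self, window):
        self._algorithm.window = window

    def process(self):
        # Delegate each input to the algorithm and collect the results.
        self.output_buffer = [self._algorithm.process(x) for x in self.input_buffer]
        self.input_buffer = []

    def process_timer_event(self):
        # A timer tick simply triggers a normal processing pass.
        self.process()


bean = MovingAverageBean(window=2)
bean.init()
bean.input_buffer = [2, 4, 6]
bean.process()
```

The algorithm class needs no changes; only the wrapper knows about buffers and events.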

From Joe Bigus (IBM)

Rule Blocks

<type> <name>() using <engine> { ruleList } ;

• Semantically equivalent to Java methods
• Can specify a return data type
• Can use pre-defined or user-defined name
• No formal parameter lists, use global vars
• Specify inference engine via using <engine> clause
• <engine> can be any AbleInferenceEngine Java subclass
• Body of ruleblock contains one or more Rules
• Use setControlParameter() built-in function to set goals, options, etc.
• Ruleblock can have local or shared working memory

ARL Rule Syntax

<ruleLabel> { preConditions } [priority] : <ruleBody>;

• ruleLabel – unique identifier in ruleset
• preConditions – list of Java objects (e.g. TimePeriods)
• priority – used in conflict resolution during inferencing
• Rule body must be one of the ARL rule types
• Example: myRule { weekdaysOnly } [ 3.0 ] : println("wow");
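The role of preconditions and priority can be illustrated with a toy rule selector. This is not the ABLE inference engine: the slides only say that priority is "used in conflict resolution", so picking the highest-priority eligible rule is an assumed policy, and modelling preconditions as Python predicates over a context dictionary is a simplification.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    label: str
    body: Callable[[dict], str]
    priority: float = 1.0
    preconditions: List[Callable[[dict], bool]] = field(default_factory=list)


def select_rule(rules, context):
    """Return the highest-priority rule whose preconditions all hold, or None."""
    eligible = [r for r in rules if all(p(context) for p in r.preconditions)]
    return max(eligible, key=lambda r: r.priority) if eligible else None


def weekdays_only(context):
    # Hypothetical precondition: weekday 0 to 4 means Monday to Friday.
    return context["weekday"] < 5


rules = [
    Rule("myRule", lambda ctx: "wow", priority=3.0, preconditions=[weekdays_only]),
    Rule("fallback", lambda ctx: "nothing to do", priority=1.0),
]

on_wednesday = select_rule(rules, {"weekday": 2})
on_sunday = select_rule(rules, {"weekday": 6})
```

On a weekday both rules are eligible and the priority decides; at the weekend only the fallback remains.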

ABLE Rule Templates
• Allow IT Developer or Programmer to create rulesets and templates using WSAD editor
• Minimize external meta-data or artifacts
• Business user can create rules from templates using web-based UI
• Allow easy parameterization of rules and rule logic, with constraints on parameter values
• Reuse existing ABLE data types and ARL syntax
• Allow users to customize rule templates and create new rules
• Variable values are constrained based on ruleset author constraints
• Can generate individual rules or entire rulesets via templates
• Can edit generated rules using same authoring environment

ARL Rule Template Syntax

Ruleset myRuleTemplateExample {
  import com.ibm.myclass.Customer;
  variables {
    Customer customer = new Customer() ; // myclass type
    template Categorical customerLevel = new Categorical("gold", "silver", "platinum");
    template String salesMsg = new String("Thank you for shopping IBM"); // example msg
    template Continuous discountValue = new Continuous(0.01, 0.50); // allow range from 1% to 50%
    Double discount = new Double(0.0) ;
  }
  inputs { customer } ;
  outputs { discount } ;
  void process() {
    Rule1: if (a > b) then println("regular old rule") ;
    Rule2: if (a <= b) then println("another regular old rule") ;
    template myRuleTemplate1: if ( customer.level == customerLevel ) // NOTE: Rule is a template
      then { discount = discountValue ; println( salesMsg ) ; }
  }
}
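The idea behind the template variables, that a business user fills in values which are checked against constraints set by the ruleset author, can be sketched outside ARL. The Python classes below only borrow the names `Categorical` and `Continuous` from the example; their methods, the `make_rule` helper and the dictionary-based customer are assumptions and not the ABLE API.

```python
class Categorical:
    """Constraint: the value must be one of a fixed set of labels."""

    def __init__(self, *allowed):
        self.allowed = allowed

    def accepts(self, value):
        return value in self.allowed


class Continuous:
    """Constraint: the value must lie in a closed numeric range."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def accepts(self, value):
        return self.low <= value <= self.high


# Constraints copied from the template variables in the ruleset above.
TEMPLATE_VARS = {
    "customerLevel": Categorical("gold", "silver", "platinum"),
    "discountValue": Continuous(0.01, 0.50),
}


def make_rule(customer_level, discount_value):
    """Instantiate the template after checking the author's constraints."""
    values = {"customerLevel": customer_level, "discountValue": discount_value}
    for name, value in values.items():
        if not TEMPLATE_VARS[name].accepts(value):
            raise ValueError(f"{name}={value!r} violates the template constraint")

    def rule(customer):
        # Mirrors: if (customer.level == customerLevel) then discount = discountValue
        return discount_value if customer["level"] == customer_level else 0.0

    return rule


gold_rule = make_rule("gold", 0.10)
```

A value outside the author's range, such as a 75% discount, is rejected before any rule is generated.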

Autotune Agent Web-Tuning Scenario

• Agent Properties
– Flexible
– Autonomic
– Generic

[Figure: Users, an Apache Web Server and the AutoTune Agent (Modeling, Run-time Control), with the labels Desired Utilization Level, KeepAlive, MaxClients, CPU and MEM]
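The run-time control part of this scenario is a feedback loop: compare measured utilization with the desired level and adjust a server setting accordingly. The sketch below is a single-input, single-output integral controller on MaxClients only. The linear utilization model, the gain and the starting value are assumptions made so that the loop can be run; the slides do not describe the actual AutoTune model or controller.

```python
def measured_utilization(max_clients, load_per_client=0.001):
    """Hypothetical server model: utilization grows linearly with MaxClients."""
    return min(1.0, max_clients * load_per_client)


def tune_max_clients(target, max_clients=150, gain=500.0, steps=20):
    """Integral control: move MaxClients in proportion to the utilization error."""
    trace = []
    for _ in range(steps):
        error = target - measured_utilization(max_clients)
        # Clamp to a plausible range and keep the setting an integer.
        max_clients = int(min(1000, max(1, round(max_clients + gain * error))))
        trace.append(max_clients)
    return trace


trace = tune_max_clients(target=0.6)
```

With this gain the error roughly halves at every step, so the setting approaches the value that yields the desired utilization without overshooting.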


Design Phase I: System Modeling

iSeries System Administration using ABLE

[Figure: a SysAdmin Agent, Task/Info Agents and Action Agents, with the components SysAdminBrainRuleSet, SysAdminActionsRuleSet, CPUWatcher, DiskWatcher, DiskPredictor, NOJWatcher, FindLargeObjects, findDuplcateJobs, Cleanup and FindRunawayJobs]

Performance Prediction using Neural Networks

[Figure: Monitor Data feeding a Neural Prediction Agent]

• Web Server running on Windows 2000
• Hit with variable workload, seasonality
• Capture Performance Monitor Data
• Train neural network to predict future response time


WinGamma

• Data analysis toolkit – especially for time series data

• Can support identification of:
– Time series “embedding” dimension
– Level of noise present within data
– Based on the “Gamma” statistic

• Can be used prior to training a neural network


WEKA: Waikato Environment for Knowledge Analysis


Explorer: building “classifiers”

• Classifiers in WEKA are models for predicting nominal or numeric quantities

• Implemented learning schemes include:
– Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …

• “Meta”-classifiers include:
– Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …

Monitoring Tools

• NWS (Network Weather Service)
– Support a forecasting model
– Work at “application-level” and not necessarily at the network (resource) level

• NetLogger
– Now supports instrumentation for Globus calls
– Useful data capture process (event based)
– Manage level of data captured

• Specialist support via Apache Web Server
– Messaging and Execution time
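The forecasting model mentioned for NWS is commonly described as running several simple predictors side by side and reporting the one that has been most accurate so far. The sketch below follows that idea in a few lines of Python; the three predictors, the window of five samples and the squared-error score are illustrative assumptions and not the actual NWS implementation.

```python
from statistics import mean, median

# Candidate predictors: each maps the measurement history to a next-value guess.
FORECASTERS = {
    "last_value": lambda history: history[-1],
    "running_mean": lambda history: mean(history),
    "sliding_median": lambda history: median(history[-5:]),
}


def forecast(history):
    """Return (name of best predictor so far, its forecast for the next value).

    history must contain at least one measurement.
    """
    errors = {name: 0.0 for name in FORECASTERS}
    # Replay the history: score every predictor on each value it could have forecast.
    for t in range(1, len(history)):
        past = history[:t]
        for name, predictor in FORECASTERS.items():
            errors[name] += (predictor(past) - history[t]) ** 2
    best = min(errors, key=errors.get)
    return best, FORECASTERS[best](history)


best_name, next_value = forecast([1, 2, 3, 4, 5, 6, 7, 8])
```

On a steadily rising series the last-value predictor accumulates the least error, so it is the one selected.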


From Brian Tierney (LBNL)


From: G. Obertelli (UCSB)

Additional Info.
• IBM Autonomic Computing Web site
– http://www.research.ibm.com/autonomic/
• IBM Autonomic Computing Library
– http://www-03.ibm.com/autonomic/library.html
• LEAF project
– http://users.cs.cf.ac.uk/O.F.Rana/leaf/
• DIPSO/FAEHIM project
– http://users.cs.cf.ac.uk/Ali.Shaikhali/faehim/
• WinGamma
– http://www.cs.cf.ac.uk/wingamma/
• WEKA
– http://www.cs.waikato.ac.nz/ml/weka/
• ABLE Toolkit – Tutorial
– http://www.cs.iastate.edu/~colloq/docs/able2_bigus.ppt