Autonomic Computing: Model, Architecture, Infrastructure
Manish Parashar
The Applied Software Systems Laboratory
Rutgers, The State University of New Jersey
http://automate.rutgers.edu
Ack: NSF (CAREER, KDI, ITR, NGS), DoE (ASCI)
UPP – Autonomic Computing
Mt. St. Michel, France, September 15 – 17, 2004
UPP, September 15-17, 2004 2
Unprecedented Complexity, Uncertainty …
• Very large scales – millions of entities
• Ad hoc (amorphous) structures/behaviors – p2p/hierarchical architectures
• Dynamic – entities join, leave, move, change behavior
• Heterogeneous – capability, connectivity, reliability, guarantees, QoS
• Unreliable – components, communication
• Lack of common/complete knowledge – number, type, location, availability, connectivity, protocols, semantics, etc.
Autonomic Computing
• Our systems, programming paradigms, methods, and management tools seem inadequate for handling the scale, complexity, dynamism, and heterogeneity of emerging systems
  – requirements and objectives are dynamic and not known a priori
  – requirements, objectives, and solutions (algorithms, behaviors, interactions, etc.) depend on state, context, and content
• Nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability, and lack of guarantees
  – self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work!
• The goal of autonomic computing is to build self-managing systems that address these challenges using high-level policies
Ashby’s Ultrastable System Model of the Human Autonomic Nervous System
[Diagram: the environment is coupled to the reacting part R through motor and sensor channels; essential variables feed a step mechanism/input parameter S that adjusts R.]
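As an illustration (not from the talk), Ashby's ultrastable loop can be sketched in a few lines of Python: the reacting part R counteracts a disturbance via a parameter, and whenever an essential variable leaves its viable bounds, the step mechanism S makes a random step change to that parameter. The function and its dynamics are our own toy assumptions.

```python
import random

def ultrastable_step(essential, disturbance, param, bounds=(-1.0, 1.0), rng=random):
    """One iteration of an (illustrative) Ashby ultrastable loop.

    R damps the essential variable in proportion to `param`; if the
    essential variable leaves `bounds`, S steps `param` to a new random
    value and the variable is clamped back into the viable region.
    """
    essential = essential + disturbance - param * essential  # R acts on the environment
    if not (bounds[0] <= essential <= bounds[1]):            # essential variable out of range
        param = rng.uniform(0.0, 2.0)                        # S: step change to R's parameter
        essential = max(bounds[0], min(bounds[1], essential))
    return essential, param
```

Iterating this from an unstable initial parameter shows the defining behavior: the system keeps re-parameterizing itself until the essential variable stays within bounds.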
Programming Distributed Systems
• A distributed system is a collection of logically or physically disjoint entities that have established a process for making collective decisions.

if (Decision(CurrentState, Request)) then TransitionState(CurrentState, Request)

– Central/distributed decision & transition
– Programming system
  • programming model, languages/abstractions – syntax + semantics
    – entities, operations, rules of composition, models of coordination/communication
  • abstract machine, execution context and assumptions
  • infrastructure, middleware and runtime
– Conceptual and implementation models
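The guarded-transition abstraction above can be made concrete with a minimal Python sketch; the `Entity` class and its method names are hypothetical, not part of the talk:

```python
# Minimal sketch of the slide's abstraction: every collective action is a
# guarded state transition — if Decision(CurrentState, Request) holds,
# then TransitionState(CurrentState, Request) is applied.
from dataclasses import dataclass, field

@dataclass
class Entity:
    state: dict = field(default_factory=dict)

    def decision(self, request):
        # Accept only requests that would actually change the state
        # (an arbitrary illustrative policy).
        key, value = request
        return self.state.get(key) != value

    def transition(self, request):
        key, value = request
        self.state[key] = value

    def handle(self, request):
        if self.decision(request):
            self.transition(request)
            return True
        return False
```

A duplicate request is then rejected by the decision step while a fresh one triggers the transition, which is the essence of the `Decision`/`TransitionState` pair.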
UPP 2004 – Autonomic Computing
• Objective: investigate conceptual and implementation models for autonomic computing
  – Models, Architectures and Infrastructures for Autonomic Computing (Manish Parashar et al.)
  – Grassroots Approach to Self-Management in Large-Scale Distributed Systems (Ozalp Babaoglu et al.)
  – Autonomic Runtime System for Large Scale Applications (Salim Hariri et al.)
Outline
• Programming emerging distributed systems
• Project AutoMate and the Accord programming system
• Sample applications in science and engineering
• Conclusion
Autonomic Computing Architecture
• Autonomic elements (components/services)
  – Responsible for policy-driven self-management of individual components
• Relationships among autonomic elements
  – Based on agreements established/maintained by autonomic elements
  – Governed by policies
  – Give rise to resiliency, robustness, and self-management of the system
Project AutoMate: Enabling Autonomic Applications(http://automate.rutgers.edu)
• Conceptual models and implementation architectures for autonomic computing
  – programming models, frameworks and middleware services
    • autonomic elements
    • dynamic and opportunistic composition
    • policy, content and context driven execution and management
[Architecture diagram: the AutoMate stack.
  Autonomic Grid Applications
  Programming System – autonomic components, dynamic composition, opportunistic interactions, collaborative monitoring/control
  Decentralized Coordination Engine – agent framework, decentralized reactive tuple space
  Semantic Middleware Services – content-based discovery, associative messaging
  Content Overlay – content-based routing engine, self-organizing overlay
Cross-cutting elements: Accord programming framework; Rudder coordination middleware; Meteor/Squid content-based middleware; Sesame/DAIS protection service; ontology/taxonomy.]
Accord: A Programming System for Autonomic Applications
• Specification of applications that can detect and dynamically respond, during execution, to changes in both the execution environment and application state
  – applications are composed from discrete, self-managing components that carry separate specifications of their functional, non-functional, and interaction/coordination behaviors
  – the specifications of computational (functional) behaviors, interaction/coordination behaviors, and non-functional behaviors (e.g., performance, fault detection and recovery) are separated so that their combinations are composable
  – policy is separated from mechanism – policies, in the form of rules, orchestrate a repertoire of mechanisms to achieve context-aware, adaptive runtime computational behaviors and coordination/interaction relationships based on functional, performance, and QoS requirements
  – extends existing distributed programming systems
Autonomic Elements in Accord
[Diagram: an autonomic element couples a computational element with an element manager; context/content rules, state, sensor/actuator invocations and function interfaces are exposed through its functional, control and operational ports.]
– Functional port defines the set of functional behaviors provided and used
– Control port defines sensors/actuators for externally monitoring and controlling the autonomic element, and a set of guards to control access to those sensors and actuators
– Operational port defines interfaces to formulate, inject and manage the rules used to manage the runtime behaviors and interactions of the element
– An autonomic element embeds an element manager that is delegated to evaluate and execute rules in order to manage the execution of the element, and that cooperates with other element managers to fulfill application objectives
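A minimal Python sketch of the three ports, under our own assumptions about their shape (the class and method names are hypothetical, not Accord's API): the control port exposes guarded sensors/actuators and the operational port accepts injected rules.

```python
class AutonomicElement:
    """Illustrative sketch of an Accord-style element's three ports."""

    def __init__(self):
        self._sensors = {}      # control port: name -> zero-arg reader
        self._actuators = {}    # control port: name -> one-arg writer
        self._guards = {}       # name -> predicate deciding access for a caller
        self._rules = []        # operational port: injected rules

    def add_sensor(self, name, reader, guard=lambda caller: True):
        self._sensors[name] = reader
        self._guards[name] = guard

    def add_actuator(self, name, writer, guard=lambda caller: True):
        self._actuators[name] = writer
        self._guards[name] = guard

    def sense(self, name, caller):
        # Guards mediate all external access to sensors and actuators.
        if not self._guards[name](caller):
            raise PermissionError(name)
        return self._sensors[name]()

    def actuate(self, name, value, caller):
        if not self._guards[name](caller):
            raise PermissionError(name)
        self._actuators[name](value)

    def inject_rule(self, rule):
        # Operational port: rules are formulated externally and injected.
        self._rules.append(rule)
```

The functional port would simply be the component's ordinary methods; what distinguishes the control port is that every sensor/actuator invocation passes through a guard.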
Rules In Accord
IF condition THEN then_actions ELSE else_actions

– condition: a logic combination of sensors, events, and functional interfaces
– then_actions/else_actions: a sequence of sensor, actuator and functional-interface invocations
– Behavior rules manage the runtime behaviors of a component
– Interaction rules manage the interactions between components, between components and environments, and the coordination within an application.
• control structure, interaction pattern, communication mechanism
– Security rules control access to the functional interfaces, sensors/actuators and rule interfaces
– Conflicts are resolved using a simple priority mechanism
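A sketch of how such rules might be evaluated, including the simple priority scheme for conflicts; the `Rule` dataclass and the "last, highest-priority writer wins" policy are our illustrative assumptions, not Accord's actual implementation:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Rule:
    condition: Callable[[dict], bool]            # logic combination over sensor readings
    then_actions: List[Callable[[dict], None]]   # actuator/interface invocations
    else_actions: List[Callable[[dict], None]] = field(default_factory=list)
    priority: int = 0

def evaluate(rules, state):
    """Fire each rule's THEN or ELSE branch against `state`.

    Rules run in ascending priority order, so when two rules write the
    same value the highest-priority rule's action lands last and wins —
    a simple stand-in for Accord's priority-based conflict resolution.
    """
    for rule in sorted(rules, key=lambda r: r.priority):
        actions = rule.then_actions if rule.condition(state) else rule.else_actions
        for act in actions:
            act(state)
    return state
```

For example, a low-priority rule selecting one algorithm and a high-priority rule selecting another both fire, and the high-priority choice prevails.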
Dynamic Composition/Coordination In Accord
[Diagram: workflow manager(s) inject interaction rules into the element managers of the composed elements.]
• Relationship is defined by control structure (e.g., loop, branch) and/or communication mechanism (e.g., RPC, shared-space)
– composition manager translates workflow into a suite of interaction rules injected into element managers
– element managers execute rules to establish control and communication relationships among elements in a decentralized manner
• rules can be used to add or delete elements
• a library of rule-sets is defined for common control and communication relationships between elements
– interaction rules must be based on the core primitives provided by the system.
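To make the translation step concrete, here is a toy composition manager that turns a workflow edge list into per-element interaction rules; the function name and rule schema are invented for illustration:

```python
def compile_workflow(edges):
    """Translate a workflow, given as (src, dst) edges, into interaction
    rules to be injected into each element's manager (illustrative only).

    Each edge yields one rule for the producer (forward its output) and
    one for the consumer (start when input arrives), so control and
    communication relationships are established in a decentralized way.
    """
    rules = {}
    for src, dst in edges:
        rules.setdefault(src, []).append(
            {"if": f"{src} completed", "then": f"send output to {dst}"})
        rules.setdefault(dst, []).append(
            {"if": f"input received from {src}", "then": f"start {dst}"})
    return rules
```

Adding or deleting an element then amounts to injecting or retracting the corresponding rules, rather than rewiring a central workflow engine.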
Accord Implementation Issues
• Current implementations
  – C++ + MPI, DoE CCA, XCAT/OGSA
  – XML used for control/operational ports and rules
  – common ontology for specifying interfaces, sensors/actuators, rules, content, context, …
  – timed behavior, fail-stop semantics
  – of course, there is a performance impact, but in our experience it has not been a show stopper
• Accord assumes an execution environment that provides
  – an agent-based control network
  – support for associative coordination
  – services for content-based discovery and messaging
  – support for context-based access control
  – the execution environment of the underlying programming system
Accord Neo-CCA
[Diagram: in an original Neo-CCA application, a driver component's GoPort/usePort is wired directly to the providePorts of components A and B; in the Neo-CCA-based Accord application, an element manager and a composition agent are interposed between the driver and the components' ports.]
Accord Neo-CCA
[Diagram: Neo-CCA framework instances on nodes x, y and z, each hosting a composition agent (CA), an element manager (EM), a driver, and components A and B.]
Accord Application Infrastructure
• Rudder Decentralized Coordination Framework
  – supports autonomic compositions, adaptations, optimizations, and fault-tolerance
    • context-aware software agents
    • decentralized tuple-space coordination model
• Meteor Content-based Middleware
  – services for content routing, content discovery and associative interactions
    • a self-organizing content overlay
    • content-based routing engine and decentralized information discovery service – flexible routing and querying with guarantees and bounded costs
    • Associative Rendezvous messaging – content-based decoupled interactions with programmable reactive behaviors
• Details in IEEE IC 05/04, ICAC 04, SMC 05
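The flavor of content-based, decoupled interaction can be conveyed with a small sketch; the matching semantics (exact attribute match, with "*" as wildcard) and the class/method names are our assumptions, not Meteor's actual interface:

```python
def matches(interest, profile):
    """Content-based match: every (attribute, value) constraint in the
    interest must hold on the message profile; '*' matches anything."""
    return all(profile.get(k) == v or v == "*" for k, v in interest.items())

class Rendezvous:
    """Toy associative-rendezvous point: senders and receivers never
    address each other directly — messages meet registered interests
    by content, triggering the receiver's reactive behavior."""

    def __init__(self):
        self.interests = []   # (interest profile, reaction callback)

    def register(self, interest, reaction):
        self.interests.append((interest, reaction))

    def post(self, profile, payload):
        for interest, reaction in self.interests:
            if matches(interest, profile):
                reaction(payload)
```

A receiver interested in any sensor message, regardless of region, registers `{"type": "sensor", "region": "*"}` and is invoked only for matching posts.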
Data-Driven Optimization of Oil Production
Autonomic Oil Well Placement (VFSA)

[Figure: permeability field and pressure contours, 3 wells, 2D profile; contours of NEval(y,z,500). Exhaustive search requires NY×NZ (450) evaluations to find the minimum; the VFSA "walk" finds it after 20 (81) evaluations.]
Autonomic Oil Well Placement (VFSA)
Autonomic Oil Well Placement (SPSA)
Permeability field showing the positioning of current wells. The symbols “*” and “+” indicate injection and producer wells, respectively.
Search space response surface: Expected revenue - f(p) for all possible well locations p. White marks indicate optimal well locations found by SPSA for 7 different starting points of the algorithm.
Autonomic Oil Well Placement (SPSA)
CH4Air/H2Air Simulations
• Simulate the chemical reactions among the elements O, H, C, N, and Ar under dynamic conditions
  – CRL/SNL, Livermore, CA
• Objective: use current sensor data and simulation state to choose the "best" algorithm, i.e., the one that accelerates convergence by decreasing nfe (the number of function evaluations)
Rule Generation for CH4Air Problem
Comparison of BDF algorithms in CH4Air problem in terms of nfe
[Chart: number of nfe (0–1400) vs. temperature (1000–3000) for BDF2, BDF3, BDF4, and BDF5.]
Rules for CH4Air Problem
• IF 1000 <= temperature < 2000 THEN BDF 3
• IF 2000 <= temperature < 2200 THEN BDF 4
• IF 2200 <= temperature < 3000 THEN BDF 3
• IF 3000 <= temperature THEN BDF 3
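These rules translate directly into code; a minimal Python rendering (the function name is ours) that returns the BDF order the rules prescribe for a given temperature:

```python
def select_bdf(temperature):
    """Pick the BDF order for the CH4Air problem per the slide's rules,
    i.e. the order expected to minimize nfe at this temperature."""
    if 1000 <= temperature < 2000:
        return 3
    if 2000 <= temperature < 2200:
        return 4
    if 2200 <= temperature < 3000:
        return 3
    if temperature >= 3000:
        return 3
    raise ValueError("temperature below the rules' range")
```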
Experiment Results of CH4Air Problem
Comparison of rule based and non rule based execution of CH4Air problem in terms of nfe
[Chart: number of nfe (0–1400) vs. temperature for rule-based vs. non-rule-based execution.]
Rule Generation for H2Air Problem
Comparison of BDF algorithms in H2Air problem in terms of nfe
[Chart: number of nfe (0–700) vs. temperature (1000–2400) for BDF2, BDF3, BDF4, and BDF5.]
Rules for H2Air Problem
• IF 1000 <= temperature < 1200 THEN BDF 2
• IF 1200 <= temperature < 1800 THEN BDF 4
• IF 1800 <= temperature < 2400 THEN BDF 3
• IF 2400 <= temperature THEN BDF 4
Experiment Results of H2Air Problem
Comparison of rule based and non rule based execution of H2Air problem in terms of nfe
[Chart: number of nfe (0–700) vs. temperature (1000–2000) for rule-based vs. non-rule-based execution.]
Computational Modeling of Physical Phenomenon
• Realistic, physically accurate computational modeling
  – Large computation requirements
    • e.g. simulating the core-collapse of supernovae in 3D at reasonable resolution (500^3) would require ~10–20 teraflops for 1.5 months (i.e. ~100 million CPUs!) and about 200 terabytes of storage
    • e.g. turbulent flow simulations using active flow control in aerospace and biomedical engineering require 5000×1000×500 = 2.5·10^9 points and approximately 10^7 time steps; with 1 GFlop processors this is a runtime of ~7·10^6 CPU hours, or about one month on 10,000 CPUs (with perfect speedup). Also, at 700 B/pt, the memory requirement is ~1.75 TB of runtime memory and ~800 TB of storage
  – Dynamically adaptive behaviors
  – Complex couplings
    • multi-physics, multi-model, multi-resolution, …
  – Complex interactions
    • application – application, application – resource, application – data, application – user, …
  – Software/systems engineering/programmability
    • volume and complexity of code, community of developers, …
      – scores of models, hundreds of components, millions of lines of code, …
A Selection of SAMR Applications
Multi-block grid structure and oil concentration contours (IPARS, M. Peszynska, UT Austin)
Blast wave in the presence of a uniform magnetic field – 3 levels of refinement (Zeus + GrACE + Cactus, P. Li, NCSA, UCSD)
Mixture of H2 and Air in stoichiometric proportions with a non-uniform temperature field (GrACE + CCA, Jaideep Ray, SNL, Livermore)
Richtmyer-Meshkov – detonation in a deforming tube – 3 levels; Z=0 plane visualized on the right (VTF + GrACE, R. Samtaney, CIT)
Autonomic Runtime Management
[Architecture diagram: self-observation & analysis and self-optimization & execution loops drive autonomic partitioning (partition/compose, repartition/recompose) of virtual computation units (VCUs) onto virtual resource units over a heterogeneous, dynamic computational environment, fed by application and resource monitoring services and context-aware services. System-state characterization (performance-prediction, system-capability and resource-history modules synthesizing CPU, memory, bandwidth, availability and access policy) and application-state characterization (nature of adaptation, application dynamics, computation/communication) feed an objective-function synthesizer whose prescriptions (mapping, distribution, redistribution, execution) use normalized work (NWM) and resource (NRM) metrics. Autonomic scheduling combines virtual grid time scheduling (VGTS) and virtual grid space scheduling (VGSS) at global and local grid levels, coordinated by deduction engines in the virtual grid autonomic runtime manager over current system and application state.]
Autonomic Forest Fire Simulation
[Figure: fire-spread simulation showing a high-computation zone.]
Predicts fire spread (the speed, direction and intensity of forest fire front) as the fire propagates, based on both dynamic and static environmental and vegetation conditions.
Conclusion
• Autonomic applications are necessary to address the scale, complexity, heterogeneity, dynamism, and reliability challenges of emerging systems
• Project AutoMate and the Accord programming system address key conceptual and implementation issues to enable the development of autonomic applications
• More information, publications, software, conference
  – http://automate.rutgers.edu
  – [email protected] / [email protected]
  – http://www.autonomic-conference.org
The Team
• TASSL, Rutgers University
  – Autonomic Computing Research Group
    • Viraj Bhat • Nanyan Jiang • Hua Liu (Maria) • Zhen Li (Jenny) • Vincent Matossian • Cristina Schmidt • Guangsen Zhang
  – Autonomic Applications Research Group
    • Sumir Chandra • Xiaolin Li • Li Zhang
• CS Collaborators
  – HPDC, University of Arizona • Salim Hariri
  – Biomedical Informatics, The Ohio State University • Tahsin Kurc, Joel Saltz
  – CS, University of Maryland • Alan Sussman, Christian Hansen
• Applications Collaborators
  – CSM, University of Texas at Austin • Malgorzata Peszynska, Mary Wheeler
  – IG, University of Texas at Austin • Mrinal Sen, Paul Stoffa
  – ASCI/CACR, Caltech • Michael Aivazis, Julian Cummings, Dan Meiron
  – CRL, Sandia National Laboratory, Livermore • Jaideep Ray, Johan Steensland