Monitoring Data Dependencies to Support
Recovery in Concurrent Process Execution*
Susan D. Urban
Department of Computer Science
February 6, 2009
*This research is partially supported by NSF Grant No. CCF-0820152.
The Challenge of Concurrent Execution in a Service-Oriented Environment
Serializability
• The concurrent execution of two or more transactions must be equivalent to the serial execution of those transactions.
• Two-phase locking and two-phase commit support serializability in controlled distributed environments.
Isolation
• Data changes should not be released before the commit of a transaction.
• Lack of isolation leads to cascaded rollbacks when transaction failure occurs.
  • Transaction A fails and performs rollback.
  • If transaction B reads modified data from transaction A, transaction B must also roll back.
The problem: Serializability and isolation are not generally applicable to long-running workflow or process scenarios composed of distributed, autonomous services.
• Compensation can be used to logically undo a process.
• Compensation does not account for the effect of the failure and recovery process on concurrently executing processes.
Concurrent Process Execution Scenario
Scenario
• Process1 fails at service operation5.
• Compensation can be executed to restore Process1.
• Process2 may be operating with incorrect data.
[Figure: Process1 and Process2 concurrently invoke service operations hosted by Service Provider1, Service Provider2, and Service Provider3.]
Research Challenges
Can we capture and share data changes and data dependencies among concurrently executing processes that invoke Grid/Web Services?
Can we provide a more intelligent way to dynamically analyze the relationships that exist between concurrently executing processes?
Can we determine how the recovery of one process can affect other concurrently executing processes based on application semantics?
“Our Success in hiding computers when they work brings with it a responsibility to hide them when they fail. Imagine Web Services as
available as telephones….we will have to design systems assuming that they will fail….
we should seek to ensure that all systems mask almost all of those failures from users.”
From Computer Science: Reflections on the Field, Reflections from the Field, National Research Council of the National Academies, 2004.
Harnessing Moore’s Law, by Mark Hill
Overview of Presentation
Related Work
The DeltaGrid Approach
• Overview of the Approach
• Delta-Enabled Grid Services (DEGS)
• Process Dependency Model
• Service Composition and Recovery Model
• Process Interference Rules and Recovery Algorithm
Implementation, Simulation, and Performance Evaluation
DeltaGrid Research Contributions
Current Directions (NSF Grant No. CCF-0820152)
• The D3 Project: Decentralized Data Dependency Analysis and Recovery for Concurrent Processes
THE REACTIVE BEHAVIOR AND DATA MANAGEMENT RESEARCH TEAM
Past Members from Arizona State University
• Luther Blake (M.S.), The Design and Implementation of Delta-Enabled Grid Services, 2006.
• Yang Xiao (Ph.D.), Using Deltas to Analyze Data Dependencies and Semantic Correctness in the Recovery of Concurrent Processes, 2006.
• Vidya Gopalan (M.S.), Simulation and Evaluation of an Object-Oriented Condition Evaluator for Process Interference Rules, 2008.
Current Team from Texas Tech University
• Ziao Liu, M.S. Student, Decentralized Data Dependency Analysis for Concurrent Process Execution (in progress)
• Le Gao, Ph.D. Student (in progress)
• Andrew Courter, B.S./M.S. Student (in progress)
http://reactive.cs.ttu.edu
Related Work: Transactions and Workflows
Transactional Workflow
• The ConTract Model (compensation, pre-/post-condition) (Wachter and Reuter 1992)
• METEOR (pre-defined hierarchical error model) (Worah 1997)
• CREW (explicitly specify data dependency) (Kamath and Ramamritham 1998)
• WAMO (automatic exception handling for workflow execution) (Eder and Liebhart 1995)
Exception handling in service composition environment
• Transaction protocols: WS-Transaction (Cabrera et al. 2002)
• Transactional Attitude (Mikalsen, Tai, and Rouvellou 2002)
• Web Service Composition Action (contingency) (Tartanoglu et al. 2003)
• BPEL4WS (Andrews et al. 2003)
• BPML (Arkin 2002)
Our Research
• Supports relaxed isolation and user-defined semantic correctness.
• Rule-based approach to resolving failure and recovery impact on concurrent processes.
• Dynamically analyzes data dependencies from streaming database log files.
The DeltaGrid Approach
Overview of the Approach
The DeltaGrid Approach
A semantically-robust execution environment for processes that execute over distributed, autonomous services
[Figure: DeltaGrid architecture. Components: Process History Capture System, Failure Recovery System, Process Execution Engine, Rule Processor, DeltaGrid Event Processor, Delta-Enabled Grid Services, and Metadata Manager (process metadata, rule metadata). Labeled interactions: deltas, invoke services, application exceptions, query history and write process info, read process script, execute rules, use analysis interface, rule-based failure recovery. Legend: event; one-way interaction between system components; two-way interaction between system components.]
The DeltaGrid Abstract Execution Model
[Figure: The DeltaGrid Abstract Execution Model comprises the Service Composition and Recovery Model, the Process Dependency Model, and Process Interference Rules. Labeled elements: execution semantics, composition structure, recovery algorithms, read and write dependency, global execution history, global execution history interface, rule specification, triggering procedure.]
The DeltaGrid Approach
Delta-Enabled Grid Services
Delta-Enabled Grid Services
• Delta: an incremental change in a data element.
• Captures data changes using either
  • Triggers
  • Oracle Streams
• Sends deltas back to the delta event processor in either a push or pull mode using XML.
• Provides a way to externalize the DB log file as a stream of data change events.
[Figure: Delta-Enabled Grid Service. Components: client application, Delta-Enabled Grid Service, OGSA-DAI, source database, delta repository, Delta Event Processor. Labeled interactions: invoke service operation, invoke DML activity, execute DML statement, delta propagation, delta notification (push mode).]
Triggers vs. Streams
Triggers
• Tightly coupled to update transaction
• Doubles time for update
• Push of deltas is not automatic
• Easy to use but inflexible
Oracle Streams
• Decoupled from update transaction
• Offload delta repository to limit effect on updates
• Automatic streaming to multiple destinations
• Complex but versatile
Expanding Investigation to DB2 and SQL Server
S. Urban, Y. Xiao, L. Blake, and S. Dietrich, Monitoring Data Dependencies in Concurrent Process Execution Through Delta-Enabled Grid Services, to appear in International Journal Of Web and Grid Services, 2009.
Use of Object Deltas
[Figure: Processes p1 (operations op11, op12) and p2 (operations op21, op22) interleave updates to objects X (x0, x1, x2, x3) and Y (y0, y1, y2), producing object deltas.]
Dynamically analyze data dependencies in concurrent process execution to identify process interference when failures occur.
Delta-Enabled Rollback (DE-rollback) can be used if recoverability conditions are satisfied.
The DeltaGrid Approach
Process Dependency Model
Write/Potential Read Dependency
Write Dependency
• Process-level: A write dependency exists if a process pi writes a data item x that has been written by another process pj before pj completes (i≠j).
• Operation-level
• Write dependency set
Potential Read Dependency
• Process-level: A potential read dependency exists if a process pi reads a data item x that has been written by another process pj before pj completes (i≠j).
• Operation-level
• Potential read dependency set
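The process-level write dependency definition can be illustrated with a minimal Python sketch. It is not the DeltaGrid implementation: the `Delta` record, the `write_dependencies` function, and the `end_time` map of process completion timestamps are hypothetical names introduced here, and a process with no recorded end time is assumed to still be active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    oid: str       # object identifier (hypothetical field name)
    attr: str      # attribute name
    ts: int        # timestamp of the write
    process: str   # process that issued the write


def write_dependencies(deltas, end_time):
    """Map each process pi to the processes pj it is write dependent on.

    pi is write dependent on pj if pi writes an item that pj wrote
    earlier and pj had not yet completed at the time of pi's write.
    """
    deps = {}
    writers = {}  # (oid, attr) -> processes that wrote the item so far
    for d in sorted(deltas, key=lambda d: d.ts):
        key = (d.oid, d.attr)
        for pj in writers.get(key, []):
            # Assumption: a missing end time means pj is still running.
            still_active = d.ts < end_time.get(pj, float("inf"))
            if pj != d.process and still_active:
                deps.setdefault(d.process, set()).add(pj)
        writers.setdefault(key, []).append(d.process)
    return deps


deltas = [
    Delta("X", "value", 1, "p1"),
    Delta("X", "value", 3, "p2"),  # p1 is still active at ts 3
    Delta("X", "value", 7, "p3"),  # p1 and p2 have both completed
]
end_time = {"p1": 5, "p2": 6}
print(write_dependencies(deltas, end_time))
```

In this example only p2 is write dependent on p1, because p3 writes X after both earlier writers have completed.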
Global Execution History
[Figure: The local execution histories of DEGS1 through DEGSn, each a time-ordered sequence of deltas, are combined into the global execution history. The global execution history holds the time-ordered deltas and the execution context: operation1 ... operationn (input, output, state, degsID, tss, tse) and process1 ... processm (input, output, state, tss, tse). Also labeled: write dependency, potential read dependency.]
Y. Xiao and S. Urban, Process Dependencies and Process Interference Rules for Analyzing the Impact of Failure in a Service Composition Environment, Journal of Information Science and Technology, 2008. Special issue from 10th International Conference on Business Information Systems, Poznan, Poland, 2007.
Process Execution Scenario
[Figure: Timeline (ts1 to ts6) of processes p1 (operations op11, op12, op13, op14) and p2 (operations op21, op22) updating objects X (x0 to x4), Y (y0, y1), and Z (z0 to z2) on DEGS1 and DEGS2. Shown: operation start and end timestamps (tss, tse), the local execution histories of DEGS1 and DEGS2, the global execution history, and the system invocation event sequence.]
The DeltaGrid Approach
Service Composition and Recovery Model
Service Composition Structure
Execution Entities:
• Operation
• Compensation
• Contingency
• Atomic Group
• Composite Group
• Process
[Figure: UML class diagram relating abstractProcess, Composite Group, Atomic Group, Operation, Compensation, and Contingency, with multiplicities 1, *, and 0..1.]
Abstract Process Definition Example
[Figure: Abstract definition of process p1 = cg1. Labels: composite groups cg1 (cg1.cop, cg1.top), cg11 (cg11.cop, cg11.top), and cg12 (cg12.top); atomic groups ag111, ag112, ag113, ag121, ag122, and ag13; operations op11 to op16 (op14 is non-critical); compensations cop11, cop12, cop15, cop16; contingencies top11, top13, top16.]
• Atomic Group: compensation, contingency
• Composite Group: deep/shallow compensation, contingency
• Supports DE-rollback
• Provides state diagrams and algorithms for recovery semantics of the service composition model (single and concurrent execution cases)
Yang Xiao and Susan D. Urban, The DeltaGrid Service Composition and Recovery Model, to appear International Journal of Web Services Research, 2009.
Example: Process Interference Caused by Write Dependency
[Figure: Timeline (ts1 to ts9) of three concurrent processes on DEGS1 and DEGS2: Pc1 = placeClientOrder, Pr = replenishInventory, and Pc2 = placeClientOrder. Operations shown include receiveClientOrder, CheckCredit, CheckInventory, ChargeCreditCard, decInventory, packOrder, verifyVOItem, IncInventory, and packBackOrder, with compensations unpackBackOrder, incInventory, and decInventory. Deltas: InventoryItem (I0 to I6), ClientOrder (CA0 to CA2), ClientOrder (CB0, CB1). Annotations: "Write dependent on Pc1 and Pr." and "Write dependent on Pc1."]
The DeltaGrid Approach
Process Interference Rules and Recovery Algorithm
PIR Specification
create rule ruleName
event failureRecoveryEvent
define [viewName as <OQL expression>]
condition [when condition]
action recovery commands

event: <processName>ReadDependency(pf, rdp) or <processName>WriteDependency(pf, wdp)
define: query over the global execution history interface
condition: determine if process interference exists
action: deepCompensate/re-execute process; post-commitRecover/re-execute operation
Process Interference Rule Example
Compensation of replenishInventory removed inventory items needed in placeClientOrder
Create rule inventoryDecrease
Event placeClientOrderWriteDependency(failedProcess, wdProcess)
Define decreasedItems as
  select fd.oId
  from fd in failedProcess.getDeltasByRecovery("InventoryItem", "quantity")
  group by fd.oId
  having sum(fd.newValue - fd.oldValue) < 0
Condition when exists decItem in decreasedItems:
  decItem in (select d from d in wdProcess.getDeltas("InventoryItem", "quantity"))
Action deepCompensate(wdProcess);

• Triggered after failure recovery of failedProcess.
• Query deltas using object model.
• Use application semantics to determine if process interference exists.
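The define and condition clauses of the inventoryDecrease rule can be mirrored in a short Python sketch. This is only an illustration under stated assumptions: deltas are modeled as hypothetical `(oid, old_value, new_value)` tuples rather than objects from the global execution history interface, the function names are invented here, and membership is tested on object identifiers.

```python
from collections import defaultdict


def decreased_items(recovery_deltas):
    """Object ids whose net quantity change during recovery is negative.

    Mirrors: group by fd.oId having sum(fd.newValue - fd.oldValue) < 0.
    """
    net_change = defaultdict(int)
    for oid, old_value, new_value in recovery_deltas:
        net_change[oid] += new_value - old_value
    return {oid for oid, change in net_change.items() if change < 0}


def inventory_decrease_fires(recovery_deltas, wd_deltas):
    """True if the write dependent process touched a decreased item."""
    touched = {oid for oid, _, _ in wd_deltas}
    return bool(decreased_items(recovery_deltas) & touched)


# Recovery of the failed process lowered item1 from 10 to 4.
recovery = [("item1", 10, 4), ("item2", 5, 5)]
print(inventory_decrease_fires(recovery, [("item1", 4, 3)]))
print(inventory_decrease_fires(recovery, [("item2", 5, 4)]))
```

When the first call returns true, the rule's action would be deepCompensate(wdProcess); the second call returns false because item2 was not decreased by the recovery.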
Concurrent Process Recovery
[Figure: Execution queue holding active processes P1 to P9, with the process dependency graph of the failed process P1.]
• Generate recovery commands for the failed process p1.
• Generate the process dependency graph (PDG) for p1.
• Dependent processes are temporarily suspended to evaluate PIRs.
• Breadth-first traversal of the PDG with PIR evaluation. Cases to handle:
  • A process depends on multiple processes.
  • A process whose PIR evaluates to false.
• Results show the correctness of the PDG formation, the traversal process, the use of DE-rollback, and the PIR evaluation process.
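The breadth-first traversal with PIR evaluation can be sketched in Python as follows. This is a simplified illustration, not the published recovery algorithm: the PDG is assumed to be a plain dictionary from each process to its dependent processes, `pir_holds` is a hypothetical callback standing in for PIR evaluation, and suspension, compensation, and DE-rollback are omitted.

```python
from collections import deque


def cascaded_recovery(failed, dependents, pir_holds):
    """Return the processes to recover, in breadth-first order.

    dependents: {process: [processes that depend on it]} (the PDG)
    pir_holds(source, dependent): True if recovery of `source`
    interferes with `dependent`, so `dependent` must also recover.
    """
    order = [failed]
    recovered = {failed}
    queue = deque([failed])
    while queue:
        p = queue.popleft()
        for dep in dependents.get(p, []):
            if dep in recovered:
                continue  # already scheduled; also breaks cycles
            if pir_holds(p, dep):
                recovered.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


# Hypothetical PDG: P4 depends on both P2 and P3, and P1 depends on P4.
pdg = {"P1": ["P2", "P3"], "P2": ["P4"], "P3": ["P4"], "P4": ["P1"]}
interfering = {("P1", "P2"), ("P2", "P4"), ("P4", "P1")}
print(cascaded_recovery("P1", pdg, lambda s, d: (s, d) in interfering))
```

A dependent is evaluated once per incoming edge until some PIR holds, so a process that depends on multiple processes is recovered if any one of them interferes with it, and a process whose PIR evaluates to false (P3 here) is left alone.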
Cascaded Process Recovery Example
[Figure: Process dependency graph with P1 at the root, P2, P3, and P4 at the second level, and P5 to P9 at the third level. Legend: recovery needed; recovery not needed.]
Special Cases to Consider
[Figure: Example dependency graphs over processes P1 to P5.]
• Handles cyclic dependencies.
  • Guarantees that updates are not lost in the recovery process.
  • Compensation has higher priority than DE-rollback.
  • DE-rollback is only performed if no write dependencies exist.
• Two failed processes p1 and p2 can have a common dependent process p3.
  • Recovery of failed processes p1 and p2 is ordered by timestamps.
  • If p3 is recovered with p1, p3 does not appear in the dependency graph of p2, but dependencies introduced by the recovery of p3 are considered in determining DE-rollback applicability in the recovery of p2.
The DeltaGrid Approach
Implementation, Simulation, and Performance Evaluation
Process History Capture System (PHCS) and Process Recovery System (PRS)
[Figure: PHCS and PRS implementation architecture. Layers: service layer, data access layer (GlobalScheduleAccess, DeltaAccess, ProcessInfoAccess), and data storage layer (OODB holding the delta repository and process runtime info). Components: Failure Recovery System, Process History Analyzer, Process Execution Engine, DeltaGrid Event Processor, Delta-Enabled Grid Service, and, within the Process History Capture System, the Delta Receiver, Parser, and Global Delta Object Schedule. Labeled flows: XML files (deltas), Java objects (deltas), global schedule, query process history, write process execution context, process runtime info.]
Simulation and Evaluation Framework
• DEVSJAVA (B. Zeigler & H. Sarjoughian)
• Implemented PHCS and PRS
• Simulated DEGS and Execution Engine
Evaluation Setup for WD Retrieval
• Vary number of concurrent processes (10~100, 100~1000)
• Vary an operation's distribution over objects (100 objects, 1000 objects)
Evaluation Result Analysis
• An operation's distribution over objects does not matter
• Exponential increase without optimization
• Linear increase with optimization based on segmenting the global schedule
• Advocates a distributed PHCS
[Figure: Write Dependency Retrieval Time (n: 10~100). x-axis: number of concurrent processes (10 to 100); y-axis: processing time in milliseconds (0 to 500); series: 100 objects, 1000 objects.]
[Figure: Write Dependency Retrieval Time (n: 100~1000). x-axis: number of concurrent processes; y-axis: processing time in milliseconds (0 to 120000); series: 100 objects, 1000 objects, segment.]
Other Evaluation Results
Evaluation setup for Recovery Algorithm
• Vary number of concurrent processes (10~100, 100~1000)
• Vary process nesting level (1-5)
Evaluation result and analysis
• Linear increase when the number of concurrent processes grows
  • Delta parsing/storage time (increases faster than global schedule)
  • Global schedule construction time
  • Operation-level read dependency retrieval time
• Exponential increase in PDG construction time with high process density
• Constant cascaded recovery processing time
• Advocates distributed PHCS
  • Large amount of concurrent deltas
  • High process dependency density
• Improved delta object model interface performance through the use of the SODA (Simple Object Data Access) interface.
The DeltaGrid Approach
Research Contributions
DeltaGrid Research Contributions
Defined the functionality required for the capture and use of incremental changes to autonomous data sources in a distributed Grid Service environment.
Designed a flexible approach to recovery of service execution failure, providing multi-level protection and maximizing forward recovery
Defined algorithms for analysis of data dependencies among concurrently executing processes based on deltas collected from distributed sites
Designed a rule-based approach for process interference handling based on application semantics
Designed, implemented, and evaluated the DeltaGrid simulation framework
The DeltaGrid Approach
Current Directions: The Decentralized Data Dependency (D3) Analysis Project
The D3 Project
NSF Grant No. CCF 0820152 (Software for Real-World Systems Program)
A Decentralized and Rule-Based Approach to Data Dependency Analysis and Failure Recovery in a Service-Oriented Environment
Objective: To enhance service-oriented environments with theories and methods that support dynamic, flexible, and user-defined approaches to the recovery of failed processes that execute in a loosely-coupled environment without isolation guarantees.
Builds on and integrates three main concepts:
• The DEGS capability of externalizing database log files.
• Decentralized, peer-to-peer techniques for sharing and merging log files.
• Event and rule-driven techniques for dynamic process recovery and exception handling.
Decentralized Process Execution Units
A decentralized community of PEXAs, each controlling the execution of multiple processes.
Deltas are stored locally for services that execute at the PEXA site.
PEXAs communicate in a decentralized manner to dynamically discover data dependencies and to support event and rule driven recovery among concurrent processes.
Research Challenges
Decentralized data dependency analysis
• Representation, communication, correctness, performance
Dynamic aspects of service composition
• Event-driven service composition
• Refinement of process interference rules
• Introduce application exception events and rules
Correctness of execution and recovery with respect to intended user semantics.
Using formal methods to express execution and recovery correctness in a dynamic, decentralized, concurrent execution environment.
Decentralized algorithms for data dependency analysis, rule execution, and recovery procedures.
Questions?
S. D. Urban, Y. Xiao, L. Blake, and S. Dietrich, Monitoring Data Dependencies in Concurrent Process Execution Through Delta-Enabled Grid Services, to appear in International Journal Of Web and Grid Services, 2009.
Y. Xiao and S. D. Urban, The DeltaGrid Service Composition and Recovery Model, to appear International Journal of Web Services Research, 2009.
Y. Xiao and S. Urban, Process Dependencies and Process Interference Rules for Analyzing the Impact of Failure in a Service Composition Environment, Journal of Information Science and Technology, 2008.
Y. Xiao and S. D. Urban, “Using Data Dependencies to Support the Recovery of Concurrent Processes in a Service Composition Environment,” Proceedings of the Cooperative Information Systems Conference (COOPIS), Monterrey, Mexico, November, 2008.
Y. Xiao and S. D. Urban. 2007. Process Dependencies and Process Interference Rules for Analyzing the Impact of Failure in a Service Composition Environment, Proceedings of the 10th International Conference on Business Information Systems, Poznan, Poland, April 2007, pp. 67-81.
Y. Xiao, S. D. Urban, and N. Liao. 2006. The DeltaGrid Abstract Execution Model: Service Composition and Process Interference Handling. Proceedings of the 25th Int. Conference on Conceptual Modeling, pp. 40-53, Tucson, Arizona.
Y. Xiao, S. D. Urban, and S. W. Dietrich. 2006. A Process History Capture System for Analysis of Data Dependencies in Concurrent Process Execution. Proceedings of the 2nd Int. Workshop on Data Engineering Issues in E-Commerce and Services, pp.152-166, San Francisco, California.
H. Ma, S. D. Urban, Y. Xiao, and S. W. Dietrich. 2005. GridPML: A Process Modeling Language and Process History Capture System for Grid Service Composition. Proceedings of IEEE Int. Conference on e-Business Engineering, pp.433-440, Beijing, China.
Global Execution History
Delta: an incremental change in a data value.
  Δ(oID, a, Vold, Vnew, tsn, opij)
DEGS Local Execution History
  lh(degsID) = <tss, tse, δ(degsID)>
  δ(degsID) = [Δ(oIDA, a, Vold, Vnew, tsx, opij) | opij.degsID = degsID and tss ≤ tsx ≤ tse]
  ([] indicates a list of elements ordered by timestamp)
Execution Context
  Operation execution context: ec(opij) = <tss, tse, Input, Output, State>
  Process execution context: ec(pi) = <tss, tse, Input, Output, State>
  Global execution context: gec = [ec(entity) | (entity = opij or entity = pi) and (tss ≤ ec(entity).tss < ec(entity).tse ≤ tse)]
Global Execution History
  gh = <tss, tse, δg, gec>
  δg = [Δ(oIDA, a, Vold, Vnew, tsx, opij) | tss ≤ tsx ≤ tse]
System Invocation Event Sequence
  Eseq = [eentity | entity = opij or entity = pi]
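The relationship between the local delta lists δ(degsID) and the global list δg can be sketched in Python. This is an assumption-laden illustration rather than the PHCS implementation: the `Delta` tuple and `global_deltas` function are hypothetical, each local history is assumed to be already ordered by timestamp, and the global list is built by a timestamp merge filtered to the window [tss, tse].

```python
import heapq
from typing import NamedTuple


class Delta(NamedTuple):
    oid: str     # object identifier (oID)
    attr: str    # attribute name (a)
    old: object  # Vold
    new: object  # Vnew
    ts: int      # timestamp (tsx)
    op: str      # operation that produced the delta (opij)


def global_deltas(local_histories, tss, tse):
    """Merge timestamp-ordered local delta lists into one ordered list."""
    merged = heapq.merge(*local_histories, key=lambda d: d.ts)
    return [d for d in merged if tss <= d.ts <= tse]


degs1 = [Delta("X", "v", 0, 1, 1, "op11"), Delta("X", "v", 1, 2, 4, "op12")]
degs2 = [Delta("Z", "v", 0, 1, 2, "op21"), Delta("Z", "v", 1, 2, 6, "op22")]
for d in global_deltas([degs1, degs2], tss=1, tse=5):
    print(d.ts, d.oid, d.op)
```

The delta at timestamp 6 falls outside the window and is excluded, while the remaining deltas from both sites appear interleaved in timestamp order.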
A Process Definition Example
• Atomic Group: compensation, contingency
• Composite Group: deep/shallow compensation, contingency
• Delta-Enabled Rollback
• State diagrams and algorithms for defining recovery semantics of the service composition model (single and concurrent execution cases)
[Figure: Process placeClientOrder (p1 = cg1). Operations with compensation (cop) and contingency (top): receiveClientOrder; checkCredit; checkInventory; chargeCreditcard (cop: creditBack, top: eCheckPay); decInventory (cop: incInventory); addBackorder (cop: rmvBackorder); packOrder (cop: unpackOrder); upsShipOrder (cop: upsShipback, top: fedexShipOrder); rejectClientOrder. Also labeled: cop: chgOrderStatus. Groups: atomic groups ag11, ag12, ag13, ag14, ag151, ag152, ag161, ag162, ag17, ag18; composite groups cg15, cg16. Decision points: good credit? (yes/no); sufficient inventory items? (yes/no). Legend: compensation, contingency.]
The Global Delta Object Schedule
[Figure: The Global Delta Object Schedule, shown as data storage, index structure, and instance view along a time axis. Data storage: OODB holding the delta repository and process runtime info. Index structure: TimeSequenceIndex (processId, operationId, timestamp, seqNum), OperationIndex (processId, operationId), and Node (className, objectId, propertyName). Instance view: time-sequence index entries tsIndex1 to tsIndexN, operation index entries oIndex1 to oIndexN, and nodes node1 to nodeN.]
The Global Execution History Interface Supported by the PHCS
[Figure: The global execution history interface supported by the PHCS. Data storage: Global Delta Object Schedule, process runtime info repository (ProcessInfo, OperationInfo), and delta repository (Delta, DeltaProperty, PropertyValue, DataChange). Data access: ProcessInfoAccess, GlobalScheduleAccess, DeltaAccess. Global execution history object model: Process, Operation, and Delta, linked by one-to-many associations. Also shown: Process History Analyzer, Process Execution Engine, DEGS, and data sources.]
Global Execution History Object Model
Process
  Attributes: pID, pName, startTime, endTime, state
  Methods:
    getOperation(in opName)
    getCurrentOperation()
    getDeltas()
    getDeltas(in className)
    getDeltas(in className, in attrName)
    getDeltasBeforeRecovery()
    getDeltasBeforeRecovery(in className)
    getDeltasBeforeRecovery(in className, in attrName)
    getMostRecentDeltaBeforeRecovery(in className, in attrName)
    getDeltasByRecovery()
    getDeltasByRecovery(in className)
    getDeltasByRecovery(in className, in attrName)
    getMostRecentDeltaByRecovery(in className, in attrName)
Operation
  Attributes: opID, opName, startTime, endTime, state, degsID
  Methods:
    getDeltas(in className)
    getDeltas(in className, in attrName)
Delta
  Attributes: oID, className, attrName, oldValue, newValue, dataType, timestamp
Associations shown (multiplicities 1 to * or 1 to 1): getOperations, getOperation, getProcess, getDeltas, getCompensation, getContingency, wdOperations, rdOperations, wdProcessesForOP, rdProcessesForOP, wdProcessForP, rdProcessForP
Application Exception Rules