Fault-tolerant Paths ISRG Retreat Z. Morley Mao zmao@cs.berkeley.edu 1/11/2000 Services Paths

Fault-tolerant Paths

ISRG Retreat

Z. Morley Mao
zmao@cs.berkeley.edu

1/11/2000

Services Paths

Example path application: Jukebox/cell-phone application

• Ninja Jukebox:
  – service providing real-time streaming audio data from a collection of CDs in the network

• GSM Cell-phone:
  – 12kbps data, 13kbps voice
  – communicates with BTS

[Figure: Jukebox path diagram. Labels: Jukebox, Path. Legend: operator, connector.]

What is a path?

• A way to compose services to create customizable complex services

• Goals:
  – composability
  – accessibility
  – availability, fault-tolerance
  – scalability
  – security

Overall path construction process

• A continuous optimization process with feedback:
  1. Logical Path Creation
  2. Physical Path Creation
  3. Path Instantiation, Execution, Maintenance
  4. Path Tear-down

Logical path creation: Path matching algorithm

• Formulated as shortest path graph search
  – Operators ===> edges
  – Data format/type ===> nodes

• Dijkstra’s shortest path algorithm
  – O(v^2)

• Difficulty: expressing constraints and optimization variables
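
The graph formulation above can be sketched in a few lines. This is only an illustration, not the APC implementation: the operator catalogue, the costs, and the name `find_logical_path` are assumptions made for the example, and it uses the heap-based variant of Dijkstra's algorithm rather than the O(v^2) array form quoted on the slide.

```python
import heapq

# Hypothetical operator catalogue: (input format, output format, operator, cost).
# Data formats are the nodes and operators are the edges; the costs are made up.
OPERATORS = [
    ("mp3", "pcm", "mp3-to-PCM", 2.0),
    ("pcm", "gsm", "PCM-to-GSM", 1.0),
    ("mp3", "wav", "mp3-to-WAV", 1.0),  # hypothetical edge that never reaches GSM
]


def find_logical_path(operators, source, target):
    """Return (total cost, [operator names]) for the cheapest chain, or None."""
    graph = {}
    for src, dst, name, cost in operators:
        graph.setdefault(src, []).append((dst, name, cost))

    best = {source: 0.0}
    heap = [(0.0, source, [])]
    while heap:
        cost, fmt, chain = heapq.heappop(heap)
        if fmt == target:
            return cost, chain
        if cost > best.get(fmt, float("inf")):
            continue  # stale queue entry, a cheaper route was already expanded
        for nxt, name, edge_cost in graph.get(fmt, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, chain + [name]))
    return None
```

A single scalar edge cost is the simplest possible choice here; the difficulty noted on the slide (expressing constraints and several optimization variables) is exactly what this sketch leaves out.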

Path maintenance: Partial Path Repair (PPR)

• APC (Automatic Path Creation) service guarantees robustness and fault-tolerance

• Two ways of monitoring:
  – active checking of operator status
  – operators notify APC of neighboring operators’ failure
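
A minimal sketch of the idea behind partial repair, assuming a path is just an ordered list of operators: only the failed operators are replaced and the healthy ones are reused. The `Operator` class and the functions `probe`, `report_failure`, and `repair_path` are hypothetical names for this example, standing in for the two monitoring modes and the repair step; they are not the APC interface.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    alive: bool = True


def probe(path):
    """Active checking: return the names of operators that are down."""
    return [op.name for op in path if not op.alive]


def report_failure(path, failed_name):
    """Neighbor notification: mark the named operator as failed."""
    for op in path:
        if op.name == failed_name:
            op.alive = False


def repair_path(path, make_replacement):
    """Replace only the failed operators; healthy ones are reused untouched."""
    repaired, replaced = [], []
    for op in path:
        if op.alive:
            repaired.append(op)
        else:
            repaired.append(make_replacement(op.name))
            replaced.append(op.name)
    return repaired, replaced
```

For example, after `report_failure(path, "pcm-to-gsm")` a call to `repair_path(path, Operator)` returns a path in which only that one operator object is new.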

Performance measurements (4 operators, Jukebox/cell-phone app)

• Logical/Physical path creation time: 264ms

• Path instantiation time: 215ms
  – operator instantiation: 70ms
  – connector creation: 64ms
  – start operator running: 81ms

• Path recovery: one operator fails
  – Time to detect failure of operator: 2ms
  – Time to repair one failed operator: 400ms

• Path tear-down time: 289ms
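
As a back-of-the-envelope check on these figures, the sketch below adds up the instantiation breakdown and compares partial repair with a full rebuild. The full-rebuild number is not a measurement from the slides: it assumes a rebuild is simply detection, tear-down, creation, and instantiation run one after another.

```python
# Figures from the slide, in milliseconds.
CREATE_MS = 264
INSTANTIATE_MS = 215
INSTANTIATE_PARTS_MS = {
    "operator instantiation": 70,
    "connector creation": 64,
    "start operator running": 81,
}
DETECT_MS = 2
REPAIR_MS = 400
TEAR_DOWN_MS = 289


def instantiation_parts_total():
    """Sum of the three instantiation sub-steps."""
    return sum(INSTANTIATE_PARTS_MS.values())


def partial_repair_ms():
    """Detect the failure, then repair the one failed operator."""
    return DETECT_MS + REPAIR_MS


def full_rebuild_ms():
    """Assumed alternative: detect, tear down, then recreate the whole path."""
    return DETECT_MS + TEAR_DOWN_MS + CREATE_MS + INSTANTIATE_MS
```

The three sub-steps sum to the reported 215ms, and under the stated assumption partial repair (402ms) takes roughly half as long as a full rebuild (770ms).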

Open design issues

• Wide area considerations
• Improved path reliability model
• Path performance modeling
• Path resource management framework
• Flexible path control
  – control path, path migration, dynamic adaptation
• Applications for paths
• Metrics for evaluation

Wide area path design

[Figure: Hierarchical APC Service. Labels: APC (x3), SAN (x3), WAN (x2), service.]

Step-by-step WAN path creation for Jukebox/Cell-phone application

1. End-user using a cell-phone requests access to the Jukebox service
   • QoS needs: delay-sensitive, reliable service

2. APC uses a graph search algorithm to find the logical path

3. APC searches for the physical path

4. Finds relevant parameters affecting QoS, determines the reliability model

Step-by-step WAN path creation for Jukebox/Cell-phone application

5. Obtains resource information from resource management framework

6. Uses queuing model to evaluate choices

7. APC selects the optimal choice

8. APC dynamically adjusts the decision given feedback from the resource monitoring tools
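
Steps 6 and 7 could look roughly like the sketch below. The slides do not say which queuing model is used; this example assumes the simplest one, an independent M/M/1 queue per operator host, where the mean time in the system is 1/(mu - lambda). The function names and the candidate placements are hypothetical.

```python
import math


def mm1_delay(arrival_rate, service_rate):
    """Mean time in an M/M/1 queue; infinite when the server is saturated."""
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def path_delay(arrival_rate, service_rates):
    """Total mean delay, treating each operator host as an independent queue."""
    return sum(mm1_delay(arrival_rate, mu) for mu in service_rates)


def select_placement(arrival_rate, candidates):
    """candidates maps a placement name to the service rates of its hosts."""
    return min(candidates, key=lambda name: path_delay(arrival_rate, candidates[name]))
```

With 10 requests per second and candidates `{"near": [20.0, 20.0], "far": [12.0, 50.0]}`, the "near" placement wins: one nearly saturated host dominates the delay of the "far" option. Step 8 would correspond to re-running the selection whenever the monitoring tools report new rates.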

Operator placement decisions

• Depend on
  – operator computational requirement
  – software/hardware requirement
  – output/input properties
    • data location, data volume, delay-sensitivity, degradation properties
  – network characteristics
    • bandwidth, latency, jitter, packet loss

Path resource management framework

• Develop network monitoring tools
  – to obtain network statistics

• Available resources
  – computational, memory, network etc.

• Make trade-offs due to interdependencies among resources

• Resources allocated on a per-path basis

Path resource management framework

• A high-level global hierarchical resource manager

• Local resource manager per SAN

• Runtime resource monitoring tools monitor/discover resource changes during the lifetime of paths

Applications for paths

• Operators:
  – content transcoding operators:
    • text-to-speech, mp3-to-PCM, PCM-to-GSM
    • web search tools, filtering, aggregation, personalization
    • Microsoft COM objects, existing Web services...
    • Document conversion services
  – protocol translation operators:
    • serial socket, security transcoder, RMI Lite

Any service Any device

Measurement metrics

• Path creation time
  – logical/physical path creation, instantiation, execution

• Scalability
  – number of paths created per amount of time

• Fault-recovery time
• Control, ease-of-use, programmability of paths
• Ease of transparent path migration, adaptation to resource changes

Conclusion

• Recent work:
  – APC prototype built for SAN with reasonable performance, Partial Path Repair

• Future work:
  – focus on WAN, scalable path design
  – WAN test plan: campus-wide millennium cluster
  – support for continuous path optimization and adaptation

For more information

• Please send comments/questions to Z. Morley Mao
  – zmao@cs.berkeley.edu

• Slides will be available at:
  – http://www.cs.berkeley.edu/~zmao/paths

Extra Slides

Flexible path control: control path

• Control path
  – Definition: makes changes to operators and connectors
    • independent of data path
    • highly-available, fault-tolerant

• Proposed design:
  – replicated control paths:
    • neighboring operators have control over each other
    • APC has complete control over localized operators

Flexible path control: path migration

• Two paths of different quality running
  – migrate from the fast-to-startup, lower-quality one to the slow-to-startup, higher-quality one

• Transparent migration of paths
  – dynamic fusion of operators
  – dynamic deletion, addition, replacement of operators/connectors
  – Goal: adapt to changes in resources and locations of end points
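
The two-path migration idea can be illustrated with a very small sketch: serve from the path that starts quickly, and switch once the better path reports that it is ready. `PathMigrator` and `on_path_ready` are hypothetical names for this example; the slides do not describe the actual switching mechanism.

```python
class PathMigrator:
    """Serve from the fast-to-startup path until the better one is ready."""

    def __init__(self, fast_path, better_path):
        self.active = fast_path       # lower quality, available immediately
        self._pending = better_path   # higher quality, still starting up

    def on_path_ready(self, path):
        """Switch to the pending path when it reports ready; return True if switched."""
        if path == self._pending:
            self.active = self._pending
            self._pending = None
            return True
        return False
```

The end points only ever read `active`, which is what makes the switch transparent to them in this simplified picture.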

Wide area considerations

• Goal: scalable, network-partition-tolerant

• Proposed design:

– replicated APC service instances

– state of paths partitioned and replicated

– operators are soft-state

– continuous monitoring of operators and connectors by APC service instances

– a few localized path components hooked together over wide area
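
One common way to realize "operators are soft-state" is a registry in which an entry survives only as long as it keeps being refreshed. The sketch below assumes that interpretation; `SoftStateRegistry`, the `ttl` parameter, and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are all illustrative choices, not part of the APC design.

```python
class SoftStateRegistry:
    """Operators stay registered only while they keep refreshing."""

    def __init__(self, ttl):
        self.ttl = ttl            # seconds an entry survives without a refresh
        self._last_seen = {}      # operator name -> time of last refresh

    def refresh(self, operator, now):
        """Record a heartbeat or monitoring result for the operator."""
        self._last_seen[operator] = now

    def expire(self, now):
        """Drop and return the operators whose last refresh is older than ttl."""
        stale = sorted(
            op for op, seen in self._last_seen.items() if now - seen > self.ttl
        )
        for op in stale:
            del self._last_seen[op]
        return stale

    def live(self):
        """Names of operators currently considered alive."""
        return sorted(self._last_seen)
```

Because lost state is simply rebuilt from later refreshes, an APC instance that restarts or rejoins after a network partition would not need an explicit recovery protocol for this table.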