Chapter 14 - 28 pages 1
Distributed Process Management
Chapter 14
Process Migration
• Move an active process from one machine to another
• The process migrates to a target machine
• Transferring a sufficient amount of the state of a process from one machine to another
Why Migrate?
• Load sharing
  • move processes from heavily loaded to lightly loaded systems
  • load can be balanced to improve overall performance
• Communications performance
  • processes that interact intensively can be moved to the same node to reduce communications cost
  • may be better to move the process to where the data reside when the data are large
Why Migrate?
• Availability
  • a long-running process may need to move because the machine it is running on will be down
• Utilizing special capabilities
  • a process can take advantage of unique hardware or software capabilities
Initiation of Migration
• Operating system
  • when the goal is load balancing
• Process
  • when the goal is to reach a particular resource
What is Migrated?
• Must destroy the process on the source system
• Process control block and any links must be moved
What is Migrated?
• Eager (all): transfer the entire address space
  • no trace of the process is left behind
  • if the address space is large and the process does not need most of it, this approach may be unnecessarily expensive
What is Migrated?
• Precopy: the process continues to execute on the source node while the address space is copied
  • pages modified on the source during the precopy operation have to be copied a second time
  • reduces the time that a process is frozen and cannot execute during migration
What is Migrated?
• Eager (dirty): transfer only the portion of the address space that is in main memory
  • any additional blocks of the virtual address space are transferred on demand
  • the source machine is involved throughout the life of the process
  • good if the process is temporarily going to another machine
  • good for a thread, since the threads left behind need the same address space
What is Migrated?
• Copy-on-reference: pages are brought over only on reference
  • a variation of eager (dirty)
  • has the lowest initial cost of process migration
What is Migrated?
• Flushing: pages are cleared from main memory by flushing dirty pages to disk
  • then use a copy-on-reference strategy
  • relieves the source of holding any pages of the migrated process in main memory
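Of the strategies above, precopy has the most distinctive structure, and its round-by-round behavior is easy to simulate. The sketch below is a hypothetical model, not an implementation from any real system: the page set, dirty rate, and stopping threshold are all illustrative.

```python
# Hypothetical simulation of the precopy strategy: copy the address space
# in rounds while the process keeps running, re-copying whatever it
# dirties, and freeze the process only for the small final transfer.
import random

def precopy_migrate(pages, dirty_rate=0.2, threshold=4, max_rounds=10):
    """Return (rounds used, total pages copied, pages sent while frozen)."""
    to_copy = set(pages)                 # round 1 transfers every page
    total_copied = 0
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        total_copied += len(to_copy)
        # While the copy is in flight, the still-running process dirties
        # some of the pages just sent; those must go again next round.
        to_copy = {p for p in to_copy if random.random() < dirty_rate}
        if len(to_copy) <= threshold:
            break
    total_copied += len(to_copy)         # final round: process is frozen
    return rounds, total_copied, len(to_copy)

random.seed(1)
rounds, total, frozen = precopy_migrate(range(100))
```

The trade-off the slides describe falls out directly: total pages copied exceeds the address-space size (modified pages go twice), but the frozen-time transfer is small.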
Negotiation of Process Migration
[Figure: negotiation of process migration between the Starter processes of the source and destination machines. Message sequence: 1: Will you take P?  2: Yes, migrate to machine 3  3: MigrateOut P  4: Offer P  5: Offer P  6: MigrateIn P  7: Accept offer]
Distributed Global States
• The operating system cannot know the current state of all processes in the distributed system
• A process can only know the current state of the processes on its local system
• Remote processes only know state information that is received by messages
  • these messages represent the state in the past
Example
• A bank account is distributed over two branches
• The total amount in the account is the sum of the amounts held at each branch
• At 3 PM the account balance is determined
• Messages are sent to request the information
Example
[Figure: at 3:00, Branch A records SA = $100.00 and Branch B records SB = $0.00; the observed total is $100.00]
Example
• If, at the time the balance is determined, the balance from branch A is in transit to branch B, the result is a false reading
Example
[Figure: Branch A sends msg = "Transfer $100 to Branch B" at 2:59, and it is delivered at 3:01. The 3:00 observation sees SA = $0.00 and SB = $0.00, so the observed total is $0.00]
Example
• All messages in transit must be examined at the time of observation
• The total consists of the balances at both branches plus the amount in the message
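The point can be checked with a toy model of the running example. The Channel class below is an illustrative stand-in for a FIFO communication link; branch names and amounts mirror the example, nothing here is from a real banking API.

```python
# A consistent "global balance" must count both branch balances and any
# transfer message still in transit on the channel between them.
from collections import deque

class Channel:
    """Illustrative FIFO link; in_transit holds undelivered messages."""
    def __init__(self):
        self.in_transit = deque()
    def send(self, amount):
        self.in_transit.append(amount)
    def deliver(self):
        return self.in_transit.popleft()

branch_a, branch_b = 100.00, 0.00
a_to_b = Channel()

# Branch A sends $100 to Branch B; the message has not yet arrived.
branch_a -= 100.00
a_to_b.send(100.00)

naive_total = branch_a + branch_b                  # misses money in flight
consistent_total = branch_a + branch_b + sum(a_to_b.in_transit)

assert naive_total == 0.00         # the false reading from the example
assert consistent_total == 100.00  # correct once channel state is counted
```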
Example
• If the clocks at the two branches are not perfectly synchronized:
  • the transfer leaves branch A at 3:01
  • the amount arrives at branch B at 2:59
  • at 3:00 the amount is counted twice
Example
[Figure: Branch A sends msg = "Transfer $100 to Branch B" at 3:01 by A's clock, and it arrives at 2:59 by B's clock. The 3:00 observation sees SA = $100.00 and SB = $100.00, so the observed total is $200.00]
Example of a Snapshot
Process 1
  Outgoing channels:
    to 2: sent 1,2,3,4,5,6
    to 3: sent 1,2,3,4,5,6
  Incoming channels: (none)

Process 2
  Outgoing channels:
    to 3: sent 1,2,3,4
    to 4: sent 1,2,3,4
  Incoming channels:
    from 1: received 1,2,3,4; stored 5,6
    from 3: received 1,2,3,4,5,6,7,8

Process 3
  Outgoing channels:
    to 2: sent 1,2,3,4,5,6,7,8
  Incoming channels:
    from 1: received 1,2,3; stored 4,5,6
    from 2: received 1,2,3; stored 4
    from 4: received 1,2,3

Process 4
  Outgoing channels:
    to 3: sent 1,2,3
  Incoming channels:
    from 2: received 1,2; stored 3,4
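One way to read the snapshot above is as a consistency check: for every channel, the messages the sender recorded as sent must equal the messages the receiver recorded as received plus those captured in the channel state ("stored"). The sketch below checks that over the example's data; the `(sender, receiver)` tuple encoding is my own, the message numbers are transcribed from the snapshot.

```python
# Each channel entry: (messages sent, messages received, messages stored
# in the channel by the snapshot). Values copied from the example.
snapshot = {
    (1, 2): (range(1, 7), range(1, 5), [5, 6]),
    (1, 3): (range(1, 7), range(1, 4), [4, 5, 6]),
    (2, 3): (range(1, 5), range(1, 4), [4]),
    (2, 4): (range(1, 5), range(1, 3), [3, 4]),
    (3, 2): (range(1, 9), range(1, 9), []),
    (4, 3): (range(1, 4), range(1, 4), []),
}

def consistent(snap):
    """Every sent message is either delivered or recorded in the channel."""
    return all(list(sent) == list(received) + stored
               for sent, received, stored in snap.values())

assert consistent(snapshot)
```

If any message were missing from both the receiver's record and the channel state, the check would fail, flagging an inconsistent global state like the bank example's false readings.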
Ordering of Events
• Events must be ordered to ensure mutual exclusion and avoid deadlock
• Difficulties:
  • clocks are not synchronized
  • communication delays
  • state information for a process is not up to date
Ordering of Events
• Need a consistent way to say that one event occurs before another
• Messages are sent when a process wants to enter its critical section and when it leaves its critical section
• Time-stamping
  • orders events in a distributed system
  • the system clock is not used
Time-Stamping
• Each system on the network maintains a counter which functions as a clock
• Each site has a numerical identifier
• When a message is received, the receiving system sets its counter to one more than the maximum of its current value and the incoming time-stamp (counter)
Time-Stamping
• If two messages have the same time-stamp, they are ordered by the numbers of their sites
• For this method to work, each message is sent from one process to all other processes
  • ensures all sites have the same ordering of messages
  • for mutual exclusion and deadlock avoidance, all processes must be aware of the situation
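The time-stamping rule above is small enough to show directly. This is a minimal sketch with illustrative names: each site ticks its counter on send, updates it on receive, and breaks ties between equal counters by site identifier, so every site sorts messages into the same total order.

```python
# Minimal logical-clock sketch of the time-stamping scheme described above.
class Site:
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def send_event(self):
        """Tick the counter and stamp the message (counter, site id)."""
        self.counter += 1
        return (self.counter, self.site_id)

    def receive_event(self, timestamp):
        """Counter := 1 + max(own counter, incoming time-stamp)."""
        incoming_counter, _ = timestamp
        self.counter = max(self.counter, incoming_counter) + 1

a, b = Site(1), Site(2)
m1 = a.send_event()      # stamped (1, 1)
b.receive_event(m1)      # b's counter jumps past the incoming stamp
m2 = b.send_event()      # stamped (3, 2)
a.receive_event(m2)

# (counter, site_id) tuples sort into the same total order at every site.
assert sorted([m2, m1]) == [m1, m2]
```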
Token-Passing Algorithm

if not token_present
then begin                                  {Prelude}
    clock := clock + 1;
    broadcast(request, clock, i);
    wait(access, token);
    token_present := true
end;
token_held := true;

<critical section>

token(i) := clock;                          {Postlude}
token_held := false;
for j := i + 1 to n, 1 to i - 1 do
    if request(j) > token(j) and token_present
    then begin
        token_present := false;
        send(j, access, token)
    end;

when received (request, k, j) do
    request(j) := max(request(j), k);
    if token_present and not token_held
    then <text of postlude>
enddo;

Notation:
    send(j, access, token)        send a message of type access, with the token, to process j
    broadcast(request, clock, i)  send a message from process i of type request, with time-stamp clock, to all other processes
    received(request, t, j)       receive a message from process j of type request, with time-stamp t
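The subtle part of the algorithm is the postlude's hand-off rule: after leaving its critical section, process i scans processes i+1..n, 1..i-1 and passes the token to the first j whose latest request time-stamp is newer than the token's record of j's last turn. A small sketch of just that decision, with illustrative values:

```python
# Sketch of the postlude's token hand-off scan, not a full implementation.
def next_token_holder(i, request, token, n):
    """request[j]: newest request time-stamp seen from j.
    token[j]: time-stamp recorded when j last left its critical section.
    Scan j = i+1..n-1, 0..i-1 and pick the first unserviced requester."""
    for j in list(range(i + 1, n)) + list(range(0, i)):
        if request[j] > token[j]:
            return j            # j has requested since its last turn
    return None                 # nobody is waiting; keep the token

# Processes 0..3; process 1 leaves its critical section.
request = [2, 5, 7, 0]
token   = [2, 5, 4, 0]
assert next_token_holder(1, request, token, 4) == 2   # 7 > 4, so pass to 2
```

Scanning in circular order starting after i is what keeps the scheme fair: a process cannot be starved by lower-numbered requesters.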
Distributed Deadlock Detection Algorithm
{Data object Dj receiving a lock_request(Ti)}
begin
    if Locked_by(Dj) = nil
    then send granted
    else begin
        send not granted to Ti;
        send Locked_by(Dj) to Ti
    end
end.

{Transaction Ti makes a lock request for data object Dj}
begin
    send lock_request(Ti) to Dj;
    wait for granted/not granted;
    if granted
    then begin
        Locked_by(Dj) := Ti;
        Held_by(Ti) := nil
    end
    else    {suppose Dj is being used by transaction Tj}
    begin
        Held_by(Ti) := Tj;
        Enqueue(Ti, Request_Q(Tj));
        if Wait_for(Tj) = nil
        then Wait_for(Ti) := Tj
        else Wait_for(Ti) := Wait_for(Tj);
        update(Wait_for(Ti), Request_Q(Ti))
    end
end.

{Transaction Tj receiving an update message}
begin
    if Wait_for(Tj) ≠ Wait_for(Ti)
    then Wait_for(Tj) := Wait_for(Ti);
    if Wait_for(Tj) ∉ Request_Q(Tj)
    then update(Wait_for(Tj), Request_Q(Tj))
    else begin
        DECLARE DEADLOCK;
        {initiate deadlock resolution as follows}
        {Tj is chosen as the transaction to be aborted}
        {Tj releases all the data objects it holds}
        send clear(Tj, Held_by(Tj));
        allocate each data object Di held by Tj to the first requester Tk in Request_Q(Tj);
        for every transaction Tn in Request_Q(Tj) requesting data object Di held by Tj do
            Enqueue(Tn, Request_Q(Tk))
    end
end.

{Transaction Tk receiving a clear(Tj, Tk) message}
begin
    purge the tuple having Tj as the requesting transaction from Request_Q(Tk)
end.
Example of Distributed Deadlock Algorithm
[Figure: two wait-for graphs over transactions T0 through T6, corresponding to the two tables below]
Transaction   Wait_for   Held_by   Request_Q
T0            T0         T3        T1
T1            T0         T0        T2
T2            T0         T1        T3
T3            T0         T2        T4, T6, T0
T4            T0         T3        T5
T5            T0         T4        nil
T6            T0         T3        nil

Transaction   Wait_for   Held_by   Request_Q
T0            nil        nil       T1
T1            T0         T0        T2
T2            T0         T1        T3
T3            T0         T2        T4, T6
T4            T0         T3        T5
T5            T0         T4        nil
T6            T0         T3        nil
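In the algorithm above, a transaction Tj declares deadlock when the head of its wait chain, Wait_for(Tj), shows up in its own Request_Q(Tj). The sketch below checks that condition against the table in which T3's Request_Q contains T0 (the deadlocked state); the dictionaries are my transcription of that table's Wait_for and Request_Q columns.

```python
# Wait_for and Request_Q columns from the deadlocked table in the example.
wait_for  = {"T0": "T0", "T1": "T0", "T2": "T0", "T3": "T0",
             "T4": "T0", "T5": "T0", "T6": "T0"}
request_q = {"T0": ["T1"], "T1": ["T2"], "T2": ["T3"],
             "T3": ["T4", "T6", "T0"], "T4": ["T5"],
             "T5": [], "T6": []}

def deadlock_detected(t):
    """Deadlock declared at t when Wait_for(t) is in Request_Q(t)."""
    return wait_for[t] in request_q[t]

assert deadlock_detected("T3")      # T3 waits for T0, and T0 queues on T3
assert not deadlock_detected("T0")  # no other transaction sees the cycle
```

Only T3 fires the check, which is why the algorithm can pick the detecting transaction (Tj) as the one to abort.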