Matrix Chain Scheduling Algorithm yen3 March 4, 2009 yen3 () Matrix Chain Scheduling Algorithm March 4, 2009 1 / 30

Matrix Chain Scheduling Algorithm


DESCRIPTION

Processor Allocation and Task Scheduling of Matrix Chain Products on Parallel Systems (Parallelizing matrix chain products). Heejo Lee, Jong Kim, Sungje Hong, Sunggu Lee. Dept. of Computer Science and Engineering, Pohang University of Science and Technology, Korea.


Page 1: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm

yen3

March 4, 2009


Page 2: Matrix Chain Scheduling Algorithm

Outline

1 Introduction
  About the paper
  The Problem
  Notation
  Problem Description

2 Processor Allocation for Matrix Products
  Processor Allocation
  DPA Algorithm

3 Matrix Chain Scheduling Algorithm
  Matrix Chain Scheduling Algorithm
  Two-Pass Matrix Chain Scheduling Algorithm

4 Conclusion


Page 3: Matrix Chain Scheduling Algorithm

Introduction About the paper

About the paper

Processor Allocation and Task Scheduling of Matrix Chain Products on Parallel Systems (Parallelizing matrix chain products)

Heejo Lee, Jong Kim, Sungje Hong, Sunggu Lee

Dept. of Computer Science and Engineering, Pohang University of Science and Technology, Korea

Technical Report CS-HPC-97-003

April, 2003 (December, 1997)


Page 4: Matrix Chain Scheduling Algorithm

Introduction The Problem

MCOP: the Matrix Chain Ordering Problem

MCSP: the Matrix Chain Scheduling Problem

The paper introduces a processor scheduling algorithm for MCSP that attempts to minimize the evaluation time of a chain of matrix products on a parallel computer, even at the expense of a slight increase in the total number of operations.


Page 5: Matrix Chain Scheduling Algorithm

Introduction Notation

A lot of symbols... Orz

$P$: the number of processors in a parallel system.

$M$: a matrix chain product with $n$ matrices, i.e., $M = M_1 \times M_2 \times \cdots \times M_n$.

$M_i$: an $m_i \times m_{i+1}$ matrix ($m_i \ge 1$, $1 \le i \le n$).

$L$: a product sequence tree for a matrix chain $M$.

$L_{i,j}$: the sequence subtree of $L$ for $(M_i \times \cdots \times M_j)$.


Page 6: Matrix Chain Scheduling Algorithm

Introduction Notation

A lot of symbols (cont'd)

$C$: the minimum amount of computation for evaluating $M$.

$\Delta C$: the amount of increased computation caused by modifying the current sequence tree.

$p_{i,j}$: the number of processors assigned for evaluating $(M_i \times \cdots \times M_j)$.

$(m_i, m_j, m_k)$: a single matrix product multiplying an $m_i \times m_j$ matrix by an $m_j \times m_k$ matrix.

$\Phi(m_i, m_j, m_k, p)$: the execution time of the single matrix product $(m_i, m_j, m_k)$ when $p$ processors are allocated.

$D(x)$: the set of divisors of $x$, i.e., $D(x) = \{d \mid d \text{ divides } x\}$.


Page 7: Matrix Chain Scheduling Algorithm

Introduction Notation

A lot of symbols - new in 2003!

$LD(x, y)$: the largest divisor in $D(x)$ that is not larger than $y$.

$SD(x, y)$: if $x > y$, then $SD(x, y)$ is the smallest divisor in $D(x)$ that is larger than $y$. Otherwise, $SD(x, y) = x$.

$m$: the largest dimension among all of the matrices, i.e., $m = \max_{1 \le i \le n+1}(m_i)$.
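The divisor notation is easy to check with a few lines of code. The following is a minimal sketch, not code from the paper: the function names `divisors`, `ld` and `sd` are hypothetical, and raising an error when $y < 1$ is an assumption, since the slides do not define $LD$ in that case.

```python
def divisors(x: int) -> list[int]:
    """D(x): all positive divisors of x, in increasing order."""
    return [d for d in range(1, x + 1) if x % d == 0]


def ld(x: int, y: float) -> int:
    """LD(x, y): the largest divisor of x that is not larger than y."""
    candidates = [d for d in divisors(x) if d <= y]
    if not candidates:
        # Assumption: the slides do not say what LD means when y < 1.
        raise ValueError("LD(x, y) needs y >= 1")
    return candidates[-1]


def sd(x: int, y: float) -> int:
    """SD(x, y): smallest divisor of x larger than y if x > y, else x."""
    if x <= y:
        return x
    # x itself is a divisor larger than y, so the minimum always exists.
    return min(d for d in divisors(x) if d > y)


print(divisors(12))            # [1, 2, 3, 4, 6, 12]
print(ld(16, 10), sd(16, 10))  # 8 16
```

For a $2 \times 3$ by $3 \times 8$ product, $m_i m_k = 16$, so with 10 processors on offer $LD$ rounds down to 8 and $SD$ rounds up to 16.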


Page 8: Matrix Chain Scheduling Algorithm

Introduction Problem Description

The Problem

The problem is finding the optimal schedule, with minimum evaluation time, of $M = M_1 \times M_2 \times \cdots \times M_n$ on a $P$-processor parallel system.

The parallel matrix multiplication algorithm is assumed to behave as follows: when multiplying $A$ by $B$ with $p$ processors, the execution time $\Phi(m_i, m_j, m_k, p)$ can be approximated by

\[
\Phi(m_i, m_j, m_k, p) \approx
\begin{cases}
\dfrac{m_i m_j m_k}{p} & \text{if } 1 \le p \le m_i m_k \\[2ex]
\dfrac{m_i m_j m_k}{p} \log\!\left(\dfrac{p}{m_i m_k}\right) & \text{if } m_i m_k < p \le m_i m_j m_k
\end{cases}
\]

Page 9: Matrix Chain Scheduling Algorithm

Introduction Problem Description

The Problem - cont’d

The matrix product can be divided into two parts. The time is

\[
T_{i,j} \approx \min_{i \le k < j} \Bigl(
T_{i,k}(p_{i,j}) + T_{k+1,j}(p_{i,j}) + \Phi(m_i, m_{k+1}, m_{j+1}, p_{i,j}),\;
\max\bigl(T_{i,k}(p_{i,k}),\, T_{k+1,j}(p_{k+1,j})\bigr) + \Phi(m_i, m_{k+1}, m_{j+1}, p_{i,j})
\Bigr)
\]

The first term evaluates the two subchains one after the other on all $p_{i,j}$ processors; the second evaluates them concurrently on $p_{i,k}$ and $p_{k+1,j}$ processors.

So the paper defines MCSP: find the product sequence for evaluating a chain of matrix products, and the processor schedule for that sequence, such that the evaluation time on a parallel system is minimized.

Page 10: Matrix Chain Scheduling Algorithm

Introduction Problem Description

MCSP Complexity

The paper assumes that each matrix multiplication $(m_i \times m_j) \times (m_j \times m_k)$ takes $\log(m_j)$ time on $m_i m_j m_k$ processors.

The problem can then be written as

\[
T_{i,j} =
\begin{cases}
\min_{i \le k < j} \bigl( \max(T_{i,k}, T_{k+1,j}) + \log(m_{k+1}) \bigr) & 1 \le i < j \le n \\
0 & i = j,\ 1 \le i \le n
\end{cases}
\]

but MCSP is NP-hard.
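The recurrence above can be evaluated directly with memoization. This is only a sketch of that recurrence: the name `unlimited_time`, the 0-based list `dims` holding $m_1, \dots, m_{n+1}$, and the choice of base-2 logarithm are assumptions (the slides only write "log").

```python
import math
from functools import lru_cache


def unlimited_time(dims: list[int]) -> float:
    """Evaluate T[1][n] for the recurrence on this slide.

    dims holds m_1 .. m_{n+1}, so matrix M_i is dims[i-1] x dims[i].
    Base-2 log is an assumption; the slides do not state the base.
    """
    n = len(dims) - 1

    @lru_cache(maxsize=None)
    def t(i: int, j: int) -> float:
        if i == j:
            return 0.0
        # Splitting after M_k costs log(m_{k+1}), and m_{k+1} is dims[k].
        # The two halves run concurrently, hence the max.
        return min(
            max(t(i, k), t(k + 1, j)) + math.log2(dims[k])
            for k in range(i, j)
        )

    return t(1, n)


print(unlimited_time([4, 8, 2, 16]))  # 4.0
```

For the chain with dimensions 4, 8, 2, 16, both splits give a time of 4: one multiplication of cost $\log 8 = 3$ and one of cost $\log 2 = 1$ lie on the critical path either way.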


Page 11: Matrix Chain Scheduling Algorithm

Processor Allocation for Matrix Products Processor Allocation

the Concurrent Computations of Multiple Matrix Products

Proportional allocation: the algorithm allocates processors in proportion to the computation amount of each task. It tries to minimize the completion time of all tasks by balancing the execution time of each task.

For example

We have two matrix multiplications, A = (2, 3, 8) and B = (4, 4, 3), and 20 processors.

If we allocate 10 processors to each matrix multiplication, we get

DA(2, 3, 8, 10) = D(2, 3, 8, 8) = 6 unit time
DB(4, 4, 3, 10) = D(4, 4, 3, 6) = 8 unit time
D(DA, DB) = 8 unit time

If we allocate 8 processors to A and 12 processors to B, we get

DA(2, 3, 8, 8) = D(2, 3, 8, 8) = 6 unit time
DB(4, 4, 3, 12) = D(4, 4, 3, 12) = 4 unit time
D(DA, DB) = 6 unit time
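The numbers in this example can be reproduced with a short script. It is a sketch under one assumption read off the example itself: a product $(m_i, m_j, m_k)$ can only use a number of processors that divides $m_i m_k$, and its time is then $m_i m_j m_k$ divided by that number. The helper names are hypothetical.

```python
def largest_divisor_at_most(x: int, limit: int) -> int:
    """Largest divisor of x that does not exceed limit (limit >= 1)."""
    return max(d for d in range(1, x + 1) if x % d == 0 and d <= limit)


def discrete_time(mi: int, mj: int, mk: int, p: int) -> float:
    """Time of product (mi, mj, mk) when p processors are offered.

    Assumption taken from the example: only a divisor of mi*mk
    processors is actually used, and the time is mi*mj*mk / used.
    """
    used = largest_divisor_at_most(mi * mk, p)
    return mi * mj * mk / used


A, B = (2, 3, 8), (4, 4, 3)
for pa, pb in [(10, 10), (8, 12)]:
    # Both products run concurrently, so the later one decides.
    finish = max(discrete_time(*A, pa), discrete_time(*B, pb))
    print(pa, pb, finish)
# 10 10 8.0
# 8 12 6.0
```

The even split wastes processors on both products (2 on A, 4 on B), which is why the uneven 8/12 split finishes earlier.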


Page 13: Matrix Chain Scheduling Algorithm

Processor Allocation for Matrix Products DPA Algorithm

DPA for Two Matrix Products - O(P) time

Discrete Processor Allocation for Two Matrix Products (DPA)

Input: two matrix products $X = (m_x, m_{x+1}, m_{x+2})$ and $Y = (m_y, m_{y+1}, m_{y+2})$ and a set of $P$ processors.

Output: the numbers of processors allocated to the matrix products $X$ and $Y$, denoted $P_x$ and $P_y$, which satisfy $1 \le P_x, P_y \le P$ and $P_x + P_y \le P$.


Page 14: Matrix Chain Scheduling Algorithm

Processor Allocation for Matrix Products DPA Algorithm

DPA for Two Matrix Products- O(P) time

Discrete Processor Allocation for Two Matrix Products(DPA)

1 Set $P_{prop} = \dfrac{m_x m_{x+1} m_{x+2}}{m_x m_{x+1} m_{x+2} + m_y m_{y+1} m_{y+2}} P$

2 Find the largest $d_{x,i}$ in $D(m_x m_{x+2})$ which satisfies $d_{x,i} \le P_{prop}$

3 Find the largest $d_{y,j}$ in $D(m_y m_{y+2})$ which satisfies $d_{y,j} \le P - P_{prop}$

4 If $\Phi(m_x, m_{x+1}, m_{x+2}, d_{x,i}) < \Phi(m_y, m_{y+1}, m_{y+2}, d_{y,j})$, then $P_x = d_{x,i}$, $P_y = P - d_{x,i}$. Otherwise $P_x = P - d_{y,j}$, $P_y = d_{y,j}$
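The four steps can be sketched as follows. This is an illustration of the steps as listed on the slide, not the authors' code. Assumptions: the names `dpa`, `phi` and `largest_divisor_at_most` are hypothetical; the divisors are taken from $m_x m_{x+2}$ and $m_y m_{y+2}$; $P_{prop}$ is rounded down before the divisor search (falling back to 1); and `phi` only covers the regime $1 \le p \le m_i m_k$, which is all step 4 needs because it evaluates $\Phi$ at divisors.

```python
import math


def largest_divisor_at_most(x: int, limit: int) -> int:
    """Largest divisor of x that is <= limit (assumed fallback: 1)."""
    limit = max(limit, 1)
    return max(d for d in range(1, x + 1) if x % d == 0 and d <= limit)


def phi(product: tuple[int, int, int], p: int) -> float:
    """Execution time in the regime 1 <= p <= mi*mk only."""
    mi, mj, mk = product
    return mi * mj * mk / p


def dpa(x: tuple[int, int, int], y: tuple[int, int, int], total: int):
    """Return (Px, Py) for two products on `total` processors."""
    work_x, work_y = math.prod(x), math.prod(y)
    # Step 1: proportional share of X.
    p_prop = total * work_x / (work_x + work_y)
    # Steps 2 and 3: round each share down to a usable divisor.
    d_x = largest_divisor_at_most(x[0] * x[2], math.floor(p_prop))
    d_y = largest_divisor_at_most(y[0] * y[2], math.floor(total - p_prop))
    # Step 4: the faster product keeps its divisor, the other gets the rest.
    if phi(x, d_x) < phi(y, d_y):
        return d_x, total - d_x
    return total - d_y, d_y


print(dpa((2, 3, 8), (4, 4, 3), 20))  # (8, 12)
```

On the earlier example (20 processors, products (2, 3, 8) and (4, 4, 3)) this returns the 8/12 split.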


Page 18: Matrix Chain Scheduling Algorithm

Processor Allocation for Matrix Products DPA Algorithm

DPA description in 2003

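1 $P_{prop} = P \times \dfrac{m_x m_{x+1} m_{x+2}}{m_x m_{x+1} m_{x+2} + m_y m_{y+1} m_{y+2}}$

2 $d_i = LD(m_x m_{x+2}, P_{prop})$

3 $d_{i+1} = SD(m_x m_{x+2}, P_{prop})$

4 $d_j = LD(m_y m_{y+2}, P - P_{prop})$

5 $d_{j+1} = SD(m_y m_{y+2}, P - P_{prop})$

6 if $\Phi(X, d_i) \ge \Phi(Y, d_j)$ then
    1 if $\max(\Phi(X, d_{i+1}), \Phi(Y, LD(m_y m_{y+2}, P - d_{i+1}))) < \Phi(X, d_i)$ then
      $P_x = d_{i+1}$, $P_y = LD(m_y m_{y+2}, P - d_{i+1})$
    2 else
      $P_x = d_i$, $P_y = LD(m_y m_{y+2}, P - d_i)$
  else
    3 if $\max(\Phi(X, LD(m_x m_{x+2}, P - d_{j+1})), \Phi(Y, d_{j+1})) < \Phi(Y, d_j)$ then
      $P_x = LD(m_x m_{x+2}, P - d_{j+1})$, $P_y = d_{j+1}$
    4 else
      $P_x = LD(m_x m_{x+2}, P - d_j)$, $P_y = d_j$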


Page 19: Matrix Chain Scheduling Algorithm

Processor Allocation for Matrix Products DPA Algorithm

DPA for Two Matrix Products - cont’d

The time complexity of the naive algorithm is O(P), where P is the number of processors. (The major cost is in steps 2 and 3.)

The completion time of two matrix products when processors are allocated by the DPA algorithm is shorter than or equal to that of proportional allocation.

DPA guarantees the minimum completion time for two matrix products.

DPA can easily be extended to k independent matrix products.


Page 20: Matrix Chain Scheduling Algorithm

Processor Allocation for Matrix Products DPA Algorithm

DPA for k Matrix Products

DPA-k guarantees the minimum completion time of k independent matrix products.

Discrete Processor Allocation for k Matrix Products (DPA-k)

Input: k matrix products $X_i = (m_{i,1}, m_{i,2}, m_{i,3})$ for $1 \le i \le k$, given on $P$ processors ($k \le P$).

Output: the number of allocated processors $P_i$ for each matrix product $X_i$, which satisfies $\sum_{i=1}^{k} P_i \le P$.


Page 21: Matrix Chain Scheduling Algorithm

Processor Allocation for Matrix Products DPA Algorithm

DPA for k Matrix Products - cont’d

Discrete Processor Allocation for k Matrix Products (DPA-k)

1 For $i = 1$ to $k$ do
    1 $P_{prop,i} = \dfrac{m_{i,1} m_{i,2} m_{i,3}}{\sum_{j=1}^{k} m_{j,1} m_{j,2} m_{j,3}} P$
    2 Find the maximum $d_{i,j}$ in $D(m_{i,1} m_{i,3})$ which satisfies $d_{i,j} \le P_{prop,i}$
    3 Let $P_i = d_{i,j}$

2 While $P - \sum_{i=1}^{k} P_i > 0$ do
    1 Find the product $X_i$ with the maximum $\Phi(m_{i,1}, m_{i,2}, m_{i,3}, P_i)$
    2 If ($P_i < m_{i,1} m_{i,3}$) then
        1 Find the minimum $d_{i,j}$ in $D(m_{i,1} m_{i,3})$ which satisfies $d_{i,j} > P_i$
        2 If ($P - \sum_{i=1}^{k} P_i - (d_{i,j} - P_i) > 0$) then $P_i = d_{i,j}$. Otherwise, stop the algorithm.
      else stop the algorithm


Page 23: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm - Introduction

Matrix Chain Scheduling Algorithm

The algorithm has three stages.

1 Find the optimal product sequence by MCOP.

2 Top-Down Processor Assignment: processors are partitioned and assigned to each subtree to balance the evaluation time of both matrix product chains.

3 Bottom-Up Concurrent Execution: this step executes products independently, starting from the leaves, and tries to modify the product sequence to enhance concurrency so as to reduce the evaluation time of $M$.

The time complexity of the algorithm is $O(n^2 + nP)$.


Page 27: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm

First Step - Find Optimal Product Sequence by MCOP

The optimal product sequence can be found in $O(n \log n)$ time using a sequential algorithm.

Many parallel algorithms have been studied which run in polylog time on a $P$-processor system.

Notation

$W[i][j]$: the minimum number of operations for evaluating $L_{i,j}$.

$S[i][j]$: the matrix index at which the matrix chain $(M_i \times \cdots \times M_j)$ is partitioned.
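To make $W$ and $S$ concrete, here is a minimal sketch that fills both tables with the textbook cubic dynamic program. It is not the $O(n \log n)$ algorithm mentioned above, only the simplest way to get the same tables; the function name `mcop` and the 0-based list `dims` holding $m_1, \dots, m_{n+1}$ are assumptions.

```python
def mcop(dims: list[int]):
    """Textbook O(n^3) dynamic program for MCOP.

    dims holds m_1 .. m_{n+1}. Returns 1-indexed tables (row and
    column 0 are unused): W[i][j] is the minimum operation count for
    M_i x ... x M_j, and S[i][j] is the split index k, meaning
    (M_i .. M_k)(M_{k+1} .. M_j).
    """
    n = len(dims) - 1
    W = [[0] * (n + 1) for _ in range(n + 1)]
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            # The last product is (m_i, m_{k+1}, m_{j+1}).
            W[i][j], S[i][j] = min(
                (W[i][k] + W[k + 1][j] + dims[i - 1] * dims[k] * dims[j], k)
                for k in range(i, j)
            )
    return W, S


W, S = mcop([10, 30, 5, 60])
print(W[1][3], S[1][3])  # 4500 2
```

For three matrices of sizes $10 \times 30$, $30 \times 5$ and $5 \times 60$, the best order is $(M_1 M_2) M_3$ with 4500 operations, so $S[1][3] = 2$.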


Page 29: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm

Second Step - Top-Down Processor Assignment

When $p_{i,j}$ processors are assigned to $L_{i,j}$:

$p_{i,j} \times \dfrac{W[i][S[i][j]]}{W[i][S[i][j]] + W[S[i][j]+1][j]}$ processors are assigned to the subtree $L_{i,S[i][j]}$.

$p_{i,j} \times \dfrac{W[S[i][j]+1][j]}{W[i][S[i][j]] + W[S[i][j]+1][j]}$ processors are assigned to the subtree $L_{S[i][j]+1,j}$.

The assignment is applied recursively down the tree.


Page 30: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm

Second Step - For Example

For example, consider a chain of 8 matrices with dimensions {3, 8, 9, 5, 3, 3, 3, 4} on a 64-processor parallel system.


Page 31: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm

Third Step - Bottom-Up Concurrent Execution

When there are idle processors in the execution of $L_{i,j}$, this step tries to modify the product sequence to use them.

There are two conditions.

$y > x + 1$ and the node associated with $M_{y+1}$ has a left child node associated with $M_y$ in the sequence tree.

$y < x - 1$ and the node associated with $M_y$ has a right child node associated with $M_{y+1}$ in the sequence tree.


Page 32: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm

Sequence modification


Page 33: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm

If $y > z$ then

\[ \Delta C = m_{y+1} m_{y+2} (m_y - m_z) + m_z m_y (m_{y+2} - m_{y+1}) \]

If $y < z$ then

\[ \Delta C = m_{y+1} m_{y+2} (m_y - m_{z+1}) + m_{z+1} m_y (m_{y+2} - m_{y+1}) \]

Lemma

If a leaf product $(M_x, M_{x+1})$ has a candidate product $(M_y, M_{y+1})$ and the DPA algorithm allocates $p_x$ and $p_y$ processors to the two matrix products, respectively, then evaluation using the modified sequence reduces the evaluation time when

\[ \Delta C < \min\bigl( \Phi(m_x, m_{x+1}, m_{x+2}, m_x m_{x+2}) \times (p_x + p_y - m_x m_{x+2}),\; m_y m_{y+1} m_{y+2} \bigr) \]

Page 34: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Matrix Chain Scheduling Algorithm

Candidate Products


Page 35: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm

Stage-1 MCOP

Two-Pass Matrix Chain Scheduling Algorithm - Stage 1

Stage-1 MCOP

1 Find the optimal product sequence for the MCOP by using a parallel algorithm.

2 Generate the sequence tree $L$.

The dynamic programming algorithm - $O(n^2)$ to $O(n^3)$

The sequential algorithm - $O(n \log n)$

The parallel algorithm - $O(\log^3 n)$


Page 36: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm

Stage-2 Top-Down Processor Assignment

Two-Pass Matrix Chain Scheduling Algorithm - Stage 2

Stage-2 Top-Down Processor Assignment

1 Initialize $i = 1$, $j = n$, $p_{i,j} = P$.

2 If $i$ is not $S[i][j]$, then allocate $p_{i,j} \times \dfrac{W[i][S[i][j]]}{W[i][S[i][j]] + W[S[i][j]+1][j]}$ processors to $L_{i,S[i][j]}$.

3 If $j$ is not $S[i][j] + 1$, then allocate $p_{i,j} \times \dfrac{W[S[i][j]+1][j]}{W[i][S[i][j]] + W[S[i][j]+1][j]}$ processors to $L_{S[i][j]+1,j}$.

4 If $j$ is $i$ or $i + 1$, then finish this stage; otherwise, call this algorithm recursively, once with $i = i$, $j = S[i][j]$ and once with $i = S[i][j] + 1$, $j = j$.
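The stage can be sketched as a short recursion over the $W$ and $S$ tables. This is an illustration, not the paper's code. Assumptions: the names `mcop` and `assign` are hypothetical; the tables come from the textbook cubic dynamic program rather than a parallel algorithm; and shares are kept as exact fractions, since the slides do not say how to round them.

```python
from fractions import Fraction


def mcop(dims):
    """Textbook O(n^3) DP; returns 1-indexed tables W and S."""
    n = len(dims) - 1
    W = [[0] * (n + 1) for _ in range(n + 1)]
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            W[i][j], S[i][j] = min(
                (W[i][k] + W[k + 1][j] + dims[i - 1] * dims[k] * dims[j], k)
                for k in range(i, j)
            )
    return W, S


def assign(dims, P):
    """Top-down assignment: maps (i, j) to the processors given to L_{i,j}."""
    W, S = mcop(dims)
    n = len(dims) - 1
    alloc = {(1, n): Fraction(P)}       # step 1

    def visit(i, j):
        if j <= i + 1:                  # step 4: single matrix or leaf product
            return
        k = S[i][j]
        total = W[i][k] + W[k + 1][j]   # > 0 here, one side is a real product
        if i != k:                      # step 2: left subtree is a product
            alloc[(i, k)] = alloc[(i, j)] * W[i][k] / total
            visit(i, k)
        if j != k + 1:                  # step 3: right subtree is a product
            alloc[(k + 1, j)] = alloc[(i, j)] * W[k + 1][j] / total
            visit(k + 1, j)

    visit(1, n)
    return alloc


alloc = assign([1, 10, 1, 30, 1], 64)
print(alloc[(1, 2)], alloc[(3, 4)])  # 16 48
```

In this small chain the optimal split is $(M_1 M_2)(M_3 M_4)$ with $W[1][2] = 10$ and $W[3][4] = 30$, so the 64 processors are divided 16 to 48.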


Page 37: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm

Stage-3 Bottom-Up Concurrent Execution

Two-Pass Matrix Chain Scheduling Algorithm - Stage 3

Stage-3 Bottom-Up Concurrent Execution

1. Let $(M_k, M_{k+1})$ be a leaf product and $p_{k,k+1}$ be the number of processors allocated to the leaf product. If $p_{k,k+1} < m_k m_{k+2}$, then go to 5.

2. Find a candidate product by tracing the ancestors of the leaf product using postorder traversal. If there is no such candidate product, go to 5.

3. Let the product $(M_l, M_{l+1})$ be a candidate product found by tracing the ancestors of the leaf product $(M_k, M_{k+1})$. Check whether the candidate product satisfies Lemma 2. If not, go to 2.


Page 38: Matrix Chain Scheduling Algorithm

Matrix Chain Scheduling Algorithm Two-Pass Matrix Chain Scheduling Algorithm

Stage-3 Bottom-Up Concurrent Execution

Two-Pass Matrix Chain Scheduling Algorithm - Stage 3

Stage-3 Bottom-Up Concurrent Execution

4. Modify the sequence tree such that the candidate product $(M_l, M_{l+1})$ can be executed concurrently with $(M_k, M_{k+1})$. Reallocate the $p_{k,k+1}$ processors using the DPA algorithm and go to 1 for each leaf product of the two split subtrees.

5. Schedule the leaf product on $\min(p_{i,j}, m_k m_{k+2})$ processors. Set the parent of the leaf product as a new leaf product.


Page 39: Matrix Chain Scheduling Algorithm

Conclusion

Conclusion

We discussed the matrix chain product problem, the DPA algorithm, and the Matrix Chain Scheduling Algorithm.

The paper emphasizes modifying the optimal matrix chain product tree for parallel systems.

I will try to shorten my slides XD.
