[IEEE Computer Soc. Press International Conference on Application Specific Systems, Architectures and Processors: ASAP '96 - Chicago, IL, USA (19-21 Aug. 1996)] Proceedings of International

Automatic Generation of Modular Mappings *

Hyuk-Jae Lee and Jose A.B. Fortes School of Electrical and Computer Engineering

Purdue University, W. Lafayette, IN 47907 {hyuk,fortes}@ecn.purdue.edu

Abstract

Modular mappings have been recently proposed for optimizations of algorithms that cannot be

efficiently mapped by affine mappings. This paper addresses the problem of generating modular

mappings that satisfy conditions for validity and optimality. In general, this is a difficult problem

due to the presence of non-linear constraints. Hence, a method of $O(n^2)$ complexity is provided

to assign values to some entries of a transformation matrix so that non-linear constraints are

transformed into linear ones, where n is the dimension of a computation domain. The proposed

heuristic attempts to reduce the number of value-assigned entries and exclude as few solutions

as possible. This paper also considers the issue of deriving the inverse transformation of a

given modular mapping. It identifies a class of modular functions whose inverses result directly

from computing the inverse of the (coefficient) matrix used to specify a modular mapping. An

efficient method of $O(n^2)$ complexity is provided to formulate the problem of generating such

modular mappings as an integer linear programming problem.

1 Introduction

Affine transformations are widely used for many optimizations (of programs with loops) in

parallelizing compilers and systolic array design [1]-[4]. Recently, modular mappings, described

*This research was partially funded by the National Science Foundation under grants MIP-9500673 and CDA-9015696.

1063-6862/96 $5.00 © 1996 IEEE


by linear transformations modulo a constant vector, have been proposed for additional important

optimizations and designs [5]. Initial work on modular mappings focused on the characterization

of injectivity of a modular mapping [6]-[7]. These conditions can be used to identify the space

of valid modular time-space mappings of a given regular algorithm. Additional work concen-

trated on finding constraints that capture conditions of modular mappings for fast schedules,

data alignment, data distribution, and efficient space allocation [8]-[10]. This paper addresses

the issue of how to combine all these constraints and systematically generate modular time-

space mappings. The proposed method is a heuristic one but, nevertheless, it is systematic,

computationally affordable, and attempts to exclude as few solutions as possible.

To automatically generate a program that results from a modular time-space transformation,

it is necessary to compute the inverse of the transformation. This paper identifies conditions of

modular mappings (whose form is $T_{\bar{m}} : \bar{j} \mapsto (T\bar{j}) \bmod \bar{m}$, as discussed in Section 2) whose inverse

can also be easily derived from the inverse of the transformation matrix $T$ (i.e., the inverse is

of the form $T_{\bar{m}}^{-1} : \bar{j} \mapsto (T^{-1}\bar{j}) \bmod \bar{m}$). Then, a heuristic is proposed to formulate integer linear

programming problems whose solutions are modular mappings that satisfy these conditions.

The heuristic is not optimal in the sense that the solution of the integer linear programming

problem excludes some feasible modular mappings. However, it is similar to the above-mentioned

heuristic for combining multiple constraints and is systematic and computationally affordable

while attempting to include as many solutions as possible.

The rest of the paper is organized as follows. Section 2 defines and characterizes modular

mappings. Previous work on the conditions for injectivity of modular mappings is briefly re-

viewed. Section 3 studies the problem of generating a modular mapping that satisfies injectivity

conditions as well as other constraints. Section 4 investigates conditions of modular mappings

whose inverse can be derived from the inverse of the transformation matrix. In addition, gen-

eration of modular time-space mappings that satisfy the invertibility conditions is considered.


Conclusions are presented in Section 5.

2 Background

A time-space transformation is a mapping from an index set (iteration set, or computation

domain) of a (nested-loop) program into the domain of time and space (i.e., processors). A

modular time-space transformation is a special type of time-space transformation that can be

described by a linear transformation modulo a constant vector.

Definition 1 [modular function] A modular function, $T_{\bar{m}} : \mathbb{Z}^n \to \mathbb{Z}^{n'}$, is a mapping of the form:

$$T_{\bar{m}}(\bar{j}) = \begin{bmatrix} (T_{(1,*)} \cdot \bar{j}) \bmod m_1 \\ (T_{(2,*)} \cdot \bar{j}) \bmod m_2 \\ \vdots \\ (T_{(n',*)} \cdot \bar{j}) \bmod m_{n'} \end{bmatrix}$$

where $T_{(i,*)}$ is a row vector. The matrix $T = [t_{ij}]$ and vector $\bar{m} = (m_1, \ldots, m_{n'})^T$ are called the transformation matrix and modulus vector, respectively. □

Definition 2 [modular time-space transformation] A modular time-space transformation, $T_{\bar{m}}$,

is a modular function that is injective when its domain is restricted to the index set $J$ of an

algorithm, i.e., $T_{\bar{m}} : J \to \mathbb{Z}^{n'}$ is injective. □

For modular time-space transformations considered in the remainder of this paper, it is always

the case that n = n'.
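The two definitions can be illustrated with a minimal Python sketch. It assumes a rectangular index set $\{\bar{j} : 0 \le j_i < b_i\}$ and tests injectivity by brute force; the function names, the matrix, and the modulus vector are illustrative choices, not taken from the paper.

```python
from itertools import product


def modular_map(T, m, j):
    """Apply j -> (T j) mod m, one row of T and one modulus at a time."""
    return tuple(sum(t * x for t, x in zip(row, j)) % mi
                 for row, mi in zip(T, m))


def is_injective(T, m, bounds):
    """Brute-force injectivity test on the box {j : 0 <= j_i < bounds_i}."""
    seen = set()
    for j in product(*(range(b) for b in bounds)):
        image = modular_map(T, m, j)
        if image in seen:
            return False  # two index points share the same image
        seen.add(image)
    return True


T = [[1, 1], [0, 1]]   # illustrative transformation matrix (assumption)
m = (3, 4)             # illustrative modulus vector (assumption)

print(modular_map(T, m, (2, 3)))                    # (2, 3)
print(is_injective(T, m, (3, 4)))                   # True
print(is_injective([[1, 1], [1, 1]], m, (3, 4)))    # False
```

The exhaustive test is only practical for small index sets; it is meant as a reference check, not as a replacement for analytical injectivity conditions.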

It is not trivial to check whether a transformation matrix $T$ and a modulus vector $\bar{m}$ yield an

injective modular mapping. Initial work on the characterization of injective modular mappings

of rectangular index sets appears in [5], [6] and additional results are due to [7]. The following

theorem shows the conditions of transformation matrices that guarantee the injectivity of the

corresponding modular mappings. Other injectivity conditions can be also found in [5]-[7].

Theorem 1 ([5]) Let $T_{\bar{m}}$ be a modular function of the index set $J_{\bar{m}}$. Let $\succ$ be an arbitrary total

order on the set $\{1, 2, \ldots, n\}$. $T_{\bar{m}}$ is injective if its transformation matrix $T$ satisfies (1) $t_{ii}$ is

relatively prime to $m_i$, and (2) $t_{ij} = 0$ if $i \succ j$. □
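A hedged sketch of how the two conditions of Theorem 1 might be checked for one given order follows. It reads condition (1) as a requirement that each diagonal entry be relatively prime to the corresponding modulus, which is an assumption about the intended statement. The function name and the 0-based encoding of the order are hypothetical.

```python
from math import gcd


def satisfies_theorem1(T, m, order):
    """Check the two conditions for one total order.

    `order` lists the 0-based indices from greatest to least, so index i
    is "greater" than index j when i appears before j in the list.
    """
    rank = {idx: pos for pos, idx in enumerate(order)}
    n = len(T)
    for i in range(n):
        # Condition (1), as read here: diagonal entry coprime to its modulus.
        if gcd(T[i][i], m[i]) != 1:
            return False
        # Condition (2): t_ij must vanish whenever i is greater than j.
        for j in range(n):
            if i != j and rank[i] < rank[j] and T[i][j] != 0:
                return False
    return True


T = [[1, 1], [0, 1]]   # illustrative matrix (assumption)
m = (3, 4)             # illustrative moduli (assumption)

print(satisfies_theorem1(T, m, [1, 0]))                      # True
print(satisfies_theorem1(T, m, [0, 1]))                      # False
print(satisfies_theorem1([[2, 0], [0, 1]], (4, 3), [1, 0]))  # False
```

Because the theorem allows an arbitrary total order, a full test would try every permutation of the indices; the sketch checks a single order only.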


In addition to injectivity conditions, there are many other conditions that may have to be

imposed on modular mappings to guarantee their validity and optimality [8]-[10]. Due to space

limitations, full details of these constraints are not explained in this paper. However, in general,

the problem of generating a modular mapping can be described as follows:

Find $T$ and $T^*$ that

minimize $h_1(T)$ (or $h_2(T^*)$) subject to: (1) $GT > 0$, (2) $KT^* > 0$, (3) $TT^* = I$, (4) injectivity conditions.

Here, $T$ is the transformation matrix of the modular mapping, $h_1(T)$ (or $h_2(T^*)$) is the objective

function of this problem, and $G$ and $K$ are matrices. The inequalities in constraints (1) and (2)

can be replaced by equalities. Constraint (3) captures the fact that $T^*$ is the inverse of $T$. The

difficulty in solving this problem lies in the fact that some constraints are imposed on T while

others are imposed on the inverse of T , so that the non-linear constraint (3) should be satisfied.

For injectivity conditions, the problem needs to be decomposed into n!n! subproblems which

have linear injectivity conditions [5]. Hence, by linearizing constraint (3), this problem can be

converted into n!n! linear programming problems.

3 Generation of Injective Modular Mappings with Constraints

The constraint $TT^{-1} = I$ consists of a set of non-linear equations of the form: $\sum_k t_{ik} t^{-1}_{kj} = \delta_{ij}$,

where $\delta_{ij}$ is one if $i = j$, and zero if $i \neq j$. To make this condition linear, it is necessary to choose

either $t_{ik}$ or $t^{-1}_{kj}$ as an arbitrary constant. However, arbitrary choice of $t_{ik}$ or $t^{-1}_{kj}$ may result in

a non-optimal solution. The more entries are chosen arbitrarily, the smaller the search space is

and so is the likelihood of finding an optimal solution. Hence, it is desirable to minimize the

number of arbitrarily chosen entries.
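As a small worked illustration (the $2 \times 2$ size and the chosen constants are assumptions made only for this sketch), consider the equation of $TT^{-1} = I$ for $i = 1$, $j = 2$. Each term is a product of two unknowns; once the entries of the first row of $T$ are fixed, the same equation is linear in the entries of $T^{-1}$.

```latex
% Equation (i, j) = (1, 2) of T T^{-1} = I for n = 2: bilinear in the unknowns.
t_{11}\, t^{-1}_{12} + t_{12}\, t^{-1}_{22} = 0
% Fixing the first row of T to the illustrative constants t_{11} = 1, t_{12} = 2
% leaves an equation that is linear in the entries of T^{-1}:
t^{-1}_{12} + 2\, t^{-1}_{22} = 0
```

Every fixed entry removes one degree of freedom from the search, which is why the number of fixed entries should be kept small.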

Consider a directed graph $G(V, E, W)$ induced by a matrix $T$ and an order $\succ$ as follows:

nodes: $V = \{v_i \mid v_i \text{ represents the } i\text{th row of } T\}$.

edges: $E = \{(v_i, v_j) \mid j \succ i\}$, where $\succ$ is an order on the set $\{1, 2, \ldots, n\}$.

weights of edges:

- $w(v_i, v_j) = 1$ if $t_{ij}$ is not determined.

- $w(v_i, v_j) = 0$ if $t_{ij}$ is determined.

Figure 1. (a) Graph induced by $T$ and $2 \succ 3 \succ 4 \succ 1$. (b) Maximally merged graph.

Two nodes $v_i$ and $v_j$ are adjacent with respect to $\succ$ if there does not exist any number $k$ between

$i$ and $j$ in the order $\succ$. Let $v_i$ and $v_j$ be adjacent with respect to $\succ$ and let $w(v_i, v_j) = 0$. Then,

a $(v_i, v_j)$-merged graph $G'$ is generated from a graph $G(V, E, W)$ as follows:

- two nodes $v_i$ and $v_j$ are merged into one node $v_{i,j}$;

- for a given node $v_l$ of $G$, $w(v_l, v_{i,j})$ becomes $w(v_l, v_i) + w(v_l, v_j)$;

- for a given node $v_l$ of $G$, $w(v_{i,j}, v_l)$ becomes $w(v_i, v_l) + w(v_j, v_l)$.

The maximally merged graph of $G$ is a graph generated by merging all pairs of adjacent nodes

connected by a zero-weight edge.

Example 1 Consider a $4 \times 4$ matrix $T$ [matrix not recoverable from the source], where * denotes undetermined entries, and the

order $2 \succ 3 \succ 4 \succ 1$ on the set $\{1, 2, 3, 4\}$. The graph induced by $T$ is shown in Fig. 1. Since

$w(v_1, v_4) = 0$, $v_1$ and $v_4$ can be merged as shown in Fig. 1 (b). The weight $w(v_{1,4}, v_3)$ becomes

$w(v_1, v_3) + w(v_4, v_3) = 2$. Similarly, $w(v_{1,4}, v_2) = 1$ is obtained. There do not exist any other

adjacent nodes with zero edge weights. Hence, this graph is the maximally merged graph. □
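The construction in Example 1 can be sketched in Python. The matrix below is hypothetical: it is one $4 \times 4$ pattern (with `None` marking undetermined entries) chosen so that it yields the same edge weights as the example, not the matrix printed in the paper. All function names are illustrative.

```python
def induced_weights(T, ascending):
    """Edge weights of the induced graph.

    `ascending` lists the 1-based row indices from least to greatest under
    the order, so an edge (v_i, v_j) exists when j comes after i in the list.
    The weight is 1 if t_ij is undetermined (None) and 0 otherwise.
    """
    w = {}
    for pos, i in enumerate(ascending):
        for j in ascending[pos + 1:]:
            w[(i, j)] = 1 if T[i - 1][j - 1] is None else 0
    return w


def group_weight(w, g, h):
    """Weight between two merged nodes: the sum over all member pairs."""
    return sum(w[(i, j)] for i in g for j in h)


def maximally_merge(w, ascending):
    """Merge adjacent nodes joined by a zero-weight edge until none remain."""
    groups = [[i] for i in ascending]
    changed = True
    while changed:
        changed = False
        for k in range(len(groups) - 1):
            if group_weight(w, groups[k], groups[k + 1]) == 0:
                groups[k:k + 2] = [groups[k] + groups[k + 1]]
                changed = True
                break
    return groups


# Hypothetical matrix (None = undetermined) for the order 2 > 3 > 4 > 1.
T = [
    [1, None, None, 0],
    [0, 1, 0, 0],
    [0, None, 1, 0],
    [0, 0, None, 1],
]
ascending = [1, 4, 3, 2]   # least to greatest under the order

w = induced_weights(T, ascending)
groups = maximally_merge(w, ascending)
print(groups)                                   # [[1, 4], [3], [2]]
print(group_weight(w, groups[0], groups[1]))    # 2
print(group_weight(w, groups[0], groups[2]))    # 1
```

Representing the order as a list makes adjacency trivial: two (possibly merged) nodes are adjacent exactly when they are neighbours in the list.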

Proposition 1 Let $T_{\bar{m}}$ be a modular mapping that satisfies conditions of Theorem 1. The

condition $TT^{-1} = I$ becomes a system of linear equations if the maximally merged graph of the

graph induced by $T$ has at most two nodes. □

Proposition 1 provides the condition of the induced graph that guarantees the equations

$TT^{-1} = I$ to be linear. Suppose that an induced graph does not satisfy Proposition 1. Some

edge weights should be reset to zero so that the graph can be merged to a two-node graph. The

resetting of an edge weight implies determination of the corresponding entry of T thus imposing

more restrictions on T . Hence, one needs to carefully choose edges to be reset to minimize the

number of the determined entries. In addition, the value chosen for a determined entry also

affects the quality of the solution of the resulting linear programming problem. However, in the

reset procedure, there is no exact information on which value results in a better solution. In

general, the constraint on the entry in the first row of T often requires that the entry should

be nonzero because the first row of $T$ is the schedule vector $\bar{\pi}$ that has a constraint of the form

$\bar{\pi} \neq 0$. On the other hand, the entries in the other rows of $T$ are often required to be zero for

the conditions of code generation discussed in the next section. Hence, in the reset step, it may

be desirable to assign one to all entries in the first row and zero to all entries in the other rows.

When an edge of a merged graph is reset, it affects as many undetermined entries as the

weight of the edge. Therefore, it is desirable to reset the edge with the smallest weight. After

an edge weight is reset, the corresponding adjacent nodes can be merged. These reset and merge

steps can be repeated until only two nodes are left. For a single reset/merge step, it is necessary

to find the edge with minimal weight among edges connecting adjacent nodes. To do so, it is

necessary to compare only edges connecting adjacent nodes and therefore $O(n)$ time is required

for given $n$ nodes. Since $O(n)$ reset/merge steps are necessary, the time complexity of generating

a two-node maximally merged graph is $O(n^2)$.

Example 1 [continued] The maximally merged graph violates the condition in Proposition 1.

Hence, there are non-linear equations in the equations $TT^{-1} = I$. To make a two-node maximally

merged graph, one needs to merge either $v_{1,4}$ and $v_3$ or $v_3$ and $v_2$. Here, the edge connecting $v_3$

and $v_2$ has smaller weight than the other. Hence, it is desirable to reset weight $w(v_3, v_2)$ and

merge $v_3$ and $v_2$. The resulting graph is shown in Fig. 2. This graph satisfies Proposition 1.

This implies that determination of $t_{32}$ guarantees the condition $TT^{-1} = I$ to be linear. □
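The reset/merge loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the edge weights are hypothetical values chosen to match the sums quoted in Example 1, ties are broken by taking the first minimum, and the function name is invented for this sketch. For brevity it recomputes group weights from scratch in every step, so it does not attain the $O(n^2)$ bound discussed in the text.

```python
def reset_and_merge(w, ascending):
    """Greedy reset/merge until only two nodes are left.

    `w` maps an edge (i, j) to 1 (t_ij undetermined) or 0 (t_ij determined).
    `ascending` lists the indices from least to greatest under the order.
    Returns the final two groups and the list of entries that had to be fixed.
    """
    w = dict(w)                      # work on a copy
    groups = [[i] for i in ascending]
    fixed = []
    while len(groups) > 2:
        # Weight of every edge between adjacent (possibly merged) nodes.
        weights = [sum(w[(i, j)] for i in groups[k] for j in groups[k + 1])
                   for k in range(len(groups) - 1)]
        # Pick the cheapest adjacent pair; a zero-weight pair costs nothing.
        k = min(range(len(weights)), key=weights.__getitem__)
        for i in groups[k]:
            for j in groups[k + 1]:
                if w[(i, j)] == 1:
                    w[(i, j)] = 0    # reset: entry t_ij becomes determined
                    fixed.append((i, j))
        groups[k:k + 2] = [groups[k] + groups[k + 1]]
    return groups, fixed


# Hypothetical weights consistent with Example 1 (order 2 > 3 > 4 > 1).
w = {(1, 4): 0, (1, 3): 1, (1, 2): 1, (4, 3): 1, (4, 2): 0, (3, 2): 1}
groups, fixed = reset_and_merge(w, [1, 4, 3, 2])
print(groups)   # [[1, 4], [3, 2]]
print(fixed)    # [(3, 2)]
```

With these weights the sketch first merges $v_1$ and $v_4$ at no cost and then fixes the single entry $t_{32}$, which agrees with the outcome described in the example.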


Figure 2. The maximally merged graph with two nodes

4 Generation of Modular Mappings for Code Generation

For automatic code generation, it is necessary to derive the inverse of a given time-space

mapping. The following proposition provides the conditions of a modular mapping with a

transformation matrix $T$ whose inverse is a modular mapping with transformation matrix $T^{-1}$.

Proposition 2 Let $T_{\bar{m}}(\bar{j}) = (T\bar{j}) \bmod \bar{m}$ be a modular time-space transformation. The inverse of

$T_{\bar{m}}$ is $(T^{-1}\bar{j}) \bmod \bar{m}$ if either $t_{i\alpha} = 0$ for all $i$, $i \neq \alpha$ or $t_{\alpha j} = 0$ for all $j$, $j \neq \alpha$. In other words, for

any $\alpha \in \{1, 2, \ldots, n\}$ either $t_{\alpha\alpha}$ is the only nonzero entry of the $\alpha$th row or it is the only nonzero

entry of the $\alpha$th column. □

Example 2 Consider a modular time-space transformation $(T(\cdot)) \bmod \bar{m}$ with $T$ = [matrix not recoverable from the source]

and $\bar{m} = (3, 4, 5)^T$. Proposition 2 indicates that $(T^{-1}(\cdot)) \bmod \bar{m}$ is not the inverse of $(T(\cdot)) \bmod \bar{m}$.

Consider another modular transformation $(T'(\cdot)) \bmod \bar{m}$ with $T'$ = [matrix not recoverable from the source] and $\bar{m} = (3, 4, 5)^T$. In

this case, the transformation matrix satisfies the condition of Proposition 2. Hence, $(T'^{-1}(\cdot)) \bmod \bar{m}$

is the inverse of $(T'(\cdot)) \bmod \bar{m}$. □
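The condition of Proposition 2 and its effect on the inverse can be explored with a short Python sketch. The two matrices below are hypothetical stand-ins chosen for this illustration (they are not the matrices of Example 2), their integer inverses are written out by hand, and the index set is assumed to be the box $\{\bar{j} : 0 \le j_i < m_i\}$ with $\bar{m} = (3, 4, 5)^T$.

```python
from itertools import product


def satisfies_prop2(T):
    """For every a, t_aa is the only nonzero entry of row a or of column a."""
    n = len(T)
    for a in range(n):
        row_ok = all(T[a][j] == 0 for j in range(n) if j != a)
        col_ok = all(T[i][a] == 0 for i in range(n) if i != a)
        if not (row_ok or col_ok):
            return False
    return True


def apply_mod(M, m, v):
    """Compute (M v) mod m component by component."""
    return tuple(sum(c * x for c, x in zip(row, v)) % mi
                 for row, mi in zip(M, m))


def inverse_works(T, T_inv, m):
    """Brute-force check that (T_inv y) mod m undoes (T x) mod m on the box."""
    return all(apply_mod(T_inv, m, apply_mod(T, m, x)) == x
               for x in product(*(range(mi) for mi in m)))


m = (3, 4, 5)

# Hypothetical matrix violating the condition (row 2 and column 2 both
# contain an off-diagonal nonzero entry), with its integer inverse.
T_bad = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
T_bad_inv = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]

# Hypothetical matrix satisfying the condition, with its integer inverse.
T_ok = [[1, 1, 1], [0, 1, 0], [0, 0, 1]]
T_ok_inv = [[1, -1, -1], [0, 1, 0], [0, 0, 1]]

print(satisfies_prop2(T_bad), inverse_works(T_bad, T_bad_inv, m))  # False False
print(satisfies_prop2(T_ok), inverse_works(T_ok, T_ok_inv, m))     # True True
```

For `T_bad`, the point $(0, 3, 1)$ already shows the failure: it maps to $(0, 0, 1)$, and applying the inverse matrix modulo $\bar{m}$ returns $(1, 3, 1)$ instead.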

Consider the generation of modular mappings that satisfy Proposition 2. As in Section 3, a

graph-theoretical approach can be used. Consider a directed graph $G(V, E, W)$ induced by an

$n \times n$ matrix $T$ and an order $\succ$ as follows:

nodes: $V = \{v_i \mid v_i \text{ represents the } i\text{th row of } T\}$.

edges: $E = \{(v_i, v_j) \mid j \succ i\}$, where $\succ$ is an order on the set $\{1, 2, \ldots, n\}$.

weights of edges:

- $w(v_i, v_j) = 1$ if $t_{ij}$ is not determined.

- $w(v_i, v_j) = 0$ if $t_{ij} = 0$.

- $w(v_i, v_j) = -n$ if $t_{ij} \neq 0$.

This graph is the same as that of Section 3 except for edge weights when $t_{ij} \neq 0$. As in

Section 3, two adjacent nodes can be merged if the weight between these two nodes is zero.

Figure 3. (a) Graph induced by $T$ and $2 \succ 3 \succ 4 \succ 1$. (b) Maximally merged graph.

Example 3 Suppose that the transformation matrix is a $4 \times 4$ matrix $T$ [matrix not recoverable from the source], where * denotes

undetermined entries, and let $2 \succ 3 \succ 4 \succ 1$ be an order on the set $\{1, 2, 3, 4\}$. The graph

induced by $T$ is shown in Fig. 3 (a). Fig. 3 (b) shows the maximally merged graph. □

The following proposition gives the condition of the induced graph that guarantees the cor-

responding modular mapping satisfies Proposition 2.

Proposition 3 Transformation matrix $T$ satisfies the conditions of Proposition 2 if the maximally

merged graph of the graph induced by $T$ has at most two nodes. □

If an induced graph cannot be merged into a two-node graph, then, as done in Section 3, it

is necessary to choose undetermined entries to be zero and repeat the reset/merge steps until

only two nodes are left. The only difference occurs when an entry $t_{ij}$ is initially determined to

be nonzero. Then, it is not possible to merge $v_i$ and $v_j$ because the corresponding weight is

initially set to $-n$, which is small enough to prevent the weight from becoming zero even if the two

nodes $v_i$ and $v_j$ are merged with other nodes and the weight $w(v_i, v_j)$ between them increases

by summation with other weights.

Example 3 [continued] The maximally merged graph violates the condition in Proposition 3

because it has three nodes. Hence, it is necessary to merge either $v_{1,4}$ and $v_3$ or $v_3$ and $v_2$.

Two nodes $v_{1,4}$ and $v_3$ cannot be merged because the connecting weight is negative. Therefore,

it is necessary to reset edge $(v_3, v_2)$ and merge these two nodes. The resulting graph is shown in

Fig. 4. This graph has only two nodes, and therefore satisfies Proposition 3. □

Figure 4. The maximally merged graph with two nodes
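The role of the $-n$ weight can be seen in a few lines of Python. The entry values below are hypothetical (the matrix of Example 3 is not reproduced here); they are chosen only so that one entry between the merged node $v_{1,4}$ and $v_3$ is fixed to a nonzero value while the other is undetermined. The helper name is illustrative.

```python
def section4_weight(entry, n):
    """Edge weight used in this section.

    1  if the entry is undetermined (None),
    0  if it is fixed to zero,
    -n if it is fixed to a nonzero value.
    """
    if entry is None:
        return 1
    return 0 if entry == 0 else -n


n = 4
t13 = 1       # hypothetical: fixed to a nonzero value
t43 = None    # hypothetical: undetermined
t32 = None    # hypothetical: undetermined

# Weight between the merged node v_{1,4} and v_3 is the sum of both edges.
w_merged = section4_weight(t13, n) + section4_weight(t43, n)
w_32 = section4_weight(t32, n)

print(w_merged)   # -3
print(w_32)       # 1
```

In this small case the merged weight stays negative, so only the edge between $v_3$ and $v_2$ is a candidate for resetting, as in the example.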

A modular mapping that satisfies Proposition 2 always satisfies Proposition 1 in Section 3.

Hence, a modular mapping generated by the method in this section also guarantees linearity of

constraints of the problem of generating a modular mapping. The procedure discussed in this

section also requires $O(n^2)$ time.

5 Conclusions

This paper addresses the problem of systematically generating modular time-space mappings

that simultaneously satisfy many conditions for validity and optimality. In general, this can be

formulated as a non-linear integer programming problem. This paper proposes an $O(n^2)$-time

heuristic that fixes a small number of entries of a transformation matrix so that the non-linear

program can be converted to a linear program, where $n$ is the dimension of a computation domain.

By fixing some entries, the solution space of the modular mapping generation problem is reduced.

However, the proposed heuristic attempts to preserve as much of the solution space as possible

while maintaining computationally affordable complexity. For automatic code generation, the

inverse of a modular mapping is required. Hence, this paper provides (invertibility) conditions

for modular mappings whose inverse can be derived from the inverse of the transformation matrix.

An $O(n^2)$-time heuristic is provided to formulate an integer programming problem to generate

modular mappings that satisfy invertibility conditions as well as other conditions.

References

[1] A. Darte and Y. Robert. On the alignment problem. Parallel Processing Letters, 4(3):259-270, Sep. 1994.

[2] S.-Y. Kung. VLSI array processors. Prentice-Hall, 1988.

[3] G.-J. Li and B.W. Wah. The design of optimal systolic arrays. IEEE Trans. Comput.,

C-34:66-77, Jan. 1985.

[4] W. Shang and J.A.B. Fortes. Time optimal linear schedules for algorithms with uniform

dependencies. IEEE Trans. Comput., C-40:723-742, June 1991.

[5] H.-J. Lee and J.A.B. Fortes. On the injectivity of modular mappings. In Proc. Int. Conf.

Application-Specific Array Processors, pages 236-247, Aug. 1994.

[6] H.-J. Lee and J.A.B. Fortes. Modular mappings of rectangular algorithms. Technical Report

TR-EE 94-22, Electrical Engr., Purdue Univ., May 1994.

[7] A. Darte, M. Dion, and Y. Robert. A characterization of one-to-one modular mappings. In

Proc. 7th IEEE Symp. Parallel Distributed Processing, pages 382-389, Oct. 1995.

[8] H.-J. Lee and J.A.B. Fortes. Toward data distribution independent parallel matrix multi-

plication. In Proc. Int. Parallel Processing Symposium, pages 436-440, April 1995.

[9] H.-J. Lee and J.A.B. Fortes. Data alignments for modular mappings of BLAS-like algo-

rithms. In Proc. Int. Conf. Application-Specific Array Processors, pages 34-41, July 1995.

[10] H.-J. Lee and J.A.B. Fortes. Conditions of blocked BLAS-like algorithms for data alignment

and communication minimization. In Proc. Int. Conf. Parallel Processing, volume 3, pages

220-223, Aug. 1995.
