
Future Generation Computer Systems 7 (1991/92) 221-229, North-Holland

Problem-solving environments for parallel computers *

David A. Padua, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

Abstract

Padua, D.A., Problem-solving environments for parallel computers, Future Generation Computer Systems 7 (1991/92) 221-229.

Man-machine interaction can take place at different levels of abstraction ranging from the machine-instruction level to the problem-specification level. A problem-solving environment should provide restructuring and debugging tools to make the interaction at these different levels possible and to allow the efficient use of the target machine. Restructurers translate from specifications to programs or from programs to more efficient versions. When the target machine is parallel, the restructurers should include techniques for the automatic exploitation of parallelism. Debuggers are necessary to test for correctness and to evaluate performance at the different levels. Debuggers for parallel programs have to deal with the possibility of nondeterminacy.

Keywords. Parallel computing; compilers; problem-solving environments; programming environments.

1. Introduction

Two of the central goals in software have been the development of good man-machine interfaces and compilation techniques for the efficient generation of machine code. These goals are particularly important for parallel computers, whose acceptance by ordinary users is directly dependent on their becoming as easy to use as sequential machines.

In this paper we discuss man-machine interaction and compilation techniques under the name of problem-solving environments. The term programming environment is used more frequently, but it is more restrictive because programming is only a part (which is not always needed) of the problem-solving process.

A problem-solving environment should facilitate man-machine interaction at different levels of abstraction ranging from the machine-instruction level to the problem-specification level. The next section briefly discusses levels of abstraction in man-machine interaction. The rest of the paper discusses restructurers and debuggers, two of the most important tools that a problem-solving environment should provide.

* This work was supported in part by the National Science Foundation under Grant No. NSF-MIP-8410110, the US Department of Energy under Grant No. US DOE FG02-85ER25001, and the NASA Ames Research Center under Grant No. NASA (DARPA) NCC2-559.

2. Levels of abstraction in man-machine interaction

One of the central goals of software research has been to move the language of man-machine interaction closer to the problem and away from target machine considerations. Two main approaches have been taken toward this goal. One is the design of very-high-level languages such as Lisp, SETL, and Prolog that tend to simplify the task of programming within some restricted domain. The other is the design of specification-oriented packages that allow the solution of problems without requiring any programming. These packages range from relatively simple tools such as SLADOC, a routine searching system developed in the Applied Mathematical Sciences Programs of the Ames Laboratory, to more complex systems such as ELLPACK [26], used to solve elliptic partial differential equation systems from a description of the equations, the boundary conditions, the domain, and the solution method.

These different approaches can be classified in a hierarchy of abstraction levels. The higher the level in the hierarchy, the fewer the concerns of the user with implementation issues. At the top of the hierarchy is the problem-specification level. Here, the only concern is with the specification of what is to be solved or analyzed: there is no need to be aware of what algorithms or programming languages are used. The interaction takes place by handling knobs or other devices, in the language of science or engineering, or in terms of mathematical formulas. Examples of systems at this level can be found in some application programs designed for engineering and science, and in computer-aided instruction programs, including flight simulation programs. From the end-user's point of view, this is clearly the most desirable level of interaction. However, in many cases it is not possible to design general-purpose systems at this level because the design choices that have to be made when going from specification to algorithms are not understood to the point of automation.

For this reason, many specification systems require user intervention for algorithm selection. For example, the CIP language [2] accepts problem specifications which are translated into executable programs via a sequence of correctness-preserving transformations selected by the user. Also, PDE solver systems [5,9,19,26] require the user to specify not only the equations and boundary conditions, but also the solution strategy (discretization and solution method among other things). This is due to our inability to automate the analysis of stability and accuracy. The PDE solver systems just mentioned belong to a second layer in our hierarchy, the solution-specification layer, where the only responsibility of the user is to select the algorithm by either naming it or by selecting an existing program.

At the third, and lowest, level of the hierarchy the interaction takes place in the realm of programming. This level can be decomposed into sublevels corresponding to the different categories of programming languages. These range from assembly language to very-high-level languages such as Lisp, Prolog, and SETL for symbolic computing, and Matlab and FIDIL [16] for numerical computing. Parallel programming languages such as Cedar Fortran [12] and Multilisp [13] are at a lower level than their sequential counterparts because parallel constructs are usually concerned with performance and implementation rather than with the problem itself.

Traditionally, most of the interaction with sequential machines has been at the programming level, and most of the work on restructuring for parallel computers has concentrated on the translation from sequential programs to parallel versions. This work, described briefly in the next section, facilitates the process of programming by allowing the user to work at the sequential programming level while making the power of parallelism available to the target code. In this way, man-machine interaction takes place at a higher level than that of explicit parallel programming. Also, thanks to restructurers, sequential programs can be translated to different target parallel architectures, facilitating portability.

Less work has been done on the restructuring of specifications into parallel programs. Part of the problem is that the translation of specifications is not well understood even for sequential machines. It is very likely that much effort will be devoted in the near future to this problem. Effective translators for specifications or even very-high-level languages are bound to become important tools and will probably become a determining factor in making parallel computers widely accepted. Being able to translate specifications will help the cause of automatic exploitation of parallelism because at the specification level there are more opportunities for parallelism than at the programming level, since once the algorithm has been chosen and implemented, some opportunities for parallelism may be lost. Also, because of the absence of architectural bias, specifications are better than programs for effective porting across widely different target architectures.

3. Restructurers

Restructurers translate objects at a level of the interaction hierarchy into objects at a lower level or into more efficient objects at the same level. For example, there are restructurers that translate sequential Fortran programs into equivalent (but lower level) parallel programs. There are also restructurers that translate sequential Fortran programs into more efficient sequential Fortran programs.

A restructurer could generate machine code directly from a program or specification, or it could generate a program in a high-level language. In either case, when the source code is sequential and the translated version is parallel, a restructurer is called a parallelizer. In the next few paragraphs we discuss parallelization issues for Fortran, Lisp, and specifications. In a final subsection a few words are said about the organization of parallelizers.

3.1. Fortran parallelization

Much work has been done over the past 20 years on parallelizing Fortran compilers (see ref. [25] for a tutorial on this work). The most important techniques deal with the translation of do loops, the most important source of parallelism in numerical programs [4]. Thus, e.g. the loop

      do i = 1, n
        A(i) = B(i) + D(i-1)
        D(i) = E(i) + 1
      end do

can be automatically translated into the following vector statements:

      D(1:n) = E(1:n) + 1
      A(1:n) = B(1:n) + D(0:n-1)

If the target machine is a multiprocessor, the code could be translated into

      A(1) = B(1) + D(0)
      doall I = 1, n, K
        do i = I, min(n-1, I+K-1)
          D(i) = E(i) + 1
          A(i+1) = B(i+1) + D(i)
        end do
      end doall
      D(n) = E(n) + 1

where doall means that different iterations of the loop can be executed in parallel and scheduled in any order. In this example, the loop was blocked (i.e., divided into iteration sub-sequences) to make each parallel thread larger and therefore decrease the overhead associated with interprocessor cooperation. Another reason to block is to allow the exploitation of several levels of parallelism. Thus, if the processors of the target multiprocessor had vector capabilities, the inner loop should be translated into a vector statement.
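As a rough illustration (not from the paper), the blocked doall above could be executed with a thread pool. The Python sketch below assumes 1-based arrays stored as Python lists with indices 0 through n and a hypothetical block size K; it only illustrates the structure of the transformed loop.

from concurrent.futures import ThreadPoolExecutor

def run_block(I, K, n, A, B, D, E):
    # One doall iteration: a contiguous block of the aligned loop body.
    for i in range(I, min(n - 1, I + K - 1) + 1):   # 1-based bounds, as in the Fortran
        D[i] = E[i] + 1
        A[i + 1] = B[i + 1] + D[i]

def blocked_doall(A, B, D, E, n, K=64):
    A[1] = B[1] + D[0]                              # peeled first assignment
    with ThreadPoolExecutor() as pool:
        tasks = [pool.submit(run_block, I, K, n, A, B, D, E)
                 for I in range(1, n + 1, K)]
        for t in tasks:
            t.result()                              # wait for every block
    D[n] = E[n] + 1                                 # peeled last assignment

Because alignment placed the write of D(i) and its use in the same iteration, the blocks share no data and need no synchronization among themselves.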

In addition to loop parallelization, issues such as synchronization [22], communication, and memory usage may have an important influence on performance. For this reason, many parallelizers include strategies for synchronization instruction generation, locality enhancement, and data partition and distribution. These last two topics are particularly important for distributed memory machines as well as hierarchical memory systems such as the Cedar multiprocessor [20,7]. Synchronization considerations can be seen in the previous example where a transformation called alignment was applied. This transformation tries to place the statement instance that generates a value in the same iteration as the instance consuming that value. This is done to avoid synchronization operations.

Locality enhancement has been studied extensively. For example, some strategies for locality enhancement were presented in ref. [21] and extended and developed into automatic strategies in ref. [1]. Also, locality enhancement mechanisms were manually applied to improve the performance of matrix multiplication on the Alliant FX/80 [18].

Other important memory-related issues also arise. For example, overlapping vector fetches from global memory and computation is an important optimization for the Cedar multiprocessor [11]. Also, data partitioning and distribution heavily influence performance in distributed memory machines and hierarchical systems such as Cedar. Traditionally, the experimental restructurers that perform data partitioning have required the user to specify the data partitioning via assertions. More recently, there has been some work to automate this process.

To illustrate how a parallelizer might improve locality, let us assume a multiprocessor with a cache memory on each processor. Further assume that the cache (and thus the memory) is divided into blocks of K words each, and that data are only exchanged between memory and cache as whole blocks. Assume also that matrix columns are sequences of blocks (i.e. matrices are stored in column-major order and columns are much larger than blocks).

Now consider the loop:

      do i = 1, n
        do j = 1, n
          B(j,i) = A(i,j) + 1
        end do
      end do

A naive compiler might transform the outer loop into a doall without any other transformation, causing (1 + 1/K) block transfers between memory and caches for each assignment executed: with j as the inner index, the accesses to A(i,j) walk across a row of the column-major matrix and bring in a new block on every assignment (assuming the cache cannot retain a whole row's worth of blocks), while the accesses to B(j,i) walk down a column and bring in a new block only once every K assignments.

To improve this situation, the compiler could block both loops into groups of K iterations. This would have the effect of processing the matrix A in K x K submatrix order. After blocking and interchanging loops, we end up with the loop nest:

      do I = 1, n, K
        do J = 1, n, K
          do i = I, I+K-1
            do j = J, J+K-1
              B(j,i) = A(i,j) + 1
            end do
          end do
        end do
      end do

If the outer loop is now transformed into a doall loop, the number of cache block transfers decreases to 2/K, a clear improvement over the naive approach when K is large: each K x K submatrix of A and of B now occupies about K blocks, and those blocks are reused for the K x K assignments of the submatrix. To conclude the work on this loop we need to block once more for vector registers, vectorize the innermost loop, and map into vector register instructions:

      doall I = 1, n, K
        do J = 1, n, K
          do i = I, I+K-1
            do j = J, J+K-1, 32
              m = min(j+31, n)
              vr1 = A(i, j:m)
              vr2 = vr1 + 1
              B(j:m, i) = vr2
            end do
          end do
        end do
      end doall

This resulting code segment is more efficient than the original loop, but is also less readable. Not all transformations have this effect. For example, vectorization often makes the code more readable.
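The 1 + 1/K and 2/K transfer counts above can be checked with a small model. The following sketch is my own illustration rather than anything from the paper: it simulates an LRU cache of C blocks of K words, with A and B stored column-major, the cache too small to hold a whole row of A but large enough for a K x K tile, and counts block transfers per assignment for the naive and the blocked loop orders.

from collections import OrderedDict

def transfers(accesses, K, C):
    # LRU cache of C blocks, each holding K consecutive words.
    cache, count = OrderedDict(), 0
    for addr in accesses:
        blk = addr // K
        if blk in cache:
            cache.move_to_end(blk)
        else:
            count += 1
            cache[blk] = True
            if len(cache) > C:
                cache.popitem(last=False)
    return count

def naive(n):
    for i in range(n):
        for j in range(n):
            yield j * n + i             # read  A(i,j), column-major
            yield n * n + i * n + j     # write B(j,i), stored after A

def blocked(n, K):
    for I in range(0, n, K):
        for J in range(0, n, K):
            for i in range(I, I + K):
                for j in range(J, J + K):
                    yield j * n + i
                    yield n * n + i * n + j

n, K, C = 256, 16, 64                               # C blocks: too few for a row of A, enough for a tile
print(transfers(naive(n), K, C) / (n * n))          # about 1 + 1/K
print(transfers(blocked(n, K), K, C) / (n * n))     # about 2/K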

There are two issues that are rapidly becoming quite important in the area of Fortran restructuring. One has to do with the automatic optimization of parallel programs. Several parallel Fortran dialects have recently become available and more are expected in the near future. The issue is the correctness of optimization techniques (including parallelization) to be applied to programs written in these parallel dialects. Recent work [23] shows that traditional optimization techniques can be applied to parallel programs provided that some conditions are met. Such conditions use information on threads that are parallel to the one being optimized by the compiler. Consider, for example, the following program:

      a = 0
      cobegin
        a = 1
      //
        do i = 1, n
          B(i) = a
        end do
      coend

At the end of this program, it can be asserted that B will be a sequence of zeros followed by a sequence of ones. A parallelizer using traditional techniques will transform the loop into a doall. In the parallelized program, B could be assigned any sequence of zeros and ones, which contradicts the assertion made on the original program. Clearly, this example illustrates a case where parallelization should not be applied without special intervention of the programmer.

The second issue is that of multilingual parallelization. The objective of this work is to develop methodologies for interprocedural analysis and optimization of programs that include modules written in different languages. Such programs arise in applications that include both numerical and non-numerical subcomputations, where it may be convenient to use Fortran for the numerical part and, for example, Lisp or C for the non-numerical part. The ability to perform compiler optimizations across language boundaries could increase the quality of the object code without sacrificing programming convenience.

3.2. Lisp parallelization

Fortran is not the only language for which parallelization techniques have been developed. During the last few years there has been an increasing interest in the translation of very-high-level languages such as Lisp and Prolog for parallel computers. Besides being very high level, these languages are of interest because they are appropriate for non-numerical computations.

Parallelizers for Lisp use some of the techniques developed for Fortran (and other imperative languages such as C). However, the center of attention is no longer the loop since recursion is used more frequently in Lisp. For this reason, a technique called recursion splitting has been proposed to parallelize recursive constructs. To explain the technique, consider a function of the form:

      X: function (p)
        if Q(p) then return R(p)
        y := X(F(p))
        return G(p, y)
      end                                             (1)

Clearly, any recursive function can be cast into this form by choosing Q, R, F, and G appropriately. Recursion splitting transforms an invocation X(p0) into the expression

      reduce(G, expand(p0, Q, F, R))

The value returned by the procedure invocation expand(p0, Q, F, R) is the sequence

      p0, p1, ..., pm, R(pm+1)

where Q(pm+1) is true and Q(pi) is false for all i < m+1, and pi is defined as the result of applying F to p0 a total of i times. The procedure

      reduce(G, (p0, p1, ..., pm, R(pm+1)))

is defined as the expression

      G(p0, ..., G(pm-1, G(pm, R(pm+1))) ...).

The advantage of translating the program into this expand-reduce form is that, in many cases, the reduce and expand functions can be parallelized. Consider the following function [10,14]:

      TAK: function (a, b, c)
        if a <= b then return (c)
        return TAK (TAK (a-1, b, c),
                    TAK (b-1, c, a),
                    TAK (c-1, a, b))
      end

This function can be cast into the canonical form shown above in several ways. One is the following:

      TAK: function (a, b, c)
        if a <= b then return (c)
        y = TAK (a-1, b, c)
        return TAK (y,
                    TAK (b-1, c, a),
                    TAK (c-1, a, b))
      end

This function can be transformed into the expand-reduce form and then parallelized. Here the function used by expand is:

      F(a, b, c) = (a-1, b, c)

and, therefore, the result of the expand function is the sequence (assuming that a0 > b0):

      (a0, b0, c0), (a0-1, b0, c0), (a0-2, b0, c0), ..., (b0+1, b0, c0), (b0, b0, c0)



which can be clearly generated in parallel. Also, some invocations to TAK in the reduce expression can be performed in parallel before the reduce function is invoked. These invocations correspond to the two inner invocations of TAK in the return expression above. More examples of recursion splitting and a detailed description of Lisp parallelization techniques can be found in refs. [15] and [14].
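The expand-reduce decomposition can also be stated compactly in executable form. The sketch below is my own illustration (the particular Q, R, F, G and the factorial example are not from the paper): expand produces the sequence p0, ..., pm, R(pm+1), and a right-to-left reduction with G recovers the value of the original recursive function.

from functools import reduce as fold

def expand(p0, Q, F, R):
    # Produce p0, p1, ..., pm, R(pm+1), where p(i+1) = F(p(i)) and Q first holds at p(m+1).
    seq, p = [], p0
    while not Q(p):
        seq.append(p)
        p = F(p)
    seq.append(R(p))
    return seq

def reduce_right(G, seq):
    # Compute G(p0, G(p1, ... G(pm, R(pm+1)) ...)).
    return fold(lambda acc, p: G(p, acc), reversed(seq[:-1]), seq[-1])

# Factorial written in the canonical form X(p) = R(p) if Q(p), else G(p, X(F(p))).
Q = lambda p: p <= 1
R = lambda p: 1
F = lambda p: p - 1
G = lambda p, y: p * y

print(reduce_right(G, expand(5, Q, F, R)))   # 120, i.e. 5!

When F is simple enough that the points p0, ..., pm can be computed directly (as in the TAK example, where F just decrements a), the expand phase can be generated in parallel; an associative G similarly allows the reduction to be evaluated as a tree.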

As mentioned above, the work on Lisp parallelization is important partially because of its application to non-numerical computation. We believe high-speed non-numerical computations will become more common in the future. For example, applications such as CAD for placement and routing, as well as software development systems, will profit from parallelism. Many applications will include both numerical and non-numerical subcomputations. Examples include expert systems that rely on the results of numerical computations in their decision-making, e.g. complete electrical CAD systems, and mathematics problem solvers such as Mathematica.

3.3. Specification parallelization

Problem and solution specification systems will be more common in the future, and end-users will use computers more effectively as problem-solving tools. Furthermore, such systems present both a challenge and an opportunity for parallel processing. Specification systems will make the power of parallel computers available without the inconveniences produced by architectural idiosyncrasies. Also, for this same reason, specifications should be trivially portable across diverse architectures.

A common belief is that parallelization can be applied more effectively to specifications than to programs. The reason is that at the specification level there is more information in a more convenient form than at the programming level. For example, it is proving difficult to find a good solution to the problem of automatic partitioning and distribution. The reason is that the compiler needs to determine what information is processed by the different loops and in what order. Such information should be readily available in a specification system. Another indication that using higher-level languages helps parallelization is provided by parallelizing compilers, where it is sometimes necessary to infer the specification from the program before effective parallelization can take place. For example, linear recurrences are recognized by parallelizers via pattern-matching and automatically replaced by invocations to parallel algorithms.
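As an illustration of the kind of replacement involved (a sketch of my own, not the paper's algorithm), a first-order linear recurrence x(i) = a(i)*x(i-1) + b(i) can be rewritten as a prefix of affine-map compositions; since the composition is associative, the prefix could be computed by a parallel scan, although it is written sequentially here.

def compose(f, g):
    # f and g are affine maps (s, t) meaning x -> s*x + t; return "g after f".
    fs, ft = f
    gs, gt = g
    return (gs * fs, gs * ft + gt)

def linear_recurrence(a, b, x0):
    prefix, acc = [], (1, 0)                 # start with the identity map
    for ai, bi in zip(a, b):
        acc = compose(acc, (ai, bi))         # associative step: a parallel scan could do this
        prefix.append(acc)
    return [s * x0 + t for s, t in prefix]

# Same values as the sequential loop  x = a[i]*x + b[i]  starting from x0.
a, b, x0 = [2, 3, 1, 4], [1, 0, 5, 2], 7
print(linear_recurrence(a, b, x0))           # [15, 45, 50, 202]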

The challenge for the parallel processing community is to develop techniques to profit from the opportunities presented by specification systems. Very little has been done in this area. See ref. [17] for some ideas on parallelization for ELLPACK.

3.4. Interactive restructurers

Another area of current active research studies the development of program manipulation systems that provide the user with a set of commands to interactively parallelize programs. Such a system gives the programmer more control than that provided by a batch compiler. Thanks to such systems, the programmer will be able to write an easy-to-read but probably inefficient program and, through manually-applied correctness-preserving transformations, generate an equivalent, more efficient program.

There are several research and commercial interactive parallelizers under development today. At Illinois we are working on the Delta system [24], which includes a collection of interactive commands and a programming language to create more powerful commands than those originally provided by the system. This meta-programming language could also be used to write complete parallelizers, and we expect it will make possible the rapid prototyping of restructuring techniques, allowing for the testing of different restructuring strategies much faster than is possible today.
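To give a flavor of such commands, here is a toy sketch of my own; the Loop representation and the interchange and parallelize_outer commands are hypothetical and are not the Delta system's actual interface.

from dataclasses import dataclass

@dataclass
class Loop:
    var: str            # index variable
    lo: str             # lower bound (as text)
    hi: str             # upper bound (as text)
    body: object        # a nested Loop or a statement string
    parallel: bool = False

def interchange(nest):
    # Swap the two outermost loops; the user asserts the interchange is legal.
    inner = nest.body
    return Loop(inner.var, inner.lo, inner.hi,
                Loop(nest.var, nest.lo, nest.hi, inner.body, nest.parallel),
                inner.parallel)

def parallelize_outer(nest):
    # Turn the outer do loop into a doall; the user asserts independence.
    return Loop(nest.var, nest.lo, nest.hi, nest.body, parallel=True)

nest = Loop("i", "1", "n", Loop("j", "1", "n", "B(j,i) = A(i,j) + 1"))
print(parallelize_outer(interchange(nest)))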

4. Program behavior analysis

A visualization subsystem to display program components and their behavior is necessary for both performance and correctness debugging. Such a system should make it easy to represent instances of an abstract type, in an abstract form or in a form that is easily related to the real-life objects that the data type represents, either individually or collectively.



Table 1
Objects needed at different debugging levels

                                            Debug logic   Performance   Algorithm    Problem-specific
                                            debugging     tuning        debugging    debugging
Program graph                                    x
Program graph vs. machine utilization           x             x
Data structure dynamics                         x             x             x
Collections of data structures rendered         x             x             x             x

It should also allow representation and analysis of other program entities such as dependence graphs and process invocation graphs. Table 1 shows what objects are useful for debugging at the program, solution specification, and problem specification levels.

Of the work in this area, performance visualization and the debugging of user-written parallel programs seem the most germane to parallel processing. Other issues such as program structure visualization are important for any computer system. Here we will only discuss debugging of explicitly parallel programs. A major difficulty in parallel programming is the need for explicit synchronization between the components. There are ways to avoid having to use explicit synchronization, for example, by relying on the compiler for the automatic generation of parallel code. However, conventional parallel programming will always be necessary, and the use of explicit synchronization by the programmer can clearly lead to errors. An important form of such an error is unintended nondeterminacy. A program is nondeterminate if different runs, with the same input data and same initial value, may produce different results. For example, the segment of code:

      A = 0
      doall i = 1, N
s1:     A(i) = 1
s2:     B(i) = A(i-1)
      end doall

is nondeterminate since there is no way to know whether B(i) contains 1 or 0 at the end of the doall loop. The reason for this is that S1 in iteration i writes into A(i), S2 in iteration i+1 reads from A(i), and the order of these two events is not guaranteed. This situation is called a read/write race. Nondeterminacy could be intentional if the program includes an asynchronous algorithm, but most often it is the result of a mistake on the part of the programmer. Nondeterminacy is sometimes very difficult to detect. Tools to detect such situations could therefore be quite useful in some cases.

Much work has been done lately in the development of techniques to automatically detect nondeterminacy. See, for example, refs. [6] and [3]. At Illinois we are working on the development of analysis techniques and have developed a tool for use with Cedar Fortran [8]. The approach, however, is not restricted to Fortran dialects.

Given a program and a set of input values, the purpose of the tool is to inform the programmer whether nondeterminacy exists and, if so, what the cause is. Thus, for the segment of code above, the tool would indicate that there is a read/write race between S1 and S2.

The tool is designed to be used before a parallel break-point debugger is invoked, and its purpose is to help the programmer remove all the undesirable nondeterminacies before invoking the break-point debugger, and in this way facilitate the debugging task by dividing it into two phases.

The tool uses two approaches to detect nondeterminacy. One is static and is based on the analysis of the source program. The other approach uses a trace produced at run-time by the (instrumented) program being debugged. Trace analysis is more accurate than the static analysis, but is slower and is only known to be valid for the specific input data used to generate the trace.
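A minimal sketch of the trace-based idea (my own illustration, not the Cedar Fortran tool): record which iteration read or wrote each array element, and report a read/write race whenever an element written in one iteration is also accessed in a different iteration.

from collections import defaultdict

def find_races(trace):
    # trace: list of (iteration, 'r' or 'w', array, index) events from one run.
    events = defaultdict(list)
    for it, kind, arr, idx in trace:
        events[(arr, idx)].append((it, kind))
    races = []
    for loc, evs in events.items():
        iterations = {it for it, _ in evs}
        has_write = any(kind == 'w' for _, kind in evs)
        if has_write and len(iterations) > 1:
            races.append(loc)
    return races

# Trace of the doall above with N = 3: S1 writes A(i), S2 reads A(i-1).
trace = []
for i in range(1, 4):
    trace.append((i, 'w', 'A', i))        # S1
    trace.append((i, 'r', 'A', i - 1))    # S2
    trace.append((i, 'w', 'B', i))
print(find_races(trace))                  # [('A', 1), ('A', 2)]: races between S1 and S2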

5. Interaction between the levels

In Fig. 1 we present a simplified version of a problem-solving session. A typical session will start with the specification of a problem to be solved. Such a specification could be formal or informal, and may lead directly to an executable program. More likely, it will be the first step of a process that will continue with the selection of the algorithms to be used.



[Figure 1 is a flowchart of a problem-solving session with the steps: Specify Problem; Select Algorithm(s); Search for Existing Routine or Program / Write Program / Generate Program; Estimate Performance; Execute Program; Observe behavior to determine correctness and performance.]

Fig. 1. A problem-solving session.

After the selection, one or more of the following steps will have to be taken: (1) a routine implementing the algorithms may be searched for (using a system such as SLADOC); (2) a program may be automatically generated by using the information concerning the problem and the algorithms; or (3) a new program may have to be written. The program resulting from the previous step may be run directly since it may already be tuned for the target machine (this will be the case if, for example, the program was generated by a very-high-level language system), or the collection of modules (perhaps in different languages) may have to be restructured automatically or interactively. Powerful systems providing programmers with information on past experience with the target machine or with the routines being restructured could be an invaluable tool here.

After the program is run, the user may wish to observe various aspects of the program's behavior and of the results of the performed computations. This program behavior can be shown to the programmer as a sequence of snapshots of the state of the objects the program manipulates, or as a movie. The manipulated objects can be represented in an abstract mathematical form such as matrices, linked lists, etc., or they might be depictions of real-life objects. Using such information the user/programmer may decide to change aspects of the program or some aspect of the specifications of the problem.

In these problem solving environments, the different layers of the abstraction hierarchy will coexist in the same way that Fortran, Lisp and assembly language coexist in the systems of today.

References

[1] W. Abu-Sufah, D.J. Kuck and D.H. Lawrie, On the performance enhancement of paging systems through program analysis and transformation, IEEE Trans. Comput. C-30 (5) (May 1981) 341-356.

[2] F.L. Bauer, B. Moller, M. Partsch and P. Pepper, Formal program construction by transformations - computer-aided, intuition-guided programming, IEEE Trans. Software Eng. 15 (2) (February 1989) 165-180.

[3] D. Callahan, K. Kennedy and J. Subhlok, Analysis of event synchronization in a parallel programming tool, in: Proc. 2nd ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, Seattle, WA (March 1990) 21-30.

[4] D.-K. Chen, H.-M. Su and P.C. Yew, The impact of synchronization and granularity on parallel systems, in: Proc. 17th Annual Internat. Symp. on Computer Architecture, Seattle, WA (May 1990) 239-248.

[5] G.O. Cook Jr., ALPAL - A tool for the development of large-scale simulation codes, Lawrence Livermore National Laboratory, Rept. No. UCID-21482 (22 August 1988).

[6] A. Dinning and E. Schonberg, An empirical comparison of monitoring algorithms for access anomaly detection, in: Proc. 2nd ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, Seattle, WA (March 1990) 1-10.

[7] R. Eigenmann, J. Hoeflinger, G. Jaxon and D. Padua, Cedar Fortran and its compiler, in: Proc. Joint Conf. on Vector and Parallel Processing, Zurich (September 1990).

[8] P.A. Emrath and D.A. Padua, Automatic detection of non-determinacy in parallel programs, Proc. Workshop on Parallel and Distributed Debugging, Sigplan Not. 24 (1) (Jan. 1989) 89-99.

[9] B. Engquist and T. Smedsaas, Automatic computer code generation for hyperbolic and parabolic differential equations, SIAM J. Sci. Stat. Comput. 1 (2) (June 1980) 249-259.

[10] R.P. Gabriel, Performance and Evaluation of Lisp Systems (MIT Press, Cambridge, MA, 1985).

[11] E.H. Gornish, Compile-time analysis for data prefetching, M.S. Thesis, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Rept. No. CSRD 939 (1989).



[12] M.D. Guzzi, D.A. Padua, J.P. Hoeflinger, and D.H. Lawrie, Cedar Fortran and other vector and parallel Fortran dialects, J. Supercomputing (1990).

[13] R.H. Halstead Jr., Multilisp: a language for concurrent symbolic computation, ACM Trans. Program. Lang. Syst. (1985).

[14] W.L. Harrison, The interprocedural analysis and automatic parallelization of Scheme programs, PhD Thesis, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Rept. No. 860 (February 1989).

[15] W.L. Harrison and D.A. Padua, PARCEL: Project for the automatic restructuring and concurrent evaluation of Lisp, in: Proc. 1988 Internat. Conf. on Supercomputing, St. Malo, France (July 1988) 527-538.

[16] P.N. Hilfinger and P. Colella, FIDIL: a language for scientific programming, Lawrence Livermore National Laboratory, Rept. No. UCRL-98057 (January 1988).

[17] E.N. Houstis, J.R. Rice, N.P. Chisochoides, H.C. Karathanasis, P.N. Papachiou, M.K. Samartzis, E.A. Vavalis and K.Y. Wang, //Ellpack: a numerical simulation programming environment for parallel MIMD machines, in: Proc. 1990 Internat. Conf. on Supercomputing, Amsterdam, The Netherlands (June 1990) 96-107.

[18] W. Jalby and U. Meier, Optimizing matrix operations on a parallel multiprocessor with a memory hierarchy, in: Proc. 1986 Internat. Conf. on Parallel Processing (19-22 August 1986) 429-432.

[19] C. Konno, M. Yamabe, M. Saji, N. Sagawa, Y. Umetani, H. Hirayama and T. Ohta, Automatic code generation method of DEQSOL (Differential EQuation SOlver Language), J. Inf. Process. 11 (1) (1987) 15-21.

[20] D.J. Kuck, E.S. Davidson, D.H. Lawrie and A.H. Sameh, Parallel supercomputing today and the Cedar approach, Science 201 (February 1986) 967-974.

[21] A. McKellar and E. Coffman Jr., Organizing matrices and matrix operations for paged memory systems, Commun. ACM 12 (1969) 153-165.

[22] S.P. Midkiff and D.A. Padua, Compiler algorithms for synchronization, IEEE Trans. Comput. C-36 (12) (December 1987) 1485-1495.

[23] S.P. Midkiff and D.A. Padua, Issues in the compile-time optimization of parallel programs, in: Proc. 1990 Internat. Conf. on Parallel Processing (August 1990).

[24] D.A. Padua, Preliminary design of the delta system, Rept. 880, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign (June 1989).

[25] D.A. Padua and M.J. Wolfe, Advanced compiler optimizations for supercomputers, Commun. ACM 29 (12) (Dec. 1986) 1184-1201.

[26] J.R. Rice and R.F. Boisvert, Solving Elliptic Problems Using ELLPACK (Springer, New York, 1985).

David A. Padua received the Licenciatura in Computer Science from the Universidad Central de Venezuela in 1973, and the Ph.D. degree from the University of Illinois at Urbana-Champaign in 1980. From 1981 to 1984 he was with the Department of Computer Science at the Universidad Simon Bolivar, Venezuela. He has been at the University of Illinois since 1985, where he is now an Associate Director of the Center for Supercomputing Research and Development (CSRD) and an Associate Professor in the Department of Computer Science. Dr. Padua has published over 40 papers on different aspects of parallel computing including machine organization, parallel programming languages and tools, and parallelizing compilers. He led the development of the languages and compilers used in Cedar, a multiprocessor developed at CSRD. His current research focuses on the experimental analysis of parallelizing compilers and on the development of the techniques needed to make these compilers more effective. A co-organizer of the Workshops on Languages and Compilers for Parallel Computing, Dr. Padua served as Program Committee Chairman of the Second ACM Symposium on Parallel Programming, and Program Co-Chairman of the 1990 International Conference on Parallel Processing. He serves on the editorial board of the IEEE Transactions on Parallel and Distributed Systems and is a senior member of the IEEE.

