Future Generation Computer Systems 7 (1991/92) 221-229, North-Holland
Problem-solving environments for parallel computers *
David A. Padua Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Padua, D.A., Problem-solving environments for parallel computers, Future Generation Computer Systems 7 (1991/92) 221-229.
Man-machine interaction can take place at different levels of abstraction ranging from the machine-instruction level to the problem-specification level. A problem-solving environment should provide restructuring and debugging tools to make the interaction at these different levels possible and to allow the efficient use of the target machine. Restructurers translate from specifications to programs or from programs to more efficient versions. When the target machine is parallel, the restructurers should include techniques for the automatic exploitation of parallelism. Debuggers are necessary to test for correctness and to evaluate performance at the different levels. Debuggers for parallel programs have to deal with the possibility of nondeterminacy.
Keywords. Parallel computing; compilers; problem-solving environments; programming environments.
1. Introduction
Two of the central goals in software have been the development of good man-machine interfaces and of compilation techniques for the efficient generation of machine code. These goals are particularly important for parallel computers, whose acceptance by ordinary users depends directly on their becoming as easy to use as sequential machines.
In this paper we discuss man-machine interaction and compilation techniques under the name of problem-solving environments. The term programming environment is used more frequently, but it is more restrictive because programming is only one part (and not always a necessary one) of the problem-solving process.
A problem-solving environment should facilitate man-machine interaction at different levels of abstraction, ranging from the machine-instruction level to the problem-specification level. The next section briefly discusses levels of abstraction in man-machine interaction. The rest of the paper discusses restructurers and debuggers, two of the most important tools that a problem-solving environment should provide.
* This work was supported in part by the National Science Foundation under Grant No. NSF-MIP-8410110, the US Department of Energy under Grant No. US DOE FG02-85ER25001, and the NASA Ames Research Center under Grant No. NASA (DARPA) NCC2-559.
0376-5075/92/$05.00 © 1992 - Elsevier Science Publishers B.V. All rights reserved
2. Levels of abstraction in man-machine interaction
One of the central goals of software research has been to move the language of man-machine interaction closer to the problem and away from target machine considerations. Two main approaches have been taken toward this goal. One is the design of very-high-level languages such as Lisp, SETL, and Prolog that tend to simplify the task of programming within some restricted domain. The other is the design of specification-oriented packages that allow problems to be solved without any programming. These packages range from relatively simple tools such as SLADOC, a routine-searching system developed in the Applied Mathematical Sciences Programs of the Ames Laboratory, to more complex systems such as ELLPACK [26], which solves elliptic partial differential equation systems from a description of the equations, the boundary conditions, the domain, and the solution method.
These different approaches can be classified in a hierarchy of abstraction levels. The higher the level in the hierarchy, the fewer the user's concerns with implementation issues. At the top of the hierarchy is the problem-specification level. Here, the only concern is with the specification of what is to be solved or analyzed: there is no need to be aware of which algorithms or programming languages are used. The interaction takes place by handling knobs or other devices, in the language of science or engineering, or in terms of mathematical formulas. Examples of systems at this level can be found in some application programs designed for engineering and science, and in computer-aided instruction programs, including flight simulation programs. From the end-user's point of view, this is clearly the most desirable level of interaction. However, in many cases it is not possible to design general-purpose systems at this level, because the design choices that must be made when going from specification to algorithms are not understood well enough to be automated.
For this reason, many specification systems require user intervention for algorithm selection. For example, the CIP language [2] accepts problem specifications, which are translated into executable programs via a sequence of correctness-preserving transformations selected by the user. Also, PDE solver systems [5,9,19,26] require the user to specify not only the equations and boundary conditions, but also the solution strategy (discretization and solution method, among other things). This is due to our inability to automate the analysis of stability and accuracy. The PDE solver systems just mentioned belong to a second layer in our hierarchy, the solution-specification layer, where the only responsibility of the user is to select the algorithm, either by naming it or by selecting an existing program.
At the third, and lowest, level of the hierarchy the interaction takes place in the realm of programming. This level can be decomposed into sublevels corresponding to the different categories of programming languages. These range from assembly language to very-high-level languages such as Lisp, Prolog, and SETL for symbolic computing, and Matlab and FIDIL [16] for numerical computing. Parallel programming languages such as Cedar Fortran and Multilisp [13] are at a lower level than their sequential counterparts because parallel constructs are usually concerned with performance and implementation rather than with the problem itself.
Traditionally, most of the interaction with sequential machines has been at the programming level, and most of the work on restructuring for parallel computers has concentrated on the translation from sequential programs to parallel versions. This work, described briefly in the next section, facilitates the process of programming by allowing the user to work at the sequential programming level while making the power of parallelism available to the target code. In this way, man-machine interaction takes place at a higher level than that of explicit parallel programming. Also, thanks to restructurers, sequential programs can be translated to different target parallel architectures, facilitating portability.
Less work has been done on the restructuring of specifications into parallel programs. Part of the problem is that the translation of specifications is not well understood even for sequential machines. It is very likely that much effort will be devoted to this problem in the near future. Effective translators for specifications, or even for very-high-level languages, are bound to become important tools and will probably be a determining factor in making parallel computers widely accepted. Being able to translate specifications will help the cause of automatic exploitation of parallelism because the specification level offers more opportunities for parallelism than the programming level: once an algorithm has been chosen and implemented, some opportunities for parallelism may be lost. Also, because of the absence of architectural bias, specifications are better than programs for effective porting across widely different target architectures.
Restructurers translate objects at a level of the interaction hierarchy into objects at a lower level or into more efficient objects at the same level. For example, there are restructurers that translate sequential Fortran programs into equivalent (but lower-level) parallel programs. There are also restructurers that translate sequential Fortran programs into more efficient sequential Fortran programs.
A restructurer could generate machine code directly from a program or specification, or it could generate a program in a high-level language. In either case, when the source code is sequential and the translated version is parallel, the restructurer is called a parallelizer. In the following subsections we discuss parallelization issues for Fortran, Lisp, and specifications. A final subsection says a few words about the organization of parallelizers.
3.1. Fortran parallelization
Much work has been done over the past 20 years on parallelizing Fortran compilers (see ref. [25] for a tutorial on this work). The most important techniques deal with the translation of do loops, the most important source of parallelism in numerical programs. For example, the loop
    do i = 1, n
      A(i) = B(i) + D(i-1)
      D(i) = E(i) + 1
    end do
can be automatically translated into the following vector statements:
    D(1:n) = E(1:n) + 1
    A(1:n) = B(1:n) + D(0:n-1)
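The equivalence of the loop and its vector form can be checked with a small simulation. The Python sketch below is only an illustration, not part of any restructurer: arrays are Python lists indexed from 0 so that D[0] plays the role of D(0), the helper names are hypothetical, and each vector statement builds its whole right-hand side before storing it, as a Fortran array assignment does.

```python
import random

def loop_version(B, D, E, n):
    """Original loop: do i = 1, n; A(i) = B(i) + D(i-1); D(i) = E(i) + 1."""
    A, D = [0] * (n + 1), D[:]
    for i in range(1, n + 1):
        A[i] = B[i] + D[i - 1]
        D[i] = E[i] + 1
    return A, D

def vector_version(B, D, E, n):
    """Vector form: D(1:n) = E(1:n) + 1, then A(1:n) = B(1:n) + D(0:n-1)."""
    A, D = [0] * (n + 1), D[:]
    # Each right-hand-side list is built completely before the slice is stored.
    D[1:n + 1] = [E[i] + 1 for i in range(1, n + 1)]
    A[1:n + 1] = [B[i] + D[i - 1] for i in range(1, n + 1)]
    return A, D

rng = random.Random(1)
n = 10
B, D, E = ([rng.randint(-9, 9) for _ in range(n + 1)] for _ in range(3))
print(loop_version(B, D, E, n) == vector_version(B, D, E, n))  # True
```

The order of the two vector statements matters: the D statement must come first, because A(i) reads D(i-1), the value produced by the previous iteration of the original loop.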
If the target machine is a multiprocessor, the code could be translated into
    A(1) = B(1) + D(0)
    doall I = 1, n, K
      do i = I, min(n-1, I+K-1)
        D(i) = E(i) + 1
        A(i+1) = B(i+1) + D(i)
      end do
    end doall
    D(n) = E(n) + 1
where doall means that different iterations of the loop can be executed in parallel and scheduled in any order. In this example, the loop was blocked (i.e., divided into subsequences of iterations) to make each parallel thread larger and therefore decrease the overhead associated with interprocessor cooperation. Another reason to block is to allow the exploitation of several levels of parallelism. Thus, if the processors of the target multiprocessor had vector capabilities, the inner loop should be translated into a vector statement.
In addition to loop parallelization, issues such as synchronization, communication, and memory usage may have an important influence on performance. For this reason, many parallelizers include strategies for synchronization instruction generation, locality enhancement, and data partitioning and distribution. These last two topics are particularly important for distributed-memory machines as well as for hierarchical memory systems such as the Cedar multiprocessor [20,7]. Synchronization considerations can be seen in the previous example, where a transformation called alignment was applied. This transformation tries to place the statement instance that generates a value in the same iteration as the instance consuming that value, so that no synchronization operations are needed.
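Because alignment keeps each produced value in the iteration that consumes it, the blocks of the doall in the previous example are independent. The hypothetical Python sketch below illustrates this under a simplifying assumption: the doall is simulated by running its blocks sequentially in a random order, a stand-in for arbitrary scheduling rather than real parallel execution. The result is compared with the original loop for several block sizes K and orders.

```python
import random

def original_loop(B, D, E, n):
    A, D = [0] * (n + 1), D[:]
    for i in range(1, n + 1):
        A[i] = B[i] + D[i - 1]
        D[i] = E[i] + 1
    return A, D

def aligned_doall(B, D, E, n, K, rng):
    A, D = [0] * (n + 1), D[:]
    A[1] = B[1] + D[0]                      # peeled first instance of the A statement
    blocks = list(range(1, n + 1, K))       # doall I = 1, n, K
    rng.shuffle(blocks)                     # any execution order must give the same result
    for I in blocks:
        for i in range(I, min(n - 1, I + K - 1) + 1):
            D[i] = E[i] + 1                 # value produced ...
            A[i + 1] = B[i + 1] + D[i]      # ... and consumed in the same iteration
    D[n] = E[n] + 1                         # peeled last instance of the D statement
    return A, D

rng = random.Random(7)
n = 17
B, D, E = ([rng.randint(-9, 9) for _ in range(n + 1)] for _ in range(3))
expected = original_loop(B, D, E, n)
print(all(aligned_doall(B, D, E, n, K, rng) == expected
          for K in (1, 2, 3, 5, 16) for _ in range(10)))  # True
```

Had the unaligned loop body been turned into a doall directly, an iteration could read D(i-1) before the iteration that writes it had run; preventing that would require exactly the synchronization that alignment avoids.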
Locality enhancement has been studied extensively. For example, some strategies for locality enhancement were presented in ref. [21] and extended and developed into automatic strategies in ref. [1]. Also, locality enhancement mechanisms were manually applied to improve the performance of matrix multiplication on the Alliant FX/80 [18].
Other important memory-related issues also arise. For example, overlapping vector fetches from global memory with computation is an important optimization for the Cedar multiprocessor [11]. Also, data partitioning and distribution heavily influence performance in distributed-memory machines and hierarchical systems such as Cedar. Traditionally, the experimental restructurers that perform data partitioning have required the user to specify the partitioning via assertions. More recently, there has been some work to automate this process.
To illustrate how a parallelizer might improve locality, let us assume a multiprocessor with a cache memory on each processor. Further assume that the cache (and thus the memory) is divided into blocks of K words each, and that data are exchanged between memory and cache only as whole blocks. Assume also that matrix columns are sequences of blocks (i.e., matrices are stored in column-major order and columns are much larger than blocks).
Now consider the loop:
    do i = 1, n
      do j = 1, n
        B(j,i) = A(i,j) + 1
      end do
    end do
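A naive compiler might transform the outer loop into a doall without any other transformation, causing (1 + 1/K) block transfers between memory and caches for each assignment executed: the accesses to A(i,j) traverse row i, so each one touches a different block, while the accesses to B(j,i) traverse a column and need a new block only once every K iterations.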
To improve this situation, the compiler could block both loops into groups of K iterations, which has the effect of processing matrix A in K × K submatrices. After blocking and interchanging loops, we end up with the loop nest:
    do I = 1, n, K
      do J = 1, n, K
        do i = I, I+K-1
          do j = J, J+K-1
            B(j,i) = A(i,j) + 1
          end do
        end do
      end do
    end do
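The block-transfer counts for the naive doall and for the blocked nest can be reproduced with a small simulation. The Python sketch below rests on assumptions that go beyond the text: every doall iteration runs on a processor whose cache starts empty, each cache holds a hypothetical capacity of 2K blocks with LRU replacement, and n is a multiple of K. All helper names are made up for this example.

```python
from collections import OrderedDict

def block_of(matrix, row, col, K):
    # Column-major storage, columns made of whole K-word blocks:
    # element (row, col) lives in block (row-1)//K of column col (1-based).
    return (matrix, col, (row - 1) // K)

class LRUCache:
    """One processor's cache: `capacity` blocks, least recently used evicted."""
    def __init__(self, capacity):
        self.capacity, self.blocks, self.transfers = capacity, OrderedDict(), 0

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)
        else:
            self.transfers += 1                  # block transfer from memory
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)

def naive_transfers(n, K, capacity):
    total = 0
    for i in range(1, n + 1):                    # doall over i, cold cache per iteration
        cache = LRUCache(capacity)
        for j in range(1, n + 1):
            cache.access(block_of("A", i, j, K))  # read A(i,j): walks along a row
            cache.access(block_of("B", j, i, K))  # write B(j,i): walks down a column
        total += cache.transfers
    return total

def blocked_transfers(n, K, capacity):
    total = 0
    for I in range(1, n + 1, K):                 # doall over I, cold cache per iteration
        cache = LRUCache(capacity)
        for J in range(1, n + 1, K):
            for i in range(I, I + K):
                for j in range(J, J + K):
                    cache.access(block_of("A", i, j, K))
                    cache.access(block_of("B", j, i, K))
        total += cache.transfers
    return total

n, K = 64, 8
print(naive_transfers(n, K, 2 * K) / (n * n))    # 1.125, i.e. 1 + 1/K
print(blocked_transfers(n, K, 2 * K) / (n * n))  # 0.25, i.e. 2/K
```

In the blocked nest, each K × K tile touches only K blocks of A and K blocks of B, so a cache of 2K blocks is enough to reach the 2/K rate.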
If the outer loop is now transformed into a doall loop, the number of cache block transfers per assignment decreases to 2/K, a clear improvement over the naive approach when K is large. To conclude the work on this loop, we need to block once more for vector registers, vectorize the innermost loop, and map it into vector register instructions:
    doall I = 1, n, K
      do J = 1, n, K
        do i = I, I+K-1
          do j = J, J+K-1, 32
            m = min(j+31, n)
            vr1 = A(i, j:m)
            vr2 = vr1 + 1
            B(j:m, i) = vr2
          end do
        end do
      end do
    end doall
The resulting code segment is more efficient than the original loop, but it is also less readable. Not all transformations have this effect. For example, vectorization often makes the code more readable.
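A quick way to gain confidence in the final loop nest is to execute it on small matrices and compare it with the original loop. The Python sketch below is a hypothetical check, assuming n is a multiple of K (as the blocked nest does) and modeling a vector register of length 32 as a Python list; the doall is simply run sequentially.

```python
def original(A, n):
    """B(j,i) = A(i,j) + 1 for all i, j; matrices are 1-based lists of rows."""
    B = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            B[j][i] = A[i][j] + 1
    return B

def transformed(A, n, K, VL=32):
    """Blocked and strip-mined nest; assumes n is a multiple of K."""
    B = [[0] * (n + 1) for _ in range(n + 1)]
    for I in range(1, n + 1, K):                 # doall I = 1, n, K (run sequentially here)
        for J in range(1, n + 1, K):
            for i in range(I, I + K):
                for j in range(J, J + K, VL):    # do j = J, J+K-1, 32
                    m = min(j + VL - 1, n)
                    vr1 = [A[i][jj] for jj in range(j, m + 1)]   # vr1 = A(i, j:m)
                    vr2 = [x + 1 for x in vr1]                    # vr2 = vr1 + 1
                    for jj, x in zip(range(j, m + 1), vr2):       # B(j:m, i) = vr2
                        B[jj][i] = x
    return B

n = 64
A = [[(r * 131 + c * 17) % 97 for c in range(n + 1)] for r in range(n + 1)]
print(all(transformed(A, n, K) == original(A, n) for K in (8, 16, 32, 64)))  # True
```

When K is smaller than 32, a strip can run past the K × K tile; with this particular loop body that only rewrites elements of B with the same values, so the result is unchanged.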
Two issues are rapidly becoming quite important in the area of Fortran restructuring. One has to do with the automatic optimization of parallel programs. Several parallel Fortran dialects have recently become available, and more are expected in the near future. The issue is whether optimization techniques (including parallelization) can be correctly applied to programs written in these parallel dialects. Recent work [23] shows that traditional optimization techniques can be applied to parallel programs provided that some conditions are met. These conditions use information about the threads that run in parallel with the one being optimized by the compiler. Consider, for example, the following program:
    a = 0
    cobegin
      a = 1
    //
      do i = 1, n
        B(i) = a
      end do
    coend
At the end of this program, it can be asserted that B will be a sequence of zeros followed by a sequence of ones. A parallelizer using traditional techniques would transform the loop into a doall. In the parallelized program, B could be assigned any sequence of zeros and ones, which contradicts the assertion made on the original program. Clearly, this example illustrates a case where parallelization should not be applied without special intervention by the programmer.
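The effect can be made concrete by enumerating every point at which the assignment a = 1 may take place. The Python sketch below is an illustration under simplifying assumptions not stated in the text: a starts at 0, each loop iteration executes atomically, and memory is sequentially consistent. The function name is hypothetical.

```python
from itertools import permutations

def possible_B(n, iteration_orders):
    """All final values of B when `a = 1` may run between any two iterations."""
    outcomes = set()
    for order in iteration_orders:
        for t in range(n + 1):               # the other thread sets a = 1 after t iterations
            B = [None] * n
            for step, i in enumerate(order):
                a = 1 if step >= t else 0
                B[i] = a                     # B(i) = a
            outcomes.add(tuple(B))
    return outcomes

n = 3
sequential = possible_B(n, [tuple(range(n))])      # do loop: iterations in order
parallel = possible_B(n, permutations(range(n)))   # doall: iterations in any order
print(sorted(sequential))  # [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
print(len(parallel))       # 8
```

With the sequential loop, only zeros-then-ones outcomes appear; with the doall, all 2^n sequences are possible, including (1, 0, 0), which violates the assertion.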
The second issue is that of multilingual parallelization. The objective of this work is to develop methodologies for interprocedural analysis and optimization of programs that include modules written in different languages. Such programs arise in applications that include both numerical and non-numerical subcomputations, where it may be convenient to use Fortran for the numerical part and, for example, Lisp or C for the non-numerical part. The ability to perform compiler optimizations across language boundaries could increase the quality of the object code without sacrificing programming convenience.
3.2. Lisp parallelization
Fortran is not the only language for which parallelization techniques have been developed. During the last few years there has been increasing interest in the translation of very-high-level languages such as Lisp and Prolog for parallel computers. Besides being very high level, these languages are of interest because they are appropriate for non-numerical computations.
Parallelizers for Lisp use some of the techniques developed for Fortran (and other imperative languages such as C). However, the center of attention is no longer the loop, since recursion is used more frequently in Lisp. For this reason, a technique called recursion splitting has been proposed to parallelize recursive constructs. To explain the technique, consider a function of the form:
    X: function (p)
      if Q(p) then return R(p)
      y := X(F(p))
      return G(p, y)
Clearly, any recursive function can be cast into this form by choosing Q, R, F, and G appropriately. Recursion splitting transforms an invocation X(p_0) into the expression
    reduce(G, expand(p_0, Q, F, R))
The value returned by the procedure invocation expand(p_0, Q, F, R) is the sequence
$$p_0, p_1, \ldots, p_m, R(p_{m+1})$$
where $Q(p_{m+1})$ is true, $Q(p_i)$ is false for all $i < m+1$, and $p_i$ is defined as the result of applying F to $p_0$ a total of $i$ times. The procedure invocation
    reduce(G, (p_0, p_1, ..., p_m, R(p_{m+1})))
is defined as the expression
$$G(p_0, \ldots G(p_{m-1}, G(p_m, R(p_{m+1})))\ldots)$$
The advantage of translating the program into this expand-reduce form is that, in many cases, the reduce and expand functions can be parallelized. Consider the following function [10,14]:
    TAK: function (a, b, c)
      if a <= b then return c
      return TAK(TAK(a-1, b, c), TAK(b-1, c, a), TAK(c-1, a, b))
This function can be cast into the canonical form shown above in several ways. One is the following:
    TAK: function (a, b, c)
      if a <= b then return c
      y := TAK(a-1, b, c)
      return TAK(y, TAK(b-1, c, a), TAK(c-1, a, b))
This function can be transformed into the expand-reduce form and then parallelized. Here the function used by expand is
$$F(a, b, c) = (a-1, b, c)$$
and, therefore, the result of the expand function is the sequence (assuming that $a_0 > b_0$)
$$(a_0, b_0, c_0), (a_0-1, b_0, c_0), (a_0-2, b_0, c_0), \ldots, (b_0+1, b_0, c_0), R(b_0, b_0, c_0)$$
which can clearly be generated in parallel. Also, some invocations of TAK in the reduce expression can be performed in parallel before the reduce function is invoked. These invocations correspond to the two inner invocations of TAK in the return expression above, TAK(b-1, c, a) and TAK(c-1, a, b), which do not depend on y. More examples of recursion splitting and a detailed description of Lisp parallelization techniques can be found in refs. [14] and [15].
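The expand-reduce form of TAK can be sketched in Python to show which pieces become independent. This is a hypothetical illustration, not the PARCEL implementation: `expand`, `reduce_`, and `tak_split` are names chosen for this example, and a thread pool is used only to show that the inner invocations can be issued together (CPython threads will not actually speed up this CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

def expand(p, Q, F, R):
    """Return [p0, p1, ..., pm, R(p_{m+1})]: apply F until Q holds."""
    seq = []
    while not Q(p):
        seq.append(p)
        p = F(p)
    seq.append(R(p))
    return seq

def reduce_(G, seq):
    """Evaluate G(p0, G(p1, ... G(pm, R(p_{m+1}))...)) from the right."""
    acc = seq[-1]
    for p in reversed(seq[:-1]):
        acc = G(p, acc)
    return acc

def tak(a, b, c):
    if a <= b:
        return c
    return tak(tak(a - 1, b, c), tak(b - 1, c, a), tak(c - 1, a, b))

# Canonical-form pieces for TAK, with p = (a, b, c).
Q = lambda p: p[0] <= p[1]
R = lambda p: p[2]
F = lambda p: (p[0] - 1, p[1], p[2])

def tak_split(a, b, c, pool):
    seq = expand((a, b, c), Q, F, R)
    params = seq[:-1]
    # The two inner invocations in G do not depend on y, so they can be
    # evaluated for every p before the reduction starts.
    inner = list(pool.map(lambda p: (tak(p[1] - 1, p[2], p[0]),
                                     tak(p[2] - 1, p[0], p[1])), params))
    pre = dict(zip(params, inner))
    # G(p, y) = TAK(y, TAK(b-1, c, a), TAK(c-1, a, b)), inner values precomputed.
    return reduce_(lambda p, y: tak(y, *pre[p]), seq)

with ThreadPoolExecutor(max_workers=4) as pool:
    print(all(tak_split(a, b, c, pool) == tak(a, b, c)
              for a in range(6) for b in range(6) for c in range(6)))  # True
```

Only the final reduction is sequential here; the expand step and the precomputed inner calls carry the parallelism the text describes.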
As mentioned above, the work on Lisp parallelization is important partially because of its application to non-numerical computation. We believe high-speed non-numerical computations will become more common in the future. For example, applications such as CAD for placement and routing, as well as software development systems, will profit from parallelism. Many applications will include both numerical and non-numerical subcomputations. Examples include expert systems that rely on the results of numerical computations in their decision-making, e.g. complete electrical CAD systems, and mathematics problem solvers such as Mathematica.
3.3. Specification parallelization
Problem and solution specification systems will be more common in the future, and end-users will use computers more effectively as problem-solving tools. Furthermore, such systems present both a challenge and an opportunity for parallel processing. Specification systems will make the power of parallel computers available without the inconveniences produced by architectural idiosyncrasies. For the same reason, specifications should be trivially portable across diverse architectures.
A common belief is that parallelization can be applied more effectively to specifications than to programs, because at the specification level there is more information, in a more convenient form, than at the programming level. For example, it is proving difficult to find a good solution to the problem of automatic data partitioning and distribution, because the compiler needs to determine what information is processed by the different loops and in what order. Such information should be readily available in a specification system. Another indication that higher-level languages help parallelization is provided by parallelizing compilers, where it is sometimes necessary to infer the specification from the program before effective parallelization can take place. For example, parallelizers recognize linear recurrences via pattern matching and automatically replace them by invocations of parallel algorithms.
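As an illustration of such a replacement, consider the first-order linear recurrence x(i) = a(i) x(i-1) + b(i), which is an assumption made for this example (the text does not say which recurrences or which parallel algorithm a given parallelizer uses). One well-known parallel algorithm treats each step as an affine map and computes all prefix compositions with a parallel prefix (scan) in about log2 n rounds. The Python sketch below simulates each round sequentially; the function names are hypothetical.

```python
import random

def recurrence_sequential(a, b, x0):
    """x(i) = a(i) * x(i-1) + b(i), i = 1..n, computed one step at a time."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs

def compose(f, g):
    """Affine map 'f then g', each map x -> m*x + c stored as (m, c)."""
    (m1, c1), (m2, c2) = f, g
    return (m2 * m1, m2 * c1 + c2)

def recurrence_scan(a, b, x0):
    """Parallel-prefix formulation: about log2(n) rounds, each fully parallel."""
    maps = list(zip(a, b))
    step = 1
    while step < len(maps):
        # Every element of a round reads only the previous round's values.
        maps = [maps[i] if i < step else compose(maps[i - step], maps[i])
                for i in range(len(maps))]
        step *= 2
    return [m * x0 + c for m, c in maps]

rng = random.Random(3)
a = [rng.randint(-3, 3) for _ in range(20)]
b = [rng.randint(-3, 3) for _ in range(20)]
print(recurrence_scan(a, b, 1) == recurrence_sequential(a, b, 1))  # True
```

The scan does more total work than the sequential loop, but each round consists of independent compositions that can run in parallel.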
The challenge for the parallel processing community is to develop techniques that profit from the opportunities presented by specification systems. Very little has been done in this area. See ref. [17] for some ideas on parallelization for ELLPACK.
3.4. Interactive restructurers
Another area of active current research is the development of program manipulation systems that provide the user with a set of commands to parallelize programs interactively. Such a system gives the programmer more control than that provided by a batch compiler. Thanks to such systems, the programmer will be able to write an easy-to-read but probably inefficient program and, through manually applied correctness-preserving transformations, generate an equivalent, more efficient program.
There are several research and commercial interactive parallelizers under development today. At Illinois we are working on the Delta system [24], which includes a collection of interactive commands and a programming language for creating commands more powerful than those originally provided by the system. This meta-programming language could also be used to write complete parallelizers and, we expect, will make possible the rapid prototyping of restructuring techniques, allowing different restructuring strategies to be tested much faster than is possible today.
4. Program behavior analysis
A visualization subsystem to display program components and their behavior is necessary for both performance and correctness debugging. Such a system should make it easy to represent instances of an abstract type, either in an abstract form or in a form that is easily related to the real-life objects that the data type represents, individually or collectively. It should also allow representation and analysis of other program entities such as dependence graphs and process invocation graphs. Table 1 shows what objects are useful for debugging at the program, solution-specification, and problem-specification levels.
Table 1. Objects needed at different debugging levels (levels: debug logic; performance tuning; algorithm debugging; problem-specific debugging)
Program graph: x
Program graph vs. machine utilization: x
Data structure dynamics: x
Collections of data structures rendered: x x x
Of the work in this area, performance visualization and the debugging of user-written parallel programs seem the most germane to parallel processing. Other issues, such as program structure visualization, are important for any computer system. Here we will only discuss the debugging of explicitly parallel programs. A major difficulty in parallel programming is the need for explicit synchronization between the components. There are ways to avoid explicit synchronization, for example, by relying on the compiler for the automatic generation of parallel code. However, conventional parallel programming will always be necessary, and the use of explicit synchronization by the programmer can clearly lead to errors. An important form of such an error is unintended nondeterminacy. A program is nondeterminate if different runs, with the same input data and the same initial values, may produce different results. For example, the segment of code:
    A = 0
    doall i = 1, N
    S1: A(i) = 1
    S2: B(i) = A(i-1)
    end doall
is nondeterminate, since there is no way to know whether B(i) contains 1 or 0 at the end of the doall loop. The reason is that S1 in iteration i writes into A(i), S2 in iteration i+1 reads from A(i), and the order of these two events is not guaranteed. This situation is called a read/write race. Nondeterminacy could be intentional if the program includes an asynchronous algorithm, but most often it is the result of a mistake by the programmer. Nondeterminacy is sometimes very difficult to detect, so tools that detect such situations can be quite useful.
Much work has been done lately on techniques to detect nondeterminacy automatically. See, for example, refs. [3] and [6]. At Illinois we are working on analysis techniques and have developed a tool for use with Cedar Fortran [8]. The approach, however, is not restricted to Fortran dialects.
Given a program and a set of input values, the purpose of the tool is to inform the programmer whether nondeterminacy exists and, if so, what causes it. Thus, for the segment of code above, the tool would indicate that there is a read/write race between S1 and S2.
The tool is designed to be used before a parallel break-point debugger is invoked. Its purpose is to help the programmer remove all undesirable nondeterminacies before invoking the break-point debugger, thereby dividing the debugging task into two phases.
The tool uses two approaches to detect nondeterminacy. One is static and is based on the analysis of the source program. The other uses a trace produced at run time by the (instrumented) program being debugged. Trace analysis is more accurate than static analysis, but it is slower and is only known to be valid for the specific input data used to generate the trace.
5. Interaction between the l~d~
In Fig. 1 we present a simplified version of a problem-solving session. A typical session will start with the specification of a problem to be solved. Such a specification could be formal or informal, and may lead directly to an executable program. More likely, it will be the first step of a process that begins with the selection of the algorithms to be used. After the selection, one or more of the following steps will have to be taken: (1) a routine implementing the algorithms may be searched for (using a system such as SLADOC); (2) a program may be generated automatically from the information concerning the problem and the algorithms; or (3) a new program may have to be written. The program resulting from the previous step may be run directly, since it may already be tuned for the target machine (this will be the case if, for example, the program was generated by a very-high-level language system), or the collection of modules (perhaps in different languages) may have to be restructured automatically or interactively. Powerful systems providing programmers with information on past experience with the target machine or with the routines being restructured could be an invaluable tool here.
Fig. 1. A problem-solving session. [Flowchart boxes: Specify problem; Select algorithm(s); Search for existing routine or program; Generate program; Write program; Execute program; Observe behavior to determine correctness.]
After the program is run, the user may wish to observe various aspects of the program's behavior and of the results of the computations performed. This behavior can be shown to the programmer as a sequence of snapshots of the state of the objects the program manipulates, or as a movie. The manipulated objects can be represented in an abstract mathematical form, such as matrices or linked lists, or they might be depictions of real-life objects. Using such information, the user/programmer may decide to change aspects of the program or of the specification of the problem.
In these problem-solving environments, the different layers of the abstraction hierarchy will coexist in the same way that Fortran, Lisp, and assembly language coexist in the systems of today.
References
[1] W. Abu-Sufah, D.J. Kuck and D.H. Lawrie, On the performance enhancement of paging systems through program analysis and transformation, IEEE Trans. Comput. C-30 (5) (May 1981) 341-356.
[2] F.L. Bauer, B. Möller, M. Partsch and P. Pepper, Formal program construction by transformations - computer-aided, intuition-guided programming, IEEE Trans. Software Eng. 15 (2) (February 1989) 165-180.
[3] D. Callahan, K. Kennedy and J. Subhlok, Analysis of event synchronization in a parallel programming tool, in: Proc. 2nd ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, Seattle, WA (March 1990) 21-30.
[4] D.-K. Chen, H.-M. Su and P.C. Yew, The impact of synchronization and granularity on parallel systems, in: Proc. 17th Annual Internat. Symp. on Computer Architecture, Seattle, WA (May 1990) 239-248.
[5] G.O. Cook Jr., ALPAL - A tool for the development of large-scale simulation codes, Lawrence Livermore National Laboratory, Rept. No. UCID-21482 (22 August 1988).
[6] A. Dinning and E. Schonberg, An empirical comparison of monitoring algorithms for access anomaly detection, in: Proc. 2nd ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, Seattle, WA (March 1990) 1-10.
[7] R. Eigenmann, J. Hoeflinger, G. Jaxon and D. Padua, Cedar Fortran and its compiler, in: Proc. Joint Conf. on Vector and Parallel Processing, Zurich (September 1990).
[8] P.A. Emrath and D.A. Padua, Automatic detection of nondeterminacy in parallel programs, in: Proc. Workshop on Parallel and Distributed Debugging, SIGPLAN Not. 24 (1) (January 1989) 89-99.
[9] B. Engquist and T. Smedsaas, Automatic computer code generation for hyperbolic and parabolic differential equations, SIAM J. Sci. Stat. Comput. 1 (2) (June 1980) 249-259.
[10] R.P. Gabriel, Performance and Evaluation of Lisp Systems (MIT Press, Cambridge, MA, 1985).
[11] E.H. Gornish, Compile-time analysis for data prefetching, M.S. Thesis, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Rept. No. CSRD 939 (1989).
[12] M.D. Guzzi, D.A. Padua, J.P. Hoeflinger and D.H. Lawrie, Cedar Fortran and other vector and parallel Fortran dialects, J. Supercomputing (1990).
[13] R.H. Halstead Jr., Multilisp: a language for concurrent symbolic computation, ACM Trans. Program. Lang. Syst. (1985).
[14] W.L. Harrison, The interprocedural analysis and automatic parallelization of Scheme programs, PhD Thesis, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Rept. No. 860 (February 1989).
[15] W.L. Harrison and D.A. Padua, PARCEL: Project for the automatic restructuring and concurrent evaluation of Lisp, in: Proc. 1988 Internat. Conf. on Supercomputing, St. Malo, France (July 1988) 527-538.
[16] P.N. Hilfinger and P. Colella, FIDIL: a language for scientific programming, Lawrence Livermore National Laboratory, Rept. No. UCRL-98057 (January 1988).
[17] E.N. Houstis, J.R. Rice, N.P. Chrisochoides, H.C. Karathanasis, P.N. Papachiou, M.K. Samartzis, E.A. Vavalis and K.Y. Wang, //ELLPACK: a numerical simulation programming environment for parallel MIMD machines, in: Proc. 1990 Internat. Conf. on Supercomputing, Amsterdam, The Netherlands (June 1990) 96-107.
[18] W. Jalby and U. Meier, Optimizing matrix operations on a parallel multiprocessor with a memory hierarchy, in: Proc. 1986 Internat. Conf. on Parallel Processing (19-22 August 1986) 429-432.
[19] C. Konno, M. Yamabe, M. Saji, N. Sagawa, Y. Umetani, H. Hirayama and T. Ohta, Automatic code generation method of DEQSOL (Differential EQuation SOlver Language), J. Inf. Process. 11 (1) (1987) 15-21.
[20] D.J. Kuck, E.S. Davidson, D.H. Lawrie and A.H. Sameh, Parallel supercomputing today and the Cedar approach, Science 231 (February 1986) 967-974.
[21] A. McKellar and E. Coffman Jr., Organizing matrices and matrix operations for paged memory systems, Commun. ACM 12 (1969) 153-165.
[22] S.P. Midkiff and D.A. Padua, Compiler algorithms for synchronization, IEEE Trans. Comput. C-36 (12) (December 1987) 1485-1495.
[23] S.P. Midkiff and D.A. Padua, Issues in the compile-time optimization of parallel programs, in: Proc. 1990 Internat. Conf. on Parallel Processing (August 1990).
[24] D.A. Padua, Preliminary design of the Delta system, Rept. 880, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign (June 1989).
[25] D.A. Padua and M.J. Wolfe, Advanced compiler optimizations for supercomputers, Commun. ACM 29 (12) (December 1986) 1184-1201.
[26] J.R. Rice and R.F. Boisvert, Solving Elliptic Problems Using ELLPACK (Springer, New York, 1985).
David A. Padua received the Licenciatura in Computer Science from the Universidad Central de Venezuela in 1973, and the Ph.D. degree from the University of Illinois at Urbana-Champaign in 1980. From 1981 to 1984 he was with the Department of Computer Science at the Universidad Simon Bolivar, Venezuela. He has been at the University of Illinois since 1985, where he is now an Associate Director of the Center for Supercomputing Research and Development (CSRD) and an Associate Professor in the Department of Computer Science. Dr. Padua has published over 40 papers on different aspects of parallel computing, including machine organization, parallel programming languages and tools, and parallelizing compilers. He led the development of the languages and compilers used in Cedar, a multiprocessor developed at CSRD. His current research focuses on the experimental analysis of parallelizing compilers and on the development of the techniques needed to make these compilers more effective. A co-organizer of the Workshops on Languages and Compilers for Parallel Computing, Dr. Padua served as Program Committee Chairman of the Second ACM Symposium on Parallel Programming and Program Co-Chairman of the 1990 International Conference on Parallel Processing. He serves on the editorial board of the IEEE Transactions on Parallel and Distributed Systems and is a senior member of the IEEE.