[ACM Press the fourth international conference - Imperial College, London, United Kingdom (1989.09.11-1989.09.13)] Proceedings of the fourth international conference on Functional

Update Analysis and the Efficient Implementation of Functional Aggregates

Adrienne Bloss* Virginia Polytechnic Institute and State University

Blacksburg, Virginia 24061 [email protected]

1 Introduction

Functional languages offer clean semantics, lazy evaluation, and higher-order functions, but their outstanding property is their lack of side effects. While it is often argued that it is easier to reason about programs written without side effects, making such programs run efficiently has historically been difficult. The inefficiency introduced by the lack of side effects is particularly apparent in aggregate structures such as arrays.¹ These structures provide constant-time access and update in imperative languages and, particularly in numerical applications, are the backbone of many algorithms. In functional languages, however, the update operation typically requires copying the aggregate, introducing unacceptable inefficiency. This problem has been explored on several fronts in recent years. Schmidt [21,22] studies techniques for determining when updates to semantic store and environment arguments are single-threaded and thus may be done destructively, but his analysis holds only for a call-by-value evaluation scheme. Hudak's static reference counting [12] provides a similar analysis for general updatable objects, but again only for first-order call-by-value functional languages. Gopinath's work on targeting [8,9] is related but emphasizes the properties of specific operators and assumes the previous computation of liveness information. His work

*This research was supported in part by the National Science Foundation under grant DCR-8451415.

¹ We use vectors and arrays in examples throughout this paper, but the work extends to general aggregate structures.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1989 ACM 0-89791-328-0/89/0009/0026 $1.50

also assumes call-by-value. It is not clear how easily any of these methods could be extended to lazy languages. Other approaches include a tree-shaped array representation for applications in matrix algebra [24], an "associative aggregate" structure along with a special architecture that supports it efficiently [20], and variations on a monolithic approach in which arrays are created and filled according to some specifications but may not be updated afterward [3,13,17,23]. [11] explores the relationship between incrementally updatable arrays, nondeterminism, and parallelism in functional languages. I-structures [19] provide write-once incremental updating, a compromise between monolithic structures and full incremental updating. I-structures can be implemented efficiently without complex compile-time or runtime analysis and they do not restrict parallelism, but write-once structures lack the flexibility of general incrementally updatable structures.²

In this paper we introduce a technique for efficiently implementing incrementally updatable aggregates in sequential first-order lazy functional languages. The technique is based on path analysis, a compile-time analysis that yields information about the order of evaluation of expressions, and it produces speedups over both the naive implementation and another optimization technique involving trailers. Path analysis is described in [5], but its application to destructive aggregate updating is not addressed in that work. A complete description of path analysis and its applications may be found in [4].

The next section describes the syntax and semantics of our first-order lazy functional language. Section 3 describes the aggregate update problem and explains why the obvious implementation techniques are insufficient. Section 4 briefly describes path analysis, and Section

² Consider the histogram problem proposed in [2]; it cannot be solved with I-structures, but in a sequential system it is trivially solved with general incrementally updatable arrays.

5 presents update semantics and update analysis and shows how they can be applied to destructive aggregate updating. Section 6 gives benchmarks from our implementation, and Section 7 presents our conclusions and discusses extensions to update analysis.

2 Standard Syntax and Semantics

Figure 1 shows the standard syntax and semantics for our first-order functional language. Note that we assume that all programs have been lambda-lifted [16], so we only find paths through top-level functions.

In the semantic equations, double brackets surround syntactic objects, as in $\mathcal{E}[\![e_i]\!]$, and single brackets indicate environment update, as in $env[y/x]$; $[y_i/x_i]$ is shorthand for $[y_1/x_1, \ldots, y_n/x_n]$, where the subscript bounds are inferred from context. For any domain $D$, $D^n$ refers to the domain of $n$-tuples with each element drawn from $D$.

3 The Aggregate Update Problem

3.1 The Basic Problem

In imperative languages, arrays are typically selected from by subscripting, e.g. a[i], and updated by assignment, e.g. a[i] := x. Both access and assignment are constant-time operations. In functional languages, however, the absence of side effects requires constructs such as those below, each of which we assume to be strict:

mkv(n, f) $\Rightarrow$ an array $a$ of size $n$ such that $a[i] = f(i)$

a[i] $\Rightarrow$ the $i$th element of array $a$

upd(a, i, x) $\Rightarrow$ a new array $a'$ such that $a'[i] = x$ and $a'[j] = a[j]\ \ \forall j \neq i$

The important point here is that upd(a, i, x) does not modify its first argument, but instead returns a new array a' that is exactly like a except in the ith location. Thus conceptually a' is a copy of a. This is desirable in that it maintains the functional semantics, but in a naive (copying) implementation the calls to upd become very inefficient. To see how copying can affect the complexity of a program, consider the simple function to initialize an array:

init(a, i, x) = if i = 0 then a else init(upd(a, i, x), i - 1, x)

Calling init on an array A of length n will make n copies of A, each of which has time and space complexity $O(n)$; thus initializing A takes time $O(n^2)$ and space $O(n^2)$, while the efficient imperative counterpart takes time $O(n)$ and constant space.
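The quadratic cost can be made concrete with a small counting experiment. The following Python sketch is only an illustration, not part of the implementation discussed in this paper: the `stats` dictionary and the helper `elements_copied` are hypothetical names, indices are taken to be 1-based as in the definition of init, and init is written as a loop to stay within Python's recursion limit.

```python
def upd(a, i, x, stats):
    """Copying update: return a new list equal to `a` except that slot i (1-based) holds x."""
    b = list(a)                      # copies all len(a) elements
    stats["copied"] += len(a)        # hypothetical counter, used only for this experiment
    b[i - 1] = x
    return b


def init(a, i, x, stats):
    """init(a, i, x) from the text, written as a loop instead of a recursive call."""
    while i != 0:
        a = upd(a, i, x, stats)
        i -= 1
    return a


def elements_copied(n):
    """Total number of elements copied while initializing an array of length n."""
    stats = {"copied": 0}
    init([0] * n, n, 1, stats)
    return stats["copied"]
```

Under these assumptions, `elements_copied(n)` grows as n * n: each of the n calls to upd copies all n elements.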

Clearly, copying is not a viable implementation for functional arrays. One alternative to copying is a technique called trailers which is described in the next section.

3.2 Trailers

When an array is updated using trailers, only the cell being updated is copied. An array is represented as a pair (T, V), where V is some contiguous array representation and T is a list of trailers, or (index, value) pairs that indicate where V's elements are "shadowed". To access the value at the jth index, first the trailer list must be searched to see if the jth element of the array is shadowed; if so, the value in the trailer is returned, otherwise the jth element of V is returned. Updating the jth element is more complicated. Suppose a is an array of length n such that $\forall i,\ a[i] = v_i$ and we wish to let b = upd(a, j, x). The array portion of a's representation is destructively modified so that its jth element is x, and b is made to point to this structure with an empty trailer list. Meanwhile, the pair $(j, v_j)$ becomes the first element in a's trailer list, so that accesses to a see that its jth element is actually $v_j$. Note that we could have left a alone and constructed b to point to a's array representation with a trailer list containing (j, x); however, it has been observed that the most recent version of an array is the most commonly accessed, and so we are optimizing for this case.
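One way to picture the trailer representation is the following Python sketch. It is a simplified illustration under stated assumptions, not the implementation benchmarked later: the class name `TrailerArray` is hypothetical, each old version keeps a single trailer that links to the next newer version (so the "trailer list" is the chain of these links), and an update applied to an old version simply falls back to copying, a case the description above does not cover.

```python
class TrailerArray:
    """One version of a functional array: a shared vector V plus a chain of trailers."""

    def __init__(self, vector, trailer=None):
        self._vector = vector      # V: storage shared by all versions, mutated in place
        self._trailer = trailer    # None for the newest version,
                                   # else (index, shadowed_value, newer_version)

    def sel(self, j):
        """Search the trailers for index j; fall through to V if j is not shadowed."""
        version = self
        while version._trailer is not None:
            index, shadowed_value, newer = version._trailer
            if index == j:
                return shadowed_value
            version = newer
        return version._vector[j]

    def upd(self, j, x):
        """Return a new version in which slot j holds x; `self` keeps its old contents."""
        if self._trailer is not None:
            # Assumption of this sketch: updating an old version copies it.
            snapshot = [self.sel(i) for i in range(len(self._vector))]
            snapshot[j] = x
            return TrailerArray(snapshot)
        newer = TrailerArray(self._vector)               # newest version: empty trailer list
        self._trailer = (j, self._vector[j], newer)      # old value shadows V for `self`
        self._vector[j] = x                              # destructive write into V
        return newer
```

Even in this small form, every `sel` must test the trailer field before touching V, which is the constant-factor access overhead discussed next.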

While using trailers is generally more efficient than copying entire arrays, it still introduces substantial overhead. With trailers each update requires a new cell to hold the trailer, adding both space and time overhead. Furthermore, since the trailer list must be checked at each reference to the array, access time is slowed by a constant factor even for single-threaded uses. While it may not be clear to the reader to what extent these factors affect overall program execution, the benchmarks in Section 6 show that the effect can be significant; indeed, sometimes it is more efficient to copy!

3.3 Destructive Updating

Both of the techniques described above introduce unacceptable inefficiency into functional programs with aggregates. It seems that only destructive updating is efficient enough, but how can this be done safely in a functional language?

Abstract Syntax

$c \in Con$   constants
$x \in Bv$   bound variables
$p \in Pf$   primitive functions
$f \in Fv$   function variables
$e \in Exp$   expressions, where $e = c \mid x \mid p(e_1, \ldots, e_n) \mid f(e_1, \ldots, e_n)$
$pr \in Prog$   programs, where $pr = \{f_i(x_1, \ldots, x_n) = e_i\}$

Semantic Domains

$Int$   the standard flat domain of integers
$Bool$   the standard flat domain of boolean values
$Bas = Int + Bool$   the domain of basic values
$Fun = \bigcup_{n=1}^{\infty} (Bas^n \to Bas)$   the domain of first-order functions
$D = Bas + Fun + \{error\}$   the domain of denotable values
$Bve = Bv \to D$   the domain of bound variable environments
$Env = Fv \to D$   the domain of function environments

Semantic Functions

$\mathcal{K} : Con \to Bas$
$\mathcal{P} : Pf \to Fun$
$\mathcal{E} : Exp \to Bve \to Env \to D$
$\mathcal{E}_p : Prog \to Env$

$\mathcal{K}[\![n]\!] = n$, integer $n$
$\mathcal{K}[\![true]\!] = true$
$\mathcal{K}[\![false]\!] = false$

$\mathcal{P}[\![+]\!] = \lambda(x, y).\ (Int?(x)\ \text{and}\ Int?(y)) \to x + y,\ error$
$\mathcal{P}[\![IF]\!] = \lambda(x, y, z).\ (Bool?(x)) \to (x \to y, z),\ error$

$\mathcal{E}[\![c]\!]\ bve\ env = \mathcal{K}[\![c]\!]$
$\mathcal{E}[\![x_i]\!]\ bve\ env = bve[\![x_i]\!]$
$\mathcal{E}[\![p(e_1, \ldots, e_n)]\!]\ bve\ env = \mathcal{P}[\![p]\!](\mathcal{E}[\![e_1]\!]\ bve\ env, \ldots, \mathcal{E}[\![e_n]\!]\ bve\ env)$
$\mathcal{E}[\![f(e_1, \ldots, e_n)]\!]\ bve\ env = env[\![f]\!](\mathcal{E}[\![e_1]\!]\ bve\ env, \ldots, \mathcal{E}[\![e_n]\!]\ bve\ env)$

$\mathcal{E}_p[\![\{f_i(x_1, \ldots, x_n) = e_i\}]\!] = env$ whererec
$env = [(\lambda(y_1, \ldots, y_n).\ \mathcal{E}[\![e_i]\!]\ [y_k/x_k]\ env)/f_i]$

Figure 1: Standard First-Order Syntax and Semantics

The key lies in knowing when a structure is used. If an array a is updated at some point in a program and is never used again after that point,³ that update could have safely been done destructively. This is intuitively clear, since if a is never referenced again there is no way to tell whether it has changed. What may be less clear is how often this will be the case; however, empirical evidence indicates that in most applications the new array returned by upd is used in future references but the old array is not. Consider the previous example of initializing an array; in a copying implementation, executing the call init(A, n, x) produces n copies of partially-initialized forms of A, n - 1 of which are discarded. The nth copy contains a fully initialized array and is returned as the value of the function call. It is likely that the initial value of A will never be used again either, having been empty or having contained information that the user no longer needed.⁴ Therefore even the initial copy was probably unnecessary, in which case the entire initialization could have been done in place.

What we are proposing here is a "behind-the-scenes" side-effect, one that vastly improves efficiency but does not affect the semantics of the program.⁵ Of course, the difficulty is in deciding whether it is safe to do an update destructively, that is, will a be referenced again after it is updated? A local analysis is straightforward but insufficient; the wide use of function calls in functional programming dictates an interprocedural analysis. This is the role of path analysis, a powerful compile-time technique for inferring order-of-evaluation information in lazy sequential functional languages. Path analysis is briefly described in the next section.
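The "behind-the-scenes" nature of the side effect can be illustrated with a short Python sketch. It is only an illustration under assumptions: `upd_copy` and `upd_in_place` are hypothetical names for the two implementation strategies, indices are 1-based as in init, and init is written as a loop. Both strategies return the same value; they differ only in what happens to the old array, which matters only if that array is referenced again.

```python
def upd_copy(a, i, x):
    """Copying update: the argument array is left untouched."""
    b = list(a)
    b[i - 1] = x
    return b


def upd_in_place(a, i, x):
    """Destructive update: safe only if the old array is never used again."""
    a[i - 1] = x
    return a


def init(a, i, x, upd):
    """init from Section 3.1, parameterized by the update strategy."""
    while i != 0:
        a = upd(a, i, x)
        i -= 1
    return a
```

For example, `init([0, 0, 0], 3, 9, upd_copy)` and `init([0, 0, 0], 3, 9, upd_in_place)` both return `[9, 9, 9]`, but only the second reuses the storage of its argument.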

4 Path Analysis

4.1 Overview

Path semantics is a non-standard semantics that describes order of evaluation of expressions for a lazy sequential functional language. Path analysis is an abstract interpretation of path semantics that provides compile-time information about order of evaluation. First-order path analysis subsumes first-order strictness analysis, and the extra information that it provides allows a variety of optimizations beyond the standard conversion of call-by-name into call-by-value. Here we present only a brief intuitive description of path analysis. For the full theory of path semantics and path analysis, see [4,5]; for more detail about abstract interpretation, see [1,7].

³ In the terminology typical for imperative languages, a is no longer live.

⁴ Note that if the intent was not to throw away the original value of A, the intent must have been to copy it, since every element in it is being updated. Thus a copy must be made, and it would have to be done explicitly in an imperative language.

⁵ Unlike I-structures, arrays implemented in this way do not compromise referential transparency. That is, f(mkv(n, g), mkv(n, g)) has the same semantics as $(\lambda a. f(a, a))(mkv(n, g))$.

4.2 Basic Paths

A path through a function f is either a totally ordered subset of f's formal parameters, where the ordering represents the evaluation order of those parameters in the body of f, or the bottom path, denoted $\bot_p$, indicating that f does not terminate. The domain of paths Path is flat, that is, $\forall p, p_1, p_2 \in Path$ with $p_1, p_2 \neq \bot_p$: $\bot_p \sqsubseteq p$, and $p_1$ and $p_2$ are incomparable. We write $(x_1, x_2, \ldots, x_n)$, $n \geq 0$, for a path with n elements where the $x_i$ represent formal parameters. Thus a particular call to a function on a sequential machine has exactly one path, but at compile-time we can only infer a set of possible paths. Consider the following functions:

g(a, b) = a + b
f(x, y, z) = if x = 0 then y else g(z, z)

Assuming that + evaluates its arguments left-to-right,⁶ the set of paths through g contains only one element, (a, b). The set of possible paths through f is {(x, y), (x, z)}, with the first and second paths corresponding to the consequent and alternate of the conditional, respectively. Note that although z is demanded twice if the alternate is taken, it appears only once in the second path. This reflects the "one-time evaluation" property of lazy evaluation: the first demand to z will cause it to be evaluated, but the second demand simply returns a stored value and so does not contribute to the path. Also note that this is an interprocedural analysis in that the path through f depends on the path through g. Recursive and mutually recursive dependencies also yield complete interprocedural information.

⁶ To simplify presentation, we will make this assumption throughout this paper; however, as discussed in [4,5], the order in which a strict operator evaluates its arguments may be determined statically or dynamically in any one of a number of ways.

4.3 Occurrence Paths

It is clear that order of evaluation information is useful in determining when an aggregate is used, but the paths described so far do not contain quite enough information. Since paths describe order of evaluation, a bound variable can appear at most once in any path through a function; yet the question in aggregate updating is whether an aggregate will appear again after it is updated! In other words, we really want to know the order in which bound variables are used. Fortunately, order of use information is easily derived from order of evaluation information. For each function f, derive a new function f' that is exactly like f except that each occurrence of a bound variable in f becomes a unique bound variable in f'. Then the paths through f' will yield order of use information for f. Consider the factorial function:

fac(n, acc) = if n = 0 then acc else fac(n - 1, n * acc)

fac'(n_1, n_2, n_3, acc_1, acc_2) = if n_1 = 0 then acc_1 else fac(n_2 - 1, n_3 * acc_2)

Note that fac' calls fac recursively, not fac'; this follows from the fact that function behavior is described by order of evaluation, which is the information provided by the paths through fac. Of course, fac' would not make any sense in call position anyway, since it has the wrong number of arguments; it is a dummy function to be used only for deriving internal order of use information.

Computing the paths through fac and fac' is straightforward:

fac-paths : $\{(n, acc),\ \bot_p\}$
fac'-paths : $\{(n_1, acc_1),\ (n_1, n_2, n_3, acc_2),\ \bot_p\}$

The paths through fac' contain strictly more information than the paths through fac, since by collapsing all occurrences of a given bound variable into that bound variable and removing duplicates we could derive the paths through fac from those through fac'. Notice that both fac-paths and fac'-paths contain the bottom path $\bot_p$, indicating that non-termination is a possibility.
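The collapsing step mentioned above is mechanical, as the following Python sketch suggests. The representation is an assumption made for illustration only: a path is a tuple of occurrence names such as "n1" or "acc2", the string constant `BOTTOM` stands for the bottom path, and the occurrence number is taken to be the trailing digits of the name (so variable names themselves must not end in a digit).

```python
import re

BOTTOM = "bottom"   # stands for the non-terminating path


def base_name(occurrence):
    """Strip the trailing occurrence number: 'acc2' -> 'acc', 'n1' -> 'n'."""
    return re.sub(r"\d+$", "", occurrence)


def collapse(path):
    """Map an occurrence path to an ordinary path: rename, then drop repeats."""
    if path == BOTTOM:
        return BOTTOM
    seen = []
    for occurrence in path:
        variable = base_name(occurrence)
        if variable not in seen:        # only the first evaluation contributes
            seen.append(variable)
    return tuple(seen)


# Occurrence paths of the factorial example, as listed above.
fac_prime_paths = {("n1", "acc1"), ("n1", "n2", "n3", "acc2"), BOTTOM}
fac_paths = {collapse(p) for p in fac_prime_paths}
```

Both terminating occurrence paths collapse to ("n", "acc"), so `fac_paths` contains exactly that path and the bottom path.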

5 Destructive Aggregate Updating

Order of evaluation information is central to determining when an aggregate may be updated destructively, but by itself it is insufficient. The following additional information is required:

1. At each call to upd,

(a) which aggregate is being updated,

(b) what lexical occurrence of upd is called, and

(c) where the update occurs relative to other elements in its path.

2. Where aliasing occurs, in particular, if two variables that appear to refer to different aggregates could in fact be aliases for the same one.

The next two sections describe update semantics and update analysis, which provide the basic information about what is updated and where. Section 5.3 gives examples of applications of update analysis and shows how we deal with aliasing.

5.1 Update Semantics

To derive update information, paths are extended to update paths, where an update path may contain update elements in addition to bound variables. An update element is a pair $(u, a)$, where $u$ is the index of the lexical occurrence of upd being applied and $a$ is the aggregate being updated. An update element appears in a path wherever an update occurs in that path. Formally, update paths are defined as follows:

$Updpath = \{\bot_p\} \cup \{(x_1, \ldots, x_n) \mid x_i \in Bv + Ue\}$

where $Ue = \{(u, a) \mid u \in Nat,\ a \in Agg\}$, with $Agg$ as defined below.

Besides update elements, we also need information about how aggregates are propagated so that we can tell what aggregates could be affected by a call to upd. We introduce Agg, the flat domain of aggregates that could be returned by a path, with bottom element $\bot_a$:

$Agg = \{\bot_a\} + \{none\} + Bv$

The non-terminating path $\bot_p$ is said to return the aggregate $\bot_a$; a terminating path that cannot return a named aggregate because of type restrictions or anonymity is said to return the aggregate none. All other paths return a bound variable, and the aggregate associated with one of these paths is that bound variable.

Now we can define update paths that also carry aggregate information; we call these update pairs:

$Upair = Agg \otimes Updpath$

The constructor $\otimes$ represents the smash product, essentially a strict cross product. In this case, this implies the following:

$\forall (a, p) \in Upair,\ (a = \bot_a) \Leftrightarrow (p = \bot_p)$

Thus Upair is a flat domain with bottom element $(\bot_a, \bot_p)$.

We say that an update pair $u = (a, p)$ has two components, an aggregate component $a$ and a path component $p$. We will sometimes write $u^a$ for the aggregate component and $u^p$ for the path component. The semantics for update pairs for the first-order case is given in Figure 2.

The body of update semantics has the same form as that of path semantics, but the primitives show the additional information being provided. + cannot return an aggregate for type reasons, and so the first element of an aggregate pair returned from + is always none. If can propagate the value returned by either of its arms, and so the appropriate aggregate is that associated with the arm taken. Sel is assumed not to return an aggregate, which means that aggregates cannot be stored inside other aggregates. And although upd returns an aggregate, it is anonymous and so cannot be shared until it becomes named, e.g., by being passed as a parameter to a function. At that point sharing will be detected inside the function to which it is passed; there is no possibility of its being shared by the function in which it is produced.

Note that the semantics for upd and sel indicate that their arguments are evaluated from right to left, not left to right as the reader may have expected. Like +, these strict primitives could take their arguments in any order, but there is often an advantage to the right-to-left ordering. Consider the function swap:

swap(a, i, j) = upd_1(upd_2(a, i, sel(a, j)), j, sel(a, i))

Swap takes an array a and two integers i and j and returns a new array in which the values of a[i] and a[j] have been interchanged. The interesting point about swap is that $upd_2$ can be done destructively only if $upd_1$ evaluates its last argument before its first argument.⁷ A little thought about upd suggests that this will often be the case, since its first argument is an array, which could easily be produced by a call to upd, and its other two arguments are an integer and an arbitrary value, whose computations seem less likely to include an update. Thus it makes sense to put the arguments most likely to perform updates last, which in this case

⁷ It is interesting to note that swap requires a sort of "special treatment" in imperative languages as well, where one array element is stored in a temporary variable before its array location is overwritten. Thus in imperative languages the temporary storage must be used explicitly, while in functional languages it is implicit, as an argument to the outer call to upd. Furthermore, if the arguments to upd were evaluated in such an order that the inner update could not be done destructively, a trailer would be required to hold the shadowed value of a[i]; this trailer represents exactly the storage that is used by temp in the Pascal program.

means upd's second and third arguments should be evaluated before its first. Of course, a counterexample in which the opposite ordering would do better is easily constructed, but we speculate that such counterexamples occur infrequently in practice, and we fix a right-to-left ordering on upd's arguments in our analysis. A similar argument applies to the arguments to sel, and we evaluate them from right to left as well.
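The effect of argument order on swap can be seen in a small Python sketch. It is an illustration only: `upd_in_place` is a hypothetical destructive update, Python evaluates eagerly, and so the two functions below simply fix by hand the order in which the arguments of the outer upd are evaluated.

```python
def upd_in_place(a, i, x):
    """Destructive update, used here for both occurrences of upd in swap."""
    a[i] = x
    return a


def swap_right_to_left(a, i, j):
    """Outer upd evaluates sel(a, i) first, then j, then the inner upd."""
    outer_value = a[i]                           # sel(a, i), read before any write
    inner_value = a[j]                           # sel(a, j), last argument of the inner upd
    inner = upd_in_place(a, i, inner_value)      # inner upd, done in place
    return upd_in_place(inner, j, outer_value)   # outer upd


def swap_left_to_right(a, i, j):
    """Outer upd evaluates the inner upd first; an in-place inner update is then unsafe."""
    inner = upd_in_place(a, i, a[j])             # overwrites a[i] too early
    outer_value = a[i]                           # sel(a, i) now sees the new value
    return upd_in_place(inner, j, outer_value)
```

With the right-to-left order the in-place version returns `[3, 2, 1]` for the input `[1, 2, 3]` with i = 0 and j = 2; with the left-to-right order the same destructive update yields `[3, 2, 3]`, which is why the analysis would have to forbid it under that ordering.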

All references to $\mathcal{E}$ could be eliminated by incorporating the standard semantics directly into update semantics, at which point the standard semantics could be shown to be an abstraction of update semantics. It is also easy to show that path semantics is an abstraction of update semantics.

5.2 Update Analysis

Like path semantics, update semantics is not useful for static program optimization since it relies on the standard semantics. However, update semantics can be abstracted to update analysis in a manner similar to that in which path semantics was abstracted to path analysis. The conditional holds the key, since this is where update semantics relies on the standard semantics; in the absence of that information, there are two possible paths through a conditional, yielding a set of possible paths through any expression. The form of each update pair does not change, but we now operate on the powerdomain of update pairs, using the Egli-Milner powerdomain construction for its ability to model non-termination. Update analysis is presented in Figure 3.

Note that the ordering on the arguments to the strict primitives is still fixed. At this point we could allow any ordering, e.g.,

$\hat{\mathcal{U}}_k[\![+]\!] = \lambda s.\{(none,\ x^p : y^p),\ (none,\ y^p : x^p) \mid (x, y) \in s\}$

This is a more general model, but allowing all such orderings substantially increases the complexity of update analysis. Although we use fixed orderings, there are many issues involved in choosing an ordering statically; these issues are discussed in [4].

Theorem 1 $\hat{\mathcal{U}}_p[\![pr]\!]$ is computable for any program pr.

Proof: The proof depends on showing that the domains are finite and the operations are monotonic. The domain of paths is finite, and clearly Agg is finite, so Upair must be finite as well. Showing monotonicity of $\times$ and $\cup$ is straightforward (a complete proof may be found in [4]), and the existence of a least fixpoint is guaranteed. $\Box$


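Semantic Domains

$Upair$, the flat domain of update pairs
$Ufun = \bigcup_{n=1}^{\infty} (D^n \to Upair^n \to Upair)$
$Uenv = Fv \to Ufun$, the function environment
$Ubve = Bv \to Upair$, the bound variable environment

Semantic Functions

$\mathcal{U} : Exp \to Bve \to Ubve \to Uenv \to Upair$
$\mathcal{U}_k : Pf \to Ufun$
$\mathcal{U}_p : Prog \to Uenv$

$\mathcal{U}[\![c]\!]\ bve\ ubve\ uenv = (none, ())$
$\mathcal{U}[\![x]\!]\ bve\ ubve\ uenv = ubve[\![x]\!]$

$\mathcal{U}[\![p(e_1, \ldots, e_n)]\!]\ bve\ ubve\ uenv =$ let $d_i = \mathcal{E}[\![e_i]\!]\ bve\ env$, $p_i = \mathcal{U}[\![e_i]\!]\ bve\ ubve\ uenv$ in $\mathcal{U}_k[\![p]\!](d_1, \ldots, d_n, p_1, \ldots, p_n)$

$\mathcal{U}[\![f(e_1, \ldots, e_n)]\!]\ bve\ ubve\ uenv =$ let $d_i = \mathcal{E}[\![e_i]\!]\ bve\ env$, $p_i = \mathcal{U}[\![e_i]\!]\ bve\ ubve\ uenv$ in $uenv[\![f]\!](d_1, \ldots, d_n, p_1, \ldots, p_n)$

$\mathcal{U}_p[\![\{f_i(x_1, \ldots, x_n) = e_i\}]\!] = uenv$ whererec
$uenv = [(\lambda(y_1, \ldots, y_n, z_1, \ldots, z_n).\ \mathcal{U}[\![e_i]\!]\ [y_i/x_i]\ [z_i/x_i]\ uenv)/f_i]$
$env = \mathcal{E}_p[\![\{f_i(x_1, \ldots, x_n) = e_i\}]\!]$

The path-append operator ":" is defined as follows: $\forall p \in Path$, $x_i \in D$, $1 \leq i \leq m$, $n > 0$

$p : \bot_p = \bot_p$
$\bot_p : p = \bot_p$

$(x_1, \ldots, x_n) : (x_{n+1}, \ldots, x_m) =$ if $x_{n+1} \in \{x_1, \ldots, x_n\}$
then $(x_1, \ldots, x_n) : (x_{n+2}, \ldots, x_m)$
else $(x_1, \ldots, x_n, x_{n+1}) : (x_{n+2}, \ldots, x_m)$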

Figure 2: Update Semantics
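The path-append operator of Figure 2 can be read as "concatenate, but keep only the first occurrence of each element". A minimal Python sketch follows, under the assumptions that a path is a tuple, that `BOTTOM` (here `None`) stands for the bottom path, and that update elements are themselves small tuples such as ("upd1", "a"); none of these representation choices come from the figure.

```python
BOTTOM = None   # stands for the bottom (non-terminating) path


def path_append(p, q):
    """Append path q to path p, skipping elements of q that p already contains."""
    if p is BOTTOM or q is BOTTOM:
        return BOTTOM                  # appending is strict in both arguments
    result = list(p)
    for element in q:
        if element not in result:      # an already-evaluated element adds nothing
            result.append(element)
    return tuple(result)
```

For example, `path_append(("x", "y"), ("y", "z"))` gives `("x", "y", "z")`, reflecting the one-time evaluation of y.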


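Semantic Domains

$Upair$, the flat domain of update pairs
$P_{EM}(Upair)$, the powerdomain of Upair
$\hat{U}fun = \bigcup_{n=1}^{\infty} (\mathcal{P}(Upair^n) \to P_{EM}(Upair))$
$\hat{U}env = Fv \to \hat{U}fun$, the function environment
$\hat{U}bve = Bv \to Upair$, the bound variable environment

Semantic Functions

$\hat{\mathcal{U}} : Exp \to \hat{U}bve \to \hat{U}env \to P_{EM}(Upair)$
$\hat{\mathcal{U}}_k : Pf \to \hat{U}fun$
$\hat{\mathcal{U}}_p : Prog \to \hat{U}env$

$\hat{\mathcal{U}}[\![c]\!]\ bve\ uenv = \{(none, ())\}$
$\hat{\mathcal{U}}[\![x]\!]\ bve\ uenv = \{bve[\![x]\!]\}$

$\hat{\mathcal{U}}[\![p(e_1, \ldots, e_n)]\!]\ bve\ uenv = \hat{\mathcal{U}}_k[\![p]\!](\hat{\mathcal{U}}[\![e_1]\!]\ bve\ uenv \times \cdots \times \hat{\mathcal{U}}[\![e_n]\!]\ bve\ uenv)$

$\hat{\mathcal{U}}[\![f(e_1, \ldots, e_n)]\!]\ bve\ uenv = uenv[\![f]\!](\hat{\mathcal{U}}[\![e_1]\!]\ bve\ uenv \times \cdots \times \hat{\mathcal{U}}[\![e_n]\!]\ bve\ uenv)$

$\hat{\mathcal{U}}_p[\![\{f_i(x_1, \ldots, x_n) = e_i\}]\!] = uenv$ whererec
$uenv = [(\lambda s.\ \bigcup \{\hat{\mathcal{U}}[\![e_i]\!]\ [y_i/x_i]\ uenv \mid (y_1, \ldots, y_n) \in s\})/f_i]$

$\hat{\mathcal{U}}_k[\![+]\!] = \lambda s.\{(none,\ x^p : y^p) \mid (x, y) \in s\}$

$\hat{\mathcal{U}}_k[\![IF]\!] = \lambda s.\{(c^a,\ p^p : c^p),\ (a^a,\ p^p : a^p) \mid (p, c, a) \in s\}$

$\hat{\mathcal{U}}_k[\![upd_j]\!] = \lambda s.\{(none,\ x^p : i^p : a^p : ((j, a^a))) \mid (a, i, x) \in s\}$

$\hat{\mathcal{U}}_k[\![sel]\!] = \lambda s.\{(none,\ y^p : x^p) \mid (x, y) \in s\}$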

Figure 3: Update Analysis
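To make the primitive cases of Figure 3 more tangible, here is a minimal Python sketch of them acting on sets of update pairs. Everything about the representation is an assumption for illustration: an update pair is a tuple (aggregate, path), `BOTTOM` (here `None`) stands for both bottom elements, "none" is the anonymous aggregate, a bound variable v is given the pair (v, (v,)), and the helper names (`pair`, `uk_plus`, and so on) are hypothetical. The fixpoint computation over whole programs is not shown.

```python
BOTTOM = None
NONE = "none"


def path_append(p, q):
    """Path append from Figure 2: strict, and duplicates in q are dropped."""
    if p is BOTTOM or q is BOTTOM:
        return BOTTOM
    result = list(p)
    for element in q:
        if element not in result:
            result.append(element)
    return tuple(result)


def pair(aggregate, path):
    """Smash product: a bottom path forces a bottom aggregate."""
    return (BOTTOM, BOTTOM) if path is BOTTOM else (aggregate, path)


def uk_plus(s):
    """+ evaluates left to right and never returns an aggregate."""
    return {pair(NONE, path_append(x[1], y[1])) for (x, y) in s}


def uk_sel(s):
    """sel evaluates its arguments right to left."""
    return {pair(NONE, path_append(y[1], x[1])) for (x, y) in s}


def uk_if(s):
    """Both arms are possible; each propagates the aggregate of the arm taken."""
    result = set()
    for (p, c, a) in s:
        result.add(pair(c[0], path_append(p[1], c[1])))
        result.add(pair(a[0], path_append(p[1], a[1])))
    return result


def uk_upd(j, s):
    """upd_j evaluates right to left, then records the update element (j, aggregate)."""
    result = set()
    for (a, i, x) in s:
        path = path_append(path_append(x[1], i[1]), a[1])
        result.add(pair(NONE, path_append(path, ((j, a[0]),))))
    return result


# The non-recursive skeleton of init: if i = 0 then a else upd_1(a, i, x)
A, I, X = ("a", ("a",)), ("i", ("i",)), ("x", ("x",))
test = (NONE, ("i",))                         # pair for the predicate i = 0
alternate = next(iter(uk_upd(1, {(A, I, X)})))
skeleton = uk_if({(test, A, alternate)})
```

Under these assumptions `skeleton` contains the pairs ("a", ("i", "a")) and ("none", ("i", "x", "a", (1, "a"))), whose path components have the same shape as the terminating paths through init discussed in Section 5.3.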


5.3 Applying Update Analysis

Update analysis now seems to contain the information required to detect when destructive aggregate updating is safe. The method is simple: Compute the set of update pairs for each function in a program, and for the occurrence version of each function. Then look at the update paths through the occurrence functions: if in any path in which an update element $(upd_i, a)$ occurs there is later another occurrence of $a$, then $upd_i$ cannot be done destructively. In this discussion we will use the notation $(upd_i, a)$ instead of $(i, a)$ as it is easier to read. Also, integer or unspecified values (e.g., indices or values to be stored in an array) are often represented by a single argument, usually either i or j. Although this tends to trivialize the functions, it simplifies the presentation and has no effect on update analysis. Consider once again the init example:

init(a, i, x) = if i = 0 then a else init(upd(a, i, x), i - 1, x)

Recalling that upd evaluates its arguments from right to left and discarding the aggregate portion of the final update pairs, the update paths through init are

$\{\bot_p,\ (i, a),\ (i, x, a, (upd_1, a))\}$

But it is the occurrence paths, or paths through init', that indicate whether or not an update can be done destructively. Numbering the occurrences of each bound variable lexically from left to right, we get init' and its paths:

init'(a_1, a_2, i_1, i_2, i_3, x_1, x_2) = if i_1 = 0 then a_1 else init(upd_1(a_2, i_2, x_1), i_3 - 1, x_2)

init'-paths : $\{\bot_p,\ (i_1, a_1),\ (i_1, i_3, x_1, i_2, a_2, (upd_1, a_2)),\ (i_1, i_3, x_2, x_1, i_2, a_2, (upd_1, a_2))\}$

In both of the paths that contain update elements, the aggregate being updated is not used again after the update, so $upd_1$ can be done destructively.

Another example shows how the effect of updating in one function can be accounted for in another function:

g(a, b, i) = if i = 0 then b else upd(a, i, i)    (1)
f(x, y, j) = sel(g(x, y, j), j) + sel(y, j)

The paths and occurrence paths are shown below:

f'-paths : $\{(j_2, j_1, y_1, j_3, y_2),\ (j_2, j_1, x, (upd_1, x), j_3, y_2)\}$

Note that $upd_1$ appears not only in the paths through g, but also in the paths through f, as the update information in g is "exported" into every function that uses it. Again, the occurrence paths through f' and g' show that $upd_1$ can be done destructively.

While it is instructive to consider examples in which destructive updating is possible, it is even more instructive to consider examples in which it is not possible. Failure to catch a potential optimization is disappointing, but performing an unsafe optimization renders the entire analysis useless. The next few examples examine ways in which it can be unsafe to update destructively and show how update analysis detects these cases. First, take the simplest case:

f(a, i) = sel(upd_1(a, i, i), i) + sel(a, i)

The regular paths and occurrence paths for f appear below:

f-paths : $\{(i, a, (upd_1, a))\}$
f'-paths : $\{(i_3, i_2, i_1, a_1, (upd_1, a_1), i_4, a_2)\}$

Since an occurrence of a ($a_2$) is used after another occurrence of a ($a_1$) is updated, the path through f' indicates that $upd_1$ cannot be done destructively.

Next, consider a slight modification of equation (1) above:

g(a, b, i) = if i = 0 then a else upd(b, i, i)    (2)
f(x, y, j) = sel(g(x, y, j), j) + sel(y, j)

g-paths : $\{(i, a),\ (i, b, (upd_1, b))\}$
f-paths : $\{(j, x, y),\ (j, y, (upd_1, y))\}$

g'-paths : $\{(i_1, a),\ (i_1, i_2, i_3, b, (upd_1, b))\}$
f'-paths : $\{(j_2, j_1, x, j_3, y_2),\ (j_2, j_1, y_1, (upd_1, y_1), j_3, y_2)\}$

Now the paths through f' indicate that $upd_1$ cannot be done destructively, although it should be noted that the paths through g' do not show this. This emphasizes that a system of functions must be analyzed as a whole.

Now consider a third example:

f(a, b) = sel(upd_1(a, i_1, x), i_2) + sel(b, i_3)
g(c) = f(c, c)

The appropriate paths are as follows:

f-paths : $\{(a, (upd_1, a), b)\}$
f'-paths : $\{(a, (upd_1, a), b)\}$

g-paths : $\{(c, (upd_1, c))\}$
g'-paths : $\{(c_1, (upd_1, c_2))\}$

None of these paths indicates that $upd_1$ cannot be done destructively, yet this is clearly the case. The problem is that a and b are aliases for c, and although the conflict really occurs inside of f, it can't be detected by examining only f and the functions it relies on; information is also required about the functions that use f. Note that this is different from the situation that arose in equation (2) above, where f could detect its conflict because g exported its update information to f. In that case, the information flow from callee to caller was sufficient; in this case, we need information flow from caller to callee as well.
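The per-function test used in the examples of this section ("is the updated aggregate used again later in the same occurrence path?") can be sketched in Python as follows. The representation is an assumption for illustration: a path is a tuple whose elements are either occurrence names such as "a2" or update elements written as tuples such as ("upd1", "a1"), and the occurrence number is the trailing digits of a name. The function name `blocked_updates` is hypothetical.

```python
def is_update(element):
    """Update elements are tuples (label, aggregate occurrence); variables are strings."""
    return isinstance(element, tuple)


def base_name(occurrence):
    """'a2' -> 'a'; names without a numeric suffix are returned unchanged."""
    return occurrence.rstrip("0123456789")


def blocked_updates(occurrence_paths):
    """Labels of updates that some path shows cannot be done destructively."""
    blocked = set()
    for path in occurrence_paths:
        for position, element in enumerate(path):
            if not is_update(element):
                continue
            label, target = element
            later = path[position + 1:]
            if any(not is_update(e) and base_name(e) == base_name(target)
                   for e in later):
                blocked.add(label)
    return blocked


simple_case = [("i3", "i2", "i1", "a1", ("upd1", "a1"), "i4", "a2")]
init_prime = [("i1", "a1"),
              ("i1", "i3", "x1", "i2", "a2", ("upd1", "a2")),
              ("i1", "i3", "x2", "x1", "i2", "a2", ("upd1", "a2"))]
third_example_f = [("a", ("upd1", "a"), "b")]
```

On these inputs the test blocks upd1 in the simplest unsafe case and allows it for init', but it also allows it for the third example, since within f the names a and b look unrelated; this is exactly the gap that the caller-to-callee information has to close.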

We accomplish this by doing a simple transitive closure of the set of tuples of possible aggregates on which each function might be called. In keeping with our previous notion of "aggregate," an argument expression that might evaluate to a bound variable is assigned that variable, while an expression that cannot propagate a bound variable is assigned the value none. (Recall that we already know which arguments can be propagated by user-defined functions, as these are precisely the first elements of the update pairs found by update analysis.) In this way we can detect functions that might be called with the same aggregate for more than one argument. This is admittedly a very operational approach, and it is safe only for the first-order case; a full higher-order analysis would require a collecting interpretation [14], a formal denotational description of how the meaning of an expression can depend on its context.

Once we have the set of aggregate tuples with which each function might be called, we simply substitute the elements of those tuples in for the corresponding bound variables in the occurrence paths through the function, producing a new set of paths. Going back to the last example, we find that f is called with the argument tuple (c,c), and so we substitute c for each occurrence of a or b in f’s occurrence paths. The new set of paths for f’ looks like this:

new-f’qaths : {(c,(updl,c),c)}

The effect of the aliasing is now clear; this path shows that upd1 cannot be done destructively because of a conflict in f.
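The substitution step can be illustrated with a short sketch. The path encoding, the pre-substitution path for f, and the conflict rule used here (an update conflicts when the aggregate it updates occurs again later in the same path) are assumptions made for illustration; the precise definitions are those given earlier in the paper.

```python
def substitute(path, binding):
    """Replace bound variables in an occurrence path by the aggregates
    they may be bound to. A path element is either a variable name or a
    pair (update label, variable)."""
    out = []
    for item in path:
        if isinstance(item, tuple):
            label, var = item
            out.append((label, binding.get(var, var)))
        else:
            out.append(binding.get(item, item))
    return tuple(out)


def conflicting_updates(path):
    """Assumed rule: an update conflicts if the aggregate it updates
    occurs again later in the path."""
    conflicts = set()
    for i, item in enumerate(path):
        if isinstance(item, tuple):
            label, var = item
            later = [x[1] if isinstance(x, tuple) else x for x in path[i + 1:]]
            if var in later:
                conflicts.add(label)
    return conflicts


f_path = ("a", ("upd1", "b"), "a")    # hypothetical path for f before substitution
call_tuple = {"a": "c", "b": "c"}     # f is called with the argument tuple (c, c)
new_path = substitute(f_path, call_tuple)
```

Under these assumptions the original path shows no conflict, since a and b are distinct variables, while the substituted path `("c", ("upd1", "c"), "c")` reports `upd1` as conflicting.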

6 Benchmarks

In this section we present and discuss benchmarks for programs that are optimized using update analysis. Our benchmarks are for programs in ALFL [10], a functional language developed at Yale, and were run on a Macintosh II with 13 megabytes of RAM. The ALFL programs were translated into T [18] and then submitted to Orbit, the T compiler. Times shown are for compiled T code, using version 3.1 of T with 8 megabyte heaps. Since ALFL translates into T, the ALFL compiler could at best generate optimal T code. Note that T uses applicative-order evaluation, and that arrays are non-functional and are implemented efficiently through destructive updating.

Table 1 presents benchmarks for the following programs:

quicksort: Hoare’s quicksort.

bubsort: Bubblesort.

tridiag: Tridiagonal factorization.

init: Vector initialization.

matinit: Matrix initialization.

matmult: Matrix multiplication.

The size of the structures manipulated by each of the programs is noted in the table. Note that vector size does not affect update analysis; large vectors were used for the smaller programs to bring run times out of the noise level. The 1000-element vector was added for init because it allowed the copying strategy to be benchmarked in a function where updating dominated the runtime. In addition to update analysis, strictness analysis, termination analysis, and uncurrying were performed on all programs [6,25]. For each program, the table gives the CPU time used by the hand-coded T program using iteration and destructive operations whenever possible; by the ALFL program using update analysis (which determined that all updates could be done destructively for each of these benchmarks); by the ALFL program using the trailers implementation; and, when possible, by the ALFL program using the copying implementation.

In quicksort, update analysis resulted in optimal performance: the time for the compiled ALFL code is the same as the time for the hand-coded T version. Using trailers, however, produced a three-fold slowdown, which would be even worse if adjusted to account for eventual garbage collection of the additional memory required. Quicksort’s functions are all strict in their arguments and its computations are vector-intensive, so the effects of more or less efficient vector operations are quite pronounced.

Table 1: Benchmarks for update analysis on ALFL programs (seconds). [Table body lost in extraction; the surviving row labels are TRIDIAG (1000 elements), MATMULT ((30x30)x(30x30)), and MATINIT (30x30).]

Bubsort differs from quicksort in two ways: its functions are strict in fewer arguments, and it does more selections relative to its number of updates. The strictness issue is reflected in the difference between the time for the ALFL program with destructive updates and the time for the T program. The more interesting point, however, is how close the trailer and copying times are. In the trailer implementation, access and update both carry penalties, while in the copying implementation only update is penalized. Thus the additional overhead of copying is to some extent compensated for by faster access. Of course, part of the picture is missing here; the copying implementation uses much more memory than the trailer implementation, and for a 200-element array the time for the trailer implementation increases in about the expected proportion, while the copying implementation cannot complete execution without garbage collecting (twice!). The destructive implementation, of course, is far superior to either.
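The trade-off between the two non-destructive strategies can be made concrete with a minimal sketch. The classes below are illustrative only: `CopyArray` and `TrailerArray` are hypothetical names, and the trailer scheme shown is one common formulation (the newest version owns the storage and older versions record what changed), which is not necessarily the implementation that was benchmarked.

```python
class CopyArray:
    """Copying strategy: select is O(1); every update copies all n elements."""

    def __init__(self, items):
        self._items = list(items)

    def select(self, i):
        return self._items[i]

    def update(self, i, value):
        new = CopyArray(self._items)       # O(n) copy on every update
        new._items[i] = value
        return new


class TrailerArray:
    """Trailer strategy (assumed formulation): the newest version owns the
    storage; each older version records (index, old value, newer version)."""

    def __init__(self, items):
        self._store = list(items)          # meaningful only for the newest version
        self._trailer = None               # None marks the newest version

    def _newest(self):
        node = self
        while node._trailer is not None:
            node = node._trailer[2]
        return node

    def select(self, i):
        node = self
        while node._trailer is not None:   # access penalty: walk the trailer chain
            j, old, newer = node._trailer
            if j == i:
                return old
            node = newer
        return node._store[i]

    def update(self, i, value):
        if self._trailer is not None:
            # Updating an old version: this sketch simply rebuilds a fresh array.
            n = len(self._newest()._store)
            fresh = TrailerArray([self.select(k) for k in range(n)])
            return fresh.update(i, value)
        new = TrailerArray.__new__(TrailerArray)
        new._store, new._trailer = self._store, None
        old = self._store[i]
        self._store[i] = value             # in-place write on the shared storage
        self._trailer = (i, old, new)      # update penalty: allocate a trailer
        self._store = None
        return new
```

In this sketch every `select` on a trailer array pays at least a test for the trailer, and selections on old versions walk the chain, whereas `CopyArray.select` is a plain index. This is the sense in which a selection-heavy program can favor copying despite its O(n) updates.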

Init is interesting because it is not strict in its last argument, and thus we must build and force a thunk to pass and access that argument. This accounts entirely for the difference between the T runtime of 0.02 and the optimized ALFL runtime of 0.04 on the 10,000-element array. However, this difference is swamped by the jump to 1.35 that occurs when trailers are used. The slowdown is so great for init because it does almost nothing except update arrays, so inefficiency in an array operation has a very strong effect on the overall runtime. Running init on a 1000-element array produced times uncomfortably close to the noise level (although the ratios remained very close to those for the 10,000-element array), but this was the only update-intensive example on which we were able to benchmark the copying implementation. The slowdown here was anticipated, but is still impressive!
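For readers unfamiliar with the cost being described, the following is a minimal sketch of a memoizing thunk. The `Thunk` class and the counter used in the example are hypothetical and are not taken from the ALFL or T implementations; the point is only that passing a non-strict argument requires an allocation at the call site and a test on every access.

```python
class Thunk:
    """A delayed computation that is evaluated at most once."""

    def __init__(self, compute):
        self._compute = compute    # allocation cost paid at the call site
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:       # test paid on every access
            self._value = self._compute()
            self._compute = None   # release the closure once evaluated
            self._forced = True
        return self._value


calls = []

def expensive():
    calls.append(1)                # record each real evaluation
    return 42

t = Thunk(expensive)
first = t.force()
second = t.force()
```

Here `expensive` runs only on the first `force`; the second call returns the cached value.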

Tridiag can be fully strictified, but is interesting because it relies heavily on floating-point operations. T uses a consing floating-point implementation, requiring two longwords for each floating-point operation. The overhead thus introduced is substantial, and the effect of an inefficient array implementation is muffled by the floating-point inefficiencies. However, update analysis still produces optimal performance, and using trailers yields a performance degradation of over a factor of two, so the optimization is still significant.

Matmult shows significant speedup from update analysis, but the effect is somewhat muffled by non-strict functions and the basic cost of matrix operations. The interesting point here is that because selections greatly outnumber updates, trailers are actually slower than a copying implementation.

Matinit shows slightly better speedup than matmult, largely because it performs no other interesting runtime function besides matrix operations, so the effect of improved performance on these operations is more pronounced. Nevertheless, its speedup is less than that of init because of the greater cost of the matrix operations and the overall greater complexity of the program.

6.1 Conclusions

The data in this section show that the effect of successful update analysis varies widely. In a function whose dominant costs stem from array manipulation, update analysis can produce speedups of one to two orders of magnitude; in a function with high overhead from sources such as non-strict functions or expensive runtime operations, particularly if the number of array manipulations is relatively small, the effect may be much smaller. The speedup is significant, however, for all of these array-based functions. The speedup produced by trailers can also be significant, but as the matmult benchmark demonstrates, the trailer representation will actually lose to copying when array selections greatly outnumber array updates.

7 Complexity of Update Analysis

In [5] we show that path analysis subsumes strictness analysis, which is shown in [15] to have a lower-bound worst-case complexity exponential in the number of arguments to a function. It is easy to show that path analysis is an abstraction of update analysis, so update analysis must be at least exponential in the number of arguments to a function in the worst case. While the computation of a tight lower bound on the complexity of update analysis is beyond the scope of this paper, we suspect that it is in fact worse than that of strictness analysis. For a function of n arguments, an upper bound on the number of iterations required is easily established as $\sum_{k=1}^{n} (n!/k!)$, approximated by n!, representing the height of domain Pem(Path). In practice, update analysis is expensive and may be intractable for large programs. This is not surprising in light of another result in this area: a full interprocedural strictness analysis with a limited higher-order analysis was recently determined to be impractical for large programs as well.† Thus although update analysis as described here may have limited applications as a practical tool, we hope and expect that it can serve as a basis for further abstraction.
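To give a feel for how quickly this bound grows, the following sketch evaluates it for small n. It assumes the bound is the sum of n!/k! for k from 1 to n, which is our reading of the formula in the text; the function name `iteration_bound` is hypothetical.

```python
from math import factorial


def iteration_bound(n):
    """Sum of n!/k! for k = 1..n (assumed reading of the upper bound)."""
    return sum(factorial(n) // factorial(k) for k in range(1, n + 1))


# Bound and its ratio to n! for a few argument counts.
table = [(n, iteration_bound(n), iteration_bound(n) / factorial(n))
         for n in range(1, 8)]
```

Under this reading the bound is 10 for n = 3, 41 for n = 4, and 206 for n = 5. Its ratio to n! is the partial sum of 1/k! for k from 1 to n, which tends to e - 1 (about 1.718), so approximating the bound by n! is accurate up to a small constant factor.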

8 Conclusions and Future Work

We have extended path semantics to update semantics and its computable abstraction update analysis, which when combined with a primitive form of collecting provides the information required to determine when destructive aggregate updating is safe in a first-order sequential lazy functional language. Our benchmarks show that update analysis is effective at detecting destructively-updatable aggregates, and that the analysis can result in very significant improvements in runtime. Although update analysis is found to be expensive, we speculate that a suitable abstraction could be found to provide a substantial amount of information at a reasonable cost.

†This conclusion was reached by the functional programming group at Yale, and was based on the strictness analyzer described in [25].

Future work includes searching for such an abstraction along with exploring update analysis for higher-order and parallel systems. The theory of higher-order update analysis is straightforward, but we have not implemented it directly because of its complexity. We currently use heuristics to handle higher-order constructs, but we hope to find a suitable abstraction for update analysis that will extend to higher-order constructs as well. Update analysis for a parallel system requires a new model of order of evaluation in which the evaluation of the arguments to strict primitives is not constrained to be done sequentially. We are exploring a graphical model of order of evaluation that may be suitable, and hope to develop a general technique for the analysis of functional programs in a parallel system.

9 Acknowledgements

Thanks to Paul Hudak for his comments on various aspects of this work and a draft of this paper. Much of this work was done while the author was at Yale University.

References

[1] S. Abramsky and C. Hankin. Abstract Interpretation of Declarative Languages. Ellis Horwood, 1987.

[2] Arvind, R.S. Nikhil, and K.P. Keshav. I-structures: data structures for parallel computing. In Proceedings of the Workshop on Graph Reduction, Los Alamos, New Mexico, February 1987.

[3] H. Barendregt and M. van Leeuwen. Functional Programming and the Language TALE. Technical Report, Mathematical Institute, Netherlands, 1985.

[4] A. Bloss. Path Analysis and the Optimization of Non-strict Functional Languages. PhD thesis, Yale University, Department of Computer Science, 1989. Available as Research Report YALEU/DCS/RR-704.

[5] A. Bloss and P. Hudak. Path semantics. In Proc. Third Workshop on the Mathematical Foundations of Programming Language Semantics, ACM, Springer-Verlag LNCS 298, April 1987.

[6] A. Bloss, P. Hudak, and J. Young. An optimising compiler for a modern functional language. The Computer Journal, 31(6):152-161, 1988.

[7] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th ACM Symposium on Principles of Programming Languages, pages 238-252, ACM, 1977.

[8] K. Gopinath. Copy elimination in single assignment languages. PhD thesis, Stanford University, 1988.

[9] K. Gopinath and J. Hennessy. Copy elimination in functional languages. In Proceedings of the 16th ACM Symposium on Principles of Programming Languages, January 1989.

[10] P. Hudak. ALFL Reference Manual and Programmer’s Guide. Research Report YALEU/DCS/RR-322, Second Edition, Yale University, October 1984.

[11] P. Hudak. Arrays, non-determinism, side-effects, and parallelism: a functional perspective. In Proceedings of the Santa Fe Graph Reduction Workshop, pages 312-327, Los Alamos National Laboratory/MCC, Springer-Verlag LNCS 279, October 1986.

[12] P. Hudak. A semantic model of reference counting and its abstraction (detailed summary). In Symposium on Lisp and Functional Programming, pages 351-363, ACM, August 1986.

[13] P. Hudak, P. Wadler, et al. Report on the Functional Programming Language Haskell: Draft Proposed Standard. Technical Report YALEU/DCS/RR-666, Yale University, Department of Computer Science, December 1988.

[14] P. Hudak and J. Young. Collecting interpretations of expressions (without powerdomains). In Proceedings of the 15th ACM Symposium on Principles of Programming Languages, pages 107-118, January 1988.

[15] P. Hudak and J. Young. Higher-order strictness analysis for untyped lambda calculus. In 12th ACM Symposium on Principles of Programming Languages, pages 97-109, January 1986.

[16] T. Johnsson. Lambda lifting: transforming programs to recursive equations.

[17] R. Keller. FEL Programmer’s Guide. Technical Report, University of Utah, April 1983.

[18] D. Kranz, R. Kelsey, J. Rees, P. Hudak, J. Philbin, and N. Adams. Orbit: an optimizing compiler for Scheme. In SIGPLAN ’86 Symposium on Compiler Construction, pages 219-233, ACM, June 1986. Published as SIGPLAN Notices Vol. 21, No. 7, July 1986.

[19] R.S. Nikhil, K. Pingali, and Arvind. Id Nouveau. Computation Structures Group Memo 265, Massachusetts Institute of Technology, Laboratory for Computer Science, July 1986.

[20] J.T. O’Donnell. An architecture that efficiently updates associative aggregates in applicative programming language. In Functional Programming Languages and Computer Architecture, pages 164-189, Springer-Verlag LNCS 201, September 1985.

[21] D.A. Schmidt. Detecting global variables in denotational specifications. ACM Transactions on Programming Languages and Systems, 7(2):299-310, 1985.

[22] D.A. Schmidt. Detecting Stack-Based Environments in Denotational Definitions. Research Report TR-CS-86-3, Kansas State University, October 1986.

[23] P. Wadler. A new array operation. In Proceedings of the Santa Fe Graph Reduction Workshop, pages 328-335, Los Alamos National Laboratory/MCC, Springer-Verlag LNCS 279, October 1986.

[24] D. Wise. Matrix algebra and applicative programming. In Proceedings of 1987 Functional Programming Languages and Computer Architecture Conference, pages 134-153, Springer-Verlag LNCS 274, September 1987.

[25] J. Young. Theory and Practice of Semantics-Directed Compiling for Functional Programming Languages. PhD thesis, Yale University, Department of Computer Science, 1988.
