
Systems and Computers in Japan, Vol. 26, No. 10, 1995 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J78-D-I, No. 2, February 1995, pp. 200-209

The Extended C Language NCX for Data-Parallel Programming

Taiichi Yuasa, Member and Toshiro Kijima, Nonmember

Faculty of Engineering, Toyohashi University of Technology, Toyohashi, Japan 441

Yutaka Konishi, Nonmember

Department of Electronic Engineering, Aichi College of Technology, Gamagori, Japan 443

SUMMARY

NCX is an extended C language for data-parallelism, which is one of the most important computation models to support realistic applications of massively parallel computers. The design criteria of the language include easy shifting from the C language, low-cost implementation of efficient compilers, and high integrity as a programming language. The language is based on the concept of virtual processors, each being powerful enough to execute the full-set C language. Several features for data-parallel computation, such as inter-processor communication, are added to the language so that they obey the design principles of the base language C. The language is intended to be used on various architectures and is now being implemented for some machines with different architectures. This paper overviews the major extended features of NCX, together with some programming examples, and shows that NCX provides sufficient expressive power for data-parallel computation while it is based on the simple and clear notion of virtual processors.

Key words: Data-parallel communication; NCX language; C language.

1. Introduction

Several computation models have been proposed

to support programming on massively parallel computers with a huge number of processors. Among them, data-parallelism is one of the most promising models. In fact, many application programs of massively parallel computers are known to be expressed with data-parallelism. Although not all parallel applications can be expressed by data-parallelism, it is expected that more applications will be developed in data-parallelism in the future because more computation power is required today.

So far, some extended C languages have been proposed for data-parallel programming [1, 2, 5, 7]. Many of these were originally designed for relatively low-level programming on a specific architecture, in particular on (single-instruction, multiple-data) SIMD machines. Thus, these languages tend to have restrictions that reflect the architecture of the target machine. As non-SIMD massively parallel machines become more popular, these languages are sometimes used on machines other than the original target machines [2, 6, 9]. Even then, language restrictions usually remain that complicate language-level compatibility. This results in low integrity as a modern programming language (see [10] for concrete examples of this discussion).

This paper introduces a new extended C language, called NCX [11], for data-parallel programming. NCX was designed as a common programming language that can be used on a wide class of parallel machines, including SIMD, (single-program, multiple-data) SPMD, and (multiple-instruction, multiple-data) MIMD machines. It is also intended to be used efficiently on ordinary sequential machines. With current compiler technology, automatic parallelization of sequential programs is still in the experimental stage, and thus explicit description of parallelism is inevitable to make use of the full power of parallel machines. This is, however, expensive, and it is not realistic to prepare a separate parallel code for each machine. Rather, we would like to write a single code in a common language that can be executed efficiently on parallel machines with various architectures. One of the design objectives of NCX is to serve as such a common language.

ISSN 0882-1666/95/0010-0013 © 1995 Scripta Technica, Inc.

NCX has the following features to make it easy to write data-parallel programs.

1. The language assumes an infinite number of virtual processors (VPs for short), each having the power of executing the full-set C language [4] efficiently. By dynamically selecting a finite set of VPs at any time during computation, the programmer explicitly expresses parallelism.

2. No extensions are made for data types. The concept of parallel data types, which can be found in many data-parallel languages, does not exist in NCX.

3. Inter-VP communications are expressed in terms of remote accessing of variables from one VP to another VP.

4. The sequential part of a program is executed by a distinguished VP, called the root. That is, by selecting the VP set that consists only of the root, the program code can be executed sequentially. Sequential execution is a special case of parallel execution, and thus no special concepts or notations for sequential execution are necessary.

With these features, expert C programmers are expected to shift to NCX without difficulties.

For implementation efficiency, NCX has the following features.

1. The programmer can explicitly specify data allocation for parallel computation. The allocation is specified in terms of VPs, to make it machine-independent. The mapping from a VP set to physical processors is determined by each language processor.

2. For each function, the programmer can specify those VP sets that the function is intended for. This gives the language processor the chance to generate efficient code dedicated to the specified VP sets.

3. In general, a function call in data-parallel languages requires processor synchronization. However, some functions do not actually require synchronization on their calls. By analyzing the entire program, the compiler could detect such functions and remove redundant synchronizations. This is, however, impossible when the program files are compiled separately, as with almost all C compilers. Therefore, NCX provides a mechanism to declare such functions in the program file of the caller. For this purpose, we adopt the const declaration for functions that is found in some C compilers, such as GCC [8].

In the rest of this paper, we first introduce VP sets, called fields, in section 2. Fields are the most fundamental concept for data-parallel computation in NCX. All extended features of NCX are defined in terms of fields. In sections 3 and 4, we describe variable allocation to fields and constructs for data-parallel computation, respectively. The assignment and reduction operations, described in section 5, are similar to those in other data-parallel C languages, but are more flexible, because they are defined in terms of fields. In sections 6 and 7, we present two unique features of NCX: the mechanism to dynamically extend parallelism and indirect remote accessing via pointers. Finally, in section 8, we introduce the extended mechanism of NCX to define functions, which is suitable for programming in terms of fields.

2. Virtual Processors

As mentioned already, NCX programs are supposed to be executed by virtual processors (VPs). Although the language assumes an infinite number of VPs, only a finite number of VPs are executing at any time during the computation. These VPs are called active at that time, while the other VPs are called inactive.

At a certain time during the computation, by the activity of a VP we mean whether the VP is active or not at that time. There are two ways to change the activity of VPs. One way is to select, among the currently active VPs, those VPs that satisfy a certain condition. The selected VPs keep executing and the other VPs become inactive. NCX constructs such as the if statement, to be described later, are used for this purpose. The other way to change the VP activities is to select only the VPs in a VP set, called a field, that is defined in advance (statically or dynamically). It is possible to define multiple fields for a single program, but only one field is selected at any moment during the program execution. That field is called the current field.

2.1. Base fields

Each field is defined as a finite subset of one of the predefined base-fields. A base-field is an infinite set of VPs that are connected by a certain network. There are three base-fields: mesh, hypercube, and binarytree. The network of each base-field corresponds to a communication pattern that is frequently used in actual data-parallel computation.

The base-field mesh is a set of VPs that are arranged in a mesh structure with an infinite number of dimensions and with infinite size in each dimension. More precisely, mesh is defined as follows. Each VP in mesh is uniquely identified by an index (i0, i1, i2, ...), which is an infinite sequence of integers. The VP with the index (i0, i1, i2, ...) is directly connected to the VPs with the following indexes:

(i0 ± 1, i1, i2, ...), (i0, i1 ± 1, i2, ...), (i0, i1, i2 ± 1, ...), ...

By P(i0, i1, i2, ...) we denote the VP in mesh with the index (i0, i1, i2, ...). If all ik (k ≥ n) are 0 for some n (≥ 0), then we abbreviate P(i0, i1, i2, ...) as P(i0, ..., i(n-1)).

The base-field hypercube is a set of VPs that are arranged in a hypercube structure with an infinite number of dimensions. The base-field binarytree is a set of VPs that are arranged in a binary tree structure with infinite depth. Since mesh will be used much more frequently than the other two base-fields in real applications, we will focus on mesh in the rest of the paper.

The VP P(0) in mesh is called the root processor, and is denoted P_root. This VP is identical to the VP in hypercube at the origin and to the VP in binarytree at the root position. Two base-fields do not share VPs except P_root, and no VP other than P_root in a base-field is connected to any VPs other than P_root in another base-field. Although any VP can communicate with any other VP (even in a different base-field), communication between two connected VPs can be assumed faster.

2.2. Fields

Each field is declared as a finite subset of a base-field. The following are the declarations of three fields a1, v1, and h1, which we will use throughout this paper:

field a1(10,10) on mesh;
field v1(10) on mesh;
field h1(i:10) on mesh(0,i);

These fields are subsets of mesh, and are called mesh fields. The field a1 is a set of VPs that are arranged in a 10 x 10 matrix:

{ P(i, j) | 0 ≤ i < 10, 0 ≤ j < 10 }

The field v1 is a one-dimensional set of VPs:

{ P(i) | 0 ≤ i < 10 }

Figure 1 illustrates these fields. In the figure, the circle at the upper-left corner stands for the root processor P_root (= P(0, 0)), the circle at the lower-left corner for P(9) (= P(9, 0)), and the circle at the upper-right corner for P(0, 9).

The i in the declaration of h1 is called an index variable and is used to specify the location in mesh where each VP of h1 resides. This correspondence is called a field mapping, or simply a mapping. In this example, the i-th VP of h1 is specified as P(0, i), and thus h1 represents the VP set:

{ P(0, i) | 0 ≤ i < 10 }

Fig. 1. Field examples.

The field mappings for the fields a1 and v1 are abbreviated in the declarations because they use the default mappings. With explicit mapping specifications, their declarations would look like:

field a1(i:10, j:10) on mesh(i, j);
field v1(i:10) on mesh(i);

As will be explained later, each NCX variable is allocated to a specified field. By using field mapping, the programmer can easily specify data allocation for efficient communication.

Each expression used for a mapping specification is either an integer constant or a polynomial a*i + b of an index variable i, with some integer constants a and b. Therefore, the VP set in an arbitrary rectangular region can be declared as a mesh field. NCX requires that, if a field mapping is explicitly specified in a mesh field declaration, then every dimension must be associated with an index variable, and each index variable must appear at least once in the mapping specification. With this constraint, field mappings are guaranteed to be one-to-one.

2.3. The built-in field mono

A special field, called mono, is predefined in the language, which consists only of the root processor P_root. When the current field is mono, only the processor P_root is active. Therefore, the program is executed sequentially. An NCX program begins to run with mono as the initial current field. That is, only the root processor P_root is active when a program begins to run.

Fig. 2. Variable allocations to fields.

NCX is designed so that any C language program runs as an NCX program in exactly the same way as defined in the C language, if all functions are declared to be specific to mono (see section 8). The only VP that becomes active during the execution of such a program is the root processor P_root.

3. Variables

3.1. Variable declarations

Variable declarations in NCX are associated with fields to which the variables are allocated. For example:

int a on a1;
int h on h1;
int v on v1;
int x on mono;

The first declaration declares an int variable a that is allocated to (all VPs in) the field a1. Similarly, h, v, and x are allocated to the fields h1, v1, and mono, respectively (see Fig. 2). The keyword "on" and the rest of a variable declaration are optional. When omitted, variables are allocated to mono for top-level declarations, and to the current field for declarations inside function definitions.

NCX makes no extensions for data types. Arrays, pointer variables, structure variables, and union variables are declared by adding the optional specification to C language declarations. For instance, the following declaration allocates an int array m with 12 elements and an int pointer p, both to the field a1.

int m[12], *p on a1;

3.2. Variable references

Since a variable declaration allocates variables of the same name to all VPs in the specified field, the variable name is not enough to identify a variable. We need to specify the VP to which the intended variable is allocated. To do this, NCX uses the following syntax for variable references.

identifier @ ( expr1, ..., exprn )

The value of each expr is used as a field index to identify the VP. The VP is determined by the field mapping of the field to which the variable is allocated. For instance,

a@(2,3)   v@(1)   h@(1)

denote a at P(2, 3), v at P(1, 0), and h at P(0, 1), respectively.

The VP specification can be abbreviated in the following cases.

- When a mono variable, i.e., a variable that is allocated to mono, is referenced. VP specification is unnecessary for mono variables, since they are allocated only to the root processor.

- When a variable that is allocated to the current field is referenced and each currently active VP references the variable that is allocated to the VP itself. This abbreviation allows concise coding and, more importantly, makes it clear that the reference is local to each VP.

4. Parallel Constructs

4.1. The in statement

The in statement is introduced to NCX to switch the current field. An in statement

in field stat

makes field the current field. Only VPs in field become active and execute the body stat. After all VPs have finished the execution of stat, the field that was the current field right before the in statement will become the current field again, and the VP activities before the in statement will be resumed.

Suppose a and b are variables that are allocated to the field a1. With the following in statement, VPs in a1 will assign the value of b to a in parallel:

in a1 a = b;

Since VPs in a1 are arranged in a 10 x 10 matrix, this statement can be interpreted as a parallel assignment from a two-dimensional square matrix b to another square matrix a.

An in statement may contain declarations for index variables that are used to specify individual VPs within the body. For example, the following in statement declares two index variables i and j.

in a1(i,j) ...

For each VP P(i, j) in a1, these variables are bound to i and j, respectively. Index variables can be used in exactly the same way as ordinary variables. For example, if we regard the variable a in a1 as a 2-D square matrix, the following statement assigns to each element of a the sum of the row number and the column number of the element.

in a1(i,j) a = i + j;

Here are more assignment examples in an in statement.

in a1(i,j) {
    a = x;
    b = b@(i,0);
    a = a@(j,i);
}

By the first assignment, each VP in a1 assigns the value of the mono variable x to its own a. This is a broadcast operation from P_root to the VPs in a1. By the next assignment statement, each P(i, j) assigns the value of b at P(i, 0) to its own b. This can be regarded as a multicast operation from the left-most column to each row. The last assignment transposes the matrix a.

When a VP accesses a variable in another VP, the operation is called remote access. A remote access of a variable is performed by specifying another VP as the field index after the variable name. Any value is allowed as the field index as long as the value specifies a VP correctly. This implies that VPs can perform remote accesses of a random pattern. However, random remote accesses are expensive on most machines. It is recommended to use the network of each field for better performance. For this purpose, NCX provides several macros with which the programmer can specify VPs that are connected directly with each VP.

There are some restrictions on the use of NCX statements, as will be mentioned later, but any valid statement can be used in the body of an in statement. In particular, an in statement can be nested within another in statement. Incidentally, an in statement does not block the scope of a variable that is declared outside the in statement. Such a variable can be referenced from inside the in statement.

4.2. Statements

Statements in NCX are:

labeled statement,

expression statement,

compound statement,

and statements that begin with the following key words:

if while for do switch break continue return goto in all spawn

Among these, statements that begin with in, all, and spawn are extensions of NCX. The other statements are defined also in C. They behave in exactly the same way as in C when the current field is mono. In this sense, these NCX statements are upward compatible with those in the C language.

When the current field is other than mono, these statements have similar meanings as in other data- parallel C languages. For example, an if statement

if ( expr ) stat1 else stat2

is used to select some of the currently active VPs. All active VPs evaluate expr first. Then only those VPs whose values are true remain active and execute stat1. After the execution of stat1, only those VPs whose values were false become active and execute stat2. After the execution of stat2, the activity before the if statement will be resumed. For example, the following if statement doubles the values of a if they are at least 100, and sets 100 to a otherwise.

in a1
    if (a >= 100)
        a *= 2;
    else
        a = 100;

With a while statement,

while ( expr ) stat

each active VP executes the body stat repeatedly, while it satisfies the condition expr. All active VPs evaluate expr first. Those VPs whose values are false become inactive until the other VPs finish execution of the while statement. Those VPs whose values are true execute the body stat and then evaluate the condition expr again. This process repeats as long as there remain VPs whose values of expr are true. The execution of the entire while statement ends when all VPs that evaluate expr get false values. After that, the activity before the while statement will be resumed. The other iteration statements are defined similarly.

goto statements are allowed only when the current field is mono. This constraint comes from the nature of the data-parallel model. Similarly, some of the other statements have natural constraints on their use. One such constraint is that a break statement cannot break from within an in statement to the outside.

An all statement

all stat

activates all VPs in the current field to execute stat. After all VPs have finished the execution of stat, the activity before the all statement will be resumed. The third extended statement spawn will be explained in a later section.


5. Assignments and Reductions

Each assignment expression is executed in two steps. First, all active VPs evaluate the right-hand side. After all VPs have finished the evaluation, each VP assigns its value to the destination. If no two VPs assign to the same destination, then the assignment is a parallel version of the ordinary C assignment. If, however, more than one VP assigns to the same destination, then the assignments are performed one after another, in an unspecified order. As a result, one of the values will become the value of the destination after the assignment.

For the assignment statements below, the destinations are shared by more than one VP.

in a1(i,j) {
    x = a;
    x = 1;
    v@(i) = a;
}

By the first assignment statement, all active VPs assign to the same mono variable x. Since the assignments to x are performed one after another, the value of a in one of the active VPs will become the value of x after the assignment. Thus assignment expressions in NCX can be used for data selection among VPs. This feature is useful for applications that select one of the solutions that VPs have found independently. As with this assignment, when all VPs assign to the same destination, only a single assignment operation is necessary. In particular, assignments to mono variables can be detected at compile time, and thus the programmer may assume that only one assignment operation will actually be performed.

The destination of the second assignment statement is also a mono variable. Since all VPs are going to assign 1, the value of x always becomes 1. Again, the programmer may assume that only one assignment operation will actually be performed. If we regard a as a matrix and v as a column vector, then the last assignment statement assigns one of the values of a in each row i to the i-th element of v. That is, this assignment statement performs row-wise selections in parallel.

Some assignment operators, such as +=, perform an arithmetic operation (an addition in the case of +=) before assignment. Such an operator is "atomic" in the sense that, if two VPs share the same destination, then either VP performs the operation first while the other VP is waiting. Thus, if a destination of += is shared by multiple VPs, then all the values of the VPs are added to the destination. In other words, the sum of the values will be added to the destination.

in a1(i,j) {
    x += a;
    x += 1;
    v@(i) += a;
}

The first assignment statement adds the sum of the values of a to the mono variable x, and the second assignment statement adds the number of active VPs to x. The last assignment statement adds the sum of the values of a in each row to the corresponding element of v.

In NCX, some of the assignment operators can be used as prefix operators. When used as prefix operators, they perform so-called reduction operations, i.e., they combine the values in all active VPs to produce a single value. For example, by

in a1(i,j) {
    x = (+= a) / (+= 1);
}

the average of the values of a over all active VPs will be assigned to x.

6. The spawn Statement

spawn is a construct to expand the rank of the current field temporarily. By expanding the rank, more VPs become active and, thus, more parallelism is obtained.

As an example of spawn, we will show the code to multiply 10 x 10 square matrices a and b. The result is stored in the third matrix c.

in a1(i,j) {
    c = 0;
    spawn(k:10)
        c@(i,j) += a@(i,k) * b@(k,j);
}

In this example, spawn expands the rank of the current field from the 2-D field a1 to a 3-D field. For each active P(i, j) in a1, the VPs P(i, j, k) (0 ≤ k < 10) become active during the spawn statement. As a result, the computation is performed by a 3-D field. k is the index variable for the third dimension.

7. Pointers and Arrays

When a variable is allocated to a field, the address of the variable in each VP is unspecified in NCX. The variables may be allocated at the same location for all VPs, or they may be allocated at different locations in different VPs. Therefore, in order to obtain the address of a variable with the address operator &, the intended VP, to which the variable is allocated, must be specified in addition to the name of the variable. The same syntax as for variable references is used for this purpose. That is, the address of an NCX variable can be obtained by the following expression.

& identifier @ ( expr1, ..., exprn )

As in C, the expression to access an array element

pointer-expr [ expr ]

is equivalent to

* ( ( pointer-expr ) + ( expr ) )

in NCX. This equivalence relation is maintained even when the expression is used as an lvalue, e.g., as the left-hand side of an assignment.

To access a location that is pointed to by a pointer, the programmer has to specify which VP the pointer points into, since an NCX pointer may not contain that information. NCX determines the VP from the program text, by a certain rule. The determined VP is called the home processor of the pointer expression. Roughly speaking, the home processor is determined by the field index in the pointer expression. For references and assignments via a pointer variable, the home processor is the VP to which the pointer variable is allocated. For example, suppose the pointer variable p is declared as:

int *p on a1;

Then the home processor of the right-hand side of

x = *p@(5);

is P(5). Therefore, this assignment statement first obtains the value of the pointer variable p in P(5), and then assigns the value at the location in P(5) to which the pointer points.

*p@(1) = 3;

first obtains the value of the pointer variable p in P(1) and then assigns 3 to the location of P(1) to which the pointer points.

8. Functions

A function call is performed by all active VPs in parallel. The invoked function and the number of arguments must be the same for all VPs, but the argument values may be different among VPs. The syntax of function calls is the same as in C, and a function call may appear wherever a function call is allowed in C.

When a function is invoked while the current field is F, the function is said to be "called from F." NCX functions are classified into three categories according to the fields that are allowed to call the function.

1. functions that can be called from any field

2. functions that can be called only from a cer- tain field

3. functions that can be called from fields of a certain base-field

The header of the function definition determines the category to which the function belongs.

8.1. Functions that can be called from any field

A function that is defined with a function header with the syntax of C can be called from any field. For example,

int max(x, y)
int x, y;
{
    if (x >= y)
        return (x);
    else
        return (y);
}

This function max can be called from any field.

All standard library functions in C are also defined in NCX. Some of them, namely arithmetic functions, are defined so that they can be called from any field. For instance, sin is defined as follows (in the same manner as in C), and thus can be called from any field.

double sin(double x)
{
    ...
}

8.2. Functions that can be called only from a certain field

If a field name is specified in the function header, then the function can be called only from the specified field.

int matmul10(a, b) in a1(i,j)
int a, b;
{
    int c = 0;
    spawn(k:10)
        c@(i,j) += a@(i,k) * b@(k,j);
    return (c);
}

This function performs the matrix multiplication we have already shown as an example of spawn. This function is intended only for a1, and cannot be called from any other field. i and j are index variables that are similar to those declared in in statements, and are used to identify VPs in the body of the function.

Some of the standard library functions of NCX, namely I/O functions, are intended only for mono. For instance, getc is defined as follows, and can be called only from the field mono.

int getc(FILE *stream) in mono
{
    ...
}

8.3. Functions that can be called from fields of a certain base-field

The above function matmul10 can be called only from a1. By modifying the function header, we can obtain a similar function which can be called from any field that is arranged in a 2-D square matrix.

int matmul(a, b)
in field f(i:?n, j:n) on mesh
int a, b;
{
    int c = 0;
    spawn(k:n)
        c@(i, j) += a@(i, k) * b@(k, j);
    return (c);
}

The code after in in the second line is a field pattern. Only those fields that match this pattern can invoke this function. f is the name that can be used to denote the field that actually invokes the function. f is not used in this example, but it could be used in the same way as ordinary field names, in such places as in statements (to specify the new current field) and variable declarations (to specify the field to which the variables are allocated). The part after f, enclosed in parentheses, specifies that only those mesh fields that are arranged in a 2-D square matrix match the pattern. ?n means that the size of the first dimension is arbitrary, and declares a variable n which is used to refer to the actual size of the first dimension. The size of the second dimension is specified to be n, i.e., the same as the size of the first dimension. The variable n is also used in the spawn statement to specify the size of the extended dimension.


9. Conclusions

We have introduced NCX, an extended C language for data-parallel programming, and presented its major extended features. The basic idea of NCX is quite simple. It assumes an infinite set of VPs, each having the power to execute the full-set C language. Programming in NCX means giving programs to VPs. Based on this simple model, several extensions for data-parallel programming are made so that they follow the design principles of the base language C. In spite of the simplicity of the base idea, NCX provides sufficient mechanisms for data-parallel programming. Our experiences show that expert C programmers can easily shift to NCX.

Currently, several language processors of NCX are being developed for many machines with different architectures, including SIMD machines [13], distributed-memory MIMD machines [3], vector processors, shared-memory parallel machines, and engineering workstations. These implementations use a common preprocessor, called NICS [12] (NCX Intermediate Code Synthesizer), and translate NCX programs into C language programs for the target machines. This approach reduces the cost of implementation to a great extent.

For libraries, all standard library functions of C are also defined in NCX. However, as already mentioned, the standard I/O libraries of C are intended for mono and can perform only sequential I/O. Parallel I/O libraries for NCX are currently being designed.

REFERENCES

1. R. Bagrodia, K. M. Chandy and E. Kwan. UC: A Language for the Connection Machine. Proceedings Supercomputing '90, pp. 525-534 (1990).

2. P. J. Hatcher and M. J. Quinn. Data-Parallel Programming on MIMD Computers. MIT Press (1991).

3. H. Ishihata, T. Horie, S. Inano, T. Shimizu and S. Kato. An Architecture of Highly Parallel Computer AP1000. IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing, pp. 13-16 (1991).

4. B. W. Kernighan and D. M. Ritchie. The C Programming Language, second edition, Prentice-Hall (1988).

5. MasPar Computer Corporation. MasPar Parallel Application Language (MPL) User Guide, Document Part Number 9302-0100 (1991).

6. S. Prakash, M. Dhagat and R. Bagrodia. Synchronization Issues in Data-Parallel Languages. Languages and Compilers for Parallel Computing, Springer Lecture Notes in Computer Science, pp. 76-95 (1993).

7. J. R. Rose and G. L. Steele, Jr. An Extended C Language for Data Parallel Programming. Proceedings of the 2nd International Conference on Supercomputing (1987).

8. R. M. Stallman. Using and Porting GNU CC. Free Software Foundation, Inc. (1990).

9. Thinking Machines Corporation. C* Pro- gramming Guide (1993).

10. W. F. Tichy, M. Philippsen and P. Hatcher. A critique of the programming language C*. Commun. ACM, 35(6), pp. 21-24 (1992).

11. T. Yuasa et al. The Data-Parallel C Language NCX Language Specification (Version 3). Toyohashi University of Technology (1993).

12. T. Yuasa, K. Kijima and Y. Konishi. The Data-Parallel C Language NCX and Its Implementation Strategies. Theory and Practice of Parallel Programming, Springer LNCS 907, pp. 433-456 (1995).

13. T. Yuasa, M. Matsuda and T. Kijima. SM-1 and its language systems. In: Parallel Language and Compiler Research in Japan, edited by A. Nicolau, M. Sato and L. Bic, Kluwer Academic Press (to appear).

AUTHORS

Taiichi Yuasa received his B.S., M.S. and Ph.D. degrees in 1977, 1979 and 1987, respectively, all from Kyoto University, Kyoto, Japan. He joined the faculty of the Research Institute for Mathematical Sciences, Kyoto University, in 1982. He is currently a Professor at Toyohashi University of Technology, Toyohashi, Japan. His current areas of interest include symbolic computation and massively parallel computation. Dr. Yuasa is a member of ACM; IEEE; the Information Processing Society of Japan; the Institute of Electronics, Information and Communication Engineers; and the Japan Society for Software Science and Technology.


Toshiro Kijima received his B.E. and M.S. degrees in 1989 and 1991, respectively, both from Toyohashi University of Technology (TUT), Japan. He is a technical official in Computer Science at TUT. His current areas of interest include massively parallel computers and programming languages.

Yutaka Konishi received his B.S. and M.S. degrees in 1986 and 1994, respectively, both from Toyohashi University of Technology, Japan. He was at Nikko Securities Co., Ltd., Tokyo, from 1986 to 1989. He joined the Department of Electronic Engineering, Aichi College of Technology, Japan, in 1990. His current research interests include parallel computers and programming languages.
