

Volume 15, Number 2 INFORMATION PROCESSING LETTERS 6 September 1982

FAST ALGORITHM FOR SPARSE MATRIX MULTIPLICATION

Amir SCHOOR School of Mathematics, Tel Aviv University, Ramat-Aviv, Tel Aviv, Israel

Received 6 August 1980; revised version received 4 January 1982

Keywords: Analysis of algorithm, computational complexity, matrix multiplication

In this note we present a fast algorithm for the multiplication of two sparse matrices, whose average time complexity is an order of magnitude better than that of standard known algorithms (cf. [1]). Specifically, let A be an M × N matrix whose density (i.e., the fraction of nonzero elements) is D1, and let B be an N × K matrix with density D2. Our algorithm computes C = A × B on the average in time O(D1D2MNK), which improves on the worst-case bound of O((D1 + D2)MNK) for the standard algorithm [1].

The data structure that we use to represent a sparse matrix A is an orthogonal linked list (cf. [2]). That is, each nonzero element Ai,j of A is represented by a record a(i, j) whose fields are: the value of Ai,j, the row index i, the column index j, and two pointers to the next nonzero elements in its row and its column, respectively. The last link in each row or column is null. The addresses of the first nonzero element in each row (resp. column) of the matrix A are given by an array Row_A (resp. Column_A) of pointers. Thus an M × N matrix having DMN nonzero elements (i.e., having density D) will occupy 5DMN + M + N memory cells in this representation.
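As a concrete illustration, the orthogonal linked-list representation described above can be sketched in Python as follows. The class and function names are ours, not from the paper, and for brevity the sketch builds the structure from a dense array rather than inserting elements one at a time; each record carries the five fields counted above (value, two indices, two links).

```python
class Node:
    """One nonzero element A[i][j]: its value, its indices, and links
    to the next nonzero element in the same row and the same column."""
    def __init__(self, value, i, j):
        self.value = value
        self.i = i                 # row index
        self.j = j                 # column index
        self.next_in_row = None    # next nonzero in row i (null at end)
        self.next_in_col = None    # next nonzero in column j (null at end)

def build_orthogonal(dense):
    """Build the Row_A / Column_A pointer arrays from a dense list of
    lists, appending in increasing index order so every list is sorted."""
    m, n = len(dense), len(dense[0])
    row_head = [None] * m          # Row_A: first nonzero in each row
    col_head = [None] * n          # Column_A: first nonzero in each column
    row_tail = [None] * m
    col_tail = [None] * n
    for i in range(m):
        for j in range(n):
            if dense[i][j] != 0:
                node = Node(dense[i][j], i, j)
                if row_tail[i] is None:
                    row_head[i] = node
                else:
                    row_tail[i].next_in_row = node
                row_tail[i] = node
                if col_tail[j] is None:
                    col_head[j] = node
                else:
                    col_tail[j].next_in_col = node
                col_tail[j] = node
    return row_head, col_head
```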

In the standard method for sparse matrix multiplication, each element Ci,j of the product C = A × B of an M × N matrix A of density D1 by an N × K matrix B of density D2 is computed by simultaneously scanning the ith row of A and the jth column of B, looking for elements a(i, t) and b(t, j) having the same column index (resp. row index) t, and summing the products of the values of these 'matching elements'. This algorithm runs in time O((D1 + D2)MNK), since it performs that many index comparisons. However, the number of actual multiplications that has to be performed is much less on the average.
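The simultaneous scan of a row of A and a column of B can be sketched as follows, with each row or column represented as a sorted list of (index, value) pairs; `dot_merge` is an illustrative name of ours, not from the paper. Every iteration advances at least one of the two cursors, so the scan performs one index comparison per element, matching or not, which is the source of the O((D1 + D2)MNK) bound.

```python
def dot_merge(row_a, col_b):
    """Standard sparse dot product: simultaneously scan the sorted
    (column, value) pairs of a row of A and the sorted (row, value)
    pairs of a column of B, multiplying only entries whose index t
    matches.  Each loop iteration is one index comparison."""
    s = 0
    p = q = 0
    while p < len(row_a) and q < len(col_b):
        t_a, va = row_a[p]
        t_b, vb = col_b[q]
        if t_a == t_b:          # matching elements: multiply, advance both
            s += va * vb
            p += 1
            q += 1
        elif t_a < t_b:         # no match: advance the smaller index
            p += 1
        else:
            q += 1
    return s
```

For example, dot_merge([(0, 2), (3, 4)], [(1, 5), (3, 7)]) matches only index 3 and returns 28.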

In what follows we present a faster algorithm which avoids these useless time-consuming index comparisons, and thus uses only O(D1D2MNK) time on the average. In order to simplify our arguments, we will initially assume that the matrix A (resp. B) is uniformly sparse over its columns (resp. rows). That is, we assume that the density of nonzero elements in each column (resp. row) of A (resp. B) is D1 (resp. D2). In this 'uniform' case one can easily check that the number of actual multiplications that has to be performed is D1D2MNK. Our algorithm will make use of this fact, and will be able to avoid the additional unnecessary index comparisons, thus requiring only O(D1D2MNK) time. Its performance in the general nonuniform case will be discussed later in this note.

Our fast algorithm is based on the simple observation that as the matrices A and B are multiplied, each nonzero element Ai,t of A is multiplied by all the nonzero elements of the tth row of B, and only by them. Our algorithm proceeds by iterating over the rows of A. For each such row i it iterates over the list of its nonzero elements. For each such element Ai,t the algorithm scans the tth row of B and multiplies Ai,t by each of the nonzero elements Bt,j of this row. Each such product Ai,tBt,j is added to the (i, j)th entry of the output matrix C.

0020-0190/82/0000-0000/$02.75 © 1982 North-Holland

To perform these updates of C rapidly, we use an auxiliary array END_COL whose jth entry points to the current last nonzero entry of the jth column of C. Initially, each entry of END_COL is set to the null pointer, and C itself is empty. Suppose that the algorithm has already processed the first i - 1 rows of A; during this processing it has computed the nonzero entries in the first i - 1 rows of C. Currently C is maintained as a sequence of K linked lists, such that the jth such list represents the nonzero elements in the first (i - 1) places in column j of C, and such that END_COL[j] points to the last element in this list. The algorithm then iterates over the ith row of A. For each nonzero product of the form V = Ai,tBt,j it checks whether the element pointed to by END_COL[j] is already at the ith row of C. If so, V is added to the value of this element; otherwise, a new element is created, added to the end of the jth column list of C, its value is set to V, and END_COL[j] is reset to point to it.

When this part of the algorithm terminates, the nonzero entries of C have all been correctly computed, and are grouped into K separate linked lists, one per each column of C. A final pass of the algorithm is next required, in order to introduce orthogonal links to represent the row structure of C and to bring the representation of C into the standard sparse format.

In this final pass the column lists of C are scanned one after the other. During this pass we also maintain an auxiliary array END_ROW, whose ith entry points to the current last element in the ith row of C. Initially, END_ROW is set everywhere to the null pointer. Suppose that the algorithm has already scanned the first j - 1 columns of C, and is now scanning the jth column. For each nonzero entry Ci,j in this column, we examine p = END_ROW[i]. If p is null, then ROW_C[i] is set to point at Ci,j, since this is the first nonzero element in the ith row of C; otherwise we make Ci,j the next element in the ith row after p, and reset END_ROW[i] to Ci,j. At the end of this pass, C is represented in the required form.


Here is an informal, but more compact version of the algorithm:

(1) {Initialization}
for i := 1 to M do
    ROW_C[i] := END_ROW[i] := nil
for j := 1 to K do
    COLUMN_C[j] := END_COL[j] := nil

(2) {Compute the nonzero entries in C except for their row pointers}
for i := 1 to M do
    for each element Ai,t in the list ROW_A[i] do
        for each element Bt,j in the list ROW_B[t] do
            if the row of the element of C pointed to by END_COL[j] = i
            then add Ai,tBt,j to the value of Ci,j
            else create a new record Ci,j whose row is i, whose column is j and whose value is Ai,tBt,j; if END_COL[j] = nil, set COLUMN_C[j] to point at Ci,j; otherwise link Ci,j as the next element in the column of END_COL[j]. In either case reset END_COL[j] to point at Ci,j.

(3) {Fill in the row links in the records of C}
for j := 1 to K do
    for each element Ci,j in the list COLUMN_C[j] do begin
        if END_ROW[i] = nil then set ROW_C[i] to point at Ci,j
        else set the record pointed to by END_ROW[i] to point to Ci,j as the next element in its row;
        reset END_ROW[i] to point at Ci,j
    end
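Steps (1) and (2) above can be sketched in Python as follows. This is a simplified sketch of ours, not the paper's implementation: ordinary Python lists stand in for the linked column lists of C, with the last element of col_c[j] playing the role of END_COL[j]. Since the lists are already ordered by row, step (3), which only rethreads the orthogonal row links, is omitted here.

```python
def sparse_multiply(rows_a, rows_b, k):
    """Fast sparse multiplication, column-list form of C.
    rows_a[i] holds the nonzero (t, value) pairs of row i of A;
    rows_b[t] holds the nonzero (j, value) pairs of row t of B;
    k is the number of columns of B.  Returns col_c, where col_c[j]
    is the list of [row, value] records of column j of C, in row order."""
    col_c = [[] for _ in range(k)]            # COLUMN_C[j] as a Python list
    for i in range(len(rows_a)):
        for t, a in rows_a[i]:                # scan row i of A
            for j, b in rows_b[t]:            # scan row t of B
                # Is the last record of column j already at row i?
                if col_c[j] and col_c[j][-1][0] == i:
                    col_c[j][-1][1] += a * b  # accumulate into C[i][j]
                else:
                    col_c[j].append([i, a * b])  # new record for C[i][j]
    return col_c
```

Note that the inner body performs one multiplication and no index comparisons at all: the check against the tail of column j replaces the merge scan of the standard method.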

Next consider the general case in which the density of A (resp. B) is not necessarily uniform over its columns (resp. rows). By examining the preceding algorithm it is plain that its time complexity is O(R), where

R = Σ_{i=1}^{N} XiYi,

with Xi the number of nonzero entries in the ith column of A and Yi the number of nonzero entries in the ith row of B. Hence, the expected running time of the algorithm is proportional to the expected size of R. Let us make the following reasonable assumptions concerning the distribution of the nonzero elements of A and B:

(1) The distribution of the elements of A is independent of the distribution of the elements of B;


(2) The A-distribution is symmetric over the columns of A; likewise, the B-distribution is symmetric over its rows;

(3) The expected number of nonzero elements of A (resp. B) is D1MN (resp. D2NK).

Then the expected value of R is (E(X) denotes the mathematical expectation of X)

E(R) = Σ_{i=1}^{N} E(XiYi)

     = Σ_{i=1}^{N} E(Xi)E(Yi)    (by (1)).

But by (2) and (3), E(Xi) = D1M and E(Yi) = D2K for each i = 1,...,N. Hence

E(R) = D1D2MNK,

so that the average time complexity of the algorithm is indeed O(D1D2MNK), as asserted.
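The analysis above lends itself to a quick empirical check: with independently placed nonzeros, the average of R = Σ XiYi over random trials should approach D1D2MNK. The following simulation is ours, with parameters chosen purely for illustration.

```python
import random

def count_multiplications(a, b):
    """R = sum over t of Xt * Yt, where Xt is the number of nonzeros in
    column t of A and Yt the number in row t of B.  This is exactly the
    number of scalar multiplications the fast algorithm performs."""
    n = len(b)
    x = [sum(1 for row in a if row[t] != 0) for t in range(n)]   # Xt
    y = [sum(1 for v in b[t] if v != 0) for t in range(n)]       # Yt
    return sum(xt * yt for xt, yt in zip(x, y))

# Empirical check of E(R) = D1*D2*M*N*K on random 0/1 sparsity patterns.
random.seed(1)
M, N, K, d1, d2 = 30, 40, 20, 0.1, 0.2
trials = 200
total = 0
for _ in range(trials):
    a = [[1 if random.random() < d1 else 0 for _ in range(N)] for _ in range(M)]
    b = [[1 if random.random() < d2 else 0 for _ in range(K)] for _ in range(N)]
    total += count_multiplications(a, b)
# The two printed values should agree closely (here D1*D2*M*N*K = 480).
print(total / trials, d1 * d2 * M * N * K)
```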

As a final comment, we would like to emphasize the fact that the fast algorithm given above works continuously on rows of the matrix B, unlike the conventional matrix multiplication method, which works on columns of B. Thus, whenever one works with large matrices which are represented continuously by their rows, the fast algorithm can work faster than the conventional one, on non-sparse matrices as well as on sparse matrices.

Acknowledgement

It is a pleasure to thank Micha Sharir for his comments and for going over the manuscript.

References

[1] T. Donal and S.J. MacVeigh, Effect of data-representation on cost of sparse matrix operations, Acta Inform. 7 (1977) 361-394.

[2] D.E. Knuth, The Art of Computer Programming, Vol. 1: Fundamental Algorithms (Addison-Wesley, Reading, MA, 1969) p. 300 (contains the data structure used in this paper).
