THE MATRIX & THE SOCIAL NETWORK
DAVID F. GLEICH · PURDUE UNIVERSITY, COMPUTER SCIENCE

What the matrix can tell us about the social network


DESCRIPTION

In this talk, I give a high-level picture of my research, showing how treating social network problems as matrix computations is a productive line of work.


Page 1: What the matrix can tell us about the social network

THE MATRIX & THE SOCIAL NETWORK

DAVID F. GLEICH PURDUE UNIVERSITY, COMPUTER SCIENCE

Page 2: What the matrix can tell us about the social network

RÉSUMÉ
Undergraduate: Harvey Mudd College, joint CS/Math; internships at Microsoft and Yahoo!
Graduate: Stanford University, Computational and Mathematical Engineering; internships at Intel, Yahoo!, Microsoft, and the Library of Congress
Postdoc: University of British Columbia
Postdoc: Sandia National Laboratories, John von Neumann Fellow
Faculty: Purdue University, Computer Science

Page 3: What the matrix can tell us about the social network

Copyright Warner Brothers

Page 4: What the matrix can tell us about the social network

Copyright Warner Brothers

“The series depicts a cyberpunk story incorporating numerous references to philosophical and religious ideas.” --Wikipedia

Page 5: What the matrix can tell us about the social network

Copyright Warner Brothers

Page 6: What the matrix can tell us about the social network

Copyright Warner Brothers

Physical simulation

Page 7: What the matrix can tell us about the social network

Matrix computations are the heart (and not brains) of modern physical simulation.

Page 8: What the matrix can tell us about the social network

Matrix computations

Physics · Statistics · Engineering · Graphics · Databases · …

Page 9: What the matrix can tell us about the social network

Matrix computations

A = \begin{bmatrix}
A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\
A_{2,1} & A_{2,2} & \cdots & \vdots \\
\vdots  &         & \ddots & A_{m-1,n} \\
A_{m,1} & \cdots  & A_{m,n-1} & A_{m,n}
\end{bmatrix}

Ax = b \qquad \min \|Ax - b\| \qquad Ax = \lambda x

Linear systems · Least squares · Eigenvalues
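To make the three problems concrete, here is a minimal NumPy sketch (the matrices and right-hand sides are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear system: solve Ax = b for square, nonsingular A.
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
x = np.linalg.solve(A, b)

# Least squares: minimize ||Ax - b|| for a tall matrix.
A_tall = rng.standard_normal((20, 5))
b_tall = rng.standard_normal(20)
x_ls, residual, rank, sv = np.linalg.lstsq(A_tall, b_tall, rcond=None)

# Eigenvalues: find lambda and x with Ax = lambda * x.
lam, vecs = np.linalg.eig(A)
```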

Page 10: What the matrix can tell us about the social network

“… greed, obsession, unpredictability …” --Wikipedia

Copyright Columbia Pictures

Page 11: What the matrix can tell us about the social network

The power of connections

Page 12: What the matrix can tell us about the social network

The matrix is a powerful and productive paradigm for studying networks of connections.

Page 13: What the matrix can tell us about the social network

Matrix computations

A = \begin{bmatrix}
A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\
A_{2,1} & A_{2,2} & \cdots & \vdots \\
\vdots  &         & \ddots & A_{m-1,n} \\
A_{m,1} & \cdots  & A_{m,n-1} & A_{m,n}
\end{bmatrix}

Ax = b \qquad \min \|Ax - b\| \qquad Ax = \lambda x

Linear systems · Least squares · Eigenvalues

Page 14: What the matrix can tell us about the social network

RAPr on Wikipedia: top pages by the expected PageRank E[x(A)] and by its standard deviation Std[x(A)]:

E[x(A)]            Std[x(A)]
United States      United States
C:Living people    C:Living people
France             C:Main topic classif.
United Kingdom     C:Contents
Germany            C:Ctgs. by country
England            United Kingdom
Canada             France
Japan              C:Fundamental
Poland             England
Australia          C:Ctgs. by topic


A new matrix-based sensitivity analysis of Google’s PageRank.

Presented at WAW2007 and WWW2010. Published in the J. Internet Mathematics.

Led to new results on uncertainty quantification in physical simulations, published in SIAM J. Matrix Analysis and SIAM J. Scientific Computing. Patent pending. Improved web-spam detection!

Collaborators: Paul Constantine, Gianluca Iaccarino (physical simulation)
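RAPr treats the PageRank damping parameter as a random variable and asks for the expectation and standard deviation of the PageRank vector. Here is a minimal Monte Carlo sketch of that idea; the 4-node graph and the Beta(17, 3) damping distribution are assumptions for illustration, not the paper's quadrature-based methods:

```python
import numpy as np

def pagerank(P, alpha, tol=1e-10):
    """Power iteration: x = alpha*P*x + (1 - alpha)*v with uniform v.
    P must be column-stochastic."""
    n = P.shape[0]
    x = np.full(n, 1.0 / n)
    while True:
        x_next = alpha * (P @ x) + (1.0 - alpha) / n
        if np.abs(x_next - x).sum() < tol:
            return x_next
        x = x_next

# Toy column-stochastic transition matrix (made up for illustration).
P = np.array([[0.0, 0.5, 0.0, 0.0],
              [1.0, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 1.0],
              [0.0, 0.0, 0.5, 0.0]])

# Sample alpha ~ Beta(17, 3) (mean 0.85, an assumed choice) and
# estimate E[x(alpha)] and Std[x(alpha)] by Monte Carlo.
rng = np.random.default_rng(0)
xs = np.array([pagerank(P, a) for a in rng.beta(17.0, 3.0, size=500)])
print("E[x]:  ", xs.mean(axis=0))
print("Std[x]:", xs.std(axis=0))
```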

Page 15: What the matrix can tell us about the social network

Fast matrix computations for Katz scores and commute times.

Presented at WAW2010. Published in the J. Internet Mathematics.

Reduced computation time by orders of magnitude!


Collaborators Chen Greif, Laks V. S. Lakshmanan, Francesco Bonchi, Pooya Esfandiar
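For context, the Katz score counts damped paths of all lengths between nodes: in matrix form, K = (I − αA)^{−1} − I. A direct sketch computing the scores from one node with a single sparse solve; the toy graph is made up, and the paper's algorithms achieve their speedups by avoiding exactly this kind of full solve:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Toy undirected graph (made up for illustration).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
A = sp.lil_matrix((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A = A.tocsc()

# The Katz series sum_{t>=1} alpha^t A^t converges for
# alpha < 1 / spectral_radius(A).
alpha = 0.9 / np.abs(np.linalg.eigvals(A.toarray())).max()

# Katz scores from node 0: the first column of (I - alpha*A)^{-1} - I,
# obtained with one sparse linear solve.
e0 = np.zeros(n)
e0[0] = 1.0
k0 = spsolve(sp.eye(n, format="csc") - alpha * A, e0) - e0
print(k0)  # k0[j] = damped count of paths from node 0 to node j
```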

Page 16: What the matrix can tell us about the social network

Tall-and-skinny QR factorizations on MapReduce architectures. Overlapping clusters for distributed network computations.

mrtsqr – summary of parameters
Blocksize: how many rows to read before computing a QR factorization, expressed as a multiple of the number of columns (see paper)
Splitsize: the size of each local matrix
Reduction tree: the number of reducers and iterations to use


[Diagram: Serial TSQR dataflow on MapReduce. Each mapper computes a local QR of its row blocks (A1, A2, …) and emits the R factors (S(1), S(2)); these are shuffled and combined by reducers over iterations 1–3.]
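The dataflow in that diagram reduces to a simple idea: QR-factor each row block, stack the small R factors, and factor again. A serial NumPy sketch of the two-level scheme (the block size and matrix shape are arbitrary choices for illustration):

```python
import numpy as np

def tsqr_r(A, block_rows):
    """R factor of a tall-and-skinny matrix via two-level TSQR:
    local QR on each row block, then one QR of the stacked Rs."""
    Rs = [np.linalg.qr(A[i:i + block_rows], mode="r")
          for i in range(0, A.shape[0], block_rows)]
    return np.linalg.qr(np.vstack(Rs), mode="r")

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 10))      # tall and skinny
R = tsqr_r(A, block_rows=250)
R_direct = np.linalg.qr(A, mode="r")
# R is unique up to the signs of its rows:
print(np.allclose(np.abs(R), np.abs(R_direct)))
```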

Network Alignment

[Diagram: aligning the vertices of graph A with the vertices of graph B through a set L of candidate matches; a "square" is a pair of matched edges that overlap.]

Method        weight   overlap   how it's computed
upper bound   60,120   17,571    solving an LP – 1 day
NetAlignBP    56,361   15,214    iterative updates (think matrix-vector multiplies) – 10 min
rounded LP    46,270   17,251    solving an LP – 1 day

Note: For these results, A has approx. 200k vertices, B has approx. 300k vertices, and L has 5m edges. This setup yields a 5m-variable integer QP.


Algorithms for large sparse network alignment problems.

Rank aggregation via skew-symmetric matrix completion.

RANK AGGREGATION VIA NUCLEAR NORM MINIMIZATION
DAVID F. GLEICH · PURDUE, LEK-HENG LIM · U. CHICAGO

1. THE IDEA
A classic data mining task is to determine the important items in a dataset. This is the problem of rank aggregation. Formally, given a series of votes on items by a group of voters, rank aggregation is the process of permuting the set of items so that the best item is first, the second best is next, and so on. These problems are difficult. Arrow's theorem (1950) shows that the ideal rank aggregation is impossible. Even a compromise proposed by Kemeny (1959), to take something like an average ranking, is NP-hard, as shown by Dwork et al. (2001).

But can we somehow relax the problem?

Numerical rank aggregations are intimately intertwined with skew-symmetric matrices. Suppose that each item is described by a score s_i, and we are able to collect comparisons between items. Assembling the comparisons into a matrix yields

Y_{i,j} = s_i - s_j, \quad \text{or} \quad Y = se^T - es^T,

which is a rank-2 skew-symmetric matrix.
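A quick numerical check of that identity with a made-up score vector:

```python
import numpy as np

s = np.array([3.0, 1.0, 4.0, 1.5, 9.0])   # made-up item scores
e = np.ones_like(s)
Y = np.outer(s, e) - np.outer(e, s)       # Y[i, j] = s[i] - s[j]

print(np.allclose(Y, -Y.T))               # True: skew-symmetric
print(np.linalg.matrix_rank(Y))           # 2
```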

OUTLINE

Ratings (= R) → (§2) → Pairwise comparisons (= Y) → (§3) → Ranking scores (= s) → (sorting) → Rank aggregations.

Using Y, the idea is to get s, and then rank the items. But not all comparisons may be available, nor will all comparisons be trustworthy. Thus, this is a matrix completion problem: given a measured Y with missing entries, find the true Y.

Contributions: (1) We propose a new method for computing a rank aggregation based on matrix completion, which is tolerant to noise and incomplete data. (2) We solve a structured matrix-completion problem over the space of skew-symmetric matrices. (3) We prove a recovery theorem detailing when our approach will work. (4) We perform a detailed evaluation of our approach with synthetic data and an anecdotal study with Netflix ratings. Below, we show how our methods (log-odds and arithmetic mean) improve on a mean-rating rank aggregation for Netflix.

Mean Rating                | Log-odds (all)             | Arithmetic Mean (30)
LOTR III: Return ...       | LOTR III: Return ...       | LOTR III: Return ...
LOTR I: The Fellowship ... | LOTR I: The Fellowship ... | LOTR I: The Fellowship ...
LOTR II: The Two ...       | LOTR II: The Two ...       | LOTR II: The Two ...
Lost: Season 1             | Star Wars V: Empire ...    | Lost: S1
Battlestar Galactica: S1   | Raiders of the Lost Ark    | Star Wars V: Empire ...
Fullmetal Alchemist        | Star Wars IV: A New Hope   | Battlestar Galactica: S1
Trailer Park Boys: S4      | Shawshank Redemption       | Star Wars IV: A New Hope
Trailer Park Boys: S3      | Star Wars VI: Return ...   | LOTR III: Return ...
Tenchi Muyo! ...           | LOTR III: Return ...       | Raiders of the Lost Ark
Shawshank Redemption       | The Godfather              | The Godfather
Veronica Mars: S1          | Toy Story                  | Shawshank Redemption
Ghost in the Shell: S2     | Lost: S1                   | Star Wars VI: Return ...
Arrested Development: S2   | Schindler's List           | Gladiator
Simpsons: S6               | Finding Nemo               | Simpsons: S5
Inu-Yasha                  | CSI: S4                    | Schindler's List

2. PAIRWISE AGGREGATION
A rating matrix R collects R_{u,i}, the rating by user u on item i. Pairwise aggregations produce a skew-symmetric comparison matrix Y. The advantages of comparison matrices are discussed below. Common choices are:

Arithmetic mean. The arithmetic mean over all voters who have rated both i and j is
Y_{ij} = \frac{\sum_u (R_{u,i} - R_{u,j})}{\#\{u \mid R_{u,i}, R_{u,j} \text{ exist}\}}.
These are translation invariant.

Geometric mean. The (log) geometric mean over all voters who have rated both i and j is
Y_{ij} = \frac{\sum_u (\log R_{u,i} - \log R_{u,j})}{\#\{u \mid R_{u,i}, R_{u,j} \text{ exist}\}}.
These are scale invariant.

Binary comparison. Let Y^{(u)}_{ij} = \operatorname{sign}(R_{u,i} - R_{u,j}) be the user preference matrix. Averaging over users yields the difference between the probability that item i is preferred to j and the probability of the reverse:
Y_{ij} = \Pr\{u \mid R_{u,i} > R_{u,j}\} - \Pr\{u \mid R_{u,i} < R_{u,j}\}.
These are invariant to any monotone transformation.

Logarithmic odds ratio. This idea translates binary comparison to a logarithmic scale:
Y_{ij} = \log \frac{\Pr\{u \mid R_{u,i} \ge R_{u,j}\}}{\Pr\{u \mid R_{u,i} \le R_{u,j}\}}.
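A minimal sketch of the arithmetic-mean and binary-comparison aggregations on a tiny made-up ratings matrix (NaN marks a missing rating):

```python
import numpy as np

# users x items ratings, NaN = not rated (made up for illustration).
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 4.0, 1.0],
              [5.0, 2.0, 3.0]])

m = R.shape[1]
Y_am = np.zeros((m, m))   # arithmetic-mean comparisons
Y_bc = np.zeros((m, m))   # averaged binary comparisons
for i in range(m):
    for j in range(m):
        both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
        if i != j and both.any():
            diff = R[both, i] - R[both, j]
            Y_am[i, j] = diff.mean()
            Y_bc[i, j] = np.sign(diff).mean()

print(np.allclose(Y_am, -Y_am.T))   # both matrices are skew-symmetric
```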

Whereas user-item rating matrices are notoriously incomplete, item-item comparison matrices are actually nearly dense. For Netflix, the matrix R is 1% filled, whereas Y is 99.77% filled. The number of users supporting each comparison is histogrammed below.

[Histogram: occurrences (y-axis, log scale) versus number of pairwise comparisons per item pair (x-axis, log scale, 10^0 to 10^6).]

3. SKEW-SYMMETRIC MATRIX COMPLETION
Let Y_{i,j} be known for (i, j) \in \Omega. Our goal is to find the simplest skew-symmetric matrix that matches the data. We use rank as a measure of complexity:

\text{minimize } \operatorname{rank}(X) \quad \text{subject to } X = -X^T \text{ and } X_{i,j} = Y_{i,j} \text{ for all } (i, j) \in \Omega.

This problem is also NP-hard. However, using the nuclear norm instead of rank is a common heuristic:

\text{minimize } \|X\|_* \quad \text{subject to } X = -X^T \text{ and } X_{i,j} = Y_{i,j} \text{ for all } (i, j) \in \Omega.

This program is convex. The nuclear norm is the sum of singular values and is the largest convex underestimator of rank on the unit ball.

We further relax this problem and study one tolerant to noise. Let b = \operatorname{vec}(Y_\Omega), and let \mathcal{A}(X) = \operatorname{vec}(X_\Omega). Then we study the LASSO problem:

\text{minimize } \|\mathcal{A}(X) - b\|_2 \quad \text{subject to } \|X\|_* \le \lambda \text{ and } X = -X^T.
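As a small sketch of the exact-constraint convex program above, one can hand it to cvxpy (an assumed dependency; the scores, size, and sampling rate are made up). With enough observed entries it recovers the rank-2 Y:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 15
s = rng.standard_normal(n)
e = np.ones(n)
Y = np.outer(s, e) - np.outer(e, s)      # true rank-2 skew-symmetric matrix

# Observe roughly 40% of the entries (the index set Omega).
rows, cols = np.where(rng.random((n, n)) < 0.4)

X = cp.Variable((n, n))
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [X == -X.T, X[rows, cols] == Y[rows, cols]])
prob.solve()
print(np.abs(X.value - Y).max())  # small: Y is (approximately) recovered
```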

Jain et al. (NIPS 2010) proposed the SVP algorithm for solving this problem without the skew-symmetric constraint. After studying alternatives, we discovered that their algorithm will give us the skew-symmetric constraint for free, due to a particular property of the SVD of a skew-symmetric matrix:

THEOREM. Let A = -A^T be an n \times n skew-symmetric matrix with eigenvalues i\lambda_1, -i\lambda_1, i\lambda_2, -i\lambda_2, \ldots, i\lambda_j, -i\lambda_j, where \lambda_i > 0 and j = \lfloor n/2 \rfloor. Then the SVD of A is given by

A = U \,\operatorname{diag}(\lambda_1, \lambda_1, \lambda_2, \lambda_2, \ldots, \lambda_j, \lambda_j)\, V^T \quad (1)

for U and V given in the proof.

SUMMARY: “Even rank-k truncated SVDs of skew-symmetric matrices are also skew-symmetric.”

For us, this theorem says that the SVP algorithm from Jain et al. will produce a skew-symmetric output and satisfy the skew-symmetric constraint.
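A numerical check of that property (random skew-symmetric matrix; sizes made up). Keeping a whole singular-value pair, i.e., an even rank, is what preserves skew-symmetry:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B - B.T                              # skew-symmetric

U, sig, Vt = np.linalg.svd(A)
print(sig)                               # singular values come in pairs

k = 2                                    # keep one full pair
A_k = U[:, :k] * sig[:k] @ Vt[:k]        # rank-2 truncated SVD
print(np.allclose(A_k, -A_k.T))          # truncation is still skew-symmetric
```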

One final piece of the algorithm involves extracting the score vector s from the completed matrix Y. For this, we use the Borda count s = (1/n) Ye.

The Rank Aggregation Algorithm
INPUT: rating matrix R, minimum comparisons c
1: Get Y from R via §2.
2: Drop entries in Y with < c users.
3: Let Ω be the index set for all retained entries in Y and b be the values for these entries.
4: U, S, V = SVP(index set Ω, values b, rank 2)
5: Compute s = (1/n) U S V^T e
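A compact sketch of steps 3–5, with SVP written as the simplest projected iteration (gradient step on the observed entries, then project to rank 2); the step size, iteration count, and synthetic Y are all assumptions rather than the paper's tuned implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
s_true = rng.standard_normal(n)
e = np.ones(n)
Y = np.outer(s_true, e) - np.outer(e, s_true)

# Omega: a symmetric random set of observed entries.
mask = rng.random((n, n)) < 0.3
mask &= mask.T
np.fill_diagonal(mask, False)

def svp(mask, Y_obs, rank=2, iters=200):
    """Singular value projection: unit-step gradient update on the
    observed entries, then project onto rank-`rank` matrices via SVD."""
    X = np.zeros_like(Y_obs)
    for _ in range(iters):
        X = X - (X - Y_obs) * mask
        U, sig, Vt = np.linalg.svd(X)
        X = U[:, :rank] * sig[:rank] @ Vt[:rank]
    return X

Y_hat = svp(mask, Y * mask)
s_hat = Y_hat @ e / n                    # Borda count s = (1/n) Y e
print(np.corrcoef(s_hat, s_true)[0, 1])  # close to 1
```

Because the sampled entries and the iterates stay skew-symmetric, the rank-2 truncation in each step is skew-symmetric too, which is exactly what the theorem above guarantees.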

5. RECOVERABILITY
A hallmark of matrix completion results is recovery theorems that show when the solution of the convex heuristic yields the true solution. Using a recent matrix-completion theorem from David Gross (2010), we prove:

THEOREM. Let s be centered, i.e., s^T e = 0. Let Y = se^T - es^T, where \nu = \max_i s_i^2 / (s^T s) and \mu = ((\max_i s_i) - (\min_i s_i)) / \|s\|. Also, let \Omega \subset H be a random set of elements with size |\Omega| \ge O(2n\gamma(1 + \beta)(\log n)^2), where \gamma = \max((n\mu + 1)/4, n\nu^2). Then the solution of

\text{minimize } \|X\|_* \quad \text{subject to } \operatorname{trace}(X^* W_a) = \operatorname{trace}((iY)^* W_a) \text{ for all } W_a \in \Omega

is equal to iY with probability at least 1 - n^{-\beta}.

SUMMARY “About n logn comparisons for recovery.”

As stated, this theorem is not useful because we only need a spanning set of measurements from Y to generate the score vector. Instead, this theorem gives intuition for the noisy recovery problem. We test this by generating a skew-symmetric matrix Y from a score vector s and determining how many comparisons we need before we are able to find the true vector s using the SVP algorithm. We then do the same experiment after adding Gaussian noise to the measurements. We find a threshold at n log n measurements.

[Figure: fraction of trials recovered versus number of samples for noiseless recovery (left) and noisy recovery (right), with reference lines at 5n, 2n log(n), and 6n log(n) samples; the noisy panel sweeps noise levels from 0.01 to 0.05.]

6. SYNTHETIC RESULTS
We also test an item response theory model from Ho and Quinn, 2008. Here, we try to find item scores from synthetic user ratings. Our algorithm outperforms the mean rating in a Kendall-τ correlation metric.

[Figure: median Kendall's τ versus error level for the nuclear-norm rating and the mean rating, across several noise settings.]

Please email David Gleich ([email protected]) with questions. Our code is available online: Google “skew-nuclear gleich” to find it. KDD2011, 23 August 2011.

Page 17: What the matrix can tell us about the social network

My research

Presentations on my website. Implementations on my website.

Even more on my website!

www.cs.purdue.edu/homes/dgleich

@dgleich on Twitter

Or we "can "

chat "this "

week.