In this talk, I give a high level picture of my research about how looking at social network problems as matrix computations is a productive line of work.
THE MATRIX & THE SOCIAL NETWORK
DAVID F. GLEICH PURDUE UNIVERSITY, COMPUTER SCIENCE
RÉSUMÉ
Undergraduate: Harvey Mudd College, joint CS/Math. Internships: Microsoft and Yahoo!
Graduate: Stanford University, Computational and Mathematical Engineering. Internships: Intel, Yahoo!, Microsoft, Library of Congress.
Postdoc: University of British Columbia.
Postdoc: Sandia National Laboratories, John von Neumann Fellow.
Faculty: Purdue University, Computer Science.
Copyright Warner Brothers
“The series depicts a cyberpunk story incorporating numerous references to philosophical and religious ideas.” --Wikipedia
Physical simulation
Matrix computations are the heart (and not the brains) of modern physical simulation.
Matrix computations
Physics · Statistics · Engineering · Graphics · Databases · …
Matrix computations
$$A = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & \vdots \\ \vdots & \ddots & \ddots & A_{m-1,n} \\ A_{m,1} & \cdots & A_{m,n-1} & A_{m,n} \end{bmatrix}$$

$$Ax = b \qquad \min \|Ax - b\| \qquad Ax = \lambda x$$
Linear systems · Least squares · Eigenvalues
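All three of these core computations are one-liners in a dense linear algebra library; a minimal NumPy sketch (the random matrices and right-hand sides are placeholders for illustration, not data from the talk):

```python
# The three core matrix computations: linear systems, least squares,
# and eigenvalues, on small random examples.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

# Linear system: solve Ax = b.
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)

# Least squares: min ||Tx - c|| for a tall matrix T.
T = rng.standard_normal((6, 3))
c = rng.standard_normal(6)
x_ls, *_ = np.linalg.lstsq(T, c, rcond=None)

# Eigenvalues: Ax = lambda x.
lam, V = np.linalg.eig(A)
assert np.allclose(A @ V[:, 0], lam[0] * V[:, 0])
```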
“… greed, obsession, unpredictability …” --Wikipedia
Copyright Columbia Pictures
The power of connections
The matrix is a powerful and productive paradigm for studying networks of connections.
RAPr on Wikipedia
Top pages by E[x(A)]: United States, C:Living people, France, United Kingdom, Germany, England, Canada, Japan, Poland, Australia
Top pages by Std[x(A)]: United States, C:Living people, C:Main topic classif., C:Contents, C:Ctgs. by country, United Kingdom, France, C:Fundamental, England, C:Ctgs. by topic
Gleich (Stanford), Random sensitivity, Ph.D. Defense
A new matrix-based sensitivity analysis of Google’s PageRank.
Presented at WAW2007 and WWW2010. Published in the J. Internet Mathematics.
Led to new results on uncertainty quantification in physical simulations, published in SIAM J. Matrix Analysis and SIAM J. Scientific Computing.
Patent pending.
Improved web-spam detection!
Collaborators: Paul Constantine, Gianluca Iaccarino (physical simulation)
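The idea behind the RAPr results above is to treat the PageRank teleportation parameter as a random variable A and study E[x(A)] and Std[x(A)]. A hedged sketch by plain Monte Carlo sampling on a toy graph (the graph, the Beta distribution for alpha, and the sample count are illustrative assumptions; the actual work uses more sophisticated quadrature):

```python
# Random-alpha PageRank: estimate the mean and standard deviation of the
# PageRank vector when the teleportation parameter alpha is random.
import numpy as np

# Column-stochastic transition matrix of a 4-node toy graph.
P = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 1.0],
    [0.0, 0.0, 0.5, 0.0],
])
n = P.shape[0]
v = np.ones(n) / n  # uniform teleportation vector

def pagerank(alpha):
    # Solve (I - alpha P) x = (1 - alpha) v directly (fine at this size).
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

rng = np.random.default_rng(1)
alphas = rng.beta(2.0, 1.0, size=2000)       # assumed distribution for alpha
X = np.array([pagerank(a) for a in alphas])  # one PageRank vector per sample
ex, std = X.mean(axis=0), X.std(axis=0)      # estimates of E[x(A)], Std[x(A)]
```

Note that the pages with the largest Std[x(A)] need not be those with the largest E[x(A)], which is exactly the contrast the Wikipedia lists above display.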
Fast matrix computations for Katz scores and commute times.
Presented at WAW2010. Published in the J. Internet Mathematics.
Reduced computation time by orders of magnitude!
Tweet along @dgleich
MAIN RESULTS
David F. Gleich (Sandia), ICME la/opt seminar
Collaborators: Chen Greif, Laks V. S. Lakshmanan, Francesco Bonchi, Pooya Esfandiar
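Katz scores are themselves a matrix computation: the scores of all nodes relative to a seed node i solve a single linear system. A minimal sketch (the toy graph and the choice of alpha are assumptions; the paper's point is computing such scores far faster than a direct solve):

```python
# Katz scores count weighted walks: K = sum_k alpha^k A^k. The scores
# against a seed node i solve (I - alpha A) k = alpha A e_i.
import numpy as np

A = np.array([  # adjacency matrix of a 5-node undirected toy graph
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
n = A.shape[0]

alpha = 0.9 / np.linalg.norm(A, 2)  # keeps the Neumann series convergent
i = 0
k = np.linalg.solve(np.eye(n) - alpha * A, alpha * A[:, i])
# k[j] is the Katz score between the seed node i and node j.
```

Nearby nodes get larger scores than distant ones: here node 1 (adjacent to node 0) scores higher than node 4 (three hops away).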
Tall-and-skinny QR factorizations on MapReduce architectures. Overlapping clusters for distributed network computations.
mrtsqr – summary of parameters
Blocksize: how many rows to read before computing a QR factorization, expressed as a multiple of the number of columns (see paper)
Splitsize: the size of each local matrix
Reduction tree: the number of reducers and iterations to use
[Diagram: serial TSQR within each mapper. Local blocks A1, A2 are combined by repeated QR factorizations; each mapper emits a small R factor (S(1), S(2)), which are shuffled and combined by reducers over tree iterations 1–3.] MapReduce 2011
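The reduction in that diagram can be sketched in a few lines of NumPy: factor each local block, then QR the stacked R factors; the result agrees with a direct QR of the whole matrix up to column signs (the block count and matrix size here are illustrative, not mrtsqr parameters):

```python
# TSQR sketch: map step does a local QR per block and emits only the
# small R factors; the reduce step stacks them and does one more QR.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((1000, 4))  # tall-and-skinny: many rows, few columns
blocks = np.array_split(A, 4)       # what each mapper would see

# Map step: local QR per block.
Rs = [np.linalg.qr(B)[1] for B in blocks]

# Reduce step: stack the R factors and factor again (one tree level here).
R = np.linalg.qr(np.vstack(Rs))[1]

# R is unique only up to the signs of its rows, so compare magnitudes.
R_direct = np.linalg.qr(A)[1]
assert np.allclose(np.abs(R), np.abs(R_direct))
```

Because only the tiny R factors move between map and reduce, the communication cost is independent of the (huge) number of rows, which is what makes the MapReduce formulation work.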
Network Alignment
[Diagram: network alignment matches vertices of graph A to vertices of graph B through a set of candidate links L; alignments are scored by matched-edge weight and overlap.]

Method       Weight   Overlap   Approach and time
upper bound  60,120   17,571    solving an LP – 1 day
NetAlignBP   56,361   15,214    iterative updates (think matrix-vector multiplies) – 10 min
rounded LP   46,270   17,251    solving an LP – 1 day

Note: for these results, A has approx. 200k vertices, B has approx. 300k vertices, and L has 5M edges. This setup yields a 5M-variable integer QP.
David F. Gleich (UBC), Sandia
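To make the "weight" and "overlap" columns concrete, here is a tiny sketch of how a candidate alignment is scored (the toy graphs, the matching, and the weights are assumptions for illustration, not the paper's data):

```python
# Scoring a network alignment: total matched-edge weight plus overlap,
# the number of edges of A preserved as edges of B under the matching.

# Edge lists of two small undirected graphs A and B.
edges_A = {(0, 1), (1, 2), (2, 3)}
edges_B = {(0, 1), (1, 2), (0, 2)}

# A matching: vertex of A -> vertex of B, with a weight per matched pair.
match = {0: 0, 1: 1, 2: 2}
weight = {(0, 0): 1.0, (1, 1): 0.5, (2, 2): 0.8}

total_weight = sum(weight[(i, match[i])] for i in match)

def norm(e):
    # Undirected edges compare as sorted tuples.
    return tuple(sorted(e))

# Overlap: edges of A whose endpoints map onto an edge of B.
overlap = sum(
    1
    for (i, j) in edges_A
    if i in match and j in match and norm((match[i], match[j])) in edges_B
)
# Here edges (0,1) and (1,2) are preserved, but (2,3) is not.
```

Maximizing a combination of these two objectives over all matchings is the integer program in the table; NetAlignBP approximates it with matrix-vector-like iterative updates.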
Algorithms for large sparse network alignment problems.
Rank aggregation via skew-symmetric matrix completion.
RANK AGGREGATION VIA NUCLEAR NORM MINIMIZATION
DAVID F. GLEICH · PURDUE, LEK-HENG LIM · U. CHICAGO
1. THE IDEA
A classic data mining task is to determine the important items in a dataset. This is the problem of rank-aggregation. Formally, given a series of votes on items by a group of voters, rank-aggregation is the process of permuting the set of items so that the best item is first, the second best is next, and so on. These problems are difficult. Arrow’s theorem (1950) shows that the ideal rank-aggregation is impossible. Even a compromise proposed by Kemeny (1959), to take something like an average ranking, is NP-hard, as shown by Dwork et al. (2001).
But can we somehow relax the problem?
Numerical rank-aggregations are intimately intertwined with skew-symmetric matrices. Suppose that each item is described by a score $s_i$, and we are able to collect comparisons between items. Assembling the comparisons into a matrix yields
$$Y_{i,j} = s_i - s_j, \quad\text{or}\quad Y = s e^T - e s^T,$$
which is a rank-2 skew-symmetric matrix.
OUTLINE
Ratings (= R) → (§2) Pairwise comparisons (= Y) → (§3) Ranking scores (= s) → (sorting) Rank aggregations
Using Y, the idea is to get s, and then rank the items. But not all comparisons may be available, nor will all comparisons be trustworthy. Thus, this is a matrix completion problem: given a measured Y with missing entries, find the true Y.
Contributions
(1) We propose a new method for computing a rank aggregation based on matrix completion, which is tolerant to noise and incomplete data. (2) We solve a structured matrix-completion problem over the space of skew-symmetric matrices. (3) We prove a recovery theorem detailing when our approach will work. (4) We perform a detailed evaluation of our approach with synthetic data and an anecdotal study with Netflix ratings. Below, we show how our methods (log-odds and arithmetic mean) improve on a mean-rating rank aggregation in Netflix.

Mean Rating                 Log-odds (all)             Arithmetic Mean (30)
LOTR III: Return …          LOTR III: Return …         LOTR III: Return …
LOTR I: The Fellowship …    LOTR I: The Fellowship …   LOTR I: The Fellowship …
LOTR II: The Two …          LOTR II: The Two …         LOTR II: The Two …
Lost: Season 1              Star Wars V: Empire …      Lost: S1
Battlestar Galactica: S1    Raiders of the Lost Ark    Star Wars V: Empire …
Fullmetal Alchemist         Star Wars IV: A New Hope   Battlestar Galactica: S1
Trailer Park Boys: S4       Shawshank Redemption       Star Wars IV: A New Hope
Trailer Park Boys: S3       Star Wars VI: Return …     LOTR III: Return …
Tenchi Muyo! …              LOTR III: Return …         Raiders of the Lost Ark
Shawshank Redemption        The Godfather              The Godfather
Veronica Mars: S1           Toy Story                  Shawshank Redemption
Ghost in the Shell: S2      Lost: S1                   Star Wars VI: Return …
Arrested Development: S2    Schindler’s List           Gladiator
Simpsons: S6                Finding Nemo               Simpsons: S5
Inu-Yasha                   CSI: S4                    Schindler’s List
2. PAIRWISE AGGREGATION
A rating matrix R collects $R_{u,i}$, the rating by user $u$ on item $i$. Pairwise aggregations produce a skew-symmetric comparison matrix Y. The advantages of comparison matrices are discussed at right. Common choices are:
Arithmetic mean The arithmetic mean of all voters who have rated both $i$ and $j$ is $Y_{ij} = \frac{\sum_u (R_{u,i} - R_{u,j})}{\#\{u \mid R_{u,i}, R_{u,j} \text{ exist}\}}$. These are translation invariant.
Geometric mean The (log) geometric mean over all voters who have rated both $i$ and $j$ is $Y_{ij} = \frac{\sum_u (\log R_{u,i} - \log R_{u,j})}{\#\{u \mid R_{u,i}, R_{u,j} \text{ exist}\}}$. These are scale invariant.
Binary comparison Let $Y^u_{ij} = \operatorname{sign}(R_{u,j} - R_{u,i})$ be the user preference matrix. Averaging over users yields the difference between the probability that alternative $j$ is preferred to $i$ and vice versa: $Y_{ij} = \Pr\{u \mid R_{u,i} < R_{u,j}\} - \Pr\{u \mid R_{u,i} > R_{u,j}\}$. These are invariant to a monotone transformation.
Logarithmic odds ratio This idea translates binary comparison to a logarithmic scale: $Y_{ij} = \log \frac{\Pr\{u \mid R_{u,i} \le R_{u,j}\}}{\Pr\{u \mid R_{u,i} \ge R_{u,j}\}}$.
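A sketch of the first (arithmetic-mean) aggregation on a tiny ratings matrix, where NaN marks a missing rating (the data is an illustrative assumption):

```python
# Arithmetic-mean pairwise aggregation: Y[i, j] averages R[u, i] - R[u, j]
# over the users u who rated both items i and j.
import numpy as np

R = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 2.0],
    [np.nan, 4.0, 1.0],
    [5.0, 2.0, 3.0],
])
n_items = R.shape[1]
Y = np.zeros((n_items, n_items))
for i in range(n_items):
    for j in range(n_items):
        both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
        if i != j and both.any():
            Y[i, j] = np.mean(R[both, i] - R[both, j])

# The result is skew-symmetric by construction.
assert np.allclose(Y, -Y.T)
```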
Whereas user-item rating matrices are notoriously incomplete, item-item comparison matrices are actually nearly dense. For Netflix, the matrix R is 1% filled, whereas Y is 99.77% filled. The number of users supporting each comparison is histogrammed below.
[Histogram: occurrences vs. number of pairwise comparisons, on log-log axes spanning roughly 10^0 to 10^7.]
3. SKEW-SYMMETRIC MATRIX COMPLETION
Let $Y_{i,j}$ be known for $(i,j) \in \Omega$. Our goal is to find the simplest skew-symmetric matrix that matches the data. We use rank as a measure of complexity:
$$\text{minimize } \operatorname{rank}(X) \quad \text{subject to } X = -X^T \text{ and } X_{i,j} = Y_{i,j} \text{ for all } (i,j) \in \Omega.$$
This problem is also NP-hard. However, using the nuclear norm instead of rank is a common heuristic:
$$\text{minimize } \|X\|_* \quad \text{subject to } X = -X^T \text{ and } X_{i,j} = Y_{i,j} \text{ for all } (i,j) \in \Omega.$$
This program is convex. The nuclear norm is the sum of singular values and is the largest convex underestimator of rank on the unit ball.
We further relax this problem and study one tolerant to noise. Let $b = \operatorname{vec}(Y_\Omega)$, and let $\mathcal{A}(X) = \operatorname{vec}(X_\Omega)$. Then we study the LASSO problem:
$$\text{minimize } \|\mathcal{A}(X) - b\|_2 \quad \text{subject to } \|X\|_* \le \tau \text{ and } X = -X^T.$$
Jain et al. (NIPS 2010) proposed the SVP algorithm for solving this problem without the skew-symmetric constraint. After studying alternatives, we discovered that their algorithm will give us the skew-symmetric constraint for free due to a particular property of the SVD of a skew-symmetric matrix:
THEOREM Let $A = -A^T$ be an $n \times n$ skew-symmetric matrix with eigenvalues $\mathrm{i}\lambda_1, -\mathrm{i}\lambda_1, \mathrm{i}\lambda_2, -\mathrm{i}\lambda_2, \ldots, \mathrm{i}\lambda_j, -\mathrm{i}\lambda_j$, where $\lambda_k > 0$ and $j = \lfloor n/2 \rfloor$. Then the SVD of A is given by
$$A = U \operatorname{diag}(\lambda_1, \lambda_1, \lambda_2, \lambda_2, \ldots, \lambda_j, \lambda_j)\, V^T \quad (1)$$
for U and V given in the proof.
SUMMARY “Even rank-k truncated SVDs of skew-symmetric matrices are also skew-symmetric.”
For us, this theorem says that the SVP algorithm from Jain et al. will produce a skew-symmetric output and satisfy the skew-symmetric constraint.
One final piece of the algorithm involves extracting the scores s from the completed matrix Y. For this, we use the Borda count $s = \frac{1}{n} Y e$.
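The theorem is easy to check numerically. A small sketch (the random score vector and the noise level are illustrative assumptions):

```python
# Check: singular values of a skew-symmetric matrix come in pairs, and a
# rank-2 truncated SVD of a skew-symmetric matrix stays skew-symmetric.
import numpy as np

rng = np.random.default_rng(3)
n = 8
s = rng.standard_normal(n)
e = np.ones(n)
N = rng.standard_normal((n, n))
Y = np.outer(s, e) - np.outer(e, s) + 0.1 * (N - N.T)  # skew, full rank

U, S, Vt = np.linalg.svd(Y)
Y2 = U[:, :2] @ np.diag(S[:2]) @ Vt[:2, :]  # rank-2 truncation

assert np.allclose(S[0], S[1])    # top singular values are paired
assert np.allclose(Y2, -Y2.T)     # the truncation is still skew-symmetric

s_hat = Y2 @ e / n                # Borda count from the truncation
```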
The Rank Aggregation Algorithm
INPUT: ranking matrix R, minimum comparisons c
1: Get Y from R via §2.
2: Drop entries in Y with < c users.
3: Let Ω be the index set for all retained entries in Y and b be the values for these entries.
4: U, S, V = SVP(index set Ω, values b, rank 2)
5: Compute s = (1/n) U S V^T e.
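A hedged sketch of the SVP step (step 4) as a dense toy: alternate a gradient step on the observed entries with a rank-2 truncated SVD. The sampling density, step size, and iteration count are illustrative assumptions, not the paper's tuned algorithm.

```python
# SVP-style iteration: gradient step on the observed residual, then
# project back to rank 2 by truncated SVD (which, per the theorem above,
# keeps the iterate numerically skew-symmetric).
import numpy as np

rng = np.random.default_rng(4)
n = 20
s = rng.standard_normal(n)
s -= s.mean()                               # centered true scores
e = np.ones(n)
Y_true = np.outer(s, e) - np.outer(e, s)    # rank-2 skew-symmetric

# Observe a random symmetric pattern of entries (pairs come together).
mask = rng.random((n, n)) < 0.5
mask = mask | mask.T

X = np.zeros((n, n))
for _ in range(200):
    G = np.where(mask, X - Y_true, 0.0)     # residual on observed entries
    U, S, Vt = np.linalg.svd(X - G)         # gradient step with step size 1
    X = U[:, :2] @ np.diag(S[:2]) @ Vt[:2, :]  # rank-2 projection

s_hat = X @ e / n                           # step 5: Borda count
```

In this easy regime the recovered scores line up with the true scores almost perfectly, which is step 5's justification: for the exact Y, (1/n) Y e = s.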
5. RECOVERABILITY
A hallmark of matrix completion results is a recovery theorem showing when the solution of the convex heuristic yields the true solution. Using a recent matrix-completion theorem from David Gross (2010), we prove:
THEOREM Let s be centered, i.e., $s^T e = 0$. Let $Y = s e^T - e s^T$, where $\nu = \max_i s_i^2 / (s^T s)$ and $\gamma = (\max_i s_i - \min_i s_i)/\|s\|$. Also, let $\Omega \subset H$ be a random set of elements with size $|\Omega| \ge O(2 n \mu (1 + \beta) (\log n)^2)$, where $\mu = \max((n\gamma + 1)/4,\, n\nu^2)$. Then the solution of
$$\text{minimize } \|X\|_* \quad \text{subject to } \operatorname{trace}(X^T W_i) = \operatorname{trace}(Y^T W_i),\ W_i \in \Omega$$
is equal to Y with probability at least $1 - n^{-\beta}$.
SUMMARY “About n log n comparisons for recovery.”
As stated, this theorem is not useful on its own, because we only need a spanning set of measurements from Y to generate the score vector. Instead, this theorem gives intuition for the noisy recovery problem. We test this by generating a skew-symmetric matrix Y from a score vector s and determining how many comparisons we need before we are able to find the true vector s using the SVP algorithm. We then repeat the experiment after adding Gaussian noise to the measurements. We find a threshold at n log n measurements.
Noiseless recovery / Noisy recovery
[Figures: fraction of trials recovered vs. number of samples (10^2 to 10^4) for the noiseless and noisy settings, with reference lines at 5n, 2n log(n), and 6n log(n) samples; a third panel plots noise level (0.01–0.05) vs. samples.]
6. SYNTHETIC RESULTS
We also test an item response theory model from Ho and Quinn (2008). Here, we try to find item scores from synthetic user ratings. Our algorithm outperforms the mean rating in a Kendall-τ correlation metric.
Nuclear-norm rating / Mean rating
[Figures: median Kendall’s τ (0.5–1) vs. error (0–1) for the nuclear-norm rating and the mean rating, across several panels.]
Please email David Gleich ([email protected]) for questions. Our code is available online: Google “skew-nuclear gleich” to find it. KDD 2011, 23 August 2011
My research
Presentations on my website. Implementations on my website. Even more on my website!
www.cs.purdue.edu/homes/dgleich
@dgleich on Twitter
Or we can chat this week.