Algorithms on graphs and networks
Leonid Zhukov
Networks
• Explicit: graphs & networks
  – Power grids, phone lines
  – Internet, Web graph
  – Social networks (friends, business, etc.)
• Implicit: transactions
  – Email, Messenger, ICQ
  – Online auctions, e-stores (Amazon)
  – Foreign trade
• Constructed: affinity between data points
  – Music (online radio)
  – Books, videos, …
Flickr – social network
Flickr – a graph
Adjacency matrix
Flickr: reverse Cuthill-McKee
580,000 users, 3,500,000 links
Flickr: some stats
• # nodes = 584,207
• # edges (nnz) = 3,555,115
• max in-degree = 3,531
• max out-degree = 8,976
• <in-degree> = <out-degree> = 6
• diameter = 18
• # strongly connected components = 152,324
• largest strongly connected components = 274,649; 374; 186; 155; 11
• # connected components = 43,189
• largest connected components = 404,893; 378; 112; 108; 103
• highest core number = 249 (size 668)
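The equal averages follow from the counts: every directed edge adds one to some in-degree and one to some out-degree, so both means equal #edges / #nodes. A minimal sketch of that bookkeeping; the toy edge list and the function name `degree_stats` are hypothetical, while the node and edge totals are the ones quoted on the slide:

```python
import numpy as np

def degree_stats(n_nodes, edges):
    """In- and out-degree arrays for a directed edge list of (source, target) pairs."""
    src, dst = np.asarray(edges).T
    out_deg = np.bincount(src, minlength=n_nodes)
    in_deg = np.bincount(dst, minlength=n_nodes)
    return in_deg, out_deg

# Hypothetical toy graph, only to exercise the function
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
in_deg, out_deg = degree_stats(4, toy_edges)
print("max in/out:", in_deg.max(), out_deg.max())
print("mean in/out:", in_deg.mean(), out_deg.mean())

# Totals quoted on the slide: mean degree = edges / nodes
flickr_mean_degree = 3_555_115 / 584_207
print("Flickr mean degree:", flickr_mean_degree)
```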
Flickr: scale free graph
in-degrees
out-degrees
Graph partitioning
[Figure: example graph with nodes 1 to 8]
Graph separators:
A B
Normalized cut:
Spectral graph partitioning
• assign each node an indicator: p = {-1, -1, -1, -1, +1, +1, +1}
• smallest cut:
• combinatorial optimization, NP-hard; relax:
[Figure: example graph with nodes 1 to 8]
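The link between the ±1 indicator and the cut can be checked numerically. For p in {-1, +1}^n and the Laplacian L = D - A, p^T L p sums (p_i - p_j)^2 over the edges, so each cut edge contributes 4. The sketch below assumes an unweighted graph; the 8-node graph (two 4-cliques joined by one edge) is hypothetical, since the slide's own edge list is not recoverable:

```python
import numpy as np

def two_cliques():
    """Hypothetical test graph: cliques {0,1,2,3} and {4,5,6,7} joined by edge (3, 4)."""
    A = np.zeros((8, 8))
    A[:4, :4] = 1.0
    A[4:, 4:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[3, 4] = A[4, 3] = 1.0
    return A

def cut_size(A, p):
    """Number of edges crossing the partition encoded by p in {-1, +1}^n."""
    L = np.diag(A.sum(axis=1)) - A
    p = np.asarray(p, dtype=float)
    # p^T L p = sum over edges of (p_i - p_j)^2; a cut edge contributes (±2)^2 = 4
    return float(p @ L @ p) / 4.0

A = two_cliques()
good = [-1, -1, -1, -1, +1, +1, +1, +1]   # split along the bridge
bad = [-1, +1, -1, +1, -1, +1, -1, +1]    # alternating labels
print(cut_size(A, good), cut_size(A, bad))
```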
Spectral: solution
• Quadratic optimization:
• Eigenvalue problem:
• Rounding off
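The three steps can be sketched as below. Assumptions, since the slide's formulas did not survive: the relaxation is taken as min x^T L x subject to x^T x = n and x^T 1 = 0 with the unnormalized Laplacian L = D - A, which leads to the eigenvector of the second-smallest eigenvalue (the Fiedler vector); a normalized-cut variant would solve L x = λ D x instead. The graph is hypothetical and must be connected.

```python
import numpy as np

def fiedler_vector(A):
    """Eigenvector of L = D - A for the second-smallest eigenvalue."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return vecs[:, 1]

def spectral_bisect(A):
    """Round the relaxed solution back to a ±1 indicator by sign."""
    x = fiedler_vector(A)
    return np.where(x >= 0, 1, -1)

# Hypothetical graph: two 4-cliques joined by the single edge (3, 4)
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

p = spectral_bisect(A)
print(p)  # the overall sign is arbitrary; only the grouping matters
```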
Example: normalized cuts
[Figure: example graph (nodes 1 to 8), Laplacian L, eigenvector x, and the resulting -1 / +1 split]
Spectral: ordering
[Figure panels: adjacency matrix; adjacency matrix, re-ordered; eigenvector; eigenvector, sorted]
perm = [ 1 2 6 7 3 8 4 5 ]
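Sorting the eigenvector gives the permutation, and applying it to both rows and columns gives the re-ordered adjacency matrix. A minimal sketch under stated assumptions: the graph here is a hypothetical chain whose labels were scrambled on purpose (it is not the slide's graph), and `bandwidth` is a helper introduced only for illustration.

```python
import numpy as np

def spectral_order(A):
    """Permutation that sorts nodes by their Fiedler-vector entry."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)
    return np.argsort(vecs[:, 1])

def bandwidth(A):
    """Largest distance of a nonzero entry from the diagonal."""
    i, j = np.nonzero(A)
    return int(np.abs(i - j).max())

# Hypothetical chain graph visiting the nodes in a scrambled order (0-indexed)
chain = [0, 1, 5, 6, 2, 7, 3, 4]
A = np.zeros((8, 8))
for u, v in zip(chain[:-1], chain[1:]):
    A[u, v] = A[v, u] = 1.0

perm = spectral_order(A)
A_reordered = A[np.ix_(perm, perm)]   # permute rows and columns together
print("perm (1-indexed):", perm + 1)
print("bandwidth:", bandwidth(A), "->", bandwidth(A_reordered))
```

The ordering is only defined up to reversal, because the eigenvector's sign is arbitrary.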
Spectral: graph cut
[Figure panels: eigenvector, sorted; node layout; cut values]
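One way to turn the sorted eigenvector into a cut is a sweep: evaluate every prefix/suffix split of the ordering and keep the best one. The sketch assumes the normalized-cut score cut(A,B)/vol(A) + cut(A,B)/vol(B), with vol the sum of degrees; the slide's own cut formula is not recoverable, and the graph is hypothetical (connected, no isolated nodes).

```python
import numpy as np

def sweep_cut_values(A, order):
    """Normalized-cut value for each split order[:k] | order[k:], k = 1..n-1."""
    deg = A.sum(axis=1)
    total = deg.sum()
    values = []
    for k in range(1, len(order)):
        left, right = order[:k], order[k:]
        cut = A[np.ix_(left, right)].sum()   # weight of edges crossing the split
        vol_left = deg[left].sum()
        values.append(cut / vol_left + cut / (total - vol_left))
    return np.array(values)

# Hypothetical graph: two 4-cliques joined by the single edge (3, 4)
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

L = np.diag(A.sum(axis=1)) - A
order = np.argsort(np.linalg.eigh(L)[1][:, 1])   # nodes sorted by Fiedler entry
values = sweep_cut_values(A, order)
best_k = int(np.argmin(values)) + 1
print("cut values:", np.round(values, 3))
print("best split after position", best_k)
```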
Clustering
Recursive partitioning tree
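A partition tree can be built by applying the spectral split recursively to each part. This is a sketch under assumptions not stated on the slide: each split is taken at the median of the sorted Fiedler vector (so both halves are non-empty), recursion stops at `min_size` nodes, and the function names and test graph are hypothetical.

```python
import numpy as np

def fiedler_order(A):
    """Node indices sorted by the eigenvector of the second-smallest Laplacian eigenvalue."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)
    return np.argsort(vecs[:, 1])

def partition_tree(A, nodes=None, min_size=4):
    """Nested tuples of node lists produced by recursive spectral bisection."""
    if nodes is None:
        nodes = np.arange(A.shape[0])
    if len(nodes) <= min_size:
        return sorted(nodes.tolist())          # leaf: a cluster of nodes
    sub = A[np.ix_(nodes, nodes)]              # induced subgraph
    order = fiedler_order(sub)
    half = len(nodes) // 2                     # median split
    left, right = nodes[order[:half]], nodes[order[half:]]
    return (partition_tree(A, left, min_size), partition_tree(A, right, min_size))

# Hypothetical graph: two 4-cliques joined by the single edge (3, 4)
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

tree = partition_tree(A, min_size=4)
print(tree)
```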
Flickr: “10-core” spectral ordering
Partition tree
Sponsored search
View Bids Tool
Sponsored search data
bidded term            $ bid   advertiser
accessory desk         .83     Office Max
alfred hitchcock       .01     Videoflicks.com
educational software   .13     Buy.com
educational software   .4      eBAY International AG
educational software   .17     Office Max
epson                  .28     Buy.com
epson                  .4      eBAY International AG
fax                    .13     Buy.com
fax                    .4      eBAY International AG
fax                    .38     Office Max
game                   .02     Net Business
game                   .25     Buy.com
george harrison        .15     eBAY International AG
george harrison        .05     MusicStack
george harrison        .01     Videoflicks.com
jackson michael        .05     MusicStack
jackson michael        .01     Videoflicks.com
Sponsored search bid graph
[Figure: bipartite graph, an edge for each bid]
Terms: accessory desk, educational software, epson, fax, game, george harrison, jackson michael, alfred hitchcock
Advertisers: Office Max, eBAY International AG, Buy.com, Net Business, MusicStack, Videoflicks.com
Graph partitioning
[Figure: the same bid graph (same terms and advertisers), partitioned]
Sponsored search bid matrix
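[Figure: bid matrix, one row per advertiser and one column per bidded term]
Rows (advertisers): Office Max, Buy.com, Net Business, MusicStack, eBAY, Videoflicks.com
Columns (bidded terms):
1. accessory desk
2. educational software
3. epson
4. fax
5. game
6. george harrison
7. jackson michael
8. alfred hitchcock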
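The bid matrix can be assembled directly from the (bidded term, $ bid, advertiser) triples. A minimal sketch using the sample bids listed in this deck, with rows and columns in the order of the matrix slide; the function name `bid_matrix` is hypothetical, and "OfficeMax" is assumed to be the same advertiser as "Office Max".

```python
import numpy as np

# (bidded term, $ bid, advertiser) triples from the sample data
bids = [
    ("accessory desk", 0.83, "Office Max"),
    ("alfred hitchcock", 0.01, "Videoflicks.com"),
    ("educational software", 0.13, "Buy.com"),
    ("educational software", 0.4, "eBAY International AG"),
    ("educational software", 0.17, "Office Max"),   # assumption: "OfficeMax" == "Office Max"
    ("epson", 0.28, "Buy.com"),
    ("epson", 0.4, "eBAY International AG"),
    ("fax", 0.13, "Buy.com"),
    ("fax", 0.4, "eBAY International AG"),
    ("fax", 0.38, "Office Max"),
    ("game", 0.02, "Net Business"),
    ("game", 0.25, "Buy.com"),
    ("george harrison", 0.15, "eBAY International AG"),
    ("george harrison", 0.05, "MusicStack"),
    ("george harrison", 0.01, "Videoflicks.com"),
    ("jackson michael", 0.05, "MusicStack"),
    ("jackson michael", 0.01, "Videoflicks.com"),
]
advertisers = ["Office Max", "Buy.com", "Net Business", "MusicStack",
               "eBAY International AG", "Videoflicks.com"]
terms = ["accessory desk", "educational software", "epson", "fax",
         "game", "george harrison", "jackson michael", "alfred hitchcock"]

def bid_matrix(bids, advertisers, terms):
    """Dense advertiser-by-term matrix; entry (i, j) is advertiser i's bid on term j."""
    row = {a: i for i, a in enumerate(advertisers)}
    col = {t: j for j, t in enumerate(terms)}
    B = np.zeros((len(advertisers), len(terms)))
    for term, bid, advertiser in bids:
        B[row[advertiser], col[term]] = bid
    return B

B = bid_matrix(bids, advertisers, terms)
print(B)
```

For data at the scale of a real bid graph a sparse format would be the natural choice; the dense array is only for readability here.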
Overture bid graph : real life
Bi-partite graph partitioning
[Figure: bipartite graph with terms on one side and advertisers on the other]
Bi-partite formulation
• Eigensystem:
• SVD:
• Solution:
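The formulas for these three bullets are missing from the extracted text. One standard way to connect them, assumed here rather than taken from the slide, is spectral co-clustering: scale the bipartite weight matrix B to D1^(-1/2) B D2^(-1/2), take the second left and right singular vectors, rescale them by D1^(-1/2) and D2^(-1/2), and split both node sets by sign. The 4 x 4 matrix below is a hypothetical toy example, and every row and column is assumed to have a positive sum.

```python
import numpy as np

def bipartite_spectral_cut(B):
    """Split rows and columns of a bipartite weight matrix B into two co-clusters."""
    d_rows = B.sum(axis=1)                      # degrees of the row nodes
    d_cols = B.sum(axis=0)                      # degrees of the column nodes
    Bn = B / np.sqrt(np.outer(d_rows, d_cols))  # D1^(-1/2) B D2^(-1/2)
    U, s, Vt = np.linalg.svd(Bn, full_matrices=False)
    z_rows = U[:, 1] / np.sqrt(d_rows)          # second left singular vector, rescaled
    z_cols = Vt[1, :] / np.sqrt(d_cols)         # second right singular vector, rescaled
    return np.where(z_rows >= 0, 1, -1), np.where(z_cols >= 0, 1, -1)

# Hypothetical toy data: two term/advertiser blocks plus weak background bids
B = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]) + 0.05

rows, cols = bipartite_spectral_cut(B)
print("row labels:", rows)
print("column labels:", cols)
```

Rows and columns that receive the same label form one co-cluster.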
Back to data
Clusters
Sample clusters
Advertisers: 'A.S.C. Incorporated', 'Atlantic Telecom', 'AudioLink', 'Cost Plus Electronics', 'Headset Express', 'Hello Direct', 'PDA Mountain', 'PK TECH INC', …
Terms: 'accessory phone', 'cordless headset', 'cordless headset phone', 'free hands', 'headset', 'headset microphone', 'headset phone', 'headset plantronics', 'headset wireless', 'isdn', 'plantronics', …

Advertisers: 'Berent Associates Center for Shyness and Social Therapy', 'Dr. Puff', 'New York Psychotherapy Collective', 'Ruby Shoes', 'Self-employed psychologist', 'www.kabbalah.com', …
Terms: 'health mental', 'help self', 'improvement self', 'parenting'
Yahoo! Music – LAUNCHcast radio
[Figure: LAUNCHcast user profile and player]
Launch cast data
user ID   artist ID   rating
190389    1034047     100
190389    1034060     20
190389    1034129     50
190389    1034276     100
190389    1034388     80
190389    1034801     100
190389    1034831     100
190389    1034882     0
190389    1034883     40
190389    1035010     20
190389    1035473     60
190389    1035840     0
190389    1035926     30
190389    1036157     30
190389    1036454     60
190389    1036736     30
190389    1036737     100
190389    1036740     70
190389    1036746     80
190389    1036762     100

artist ID   artist
1037729     Coral
1037730     Rick Braun
1037731     Britney Spears
1037732     Gary Motley
1037733     Candido Fabre
1037734     K'Stalia Y Los Salchichas
1037735     Donny Hoffa
1037736     Rafael Mendez
1037737     Lithops
1037738     The Paris Ensemble
1037739     The Sunset Orchestra
1037740     The Mariachis Of Chiapas
1037741     Jenny Simpson
1037742     Doc West & The Yard Dogs
1037743     Geoff Bartley
1037744     Twilight Circus Dub Sound...
1037745     Tekneek
Ratings: data representation
[Figure: users u1 to u4 and artists a1 to a3, shown as vectors and as a bipartite graph; ratings: users - artists]
Vector space model: w_ij = cos(a_i, a_j) = (a_i · a_j) / (||a_i|| ||a_j||)
Similarity graph
ratings (rows: users, columns: artists):
0 3 1 0 4 5
1 0 0 0 0 5
0 4 0 0 0 0
2 0 2 0 4 0
0 5 0 0 0 5
1 0 3 0 0 4

Compute similarities

artist-artist sim graph (rows and columns: artists):
1    0.3  0.8  0.0  0.5
0.3  1    0.4  0.0  0.2
0.8  0.4  1    0.3  0.0
0.0  0.0  0.3  1    0.0
0.5  0.2  0.0  0.0  1
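The "compute similarities" step can be sketched with the cosine formula from the vector space model, w_ij = (a_i · a_j) / (||a_i|| ||a_j||). Assumptions: the 6 x 6 ratings matrix above is read with users as rows and artists as columns, and an artist with no ratings gets similarity 0 to everything (a choice made here, not stated in the deck). The output is not expected to reproduce the 5 x 5 similarity matrix shown on the slide, which appears to be illustrative.

```python
import numpy as np

def cosine_similarity(R):
    """Artist-artist cosine similarities from a users-by-artists ratings matrix."""
    norms = np.linalg.norm(R, axis=0)           # ||a_j|| for each artist column
    safe = np.where(norms > 0, norms, 1.0)      # avoid dividing by zero for unrated artists
    Rn = R / safe                               # unit-length columns (zero columns stay zero)
    return Rn.T @ Rn

# Ratings matrix from the slide (assumed layout: rows = users, columns = artists)
R = np.array([
    [0, 3, 1, 0, 4, 5],
    [1, 0, 0, 0, 0, 5],
    [0, 4, 0, 0, 0, 0],
    [2, 0, 2, 0, 4, 0],
    [0, 5, 0, 0, 0, 5],
    [1, 0, 3, 0, 0, 4],
], dtype=float)

W = cosine_similarity(R)
print(np.round(W, 2))
```

Entries of W above a chosen threshold would become the weighted edges of the artist-artist similarity graph.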
Yahoo! Music, artist-artist
Artist clusters
cluster # 6
Rank  Label
1     Mozart
2     Royal Philharmonic Orchestra
3     Yo-Yo Ma
4     Scottish Chamber Orchestra
5     New York Philharmonic
6     Gilbert Johnson
7     New York Pro Music Antiqua
8     New Philharmonia Orchestra
9     Izzy
10    London Symphony Orchestra
11    Philadelphia Orchestra
12    Roberto Michelucci
13    Columbia Symphony Orchestra
14    San Francisco Symphony
15    London Philharmonic Orchestra
16    Berlin Philharmonic Orchestra

cluster # 79
Rank  Label
1     Enrique Iglesias
2     Avril Lavigne
3     Backstreet Boys
4     Kylie Minogue
5     Good Charlotte
6     Simple Plan
7     T.A.T.U.
8     Hilary Duff
9     Paula Abdul
10    No Doubt
11    Britney Spears
12    Westlife
13    O-Town
14    LFO (Lyte Funky Ones)
15    98 Degrees
16    Shakira
Sample clusters
Ordering: graph embedding in 1D
• every node gets a coordinate x(i)
[Figure: nodes 1 to 8 of the example graph placed on the x axis between -1 and 1, in the order 1 2 6 7 3 8 4 5]
• create ordering/permutation
Graph embedding : 2D
• Quadratic optimization problem
• Replace
• Quadratic optimization problem
• Embedding: x = ev(2), y = ev(3) (second and third eigenvectors)
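The embedding rule can be sketched as below, reading ev(2) and ev(3) as the eigenvectors of the second- and third-smallest eigenvalues. The unnormalized Laplacian L = D - A is an assumption, since the slide's optimization problem is not recoverable. The test graph is a hypothetical 8-cycle, chosen because its spectral layout is known to be a circle.

```python
import numpy as np

def spectral_embedding_2d(A):
    """2D coordinates from the 2nd and 3rd Laplacian eigenvectors."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)      # eigenvalues ascending, eigenvectors orthonormal
    return vecs[:, 1], vecs[:, 2]    # x = ev(2), y = ev(3)

# Hypothetical test graph: a cycle on 8 nodes
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

x, y = spectral_embedding_2d(A)
for i in range(n):
    print(i, round(float(x[i]), 3), round(float(y[i]), 3))
```

For the cycle, all nodes land at the same distance from the origin, so the layout recovers the ring structure.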
2D spectral embedding
intuition
[Figure: nodes 1 to 8 on the x axis between -1 and 1, in the order 1 2 6 7 3 8 4 5]
Embedding 2D - better
• Change constraints: average -> “per node”
• Optimization:
Semi Definite Programming (SDP)
• Minimum bisection SDP relaxation
• Standard form primal SDP
• Factor:
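The slide's formulas are missing, so the following only checks a commonly used form of the relaxation, assumed here: minimize (1/4) tr(L X) subject to X_ii = 1 for every node, sum of all X_ij = 0 (balance), and X positive semidefinite, with the factorization X = V V^T whose rows v_i are unit vectors. This is not a solver. It verifies that a ±1 bisection is a feasible rank-1 point whose objective equals the cut, and that unit vectors in 3D satisfy the "per node" constraint. The graph and all names are hypothetical.

```python
import numpy as np

def relaxed_objective(L, V):
    """(1/4) tr(L V V^T), which equals (1/4) * sum over edges of ||v_i - v_j||^2."""
    return float(np.trace(L @ (V @ V.T))) / 4.0

# Hypothetical graph: two 4-cliques joined by the single edge (3, 4)
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Rank-1 point: a balanced ±1 bisection, V = p as a single column
p = np.array([-1, -1, -1, -1, 1, 1, 1, 1], dtype=float)
cut = relaxed_objective(L, p[:, None])
balance = float(np.outer(p, p).sum())          # equals (sum of p)^2

# Rank-3 point: arbitrary unit vectors, i.e. nodes placed on a sphere of radius 1
rng = np.random.default_rng(0)
V3 = rng.normal(size=(8, 3))
V3 /= np.linalg.norm(V3, axis=1, keepdims=True)
X = V3 @ V3.T

print("cut of the bisection:", cut, "balance:", balance)
print("diag(X):", np.round(np.diag(X), 6))
print("smallest eigenvalue of X:", np.linalg.eigvalsh(X).min())
```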
SDP embedding
• Spectral:
• SDP:
[Figure: spectral embedding on the interval from -1 to 1; SDP embedding on a sphere of radius r = 1]
SDP spherical embedding
SDP sphere unrolled
Clusters
Visualization pipeline
[Diagram: ratings matrix (users × artists) → compute similarities → artist-artist sim graph → SDP solver → 3D mapping → map 3D to 2D → visualize]
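The "map 3D to 2D" stage is not specified in the extracted text. One simple possibility, assumed here purely for illustration, is to unroll the unit sphere into longitude and latitude; the function name `unroll_sphere` is hypothetical.

```python
import numpy as np

def unroll_sphere(P):
    """Map points on the unit sphere (rows of P) to (longitude, latitude) in radians."""
    P = np.asarray(P, dtype=float)
    P = P / np.linalg.norm(P, axis=1, keepdims=True)   # re-normalize to be safe
    lon = np.arctan2(P[:, 1], P[:, 0])                 # angle in the x-y plane
    lat = np.arcsin(np.clip(P[:, 2], -1.0, 1.0))       # elevation above the x-y plane
    return np.column_stack([lon, lat])

# Three axis-aligned unit vectors as a quick check
points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
flat = unroll_sphere(points)
print(flat)
```

Any such projection distorts distances near the poles, so the choice of the "up" axis affects how clusters appear in 2D.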
David Gleich (Stanford University), Leonid Zhukov (Yahoo!, Inc.), Matthew Rasmussen (MIT), Kevin Lang (Yahoo! Research)
The World of Music: SDP layout of high-dimensional data