SCS CMU
Proximity Tracking on Time-Evolving Bipartite Graphs
Speaker: Hanghang Tong
Joint Work with
Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos
Apr. 24-26, 2008, Atlanta SIAM Conference on Data Mining
Graph Mining: the big picture
• Graph/Global Level
• Subgraph/Community Level
• Node Level (We are here!)
Proximity on Graph: What?
[Figure: example graph with labeled nodes and unit edge weights]
a.k.a. Relevance, Closeness, ‘Similarity’…
Link Prediction
Q: How to predict the existence of a link? A: Proximity! [Liben-Nowell+ 2003]
[Figure: two histograms, "Prox. Hist. for a set of deleted links" and "Prox. Hist. for a set of absent links"; x-axis: Prox(i→j) + Prox(j→i); y-axis: density]
Prox. is effective at telling ‘deleted’ edges apart from absent edges!
Neighborhood Search on graphs
Q: What is the most related conference to ICDM? A: Proximity! [Sun+ ICDM 2005]
[Figure: conference-author bipartite graph. Conferences: ICDM, KDD, SDM, IJCAI, NIPS, AAAI, …; authors: Philip S. Yu, M. Jordan, Ning Zhong, R. Ramakrishnan, …]
Example
[Figure: ICDM and its ten most related conferences (KDD, SDM, ECML, PKDD, PAKDD, CIKM, DMKD, SIGMOD, ICML, ICDE), with proximity scores between 0.004 and 0.011]
Automatic Image Caption
Q: How to assign keywords to the test image? A: Proximity! [Pan+ 2004]
[Figure: graph with Image, Region, and Keyword nodes, plus a test image; keywords: Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass]
Center-Piece Subgraph (CePS)
Q: How to find a hub for the black nodes? A: Proximity! [Tong+ KDD 2006]
[Figure: Input: original graph with query nodes A, B, C; Output: CePS, with the hub labeled "CePS guy"]
Best-Effort Pattern Match
Q: How to find a matching subgraph? A: Proximity! [Tong+ KDD 2007]
[Figure: Input: data graph and query graph (Accountant, CEO, Manager, SEC); Output: matching subgraph]
Challenge
• Graphs are evolving over time!
– New nodes/edges show up;
– Existing nodes/edges die out;
– Edge weights change…
Q: How to generalize everything? A: Track proximity!
Trend analysis on graph level
[Figure: rank of influential-ness vs. year for M. Jordan, G. Hinton, C. Koch, and T. Sejnowski]
Roadmap
• Motivation
• Prox. On Static Graphs
• Prox. On Time-Evolving Graphs
• Experimental Results
• Conclusion
Random walk with restart
Query: node 4. Ranking vector r_4:

Node      Score
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02

[Figure: 12-node example graph colored by score. More red, more relevant; nearby nodes get higher scores.]
Computing RWR
$$\mathbf{r}_i = c\,\mathbf{W}\,\mathbf{r}_i + (1 - c)\,\mathbf{e}_i$$
• r_i: ranking vector (n x 1)
• W: adjacency matrix, column-normalized (n x n)
• e_i: starting vector (n x 1), with 1 at the query node and 0 elsewhere
• 1 - c: restart probability
Example (query: node 4, c = 0.9): r_4 = 0.9 W r_4 + 0.1 e_4, with
r_4 = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)^T
e_4 = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)^T
W =
  0    1/3  1/3  1/3  0    0    0    0    0    0    0    0
  1/3  0    1/3  0    0    0    0    1/4  0    0    0    0
  1/3  1/3  0    1/3  0    0    0    0    0    0    0    0
  1/3  0    1/3  0    1/4  0    0    0    0    0    0    0
  0    0    0    1/3  0    1/2  1/2  1/4  0    0    0    0
  0    0    0    0    1/4  0    1/2  0    0    0    0    0
  0    0    0    0    1/4  1/2  0    0    0    0    0    0
  0    1/3  0    0    1/4  0    0    0    1/2  0    1/3  0
  0    0    0    0    0    0    0    1/4  0    1/3  0    0
  0    0    0    0    0    0    0    0    1/2  0    1/3  1/2
  0    0    0    0    0    0    0    1/4  0    1/3  0    1/2
  0    0    0    0    0    0    0    0    0    1/3  1/3  0
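The equation above can be solved by simply iterating it until the ranking vector stops changing. The following is a minimal sketch of that idea, not the authors' implementation: the edge list is read off the nonzero pattern of the 12 x 12 matrix on this slide, and the helper names (`column_normalized`, `rwr_power_iteration`) are hypothetical.

```python
import numpy as np

# Undirected edges of the 12-node example (1-based node ids), read off the
# nonzero pattern of the matrix on the slide.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5),
         (5, 6), (5, 7), (5, 8), (6, 7), (8, 9), (8, 11),
         (9, 10), (10, 11), (10, 12), (11, 12)]


def column_normalized(edges, n):
    """Build the adjacency matrix and divide each column by its degree."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0, keepdims=True)


def rwr_power_iteration(W, query, c=0.9, tol=1e-12, max_iter=10_000):
    """Iterate r <- c W r + (1 - c) e until the L1 change drops below tol."""
    e = np.zeros(W.shape[0])
    e[query] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


W = column_normalized(EDGES, 12)
r4 = rwr_power_iteration(W, query=3)  # node 4 has 0-based index 3
```

Because W is column-normalized, every iterate keeps summing to 1, and the query node ends up with the largest score.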
Q: Given query i, how to solve it?
$$\mathbf{r}_i = 0.9\,\mathbf{W}\,\mathbf{r}_i + 0.1\,\mathbf{e}_i$$
Here W is the same 12 x 12 adjacency matrix as on the previous slide, e_i is the starting vector for the query, and the ranking vector r_i (shown as "??" on both sides) is the unknown.
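One direct answer follows from rearranging the equation: r_i = (1 - c) (I - c W)^{-1} e_i, so a single linear solve gives the ranking vector. Below is a small sketch under the assumption that the graph is small enough to solve densely; the edge list is read off the example matrix, and the function names are illustrative only.

```python
import numpy as np

# Same 12-node example graph (1-based undirected edges), read off the
# nonzero pattern of the example matrix.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5),
         (5, 6), (5, 7), (5, 8), (6, 7), (8, 9), (8, 11),
         (9, 10), (10, 11), (10, 12), (11, 12)]


def column_normalized(edges, n):
    """Build the adjacency matrix and divide each column by its degree."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0, keepdims=True)


def rwr_closed_form(W, query, c=0.9):
    """Solve (I - c W) r = (1 - c) e_query with one dense linear solve."""
    n = W.shape[0]
    e = np.zeros(n)
    e[query] = 1.0
    return (1 - c) * np.linalg.solve(np.eye(n) - c * W, e)


W = column_normalized(EDGES, 12)
r4 = rwr_closed_form(W, query=3)   # node 4 has 0-based index 3
ranking = np.argsort(-r4) + 1      # node ids, most relevant first
```

A dense solve costs O(n^3) per query, which is fine for 12 nodes but is exactly what becomes too expensive on large graphs.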
RWR on Bipartite Graph
[Figure: author-conference bipartite graph with n authors and m conferences, and its Author-Conf. matrix]
Observation: n >> m!
Examples:
1. DBLP: 400k authors, 3.5k conferences
2. NetFlix: 2.7M users, 18k movies
RWR on Skewed bipartite graphs
• Q: Given query i, how to solve it?
$$\mathbf{r}_i = 0.9 \begin{pmatrix} \mathbf{0} & \mathbf{A}_r \\ \mathbf{A}_c & \mathbf{0} \end{pmatrix} \mathbf{r}_i + 0.1\,\mathbf{e}_i$$
[Figure: block structure of the adjacency matrix for n authors and m conferences; Ar is n x m, Ac is m x n, and the diagonal blocks are 0]
BB_Lin: Pre-Computation [Tong+ 06]
• Step 1: $\mathbf{M} = \mathbf{A}_c \times \mathbf{A}_r$ (2-step RWR for conferences; M is m x m)
• Step 2: $(\mathbf{I} - 0.9\,\mathbf{M})^{-1}$ (all Conf-Conf prox. scores)
• Cost: $O(m^3 + mE)$
• Examples
– NetFlix: 1.5hr for pre-computation;
– DBLP: a few minutes
[Figure: Ac/Ar with E edges, linking m conferences and n authors]
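The two pre-computation steps can be sketched on a toy graph as follows. This is an illustration under stated assumptions, not the authors' code: `B` is a made-up 5 x 3 author-conference matrix, both blocks are column-normalized, and eliminating the author block from the RWR equation yields the factor c**2 in front of M (the slide writes the constant simply as 0.9). All function names are hypothetical.

```python
import numpy as np


def normalized_blocks(B):
    """Split the column-normalized bipartite adjacency into its two blocks.

    B is the n x m author-conference matrix (no empty rows or columns).
    """
    Ar = B / B.sum(axis=0, keepdims=True)       # n x m: conference -> author
    Ac = B.T / B.sum(axis=1, keepdims=True).T   # m x n: author -> conference
    return Ar, Ac


def precompute_core(B, c=0.9):
    """Step 1: M = Ac x Ar (m x m). Step 2: invert I - c^2 M."""
    Ar, Ac = normalized_blocks(B)
    M = Ac @ Ar
    return np.linalg.inv(np.eye(M.shape[0]) - c**2 * M)


def conf_conf_scores(core, i, c=0.9):
    """On-line stage for a conference query: read out one scaled column."""
    return (1 - c) * core[:, i]


# Toy data (assumption): 5 authors x 3 conferences.
B = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1],
              [1, 0, 1]], dtype=float)
core = precompute_core(B)
scores = conf_conf_scores(core, i=1)  # proximity of every conference to conference 1
```

Only an m x m matrix is inverted, which is what makes the approach attractive when n >> m.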
BB_Lin: On-Line Stage
• (Base) Case 1: Conf - Conf
– Read out!
[Figure: Ac/Ar with E edges, linking authors and conferences]
BB_Lin: Examples
• NetFlix dataset (2.7M users x 18k movies)
– 1.5hr for pre-computation;
– <1 sec for on-line
• DBLP dataset (400k authors x 3.5k confs)
– A few minutes for pre-computation
– <0.01 sec for on-line
Roadmap
• Motivation
• Prox. On Static Graphs
• Prox. On Time-Evolving Graphs
• Experimental Results
• Conclusion
Challenges
• BB_Lin is good for skewed bipartite graphs
– for NetFlix (2.7M nodes and 100M edges)
– On-line cost for query: a fraction of a second
• w/ 1.5 hr pre-computation for the m x m core matrix
• But… what if the graph is evolving over time?
– New edges/nodes arrive; edge weights increase…
– On-line cost: the 1.5hr pre-computation itself becomes a part of this!
Q: How to update the core matrix?
• t=0: $(\mathbf{I} - 0.9\,\mathbf{M})^{-1}$, computed in $O(m^3 + mE)$
• t=1: $(\mathbf{I} - 0.9\,\tilde{\mathbf{M}})^{-1}$, recomputed in $O(m^3 + m\tilde{E})$?
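Update the core matrix
• Step 1: $\tilde{\mathbf{M}} = \tilde{\mathbf{A}}_c \times \tilde{\mathbf{A}}_r = \mathbf{M} + (\text{rank-2 update})$
• Step 2: $(\mathbf{I} - 0.9\,\tilde{\mathbf{M}})^{-1} = (\mathbf{I} - 0.9\,\mathbf{M})^{-1} + (\text{low-rank correction})$
• Cost: $O(m^2)$ instead of $O(m^3 + mE)$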
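A standard way to turn a low-rank change in M into a cheap update of the inverse is the Sherman-Morrison-Woodbury identity. The sketch below uses that identity as a stand-in; it is not necessarily the authors' exact procedure. It assumes the change has already been factored as U V, with U of size m x k and V of size k x m (k = 2 for the rank-2 case); how those factors are derived from the changed edge is not shown on the slide and is left out here. The name `low_rank_core_update` and the random test data are hypothetical.

```python
import numpy as np


def low_rank_core_update(core, U, V, a=0.9):
    """Return inv(I - a (M + U V)) given core = inv(I - a M).

    Only a k x k matrix is inverted; the rest is matrix products that cost
    O(m^2 k), so O(m^2) for fixed k.
    """
    k = U.shape[1]
    small = np.linalg.inv(np.eye(k) - a * (V @ core @ U))  # k x k
    return core + a * (core @ U) @ small @ (V @ core)


# Toy data (assumption): a random 6 x 6 M and a random rank-2 change.
rng = np.random.default_rng(0)
m, k, a = 6, 2, 0.9
M = 0.1 * rng.random((m, m))
U = 0.05 * rng.random((m, k))
V = 0.05 * rng.random((k, m))

core_old = np.linalg.inv(np.eye(m) - a * M)
core_new = low_rank_core_update(core_old, U, V, a)
```

The updated matrix can be checked against a from-scratch inversion of I - a (M + U V); the two agree up to floating-point error.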
Update: General Case
• E’ edges changed
• Involves n’ authors, m’ confs.
• Observation: the rank of the update is small!
$$\min(n', m') \ll E'$$
– Real Example (DBLP Post)
• 1258 time steps
• E’ up to ~20,000!
• min(n’, m’) <= 132
• Our Algorithm: $O(\min(n', m')\,m^2 + E')$, compared with $O(m^3 + mE)$ for re-computation
[Figure: changed block of the n authors x m conferences matrix; $\tilde{\mathbf{M}} = \tilde{\mathbf{A}}_c \times \tilde{\mathbf{A}}_r$]
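The observation rests on a simple fact: a change matrix whose nonzeros fall in only n’ rows and m’ columns has rank at most min(n’, m’), no matter how many entries E’ changed. The following sketch checks this on synthetic data; the sizes, the random change matrix, and the helper `touched_counts` are all made up for illustration, and it looks only at the change in the author-conference matrix itself.

```python
import numpy as np


def touched_counts(delta):
    """Return (E', n', m') for a change matrix delta (n authors x m confs)."""
    rows, cols = np.nonzero(delta)
    return len(rows), len(set(rows.tolist())), len(set(cols.tolist()))


# Synthetic batch of changes (assumption): 40 authors x 5 conferences touched.
rng = np.random.default_rng(0)
n, m = 200, 30
delta = np.zeros((n, m))
authors = rng.choice(n, size=40, replace=False)
confs = rng.choice(m, size=5, replace=False)
delta[np.ix_(authors, confs)] = rng.integers(1, 4, size=(40, 5))

E_changed, n_touched, m_touched = touched_counts(delta)
rank = np.linalg.matrix_rank(delta)
```

Here 200 entries change, but the rank of the change cannot exceed 5, which is the quantity the update cost depends on.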
Roadmap
• Motivation
• Prox. On Static Graphs
• Prox. On Time-Evolving Graphs
• Experimental Results
• Conclusion
Philip S. Yu’s Top-5 conferences up to each year

1992         1997         2002     2007
ICDE         CIKM         KDD      ICDM
ICDCS        ICDCS        SIGMOD   KDD
SIGMETRICS   ICDE         ICDM     ICDE
PDIS         SIGMETRICS   CIKM     SDM
VLDB         ICMCS        ICDCS    VLDB

Earlier years: Databases, Performance, Distributed Sys. Later years: Databases, Data Mining.
DBLP (Author x Conf.): 400k authors, 3.5k conferences, 20 years
KDD’s Rank wrt. VLDB over years
[Figure: proximity rank vs. year]
Data Mining and Databases are more and more relevant!
10 most influential authors in NIPS community up to each year
Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spanning 13 years
[Figure: highlighted authors: T. Sejnowski, M. Jordan]
Fast-Single-Update
[Figure: log(Time) (Seconds) per dataset, with "Our method" marked; annotated speedups: 176x and 40x]
Fast-Batch-Update
[Figure: two panels, Time (Seconds) vs. min(n’, m’) and Time (Seconds) vs. E’, with "Our method" marked in each]
15x speed-up on average!
Conclusion
• Trends Analysis on Graph Level
– pTrack/cTrack
• Scalable for evolving graphs
[Figure: trends graph]