SCS CMU Proximity Tracking on Time-Evolving Bipartite Graphs Speaker: Hanghang Tong Joint Work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos Apr. 24-26, 2008, Atlanta SIAM Conference on Data Mining

SCS CMU

Proximity Tracking on Time-Evolving Bipartite Graphs

Speaker: Hanghang Tong

Joint Work with

Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos

Apr. 24-26, 2008, Atlanta SIAM Conference on Data Mining

2

Graphs are everywhere!

3

Graph Mining: the big picture

• Graph/Global Level

• Subgraph/Community Level

• Node Level (We are here!)

4

Proximity on Graph: What?

[Figure: example graph with nodes A to J and edge weights of 1]

a.k.a Relevance, Closeness, ‘Similarity’…

5

Link Prediction

Q: How to predict the existence of a link?

A: Proximity! [Liben-Nowell+ 2003]

[Figure: two proximity histograms, "Prox. Hist. for a set of deleted links" and "Prox. Hist. for a set of absent links"; x-axis: Prox(i→j) + Prox(j→i), y-axis: density]

Prox. is effective at telling ‘deleted’ edges apart from absent edges!

6

Neighborhood Search on graphs

Q: What is the most related conference to ICDM?

A: Proximity! [Sun+ ICDM2005]

[Figure: conference-author bipartite graph. Conferences: ICDM, KDD, SDM, IJCAI, NIPS, AAAI, …; authors: Philip S. Yu, M. Jordan, Ning Zhong, R. Ramakrishnan, …]

7

Example

[Figure: ICDM and its most related conferences (KDD, SDM, ECML, PKDD, PAKDD, CIKM, DMKD, SIGMOD, ICML, ICDE), with proximity scores on the links ranging from 0.004 to 0.011]

8

Automatic Image Caption

Q: How to assign keywords to the test image?

A: Proximity! [Pan+ 2004]

[Figure: graph linking images, regions, and keywords (Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass), with the test image as the query]

9

Center-Piece Subgraph (CePS)

Q: How to find a hub for the black nodes?

A: Proximity! [Tong+ KDD 2006]

[Figure: input: original graph with query nodes A, B, C; output: CePS, in which A, B, C are connected through a center-piece node (the "CePS guy")]

10

Best-Effort Pattern Match

Q: How to find the matching subgraph?

A: Proximity! [Tong+ KDD 2007]

[Figure: input: data graph and query graph (Accountant, CEO, Manager, SEC); output: matching subgraph]

11

Challenge

• Graphs are evolving over time!

  – New nodes/edges show up;

  – Existing nodes/edges die out;

  – Edge weights change…

Q: How to generalize everything?

A: Track Proximity!

12

Trend analysis on graph level

[Figure: rank of influential-ness over the years for M. Jordan, G. Hinton, C. Koch, and T. Sejnowski; x-axis: Year, y-axis: Rank of Influential-ness]

13

Roadmap

• Motivation

• Prox. On Static Graphs

• Prox. On Time-Evolving Graphs

• Experimental Results

• Conclusion

14

Random walk with restart

[Figure: 12-node example graph with Node 4 as the query; the more red a node, the more relevant it is]

Ranking vector $\vec{r}_4$ for the query Node 4:

Node 1   0.13
Node 2   0.10
Node 3   0.13
Node 4   0.22
Node 5   0.13
Node 6   0.05
Node 7   0.05
Node 8   0.08
Node 9   0.04
Node 10  0.03
Node 11  0.04
Node 12  0.02

Nearby nodes, higher scores

15

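Computing RWR

$$\vec{r}_i = c\,W\,\vec{r}_i + (1-c)\,\vec{e}_i$$

• $\vec{r}_i$: ranking vector (n x 1)

• $W$: adjacency matrix (n x n), with each column summing to 1

• $\vec{e}_i$: starting vector (n x 1), with a 1 at the query node

• $1 - c$: restart probability

Example on the 12-node graph (query: Node 4, c = 0.9, restart p = 0.1):

$$\vec{r}_4 = 0.9 \times W \times \vec{r}_4 + 0.1 \times \vec{e}_4$$

$$
W = \begin{pmatrix}
0 & 1/3 & 1/3 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 1/4 & 0 & 0 & 0 & 0 \\
1/3 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/3 & 0 & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1/3 & 0 & 1/2 & 1/2 & 1/4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 0 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/3 & 0 & 0 & 1/4 & 0 & 0 & 0 & 1/2 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/4 & 0 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/3 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/4 & 0 & 1/3 & 0 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 1/3 & 0
\end{pmatrix}
$$

$$
\vec{r}_4 = (0.13,\ 0.10,\ 0.13,\ 0.22,\ 0.13,\ 0.05,\ 0.05,\ 0.08,\ 0.04,\ 0.03,\ 0.04,\ 0.02)^{T}, \qquad
\vec{e}_4 = (0,\ 0,\ 0,\ 1,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0)^{T}
$$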
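The fixed-point equation on this slide can be iterated directly. The following is a minimal sketch, not the speaker's implementation: the edge list is read off the nonzero pattern of the 12 x 12 matrix, and the function name `rwr`, the tolerance, and the iteration cap are illustrative choices.

```python
# Edges of the 12-node example, read off the nonzero pattern of W (1-based nodes).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6), (5, 7),
         (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11), (10, 12), (11, 12)]

def rwr(edges, n, query, c=0.9, tol=1e-12, max_iter=1000):
    """Iterate r <- c*W*r + (1-c)*e_query for a column-normalized W."""
    neighbors = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    r = {v: 0.0 for v in neighbors}
    r[query] = 1.0                          # start from the query node
    for _ in range(max_iter):
        new = {v: 0.0 for v in neighbors}
        for j, nbrs in neighbors.items():
            share = c * r[j] / len(nbrs)    # column j of W holds 1/deg(j)
            for i in nbrs:
                new[i] += share
        new[query] += 1.0 - c               # restart at the query node
        converged = max(abs(new[v] - r[v]) for v in r) < tol
        r = new
        if converged:
            break
    return r

scores = rwr(EDGES, n=12, query=4)
for node in sorted(scores):
    print(f"Node {node}: {scores[node]:.2f}")
```

Because each column of W sums to 1, the scores keep summing to 1 at every iteration, and the error shrinks by a factor of about c per step.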

16

Q: Given query i, how to solve it?

$$\vec{r}_i = 0.9 \times W \times \vec{r}_i + 0.1 \times \vec{e}_i$$

Here $W$ is the 12 x 12 adjacency matrix of the example graph and $\vec{e}_i$ is the starting vector for the query; the ranking vector $\vec{r}_i$ (marked "??" on the slide) is the unknown.
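One standard answer, sketched here under the assumption that each column of $W$ sums to 1 and $0 < c < 1$ (so that $I - cW$ is invertible), is to solve the fixed-point equation in closed form:

$$
\begin{aligned}
\vec{r}_i &= c\,W\,\vec{r}_i + (1-c)\,\vec{e}_i \\
(I - c\,W)\,\vec{r}_i &= (1-c)\,\vec{e}_i \\
\vec{r}_i &= (1-c)\,(I - c\,W)^{-1}\,\vec{e}_i
\end{aligned}
$$

With a dense solver, inverting the n x n matrix costs on the order of $n^3$ operations, which is why the size of the matrix to be inverted matters for large graphs.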

17

RWR on Bipartite Graph

[Figure: Author-Conf. matrix, n authors x m conferences]

Observation: n >> m!

Examples:

1. DBLP: 400k authors, 3.5k confs

2. NetFlix: 2.7M users, 18k movies

18

RWR on Skewed bipartite graphs

• Q: Given query i, how to solve it?

$$\vec{r}_i = 0.9 \times W \times \vec{r}_i + 0.1 \times \vec{e}_i, \qquad \vec{r}_i = \;??$$

For a bipartite graph with n authors and m confs, the adjacency matrix has the block form

$$W = \begin{pmatrix} 0 & A_r \\ A_c & 0 \end{pmatrix}$$

where $A_r$ is n x m and $A_c$ is m x n (rows and columns ordered as the n authors followed by the m confs).
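The block structure suggests a way around the n x n inverse. As a sketch, split the ranking and starting vectors into an author part and a conference part, $\vec{r} = (\vec{r}_a; \vec{r}_c)$ and $\vec{e} = (\vec{e}_a; \vec{e}_c)$ (names introduced here for illustration, not taken from the slides), and eliminate $\vec{r}_a$:

$$
\begin{aligned}
\vec{r}_a &= c\,A_r\,\vec{r}_c + (1-c)\,\vec{e}_a \\
\vec{r}_c &= c\,A_c\,\vec{r}_a + (1-c)\,\vec{e}_c \\
\Rightarrow\quad \vec{r}_c &= (I - c^2 A_c A_r)^{-1}\,\bigl[(1-c)\,\vec{e}_c + c\,(1-c)\,A_c\,\vec{e}_a\bigr]
\end{aligned}
$$

Only an m x m matrix is inverted, which is where the observation n >> m pays off. In this derivation the factor in front of $A_c A_r$ comes out as $c^2$; treat that constant as part of the sketch rather than as the slides' notation.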

19

BB_Lin: Pre-Computation [Tong+ 06]

• Step 1: $M = A_c \times A_r$ (2-step RWR for Conferences)

• Step 2: $(I - 0.9\,M)^{-1}$ (All Conf-Conf Prox. Scores)

• Cost: $O(m^3 + mE)$

• Examples

  – NetFlix: 1.5 hr for pre-computation;

  – DBLP: a few minutes

[Figure: bipartite graph of n authors and m conferences]

21

BB_Lin: Pre-Computation [Tong+ 06]

• Step 1: $M = A_c \times A_r$ (2-step RWR for Conferences)

• Step 2: $(I - 0.9\,M)^{-1}$ (All Conf-Conf Prox. Scores)

• Cost: $O(m^3 + mE)$

• Examples

  – NetFlix: 1.5 hr for pre-computation;

  – DBLP: a few minutes

[Figure: $A_c$/$A_r$ hold the E edges of the bipartite graph; the resulting core matrix is m x m]
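The two pre-computation steps can be sketched with NumPy on a toy graph. This is an illustration under stated assumptions, not the authors' code: the 4 x 3 weight matrix `B`, the helper names, and the column normalization of $A_r$ and $A_c$ are all choices made here, and the factor $c^2$ comes from eliminating the author block of $\vec{r} = cW\vec{r} + (1-c)\vec{e}$ (the slide writes the constant as 0.9).

```python
import numpy as np

def normalize(B):
    """Transition blocks for an n x m author-conference weight matrix B.

    Assumption of this sketch: every column of Ar (n x m) and of Ac (m x n)
    sums to 1, so W = [[0, Ar], [Ac, 0]] is column-normalized.
    """
    Ar = B / B.sum(axis=0, keepdims=True)        # conference -> its authors
    Ac = (B / B.sum(axis=1, keepdims=True)).T    # author -> their conferences
    return Ar, Ac

def precompute_core(B, c=0.9):
    """Step 1: M = Ac x Ar (m x m). Step 2: invert the m x m system."""
    Ar, Ac = normalize(B)
    M = Ac @ Ar
    core = np.linalg.inv(np.eye(M.shape[0]) - c**2 * M)
    return Ar, Ac, core

# Toy data: 4 authors x 3 conferences (hypothetical weights).
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
Ar, Ac, core = precompute_core(B)
print(core.shape)   # m x m, independent of the number of authors
```

Under these assumptions, column j of `(1 - c) * core` holds the proximity from conference j to every conference, so the whole conf-conf table is available after a single m x m inversion.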

22

BB_Lin: On-Line Stage

(Base) Case 1: Conf - Conf

Read out!

[Figure: authors and conferences linked by Ac/Ar (E edges)]

23

BB_Lin: On-Line Stage

Case 2: Au - Conf

1 matrix-vec!

[Figure: authors and conferences linked by Ac/Ar (E edges)]

24

BB_Lin: On-Line Stage

Case 3: Au - Au

2 matrix-vec!

[Figure: authors and conferences linked by Ac/Ar (E edges)]
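The three on-line cases can be sketched as follows. This is a hedged illustration: the toy matrix `B`, the function names, and the column normalization of `Ar` and `Ac` are assumptions of the sketch, and the constants follow from solving $\vec{r} = cW\vec{r} + (1-c)\vec{e}$ with $W = [[0, A_r], [A_c, 0]]$ block by block.

```python
import numpy as np

c = 0.9
# Toy data: 4 authors x 3 conferences (hypothetical weights).
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
Ar = B / B.sum(axis=0, keepdims=True)            # n x m, columns sum to 1
Ac = (B / B.sum(axis=1, keepdims=True)).T        # m x n, columns sum to 1
core = np.linalg.inv(np.eye(B.shape[1]) - c**2 * (Ac @ Ar))   # pre-computed

def conf_to_conf(j, k):
    """Case 1: proximity from conference j to conference k (read out)."""
    return (1 - c) * core[k, j]

def author_to_conf(i):
    """Case 2: scores of all conferences for query author i (1 matrix-vec)."""
    return c * (1 - c) * (core @ Ac[:, i])

def author_to_author(i):
    """Case 3: scores of all authors for query author i (2 matrix-vecs)."""
    e_i = np.zeros(B.shape[0])
    e_i[i] = 1.0
    return c**2 * (1 - c) * (Ar @ (core @ Ac[:, i])) + (1 - c) * e_i

print(conf_to_conf(0, 2))
print(author_to_conf(1))
print(author_to_author(1))
```

None of the three functions touches an n x n matrix; the only dense object kept around is the m x m core.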

25

BB_Lin: Examples

• NetFlix dataset (2.7M users x 18k movies)

  – 1.5 hr for pre-computation;

  – <1 sec for on-line

• DBLP dataset (400k authors x 3.5k confs)

  – A few minutes for pre-computation;

  – <0.01 sec for on-line

26

Roadmap

• Motivation

• Prox. On Static Graphs

• Prox. On Time-Evolving Graphs

• Experimental Results

• Conclusion

27

Challenges

• BB_Lin is good for skewed bipartite graphs

  – for NetFlix (2.7M nodes and 100M edges)

  – On-line cost for query: fraction of a second

  – w/ 1.5 hr pre-computation for m x m core matrix

• But… what if the graph is evolving over time?

  – New edges/nodes arrive; edge weights increase…

  – On-line cost: the 1.5 hr pre-computation itself becomes a part of this!

28

Q: How to update the core matrix?

• t=0: core matrix $(I - 0.9\,M)^{-1}$, cost $O(m^3 + mE)$

• t=1: core matrix $(I - 0.9\,\tilde{M})^{-1}$, cost $O(m^3 + m\tilde{E})$?

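29

Update the core matrix

• Step 1: $\tilde{M} = \tilde{A}_c \times \tilde{A}_r = M + \Delta M$, where $\Delta M$ is a rank-2 update (a product of two thin matrices)

• Step 2: $(I - 0.9\,\tilde{M})^{-1} = (I - 0.9\,M)^{-1}$ plus a correction term that is again a product of matrices

• Cost: $O(m^2)$, versus $O(m^3 + mE)$ for recomputing from scratch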
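One way to realize such an update is the Woodbury identity. The sketch below is an assumption-laden illustration, not the talk's algorithm as stated: it assumes the column normalization used in `normalize`, derives the rank-2 factors `X` and `Y` for a single edge-weight change under that normalization, and uses $c^2$ as the constant in the core matrix. All function names and the toy matrix are hypothetical.

```python
import numpy as np

def normalize(B):
    Ar = B / B.sum(axis=0, keepdims=True)        # n x m, columns sum to 1
    Ac = (B / B.sum(axis=1, keepdims=True)).T    # m x n, columns sum to 1
    return Ar, Ac

def core_matrix(B, c=0.9):
    Ar, Ac = normalize(B)
    M = Ac @ Ar
    return np.linalg.inv(np.eye(M.shape[0]) - c**2 * M)

def update_core(B, core, i, j, delta, c=0.9):
    """Core matrix after adding `delta` to edge (author i, conference j).

    For clarity the sketch re-normalizes all of B; a real implementation
    would only touch row i and column j.
    """
    Ar, Ac = normalize(B)
    B_new = B.copy()
    B_new[i, j] += delta
    Ar_new, Ac_new = normalize(B_new)
    a = Ar_new[:, j] - Ar[:, j]          # only column j of Ar changes
    b = Ac_new[:, i] - Ac[:, i]          # only column i of Ac changes
    e_j = np.zeros(core.shape[0])
    e_j[j] = 1.0
    # M_new - M = X @ Y, a rank-2 update
    X = np.column_stack([Ac @ a + a[i] * b, b])  # m x 2
    Y = np.vstack([e_j, Ar[i, :]])               # 2 x m
    k = c**2
    CX = core @ X                                # m x 2
    YC = Y @ core                                # 2 x m
    small = np.eye(2) - k * (Y @ CX)             # 2 x 2 system
    return core + k * (CX @ np.linalg.solve(small, YC)), B_new

# Toy data: 4 authors x 3 conferences (hypothetical weights).
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
core = core_matrix(B)
new_core, B_new = update_core(B, core, i=0, j=1, delta=2.0)
print(np.allclose(new_core, core_matrix(B_new)))
```

The products with `core` involve only two columns or two rows, so they cost on the order of $m^2$, and the only system solved is 2 x 2.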

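30

Update: General Case

• E’ edges changed

• Involves n’ authors, m’ confs.

• Observation: $\min(n', m') \le E'$

[Figure: bipartite graph of n authors and m conferences with the changed edges, and the updated matrix $\tilde{M} = \tilde{A}_c \times \tilde{A}_r$]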

31

Update: General Case

• Observation:

  – the rank of the update is small: $\min(n', m') \le E'$

  – Real Example (DBLP Post)

    • 1258 time steps

    • E’ up to ~20,000!

    • min(n’, m’) <= 132

• Our Algorithm: $O(\min(n', m')\,m^2 + E')$, versus $O(m^3 + mE)$ for recomputing from scratch

[Figure: bipartite graph of n authors and m conferences]
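A generic way to exploit a low-rank update is again the Woodbury identity: if the change in M factors as `X @ Y` with q columns, only a q x q system has to be solved. The sketch below illustrates that idea only. How the talk builds the factors is not shown in the slides, so `X` and `Y` are random placeholders, `q` stands in for the rank of the update, and `k` stands in for the constant in front of M.

```python
import numpy as np

def low_rank_update(core, X, Y, k):
    """Return inv(I - k*(M + X @ Y)) given core = inv(I - k*M).

    X is m x q and Y is q x m. The cost is about q*m^2 for the products
    plus q^3 for the small solve, instead of m^3 for a fresh inverse.
    """
    CX = core @ X                                  # m x q
    YC = Y @ core                                  # q x m
    small = np.eye(X.shape[1]) - k * (Y @ CX)      # q x q
    return core + k * (CX @ np.linalg.solve(small, YC))

# Placeholder data (hypothetical): a column-normalized M and a small rank-2 change.
rng = np.random.default_rng(0)
m, q, k = 6, 2, 0.81
M = rng.random((m, m))
M /= M.sum(axis=0, keepdims=True)
X = 0.05 * rng.random((m, q))
Y = 0.05 * rng.random((q, m))

core = np.linalg.inv(np.eye(m) - k * M)
updated = low_rank_update(core, X, Y, k)
print(np.allclose(updated, np.linalg.inv(np.eye(m) - k * (M + X @ Y))))
```

When q is much smaller than m, the q*m^2 term dominates, which has the same shape as the first term of the cost quoted on the slide.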

32

Roadmap

• Motivation

• Prox. On Static Graphs

• Prox. On Time-Evolving Graphs

• Experimental Results

• Conclusion

34

Philip S. Yu’s Top-5 conferences up to each year

1992         1997         2002     2007
ICDE         CIKM         KDD      ICDM
ICDCS        ICDCS        SIGMOD   KDD
SIGMETRICS   ICDE         ICDM     ICDE
PDIS         SIGMETRICS   CIKM     SDM
VLDB         ICMCS        ICDCS    VLDB

Areas: Databases, Performance, Distributed Sys. (earlier years); Databases, Data Mining (later years)

DBLP (Au. x Conf.): 400k authors, 3.5k confs, 20 yrs

35

KDD’s Rank wrt. VLDB over years

[Figure: proximity rank of KDD with respect to VLDB; x-axis: Year, y-axis: Prox. Rank]

Data Mining and Databases are more and more relevant!

37

10 most influential authors in NIPS community up to each year

Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years

[Figure: ranks of the most influential authors over the years, with T. Sejnowski and M. Jordan labeled]

38

Fast-Single-Update

[Figure: log(Time) (Seconds) per dataset, with our method annotated with a 176x speedup and a 40x speedup]

39

Fast-Batch-Update

[Figure: two panels, Time (Seconds) vs. min(n’, m’) and Time (Seconds) vs. E’, each with our method marked]

15x speed-up on average!

40

Conclusion

• Trends Analysis on Graph Level

  – pTrack/cTrack

• Scalable for evolving graphs

41

Thank you!

www.cs.cmu.edu/~htong