Damping Effect on PageRank Distribution

IEEE High Performance Extreme Computing, Waltham, MA, USA
September 26, 2018

Tiancheng Liu, Yuchen Qian, Xi Chen, Xiaobai Sun
Department of Computer Science, Duke University, USA


Outline

• Personalized PageRank model: invention by Brin and Page (1998), in need of innovative extension
• The PageRank model family: an analytic apparatus with increased descriptive power and scope
• Analysis: damping effects on PageRank distributions
• Algorithm: exploiting structures of the personalized, stochastic Krylov (PSK) space
• Findings: by experiments on real-world network data

Sparse graphs in sparse matrix representations

• Link graph $G(V, E)$ with directed edges $(u, v) \in E$
• Adjacency matrix $A$, with $A(v, u) = 1$ for each edge $(u, v) \in E$
• In-degrees $d_{in}$ and out-degrees $d_{out}$
• Probability transition matrix $P = A \cdot \mathrm{diag}(1./d_{out})$, kept in factor form in storage

[Figure: a 20-node link graph (nodes x1 to x20) with the sparsity patterns of its 20 × 20 adjacency matrix A and probability transition matrix P]
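The factor form can be illustrated with a minimal sketch that applies $P = A \cdot \mathrm{diag}(1./d_{out})$ to a vector without ever forming $P$. The 4-node edge list and the helper names `out_degrees` and `apply_P` are hypothetical, chosen only for illustration, and the sketch assumes every node has at least one out-link.

```python
def out_degrees(n, edges):
    """Count outgoing edges per node; each edge (u, v) means u -> v."""
    d_out = [0] * n
    for u, _ in edges:
        d_out[u] += 1
    return d_out


def apply_P(n, edges, d_out, x):
    """Compute y = P x with P = A diag(1 ./ d_out), without forming P.

    A(v, u) = 1 for each edge u -> v, so node u passes x[u] / d_out[u]
    along each of its out-links.
    """
    y = [0.0] * n
    for u, v in edges:
        y[v] += x[u] / d_out[u]
    return y


# Hypothetical 4-node toy graph (not from the slides); no dangling nodes.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]
n = 4
d_out = out_degrees(n, edges)
x = [0.25] * n                     # a probability distribution over nodes
y = apply_P(n, edges, d_out, x)    # one propagation step; still sums to 1
```

Storing the edge list and the degree vector separately keeps the memory footprint at one entry per edge plus one per node, which is the point of the factor form.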

Precursor: Personalized PageRank

Web surfing is modeled as a random walk on $M_\alpha(v)$, a Markov chain with a personalized term $S$:

$M_\alpha(v) = \alpha P + (1 - \alpha) S, \qquad S = v e^T$

where $\alpha$ is the damping factor, $P$ is the link graph, $v$ is the personalized vector, and $e^T$ is the gathering vector.

[Figure: the personalized Markov chain as the sum of the link graph (weight α) and the personalized direct links (weight 1 − α), shown on the 20-node example graph and as 20 × 20 matrix patterns with α = 0.85 and 1 − α = 0.15]

Bernoulli decision at each click:
• follow P-links with probability α ∈ (0, 1), a.k.a. the damping factor
• follow S-links with probability 1 − α

The personalized term S:
• direct links to v-nodes (yellow)
• gathering/broadcasting
• rank-1, stochastic

Equivalent expressions of PageRank distribution vector

Purpose: multi-aspect investigation for interpretation and computational analysis

1. Steady-state distribution of $M_\alpha$
   $M_\alpha x = [\alpha P + (1 - \alpha) v e^T] x = x$
   The power method: $M_\alpha^k x_0 \to x$. Asymptotic walk on $M_\alpha$, memoryless of $x_0$.

2. Solution to sparse linear system
   $(I - \alpha P) x = (1 - \alpha) v$
   Many iterative solution methods.

3. Explicit representation
   $x = (1 - \alpha) \sum_k \alpha^k (P^k v)$
   In Neumann series with $P$, $v$, $\alpha$. Cumulative propagation of $v$ on $P$.

4. Differential transition equation
   $\dot{x}(\alpha) = [P (I - \alpha P)^{-1} - (1 - \alpha)^{-1} I] \, x(\alpha)$
   Spectrum-based method.

[Figure: the power method and the Neumann series illustrated on the 20 × 20 example matrix and 20-node vectors]
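The first three expressions can be checked against each other numerically. The following is a minimal sketch using NumPy on a hypothetical 4-node graph with a uniform personalized vector (both are assumptions for illustration); the linear system is solved densely for brevity, whereas a large sparse problem would use an iterative solver.

```python
import numpy as np

# Hypothetical 4-node toy graph (not from the slides); edge (u, v) means u -> v.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]
n = 4
A = np.zeros((n, n))
for u, v_node in edges:
    A[v_node, u] = 1.0                  # A(v, u) = 1 for edge u -> v
P = A / A.sum(axis=0)                   # P = A diag(1 ./ d_out), column-stochastic

alpha = 0.85
v = np.full(n, 1.0 / n)                 # uniform personalized vector (assumption)
e = np.ones(n)

# 1. Power method on M_alpha = alpha P + (1 - alpha) v e^T.
M = alpha * P + (1 - alpha) * np.outer(v, e)
x_power = np.array([1.0, 0.0, 0.0, 0.0])   # any starting distribution x0
for _ in range(500):
    x_power = M @ x_power

# 2. Linear system (I - alpha P) x = (1 - alpha) v.
x_solve = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

# 3. Truncated Neumann series (1 - alpha) sum_k alpha^k P^k v.
x_series = np.zeros(n)
term = v.copy()
for k in range(500):
    x_series += (1 - alpha) * alpha**k * term
    term = P @ term
```

All three vectors agree to rounding error, and each sums to 1 because every step preserves the total probability mass.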


PageRank model family: characterizing various propagation patterns

Model description in equivalent expressions:
• Propagation kernel functions: propagation patterns
• Cumulative propagation on P
• Linear systems
• Differential transitions: PageRank distribution response to damping variation

[Figure: a few particular subfamilies of propagation kernel functions, plotted over steps 0 to 20. Panels: geometric kernels (Brin-Page), Poisson kernels (Chung), Conway-Maxwell-Poisson kernels (slow), Conway-Maxwell-Poisson kernels (fast), negative binomial kernels, logarithmic kernels]

Propagation kernel functions

Propagation kernel function $f_\rho(\lambda)$:

$f_\rho(\lambda) = \sum_k w_k(\rho) \, \lambda^k$

where $\lambda$ is a graph eigenvalue and $\{w_k(\rho)\}$ is a discrete pmf.

PageRank vector (model solution) with a particular network $P$ and personalized distribution vector $v$:

$x = f_\rho(P) v = \sum_k w_k(\rho) \cdot P^k v$

where $w_k(\rho)$ is the damping on the k-th step and $P^k v$ is the k-th step propagation.

$\{w_k(\rho)\}$: any probability mass function (pmf) of the variable $\rho$, with or without additional parameters.

[Figure: PageRank distributions of 3 propagation patterns with P for link graph Twitter(www) ¹, shown as histograms of the number of nodes (bin counts) against PageRank value on a logarithmic axis, for several damping values per pattern]

¹ H. Kwak et al. (2009)

Propagation pattern kernels: CMP sub-family

Conway-Maxwell-Poisson (CMP):

$w_k(\rho, \nu) = \dfrac{\rho^k}{(k!)^\nu \, Z}$

where $\rho$ is the damping variable, $\nu$ is the damping speed, and $Z$ is the normalization constant.

Damping speed parameter $\nu \ge 0$:
• ν = 0: geometric (Brin-Page, 1998)
• ν = 1: Poisson (Chung, 2007)
• ν < 1: slow decaying with k
• ν > 1: fast decaying with k

[Figure: slow and fast propagation patterns of the CMP distribution over steps 0 to 20. One panel: slow damping speed, 0 ≤ ν ≤ 1 (ρ = 0.9), including the BP model and Chung's model. Other panel: fast damping speed, ν ≥ 1 (ρ = 5)]
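A minimal sketch of the CMP weights follows, assuming the normalization constant $Z$ is approximated by truncating the series at a hypothetical cutoff `kmax` (the function name and the cutoff are illustrative, not from the slides). It checks the two special cases listed above.

```python
import math


def cmp_weights(rho, nu, kmax=400):
    """Truncated CMP weights w_k = rho^k / ((k!)^nu Z) for k = 0..kmax.

    Z is replaced by the truncated sum, so the returned weights sum to 1.
    For nu = 0 the series converges only when rho < 1.
    """
    # Work with logarithms so that k! does not overflow for large k.
    logs = [k * math.log(rho) - nu * math.lgamma(k + 1) for k in range(kmax + 1)]
    shift = max(logs)
    unnorm = [math.exp(t - shift) for t in logs]
    Z = sum(unnorm)
    return [u / Z for u in unnorm]


w_slow = cmp_weights(0.9, 0.0)   # nu = 0: geometric weights (1 - rho) rho^k
w_fast = cmp_weights(5.0, 1.0)   # nu = 1: Poisson weights e^{-rho} rho^k / k!
```

Values of ν between 0 and 1 interpolate between the two precursor models, and ν > 1 concentrates the weight on fewer steps.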

Propagation pattern kernels: NB sub-family

Negative Binomial (NB), at step k:

$w_k(\rho, r) = \dbinom{k + r - 1}{k} \rho^k (1 - \rho)^r$

where $\rho$ is the damping variable and $r$ is the distribution shape.

Distribution shape parameter r:
• r = 1: geometric distribution
• r → ∞: Poisson distribution, with $r \cdot \dfrac{\rho}{1 - \rho} = \text{const}$

[Figure: propagation patterns of the NB distribution over steps 0 to 20]
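The two limiting cases can be checked with a short sketch. The function name `nb_weights` and the parameter values are hypothetical; `lgamma` is used for the binomial coefficient, which also admits non-integer r (an extension assumed here, not stated on the slide).

```python
import math


def nb_weights(rho, r, kmax=50):
    """Negative binomial weights C(k + r - 1, k) rho^k (1 - rho)^r for k = 0..kmax."""
    out = []
    for k in range(kmax + 1):
        # log of the binomial coefficient via the gamma function
        log_binom = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        out.append(math.exp(log_binom + k * math.log(rho) + r * math.log(1.0 - rho)))
    return out


# r = 1 reduces to the geometric weights (1 - rho) rho^k.
rho = 0.6
geo = nb_weights(rho, 1.0)

# Large r with the mean r * rho / (1 - rho) held at mu approaches Poisson(mu).
mu, r = 5.0, 1.0e6
near_poisson = nb_weights(mu / (r + mu), r, kmax=20)
poisson = [math.exp(-mu) * mu**k / math.factorial(k) for k in range(21)]
```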

Propagation pattern kernels: logarithmic distribution

Logarithmic, at step k:

$w_k(\rho) = \dfrac{-1}{\ln(1 - \rho)} \cdot \dfrac{\rho^k}{k}, \qquad \rho \in (0, 1)$

Unique new model in the model family:
• weights decay faster than the geometric distribution
• weights decay slower than the Poisson distribution
• no extra control parameters

[Figure: propagation patterns of logarithmic distributions over steps 0 to 20]
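A minimal sketch of the logarithmic weights, assuming the standard support k ≥ 1 of the logarithmic distribution and a hypothetical truncation at `kmax`. The successive ratios make the "between geometric and Poisson" decay visible: $w_{k+1}/w_k = \rho \, k/(k+1)$, compared with $\rho$ for the geometric pmf and $\rho/(k+1)$ for the Poisson pmf at the same parameter value.

```python
import math


def log_weights(rho, kmax=2000):
    """Logarithmic weights w_k = -rho^k / (k ln(1 - rho)), indexed by k.

    Index 0 holds 0.0 because the distribution is supported on k >= 1.
    """
    z = -1.0 / math.log(1.0 - rho)
    return [0.0] + [z * rho**k / k for k in range(1, kmax + 1)]


rho = 0.9
w = log_weights(rho)
total = sum(w)                                   # close to 1 for large kmax
ratios = [w[k + 1] / w[k] for k in range(2, 10)]  # rho * k / (k + 1)
```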

Propagation pattern kernels: precursor models and new model

Precursor models:
• Brin-Page ¹ model: geometric distribution, $w_k(\alpha) = (1 - \alpha) \alpha^k$
• Chung's ² model: Poisson distribution, $w_k(\beta) = e^{-\beta} \dfrac{\beta^k}{k!}$

New model in the family:
• log-γ model: logarithmic distribution, $w_k(\gamma) = \dfrac{-1}{\ln(1 - \gamma)} \cdot \dfrac{\gamma^k}{k}$

¹ L. Page and S. Brin, 1998   ² F. Chung, PNAS, 2007

[Figure: pmfs of the three models over steps 0 to 20]

Cumulative propagation on P

[Figure: link graph P and personalized vector v; propagation on P through the vectors $v, Pv, P^2 v, \dots, P^{m-1} v$; kernel weights over steps 0 to 20 for the geometric kernel (Brin-Page), the Poisson kernel (Chung), and the logarithmic kernel (log-γ); and the resulting PageRank vectors on the 20-node example]

$x(\alpha) = z_\alpha \sum_k \alpha^k \, P^k v$

$x(\beta) = z_\beta \sum_k \dfrac{\beta^k}{k!} \, P^k v$

$x(\gamma) = z_\gamma \sum_k \dfrac{\gamma^k}{k} \, P^k v$
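Cumulative propagation can be sketched as one routine shared by all three kernels: accumulate $w_k P^k v$ while advancing $P^k v$ by one matrix-vector product per step. The toy graph, the truncation length `K`, the parameter values, and the helper name `propagate` are assumptions for illustration.

```python
import math
import numpy as np


def propagate(P, v, weights):
    """Return sum_k weights[k] * P^k v using repeated matrix-vector products."""
    x = np.zeros_like(v, dtype=float)
    term = v.astype(float)
    for w_k in weights:
        x += w_k * term
        term = P @ term
    return x


# Hypothetical 4-node toy graph (not from the slides); edge (u, v) means u -> v.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]
n = 4
A = np.zeros((n, n))
for u, v_node in edges:
    A[v_node, u] = 1.0
P = A / A.sum(axis=0)
v = np.full(n, 1.0 / n)

K = 400                                  # truncation length (assumption)
alpha, beta, gamma = 0.85, 5.0, 0.9
geometric = [(1 - alpha) * alpha**k for k in range(K)]
poisson = [math.exp(-beta + k * math.log(beta) - math.lgamma(k + 1)) for k in range(K)]
logarithmic = [0.0] + [-gamma**k / (k * math.log(1 - gamma)) for k in range(1, K)]

x_alpha = propagate(P, v, geometric)
x_beta = propagate(P, v, poisson)
x_gamma = propagate(P, v, logarithmic)
```

Because each weight sequence is a pmf and P is column-stochastic, every result is again a probability distribution over the nodes.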

Linear systems

Closed-form expression of the coefficient matrix:

$A_\rho(P) \, x = v, \qquad A_\rho(P) = f_\rho^{-1}(P)$

Particular instances:
• Brin-Page model: $A_\alpha(P) = (1 - \alpha)^{-1} (I - \alpha P)$
• Chung's model: $A_\beta(P) = e^{\beta (I - P)}$
• log-γ model: $A_\gamma(P) = \ln(1 - \gamma) \, \ln^{-1}(I - \gamma P)$

– Except for the Brin-Page model, explicit formation of the coefficient matrix is unnecessary
– This formulation is used for the derivation of the differential transition equation (next)
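The closed forms can be traced back to the kernel power series. The derivation below is a sketch that sums each series for a scalar $\lambda$ with $|\lambda| \le 1$ and then inverts, assuming the logarithmic kernel starts at $k = 1$; the matrix versions follow by substituting $P$ for $\lambda$.

```latex
\begin{aligned}
f_\alpha(\lambda) &= (1-\alpha)\sum_{k\ge 0}\alpha^k\lambda^k
  = \frac{1-\alpha}{1-\alpha\lambda}
  &&\Rightarrow\quad A_\alpha(P) = (1-\alpha)^{-1}(I-\alpha P),\\[4pt]
f_\beta(\lambda) &= e^{-\beta}\sum_{k\ge 0}\frac{\beta^k\lambda^k}{k!}
  = e^{-\beta(1-\lambda)}
  &&\Rightarrow\quad A_\beta(P) = e^{\beta(I-P)},\\[4pt]
f_\gamma(\lambda) &= \frac{-1}{\ln(1-\gamma)}\sum_{k\ge 1}\frac{\gamma^k\lambda^k}{k}
  = \frac{\ln(1-\gamma\lambda)}{\ln(1-\gamma)}
  &&\Rightarrow\quad A_\gamma(P) = \ln(1-\gamma)\,\ln^{-1}(I-\gamma P).
\end{aligned}
```

In each case $f_\rho(1) = 1$, which reflects that the weights form a pmf.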

Differential transition

Effect of damping variation in one model: node-wise trajectory of the PageRank vector $x(\rho)$

$\dot{x}(\rho) = \dfrac{d}{d\rho} x(\rho) = \partial_\rho f_\rho(P) \, v = Q_\rho(P) \, x(\rho)$

at any particular value of $\rho$.

• Brin-Page model: $Q_\alpha(P) = P (I - \alpha P)^{-1} - (1 - \alpha)^{-1} I$
• Chung's model: $Q = -(I - P)$
• log-γ model: $Q_\gamma(P) = \dfrac{(1 - \gamma)^{-1}}{\ln(1 - \gamma)} I - P (I - \gamma P)^{-1} (\ln(I - \gamma P))^{-1}$

◦ Matrix-vector multiplication for Chung's model
◦ A linear solver may be used once again for the Brin-Page model
• An efficient spectrum-based algorithm for all models, without eigen-decomposition of P
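For the Brin-Page model the transition equation can be checked numerically. This is a minimal sketch on a hypothetical 4-node graph that compares $Q_\alpha(P) x(\alpha)$, evaluated with one extra linear solve, against a central finite difference; the graph, the step `h`, and the helper name `x_of` are assumptions.

```python
import numpy as np

# Hypothetical 4-node toy graph (not from the slides); edge (u, v) means u -> v.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]
n = 4
A = np.zeros((n, n))
for u, v_node in edges:
    A[v_node, u] = 1.0
P = A / A.sum(axis=0)
v = np.full(n, 1.0 / n)
eye = np.eye(n)


def x_of(alpha):
    """Brin-Page PageRank vector x(alpha) = (1 - alpha) (I - alpha P)^{-1} v."""
    return np.linalg.solve(eye - alpha * P, (1 - alpha) * v)


alpha = 0.85
x = x_of(alpha)

# Analytical derivative Q_alpha(P) x = P (I - alpha P)^{-1} x - x / (1 - alpha).
# The resolvent is applied through a linear solve, not an explicit inverse.
x_dot = P @ np.linalg.solve(eye - alpha * P, x) - x / (1 - alpha)

# Central finite difference for comparison.
h = 1e-6
x_dot_fd = (x_of(alpha + h) - x_of(alpha - h)) / (2 * h)
```

The entries of `x_dot` sum to zero, as they must, since $x(\alpha)$ remains a probability distribution for every α.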


Inter-model correspondence

Statistically similar damping level of propagation on P: at the expected propagation weight center

$\mu(w_k(\rho)) = \sum_{k \in N_w} k \cdot w_k(\rho)$

• Brin-Page ←→ Chung's: $\dfrac{\alpha}{1 - \alpha} = \beta$
• Brin-Page ←→ log-γ: $\dfrac{\alpha}{1 - \alpha} = \dfrac{-\gamma (1 - \gamma)^{-1}}{\ln(1 - \gamma)}$

[Figure: pmfs associated with the Brin-Page, Chung's, and log-γ models at corresponding damping variables (α = 0.85, β = 5.66, γ = 0.94), over steps 0 to 20]
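Matching the means gives β in closed form and γ as the root of a scalar equation. A minimal sketch, assuming the mean of the logarithmic pmf is $-\gamma / ((1 - \gamma) \ln(1 - \gamma))$ and solving for γ by bisection; the function names are hypothetical.

```python
import math


def beta_from_alpha(alpha):
    """Poisson mean beta matched to the geometric mean alpha / (1 - alpha)."""
    return alpha / (1.0 - alpha)


def log_mean(gamma):
    """Mean of the logarithmic pmf: -gamma / ((1 - gamma) ln(1 - gamma))."""
    return -gamma / ((1.0 - gamma) * math.log(1.0 - gamma))


def gamma_from_alpha(alpha, tol=1e-12):
    """Solve log_mean(gamma) = alpha / (1 - alpha) by bisection.

    The logarithmic mean is at least 1, so a match exists only for alpha > 0.5.
    """
    target = alpha / (1.0 - alpha)
    if target <= 1.0:
        raise ValueError("no matching gamma: need alpha > 0.5")
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_mean(mid) < target:   # log_mean is increasing in gamma
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta = beta_from_alpha(0.85)
gamma = gamma_from_alpha(0.85)
```

For α = 0.85 this gives β close to 5.67 and γ close to 0.9415, in line with the corresponding damping values quoted on the slides.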

Intra-model damping effect by KL divergence and its derivative

Aggregated effect of damping variation: KL divergence of PageRank vectors (scalar)

$KL(x(\rho), x(\rho_o)) = \sum_i x_i(\rho) \log \dfrac{x_i(\rho)}{x_i(\rho_o)}$

$\dfrac{d}{d\rho} KL(x(\rho), x(\rho_o)) = \dot{x}(\rho)^T \left( \log x(\rho) - \log x(\rho_o) + e \right)$

[Figure: damping variation in KL and dKL/dρ (Twitter-www, Brin-Page model) for damping values from 0.7 to 1. Curves: KL divergence, analytical derivative, empirical derivative with step 0.004, empirical derivative with step 0.002]

* dKL/dρ in red, KL in blue
* the reference damping factor is denoted as $\rho_o$
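The derivative formula can be checked on a small example. The sketch below uses the Brin-Page model on a hypothetical 4-node graph, where $\dot{x}(\alpha)$ is obtained from $Q_\alpha(P) x(\alpha)$, and compares the analytical value with a central finite difference; the graph, the step `h`, and the helper names are assumptions.

```python
import numpy as np

# Hypothetical 4-node toy graph (not from the slides); edge (u, v) means u -> v.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]
n = 4
A = np.zeros((n, n))
for u, v_node in edges:
    A[v_node, u] = 1.0
P = A / A.sum(axis=0)
v = np.full(n, 1.0 / n)
eye = np.eye(n)


def x_of(alpha):
    """Brin-Page PageRank vector (strictly positive for uniform v)."""
    return np.linalg.solve(eye - alpha * P, (1 - alpha) * v)


def kl(x, x_ref):
    """KL(x, x_ref) = sum_i x_i log(x_i / x_ref_i)."""
    return float(np.sum(x * np.log(x / x_ref)))


alpha0 = 0.85
x_ref = x_of(alpha0)


def dkl(alpha):
    """Analytical derivative x_dot^T (log x - log x_ref + e)."""
    x = x_of(alpha)
    x_dot = P @ np.linalg.solve(eye - alpha * P, x) - x / (1 - alpha)
    return float(x_dot @ (np.log(x) - np.log(x_ref) + 1.0))


alpha = 0.9
h = 1e-6
dkl_fd = (kl(x_of(alpha + h), x_ref) - kl(x_of(alpha - h), x_ref)) / (2 * h)
```

At the reference value α = α0 the derivative vanishes, because the log terms cancel and the entries of $\dot{x}$ sum to zero.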


Personalized, stochastic Krylov space

Personalized, stochastic Krylov (PSK) space:

$PSK(P, v) = \mathrm{span}\{v, Pv, P^2 v, \dots, P^k v, \dots\}, \qquad v \ge 0, \quad e^T v = 1$

Properties:
◦ Any convex combination of the Krylov vectors is a probability distribution
◦ The same PSK space is shared by all models, housing all model solutions and their trajectories
◦ The PSK space is of finite dimension m
◦ Let $K = [v, Pv, P^2 v, \dots, P^{m-1} v]$ and $K = QR$. There exists a Hessenberg matrix $H$ such that $PQ = QH$, $Q e_1 = v$, and $g(P) v = Q \, g(H) \, e_1$ for any function $g$

[Figure: link graph P and personalized vector v on the 20-node example, with the Krylov vectors $v, Pv, P^2 v, \dots, P^{m-1} v$]

PageRank vector: $x(\rho) = f_\rho(P) v \in PSK(P, v)$

PageRank vector trajectory: $\dot{x}(\rho) = Q_\rho(P) x(\rho) \in PSK(P, v)$
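The last property can be sketched with the Arnoldi process, which is swapped in here for the explicit QR factorization of the Krylov matrix: in exact arithmetic it yields the same orthonormal basis, and it is usually better behaved numerically. The toy graph and the function name `arnoldi` are assumptions. With an orthonormal $Q$ the first column is $v / \|v\|_2$, so the sketch carries that scale factor explicitly.

```python
import numpy as np


def arnoldi(P, v, tol=1e-12):
    """Orthonormal basis Q of span{v, Pv, P^2 v, ...} and Hessenberg H with P Q = Q H.

    Modified Gram-Schmidt Arnoldi; stops when the space stops growing,
    so the PSK dimension is m = Q.shape[1].
    """
    n = len(v)
    Q = [v / np.linalg.norm(v)]
    H = np.zeros((n + 1, n))
    m = n
    for j in range(n):
        w = P @ Q[j]
        for i in range(j + 1):
            H[i, j] = Q[i] @ w
            w = w - H[i, j] * Q[i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol or j == n - 1:
            m = j + 1
            break
        Q.append(w / H[j + 1, j])
    return np.column_stack(Q), H[:m, :m]


# Hypothetical 4-node toy graph (not from the slides); edge (u, v) means u -> v.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]
n = 4
A = np.zeros((n, n))
for u, v_node in edges:
    A[v_node, u] = 1.0
P = A / A.sum(axis=0)
v = np.full(n, 1.0 / n)
alpha = 0.85

Q, H = arnoldi(P, v)
m = Q.shape[1]
e1 = np.zeros(m)
e1[0] = 1.0
scale = np.linalg.norm(v)          # Q e1 = v / ||v||_2 for orthonormal Q

# g(P) v = ||v|| Q g(H) e1, with g the Brin-Page kernel (1 - alpha)(1 - alpha z)^{-1}.
x_small = scale * (Q @ np.linalg.solve(np.eye(m) - alpha * H, (1 - alpha) * e1))
x_full = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
```

On this toy graph the process stops at m = 2 although n = 4, so the PageRank vector is recovered from a 2 × 2 system; any other kernel function $g$ could be applied to the same small matrix $H$.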

Efficient algorithm for damping effect analysis

Intra-model and inter-model damping variations, across all models under consideration, based on the PSK properties, without eigen-decomposition.

1. Krylov space construction: from $P$ (n × n) and $v$ (n × 1), form the Krylov matrix $K$ (n × m)
2. QR decomposition: $K = QR$, with $Q$ (n × m) and $R$ (m × m)
3. $PQ = QH$: obtain the Hessenberg matrix $H$ (m × m)
4. $g(P) v = Q \, g(H) \, e_1$: obtain the PageRank distributions $\{x(\rho)\}$ and the PageRank distribution trajectories $\{\dot{x}(\rho)\}$


Data: real-world large social and knowledge network snapshots

Network          Total #nodes   #nodes in LSCC   [max(d_out), µ(d_out), max(d_in)]
Google ¹              875,713          434,818   [4209, 8.86, 382]
Wikilink ²         12,150,976        7,283,915   [7527, 50.48, 920207]
DBpedia ³          18,268,992        3,796,073   [8104, 26.76, 414924]
Twitter(www) ⁴     41,652,230       33,479,734   [2936232, 42.65, 768552]
Twitter(mpi) ⁵     52,579,682       40,012,384   [778191, 47.57, 3438929]
Friendster ⁶       68,349,466       48,928,140   [3124, 32.76, 3124]

¹ Google Inc. (2002)   ² Wikipedia Foundation (2017)   ³ DBpedia (2017)
⁴ H. Kwak et al. (2009)   ⁵ M. Cha et al. (2010)   ⁶ ArchiveTeam (2011)

Sparse real-world networks under Dulmage-Mendelsohn permutation

[Figure: block sparsity patterns of the six networks. Panels: Google (τ = 8), DBpedia (τ = 2), Wikilink (τ = 2), Twitter(www) (τ = 2), Twitter(mpi) (τ = 3), Friendster (τ = 3)]

Each point represents a 1000 × 1000 block; a block with ≥ τ non-zeros is colored blue.

Personalized stochastic Krylov space: small-world phenomenon

[Figure: effective PSK(P, v) dimension m by $R_{ii}$ in the QR decomposition, plotted for indices 0 to 80 on a vertical axis from −16 to 0. Panels: Google (m = 62), DBpedia (m = 19), Wikilink (m = 27), Twitter(www) (m = 25), Twitter(mpi) (m = 30), Friendster (m = 24)]
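One way to read an effective dimension off the diagonal of $R$ is to count the entries that stay above a relative threshold. The sketch below is an assumption about how such a count could be made, not the authors' procedure; the threshold `tol`, the function name, and the toy graph are hypothetical.

```python
import numpy as np


def effective_psk_dimension(P, v, max_steps, tol=1e-12):
    """Count |R_ii| (from K = QR) that exceed tol times the largest |R_ii|."""
    cols = [v.astype(float)]
    for _ in range(max_steps - 1):
        cols.append(P @ cols[-1])          # next Krylov vector P^k v
    K = np.column_stack(cols)
    R = np.linalg.qr(K, mode="r")          # only the triangular factor is needed
    d = np.abs(np.diag(R))
    return int(np.sum(d > tol * d.max())), d


# Hypothetical 4-node toy graph (not from the slides); edge (u, v) means u -> v.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 0)]
n = 4
A = np.zeros((n, n))
for u, v_node in edges:
    A[v_node, u] = 1.0
P = A / A.sum(axis=0)

m_uniform, d_uniform = effective_psk_dimension(P, np.full(n, 0.25), max_steps=n)
m_single, d_single = effective_psk_dimension(P, np.array([1.0, 0.0, 0.0, 0.0]), max_steps=n)
```

On this toy graph the uniform personalized vector spans only a 2-dimensional space, while a vector concentrated on one node spans all 4 dimensions, so the dimension depends on v as well as on P.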

Damping effect: KL and dKL/dρ across models

[Figure: KL divergence, analytical derivative, and empirical derivatives (steps 0.004 and 0.002) against the damping variable. Panels by reference damping value: α0 = 0.85, γ0 = 0.94146, β0 = 5.6, α0 = 0.95, γ0 = 0.98831, β0 = 19]

Twitter(www) dataset:
• substantially different sensitivity patterns across models
• the B-P model and the log-γ model are sensitive when the damping parameter approaches 1
• Chung's model is less sensitive to damping parameter change, especially with large β

Damping effect: KL and dKL/dρ across datasets

[Figure: KL divergence, analytical derivative, and empirical derivatives (steps 0.004 and 0.002; 0.008 and 0.002 for Google) for damping values from 0.7 to 1. Panels: Google, DBpedia, Wikilink, Twitter(www), Twitter(mpi), Friendster]

Brin-Page model, α0 = 0.85:
• similar trend across 6 datasets
• low variation with relatively small α
• substantially larger variation when α → 1

Intra-model variation: PageRank vector profiles across models

[Figure: PageRank vector profile (normalized histogram of PageRank values) on the Twitter(www) dataset, with the number of nodes (bin counts) against PageRank value on a logarithmic axis, for several damping values per model. Panels: Brin-Page model, Chung's model, log-γ model]

Intra-model variation: PageRank vector profiles across datasets

[Figure: PageRank vector profiles, with the number of nodes (bin counts) against PageRank value on a logarithmic axis. Panels: Google, DBpedia, Wikilink, Twitter(www), Twitter(mpi), Friendster]

Brin-Page model, α0 = 0.85

Recap

Intellectual merits
◦ Rich family of PageRank models: capturing, differentiating various activities and propagation patterns with quantitative form and speed
◦ Unified analysis of damping effects: easily instantiated on particular network P and personalized vector v
◦ The PSK space: residence for all model solutions, foundation for efficient model solution methods

Experimental findings
• Model utility: inter-model difference in PageRank distribution profile is much greater than intra-model difference
• Bump/peak in PageRank distribution: single, with minority support
• The PSK dimension: with small-world networks, the dimension of the personalized, stochastic Krylov space is low, which leads to upper bounds on algorithm complexity

Thank you!

Tiancheng Liu – tcliu [at] cs.duke.edu
