Link Reconstruction from Partial Information

Gong Xiaofeng, Li Kun & C. H. Lai, TSL@NUS




General situations where problems may arise

Observed network: an N×N adjacency matrix A filled with 0s and 1s. Scenarios:

A) No side information: statistical analysis, clustering, modeling, processes, etc.
B) Some links are uncertain (positions known): the link reconstruction problem, approached with a model or a similarity measure.
C) Some 1s have been set to 0s (positions unknown): a variant of the link reconstruction problem, possibly related to link prediction.
D) The network is subject to change: a kind of prediction problem (link prediction, node prediction, network evolution, etc.).


B.1 Problem of network reconstruction

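Example five-node network; the unknown links (drawn as dashed arrows) are marked ? in its adjacency matrix (rows and columns indexed by nodes 1 to 5):

$$A = \begin{pmatrix} 0 & 1 & 0 & ? & 0 \\ 1 & 0 & ? & 0 & 1 \\ 0 & ? & 0 & 1 & 1 \\ ? & 0 & 1 & 0 & ? \\ 0 & 1 & 1 & ? & 0 \end{pmatrix}$$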

Infer the values (0 or 1) of the dashed arrows.

Some links are unknown: they may be corrupted, missing, or impossible to measure at the time.

Presumptions:
o The network has structure.
o The unknown links are sampled fairly.
o The number of unknown links is small.
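One way to hold the partially observed matrix in code is sketched below; using `None` for an unknown entry and mapping nodes 1 to 5 onto indices 0 to 4 are our own illustrative choices, not part of the slides. Because the matrix is symmetric, the unknown links are collected from the upper triangle only.

```python
from typing import List, Optional, Tuple

# Partially observed adjacency matrix of the five-node example (nodes 1-5 -> indices 0-4).
# None marks an unknown link; this encoding is an illustrative assumption.
Matrix = List[List[Optional[int]]]

EXAMPLE: Matrix = [
    [0, 1, 0, None, 0],
    [1, 0, None, 0, 1],
    [0, None, 0, 1, 1],
    [None, 0, 1, 0, None],
    [0, 1, 1, None, 0],
]


def unknown_pairs(a: Matrix) -> List[Tuple[int, int]]:
    """Return the unknown node pairs (i, j) with i < j of a symmetric matrix."""
    n = len(a)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if a[i][j] is None]


print(unknown_pairs(EXAMPLE))  # [(0, 3), (1, 2), (3, 4)]
```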


B.2 Procedures of reconstruction of links

Available information -> fitted probabilistic model P (N×N) -> connection probability p(i,j) of each unknown link (i,j) -> choose a threshold p_t for the connection probability -> set (i,j) to 1 if p(i,j) > p_t, and to 0 otherwise.

[Diagram: observed network -> parameters / model function -> optimization -> connection probability -> threshold -> reconstruction or prediction, grouped into a modeling stage and a prediction stage.]
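The last step of the procedure is a simple decision rule. A minimal sketch, assuming the fitted model has already produced a dictionary mapping each unknown pair to p(i,j) (the dictionary format and the example probabilities are hypothetical):

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]


def reconstruct(probs: Dict[Pair, float], p_t: float) -> Dict[Pair, int]:
    """Set each unknown link to 1 if its connection probability exceeds p_t, else 0."""
    return {pair: 1 if p > p_t else 0 for pair, p in probs.items()}


# Hypothetical connection probabilities for the three unknown pairs of the example.
probs = {(0, 3): 0.12, (1, 2): 0.67, (3, 4): 0.41}
print(reconstruct(probs, p_t=0.5))  # {(0, 3): 0, (1, 2): 1, (3, 4): 0}
```

Everything that distinguishes one method from another is hidden in how p(i,j) and p_t are obtained.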


B.3 Reformulated signal detection problem

Observed network -> three types of signals: 0, 1 and ?.
Fitted model -> connection probabilities of the 0-links and 1-links, P0 and P1.
Signals to be classified -> the connection probabilities P? of the ?-links.

Problem: given a connection probability P?, determine the type of signal (0 or 1).

Assumption (under a given model): the unknown links do not significantly affect the reliability of the fitted model (P0 and P1); that is, the connection probability P? of any unknown link can be regarded as sampled from P0 or from P1.

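B.4 An equivalent hypothesis testing problem

Search for an optimal detection scheme, e.g., via the Neyman-Pearson criterion:

Observation (data): the connection probability p.
Hypotheses: H0: 0-link; H1: 1-link.
Data space E, divided into the acceptance regions R0 and R1.
Decision D: D0 (accept H0) or D1 (accept H1).

$$D = \begin{cases} D_0, & p \in R_0 \\ D_1, & p \in R_1 \end{cases} \qquad R_0 \cup R_1 = E, \quad R_0 \cap R_1 = \emptyset$$

$$P_f = P(D_1 \mid H_0), \qquad P_d = P(D_1 \mid H_1), \qquad P_m = P(D_0 \mid H_1) = 1 - P_d$$

$$\min_{D} \; f\big(P(D_0 \mid H_1),\, P(D_1 \mid H_0)\big)$$

That is, the decision rule is chosen to minimize some function f of the two error probabilities. The Neyman-Pearson criterion, for example, minimizes the miss probability P(D0|H1) subject to an upper bound on the false-alarm probability P(D1|H0).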
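Under the Neyman-Pearson criterion, a threshold rule (decide D1 when p > p_t) can be tuned so that the false-alarm probability stays below a level alpha. A minimal sketch, assuming the connection probabilities of known 0-links serve as samples from H0 and that larger p favors H1, so a single threshold is the natural rule; `np_threshold` and `alpha` are illustrative names:

```python
import math
from typing import Sequence


def np_threshold(p0: Sequence[float], alpha: float) -> float:
    """Smallest sample threshold p_t whose empirical P(p > p_t | H0) is at most alpha."""
    ranked = sorted(p0)
    n = len(ranked)
    k = math.floor(alpha * n + 1e-12)  # number of 0-links allowed above the threshold
    if k >= n:
        return float("-inf")  # every link can be labeled 1 without exceeding alpha
    return ranked[n - k - 1]


def false_alarm_rate(p0: Sequence[float], p_t: float) -> float:
    """Empirical P(D1 | H0): fraction of 0-link probabilities above p_t."""
    return sum(p > p_t for p in p0) / len(p0)
```

Any lower sample threshold would push the empirical P(D1|H0) above alpha, so the returned value is the most permissive threshold, and hence the one with the highest detection rate, among threshold rules that respect the bound.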


B.5 Measuring reconstruction performance

Contingency table (or confusion matrix):

                | actual p            | actual n            |
predicted p'    | True Positive (TP)  | False Positive (FP) | P'
predicted n'    | False Negative (FN) | True Negative (TN)  | N'
                | P                   | N                   |

Statistics defined:
Sensitivity or True Positive Rate (TPR): TPR = TP/P = TP/(TP+FN)
False Positive Rate (FPR): FPR = FP/N = FP/(FP+TN)
Accuracy (ACC): ACC = (TP+TN)/(P+N)
Specificity or True Negative Rate (SPC): SPC = TN/N = 1-FPR
Positive Predictive Value (PPV): PPV = TP/(TP+FP)
Receiver Operating Characteristic (ROC): TPR vs. FPR
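A small helper makes the definitions concrete. It is a sketch with illustrative names: it counts the four cells from the true and predicted 0/1 labels of the unknown links and evaluates each statistic exactly as defined above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int

    def tpr(self) -> float:
        """Sensitivity: TP / P = TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def fpr(self) -> float:
        """FP / N = FP / (FP + TN)."""
        return self.fp / (self.fp + self.tn)

    def acc(self) -> float:
        """(TP + TN) / (P + N)."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def spc(self) -> float:
        """Specificity: TN / N = 1 - FPR."""
        return self.tn / (self.fp + self.tn)

    def ppv(self) -> float:
        """TP / (TP + FP)."""
        return self.tp / (self.tp + self.fp)


def confusion(actual: Sequence[int], predicted: Sequence[int]) -> Confusion:
    """Count the four outcomes for parallel sequences of 0/1 labels."""
    pairs = list(zip(actual, predicted))
    return Confusion(
        tp=sum(a == 1 and d == 1 for a, d in pairs),
        fp=sum(a == 0 and d == 1 for a, d in pairs),
        fn=sum(a == 1 and d == 0 for a, d in pairs),
        tn=sum(a == 0 and d == 0 for a, d in pairs),
    )
```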


B.6 Relation to performance measures

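[Figure: probability density functions f0(p) and f1(p) of the connection probabilities of 0-links and 1-links; the threshold p_t splits the areas under the two curves into the regions R1 to R4.]

For a threshold p_t, the entries of the contingency table are proportional to the areas S(R_i) of these regions. With P actual 1-links and N actual 0-links:

$$TP = P\int_{p_t}^{1} f_1(p)\,dp, \quad FN = P\int_{0}^{p_t} f_1(p)\,dp, \quad FP = N\int_{p_t}^{1} f_0(p)\,dp, \quad TN = N\int_{0}^{p_t} f_0(p)\,dp$$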
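With finitely many links, the integrals become counts. The sketch below assumes the connection probabilities of the true 1-links (`p1`) and true 0-links (`p0`) are available as lists (hypothetical inputs). It returns the four cells for one threshold and sweeps p_t over every observed value to trace the ROC curve from (0, 0) to (1, 1).

```python
from typing import List, Sequence, Tuple


def counts_at_threshold(p1: Sequence[float], p0: Sequence[float],
                        p_t: float) -> Tuple[int, int, int, int]:
    """(TP, FN, FP, TN) when links with p > p_t are labeled 1.

    Each count is the empirical counterpart of one area under f1 or f0.
    """
    tp = sum(p > p_t for p in p1)
    fp = sum(p > p_t for p in p0)
    return tp, len(p1) - tp, fp, len(p0) - fp


def roc_points(p1: Sequence[float], p0: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from the strictest threshold down to labeling everything 1."""
    thresholds = sorted(set(p1) | set(p0), reverse=True) + [float("-inf")]
    points = []
    for p_t in thresholds:
        tp, fn, fp, tn = counts_at_threshold(p1, p0, p_t)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points
```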


B.7 Criterion of MAP

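$$D = \begin{cases} D_0, & P(H_0 \mid p) > P(H_1 \mid p) \\ D_1, & P(H_1 \mid p) > P(H_0 \mid p) \end{cases}$$

For the reconstruction problem, we choose the criterion that maximizes the a posteriori probability of the two hypotheses (MAP). By Bayes' rule,

$$P(H_i \mid p) = \frac{P(p \mid H_i)\,P(H_i)}{P(p)}, \qquad i = 0, 1,$$

so the MAP rule is a likelihood-ratio test:

$$L(p) = \frac{P(p \mid H_1)}{P(p \mid H_0)} \;\underset{D_0}{\overset{D_1}{\gtrless}}\; \frac{P(H_0)}{P(H_1)}$$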
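A minimal sketch of the MAP rule, assuming the class-conditional densities f0 and f1 of the connection probability are available as functions and P(H1) is known. The triangular densities below are made up purely for illustration.

```python
from typing import Callable

Density = Callable[[float], float]


def map_decide(p: float, f0: Density, f1: Density, prior1: float) -> int:
    """Return 1 (accept H1) if P(H1) f1(p) > P(H0) f0(p), otherwise 0 (accept H0).

    Comparing prior-weighted likelihoods is equivalent to comparing posteriors,
    because P(p) cancels from both sides of Bayes' rule.
    """
    prior0 = 1.0 - prior1
    return 1 if prior1 * f1(p) > prior0 * f0(p) else 0


# Hypothetical densities on [0, 1]: 0-links concentrate at low p, 1-links at high p.
def f0(p: float) -> float:
    return 2.0 * (1.0 - p)


def f1(p: float) -> float:
    return 2.0 * p
```

With P(H1) = 0.25, the decision boundary lies where 0.25 * 2p = 0.75 * 2(1 - p), i.e. at p = 0.75. So `map_decide(0.7, f0, f1, 0.25)` returns 0 and `map_decide(0.8, f0, f1, 0.25)` returns 1.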


A.1 Probabilistic model of structured networks

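For each node k, define an attribute vector w_k = [w_{k1}, w_{k2}, ..., w_{km}]^T. The probability that nodes i and j are connected is a symmetric function of their attribute vectors:

$$p_{ij} = p_{ji} = \Pr(A_{ij} = 1) = f(w_i, w_j), \qquad i, j = 1, 2, \ldots, N,$$

where A is the adjacency matrix and the connection matrix C collects the pairwise terms computed from the attributes.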


A.2 Estimate model parameters (MLE)

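Log-likelihood of the observed adjacency matrix A given the parameters W = {w_i}:

$$\ln \Pr(A \mid W) = L(W) = \sum_{j=1}^{N} \sum_{k} \big[A_{jk} \ln p_{jk} + (1 - A_{jk}) \ln (1 - p_{jk})\big]$$

$$\frac{\partial L}{\partial w_i} = \sum_{j,k} \frac{A_{jk} - p_{jk}}{p_{jk}(1 - p_{jk})}\,\frac{\partial p_{jk}}{\partial w_i}$$

Iterated gradient-based optimization:
1) start from initial values W^{(0)};
2) update all parameters simultaneously, W^{(n+1)} = W^{(n)} + η ∂L/∂W evaluated at W^{(n)} (step size η);
3) stop iterating when the stopping conditions are met.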
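The exact form of p_ij is not recoverable here, so the sketch below assumes, purely for illustration, a logistic latent-attribute model p_ij = sigmoid(b + w_i . w_j), where sigmoid is the logistic function and b is a scalar bias (both our assumptions, not the authors' model). For this choice, the factor (A - p)/(p(1 - p)) * dp/dw_i collapses to (A - p) w_j, which keeps the simultaneous gradient-ascent loop short. Only observed pairs enter the likelihood, so unknown links do not bias the fit.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[int, int]
Matrix = Sequence[Sequence[Optional[int]]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def link_prob(W: List[List[float]], b: float, i: int, j: int) -> float:
    """Assumed model (illustrative only): p_ij = sigmoid(b + w_i . w_j), symmetric in i, j."""
    return sigmoid(b + sum(x * y for x, y in zip(W[i], W[j])))


def log_likelihood(A: Matrix, W: List[List[float]], b: float, known: Sequence[Pair]) -> float:
    """L(W) = sum over observed pairs of A ln p + (1 - A) ln(1 - p)."""
    total = 0.0
    for i, j in known:
        p = link_prob(W, b, i, j)
        total += A[i][j] * math.log(p) + (1 - A[i][j]) * math.log(1.0 - p)
    return total


def fit(A: Matrix, known: Sequence[Pair], m: int = 2, eta: float = 0.05,
        steps: int = 500, seed: int = 0) -> Tuple[List[List[float]], float]:
    """Gradient ascent on L over the observed pairs; returns attribute vectors W and bias b."""
    rng = random.Random(seed)
    n = len(A)
    W = [[rng.gauss(0.0, 0.1) for _ in range(m)] for _ in range(n)]  # 1) initial values
    b = 0.0
    for _ in range(steps):  # 3) a fixed iteration budget serves as the stopping condition
        gW = [[0.0] * m for _ in range(n)]
        gb = 0.0
        for i, j in known:
            # For the logistic form, (A - p) / (p (1 - p)) * dp/dw_i reduces to (A - p) * w_j.
            err = A[i][j] - link_prob(W, b, i, j)
            gb += err
            for d in range(m):
                gW[i][d] += err * W[j][d]
                gW[j][d] += err * W[i][d]
        # 2) update every parameter simultaneously from the same gradient
        W = [[W[k][d] + eta * gW[k][d] for d in range(m)] for k in range(n)]
        b += eta * gb
    return W, b
```

After fitting, `link_prob(W, b, i, j)` gives the connection probability of an unknown pair, which feeds the threshold rule of B.2.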


B.8 Example network

B.9 Density function of connection probabilities

[Figure: probability density functions f1(p) and 1/r f0(p) versus connection probability p.]

B.10 MAP detector minimizes average error

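The density function is usually jagged and difficult to work with, so the distribution function is preferred. Consider the minimum average error (cost) M:

$$M = P(D_1 \mid H_0)\,P(H_0) + P(D_0 \mid H_1)\,P(H_1)$$

$$P(D_1 \mid H_0) = \int_{p_t}^{1} f_0(p)\,dp = 1 - F_0(p_t), \qquad P(D_0 \mid H_1) = \int_{0}^{p_t} f_1(p)\,dp = F_1(p_t)$$

$$\min_{p_t} M = \min_{p_t} \big[P(H_0)\,\big(1 - F_0(p_t)\big) + P(H_1)\,F_1(p_t)\big]$$

$$\frac{dM}{dp_t} = P(H_1)\,f_1(p_t) - P(H_0)\,f_0(p_t) = 0 \;\Longrightarrow\; f_1(p_t) = \frac{1}{r}\,f_0(p_t), \qquad r = \frac{P(H_1)}{P(H_0)}$$

At this threshold the MAP condition P(H1) f1(p) = P(H0) f0(p) holds with equality, so the MAP threshold is the one that minimizes the average error. Dividing M by P(H1) gives the scaled error F1(p_t) + 1/r (1 - F0(p_t)) plotted in B.11.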
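The same minimization can be done directly on empirical distribution functions, which avoids the jagged densities. A minimal sketch, assuming the connection probabilities of the known 1-links (`p1`) and 0-links (`p0`) stand in for samples of F1 and F0, and that r is estimated from their counts (consistent with the presumption that unknown links are sampled fairly):

```python
from typing import Sequence, Tuple


def scaled_error(p1: Sequence[float], p0: Sequence[float], p_t: float) -> float:
    """Empirical F1(p_t) + (1/r) * (1 - F0(p_t)), with r estimated as len(p1) / len(p0)."""
    f1_cdf = sum(p <= p_t for p in p1) / len(p1)  # misses: 1-links at or below p_t
    f0_tail = sum(p > p_t for p in p0) / len(p0)  # false alarms: 0-links above p_t
    inv_r = len(p0) / len(p1)
    return f1_cdf + inv_r * f0_tail


def min_error_threshold(p1: Sequence[float], p0: Sequence[float]) -> Tuple[float, float]:
    """Scan -inf and every observed probability as p_t; return (p_t, scaled error)."""
    candidates = [float("-inf")] + sorted(set(p1) | set(p0))
    best = min(candidates, key=lambda t: scaled_error(p1, p0, t))
    return best, scaled_error(p1, p0, best)
```

Multiplying the scaled error by `len(p1)` gives FN + FP, so this picks the threshold with the fewest total errors on the known links.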


B.11 Distribution of connection probabilities

[Figure: scaled probability distribution functions F1(p), 1/r (1-F0(p)) and their sum F1(p) + 1/r (1-F0(p)) versus connection probability p.]

B.12 Generalizability of algorithm

[Figure: left, probability density functions F0(p) and F0m(p) versus connection probability p (logarithmic scale); right, probability density functions F1(p) and F1m(p).]

Do the unknown links approximately follow the same distribution?

Possible reasons for the unfavorable burst at the tail, a source of model error.

B.13 Robustness of algorithm

[Figure: left, scaled distribution functions 1/r (1-F0(p)) and 1/r (1-F0m(p)) versus connection probability p for 5%, 10%, 15% and 20% unknown links; right, distribution functions F1(p) and F1m(p) for the same percentages.]

Is the result sensitive to the number of unknown links?

B.14 Comparison of operation points

[Figure: scaled distribution functions versus connection probability p: F1(p), 1/r (1-F0(p)), F1m(p), 1/r (1-F0m(p)), F1(p) + 1/r (1-F0(p)) and F1m(p) + 1/r (1-F0m(p)).]

B.15 Reconstruction results

P     | N      | ACC (%) | TP/P (%) | TN/N (%) | TP/(TP+FP) (%)
201   | 5293   | 98.13   | 80.60    | 98.79    | 71.68
222   | 5272   | 98.13   | 80.63    | 98.86    | 74.90
192   | 5302   | 98.11   | 75.52    | 98.92    | 71.78
224   | 5270   | 98.25   | 80.80    | 98.99    | 77.35
235   | 5259   | 98.13   | 75.32    | 99.14    | 79.73
217   | 5277   | 98.38   | 78.34    | 99.20    | 80.19
204   | 5290   | 98.31   | 77.45    | 99.11    | 77.07
192   | 5302   | 98.25   | 71.88    | 99.21    | 76.67
231   | 5263   | 98.16   | 77.06    | 99.09    | 78.76
217   | 5277   | 97.93   | 71.89    | 99.00    | 74.64
213.5 | 5280.5 | 98.18   | 76.95    | 99.03    | 76.28  (mean)

USAir network, 10% of links missed.
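Each row reports rounded percentages, so the underlying counts can be recovered and the columns cross-checked against the definitions in B.5. A small sketch (this rounding-based check is our own, not part of the slides), applied to the first row:

```python
from typing import Tuple


def recover_counts(P: int, N: int, tpr_pct: float, tnr_pct: float) -> Tuple[int, int, int]:
    """Recover integer TP, TN and FP = N - TN from P, N and the rounded TP/P, TN/N percentages."""
    tp = round(tpr_pct / 100 * P)
    tn = round(tnr_pct / 100 * N)
    return tp, tn, N - tn


def derived_columns(P: int, N: int, tp: int, tn: int, fp: int) -> Tuple[float, float]:
    """Recompute ACC and TP/(TP+FP) in percent from the recovered counts."""
    acc = 100 * (tp + tn) / (P + N)
    ppv = 100 * tp / (tp + fp)
    return acc, ppv


tp, tn, fp = recover_counts(201, 5293, 80.60, 98.79)
acc, ppv = derived_columns(201, 5293, tp, tn, fp)
print(tp, tn, fp, f"{acc:.2f}", f"{ppv:.2f}")  # 162 5229 64 98.13 71.68
```

For the first row both derived values match the table.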

Page 20: Link Reconstruction  from Partial Information

C.1 A variant problem of link reconstruction

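Observed network -> two types of signals: 0 and 1.

Left: the example network with its unknown entries marked ?; right: the same network as observed in the variant problem, where those entries appear as 0s.

$$\begin{pmatrix} 0 & 1 & 0 & ? & 0 \\ 1 & 0 & ? & 0 & 1 \\ 0 & ? & 0 & 1 & 1 \\ ? & 0 & 1 & 0 & ? \\ 0 & 1 & 1 & ? & 0 \end{pmatrix} \quad\longrightarrow\quad \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{pmatrix}$$

Some 0s were originally 1s but have been set to 0. Their positions are unknown; their number may be known or unknown.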

C.2 Procedures for the variant problem

Available information -> fitted probabilistic model P (N×N) -> connection probability p(i,j) of each 0-link (i,j), then:
(a) number M of hidden links unknown -> choose a threshold p_t for the connection probability -> set (i,j) to 1 if p(i,j) > p_t, and to 0 otherwise;
(b) number M known -> scoring: rank the connection probabilities of the candidate links (all 0-links) -> set the M links with the highest scores to 1.
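Both branches can be sketched in a few lines, assuming the scores of the candidate 0-links are held in a dictionary keyed by node pair (a hypothetical format):

```python
import heapq
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def top_m_links(scores: Dict[Pair, float], m: int) -> List[Pair]:
    """Case (b): return the M candidate 0-links with the highest scores, best first."""
    return heapq.nlargest(m, scores, key=scores.__getitem__)


def threshold_links(scores: Dict[Pair, float], p_t: float) -> List[Pair]:
    """Case (a): return every candidate 0-link whose score exceeds p_t."""
    return sorted(pair for pair, p in scores.items() if p > p_t)
```

`heapq.nlargest` avoids sorting all candidates, which matters because nearly every node pair of a sparse network is a 0-link.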


C.3 Algorithm based on common neighbor

Score of a candidate pair (i, j): its number of common neighbors n_ij, normalized by the maximum count n_max:

$$p_{ij} = p_{ji} = \frac{n_{ij}}{n_{\max}}$$

[Figure: two panels of scaled distribution functions F1, 1/r (1-F0) and F1 + 1/r (1-F0) versus connection probability p for the common-neighbor scores.]
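A minimal sketch of the common-neighbor score, assuming the observed network is given as a list of neighbor sets and that n_max is taken over the candidate (non-adjacent) pairs; the slide does not say which pairs the maximum runs over, so that choice is ours.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def common_neighbor_scores(adj: List[Set[int]]) -> Dict[Pair, float]:
    """p_ij = n_ij / n_max for every non-adjacent pair (i, j), i < j.

    adj[i] is the set of neighbors of node i in the observed network; n_ij is
    the number of common neighbors and n_max its maximum over the candidates.
    """
    counts = {
        (i, j): len(adj[i] & adj[j])
        for i, j in combinations(range(len(adj)), 2)
        if j not in adj[i]
    }
    n_max = max(counts.values(), default=0)
    if n_max == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: n / n_max for pair, n in counts.items()}
```

On the observed matrix of C.1 (nodes 1 to 5 as indices 0 to 4), the hidden pairs (1, 2) and (3, 4) both receive the top score 1.0. The third hidden pair (0, 3) scores 0.0, and the non-link (0, 4) also scores 1.0, which illustrates the limits of this simple similarity measure.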


C.4 Comparison between two methods

[Figure: left, probability density functions f1 and 1/r f0 versus connection probability p for the common-neighbor method and the model; right, the corresponding scaled distribution functions F1 and 1/r (1-F0).]

C.5 Generalizability and robustness of algorithms

[Figure: distribution functions versus connection probability p for common neighbors and for the model-based method; red: 20%, blue: 5%.]

C.6 Reconstruction performance by ranking

[Figure: number of hits versus number of predicted links for common neighbors, the probabilistic model and a perfect algorithm.]
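The hits curve counts how many of the top-k ranked candidates are true hidden links, for each k. A minimal sketch, assuming the ranked candidates and the hidden links are given as node pairs with i < j (hypothetical inputs). The perfect-algorithm curve is the upper bound obtained when every hidden link is ranked first.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def hits_curve(ranked: Sequence[Pair], hidden: Iterable[Pair]) -> List[int]:
    """hits[k - 1] = number of true hidden links among the top-k ranked predictions."""
    truth: Set[Pair] = set(hidden)
    hits, found = [], 0
    for pair in ranked:
        if pair in truth:
            found += 1
        hits.append(found)
    return hits


def perfect_curve(n_predicted: int, n_hidden: int) -> List[int]:
    """Upper bound: a perfect ranking places all hidden links first."""
    return [min(k, n_hidden) for k in range(1, n_predicted + 1)]
```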

D.1 Problem of link prediction

The procedure is identical to that of the variant link reconstruction problem.

[Figure: adjacency matrix plot, nz = 1740.]

[Figure: number of hits versus number of links predicted for common neighbors, the model-based method and a perfect algorithm.]

Econophysics Co-authorship network (N=506, m=519, nL=379)

[Figure: adjacency matrix plot, nz = 1038.]


D.2 Factors to affect prediction performance

Problem of generalizability: (a) the size of the training set, or the time span of the prediction; (b) a growth mechanism that changes over time.

[Figure: left, probability density functions f1, f0 and fn versus connection probability p; right, scaled distribution functions F1, 1/r (1-F0) and Fn.]


D.3 Effects of training set size

Assume the new links are known and examine them as in the variant problem above: the training data set does not capture the underlying distribution faithfully, either because it is too small or because the growth rule is time-dependent.

[Figure: left, probability density functions F1, Fn and F0 versus connection probability p; right, scaled distribution functions F1, Fn and 1/r (1-F0).]

Conclusions

The problem of network reconstruction is studied thoroughly. Under a more general framework, the problem can be reformulated as a hypothesis testing problem. This gives deeper insight into the problem and enables us to relate the reconstruction performance of various methods to quantities at a more fundamental level.

THANK YOU