Paper Presentation

Steve Jan

Virginia Tech

March 5, 2015

Steve Jan (Virginia Tech) Paper Presentation March 5, 2015 1 / 28

Two papers to present

Nonparametric Multi-group Membership Model for Dynamic Networks, NIPS '13, Myunghwan Kim and Jure Leskovec, Stanford

Community Detection in Graphs through Correlation, KDD '14, Lian Duan, W. Nick Street, Yanchi Liu, Haibing Lu, New Jersey Institute of Technology, Santa Clara University

Nonparametric Multi-group Membership Model

Social networks are often dynamic in the sense that relations between entities rise and decay over time.

Problem: extract a summary of the common structure and dynamics of the underlying relations.

Applications: predict missing relationships, forecast future links, and identify clusters and groups of nodes.

Note

The paper relies heavily on statistical techniques to solve this problem.

Dynamic Multi-group Membership Graph Model

They pay close attention to three processes governing network dynamics:

Birth and death dynamics of individual groups

Evolution of memberships of nodes to groups

The structure of network interactions between group members as well as non-members.

Birth and death dynamics of individual groups

Why track when groups are born and die? It makes clear how many groups exist at each specific time. A group can be in one of two states: active (alive) or inactive (not yet born, or dead).

Figure: black = active (alive), white = inactive.

Formal model of birth and death dynamics of individual groups

They model this with the distance-dependent Indian Buffet Process (dd-IBP), a time-dependent stochastic process. In the Indian Buffet metaphor, customers enter a restaurant and sample some subset of an infinitely long sequence of dishes. In this application, each time step t plays the role of a customer and samples a set of active groups $K_t$. Formally, at the first time step t = 1, Poisson(λ) groups are initially active, i.e., $|K_1| \sim \mathrm{Poisson}(\lambda)$. Poisson(γλ) new groups are also born at each time step t.
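The birth/death process can be simulated in a few lines. This is a minimal sketch, not the paper's dd-IBP sampler: the slide only gives the initial Poisson(λ) groups and the Poisson(γλ) births, so the death rule used here (every active group dies independently with probability γ at each step) is an assumption, and `simulate_group_activity` is a hypothetical helper.

```python
import numpy as np

def simulate_group_activity(T, lam, gamma, seed=0):
    """Return a (T, total_groups) boolean matrix; True = group active at step t.

    Assumptions (not from the slide): each active group dies independently
    with probability gamma per step, and a dead group never comes back.
    """
    rng = np.random.default_rng(seed)
    alive = []      # alive[k] is True while group k is active
    history = []    # snapshot of `alive` at every time step
    for t in range(T):
        if t == 0:
            n_new = int(rng.poisson(lam))            # initially active groups
        else:
            # each active group survives with probability 1 - gamma (assumed)
            alive = [a and rng.random() >= gamma for a in alive]
            n_new = int(rng.poisson(gamma * lam))    # newly born groups
        alive.extend([True] * n_new)
        history.append(list(alive))
    activity = np.zeros((T, len(alive)), dtype=bool)
    for t, row in enumerate(history):
        activity[t, :len(row)] = row                 # later-born groups stay False
    return activity

A = simulate_group_activity(T=20, lam=3.0, gamma=0.2, seed=1)
print(A.sum(axis=1))  # number of active groups per time step
```

Under this assumed death rule the expected number of active groups stays at λ, because E[K_t] = (1 - γ)E[K_{t-1}] + γλ and E[K_1] = λ.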

Dynamics of node group memberships

Intuition: nodes join and leave groups based on their current status. They use a Markov chain to model the dynamics of nodes joining and leaving groups. Whether node i belongs to group k at time t is denoted by a binary variable $z_{ik}^{(t)} \in \{0, 1\}$.

where $a_k$ and $b_k$ are the two parameters of group $k$; both are probabilities.
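The membership dynamics can be sketched as a two-state Markov chain per group. The transition formula itself is not recoverable from the slide, so the parameterization below is an assumption: $a_k$ is taken as the probability that a non-member joins group k and $b_k$ as the probability that a member leaves. `membership_transition` and `simulate_membership` are hypothetical names.

```python
import numpy as np

def membership_transition(a_k, b_k):
    """Q[s, s'] = P(z^(t+1) = s' | z^(t) = s) for one group k.

    Hypothetical parameterization: a_k = P(join), b_k = P(leave).
    """
    return np.array([[1.0 - a_k, a_k],
                     [b_k, 1.0 - b_k]])

def simulate_membership(z0, a_k, b_k, T, seed=0):
    """Evolve the binary memberships z_{ik}^{(t)} of N nodes in group k."""
    rng = np.random.default_rng(seed)
    Q = membership_transition(a_k, b_k)
    z0 = np.asarray(z0, dtype=int)
    z = np.empty((T, len(z0)), dtype=int)
    z[0] = z0
    for t in range(1, T):
        # probability of being a member at t, given each node's state at t-1
        p_member = Q[z[t - 1], 1]
        z[t] = (rng.random(len(z0)) < p_member).astype(int)
    return z

print(simulate_membership([0, 1, 0, 1], a_k=0.1, b_k=0.3, T=5))
```

With this parameterization, the long-run fraction of nodes in group k is $a_k / (a_k + b_k)$.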

Relationship between node group memberships and links of the network

Intuition: a link between two nodes depends on their current groups. They assume a connection between nodes' memberships in groups and the links of the network. They build on the Multiplicative Attribute Graph (MAG) model: each group k is associated with a link affinity matrix $M_k \in \mathbb{R}^{2\times 2}$. The four entries represent the affinity between group members, between members and non-members (in both directions), and between non-members themselves.
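A sketch of the MAG-style link rule: the probability of a link between i and j is the product, over groups, of the affinity entry selected by the two nodes' memberships. This follows the MAG model the slide builds on; whether the dynamic model uses exactly this product (for example, restricted to active groups) is not stated here, so treat it as an assumption. Function names are hypothetical.

```python
import numpy as np

def link_probability(z_i, z_j, M):
    """MAG-style link probability between nodes i and j.

    z_i, z_j: 0/1 membership vectors over K groups.
    M: array of shape (K, 2, 2); M[k, a, b] is group k's affinity for a
       pair with memberships (a, b). Entries are assumed to lie in (0, 1).
    """
    return float(np.prod([M[k, z_i[k], z_j[k]] for k in range(len(z_i))]))

def link_probability_matrix(Z, M):
    """All pairwise link probabilities for an (N, K) membership matrix Z."""
    Z = np.asarray(Z, dtype=int)          # integer indices, not a boolean mask
    N, K = Z.shape
    P = np.ones((N, N))
    for k in range(K):
        # pick M[k][Z[i, k], Z[j, k]] for every pair (i, j) at once
        P *= M[k][np.ix_(Z[:, k], Z[:, k])]
    return P

M = np.array([[[0.2, 0.5], [0.5, 0.9]],
              [[0.6, 0.4], [0.4, 0.8]]])
print(link_probability([1, 1], [1, 1], M))  # 0.9 * 0.8
```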

Model Inference via MCMC

Having defined these three model components, they infer the parameters by sampling.

Sampling node group memberships Z: use the forward-backward recursion algorithm.

Sampling the group membership transition matrix Q: use a Beta prior, which is conjugate to the Bernoulli distribution, so the posterior is again a Beta distribution that can be sampled directly.

Sampling link affinities M: use Metropolis-Hastings and Hybrid Monte Carlo (HMC) sampling.
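The conjugate step for Q can be shown concretely. Assuming $a_k$ is the probability that a non-member joins group k and $b_k$ the probability that a member leaves (a hypothetical parameterization), with independent Beta(α, β) priors, counting 0→1 and 0→0 transitions in sampled memberships gives a Beta posterior for $a_k$, and likewise for $b_k$. This sketches that single step only, not the authors' full MCMC sampler.

```python
import numpy as np

def count_transitions(z):
    """n[s, s'] = number of s -> s' transitions in a (T, N) 0/1 array."""
    z = np.asarray(z, dtype=int)
    prev, nxt = z[:-1].ravel(), z[1:].ravel()
    n = np.zeros((2, 2), dtype=int)
    np.add.at(n, (prev, nxt), 1)
    return n

def sample_transition_params(z, alpha=1.0, beta=1.0, rng=None):
    """Draw (a_k, b_k) from their Beta posteriors (hypothetical helper)."""
    if rng is None:
        rng = np.random.default_rng()
    n = count_transitions(z)
    a_k = rng.beta(alpha + n[0, 1], beta + n[0, 0])  # joins vs. stays out
    b_k = rng.beta(alpha + n[1, 0], beta + n[1, 1])  # leaves vs. stays in
    return a_k, b_k

z = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])
print(count_transitions(z))
print(sample_transition_params(z, rng=np.random.default_rng(0)))
```

Under these assumptions the posterior mean of $a_k$ is $(\alpha + n_{01}) / (\alpha + \beta + n_{00} + n_{01})$.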

Experiments

Datasets they use:

NIPS co-authorship network for T = 17 years (1987 to 2003).

DBLP co-authorship network, obtained from 21 Computer Science conferences from 2000 to 2009 (T = 10).

INFOCOM dataset: physical proximity interactions between 78 students at the 2006 INFOCOM conference (T = 50).

Tasks:

Missing link prediction

Future network forecasting

Missing link prediction

Randomly hold out 20% of node pairs throughout the entire time period.

Baselines:

Naive: the relationship between each pair of nodes is decided by a Bernoulli distribution with a Beta(1, 1) prior.

LFRM: a model for static networks.

DRIFT: an infinite factorial HMM model.
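The hold-out split and the naive baseline can be sketched as follows. The slide does not say whether the 20% is drawn over node pairs or over (time step, pair) entries; this sketch assumes the latter, so each pair keeps its other time steps for the Beta(1, 1) posterior. `holdout_mask`, `naive_predict`, and `heldout_log_likelihood` are hypothetical helpers, `Y` is assumed to be a (T, N, N) 0/1 array, and the log-likelihood score is one possible metric, not necessarily the paper's.

```python
import numpy as np

def holdout_mask(T, N, frac=0.2, seed=0):
    """Boolean mask over (t, i, j) with i < j; True = held out."""
    rng = np.random.default_rng(seed)
    upper = np.triu(np.ones((N, N), dtype=bool), k=1)
    return (rng.random((T, N, N)) < frac) & upper  # only i < j can be held out

def naive_predict(Y, mask):
    """Beta(1, 1)-Bernoulli baseline: one link probability per node pair.

    Posterior mean for (i, j) = (observed links + 1) / (observed steps + 2).
    """
    observed = ~mask
    links = (Y * observed).sum(axis=0)
    steps = observed.sum(axis=0)
    return (links + 1.0) / (steps + 2.0)

def heldout_log_likelihood(Y, mask, P):
    """Bernoulli log-likelihood of the held-out entries under P."""
    t, i, j = np.nonzero(mask)
    p, y = P[i, j], Y[t, i, j]
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

rng = np.random.default_rng(0)
Y = (rng.random((10, 6, 6)) < 0.3).astype(int)
mask = holdout_mask(10, 6)
print(heldout_log_likelihood(Y, mask, naive_predict(Y, mask)))
```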

Future network forecasting

Given networks from t = 1, ..., T, they want to predict the links at t = T + 1. They train the models on the first $T_{obs}$ networks, fix the parameters, and then, for each model, run MCMC sampling one time step into the future.

Conclusion

We learned three models for networks that evolve over time.

How to sample these parameters.

Personally, I think this paper is good in terms of the statistical methods it uses.

Community Detection in Graphs through Correlation

Next, we move to the second paper, which is about modularity-based community detection.

Major problem of modularity

Resolution problem: $K_m$ is an m-clique; the detected communities are marked by circles with dashed lines.

Multi-resolution: further divide each detected community. However, this introduces biases: the tendency to merge small communities and to split large communities.
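The resolution problem can be reproduced with the standard ring-of-cliques example (assumed here to be what the $K_m$ figure illustrates): n cliques $K_m$ in a ring, neighbors joined by a single edge. Modularity prefers merging neighboring cliques in pairs once n > m(m - 1) + 2, even though each clique is clearly a community. The sketch uses plain Python; the helper names are hypothetical.

```python
from itertools import combinations

def ring_of_cliques(n, m):
    """n cliques K_m in a ring; consecutive cliques share one edge."""
    edges = []
    for c in range(n):
        nodes = range(c * m, (c + 1) * m)
        edges += list(combinations(nodes, 2))           # the clique itself
        edges.append((c * m, ((c + 1) % n) * m + 1))   # link to the next clique
    return edges

def modularity(edges, community):
    """Newman-Girvan modularity: sum over groups of l_c/L - (d_c/2L)^2."""
    L = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        for x in (u, v):
            degree[community[x]] = degree.get(community[x], 0) + 1
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1
    return sum(internal.get(c, 0) / L - (d / (2 * L)) ** 2
               for c, d in degree.items())

n, m = 30, 3
edges = ring_of_cliques(n, m)
one_per_clique = {v: v // m for v in range(n * m)}
pairs_merged = {v: (v // m) // 2 for v in range(n * m)}
q_single = modularity(edges, one_per_clique)
q_pairs = modularity(edges, pairs_merged)
print(f"{q_single:.4f} {q_pairs:.4f}")  # 0.7167 0.8083
```

Merging pairs of cliques scores higher, which is exactly the resolution problem: modularity cannot resolve communities smaller than a scale set by the total number of edges.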

Connection with itemset search

Graph communities: the number of internal edges is greater than expected under the assumption of a random partition.

Correlated itemsets: occur more often than expected under the assumption of item independence.

Connection: modularity = leverage

Correlated Itemsets

Given an itemset $S = \{I_1, I_2, \ldots, I_m\}$ with $m$ items in a dataset with $n$ transactions:

True probability: $tp_S = P(S)$

Expected probability: $ep_S = \prod_{i=1}^{m} P(I_i)$

Correlation measure: $M_S = f(tp_S, ep_S)$

Chi-square: $\frac{(tp_S - ep_S)^2}{ep_S}$

Probability ratio: $tp_S / ep_S$

Leverage: $tp_S - ep_S$

Correlated itemset example

t1: Beef, Chicken, Milk

t2: Beef, Cheese

t3: Cheese, Boots

t4: Beef, Chicken, Cheese

t5: Beef, Chicken, Clothes, Cheese, Milk

For the itemset {Beef, Chicken}: $tp = \frac{3}{5}$, $ep = \frac{3}{5} \cdot \frac{4}{5} = \frac{12}{25}$, Leverage $= tp - ep = \frac{3}{25}$
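The arithmetic in this example can be checked with exact fractions. The helper below is a hypothetical sketch that computes $tp_S$ and $ep_S$ from a list of transactions and then applies the three correlation measures from the previous slide.

```python
from fractions import Fraction
from math import prod

def itemset_probabilities(transactions, itemset):
    """Return (tp_S, ep_S) as exact fractions (hypothetical helper)."""
    n = len(transactions)
    tp = Fraction(sum(itemset <= t for t in transactions), n)   # P(S)
    ep = prod(Fraction(sum(item in t for t in transactions), n)  # prod P(I_i)
              for item in itemset)
    return tp, ep

def leverage(tp, ep):
    return tp - ep

def probability_ratio(tp, ep):
    return tp / ep

def chi_square(tp, ep):
    return (tp - ep) ** 2 / ep

transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
]
tp, ep = itemset_probabilities(transactions, {"Beef", "Chicken"})
print(tp, ep, leverage(tp, ep))  # 3/5 12/25 3/25
```

For the same itemset, the probability ratio is 5/4 and the chi-square value is 3/100.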

Modularity Function
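For reference, the standard (Newman-Girvan) modularity of a partition $\{G_1, \ldots, G_l\}$ of an undirected graph, assumed here to be the function this slide refers to, is shown below with its equivalent per-group form. $A_{ij}$ is the adjacency matrix, $k_i$ the degree of node $i$, $k_i^{internal}$ the number of neighbors of $i$ in its own group, $m$ the number of edges, and $c_i$ the group of node $i$.

$$
Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)
  = \sum_{p=1}^{l}\left[\frac{\sum_{i \in G_p} k_i^{internal}}{2m} - \left(\frac{\sum_{i \in G_p} k_i}{2m}\right)^2\right]
$$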

Transforming modularity function

For a partition $\{G_1, G_2, \ldots, G_l\}$ of graph $G$:

$k_i$: degree of node $i$

$k_i^{internal}$: number of nodes in the same group as node $i$ that connect to node $i$

Transforming modularity function

They found that by converting the undirected graph into a doubly-directed graph (each undirected edge becomes two directed edges, so $m$ undirected edges give $2m$ directed ones), modularity can be expressed with itemset criteria. If we randomly select an edge from the doubly-directed graph:

The true probability of the edge in $G_p$: $tp_p = \frac{\sum_{i \in G_p} k_i^{internal}}{2m}$

Probability the edge started from $G_p$: $\frac{\sum_{i \in G_p} k_i}{2m}$

Probability the edge ended in $G_p$: $\frac{\sum_{j \in G_p} k_j}{2m}$

The expected probability of the edge in $G_p$ under the assumption of independence: $ep_p = \frac{\sum_{i \in G_p} k_i}{2m} \cdot \frac{\sum_{j \in G_p} k_j}{2m}$
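The derivation can be checked numerically. The sketch below computes $tp_p$ and $ep_p$ for each group on the doubly-directed graph as defined above, sums $tp_p - ep_p$, and compares the result with the textbook adjacency-matrix definition of modularity. The function names are hypothetical; the graph is assumed undirected, with no self-loops or duplicate edges.

```python
from collections import defaultdict

def group_probabilities(edges, community):
    """Map each group p to (tp_p, ep_p) on the doubly-directed graph."""
    two_m = 2 * len(edges)                  # number of directed edges
    k_internal = defaultdict(int)           # sum of k_i^internal per group
    k_total = defaultdict(int)              # sum of k_i per group
    for u, v in edges:
        k_total[community[u]] += 1
        k_total[community[v]] += 1
        if community[u] == community[v]:
            k_internal[community[u]] += 2   # both directed copies stay inside
    return {p: (k_internal[p] / two_m, (k_total[p] / two_m) ** 2)
            for p in k_total}

def modularity_from_leverage(edges, community):
    """Q = sum_p (tp_p - ep_p), i.e. the sum of per-group leverages."""
    return sum(tp - ep for tp, ep in group_probabilities(edges, community).values())

def modularity_from_definition(edges, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j)."""
    two_m = 2 * len(edges)
    adj = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    q = 0.0
    for i in community:
        for j in community:
            if community[i] == community[j]:
                q += ((i, j) in adj) - deg[i] * deg[j] / two_m
    return q / two_m

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(modularity_from_leverage(edges, community),
      modularity_from_definition(edges, community))
```

For these two triangles joined by one edge (m = 7), each triangle has $tp_p = 6/14$ and $ep_p = (7/14)^2$, so Q = 2(3/7 - 1/4) = 5/14.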

Transforming modularity function

Connecting correlation with modularity

For a given group $G_p$ of the partition, the partial modularity is $Q_p = tp_p - ep_p$. For a given itemset $S$, leverage $= tp_S - ep_S$.

Since the other correlation measures are also functions of $tp$ and $ep$, they can replace the partial modularity $Q_p$ with the formula of any other correlation measure.

Experiments

Modify the objective function

Greedy search (hierarchical clustering)

Baseline: Modularity-based methods (Leverage)

Datasets (real-life): 1. Karate club (two equal-size communities); 2. College football (12 equal-size communities)

Evaluation measures:

Rand Index (Rand 1971), Jaccard, F-measure, Normalized mutual information (Danon 2005)
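The experimental pipeline (swap in a per-group correlation measure, then run a greedy search) can be sketched as an agglomerative loop. This is an assumption-laden sketch, not the authors' code: it starts from singletons, repeatedly merges the pair of adjacent communities whose merge most increases $\sum_p f(tp_p, ep_p)$, and stops when no merge helps. `greedy_communities` is hypothetical; with `leverage` it reduces to greedy modularity maximization, and `probability_ratio` shows how another measure plugs in.

```python
from collections import defaultdict

def leverage(tp, ep):
    return tp - ep

def probability_ratio(tp, ep):
    return tp / ep

def greedy_communities(edges, measure=leverage):
    """Greedy agglomerative search maximizing sum_p measure(tp_p, ep_p).

    Assumes an undirected graph with orderable node ids (e.g. ints),
    no self-loops and no isolated nodes.
    """
    two_m = 2 * len(edges)
    members = {}                  # community id -> set of nodes
    deg = defaultdict(int)        # sum of k_i per community
    int2 = defaultdict(int)       # sum of k_i^internal per community
    between = defaultdict(int)    # (c, d) with c < d -> edges between them
    for u, v in edges:
        for x in (u, v):
            members.setdefault(x, {x})
            deg[x] += 1
        between[(min(u, v), max(u, v))] += 1

    def score(c):
        return measure(int2[c] / two_m, (deg[c] / two_m) ** 2)

    while True:
        best_gain, best_pair = 0.0, None
        for (c, d), e in between.items():
            tp = (int2[c] + int2[d] + 2 * e) / two_m
            ep = ((deg[c] + deg[d]) / two_m) ** 2
            gain = measure(tp, ep) - score(c) - score(d)
            if gain > best_gain:
                best_gain, best_pair = gain, (c, d)
        if best_pair is None:     # no merge improves the objective
            break
        c, d = best_pair
        # merge community d into community c
        members[c] |= members.pop(d)
        int2[c] += int2.pop(d, 0) + 2 * between.pop((c, d))
        deg[c] += deg.pop(d, 0)
        for x, y in list(between):
            if d in (x, y):
                other = y if x == d else x
                e = between.pop((x, y))
                between[(min(c, other), max(c, other))] += e
    return [frozenset(s) for s in members.values()]

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(greedy_communities(edges))
print(greedy_communities(edges, probability_ratio))
```

On two triangles joined by a single edge, the leverage objective recovers the two triangles.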

Real life datasets

Summary

Connection between community detection and correlation search

Modularity is good only when there are large and clear communities

Likelihood ratio is robust to all types of communities

Probability ratio partitions the whole graph into small communities with 2 or 3 objects
