Link Prediction Topics in Data Mining Fall 2015 Bruno Ribeiro


Overview

Deterministic heuristic methods

Matrix / probabilistic methods

Supervised learning approaches

Goal

Input: Snapshot of a social network at time t

Output: Predict edges that will be added to the network at time t’ > t

Product Recommendation

Example of a link prediction application

[Figure: user–item matrix with a missing entry for user i and item j]

Twitter’s Who to Follow

Another example of a link prediction application

Other Applications

A few more examples:

Fraud detection (Beutel, Akoglu, Faloutsos, KDD’15)
◦ Focuses on discovering surprising link patterns

Anomaly detection (Rattigan et al., 2005)
◦ Focuses on finding unlikely existing links


Deterministic Heuristics

A Very Naïve Approach

Every node v is assigned a score

score(v) = how many friends person v already has

General Approach

Assign a connection weight score(u, v) to each non-existing edge (u, v)

Rank the edges and recommend the ones with the highest scores

Can be viewed as computing a measure of proximity or “similarity” between nodes u and v

Shortest Path

score(u, v) = negated length of the shortest path between u and v. All pairs of nodes that share a neighbor receive the same score

Problem: the network diameter is often very small, so most pairs are only a few hops apart and the score has many ties
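As a sketch (the small adjacency dict below is made-up example data), the negated-shortest-path score can be computed with a breadth-first search:

```python
from collections import deque

# Toy undirected graph as an adjacency dict (hypothetical example data).
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def shortest_path_score(graph, u, v):
    """Negated BFS distance from u to v; higher (less negative) = closer."""
    if u == v:
        return 0
    seen, frontier, dist = {u}, deque([u]), 0
    while frontier:
        dist += 1
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nbr in graph[node]:
                if nbr == v:
                    return -dist
                if nbr not in seen:
                    seen.add(nbr)
                    frontier.append(nbr)
    return float("-inf")  # v unreachable from u

print(shortest_path_score(graph, "a", "e"))  # path a-b-d-e has length 3
```

Note how the score collapses to −2 for every pair sharing a neighbor, which is the tie problem described above.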

Common Neighbors

Uses as score the number of common neighbors of vertices u and v

Problem: the score can be high simply because the vertices have many neighbors

Jaccard Similarity

Fixes the common-neighbors “too many neighbors” problem by dividing the size of the intersection by the size of the union

The score lies between 0 and 1
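Both heuristics reduce to set operations on neighbor sets. A minimal Python sketch, on a small hypothetical graph:

```python
# Toy undirected graph as an adjacency dict of neighbor sets (made-up data).
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def common_neighbors(graph, u, v):
    """Number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])

def jaccard(graph, u, v):
    """Common neighbors normalized by the size of the neighbor union."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0
```

Here nodes a and d share the two neighbors {b, c}, so common_neighbors gives 2 and Jaccard gives 2/3 after normalizing by the union {b, c, e}.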

Adamic / Adar

This score gives more weight to common neighbors that are not shared with many others.

[Figure: Alice and Bob share a very active neighbor (50 neighbors): too active, so less evidence of a missing link between Alice and Bob. Bob and Carol share a less active neighbor (5 neighbors): more evidence of a missing link between Bob and Carol]
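A minimal Python sketch of the Adamic/Adar score, on a made-up graph with one very active hub and one quiet shared neighbor (all node names are hypothetical):

```python
import math

# Toy graph (hypothetical): "hub" is a highly active shared neighbor,
# "quiet" is a low-degree shared neighbor.
graph = {
    "alice": {"hub"},
    "bob": {"hub", "quiet"},
    "carol": {"quiet"},
    "hub": {"alice", "bob", "x1", "x2", "x3", "x4"},
    "quiet": {"bob", "carol"},
    "x1": {"hub"}, "x2": {"hub"}, "x3": {"hub"}, "x4": {"hub"},
}

def adamic_adar(graph, u, v):
    """Sum over shared neighbors z of 1 / log(degree(z)):
    very active shared neighbors contribute less evidence."""
    return sum(1.0 / math.log(len(graph[z])) for z in graph[u] & graph[v])
```

Bob and Carol share only the low-degree "quiet" node, so their score exceeds that of Alice and Bob, who share only the high-degree "hub", matching the figure's intuition.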

Preferential Attachment

The probability of future co-authorship of u and v is proportional to the product of the numbers of collaborators of u and v
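The score is just a degree product. A one-line Python sketch, using an adjacency dict of collaborator sets (made-up data):

```python
# Toy co-authorship graph (hypothetical): neighbor sets are collaborators.
collab = {
    "u": {"a", "b", "c"},
    "v": {"a", "d"},
    "w": {"d"},
    "a": {"u", "v"}, "b": {"u"}, "c": {"u"}, "d": {"v", "w"},
}

def preferential_attachment(graph, u, v):
    """Score proportional to the product of the two collaborator counts."""
    return len(graph[u]) * len(graph[v])
```

Here u (3 collaborators) and v (2 collaborators) score 6, while u and w score only 3.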

Katz (1953)

score(u, v) = Σ_{ℓ≥1} α^ℓ · (number of paths of length ℓ between u and v): an exponentially weighted sum of path counts

For small α << 1, predictions ≈ common neighbors
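A NumPy sketch on a made-up 4-node path graph: when α is smaller than the reciprocal of the largest eigenvalue of A, the infinite sum has the closed form (I − αA)⁻¹ − I:

```python
import numpy as np

# Small undirected adjacency matrix (hypothetical 4-node path graph).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

alpha = 0.1  # must satisfy alpha < 1 / largest eigenvalue of A to converge

# Katz score matrix: sum over l >= 1 of alpha^l * (A^l)_{uv},
# computed in closed form as (I - alpha*A)^{-1} - I.
I = np.eye(len(A))
katz = np.linalg.inv(I - alpha * A) - I
```

With such a small α the leading term α·A dominates, which is why the predictions approach common neighbors.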

Hitting Time

score(u, v) = −Hu,v, where Hu,v is the random-walk hitting time from u to v: the expected number of steps a walk started at u takes to first reach v
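Exact hitting times solve a linear system, but a Monte Carlo sketch makes the definition concrete (the graph below is made-up example data):

```python
import random

random.seed(1)

# Toy undirected graph as an adjacency dict (hypothetical example data).
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

def hitting_time(graph, u, v, walks=2000):
    """Monte Carlo estimate of H_{u,v}: average number of steps of a
    random walk started at u until it first reaches v."""
    total = 0
    for _ in range(walks):
        node, steps = u, 0
        while node != v:
            node = random.choice(graph[node])
            steps += 1
        total += steps
    return total / walks

score = -hitting_time(graph, "a", "d")  # negated hitting time as link score
```

For this graph the exact value is H_{a,d} = 4, so the estimate hovers around −4.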

Personalized PageRank

score(u, v) = stationary probability that the walker is at v under the following random walk:
◦ With probability α, jump back to u
◦ With probability 1 − α, go to a random neighbor of the current node
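A power-iteration sketch of this walk in Python (the graph and α = 0.15 are hypothetical example choices):

```python
# Toy undirected graph as an adjacency dict (hypothetical example data).
graph = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def personalized_pagerank(graph, u, alpha=0.15, iters=200):
    """Distribution of a walk that, at each step, jumps back to u with
    probability alpha and otherwise moves to a random neighbor."""
    p = {v: 0.0 for v in graph}
    p[u] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for v, mass in p.items():
            share = (1.0 - alpha) * mass / len(graph[v])
            for nbr in graph[v]:
                nxt[nbr] += share
        nxt[u] += alpha  # restart mass (p sums to 1 at every step)
        p = nxt
    return p

scores = personalized_pagerank(graph, "a")
```

The restart keeps the mass concentrated near u, so scores decay with distance from the source node.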

SimRank

N(u) and N(v) are the in-neighbor sets of nodes u and v

Defined for directed graphs only

score(u, v) ∊ [0, 1]; if u = v then score(u, v) = 1
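A naive fixed-point iteration of SimRank in Python; the decay constant C = 0.8 and the tiny directed graph are assumed example choices, not part of the slide:

```python
def simrank(in_nbrs, C=0.8, iters=10):
    """Iterate s(u,v) = C/(|N(u)||N(v)|) * sum of s over in-neighbor pairs,
    with s(u,u) = 1 and s = 0 when either node has no in-neighbors."""
    nodes = list(in_nbrs)
    s = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iters):
        nxt = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    nxt[(u, v)] = 1.0
                elif in_nbrs[u] and in_nbrs[v]:
                    total = sum(s[(a, b)] for a in in_nbrs[u] for b in in_nbrs[v])
                    nxt[(u, v)] = C * total / (len(in_nbrs[u]) * len(in_nbrs[v]))
                else:
                    nxt[(u, v)] = 0.0
        s = nxt
    return s

# Hypothetical directed graph given by in-neighbor sets: u -> a, u -> b,
# a -> c, b -> c.
in_nbrs = {"u": set(), "a": {"u"}, "b": {"u"}, "c": {"a", "b"}}
scores = simrank(in_nbrs)
```

Nodes a and b are pointed to by the same node, so their similarity converges to C · s(u, u) = 0.8.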


Latent Space Models

Low Rank Reconstruction

Represent the adjacency matrix A with a lower-rank matrix Ak

If Ak(u, v) is large for a missing entry A(u, v) = 0, then recommend link (u, v)

A ≈ Vn×k Uk×n = Ak
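A truncated-SVD sketch of the rank-k reconstruction in NumPy (the 5-node adjacency matrix and k = 2 are made-up example choices):

```python
import numpy as np

# Observed adjacency of a small graph: a triangle plus a separate pair
# (hypothetical example data).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Rank-k reconstruction via truncated SVD.
k = 2
U, s, Vt = np.linalg.svd(A)
Ak = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Candidate links: missing pairs (u, v) ranked by the reconstructed value.
candidates = sorted(
    ((Ak[u, v], u, v)
     for u in range(len(A)) for v in range(u + 1, len(A)) if A[u, v] == 0),
    reverse=True)
```

Scaling the top-k left singular vectors by the singular values and multiplying by the top-k right singular vectors yields the best rank-k approximation of A in the least-squares sense.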

Product Recommendations (Non-negative Matrix Factorization)

Users have a propensity rate to buy a certain category of product (W)

Within a category, some products have a propensity rate to be bought (H)

[Figure: user–item matrix with a missing entry for user i and item j, factored as W · H]

Example: Euclidean Generative Model

• Vertices uniformly distributed in a latent unit hyperplane

• Vertices close in latent space are more likely to be connected

• Probability of edge = f(distance in latent space)

The problem of link prediction is to infer distances in the latent space, i.e., find the nearest neighbor in latent space not currently linked

Hoff, P. D., Raftery, A. E., & Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Stat. Assoc., 97(460), 1090–1098.

[Figure: vertices scattered on the unit plane]
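The generative story above can be sampled directly. A sketch in Python, using a logistic f(distance) with steepness α and radius r (the parameter values, n = 50 nodes, and D = 2 are hypothetical example choices):

```python
import math
import random

random.seed(0)

# Hypothetical parameters: steepness alpha, radius r, dimension D, n nodes.
alpha, r, D, n = 5.0, 0.3, 2, 50

# Point positions: uniform in the D-dimensional unit hyperplane.
pts = [[random.random() for _ in range(D)] for _ in range(n)]

def link_prob(pi, pj):
    """Logistic link probability: exactly 1/2 at distance r, higher when
    closer, lower when farther; alpha sets the steepness."""
    d = math.dist(pi, pj)
    return 1.0 / (1.0 + math.exp(alpha * (d - r)))

# Sample the graph: each pair is linked independently with prob. f(distance).
edges = {(i, j) for i in range(n) for j in range(i + 1, n)
         if random.random() < link_prob(pts[i], pts[j])}
```

Link prediction then inverts this process: from the observed edges, infer which non-linked pair is closest in the latent space.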

Directed Graphical Models

Bayesian networks and probabilistic relational models (Getoor et al., 2001; Neville & Jensen, 2003)

Capture the dependence of link existence on attributes of entities

Constrain probabilistic dependencies to a directed acyclic graph

[Diagram: directed graphical models (Bayesian networks) vs. undirected graphical models (Markov networks)]

Example of Today’s Approaches

Whom To Follow and Why

Credit: N. Barbieri, F. Bonchi, G. Manco, KDD’14

Relational Markov Networks

A Relational Markov Network (RMN) specifies cliques and potentials between attributes of related entities at the template level

A single model provides a distribution for the entire graph

Train the model to maximize the probability of the observed edges and labels

Use the trained model to predict edges and unknown attributes

Latent Space Rationale

Generative model: the model parameters θ (the latent space) generate the observed edges

Link prediction heuristic: given θ, which node u is the most likely neighbor of node v?

[Diagram: θ → observed edges; candidate edge between nodes u and v]

Relationship between Deterministic & Probabilistic Methods?

Yes, empirically (Sarkar, Chakrabarti, Moore, COLT’10)

[Figure: link prediction accuracy* increasing from Random, to Shortest Path, to Common Neighbors, to Adamic/Adar, to an ensemble of short paths]

*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

How do we justify these observations? Especially if the graph is sparse

Credit: Sarkar, Chakrabarti, Moore

Raftery’s Model (cont.)

Two sources of randomness:
• Point positions: uniform in D-dimensional space
• Linkage probability: logistic with parameters α, r
• α, r, and D are known

[Figure: link probability vs. distance; probability near 1 at small distances, ½ at radius r, decreasing beyond; α determines the steepness]

Credit: Sarkar, Chakrabarti, Moore

Raftery’s Model: Probability of New Edge

Logistic function integrated over the volume determined by dij

[Figure: overlapping neighborhoods of nodes i and j]

Credit: Sarkar, Chakrabarti, Moore

Connection to Common Neighbors

[Figure: nodes i and j with radius-r neighborhoods; a common neighbor k lies in their intersection. The bound involves the number of common neighbors and the network size N, and decreases with N]

Empirical Bernstein bounds (Maurer & Pontil, 2009)

A distinct radius per node gives Adamic/Adar

Supervised Learning Approaches

Create a binary classifier that predicts whether an edge exists between two nodes

Types of Features

Features
◦ Label features: keywords, frequency of an item
◦ Topological features: shortest distance, degree, centrality, PageRank index

Classifiers: SVM, Decision Trees, Deep Belief Network (DBN), K-Nearest Neighbors (KNN), Naive Bayes, Radial Basis Function (RBF), Logistic Regression

[Diagram: features → classifier → link prediction heuristic → recommended links]
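A minimal end-to-end sketch in Python: extract topological features for node pairs, label each pair by whether its edge exists, and fit a plain logistic regression by gradient descent. The graph, the feature choice, and the training setup are all made-up illustrations, not the slide's specific pipeline:

```python
import math

# Toy graph (hypothetical): two triangles joined by a bridge edge (2, 3).
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

def pair_features(g, u, v):
    """Topological features for the node pair (u, v)."""
    cn = len(g[u] & g[v])                    # common neighbors
    union = len(g[u] | g[v])
    jac = cn / union if union else 0.0       # Jaccard
    pa = len(g[u]) * len(g[v])               # preferential attachment
    return [cn, jac, pa]

# Training set: every node pair, labeled 1 if the edge exists, else 0.
pairs = [(u, v) for u in graph for v in graph if u < v]
X = [pair_features(graph, u, v) for u, v in pairs]
y = [1 if v in graph[u] else 0 for u, v in pairs]

# Plain logistic regression fit by stochastic gradient descent.
w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.1
for _ in range(2000):
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        err = p - yi
        w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
        b -= lr * err

def predict(g, u, v):
    """Predicted probability that edge (u, v) exists."""
    z = sum(wj * xj for wj, xj in zip(w, pair_features(g, u, v))) + b
    return 1.0 / (1.0 + math.exp(-z))
```

Any of the classifiers listed above (SVM, decision trees, KNN, ...) can be swapped in for the logistic regression; only the feature extraction stays the same.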