Discovering Roles and Anomalies in Graphs:
Theory and Applications
Part 1: Theory Tina Eliassi-Rad (Rutgers) Christos Faloutsos (CMU)
SDM'12 Tutorial
T. Eliassi-Rad & C. Faloutsos 2
Overview
SDM'12 Tutorial
Roles
Features
Anomalies
Patterns
= rare roles
T. Eliassi-Rad & C. Faloutsos 3
Overview
SDM'12 Tutorial
Roles
Features
Anomalies
Patterns
= rare roles
Roadmap • What are roles • Roles and communities • Roles and equivalences (from sociology) • Roles (from data mining) • Summary
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 4
What are roles? • “Functions” of nodes in the network
– Similar to functional roles of species in ecosystems
• Measured by structural behaviors • Examples
– centers of stars
– members of cliques – peripheral nodes
– … SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 5
Example of Roles
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 6
centers of stars members of cliques peripheral nodes
Why are roles important? Task Use Case Role query Identify individuals with similar
behavior to a known target
Role outliers Identify individuals with unusual behavior
Role dynamics Identify unusual changes in behavior
Identity resolution Identify known individuals in a new network
Role transfer Use knowledge of one network to make predictions in another
Network comparison
Determine network compatibility for knowledge transfer
Role Discovery
ü Automated discovery
ü Behavioral roles
ü Roles generalize
Roadmap • What are roles • Roles and communities • Roles and equivalences (from sociology) • Roles (from data mining) • Summary
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 8
Roles and Communities • Roles group nodes with similar structural
properties • Communities group nodes that are well-
connected to each other • Roles and communities are complementary
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 9
Roles and Communities
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 10
RolX * Fast Modularity†
* Henderson, et al. 2012; † Clauset, et al. 2004
Roles and Communities
• Roles – Faculty
– Staff – Students
– …
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 11
• Communities – AI lab
– Database lab – Architecture lab
– …
Consider the social network of a CS dept
Roadmap • What are roles • Roles and communities • Roles and equivalences (from sociology) • Roles (from data mining) • Summary
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 12
Equivalences
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 13
Equivalences
Deterministic
Regular
Automorphic
Structural
Probabilistic Stochastic
Deterministic Equivalences
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 14
Regular
Automorphic
Structural
Structural Equivalence • [Lorrain & White, 1971] • Two nodes u and v are structurally
equivalent if they have the same relationships to all other nodes
• Hypothesis: Structurally equivalent nodes are likely to be similar in other ways – i.e., you are your friend
• Weights & timing issues are not considered • Rarely appears in real-world networks
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 15
u v
d e
a b c
Structural Equivalence: Algorithms • CONCOR (CONvergence of iterated CORrelations)
[Breiger et al. 1975] • A hierarchical divisive approach
1. Starting with the adjacency matrix, repeatedly calculate Pearson correlations between rows until the resultant correlation matrix consists of +1 and -1 entries
2. Split the last correlation matrix into two structurally equivalent submatrices (a.k.a. blocks): one with +1 entries, another with -1 entries
• Successive split can be applied to submatrices in order to produce a hierarchy (where every node has a unique position)
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 16
Structural Equivalence: Algorithms • STRUCUTRE [Burt 1976] • A hierarchical agglomerative approach
1. For each node i, create its ID vector by concatenating its row and column vectors from the adjacency matrix
2. For every pair of nodes 〈i, j〉, measure the square root of sum of squared differences between the corresponding entries in their ID vectors
3. Merge entries in hierarchical fashion as long as their difference is less than some threshold α
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 17
Structural Equivalences: Algorithms • Combinatorial optimization approaches
– Numerical optimization with tabu search [UCINET]
– Local optimization [Pajek]
• Partition the sociomatrices into blocks based on a cost function that minimizes the sum of within block variances
– Basically, minimize the sum of code cost within each block
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 18
Deterministic Equivalences
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 19
Regular
Automorphic
Structural
Automorphic Equivalence • [Borgatti, et al. 1992; Sparrow 1993] • Two nodes u and v are automorphically equivalent if all
the nodes can be relabeled to form an isomorphic graph with the labels of u and v interchanged – Swapping u and v (possibly
along with their neighbors) does not change graph distances
• Two nodes that are automorphically equivalent share exactly the same label-independent properties
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 20
u v
Automorphic Equivalence: Algorithms • Sparrow (1993) proposed an algorithm that scales linearly to the
number of edges
• Use numerical signatures on degree sequences of neighborhoods
• Numerical signatures use a unique transcendental number like π, which is independent of any permutation of nodes
• Suppose node i has the following degree sequence: 1, 1, 5, 6, and 9. Then its signature is Si,1 = (1 + π)(1 + π) (5 + π) (6 + π) (9 + π)
• The signature for node i at k+1 hops is Si,(k+1) = Π(Si,k + π)
• To find automorphic equivalence, simply compare numerical signatures of nodes
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 21
Deterministic Equivalences
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 22
Regular
Automorphic
Structural
Regular Equivalence • [Everett & Borgatti, 1992] • Two nodes u and v are regularly equivalent if they
are equally related to equivalent others
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 23
+Regular Equivalenceg q
� Introduced by Borgatti (1996)
� Definition:� Two actors are regularly equivalent if they are equally related to
i l t thequivalent others
President Motes
Faculty
Graduate Students
Hanneman, Robert A. and Mark Riddle. 2005. Introduction to social network methods. Riverside, CA: University of California, Riverside ( published in digital form at http://faculty.ucr.edu/~hanneman/ )
Regular Equivalence (continued)
• Basic roles of nodes – source
– repeater – sink
– isolate
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 24
Regular Equivalence (continued)
• Based solely on the social roles of neighbors
• Interested in – Which nodes fall in which social roles?
– How do social roles relate to each other?
• Hard partitioning of the graph into social roles
• A given graph can have more than one valid regular equivalence set
• Exact regular equivalences can be rare in large graphs
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 25
Regular Equivalence: Algorithms • Many algorithms exist here • Basic notion
– Profile each node’s neighborhood by the presence of nodes of other "types"
– Nodes are regularly equivalent to the extent that they have similar "types" of other nodes at similar distances in their neighborhoods
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 26
Equivalences
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 27
Equivalences
Deterministic
Regular
Automorphic
Structural
Probabilistic Stochastic
Stochastic Equivalence • [Holland, et al. 1983;
Wasserman & Anderson, 1987]
• Two nodes are stochastically equivalent if they are “exchangeable” w.r.t. a probability distribution
• Similar to structural equivalence but probabilistic
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 28
u v
a
b
p(a,v)
p(u,b) p(v,b)
p(a,u)
p(a,u) = p(a,v) p(u,b)= p(v,b)
Stochastic Equivalence: Algorithms • Many algorithms exist here • Most recent approaches are generative
[Airoldi, et al 2008] • Some choice points
– Single [Kemp, et al 2006] vs. mixed-membership [Koutsourelakis & Eliassi-Rad, 2008] equivalences (a.k.a. “positions”)
– Parametric vs. non-parametric models
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 29
Roadmap • What are roles • Roles and communities • Roles and equivalences (from sociology) • Roles (from data mining) • Summary
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 30
RolX: Role eXtraction • Introduced by Henderson, et al. 2011b • Automatically extracts the underlying roles
in a network – No prior knowledge required
• Assigns a mixed-membership of roles to each node
• Scales linearly on the number of edges
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 31
RolX: Flowchart
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 32
Node × Node Matrix
Recursive Feature
Extraction
Node × Feature Matrix
Role Extraction
Node × Role Matrix
Role × Feature Matrix
RolX: Flowchart
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 33
Node × Node Matrix
Recursive Feature
Extraction
Node × Feature Matrix
Role Extraction
Node × Role Matrix
Role × Feature Matrix
Example: degree, avg weight, # of edges in egonet, mean clustering coefficient of neighbors, etc
Recursive Feature Extraction • ReFeX [Henderson, et al. 2011a] turns network connectivity into
recursive structural features
• Neighborhood features: What is your connectivity pattern? • Recursive Features: To what kinds of nodes are you connected? SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 34
ReFeX
Local Egonet Recursive
Neighborhood
Regional
1411# 0# 1# 2# 1# 0# 0# 0# 1# 1# 0# 1# 0# 0# 1# 1# 2# 2#1410# 0# 1# 1# 1# 0# 1# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1#338# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 1# 0# 0# 0#339# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0#1415# 0# 1# 1# 2# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 1# 1# 1#941# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1414# 0# 1# 1# 1# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1#942# 0# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0#1413# 0# 1# 1# 1# 0# 1# 1# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1#1412# 0# 0# 0# 0# 0# 0# 0# 1# 2# 0# 1# 1# 0# 0# 1# 2# 0#940# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 1# 0# 1# 1# 1#1419# 0# 0# 1# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 1#945# 0# 1# 4# 3# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 1# 3# 1#332# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1418# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 2# 0# 1# 0# 1#946# 0# 1# 1# 0# 0# 1# 0# 1# 0# 0# 0# 1# 4# 0# 1# 1# 2#333# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1417# 0# 1# 1# 1# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1#943# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0#330# 1# 3# 2# 0# 1# 2# 2# 0# 2# 2# 2# 0# 3# 1# 0# 2# 5#1416# 0# 1# 1# 1# 1# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 1# 1#944# 0# 1# 4# 2# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 3# 1#331# 0# 3# 2# 1# 0# 1# 0# 0# 2# 0# 2# 0# 2# 0# 1# 2# 5#949# 0# 0# 0# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 0# 0# 0#336# 0# 0# 0# 0# 2# 0# 0# 1# 1# 1# 1# 1# 0# 0# 0# 1# 0#337# 1# 1# 1# 0# 0# 1# 2# 0# 1# 1# 1# 0# 1# 1# 1# 1# 1#947# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0#334# 0# 0# 0# 1# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0#948# 0# 0# 0# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 0#335# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0#531# 1# 0# 0# 0# 1# 0# 2# 0# 0# 2# 0# 0# 0# 2# 0# 0# 0#
Nod
es
Propositionalisation (PROP) • [Knobbe, et al. 2001; Neville, et al. 2003; Krogel, et al.
2003]
• From multi-relational data mining with roots in Inductive Logic Programming (ILP)
• Summarizes a multi-relational dataset (stored in multiple tables) into a propositional dataset (stored in a single “target” table)
• Derived attribute-value features describe properties of individuals
• Related more to recursive structural features than structural roles
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 35
Role Extraction
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 36
Features 1411# 0# 1# 2# 1# 0# 0# 0# 1# 1# 0# 1# 0# 0# 1# 1# 2# 2#1410# 0# 1# 1# 1# 0# 1# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1#338# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 1# 0# 0# 0#339# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0#1415# 0# 1# 1# 2# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 1# 1# 1#941# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1414# 0# 1# 1# 1# 0# 1# 0# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1#942# 0# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0#1413# 0# 1# 1# 1# 0# 1# 1# 0# 0# 0# 0# 0# 1# 1# 0# 1# 1#1412# 0# 0# 0# 0# 0# 0# 0# 1# 2# 0# 1# 1# 0# 0# 1# 2# 0#940# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 1# 0# 1# 1# 1#1419# 0# 0# 1# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 1#945# 0# 1# 4# 3# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 1# 3# 1#332# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1418# 0# 0# 1# 0# 0# 0# 0# 1# 0# 0# 0# 1# 2# 0# 1# 0# 1#946# 0# 1# 1# 0# 0# 1# 0# 1# 0# 0# 0# 1# 4# 0# 1# 1# 2#333# 0# 0# 0# 0# 1# 0# 1# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0#1417# 0# 1# 1# 1# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 1# 1# 1#943# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0#330# 1# 3# 2# 0# 1# 2# 2# 0# 2# 2# 2# 0# 3# 1# 0# 2# 5#1416# 0# 1# 1# 1# 1# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 1# 1#944# 0# 1# 4# 2# 0# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 3# 1#331# 0# 3# 2# 1# 0# 1# 0# 0# 2# 0# 2# 0# 2# 0# 1# 2# 5#949# 0# 0# 0# 0# 2# 0# 0# 1# 0# 1# 0# 1# 0# 0# 0# 0# 0#336# 0# 0# 0# 0# 2# 0# 0# 1# 1# 1# 1# 1# 0# 0# 0# 1# 0#337# 1# 1# 1# 0# 0# 1# 2# 0# 1# 1# 1# 0# 1# 1# 1# 1# 1#947# 1# 0# 0# 0# 2# 0# 1# 0# 0# 2# 0# 1# 0# 1# 0# 0# 0#334# 0# 0# 0# 1# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0#948# 0# 0# 0# 0# 0# 1# 0# 1# 1# 0# 1# 1# 1# 0# 1# 1# 0#335# 0# 0# 0# 1# 0# 0# 0# 0# 0# 0# 0# 0# 0# 0# 1# 0# 0#531# 1# 0# 0# 0# 1# 0# 2# 0# 0# 2# 0# 0# 0# 2# 0# 0# 0#
Nod
es Recursively
extract roles
Automatically factorize roles
Automatically Discovered Roles
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 37
Network Science Co-authorship Graph
[Newman 2006]
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 38
Making Sense of Roles
cliquey bridge periphery isolated
0
2
4
6
8
10
12
14 D
egre
e
Avg.
Wei
ght
Clu
ster
ing
Coe
ff.
Ecc
entri
city
Pag
eRan
k
Gat
eKee
per
Gat
eKee
per-
Loca
l
Piv
ot
Stru
ctur
al H
ole
Rol
e 1
Rol
e 2
Rol
e 3
Rol
e 4
NodeSense NeighborSense
Role 1 Role 2 Role 3 Role 4 Default
Mixed-Membership over Roles
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 39
Bright blue nodes are peripheral nodes
Bright red nodes are locally central nodes
Amazon Political Books Co-purchasing Network [V. Krebs 2000]
conservative liberal
neutral
Role Query
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 40
Node Similarity for M.E.J. Newman (bridge)
Node Similarity for F. Robert (cliquey)
Node Similarity for J. Rinzel (isolate)
Roles Generalize across Disjoint Networks
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 41
0
0.2
0.4
0.6
0.8
1
A1-
B
A2-
B
A3-
B
A4-
B
A-B
Cla
ssifi
catio
n A
ccur
acy
TrainSet-TestSet
RolX Feat Default
Figure 4: RolX provides better generalization per-
formance between enterprise IP networks A and
B (mean accuracy of RolX =85%, Feat=71%, p-
value=0.01).
0.35
0.45
0.55
0.65
0.75
0.85
0.95
0 10 20 30 40 50 60 70
Mea
n A
ccur
acy
Number of Roles
Figure 5: RolX chooses a high accuracy model size
of 9 roles, in the middle of the peak accuracy range
of 7-11 roles. The Y-axis depicts the mean classifi-
cation accuracy using RolX (over all 4 test sets) by
model size.
is right in the middle of the peak accuracy range shown inFigure 5. Figure 6a shows that the model selection crite-rion used by RolX is highly correlated with classificationaccuracy (Pearson correlation is -0.91). Figure 6b shows thedefault RolX model selection criterion decomposed into itsconstituent parts. Model cost is the cost associated withrepresenting the model itself while error cost is the cost ofrepresenting the di↵erences between the original feature val-ues and the estimated feature values reconstructed using themodel. As expected, we see a consistent increase in modelcost and a consistent decrease in error cost as the numberof roles increases.
Figure 7 shows that IP tra�c classes are well-separatedin the RolX “role space”, with as few as 3 roles (extractedfrom the original 373 structural features). Note that weachieve even better separation with the automatically se-lected model size of 9 roles (see Figure 4), but we can onlyclearly visualize up to 3.
We omit for brevity the “sensemaking” table for the 3-roleIP experiment. It shows that Roles 1 and 2 are lower-volumeIPs while Role 3 is high-volume servers or P2P nodes. Role3 contains nodes of all three types (Web, DNS, P2P). ThisRole is overloaded since the model size of 3 is not as predic-tive as larger model sizes (see Figure 5).
Reality Mining Device data: Using the Reality MiningDevice dataset, we conducted two sets of transfer learningexperiments. The first set of experiments involves a binaryclassification task where we try to predict whether a givensubject is a business school student or not. The second
2
3
4 5
6
7 8
9 10
11
12 13 15
30
0.7
0.75
0.8
0.85
0.9
3800000 4300000 4800000 5300000 5800000 6300000 6800000
Mea
n C
lass
ific
atio
n A
ccur
acy
Description Length
(a)
0 1 2 3 4 5 6 7 8 9
0 5 10 15 20 25 30 35 Des
crip
tion
Len
gth
(mill
ions
of
bits
)
Number of Roles
Total Cost Model Cost Error Cost
(b)
Figure 6: RolX ’s model selection is e↵ective: (a)
Classification accuracy is highest when RolX selec-
tion criterion is minimized. Red markers indicate
the peak performing model sizes of 7-11 roles (b)
RolX ’s model selection criterion balances model size
and reconstruction accuracy.
set is similar, where we try to predict whether a subjectis a graduate student in the Media Lab or not. As trainand test sets, we used each pair of consecutive months inour dataset. In Figure 8, we show the accuracy of RolX .The Baseline is a classifier that learns to always predict themajority class of the training set on the test set. The timelabels denote the month for the train data, and the monthfollowing that is used as the test data. We also use all thedata in 2004 and 2005 as train and test data, respectively.Notice that RolX outperforms the baseline classifier most ofthe time with an average of 83% and 76% accuracy for thetwo experiment sets, respectively. We notice that RolX ’saccuracy drops when September and May data is used astraining, possibly because these months correspond to thestart and end of the school semesters; the behavior of thesubjects would be generally di↵erent than usual in thesemonths, thus providing not as much predictive informationas the other months would.
4. STRUCTURAL SIMILARITYHere we describe experiments in which RolX is used for
its most basic task: grouping nodes based on their structuralsimilarity.
4.1 Network Sciences Coauthorship data:Our first data set is the weighted Network Science Co-
authorship Graph with 1589 authors (from the network sci-ence community) and 2743 weighted edges [25]. Figure 9shows (a) the role-colored graph (where each node is col-
Roles Generalize across Networks
SDM'12 Tutorial
Thu Network
Wed Network
Tue Network
Feature Discovery
Feature Extraction
Feature Extraction
Inference Learning
Regression Regression
Inference
RolX
C3 C2
V1
G1 G2
V2 V3
G3
C1
L
F
M
E.g., degree, avg wgt, etc
V: (node × feature) matrix
G: (node × role) matrix
F: (role × feature) matrix
L: List of feature names
C: Class labels
M: model
Discovery Stage
42
Roles Generalize across Networks
43
Thu Network
Wed Network
Tue Network
Feature Discovery
Feature Extraction
Feature Extraction
Inference Learning
Regression Regression
Inference
RolX
C3 C2
V1
G1 G2
V2 V3
G3
C1
L
F
M
V: (node × feature) matrix
G: (node × role) matrix
F: (role × feature) matrix
L: List of feature names
C: Class labels
M: model
E.g., degree, avg wgt, etc
Discovery Stage Application Stage
Roles: Regular Equivalence vs. RolX RolX Regular Equivalence
Mixed-membership over roles ✓ Fully automatic ✓
Uses structural features ✓ Uses structure ✓ ✓
Generalizable across disjoint networks
✓
?
Scalable (linear on # of edges) ✓
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 44
Roadmap • What are roles • Roles and communities • Roles and equivalences (from sociology) • Roles (from data mining) • Summary
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 45
Summary • Roles
– Structural behavior (“function”) of nodes
– Complementary to communities
– Previous work mostly in sociology under equivalences
– Recent graph mining work produces mixed-membership roles, is fully automatic and scalable
– Can be used for many tasks: transfer learning, re-identification, node dynamics, etc
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 46
Acknowledgement • LLNL: Brian Gallagher, Keith Henderson • IBM Watson: Hanghang Tong • Google: Sugato Basu • CMU: Leman Akoglu, Danai Koutra, Lei Li
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 47
Thanks to: LLNL, NSF, IARPA.
References Deterministic Equivalences
• S. Boorman, H.C. White: Social Structure from Multiple Networks: II. Role Structures. American Journal of Sociology, 81:1384-1446, 1976.
• S.P. Borgatti, M.G. Everett: Notions of Positions in Social Network Analysis. In P. V. Marsden (Ed.): Sociological Methodology, 1992:1–35.
• S.P. Borgatti, M.G. Everett, L. Freeman: UCINET IV, 1992.
• S.P. Borgatti, M.G. Everett, Regular Blockmodels of Multiway, Multimode Matrices. Social Networks, 14:91-120, 1992.
• R. Breiger, S. Boorman, P. Arabie: An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Comparison with Multidimensional Scaling. Journal of Mathematical Psychology, 12:328-383, 1975.
• R.S. Burt: Positions in Networks. Social Forces, 55:93-122, 1976.
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 48
References • P. DiMaggio: Structural Analysis of Organizational Fields: A Blockmodel Approach.
Research in Organizational Behavior, 8:335-70, 1986.
• P. Doreian, V. Batagelj, A. Ferligoj: Generalized Blockmodeling. Cambridge University Press, 2005.
• M.G. Everett, S. P. Borgatti: Regular Equivalence: General Theory. Journal of Mathematical Sociology, 19(1):29-52, 1994.
• K. Faust, A.K. Romney: Does Structure Find Structure? A critique of Burt's Use of Distance as a Measure of Structural Equivalence. Social Networks, 7:77-103, 1985.
• K. Faust, S. Wasserman: Blockmodels: Interpretation and Evaluation. Social Networks, 14:5–61. 1992.
• R.A. Hanneman, M. Riddle: Introduction to Social Network Methods. University of California, Riverside, 2005.
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 49
References • F. Lorrain, H.C. White: Structural Equivalence of Individuals in Social Networks.
Journal of Mathematical Sociology, 1:49-80, 1971.
• L.D. Sailer: Structural Equivalence: Meaning and Definition, Computation, and Applications. Social Networks, 1:73-90, 1978.
• M.K. Sparrow: A Linear Algorithm for Computing Automorphic Equivalence Classes: The Numerical Signatures Approach. Social Networks, 15:151-170, 1993.
• S. Wasserman, K. Faust: Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
• H.C. White, S. A. Boorman, R. L. Breiger: Social Structure from Multiple Networks I. Blockmodels of Roles and Positions. American Journal of Sociology, 81:730-780, 1976.
• D.R. White, K. Reitz: Graph and Semi-Group Homomorphism on Networks and Relations. Social Networks, 5:143-234, 1983.
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 50
References Stochastic blockmodels
• E.M. Airoldi, D.M. Blei, S.E. Fienberg, E.P. Xing: Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9:1981-2014, 2008.
• P.W. Holland, K.B. Laskey, S. Leinhardt: Stochastic Blockmodels: Some First Steps. Social Networks, 5:109-137, 1983.
• C. Kemp, J.B. Tenenbaum, T.L. Griffiths, T. Yamada, N. Ueda: Learning Systems of Concepts with an Infinite Relational Model. AAAI 2006.
• P.S. Koutsourelakis, T. Eliassi-Rad: Finding Mixed-Memberships in Social Networks. AAAI Spring Symposium on Social Information Processing, Stanford, CA, 2008.
• K. Nowicki ,T. Snijders: Estimation and Prediction for Stochastic Blockstructures, Journal of the American Statistical Association, 96:1077–1087, 2001.
• Z. Xu, V. Tresp, K. Yu, H.-P. Kriegel: Infinite Hidden Relational Models. UAI 2006.
• S. Wasserman, C. Anderson: Stochastic a Posteriori Blockmodels: Construction and Assessment, Social Networks, 9:1-36, 1987.
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 51
References Role Discovery
• K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, C. Faloutsos: It's Who Your Know: Graph Mining Using Recursive Structural Features. KDD 2011: 663-671.
• K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, L. Li, C. Faloutsos: RolX: Structural Role Extraction & Mining in Large Graphs. Technical Report, Lawrence Livermore National Laboratory, Livermore, CA, 2011.
• Ruoming Jin, Victor E. Lee, Hui Hong: Axiomatic ranking of network role similarity. KDD 2011: 922-930.
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 52
References Community Discovery
• A. Clauset, M.E.J. Newman, C. Moore: Finding Community Structure in Very Large Networks. Phys. Rev. E., 70:066111, 2004.
• M.E.J. Newman: Finding Community Structure in Networks Using the Eigenvectors of Matrices. Phys. Rev. E., 74:036104, 2006.
Propositionalisation
• A.J. Knobbe, M. de Haas, A. Siebes: Propositionalisation and Aggregates. PKDD 2001: 277-288.
• M.-A. Krogel, S. Rawles, F. Zelezny, P.A. Flach, N. Lavrac, S. Wrobel: Comparative Evaluation of Approaches to Propositionalization. ILP 2003: 197-214.
• J. Neville, D. Jensen, B. Gallagher: Simple Estimators for Relational Bayesian Classifiers. ICDM 2003: 609-612.
SDM'12 Tutorial T. Eliassi-Rad & C. Faloutsos 53
T. Eliassi-Rad & C. Faloutsos 54
Back to Overview
SDM'12 Tutorial
Roles
Features
Anomalies
Patterns
= rare roles