Communities & Roles: Two ways of identifying nodes that "go together"...

Preview:


Communities & Roles: Two ways of identifying nodes that "go together"

a) Communities/Groups
   a) Cohesive subgroups literature: start w. Freeman
   b) Network operationalization
      a) Graph theoretic
      b) Heuristic algorithms
         a) Graph search & modularity
         b) Cluster analysis
         c) LDA/Principal components
      c) Fundamental limitations

b) Roles/Positions
   a) Literature grounded in structural anthropology & kinship
   b) Roles as relations imply paired sets
   c) Goal is to identify nodes with common patterns
      a) Original is CONCOR
      b) Alternatives based on triads, other clusterings

Social Sub-groups

Lin Freeman: The sociological concept of “Group”

Focus on collectivities that are: “Relatively small, informal, and involve close personal ties.” What we would call “Primary Groups”

What (network) structure characterizes such a group?

Goal: Identify (a) non-overlapping groups that allow one to (b) identify internal group structure.

Social Sub-groups

Lin Freeman: The sociological concept of “Group”

Winship’s Model:

1) Assign people to equivalence classes that are hierarchically nested:

Social Sub-groups

Lin Freeman: The sociological concept of “Group”

In words, this means that whatever metric you define, a person is closer to themselves than to anyone else, that the relation be symmetric, and that triads be transitive (which, given the symmetric condition, means that they be complete).

You can then identify partitions by scaling the proximity, such that these three conditions are met.

Winship’s Model:

Social Sub-groups

Lin Freeman: The sociological concept of “Group”

  A B C D E F G H I J K
A . 5 5 4 4 4 4 3 3 3 3
B 5 . 5 4 4 4 4 3 3 3 3
C 5 5 . 4 4 4 4 3 3 3 3
D 4 4 4 . 5 5 5 3 3 3 3
E 4 4 4 5 . 5 5 3 3 3 3
F 4 4 4 5 5 . 5 3 3 3 3
G 4 4 4 5 5 5 . 3 3 3 3
H 3 3 3 3 3 3 3 . 5 5 5
I 3 3 3 3 3 3 3 5 . 5 5
J 3 3 3 3 3 3 3 5 5 . 5
K 3 3 3 3 3 3 3 5 5 5 .

Winship’s Model:

Social Sub-groups

Lin Freeman: The sociological concept of “Group”

total

{A-G} {H-K}

{A-C} {D-G}

Winship’s Model:

Social Sub-groups

Lin Freeman: The sociological concept of “Group”

Granovetter’s Model:

Proceed exactly as in Winship, but treat intransitivity differently when looking at strong or weak ties.

If x and y are strongly connected, and y and z are strongly connected, then x and z should be at least weakly connected.

An example of a graph fitting the prohibition against G-intransitive relations.

Social Sub-groups

Lin Freeman: The sociological concept of “Group”

Granovetter’s Model:

Social Sub-groups

The Davis - “Old South” Example

Social Sub-groups

The Davis - “Old South” Example: Ties > 2

Social Sub-groups

The Davis - “Old South” Example: Ties > 3

Social Sub-groups

The Davis - “Old South” Example: Ties > 4

Meets the G-transitivity condition

Social Sub-groups

The Davis - “Old South” Example: Ties > 5

Stronger than the G-transitivity condition

Social Sub-groups

Lin Freeman: The sociological concept of “Group”

Freeman argues that the G-intransitivity model fits the data best for each of the 7 groups he studies.

Substantively, the types of groups this model predicts are very similar to those predicted by the general transitivity model, except re-cast as a valued relation.

Empirically, if you want to identify groups based on levels like this, you can use PAJEK and walk through the model in just the same way as we did with "Old South," or you can use UCI-NET (or program it, it's not hard).
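If you do want to program it, here is a minimal R/igraph sketch of the thresholding logic, using made-up tie strengths rather than the "Old South" data (everything here is a stand-in; only the "Ties > k" loop is the point):

library(igraph)

# Hypothetical valued network: weights stand in for tie-strength levels.
set.seed(11)
g <- sample_gnp(18, 0.25)
E(g)$weight <- sample(1:6, ecount(g), replace = TRUE)

# Keep only ties stronger than k and look at what hangs together,
# as in the "Old South" walk-through above.
for (k in 2:5) {
  sub <- subgraph.edges(g, which(E(g)$weight > k), delete.vertices = FALSE)
  cat("Ties >", k, ":", components(sub)$no, "components\n")
}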

Methods: How do we identify primary groups in a network?

A) Classic graph theoretical methods: Cliques and extensions of cliques
• Cliques
• k-cores
• k-plexes
• Freeman (1992) Models
• K-components (we talked about these already)

B) Algorithmic methods: search through a network trying to maximize for a particular pattern (i.e., like Frank & Yasumoto)

• Adjust assignment of actors to groups until a particular pattern of ties (block diagonal, usually) is identified.
• Standard models:
  - Factions (UCI-NET)
  - KliqueFinder (Frank)
  - RNM/CROWDS/JIGGLE (Moody)
  - Principal component analysis (PCA)
  - Flow models (MCL)
  - Modularity maximization routines
  - General distance & clustering methods

Methods: How do we identify primary groups in a network?

Graph Theoretical Models.

Start with a clique. A clique is defined as a maximal subgraph in which every member of the graph is connected to every other member of the graph. Cliques are collections of nodes where density = 1.0.

Properties of cliques:
• Density: 1.0
• Everyone connected to n-1 alters
• Distance between every pair is 1
• Ratio of within-group ties to between-group ties is infinite
• All triads are transitive

Methods: How do we identify primary groups in a network?

Graph Theoretical Models.

In practice, complete cliques are not very useful. They tend to overlap heavily and are limited in their size.
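For a sense of how this plays out, here is a small igraph sketch on a random test graph (not real data) that pulls all maximal cliques of size 3 or more; they tend to be numerous, small, and heavily overlapping:

library(igraph)
set.seed(13)
g  <- sample_gnp(25, 0.25)
cl <- max_cliques(g, min = 3)     # maximal cliques with at least 3 members
length(cl)                        # usually a large number of them
table(sapply(cl, length))         # ...and most are small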

Graph theorists have thus relaxed the complete connectivity requirement (with varying degrees of success). See the Moody & White paper on cohesion for a discussion of many of these attempts.

Methods: How do we identify primary groups in a network?

Graph Theoretical Models.

k-cores: Every person connected to at least k other people.

Ideally, they would look something like this (here two 3-cores).

However, adding a single tie from A to B would make the whole graph a 3-core
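A quick sketch of k-cores with igraph's coreness() routine on a random stand-in graph; every node in the k-core has at least k neighbors who are also in the core:

library(igraph)
set.seed(14)
g  <- sample_gnp(40, 0.10)
kc <- coreness(g)                 # the deepest k-core each node belongs to
table(kc)
V(g)[kc >= 3]                     # members of the 3-core, if one exists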

Methods: How do we identify primary groups in a network? Graph Theoretical Models.

Extensions of this idea include:

K-plex: Every member connected to at least n-k other people in the graph (recall that in a clique everyone is connected to n-1, so this relaxes that condition).

n-clique: Every person is connected by a path of length n or less (recall that in a clique distance = 1).

N-clan: same as an n-clique, but all paths must stay inside the group.

I’ve never had much luck with any of these methods empirically. Real data is usually too messy to work well. You should try them, and gain some intuition for yourself. The place to start is in UCINET.

Methods: How do we identify primary groups in a network?

UCINET will compute all of the best-known graph theoretic treatments for subgroups

Graph Theoretical Models.

Methods: How do we identify primary groups in a network?

Consider running different methods on a known group structure:

Graph Theoretical Models.

Methods: How do we identify primary groups in a network? Graph Theoretical Models.

Methods: How do we identify primary groups in a network? Graph Theoretical Models: Cliques.

Methods: How do we identify primary groups in a network?

The only way to get something meaningful from this is to analyze the clique overlap matrix, which is what the "Clique by partition" dataset does, using cluster analysis.

Cliques

Heuristic strategies for identifying primary groups: Search:

1) Fit measure: Identify a measure of groupness (usually a function of the number of ties that fall within group compared to the number of ties that fall between groups).
2) Algorithm to maximize fit: Once we have the index, we need a clever method for searching through the network to maximize the fit.

Destroy: Break apart the network in strategic ways, removing the weakest parts first; what's left are your primary groups. See "edge betweenness," "MCL."

Evade: Don't look directly; instead find a simpler problem that correlates. Examples: generalized cluster analysis, factor analysis, RNM.

Methods: How do we identify primary groups in a network?

Segregation Index (Freeman, L. C. 1972. "Segregation in Social Networks." Sociological Methods and Research 6:411-30.)

Freeman asked how we could identify segregation in a social network. Theoretically, he argues, if a given attribute (group label) does not matter for social relations, then relations should be distributed randomly with respect to the attribute. Thus, the difference between the number of cross-group ties expected by chance and the number observed measures segregation.

Seg = [E(X) - X] / E(X)

where X is the observed number of cross-group ties and E(X) is the number expected if ties were distributed randomly with respect to the attribute.
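A minimal R sketch of this index for a two-category attribute, using a made-up graph and attribute rather than Freeman's data:

library(igraph)
set.seed(12)
g   <- sample_gnp(30, 0.15)
grp <- sample(c("blue", "brown"), vcount(g), replace = TRUE)   # hypothetical attribute

el <- as_edgelist(g, names = FALSE)
X  <- sum(grp[el[, 1]] != grp[el[, 2]])        # observed cross-group ties
m  <- ecount(g); n <- vcount(g)
EX <- m * prod(table(grp)) / choose(n, 2)      # expected cross ties under random mixing
(EX - X) / EX                                  # Freeman's segregation index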

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Consider the (hypothetical) network below. There are two attributes in this network: people with Blue eyes and Brown eyes and people who are square or not (they must be hip).

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Segregation Index

Mixing Matrix:

        Blue  Brown
Blue      6     17
Brown    17     16        Seg = -0.25

        Hip   Square
Hip      20      3
Square    3     30        Seg = 0.78

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Segregation Index

One problem with the segregation index is that it is not ‘margin free.’ That is, if you were to change the distribution of the category of interest (say race) by a constant but not the core association between race and friendship choice, you can get a different segregation level.

One antidote to this problem is to use odds ratios. In this case, an odds ratio tells us the relative likelihood that two people in the same category will choose each other as friends.
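As a small illustration using the hip/square mixing matrix above (treating the cell counts as given):

mix <- matrix(c(20,  3,
                 3, 30), 2, 2, byrow = TRUE,
              dimnames = list(c("Hip", "Square"), c("Hip", "Square")))
or <- (mix[1, 1] * mix[2, 2]) / (mix[1, 2] * mix[2, 1])   # ad/bc same-category odds ratio
c(odds_ratio = or, log_odds_ratio = log(or))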

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

[Scatterplot: Friendship Segregation Index plotted against Log(Same-Sex Odds Ratio) across networks.]

Segregation index compared to the odds ratio:

r=.95

Complete Network Analysis. Network Connections: Social Subgroups

The second problem is that the segregation index has no clear maximum: if every node is assigned to a single group, the value can be higher than if everyone is assigned to the "right" group, and the score tends to change monotonically. This means you can't just keep adjusting nodes until you see a best fit, but instead have to look for changes in fit.

The modularity score solves this problem by re-organizing the expectation in a way that forces the value to 0 if everyone is in a single group.

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

We can also measure the extent to which ties fall within clusters with the modularity score:

Q = (1/2m) Σij [ Aij - γ (ki kj)/(2m) ] δ(ci, cj)

where m is the number of edges, k is the degree, Aij is the edge weight between i and j, δ(ci, cj) is 1 if i and j are in the same group, and γ is the resolution parameter.

Q has the advantage of going to 0 if there is only 1 group, which means maximizing the score is sensible. Note that the resolution parameter means the number of groups is not truly "automatic."
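A small R sketch that computes Q both from the formula above (with γ = 1) and with igraph's built-in modularity(), on a toy two-group network (the graph and group labels are made up):

library(igraph)

g    <- graph_from_literal(A-B, A-C, B-C, C-D, D-E, D-F, E-F)  # two triangles joined by C-D
memb <- c(1, 1, 1, 2, 2, 2)                                    # assumed group labels

A <- as.matrix(as_adjacency_matrix(g))
k <- degree(g)
m <- ecount(g)
delta    <- outer(memb, memb, "==")                            # 1 if i and j share a group
Q_manual <- sum((A - outer(k, k) / (2 * m)) * delta) / (2 * m)
Q_igraph <- modularity(g, memb)                                # built-in check
c(Q_manual, Q_igraph)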

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Modularity Scores Comparison to Segregation Index – comparing values for known solutions

Modularity Score Plotted against Segregation Index for various nets

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

[Panels arranged by number of groups and in-group density.]

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

• Louvain Method (Blondel et al) in PAJEK & R
• Factions in UCI-NET
  - Multiple options for the exact factor maximized. I recommend either the density or the correlation function, and I would calculate the distance in each case.
• Frank's KliqueFinder
• Moody's crowds / Jiggle
• Generalized blockmodel in PAJEK
• iGraph (R) has a couple that do this sort of thing (Fast-Greedy is good)
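A hedged sketch of two of the routines named above in R (igraph), run on a random test graph rather than real data:

library(igraph)
set.seed(1)
g <- sample_gnp(60, 0.08)

fg  <- cluster_fast_greedy(g)    # "Fast-Greedy" modularity maximization
lou <- cluster_louvain(g)        # Louvain method

sizes(fg);  modularity(fg)
sizes(lou); modularity(lou)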

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Factions in UCI-NET

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

Factions in UCI-NET

Factions in UCI-NET

Factions in UCI-NET

Reduced BlockMatrix

1 2 3 4 5 6

-- -- -- -- -- --

1 59 1 2 14 1 0

2 1 54 0 1 12 2

3 1 2 55 0 1 12

4 9 1 1 51 0 0

5 0 12 2 0 62 1

6 1 0 9 2 0 64

Fit perfectly

UCINET. Biggest drawbacks of FACTIONS are:

A) Slow.
B) Have to specify the number of groups.

Methods: How do we identify primary groups in a network? Search: Optimize a partition to fit

R – “Fast Greedy”

This is a direct optimization of Modularity

PAJEK – “Louvain”

This is a direct optimization of Modularity

Cluster analysis

In addition to tools like FACTIONS, we can use the distance information contained in a network to cluster observations that are 'close' to each other. In general, cluster analysis is a set of techniques that allows you to identify collections of objects that are similar to each other to some degree.

A very good reference is the SAS/STAT manual section called, “Introduction to clustering procedures.” (http://wks.uts.ohio-state.edu/sasdoc/8/sashtml/stat/chap8/index.htm)

(See also Wasserman and Faust, though the coverage is spotty).

We are going to start with the general problem of hierarchical clustering applied to any set of analytic objects based on similarity, and then transfer that to clustering nodes in a network.

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

Cluster analysis

Imagine a set of objects (say people) arrayed in a two dimensional space. You want to identify groups of people based on their position in that space.

How do you do it?

[Scatterplot of people in a two-dimensional space; x-axis: "How cool you are", y-axis: "How smart you are".]

Start by choosing a pair of people who are very close to each other (such as 15 & 16) and now treat that pair as one point, with a value equal to the mean position of the two nodes.


Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

Now repeat that process for as long as possible.

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

This process is captured in the cluster tree (called a dendrogram)

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

As with the network cluster algorithms, there are many options for clustering. The three that I use most are:

• Ward's Minimum Variance -- the one I use almost 95% of the time
• Average Distance -- the one used in the example above
• Median Distance -- very similar

Again, the SAS manual is the best single place I’ve found for information on each of these techniques.

Some things to keep in mind:

Units matter. The example above draws together pairs horizontally because the range there is smaller. Get around this by standardizing your data.

This is an inductive technique. You can find clusters in a purely random distribution of points. Consider the following example.

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

The data in this scatter plot are produced using this code:

data random;
  do i=1 to 20;
    x=rannor(0);
    y=rannor(0);
    output;
  end;
run;
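A rough R analogue of the SAS example, to the same point: purely random points still produce a tidy-looking tree.

set.seed(0)
xy <- data.frame(x = rnorm(20), y = rnorm(20))   # 20 random points, as above
hc <- hclust(dist(scale(xy)), method = "average")
plot(hc)                                         # a dendrogram appears anyway
cutree(hc, k = 3)                                # "clusters" in pure noise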

Cluster analysis

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

Cluster analysis Resulting dendrogram

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

Cluster analysis: Resulting cluster solution

Cluster analysis

Cluster analysis works by building a distance matrix between each pair of points. In the example above, it used the Euclidean distance which in two dimensions is simply the physical distance between the points in a plot.

Can work on any number of dimensions.

To use cluster analysis in a network, we base the distance on the path-distance between pairs of people in the network.

Consider again the blue-eye hip example:

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

Cluster analysis

Distance Matrix
0 1 3 2 3 3 4 3 3 2 3 2 2 1 1
1 0 2 2 2 3 3 3 2 1 2 2 1 2 1
3 2 0 3 2 4 3 3 2 1 1 1 2 2 3
2 2 3 0 1 1 2 1 1 2 3 3 3 2 1
3 2 2 1 0 2 1 1 1 1 2 2 3 3 2
3 3 4 1 2 0 1 1 2 3 4 4 4 3 2
4 3 3 2 1 1 0 2 2 2 3 3 4 4 3
3 3 3 1 1 1 2 0 1 2 3 3 4 3 2
3 2 2 1 1 2 2 1 0 1 2 2 3 3 2
2 1 1 2 1 3 2 2 1 0 1 1 2 2 2
3 2 1 3 2 4 3 3 2 1 0 1 2 2 3
2 2 1 3 2 4 3 3 2 1 1 0 1 1 2
2 1 2 3 3 4 4 4 3 2 2 1 0 2 2
1 2 2 2 3 3 4 3 3 2 2 1 2 0 1
1 1 3 1 2 2 3 2 2 2 3 2 2 1 0

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

The distance matrix implies a space that nodes are embedded within. Using something like MDS, we can represent the space implied by the distance matrix in two dimensions. This is the image of the network you would get if you did that.
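A short sketch of that step in R, using classical MDS (cmdscale) on a network's geodesic-distance matrix; the graph here is a random stand-in for the blue-eye example:

library(igraph)
set.seed(2)
g  <- sample_smallworld(1, 15, 2, 0.1)     # connected stand-in network
D  <- distances(g)                         # path-distance matrix
xy <- cmdscale(as.dist(D), k = 2)          # 2-dimensional representation of that space
plot(xy, pch = 19)
text(xy, labels = seq_len(vcount(g)), pos = 3)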

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

Cluster analysis. When you use variables, the cluster analysis program generates a distance matrix. We can, instead, use the network distance matrix directly. If we do that with this example network, we get the following:

Cluster analysis

Cluster analysis

In SAS you use two commands to get a cluster analysis. The first does the hierarchical clustering. The second analyzes the cluster output to create the tree.

Example 1. Using variables to define the space (like income and musical taste):

proc cluster data=a method=ave out=clustd std;
  var x y;
  id node;
run;

proc tree data=clustd ncl=5 out=cluvars;
run;

Cluster analysis. Example 2: Using a pre-defined distance matrix to define the space (as in a social network). You first create the distance matrix (in IML), then use it in the cluster program.

proc iml;
  %include 'c:\moody\sas\programs\modules\reach.mod';

  /* blue eye example */
  mat2=j(15,15,0);
  mat2[1,{2 14 15}]=1;
  /* lines cut here */
  mat2[15,{1 14 2 4}]=1;

  dmat=reach(mat2);
  mattrib dmat format=1.0;
  print dmat;

  id=1:nrow(dmat);
  id=id`;
  ddat=id||dmat;

  create ddat from ddat;   /* creates the dataset */
  append from ddat;
quit;

data ddat (type=dist);     /* tells SAS it is a distance matrix */
  set ddat;
run;

Cluster analysis. Example 2: Using a pre-defined distance matrix to define the space (as in a social network). Once you have it, the cluster program is just the same.

proc cluster data=ddat method=ward out=clustd;
  id col1;
run;

proc tree data=clustd ncl=3 out=netclust;
  copy col1;
run;

proc freq data=netclust;
  tables cluster;
run;

proc print data=netclust;
  var col1 cluster;
run;
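For reference, a rough R analogue of the same workflow (geodesic distances, Ward clustering, a 3-group cut), again on a stand-in network rather than the blue-eye data:

library(igraph)
set.seed(3)
g  <- sample_smallworld(1, 15, 2, 0.1)
D  <- distances(g)                          # network distance matrix, as in the IML step
hc <- hclust(as.dist(D), method = "ward.D2")
plot(hc)
cutree(hc, k = 3)                           # 3-cluster solution, like proc tree ncl=3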

Moody’s CROWDS algorithm combines the search approach with an initial cluster analysis and a routine for determining how many clusters are in the network. It does so by using the Segregation index and all of the information from the cluster hierarchy, combining two groups only if it improves the segregation fit for both groups.

[Cluster tree for an example network, with the segregation-index fit value shown at each potential merge, up to the total network.]

Methods: How do we identify primary groups in a network? Evade: Find a "cheap" indicator, and cluster/optimize that

The logic behind these algorithms is that you remove some weak links and see what is left. Most popular is the “edge betweenness” algorithm.

Methods: How do we identify primary groups in a network? Destroy: Remove lines/nodes until what is left over reveals something of interest

UCINET has the MCL (Markov clustering, based on flow betweenness in a random walk sense) algorithm programmed.
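A sketch of the edge-betweenness (Girvan-Newman) version of this destroy logic in R; igraph repeatedly removes the highest-betweenness edge and returns the partition with the best modularity (random test graph):

library(igraph)
set.seed(4)
g  <- sample_gnp(40, 0.08)
eb <- cluster_edge_betweenness(g)   # repeatedly deletes the most "between" edge
membership(eb)
modularity(eb)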

Methods: How do we identify primary groups in a network? Destroy: Remove lines/nodes until what is left over reveals something of interest

“Evade” – look for something that correlates with your split

Newman’s Leading Eigenvector (in R – this is the “bottom” partition, not the best fit, which aggregates/joins from here)

The Recursive Neighborhood Means algorithm creates the variables that are then used in the cluster analysis to identify groups.

• Start by randomly assigning every node a value on k variables
• Then calculate the average for each variable for the people each person is tied to
• Repeat this process multiple times

This results in people who have many ties to each other having similar values on the k random variables. This similarity then gets picked up in a cluster analysis.
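A minimal sketch of that idea (not Moody's actual RNM code): random starting scores, repeated neighborhood averaging, then an ordinary cluster analysis on the smoothed scores.

library(igraph)
set.seed(5)
g <- sample_gnp(50, 0.10)
A <- as.matrix(as_adjacency_matrix(g))
X <- matrix(rnorm(vcount(g) * 2), ncol = 2)    # k = 2 random starting variables

for (i in 1:10) {
  X <- (A %*% X) / pmax(rowSums(A), 1)         # replace each score with the mean of neighbors' scores
}

grp <- cutree(hclust(dist(X)), k = 4)          # cluster the smoothed scores
table(grp)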

“Evade” – look for something that correlates with your split

Example of the RNM procedure

Time 1 Time 2 Time 3

Example of the RNM procedure

As an example, consider the process run on a network known to be clustered, starting with k = 2 random variables.

You get something like this, where the nodes are now placed according to their resulting values on the 2 variables.

The algorithm does a good job uncovering clusters in fake datasets.


Compared to real data:

RNM Partition on the Prison data

Strategies for identifying primary groups: Evade

Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data.

[Path diagram: latent concepts SES and IQ each measured by a single indicator (Income, Math Score), with loadings of 1.0 and error variance of 0.0.]

We often use simple indicators and assume they measure our concepts

Strategies for identifying primary groups: Evade

Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data.

[Path diagram: latent SES measured by Income, Occupation, Highest Degree, and House Size; latent IQ measured by Math Score, Reading Score, and Languages Spoken.]

But we don’t have to! We can imagine that each latent concept causes our indicators, and build a measurement model.

Strategies for identifying primary groups: Evade

Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data.

But we don’t have to! We can imagine that each latent concept causes our indicators, and build a measurement model.

Income     = λ1(SES) + ε1
Occupation = λ2(SES) + ε2
House Size = λ3(SES) + ε3

Strategies for identifying primary groups: Evade

Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data.

In a network, we assume that the tie pattern is an imperfect measure of an underlying latent structure that we can explain with similar factors. Instead of lots of “measurements” we have many columns in the adjacency (sim) matrix, and we can summarize that with factor scores.

This works best if the similarity matrix has more information, so multiple account data are perfect; or you can transform the data in some way to add more information (like using a distance matrix).

Strategies for identifying primary groups: Evade

Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data.

Here is code I used in the PROSPER data:

/* this section builds info on how to weight dyads for in-group, out-group. */
twostp=((adjmat+adjmat`)>0)*adjmat;   /* make it either direction w. the first term */
ttie=adjmat#twostp;                   /* =1 if tie contributes to a transitive triple */
ttie=((ttie+ttie`));

adjraw=adjmat;
adjmat=(adjmat+adjmat`);              /* force it to be symmetric, 1=asym 2=reciped */
adjmat=adjmat-diag(adjmat);           /* remove any self ties */
d2=reachlim((adjmat>0),3);

/* re-weight to bias toward recip ties */
wm_4  = (d2=1)#(adjmat=2)#8;          /* recip direct ties */
wm_2a = (d2=1)#(adjmat=1)#4;          /* unrecip direct ties */
wm_1  = 2*(d2=2);                     /* ties 2-steps out */
wm_p5 = 0*(d2=3);                     /* ties 3-steps out - note it's zeroed out here */
wm=wm_4+wm_2a+wm_1+wm_p5+(3*(ttie/(max(ttie))));   /* transitivity is at the end */
wm=wm-diag(wm);

Strategies for identifying primary groups: Evade

Factor Analysis: Treat the adjacency/similarity matrix as a set of N variables and look for latent factors that explain the variance in the data. Here is code I used in the PROSPER data:

/* run factor analysis. Note nfactors is a high value; should only take those
   w. EV > 2, but this gives us room... */
proc factor rotate=varimax min=&minev out=factset data=symmat nfactors=175
            outstat=fscores noprint;
run; quit;
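If you want to try the same idea outside SAS, here is a rough R sketch using principal components of a symmetrized adjacency matrix (this is PCA rather than proc factor's common-factor model, so treat it only as an approximation; the graph is a random stand-in):

library(igraph)
set.seed(6)
g <- sample_gnp(40, 0.10)
S <- as.matrix(as_adjacency_matrix(g))
S <- S + t(S)                                  # symmetrize, as in the weighting step above

pc       <- prcomp(S)
loadings <- pc$rotation[, 1:4]                 # each node's loading on the leading components
grp      <- apply(abs(loadings), 1, which.max) # assign nodes to their dominant component
table(grp)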

Strategies for identifying primary groups: Evade

Result:

Strategies for identifying primary groups: Evade

Result:

Each column is a person, these are the factor loadings for each person on each retained factor.

Strategies for identifying primary groups: Evade

Result:

Sociogram for a single school

Strategies for identifying primary groups: Evade

Result:

Sociogram for a single school.

Problem is that there are no necessary connectivity checks – you can get “groups” that are disconnected.

Biggest strengths are:
a) Really fast
b) Allows for overlapping groups
c) Gives you "embeddedness" scores based on factor loadings

The Crowds Algorithm

1. Identify members of network bicomponents; remove people not included.

2. Cluster the reduced network.
   - Identify the optimal number of groups (TREEWALK):
     For each level of the cluster partition tree (BFS):
       - Move up the tree from smaller to larger groups.
       - If the fit for both groups is improved by joining them, then do so.
       - If not, then identify the group at that level.
     End TREEWALK.

Do until all groups are identified (GLOBAL LOOP):

3. Evaluate node fit. Do until nodes cannot be moved:
   For each identified cluster (GRPCHECK):
     - Ensure the group is a bicomponent.
     - Calculate the effect on group a of moving node j to group a.
     - Calculate the effect on j's present group of removing j.
     - If there is a positive net gain to moving j from its own group to a, then do so.
   End.

4. Identify bridging members.
   - If removing j from group a would improve the fit of group a, AND assigning j to any other group would lower the fit for that group, then j is considered a bridge. Place all bridges in a separate class.

5. Group check.
   - Return to combining groups: if merging groups would improve the fit of all groups to be merged, then do so.
   - Evaluate bridges, to be sure that they are not bridging two groups that have now merged.

End GLOBAL LOOP.

Strategies for identifying primary groups: Hybrid

Social Sub-groups

Frank & Yasumoto: Action and Structure

They expect to find evidence of enforceable trust within social subgroups and evidence of reciprocity between such groups.

To do so, they must identify primary subgroups within the network. They do so using a density based criterion. Frank’s algorithm iteratively assigns nodes to subgroups until a parameter that maximizes in-group density is reached. Basic model is:

logit(Yij) = θ0 + θ1·samegroupij, where samegroupij = 1 if i and j are assigned to the same subgroup (g).

Seek to find an assignment of nodes to groups (g) that maximizes fit. This results in a ‘block diagonal’ adjacency matrix, where most of the ties fall along the diagonal.

Relations among the French Financial Elite (as drawn by F&Y)

Group-weighted MDS

Relations within group are weighted heavier than between to generate this picture:

Return to first question: What is a group?

• The simple notions of a complete clique are difficult to square w. real-world data.
• Density is an indicator, but subject to over-grouping (no connectivity) and star-patterns.
• Groups are likely internally differentiated – with "core" vs. "periphery" members

• Most sociological theories of groups rest on transitive closure and short distances.
• There's a sense that members are equal – a tight-knit group.
• The group should be fairly small – face-to-face scale.
• The social processes underlying the group turn on reciprocity, trust, communication, homogeneity of norms & beliefs.
• Almost all require a comparative set: in-group to out-group. It is relational, not essential.
• Cross-cutting social circles would lead us to expect overlapping groups, but in practice most methods do not do that, as it's analytically too cumbersome.

Practically, group detection is hard and most methods will give you (slightly) different results. You can compare results using a Rand statistic (proportion of pairs similarly categorized in two partitions), but for small settings these differences can matter.

[Example partitions of the same network from six routines: Fast & Greedy, Louvain, Edge Betweenness, Markov Chain (MCL), Leading Eigenvector, RNM (CROWDS).]
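A short sketch of that comparison in R: run two of the routines on the same (random) network and compute the Rand statistic between the two partitions.

library(igraph)
set.seed(7)
g  <- sample_gnp(60, 0.08)
p1 <- membership(cluster_louvain(g))
p2 <- membership(cluster_edge_betweenness(g))
compare(p1, p2, method = "rand")             # proportion of pairs classified the same way
compare(p1, p2, method = "adjusted.rand")    # chance-corrected version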

Overview
• Social life can be described (at least in part) through social roles.
• To the extent that roles can be characterized by regular interaction patterns, we can summarize roles through common relational patterns.
• Identifying these sets is the goal of block-model analyses.

Nadel: The Coherence of Role Systems
• Background ideas for White, Boorman and Breiger. Social life as an interconnected system of roles.
• Important feature: thinking of roles as connected in a role system = social structure.

White, Boorman and Breiger: Social structure from Multiple Networks I. Blockmodels of Roles and Positions

•The key article describing the theoretical and technical elements of block-modeling

Roles & Positions

Nadel: The Coherence of Role Systems

Elements of a Role:

•Rights and obligations with respect to other people or classes of people

• Roles require a 'role complement': another person with respect to whom the role-occupant acts.

Examples: Parent - child, Teacher - student, Lover - lover, Friend - Friend, Husband - Wife, etc.

Nadel (Following functional anthropologists and sociologists) defines ‘logical’ types of roles, and then examines how they can be linked together.

Nadel describes how various roles fit together to form a coherent whole. Roles are collected in people through the 'summation of roles'.

Necessary: Some roles fit together necessarily. For example, the expected interaction patterns of "son-in-law" are implied through the joint roles of "Husband" and "Spouse-Parent".

Coincidental: Some roles tend to go together empirically, but they need not (businessman & club member, for example).

Distinguishing the two is a matter of usefulness and judgement, but relates to social substitutability. The distinction reverts to how the system as a whole will be held together in the face of changes in role occupants.

Nadel: The Coherence of Role Systems

Nadel: The Coherence of Role Systems

Given that roles can be identified as ‘going together’ is there a logic that underlies their connection? Nadel uses a functional description based on ascription and achievement:

Nadel: The Coherence of Role Systems

And he gives an example of a simple role system:

Nadel’s task is to make sense of these roles, to identify how they are interconnected to form a system -- a coherent structure.

This is a difficult task to do analytically, as the eventual failure of Parsonian functionalism shows.

White et al: From logical role systems to empirical social structures

With the fall of Parsons and functionalism in the late 60s, many of the ideas about social structure and system were also tossed. White et al demonstrate how we can understand social structure as the intercalation of roles, without the a priori logical categories.

Start with some basic ideas of what a role is: An exchange of something (support, ideas, commands, etc) between actors. Thus, we might represent a family as:

Start with some basic ideas of what a role is: An exchange of something (support, ideas, commands, etc) between actors. Thus, we might see an exchange network such as:

Provides food for

Romantic Love

Bickers with

White et al: From logical role systems to empirical social structures

Start with some basic ideas of what a role is: An exchange of something (support, ideas, commands, etc) between actors. Which is a summary of a (sort of) family.

[Diagram: a husband (H), wife (W), and three children (C), linked by the three relations below.]

Provides food for

Romantic Love

Bickers with(and there are, of course, many other relations inside the family)

White et al: From logical role systems to empirical social structures

White et al: From logical role systems to empirical social structures

The key idea is that we can express a role through a relation (or set of relations) and thus a social system by the inventory of roles. If roles equate to positions in an exchange system, then we need only identify particular aspects of a position. But what aspect? Block modeling focuses on equivalence positions.

Structural Equivalence

Two actors are structurally equivalent if they have the same types of ties to the same people. That is, they have the exact same ties.

Structural Equivalence

A single relation

Structural Equivalence

Graph reduced to positions

Alternative notions of equivalence

Instead of exact same ties to exact same alters, you look for nodes with similar ties to similar types of alters

Blockmodeling: basic steps

In any positional analysis, there are 4 basic steps:

1) Identify a definition of equivalence
2) Measure the degree to which pairs of actors are equivalent
3) Develop a representation of the equivalencies
4) Assess the adequacy of the representation

1) Identify a definition of equivalence

Structural Equivalence: Two actors are equivalent if they have the same type of ties to the same people.

Automorphic Equivalence:

Actors occupy indistinguishable structural locations in the network. That is, that they are in isomorphic positions in the network.

Two graphs are isomorphic if there is some mapping of nodes to positions that equates the two. For example, all 030T triads are isomorphic.

A graph is automorphic if there are patterns internal to the graph that are equated (if the mapping goes from the set of nodes in the graph to other nodes in the graph). In general, automorphically equivalent nodes are equivalent with respect to all graph theoretic properties (i.e., degree, number of people reachable, centrality, etc.) and are structurally indistinguishable.

Key difference from structural equivalence is relaxing of the necessity of being linked to the same nodes.

1) Identify a definition of equivalence

Automorphic Equivalence:

Regular Equivalence: Regular equivalence does not require actors to have identical ties to identical actors or to be structurally indistinguishable.

Actors who are regularly equivalent have identical ties to and from equivalent actors.

If actors i and j are regularly equivalent, and actor i has a tie to/from some actor, k, then actor j must have the same kind of tie to/from some actor l, and actors k and l must be regularly equivalent.

So effectively this is a recursive definition, and not necessarily unique. There may be several ways to assign actors to clusters that satisfy this definition.

(This is related to graph colorings, regular equivalence definitions are those where nodes have neighbors of the same color).

1) Identify a definition of equivalence

Regular Equivalence:

There may be multiple regular equivalence partitions in a network, and thus we tend to want to find the maximal regular equivalence position, the one with the fewest positions.

Role or Local Equivalence: While most equivalence measures focus on position within the full network, some measures focus only on the patterns within the local tie neighborhood. These have been called 'local role' equivalence.

Note that:
• Structurally equivalent actors are automorphically equivalent.
• Automorphically equivalent actors are regularly equivalent.
• Structurally equivalent and automorphically equivalent actors are role equivalent.

In practice, we tend to ignore some of these fine distinctions, as they get blurred quickly once we have to operationalize them in real graphs. It turns out that few people are ever exactly equivalent, and thus we approximate the links between the types.

In all cases, the procedure can work over multiple relations simultaneously.

The process of identifying positions is called blockmodeling, and requires identifying a measure of similarity among nodes.

0 1 1 1 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 1 0 0 0 0 0 0 0 0
1 0 0 1 0 0 1 1 1 1 0 0 0 0
1 0 1 0 0 0 1 1 1 1 0 0 0 0
0 1 0 0 0 1 0 0 0 0 1 1 1 1
0 1 0 0 1 0 0 0 0 0 1 1 1 1
0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 0 0 0 0

Blockmodeling is the process of identifying these types of positions. A block is a section of the adjacency matrix - a “group” of people.

Here I have blocked structurally equivalent actors

. 1 1 1 0 0 0 0 0 0 0 0 0 0
1 . 0 0 1 1 0 0 0 0 0 0 0 0
1 0 . 1 0 0 1 1 1 1 0 0 0 0
1 0 1 . 0 0 1 1 1 1 0 0 0 0
0 1 0 0 . 1 0 0 0 0 1 1 1 1
0 1 0 0 1 . 0 0 0 0 1 1 1 1
0 0 1 1 0 0 . 0 0 0 0 0 0 0
0 0 1 1 0 0 0 . 0 0 0 0 0 0
0 0 1 1 0 0 0 0 . 0 0 0 0 0
0 0 1 1 0 0 0 0 0 . 0 0 0 0
0 0 0 0 1 1 0 0 0 0 . 0 0 0
0 0 0 0 1 1 0 0 0 0 0 . 0 0
0 0 0 0 1 1 0 0 0 0 0 0 . 0
0 0 0 0 1 1 0 0 0 0 0 0 0 .

  1 2 3 4 5 6
1 0 1 1 0 0 0
2 1 0 0 1 0 0
3 1 0 1 0 1 0
4 0 1 0 1 0 1
5 0 0 1 0 0 0
6 0 0 0 1 0 0

Once you block the matrix, reduce it, based on the number of ties in the cell of interest. The key values are a zero block (no ties) and a one-block (all ties present):

Structural equivalence thus generates 6 positions in the network


(The blocked adjacency matrix from above, repeated.)

  1 2 3
1 1 1 0
2 1 1 1
3 0 1 0

Once you partition the matrix, reduce it:

Regular equivalence


(here I placed a one in the image matrix if there were any ties in the ij block)

To get a block model, you have to measure the similarity between each pair. If two actors are structurally equivalent, then they will have exactly similar patterns of ties to other people. Consider the example again:

(The blocked adjacency matrix shown above.)


 C  D  Match
 1  1    1
 0  0    1
 .  1    0
 1  .    0
 0  0    1
 0  0    1
 1  1    1
 1  1    1
 1  1    1
 1  1    1
 0  0    1
 0  0    1
 0  0    1
 0  0    1
Sum: 12

C and D match on 12 other people

If the model is going to be based on asymmetric or multiple relations, you simply stack the various relations:

[Family diagram again: H, W, and three C nodes, linked by "Provides food for", "Romantic Love", and "Bickers with".]

Romance
0 1 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

Feeds
0 0 1 1 1
0 0 1 1 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

Bicker
0 0 0 0 0
0 0 0 0 0
0 0 0 1 1
0 0 1 0 1
0 0 1 1 0

Stacked:
0 1 0 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 1 1 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 1 0 0 0
1 1 0 0 0
1 1 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 1 1
0 0 1 0 1
0 0 1 1 0

 0  8  7  7  5  5 11 11 11 11  7  7  7  7
 8  0  5  5  7  7  7  7  7  7 11 11 11 11
 7  5  0 12  0  0  8  8  8  8  4  4  4  4
 7  5 12  0  0  0  8  8  8  8  4  4  4  4
 5  7  0  0  0 12  4  4  4  4  8  8  8  8
 5  7  0  0 12  0  4  4  4  4  8  8  8  8
11  7  8  8  4  4  0 12 12 12  8  8  8  8
11  7  8  8  4  4 12  0 12 12  8  8  8  8
11  7  8  8  4  4 12 12  0 12  8  8  8  8
11  7  8  8  4  4 12 12 12  0  8  8  8  8
 7 11  4  4  8  8  8  8  8  8  0 12 12 12
 7 11  4  4  8  8  8  8  8  8 12  0 12 12
 7 11  4  4  8  8  8  8  8  8 12 12  0 12
 7 11  4  4  8  8  8  8  8  8 12 12 12  0

For the entire matrix, we get:

(number of agreements for each ij pair)

1.00 -0.20 0.08 0.08 -0.19 -0.19 0.77 0.77 0.77 0.77 -0.26 -0.26 -0.26 -0.26-0.20 1.00 -0.19 -0.19 0.08 0.08 -0.26 -0.26 -0.26 -0.26 0.77 0.77 0.77 0.77 0.08 -0.19 1.00 1.00 -1.00 -1.00 0.36 0.36 0.36 0.36 -0.45 -0.45 -0.45 -0.45 0.08 -0.19 1.00 1.00 -1.00 -1.00 0.36 0.36 0.36 0.36 -0.45 -0.45 -0.45 -0.45-0.19 0.08 -1.00 -1.00 1.00 1.00 -0.45 -0.45 -0.45 -0.45 0.36 0.36 0.36 0.36-0.19 0.08 -1.00 -1.00 1.00 1.00 -0.45 -0.45 -0.45 -0.45 0.36 0.36 0.36 0.36 0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20 0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20 0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20 0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00

The metric used to measure structural equivalence by White, Boorman and Breiger is the correlation between each node's set of ties. For the example, this would be:

Another common metric is the Euclidean distance between pairs of actors, which you then use in a standard cluster analysis.

The initial method for finding structurally equivalent positions was CONCOR, the CONvergence of iterated CORrelations.

1.00 -.77 0.55 0.55 -.57 -.57 0.95 0.95 0.95 0.95 -.75 -.75 -.75 -.75-.77 1.00 -.57 -.57 0.55 0.55 -.75 -.75 -.75 -.75 0.95 0.95 0.95 0.950.55 -.57 1.00 1.00 -1.0 -1.0 0.73 0.73 0.73 0.73 -.75 -.75 -.75 -.750.55 -.57 1.00 1.00 -1.0 -1.0 0.73 0.73 0.73 0.73 -.75 -.75 -.75 -.75-.57 0.55 -1.0 -1.0 1.00 1.00 -.75 -.75 -.75 -.75 0.73 0.73 0.73 0.73-.57 0.55 -1.0 -1.0 1.00 1.00 -.75 -.75 -.75 -.75 0.73 0.73 0.73 0.730.95 -.75 0.73 0.73 -.75 -.75 1.00 1.00 1.00 1.00 -.77 -.77 -.77 -.770.95 -.75 0.73 0.73 -.75 -.75 1.00 1.00 1.00 1.00 -.77 -.77 -.77 -.770.95 -.75 0.73 0.73 -.75 -.75 1.00 1.00 1.00 1.00 -.77 -.77 -.77 -.770.95 -.75 0.73 0.73 -.75 -.75 1.00 1.00 1.00 1.00 -.77 -.77 -.77 -.77-.75 0.95 -.75 -.75 0.73 0.73 -.77 -.77 -.77 -.77 1.00 1.00 1.00 1.00-.75 0.95 -.75 -.75 0.73 0.73 -.77 -.77 -.77 -.77 1.00 1.00 1.00 1.00-.75 0.95 -.75 -.75 0.73 0.73 -.77 -.77 -.77 -.77 1.00 1.00 1.00 1.00-.75 0.95 -.75 -.75 0.73 0.73 -.77 -.77 -.77 -.77 1.00 1.00 1.00 1.00

Concor iteration 1:

Concor iteration 2:1.00 -.99 0.94 0.94 -.94 -.94 0.99 0.99 0.99 0.99 -.99 -.99 -.99 -.99-.99 1.00 -.94 -.94 0.94 0.94 -.99 -.99 -.99 -.99 0.99 0.99 0.99 0.990.94 -.94 1.00 1.00 -1.0 -1.0 0.97 0.97 0.97 0.97 -.97 -.97 -.97 -.970.94 -.94 1.00 1.00 -1.0 -1.0 0.97 0.97 0.97 0.97 -.97 -.97 -.97 -.97-.94 0.94 -1.0 -1.0 1.00 1.00 -.97 -.97 -.97 -.97 0.97 0.97 0.97 0.97-.94 0.94 -1.0 -1.0 1.00 1.00 -.97 -.97 -.97 -.97 0.97 0.97 0.97 0.970.99 -.99 0.97 0.97 -.97 -.97 1.00 1.00 1.00 1.00 -.99 -.99 -.99 -.990.99 -.99 0.97 0.97 -.97 -.97 1.00 1.00 1.00 1.00 -.99 -.99 -.99 -.990.99 -.99 0.97 0.97 -.97 -.97 1.00 1.00 1.00 1.00 -.99 -.99 -.99 -.990.99 -.99 0.97 0.97 -.97 -.97 1.00 1.00 1.00 1.00 -.99 -.99 -.99 -.99-.99 0.99 -.97 -.97 0.97 0.97 -.99 -.99 -.99 -.99 1.00 1.00 1.00 1.00-.99 0.99 -.97 -.97 0.97 0.97 -.99 -.99 -.99 -.99 1.00 1.00 1.00 1.00-.99 0.99 -.97 -.97 0.97 0.97 -.99 -.99 -.99 -.99 1.00 1.00 1.00 1.00-.99 0.99 -.97 -.97 0.97 0.97 -.99 -.99 -.99 -.99 1.00 1.00 1.00 1.00

The initial method for finding structurally equivalent positions was CONCOR, the CONvergence of iterated CORrelations.

1.00 -1.0 1.00 1.00 -1.0 -1.0 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0-1.0 1.00 -1.0 -1.0 1.00 1.00 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.001.00 -1.0 1.00 1.00 -1.0 -1.0 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.01.00 -1.0 1.00 1.00 -1.0 -1.0 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0-1.0 1.00 -1.0 -1.0 1.00 1.00 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00-1.0 1.00 -1.0 -1.0 1.00 1.00 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.001.00 -1.0 1.00 1.00 -1.0 -1.0 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.01.00 -1.0 1.00 1.00 -1.0 -1.0 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.01.00 -1.0 1.00 1.00 -1.0 -1.0 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.01.00 -1.0 1.00 1.00 -1.0 -1.0 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0-1.0 1.00 -1.0 -1.0 1.00 1.00 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00-1.0 1.00 -1.0 -1.0 1.00 1.00 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00-1.0 1.00 -1.0 -1.0 1.00 1.00 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00-1.0 1.00 -1.0 -1.0 1.00 1.00 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00

Concor iteration 3:

The initial method for finding structurally equivalent positions was CONCOR, the CONvergence of iterated CORrelations.

Concor iteration 3:1.00 1.00 1.00 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.01.00 1.00 1.00 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.01.00 1.00 1.00 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.01.00 1.00 1.00 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.01.00 1.00 1.00 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.01.00 1.00 1.00 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.01.00 1.00 1.00 1.00 1.00 1.00 1.00 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0-1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00 1.00 1.00 1.00-1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00 1.00 1.00 1.00-1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00 1.00 1.00 1.00-1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00 1.00 1.00 1.00-1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00 1.00 1.00 1.00-1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00 1.00 1.00 1.00-1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 1.00 1.00 1.00 1.00 1.00 1.00 1.00

Node order: 1 3 4 7 8 9 10 | 2 5 6 11 12 13 14

The initial method for finding structurally equivalent positions was CONCOR, the CONvergence of iterated CORrelations.

Repeat the process on the resulting 1-blocks until you have reached structurally equivalent blocks.

Because CONCOR splits every sub-group into two groups, you get a partition tree that looks something like this:
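The core of CONCOR is easy to sketch in R (a toy version for a single two-way split, assuming a symmetric 0/1 adjacency matrix with no constant columns):

# Iterate correlations of the node-by-node tie profiles until they converge to +/- 1,
# then split on the sign pattern.
concor_split <- function(A, max_iter = 100) {
  C <- cor(A)                         # correlate each pair of columns (tie profiles)
  for (i in seq_len(max_iter)) {
    C_new <- cor(C)                   # correlate the correlation matrix, repeatedly
    if (max(abs(C_new - C)) < 1e-10) break
    C <- C_new
  }
  nodes <- if (is.null(colnames(A))) seq_len(ncol(A)) else colnames(A)
  split(nodes, ifelse(C[1, ] > 0, 1, 2))
}
# Re-running the function inside each block gives the nested partition tree described above.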

CONCOR example:

Consider a simple senate voting network:

The network is dense (every cell has some score) and dynamic (the pattern changes over time).

Color by structural equivalence…

The network is dense (every cell has some score) and dynamic (the pattern changes over time).

Adjust position to collapse SE positions.

CONCOR example:

Consider a simple senate voting network:

The network is dense (every cell has some score) and dynamic (the pattern changes over time).

And then adjust color, line width, etc. for clarity.

While we’ve gone some distance with identifying relevant information from the mass, how do we account for time?

CONCOR example:

Consider a simple senate voting network:

CONCOR example:

Repeat at each wave, linking positions over time

CONCOR example:

Automorphic and Regular equivalence are more difficult to find, and require iteratively searching over possible class assignments for sets that have the same graph theoretic patterns. Usually start with a set of nodes defined as similar on a number of network measures, then look within these classes for automorphic equivalence classes.

The classic reference is REGE (White & Reitz 1985), which recursively defines the degree of equivalence between pairs and then adjusts for as many iterations as you specify.

A theoretically appealing method for finding structures that are very similar to regular equivalence, role equivalence, uses the triad census. Each node is involved in (n-1)(n-2)/2 triads, and occupies a particular position in each of these triads. These positions are summarized in the following figure:

Network Sub-Structure: Triads

[The 16 directed triad types, grouped by number of arcs:
(0) 003
(1) 012
(2) 102, 021D, 021U, 021C
(3) 111D, 111U, 030T, 030C
(4) 201, 120D, 120U, 120C
(5) 210
(6) 300
Triads are classed as intransitive, transitive, or mixed.]

An Example of the triad census

Type            Number of triads
---------------------------------------
 1 - 003              21
---------------------------------------
 2 - 012              26
 3 - 102              11
 4 - 021D              1
 5 - 021U              5
 6 - 021C              3
 7 - 111D              2
 8 - 111U              5
 9 - 030T              3
10 - 030C              1
11 - 201               1
12 - 120D              1
13 - 120U              1
14 - 120C              1
15 - 210               1
16 - 300               1
---------------------------------------
Sum (2 - 16):         63
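For the ordinary (non-positional) census, igraph has a built-in routine; a quick sketch on a random directed graph:

library(igraph)
set.seed(8)
g <- sample_gnp(10, 0.25, directed = TRUE)
triad_census(g)   # counts of the 16 triad types, in the order 003, 012, 102, 021D, 021U,
                  # 021C, 111D, 111U, 030T, 030C, 201, 120D, 120U, 120C, 210, 300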

Triadic Position Census: 36 positions within the 16 directed triads (the suffix indicates which position in the triad is occupied):

003; 012_S, 012_E, 012_I; 102_D, 102_I; 021D_S, 021D_E; 021U_S, 021U_E; 021C_S, 021C_B, 021C_E; 111D_S, 111D_B, 111D_E; 111U_S, 111U_B, 111U_E; 030T_S, 030T_B, 030T_E; 030C; 201_S, 201_B; 120D_S, 120D_E; 120U_S, 120U_E; 120C_S, 120C_B, 120C_E; 210_S, 210_B, 210_E; 300

Triadic Position Census: 40 Positions within all mutual ties but two types of relations

36 36 10 10 10 10 43 43 43 43 43 43 43 43 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 020 20 41 41 41 41 14 14 14 14 14 14 14 14 9 9 11 11 11 11 12 12 12 12 12 12 12 12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 010 10 1 1 1 1 8 8 8 8 8 8 8 8 2 2 10 10 10 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 5 5 5 5 1 1 1 1 1 1 1 1

Triad position vectors for the example network, resulting in 3 positions:

1.00 1.00 0.64 0.64 0.64 0.64 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.981.00 1.00 0.64 0.64 0.64 0.64 0.98 0.98 0.98 0.98 0.98 0.98 0.98 0.980.64 0.64 1.00 1.00 1.00 1.00 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.500.64 0.64 1.00 1.00 1.00 1.00 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.500.64 0.64 1.00 1.00 1.00 1.00 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.500.64 0.64 1.00 1.00 1.00 1.00 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.500.98 0.98 0.50 0.50 0.50 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.000.98 0.98 0.50 0.50 0.50 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.000.98 0.98 0.50 0.50 0.50 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.000.98 0.98 0.50 0.50 0.50 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.000.98 0.98 0.50 0.50 0.50 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.000.98 0.98 0.50 0.50 0.50 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.000.98 0.98 0.50 0.50 0.50 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.000.98 0.98 0.50 0.50 0.50 0.50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

Correlating each person’s triad position vector with each other persons results in the following table, which clearly shows the positions that are equivalent:

Jefferson High School Sunshine High School

School provides a good boundary for social relations

School does not provide a good boundary for social relations

Complete Network Analysis. Network Connections: Role Positions

Jefferson High School Sunshine High School

Image networks. Width of tie is proportional to the ratio of cell density to mean cell density.

[Block density percentages from the image networks: 34%, 32%, 33%, 4%, 43%, 52%.]

Complete Network Analysis. Network Connections: Role Positions

Once you have decided on a number of blocks, you need to determine what counts as a 'one' block or a 'zero' block. Usually this is some function of the density of the resulting block.

General rules:
- "Fat fit": only put a one in blocks with all ones in the adjacency matrix.
- "Lean fit": put a zero if all the cells are zero, else put a one.
- "Density fit": put a one if the average value of the cell is above a certain cutoff.

White, Boorman and Breiger used a ‘lean fit’ (zeroblock) rule for the examples in their paper:

An example: White et al., figure 1. Biomedical Specialty data:

White et al., figure 3. Biomedical Specialty data: the key to the structure lies in the zero blocks.

Recent models

Recent work has generalized blockmodels in two directions:

Specific structural hypothesesexample: Core-periphery models or Structural Hole ideas

Generalized blockmodeling based on particular relationship types & patterns. Pat Doreian's recent work with the PAJEK folks.

Connectivity sets. Identifying sets of nodes with some common pattern of connectivity. This is a merge/mingle of community detection & positions. Moody & White would be an example.

To identify a core-periphery structure, we compare an observed block structure to an ideal block structure.

[Ideal blocked matrix: all 1s in the core-core block and in the core-periphery / periphery-core blocks; 0s in the periphery-periphery block.]

An ideal core-periphery network:

Borgatti, S. P. and Everett, M. G. (1999). "Models of Core/Periphery Structures." Social Networks 21: 375-395.

Recent models: Core-Periphery

To identify a core-periphery structure, we compare an observed block structure to an ideal block structure.

(observed blocked network)

Recent models: Core-Periphery

(observed blocked network)(Ideal CP blocked network)

To identify a core-periphery structure, we compare an observed block structure to an ideal block structure.

Recent models: Core-Periphery

(observed blocked network)(Ideal CP blocked network)

A core periphery structure exists to the extent that the correlation between the ideal structure and the observed structure is high. We can search for cores by simply proposing a partition (many times) and then selecting the best fitting partition. But that’s silly-slow!

To identify a core-periphery structure, we compare an observed block structure to an ideal block structure.

Recent models: Core-Periphery

A continuous version of “coreness” can be had by generalizing the ideal image seen above. Instead of just 0/1, pairs of “high core” nodes have a very strong tie connecting them, and core-periphery nodes have a very low score.

Coreness can thus be defined as a type of centrality, but one that assumes a particular underlying structure to the network. Nodes with high coreness are more likely to be at the center of a core-periphery structure.

As it turns out, coreness is essentially eigenvector centrality, and UCINET sorts nodes by eigenvector centrality and builds the "core" until the correlation between the ideal and observed structures drops.
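A hedged sketch of that logic in R (not UCINET's routine): rank nodes by eigenvector centrality, grow the candidate core, and keep the core size that correlates best with an ideal core-periphery image.

library(igraph)
set.seed(9)
g  <- sample_gnp(30, 0.15)
ev <- eigen_centrality(g)$vector
A  <- as.matrix(as_adjacency_matrix(g)); diag(A) <- 0

cp_fit <- function(core, A) {
  in_core <- seq_len(nrow(A)) %in% core
  ideal   <- outer(in_core, in_core, "|") * 1      # 0 only in the periphery-periphery block
  cor(A[upper.tri(A)], ideal[upper.tri(ideal)])
}

ord  <- order(ev, decreasing = TRUE)               # most core-like first
fits <- sapply(2:(vcount(g) - 2), function(k) cp_fit(ord[1:k], A))
core <- ord[1:(which.max(fits) + 1)]               # best-fitting core
core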

To identify a core-periphery structure, we compare an observed block structure to an ideal block structure.

Recent models: Core-Periphery

The recent work on generalization focuses on the patterns that determine a block.

Instead of focusing on just the density of a block, you can identify a block as any set that has a particular pattern of ties to any other set.

This work starts from the observation that types of equivalence limit the observed types of blocks. So, for example, regularly equivalent blocks must be either empty, complete, or 1-covered. The “direct” approach is thus to search for these sorts of coverings.

Recent models: Generalized Block Models

From Carrington, Scott & Wasserman. Models & Methods in Social Network Analysis

“A friend of a friend is a friend”

“The enemy of an enemy is a friend”


F x F = F

E x E = F

We can generalize the balance rule to multitudes of “compound relations”

Use matrices for primary relations and matrix multiplication for compounds

Compound Relations.

Compound Relations.

One of the most powerful tools in role analysis involves looking at role systems through compound relations.

A compound relation is formed by combining relations in single dimensions. The best example of compound relations comes from kinship.

Sibling
0 1 0 0 0
1 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 0 0 0

x

Child of
0 0 1 1 0
0 0 0 0 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

=

Nephew/Niece
0 0 0 0 1
0 0 1 1 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

The compound SC = Sibling x Child of = Nephew/Niece.
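The same compound can be built with a Boolean matrix product; a sketch using the two matrices above:

S <- matrix(c(0,1,0,0,0,
              1,0,0,0,0,
              0,0,0,1,0,
              0,0,1,0,0,
              0,0,0,0,0), 5, 5, byrow = TRUE)   # "sibling of"
C <- matrix(c(0,0,1,1,0,
              0,0,0,0,1,
              0,0,0,0,0,
              0,0,0,0,0,
              0,0,0,0,0), 5, 5, byrow = TRUE)   # "child of"

SC <- (S %*% C) > 0    # Boolean product: the compound relation
SC * 1                 # matches the nephew/niece matrix above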

An example of compound relations can be found in W&F. This role table catalogues the compounds for two relations “Is boss of” and “Is on the same level as”

Consider a system with two sorts of relations. Here, one is hierarchical and the other defines “within class”.

We can build a role table with Boolean multiplication of the relations.

An example of compound relations can be found in W&F. This role table catalogues the compounds for two relations “Is boss of” and “Is on the same level as”

“Boss”

X

“Boss”

“boss of my boss is my boss”

An example of compound relations can be found in W&F. This role table catalogues the compounds for two relations “Is boss of” and “Is on the same level as”

“On the same level”

X

“On the same level”

“On the same level”

An example of compound relations can be found in W&F. This role table catalogues the compounds for two relations “Is boss of” and “Is on the same level as”

Kinship networks form a foundation to social structures.

In the west, we have 2 primary relations (Parent of, married to) and one partitioning attribute (male or female). So:

Parent of a Parent = GrandparentFather’s Father = Paternal GrandfatherMother’s Father = Maternal GrandfatherWife’s Mother’s Son = Brother-in-lawMother’s Mother’s son’s son = Cousin (mom’s side)

Quality: The entire western kinship structure can be decomposed into a set of equations consisting of only Parent, Child, and Gender.

Quantity: Given a fertility rate of 2 kids, the two-step* kinship neighborhood would have 26 people; if the fertility rate were 3 the same count goes up to 46.

*2-steps includes aunt’s & uncles, but not their spouses.

Compound Relations.

The scientist's second rule has to be to look for regularity and exploit that for theory. Consider as a good example Harrison White's kinship model:

Compound Relations.

Ego connects to any of these

Compound Relations.

The scientist's second rule has to be to look for regularity and exploit that for theory. Consider as a good example Harrison White's kinship model:

Kinship networks form a foundation to social structures.

In China, we have the same 2 primary relations:

Parent ofMarried to

But 3 partitioning attributes:
- Gender
- Relative Age
- Relational Order (1st wife, 2nd wife, etc.)

This means that compounds we name as equivalent (cousin, uncle) are named differently.

But, while westerners largely ignore gender for anything other than final designation (aunt/uncle, niece/nephew), Chinese kinship terms are differentiated by parent’s line (maternal aunt, maternal uncle, etc.).

We know this designation, but use it rarely.

Compound Relations.

*2-steps includes aunt’s & uncles, but not their spouses.

Compound Relations.

Uncles

Compound Relations.

Compound Relations.

The Chinese extended family network – for “normal” relations westerners would recognize – includes 74 unique kinship terms.

The same set in the west has 28 different terms.

Each of these terms carries a different expected gift exchange system at holidays and mourning attire at death.

Compound Relations.

How has this system changed? Consider the effects of the 1-child policy:

Source: Population research Bureau

With a fertility of 6, 2-step kinship nets would have 166 people; with 2 it's 26. A full implementation of 1-child removes the "relative age" operator, erasing every kinship term dependent on "older" or "younger," and means that families play either in a maternal or a paternal line, but not both.

Compound Relations.

Using Compound Relations theoretically:

Other work on this general topic:


Methods: How to?

The basic block model formation can be done in multiple ways:

1. Apply any of our group-finding algorithms to a role-based similarity matrix.
   - Here you're simply converting the conditions for equivalence to adjacency and solving for modularity. Requires either a community detection algorithm that uses valued ties or a binarization of the similarity matrix.

2. Cluster node-level structural indices (gets at regular/automorphic equivalence; a sketch follows below).
   - This is the "evade" correlate to SE from community detection: cluster on a BUNCH of easy-to-calculate node-level network statistics, and this gives you nodes that are equivalent (with respect to the measures you used!).
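A sketch of option 2 in R, using a handful of easy igraph statistics on a random test graph (the particular measures and the 4-cluster cut are arbitrary choices for illustration):

library(igraph)
set.seed(10)
g <- sample_smallworld(1, 40, 3, 0.1)

idx <- data.frame(
  degree      = degree(g),
  betweenness = betweenness(g),
  closeness   = closeness(g),
  kcore       = coreness(g),
  clustering  = transitivity(g, type = "local", isolates = "zero")
)

# Nodes in the same cluster are "equivalent" only with respect to these measures.
pos <- cutree(hclust(dist(scale(idx)), method = "ward.D2"), k = 4)
table(pos)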

Methods: How to?

The basic block model formation can be done in multiple ways. Role-specific algorithms:

Methods: How to?

The basic block model formation can be done in multiple ways. Role-specific algorithms:

Methods: How to?

Triad Structural Equivalence in SAS

Methods: How to?

Triad Structural Equivalence in SAS

Methods: How to?

Triad Structural Equivalence in SAS

Addendum A new statistic for determining the number of groups in a network.

Proc cluster gives you a statistic for the basic “fit” of a cluster solution.

This statistic varies depending on the method used, but is usually something like an R2. Consider this dendrogram:

Addendum A new statistic for determining the number of groups in a network.

Proc cluster gives you a statistic for the basic “fit” of a cluster solution.

This statistic varies depending on the method used, but is usually something like an R2. Consider this dendrogram:

The SPRSQ and the RSQ are your fit statistics.

Addendum A new statistic for determining the number of groups in a network.

[Plot: SPRSQ and RSQ fit statistics (0 to 0.9) against the number of clusters, from 15 down to 1.]

A sharp change in the statistic is your best indicator.

Addendum A new statistic for determining the number of groups in a network.

Modularity:

M = Σs [ ls/L - (ds/(2L))² ],  summing over the s = 1, ..., Nm groups

where M is the modularity score, s indexes each group ("module"), ls is the number of lines in group s, L is the total number of lines, ds is the sum of the degrees of the nodes in s, and Nm is the number of groups.

Role Positions

Identifying positions: Could use the Modularity score at each tree cut…

Role Positions

Example positions identified in a single school network (role 7 is a "leading crowd" in the simplest sum-of-in-degree sense).

Repeating this process across all networks, generates a population of within-school position profiles.

We then pool & cluster these position profiles in a “2nd-order clustering” to identify a set of roles that can be compared across the populations.

We settle on a 5-position solution:

Role Positions

[Figure: counts of students / role groups falling into each of the five positions: Outsiders, Aloofs, Friends, Hangers-On, Central Core.]

Core

Role Positions

Uninvolved outsiders (35% of students, 28% of role groups)

Largely uninvolved: they nominate few and are rarely nominated by others. Includes isolated dyads & small groups; the mixing matrices show that their few friends tend to be others in the same position.

Role Positions

Non-Reciprocated (17% of students, 15% of role groups)

Makes nominations, but rarely reciprocated and has low in-degree, targeting highly central nodes with nominations. “Hangers on” position.

Role Positions

Basically average – positive scores largely because the isolates have been removed – liked by some, like others.

Everyday kids: good friends (21% of students, 29% of role groups)

Role Positions

“Popular Aloof” (9% of students, 9% of role groups)

High in-degree but low out-degree, but the few they do nominate tend to reciprocate.

Role Positions

Central Core (17.5% of students, 17.8% of role groups)

Highly reciprocated ties, active, very central; both high in-degree and reciprocation rates.

Role Positions

How stable is occupancy of a school role?

Role Positions

How stable is occupancy of a school role?

Recommended