Measurement Sensitivity

A reasonable approach to assessing the effect of measurement error on the ties in a network is to ask how the network measures would change if the ties differed from those observed. This question can be answered simply with Monte Carlo simulations on the observed network. Thus, the procedure I propose is to:

• Generate a probability matrix from the set of observed ties,
• Generate many realizations of the network based on these underlying probabilities, and
• Compare the distribution of generated statistics to those observed in the data.

• How do we set pij?
  • Range based on observed features (sensitivity analysis)
  • Outcome of a model based on observed patterns (ERGM)
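The three-step procedure can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function names, the toy network, and the values of `p_present` and `p_absent` (the probability that an observed tie, or an unobserved tie, appears in a simulated network) are all assumptions chosen for the example.

```python
import numpy as np

def simulate_statistic(adj, p_present, p_absent, stat, n_draws=1000, seed=0):
    """Draw networks around an observed adjacency matrix and return the
    simulated distribution of a network statistic."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    # Step 1: probability matrix p_ij built from the observed ties.
    p = np.where(adj == 1, p_present, p_absent).astype(float)
    np.fill_diagonal(p, 0.0)  # no self-ties
    # Step 2: many realizations of the network from those probabilities.
    draws = np.empty(n_draws)
    for k in range(n_draws):
        sim = (rng.random((n, n)) < p).astype(int)
        draws[k] = stat(sim)
    return draws

def reciprocity(a):
    """Share of directed ties that are reciprocated."""
    ties = a.sum()
    return (a * a.T).sum() / ties if ties > 0 else 0.0

# Toy directed network (hypothetical data).
observed = np.array([[0, 1, 1, 0],
                     [1, 0, 0, 1],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]])

# Step 3: compare the simulated distribution to the observed value.
sims = simulate_statistic(observed, p_present=0.9, p_absent=0.05, stat=reciprocity)
print("observed reciprocity:", reciprocity(observed))
print("simulated 2.5%, 50%, 97.5%:", np.percentile(sims, [2.5, 50, 97.5]))
```

Varying `p_present` and `p_absent` over a plausible range gives the sensitivity-analysis version; replacing the fixed values with fitted tie probabilities gives the model-based version.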

Measurement Sensitivity

As an example, consider the problem of defining “friendship” ties in high schools.

Should we count nominations that are not reciprocated?

Measurement Sensitivity

[Figure panels: All ties | Reciprocated ties]

Statistical Analysis of Social Networks

Comparing multiple networks: QAP

The substantive question is how one set of relations (or dyadic attributes) relates to another. For example:

• Do marriage ties correlate with business ties in the Medici family network?
• Are friendship relations correlated with joint membership in a club?

(review)
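As a reminder of the logic, QAP correlates the off-diagonal cells of two matrices and builds a reference distribution by permuting the rows and columns of one matrix together. The sketch below is a minimal Python illustration under assumed inputs: the function name `qap_correlation` and the randomly generated matrices are hypothetical, not data from the examples above.

```python
import numpy as np

def qap_correlation(x, y, n_perm=2000, seed=0):
    """QAP test: correlation between two square matrices, with a
    permutation p-value from relabeling the nodes of x."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    off = ~np.eye(n, dtype=bool)                 # ignore the diagonal
    obs = np.corrcoef(x[off], y[off])[0, 1]
    perm_r = np.empty(n_perm)
    for k in range(n_perm):
        p = rng.permutation(n)
        xp = x[np.ix_(p, p)]                     # permute rows and columns together
        perm_r[k] = np.corrcoef(xp[off], y[off])[0, 1]
    p_value = np.mean(np.abs(perm_r) >= abs(obs))
    return obs, p_value

# Hypothetical data: y is a noisy copy of x, so the two relations overlap.
rng = np.random.default_rng(1)
x = (rng.random((12, 12)) < 0.3).astype(int)
np.fill_diagonal(x, 0)
flip = rng.random((12, 12)) < 0.1
y = np.where(flip, 1 - x, x)
np.fill_diagonal(y, 0)

r, p = qap_correlation(x, y)
print("observed correlation:", round(float(r), 3), "permutation p-value:", float(p))
```

Permuting rows and columns jointly keeps the structure of each network intact while breaking the node-to-node alignment between them, which is why the test respects dyadic dependence.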

Modeling Social Networks parametrically: ERGM approaches

The earliest approaches are based on simple random graph theory, but there’s been a flurry of activity in the last 10 years or so.

Key historical references:
- Holland and Leinhardt (1981) JASA
- Frank and Strauss (1986) JASA
- Wasserman and Faust (1994), Chap. 15 & 16
- Wasserman and Pattison (1996)

Good practical overview: http://www.jstatsoft.org/v24
Great tutorial: http://statnet.csde.washington.edu/workshops/SUNBELT/EUSN/ergm/ergm_tutorial.html (last year’s Sunbelt)
Or: https://statnet.csde.washington.edu/trac/wiki/Sunbelt2014 (lots of how-to slides)

Modeling Social Networks parametrically:ERGM approaches

The “p1” model of Holland and Leinhardt is the classic foundation – the basic idea is that you can generate a statistical model of the network by predicting the counts of types of ties (asym, null, sym). They formulate a log-linear model for these counts; but the model is equivalent to a logit model on the dyads:

$$\operatorname{logit} P(X_{ij} = 1) = \theta + \alpha_i + \beta_j + \rho X_{ji}$$

Note the subscripts! This implies a distinct parameter for every node i and j in the model, plus one for reciprocity.

Modeling Social Networks parametrically:ERGM approaches

Results from SAS version on PROSPER datasets

Modeling Social Networks parametrically:ERGM approaches

Once you know the basic model format, you can imagine other specifications:

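(orig)
$$\operatorname{logit} P(X_{ij} = 1) = \theta + \alpha_i + \beta_j + \rho X_{ji}$$

(differential reciprocity)
$$\operatorname{logit} P(X_{ij} = 1) = \theta + \alpha_i + \beta_j + \rho_g X_{ji}$$

(node chars)
[equation not recoverable from the source]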

Key is to ensure that the specification doesn’t imply a linear dependency of terms.

Model fit is hard to judge – newer work shows that the se’s are “approximate” ;-)

$$P(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)}$$

Where:
$\theta$ is a vector of parameters (like regression coefficients),
$z(x)$ is a vector of network statistics, conditioning the graph,
$\kappa(\theta)$ is a normalizing constant, to ensure the probabilities sum to 1.

Modeling Social Networks parametrically: ERGM approaches

The simplest graph is a Bernoulli random graph, where each Xij is independent:

$$P(X = x) = \frac{\exp\{\sum_{i,j} \theta_{ij} x_{ij}\}}{\kappa(\theta)}$$

Where:

$\theta_{ij} = \operatorname{logit}[P(X_{ij} = 1)]$

$\kappa(\theta) = \prod_{i,j} [1 + \exp(\theta_{ij})]$

Note this is one of the few cases where $\kappa(\theta)$ can be written.
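To see why the normalizing constant has a closed form in the Bernoulli case, one can compare a brute-force sum over every possible graph with the product over dyads of 1 + exp(θij). The Python sketch below does this for 3 nodes; the θij values are arbitrary draws used only for illustration.

```python
import itertools
import numpy as np

n = 3
dyads = [(i, j) for i in range(n) for j in range(n) if i != j]

# Arbitrary illustrative tie parameters theta_ij (an assumption for the example).
rng = np.random.default_rng(0)
theta = {d: float(rng.normal()) for d in dyads}

# Brute force: sum exp{sum_ij theta_ij * x_ij} over all 2^6 directed graphs.
kappa_sum = 0.0
for x in itertools.product([0, 1], repeat=len(dyads)):
    kappa_sum += np.exp(sum(theta[d] * xij for d, xij in zip(dyads, x)))

# Closed form: product over dyads of (1 + exp(theta_ij)).
kappa_prod = float(np.prod([1 + np.exp(theta[d]) for d in dyads]))

print(kappa_sum, kappa_prod)
```

The brute-force sum has 2^(n(n-1)) terms, which is why the constant is intractable for models whose statistics do not factor across dyads.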

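Modeling Social Networks parametrically: ERGM approaches

Typically, we add a homogeneity condition, so that all isomorphic graphs are equally likely. The homogeneous Bernoulli graph model:

$$P(X = x) = \frac{\exp\{\theta \sum_{i,j} x_{ij}\}}{\kappa(\theta)}$$

Where:

$\kappa(\theta) = [1 + \exp(\theta)]^{g}$, with the exponent g counting the possible ties.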

Modeling Social Networks parametrically:ERGM approaches

If we want to condition on anything much more complicated than density, the normalizing constant ends up being a problem. We need a way to express the probability of the graph that doesn’t depend on that constant. First some terms:

$X_{ij}^{+}$: sociomatrix with element ij forced to 1

$X_{ij}^{-}$: sociomatrix with element ij forced to 0

$X_{ij}^{c}$: sociomatrix with no tie between i and j

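Modeling Social Networks parametrically: ERGM approaches

$$\exp(\omega_{ij}) = \frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})}$$

$$\frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})} = \frac{\exp\{\theta' z(x_{ij}^{+})\}}{\exp\{\theta' z(x_{ij}^{-})\}} = \exp\{\theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})]\}$$

$$\omega_{ij} = \log\left[\frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})}\right] = \theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})]$$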

Modeling Social Networks parametrically: ERGM approaches

$$\omega_{ij} = \log\left[\frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})}\right] = \theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})]$$

Note that we can now model the conditional probability of the graph, as a function of a set of difference statistics, without reference to the normalizing constant. The model, then, simply reduces to a logit model on the dyads.
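The difference statistics are easy to compute directly: force the tie to 1, force it to 0, and subtract the two statistic vectors. The Python sketch below does this for two statistics, edges and mutual dyads; the function names, the toy graph, and the parameter values are assumptions made for illustration.

```python
import numpy as np

def stats(x):
    """z(x): [number of edges, number of mutual dyads] of a directed graph."""
    edges = int(x.sum())
    mutual = int((x * x.T).sum()) // 2
    return np.array([edges, mutual])

def change_stats(x, i, j):
    """z(x with tie ij forced to 1) minus z(x with tie ij forced to 0)."""
    plus, minus = x.copy(), x.copy()
    plus[i, j] = 1
    minus[i, j] = 0
    return stats(plus) - stats(minus)

# Toy directed 3-cycle: 0->1, 1->2, 2->0 (hypothetical data).
x = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

theta = np.array([-1.0, 2.0])        # illustrative [edges, mutual] parameters

d = change_stats(x, 1, 0)            # the tie 1->0 would reciprocate 0->1
log_odds = float(theta @ d)          # conditional log-odds of that tie
print(d, log_odds)
```

Stacking one row of change statistics per dyad gives the design matrix for the logit model on the dyads.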

Modeling Social Networks parametrically: ERGM approaches

$$\omega_{ij} = \log\left[\frac{P(X_{ij} = 1 \mid X_{ij}^{c})}{P(X_{ij} = 0 \mid X_{ij}^{c})}\right] = \theta' [z(x_{ij}^{+}) - z(x_{ij}^{-})]$$

Consider the simplest possible model: the Bernoulli random graph model, which says the only feature of interest is the number of edges in the graph. What is the change statistic for that feature?

$z(x_{ij}^{+}) = 1$ (assume edge is present, so value is one)

$z(x_{ij}^{-}) = 0$ (assume edge is absent, so value is zero)

$z(x_{ij}^{+}) - z(x_{ij}^{-}) = 1$ (difference is 1 for all dyads)

Modeling Social Networks parametrically:ERGM approaches

The “Edges” parameter is simply an intercept-only model.

NODE  ADJMAT
1     0 1 1 1 0 0 0 0 0
2     1 0 1 0 0 0 1 0 0
3     1 1 0 0 1 0 1 0 0
4     1 0 0 0 1 0 0 0 0
5     0 0 1 1 0 1 0 1 0
6     0 0 0 0 1 0 0 1 1
7     0 1 1 0 0 0 0 0 0
8     0 0 0 0 1 1 0 0 1
9     0 0 0 0 0 1 0 1 0

Density: 0.361
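The link between the density and the intercept-only logit can be checked directly from the adjacency matrix: the matrix has 26 ones among its 72 off-diagonal cells, and the log-odds of that proportion is the "Edges" coefficient. The sketch below is a quick Python check, separate from the SAS workflow used in these slides.

```python
import numpy as np

# Adjacency matrix from the slide (9 nodes).
adj = np.array([
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
])

n = adj.shape[0]
density = float(adj.sum()) / (n * (n - 1))           # ties / possible ties
intercept = float(np.log(density / (1 - density)))   # logit of the density

print(round(density, 4), round(intercept, 4))        # 0.3611 -0.5705
```

The intercept of about -0.5705 is the value that the intercept-only logistic regression on the dyads should return.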

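/* Intercept-only logistic regression on the dyad-level data */
proc logistic descending data=dydat;
  model nom =;
run; quit;

/* See the results and copy the intercept coefficient */

/* Check: the inverse logit of the coefficient recovers the density */
data chk;
  x = exp(-0.5705) / (1 + exp(-0.5705));
run;

proc print data=chk;
run;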

Modeling Social Networks parametrically: ERGM approaches

The logit model estimation procedure was popularized by Wasserman & colleagues, and a good guide to this approach is:

Including: A Practical Guide To Fitting p* Social Network Models Via Logistic Regression

The site includes the PREPSTAR program for creating the variables of interest. The following example draws from this work, which nicely walks you through the logic of constructing change variables, model fit and so forth.

But the estimates are not very good for any parameters other than “dyad independent” parameters!

Modeling Social Networks parametrically: ERGM approaches

Parameters that are often fit include:
1) Expansiveness and attractiveness parameters (= dummies for each sender/receiver in the network)
2) Degree distribution
3) Mutuality
4) Group membership (and all other parameters by group)
5) Transitivity / Intransitivity
6) K-in-stars, k-out-stars
7) Cyclicity
8) Node-level covariates (matching, difference)
9) Edge-level covariates (dyad-level features such as exposure)
10) Temporal data, such as relations in prior waves.

Modeling Social Networks parametrically:Exponential Random Graph Models

…and there are LOTS of terms…

Modeling Social Networks parametrically: Exponential Random Graph Models

The terms currently available are listed in help(ergm.terms):

Node Main Effects
• nodecov(attrname): Main effect of a covariate
• nodefactor(attrname, base=1): Factor attribute effect
• nodeicov(attrname): Main effect of a covariate for in-edges
• nodeifactor(attrname, base=1): Factor attribute effect for in-edges
• nodeocov(attrname): Main effect of a covariate for out-edges
• nodeofactor(attrname, base=1): Factor attribute effect for out-edges
• receiver(base=1): Receiver effect
• sender(base=1): Sender effect
• sociality(attrname=NULL, base=1): Undirected degree

Modeling Social Networks parametrically: Exponential Random Graph Models

Attribute Mixing Effects
• absdiff(attrname, pow=1): Absolute difference
• absdiffcat(attrname, base=NULL): Categorical absolute difference
• dyadcov(x, attrname=NULL): Dyadic covariate
• edgecov(x, attrname=NULL): Edge covariate. The edgecov and dyadcov terms are equivalent for undirected networks.
• hamming(x, cov, attrname=NULL): Hamming distance
• hammingmix(attrname, x, base=0): Hamming distance within mixing
• match(attrname, diff=FALSE, keep=NULL): Uniform homophily and differential homophily. This is an alias for nodematch(attrname, diff=FALSE).
• nodematch(attrname, diff=FALSE, keep=NULL): Uniform homophily and differential homophily
• nodemix(attrname, base=NULL): Nodal attribute mixing

Modeling Social Networks parametrically: Exponential Random Graph Models

Structural Effects

Base Volume
• density: Density
• edges: Edges
• meandeg: Mean vertex degree

Degree/Star effects
• altkstar(lambda, fixed=FALSE): Alternating k-star
• gwdegree(decay, fixed=FALSE, cutoff=30): Geometrically weighted degree distribution
• gwidegree(decay, fixed=FALSE, cutoff=30): Geometrically weighted in-degree distribution
• gwodegree(decay, fixed=FALSE, cutoff=30): Geometrically weighted out-degree distribution
• idegree(d, by=NULL, homophily=FALSE): In-degree
• isolates: Isolates
• istar(k, attrname=NULL): In-stars
• kstar(k, attrname=NULL): k-Stars
• odegree(d, by=NULL, homophily=FALSE): Out-degree
• ostar(k, attrname=NULL): k-Outstars

Modeling Social Networks parametrically: Exponential Random Graph Models

Structural Effects

Dyadic Effects
• asymmetric(attrname=NULL, diff=FALSE, keep=NULL): Asymmetric dyads
• degree(d, by=NULL, homophily=FALSE): Degree
• degcrossprod: Degree Cross-Product
• degcor: Degree Correlation
• mutual(same=NULL, diff=FALSE, by=NULL, keep=NULL): Mutuality

Path Effects
• m2star: Mixed 2-stars, a.k.a. 2-paths. See also twopath.
• threepath(keep=1:4): Three-paths
• twopath: 2-Paths

Modeling Social Networks parametrically: Exponential Random Graph Models

Triadic Effects
• ctriple(attrname=NULL): Cyclic triples
• cycle(k): Cycles
• dsp(d): Dyadwise shared partners
• esp(d): Edgewise shared partners
• balance: Balanced triads
• gwdsp(alpha, fixed=FALSE, cutoff=30): Geometrically weighted dyadwise shared partner distribution
• gwesp(alpha, fixed=FALSE, cutoff=30): Geometrically weighted edgewise shared partner distribution
• gwnsp(alpha, fixed=FALSE, cutoff=30): Geometrically weighted nonedgewise shared partner distribution
• intransitive: Intransitive triads
• localtriangle(x): Triangles within neighborhoods
• nearsimmelian: Near simmelian triads
• nsp(d): Nonedgewise shared partners
• simmelian: Simmelian triads
• simmelianties: Ties in simmelian triads
• transitive: Transitive triads
• transitiveties(attrname=NULL): Transitive ties
• triadcensus(d): Triad census
• triangle(attrname=NULL): Triangles
• tripercent(attrname=NULL): Triangle percentage
• ttriple(attrname=NULL): Transitive triples

Modeling Social Networks parametrically: Exponential Random Graph Models

Two Mode Networks
• b1concurrent(by=NULL): Concurrent node count for the first mode in a bipartite (aka two-mode) network
• b1degree(d, by=NULL): Degree for the first mode in a bipartite (aka two-mode) network
• b1factor(attrname, base=1): Factor attribute effect for the first mode in a bipartite (aka two-mode) network
• b1star(k, attrname=NULL): k-Stars for the first mode in a bipartite (aka two-mode) network
• b1starmix(k, attrname, base=NULL, diff=TRUE): Mixing matrix for k-stars centered on the first mode of a bipartite network
• b1twostar(b1attrname, b2attrname, base=NULL): Two-star census for central nodes centered on the first mode of a bipartite network
• b2concurrent(by=NULL): Concurrent node count for the second mode in a bipartite (aka two-mode) network
• b2degree(d, by=NULL): Degree for the second mode in a bipartite (aka two-mode) network
• b2factor(attrname, base=1): Factor attribute effect for the second mode in a bipartite (aka two-mode) network
• b2star(k, attrname=NULL): k-Stars for the second mode in a bipartite (aka two-mode) network
• b2starmix(k, attrname, base=NULL, diff=TRUE): Mixing matrix for k-stars centered on the second mode of a bipartite network
• b2twostar(b1attrname, b2attrname, base=NULL): Two-star census for central nodes centered on the second mode of a bipartite network
• gwb1degree(decay, fixed=FALSE, cutoff=30): Geometrically weighted degree distribution for the first mode in a bipartite (aka two-mode) network
• gwb2degree(decay, fixed=FALSE, cutoff=30): Geometrically weighted degree distribution for the second mode in a bipartite (aka two-mode) network
• concurrent(by=NULL): Concurrent node count

Modeling Social Networks parametrically: Exponential Random Graph Models

In practice, models fit by logit (pseudo-likelihood) are difficult to estimate well, and we have no good sense of how approximate the PMLE is.

The STATNET generalization is to use MCMC methods to better estimate the parameters. This is essentially a simulation procedure working “under the hood” to explore the space of graphs described by the model parameters, searching for the best fit to the observed data.
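To give a feel for what "exploring the space of graphs" means, the sketch below simulates from a small edges-plus-mutual model with a basic Metropolis sampler that proposes single-tie toggles. It is an illustrative Python toy under assumed parameter values, not the algorithm statnet actually runs, and it covers only the simulation half of the procedure (the estimation step compares simulated statistics to the observed ones and updates the parameters).

```python
import numpy as np

def change_stats(x, i, j):
    """Change in [edges, mutual dyads] when tie i->j is switched from 0 to 1."""
    return np.array([1.0, float(x[j, i])])

def simulate_ergm(theta, n, n_steps=20000, seed=0):
    """Metropolis sampler: propose one tie toggle at a time and accept it
    with probability min(1, P(proposed graph) / P(current graph))."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n, n), dtype=int)
    for _ in range(n_steps):
        i = int(rng.integers(n))
        j = (i + 1 + int(rng.integers(n - 1))) % n   # any node other than i
        delta = float(theta @ change_stats(x, i, j))
        # Adding the tie changes the log-probability by +delta, removing by -delta.
        log_ratio = delta if x[i, j] == 0 else -delta
        if log_ratio >= 0 or rng.random() < np.exp(log_ratio):
            x[i, j] = 1 - x[i, j]
    return x

# Edges-only model (mutual parameter 0): expected density is 0.2.
theta = np.array([np.log(0.2 / 0.8), 0.0])
x = simulate_ergm(theta, n=20)
print("simulated density:", x.sum() / (20 * 19))
```

Because the acceptance ratio uses only the change statistics, the normalizing constant never has to be computed.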

Modeling Social Networks parametrically:Exponential Random Graph Models

You can specify a model as a simple statement on terms:

Modeling Social Networks parametrically: Exponential Random Graph Models

A simple example: One of the schools in PROSPER

library(statnet)
library(foreign)

# Read the Pajek network file for one school
g <- read.paj("C:/jwmdata/prosper/Network_data_files/PAJEK/MATCHED/SC1C1W1Sch101.net")

# Store in- and out-degree as vertex attributes
g %v% "indegree" <- degree(g, cmode = "indegree")
g %v% "outdegree" <- degree(g, cmode = "outdegree")

# Read the node attribute file and attach each column as a vertex attribute
atr <- read.table("C:/jwmdata/prosper/Network_data_files/Rfiles/ergmfiles/n111101.txt")
g %v% "sex" <- atr[, 2]
g %v% "white" <- atr[, 3]
g %v% "slun" <- atr[, 4]
g %v% "irtuse" <- atr[, 5]
g %v% "irtdev" <- atr[, 6]
g %v% "tgrad" <- atr[, 7]
g %v% "discip" <- atr[, 8]
g %v% "church" <- atr[, 9]
g %v% "sens" <- atr[, 10]

# Plot the network, coloring nodes by attribute
plot(g, vertex.col = "sex")
plot(g, vertex.col = "slun")
plot(g, vertex.col = "white")

Dynamics 1: Simple time-lag model: PROSPER Peers

Modeling Social Networks parametrically: Exponential Random Graph Models

An example: Panel model in PROSPER

Modeling Social Networks parametrically:Exponential Random Graph Models: Degeneracy

"Assessing Degeneracy in Statistical Models of Social Networks" Mark S. Handcock, CSSS Working Paper #39

Modeling Social Networks parametrically:Exponential Random Graph Models:

Quick example (demo)

Modeling Social Networks parametrically:Latent Space Models

Z = a dimension in some unknown space that, once accounted for, makes ties independent. Z is effectively chosen with respect to some latent cluster-space, G. These “groups” define different social sources for association.
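A small simulation can make the idea concrete. The Python sketch below assumes one common functional form, the latent distance model, in which the log-odds of a tie equal an intercept minus the distance between the two nodes' latent positions; the slides do not state the functional form, so that choice, the three cluster centers, and all parameter values are assumptions for illustration.

```python
import numpy as np

def simulate_latent_space(z, alpha, seed=0):
    """Ties are independent given latent positions z:
    logit P(X_ij = 1 | z) = alpha - distance(z_i, z_j)."""
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    p = 1.0 / (1.0 + np.exp(-(alpha - dist)))
    x = (rng.random(p.shape) < p).astype(int)
    np.fill_diagonal(x, 0)
    return x, p

# Three latent groups of 10 nodes each, placed around hypothetical centers.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
group = np.repeat(np.arange(3), 10)
z = centers[group] + rng.normal(scale=0.5, size=(30, 2))

x, p = simulate_latent_space(z, alpha=1.0)

same = group[:, None] == group[None, :]
off = ~np.eye(30, dtype=bool)
print("mean tie probability within groups:", p[same & off].mean())
print("mean tie probability between groups:", p[~same].mean())
```

Nodes that sit close together in the latent space tie to each other far more often, which is how the clusters act as different social sources for association.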

Modeling Social Networks parametrically:Latent Space Models

Prosper data, with three groups

Modeling Social Networks parametrically:Latent Space Models

Prosper data, with three groups (posterior density plots)

Modeling Social Networks parametrically:Latent Space Models

…note there is a non-R option…

Generating Random Graph Samples

A conceptual merge between exponential random graph models and QAP/sensitivity models is to attempt to identify a sample of graphs from the universe you are trying to model.

$$P(X = x) = \frac{\exp\{\theta' z(x)\}}{\kappa(\theta)}$$

That is, generate X empirically, then compare z(x) to see how likely a measure on x would be given X. The difficulty, however, is generating X.

Generating Random Graph Samples

The first option would be to generate all distinct (non-isomorphic) graphs within a given constraint.

This is possible for small graphs, but the number gets large fast. For a network with 3 nodes, there are 16 possible directed graphs. For a network with 4 nodes, there are 218; for 5 nodes, 9,608; for 6 nodes, 1,540,944; and so on…
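The counts for small networks can be reproduced by brute force: enumerate every directed graph, reduce each to a canonical form over all node relabelings, and count the distinct forms. The Python sketch below is an illustration of that enumeration (the function name is hypothetical) and is only practical for very small n.

```python
import itertools

def count_nonisomorphic_digraphs(n):
    """Count directed graphs on n nodes up to relabeling of the nodes."""
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    perms = list(itertools.permutations(range(n)))
    seen = set()
    for bits in itertools.product([0, 1], repeat=len(dyads)):
        edges = [d for d, b in zip(dyads, bits) if b]
        # Canonical form: the smallest relabeled edge list over all permutations.
        canon = min(tuple(sorted((p[i], p[j]) for i, j in edges)) for p in perms)
        seen.add(canon)
    return len(seen)

print(count_nonisomorphic_digraphs(3))  # 16
print(count_nonisomorphic_digraphs(4))  # 218
```

With 5 nodes there are already 2^20 labeled graphs to reduce, which shows why full enumeration stops being an option almost immediately.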

So, the best approach is to sample from the universe, but, of course, if you had the universe you wouldn’t need to sample from it. How do you sample from a population you haven’t observed?

(a) use a construction algorithm that generates a random graph with known constraints
(b) use an ERGM like the one above.

Generating Random Graph Samples

Romantic Networks

A draw from the simulation; this is what appeared in “Glamour”.

Edge-matching random permutation

Can easily generate networks with appropriate degree distributions by generating “edge stems” and sorting:

Degree distribution (degree: number of nodes): 1: 2, 2: 2, 3: 1

Edge stems by node:
di=1: a, b
di=2: c c, d d
di=3: f f f

(need to ensure you have a valid edge list!)
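The edge-stem idea can be sketched in a few lines of Python: list each node once per unit of degree, shuffle the stems, pair them off, and reject any draw that is not a valid edge list. This is a minimal rejection-sampling illustration, not the author's code; the function name is hypothetical, and the example adds a node "e" with degree 1 (an assumption) so that the number of stems is even.

```python
import random

def stub_matching(degrees, seed=0, max_tries=1000):
    """Random undirected graph with a given degree sequence: shuffle the
    edge stems, pair them, and retry on self-loops or duplicate edges."""
    if sum(degrees.values()) % 2:
        raise ValueError("the number of edge stems must be even")
    rng = random.Random(seed)
    stems = [node for node, d in degrees.items() for _ in range(d)]
    for _ in range(max_tries):
        rng.shuffle(stems)
        pairs = [tuple(sorted(stems[k:k + 2])) for k in range(0, len(stems), 2)]
        no_loops = all(u != v for u, v in pairs)
        if no_loops and len(set(pairs)) == len(pairs):
            return sorted(pairs)          # a valid edge list
    raise RuntimeError("no valid edge list found")

# Hypothetical degree sequence; node "e" is added so the stems pair up.
degrees = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 1, "f": 3}
edges = stub_matching(degrees)
print(edges)
```

Repeating the draw with different seeds gives a sample of networks that all share the degree distribution but differ in everything else.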

Generating Random Graph Samples

Edge-matching random permutationGenerating Random Graph Samples

PartnerDistribution

ComponentSize/Shape

Emergent Connectivity in low-degree networks

Generating Random Graph Samples

Development of STD cores in low-degree networks: rapid transition without stars.

Complete Network AnalysisNetwork Connections: Connectivity

Extend this view across the space of low-degree distributions defined by shape and volume...

Complete Network AnalysisNetwork Connections: Connectivity

Complete Network AnalysisNetwork Connections: Connectivity

ERGMs make it (fairly) easy to simulate networks from models.

• Simple: simulation from an estimated ERGM (this is how the GOF function works)
• Simple II: simulate from a pre-defined ERGM formula (i.e. set the parameters by hand)
• A little harder: simulate from EGO networks. Here you can use ERGM to match the observed distribution for mixing by node characteristics reported in an ego-network survey.
  • Can use degree, attribute mixing, …
• A bit harder: fit global structure features using ego-nets by modeling the distribution of sub-structures (see Jeff Smith’s work)

Generating Random Graph Samples: Model based estimates

ERGM to simulate networks from Add Health

Modeling Network Dynamics: Rule-based simulation models

Rule-based simulation models: The network-science approach to dynamic networks has been to identify toy behavioral models and play out the implications of these models for network dynamics. Focus is typically on how the network evolves (or reaches a steady state).

Dynamics OF networks: balance, preferential attachment, voter models

Dynamics ON networks: diffusion simulations

These are usually agent-based models, which are difficult to specify; there is a tradeoff between simplicity and realism.

Modeling Network Dynamics: Descriptive dynamic techniques

Goal here is to make sense of how networks change or how things flow through them using a clear measurement / metrics approach. Challenge is defining the network.

Time and Social Networks

Examples of looking at change in networks: Roy and interlocking directorates (ASR 1983, 248-257)

[Figure panels: Non-financial interlocks, 1886-1890 | 1891-1895 | 1896-1900 | 1901-1905]

Bearman and Everett: The Structure of Social Protest

[Figure: group structure by period: (‘61-63), (‘66-68), (‘71-73), (‘76-78), (‘81-83)]

See paper for group compositions

Data on drug users in Colorado Springs, over 5 years

http://csde.washington.edu/statnet/movies/ConcurrencyAndReachability.mov

Animation captures much of the dynamism we care about:

STD Diffusion

Representing dynamic networks?

Modeling Network Dynamics: Random Graph models

Panel ERGM: If you simply want to account for the effect of past structures, you can add temporal covariates to the standard ERGM. Really only good for two waves.

STERGM: Separable Temporal ERGM. This is a two-equation model, with one equation for the formation of ties and a second for the dissolution of ties. The goal, as with ERGM, is to explain the dynamics of the network.

http://statnet.csde.washington.edu/workshops/SUNBELT/current/tergm/tergm_tutorial.pdf

RELEVENT: Relational Events Model. This is really a model of action on a network; think of conversation events or similar. Suited to dynamic networks of very short-duration events.

http://statnet.csde.washington.edu/workshops/SUNBELT/current/relevent/statnet_sunbelt2014_relevent.pdf

SIENA: Stochastic Actor Oriented Model (SAOM). Used to disentangle selection from influence, by jointly modeling both as functions of each other. Multi-equation model; the simplest has one equation for behavior and one for network formation.
Intro: https://www.stats.ox.ac.uk/~snijders/siena/SnijdersSteglichVdBunt2009.pdf
Manual: https://www.stats.ox.ac.uk/~snijders/siena/RSiena_Manual.pdf

Modeling Network Dynamics: Random Graph models: STERGM

http://statnet.csde.washington.edu/workshops/SUNBELT/current/tergm/tergm_tutorial.html

Slides adapted from the workshop materials: http://statnet.csde.washington.edu/EpiModel/nme/index.html

Under certain assumptions, you can model a single network with average duration information (assumes an equilibrium process).

Modeling Network Dynamics: Random Graph models: STERGM

# Separate formulas for tie formation and tie dissolution, fit across waves 1-3
samp.fit <- stergm(samp,
                   formation = ~edges + mutual + cyclicalties + transitiveties,
                   dissolution = ~edges + mutual + cyclicalties + transitiveties,
                   estimate = "CMLE",
                   times = 1:3)

SIENA

SIENA: Key Assumptions of the model

Key element is how actors make changes. This is based on an evaluation of “utility” functions, similar to discrete choice models.

The model is then implemented as an actor-simulation, where actors are striving to maximize their utility.
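A single actor "ministep" can be sketched as a discrete choice: the actor compares leaving the network as it is against toggling each one of its outgoing ties, and picks an option with probability proportional to exp(utility). The Python sketch below is a toy illustration under assumptions: the two-effect evaluation function (outdegree and reciprocity), the parameter values, and the function names are chosen for the example, and the real SIENA model adds rate functions, many more effects, and a behavior equation.

```python
import numpy as np

def evaluation(x, i, beta):
    """Actor i's utility: beta[0] * outdegree + beta[1] * reciprocated ties."""
    outdegree = x[i].sum()
    reciprocated = (x[i] * x[:, i]).sum()
    return beta[0] * outdegree + beta[1] * reciprocated

def ministep(x, i, beta, rng):
    """Actor i either changes nothing or toggles one outgoing tie,
    choosing among the options with multinomial-logit probabilities."""
    n = x.shape[0]
    options = [x]                               # option 0: no change
    for j in range(n):
        if j != i:
            y = x.copy()
            y[i, j] = 1 - y[i, j]
            options.append(y)
    u = np.array([evaluation(y, i, beta) for y in options], dtype=float)
    prob = np.exp(u - u.max())
    prob /= prob.sum()
    choice = int(rng.choice(len(options), p=prob))
    return options[choice], prob

# Hypothetical 4-actor network in which actor 1 has nominated actor 0.
x = np.zeros((4, 4), dtype=int)
x[1, 0] = 1
beta = np.array([-1.0, 2.0])                    # ties are costly, reciprocity is valued
rng = np.random.default_rng(0)

new_x, prob = ministep(x, 0, beta, rng)
print("choice probabilities for actor 0:", np.round(prob, 3))
```

With these values the most likely move for actor 0 is to reciprocate actor 1's nomination, because that option has the highest utility.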

Note: Tom is adamant that this is an “as if” model, with no clear ontological commitment to a “choice” model!

Modeling Network Dynamics: Random Graph models: Siena

Osgood, D. W., Ragan, D. T., Wallace, L., Gest, S. D., Feinberg, M. E., & Moody, J. 2013. “Peers and the emergence of alcohol use: Influence and selection processes in adolescent friendship networks.” Journal of Research on Adolescence 23:500–512.

Modeling Network Dynamics: Random Graph models: RelEvent

For repeated interactions amongst nodes
