Social Networks 38 (2014) 88–99

Contents lists available at ScienceDirect

Social Networks

journal homepage: www.elsevier.com/locate/socnet

Statistical power of the social network autocorrelation model

Wei Wang a,∗, Eric J. Neuman b, Daniel A. Newman b

a University of Central Florida, United States
b University of Illinois at Urbana-Champaign, United States

∗ Corresponding author at: Department of Psychology, University of Central Florida, 4000 Central Florida Blvd., Psychology Bldg 99 Ste. 320, Orlando, FL 32816, United States. Tel.: +1 407 823 4350; fax: 407-823-5862.
E-mail addresses: [email protected], [email protected] (W. Wang).

http://dx.doi.org/10.1016/j.socnet.2014.03.004
0378-8733/© 2014 Elsevier B.V. All rights reserved.

Article info

Keywords: Network autocorrelation model; Social network analysis; Statistical power

Abstract

The network autocorrelation model has become an increasingly popular tool for conducting social network analysis. More and more researchers, however, have documented evidence of a systematic negative bias in the estimation of the network effect (ρ). In this paper, we take a different approach to the problem by investigating conditions under which, despite the underestimation bias, a network effect can still be detected by the network autocorrelation model. Using simulations, we find that moderately-sized network effects (e.g., ρ = .3) are still often detectable in modest-sized networks (i.e., 40 or more nodes). Analyses reveal that statistical power is primarily a nonlinear function of network effect size (ρ) and network size (N), although both of these factors can interact with network density and network structure to impair power under certain rare conditions. We conclude by discussing implications of these findings and guidelines for users of the autocorrelation model.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Identifying and determining network effects are some of the major goals and unique advantages of social network analysis. Of the many models proposed to investigate network effects on individual outcomes, the network autocorrelation model (Anselin, 1988; Cliff and Ord, 1981; Doreian, 1980, 1981; Ord, 1975) is perhaps the dominant approach; it has been recently touted as “a workhorse for modeling network influences on individual behavior” (Fujimoto et al., 2011, p. 231). The network autocorrelation model has some clear advantages over other conventional approaches (e.g., egocentric or dyadic) in that it simultaneously accommodates network effects and individual attributes. Because of these advantages, scholars continue to use and build upon the model. For instance, Dow (2007) extended the one-network autocorrelation model to multiple networks and applied it to understand the simultaneous multiple processes of cultural transmission. The standard one-mode network autocorrelation model has also been extended to a two-mode model (i.e., actor × event) by Fujimoto et al. (2011). More importantly, the primary estimation method for network autocorrelation models, maximum likelihood, which was originally elaborated by Doreian (1981), has now been integrated in modern statistical packages such as R (Butts, 2008), Matlab (LeSage, 1999), and Stata (Pisati, 2001). These developments have made the model much more accessible to network researchers.

Despite the obvious benefits of the model and its rising popularity, there has been growing evidence that the maximum likelihood algorithm used to estimate the parameters of autocorrelation models produces estimates of the network effect (ρ) that are negatively biased (Dow et al., 1982; Farber et al., 2009; Mizruchi and Neuman, 2008; Neuman and Mizruchi, 2010; Smith, 2009). This issue potentially leads to two serious problems for users of the network autocorrelation model. First, the model may fail to detect a network effect that truly exists, thus committing a Type II error (i.e., β error). Second, if the model does detect a network effect, the parameter of the network effect may be underestimated. Without further understanding the magnitude of these problems, users may begin to doubt the veracity of all network effect results, not just those subject to the conditions in which the bias has been detected.

In this paper, we take a different approach to studying the underestimation problem. Rather than look for more conditions in which ρ is underestimated, we investigate the likelihood of identifying a statistically significant network effect by the network autocorrelation model under various conditions. Specifically, given certain known network properties (e.g., size of network effect ρ, network density, network size, and network structure), what is the likelihood (that is, the statistical power) of identifying a network effect using the autocorrelation model? While investigating this question, we also attempt to answer a more practical question: what network size (N) is required in order to obtain decent power (e.g., 80% power) to detect a network effect, given



approximate network effect size, density, and structure? We believe such information provides useful guidelines to users of the network autocorrelation model.

By using simulations and manipulating network properties such as network effect ρ, network density, and network structure, we show that for a common network effect size of ρ = .3, a network size (N) of 40–80 nodes is sufficient to obtain statistical power of 80% or higher, depending on the network structure. In addition, we find that the Type I error (i.e., the probability of statistically supporting network effects that do not exist) remains acceptably small. We conduct further analyses to reveal that statistical power is primarily a function of network effect size (ρ) and network size (N), although both of these factors can interact with network density and network structure to impair power under certain rare conditions. We conclude by discussing the implications of these findings and offer guidelines for users of the autocorrelation model.

2. Network autocorrelation model and its applications in social science

The network autocorrelation model was initially proposed by geographers to remedy the dependence problem in the error terms of regression analysis for geographic proximity data (Cliff and Ord, 1981; Ord, 1975). Spatial dependence is quite common in geographic data. For example, the average real estate prices of two proximal areas are closer than those of two distant areas. If this spatial dependence is not acknowledged and accounted for in the ordinary least-squares (OLS) regression model (i.e., Y = Xβ + ε), then the model residuals of proximal areas are more similar than the residuals of distant areas. Such an error term ε thus violates a fundamental assumption for the conventional regression model: the error terms should be independent with zero mean and a constant variance and should follow a Gaussian distribution. To solve the assumption violation problem and to remove the spatial dependence of the disturbance, geographic researchers proposed two autocorrelation models (Cliff and Ord, 1981; Ord, 1975). The first model, termed the spatial disturbances model or the spatial error model (Anselin and Hudak, 1992), decomposes the problematic spatially dependent error term ε into ε = ρWε + ν, where W is an N × N adjacency matrix of the spatial distances among the observations (e.g., W is a social network matrix), ρ is the parameter representing the correlation strength of spatial dependence in the residuals of ε, and ν is now the vector of Gaussian-distributed residuals. The second model, which is more straightforward than the first model and is the model we focus on in the current paper, models the spatial dependence directly on the dependent variable Y instead of on the model residuals. This second model was termed the spatial effect model (Doreian, 1980), the network effect model (Doreian et al., 1984), or the spatial lag model (Anselin and Hudak, 1992), and is Y = ρWY + Xβ + ε, where W is the same N × N matrix of spatial distances among the observations as specified for the first model (e.g., W is the social network matrix). However, the error term ε in this model follows a Gaussian distribution N(0, σ²I) and the parameter ρ represents the strength of spatial dependence in the dependent variable Y. Because of its versatility, this model was soon adopted by social scientists (Doreian, 1990; White et al., 1981) who used it to model social influence. Now these models of spatial and network autocorrelation have been applied in many social sciences such as political science (Cho, 2003; Franzese and Hays, 2007; Franzese et al., 2012), sociology (Crowder and South, 2008; Loftin and Ward, 1983), cultural psychology and anthropology (Dow, 2007; Dow and Eff, 2008), and organizational studies (Ibarra and Andrews, 1993; Mizruchi et al., 2006).
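To make the network effects model concrete, the following minimal sketch draws one outcome vector from Y = ρWY + Xβ + ε by solving the reduced form (I − ρW)Y = Xβ + ε. It is an illustration only: the function name `simulate_network_effects`, the row-standardization of W, the single covariate, and all parameter values are assumptions chosen for the example, not the settings used in the studies cited above.

```python
import numpy as np

def simulate_network_effects(W, X, beta, rho, sigma=1.0, rng=None):
    """Draw Y from Y = rho*W*Y + X*beta + eps using the reduced form."""
    rng = np.random.default_rng() if rng is None else rng
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)  # Gaussian errors, N(0, sigma^2 I)
    # (I - rho*W) Y = X beta + eps  =>  solve the linear system for Y
    return np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

rng = np.random.default_rng(0)
n = 50

# Hypothetical binary network with density near .10 (no self-ties)
A = (rng.random((n, n)) < 0.10).astype(float)
np.fill_diagonal(A, 0.0)

# Row-standardize (an assumption here); rows without ties stay all-zero
row_sums = A.sum(axis=1, keepdims=True)
W = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)

# Intercept plus one exogenous covariate (illustrative values)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.0, 1.0])

Y = simulate_network_effects(W, X, beta, rho=0.3, rng=rng)
```

With a row-standardized W and |ρ| < 1, the matrix I − ρW is invertible, so the linear solve is well defined.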


3. The estimation challenge

Despite the many advantages of the network autocorrelation model, one serious problem has emerged. In numerous simulation studies as far back as the early 1980s, researchers have shown that maximum likelihood estimation of the network effect ρ can be negatively biased under several conditions. The earliest known evidence of an estimation bias was identified by Dow et al. (1982). In a study of the disturbances model using small networks (N = 20, 30, and 40), Dow and colleagues found that ρ was underestimated across a variety of target ρ’s (.2, .4, .6, and .8) for a random W with density of .1, and also for a “language”-structured W with higher density. For the random networks, the magnitude of the bias increased as the target ρ increased, and network size had little effect on the bias. For the structured networks, the bias was less severe than that found with the random networks, decreased in magnitude as the target ρ increased, and was less severe for the largest networks.

Despite the authors’ bold claim that “estimates of the significance of ρ are unreliable from ML [maximum likelihood] procedures” (Dow et al., 1982:198), the problem of a potential bias in properly-specified models was ignored for over two decades. Interest has been renewed in this area over the past five years as computing power makes it possible to carry out in-depth simulations over a host of different network conditions.

Perhaps the first such systematic investigation of bias was conducted by Mizruchi and Neuman (2008). Using random networks of sizes 40, 50, and 100 and across network densities from .05 to .95, they reported strong evidence of a negative bias in the estimation of ρ using the network effects model. Regardless of network size or whether W was row-standardized, they consistently found that the underestimation of ρ increased with increasing density of W and that the relationship between network density and negative bias in the estimate of ρ became stronger at higher levels of target ρ. Only when the noise in the residual term, ε, of the autocorrelation model was reduced to unrealistically low levels or when the number of exogenous variables (X’s) in the model was increased to unrealistically high levels was much of the underestimation bias of ρ attenuated, though it was never entirely eliminated.

In follow-up work (Neuman and Mizruchi, 2010) the authors extended their previous study to examine whether the estimation bias held for non-random networks. Using larger networks than before (N ≈ 400), they ran simulations of star, caveman, and small-world networks along with random networks (all network densities ≤ .5) at target ρ’s of 0, .2, and .5. The pattern of findings was the same as before: a negative bias in the estimation of ρ with a magnitude that increased with increasing network density. Yet they also identified a negative bias in the estimation of ρ for low-density star networks. At a minimum, this underestimation of ρ for low-density star networks suggests that high density is not the sole source of the underestimation bias. More strongly, it might suggest that high density itself does not directly cause the bias but that high density networks create a condition that leads to the bias, and that this condition could also be caused by other network configurations (e.g., low-density star models).

Consistent with Mizruchi and Neuman’s simulation findings for the effects model, Smith (2009) analytically showed that for maximally-connected networks (that is, W’s with density = 1) maximum likelihood estimates of ρ exhibit a negative bias in both the spatial effects and spatial disturbances models. Smith then conducted simulations of both effects and disturbances autocorrelation models to examine cases where W is not maximally-connected. Using 50-node, randomly-connected networks with target ρ = .5 and with densities of .3, .5, .8, .9, .95, and .99, he replicated Mizruchi and Neuman’s finding that ρ is seriously underestimated as the density of W increases. Smith’s results further showed that for his simulation design the negative bias is stronger in the disturbances model than in the effects model.

Additional evidence of a negative bias in the estimation of ρ for both effects and disturbances models came from Farber et al. (2010). Extending their own work on power analysis of the effects model (Farber et al., 2009), they conducted simulations of large (N = 100, 500, and 1000) structured networks that varied by levels of degree distribution and clustering coefficient. Their findings showed that the magnitude of the underestimation bias of ρ increases as target ρ increases and as the density of W increases. The latter finding is especially interesting given that the W matrices in their study were relatively sparse (density ≤ .08).

Table 1
Hypothesis testing with the network autocorrelation model.

| Results of the network analysis | True network effect in the population: No (ρ = 0) | True network effect in the population: Yes (ρ > 0) |
|---|---|---|
| Finding a significant network effect | Type I error (α) [“mirage”] | Power = (1 − β) |
| Not finding a significant network effect | Correct rejection | Type II error (β) [“blindness”] |

Finally, Fujimoto et al. (2011) identified similar patterns of negative bias in the estimation of ρ when they extended the network autocorrelation model to 2-mode networks. In their model, W is formed by projecting an actor-to-event affiliation matrix A onto a 1-mode co-membership actor-to-actor network (i.e., W = AAᵀ). They constructed random affiliation matrices (100 actors; up to 20 events) where the number of events per person follows a Poisson outdegree distribution that yielded W’s with densities similar to those studied by Mizruchi and Neuman (2008). Focusing solely on the network effects model, Fujimoto and colleagues found that the estimates of ρ were negatively biased and that the magnitude of the bias increased with increasing density of W. Interestingly, though, they found that by including a variable for the number of events participated in by each actor (i.e., each actor’s row sum in the original affiliation matrix) the underestimation bias of ρ was somewhat attenuated. Yet unlike prior studies, they did not find that the underestimation of ρ became more pronounced as the target ρ increased.

Because such an underestimation issue may potentially lead to many serious problems such as Type II error rates (i.e., failing to detect network effects that do in fact exist), it may yield illusory results that can ultimately discourage network researchers from using the autocorrelation model. Estimation of the model may not be able to detect an expected network effect; or even when it does detect an effect, the estimated parameters may be smaller than the true parameters.

4. Statistical power analysis: a new approach to the old problem

This paper attempts to take a different approach to the vexing bias problem and considers it from a positive perspective. Specifically, we ask a practical yet important question: what is the likelihood of identifying a significant network effect, regardless of whether estimation of the network parameter may be biased? This question turns the negative bias problem into a statistical power question. We believe understanding statistical power might be more meaningful to autocorrelation network users because (a) in practice, researchers may be equally as concerned with whether the network effect is statistically significant as they are with point estimates of the magnitude of the network effect and (b) more importantly, statistical power is a function of many manageable variables such as network size. Thus power analysis provides useful guidelines for study design to aid network autocorrelation model users and practitioners.

4.1. Statistical power

Two types of errors occur when testing a hypothesis in scientific research. The first type of error, termed Type I error (or α error), is the error of finding an effect that is not there (i.e., a mirage). The other type of error, Type II error (or β error), is the error of failing to detect an effect that is truly there (i.e., blindness). In other words, Type I error is a false positive conclusion and Type II error is a false negative conclusion. In network autocorrelation model analyses, Type I error occurs when the analysis renders a statistically significant network effect under conditions when in actuality there is no network effect (i.e., ρ = 0). Type II error occurs when the analysis finds no network effect under conditions when a network effect truly exists (i.e., |ρ| > 0). Statistical power is defined as the probability of correctly detecting an effect that truly exists (power = 1 − Type II error rate = 1 − β). In the context of network autocorrelation model analysis, high power means a better probability that the analysis identifies a true network effect, and thus a better probability of rejecting the false null hypothesis that there is no network effect. Table 1 presents the four possible outcomes from hypothesis testing in the network effects analysis: Type I error (α), Type II error (β), power, and correct rejection.
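The two error rates defined above can be approximated by Monte Carlo simulation: generate many data sets at a known ρ, test the null hypothesis ρ = 0 in each, and record the proportion of rejections. That proportion estimates the Type I error rate when ρ = 0 and power when ρ ≠ 0. The sketch below is a hedged illustration, not the procedure of this paper: it assumes a row-standardized random W, a grid search over the concentrated log-likelihood, and a likelihood-ratio test against the χ² critical value with 1 degree of freedom. All function names and settings are hypothetical.

```python
import numpy as np

CHI2_1DF_05 = 3.841                       # .05 critical value, chi-square(1)
RHO_GRID = np.linspace(-0.9, 0.9, 37)     # assumed search grid for rho

def random_row_standardized_W(n, density, rng):
    """Bernoulli random network, row-standardized (isolates keep zero rows)."""
    A = (rng.random((n, n)) < density).astype(float)
    np.fill_diagonal(A, 0.0)
    rs = A.sum(axis=1, keepdims=True)
    return np.divide(A, rs, out=np.zeros_like(A), where=rs > 0)

def concentrated_loglik(rho, W, X, Y):
    """Log-likelihood of Y = rho*W*Y + X*beta + eps with beta, sigma^2 profiled out."""
    n = len(Y)
    A = np.eye(n) - rho * W
    _, logdet = np.linalg.slogdet(A)
    Z = A @ Y
    coef = np.linalg.lstsq(X, Z, rcond=None)[0]
    resid = Z - X @ coef
    return logdet - 0.5 * n * np.log(resid @ resid / n)

def rejects_null(W, X, Y):
    """Likelihood-ratio test of rho = 0 (an assumed test, for illustration)."""
    ll0 = concentrated_loglik(0.0, W, X, Y)
    ll_max = max(concentrated_loglik(r, W, X, Y) for r in RHO_GRID)
    return 2.0 * (ll_max - ll0) > CHI2_1DF_05

def rejection_rate(rho, n=40, density=0.10, reps=100, seed=0):
    """Share of simulated data sets in which rho = 0 is rejected."""
    rng = np.random.default_rng(seed)
    beta = np.array([0.0, 1.0])
    hits = 0
    for _ in range(reps):
        W = random_row_standardized_W(n, density, rng)
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        eps = rng.normal(size=n)
        Y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)
        hits += int(rejects_null(W, X, Y))
    return hits / reps

type_i_rate = rejection_rate(rho=0.0)   # estimates alpha (true rho = 0)
power = rejection_rate(rho=0.5)         # estimates 1 - beta (true rho = .5)
```

Because the estimates are proportions over a finite number of replications, they carry sampling error; more replications tighten them at the cost of run time.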

4.2. The importance of statistical power analysis

Although scientists conducting hypothesis tests typically focus on Type I error (i.e., the p < .05 threshold), Type II error and statistical power are vitally important in virtually all scientific research (Abraham and Russell, 2008; Cohen, 1988, 1992). Practically, statistical power analysis is important for determining the sample size of a study that would be needed in order to detect an expected effect. More importantly, statistical power is critical for the healthy accumulation of scientific truth (Abraham and Russell, 2008; Schmidt, 1992). Given a study design that has moderately large power (e.g., power = .60), if researchers conduct 10 independent studies by randomly selecting 10 samples from the population where the effect does exist, then on average only 6 out of the 10 studies will find a statistically significant effect, with the other 4 studies erroneously concluding a null effect. Understanding statistical power in this way has implications for the accumulation of scientific knowledge, as it can help to diagnose and prevent two types of publication bias. First, the average effect size estimate based on only the 6 studies that found significant effects likely overestimates the true population effect (Abraham and Russell, 2008; Schmidt, 1992). That is, because journal editors often favor publishing study findings of significant effects, researchers who happen to find null effects often decide not to submit the manuscript or fail to publish the findings, which inevitably causes a file drawer problem of hidden null studies (Rosenthal, 1979). Second, there is increasing recognition that many prevalent research practices can lead to capitalization on chance and to the reporting of false positive results (Simmons et al., 2011). This type of publication bias can also be diagnosed via power analysis, because a finding should only replicate at a rate consistent with the statistical power of its design (Francis, 2012a, 2012b). Without knowing whether the inconsistent findings of 10 hypothetical studies are entirely attributable to the moderate statistical power of the design, future researchers may blindly put efforts into investigating reasons for the inconsistent findings; for instance, by searching for moderators that do not exist. Therefore understanding statistical power is essential for the healthy accumulation of scientific knowledge.
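The arithmetic behind the 10-study example can be checked with a binomial model. The sketch assumes the 10 studies are independent and each has the same power of .60; the helper name `prob_k_significant` is hypothetical.

```python
from math import comb

def prob_k_significant(k, n_studies=10, power=0.60):
    """Binomial probability that exactly k of n_studies reach significance."""
    return comb(n_studies, k) * power**k * (1 - power) ** (n_studies - k)

# Expected number of significant studies: n * power = 10 * .60 = 6
expected_significant = 10 * 0.60

# Probability that all 10 studies agree in finding the effect: .60 ** 10
prob_all_significant = prob_k_significant(10)

# Probability that at least one study misses the true effect
prob_some_null = 1.0 - prob_all_significant
```

Under these assumptions a unanimous set of 10 significant results is very unlikely (probability below .01), which is why a perfectly consistent published record from moderately powered designs can itself be a warning sign.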

Fortunately, after the early call for statistical power by Cohen (1962), who found that many studies lacked sufficient power in the field of psychology, many scholars have worked on statistical power analysis, and several books have been published offering guidelines for researchers seeking to evaluate statistical power under various scenarios (e.g., Cohen, 1969, 1988; Kraemer and Thiemann, 1987; Murphy and Myors, 1998). Statistical power has been receiving increased attention ever since.

4.3. Factors influencing power in the network correlation model

Many factors affect the power of a hypothesis test, though for a fixed significance level (e.g., α = .05), the two primary factors are effect size and sample size. Effect size refers to the magnitude of an effect in the population. The larger an effect size, the easier it is to detect. One common way to express the magnitude of effect size is the Pearson correlation r. The squared effect size r² reflects the shared variance of two variables. For example, r² = .09 indicates that the Pearson correlation of two variables (r) is .30 and that the two variables share 9% of their variance. In network autocorrelation models, the effect size has a different expression. Researchers use the Greek letter ρ (pronounced ‘rho’) to denote the network effect size, as shown in the network autocorrelation regression model Y = ρWY + Xβ + ε. The network effect size is related to the strength of the variable dependence in a network. A high network effect ρ represents a highly contagious dependence in the network (e.g., where one’s friends’ attitudes are related to one’s own attitudes).

Sample size typically has a straightforward effect on statistical power: larger sample sizes result in higher statistical power. As sample size increases, the calculated estimate of effect size gets closer to the true effect size, which yields a smaller standard error. Because a test statistic (e.g., t-value) is typically calculated by dividing the estimated effect size by its standard error, a smaller standard error will yield a bigger test statistic, which is more likely to lead to rejection of the null hypothesis. In the context of network autocorrelation research, Farber et al. (2009), using large networks of size 100, 500, and 1000, also found that statistical power increased with increasing network size. Thus we propose the hypotheses:

Hypothesis 1. Statistical power of the network autocorrelation model will be positively related to network size (N).

Hypothesis 2. Statistical power of the network autocorrelation model will be positively related to network effect size (ρ).

It is also possible for us to be more specific about the proposed form of the relationships between network size (N), network effect size (ρ), and statistical power. By noting that the network effect ρ is closely related to the Pearson correlation, we can go even further to hypothesize that the power of the network autocorrelation model will be approximately related to t = effect size/SE = ρ/√((1 − ρ²)/(N − 2)) ≅ ρ√N/√(1 − ρ²), which is an adapted form of the well-known t-distribution for the Pearson correlation. Note further that √N is simply a nonlinear transformation of network size N, and ρ/√(1 − ρ²) is simply a nonlinear transformation of network effect size ρ. We also use the handy fact that, for the purposes of the current study, the quadratic polynomial aρ + bρ² provides a very good approximation of the quotient ρ/√(1 − ρ²) in the range where 0 ≤ ρ ≤ .5. As such, we can now propose,

Hypothesis 3. Statistical power of the network autocorrelation model will be positively related to the product of the square root of network size (√N) and a polynomial including the square of network effect size (ρ²). That is, statistical power should be related to a polynomial of order ρ²√N.
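The claim that a quadratic aρ + bρ² closely tracks ρ/√(1 − ρ²) on 0 ≤ ρ ≤ .5 can be checked numerically. The sketch below fits a and b by ordinary least squares on an evenly spaced grid; the grid, the fitting method, and the helper `approx_t` are assumptions made for illustration, since the text does not state how the coefficients were obtained.

```python
import numpy as np

# Grid over the range considered in the text: 0 <= rho <= .5
rho = np.linspace(0.0, 0.5, 51)
target = rho / np.sqrt(1.0 - rho**2)

# Least-squares fit of target ~ a*rho + b*rho^2 (no intercept)
design = np.column_stack([rho, rho**2])
coef = np.linalg.lstsq(design, target, rcond=None)[0]
a, b = coef

# Largest absolute gap between the quadratic and the exact quotient
max_abs_error = np.max(np.abs(design @ coef - target))

def approx_t(rho_value, n_nodes):
    """Approximate test statistic rho * sqrt(N) / sqrt(1 - rho^2) from the text."""
    return rho_value * np.sqrt(n_nodes) / np.sqrt(1.0 - rho_value**2)
```

Because `approx_t` grows with both ρ and √N, the same function illustrates why Hypotheses 1 and 2 predict higher power for larger networks and larger network effects.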


Besides effect size and sample size, other factors specifically associated with networks might additionally influence the statistical power of the network autocorrelation model. In Section 3, we reviewed the literature showing that network density influences the estimation of the network effect; as density increases, the estimates of ρ become more negatively biased (Farber et al., 2010; Fujimoto et al., 2011; Mizruchi and Neuman, 2008; Neuman and Mizruchi, 2010; Smith, 2009). Because a network effect is more difficult to estimate in a high density network than in a low density network, network density should decrease the statistical power for the network autocorrelation model.

Hypothesis 4. Statistical power of the network autocorrelation model will be negatively related to network density (d).

Further, although Neuman and Mizruchi (2010) found that networks with different structures in general display a similar pattern of underestimation bias of ρ, they did identify some differences (e.g., severe underestimation of ρ for low-density star networks), which could have an impact on statistical power. Thus in the current study we also decided to include network structure as a potential factor that might influence statistical power in the network autocorrelation model. We therefore raise the research question:

Research Question 1. Is statistical power of the network autocorrelation model a function of network structure or of interactions between network structure, network density (d), network size (N) and network effect size (ρ)?

5. Study design and analysis procedures

To examine the statistical power of the network autocorrelation model, we conducted simulations that varied across our four factors of interest: effect size, network size, network density, and network structure. Our models examine five levels of network effect size (ρ = 0, .1, .2, .3, and .5), three levels of network density (.05, .10, and .30), four network structures (random, star, small-world, and community), and, depending on the network structure, five or six levels of network size ranging from 10 to 200.¹ We chose this range of network sizes in an attempt to understand the power of the network autocorrelation model under network sizes that are manageable for most social scientists, who are the potential users of this model.

5.1. Generating input variables

5.1.1. Four types of network structures studied in this paper

Following Neuman and Mizruchi (2010), we included one random structure and three non-random networks in our simulations. Two of the non-random networks, star and small world, are the same as studied by Neuman and Mizruchi. For our third non-random network we examine the community structure, which we believe is a more realistic model of social life as compared with the caveman structure that Neuman and Mizruchi included in their study.

1. Random network: In a random network, the average number of ties an actor has is determined by the product of the density and network size. For example, for a network with a size of 50

¹ Our manipulated network sizes were mathematically required to vary slightly across different network structures as a result of the constraints of manipulating network density and structure together.



density between communities d_out. In this configuration, the overall density d = d_in/g + d_out(1 − 1/g). [Equivalently, the number of communities g = (d_in − d_out)/(d − d_out).]³ For a given target

Fig. 1. Representations of the four network structures studied in this paper.

nominated friends. Although random networks typically bear little resemblance to actual social networks, we included them in this study because (a) they serve as a control condition to which other networks can be compared and (b) they have been included in many other studies of network autocorrelation estimation bias (e.g., Fujimoto et al., 2011; Mizruchi and Neuman, 2008; Neuman and Mizruchi, 2010; Smith, 2009). We depict a random network with a size of 50 and density .10 in the upper left panel in Fig. 1.

2. Star. In a pure star network one “super star” is connected to everyone else with no ties among the others. Such a pure star structure constrains the density of a network as a function of the network’s size. As Neuman and Mizruchi (2010, p. 292) show, a pure star network with size N has N − 1 total ties and a density equal to (N − 1)/(N × (N − 1)/2) = 2/N. Thus to study star-type networks with densities larger than 2/N, we added random ties among the non-star nodes according to the method suggested by Neuman and Mizruchi (2010, pp. 299–300). For example, for the 50-node star network presented in the upper-right panel of Fig. 1, we added 74 extra ties among the non-star nodes to obtain a density of .10.

3. Small-World. We adopted Neuman and Mizruchi’s (2010, p. 293) method, which followed Watts and Strogatz (1998), in constructing the small-world networks. In a small world network of N nodes, each node is connected to its k nearest neighbors but with some of the ties ‘rewired’ at random with probability p. When p is 0, the result is a ring lattice as shown in the lower left corner of Fig. 1.² For certain values of p > 0, the result is a network with the same high clustering of the ring lattice but also with short average path lengths between any two nodes: the two characteristics of a small world. Given N and a target density d, we determined the number of edges, k, each node needed to connect with its neighbors by rearranging the formula for density d = 2k/(N − 1) to k = d(N − 1)/2. Following Neuman and Mizruchi’s (2010) recommendation, we then rewired ties with p = .10 because this was found to achieve high clustering and low average path length.

4. Community. A community structured network has been found to be a common structure in the real world (Girvan and Newman, 2002); it typically consists of several cohesive subgroups (i.e., communities). We depict an example of a community network in the lower right panel in Fig. 1. The parameters of this structure are the number of nodes (that is, the network size) N, the number of communities g, and the overall density d, which can be parceled into the density within communities d_in and the

² For illustrative purposes, we present a network with p = 0 in Fig. 1. In the simulation, we set p = .1 to be more realistic.

³ In the community structure network, the value of within community density, d_in, is expected to be much greater than the value of between community density in the network, d_out. That is, d_in ≫ d_out.


of network structure, where network structure was coded as a set of orthogonal contrast codes (described below).

⁴ Going forward we will use the generic term ρ for the network effect that we called ρ′ in the previous section.


overall density d in each condition, we carefully chose the values of d_in and d_out to obtain a reasonable number of communities g in the network. For example, for the condition of network size N = 60 with a target overall network density d = .1, we set within-community density d_in = .55 and outside-community density d_out = .01, which led to the community number g = 6, with 10 members in each community. Moreover, once the within-community density d_in = .55 was determined for this condition, we then varied each community’s density, so that each of the g = 6 communities had a different density d_in, but the 6 communities altogether had an average d_in = .55. We believe that such an implementation of varying within-community densities makes the simulated community networks more realistic. For this specific example, we sampled the 6 within-community densities from a uniform distribution U(.2, .9), which has a mean of .55, for each replication. For other conditions with different network size N and overall density d, we used the same strategy to develop appropriate parameters g, d_in and d_out, and to vary the uniform distributions to generate appropriate within-community densities.
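The density arithmetic behind the star, small-world, and community structures can be checked with a minimal sketch. The function names are hypothetical, rounding the star tie count up is an assumption, and reading k as the number of neighbors on each side of a node is one interpretation of d = 2k/(N − 1) rather than something the text states.

```python
import math

def star_extra_ties(n, d):
    """Ties to add among non-star nodes so a star network reaches density d."""
    target_ties = d * n * (n - 1) / 2        # undirected ties implied by density d
    # The pure star already has n - 1 ties; rounding up is an assumption.
    return max(0, math.ceil(target_ties - (n - 1)))

def ring_neighbors_per_side(n, d):
    """k from d = 2k/(N - 1), read here as neighbors on each side of a node."""
    return d * (n - 1) / 2

def community_count(d, d_in, d_out):
    """Number of communities g implied by d = d_in/g + d_out(1 - 1/g)."""
    return (d_in - d_out) / (d - d_out)

def community_density(g, d_in, d_out):
    """Overall density implied by g communities with densities d_in and d_out."""
    return d_in / g + d_out * (1 - 1 / g)

extra = star_extra_ties(50, 0.10)              # the 50-node star example
k = ring_neighbors_per_side(41, 0.10)          # a 41-node small-world condition
g = community_count(0.10, 0.55, 0.01)          # the N = 60 community example
d_check = community_density(round(g), 0.55, 0.01)
print(extra, k, round(g), round(d_check, 3))
```

Under this reading, k is a whole number only when d(N − 1)/2 is, which may be why the small-world sizes in Fig. 2 take values such as 21, 41, and 81.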

5.2. Parameters and variables

After simulating networks (W), we generated the other variables in the autocorrelation model (i.e., X, β, ε, and Y). We included one exogenous variable, X, which we randomly sampled from a standard normal distribution. β was therefore a vector with two elements: the intercept (b0) and the coefficient for X (b1); and we set b0 = 2 and b1 = .3. The error term ε was a vector that was also generated from a standard normal distribution.

One critical difference between our simulation and previous simulation studies that examined the estimation bias (Dow et al., 1982; Farber et al., 2010; Fujimoto et al., 2011; Mizruchi and Neuman, 2008; Neuman and Mizruchi, 2010; Smith, 2009) was the way in which we generated ρ. Because of our focus on power, we set ρ as follows:

ρ′ = the number of standard deviations by which Y will increase when WY increases by one standard deviation [i.e., ρ_raw(σ_WY/σ_Y)]

Using ρ′ is necessary in order to have a meaningful comparison across all the conditions in the simulation (as well as for future applications that seek to estimate the power to detect network effects). This is because it is not the raw magnitude of the estimate from the autocorrelation model, which we refer to here as ρ_raw, but rather the magnitude of ρ′ that influences power, that is, the ability to detect the network effect (similar to how it is not the covariance, but rather the correlation, that influences the statistical power to detect a linear relationship between two variables; Cohen, 1992).

Another way of expressing ρ′, when binary W is row-normalized (i.e., the sum of each row is 1), is

$$\rho' = \rho_{\mathrm{raw}} / \sqrt{Nd} \qquad (1)$$

This is because for a row-normalized W with size N and density d, each row has Nd expected elements with a value of 1/(Nd), with the remaining elements equal to 0. Thus the vector WY, sized N × 1, is essentially a vector of sample means (Ȳ) of Y, with a sample size equal to Nd. By the properties of means and variances of random variables, E(Ȳ) = μ_Y and var(Ȳ) = σ²_Y/(Nd). Therefore,

$$\rho' = \rho_{\mathrm{raw}} \sqrt{\sigma_Y^2/(Nd)} \,\big/ \sqrt{\sigma_Y^2} = \rho_{\mathrm{raw}} / \sqrt{Nd}.$$

In the current paper, we manipulated ρ′ by setting it to one of five levels (0, .1, .2, .3, and .5). We next calculated the raw value of the network effect, ρ_raw, to be used as input to the network autocorrelation model. We computed ρ_raw using Eq. (2), by rearranging Eq. (1):

$$\rho_{\mathrm{raw}} = \rho' \sqrt{Nd} \qquad (2)$$

Once X, β, ρ_raw, and ε were established, we computed the dependent “observed” variable Y by rearranging the network autocorrelation model:

$$Y = (I - \rho_{\mathrm{raw}} W)^{-1}(X\beta + \varepsilon) \qquad (3)$$
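The data-generating steps in Eqs. (2) and (3) can be sketched as follows. This is an illustration, not the authors' R code: the function names, the use of NumPy, the choice to leave isolated nodes with an all-zero row, and the use of the target density (not the realized one) in Eq. (2) are all assumptions of this sketch.

```python
import numpy as np

def random_network(n, density, rng):
    """Symmetric binary adjacency matrix with ties drawn independently."""
    upper = np.triu(rng.random((n, n)) < density, k=1)
    return (upper | upper.T).astype(float)

def simulate_y(W, rho_prime, density, b0=2.0, b1=0.3, rng=None):
    """Generate Y = (I - rho_raw W)^{-1} (X beta + eps) for a row-normalized W."""
    rng = np.random.default_rng() if rng is None else rng
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalize; nodes with no ties keep an all-zero row (assumption).
    Wn = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    rho_raw = rho_prime * np.sqrt(n * density)           # Eq. (2)
    x = rng.standard_normal(n)                            # exogenous variable X
    eps = rng.standard_normal(n)                          # error term
    rhs = b0 + b1 * x + eps                               # X beta + eps
    y = np.linalg.solve(np.eye(n) - rho_raw * Wn, rhs)    # Eq. (3)
    return y, x, eps, Wn, rho_raw

rng = np.random.default_rng(0)
W = random_network(50, 0.10, rng)
y, x, eps, Wn, rho_raw = simulate_y(W, rho_prime=0.3, density=0.10, rng=rng)
```

Solving the linear system instead of forming the inverse explicitly is a numerical convenience; it yields the same Y as Eq. (3).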

5.3. Simulation and analysis procedures

We followed a similar simulation procedure to Mizruchi and Neuman (2008) and Neuman and Mizruchi (2010). All simulations and analyses were conducted within the R statistical program (R Development Core Team, 2012). We estimated parameters for the network autocorrelation models using the lnam function in the sna package (Butts, 2008). The lnam function is based on the maximum likelihood estimation (MLE) algorithm of Anselin (1988).

We ran a series of simulations as we varied our variables of interest: effect size (ρ = 0, .1, .2, .3, and .5),⁴ network size (N, which varied slightly depending on the structure being analyzed), network density (d = .05, .1, and .3), and network structure (random, star, small world, and community). For each simulation condition, we ran 250 replications. For each replication, we constructed a new network and generated new values for the variables. After we ran the estimation function, we recorded the estimated network parameter and its corresponding estimated standard error, and then calculated a test statistic by dividing the estimated network parameter by the corresponding estimated standard error. The test statistics approximately follow a normal distribution, thus for a nominal level of .05 the critical value is 1.96. All test statistics that exceeded 1.96 were flagged as a detected statistically significant network effect.⁵

For conditions in which there was a network effect (i.e., ρ = .1, .2, .3, and .5), we then calculated statistical power (i.e., the probability of detecting effects that do exist) by dividing the number of detected network effects by the number of replications. For conditions in which there was no network effect (i.e., ρ = 0), this same calculation (the number of detected network effects divided by the number of replications) equals the Type I error rate (i.e., the probability of detecting effects that do not exist).
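The detection-rate calculation described above can be sketched as below. The function name and the example z statistics are hypothetical; only the 1.96 cutoff and the rule that significantly negative estimates do not count as detections come from the text.

```python
def detection_rate(z_stats, critical=1.96):
    """Share of replications whose test statistic exceeds the critical value.

    With a true effect this estimates power; with rho = 0 it estimates the
    Type I error rate. Estimates with z below -critical are not counted as
    detections, but their replications stay in the denominator.
    """
    hits = sum(1 for z in z_stats if z > critical)
    return hits / len(z_stats)

# Made-up z statistics for four replications, for illustration only.
example_rate = detection_rate([2.5, 1.0, -2.3, 3.1])
print(example_rate)
```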

Finally, in an attempt to summarize the simulation results, we conducted ordinary least squares regression analyses to examine the linear effects of ρ, network size N, ρ²√N, density, structure, and their interactions on statistical power. Specifically, we ran six regression models to test the four hypotheses and investigate the proposed research question. Model 1 tested Hypotheses 1 and 2 by entering the predictor terms of ρ and network size N, including their interaction term. Model 2 tested Hypothesis 3 by examining the effect of ρ²√N along with its lower-order terms. Model 3 tested Hypothesis 4 to examine the effects of density d. Last, Models 4–6 investigated the proposed research question to examine the effects

⁵ We obtained several negative estimates of ρ that yielded a z statistic below −1.96, which would have been counted as statistically significant under a two-tailed test. Because ρ is never negative in these simulations, we excluded these replications with statistically significant negative ρ from the numerator when calculating statistical power. Otherwise, power would have been overestimated.


Table 2. Regression analysis of statistical power to detect ρ for the network autocorrelation model Y = ρWY + βX + ε.

| Predictor | Model 1 (ρ × N) | Model 2 (ρ² × √N) | Model 3 (density d) | Model 4 (star vs. random) | Model 5 (small-world vs. star & random) | Model 6 (community vs. star & random & small-world) |
|---|---|---|---|---|---|---|
| Constant | −.107 (.056) | .802* (.370) | .814 (.665) | 1.069 (.738) | .792 (.737) | .813 (.519) |
| Effect size (ρ) | 1.911** (.179) | −9.370** (2.870) | −9.185† (5.162) | −12.900* (5.724) | −9.072 (5.719) | −9.021* (4.029) |
| Sample size (N) | .004** (.001) | .016** (.005) | .017* (.008) | .019* (.009) | .016 (.009) | .017* (.006) |
| Density (d) | | | −.094 (3.271) | −.949 (3.385) | .021 (3.618) | −.101 (2.604) |
| Structure: star | | | | .389 (.899) | | |
| Structure: small-world | | | | | .016 (.694) | |
| Structure: community | | | | | | .076 (.320) |
| ρ² | | 17.190** (4.586) | 16.656* (8.247) | 21.580* (9.145) | 17.360 (9.137) | 16.335* (6.438) |
| √N | | −.287** (.087) | −.290 (.154) | −.343* (.167) | −.286 (.167) | −.288* (.120) |
| ρ × N | −.006** (.002) | −.155** (.036) | −.158* (.063) | −.194** (.067) | −.157* (.067) | −.154** (.050) |
| ρ × √N | | 3.312** (.674) | 3.325** (1.198) | 4.096** (1.296) | 3.315* (1.298) | 3.262** (.935) |
| ρ² × N | | .227** (.058) | .230* (.101) | .279* (.107) | .238* (.108) | .223** (.079) |
| ρ² × √N | | −5.107** (1.077) | −5.072** (1.914) | −6.094** (2.070) | −5.234* (2.074) | −4.950** (1.494) |
| d × ρ | | | −.866 (25.384) | 11.580 (26.270) | −3.177 (28.080) | −2.418 (20.212) |
| d × N | | | −.004 (.042) | −.012 (.042) | −.003 (.044) | −.004 (.033) |
| d × √N | | | .022 (.773) | .199 (.788) | .004 (.835) | .021 (.613) |
| d × ρ² | | | 2.687 (40.558) | −13.820 (41.970) | 6.320 (44.870) | 7.221 (32.295) |
| d × ρ × N | | | .020 (.323) | .142 (.326) | .007 (.342) | .003 (.255) |
| d × ρ × √N | | | −.159 (6.000) | −2.737 (6.116) | .182 (6.481) | .186 (4.754) |
| d × ρ² × N | | | −.028 (.517) | −.190 (.520) | −.006 (.547) | .016 (.408) |
| d × ρ² × √N | | | −.052 (9.587) | 3.369 (9.772) | −.591 (10.360) | −.985 (7.596) |
| ρ × structure contrast | | | | −5.036 (6.979) | −.116 (5.389) | −.911 (2.485) |
| N × structure contrast | | | | .001 (.011) | .001 (.008) | .001 (.004) |
| d × structure contrast | | | | −.941 (4.088) | .479 (3.334) | .250 (1.696) |
| ρ² × structure contrast | | | | 6.370 (11.150) | 1.854 (8.611) | 2.613 (3.971) |
| √N × structure contrast | | | | −.048 (.210) | −.010 (.154) | −.022 (.073) |
| ρ × N × structure contrast | | | | −.005 (.087) | −.007 (.061) | −.012 (.030) |
| ρ × √N × structure contrast | | | | .480 (1.627) | .087 (1.198) | .267 (.568) |
| ρ² × N × structure contrast | | | | −.002 (.139) | .024 (.097) | .028 (.048) |
| ρ² × √N × structure contrast | | | | −.479 (2.599) | −.447 (1.914) | −.648 (.908) |
| d × ρ × structure contrast | | | | 11.250 (31.730) | −10.220 (25.880) | −6.101 (13.167) |
| d × N × structure contrast | | | | .001 (.055) | .011 (.039) | .012 (.021) |
| d × √N × structure contrast | | | | .079 (.993) | −.162 (.753) | −.138 (.389) |
| d × ρ² × structure contrast | | | | −16.160 (50.700) | 17.230 (41.340) | 13.770 (21.038) |
| d × ρ × N × structure contrast | | | | −.048 (.424) | −.166 (.302) | −.153 (.159) |
| d × ρ × √N × structure contrast | | | | −.328 (7.709) | 2.974 (5.843) | 2.270 (3.019) |
| d × ρ² × N × structure contrast | | | | .073 (.677) | .273 (.482) | .305 (.254) |
| d × ρ² × √N × structure contrast | | | | .378 (12.320) | −5.045 (9.336) | −4.992 (4.823) |
| R² | .492** | .709** | .716** | .736** | .724** | .830** |
| ΔR² | | .217** | .007 | .020** | .008 | .114** |

*p < .05; **p < .01. Unstandardized coefficients are reported with standard errors in parentheses. ΔR² significance testing was based on an F-test: Models 2 and 3 were compared to Models 1 and 2, respectively; and Models 4, 5, and 6 were each compared to Model 3.


[Figure 2: four panels, (a) Random, (b) Star, (c) Small-World, and (d) Community. Each panel contains three graphs, one per density (.05, .1, and .3), plotting power (vertical axis, 0 to 1) against network size (horizontal axis), with one line per effect size ρ = 0, .1, .2, .3, and .5.]

Fig. 2. Statistical power of the four network structures, as a function of network size, density, and effect size.

6. Results

Statistical power and Type I error rates of all the studied conditions are presented in Fig. 2 [each panel represents a different structure; within each panel the three graphs represent different network densities (d); within each graph the five lines represent different network effect sizes (ρ); and within each line the points represent different network sizes (N)], and the regression results are presented in Table 2.

6.1. Power and network size

Fig. 2 shows a clear pattern that the statistical power of the network autocorrelation model increases with network size, until it reaches the perfect power level of 1.0. This monotonically increasing pattern holds for almost all the conditions of random, star, and small-world networks, and for low density community networks. For high density (d = .3) small-world networks and for medium and high density (d ≥ .1) community networks, the same monotonically increasing pattern holds, except for the largest network effect size (ρ = .5). That is, for high density community networks with very strong network effects (ρ = .5), the results are much more anomalous. In general, however, the results are encouraging as power is typically high (i.e., greater than .8) under conditions with a manageable network size (under 100) when the network effect is moderately large (e.g., ρ = .3). Regression analysis in Table 2 shows that, all else equal, network size has a significant positive effect on power (Model 1; supporting Hypothesis 1), indicating that increasing network size generally increases power (b = .004; so increasing network size by 5 nodes improves power by 2%, on average). We should note, however, that when the network effect size is moderate-to-large (ρ ≥ .3) the relationship between network size and power appears to be much stronger at small network sizes (i.e., a nonlinear relationship between N and power, that depends upon ρ). This pattern was also confirmed by a significant interaction effect between effect size and network size (ρ × N, Model 1 in Table 2), and more specifically by the significant interaction effect between effect size squared and the square root of network size (ρ²√N; Model 2 in Table 2), supporting Hypothesis 3.

6.2. Power and network effect ρ

Fig. 2 also reveals that the network effect size is one of the main factors that influences power, which is consistent with Cohen’s (1988) claim (Hypothesis 2 supported, Model 1, Table 2). With strong network effects, even (relatively) small networks can have high statistical power for detecting autocorrelation. For example, when ρ = .5 in the small-world and community networks (panels c and d) with density of .05 or .1, a network size of 40 yields almost perfect power. In contrast, when effect size is small (ρ = .1), only one condition (community networks with density = .1 and network size = 200) has power greater than .6, and the average power one can get with a larger network size of 200 is only .47 across all four structures and three levels of density. When effect size is modest (ρ = .2), across all the conditions, one can get an average power of .96 with a network size of 200, and an average power of .81 with a network size of 100. For a common network effect size of ρ = .3, the average power is .92 when the network size is 75. As a rule of thumb, it appears that one can typically obtain a desired power of .80 to detect a network effect by using a network size of 40–80 nodes.

6.3. Power and network density

Compared to network size (N) and network effect size (ρ), network density exerts a much smaller effect on power. In Fig. 2, the lines with the same style, which represent the same ρ levels, change slightly under different levels of density within each network structure. The regression analysis in Table 2 (Model 3) also reveals that overall, network density (both the main effect of density and all density interaction effects with effect size ρ and network size N terms) had almost no effect on power. That is, after including terms of effect size ρ and network size N in the regression model (Model 2), adding density and density interaction terms only increased R² by .007 (Model 3). In addition, none of the interaction effects with density was statistically significant. These results indicate that the impact of network density on statistical power of the network autocorrelation model is almost nil. To restate, the ΔR² = .7% for Model 3 over Model 2 (p > .05, n.s.) is a negligible effect.

6.4. Power and network structure

To analyze the effect of network structure on statistical power, we conducted multiple regression analyses with orthogonal contrast coding (Models 4–6). Specifically, in Model 4, we compared the star structure against the random structure (coding random = −1, star = 1, small-world = 0, and community = 0). In Model 5, we contrasted small-world with random and star structures (coding random = −1, star = −1, small-world = 2, and community = 0). Finally, in Model 6, we contrasted community structure against all other network structures (coding random = −1, star = −1, small-world = −1, and community = 3). The advantage of analyzing network structure using these orthogonal contrast codes is that the network structure effects (i.e., changes in R² of Models 4, 5, and 6) are all independent of each other but can also be summed to yield the total impact of network structure on statistical power.
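That the three contrast codes are mutually orthogonal can be confirmed with a short check. The dictionary keys are hypothetical labels; the code values are the ones given above, ordered as (random, star, small-world, community).

```python
from itertools import combinations

# Contrast codes ordered as (random, star, small-world, community).
contrasts = {
    "star_vs_random": (-1, 1, 0, 0),
    "small_world_vs_star_random": (-1, -1, 2, 0),
    "community_vs_rest": (-1, -1, -1, 3),
}

# Each contrast sums to zero, and every pair has a zero dot product.
sums = {name: sum(code) for name, code in contrasts.items()}
dots = {
    (a, b): sum(u * v for u, v in zip(contrasts[a], contrasts[b]))
    for a, b in combinations(contrasts, 2)
}
print(sums)
print(dots)
```

The zero dot products make the codes orthogonal when the four structures contribute equal numbers of conditions; with unequal cell sizes the codes would be only approximately orthogonal.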

Interestingly, we found that network structure did not have much impact on statistical power, as regards the star and small-world structures (see Model 4 and Model 5 in Table 2). That is, the effects for the star structure (ΔR² = 2.0%) and for the small-world structure (ΔR² = .8%) are very small. However, we found that the community network structure had a relatively larger, yet negative, effect on statistical power (ΔR² = 11.4%), but only when ρ, density, and N are all simultaneously large (see Fig. 2).

To further confirm these results, we also ran a full 4-way interaction model (i.e., ρ² × √N × d × structures) with all 72 predictors (i.e., ρ, N, ρ², √N, d, three orthogonal network structure variables, the interaction terms ρ² × √N × d × star, ρ² × √N × d × small-world, ρ² × √N × d × community, and all the lower-order terms). In analysis not reported here, the change in R² for this full 72-predictor model beyond Model 3 (ΔR² = 18%) was approximately equal to the sum of all three ΔR² values for the network structure Models 4, 5, and 6 (i.e., 2.0% + .8% + 11.4% = 14.2%; that is, the three network structure orthogonal contrast variables together account for almost all the variance due to network structure). In this full model with 72 predictors, the only statistically significant individual terms were ρ × √N, ρ² × √N, and ρ² × √N × d × community. This result is fully consistent with Models 1 through 6 in Table 2 and with Fig. 2. Consistent with Hypotheses 1 through 3, ρ, N, and ρ²√N primarily drive statistical power, and the answer to Research Question 1 is that when it comes to network structure, only community structure has a [negative] effect on statistical power, but only when ρ, density, and N are all large simultaneously (see Fig. 2). Results of this 72-predictor analysis are available upon request.

To make our results on power more instructive, we further constructed a power table (Table 3) based on our findings. This table presents the network size (N) required for a power level of .80, as a function of network effect size ρ, network density, and network structure. In practice, researchers and practitioners can take this table as a useful guide when designing a study involving the network autocorrelation model. In many situations, an approximate network effect size (ρ) can be obtained or approximately estimated from previous similar studies, and network density and structure can also often be inferred, with more or less confidence, from past research. With such estimates in mind, researchers may use this table to plan a sufficient network size in order to obtain a powerful test of the network effect, for which they are likely to be able to detect network effects that really exist (i.e., likely to reject the null hypothesis of no network effect).

At this point, we further note that Table 3 might appear, upon first inspection, to contradict Table 2. That is, Table 3 seems to emphasize network density and network structure as important features in determining the statistical power of a network autocorrelation study, whereas Table 2 seems to indicate that network density and network structure have only very minor impacts on power (ΔR² ≤ 2% in Table 2), with network structure only having important consequences in the case of the community structure. To clarify the apparent distinction between Tables 2 and 3, we note that network density and network structure have little overall effect on statistical power (Table 2 and Fig. 2), in that density and structure do little to change the rank-order of various study designs in terms of their statistical power. However, network density and structure do slightly change the shape (if not the rank order) of the power curves in Fig. 2. As such, when we select a cut point to define “adequate power” (i.e., power = 80%, in the case of Table 3), then network density and structure can relate to the point at which the power curve crosses the 80% threshold, even if density and structure do not have a major effect on power overall. Table 2 describes the overall relationships between network density, network structure, and power; whereas Table 3 supports one typical application of power tables: to estimate the exact required sample size N above which power can be judged to be adequate (i.e., power ≥ 80% chance of detecting an effect that really exists).

Table 3. Network size corresponding to power = .80, holding significance level at α = .05.

| Network structure | d = .05, ρ = .1 | d = .05, ρ = .2 | d = .05, ρ = .3 | d = .05, ρ = .5 | d = .10, ρ = .1 | d = .10, ρ = .2 | d = .10, ρ = .3 | d = .10, ρ = .5 | d = .30, ρ = .1 | d = .30, ρ = .2 | d = .30, ρ = .3 | d = .30, ρ = .5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | >200 | 63 | 32 | 19 | >200 | 106 | 25 | 11 | >200 | 189 | 79 | 24 |
| Star | >200 | 117 | 73 | 50 | >200 | 133 | 55 | 41 | >200 | >200 | 74 | 23 |
| Small-world | >200 | 86 | 40 | <21 | >200 | 81 | 40 | <21 | >200 | 76 | 36 | <21 |
| Community | >200 | 82 | 34 | <20 | >200 | 73 | naᵃ | <20 | >200 | 69 | 34 | naᵃ |

ᵃ Power curve is nonmonotonic in this case, so no minimum sample size bound can be determined. In the table above, network effect size ρ has a major impact on statistical power, whereas network density and network structure have little overall effect on power (see Table 2 and Fig. 2). That is, a larger effect size ρ always decreases the required sample size to achieve 80% power. In contrast, network density and network structure, while relevant to power as shown above, do not have consistent overall, directional effects on power.

6.5. Type I error rates

Our main interest in this paper lies with examining statistical power (i.e., 1 – Type II error) for identifying true network effects via the autocorrelation model. However, running simulations with ρ = 0 (i.e., the dotted lines with asterisk markers in each graph of Fig. 2) also allows us to examine the Type I error of identifying network effects (i.e., the probability of supporting effects that in reality do not exist). Type I error rates were consistently low across all the conditions. Of the 68 simulation conditions we ran where ρ = 0, in 66 conditions (97.1%), the Type I error rates were less than .05. Further regression analysis showed that network size, density, and star structure exerted significant unique [negative] main effects on Type I error rate (b_size = −.000, β_size = −.284, p < .01; b_density = −.051, β_density = −.431, p < .01; b_star = −.009, β_star = −.280, p < .05). Other effects were not statistically significant (b_small-world = −.004, p = .327; b_community = −.002, p = .508), though we note that Type I error rates were uniformly low, and had little overall variability in this analysis.

7. Summary and concluding remarks

Network autocorrelation models have shown great usefulness in many disciplines, and their applications are expected to continue thriving in the coming years. Despite this optimistic outlook, previous research has converged on a vexing problem: the estimate of the network effect parameter ρ is consistently negatively biased. This problem, like a fly in the ointment, annoys and even discourages many researchers and potential users of this model. Our current paper turned around the problem and looked at it from a positive angle. Specifically, we systematically investigated the likelihood of correctly identifying a nonzero network effect parameter ρ under various network conditions, in an attempt to provide useful guidelines for users of the network effects model.

Our findings indeed bring happy news to researchers and users of the network autocorrelation model. Results of this paper indicate (not surprisingly) that the statistical power for correctly identifying the network effect in the model increases with network size; but also show that, for most networks, a network size of 40–80 is sufficient to obtain a decent statistical power level of .80. Our results further suggest (also not surprisingly) that statistical power increases with the increasing magnitude of the network effect size ρ. Although a monotonic relationship between statistical power and both network size and effect size is not surprising, the requisite levels of network size and effect size needed to attain power of 80% or higher for this model were previously unknown (see Table 3). In addition, we found that network structure also influenced the likelihood of identifying the network effect, but only for community networks, and then only when effect size, network density, and sample size are all simultaneously large. Overall, we believe that these findings have many practical and important implications for social network researchers. For example, if one attempts to obtain an expected network effect when planning a network study, s/he might be concerned with statistical power for detecting the desired network effect. Our results suggest that statistical power can be affected by the network size (nonlinearly), magnitude of the network effect (nonlinearly), and to a much lesser degree network density and structure, in systematic and interactive ways.

Perhaps a more instructive finding of this study is that ρ²√N seems to primarily drive power for the network autocorrelation model; the model with ρ²√N explains power quite well (R² = 70.9%; Model 2 in Table 2). Network density does not matter much beyond ρ² and √N (ΔR² = .7%; Model 3 in Table 2). Network structure also does not matter much, except where the network has a community structure with ρ ≥ .3 and density > .1, in which case network size begins to have unusual negative effects on power. In sum, Model 2 appears to possess high explanatory value, parsimony, and elegance, which might be good news for researchers seeking to estimate statistical power of the network autocorrelation model. Once one has a reasonable estimate or guess of the network effect size ρ, estimating the statistical power can be a relatively easy task using Model 2.
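As a rough illustration of how Model 2 could be used this way, the sketch below plugs ρ and N into the rounded Model 2 coefficients from Table 2. The function name and the clipping of predictions to the interval [0, 1] are assumptions of this sketch, and because the published coefficients are rounded the result is only approximate.

```python
import math

# Unstandardized Model 2 coefficients as printed in Table 2 (rounded).
MODEL2 = {
    "const": 0.802, "rho": -9.370, "n": 0.016, "rho2": 17.190,
    "sqrt_n": -0.287, "rho_n": -0.155, "rho_sqrt_n": 3.312,
    "rho2_n": 0.227, "rho2_sqrt_n": -5.107,
}

def predicted_power(rho, n, coef=MODEL2):
    """Linear prediction of power from Model 2, clipped to [0, 1] (assumption)."""
    root_n = math.sqrt(n)
    value = (
        coef["const"]
        + coef["rho"] * rho
        + coef["n"] * n
        + coef["rho2"] * rho ** 2
        + coef["sqrt_n"] * root_n
        + coef["rho_n"] * rho * n
        + coef["rho_sqrt_n"] * rho * root_n
        + coef["rho2_n"] * rho ** 2 * n
        + coef["rho2_sqrt_n"] * rho ** 2 * root_n
    )
    return min(1.0, max(0.0, value))

power_rho3_n75 = predicted_power(0.3, 75)
print(round(power_rho3_n75, 2))
```

For ρ = .3 and N = 75 the prediction comes out at about .92, in line with the average power reported for that condition in Section 6.2. Predictions outside the simulated range of N (roughly 10 to 200) or ρ (0 to .5) should not be trusted.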

Future research might reveal that statistical power of the network autocorrelation model can vary across different types of networks. In practice, network properties are likely to vary by the types of network relations. For instance, in a given group of people, the communication network, the friendship network, and the advice-seeking network might have different densities, structures, and magnitudes of the network effect size. Moreover, network size can also be controlled by defining different network boundaries. Thus our findings provide researchers with useful guidelines for making important decisions regarding the choice of an appropriate type of network and determining the reasonable network boundary when designing a network effects study (e.g., a study of social influence, social contagion, or social comparison; Leenders, 2002).

Our results indicate that the statistical power of the network autocorrelation model seems to be inconsistent with, or at least unrelated to, the negative parameter bias issue found by many previous studies (Farber et al., 2010; Fujimoto et al., 2011; Mizruchi and Neuman, 2008; Neuman and Mizruchi, 2010; Smith, 2009). Despite the underestimation bias for ρ under high density conditions found in the previous research, the current study shows very little decrement in statistical power attributable to network density. We also show very little decrement in power due to network structure, with the single exception of the anomalous findings for community structure under the simultaneous conditions of high density, high ρ, and large N. In other words, the underestimation bias found in many past studies of the network autocorrelation model does not necessarily translate into chronic problems for the statistical power of the model.

Another important finding related to the statistical power is Type I error rates. Our results show that when the network effect ρ is 0, the Type I error rate of erroneously identifying a nonexistent effect is smaller than the nominal rate of .05 and is stable across conditions. These results for Type I error rates further demonstrate the robustness of the network autocorrelation model.

One area that needs more attention in the power results is the anomalous pattern observed in the community structure under conditions where both network effect size ρ and network density are high. Specifically, for a density of .3, the commonly observed pattern that statistical power increases with network size (N) disappears, replaced by an anomalous pattern which shows a declining power trend at large values of ρ. As helpfully suggested by an anonymous reviewer, this is possibly due to the collinearity problem in network autocorrelation models (Kelejian and Prucha, 2002). Kelejian and Prucha (2002) found that vector WY is proportional to the error term and collinear with the intercept, especially for a single cross section and when W is row-normalized, which is the case for our current simulations. One possibility for alleviating this problem is to use panel data with multiple network matrices, as demonstrated by Kelejian and Prucha (2002). We call for further attention to these potential mechanisms in future research on community structures with high network density.
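The collinearity concern can be illustrated numerically. The sketch below is an illustration under simplifying assumptions (i.i.d. standard-normal y rather than data generated by the model, and Erdős–Rényi graphs instead of community structures; all names are hypothetical). It shows that as a row-normalized network becomes denser, Wy varies less and less across nodes, so it behaves increasingly like a constant and becomes hard to separate from the intercept.

```python
import numpy as np

def row_normalize(adj):
    """Scale each row of an adjacency matrix to sum to 1 (zero rows stay zero)."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)

def spread_ratio(n, density, seed=0):
    """Return std(Wy) / std(y) for a random graph of the given density."""
    rng = np.random.default_rng(seed)
    if density >= 1.0:
        # Complete graph: every node is tied to every other node.
        adj = np.ones((n, n)) - np.eye(n)
    else:
        upper = np.triu(rng.random((n, n)) < density, k=1)
        adj = (upper | upper.T).astype(float)
    y = rng.standard_normal(n)
    return np.std(row_normalize(adj) @ y) / np.std(y)

# The ratio shrinks as density grows; in the complete graph
# Wy_i = (sum(y) - y_i) / (n - 1), so the ratio is exactly 1 / (n - 1).
ratios = {d: spread_ratio(200, d) for d in (0.05, 0.3, 1.0)}
```

In the complete-graph limit Wy is an exact linear function of y and the constant, which is the equal-weights case that Kelejian and Prucha (2002) analyze.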

To summarize, the statistical power of the network autocorrelation model is depicted in Fig. 2. The pictures in Fig. 2 can be summarily described using the Table 2 equations, which reveal that power is primarily a function of ρ, network size N, and the higher-order term ρ²√N. Network density has little effect on power. Network structure also has little effect on power, except for community networks with very high between-group density and a moderate-to-strong network effect (ρ ≥ .3). Indeed, the lower right corner of Fig. 2 reveals boundary conditions where traditional notions of statistical power begin to break down as an interactive function of network density, network structure, network effect size ρ, and network size N. Future researchers who observe or expect network conditions similar to the lower right corner of Fig. 2 can be warned that the statistical power of the network autocorrelation model behaves oddly in this region.

We close with three conclusions from this study. First, despite past studies converging on the finding of negative bias in estimation of the network effect ρ, our study shows that it can nonetheless be surprisingly easy to identify a significant network effect. For most networks, a size of 40–80 is sufficient to achieve a power of .80, given a common network effect size such as ρ = .3. Second, the statistical power of the network autocorrelation model seems to be primarily driven by ρ, N, and ρ²√N, while network density and structure have little overall impact on statistical power. Lastly, the network autocorrelation model exhibits reasonable Type I error control across network conditions. In sum, statistical power is a novel approach to studying the network autocorrelation model, which advances our understanding of the model and introduces new issues and study topics in social network research and design. We hope our study will attract more attention to, and enhance future applications of, the network effects model.


Acknowledgement

There is no acknowledgement at this moment.

References

Abraham, W.T., Russell, D.W., 2008. Statistical power analysis in psychological research. Soc. Pers. Psychol. Compass 2 (1), 283–301.

Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic, Dordrecht.

Anselin, L., Hudak, S., 1992. Spatial econometrics in practice: a review of three software options. Reg. Sci. Urban Econ. 22, 509–536.

Butts, C.T., 2008. Social network analysis with SNA. J. Stat. Softw. 24, 1–51.

Cho, W.K.T., 2003. Contagion effects and ethnic contribution networks. Am. J. Polit. Sci. 47, 368–387.

Cliff, A.D., Ord, J.K., 1981. Spatial Processes: Models and Applications. Pion, London.

Cohen, J., 1962. The statistical power of abnormal-social psychological research: a review. J. Abnorm. Soc. Psychol. 65, 145–153.

Cohen, J., 1969. Statistical Power Analysis for the Behavioral Sciences. Academic Press, New York.

Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Lawrence Erlbaum Associates, Hillsdale, NJ.

Cohen, J., 1992. A power primer. Psychol. Bull. 112, 155–159.

Crowder, K., South, S.J., 2008. Spatial dynamics of white flight: the effects of local and extralocal racial conditions on neighborhood out-migration. Am. Sociol. Rev. 73, 792–812.

Doreian, P., 1980. Linear models with spatially distributed data: spatial disturbances or spatial effects? Sociol. Methods Res. 9, 29–60.

Doreian, P., 1981. Estimating linear models with spatially distributed data. Sociol. Methodol. 11, 359–388.

Doreian, P., 1990. Network autocorrelation models: problems and prospects. In: Griffith, D.A. (Ed.), Spatial Statistics: Past, Present, and Future, Monograph, vol. 12. Institute of Mathematical Geography, Ann Arbor, pp. 369–389.

Doreian, P., Teuter, K., Wang, C.-S., 1984. Network autocorrelation models: some Monte Carlo results. Sociol. Methods Res. 13, 155–200.

Dow, M.M., 2007. Galton’s problem as multiple network autocorrelation effects: cultural trait transmission and ecological constraint. Cross-Cult. Res. 41, 336–363.

Dow, M.M., Eff, E.A., 2008. Global, regional, and local network autocorrelation in the standard cross-cultural sample. Cross-Cult. Res. 42, 148–171.

Dow, M.M., Burton, M.L., White, D.R., 1982. Network autocorrelation: a simulation study of a foundational problem in regression and survey research. Soc. Networks 4, 169–200.

Farber, S., Páez, A., Volz, E., 2009. Topology and dependency tests in spatial and network autoregressive models. Geogr. Anal. 41, 158–180.

Farber, S., Páez, A., Volz, E., 2010. Topology, dependency tests and estimation bias in network autoregressive models. Prog. Spatial Anal. Adv. Spatial Sci., 29–57.

Francis, G., 2012a. Evidence that publication bias contaminated studies relating social class and unethical behavior. Proc. Natl. Acad. Sci. 109, E1587.

Francis, G., 2012b. Replication initiative: beware misinterpretation. Science 336 (6083), 802.

Franzese, R.J., Hays, J.C., 2007. Spatial-econometric models of cross-sectional interdependence in political-science panel and time-series-cross-section data. Polit. Anal. 15, 140–164.

Franzese, R.J., Hays, J.C., Kachi, A., 2012. Modeling history dependence in network-behavior coevolution. Polit. Anal. 20, 175–190.

Fujimoto, K., Chou, C-P., Valente, T.W., 2011. The network autocorrelation model using two-mode data: affiliation exposure and potential bias in the autocorrelation parameter. Soc. Networks 33, 231–243.

Girvan, M., Newman, M.E.J., 2002. Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99, 7821–7826.

Ibarra, H., Andrews, S.B., 1993. Power, social influence, and sense making: effects of network centrality and proximity on employee perceptions. Adm. Sci. Q. 38 (2), 277.

Kelejian, H.H., Prucha, I.R., 2002. 2SLS and OLS in a spatial autoregressive model with equal weights. Reg. Sci. Urban Econ. 32, 691–707.

Kraemer, H.C., Thiemann, S., 1987. How Many Subjects? Statistical Power Analysis in Research. Sage, Newbury Park, CA.

Leenders, R.A.J., 2002. Modeling social influence through network autocorrelation: constructing the weight matrix. Soc. Networks 24, 21–47.

LeSage, J.P., 1999. Spatial Econometrics. Web Book of Regional Science. http://www.rri.wvu.edu/regscweb.htm

Loftin, C., Ward, S.K., 1983. A spatial autocorrelation model of the effects of population density on fertility. Am. Sociol. Rev. 48, 121–128.

Mizruchi, M.S., Neuman, E.J., 2008. The effect of density on the level of bias in the network autocorrelation model. Soc. Networks 30, 190–200.

Mizruchi, M.S., Stearns, L.B., Marquis, C., 2006. The conditional nature of embeddedness: a study of borrowing by large U.S. firms 1973–1994. Am. Sociol. Rev. 71, 310–333.

Murphy, K.R., Myors, B., 1998. Statistical Power Analysis: A Simple and General Model for Traditional and Modern Hypothesis Tests. Lawrence Erlbaum Associates, Mahwah, NJ.

Neuman, E.J., Mizruchi, M.S., 2010. Structure and bias in the network autocorrelation model. Soc. Networks 32, 290–300.

Ord, K., 1975. Estimation methods for models of spatial interaction. J. Am. Stat. Assoc. 70, 120–126.

Pisati, M., 2001. Tools for spatial data analysis. Stata Tech. Bull. 60, 21–37.

R Development Core Team, 2012. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0. http://www.R-project.org/

Rosenthal, R., 1979. The ‘file drawer problem’ and tolerance for null results. Psychol. Bull. 86, 638–641.

Schmidt, F.L., 1992. What do data really mean? Research findings, meta-analysis, and cumulative knowledge in psychology. Am. Psychol. 47, 1173–1181.

Simmons, J.P., Nelson, L.D., Simonsohn, U., 2011. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22 (11), 1359–1366.

Smith, T.E., 2009. Estimation bias in spatial models with strongly connected weight matrices. Geogr. Anal. 41, 307–332.

Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442.

White, D.R., Burton, M.L., Dow, M.M., 1981. Sexual division of labor in African agriculture: a network autocorrelation analysis. Am. Anthropol. 83, 824–849.