Demography 7263 fall 2015 spatially autoregressive models 2



DEM 7263 Fall 2015 - Spatially Autoregressive Models 2
Corey S. Sparks, Ph.D.
September 16, 2015

Spatial Regression Models
This lecture builds off the previous lecture on the Spatially Autoregressive Model (SAR) with either a lag or error specification. The lag model is written:

$$Y = \rho W Y + X'\beta + e$$

Where $Y$ is the dependent variable, $X$ is the matrix of independent variables, $\beta$ is the vector of regression parameters to be estimated from the data, and $\rho$ is the autoregressive coefficient, which tells us how strong the resemblance is, on average, between $Y_i$ and its neighbors. The matrix $W$ is the spatial weight matrix, describing the spatial network structure of the observations, like we described in the ESDA lecture.

In the lag model, we are specifying the spatial component on the dependent variable. This leads to a spatial filtering of the variable, where its values are averaged over the surrounding neighborhood defined in $W$, called the spatially lagged variable. In R we use the spdep package, and the lagsarlm() function to fit this model.

The error model says that the autocorrelation is not in the outcome itself, but instead, any autocorrelation is attributable to there being missing spatial covariates in the data. If these spatially patterned covariates could be measured, then the autocorrelation would be 0. This model is written:

$$Y = X'\beta + e$$

$$e = \lambda W e + v$$

This model, in effect, controls for the nuisance of correlated errors in the data that are attributable to an inherently spatial process, or to spatial autocorrelation in the measurement errors of the measured and possibly unmeasured variables in the model. This model is estimated in R using errorsarlm() in the spdep library.

Examination of Model Specification
To some degree, both of the SAR specifications allow us to model spatial dependence in the data. The primary difference between them is where we model said dependence.

The lag model says that the dependence affects the dependent variable only; we can liken this to a diffusion scenario, where your neighbors have a diffusive effect on you.

The error model says that dependence affects the residuals only. We can liken this to the missing spatially dependent covariate situation, where, if only we could measure another really important spatially associated predictor, we could account for the spatial dependence. But alas, we cannot, and we instead model dependence in our errors.



These are inherently two completely different ways to think about specifying a model, and we should really make our decision based upon how we think our process of interest operates.

That being said, this way of thinking isn’t necessarily popular among practitioners. Most practitioners want the best fitting model, ‘nuff said. So methods have been developed that test for alternate model specifications, to see which kind of model best summarizes the observed variation in the dependent variable and the spatial dependence.
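To make the difference between the two specifications concrete, here is a minimal base R sketch that simulates data from each one. Everything in it is an assumption for illustration: the five regions on a line, the coefficient values, and the values of rho and lambda are made up and are not taken from the mortality data used later.

```r
# Hypothetical toy example: 5 regions on a line, neighbors are adjacent regions
set.seed(1)
n <- 5
A <- matrix(0, n, n)
A[abs(row(A) - col(A)) == 1] <- 1      # binary adjacency matrix
W <- A / rowSums(A)                    # row-standardized weights (like style="W")
X <- cbind(1, rnorm(n))                # intercept plus one predictor
beta <- c(1, 2)                        # assumed regression coefficients
rho <- 0.5                             # assumed lag parameter
lambda <- 0.5                          # assumed error parameter
v <- rnorm(n)                          # independent noise
I_n <- diag(n)

# Lag model: Y = rho*W*Y + X*beta + e  implies  Y = (I - rho*W)^{-1} (X*beta + e)
y_lag <- solve(I_n - rho * W, X %*% beta + v)

# Error model: Y = X*beta + e with e = lambda*W*e + v  implies  e = (I - lambda*W)^{-1} v
e <- solve(I_n - lambda * W, v)
y_err <- X %*% beta + e
```

In the lag version the neighbors' outcomes feed back into each region's outcome, while in the error version only the disturbances are spatially smoothed.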

More exotic types of spatial dependence
Spatial Durbin Model
Another form of a spatial lag model is the Spatial Durbin Model (SDM). This model is an extension of the ordinary lag or error model that includes spatially lagged independent variables. If you remember, one issue that commonly occurs with the lag model is that we often have residual autocorrelation in the model. This autocorrelation could be attributable to a missing spatial covariate. We can get a kind of spatial covariate by lagging the predictor variables in the model using $W$. This model can be written:

$$Y = \rho W Y + X'\beta + W X \theta + e$$

Where the parameter vector $\theta$ now holds the regression coefficients for the lagged predictor variables. We can also include the lagged predictors in an error model, which gives us the Durbin Error Model (DEM):

$$Y = X'\beta + W X \theta + e$$

$$e = \lambda W e + v$$

Generally, the spatial Durbin model is preferred to the ordinary error model, because we can include the “unspecified” spatial covariates from the error model into the Durbin model via the lagged predictor variables.

Spatially Autoregressive Moving Average Model
Further extensions of these models include dependence on both the outcome and the error process. Two models are described in LeSage and Pace (https://books.google.com/books?id=EKiKXcgL-D4C&hl=en): the Spatial Autocorrelation Model, or SAC model, and the Spatially Autoregressive Moving Average model (SARMA model). The SAC model is:

$$Y = \rho W_1 Y + X'\beta + e$$

$$e = \theta W_2 e + v$$

$$Y = (I_n - \rho W_1)^{-1} X'\beta + (I_n - \rho W_1)^{-1} (I_n - \theta W_2)^{-1} v$$

Where you can potentially have two different spatial weight matrices, $W_1$ and $W_2$. Here, the lagged error term is taken over all orders of neighbors, leading to a more global error process, while the SARMA model has form:

$$Y = \rho W_1 Y + X'\beta + u$$

$$u = (I_n - \theta W_2) e$$

$$e \sim N(0, \sigma^2 I_n)$$

$$Y = (I_n - \rho W_1)^{-1} X'\beta + (I_n - \rho W_1)^{-1} (I_n - \theta W_2) e$$

which gives a “locally” weighted moving average to the residuals, which will average the residuals only in the local neighborhood, instead of over all neighbor orders.
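The contrast between the “global” SAC error process and the “local” SARMA moving average can be seen in a small base R sketch. The seven regions on a line and the value of theta are assumptions chosen only for illustration. The sketch compares the inverse multiplier, which touches every region connected to region 1, with the moving-average multiplier, which touches only its first-order neighbor.

```r
# Hypothetical toy example: 7 regions on a line
n <- 7
A <- matrix(0, n, n)
A[abs(row(A) - col(A)) == 1] <- 1
W <- A / rowSums(A)                        # row-standardized weights
theta <- 0.5                               # assumed error parameter
I_n <- diag(n)

global_mult <- solve(I_n - theta * W)      # SAC-type error multiplier (all neighbor orders)
local_mult  <- I_n - theta * W             # SARMA-type moving-average multiplier (first order only)

# Weights that region 1 places on the shocks of regions 1..7
round(global_mult[1, ], 3)                 # nonzero everywhere, decaying with distance
local_mult[1, ]                            # nonzero only for region 1 and its neighbor
```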

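Fitting these models in R can be done in the spdep library (in more recent releases, the model-fitting functions such as lagsarlm() and errorsarlm() have moved to the companion spatialreg package).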

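library(spdep)        #neighbors, weights, and the spatial regression functions
library(maptools)     #provides readShapePoly()
library(RColorBrewer) #provides brewer.pal() for the map below
spdat<-readShapePoly("~/Google Drive/dem7263/data/usdata_mort.shp")
#Create a k=4 nearest neighbor set
us.nb4<-knearneigh(coordinates(spdat), k=4)
us.nb4<-knn2nb(us.nb4)
us.wt4<-nb2listw(us.nb4, style="W")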

hist(spdat$mortrate)

spplot(spdat,"mortrate", at=quantile(spdat$mortrate), col.regions=brewer.pal(n=5, "Reds"), main="Spatial Distribution of US Mortality Rate")



fit.1.us<-lm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat)
summary(fit.1.us)



## 
## Call:
## lm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8305 -0.4275  0.0283  0.4764  4.3289 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.11679    0.01987   5.877 4.63e-09 ***
## scale(ppersonspo)  0.60711    0.01705  35.615  < 2e-16 ***
## scale(p65plus)    -0.04993    0.01521  -3.283  0.00104 ** 
## scale(pblack_1)    0.11096    0.01637   6.780 1.44e-11 ***
## scale(phisp)      -0.28913    0.01496 -19.327  < 2e-16 ***
## I(RUCC >= 7)TRUE  -0.25367    0.03133  -8.096 8.09e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.757 on 3061 degrees of freedom
## Multiple R-squared:  0.4279, Adjusted R-squared:  0.427 
## F-statistic: 457.9 on 5 and 3061 DF,  p-value: < 2.2e-16

lm.morantest(fit.1.us, listw=us.wt4)

## 
##  Global Moran's I for regression residuals
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 7),
## data = spdat)
## weights: us.wt4
## 
## Moran I statistic standard deviate = 32.692, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran's I        Expectation           Variance 
##        0.399924558       -0.001324880        0.000150646
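For intuition about the “Observed Moran's I” reported for the residuals, the statistic can be computed by hand as $I = \frac{n}{S_0}\frac{e'We}{e'e}$, where $S_0$ is the sum of all weights. The base R sketch below uses a made-up six-region example (the data, the line-shaped neighbor structure, and the object names are all assumptions); it only reproduces the point estimate, not the expectation or variance that lm.morantest() uses for inference.

```r
# Hypothetical toy example: 6 regions on a line with simulated data
set.seed(2)
n <- 6
A <- matrix(0, n, n)
A[abs(row(A) - col(A)) == 1] <- 1
W <- A / rowSums(A)                   # row-standardized weights
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
res <- residuals(lm(y ~ x))           # OLS residuals

S0 <- sum(W)                          # equals n when W is row-standardized
moran_I <- (n / S0) * as.numeric(t(res) %*% W %*% res) / sum(res^2)
moran_I
```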

#SAR - Lag model
fit.lag<-lagsarlm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, listw=us.wt4, type="lag", method="MC")
summary(fit.lag, Nagelkerke=T)



#SAR - Error model
fit.err<-errorsarlm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, listw=us.wt4, etype="error", method="MC")
summary(fit.err, Nagelkerke=T)

## 
## Call:
## lagsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     listw = us.wt4, type = "lag", method = "MC")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -3.466301 -0.344081  0.018554  0.372984  4.207889 
## 
## Type: lag 
## Coefficients: (numerical Hessian approximate standard errors) 
##                    Estimate Std. Error  z value  Pr(>|z|)
## (Intercept)        0.089962   0.016594   5.4213 5.918e-08
## scale(ppersonspo)  0.388805   0.015567  24.9763 < 2.2e-16
## scale(p65plus)    -0.011433   0.012948  -0.8830  0.377211
## scale(pblack_1)    0.039483   0.013815   2.8580  0.004263
## scale(phisp)      -0.159103   0.013043 -12.1983 < 2.2e-16
## I(RUCC >= 7)TRUE  -0.193846   0.026215  -7.3945 1.419e-13
## 
## Rho: 0.52131, LR test value: 902.04, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.01516
##     z-value: 34.388, p-value: < 2.22e-16
## Wald statistic: 1182.5, p-value: < 2.22e-16
## 
## Log likelihood: -3043.937 for lag model
## ML residual variance (sigma squared): 0.3983, (sigma: 0.63111)
## Nagelkerke pseudo-R-squared: 0.57369 
## Number of observations: 3067 
## Number of parameters estimated: 8 
## AIC: 6103.9, (AIC for lm: 7003.9)



## 
## Call:
## errorsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     listw = us.wt4, etype = "error", method = "MC")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -3.210098 -0.346408  0.012802  0.374748  4.328378 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                     Estimate Std. Error z value  Pr(>|z|)
## (Intercept)        0.0725882  0.0310245  2.3397    0.0193
## scale(ppersonspo)  0.4709620  0.0191357 24.6117 < 2.2e-16
## scale(p65plus)     0.0069908  0.0158295  0.4416    0.6588
## scale(pblack_1)    0.1241018  0.0219149  5.6629 1.488e-08
## scale(phisp)      -0.1747092  0.0216743 -8.0607 6.661e-16
## I(RUCC >= 7)TRUE  -0.1568718  0.0298017 -5.2639 1.411e-07
## 
## Lambda: 0.59373, LR test value: 864.1, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.016898
##     z-value: 35.137, p-value: < 2.22e-16
## Wald statistic: 1234.6, p-value: < 2.22e-16
## 
## Log likelihood: -3062.906 for error model
## ML residual variance (sigma squared): 0.39392, (sigma: 0.62763)
## Nagelkerke pseudo-R-squared: 0.56838 
## Number of observations: 3067 
## Number of parameters estimated: 8 
## AIC: 6141.8, (AIC for lm: 7003.9)

#Spatial Durbin Model
fit.durb<-lagsarlm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, listw=us.wt4, type="mixed", method="MC")
summary(fit.durb, Nagelkerke=T)



#Spatial Durbin Error Model
fit.errdurb<-errorsarlm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, listw=us.wt4, etype="emixed", method="MC")
summary(fit.errdurb, Nagelkerke=T)

## 
## Call:
## lagsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     listw = us.wt4, type = "mixed", method = "MC")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -3.447670 -0.342805  0.012785  0.367833  4.212018 
## 
## Type: mixed 
## Coefficients: (numerical Hessian approximate standard errors) 
##                        Estimate Std. Error z value  Pr(>|z|)
## (Intercept)            0.091094   0.021729  4.1923 2.762e-05
## scale(ppersonspo)      0.444623   0.020635 21.5474 < 2.2e-16
## scale(p65plus)         0.040883   0.016606  2.4620 0.0138170
## scale(pblack_1)        0.075434   0.026573  2.8387 0.0045294
## scale(phisp)          -0.051883   0.028393 -1.8273 0.0676571
## I(RUCC >= 7)TRUE      -0.149795   0.030011 -4.9914 5.995e-07
## lag.scale(ppersonspo) -0.111945   0.028397 -3.9422 8.074e-05
## lag.scale(p65plus)    -0.086548   0.022807 -3.7948 0.0001478
## lag.scale(pblack_1)   -0.056995   0.029989 -1.9005 0.0573651
## lag.scale(phisp)      -0.117009   0.031643 -3.6978 0.0002175
## lag.I(RUCC >= 7)TRUE  -0.048229   0.045244 -1.0660 0.2864326
## 
## Rho: 0.55654, LR test value: 817.69, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.016799
##     z-value: 33.129, p-value: < 2.22e-16
## Wald statistic: 1097.5, p-value: < 2.22e-16
## 
## Log likelihood: -3009.484 for mixed model
## ML residual variance (sigma squared): 0.38516, (sigma: 0.62061)
## Nagelkerke pseudo-R-squared: 0.58316 
## Number of observations: 3067 
## Number of parameters estimated: 13 
## AIC: 6045, (AIC for lm: 6860.7)



## 
## Call:
## errorsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     listw = us.wt4, etype = "emixed", method = "MC")
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -3.3942600 -0.3428956  0.0072492  0.3682379  4.2252946 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                         Estimate Std. Error z value  Pr(>|z|)
## (Intercept)            0.1418118  0.0404196  3.5085 0.0004507
## scale(ppersonspo)      0.4795335  0.0194441 24.6622 < 2.2e-16
## scale(p65plus)         0.0214884  0.0160595  1.3380 0.1808815
## scale(pblack_1)        0.0839306  0.0238751  3.5154 0.0004391
## scale(phisp)          -0.0976042  0.0252078 -3.8720 0.0001080
## I(RUCC >= 7)TRUE      -0.1707347  0.0301252 -5.6675 1.449e-08
## lag.scale(ppersonspo)  0.1721983  0.0315524  5.4575 4.828e-08
## lag.scale(p65plus)    -0.1018223  0.0289988 -3.5113 0.0004460
## lag.scale(pblack_1)   -0.0069395  0.0331526 -0.2093 0.8341982
## lag.scale(phisp)      -0.2354451  0.0332012 -7.0915 1.327e-12
## lag.I(RUCC >= 7)TRUE  -0.1395266  0.0573674 -2.4322 0.0150092
## 
## Lambda: 0.55704, LR test value: 789.54, p-value: < 2.22e-16
## Approximate (numerical Hessian) standard error: 0.017131
##     z-value: 32.517, p-value: < 2.22e-16
## Wald statistic: 1057.4, p-value: < 2.22e-16
## 
## Log likelihood: -3023.558 for error model
## ML residual variance (sigma squared): 0.38865, (sigma: 0.62342)
## Nagelkerke pseudo-R-squared: 0.57932 
## Number of observations: 3067 
## Number of parameters estimated: 13 
## AIC: 6073.1, (AIC for lm: 6860.7)

#SAC Model
fit.sac<-sacsarlm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, listw=us.wt4, type="sac", method="MC")
summary(fit.sac, Nagelkerke=T)



#SMA Model
fit.sma<-spautolm(scale(mortrate)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, listw=us.wt4, family="SMA")
summary(fit.sma)

## 
## Call:
## sacsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     listw = us.wt4, type = "sac", method = "MC")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -3.286200 -0.323325  0.018039  0.349200  3.786466 
## 
## Type: sac 
## Coefficients: (numerical Hessian approximate standard errors) 
##                     Estimate Std. Error  z value  Pr(>|z|)
## (Intercept)        0.0722677  0.0124066   5.8249 5.714e-09
## scale(ppersonspo)  0.2715715  0.0164644  16.4945 < 2.2e-16
## scale(p65plus)    -0.0089727  0.0096066  -0.9340    0.3503
## scale(pblack_1)    0.0062671  0.0094103   0.6660    0.5054
## scale(phisp)      -0.1108634  0.0102596 -10.8058 < 2.2e-16
## I(RUCC >= 7)TRUE  -0.1539096  0.0215581  -7.1393 9.381e-13
## 
## Rho: 0.7211
## Approximate (numerical Hessian) standard error: 0.019407
##     z-value: 37.156, p-value: < 2.22e-16
## Lambda: -0.43194
## Approximate (numerical Hessian) standard error: 0.045712
##     z-value: -9.4493, p-value: < 2.22e-16
## 
## LR test value: 952.18, p-value: < 2.22e-16
## 
## Log likelihood: -3018.867 for sac model
## ML residual variance (sigma squared): 0.34811, (sigma: 0.59)
## Nagelkerke pseudo-R-squared: 0.5806 
## Number of observations: 3067 
## Number of parameters estimated: 9 
## AIC: 6055.7, (AIC for lm: 7003.9)



## 
## Call: 
## spautolm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     listw = us.wt4, family = "SMA")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -3.293104 -0.344431  0.016537  0.380445  4.329735 
## 
## Coefficients: 
##                    Estimate Std. Error  z value  Pr(>|z|)
## (Intercept)        0.085878   0.024072   3.5675 0.0003603
## scale(ppersonspo)  0.519189   0.018359  28.2795 < 2.2e-16
## scale(p65plus)    -0.014067   0.015643  -0.8992 0.3685331
## scale(pblack_1)    0.131516   0.019449   6.7619 1.362e-11
## scale(phisp)      -0.228134   0.018665 -12.2224 < 2.2e-16
## I(RUCC >= 7)TRUE  -0.186258   0.030287  -6.1497 7.763e-10
## 
## Lambda: 0.54914 LR test value: 645.76 p-value: < 2.22e-16 
## Numerical Hessian standard error of lambda: 0.021071 
## 
## Log likelihood: -3172.079 
## ML residual variance (sigma squared): 0.49357, (sigma: 0.70254)
## Number of observations: 3067 
## Number of parameters estimated: 8 
## AIC: 6360.2

Using the Lagrange Multiplier Test (LMT)
The so-called Lagrange Multiplier tests (econometrician’s jargon for a score test (https://en.wikipedia.org/wiki/Score_test)) compare the model fits from the OLS, spatial error, and spatial lag models using the method of the score test.

For those who don’t remember, the score test is based on the first derivative (the slope) of the log-likelihood function, evaluated at the value of the parameter under the null hypothesis. The particular thing here that is affecting the value of this derivative is the autoregressive parameter, $\rho$ or $\lambda$. In the OLS model $\rho$ or $\lambda$ = 0 (so both the lag and error models simplify to OLS). If the likelihood is still changing steeply at that point, the data favor a nonzero value of the parameter, which is why the derivative of the likelihood function is used. This is all related to how the estimation routines estimate the value of $\rho$ or $\lambda$.

In general, you fit the OLS model to your dependent variable, then submit the OLS model fit to the LMT testing procedure.

Then you look to see which model (spatial error, or spatial lag) has the highest value for the test.

Enter the uncertainty… So how much bigger, you might say?

Well, drastically bigger. If the LMT for the error model is 2500 and the LMT for the lag model is 2480, this is NOT A BIG DIFFERENCE, only about 1%. If you see a LMT for the error model of 2500 and a LMT for the lag model of 250, THIS IS A BIG DIFFERENCE.

So what if you don’t see a BIG DIFFERENCE, HOW DO YOU DECIDE WHICH MODEL TO USE???

Well, you could think more, but who has time for that.

The econometricians have thought up a “better” LMT test, the so-called robust LMT. Each robust test checks for one kind of dependence (lag or error) while allowing for the possible presence of the other, and it is said that it can settle such problems of a “not so big difference” between the lag and error model specifications.

So what do you do? In general, think about your problem before you run your analysis. Should this fail you, proceed with using the LMT. If this is inconclusive, look at the robust LMT, and choose the model which has the larger value for this test.

Here’s how we do the Lagrange Multiplier test in R:

lm.LMtests(fit.1.us, listw=us.wt4, test="all")



## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 7),
## data = spdat)
## weights: us.wt4
## 
## LMerr = 1056.8, df = 1, p-value < 2.2e-16
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 7),
## data = spdat)
## weights: us.wt4
## 
## LMlag = 1084.9, df = 1, p-value < 2.2e-16
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 7),
## data = spdat)
## weights: us.wt4
## 
## RLMerr = 77.419, df = 1, p-value < 2.2e-16
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 7),
## data = spdat)
## weights: us.wt4
## 
## RLMlag = 105.56, df = 1, p-value < 2.2e-16
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 7),
## data = spdat)
## weights: us.wt4
## 
## SARMA = 1162.4, df = 2, p-value < 2.2e-16

There is a 2.66% difference in the regular LM tests between the error and lag models, but a 36.35% difference in the Robust LM tests. In this case, I would say that either the lag model looks like the best one, using the Robust Lagrange multiplier test, or possibly the SARMA model, since its test statistic is only 7.14% larger than that of the lag model. Unfortunately, there is no robust test for the SARMA model.
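The percentages quoted above are simple relative differences between the test statistics. A short base R sketch, with the statistics typed in by hand from the output and a helper name (pct_diff) that is made up for this illustration, reproduces them:

```r
# Test statistics copied from the lm.LMtests() output above
lm_stats <- c(LMerr = 1056.8, LMlag = 1084.9,
              RLMerr = 77.419, RLMlag = 105.56, SARMA = 1162.4)

# Hypothetical helper: percent by which b exceeds a
pct_diff <- function(a, b) 100 * (b - a) / a

regular <- unname(pct_diff(lm_stats["LMerr"],  lm_stats["LMlag"]))   # regular LM, error vs. lag
robust  <- unname(pct_diff(lm_stats["RLMerr"], lm_stats["RLMlag"]))  # robust LM, error vs. lag
sarma   <- unname(pct_diff(lm_stats["LMlag"],  lm_stats["SARMA"]))   # lag vs. SARMA

round(c(regular = regular, robust = robust, sarma = sarma), 2)
```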

Of course, the AIC is also your friend:

AICs<-c(AIC(fit.1.us),AIC(fit.lag), AIC(fit.err), AIC(fit.durb), AIC(fit.errdurb), AIC(fit.sac), AIC(fit.sma))
plot(AICs, type="l", lwd=1.5, xaxt="n", xlab="")
axis(1, at=1:7,labels=F) #7= number of models
labels<-c("OLS", "Lag","Err", "Durbin","Err Durbin", "SAC", "SMA" )
text(1:7, par("usr")[3]-.25, srt=45, adj=1, labels=labels, xpd=T)
mtext(side=1, text="Model Specification", line=3)
#circle the model with the smallest AIC
symbols(x= which.min(AICs), y=AICs[which.min(AICs)], circles=1, fg=2,lwd=2,add=T)



knitr::kable(data.frame(Models=labels, AIC=round(AICs, 2)))

Models        AIC
OLS           7003.92
Lag           6103.87
Err           6141.81
Durbin        6044.97
Err Durbin    6073.12
SAC           6055.73
SMA           6360.16
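The AIC values for the spatial models in the table can be recovered from the log likelihoods and parameter counts printed in the model summaries, since AIC = 2k - 2 log L. The base R sketch below types those numbers in by hand (the object names are made up for this illustration) rather than pulling them from the fitted objects:

```r
# Log likelihoods and numbers of estimated parameters copied from the summaries above
loglik <- c(Lag = -3043.937, Err = -3062.906, Durbin = -3009.484,
            "Err Durbin" = -3023.558, SAC = -3018.867, SMA = -3172.079)
k <- c(8, 8, 13, 13, 9, 8)

aic <- 2 * k - 2 * loglik      # AIC = 2k - 2 log L
round(aic, 2)
names(which.min(aic))          # model with the smallest AIC
```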



Which shows that the Spatial Durbin model best fits the data, although the degree of difference between it and the SAC model is small. A likelihood ratio test could be used:

anova(fit.sac, fit.durb)

##          Model df    AIC  logLik Test L.Ratio    p-value
## fit.sac      1  9 6055.7 -3018.9    1                   
## fit.durb     2 13 6045.0 -3009.5    2  18.766 0.00087355

Which indicates that the Durbin model fits significantly better than the SAC model. Durbin it is!!
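The likelihood ratio statistic and p-value in the anova() table can be checked by hand. This base R sketch uses the log likelihoods and parameter counts from the model summaries (typed in, not extracted from the fits) and assumes the usual chi-square reference distribution, which strictly speaking presumes nested models, so treat the p-value as a rough guide:

```r
# Values copied from the SAC and Durbin model summaries above
ll_sac  <- -3018.867; df_sac  <- 9
ll_durb <- -3009.484; df_durb <- 13

lr <- 2 * (ll_durb - ll_sac)                                   # likelihood ratio statistic
p  <- pchisq(lr, df = df_durb - df_sac, lower.tail = FALSE)    # upper-tail chi-square p-value
c(L.Ratio = lr, p.value = p)
```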

Interpreting effects in spatial lag models
In spatial lag models, interpretation of the regression effects is complicated. Each observation will have a direct effect of its predictors, but each observation will also have an indirect effect of the information of its neighbors, although Spatial Error models do not have this issue. In OLS, the impact/effect of a predictor is straightforward: $\frac{\partial y_i}{\partial x_{ik}} = \beta_k$ and $\frac{\partial y_i}{\partial x_{jk}} = 0$, but when a model has a spatial lag of either the outcome or a predictor, this becomes more complicated, indeed: $\frac{\partial y_i}{\partial x_{jk}}$ may not = 0, or $\frac{\partial y_i}{\partial x_{jk}} = S_r(W)_{ij}$, where

$$S_r(W) = (I_n - \rho W)^{-1} \beta_k$$

This implies that a change in the ith region’s predictor can affect the jth region’s outcome. We have 2 situations:
* $S_r(W)_{ii}$, or the direct impact of an observation’s predictor on its own outcome, and:
* $S_r(W)_{ij}$, or the indirect impact of an observation’s neighbor’s predictor on its outcome.

This leads to three quantities that we want to know:
* Average Direct Impact, which is similar to a traditional interpretation
* Average Total impact, which would be the total of direct and indirect impacts of a predictor on one’s outcome
* Average Indirect impact, which would be the average impact of one’s neighbors on one’s outcome

These quantities can be found using the impacts() function in the spdep library. We follow the example that converts the spatial weight matrix into a “sparse” matrix, and power it up using the trW() function. This follows the approximation methods described in LeSage and Pace, 2009. Here, we use Monte Carlo simulation to obtain simulated distributions of the various impacts. We are looking for the first part of the output.

W <- as(us.wt4, "CsparseMatrix")
trMC <- trW(W, type="MC")
im<-impacts(fit.durb, tr=trMC, R=100)
sums<-summary(im, zstats=T)
data.frame(sums$res)

##                        direct    indirect       total
## scale(ppersonspo)  0.46662605  0.28356899  0.75019505
## scale(p65plus)     0.03052456 -0.13349913 -0.10297457
## scale(pblack_1)    0.07299591 -0.03141598  0.04157993
## scale(phisp)      -0.07557088 -0.30528244 -0.38085333
## I(RUCC >= 7)TRUE  -0.17116336 -0.27538528 -0.44654864
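To see where the average direct, indirect, and total impacts come from, the matrix $S_r(W) = (I_n - \rho W)^{-1}\beta_k$ can be built by brute force on a tiny example. The base R sketch below is an illustration under assumed values (five regions on a line, rho = 0.5, beta_k = 0.4) and uses the plain lag-model form of $S_r(W)$; for a Durbin model the coefficient on the lagged predictor also enters, which impacts() handles for you.

```r
# Hypothetical toy example: 5 regions on a line
n <- 5
A <- matrix(0, n, n)
A[abs(row(A) - col(A)) == 1] <- 1
W <- A / rowSums(A)                     # row-standardized weights
rho <- 0.5                              # assumed autoregressive coefficient
beta_k <- 0.4                           # assumed coefficient of predictor k

S <- solve(diag(n) - rho * W) * beta_k  # S_r(W) = (I - rho*W)^{-1} * beta_k

direct   <- mean(diag(S))               # average of own-region effects S_ii
total    <- mean(rowSums(S))            # average of the row sums of S
indirect <- total - direct              # what is left is due to neighbors
c(direct = direct, indirect = indirect, total = total)
```

With a row-standardized W the total impact reduces to beta_k / (1 - rho), which gives a quick sanity check on the output.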



data.frame(sums$pzmat)

##                         Direct     Indirect        Total
## scale(ppersonspo) 0.000000e+00 3.025957e-11 0.000000e+00
## scale(p65plus)    5.627965e-02 1.666638e-04 8.603047e-03
## scale(pblack_1)   4.002635e-03 4.458669e-01 2.282902e-01
## scale(phisp)      3.258540e-03 1.776357e-15 0.000000e+00
## I(RUCC >= 7)TRUE  4.701231e-08 3.215075e-04 1.183164e-07

We see that all variables except % 65 and older (p = 0.056) have a significant direct effect at the 0.05 level. We also see that poverty, % 65 and older, % Hispanic, and the rural classification all have significant indirect impacts.

We can likewise see the effects by order of neighbors, similar to what Yang et al. (2015) (http://onlinelibrary.wiley.com/doi/10.1002/psp.1809/abstract) do in their Table 4.

Here, I do this up to 5th order neighbors.

im2<-impacts(fit.durb, tr=trMC, R=100, Q=5)
sums2<-summary(im2, zstats=T, reportQ=T, short=T)
sums2




## Impact measures (mixed, trace):
##                        Direct    Indirect       Total
## scale(ppersonspo)  0.46662605  0.28356899  0.75019505
## scale(p65plus)     0.03052456 -0.13349913 -0.10297457
## scale(pblack_1)    0.07299591 -0.03141598  0.04157993
## scale(phisp)      -0.07557088 -0.30528244 -0.38085333
## I(RUCC >= 7)TRUE  -0.17116336 -0.27538528 -0.44654864
## =================================
## Impact components
## $direct
##    scale(ppersonspo) scale(p65plus) scale(pblack_1) scale(phisp)
## Q1       0.444622945   4.088322e-02    0.0754335845 -0.051882953
## Q2      -0.013343554  -1.031633e-02   -0.0067936505 -0.013947189
## Q3       0.027779054   1.384883e-03    0.0041301314 -0.005236257
## Q4       0.001876622  -1.134093e-03   -0.0003328032 -0.002447803
## Q5       0.003725966  -7.670799e-06    0.0004575737 -0.001032264
##    I(RUCC >= 7)TRUE
## Q1     -0.149795252
## Q2     -0.005748838
## Q3     -0.010676877
## Q4     -0.002104921
## Q5     -0.001650075
## 
## $indirect
##    scale(ppersonspo) scale(p65plus) scale(pblack_1) scale(phisp)
## Q1       -0.11194455   -0.086547898    -0.056994720  -0.11700869
## Q2        0.19849377   -0.015098082     0.017055692  -0.08004847
## Q3        0.07526521   -0.015529123     0.001581148  -0.04707647
## Q4        0.05547206   -0.006737802     0.003511382  -0.02666654
## Q5        0.02819111   -0.004373386     0.001311446  -0.01517115
##    I(RUCC >= 7)TRUE
## Q1      -0.04822936
## Q2      -0.10446060
## Q3      -0.05065954
## Q4      -0.03203150
## Q5      -0.01734835
## 
## $total
##    scale(ppersonspo) scale(p65plus) scale(pblack_1) scale(phisp)
## Q1        0.33267839   -0.045664678     0.018438864  -0.16889164
## Q2        0.18515021   -0.025414409     0.010262042  -0.09399565
## Q3        0.10304427   -0.014144241     0.005711279  -0.05231273
## Q4        0.05734868   -0.007871894     0.003178579  -0.02911434
## Q5        0.03191707   -0.004381057     0.001769020  -0.01620342
##    I(RUCC >= 7)TRUE
## Q1      -0.19802461
## Q2      -0.11020944
## Q3      -0.06133642
## Q4      -0.03413642
## Q5      -0.01899843
## 
## ========================================================
## Simulation results (numerical Hessian approximation variance matrix):
## ========================================================
## Simulated z-values:
##                      Direct   Indirect      Total
## scale(ppersonspo) 24.554793  6.7216483  18.104171
## scale(p65plus)     1.900607 -3.9878531  -2.837798
## scale(pblack_1)    2.742441 -0.6043566   1.279930
## scale(phisp)      -2.836526 -9.2371212 -13.114996
## I(RUCC >= 7)TRUE  -5.925557 -2.9859748  -4.682966
## 
## Simulated p-values:
##                   Direct     Indirect   Total     
## scale(ppersonspo) < 2.22e-16 1.7968e-11 < 2.22e-16
## scale(p65plus)    0.0573535  6.6674e-05 0.0045426 
## scale(pblack_1)   0.0060984  0.5456066  0.2005697 
## scale(phisp)      0.0045607  < 2.22e-16 < 2.22e-16
## I(RUCC >= 7)TRUE  3.1124e-09 0.0028268  2.8275e-06
## ========================================================
## Simulated impact components z-values:
## $Direct
##    scale(ppersonspo) scale(p65plus) scale(pblack_1) scale(phisp)
## Q1         22.031762     2.50264522       2.6289953    -1.821736
## Q2         -3.581552    -4.33455683      -1.6372627    -4.258351
## Q3         16.177175     1.44532145       2.7793864    -3.469907
## Q4          5.320057    -3.95357488      -0.7569249    -7.613727
## Q5          9.285672    -0.09051354       2.7890538    -5.417370
##    I(RUCC >= 7)TRUE
## Q1       -5.0167049
## Q2       -0.9294231
## Q3       -5.8401338
## Q4       -2.7180194
## Q5       -5.2438143
## 
## $Indirect
##    scale(ppersonspo) scale(p65plus) scale(pblack_1) scale(phisp)
## Q1         -3.741952      -4.328597      -1.6448784    -4.235061
## Q2         23.483859      -1.962929       2.2884409   -11.329172
## Q3         13.530230      -3.355604       0.4615838   -11.817991
## Q4         13.304216      -2.641677       1.5096461   -10.250396
## Q5          9.996513      -2.927159       1.0046851    -8.502278
##    I(RUCC >= 7)TRUE
## Q1       -0.9340371
## Q2       -5.5117560
## Q3       -4.0125250
## Q4       -4.5937811
## Q5       -4.0956369
## 
## $Total
##    scale(ppersonspo) scale(p65plus) scale(pblack_1) scale(phisp)
## Q1          14.85348      -2.811740        1.285049   -12.132992
## Q2          17.88894      -2.834859        1.281413   -13.148656
## Q3          16.78818      -2.835910        1.274677   -12.287536
## Q4          13.12455      -2.814464        1.265018   -10.358400
## Q5          10.04507      -2.771581        1.252643    -8.483635
##    I(RUCC >= 7)TRUE
## Q1        -4.691556
## Q2        -4.691647
## Q3        -4.604700
## Q4        -4.443521
## Q5        -4.229207
## 
## 
## Simulated impact components p-values:
## $Direct
##    scale(ppersonspo) scale(p65plus) scale(pblack_1) scale(phisp)
## Q1 < 2.22e-16        0.012327       0.0085638       0.06849504  
## Q2 0.00034156        1.4605e-05     0.1015756       2.0594e-05  
## Q3 < 2.22e-16        0.148368       0.0054462       0.00052064  
## Q4 1.0373e-07        7.6992e-05     0.4490948       2.6645e-14  
## Q5 < 2.22e-16        0.927879       0.0052862       6.0482e-08  
##    I(RUCC >= 7)TRUE
## Q1 5.2565e-07      
## Q2 0.3526699       
## Q3 5.2159e-09      
## Q4 0.0065674       
## Q5 1.5729e-07      
## 
## $Indirect
##    scale(ppersonspo) scale(p65plus) scale(pblack_1) scale(phisp)
## Q1 0.0001826         1.5006e-05     0.099995        2.2849e-05  
## Q2 < 2.22e-16        0.04965448     0.022112        < 2.22e-16  
## Q3 < 2.22e-16        0.00079192     0.644380        < 2.22e-16  
## Q4 < 2.22e-16        0.00824967     0.131134        < 2.22e-16  
## Q5 < 2.22e-16        0.00342074     0.315048        < 2.22e-16  
##    I(RUCC >= 7)TRUE
## Q1 0.35028         
## Q2 3.5527e-08      
## Q3 6.0073e-05      
## Q4 4.3529e-06      
## Q5 4.2101e-05      
## 
## $Total
##    scale(ppersonspo) scale(p65plus) scale(pblack_1) scale(phisp)
## Q1 < 2.22e-16        0.0049274      0.19878         < 2.22e-16  
## Q2 < 2.22e-16        0.0045846      0.20005         < 2.22e-16  
## Q3 < 2.22e-16        0.0045695      0.20242         < 2.22e-16  
## Q4 < 2.22e-16        0.0048859      0.20586         < 2.22e-16  
## Q5 < 2.22e-16        0.0055785      0.21034         < 2.22e-16  
##    I(RUCC >= 7)TRUE
## Q1 2.7114e-06      
## Q2 2.7101e-06      
## Q3 4.1306e-06      
## Q4 8.8498e-06      
## Q5 2.3452e-05

So we see that, for instance, for the direct impact of poverty, .4446/.4667 = 95.26% of the effect is due to a county’s own influence on itself, while (-.013 + .0277 + .0019 + .0037)/.4667 = 4.35% of the effect of poverty comes from other neighboring counties.
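The two shares quoted above can be reproduced with a couple of lines of base R. The numbers are the rounded values used in the text (taken from the $direct impact components for poverty and its overall direct impact), and the object names are made up for this illustration:

```r
# Rounded direct impact components for poverty (Q1..Q5) and the overall direct impact
direct_q <- c(Q1 = 0.4446, Q2 = -0.013, Q3 = 0.0277, Q4 = 0.0019, Q5 = 0.0037)
direct_total <- 0.4667

own_share      <- 100 * unname(direct_q["Q1"]) / direct_total   # county's own influence
feedback_share <- 100 * sum(direct_q[-1]) / direct_total        # higher orders, via neighbors

round(c(own = own_share, feedback = feedback_share), 2)
```

Using the unrounded components from the output changes the second decimal place slightly, so the percentages should be read as approximate.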
