Lecture 3: Bayesian estimation of spatial regression models
James P. LeSage
University of Toledo
Department of Economics
Toledo, OH 43606
March 2004
Introduction
• Application of Bayesian estimation methods to SAR,
SDM and SEM spatial regression models where the
number of observations is very large should result
in estimates nearly identical to those from maximum
likelihood methods.
• This is a standard result in econometrics: prior
information is dominated by a large amount of sample
information.
• Three points:
1) Bayesian methods can solve the problem of
inference in maximum likelihood estimation, where
inference relies on numerical Hessians that are not
always very good.
2) Bayesian methods can be used to relax the
assumption of constant variance normal disturbances
made by maximum likelihood methods, resulting in
extended models.
3) Bayesian methods can be used to formally
solve model comparison problems. We can
compare models based on: different weight matrices,
different explanatory (X) variables, or different model
specifications, e.g., SAR, SDM, SEM.
• We can develop Bayesian variants of the SAR, SDM and
SEM models that generalize maximum likelihood by
allowing for non-constant variance over space.
ALL Bayesian econometric theory in 2 overheads
In econometrics we care about 1) inference about
parameters, 2) model comparisons, and 3) prediction.
• p(A, B) = p(A|B)p(B), a basic probability rule for
r.v. A, B
• p(A, B) = p(B|A)p(A), the same rule written the other way
• Setting these two equal and rearranging yields Bayes' Rule:

p(B|A) = p(A|B)p(B) / p(A)    (1)
• Letting y = A represent model data and θ = B denote
model parameters:
p(θ|y) = p(y|θ)p(θ) / p(y)    (2)
• With reference to 1) inference: all Bayesian
learning/inference about θ|y is based on the posterior
p(θ|y). Since p(y) does not involve the parameters θ,
our object of interest, we can simplify to:

p(θ|y) ∝ p(y|θ)p(θ)    (3)
Overhead #2
• With reference to 2) model comparison, each of m
models is denoted by a likelihood function and prior:

p(θi|y, Mi) = p(y|θi, Mi)p(θi|Mi) / p(y|Mi)    (4)
• Use Bayes' rule again to expand terms like p(y|Mi) to
arrive at the posterior model probabilities, the basis for
inference about different models, given the sample data:

p(Mi|y) = p(y|Mi)p(Mi) / p(y)    (5)
• p(y|Mi) is called the marginal likelihood, the key
quantity needed for model comparison, and we can
solve for it by finding:

p(y|Mi) = ∫ p(y|θi, Mi) p(θi|Mi) dθi    (6)
• To avoid dealing with p(y), we can work with the
posterior odds ratio:

POij = p(Mi|y) / p(Mj|y) = p(y|Mi)p(Mi) / [p(y|Mj)p(Mj)]    (7)
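Under equal prior model probabilities the posterior odds collapse to ratios of marginal likelihoods. A minimal MATLAB sketch of turning log-marginal likelihoods into posterior model probabilities (the lml values here are hypothetical, not computed from any model):

lml = [-1200.4; -1198.7; -1205.1]; % hypothetical log p(y|Mi), m = 3 models
adj = lml - max(lml);              % rescale to avoid exp() underflow
probs = exp(adj)./sum(exp(adj));   % posterior model probabilities, equal priors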
• With reference to 3) prediction of y* based on y, using
the rules of probability:

p(y*|y) = ∫ p(y*, θ|y) dθ = ∫ p(y*|y, θ) p(θ|y) dθ    (8)
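In practice (8) is approximated by simulation: average the predictive density over posterior draws of θ. A sketch, assuming stored draws bsave and ssave like those produced by the Gibbs sampler below, and a hypothetical new observation xnew (1 x k):

nd = size(bsave,1);
ypred = zeros(nd,1);
for i=1:nd;                    % composition sampling over posterior draws
  ypred(i,1) = xnew*bsave(i,:)' + sqrt(ssave(i,1))*randn(1,1);
end;
% ypred now approximates draws from p(y*|y) in (8)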
Markov Chain Monte Carlo estimation
Bayesian ordinary least-squares regression (non-spatial
example)
y = Xβ + ε (9)
ε ∼ N(0, σ2In)
The parameters to be estimated in (9) are (β, σ), for
which we assign a prior density of the form π(β, σ) =
π1(β)π2(σ), with a normal prior for β and a diffuse prior
for σ.
Rβ ∼ N(r, T)    (10)
π1(β) ∝ exp{−(1/2)(Rβ − r)′T⁻¹(Rβ − r)}
π2(σ) ∝ (1/σ)
(Like Theil and Goldberger (1961) regression.)
For our purposes it is convenient to express (10) in an
alternative (equivalent) form based on a factorization
Q′Q = T⁻¹, with q = Qr, leading to (11).
Qβ ∼ N(q, Im) (11)
π1(β) ∝ exp{−(1/2)(Qβ − q)′(Qβ − q)}
Following the usual Bayesian methodology, we combine
the likelihood function for our simple model:
L(β, σ) ∝ (1/σ^n) exp{−(y − Xβ)′(y − Xβ)/2σ²}    (12)
with the priors π1(β) and π2(σ) to produce the posterior
density for (β, σ) shown in (13).
p(β, σ) ∝ (1/σ^(n+1)) exp{−(1/2)[β − β̂(σ)]′[V(σ)]⁻¹[β − β̂(σ)]}
β̂(σ) = (X′X + σ²Q′Q)⁻¹(X′y + σ²Q′q)
V(σ) = σ²(X′X + σ²Q′Q)⁻¹    (13)
In (13), we use the notation β̂(σ) to convey that the
mean of the posterior, β̂, is conditional on the parameter
σ, as is the variance, denoted V(σ). This conditioning on
the single parameter σ prevents an analytical solution of the
Bayesian regression problem.
In order to overcome this problem, Theil and Goldberger
(1961) observed that conditional on σ, the posterior density
for β is multivariate normal. They proposed that σ2 be
replaced by an estimated value, σ̂2, based on least-squares
estimates β̂.
β̂TG = (X′X + σ̂²Q′Q)⁻¹(X′y + σ̂²Q′q)
var(β̂TG) = σ̂²(X′X + σ̂²Q′Q)⁻¹    (14)
The advantage of this solution is that the estimation
problem can be solved using existing least-squares regression
software. Their solution produces a point estimate which we
label β̂TG and an associated variance-covariance estimate,
both of which are shown in (14).
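A minimal sketch of computing (14), assuming x, y and the prior transform Q, q from (11) are in memory (names follow the example code below):

bols = (x'*x)\(x'*y);                                % least-squares estimates
sig2 = (y-x*bols)'*(y-x*bols)/(length(y)-size(x,2)); % estimate of sigma^2
btg  = (x'*x + sig2*(Q'*Q))\(x'*y + sig2*(Q'*q));    % point estimate in (14)
vtg  = sig2*inv(x'*x + sig2*(Q'*Q));                 % variance-covariance in (14)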
An MCMC Sampling solution
The Gibbs sampler provides a way to sample from a
multivariate probability density based only on the densities
of subsets of vectors conditional on all others.
A two-step Gibbs sampler for the posterior distribution
(13) of our Bayesian regression model is based on the
distribution of β conditional on σ and the distribution of σ
conditional on β. For our regression problem, the posterior
density for β conditional on σ, p(β|σ), is multivariate
normal with mean and variance as indicated in (15).
p(β|σ) ∼ N[β̂(σ), V(σ)]    (15)
β̂(σ) = (X′X + σ²Q′Q)⁻¹(X′y + σ²Q′q)
V(σ) = σ²(X′X + σ²Q′Q)⁻¹
The posterior density for σ conditional on β, p(σ|β) is:
p(σ|β) ∝ (e′e)/χ²(n)    (16)
(y − Xβ)′(y − Xβ)/σ² | β ∼ χ²(n)
The Gibbs sampler suggested by these two conditional
posterior distributions involves the following computations.
• Begin with arbitrary values for the parameters β0 and
σ0, which we designate with the superscript 0.
• Compute the mean and variance of β ∼ N[β̂(σ), V(σ)]
using (15), conditional on the initial value σ0.
• Use the computed mean and variance of β to draw a
multivariate normal random vector, which we label β1.
• Use the value β1 along with a random χ2(n) draw to
determine σ1 using (16).
The above four steps are known as a ‘single pass’ through
our (two-step) Gibbs sampler, where we have replaced the
initial arbitrary values β0 and σ0 with new values labeled
β1 and σ1. We now return to the first step, using the new
values β1 and σ1 in place of the initial values β0 and σ0, and
make another ‘pass’ through the sampler. This produces a
new set of values, β2 and σ2.
An Example

% ----- A simple Gibbs sampler
n=100; k=3;                          % set nobs and nvars
x = randn(n,k);                      % generate data set
b = ones(k,1);
y = x*b + randn(n,1);
r = [1.0 1.0 1.0]';                  % prior means
R = eye(k); T = eye(k);              % prior variance
Q = chol(inv(T)); Qpr = Q*r;         % convenience transform, q = Q*r
b0 = (x'*x)\(x'*y);                  % use ols as initial values
sige = (y-x*b0)'*(y-x*b0)/(n-k);
ndraw = 6000; nomit = 1000;          % set the number of draws
bsave = zeros(ndraw,k);              % allocate storage for results
ssave = zeros(ndraw,1);
xpx = x'*x; xpy = x'*y;              % compute these before sampling
tic;
for i=1:ndraw;                       % Start the sampling
  xpxi = inv(xpx + sige*(Q'*Q));     % posterior variance term from (13)
  xpyi = (xpy + sige*(Q'*Qpr));      % note Q'*Qpr = Q'*q in (13)
  b = xpxi*xpyi;                     % update b: posterior mean
  b = norm_rnd(sige*xpxi) + b;       % draw b from its conditional normal
  bsave(i,:) = b';
  e = y - x*b;                       % update sige
  chi = chis_rnd(1,n);               % chi-squared(n) draw, as in (16)
  sige = e'*e/chi;
  ssave(i,1) = sige;
end;                                 % End the sampling
toc;
bhat = mean(bsave(nomit+1:ndraw,:)); % calculate means
bstd = std(bsave(nomit+1:ndraw,:));  % and std deviations
shat = mean(ssave(nomit+1:ndraw,1));
Results
Gibbs estimates (1.22 seconds for 6,000 draws)
Variable     Coefficient    t-statistic    t-probability
variable 1      1.101726       9.507527         0.000000
variable 2      1.034981       9.637922         0.000000
variable 3      0.994645       9.326300         0.000000
Theil-Goldberger Regression Estimates
R-squared       = 0.7376
Rbar-squared    = 0.7321
sigma^2         = 1.0071
Durbin-Watson   = 1.8900
Nobs, Nvars     = 100, 3
***************************************************************
Variable     Prior Mean    Std Deviation
variable 1     1.000000         1.000000
variable 2     1.000000         1.000000
variable 3     1.000000         1.000000
***************************************************************
Posterior Estimates
Variable     Coefficient    t-statistic    t-probability
variable 1      1.099942       9.512551         0.000000
variable 2      1.035667       9.804278         0.000000
variable 3      0.995215       9.490806         0.000000
Bayesian spatial models
The approach is similar to the least-squares model we
already did. The parameters are β, σ, ρ. We need the
conditional distributions for each of these, and then we can
develop a sampler.

We use a normal-gamma conjugate prior for β and σ, and
a uniform prior for ρ. The prior distributions are indicated
using π.
y = ρWy + Xβ + ε
ε ∼ N(0, σ²In)
π(β) ∼ N(c, T)
π(1/σ²) ∼ Γ(d, ν)
π(ρ) ∼ U[0, 1]    (17)
In the case of large samples involving 3,000 observations,
the normal-gamma priors for β, σ should exert relatively little
influence. Setting c to zero and T to a very large number
results in a diffuse prior for β. Diffuse settings for σ involve
setting d = 0, ν = 0. For completeness, we develop the
results for the case of a normal-gamma prior on β, σ.
In contrast to the case of the priors on β, σ, assigning
an informative prior to the parameter ρ associated with
spatial dependence should exert an impact on the estimation
outcomes even in large samples. This is due to the important
role played by spatial dependence in these models. In typical
applications where the magnitude and significance of ρ is
a subject of interest, a diffuse prior would be used. It is
however possible to rely on an informative prior for this
parameter.
The parameters β, σ and ρ in the SAR model can
be estimated by drawing sequentially from the conditional
distributions of these parameters.
Conditional distributions
To implement this estimation method, we need to
determine the conditional distributions for each parameter in
our Bayesian SAR model.
• The conditional distribution for β follows from the
maximum likelihood model:

p(β|ρ, σ) ∼ N(b̄, B)    (18)
b̄ = A(X′Sy + σ²T⁻¹c)
B = σ²A
A = (X′X + σ²T⁻¹)⁻¹
S = (In − ρW)

We see that the conditional for β is a multivariate normal
distribution from which it is easy to sample a vector β.
• The conditional distribution for σ given the other
parameters takes the form (see Gelman, Carlin, Stern
and Rubin, 1995):

p(σ²|β, ρ) ∝ (σ²)^−(n/2+d+1) exp{−(e′e + 2ν)/(2σ²)}
e = (In − ρW)y − Xβ
which is proportional to an inverse gamma distribution
with parameters (n/2) + d and e′e + 2ν. Again, this
is an easy distribution from which to sample a scalar
value for σ (a sketch of both draws follows).
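A minimal MATLAB sketch of single draws of β and σ from these two conditionals, assuming the prior settings c, T, d, ν, current values rho and sige, and the toolbox functions norm_rnd and chis_rnd used in the example code later (all other names are illustrative):

TI = inv(T);                      % prior precision
Sy = y - rho*W*y;                 % S*y with S = In - rho*W
A  = inv(x'*x + sige*TI);
bbar = A*(x'*Sy + sige*TI*c);     % posterior mean in (18)
beta = norm_rnd(sige*A) + bbar;   % draw beta ~ N(bbar, B), B = sige*A
e    = Sy - x*beta;               % then update sigma
chi  = chis_rnd(1, n + 2*d);
sige = (e'*e + 2*nu)/chi;         % inverse gamma draw via chi-squared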
• Finally, the conditional posterior distribution of ρ takes
the form:
p(ρ|β, σ) ∝ |S(ρ)| (s²(ρ))^−(n−k)/2 π(ρ)    (19)
s²(ρ) = (Sy − Xb(ρ))′(Sy − Xb(ρ)) / (n − k)
S = (In − ρW)
A problem arises here in that this distribution is not one
for which established algorithms exist to produce random
draws.
There are, however, two ways to sample from this
conditional distribution:

1) Use a Metropolis-Hastings algorithm (a sketch
follows this list).
2) Use a griddy Gibbs sampler, with univariate
numerical integration to find the conditional posterior
distribution, and then carry out a draw using
“inversion”.
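A sketch of a random-walk Metropolis-Hastings step for ρ, assuming a hypothetical function lnpost(r) that returns ln|In − rW| − ((n−k)/2) ln s²(r), the log of (19) up to a constant, and a tuning parameter cc:

rho_new = rho + cc*randn(1,1);     % random-walk candidate
if (rho_new > -1 && rho_new < 1)   % enforce the feasible interval
  if log(rand(1,1)) < lnpost(rho_new) - lnpost(rho)
    rho = rho_new;                 % accept; otherwise keep the current value
  end;
end;

In practice the tuning parameter cc is adjusted during sampling to keep the acceptance rate reasonable.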
The MCMC sampler
By way of summary, an MCMC estimation scheme
involves starting with arbitrary initial values for the
parameters, which we denote β0, σ0, ρ0. We then
sample sequentially from the following set of conditional
distributions for the parameters in our model.
• p(β|σ0, ρ0), which is a multivariate normal distribution
with mean and variance defined in (18). This updated
value for the parameter vector β we label β1.
• p(σ|β1, ρ0), which is chi-squared distributed with n+2d
degrees of freedom. Note that we rely on the updated
value of the parameter vector β = β1 when evaluating
this conditional density. We label the updated parameter
σ = σ1 and note that we will continue to employ the
updated values of previously sampled parameters when
evaluating the next conditional densities in the sequence.
• p(ρ|β1, σ1), which we can sample using a Metropolis-Hastings
step or univariate numerical integration and inversion.
Sampling for ρ using griddy Gibbs
This is a recent discovery of mine:
p(ρ|y, X, W) ∝ |S| · (s²)^−(n−k)/2 π(ρ)    (20)
We work with the log of the expression in (20), and
construct a vector associated with a grid of q values for ρ in
the feasible interval that takes the form in (21).
[Ln p(ρ1), . . . , Ln p(ρq)]′ ∝ [Ln|S(ρ1)|, . . . , Ln|S(ρq)|]′
− ((n − k)/2) [Ln(s²(ρ1)), . . . , Ln(s²(ρq))]′    (21)
Where:

s²(ρi) = eo′eo − 2ρi ed′eo + ρi² ed′ed
e = eo − ρ ed
eo = y − Xβo
ed = Wy − Xβd
βo = (X′X)⁻¹X′y
βd = (X′X)⁻¹X′Wy    (22)
This vector allows univariate numerical integration using
a simple method such as Simpson's rule. For this model,
the vector of log-determinant terms over the grid of ρ values
can be computed once, before sampling begins (a sketch
follows).
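A sketch of the resulting griddy Gibbs draw for ρ, assuming a precomputed grid rgrid (q x 1) with matching log-determinants lndet, and e0, ed as defined in (22); these names are illustrative, not toolbox code:

q = length(rgrid);
s2 = (e0'*e0) - 2*rgrid*(ed'*e0) + (rgrid.^2)*(ed'*ed); % s2(rho) over the grid
lnp = lndet - ((n-k)/2)*log(s2);   % log of (21) at each grid point
lnp = lnp - max(lnp);              % rescale before exponentiating
p = exp(lnp);
p = p/trapz(rgrid,p);              % normalize by numerical integration
cdf = cumtrapz(rgrid,p);           % cumulative marginal distribution for rho
u = rand(1,1);                     % uniform(0,1) draw
rho = rgrid(find(cdf >= u, 1));    % invert the cdf to obtain a draw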
Computational issues using griddy Gibbs
There are four separate terms involved in the univariate
integration problem over the range of support for the
parameter ρ. These are shown in (23) as T1, T2, T3 and T4.
In addition, there are constants of proportionality that arise
during the analytical integration of β and σ that include a
Gamma function that involves n − k, where n denotes the
number of observations and k the number of explanatory
variables. For cases involving model comparisons where
the number of explanatory variables k varies, we need also
include this expression during solution of our integration
problem, which we will refer to as T0.
T1(ρ) = |In − ρW|    (23)
T2 = |X′X|^−1/2
T3(ρ) = s²(ρ)^−(n−k)/2
s²(ρ) = [(In − ρW)y − Xβ]′[(In − ρW)y − Xβ]
T4(ρ) = (1/Be(α, α)) · (1 + ρ)^(α−1)(1 − ρ)^(α−1) / 2^(2α−1)
Griddy Gibbs sampling for rho
Figure 1: Griddy Gibbs sampling for ρ (a uniform(0,1) value on the vertical axis is inverted through the cumulative marginal distribution for ρ to produce a draw for ρ)
Applied examples
load elect.dat;                    % load data on votes
y  = log(elect(:,7)./elect(:,8));
x1 = log(elect(:,9)./elect(:,8));
x2 = log(elect(:,10)./elect(:,8));
x3 = log(elect(:,11)./elect(:,8));
latt = elect(:,5);
long = elect(:,6);
n = length(y); x = [ones(n,1) x1 x2 x3];
n = 3107;
[junk W junk] = xy2cont(latt,long);
vnames = strvcat('voters','const','educ','homeowners','income');

result = sar(y,x,W);               % maximum likelihood estimates
prt(result,vnames);

ndraw = 2500; nomit = 500;
prior.novi = 1;                    % homoscedastic model
result2 = sar_g(y,x,W,ndraw,nomit,prior); % MCMC estimation
result2.tflag = 'tstat';
prt(result2,vnames);
plt(result2);

result = sdm(y,x,W);               % maximum likelihood estimates
prt(result,vnames);

ndraw = 2500; nomit = 500;
prior.novi = 1;                    % homoscedastic model
result2 = sdm_g(y,x,W,ndraw,nomit,prior); % MCMC estimation
result2.tflag = 'tstat';
prt(result2,vnames);
Results
sar: hessian not positive definite augmenting small eigenvalues
Spatial autoregressive Model Estimates
Dependent Variable  = voters
R-squared           = 0.4426
Rbar-squared        = 0.4420
sigma^2             = 0.0138
Nobs, Nvars         = 3107, 4
log-likelihood      = 3209.629
# of iterations     = 12
min and max rho     = 0.0000, 1.0000
total time in secs  = 1.2710
time for lndet      = 0.8010
time for t-stats    = 0.2000
Pace and Barry, 1999 MC lndet approximation used
order for MC appr   = 50
iter for MC appr    = 30
***************************************************************
Variable     Coefficient   Asymptot t-stat   z-probability
const           0.626492        20.910168        0.000000
educ            0.220617        26.439243        0.000000
homeowners      0.481831        48.551766        0.000000
income         -0.099245       -20.069609        0.000000
rho             0.586990        41.899648        0.000000
Bayesian spatial autoregressive model
Homoscedastic version
Dependent Variable  = voters
R-squared           = 0.4423
Rbar-squared        = 0.4417
mean of sige draws  = 0.0138
Nobs, Nvars         = 3107, 4
ndraws,nomit        = 2500, 500
total time in secs  = 15.1320
time for lndet      = 0.7910
time for sampling   = 14.1800
Pace and Barry, 1999 MC lndet approximation used
order for MC appr   = 50
iter for MC appr    = 30
numerical integration used for rho
min and max rho     = -1.0000, 1.0000
***************************************************************
Posterior Estimates
Variable     Coefficient   Asymptot t-stat   z-probability
const           0.626406        15.115220        0.000000
educ            0.220135        14.089999        0.000000
homeowners      0.481993        32.344852        0.000000
income         -0.099131        -6.002571        0.000000
rho             0.587551        38.863675        0.000000
sar: hessian not positive definite augmenting small eigenvalues
Spatial Durbin model
Dependent Variable  = voters
R-squared           = 0.5072
Rbar-squared        = 0.5063
sigma^2             = 0.0126
log-likelihood      = 3316.1081
Nobs, Nvars         = 3107, 4
# iterations        = 17
min and max rho     = 0.0000, 1.0000
total time in secs  = 0.9920
time for lndet      = 0.7310
time for t-stats    = 0.1600
Pace and Barry, 1999 MC lndet approximation used
order for MC appr   = 50
iter for MC appr    = 30
***************************************************************
Variable       Coefficient   Asymptot t-stat   z-probability
const             0.466628       248.431325        0.000000
educ              0.136006         5.507480        0.000000
homeowners        0.581067        38.360260        0.000000
income           -0.069938        -3.036205        0.002396
W-educ            0.107467         3.664551        0.000248
W-homeowners     -0.410606       -19.836192        0.000000
W-income         -0.077025        -2.695428        0.007030
rho               0.653982        33.605980        0.000000
Bayesian Spatial Durbin model
Homoscedastic version
Dependent Variable  = voters
R-squared           = 0.5072
mean of sige draws  = 0.0126
Nobs, Nvars         = 3107, 7
ndraws,nomit        = 2500, 500
total time in secs  = 16.1930
time for lndet      = 0.7410
time for sampling   = 15.2620
Pace and Barry, 1999 MC lndet approximation used
order for MC appr   = 50
iter for MC appr    = 30
numerical integration used for rho
min and max rho     = -1.0000, 1.0000
***************************************************************
Variable       Coefficient   Asymptot t-stat   z-probability
const             0.461836         9.089562        0.000000
educ              0.135410         5.314491        0.000000
homeowners        0.580899        37.353203        0.000000
income           -0.069273        -3.005387        0.002652
W-educ            0.106829         3.521691        0.000429
W-homeowners     -0.411537       -14.616216        0.000000
W-income         -0.076070        -2.476424        0.013271
rho               0.655221        40.227536        0.000000
Using plt(results)
Figure 2: plotting the results structure (four panels: SAR heteroscedastic Gibbs actual vs. predicted, residuals, mean of Vi draws, posterior density for ρ)
Bayesian heteroscedastic/robust spatial models

We introduce a more general version of the SAR, SDM and SEM models that allows for non-constant variance across space as well as outliers. When dealing with spatial datasets one can encounter what have become known as “enclave effects”, where a particular region does not follow the same relationship as the majority of spatial observations. For example, all counties in a single state might represent aberrant observations that differ from those in all other counties. This will lead to fat-tailed errors that are not normal, but more likely to follow a Student-t distribution.
• Introduce a set of variance scalars (v1, v2, . . . , vn), as unknown parameters that need to be estimated.

• This allows us to assume ε ∼ N(0, σ²V), where V = diag(v1, v2, . . . , vn).

• The prior distribution for the vi terms takes the form of an independent χ²(r)/r distribution. Recall that the χ² distribution is a single parameter distribution, where we have represented this parameter as r. This allows us to estimate the additional n parameters vi in the model by adding the single parameter r to our estimation procedure.
The heteroscedastic prior
Figure 3: Prior Vi distributions for various values of r (r = 2, 5, 20; horizontal axis: Vi values, vertical axis: prior probability density)
• The prior mean of the vi equals unity and the prior variance of the vi is 2/r.

• This implies that as r becomes very large, the terms vi will all approach unity, resulting in V = In, the traditional assumption of constant variance across space.

• On the other hand, small values of r lead to a skewed distribution that permits large values of vi that deviate greatly from the prior mean of unity.

• The role of these large vi values is to accommodate outliers or observations containing large variances by downweighting these observations.
• Note that ε ∼ N(0, σ²V), with V diagonal, implies a generalized least-squares (GLS) correction to the vector y and explanatory variables matrix X. The GLS correction involves dividing through by √vi, which leads to large vi values functioning to downweight these observations (a sketch follows). Even in large samples, this prior will exert an impact on the estimation outcome.
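A minimal sketch of the GLS correction, assuming y (n x 1), x (n x k) and a vector vi of variance scalars; the names here are illustrative, and the sampler code below uses the toolbox function matmul for the same purpose:

vsqrt = 1./sqrt(vi);                % weights: observations with large vi
ys = y.*vsqrt;                      % are divided through by sqrt(vi),
xs = x.*repmat(vsqrt,1,size(x,2));  % i.e., downweighted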
Bayesian heteroscedastic SAR model
We use a normal-gamma conjugate prior for β and σ, and a uniform prior for ρ, in addition to the chi-squared prior for the terms in V. The prior distributions are indicated using π.
y = ρWy + Xβ + ε
ε ∼ N(0, σ²V),  V = diag(v1, . . . , vn)
π(β) ∼ N(c, T)
π(r/vi) ∼ IID χ²(r)
π(1/σ²) ∼ Γ(d, ν)
π(ρ) ∼ U[0, 1]    (24)
Estimation of Bayesian spatial models
An unfortunate complication that arises with this extension is that the addition of the chi-squared prior creates a complicated posterior distribution. Assume, for the moment, diffuse priors for β, σ. A key insight is that if we knew V, this problem would look like a GLS version of the SAR model.

That is, conditional on V, we would arrive at similar expressions as in maximum likelihood, where y and X are transformed by dividing through by √diag(V). We rely on a Markov Chain Monte Carlo (MCMC) estimation method that exploits this fact.

The parameters β, V and σ in the heteroscedastic SAR model can be estimated by drawing sequentially from the conditional distributions of these parameters.
Conditional distributions
To implement this estimation method, we need to determine the conditional distributions for each parameter in our Bayesian heteroscedastic SAR model.

• The conditional distribution for β follows from the insight that given V, we can rely on standard Bayesian GLS regression results to show that:
p(β|ρ, σ, V) ∼ N(b̄, B)    (25)
b̄ = A(X′V⁻¹Sy + σ²T⁻¹c)
B = σ²A
A = (X′V⁻¹X + σ²T⁻¹)⁻¹

We see that the conditional for β is a multivariate normal distribution from which it is easy to sample a vector β.
• The conditional distribution for σ given the other parameters takes the form (see Gelman, Carlin, Stern and Rubin, 1995):

p(σ²|β, ρ, V) ∝ (σ²)^−(n/2+d+1) exp{−(e′V⁻¹e + 2ν)/(2σ²)}
e = (In − ρW)y − Xβ

which is proportional to an inverse gamma distribution with parameters (n/2) + d and e′V⁻¹e + 2ν. Again, this would be an easy distribution from which to sample a scalar value for σ.
• Geweke (1993) shows that the conditional distribution of V given the other parameters is proportional to a chi-square density with r + 1 degrees of freedom. Specifically, we can express the conditional posterior of each vi as:

[(e²i/σ²) + r] / vi | (β, ρ, σ², v−i) ∼ χ²(r + 1)    (26)

where v−i = (v1, . . . , vi−1, vi+1, . . . , vn) for each i, and e is as defined above. Again, this represents a known distribution from which it is easy to construct a scalar draw.
• Finally, the conditional posterior distribution of ρ takes the form:

p(ρ|β, σ, V) ∝ |S(ρ)| (s²(ρ))^−(n−k)/2 π(ρ)    (27)
s²(ρ) = (Sy − Xb(ρ))′V⁻¹(Sy − Xb(ρ)) / (n − k)

Again, we can use either Metropolis-Hastings or univariate numerical integration to draw ρ values.
Example code
while (iter <= ndraw);             % ============> start sampling
  xs = matmul(x,sqrt(V));          % ============> update beta
  ys = sqrt(V).*y; Wys = sqrt(V).*Wy;
  AI = inv(xs'*xs + sige*TI);
  yss = ys - rho*Wys;
  b = xs'*yss + sige*TIc;
  b0 = AI*b;
  bhat = norm_rnd(sige*AI) + b0;
  xb = xs*bhat;
  nu1 = n + 2*nu;                  % ============> update sige
  e = (yss - xb); d1 = 2*d0 + e'*e;
  chi = chis_rnd(1,nu1);
  sige = d1/chi;
  ev = y - rho*Wy - x*bhat;        % ============> update vi (raw residuals)
  chiv = chis_rnd(n,rval+1);
  vi = ((ev.*ev/sige) + in*rval)./chiv;
  V = in./vi;
  b0 = AI*xs'*ys;                  % ============> update rho
  bd = AI*xs'*Wys;
  e0 = ys - xs*b0; ed = Wys - xs*bd;
  epe0 = e0'*e0;
  eped = ed'*ed;
  epe0d = ed'*e0;
  rho = draw_rho(detval,epe0,eped,epe0d,n,k,rho);
  bsave(iter,1:k) = bhat';         % ============> save draws
  ssave(iter,1) = sige;
  psave(iter,1) = rho;
  vmean = vmean + vi;
  iter = iter + 1;
end;                               % ============> end sampling
Applied Example
To illustrate the heteroscedastic Bayesian SAR model,
we generate data containing outliers:
y = (In − ρW)⁻¹Xβ + (In − ρW)⁻¹ε    (28)
load anselin.dat;              % 49 observations from Columbus, OH
latt = anselin(:,4); long = anselin(:,5);
% create W-matrix using Anselin's neighborhood crime data set
[junk W junk] = xy2cont(latt,long);
[n junk] = size(W); k = 3; IN = eye(n);
sige = 0.1; rho = 0.7;         % true values for sige and rho
x = randn(n,k);                % generate random normal X-matrix
beta = ones(k,1);              % true parameter values
ndraw = 2500;                  % # of draws to carry out
nomit = 500;                   % # of draws to exclude for burn-in
% generate y based on the SAR model
out = ones(n,1);
out(30,1) = 10; out(35,1) = 10; out(40,1) = 10;
y = inv(IN-rho*W)*x*beta + inv(IN-rho*W)*randn(n,1).*out;
% estimate maximum likelihood model
result = sar(y,x,W);
prt(result);
prior.rval = 4;                % heteroscedastic prior
result2 = sar_g(y,x,W,ndraw,nomit,prior);
prt(result2);
Results
Spatial autoregressive Model Estimates
R-squared           = 0.4525
Rbar-squared        = 0.4287
sigma^2             = 8.5044
Nobs, Nvars         = 49, 3
log-likelihood      = 108.95304
# of iterations     = 16
min and max rho     = -1.0000, 1.0000
total time in secs  = 0.1000
time for lndet      = 0.0600
time for t-stats    = 0.0100
Pace and Barry, 1999 MC lndet approximation used
order for MC appr   = 50
iter for MC appr    = 30
***************************************************************
Variable     Coefficient   Asymptot t-stat   z-probability
variable 1      0.557826         1.087610        0.276767
variable 2      1.082276         2.589818        0.009603
variable 3     -0.082036        -0.188551        0.850445
rho             0.766955         4.667705        0.000003
Results
Bayesian spatial autoregressive model
Heteroscedastic model
R-squared           = 0.4657
sigma^2             = 1.0282
r-value             = 4
Nobs, Nvars         = 49, 3
ndraws,nomit        = 2500, 500
total time in secs  = 13.9300
time for lndet      = 0.0600
time for sampling   = 13.8500
Pace and Barry, 1999 MC lndet approximation used
order for MC appr   = 50
iter for MC appr    = 30
numerical integration used for rho
min and max rho     = -1.0000, 1.0000
***************************************************************
Posterior Estimates
Variable     Coefficient   Std Deviation    p-level
variable 1      0.761523        0.229360    0.000500
variable 2      0.758247        0.170338    0.000000
variable 3      0.727195        0.193013    0.000000
rho             0.768175        0.056886    0.000000
Figure 4: Maximum likelihood actual vs. predicted values and residuals
Figure 5: Posterior means of the vi estimates, plotted by observation