Electronic copy available at: http://ssrn.com/abstract=1433709
Bayesian Estimation of Structural Equation Modelswith R
A User Manual
J. Büschken
Catholic University of Eichstätt-Ingolstadt
G. Allenby
The Ohio State University
Working Paper
Please do not cite without the authors' permission
Draft as of 2009-07-07
Joachim Büschken, Catholic University of Eichstätt-Ingolstadt, Ingolstadt School of Management, Marketing Department, Auf der Schanz 49, D-85049 Ingolstadt, Germany, phone: +49 841 937 1976, fax: +49 841 937 2976, email: [email protected]

Greg M. Allenby, The Ohio State University, Fisher College of Business, 540A Fisher Hall, Helen C. Kurtz Chair in Marketing, 2100 Neil Avenue, Columbus, OH 43210, USA, phone: +1 614 292 9452, fax: +49 841 937 2976, email: [email protected]
How to estimate structural equation models with R?
In the social sciences it is often useful to introduce latent variables and use structural
equation modeling to quantify relations among observable and latent variables. This paper
presents a manual describing how to estimate structural equation models in a Bayesian
framework with R. Parameter estimation follows a Gibbs sampling procedure, generating
draws from the full conditionals of the unknown parameters. The manual is divided into
two main parts. The first part presents an introduction to the estimation of structural
equation models with R. The second part describes a method for simulating data from a
structural equation model, and the appendix contains the derivation of the full conditional
distributions.
1 Estimation of SEMs
To illustrate the Bayesian estimation of SEMs with R, we present an application in the
context of a simple SEM. The estimation procedure comprises three steps: first, the model
has to be specified; second, the data have to be attached to the model; and finally, these
values have to be passed to the estimation function.
Specifying the model and attaching data
In order to enable the user to become familiar with the notation and to transfer his or her
model specification to the model framework used in this paper, this subsection gives a short
overview of the framework.1 An example of the specification of a simple SEM
illustrates how the specification procedure works. This is followed by showing how to
attach data to the model.

1 For more details on the model framework, see the appendix.
A SEM is composed of a measurement equation (1) and a structural equation (2):

y_i = Λω_i + ε_i      (1)

η_i = Πη_i + Γξ_i + δ_i      (2)

η_i = (Π, Γ) ω_i + δ_i = Mω_i + δ_i

where i ∈ {1,...,n}.

Observations of reflective measures y_i are assumed to be generated by underlying latent variables ω_i, possibly with measurement error ε_i. The measurement equation is defined by a confirmatory factor analysis model, where Λ is the associated (p × q) loading matrix. The structural equation specifies relationships among the latent variables, where ω_i can be partitioned into η_i, an endogenous (q1 × 1) vector of latent variables, and ξ_i, an exogenous (q2 × 1) vector of latent variables. Let q = q1 + q2. M = (Π, Γ) is the unknown (q1 × q) matrix of regression coefficients that represent the proposed causal effects among η and ξ, and δ_i (q1 × 1) is a random vector of residuals. It is assumed that the measurement errors are uncorrelated with η and ξ, the residuals are uncorrelated with ξ, and the variables are distributed as follows:

ε_i ∼ N(0, Ψ_ε)      (3)
δ_i ∼ N(0, Ψ_δ)      (4)
ξ_i ∼ N(0, Φ)      (5)

for i ∈ {1,...,n}, where Ψ_ε and Ψ_δ are diagonal matrices. This model is not identified, but it can be identified by restricting appropriate elements of Λ and/or M to fixed known values (0 or 1). This is done with the help of the following Pick matrices:

ΛPick (p × q): matrix containing the fixed known elements (0 or 1) of Λ
MPick (q1 × q): matrix containing the fixed known elements (0 or 1) of M
For example, if Λ is a (2 × 2) matrix and you want to fix element [1, 1] to 1 and element
[2, 2] to 0, ΛPick is:

ΛPick =
( 1  4 )
( 4  0 )

The non-fixed elements of ΛPick can be set to any value except 0 and 1. Non-fixed elements represent starting values for the MCMC chain (in this case: 4).
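In R, this example Pick matrix can be written down directly (the object name LPick is our own choice for ΛPick):

```r
# Pick matrix for a (2 x 2) loading matrix: element [1,1] fixed to 1,
# element [2,2] fixed to 0; free elements carry the starting value 4.
LPick <- matrix(c(1, 4,
                  4, 0), nrow = 2, byrow = TRUE)
# Elements other than 0 and 1 are treated as free (sampled) parameters
free <- matrix(!(LPick %in% c(0, 1)), nrow = 2)
```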
In order to enable Gibbs sampling from the posterior distributions, we set natural conjugate prior distributions for the unknown parameters. Let ψ_εk be the kth diagonal element of Ψ_ε, ψ_δl the lth diagonal element of Ψ_δ, Λ_k^T the kth row of Λ and M_l^T the lth row of M. We assume:

ψ_εk^{-1} ∼ Gamma(α_0k, β_0k)      (6)
Λ_k | ψ_εk ∼ N(Λ_0k, ψ_εk H_0k)      (7)
ψ_δl^{-1} ∼ Gamma(α_0l, β_0l)      (8)
M_l | ψ_δl ∼ N(M_0l, ψ_δl H_0Ml)      (9)
Φ ∼ IW[v_0, V_0]      (10)

with k ∈ {1,...,p} and l ∈ {1,...,q1}.

It follows that the following parameters have to be specified:

α_0k: shape parameter of the prior distribution of ψ_εk^{-1}
β_0k: inverse scale parameter of the prior distribution of ψ_εk^{-1}
α_0l: shape parameter of the prior distribution of ψ_δl^{-1}
β_0l: inverse scale parameter of the prior distribution of ψ_δl^{-1}
v_0, V_0: parameters of the prior distribution of Φ

Note that we assume that these values are the same for all k ∈ {1,...,p} and l ∈ {1,...,q1}. The prior parameters of the distributions of the regression matrices are set as follows:

Λ_0k: the prior mean of Λ_k is assumed to be zero
H_0k: the variance-covariance matrix of the prior distribution of Λ_k
is assumed to be a diagonal matrix with 0.01 on the diagonal
M_0l: the prior mean of M_l is assumed to be zero
H_0Ml: the variance-covariance matrix of the prior distribution of M_l is assumed to be a diagonal matrix with 0.01 on the diagonal

Furthermore, it is necessary to set starting values for the unknown parameters:

Ψ_ε (p × p): diagonal variance-covariance matrix of the measurement errors
Ψ_δ (q1 × q1): diagonal variance-covariance matrix of the structural residuals
Φ (q2 × q2): variance-covariance matrix of the latent exogenous variables

and to determine the number of iterations of the MCMC chain, R. Starting values for the regression coefficients in the matrices Λ and M have already been set in the corresponding Pick matrices.
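As an illustration, these quantities could be set up in R as follows. The object names are our own, and the concrete choices for v0, V0 and the starting matrices are only plausible defaults, not prescriptions from the paper:

```r
# Illustrative setup for a model with p = 8 indicators, q1 = 3 endogenous
# and q2 = 1 exogenous latent variables (the example used below).
p <- 8; q1 <- 3; q2 <- 1; q <- q1 + q2

alpha0k <- 100; beta0k <- 1     # Gamma prior of 1/psi_eps_k (shape, inverse scale)
alpha0l <- 100; beta0l <- 1     # Gamma prior of 1/psi_delta_l

H0k  <- diag(0.01, q)           # prior covariance of each row of Lambda
H0Ml <- diag(0.01, q)           # prior covariance of each row of M
v0 <- q2 + 2                    # inverse-Wishart prior for Phi (assumed choice)
V0 <- diag(q2)

Psi_eps   <- diag(p)            # starting values for the variance matrices
Psi_delta <- diag(q1)
Phi       <- diag(q2)
R <- 10000                      # number of MCMC iterations
```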
We use the following simple structural equation model with a single exogenous
variable and three endogenous variables to exemplify our approach:
Figure 1: SEM example (path diagram: ξ1 → η1 via γ1, ξ1 → η2 via γ2, η1 → η3 via β1, η2 → η3 via β2)
    ( η_1i )   (  0    0   0 ) ( η_1i )   ( γ_1 )          ( δ_1i )
    ( η_2i ) = (  0    0   0 ) ( η_2i ) + ( γ_2 ) ξ_1i  +  ( δ_2i )      (11)
    ( η_3i )   ( β_1  β_2  0 ) ( η_3i )   (  0  )          ( δ_3i )

η_1i, η_2i and η_3i are the endogenous variables in this model; ξ_1i is an exogenous variable. In matrix notation, and using the usual notation for vectors of latent variables in the SEM literature, this structural model can be written as η_i = Πη_i + Γξ_1i + δ_i for observations i = 1,...,n.
By combining the two matrices Π and Γ, this equation becomes:

    ( η_1i )   (  0    0   0  γ_1 ) ( η_1i )   ( δ_1i )
    ( η_2i ) = (  0    0   0  γ_2 ) ( η_2i ) + ( δ_2i )      (12)
    ( η_3i )   ( β_1  β_2  0   0  ) ( η_3i )   ( δ_3i )
                                    ( ξ_1i )

In matrix notation we write this as η_i = Mω_i + δ_i. In our example, we assume that each
latent variable is measured by two reflective measurement indicators, as shown by the
following measurement equations:
        ( λ_11   0     0     0   )              ( ε_1i )
        ( λ_21   0     0     0   )              ( ε_2i )
        (  0    λ_32   0     0   )  ( η_1i )    ( ε_3i )
  y_i = (  0    λ_42   0     0   )  ( η_2i )  + ( ε_4i )      (13)
        (  0     0    λ_53   0   )  ( η_3i )    ( ε_5i )
        (  0     0    λ_63   0   )  ( ξ_1i )    ( ε_6i )
        (  0     0     0    λ_74 )              ( ε_7i )
        (  0     0     0    λ_84 )              ( ε_8i )

which is the same as y_i = Λω_i + ε_i, where ω_i comprises the values of all latent variables for observation i and y_i the vector of observed measurement indicators for i. Since this model is not identified, we have to fix elements of Λ to 1. Thus we get:

        ( λ_11   0     0     0   )              ( ε_1i )
        (  1     0     0     0   )              ( ε_2i )
        (  0    λ_32   0     0   )  ( η_1i )    ( ε_3i )
  y_i = (  0     1     0     0   )  ( η_2i )  + ( ε_4i )      (14)
        (  0     0    λ_53   0   )  ( η_3i )    ( ε_5i )
        (  0     0     1     0   )  ( ξ_1i )    ( ε_6i )
        (  0     0     0    λ_74 )              ( ε_7i )
        (  0     0     0     1   )              ( ε_8i )
The matrices Λ and M contain fixed known elements, either 0 or 1. On this basis we can determine the corresponding Pick matrices:

         ( ∗  0  0  0 )
         ( 1  0  0  0 )
         ( 0  ∗  0  0 )
         ( 0  1  0  0 )
ΛPick =  ( 0  0  ∗  0 )      (15)
         ( 0  0  1  0 )
         ( 0  0  0  ∗ )
         ( 0  0  0  1 )

         ( 0  0  0  ∗ )
MPick =  ( 0  0  0  ∗ )      (16)
         ( ∗  ∗  0  0 )

∗ stands for the unknown elements of the matrices. For the MCMC chain we have to set starting values for these elements. In this case we set all unknown parameters to 4. The resulting Pick matrices are:

         ( 4  0  0  0 )
         ( 1  0  0  0 )
         ( 0  4  0  0 )
         ( 0  1  0  0 )
ΛPick =  ( 0  0  4  0 )      (17)
         ( 0  0  1  0 )
         ( 0  0  0  4 )
         ( 0  0  0  1 )
         ( 0  0  0  4 )
MPick =  ( 0  0  0  4 )      (18)
         ( 4  4  0  0 )
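In R, the two Pick matrices of equations (17) and (18) can be entered as:

```r
# LPick (for Lambda) and MPick from equations (17) and (18);
# the value 4 marks a free element and serves as its starting value
LPick <- matrix(c(4, 0, 0, 0,
                  1, 0, 0, 0,
                  0, 4, 0, 0,
                  0, 1, 0, 0,
                  0, 0, 4, 0,
                  0, 0, 1, 0,
                  0, 0, 0, 4,
                  0, 0, 0, 1), nrow = 8, byrow = TRUE)

MPick <- matrix(c(0, 0, 0, 4,
                  0, 0, 0, 4,
                  4, 4, 0, 0), nrow = 3, byrow = TRUE)
```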
The second step is attaching the data to the model. While the input of text-based data is possible, R supports several common data formats. For this manual, we show how to attach a .txt file. The data have to be arranged as an (n × p) matrix and saved as a .txt file. You can read the data from this file using the read.table function, which creates a data frame:

Data = read.table(file="C:/your folder/data.txt", header=TRUE, sep="\t", dec=",")
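A quick way to convince yourself that this round trip works is to write a small matrix to a tab-separated file and read it back; the snippet below is purely illustrative:

```r
# Write a small (n x p) data set to a tab-separated .txt file and
# read it back as described above (decimal comma, tab separator)
tmp <- tempfile(fileext = ".txt")
write.table(matrix(rnorm(20), 5, 4), file = tmp, sep = "\t", dec = ",",
            row.names = FALSE)
Data <- as.matrix(read.table(file = tmp, header = TRUE, sep = "\t", dec = ","))
```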
Passing data to the estimation function

Having set all necessary parameters and attached the data, these objects can be passed to a function called semest, which draws the parameters of the model from their full conditionals and thus yields estimates of the unknown parameters. In order to pass data and parameter values to the function semest, you have to arrange the objects in the following order:

L = (Data, α_0k, β_0k, α_0l, β_0l, ΛPick, MPick, Ψ_ε, Ψ_δ, Φ, v_0, V_0, R),

where Λ_0 and M_0 are the corresponding matrices, containing all rows Λ_0k and M_0l, respectively.
Now these values have to be passed to the function semest:
semest(L)
This function yields all draws of the posterior distributions of the unknown parameters as
well as the estimated values of the latent variables.2
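Putting the pieces together, the call could look as follows. Note that semest is the authors' own function (not part of a published R package) and must be sourced from their code; the object names are our own labels:

```r
# Hedged sketch of the call; the objects follow the order given above.
L <- list(Data, alpha0k, beta0k, alpha0l, beta0l,
          LPick, MPick, Psi_eps, Psi_delta, Phi, v0, V0, R)
out <- semest(L)
# 'out' then holds the posterior draws and the estimated latent variables
```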
2 Data Simulation
In order to check whether the algorithm recovers the true parameter values, you can test the Gibbs sampler by simulating data and subsequently estimating the corresponding

2 For more details on the derivation of the posterior distributions, see the appendix.
parameter values. First, you have to determine a structural equation model; thus, the following parameters have to be specified:
Λ (p × q): matrix of regression coefficients of the measurement model
M (q1 × q): matrix of regression coefficients of the structural model
Ψ_ε (p × p): diagonal variance-covariance matrix of the measurement errors
Ψ_δ (q1 × q1): diagonal variance-covariance matrix of the structural residuals
Φ (q2 × q2): variance-covariance matrix of the latent exogenous variables
n: number of observations
Then you can pass those values to the function sim, which yields the simulated observations and latent variables:

sim(Λ, M, Ψ_ε, Ψ_δ, Φ, n)
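The authors' sim function is not listed in this manual, but its logic follows directly from equations (1)-(5). The self-contained sketch below (the function name simSEM and the return format are our own assumptions) shows one way such a simulator can be written:

```r
# Simulate data from the SEM: xi ~ N(0, Phi), delta ~ N(0, Psi_delta),
# eta_i = (I - Pi)^-1 (Gamma xi_i + delta_i), y_i = Lambda omega_i + eps_i
simSEM <- function(Lambda, M, Psi_eps, Psi_delta, Phi, n) {
  p  <- nrow(Lambda); q <- ncol(Lambda)
  q1 <- nrow(M); q2 <- q - q1
  Pi    <- M[, 1:q1, drop = FALSE]           # coefficients among eta
  Gamma <- M[, (q1 + 1):q, drop = FALSE]     # coefficients of xi on eta
  Xi    <- matrix(rnorm(n * q2), n, q2) %*% chol(Phi)
  Delta <- matrix(rnorm(n * q1), n, q1) %*% chol(Psi_delta)
  Eta   <- t(solve(diag(q1) - Pi, t(Xi %*% t(Gamma) + Delta)))
  Omega <- cbind(Eta, Xi)
  Y <- Omega %*% t(Lambda) + matrix(rnorm(n * p), n, p) %*% chol(Psi_eps)
  list(Y = Y, Omega = Omega)
}
```

For the example model above, Lambda would be the (8 × 4) matrix of equation (14) and M the (3 × 4) matrix of equation (12).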
Appendix: Bayesian Estimation of standard SEMs
This section develops a Gibbs sampler to estimate structural equation models (SEM) with
reflective measurement indicators. We illustrate the Bayesian estimation by considering a
standard SEM that is equivalent to the most commonly used LISREL model.
A Model Framework
A SEM is composed of a measurement equation (19) and a structural equation (20):

y_i = Λω_i + ε_i      (19)

η_i = Πη_i + Γξ_i + δ_i      (20)

η_i = (Π, Γ) ω_i + δ_i = Mω_i + δ_i      (21)

where i ∈ {1,...,n}.

Observations of reflective measures y_i are assumed to be generated by underlying latent variables ω_i, possibly with measurement error ε_i. The corresponding matrices including all observations are Y (n × p), Ω (n × q) and E (n × p). The measurement equation is defined by a confirmatory factor analysis model, where Λ (p × q) is the associated loading matrix. The structural equation specifies relationships among the identified latent variables, where ω can be partitioned into η (q1 × 1), an endogenous random vector of latent variables, and ξ (q2 × 1), an exogenous random vector of latent variables. M (q1 × q) is the unknown matrix of regression coefficients that represent the causal effects among η and ξ, and δ (q1 × 1) is a random vector of residuals. It is assumed that measurement errors are uncorrelated with η and ξ, residuals are uncorrelated with ξ, and the variables are distributed as follows:

ε_i ∼ N(0, Ψ_ε)      (22)
δ_i ∼ N(0, Ψ_δ)      (23)
ξ_i ∼ N(0, Φ)      (24)

for i ∈ {1,...,n}, where Ψ_ε and Ψ_δ are diagonal matrices. The covariance matrix of ω is derived on the basis of the SEM:
        ( E[ηη^T]  E[ηξ^T] )
Σ_ω  =  (                  )      (25)
        ( E[ξη^T]  E[ξξ^T] )

        ( Π_0^{-1}(ΓΦΓ^T + Ψ_δ)(Π_0^{-1})^T    Π_0^{-1}ΓΦ )
     =  (                                                 )      (26)
        ( ΦΓ^T(Π_0^{-1})^T                     Φ          )

with Π_0 = I − Π, since η = Π_0^{-1}Γξ + Π_0^{-1}δ implies:

E[ηη^T] = Π_0^{-1}Γ E[ξξ^T] Γ^T(Π_0^{-1})^T + Π_0^{-1} E[δδ^T] (Π_0^{-1})^T = Π_0^{-1}(ΓΦΓ^T + Ψ_δ)(Π_0^{-1})^T
E[ηξ^T] = Π_0^{-1}Γ E[ξξ^T] = Π_0^{-1}ΓΦ
E[ξξ^T] = Φ      (27)

This model is not identified, but it can be identified by restricting appropriate elements of Λ and/or M to fixed known values (0 or 1).
B Prior Distributions

In order to enable Gibbs sampling from the posterior distributions, we set natural conjugate prior distributions for the unknown parameters. Let ψ_εk be the kth diagonal element of Ψ_ε, ψ_δl the lth diagonal element of Ψ_δ, Λ_k^T the kth row of Λ and M_l^T the lth row of M. We assume:

ψ_εk^{-1} ∼ Gamma(α_0k, β_0k)      (28)
Λ_k | ψ_εk ∼ N(Λ_0k, ψ_εk H_0k)      (29)
ψ_δl^{-1} ∼ Gamma(α_0l, β_0l)      (30)
M_l | ψ_δl ∼ N(M_0l, ψ_δl H_0Ml)      (31)
Φ ∼ IW[v_0, V_0]      (32)

with k ∈ {1,...,p} and l ∈ {1,...,q1}.
C Derivations of conditional distributions
According to Bayes' theorem, the joint posterior of all unknown parameters is proportional to the likelihood times the prior:

p(Λ, Ψ_ε, Ω, M, Ψ_δ, Φ | Y) ∝ p(Y | Λ, Ψ_ε, Ω, M, Ψ_δ, Φ) p(Λ, Ψ_ε, Ω, M, Ψ_δ, Φ)      (33)

Given Y and Ω, Λ and Ψ_ε are independent of Φ. Once we have obtained draws of Ω, we can treat the estimation of Λ and Ψ_ε as a simple regression model. Thus we can sample from the posterior distribution of Λ and Ψ_ε without having to refer to Φ. The same holds for inference with regard to M, Ψ_δ and Φ, which are independent of Y given Ω. This suggests:

p(Λ, Ψ_ε, M, Ψ_δ, Φ | Y, Ω) ∝ [p(Y | Λ, Ψ_ε, Ω) p(Λ, Ψ_ε)] · [p(Ω | M, Ψ_δ, Φ) p(M, Ψ_δ, Φ)]      (34)

and we can treat the conditional posterior distributions of (Λ, Ψ_ε) and (M, Ψ_δ, Φ) separately. Ω in the above expression refers to the n observations of the values of the latent variables ω_i, which are conditionally independent given Φ. The parameters of Φ can be understood as the parameters of the distribution of heterogeneity of the latent variables.
C.1 Obtaining draws of the latent variables (Ω)

We can obtain draws of Ω through the posterior of Ω, which, according to Bayes' theorem, is given by:

p(Ω | Y, Λ, Ψ_ε, Σ_ω) ∝ ∏_{i=1}^{n} p(y_i | ω_i, Λ, Ψ_ε) p(ω_i | Σ_ω)      (35)

Given our assumption that the y_i are distributed N(Λω_i, Ψ_ε) and the ω_i are distributed N(0, Σ_ω), we see that the posterior involves the kernel of two normals, whose quadratic forms in the exponent can easily be combined. Because of the IID assumption, we treat the inference for the ω_i separately for each observation i. The exponent of the resulting distribution has the following expression:

(ω_i − 0)^T Σ_ω^{-1} (ω_i − 0) + (y_i − Λω_i)^T Ψ_ε^{-1} (y_i − Λω_i)
= ω_i^T Σ_ω^{-1} ω_i + y_i^T Ψ_ε^{-1} y_i − 2ω_i^T Λ^T Ψ_ε^{-1} y_i + ω_i^T Λ^T Ψ_ε^{-1} Λ ω_i
= ω_i^T (Σ_ω^{-1} + Λ^T Ψ_ε^{-1} Λ) ω_i − 2ω_i^T Λ^T Ψ_ε^{-1} y_i + y_i^T Ψ_ε^{-1} y_i
= (ω_i − (Σ_ω^{-1} + Λ^T Ψ_ε^{-1} Λ)^{-1} Λ^T Ψ_ε^{-1} y_i)^T (Σ_ω^{-1} + Λ^T Ψ_ε^{-1} Λ) (ω_i − (Σ_ω^{-1} + Λ^T Ψ_ε^{-1} Λ)^{-1} Λ^T Ψ_ε^{-1} y_i)
  − (Λ^T Ψ_ε^{-1} y_i)^T (Σ_ω^{-1} + Λ^T Ψ_ε^{-1} Λ)^{-1} (Λ^T Ψ_ε^{-1} y_i) + y_i^T Ψ_ε^{-1} y_i      (36)

where the last two terms are constants with respect to ω_i. As a result, the conditional posterior distribution of ω_i is:

p(ω_i | y_i, Λ, Ψ_ε, Σ_ω) ∝ N( (Σ_ω^{-1} + Λ^T Ψ_ε^{-1} Λ)^{-1} Λ^T Ψ_ε^{-1} y_i , (Σ_ω^{-1} + Λ^T Ψ_ε^{-1} Λ)^{-1} )      (37)
To obtain Ω, we simply cycle in this manner through the i loop. We can then treat Ω as data in subsequent steps of the Gibbs sampler.
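One sweep of this step might be coded as follows; the function name drawOmega is our own, and Sigma_omega stands for the (q × q) covariance matrix of ω implied by the current structural parameters (equation 25):

```r
# Hedged sketch of equation (37): one draw of omega_i given y_i
drawOmega <- function(y, Lambda, Psi_eps, Sigma_omega) {
  A <- solve(Sigma_omega) + t(Lambda) %*% solve(Psi_eps) %*% Lambda
  V <- solve(A)                                    # posterior covariance
  m <- V %*% t(Lambda) %*% solve(Psi_eps) %*% y    # posterior mean
  as.vector(m + t(chol(V)) %*% rnorm(length(m)))
}
```

Cycling this function over i = 1,...,n and stacking the results row-wise yields Ω.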
C.2 Obtaining draws of Φ

The first step in developing the conditional distribution for Φ is to recognize that only those elements of Ω which refer to ξ are relevant for Φ. Since ω_i = (η_i^T, ξ_i^T)^T, we can simply separate the draws of ξ from Ω, collected in the (n × q2) matrix Ξ, and use them for inference regarding Φ. Φ refers to the variance of the vector of exogenous variables only. The likelihood of observing Ξ is:

p(Ξ | Φ) ∝ ∏_{i=1}^{n} |Φ|^{−1/2} exp( −(1/2) ξ_i^T Φ^{-1} ξ_i ) = |Φ|^{−n/2} etr( −(1/2) Φ^{-1} Ξ^T Ξ )      (38)

Combining equation (38) with the prior distribution of Φ in equation (32) yields:

[Φ | Ξ] ∼ IW( v_0 + n, V_0 + Ξ^T Ξ )      (39)
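Base R offers rWishart but no inverse-Wishart sampler, so equation (39) can be implemented through the relation Φ ∼ IW(v, S) if and only if Φ^{-1} ∼ Wishart(v, S^{-1}). The function name is our own, and inverse-Wishart parameterizations differ across texts, so check against your convention:

```r
# Hedged sketch of equation (39): Phi | Xi ~ IW(v0 + n, V0 + Xi'Xi)
drawPhi <- function(Xi, v0, V0) {
  S <- V0 + t(Xi) %*% Xi
  W <- stats::rWishart(1, df = v0 + nrow(Xi), Sigma = solve(S))[, , 1]
  solve(W)
}
```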
C.3 Obtaining draws of Λ and Ψ_ε

Given Ω, obtaining draws of Λ and Ψ_ε becomes a regression problem. We assume that Ψ_ε is a diagonal matrix, i.e. the measurement errors are uncorrelated. The likelihood of observing the data is given by:

p(Y | Λ, Ψ_ε, Ω) ∝ |Ψ_ε|^{−n/2} exp( −(1/2) Σ_{i=1}^{n} (y_i − Λω_i)^T Ψ_ε^{-1} (y_i − Λω_i) )      (40)

in which y_i and ω_i are column vectors. Because of the diagonality of Ψ_ε, we can write this as:

p(Y | Λ, Ψ_ε, Ω) ∝ |Ψ_ε|^{−n/2} exp( −(1/2) Σ_{i=1}^{n} Σ_{k=1}^{p} ψ_εk^{-1} (y_ik − Λ_k^T ω_i)^2 )      (41)

We can change the order of summation and move ψ_εk^{-1} out of the summation over i. The kernel of this distribution can then be written as:

−(1/2) Σ_{k=1}^{p} ψ_εk^{-1} Σ_{i=1}^{n} (y_ik − Λ_k^T ω_i)^2      (42)
The summation over i is:

Σ_{i=1}^{n} (y_ik − Λ_k^T ω_i)^2 = Σ_{i=1}^{n} ( y_ik^2 − 2 y_ik Λ_k^T ω_i + tr(Λ_k^T ω_i ω_i^T Λ_k) )
= Σ_{i=1}^{n} y_ik^2 − 2 Λ_k^T Σ_{i=1}^{n} y_ik ω_i + Λ_k^T Ω^T Ω Λ_k = Y_k^T Y_k − 2 Λ_k^T Ω^T Y_k + Λ_k^T Ω^T Ω Λ_k
= (Λ_k − (Ω^TΩ)^{-1}Ω^TY_k)^T Ω^TΩ (Λ_k − (Ω^TΩ)^{-1}Ω^TY_k) + Y_k^TY_k − Y_k^TΩ(Ω^TΩ)^{-1}Ω^TY_k      (43)

where the last two terms do not depend on Λ_k and are dropped in the following.
In the above expression, Y_k refers to the column vector of all n observations of the kth measurement variable. This yields the following likelihood:

p(Y | Λ, ψ_ε1^{-1},...,ψ_εp^{-1}, Ω) ∝ |Ψ_ε|^{−n/2} ∏_{k=1}^{p} exp( −(1/2) ψ_εk^{-1} (Λ_k − (Ω^TΩ)^{-1}Ω^TY_k)^T Ω^TΩ (Λ_k − (Ω^TΩ)^{-1}Ω^TY_k) )
= ∏_{k=1}^{p} ψ_εk^{−n/2} exp( −(1/2) ψ_εk^{-1} (Λ_k − (Ω^TΩ)^{-1}Ω^TY_k)^T Ω^TΩ (Λ_k − (Ω^TΩ)^{-1}Ω^TY_k) )      (44)
Notice that the determinant of Ψ_ε involves only the product of its diagonal elements. We can therefore move these elements into the exponential expression. The above expression for the likelihood implies:

independence of the draws of (Λ_k, ψ_εk^{-1}) from (Λ_h, ψ_εh^{-1}) for all h ≠ k

conditional independence of p(ψ_εk^{-1} | Y, Ω) and p(Λ_k | Y, Ω, ψ_εk^{-1})

Also notice that the p distributions for Λ_k | ψ_εk^{-1} and ψ_εk^{-1} are independent across k. This implies that we can draw each pair (Λ_k, ψ_εk^{-1}) independently. Thus the likelihood of observing Y_k is given by:

p(Y_k | Λ_k, ψ_εk^{-1}, Ω) ∝ ψ_εk^{−n/2} exp( −(1/2) ψ_εk^{-1} (Λ_k − (Ω^TΩ)^{-1}Ω^TY_k)^T Ω^TΩ (Λ_k − (Ω^TΩ)^{-1}Ω^TY_k) )      (45)
As mentioned above, this model is not identified. We can handle this problem by fixing some of the parameters in Λ; see Lee (2007) for the following section. We suggest fixing some elements of Λ to 1 and/or 0. Consider Λ_k^T, the kth row of Λ, with certain fixed parameters. Let c_k be the corresponding (1 × q) row vector such that c_kj = 0 if λ_kj is a fixed parameter and c_kj = 1 if λ_kj is an unknown parameter, for k = 1,...,p, j = 1,...,q, and let r_k = c_k1 + ... + c_kq. Moreover, let Λ_k*^T be the (1 × r_k) row vector that contains the unknown parameters in Λ_k, and let Ω_k* be the (n × r_k) submatrix of Ω in which all columns corresponding to c_kj = 0 are deleted. Let Y_k*^T = (y_1k*,...,y_nk*) with

y_ik* = y_ik − Σ_{j=1}^{q} λ_kj ω_ij (1 − c_kj)      (46)
This yields the following likelihood of observing Y_k*:

p(Y_k* | Λ_k*, ψ_εk^{-1}, Ω) ∝ ψ_εk^{−n/2} exp( −(1/2) ψ_εk^{-1} (Λ_k* − (Ω_k*^TΩ_k*)^{-1}Ω_k*^TY_k*)^T Ω_k*^TΩ_k* (Λ_k* − (Ω_k*^TΩ_k*)^{-1}Ω_k*^TY_k*) )      (47)

which, viewed as a function of Λ_k*, can also be written as:

[Λ_k* | Y_k*, ψ_εk^{-1}, Ω] ∝ N( (Ω_k*^TΩ_k*)^{-1}Ω_k*^TY_k* , ψ_εk (Ω_k*^TΩ_k*)^{-1} )      (48)

The conjugate prior distribution defined in equation (29) for the loading matrix is:

[Λ_k* | ψ_εk] ∼ N(Λ_0k*, ψ_εk H_0k*)      (49)
To derive the posterior for Λ_k* and ψ_εk^{-1}, we multiply equation (48) with (49) and (28):

p(Λ_k*, ψ_εk^{-1} | Y_k*, Ω)
∝ ψ_εk^{−n/2} exp( −(1/2) ψ_εk^{-1} (Λ_k* − (Ω_k*^TΩ_k*)^{-1}Ω_k*^TY_k*)^T Ω_k*^TΩ_k* (Λ_k* − (Ω_k*^TΩ_k*)^{-1}Ω_k*^TY_k*) )
× ψ_εk^{−q/2} exp( −(1/2) ψ_εk^{-1} (Λ_k* − Λ_0k*)^T (H_0k*)^{-1} (Λ_k* − Λ_0k*) )
× (ψ_εk^{-1})^{α_0k − 1} exp( −β_0k ψ_εk^{-1} )      (50)

Combining the two quadratic forms yields:

p(Λ_k*, ψ_εk^{-1} | Y_k*, Ω) ∝ ψ_εk^{−n/2} ψ_εk^{−q/2} exp( −(1/2) ψ_εk^{-1} [ (Λ_k* − c)^T C (Λ_k* − c) + d ] ) (ψ_εk^{-1})^{α_0k − 1} exp( −β_0k ψ_εk^{-1} )      (51)
with

C = Ω_k*^TΩ_k* + (H_0k*)^{-1}

c = ( Ω_k*^TΩ_k* + (H_0k*)^{-1} )^{-1} ( Ω_k*^TY_k* + (H_0k*)^{-1} Λ_0k* )

d = ((Ω_k*^TΩ_k*)^{-1}Ω_k*^TY_k*)^T Ω_k*^TΩ_k* ((Ω_k*^TΩ_k*)^{-1}Ω_k*^TY_k*) + (Λ_0k*)^T (H_0k*)^{-1} Λ_0k* − c^T ( Ω_k*^TΩ_k* + (H_0k*)^{-1} ) c      (52)
Thus the posterior distributions of Λ_k* and ψ_εk^{-1} are respectively given by:

p(Λ_k* | Y_k*, ψ_εk^{-1}, Ω) ∝ ψ_εk^{−q/2} exp( −(1/2) ψ_εk^{-1} (Λ_k* − c)^T C (Λ_k* − c) )      (53)

and

p(ψ_εk^{-1} | Y_k*, Ω) ∝ (ψ_εk^{-1})^{n/2 + α_0k − 1} exp( −(1/2) ψ_εk^{-1} [ 2β_0k + d ] )      (54)

which can also be written as:

Λ_k* | Y_k*, ψ_εk^{-1}, Ω ∼ N( c, ψ_εk C^{-1} )      (55)

and

ψ_εk^{-1} | Y_k*, Ω ∼ Gamma( n/2 + α_0k, β_0k + (1/2) d )      (56)
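For one row k, equations (55) and (56) translate into the following sketch (the function name is our own; Omega_k and Yk stand for Ω_k* and Y_k*, L0 and H0 for Λ_0k* and H_0k*):

```r
# Hedged sketch of equations (55)-(56): joint draw of (Lambda_k*, psi_eps_k)
drawLambdaRow <- function(Yk, Omega_k, L0, H0, alpha0, beta0) {
  C  <- t(Omega_k) %*% Omega_k + solve(H0)
  cc <- solve(C, t(Omega_k) %*% Yk + solve(H0) %*% L0)       # posterior mean c
  Lhat <- solve(t(Omega_k) %*% Omega_k, t(Omega_k) %*% Yk)   # OLS estimate
  d  <- as.numeric(t(Lhat) %*% t(Omega_k) %*% Omega_k %*% Lhat +
                   t(L0) %*% solve(H0) %*% L0 - t(cc) %*% C %*% cc)
  psi <- 1 / rgamma(1, shape = nrow(Omega_k) / 2 + alpha0,
                       rate  = beta0 + d / 2)                # eq. (56)
  lam <- as.vector(cc + t(chol(psi * solve(C))) %*% rnorm(length(cc)))  # eq. (55)
  list(lambda = lam, psi_eps = psi)
}
```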
C.4 Obtaining draws of M and Ψ_δ

Given Ω, obtaining draws of M and Ψ_δ becomes a regression problem. We assume that Ψ_δ is a diagonal matrix, i.e. the structural residuals are uncorrelated. Thus estimating the parameters of the structural equation follows the same procedure as obtaining draws of the parameters of the measurement equation. Let η_l* denote the (adjusted) column vector of the n draws of the lth endogenous latent variable and Ω_l* the corresponding submatrix of Ω, both defined analogously to Y_k* and Ω_k*. Analogous to the likelihood of observing Y_k*, see equation (48), we get the likelihood of observing η_l*:

[M_l* | η_l*, ψ_δl^{-1}, Ω] ∝ N( (Ω_l*^TΩ_l*)^{-1}Ω_l*^Tη_l* , ψ_δl (Ω_l*^TΩ_l*)^{-1} )      (57)

with l ∈ {1,...,q1}. The conjugate prior distribution defined in equation (31) for the coefficient matrix is:

[M_l* | ψ_δl] ∼ N(M_0l*, ψ_δl H_0Ml*)      (58)

To derive the posterior distributions of M_l* and ψ_δl^{-1}, we multiply equation (57) with (58) and (30):

p(M_l*, ψ_δl^{-1} | η_l*, Ω)
∝ ψ_δl^{−n/2} exp( −(1/2) ψ_δl^{-1} (M_l* − (Ω_l*^TΩ_l*)^{-1}Ω_l*^Tη_l*)^T Ω_l*^TΩ_l* (M_l* − (Ω_l*^TΩ_l*)^{-1}Ω_l*^Tη_l*) )
× ψ_δl^{−q/2} exp( −(1/2) ψ_δl^{-1} (M_l* − M_0l*)^T (H_0Ml*)^{-1} (M_l* − M_0l*) )
× (ψ_δl^{-1})^{α_0l − 1} exp( −β_0l ψ_δl^{-1} )      (59)
Combining the two quadratic forms yields:

p(M_l*, ψ_δl^{-1} | η_l*, Ω) ∝ ψ_δl^{−n/2} ψ_δl^{−q/2} exp( −(1/2) ψ_δl^{-1} [ (M_l* − c_M)^T C_M (M_l* − c_M) + d_M ] ) (ψ_δl^{-1})^{α_0l − 1} exp( −β_0l ψ_δl^{-1} )      (60)

with

C_M = Ω_l*^TΩ_l* + (H_0Ml*)^{-1}

c_M = ( Ω_l*^TΩ_l* + (H_0Ml*)^{-1} )^{-1} ( Ω_l*^Tη_l* + (H_0Ml*)^{-1} M_0l* )

d_M = ((Ω_l*^TΩ_l*)^{-1}Ω_l*^Tη_l*)^T Ω_l*^TΩ_l* ((Ω_l*^TΩ_l*)^{-1}Ω_l*^Tη_l*) + (M_0l*)^T (H_0Ml*)^{-1} M_0l* − c_M^T ( Ω_l*^TΩ_l* + (H_0Ml*)^{-1} ) c_M      (61)

Thus the posterior distributions of M_l* and ψ_δl^{-1} are respectively given by:

p(M_l* | η_l*, ψ_δl^{-1}, Ω) ∝ ψ_δl^{−q/2} exp( −(1/2) ψ_δl^{-1} (M_l* − c_M)^T C_M (M_l* − c_M) )      (62)

and

p(ψ_δl^{-1} | η_l*, Ω) ∝ (ψ_δl^{-1})^{n/2 + α_0l − 1} exp( −(1/2) ψ_δl^{-1} [ 2β_0l + d_M ] )      (63)

which can also be written as:

M_l* | η_l*, ψ_δl^{-1}, Ω ∼ N( c_M, ψ_δl C_M^{-1} )      (64)

and

ψ_δl^{-1} | η_l*, Ω ∼ Gamma( n/2 + α_0l, β_0l + (1/2) d_M )      (65)
D Setting values for hyperprior parameters

In order to enable sampling from the posterior distributions of the unknown elements, we have to set values for the hyperparameters α_0k and β_0k, as well as α_0l and β_0l, of the prior distributions in equations (28) and (30). Since the results of the sampler are very sensitive to these values, they have to be set thoughtfully. We suggest running the Gibbs sampler initially without sampling the parameters in Ψ_ε and Ψ_δ, but calculating them using the following formulas:

ψ_εk = (1 / (n − q)) (Y_k − ΩΛ_k)^T (Y_k − ΩΛ_k)      (66)

ψ_δl = (1 / (n − q)) (η_l − ΩM_l)^T (η_l − ΩM_l)      (67)

In a second run, we set the parameters α_0k and β_0k as follows:

α_0k = 100,  β_0k = ψ_εk · 100      (68)

and

α_0l = 100,  β_0l = ψ_δl · 100      (69)

and include the sampling of the parameters in Ψ_ε and Ψ_δ.
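For one measurement equation, this calibration step might look as follows (the function name is our own; Yk, Omega and Lambda_k are the quantities of equation (66)):

```r
# Hedged sketch of equations (66) and (68): residual variance from an
# initial run, scaled into the Gamma hyperparameters
calibrateHyper <- function(Yk, Omega, Lambda_k, q) {
  resid <- Yk - Omega %*% Lambda_k
  psi_k <- as.numeric(t(resid) %*% resid) / (nrow(Omega) - q)  # eq. (66)
  c(alpha0k = 100, beta0k = 100 * psi_k)                       # eq. (68)
}
```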