Upload
khalkhalesh
View
221
Download
0
Embed Size (px)
Citation preview
8/7/2019 fox - intro to struct eqn modelling
1/35
An Introduction to
Structural Equation Modeling
John Fox
McMaster UniversityCanada
November 2008
Programa de Computao Cientfica - FIOCRUZ Rio de Janeiro Brasil
Copyright 2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
2/35
8/7/2019 fox - intro to struct eqn modelling
3/35
8/7/2019 fox - intro to struct eqn modelling
4/35
Introduction to Structural-Equation Modeling 9
x1
x2
x3
x4
y5
y6
7
8
7814
51
52
63
64
56 65
Figure 1. Duncan, Haller, and Portess (nonrecursive) peer-influences
model: x1, respondents IQ; x2, respondents family SES; x3, best friendsfamily SES; x4, best friends IQ; y5 , respondents occupational aspiration;y6, best friends occupational aspiration. So as not to clutter the diagram,only one exogenous covariance, 14, is shown.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 10
When two variables are not linked by a directed arrow it does notnecessarily mean that one does not affect the other:
For example, in the Duncan, Haller, and Portes model, respondents
IQ (x1) can affect best friends occupational aspiration (y6), but only
indirectly, through respondents aspiration (y5). The absence of a directed arrow between respondents IQ and best
friends aspiration means that there is no partial relationship between
the two variables when the direct causes of best friends aspiration are
held constant.
In general, indirect effects can be identified with compound paths
through the path diagram.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 11
2.2 Structural Equations
The structural equations of a model can be read straightforwardly fromthe path diagram.
For example, for the Duncan, Haller, and Portes peer-influences
model:
y5i = 50 + 51x1i + 52x2i + 56y6i + 7iy6i = 60 + 63x3i + 64x4i + 65y5i + 8i
Ill usually simplify the structural equations by
(i) suppressing the subscript i for observation;
(ii) expressing all xs and y s as deviations from their population means
(and, later, from their means in the sample).
Putting variables in mean-deviation form gets rid of the constant terms
(here, 50 and 60) from the structural equations (which are rarely of
interest), and will simplify some algebra later on.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 12
Applying these simplifications to the peer-influences model:
y5 = 51x1 + 52x2 + 56y6 + 7y6 = 63x3 + 64x4 + 65y5 + 8
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
5/35
Introduction to Structural-Equation Modeling 13
2.3 Matrix Form of the Model
It is sometimes helpful (e.g., for generality) to cast a structural-equationmodel in matrix form.
To illustrate, Ill begin by rewriting the Duncan, Haller and Portes model,shifting all observed variables (i.e., with the exception of the errors)to the left-hand side of the model, and showing all variables explicitly;
variables missing from an equation therefore get 0 coefficients, while theresponse variable in each equation is shown with a coefficient of 1:
1y5 56y6 51x1 52x2 + 0x3 + 0x4 = 765y5 + 1y6 + 0x1 + 0x2 63x3 64x4 = 8
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 14
Collecting the endogenous variables, exogenous variables, errors, andcoefficients into vectors and matrices, we can write
1 5665 1
y5
y6 + 51 52 0 0
0 0 63 64
x1x2
x3x4
= 7
8 More generally, where there are q endogenous variables (and henceq errors) and m exogenous variables, the model for an individual
observation is
B(qq)
yi(q1)
+ (qm)
xi(m1)
= i(q1)
The B and matrices of structural coefficients typically contain some
0 elements, and the diagonal entries of the B matrix are 1s.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 15
We can also write the model for all n observations in the sample:
Y(nq)
B0(qq)
+ X(nm)
0
(mq)= E
(nq)
I have transposed the structural-coefficient matrices B and , writing
each structural equation as a column (rather than as a row), so that
each observation comprises a row of the matrices Y, X , and E ofendogenous variables, exogenous variables, and errors.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 16
2.4 Recursive, Block-Recursive, and NonrecursiveStructural-Equation Models
An important type of SEM, called a recursive model, has two definingcharacteristics:
(a) Different error variables are independent (or, at least, uncorrelated).
(b) Causation in the model is unidirectional: There are no reciprocalpaths or feedback loops, as shown in Figure 2.
Put another way, the B matrix for a recursive SEM is lower-triangular,while the error-covariance matrix is diagonal.
An illustrative recursive model, from Blau and Duncans seminalmonograph, The American Occupational Structure (1967), appears in
Figure 3.
For the Blau and Duncan model:
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
6/35
Introduction to Structural-Equation Modeling 17
reciprocal
paths
a feedback
loop
yk yk yk yk
yk
Figure 2. Reciprocal paths and feedback loops cannot appear in a recur-
sive model.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 18
x1
x2
y3
y4
y5
6
7
8
1232 52
31
42
43
53
54
Figure 3. Blau and Duncans basic stratification model: x1, fathers edu-cation; x2, fathers occupational status; y3, respondents (sons) education;y4, respondents first-job status; y5, respondents present (1962) occupa-tional status.FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 19
B =
1 0 043 1 053 54 1
=
26 0 00 27 0
0 0 2
8
Sometimes the requirements for unidirectional causation and indepen-
dent errors are met by subsets (blocks) of endogenous variables and
their associated errors rather than by the individual variables. Such a
model is called block recursive.
An illustrative block-recursive model for the Duncan, Haller, and Portespeer-influences data is shown in Figure 4.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 20
x1
x2
x3
x4
y5
y6
9
10
y7
y8
11
12
block 1
block 2
Figure 4. An extended, block-recursive model for Duncan, Haller, andPortess peer-influences data: x1, respondents IQ; x2, respondents family
SES; x3, best friends family SES; x4, best friends IQ; y5 , respondentsoccupational aspiration; y6, best friends occupational aspiration; y7, re-spondents educational aspiration; y8, best friends educational aspiration.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
7/35
Introduction to Structural-Equation Modeling 21
Here
B =
1 56 0 065 1 0 075 0 1 78
086
87
1
=
B11 0
B21 B22
=
29 9,10 0 010,9
210 0 0
0 0 211 11,120 0 12,11
212
=
11 0
0 22
A model that is neither recursive nor block-recursive (such as the model
for Duncan, Haller and Portess data in Figure 1) is termed nonrecursive.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 22
3. Instrumental-Variables Estimation Instrumental-variables (IV) estimationis a method of deriving estimators
that is useful for understanding whether estimation of a structural
equation model is possible (the identification problem) and for obtaining
estimates of structural parameters when it is.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 23
3.1 Expectations, Variances and Covariances ofRandom Variables
It is helpful first to review expectations, variances, and covariances ofrandom variables:
The expectation (or mean) of a random variable X is the average
value of the variable that would be obtained over a very large numberof repetitions of the process that generates the variable:
For a discrete random variable X,E(X) = X =
Xall x
xp(x)
where x is a value of the random variable and p(x) = Pr(X= x).
Im careful here (as usually I am not) to distinguish the randomvariable (X) from a value of the random variable (x).
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 24
For a continuous random variable X,E(X) = X =
Z+
xp(x)dx
where now p(x) is the probability-density function of X evaluated atx.
Likewise, the variance of a random variable is
in the discrete caseVar(X) = 2X =
Xall x
(x X)2p(x)
and in the continuous caseVar(X) = 2X =
Z+
(x X)2p(x)dx
In both casesVar(X) = E
(X X)2
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
8/35
Introduction to Structural-Equation Modeling 25
The standard deviation of a random variable is the square-root of thevariance:
SD(X) = X = +q2X
The covariance of two random variablesXand Y is a measure of their
linear relationship; again, a single formula suffices for the discrete andcontinuous case:
Cov(X,Y) = XY = E[(X X)(Y Y)] The correlation of two random variables is defined in terms of their
covariance:
Cor(X, Y) = XY =XY
XY
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 26
3.2 Simple Regression
To understand the IV approach to estimation, consider first the followingroute to the ordinary-least-squares (OLS) estimator of the simple-
regression model,
y = x + where the variables x and y are in mean-deviation form, eliminating the
regression constant from the model; that is, E(y) = E(x) = 0. By the usual assumptions of this model, E() = 0; Var() = 2; andx, are independent.
Now multiply both sides of the model by x and take expectations:
xy = x2 + x
E(xy) = E(x2) + E(x)
Cov(x, y) = Var(x) + Cov(x)xy =
2x + 0
where Cov(x) = 0 because x and are independent.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 27
Solving for the regression coefficient ,
=xy
2x Of course, we dont know the population covariance of x and y, nor
do we know the population variance of x, but we can estimate both of
these parameters consistently:
s2x =P(xi x)2n 1
sxy =
P(xi x)(yi y)n 1
In these formulas, the variables are expressed in raw-score form, and
so I show the subtraction of the sample means explicitly.
A consistent estimator of is then
b =sxy
s2x=
P(xi x)(yi y)
P(xi x)2which we recognize as the OLS estimator.FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 28
Imagine, alternatively, that x and are not independent, but that isindependent of some other variable z.
Suppose further that zand x are correlated that is, Cov(x, z) 6= 0.
Then, proceeding as before, but multiplying through by z rather than
by x (with all variable expressed as deviations from their expectations):
zy = zx+ z
E(zy) = E(zx) + E(z)
Cov(z, y) = Cov(z, x) + Cov(z)
zy = zx + 0
=zy
zxwhere Cov(z) = 0 because zand are independent.
Substituting sample for population covariances gives the instrumental
variables estimator of :
bIV = szy
szx= P(zi z)(yi y)P
(zi z)(xi x)FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
9/35
Introduction to Structural-Equation Modeling 29
The variable z is called an instrumental variable (or, simply, aninstrument).
bIV is a consistent estimator of the population slope , because thesample covariances szy and szx are consistent estimators of the
corresponding population covariances zy and zx.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 30
3.3 Multiple Regression
The generalization to multiple-regression models is straightforward. For example, for a model with two explanatory variables,
y = 1x1 + 2x2 +
(with x1, x2, and y all expressed as deviations from their expectations).
If we can assume that the error is independent of x1 and x2, then we
can derive the population analog of estimating equations by multiplying
through by the two explanatory variables in turn, obtaining
E(x1y) = 1E(x21) + 2E(x1x2) + E(x1)
E(x2y) = 1E(x1x2) + 2E(x22) + E(x2)
x1y = 12x1
+ 2x1x2 + 0
x2y = 1x1x2 + 22x2 + 0
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 31
Substituting sample for population variances and covariancesproduces the OLS estimating equations:
sx1y = b1s2x1
+ b2sx1x2sx2y = b1sx1x2 + b2s
2x2
Alternatively, if we cannot assume that is independent of the xs, but
can assume that is independent of two other variables, z1 and z2,then
E(z1y) = 1E(z1x1) + 2E(z1x2) + E(z1)
E(z2y) = 1E(z2x1) + 2E(z2x2) + E(z2)
z1y = 1z1x1 + 2z1x2 + 0
z2y = 1z2x1 + 2z2x2 + 0
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 32
the IV estimating equations are obtained by the now familiar step
of substituting consistent sample estimators for the population
covariances:
sz1y = b1sz1x1 + b2sz1x2sz2y = b1sz2x1 + b2sz2x2
For the IV estimating equations to have a unique solution, itsnecessary that there not be an analog of perfect collinearity.
For example, neither x1 nor x2 can be uncorrelated with both z1 andz2.
Good instrumental variables, while remaining uncorrelated with theerror, should be as correlated as possible with the explanatory variables.
In this context, good means yielding relatively small coefficient
standard errors (i.e., producing efficient estimates).
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
10/35
Introduction to Structural-Equation Modeling 33
OLS is a special case of IV estimation, where the instruments and the
explanatory variables are one and the same.
When the explanatory variables are uncorrelated with the error, theexplanatory variables are their own best instruments, since they are
perfectly correlated with themselves.
Indeed, the Gauss-Markov theorem insures that when it is applicable,the OLS estimator is the best (i.e., minimum variance or most
efficient) linear unbiased estimator (BLUE).
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 34
3.4 Instrumental-Variables Estimation in Matrix Form
Our object is to estimate the model
y(n1)
= X(nk+1)
(k+11)
+ (n1)
where Nn(0, 2In). Of course, if X and are independent, then we can use the OLS
estimator
bOLS = (X0X)1X0y
with estimated covariance matrixbV(bOLS) = s2OLS(X0X)1where
s2OLS =e0OLSeOLS
n
k
1
for eOLS = yXbOLS
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 35
Suppose, however, that we cannot assume that X and are indepen-dent, but that we have observations on k + 1 instrumental variables,Z
(nk+1), that are independent of .
For greater generality, I have not put the variables in mean-deviation
form, and so the model includes a constant; the matrices X and Z
therefore each include an initial column of ones. A development that parallels the previous scalar treatment leads to the
IV estimator
bIV = (Z0X)1Z0y
with estimated covariance matrixbV(bIV) = s2IV(Z0X)1Z0Z(X0Z)1where
s2IV =e0IVeIV
nk
1
foreIV = yXbIV
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 36
Since the results for IV estimation are asymptotic, we could also
estimate the error variance with n rather than n k 1 in thedenominator, but dividing by degrees of freedom produces a larger
variance estimate and hence is conservative.
For bIV to be unique Z0X must be nonsingular (just as X0X must be
nonsingular for the OLS estimator).
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
11/35
Introduction to Structural-Equation Modeling 37
4. The Identification Problem If a parameter in a structural-equation model can be estimated then the
parameter is said to be identified; otherwise, it is underidentified (or
unidentified).
If all of the parameters in a structural equation are identified, then sois the equation.
If all of the equations in an SEM are identified, then so is the model.
Structural equations and models that are not identified are also termed
underidentified.
If only one estimate of a parameter is available, then the parameter isjust-identified or exactly identified.
If more than one estimate is available, then the parameter is overidenti-
fied.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 38
The same terminology extends to structural equations and to models:An identified structural equation or SEM with one or more overidentified
parameters is itself overidentified.
Establishing whether an SEM is identified is called the identification
problem. Identification is usually established one structural equation at a time.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 39
4.1 Identification of Nonrecursive Models: The OrderCondition
Using instrumental variables, we can derive a necessary (but, as it turnsout, not sufficient) condition for identification of nonrecursive models
called the order condition.
Because the order condition is not sufficient to establish identification,it is possible (though rarely the case) that a model can meet the order
condition but not be identified.
There is a necessary and sufficient condition for identification called
the rank condition, which I will not develop here. The rank condition is
described in the reading.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 40
The terms order condition and rank condition derive from the
order (number of rows and columns) and rank (number of linearly
independent rows and columns) of a matrix that can be formulated
during the process of identifying a structural equation. We will not
pursue this approach.
Both the order and rank conditions apply to nonrecursive models
without restrictions on disturbance covariances.
Such restrictions can sometimes serve to identify a model that wouldnot otherwise be identified.
More general approaches are required to establish the identificationof models with disturbance-covariance restrictions. Again, these are
taken up in the reading.
We will, however, use the IV approach to consider the identification oftwo classes of models with restrictions on disturbance covariances:
recursive and block-recursive models.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
12/35
Introduction to Structural-Equation Modeling 41
The order condition is best developed from an example. Recall the Duncan, Haller, and Portes peer-influences model, repro-
duced in Figure 5.
Let us focus on the first of the two structural equations of the model,
y5 = 51x1 + 52x2 + 56y6 + 7where all variables are expressed as deviations from their expecta-
tions.
There are three structural parameters to estimate in this equation,51, 52, and 56.
It would be inappropriate to perform OLS regression of y5 on x1,
x2, and y6 to estimate this equation, because we cannot reasonably
assume that the endogenous explanatory variable y6 is uncorrelated
with the error 7.
7 may be correlated with 8, which is one of the components of y6 . 7 is a component of y5 which is a cause (as well as an effect) of y6.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 42
x1
x2
x3
x4
y5
y6
7
8
7814
51
52
63
64
56 65
Figure 5. Duncan, Haller, and Portes nonrecursive peer-influences model
(repeated).
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 43
This conclusion is more general: We cannot assume that endogenous
explanatory variables are uncorrelated with the error of a structural
equation.
As we will see, however, we will be able to make this assumption inrecursive models.
Nevertheless, we can use the four exogenous variables x1, x2, x3, and
x4, as instrumental variables to obtain estimating equations for the
structural equation:
For example, multiplying through the structural equation by x1 andtaking expectations produces
x1y5 = 51x21 + 52x1x2 + 56x1y6 + x17
E(x1y5) = 51E(x21) + 52E(x1x2) + 56E(x1y6) + E(x17)
15 = 5121 + 5212 + 5616 + 0
since 17 = E(x17) = 0.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 44
Applying all four exogenous variables,IV Equation
x1 15 = 5121 + 5212 + 5616
x2 25 = 5112 + 5222 + 5626
x3 35 = 5113 + 5223 + 5636x4 45 = 5114 + 5224 + 5646
If the model is correct, then all of these equations, involvingpopulation variances, covariances, and structural parameters, hold
simultaneously and exactly.
If we had access to the population variances and covariances,then, we could solve for the structural coefficients 51, 52, and 56even though there are four equations and only three parameters.
Since the four equations hold simultaneously, we could obtain thesolution by eliminating any one and solving the remaining three.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
13/35
8/7/2019 fox - intro to struct eqn modelling
14/35
Introduction to Structural-Equation Modeling 49
1
1
22
3 3
possible va lues
of 54
possible va lues
of 51
54
54
51
51
(a) (b)
^
^
Figure 6. Population equations (a) and corresponding estimating equa-
tions (b) for an overidentified structural equation with two parameters andthree estimating equations. The population equations have a solution forthe parameters, but the estimating equations do not.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 50
Let us return to the Duncan, Haller, and Portes model, and add a pathfrom x3 to y5, so that the first structural equation becomes
y5 = 51x1 + 52x2 + 53x3 + 56y6 + 7 There are now four parameters to estimate (51, 52, 53, and 56), and
four IVs (x1, x2, x3, and x4), which produces four estimating equations. With as many estimating equations as unknown structural parameters,
there is only one way of estimating the parameters, which are therefore
just identified.
We can think of this situation as a kind of balance sheet with IVs as
credits and structural parameters as debits.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 51
For a just-identified structural equation, the numbers of credits anddebits are the same:
Credits Debits
IVs parameters
x1 51x2 52
x3 53x4 564 4
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 52
In the original specification of the Duncan, Haller, and Portes model,
there were only three parameters in the first structural equation,
producing a surplus of IVs, and an overidentified structural equation:
Credits Debits
IVs parameters
x1 51
x2 52x3 56x44 3
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
15/35
Introduction to Structural-Equation Modeling 53
Now let us add still another path to the model, from x4 to y5, so that thefirst structural equation becomes
y5 = 51x1 + 52x2 + 53x3 + 54x4 + 56y6 + 7 Now there are fewer IVs available than parameters to estimate in the
structural equation, and so the equation is underidentifi
ed:Credits Debits
IVs parameters
x1 51x2 52x3 53x4 54
564 5
That is, we have only four estimating equations for five unknownparameters, producing an underdetermined system of estimating
equations.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 54
From these examples, we can abstract the order condition for identifica-tion of a structural equation: For the structural equation to be identified,
we need at least as many exogenous variables (instrumental variables)
as there are parameters to estimate in the equation.
Since structural equation models have more than one endogenous
variable, the order condition implies that some potential explanatory
variables must be excluded apriori from each structural equation of the
model for the model to be identified.
Put another way, for each endogenous explanatory variable in a
structural equation, at least one exogenous variable must be excluded
from the equation.
Suppose that there are m exogenous variable in the model:
A structural equation with fewer than m structural parameters is
overidentified. A structural equation with exactly m structural parameters is just-
identified.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 55
A structural equation with more than m structural parameters isunderidentified, and cannot be estimated.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 56
4.2 Identification of Recursive and Block-RecursiveModels
The pool of IVs for estimating a structural equation in a recursivemodel includes not only the exogenous variables but prior endogenous
variables as well.
Because the explanatory variables in a structural equation are drawnfrom among the exogenous and prior endogenous variables, there will
always be at least as many IVs as there are explanatory variables (i.e.,
structural parameters to estimate).
Consequently, structural equations in a recursive model are necessar-
ily identified.
To understand this result, consider the Blau and Duncan basic-stratification model, reproduced in Figure 7.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
16/35
Introduction to Structural-Equation Modeling 57
x1
x2
y3
y4
y5
6
7
8
1232 52
31
42
43
53
54
Figure 7. Blau and Duncans recursive basic-stratification model (re-peated).
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 58
The first structural equation of the model is
y3 = 31x1 + 32x2 + 6with balance sheet
Credits Debits
IVs parameters
x1 31x2 322 2
Because there are equal numbers of IVs and structural parameters,the first structural equation is just-identified.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 59
More generally, the first structural equation in a recursive model canhave only exogenous explanatory variables (or it wouldnt be the first
equation).
If all the exogenous variables appear as explanatory variables (asin the Blau and Duncan model), then the first structural equation is
just-identified.
If any exogenous variables are excluded as explanatory variablesfrom the first structural equation, then the equation is overidentified.
The second structural equation in the Blau and Duncan model is
y4 = 42x2 + 43y3 + 7 As before, the exogenous variable x1 and x2 can serve as IVs. The prior endogenous variable y3 can also serve as an IV, because
(according to the first structural equation), y3 is a linear combination
of variables (x1, x2, and 6) that are all uncorrelated with the error
7 (x1 and x2 because they are exogenous, 6 because it is anothererror variable).
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 60
The balance sheet is thereforeCredits Debits
IVs parameters
x1 42x2 43y3
3 2 Because there is a surplus of IVs, the second structural equation is
overidentified.
More generally, the second structural equation in a recursive modelcan have only the exogenous variables and the first (i.e., prior)
endogenous variable as explanatory variables.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
17/35
Introduction to Structural-Equation Modeling 61
All of these predetermined variables are also eligible to serve asIVs.
If all of the predetermined variables appear as explanatoryvariables, then the second structural equation is just-identified; if
any are excluded, the equation is overidentified.
The situation with respect to the third structural equation is similar:
y5 = 52x2 + 53y3 + 54y4 + 8 Here, the eligible instrumental variables include (as always) the
exogenous variables (x1, x2) and the two prior endogenous variables:
y3 because it is a linear combination of exogenous variables (x1and x2) and an error variable (6), all of which are uncorrelated with
the error from the third equation, e8.
y4 because it is a linear combination of variables (x2, y3, and 7
as specified in the second structural equation), which are also alluncorrelated with 8.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 62
The balance sheet for the third structural equation indicates that theequation is overidentified:
Credits Debits
IVs parameters
x1 52x2 53y3 54y44 3
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 63
More generally:
All prior variables, including exogenous and prior endogenousvariables, are eligible as IVs for estimating a structural equation in a
recursive model.
If all of these prior variables also appear as explanatory variables inthe structural equation, then the equation is just-identified.
If, alternatively, one or more prior variables are excluded, then theequation is overidentified.
A structural equation in a recursive model cannot be underidentified.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 64
A slight complication: There may only be a partial ordering of theendogenous variables.
Consider, for example, the model in Figure 8.
This is a version of Blau and Duncans model in which the path fromy3 to y4 has been removed.
As a consequence, y3 is no longer prior to y4 in the model indeed,
the two variables are unordered.
Because the errors associated with these endogenous variables, 6and 7, are uncorrelated with each other, however, y3 is still available
for use as an IV in estimating the equation for y4.
Moreover, now y4 is also available for use as an IV in estimating theequation for y3, so the situation with respect to identification has, if
anything, improved.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
18/35
Introduction to Structural-Equation Modeling 65
x1
x2
y3
y4
y5
6
7
8
1232 52
31
42
53
54
Figure 8. A recursive model (a modification of Blau and Duncans model) inwhich there are two endogenous variables, y3 and y4, that are not ordered.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 66
In a block-recursive model, all exogenous variables and endogenousvariables in prior blocks are available for use as IVs in estimating the
structural equations in a particular block.
A structural equation in a block-recursive model may therefore be
under-, just-, or overidentified, depending upon whether there are
fewer, the same number as, or more IVs than parameters.
For example, recall the block-recursive model for Duncan, Haller, and
Portess peer-influences data, reproduced in Figure 9.
There are four IVs available to estimate the structural equations inthe first block (for endogenous variables y5 and y6) the exogenous
variables (x1, x2, x3, and x4).
Because each of these structural equations has four parameters toestimate, each equation is just-identified.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 67
x1
x2
x3
x4
y5
y6
9
10
y7
y8
11
12
block 1
block 2
Figure 9. Block-recursive model for Duncan, Hallter and Portess peer-in-fluences data (repeated).
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 68
There are six IVs available to estimate the structural equations inthe second block (for endogenous variables y7 and y8) the four
exogenous variables plus the two endogenous variables (y5 and y6)
from the first block.
Because each structural equation in the second block has fivestructural parameters to estimate, each equation is overidentified.
In the absence of the block-recursive restrictions on the disturbancecovariances, only the exogenous variables would be available as
IVs to estimate the structural equations in the second block, and
these equations would consequently be underidentified.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
19/35
Introduction to Structural-Equation Modeling 69
5. Estimation of Structural-Equation Models
5.1 Estimating Nonrecursive Models
There are two general and many specific approaches to estimating
SEMs:(a) Single-equationor limited-informationmethods estimate each struc-
tural equation individually.
I will describe a single-equation method called two-stage leastsquares(2SLS).
Unlike OLS, which is also a limited-information method, 2SLSproduces consistent estimates in nonrecursive SEMs.
Unlike direct IV estimation, 2SLS handles overidentified structuralequations in a non-arbitrary manner.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 70
2SLS also has a reasonable intuitive basis and appears to performwell it is generally considered the best of the limited-information
methods.
(b) Systems or full-informationmethods estimate all of the parameters
in the structural-equation model simultaneously, including errorvariances and covariances.
I will briefly describe a method called full-information maximum-likelihood (FIML).
Full information methods are asymptotically more efficient thansingle-equation methods, although in a model with a misspecified
equation, they tend to proliferate the specification error throughout
the model.
FIML appears to be the best of the full-information methods.
Both 2SLS and FIML are implemented in the sem package for R.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 71
5.1.1 Two-Stage Least Squares
Underidentified structural equations cannot be estimated.
Just-identified equations can be estimated by direct application of theavailable IVs.
We have as many estimating equations as unknown parameters.
For an overidentified structural equation, we have more than enoughIVs.
There is a surplus of estimating equations which, in general, are not
satisfied by a common solution.
2SLS is a method for reducing the IVs to just the right number but
by combining IVs rather than discarding some altogether.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 72
Recall the first structural equation from Duncan, Haller, and Portesspeer-influences model:
y5 = 51x1 + 52x2 + 56y6 + 7 This equation is overidentified because there are four IVs available
(x1, x2, x3, and x4) but only three structural parameters to estimate
(51, 52, and 56).
An IV must be correlated with the explanatory variables but uncorre-
lated with the error.
A good IV must be as correlated as possible with the explanatory
variables, to produce estimated structural coefficients with small
standard errors.
2SLS chooses IVs by examining each explanatory variable in turn:
The exogenous explanatory variables x1 and x2 are their own bestinstruments because each is perfectly correlated with itself.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
20/35
Introduction to Structural-Equation Modeling 73
To get a best IV for the endogenous explanatory variable y6, we firstregress this variable on all of the exogenous variables (by OLS),
according to the reduced-form model
y6 = 61x1 + 62x2 + 63x3 + 64x4 + 6producing fitted valuesby6 = b61x1 + b62x2 + b63x3 + b64x4 Because by6 is a linear combination of the xs indeed, the linear
combination most highly correlated with y6 it is (asymptotically)
uncorrelated with the structural error 7.
This is the first stage of 2SLS. Now we have just the right number of IVs: x1, x2, and by6, pro-
ducing three estimating equations for the three unknown structural
parameters:
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 74
IV 2SLS Estimating Equation
x1 s15 = b51s21 + b52s12 + b56s16x2 s25 = b51s12 + b52s22 + b56s26by6 s5
b6 =
b51s1
b6 +
b52s2
b6 +
b56s6
b6
where, e.g., s5b6 is the sample covariance between y5 and by6. The generalization of 2SLS from this example is straightforward: Stage 1: Regress each of the endogenous explanatory variables in
a structural equation on all of the exogenous variables in the model,
obtaining fitted values.
Stage 2: Use the fitted endogenous explanatory variables from stage
1 along with the exogenous explanatory variables as IVs to estimate
the structural equation.
If a structural equation is just-identified, then the 2SLS estimates areidentical to those produced by direct application of the exogenous
variables as IVs.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 75
There is an alternative route to the 2SLS estimator which, in the secondstage, replaces each endogenous explanatory variable in the structural
equation with the fitted values from the first stage regression, and then
performs an OLS regression.
The second-stage OLS regression produces the same estimates as
the IV approach.
The name two-stage least squares originates from this alternativeapproach.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 76
The 2SLS estimator for the jth structural equation in a nonrecursivemodel can be formulated in matrix form as follows:
Write the jth structural equation as
yj(n1)
= Yj(nqj)
j(qj1)
+ Xj(nmj)
j(mj1)
+ j(n1)
= [Yj,Xj] jj + j
whereyj is the response-variable vector in structural equation j
Yj is the matrix of qj endogenous explanatory variables in equation j
j is the vector of structural parameters for the endogenous
explanatory variables
Xj is the matrix of mj exogenous explanatory variables in equation j,
normally including a column of 1sj is the vector of structural parameters for the exogenous explanatory
variablesj is the error vector for structural equation j
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
21/35
Introduction to Structural-Equation Modeling 77
In the first stage of 2SLS, the endogenous explanatory variables are
regressed on all m exogenous variables in the model, obtaining the
OLS estimates of the reduced-form regression coefficients
Pj = (X0X)1X0Yj
and fitted values bYj = XPj = X(X0X)1X0Yj In the second stage of 2SLS, we apply Xj and bYj as instruments to
the structural equation to obtain (after quite a bit of manipulation) bjbj
=
Y0jX(X
0X)1X0Yj Y
0jXj
X0jYj X0jXj
1 Y0jX(X
0X)1X0yj
X0jyj
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 78
The estimated variance-covariance matrix of the 2SLS estimates is
bV bjbj
= s2ej
Y0jX(X
0X)1X0Yj Y
0jXj
X0jYj X0jXj
1where
s2ej =
e0jej
n qj mjej = yj Yjbj Xjbj
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 79
5.1.2 Full-Information Maximum Likelihood
Along with the other standard assumptions of SEMs, FIML estimatesare calculated under the assumption that the structural errors are
multivariately normally distributed.
Under this assumption, the log-likelihood for the model is
logeL(B,,) = n loge |det(B)|nq
2 loge 2 n
2 loge det()
12
nXi=1
(Byi+xi)01 (Byi+xi)
where det represents the determinant.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 80
The FIML estimates are the values of the parameters that maximize
the likelihood under the constraints placed on the model for example,
that certain entries of B, , and (possibly) are 0.
Estimated variances and covariances for the parameters are obtained
from the inverse of the information matrix the negative of the
Hessian matrix of second-order partial derivatives of the log-likelihood
evaluated at the parameter estimates.
The full general machinery of maximum-likelihood estimation is
available for example, alternative nested models can be compared
by a likelihood-ratio test.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
22/35
Introduction to Structural-Equation Modeling 81
5.1.3 Estimation Using the semPackage in R
The tsls function in the sem package is used to estimate structuralequations by 2SLS.
The function works much like the lm function for fitting linear models
by OLS, except that instrumental variables are specifi
ed in theinstruments argument as a one-sided formula.
For example, to fit the first equation in the Duncan, Haller, and Portes
model, we would specify something like
eqn.1 in the ram argument to sem.
Each error variance and covariance is represented as a bidirectionalarrow, .
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 83
To write out the model in this form, it helps to redraw the path diagram,
as in Figure 10 for the Duncan, Haller, and Portes model.
Then the model can be encoded as follows, specifying each arrow,
and giving a name to and start-value for the corresponding parameter
(NA = let the program compute the start-value):
model.DHP.1 ROccAsp, gamma51, NA
RSES -> ROccAsp, gamma52, NA
FSES -> FOccAsp, gamma63, NA
FIQ -> FOccAsp, gamma64, NA
FOccAsp -> ROccAsp, beta56, NA
ROccAsp -> FOccAsp, beta65, NA
ROccAsp ROccAsp, sigma77, NA
FOccAsp FOccAsp, sigma88, NA
ROccAsp FOccAsp, sigma78, NA
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 84
RIQ
RSES
FIQ
FSES
ROccAsp
FOccasp
gamma51
gamma63
gamma64
beta
65
beta
56
sigma88
sigma77
sigma78
Figure 10. Modified path diagram for the Duncan, Haller, and Portes
model, omitting covariances among exogenous variables, and showing er-ror variances and covariances as double arrows attached to the endoge-nous variables.FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
23/35
8/7/2019 fox - intro to struct eqn modelling
24/35
Introduction to Structural-Equation Modeling 89
6. Latent Variables, Measurement Errors, and
Multiple Indicators The purpose of this section is to use simple examples to explore the
consequences of measurement error for the estimation of SEMs.
I will show: when and how measurement error affects the usual estimators of
structural parameters;
how measurement errors can be taken into account in the process of
estimation;
how multiple indicators of latent variables can be incorporated into a
model.
Then, in the next section, I will introduce and examine general structural-equation models that include these features.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 90
6.1 Example 1: A Nonrecursive Model WithMeasurement Error in the Endogenous Variables
Consider the model displayed in the path diagram in Figure 11.
The path diagram uses the following conventions:
Greek letters represent unobservables, including latent variables,
structural errors, measurement errors, covariances, and structural
parameters.
Roman letters represent observable variables.
Latent variables are enclosed in circles (or, more generally, ellipses),
observed variables in squares (more generally, rectangles).
All variables are expressed as deviations from their expectations.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 91
x1
x2
5
6
y3
y4
9
10
7
8
12
51
62
56 65 78
Figure 11. A nonrecursive model with measurement error in the endoge-nous variables.FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 92
xs observable exogenous variables
ys observable fallible indictors of latent
endogenous variables
s (eta) latent endogenous variables
s (zeta) structural disturbances
s (epsilon) measurement errors in endogenous indicators
s, s (gamma, beta) structural parameterss (sigma) covariances
The model consists of two sets of equations:(a) The structural submodel:
5 = 51x1 + 566 + 76 = 62x2 + 655 + 8
(b) The measurement submodel:
y3 = 5 + 9
y4 = 6 + 10
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
25/35
Introduction to Structural-Equation Modeling 93
We make the usual assumptions about the behaviour of the structuraldisturbances e.g., that the s are independent of the xs.
We also assume well behaved measurement errors: Each has an expectation of 0.
Each is independent of all other variables in the model (except theindicator to which it is attached).
One way of approaching a latent-variable model is by substitutingobservable quantities for latent variables.
For example, working with the first structural equation:
5 = 51x1 + 566 + 7y3 9 = 51x1 + 56(y4 10) + 7
y3 = 51x1 + 56y4 + 07
where the composite error,
0
7, is07 = 7 + 9 5610
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 94
Because the exogenous variables x1 and x2 are independent of all
components of the composite error, they still can be employed in the
usual manner as IVs to estimate 51 and 56.
Consequently, introducing measurement error into the endogenous
variables of a nonrecursive model doesnt compromise our usualestimators.
Measurement error in an endogenous variable is not wholly benign: It
does increase the size of the error variance, and thus decreases the
precision of estimation.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 95
6.2 Example 2: Measurement Error in an ExogenousVariable
Now examine the path diagram in Figure 12.
Some additional notation:xs (here) observable exogenous variable or fallible
indicator of latent exogenous variable (xi) latent exogenous variable
(delta) measurement error in exogenous indicator
The structural and measurement submodels are as follows: structural submodel:
y4 = 466 + 42x2 + 7y5 = 53x3 + 54y4 + 8
measurement submodel:
x1 = 6 + 9
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 96
x1
x3
x2
y4
y5
7
8
6
9
Figure 12. A structural-equation model with measurement error in an ex-ogenous variable.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
26/35
Introduction to Structural-Equation Modeling 97
As in the preceding example, Ill substitute for the latent variable in thefirst structural equation:
y4 = 46(x1 9) + 42x2 + 7= 46x1 + 42x2 +
07
where
07 = 7 469is the composite error.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 98
If x1 were measured without error, then we would estimate the firststructural equation by OLS regression i.e., using x1 and x2 as IVs.
Here, however, x1 is not eligible as an IV since it is correlated with 9,
which is a component of the composite error 07.
Nevertheless, to see what happens, let us multiply the rewritten
structural equation in turn by x1 and x2 and take expectations:
14 = 4621 + 4212 4629
24 = 4612 + 4222
Notice that if x1 is measured without error, then the measurement-error variance 29 is 0, and the term 4629 disappears.
Solving these equations for 46 and 42 produces
46 =14
22 1224
212221229
22
42 =2124 121421
22 212
461229
2122 212
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 99
Now suppose that we make the mistake of assuming that x1 is measuredwithout error and perform OLS estimation.
The OLS estimator of 46 really estimates
046 =14
22 1224
2122 212
The denominator of the equation for 46 is positive, and the term
29
22
in this denominator is negative, so |046| < |46|. That is, the OLS estimator of 46 is biased towards zero (or
attenuated).
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 100
Similarly, the OLS estimator of 42 really estimates
042 =2124 121421
22 212
= 42 +4612
29
2122 212
= 42 + bias
where the bias is 0 if 6 does not affect y4 (i.e., 46 = 0); or 6 and x2 are uncorrelated (and hence 12 = 0); or there is no measurement error in x1 after all (29 = 0).
Otherwise, the bias can be either positive or negative; towards 0 oraway from it.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
27/35
Introduction to Structural-Equation Modeling 101
Looked at slightly differently, as the measurement error variance in x1grows larger (i.e., as 29),
04224
22
This is the population slope for the simple linear regression of y4 on x2alone.
That is, when the measurement-error component of x1 gets large,
it comes an ineffective control variable as well as an ineffective
explanatory variable.
Although we cannot legitimately estimate the first structural equation byOLS regression of y4 on x1 and x2, the equation is identified because
both x2 and x3 are eligible IVs:
Both of these variables are uncorrelated with the composite error
07.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 102
It is also possible to estimate the measurement-error variance 29 andthe true-score variance26:
Squaring the measurement submodel and taking expectations
produces
Ex21 = E[(6 + 9)
2]
21 = 26 +
29
because 6 and 9 are uncorrelated [eliminating the cross-product
E(69)].
From our earlier work,
14 = 4621 + 4212 4629
Solving for 29,29 =
4621 + 4212 14
46
and so26 =
21 29
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 103
In all instances, consistent estimates are obtained by substitutingobserved sample variances and covariances for the corresponding
population quantities.
the proportion of the variance of x1 that is true-score variance iscalled the reliability of x1; that is,
reliability(x1) =26
21=
26
26 + 29 The reliability of an indicator is also interpretable as the squared
correlation between the indicator and the latent variable that it
measures.
The second structural equation of this model, for y5, presents nodifficulties because x1, x2, and x3 are all uncorrelated with the structural
error 8 and hence are eligible IVs.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 104
6.3 Example 3: Multiple Indicators of a Latent Variable
Figure 13 shows the path diagram for a model that includes two differentindicators x1 and x2 of a latent exogenous variable 6.
The structural and measurement submodels of this model are as follows; Structural submodel:
y4 = 466 + 45y5 + 7y5 = 53x3 + 54y4 + 8
Measurement submodel:
x1 = 6 + 9x2 = 6 + 10
Further notation: (lambda) regression coefficient relating an indicator
to a latent variable (also called a
factor loading)
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
28/35
I t d ti t St t l E ti M d li 109 I t d ti t St t l E ti M d li 110
8/7/2019 fox - intro to struct eqn modelling
29/35
Introduction to Structural-Equation Modeling 109
It seems odd to use the endogenous variables y4 and y5 as in-
struments, but doing so works because they are uncorrelated with
the measurement errors 9 and 10 (and covariances involving the
structural error 7 cancel).
Now apply x2 to the first equation and x1 to the second equation,obtaining
24 = 4612 + 4525
14 =4612 + 4515
because x2 is uncorrelated with 07 and x1 is uncorrelated with
007.
We already know and so these two equations can be solved for 46and 45.
Moreover, because there is more than one way of calculating
(and hence of estimating) , the parameters 46
and 45
are also
overidentified.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 110
In this model, if there were only one fallible indicator of 6, the modelwould be underidentified.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 111
7. General Structural Equation Models
(LISREL Models) We now have the essential building blocks of general structural-
equation models with latent variables, measurement errors, and multiple
indicators, often called LISREL models.
LISREL is an acronym for LInear Structural RELations.
This model was introduced by Karl Jreskog and his coworkers;
Jreskog and Srbom are also responsible for the widely used
LISREL computer program.
There are other formulations of general structural equation models thatare equivalent to the LISREL model.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 112
7.1 Formulation of the LISREL Model
Several types of variables appears in LISREL models, each representedas a vector:
(n1)
(xi) latent exogenous variables
x(q1)
indicators of latent exogenous variables
(q1)
(delta) measurement errors in the xs
(m1)
(eta) latent endogenous variables
y(p1)
indicators of latent endogenous variables
(p1)
(epsilon) measurement errors in the ys
(m1)
(zeta) structural disturbances
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural Equation Modeling 113 Introduction to Structural Equation Modeling 114
8/7/2019 fox - intro to struct eqn modelling
30/35
Introduction to Structural-Equation Modeling 113
The model also incorporates several matrices of regression coefficients:structural coefficients relating s (latent
B(mm)
(beta) endogenous variables) to each other
structural coefficients relating s to s
(mn) (gamma) (latent endogenous to exogenous variables)
factor loadings relating xs to s (indicators to
x(qn)
(lambda-x) latent exogenous variables)
factor loadings relating ys to s (indicators to
y(pm)
(lambda-y) latent endogenous variables)
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 114
Finally, there are four parameter matrices containing variances andcovariances:
variances and covariances of the s
(mm)
(psi) (structural disturbances)
variances and covariances of the s
(qq)
(theta-delta) (measurement errors in exogenous indicators)
variances and covariances of the s
(pp)
(theta-epsilon) (measurement errors in endogenous indicators)
variances and covariances of the s
(nn)
(phi) (latent exogenous variables)
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 115
The LISREL model consists of structural and measurement submodels. The structural submodel is similar to the observed-variable structural-
equation model in matrix form (for the ith of N observations):
i = Bi + i + i Notice that the structural-coefficient matrices appear on the right-
hand side of the model.
In this form of the model, B has 0s down the main diagonal. The measurement submodel consists of two matrix equations, for the
indicators of the latent exogenous and endogenous variables:
xi = xi + iyi = yi + i
Each column of the matrices generally contains an entry that isset to 1, fixing the scale of the corresponding latent variable.
Alternatively, the variances of exogenous latent variables in mightbe fixed, typically to 1.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 116
7.2 Assumptions of the LISREL Model
The measurement errors, and , have expectations of 0;
are each multivariately-normally distributed;
are independent of each other;
are independent of the latent exogenous variables (s), latentendogenous variables (s), and structural disturbances (s).
The N observations are independently sampled.
The latent exogenous variables, , are multivariate normal. This assumption is unnecessary for exogenous variables that are
measured without error.
FIOCRUZ/PROCC Copyright c2008 by John Fox
8/7/2019 fox - intro to struct eqn modelling
31/35
8/7/2019 fox - intro to struct eqn modelling
32/35
Introduction to Structural-Equation Modeling 125 Introduction to Structural-Equation Modeling 126
8/7/2019 fox - intro to struct eqn modelling
33/35
7.5 Examples
7.5.1 A Latent-Variable Model for the Peer-Influences Data
Figure 14 shows a latent-variable model for Duncan, Haller, and Portesspeer-influences data.
FIOCRUZ/PROCC Copyright c2008 by John Fox
x =1 1
x =2 2
x =3 3
x =4 4
x =5 5
x =6 6
s=
s
1
2
1
2
y1 y2
y3 y4
1 2
3 4
y
211
1y
32
1212 21
11
12
1314
2324
25
26
Figure 14. Latent-variable model for the peer-influences data.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 127
The variables in the model are as follows:x1 (1) respondents parents aspirations
x2 (2) respondents family IQ
x3 (3) respondents SES
x4 (4) best friends SES
x5 (5) best friends family IQ
x6 (6) best friends parents aspirations
y1 respondents occupational aspiration
y2 respondents educational aspiration
y3 best friends educational aspiration
y4 best friends occupational aspiration
1 respondents general aspirations
2 best friends general aspirations
In this model, the exogenous variables each have a single indicatorspecified to be measured without error, while the latent endogenous
variables each have two fallible indicators.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 128
The structural and measurement submodels are as follows: Structural submodel:
12
=
0 1221 0
12
+11 12 13 14 0 0
0 0 23 24 25 26
12
3456
+12
= Var
12
=
21 1212
22
(note: symmetric)
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 129 Introduction to Structural-Equation Modeling 130
8/7/2019 fox - intro to struct eqn modelling
34/35
Measurement submodel:
x1x2x3x4
x5x6
=
1234
56
; i.e.,x = I6, = 0(66)
, and = xx(66)
y1y2y3y4
=
1 0y21 00 y320 1
12
+
1234
, with = diag(11, 22, 33, 44)
FIOCRUZ/PROCC Copyright c2008 by John Fox
Maximum-likelihood estimates of the parameters of the model and theirstandard errors:
Parameter Estimate Std. Error Parameter Estimate Std. Error
11 0.161 0.038 y21 1.063 0.092
12 0.250 0.045 y42 0.930 0.071
13 0.218 0.043 21 0.281 0.04614 0.072 0.050
22 0.264 0.045
23 0.062 0.052 12 0.023 0.05224 0.229 0.044
11 0.412 0.052
25 0.349 0.045 22 0.336 0.053
26 0.159 0.040 33 0.311 0.047
12 0.184 0.096 44 0.405 0.047
21 0.235 0.120
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 131
With the exception of b14 and b23, the direct effect of each boys SESon the others aspirations, all of the coefficients of the exogenous
variables are statistically significant.
The reciprocal paths, b12 and b21, have respective p-values just smallerthan and just larger than .05 for a two-sided test, but a one-sided testwould be appropriate here anyway.
The negative covariance between the structural disturbances, b12 =0.023, is now close to 0 and non-significant.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 132
Because the indicator variables are standardized in this model, the
measurement-error variances represent the proportion of variance of
each indicator due to measurement error, and the complements of the
measurement-error variances are the reliabilities of the indicators.
For example, the estimated reliability of y1 (the respondents reportedoccupational aspiration) as an indicator of 1 (his general aspirations)
is 1 0.412 = .588. Further details are in the computer examples.
FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 133 Introduction to Structural-Equation Modeling 134
8/7/2019 fox - intro to struct eqn modelling
35/35
7.5.2 A Confirmatory-Factor-Analysis Model
The LISREL model is very general, and special cases of it correspondto a variety of statistical models.
For example, if there are only exogenous latent variables and their
indicators, the LISREL models specializes to the confirmatory-factor-analysis (CFA) model, which seeks to account for the correlational
structure of a set of observed variables in terms of a smaller number of
factors.
The path diagram for an illustrative CFA model appears in Figure 15. The data for this example are taken from Harmans classic factor-
analysis text.
Harman attributes the data to Holzinger, an important figure in the
development of factor analysis (and intelligence testing).
FIOCRUZ/PROCC Copyright c2008 by John Fox
1 2 3 4 5 6 7 8 9
2
1
3
x1 x2 x3 x4 x5 x6 x7 x8 x9
12 23
13
x
11
x
21 x
31 x
42 x
52 x
62 x
73 x
83 x
93
Figure 15. A confirmatory-factor-analysis model for three factors underly-ing nine psychological tests.FIOCRUZ/PROCC Copyright c2008 by John Fox
Introduction to Structural-Equation Modeling 135
The first three tests (Word Meaning, Sentence Completion, and
Odd Words) are meant to tap a verbal factor; the next three (Mixed
Arithmetic, Remainders, Missing Numbers) an arithmetic factor, and
the last three (Gloves, Boots, Hatchets) a spatial-relations factor.
The model permits the three factors to be correlated with one-another.
The normalizations employed in this model set the variances of
the factors to 1; the covariances of the factors are then the factorintercorrelations.
Estimates for this model, and for an alternative CFA model specifyinguncorrelated (orthogonal) factors, are given in the computer examples.
FIOCRUZ/PROCC Copyright c2008 by John Fox