Multivariate Statistics: Factor Analysis

Steffen Unkel
Department of Medical Statistics, University Medical Center Goettingen, Germany
Summer term 2017

Outline: Introduction to latent variable modelling; Exploratory factor analysis; Confirmatory factor analysis

Latent variables in multivariate data

Multivariate data are often viewed as indirect measurements arising from underlying sources or latent variables which cannot be directly measured.

One is forced to examine the hidden sources by collecting data on manifest variables, which are considered indicators of the concepts of real interest.

Examples:
1. Educational and psychological tests;
2. EEG brain scans;
3. Trading prices of stocks.

Classification of latent variable models

Latent variables \ Observed variables | Continuous (interval/ratio)  | Categorical (nominal/ordinal)
Continuous (interval/ratio)           | Factor analysis              | Latent trait analysis (item response theory)
Categorical (nominal/ordinal)         | Latent profile analysis      | Latent class analysis

Factor analysis

Factor analysis is a statistical model that aims to identify these latent sources.

Factor analysis aims to explain the interrelationships among p manifest variables by k (≪ p) latent variables called common factors.

To allow for some variation in each observed variable that remains unaccounted for by the common factors, p additional latent variables called unique factors are introduced.

The birth of factor analysis

Spearman, C. (1904): 'General intelligence' objectively determined and measured, American Journal of Psychology, Vol. 15, pp. 201-293.

Exploratory factor analysis (EFA) model

Model equation: x = Λξ + δ, where

x: p × 1 vector of observed variables;
ξ: k × 1 vector of latent variables (common factors), k ≪ p;
Λ: p × k matrix of factor loadings with rank(Λ) = k;
δ: p × 1 vector of latent variables (unique or specific factors).

Assumptions:

E(x) = E(δ) = 0; E(ξ) = 0 and E(ξδᵀ) = O (the k × p zero matrix);
E(xxᵀ) = Σ and E(δδᵀ) = Θ, where Θ is assumed to be a positive definite diagonal matrix containing the unique or specific variances;
E(ξξᵀ) = I (or E(ξξᵀ) = Φ).
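The model and its assumptions can be illustrated with a short simulation; the following sketch uses purely hypothetical values of Λ and Θ (not taken from the lecture) and checks that the sample covariance matrix of the generated data approaches ΛΛᵀ + Θ.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 6, 2, 100_000                 # illustrative dimensions, not from the slides

# hypothetical loading matrix and diagonal unique-variance matrix
Lam = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.2],
                [0.1, 0.7], [0.0, 0.8], [0.2, 0.6]])
Theta = np.diag([0.3, 0.4, 0.5, 0.4, 0.3, 0.5])

xi = rng.standard_normal((n, k))                                # common factors, E(xi xi') = I
delta = rng.standard_normal((n, p)) * np.sqrt(np.diag(Theta))   # unique factors with variances Theta
X = xi @ Lam.T + delta                                          # x = Lambda xi + delta, one row per case

S = np.cov(X, rowvar=False)
Sigma = Lam @ Lam.T + Theta                                     # model-implied covariance matrix
print(np.round(S - Sigma, 2))                                   # close to zero for large n
```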

Path diagram of the EFA model

[Figure 1.1: An exploratory factor model (path diagram).]

Factor extraction

Model covariance structure for the EFA model with orthogonal common factors: Σ = ΛΛᵀ + Θ.

A pair {Λ, Θ} is sought which gives the best fit, for some specified value of k, to the sample covariance matrix S with respect to some discrepancy measure.

The process of finding this pair is called factor extraction. Various factor extraction methods have been proposed.

If the data are assumed to be normally distributed, the maximum likelihood (ML) principle is preferred.

Maximum likelihood factor analysis

If x ∼ N_p(0, Σ), then the log-likelihood function for the rows x_1, …, x_n of the mean-centred data matrix X ∈ R^{n×p} may be written

  ℓ(Σ) = −(np/2) ln(2π) − (n/2) ln det(Σ) − (1/2) ∑_{i=1}^{n} x_iᵀ Σ⁻¹ x_i,

which depends on Λ and Θ through Σ = ΛΛᵀ + Θ.

Make Λ well defined by imposing the computationally convenient uniqueness condition that ΛᵀΘ⁻¹Λ is a diagonal matrix.

ML factor analysis is scale invariant.
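For concreteness, a minimal numpy sketch of evaluating this log-likelihood for a candidate pair (Λ, Θ); the data matrix and parameter values below are arbitrary placeholders of my own.

```python
import numpy as np

def efa_loglik(X, Lam, Theta):
    """Gaussian log-likelihood of mean-centred data X under Sigma = Lam Lam' + Theta."""
    n, p = X.shape
    Sigma = Lam @ Lam.T + Theta
    _, logdet = np.linalg.slogdet(Sigma)
    Sinv = np.linalg.inv(Sigma)
    quad = np.einsum('ij,jk,ik->', X, Sinv, X)      # sum_i x_i' Sigma^{-1} x_i
    return -0.5 * n * p * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

# toy usage with arbitrary values (purely illustrative)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
X -= X.mean(axis=0)                                  # mean-centre the data
Lam = np.full((4, 1), 0.5)
Theta = np.diag(np.full(4, 0.75))
print(efa_loglik(X, Lam, Theta))
```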

Alternatives to ML factor analysis

A natural choice is the least squares approach for fitting the EFA model.

It can be formulated as the following general class of weighted least squares (WLS) problems:

  min_{Λ,Θ} ‖(S − ΛΛᵀ − Θ)Γ‖²_F,

where Γ is a matrix of weights. The case Γ = S⁻¹ is known as generalized least squares (GLS) factor analysis.

If Γ = I_p, WLS reduces to an unweighted LS problem.

The standard numerical solutions of the optimization problems are iterative (e.g. Newton-Raphson procedure, EM algorithm).
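As a rough illustration of the unweighted case (Γ = I_p), the following sketch fits the EFA covariance structure by least squares with a generic quasi-Newton optimiser; the log-parametrisation of the unique variances and the starting values are my own choices, not part of the slides.

```python
import numpy as np
from scipy.optimize import minimize

def fit_efa_uls(S, k):
    """Fit Sigma = Lam Lam' + Theta to S by unweighted least squares (illustrative sketch)."""
    p = S.shape[0]

    def unpack(theta):
        Lam = theta[:p * k].reshape(p, k)
        Theta = np.diag(np.exp(theta[p * k:]))       # positivity via log-parametrisation
        return Lam, Theta

    def objective(theta):
        Lam, Theta = unpack(theta)
        R = S - Lam @ Lam.T - Theta
        return np.sum(R ** 2)                        # squared Frobenius norm

    x0 = np.concatenate([0.5 * np.ones(p * k), np.log(np.diag(S) / 2)])
    res = minimize(objective, x0, method='L-BFGS-B')
    return unpack(res.x)

# toy usage on a simulated one-factor covariance matrix (illustrative only)
rng = np.random.default_rng(2)
L_true = rng.uniform(0.4, 0.9, size=(5, 1))
S = L_true @ L_true.T + np.diag(rng.uniform(0.2, 0.5, 5))
Lam_hat, Theta_hat = fit_efa_uls(S, k=1)
print(np.round(np.diag(Theta_hat), 3))
```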

Principal factor analysis

Principal factor analysis (PFA), also known as principal axis factoring, consists of the following iterative cycle:

1. Guess Θ (e.g. take s_i²(1 − r_i²) as the initial estimate of the i-th specific variance, where r_i² is the squared multiple correlation coefficient of the i-th variable with the other observed variables and s_i² is the variance of the i-th variable).

2. Conduct a PCA on the reduced covariance matrix, S − Θ, and write the scaled loadings of the first k components as columns of the matrix Λ.

3. Recalculate Θ as the diagonal of S − ΛΛᵀ.

4. Return to step 2 with this new estimate of Θ.

The cycle through steps 2–4 is continued until two successive pairs {Λ, Θ} are identical to within a pre-set level of accuracy.
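A compact sketch of this cycle in numpy; the convergence tolerance and iteration limit are illustrative choices of my own.

```python
import numpy as np

def principal_factor_analysis(S, k, tol=1e-8, max_iter=500):
    """Principal axis factoring on a covariance matrix S with k common factors."""
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)                          # correlation matrix
    smc = 1 - 1 / np.diag(np.linalg.inv(R))         # squared multiple correlations
    theta = np.diag(S) * (1 - smc)                  # step 1: initial unique variances

    for _ in range(max_iter):
        reduced = S - np.diag(theta)                # step 2: PCA of the reduced matrix
        evals, evecs = np.linalg.eigh(reduced)
        evals, evecs = evals[::-1][:k], evecs[:, ::-1][:, :k]
        Lam = evecs * np.sqrt(np.clip(evals, 0, None))
        theta_new = np.diag(S - Lam @ Lam.T)        # step 3: update the unique variances
        if np.max(np.abs(theta_new - theta)) < tol:
            theta = theta_new
            break
        theta = theta_new                           # step 4: iterate
    return Lam, np.diag(theta)

# usage: Lam, Theta = principal_factor_analysis(S, k=2)   # with S from your data
```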

Principal component analysis (PCA) solution

Let Σ have eigenvalue-eigenvector pairs (λ_j, a_j) with λ_1 ≥ λ_2 ≥ … ≥ λ_p ≥ 0.

For the sample covariance matrix S, the eigenvalue-eigenvector pairs are (λ̂_j, â_j) with λ̂_1 ≥ λ̂_2 ≥ … ≥ λ̂_p ≥ 0.

Take the first k pairs and use √λ̂_j â_j as the j-th column of the p × k loading matrix Λ̂.

The estimated specific variances, Θ̂, are provided by the diagonal elements of the matrix S − Λ̂Λ̂ᵀ.

Unlike PFA, PCA is not a proper factor extraction method.
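The same recipe written out in a few lines (a sketch; the function and variable names are mine).

```python
import numpy as np

def pca_loadings(S, k):
    """Loadings via sqrt(eigenvalue) * eigenvector for the first k pairs of S."""
    evals, evecs = np.linalg.eigh(S)                # ascending order
    evals, evecs = evals[::-1][:k], evecs[:, ::-1][:, :k]
    Lam = evecs * np.sqrt(evals)                    # j-th column: sqrt(lambda_j) * a_j
    Theta = np.diag(np.diag(S - Lam @ Lam.T))       # 'specific variances' as a by-product
    return Lam, Theta
```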

Heywood cases

It may happen that, while optimizing the objective function of interest, on a particular iteration estimates of one or more of the unique variances are < 0.

Such a solution is said to be improper, or a Heywood case.

For most computer programs, negative values of unique variances, if they occur on a particular iteration, are changed to small positive numbers before proceeding with the next step.

There is a long-standing debate in EFA about the acceptance of zero entries in Θ.

The EFA model and its assumptions imply that...

i. The variance of the i-th variable, σ_i², is given by

  σ_i² = ∑_{j=1}^{k} λ_{ij}² + θ_i,

where ∑_{j=1}^{k} λ_{ij}² is known as the communality of the variable and θ_i is the specific or unique variance, i.e. the variance of the i-th unique factor.

ii. The covariance of the i-th and j-th variables, σ_{ij}, is given by

  σ_{ij} = ∑_{l=1}^{k} λ_{il} λ_{jl}.

iii. The covariance between the i-th variable and the j-th common factor is λ_{ij}.

Restrictions on the number of factors

Factor analysis has a built-in restriction on the number k of common factors that can be included in any given model.

The data to be used in the estimation will come from S and hence will contain ½ p(p + 1) separate items of information.

There are pk + p unknown parameters (the elements of Λ and Θ).

The requirement that ΛᵀΘ⁻¹Λ be a diagonal matrix introduces ½ k(k − 1) constraints.

Hence, for estimability of the parameters, we require (p − k)² ≥ p + k.

Factor scores

The term 'estimation', when applied to factors, means that they cannot be identified uniquely, rather than referring to a standard procedure for obtaining particular statistics.

This form of indeterminacy is known as 'factor score indeterminacy'.

This indeterminacy is due to the fact that the EFA model postulates the existence of k common and p unique factors (k + p latent variables in total) such that the p observed variables can be represented as their linear combinations.

Thus, the scores of the n observations on the common and unique factors are not uniquely identifiable.

Factor scores (2)

Suppose that a pair {Λ, Θ} is obtained by solving the factor extraction problem stated above.

Common factor scores can be computed as a function of Λ, Θ and x in a number of ways.

Weighted least squares method (Bartlett, 1937 and 1938): ξ̂ = (ΛᵀΘ⁻¹Λ)⁻¹ ΛᵀΘ⁻¹ x.

The common factor scores for the i-th case (i = 1, …, n) are obtained as

  ξ̂_i = (ΛᵀΘ⁻¹Λ)⁻¹ ΛᵀΘ⁻¹ x_i.

The n vectors ξ̂_i are then the rows of the matrix of common factor scores F ∈ R^{n×k}.
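A direct translation of Bartlett's weighted least squares formula (a sketch; X is assumed to be the mean-centred n × p data matrix).

```python
import numpy as np

def bartlett_scores(X, Lam, Theta):
    """Weighted least squares (Bartlett) factor scores, one row per observation."""
    Tinv = np.linalg.inv(Theta)
    W = np.linalg.inv(Lam.T @ Tinv @ Lam) @ Lam.T @ Tinv   # k x p weight matrix
    return X @ W.T                                          # n x k score matrix F
```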

Factor scores (3)

Regression method (Thomson, 1951):

  ξ̂_i = Λᵀ (ΛΛᵀ + Θ)⁻¹ x_i,  i = 1, …, n.

A modification of Bartlett's factor scores proposed by Anderson and Rubin (1956) is

  ξ̂_i = (ΛᵀΘ⁻¹SΘ⁻¹Λ)^{−1/2} ΛᵀΘ⁻¹ x_i,  i = 1, …, n,

which leads to uncorrelated common factor scores F.
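Both alternatives in the same style (a sketch; the matrix square root is taken via scipy's sqrtm, which may introduce a negligible imaginary part that is discarded).

```python
import numpy as np
from scipy.linalg import sqrtm

def regression_scores(X, Lam, Theta):
    """Thomson's regression factor scores."""
    W = Lam.T @ np.linalg.inv(Lam @ Lam.T + Theta)          # k x p weight matrix
    return X @ W.T

def anderson_rubin_scores(X, S, Lam, Theta):
    """Anderson-Rubin scores; the resulting score columns are uncorrelated."""
    Tinv = np.linalg.inv(Theta)
    M = Lam.T @ Tinv @ S @ Tinv @ Lam                        # k x k matrix inside the (-1/2) power
    W = np.linalg.inv(sqrtm(M)) @ Lam.T @ Tinv               # M^(-1/2) Lam' Theta^(-1)
    return X @ np.real(W).T
```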

Choosing the number of common factors

An advantage of ML factor analysis is that it has an associated formal hypothesis testing procedure for the value of k. The test statistic is

  U = n′ min(F_ML),

where n′ = n − 1 − (2p + 5)/6 − 2k/3 and F_ML is the maximum likelihood discrepancy function defined later under 'Discrepancy functions'.

If k common factors are adequate to account for the observed covariances or correlations, then U has, asymptotically, a χ² distribution with ν degrees of freedom, where

  ν = ½(p − k)² − ½(p + k).
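Given the minimised discrepancy min(F_ML) for a fitted k-factor model, the test can be carried out as follows (a sketch; the value of min(F_ML) is assumed to come from whatever extraction routine was used, and the numbers in the usage comment are hypothetical).

```python
from scipy.stats import chi2

def efa_chi2_test(f_ml_min, n, p, k):
    """Bartlett-corrected likelihood ratio test that k common factors are adequate."""
    n_prime = n - 1 - (2 * p + 5) / 6 - 2 * k / 3
    U = n_prime * f_ml_min
    df = 0.5 * (p - k) ** 2 - 0.5 * (p + k)
    return U, df, chi2.sf(U, df)              # statistic, degrees of freedom, p-value

# e.g. efa_chi2_test(f_ml_min=0.08, n=300, p=9, k=2)   # hypothetical numbers
```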

Choosing the number of common factors (2)

Sequential procedure:

Starting with some small value of k, the model parameters are estimated by maximum likelihood.

If U is not significant, the current value of k is accepted; otherwise k is increased by one and the process repeated.

If, at any stage, the degrees of freedom of the test become zero, then either (1) no nontrivial solution is appropriate, or (2) the factor model itself is questionable.

Rotational indeterminacy

If the k-factor model holds then it also holds if the factors are rotated.

If T is an arbitrary orthogonal k × k matrix, the EFA model may be rewritten as

  x = ΛT Tᵀξ + δ,

which is a model with loading matrix ΛT and common factors Tᵀξ.

The assumptions about the variables that make up the original model are not violated by this transformation.
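A two-line numerical check of this invariance with a random orthogonal T (illustrative values of my own).

```python
import numpy as np

rng = np.random.default_rng(3)
Lam = rng.uniform(0.3, 0.8, size=(5, 2))
Theta = np.diag(rng.uniform(0.2, 0.6, 5))
T, _ = np.linalg.qr(rng.standard_normal((2, 2)))      # arbitrary orthogonal 2 x 2 matrix

Sigma = Lam @ Lam.T + Theta
Sigma_rot = (Lam @ T) @ (Lam @ T).T + Theta           # loadings rotated by T
print(np.allclose(Sigma, Sigma_rot))                  # True: Sigma is unchanged
```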

Rotational indeterminacy (2)

Thus, if the EFA model holds, Σ can be written as Σ = (ΛT)(TᵀΛᵀ) + Θ; that is, there is a rotational indeterminacy in the decomposition of Σ in terms of Λ and Θ.

There is an infinite number of factor loading matrices satisfying the original assumptions of the model.

To ensure a unique solution for the model unknowns Λ and Θ, additional constraints, such as ΛᵀΘ⁻¹Λ being a diagonal matrix, are imposed on the parameters.

Such solutions are usually difficult to interpret.

Simple structure

Instead, parameter estimation is usually followed by some kind of rotation of Λ to some structure with specific features.

Rotation is a process by which a solution is made more interpretable without changing its underlying mathematical properties.

This aim is essentially what is usually referred to as simple structure.

When simple structure is achieved, the observed variables will fall into mutually exclusive groups whose loadings are high on single factors, perhaps moderate to low on a few factors, and of negligible size on the remaining factors.
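The most widely used orthogonal criterion aiming at simple structure is varimax; the classical algorithm fits in a few lines. This is a generic textbook-style sketch, not code from the lecture.

```python
import numpy as np

def varimax(Lam, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal rotation of a p x k loading matrix (varimax for gamma = 1)."""
    p, k = Lam.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        L = Lam @ R
        u, s, vt = np.linalg.svd(
            Lam.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.sum(L ** 2, axis=0))))
        R = u @ vt                                   # updated rotation matrix
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:       # stop when the criterion stabilises
            break
        d_old = d
    return Lam @ R, R                                # rotated loadings and rotation matrix
```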

Orthogonal versus oblique rotation

Orthogonal rotation methods restrict the rotated factors to being uncorrelated.

Non-orthogonal (oblique) rotation methods allow correlated factors.

If a researcher is more interested in the generalizability of his/her results, then orthogonal rotation is to be preferred.

If one is primarily concerned with getting results that "best fit" his/her data, then the researcher should rotate the factors obliquely.

Orthogonal versus oblique rotation (2)

One advantage of an orthogonal rotation is that the loadings represent correlations between factors and manifest variables.

This is not the case with an oblique rotation because of the correlations between the factors.

There are a variety of orthogonal and oblique rotation techniques, although only relatively few are in general use.

Although the axes may be rotated, the distribution of the points will remain invariant.

A comparison of EFA and PCA

1. EFA postulates a model for the data; PCA does not.

2. Whereas EFA tries to explain the interrelations between the observed variables, PCA is primarily concerned with explaining the variance of the observed variables.

3. If the number of PCs is increased from k to k + 1, the first k components are unchanged. This is not the case in EFA.

4. The calculation of PC scores is straightforward. The calculation of factor scores is more complex.

5. In contrast to PCA, ML factor analysis is scale invariant.

A comparison of EFA and PCA II

Despite these differences, the results from both types of analysis are frequently very similar.

Certainly, if the specific variances are small, we would expect both forms of analysis to give similar results.

If the specific variances are large, they will be absorbed into all the PCs, both retained and rejected, whereas EFA accounts for specific variances of the observed variables.

If the observed variables are almost uncorrelated, EFA has nothing to explain.

Summary

EFA is an attempt to explain a set of multivariate data using a smaller number of dimensions than one begins with.

EFA (1) tries to explain the covariances (or correlations) of the observed variables by means of a few common factors, and (2) makes special provision for variances specific to the observed variables and for noise.

Ambiguities in the EFA model: (1) rotational indeterminacy, (2) factor score indeterminacy.

EFA is pointless if the observed variables are almost uncorrelated.

Revisiting the assumptions of EFA

The assumptions of the EFA model are made regardless of their substantive appropriateness.

Additional and generally arbitrary assumptions must then be imposed in order to estimate the model parameters.

The EFA model's (1) inability to incorporate substantively meaningful constraints, and (2) its necessary imposition of substantively meaningless constraints, have earned it the scornful label of a garbage-in, garbage-out (GIGO) model.

Confirmatory factor analysis (CFA)

In the confirmatory factor analysis (CFA) model, theresearcher imposes substantively motivated constraints.

These constraints determine
1. which pairs of common factors are correlated,
2. which observed variables are affected by which common factors,
3. which observed variables are affected by a unique factor,
4. which pairs of unique factors are correlated.

Statistical tests can be performed to evaluate whether the data confirm the substantively generated model.

Exemplary path diagram of CFA

[Figure 1.2: A confirmatory factor model (path diagram).]

The CFA model

Matrix | Dimension | Mean | Covariance  | Dimension | Description
x      | p × 1     | 0    | Σ = E(xxᵀ)  | p × p     | observed variables
ξ      | k × 1     | 0    | Φ = E(ξξᵀ)  | k × k     | common factors
Λ      | p × k     | –    | –           | –         | loadings of x on ξ
δ      | p × 1     | 0    | Θ = E(δδᵀ)  | p × p     | unique factors

Factor equation: x = Λξ + δ.

Covariance equation (assuming E(ξδᵀ) = O, the k × p zero matrix): Σ = ΛΦΛᵀ + Θ.
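In code, the covariance equation is a one-liner; the sketch below uses a hypothetical two-factor, six-indicator pattern in which the fixed zero loadings encode which observed variable is allowed to load on which common factor.

```python
import numpy as np

# hypothetical two-factor CFA: x1-x3 load only on xi1, x4-x6 only on xi2
Lam = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                [0.0, 0.8], [0.0, 0.7], [0.0, 0.9]])
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])                        # common-factor covariance matrix
Theta = np.diag([0.2, 0.35, 0.5, 0.35, 0.5, 0.2])   # unique variances

Sigma = Lam @ Phi @ Lam.T + Theta                   # model-implied covariance matrix
print(np.round(Sigma, 2))
```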

Specifying hypotheses

One could specify a parameter in one of three ways:
1. as a prescribed fixed parameter,
2. as a constrained-equal parameter, set equal to some other parameters, or
3. as a free parameter to be estimated.

Whether a parameter is fixed, constrained-equal, or free follows from the researcher's hypothesis.

Usually fixed parameters in Λ enter as zeros.

Elements of Φ and Θ can also be made fixed, free, or constrained-equal.

Example: Measurement of psychological disorders

Suppose one wants to measure psychological disorders for a sample of persons at two points in time.

A single, adequate measure of psychological disorders is not available.

A measurement model is proposed in which two latent variables are assumed: psychological disorder at time 1 and psychological disorder at time 2.

Psychological disorder at each point in time is imperfectly measured by two observed variables, the number of psychophysiological symptoms at time 1 and at time 2.

A model for the measurement of psychological disorders

[Figure: path diagram of the measurement model for psychological disorders at the two points in time.]

Multitrait-Multimethod models

In the multitrait-multimethod (MTMM) model each of a set of traits is measured by each of a set of methods.

If the measurement of a trait is not affected by the method used in the measurement, the observed variable would load only on the common factor for that trait, and not on the factor for that method.

The MTMM model attempts to disentangle the effects of different substantive concepts from the methods used to measure those concepts.

The MTMM model can be formulated as a CFA model.

Path diagram for an MTMM model

[Figure: path diagram of an MTMM model.]

Loading matrix for MTMM model

Λ (rows x1, …, x9; columns ξ1, …, ξ6):

        ξ1    ξ2    ξ3    ξ4    ξ5    ξ6
  x1   λ11     0     0   λ14     0     0
  x2     0   λ22     0   λ24     0     0
  x3     0     0   λ33   λ34     0     0
  x4   λ41     0     0     0   λ45     0
  x5     0   λ52     0     0   λ55     0
  x6     0     0   λ63     0   λ65     0
  x7   λ71     0     0     0     0   λ76
  x8     0   λ82     0     0     0   λ86
  x9     0     0   λ93     0     0   λ96

Columns ξ1-ξ3 contain the trait loadings; columns ξ4-ξ6 contain the method loadings.

Factor correlation matrix for MTMM model

Φ (6 × 6, with entries ϕ_{ij}):

  ϕ11  ϕ12  ϕ13  ϕ14  ϕ15  ϕ16
  ϕ21  ϕ22  ϕ23  ϕ24  ϕ25  ϕ26
  ϕ31  ϕ32  ϕ33  ϕ34  ϕ35  ϕ36
  ϕ41  ϕ42  ϕ43  ϕ44  ϕ45  ϕ46
  ϕ51  ϕ52  ϕ53  ϕ54  ϕ55  ϕ56
  ϕ61  ϕ62  ϕ63  ϕ64  ϕ65  ϕ66

The upper-left 3 × 3 block contains the trait/trait correlations, the upper-right block the trait/method correlations, the lower-left block the method/trait correlations, and the lower-right block the method/method correlations.

Identification

Identification is concerned with whether the parameters of the model are uniquely determined.

A CFA model that is identified implies a unique solution for the free parameters, given the observed covariance matrix, S, and the fixed and constrained-equal parameters of the model.

Attempts to estimate models that are not identified result in arbitrary estimates of the parameters and meaningless interpretations.

What is required is a set of verifiable conditions that determine unambiguously whether a CFA model is identified.

Estimation

Estimation involves finding values of Λ, Φ, and Θ that generate an estimated covariance matrix Σ̂ that is as close as possible to S in some well-defined sense.

These estimates must satisfy the constraints that have been imposed on the model.

Define a matrix Σ* according to the formula Σ* = Λ*Φ*Λ*ᵀ + Θ*, where Λ*, Φ* and Θ* are any matrices that incorporate the imposed constraints.

A function F(S, Σ*) that measures how close a given Σ* is to S is called a discrepancy function.

Discrepancy functions

The most frequently used discrepancy functions are the following:

1. Unweighted least squares (ULS): F_ULS = tr[(S − Σ*)ᵀ(S − Σ*)].

2. Generalized least squares (GLS): F_GLS = tr[((S − Σ*)S⁻¹)²].

3. Maximum likelihood (ML): F_ML = ln det(Σ*) + tr(S Σ*⁻¹) − ln det(S) − p.

Unlike ULS or GLS, ML estimation requires the making of distributional assumptions.
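The three discrepancy functions written out directly (a sketch; S and Sigma_star are assumed to be p × p covariance matrices of full rank).

```python
import numpy as np

def f_uls(S, Sigma_star):
    """Unweighted least squares discrepancy."""
    D = S - Sigma_star
    return np.trace(D.T @ D)

def f_gls(S, Sigma_star):
    """Generalized least squares discrepancy."""
    A = (S - Sigma_star) @ np.linalg.inv(S)
    return np.trace(A @ A)

def f_ml(S, Sigma_star):
    """Maximum likelihood discrepancy."""
    p = S.shape[0]
    return (np.linalg.slogdet(Sigma_star)[1]
            + np.trace(S @ np.linalg.inv(Sigma_star))
            - np.linalg.slogdet(S)[1] - p)
```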

Algorithms

In general, there are no closed-form solutions for the optimization problems described above.

The minimization of the discrepancy functions can be performed using quasi-Newton algorithms.

Quasi-Newton algorithms need only the first derivative of the discrepancy function.

In quasi-Newton methods, the Hessian matrix of second derivatives does not need to be evaluated directly. Instead, the Hessian matrix is approximated.
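A rough sketch of this strategy for the ML discrepancy, using scipy's L-BFGS-B optimiser with numerically approximated gradients. The free/fixed pattern is encoded by a boolean mask, and the bounds, parametrisation and starting values are my own bookkeeping, not a standard package interface; no safeguards against non-positive-definite intermediate Σ* are included.

```python
import numpy as np
from scipy.optimize import minimize

def fit_cfa_ml(S, lam_mask, k):
    """Fit Sigma* = Lam Phi Lam' + Theta to S by ML; lam_mask (p x k, bool) marks free loadings."""
    p = S.shape[0]
    n_lam = int(lam_mask.sum())
    n_phi = k * (k - 1) // 2                       # free factor correlations (diagonal fixed at 1)

    def unpack(x):
        Lam = np.zeros((p, k))
        Lam[lam_mask] = x[:n_lam]                  # fixed loadings stay at zero
        Phi = np.eye(k)
        iu = np.triu_indices(k, 1)
        Phi[iu] = x[n_lam:n_lam + n_phi]
        Phi[(iu[1], iu[0])] = Phi[iu]              # symmetrise
        Theta = np.diag(np.exp(x[n_lam + n_phi:])) # keep unique variances positive
        return Lam, Phi, Theta

    def f_ml(x):
        Lam, Phi, Theta = unpack(x)
        Sigma = Lam @ Phi @ Lam.T + Theta
        return (np.linalg.slogdet(Sigma)[1]
                + np.trace(S @ np.linalg.inv(Sigma))
                - np.linalg.slogdet(S)[1] - p)

    x0 = np.concatenate([0.5 * np.ones(n_lam), np.zeros(n_phi), np.log(np.diag(S) / 2)])
    bounds = [(None, None)] * n_lam + [(-0.99, 0.99)] * n_phi + [(None, None)] * p
    res = minimize(f_ml, x0, method='L-BFGS-B', bounds=bounds)   # quasi-Newton, approximated gradient
    return unpack(res.x), res.fun

# usage: (Lam, Phi, Theta), f_min = fit_cfa_ml(S, lam_mask, k)
```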

Avoiding improper solutions

During the iterations, it is essential to avoid improper values of the parameters.

For example, a variance should not be negative and a correlation should not be greater than unity.

One approach to dealing with the case when the value of a parameter becomes negative is to set it to zero or some very small positive quantity and proceed.

Sometimes starting the iterations with different initial values also gets around this problem.

Statistical tests

A test of how well the model-implied covariance matrix fits S is provided by the goodness-of-fit statistic

  U = (N − 1) min(F_ML),

where N is the sample size and min(F_ML) is the minimised value of the maximum likelihood discrepancy function.

If the sample size is sufficiently large, the U statistic provides a test that the population covariance matrix is equal to the covariance matrix implied by the fitted model, against the alternative that the population matrix is unconstrained.

Under the equality hypothesis, U has a χ² distribution with df = p(p + 1)/2 − m degrees of freedom, where m is the number of free parameters in the model.

Problems with the χ² test

The χ² goodness-of-fit statistic has limited practical use because in large samples even relatively trivial departures from the null hypothesis will lead to its rejection.

A more satisfactory way to use the test is for a comparison of a series of nested models.

A large difference in the statistic for two models, compared with the difference in the degrees of freedom of the models, indicates that the additional parameters in the more general model provide a genuine improvement in fit.

Further problems arise when the observed variables are not normally distributed.

Fit indices

Root mean square error of approximation (RMSEA):

  RMSEA = √( max( (χ²_T − df_T) / ((N − 1) df_T), 0 ) ),

where χ²_T is the chi-square of the tested model with df_T degrees of freedom.

For the RMSEA, 0 indicates perfect fit and larger values indicate lack of fit.

Fit indices (2)

Comparative fit index (CFI):

  CFI = 1 − max(χ²_T − df_T, 0) / max(χ²_0 − df_0, 0),

where χ²_0 is the chi-square for the null model with df_0 degrees of freedom.

The null model hypothesizes that there are no non-zero covariances between the observed variables.

The CFI ranges between 0 and 1, with 0 indicating horrible fit and 1 perfect fit.
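Both indices are simple functions of the chi-square statistics and their degrees of freedom (a sketch covering the RMSEA above as well; the numbers in the example call are hypothetical).

```python
def rmsea(chi2_t, df_t, n):
    """Root mean square error of approximation from the tested model's chi-square."""
    return (max((chi2_t - df_t) / ((n - 1) * df_t), 0.0)) ** 0.5

def cfi(chi2_t, df_t, chi2_0, df_0):
    """Comparative fit index relative to the null (zero-covariance) model."""
    return 1.0 - max(chi2_t - df_t, 0.0) / max(chi2_0 - df_0, 0.0)

# hypothetical example: tested model chi2=36.5 on 24 df, null model chi2=480.0 on 36 df, n=300
print(rmsea(36.5, 24, 300), cfi(36.5, 24, 480.0, 36))
```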

Extending the CFA model

CFA is limited by its inability to incorporate structural relationships among common factors.

Structural equation or covariance structure models allow both the response and explanatory latent variables to be linked by a series of linear equations.

The aim is to explain the correlations or covariances of the observed variables in terms of
1. the relationships of these variables to the assumed underlying latent variables, and
2. the relationships postulated between the latent variables themselves.

Covariance structure models represent the convergence of relatively independent research traditions in various disciplines.

Covariance structure model

Pair of measurement models:

  x = Λ_x ξ + δ
  y = Λ_y η + ε.

Structural equation model:

  η = Bη + Γξ + ζ,

where
η: r × 1 vector of unobserved endogenous variables;
ξ: k × 1 vector of unobserved exogenous variables;
B: r × r matrix of coefficients relating the endogenous variables to one another;
Γ: r × k matrix of coefficients relating the exogenous variables to the endogenous variables;
ζ: r × 1 vector of errors in equations.
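Provided I − B is nonsingular, the structural equation can be solved for η, giving the reduced form η = (I − B)⁻¹(Γξ + ζ); the small sketch below uses hypothetical coefficient matrices of my own.

```python
import numpy as np

r, k = 2, 2
B = np.array([[0.0, 0.0],
              [0.5, 0.0]])                 # hypothetical: eta2 depends on eta1
Gamma = np.array([[0.7, 0.0],
                  [0.0, 0.6]])             # hypothetical exogenous effects
rng = np.random.default_rng(4)
xi = rng.standard_normal(k)
zeta = 0.3 * rng.standard_normal(r)

eta = np.linalg.solve(np.eye(r) - B, Gamma @ xi + zeta)   # reduced form of the structural model
print(eta)
```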

Path diagram of the covariance structure model

[Figure 1.3: Combined measurement component and structural component of the covariance structure model.]

Summary

Many situations conform to a common-factor framework.

The ability of the CFA model to test specific structures suggested by substantive theory gives it a major advantage over the EFA model.

The possibility of making causal inferences about latent variables is one that has great appeal.

CFA can deal with a variety of applications. These include the analysis of multitrait-multimethod covariance matrices.

CFA is a submodel of the more general covariance structure models or structural equation models.

Covariance structure models allow one to incorporate structural relationships among common factors.
