Applied Multivariate Analysis

Seppo Pynnönen

Department of Mathematics and Statistics, University of Vaasa, Finland

Spring 2017



Confirmatory Factor Analysis (CFA)


1 The model

2 Model Evaluation

Chi-square Test

Some Other Statistics


In exploratory factor analysis (EFA) the aim is to find, for a set of observed variables x1, . . . , xp, a set of underlying latent factors f1, . . . , fm, where m < p.

As in EFA, the model is of the form

x = Λf + δ, (1)

where δ is the error term vector.

The factors are supposed to account for the inter-correlations of the observed variables.

When m > 1, the factor solution is not unique (not identified). Factor axes can be rotated to find a "simple structure".


In confirmatory factor analysis, the investigator is assumed to know the number of underlying factors.

In addition, he/she is assumed to have enough prior knowledge to specify at least m² independent conditions on Λ (the loadings) and Φ (the factor covariance matrix) in

Σ = ΛΦΛ′ + Θδ (2)

such that the remaining parameters can be solved uniquely. In (2), Θδ is the diagonal matrix with the variances of the error terms δi, i = 1, . . . , p, on the diagonal.

In such a case, we say that the model is identified.
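The count of m² conditions can be motivated by the rotation indeterminacy of the factor model. The following is the standard textbook argument, sketched here as an aid rather than taken from the slides; T denotes an arbitrary nonsingular m × m matrix introduced only for this illustration.

```latex
\Lambda^{*} = \Lambda T, \qquad \Phi^{*} = T^{-1}\,\Phi\,(T^{-1})'
\quad\Longrightarrow\quad
\Lambda^{*}\Phi^{*}\Lambda^{*\prime} + \Theta_\delta
  = \Lambda\,T\,T^{-1}\,\Phi\,(T^{-1})'\,T'\,\Lambda' + \Theta_\delta
  = \Lambda\Phi\Lambda' + \Theta_\delta
  = \Sigma .
```

Every such T reproduces the same Σ, and T has m² free elements, so at least m² independent restrictions on Λ and Φ are needed to rule out all choices except T = I.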


Most of the restrictions come from the modeling constraints.

In addition, because factors do not have natural scales, one common approach is to define Φ as a correlation matrix (unit factor variances).

A popular alternative is to fix one loading in each column of Λ equal to one.

Essentially this means that the scale of each factor is fixed to the scale of the corresponding observed variable.


These technical (scaling) constraints impose m restrictions.

Since m² restrictions are required in total, we need at least m² − m = m(m − 1) more.

Usually it suffices that the zeros are distributed over the rows of Λ such that the columns remain linearly independent (Λ has full column rank).


Example 1

The following data are from a study in which the relationship between performance and job satisfaction was investigated [a].

The variables are:

amm1: Achievement motivation measure 1
amm2: Achievement motivation measure 2
tssem1: Task specific self esteem measure 1
tssem2: Task specific self esteem measure 2
jsm1: Job satisfaction measure 1
jsm2: Job satisfaction measure 2

In addition the data include:

vim: Verbal intelligence measure
performance: Performance (measured in hundreds of dollars)

[a] Bagozzi, R.P. (1980). Performance and satisfaction in an industrial sales force: An examination of their antecedents and simultaneity. Journal of Marketing 44, 65–77.


Here we investigate whether the achievement measures (amm1, amm2), task specific self esteem measures (tssem1, tssem2), and job satisfaction measures (jsm1, jsm2) measure the concepts they are intended to measure.


The model (path diagram): three correlated latent factors, each measured by two indicators, and each indicator with its own error term.

SAT → jsm1 (δ1), jsm2 (δ2)
ACH → amm1 (δ3), amm2 (δ4)
TSEM → tssem1 (δ5), tssem2 (δ6)


The correlation matrix, means, and standard deviations, based on a sample of n = 122 observations, are the following:

         perform     js1     js2     am1     am2   tssm1   tssm2  verbal
perform    1.000
js1         .418   1.000
js2         .394    .627   1.000
am1         .129    .202    .266   1.000
am2         .189    .284    .208    .365   1.000
tssm1       .544    .281    .324    .201    .161   1.000
tssm2       .507    .225    .314    .172    .174    .546   1.000
verbal     -.357   -.156   -.038   -.199   -.277   -.294   -.174   1.000
mean      720.86   15.54   18.46   14.90   14.35   19.57   24.16   21.36
std         2.09    3.43    2.81    1.95    2.06    2.16    2.06    3.65
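Because the model is a covariance structure, it can help to see how the summary statistics above translate into a covariance matrix via cov(i, j) = r(i, j) · s(i) · s(j). The following is a minimal Python sketch for the six indicator variables only; the variable names `lower`, `corr`, and `cov` are illustrative choices, not part of the original material.

```python
# Six CFA indicators, in the order used in the table above.
names = ["js1", "js2", "am1", "am2", "tssm1", "tssm2"]
std = [3.43, 2.81, 1.95, 2.06, 2.16, 2.06]

# Lower triangle of the correlation matrix for these six variables,
# copied from the table (the perform and verbal rows/columns are left out).
lower = [
    [1.000],
    [0.627, 1.000],
    [0.202, 0.266, 1.000],
    [0.284, 0.208, 0.365, 1.000],
    [0.281, 0.324, 0.201, 0.161, 1.000],
    [0.225, 0.314, 0.172, 0.174, 0.546, 1.000],
]

p = len(names)

# Fill in the full symmetric correlation matrix.
corr = [[0.0] * p for _ in range(p)]
for i, row in enumerate(lower):
    for j, r in enumerate(row):
        corr[i][j] = corr[j][i] = r

# Rescale each correlation by the two standard deviations.
cov = [[corr[i][j] * std[i] * std[j] for j in range(p)] for i in range(p)]

for name, row in zip(names, cov):
    print(name, [round(v, 3) for v in row])
```

The diagonal of `cov` holds the variances (the squared standard deviations), and the off-diagonal entries are the covariances the model-implied matrix is compared against.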


In SAS, estimating the model goes as follows. First, read in the correlation matrix, sample sizes, and standard deviations:

/* Job satisfaction example */
data jobsat(type=corr);
infile cards missover; /* jumps over the (missing) symmetric part of the correlation matrix */
input _type_ $ _name_ $ performm jobsatm1 jobsatm2 achvm1 achvm2 sestm1 sestm2 intlm;
datalines;
corr performm 1.000
corr jobsatm1 .418 1.000
corr jobsatm2 .394 .627 1.000
corr achvm1 .129 .202 .266 1.000
corr achvm2 .189 .284 .208 .365 1.000
corr sestm1 .544 .281 .324 .201 .161 1.000
corr sestm2 .507 .225 .314 .172 .174 .546 1.000
corr intlm -.357 -.156 -.038 -.199 -.277 -.294 -.174 1.000
n . 122 122 122 122 122 122 122 122
std . 2.09 3.43 2.81 1.95 2.06 2.16 2.06 3.65
;
run;


Use PROC CALIS to run the confirmatory factor analysis:

proc calis data = jobsat;
/* path specifications */
path
Jsat --> jobsatm1 = 1, /* identification constraint */
Jsat --> jobsatm2, /* free parameter to be estimated */
Achiev --> achvm1 = 1, /* fixed to 1 */
Achiev --> achvm2, /* free parameter */
Selfes --> sestm1 = 1, /* fixed to 1 */
Selfes --> sestm2, /* free */
/* Variances of the error terms of observed indicator variables */
<--> jobsatm1, /* freely estimated */
<--> jobsatm2,
<--> achvm1,
<--> achvm2,
<--> sestm1,
<--> sestm2,
/* latent variable covariances */
<--> Jsat Achiev Selfes /* freely estimated variances and covariances of the latent variables */
;
/* generate path diagram */
pathdiagram diagram=[init standardized] /* shows initial and standardized solutions */
exogcov /* shows correlations/covariances between factors */
title = "CFA for Job Satisfaction";
run;


[Figure: initial path diagram produced by PROC CALIS]


[Figure: estimated model (standardized solution) with model fit summary]


Remark 1

Many SEM packages set initial identification constraints automatically, typically by fixing one loading for each factor equal to one. As discussed earlier, this means that the scale of the latent variable is fixed to that of the particular observed variable. The coefficients of the error term paths are also fixed to one.


The chi-square goodness of fit test statistic has the value 3.92, which with 6 degrees of freedom has a p-value of 0.69, so the hypothesis that the model fits the data is not rejected.

On the basis of this short analysis, our quick conclusion is that the measures seem to be indicators of the concepts they are supposed to measure.
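The 6 degrees of freedom can be recovered by counting: the number of distinct elements in the covariance matrix of the six indicators minus the number of freely estimated parameters in the PROC CALIS specification. A small Python sketch of that bookkeeping follows; the variable names are illustrative.

```python
p = 6  # observed indicators: jobsatm1, jobsatm2, achvm1, achvm2, sestm1, sestm2

# Distinct variances and covariances among the p indicators.
n_moments = p * (p + 1) // 2

# Free parameters in the three-factor model as specified above.
free_loadings = 3        # one loading per factor is fixed to 1, the other is free
error_variances = 6      # one per indicator
factor_variances = 3     # Jsat, Achiev, Selfes
factor_covariances = 3   # all pairs of the three factors

n_free = free_loadings + error_variances + factor_variances + factor_covariances
df = n_moments - n_free

print(n_moments, n_free, df)  # 21 15 6
```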

2 Model Evaluation

Model evaluation is an important step in empirical analysis.

The model extremes are:

Saturated model: no restrictions are imposed on the population moments.

Independence model: the variables are uncorrelated.

Modeling the population moments means imposing some restrictions, so our proposed model lies somewhere between these extremes.


Simplicity: Models with relatively few parameters are preferred (the principle of parsimony).

At the same time, a well fitting model is preferable to a poorly fitting one.

Empirically, the question is how well the model-implied covariance matrix

Σ = ΛΦΛ′ + Θδ (3)

matches with the sample covariance matrix S .
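To make formula (3) concrete, the sketch below builds the model-implied covariance matrix for a three-factor, six-indicator pattern like the one in Example 1. All numerical values of Λ, Φ, and Θδ here are hypothetical, chosen only for illustration; they are not estimates from the slides.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# HYPOTHETICAL loadings: first indicator of each factor fixed to 1,
# zeros where an indicator does not load on a factor.
lam = [
    [1.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.7],
]

# HYPOTHETICAL factor covariance matrix Phi and error variances (diagonal of Theta).
phi = [
    [1.0, 0.3, 0.4],
    [0.3, 1.0, 0.2],
    [0.4, 0.2, 1.0],
]
theta = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

# Sigma = Lambda * Phi * Lambda' + Theta
sigma = matmul(matmul(lam, phi), transpose(lam))
for i, t in enumerate(theta):
    sigma[i][i] += t

for row in sigma:
    print([round(v, 3) for v in row])
```

Note how the zero pattern in Λ determines which factor covariance drives each off-diagonal block of Σ; estimation chooses the free values so that Σ is as close as possible to S.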


To assess the empirical fit, there are dozens of statistics.

These measures can be classified into different categories:

Measures of parsimony, minimum sample discrepancy measures, measures based on population discrepancy, information-theoretic measures, comparison-to-baseline-model measures, parsimony-adjusted measures, goodness-of-fit indexes, etc.

Chi-square Test

The chi-square statistic (χ², CMIN in AMOS) reported in the examples is perhaps one of the most popular goodness-of-fit statistics.

It is classified as a minimum sample discrepancy measure.

Strictly speaking, the null hypothesis it tests is:

H0 : x ∼ N(µ, Σ), (4)

where Σ is of the form (2), i.e., the data are generated according to our hypothesized model.
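For reference, under maximum likelihood estimation the statistic is usually obtained from the following discrepancy function. This is the standard textbook form rather than something stated in the slides; t (the number of free parameters) and θ (the parameter vector) are symbols introduced here, and some programs use n instead of n − 1 as the multiplier.

```latex
F_{ML}\bigl(S,\Sigma(\theta)\bigr)
  = \ln\lvert\Sigma(\theta)\rvert
  + \operatorname{tr}\!\bigl(S\,\Sigma(\theta)^{-1}\bigr)
  - \ln\lvert S\rvert - p,
\qquad
\chi^{2} = (n-1)\,F_{ML}\bigl(S,\Sigma(\hat{\theta})\bigr),
\qquad
df = \tfrac{1}{2}\,p(p+1) - t .
```

F_ML is zero when the model-implied matrix reproduces S exactly and grows as the two matrices diverge, which is why the statistic counts as a sample discrepancy measure.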


The p-value indicates how compatible the data are with this hypothesis.

A small p-value indicates discrepancy. A usual threshold is 5%, i.e., p < 0.05 is an indication that our model is not really consistent with the data.

In Example 1, we found χ² = 3.92, which with 6 degrees of freedom gives a p-value of 0.69, suggesting that the model fits the data well.
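The p-value quoted for Example 1 can be checked by hand. The sketch below uses only the Python standard library and relies on the closed-form upper-tail probability of the chi-square distribution that holds for even degrees of freedom; the function name `chi2_sf_even_df` is an illustrative choice, and a statistics library would normally be used instead.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

p_value = chi2_sf_even_df(3.92, 6)
print(round(p_value, 2))  # 0.69
```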


A derived measure is χ²/df (chi-square divided by the degrees of freedom).

The rule is that the ratio should be close to one.

In particular, a "large" value (the cutoffs quoted vary somewhere between 2 and 5) indicates an inadequate fit.

In Example 1, χ²/df = 3.92/6 ≈ 0.65 < 1.

Some Other Statistics

(a) Normed Fit Index (NFI) measures the improvement in fit relative to the independence (baseline) model.

The closer to 1 the better the fit (1 = perfect fit, 0 = no fit).

Should be > 0.90 (e.g. AMOS manual).

(b) Goodness of Fit Index (GFI)

The closer to 1 the better the fit (1 = perfect fit, 0 = no fit).

Threshold: there is no single agreed value; values above 0.90 are often quoted as a rule of thumb.

(c) Root Mean Square Error of Approximation (RMSEA)

Should be ≤ 0.08. If > 0.1, the model should be improved.
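As a rough illustration of how two of these indexes relate to the chi-square statistic, the sketch below uses the commonly cited textbook definitions, which are not spelled out in the slides: RMSEA = sqrt(max(χ² − df, 0) / (df · (n − 1))) and NFI = (χ²_baseline − χ²_model) / χ²_baseline. Software packages may differ in details such as n versus n − 1, and the function names are illustrative.

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of RMSEA from the model chi-square, its df, and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def nfi(chi2_model, chi2_baseline):
    """Normed Fit Index: relative reduction in chi-square versus the baseline model."""
    return (chi2_baseline - chi2_model) / chi2_baseline

# Example 1: chi-square = 3.92 with 6 df, n = 122.
# The chi-square is below its degrees of freedom, so the estimate is truncated at zero.
print(rmsea(3.92, 6, 122))  # 0.0
```

The baseline (independence model) chi-square for Example 1 is not reported in the slides, so `nfi` is shown only as a formula and is not evaluated on the example.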
