Statistical Modelling (Special Topic: SEM)
Statistik Terapan dan Manajemen Data. Email: olah.data@yahoo.co.id

Statistical Modelling 2


Page 1: Statistical Modelling 2

8/3/2019 Statistical Modelling 2

http://slidepdf.com/reader/full/statistical-modelling-2 1/89



Relationship Analysis

Awareness of the fact that some relationships / models are 'meaningful' and some are not.

Meaningful relationships / models normally have a theoretical basis (underlying theory) and exhibit 'causality' or 'cause-and-effect'.

For those 'cause-and-effect' relationships, SEM provides a formal way of analysing them.


Agenda

Part I: SEM: The Basics
i. SEM Nomenclature / Terminologies
ii. SEM & Related Models

Part II: Modeling and Computing
i. How to draw a model using AMOS.
ii. How to run the AMOS model and evaluate several key components of the AMOS graphics and text output, including overall model fit and test statistics for individual path coefficients.
iii. How to modify and respecify a non-fitting model.

Part III: SEM and Its Applications


Part One

SEM: The Basics

http://58.26.137.12/byatim/


SEM

Is a statistical methodology for the analysis of a structural theory that bears on some phenomenon, using a confirmatory (hypothesis-testing) approach. Most other multivariate procedures are descriptive/exploratory in nature.

The theory represents "causal" processes that generate observations on multiple variables.


SEM

conveys 2 important aspects of the procedure:

• The causal processes under study are represented by a series of structural equations, and
• These structural equations can be modeled pictorially to enable a clearer conceptualization of the theory under study.

The model can be tested simultaneously to determine the extent to which it is consistent with the data: if the goodness of fit is adequate, the model is not rejected; otherwise the hypothesized relations are rejected.


SEM: A Note

SEM is a very general, very powerful and very popular multivariate analysis technique.

It provides a comprehensive method for the quantification and testing of theories.

It has been applied in econometrics, psychology, sociology, political science, education, market and medical research, etc.

Also known as:
• covariance structure analysis,
• covariance structure modeling,
• latent variable modeling,
• confirmatory factor analysis,
• linear structural relationships, and
• analysis of covariance structures.


SEM is ...

a family of statistical techniques which incorporates and integrates:
• Path analysis
• Linear regression
• Factor analysis


SEM

• serves purposes similar to multiple regression, but in a more powerful way which takes into account the modeling of interactions, nonlinearities, correlated independents, measurement error, correlated error terms, multiple latent independents each measured by multiple indicators, and one or more latent dependents also each with multiple indicators.

• may be used as a more powerful alternative to multiple regression, path analysis, factor analysis, time series analysis, and analysis of covariance. These procedures are special cases of SEM; put another way, SEM is an extension of the general linear model.


Advantages of SEM compared to multiple regression

• more flexible assumptions (particularly allowing interpretation even in the face of multicollinearity),
• use of confirmatory factor analysis to reduce measurement error by having multiple indicators per latent variable,
• the attraction of SEM's graphical modeling interface,
• the desirability of testing models overall rather than coefficients individually,
• the ability to test models with multiple dependents,
• the ability to model mediating variables,
• the ability to model error terms,
• the ability to test coefficients across multiple between-subjects groups, and
• the ability to handle difficult data (time series with autocorrelated error, non-normal data, incomplete data).


Major applications of structural equation modeling

1. causal modeling, or path analysis: hypothesizes causal relationships among variables and tests the causal models with a linear equation system. Causal models can involve either manifest variables, latent variables, or both;

2. confirmatory factor analysis: an extension of factor analysis in which specific hypotheses about the structure of the factor loadings and intercorrelations are tested;

3. regression models, in which regression weights may be constrained to be equal to each other, or to specified numerical values;

4. covariance structure models, which hypothesize that a covariance matrix has a particular form. For example, you can test the hypothesis that a set of variables all have equal variances with this procedure;

5. correlation structure models, which hypothesize that a correlation matrix has a particular form.


Aims and Objectives

By the end of this course you should:
• Have a working knowledge of the principles behind causality.
• Understand the basic steps to building a model of the phenomenon of interest.
• Be able to construct / interpret path diagrams.
• Understand the basic principles of how models are tested using SEM.
• Be able to test the adequacy of models using SEM.
• Be able to use AMOS intelligently.


SEM: Another Note

Assumption 1: you are familiar with the basic logic of statistical reasoning as described in Elementary Concepts.

Assumption 2: you are familiar with the concepts of variance, covariance, correlation and regression analysis; if not, you are advised to read Basic Statistics.

It is highly desirable that you have some background in factor analysis before attempting to use structural modeling.


Introduction to SEM

• How Useful is a Statistical Model?
• The Basic Idea Behind SEM
• Causality (Cause-and-Effect Relationship)
• SEM Nomenclature/Terminologies
• SEM & Related Statistical Models


How Useful is a Statistical Model?

• "All models are wrong, but some are useful"
G.E.P. Box

• "SEM models can never be accepted (as absolute truth); they can only fail to be rejected." This leads researchers to provisionally accept a given model.

While models that fit the data well can only be provisionally accepted, models that do not fit the data well can be absolutely rejected.


The Basic Idea Behind SEM

• In a Distribution Theory course you are taught that, if you multiply every number in a list by some constant K, you multiply the mean of the numbers by K. Similarly, you multiply the standard deviation by the absolute value of K.

• Suppose you have the list of numbers 1, 2, 3, having a mean of 2 and a standard deviation of 1. Suppose also you take these 3 numbers and multiply them by 4. Then the mean would become 8, the standard deviation would become 4, and the variance would become 16.
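The arithmetic on this slide can be checked in a few lines. This is only an illustrative sketch using Python's standard `statistics` module; the helper name `scale` is made up for the example, and the standard deviation is the sample version (n - 1 denominator), which is what gives exactly 1 for the list 1, 2, 3.

```python
import statistics


def scale(values, k):
    """Multiply every number in the list by the constant k."""
    return [k * v for v in values]


x = [1, 2, 3]
y = scale(x, 4)  # the list multiplied by K = 4

# Sample statistics (n - 1 denominator) before and after scaling.
mean_x, sd_x, var_x = statistics.mean(x), statistics.stdev(x), statistics.variance(x)
mean_y, sd_y, var_y = statistics.mean(y), statistics.stdev(y), statistics.variance(y)

print("X: mean", mean_x, "sd", sd_x, "variance", var_x)
print("Y: mean", mean_y, "sd", sd_y, "variance", var_y)
```

The mean and the standard deviation are both multiplied by 4, while the variance is multiplied by 4 squared, which is the fact the next slide relies on.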


The Basic Idea Behind SEM

• The point is, if you have a set of numbers X related to another set of numbers Y by the equation Y = 4X, then the variance of Y must be 16 times that of X, so you can test the hypothesis that Y and X are related by the equation Y = 4X indirectly by comparing the variances of the Y and X variables.

• This idea generalizes, in various ways, to several variables inter-related by a group of linear equations. The rules become more complex, the calculations more difficult, but the basic message remains the same: you can test whether variables are interrelated through a set of linear relationships by examining the variances and covariances of the variables.
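As a rough illustration of that indirect test, the sketch below generates artificial data (the seed, sample size and variable names are arbitrary choices for the example, not part of the course material) and compares the variance and covariance ratios with what Y = 4X implies: Var(Y) = 16 Var(X) and Cov(X, Y) = 4 Var(X).

```python
import random
import statistics


def covariance(a, b):
    """Sample covariance with an n - 1 denominator."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


random.seed(0)  # arbitrary seed, only to make the example repeatable
x = [random.gauss(0.0, 1.0) for _ in range(500)]
y_four = [4.0 * v for v in x]  # data that really follow Y = 4X
y_two = [2.0 * v for v in x]   # data that follow Y = 2X instead

var_x = statistics.variance(x)

# Under Y = 4X the variance ratio must be 16 and Cov(X, Y) / Var(X) must be 4.
var_ratio_four = statistics.variance(y_four) / var_x
cov_ratio_four = covariance(x, y_four) / var_x

# For the Y = 2X data the same ratios come out as 4 and 2, contradicting Y = 4X.
var_ratio_two = statistics.variance(y_two) / var_x
cov_ratio_two = covariance(x, y_two) / var_x

print(var_ratio_four, cov_ratio_four)
print(var_ratio_two, cov_ratio_two)
```

With real data the relationship would include error, so the comparison becomes a statistical test rather than an exact check.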


The Basic Idea Behind SEM

Statisticians have developed procedures for testing whether a set of variances and covariances in a covariance matrix fits a specified structure. The way SEM works is as follows:

1. You state the way that you (the theory) believe the variables are inter-related, often with the use of a path diagram.

2. You (AMOS) work out, via some complex internal rules, what the implications of this are for the variances and covariances of the variables.

3. You test whether the variances and covariances fit this model of them.

4. Results of the statistical testing, and also parameter estimates and standard errors for the numerical coefficients in the linear equations, are reported.

5. On the basis of this information, you decide whether the model seems like a good fit to your data.
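The five steps can be sketched for the smallest possible case. This is not how AMOS is implemented; it is a hand-rolled illustration under stated assumptions: the hypothesized model is y = b·x + e with b fixed at 4, the data are simulated, the closed-form estimates used here apply only to this tiny model, and the function name `fit_fixed_slope` is invented for the example. The fit is measured with the usual maximum-likelihood discrepancy F = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p, and (N − 1)·F is referred to a chi-square distribution with 1 degree of freedom (3 observed moments minus 2 free parameters).

```python
import math
import random


def sample_cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def fit_fixed_slope(x, y, b):
    """Test the model y = b*x + e with b fixed; return (F_ML, chi2, p_value)."""
    n = len(x)
    s_xx, s_xy, s_yy = sample_cov(x, x), sample_cov(x, y), sample_cov(y, y)

    # Step 2: model-implied covariance matrix
    # [[phi, b*phi], [b*phi, b^2*phi + psi]] with phi = Var(x), psi = Var(e).
    phi = s_xx                                    # estimate of Var(x)
    psi = s_yy - 2 * b * s_xy + b * b * s_xx      # variance of (y - b*x)
    sig11, sig12, sig22 = phi, b * phi, b * b * phi + psi

    # Step 3: ML discrepancy between sample matrix S and implied matrix Sigma.
    det_sigma = sig11 * sig22 - sig12 ** 2
    det_s = s_xx * s_yy - s_xy ** 2
    trace = (s_xx * sig22 - 2 * s_xy * sig12 + s_yy * sig11) / det_sigma
    f_ml = math.log(det_sigma) + trace - math.log(det_s) - 2

    # Step 4: chi-square statistic with 1 df and its p-value.
    chi2 = (n - 1) * f_ml
    p_value = math.erfc(math.sqrt(max(chi2, 0.0) / 2))
    return f_ml, chi2, p_value


random.seed(1)  # arbitrary seed for repeatability
x = [random.gauss(0, 1) for _ in range(200)]
y_good = [4 * v + random.gauss(0, 0.5) for v in x]  # generated with slope 4
y_bad = [2 * v + random.gauss(0, 0.5) for v in x]   # generated with slope 2

# Step 1 is the hypothesis itself: the slope equals 4.
f_good, chi2_good, p_good = fit_fixed_slope(x, y_good, 4.0)
f_bad, chi2_bad, p_bad = fit_fixed_slope(x, y_bad, 4.0)

# Step 5: a large chi-square (small p-value) argues against the model.
print("slope-4 data:", round(chi2_good, 3), p_good)
print("slope-2 data:", round(chi2_bad, 3), p_bad)
```

Real SEM software minimizes the same kind of discrepancy numerically over many parameters; the closed form above is a shortcut that happens to be available for this two-variable case.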


A Simple SEM

SEM is an attempt to model causal relations between variables by including all variables that are known to have some involvement in the process of interest.

Example: test the effect of a drug on some psychological disorder (e.g. obsessive-compulsive disorder, OCD).


Causality

Causality has a theoretical basis

[Diagrams: Education and Success in Life; Price, Demand and Supply; Unemployment Rate, Windows of Opportunity for Crime and No. of Crimes]


Cause and Effect

Philosophers have had a great deal to say about the conditions necessary to infer causality.

• Cause and effect should occur close together in time,
• the cause should occur before an effect is observed, and
• the cause should never occur without the presence of the effect.


John Stuart Mill (1865) described three conditions necessary to infer cause:

• Cause has to precede effect.
• Cause and effect must be related.
• All other explanations of the cause-effect relationship must be ruled out.


To verify the third criterion, Mill proposed:

• the method of agreement, which states that an effect is present when the cause is present;
• the method of difference, which states that when the cause is absent the effect will be absent also; and
• the method of concomitant variation, which states that when the above relationships are observed, causal inference will be made stronger because most other interpretations of the cause-effect relationship will have been ruled out.


Example:

If we wanted to say that me talking about causality causes boredom, we would have to satisfy the following conditions:

(1) I talk about causality before boredom occurs.

(2) Whenever I talk about causality, boredom occurs shortly afterwards.

(3) The correlation between boredom and my talking about causality must be strong (e.g. on 4 out of 4 occasions when I talk about causality, boredom is observed).

(4) When the cause is absent the effect is absent: when I don't talk about causality, no boredom is observed.

(5) The manipulation of the cause leads to an associated change in the effect. So, if we manipulated whether someone is listening to me talking about causality or to my cat mewing, the effect elicited should change according to the manipulation.

This final manipulation serves to rule out external variables that might affect the cause-effect relationship.


Continued ...

• In situations in which the cause cannot be manipulated, we cannot make causal attributions about our variables.

• Statistically speaking, this means that when we analyze data from non-experimental situations we cannot conclude anything about cause and effect.

• Structural Equation Modeling (SEM) is an attempt to provide a flexible framework within which causal models can be built.


Statistical Modeling

A Statistical Model DOES NOT necessarily have a theoretical basis. It may be interpreted as either 'makes sense' or 'nonsense'.

[Diagrams with the variables: No. of Road Accidents, No. of Newspaper Readers, Heart Disease, Weight, Income, Smoking]


SEM & Related Statistical Models

• General Linear Model (GLM)
• Regression Model
• Time Series Model
• Log-linear Model
• Mixed Models
• Survival Models
• Many more ...

All these Statistical Models may or may not have a theoretical basis


[Path diagram: two exogenous latent variables (constructs), Verbal Ability with indicators x1 to x4 (error terms d1 to d4) and Quantitative Ability with indicators x5 to x8 (error terms d5 to d8), and one endogenous latent variable, General Intelligence, with indicators y1 to y4 (error terms e1 to e4). Several paths in the diagram are labelled 1.]


SEM Nomenclature

• Independent variables, which are assumed to be measured without error, are called exogenous or upstream variables.
• Dependent or mediating variables are called endogenous or downstream variables.
• Manifest or observed variables, or indicators, are directly measured by researchers.
• Latent or unobserved variables are not directly measured but are inferred from the relationships or correlations among measured variables in the analysis. Examples: self-concept, motivation, powerlessness, anomie, verbal ability, capitalism, social class.


SEM Nomenclature (cont.)

• SEM illustrates relationships among observed and unobserved variables using path diagrams.
• Ovals or circles represent latent variables.
• Rectangles or squares represent measured variables.
• Residuals are always unobserved, so they are represented by ovals or circles.


SEM - Definition

SEM is an extension of the general linear model (GLM) that enables a researcher to test a set of regression equations simultaneously.

SEM consists of TWO components:

• Structural Model: illustrates the relationships among the latent constructs or endogenous variables.
• Measurement Model: represents how the constructs are related to their indicators or manifest variables.


Example

[Path diagram with three latent constructs: Ability / Intelligence (ξ1, exogenous latent construct), Aspirations (η1, endogenous latent construct) and Achievement (η2, endogenous latent construct)]

In psychology, the theory postulates that ...


Full Latent Variable Model

[Path diagram with three latent variables and their indicators:
Ability: Academic Skill (x1), Interpersonal Skill (x2), Communication Skill (x3)
Aspiration: Family Status (y1), Father's Occupation (y2), Peer's Influence (y3)
Achievement: Personal Actualization (y4), Professional Status (y5), Social Status (y6)]


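Example: ONE Latent (unobserved) Exogenous Variable & TWO Latent (unobserved) Endogenous Variables

[Path diagram: the exogenous latent variable ξ1 has indicators x1, x2, x3 (loadings λx11, λx21, λx31; errors δ1, δ2, δ3). The endogenous latent variable η1 has indicators y1, y2, y3 (loadings λy11, λy21, λy31; errors ε1, ε2, ε3) and η2 has indicators y4, y5, y6 (loadings λy42, λy52, λy62; errors ε4, ε5, ε6). The structural paths are γ11 (ξ1 to η1), γ21 (ξ1 to η2) and β21 (η1 to η2), with disturbances ζ1 and ζ2. The paths among the latent variables are marked as the Structural Model; the paths between latent variables and indicators are marked as the Measurement Model.]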


Structural Model

• The structural model allows for certain relationships among the latent variables, depicted by lines or arrows (in a path diagram).
• In the path diagram, we specified that Ability and Achievement were related in a specific way. That is, intelligence had some influence on later achievement.
• Thus, one result from the structural model is an indication of the extent to which these a priori hypothesized relationships are supported by our sample data.


Structural Model (Cont.)

• The structural equation addresses the following questions:
• Are Ability and Achievement related?
• Exactly how strong is the influence of Ability on Achievement?
• Could there be other latent variables that we need to consider to get a better understanding of the influence on Achievement?


Example: ONE Latent (unobserved) Exogenous Variable & TWO Latent (unobserved) Endogenous Variables

[Path diagram: ξ1 with indicators x1 to x3 (errors δ1 to δ3); η1 with indicators y1 to y3 and η2 with indicators y4 to y6 (errors ε1 to ε6); structural paths γ11, γ21 and β21 with disturbances ζ1 and ζ2. The Structural Model and Measurement Model parts are marked.]


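Mathematical Form of Structural Model

\[
\begin{aligned}
\eta_1 &= \gamma_{11}\,\xi_1 + \zeta_1 \\
\eta_2 &= \beta_{21}\,\eta_1 + \gamma_{21}\,\xi_1 + \zeta_2
\end{aligned}
\]

or, in matrix form,

\[
\begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}
=
\begin{pmatrix} 0 & 0 \\ \beta_{21} & 0 \end{pmatrix}
\begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}
+
\begin{pmatrix} \gamma_{11} \\ \gamma_{21} \end{pmatrix} \xi_1
+
\begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix},
\qquad
\eta = B\eta + \Gamma\xi + \zeta
\]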
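One thing the structural equations make easy to compute is how the influence of the exogenous variable splits into direct and indirect parts. The sketch below solves η = Bη + Γξ + ζ for η, giving η = (I − B)⁻¹(Γξ + ζ), for the two-equation model of this example. The coefficient values are hypothetical numbers chosen only for illustration, and the function name is invented.

```python
def effects_of_xi1(gamma11, gamma21, beta21):
    """Return (effect on eta1, direct, indirect, total effect on eta2) of xi1."""
    # For B = [[0, 0], [beta21, 0]] the inverse of (I - B) is [[1, 0], [beta21, 1]].
    inv_i_minus_b = [[1.0, 0.0], [beta21, 1.0]]
    gamma = [gamma11, gamma21]

    # Reduced-form coefficients: (I - B)^-1 * Gamma.
    reduced = [sum(inv_i_minus_b[i][j] * gamma[j] for j in range(2)) for i in range(2)]

    direct = gamma21              # xi1 -> eta2
    indirect = beta21 * gamma11   # xi1 -> eta1 -> eta2
    return reduced[0], direct, indirect, reduced[1]


# Hypothetical path coefficients, not estimates from any data set.
on_eta1, direct, indirect, total = effects_of_xi1(gamma11=0.5, gamma21=0.3, beta21=0.4)
print(on_eta1, direct, indirect, total)
```

In the terms of the example, the total effect of Ability on Achievement is the direct path plus the path that runs through Aspiration.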


Measurement Model

• Specifies the relationship between the latent variables and the observed variables.
• Answers the questions:
1) To what extent are the observed variables actually measuring the hypothesized latent variables?
2) Which observed variable is the best measure of a particular latent variable?
3) To what extent are the observed variables actually measuring something other than the hypothesized latent variable?
• Uses Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA) to determine the significant observed variables related to each of the latent variables.


Exploratory FA (EFA)

In EFA the factor structure or theory about a phenomenon is NOT KNOWN.

For example, the researcher is interested in measuring "the achievement of personnel".

Suppose he has no knowledge (very little theory) regarding:
• the factors that contribute to achievement,
• the no. of indicators of each factor,
• which indicators represent which factor.

In such a case, the researcher may collect data and 'explore' for a factor or theory which can explain the correlations among the indicators.


Confirmatory FA (CFA)

In CFA the precise factor structure or theory about a phenomenon is KNOWN or specified a priori.

For example, a researcher is interested in measuring "consumer preference" for a product.

Suppose that, 'based on previous research', it is hypothesized (the theory) that the construct or factor to measure 'consumer preference' is a one-dimensional construct with 7 indicators or items as its measures.

The obvious question is: how well do the empirical data conform to the theory of consumer preferences? Or: how well do the data fit the model?

In such a case, CFA is used to do empirical 'confirmation' or 'testing' of the theory.


Using Factor Analysis

[Diagram: the latent variable Ability (ξ1) is linked by factor loadings to its indicators Academic Skill (x1), Interpersonal Skill (x2) and Communication Skill (x3)]


Using Factor Analysis

[Diagram: the latent variable Aspiration (η1) is linked by factor loadings to its indicators Family Status (y1), Father's Occupation (y2) and Peer's Influence (y3)]


Using Factor Analysis

[Diagram: the latent variable Achievement (η2) is linked by factor loadings to its indicators Personal Actualisation (y4), Professional Status (y5) and Social Status (y6)]


Measurement Model (Cont.)

• The relationships between the observed variables and the latent variables are described by factor loadings.
• Factor loadings provide information about the extent to which a given observed variable is able to measure the latent variable. They serve as validity coefficients.
• Measurement error is defined as that portion of an observed variable that is measuring something other than what the latent variable is hypothesized to measure. It serves as a measure of reliability.


Measurement Model (Cont.)

• Measurement error could be the result of:
1. An unobserved variable that is measuring some other latent variable
2. Unreliability
3. A second-order factor


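Mathematical Form of Measurement Model

\[
\begin{aligned}
x_1 &= \lambda^{x}_{11}\,\xi_1 + \delta_1 \\
x_2 &= \lambda^{x}_{21}\,\xi_1 + \delta_2 \\
x_3 &= \lambda^{x}_{31}\,\xi_1 + \delta_3
\end{aligned}
\]

or, in matrix form,

\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} \lambda^{x}_{11} \\ \lambda^{x}_{21} \\ \lambda^{x}_{31} \end{pmatrix} \xi_1
+
\begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \end{pmatrix},
\qquad
x = \Lambda_x \xi + \delta
\]

How the latent (unobservable) exogenous variable ξ1 is related to its indicators or manifest/observed variables x1, x2 & x3.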
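A measurement model of this form fixes what the covariances among the indicators should look like. Under the usual assumptions (which the slide does not spell out, so treat them as assumptions here) that the errors are uncorrelated with the latent variable and with each other, the implied covariance between two indicators is the product of their loadings times the variance of the latent variable, and each indicator's variance adds its own error variance. The loadings and variances below are hypothetical values, and the function name is invented for the sketch.

```python
def implied_covariance(loadings, factor_variance, error_variances):
    """Covariance matrix implied by x_i = loading_i * xi + delta_i."""
    p = len(loadings)
    # Common part: loading_i * loading_j * Var(xi) for every pair of indicators.
    sigma = [[loadings[i] * loadings[j] * factor_variance for j in range(p)]
             for i in range(p)]
    # Unique part: each indicator's error variance sits on the diagonal.
    for i in range(p):
        sigma[i][i] += error_variances[i]
    return sigma


# Hypothetical values for the three indicators x1, x2, x3 of one latent variable.
sigma_x = implied_covariance(loadings=[1.0, 0.8, 0.6],
                             factor_variance=2.0,
                             error_variances=[0.5, 0.4, 0.3])

for row in sigma_x:
    print([round(value, 3) for value in row])
```

Fitting the model amounts to choosing loadings and variances so that a matrix like this one comes as close as possible to the sample covariance matrix of x1, x2 and x3.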


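Measurement Model (cont.)

\[
\begin{aligned}
y_1 &= \lambda^{y}_{11}\,\eta_1 + \varepsilon_1 \\
y_2 &= \lambda^{y}_{21}\,\eta_1 + \varepsilon_2 \\
y_3 &= \lambda^{y}_{31}\,\eta_1 + \varepsilon_3 \\
y_4 &= \lambda^{y}_{42}\,\eta_2 + \varepsilon_4 \\
y_5 &= \lambda^{y}_{52}\,\eta_2 + \varepsilon_5 \\
y_6 &= \lambda^{y}_{62}\,\eta_2 + \varepsilon_6
\end{aligned}
\]

or, in matrix form,

\[
\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{pmatrix}
=
\begin{pmatrix}
\lambda^{y}_{11} & 0 \\
\lambda^{y}_{21} & 0 \\
\lambda^{y}_{31} & 0 \\
0 & \lambda^{y}_{42} \\
0 & \lambda^{y}_{52} \\
0 & \lambda^{y}_{62}
\end{pmatrix}
\begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix},
\qquad
y = \Lambda_y \eta + \varepsilon
\]

How the TWO latent (unobservable) constructs or endogenous variables η1, η2 are related to their indicators or manifest variables y1, ..., y6.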


Full Latent Variable Model

[Path diagram with three latent variables and their indicators:
Ability: Academic Skill (x1), Interpersonal Skill (x2), Communication Skill (x3)
Aspiration: Family Status (y1), Father's Occupation (y2), Peer's Influence (y3)
Achievement: Personal Actualization (y4), Professional Status (y5), Social Status (y6)]


Example: ONE Latent (unobserved) Exogenous Variable & TWO Latent (unobserved) Endogenous Variables

[Path diagram: ξ1 with indicators x1 to x3 (errors δ1 to δ3); η1 with indicators y1 to y3 and η2 with indicators y4 to y6 (errors ε1 to ε6); structural paths γ11, γ21 and β21 with disturbances ζ1 and ζ2. The Structural Model and Measurement Model parts are marked.]


Reliability

• Definition: the extent to which a variable or set of variables is consistent in what it is intended to measure.
• If multiple measurements are taken, reliable measures will all be consistent in their values.
• It is the degree to which the observed variable measures the "true" value and is "error free".
• It is different from validity.


True Score and Measurement Error

• True score: a component which indicates where the subject actually stands on the variable (statement) of interest.
• Measurement error: a component which indicates the inaccuracies when measuring "true scores" due to fallibility of the survey instrument, response scales, data entry or respondent error.


Reliability

• The degree to which scores are free from random measurement error
• Reliability measures:
  Internal Consistency Reliability
  Test-retest Reliability
  Alternate Forms Reliability


Reliability

• Levels of Reliability

0.90 Excellent

0.80 Very Good

0.70 Adequate

<0.70 Poor


Example: Reliability of Observed Variables

Cronbach's alpha was computed for all the variables.

Variable      No. of items   Reliability
Variable1     10             .91
Variable2     10             .87
Variable3     10             .58
Variable4     10             .70
Variable5     12             .72
Variable6     12             .80
Variable7     12             .80
Variable8     12             .87
Variable9     10             .84
Variable10    7              .71
Variable11    4              .48
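The tabulated values can be read against the levels of reliability listed earlier (0.90 Excellent, 0.80 Very Good, 0.70 Adequate, below 0.70 Poor). The sketch below assumes each level is a lower bound ("at least 0.90", and so on), which the slide does not state explicitly; the function name is invented for the example.

```python
# Cronbach's alpha values copied from the table above.
alphas = {
    "Variable1": 0.91, "Variable2": 0.87, "Variable3": 0.58, "Variable4": 0.70,
    "Variable5": 0.72, "Variable6": 0.80, "Variable7": 0.80, "Variable8": 0.87,
    "Variable9": 0.84, "Variable10": 0.71, "Variable11": 0.48,
}


def reliability_level(alpha):
    """Map a reliability coefficient to a label (thresholds assumed to be lower bounds)."""
    if alpha >= 0.90:
        return "Excellent"
    if alpha >= 0.80:
        return "Very Good"
    if alpha >= 0.70:
        return "Adequate"
    return "Poor"


levels = {name: reliability_level(alpha) for name, alpha in alphas.items()}
poor = [name for name, level in levels.items() if level == "Poor"]

for name, level in levels.items():
    print(name, alphas[name], level)
print("Below the adequate level:", poor)
```

Read this way, Variable3 and Variable11 fall below the adequate level and would be the first candidates for revision.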


Summated Scale Reliability

• When reliability involves multiple scaled items, reliability must be measured on a summated scale.
• A summated rating scale is a short list of statements, questions or other items that the subject responds to.
• A summated scale is a sum of responses from a list of statements to create an overall score.


Reliability coefficient (1)

• There are several ways to measure reliability, which will be discussed later.
• The measurement is normally called the reliability coefficient.
• This coefficient is the percent of variance in an observed variable that is accounted for by the true scores of the underlying construct.


Reliability Coefficient (2)

• Imagine you have collected 2 scores from a survey: the true and observed scores of customer satisfaction.
• You compute the correlation between the scores.
• The square of the correlation coefficient will be your reliability coefficient, which is
  the total variance explained in the observed scores by the true score, or
  the percent of variance in observed scores that is accounted for by true scores.
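As a small numerical illustration of the squared correlation, the sketch below uses made-up true and observed scores for five respondents. In practice true scores are not directly observable, so this is only a thought experiment in code; the function names are invented for the example.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def reliability_coefficient(true_scores, observed_scores):
    """Squared correlation: share of observed-score variance explained by true scores."""
    return pearson_r(true_scores, observed_scores) ** 2


# Hypothetical satisfaction scores for five respondents.
true_scores = [1, 2, 3, 4, 5]
observed_scores = [1, 3, 2, 5, 4]

r = pearson_r(true_scores, observed_scores)
reliability = reliability_coefficient(true_scores, observed_scores)
print(round(r, 2), round(reliability, 2))
```

Here the correlation is 0.8, so 64 percent of the variance in the observed scores is accounted for by the true scores.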



Types of Reliability

• Test-retest
Assessed by administering the same instrument to the same sample of respondents at two points in time, and computing the correlation between the two sets of scores.

• Internal consistency reliability
The extent to which individual items that constitute a test correlate with one another or with the test total. In short, it measures how consistently respondents respond to the items within a scale.


Types of Reliability (2)

• For example, if the first half of an instrument consists of educational items which correlate highly among themselves and the second half of political items which correlate highly among themselves, the instrument would have high internal consistency anyway, even though these are two distinct dimensions.

• Note that measures of internal consistency are often called measures of "internal consistency reliability" or even "reliability," but this merges the distinct concepts of internal consistency and reliability, which do not necessarily go together.

• How do we solve this problem? The most commonly used internal consistency reliability measure is Cronbach's Alpha.
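The slides do not give the formula, so as a reminder, Cronbach's alpha for k items is usually written as α = k/(k − 1) · (1 − Σ item variances / variance of the total score). A minimal sketch with invented responses (three items, five respondents; the data and function name are hypothetical) follows.

```python
import statistics


def cronbach_alpha(items):
    """items: one list of scores per item, each with one score per respondent."""
    k = len(items)
    # Variance of each item across respondents.
    item_variances = [statistics.variance(scores) for scores in items]
    # Summated scale: each respondent's total over all items.
    totals = [sum(responses) for responses in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five people to three items of one scale.
items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 4, 1],
]

alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Items that move together across respondents make the variance of the total score large relative to the sum of the item variances, which pushes alpha towards 1.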


Validity

Definition: the extent to which an item or set of items correctly represents the construct of study, i.e. the degree to which it is free from any systematic or non-random error.

• Validity deals with how well the construct is defined by the item/s (WHAT should be measured),
• while Reliability deals with how consistently the item/s is/are measuring the construct (HOW it is measured).


Validity

Whether the scores measure what they are supposed to measure

Types of validity

Construct Validity (SEM Confirmatory Factor Analysis helps to establish construct validity)

Criterion-Related Validity (correlation with an external standard)

Convergent Validity / Discriminant Validity (can be determined through SEM Confirmatory Factor Analysis)


Examples

Example 1: How happy are you?

This example is about validity: whether the measure accurately represents what it is supposed to measure.

Example 2: How happy are you when you are smoking? Ask this question repeatedly of the same subject, or of multiple subjects, and see how consistent their answers are.

This example is about reliability (sometimes I'd like to call it consistency).


I Am an Indicator
Lyrics by Alan Reifman
(May be sung to the tune of "The Entertainer," Billy Joel)

I am an indicator, a latent construct I represent,
I'm measurable, sometimes pleasurable,
A manifestation of what is meant,

I am an indicator, I usually come in a multiple set,
With other signs of the same construct, you may instruct,
I'm correlated with my co-indicators, you can bet,

I am an indicator, from my presence the construct is inferred,
I'm tap-able, the construct is not palpable,
The distinction should not be blurred


At Least Three
Lyrics by Alan Reifman
(May be sung to the tune of "Think of Me," Lloyd Webber/Hart/Stilgoe, from Phantom of the Opera)

At least three, indicators are urged,
For each latent construct shown,
At least three, indicators should help,
Avoid output where you groan,

With less than three, your construct sure will be, locally unidentified,
Though the model might still run, you could have a rough ride


Total, Direct and Indirect Effects

• There is a direct effect between two latent variables when a single directed line or arrow connects them

• There is an indirect effect between two latent variables when the second latent variable is connected to the first latent variable through one or more other latent variables

• The total effect between two latent variables is the sum of any direct effect and all indirect effects that connect them.
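The definition of the total effect can be made concrete with a little arithmetic. The sketch below assumes a hypothetical model in which construct A affects C directly and also through B, with made-up path coefficients; each indirect effect is taken, as is usual in path analysis, to be the product of the coefficients along its path.

```python
# Hypothetical standardized path coefficients (illustrative values only)
paths = {
    ("A", "B"): 0.50,  # A -> B
    ("B", "C"): 0.40,  # B -> C
    ("A", "C"): 0.30,  # A -> C (direct)
}

direct = paths[("A", "C")]
# Indirect effect through B: product of the coefficients along A -> B -> C
indirect = paths[("A", "B")] * paths[("B", "C")]
# Total effect: the direct effect plus every indirect effect
total = direct + indirect

print(f"direct={direct:.2f} indirect={indirect:.2f} total={total:.2f}")
```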


Example: Direct and Indirect ...

[Figure: path diagram relating three latent constructs: Ability/Intelligence (exogenous latent construct), Aspirations (endogenous latent construct) and Achievement (endogenous latent construct).]


Semantics

Types of measurement scale

Metric and Non-metric

Correlation coefficient

Correlation and Covariance Matrix

Standardized and Un-standardized Estimates


Types of Measurement Scale

• There are 4 types of measurement scale in a scale instrument: Nominal, Ordinal, Interval and Ratio

• Some other common scales, such as Likert scales, Semantic Differential scales and Dichotomous scales, can be categorized into the 4 above

• This is important because the assumptions of SEM rely on what we know on this page


Metric and Non-metric Scales

• Metric scales are quantitative data where the values of the scale lie on a continuum: Interval or Ratio scale data

• Non-metric scales are qualitative data: attributes, characteristics or categorical properties that identify or describe a subject or object; possibly Nominal or Ordinal scale data

• But metric and non-metric scales can sometimes be misused or abused... how?


VARIABLE SCALES

• SEM in general assumes observed variables are measured on a linear continuous scale

• Dichotomous and ordinal variables cause problems because correlations/covariances tend to be truncated. These scores are not normally distributed, and responses to individual items may not be very reliable.


Correlation

• Perhaps the most basic semantic. Definition: the linear relationship between two variables

• The strength of the relationship is determined by the correlation coefficient, r (explained later)

• There are 2 common types of correlation coefficient:
  Pearson Product Moment Correlation (Interval)
  Spearman Rank Correlation (Ordinal)

• The former is the one we will use in this course
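To see how the two coefficients differ, the following sketch computes both on made-up data in which y increases with x, but not along a straight line. The helper names (`pearson`, `ranks`, `spearman`) and the data are illustrative assumptions; Spearman's coefficient is computed here as the Pearson correlation of the ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y grows monotonically but not linearly with x
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Pearson's r falls short of 1 because the relationship is not perfectly linear, whereas the rank-based coefficient only asks whether the ordering is preserved.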


Correlation Matrix (1)

• The correlation matrix of n random variables X1, ..., Xn is the n × n matrix whose (i, j) entry is corr(Xi, Xj)

• If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables Xi/SD(Xi) for i = 1, ..., n

• Consequently it is necessarily a non-negative definite matrix (an important assumption)

• The correlation matrix is symmetric because the correlation between Xi and Xj is the same as the correlation between Xj and Xi
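The equivalence between the correlation matrix and the covariance matrix of the standardized variables can be checked numerically. The sketch below uses three made-up variables and small helper functions (`cov`, `corr`, `matrix`) that are illustrative assumptions rather than any package's API.

```python
from math import sqrt


def mean(v):
    return sum(v) / len(v)


def cov(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def corr(x, y):
    """Product-moment correlation built from the covariances."""
    return cov(x, y) / sqrt(cov(x, x) * cov(y, y))


def matrix(columns, f):
    """Square matrix whose (i, j) entry is f(columns[i], columns[j])."""
    return [[f(ci, cj) for cj in columns] for ci in columns]


# Hypothetical scores on three variables
data = [
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
    [9.0, 7.0, 8.0, 4.0, 5.0],
]

# Standardize: divide each variable by its standard deviation
standardized = [[v / sqrt(cov(col, col)) for v in col] for col in data]

corr_matrix = matrix(data, corr)
cov_of_standardized = matrix(standardized, cov)

for row in corr_matrix:
    print([round(v, 3) for v in row])
```

The two matrices agree entry by entry, the diagonal is 1, and the matrix is symmetric.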


Correlation Matrix (2)

(Each cell shows the correlation coefficient with its p-value on the line beneath.)

     A1        A2        A3        A4        A5        A6        A7        B1        B2        B3
A1   1.00000   0.65579   0.46296   0.58812   0.62082   0.62629   0.64288   0.34385   0.57904   0.56353
               <.0001    <.0001    <.0001    <.0001    <.0001    <.0001    0.0004    <.0001    <.0001
A2   0.65579   1.00000   0.45951   0.66297   0.72727   0.77384   0.76693   0.40987   0.67796   0.59493
     <.0001              <.0001    <.0001    <.0001    <.0001    <.0001    <.0001    <.0001    <.0001
A3   0.46296   0.45951   1.00000   0.51913   0.46652   0.45752   0.44520   0.33407   0.35833   0.33623
     <.0001    <.0001              <.0001    <.0001    <.0001    <.0001    0.0006    0.0002    0.0006
A4   0.55812   0.66297   0.51913   1.00000   0.69905   0.64969   0.59358   0.34148   0.58859   0.44284
     <.0001    <.0001    <.0001              <.0001    <.0001    <.0001    0.0004    <.0001    <.0001
A5   0.62082   0.72727   0.46652   0.69905   1.00000   0.67281   0.66939   0.31277   0.63133   0.54744
     <.0001    <.0001    <.0001    <.0001              <.0001    <.0001    <.0014    <.0001    <.0001
A6   0.62629   0.77384   0.45752   0.64969   0.67281   1.00000   0.86014   0.40483   0.66758   0.56944
     <.0001    <.0001    <.0001    <.0001    <.0001              <.0001    <.0001    <.0001    <.0001
A7   0.64288   0.76693   0.44520   0.59358   0.66939   0.86014   1.00000   0.39913   0.68141   0.62075
     <.0001    <.0001    <.0001    <.0001    <.0001    <.0001              <.0001    <.0001    <.0001
B1   0.34385   0.40987   0.33407   0.34148   0.31277   0.40483   0.39913   1.00000   0.58187   0.62583
     <.0004    <.0001    <.0006    <.0004    <.0014    <.0001    <.0001              <.0001    <.0001
B2   0.57904   0.67796   0.35833   0.58859   0.63133   0.66758   0.68141   0.58187   1.00000   0.85335
     <.0001    <.0001    <.0002    <.0001    <.0001    <.0001    <.0001    <.0001              <.0001
B3   0.56353   0.59493   0.33623   0.44284   0.54744   0.56944   0.62075   0.62583   0.85335   1.00000
     <.0001    <.0001    <.0006    <.0001    <.0001    <.0001    <.0001    <.0001    <.0001


Correlation Matrix (3)

• So we say that:

If the input matrix used is the covariance matrix, the estimated coefficients for the parameters are unstandardized estimates

If the input matrix used is the correlation matrix, the estimated coefficients for the parameters are standardized estimates

• So what?


Covariance

The covariance between two variables equals the correlation times the product of the variables' standard deviations. The covariance of a variable with itself is the variable's variance.
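Both statements are easy to verify on a small example. The sketch below uses made-up scores; the correlation is computed independently as the average product of z-scores (with the n − 1 denominator) so that the check is not circular.

```python
from statistics import mean, stdev, variance

# Hypothetical scores on two variables
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
n = len(x)
mx, my = mean(x), mean(y)
sd_x, sd_y = stdev(x), stdev(y)

# Correlation as the average product of z-scores (n - 1 denominator)
r = sum(((a - mx) / sd_x) * ((b - my) / sd_y) for a, b in zip(x, y)) / (n - 1)

# Covariances computed directly from the deviations
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
cov_xx = sum((a - mx) ** 2 for a in x) / (n - 1)

print(r, cov_xy, r * sd_x * sd_y)   # covariance = r * SD(x) * SD(y)
print(cov_xx, variance(x))          # covariance of x with itself = variance of x
```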


Correlation Matrix (4)

• Therefore, when we want to test a theory, we use the variance-covariance matrix (to validate the causal relationships among constructs)

• When we just want to explain the pattern of the relationships, we use the correlation matrix (theory testing is not required)


Factors Affecting Correlation/Covariance Coefficient

• Type of scale and range of values
  Pearson correlation is the basis for analysis in regression, path analysis, factor analysis and SEM. Hence data must be in metric form.
  There must be enough variation in scores to allow the correlation relationship to manifest.

• Linearity
  The Pearson correlation coefficient measures the degree of linear relationship between two variables, hence the need to test linearity.

• Sample size
  SEM requires a big sample size. Rule of thumb: 10-20 times the number of variables. Ding, Velicer and Harlow (1995): 100-150; Boomsma (1982, 1983): 400; Hu, Bentler and Kano (1992): in some cases 5000 is still insufficient; Schumacker and Lomax (1999): many articles use 250-500; Bentler and Chou (1987): for normal data, 5 subjects per variable is sufficient.


Covariance

Lyrics by Alan Reifman
(May be sung to the tune of "Aquarius," Rado/Ragni/MacDermot, from Hair, also popularized by the Fifth Dimension)

You draw paths to show relationships,
You hope align with the known r's,
Your model will guide the tracings,
From constructs near to constructs far,

You will compare this with the data's covariance,
The data's covariance...
Covariance!
Covariance!

Similar to correlation,
With the variables unstandardized,
Does each known covariance match up with,
The one the model tracings will derive?

Covariance!
Covariance!


SEM Assumptions

Sample Size

• A good rule of thumb is > 15 cases per predictor/indicator (James Stevens' Applied Multivariate Statistics for the Social Sciences)

• Model with TWO factors: recommended sample size > 100

• Model with FOUR factors: recommended sample size > 200


SEM Assumptions (cont.)

Sample Size

• Consequences of using smaller samples: convergence failures (the software cannot reach a satisfactory solution), improper solutions (including negative error variance estimates for measured variables), and lowered accuracy of parameter estimates and, in particular, standard errors

• SEM program standard errors are computed under the assumption of large sample sizes.


SEM Assumptions (cont.)

Normality

• Many SEM estimation procedures assume multivariate normal distributions

• A lack of univariate normality is indicated when the skew index is > 3.0 or the kurtosis index is > 10

• Departures from multivariate normality can be detected by indices of multivariate skew or kurtosis

• Non-normal distributions can sometimes be corrected by transforming variables
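A minimal sketch of the univariate screening step is given below. It computes moment-based skew and kurtosis indices and applies the 3.0 and 10 cut-offs. Two things are assumptions here: the kurtosis index is taken to be excess kurtosis (0 for a normal distribution), and a variable is flagged when either cut-off is exceeded in absolute value. The function names and data are hypothetical.

```python
def skew_and_kurtosis(values):
    """Return (skew index, excess kurtosis index) from central moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # assumption: 0 for a normal distribution
    return skew, excess_kurtosis


def flag_non_normal(values, skew_cut=3.0, kurt_cut=10.0):
    """Flag a variable when either absolute index exceeds its cut-off."""
    skew, kurt = skew_and_kurtosis(values)
    return abs(skew) > skew_cut or abs(kurt) > kurt_cut


# Hypothetical variables: one symmetric, one dominated by a single extreme score
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
one_extreme = [1] * 30 + [100]

print(skew_and_kurtosis(symmetric), flag_non_normal(symmetric))
print(skew_and_kurtosis(one_extreme), flag_non_normal(one_extreme))
```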


SEM Assumptions (cont.)

Multicollinearity

• Occurs when intercorrelations among some variables are so high that certain mathematical operations are impossible or results are unstable because denominators are close to 0

• Indicated by bivariate correlations > 0.85 or multiple correlations > 0.90

• May cause a non-positive definite/singular covariance matrix

• May be due to inclusion of both individual and composite variables

• Detection: Tolerance = 1 - R² < 0.10; Variance Inflation Factor (VIF) = 1/(1 - R²) > 10

• Can be corrected by eliminating or combining redundant variables
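The two detection statistics are simple transformations of R², the squared multiple correlation obtained when one variable is regressed on all the others. The sketch below applies the cut-offs from the slide to two made-up R² values; the function names are illustrative assumptions.

```python
def tolerance(r_squared):
    """Tolerance = 1 - R^2 (R^2 from regressing one variable on the others)."""
    return 1.0 - r_squared


def vif(r_squared):
    """Variance inflation factor = 1 / (1 - R^2)."""
    return 1.0 / tolerance(r_squared)


def is_collinear(r_squared, tol_cut=0.10, vif_cut=10.0):
    """Flag a variable when tolerance < 0.10 or VIF > 10."""
    return tolerance(r_squared) < tol_cut or vif(r_squared) > vif_cut


# Hypothetical R^2 values for two variables
for r2 in (0.50, 0.95):
    print(r2, round(tolerance(r2), 2), round(vif(r2), 1), is_collinear(r2))
```

Because VIF is the reciprocal of tolerance, the two cut-offs describe the same boundary (R² = 0.90).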


SEM Assumptions (cont.)

Outliers

• Univariate outliers lie more than three SDs away from the mean

• Detection by inspecting frequency distributions and univariate measures of skewness and kurtosis

• Multivariate outliers may have extreme scores on two or more variables, or their configurations of scores may be unusual

• Detection by inspecting indices of multivariate skewness and kurtosis. Mahalanobis distance squared is distributed as chi-square with df equal to the number of variables.

• Can be remedied by correcting errors, by dropping these cases or by transforming the variables
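As a sketch of the Mahalanobis approach, the example below handles the two-variable case, where the 2 × 2 covariance matrix can be inverted by hand and the chi-square upper-tail probability for df = 2 has the closed form exp(−x/2). The data, function names and the idea of simply ranking cases by their p-value are illustrative assumptions.

```python
from math import exp


def mahalanobis_sq_2d(point, xs, ys):
    """Squared Mahalanobis distance of `point` from the centroid of (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs) / (n - 1)
    syy = sum((b - my) ** 2 for b in ys) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d' S^-1 d, with the 2x2 inverse written out explicitly
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def chi2_p_value_df2(d_squared):
    """Upper-tail chi-square probability for df = 2 (closed form: exp(-x/2))."""
    return exp(-d_squared / 2)


# Hypothetical scores; the last case breaks the positive x-y pattern
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 2]
ys = [2, 3, 3, 5, 6, 6, 8, 9, 9, 9]

d2 = [mahalanobis_sq_2d((x, y), xs, ys) for x, y in zip(xs, ys)]
p_values = [chi2_p_value_df2(d) for d in d2]
most_extreme = max(range(len(d2)), key=d2.__getitem__)
print(most_extreme, round(d2[most_extreme], 2), round(p_values[most_extreme], 4))
```

The last case is not extreme on either variable alone, yet it has the largest distance because its combination of scores is unusual.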


VIOLATIONS OF ASSUMPTIONS (1)

• The best known distribution with no excess kurtosis is the multivariate normal.

• Leptokurtic (more peaked) distributions result in too many rejections of H0 based on the chi-square statistic.

• Platykurtic distributions will lead to too low estimates of chi-square.


VIOLATIONS OF ASSUMPTIONS (2)

• High degrees of skewness lead to excessively large chi-square estimates.

• In small samples (N < 100), the chi-square statistic tends to be too large.


Model Identification (Identified Equations)

• Identification refers to the idea that there is at least one unique solution for each parameter estimate in an SEM model.

• Models in which there is only one possible solution for each parameter estimate are said to be just-identified.

• Models for which there are an infinite number of possible parameter estimate values are said to be underidentified.

• Finally, models that have more than one possible solution (but one best or optimal solution) for each parameter estimate are considered overidentified.

Model Identification (Identified Equations)

• Underidentification can be either empirical or structural

• Empirical underidentification occurs when a parameter estimate that establishes model identification has a very small (close to zero) estimate.

• A path coefficient whose value is estimated as being close to zero may be treated as zero by the SEM program's matrix inversion algorithm. If that path coefficient is necessary to identify the model, the model thus becomes underidentified.

• Remedy for empirical underidentification: collect more data or respecify the model

• Remedy for structural underidentification: respecify the model


Examples of Identified Model

Case 1: Let us say we have an equation x + 2y = 7

Question: Is this equation / model identified?

Answer: No, it is 'underidentified' because there are an infinite number of solutions for x and y (e.g., x = 5 and y = 1, or x = 3 and y = 2). These values are therefore underidentified because there are fewer "knowns" than "unknowns."

Case 2: Let us say we have a set of equations

x + 2y = 7
3x - y = 7

Question: Is this set of equations / model identified?

Answer: Yes, it is a 'just-identified' model as there are as many knowns as unknowns. There is one best pair of values (x = 3, y = 2).
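The two cases can be checked with a few lines of code. The sketch below lists several of the infinitely many solutions of Case 1 and solves Case 2 by Cramer's rule; the helper `solve_2x2` is a hypothetical name introduced only for this illustration.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule.

    Returns (x, y), or None when the determinant is zero and no unique
    solution exists.
    """
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


# Case 1: the single equation x + 2y = 7 is satisfied by any x with y = (7 - x) / 2
case1_solutions = [(x, (7 - x) / 2) for x in (5, 3, 1)]

# Case 2: x + 2y = 7 and 3x - y = 7 have exactly one solution
case2_solution = solve_2x2(1, 2, 7, 3, -1, 7)

print(case1_solutions)
print(case2_solution)
```

If the second equation were just a multiple of the first (for example 2x + 4y = 14), the determinant would be zero and `solve_2x2` would return None: the second equation adds no new "known", so the system is still underidentified.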