BRM Multivariate Notes


    Multivariate Analysis

    Business Research Methods


    Multiple Regression

    Q-1 What is Multiple Regression?

    Ans :

Multiple regression is used to account for (predict) the variance in an interval dependent, based on linear combinations of interval, dichotomous, or dummy independent variables.

Multiple regression can establish that a set of independent variables explains a proportion of the variance in a dependent variable at a significant level (through a significance test of R²), and can establish the relative predictive importance of the independent variables (by comparing beta weights). Power terms can be added as independent variables to explore curvilinear effects. Cross-product terms can be added as independent variables to explore interaction effects. One can test the significance of the difference between two R² values to determine whether adding an independent variable to the model helps significantly. Using hierarchical regression, one can see how much variance in the dependent variable can be explained by one or a set of new independent variables, over and above that explained by an earlier set. Of course, the estimates (b coefficients and constant) can be used to construct a prediction equation and generate predicted scores on a variable for further analysis.

The multiple regression equation takes the form y = b1x1 + b2x2 + ... + bnxn + c. The b's are the regression coefficients, representing the amount the dependent variable y changes when the corresponding independent variable changes 1 unit. The c is the constant, where the regression line intercepts the y axis, representing the amount the dependent y will be when all the independent variables are 0. The standardized versions of the b coefficients are the beta weights, and the ratio of the beta coefficients is the ratio of the relative predictive power of the independent variables. Associated with multiple regression is R², the multiple correlation, which is the percent of variance in the dependent variable explained collectively by all of the independent variables.
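As an illustration (a minimal sketch, not part of the original notes), the b coefficients, constant, R², and beta weights can be obtained in Python with statsmodels; the data and variable names below are hypothetical.

```python
# Minimal sketch of a multiple regression fit (hypothetical data and variable names).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)        # simulated dependent variable

X = sm.add_constant(np.column_stack([x1, x2]))      # adds the constant c
model = sm.OLS(y, X).fit()

print(model.params)       # constant c and the b coefficients
print(model.rsquared)     # R-squared: variance in y explained collectively
# Beta weights: b coefficients rescaled as if y and the x's were standardized
betas = model.params[1:] * np.array([x1.std(), x2.std()]) / y.std()
print(betas)
```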

Multiple regression shares all the assumptions of correlation: linearity of relationships, the same level of relationship throughout the range of the independent variable ("homoscedasticity"), interval or near-interval data, absence of outliers, and data whose range is not truncated. In addition, it is important that the model being tested is correctly specified. The exclusion of important causal variables or the inclusion of extraneous variables can markedly change the beta weights and hence the interpretation of the importance of the independent variables.


    Q-2 What is R-square ?

    Ans:

R², also called the coefficient of multiple determination (the square of the multiple correlation R), is the percent of the variance in the dependent variable explained uniquely or jointly by the independents. R-squared can also be interpreted as the proportionate reduction in error in estimating the dependent when knowing the independents. That is, R² reflects the errors made when using the regression model to guess the value of the dependent, in ratio to the total errors made when using only the dependent's mean as the basis for estimating all cases. Mathematically, R² = 1 - (SSE/SST), where SSE = error sum of squares = SUM((Yi - EstYi)²), where Yi is the actual value of Y for the ith case and EstYi is the regression prediction for the ith case; and where SST = total sum of squares = SUM((Yi - MeanY)²). The "residual sum of squares" in SPSS/SAS output is SSE and reflects regression error. Thus R-square is 1 minus regression error as a percent of total error, and will be 0 when regression error is as large as it would be if you simply guessed the mean for all cases of Y. Put another way, R-square = regression sum of squares / total sum of squares, where the regression sum of squares = total sum of squares - residual sum of squares.
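A small numeric sketch of the SSE/SST definition above (hypothetical values, not from the notes):

```python
# R-squared from its definition: R2 = 1 - SSE/SST (hypothetical data).
import numpy as np

y = np.array([3.0, 5.0, 4.0, 7.0, 6.0])        # actual values Yi
y_hat = np.array([3.2, 4.6, 4.4, 6.8, 6.0])    # regression predictions EstYi

sse = np.sum((y - y_hat) ** 2)                 # residual (error) sum of squares
sst = np.sum((y - y.mean()) ** 2)              # total sum of squares
ssr = sst - sse                                # regression sum of squares

r_squared = 1 - sse / sst
print(r_squared, ssr / sst)                    # the two expressions agree
```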

    Q-3 What is Adjusted R-square and How it is calculated?

    Ans:

Adjusted R-square is an adjustment for the fact that when one has a large number of independents, it is possible that R² will become artificially high simply because some independents' chance variations "explain" small parts of the variance of the dependent. At the extreme, when there are as many independents as cases in the sample, R² will always be 1.0. The adjustment to the formula arbitrarily lowers R² as p, the number of independents, increases. Some authors conceive of adjusted R² as the percent of variance "explained in a replication, after subtracting out the contribution of chance." For the case of a few independents, R² and adjusted R² will be close. When there are a great many independents, adjusted R² may be noticeably lower. The greater the number of independents, the more the researcher is expected to report the adjusted coefficient. Always use adjusted R² when comparing models with different numbers of independents.

Adjusted R² = 1 - (1 - R²)(N - 1)/(N - k - 1), where N is the sample size and k is the number of terms in the model not counting the constant (i.e., the number of independents).
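A minimal sketch of the adjusted R² formula above in Python (the R², N, and k values are hypothetical):

```python
# Adjusted R-squared from R-squared, sample size N, and number of independents k.
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(r2=0.40, n=50, k=3))   # ~0.36: few independents, close to R2
print(adjusted_r_squared(r2=0.40, n=50, k=20))  # ~-0.01: many independents, much lower
```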

    Q-4 What is Multicollinearity and How it is measured?

Multicollinearity is the intercorrelation of independent variables. R²'s near 1 violate the assumption of no perfect collinearity, while high R²'s increase the standard error of the


beta coefficients and make assessment of the unique role of each independent difficult or impossible. While simple correlations tell something about multicollinearity, the preferred method of assessing multicollinearity is to regress each independent on all the other independent variables in the equation. Inspection of the correlation matrix reveals only bivariate multicollinearity, with the typical criterion being bivariate correlations > .90. To assess multivariate multicollinearity, one uses tolerance or VIF, which build in the regressing of each independent on all the others. Even when multicollinearity is present, note that estimates of the importance of other variables in the equation (variables which are not collinear with others) are not affected.

Types of multicollinearity. The type of multicollinearity matters a great deal; some types are necessary to the research purpose (for instance, power terms or interaction terms are necessarily correlated with their component variables).

Tolerance is 1 - R² for the regression of that independent variable on all the other independents, ignoring the dependent. There will be as many tolerance coefficients as there are independents. The higher the intercorrelation of the independents, the more the tolerance will approach zero. As a rule of thumb, if tolerance is less than .20, a problem with multicollinearity is indicated.

When tolerance is close to 0 there is high multicollinearity of that variable with the other independents and the b and beta coefficients will be unstable. The more the multicollinearity, the lower the tolerance and the larger the standard error of the regression coefficients. Tolerance is part of the denominator in the formula for calculating the confidence limits on the b (partial regression) coefficient.

Variance inflation factor (VIF). VIF is the variance inflation factor, which is simply the reciprocal of tolerance. Therefore, when VIF is high there is high multicollinearity and instability of the b and beta coefficients. VIF and tolerance are found in the SPSS and SAS output section on collinearity statistics.

    Condition indices and variance proportions.

Condition indices are used to flag excessive collinearity in the data. A condition index over 30 suggests serious collinearity problems and an index over 15 indicates possible collinearity problems. If a factor (component) has a high condition index, one looks in the variance proportions column. Criteria for a "sizable proportion" vary among researchers, but the most common criterion is two or more variables having a variance proportion of .50 or higher on a factor with a high condition index. If this is the case, these variables have high linear dependence and multicollinearity is a problem, with the effect that small data changes or arithmetic errors may translate into very large changes or errors in the regression analysis. Note that it is possible for the rule of thumb for condition indices (no index over 30) to indicate multicollinearity even when the rules of thumb for tolerance (> .20) or VIF (< 4) suggest no multicollinearity. Computationally, a "singular value" is the square root of an eigenvalue, and "condition indices" are the ratios of the largest singular value to each other singular value. In SPSS or SAS, select Analyze, Regression, Linear; click Statistics; check Collinearity diagnostics to get condition indices.
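For readers working outside SPSS, a minimal sketch of computing VIF and tolerance in Python with statsmodels (hypothetical data; the rules of thumb above can then be applied to the results):

```python
# Tolerance and VIF for each independent variable (hypothetical data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)      # nearly collinear with x1
x3 = rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["x1", "x2", "x3"], start=1):    # skip the constant column
    vif = variance_inflation_factor(X, i)
    tolerance = 1.0 / vif                                  # tolerance is the reciprocal of VIF
    print(name, round(vif, 2), round(tolerance, 3))        # VIF > 4 / tolerance < .20 flag trouble
```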


    Q-5 What is homoscedasticity ?

Homoscedasticity: The researcher should test to assure that the residuals are dispersed randomly throughout the range of the estimated dependent. Put another way, the variance of residual error should be constant for all values of the independent(s). If not, separate models may be required for the different ranges. Also, when the homoscedasticity assumption is violated, "conventionally computed confidence intervals and conventional t-tests for OLS estimators can no longer be justified." However, moderate violations of homoscedasticity have only minor impact on regression estimates.

Nonconstant error variance can be observed by requesting a simple residual plot (a plot of residuals on the Y axis against predicted values on the X axis). A homoscedastic model will display a cloud of dots, whereas lack of homoscedasticity will be characterized by a pattern such as a funnel shape, indicating greater error as the dependent increases. Nonconstant error variance can indicate the need to respecify the model to include omitted independent variables.

    Lack of homoscedasticity may mean (1) there is an interaction effect between a

    measured independent variable and an unmeasured independent variable not in the

    model; or (2) that some independent variables are skewed while others are not.

One method of dealing with heteroscedasticity is to select the weighted least squares

    regression option. This causes cases with smaller residuals to be weighted more in

    calculating the b coefficients. Square root, log, and reciprocal transformations of the

    dependent may also reduce or eliminate lack of homoscedasticity.
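A minimal sketch of the residual plot described above, in Python with statsmodels and matplotlib (hypothetical, deliberately heteroscedastic data):

```python
# Residual plot: residuals (Y axis) against predicted values (X axis), hypothetical data.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=200)
y = 3 + 2 * x + rng.normal(scale=x, size=200)   # error variance grows with x (heteroscedastic)

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("A funnel shape indicates heteroscedasticity")
plt.show()
```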

Suggested Readings and Links:

http://www2.chass.ncsu.edu/garson/pa765/regress.htm

    www.cs.uu.nl/docs/vakken/arm/SPSS/spss4.pdf

Kahane, Leo H. (2001). Regression basics. Thousand Oaks, CA: Sage Publications.

Menard, Scott (1995). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications. Series: Quantitative Applications in the Social Sciences, No. 106.

Miles, Jeremy and Mark Shevlin (2001). Applying regression and correlation. Thousand Oaks, CA: Sage Publications. Introductory text built around model-building.

Schroeder, Larry D., David L. Sjoquist, and Paula E. Stephan (1986). Understanding regression analysis: An introductory guide. Thousand Oaks, CA: Sage Publications. Series: Quantitative Applications in the Social Sciences, No. 57.


    Discriminant Analysis

Q-1 What is Discriminant Analysis?

    Ans:

Discriminant function analysis, a.k.a. discriminant analysis or DA, is used to classify cases into the values of a categorical dependent, usually a dichotomy. If discriminant function analysis is effective for a set of data, the classification table of correct and incorrect estimates will yield a high percentage correct. Discriminant function analysis is found in SPSS/SAS under Analyze, Classify, Discriminant. One gets DA or MDA from this same menu selection, depending on whether the specified grouping variable has two or more than two categories.

    Multiple discriminant analysis (MDA) is an extension of discriminant analysis and a

cousin of multiple analysis of variance (MANOVA), sharing many of the same assumptions and tests. MDA is used to classify a categorical dependent which has more

    than two categories, using as predictors a number of interval or dummy independent

    variables. MDA is sometimes also called discriminant factor analysis or canonical

    discriminant analysis.

    There are several purposes for DA and/or MDA:

    To classify cases into groups using a discriminant prediction equation.

    To test theory by observing whether cases are classified as predicted.

    To investigate differences between or among groups.

To determine the most parsimonious way to distinguish among groups.

To assess the relative importance of the independent variables in classifying the dependent variable.

    To infer the meaning of MDA dimensions which distinguish groups, based on

    discriminant loadings.

    Discriminant analysis has two steps: (1) an F test (Wilks' lambda) is used to test if the

discriminant model as a whole is significant, and (2) if the F test shows significance, then the individual independent variables are assessed to see which differ significantly in

    mean by group and these are used to classify the dependent variable.

Discriminant analysis shares all the usual assumptions of correlation, requiring linear and homoscedastic relationships and untruncated interval or near-interval data. Like multiple regression, it also assumes proper model specification (inclusion of all important independents and exclusion of extraneous variables). DA also assumes the dependent variable is a true dichotomy, since data which are forced into dichotomous coding are truncated, attenuating correlation.


DA is an earlier alternative to logistic regression, which is now frequently used in place of DA as it usually involves fewer violations of assumptions (independent variables needn't be normally distributed, linearly related, or have equal within-group variances), is robust, handles categorical as well as continuous variables, and has coefficients which many find easier to interpret. Logistic regression is preferred when data are not normal in distribution or group sizes are very unequal. See also the separate topic on multiple discriminant function analysis (MDA) for dependents with more than two categories.
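A minimal sketch of two-group discriminant classification in Python with scikit-learn (hypothetical data; this reproduces the classification step, not the SPSS output tables):

```python
# Two-group discriminant analysis: fit, score, and classify cases (hypothetical data).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
group0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
group1 = rng.normal(loc=[2.0, 1.0], scale=1.0, size=(50, 2))
X = np.vstack([group0, group1])                 # discriminating (independent) variables
y = np.array([0] * 50 + [1] * 50)               # grouping (criterion) variable

da = LinearDiscriminantAnalysis().fit(X, y)
print(da.coef_, da.intercept_)                  # discriminant coefficients and constant
print(da.decision_function(X)[:5])              # discriminant scores for the first five cases
predicted = da.predict(X)
print((predicted == y).mean())                  # proportion correctly classified (hit ratio)
```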

    Few Definitions and Concepts

Discriminating variables: These are the independent variables, also called predictors.

The criterion variable. This is the dependent variable, also called the grouping variable in SPSS. It is the object of classification efforts.

Discriminant function: A discriminant function, also called a canonical root, is a latent variable which is created as a linear combination of discriminating (independent) variables, such that L = b1x1 + b2x2 + ... + bnxn + c, where the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. This is analogous to multiple regression, but the b's are discriminant coefficients which maximize the distance between the means of the criterion (dependent) variable. Note that the foregoing assumes the discriminant function is estimated using ordinary least squares, the traditional method, but there is also a version involving maximum likelihood estimation.

    Number of discriminant functions. There is one discriminant function for 2-group

    discriminant analysis, but for higher order DA, the number of functions (each with its

    own cut-off value) is the lesser of (g - 1), where g is the number of categories in the

grouping variable, or p, the number of discriminating (independent) variables. Each discriminant function is orthogonal to the others. A dimension is simply one of the

    discriminant functions when there are more than one, in multiple discriminant

    analysis.

The eigenvalue, also called the characteristic root of each discriminant function, reflects the ratio of importance of the dimensions which classify cases of the dependent variable. There is one eigenvalue for each discriminant function. For two-group DA, there is one discriminant function and one eigenvalue, which accounts for 100% of the explained variance. If there is more than one discriminant function, the first will be the largest and most important, the second next most important in explanatory power, and so on. The eigenvalues assess relative importance because they reflect the percents of variance explained in the dependent variable, cumulating


    to 100% for all functions. That is, the ratio of the eigenvalues indicates the relative

    discriminating power of the discriminant functions. If the ratio of two eigenvalues is

1.4, for instance, then the first discriminant function accounts for 40% more between-group variance in the dependent categories than does the second discriminant

    function. Eigenvalues are part of the default output in SPSS (Analyze, Classify,

    Discriminant).

The relative percentage of a discriminant function equals that function's eigenvalue divided by the sum of the eigenvalues of all discriminant functions in the model. Thus it is the percent of discriminating power for the model associated with a given discriminant function. Relative % is used to tell how many functions are important. One may find that only the first two or so eigenvalues are of importance.

    The canonical correlation, R, is a measure of the association between the groups

    formed by the dependent and the given discriminant function. When R is zero, there

    is no relation between the groups and the function. When the canonical correlation is

large, there is a high correlation between the discriminant functions and the groups. Note that relative % and R do not have to be correlated. R is used to tell how useful each function is in determining group differences. An R of 1.0 indicates that all of the variability in the discriminant scores can be accounted for by that dimension.

    Note that for two-group DA, the canonical correlation is equivalent to the Pearsonian

    correlation of the discriminant scores with the grouping variable.

The discriminant score, also called the DA score, is the value resulting from applying a discriminant function formula to the data for a given case. The Z score is the discriminant score for standardized data. To get discriminant scores in SPSS, select Analyze, Classify, Discriminant; click the Save button; check "Discriminant scores". One can also view the discriminant scores by clicking the Classify button and checking "Casewise results."

    Cutoff: If the discriminant score of the function is less than or equal to the cutoff, the

case is classed as 0, or if above, it is classed as 1. When group sizes are equal, the cutoff is the mean of the two centroids (for two-group DA). If the groups are unequal,

    the cutoff is the weighted mean.

Unstandardized discriminant coefficients are used in the formula for making the classifications in DA, much as b coefficients are used in regression in making predictions. The constant plus the sum of products of the unstandardized coefficients with the observations yields the discriminant scores. That is, discriminant coefficients are the regression-like b coefficients in the discriminant function, in the form L = b1x1 + b2x2 + ... + bnxn + c, where L is the latent variable formed by the discriminant function, the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. The discriminant function coefficients are partial coefficients, reflecting the unique contribution of each variable to the classification of the criterion variable. The standardized discriminant coefficients, like beta weights in


    regression, are used to assess the relative classifying importance of the independent

    variables.

    Standardized discriminant coefficients, also termed the standardized canonical

    discriminant function coefficients, are used to compare the relative importance of the

independent variables, much as beta weights are used in regression. Note that importance is assessed relative to the model being analyzed. Addition or deletion of

    variables in the model can change discriminant coefficients markedly.

    As with regression, since these are partial coefficients, only the unique explanation of

    each independent is being compared, not considering any shared explanation. Also, if

    there are more than two groups of the dependent, the standardized discriminant

coefficients do not tell the researcher between which groups the variable is most or least discriminating. For this purpose, group centroids and factor structure are

    examined.

Q-2 What is Wilks' Lambda?

Wilks' lambda is used to test the significance of the discriminant function as a whole. In SPSS, the "Wilks' Lambda" table will have a column labeled "Test of Function(s)" and a row labeled "1 through n" (where n is the number of discriminant functions). The "Sig." level for this row is the significance level of the discriminant function as a whole. A significant lambda means one can reject the null hypothesis that the two groups have the same mean discriminant function scores. Wilks' lambda is part of the default output in SPSS (Analyze, Classify, Discriminant). In SPSS, this use of Wilks' lambda is in the "Wilks' Lambda" table of the output section on "Summary of Canonical Discriminant Functions."

    ANOVA table for discriminant scores is another overall test of the DA model. It is

an F test, where a "Sig." p value < .05 means the model differentiates discriminant scores between the groups significantly better than chance (than a model with just the

    constant). It is obtained in SPSS by asking for Analyze, Compare Means, One-Way

    ANOVA, using discriminant scores from DA (which SPSS will label Dis1_1 or

    similar) as dependent.

    Wilks' lambda also can be used to test which independents contribute significantly to

    the discriminant function. The smaller the lambda for an independent variable, the

    more that variable contributes to the discriminant function. Lambda varies from 0 to

1, with 0 meaning the group means differ (thus the variable differentiates the groups) and 1 meaning all group means are the same. The F test of Wilks' lambda shows which variables' contributions are significant. Wilks' lambda is sometimes called the U statistic. In SPSS, this use of Wilks' lambda is in the "Tests of equality of group means" table in DA output.
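A minimal sketch of obtaining Wilks' lambda for the model as a whole in Python, using statsmodels MANOVA as a stand-in for the SPSS table (hypothetical data and column names):

```python
# Wilks' lambda for the overall model: do the groups differ on the discriminating variables?
# (hypothetical data and column names)
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "x1": np.concatenate([rng.normal(0, 1, 50), rng.normal(1.5, 1, 50)]),
    "x2": np.concatenate([rng.normal(0, 1, 50), rng.normal(0.5, 1, 50)]),
    "group": ["a"] * 50 + ["b"] * 50,
})

manova = MANOVA.from_formula("x1 + x2 ~ group", data=df)
print(manova.mv_test())   # the output includes Wilks' lambda with its F test and p value
```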

Q-3 What is a Confusion or Classification Matrix?


    Ans:

The classification table, also called a classification matrix, or a confusion, assignment, or prediction matrix or table, is used to assess the performance of DA. This is simply a table in which the rows are the observed categories of the dependent and the columns are the predicted categories of the dependent. When prediction is perfect, all cases will lie on the diagonal. The percentage of cases on the diagonal is the percentage of correct classifications. This percentage is called the hit ratio.

    Expected hit ratio. Note that the hit ratio must be compared not to zero but to the

    percent that would have been correctly classified by chance alone. For two-group

    discriminant analysis with a 50-50 split in the dependent variable, the expected

percent is 50%. For unequally split 2-way groups of different sizes, the expected percent is computed in the "Prior Probabilities for Groups" table in SPSS, by

    multiplying the prior probabilities times the group size, summing for all groups, and

    dividing the sum by N.
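A minimal sketch of the classification table, hit ratio, and expected hit ratio in Python with scikit-learn (hypothetical observed and predicted group memberships):

```python
# Classification (confusion) matrix and hit ratio (hypothetical observed/predicted groups).
import numpy as np
from sklearn.metrics import confusion_matrix

observed = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
predicted = np.array([0, 0, 1, 0, 1, 1, 1, 0, 1, 1])

table = confusion_matrix(observed, predicted)    # rows = observed, columns = predicted
hit_ratio = np.trace(table) / table.sum()        # percent of cases on the diagonal
print(table)
print(hit_ratio)

# Expected hit ratio by chance: sum over groups of (prior probability * group size), divided by N
priors = np.bincount(observed) / observed.size
expected = np.sum(priors * np.bincount(observed)) / observed.size
print(expected)
```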

    Adapted from the link:

    http://faculty.chass.ncsu.edu/garson/PA765/discrim2.htm

    Suggested Readings:

Huberty, Carl J. (1994). Applied discriminant analysis. NY: Wiley-Interscience. (Wiley Series in Probability and Statistics).

Klecka, William R. (1980). Discriminant analysis. Quantitative Applications in the Social Sciences Series, No. 19. Thousand Oaks, CA: Sage Publications.

Lachenbruch, P. A. (1975). Discriminant analysis. NY: Hafner.


    Cluster Analysis

    Q-1 What is Cluster Analysis ?

    Ans:

    Cluster analysis, also called segmentation analysis or taxonomy analysis, seeks to

identify homogeneous subgroups of cases in a population. That is, cluster analysis seeks to identify a set of groups which both minimize within-group variation and

    maximize between-group variation. Other techniques, such as latent class analysis

    and Q-mode factor analysis, also perform clustering and are discussed separately.

    SPSS offers three general approaches to cluster analysis. Hierarchical clustering

    allows users to select a definition of distance, then select a linking method of forming

    clusters, then determine how many clusters best suit the data. In k-means clustering

the researcher specifies the number of clusters in advance, and the procedure then calculates how to assign cases to the K clusters. K-means clustering is much less computer-intensive

    and is therefore sometimes preferred when datasets are very large (ex., > 1,000).

Finally, two-step clustering creates pre-clusters, then it clusters the pre-clusters.

    Key Concepts and Terms

Cluster formation is the selection of the procedure for determining how clusters are created, and how the calculations are done. In agglomerative hierarchical clustering every case is initially considered a cluster, then the two cases with the lowest distance (or highest similarity) are combined into a cluster. The case with the lowest distance to either of the first two is considered next. If that third case is closer to a fourth case than it is to either of the first two, the third and fourth cases become the second two-case cluster; if not, the third case is added to the first cluster. The process is repeated, adding cases to existing clusters, creating new clusters, or combining clusters to get to the desired final number of clusters. There is also divisive clustering, which works in the opposite direction, starting with all cases in one large cluster. Hierarchical cluster analysis, discussed below, can use either agglomerative or divisive clustering strategies.

    Similarity and Distance


    Distance. The first step in cluster analysis is establishment of the similarity or distance

    matrix. This matrix is a table in which both the rows and columns are the units of analysis

    and the cell entries are a measure of similarity or distance for any pair of cases.

Euclidean distance is the most common distance measure. A given pair of cases is plotted on two variables, which form the x and y axes. The Euclidean distance is the square root of the sum of the squared x difference plus the squared y difference. (Recall high school geometry: this is the formula for the length of the hypotenuse of a right triangle.) It is common to use the square of the Euclidean distance, as squaring removes the sign. When two or more variables are used to define distance, the one with the larger magnitude will dominate, so to avoid this it is common to first standardize all variables.
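A minimal sketch of Euclidean and squared Euclidean distance after standardization (hypothetical data):

```python
# Euclidean and squared Euclidean distance between two cases, after standardizing variables
# (hypothetical data: rows are cases, columns are variables).
import numpy as np

data = np.array([[2.0, 150.0],
                 [4.0, 130.0],
                 [3.0, 180.0]])

# Standardize so the variable with the larger magnitude (here the second) does not dominate.
z = (data - data.mean(axis=0)) / data.std(axis=0)

diff = z[0] - z[1]
euclidean = np.sqrt(np.sum(diff ** 2))
squared_euclidean = np.sum(diff ** 2)
print(euclidean, squared_euclidean)
```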

There are a variety of different measures of inter-observation distances and inter-cluster distances to use as criteria when merging nearest clusters into broader groups or when considering the relation of a point to a cluster. SPSS supports these interval distance measures: Euclidean distance, squared Euclidean distance, Chebychev, block, Minkowski, or customized; for count data, chi-square or phi-square. For binary data, it supports Euclidean distance, squared Euclidean distance, size difference, pattern difference, variance, shape, or Lance and Williams.

Similarity. Distance measures how far apart two observations are; cases which are alike share a low distance. Similarity measures how alike two cases are; cases which are alike share a high similarity. SPSS supports a large number of similarity measures for interval data (Pearson correlation or cosine) and for binary data (Russell and Rao, simple matching, Jaccard, Dice, Rogers and Tanimoto, Sokal and Sneath 1, Sokal and Sneath 2,

    Sokal and Sneath 3, Kulczynski 1, Kulczynski 2, Sokal and Sneath 4, Hamann, Lambda,

    Anderberg's D, Yule's Y, Yule's Q, Ochiai, Sokal and Sneath 5, phi 4-point correlation, or

    dispersion).

    Absolute values. Since for Pearson correlation, high negative as well as high positive

    values indicate similarity, the researcher normally selects absolute values. This can be

done by checking the absolute value checkbox in the Transform Measures area of the Methods subdialog (invoked by pressing the Methods button) of the main Cluster dialog.

    Summary. In SPSS, similarity/distance measures are selected in the Measure area of the

    Method subdialog obtained by pressing the Method button in the Classify dialog. There

are three measure pulldown menus, for interval, binary, and count data respectively. The proximity matrix table in the output shows the actual distances or similarities computed

    for any pair of cases. In SPSS, proximity matrices are selected under Analyze, Cluster,

    Hierarchical clustering; Statistics button; check proximity matrix.

Method. Under the Method button in the SPSS Classify dialog, the pull-down Method selection determines how cases or clusters are combined at each step. Different methods

    will result in different cluster patterns. SPSS offers these method choices:


    Nearest neighbor. In this single linkage method, the distance between two clusters is the

distance between their closest neighboring points.

Furthest neighbor. In this complete linkage method, the distance between two clusters is the distance between their two furthest member points.

    UPGMA (unweighted pair-group method using averages). The distance between two

    clusters is the average distance between all inter-cluster pairs. UPGMA is generally

preferred over the nearest or furthest neighbor methods since it is based on information about all inter-cluster pairs, not just the nearest or furthest ones, and it is the default method in SPSS. SPSS labels this "between-groups linkage."

    Average linkage within groups is the mean distance between all possible inter- or intra-

cluster pairs. The average distance between all pairs in the resulting cluster is made as small as possible. This method is therefore appropriate when the research purpose is

    homogeneity within clusters. SPSS labels this "within-groups linkage."

    Ward's method calculates the sum of squared Euclidean distances from each case in a

cluster to the mean of all variables. The cluster to be merged is the one which will increase the sum the least. This is an ANOVA-type approach and preferred by some

    researchers for this reason.

    Centroid method. The cluster to be merged is the one with the smallest sum of Euclidean

    distances between cluster means for all variables.

    Median method. Clusters are weighted equally regardless of group size when computing

    centroids of two clusters being combined. This method also uses Euclidean distance as

    the proximity measure.

Correlation of items can be used as a similarity measure. One transposes the normal data table in which columns are variables and rows are cases. By using columns as cases and

    rows as variables instead, the correlation is between cases and these correlations may

    constitute the cells of the similarity matrix.

Binary matching is another type of similarity measure, where 1 indicates a match and 0 indicates no match between any pair of cases. There are multiple matched attributes and

    the similarity score is the number of matches divided by the number of attributes being

    matched. Note that it is usual in binary matching to have several attributes because there

is a risk that when the number of attributes is small, they may be orthogonal to (uncorrelated with) one another, and clustering will be indeterminate.

    Summary measures assess how the clusters differ from one another.

    Means and variances. A table of means and variances of the clusters with respect to the

original variables shows how the clusters differ on the original variables. SPSS does not make this available in the Cluster dialog, but one can click the Save button, which will


    save the cluster number for each case (or numbers if multiple solutions are requested).

    Then in Analyze, Compare Means, Means the researcher can use the cluster number as

the grouping variable to compare differences of means on any other continuous variable in the dataset.

    Linkage tables show the relation of the cases to the clusters.

    Cluster membership table. This shows cases as rows, where columns are alternative

numbers of clusters in the solution (as specified in the "Range of Solutions" option in the Cluster Membership group in SPSS, under the Statistics button). Cell entries show the

    number of the cluster to which the case belongs. From this table, one can see which cases

    are in which groups, depending on the number of clusters in the solution.

Agglomeration Schedule. The agglomeration schedule is a choice under the Statistics button for Hierarchical Cluster in the SPSS Cluster dialog. In this table, the rows are stages of clustering, numbered from 1 to (n - 1). The (n - 1)th stage includes all the cases in one cluster. There are two "Cluster Combined" columns, giving the case or cluster numbers for combination at each stage. In agglomerative clustering using a distance measure like Euclidean distance, stage 1 combines the two cases which have the lowest proximity (distance) score. The cluster number goes by the lower of the cases or clusters combined, where cases are initially numbered 1 to n. For instance, at stage 1, cases 3 and 18 might be combined, resulting in a cluster labeled 3. Later cluster 3 and case 2 might be combined, resulting in a cluster labeled 2. The researcher looks at the "Coefficients" column of the agglomeration schedule and notes when the proximity coefficient jumps up and is not a small increment from the one before (or when the coefficient reaches some theoretically important level). Note that for distance measures, low is good, meaning the cases are alike; for similarity measures, high coefficients mean cases are alike. After the stopping stage is determined in this manner, the researcher can work backward to determine how many clusters there are and which cases belong to which clusters (but it is easier just to get this information from the cluster membership table). Note, though, that SPSS will not stop on this basis but instead will compute the range of solutions (ex., 2 to 4 clusters) requested by the researcher in the Cluster Membership group of the Statistics button in the Hierarchical Clustering dialog. When there are relatively few cases, icicle plots or dendrograms provide the same linkage information in an easier format.

    Linkage plots show similar information in graphic form.

Icicle plots are usually horizontal, showing cases as rows and number of clusters in the solution as columns. If there are few cases, vertical icicle plots may be plotted, with cases as columns. Reading from the last column right to left (horizontal icicle plots) or the last row bottom to top (vertical icicle plots), the researcher can see how agglomeration proceeded. The last column/bottom row will show all the cases in separate one-case clusters; this is the (n - 1) solution. The next-to-last column/row will show the (n - 2) solution, with two cases combined into one cluster. Subsequent columns/rows show further clustering steps. Column 1 (horizontal icicle plots) or row 1 (vertical icicle plots) will show all cases in a


    single cluster. This is a visual way of representing information on the agglomeration

    schedule, but without the proximity coefficient information.

Dendrograms, also called tree diagrams, show the relative size of the proximity coefficients at which cases were combined. The bigger the distance coefficient or the smaller the similarity coefficient, the more the clustering involved combining unlike entities, which may be undesirable. Trees are usually depicted horizontally, not vertically, with each row representing a case on the Y axis, while the X axis is a rescaled version of the proximity coefficients. Cases with low distance/high similarity are close together. Cases showing low distance are close, with a line linking them a short distance from the left of the dendrogram, indicating that they are agglomerated into a cluster at a low distance coefficient, indicating alikeness. When, on the other hand, the linking line is to the right of the dendrogram, the linkage occurs at a high distance coefficient, indicating the cases/clusters were agglomerated even though they are much less alike. If a similarity measure is used rather than a distance measure, the rescaling of the X axis still produces a diagram with linkages involving high alikeness to the left and low alikeness to the right. In SPSS, select Analyze, Classify, Hierarchical Cluster; click the Plots button; check the Dendrogram checkbox.

    What is Hierarchical Cluster Analysis ?

    Hierarchical clustering is appropriate for smaller samples (typically < 250). To

    accomplish hierarchical clustering, the researcher must specify how similarity or distance

is defined, how clusters are aggregated (or divided), and how many clusters are needed. Hierarchical clustering generates all possible clusters of sizes 1...K, but is used only for relatively small samples. In hierarchical clustering, the clusters are nested rather than mutually exclusive, as is otherwise the usual case. That is, in hierarchical clustering, larger clusters created at later stages may contain smaller clusters created at earlier stages of agglomeration.

    One may wish to use the hierarchical cluster procedure on a sample of cases (ex., 200) to

inspect results for different numbers of clusters. The optimum number of clusters depends on the research purpose. Identifying "typical" types may call for few clusters and identifying "exceptional" types may call for many clusters. After using hierarchical clustering to determine the desired number of clusters, the researcher may then wish to analyze the entire dataset with k-means clustering (a.k.a. the Quick Cluster procedure: Analyze, Cluster, K-Means Cluster Analysis), specifying that number of clusters.

Forward clustering, also called agglomerative clustering: Small clusters are formed by using a high similarity index cut-off (ex., .9). Then this cut-off is relaxed to establish broader and broader clusters in stages until all cases are in a single cluster at some low

    similarity index cut-off. The merging of clusters is visualized using a tree format.

    Backward clustering, also called divisive clustering, is the same idea, but starting with a

low cut-off and working toward a high cut-off. Forward and backward methods need not generate the same results.


    Clustering variables. In the Hierarchical Cluster dialog, in the Cluster group, the

researcher may select Variables rather than the usual Cases, in order to cluster variables.

SPSS calls hierarchical clustering the "Cluster procedure." In SPSS, select Analyze, Classify, Hierarchical Cluster; select variables; select Cases in the Cluster group; click Statistics; select Proximity Matrix; select Range of Solutions in the Cluster Membership group; specify the number of clusters (typically 3 to 6); Continue; OK.
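For readers working outside SPSS, a minimal sketch of agglomerative hierarchical clustering in Python with scipy (hypothetical data; the linkage method and number of clusters are researcher choices, as discussed above):

```python
# Agglomerative hierarchical clustering: linkage, agglomeration schedule, cluster membership
# (hypothetical data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from scipy.stats import zscore

rng = np.random.default_rng(5)
data = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
z = zscore(data)                          # standardize so no variable dominates the distances

Z = linkage(z, method="average", metric="euclidean")   # between-groups (average) linkage
print(Z[-5:])                             # last rows of the agglomeration schedule (distances jump)

membership = fcluster(Z, t=2, criterion="maxclust")    # cut the tree at a 2-cluster solution
print(membership)
# dendrogram(Z) would draw the tree diagram (dendrogram) described earlier.
```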

    What is K-means Cluster Analysis ?

K-means cluster analysis uses Euclidean distance. The researcher must specify in advance the desired number of clusters, K. Initial cluster centers are chosen in a first pass of the data, then each additional iteration groups observations based on nearest Euclidean distance to the mean of the cluster. Cluster centers change at each pass. The process continues until cluster means do not shift more

    than a given cut-off value or the iteration limit is reached.

    Cluster centers are the average value on all clustering variables of each cluster's

    members. The "Initial cluster centers," in spite of its title, gives the average value of eachvariable for each cluster for the k well-spaced cases which SPSS selects for initialization

    purposes when no initial file is supplied. The "Final cluster centers" table in SPSS output

    gives the same thing for the last iteration step. The "Iteration history" table shows thechange in cluster centers when the usual iterative approach is taken. When the change

    drops below a specified cutoff, the iterative process stops and cases are assigned to

    clusters according to which cluster center they are nearest.

    Large datasets are possible with K-means clustering, unlike hierarchical clustering,

because K-means clustering does not require prior computation of a proximity matrix of the distance/similarity of every case with every other case.

Method. The default method is "Iterate and classify," under which an iterative process is used to update cluster centers, then cases are classified based on the updated centers. However, SPSS supports a "Classify only" method, under which cases are immediately

    classified based on the initial cluster centers, which are not updated.

    Agglomerative K-means clustering. Normally in K-means clustering, a given case may be

assigned to a cluster, then reassigned to a different cluster as the algorithm unfolds. However, in agglomerative K-means clustering, the solution is constrained to force a

    given case to remain in its initial cluster.

    SPSS: Analyze, Cluster, K-Means Cluster Analysis; enter variables in the Variables: area;

optionally, enter a variable in the "Label cases by:" area; enter "Number of clusters:"; choose Method (Iterate and classify, or Classify only). Unlike hierarchical clustering,

    there is no option for "Range of solutions"; instead you must re-run K-means clustering,

    asking for a different number of clusters.
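A minimal sketch of K-means clustering in Python with scikit-learn (hypothetical data; as in SPSS, the number of clusters must be specified in advance and different solutions require separate runs):

```python
# K-means clustering: specify K in advance, iterate until cluster centers stop shifting
# (hypothetical data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
data = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, max_iter=10, random_state=0).fit(data)
print(km.cluster_centers_)      # final cluster centers (mean of each variable per cluster)
print(km.labels_[:10])          # cluster membership for the first ten cases
print(np.bincount(km.labels_))  # number of cases in each cluster
```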


    Iterate button. Optionally, you may press the Iterate button and set the number of

    iterations and the convergence criterion. The default maximum number of iterations in

SPSS is 10. For the convergence criterion, by default, iterations terminate if the largest change in any cluster center is less than 2% of the minimum distance between initial centers (or if the maximum number of iterations has been reached). To override this default, enter a positive number less than or equal to 1 in the convergence box. There is also a "Use running means" checkbox which, if checked, will cause the cluster centers to be updated after each case is classified, rather than the default, which is after the entire set of

    cases is classified.

Save button: Optionally, you may press the Save button to save the final cluster number of each case as an added column in your dataset (labeled QCL_1), and/or you may save

    the Euclidean distance between each case and its cluster center (labeled QCL_2) by

    checking "Distance from cluster center."

    Options button: Optionally, you may press the Options button to select statistics or

missing values options. There are three statistics options: "Initial cluster centers" (gives the initial variable means for each cluster); "ANOVA table" (ANOVA F-tests for each variable, but as the F tests are only descriptive, the resulting probabilities are for exploratory purposes only; nonetheless, non-significant variables might be dropped as not contributing to the differentiation of clusters); and "Cluster information for each case" (gives each case's final cluster assignment and the Euclidean distance between the case and the cluster center; also gives the Euclidean distance between final cluster centers).

    Getting different clusters. Sometimes the researcher wishes to experiment to get different

    clusters, as when the "Number of cases in each cluster" table shows highly imbalanced

    clusters and/or clusters with very few members. Different results may occur by setting

different initial cluster centers from file (see above), by changing the number of clusters requested, or even by presenting the data file in a different case order.

    What is Two-Step Cluster Analysis ?

    Two-step cluster analysis groups cases into pre-clusters which are treated as single cases.

Standard hierarchical clustering is then applied to the pre-clusters in the second step. This is the method used when one or more of the variables are categorical (not interval or dichotomous). Also, since it is a method requiring neither a proximity table like hierarchical classification nor an iterative process like K-means clustering, but rather is a one-pass-through-the-dataset method, it is recommended for very large datasets.

Cluster feature tree. The preclustering stage employs a CF tree with nodes leading to leaf nodes. Cases start at the root node and are channeled toward the nodes, and eventually the leaf nodes, which match them most closely. If there is no adequate match, the case is used to start its own leaf node. It can happen that the CF tree fills up and cannot accept new leaf entries in a node, in which case it is split using the most-distant pair in the node as seeds. If this recursive process grows the CF tree beyond maximum size, the threshold distance is increased and the tree is rebuilt, allowing new cases to be input. The process continues


    until all the data are read. Click the Advanced button in the Options button dialog to set

    threshold distances, maximum levels, and maximum branches per leaf node manually.

Proximity. When one or more of the variables are categorical, log-likelihood is the distance measure used, with cases categorized under the cluster which is associated with the largest log-likelihood. If the variables are all continuous, Euclidean distance is used, with cases categorized under the cluster which is associated with the smallest Euclidean

    distance.

    Number of clusters. By default SPSS determines the number of clusters using the change

in BIC (the Schwarz Bayesian Criterion): when the BIC change is small, it stops and selects as many clusters as have been created thus far. It is also possible to have this done based on changes in AIC (the Akaike Information Criterion), or simply to tell SPSS how many clusters are wanted. The researcher can also ask for a range of solutions, such as 3-5 clusters. The

    "Autoclustering statistics" table in SPSS output gives, for example, BIC and BIC change

    for all solutions.

    SPSS. Choose Analyze, Classify, Two-Step Cluster; select your categorical and

    continuous variables; if desired, click Plots and select the plots wanted; Click Output and

    select the statistics wanted (descriptive statistics, cluster frequencies, AIC or BIC);

Continue.

    Adapted from

    http://faculty.chass.ncsu.edu/garson/PA765/cluster.htm

    www.cs.uu.nl/docs/vakken/arm/SPSS/spss8.pdf

    Suggested Readings:

Anil K. Jain and Richard C. Dubes, Algorithms for Clustering Data, 2004.

Leonard Kaufman and Peter J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, 2005.


    Factor Analysis

    Q-1 What is Factor Analysis?

Factor analysis is a correlational technique to determine meaningful clusters of shared variance.

    Factor Analysis should be driven by a researcher who has a deep and genuine

interest in relevant theory in order to get optimal value from choosing the right type of factor analysis and interpreting the factor loadings.

Factor analysis begins with a large number of variables and then tries to reduce the interrelationships amongst the variables to a small number of clusters or factors.

    Factor analysis finds relationships or natural connections where variables are

    maximally correlated with one another and minimally correlated with other

    variables, and then groups the variables accordingly.

After this process has been done many times, a pattern of relationships or factors that captures the essence of all of the data emerges.

Summary: Factor analysis refers to a collection of statistical methods for reducing correlational data into a smaller number of dimensions or factors.

    Key Concepts and Terms

Exploratory factor analysis (EFA) seeks to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator

    may be associated with any factor. This is the most common form of factor analysis.

    There is no prior theory and one uses factor loadings to intuit the factor structure of the

    data.

    Confirmatory factor analysis (CFA) seeks to determine if the number of factors and the

    loadings of measured (indicator) variables on them conform to what is expected on the

basis of pre-established theory. Indicator variables are selected on the basis of prior theory, and factor analysis is used to see if they load as predicted on the expected number of factors. The researcher's a priori assumption is that each factor (the number and labels of which may be specified a priori) is associated with a specified subset of indicator variables. A minimum requirement of confirmatory factor analysis is that one hypothesize beforehand the number of factors in the model, but usually the researcher will also posit expectations about which variables will load on which factors (Kim and Mueller, 1978b: 55). The researcher seeks to determine, for instance, if measures created to represent a latent variable really belong together.

    Factor loadings: The factor loadings, also called component loadings in PCA, are the

correlation coefficients between the variables (rows) and factors (columns). Analogous to Pearson's r, the squared factor loading is the percent of variance in that variable explained by the factor. To get the percent of variance in all the variables accounted for by each factor, take the sum of the squared factor loadings for that factor (column) and divide by


the number of variables. (Note that the number of variables equals the sum of their variances,

    as the variance of a standardized variable is 1.) This is the same as dividing the factor's

    eigenvalue by the number of variables.

Communality, h², is the squared multiple correlation for the variable as dependent using the factors as predictors. The communality measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator. When an indicator variable has a low communality, the factor model is not working well for that indicator and possibly it should be removed from the model. However, communalities must be interpreted in relation to the interpretability of the factors. A communality of .75 seems high but is meaningless unless the factor on which the variable is loaded is interpretable, though it usually will be. A communality of .25 seems low but may be meaningful if the item is contributing to a well-defined factor. That is, what is critical is not the communality coefficient per se, but rather the extent to which the item plays a role in the interpretation of the factor, though often this role is greater when communality is high.

    Eigenvalues: Also called characteristic roots. The eigenvalue for a given factor measures the variance in all the variables which is accounted for by that factor. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be ignored as redundant with more important factors.

    Thus, eigenvalues measure the amount of variation in the total sample accounted for by each factor. Note that the eigenvalue is not the percent of variance explained but rather a measure of amount of variance in relation to total variance (since variables are standardized to have means of 0 and variances of 1, total variance is equal to the number of variables). SPSS will output a corresponding column titled '% of variance'. A factor's eigenvalue may be computed as the sum of its squared factor loadings for all the variables.
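
    As an illustration (the data here are randomly generated placeholders), the eigenvalues SPSS reports can be reproduced directly from the correlation matrix, and their sum equals the number of variables:

        import numpy as np

        rng = np.random.default_rng(0)
        data = rng.normal(size=(200, 6))                     # placeholder scores: 200 cases, 6 variables

        R = np.corrcoef(data, rowvar=False)                  # 6 x 6 correlation matrix
        eigenvalues = np.linalg.eigvalsh(R)[::-1]            # eigvalsh returns ascending order; reverse to largest first

        print(eigenvalues.sum())                             # 6.0: total variance equals the number of variables
        print(eigenvalues / R.shape[0] * 100)                # the '% of variance' column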

    Q-2 What are the criteria for determining the number of factors? (Roughly in order of frequency of use in social science; see Dunteman, 1989: 22-3.)

    Kaiser criterion: A common rule of thumb for dropping the least important factors from the analysis. The Kaiser rule is to drop all components with eigenvalues under 1.0. The Kaiser criterion is the default in SPSS and most computer programs.

    Scree plot: The Cattell scree test plots the components on the X axis and the corresponding eigenvalues on the Y axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward a less steep decline, Cattell's scree test says to drop all further components after the one starting the elbow. This rule is sometimes criticised for being amenable to researcher-controlled "fudging." That is, because picking the "elbow" can be subjective (the curve may have multiple elbows or be a smooth curve), the researcher may be tempted to set the cut-off


    at the number of factors desired by his or her research agenda. Even when "fudging" is

    not a consideration, the scree criterion tends to result in more factors than the Kaiser

    criterion.

    Variance explained criteria: Some researchers simply use the rule of keeping enough factors to account for 90% (sometimes 80%) of the variation. Where the researcher's goal emphasizes parsimony (explaining variance with as few factors as possible), the criterion could be as low as 50%.
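
    A sketch of how these retention rules could be applied to a vector of eigenvalues (the eigenvalues below are invented and sum to the number of variables, 7, as they would for a correlation matrix):

        import numpy as np

        eigenvalues = np.array([3.1, 1.4, 0.9, 0.6, 0.5, 0.3, 0.2])   # made-up, sorted descending
        n_vars = eigenvalues.size

        # Kaiser criterion: keep components with eigenvalue greater than 1
        kaiser_k = int((eigenvalues > 1.0).sum())                     # 2 factors here

        # Variance-explained criterion: keep enough factors to reach a chosen threshold
        cum_prop = np.cumsum(eigenvalues) / n_vars
        variance_k = int(np.argmax(cum_prop >= 0.80) + 1)             # 4 factors reach 80% here

        # Scree test: inspect a plot of eigenvalues against component number for an elbow;
        # the successive drops below are only a numeric aid, not a substitute for the plot
        drops = -np.diff(eigenvalues)

        print(kaiser_k, variance_k, drops)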

    Q-3 What are the different rotation methods used in factor analysis?

    Ans:

    No rotation is the default, but it is a good idea to select a rotation method, usually varimax. The original, unrotated principal components solution maximizes the sum of squared factor loadings, efficiently creating a set of factors which explain as much of the variance in the original variables as possible. The amount explained is reflected in the sum of the eigenvalues of all factors. However, unrotated solutions are hard to interpret because variables tend to load on multiple factors.

    Varimax rotation is an orthogonal rotation of the factor axes to maximize the variance of the squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which has the effect of differentiating the original variables by extracted factor. Each factor will tend to have either large or small loadings of any particular variable. A varimax solution yields results which make it as easy as possible to identify each variable with a single factor. This is the most common rotation option.
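
    In SPSS this is simply the Varimax option; as an illustrative sketch, though, the classic varimax algorithm can be written directly in Python (the unrotated loadings below are invented for the example):

        import numpy as np

        def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
            """Orthogonally rotate a loadings matrix toward a varimax solution."""
            p, k = loadings.shape
            R = np.eye(k)                    # accumulated rotation matrix
            d = 0.0
            for _ in range(max_iter):
                d_old = d
                L = loadings @ R
                u, s, vt = np.linalg.svd(
                    loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.diag(L.T @ L))))
                R = u @ vt
                d = s.sum()
                if d_old != 0 and d / d_old < 1 + tol:   # stop when the criterion stops improving
                    break
            return loadings @ R

        unrotated = np.array([[0.6, 0.6], [0.7, 0.5], [0.6, -0.5], [0.5, -0.6]])  # hypothetical
        print(varimax(unrotated))   # each variable now loads mainly on a single factor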

    Quartimax rotation is an orthogonal alternative which minimizes the number of factors needed to explain each variable. This type of rotation often generates a general factor on which most variables are loaded to a high or medium degree. Such a factor structure is usually not helpful to the research purpose.

    Q-4 How many cases are required to do factor analysis?

    There is no scientific answer to this question, and methodologists differ. Alternative arbitrary "rules of thumb," in descending order of popularity, include those below; a small helper applying them is sketched after the list. These are not mutually exclusive: Bryant and Yarnold, for instance, endorse both STV and the Rule of 200.

    Rule of 10. There should be at least 10 cases for each item in the instrument being used.

    STV ratio. The subjects-to-variables ratio should be no lower than 5 (Bryant and Yarnold, 1995).


    Rule of 100: The number of subjects should be the larger of 5 times the number of

    variables, or 100. Even more subjects are needed when communalities are low and/or few

    variables load on each factor. (Hatcher, 1994)

    Rule of 150: Hutcheson and Sofroniou (1999) recommend at least 150-300 cases, more toward the 150 end when there are a few highly correlated variables, as would be the case when collapsing highly multicollinear variables.
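
    Treating the strictest applicable rule as the floor (one reasonable reading of these guidelines, not the only one), a small helper in Python:

        def minimum_cases(n_items):
            """Rough sample-size floor implied by the rules of thumb listed above."""
            rule_of_10 = 10 * n_items            # at least 10 cases per item
            stv_ratio = 5 * n_items              # subjects-to-variables ratio of at least 5
            rule_of_100 = max(5 * n_items, 100)  # larger of 5 times the variables or 100
            rule_of_150 = 150                    # lower bound of the 150-300 recommendation
            return max(rule_of_10, stv_ratio, rule_of_100, rule_of_150)

        print(minimum_cases(20))   # a 20-item instrument -> 200 cases under the strictest rule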

    Q-5 What is "sampling adequacy" and what is it used for?

    Measured by the Kaiser-Meyer-Olkin (KMO) statistic, sampling adequacy predicts whether data are likely to factor well, based on correlation and partial correlation. In the old days of manual factor analysis this was extremely useful. KMO can still be used, however, to assess which variables to drop from the model because they are too multicollinear.

    There is a KMO statistic for each individual variable, as well as an overall KMO statistic for the set of variables as a whole. KMO varies from 0 to 1.0, and KMO overall should be .60 or higher to proceed with factor analysis. If it is not, drop the indicator variables with the lowest individual KMO statistic values until KMO overall rises above .60.

    To compute KMO overall, the numerator is the sum of squared correlations of all variables in the analysis (except the 1.0 self-correlations of variables with themselves, of course). The denominator is this same sum plus the sum of squared partial correlations of each variable i with each variable j, controlling for others in the analysis. The concept is that the partial correlations should not be very large if one is to expect distinct factors to emerge from factor analysis.
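
    A minimal Python sketch of that computation, assuming R is the correlation matrix of the variables and taking the partial correlations from the inverse of R (the usual anti-image route); the example data are random placeholders:

        import numpy as np

        def kmo_overall(R):
            """Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix R."""
            inv_R = np.linalg.inv(R)
            # Partial correlation of i and j, controlling for all other variables
            d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
            partial = -inv_R / d

            off = ~np.eye(R.shape[0], dtype=bool)   # exclude the 1.0 self-correlations
            r2 = (R[off] ** 2).sum()
            p2 = (partial[off] ** 2).sum()
            return r2 / (r2 + p2)

        rng = np.random.default_rng(1)
        scores = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))  # placeholder, correlated columns
        print(kmo_overall(np.corrcoef(scores, rowvar=False)))         # a value of .60 or higher would indicate adequacy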

    In SPSS, KMO is found under Analyze - Statistics - Data Reduction - Factor - Variables (input variables) - Descriptives - Correlation Matrix - check KMO and Bartlett's test of sphericity, and also check Anti-image - Continue - OK. The KMO output is KMO overall. The diagonal elements of the Anti-image correlation matrix are the KMO individual statistics for each variable.

    Adapted from:

    http://faculty.chass.ncsu.edu/garson/PA765/factspss.htm

    www.sussex.ac.uk/Users/andyf/factor.pdf

    www.cs.uu.nl/docs/vakken/arm/SPSS/spss7.pdf

    Suggested Readings

    Bruce Thompson, Exploratory and Confirmatory Factor Analysis: Understanding

    Concepts and Applications, 2004
