BRM Multivariate Notes


    Multivariate Analysis

    Business Research Methods


    Multiple Regression

    Q-1 What is Multiple Regression?

    Ans :

Multiple regression is used to account for (predict) the variance in an interval dependent, based on linear combinations of interval, dichotomous, or dummy independent variables.

Multiple regression can establish that a set of independent variables explains a proportion of the variance in a dependent variable at a significant level (through a significance test of R²), and can establish the relative predictive importance of the independent variables (by comparing beta weights). Power terms can be added as independent variables to explore curvilinear effects. Cross-product terms can be added as independent variables to explore interaction effects. One can test the significance of the difference between two R² values to determine whether adding an independent variable to the model helps significantly. Using hierarchical regression, one can see how much variance in the dependent variable can be explained by one or a set of new independent variables, over and above that explained by an earlier set. Of course, the estimates (b coefficients and constant) can be used to construct a prediction equation and generate predicted scores on a variable for further analysis.

The multiple regression equation takes the form y = b1x1 + b2x2 + ... + bnxn + c. The b's are the regression coefficients, representing the amount the dependent variable y changes when the corresponding independent variable changes 1 unit. The c is the constant, where the regression line intercepts the y axis, representing the amount the dependent y will be when all the independent variables are 0. The standardized versions of the b coefficients are the beta weights, and the ratio of the beta coefficients is the ratio of the relative predictive power of the independent variables. Associated with multiple regression is R², the multiple correlation, which is the percent of variance in the dependent variable explained collectively by all of the independent variables.
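As an illustration (a minimal sketch, not part of the original notes), the b coefficients, constant, R², and beta weights can be obtained in Python with statsmodels; the data and variable names below are hypothetical.

```python
# Minimal sketch of a multiple regression fit (hypothetical data and variable names).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)        # simulated dependent variable

X = sm.add_constant(np.column_stack([x1, x2]))      # adds the constant c
model = sm.OLS(y, X).fit()

print(model.params)       # constant c and the b coefficients
print(model.rsquared)     # R-squared: variance in y explained collectively
# Beta weights: b coefficients rescaled as if y and the x's were standardized
betas = model.params[1:] * np.array([x1.std(), x2.std()]) / y.std()
print(betas)
```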

Multiple regression shares all the assumptions of correlation: linearity of relationships, the same level of relationship throughout the range of the independent variable ("homoscedasticity"), interval or near-interval data, absence of outliers, and data whose range is not truncated. In addition, it is important that the model being tested is correctly specified. The exclusion of important causal variables or the inclusion of extraneous variables can markedly change the beta weights and hence the interpretation of the importance of the independent variables.


    Q-2 What is R-square ?

    Ans:

R², also called the coefficient of multiple determination (the square of the multiple correlation R), is the percent of the variance in the dependent variable explained uniquely or jointly by the independents. R-squared can also be interpreted as the proportionate reduction in error in estimating the dependent when knowing the independents. That is, R² reflects the errors made when using the regression model to guess the value of the dependent, in ratio to the total errors made when using only the dependent's mean as the basis for estimating all cases. Mathematically, R² = 1 - (SSE/SST), where SSE = error sum of squares = SUM((Yi - EstYi)²), where Yi is the actual value of Y for the ith case and EstYi is the regression prediction for the ith case; and where SST = total sum of squares = SUM((Yi - MeanY)²). The "residual sum of squares" in SPSS/SAS output is SSE and reflects regression error. Thus R-square is 1 minus regression error as a percent of total error, and will be 0 when regression error is as large as it would be if you simply guessed the mean for all cases of Y. Put another way, R-square = regression sum of squares / total sum of squares, where the regression sum of squares = total sum of squares - residual sum of squares.
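A small numeric sketch of the SSE/SST definition above (hypothetical values, not from the notes):

```python
# R-squared from its definition: R2 = 1 - SSE/SST (hypothetical data).
import numpy as np

y = np.array([3.0, 5.0, 4.0, 7.0, 6.0])        # actual values Yi
y_hat = np.array([3.2, 4.6, 4.4, 6.8, 6.0])    # regression predictions EstYi

sse = np.sum((y - y_hat) ** 2)                 # residual (error) sum of squares
sst = np.sum((y - y.mean()) ** 2)              # total sum of squares
ssr = sst - sse                                # regression sum of squares

r_squared = 1 - sse / sst
print(r_squared, ssr / sst)                    # the two expressions agree
```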

    Q-3 What is Adjusted R-square and How it is calculated?

    Ans:

Adjusted R-square is an adjustment for the fact that when one has a large number of independents, it is possible that R² will become artificially high simply because some independents' chance variations "explain" small parts of the variance of the dependent. At the extreme, when there are as many independents as cases in the sample, R² will always be 1.0. The adjustment to the formula arbitrarily lowers R² as p, the number of independents, increases. Some authors conceive of adjusted R² as the percent of variance "explained in a replication, after subtracting out the contribution of chance." For the case of a few independents, R² and adjusted R² will be close. When there are a great many independents, adjusted R² may be noticeably lower. The greater the number of independents, the more the researcher is expected to report the adjusted coefficient. Always use adjusted R² when comparing models with different numbers of independents.

Adjusted R² = 1 - (1 - R²)(N - 1)/(N - k - 1), where N is the sample size and k is the number of terms in the model not counting the constant (i.e., the number of independents).
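A minimal sketch of the adjusted R² formula above in Python (the R², N, and k values are hypothetical):

```python
# Adjusted R-squared from R-squared, sample size N, and number of independents k.
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(r2=0.40, n=50, k=3))   # ~0.36: few independents, close to R2
print(adjusted_r_squared(r2=0.40, n=50, k=20))  # ~-0.01: many independents, much lower
```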

    Q-4 What is Multicollinearity and How it is measured?

Multicollinearity is the intercorrelation of independent variables. R²'s near 1 violate the assumption of no perfect collinearity, while high R²'s increase the standard error of the


beta coefficients and make assessment of the unique role of each independent difficult or impossible. While simple correlations tell something about multicollinearity, the preferred method of assessing multicollinearity is to regress each independent on all the other independent variables in the equation. Inspection of the correlation matrix reveals only bivariate multicollinearity, with the typical criterion being bivariate correlations > .90. To assess multivariate multicollinearity, one uses tolerance or VIF, which build in the regressing of each independent on all the others. Even when multicollinearity is present, note that estimates of the importance of other variables in the equation (variables which are not collinear with others) are not affected.

Types of multicollinearity. The type of multicollinearity matters a great deal; some types are necessary to the research purpose (for instance, power terms or interaction terms are necessarily correlated with their component variables).

Tolerance is 1 - R² for the regression of that independent variable on all the other independents, ignoring the dependent. There will be as many tolerance coefficients as there are independents. The higher the intercorrelation of the independents, the more the tolerance will approach zero. As a rule of thumb, if tolerance is less than .20, a problem with multicollinearity is indicated.

When tolerance is close to 0 there is high multicollinearity of that variable with the other independents and the b and beta coefficients will be unstable. The more the multicollinearity, the lower the tolerance and the larger the standard error of the regression coefficients. Tolerance is part of the denominator in the formula for calculating the confidence limits on the b (partial regression) coefficient.

Variance inflation factor (VIF). VIF is the variance inflation factor, which is simply the reciprocal of tolerance. Therefore, when VIF is high there is high multicollinearity and instability of the b and beta coefficients. VIF and tolerance are found in the SPSS and SAS output section on collinearity statistics.

    Condition indices and variance proportions.

Condition indices are used to flag excessive collinearity in the data. A condition index over 30 suggests serious collinearity problems and an index over 15 indicates possible collinearity problems. If a factor (component) has a high condition index, one looks in the variance proportions column. Criteria for a "sizable proportion" vary among researchers, but the most common criterion is two or more variables having a variance proportion of .50 or higher on a factor with a high condition index. If this is the case, these variables have high linear dependence and multicollinearity is a problem, with the effect that small data changes or arithmetic errors may translate into very large changes or errors in the regression analysis. Note that it is possible for the rule of thumb for condition indices (no index over 30) to indicate multicollinearity even when the rules of thumb for tolerance (> .20) or VIF (< 4) suggest no multicollinearity. Computationally, a "singular value" is the square root of an eigenvalue, and "condition indices" are the ratios of the largest singular value to each other singular value. In SPSS or SAS, select Analyze, Regression, Linear; click Statistics; check Collinearity diagnostics to get condition indices.
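For readers working outside SPSS, a minimal sketch of computing VIF and tolerance in Python with statsmodels (hypothetical data; the rules of thumb above can then be applied to the results):

```python
# Tolerance and VIF for each independent variable (hypothetical data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)      # nearly collinear with x1
x3 = rng.normal(size=200)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i, name in enumerate(["x1", "x2", "x3"], start=1):    # skip the constant column
    vif = variance_inflation_factor(X, i)
    tolerance = 1.0 / vif                                  # tolerance is the reciprocal of VIF
    print(name, round(vif, 2), round(tolerance, 3))        # VIF > 4 / tolerance < .20 flag trouble
```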


    Q-5 What is homoscedasticity ?

Homoscedasticity: The researcher should test to assure that the residuals are dispersed randomly throughout the range of the estimated dependent. Put another way, the variance of residual error should be constant for all values of the independent(s). If not, separate models may be required for the different ranges. Also, when the homoscedasticity assumption is violated, "conventionally computed confidence intervals and conventional t-tests for OLS estimators can no longer be justified." However, moderate violations of homoscedasticity have only minor impact on regression estimates.

Nonconstant error variance can be observed by requesting a simple residual plot (a plot of residuals on the Y axis against predicted values on the X axis). A homoscedastic model will display a cloud of dots, whereas lack of homoscedasticity will be characterized by a pattern such as a funnel shape, indicating greater error as the dependent increases. Nonconstant error variance can indicate the need to respecify the model to include omitted independent variables.

    Lack of homoscedasticity may mean (1) there is an interaction effect between a

    measured independent variable and an unmeasured independent variable not in the

    model; or (2) that some independent variables are skewed while others are not.

One method of dealing with heteroscedasticity is to select the weighted least squares

    regression option. This causes cases with smaller residuals to be weighted more in

    calculating the b coefficients. Square root, log, and reciprocal transformations of the

    dependent may also reduce or eliminate lack of homoscedasticity.
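A minimal sketch of the residual plot described above, in Python with statsmodels and matplotlib (hypothetical, deliberately heteroscedastic data):

```python
# Residual plot: residuals (Y axis) against predicted values (X axis), hypothetical data.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=200)
y = 3 + 2 * x + rng.normal(scale=x, size=200)   # error variance grows with x (heteroscedastic)

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid, s=10)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("A funnel shape indicates heteroscedasticity")
plt.show()
```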

Suggested Readings and Links:

http://www2.chass.ncsu.edu/garson/pa765/regress.htm

    www.cs.uu.nl/docs/vakken/arm/SPSS/spss4.pdf

Kahane, Leo H. (2001). Regression basics. Thousand Oaks, CA: Sage Publications.

Menard, Scott (1995). Applied logistic regression analysis. Thousand Oaks, CA: Sage Publications. Series: Quantitative Applications in the Social Sciences, No. 106.

Miles, Jeremy and Mark Shevlin (2001). Applying regression and correlation. Thousand Oaks, CA: Sage Publications. Introductory text built around model-building.

Schroeder, Larry D., David L. Sjoquist, and Paula E. Stephan (1986). Understanding regression analysis: An introductory guide. Thousand Oaks, CA: Sage Publications. Series: Quantitative Applications in the Social Sciences, No. 57.


    Discriminant Analysis

Q-1 What is Discriminant Analysis?

    Ans:

Discriminant function analysis, a.k.a. discriminant analysis or DA, is used to classify cases into the values of a categorical dependent, usually a dichotomy. If discriminant function analysis is effective for a set of data, the classification table of correct and incorrect estimates will yield a high percentage correct. Discriminant function analysis is found in SPSS/SAS under Analyze, Classify, Discriminant. One gets DA or MDA from this same menu selection, depending on whether the specified grouping variable has two or more than two categories.

    Multiple discriminant analysis (MDA) is an extension of discriminant analysis and a

cousin of multiple analysis of variance (MANOVA), sharing many of the same assumptions and tests. MDA is used to classify a categorical dependent which has more

    than two categories, using as predictors a number of interval or dummy independent

    variables. MDA is sometimes also called discriminant factor analysis or canonical

    discriminant analysis.

    There are several purposes for DA and/or MDA:

    To classify cases into groups using a discriminant prediction equation.

    To test theory by observing whether cases are classified as predicted.

    To investigate differences between or among groups.

To determine the most parsimonious way to distinguish among groups.

To assess the relative importance of the independent variables in classifying the dependent variable.

    To infer the meaning of MDA dimensions which distinguish groups, based on

    discriminant loadings.

    Discriminant analysis has two steps: (1) an F test (Wilks' lambda) is used to test if the

discriminant model as a whole is significant, and (2) if the F test shows significance, then the individual independent variables are assessed to see which differ significantly in

    mean by group and these are used to classify the dependent variable.

Discriminant analysis shares all the usual assumptions of correlation, requiring linear and homoscedastic relationships and untruncated interval or near-interval data. Like multiple regression, it also assumes proper model specification (inclusion of all important independents and exclusion of extraneous variables). DA also assumes the dependent variable is a true dichotomy, since data which are forced into dichotomous coding are truncated, attenuating correlation.


DA is an earlier alternative to logistic regression, which is now frequently used in place of DA as it usually involves fewer violations of assumptions (independent variables needn't be normally distributed, linearly related, or have equal within-group variances), is robust, handles categorical as well as continuous variables, and has coefficients which many find easier to interpret. Logistic regression is preferred when data are not normal in distribution or group sizes are very unequal. See also the separate topic on multiple discriminant function analysis (MDA) for dependents with more than two categories.
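A minimal sketch of two-group discriminant classification in Python with scikit-learn (hypothetical data; this reproduces the classification step, not the SPSS output tables):

```python
# Two-group discriminant analysis: fit, score, and classify cases (hypothetical data).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
group0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
group1 = rng.normal(loc=[2.0, 1.0], scale=1.0, size=(50, 2))
X = np.vstack([group0, group1])                 # discriminating (independent) variables
y = np.array([0] * 50 + [1] * 50)               # grouping (criterion) variable

da = LinearDiscriminantAnalysis().fit(X, y)
print(da.coef_, da.intercept_)                  # discriminant coefficients and constant
print(da.decision_function(X)[:5])              # discriminant scores for the first five cases
predicted = da.predict(X)
print((predicted == y).mean())                  # proportion correctly classified (hit ratio)
```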

    Few Definitions and Concepts

Discriminating variables: These are the independent variables, also called predictors.

The criterion variable. This is the dependent variable, also called the grouping variable in SPSS. It is the object of classification efforts.

Discriminant function: A discriminant function, also called a canonical root, is a latent variable which is created as a linear combination of discriminating (independent) variables, such that L = b1x1 + b2x2 + ... + bnxn + c, where the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. This is analogous to multiple regression, but the b's are discriminant coefficients which maximize the distance between the means of the criterion (dependent) variable. Note that the foregoing assumes the discriminant function is estimated using ordinary least squares, the traditional method, but there is also a version involving maximum likelihood estimation.

    Number of discriminant functions. There is one discriminant function for 2-group

    discriminant analysis, but for higher order DA, the number of functions (each with its

    own cut-off value) is the lesser of (g - 1), where g is the number of categories in the

grouping variable, or p, the number of discriminating (independent) variables. Each discriminant function is orthogonal to the others. A dimension is simply one of the

    discriminant functions when there are more than one, in multiple discriminant

    analysis.

The eigenvalue, also called the characteristic root of each discriminant function, reflects the ratio of importance of the dimensions which classify cases of the dependent variable. There is one eigenvalue for each discriminant function. For two-group DA, there is one discriminant function and one eigenvalue, which accounts for 100% of the explained variance. If there is more than one discriminant function, the first will be the largest and most important, the second next most important in explanatory power, and so on. The eigenvalues assess relative importance because they reflect the percents of variance explained in the dependent variable, cumulating


    to 100% for all functions. That is, the ratio of the eigenvalues indicates the relative

    discriminating power of the discriminant functions. If the ratio of two eigenvalues is

1.4, for instance, then the first discriminant function accounts for 40% more between-group variance in the dependent categories than does the second discriminant

    function. Eigenvalues are part of the default output in SPSS (Analyze, Classify,

    Discriminant).

The relative percentage of a discriminant function equals that function's eigenvalue divided by the sum of the eigenvalues of all discriminant functions in the model. Thus it is the percent of discriminating power for the model associated with a given discriminant function. Relative % is used to tell how many functions are important. One may find that only the first two or so eigenvalues are of importance.

    The canonical correlation, R, is a measure of the association between the groups

    formed by the dependent and the given discriminant function. When R is zero, there

    is no relation between the groups and the function. When the canonical correlation is

large, there is a high correlation between the discriminant functions and the groups. Note that relative % and R do not have to be correlated. R is used to tell how useful each function is in determining group differences. An R of 1.0 indicates that all of the variability in the discriminant scores can be accounted for by that dimension.

    Note that for two-group DA, the canonical correlation is equivalent to the Pearsonian

    correlation of the discriminant scores with the grouping variable.

The discriminant score, also called the DA score, is the value resulting from applying a discriminant function formula to the data for a given case. The Z score is the discriminant score for standardized data. To get discriminant scores in SPSS, select Analyze, Classify, Discriminant; click the Save button; check "Discriminant scores". One can also view the discriminant scores by clicking the Classify button and checking "Casewise results."

    Cutoff: If the discriminant score of the function is less than or equal to the cutoff, the

case is classed as 0, or if above, it is classed as 1. When group sizes are equal, the cutoff is the mean of the two centroids (for two-group DA). If the groups are unequal,

    the cutoff is the weighted mean.

Unstandardized discriminant coefficients are used in the formula for making the classifications in DA, much as b coefficients are used in regression in making predictions. The constant plus the sum of products of the unstandardized coefficients with the observations yields the discriminant scores. That is, discriminant coefficients are the regression-like b coefficients in the discriminant function, in the form L = b1x1 + b2x2 + ... + bnxn + c, where L is the latent variable formed by the discriminant function, the b's are discriminant coefficients, the x's are discriminating variables, and c is a constant. The discriminant function coefficients are partial coefficients, reflecting the unique contribution of each variable to the classification of the criterion variable. The standardized discriminant coefficients, like beta weights in


    regression, are used to assess the relative classifying importance of the independent

    variables.

    Standardized discriminant coefficients, also termed the standardized canonical

    discriminant function coefficients, are used to compare the relative importance of the

independent variables, much as beta weights are used in regression. Note that importance is assessed relative to the model being analyzed. Addition or deletion of

    variables in the model can change discriminant coefficients markedly.

    As with regression, since these are partial coefficients, only the unique explanation of

    each independent is being compared, not considering any shared explanation. Also, if

    there are more than two groups of the dependent, the standardized discriminant

coefficients do not tell the researcher between which groups the variable is most or least discriminating. For this purpose, group centroids and factor structure are

    examined.

Q-2 What is Wilks' Lambda?

Wilks' lambda is used to test the significance of the discriminant function as a whole. In SPSS, the "Wilks' Lambda" table will have a column labeled "Test of Function(s)" and a row labeled "1 through n" (where n is the number of discriminant functions). The "Sig." level for this row is the significance level of the discriminant function as a whole. A significant lambda means one can reject the null hypothesis that the two groups have the same mean discriminant function scores. Wilks' lambda is part of the default output in SPSS (Analyze, Classify, Discriminant). In SPSS, this use of Wilks' lambda is in the "Wilks' Lambda" table of the output section on "Summary of Canonical Discriminant Functions."

    ANOVA table for discriminant scores is another overall test of the DA model. It is

an F test, where a "Sig." p value < .05 means the model differentiates discriminant scores between the groups significantly better than chance (than a model with just the

    constant). It is obtained in SPSS by asking for Analyze, Compare Means, One-Way

    ANOVA, using discriminant scores from DA (which SPSS will label Dis1_1 or

    similar) as dependent.

    Wilks' lambda also can be used to test which independents contribute significantly to

    the discriminant function. The smaller the lambda for an independent variable, the

    more that variable contributes to the discriminant function. Lambda varies from 0 to

1, with 0 meaning the group means differ (thus the variable differentiates the groups) and 1 meaning all group means are the same. The F test of Wilks' lambda shows which variables' contributions are significant. Wilks' lambda is sometimes called the U statistic. In SPSS, this use of Wilks' lambda is in the "Tests of equality of group means" table in DA output.
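A minimal sketch of obtaining Wilks' lambda for the model as a whole in Python, using statsmodels MANOVA as a stand-in for the SPSS table (hypothetical data and column names):

```python
# Wilks' lambda for the overall model: do the groups differ on the discriminating variables?
# (hypothetical data and column names)
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "x1": np.concatenate([rng.normal(0, 1, 50), rng.normal(1.5, 1, 50)]),
    "x2": np.concatenate([rng.normal(0, 1, 50), rng.normal(0.5, 1, 50)]),
    "group": ["a"] * 50 + ["b"] * 50,
})

manova = MANOVA.from_formula("x1 + x2 ~ group", data=df)
print(manova.mv_test())   # the output includes Wilks' lambda with its F test and p value
```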

Q-3 What is a Confusion or Classification Matrix?


    Ans:

The classification table, also called a classification matrix, or a confusion, assignment, or prediction matrix or table, is used to assess the performance of DA. This is simply a table in which the rows are the observed categories of the dependent and the columns are the predicted categories of the dependent. When prediction is perfect, all cases will lie on the diagonal. The percentage of cases on the diagonal is the percentage of correct classifications. This percentage is called the hit ratio.

    Expected hit ratio. Note that the hit ratio must be compared not to zero but to the

    percent that would have been correctly classified by chance alone. For two-group

    discriminant analysis with a 50-50 split in the dependent variable, the expected

percent is 50%. For unequally split 2-way groups of different sizes, the expected percent is computed in the "Prior Probabilities for Groups" table in SPSS, by

    multiplying the prior probabilities times the group size, summing for all groups, and

    dividing the sum by N.
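A minimal sketch of the classification table, hit ratio, and expected hit ratio in Python with scikit-learn (hypothetical observed and predicted group memberships):

```python
# Classification (confusion) matrix and hit ratio (hypothetical observed/predicted groups).
import numpy as np
from sklearn.metrics import confusion_matrix

observed = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
predicted = np.array([0, 0, 1, 0, 1, 1, 1, 0, 1, 1])

table = confusion_matrix(observed, predicted)    # rows = observed, columns = predicted
hit_ratio = np.trace(table) / table.sum()        # percent of cases on the diagonal
print(table)
print(hit_ratio)

# Expected hit ratio by chance: sum over groups of (prior probability * group size), divided by N
priors = np.bincount(observed) / observed.size
expected = np.sum(priors * np.bincount(observed)) / observed.size
print(expected)
```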

    Adapted from the link:

    http://faculty.chass.ncsu.edu/garson/PA765/discrim2.htm

    Suggested Readings:

Huberty, Carl J. (1994). Applied discriminant analysis. NY: Wiley-Interscience. (Wiley Series in Probability and Statistics).

Klecka, William R. (1980). Discriminant analysis. Quantitative Applications in the Social Sciences Series, No. 19. Thousand Oaks, CA: Sage Publications.

Lachenbruch, P. A. (1975). Discriminant analysis. NY: Hafner.


    Cluster Analysis

    Q-1 What is Cluster Analysis ?

    Ans:

    Cluster analysis, also called segmentation analysis or taxonomy analysis, seeks to

identify homogeneous subgroups of cases in a population. That is, cluster analysis seeks to identify a set of groups which both minimize within-group variation and

    maximize between-group variation. Other techniques, such as latent class analysis

    and Q-mode factor analysis, also perform clustering and are discussed separately.

    SPSS offers three general approaches to cluster analysis. Hierarchical clustering

    allows users to select a definition of distance, then select a linking method of forming

    clusters, then determine how many clusters best suit the data. In k-means clustering

the researcher specifies the number of clusters in advance, and the procedure then calculates how to assign cases to the K clusters. K-means clustering is much less computer-intensive

    and is therefore sometimes preferred when datasets are very large (ex., > 1,000).

Finally, two-step clustering creates pre-clusters, then it clusters the pre-clusters.

    Key Concepts and Terms

Cluster formation is the selection of the procedure for determining how clusters are created, and how the calculations are done. In agglomerative hierarchical clustering every case is initially considered a cluster, then the two cases with the lowest distance (or highest similarity) are combined into a cluster. The case with the lowest distance to either of the first two is considered next. If that third case is closer to a fourth case than it is to either of the first two, the third and fourth cases become the second two-case cluster; if not, the third case is added to the first cluster. The process is repeated, adding cases to existing clusters, creating new clusters, or combining clusters to get to the desired final number of clusters. There is also divisive clustering, which works in the opposite direction, starting with all cases in one large cluster. Hierarchical cluster analysis, discussed below, can use either agglomerative or divisive clustering strategies.

    Similarity and Distance


    Distance. The first step in cluster analysis is establishment of the similarity or distance

    matrix. This matrix is a table in which both the rows and columns are the units of analysis

    and the cell entries are a measure of similarity or distance for any pair of cases.

Euclidean distance is the most common distance measure. A given pair of cases is plotted on two variables, which form the x and y axes. The Euclidean distance is the square root of the sum of the squared x difference plus the squared y difference. (Recall high school geometry: this is the formula for the length of the hypotenuse of a right triangle.) It is common to use the square of the Euclidean distance, as squaring removes the sign. When two or more variables are used to define distance, the one with the larger magnitude will dominate, so to avoid this it is common to first standardize all variables.
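A minimal sketch of Euclidean and squared Euclidean distance after standardization (hypothetical data):

```python
# Euclidean and squared Euclidean distance between two cases, after standardizing variables
# (hypothetical data: rows are cases, columns are variables).
import numpy as np

data = np.array([[2.0, 150.0],
                 [4.0, 130.0],
                 [3.0, 180.0]])

# Standardize so the variable with the larger magnitude (here the second) does not dominate.
z = (data - data.mean(axis=0)) / data.std(axis=0)

diff = z[0] - z[1]
euclidean = np.sqrt(np.sum(diff ** 2))
squared_euclidean = np.sum(diff ** 2)
print(euclidean, squared_euclidean)
```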

There are a variety of different measures of inter-observation distances and inter-cluster distances to use as criteria when merging nearest clusters into broader groups or when considering the relation of a point to a cluster. SPSS supports these interval distance measures: Euclidean distance, squared Euclidean distance, Chebychev, block, Minkowski, or customized; for count data, chi-square or phi-square. For binary data, it supports Euclidean distance, squared Euclidean distance, size difference, pattern difference, variance, shape, or Lance and Williams.

Similarity. Distance measures how far apart two observations are; cases which are alike share a low distance. Similarity measures how alike two cases are; cases which are alike share a high similarity. SPSS supports a large number of similarity measures for interval data (Pearson correlation or cosine) and for binary data (Russell and Rao, simple matching, Jaccard, Dice, Rogers and Tanimoto, Sokal and Sneath 1, Sokal and Sneath 2,

    Sokal and Sneath 3, Kulczynski 1, Kulczynski 2, Sokal and Sneath 4, Hamann, Lambda,

    Anderberg's D, Yule's Y, Yule's Q, Ochiai, Sokal and Sneath 5, phi 4-point correlation, or

    dispersion).

    Absolute values. Since for Pearson correlation, high negative as well as high positive

    values indicate similarity, the researcher normally selects absolute values. This can be

done by checking the absolute value checkbox in the Transform Measures area of the Methods subdialog (invoked by pressing the Methods button) of the main Cluster dialog.

    Summary. In SPSS, similarity/distance measures are selected in the Measure area of the

    Method subdialog obtained by pressing the Method button in the Classify dialog. There

are three measure pulldown menus, for interval, binary, and count data respectively. The proximity matrix table in the output shows the actual distances or similarities computed

    for any pair of cases. In SPSS, proximity matrices are selected under Analyze, Cluster,

    Hierarchical clustering; Statistics button; check proximity matrix.

Method. Under the Method button in the SPSS Classify dialog, the pull-down Method selection determines how cases or clusters are combined at each step. Different methods

    will result in different cluster patterns. SPSS offers these method choices:


    Nearest neighbor. In this single linkage method, the distance between two clusters is the

distance between their closest neighboring points.

Furthest neighbor. In this complete linkage method, the distance between two clusters is the distance between their two furthest member points.

    UPGMA (unweighted pair-group method using averages). The distance between two

    clusters is the average distance between all inter-cluster pairs. UPGMA is generally

preferred over the nearest or furthest neighbor methods since it is based on information about all inter-cluster pairs, not just the nearest or furthest ones, and it is the default method in SPSS. SPSS labels this "between-groups linkage."

    Average linkage within groups is the mean distance between all possible inter- or intra-

cluster pairs. The average distance between all pairs in the resulting cluster is made as small as possible. This method is therefore appropriate when the research purpose is

    homogeneity within clusters. SPSS labels this "within-groups linkage."

    Ward's method calculates the sum of squared Euclidean distances from each case in a

cluster to the mean of all variables. The cluster to be merged is the one which will increase the sum the least. This is an ANOVA-type approach and preferred by some

    researchers for this reason.

    Centroid method. The cluster to be merged is the one with the smallest sum of Euclidean

    distances between cluster means for all variables.

    Median method. Clusters are weighted equally regardless of group size when computing

    centroids of two clusters being combined. This method also uses Euclidean distance as

    the proximity measure.

Correlation of items can be used as a similarity measure. One transposes the normal data table in which columns are variables and rows are cases. By using columns as cases and

    rows as variables instead, the correlation is between cases and these correlations may

    constitute the cells of the similarity matrix.

Binary matching is another type of similarity measure, where 1 indicates a match and 0 indicates no match between any pair of cases. There are multiple matched attributes and

    the similarity score is the number of matches divided by the number of attributes being

    matched. Note that it is usual in binary matching to have several attributes because there

is a risk that when the number of attributes is small, they may be orthogonal to (uncorrelated with) one another, and clustering will be indeterminate.

    Summary measures assess how the clusters differ from one another.

    Means and variances. A table of means and variances of the clusters with respect to the

original variables shows how the clusters differ on the original variables. SPSS does not make this available in the Cluster dialog, but one can click the Save button, which will


    save the cluster number for each case (or numbers if multiple solutions are requested).

    Then in Analyze, Compare Means, Means the researcher can use the cluster number as

the grouping variable to compare differences of means on any other continuous variable in the dataset.

    Linkage tables show the relation of the cases to the clusters.

    Cluster membership table. This shows cases as rows, where columns are alternative

numbers of clusters in the solution (as specified in the "Range of Solutions" option in the Cluster Membership group in SPSS, under the Statistics button). Cell entries show the

    number of the cluster to which the case belongs. From this table, one can see which cases

    are in which groups, depending on the number of clusters in the solution.

Agglomeration Schedule. The agglomeration schedule is a choice under the Statistics button for Hierarchical Cluster in the SPSS Cluster dialog. In this table, the rows are stages of clustering, numbered from 1 to (n - 1). The (n - 1)th stage includes all the cases in one cluster. There are two "Cluster Combined" columns, giving the case or cluster numbers for combination at each stage. In agglomerative clustering using a distance measure like Euclidean distance, stage 1 combines the two cases which have the lowest proximity (distance) score. The cluster number goes by the lower of the cases or clusters combined, where cases are initially numbered 1 to n. For instance, at stage 1, cases 3 and 18 might be combined, resulting in a cluster labeled 3. Later cluster 3 and case 2 might be combined, resulting in a cluster labeled 2. The researcher looks at the "Coefficients" column of the agglomeration schedule and notes when the proximity coefficient jumps up and is not a small increment from the one before (or when the coefficient reaches some theoretically important level). Note that for distance measures, low is good, meaning the cases are alike; for similarity measures, high coefficients mean cases are alike. After the stopping stage is determined in this manner, the researcher can work backward to determine how many clusters there are and which cases belong to which clusters (but it is easier just to get this information from the cluster membership table). Note, though, that SPSS will not stop on this basis but instead will compute the range of solutions (ex., 2 to 4 clusters) requested by the researcher in the Cluster Membership group of the Statistics button in the Hierarchical Clustering dialog. When there are relatively few cases, icicle plots or dendrograms provide the same linkage information in an easier format.

    Linkage plots show similar information in graphic form.

Icicle plots are usually horizontal, showing cases as rows and number of clusters in the solution as columns. If there are few cases, vertical icicle plots may be plotted, with cases as columns. Reading from the last column right to left (horizontal icicle plots) or the last row bottom to top (vertical icicle plots), the researcher can see how agglomeration proceeded. The last column/bottom row will show all the cases in separate one-case clusters; this is the (n - 1) solution. The next-to-last column/row will show the (n - 2) solution, with two cases combined into one cluster. Subsequent columns/rows show further clustering steps. Column 1 (horizontal icicle plots) or row 1 (vertical icicle plots) will show all cases in a


    single cluster. This is a visual way of representing information on the agglomeration

    schedule, but without the proximity coefficient information.

Dendrograms, also called tree diagrams, show the relative size of the proximity coefficients at which cases were combined. The bigger the distance coefficient or the smaller the similarity coefficient, the more the clustering involved combining unlike entities, which may be undesirable. Trees are usually depicted horizontally, not vertically, with each row representing a case on the Y axis, while the X axis is a rescaled version of the proximity coefficients. Cases with low distance/high similarity are close together. Cases showing low distance are close, with a line linking them a short distance from the left of the dendrogram, indicating that they are agglomerated into a cluster at a low distance coefficient, indicating alikeness. When, on the other hand, the linking line is to the right of the dendrogram, the linkage occurs at a high distance coefficient, indicating the cases/clusters were agglomerated even though they are much less alike. If a similarity measure is used rather than a distance measure, the rescaling of the X axis still produces a diagram with linkages involving high alikeness to the left and low alikeness to the right. In SPSS, select Analyze, Classify, Hierarchical Cluster; click the Plots button; check the Dendrogram checkbox.

    What is Hierarchical Cluster Analysis ?

    Hierarchical clustering is appropriate for smaller samples (typically < 250). To

    accomplish hierarchical clustering, the researcher must specify how similarity or distance

is defined, how clusters are aggregated (or divided), and how many clusters are needed. Hierarchical clustering generates all possible clusters of sizes 1...K, but is used only for relatively small samples. In hierarchical clustering, the clusters are nested rather than mutually exclusive, as is otherwise the usual case. That is, in hierarchical clustering, larger clusters created at later stages may contain smaller clusters created at earlier stages of agglomeration.

    One may wish to use the hierarchical cluster procedure on a sample of cases (ex., 200) to

inspect results for different numbers of clusters. The optimum number of clusters depends on the research purpose. Identifying "typical" types may call for few clusters and identifying "exceptional" types may call for many clusters. After using hierarchical clustering to determine the desired number of clusters, the researcher may then wish to analyze the entire dataset with k-means clustering (a.k.a. the Quick Cluster procedure: Analyze, Cluster, K-Means Cluster Analysis), specifying that number of clusters.

Forward clustering, also called agglomerative clustering: Small clusters are formed by using a high similarity index cut-off (ex., .9). Then this cut-off is relaxed to establish broader and broader clusters in stages until all cases are in a single cluster at some low

    similarity index cut-off. The merging of clusters is visualized using a tree format.

    Backward clustering, also called divisive clustering, is the same idea, but starting with a

low cut-off and working toward a high cut-off. Forward and backward methods need not generate the same results.


    Clustering variables. In the Hierarchical Cluster dialog, in the Cluster group, the

researcher may select Variables rather than the usual Cases, in order to cluster variables.

SPSS calls hierarchical clustering the "Cluster procedure." In SPSS, select Analyze, Classify, Hierarchical Cluster; select variables; select Cases in the Cluster group; click Statistics; select Proximity Matrix; select Range of Solutions in the Cluster Membership group; specify the number of clusters (typically 3 to 6); Continue; OK.
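For readers working outside SPSS, a minimal sketch of agglomerative hierarchical clustering in Python with scipy (hypothetical data; the linkage method and number of clusters are researcher choices, as discussed above):

```python
# Agglomerative hierarchical clustering: linkage, agglomeration schedule, cluster membership
# (hypothetical data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from scipy.stats import zscore

rng = np.random.default_rng(5)
data = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
z = zscore(data)                          # standardize so no variable dominates the distances

Z = linkage(z, method="average", metric="euclidean")   # between-groups (average) linkage
print(Z[-5:])                             # last rows of the agglomeration schedule (distances jump)

membership = fcluster(Z, t=2, criterion="maxclust")    # cut the tree at a 2-cluster solution
print(membership)
# dendrogram(Z) would draw the tree diagram (dendrogram) described earlier.
```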

    What is K-means Cluster Analysis ?

K-means cluster analysis uses Euclidean distance. The researcher must specify in advance the desired number of clusters, K. Initial cluster centers are chosen in a first pass of the data, then each additional iteration groups observations based on nearest Euclidean distance to the mean of the cluster. Cluster centers change at each pass. The process continues until cluster means do not shift more

    than a given cut-off value or the iteration limit is reached.

    Cluster centers are the average value on all clustering variables of each cluster's

    members. The "Initial cluster centers," in spite of its title, gives the average value of eachvariable for each cluster for the k well-spaced cases which SPSS selects for initialization

    purposes when no initial file is supplied. The "Final cluster centers" table in SPSS output

    gives the same thing for the last iteration step. The "Iteration history" table shows thechange in cluster centers when the usual iterative approach is taken. When the change

    drops below a specified cutoff, the iterative process stops and cases are assigned to

    clusters according to which cluster center they are nearest.

    Large datasets are possible with K-means clustering, unlike hierarchical clustering,

because K-means clustering does not require prior computation of a proximity matrix of the distance/similarity of every case with every other case.

Method. The default method is "Iterate and classify," under which an iterative process is used to update cluster centers, then cases are classified based on the updated centers. However, SPSS supports a "Classify only" method, under which cases are immediately

    classified based on the initial cluster centers, which are not updated.

    Agglomerative K-means clustering. Normally in K-means clustering, a given case may be

assigned to a cluster, then reassigned to a different cluster as the algorithm unfolds. However, in agglomerative K-means clustering, the solution is constrained to force a

    given case to remain in its initial cluster.

    SPSS: Analyze, Cluster, K-Means Cluster Analysis; enter variables in the Variables: area;

optionally, enter a variable in the "Label cases by:" area; enter "Number of clusters:"; choose Method (Iterate and classify, or Classify only). Unlike hierarchical clustering,

    there is no option for "Range of solutions"; instead you must re-run K-means clustering,

    asking for a different number of clusters.
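A minimal sketch of K-means clustering in Python with scikit-learn (hypothetical data; as in SPSS, the number of clusters must be specified in advance and different solutions require separate runs):

```python
# K-means clustering: specify K in advance, iterate until cluster centers stop shifting
# (hypothetical data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
data = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, max_iter=10, random_state=0).fit(data)
print(km.cluster_centers_)      # final cluster centers (mean of each variable per cluster)
print(km.labels_[:10])          # cluster membership for the first ten cases
print(np.bincount(km.labels_))  # number of cases in each cluster
```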


    Iterate button. Optionally, you may press the Iterate button and set the number of

    iterations and the convergence criterion. The default maximum number of iterations in

SPSS is 10. For the convergence criterion, by default, iterations terminate if the largest change in any cluster center is less than 2% of the minimum distance between initial centers (or if the maximum number of iterations has been reached). To override this default, enter a positive number less than or equal to 1 in the convergence box. There is also a "Use running means" checkbox which, if checked, will cause the cluster centers to be updated after each case is classified, rather than the default, which is after the entire set of

    cases is classified.

Save button: Optionally, you may press the Save button to save the final cluster number of each case as an added column in your dataset (labeled QCL_1), and/or you may save

    the Euclidean distance between each case and its cluster center (labeled QCL_2) by

    checking "Distance from cluster center."

    Options button: Optionally, you may press the Options button to select statistics or

missing values options. There are three statistics options: "Initial cluster centers" (gives the initial variable means for each cluster); "ANOVA table" (ANOVA F-tests for each variable, but as the F tests are only descriptive, the resulting probabilities are for exploratory purposes only; nonetheless, non-significant variables might be dropped as not contributing to the differentiation of clusters); and "Cluster information for each case" (gives each case's final cluster assignment and the Euclidean distance between the case and the cluster center; also gives the Euclidean distance between final cluster centers).

    Getting different clusters. Sometimes the researcher wishes to experiment to get different

    clusters, as when the "Number of cases in each cluster" table shows highly imbalanced

    clusters and/or clusters with very few members. Different results may occur by setting

different initial cluster centers from file (see above), by changing the number of clusters requested, or even by presenting the data file in a different case order.

    What is Two-Step Cluster Analysis ?

    Two-step cluster analysis groups cases into pre-clusters which are treated as single cases.

Standard hierarchical clustering is then applied to the pre-clusters in the second step. This is the method used when one or more of the variables are categorical (not interval or dichotomous). Also, since it is a method requiring neither a proximity table like hierarchical classification nor an iterative process like K-means clustering, but rather is a one-pass-through-the-dataset method, it is recommended for very large datasets.

Cluster feature tree. The preclustering stage employs a CF tree with nodes leading to leaf nodes. Cases start at the root node and are channeled toward the nodes, and eventually the leaf nodes, which match them most closely. If there is no adequate match, the case is used to start its own leaf node. It can happen that the CF tree fills up and cannot accept new leaf entries in a node, in which case it is split using the most-distant pair in the node as seeds. If this recursive process grows the CF tree beyond maximum size, the threshold distance is increased and the tree is rebuilt, allowing new cases to be input. The process continues


    until all the data are read. Click the Advanced button in the Options button dialog to set

    threshold distances, maximum levels, and maximum branches per leaf node manually.

Proximity. When one or more of the variables are categorical, log-likelihood is the distance measure used, with cases categorized under the cluster which is associated with the largest log-likelihood. If the variables are all continuous, Euclidean distance is used, with cases categorized under the cluster which is associated with the smallest Euclidean

    distance.

    Number of clusters. By default SPSS determines the number of clusters using the change

in BIC (the Schwarz Bayesian Criterion): when the BIC change is small, it stops and selects as many clusters as have been created thus far. It is also possible to have this done based on changes in AIC (the Akaike Information Criterion), or simply to tell SPSS how many clusters are wanted. The researcher can also ask for a range of solutions, such as 3-5 clusters. The

    "Autoclustering statistics" table in SPSS output gives, for example, BIC and BIC change

    for all solutions.

    SPSS. Choose Analyze, Classify, Two-Step Cluster; select your categorical and

    continuous variables; if desired, click Plots and select the plots wanted; Click Output and

    select the statistics wanted (descriptive statistics, cluster frequencies, AIC or BIC);

Continue.

    Adapted from

    http://faculty.chass.ncsu.edu/garson/PA765/cluster.htm

    www.cs.uu.nl/docs/vakken/arm/SPSS/spss8.pdf

    Suggested Readings:

Anil K. Jain and Richard C. Dubes, Algorithms for Clustering Data, 2004.

Leonard Kaufman and Peter J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, 2005.


    Factor Analysis

    Q-1 What is Factor Analysis?

Factor analysis is a correlational technique to determine meaningful clusters of shared variance.

    Factor Analysis should be driven by a researcher who has a deep and genuine

interest in relevant theory in order to get optimal value from choosing the right type of factor analysis and interpreting the factor loadings.

Factor analysis begins with a large number of variables and then tries to reduce the interrelationships amongst the variables to a small number of clusters or factors.

    Factor analysis finds relationships or natural connections where variables are

    maximally correlated with one another and minimally correlated with other

    variables, and then groups the variables accordingly.

After this process has been done many times, a pattern of relationships or factors that captures the essence of all of the data emerges.

Summary: Factor analysis refers to a collection of statistical methods for reducing correlational data into a smaller number of dimensions or factors.

    Key Concepts and Terms

Exploratory factor analysis (EFA) seeks to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator

    may be associated with any factor. This is the most common form of factor analysis.

    There is no prior theory and one uses factor loadings to intuit the factor structure of the

    data.

    Confirmatory factor analysis (CFA) seeks to determine if the number of factors and the

    loadings of measured (indicator) variables on them conform to what is expected on the

basis of pre-established theory. Indicator variables are selected on the basis of prior theory, and factor analysis is used to see if they load as predicted on the expected number of factors. The researcher's a priori assumption is that each factor (the number and labels of which may be specified a priori) is associated with a specified subset of indicator variables. A minimum requirement of confirmatory factor analysis is that one hypothesize beforehand the number of factors in the model, but usually the researcher will also posit expectations about which variables will load on which factors (Kim and Mueller, 1978b: 55). The researcher seeks to determine, for instance, if measures created to represent a latent variable really belong together.

    Factor loadings: The factor loadings, also called component loadings in PCA, are the

correlation coefficients between the variables (rows) and factors (columns). Analogous to Pearson's r, the squared factor loading is the percent of variance in that variable explained by the factor. To get the percent of variance in all the variables accounted for by each factor, take the sum of the squared factor loadings for that factor (column) and divide by


the number of variables. (Note that the number of variables equals the sum of their variances,

    as the variance of a standardized variable is 1.) This is the same as dividing the factor's

    eigenvalue by the number of variables.

Communality, h², is the squared multiple correlation for the variable as dependent using the factors as predictors. The communality measures the percent of variance in a given variable explained by all the factors jointly and may be interpreted as the reliability of the indicator. When an indicator variable has a low communality, the factor model is not working well for that indicator and possibly it should be removed from the model. However, communalities must be interpreted in relation to the interpretability of the factors. A communality of .75 seems high but is meaningless unless the factor on which the variable is loaded is interpretable, though it usually will be. A communality of .25 seems low but may be meaningful if the item is contributing to a well-defined factor. That is, what is critical is not the communality coefficient per se, but rather the extent to which the item plays a role in the interpretation of the factor, though often this role is greater when communality is high.

    Eigenvalues: Also called characteristic roots. The eigenvalue for a given factor measures the variance in all the variables which is accounted for by that factor. The ratio of eigenvalues is the ratio of explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the explanation of variances in the variables and may be ignored as redundant with more important factors.

    Thus, eigenvalues measure the amount of variation in the total sample accounted for by each factor. Note that the eigenvalue is not the percent of variance explained but rather a measure of amount of variance in relation to total variance (since variables are standardized to have means of 0 and variances of 1, total variance is equal to the number of variables). SPSS will output a corresponding column titled '% of variance'. A factor's eigenvalue may be computed as the sum of its squared factor loadings for all the variables.
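
    As an illustration (the data here are randomly generated placeholders), the eigenvalues SPSS reports can be reproduced directly from the correlation matrix, and their sum equals the number of variables:

        import numpy as np

        rng = np.random.default_rng(0)
        data = rng.normal(size=(200, 6))                     # placeholder scores: 200 cases, 6 variables

        R = np.corrcoef(data, rowvar=False)                  # 6 x 6 correlation matrix
        eigenvalues = np.linalg.eigvalsh(R)[::-1]            # eigvalsh returns ascending order; reverse to largest first

        print(eigenvalues.sum())                             # 6.0: total variance equals the number of variables
        print(eigenvalues / R.shape[0] * 100)                # the '% of variance' column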

    Q-2 What are the criteria for determining the number of factors? (Roughly in order of frequency of use in social science; see Dunteman, 1989: 22-3.)

    Kaiser criterion: A common rule of thumb for dropping the least important factors from the analysis. The Kaiser rule is to drop all components with eigenvalues under 1.0. The Kaiser criterion is the default in SPSS and most computer programs.

    Scree plot: The Cattell scree test plots the components on the X axis and the corresponding eigenvalues on the Y axis. As one moves to the right, toward later components, the eigenvalues drop. When the drop ceases and the curve makes an elbow toward a less steep decline, Cattell's scree test says to drop all further components after the one starting the elbow. This rule is sometimes criticised for being amenable to researcher-controlled "fudging." That is, because picking the "elbow" can be subjective (the curve may have multiple elbows or be a smooth curve), the researcher may be tempted to set the cut-off


    at the number of factors desired by his or her research agenda. Even when "fudging" is

    not a consideration, the scree criterion tends to result in more factors than the Kaiser

    criterion.

    Variance explained criteria: Some researchers simply use the rule of keeping enough factors to account for 90% (sometimes 80%) of the variation. Where the researcher's goal emphasizes parsimony (explaining variance with as few factors as possible), the criterion could be as low as 50%.
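
    A sketch of how these retention rules could be applied to a vector of eigenvalues (the eigenvalues below are invented and sum to the number of variables, 7, as they would for a correlation matrix):

        import numpy as np

        eigenvalues = np.array([3.1, 1.4, 0.9, 0.6, 0.5, 0.3, 0.2])   # made-up, sorted descending
        n_vars = eigenvalues.size

        # Kaiser criterion: keep components with eigenvalue greater than 1
        kaiser_k = int((eigenvalues > 1.0).sum())                     # 2 factors here

        # Variance-explained criterion: keep enough factors to reach a chosen threshold
        cum_prop = np.cumsum(eigenvalues) / n_vars
        variance_k = int(np.argmax(cum_prop >= 0.80) + 1)             # 4 factors reach 80% here

        # Scree test: inspect a plot of eigenvalues against component number for an elbow;
        # the successive drops below are only a numeric aid, not a substitute for the plot
        drops = -np.diff(eigenvalues)

        print(kaiser_k, variance_k, drops)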

    Q-3 What are the different rotation methods used in factor analysis?

    Ans:

    No rotation is the default, but it is a good idea to select a rotation method, usually varimax. The original, unrotated principal components solution maximizes the sum of squared factor loadings, efficiently creating a set of factors which explain as much of the variance in the original variables as possible. The amount explained is reflected in the sum of the eigenvalues of all factors. However, unrotated solutions are hard to interpret because variables tend to load on multiple factors.

    Varimax rotation is an orthogonal rotation of the factor axes to maximize the variance of the squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which has the effect of differentiating the original variables by extracted factor. Each factor will tend to have either large or small loadings of any particular variable. A varimax solution yields results which make it as easy as possible to identify each variable with a single factor. This is the most common rotation option.
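
    In SPSS this is simply the Varimax option; as an illustrative sketch, though, the classic varimax algorithm can be written directly in Python (the unrotated loadings below are invented for the example):

        import numpy as np

        def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
            """Orthogonally rotate a loadings matrix toward a varimax solution."""
            p, k = loadings.shape
            R = np.eye(k)                    # accumulated rotation matrix
            d = 0.0
            for _ in range(max_iter):
                d_old = d
                L = loadings @ R
                u, s, vt = np.linalg.svd(
                    loadings.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.diag(L.T @ L))))
                R = u @ vt
                d = s.sum()
                if d_old != 0 and d / d_old < 1 + tol:   # stop when the criterion stops improving
                    break
            return loadings @ R

        unrotated = np.array([[0.6, 0.6], [0.7, 0.5], [0.6, -0.5], [0.5, -0.6]])  # hypothetical
        print(varimax(unrotated))   # each variable now loads mainly on a single factor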

    Quartimax rotation is an orthogonal alternative which minimizes the number of factors needed to explain each variable. This type of rotation often generates a general factor on which most variables are loaded to a high or medium degree. Such a factor structure is usually not helpful to the research purpose.

    Q-4 How many cases are required to do factor analysis?

    There is no scientific answer to this question, and methodologists differ. Alternative arbitrary "rules of thumb," in descending order of popularity, include those below; a small helper applying them is sketched after the list. These are not mutually exclusive: Bryant and Yarnold, for instance, endorse both STV and the Rule of 200.

    Rule of 10. There should be at least 10 cases for each item in the instrument being used.

    STV ratio. The subjects-to-variables ratio should be no lower than 5 (Bryant and Yarnold, 1995).


    Rule of 100: The number of subjects should be the larger of 5 times the number of

    variables, or 100. Even more subjects are needed when communalities are low and/or few

    variables load on each factor. (Hatcher, 1994)

    Rule of 150: Hutcheson and Sofroniou (1999) recommend at least 150-300 cases, more toward the 150 end when there are a few highly correlated variables, as would be the case when collapsing highly multicollinear variables.
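
    Treating the strictest applicable rule as the floor (one reasonable reading of these guidelines, not the only one), a small helper in Python:

        def minimum_cases(n_items):
            """Rough sample-size floor implied by the rules of thumb listed above."""
            rule_of_10 = 10 * n_items            # at least 10 cases per item
            stv_ratio = 5 * n_items              # subjects-to-variables ratio of at least 5
            rule_of_100 = max(5 * n_items, 100)  # larger of 5 times the variables or 100
            rule_of_150 = 150                    # lower bound of the 150-300 recommendation
            return max(rule_of_10, stv_ratio, rule_of_100, rule_of_150)

        print(minimum_cases(20))   # a 20-item instrument -> 200 cases under the strictest rule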

    Q-5 What is "sampling adequacy" and what is it used for?

    Measured by the Kaiser-Meyer-Olkin (KMO) statistic, sampling adequacy predicts whether data are likely to factor well, based on correlation and partial correlation. In the old days of manual factor analysis this was extremely useful. KMO can still be used, however, to assess which variables to drop from the model because they are too multicollinear.

    There is a KMO statistic for each individual variable, as well as an overall KMO statistic for the set of variables as a whole. KMO varies from 0 to 1.0, and KMO overall should be .60 or higher to proceed with factor analysis. If it is not, drop the indicator variables with the lowest individual KMO statistic values until KMO overall rises above .60.

    To compute KMO overall, the numerator is the sum of squared correlations of all variables in the analysis (except the 1.0 self-correlations of variables with themselves, of course). The denominator is this same sum plus the sum of squared partial correlations of each variable i with each variable j, controlling for others in the analysis. The concept is that the partial correlations should not be very large if one is to expect distinct factors to emerge from factor analysis.
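
    A minimal Python sketch of that computation, assuming R is the correlation matrix of the variables and taking the partial correlations from the inverse of R (the usual anti-image route); the example data are random placeholders:

        import numpy as np

        def kmo_overall(R):
            """Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix R."""
            inv_R = np.linalg.inv(R)
            # Partial correlation of i and j, controlling for all other variables
            d = np.sqrt(np.outer(np.diag(inv_R), np.diag(inv_R)))
            partial = -inv_R / d

            off = ~np.eye(R.shape[0], dtype=bool)   # exclude the 1.0 self-correlations
            r2 = (R[off] ** 2).sum()
            p2 = (partial[off] ** 2).sum()
            return r2 / (r2 + p2)

        rng = np.random.default_rng(1)
        scores = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))  # placeholder, correlated columns
        print(kmo_overall(np.corrcoef(scores, rowvar=False)))         # a value of .60 or higher would indicate adequacy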

    In SPSS, KMO is found under Analyze - Statistics - Data Reduction - Factor - Variables (input variables) - Descriptives - Correlation Matrix - check KMO and Bartlett's test of sphericity, and also check Anti-image - Continue - OK. The KMO output is KMO overall. The diagonal elements of the Anti-image correlation matrix are the KMO individual statistics for each variable.

    Adapted from:

    http://faculty.chass.ncsu.edu/garson/PA765/factspss.htm

    www.sussex.ac.uk/Users/andyf/factor.pdf

    www.cs.uu.nl/docs/vakken/arm/SPSS/spss7.pdf

    Suggested Readings

    Bruce Thompson, Exploratory and Confirmatory Factor Analysis: Understanding

    Concepts and Applications, 2004
