Chapter 13
MULTIVARIATE STATISTICAL ANALYSIS I: Analyzing Criterion-Predictor Association

In the previous chapters, which discussed the analysis of differences and association for two variables, we determined that selecting the appropriate method of analysis requires evaluating the number of dependent and independent variables and their level of measurement. The same process applies to multivariate analysis, as illustrated in Figure 13.1.

Association implies only that two or more variables tend to change together, to a greater or lesser degree. Even if we measure the amount of mutual change and find it persistent in both direction and degree, we may not conclude that a causal relationship necessarily exists, such that one variable is dependent (the effect) and the other variable or variables are independent (the deterministic or probabilistic cause). Association does not imply causation. If a set of variables is causally related, however, the variables will be associated in some way. Analytical techniques such as causal-path analysis and structural equation models have been developed to aid in examining possible causal relations in correlational data. These are discussed in depth by Blalock (1962), Bagozzi (1980), and Monroe and Petroshius (n.d.).

In this and the next chapter, we present a series of brief discussions of individual multivariate techniques. Multivariate analyses arise when more than two variables are to be analyzed at the same time. We begin by extending the simple bivariate regression of the last chapter to include a second predictor variable. The use of multiple predictors allows the researcher to conduct both multiple and partial regression analysis. We next focus on two-group discriminant analysis. Like multiple regression, discriminant analysis is a predictive model; however, the dependent variable is categorical in measurement and defines two or more groups.
In discriminant analysis, group membership (such as purchaser and non-purchaser groups) is predicted using the multiple independent variables. Finally, a brief conceptual introduction is given to a diverse group of techniques, including canonical correlation, correspondence analysis, Chi-Square Automatic Interaction Detection (CHAID), and probit and logit analyses. We start with a broad overview of multivariate procedures.
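The prediction of group membership can be sketched with Fisher's two-group linear discriminant, one common formulation of the technique. The data below (two predictors, such as income and ad exposures, for purchasers and non-purchasers) are entirely hypothetical and chosen only to be cleanly separable:

```python
import numpy as np

# Hypothetical data: two predictors per respondent; first three rows are
# non-purchasers (group 0), last three are purchasers (group 1).
X = np.array([[2.0, 1.0], [3.0, 1.5], [2.5, 2.0],
              [6.0, 4.0], [7.0, 4.5], [6.5, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fisher's two-group discriminant weights: w solves S_w w = (mean1 - mean0),
# where S_w sums the within-group covariance matrices (proportional to the
# pooled estimate when group sizes are equal).
X0, X1 = X[y == 0], X[y == 1]
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

# Classify by projecting each respondent onto w and splitting at the
# midpoint of the two projected group means.
cutoff = (X0.mean(axis=0) @ w + X1.mean(axis=0) @ w) / 2
predicted = (X @ w > cutoff).astype(int)
print(predicted)  # recovers the group labels for this separable toy data
```

For this toy sample the rule reproduces the known group memberships exactly; with real data, the proportion of correctly classified cases is the usual summary of the model's predictive success.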
AN OVERVIEW OF MULTIVARIATE PROCEDURES

The Data Matrix

The raw input to any analysis of associative data consists of the data matrix, whose informational content is to be summarized and portrayed in some way. For example, the mean and standard deviation of a single variable are often computed simply to obtain a summary of the entire set of values. In so doing, we choose to forgo the full information provided by the data in order to understand some of its basic characteristics, such as central tendency and dispersion. In virtually all marketing research studies we are concerned with variation in some characteristic, be it per capita consumption of soft drinks, TV viewing frequency, or customer intention to repurchase. Our objective now, however, is to concentrate on explaining the variation in one variable or group of variables in terms of covariation with other variables.

Scott M. Smith and Gerald S. Albaum, An Introduction to Marketing Research, 2010

When
we analyze associative data, we hope to explain variation, achieving one or more of the following:

1. Determine the overall strength of association between the criterion and predictor variables (often called the dependent and independent variables, respectively)
2. Determine a function or formula by which we can estimate values of the criterion variable(s) from values of the predictor variable(s)
3. Determine the statistical confidence in either or both of the above
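The first two goals above can be sketched with an ordinary least-squares fit of one criterion on two predictors. The data are synthetic and the variable roles (e.g., repurchase intention as the criterion) are illustrative only:

```python
import numpy as np

# Synthetic toy data: rows are respondents, columns are two predictors;
# y is the criterion variable (e.g., a repurchase-intention score).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 8.1, 10.8])

# Goal 2: estimate the function y-hat = b0 + b1*x1 + b2*x2 by least squares.
Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
b, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)

# Goal 1: overall strength of association, the coefficient of
# determination R^2 (proportion of criterion variance explained).
resid = y - Xd @ b
r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
print(b)   # estimated b0, b1, b2
print(r2)  # close to 1 for this nearly linear toy data
```

Goal 3, the statistical confidence in the coefficients and in R^2, would additionally require standard errors and significance tests, which are reported automatically by most statistical packages.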
In some cases of interest, however, we may have no criterion (dependent) variable. We may still be interested in the interdependence of a group of variables as a whole and in the possibility of summarizing the information provided by this interdependence.

Figure 13.1 Process for Selecting the Appropriate Method of Analysis
A Classification of Techniques for Analyzing Associative Data

The field of associative data analysis is vast; hence, it seems useful to enumerate various descriptors by which the field can be classified. A conceptual illustration is shown in Table 13.1. The key notion underlying our classification is the data matrix. A data matrix is often thought of as being in spreadsheet form (referred to as a flat data matrix) and consists of a set of objects (the n rows) and a set of measurements on those objects (the m columns). The row objects may be people, things, concepts, or events. The column variables are characteristics of the objects. The cell values represent the state of object i with respect to variable j. Cell values may consist of nominal-, ordinal-, interval-, or ratio-scaled measurements, or various combinations of these as we go across columns. When using data analysis software, the data file created is structured exactly like the basic data matrix, which is, in effect, a type of spreadsheet. Note, however, that some analysis software can also read relational databases that are
not in the traditional row-by-column spreadsheet format. Even for relational-format data, however, the concepts of objects and variables still apply.

Table 13.1 Illustrative Basic Data Matrix

                      Variable
Object    1     2     3    ...    j    ...    m
  1      X11   X12   X13   ...   X1j   ...   X1m
  2      X21   X22   X23   ...   X2j   ...   X2m
  3      X31   X32   X33   ...   X3j   ...   X3m
  .       .     .     .           .           .
  i      Xi1   Xi2   Xi3   ...   Xij   ...   Xim
  .       .     .     .           .           .
  n      Xn1   Xn2   Xn3   ...   Xnj   ...   Xnm
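In code, the basic data matrix maps directly onto a two-dimensional array, with rows as objects and columns as variables. The small matrix below is hypothetical, and the per-variable mean and standard deviation illustrate the summary-versus-full-information trade-off discussed above:

```python
import numpy as np

# A hypothetical 4x3 basic data matrix: 4 objects (rows, e.g., respondents)
# measured on 3 interval-scaled variables (columns).
data = np.array([[3.0, 12.0, 1.5],
                 [5.0,  9.0, 2.0],
                 [4.0, 15.0, 1.0],
                 [6.0, 10.0, 2.5]])

# Summarizing each column (variable) forgoes the full information in the
# cells in exchange for central tendency and dispersion.
means = data.mean(axis=0)            # one mean per variable
stdevs = data.std(axis=0, ddof=1)    # sample standard deviation per variable
print(means)   # [ 4.5  11.5  1.75]
print(stdevs)
```

This is exactly the structure most analysis software expects in its data file: object i, variable j, cell value X_ij.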
By reflecting on the steps in the research process discussed in Chapter 2, we can gain a perspective on the many available approaches to the analysis of associative data. Although not exhaustive (or mutually exclusive), the following represent the more common perspectives by which analysis activities can be directed:

1. Purpose of the study and types of assertions desired by the researcher: What kinds of statements, descriptive or inferential, does the researcher wish to make?
2. Focus of the research: Is the emphasis on the objects (the whole profile or bundle of variables), on the variables, or on both?
3. Nature of the prior judgments assumed by the researcher as to how the data matrix should be partitioned (subdivided) into a number of subsets of variables
4. Number of variables in each of the partitioned subsets: How many criterion versus predictor variables?
5. Type of association under study: Is the association linear, transformable to linear, or nonlinear?
6. Scales by which the variables are measured: Are the scales nominal, ordinal, interval, ratio, or mixed?
All of these descriptors require certain decisions of the researcher, four of which are of immediate interest in determining the form of analysis:

1. Whether the objects or variables of the data matrix are partitioned into subsets or kept intact
2. If the matrix is partitioned into subsets of criterion and predictor variables, the number of variables in each subset
3. Whether the variables are nominal-scaled or interval-scaled
4. The relationships between the variables (linear, transformable to linear, or nonlinear)

Most decisions about associative data analysis are based on the researcher's private model of how the data are interrelated and what features are useful for study. The choice of
various public models for analysis (multiple regression, discriminant analysis, etc.) is predicated upon prior knowledge of both the characteristics of the statistical universe from which the data were obtained and the assumption structure and objectives associated with each statistical technique.

Fortunately, we can make a few simplifications of the preceding descriptors. First, concerning types of scales, all the multivariate techniques of this book require no stronger measurement than interval scaling. Second, except for the ordinal scaling methods of multidimensional scaling and conjoint analysis (to be discussed in Chapter 13), we shall assume that (a) the variables are either nominal-scaled or interval-scaled, and (b) the functional form is linear in the parameters. While the original data may have been transformed by some nonlinear transformation (e.g., logarithmic), "linear in the parameters" means that all computed parameters after the transformation are of the first degree. For example, the function

Y = b0 + b1X1 + b2X2 + . . . + bmXm

is linear in the parameters. Note that b0 is a constant, while the other parameters, b1, b2, . . ., bm, are all of the first degree. Moreover, none of the bj's depends on the value of either its own Xj or any other Xk (k ≠ j). Even with these simplifying assumptions, we shall be able to describe a wide variety of possible techniques, ranging from relatively simple (as shown in Exhibit 13.1) to complex.

Analysis of Dependence

If we elect to partition the data matrix into criterion and predictor variables, the problem becomes one of analyzing dependence structures. This, in turn, can be broken down into two subcategories:

1. Single criterion/multiple predictor association
2. Multiple criterion/multiple predictor association
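The "linear in the parameters" idea above can be made concrete with a model that is nonlinear in a predictor but still fit by ordinary least squares: Y = b0 + b1 log(X). The data below are synthetic, generated from known parameters (b0 = 2, b1 = 3) plus noise, so the recovered estimates can be checked:

```python
import numpy as np

# Synthetic data from Y = 2 + 3*log(X) + noise; the model is nonlinear
# in X but linear in the parameters b0 and b1.
rng = np.random.default_rng(0)
X = np.linspace(1.0, 50.0, 40)
Y = 2.0 + 3.0 * np.log(X) + rng.normal(0.0, 0.1, X.size)

# After the logarithmic transformation of X, the fit is ordinary
# first-degree least squares, exactly as in the untransformed case.
design = np.column_stack([np.ones(X.size), np.log(X)])
b, _, _, _ = np.linalg.lstsq(design, Y, rcond=None)
print(b)  # estimates of (b0, b1), near the true values (2, 3)
```

The transformation is applied to the data before estimation; the computed parameters themselves remain of the first degree, which is all the linearity assumption requires.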
Multivariate techniques that deal with single criterion/multiple predictor association include multiple regression, analysis of variance and covariance, two-group discriminant analysis, and CHAID (chi-square automatic interaction detection). Multivariate techniques that deal with multiple criterion/multiple predictor association include canonical correlation, multivariate analy