
Applied Multivariate Analysis

Neil H. Timm

SPRINGER

Springer Texts in Statistics

Advisors: George Casella, Stephen Fienberg, Ingram Olkin

Springer
New York  Berlin  Heidelberg  Barcelona  Hong Kong  London  Milan  Paris  Singapore  Tokyo


Neil H. Timm

Applied Multivariate Analysis

With 42 Figures

Neil H. Timm
Department of Education in Psychology
School of Education
University of Pittsburgh
Pittsburgh, PA
[email protected]

Editorial Board
George Casella, Department of Statistics, University of Florida, Gainesville, FL 32611-8545, USA
Stephen Fienberg, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
Ingram Olkin, Department of Statistics, Stanford University, Stanford, CA 94305, USA

Library of Congress Cataloging-in-Publication Data
Timm, Neil H.
Applied multivariate analysis / Neil H. Timm.
p. cm. (Springer texts in statistics)
Includes bibliographical references and index.
ISBN 0-387-95347-7 (alk. paper)
1. Multivariate analysis. I. Title. II. Series.
QA278 .T53 2002
519.535 dc21    2001049267

ISBN 0-387-95347-7 Printed on acid-free paper.

© 2002 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.

9 8 7 6 5 4 3 2 1 SPIN 10848751

www.springer-ny.com

Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH

To my wife Verena


Preface

Univariate statistical analysis is concerned with techniques for the analysis of a single random variable. This book is about applied multivariate analysis. It was written to provide students and researchers with an introduction to statistical techniques for the analysis of continuous quantitative measurements on several random variables simultaneously. While quantitative measurements may be obtained from any population, the material in this text is primarily concerned with techniques useful for the analysis of continuous observations from multivariate normal populations with linear structure. While several multivariate methods are extensions of univariate procedures, a unique feature of multivariate data analysis techniques is their ability to control experimental error at an exact nominal level and to provide information on the covariance structure of the data. These features tend to enhance statistical inference, making multivariate data analysis superior to univariate analysis.

While in a previous edition of my textbook on multivariate analysis, I tried to precede a multivariate method with a corresponding univariate procedure when applicable, I have not taken this approach here. Instead, it is assumed that the reader has taken basic courses in multiple linear regression, analysis of variance, and experimental design. While students may be familiar with vector spaces and matrices, important results essential to multivariate analysis are reviewed in Chapter 2. I have avoided the use of calculus in this text. Emphasis is on applications to provide students in the behavioral, biological, physical, and social sciences with a broad range of linear multivariate models for statistical estimation and inference, and exploratory data analysis procedures useful for investigating relationships among a set of structured variables. Examples have been selected to outline the process one employs in data analysis for checking model assumptions and model development, and for exploring patterns that may exist in one or more dimensions of a data set.

To successfully apply methods of multivariate analysis, a comprehensive understanding of the theory and how it relates to a flexible statistical package used for the analysis has become critical. When statistical routines were being developed for multivariate data analysis over twenty years ago, developing a text using a single comprehensive statistical package was risky. Now, companies and software packages have stabilized, thus reducing the risk. I have made extensive use of the Statistical Analysis System (SAS) in this text. All examples have been prepared using Version 8 for Windows. Standard SAS procedures have been used whenever possible to illustrate basic multivariate methodologies; however, a few illustrations depend on the Interactive Matrix Language (IML) procedure. All routines and data sets used in the text are contained on the Springer-Verlag Web site, http://www.springer-ny.com/detail.tpl?ISBN=0387953477, and the author's University of Pittsburgh Web site, http://www.pitt.edu/timm.

Acknowledgments

The preparation of this text has evolved from teaching courses and seminars in applied multivariate statistics at the University of Pittsburgh. I am grateful to the University of Pittsburgh for giving me the opportunity to complete this work. I would like to express my thanks to the many students who have read, criticized, and corrected various versions of early drafts of my notes and lectures on the topics included in this text. I am indebted to them for their critical readings and their thoughtful suggestions. My deepest appreciation and thanks are extended to my former student Dr. Tammy A. Mieczkowski, who read the entire manuscript and offered many suggestions for improving the presentation. I also wish to thank the anonymous reviewers who provided detailed comments on early drafts of the manuscript, which helped to improve the presentation. However, I am responsible for any errors or omissions of the material included in this text. I also want to express special thanks to John Kimmel at Springer-Verlag. Without his encouragement and support, this book would not have been written.

This book was typed using Scientific WorkPlace Version 3.0. I wish to thank Dr. Melissa Harrison of Far Field Associates, who helped with the LaTeX commands used to format the book and with the development of the author and subject indexes. This book has taken several years to develop, and during its development it went through several revisions. The preparation of the entire manuscript and every revision was performed with great care and patience by Mrs. Roberta S. Allan, to whom I am most grateful. I am also especially grateful to the SAS Institute for permission to use the Statistical Analysis System (SAS) in this text. Many of the large data sets analyzed in this book were obtained from the Data and Story Library (DASL) sponsored by Cornell University and hosted by the Department of Statistics at Carnegie Mellon University (http://lib.stat.cmu.edu/DASL/). I wish to extend my thanks and appreciation to these institutions for making available these data sets for statistical analysis. I would also like to thank the authors and publishers of copyrighted material for making available the statistical tables and many of the data sets used in this book.

Finally, I extend my love, gratitude, and appreciation to my wife Verena for her patience, love, support, and continued encouragement throughout this project.

Neil H. Timm, Professor
University of Pittsburgh

Contents

Preface
Acknowledgments
List of Tables
List of Figures

1 Introduction
   1.1 Overview
   1.2 Multivariate Models and Methods
   1.3 Scope of the Book

2 Vectors and Matrices
   2.1 Introduction
   2.2 Vectors, Vector Spaces, and Vector Subspaces
      a. Vectors
      b. Vector Spaces
      c. Vector Subspaces
   2.3 Bases, Vector Norms, and the Algebra of Vector Spaces
      a. Bases
      b. Lengths, Distances, and Angles
      c. Gram-Schmidt Orthogonalization Process
      d. Orthogonal Spaces
      e. Vector Inequalities, Vector Norms, and Statistical Distance
   2.4 Basic Matrix Operations
      a. Equality, Addition, and Multiplication of Matrices
      b. Matrix Transposition
      c. Some Special Matrices
      d. Trace and the Euclidean Matrix Norm
      e. Kronecker and Hadamard Products
      f. Direct Sums
      g. The Vec() and Vech() Operators
   2.5 Rank, Inverse, and Determinant
      a. Rank and Inverse
      b. Generalized Inverses
      c. Determinants
   2.6 Systems of Equations, Transformations, and Quadratic Forms
      a. Systems of Equations
      b. Linear Transformations
      c. Projection Transformations
      d. Eigenvalues and Eigenvectors
      e. Matrix Norms
      f. Quadratic Forms and Extrema
      g. Generalized Projectors
   2.7 Limits and Asymptotics

3 Multivariate Distributions and the Linear Model
   3.1 Introduction
   3.2 Random Vectors and Matrices
   3.3 The Multivariate Normal (MVN) Distribution
      a. Properties of the Multivariate Normal Distribution
      b. Estimating μ and Σ
      c. The Matrix Normal Distribution
   3.4 The Chi-Square and Wishart Distributions
      a. Chi-Square Distribution
      b. The Wishart Distribution
   3.5 Other Multivariate Distributions
      a. The Univariate t and F Distributions
      b. Hotelling's T² Distribution
      c. The Beta Distribution
      d. Multivariate t, F, and χ² Distributions
   3.6 The General Linear Model
      a. Regression, ANOVA, and ANCOVA Models
      b. Multivariate Regression, MANOVA, and MANCOVA Models
      c. The Seemingly Unrelated Regression (SUR) Model
      d. The General MANOVA Model (GMANOVA)
   3.7 Evaluating Normality
   3.8 Tests of Covariance Matrices
      a. Tests of Covariance Matrices
      b. Equality of Covariance Matrices
      c. Testing for a Specific Covariance Matrix
      d. Testing for Compound Symmetry
      e. Tests of Sphericity
      f. Tests of Independence
      g. Tests for Linear Structure
   3.9 Tests of Location
      a. Two-Sample Case, Σ₁ = Σ₂ = Σ
      b. Two-Sample Case, Σ₁ ≠ Σ₂
      c. Two-Sample Case, Nonnormality
      d. Profile Analysis, One Group
      e. Profile Analysis, Two Groups
      f. Profile Analysis, Σ₁ ≠ Σ₂
   3.10 Univariate Profile Analysis
      a. Univariate One-Group Profile Analysis
      b. Univariate Two-Group Profile Analysis
   3.11 Power Calculations

4 Multivariate Regression Models
   4.1 Introduction
   4.2 Multivariate Regression
      a. Multiple Linear Regression
      b. Multivariate Regression Estimation and Testing Hypotheses
      c. Multivariate Influence Measures
      d. Measures of Association, Variable Selection and Lack-of-Fit Tests
      e. Simultaneous Confidence Sets for a New Observation y_new and the Elements of B
      f. Random X Matrix and Model Validation: Mean Squared Error of Prediction in Multivariate Regression
      g. Exogeneity in Regression
   4.3 Multivariate Regression Example
   4.4 One-Way MANOVA and MANCOVA
      a. One-Way MANOVA
      b. One-Way MANCOVA
      c. Simultaneous Test Procedures (STP) for One-Way MANOVA/MANCOVA
   4.5 One-Way MANOVA/MANCOVA Examples
      a. MANOVA (Example 4.5.1)
      b. MANCOVA (Example 4.5.2)
   4.6 MANOVA/MANCOVA with Unequal Σᵢ or Nonnormal Data
   4.7 One-Way MANOVA with Unequal Σᵢ Example
   4.8 Two-Way MANOVA/MANCOVA
      a. Two-Way MANOVA with Interaction
      b. Additive Two-Way MANOVA
      c. Two-Way MANCOVA
      d. Tests of Nonadditivity
   4.9 Two-Way MANOVA/MANCOVA Example
      a. Two-Way MANOVA (Example 4.9.1)
      b. Two-Way MANCOVA (Example 4.9.2)
   4.10 Nonorthogonal Two-Way MANOVA Designs
      a. Nonorthogonal Two-Way MANOVA Designs with and Without Empty Cells, and Interaction
      b. Additive Two-Way MANOVA Designs With Empty Cells
   4.11 Unbalanced, Nonorthogonal Designs Example
   4.12 Higher Ordered Fixed Effect, Nested and Other Designs
   4.13 Complex Design Examples
      a. Nested Design (Example 4.13.1)
      b. Latin Square Design (Example 4.13.2)
   4.14 Repeated Measurement Designs
      a. One-Way Repeated Measures Design
      b. Extended Linear Hypotheses
   4.15 Repeated Measurements and Extended Linear Hypotheses Example
      a. Repeated Measures (Example 4.15.1)
      b. Extended Linear Hypotheses (Example 4.15.2)
   4.16 Robustness and Power Analysis for MR Models
   4.17 Power Calculations: Power.sas
   4.18 Testing for Mean Differences with Unequal Covariance Matrices

5 Seemingly Unrelated Regression Models
   5.1 Introduction
   5.2 The SUR Model
      a. Estimation and Hypothesis Testing
      b. Prediction
   5.3 Seemingly Unrelated Regression Example
   5.4 The CGMANOVA Model
   5.5 CGMANOVA Example
   5.6 The GMANOVA Model
      a. Overview
      b. Estimation and Hypothesis Testing
      c. Test of Fit
      d. Subsets of Covariates
      e. GMANOVA vs SUR
      f. Missing Data
   5.7 GMANOVA Example
      a. One Group Design (Example 5.7.1)
      b. Two Group Design (Example 5.7.2)
   5.8 Tests of Nonadditivity
   5.9 Testing for Nonadditivity Example
   5.10 Lack of Fit Test
   5.11 Sum of Profile Designs
   5.12 The Multivariate SUR (MSUR) Model
   5.13 Sum of Profile Example
   5.14 Testing Model Specification in SUR Models
   5.15 Miscellanea

6 Multivariate Random and Mixed Models
   6.1 Introduction
   6.2 Random Coefficient Regression Models
      a. Model Specification
      b. Estimating the Parameters
      c. Hypothesis Testing
   6.3 Univariate General Linear Mixed Models
      a. Model Specification
      b. Covariance Structures and Model Fit
      c. Model Checking
      d. Balanced Variance Component Experimental Design Models
      e. Multilevel Hierarchical Models
      f. Prediction
   6.4 Mixed Model Examples
      a. Random Coefficient Regression (Example 6.4.1)
      b. Generalized Randomized Block Design (Example 6.4.2)
      c. Repeated Measurements (Example 6.4.3)
      d. HLM Model (Example 6.4.4)
   6.5 Mixed Multivariate Models
      a. Model Specification
      b. Hypothesis Testing
      c. Evaluating Expected Mean Square
      d. Estimating the Mean
      e. Repeated Measurements Model
   6.6 Balanced Mixed Multivariate Models Examples
      a. Two-way Mixed MANOVA
      b. Multivariate Split-Plot Design
   6.7 Double Multivariate Model (DMM)
   6.8 Double Multivariate Model Examples
      a. Double Multivariate MANOVA (Example 6.8.1)
      b. Split-Plot Design (Example 6.8.2)
   6.9 Multivariate Hierarchical Linear Models
   6.10 Tests of Means with Unequal Covariance Matrices

7 Discriminant and Classification Analysis
   7.1 Introduction
   7.2 Two Group Discrimination and Classification
      a. Fisher's Linear Discriminant Function
      b. Testing Discriminant Function Coefficients
      c. Classification Rules
      d. Evaluating Classification Rules
   7.3 Two Group Discriminant Analysis Example
      a. Egyptian Skull Data (Example 7.3.1)
      b. Brain Size (Example 7.3.2)
   7.4 Multiple Group Discrimination and Classification
      a. Fisher's Linear Discriminant Function
      b. Testing Discriminant Functions for Significance
      c. Variable Selection
      d. Classification Rules
      e. Logistic Discrimination and Other Topics
   7.5 Multiple Group Discriminant Analysis Example

8 Principal Component, Canonical Correlation, and Exploratory Factor Analysis
   8.1 Introduction
   8.2 Principal Component Analysis
      a. Population Model for PCA
      b. Number of Components and Component Structure
      c. Principal Components with Covariates
      d. Sample PCA
      e. Plotting Components
      f. Additional Comments
      g. Outlier Detection
   8.3 Principal Component Analysis Examples
      a. Test Battery (Example 8.3.1)
      b. Semantic Differential Ratings (Example 8.3.2)
      c. Performance Assessment Program (Example 8.3.3)
   8.4 Statistical Tests in Principal Component Analysis
      a. Tests Using the Covariance Matrix
      b. Tests Using a Correlation Matrix
   8.5 Regression on Principal Components
      a. GMANOVA Model
      b. The PCA Model
   8.6 Multivariate Regression on Principal Components Example
   8.7 Canonical Correlation Analysis
      a. Population Model for CCA
      b. Sample CCA
      c. Tests of Significance
      d. Association and Redundancy
      e. Partial, Part and Bipartial Canonical Correlation
      f. Predictive Validity in Multivariate Regression using CCA
      g. Variable Selection and Generalized Constrained CCA
   8.8 Canonical Correlation Analysis Examples
      a. Rohwer CCA (Example 8.8.1)
      b. Partial and Part CCA (Example 8.8.2)
   8.9 Exploratory Factor Analysis
      a. Population Model for EFA
      b. Estimating Model Parameters
      c. Determining Model Fit
      d. Factor Rotation
      e. Estimating Factor Scores
      f. Additional Comments
   8.10 Exploratory Factor Analysis Examples
      a. Performance Assessment Program (PAP, Example 8.10.1)
      b. Di Vesta and Walls (Example 8.10.2)
      c. Shin (Example 8.10.3)

9 Cluster Analysis and Multidimensional Scaling
   9.1 Introduction
   9.2 Proximity Measures
      a. Dissimilarity Measures
      b. Similarity Measures
      c. Clustering Variables
   9.3 Cluster Analysis
      a. Agglomerative Hierarchical Clustering Methods
      b. Nonhierarchical Clustering Methods
      c. Number of Clusters
      d. Additional Comments
   9.4 Cluster Analysis Examples
      a. Protein Consumption (Example 9.4.1)
      b. Nonhierarchical Method (Example 9.4.2)
      c. Teacher Perception (Example 9.4.3)
      d. Cedar Project (Example 9.4.4)
   9.5 Multidimensional Scaling
      a. Classical Metric Scaling
      b. Nonmetric Scaling
      c. Additional Comments
   9.6 Multidimensional Scaling Examples
      a. Classical Metric Scaling (Example 9.6.1)
      b. Teacher Perception (Example 9.6.2)
      c. Nation (Example 9.6.3)

10 Structural Equation Models
   10.1 Introduction
   10.2 Path Diagrams, Basic Notation, and the General Approach
   10.3 Confirmatory Factor Analysis
   10.4 Confirmatory Factor Analysis Examples
      a. Performance Assessment 3-Factor Model (Example 10.4.1)
      b. Performance Assessment 5-Factor Model (Example 10.4.2)
   10.5 Path Analysis
   10.6 Path Analysis Examples
      a. Community Structure and Industrial Conflict (Example 10.6.1)
      b. Nonrecursive Model (Example 10.6.2)
   10.7 Structural Equations with Manifest and Latent Variables
   10.8 Structural Equations with Manifest and Latent Variables Example
   10.9 Longitudinal Analysis with Latent Variables
   10.10 Exogeneity in Structural Equation Models

Appendix

References

Author Index

Subject Index

List of Tables

3.7.1 Univariate and Multivariate Normality Tests, Normal Data, Data Set A, Group 1
3.7.2 Univariate and Multivariate Normality Tests, Non-normal Data, Data Set C, Group 1
3.7.3 Ramus Bone Length Data
3.7.4 Effects of Delay on Oral Practice
3.8.1 Box's Test of Σ₁ = Σ₂, χ² Approximation
3.8.2 Box's Test of Σ₁ = Σ₂, F Approximation
3.8.3 Box's Test of Σ₁ = Σ₂, χ², Data Set B
3.8.4 Box's Test of Σ₁ = Σ₂, χ², Data Set C
3.8.5 Test of Specific Covariance Matrix, Chi-Square Approximation
3.8.6 Test of Compound Symmetry, χ² Approximation
3.8.7 Test of Sphericity and Circularity, χ² Approximation
3.8.8 Test of Sphericity and Circularity in k Populations
3.8.9 Test of Independence, χ² Approximation
3.8.10 Test of Multivariate Sphericity Using Chi-Square and Adjusted Chi-Square Statistics
3.9.1 MANOVA Test Criteria for Testing μ₁ = μ₂
3.9.2 Discriminant Structure Vectors, H: μ₁ = μ₂
3.9.3 T² Test of H_C: μ₁ = μ₂ = μ₃
3.9.4 Two-Group Profile Analysis
3.9.5 MANOVA Table: Two-Group Profile Analysis
3.9.6 Two-Group Instructional Data
3.9.7 Sample Data: One-Sample Profile Analysis
3.9.8 Sample Data: Two-Sample Profile Analysis
3.9.9 Problem Solving Ability Data
4.2.1 MANOVA Table for Testing B1 = 0
4.2.2 MANOVA Table for Lack of Fit Test
4.3.1 Rohwer Dataset
4.3.2 Rohwer Data for Low SES Area
4.4.1 One-Way MANOVA Table
4.5.1 Sample Data One-Way MANOVA
4.5.2 FIT Analysis
4.5.3 Teaching Methods
4.9.1 Two-Way MANOVA
4.9.2 Cell Means for Example Data
4.9.3 Two-Way MANOVA Table
4.9.4 Two-Way MANCOVA
4.10.1 Non-Additive Connected Data Design
4.10.2 Non-Additive Disconnected Design
4.10.3 Type IV Hypotheses for A and B for the Connected Design in Table 4.10.1
4.11.1 Nonorthogonal Design
4.11.2 Data for Exercise 1
4.13.1 Multivariate Nested Design
4.13.2 MANOVA for Nested Design
4.13.3 Multivariate Latin Square
4.13.4 Box Tire Wear Data
4.15.1 Edwards Repeated Measures Data
4.17.1 Power Calculations
4.17.2 Power Calculations 1
5.5.1 SUR Model Tests for Edwards Data
6.3.1 Structured Covariance Matrix
6.4.1 Pharmaceutical Stability Data
6.4.2 CGRB Design (Milliken and Johnson, 1992, p. 285)
6.4.3 ANOVA Table for Nonorthogonal CGRB Design
6.4.4 Drug Effects Repeated Measures Design
6.4.5 ANOVA Table Repeated Measurements
6.5.1 Multivariate Repeated Measurements
6.6.1 Expected Mean Square Matrix
6.6.2 Individual Measurements Utilized to Assess the Changes in the Vertical Position and Angle of the Mandible at Three Occasions
6.6.3 Expected Mean Squares for Model (6.5.17)
6.6.4 MMM Analysis, Zullo's Data
6.6.5 Summary of Univariate Output
6.8.1 DMM Results, Dr. Zullo's Data
6.8.2 Factorial Structure Data
6.8.3 ANOVA for Split-Split Plot Design, Σ Unknown, Kronecker Structure
6.8.4 ANOVA for Split-Split Plot Design, Σ Compound Symmetry Structure
6.8.5 MANOVA for Split-Split Plot Design, Σ Unknown Structure
7.2.1 Classification/Confusion Table
7.3.1 Discriminant Structure Vectors, H: μ₁ = μ₂
7.3.2 Discriminant Functions
7.3.3 Skull Data Classification/Confusion Table
7.3.4 Willerman et al. (1991) Brain Size Data
7.3.5 Discriminant Structure Vectors, H: μ₁ = μ₂
7.5.1 Discriminant Structure Vectors, H: μ₁ = μ₂ = μ₃
7.5.2 Squared Mahalanobis Distances, Flea Beetles, H: μ₁ = μ₂ = μ₃
7.5.3 Fisher's LDFs for Flea Beetles
7.5.4 Classification/Confusion Matrix for Species
8.2.1 Principal Component Loadings
8.2.2 Principal Component Covariance Loadings (Pattern Matrix)
8.2.3 Principal Components Correlation Structure
8.2.4 Partial Principal Components
8.3.1 Matrix of Intercorrelations Among IQ, Creativity, and Achievement Variables
8.3.2 Summary of Principal-Component Analysis Using 13 × 13 Correlation Matrix
8.3.3 Intercorrelations of Ratings Among the Semantic Differential Scale
8.3.4 Summary of Principal-Component Analysis Using 8 × 8 Correlation Matrix
8.3.5 Covariance Matrix of Ratings on Semantic Differential Scales
8.3.6 Summary of Principal-Component Analysis Using 8 × 8 Covariance Matrix
8.3.7 PAP Covariance Matrix
8.3.8 Components Using S in PAP Study
8.3.9 PAP Components Using R in PAP Study
8.3.10 Project Talent Correlation Matrix
8.7.1 Canonical Correlation Analysis
8.10.1 PAP Factors
8.10.2 Correlation Matrix of 10 Audiovisual Variables
8.10.3 Correlation Matrix of 13 Audiovisual Variables (excluding diagonal)
9.2.1 Matching Schemes
9.4.1 Protein Consumption in Europe
9.4.2 Protein Data Cluster Choices Criteria
9.4.3 Protein Consumption, Comparison of Hierarchical Clustering Methods
9.4.4 Geographic Regions for Random Seeds
9.4.5 Protein Consumption, Comparison of Nonhierarchical Clustering Methods
9.4.6 Item Clusters for Perception Data
9.6.1 Road Mileages for Cities
9.6.2 Metric EFA Solution for Gamma Matrix
9.6.3 Mean Similarity Ratings for Twelve Nations
10.2.1 SEM Symbols
10.4.1 3-Factor PAP Standardized Model
10.4.2 5-Factor PAP Standardized Model
10.5.1 Path Analysis: Direct, Indirect and Total Effects
10.6.1 CALIS Output, Revised Model
10.6.2 Revised Socioeconomic Status Model
10.8.1 Correlation Matrix for Peer-Influence Model

List of Figures

2.3.1 Orthogonal Projection of y on x, Pₓy
2.3.2 The Orthocomplement of S Relative to V, V/S
2.3.3 The Orthogonal Decomposition of V for the ANOVA
2.6.1 Fixed-Vector Transformation
2.6.2 ‖y‖² = ‖P_{Vr} y‖² + ‖P_{Vnr} y‖²
3.3.1 z′Σ⁻¹z = z₁² − z₁z₂ + z₂² = 1
3.7.1 Chi-Square Plot of Normal Data in Set A, Group 1
3.7.2 Beta Plot of Normal Data in Data Set A, Group 1
3.7.3 Chi-Square Plot of Non-normal Data in Data Set C, Group 2
3.7.4 Beta Plot of Non-normal Data in Data Set C, Group 2
3.7.5 Ramus Data Chi-Square Plot
4.8.1 3 × 2 Design
4.9.1 Plots of Cell Means for Two-Way MANOVA
4.15.1 Plot of Means, Edwards Data
7.4.1 Plot of Discriminant Functions
7.5.1 Plot of Flea Beetles Data in the Discriminant Space
8.2.1 Ideal Scree Plot
8.3.1 Scree Plot of Eigenvalues, Shin Data
8.3.2 Plot of First Two Components Using S
8.7.1 Venn Diagram of Total Variance
9.2.1 2 × 2 Contingency Table, Binary Variables
9.3.1 Dendrogram for Hierarchical Cluster
9.3.2 Dendrogram for Single Link Example
9.3.3 Dendrogram for Complete Link Example
9.5.1 Scatter Plot of Distance Versus Dissimilarities, Given the Monotonicity Constraint
9.5.2 Scatter Plot of Distance Versus Dissimilarities, When the Monotonicity Constraint Is Violated
9.6.1 MDS Configuration Plot of Four U.S. Cities
9.6.2 MDS Two-Dimensional Configuration, Perception Data
9.6.3 MDS Three-Dimensional Configuration, Perception Data
9.6.4 MDS Three-Dimensional Solution, Nations Data
10.2.1 Path Analysis Diagram
10.3.1 Two Factor EFA Path Diagram
10.4.1 3-Factor PAP Model
10.5.1 Recursive and Nonrecursive Models
10.6.1 Lincoln's Strike Activity Model in SMSAs
10.6.2 CALIS Model for Eq. (10.6.2)
10.6.3 Lincoln's Standardized Strike Activity Model Fit by CALIS
10.6.4 Revised CALIS Model with Signs
10.6.5 Socioeconomic Status Model
10.8.1 Models for Alienation Stability
10.8.2 Duncan-Haller-Portes Peer-Influence Model
10.9.1 Growth with Latent Variables

1 Introduction

1.1 Overview

In this book we present applied multivariate data analysis methods for making inferences regarding the mean and covariance structure of several variables, for modeling relationships among variables, and for exploring data patterns that may exist in one or more dimensions of the data. The methods presented in the book usually involve analysis of data consisting of n observations on p variables and one or more groups. As with univariate data analysis, we assume that the data are a random sample from the population of interest and we usually assume that the underlying probability distribution of the population is the multivariate normal (MVN) distribution. The purpose of this book is to provide students with a broad overview of methods useful in applied multivariate analysis. The presentation integrates theory and practice, covering both formal linear multivariate models and exploratory data analysis techniques.

While there are numerous commercial software packages available for descriptive and inferential analysis of multivariate data, such as SPSS™, S-Plus™, Minitab™, and SYSTAT™, among others, we have chosen to make exclusive use of SAS™, Version 8 for Windows.

1.2 Multivariate Models and Methods

Multivariate analysis techniques are useful when observations are obtained for each of a number of subjects on a set of variables of interest, the dependent variables, and one wants to relate these variables to another set of variables, the independent variables. The data collected are usually displayed in a matrix where the rows represent the observations and the columns the variables. The n × p data matrix Y usually represents the dependent variables and the n × q matrix X the independent variables.
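To make this layout concrete, a minimal SAS/IML sketch is given below; the particular numbers and the choice of n = 3 subjects, p = 2 dependent variables, and q = 2 independent variables are hypothetical and not taken from the text.

    proc iml;
       /* hypothetical scores: each row is one of the n = 3 subjects            */
       /* Y holds the p = 2 dependent variables, X the q = 2 independent ones   */
       Y = {10 12,
            11 14,
             9 13};
       X = {1 25,
            1 31,
            1 28};
       n = nrow(Y);
       p = ncol(Y);
       q = ncol(X);
       print n p q;
    quit;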

When the multivariate responses are samples from one or more populations, one often first makes an assumption that the sample is from a multivariate probability distribution. In this text, the multivariate probability distribution is most often assumed to be the multivariate normal (MVN) distribution. Simple models usually have one or more means μᵢ and covariance matrices Σᵢ.

One goal of model formulation is to estimate the model parameters and to test hypotheses regarding their equality. Assuming the covariance matrices are unstructured and unknown, one may develop methods to test hypotheses regarding fixed means. Unlike univariate analysis, if one finds that the means are unequal one does not know whether the differences are in one dimension, two dimensions, or a higher dimension. The process of locating the dimension of maximal separation is called discriminant function analysis. In models to evaluate the equality of mean vectors, the independent variables merely indicate group membership, and are categorical in nature. They are also considered to be fixed and nonrandom. To expand this model to more complex models, one may formulate a linear model allowing the independent variables to be nonrandom and contain either continuous or categorical variables. The general class of multivariate techniques used in this case are called linear multivariate regression (MR) models. Special cases of the MR model include multivariate analysis of variance (MANOVA) models and multivariate analysis of covariance (MANCOVA) models.

In MR models, the same set of independent variables, X, is used to model the set of dependent variables, Y. Models which allow one to fit each dependent variable with a different set of independent variables are called seemingly unrelated regression (SUR) models. Modeling several sets of dependent variables with different sets of independent variables involves multivariate seemingly unrelated regression (MSUR) models. Oftentimes, a model is overspecified in that not all linear combinations of the independent set are needed to explain the variation in the dependent set. These models are called linear multivariate reduced rank regression (MRR) models. One may also extend MRR models to seemingly unrelated regression models with reduced rank (RRSUR) models. Another name often associated with the SUR model is the completely general MANOVA (CGMANOVA) model, since growth curve models (GMANOVA) and more general growth curve (MGGC) models are special cases of the SUR model. In all these models, the covariance structure of Y is unconstrained and unstructured.

In formulating MR models, the dependent variables are represented as a linear structure of both fixed parameters and fixed independent variables. Allowing the variables to remain fixed and the parameters to be a function of both random and fixed parameters leads to classes of linear multivariate mixed models (MMM). These models impose a structure on Σ so that both the means and the variance and covariance components of Σ are estimated. Models included in this general class are random coefficient models, multilevel models, variance component models, panel analysis models, and models used to analyze covariance structures. Thus, in these models, one is usually interested in estimating both the mean and the covariance structure of a model simultaneously.


A general class of models that define the dependent and independent variables as random, but relate the variables using fixed parameters, are the class of linear structural relation (LISREL) models or structural equation models (SEM). In these models, the variables may be both observed and latent. Included in this class of models are path analysis, factor analysis, simultaneous equation models, simplex models, circumplex models, and numerous test theory models. These models are used primarily to estimate the covariance structure in the data. The mean structure is often assumed to be zero.

Other general classes of multivariate models that rely on multivariate normal theory include multivariate time series models, nonlinear multivariate models, and others. When the dependent variables are categorical rather than continuous, one can consider using multinomial logit or probit models or latent class models. When the data matrix contains n subjects (examinees) and p variables (test items), the modeling of test results for a group of examinees is called item response modeling.

Sometimes with multivariate data one is interested in trying to uncover the structure or data patterns that may exist. One may wish to uncover dependencies both within a set of variables and uncover dependencies with other variables. One may also utilize graphical methods to represent the data relationships. The most basic displays are scatter plots or a scatter plot matrix involving two or three variables simultaneously. Profile plots, star plots, glyph plots, biplots, sunburst plots, contour plots, Chernoff faces, and Andrews Fourier plots can also be utilized to display multivariate data.

Because it is very difficult to detect and describe relationships among variables in large dimensional spaces, several multivariate techniques have been designed to reduce the dimensionality of the data. Two commonly used data reduction techniques include principal component analysis and canonical correlation analysis. When one has a set of dissimilarity or similarity measures to describe relationships, multidimensional scaling techniques are frequently utilized. When the data are categorical, the methods of correspondence analysis, multiple correspondence analysis, and joint correspondence analysis are used to geometrically interpret and visualize categorical data.

Another problem frequently encountered in multivariate data analysis is to categorize objects into clusters. Multivariate techniques that are used to classify or cluster objects into categories include cluster analysis, classification and regression trees (CART), classification analysis and neural networks, among others.

1.3 Scope of the Book

In reviewing applied multivariate methodologies, one observes that several procedures are model oriented and have the assumption of an underlying probability distribution. Other methodologies are exploratory and are designed to investigate relationships among the multivariables in order to visualize, describe, classify, or reduce the information under analysis. In this text, we have tried to address both aspects of applied multivariate analysis. While Chapter 2 reviews basic vector and matrix algebra critical to the manipulation of multivariate data, Chapter 3 reviews the theory of linear models, and Chapters 4-6 and 10 address standard multivariate model based methods. Chapters 7-9 include several frequently used exploratory multivariate methodologies.

The material contained in this text may be used for either a one-semester course in applied multivariate analysis for nonstatistics majors or as a two-semester course on multivariate analysis with applications for majors in applied statistics or research methodology. The material contained in the book has been used at the University of Pittsburgh with both formats. For the two-semester course, the material contained in Chapters 1-4, selections from Chapters 5 and 6, and Chapters 7-9 are covered. For the one-semester course, Chapters 1-3 are covered; however, the remaining topics covered in the course are selected from the text based on the interests of the students for the given semester. Sequences have included the addition of Chapters 4-6, or the addition of Chapters 7-10, while others have included selected topics from Chapters 4-10. Other designs using the text are also possible. No text on applied multivariate analysis can discuss all of the multivariate methodologies available to researchers and applied statisticians. The field has made tremendous advances in recent years. However, we feel that the topics discussed here will help applied professionals and academic researchers enhance their understanding of several topics useful in applied multivariate data analysis using the Statistical Analysis System (SAS), Version 8 for Windows.

All examples in the text are illustrated using procedures in base SAS, SAS/STAT, and SAS/ETS. In addition, features in SAS/INSIGHT, SAS/IML, and SAS/GRAPH are utilized. All programs and data sets used in the examples may be downloaded from the Springer-Verlag Web site, http://www.springer.com/editorial/authors.html. The programs and data sets are also available at the author's University of Pittsburgh Web site, http://www.pitt.edu/timm. A list of the SAS programs, with the implied extension .sas, discussed in the text follows.

Chapter 3: Multinorm, Norm, m3_7_1, m3_7_2, Box-Cox, Ramus, Unorm, m3_8_1, m3_8_7, m3_9a, m3_9d, m3_9e, m3_9f, m3_10a, m3_10b, m3_11_1

Chapter 4: m4_3_1, MulSubSel, m4_5_1, m4_5_1a, m4_5_2, m4_7_1, m4_9_1, m4_9_2, m4_11_1, m4_13_1a, m4_13_1b, m4_15_1, Power, m4_17_1

Chapter 5: m5_3_1, m5_5_1, m5_5_2, m5_7_1, m5_7_2, m5_9_1, m5_9_2, m5_13_1, m5_14_1

Chapter 6: m6_4_1, m6_4_2, m6_4_3, m6_6_1, m6_6_2, m6_8_1, m6_8_2

Chapter 7: m7_3_1, m7_3_2, m7_5_1

Chapter 8: m8_2_1, m8_2_2, m8_3_1, m8_3_2, m8_3_3, m8_6_1, m8_8_1, m8_8_2, m8_10_1, m8_10_2, m8_10_3

Chapter 9: m9_4_1, m9_4_2, m9_4_3, m9_4_3a, m9_4_4, m9_6_1, m9_6_2, m9_6_3

Chapter 10: m10_4_1, m10_4_2, m10_6_1, m10_6_2, m10_8_1

Other: Xmacro, Distnew

Also included on the Web site is the Fortran program Fit.For and the associated manual, Fit-Manual.ps, a postscript file. All data sets used in the examples and some of the exercises are also included on the Web site; they are denoted with the extension .dat. Other data sets used in some of the exercises are available from the Data and Story Library (DASL) Web site, http://lib.stat.cmu.edu/DASL/. The library is hosted by the Department of Statistics at Carnegie Mellon University, Pittsburgh, Pennsylvania.


2 Vectors and Matrices

2.1 Introduction

In this chapter, we review the fundamental operations of vectors and matrices useful in statistics. The purpose of the chapter is to introduce basic concepts and formulas essential to the understanding of data representation, data manipulation, model building, and model evaluation in applied multivariate analysis. The field of mathematics that deals with vectors and matrices is called linear algebra; numerous texts have been written about the applications of linear algebra and calculus in statistics. In particular, books by Carroll and Green (1997), Dhrymes (2000), Graybill (1983), Harville (1997), Khuri (1993), Magnus and Neudecker (1999), Schott (1997), and Searle (1982) show how vectors and matrices are useful in applied statistics. Because the results in this chapter are to provide the reader with the basic knowledge of vector spaces and matrix algebra, results are presented without proof.

2.2 Vectors, Vector Spaces, and Vector Subspaces

a. Vectors

Fundamental to multivariate analysis is the collection of observations for d variables. The d values of the observations are organized into a meaningful arrangement of d real¹ numbers, called a vector (also called a d-variate response or a multivariate vector-valued observation). Letting yi denote the ith observation, where i goes from 1 to d, the d × 1 vector y is represented as

    y = [ y1 ]
        [ y2 ]
        [ ... ]
        [ yd ]                                          (2.2.1)

This representation of y is called a column vector of order d, with d rows and 1 column. Alternatively, a vector may be represented as a 1 × d vector with 1 row and d columns. Then, we denote y as y′ and call it a row vector. Hence,

    y′ = [y1, y2, . . . , yd]                           (2.2.2)

Using this notation, y is a column vector and y′, the transpose of y, is a row vector. The dimension or order of the vector y is d, where the index d represents the number of variables, elements, or components in y. To emphasize the dimension of y, the subscript notation y_{d×1} or simply yd is used.

¹All vectors in this text are assumed to be real valued.
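As a small illustration of the column and row representations in (2.2.1) and (2.2.2), the SAS/IML sketch below builds a d × 1 vector and its transpose; the particular values are arbitrary and are only assumptions made for the example.

    proc iml;
       y  = {2, 5, 7};    /* a d x 1 column vector with d = 3 elements */
       yt = y`;           /* the transpose of y: a 1 x d row vector    */
       d  = nrow(y);      /* the order (dimension) of y                */
       print y yt d;
    quit;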

The vector y with d elements represents, geometrically, a point in a d-dimensional Euclidean space. The elements of y are called the coordinates of the vector. The null vector 0_{d×1} denotes the origin of the space; the vector y may be visualized as a line segment from the origin to the point y. The line segment is called a position vector. A vector y with n variables, y_n, is a position vector in an n-dimensional Euclidean space. Since the vector y is defined over the set of real numbers R, the n-dimensional Euclidean space is represented as R^n or, in this text, as V_n.

Definition 2.2.1 A vector y_{n×1} is an ordered set of n real numbers representing a position in an n-dimensional Euclidean space V_n.

b. Vector Spaces

The collection of n × 1 vectors in V_n that are closed under the two operations of vector addition and scalar multiplication is called a (real) vector space.

Definition 2.2.2 An n-dimensional vector space is the collection of vectors in V_n that satisfy the following two conditions:

1. If x ∈ V_n and y ∈ V_n, then z = x + y ∈ V_n.
2. If α ∈ R and y ∈ V_n, then z = αy ∈ V_n.

(The notation ∈ is set notation for "is an element of.") For vector addition to be defined, x and y must have the same number of elements n. Then, all elements z_i in z = x + y are defined as z_i = x_i + y_i for i = 1, 2, . . . , n. Similarly, scalar multiplication of a vector y by a scalar α ∈ R is defined as z_i = αy_i.


c. Vector Subspaces

Definition 2.2.3 A subset, S, of V_n is called a subspace of V_n if S is itself a vector space. The vector subspace S of V_n is represented as S ⊆ V_n.

Choosing α = 0 in Definition 2.2.2, we see that 0 ∈ V_n, so that every vector space contains the origin 0. Indeed, S = {0} is a subspace of V_n called the null subspace. Now, if α and β are elements of R and x and y are elements of V_n, then all linear combinations αx + βy are in V_n. This subset of vectors is called V_k, where V_k ⊆ V_n. The subspace V_k is called a subspace, linear manifold, or linear subspace of V_n. Any subspace V_k, where 0 < k < n, is called a proper subspace. The subset of vectors containing only the zero vector and the subset containing the whole space are extreme examples of vector spaces called improper subspaces.

Example 2.2.1 Let

   x = [1, 0, 0]'   and   y = [0, 1, 0]'

The set of all vectors S of the form z = αx + βy represents a plane (two-dimensional space) in the three-dimensional space V_3. Any vector in this two-dimensional subspace, S = V_2, can be represented as a linear combination of the vectors x and y. The subspace V_2 is called a proper subspace of V_3, so that V_2 ⊂ V_3.

Extending the operations of addition and scalar multiplication to k vectors, a linear combination of the vectors y_i is defined as

   v = Σ_{i=1}^{k} α_i y_i ∈ V                                       (2.2.3)

where y_i ∈ V and α_i ∈ R. The set of vectors y_1, y_2, . . . , y_k are said to span (or generate) V if

   V = { v | v = Σ_{i=1}^{k} α_i y_i }                               (2.2.4)

The vectors in V satisfy Definition 2.2.2 so that V is a vector space.

Theorem 2.2.1 Let {y_1, y_2, . . . , y_k} be the subset of k, n × 1 vectors in V_n. If every vector in V is a linear combination of y_1, y_2, . . . , y_k, then V is a vector subspace of V_n.

Definition 2.2.4 The set of n × 1 vectors {y_1, y_2, . . . , y_k} is linearly dependent if there exist real numbers α_1, α_2, . . . , α_k, not all zero, such that

   Σ_{i=1}^{k} α_i y_i = 0

Otherwise, the set of vectors is linearly independent.


For a linearly independent set, the only solution to the equation in Definition 2.2.4 is given by α_1 = α_2 = · · · = α_k = 0. To determine whether a set of vectors is linearly independent or linearly dependent, Definition 2.2.4 is employed, as shown in the following examples.

Example 2.2.2 Let

   y_1 = [1, 1, 1]',   y_2 = [0, 1, -1]',   and   y_3 = [1, 4, -2]'

To determine whether the vectors y_1, y_2, and y_3 are linearly dependent or linearly independent, the equation

   α_1 y_1 + α_2 y_2 + α_3 y_3 = 0

is solved for α_1, α_2, and α_3. From Definition 2.2.4,

   α_1 [1, 1, 1]' + α_2 [0, 1, -1]' + α_3 [1, 4, -2]' = [0, 0, 0]'

   [α_1, α_1, α_1]' + [0, α_2, -α_2]' + [α_3, 4α_3, -2α_3]' = [0, 0, 0]'

This is a system of three equations in three unknowns

   (1)  α_1 + α_3 = 0
   (2)  α_1 + α_2 + 4α_3 = 0
   (3)  α_1 - α_2 - 2α_3 = 0

From equation (1), α_1 = -α_3. Substituting α_1 into equation (2), α_2 = -3α_3. If α_1 and α_2 are defined in terms of α_3, equation (3) is satisfied. If α_3 ≠ 0, there exist real numbers α_1, α_2, and α_3, not all zero, such that

   Σ_{i=1}^{3} α_i y_i = 0

Thus, y_1, y_2, and y_3 are linearly dependent. For example, y_1 + 3y_2 - y_3 = 0.

Example 2.2.3 As an example of a set of linearly independent vectors, let

   y_1 = [0, 1, 1]',   y_2 = [1, 1, -2]',   and   y_3 = [3, 4, 1]'

Using Definition 2.2.4,

   α_1 [0, 1, 1]' + α_2 [1, 1, -2]' + α_3 [3, 4, 1]' = [0, 0, 0]'

is a system of simultaneous equations

   (1)  α_2 + 3α_3 = 0
   (2)  α_1 + α_2 + 4α_3 = 0
   (3)  α_1 - 2α_2 + α_3 = 0

From equation (1), α_2 = -3α_3. Substituting -3α_3 for α_2 into equation (2), α_1 = -α_3; by substituting for α_1 and α_2 into equation (3), α_3 = 0. Thus, the only solution is α_1 = α_2 = α_3 = 0, so {y_1, y_2, y_3} is a linearly independent set of vectors.

Linearly independent and linearly dependent vectors are fundamental to the study of applied multivariate analysis. For example, suppose a test is administered to n students and scores on k subtests are recorded. If the vectors y_1, y_2, . . . , y_k are linearly independent, each of the k subtests is important to the overall evaluation of the n students. If for some subtest the scores can be expressed as a linear combination of the other subtests,

   y_k = Σ_{i=1}^{k-1} α_i y_i

the vectors are linearly dependent and there is redundancy in the test scores. It is often important to determine whether or not a set of observation vectors is linearly independent; when the vectors are not linearly independent, the analysis of the data may need to be restricted to a subspace of the original space.
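A quick numerical way to carry out the check of Definition 2.2.4 is to place the vectors as columns of a matrix and count its nonzero singular values; the count equals the number of linearly independent vectors. The following SAS/IML sketch is illustrative only (it is not one of the programs listed in Chapter 1) and uses the vectors of Examples 2.2.2 and 2.2.3 as reconstructed above.

   proc iml;
      /* columns are y1, y2, y3 of Example 2.2.2 */
      Y1 = {1  0  1,
            1  1  4,
            1 -1 -2};
      call svd(u, q, v, Y1);           /* q holds the singular values        */
      rank1 = sum(q > 1e-12);          /* number of independent columns      */
      /* columns are y1, y2, y3 of Example 2.2.3 */
      Y2 = {0  1  3,
            1  1  4,
            1 -2  1};
      call svd(u, q, v, Y2);
      rank2 = sum(q > 1e-12);
      print rank1 rank2;               /* rank1 = 2 (dependent), rank2 = 3 (independent) */
   quit;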

Exercises 2.2

1. For the vectors

      y_1 = [1, 1, 1]'   and   y_2 = [2, 0, 1]'

   find the vectors

   (a) 2y_1 + 3y_2
   (b) y_1 + y_2
   (c) y_3 such that 3y_1 - 2y_2 + 4y_3 = 0

2. For the vectors and scalars defined in Example 2.2.1, draw a picture of the space S generated by the two vectors.

3. Show that the four vectors given below are linearly dependent.

      y_1 = [1, 0, 0]',   y_2 = [2, 3, 5]',   y_3 = [1, 0, 1]',   and   y_4 = [0, 4, 6]'

4. Are the following vectors linearly dependent or linearly independent?

      y_1 = [1, 1, 1]',   y_2 = [1, 2, 3]',   y_3 = [2, 2, 3]'

5. Do the vectors

      y_1 = [2, 4, 2]',   y_2 = [1, 2, 3]',   and   y_3 = [6, 12, 10]'

   span the same space as the vectors

      x_1 = [0, 0, 2]'   and   x_2 = [2, 4, 10]'

6. Prove the following laws for vector addition and scalar multiplication.

   (a) x + y = y + x (commutative law)
   (b) (x + y) + z = x + (y + z) (associative law)
   (c) α(βy) = (αβ)y = (βα)y = β(αy) (associative law for scalars)
   (d) α(x + y) = αx + αy (distributive law for vectors)
   (e) (α + β)y = αy + βy (distributive law for scalars)

7. Prove each of the following statements.

   (a) Any set of vectors containing the zero vector is linearly dependent.

   (b) Any subset of a linearly independent set is also linearly independent.

   (c) In a linearly dependent set of vectors, at least one of the vectors is a linear combination of the remaining vectors.

2.3 Bases, Vector Norms, and the Algebra of Vector Spaces

The concept of dimensionality is a familiar one from geometry. In Example 2.2.1, the subspace S represented a plane of dimension two, a subspace of the three-dimensional space V_3. Also important is the minimal number of vectors required to span S.


a. Bases

Definition 2.3.1 Let {y_1, y_2, . . . , y_k} be a subset of k vectors, where y_i ∈ V_n. The set of k vectors is called a basis of V_k if the vectors in the set span V_k and are linearly independent. The number k is called the dimension or rank of the vector space.

Thus, in Example 2.2.1, S = V_2 ⊂ V_3 and the subscript 2 is the dimension or rank of the vector space. It should be clear from the context whether the subscript on V represents the dimension of the vector space or the dimension of the vector in the vector space. Every vector space, except the vector space {0}, has a basis. Although a basis set is not unique, the number of vectors in a basis is unique. The following theorem summarizes the existence and uniqueness of a basis for a vector space.

Theorem 2.3.1 Existence and Uniqueness

1. Every vector space has a basis.

2. Every vector in a vector space has a unique representation as a linear combination of a basis.

3. Any two bases for a vector space have the same number of vectors.

b. Lengths, Distances, and Angles

Knowledge of vector lengths, distances, and angles between vectors helps one to understand relationships among multivariate vector observations. However, prior to discussing these concepts, the inner (scalar or dot) product of two vectors needs to be defined.

Definition 2.3.2 The inner product of two vectors x and y, each with n elements, is the scalar quantity

   x'y = Σ_{i=1}^{n} x_i y_i

In textbooks on linear algebra, the inner product may be represented as (x, y) or x · y. Given Definition 2.3.2, inner products have several properties, as summarized in the following theorem.

Theorem 2.3.2 For any conformable vectors x, y, z, and w in a vector space V and any real numbers α and β, the inner product satisfies the following relationships

1. x'y = y'x
2. x'x ≥ 0, with equality if and only if x = 0
3. (αx)'(βy) = αβ(x'y)
4. (x + y)'z = x'z + y'z
5. (x + y)'(w + z) = x'(w + z) + y'(w + z)


If x = y in Definition 2.3.2, then x'x = Σ_{i=1}^{n} x_i². The quantity (x'x)^{1/2} is called the Euclidean vector norm or length of x and is represented as ||x||. Thus, the norm of x is the positive square root of the inner product of a vector with itself. The norm squared of x is represented as ||x||². The Euclidean distance or length between two vectors x and y in V_n is ||x − y|| = [(x − y)'(x − y)]^{1/2}. The cosine of the angle θ between two vectors, by the law of cosines, is

   cos θ = x'y / (||x|| ||y||)      0° ≤ θ ≤ 180°                    (2.3.1)

Another important geometric vector concept is the notion of orthogonal (perpendicular) vectors.

Definition 2.3.3 Two vectors x and y in V_n are orthogonal if their inner product is zero.

Thus, if the angle between x and y is 90°, then cos θ = 0 and x is perpendicular to y, written as x ⊥ y.

Example 2.3.1 Let

   x = [1, 1, 2]'   and   y = [-1, 0, -1]'

The distance between x and y is then ||x − y|| = [(x − y)'(x − y)]^{1/2} = √14, and the cosine of the angle between x and y is

   cos θ = x'y / (||x|| ||y||) = −3/(√6 √2) = −√3/2

so that the angle between x and y is θ = cos⁻¹(−√3/2) = 150°.
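The computations of Example 2.3.1 are easy to reproduce with a matrix language. The SAS/IML sketch below is illustrative code (not a program from the text's Web site) and uses the example's vectors as reconstructed above; it evaluates the inner product, the Euclidean norms, the distance ||x − y||, and the angle of equation (2.3.1).

   proc iml;
      x = {1, 1, 2};
      y = {-1, 0, -1};
      ip    = t(x) * y;                     /* inner product x'y            */
      normx = sqrt(ssq(x));                 /* ||x|| = sqrt(x'x)            */
      normy = sqrt(ssq(y));
      dist  = sqrt(ssq(x - y));             /* Euclidean distance ||x - y|| */
      costh = ip / (normx # normy);         /* cos(theta), equation (2.3.1) */
      theta = arcos(costh) * 180 / constant('pi');   /* angle in degrees    */
      print ip normx normy dist costh theta;         /* theta = 150         */
   quit;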

If the vectors in our example have unit length, so that ||x|| = ||y|| = 1, then cos θ is just the inner product of x and y. To create unit vectors, also called normalizing the vectors, one proceeds as follows

   u_x = x / ||x|| = [1/√6, 1/√6, 2/√6]'   and   u_y = y / ||y|| = [−1/√2, 0, −1/√2]'

and the cos θ = u_x'u_y = −√3/2, the inner product of the normalized vectors. Normalized vectors that are also mutually orthogonal are called orthonormal vectors.

Example 2.3.2 Let

   x = [1, 2, 4]'   and   y = [4, 0, -1]'

Then x'y = 0; however, these vectors are not of unit length.

Definition 2.3.4 A basis for a vector space is called an orthogonal basis if every pair of vectors in the set is pairwise orthogonal; it is called an orthonormal basis if each vector additionally has unit length.


FIGURE 2.3.1. Orthogonal Projection of y on x, P_x y = αx

The standard orthonormal basis for V_n is {e_1, e_2, . . . , e_n}, where e_i is a vector of all zeros with the number one in the i-th position. Clearly ||e_i|| = 1 and e_i ⊥ e_j for all pairs i ≠ j. Hence, {e_1, e_2, . . . , e_n} is an orthonormal basis for V_n, and it has dimension (or rank) n. The basis for V_n is not unique. Given any basis for V_k ⊆ V_n, we can create an orthonormal basis for V_k. The process is called the Gram-Schmidt orthogonalization process.

c. Gram-Schmidt Orthogonalization Process

Fundamental to the Gram-Schmidt process is the concept of an orthogonal projection. In a two-dimensional space, consider the vectors x and y given in Figure 2.3.1. The orthogonal projection of y on x, P_x y, is some constant multiple, αx, of x such that P_x y ⊥ (y − P_x y). Since cos θ = cos 90° = 0, we set (y − αx)'x equal to 0 and solve for α to find α = (y'x) / ||x||². Thus, the projection of y on x becomes

   P_x y = αx = [(y'x) / ||x||²] x

Example 2.3.3 Let

   x = [1, 1, 1]'   and   y = [1, 4, 2]'

Then, the

   P_x y = [(y'x) / ||x||²] x = (7/3) [1, 1, 1]'

Observe that the coefficient α in this example is just the average of the elements of y. This is always the case when projecting an observation onto a vector of 1s (the equiangular or unit vector), represented as 1_n or simply 1: P_1 y = ȳ 1 for any multivariate observation vector y.
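The projection formula can be verified numerically. The SAS/IML sketch below (illustrative only) projects the y of Example 2.3.3 onto the equiangular vector 1 and confirms that the projection coefficient equals the mean of the elements of y.

   proc iml;
      x = {1, 1, 1};                        /* equiangular (unit) vector 1      */
      y = {1, 4, 2};
      alpha = (t(y) * x) / ssq(x);          /* coefficient (y'x)/||x||^2 = 7/3  */
      Pxy   = alpha # x;                    /* projection of y on x             */
      ybar  = y[:];                         /* mean of the elements of y        */
      print alpha ybar Pxy;                 /* alpha equals ybar = 7/3          */
   quit;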

To obtain an orthogonal basis {y_1, . . . , y_r} for any subspace V of V_n, spanned by any set of vectors {x_1, x_2, . . . , x_k}, the preceding projection process is employed sequentially as follows

   y_1 = x_1
   y_2 = x_2 − P_{y_1} x_2 = x_2 − [(x_2'y_1) / ||y_1||²] y_1,                              y_2 ⊥ y_1
   y_3 = x_3 − P_{y_1} x_3 − P_{y_2} x_3
       = x_3 − [(x_3'y_1) / ||y_1||²] y_1 − [(x_3'y_2) / ||y_2||²] y_2,                     y_3 ⊥ y_2 ⊥ y_1

or, more generally,

   y_i = x_i − Σ_{j=1}^{i-1} c_{ij} y_j   where   c_{ij} = (x_i'y_j) / ||y_j||²

deleting those vectors y_i for which y_i = 0. The number of nonzero vectors in the set is the rank or dimension of the subspace V and is represented as V_r, r ≤ k. To find an orthonormal basis, the orthogonal basis must be normalized.

Theorem 2.3.3 (Gram-Schmidt) Every r-dimensional vector space, except the zero-dimensional space, has an orthonormal basis.

Example 2.3.4 Let V be spanned by

   x_1 = [1, 1, 1, 0, 1]',   x_2 = [2, 0, 4, 1, 2]',   x_3 = [1, -1, 3, 1, 1]',   and   x_4 = [6, -2, 3, -1, 1]'

To find an orthonormal basis, the Gram-Schmidt process is used. Set

   y_1 = x_1 = [1, 1, 1, 0, 1]'

   y_2 = x_2 − [(x_2'y_1) / ||y_1||²] y_1
       = [2, 0, 4, 1, 2]' − (8/4) [1, 1, 1, 0, 1]'
       = [0, -2, 2, 1, 0]'

   y_3 = x_3 − [(x_3'y_1) / ||y_1||²] y_1 − [(x_3'y_2) / ||y_2||²] y_2 = 0

so delete y_3;

   y_4 = x_4 − [(x_4'y_1) / ||y_1||²] y_1 − [(x_4'y_2) / ||y_2||²] y_2
       = [6, -2, 3, -1, 1]' − (8/4) [1, 1, 1, 0, 1]' − (9/9) [0, -2, 2, 1, 0]'
       = [4, -2, -1, -2, -1]'

Thus, an orthogonal basis for V is {y_1, y_2, y_4}. The vectors must be normalized to obtain an orthonormal basis; an orthonormal basis is u_1 = y_1/√4, u_2 = y_2/3, and u_3 = y_4/√26.
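The sequential process above is easily programmed. The SAS/IML sketch below is illustrative and uses the vectors of Example 2.3.4 as reconstructed above; it builds the orthogonal basis column by column, deletes any vector that reduces to zero, and then normalizes the result.

   proc iml;
      /* columns are x1, x2, x3, x4 of Example 2.3.4 */
      X = {1  2  1  6,
           1  0 -1 -2,
           1  4  3  3,
           0  1  1 -1,
           1  2  1  1};
      B = X[, 1];                                    /* y1 = x1                        */
      do i = 2 to ncol(X);
         v = X[, i];
         do j = 1 to ncol(B);
            c = (t(X[, i]) * B[, j]) / ssq(B[, j]);  /* c_ij = (x_i'y_j)/||y_j||^2     */
            v = v - c # B[, j];
         end;
         if ssq(v) > 1e-12 then B = B || v;          /* delete vectors that reduce to 0 */
      end;
      U = B;
      do j = 1 to ncol(U);
         U[, j] = U[, j] / sqrt(ssq(U[, j]));        /* normalize each basis vector     */
      end;
      print B U;                                     /* orthogonal and orthonormal bases */
   quit;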

d. Orthogonal Spaces

Definition 2.3.5 Let V_r = {x_1, . . . , x_r} ⊆ V_n. The orthocomplement subspace of V_r in V_n, represented by V_r^⊥, is the vector subspace of V_n that consists of all vectors y ∈ V_n such that x_i'y = 0, and we write V_n = V_r ⊕ V_r^⊥.

The vector space V_n is the direct sum of the subspaces V_r and V_r^⊥. The intersection of the two spaces contains only the null vector. The dimension of V_n, dim V_n, is equal to dim V_r + dim V_r^⊥, so that dim V_r^⊥ = n − r. More generally, we have the following result.

Definition 2.3.6 Let S_1, S_2, . . . , S_k denote vector subspaces of V_n. The direct sum of these vector spaces, represented as ⊕_{i=1}^{k} S_i, consists of all unique vectors v = Σ_{i=1}^{k} α_i s_i, where s_i ∈ S_i, i = 1, . . . , k, and the coefficients α_i ∈ R.

Theorem 2.3.4 Let S_1, S_2, . . . , S_k represent vector subspaces of V_n. Then,

1. V = ⊕_{i=1}^{k} S_i is a vector subspace of V_n, V ⊆ V_n.
2. The intersection of the S_i is the null space {0}.
3. The intersection of V and V^⊥ is the null space.
4. The dim V^⊥ = n − k, so that dim(V ⊕ V^⊥) = n.

4. The dim V = n k so that dim V V = n.Example 2.3.5 Let

V = 10

1

, 011

= {x1, x2} and y V3


We find V^⊥ using Definition 2.3.5 as follows

   V^⊥ = { y ∈ V_3 | y'x = 0 for any x ∈ V }
       = { y ∈ V_3 | y ⊥ V }
       = { y ∈ V_3 | y ⊥ x_i }   (i = 1, 2)

A vector y = [y_1, y_2, y_3]' must be found such that y ⊥ x_1 and y ⊥ x_2. This implies that y_1 − y_3 = 0, or y_1 = y_3, and y_2 = y_3, or y_1 = y_2 = y_3. Letting y_i = 1,

   V^⊥ = { [1, 1, 1]' } = {1}   and   V_3 = V ⊕ V^⊥

Furthermore, the

   P_1 y = [ȳ, ȳ, ȳ]'   and   P_V y = y − P_1 y = [y_1 − ȳ, y_2 − ȳ, y_3 − ȳ]'

Alternatively, from Definition 2.3.6, an orthogonal basis for V is

   V = { [1, 0, -1]', [-1/2, 1, -1/2]' } = {v_1, v_2} = S_1 ⊕ S_2

and the P_V y becomes

   P_{v_1} y + P_{v_2} y = [y_1 − ȳ, y_2 − ȳ, y_3 − ȳ]'

Hence, a unique representation for y is y = P_1 y + P_V y, as stated in Theorem 2.3.4. The dim V_3 = dim {1} + dim V.
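The decomposition y = P_1 y + P_V y of Example 2.3.5 can be checked numerically for any y in V_3. In the SAS/IML sketch below (the data values are illustrative), P_V y is computed from the orthogonal basis {v_1, v_2} and shown to equal y − P_1 y, the vector of deviations from the mean.

   proc iml;
      y   = {2, 5, 11};                            /* illustrative observation vector */
      one = j(3, 1, 1);
      v1  = {1, 0, -1};
      v2  = {-0.5, 1, -0.5};                       /* orthogonal basis of V           */
      P1y = (t(y) * one / ssq(one)) # one;         /* [ybar, ybar, ybar]'             */
      PVy = (t(y) * v1 / ssq(v1)) # v1 + (t(y) * v2 / ssq(v2)) # v2;
      check = max(abs(y - (P1y + PVy)));           /* y = P_1 y + P_V y, so check = 0 */
      print P1y PVy check;
   quit;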

In Example 2.3.5, V^⊥ is the orthocomplement of V relative to the whole space. Often S ⊆ V ⊆ V_n and we desire the orthocomplement of S relative to V instead of V_n. This space is represented as V/S, and V = (V/S) ⊕ S = S_1 ⊕ S_2. Furthermore, V_n = V^⊥ ⊕ (V/S) ⊕ S = V^⊥ ⊕ S_1 ⊕ S_2. If the dimension of V is k and the dimension of S is r, then the dimension of V^⊥ is n − k and the dim V/S is k − r, so that (n − k) + (k − r) + r = n, or dim V_n = dim V^⊥ + dim(V/S) + dim S, as stated in Theorem 2.3.4. In Figure 2.3.2, the geometry of subspaces is illustrated with V_n = S ⊕ (V/S) ⊕ V^⊥.

The algebra of vector spaces has an important representation for the analysis of variance (ANOVA) linear model. To illustrate, consider the two-group ANOVA model

   y_{ij} = μ + α_i + e_{ij}      i = 1, 2 and j = 1, 2

Thus, we have two groups indexed by i and two observations indexed by j. Representing the observations as a vector,

   y = [y_11, y_12, y_21, y_22]'


FIGURE 2.3.2. The orthocomplement of S relative to V, V/S

and formulating the observation vector as a linear model,

   y = [y_11, y_12, y_21, y_22]' = [1, 1, 1, 1]' μ + [1, 1, 0, 0]' α_1 + [0, 0, 1, 1]' α_2 + [e_11, e_12, e_21, e_22]'

The vectors associated with the model parameters span a vector space V often called the design space. Thus,

   V = { [1, 1, 1, 1]', [1, 1, 0, 0]', [0, 0, 1, 1]' } = {1, a_1, a_2}

where 1, a_1, and a_2 are elements of V_4. The vectors in the design space V are linearly dependent. Let A = {a_1, a_2} denote a basis for V. Since 1 ∈ A, the orthocomplement of the subspace {1} ≡ 1 relative to A, denoted by A/1, is given by

   A/1 = { a_1 − P_1 a_1, a_2 − P_1 a_2 }
       = { [1/2, 1/2, -1/2, -1/2]', [-1/2, -1/2, 1/2, 1/2]' }

The vectors in A/1 span the space; however, a basis for A/1 is given by

   A/1 = { [1, 1, -1, -1]' }

where (A/1) ⊕ 1 = A and A ⊂ V_4. Thus, (A/1) ⊕ 1 ⊕ A^⊥ = V_4. Geometrically, as shown in Figure 2.3.3, the design space V ≡ A has been partitioned into two orthogonal subspaces 1 and A/1 such that A = 1 ⊕ (A/1), where A/1 is the orthocomplement of 1 relative to A, and A ⊕ A^⊥ = V_4.


FIGURE 2.3.3. The orthogonal decomposition of V for the ANOVA

The observation vector y ∈ V_4 may be thought of as a vector with components in various orthogonal subspaces. By projecting y onto the orthogonal subspaces in the design space A, we may obtain estimates of the model parameters. To see this, we evaluate P_A y = P_1 y + P_{A/1} y.

   P_1 y = ȳ [1, 1, 1, 1]' = [ȳ, ȳ, ȳ, ȳ]'

   P_{A/1} y = P_A y − P_1 y
             = [(y'a_1) / ||a_1||²] a_1 + [(y'a_2) / ||a_2||²] a_2 − [(y'1) / ||1||²] 1
             = Σ_{i=1}^{2} [ (y'a_i) / ||a_i||² − (y'1) / ||1||² ] a_i
             = Σ_{i=1}^{2} (ȳ_i − ȳ) a_i = Σ_{i=1}^{2} α̂_i a_i

since (A/1) ⊥ 1 and 1 = a_1 + a_2. As an exercise, find the projection of y onto A^⊥ and the ||P_{A/1} y||².

From the analysis of variance, the coefficients of the basis vectors for 1 and A/1 yield the estimators for the overall effect μ and the treatment effects α_i for the two-group ANOVA model employing the restriction on the parameters that α_1 + α_2 = 0. Indeed, the restriction creates a basis for A/1. Furthermore, the total sum of squares, ||y||², is the sum of squared lengths of the projections of y onto each subspace, ||y||² = ||P_1 y||² + ||P_{A/1} y||² + ||P_{A^⊥} y||². The dimensions of the subspaces for I groups, corresponding to the decomposition of ||y||², satisfy the relationship that n = 1 + (I − 1) + (n − I), where the dim A = I and y ∈ V_n. Hence, the degrees of freedom of the subspaces are the dimensions of the orthogonal vector spaces {1}, {A/1}, and {A^⊥} for the design space A. Finally, the ||P_{A/1} y||² is the hypothesis sum of squares and the ||P_{A^⊥} y||² is the error sum of squares. Additional relationships between linear algebra and linear models using ANOVA and regression models are contained in the exercises for this section. We conclude this section with some inequalities useful in statistics and generalize the concepts of distance and vector norms.
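The projections used to estimate μ and the α_i can be computed directly. The SAS/IML sketch below (the four data values are illustrative) forms P_1 y, P_{A/1} y, and the residual projection onto A^⊥ for the two-group model and confirms that the squared lengths add to ||y||².

   proc iml;
      y   = {3, 5, 10, 14};                        /* [y11, y12, y21, y22]' (illustrative) */
      one = j(4, 1, 1);
      a1  = {1, 1, 0, 0};
      a2  = {0, 0, 1, 1};
      P1y  = (t(y) * one / ssq(one)) # one;                          /* overall mean      */
      PAy  = (t(y) * a1 / ssq(a1)) # a1 + (t(y) * a2 / ssq(a2)) # a2; /* group means      */
      PA1y = PAy - P1y;                            /* P_{A/1} y: estimated treatment effects */
      Ey   = y - PAy;                              /* projection onto A-perp (residuals)  */
      ssTot    = ssq(y);
      ssDecomp = ssq(P1y) + ssq(PA1y) + ssq(Ey);   /* equals ssq(y)                       */
      print P1y PA1y Ey ssTot ssDecomp;
   quit;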


e. Vector Inequalities, Vector Norms, and Statistical Distance

In a Euclidean vector space, two important inequalities regarding inner products are the Cauchy-Schwarz inequality and the triangular inequality.

Theorem 2.3.5 If x and y are vectors in a Euclidean space V , then

1. (xy)2 x2 y2 (Cauchy-Schwarz inequality)2. x+ y x + y (Triangular inequality)In terms of the elements of x and y, (1) becomes(

i

xi yi

)2

(i

x2i

)(i

y2i

)(2.3.2)

which may be used to show that the zero-order Pearson product-moment correlation co-efficient is bounded by 1. Result (2) is a generalization of the familiar relationship fortriangles in two-dimensional geometry.
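Both the Cauchy-Schwarz bound and the resulting bound on the correlation coefficient are easy to check numerically; the SAS/IML sketch below uses illustrative data values.

   proc iml;
      x = {2, -1, 3, 0};
      y = {1,  4, -2, 5};
      lhs = (t(x) * y)##2;                  /* (x'y)^2                      */
      rhs = ssq(x) # ssq(y);                /* ||x||^2 ||y||^2              */
      xc  = x - x[:];                       /* centered vectors             */
      yc  = y - y[:];
      r   = t(xc) * yc / sqrt(ssq(xc) # ssq(yc));   /* Pearson correlation  */
      print lhs rhs r;                      /* lhs <= rhs and -1 <= r <= 1  */
   quit;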

The Euclidean norm is really a member of Minkowski's family of norms (L_p-norms)

   ||x||_p = { Σ_{i=1}^{n} |x_i|^p }^{1/p}                           (2.3.3)

where 1 ≤ p < ∞ and x is an element of a normed vector space V. For p = 2, we have the Euclidean norm. When p = 1, we have the minimum norm, ||x||_1. For p = ∞, Minkowski's norm is not defined; instead, we define the maximum or infinity norm of x as

   ||x||_∞ = max_{1≤i≤n} |x_i|                                       (2.3.4)

Definition 2.3.7 A vector norm is a function defined on a vector space that maps a vector into a scalar value such that

1. ||x||_p ≥ 0, and ||x||_p = 0 if and only if x = 0,
2. ||αx||_p = |α| ||x||_p for α ∈ R,
3. ||x + y||_p ≤ ||x||_p + ||y||_p,

for all vectors x and y.

Clearly the ||x||_2 = (x'x)^{1/2} satisfies Definition 2.3.7. This is also the case for the maximum norm of x. In this text, the Euclidean norm (L_2-norm) is assumed unless noted otherwise. Note that (||x||_2)² = ||x||² = x'x is the Euclidean norm squared of x.
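The following SAS/IML sketch evaluates the L_1, L_2, and infinity norms of equations (2.3.3) and (2.3.4) for an illustrative vector.

   proc iml;
      x = {3, -4, 1};
      L1   = sum(abs(x));            /* ||x||_1                     */
      L2   = sqrt(ssq(x));           /* ||x||_2 = (x'x)^(1/2)       */
      Linf = max(abs(x));            /* ||x||_infinity = max |x_i|  */
      print L1 L2 Linf;              /* 8, sqrt(26), and 4          */
   quit;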

While Euclidean distances and norms are useful concepts in statistics, since they help to visualize statistical sums of squares, non-Euclidean distances and non-Euclidean norms are often useful in multivariate analysis. We have seen that the Euclidean norm generalizes to a more general function that maps a vector to a scalar. In a similar manner, we may generalize the concept of distance. A non-Euclidean distance important in multivariate analysis is the statistical or Mahalanobis distance.

To motivate the definition, consider a normal random variable X with mean zero and variance one, X ~ N(0, 1). An observation x_o that is two standard deviations from the mean lies a distance of two units from the origin, since ||x_o|| = (0² + 2²)^{1/2} = 2, and the probability that 0 ≤ x ≤ 2 is 0.4772. Alternatively, suppose Y ~ N(0, 4), where the distance from the origin for y_o = x_o is still 2. However, the probability that 0 ≤ y ≤ 2 becomes 0.3413, so that y_o is statistically closer to the origin than x_o. To compare the distances, we must take into account the variance of the random variables. Thus, the squared distance between x_i and x_j is defined as

   D²_{ij} = (x_i − x_j)² / σ² = (x_i − x_j)(σ²)⁻¹(x_i − x_j)        (2.3.5)

where σ² is the population variance. For our example, the point x_o has a squared statistical distance D²_{ij} = 4, while the point y_o = 2 has a value of D²_{ij} = 1, which maintains the inequality in probabilities in that y_o is closer to zero statistically than x_o. D_{ij} is the distance between x_i and x_j in the metric of σ², called the Mahalanobis distance between x_i and x_j. When σ² = 1, the Mahalanobis distance reduces to the Euclidean distance.
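The comparison in the motivating example can be reproduced directly from equation (2.3.5); the SAS/IML sketch below computes the squared statistical distances of the two points from the origin.

   proc iml;
      x0 = 2;  sigma2x = 1;                       /* X ~ N(0, 1) */
      y0 = 2;  sigma2y = 4;                       /* Y ~ N(0, 4) */
      D2x = (x0 - 0) * (1/sigma2x) * (x0 - 0);    /* squared distance = 4 */
      D2y = (y0 - 0) * (1/sigma2y) * (y0 - 0);    /* squared distance = 1 */
      print D2x D2y;                              /* y0 is statistically closer to 0 */
   quit;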

Exercises 2.3

1. For the vectors

      x = [1, 3, 2]',   y = [1, 2, 0]',   and   z = [1, 1, 2]'

   and scalars α = 2 and β = 3, verify the properties given in Theorem 2.3.2.

2. Using the law of cosines

      ||y − x||² = ||x||² + ||y||² − 2 ||x|| ||y|| cos θ

   derive equation (2.3.1).

3. For the vectors

      y_1 = [2, 2, 1]'   and   y_2 = [3, 0, 1]'

   (a) Find their lengths, and the distance and angle between them.

   (b) Find a vector y of length 3 with direction cosines

         cos θ_1 = y_1/||y|| = 1/√2   and   cos θ_2 = y_2/||y|| = 1/√2

       where θ_1 and θ_2 are the angles between y and each of its reference axes e_1 = [1, 0]' and e_2 = [0, 1]'.

   (c) Verify that cos² θ_1 + cos² θ_2 = 1.


4. For

      y = [1, 9, 7]'   and   V = { v_1 = [2, 3, 1]', v_2 = [5, 0, 4]' }

   (a) Find the projection of y onto V and interpret your result.

   (b) In general, if y ∈ V, can you find the P_V y?

5. Use the Gram-Schmidt process to find an orthonormal basis for the vectors in Exercise 2.2, Problem 4.

6. The vectors

      v_1 = [1, 2, 1]'   and   v_2 = [2, 3, 0]'

   span a plane in Euclidean space.

   (a) Find an orthogonal basis for the plane.

   (b) Find the orthocomplement of the plane in V_3.

   (c) From (a) and (b), obtain an orthonormal basis for V_3.

7. Find an orthonormal basis for V_3 that includes the vector y = [1/√3, 1/√3, 1/√3]'.

8. Do the following.

   (a) Find the orthocomplement of the space spanned by v = [4, 2, 1]' relative to Euclidean three-dimensional space, V_3.

   (b) Find the orthocomplement of v = [4, 2, 1]' relative to the space spanned by v_1 = [1, 1, 1]' and v_2 = [2, 0, -1]'.

   (c) Find the orthocomplement of the space spanned by v_1 = [1, 1, 1]' and v_2 = [2, 0, -1]' relative to V_3.

   (d) Write the Euclidean three-dimensional space as the direct sum of the relative spaces in (a), (b), and (c) in all possible ways.

9. Let V be spanned by the orthonormal basis

      v_1 = [1/√2, 0, 1/√2, 0]'   and   v_2 = [0, 1/√2, 0, 1/√2]'

   (a) Express x = [0, 1, 1, 1]' as x = x_1 + x_2, where x_1 ∈ V and x_2 ∈ V^⊥.

   (b) Verify that ||P_V x||² = ||P_{v_1} x||² + ||P_{v_2} x||².

   (c) Which vector y ∈ V is closest to x? Calculate the minimum distance.


10. Find the dimension of the space spanned by

      v_1 = [1, 1, 1, 1]',  v_2 = [1, 1, 0, 0]',  v_3 = [0, 0, 1, 1]',  v_4 = [1, 0, 1, 0]',  v_5 = [0, 1, 0, 1]'

11. Let y_n ∈ V_n, and V = {1}.

   (a) Find the projection of y onto V^⊥, the orthocomplement of V relative to V_n.

   (b) Represent y as y = x_1 + x_2, where x_1 ∈ V and x_2 ∈ V^⊥. What are the dimensions of V and V^⊥?

   (c) Since ||y||² = ||x_1||² + ||x_2||² = ||P_V y||² + ||P_{V^⊥} y||², determine a general form for each of the components of ||y||². Divide ||P_{V^⊥} y||² by the dimension of V^⊥. What do you observe about the ratio ||P_{V^⊥} y||² / dim V^⊥?

12. Let y_n ∈ V_n be a vector of observations, y = [y_1, y_2, . . . , y_n]', and let V = {1, x}, where x = [x_1, x_2, . . . , x_n]'.

   (a) Find the orthocomplement of 1 relative to V (that is, V/1) so that 1 ⊕ (V/1) = V. What is the dimension of V/1?

   (b) Find the projection of y onto 1 and also onto V/1. Interpret the coefficients of the projections, assuming each component of y satisfies the simple linear relationship y_i = α + β(x_i − x̄).

   (c) Find y − P_V y and ||y − P_V y||². How are these quantities related to the simple linear regression model?

13. For the I-group ANOVA model y_{ij} = μ + α_i + e_{ij}, where i = 1, 2, . . . , I and j = 1, 2, . . . , n observations per group, evaluate the squared lengths ||P_1 y||², ||P_{A/1} y||², and ||P_{A^⊥} y||² for V = {1, a_1, . . . , a_I}. Use Figure 2.3.3 to relate these quantities geometrically.

14. Let the vector space V be spanned by the following vectors in V_8, grouped as {1}, {A}, {B}, and {AB}:

      1:   v_1 = [1, 1, 1, 1, 1, 1, 1, 1]'
      A:   v_2 = [1, 1, 1, 1, 0, 0, 0, 0]',   v_3 = [0, 0, 0, 0, 1, 1, 1, 1]'
      B:   v_4 = [1, 1, 0, 0, 1, 1, 0, 0]',   v_5 = [0, 0, 1, 1, 0, 0, 1, 1]'
      AB:  v_6 = [1, 1, 0, 0, 0, 0, 0, 0]',   v_7 = [0, 0, 1, 1, 0, 0, 0, 0]',
           v_8 = [0, 0, 0, 0, 1, 1, 0, 0]',   v_9 = [0, 0, 0, 0, 0, 0, 1, 1]'


   (a) Find the space A + B = 1 ⊕ (A/1) ⊕ (B/1) and the space AB/(A + B) so that V = 1 ⊕ (A/1) ⊕ (B/1) ⊕ [AB/(A + B)]. What is the dimension of each of the subspaces?

   (b) Find the projection of the observation vector y = [y_111, y_112, y_211, y_212, y_311, y_312, y_411, y_412]' in V_8 onto each subspace in the orthogonal decomposition of V in (a). Represent these quantities geometrically and find their squared lengths.

   (c) Summarize your findings.

15. Prove Theorem 2.3.4.

16. Show that Minkowski's norm for p = 2 satisfies Definition 2.3.7.

17. For the vectors y = [y_1, . . . , y_n]' and x = [x_1, . . . , x_n]' with elements that have a mean of zero,

   (a) Show that s²_y = ||y||² / (n − 1) and s²_x = ||x||² / (n − 1).

   (b) Show that the sample Pearson product-moment correlation between two observations x and y is r = x'y / (||x|| ||y||).

2.4 Basic Matrix Operations

The organization of real numbers into a rectangular or square array consisting of n rows and d columns is called a matrix of order n by d and written as n × d.

Definition 2.4.1 A matrix Y of order n × d is an array of scalars given as

   Y_{n×d} = | y_11  y_12  ...  y_1d |
             | y_21  y_22  ...  y_2d |
             |  ...   ...        ... |
             | y_n1  y_n2  ...  y_nd |

The entries y_{ij} of Y are called the elements of Y, so that Y may be represented as Y = [y_{ij}]. Alternatively, a matrix may be represented in terms of its column or row vectors as

   Y_{n×d} = [v_1, v_2, . . . , v_d]   with   v_j ∈ V_n              (2.4.1)

or

   Y_{n×d} = [y_1', y_2', . . . , y_n']'   (rows y_i')   with   y_i ∈ V_d

Because the rows of Y are usually associated with subjects or individuals, each y_i is a member of the person space, while the columns v_j of Y are associated with the variable space. If n = d, the matrix Y is square.


a. Equality, Addition, and Multiplication of Matrices

Matrices, like vectors, may be combined using the operations of addition and scalar multiplication. For two matrices A and B of the same order, matrix addition is defined as

   A + B = C   if and only if   C = [c_{ij}] = [a_{ij} + b_{ij}]     (2.4.2)

The matrices are conformable for matrix addition only if both matrices are of the same order, that is, have the same number of rows and columns.

The product of a matrix A by a scalar α is

   αA = Aα = [α a_{ij}]                                              (2.4.3)

Two matrices A and B are equal if and only if [a_{ij}] = [b_{ij}]. To extend the concept of an inner product of two vectors to two matrices, the matrix product AB = C is defined if and only if the number of columns in A is equal to the number of rows in B. For two matrices A_{n×d} and B_{d×m}, the matrix (inner) product is the matrix C_{n×m} such that

   AB = C = [c_{ij}]   for   c_{ij} = Σ_{k=1}^{d} a_{ik} b_{kj}      (2.4.4)

From (2.4.4), we see that C is obtained by multiplying each row of A by each column of B. The matrix product is conformable if the number of columns in the matrix A is equal to the number of rows in the matrix B; the column order of A must equal the row order of B for matrix multiplication to be defined. In general, AB ≠ BA. If A = B and A is square, then AA = A². When A² = A, the matrix A is said to be idempotent.

From the definitions and properties of real numbers, we have the following theorem for matrix addition and matrix multiplication.

Theorem 2.4.1 For matrices A, B, C, and D and scalars α and β, the following properties hold for matrix addition and matrix multiplication.

1. A + B = B + A
2. (A + B) + C = A + (B + C)
3. α(A + B) = αA + αB
4. (α + β)A = αA + βA
5. (AB)C = A(BC)
6. A(B + C) = AB + AC
7. (A + B)C = AC + BC
8. A + (−A) = 0
9. A + 0 = A
10. (A + B)(C + D) = A(C + D) + B(C + D) = AC + AD + BC + BD
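The operations of this section are the basic building blocks of SAS/IML. The sketch below (with illustrative matrices) forms a sum, a scalar multiple, and the product of equation (2.4.4), shows that AB and BA need not agree, and checks that the averaging matrix (1/n)11' is idempotent.

   proc iml;
      A = {1 2, 3 4};
      B = {0 1, 1 0};
      C = A + B;                     /* elementwise sum                         */
      D = 3 # A;                     /* scalar multiple                         */
      E = A * B;                     /* matrix product, c_ij = sum_k a_ik b_kj  */
      F = B * A;                     /* in general AB ^= BA                     */
      J = j(3, 3, 1/3);              /* averaging matrix (1/n) 1 1'             */
      idem = max(abs(J * J - J));    /* 0, so J is idempotent                   */
      print C D E F idem;
   quit;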


Example 2.4.1 Let

A = 1 23 74 8

and B = 2