Lecture Slides on Mixed Models
Based on
A Course in Mixed Models for Use in Animal Health and Animal Welfare Research
Søren Højsgaard & Erik Jørgensen
Biometry Research Unit, Danish Institute of Agricultural Sciences
Research Centre Foulum
October 18, 2001
1 Preface
In the spring of 2001 the Biometry Research group at the Danish Institute of Agricultural Sciences arranged a course in mixed models for researchers at the Department of Animal Health and Animal Welfare at the same institute. The course consisted of a combination of lectures, group exercises, written assignments and a final project report based on data from experiments that the project participants were involved in.
During the course, the book SAS System for Mixed Models by Littell et al. (1996) was used, referred to as LMSW in the present document. It was necessary to supplement the book with additional theoretical material and examples based on data from the research institute. This led to a large number of slides used for the presentations.
This supplementary material is compiled in the present document. We hope the readers will find it useful. The online version1 of this document may be even more useful because of the hypertext facilities.
Søren Højsgaard & Erik Jørgensen
[email protected] [email protected]
Biometry Research Unit, Danish Institute of Agricultural Sciences
Research Centre Foulum, P.O. Box 50
DK-8830 Tjele
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/HSVmixed2001Slides.pdf
Contents
1 Preface

2 Overview of slides

3 Basic Concepts from Linear Algebra
  Why Linear Algebra??; Vectors; Matrices; Linear Combinations; n-dimensional Spaces; Linear Subspaces; Linear dependence and independence; Projections onto Linear Subspaces

4 Linear normal models
  Introduction; Linear Normal Models; Random Vectors and Matrices; Functions of Random Vectors; The Multivariate Normal Distribution; The Distribution of a LNM; The Expectation in a LNM; Representations of Models in SAS; Least Squares Estimation in a LNM; Estimation on matrix form; The parameter vector β; Estimability and Contrasts; Estimability in SAS; Least Squares Means; Hypothesis Testing; Calculating things in Practice

5 Some Basic Statistical Concepts
  Data and Models; Why the Normal Distribution is so “Normal”; The Central Limit Theorem; Some General Principles of Estimation; Method of Moments; How good is an estimator?; Consistency of Estimators; Desirable Properties of Estimators; The Method of Maximum Likelihood; The Likelihood function; The Maximum likelihood principle; How Good is the Estimate?; The Asymptotic Normal Distribution of the MLE; Asymptotic normality of transformations of the MLE; Tests of Hypotheses; How to get the asymptotic normality

6 An overview
  Outline; Darwin's maize; Galton's approach; The correct approach; What has happened; The 5th pot; Population genetics; Population genetics / animal breeding; Mixed models in general

7 Experimental planning and design
  Outline; The research process; Darwin's maize; Hypotheses; Louse decision support; Research decision support; Design options

8 Randomized Complete Block Design
  Outline; Linear Normal Model; Random vs. Fixed; ML estimation; Proc Mixed; Other examples of RCBD; Proc Mixed continued; Summary; IC options

9 Randomized Complete Block Design II
  Outline; BLUEs and BLUPs; Example; BLUP Summary; Model Check

10 Split-Plot Experiments
  The General Idea behind Split-Plot Experiments; Variance and Correlation; Comparing Differences; Inference Issues for Mixed Models; Analysis of the Split-Plot Experiment; Modelling the Mean; Three Technical Results; Back to the Original Setup; Unbalanced cases; Satterthwaite's approximation; How Good is Satterthwaite's Approximation; Two-sample Problem; Split-Plot Experiment; Making the “right” tests with PROC MIXED; A Severe Warning!!; Some Tentative Conclusions on Satterthwaite; Random or Fixed Effects?; Multilocation Trials

11 Examples of Split-Plot Designs
  Example: W. Schouten Ph.D. work; Breed Effect on Production; Straw shortener; Group Housing; Herd Investigations; Multilocation trials

12 Estimation and tests in mixed models
  Maximum Likelihood and Linear Normal Models; Maximum Likelihood Estimation in Mixed Models; Using ML or REML; Tests in Mixed Models

13 Complications concerning Variance Components
  Sugar Beet example; Outline; Reason; Likelihood contour plot; G not positive definite; Warning: Satterthwaite goes wrong; Conclusions; Testing effects of random components

14 Repeated Measurements
  Analyzing Repeated Measurements; Tacit Assumptions when using the Split-Plot Model; Modelling of Covariances; Types of random variation; Unstructured Covariance Matrix; The AR(1) model; How to estimate the autocorrelation??; Compound Symmetry; Which Covariance Structure to use?; Numerical Criteria; What does the covariance structure mean for the conclusions?

15 Repeated Measurements: Covariance structures
  Repeated statement; Types of variance structure; Unstructured; Autoregressive; Antedependence; Toeplitz; Heterogeneous; Conclusions; AR vs CS

16 Random Regression
  The Basic Idea behind Random Regression; Analyzing the Individual Regression Coefficients; Random Regression; How to ... In SAS; Inference; Correlation structure in Random Regression Models

17 Factor Structure Diagrams
  Factor Structure Diagrams; Two-way ANOVA with Replicates; Two-way ANOVA without Replicates; Block Experiments with Replicates within Blocks; Block Experiments without Replicates within Blocks; Split Plot Experiment

18 Covariate Models and Multivariate Response
  Example of the use of covariates; Model reduction; Table 5.1, LMSW Section 5.2.2; SAS code; Plots; Feed vs daily gain; Multivariate Responses; The Components of a MLNM; How to ... In SAS; The general setup

19 Heterogeneous Variance
  Why Variance Heterogeneity is Important to Recognize; Graphical Investigation of the Variance Structure; Variance Functions; The Delta Method; Taylor's Approximation; Applying Taylor's Approximation; Transformation of Data; Modelling Variance Heterogeneity; Heterogeneous Variance for Grouped Data; Power-of-Mean for Data with Covariates; On transformations, normal approximation and confidence intervals; Transformation and confidence intervals

20 Variance Heterogeneity: Example of the effect of transformation
  Variance Homogeneity; Example; Model of Expectations; Model comparisons; Treatment differences; Conclusions; Natural Scales

21 Variance Homogeneity: Diurnal Variation
  Example; Random Regression Model; Model of mean?; Modelling variance inhomogeneity; SAS model; Experience

22 Links to supplementary material

Bibliography
2 Overview of slides
The course was arranged in three blocks of lectures.
1. Brush-up concerning the necessary prerequisites: statistical concepts, linear algebra and linear normal models. In addition, a historical review was given and experimental planning was discussed. This covers Chapters 3-7.
2. This block of lectures covered the basic application of Mixed Models within the experimental designs typically used at the Department of Animal Health and Animal Welfare, that is
• randomized complete block designs (Chapters 8 and 9),
• split-plot designs (Chapters 10 and 11),
• repeated measurements (Chapters 14 and 15),
• random regression (Chapter 16),
• covariates and multivariate response (Chapter 18).
In addition, the fundamentals concerning estimation and tests in Mixed Models are discussed in Chapter 12. The two remaining issues, numerical problems (Chapter 13) and factor structure diagrams (Chapter 17), were included because of questions raised by the participants. In practical examples some of the variance component estimates were very often set to 0, leading to problems concerning the calculation of d.f. (i.e., with Satterthwaite's approximation). This further raised a need for a more 'manual' approach towards d.f. calculations in different designs.
3. In the final part of the course some additional topics and developments within Mixed Models were presented, and efforts were made to give a general summary and overview of the topics. Lectures concerning variance heterogeneity are presented in Chapters 19 and 20. An example using the presented methods on data concerning diurnal variation is presented in Chapter 21.
In addition, the preliminary work on the final project reports was presented during this final block.
The final chapter (22) in this book consists of links to supplementary material, mainly SAS examples.
The exercises used in the course are not included but can be found by visiting the home page of the course1.
Finally, it should be mentioned that each chapter starts with a very short introduction to the topic. In addition, a link to the full screen version of the presentation can be found.
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/HSVmixed2001.htm
3 Basic Concepts from Linear Algebra
Linear algebra is an important prerequisite for understanding the model formulation and calculations within mixed models. The following slides served as a brush-up on the theory, with a presentation of the most important concepts and results.
Link to the full screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/LinAlg.f.pdf
Why Linear Algebra??
• Many statistical models used in practice are assumed to have some
kind of a linear structure. (Linear regression and analysis of variance
are classical examples.)
• Linear algebra is the branch of mathematics that deals with linear
structures.
• Linear algebra is a convenient tool for handling models with linear
structures.
• Moreover, many concepts from linear algebra can be given
geometrical interpretation.
• Hence geometry can be a way to understand statistical models with
linear structures.
Vectors
Vectors: A column vector is a list of numbers stacked on top of each
other, e.g.
a = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}
A row vector is a list of numbers written one after the other, e.g.
b = (2, 1, 3)
In both cases, the list is ordered, i.e.
(2, 1, 3) ≠ (1, 2, 3).
• Note In what follows all vectors are column vectors unless
otherwise stated.
In general an n–vector has the form
a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
where the ai are numbers.
Transpose of vectors: This means that a column vector is turned
into a row vector and that a row vector is turned into a column
vector. The transpose is denoted by “⊤”. For example,
a⊤ = (a1, a2, . . . , an)
Hence transposing twice takes us back to where we started:
a = (a⊤)⊤
• Example:
\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}^\top = (1, 3, 2) \qquad \text{and} \qquad (1, 3, 2)^\top = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}
Multiplying a vector by a number: If a is a vector and α is a
number then αa is the vector
\alpha a = \begin{pmatrix} \alpha a_1 \\ \alpha a_2 \\ \vdots \\ \alpha a_n \end{pmatrix}
• Example:
7 \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 7 \\ 21 \\ 14 \end{pmatrix}
Sum of vectors: Let a and b be n–vectors. The sum a + b is the
n–vector
a + b = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{pmatrix} = b + a
• Note Only vectors of the same dimension can be added!
• Example:
\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 2 \\ 8 \\ 9 \end{pmatrix} = \begin{pmatrix} 1 + 2 \\ 3 + 8 \\ 2 + 9 \end{pmatrix} = \begin{pmatrix} 3 \\ 11 \\ 11 \end{pmatrix}
Inner product of vectors: Let a and b be n–vectors. The inner
product a · b is the number
a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \sum_{i=1}^{n} a_i b_i
• Note The product is a number – not a vector
• Note Only vectors of the same dimension can be multiplied!
• Example:
\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 8 \\ 9 \end{pmatrix} = 1 \cdot 2 + 3 \cdot 8 + 2 \cdot 9 = 44
The length (norm) of a vector: The length (or norm) of a vector
a is
\|a\| = \sqrt{a \cdot a} = \sqrt{\sum_{i=1}^{n} a_i^2}
The 0–vector and the 1–vector: The 0-vector (1–vector) is a
vector with 0 (1) on all entries. The 0–vector (1–vector) is
frequently written simply as 0 (1) or as 0n (1n) to emphasize that
it is of length n.
Orthogonal (perpendicular) vectors: Two vectors a and b with
a ≠ 0 and b ≠ 0 are orthogonal if their inner product is zero,
written
a ⊥ b ⇔ a · b = 0
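The vector operations above are easy to try out numerically. A minimal PROC IML sketch (our addition, not part of the original slides) computing the inner product, the norm and an orthogonality check for the example vectors:

proc iml;
   a = {1, 3, 2};                /* column 3-vector */
   b = {2, 8, 9};                /* column 3-vector */
   inner  = t(a) * b;            /* inner product a.b = 1*2 + 3*8 + 2*9 = 44 */
   norm_a = sqrt(t(a) * a);      /* length ||a|| = sqrt(14) */
   ortho  = (inner = 0);         /* 1 if a and b are orthogonal, here 0 */
   print inner norm_a ortho;
quit;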
Matrices
Matrix: A matrix A with r rows and c columns is an r × c table of
the form
A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1c} \\ a_{21} & a_{22} & \ldots & a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \ldots & a_{rc} \end{pmatrix}
It is said that A has the dimension r × c.
• Note One can regard A as consisting of c column vectors put
after each other:
A = [a1 : a2 : · · · : ac]
Transpose of matrices: A matrix is transposed by interchanging
rows and columns and is denoted by “>”. That is,
A^\top = \begin{pmatrix} a_{11} & a_{21} & \ldots & a_{r1} \\ a_{12} & a_{22} & \ldots & a_{r2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1c} & a_{2c} & \ldots & a_{rc} \end{pmatrix}
Example:
\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix}^\top = \begin{pmatrix} 1 & 3 & 2 \\ 2 & 8 & 9 \end{pmatrix}
• Note If A is an r × c matrix then A⊤ is a c × r matrix.
• Note One can regard a column vector of length r as an r × 1 matrix and a row vector of length c as a 1 × c matrix.
Multiplying a matrix with a number: For a number α and a matrix
A, the product αA is the matrix
\alpha A = \begin{pmatrix} \alpha a_{11} & \alpha a_{12} & \ldots & \alpha a_{1c} \\ \alpha a_{21} & \alpha a_{22} & \ldots & \alpha a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha a_{r1} & \alpha a_{r2} & \ldots & \alpha a_{rc} \end{pmatrix}
Example:
7 \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} = \begin{pmatrix} 7 & 14 \\ 21 & 56 \\ 14 & 63 \end{pmatrix}
Sum of matrices: Let A = [a1 : a2 : · · · : ac] and B = [b1 : b2 : · · · : bc] be r × c matrices.
The sum A + B is the r × c matrix given by
A + B = [a_1 + b_1 : a_2 + b_2 : \cdots : a_c + b_c] = \begin{pmatrix} a_{11} & \ldots & a_{1c} \\ a_{21} & \ldots & a_{2c} \\ \vdots & \ddots & \vdots \\ a_{r1} & \ldots & a_{rc} \end{pmatrix} + \begin{pmatrix} b_{11} & \ldots & b_{1c} \\ b_{21} & \ldots & b_{2c} \\ \vdots & \ddots & \vdots \\ b_{r1} & \ldots & b_{rc} \end{pmatrix} = \begin{pmatrix} a_{11} + b_{11} & \ldots & a_{1c} + b_{1c} \\ a_{21} + b_{21} & \ldots & a_{2c} + b_{2c} \\ \vdots & \ddots & \vdots \\ a_{r1} + b_{r1} & \ldots & a_{rc} + b_{rc} \end{pmatrix} = B + A
• Note Only matrices with the same dimensions can be added.
Example:
\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} + \begin{pmatrix} 5 & 4 \\ 8 & 2 \\ 3 & 7 \end{pmatrix} = \begin{pmatrix} 6 & 6 \\ 11 & 10 \\ 5 & 16 \end{pmatrix}
Multiplication of a matrix and a vector: Let A be an r× c matrix
and let b be a c-dimensional column vector. The product Ab is the
r × 1 matrix
Ab = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1c} \\ a_{21} & a_{22} & \ldots & a_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ a_{r1} & a_{r2} & \ldots & a_{rc} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_c \end{pmatrix} = \begin{pmatrix} a_{11}b_1 + a_{12}b_2 + \cdots + a_{1c}b_c \\ a_{21}b_1 + a_{22}b_2 + \cdots + a_{2c}b_c \\ \vdots \\ a_{r1}b_1 + a_{r2}b_2 + \cdots + a_{rc}b_c \end{pmatrix}
• Example:
\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 5 \\ 8 \end{pmatrix} = \begin{pmatrix} 1 \cdot 5 + 2 \cdot 8 \\ 3 \cdot 5 + 8 \cdot 8 \\ 2 \cdot 5 + 9 \cdot 8 \end{pmatrix} = \begin{pmatrix} 21 \\ 79 \\ 82 \end{pmatrix}
Multiplication of matrices: Let A be an r× c matrix and B a c× t
matrix, i.e. B = [b1 : b2 : · · · : bt]. The product AB is the r × t
matrix given by:
AB = A[b1 : b2 : · · · : bt] = [Ab1 : Ab2 : · · · : Abt]
Example:
\begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 5 & 4 \\ 8 & 2 \end{pmatrix} = \left[ \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 5 \\ 8 \end{pmatrix} : \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 4 \\ 2 \end{pmatrix} \right] = \begin{pmatrix} 1 \cdot 5 + 2 \cdot 8 & 1 \cdot 4 + 2 \cdot 2 \\ 3 \cdot 5 + 8 \cdot 8 & 3 \cdot 4 + 8 \cdot 2 \\ 2 \cdot 5 + 9 \cdot 8 & 2 \cdot 4 + 9 \cdot 2 \end{pmatrix} = \begin{pmatrix} 21 & 8 \\ 79 & 28 \\ 82 & 26 \end{pmatrix}
• Note The product AB can only be formed if the number of rows in B and the number of columns in A are the same. In that case, A and B are said to be conformable.
• Note In general AB and BA are not identical.
A mnemonic for matrix multiplication is to place B above and to the right of A; each entry of the product is then the inner product of the corresponding row of A and column of B:
\begin{array}{cc} & \begin{pmatrix} 5 & 4 \\ 8 & 2 \end{pmatrix} \\ \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} & \begin{pmatrix} 1 \cdot 5 + 2 \cdot 8 & 1 \cdot 4 + 2 \cdot 2 \\ 3 \cdot 5 + 8 \cdot 8 & 3 \cdot 4 + 8 \cdot 2 \\ 2 \cdot 5 + 9 \cdot 8 & 2 \cdot 4 + 9 \cdot 2 \end{pmatrix} \end{array} = \begin{pmatrix} 21 & 8 \\ 79 & 28 \\ 82 & 26 \end{pmatrix}
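As a quick check of the mnemonic, the product can be computed in PROC IML; this sketch (our addition, not from the slides) reproduces the example above:

proc iml;
   A = {1 2, 3 8, 2 9};          /* 3 x 2 matrix */
   B = {5 4, 8 2};               /* 2 x 2 matrix; A and B are conformable */
   C = A * B;                    /* should equal {21 8, 79 28, 82 26} */
   print C;
quit;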
Special matrices:
• An n× n matrix is said to be a square matrix
• A matrix with 0 on all entries is the 0–matrix and is often written
simply as 0 (or as 0r×c to emphasize the dimension).
• A matrix consisting of 1s in all entries is often written J (or as Jr×c to emphasize the dimension).
• A square matrix with 0 on all off-diagonal entries and elements d1, d2, . . . , dn on the diagonal is said to be a diagonal matrix and is often written diag{d1, d2, . . . , dn}.
• A diagonal matrix with 1s on the diagonal is called the identity matrix and is denoted I (or In×n to emphasize the dimension).
• A matrix A is symmetric if A = A⊤.
Some rules for matrix operations: For (conformable) matrices
A,B and C the following rules apply
(A + B)^\top = A^\top + B^\top
(AB)^\top = B^\top A^\top
A(B + C) = AB + AC
AB = AC \not\Rightarrow B = C
Inverse of a matrix: The inverse of an n × n matrix A is the matrix B (which is also n × n) which multiplied with A gives the identity matrix I. That is,
AB = BA = I.
One says that B is A's inverse and writes B = A−1.
• Note Only square matrices can have an inverse.
• Note Not all square matrices have an inverse.
• Note When the inverse exists, it is unique.
• Note Finding the inverse of a large matrix A is numerically
complicated.
Example 1. It is easy to find the inverse of a 2 × 2 matrix. When
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
then the inverse is
A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
under the assumption that ad − bc ≠ 0. The number ad − bc is called the determinant of A, sometimes written det(A).
If the determinant det(A) = 0, then A has no inverse. fin
Example 2. Finding the inverse of a diagonal matrix is easy: Let
A = \begin{pmatrix} a_1 & 0 & \ldots & 0 \\ 0 & a_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & 0 & \ldots & a_n \end{pmatrix}
where all ai ≠ 0. Then the inverse is
A^{-1} = \begin{pmatrix} \frac{1}{a_1} & 0 & \ldots & 0 \\ 0 & \frac{1}{a_2} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & 0 & \ldots & \frac{1}{a_n} \end{pmatrix}
If one ai = 0 then A−1 does not exist. fin
Generalized inverse: Not all square matrices have an inverse.
However all square matrices have a generalized inverse.
A generalized inverse of a square matrix A is a matrix A− satisfying
AA−A = A
Any square matrix has an infinite number of generalized inverses.
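Both kinds of inverses are available in PROC IML: inv gives the ordinary inverse, and ginv gives the Moore-Penrose generalized inverse, which is one particular choice among the infinitely many generalized inverses. A small sketch (our addition, with made-up matrices):

proc iml;
   A = {1 2, 2 3};                /* det(A) = -1, so inv(A) exists */
   Ainv = inv(A);
   S = {1 1 0, 1 1 0, 0 0 1};     /* singular: first two rows coincide */
   G = ginv(S);                   /* Moore-Penrose generalized inverse */
   resid = S * G * S - S;         /* should be the 0-matrix, i.e. S G S = S */
   print Ainv, G, resid;
quit;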
Linear Combinations
Let a1, a2, . . . , ac be r–vectors and let A = [a1 : a2 : · · · : ac] be the
corresponding r × c matrix.
Let v = (v1, v2, . . . , vc)⊤ be a c-vector and let
x = Av = a_1 v_1 + a_2 v_2 + \cdots + a_c v_c = \sum_j a_j v_j
Then the r–vector x is said to be a linear combination of
a1, a2, . . . , ac.
Let w = (w1, w2, . . . , wc)⊤ be another c-vector and let correspondingly y = Aw = \sum_j a_j w_j.
Then the following can be noted:
• For a number α the vector αx = α(Av) = A(αv) is also a linear
combination of a1, a2, . . . , ac.
• The sum x+y = Av+Aw = A(v+w) is also a linear combination
of a1, a2, . . . , ac.
• Hence if x and y are both linear combinations of a1, a2, . . . , ac then so
is the sum αx + βy where α and β are numbers.
n–dimensional Spaces
A 2–vector x = (x1, x2) can be regarded as the point with
coordinates (x1, x2) in a 2–dimensional coordinate system, i.e. in the
plane.
Likewise a 3–vector x = (x1, x2, x3) can be regarded as the point
with coordinates (x1, x2, x3) in a 3–dimensional coordinate system,
i.e. in space.
In general an n–vector x = (x1, x2, . . . , xn) can be regarded as the
point with coordinates (x1, x2, . . . , xn) in an n–dimensional
coordinate system, i.e. in an n-dimensional space. Such a space shall here be referred to as Rn. It's hard to draw!
To justify such n–dimensional spaces, suppose x consists of a
location of an object (that takes 3 coordinates), the temperature of
the object (that occupies one coordinate) and the time (that also
occupies one coordinate). Hence the total information about the
object can be regarded as a point in a 5–dimensional space.
Note that if x and y are both vectors in Rn then so is the sum
αx + βy.
Linear Subspaces
Consider a set a1, a2, . . . , ac of r–vectors.
We can regard these vectors as “building blocks” for creating new
vectors as linear combinations of the building blocks. Any such
vector is an r–vector
The set of vectors which can be created as linear combinations of
the “building blocks” is called a linear subspace of Rr.
Such a space, let us call it L, is said to be spanned by a1, a2, . . . , ac
and we write L = span(a1, a2, . . . , ac).
Example 3. Consider the vectors
a_1 = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix}, \qquad a_2 = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}
Hence span(a1, a2) is the set of vectors which can be written as
y = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix} v_1 + \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix} v_2
for all possible choices of v = (v1, v2). fin
More precisely, L consists of all vectors of the form
a_1 v_1 + a_2 v_2 + \cdots + a_c v_c
for all possible choices of c-vectors v = (v1, . . . , vc).
It is common to organize the building blocks as a matrix
A = [a1 : · · · : ac]. Then another way of describing L is as the set of
vectors that can be written as Av, or more precisely
L = {y|y = Av for all possible vectors v}
Frequently one uses the name span(A) for L.
There are some additional aspects of subspaces of which a few will
be illustrated:
Example 4. Consider again the subspace L = span(a1, a2) where
a1 = (2, 6, 4)⊤ and a2 = (1, 5, 7)⊤
• A question is whether all vectors y = (y1, y2, y3)> can be written
as y = a1v1 + a2v2?
The answer is “no”; for example y = (1, 5, 3)⊤ cannot be written in that form.
• Another question is whether there are other ways of representing L?
The answer is “yes” – there are infinitely many. To pick one, let b1 = a1 + a2 and b2 = a1 − a2. Then L = span(b1, b2).
fin
• Note The 0-vector belongs to all linear subspaces. In the previous example one gets y = 0 by choosing all coefficients equal to zero.
Linear dependence and independence
Linearly dependent vectors: A set of vectors a1, ..., ac are linearly dependent if one of them can be written as a linear combination of the others, for example if
a_c = \sum_{j=1}^{c-1} a_j v_j
where the vj are numbers.
Linearly independent vectors: If none of the vectors a1, ..., ac can be written as a linear combination of the others, the set is said to be linearly independent.
Throw–out–technique: If one vector, say ac, can be written as a linear combination of the other vectors, then it can be thrown away without changing the structure of the space, i.e.
span(a1, . . . , ac) = span(a1, . . . , ac−1)
This process can go on until one ends up with a set of linearly independent vectors.
This allows us to find a representation of the subspace which is as simple (economical) as possible.
Example 5. Consider the vectors
a_1 = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}, \quad a_3 = \begin{pmatrix} 0 \\ 2 \\ 5 \end{pmatrix} \quad \text{and} \quad x = \begin{pmatrix} 3 \\ 13 \\ 16 \end{pmatrix}
1. The vector x is a linear combination of a1, a2 and a3, since x = a1 + a2 + a3.
2. Since a3 = a2 − (1/2)a1, the ai vectors are linearly dependent. Consequently x can be written as a linear combination of only a1 and a2, because x = (1/2)a1 + 2a2.
3. The vectors a1, a2 are linearly independent and so are the sets a1, a3 and a2, a3.
fin
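Linear dependence can be checked numerically by computing the rank of the matrix [a1 : a2 : a3]. A PROC IML sketch (our addition; the trace(A−A) idiom is one common way to obtain the rank):

proc iml;
   a1 = {2, 6, 4};   a2 = {1, 5, 7};   a3 = {0, 2, 5};
   A = a1 || a2 || a3;               /* the 3 x 3 matrix [a1 : a2 : a3] */
   r = round(trace(ginv(A) * A));    /* rank of A */
   print r;                          /* r = 2 < 3: the columns are dependent */
quit;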
Basis of a subspace: If the vectors a1, ..., ac span a given subspace
L and are linearly independent, they are said to be a basis for L.
Any linear subspace has infinitely many different bases.
Dimension of a linear subspace: Yet all bases of a linear subspace share a common feature: They have the same number of elements. The number of elements of a basis is the dimension of the subspace.
Throw–away: Having a linearly dependent set of vectors a1, ..., ac one can always apply the throw-away technique to obtain a linearly independent set of vectors. This set is then a basis for span(a1, . . . , ac).
Example 6. Consider the vectors
a_1 = \begin{pmatrix} 2 \\ 6 \\ 4 \end{pmatrix}, \quad a_2 = \begin{pmatrix} 1 \\ 5 \\ 7 \end{pmatrix}, \quad a_3 = \begin{pmatrix} 0 \\ 2 \\ 5 \end{pmatrix}, \quad b_1 = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} \quad \text{and} \quad b_2 = \begin{pmatrix} 2 \\ 8 \\ 9 \end{pmatrix}
and the corresponding matrices A = [a1 : a2 : a3], A2 = [a1 : a2] and B = [b1 : b2].
1. Since a3 = a2 − (1/2)a1, the ai vectors are linearly dependent.
fin
• Note Since L = span(A) = span(B) one can think of the
matrices A and B as two different ways of representing the same
linear subspace.
Projections onto Linear Subspaces
Example 7. Consider the vectors a = (2, 2)⊤ and y = (1, 2)⊤.
Clearly y is not in span(a). In statistics the following question is extremely important: Can we find a vector ŷ in span(a) which is as “close to” y as possible?
The answer is “yes”: Find the (orthogonal) projection of the point y onto the line going through a. There is a simple mathematical expression for obtaining ŷ, namely
\hat{y} = a(a^\top a)^{-1} a^\top y = \begin{pmatrix} 2 \\ 2 \end{pmatrix} \frac{1}{8} (2, 2) \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 3/2 \end{pmatrix}
The property of ŷ is that the length of y − ŷ is as small as possible. Moreover, y − ŷ and ŷ are orthogonal. fin
In general let y be an r–vector and let A = [a1 : · · · : ac] be an r × c
matrix.
Then there always exists a vector ŷ in span(A) which is as close to y as possible.
If y is in span(A), then ŷ = y because in this case the length of y − ŷ is zero.
If y is not in span(A) then the expression is as follows: Assume that
all columns of A are linearly independent. (Recall that if that is not
the case we can throw away redundant columns without changing
the space spanned by those remaining.)
Then ŷ = Py where
P = A(A^\top A)^{-1} A^\top
is the projection matrix onto span(A).
It then holds that
1. Py is in span(A).
2. Py is the vector in span(A) which is closest to y (in the sense that the length of y − Py is minimized).
3. Py = y if and only if y is already in span(A).
Example 8. Consider the 3 × 2 matrix A = [a1 : a2], where
a_1 = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} \quad \text{and} \quad a_2 = \begin{pmatrix} 2 \\ 8 \\ 9 \end{pmatrix}
Then the projection matrix onto span(A) is P = A(A⊤A)−1A⊤. To find P we first calculate
A^\top A = \begin{pmatrix} 1 & 3 & 2 \\ 2 & 8 & 9 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} = \begin{pmatrix} 14 & 44 \\ 44 & 149 \end{pmatrix}
Hence
(A^\top A)^{-1} = \frac{1}{150} \begin{pmatrix} 149 & -44 \\ -44 & 14 \end{pmatrix}
From this we find
(A^\top A)^{-1} A^\top = \frac{1}{150} \begin{pmatrix} 149 & -44 \\ -44 & 14 \end{pmatrix} \begin{pmatrix} 1 & 3 & 2 \\ 2 & 8 & 9 \end{pmatrix} = \frac{1}{150} \begin{pmatrix} 61 & 95 & -98 \\ -16 & -20 & 38 \end{pmatrix}
Finally we find
P = A(A^\top A)^{-1} A^\top = \frac{1}{150} \begin{pmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 9 \end{pmatrix} \begin{pmatrix} 61 & 95 & -98 \\ -16 & -20 & 38 \end{pmatrix} = \frac{1}{150} \begin{pmatrix} 29 & 55 & -22 \\ 55 & 125 & 10 \\ -22 & 10 & 146 \end{pmatrix}
fin
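The same computation in PROC IML (our addition; the vector y below is an arbitrary made-up example, not taken from the slides):

proc iml;
   A = {1 2, 3 8, 2 9};                /* the matrix of Example 8 */
   P = A * inv(t(A) * A) * t(A);       /* projection matrix onto span(A) */
   y = {1, 2, 3};                      /* hypothetical observation vector */
   yhat = P * y;                       /* closest point to y in span(A) */
   print P[format=8.4], yhat;
quit;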
Exercises in linear algebra
Exercise 1. 1. Are the vectors (1, 1) and (1, 2) orthogonal?
2. Are (1, 1) and (2,−2) ?
3. Are (1, 1) and (−1,−1) ?
4. Make a drawing which illustrates these vectors
Exercise 2. Let
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}.
1. Is A symmetrical?
2. Is A>A symmetrical?
3. Is AA> symmetrical?
4. What is the result of adding A and A⊤?
Exercise 3. Let
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.
Calculate AB and BA. What can be concluded from this?
Exercise 4. Let a = (1, 1, 1, 0, 0, 0)⊤ be a 6 × 1 matrix. Find aa⊤ and a⊤a.
Exercise 5. Let
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{and} \quad B = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
Calculate AB. What can be concluded from this?
Exercise 6. What is the inverse of the 3 × 3 matrix diag(1, 4, 9)?
Exercise 7. Two equations with two unknowns. Convince yourself
that the system of equations
x1 + 2x2 = 3
2x1 + 3x2 = 4
can be written as
\begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix},
i.e. as Ax = b. Find A−1 and use this for solving the system of equations as follows:
x = Ix = A−1Ax = A−1b.
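A sketch of this exercise in PROC IML (our addition); the built-in solve function is the numerically preferable alternative to forming the inverse explicitly:

proc iml;
   A = {1 2, 2 3};
   b = {3, 4};
   x  = inv(A) * b;        /* x = A^{-1} b = (-1, 2)' */
   x2 = solve(A, b);       /* same solution without forming inv(A) */
   print x x2;
quit;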
Exercise 8. Let
A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}.
1. How do vectors of the form Av look when v = (v1, v2)⊤?
2. Find the projection matrix P = A(A⊤A)−1A⊤.
3. Let y = (1, 3, 5, 7)⊤. Find Py.
4 Linear normal models
Linear normal models serve as a natural starting point for the presentation of mixed model theory. Most researchers within animal science have at least a working knowledge of linear normal models.
These slides served the purpose of giving an overview of the different concepts and of linking the concepts with the underlying statistical theory. Finally, the standard terminology used within SAS was presented from a theoretical point of view.
Link to the full screen presentation1.
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/LNM.f.pdf
Introduction
Many well known statistical models used in practice, for example
• linear regression,
• multiple regression,
• analysis of variance,
• analysis of covariance,
can be formulated in the general framework of linear normal models
(abbreviated LNM), which undoubtedly is the most important class of
models in statistics.
A linear normal model is also sometimes called a
general linear model.
The SAS procedure PROC GLM is designed to deal with the class of
linear normal models.
Any linear normal model can be formulated in matrix form as
Y = Xβ + ε
where Y is an n × 1 vector of observations, X is an n × p matrix of covariates, β is a p × 1 vector of unknown parameters and ε is an n × 1 vector of unobservable random errors.
Example 1. One–way analysis of variance.
The model
Ykl = αk + εkl
where εkl ∼ N(0, σ2) for k = 1, 2 and l = 1, 2, 3 can be written in matrix form as
\begin{pmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \\ Y_{21} \\ Y_{22} \\ Y_{23} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{pmatrix}
i.e. Y = Xβ + ε.
The vector of expected values µ = (µ11, µ12, . . . , µ23)⊤ is
\begin{pmatrix} \mu_{11} \\ \mu_{12} \\ \mu_{13} \\ \mu_{21} \\ \mu_{22} \\ \mu_{23} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_1 \\ \alpha_1 \\ \alpha_2 \\ \alpha_2 \\ \alpha_2 \end{pmatrix}
i.e. µ = Xβ. fin
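For illustration, the design matrix of Example 1 and the induced mean vector can be reproduced in PROC IML (our addition; the parameter values are made up):

proc iml;
   X = {1 0, 1 0, 1 0, 0 1, 0 1, 0 1};   /* design matrix of Example 1 */
   beta = {2, 3};                        /* hypothetical (alpha1, alpha2) */
   mu = X * beta;                        /* gives (2,2,2,3,3,3)' */
   print mu;
quit;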
There are good reasons for dealing with LNMs in general instead of
treating regression analysis, analysis of variance etc. separately.
For LNMs in general it is easy to establish how to
• estimate parameters,
• estimate contrasts,
• make significance tests,
• perform model control.
From these general results, it can be deduced how to make the
corresponding tests in e.g. regression models and in analysis of
variance.
It is also convenient to work with LNMs in matrix terminology,
because any LNM can be formulated generally as
y = Xβ + ε
Moreover, random effects models (mixed models) are an extension of
linear normal models. I.e. any linear normal model is in a sense also
a mixed model.
Many aspects of mixed models become extremely cumbersome if the
matrix representation is not available.
Example 2. Simple linear regression:
The linear regression model
Yi = β0 + β1xi + εi
where εi ∼ N(0, σ2) for i = 1, . . . , 6 can be written in matrix form as
\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \\ 1 & x_5 \\ 1 & x_6 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
i.e. Y = Xβ + ε.
The vector of expected values µ = (µ1, µ2, . . . , µ6)⊤ is
\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \\ \mu_5 \\ \mu_6 \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \\ 1 & x_5 \\ 1 & x_6 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \beta_0 + \beta_1 x_3 \\ \beta_0 + \beta_1 x_4 \\ \beta_0 + \beta_1 x_5 \\ \beta_0 + \beta_1 x_6 \end{pmatrix}
i.e. µ = Xβ. fin
Linear Normal Models
A linear normal model (LNM) is defined as follows:
1. The observations y1, . . . , yn come from (are realizations of)
independent random variables Y1, . . . , Yn.
2. Each random variable has a normal distribution
Yi = µi + εi, εi ∼ N(0, σ2).
Hence each Yi is allowed to have its own mean value, but the
variance σ2 is the same for all i = 1, . . . , n.
3. To each observation yi there are covariates (known constants)
xi1, . . . , xip such that
\mu_i = \mu(\beta)_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p = \sum_{k=1}^{p} x_{ik}\beta_k.
That is, the mean value µi is related to the covariates in a linear
way through the parameters β1, . . . , βp.
A practical interpretation of constant variance is that each random
variable Yi has the same tendency to deviate (in a random way)
from its expectation µi.
As it has been illustrated, any LNM can be cast in matrix form as
Y = Xβ + ε
where
Y : is an n× 1 vector of observations,
X : is an n× p matrix of covariates, whose ith row is xi1, . . . , xip,
β : is a p× 1 vector of unknown parameters, and
ε : is an n × 1 vector of unobservable random errors which are
independent and N(0, σ2) distributed.
The matrix X is called the design matrix (or model matrix) because
it contains information about covariates, i.e. about the design of the
study.
Example 3. Polynomial regression:
The polynomial regression model
Yi = β0 + β1xi + β2xi² + εi
where εi ∼ N(0, σ2) for i = 1, . . . , 6 can be written in matrix form as
\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \end{pmatrix} = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \\ 1 & x_4 & x_4^2 \\ 1 & x_5 & x_5^2 \\ 1 & x_6 & x_6^2 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
i.e. Y = Xβ + ε. fin
Random Vectors and Matrices
A random vector Z = (Z1, . . . , Zn)> is a vector of random variables.
Since we are working with vectors of random variables, it is
convenient to establish the notions of
• expectation vector (or mean vector ) and
• covariance matrix of a vector of random variables.
• Most frequently the interest is in the mean vector.
• Yet, the covariance matrix is of interest when modelling that
observations cannot be regarded as coming from independent
random variables.
• In fact, one view of mixed models is that mixed models are
concerned with modelling the covariance matrix in some structured
way.
The mean or expectation of a random vector is the vector of mean
values, i.e.
E(Z) = \begin{pmatrix} E(Z_1) \\ E(Z_2) \\ \vdots \\ E(Z_n) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \mu
For a LNM, we have already seen a use of this, namely through
writing
µ = Xβ.
The covariance matrix Cov(Z) of a random vector
Z = (Z1, . . . , Zn)>
is the n× n matrix whose element in the ith row and jth column is
the covariance between Zi and Zj.
Example 4. For example, with n = 3 we have
\mathrm{Cov}(Z) = \begin{pmatrix} \mathrm{Var}(Z_1) & \mathrm{Cov}(Z_1, Z_2) & \mathrm{Cov}(Z_1, Z_3) \\ \mathrm{Cov}(Z_2, Z_1) & \mathrm{Var}(Z_2) & \mathrm{Cov}(Z_2, Z_3) \\ \mathrm{Cov}(Z_3, Z_1) & \mathrm{Cov}(Z_3, Z_2) & \mathrm{Var}(Z_3) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{pmatrix}
fin
In general
Cov(Z)ij = Cov(Zi, Zj) = E[(Zi − µi)(Zj − µj)].
In particular the diagonal elements of Cov(Z) contain the variances,
Cov(Z)ii = Cov(Zi, Zi) = E[(Zi − µi)2] = Var(Zi).
Since Cov(Zi, Zj) = Cov(Zj, Zi), the covariance matrix is
symmetric.
Example 5. The error term ε = (ε1, . . . , εn) from a linear normal model has a very simple covariance matrix:
• Var(εi) = σ2 because the variance is the same for all units
• Cov(εi, εj) = 0 because εi and εj are independent.
• Hence
\mathrm{Cov}(\varepsilon) = \sigma^2 \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{pmatrix} = \sigma^2 I_n
fin
Functions of Random Vectors
Matrix algebra is useful when dealing with
linear functions of random vectors.
If Z is a random n-vector, A is an r× n matrix and b is an r–vector,
then
U = AZ + b
is also a random vector.
The mean and covariance of linear functions of random vectors is
easily calculated using the following:
Result 1.
E(AY + b) = AE(Y) + b    (1)
Cov(AY + b) = Cov(AY) = ACov(Y)A⊤    (2)
A particular application of (1) and (2) is the following:
• Let Z be a random vector of length n with mean E(Z) (an
n–vector) and covariance matrix Cov(Z) (an n× n matrix).
• Let a = (a1, . . . , an)⊤ be a vector of numbers and consider the linear combination U = \sum_i a_i Z_i = a⊤Z.
• Then (1) and (2) imply that
E(U) = E(a⊤Z) = a⊤E(Z)
Cov(U) = Cov(a⊤Z) = a⊤Cov(Z)a
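A numerical illustration of these two formulas in PROC IML (our addition; the mean vector and covariance matrix are made-up values):

proc iml;
   mu    = {1, 2};              /* E(Z), assumed */
   Sigma = {2 1, 1 3};          /* Cov(Z), assumed */
   a     = {1, -1};
   EU   = t(a) * mu;            /* E(a'Z)   = a'mu      = -1 */
   VarU = t(a) * Sigma * a;     /* Var(a'Z) = a'Sigma a =  3 */
   print EU VarU;
quit;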
The Multivariate Normal Distribution
So far, we have treated the mean and covariance of a random vector.
We shall now discuss a distribution of a random vector:
Definition 1. It is said that Z follows an n–dimensional
multivariate normal distribution (in short MVN) with mean vector
µ = E(Z) and covariance matrix Σ = Cov(Z), written
Z ∼ Nn(µ,Σ)
if a⊤Z follows a univariate normal distribution for all possible n-vectors a.
Without going into detail, we shall just mention that if Σ has an
inverse, then Z has a density which can be written
f(z) = (2\pi)^{-n/2} \det(\Sigma)^{-1/2} \exp\{-\tfrac{1}{2}(z - \mu)^\top \Sigma^{-1} (z - \mu)\}
Example 6. For n = 2 the density is the familiar bell-shaped surface over the plane. (The surface plot shown in the slides is omitted here.) fin
The Distribution of a LNM
For a LNM, the vector of unobservable errors is ε = (ε1, . . . , εn)>,
where εi ∼ N(0, σ2) and ε1, . . . , εn are independent.
Hence we have
E(ε) = 0 and Cov(ε) = σ2I
Since any linear combination of independent N(0, σ2)–variables
yields a normal variable we conclude that
ε ∼ Nn(0, σ2I)
Hence for the linear normal model Y = Xβ + ε we find that
E(Y) = µ = E(Xβ + ε) = Xβ + E(ε) = Xβ
Cov(Y) = Cov(Xβ + ε) = Cov(ε) = σ2I
and can write
Y ∼ Nn(Xβ, σ2I).
The Expectation in a LNM
Example 7. (Continuation of Example 1).
The one-way analysis of variance model in Example 1 can be formulated in at least three different ways:
1. As Ykl = αk + εkl, and β = (α1, α2)>.
2. As Ykl = δ + γk + εkl where γ2 = 0, such that γ1 represents the treatment effect. Hence, β2 = (δ, γ1)⊤.
3. As Ykl = δ + ρk + εkl. Thus, β3 = (δ, ρ1, ρ2)⊤.
In many ways, the latter formulation is the most natural and conventional, but it poses some problems.
Let
X = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \qquad X_2 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 0 \\ 1 & 0 \\ 1 & 0 \end{pmatrix} \qquad X_3 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix} \qquad (3)
Any vector which can be written as Xβ must be of the form (a, a, a, b, b, b)⊤ for numbers a and b.
But that is also the case for vectors of the form X2β2 and X3β3. From this we conclude that with respect to the mean vector the matrices X, X2 and X3 are “all the same”. Hence
µ = Xβ = X2β2 = X3β3.
1. X corresponds to writing the model as Ykl = αk + εkl.
2. X2 corresponds to writing the model as Ykl = δ + γk + εkl, withγ2 = 0.
3. X3 corresponds to writing the model as Ykl = δ + ρk + εkl.
Consider the mean vector µ = (2, 2, 2, 3, 3, 3)⊤. The formulation as µ = X3β3 where β3 = (δ, ρ1, ρ2)⊤ is different from the two others in an important way:
• Under the representation µ = Xβ, there is only one choice of β, namely β = (2, 3)⊤, which yields µ.
• Under the representation µ = X2β2, there is only one choice of β2, namely β2 = (3, −1)⊤, which yields µ.
• Under the representation µ = X3β3, there are infinitely many ways of obtaining µ. Two such are β3 = (1, 1, 2)⊤ and β3 = (3, −1, 0)⊤.
fin
• Example 7 illustrates that there in general are different
representations of the same model. Corresponding to the different
representations, there are different parameters, with different
interpretations.
• We say that there are different parametrizations of the same
model.
• The representation µ = X3β3 is said to be over parametrized –
there are too many parameters in the model.
In many practical situations the models we work with are over
parametrized.
Yet, it does not matter which representation of the model we choose
and it is not really important whether the model is over
parametrized in the following sense:
Any question that can be answered under one representation can
also be answered under another.
To treat these issues in detail, it is necessary to think about what a
LNM really says: It says that
y = Xβ + ε where µ = Xβ.
Hence β affects the distribution of the observables y only indirectly,
namely through Xβ.
Therefore, since y is what can be observed, we can only use y for saying “something” about β if this “something” can be expressed
through Xβ.
This observation leads to the important notion of estimability and
estimable functions.
The columns of X define a subspace of Rn which we denote by L,
i.e.
L = span(X).
The statement µ = Xβ simply means that µ can be written as a
linear combination of the column vectors of X, i.e. that µ lies in
span(X).
But as has been illustrated in Example 7, there might be more than
one β vector producing µ.
Hence by saying that µ = Xβ, all one really says is that µ belongs
to L.
Moreover, there are infinitely many different ways of representing L,
because one can always find another matrix, say X2 with
span(X2) = span(X) such that any vector µ = Xβ = X2β2.
Therefore, since the parameter vector β is closely related to the
actual representation of L, and since β might not be uniquely
determined, the value of a parameter vector β is rarely of direct
interest in itself.
Example 8. (Continuation of Example 2)
Let x̄ = \frac{1}{n} \sum_i x_i denote the average of the xi. Define new variables zi = xi − x̄ and consider the regression model
Yi = α0 + α1zi + εi.
This model corresponds to “centering the xi around their mean”. Not surprisingly, this does not change the fundamental structure of the model - it is still a linear regression model, but with the following new design matrix:
X = \begin{pmatrix} 1 & z_1 \\ 1 & z_2 \\ 1 & z_3 \\ 1 & z_4 \\ 1 & z_5 \\ 1 & z_6 \end{pmatrix} = \begin{pmatrix} 1 & x_1 - \bar{x} \\ 1 & x_2 - \bar{x} \\ 1 & x_3 - \bar{x} \\ 1 & x_4 - \bar{x} \\ 1 & x_5 - \bar{x} \\ 1 & x_6 - \bar{x} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \alpha_0 \\ \alpha_1 \end{pmatrix}
fin
Representations of Models in SAS
Here we shall illustrate some of the differences between different
ways of specifying the models in SAS.
The illustration is with PROC MIXED but applies to PROC GLM too.
The model in Example 7 can be analyzed with the SAS program
PROC MIXED;
CLASS TREAT;
MODEL Y = TREAT / SOLUTION;
RUN;
Here TREAT is a variable with levels 1 and 2.
1. First SAS generates the matrix X3.
2. SAS then realizes that the columns of X3 are linearly dependent.
3. SAS therefore proceeds by eliminating columns until a set of linearly
independent columns is achieved. This is done in a systematic
way: The column corresponding to the highest value of TREAT is
removed which yields X2.
The parameter estimates reported by SAS are therefore (δ, γ1).
Note that it is the option SOLUTION that causes the parameter
estimates to be reported.
The SAS program
PROC MIXED;
CLASS TREAT;
MODEL Y = TREAT / NOINT SOLUTION;
RUN;
on the other hand causes SAS to directly generate X, because the
NOINT option specifies that there shall not be a column of 1s in the
design matrix. The parameter estimates reported by SAS are therefore
(α1, α2).
Example 9. Consider the two–way analysis of variance
Yijk = δ + αi + βj + γij + εijk
where i = 1, 2, j = 1, 2 and k = 1, 2, 3. The mean vector is
\mu = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \delta \\ \alpha_1 \\ \alpha_2 \\ \beta_1 \\ \beta_2 \\ \gamma_{11} \\ \gamma_{12} \\ \gamma_{21} \\ \gamma_{22} \end{pmatrix} = X\beta
(where in the design matrix we regard 1 and 0 as vectors of length 3).
This model is highly over parametrized. SAS handles this problem in the way indicated above: A new design matrix giving the same model is created, namely
\mu = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \delta \\ \alpha_1 \\ \beta_1 \\ \gamma_{11} \end{pmatrix} = X_2\beta_2
This corresponds to setting α2 = β2 = γ21 = γ12 = γ22 = 0 beforehand. (That is, every time a parameter contains the level number 2 in its index it is set to zero.) fin
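A sketch of how the model of Example 9 could be specified in SAS (our addition; the variable names A, B and Y are assumptions, not taken from the example):

PROC MIXED;
CLASS A B;
MODEL Y = A B A*B / SOLUTION;
RUN;

With the SOLUTION option the reported estimates would then be (δ, α1, β1, γ11); the parameters whose index contains the highest level number are set to zero as described above.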
This means that SAS solves the problem of an over parametrized
model by simply reducing it to a representation which is not over
parametrized.
As mentioned previously, this is not a problem because any question
that can be answered under one representation of a model can also
be answered under another.
Yet, care should be taken when it comes to interpreting output from
SAS, see Section 18.
Least Squares Estimation in a LNM
In a LNM, the mean µi is a function of the parameter vector β.
One frequently used criterion for estimation is the method of
least squares:
Find the vector µ̂ = (µ̂1, . . . , µ̂n)⊤ which minimizes the sum of squared deviations
D(\beta) = \sum_{i=1}^{n} (y_i - \mu_i)^2
under the restriction that µ = Xβ for some parameter vector β.
• Such a vector µ̂ always exists and is unique.
• We say that β̂ is a least squares estimate for β. Such an estimate β̂ also exists, but it is in general not unique.
Example 10. (Continuation of Example 2)
For the regression analysis we find
$$ D(\beta) = \sum_{i=1}^{n} \big(y_i - (\beta_0 + \beta_1 x_i)\big)^2 $$

Most standard textbooks on statistics take the following approach to minimization of D(β):

1) calculate the derivatives ∂D(β)/∂β0 and ∂D(β)/∂β1,
2) set these equal to zero, and
3) solve for β0 and β1.
This gives

$$ \hat\beta_1 = \frac{\sum_i (y_i - \bar{y}_{.})(x_i - \bar{x}_{.})}{\sum_i (x_i - \bar{x}_{.})^2},
\qquad \hat\beta_0 = \bar{y}_{.} - \hat\beta_1 \bar{x}_{.} $$

fin
Example 11. (Continuation of Example 1) For the one–way analysis of variance

$$ D(\beta) = \sum_{k=1}^{2}\sum_{l=1}^{3} (y_{kl} - \alpha_k)^2 $$

The values of αk which minimize D(β), where β = (α1, α2)ᵀ, are

$$ \hat\alpha_k = \frac{1}{3}\sum_{l=1}^{3} y_{kl} = \bar{y}_{k.} $$

The vector µ̂ is in this case $(\bar{y}_{1.}, \bar{y}_{1.}, \bar{y}_{1.}, \bar{y}_{2.}, \bar{y}_{2.}, \bar{y}_{2.})^\top$.
However, if the model is written as Ykl = δ + αk + εkl, i.e. as Y = X3β3 + ε in Example 7, there is no unique least squares estimate of β3 = (δ, α1, α2). To see this, just note that

δ = 0,  α1 = ȳ1.,  α2 = ȳ2.

and

δ = (ȳ1. + ȳ2.)/2,  α1 = (ȳ1. − ȳ2.)/2,  α2 = (−ȳ1. + ȳ2.)/2

both result in the same vector µ̂ = (ȳ1., ȳ1., ȳ1., ȳ2., ȳ2., ȳ2.)ᵀ. fin
Estimation on matrix form
The estimation problem can be formulated very generally in matrix
notation and can be solved generally using projections onto
subspaces:
Using matrix notation the least squares method is:

Find the vector µ = (µ1, . . . , µn)ᵀ which minimizes

$$ D(\beta) = (y - \mu)^\top (y - \mu) $$

under the restriction that µ = Xβ for some parameter vector β.
Then we have the following results:

1. There always exists a unique vector of expected values µ̂ = (µ̂1, . . . , µ̂n)ᵀ which minimizes D(β).

2. The vector µ̂ is µ̂ = Py, where P is the projection matrix onto span(X).

3. Since µ̂ is in span(X), there exists a vector β1 satisfying µ̂ = Xβ1. We say that β1 is a least squares estimate of β.

4. If the columns of X are linearly independent, there exists only one vector β1 satisfying µ̂ = Xβ1. In that case the least squares estimate is unique.
5. If the columns of X are linearly dependent, there exist several least squares estimates, i.e. there is another vector β2 with µ̂ = Xβ2 and β1 ≠ β2.

6. In regression problems, the least squares estimate is typically unique, whereas in analysis of variance problems, the least squares estimate is generally not unique.

7. In the case where the least squares estimate is unique, it is given as

β̂ = (XᵀX)⁻¹Xᵀy.

It is easy to see why this is so: we know that µ̂ = Py = X[(XᵀX)⁻¹Xᵀy]. However, since µ̂ is in span(X), we also know that µ̂ = Xβ̂. But both equations can only be true if β̂ = (XᵀX)⁻¹Xᵀy.
The vector e = y − µ̂ is the vector of residuals, reflecting the unobserved error vector ε.

Hence eᵀe = (y − µ̂)ᵀ(y − µ̂) is the residual sum of squares, and if the model fits the data well, eᵀe should be "small" in some sense.

If there are p linearly independent columns in X, the estimate for the variance σ² is

$$ \hat\sigma^2 = \frac{1}{n-p}\, e^\top e = \frac{1}{n-p}(y - \hat\mu)^\top (y - \hat\mu) $$
Example 12. (Continuation of Example 7.)

With the matrix X as in Example 7, the projection matrix becomes

$$ P = \frac{1}{3}
\begin{bmatrix}
1 & 1 & 1 & 0 & 0 & 0\\
1 & 1 & 1 & 0 & 0 & 0\\
1 & 1 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 1 & 1\\
0 & 0 & 0 & 1 & 1 & 1\\
0 & 0 & 0 & 1 & 1 & 1
\end{bmatrix} $$

fin
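The projection matrix, and with it µ̂ and σ̂², can also be computed directly. The following PROC IML sketch is our illustration (not part of the original example); it assumes the full–rank design matrix of Example 7 and a made-up response vector y:

proc iml;
  /* Full-rank design matrix: two treatments, three observations each */
  X = {1 0, 1 0, 1 0, 0 1, 0 1, 0 1};
  y = {2, 3, 4, 7, 8, 9};               /* made-up response values    */
  P     = X * inv(t(X)*X) * t(X);       /* projection onto span(X)    */
  bhat  = inv(t(X)*X) * t(X) * y;       /* least squares estimate     */
  muhat = X * bhat;                     /* fitted values, muhat = Py  */
  e     = y - muhat;                    /* residuals                  */
  s2    = t(e)*e / (nrow(X) - ncol(X)); /* variance estimate          */
  print P bhat muhat s2;
quit;

Printing P reproduces the block matrix above with entries 1/3, and bhat contains the two group means.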
The parameter vector β
We shall now assume that the LNM is such that the columns of X are linearly independent, so that the least squares estimate

β̂ = (XᵀX)⁻¹Xᵀy

of β is unique.

Letting A = (XᵀX)⁻¹Xᵀ, we note that A is a p × n matrix and see that β̂ = Ay.
Thinking in terms of random variables, the data y is a realization of a random vector Y with E(Y) = Xβ and Cov(Y) = σ²I. Then

β̂(Y) = (XᵀX)⁻¹XᵀY = AY

is also a random vector, because β̂(Y) is a function of the random vector Y.

If the elements of A are denoted aij, the ith component of β̂ is $\hat\beta_i = \sum_{j=1}^{n} a_{ij} y_j$.

Hence each component β̂i of the vector β̂ is a linear function of the data y. Therefore it is not surprising that the corresponding random variables β̂i(Y) are dependent in some way.
Using the relations (1) and (2) we find that

$$ E(\hat\beta(Y)) = A\,E(Y) = (X^\top X)^{-1}X^\top E(Y) = (X^\top X)^{-1}X^\top X\beta = \beta \qquad (4) $$

Equation (4) says that the expected value of the least squares estimator β̂ is simply the true but unknown value β.
$$ \begin{aligned}
\mathrm{Cov}(\hat\beta(Y)) = A\,\mathrm{Cov}(Y)A^\top = \sigma^2 A I A^\top = \sigma^2 AA^\top
&= \sigma^2 (X^\top X)^{-1}X^\top\big[(X^\top X)^{-1}X^\top\big]^\top\\
&= \sigma^2 (X^\top X)^{-1}X^\top X (X^\top X)^{-1}\\
&= \sigma^2 (X^\top X)^{-1} \qquad (5)
\end{aligned} $$

Equation (5) says that the covariance of the least squares estimator β̂ is proportional to the residual variance σ². Moreover, the matrix (XᵀX)⁻¹ does not depend on the data y but only on the design matrix X, i.e. on how the study at hand was conducted.
Recall that the diagonal of a covariance matrix holds the variances. Hence, knowing (XᵀX)⁻¹ and an estimate of σ², we also know the variance estimates for the β̂i.
Example 13. (Continuation of Example 2) Suppose xi = i and zi = i − 3.5 in the regression example, for i = 1, . . . , 6.
Regression of y on x with the program
PROC GLM ;
MODEL y = x / inv;
RUN; QUIT;
gives the result
The GLM Procedure
X’X Inverse Matrix
Intercept x y
Intercept 0.8666666667 -0.2 -1.286578758
x -0.2 0.0571428571 0.4835938022
y -1.286578758 0.4835938022 3.225955579
Dependent Variable: y Sum of
Source DF Squares Mean Square F Value Pr > F
Model 1 4.09260190 4.09260190 5.07 0.0874
Error 4 3.22595558 0.80648889
Corrected Total 5 7.31855748
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept -1.286578758 0.83603651 -1.54 0.1987
x 0.483593802 0.21467436 2.25 0.0874
The first two diagonal elements of (XᵀX)⁻¹ times the variance estimate σ̂² (i.e. the Mean Square Error) give the variance estimates of the regression parameters.

The square roots of these estimates are the standard errors reported.

Moreover, the off–diagonal element of (XᵀX)⁻¹ is −0.2, so the estimates of the intercept and the slope are (negatively) correlated.
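As a quick check (our computation, using the numbers in the output above): the standard error of the intercept is √(0.8666666667 · 0.80648889) ≈ 0.8360 and that of the slope is √(0.0571428571 · 0.80648889) ≈ 0.2147, exactly the values reported by PROC GLM.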
Regression of y on z with the program
PROC GLM ;
MODEL y = z / inv;
RUN; QUIT;
gives the result
The GLM Procedure
X’X Inverse Matrix
Intercept z y
Intercept 0.1666666667 0 0.4059995498
z 0 0.0571428571 0.4835938022
y 0.4059995498 0.4835938022 3.225955579
The GLM Procedure
Dependent Variable: y Sum of
Source DF Squares Mean Square F Value Pr > F
Model 1 4.09260190 4.09260190 5.07 0.0874
Error 4 3.22595558 0.80648889
Corrected Total 5 7.31855748
Standard
Parameter Estimate Error t Value Pr > |t|
Intercept 0.4059995498 0.36662626 1.11 0.3302
z 0.4835938022 0.21467436 2.25 0.0874
In this case we see that centering the x values around their average (3.5) gives parameter estimates which are uncorrelated. Moreover, the estimate of the slope (and the associated standard error) is the same as before. fin
Example 14. (Continuation of Example 2)
With

$$ X = \begin{bmatrix} 1 & x_1\\ 1 & x_2\\ 1 & x_3\\ 1 & x_4\\ 1 & x_5\\ 1 & x_6 \end{bmatrix},
\qquad \beta = \begin{bmatrix} \beta_0\\ \beta_1 \end{bmatrix} $$

we find (when letting n = 6) that

$$ X^\top X = \begin{bmatrix} n & \sum_i x_i\\ \sum_i x_i & \sum_i x_i^2 \end{bmatrix} $$
Recall that

$$ A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}
\quad\text{implies that}\quad
A^{-1} = \frac{1}{ad-bc}\begin{bmatrix} d & -b\\ -c & a \end{bmatrix} $$

(provided that ad − bc ≠ 0). Using this gives

$$ (X^\top X)^{-1} = \frac{1}{n\sum_i x_i^2 - (\sum_i x_i)^2}
\begin{bmatrix} \sum_i x_i^2 & -\sum_i x_i\\ -\sum_i x_i & n \end{bmatrix} $$

Letting K = n Σi xi² − (Σi xi)², the variance of the estimator β̂0 for the intercept is

$$ Var(\hat\beta_0) = \sigma^2\,\frac{\sum_i x_i^2}{K} $$
and the variance of the estimator β̂1 for the slope is

$$ Var(\hat\beta_1) = \sigma^2\,\frac{n}{K} $$

The estimators β̂0 and β̂1 are correlated, since

$$ Cov(\hat\beta_0, \hat\beta_1) = -\sigma^2\,\frac{\sum_i x_i}{K} $$
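As a numerical check (our addition, tying this to Example 13 where xi = i for i = 1, . . . , 6): Σi xi = 21 and Σi xi² = 91, so K = 6 · 91 − 21² = 105, giving Σi xi²/K = 0.8667, n/K = 0.05714 and −Σi xi/K = −0.2, which are precisely the entries of the (XᵀX)⁻¹ matrix printed by PROC GLM above.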
fin
Example 15. (Continuation of Example 8)
Since Σi(xi − x̄.) = 0 (verify this!), we find that

$$ X^\top X = \begin{bmatrix} n & 0\\ 0 & \sum_i z_i^2 \end{bmatrix}
= \begin{bmatrix} n & 0\\ 0 & \sum_i (x_i - \bar{x}_{.})^2 \end{bmatrix} $$

Since the inverse of a diagonal matrix is also diagonal, we conclude that the estimators α̂0 and α̂1 are independent. fin
The estimator β̂ has a p–dimensional multivariate normal distribution (in short MVN), with mean vector β and covariance matrix σ²(XᵀX)⁻¹.

This is written

$$ \hat\beta \sim N_p(\beta, \sigma^2 (X^\top X)^{-1}). $$

This means that any linear combination λᵀβ̂ has a univariate normal distribution

$$ \lambda^\top\hat\beta \sim N\big(\lambda^\top\beta,\; \sigma^2\lambda^\top (X^\top X)^{-1}\lambda\big) \qquad (6) $$

and that is a very important result for practical statistics.
Estimability and Contrasts
In a LNM with mean vector µ = Xβ one is typically interested in
making statements about (some of) the components of the
parameter vector β.
However, with µ = Xβ we only have indirect knowledge about β, because all we know is that $\mu_i = \sum_j x_{ij}\beta_j$ and, as has been illustrated, β is in general not uniquely determined. That is, there can be another vector β2 such that µ = Xβ = Xβ2.

Hence there are some constraints on what can actually be said about β.
In the one–way analysis of variance of Example 1 one might be
interested in the difference α1 − α2 or in α1 itself and there is no
problem in that. For later purposes it can be noted that
α1 − α2 = (1, −1)(α1, α2)ᵀ = (1, −1)β
α1 = (1, 0)(α1, α2)ᵀ = (1, 0)β
Example 16. Consider the two–way analysis of variance
Yij = δ + αi + βj + εij
where
$$ \mu = \begin{bmatrix}
1 & 1 & 0 & 1 & 0\\
1 & 1 & 0 & 0 & 1\\
1 & 0 & 1 & 1 & 0\\
1 & 0 & 1 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \delta\\ \alpha_1\\ \alpha_2\\ \beta_1\\ \beta_2 \end{bmatrix} = X\beta $$

It is clear that this model is grossly overparametrized (why?).

Under this model we can estimate quantities like

$$ \alpha_1 - \alpha_2, \qquad \delta + \alpha_1, \qquad \delta + \alpha_1 + \tfrac{1}{2}(\beta_1 + \beta_2) $$
Note that

$$ \alpha_1 - \alpha_2 = (0, 1, -1, 0, 0)\beta, \qquad
\delta + \alpha_1 + \tfrac{1}{2}(\beta_1 + \beta_2) = (1, 1, 0, \tfrac{1}{2}, \tfrac{1}{2})\beta $$

However, other things like

$$ \alpha_1 = (0, 1, 0, 0, 0)\beta \quad\text{or}\quad \beta_1 = (0, 0, 0, 1, 0)\beta $$

cannot be estimated under this model.

fin
In a sense, the only thing uniquely determined in a LNM is µ. Therefore the only thing one can truly say something about is linear combinations of µ, i.e. linear combinations of the form

aᵀµ

for some n–vector a.

Most frequently, interest is in contrasts of the form λᵀβ. Therefore, a natural question is how

aᵀµ and λᵀβ

relate to each other.
Since µ = Xβ, we can only say something about β if we can express it as

aᵀXβ.

Note that aᵀX is a 1 × p vector.

Therefore, we can say something about the contrast λᵀβ only if one can find an n–vector a such that

aᵀX = λᵀ.

If there exists such a vector a, the contrast λᵀβ is said to be estimable. In this case the contrast can be written

λᵀβ = aᵀXβ = aᵀµ.
After having estimated µ, the contrast λᵀβ is estimated by

λᵀβ̂ = aᵀXβ̂ = aᵀµ̂.

Recall from the section on estimation that there may in general be many least squares estimates of β. However, the following holds:

Result 2. The least squares estimate of λᵀβ is unique if and only if λᵀβ is estimable.

In other words: the only thing one can say something about in an unambiguous way is estimable functions.
From the general result

$$ \lambda^\top\hat\beta \sim N\big(\lambda^\top\beta,\; \sigma^2\lambda^\top(X^\top X)^{-1}\lambda\big) \qquad (7) $$

we know the distribution of the contrast λᵀβ̂, and hence testing for the contrast being zero is straightforward.

Note that transposing aᵀX = λᵀ gives Xᵀa = λ. Hence the condition for estimability is that λ can be written as a linear combination of the columns of Xᵀ, i.e. as a linear combination of the rows of X.

This amounts to solving a set of linear equations – and computers can do that!
Example 17. (Continuation of Example 16)
We wish to verify that

$$ \delta + \alpha_1 + \tfrac{1}{2}(\beta_1 + \beta_2) = (1, 1, 0, \tfrac{1}{2}, \tfrac{1}{2})\beta $$

is indeed estimable.

That is, we seek a vector a = (a1, a2, a3, a4)ᵀ such that

$$ a^\top X = (1, 1, 0, \tfrac{1}{2}, \tfrac{1}{2}). $$
Direct multiplication gives

a1 + a2 + a3 + a4 = 1
a1 + a2 = 1
a3 + a4 = 0
a1 + a3 = 1/2
a2 + a4 = 1/2

It is not hard to spot that a solution to these equations is a1 = a2 = 1/2 and a3 = a4 = 0.
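The check can also be left to the computer. Here is a minimal PROC IML sketch (our illustration, not part of the original example) verifying that aᵀX reproduces the desired λ:

proc iml;
  /* Design matrix X of Example 16 */
  X = {1 1 0 1 0,
       1 1 0 0 1,
       1 0 1 1 0,
       1 0 1 0 1};
  a = {0.5, 0.5, 0, 0};   /* the candidate vector found above */
  lambda = t(a) * X;      /* should be (1, 1, 0, 0.5, 0.5)    */
  print lambda;
quit;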
fin
Estimability in SAS
In checking whether a specific contrast is estimable, it is
recommended to use PROC GLM.
The following SAS program deals with data from Example 16
proc glm data=a;
class i j;
model y = i j/E;
lsmeans i j /E;
run;
The output caused by the E–option in the MODEL statement is
General Form of Estimable Functions
Effect Coefficients
1 Intercept L1
2 i 1 L2
3 i 2 L1-L2
4 j 1 L4
5 j 2 L1-L4
Recall that β = (δ, α1, α2, β1, β2). The numbers 1,2,3,4,5 identify
the entry of the λ–vector, λ = (λ1, λ2, . . . , λ5), and the Ls specify
the constraints to be satisfied by the λis.
It reads as follows: λ1 can be set to any value L1, and λ2 can be set
to any value L2. But then λ3 is constrained to be equal to L1− L2.
Likewise, λ4 can be set to any value L4, but then λ5 is constrained to be equal to L1 − L4.
From this we see how to specify some contrasts:

λ = (1, 1, 0, 1, 0):       λᵀβ = δ + α1 + β1
λ = (1, 1, 0, 1/2, 1/2):   λᵀβ = δ + α1 + (1/2)(β1 + β2)
λ = (0, 1, −1, 0, 0):      λᵀβ = α1 − α2

But we can also see that the contrast δ + (1/2)(α1 + α2) is not estimable: taking λ1 = 1 and λ2 = λ3 = 1/2 would give the desired result, but setting λ4 = 0 implies that λ5 = 1, so it is not possible.
The contrasts specified above are constructed as follows in PROC GLM (and in PROC MIXED). Note that we have indicated two ways of constructing the last contrast.
title ’Estimation of contrasts’;
proc glm data=a;
class i j;
model y = i j /E;
estimate ’Lambda 1’ intercept 1 i 1 0 j 1 0 / E;
estimate ’Lambda 2’ intercept 1 i 1 0 j .5 .5 / E;
estimate ’Lambda 3’ intercept 0 i 1 -1 j 0 0 / E;
estimate ’Lambda 3’ intercept 0 i 1 -1 / E;
run; quit;
Least Squares Means
The LSMEANS statement in GLM is an attempt to generate meaningful
estimates automatically, sometimes (but not always) with success.
These are denoted least squares means and can be constructed as
title ’Least squares means’;
proc glm data=a;
class i j;
model y = i j ;
lsmeans i j / E stderr;
run; quit;
The output caused by the E–option in the LSMEANS statement is
Least Squares Means
Coefficients for i Least Square Means i Level
Effect 1 2
1 Intercept 1 1
2 i 1 1 0
3 i 2 0 1
4 j 1 0.5 0.5
5 j 2 0.5 0.5
Coefficients for j Least Square Means j Level
Effect 1 2
1 Intercept 1 1
2 i 1 0.5 0.5
3 i 2 0.5 0.5
4 j 1 1 0
5 j 2 0 1
The interpretation of the columns to the right is exactly as before: the vector λ = (1, 1, 0, 0.5, 0.5)ᵀ gives

λᵀβ = δ + α1 + (1/2)(β1 + β2).

From this we see that the LSMEAN for i = 1 is δ + α1 plus the "average effect" of the factor j, i.e. (1/2)(β1 + β2).
Hypothesis Testing

Example 18. The two–way analysis of variance model

Yij = δ + αi + βj + εij,   i = 1, 2, j = 1, 2

will in the following be referred to as the large model.

Data is assumed to be in accordance with the large model.

Suppose we are interested in testing whether βj = 0.
The mean µij of Yij is δ + αi + βj, and the mean vector has the form

$$ \mu = \begin{bmatrix} \mu_{11}\\ \mu_{12}\\ \mu_{21}\\ \mu_{22} \end{bmatrix}
= \begin{bmatrix}
1 & 1 & 0 & 1 & 0\\
1 & 1 & 0 & 0 & 1\\
1 & 0 & 1 & 1 & 0\\
1 & 0 & 1 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \delta\\ \alpha_1\\ \alpha_2\\ \beta_1\\ \beta_2 \end{bmatrix} = X\beta $$

Testing βj = 0 corresponds to testing whether the reduced model

Yij = δ + αi + εij

is in accordance with data.
Under the reduced model, the mean µij of Yij is δ + αi, and the mean vector has the form

$$ \mu = \begin{bmatrix} \mu_{11}\\ \mu_{12}\\ \mu_{21}\\ \mu_{22} \end{bmatrix}
= \begin{bmatrix}
1 & 1 & 0\\
1 & 1 & 0\\
1 & 0 & 1\\
1 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \delta\\ \alpha_1\\ \alpha_2 \end{bmatrix} = X_0\beta_0 $$

Hence testing the hypothesis βj = 0 corresponds to testing whether µ = X0β0 when we "know" that µ = Xβ. fin
Note that any vector µ that can be written as µ = X0β0 can also be
written as µ = Xβ – simply by setting the last two elements of β to
zero.
More generally, any vector in span(X0) is also in span(X), but not
vice versa.
(Recall that span(X0) is the set of vectors that can be written as a
linear combination of the columns of X0.)
Let P and P0 be the projection matrices corresponding to X and
X0. The least squares estimates of µ under the two models are

µ̂ = Py under the large model
µ̂0 = P0y under the reduced model
How do we judge whether the reduced model is feasible? The answer lies in the "distance" between the observations and the expected values.

The vector of residuals

e = y − µ̂ = y − Py = (I − P)y
reflects random deviations from the mean under the large model (in which we "believe"). Therefore the length of e (and hence the squared length eᵀe) is expected to be "small" in some sense.
If the reduced model is true, then e0 = (I − P0)y is also a vector of residuals, and its length should also be small. On the other hand, if the reduced model is not true, then e0 is not just residuals, because it contains some of the variation due to the factor βj. In this case the length of the residual vector is expected to be large.

Consider the difference between the residual vectors

D = e0 − e = (y − P0y) − (y − Py) = Py − P0y = (P − P0)y

If the reduced model is true, then D is just a difference between residuals, and the length of D is expected to be small.
If we let d and d0 denote the number of linearly independent columns in X and X0 respectively, one can show the following:

Result 3.

$$ E\Big(\frac{D^\top D}{d - d_0}\Big) = \frac{1}{d - d_0}\,E(D^\top D) = \sigma^2 + k $$

or equivalently

$$ E(D^\top D) = (d - d_0)(\sigma^2 + k) = (d - d_0)\sigma^2 + (d - d_0)k, $$

where k ≥ 0 and k = 0 when the reduced model is true.
If σ² had been known, the result above would be very useful: if DᵀD is "much larger" than (d − d0)σ², this would indicate that k > 0, which in turn causes us to doubt the feasibility of the reduced model.
There are two problems in this connection:

1. σ² is not known, and
2. what does "much larger" mean?

Yet, in Linear Normal Models there is a simple solution to these two problems, now to be outlined:
Problem 1: σ² is not known.

Under the large model, the variance estimate is

σ̂² = eᵀe/(n − d),

i.e. the residual sum of squares divided by the residual degrees of freedom. It is well known that E(σ̂²) = σ², so it is reasonable to assume that σ̂² ≈ σ². Therefore, if the reduced model is true (and hence k = 0), the ratio

$$ F = \frac{D^\top D/(d - d_0)}{e^\top e/(n - d)} \approx 1. $$
That takes, to some extent, "care of" the problem that σ² is unknown.

Problem 2: what does "much larger" mean?

If the reduced model is not true, then the ratio F will tend to be larger than 1. The remaining problem is to define what is meant by "large". One can show the following:

Result 4. If the reduced model is true, then F has an F(d−d0, n−d)–distribution.

Here d − d0 is the number of parameters removed from the model (i.e. the additional residual degrees of freedom gained by going from the large to the reduced model), and n − d is the residual degrees of
freedom under the large model.
If the reduced model is not true, then F has an expected value larger than 1. Therefore, if F is larger than a pre–specified quantile of the F(d−d0, n−d)–distribution, one doubts the feasibility of the model reduction, i.e. rejects the hypothesis.
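The required quantile is easily obtained in SAS with the FINV function. A minimal sketch (our illustration; the degrees of freedom 1 and 1 are those of Example 19 below):

data _null_;
  /* 95% quantile of the F-distribution with 1 and 1 degrees of freedom */
  fquant = finv(0.95, 1, 1);
  put fquant=;
run;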
Calculating things in Practice
Consider again the difference between the residual vectors

D = e0 − e = (P − P0)y.

There is an easy way to calculate DᵀD in practice:

Result 5.

DᵀD = e0ᵀe0 − eᵀe = RSS0 − RSS

where RSS and RSS0 denote the residual (or error) sums of squares under the large and the reduced model, respectively.
Tests in LNMs in short form
• Consider a LNM Y ∼ Nn(µ, σ²I). Hence Y = µ + e, where e ∼ Nn(0, σ²I).

• Consider two models for the mean value,

M: µ ∈ L = C(X) and M0: µ ∈ L0 = C(X0), where L0 ⊂ L,

where M is assumed to hold true, and let M and M0 denote the corresponding projection matrices, of dimension d and d0.

• Under M, MY = Mµ + Me = µ + Me.
• If M0 is true, then

(M − M0)Y = Mµ + Me − M0µ − M0e = (M − M0)e

is only "random noise". In this case (M − M0)Y is expected to be small.

• Clearly, M − M0 is the projection onto $L \cap L_0^{\perp}$.

• Hence

$$ \frac{\|(M - M_0)Y\|^2}{d - d_0} = \frac{Y^\top(M - M_0)Y}{r(M - M_0)} $$

is a measure of how close M0Y is to MY, in relation to the difference in dimensionality of the models.
• We use the results that

$$ E(Y^\top A Y) = \mathrm{tr}(A\,\mathrm{Var}(Y)) + E(Y)^\top A\,E(Y), \qquad \mathrm{tr}(M) = d, \quad \mathrm{tr}(M - M_0) = d - d_0 $$

• Assuming only M,

$$ E\Big(\frac{Y^\top(M - M_0)Y}{r(M - M_0)}\Big)
= \frac{\sigma^2}{d - d_0}\,\mathrm{tr}(M - M_0) + \frac{\beta^\top X^\top(M - M_0)X\beta}{d - d_0}
= \sigma^2 + \frac{\beta^\top X^\top(M - M_0)X\beta}{d - d_0}
= \sigma^2 + \|v\|^2 $$

where $\|v\|^2 = \beta^\top X^\top(M - M_0)X\beta/(d - d_0)$.

• If M0 is true, then ‖v‖² = 0.
• If we use MSE = Yᵀ(I − M)Y/(n − d) = σ̂² as an estimate of σ², then under M0,

$$ F = \frac{Y^\top(M - M_0)Y/(d - d_0)}{Y^\top(I - M)Y/(n - d)} \approx 1 $$

• Numerator and denominator are independent, since

$$ \begin{pmatrix} I - M\\ M - M_0 \end{pmatrix} Y
\sim N\left( \begin{pmatrix} I - M\\ M - M_0 \end{pmatrix}\mu;\;
\sigma^2 \begin{pmatrix} I - M & 0\\ 0 & M - M_0 \end{pmatrix} \right) $$

• Under M,

$$ \frac{1}{\sigma^2}\, Y^\top(M - M_0)Y \sim \chi^2\big(d - d_0,\; \beta^\top X^\top(M - M_0)X\beta\big), $$

i.e. a non–central χ² distribution; under M0 the non–centrality parameter is zero.
• Hence large values of F cause doubt about M0.
Hypothesis Testing in SAS
In practice SAS performs all relevant calculations (and,
unfortunately, a few more).
Degrees of freedom: A comment regarding the degrees of
freedom reported by SAS is appropriate:
Default in SAS is that all observations are centered around their
average.
This centering “costs” one degree of freedom and therefore SAS
reports the Corrected Total which is n− 1, where n is the
number of observations.
In the large model in Example 18 there are three parameters, (δ, α1, β1).

Because of the centering of the data, SAS does not regard δ as a parameter when it comes to reporting degrees of freedom. So the real number of parameters is the number SAS reports plus 1; hence d = 2 + 1 while d0 = 1 + 1.

(Note: if the NOINT option is specified, the model degrees of freedom become correct.)

In practice it is not a problem whether data are centered or not, because we are mainly interested in differences between the numbers of parameters, i.e. differences in degrees of freedom.
Example 19. (Continuation of Example 18) Below we find the output from fitting the large and the reduced model in PROC GLM.
Dependent Variable: y Large model
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 3.76999467 1.88499734 2.70 0.3954
Error 1 0.69877998 0.69877998
Corrected Total 3 4.46877465
Source DF Type III SS Mean Square F Value Pr > F
i 1 0.73276693 0.73276693 1.05 0.4924
j 1 3.03722775 3.03722775 4.35 0.2847
Dependent Variable: y Reduced model
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 1 0.73276693 0.73276693 0.39 0.5951
Error 2 3.73600773 1.86800386
Corrected Total 3 4.46877465
In the notation from before,

DᵀD = RSS0 − RSS = 3.73600773 − 0.69877998 = 3.037
eᵀe = RSS = 0.699
d − d0 = 3 − 2 = 2 − 1 = 1
n − d = 4 − 3 = 3 − 2 = 1

The F–statistic therefore becomes

F = (3.037/1)/(0.699/1) = 4.35

This is the statistic reported in the Type III SS section of the output. So in most (but not all) cases, SAS does the work for us. fin
Example 20. The two–way analysis of variance with interactions
Yijk = δ + αi + βj + γij + εijk, i = 1, 2; j = 1, 2; k = 1, 2, 3
has mean
$$ \mu = \begin{bmatrix} \mu_{11}\\ \mu_{12}\\ \mu_{21}\\ \mu_{22} \end{bmatrix}
= \begin{bmatrix}
1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0\\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0\\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \delta\\ \alpha_1\\ \alpha_2\\ \beta_1\\ \beta_2\\ \gamma_{11}\\ \gamma_{12}\\ \gamma_{21}\\ \gamma_{22} \end{bmatrix} = X\beta $$
Here we regard µij, 1 and 0 as vectors of length 3, such that µ contains 12 elements.

In this form the model is overparametrized, so SAS works with an equivalent representation, namely

$$ \mu = \begin{bmatrix}
1 & 1 & 1 & 1\\
1 & 1 & 0 & 0\\
1 & 0 & 1 & 0\\
1 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} \delta\\ \alpha_1\\ \beta_1\\ \gamma_{11} \end{bmatrix} = X_2\beta_2 \qquad (8) $$
fin
5 Some Basic Statistical Concepts
This lecture presented/refreshed basic statistical concepts, such as the central limit theorem, principles of estimation, the likelihood principle and tests of hypotheses.
Link to the full screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/StatTheory.f.pdf
Data and Models
The starting point for a statistical analysis is a set of observations

y = (y1, . . . , yn)

resulting from an experiment (or perhaps an observational study) conducted in order to gain insight into a specific area.
We shall in general use the term experiment even though the setting
may not be that of a controlled experiment.
Some Characteristics:
A fundamental characteristic of the experiment is that the outcome
is stochastic rather than deterministic.
Hence, if the experiment is repeated again under similar conditions
the new result would not necessarily be y.
Because of the random/stochastic variation in data, it is natural to consider models based on probability theory, because this is the branch of mathematics dealing with random variation. In this setting, the starting point is the set of possible outcomes

Y = (Y1, . . . , Yn)

of the experiment.
Here Yi could be for example
• the set of all real numbers,
• the set of positive real numbers,
• the set {diseased, not diseased}, or
• the set {low, medium, high}.
The link between the observed value yi and the set of possible values
Yi is established through the notion of a random variable Yi.
A random variable Yi is a function whose values can be in the set
Yi, and the observed value yi is said to be a realization of the
random variable Yi.
The random variable Yi is a function, but not a deterministic
function such as e.g.
f(x) = x2 + 7.
It is a random function whose outcome on one hand is uncertain but
on the other hand typically governed by some rules. Those rules are
best formulated in terms of a probability distribution.
Example 1. : Binomial Experiment Any animal can be infected with a specific disease, i.e. it can be diseased or not–diseased.

For the ith animal in the population the state of disease is denoted by Yi, and Yi can therefore take one of the values {diseased, not diseased} (for brevity written simply as {1, 0}).
fin
Example 2. : Binomial Experiment If the set of possible outcomes of Yi is {diseased, not diseased} (for brevity written simply as {1, 0}), the random variable Yi can be either 1 or 0. A statistical model for Yi is obtained by specifying the probability distribution of Yi, for instance

$$ P(Y_i = y) = \theta^y (1 - \theta)^{1-y} $$

where 0 ≤ θ ≤ 1. fin

Example 3. : Samples from the normal distribution If Yi has a normal distribution, e.g. Yi ∼ N(θ, 1), the set of possible outcomes Yi is the real line. fin
In both examples, the function Yi is specified through a
probability distribution.
The distribution depends on an (unknown) parameter θ. (In the
examples, θ is a single number but more generally the parameter is a
vector θ = (θ1, . . . , θp).)
In statistical terms, one speaks of a parametrical statistical model:
1. It is a statistical model, because the outcome of Yi is described in
terms of a probability distribution.
2. It is a parametrical model because once the parameter θ is known
the distribution is known.
Why the Normal Distribution is so “Normal”
The most frequently employed distribution is the normal distribution.
Many (but certainly not all) random phenomena encountered in
practice exhibit a certain regularity:
1. Observations have a tendency to be clustered around a “mean
value”.
2. Deviations from the “mean value” are often symmetric.
3. The histogram of observations can be well approximated with the
bell–shaped normal (or Gaussian) distribution
[Figure: histogram of z.mean (relative frequency) with an overlaid bell–shaped curve]
The bell–shaped curve is written

$$ f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{1}{2\sigma^2}(y - \mu)^2\Big) $$

Why does this bell–shaped curve fit quite well to many phenomena encountered in practice?
The Central Limit Theorem
Part of the answer is given by the Central Limit Theorem:

Let Z1, . . . , Zn be independent random variables with E(Zi) = µi and Var(Zi) = σi².

Let $Y = \sum_{i=1}^{n} Z_i$. Then $E(Y) = \mu = \sum_i \mu_i$ and $Var(Y) = \sigma^2 = \sum_i \sigma_i^2$.

What about the distribution of Y?
Result 1. The Central Limit Theorem says that

Y ∼approx N(µ, σ²).

The approximation becomes better as n → ∞.

(Note: we have not made any assumption about the distribution of the Zis; it has only been assumed that they are independent.)

Many things encountered in nature can be regarded as the sum of many small (independent) contributions. That is one explanation of why the normal distribution is so "normal".
Example 4. Let Zi be uniformly distributed on [0, 1], i.e. all values in the [0, 1]–interval are "equally likely", for i = 1, . . . , 4.

How does the distribution of $\bar{Z} = \frac{1}{n}\sum_{i=1}^{n} Z_i$ look?

Quite normal, actually!
[Figure: histograms of z1, z2 and z.mean, and a normal Q–Q plot of z.mean]
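A small simulation makes the point concrete. The following SAS sketch is our illustration (not part of the original example); it simulates 1000 means of four uniform variables and plots a histogram with a fitted normal curve:

data sim;
  do rep = 1 to 1000;
    z = 0;
    do i = 1 to 4;
      z = z + ranuni(12345);  /* uniform variate on (0,1) */
    end;
    zmean = z / 4;            /* mean of four uniforms    */
    output;
  end;
  keep zmean;
run;

proc univariate data=sim noprint;
  var zmean;
  histogram zmean / normal;   /* overlay fitted normal curve */
run;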
fin
Some General Principles of Estimation
After establishing a statistical model a problem is to estimate the
value of the parameter θ. To find this estimate we need to make
some assumptions.
In what follows, a very fundamental assumption will be made:
There exists a true (but unknown) value of θ.
If θ had been known, then the distribution of Yi would be known
too. That is we would know the characteristics of the mechanism
which generated the data y.
A consequence of this is that the important task is to obtain a good
estimate of θ. Some examples of doing so are given in the following.
Example 5. (Continuation of Example 2) Consider the experiment of tossing a "pin" n times, giving data y = (y1, . . . , yn). Hence the possible outcomes are Yi = {up, down}, which we write {1, 0}.

It is assumed that

P(Yi = 1) = θ

for all i, such that the probability of observing "pin up" (!) is the same every time. If we observe that the pin points upwards altogether y+ = Σi yi times, then it takes only very little creativity to suggest that the relative frequency

y+/n

is a sensible estimate of θ. fin
Example 6. : Linear regression Consider the case where a known number xi is associated with each outcome yi of the experiment, and where it is suspected that there might be an approximately linear relationship between xi and yi.

This can lead to the linear regression model

Yi ∼ N(θi, σ²) where θi = θ0 + θ1xi

This model is fundamentally different from the model in Example 2: in Example 2, each observation was assumed to have the same distribution. In the present model this is not the case, as the mean of each random variable Yi is allowed to depend on the value of xi.
It is well known from any standard textbook on statistics that the parameters θ = (θ0, θ1) can be estimated by minimizing the squared distance between the observed and the expected values, i.e. by minimizing the function

$$ D(\theta_0, \theta_1) = \sum_i \big(y_i - (\theta_0 + \theta_1 x_i)\big)^2 $$

fin
Example 7. (Continuation of Example 3) Suppose we conduct an experiment where each observation yi is a realization of Yi ∼ N(θ, 1). Then it takes very little imagination to suggest that the average

$$ z_1 = \frac{1}{n}\sum_{i=1}^{n} y_i $$

is a sensible estimate of θ. fin
In the examples above it is easy to suggest ways of estimating the
unknown parameters. These can be described as:
Example 5: Estimation by the relative frequency.
Example 6: Estimation by minimizing the squared distance.
Example 7: Estimation by the average.
However, it is clear that there is a need for:
• General principles for obtaining those estimates.
• Some notion for how “good” an estimate is.
In the following we present and discuss some of these principles briefly.

The exposition is not intended to be comprehensive or very precise.
The aim is solely to illustrate some of the considerations made in
connection with estimation of unknown parameters on the basis of
data.
Eventually the exposition leads to the method of maximum
likelihood.
Method of Moments

One approach is to base the estimation on the moments, i.e. the expectation, variance etc. of random variables.

Recall that the first moment of a random variable X is E(X), and the second central moment of X is E(X − E(X))² = Var(X).

For Example 3 with Yi ∼ N(θ, 1) we define a new random variable, say Z1, as the average of the Yis. Then it is well known that

$$ Z_1 = \frac{1}{n}\sum_{i=1}^{n} Y_i \sim N(\theta, 1/n) $$
The estimate $z_1 = \frac{1}{n}\sum_{i=1}^{n} y_i$ can then be regarded as a realization of the random variable Z1, which has mean E(Z1) = θ.

It is important to keep in mind that Z1 is a function of Y1, . . . , Yn, which can be emphasized by writing Z1(Y). Likewise, z1 is a function of the observed data, which is emphasized by writing z1(y).

We say that

• the random variable Z1(Y) is an estimator, and
• a specific value z1(y) is an estimate.
The method of moments is to consider z1(y) as a good estimate of θ because the corresponding random variable Z1(Y) has θ as its expectation:

E(Z1(Y)) = θ (1)
How good is an estimator?

An estimator with the property (1) is said to be unbiased. Unbiasedness seems to be a desirable property of an estimator. However, there are many estimators with the property (1). Two additional ones are

• the average Z2(Y) = (Y1 + Y2)/2 of the first two random variables, and
• Z3(Y) = Y1, i.e. the first random variable itself.
Yet, intuition indicates that z1 is a "better" estimate of θ than z2 = (y1 + y2)/2, which in turn is "better" than z3 = y1. To be precise about what is meant by "better", we consider the variances of the estimators:

Var(Z1(Y)) = 1/n
Var(Z2(Y)) = 1/2
Var(Z3(Y)) = 1
Hence (with more than 2 observations) we have

Var(Z1) < Var(Z2) < Var(Z3),

and on the basis of this it is clear that we will consider Z1 to be a better estimator of θ than Z2 or Z3.

Note: because estimates are realizations of random variables (their corresponding estimators), it is "a must" always to report a variance, a standard deviation or a related quantity whenever reporting the value of an estimate.
Someone might suggest to estimate θ by Z4(Y) = Z1(Y) + 7.

In terms of considering estimators with small variance as being "good", one can argue that Z4 is just as good as Z1, because Var(Z4) = Var(Z1).

However, E(Z4) = θ + 7 ≠ θ, so Z4 is not an unbiased estimator of θ.

These considerations suggest that good estimators should be unbiased and have as small a variance as possible.
These two criteria lead to the theory of Minimum Variance Unbiased Estimation, sometimes written briefly as MVUE. It is not surprising that Z1 is a MVUE (Minimum Variance Unbiased Estimator).

In general, establishing MVUEs can be a complicated task: finding estimators that are unbiased may not be too hard, but finding one with the smallest possible variance may be very complicated.
Consistency of Estimators
The estimator Z1 has other nice properties compared with Z2, Z3
and Z4.
When the number of observations n tends to infinity, the variance of Z1 tends to 0. The practical implication of this is straightforward: Z1 becomes indistinguishable from its expectation θ. An estimator with this property is said to be consistent.

Consistency is an attractive feature of an estimator, because it means that the estimate of θ gets better and better the more data we collect.

It is clear that none of Z2, Z3 and Z4 are consistent.
Desirable Properties of Estimators
From the discussion above we have found that
• Unbiasedness,
• Smallest possible variance, and
• Consistency
are three attractive properties of estimators.
Estimators, whatever kind they are, are functions of the random
variables Y1, . . . , Yn from which data y1, . . . , yn are realizations.
Hence estimators are random variables and as such they have a
distribution. This distribution is needed when drawing inference
about a parameter, e.g. when making a test or constructing a
confidence interval.
Therefore a fourth desirable property of an estimator is that

• the distribution of the estimator is known.
The Method of Maximum Likelihood

There is a general estimation method called maximum likelihood estimation, to be discussed in the following.

An estimator obtained from this method does not in general have the attractive properties mentioned above – but almost: when the sample size goes to infinity (in a sufficiently well behaved way), the properties hold.

We say that the estimator is asymptotically unbiased, asymptotically has the smallest possible variance, is consistent, and finally, the distribution of the estimator is asymptotically normal.
These four properties of maximum likelihood estimators indicate why this is such a powerful method.

Moreover, it turns out that the estimation can be carried out by maximizing a particular function, called the likelihood function. Maximization of such a function can in practice be complicated, but is in principle not much different from what we all learned in high school: calculate the derivative, set it to zero and solve!
Example 8. : Binomial Experiment

Consider n throws with a pin where θ = Pr("falls with pin up"). Hence the outcome of the ith toss can be {up, down}, written briefly as {1, 0}, and

$$ p(y_i; \theta) = P(Y_i = y_i; \theta) = \theta^{y_i}(1 - \theta)^{1 - y_i} $$
Suppose the observed data are y = {1, 1, 0, 1, 0, 1, 0, . . . , 0, 0}.

If the outcomes of the tosses are independent, then the probability of observing y is

$$ \begin{aligned}
p(y; \theta) &= p(y_1;\theta)\,p(y_2;\theta)\cdots p(y_n;\theta)\\
&= p(1)p(1)p(0)p(1)p(0)p(1)p(0)\cdots p(0)p(0)\\
&= \theta\theta(1-\theta)\theta(1-\theta)\theta(1-\theta)\cdots(1-\theta)(1-\theta)\\
&= \theta^{y_+}(1-\theta)^{n-y_+}
\end{aligned} \qquad (2) $$

where n is the number of times the pin is thrown and y+ = Σi yi is the number of times the pin points up.

fin
The Likelihood function

When data y is observed, p(y; θ) can be regarded as a function of θ. This function is called the likelihood function and is denoted by L(θ).

Hence, in the example,

$$ L(\theta) = \theta^{y_+}(1 - \theta)^{n - y_+}. $$

To be specific, let the pin be thrown n = 25 times, and suppose that pin up is observed y+ = 10 times. Then we have

$$ L(\theta; y) = \theta^{10}(1 - \theta)^{25-10} $$
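The likelihood is easy to tabulate and plot yourself. The following SAS sketch is our addition (any plotting procedure will do; PROC GPLOT requires SAS/GRAPH):

data lik;
  n = 25; yplus = 10;
  do theta = 0.01 to 0.99 by 0.01;
    L    = theta**yplus * (1 - theta)**(n - yplus);        /* likelihood     */
    logL = yplus*log(theta) + (n - yplus)*log(1 - theta);  /* log-likelihood */
    output;
  end;
run;

symbol i=join;
proc gplot data=lik;
  plot L*theta logL*theta;
run; quit;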
Figure 1 shows a plot of L(θ) against θ for n = 25 and y+ = 10.
Figure 1: Likelihood function for n = 25 and y+ = 10.
The Maximum likelihood principle
The principle in maximum likelihood estimation is that
the estimate of θ is the value of θ which maximizes the likelihood
function.
One can think of θ̂ as the value of θ which maximizes the probability of observing the data which one actually has observed.
• This value is called the maximum likelihood estimate (MLE) and is often denoted by θ̂.

• The corresponding estimator is called the maximum likelihood estimator.

For clarity one should write θ̂(y) for the estimate and θ̂(Y) for the corresponding estimator, but this is too cumbersome. So, except for special cases, we simply write θ̂ for both entities and infer from the context whether it is an estimate (a number) or an estimator (the corresponding random variable).

Figure 1 suggests that 0.4 is the maximum likelihood estimate.
It is often easier to maximize the log–likelihood function, denoted by l(θ):

$$ l(\theta) = \log L(\theta) = y_+ \log\theta + (n - y_+)\log(1 - \theta) $$

Since log is a monotone function, the value of θ maximizing l(θ) will also maximize L(θ).
Figure 2 shows a plot of l(θ) against θ for n = 25 and y+ = 10.
Figure 2: Log–Likelihood function for n = 25 and y+ = 10.
Maximization of

$$ l(\theta) = y_+ \log\theta + (n - y_+)\log(1 - \theta) $$

is obtained by solving the equation

S(θ) = l′(θ) = 0,

where l′(θ) denotes the derivative of l(θ).

• The function S(θ) is called the score function.
• The equation S(θ) = 0 is called the likelihood equation.

We find that

$$ S(\theta) = l'(\theta) = \frac{y_+}{\theta} - \frac{n - y_+}{1 - \theta} = 0 $$
which happens if and only if

$$ \hat\theta = \frac{y_+}{n} $$

Hence the maximum likelihood estimate is just the relative frequency. The corresponding maximum likelihood estimator is

$$ \hat\theta(Y_+) = \frac{Y_+}{n}. $$

Hence, when y+ (= 10) is observed, the observed value of the maximum likelihood estimator (i.e. the maximum likelihood estimate) becomes θ̂(y+) = θ̂(10) = 0.4, in accordance with Figure 1 and Figure 2.
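In SAS the same estimate can be obtained by direct numerical maximization of the log–likelihood. A minimal sketch with PROC NLMIXED (our illustration; it assumes a data set pin with one 0/1 variable y per toss):

proc nlmixed data=pin;
  parms theta = 0.5;                          /* starting value           */
  logl = y*log(theta) + (1-y)*log(1-theta);   /* Bernoulli log-likelihood */
  model y ~ general(logl);
run;

With 10 ones among 25 observations the reported estimate should be 0.4, with an approximate standard error based on the Hessian (cf. the asymptotics below).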
How Good is the Estimate?

When y+ = 10 and n = 25 we have θ̂ = 0.4, but the same value is found if y+ = 2 and n = 5.

However, intuition suggests that with 25 observations we should have more confidence that θ̂ is a good estimate than with only 5 observations. That is, we would expect the variance of the estimator to be smaller with 25 observations than with only 5.

It is well known for binomial experiments that Var(Y+) = nθ(1 − θ), and hence that Var(θ̂) = θ(1 − θ)/n, which indeed confirms the intuition.
Figure 3 shows the likelihood function for (n = 5, y+ = 2), (n = 10, y+ = 4), (n = 25, y+ = 10) and (n = 50, y+ = 20).
Figure 3: Likelihood function for (n = 5, y+ = 2), (n = 10, y+ = 4),
(n = 25, y+ = 10) and (n = 50, y+ = 20).
It is clear from those graphs that the more observations, the more "peaked" the likelihood function is, and the higher its curvature at its maximum.

That is, the value of L(θ̂) becomes more and more distinct from the value of L(θ) for θ ≠ θ̂ as more and more observations are made.

It is therefore not surprising that there is a connection (indeed it turns out to be a close connection) between the variance of the maximum likelihood estimator and the curvature of the likelihood function at its maximum. This connection is presented in the next sections.
The Asymptotic Normal Distribution of the MLE
In this section we present a very important result:
The maximum likelihood estimator is asymptotically normally
distributed.
This property of the MLE is central to much practical statistical
inference.
Example 9. Frequently one is interested in making statements about θ on the basis of the experiment. For example, one might be interested in whether one can reasonably assume that the true value of θ is 0.5.

The key to answering this question is the random variable θ̂(Y). Put in a popular way, one has to investigate whether 0.5 is a "likely" outcome of θ̂(Y). To answer that question, one needs to know the distribution of θ̂(Y) – and this distribution is in general very complicated to find. fin
Therefore one frequently resorts to an approximate result, on which so much resides in statistics:

When n → ∞ and certain conditions are satisfied, it holds approximately that

$$ \hat\theta \sim N\Big(\theta,\; -\frac{1}{l''(\hat\theta)}\Big) $$

That is, the distribution of θ̂(Y) will asymptotically be like a normal distribution with the true (but unknown) parameter θ as expectation and variance −1/l″(θ̂).
Example 10. For the binomial experiment it is not hard to see why the MLE is asymptotically normal:

We can regard y+ as a sum of independent random variables yi, where yi = 1 corresponds to "pin up" and yi = 0 to "pin not up". Hence the Central Limit Theorem gives that y+ is approximately normally distributed, and hence so is θ̂ = y+/n.

For a single toss we know that E(yi) = θ and Var(yi) = θ(1 − θ). From this we find that

$$ E(\hat\theta) = \theta, \qquad Var(\hat\theta) = \frac{\theta(1-\theta)}{n} $$

so approximately,

$$ \hat\theta \sim N\Big(\theta,\; \frac{\theta(1-\theta)}{n}\Big) $$

fin
Example 11. In general the answer is not so straightforward. We therefore outline the "standard" calculations which one goes through in this connection.

The expression for the variance is obtained as follows. Recall that the log–likelihood and score functions are given by

$$ l(\theta) = y_+\log\theta + (n - y_+)\log(1 - \theta), \qquad
S(\theta) = l'(\theta) = \frac{y_+}{\theta} - \frac{n - y_+}{1 - \theta} $$

Differentiating the score function and changing sign gives

$$ -l''(\theta) = \frac{y_+}{\theta^2} + \frac{n - y_+}{(1 - \theta)^2} $$
In practice θ is unknown. However, it can be justified to plug the estimate θ̂ = y+/n into l″(θ), and this gives −l″(θ̂) = n/(θ̂(1 − θ̂)).

Hence, asymptotically,

$$ \hat\theta \sim N\Big(\theta,\; \frac{\theta(1-\theta)}{n}\Big) $$
With n = 25 and y+ = 10 we get θ̂ = 0.4 and V̂ar(θ̂) ≈ 0.0096. Hence an (approximate) 95% confidence interval for θ is

$$ \big(\hat\theta - 1.96\sqrt{\widehat{Var}(\hat\theta)}\;;\; \hat\theta + 1.96\sqrt{\widehat{Var}(\hat\theta)}\big)
= (0.4 - 0.19\;;\; 0.4 + 0.19) = (0.21;\, 0.59) $$

fin
Asymptotic normality of transformations of the MLE

If h is a function of θ, then the distribution of h(θ̂) will, asymptotically, look like a normal distribution with mean h(θ) and a variance which can be estimated by −h′(θ̂)²/l″(θ̂), i.e. asymptotically

$$ h(\hat\theta) \sim N\Big(h(\theta),\; -\frac{h'(\hat\theta)^2}{l''(\hat\theta)}\Big) $$
Example 12. For example, if we are more comfortable with interpreting the odds η = h(θ) = θ/(1 − θ), we find h′(θ) = 1/(1 − θ)². Hence, asymptotically,

$$ \hat\eta \sim N\Big(\frac{\theta}{1-\theta},\; \frac{\theta}{n(1-\theta)^3}\Big) $$

which with θ̂ = 0.4 and n = 25 gives an estimated variance of about 0.074. fin
Tests of Hypotheses

The final point to touch upon concerns tests of hypotheses regarding θ. Suppose interest is in testing whether θ is equal to a specific fixed value θ0.

The likelihood ratio test

The maximum likelihood estimate θ̂ is the value of θ which gives the observed data the highest probability, namely L(θ̂). If the value θ0 assigns nearly the same probability L(θ0) as θ̂ does, we would be tempted to accept the hypothesis that θ = θ0.
In other words, it is tempting to consider the likelihood ratio test statistic Q defined by

$$ Q = \frac{L(\theta_0)}{L(\hat\theta)} $$

Clearly, Q is a number between 0 and 1, and values close to 1 are in favor of the hypothesis.

It can be shown that if the hypothesis is true, then

$$ -2\log Q = 2\big(l(\hat\theta) - l(\theta_0)\big) $$

has (when n is large) approximately a χ² distribution with 1 degree of freedom. Large values of −2 log Q lead to rejection of the hypothesis. In Figure 4 it can be seen that −2 log Q is twice the vertical distance between the value of l at θ̂ and at θ0.
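As a numerical illustration (our computation, not in the original text): for n = 25, y+ = 10 and the hypothesis θ0 = 0.5 we get l(θ̂) = 10 log 0.4 + 15 log 0.6 ≈ −16.83 and l(θ0) = 25 log 0.5 ≈ −17.33, so −2 log Q ≈ 2(−16.83 + 17.33) = 1.01. This is well below 3.84, the 95% quantile of the χ²(1) distribution, so the hypothesis θ = 0.5 would not be rejected.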
Figure 4: Illustration of the likelihood ratio test, the score test and
the Wald test.
The Score Test

A test statistic equivalent to −2 log Q is obtained by considering the slope of l at the point θ0. The slope of l at θ̂ is 0 (l′(θ̂) = 0 by the definition of the MLE). Hence values of l′(θ0) near 0 will also speak in favor of the hypothesis.

It can be shown that when n is large and the hypothesis is true, the distribution of the so–called score test statistic

$$ S = -\frac{l'(\theta_0)^2}{l''(\theta_0)} $$

will also look like a χ² distribution with 1 degree of freedom. Hence, when n is large, the likelihood ratio test and the score test are equivalent.
The Wald Test

A third test is the Wald test, which compares the values of θ̂ and θ0 directly, corresponding to the horizontal distance in Figure 4.

It can be shown that when n is large and the hypothesis is true, the distribution of the Wald test statistic

$$ W = -(\hat\theta - \theta_0)^2\, l''(\hat\theta) $$

will also look like a χ² distribution with 1 degree of freedom.

Note that W is simply the square of the difference (θ̂ − θ0) divided by its estimated variance −1/l″(θ̂). In the literature, the term "Wald test" is frequently used about the square root of W, which yields a test statistic with approximately a N(0, 1) distribution.
Hence when n is large the likelihood ratio test, the score test and
the Wald test are equivalent.
How to get the asymptotic normality

This section is somewhat theoretical.

Consider the following general setup: let X be a single random variable. The expectation and variance of X are

$$ \mu = E(X) = \int x\, p(x;\theta)\,dx, \qquad Var(X) = \int (x - \mu)^2\, p(x;\theta)\,dx. $$

Since X is a random variable, then so is the score function S(θ; X) = l′(θ; X).
For later purposes we need the mean and the variance of the score function. To obtain these quantities, we use the following facts:

$$ S(\theta) = l'(\theta; x) = (\log p(x;\theta))' = \frac{p'(x;\theta)}{p(x;\theta)} $$

$$ S'(\theta) = l''(\theta; x) = -\frac{(p'(x;\theta))^2}{p(x;\theta)^2} + \frac{p''(x;\theta)}{p(x;\theta)} $$

$$ \int p(x;\theta)\,dx = 1 $$

The function S′(θ) is called the Hessian (matrix) and is very important in connection with PROC MIXED.
Moreover, in most cases of practical interest, the order of differentiation and integration can be interchanged. Hence

$$ \int \frac{d}{d\theta}\, p(x;\theta)\,dx = \frac{d}{d\theta}\int p(x;\theta)\,dx = \frac{d}{d\theta}\,1 = 0 $$

Mean of the score function. We shall suppress the dependence on X in the following. We find that

$$ E(S(\theta)) = E(l'(\theta)) = \int l'(\theta)\,p(x;\theta)\,dx = \int p'(x;\theta)\,dx $$

Interchanging the order of differentiation and integration yields

$$ E(S(\theta)) = \int \frac{d}{d\theta}\,p(x;\theta)\,dx = \frac{d}{d\theta}\int p(x;\theta)\,dx = \frac{d}{d\theta}\,1 = 0 $$
So the expected value of the score function is zero.

Variance of the score function. The variance of the score function has a special name, namely the Fisher information, and is usually denoted by I(θ). Hence we have

$$ I(\theta) = Var(S(\theta)) = E(S(\theta)^2) = E\big([l'(\theta)]^2\big)
= \int l'(\theta)^2\, p(x;\theta)\,dx = \int \frac{1}{p(x;\theta)}\, p'(x;\theta)^2\,dx $$

because the expected value of the score is zero.

A more convenient expression for the variance can be found in
terms of the derivative of the score function:

$$ \begin{aligned}
E(S'(\theta)) = E(l''(\theta))
&= \int \Big[-\frac{(p'(x;\theta))^2}{p(x;\theta)^2} + \frac{p''(x;\theta)}{p(x;\theta)}\Big]\, p(x;\theta)\,dx\\
&= \int \Big[-\frac{(p'(x;\theta))^2}{p(x;\theta)} + p''(x;\theta)\Big]\,dx
\end{aligned} $$

Interchanging the order of differentiation and integration as before gives that ∫ p″(x; θ)dx = 0. Hence

$$ E(S'(\theta)) = -\int \frac{(p'(x;\theta))^2}{p(x;\theta)}\,dx = -Var(S(\theta)). $$
Hence we have, for a single observation,

$$ E(S(\theta)) = 0, \qquad I(\theta) = Var(S(\theta)) = E(S(\theta)^2) = -E(S'(\theta)) \qquad (3) $$

The likelihood for all data

From (2) it is seen that the likelihood for all data is the product of the likelihoods for each observation, i.e.

$$ L(\theta; y) = p(y_1;\theta)\cdots p(y_n;\theta) = \prod_i p(y_i;\theta). $$

Consequently, the log–likelihood, the score function and the derivative of the score function for all data are sums of independent
components:

$$ l(\theta) = \sum_i l(\theta; y_i) = \sum_i l_i(\theta) $$

$$ S(\theta) = l'(\theta; y) = \sum_i l'(\theta; y_i) = \sum_i S(\theta; y_i) = \sum_i S_i(\theta), \qquad
S'(\theta) = \sum_i S'_i(\theta) \qquad (4) $$

For a single observation we have

$$ E(S_i(\theta)) = 0, \qquad I(\theta) = Var(S_i(\theta)) = E(S_i(\theta)^2) = -E(S'_i(\theta)) $$
and correspondingly, for all observations,

$$ E(S(\theta)) = 0, \qquad Var(S(\theta)) = nI(\theta). $$

We then need three small results:

Result 1: Since S′(θ; y) = Σi S′i(θ), it is reasonable to assume (using the law of large numbers) that

$$ \frac{1}{n}\,S'(\theta) = \frac{1}{n}\sum_i S'_i(\theta) \approx E(S'_i(\theta)) = -I(\theta) $$

Result 2: S(θ) = Σi Si(θ) is a sum of independent random variables with E(Si(θ)) = 0 and Var(Si(θ)) = I(θ). Hence, by
the central limit theorem, approximately,

$$ S(\theta) \sim N(0, nI(\theta)) $$

Result 3: Let θ0 be the true (but unknown to us) value of the parameter θ. Let us assume that θ̂ is a good estimate, i.e. close to θ0. Then

$$ 0 = S(\hat\theta) \approx S(\theta_0) + S'(\theta_0)(\hat\theta - \theta_0) $$

That is,

$$ \frac{1}{\sqrt{n}}\,S(\theta_0) \approx -\frac{1}{n}\,S'(\theta_0)\,\sqrt{n}(\hat\theta - \theta_0)
\approx I(\theta_0)\,\sqrt{n}(\hat\theta - \theta_0) $$
The left hand side is approximately N(0, I(θ)) distributed. Hence, approximately,

$$ \frac{1}{\sqrt{n}\,I(\theta_0)}\,S(\theta_0) \sim N(0, I(\theta)^{-1}). $$

That is, approximately,

$$ \sqrt{n}(\hat\theta - \theta_0) \sim N(0, I(\theta)^{-1}), \quad\text{or}\quad
\hat\theta \sim N(\theta_0, (nI(\theta))^{-1}), $$

as desired.
Likelihood and Linear Normal Models
For a linear normal model maximum likelihood estimation is the
same as least squares estimation. The unknown parameters are β
and σ2, so let θ = (β, σ2).
Because the observations are independent, the likelihood becomes

$$ \begin{aligned}
L(\theta) = f(y_1, \ldots, y_n; \theta)
&= \prod_{i=1}^{n} f(y_i; \theta)\\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\,\frac{1}{\sqrt{\sigma^2}}\exp\Big(-\frac{1}{2\sigma^2}(y_i - \mu_i)^2\Big)\\
&= \frac{1}{(\sqrt{2\pi})^{n}}\,\frac{1}{(\sqrt{\sigma^2})^{n}}\exp\Big(-\frac{1}{2\sigma^2}\sum_i (y_i - \mu_i)^2\Big)\\
&= \frac{1}{(\sqrt{2\pi})^{n}}\,\frac{1}{(\sqrt{\sigma^2})^{n}}\exp\Big(-\frac{1}{2\sigma^2}(y - X\beta)^\top(y - X\beta)\Big)
\end{aligned} $$

For the moment, suppose σ² is known.
Maximizing L(θ) = L(β, σ²) is done by minimizing Σi(yi − µi)² = (y − Xβ)ᵀ(y − Xβ). But this is exactly what is done in least squares estimation.
Once β has been estimated, it can be verified that the maximum likelihood estimate of σ² is

$$ \hat\sigma^2 = \frac{1}{n}(y - \hat\mu)^\top(y - \hat\mu) $$

In practice one never uses this variance estimate. Instead one uses

$$ \tilde\sigma^2 = \frac{1}{n - p}(y - \hat\mu)^\top(y - \hat\mu) $$

where p is the number of parameters in β.
The reason for using the latter estimate is that

$$ E(\hat\sigma^2) = \frac{n - p}{n}\,\sigma^2, \qquad E(\tilde\sigma^2) = \sigma^2. $$

Hence the latter estimate is unbiased while the former is not.
6 An overview
The purpose of this lecture was to illustrate how the research problems within the biological sciences are related to the progress in statistical theory, both in general and in relation to mixed models.

Starting out with an experiment reported by Darwin, the lecture discussed the state of the art of experimental design and analysis in Darwin's time, proceeded with the progress in statistical theory, much of it related to animal breeding, and ended up with the general theory of mixed models. Important researchers such as F. Galton, R.A. Fisher, S. Wright and C.R. Henderson were presented.
The slides are in Danish. Link to the full screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/oversigt.f.pdf
Outline

• Background for the methods
• Historical development
• Relation to our subject areas
February 7, 2001
2
Darwin's Maize

• C. Darwin (1876) The Effects of Cross- and Self-Fertilisation in the Vegetable Kingdom. John Murray, London.
• cf. Fisher, R.A. (1935) Design of Experiments. Oliver and Boyd.
Darwin's Maize (heights in inches and eighths)

Pot       Crossed    Self-fertilised
Pot I     23 4/8     17 3/8
          12         20 3/8
          21         20
Pot II    22         20
          19 1/8     18 3/8
          21 4/8     18 5/8
Pot III   22 1/8     18 5/8
          20 3/8     15 2/8
          18 2/8     16 4/8
          21 5/8     18
          23 2/8     16 2/8
Pot IV    21         18
          22 1/8     12 6/8
          23         15 4/8
          12         18

Darwin's Maize (heights in decimal inches)

Pot       Crossed    Self-fertilised
Pot I     23.50      17.38
          12.00      20.38
          21.00      20.00
Pot II    22.00      20.00
          19.13      18.38
          21.50      18.63
Pot III   22.13      18.63
          20.38      15.25
          18.25      16.50
          21.63      18.00
          23.25      16.25
Pot IV    21.00      18.00
          22.13      12.75
          23.00      15.50
          12.00      18.00
Darwin's Maize
” As only a moderate number of crossed and self-fertilised plants
were measured, it was of great importance to learn, how far the
averages were trustworthy. I therefore asked Mr Galton, who has
much experience in statistical researches, to examine some of my
tables..... I may premise that if we took by chance a dozen score of
men belonging to different nations and measured them, it would I
presume, be very rash to form any judgment from such small
numbers on their average heights. But the case is somewhat
different with my crossed and self-fertilised plants, as they were of
exactly the same age, were subjected from first to last to the same
conditions, and were descended from the same parents”
Galton's approach

Pot       Crossed   Self-fert.   Sorted Crossed   Sorted Self-fert.   Diff.
Pot I     23.50     17.38        23.50            20.38               3.125
          12.00     20.38        23.25            20.00               3.250
          21.00     20.00        23.00            20.00               3.000
Pot II    22.00     20.00        22.13            18.63               3.500
          19.13     18.38        22.13            18.63               3.500
          21.50     18.63        22.00            18.38               3.625
Pot III   22.13     18.63        21.63            18.00               3.625
          20.38     15.25        21.50            18.00               3.500
          18.25     16.50        21.00            18.00               3.000
          21.63     18.00        21.00            17.38               3.625
          23.25     16.25        20.38            16.50               3.875
Pot IV    21.00     18.00        19.13            16.25               2.875
          22.13     12.75        18.25            15.50               2.750
          23.00     15.50        12.00            15.25               -3.250
          12.00     18.00        12.00            12.75               -0.750
Galton's Approach

• Sorting
• Differences
• Dispersion ("most probable error") – but no t-test

Who was Galton?

Anthropology, meteorology, population genetics, eugenics, fingerprints, correlation.

Very interested in measurement methods and in objective quantification of phenomena.

K. Pearson's guru.
Correct approach?

Pot       Crossed   Self-fert.   Diff.
Pot I     23.50     17.38        3.125
          12.00     20.38        3.250
          21.00     20.00        3.000
Pot II    22.00     20.00        3.500
          19.13     18.38        3.500
          21.50     18.63        3.625
Pot III   22.13     18.63        3.625
          20.38     15.25        3.500
          18.25     16.50        3.000
          21.63     18.00        3.625
          23.25     16.25        3.875
Pot IV    21.00     18.00        2.875
          22.13     12.75        2.750
          23.00     15.50        -3.250
          12.00     18.00        -0.750
Correct approach?

• Differences
• Standard deviation + t-test
• ANOVA. Linear normal model.
• Hypothesis testing. Null hypotheses.
• Assumption of independence.
• Randomization
What has happened since?

• R.A. Fisher
  – Rothamsted
• Student (W.S. Gosset)
The Fifth Pot

• What response do we expect in pot 5? What is a guess at the difference?
• Why?
• What is a guess at the level for the self-fertilised plants?
• Random effects, populations, samples
Population Genetics

• Population
• P = A + M
• V(P) = V(A) + V(M)
• h² = V(A)/V(P)
• Ao = (1/2)Am + (1/2)Af
Population Genetics

• R.A. Fisher
• Sewall Wright
• (Haldane)
Hierarchical Populations

[Diagram: a hierarchical population tree – sires at the top, females within sires, offspring within females]
Population Genetics / Animal Breeding

• R.A. Fisher
• Sewall Wright
• Jay R. Lush
• C.R. Henderson
• S.R. Searle
Animal Breeding

• Originally a hierarchical structure
• The structure breaks down, especially due to artificial insemination
• Methods for crossed classifications
• Henderson's Mixed Model Equations
Animal Breeding

• Main emphasis on estimation (selection)
• Dependence described by residual variance and heritability
• The problem is primarily computational (matrix inversion)
• Usually MANY observations!
• Hypothesis testing of minor interest
Mixed Models in General

• Repeated measurements / longitudinal data
• Spatial observations
• Hierarchical experimental designs (e.g. split-plot)
• The Mixed Model Equations as a common frame of reference
• Joint software development
Mixed Models in General

• Hypothesis testing of great interest
• Dependence described by many variance parameters
• Limited number of observations
• Still some loose ends
7 Experimental planning and design
The purpose of the lecture was to refresh the concepts used in experimental planning and design, i.e., hypotheses, power of designs, and blocking. Typical blocking factors were discussed.

Different types of experimental design, such as randomized block, split-plot, Latin square and factorial designs, were discussed, and examples were sought within the participants' areas of research.
The slides are in Danish. Link to full-screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/Forsplanpl.f.pdf
Outline
• Hypotheses
• Decision Support
• Need of information for planning
• Restrictions in experimental design
• Different designs
The Research Process

[Diagram: the research cycle – application (ansøgning), experiment (forsøg), publication (publicering), 'package' (pakke)]
The Research Process

• Get ideas about areas where existing knowledge/theory is insufficient or wrong
• Make observations so that the ideas can be confirmed or refuted
• Decide whether knowledge/theory should be adjusted
• (Quantification of knowledge)
• Group work across time and place
Darwin's Maize (height, inches)

Pot       Crossed   Self-fertilised
Pot I     23.50     17.38
          12.00     20.38
          21.00     20.00
Pot II    22.00     20.00
          19.13     18.38
          21.50     18.63
Pot III   22.13     18.63
          20.38     15.25
          18.25     16.50
          21.63     18.00
          23.25     16.25
Pot IV    21.00     18.00
          22.13     12.75
          23.00     15.50
          12.00     18.00
Hypotheses

Hypothesis A: GMO sugar beets are not harmful to cows
Hypothesis B: GMO sugar beets are harmful to cows

Hypothesis A: Pesticide use reduces fertility
Hypothesis B: Pesticide use does not reduce fertility
Lice – Decision Support

Table 1: Spraying example – payoff table

                       State of the crop
Decision         No lice                          Lice
Spray            Costs of pesticide and labour    Costs of pesticide and labour
Do not spray     0                                Yield loss
Research – Decision Support

Table 2: Research example – payoff table

                          State of 'the world'
Decision                  Hypothesis 1 is true   Hypothesis 2 is true
Accept hypothesis 1       OK                     Error!
Accept hypothesis 2       Costly error!          Breakthrough!
Types of Erroneous Conclusion

              Hypothesis 1     Hypothesis 2
              Type I error     Type II error
Options in the Design Phase

[Figure: overlapping outcome distributions under Hypothesis 1 and Hypothesis 2]

Increase precision        Increase the treatment effect

NB: The Type I error is held constant, e.g. at 0.05.
Biological Input

• Measurement properties
• Expected treatment effects
• Possible conclusions of the experiment
• Dependent vs. independent hypotheses
• Hypothesis-generating properties
Table 3: Overview of expected treatment effects

         Hypothesis 1 is true           Hypothesis 2 is true
Trait    Treatment 1   Treatment 2      Treatment 1   Treatment 2
A        100           100              100           120
B        ...           ...              ...           ...
Typical Blocking Factors

• Litter
• Pen, flock, cage
• Sex
• Ancestry
• Herd
• Observer
Restrictions on Design Options

• Block size
• Housing/management
• Competition for resources
Design Types

• Randomized block designs
• Split-plot designs
• Latin squares
• Incomplete block designs
• Factorial designs
• Fractional designs
8 Randomized Complete Block Design
These are the first slides in the second block of lectures. They start off with the augmentation of the linear normal model to a mixed model. Then PROC MIXED in SAS was presented, and example 1.2.4 in LMSW (Littell et al., 1996) was discussed. The slides can be seen as a summary of chapter 1 in LMSW.
Link to the full screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/RDBC.f.pdf
Outline

• Hypotheses
• Extension of the LNM
• Introduction to PROC MIXED
• RCBD example (1.2.4)
Linear Normal Model

Y11 = δ + α1 + u1 + ε11
Y12 = δ + α2 + u1 + ε12
Y21 = δ + α1 + u2 + ε21
Y22 = δ + α2 + u2 + ε22

εij ∼ N(0, σ²)

Y ∼ N(µ, σ²I)
Matrix formulation

(Y11, Y12, Y21, Y22)> = X (δ, α1, α2)> + Z (u1, u2)> + (ε11, ε12, ε21, ε22)>

with (rows separated by semicolons)

X = [1 1 0; 1 0 1; 1 1 0; 1 0 1]    Z = [1 0; 1 0; 0 1; 0 1]

i.e.

Y = Xβ + Zu + ε

ε ∼ N(0, R),  u ∼ N(0, G)

V(Zu) = Z V(u) Z> = ZGZ>

V(Y) = ZGZ> + R
Random vs. Fixed

• Do the levels of the factor come from a probability distribution? McCulloch & Searle (1997)
• Are inferences to be drawn from these data about just these levels of the factor? Searle (1971)
ML estimation

Type   Distribution          Estimate
LNM    Y ∼ N(Xβ, σ²I)        β̂ = (X>X)−1 X> y

If V is known:

LMM    Y ∼ N(Xβ, V)          β̂ = (X>V−1X)−1 X>V−1 y

V = ZGZ> + R is not known; it depends on parameters, V = f(σ², σ²u).
Likelihood function

l(y, β, σ², σ²u) = −(1/2) log|V| − (1/2)(y − Xβ)>V−1(y − Xβ) − (n/2) log(2π)

[Figure: the log-likelihood (Loglike) plotted as a function of σ²]
Proc Mixed I
PROC MIXED < options > ;
BY variables ;
ID variables ;
WEIGHT variable ;
Proc Mixed II
CLASS variables ;
MODEL dependent = < fixed-effects > < / options > ;
RANDOM random-effects < / options > ;
REPEATED < repeated-effect> < / options > ;
PARMS (value-list) ... < / options > ;
PRIOR <distribution > < / options > ;
Proc Mixed III
CONTRAST ’label’ < fixed-effect values ... >
< | random-effect values ... > , ... < / options > ;
ESTIMATE ’label’ < fixed-effect values ... >
< | random-effect values ... >< / options > ;
LSMEANS fixed-effects < / options > ;
MAKE ’table’ OUT=SAS-data-set ;
Proc Mixed
Model concerns Xβ
Random concerns Zu and G = V(u)
Repeated concerns ε and R = V(ε)
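To make this mapping concrete, here is a minimal hedged sketch (the data set mydata and the variables y, trt and block are hypothetical; the REPEATED structure shown is just one possible choice for R):

proc mixed data=mydata;          /* hypothetical data set */
  class block trt;
  model y = trt;                 /* MODEL: the fixed part X*beta */
  random block;                  /* RANDOM: Zu with G = V(u) */
  repeated / type=ar(1) subject=block;  /* REPEATED: residual structure R */
run;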
The ingot data (Data Set 1.2.4)

ingot – casting block / metal bar
metal – the metal used for bonding (soldering?) the ingot (nickel, iron, copper)
pres – the pressure required to break the bond

/*---Data Set 1.2.4---*/
data rcb;
  input ingot metal $ pres;
  datalines;
1 n 67.0
1 i 71.9
1 c 72.2
...
;
Design

              Ingot no.
Bond     1   2   3   4   5   6   7
1        n   i   c   c   c   n   n
2        c   n   i   i   n   c   i
3        i   c   n   n   i   i   c
Other examples of RCBDs

• Paired observations – the on-farm testing scheme ("Den rullende Afprøvning")
• (Beretning 685) Increasing amounts of sunflower seed (4 levels); 20 litters of 4 pigs.
• Beretning 546. Rearing intensity, Jersey; 10 pairs of monozygotic twins; high vs. low intensity.
• Forskningsrapport 25. The Airwash system; herd split by even vs. odd cow numbers.
PROC MIXED model

proc mixed data=rcb;
  class ingot metal;
  model pres = metal;
  random ingot;
  lsmeans metal / pdiff;
  estimate 'nickel mean' intercept 1 metal 0 0 1;
  estimate 'copper vs iron' metal 1 -1 0;
  contrast 'copper vs iron' metal 1 -1 0;
run;
Second notation

Yij = µ + αi + uj + εij

uj ∼ N(0, σ²u),  εij ∼ N(0, σ²ε)
Third notation

Y = Xβ + Zu + ε

u ∼ N(0, G),  ε ∼ N(0, R)
SAS (8e) Output

The Mixed Procedure

Model Information

Data Set                     WORK.RCB
Dependent Variable           pres
Covariance Structure         Variance Components
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Containment
Class Level Information

Class    Levels    Values
ingot    7         1 2 3 4 5 6 7
metal    3         c i n
Dimensions

Covariance Parameters      2
Columns in X               4
Columns in Z               7
Subjects                   1
Max Obs Per Subject        21
Observations Used          21
Observations Not Used      0
Total Observations         21
Iteration History

Iteration    Evaluations    -2 Res Log Like    Criterion
0            1              112.40987952
1            1              107.79020201       0.00000000

Convergence criteria met.
Estimates of σ²u and σ²ε

Covariance Parameter Estimates

Cov Parm     Estimate
ingot        11.4478
Residual     10.3716
Criteria for model fit, used when comparing models.

Fit Statistics

-2 Res Log Likelihood         107.8
AIC (smaller is better)       111.8
AICC (smaller is better)      112.6
BIC (smaller is better)       111.7
Significance test

Type 3 Tests of Fixed Effects

            Num   Den
Effect      DF    DF    F Value    Pr > F
metal       2     12    6.36       0.0131
Degrees of Freedom

Numerator: H0 : α1 = α2 = α3 = 0

K>β = 0  ⇔  [0 1 −1 0; 0 1 0 −1; 0 0 1 −1] (µ, α1, α2, α3)> = 0

Num DF is rank(K).
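As a small check, note that the third row of the matrix above is the difference of the first two rows, so rank(K) = 2; this matches the Num DF of 2 for metal in the output above.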
Denominator – Containment method: "Denote the fixed effect in question A, and search the RANDOM effect list for the effects that syntactically contain A. For example, the RANDOM effect B(A) contains A, but the RANDOM effect C does not, even if it has the same levels as B(A).
Among the RANDOM effects that contain A, compute their rank contribution to the (X Z) matrix. The DDF assigned to A is the smallest of these rank contributions. If no effects are found, the DDF for A is set equal to the residual degrees of freedom, N − rank(X Z)."

Methods: CONTAIN, BETWITHIN, RESIDUAL, SATTERTH, KENWARDROGER.
MODEL .... / DDFM=SATTERTH;
Output from ESTIMATE

Estimates

                                 Standard
Label             Estimate       Error      DF    t Value    Pr > |t|
nickel mean       71.1000        1.7655     12    40.27      <.0001
copper vs iron    -5.7143        1.7214     12    -3.32      0.0061

Contrasts

                   Num   Den
Label              DF    DF    F Value    Pr > F
copper vs iron     1     12    11.02      0.0061
Least Squares Means

                                Standard
Effect    metal    Estimate     Error      DF    t Value    Pr > |t|
metal     c        70.1857      1.7655     12    39.75      <.0001
metal     i        75.9000      1.7655     12    42.99      <.0001
metal     n        71.1000      1.7655     12    40.27      <.0001

Differences of Least Squares Means

                                           Standard
Effect    metal    _metal    Estimate      Error      DF    t Value    Pr > |t|
metal     c        i         -5.7143       1.7214     12    -3.32      0.0061
metal     c        n         -0.9143       1.7214     12    -0.53      0.6050
metal     i        n         4.8000        1.7214     12    2.79       0.0164
GLM vs. MIXED

GLM:
Source    DF    Type III SS     Mean Square    F Value    Pr > F
ingot     6     268.2895238     44.7149206     4.31       0.0151
metal     2     131.9009524     65.9504762     6.36       0.0131

Mixed:
            Num   Den
Effect      DF    DF    F Value    Pr > F
metal       2     12    6.36       0.0131
GLM:
                             Standard                LSMEAN
metal    pres LSMEAN         Error        Pr > |t|   Number
c        70.1857143          1.2172327    <.0001     1
i        75.9000000          1.2172327    <.0001     2
n        71.1000000          1.2172327    <.0001     3

Mixed: Least Squares Means
                                Standard
Effect    metal    Estimate     Error      DF    t Value    Pr > |t|
metal     c        70.1857      1.7655     12    39.75      <.0001
metal     i        75.9000      1.7655     12    42.99      <.0001
metal     n        71.1000      1.7655     12    40.27      <.0001
GLM:
                                 Standard
Parameter         Estimate       Error         t Value    Pr > |t|
nickel mean       71.1000000     1.21723265    58.41      <.0001
copper vs iron    -5.7142857     1.72142692    -3.32      0.0061

Mixed:
                                 Standard
Label             Estimate       Error      DF    t Value    Pr > |t|
nickel mean       71.1000        1.7655     12    40.27      <.0001
copper vs iron    -5.7143        1.7214     12    -3.32      0.0061

Note that the standard error of the nickel mean differs between GLM and MIXED: GLM treats ingot as fixed, so only the residual variance enters the standard error of a mean, whereas MIXED also includes the ingot variance component.
Summary

• Model specification
• Output elements
• Estimation methods
• Fit statistics / information criteria
• Degrees of freedom, model parameters
• GLM differs
IC Option

The IC option displays a table of various information criteria. The criteria are all in smaller-is-better form and are described in the table below.

Criterion   Formula                      Reference
AIC         −2l + 2d                     Akaike (1974)
AICC        −2l + 2d·n*/(n* − d − 1)     Burnham and Anderson (1998)
HQIC        −2l + 2d log(log(n))         Hannan and Quinn (1979)
BIC         −2l + d log(n)               Schwarz (1978)
CAIC        −2l + d(log(n) + 1)          Bozdogan (1987)

Here l denotes the maximum value of the (possibly restricted) log likelihood, d the dimension of the model, and n the number of observations. In Version 6 of SAS/STAT software, n equals the number of valid observations for maximum likelihood estimation and n − p for restricted maximum likelihood estimation, where p equals the rank of X. In later versions, n equals the number of effective subjects as displayed in the "Dimensions" table, unless this value equals 1, in which case n equals the number of levels of the first RANDOM effect you specify. If the number of effective subjects equals 1 and you have no RANDOM statements, then n reverts to the Version 6 values. For AICC (a finite-sample corrected version of AIC), n* equals the Version 6 values of n, unless this number is less than d + 2, in which case it equals d + 2.

For restricted likelihood estimation, d equals q, the effective number of estimated covariance parameters. In Version 6, when a parameter estimate lies on a boundary constraint, it is still included in the calculation of d, but in later versions it is not. The most common example of this behavior is when a variance component is estimated to equal zero. For maximum likelihood estimation, d equals q + p.

For ODS purposes, the name of the "Information Criteria" table is "InfoCrit".
9 Randomized Complete Block Design II
These slides discuss the concepts of BLUE and BLUP estimates. The question of model checking is also addressed.
Link to the full-screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/RDBC2SLU.f.pdf
Outline

• BLUEs and BLUPs
• Examples of model checking
BLUEs and BLUPs
• Best Linear Unbiased Estimator l>Xβ0
• Best Predictor: E(u|y)
Linear Regression
[Figure: two scatter plots of x2 against x1, illustrating the conditional mean of x2 given x1]
Linear Regression

(x2, x1)> ∼ N((µx2, µx1)>, [V2 C21; C12 V1])

E(X2|X1) = µx2 + C21 V1−1 (x1 − µx1)

V(X2|X1) = V2 − C21 V1−1 C21>

V(E(X2|X1)) = C21 V1−1 C21>
(u, y)> ∼ N((µu, µy)>, [G C; C> V])

u1 = u1
u2 = u2
Y11 = δ + α1 + u1 + ε11
Y12 = δ + α2 + u1 + ε12
Y21 = δ + α1 + u2 + ε21
Y22 = δ + α2 + u2 + ε22
BLUEs and BLUPs

• Best Linear Unbiased Estimator: l>Xβ0
• Best Predictor: E(u|y)
• Best Linear Predictor: µu + CV−1(y − µy)
• Best Linear Unbiased Predictor:
  BLUP(t>Xβ + s>u) = t>Xβ0 + s>CV−1(y − Xβ0)
• Estimated Best (?) Linear Unbiased Predictor:
  EBLUP(t>Xβ + s>u) = t>Xβ0 + s>ĈV̂−1(y − Xβ0)
  = t>Xβ0 + s>ĜZ>V̂−1(y − Xβ0)
Variance in BLUP

u: true value, û: BLUP estimate, εu: error of prediction.

u = û + εu  ⇔  u − û = εu

V(u) = V(û) + V(εu)

The error of prediction:

V(u − û) = G − CV−1C>

The variance of the BLUP value:

V(û) = CV−1C>
Example

One-way classification model: the effect of the number of observations per block.

ûi = BLUP(ui) = (ni σ²u / (σ² + ni σ²u)) (yi· − µ̂)

i: block no., ni: number of observations in block i, yi·: block mean.

As ni → ∞ the coefficient ni σ²u/(σ² + ni σ²u) → 1 and the variance of the BLUP estimates V(ûi) → G.
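As a quick plug-in illustration, using the variance component estimates from the ingot example above (σ²u ≈ 11.45, σ² ≈ 10.37, and ni = 3 observations per ingot), the shrinkage coefficient is 3 · 11.45/(10.37 + 3 · 11.45) ≈ 0.77, so each ingot's BLUP pulls the raw block deviation about 23% of the way towards zero.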
Fixed vs. Random
[Figure: distributions of estimated block effects, fixed vs. random; panel (c) Ingots, panel (d) Litter]
BLUP summary

• The BLUP corresponds to the conditional expectation of the random effect given the observations.
• Under normality assumptions and known variances, BP = BLUP.
• With unknown variances this no longer holds.
• The variance of the BLUPs depends on the precision of the information concerning the random effects.
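For reference, here is a minimal sketch of how the EBLUPs can be requested in PROC MIXED for the ingot example (the S option on the RANDOM statement prints the random-effect solutions, i.e. the EBLUPs):

proc mixed data=rcb;
  class ingot metal;
  model pres = metal / solution;  /* prints the fixed-effect estimates */
  random ingot / s;               /* 's' prints the EBLUPs for the ingots */
run;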
Model check – the LNM

• The εi are independent and identically distributed, εi ∼ N(0, σ²)
• Residuals vs. predicted values
• Residuals vs. anything else
• Probit plots
• εi,t vs. εi,t−1
• etc.
Residuals – Mixed Models

Distribution of residuals in mixed models:

(y − Xβ̂) = (Zu + ε) ∼ N(0, V)

i.e., not i.i.d. (option OUTPM in PROC MIXED)

Another definition of residuals:

(y − Xβ̂ − Zû) = (Z(u − û) + ε) ∼ N(0, VG − VG V−1 VG> + R)

where VG = ZGZ>; i.e., again not i.i.d. (option OUTP in PROC MIXED)

Standardized residuals?
Residuals vs. predicted values

[Figure: residuals (r1, r2) plotted against predicted values (p1, p2); panels (e) and (f)]
10 Split-Plot Experiments
These slides present the theoretical background for split-plot designs. They augment the presentation of split-plot designs in chapter 2 of LMSW (Littell et al., 1996). The concept of variance components is presented, together with the variances of different contrasts. In addition, concepts such as the distribution of sums of squares, Satterthwaite's approximation and the distinction between random and fixed effects are presented.
Link to the full screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SplitPlot.f.pdf
The General Idea behind Split-Plot Experiments

"Once upon a time there were linear normal models – systematic effects plus one error term..."

Yet, many experiments and studies have a hierarchical structure

• with respect to treatments
• with respect to error structures

Split-plot models are a very powerful (and early) way of handling such situations.
The name "split-plot" comes from the area of field experiments:

• Some treatments (say factor A) are applied to entire plots (parcels). Those plots are called whole-plot units, and the factor A is the whole-plot factor.
• A plot is sometimes further subdivided into sub-plots, and other treatments (say factor B) are applied to each of these sub-plots. The sub-plots are called split-plot units, and the factor B is called the split-plot factor.
Other examples:

• Treatment A (e.g. feeding) applied to a whole pig pen (the whole-plot), while treatment B (something...) is applied to pigs within a pen.
• Treatment A is applied to an entire litter of piglets; treatment B is applied to each piglet in the litter.
• Treatment A is a management strategy applied to a whole farm, while treatment B is a treatment of each pig pen on the farm.
The basic property of split-plot experiments is that subjects within a whole-plot are more similar than subjects in different whole-plots.

More generally, subjects/individuals/plots close (in some sense) to each other are expected to be more similar than if they were further apart.

Split-plot models are sometimes appropriate for analyzing repeated measurements.
Example 1. (Example 2.2 from LMSW.)

• The effect of 3 bacterial inoculation treatments (INOC, indexed by j) applied to 2 grass cultivars (CULT, indexed by i).
• There are 4 blocks (BLOCK, indexed by k), and CULT is randomly assigned to each half of a block.
• Half a block is the whole-plot unit. Each whole-plot unit is subdivided into 3 split-plot units, and each INOC is applied there.

The statistical model is

yijk = µ + αi + βj + γij + rk + wik + εijk

where rk ∼ N(0, σ²r), wik ∼ N(0, σ²w) and εijk ∼ N(0, σ²). fin
Variance and Correlation

The total variance is

Var(yijk) = Var(rk) + Var(wik) + Var(εijk) = σ²r + σ²w + σ² = σ²tot

which justifies the name variance component model:

• The total variance is a sum of individual variance contributions.
• Moreover, each variance contribution can be assigned to a specific feature of the experiment.
The variance components have implications for the correlation structure among the variables:

1. Observations within the same block (k) but with different levels of factor A (i) are correlated through the block component:

   Corr(yijk, yi′j′k) = Corr(yijk, yi′jk) = Cov(yijk, yi′jk)/Var(yijk) = σ²r/σ²tot

2. Observations within the same block (k) and with the same level of factor A (i) but different levels of factor B (j) are correlated through the block component and the whole-plot component:

   Corr(yijk, yij′k) = Cov(yijk, yij′k)/σ²tot = (σ²r + σ²w)/σ²tot
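As a small numeric illustration (the values are hypothetical): if σ²r = 1, σ²w = 2 and σ² = 3, then σ²tot = 6, so observations sharing only a block have correlation 1/6 ≈ 0.17, while observations sharing both block and whole-plot have correlation (1 + 2)/6 = 0.5.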
Hence in the split plot model it is assumed that the correlation, when
present, is positive.
The split–plot structure has important implications with respect to the
statistical inference:
1. The effect of the interaction between A and B and treatment B itself
should be compared with the “residual variation” i.e. the variation
between the split–plot units.
2. The effect of treatment A should be compared with the “whole–plot
variation”, i.e. the variation between the whole–plot units.
We shall illustrate these points for a balanced split–plot experiment.
Comparing Differences

Consider again the model

yijk = µ + αi + βj + γij + rk + wik + εijk

where rk ∼ N(0, σ²r), wik ∼ N(0, σ²w) and εijk ∼ N(0, σ²), and i = 1...a, j = 1...b and k = 1...c.

A simple calculation of differences of means illustrates the special issues arising in a split-plot experiment.
Different levels of factor A can be compared by

y1.. − y2.. = α1 − α2 + γ1. − γ2. + (w1. − w2.) + (ε1.. − ε2..)

Var(y1.. − y2..) = Var(w1. − w2.) + Var(ε1.. − ε2..) = 2σ²w/c + 2σ²/(bc) = (2/c)(σ²w + σ²/b)

Different levels of factor B can be compared by

y.1. − y.2. = β1 − β2 + γ.1 − γ.2 + (ε.1. − ε.2.)

Var(y.1. − y.2.) = Var(ε.1. − ε.2.) = 2σ²/(ac)

Hence Var(y1.. − y2..) is bigger than Var(y.1. − y.2.).

In other words, the effect of the whole-plot factor is determined less accurately than the effect of the split-plot factor.
Inference Issues for Mixed Models
For balanced experiments, inference is based on F–tests.
For unbalanced cases, inference is a delicate issue. Loosely speaking
“What are the denominator degrees of freedom”.
In PROC MIXED one can make “approximate F–tests” (but SAS never
informs you that the tests are only approximate).
Several suggestions have been made regarding this. One such is Satterthwaite's approximation.
Analysis of the Split-Plot Experiment

Consider again the model

yijk = µ + αi + βj + γij + rk + wik + εijk

where rk ∼ N(0, σ²r), wik ∼ N(0, σ²w) and εijk ∼ N(0, σ²), and i = 1...a, j = 1...b and k = 1...c.

For simplicity suppose that factor B does not represent a treatment but only replications within each whole-plot. Then the model reduces to

yijk = µ + αi + rk + wik + εijk
The replicates due to factor B are eliminated by calculating the average within each block and treatment:

yi.k = µ + αi + rk + (wik + εi.k),  where Var(wik + εi.k) = σ²w + σ²/b

• Hence the between whole-plot variation (σ²w) remains unchanged, while the within whole-plot contribution σ² is reduced by a factor b.
• Therefore, by taking more replicates within a whole-plot unit, parts of the variation are reduced, while other parts remain the same.
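For a small numeric illustration (the values are hypothetical): with σ²w = 2, σ² = 3 and b = 3 replicates, the averaged observation has variance σ²w + σ²/b = 2 + 1 = 3; increasing b further can only push this towards σ²w = 2.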
Modelling the Mean

Let zik = yi.k denote the mean and define uik = wik + εi.k.

Then the model for the means can be written

zik = µ + αi + rk + uik

where uik ∼ N(0, σ²u) with σ²u = σ²w + σ²/b, and rk ∼ N(0, σ²r).

This is an ordinary ANOVA model with one treatment, one (random) block effect and no interaction. Analyzing such a model is straightforward.
Three Technical Results

In connection with ANOVA calculations, one frequently uses the following results:

ANOVA1: Let X, Y be independent with E(X) = E(Y) = 0 and let a be a number. Then

E(a + X + Y)² = Var(a + X + Y) + [E(a + X + Y)]² = Var(X) + Var(Y) + a² = E(X²) + E(Y²) + a²

ANOVA2: Let Y1,...,Yn be independent with Yi ∼ N(µ, σ²), and let SSD = ∑(i=1..n) (Yi − Y.)². Then

E(SSD) = (n − 1)σ² = (n − 1) Var(Yi)

SSD ∼ σ²χ²(n − 1)
ANOVA3: Let Y1,...,Yn be independent with Yi = µi + εi, where εi ∼ N(0, σ²), and let

SSD = ∑(i=1..n) (Yi − Y.)²  and  Q(µ) = ∑(i=1..n) (µi − µ.)²

Then

E(SSD) = Q(µ) + E(∑(i=1..n) (εi − ε.)²) = Q(µ) + (n − 1)σ²
With

zik = µ + αi + rk + uik

averaging gives

zi. = µ + αi + r. + ui.
z.. = µ + α. + r. + u..

The difference

zi. − z.. = (αi − α.) + (ui. − u..)

is a measure of the treatment effect and does not depend on the block.
Letting SSDA = ∑i (zi. − z..)² we find that

E(SSDA) = ∑i (αi − α.)² + E(∑i (ui. − u..)²) = Q(α) + (a − 1)σ²u/c

and hence

E(c ∑i (zi. − z..)²) = cQ(α) + (a − 1)σ²u.

• If there is no effect of treatment A, then Q(α) = 0 and SSDA has a (scaled) χ²-distribution.
• To be able to make the F-test we need to find a quantity which has σ²u as expected value whether or not αi = 0.
1. Let SSDAC = ∑ik (zik − zi. − z.k + z..)². It is easy to see that

   zik − zi. − z.k + z.. = uik − ui. − u.k + u..

2. It is not difficult to verify (and it can be found in any standard textbook on statistics) that

   E(SSDAC) = σ²u (a − 1)(c − 1).

3. Finally, it is equally easy to verify that SSDA and SSDAC are independent.
4. Therefore the F-statistic for testing αi = 0 becomes

   F = [c · SSDA/(a − 1)] / [SSDAC/((a − 1)(c − 1))]
     = [c ∑i (zi. − z..)²/(a − 1)] / [∑ik (zik − zi. − z.k + z..)²/((a − 1)(c − 1))]
     ∼ F(a − 1, (a − 1)(c − 1))

Large values of F are critical to the hypothesis.
• The important point is that the treatment effect of factor A is "tested against" the variance σ²u = σ²w + σ²/b, which largely consists of the whole-plot variation (σ²w) plus a "minor" contribution from the split-plot variation (σ²/b).
• In the balanced case, the test for αi = 0 can be made by simply analyzing the "means". That is the reason why PROC GLM in special (balanced) cases can make the correct tests in certain variance component models.
Back to the Original Setup

Return to the original model with a treatment effect of factor B, i.e.

yijk = µ + αi + βj + γij + rk + wik + εijk

1. The interaction effect γij is tested exactly as if wik and rk had been fixed effects, i.e. the test is made "against" the residual variation σ².
2. In the absence of γij, the main effect βj is also tested as if wik and rk had been fixed effects.
3. The main effect of factor A is tested as described previously. Just note that the effect of B cancels out in all calculations.
Unbalanced cases

All the nice calculations presented previously break down when the design is no longer balanced.

Consider again

yijk = µ + αi + rk + wik + εijk

and suppose this time that i = 1...a, k = 1...c and j = 1...bik.

Hence there might not be the same number of replicates (j) within each whole-plot unit.
As before, the replicates due to factor B are eliminated by calculating the average within each block and treatment:

zik = yi.k = µ + αi + rk + (wik + εi.k)

But now, with uik = wik + εi.k,

Var(uik) = σ²w + σ²/bik = σ²uik

That is, the zik's have different variances.
1. One unpleasant consequence of this is that

   zi. = µ + αi + r. + ui.

   has a variance which depends on i (through the bik).

2. Another, equally unpleasant, consequence is that SSDAC from before does not have a χ²-distribution.

3. Consequently, the F-statistic from before does not have an F-distribution.
Some consequences of this:

• We can still calculate the F-statistic, but it has an unknown distribution in the unbalanced case.
• Hence we have a problem in judging whether an observed F-statistic is "large".
• It seems plausible that when the experiment is "nearly balanced", then F must be "nearly F-distributed". But what is "nearly balanced", and what should we do when the experiment is very unbalanced?
A related problem:

A related problem arises even in the balanced case. Suppose interest is in comparing

µ11 − µ21 = α1 − α2 + γ11 − γ21.

The optimal estimate of this contrast is, in the balanced case, the difference

y11. − y21.

and the variance of that difference is

Var(y11. − y21.) = (2/3)(σ²w + σ²)
• The problem is that to estimate σ²w + σ², two sums of squares are needed.
• To put it in general terms, suppose SSD1 ∼ σ²1 χ²(f1) and SSD2 ∼ σ²2 χ²(f2) are needed. The problem is that the weighted sum

  SSD = a1 SSD1 + a2 SSD2

  does not have a χ²-distribution unless σ1 = σ2 and a1 = a2.
• Satterthwaite's idea was the following: let us assume that SSD approximately has a (scaled) χ²-distribution.
• The problem is then how many degrees of freedom – but this number can be "estimated" in the following way.
Satterthwaite's approximation

Consider the two-sample problem

Yij ∼ N(µi, σ²i), i = 1, 2, j = 1,...,ni

Then

Ȳi ∼ N(µi, σ²i/ni),  Ȳ1 − Ȳ2 ∼ N(µ1 − µ2, σ²1/n1 + σ²2/n2)

S²i = (1/fi) ∑(j=1..ni) (Yij − Ȳi.)² ∼ (σ²i/fi) χ²(fi),  fi = ni − 1
Let σ²D = σ²1/n1 + σ²2/n2. A natural and unbiased estimate for σ²D is

S²D = S²1/n1 + S²2/n2    (1)

Question: What is the distribution of S²D?

Satterthwaite (worked at General Electric, USA) (approx. 1945): We don't know, but let's approximate the distribution of S²D with a suitable χ²-distribution:

S²D ∼approx (φ²/η) χ²(η)    (2)
• With S²D = S²1/n1 + S²2/n2 we have

  E(S²D) = σ²1/n1 + σ²2/n2 = σ²D

  Var(S²D) = 2(σ⁴1/(n²1 f1) + σ⁴2/(n²2 f2))

• Under the approximation S²D ∼approx (φ²/η) χ²(η) we have

  E(S²D) = φ²

  Var(S²D) = 2φ⁴/η
• Satterthwaite's idea: match the first two moments:

  φ² = σ²D

  η = (σ²D)² / (σ⁴1/(n²1 f1) + σ⁴2/(n²2 f2))

• In real life σ²i, and hence σ²D, are unknown. Instead we plug the estimates s²i and s²D into the calculation of η:

  η̂ = (s²D)² / (s⁴1/(n²1 f1) + s⁴2/(n²2 f2))
Example 2. Let σ²1 = 2, σ²2 = 10, n1 = n2 = 6, f1 = f2 = 5. Then

σ²D = 2/6 + 10/6 = 2

η = 2² / (2²/(6²·5) + 10²/(6²·5)) = 6.9 ≈ 7

Hence

S²D = S²1/n1 + S²2/n2 ∼approx (σ²D/7) χ²(7)

fin
Example 3. Let σ²1 = 100, σ²2 = 90, n1 = 100, n2 = 10, f1 = 99, f2 = 9. Then

σ²D = 100/100 + 90/10 = 10

η = (1 + 9)² / (1²/99 + 9²/9) = 11.1

If the variances are assumed equal, then

σ̂²D = [(99·100 + 9·90)/108] · (1/100 + 1/10) = 10.9

which has a scaled χ²(108)-distribution.

Quite a difference! fin
How Good is Satterthwaite's Approximation?

The 1000 EURO question is now: how good is Satterthwaite's approximation?

The usual answer: simulate and calculate coverage percentages!
Two-sample Problem

Model:

Yij = µi + εij, i = 1, 2, j = 1,...,ni

where εij ∼ N(0, σ²i).

1. Simulate data where µ1 = µ2.
2. Test the hypothesis µ1 = µ2 at different significance levels,
   - using Satterthwaite's approximation,
   - using the Containment method (default in PROC MIXED).
3. Calculate coverage percentages.
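A minimal sketch of the corresponding PROC MIXED call for the two-sample problem (the data set twosample and the variables y and grp are hypothetical; GROUP=grp on the REPEATED statement allows a separate residual variance per group, which is what makes the Satterthwaite correction relevant here):

proc mixed data=twosample;
  class grp;
  model y = grp / ddfm=satterth;  /* Satterthwaite denominator DF */
  repeated / group=grp;           /* separate residual variance per group */
run;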
n1  σ1  n2  σ2   Method     DDF    Fpr0.01  χ²pr0.01  Fpr0.05  χ²pr0.05  Fpr0.10  χ²pr0.10
3   1   3   20   contain    4      0.047    0.127     0.114    0.204     0.182    0.260
3   1   3   20   satterth   2.16   0.020    0.124     0.056    0.202     0.106    0.258
8   1   3   20   contain    9      0.071    0.110     0.133    0.169     0.187    0.227
8   1   3   20   satterth   2.01   0.013    0.110     0.052    0.169     0.088    0.227
3   1   8   20   contain    9      0.009    0.030     0.053    0.084     0.101    0.134
3   1   8   20   satterth   7.16   0.006    0.030     0.046    0.084     0.093    0.134
8   1   8   20   contain    14     0.010    0.024     0.064    0.084     0.112    0.145
8   1   8   20   satterth   7.04   0.007    0.024     0.038    0.084     0.096    0.145
16  1   16  20   contain    30     0.013    0.025     0.068    0.078     0.119    0.128
16  1   16  20   satterth   15.1   0.010    0.025     0.060    0.078     0.110    0.128
3   1   3   5    contain    4      0.026    0.105     0.090    0.178     0.157    0.235
3   1   3   5    satterth   2.61   0.013    0.105     0.056    0.178     0.107    0.234
8   1   3   5    contain    9      0.078    0.132     0.168    0.210     0.226    0.271
8   1   3   5    satterth   2.62   0.026    0.132     0.070    0.210     0.130    0.271
3   1   8   5    contain    9      0.020    0.046     0.062    0.089     0.117    0.144
3   1   8   5    satterth   7.94   0.016    0.046     0.059    0.089     0.112    0.144
8   1   8   5    contain    14     0.026    0.035     0.056    0.080     0.107    0.131
8   1   8   5    satterth   7.73   0.014    0.035     0.048    0.080     0.090    0.131

Table 1: Two-sample problem – 1000 simulations
Split-Plot Experiment

We consider the model

Yijk = µ + αi + βj + wik + εijk, i = 1, 2, k = 1,...,ni, j = 1,...,nik

where wik ∼ N(0, σ²w) and εijk ∼ N(0, σ²).

• Make simulations for different values of σ²w.
• In the simulations α1 = α2.
• Test the hypothesis α1 = α2.
The design is as follows:

n1 = 3 and n2 = 8

i = 1:              j = 1,...,nik = 5
i = 2, k = 1,...,3: j = 1,...,nik = 3
i = 2, k = 4,...,8: j = 1,...,nik = 9

So all problems are due to unbalancedness (rather than variance heterogeneity as before).
σ   σw   Method     DDF    Fpr0.01  χ²pr0.01  Fpr0.05  χ²pr0.05  Fpr0.10  χ²pr0.10
1   1    contain    9      0.007    0.030     0.050    0.068     0.086    0.125
1   1    satterth   9.67   0.012    0.030     0.051    0.068     0.088    0.125
3   1    contain    9      0.004    0.018     0.037    0.064     0.083    0.125
3   1    satterth   21.7   0.009    0.018     0.043    0.064     0.098    0.125
6   1    contain    9      0.002    0.014     0.020    0.043     0.057    0.086
6   1    satterth   33.5   0.012    0.014     0.034    0.043     0.072    0.086
9   1    contain    9      0.002    0.020     0.034    0.063     0.083    0.116
9   1    satterth   36.5   0.011    0.020     0.054    0.063     0.097    0.116

Table 2: Split-Plot Experiment – 1000 simulations
Making the "right" tests with PROC MIXED

A typical SAS program for analyzing the split-plot data above is:

proc mixed data=sim noitprint;
  class i j k subject;
  model y = i j / ddfm=contain chisq;
  random i*k;
run;

• The containment method is the default in PROC MIXED (but can be specified explicitly with ddfm=contain in the MODEL statement).
• This tells SAS that when testing any of the fixed effects in the model, it should look for a random effect which syntactically contains the fixed effect: since i is contained in i*k, SAS then knows that it is against this random effect that the test should be made.
• It is well known that this is the right thing to do when the experiment is balanced.
A Severe Warning!!

A very commonly made mistake in this connection is the following: each combination (i, k) often identifies an experimental entity, e.g. an animal or a (whole) plot in a field. Typically one would have a variable in the data set identifying such an entity. For illustration we have made a variable, called subject, defined as (i, k). A typical SAS program would then be:

proc mixed data=sim noitprint;
  class i j k subject;
  model y = i j / ddfm=contain chisq;
  random subject;
run;

Such a program is made under the mistaken impression that since subject and (i, k) really identify the same units in the experiment, it should be immaterial what one writes.

This is not true, and the reason is the following:

Since i is not syntactically contained in subject, the tests (for the effect of the factor i) would be made against the residual variance, which we know is wrong.
To emphasize this point, suppose that we declare a new variable icopy which is just a copy of i. Then writing

proc mixed data=sim noitprint;
  class i j k subject icopy;
  model y = i j / ddfm=contain chisq;
  random icopy*k;
run;

will also make SAS perform the test of the effect of the factor i against the residual variance, which, as pointed out above, is wrong.

If, however, we write ddfm=satterth in any of the examples above, then SAS will actually identify the right variance component to test the effect of factor i against.
Some Tentative Conclusions on Satterthwaite

• For small samples, Satterthwaite's method performs much better than the default Containment method.
• For larger samples, there is not much difference between the two methods. In practice this is because the difference between the quantiles of an F(1, 7) and an F(1, 14) distribution is not large, whereas the difference between the quantiles of an F(1, 2) and an F(1, 4) distribution can be substantial.
• Both methods generally perform better than the large-sample χ² tests.
• A drawback of Satterthwaite's method is that it is computationally somewhat intensive.
• The results suggest using Satterthwaite's approximation.
Random or Fixed Effects?

Sometimes it is straightforward to decide whether a specific effect should be considered random or fixed. In other cases it is a more delicate issue.

The text below is taken from lecture notes by L.R. Schaeffer, University of Guelph, Ontario, Canada:

Fixed factors are factors in which the classes comprise all of the possible classes of interest that could be observed. For example, the sex of an animal is either male, female, sterilized male, or sterilized female. If the number of classes in a factor is small and confined to this number even if conceptual resampling were performed an infinite number of times, then the factor is likely fixed. Other examples are age classes, lactation number, management system, cage number, and breed class. Usually, if the sampling were to be repeated a second time, those factors which maintain the same classes between the two samplings would be fixed factors. For example, a growth trial on pigs using two diets would probably need to use the same housing facilities, the same age groups of pigs, and the same diets, but the individual pigs would necessarily have to be new animals because an animal could not go through the same growth phase a second time in its life. Pig effects would be considered a random factor while the other effects would be fixed.

Random factors are factors whose levels are considered to be drawn randomly from an infinitely large population of levels. As in the previous pig experiment, pigs were considered random because the pig population of the world is large enough to be considered infinitely large, and the group that were involved in that experiment were a random sample from that population. In actual fact, however, the pigs on that experiment were likely sampled from those relatively few pigs that were available at the time the trial started, but still they are considered to be a random factor because if the experiment were to be repeated again, there would likely be a completely different group of pigs involved.
Another way to determine if a factor is fixed or random is to know how the results will be used. In a nutrition trial the results infer something about the diets in the trial. The diets are specific and no inferences should be made about other diets not tested in the experiment. Hence diet effects would be a fixed factor. On the contrary, if animal effects were in the model, inferences about how any animal might respond to a specific diet may need to be made. There should not be anything peculiar about the animals on the trial that would nullify that inference. Animal effects would be a random factor.

In general, a few questions need to be answered to make the correct choice of fixed or random factor designation. Some of the questions are:

1. How many levels of the factor are in the model? If small, then perhaps this is a fixed factor. If large, then perhaps this is a random factor.
2. Is the number of levels in the population large enough to be considered infinite? If yes, then perhaps this factor is random.
3. Would the same levels be used again if the experiment were to be repeated a second time? If yes, then perhaps this factor is fixed.
4. Are inferences to be made about levels not included in the experiment? If yes, then perhaps this factor should be random.
5. Were the levels of a factor determined in a nonrandom manner? If yes, then perhaps this factor should be treated as fixed.

By studying the scientific literature, a researcher should be able to get some help in this decision process. If in doubt, then the assistance of an experienced statistician should be sought.
Multilocation Trials
Consider the following setup:
• Four treatments, e.g. of housing systems for pigs are to be compared.
• Studies are carried out on 9 farms (locations)
• Within each farm a randomized block design with 3 blocks is employed,
i.e. each treatment is repeated 3 times within each farm, once in each
block.
How to analyze such data?
Note that since there are replicates within each farm, the
farm–treatment interaction can be estimated.
The following model seems appealing:
yijk = µ + τi + Lj + (RL)jk + (τL)ij + εijk
where i = 1 . . . 4 is treatment, j = 1 . . . 9 is location and k = 1 . . . 3 is
block.
It is reasonable to assume that (RL)jk and εijk are random. But other
effects need more consideration:
• One can consider Lj and (hence) (τL)ij as being random.
• Alternatively one can consider Lj and (τL)ij to be fixed effects.
The effects in question can be considered random if the farms (locations)
are random representatives from the population of farms with specific
characteristics.
But if the farms are selected as e.g. "those 9 farms whose owners responded to a questionnaire sent out to all farms with given characteristics", then the farms are not random representatives from the population. In that case, the effects in question should be regarded as fixed, and one cannot extrapolate the conclusions from the study beyond these 9 farms.
What to do if 6 farms are selected randomly, while 3 are not?
What to do if there are only 3 randomly selected farms in the study?
11 Examples of Split-Plot Designs
The purpose of this lecture was to illustrate the kind of problems that may arise if split-plot designs are not treated properly. Most of the experiments presented were made at the Danish Institute of Agricultural Sciences, or rather the National Institute of Animal Science, as it was called in those days.

Another common aspect of several of the experiments was that they had led to heated debate. The pros and cons in those debates were presented.
Link to the full screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SPLITPLOTExamples.pdf
. . . After reading 50 of these papers in AABS-issues [Applied
Animal Behaviour Science] of 1984 and 1985, we found that in
about 25 cases statistical methods were used incorrectly. The
main defect was that observations entered into test statistics
were not independent. In a number of cases it was totally
unclear how the authors made their computations
Hoekstra & Jansen, AABS 16 (1986) 303-308
Example: W. Schouten's Ph.D. work

Rearing conditions and behaviour in pigs.

How does early experience influence later behaviour?

'Barren' farrowing crates (2 × 2 m²) vs. 'enriched' large straw pens (28 m²).

8 sows (4 sister-pairs). Within each sister-pair the pigs were assigned to treatment at random. Each litter consisted of 8 pigs, i.e., a total of 64 piglets.

Detailed behavioural observations.
ANOVAs

                     Reported Model              Litter Averages
Effect               df    SS         F          df    SS       F
Sister-Pair          3     384.2                 3     48.0
Housing System       1     893.3      5.14*      1     117.0    2.424
Residual             59    10253.7               3     138.2
Total                63                          7
Mixed model formulation

Reported model:

Yijk = µ + Pi + Hj + εijk

Pi: effect of sister pair, i ∈ {1,...,4}. Hj: effect of housing. εijk: random residual.

Correct model:

Yijk = µ + Pi + Hj + Sij + εijk

Sij: effect of sow.
Breed effect on production

Are the present feeding standards for essential nutrients per FUp sufficient for ad lib feeding?

Beretning 579. A. Just et al. (1985).

6 litters (YY) and 6 litters (LL) of 6 (7) pigs each (boars, gilts, castrates). Two levels of nutrient concentration in the feed.
Model
Yijkl = µ + ai + bj + ck + dl(j) + (ab)ij + (ac)ik + εijkl
• ai: effect of feed nutrient concentration, i ∈ {1, 2}
(Norm vs. Norm +20%).
• bj: effect of breed, j ∈ 1, 2 (LL and YY).
• ck: effect of sex k, k ∈ {1, 2, 3}.
• dl(j): effect of litter l within breed j.
• (ab)ij: interaction between feed concentration and breed.
• (ac)ik: interaction between feed concentration and sex.
• εijkl: random residual.
Similar designs
• Breeding line vs. pecking behaviour
• Rearing Conditions vs. later productivity
• Effect of organic feed.
• Effect of GMO production.
Straw shortener

A number of sows were fed either control feed or feed containing straw from fields treated with a straw shortener (CCC). To investigate long-term effects the study covered 4 parities.

Reported model:

Yijk = µ + ti + pj + (tp)ij + εijk

Yijk: observed variable, e.g. litter size. ti: effect of treatment. pj: effect of parity. (tp)ij: interaction between parity and treatment. εijk: random residual.

Correct model:

Yijk = µ + ti + pj + (tp)ij + Sik + εijk

Sik: effect of sow k on treatment i, Sik ∼ N(0, σ²S).
Group housing

Loose-housed sows. Automatic feeding systems.

Hypothesis: Pelleted feed reduces aggression compared with mealy feed.

Hypothesis: Pelleted feed reduces the effect of rank on received aggression.
Herd Investigations

Inspired by Nørgard (1999).

Yijklm = µ + ai + sj + Hijk + vl + (vs)jl + εijklm

• Yijklm: measurement at slaughter.
• ai: effect of abattoir i.
• sj: effect of herd disease state j.
• Hijk: random effect of herd, Hijk ∼ N(0, σ²H).
• vl: effect of season l.
• (vs)jl: interaction between season and disease state.
• εijklm: random residual for the mth animal, εijklm ∼ N(0, σ²).
Multi location trials
Yijk = µ + τi + Lj + R(L)jk + (τL)ij + εijk
• τi: effect of treatment
• Lj: effect of location
• R(L)jk: random effect of block within location, R(L)jk ∼
N (0, σ2R)
• (τL)ij: interaction between treatment and location
• εijk: residual εijk ∼ N (0, σ2)
12 Estimation and tests in mixed models
The purpose of this lecture was to give a detailed description of theoretical issues of estimation and tests in mixed models, i.e. properties of maximum likelihood estimators in the linear normal model and the mixed linear normal model. Concepts such as ML and REML are introduced.
Link to the full screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/MLMixed.f.pdf
Maximum Likelihood and Linear Normal Models
Example 1. Consider the linear regression model
yi = β0 + β1xi + εi
We shall show that the maximum likelihood estimate and the least squaresestimate for
β = (β0, β1)
are identical.
Because of the independence, the joint density for y1,...,yn (and hence the likelihood function) becomes

f(y1,...,yn; β) = ∏(i=1..n) f(yi; β)
               = ∏(i=1..n) (1/√(2π)) (1/σ) exp(−(1/(2σ²))(yi − (β0 + β1xi))²)
               = (1/√(2π))^n (1/σ^n) exp(−(1/(2σ²)) ∑i (yi − (β0 + β1xi))²)
               = L(β)
The likelihood function is

L(β) = (1/√(2π))^n (1/σ^n) exp(−(1/(2σ²)) ∑i (yi − (β0 + β1xi))²)

• Let D(β0, β1) = ∑i (yi − (β0 + β1xi))².
• If σ is known then L(β) is maximized by minimizing the sum of squared deviations D(β0, β1) (because of the "−" sign in the exponential).
• Therefore the maximum likelihood estimate is the same as the least squares estimate.

fin
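As a quick numerical check (a hedged sketch; the data set reg with variables y and x is hypothetical), the OLS fit from PROC REG and the ML fit from PROC MIXED should agree on β0 and β1:

proc reg data=reg;
  model y = x;                    /* least squares estimates */
run;

proc mixed data=reg method=ml;
  model y = x / solution;         /* ML estimates: identical to OLS here */
run;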
For a general linear normal model

y = Xβ + ε, where ε ∼ N(0, σ²I),

the likelihood is

L(β, σ²) = (1/√(2π))^n (1/σ^n) exp(−(1/(2σ²)) ∑i (yi − µi)²)
         = (1/√(2π))^n (1/σ^n) exp(−(1/(2σ²)) (y − Xβ)>(y − Xβ))

Hence the maximum likelihood estimate for β is found by minimizing (y − Xβ)>(y − Xβ).
Once β̂ (and hence µ̂) is found, it is not hard to verify that L(β̂, σ²) is maximized as a function of σ² by

σ̂² = (1/n)(y − Xβ̂)>(y − Xβ̂)

However, in practice one never uses the ML estimate for σ². Instead one uses

σ̃² = (1/(n − p))(y − Xβ̂)>(y − Xβ̂)

where p is the number of parameters in the model.

The reason for using σ̃² instead of σ̂² is that

E(σ̃²) = σ²,  E(σ̂²) = ((n − p)/n) σ²

That is, σ̃² is an unbiased estimate of σ² while σ̂² is biased.
It can be noted that

σ̃² = (1/(n − p))(y − Xβ̂)>(y − Xβ̂)

is called the REML estimate of σ², where REML means REstricted (or REsidual) Maximum Likelihood.

The REML method is frequently applied in connection with mixed models in an attempt to obtain unbiased variance estimates.
Maximum Likelihood Estimation in Mixed Models

For a mixed model

y = Xβ + Zu + ε

the variance of y is Cov(y) = V = Z Cov(u) Z> + Cov(ε).

• The unknown parameters are in this case (β, V).
• The typical case is that V itself depends only on a small number of parameters, e.g. on α = (σ²r, σ²w, σ²) in a split-plot experiment.
• So we write V = V(α).
In mixed models, maximum likelihood estimation becomes much more involved.

The likelihood function is

L(β, V) = (2π)^(−n/2) det(V)^(−1/2) exp(−(1/2)(y − Xβ)>V−1(y − Xβ))

Here det(V) is a number, called the determinant of V.

There are two situations to consider: when V is known and when V is unknown.
Case 1 – V is known: If V is known then L is maximized by minimizing

(y − Xβ)>V−1(y − Xβ)

This quantity is minimized by

β̂ = (X>V−1X)−1 X>V−1 y

which is also the weighted least squares estimate of β.
Case 2 – V is unknown: If V is unknown (which of course is generally the case in practice), things become more complicated. There are different approaches available. Two of these are

• Maximum Likelihood (ML) and
• Restricted Maximum Likelihood (REML)
Maximum Likelihood: The expression

β̂(V) = (X>V−1X)−1 X>V−1 y

depends on V, which is unknown. If this expression for β is substituted into L we get

L(β̂(V), V) = (2π)^(−n/2) det(V)^(−1/2) exp(−(1/2)(y − Xβ̂(V))>V−1(y − Xβ̂(V)))

This likelihood now depends only on V.

Maximization of L has to be done iteratively. This gives V̂ and hence

β̂(V̂) = (X>V̂−1X)−1 X>V̂−1 y

Typically V depends only on a few parameters, say α, so we write V = V(α). In that case L(β̂(V(α)), V(α)) has to be maximized as a function of α.
Restricted Maximum Likelihood:

An alternative to ML estimation is REML estimation. This is the default method in PROC MIXED.

Consider a mixed model

y = Xβ + Zu + ε,  where Var(y) = V

and V and β are unknown.

If β had been known, the residuals would be

ε = y − Xβ ∼ N(0, V)
and one could use the ML method from before for estimating V.

However, β is not known. Therefore one frequently does the following: the least squares estimate of β is

β̂ls = (X>X)−1 X> y

which, while not the optimal estimate for β, is still unbiased.

One then considers the residuals

ε̂ls = y − Xβ̂ls ∼ N(0, A(X) V A(X)>)

where A(X) is a known matrix which is a function of X.

The likelihood for the "residuals" ε̂ls then depends only on V, and one can maximize that likelihood numerically. This gives the REML estimate V̂reml of V. When V depends on fewer parameters α, the result is the REML estimate α̂reml.

With this estimate at hand we can estimate β as

β̂reml = β̂(V̂reml) = (X>V̂reml−1 X)−1 X>V̂reml−1 y
Using ML or REML

In practice the ML and the REML estimates do not differ much. The main argument for REML estimation is that, at least in balanced cases, V̂reml is unbiased while V̂ml is not. Whether V̂reml is always unbiased is not known.
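As a small illustration of how the estimation method is selected in PROC MIXED (REML is the default; METHOD=ML requests maximum likelihood), here is a sketch based on the ingot example from the earlier lecture:

proc mixed data=rcb method=ml;   /* compare with the default METHOD=REML */
  class ingot metal;
  model pres = metal;
  random ingot;
run;

Comparing the "Covariance Parameter Estimates" tables from the two runs shows the (typically small) difference between the ML and REML variance component estimates.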
Tests in Mixed Models
In dealing with tests in mixed models we shall first assume that the
covariance matrix V is known.
Typically we are interested in testing hypotheses of the form λ>β = k for
some vector λ and some number k (often k = 0.)
We know that the contrast λ>β is estimable if and only if there is a
vector a such that a>X = λ>.
The estimate of the contrast $\lambda^\top\beta$ is $a^\top X\hat\beta$, where
$$X\hat\beta = X(X^\top V^{-1}X)^{-1}X^\top V^{-1}y$$
Standard calculations give that
$$\mathrm{Var}(X\hat\beta) = X(X^\top V^{-1}X)^{-1}X^\top V^{-1}X(X^\top V^{-1}X)^{-1}X^\top = X(X^\top V^{-1}X)^{-1}X^\top$$
so
$$X\hat\beta \sim N(X\beta,\ X(X^\top V^{-1}X)^{-1}X^\top).$$
Hence
$$a^\top X\hat\beta \sim N(a^\top X\beta,\ a^\top X(X^\top V^{-1}X)^{-1}X^\top a)$$
If the hypothesis $\lambda^\top\beta = k$ is true then
$$a^\top X\hat\beta - k \sim N(0,\ a^\top X(X^\top V^{-1}X)^{-1}X^\top a)$$
Therefore, if V is known, the task is to test whether $E(a^\top X\hat\beta - k) = 0$ when $\mathrm{Cov}(a^\top X\hat\beta - k)$ is known.
This can be done by constructing the statistic
$$X^2 = (a^\top X\hat\beta - k)^\top [a^\top X(X^\top V^{-1}X)^{-1}X^\top a]^{-1}(a^\top X\hat\beta - k)$$
which under the hypothesis has a $\chi^2(f_1)$–distribution, where $f_1$ is the number of parameters “eliminated” in the contrast $a^\top X\beta = k$.
The problem is what to do when V is unknown.
In some cases (e.g. in a split–plot experiment) the structure of V is such that $V = \omega^2 W^{-1}$, where W is known and $\omega^2$ is unknown.
In that case, one can construct an F–statistic
$$F = \frac{(a^\top X\hat\beta - k)^\top [a^\top X(X^\top W^{-1}X)^{-1}X^\top a]^{-1}(a^\top X\hat\beta - k)/f_1}{\hat\omega^2}$$
which under the hypothesis has an $F_{f_1,f_2}$–distribution.
How to derive $f_2$ shall not be discussed here. We just note that PROC MIXED attempts to construct such test statistics and to derive the appropriate number $f_2$ of denominator degrees of freedom.
In this connection it is to be pointed out that it is extremely important to
specify the random effects in the RANDOM–statement in the correct way.
Another approach is to construct approximate F–tests by establishing a denominator D such that
$$F = \frac{(a^\top X\hat\beta - k)^\top [a^\top X(X^\top V^{-1}X)^{-1}X^\top a]^{-1}(a^\top X\hat\beta - k)/f_1}{D/f_2}$$
has an approximate F–distribution when the hypothesis is true.
Adding the option DDFM=SATTERTH to the MODEL statement causes PROC MIXED to attempt to construct such tests.
A final option is the following:
When $n \to \infty$ (in a suitably regular way) $\hat V$ and V become indistinguishable.
Therefore, one approach is to simply “pretend” that the ML estimate $\hat V$ is the true, but unknown, variance V.
One can force PROC MIXED to make such tests by adding the CHISQ option to the MODEL statement.
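A minimal sketch of where the option goes (dataset and effect names are placeholders, not from the course data):

proc mixed data=mydata;
  class block treat;
  model y = treat / chisq;   /* chi-square tests reported alongside the F-tests */
  random block;
run;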
13 Complications concerning Variance Components
This lecture illustrates some of the problems that may arise because of numerical problems in the iterative search for the maximum likelihood, and the reason why some of the variance components are set equal to 0.
Based on an example from one of the exercises, the profile of the likelihood function is illustrated.
A special problem is that Satterthwaite's approximation fails in the cases where a variance component is set to 0 and the G matrix is not positive semidefinite. Rules of thumb are suggested in that case.
Finally, the relevance of a test of a positive variance component is discussed, e.g. comparable to a test of a block effect when block is treated as a fixed effect.
Link to the fullscreen presentation: http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/Complicate.pdf
Sugar beet example
Pct Sukk
Num Den
Effect DF DF F Value Pr > F
OPTAGN 1 2 15.21 0.0599
SAATID 4 16 189.37 <.0001
OPTAGN*SAATID 4 16 5.37 0.0061
Kg
OPTAGN 1 18 336.85 <.0001
SAATID 4 18 408.52 <.0001
OPTAGN*SAATID 4 18 12.70 <.0001
Inspection of Log
Pct Sukk
NOTE: Convergence criteria met.
NOTE: There were 30 observations read from the data set WORK.ROER.
Kg
NOTE: Convergence criteria met.
NOTE: Estimated G matrix is not positive definite.
NOTE: There were 30 observations read from the data set WORK.ROER.
Sugar beet example
Table 1: Covariance Parameter Estimates
Pct Sukk
Cov Parm Estimate Alpha Lower Upper
BLOK 0.001000 0.05 0.000164 37.9371
BLOK(OPTAGN) 0.001000 0.05 0.000219 0.2840
Residual 0.001333 0.05 0.000740 0.003088
Kg
BLOK 0.05344 0.05 0.01660 3.13E192
BLOK(OPTAGN) 0 . . .
Residual 5.1215 0.05 2.9241 11.2004
Outline
• Estimation of variance components
  – Why is $\hat\sigma^2_X = 0$?
  – Consequences
  – Rules of thumb
• Are random effects significant?
  – Are we really interested?
  – Likelihood ratio tests
Reason
The likelihood function is maximized subject to the constraint that the variance component parameters satisfy $\sigma^2_X \geq 0$.
The precision of numerical optimisation methods depends on the internal representation of numbers in the computer. PROC MIXED solves this by setting $\hat\sigma^2_X = 0$ if it is close to 0.
Other statistical packages (R, S-Plus) handle the constraint by maximising the likelihood as a function of $\log(\sigma^2_X)$.
Sometimes (e.g., with repeated measurements) the assumption that $\sigma^2_X \geq 0$ cannot be justified.
Likelihood contour plot, Pct Sukk
[Figure: contour plot of the likelihood as a function of $\log_{10}(\sigma^2_{B(O)})$ and $\log_{10}(\sigma^2_B)$]
Likelihood contour plot, Kg
[Figure: contour plot of the likelihood as a function of $\log_{10}(\sigma^2_{B(O)})$ and $\log_{10}(\sigma^2_B)$]
G Not positive Definite
$$V(u) = G = \begin{bmatrix} \sigma^2_B & 0 & 0 & 0 & 0 & 0\\ 0 & \ddots & 0 & 0 & 0 & 0\\ 0 & 0 & \sigma^2_B & 0 & 0 & 0\\ 0 & 0 & 0 & \sigma^2_{B(O)} & 0 & 0\\ 0 & 0 & 0 & 0 & \ddots & 0\\ 0 & 0 & 0 & 0 & 0 & \sigma^2_{B(O)} \end{bmatrix}$$
G Not positive Definite
$$G = \begin{bmatrix} \sigma^2_B & 0 & 0 & 0 & 0 & 0\\ 0 & \ddots & 0 & 0 & 0 & 0\\ 0 & 0 & \sigma^2_B & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & \ddots & 0\\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad G^{-1} = ???$$
Warning: Satterthwaite Goes Wrong
Satterthwaite's approximation uses the estimated variance components for the calculation of test degrees of freedom. The calculations include differentiation with respect to $\sigma^2_X$. At boundary values such as 0 this derivative is not defined.
In the PARMS statement a lower bound on the estimated variance components may be specified, e.g.,
PARMS /LBOUND=0.001,0.001,0.001;
This produces the same problems as $\hat\sigma^2_X = 0$.
Conclusions
• If the estimated covariance parameters are > 0, use Satterthwaite's approximation.
• If not:
  – If model reductions are ”natural”, re-estimate the parameters using the revised models.
  – Nested designs should be reformulated to maintain the design.
  – Use the containment method, but be careful to specify the model syntactically correctly. (Compare with the RANDOM statement in GLM.)
Testing Effects of Random Components
• Why are we interested in testing $\sigma^2_B > 0$?
• Model reduction.
• $\hat\sigma^2_B = 0$ is not a test and may not be used for this purpose.
• Fixed effects vs. random effects.
• Biological significance, i.e., if we sample x individuals at random, what is the average difference between the lowest and the highest, and what is a confidence interval for the difference? What is the correlation, heritability, repeatability, sensitivity and specificity?
Model Reduction
Consider a model A and a model B that represents a special case of A, e.g., one of the variance components $\sigma^2_X = 0$. B is said to be nested within A. In this case a likelihood ratio test may be performed.
Then $2(\mathrm{LogLike}_A - \mathrm{LogLike}_B)$ is asymptotically $\chi^2$ distributed with $(p_A - p_B)$ degrees of freedom, where $p_A$ is the number of parameters in model A.
NB! This is not feasible if $\hat\sigma^2_X = 0$.
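A minimal sketch of such a comparison in PROC MIXED (dataset and effect names are placeholders; the −2 log likelihood values below are invented and must be taken from the two Fit Statistics tables):

proc mixed data=mydata method=ml;   /* model A: with the block variance component */
  class block treat;
  model y = treat;
  random block;
run;
proc mixed data=mydata method=ml;   /* model B: sigma2_block = 0 */
  class block treat;
  model y = treat;
run;
data lrt;
  m2llA = 100.8;                    /* "-2 Log Likelihood" from model A */
  m2llB = 105.2;                    /* "-2 Log Likelihood" from model B */
  chi2  = m2llB - m2llA;            /* equals 2(LogLike_A - LogLike_B) */
  p     = 1 - probchi(chi2, 1);     /* 1 df: one variance component removed */
run;
proc print data=lrt; run;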
General Recommendations
• Using ML, any nested models may be compared.
• Using REML, only nested models with identical fixed effects may be compared.
• With respect to tests for variance components this test is conservative, i.e., the true p-value is smaller than the calculated one. Thus the test results in too few significant findings.
• With respect to tests for fixed effects this test is anti-conservative, i.e., the true p-value is larger than the calculated one. Thus the test results in too many significant findings. (Therefore likelihood ratio tests should not be used for fixed effects.)
Fixed Effects
If the variance component is 0, this implies that $u_i = u_j$ for every i and j, i.e., reformulate the model and treat the factor of interest as fixed.
However:
$\hat u_i \approx \hat u_j$ does not imply that $\sigma^2_u = 0$.
Biologically significant
• Very often the real interest can be formulated as an interval for the variance component parameter, e.g., is it larger than some preset 'irrelevance' level?
• The confidence intervals produced with the CL option in the PROC MIXED statement are often sufficient for this. However, the general comment about sufficient sample size is VERY relevant here.
• Many 'biologically' relevant parameters are combinations of several variance component parameters, e.g., the correlation (repeatability) $\sigma^2_A/(\sigma^2_\varepsilon + \sigma^2_A)$. Therefore the joint distribution of the parameter estimates needs to be considered. This is not trivial (Interest ???).
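A sketch of how the point estimate of the repeatability could be computed from the PROC MIXED output (all dataset and variable names are placeholders; the model is a simple one with repeated records per animal):

proc mixed data=mydata cl;
  class animal;
  model y = ;                      /* intercept-only mean structure */
  random animal;
  ods output covparms=cp;          /* capture the variance component estimates */
run;
proc transpose data=cp out=cpt;
  var estimate;
  id covparm;                      /* creates variables named 'animal' and 'Residual' */
run;
data repeat;
  set cpt;
  repeatability = animal / (animal + residual);
run;
proc print data=repeat; run;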
Covariance Matrix: Sugar beet PCT Sukk
Asymptotic Covariance Matrix of Estimates
Row Cov Parm CovP1 CovP2 CovP3
1 BLOK 3.069E-6 -8.02E-7
2 BLOK(OPTAGN) -8.02E-7 1.613E-6 -4.44E-8
3 Residual -4.44E-8 2.222E-7
14 Repeated Measurements
This lecture gives an introduction to repeated measurements, and is a supplement to Chapter 3 in LMSW (Littell et al., 1996). It illustrates how it is possible to modify the tacit assumptions of the split-plot design into a more flexible modelling of the variance matrix.
Different variance structures are illustrated graphically and the use of SAS to compare different structures is presented. The AR(1) and CS structures are discussed in detail. Finally, methods for comparison between different structures are shown.
Link to full-screen presentation: http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/Repeated.f.pdf
Analyzing Repeated Measurements
Consider the setup:
• A treatment factor A with a levels is applied to individuals, e.g.
pigs.
• Within each treatment there are c individuals
• On each individual repeated measurements of the same response is
made at b different time points.
Example: Exercise Therapy (LMSW p. 88)
• Subjects (SUBJ) were assigned to one of three different training
programs (PROGRAM) on weightlifting.
• The strength (STRENGTH) of the subjects was measured every second day (TIME) for a two–week period from the start of the study.
Some questions:
• Is there a treatment effect?
• Is there an interaction between treatment and time?
Mean profiles
[Figure: group mean strength over time (1–7) for the three programs C, R and W]
The task: Comparison of the mean profiles
Clear evidence of treatment effect and treat–by–time interaction.
Individual profiles:
[Figure: individual strength profiles over time, one panel per program (CONT, RI, WI)]
No evidence of non–constant variance!!
Sometimes (but certainly not always!) repeated measurements can
be appropriately dealt with by a split–plot model.
• A statistical model for this situation could be
$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + w_{ik} + \varepsilon_{ijk}$$
where $w_{ik} \sim N(0, \sigma^2_w)$ and $\varepsilon_{ijk} \sim N(0, \sigma^2)$.
• Here i denotes treatment, k is replications (within treatment) and
j is “time”
• “Time” is called the within–subject factor.
• Note: “Time” can also refer to different locations, e.g. in the
intestine.
• It is the usual split–plot model!
Tacit Assumptions when using the Split–Plot Model
It is important to realize the assumptions one makes in applying a split–plot model to a repeated measurement problem:
1. It is assumed that the variance is constant.
This may not be a reasonable assumption: sometimes the variance increases with the mean, and if the mean changes over “time”, this assumption is violated.
If time is really location in the intestine, there might be certain segments where the variance of a given response is much larger than in other segments.
2. It is assumed that the correlation between two measurements on the
same individual is the same – no matter how far the measurements
are apart in time.
This may not be a reasonable assumption: Observations close
to each other in time might be expected to be more alike than
observations far from each other.
3. It is assumed that the correlation is positive.
This may not be a reasonable assumption: Consider a feeding
experiment. If the feed intake is lower than expected in one week
because of diseases it may be higher than expected in the next
week. Hence the observations would be negatively correlated.
4. It is assumed that the biological questions can be answered through the interaction $\gamma_{ij}$ and possibly the main effects $\alpha_i$ and $\beta_j$.
That might be too crude a model. For example, data might indicate that the mean value evolves over time in a specific way, e.g.
$$\mu_{ij} = \mu + \alpha_i + \beta_1 j + \beta_2 j^2$$
Modelling of Covariances
A classical way of thinking of a statistical model is as
Observables = Systematic effects + Random effects
Most frequently, the main interest is in the systematic effects, while
the random effects are considered a nuisance.
Yet, the random effects are important to understand and to model in
an appropriate way.
Types of random variation
[Figure: four simulated examples of random variation — m + e (pure residual), m + subj (random subject effect), m + ser (serial dependence), and m + subj + ser + e]
Can be summarized as:
• Random subject effect
• Serial dependence
• Residual variation
Unstructured Covariance Matrix
Consider Exercise Therapy data.
A very general model is the model where for each treatment i and
time j there is mean value µij, and the measurements have a
completely unstructured covariance matrix.
$$Y_{ik} = \begin{bmatrix} Y_{i1k}\\ \vdots\\ Y_{i7k} \end{bmatrix} \sim N_7\left(\mu_i = \begin{bmatrix} \mu_{i1}\\ \vdots\\ \mu_{i7} \end{bmatrix},\ V\right)$$
where k refers to subject within treatment, and where V is a $7\times 7$ unstructured matrix.
Since the subjects are independent the random vector arising after
stacking all Yiks on the top of each other has a covariance matrix
consisting of V ’s on the “diagonal” and 0s outside.
Such a matrix is said to be block diagonal.
Note that in V there are 7× 8/2 = 28 parameters.
This model can be fitted with the following SAS program:
proc mixed data=weight2;
class program subj time;
model strength = program time program*time / outP=pred;
repeated time / subject=subj*program type=un r rcorr; /* rcorr needed for the RCorr ODS table */
ods listing exclude r rcorr; ods output r=r rcorr=rcorr;
data r; set r; keep col1-col7;
data rcorr; set rcorr; keep col1-col7;
run;
The data set r contains the estimated covariance matrix, while rcorr contains the correlation matrix.
Note that V is the covariance matrix for $Y_{ik}$. But if we write $Y_{ik} = \mu_i + \varepsilon_{ik}$ (note: everything here is a vector) then V is also the covariance of the error terms $\varepsilon_{ik}$, which have mean 0.
The estimated correlation matrix is
1.0000 0.9602 0.9246 0.8716 0.8421 0.8091 0.7968
0.9602 1.0000 0.9396 0.8770 0.8596 0.8273 0.7917
0.9246 0.9396 1.0000 0.9556 0.9372 0.8975 0.8755
0.8716 0.8770 0.9556 1.0000 0.9601 0.9094 0.8874
0.8421 0.8596 0.9372 0.9601 1.0000 0.9514 0.9165
0.8091 0.8273 0.8975 0.9094 0.9514 1.0000 0.9531
0.7968 0.7917 0.8755 0.8874 0.9165 0.9531 1.0000
The AR(1)–model
Consider a sequence of measurements z1, z2, . . . , zT made on the
same experimental unit at T time points t = 1, . . . , T .
It is assumed that E(zt) = 0 for all t.
A frequently employed model is the AutoRegressive model of order
1, which states that
$$z_t = \rho z_{t-1} + \varepsilon_t, \qquad t = 2,\ldots,T$$
where $\varepsilon_t \sim N(0, \sigma^2)$, all independent, and where $-1 < \rho < 1$.
Hence what happens at time t is ρ times what happened at time t − 1, plus some random noise.
The variance of each zt is the same and is denoted ω2.
This variance can be found as:
$$\omega^2 = \mathrm{Var}(z_t) = \mathrm{Var}(\rho z_{t-1} + \varepsilon_t) = \rho^2\,\mathrm{Var}(z_{t-1}) + \mathrm{Var}(\varepsilon_t) = \rho^2\omega^2 + \sigma^2$$
Hence $\omega^2 = \dfrac{\sigma^2}{1-\rho^2}$.
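A small data step sketch that simulates such a series (parameter values are arbitrary); by the formula above the simulated $z_t$ should have variance $1/(1-0.5^2) \approx 1.33$:

data ar1;
  rho = 0.5;
  z = 0;                      /* start the series at its mean */
  do t = 1 to 50;
    z = rho*z + rannor(0);    /* z_t = rho*z_{t-1} + eps_t, sigma2 = 1 */
    output;
  end;
run;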
It is illustrative to investigate the covariance structure of this model.
First consider observations one time–step apart:
Cov(zt, zt−1) = Cov(ρzt−1 + εt, zt−1)
= ρCov(zt−1, zt−1) = ρVar(zt−1) = ρω2
Next we consider observations two time–steps apart:
Cov(zt, zt−2) = Cov(ρzt−1 + εt, zt−2)
= Cov(ρzt−1, zt−2) = ρCov(zt−1, zt−2)
= ρ2ω2
In general, the covariance between observations k time–steps apart is
$$\mathrm{Cov}(z_t, z_{t-k}) = \rho^k\omega^2$$
The correlation between observations k time steps apart therefore becomes
$$\gamma(k) = \mathrm{Corr}(z_t, z_{t-k}) = \frac{\rho^k\omega^2}{\omega^2} = \rho^k$$
The number k is called the lag between the observations and γ(k) is
called the autocorrelation function
If the postulated model is correct, the autocorrelation should tend to
0 as the lag increases.
Some Autocorrelations
[Figure: autocorrelation functions $\rho^k$ and corresponding simulated series for ρ = 0.5, ρ = −0.5, ρ = 0.9 and ρ = 0.1]
How to estimate the autocorrelation??
A very brute–force way of estimating the autocorrelation is the
following: Suppose there are observations from 4 time points, i.e.
t = 1, . . . , 4 on many subjects and assume observations all have zero
mean.
Then the (symmetric) matrix of correlations is
$$\mathrm{Corr} = \begin{bmatrix} 1 & \rho_{12} & \rho_{13} & \rho_{14}\\ \rho_{12} & 1 & \rho_{23} & \rho_{24}\\ \rho_{13} & \rho_{23} & 1 & \rho_{34}\\ \rho_{14} & \rho_{24} & \rho_{34} & 1 \end{bmatrix}$$
Simple estimates of the autocorrelation for observations one, two
and three time–step apart are
$$\hat\gamma(1) = \tfrac{1}{3}(\hat\rho_{12} + \hat\rho_{23} + \hat\rho_{34}), \qquad \hat\gamma(2) = \tfrac{1}{2}(\hat\rho_{13} + \hat\rho_{24}), \qquad \hat\gamma(3) = \tfrac{1}{1}\,\hat\rho_{14}$$
Obviously, for higher values of k, γ(k) will be poorly estimated as it
is the average over few values.
The autocorrelation can be estimated (as described above) by invoking the macro:
%autocorr(r);
where r is the covariance matrix estimated in connection with the model with unstructured covariance matrix.
If the file autocorr.sas is located in e.g. c:\stat then the macro is included, i.e. made available, by submitting the statement
%include 'c:\stat\autocorr.sas';
This creates the SAS dataset autocorr with autocorrelation and lag.
The macro also creates a plot of the autocorrelation against lag:
[Figure: autocorrelation against lag for the Exercise Therapy data]
What can be concluded from that?
• There is a clear indication of positive correlation and that the
correlation decreases with time.
• Whether the correlation structure can be appropriately described
by ρk is another issue. There is not much evidence for or against
that structure.
Since all autocorrelations γ(k) are positive it is tempting to plot
log γ(k) against k as well.
The reason is that if the autocorrelation is γ(k) = ρk then
log γ(k) = k log ρ.
Hence a plot of log γ(k) = k log ρ against k should approximately
yield a straight line with intercept 0 and slope log ρ:
[Figure: log autocorrelation against lag for the Exercise Therapy data]
Again, there is not any strong evidence against the AR(1) structure.
From the graph it follows that the slope is approximately
log ρ ≈ −0.23/6 = −0.038 such that ρ ≈ 0.962.
Hence the correlation between observations does decrease as the
time between them increases – but it decreases very slowly!!
Compound Symmetry
The Split–plot model can also be formulated using a REPEATED
statement instead of a RANDOM statement.
proc mixed data=weight2;
class program subj time;
model strength = program time program*time;
repeated time / type=cs sub=subj(program) r rcorr;
ods listing exclude r; ods output r=r;
run;
Fortunately, the results using a REPEATED or a RANDOM statement are
the same!
The option type=cs specifies that the covariance matrix for each
subject has a compound symmetry structure:
$$\begin{bmatrix} \sigma^2 + \sigma^2_w & \sigma^2_w & \cdots & \sigma^2_w\\ \sigma^2_w & \sigma^2 + \sigma^2_w & \cdots & \sigma^2_w\\ \vdots & \vdots & \ddots & \vdots\\ \sigma^2_w & \sigma^2_w & \cdots & \sigma^2 + \sigma^2_w \end{bmatrix}$$
From the SAS output one sees that the correlation between observations on the same subject is estimated to be
$$\frac{\hat\sigma^2_w}{\hat\sigma^2_w + \hat\sigma^2} \approx 0.8892$$
Which Covariance Structure to use?
With all this flexibility in choosing the covariance structure, some
guidelines are needed for choosing an appropriate one:
• Parsimony: Covariance structures with few parameters are most
attractive as there are fewer parameters to be estimated from data.
• Exploratory data analysis: A graphical investigation of the data
might suggest an appropriate covariance structure.
• Subject matter considerations: Sometimes the problem at hand
really dictates an appropriate covariance structure
• Necessity: Sometimes one is for numerical reasons forced to use a
very simple covariance structure – PROC MIXED might not be able
to fit the complex ones.
• Numerical criteria: There are some numerical criteria, which can
be a guideline.
Numerical Criteria
AIC and BIC are some criteria to be used. They are both the
log–likelihood + some term penalizing for the number of parameters
used in the model. BIC penalizes the use of many parameters harder
than AIC.
Smaller values of both criteria indicate a good fit.
For the Exercise Therapy the result is
Structure CS AR(1) UN
AIC 1424.9 1270.8 1290.9
BIC 1428.9 1274.9 1348.1
Hence the result is in favor of using the AR(1)–structure.
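A sketch of how such a comparison can be produced: fit the same mean model repeatedly, changing only the covariance structure, and read AIC/BIC off the Fit Statistics table (the subject specification is the one used earlier for these data):

proc mixed data=weight2;
  class program subj time;
  model strength = program time program*time;
  repeated time / subject=subj*program type=ar(1);  /* rerun with type=cs and type=un */
run;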
What does the covariance structure mean for the
conclusions?
For the Exercise Therapy the p–values for the test of no interaction
effect are:
Structure CS AR(1) UN
Program*Time 0.0005 0.3007 0.1297
Radically different conclusions!
The data really suggests that the interaction is present!
15 Repeated Measurements: Covariance Structures
This lecture gives an overview of how to specify different covariance structures in SAS via the REPEATED statement in PROC MIXED. The lecture is based on the description in the on-line SAS manual (http://dokumentation.agrsci.dk/sasdocv8/sasdoc/sashtml/onldoc.htm).
The most important types of covariance structure are presented.
• Unstructured (UN)
• Autoregressive (AR(1)–SP(POW))
• Antedependence (ANTE(1))
• Toeplitz (TOEP)
• Heterogeneous variance (ARH(1),CSH, etc.)
The pros and cons of the different structures are discussed.
Link to full screen presentation: http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/RepeatedType.f.pdf
Repeated statement
Y = Xβ + Zu + ε
V(ε) = R
R is an n × n matrix, where n is the number of observations.
In order to handle this, a structure of the matrix is defined with
repeated use of the elements in the structure.
Repeated Statement
The syntax of the REPEATED statement
REPEATED < repeated-effect > < / options >;
Usually a formulation like:
REPEATED time / subj=animal*treat ;
A good precaution is always to specify the repeated-effect
Missing data: example
Treat Animal Time Y
A 1 1 12.4
A 1 2 .
A 1 3 14.5
B 1 1 14.3
B 1 2 15.3
B 1 3 14.8
... ... ... ...
PROC MIXED: REPEATED Statement
REPEATED < repeated-effect > < / options > ;
You can specify the following options in the REPEATED statement
after a slash (/): GROUP=effect, HLM, HLPS, LDATA=SAS-data-set, LOCAL, LOCALW, NONLOCALW, R<=value-list>, RC<=value-list>, RCI<=value-list>, RCORR<=value-list>, RI<=value-list>, SSCP, SUBJECT=effect, TYPE=covariance-structure
Types of variance structure
• Approximately 30 different structures
• “Time”/“linear” structure vs. spatial structure
• Homogeneous vs. heterogeneous variance
• “Banded” vs. full structure
Unstructured: type=un
The measurements of each subject
$$\begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14}\\ & \sigma_{22} & \sigma_{23} & \sigma_{24}\\ & & \sigma_{33} & \sigma_{34}\\ & & & \sigma_{44} \end{bmatrix}$$
Parameters: $t(t+1)/2$
Autoregressive: type=AR(1)
The measurements of each subject
$$\sigma^2\begin{bmatrix} 1 & \rho & \rho^2 & \rho^3\\ & 1 & \rho & \rho^2\\ & & 1 & \rho\\ & & & 1 \end{bmatrix}$$
$$Y_1 \xrightarrow{\ \rho\ } Y_2 \xrightarrow{\ \rho\ } Y_3 \xrightarrow{\ \rho\ } Y_4 \xrightarrow{\ \rho\ } Y_5$$
Autocovariance
[Figure: autocorrelation $\rho^{\mathrm{lag}}$ plotted against lag]
Autoregressive: type=SP(POW)
The measurements of each subject
$$\sigma^2\begin{bmatrix} 1 & \rho^{|t_2-t_1|} & \rho^{|t_3-t_1|} & \rho^{|t_4-t_1|}\\ & 1 & \rho^{|t_3-t_2|} & \rho^{|t_4-t_2|}\\ & & 1 & \rho^{|t_4-t_3|}\\ & & & 1 \end{bmatrix}$$
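A sketch of the corresponding REPEATED statement (dataset and variable names are placeholders): for SP(POW) the actual measurement times enter the structure, so the time variable is given as a numeric coordinate rather than as a CLASS variable:

proc mixed data=mydata;
  class treat animal;
  model y = treat;
  repeated / subject=animal type=sp(pow)(time);  /* time is a numeric variable */
run;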
Ante-Dependence: type=ANTE(1)
$$\mathrm{AR(1)}: \quad Y_1 \xrightarrow{\ \rho\ } Y_2 \xrightarrow{\ \rho\ } Y_3 \xrightarrow{\ \rho\ } Y_4 \xrightarrow{\ \rho\ } Y_5$$
$$\mathrm{ANTE(1)}: \quad Y_1 \xrightarrow{\ \rho_1\ } Y_2 \xrightarrow{\ \rho_2\ } Y_3 \xrightarrow{\ \rho_3\ } Y_4 \xrightarrow{\ \rho_4\ } Y_5$$
Ante-Dependence: type=ANTE(1)
The measurements of each subject
$$\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_1\rho_2 & \sigma_1\sigma_4\rho_1\rho_2\rho_3\\ & \sigma_2^2 & \sigma_2\sigma_3\rho_2 & \sigma_2\sigma_4\rho_2\rho_3\\ & & \sigma_3^2 & \sigma_3\sigma_4\rho_3\\ & & & \sigma_4^2 \end{bmatrix}$$
Toeplitz: type=TOEP
The measurements of each subject
$$\begin{bmatrix} \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3\\ & \sigma^2 & \sigma_1 & \sigma_2\\ & & \sigma^2 & \sigma_1\\ & & & \sigma^2 \end{bmatrix}$$
Heterogeneous variance
Instead of an identical variance at every time point, the variance is estimated at each time point.
In general, the type is found by simply adding an H to the type, i.e., csh, arh(1), toeph.
The structures are preserved as far as the correlation between time points is concerned.
More elaborate parametric techniques are available, e.g., LIN.
Conclusions
• Parsimony !
• Fixed observation times and similar intervals : AR(1)
(2 parms)
• Slightly varying observation times and similar intervals
: SP(POW) (2 parms)
• Fixed observation times but intervals of different type:
ANTE(1) (2t− 1 parms (heterogen. variance))
• Fixed observation times, similar intervals, no simple
lag-structure : TOEP (t− 1 parms)
AR vs CS
$$\mathrm{AR(1)}: \quad Y_1 \xrightarrow{\ \rho\ } Y_2 \xrightarrow{\ \rho\ } Y_3 \xrightarrow{\ \rho\ } Y_4 \xrightarrow{\ \rho\ } Y_5$$
$$\mathrm{CS}: \quad Y_1, Y_2, Y_3, Y_4, Y_5 \ \text{all connected through the common subject effect } A$$
16 Random Regression
The random regression model is discussed starting with an example from one of the exercises. The presentation supplements chapter 7: Random Coefficients in LMSW (Littell et al., 1996).
The basic idea behind random regression and the implementation of the model in PROC MIXED is shown. Finally, the implications for the covariance structure of the observations are presented.
Link to full-screen presentation: http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/RandomRegression.f.pdf
The Basic Idea behind Random Regression
Feeding pigs with different amounts of vitamin E supplement.
Weights recorded weekly.
[Figure: weight against time for the individual pigs, one panel per treatment (Cu = 1, 2, 3)]
• Clearly (random) between–subject (pig) variation.
• Approximately linear increase in weight.
• Slight tendency to larger dispersion between pigs at the end of the
study than at the beginning.
• Repeated measurement problem.
Aims:
• Find a regression model which describes the weight as function of
time.
• Draw inferences about possible treatment effects.
First idea: fit linear regression model (with random pig effect) and
treatment specific parameters:
$$y_{ijt} = \alpha_i + \beta_i t + U_{ij} + \varepsilon_{ijt}$$
Here, i is treatment, j is subject (pig) within treatment, t is time, $U_{ij} \sim N(0, \sigma^2_u)$ and $\varepsilon_{ijt} \sim N(0, \sigma^2)$, all independent.
title ’Linear regression (with random Pig effect)’;
title2 ’Treatment specific parameters’;
proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Cu*Time /noint solution outp=R1 ;
random Cu*Pig;
run;
Plot the curves of residuals:
symbol i=j;
proc gplot data=R1;
by Cu;
plot resid*Time=Pig;
run;
[Figure: residual curves against time by pig, one panel per treatment (Cu = 1, 2, 3)]
The “residual curves” do not look random.
Second idea: fit individual linear regression model (with random pig
effect):
$$y_{ijt} = \alpha_i + \beta_{ij} t + U_{ij} + \varepsilon_{ijt}$$
where i is treatment, j is subject (pig) within treatment, t is time, and $U_{ij} \sim N(0, \sigma^2_u)$ and $\varepsilon_{ijt} \sim N(0, \sigma^2)$, independent.
title ’Individual linear regressions (with random Pig effect)’;
proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Cu*Pig*Time /noint solution outp=R2;
random Cu*Pig;
ods output solutionf=sf2;
proc gplot data=R2;
by Cu;
plot resid*Time=Pig;
run;
[Figure: residual curves against time by pig for the individual regression model, one panel per treatment (Cu = 1, 2, 3)]
The “residual curves” now look much more random.
This approach gives a whole lot of parameter estimates βij, where i
refers to treatment and j to individual within treatment.
How to proceed with the analysis?
Analyzing the Individual Regression Coefficients
Frequently the task is to estimate the effect of time for each
treatment.
A tempting (and classical) way of doing this is to continue analyzing
the βijs.
For example, $\bar\beta_{i\cdot} = \frac{1}{J}\sum_j \hat\beta_{ij}$ is the average slope within treatment i.
The analysis could then proceed by comparing $\bar\beta_{1\cdot}$, $\bar\beta_{2\cdot}$ and $\bar\beta_{3\cdot}$ in some way.
Yet, it is somewhat unsatisfactory to first estimate the $\beta_{ij}$s as systematic effects and then afterwards analyze these as if they were random quantities.
279
Some graphics of the βijs:
[Figure: histograms and normal Q–Q plots of the estimated slopes Time*Cu*Pig for Cu = 1, 2, 3]
Random Regression
A random regression model is an alternative:
$$y_{ijt} = \alpha_i + \beta_i t + U_{ij} + B_{ij} t + \varepsilon_{ijt}$$
The systematic effects are as usual.
The random effects are $U_{ij} \sim N(0, \sigma^2_u)$, $B_{ij} \sim N(0, \sigma^2_B)$ and $\varepsilon_{ijt} \sim N(0, \sigma^2)$.
It is assumed that $\varepsilon_{ijt}$ is independent of $U_{ij}$ and of $B_{ij}$, but it need not be assumed that $U_{ij}$ and $B_{ij}$ are independent.
Hence
• $\beta_i$ is the population slope for pigs receiving the ith treatment.
• $B_{ij}$ describes the individual random deviations from the population slope.
In this way systematic and random variation of the regression coefficients can be separated.
Just as the parameter estimates in a regression usually are correlated, so might the random effects $U_{ij}$ and $B_{ij}$ be.
To obtain such flexibility, we assume
$$\begin{bmatrix} U_{ij}\\ B_{ij} \end{bmatrix} \sim N_2\left(\begin{bmatrix} 0\\ 0 \end{bmatrix},\ \begin{bmatrix} \sigma^2_U & \sigma_{UB}\\ \sigma_{UB} & \sigma^2_B \end{bmatrix}\right)$$
If σUB = 0 then Uij and Bij are independent.
How to ... In SAS
Independence:
title ’Random regression model (with random Pig effect)’;
title2’Independent intercepts and slopes’;
proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Cu*Time / ddfm=satterth noint solution outp=R3;
random int Time / sub=Pig type=vc solution;
ods output solutionf=sf3;
ods listing exclude solutionr;
ods output solutionr=sr3;
run;
Independence of Uij and Bij is obtained by type=vc in the RANDOM
statement.
Dependence:
title ’Random regression model (with random Pig effect)’;
title2’Dependent intercepts and slopes’;
proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Cu*Time / ddfm=satterth noint solution outp=R4;
random int Time / sub=Pig type=un solution;
ods output solutionf=sf4;
ods listing exclude solutionr;
ods output solutionr=sr4;
run;
Dependence of Uij and Bij is obtained by type=un in the RANDOM
statement.
Inference
In connection with random regression models we recommend always
using the ddfm=satterth option for estimating the degrees of
freedom.
Contrast etc. can be obtained as follows:
proc mixed data=CuFeed;
class Cu Pig;
model Weight = Cu Time Cu*Time / ddfm=satterth solution outp=R3;
random int Time / sub=Pig type=vc solution;
lsmeans Cu / diff;
estimate 'Slope: Cu1 vs Cu2' Cu*Time 1 -1 0;
estimate 'Slope: Cu1 vs Cu3' Cu*Time 1 0 -1;
estimate 'Slope: Cu2 vs Cu3' Cu*Time 0 1 -1;
run;
When a random regression coefficient is present in the model, then
it is important that the model also contains a random intercept.
To see why consider the random regression model
yijt = αi + βit + Uij + Bijt + εijt
Suppose that the scale of time t is changed to t′ = c1t + c2. Then it
would be very desirable to obtain the same result whether t or t′ was
used as time in the regression.
Now we use t′ in a random regression model without random
intercept:
yijt = αi + βit + Bijt′ + εijt
= αi + βit + Bij(c1t + c2) + εijt
= αi + βit + (Bijc1t) + (Bijc2) + εijt
Hence Bijc2 will play the role of a random intercept.
In other words, the presence of a random intercept is a matter of the scale on which t is measured.
Likewise, in a polynomial regression involving t2: If there is a
random regression coefficient for t2 then there must also be a
random regression coefficient for t and a random intercept.
Correlation structure in Random Regression Models
Consider again the random regression model
yijt = αi + βit + Uij + Bijt + εijt
and assume for simplicity that Uij and Bij are independent.
The variance of $Y_{ijt}$ is
$$\mathrm{Var}(Y_{ijt}) = \sigma^2_U + \sigma^2_B t^2 + \sigma^2_e$$
For later use let $V_t = \sigma^2_U + \sigma^2_B t^2$.
Next consider the variance at time t + k:
$$\mathrm{Var}(Y_{ij(t+k)}) = \sigma^2_U + \sigma^2_B (t+k)^2 + \sigma^2_e = V_{t+k} + \sigma^2_e = V_t + k(2t+k)\sigma^2_B + \sigma^2_e$$
The covariance between $Y_{ijt}$ and $Y_{ij(t+k)}$ is
$$\mathrm{Cov}(Y_{ijt}, Y_{ij(t+k)}) = \mathrm{Cov}(U_{ij} + B_{ij}t + \varepsilon_{ijt},\ U_{ij} + B_{ij}(t+k) + \varepsilon_{ij(t+k)}) = \mathrm{Var}(U_{ij}) + \mathrm{Cov}(B_{ij}t,\ B_{ij}(t+k)) = \sigma^2_U + t(t+k)\sigma^2_B = [\sigma^2_U + t^2\sigma^2_B] + tk\sigma^2_B = V_t + tk\sigma^2_B$$
In total:
$$\mathrm{Var}(Y_{ijt}) = V_t + \sigma^2_e, \quad \mathrm{Var}(Y_{ij(t+k)}) = V_t + k(2t+k)\sigma^2_B + \sigma^2_e, \quad \mathrm{Cov}(Y_{ijt}, Y_{ij(t+k)}) = V_t + tk\sigma^2_B$$
Hence the correlation is
$$\mathrm{Corr}(Y_{ijt}, Y_{ij(t+k)}) = \frac{V_t + tk\sigma^2_B}{\sqrt{(V_t + \sigma^2_e)(V_t + k(2t+k)\sigma^2_B + \sigma^2_e)}}$$
Now consider a fixed t. The numerator is a linear function in k while
the denominator is a quadratic function in k.
Hence we know from high school mathematics that
$$\mathrm{Corr}(Y_{ijt}, Y_{ij(t+k)}) \to 0$$
as k (i.e. the time span between $Y_{ijt}$ and $Y_{ij(t+k)}$) goes to infinity.
In other words, under the random regression model the correlation decreases with distance in time.
That is an appealing property of the model!
17 Factor Structure Diagrams
The discussion with participants during the previous lectures had shown the need for an independent means of checking the degrees of freedom in the F-tests in PROC MIXED. The methods of calculation of degrees of freedom (option ddfm) are not fool-proof. The containment method may lead to errors if the experimental design cannot be deduced from the model specification, and the Satterthwaite method is erroneous if one of the variance components is estimated as 0.
Therefore, the factor structure diagram method was presented, supplemented with an exercise.
Link to the full screen presentation: http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/FactorStructure.f.pdf
Factor Structure Diagrams
Factor structure diagrams are a way of representing certain factorial
designs, including block experiments, split plot experiments etc.
• With such diagrams, it is for certain balanced cases easy to calculate
the correct degrees of freedom for the tests.
• It is also for certain balanced cases easy to identify which “error” an
effect is to be “tested against”.
However
• it is a somewhat restricted class of models that can be appropriately
represented this way.
• the degree of freedom calculations are not correct in unbalanced cases
• It is a very comprehensive task to describe the class of designs for which
factor structure diagrams can be used
Nonetheless, they are quite useful...
Two–way ANOVA with Replicates
Factors A and B have a and b levels. Replicates within each
combination A×B are denoted by the factor R with r levels.
That is, there are abr units in the experiment
The usual two–factor ANOVA model is
yabr = µ+ αa + βb + (αβ)ab + εabr
The model can be represented in a factor structure diagram
$$[ABR]^{abr}_{abr-ab} \leftarrow AB^{ab}_{ab-a-b+1} \leftarrow \{A^a_{a-1},\ B^b_{b-1}\} \leftarrow O^1_1$$
(Superscripts give the number of levels, subscripts the degrees of freedom; arrows point from coarser towards finer factors.)
• The term O is to be identified with µ
• The term A is to be identified with αa
• The term AB is to be identified with (αβ)ab etc.
• Terms in [. . . ] are random effects.
Calculating the degrees of freedom
1. Fill in the levels of the factors as superscripts.
2. Then calculate the degrees of freedom (DF), written as subscripts, recursively from right to left:
The DF for O is 1.
The DF for A is a minus the sum of DFs of the factors pointing towards A in the diagram, i.e.
$$a - 1$$
3. Proceed like this towards the left in the diagram: the DF for AB is
$$ab - (a-1) - (b-1) - 1 = ab - a - b + 1$$
“Proof that it works...”
%let a=4; %let b=2; %let r=3;
title ’Two-way ANOVA with replicates’;
data data1;
do A=1 to &a;
do B=1 to &b;
do R=1 to &r;
y=rannor(0);
output;
end; end; end;
proc mixed data=data1 noinfo noclprint;
class A B R;
model Y = A B A*B;
run;
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
A 3 16 0.56 0.6470
B 1 16 1.54 0.2329
A*B 3 16 1.72 0.2021
Two–way ANOVA without Replicates
If there are no replicates within each combination of A and B (i.e.
r = 1), the model is
yab = µ+ αa + βb + εab
since the interaction can not be estimated.
Following the lines from before, a diagram is
$$[ABR]^{ab}_{ab-ab=0} \leftarrow AB^{ab}_{ab-a-b+1} \leftarrow \{A^a_{a-1},\ B^b_{b-1}\} \leftarrow O^1_1$$
Another way of looking at it is by saying that the random error is the
interaction!!
So a more appropriate diagram is
$$[AB]^{ab}_{ab-a-b+1} \leftarrow \{A^a_{a-1},\ B^b_{b-1}\} \leftarrow O^1_1$$
“Proof that it works...”
title ’Two-way ANOVA without replicates’;
data data2;
do A=1 to &a;
do B=1 to &b;
y=rannor(0);
output;
end; end;
proc mixed data=data2 noinfo noclprint;
class A B;
model Y = A B;
run;
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
A 3 3 0.45 0.7377
B 1 3 0.05 0.8414
Block Experiments with Replicates within Blocks
If A is a (random) block effect and there are replicates of the factor B
within each block the model is
yabr = µ+ Ua + βb + Vab + εabr
The diagram is
$$[ABR]^{abr}_{abr-ab} \leftarrow [AB]^{ab}_{ab-a-b+1} \leftarrow \{[A]^a_{a-1},\ B^b_{b-1}\} \leftarrow O^1_1$$
Note:
• The systematic effect B is to be tested against the random effect
closest to it in the diagram, i.e. [AB]
• Note that since A is a random effect, any factor containing A must
also be random.
“Proof that it works...”
title ’Block experiment with replicates within blocks’;
data data3;
do A=1 to &a;
U = rannor(0);
do B=1 to &b;
V = rannor(0);
do R=1 to &r;
y=rannor(0) + U + V;
output;
end; end; end;
proc mixed data=data3 noinfo noclprint;
class A B R;
model Y = B;
random A A*B;
run;
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
B 1 3 14.99 0.0305
Block Experiments without Replicates within Blocks
If A is a (random) block effect and there are no replicates of the factor
B within each block the model is
yab = µ+ Ua + βb + εab
The diagram is
$$[AB]^{ab}_{ab-a-b+1} \leftarrow \{[A]^a_{a-1},\ B^b_{b-1}\} \leftarrow O^1_1$$
“Proof that it works...”
title ’Block experiment without replicates within blocks’;
data data4;
do A=1 to &a;
U = rannor(0);
do B=1 to &b;
y=rannor(0) + U;
output;
end; end;
proc mixed data=data4 noinfo noclprint;
class A B;
model Y = B;
random A;
run;
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
B 1 3 3.30 0.1671
Split Plot Experiment
Let A denote the whole–plot treatment and B the split–plot treatment.
Replicate units within A are denoted by R.
The model is:
yabr = µ+ αa + Uar + βb + (αβ)ab + εabr
$$[ABR]^{abr}_{abr-ab-a(r-1)} \leftarrow AB^{ab}_{(a-1)(b-1)} \leftarrow \{A^a_{a-1},\ B^b_{b-1}\} \leftarrow O^1_1$$
and, in addition, $[ABR] \leftarrow [AR]^{ar}_{a(r-1)} \leftarrow A$.
“Proof that it works”
title 'Split plot experiment';
%let a=4; %let b=3; %let r=3;
data data5;
do A=1 to &a;
do R=1 to &r;
U = rannor(0);
do B=1 to &b;
y=rannor(0) + U;
output;
end; end; end;
proc mixed data=data5 noinfo noclprint;
class A B R;
model Y = A B A*B;
random A*R;
run;
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
A 3 8 0.68 0.5901
B 2 16 3.81 0.0444
A*B 6 16 2.57 0.0618
Split Plot Experiment – Homework
Let E and C be the vitamin E and copper treatments applied to R pigs
within each combination of E and C.
Let M denote the membrane.
Hence the model is
$$y_{ecrm} = \mu + \alpha_e + \beta_c + (\alpha\beta)_{ec} + U_{ecr} + \gamma_m + (\alpha\gamma)_{em} + (\beta\gamma)_{cm} + (\alpha\beta\gamma)_{ecm} + \varepsilon_{ecrm}$$
The factor structure diagram becomes
$$O^1_1;\quad E^e_{e-1},\ C^c_{c-1},\ M^m_{m-1};\quad EC^{ec}_{ec-e-c+1},\ EM^{em}_{em-e-m+1},\ CM^{cm}_{cm-c-m+1};\quad ECM^{ecm}_{(e-1)(c-1)(m-1)};\quad [ECR]^{ecr}_{ec(r-1)};\quad [ECRM]^{ecrm}_{ec(rm-r-m+1)}$$
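Checking against the simulation below, where e = c = 2, r = 8 and m = 3: the whole–plot error has DF([ECR]) = ec(r − 1) = 28 and the residual has DF([ECRM]) = ec(rm − r − m + 1) = 4 · 14 = 56 — matching the denominator degrees of freedom 28 (for cu, e_vit, cu*e_vit) and 56 (for the membran terms) in the SAS output.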
“Proof that it works”
title ’Split plot experiment - homework - with 3 membranes’;
%let sigma_G = 2;
%let sigma_M = 6;
%let sigma_E = 1;
data mem;
do cu= 1 to 2;
do e_vit= 1 to 2;
do grnr= 1 to 8;
U_g = &sigma_G * rannor(0);
do membran= 1 to 3;
V_m = &sigma_M * rannor(0);
do muskel= 1 to 2;
E = &sigma_E * rannor(0);
y = U_g + V_m + E;
output;
end;
end;
end;
end;
end;
data mem1; set mem(where=(muskel=1));
proc mixed data=mem1;
class cu e_vit membran grnr;
model y = cu | e_vit | membran ;
random cu*e_vit*grnr ;
run;
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
cu 1 28 0.05 0.8316
e_vit 1 28 0.10 0.7489
cu*e_vit 1 28 1.55 0.2230
membran 2 56 0.10 0.9091
cu*membran 2 56 0.57 0.5708
e_vit*membran 2 56 1.26 0.2904
cu*e_vit*membran 2 56 1.16 0.3198
A Neat Little Exercise
1. Draw a factor structure diagram for the entire membrane experiment.
2. Compute the degrees of freedom for each test.
3. Verify by simulation that SAS does the right thing.
Hint: Use a BIG sheet of paper!
18 Covariate Models and Multivariate Response
The use of covariates in mixed models is discussed, initially based on chapter 5 in LMSW (Littell et al., 1996), i.e., model specification, comparison, and reduction.
Then it is shown that the covariate model may be naturally modified to include several dependent variables, i.e., extended to a multivariate response model. The data manipulation steps in SAS are described and the necessary model specification is shown.
Link to full screen presentation: http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/covariate.f.pdf
Example of use of covariates
Exercise 1: Treatments copper and vitamin E, each at three levels. Litters as blocks. Dependent variables: daily gain (and feed intake). Weight at start differed.
Plot
[Figure: daily gain plotted against start weight]
$$Y_{ijk} = (\alpha\gamma)_{ij} + L_k + \beta_{ij} w_{ijk} + \varepsilon_{ijk}$$
• $Y_{ijk}$: daily gain,
• $w_{ijk}$: weight at start,
• $\beta_{ij}$: regression coefficient for level ij of treatment,
• $(\alpha\gamma)_{ij}$: interaction between copper and vitamin E,
• $L_k$: random effect of litter ($L_k \sim N(0, \sigma^2_L)$),
• $\varepsilon_{ijk}$: random residual, $\varepsilon_{ijk} \sim N(0, \sigma^2)$
Model reduction ?
Model reduction
Reformulate as additive model and remove non-significant terms
$$Y_{ijk} = (\alpha\gamma)_{ij} + L_k + \beta_{ij} w_{ijk} + \varepsilon_{ijk}$$
$$(\alpha\gamma)_{ij} = \mu + \alpha_i + \gamma_j + (\alpha\gamma)'_{ij}$$
$$\beta_{ij} = \beta_0 + \beta_{1i} + \beta_{2j} + \beta'_{ij}$$
Table 5.1 in LMSW, section 5.2.2
1. Are all slopes = 0? If we fail to reject, go to step 2; else go to step 3.
2. Fit a common slope and test the hypothesis that it is 0. If we fail to reject, compare treatments using ANOVA; else use parallel lines.
3. Test that the slopes are equal. If we fail to reject, use the common slope model; if we reject, go to step 4.
4. Use the unequal slopes model.
SAS-code
Step 1:
proc Mixed data=a;
class Kuld Evit Kobber ;
model Tilv= Evit*Kobber
Startv*Evit*Kobber /noint solution ;
random kuld ;
Step 3:
model Tilv= Evit Kobber Evit*Kobber
Startv Startv*Evit Startv*Kobber
Startv*Kobber*Evit ;
SAS-Anova
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
EVIT 2 34 0.54 0.5905
KOBBER 2 34 0.46 0.6333
EVIT*KOBBER 4 34 1.10 0.3740
STARTV 1 34 27.62 <.0001
STARTV*EVIT 2 34 0.79 0.4627
STARTV*KOBBER 2 34 0.55 0.5829
STARTV*EVIT*KOBBER 4 34 1.13 0.3572
Plot
[Figure: daily gain against start weight]
Final Model
[Figure: daily gain against start weight, final model fit]
Feed per day
[Figure: daily gain plotted against feed per day, shown on three consecutive slides]
SAS-code
Test
proc Mixed data=a;
class Kuld Evit Kobber ;
model Tilv= Kobber
Fedag Fedag*Kobber ;
random kuld ;
Estimation:
model Tilv= Kobber
Fedag*Kobber /noint solution ;
Feed per day
[Figure: daily gain plotted against feed per day]
The lines actually denote the conditional distribution of the daily gain given the feed intake, i.e.,
$$Y_{ij} = \mu + \beta X_{ij} + \varepsilon_{ij}$$
If both variables measure the effect of the treatment, the joint distribution may be more interesting.
There is a relatively simple relationship between the conditional and the joint distribution:
$$E(X_{ij}) = \mu_x, \qquad E(Y_{ij}) = \mu_y = E(\mu + \beta X_{ij}) = \mu + \beta\mu_x$$
$$\mathrm{V}(X_{ij}) = \sigma^2_x$$
$$\mathrm{V}(Y_{ij}\mid X_{ij}) = \mathrm{V}(\varepsilon_{ij}) = \sigma^2_y - \sigma_{yx}\frac{1}{\sigma^2_x}\sigma_{xy}$$
$$\mathrm{C}(X_{ij}, Y_{ij}) = \mathrm{C}(X_{ij},\ \mu + \beta X_{ij} + \varepsilon_{ij}) = \beta\,\mathrm{V}(X_{ij}) = \beta\sigma^2_x$$
$$\mathrm{V}(Y_{ij}) = \sigma^2_\varepsilon + \beta^2\sigma^2_x$$
i.e., the joint distribution is
$$\begin{bmatrix} X_{ij}\\ Y_{ij} \end{bmatrix} \sim N\left(\begin{bmatrix} \mu_x\\ \mu_y \end{bmatrix},\ \begin{bmatrix} \sigma^2_x & \beta\sigma^2_x\\ \beta\sigma^2_x & \sigma^2_\varepsilon + \beta^2\sigma^2_x \end{bmatrix}\right)$$
Can this be generalised ?
Multivariate Responses
Consider a feeding experiment where a treatment factor A (say
supplement of copper) is applied to pigs.
Two responses are measured:
$Y^1$: weight gain
$Y^2$: feed intake
Hence the response is a two–dimensional vector $Y = (Y^1, Y^2)^\top$.
Return to the feeding experiment.
A model for each response $Y^r$, where r = 1, 2, could be
$$Y^r_{ik} = \mu^r + \alpha^r_i + \varepsilon^r_{ik}$$
where $i = 1,\ldots,I$ is treatment, $k = 1,\ldots,K$ is replicate within each treatment, and $\varepsilon^r_{ik} \sim N(0, \sigma^2_r)$.
Hence all parameters $\mu^r$, $\alpha^r_i$, $\sigma^2_r$ are specific to the rth response.
The Components of a MLNM
For each response Y r it is assumed that E(Y r) can be written as a
linear function of the explanatory variables.
In the example,
$$E(Y^r_{ik}) = \delta^r + \alpha^r_i$$
It is assumed that the mean value has the same structure for each response r made on the same unit.
In the example,
$$E(Y_{ik}) = (E(Y^1_{ik}),\ E(Y^2_{ik})) = (\delta^1 + \alpha^1_i,\ \delta^2 + \alpha^2_i) = (\mu^1_i,\ \mu^2_i)$$
It is also assumed that the parameters $\beta^r$ and $\beta^s$ relating to the rth and the sth response, respectively, have nothing in common.
In the example, this means that there are no restrictions on the parameters of the form that e.g. $\alpha^1_i$ and $\alpha^2_i$ are restricted to being identical.
The responses are possibly correlated. To account for this we allow for a covariance matrix of the form
$$\Sigma = \mathrm{C}(Y_{ik}) = \begin{bmatrix} \sigma^2_1 & \sigma_{12}\\ \sigma_{21} & \sigma^2_2 \end{bmatrix}$$
The model we consider can be briefly written
$$Y_{ik} = (Y^1_{ik}, Y^2_{ik}) \sim N_2((\mu^1_i, \mu^2_i),\ \Sigma)$$
If the vectors are regarded as row vectors, then it just looks like two linear normal models appended to each other, with the extra finesse that the two responses are allowed to be non–independent.
And – that is just what it is!
Such models can be dealt with in a mixed model setup.
The trick is to arrange the data in columns.
Suppose there are two treatments, i.e. i = 1, 2, and two pigs per treatment, i.e. k = 1, 2.
Then there are 4 units in the experiment, each with two measurements, giving altogether 8 measurements.
It is not very hard to see that the mean of each of these can be written in the matrix form
$$E\begin{bmatrix} Y^1_{11}\\ Y^2_{11}\\ Y^1_{12}\\ Y^2_{12}\\ Y^1_{21}\\ Y^2_{21}\\ Y^1_{22}\\ Y^2_{22} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 1 & 0\\ 1 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 1 & 0\\ 1 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 1\\ 1 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} \delta^1\\ \alpha^1_1\\ \alpha^1_2\\ \delta^2\\ \alpha^2_1\\ \alpha^2_2 \end{bmatrix}$$
The covariance matrix is easy to specify too: The units are assumed
independent, and hence the covariance between measurements on
different units is zero.
The covariance structure for measurements on the same unit
together with the variances are described in the 2× 2 matrix Σ.
For all measurements, the covariance matrix is therefore the 8 × 8 matrix
$$\mathrm{C}\begin{bmatrix} Y^1_{11}\\ Y^2_{11}\\ \vdots\\ Y^1_{22}\\ Y^2_{22} \end{bmatrix} = \begin{bmatrix} \Sigma & 0_2 & 0_2 & 0_2\\ 0_2 & \Sigma & 0_2 & 0_2\\ 0_2 & 0_2 & \Sigma & 0_2\\ 0_2 & 0_2 & 0_2 & \Sigma \end{bmatrix}$$
where $0_2$ is the $2\times 2$ matrix consisting exclusively of 0s.
How to ... In SAS
A brief outline about how to work with such problems in SAS.
The response variables are stacked on top of each other in a variable
called Y.
Let R be another variable with levels, say W and I indicating whether
the corresponding measurement in Y is a measurement of weight or
feed intake.
Let K be a variable identifying the subjects (within the treatment),
and let A be the treatment factor.
Then the following SAS program would do the trick:
proc mixed data=...;
class R K A;
model Y = R R*A / noint ddfm=satterth ...;
repeated R / subject=K*A type=un;
run;
In the REPEATED statement the subject option specifies the blocks
of the covariance matrix (in the example that there are 4 blocks).
The option type=un specifies that the blocks should be completely
unstructured
The variable R in the REPEATED statement is used for identifying the
different response types.
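A sketch of the stacking step (assuming a dataset wide with treatment A, subject K and the two responses in variables Y1 and Y2 — names invented for illustration):

data long;
  set wide;
  R = 'W'; Y = Y1; output;   /* weight record */
  R = 'I'; Y = Y2; output;   /* intake record */
  drop Y1 Y2;
run;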
The General Setup
More generally,
$$E(Y^r_j) = x_j^\top \beta^r$$
where $x_j$ are covariates for the jth experimental unit and $\beta^r$ is a vector of parameters establishing the connection between $E(Y^r_j)$ and $x_j$.
More generally,
$$E(Y_j) = (E(Y^1_j), E(Y^2_j), \ldots, E(Y^R_j)) = x_j^\top[\beta^1 : \beta^2 : \cdots : \beta^R] = x_j^\top B$$
Hence $B = [\beta^1 : \beta^2 : \cdots : \beta^R]$ is now a matrix of parameters where the rth column holds the parameters associated with the rth response.
If we let $Y_j = (Y^1_j, \ldots, Y^R_j)$ be a row vector, then
$$E(Y_j) = x_j^\top B$$
is also a row vector.
If the rows of data from all n units are stacked on top of each other we obtain an $n\times R$ matrix
$$Y = \begin{bmatrix} Y^1_1 & Y^2_1 & \cdots & Y^R_1\\ Y^1_2 & Y^2_2 & \cdots & Y^R_2\\ \vdots & \vdots & & \vdots\\ Y^1_n & Y^2_n & \cdots & Y^R_n \end{bmatrix}$$
Similarly the covariates $x_j^\top$ can be stacked on top of each other to give a design matrix X (with dimension $n\times p$) in the usual way.
The previous considerations then give that
$$\underset{(n\times R)}{E(Y)} = \underset{(n\times p)}{X}\ \underset{(p\times R)}{B}$$
i.e. the mean is now organized as a matrix rather than as a vector.
19 Heterogeneous Variance
The purpose of this lecture was to present why it is important to recognize variance heterogeneity, how to model such heterogeneity, and the consequences of different modelling approaches. The lecture extends the description in chapter 8 in LMSW (Littell et al., 1996).
Graphical techniques for finding suitable models of variance heterogeneity are presented and variance functions including the power family are introduced. In addition, the effect of transformation is illustrated.
Link to full-screen presentation: http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/VarianceStructure.f.pdf
Why Variance Heterogeneity is Important to
Recognize
Frequently the usual assumptions about variance homogeneity are
not met in practice. In that case the variance is said to be
heterogeneous.
One reason for incorporating variance heterogeneity in the model is
the ability to
• downweight portions of data which are highly variable, and
• extract more information from portions of the data which are more
precise.
As always there is a price to pay:
• The models become less parsimonious in terms of the number of
parameters.
• Fitting the models can be more difficult (numerical problems).
• Usually, only asymptotic inference can be carried out (i.e. no exact
F–tests etc.)
• Model control becomes more complicated.
Graphical Investigation of the Variance Structure
Frequently there is some structure on the way in which the variance
is non–constant:
Frequently the variance increases when the mean increases.
That is, the variance is a function of the mean, symbolically
Var(Y ) = f(E(Y ))
With grouped data, the variance function can sometimes be
identified.
Example 1. One-way ANOVA:
$$Y_{kl} = \alpha_k + \varepsilon_{kl}$$
where $\varepsilon_{kl} \sim N(0, \sigma^2_k)$ for treatments $k = 1, 2, \ldots, K$ and replicates within treatments $l = 1, 2, \ldots, L_k$.
Good estimates for mean and variance in the kth group are
• Mean: $\bar y_{k\cdot}$
• Variance: $s^2_k = \frac{1}{L_k - 1}\sum_l (y_{kl} - \bar y_{k\cdot})^2$
A reasonable idea is to plot $s^2_k$ against $\bar y_{k\cdot}$ to see if the variance is a function of the mean. fin
Variance Functions
After having found that the variance is non–constant, the next step
is to look for some structure in which it is non–constant.
This is obtained by considering a particular function for the variance
as a function of the mean.
Frequently in practice one works with the variance function
$$\mathrm{Var}(Y) = \sigma^2\mu^\theta$$
where $\mu = E(Y)$, and $\sigma^2$ and $\theta$ are unknown constants.
Variance functions of this form are called the power family.
With $\mathrm{Var}(Y) = \sigma^2\mu^\theta$ we have a linear relationship on the log scale:
$$\log \mathrm{Var}(Y) = \log\sigma^2 + \theta\log\mu$$
Therefore, in the ANOVA example the natural thing to do is to plot $\log s^2_k$ against $\log\bar y_{k\cdot}$ and see if the relationship is approximately linear.
If so, it may be reasonable to assume that we are within the power family of variance functions – and this is a nice family, as shall soon be shown.
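A sketch of the suggested diagnostic plot in SAS (dataset and variable names are placeholders):

proc means data=mydata noprint;
  class k;                        /* the treatment factor */
  var y;
  output out=mv mean=m var=s2;
run;
data mv;
  set mv;
  if _type_ = 1;                  /* keep the per-group rows only */
  logm = log(m); logv = log(s2);
run;
proc gplot data=mv;
  plot logv*logm;                 /* roughly linear => power family; slope estimates theta */
run;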
Example 2. A substance X14 has been added in the concentration fod ∈ {0.0, 4.4, 6.2, 9.3} to the food for some pigs. The pigs are fed (up!) with this food until their weight is 60 kg. From then until they are slaughtered at 100 kg, their food does contain the substance.
At 60 kg (sample=1) and 100 kg (sample=2) muscle biopsies are made and the concentration of the substance is determined.
[Figure: mean concentration against fod; 1 = 60 kg, 2 = 100 kg]
Plots of the individual points and of log–variance against log–mean indicate that the variance increases with the mean:
[Figure: X14 against fod for sample 1 and sample 2, and log–variance against log–mean with estimated slope 1.23 (s.e. 0.25)]
• One possibility is a linear increase with the slope being ≈ 1.

• Another is that there are two variances: one when fod = 0 and another one when fod ≠ 0.
fin
From here there are different possibilities:
• Transform data onto a scale where the variance is (approximately)
constant
• Include the heterogeneous variance explicitly in the model
The Delta–method
First we consider transformation of data onto a scale where the
variance is approximately constant.
Let Y be a random variable and let h() be a nice function, e.g. h(y) = √y, h(y) = y², or h(y) = log y.
We shall investigate the properties of the transformed random
variable Z where
Z = h(Y )
Example 3. Let Y ∼ N(µ, σ²). If h is linear, i.e. h(y) = α + βy, then it is well known that

Z = h(Y) ∼ N(α + βµ, β²σ²)

If h is non–linear, e.g. if h(y) = log y, then Z is not normally distributed. fin
• However, Z = h(Y ) will in certain cases be approximately normal
if Y is normal.
• Moreover, one can find the approximate mean and variance of Z
independently of whether Y is normal or not.
October 18, 2001 Mixed Models Course 11
Taylor's Approximation

The road to these results can be based on the following argument:

Let x_0 and x be two numbers (not too far apart) and assume that h is "nice" (i.e. differentiable). Then it is well known from high school that

h(x) ≈ h(x_0) + h′(x_0)(x − x_0).

The further x is from x_0, the worse this approximation is. This approximation is frequently called a Taylor expansion of h around x_0.
[Figure: first order Taylor approximation; a function f(x) and its tangent h(x) ≈ h(x_0) + h′(x_0)(x − x_0) at x_0]
Applying Taylor's Approximation

Taylor's approximation is now applied to the random variable Y with mean µ = E(Y) and variance σ² = Var(Y). The approximation is around µ. We then get

Z = h(Y) ≈ h(µ) + h′(µ)(Y − µ).

• Hence, when Y is "close to" µ, then h(Y) is approximately a linear function of Y.

• Y "being close to" µ basically means that σ² has to be small.
• From the approximation

Z = h(Y) ≈ h(µ) + h′(µ)(Y − µ)

we also conclude that

E(Z) = E(h(Y)) ≈ h(µ)
Var(Z) = Var(h(Y)) ≈ h′(µ)² Var(Y)

• Hence, if Y is normal then it follows that Z must also be approximately normal, since Z is an approximately linear function of Y. In this case we therefore conclude

Z = h(Y) ≈ N(h(µ), h′(µ)²σ²).
It must be emphasized that these results are asymptotic results. How good they are depends on many things, including

• the variance of Y, i.e. how close Y–values tend to be to µ

• the form of h – how "smooth" (that is, how close to being linear) h is.
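The quality of the approximation is easily checked by simulation. A small sketch for h(y) = log y with µ = 10 and σ = 1 (the values and the seed are chosen only for illustration):

data sim;
  call streaminit(2001);
  do i = 1 to 100000;
    y = rand('normal', 10, 1);
    z = log(y);
    output;
  end;
run;
proc means data=sim mean var;   /* compare with h(10) = log(10) = 2.303 */
  var z;                        /* and h'(10)**2 * 1**2 = 0.01          */
run;

With σ = 1 the empirical mean and variance of Z should be close to the delta method values; increasing σ makes the agreement worse.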
Transformation of Data
The previous results can sometimes be used for identifying
transformations of data onto a scale where the variance is constant.
It is assumed in the following that

E(Y_i) = µ_i and Var(Y_i) = σ²µ_i^θ.
By plotting log–variance against log–mean one can frequently get a
good estimate of θ, and from that one can (sometimes) identify an
appropriate transformation.
We look for a function h such that Z = h(Y) has constant variance σ²_Z:

• From the previous section we have

σ²_Z = Var(h(Y)) ≈ h′(µ)² Var(Y) = h′(µ)²σ²µ^θ

• If we solve for h′ we get

h′(µ) ≈ √(σ²_Z/σ²) · µ^(−θ/2)

• For later use let c = √(σ²_Z/σ²). Hence we look for a function h which
satisfies that its derivative is

h′(µ) = cµ^(−θ/2).
Such an equation is called a differential equation.
The search for h has to be taken in two steps:
When θ = 2: Then h′(µ) = c · (1/µ), and high school knowledge tells us that the solution is the natural logarithm, i.e.

h(µ) = c log(µ).

When θ ≠ 2: In this case we need the anti–derivative of a simple power function. It is then well known from high school that

h(µ) = (2c/(2 − θ)) µ^((2−θ)/2).
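For example, for θ = 1 (the Poisson-like case mentioned below) the general solution gives

h′(µ) = cµ^(−1/2) and h(µ) = (2c/(2 − 1)) µ^((2−1)/2) = 2c√µ,

i.e. the square-root transformation, which is the classical variance-stabilizing transformation for Poisson-type data.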
With Var(Y) = σ²µ^θ there are some well known special cases:

• Note that θ = 0 implies that Var(Y) = σ².
(As is the case in Linear Normal Models.)

• Note that σ² = θ = 1 implies that Var(Y) = µ.
(As is the case in the Poisson distribution.)

• Note that θ = 2 implies that Var(Y) = σ²µ².
(I.e. the coefficient of variation is constant, as is the case in the Gamma distribution.)
Modelling Variance Heterogeneity
As has been seen, transformation of data in an attempt to obtain variance homogeneity can be a mixed blessing:

• the transformation can ruin the linearity of the mean structure.

• it can be very difficult to report contrasts and their standard errors on the original scale.
An attractive alternative to transformation is therefore to include
variance heterogeneity in the model.
Consider the pig–feeding example from before and the model

y_is = α + βx_i + β_s x_i + ε_is

where i is pig, s is sample and x_i is the dose given to the ith pig.

• if ε_is ∼ N(0, σ²) then it is a LNM, i.e. variance homogeneity is assumed.

• if ε_is ∼ N(0, σ²_{x_i}) then we accommodate different variances corresponding to different doses of x. (Recall that x_i can assume 4 different values, so there are 4 different variance parameters.)

• if ε_is ∼ N(0, σ²_1) when x_i = 0.0 and ε_is ∼ N(0, σ²_2) when x_i ≠ 0.0, there are two different variance parameters in the model.
• if ε_is ∼ N(0, σ²_{x_i,s}) then we accommodate different variances corresponding to different doses of x and to the different samples. (Hence there are 8 different variance parameters.)
Fitting the models in PROC MIXED:
data biopsi; set biopsi; fod_c =fod; if fod=0.0 then fod_c2 = 1;
else fod_c2=2;
title ’Variance homogeneity’;
proc mixed data=biopsi;
class sample fod_c fod_c2;
model x14=fod fod*sample / ddfm=satterth chisq solution outp=o1;
ods output solutionf=sf1;
title ’Variance heterogeneity, 4 variances’;
proc mixed data=biopsi;
class sample fod_c fod_c2;
model x14=fod fod*sample / ddfm=satterth chisq solution outp=o2;
ods output solutionf=sf2;
repeated fod_c/ type=un(1);
title ’Variance heterogeneity, 2 variances’;
proc mixed data=biopsi;
class sample fod_c fod_c2;
model x14=fod fod*sample / ddfm=satterth chisq solution outp=o3;
ods output solutionf=sf3;
repeated fod_c2/ type=un(1);
run;
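The 8-variance model from the previous slide is not fitted above. Presumably it could be handled analogously with the GROUP= option on the repeated statement, as in this sketch (which has not been run on the data):

title 'Variance heterogeneity, 8 variances';
proc mixed data=biopsi;
  class sample fod_c;
  model x14=fod fod*sample / ddfm=satterth chisq solution;
  repeated fod_c / type=un(1) group=sample;  /* 4 variances per sample */
run;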
Parts of the SAS output are:

Variance homogeneity:
  Residual                  0.1262
  -2 Res Log Likelihood     51.6
  AIC  (smaller is better)  53.6
  AICC (smaller is better)  53.7
  BIC  (smaller is better)  55.4

Variance heterogeneity, 4 variances:
  -2 Res Log Likelihood     39.1      Cov Parm  Estimate
  AIC  (smaller is better)  47.1      UN(1,1)   0.02512
  AICC (smaller is better)  48.0      UN(2,2)   0.08855
  BIC  (smaller is better)  54.6      UN(3,3)   0.1491
                                      UN(4,4)   0.2481

Variance heterogeneity, 2 variances:
  -2 Res Log Likelihood     41.8      Cov Parm  Estimate
  AIC  (smaller is better)  45.8      UN(1,1)   0.02517
  AICC (smaller is better)  46.1      UN(2,2)   0.1592
  BIC  (smaller is better)  49.6

Note that the homogeneous model can be compared with the 4-variance model by a REML likelihood ratio test: 51.6 − 39.1 = 12.5 on 3 degrees of freedom, which is clearly significant. The 4- and 2-variance models differ by 41.8 − 39.1 = 2.7 on 2 degrees of freedom, so the 2-variance model seems adequate; it is also the model preferred by AIC and BIC.
The parameter estimates are:
Effect sample Estimate StdErr DF tValue Probt model
Intercept 0.3130 0.09145 46 3.42 0.0013 varhomo
fod 0.1453 0.01735 46 8.38 <.0001 varhomo
fod*sample 1 0.2433 0.01689 46 14.40 <.0001 varhomo
fod*sample 2 0 . . . . varhomo
Intercept 0.2608 0.04468 12.1 5.84 <.0001 varhet1
fod 0.1546 0.01552 41 9.96 <.0001 varhet1
fod*sample 1 0.2489 0.01985 33.3 12.54 <.0001 varhet1
fod*sample 2 0 . . . . varhet1
Intercept 0.2620 0.04489 11.9 5.84 <.0001 varhet2
fod 0.1524 0.01466 44.9 10.39 <.0001 varhet2
fod*sample 1 0.2432 0.01897 34.9 12.82 <.0001 varhet2
fod*sample 2 0 . . . . varhet2
Heterogeneous Variance for Grouped Data
Example 4. Example 8.2 from LMSW, p. 268.

• The response is the ultrafiltration rate UFR (in ml/hr) of 20 high flux membrane dialyzers measured at 7 different transmembrane pressures TMP.

• The measurements are made in vivo and the aim is to characterize the ultrafiltration characteristics of the membranes.

• The dialyzers are evaluated in vitro using bovine blood and flow rates QB of either 200 or 300 dl/min.
[Figure: ufr against tmp for QB = 200 and QB = 300]
• Plots suggest inhomogeneous variance, and more specifically that the variance increases with the mean.

• The plot also suggests that there might be individual curves for each membrane, i.e. to consider random regression coefficient models.
The starting point is the 4th degree polynomial model

y_imj = β_0 + τ_i + (β_1 + δ_1i)x_imj + (β_2 + δ_2i)x²_imj + (β_3 + δ_3i)x³_imj + (β_4 + δ_4i)x⁴_imj + ε_imj

where x is TMP, i denotes QB–level, m is membrane within QB–level, and j is the jth measurement on the membrane to which the measurement x_imj is associated.

There are 7 measurements on each membrane, so a crude starting point could be to assume that ε_im = (ε_im1, ..., ε_im7) follows a 7–dimensional normal distribution,

ε_im ∼ N(0, R)

where R is an unstructured 7 × 7 covariance matrix.
The SAS program employed by LMSW for fitting this model is
proc mixed data=dial;
class qb sub;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp;
repeated / type=un subject=sub r rcorr;
ods output r=r rcorr=rcorr;
run;
With this program the data are treated as being equidistant in TMP, i.e. the actual difference between two TMP–measurements is not accounted for.
This becomes transparent if the program is rewritten as
proc mixed data=dial;
class qb sub index;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp;
repeated index / type=un subject=sub r rcorr;
ods output r=r rcorr=rcorr;
run;
Some of the SAS output is
Estimated Covariance matrix
2.76  2.90  3.57  3.04  0.36  0.46  0.64
2.90  5.10  6.40  6.38  4.13  3.32  1.16
3.57  6.40 11.15 12.46  8.33  5.44  4.02
3.04  6.38 12.46 18.54 13.38 10.90  7.68
0.36  4.13  8.33 13.38 17.71 13.83 12.04
0.46  3.32  5.44 10.90 13.83 20.31 11.33
0.64  1.16  4.02  7.68 12.04 11.33 19.67

Estimated Correlation matrix
1.00  0.77  0.64  0.43  0.05  0.06  0.09
0.77  1.00  0.85  0.66  0.43  0.33  0.12
0.64  0.85  1.00  0.87  0.59  0.36  0.27
0.43  0.66  0.87  1.00  0.74  0.56  0.40
0.05  0.43  0.59  0.74  1.00  0.73  0.65
0.06  0.33  0.36  0.56  0.73  1.00  0.57
0.09  0.12  0.27  0.40  0.65  0.57  1.00
fin
• Note that with the model above there are 7×8/2 = 28 parameters
in the covariance matrix.
• The variances increase with TMP, and hence the covariances increase
with the differences in TMP.
• Yet, the correlations decrease with the difference in TMP.
• We seek a more parsimonious model describing this correlation structure.
• A simple AR(1) model in which the ijth element of R is

R_ij = σ²ρ^|i−j|

(which has 2 parameters) will clearly not fit these data.

• A more flexible alternative is the heterogeneous AR(1) model (the ARH(1) model) in which the ijth element of R is

R_ij = σ_i σ_j ρ^|i−j|

(which has 8 parameters). This model is still much more parsimonious than the unstructured covariance matrix, which requires 28 parameters.
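To make the structure concrete, a small PROC IML sketch builds such a matrix. The σ_i values below are merely rough square roots of the diagonal of the estimated covariance matrix above, and ρ = 0.76 is taken from the ARH(1) fit shown later:

proc iml;
  sigma = {1.66, 2.26, 3.34, 4.31, 4.21, 4.51, 4.43};
  rho = 0.76;
  R = j(7, 7, 0);
  do i = 1 to 7;
    do j = 1 to 7;
      R[i, j] = sigma[i] * sigma[j] * rho**abs(i - j);  /* R_ij = sigma_i sigma_j rho^|i-j| */
    end;
  end;
  print R;
quit;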
The ARH(1) model can be fitted using
proc mixed data=dial;
class qb sub index;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp;
repeated index / type=arh(1) subject=sub r rcorr;
ods output r=r rcorr=rcorr;
run;
The estimated correlation matrices from the ARH(1) model and from the unstructured model are close:

Estimated Correlation matrix (ARH(1))
1.00  0.76  0.58  0.44  0.34  0.26  0.20
0.76  1.00  0.76  0.58  0.44  0.34  0.26
0.58  0.76  1.00  0.76  0.58  0.44  0.34
0.44  0.58  0.76  1.00  0.76  0.58  0.44
0.34  0.44  0.58  0.76  1.00  0.76  0.58
0.26  0.34  0.44  0.58  0.76  1.00  0.76
0.20  0.26  0.34  0.44  0.58  0.76  1.00

Estimated Correlation matrix (Unstructured)
1.00  0.77  0.64  0.43  0.05  0.06  0.09
0.77  1.00  0.85  0.66  0.43  0.33  0.12
0.64  0.85  1.00  0.87  0.59  0.36  0.27
0.43  0.66  0.87  1.00  0.74  0.56  0.40
0.05  0.43  0.59  0.74  1.00  0.73  0.65
0.06  0.33  0.36  0.56  0.73  1.00  0.57
0.09  0.12  0.27  0.40  0.65  0.57  1.00
For the model with the unstructured covariance matrix, a plot of the
residuals against TMP gives some insight:
[Figure: residuals from the UN model against tmp, for QB = 200 and QB = 300]
• The profiles do not vary randomly around 0 – some profiles are steadily increasing, others steadily decreasing.
• This suggests that maybe we are not faced with variance
heterogeneity but rather with individual regression coefficients.
• (After all, there is likely to be some variation between the
membranes).
The random regression model is fitted by:
proc mixed data=dial ;
class qb sub index;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp / outp=o2;
random int tmp tmp*tmp / subject=sub type=un;
run;
Now there is no tendency for the residuals to be steadily increasing
or decreasing when plotted against TMP.
[Figure: residuals from the random regression model against tmp, for QB = 200 and QB = 300]
Yet, the curves are still somewhat “smooth” suggesting that some
within subject variation has yet to be accounted for.
Power–of–Mean for Data with Covariates
Previously it was discussed that the variance can sometimes be
regarded as a function of the mean.
This was used for
• identifying situations where serious variance heterogeneity was
present
• suggesting transformations of data
Yet, until now the actual structure – the variance as a function of the mean – has never been used directly.
Usually when estimating variance/covariance parameters this is done
by subtracting estimates for the mean from the observed data to
give residuals. The residuals are then used for estimating the
variance/covariance parameters.
REML estimation is a clear example of this.
• In the setup in this section the mean and variance parameters are
not estimated separately.
• With this setup, one can capture variance heterogeneity together with having random regression coefficients in the model.
• We consider cases where the variance of the residuals is

Var(ε_i) = σ²|µ_i|^θ

such that the R–matrix is diagonal with R_ii = σ²|µ_i|^θ.

• Since µ_i = x_iᵀβ, the mixed model becomes complicated:

y = Xβ + Zu + ε

where

E(Y) = Xβ
Var(ε) = R(σ², β, θ) = diag(σ²|x_iᵀβ|^θ)

are both functions of β.
• Consequently, maximizing the likelihood function is going to be a
very complicated task.
Yet, it is easy to suggest a heuristic solution to the estimation problem:

• Suppose we have a provisional estimate β_p of β.

• If this estimate is plugged into R, i.e.

R(σ², β_p, θ) = diag(σ²|x_iᵀβ_p|^θ) = R(σ², θ)

then R is all of a sudden only a function of σ² and the power θ.

• These parameters can be estimated, together with β and the parameters in Var(u), in PROC MIXED.

• The trick is then to set β_p equal to the new estimate for β and repeat the iteration until the parameters stop changing.
In LMSW, p. 278 a way of doing it is shown. A simpler way is given
here:
1. First the iteration has to be started:
proc mixed data=dial;
class qb sub;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp / s;
random int tmp tmp*tmp / type=un sub=sub;
repeated / local;
ods output solutionf=sf covparms=cp;
run;
2. Then the estimated parameters β̂ are used as provisional parameters in the next iteration. (This happens in the repeated statement.) The estimated parameters of Var(u) are used as starting points for the maximization algorithm. (This happens in the parms statement.)

This step is not necessary, but it speeds up the procedure considerably:
proc mixed data=dial;
class qb sub;
model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp / s outp=o3;
random int tmp tmp*tmp / type=un sub=sub;
repeated / local=pom(sf);
parms / pdata=cp;
ods output solutionf=sf1 covparms=cp1;
run;
3. Finally the provisional estimate βp is set to the recent estimate for
β.
Likewise, the starting values for the parameters in Var(u) are set
to the recently estimated values of these:
proc compare brief data=sf compare=sf1;
var estimate;
data sf; set sf1;
data cp; set cp1;
run;
Now iterate between 2. and 3. until convergence, i.e. until the
parameters in sf and sf1 become very similar.
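The iteration can also be wrapped in a simple macro loop. This is only a sketch: it runs a fixed number of iterations and leaves the judgment of convergence to the PROC COMPARE listings:

%macro pomloop(maxit=10);
  %do it = 1 %to &maxit;
    proc mixed data=dial;
      class qb sub;
      model ufr = tmp|tmp|tmp|tmp qb|tmp|tmp|tmp|tmp / s;
      random int tmp tmp*tmp / type=un sub=sub;
      repeated / local=pom(sf);
      parms / pdata=cp;
      ods output solutionf=sf1 covparms=cp1;
    run;
    proc compare brief data=sf compare=sf1;  /* convergence check */
      var estimate;
    run;
    data sf; set sf1; run;
    data cp; set cp1; run;
  %end;
%mend pomloop;
%pomloop(maxit=10)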
Parts of the output from the final iteration are
Covariance Parameter Estimates
Cov Parm Subject Estimate
UN(1,1) sub 3.8360
UN(2,1) sub -5.8353
UN(2,2) sub 28.2501
UN(3,1) sub 1.3778
UN(3,2) sub -8.3312
UN(3,3) sub 2.6970
POM 1.9785
Residual 0.001974
The power is estimated at 1.9785 ≈ 2 which, in a sense, corresponds to the case of constant coefficient of variation (and which, by the earlier results, would correspond to a logarithmic transformation).
Now there is no tendency for the residuals to be steadily increasing or decreasing when plotted against TMP.

Also, the curves are less smooth than before, suggesting that more of the within subject variation has now been accounted for.
[Figure: residuals from the POM model against tmp, for QB = 200 and QB = 300]
Some remarks on transformations, the normal approximation and confidence intervals

Based on 250 receipts for purchases of petrol, l_i, together with corresponding registrations of kilometres driven per tankful, k_i, the fuel economy

y_i = k_i/l_i, i = 1, ..., 250,

expressed in kilometres per litre, has been calculated.
The histogram and the probit diagram in the top row of the figure below show that one can reasonably assume that the y_i's are
realisations of random variables Y_i, where

Y_i ∼ N(µ, σ²), i = 1, ..., 250.

On the basis of the data one can now construct, e.g., a confidence interval for µ.
For various reasons one decides to sell the car in the USA, where fuel economy is usually stated as "gallons per 100 miles". To keep things simple we instead consider "litres per 100 km", namely

z_i = 100 l_i/k_i = 100 · (1/y_i).

That is, we transform the data as z_i = h(y_i) = 100/y_i.
It is well known that if Y_i is normally distributed, then 100/Y_i is NOT normally distributed.
Below, histograms and QQ–plots are shown for Y and for Z = h(Y) = 100/Y.

[Figure: histogram and normal Q–Q plot of y (top row) and of z (bottom row)]
One can trace a slightly right-skewed distribution for the z_i's, but otherwise the data seem to be reasonably well described by a normal distribution. That is, with some justification one can work with 100/Y_i being approximately normally distributed.
The data above are in fact 250 observations simulated from an N(12, 1²) distribution.

We shall now illustrate that the approximation to the normal distribution becomes gradually worse as the standard deviation becomes larger. We have therefore carried out the above for the standard deviations σ = 2 and σ = 3. The results are shown below:
[Figure: histograms and normal Q–Q plots of y and z, for σ = 2 (top rows) and σ = 3 (bottom rows)]
We shall now illustrate what happens to the mean and the variance of the transformed data.
In the following we let E(Z) = η and Var(Z) = τ². We can then estimate η and τ² directly from the transformed data, as the average and the sample variance respectively.
Next it is noted that with h(x) = 100/x we have h′(x) = −100/x². From the results

E(Z) = E(h(Y)) ≈ h(µ)
Var(Z) = Var(h(Y)) ≈ h′(µ)² Var(Y)

we therefore have E(Z) ≈ 100/µ and Var(Z) ≈ 10000σ²/µ⁴.

For σ = 1, 2, 3 the numbers are given in the table below.
It is seen that 100/µ is a good approximation to E(Z) = η, and likewise 10000σ²/µ⁴ is a reasonable approximation to Var(Z) = τ², when the standard deviation is small. It also appears that when the standard deviation becomes large, the approximation to Var(Z) = τ² in particular becomes poor.

Quantity                            σ = 1     σ = 2     σ = 3
µ                                   11.968    11.919    11.962
σ²                                   1.146     4.007     9.475
η                                    8.423     8.658     9.261
τ²                                   0.588     2.834    18.833
E(Z) = E(h(Y)) ≈ 100/µ               8.355     8.389     8.359
Var(Z) = Var(h(Y)) ≈ 10000σ²/µ⁴      0.558     1.985     4.627
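A simulation along the lines of the table is easily set up. This is only a sketch (the seed and the names are our own, so the numbers will not match the table exactly):

data fuel;
  call streaminit(2001);
  do sigma = 1, 2, 3;
    do i = 1 to 250;
      y = rand('normal', 12, sigma);  /* km per litre      */
      z = 100/y;                      /* litres per 100 km */
      output;
    end;
  end;
run;
proc means data=fuel mean var;
  class sigma;
  var y z;
run;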
Finally it is noted that η and µ in this example express the same thing, namely the fuel economy.
Through the transformation of the data, z_i = 100/y_i, the z_i's are expressed in "litres per 100 km", which then also becomes the unit for E(Z) = η. The unit for µ is "km per litre", and therefore the unit for 100/µ is "litres per 100 km".
One can therefore discuss whether 100/µ or η is the relevant quantity. They are estimated differently, the first as 100 times a reciprocal average and the second as 100 times the average of the reciprocal data:

100/µ̂ = 100 ((1/n) Σ_i y_i)^(−1)

η̂ = (1/n) Σ_i z_i = 100 · (1/n) Σ_i (1/y_i)

If one decides that the unit "litres per 100 km" is the relevant quantity, there are thus two ways of arriving at it: either as an average of transformed data, or as a transformation of the mean of the original data.
Transformation and confidence intervals
Assume that the observed data are y_1, ..., y_n and that these, e.g. in order to obtain variance homogeneity, have been transformed to z_1, ..., z_n by the transformation h, i.e. z_i = h(y_i).

On the transformed scale a statistical analysis has been carried out. Let θ be the quantity we are interested in. On the basis of the (transformed) data one obtains an estimate θ̂ of θ, together with an estimate σ̂_θ̂ of the standard error of θ̂.

For example, θ could be the slope in a linear regression

Z_i = α + θx_i + ε_i.
In general, a (1 − α) confidence interval for θ is given by two random variables Z_low and Z_high such that the probability that θ lies in the interval [Z_low, Z_high] is 100(1 − α)%.

In many classical linear models a (1 − α) confidence interval is calculated as

Z_low = θ̂ − t_{1−α/2}(d) σ̂_θ̂
Z_high = θ̂ + t_{1−α/2}(d) σ̂_θ̂

where t_{1−α/2}(d) is the 1 − α/2 quantile of a t–distribution with d degrees of freedom.

If, e.g., θ is the slope in a regression as above, then θ expresses the expected increase in Z when x is increased by one unit.
Often one is interested in expressing the expected increase of Y, that is, on the original scale, when x is increased by one unit. Loosely speaking, one wants to express θ "on the original scale".
This is often done as follows. Let h⁻¹ be the inverse function of h. One then lets h⁻¹(θ) be an expression of θ "on the original scale".

One therefore applies h⁻¹ to the estimated value θ̂, which gives η̂ = h⁻¹(θ̂). The confidence limits on the transformed scale can also be transformed back with h⁻¹:

If h is strictly increasing then

Y_low = h⁻¹(Z_low)
Y_high = h⁻¹(Z_high)
and if h is strictly decreasing, then

Y_low = h⁻¹(Z_high)
Y_high = h⁻¹(Z_low)

If [Z_low, Z_high] is a 100(1 − α)% confidence interval for θ, then [Y_low, Y_high] is a 100(1 − α)% confidence interval for h⁻¹(θ).

Note: [Z_low, Z_high] is symmetric around θ̂, but [Y_low, Y_high] is in general NOT symmetric around h⁻¹(θ̂).

If h is approximately linear, then so is h⁻¹, and in that case [Y_low, Y_high] becomes approximately symmetric around h⁻¹(θ̂).
An alternative to the above is the following: The mean and the variance on the transformed scale are approximately given by

E(Z) = E(h(Y)) ≈ h(E(Y))
Var(Z) = Var(h(Y)) ≈ h′(E(Y))² Var(Y).

One can now solve these by means of h⁻¹. One gets

E(Y) ≈ h⁻¹(E(Z))
Var(Y) ≈ Var(Z)/[h′(E(Y))]² = Var(Z)/[h′(h⁻¹(E(Z)))]².
These results can be applied to the parameter θ that we are interested in. One then gets

η̂ = h⁻¹(θ̂)
σ̂_η̂ = σ̂_θ̂ / |h′(η̂)|

It is now tempting to calculate confidence limits for h⁻¹(θ) as

Ỹ_low = η̂ − t_{1−α/2}(d) σ̂_η̂
Ỹ_high = η̂ + t_{1−α/2}(d) σ̂_η̂

This interval becomes symmetric around η̂.

There are, however, as far as we know, no good formal arguments for calling [Ỹ_low, Ỹ_high] a 100(1 − α)% confidence interval for h⁻¹(θ). Therefore [Y_low, Y_high] is generally recommended.

In some cases, though, [Y_low, Y_high] and [Ỹ_low, Ỹ_high] will in fact resemble each other closely. This happens if the variation in the data is small. Within a narrow interval h can then be regarded as roughly linear, whereby the above approximations become good.
Example: Assume that the data have been transformed as

z_i = h(y_i)

On the basis of the transformed data a regression is made,

Z_i = α + βx_i + ε_i

We are interested in a confidence interval for h⁻¹(β).

On the basis of the data, β̂ = 0.25 and σ̂_β̂ = 0.03 are estimated.

We shall now compare two ways of calculating the intervals. For the sake of the argument we shall carry out the corresponding calculations for σ̂_β̂ = 0.06 and σ̂_β̂ = 0.09.
For simplicity we assume that there are so many observations that the t–distribution resembles a normal distribution. Hence t_{1−α/2}(d) ≈ 1.96 for α = 0.05.

First note that

h(y) = √y = y^(1/2), whereby
h⁻¹(y) = y² and
h′(y) = 1/(2√y),

and that η̂ = h⁻¹(β̂) = 0.25² = 0.0625 and h′(η̂) = 1/(2 · 0.25) = 2 (check for yourself!).
For σ̂_β̂ = 0.03 we now get

Z_low = β̂ − 1.96 σ̂_β̂ = 0.19
Z_high = β̂ + 1.96 σ̂_β̂ = 0.31

Transforming these limits back by h⁻¹ gives

Y_low = 0.19² = 0.0361
Y_high = 0.31² = 0.0961

which is not symmetric around η̂ = h⁻¹(β̂) (but almost!).
Under the second method sketched above we must calculate

σ̂_η̂ = σ̂_β̂ / |h′(η̂)| = σ̂_β̂ / 2 = 0.015

since σ̂_β̂ = 0.03. We now get

Ỹ_low = η̂ − 1.96 σ̂_η̂ = 0.0331
Ỹ_high = η̂ + 1.96 σ̂_η̂ = 0.0919.

We thus see that the intervals [Y_low, Y_high] and [Ỹ_low, Ỹ_high] resemble each other closely.
For σ̂_β̂ = 0.06 completely analogous calculations are carried out, and we find σ̂_η̂ = 0.06/2 = 0.03. This gives

Z_low = β̂ − 1.96 σ̂_β̂ = 0.13
Z_high = β̂ + 1.96 σ̂_β̂ = 0.37
Y_low = Z_low² = 0.0175
Y_high = Z_high² = 0.1351
Ỹ_low = η̂ − 1.96 σ̂_η̂ = 0.0037
Ỹ_high = η̂ + 1.96 σ̂_η̂ = 0.1213.

We now see that the intervals [Y_low, Y_high] and [Ỹ_low, Ỹ_high] become more different.
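The interval arithmetic above can be reproduced in a small data step (a sketch; the variable names are our own):

data intervals;
  beta = 0.25;
  eta  = beta**2;                     /* h-inverse of beta-hat      */
  do sigma = 0.03, 0.06, 0.09;
    z_low    = beta - 1.96*sigma;     /* interval on the sqrt scale */
    z_high   = beta + 1.96*sigma;
    y_low    = z_low**2;              /* back-transformed interval  */
    y_high   = z_high**2;
    sigmaeta = 2*sqrt(eta)*sigma;     /* = sigma/|h'(eta)|          */
    yt_low   = eta - 1.96*sigmaeta;   /* symmetric (delta) interval */
    yt_high  = eta + 1.96*sigmaeta;
    output;
  end;
run;
proc print data=intervals noobs; run;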
20 Variance heterogeneity: Example of effect of transformation

This lecture illustrates the consequence of transformation, based on an analysis of an experiment investigating the effect of feed concentration on the muscle content of a certain ingredient.
Transformation back to the original scale is discussed, both in relation to the mean level and to estimates of treatment effects.

Finally, examples are shown of different scales for usual production traits within animal production.
Link to full-screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/VariansHetero.f.pdf
Variance homogeneity
• Variance homogeneity
• Transformation as a solution
• Effect of back-transformation.
Variance homogeneity
Y_ij = µ + α_i + ε_ij

ε_ij ∼ N(0, σ²)

Variance homogeneity is implied by the missing suffix on σ².
Variance homogeneity
Herd no.   Herd type   Observations   Herd average
A          1           10             12.3
B          2           10             13.6
C          1           10             10.2
D          2           10             15.0
Variance homogeneity
Herd no.   Herd type   Observations   Herd average
A          1           100            12.3
B          2           100            13.6
C          1           1              10.2
D          2           1              15.0

Weigh the herd averages according to the precision of the measurements.
Variance of an average
Ȳ = (1/n_obs) Σ_i Y_i

V(Ȳ) = σ²_Y / n_obs

The magnitude of the variance inhomogeneity can be assessed by using this as an analogue: a herd average based on 100 observations has variance σ²_Y/100, while one based on a single observation has variance σ²_Y.
Example
A certain ingredient is added to the feed ration in the concentration x, x ∈ {0.0, 4.4, 6.2, 9.3}. The pigs are fed with the rations until 60 kg. Biopsies are made at 60 kg. The concentration of the feed ingredient in the biopsy is measured. Let y_i denote the concentration of the ingredient in animal i.
Mean curve
[Figure: fitted mean curves of y, muscle conc. at 60 kg, against x, feed contents]
Transformation ?
[Figure: variance against mean (left); log(variance) against log(mean) (right)]
Model of expectations
E(y) = µ + α_i

E(√y) = µ + α_i ⇒ E(y) = µ² + α_i² + 2µα_i

E(log(y)) = µ + α_i ⇒ E(y) = exp(µ) exp(α_i)
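For the log model in particular, a difference on the transformed scale back-transforms to a ratio:

E(log(y_1)) − E(log(y_2)) = α_1 − α_2 ⇒ exp(µ) exp(α_1) / (exp(µ) exp(α_2)) = exp(α_1 − α_2)

so on the original scale the treatment effect is multiplicative rather than additive.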
Curve fitting

E(y) = µ + β_1 x + β_2 x²

E(√y) = µ + β_1 x + β_2 x²

E(log(y)) = µ + β_1 x + β_2 x²
Model comparison
Dependent variable          y                      √y
Parameter         Estimate   P-value     Estimate   P-value
β_1               0.438      0.081***    0.242      0.026***
β_2               -0.007     0.008       -0.010     0.003**
Sqrt transformed
[Figure: sqrt(y), muscle conc. at 60 kg, against x, feed contents (left); y against x (right)]
Comparisons
[Figure: two panels comparing fitted curves of y, muscle conc. at 60 kg, against x, feed contents]
Treatment differences
Very often we are interested in estimating treatment differences, α_1 − α_2. In SAS we may use the PDIFF option in LSMEANS, or ESTIMATE. How do we transform?

[Figure: sqrt(y), muscle conc. at 60 kg, against x, feed contents]
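On the square-root scale the differences themselves are obtained directly; a sketch (the data set and variable names are hypothetical):

proc mixed data=pigs;
  class conc;
  model sqrty = conc;
  lsmeans conc / pdiff cl;                   /* all pairwise differences */
  estimate 'conc 0.0 vs 4.4' conc 1 -1 0 0;
run;

Squaring an LSMEAN back-transforms a mean to the original scale, but the difference of two squared LSMEANs is not the square of the difference; this is exactly the problem raised above.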
Conclusion
• Transformations may achieve variance homogeneity

• Transformations change the model of the mean

• Back transformation of expected values is OK

• Back transformation of general estimable functions may cause problems
Natural scales ?
• Geometric cell-count
• Daily gain vs Age at slaughter
• Feed utilisation FU/Gain vs. Gain/FU
• Calvings per cow year vs. Calving interval.
• Feeding interval vs. Feeding frequency
21 Variance Homogeneity: Diurnal Variation
The purpose of this lecture was to illustrate the application and combination of some of the advanced topics presented during the course.

A data set consisting of half-hourly observations of cortisol release in pigs was analysed using a random regression model to capture the individual differences between pigs in diurnal variation. The power-of-mean approach was used to model the variance heterogeneity.

The application of such a model requires iterative use of PROC MIXED.

The experience with the model was that it was possible to estimate the model parameters, but that it was necessary to 'nudge' the procedure to secure convergence of the iterative calculations, and that the calculations were very time-consuming. At the current state of the art the application of such models is not a routine matter.
Link to full-screen presentation1
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/PowerOfMean.f.pdf
Example

In an experiment pigs were assigned to two different treatments in order to study the effect of the treatment on the diurnal release of cortisol. Cortisol was sampled continuously over a period of approximately 24 hours for each animal.

Y_ijk = µ + α_i + A_ij + (β_1 + B_1j) cos(2πt_ijk/24) + (β_2 + B_2j) sin(2πt_ijk/24) + ε_ijk

where Y_ijk is the logarithmically transformed plasma cortisol, µ the general mean, α_i the effect of treatment, and A_ij the random effect of animal j within treatment i.
cos(2πt_ijk/24) and sin(2πt_ijk/24) are covariates for estimation of the diurnal variation. β_k and B_kj are the corresponding regression parameters, β_k a systematic effect and B_kj a random deviation from the line. The random effects (A_ij, B_1j, B_2j)ᵀ ∼ N_3(0, V), where V is a 3 × 3 variance matrix, and ε_ijk ∼ N(0, σ²).
Random regression model
The model is a random regression model and can be estimated using the following SAS statements
*Initial model ;
data a ;
....
PI=3.141593 ;
sint=sin(time*2*pi/24) ;
cost=cos(time*2*pi/24) ;
proc mixed CL data=a ;
class beh dyr ;
model Logcort = beh sint cost /ddfm=satterth ;
random intercept sint cost / subject=dyr*kuld*beh type=un ;
Example results
[Figure: log(Cortisol) against time (hours) with fitted diurnal curves for animals 17111, 31111 and 35111]
Model of Mean ?
exp(Xβ) = exp(µ + α_i + A_ij + (β_1 + B_1j) cos(2πt_ijk/24) + (β_2 + B_2j) sin(2πt_ijk/24))
Modelling variance inhomogeneity

The logarithmic transform of cortisol was used because the variance increased with the mean. Another approach is to model this increase directly.

Using the so-called power of mean method, we use the measured cortisol level directly, but instead of homogeneous variance we assume

ε_ijk ∼ N(0, σ²_n |Xβ|^δ)

and estimate σ²_n and δ.

In order to do this it is necessary to perform the calculations with PROC MIXED iteratively.
SAS Model
*Initial model ;
proc mixed CL data=a ;
class kuld beh dyr ;
model cortisol = beh sint cost /ddfm=satterth s;
random intercept sint cost / subject=dyr*kuld*beh type=un ;
repeated / subject=dyr*kuld*beh local ;
ods output SolutionF=sf ;
ods output Covparms=cp ;
run;
* Loop ;
proc mixed CL data=a maxiTER=100 CONVH=1e-8;
class kuld beh dyr ;
model cortisol = beh sint cost /ddfm=satterth s;
random intercept sint cost /
subject=dyr*kuld*beh type=un s ;
repeated /local=pom(sf) ;
parms /pdata=cp ;
ods output SolutionF=sf1 ;
ods output SolutionR=Coeff ;
ods output Covparms=cp1 ;
run ;
proc compare brief data=sf compare=sf1 ;
var estimate ;
run;
data sf ; set sf1 ;
data cp ; set cp1 ;
run;
Experience

• δ was estimated as 3.10, indicating that the logarithmic transformation may not be sufficient to obtain variance homogeneity (the power family instead suggests y^(−1/2))

• Estimation of a single model runs much longer with POM

• It was necessary to adjust the convergence criteria to obtain convergence

• Approx. 10 iterations were needed.
22 Links to supplementary material
In order to illustrate the underlying principles in linear algebra it was necessary to introduce a method for performing the calculations. For that purpose the IML procedure of SAS was introduced, using the small program in ImlExample.sas1

Several SAS macros were introduced for performing standard calculations, e.g., a SAS macro for calculation of autocorrelations2. The biometry research unit has further SAS macros and examples on this web-page3.

The book used for the course, LMSW (Littell et al., 1996), contains a series of program examples. These examples may be downloaded from SAS Institute's home page, but can be found here4 as well. Another important link is the SAS online manual5.

Finally, most of the course participants used Word for text processing and SAS for making graphs. Getting these two programs to interact satisfactorily was clearly a problem. Therefore a short note, Eksport af grafer fra SAS til Word6, was made, and references were made to SAS tech. report ts252x7, where the export facilities are discussed in detail.
1http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/ImlExample.sas2http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SAS/autocorr.sas3 http://www.jbs.agrsci.dk/Biometri/SASmateriale/SASmateriale.html4http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SAS/sasmixed.sas5http://dokumentation.agrsci.dk/sasdocv8/sasdoc/sashtml/onldoc.htm6http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/SAS2Word.pdf7http://www.jbs.agrsci.dk/biometri/Courses/HSVmixed2001/ts252x.pdf
Bibliography
Littell, R.C., G.A. Milliken, W.W. Stroup, & R.D. Wolfinger (1996). SAS System for Mixed Models. SAS Institute, Inc., Cary, NC.