Case study - DTU Course Website 02429 (eNote-3.pdf)

eNote 3

Case study

Contents

3 Case study

3.1 Introduction

3.2 Initial explorative analysis

3.3 Test of overall effects/model reduction

3.4 Post hoc analysis and summarizing the results

3.4.1 Estimates of the variance parameters

3.4.2 Estimates of the fixed parameters

3.4.3 Comparisons of the fixed parameters

3.5 R-TUTORIAL: Creating report ready tables and figures

3.5.1 Plot devices

3.5.2 Plotting with colours

3.5.3 Report ready tables with xtable

3.6 R-TUTORIAL: Initial explorative analysis

3.7 Test of overall effects/model reduction

3.8 R-TUTORIAL: Post hoc analysis and summarizing the results

3.9 Exercises

3.1 Introduction

This module consists of the first part of a complete analysis of the beech wood data presented as an example in module 2. The aim is to show that the principles of analyzing data and summarizing results in the case of fixed ANOVA and/or regression models also apply to mixed models. Maybe some readers will find it helpful to have some of these principles reviewed.

For completeness, we repeat the description and initial factor structure considerations. To investigate the effect of drying of beech wood on the humidity percentage, the following experiment was conducted. Each of 20 planks was dried for a certain period of time. Then, the humidity percentage was measured in 5 depths and 3 widths for each plank:

depth 1: close to the top
depth 3: between 1 and 5
depth 5: at the center
depth 7: between 5 and 9
depth 9: close to the bottom

width 1: close to the side
width 2: between 1 and 3
width 3: at the center

As a result, there are 3 · 5 = 15 measurements for each plank, and altogether 300 observations. The data may be found in the file planks.txt, and the data set is reproduced in the following table.

Plank  Width 1, depth        Width 2, depth        Width 3, depth
        1   3   5   7   9     1   3   5   7   9     1   3   5   7   9
    1  3.4 4.9 5.0 4.9 4.0   4.1 4.7 5.2 4.6 4.3   4.4 4.8 5.0 4.9 4.2
    2  4.3 5.5 6.2 5.4 4.7   3.9 5.6 5.7 5.5 4.9   4.0 4.7 4.5 3.9 4.0
    3  4.2 5.5 5.6 6.3 4.5   5.4 6.2 6.1 6.4 5.2   4.5 4.9 4.9 4.9 4.4
    4  4.4 6.0 7.1 6.9 4.6   4.6 6.1 6.6 6.5 4.7   4.9 5.9 5.8 6.4 4.7
    5  3.9 4.7 5.2 5.0 3.7   4.2 5.2 5.4 4.8 3.9   4.0 4.4 4.4 4.1 3.5
    6  4.6 5.9 6.3 5.8 4.8   5.9 7.3 6.9 6.9 4.4   5.2 5.7 6.6 6.0 4.0
    7  3.9 5.6 6.0 5.3 5.0   4.9 6.9 7.1 6.1 4.5   4.3 5.4 5.9 5.5 4.2
    8  3.9 4.5 5.3 5.6 4.7   3.7 4.9 4.8 4.9 4.3   3.8 4.5 5.4 4.8 4.0
    9  3.6 4.1 4.0 4.4 3.7   3.8 5.1 5.0 4.6 3.3   3.0 3.9 4.7 4.9 3.8
   10  6.5 8.7 9.5 7.9 6.6   6.9 8.9 7.4 7.0 6.9   5.8 7.5 7.7 7.3 5.9
   11  3.7 5.2 5.5 5.9 4.4   4.7 5.8 5.7 4.9 4.2   3.7 5.0 6.3 5.2 4.3
   12  4.3 5.8 6.2 5.2 4.4   4.8 6.7 7.0 6.1 5.2   5.1 5.7 5.9 6.4 5.1
   13  6.5 8.8 9.1 8.9 6.0   5.9 7.5 8.4 7.9 5.7   4.0 4.2 4.9 4.6 3.5
   14  4.4 6.2 6.7 6.4 4.3   5.7 7.0 7.4 7.3 5.5   4.6 6.2 6.8 5.8 4.9
   15  5.5 7.1 7.5 6.9 5.4   6.4 8.4 8.9 8.1 6.1   6.5 8.4 9.1 9.2 7.5
   16  5.2 6.0 6.2 6.6 5.3   6.6 7.6 7.8 7.7 5.8   5.9 6.7 6.7 5.0 3.9
   17  3.7 4.5 5.0 4.5 3.7   3.7 4.4 4.8 4.4 4.3   3.7 4.5 4.7 5.3 3.9
   18  6.0 7.4 7.8 7.5 5.7   6.9 8.6 8.8 7.5 5.4   5.1 6.1 5.2 5.4 4.7
   19  3.8 4.6 4.8 4.4 3.8   3.7 4.7 4.7 4.3 3.7   3.3 3.5 3.7 3.4 3.2
   20  6.1 7.4 7.7 6.7 4.6   4.7 6.3 7.1 6.5 5.1   4.7 6.0 6.0 6.3 4.2

In this experiment, there are 3 factors apart from the trivial factors I and 0. Let us use the factor names plank, width, and depth. The factor plank has 20 levels, width has 3, and depth has 5 levels. For the ith measurement of humidity, $\text{plank}_i$ denotes the plank on which this measurement was performed. Correspondingly, $\text{width}_i$ and $\text{depth}_i$ denote the width and depth, respectively, of this ith measurement. It would be natural to also include the interaction between width and depth, corresponding to the product factor width × depth. In this case, the product factor has 15 levels.
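The factor structure just described can be checked with a small sketch. The data frame `design` below is a hypothetical reconstruction of the layout (it is not read from planks.txt); only the factor names and level counts are taken from the text.

```r
# Hypothetical reconstruction of the design; factor names as in the text
design <- expand.grid(depth = factor(c(1, 3, 5, 7, 9)),
                      width = factor(1:3),
                      plank = factor(1:20))

nrow(design)                                      # total number of observations
sapply(design, nlevels)                           # levels of depth, width, plank
nlevels(interaction(design$width, design$depth))  # levels of the product factor
table(design$plank)[1]                            # measurements on plank 1
```

The counts should reproduce the 300 observations, the 15 levels of the product factor, and the 15 measurements per plank mentioned above.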

A natural model would include plank as a block factor, with depth and width entering together with their interaction. If $Y_i$ denotes the humidity percentage corresponding to the ith measurement, the model with a fixed block effect can be written as:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + \delta(\text{plank}_i) + \varepsilon_i, \tag{3-1}$$

where $i = 1, \ldots, 300$, and where the $\varepsilon_i$'s are independent and normally distributed random variables with mean 0 and variance $\sigma^2$. Or, similarly:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_k + \varepsilon_{ijk}$$

where $Y_{ijk}$ is the kth measurement within the (i, j)th combination of the two factors, $i = 1, \ldots, 3$, $j = 1, \ldots, 5$, and $k = 1, \ldots, 20$. However, as pointed out in module 1, the block (plank) effect should rather be considered a random effect, leading to the mixed model:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + d(\text{plank}_i) + \varepsilon_i, \tag{3-2}$$

where $d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$, and where all $d(\text{plank}_i)$'s and $\varepsilon_i$'s are independent. This model corresponds to the factor structure diagram given in Figure 3.1.

[Figure 3.1: The factor structure diagram. Each factor is annotated with its degrees of freedom and number of levels: [I] 266, 300; depth × width 8, 15; [plank] 19, 20; width 2, 3; depth 4, 5; 0 1, 1.]
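To make the structure of the mixed model (3-2) concrete, the following sketch simulates one data set from it. All parameter values are made up for illustration and are not estimates from the beech wood data; the interaction term is set to zero, and the depth index 1 to 5 stands for the depths 1, 3, 5, 7, 9.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical parameter values, chosen for illustration only
mu          <- 5
alpha       <- c(0, 0.3, -0.4)          # width effects
beta        <- c(0, 1.2, 1.5, 1.2, 0)   # depth effects
sigma_plank <- 1.0                      # sd of the random plank effect
sigma       <- 0.6                      # residual sd

sim <- expand.grid(depth = 1:5, width = 1:3, plank = 1:20)
d   <- rnorm(20, mean = 0, sd = sigma_plank)   # one random effect per plank

# The same d[plank] is added to all 15 measurements on a plank
sim$humidity <- mu + alpha[sim$width] + beta[sim$depth] +
  d[sim$plank] + rnorm(nrow(sim), mean = 0, sd = sigma)

head(sim)
```

The shared term `d[sim$plank]` is what makes measurements from the same plank correlated, in contrast to the fixed block effect in (3-1).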

3.2 Initial explorative analysis

Now, it is time to do some initial plotting/explorative analysis of the data. Throughout this module, figures and results are presented without showing R code or raw R output. This can be seen as a standard for reports in this course. Typically, numerous figures not entering a final project report should be studied, since this phase is explorative, and final figures used to present the key results are chosen after the statistical analyses are completed.

The plotting of various average profiles is usually a helpful tool for data with several factors. In Figure 3.2, four such plots are presented. In the top left diagram, humidity patterns for each plank are illustrated across widths. The plot was created by plotting the average humidity (the average over the five observed depths for each width and plank) against the widths.

[Figure 3.2: Four average humidity profiles. All panels show the mean of humidity (vertical axis from 3 to 9). Top left: against width, one line per plank. Top right: against depth, one line per plank. Bottom left: against width, one line per depth. Bottom right: against depth, one line per width.]

It is immediately clear that there is extensive plank-to-plank variation in the level of humidity. The message about the width effect is less clear. To the top right, the corresponding plot for the depth effect is seen. Here, the message is much clearer: The humidity is high at the center (depth = 5), and low at the top (depth = 1) and at the bottom (depth = 9). As pointed out, this is the effect seen when the three widths are averaged. It could be that the depth effect is different for widths close to the side of the plank (width = 1) than for widths towards the center (width = 3). In other words, there could be a width × depth interaction effect, which we would not find in the plots above. Instead, similar plots are given in the bottom diagrams of Figure 3.2 for the widths and depths by averaging over the planks (that is, by plotting the 15 average values).

The depth structure from before is visible again. Also, it is seen that there is a clear shift in humidity level from width to width, and that the depth humidity pattern seems to be roughly the same for the three widths. However, there are some deviations from parallel patterns, and the uncertainties in the deviations from parallel patterns are not visible. A similar increasing-decreasing width pattern, that was not clearly visible from the top diagram, is now seen. This pattern seems to be roughly the same for all depths (with the same precautions as before), and the low humidity levels for the top and bottom depths are clearly seen. Note again that the two bottom plots contain the same information: had there been clear non-parallel patterns in one figure (an interaction effect), these would also have appeared in the other figure. The next step is to start the actual statistical analysis of the data.

3.3 Test of overall effects/model reduction

A statistical analysis of this kind is commonly carried out in several steps, starting with the basic model found from the factor structure considerations. This model usually contains every possible effect there may be in the data. However, it is of interest to simplify things into easily interpretable results, if possible. So, the idea is to remove non-significant “complex stuff” from the model before summarizing the results.

Carrying out the mixed model analysis corresponding to the model given by (3-2) gives the following ANOVA table of fixed effects:

Source of variation   Numerator df   Denominator df   F-statistic   P-value
depth                 4              266              78.26         <0.0001
width                 2              266              29.65         <0.0001
depth × width         8              266              1.08          0.3745

We see that the depth × width interaction effect is non-significant. Hence, we remove the interaction term and do the further analysis based on the model:

$$Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + d(\text{plank}_i) + \varepsilon_i, \tag{3-3}$$

where $d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$. This model is illustrated by the factor structure diagram in Figure 3.3.

[Figure 3.3: The factor structure diagram. Each factor is annotated with its degrees of freedom and number of levels: [I] 274, 300; [plank] 19, 20; width 2, 3; depth 4, 5; 0 1, 1.]

Note how the 8 degrees of freedom from the interaction effect have now been added to the error degrees of freedom. The table of fixed effects then becomes:

Source of variation   Numerator df   Denominator df   F-statistic   P-value
depth                 4              274              78.07         <0.0001
width                 2              274              29.57         <0.0001
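The denominator degrees of freedom in the two ANOVA tables can be retraced by simple bookkeeping. The sketch below assumes the balanced design described in the introduction; the variable names are chosen here for illustration.

```r
n_obs    <- 300                   # total number of observations
df_mean  <- 1                     # overall mean (the factor 0)
df_depth <- 5 - 1
df_width <- 3 - 1
df_inter <- df_depth * df_width   # depth x width interaction
df_plank <- 20 - 1

# Error degrees of freedom with and without the interaction term
df_error_full    <- n_obs - df_mean - df_depth - df_width - df_inter - df_plank
df_error_reduced <- df_error_full + df_inter
c(full = df_error_full, reduced = df_error_reduced)
```

The two values correspond to the denominator degrees of freedom 266 and 274 in the tables above.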

Note that the removal of the non-significant interaction effect only has minor effects on the conclusions regarding the depth and width effects: They are both extremely significant, confirming what we saw in our “exploration” above. Since there are no more non-significant fixed effects, the model given by (3-3) is the final model to use for summarizing the results.

3.4 Post hoc analysis and summarizing the results

3.4.1 Estimates of the variance parameters

The final model is given by (3-3), since the main effects of width as well as depth are clearly significant. Estimates of the two variance parameters are:

$$\hat\sigma^2_{\text{Plank}} = 0.9898^2, \qquad \hat\sigma^2 = 0.6362^2$$

Uncertainties of these estimates on the standard deviation scale, given as 95% profile likelihood confidence limits, are:

           2.5 %   97.5 %
Planks     0.72    1.37
Residual   0.58    0.69

The remaining part of this section on post hoc analysis and presentation of results illustrates how the information in factors can be summarized whenever the factor does not interact with any other factor.

3.4.2 Estimates of the fixed parameters

Estimates of the expected values (LS-means) for each level of depth, together with their uncertainties and 95% confidence intervals, are:

          Estimate   SE       Lower    Upper
Depth 1   4.7150     0.2361   4.2270   5.2030
Depth 3   5.9050     0.2361   5.4170   6.3930
Depth 5   6.1950     0.2361   5.7070   6.6830
Depth 7   5.8633     0.2361   5.3753   6.3514
Depth 9   4.6533     0.2361   4.1653   5.1414

and correspondingly, for each level of width:

          Estimate   SE       Lower    Upper
Width 1   5.5140     0.2303   5.0352   5.9928
Width 2   5.7860     0.2303   5.3072   6.2648
Width 3   5.0990     0.2303   4.6202   5.5778
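The standard errors in the two tables can be retraced from the variance estimates in section 3.4.1. The sketch assumes the balanced design: each depth LS-mean is an average over 60 observations (20 planks times 3 widths), each width LS-mean over 100 observations (20 planks times 5 depths), and both contain the average of the 20 random plank effects. The formula is the standard one for such a design, not something stated in the text above.

```r
sd_plank <- 0.9898   # estimated plank standard deviation (section 3.4.1)
sd_resid <- 0.6362   # estimated residual standard deviation (section 3.4.1)

# Variance of an LS-mean = plank variance / 20 + residual variance / n per level
se_depth <- sqrt(sd_plank^2 / 20 + sd_resid^2 / 60)
se_width <- sqrt(sd_plank^2 / 20 + sd_resid^2 / 100)
round(c(depth = se_depth, width = se_width), 4)
```

After rounding, the values agree with the tabulated 0.2361 and 0.2303. Note that the plank variance dominates both standard errors, which is why they are so similar.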

3.4.3 Comparisons of the fixed parameters

A commonly used method of post hoc analysis is to compare either specific pairs of depths (respectively widths) or all combinations within each factor. For the former, a standard t-test can be used, e.g.,

$$t = \frac{\hat\beta(1) - \hat\beta(2)}{SE\left(\hat\beta(1) - \hat\beta(2)\right)}$$

using the error degrees of freedom (274). Or, equivalently, expressed using a 95% confidence interval:

$$\hat\beta(1) - \hat\beta(2) \pm t_{.975,274}\, SE\left(\hat\beta(1) - \hat\beta(2)\right)$$

In this case, the estimates of the fixed effects are raw averages of the data based on the same number of observations for each level (60 for each depth), and the plank effects cancel in the difference because every depth is measured on every plank. The standard error of the difference between two depth levels is therefore given by

$$SE\left(\hat\beta(1) - \hat\beta(2)\right) = \sqrt{2}\sqrt{\hat\sigma^2/60}$$

This means that two depth levels are claimed significantly different if they differ by more than

$$t_{.975,274}\sqrt{2}\sqrt{\hat\sigma^2/60}$$

from each other. This is also called the 95% Least Significant Difference (LSD) value.
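As a numerical illustration, the standard error of a depth difference and the 95% LSD value can be computed from the residual standard deviation reported in section 3.4.1. This is a sketch using base R only; the variable names are chosen here for illustration.

```r
sd_resid <- 0.6362                              # residual sd estimate (section 3.4.1)

se_diff <- sqrt(2) * sqrt(sd_resid^2 / 60)      # SE of a difference of two depth means
lsd     <- qt(0.975, df = 274) * se_diff        # 95% least significant difference
round(c(SE = se_diff, LSD = lsd), 4)
```

The standard error rounds to 0.1162, the value that also appears in the table of pairwise depth comparisons later in this section.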

It would be tempting to do such tests for all combinations of levels within each factor. This is generally not an acceptable approach, though, since the probability of “significance-by-chance” becomes too large when many tests are performed simultaneously. This is called the “multiplicity problem”. With five depth levels, there are 5 × 4/2 = 10 possible depth pairs to compare. Comparing two specific levels (decided on before seeing the data) is not the same as comparing the smallest among five with the largest among five. In a case with no effects, one would always expect the two latter levels to be more different, by chance, than the former.

There are numerous solutions to handling this problem properly, if all comparisons are indeed made. All of them amount to requiring differences to be larger than required by the usual t-test to be claimed significant. One general idea, that can be used whenever numerous tests are performed simultaneously, is the Bonferroni correction: If k tests are performed simultaneously, then use level α/k in each test rather than α. For instance, if all depth levels are compared, standard pair-wise t-test output can be used, but employing the significance level 0.5% in each test rather than 5%: That is, only claiming those differences significant for which the usual P-value is less than 0.005. This method is known to be somewhat conservative, meaning that it may be too critical, or, in other words, it may miss some actual differences.
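The Bonferroni rule can be sketched in a few lines of base R. The raw P-values below are made-up numbers for illustration only; they are not results from the beech wood data.

```r
k     <- choose(5, 2)   # number of pairwise depth comparisons
alpha <- 0.05
alpha / k               # per-test significance level

# Equivalent view: multiply each raw P-value by k (capped at 1)
# and compare with alpha. Hypothetical raw P-values:
p_raw <- c(0.001, 0.004, 0.006, 0.03, 0.2)
p_adj <- p.adjust(p_raw, method = "bonferroni", n = k)
p_adj
p_adj < alpha           # which comparisons remain significant
```

Both views give the same decisions: a raw P-value below 0.005 corresponds to an adjusted P-value below 0.05.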

Another solution is to use another distribution than the t-distribution when comparisons are made. With the so-called Tukey-Kramer method, two depth levels would be claimed significantly different if they differ by more than

$$\nu_{.95,J,274}\sqrt{\hat\sigma^2/60}$$

from each other, where J is the number of groups to be compared and $\nu_{.95,J,274}$ is the 95% quantile of the so-called “studentized range” distribution with J groups and 274 degrees of freedom. This distribution takes into account that the two levels we compare in a single test come from altogether J groups. This distribution is, just like the t-distribution, tabulated or “available” on the computer. Note that when J = 2, the range is simply the absolute difference between the two means, so the studentized range distribution corresponds to the (two-sided) t-distribution,

$$\nu_{.95,2,274} = t_{.975,274}\sqrt{2}$$
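In R, quantiles of the studentized range distribution are available through `qtukey`. The function is parameterised by the cumulative probability of the range itself, so a 5% test uses `p = 0.95`; for two groups this matches √2 times the two-sided t quantile `qt(0.975, ...)`. The sketch below checks this relation and computes the Tukey-Kramer threshold for the five depth levels, using the residual standard deviation from section 3.4.1.

```r
sd_resid <- 0.6362   # residual sd estimate (section 3.4.1)
df_err   <- 274      # error degrees of freedom

# J = 2: studentized range quantile versus sqrt(2) times the t quantile
q2 <- qtukey(0.95, nmeans = 2, df = df_err)
c(studentized_range = q2, t_based = sqrt(2) * qt(0.975, df = df_err))

# J = 5: threshold for declaring two depth levels different
q5        <- qtukey(0.95, nmeans = 5, df = df_err)
threshold <- q5 * sqrt(sd_resid^2 / 60)
threshold
```

The threshold can be compared with the half-width of the Tukey-adjusted confidence intervals for the depth differences, i.e. the distance from Estimate to Lower or Upper in the table of Tukey-adjusted results.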

The Tukey-adjusted results are:

Depth difference   Parameter     Estimate   SE       Lower     Upper     P-value
1-3                β(1) − β(2)   -1.1900    0.1162   -1.5090   -0.8710   <0.0001
1-5                β(1) − β(3)   -1.4800    0.1162   -1.7990   -1.1610   <0.0001
1-7                β(1) − β(4)   -1.1483    0.1162   -1.4673   -0.8294   <0.0001
1-9                β(1) − β(5)   0.06167    0.1162   -0.2573   0.3806    0.9841
3-5                β(2) − β(3)   -0.2900    0.1162   -0.6090   0.02896   0.0943
3-7                β(2) − β(4)   0.04167    0.1162   -0.2773   0.3606    0.9964
3-9                β(2) − β(5)   1.2517     0.1162   0.9327    1.5706    <0.0001
5-7                β(3) − β(4)   0.3317     0.1162   0.01271   0.6506    0.0370
5-9                β(3) − β(5)   1.5417     0.1162   1.2227    1.8606    <0.0001
7-9                β(4) − β(5)   1.2100     0.1162   0.8910    1.5290    <0.0001

Note that since the P-values are “corrected”, that is, based on the more proper studentized range distribution, they can be used directly without any additional Bonferroni correction. Similarly, for the width effect:

Width difference   Parameter     Estimate   SE        Lower     Upper      P-value
1-2                α(1) − α(2)   -0.2720    0.08997   -0.4840   -0.05998   0.0077
1-3                α(1) − α(3)   0.4150     0.08997   0.2030    0.6270     <0.0001
2-3                α(2) − α(3)   0.6870     0.08997   0.4750    0.8990     <0.0001

Often, the key information from the table for each effect is summarized in a simple table in which the LS-means are ordered by size:

          Estimate
Depth 9   4.6533 a
Depth 1   4.7150 a
Depth 7   5.8633 b
Depth 3   5.9050 bc
Depth 5   6.1950 c

The letter subscripts express the 5% significance results of the 10 pair-wise comparisons:

• Two depths sharing a subscript are NOT significantly different.

• Two depths NOT sharing a subscript are significantly different.

Thus, the pattern already observed in Figure 3.2 can now be statistically confirmed: There is, clearly, lower humidity close to the top and the bottom (and no difference between top and bottom). Also, there is an indication that the center position has higher humidity than the “in-between” positions (between which no difference is seen), although only the difference between depth 5 and depth 7 is significant at the 5% level (P = 0.0370), whereas the difference between depth 5 and depth 3 is not (P = 0.0943).

For the width effect, the summary table becomes particularly simple, since all three differences are significant:

          Estimate
Width 3   5.0990 a
Width 1   5.5140 b
Width 2   5.7860 c

For these data, a figure of the raw data, like one of the bottom plots from Figure 3.2, together with a statement of the lack of significant width × depth interaction, and the two summary tables would probably suffice for most purposes. In later modules, we will see how additional plots of the model expectations/details will provide informative figures for interpretation.

Other types of post hoc analysis than the multiple comparison approach may be employed, especially when quantitative information about the factor levels is available. In this case, we know exactly the positions which correspond to the different widths and depths, and this could be used in the analysis. For instance, it could be investigated whether a quadratic function of the depths could be used to describe the humidity pattern. Apart from the nice direct functional interpretation of the dependence of humidity on depth, it could possibly provide more powerful tests for interaction effects. In fact, this would still be a “linear” model which could be handled by the lmer function from the lme4-package. We will return to such analyses in a later module. Non-linear models (using, e.g., exponentials, etc.) could also be an option in some cases, but then the model will no longer be a linear model, and additional theory and packages would be needed.

The summary approach above was based on the assumption of no interaction between width and depth, that is, the conclusions regarding widths hold for all the depths, and vice versa. Had there been a significant interaction, we would have to present, say, the depth effects for each of the three widths (and/or vice versa), since the significance tells us that these three conclusions will NOT be the same. In practice, we would proceed as above, BUT using the combined width × depth factor with 15 levels rather than for each of them separately. We will see examples of this later.

One important step is missing from the analysis in this section: An investigation of the validity of the model assumptions! We will return to this issue in module 6, where we will then finish the analysis of this data set on the humidity of beech wood planks.

3.5 R-TUTORIAL: Creating report ready tables and figures

Since reports without raw R-code or raw R-output are requested in this course (as well as more generally), it is useful to be able to apply some of the tools given in R to create nice tables (and figures) for LaTeX and/or Microsoft Word-based report writing.

3.5.1 Plot devices

First of all, there are different device functions for saving plots in various formats. For example, to save a plot as a pdf, write:

pdf("myplanksinteractionplot.pdf")

with(planks, interaction.plot(depth, width, humidity, col=2:4))

dev.off()

Note that dev.off() lets R know that no further graphics commands will follow. It turns off the graphics device and saves the figure to the designated file.

Or as a png (you choose the extension of the output file yourself, but it is clearly recommended to choose an extension that corresponds to the device function, here pdf or png):

png("myplanksinteractionplot.png")

with(planks, interaction.plot(depth, width, humidity, col=2:4))

dev.off()

Similarly, there are bmp, jpeg, and other device functions. Plots can also be exported directly from the “Plots”-window in RStudio.
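For report writing, it is often useful to control the size of the saved figure. The following self-contained sketch writes a small pdf to a temporary directory; the file name is hypothetical, and the plotted values are the depth LS-means from section 3.4.2 rounded to one decimal, so that the example does not depend on the planks data being loaded.

```r
outfile <- file.path(tempdir(), "example-profile.pdf")   # hypothetical file name

pdf(outfile, width = 6, height = 4)     # figure size in inches
plot(c(1, 3, 5, 7, 9), c(4.7, 5.9, 6.2, 5.9, 4.7), type = "b",
     xlab = "depth", ylab = "mean of humidity")
invisible(dev.off())                    # close the device and write the file

file.exists(outfile)
```

For the bitmap devices such as png, the width and height arguments are given in pixels by default rather than in inches.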

3.5.2 Plotting with colours

Colours can be specified in several different ways. Also, various plot functions may have various colour options for colouring different aspects of the plot. The simplest way to specify a colour is with a character string giving the colour name (e.g., "red"). A list of the possible colours can be obtained with the function colors (or its alias colours); write:

colors(distinct = FALSE)

to see all the possible choices. Have a look at this website to see what all these colours look like, or go to: the QuickR website.

Even more easily, you can use integers as colour codes. As a default, R uses a palette of 8 colours:

palette()

[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow"

[8] "gray"

which can then be referred to by the numbers 1-8. Beyond that, the numbers cycle modulo 8, meaning that using 9 gives "black" again.
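The cycling of integer colour codes can be illustrated as follows. The sketch only relies on the current palette, whatever its entries are; note that from R 4.0.0 onwards the default palette still has 8 entries, but several of the colours differ from the ones printed above.

```r
cols <- palette()    # the current palette
n    <- length(cols) # 8 for the default palette

idx <- 1:10
# Colour actually used for each integer code: codes wrap around after n
data.frame(code = idx, colour = cols[(idx - 1) %% n + 1])
```

Codes 9 and 10 map to the same colours as codes 1 and 2.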

There are a number of pre-defined palettes that can be used when more (and better) collections of colours are needed, e.g., via the functions rainbow and hsv. For example, write:

?heat.colors

which could then be used as follows (plots not shown):

par(mfrow=c(2,2))

with(planks, {
  interaction.plot(width, plank, humidity, legend=FALSE, col=heat.colors(20))
  interaction.plot(depth, plank, humidity, legend=FALSE, col=terrain.colors(20))
  interaction.plot(width, depth, humidity, col=topo.colors(5))
  interaction.plot(depth, width, humidity, col=cm.colors(3))
})

par(mfrow=c(1,1))

Or:

# The argument tells how many colours you want,

# e.g. rainbow(10) gives 10 different colours and rainbow(5) gives 5 colours

with(planks, interaction.plot(width, depth, humidity, col=rainbow(5)))

Or:

with(planks, interaction.plot(width, depth, humidity, col=hsv(1:5/5)))

3.5.3 Report ready tables with xtable

Nice tables can be produced using the xtable function from the xtable-package. An example:

library(xtable)

means <- as.matrix(with(planks, tapply(humidity, width, mean)))

xtable(means)

% latex table generated in R 3.5.1 by xtable 1.8-2 package

% Fri Sep 13 16:17:21 2019

\begin{table}[ht]

\centering

\begin{tabular}{rr}

\hline

& x \\

\hline

1 & 5.51 \\

2 & 5.79 \\

3 & 5.10 \\

\hline

\end{tabular}

\end{table}

When this tex-code is included in your tex-file it will appear in the report as in the following table.

      x
1  5.51
2  5.79
3  5.10

Note how the input to xtable was a matrix here. The function is prepared to recognize a number of different R-objects, see, e.g.:

methods(xtable)

[1] xtable.anova* xtable.aov*

[3] xtable.aovlist* xtable.coxph*

[5] xtable.data.frame* xtable.emmGrid*

[7] xtable.glm* xtable.gmsar*

[9] xtable.lagImpact* xtable.lm*

[11] xtable.matrix* xtable.prcomp*

[13] xtable.ref.grid* xtable.sarlm*

[15] xtable.sarlm.pred* xtable.spautolm*

[17] xtable.sphet* xtable.splm*

[19] xtable.stsls* xtable.summary.aov*

[21] xtable.summary.aovlist* xtable.summary.glm*

[23] xtable.summary.gmsar* xtable.summary.lm*

[25] xtable.summary.prcomp* xtable.summary.ref.grid*

[27] xtable.summary.sarlm* xtable.summary.spautolm*

[29] xtable.summary.sphet* xtable.summary.splm*

[31] xtable.summary.stsls* xtable.summary_emm*

[33] xtable.table* xtable.ts*

[35] xtable.zoo*

see ’?methods’ for accessing help and source code

For instance, ANOVA-tables will be recognized. So a LaTeX-user can then copy these tex-lines into the report’s .tex-document. Or, to integrate the R-code into the LaTeX-code, use the knitr R-package to create the pure tex-file from an .Rnw file, which is a kind of LaTeX-file with all the R-code integrated into it, with a lot of flexibility in controlling what will be shown/evaluated etc. in the output. This can be used for both raw code, results, tables, and figures.

A Microsoft Word user may also use xtable through the html-print-option:

print(xtable(means), type = "html")

<!-- html table generated in R 3.5.1 by xtable 1.8-2 package -->

<!-- Fri Sep 13 16:17:21 2019 -->

<table border=1>

<tr> <th> </th> <th> x </th> </tr>

<tr> <td align="right"> 1 </td> <td align="right"> 5.51 </td> </tr>

<tr> <td align="right"> 2 </td> <td align="right"> 5.79 </td> </tr>

<tr> <td align="right"> 3 </td> <td align="right"> 5.10 </td> </tr>

</table>

Then, the table may be printed directly into a file:

print(xtable(means), type = "html", file = "myhtmltable.html")

Open the file in a browser and copy-paste to Word.

3.6 R-TUTORIAL: Initial explorative analysis

The data set planks is imported as described in eNote 1. Assume that the data set is called planks in R.

The plots in Figure 3.2 are produced using the function interaction.plot(), which requires three arguments: first, the factor that is to be on the x-axis, then the factor that separates the data into distinct graphs, and finally the response variable. An optional parameter legend, which takes either FALSE or TRUE, specifies whether or not a legend should be added (relating the graphs to the factor levels).

The code that produced this figure was:

par(mar = c(3.5, 3.5, 1, 1),  # smaller margin on top and right
    mgp = c(2.4, 0.7, 0),     # position of axis labels, tick labels and axis
    las = 1)                  # horizontal tick labels

planks <- read.table("planks.txt", header = TRUE, sep = ",")

Ylim <- c(3, 9)

par(mfrow=c(2,2))

with(planks, {
  interaction.plot(width, plank, humidity, ylim=Ylim, legend=FALSE,
                   bty="n", col=2:11, xtick = TRUE)
  interaction.plot(depth, plank, humidity, ylim=Ylim, legend=FALSE,
                   bty="n", col=2:11, xtick = TRUE)
  interaction.plot(width, depth, humidity, ylim=Ylim,
                   bty="n", col=2:11, xtick = TRUE)
  interaction.plot(depth, width, humidity, ylim=Ylim,
                   bty="n", col=2:11, xtick = TRUE)
})

par(mfrow=c(1,1))

Notice that the with( ... ) function around the interaction.plot statements results in evaluation of the statements within a frame where the data set planks is available. This approach avoids the necessity of attaching data sets or referring to them repeatedly.

The function par is used to set a variety of graphical parameters (try typing ?par for details). The parameter mfrow is a vector of length two where the first component is the number of rows on the graphical device and the second component is the number of columns. To return to the default use par(mfrow=c(1, 1)).
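Instead of resetting each graphical parameter by hand, one can store the old settings and restore them afterwards, since par() returns the previous values of the parameters it changes. The sketch below uses a null pdf device and simple made-up curves so that it runs without the planks data and without opening a plot window.

```r
pdf(NULL)                                              # null device: no file is written
op <- par(mfrow = c(2, 2), mar = c(3.5, 3.5, 1, 1))    # op holds the old values
for (p in 1:4) plot(1:10, (1:10)^p, type = "l", xlab = "x", ylab = "y")
par(op)                                                # restore the old settings
restored <- par("mfrow")
invisible(dev.off())
restored
```

This way, both mfrow and mar are back at their previous values after the plots are drawn, whatever those values were.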

3.7 Test of overall effects/model reduction

In the previous section, we did not need to define the variables as factors in R to use interaction.plot, but, in the following, we do. Configure the three variables depth, plank, and width as factors:

planks$plank <- factor(planks$plank)

planks$depth <- factor(planks$depth)

planks$width <- factor(planks$width)

Analysis of models including random effects can be done using the lmer function from the R-package lme4. The general model with a fixed-effects structure consisting of the interaction between two factors and random effects assigned to the plank is specified as follows:

require(lme4)

model1 <- lmer(humidity ~ depth*width + (1|plank), data = planks)

Note that the fixed-effects structure is specified as either depth + width + depth:width or depth*width like here; they give the same model. The relevant tests of the fixed-effects structure are obtained by applying anova(model1) after making sure that the lmerTest-package is available:

require(lmerTest)

anova(model1)

Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)
depth       126.388 31.5970     4   266  78.259 < 2.2e-16 ***
width        23.939 11.9696     2   266  29.646 2.381e-12 ***
depth:width   3.501  0.4377     8   266   1.084    0.3745
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
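That depth*width and depth + width + depth:width describe the same fixed-effects structure can be checked without fitting anything, by asking R to expand the two formulas. This is a small base R sketch; the object names are arbitrary.

```r
f.star <- humidity ~ depth * width
f.long <- humidity ~ depth + width + depth:width

# terms() expands a formula; "term.labels" lists the resulting model terms
labels.star <- attr(terms(f.star), "term.labels")
labels.long <- attr(terms(f.long), "term.labels")

labels.star                          # "depth" "width" "depth:width"
identical(labels.star, labels.long)  # TRUE
```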

Loading lmerTest also loads lme4, so we could have just run require(lmerTest) from the beginning instead. The interaction is not significant (p = 0.37), and a reduced model can be formulated:

model2 <- lmer(humidity ~ depth + width + (1|plank), data = planks)

anova(model2)

Type III Analysis of Variance Table with Satterthwaite's method
       Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)
depth 126.388  31.597     4   274  78.068 < 2.2e-16 ***
width  23.939  11.970     2   274  29.574 2.348e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both factors are highly significant, and no further reduction is possible.
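The denominator degrees of freedom in the two tables can be reproduced by simple counting. The sketch below assumes a balanced design with 300 observations on 20 planks, as indicated by the factor structure diagram earlier in the eNote; in that situation the Satterthwaite degrees of freedom coincide with the classical residual degrees of freedom.

```r
n.obs   <- 300   # assumed total number of observations
n.plank <- 20    # assumed number of planks (absorbs the intercept and 19 plank df)

df.depth <- 5 - 1               # five depth levels
df.width <- 3 - 1               # three width levels
df.inter <- df.depth * df.width # 8 for the depth:width interaction

# Residual degrees of freedom with and without the interaction
den.model1 <- n.obs - n.plank - df.depth - df.width - df.inter  # 266
den.model2 <- n.obs - n.plank - df.depth - df.width             # 274
```

Dropping the interaction moves its 8 degrees of freedom into the residual, which is why DenDF increases from 266 to 274.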

3.8 R-TUTORIAL: Post hoc analysis and summarizing the results

Estimates of the variance parameters are found with

VarCorr(model2)

 Groups   Name        Std.Dev.
 plank    (Intercept) 0.98980
 Residual             0.63619

Note that the estimates are given on the standard-deviation scale, not the variance scale; square them to obtain the variance estimates.
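As a worked example, the variance estimates can be computed directly from the printed standard deviations. The sketch also derives the share of the total variance attributable to planks and checks that the F value for depth in the reduced model equals its mean square divided by the residual variance. The numbers are typed in from the output above, so results agree only up to rounding.

```r
sd.plank    <- 0.98980   # from VarCorr(model2)
sd.residual <- 0.63619

var.plank    <- sd.plank^2      # about 0.980
var.residual <- sd.residual^2   # about 0.405

# Share of the total variance that is due to differences between planks
plank.share <- var.plank / (var.plank + var.residual)   # about 0.71

# F value for depth in model2: mean square divided by residual variance
F.depth <- 31.597 / var.residual                        # about 78.07

round(c(var.plank, var.residual, plank.share, F.depth), 3)
```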

The so-called profile likelihood-based confidence intervals for the two variance parameters are found with:

m2prof <- profile(model2, which=1:2, signames=FALSE)

confint(m2prof)

                         2.5 %    97.5 %
sd_(Intercept)|plank 0.7202931 1.3719277
sigma                0.5806037 0.6852898

By default, the profile function profiles the likelihood for all model parameters, but since profiling is time-consuming, and since we are only interested in the profile likelihood confidence intervals for the two variance parameters, we set the which = 1:2 option.
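The intervals above are for the standard deviations. Because squaring is a monotone transformation for positive numbers, squaring the limits gives intervals for the variances. A small sketch, with the limits typed in from the output above:

```r
# Confidence limits on the standard-deviation scale (copied from confint output)
ci.sd <- rbind(plank    = c(0.7202931, 1.3719277),
               residual = c(0.5806037, 0.6852898))
colnames(ci.sd) <- c("2.5 %", "97.5 %")

# Squaring the limits moves the intervals to the variance scale
ci.var <- ci.sd^2
round(ci.var, 3)
```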


As in eNote 1, we can use emmeans to compute the estimated mean levels and their differences:

require(emmeans)

emmeans::emmeans(model2, ~ depth)

 depth emmean    SE   df lower.CL upper.CL
 1       4.71 0.236 23.3     4.23     5.20
 3       5.91 0.236 23.3     5.42     6.39
 5       6.20 0.236 23.3     5.71     6.68
 7       5.86 0.236 23.3     5.38     6.35
 9       4.65 0.236 23.3     4.17     5.14

Results are averaged over the levels of: width
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95

emmeans::emmeans(model2, pairwise ~ width)

$emmeans
 width emmean   SE   df lower.CL upper.CL
 1       5.51 0.23 21.1     5.04     5.99
 2       5.79 0.23 21.1     5.31     6.26
 3       5.10 0.23 21.1     4.62     5.58

Results are averaged over the levels of: depth
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95

$contrasts
 contrast estimate   SE  df t.ratio p.value
 1 - 2      -0.272 0.09 274  -3.023  0.0077
 1 - 3       0.415 0.09 274   4.613  <.0001
 2 - 3       0.687 0.09 274   7.636  <.0001

Results are averaged over the levels of: depth
P value adjustment: tukey method for comparing a family of 3 estimates

Observe that writing pairwise ~ generates all pairwise differences of the LS-means.
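To see what the Tukey adjustment does to a single comparison, the p-value for the contrast between width 1 and width 2 can be recomputed by hand from the printed t.ratio and degrees of freedom. This is a sketch under the assumption that the adjusted p-value is the upper tail of the studentized range distribution evaluated at sqrt(2) times the absolute t.ratio; the inputs are rounded values from the output, so agreement is approximate.

```r
t.ratio <- -3.023   # contrast 1 - 2, copied from the output
df      <- 274
k       <- 3        # number of width levels being compared

choose(k, 2)        # 3 pairwise comparisons in the family

# Unadjusted two-sided p-value from the t distribution
p.unadj <- 2 * pt(abs(t.ratio), df, lower.tail = FALSE)

# Tukey-adjusted p-value from the studentized range distribution
p.tukey <- ptukey(sqrt(2) * abs(t.ratio), nmeans = k, df = df,
                  lower.tail = FALSE)

c(unadjusted = p.unadj, tukey = p.tukey)
```

The adjusted p-value is larger than the unadjusted one, which reflects the correction for making three comparisons simultaneously.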


The multcomp package can also produce so-called compact letter displays, in which levels that share a letter are not significantly different from each other:

require(multcomp)

tuk2 <- glht(model2, linfct = mcp(depth = "Tukey"))

tuk.cld2 <- cld(tuk2)

tuk.cld2 # Display the CLD

   1    3    5    7    9
 "a" "bc"  "c"  "b"  "a"

# Plot the compact-letter-display:

old.par <- par(no.readonly=TRUE) # Save current graphics parameters

par(mai=c(1,1,1.25,1)) # Use sufficiently large upper margin

plot(tuk.cld2, col=2:6)

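[Figure: plot of the compact letter display. Horizontal axis: depth (levels 1, 3, 5, 7, 9); vertical axis: linear predictor (ticks from 3 to 8); the letters a, bc, c, b, a are printed above the five depth levels.]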


par(old.par) # reset graphics parameters
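Reading a compact letter display amounts to checking which levels share a letter. The following base R sketch does this for the letters printed above; shares.letter is a hypothetical helper written for this illustration and is not part of multcomp.

```r
# Letters copied from the compact letter display for depth
cld.letters <- c("1" = "a", "3" = "bc", "5" = "c", "7" = "b", "9" = "a")

# Hypothetical helper: TRUE if two letter strings have a letter in common
shares.letter <- function(x, y) {
  length(intersect(strsplit(x, "")[[1]], strsplit(y, "")[[1]])) > 0
}

# TRUE marks pairs of depth levels that are NOT significantly different
same.group <- outer(cld.letters, cld.letters, Vectorize(shares.letter))
same.group
```

For example, depths 1 and 9 share the letter a and are therefore not significantly different, while depths 5 and 7 have no letter in common and are.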

Finally, the lmerTest package has a ranova function, which produces an ANOVA-like table of χ²-tests of the random effects in a mixed model:

ranova(model2)

ANOVA-like table for random-effects: Single term deletions

Model:
humidity ~ depth + width + (1 | plank)
            npar  logLik    AIC    LRT Df Pr(>Chisq)
<none>         9 -331.91 681.82
(1 | plank)    8 -474.84 965.68 285.85  1  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
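The entries of this table are connected by simple arithmetic: the LRT statistic is twice the difference in log-likelihood between the two models, and the AIC is minus twice the log-likelihood plus twice the number of parameters. A sketch using the rounded values printed above (so the LRT comes out as 285.86 rather than 285.85):

```r
logLik.full    <- -331.91   # model with (1 | plank), 9 parameters
logLik.reduced <- -474.84   # model without it, 8 parameters

# Likelihood ratio statistic and its chi-square p-value on 1 df
lrt     <- 2 * (logLik.full - logLik.reduced)
p.value <- pchisq(lrt, df = 1, lower.tail = FALSE)

# AIC = -2 * logLik + 2 * npar
aic.full    <- -2 * logLik.full    + 2 * 9
aic.reduced <- -2 * logLik.reduced + 2 * 8

c(LRT = lrt, AIC.full = aic.full, AIC.reduced = aic.reduced)
```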

3.9 Exercises

Exercise 1 Colour of spinach

Spinach heated to 90 or 100 degrees Celsius was vacuum packed and stored for 0, 1, or 2 weeks before the packages were opened and chill-stored in normal atmosphere for 0, 1, or 2 days. Then, the colour was measured on a Hunter Lab. Two of the colour coordinates, a and b (measuring, respectively, something like red and yellow colour), were recorded, and are given in the data set below. The variable batch is a blocking variable referring to two batches of spinach. The data is available in the file spinage.txt and listed here:

Batch temp weeks days a b

A 90 0 0 -7.19 8.89

A 90 0 1 -7.17 9.11

A 90 0 2 -7.49 9.69

A 90 1 0 -7.43 9.97

A 90 1 1 -7.07 9.09


A 90 1 2 -7.16 9.19

A 90 2 0 -6.69 10.07

A 90 2 1 -6.80 9.13

A 90 2 2 -6.93 9.58

A 100 0 0 -7.54 9.09

A 100 0 1 -7.19 8.74

A 100 0 2 -7.11 8.63

A 100 1 0 -7.16 8.92

A 100 1 1 -7.23 8.89

A 100 1 2 -7.38 9.36

A 100 2 0 -5.28 10.41

A 100 2 1 -5.71 9.72

A 100 2 2 -7.35 10.10

B 90 0 0 -7.45 9.81

B 90 0 1 -7.53 9.52

B 90 0 2 -7.54 9.89

B 90 1 0 -6.88 9.35

B 90 1 1 -7.16 9.55

B 90 1 2 -6.56 8.91

B 90 2 0 -7.07 10.39

B 90 2 1 -6.13 9.52

B 90 2 2 -6.63 9.43

B 100 0 0 -7.45 9.23

B 100 0 1 -7.75 9.18

B 100 0 2 -7.58 9.32

B 100 1 0 -7.10 8.97

B 100 1 1 -7.06 9.16

B 100 1 2 -6.93 9.08

B 100 2 0 -7.17 10.34

B 100 2 1 -7.30 9.99

B 100 2 2 -6.64 9.31
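A possible way to get started is sketched below. It reads a few of the rows listed above from a text string, so that it runs on its own, and converts the design variables to factors. The assumption that spinage.txt has the same layout, including the header line, has not been verified against the actual file.

```r
# A few rows copied from the listing above (the full data has 36 rows)
spinach.txt <- "Batch temp weeks days a b
A 90 0 0 -7.19 8.89
A 90 0 1 -7.17 9.11
A 100 0 0 -7.54 9.09
B 90 0 0 -7.45 9.81"

spinach <- read.table(text = spinach.txt, header = TRUE)
# With the full file (assuming the same layout):
# spinach <- read.table("spinage.txt", header = TRUE)

# The design variables are read as numbers or text; convert them to factors
for (v in c("Batch", "temp", "weeks", "days")) {
  spinach[[v]] <- factor(spinach[[v]])
}

str(spinach)
```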

a) Write down all the factors relevant to the analysis, including their levels and mutual structure. For example, are they crossed or nested? Make the factor structure diagram.

b) Analyse the effect of the different factors on the two colour measurements, and summarize the significant effects (LS-means, etc.).


Exercise 2 Sensory evaluation of spinach

In the spinach experiment from Exercise 1, sensory evaluations were performed in addition to the colour measurements. The treatments were still the same, so the factors were heating temperature, original storage (weeks), storage after opening (days), and batch.

The products from each treatment combination, from each batch, were assessed by (some of) 7 assessors, who gave a score (between 0 and 15) for each of 6 different sensory properties (see the list further below).

There was one session for each combination of batch and weeks, and at each session the assessors evaluated the same 6 products (6 combinations of days and temperature). Note that not all assessors were present at all sessions.

The results, with one line per evaluation, are given in the order: weeks of storage, days after opening, batch, temperature, session number, assessor number, and the six sensory properties hay flavour 1, hay flavour 2, hay taste, spinach flavour 1, spinach flavour 2, spinach taste.

The data is available in the file spinagesens.txt and partly listed below:

0 0 A 90 1 1 4.1 3.6 4.6 3.9 9.3 5

0 0 A 90 1 2 . . . . . .

0 0 A 90 1 3 . . . . . .

0 0 A 90 1 4 6 3.7 4.5 5.4 10.8 10.2

0 0 A 90 1 5 8.6 4.1 6.7 3.8 10 7.2

0 0 A 90 1 6 4.3 3.8 5.1 7.1 10.8 9.6

0 0 A 90 1 7 8.9 5.7 7 4.7 8.8 8.3

0 0 A 100 1 1 2.6 .8 6.2 2.7 8.7 6.3

0 0 A 100 1 2 . . . . . .

0 0 A 100 1 3 . . . . . .

0 0 A 100 1 4 6.1 2.5 4.6 6.4 11 11.3

0 0 A 100 1 5 5.9 6.5 5.5 8.7 8.4 7.2

0 0 A 100 1 6 3.8 2.8 3.7 4.9 10.7 8.9

0 0 A 100 1 7 10.4 4.3 7.1 3.3 7 8.6

0 0 B 90 4 1 3.5 4.3 6.7 4.1 9 10.6

0 0 B 90 4 2 . . . . . .


. . . . . . . . . (252 lines in total)

2 2 B 100 6 6 3.6 3.7 3.9 4.4 5.9 7.4

2 2 B 100 6 7 . . . . . .
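In this listing a single dot marks a missing score. The sketch below shows one way to read such lines so that the dots become NA; it uses three rows copied from the listing. The column names are hypothetical (the file is assumed to have no header line), and the same call is assumed, not verified, to work on spinagesens.txt.

```r
# Three rows copied from the listing above
sens.txt <- "0 0 A 90 1 1 4.1 3.6 4.6 3.9 9.3 5
0 0 A 90 1 2 . . . . . .
0 0 A 100 1 1 2.6 .8 6.2 2.7 8.7 6.3"

# Hypothetical column names, following the order described in the text
sens.names <- c("weeks", "days", "batch", "temp", "session", "assessor",
                "hayfl1", "hayfl2", "haytaste",
                "spinfl1", "spinfl2", "spintaste")

# na.strings = "." turns fields that consist of a single dot into NA;
# a value such as .8 is still read as the number 0.8
sens <- read.table(text = sens.txt, header = FALSE,
                   col.names = sens.names, na.strings = ".")

colSums(is.na(sens))   # number of missing values per column
```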

a) Write down the factors relevant to the analysis, including their levels and mutual structure. [You should include a production factor corresponding to the combinations of temperature, weeks, days, and batch.]

b) Make the factor structure diagram for one of the sensory variables including all the factors (quite big and complicated - do it if you feel it helps your understanding).

c) Specify which effects you want to include in the model. Pay particular attention to which interactions you want in the model. [Include at least some of the interactions between assessor and treatment factors]. Which effects are random and which are fixed?

d) Perform the analysis for one of the sensory properties and draw conclusions.