eNote 3
Case study
Contents
3 Case study
3.1 Introduction
3.2 Initial explorative analysis
3.3 Test of overall effects/model reduction
3.4 Post hoc analysis and summarizing the results
3.4.1 Estimates of the variance parameters
3.4.2 Estimates of the fixed parameters
3.4.3 Comparisons of the fixed parameters
3.5 R-TUTORIAL: Creating report ready tables and figures
3.5.1 Plot devices
3.5.2 Plotting with colours
3.5.3 Report ready tables with xtable
3.6 R-TUTORIAL: Initial explorative analysis
3.7 Test of overall effects/model reduction
3.8 R-TUTORIAL: Post hoc analysis and summarizing the results
3.9 Exercises
3.1 Introduction
This module consists of the first part of a complete analysis of the beech wood data presented as an example in module 2. The aim is to show that the principles of analyzing data and summarizing results in the case of fixed ANOVA and/or regression models also apply to mixed models. Some readers may also find it helpful to have some of these principles reviewed.
For completeness, we repeat the description and initial factor structure considerations. To investigate the effect of drying of beech wood on the humidity percentage, the following experiment was conducted. Each of 20 planks was dried for a certain period of time. Then, the humidity percentage was measured in 5 depths and 3 widths for each plank:
depth 1: close to the top
depth 3: between 1 and 5
depth 5: at the center
depth 7: between 5 and 9
depth 9: close to the bottom
width 1: close to the side
width 2: between 1 and 3
width 3: at the center
As a result, there are 3 · 5 = 15 measurements for each plank, and altogether 20 · 15 = 300 observations. The data may be found in the file planks.txt, and the data set is reproduced in the following table.
       Width 1                  Width 2                  Width 3
       Depth                    Depth                    Depth
Plank  1    3    5    7    9    1    3    5    7    9    1    3    5    7    9
1      3.4  4.9  5.0  4.9  4.0  4.1  4.7  5.2  4.6  4.3  4.4  4.8  5.0  4.9  4.2
2      4.3  5.5  6.2  5.4  4.7  3.9  5.6  5.7  5.5  4.9  4.0  4.7  4.5  3.9  4.0
3      4.2  5.5  5.6  6.3  4.5  5.4  6.2  6.1  6.4  5.2  4.5  4.9  4.9  4.9  4.4
4      4.4  6.0  7.1  6.9  4.6  4.6  6.1  6.6  6.5  4.7  4.9  5.9  5.8  6.4  4.7
5      3.9  4.7  5.2  5.0  3.7  4.2  5.2  5.4  4.8  3.9  4.0  4.4  4.4  4.1  3.5
6      4.6  5.9  6.3  5.8  4.8  5.9  7.3  6.9  6.9  4.4  5.2  5.7  6.6  6.0  4.0
7      3.9  5.6  6.0  5.3  5.0  4.9  6.9  7.1  6.1  4.5  4.3  5.4  5.9  5.5  4.2
8      3.9  4.5  5.3  5.6  4.7  3.7  4.9  4.8  4.9  4.3  3.8  4.5  5.4  4.8  4.0
9      3.6  4.1  4.0  4.4  3.7  3.8  5.1  5.0  4.6  3.3  3.0  3.9  4.7  4.9  3.8
10     6.5  8.7  9.5  7.9  6.6  6.9  8.9  7.4  7.0  6.9  5.8  7.5  7.7  7.3  5.9
11     3.7  5.2  5.5  5.9  4.4  4.7  5.8  5.7  4.9  4.2  3.7  5.0  6.3  5.2  4.3
12     4.3  5.8  6.2  5.2  4.4  4.8  6.7  7.0  6.1  5.2  5.1  5.7  5.9  6.4  5.1
13     6.5  8.8  9.1  8.9  6.0  5.9  7.5  8.4  7.9  5.7  4.0  4.2  4.9  4.6  3.5
14     4.4  6.2  6.7  6.4  4.3  5.7  7.0  7.4  7.3  5.5  4.6  6.2  6.8  5.8  4.9
15     5.5  7.1  7.5  6.9  5.4  6.4  8.4  8.9  8.1  6.1  6.5  8.4  9.1  9.2  7.5
16     5.2  6.0  6.2  6.6  5.3  6.6  7.6  7.8  7.7  5.8  5.9  6.7  6.7  5.0  3.9
17     3.7  4.5  5.0  4.5  3.7  3.7  4.4  4.8  4.4  4.3  3.7  4.5  4.7  5.3  3.9
18     6.0  7.4  7.8  7.5  5.7  6.9  8.6  8.8  7.5  5.4  5.1  6.1  5.2  5.4  4.7
19     3.8  4.6  4.8  4.4  3.8  3.7  4.7  4.7  4.3  3.7  3.3  3.5  3.7  3.4  3.2
20     6.1  7.4  7.7  6.7  4.6  4.7  6.3  7.1  6.5  5.1  4.7  6.0  6.0  6.3  4.2
In this experiment, there are 3 factors apart from the trivial factors [I] and 0. Let us use the factor names plank, width, and depth. The factor plank has 20 levels, width has 3, and depth has 5 levels. For the ith measurement of humidity, $\text{plank}_i$ denotes the plank on which this measurement was performed. Correspondingly, $\text{width}_i$ and $\text{depth}_i$ denote the width and depth, respectively, of this ith measurement. It would be natural to also include the interaction between width and depth, corresponding to the product factor width × depth. In this case, the product factor has 15 levels.
A natural model would include plank as a block factor, with depth and width entering together with their interaction. If $Y_i$ denotes the humidity percentage corresponding to the ith measurement, the model with a fixed block effect can be written as:
$$ Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + \delta(\text{plank}_i) + \varepsilon_i \tag{3-1} $$
where i = 1, ..., 300, and where the $\varepsilon_i$'s are independent and normally distributed random variables with mean 0 and variance $\sigma^2$. Or, similarly:
$$ Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_k + \varepsilon_{ijk} $$
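[I]: 266 degrees of freedom, 300 levels
depth × width: 8 degrees of freedom, 15 levels
[plank]: 19 degrees of freedom, 20 levels
width: 2 degrees of freedom, 3 levels
depth: 4 degrees of freedom, 5 levels
0: 1 degree of freedom, 1 level
Figure 3.1: The factor structure diagram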
where $Y_{ijk}$ is the kth measurement within the (i, j)th combination of the two factors, i = 1, ..., 3, j = 1, ..., 5, and k = 1, ..., 20. However, as pointed out in module 1, the block (plank) effect should rather be considered a random effect, leading to the mixed model:
$$ Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + d(\text{plank}_i) + \varepsilon_i \tag{3-2} $$
where $d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$, and where all $d(\text{plank}_i)$'s and $\varepsilon_i$'s are independent. This model corresponds to the factor structure diagram given in Figure 3.1.
3.2 Initial explorative analysis
Now, it is time to do some initial plotting/explorative analysis of the data. Throughout this module, figures and results are presented without showing R code or raw R output. This can be seen as a standard for reports in this course. Typically, numerous figures not entering a final project report should be studied, since this phase is explorative, and the final figures used to present the key results are chosen after the statistical analyses are completed.
The plotting of various average profiles is usually a helpful tool for data with several factors. In Figure 3.2, four such plots are presented. In the top left diagram, humidity patterns for each plank are illustrated across widths. The plot was created by plotting the average humidity (the average over the five observed depths for each width and plank) against the widths.
Figure 3.2: Four average humidity profiles (y-axis: mean of humidity). Top left: against width, one profile per plank. Top right: against depth, one profile per plank. Bottom left: against width, one profile per depth. Bottom right: against depth, one profile per width.
It is immediately clear that there is extensive plank-to-plank variation in the level of humidity. The message about the width effect is less clear. To the top right, the corresponding plot for the depth effect is seen. Here, the message is much clearer: The humidity is high at the center (depth = 5), and low at the top (depth = 1) and at the bottom (depth = 9). As pointed out, this is the effect seen when the three widths are averaged. It could be that the depth effect is different for widths close to the side of the plank (width = 1) than for widths towards the center (width = 3). In other words, there could be a width×depth interaction effect, which would not be visible in the plots above. To investigate this, similar plots are given in the bottom diagrams of Figure 3.2 for the widths and depths, obtained by averaging over the planks (that is, by plotting the 15 average values).
The depth structure from before is visible again. Also, it is seen that there is a clear shift in humidity level from width to width, and that the depth humidity pattern seems to be roughly the same for the three widths. However, there are some deviations from parallel patterns, and the plots do not show how uncertain these deviations are. A similar increasing-decreasing width pattern, which was not clearly visible from the top diagram, is now seen. This pattern seems to be roughly the same for all depths (with the same precautions as before), and the low humidity levels for the top and bottom depths are clearly seen. Note again that the two bottom plots contain the same information: had there been clear non-parallel patterns in one figure (an interaction effect), these would also have appeared in the other figure. The next step is to start the actual statistical analysis of the data.
3.3 Test of overall effects/model reduction
A statistical analysis of this kind is commonly carried out in several steps, starting with the basic model found from the factor structure considerations. This model usually contains every possible effect there may be in the data. However, it is of interest to simplify things into easily interpretable results, if possible. So, the idea is to remove non-significant “complex stuff” from the model before summarizing the results.
Carrying out the mixed model analysis corresponding to the model given by (3-2) gives the following ANOVA table of fixed effects:
Source of      Numerator degrees   Denominator degrees   F-          P-
variation      of freedom          of freedom            statistic   value
depth          4                   266                   78.26       <0.0001
width          2                   266                   29.65       <0.0001
depth×width    8                   266                   1.08        0.3745
We see that the depth×width interaction effect is non-significant. Hence, we remove the interaction term and base the further analysis on the model:
$$ Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + d(\text{plank}_i) + \varepsilon_i \tag{3-3} $$
where $d(\text{plank}_i) \sim N(0, \sigma^2_{\text{Plank}})$ and $\varepsilon_i \sim N(0, \sigma^2)$. This model is illustrated by the factor structure diagram in Figure 3.3.
Note how the 8 degrees of freedom from the interaction effect have now been added to the error degrees of freedom (266 + 8 = 274). The table of fixed effects then becomes:
[I]: 274 degrees of freedom, 300 levels
[plank]: 19 degrees of freedom, 20 levels
width: 2 degrees of freedom, 3 levels
depth: 4 degrees of freedom, 5 levels
0: 1 degree of freedom, 1 level
Figure 3.3: The factor structure diagram
Source of      Numerator degrees   Denominator degrees   F-          P-
variation      of freedom          of freedom            statistic   value
depth          4                   274                   78.07       <0.0001
width          2                   274                   29.57       <0.0001
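The denominator degrees of freedom in the two tables follow directly from the factor structure diagrams: the error df is the number of observations minus the df used by the fixed effects and by the plank factor. A small arithmetic sketch of this bookkeeping (the variable names are illustrative only):

```r
n_obs    <- 300
# df of the fixed-effect factors: the trivial factor 0 (intercept), depth, width, depth x width
df_fixed <- c(intercept = 1, depth = 4, width = 2, depth_width = 8)
df_plank <- 19  # 20 planks minus the df already contained in the intercept

df_error_full    <- n_obs - sum(df_fixed) - df_plank       # model (3-2)
df_error_reduced <- n_obs - sum(df_fixed[-4]) - df_plank   # model (3-3), interaction dropped
c(full = df_error_full, reduced = df_error_reduced)
```

The difference of 8 is exactly the interaction df that moves into the error term when the interaction is removed.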
Note that the removal of the non-significant interaction effect only has minor effects on the conclusions regarding the depth and width effects: They are both extremely significant, confirming what we saw in our “exploration” above. Since there are no more non-significant fixed effects, the model given by (3-3) is the final model to use for summarizing the results.
3.4 Post hoc analysis and summarizing the results
3.4.1 Estimates of the variance parameters
The final model is given by (3-3), since the main effects of width as well as depth are clearly significant. Estimates of the two variance parameters are:
$$ \hat\sigma^2_{\text{Plank}} = 0.9898^2 \approx 0.980, \qquad \hat\sigma^2 = 0.6362^2 \approx 0.405 $$
Uncertainties of these estimates on the standard deviation scale, given as 95% profile likelihood confidence limits, are:
           2.5 %   97.5 %
Planks     0.72    1.37
Residual   0.58    0.69
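Because the design is balanced, the two variance estimates can be cross-checked with base R alone: fitting plank as a fixed block in an ordinary ANOVA, the residual mean square estimates $\sigma^2$, and the plank mean square has expectation $\sigma^2 + 15\sigma^2_{\text{Plank}}$ (15 measurements per plank). For balanced data like these, such method-of-moments estimates coincide with the REML estimates reported above. The sketch below assumes the data table has been transcribed correctly into the vector; the object name planks_long is hypothetical, and the R tutorial itself reads the data from planks.txt.

```r
# Humidity values from the data table: one line per plank (1-20), columns ordered
# width 1 (depths 1,3,5,7,9), width 2 (depths 1,...,9), width 3 (depths 1,...,9)
humidity <- c(
  3.4, 4.9, 5.0, 4.9, 4.0, 4.1, 4.7, 5.2, 4.6, 4.3, 4.4, 4.8, 5.0, 4.9, 4.2,
  4.3, 5.5, 6.2, 5.4, 4.7, 3.9, 5.6, 5.7, 5.5, 4.9, 4.0, 4.7, 4.5, 3.9, 4.0,
  4.2, 5.5, 5.6, 6.3, 4.5, 5.4, 6.2, 6.1, 6.4, 5.2, 4.5, 4.9, 4.9, 4.9, 4.4,
  4.4, 6.0, 7.1, 6.9, 4.6, 4.6, 6.1, 6.6, 6.5, 4.7, 4.9, 5.9, 5.8, 6.4, 4.7,
  3.9, 4.7, 5.2, 5.0, 3.7, 4.2, 5.2, 5.4, 4.8, 3.9, 4.0, 4.4, 4.4, 4.1, 3.5,
  4.6, 5.9, 6.3, 5.8, 4.8, 5.9, 7.3, 6.9, 6.9, 4.4, 5.2, 5.7, 6.6, 6.0, 4.0,
  3.9, 5.6, 6.0, 5.3, 5.0, 4.9, 6.9, 7.1, 6.1, 4.5, 4.3, 5.4, 5.9, 5.5, 4.2,
  3.9, 4.5, 5.3, 5.6, 4.7, 3.7, 4.9, 4.8, 4.9, 4.3, 3.8, 4.5, 5.4, 4.8, 4.0,
  3.6, 4.1, 4.0, 4.4, 3.7, 3.8, 5.1, 5.0, 4.6, 3.3, 3.0, 3.9, 4.7, 4.9, 3.8,
  6.5, 8.7, 9.5, 7.9, 6.6, 6.9, 8.9, 7.4, 7.0, 6.9, 5.8, 7.5, 7.7, 7.3, 5.9,
  3.7, 5.2, 5.5, 5.9, 4.4, 4.7, 5.8, 5.7, 4.9, 4.2, 3.7, 5.0, 6.3, 5.2, 4.3,
  4.3, 5.8, 6.2, 5.2, 4.4, 4.8, 6.7, 7.0, 6.1, 5.2, 5.1, 5.7, 5.9, 6.4, 5.1,
  6.5, 8.8, 9.1, 8.9, 6.0, 5.9, 7.5, 8.4, 7.9, 5.7, 4.0, 4.2, 4.9, 4.6, 3.5,
  4.4, 6.2, 6.7, 6.4, 4.3, 5.7, 7.0, 7.4, 7.3, 5.5, 4.6, 6.2, 6.8, 5.8, 4.9,
  5.5, 7.1, 7.5, 6.9, 5.4, 6.4, 8.4, 8.9, 8.1, 6.1, 6.5, 8.4, 9.1, 9.2, 7.5,
  5.2, 6.0, 6.2, 6.6, 5.3, 6.6, 7.6, 7.8, 7.7, 5.8, 5.9, 6.7, 6.7, 5.0, 3.9,
  3.7, 4.5, 5.0, 4.5, 3.7, 3.7, 4.4, 4.8, 4.4, 4.3, 3.7, 4.5, 4.7, 5.3, 3.9,
  6.0, 7.4, 7.8, 7.5, 5.7, 6.9, 8.6, 8.8, 7.5, 5.4, 5.1, 6.1, 5.2, 5.4, 4.7,
  3.8, 4.6, 4.8, 4.4, 3.8, 3.7, 4.7, 4.7, 4.3, 3.7, 3.3, 3.5, 3.7, 3.4, 3.2,
  6.1, 7.4, 7.7, 6.7, 4.6, 4.7, 6.3, 7.1, 6.5, 5.1, 4.7, 6.0, 6.0, 6.3, 4.2
)
# Long format: one row per measurement, matching the row-wise order above
planks_long <- data.frame(
  plank    = factor(rep(1:20, each = 15)),
  width    = factor(rep(rep(1:3, each = 5), times = 20)),
  depth    = factor(rep(c(1, 3, 5, 7, 9), times = 60)),
  humidity = humidity
)

# Plank as a FIXED block: balanced design, so the sums of squares are orthogonal
tab <- anova(lm(humidity ~ plank + depth + width, data = planks_long))
mse      <- tab["Residuals", "Mean Sq"]                # estimates sigma^2
s2_plank <- (tab["plank", "Mean Sq"] - mse) / 15       # E(MS_plank) = sigma^2 + 15 sigma^2_plank

round(c(sd_plank = sqrt(s2_plank), sd_resid = sqrt(mse)), 4)
round(tab["depth", "F value"], 2)  # F-test for depth on 4 and 274 df
```

The resulting standard deviations should agree with 0.9898 and 0.6362 up to rounding, and the depth F-statistic with the value 78.07 in the table of fixed effects.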
The remaining part of this subsection on post hoc analysis and presentation of results illustrates how the information in factors can be summarized whenever the factor does not interact with any other factor.
3.4.2 Estimates of the fixed parameters
Estimates of the expected values (LS-means) for each level of depth, together with their uncertainties and 95% confidence intervals, are:
          Estimate   SE       Lower    Upper
Depth 1   4.7150     0.2361   4.2270   5.2030
Depth 3   5.9050     0.2361   5.4170   6.3930
Depth 5   6.1950     0.2361   5.7070   6.6830
Depth 7   5.8633     0.2361   5.3753   6.3514
Depth 9   4.6533     0.2361   4.1653   5.1414
and correspondingly, for each level of width:
          Estimate   SE       Lower    Upper
Width 1   5.5140     0.2303   5.0352   5.9928
Width 2   5.7860     0.2303   5.3072   6.2648
Width 3   5.0990     0.2303   4.6202   5.5778
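The standard errors in these two tables can be derived by hand from the variance estimates. Under model (3-3), a depth LS-mean averages 60 observations spread over all 20 planks, so its variance is $\sigma^2_{\text{Plank}}/20 + \sigma^2/60$; a width LS-mean averages 100 observations, giving $\sigma^2_{\text{Plank}}/20 + \sigma^2/100$. A minimal sketch; the df value 23.3 is copied from the emmeans output in the R tutorial (Kenward-Roger method) and is an input here, not something computed.

```r
s2_plank <- 0.9898^2  # plank variance estimate
s2       <- 0.6362^2  # residual variance estimate

se_depth <- sqrt(s2_plank / 20 + s2 / 60)   # 20 planks x 3 widths per depth level
se_width <- sqrt(s2_plank / 20 + s2 / 100)  # 20 planks x 5 depths per width level

# 95% confidence interval for the depth 1 LS-mean, using the Kenward-Roger df
ci_depth1 <- 4.7150 + c(-1, 1) * qt(0.975, df = 23.3) * se_depth
round(c(se_depth = se_depth, se_width = se_width), 4)
round(ci_depth1, 3)
```

The plank variance dominates these standard errors. This is why they are roughly twice as large as the standard errors of differences between levels, where the plank effects cancel.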
3.4.3 Comparisons of the fixed parameters
A commonly used method of post hoc analysis is to compare either specific pairs of depths (respectively widths) or all pairs of levels within each factor. For the former, a standard t-test can be used, e.g.,
$$ t = \frac{\hat\beta(1) - \hat\beta(2)}{\mathrm{SE}\big(\hat\beta(1) - \hat\beta(2)\big)} $$
using the error degrees of freedom (274). Or, equivalently, expressed using a 95% confidence interval:
$$ \hat\beta(1) - \hat\beta(2) \pm t_{0.975,274}\,\mathrm{SE}\big(\hat\beta(1) - \hat\beta(2)\big) $$
In this case, the estimates of the fixed effects are raw averages of the data based on the same number of observations for each level (60 per depth level: 20 planks × 3 widths). Since every plank contributes equally to each depth average, the plank effects cancel in the difference, and the standard error of the difference between two depth levels is given by
$$ \mathrm{SE}\big(\hat\beta(1) - \hat\beta(2)\big) = \sqrt{2}\sqrt{\hat\sigma^2/60} $$
This means that two depth levels are claimed significantly different if they differ by more than
$$ t_{0.975,274}\sqrt{2}\sqrt{\hat\sigma^2/60} $$
from each other. This is also called the 95% Least Significant Difference (LSD) value.
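As a worked example of these formulas, a short sketch plugging in the residual standard deviation 0.6362 and the depth LS-means from the table (the comparison of depths 1 and 3 is chosen only for illustration):

```r
sigma  <- 0.6362  # residual SD estimate
df_err <- 274     # error degrees of freedom of model (3-3)

se_diff <- sqrt(2) * sqrt(sigma^2 / 60)  # SE of a difference between two depth LS-means
lsd     <- qt(0.975, df_err) * se_diff   # 95% least significant difference

# Depth 1 versus depth 3, LS-means from the table above
d13  <- 4.7150 - 5.9050
t13  <- d13 / se_diff
p13  <- 2 * pt(-abs(t13), df_err)        # unadjusted two-sided P-value
ci13 <- d13 + c(-1, 1) * lsd             # 95% confidence interval for the difference
round(c(se_diff = se_diff, lsd = lsd, t = t13), 4)
```

The standard error agrees with the value 0.1162 in the Tukey tables further below; only the critical value differs between the unadjusted and the adjusted comparisons.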
It would be tempting to do such tests for all combinations of levels within each factor. This is generally not an acceptable approach, though, since the probability of “significance-by-chance” becomes too large when many tests are performed simultaneously. This is called the “multiplicity problem”. With five depth levels, there are 5 × 4/2 = 10 possible depth pairs to compare. Comparing two specific levels (decided on before seeing the data) is not the same as comparing the smallest among five with the largest among five. In a case with no effects, one would always expect the latter two levels to differ more, by chance, than the former two.
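The size of the multiplicity problem can be quantified with the studentized range distribution (the distribution used by the Tukey method described below), available in base R as ptukey. A sketch, assuming all five true depth means are equal: it computes the probability that the largest and smallest of the five LS-means differ by more than the unadjusted LSD, and compares it with the 5% risk for one prespecified pair.

```r
df_err <- 274
# The LSD criterion |difference| > t * sqrt(2) * sigma / sqrt(60), rewritten on the
# studentized-range scale (range of the means divided by sigma / sqrt(60))
q_lsd <- sqrt(2) * qt(0.975, df_err)

p_two  <- ptukey(q_lsd, nmeans = 2, df = df_err, lower.tail = FALSE)  # one prespecified pair
p_five <- ptukey(q_lsd, nmeans = 5, df = df_err, lower.tail = FALSE)  # largest vs smallest of 5
round(c(prespecified_pair = p_two, max_vs_min = p_five), 3)
```

The first probability is the nominal 5%, while the second is several times larger, which is the multiplicity problem in numbers.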
There are numerous solutions to handling this problem properly, if all comparisons are indeed made. All of them amount to requiring differences to be larger than required by the usual t-test to be claimed significant. One general idea, which can be used whenever numerous tests are performed simultaneously, is the Bonferroni correction: If k tests are performed simultaneously, then use level α/k in each test rather than α. For instance, if all depth levels are compared, standard pairwise t-test output can be used, but employing the significance level 0.5% (= 5%/10) in each test rather than 5%: That is, only claiming those differences significant for which the usual P-value is less than 0.005. This method is known to be somewhat conservative, meaning that it may be too critical, or, in other words: it may miss some actual differences.
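The Bonferroni correction can be expressed either as a larger critical difference or as adjusted P-values. A sketch for the 10 depth comparisons; the three raw P-values at the end are hypothetical and only illustrate base R's p.adjust, where n must be set to the total number of tests performed.

```r
k          <- choose(5, 2)  # 10 pairwise depth comparisons
alpha_each <- 0.05 / k      # 0.005 per comparison

se_diff   <- sqrt(2 * 0.6362^2 / 60)
lsd_plain <- qt(1 - 0.05 / 2, 274) * se_diff        # unadjusted 95% LSD
lsd_bonf  <- qt(1 - alpha_each / 2, 274) * se_diff  # Bonferroni-adjusted threshold

# Equivalent formulation: multiply raw P-values by k (capped at 1).
# Hypothetical raw P-values, for illustration only:
p_raw  <- c(0.001, 0.004, 0.020)
p_bonf <- p.adjust(p_raw, method = "bonferroni", n = k)
round(c(lsd_plain = lsd_plain, lsd_bonf = lsd_bonf), 4)
p_bonf
```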
Another solution is to use a distribution other than the t-distribution when comparisons are made. With the so-called Tukey-Kramer method, two depth levels would be claimed significantly different if they differ by more than
$$ \nu_{0.95,J,274}\sqrt{\hat\sigma^2/60} $$
from each other, where J is the number of groups to be compared and $\nu_{0.95,J,274}$ is the 95%-quantile of the so-called “studentized range” distribution with J groups and 274 degrees of freedom. (The studentized range is the difference between the largest and the smallest of J means, divided by their standard error; since it is non-negative, its 95%-quantile gives a two-sided test at the 5% level.) This distribution takes into account that the two levels we compare in a single test come from altogether J groups. Just like the t-distribution, it is tabulated and available in statistical software. Note that when J = 2, the studentized range distribution corresponds to the t-distribution:
$$ \nu_{0.95,2,274} = t_{0.975,274}\sqrt{2} $$
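The Tukey threshold can be computed with base R's qtukey. A sketch using the 95% quantile of the studentized range for J = 5 depths; the result, about 0.319, matches the half-width of the intervals in the Tukey table that follows (e.g., −1.1900 − (−1.5090) = 0.3190). The last line checks the J = 2 identity numerically.

```r
sigma  <- 0.6362
df_err <- 274

q_crit <- qtukey(0.95, nmeans = 5, df = df_err)  # studentized range quantile, J = 5 depths
hsd    <- q_crit * sqrt(sigma^2 / 60)            # Tukey threshold for depth differences

# J = 2 special case: the criterion reduces to the ordinary t-based LSD
q2 <- qtukey(0.95, nmeans = 2, df = df_err)
round(c(q_crit = q_crit, hsd = hsd, q2 = q2, t_sqrt2 = sqrt(2) * qt(0.975, df_err)), 4)
```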
The Tukey-adjusted results are:
Depth        Parameter       Estimate   SE       Lower     Upper     P-value
difference
1-3          β(1) − β(2)     -1.1900    0.1162   -1.5090   -0.8710   <0.0001
1-5          β(1) − β(3)     -1.4800    0.1162   -1.7990   -1.1610   <0.0001
1-7          β(1) − β(4)     -1.1483    0.1162   -1.4673   -0.8294   <0.0001
1-9          β(1) − β(5)     0.06167    0.1162   -0.2573   0.3806    0.9841
3-5          β(2) − β(3)     -0.2900    0.1162   -0.6090   0.02896   0.0943
3-7          β(2) − β(4)     0.04167    0.1162   -0.2773   0.3606    0.9964
3-9          β(2) − β(5)     1.2517     0.1162   0.9327    1.5706    <0.0001
5-7          β(3) − β(4)     0.3317     0.1162   0.01271   0.6506    0.0370
5-9          β(3) − β(5)     1.5417     0.1162   1.2227    1.8606    <0.0001
7-9          β(4) − β(5)     1.2100     0.1162   0.8910    1.5290    <0.0001
Note that since the p-values are “corrected”, that is, based on the more appropriate studentized range distribution, they can be used directly without any additional Bonferroni correction. Similarly, for the width effect:
Width        Parameter       Estimate   SE        Lower     Upper      P-value
difference
1-2          α(1) − α(2)     -0.2720    0.08997   -0.4840   -0.05998   0.0077
1-3          α(1) − α(3)     0.4150     0.08997   0.2030    0.6270     <0.0001
2-3          α(2) − α(3)     0.6870     0.08997   0.4750    0.8990     <0.0001
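The Tukey-adjusted P-values in these tables can be reproduced from the estimate and its standard error: rescale |t| by √2 to the studentized-range scale and take the upper tail with J equal to the number of levels compared. A sketch with a hypothetical helper tukey_p; the tables report 0.0943, 0.0370, and 0.0077 for the three cases below, and small deviations may arise from the rounded inputs.

```r
# Hypothetical helper: Tukey-adjusted P-value for one pairwise difference
tukey_p <- function(diff, se, J, df) {
  q <- sqrt(2) * abs(diff) / se  # |t| rescaled to the studentized range
  ptukey(q, nmeans = J, df = df, lower.tail = FALSE)
}

p_depth_3_5 <- tukey_p(-0.2900, 0.1162,  J = 5, df = 274)
p_depth_5_7 <- tukey_p( 0.3317, 0.1162,  J = 5, df = 274)
p_width_1_2 <- tukey_p(-0.2720, 0.08997, J = 3, df = 274)
round(c(p_depth_3_5, p_depth_5_7, p_width_1_2), 4)
```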
Often, the key information from the table for each effect is summarized in a simple table in which the LS-means are ordered by size:
          Estimate
Depth 9   4.6533 a
Depth 1   4.7150 a
Depth 7   5.8633 b
Depth 3   5.9050 bc
Depth 5   6.1950 c
The letter subscripts express the 5% significance results of the 10 pair-wise comparisons:
• Two depths sharing a subscript are NOT significantly different.
• Two depths NOT sharing a subscript are significantly different.
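The two rules can be checked mechanically against the Tukey-adjusted P-values of the depth table. A small base-R sketch; entries reported as <0.0001 are entered as 1e-4, an upper bound that does not change any 5% decision.

```r
cld   <- c("1" = "a", "3" = "bc", "5" = "c", "7" = "b", "9" = "a")
pairs <- rbind(c("1", "3"), c("1", "5"), c("1", "7"), c("1", "9"), c("3", "5"),
               c("3", "7"), c("3", "9"), c("5", "7"), c("5", "9"), c("7", "9"))
p_adj <- c(1e-4, 1e-4, 1e-4, 0.9841, 0.0943, 0.9964, 1e-4, 0.0370, 1e-4, 1e-4)

# TRUE if the two letter strings have at least one letter in common
share_letter <- function(x, y) any(strsplit(x, "")[[1]] %in% strsplit(y, "")[[1]])
shares <- apply(pairs, 1, function(pr) share_letter(cld[[pr[1]]], cld[[pr[2]]]))

# Sharing a letter must coincide with "not significant at 5%"
consistent <- all(shares == (p_adj >= 0.05))
consistent
```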
Thus, the pattern already observed in Figure 3.2 can now be statistically confirmed: There is clearly lower humidity close to the top and the bottom (and no difference between top and bottom). Also, there is an indication that the center position has higher humidity than the “in-between” positions (between which no difference is seen): depth 5 is significantly higher than depth 7, but not significantly different from depth 3.
For the width effect, the summary table becomes particularly simple, since all three differences are significant:
          Estimate
Width 3   5.0990 a
Width 1   5.5140 b
Width 2   5.7860 c
For these data, a figure of the raw data, like one of the bottom plots from Figure 3.2, together with a statement of the lack of significant width×depth interaction, and the two summary tables would probably suffice for most purposes. In later modules, we will see how additional plots of the model expectations/details will provide informative figures for interpretation.
Other types of post hoc analysis than the multiple comparison approach may be employed, especially when quantitative information about the factor levels is available. In this case, we know exactly the positions which correspond to the different widths and depths, and this could be used in the analysis. For instance, it could be investigated whether a quadratic function of the depths could be used to describe the humidity pattern. Apart from the nice direct functional interpretation of the dependence of humidity on depth, it could possibly provide more powerful tests for interaction effects. In fact, this would still be a “linear” model which could be handled by the lmer function from the lme4-package. We will return to such analyses in a later module. Non-linear models (using, e.g., exponentials, etc.) could also be an option in some cases, but then the model will no longer be a linear model, and additional theory and packages would be needed.
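As a rough first look at the quadratic idea, and not the mixed-model analysis referred to above, one can fit a quadratic in depth to the five depth LS-means with lm(). This sketch ignores the variance structure of the data entirely and only illustrates the shape of the pattern.

```r
depth  <- c(1, 3, 5, 7, 9)
lsmean <- c(4.7150, 5.9050, 6.1950, 5.8633, 4.6533)  # depth LS-means from the table

fit_quad <- lm(lsmean ~ depth + I(depth^2))
b <- coef(fit_quad)
peak_depth <- -b[["depth"]] / (2 * b[["I(depth^2)"]])  # vertex of the fitted parabola
round(b, 4)
round(peak_depth, 2)
```

The negative quadratic coefficient and a vertex close to depth 5 agree with the humidity peak at the center found above.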
The summary approach above was based on the assumption of no interaction between width and depth, that is, the conclusions regarding widths hold for all the depths, and vice versa. Had there been a significant interaction, we would have to present, say, the depth effects for each of the three widths (and/or vice versa), since the significance tells us that these three conclusions will NOT be the same. In practice, we would proceed as above, BUT using the combined width×depth factor with 15 levels rather than each factor separately. We will see examples of this later.
One important step is missing from the analysis in this section: An investigation of the validity of the model assumptions! We will return to this issue in module 6, where we will then finish the analysis of this data set on the humidity of beech wood planks.
3.5 R-TUTORIAL: Creating report ready tables and figures
Since reports without raw R code or raw R output are requested in this course (as well as more generally), it is useful to be able to apply some of the tools available in R to create nice tables (and figures) for LaTeX and/or Microsoft Word-based report writing.
3.5.1 Plot devices
First of all, there are different device functions for saving plots in various formats, e.g., to save a plot as a pdf, write:
pdf("myplanksinteractionplot.pdf")
with(planks, interaction.plot(depth, width, humidity, col=2:4))
dev.off()
Note that dev.off() lets R know that no further graphics commands will follow. It turns off the graphics device and saves the figure to the designated file.
Or as a png (you choose the extension of the output file yourself, but it is clearly recommended to choose an extension that corresponds to the device function, here pdf or png):
png("myplanksinteractionplot.png")
with(planks, interaction.plot(depth, width, humidity, col=2:4))
dev.off()
Similarly, there are bmp, jpeg, and other device functions. Plots can also be exported directly from the “Plots” window in RStudio.
3.5.2 Plotting with colours
Colours can be specified in several different ways. Also, various plot functions may have various colour options for colouring different aspects of the plot. The simplest way to specify a colour is with a character string giving the colour name (e.g., "red"). A list of the possible colours can be obtained with the function colors() (or its alias colours()); write:
colors(distinct = FALSE)
to see all the possible choices. Online overviews, such as the QuickR website, show what all these colours look like.
Even more easily, you can use integers as colour codes. By default, R uses a palette of 8 colours:
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow"
[8] "gray"
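which can then be referred to by the numbers 1-8. Larger numbers wrap around modulo 8, so using 9 gives "black" again. (This is the default palette in R versions before 4.0.0; newer versions use a different set of 8 default colours.)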
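The wrap-around can be made explicit with a small hypothetical helper that mimics how a positive integer colour code is looked up in the current palette (the colour code 0, which means the background colour, is not covered by this sketch):

```r
pal <- palette()  # the current palette, 8 colours by default

# Hypothetical helper: colour used for a positive integer code i (wraps around)
col_for <- function(i) pal[(i - 1) %% length(pal) + 1]
col_for(1:10)
```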
There are a number of pre-defined palettes that can be used when more (and better) collections of colours are needed, e.g., via the functions rainbow and hsv. For example, write:
?heat.colors
which could then be used as follows (plots not shown):
par(mfrow=c(2,2))
with(planks, {
  interaction.plot(width, plank, humidity, legend=FALSE, col=heat.colors(20))
  interaction.plot(depth, plank, humidity, legend=FALSE, col=terrain.colors(20))
  interaction.plot(width, depth, humidity, col=topo.colors(5))
  interaction.plot(depth, width, humidity, col=cm.colors(3))
})
par(mfrow=c(1,1))
Or:
# The argument to rainbow() is the number of colours wanted:
# e.g. rainbow(10) gives 10 different colours, rainbow(5) gives 5 colours
with(planks, interaction.plot(width, depth, humidity, col=rainbow(5)))
Or:
with(planks, interaction.plot(width, depth, humidity, col=hsv(1:5/5)))
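Behind the scenes, palette functions such as rainbow() and hsv() simply return character vectors of hexadecimal colour codes, which is why they can be passed to any col argument. A small sketch (no plot is produced):

```r
cols_rainbow <- rainbow(5)        # 5 hues spread around the colour wheel
cols_hsv     <- hsv(h = 1:5 / 5)  # hues 0.2, 0.4, ..., 1.0 at full saturation and value
print(cols_rainbow)
print(cols_hsv)
col2rgb("red")                    # named colour -> red/green/blue intensities (0-255)
```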
3.5.3 Report ready tables with xtable
Nice tables can be produced using the xtable function from the xtable package. An example:
library(xtable)
means <- as.matrix(with(planks, tapply(humidity, width, mean)))
xtable(means)
% latex table generated in R 3.5.1 by xtable 1.8-2 package
% Fri Sep 13 16:17:21 2019
\begin{table}[ht]
\centering
\begin{tabular}{rr}
\hline
& x \\
\hline
1 & 5.51 \\
2 & 5.79 \\
3 & 5.10 \\
\hline
\end{tabular}
\end{table}
When this tex-code is included in your tex-file, it will appear in the report as in the following table.
   x
1  5.51
2  5.79
3  5.10
Note how the input to xtable was a matrix here. The function is prepared to recognize a number of different R objects, see, e.g.:
methods(xtable)
[1] xtable.anova* xtable.aov*
[3] xtable.aovlist* xtable.coxph*
[5] xtable.data.frame* xtable.emmGrid*
[7] xtable.glm* xtable.gmsar*
[9] xtable.lagImpact* xtable.lm*
[11] xtable.matrix* xtable.prcomp*
[13] xtable.ref.grid* xtable.sarlm*
[15] xtable.sarlm.pred* xtable.spautolm*
[17] xtable.sphet* xtable.splm*
[19] xtable.stsls* xtable.summary.aov*
[21] xtable.summary.aovlist* xtable.summary.glm*
[23] xtable.summary.gmsar* xtable.summary.lm*
[25] xtable.summary.prcomp* xtable.summary.ref.grid*
[27] xtable.summary.sarlm* xtable.summary.spautolm*
[29] xtable.summary.sphet* xtable.summary.splm*
[31] xtable.summary.stsls* xtable.summary_emm*
[33] xtable.table* xtable.ts*
[35] xtable.zoo*
see ’?methods’ for accessing help and source code
For instance, ANOVA tables will be recognized. A LaTeX user can then copy these tex-lines into the report's .tex document. Or, to integrate the R code into the LaTeX code, use the knitr R package to create the pure tex-file from an .Rnw file, which is a kind of LaTeX file with all the R code integrated into it, with a lot of flexibility in controlling what will be shown/evaluated etc. in the output. This can be used for raw code, results, tables, and figures.
A Microsoft Word user may also use xtable through the html-print-option:
print(xtable(means), type = "html")
<!-- html table generated in R 3.5.1 by xtable 1.8-2 package -->
<!-- Fri Sep 13 16:17:21 2019 -->
<table border=1>
<tr> <th> </th> <th> x </th> </tr>
<tr> <td align="right"> 1 </td> <td align="right"> 5.51 </td> </tr>
<tr> <td align="right"> 2 </td> <td align="right"> 5.79 </td> </tr>
<tr> <td align="right"> 3 </td> <td align="right"> 5.10 </td> </tr>
</table>
Then, the table may be printed directly into a file:
print(xtable(means), type = "html", file = "myhtmltable.html")
Open the file in a browser and copy-paste to Word.
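If the xtable package is not available, a bare LaTeX tabular can also be assembled by hand. This is a minimal base-R sketch, not how xtable works internally; the width means are the values quoted in the text and would, with the data loaded, come from tapply(planks$humidity, planks$width, mean).

```r
# Width means as quoted in the text
means <- c("1" = 5.514, "2" = 5.786, "3" = 5.099)

rows <- sprintf("%s & %.2f \\\\", names(means), means)  # one LaTeX row per width level
tex  <- c("\\begin{tabular}{rr}", "\\hline", " & x \\\\", "\\hline",
          rows, "\\hline", "\\end{tabular}")
cat(tex, sep = "\n")  # paste the printed lines into the .tex document
```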
3.6 R-TUTORIAL: Initial explorative analysis
The data set planks is imported as described in eNote 1. Assume that the data set is called planks in R.
The plots in Figure 3.2 are produced using the function interaction.plot(), which requires three arguments: first, the factor that is to be on the x-axis, then the factor that separates the data into distinct graphs, and finally the response variable. An optional parameter legend, which takes either FALSE or TRUE, specifies whether or not a legend should be added (relating the graphs to the factor levels).
The code that produced this figure was:
par(mar = c(3.5, 3.5, 1, 1), # smaller margin on top and right
mgp = c(2.4,0.7,0), # position of axis labels, ticks labels and axis
las=1)
planks <- read.table("planks.txt", header = TRUE, sep = ",")
Ylim <- c(3, 9)
par(mfrow=c(2,2))
with(planks, {
  interaction.plot(width, plank, humidity, ylim=Ylim, legend=FALSE,
                   bty="n", col=2:11, xtick = TRUE)
  interaction.plot(depth, plank, humidity, ylim=Ylim, legend=FALSE,
                   bty="n", col=2:11, xtick = TRUE)
  interaction.plot(width, depth, humidity, ylim=Ylim,
                   bty="n", col=2:11, xtick = TRUE)
  interaction.plot(depth, width, humidity, ylim=Ylim,
                   bty="n", col=2:11, xtick = TRUE)
})
par(mfrow=c(1,1))
Notice that wrapping the interaction.plot statements in with(planks, { ... }) results in evaluation of the statements within an environment where the variables of the data set planks are available. This approach avoids the necessity of attaching data sets or referring to them repeatedly.
The function par is used to set a variety of graphical parameters (try typing ?par for details). The parameter mfrow is a vector of length two where the first component is the number of rows on the graphical device and the second component is the number of columns. To return to the default, use par(mfrow=c(1, 1)).
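Rather than resetting mfrow by hand, a common idiom is to keep the value that par() returns when a setting is changed, since it contains the previous settings, and pass it back to par() afterwards. A minimal sketch; pdf(NULL) opens an off-screen device only so that the example runs without a plot window.

```r
pdf(NULL)                        # off-screen device, so no plot window or file is needed
old_par <- par(mfrow = c(2, 2))  # par() returns the previous values of the changed settings
inside  <- par("mfrow")          # now a 2 x 2 layout
par(old_par)                     # restore whatever was set before
restored <- par("mfrow")
invisible(dev.off())
```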
3.7 Test of overall effects/model reduction
In the previous section, we did not need to define the variables as factors in R to use interaction.plot, but, in the following, we do. Configure the three variables depth, plank, and width as factors:
planks$plank <- factor(planks$plank)
planks$depth <- factor(planks$depth)
planks$width <- factor(planks$width)
Analysis of models including random effects can be done using the lmer function from the R package lme4. The general model with a fixed-effects structure consisting of the interaction between two factors and random effects assigned to the plank is specified as follows:
require(lme4)
model1 <- lmer(humidity ~ depth*width + (1|plank), data = planks)
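Note that the fixed-effects structure can be specified either as depth + width + depth:width or as depth*width, as here; they give the same model. The relevant tests of the fixed-effects structure are obtained by applying anova() to a model fitted after the lmerTest package has been loaded (lmerTest provides its own version of lmer, so a model fitted before loading it should be refitted to obtain these tests):
require(lmerTest)
# Refit with lmerTest's lmer so that anova() gives Satterthwaite F-tests
model1 <- lmer(humidity ~ depth*width + (1|plank), data = planks)
anova(model1)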
Type III Analysis of Variance Table with Satterthwaite’s method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
depth 126.388 31.5970 4 266 78.259 < 2.2e-16 ***
width 23.939 11.9696 2 266 29.646 2.381e-12 ***
depth:width 3.501 0.4377 8 266 1.084 0.3745
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
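As noted above, depth*width is shorthand for depth + width + depth:width. This can be verified with base R's terms(), and update() removes the interaction term in the same way as in the model reduction that follows. The sketch works on plain formulas; update() can also be applied directly to a fitted model such as model1.

```r
f_star <- humidity ~ depth * width
f_long <- humidity ~ depth + width + depth:width

lab_star <- attr(terms(f_star), "term.labels")
lab_long <- attr(terms(f_long), "term.labels")
f_reduced <- update(f_star, . ~ . - depth:width)  # drop the interaction term

lab_star
attr(terms(f_reduced), "term.labels")
```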
lmerTest automatically loads lme4, so we could have just run require(lmerTest) from the beginning instead. The interaction is not significant, and a reduced model can be formulated:
model2 <- lmer(humidity ~ depth + width + (1|plank), data = planks)
anova(model2)
Type III Analysis of Variance Table with Satterthwaite’s method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
depth 126.388 31.597 4 274 78.068 < 2.2e-16 ***
width 23.939 11.970 2 274 29.574 2.348e-12 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Both factors are highly significant, and no further reduction is possible.
3.8 R-TUTORIAL: Post hoc analysis and summarizing the results
Estimates of the variance parameters are found with
VarCorr(model2)
Groups Name Std.Dev.
plank (Intercept) 0.98980
Residual 0.63619
Note that the estimates are given on the standard-deviation scale, not the variance scale.
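Squaring the VarCorr output gives the estimates on the variance scale. Under the model, a single measurement has variance $\sigma^2_{\text{Plank}} + \sigma^2$, and two measurements on the same plank have covariance $\sigma^2_{\text{Plank}}$, so their correlation is the plank share of the total variance. A small sketch of this arithmetic:

```r
sd_plank <- 0.98980  # from VarCorr(model2)
sd_resid <- 0.63619

vars <- c(plank = sd_plank^2, residual = sd_resid^2)  # variance scale
# Correlation between two measurements on the same plank:
# sigma^2_plank / (sigma^2_plank + sigma^2)
icc <- vars[["plank"]] / sum(vars)
round(vars, 4)
round(icc, 3)
```

Roughly 70% of the variation in a single humidity measurement is thus plank-to-plank variation, in line with the exploratory plots.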
The so-called profile likelihood-based confidence intervals for the two variance parameters are found with:
m2prof <- profile(model2, which=1:2, signames=FALSE)
confint(m2prof)
2.5 % 97.5 %
sd_(Intercept)|plank 0.7202931 1.3719277
sigma 0.5806037 0.6852898
By default, the profile function profiles the likelihood for all model parameters, but since profiling is time-consuming, and since we are only interested in the profile likelihood confidence intervals for the two variance parameters, we set the which = 1:2 option.
As in eNote 1, we can use emmeans to compute the estimated mean levels and their differences:
require(emmeans)
emmeans::emmeans(model2, ~ depth)
depth emmean SE df lower.CL upper.CL
1 4.71 0.236 23.3 4.23 5.20
3 5.91 0.236 23.3 5.42 6.39
5 6.20 0.236 23.3 5.71 6.68
7 5.86 0.236 23.3 5.38 6.35
9 4.65 0.236 23.3 4.17 5.14
Results are averaged over the levels of: width
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
emmeans::emmeans(model2, pairwise ~ width)
$emmeans
width emmean SE df lower.CL upper.CL
1 5.51 0.23 21.1 5.04 5.99
2 5.79 0.23 21.1 5.31 6.26
3 5.10 0.23 21.1 4.62 5.58
Results are averaged over the levels of: depth
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
1 - 2 -0.272 0.09 274 -3.023 0.0077
1 - 3 0.415 0.09 274 4.613 <.0001
2 - 3 0.687 0.09 274 7.636 <.0001
Results are averaged over the levels of: depth
P value adjustment: tukey method for comparing a family of 3 estimates
Observe that writing pairwise ~ generates all pairwise differences of the LS-means.
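The pairs in the $contrasts part are all choose(J, 2) combinations of the J levels. They can be enumerated with base R's combn, which also gives the count of 10 depth pairs used in the multiplicity discussion:

```r
width_levels <- c("1", "2", "3")
pairs_w  <- combn(width_levels, 2)                     # each column is one pair
labels_w <- paste(pairs_w[1, ], pairs_w[2, ], sep = " - ")
n_depth_pairs <- ncol(combn(5, 2))                     # 5 depth levels

labels_w
n_depth_pairs
```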
The multcomp package also includes the so-called compact letter displays:
require(multcomp)
tuk2 <- glht(model2, linfct = mcp(depth = "Tukey"))
tuk.cld2 <- cld(tuk2)
tuk.cld2 # Display the CLD
1 3 5 7 9
"a" "bc" "c" "b" "a"
# Plot the compact-letter-display:
old.par <- par(no.readonly=TRUE) # Save current graphics parameters
par(mai=c(1,1,1.25,1)) # Use sufficiently large upper margin
plot(tuk.cld2, col=2:6)
[Plot: compact letter display for depth; y-axis: linear predictor, x-axis: depth (1, 3, 5, 7, 9), with the letters a, bc, c, b, a shown above the respective depths.]
par(old.par) # reset graphics parameters
Finally, the lmerTest package has a ranova function which produces an ANOVA-like table of likelihood ratio (χ²) tests of the random effects in a mixed model:
ranova(model2)
ANOVA-like table for random-effects: Single term deletions
Model:
humidity ~ depth + width + (1 | plank)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 9 -331.91 681.82
(1 | plank) 8 -474.84 965.68 285.85 1 < 2.2e-16 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
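The LRT column is twice the difference in log-likelihood between the two rows, referred to a χ² distribution with 1 degree of freedom. A sketch reproducing it from the printed values; the halved P-value reflects a commonly used refinement (the null value $\sigma^2_{\text{Plank}} = 0$ lies on the boundary of the parameter space, which makes the plain χ²₁ reference conservative) and is not what ranova reports.

```r
loglik_full    <- -331.91  # model2, with (1 | plank)
loglik_reduced <- -474.84  # model2 without the plank term

lrt     <- 2 * (loglik_full - loglik_reduced)
p_chisq <- pchisq(lrt, df = 1, lower.tail = FALSE)
# Boundary refinement: 50:50 mixture of chi^2_0 and chi^2_1, i.e. half the P-value
p_mixture <- p_chisq / 2
c(LRT = lrt, p = p_chisq, p_mixture = p_mixture)
```

Either way, the plank variance is overwhelmingly significant.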
3.9 Exercises
Exercise 1 Colour of spinach
Spinach heated to 90 or 100 degrees Celsius was vacuum packed and stored for 0, 1, or 2 weeks before the packages were opened and chill-stored in normal atmosphere for 0, 1, or 2 days. Then, the colour was measured on a Hunter Lab. Two of the colour coordinates, a and b (measuring, respectively, something like red and yellow colour), were recorded, and are given in the data set below. The variable batch is a blocking variable referring to two batches of spinach. The data are available in the file spinage.txt and listed here:
Batch temp weeks days a b
A 90 0 0 -7.19 8.89
A 90 0 1 -7.17 9.11
A 90 0 2 -7.49 9.69
A 90 1 0 -7.43 9.97
A 90 1 1 -7.07 9.09
A 90 1 2 -7.16 9.19
A 90 2 0 -6.69 10.07
A 90 2 1 -6.80 9.13
A 90 2 2 -6.93 9.58
A 100 0 0 -7.54 9.09
A 100 0 1 -7.19 8.74
A 100 0 2 -7.11 8.63
A 100 1 0 -7.16 8.92
A 100 1 1 -7.23 8.89
A 100 1 2 -7.38 9.36
A 100 2 0 -5.28 10.41
A 100 2 1 -5.71 9.72
A 100 2 2 -7.35 10.10
B 90 0 0 -7.45 9.81
B 90 0 1 -7.53 9.52
B 90 0 2 -7.54 9.89
B 90 1 0 -6.88 9.35
B 90 1 1 -7.16 9.55
B 90 1 2 -6.56 8.91
B 90 2 0 -7.07 10.39
B 90 2 1 -6.13 9.52
B 90 2 2 -6.63 9.43
B 100 0 0 -7.45 9.23
B 100 0 1 -7.75 9.18
B 100 0 2 -7.58 9.32
B 100 1 0 -7.10 8.97
B 100 1 1 -7.06 9.16
B 100 1 2 -6.93 9.08
B 100 2 0 -7.17 10.34
B 100 2 1 -7.30 9.99
B 100 2 2 -6.64 9.31
a) Write down all the factors relevant to the analysis, including their levels and mutual structure. For example, are they crossed or nested? Make the factor structure diagram.
b) Analyse the effect of the different factors on the two colour measurements, and summarize the significant effects (LS-means, etc.).
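As a starting point for question a), the base-R sketch below reconstructs the design from the description only and does not read the data file. It uses 2 batches, 2 temperatures, 3 storage periods in weeks and 3 periods in days, which gives 2 · 2 · 3 · 3 = 36 treatment combinations. That is one combination per row of the listing, so every combination occurs exactly once. The commented read.table call assumes that spinage.txt has the header row shown above.

```r
# Hypothetical reconstruction of the design in the listing: days varies fastest,
# then weeks, temp and Batch (expand.grid varies its first argument fastest)
design <- expand.grid(days = 0:2, weeks = 0:2, temp = c(90, 100), Batch = c("A", "B"))
design[] <- lapply(design, factor)
nrow(design)  # 2 * 2 * 3 * 3 = 36, one per row of the listing

# Fully crossed with one observation per cell: every count in the 4-way table is 1
all(table(design) == 1)

# With the actual data file (assumed to have the header row shown above):
# spin <- read.table("spinage.txt", header = TRUE)
# fac <- c("Batch", "temp", "weeks", "days")
# spin[fac] <- lapply(spin[fac], factor)
```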
Exercise 2 Sensory evaluation of spinach
In the spinach experiment from Exercise 1, sensory evaluations were performed in addition to the colour measurements. The treatments were the same, so the factors were heating temperature, original storage (weeks), storage after opening (days), and batch.
The products from each treatment combination, from each batch, were assessed by (some of) 7 assessors, who gave a score (between 0 and 15) for each of 6 different sensory properties (see the list further below).
There was one session for each combination of batch and weeks, and at each session the assessors evaluated the same 6 products (6 combinations of days and temperature). Note that not all assessors were present at all sessions.
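The session description accounts for the size of the data set. The 2 batches × 3 storage periods give 6 sessions, and each session has 6 products (3 days × 2 temperatures) and 7 possible assessors. That makes 6 · 6 · 7 = 252 rows, which matches the total length of the listing if absent assessors keep their row. The base-R sketch below builds such a hypothetical design grid, together with the production factor asked for in question a) and a session factor. The variable names are our own, and the session labels produced by interaction() differ from the session numbers used in the file.

```r
# Hypothetical full design grid (absent assessors included as rows)
design <- expand.grid(assessor = 1:7, temp = c(90, 100), days = 0:2,
                      batch = c("A", "B"), weeks = 0:2)
nrow(design)  # 7 * 2 * 3 * 2 * 3 = 252

# Production factor: one level per combination of temperature, weeks, days and batch
design$product <- interaction(design$temp, design$weeks, design$days, design$batch,
                              drop = TRUE)
nlevels(design$product)  # 36

# Session factor: one level per combination of batch and weeks
design$session <- interaction(design$batch, design$weeks, drop = TRUE)
nlevels(design$session)  # 6

# Each session contains 6 different products (days x temperature) ...
tapply(design$product, design$session, function(p) length(unique(p)))
# ... and each product is evaluated in exactly one session
table(tapply(design$session, design$product, function(s) length(unique(s))))
```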
The results, with one line per evaluation, are given in the order: weeks of storage, days after opening, batch, temperature, session number, assessor number, and the six sensory properties hay flavour 1, hay flavour 2, hay taste, spinach flavour 1, spinach flavour 2, spinach taste.
The data is available in the file spinagesens.txt and partly listed below:
0 0 A 90 1 1 4.1 3.6 4.6 3.9 9.3 5
0 0 A 90 1 2 . . . . . .
0 0 A 90 1 3 . . . . . .
0 0 A 90 1 4 6 3.7 4.5 5.4 10.8 10.2
0 0 A 90 1 5 8.6 4.1 6.7 3.8 10 7.2
0 0 A 90 1 6 4.3 3.8 5.1 7.1 10.8 9.6
0 0 A 90 1 7 8.9 5.7 7 4.7 8.8 8.3
0 0 A 100 1 1 2.6 .8 6.2 2.7 8.7 6.3
0 0 A 100 1 2 . . . . . .
0 0 A 100 1 3 . . . . . .
0 0 A 100 1 4 6.1 2.5 4.6 6.4 11 11.3
0 0 A 100 1 5 5.9 6.5 5.5 8.7 8.4 7.2
0 0 A 100 1 6 3.8 2.8 3.7 4.9 10.7 8.9
0 0 A 100 1 7 10.4 4.3 7.1 3.3 7 8.6
0 0 B 90 4 1 3.5 4.3 6.7 4.1 9 10.6
0 0 B 90 4 2 . . . . . .
. . . . . . . . . (252 lines in total)
2 2 B 100 6 6 3.6 3.7 3.9 4.4 5.9 7.4
2 2 B 100 6 7 . . . . . .
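In the listing, missing evaluations (assessors who were not present) appear as rows of dots. The sketch below shows one way to read such data in R, tried on the first eight lines of the listing. It assumes that the file is whitespace-separated without a header row; the column names are our own choice. Passing na.strings = "." turns the dots into NA, while a value such as .8 is still read as the number 0.8.

```r
# First eight lines of the listing above (weeks 0, days 0, batch A, session 1)
listing <- c(
  "0 0 A 90 1 1 4.1 3.6 4.6 3.9 9.3 5",
  "0 0 A 90 1 2 . . . . . .",
  "0 0 A 90 1 3 . . . . . .",
  "0 0 A 90 1 4 6 3.7 4.5 5.4 10.8 10.2",
  "0 0 A 90 1 5 8.6 4.1 6.7 3.8 10 7.2",
  "0 0 A 90 1 6 4.3 3.8 5.1 7.1 10.8 9.6",
  "0 0 A 90 1 7 8.9 5.7 7 4.7 8.8 8.3",
  "0 0 A 100 1 1 2.6 .8 6.2 2.7 8.7 6.3"
)
# Our own column names, in the order described above
cols <- c("weeks", "days", "batch", "temp", "session", "assessor",
          "hay1", "hay2", "haytaste", "spinach1", "spinach2", "spinachtaste")
sens <- read.table(text = listing, col.names = cols, na.strings = ".")
# For the full file (assumption: same layout, no header):
# sens <- read.table("spinagesens.txt", col.names = cols, na.strings = ".")

# Treat the design variables as factors
for (v in c("weeks", "days", "batch", "temp", "session", "assessor"))
  sens[[v]] <- factor(sens[[v]])

summary(sens$hay1)   # two NA's: assessors 2 and 3 were absent
nrow(na.omit(sens))  # complete evaluations only
```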
a) Write down the factors relevant to the analysis, including their levels and mutual structure. [You should include a production factor corresponding to the combinations of temperature, weeks, days, and batch.]
b) Make the factor structure diagram for one of the sensory variables, including all the factors (it is quite big and complicated; do it if you feel it helps your understanding).
c) Specify which effects you want to include in the model. Pay particular attention to which interactions you want in the model. [Include at least some of the interactions between assessor and treatment factors.] Which effects are random and which are fixed?
d) Perform the analysis for one of the sensory properties and draw conclusions.