
SPSS Step 1 - Disease Models & Mechanisms
dmm.biologists.org/content/suppl/2013/02/25/6.2.293.DC1/DMM009860…



SPSS

Step 1: Normality tests

Go to Analyze – Descriptive Statistics <select> Explore from the drop-down list. In the box that opens, enter the column identifier for the data that you wish to test in the Dependent List box. Click on Plots and tick the Normality plots with tests option. Click Continue, then click OK. The relevant output for this test can be found in the following table:

Tests of Normality

             Kolmogorov-Smirnov(a)        Shapiro-Wilk
             Statistic   df   Sig.    Statistic   df   Sig.
Metabolism     .140      15   .200*     .957      15   .633

a. Lilliefors Significance Correction
*. This is a lower bound of the true significance.

The significance of the test is indicated by the p value in the table. If the p value is less than 0.05, then the distribution of the data differs significantly from normal. If it is greater than 0.05, the data can be considered normally distributed. A normality plot, labelled Normal Q-Q Plot of 'column identifier', will also be shown.

EXAMINE VARIABLES=Metabolism
  /PLOT BOXPLOT STEMLEAF NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
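Outside SPSS, the same normality checks can be run in a few lines of Python with scipy.stats. This is a sketch on made-up values (the raw metabolism data are not given in this supplement), so the statistics will not match the table above:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the 15 metabolism values in Table 1
metabolism = np.array([1.02, 1.18, 1.25, 1.31, 1.05, 1.48, 1.52, 1.60,
                       1.11, 1.38, 1.72, 1.44, 1.29, 1.83, 1.08])

# Shapiro-Wilk: p > 0.05 means no evidence that the data deviate from normal
w_stat, w_p = stats.shapiro(metabolism)

# Kolmogorov-Smirnov against a normal with the sample's mean and SD.
# Note: SPSS applies the Lilliefors correction; a plain KS test does not,
# so the p-values will not match SPSS exactly.
ks_stat, ks_p = stats.kstest(metabolism, "norm",
                             args=(metabolism.mean(), metabolism.std(ddof=1)))

print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {w_p:.3f}")
print(f"KS: D = {ks_stat:.3f}, p = {ks_p:.3f}")
```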

Step 3: Attempt to normalise the distribution by transforming it.

Data can be easily transformed by using the Transform – Compute Variable command. Enter a name for your new variable in the Target Variable box and enter your transformation in the Numeric Expression box (e.g., LG10(Variable name)). SPSS will create a new column with the transformed variable.

COMPUTE LOGMetabolism=LG10(Metabolism).
EXECUTE.

Alternatively, data may be transformed using the Box-Cox procedure.

Go to Transform – Prepare Data for Modelling <select> Automatic from the drop-down list. In the Fields tab you can specify which variables to transform by moving them to the Inputs box. In the Settings tab click on Rescale Fields. Tick the box before 'Rescale a continuous target with a Box-Cox transformation to reduce skew'. Click Run. This will create a new column with the transformed variable.
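Both transformations have direct Python equivalents. The sketch below uses a hypothetical right-skewed sample; note that scipy.stats.boxcox picks lambda by maximum likelihood, which is not identical to SPSS's automatic rescaling:

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed, strictly positive data (Box-Cox requires values > 0)
skewed = np.array([0.8, 1.0, 1.1, 1.2, 1.3, 1.4, 1.6, 1.9,
                   2.3, 2.8, 3.5, 4.4, 5.6, 7.1, 9.0])

# Equivalent of SPSS COMPUTE ... LG10(...)
log10_transformed = np.log10(skewed)

# Box-Cox: lambda is chosen by maximising the log-likelihood
boxcox_transformed, lam = stats.boxcox(skewed)

print(f"estimated lambda = {lam:.3f}")
```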

Step 7: Paired comparisons of parametric normally distributed data

For two sample t-test

Go to Analyze – Compare Means <select> Independent Samples T-Test from the drop-down list. This will open a new window. Data should be organized as shown in Table 1. Add the variable to be tested in the Test Variable(s) box. You can enter multiple variables (e.g., metabolism and body mass) if you want. Identify which data belong to which treatment by entering your treatment column into the Grouping Variable box. You now need to define your treatment groups by clicking the Define Groups button. A new window will pop up. When using numeric treatment groups, enter the values for Group 1 and Group 2 (i.e., 1 and 2, respectively, in our example in Table 1). If you have used string codes, enter the codes exactly as they appear in your data, e.g., for control = C and treatment = T, enter C for Group 1 and T for Group 2. Click Continue. Click OK. The output appears as follows.

Group Statistics

             Treatment    N    Mean    Std. Deviation   Std. Error Mean
Metabolism     1.00      10   1.5380       .19510            .06169
               2.00       5    .9760       .20403            .09125


Independent Samples Test

                             Levene's Test
                             for Equality
                             of Variances     t-test for Equality of Means
                                                             Sig.        Mean        Std. Error   95% Confidence Interval
                                                             (2-         Difference  Difference   of the Difference
                               F      Sig.    t      df      tailed)                              Lower      Upper
Metabolism  Equal variances
            assumed          .348    .565   5.185    13      .000        .56200      .10839       .32784     .79616
            Equal variances
            not assumed                     5.102   7.771    .001        .56200      .11015       .30670     .81730

The first table summarizes your data and gives the sample sizes, mean, SD and SEM. The second table gives the results for the t-test. The standard t-test assumes equal variances, and the first two columns in the table show the results for Levene's test for equality of variances. If p > 0.05 the variances are equal and you should use the top row. The next three columns give the results for the independent t-test (t-value, degrees of freedom and p-value, respectively). In this case p < 0.05 and the difference between the groups is statistically significant.
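As a cross-check outside SPSS, Levene's test and both t-test variants are available in scipy.stats. The values below are hypothetical stand-ins for the two groups of Table 1 (n = 10 vs n = 5):

```python
import numpy as np
from scipy import stats

# Hypothetical values mimicking the layout of Table 1
control   = np.array([1.40, 1.52, 1.61, 1.33, 1.72, 1.49, 1.80, 1.45, 1.58, 1.64])
treatment = np.array([0.91, 1.02, 1.10, 0.84, 1.05])

# Levene's test for equality of variances (first two columns of the SPSS table)
lev_stat, lev_p = stats.levene(control, treatment)

# equal_var=True reproduces the "Equal variances assumed" row;
# equal_var=False gives Welch's t-test ("Equal variances not assumed")
t_stat, t_p = stats.ttest_ind(control, treatment, equal_var=(lev_p > 0.05))

print(f"Levene p = {lev_p:.3f}; t = {t_stat:.3f}, p = {t_p:.4f}")
```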

T-TEST GROUPS=treatment(1 2)
  /MISSING=ANALYSIS
  /VARIABLES=Metabolism
  /CRITERIA=CI(.95).

For Paired t-test

Go to Analyze – Compare Means <select> Paired Samples T-Test from the drop-down list. This will open a new window. Data should be organized as shown in Table 2. Add the column identifiers of interest into the Paired Variables box next to pair 1 (i.e., under Variable1 and Variable2). Click OK. The output appears as follows.

Paired Samples Correlations

                                   N   Correlation   Sig.
Pair 1  Metabolism & Metabolism2   15     .909       .000

Paired Samples Test

        Paired Differences
                               Std.        Std. Error   95% Confidence Interval
                      Mean     Deviation   Mean         of the Difference              t       df   Sig. (2-tailed)
                                                        Lower       Upper
Pair 1  Metabolism -
        Metabolism2  -.12000   .14933      .03856       -.20270     -.03730         -3.112     14       .008

Paired Samples Statistics

                       Mean     N   Std. Deviation   Std. Error Mean
Pair 1  Metabolism    1.3507   15       .33401            .08624
        Metabolism2   1.4707   15       .25018            .06460

The first table shows whether there is a significant correlation between the two measurements of interest. In this case there is a significant correlation between both measurements of metabolism (p < 0.01). This need not always be the case, and the results for the paired t-test are also valid if there is no significant correlation between the variables of interest. The last three columns of the second table show the results for your paired t-test (t-value, degrees of freedom and p-value, respectively). In this case p < 0.05 and a significant effect of treatment is shown. The third table shows your descriptive statistics (mean, SD and SEM).

T-TEST PAIRS=Metabolism WITH Metabolism2 (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.
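The paired t-test can be reproduced in Python with scipy.stats.ttest_rel; the paired values below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements on the same 15 individuals (Table 2 layout)
metabolism1 = np.array([1.21, 1.43, 1.12, 1.55, 1.34, 1.62, 1.25, 1.41,
                        1.33, 1.50, 1.18, 1.46, 1.27, 1.60, 1.38])
metabolism2 = metabolism1 + np.array([0.10, 0.20, 0.05, 0.15, 0.10, 0.20, 0.10,
                                      0.05, 0.15, 0.10, 0.20, 0.10, 0.05, 0.15, 0.10])

# Paired t-test; a negative t means the first measurement is lower on average
t_stat, t_p = stats.ttest_rel(metabolism1, metabolism2)

print(f"t = {t_stat:.3f}, p = {t_p:.4f}")
```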

Step 8: Multiple treatment levels, parametric tests

For One Way ANOVA

Go to Analyze – Compare Means <select> One way ANOVA from the drop-down list. This will open a new window. Data should be organized as shown in Table 1. Add the variable of interest (e.g., metabolism) to the Dependent List box. Add the column identifier for treatment levels to the Factor box (e.g., levels). Click OK. The output appears as follows.

ANOVA

Metabolism
                 Sum of Squares   df   Mean Square     F      Sig.
Between Groups       1.342         2      .671       36.582   .000
Within Groups         .220        12      .018
Total                1.562        14

The F and p values for the treatment effect are shown in the last two columns of the table. A p-value less than .05 indicates a significant effect. In this example there was a significant treatment effect.
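The equivalent one-way ANOVA in Python is scipy.stats.f_oneway; the three groups below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical metabolism values for three treatment levels, n = 5 each
level1 = np.array([1.62, 1.71, 1.80, 1.68, 1.74])
level2 = np.array([1.31, 1.42, 1.35, 1.40, 1.38])
level3 = np.array([0.92, 1.01, 0.95, 1.00, 0.98])

# One-way ANOVA across the three levels
f_stat, p = stats.f_oneway(level1, level2, level3)

print(f"F = {f_stat:.2f}, p = {p:.5f}")
```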


ONEWAY Metabolism BY levels
  /MISSING ANALYSIS.

For Repeated Measures ANOVA

Data should be organized as shown in Table 2. Go to Analyze – General Linear Model <select> Repeated Measures from the drop-down list. This will open a new window. Enter the 'Number of levels' that you have in the appropriate box (i.e., the number of repeated measurements; in this case 3). Then click Add. Click Define. This will open a new window. Add the column identifiers of interest into the Within-Subjects Variables box. The number of levels that you identified in the previous step will show up in the box, and you need to add the column identifiers for each level. If you used different levels of treatments, or another factor that differs between subjects such as sex, this can be added to the Between-Subjects Factor(s) box. Click OK. The relevant output for this test can be found in the following tables:

Tests of Within-Subjects Effects

Measure: MEASURE_1
Source                                  Type III Sum
                                        of Squares      df       Mean Square     F      Sig.
Treatment         Sphericity Assumed       .109          2           .054       3.779   .035
                  Greenhouse-Geisser       .109          1.799       .060       3.779   .041
                  Huynh-Feldt              .109          2.000       .054       3.779   .035
                  Lower-bound              .109          1.000       .109       3.779   .072
Error(Treatment)  Sphericity Assumed       .403         28           .014
                  Greenhouse-Geisser       .403         25.189       .016
                  Huynh-Feldt              .403         28.000       .014
                  Lower-bound              .403         14.000       .029

Tests of Between-Subjects Effects

Measure: MEASURE_1
Transformed Variable: Average
Source       Type III Sum of Squares   df   Mean Square      F       Sig.
Intercept           89.183              1      89.183      445.546   .000
Error                2.802             14        .200

The Tests of Within-Subjects Effects table shows the results for your repeated measures (e.g., metabolism measured at different time points in the same individual). In our example (Table 2) this refers to metabolism measured at control, treatment 1 and treatment 2. The first row (Sphericity Assumed) shows whether there was a significant effect of treatment (F2,28 = 3.8, p = 0.035 in our example).


If you added a Between-Subjects Factor (e.g., sex), the F and p-values will be shown in the last two columns of the Tests of Between-Subjects Effects table. In this case, no value is shown because no factor was added. Note that sphericity is assumed in an RM ANOVA. A violation of sphericity occurs when the variances of the differences between all combinations of the groups are not equal. This is tested with Mauchly's test of sphericity, for which the results are given as part of the SPSS output in an RM ANOVA (in the Mauchly's Test of Sphericity table). When the probability of Mauchly's test statistic is less than or equal to 0.05 (i.e., p ≤ .05), sphericity cannot be assumed and a correction needs to be made to the F and p-value. SPSS provides three corrections: Greenhouse-Geisser, Huynh-Feldt and Lower-bound. For more details about these corrections we refer you to a statistics textbook.

GLM Metabolism1 Metabolism2 Metabolism3
  /WSFACTOR=Treatment 3 Polynomial
  /METHOD=SSTYPE(3)
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=Treatment.
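For readers without SPSS, the sphericity-assumed F-test above can be computed by hand. This numpy sketch partitions the sums of squares for simulated data with 15 subjects and 3 conditions (matching the degrees of freedom in the table, not its values):

```python
import numpy as np
from scipy import stats

# Simulated data: 15 subjects (rows) measured under 3 conditions (columns),
# with a per-subject baseline plus a small treatment effect
rng = np.random.default_rng(42)
baseline = rng.normal(1.3, 0.3, size=(15, 1))
data = baseline + np.array([0.0, 0.1, 0.2]) + rng.normal(0, 0.05, size=(15, 3))

n_subj, n_cond = data.shape
grand = data.mean()

# Partition the sums of squares (sphericity assumed)
ss_cond  = n_subj * ((data.mean(axis=0) - grand) ** 2).sum()
ss_subj  = n_cond * ((data.mean(axis=1) - grand) ** 2).sum()
ss_total = ((data - grand) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj

df_cond  = n_cond - 1                    # 2
df_error = (n_subj - 1) * (n_cond - 1)   # 28, matching the SPSS table

f_stat = (ss_cond / df_cond) / (ss_error / df_error)
p = stats.f.sf(f_stat, df_cond, df_error)

print(f"F({df_cond},{df_error}) = {f_stat:.2f}, p = {p:.4f}")
```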

Step 10: Post hoc tests

For One Way ANOVA

Conduct the one-way ANOVA as described under section 9. However, before clicking OK, click on the button labelled Post Hoc. This will open a new window. Choose the test you want to use by ticking the appropriate box, e.g., the Tukey test. Click Continue and click OK. This time, in addition to the ANOVA results, there is additional output below the analysis of variance table, as follows.

Post Hoc Tests

Multiple Comparisons

Metabolism
Tukey HSD
                         Mean Difference                       95% Confidence Interval
(I) levels   (J) levels       (I-J)       Std. Error   Sig.    Lower Bound   Upper Bound
1.00         2.00            .34000*        .08565     .005       .1115         .5685
             3.00            .73200*        .08565     .000       .5035         .9605
2.00         1.00           -.34000*        .08565     .005      -.5685        -.1115
             3.00            .39200*        .08565     .002       .1635         .6205
3.00         1.00           -.73200*        .08565     .000      -.9605        -.5035
             2.00           -.39200*        .08565     .002      -.6205        -.1635



*. The mean difference is significant at the 0.05 level.

Homogeneous Subsets

Metabolism
Tukey HSD(a)
                     Subset for alpha = 0.05
levels    N         1         2         3
3.00      5       .9760
2.00      5                1.3680
1.00      5                          1.7080
Sig.              1.000     1.000     1.000

Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 5.000.

The Multiple Comparisons table shows pair-wise comparisons for the different levels of treatment. P values < 0.05 indicate that the corresponding groups differ significantly. The Homogeneous Subsets table summarises the results from the multiple comparisons and shows the mean values for the different levels of treatment and whether or not they differ. In this case metabolism differs between all levels of treatment.
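scipy.stats provides an equivalent of the Tukey HSD table (tukey_hsd, available from scipy 1.8 onwards); the groups below are hypothetical:

```python
import numpy as np
from scipy import stats   # scipy >= 1.8 provides tukey_hsd

# Hypothetical metabolism values for three treatment levels, n = 5 each
level1 = np.array([1.62, 1.71, 1.80, 1.68, 1.74])
level2 = np.array([1.31, 1.42, 1.35, 1.40, 1.38])
level3 = np.array([0.92, 1.01, 0.95, 1.00, 0.98])

res = stats.tukey_hsd(level1, level2, level3)

# res.pvalue[i, j] is the adjusted p-value for the comparison of group i vs j
print(res.pvalue)
```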

ONEWAY Metabolism BY levels
  /MISSING ANALYSIS
  /POSTHOC=TUKEY ALPHA(0.05).

For Repeated Measures ANOVA

Conduct the Repeated Measures ANOVA as described under section 9. However, before clicking OK, click on the button labelled Post Hoc. This will open a new window. Add the factor of interest to the 'Post Hoc Tests For' box. Choose the test you want to use by ticking the appropriate box, e.g., the Tukey test. Click Continue and click OK. This time, in addition to the ANOVA results, there is additional output below the analysis of variance that is similar to the output shown above for the One-Way ANOVA.


GLM Metabolism Metabolism2 BY levels
  /WSFACTOR=Time 2 Polynomial
  /METHOD=SSTYPE(3)
  /POSTHOC=levels(TUKEY)
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=Time
  /DESIGN=levels.

Step 11

SPSS does not provide the capability to perform power analysis. Alternative programs need to be used instead.

Step 13: Two-way ANOVA

For Two-way ANOVA

Data should be organized as shown in Table 1. Go to Analyze – General Linear Model <select> Univariate from the drop-down list. This will open a new window. Add the variable to be tested to the Dependent Variable box (e.g., metabolism). Add your fixed factors to the Fixed Factor(s) box (e.g., treatment and sex). Click OK. The output appears as follows.

Between-Subjects Factors

                  N
Sex      .00      9
        1.00      6
Levels  1.00      5
        2.00      5
        3.00      5

Tests of Between-Subjects Effects

Dependent Variable: Metabolism1
Source            Type III Sum of Squares   df   Mean Square       F       Sig.
Corrected Model          1.417(a)            5       .283        17.596    .000
Intercept               26.645               1     26.645      1654.406    .000
Sex                       .033               1       .033         2.065    .185
Levels                   1.210               2       .605        37.577    .000
Sex * Levels              .042               2       .021         1.300    .319
Error                     .145               9       .016
Total                   28.926              15
Corrected Total          1.562              14

a. R Squared = .907 (Adjusted R Squared = .856)

Page 9: SPSS Step 1 - Disease Models & Mechanismsdmm.biologists.org/content/suppl/2013/02/25/6.2.293.DC1/DMM009860… · SPSS Step 1: Normality tests go to Analyze – Descriptive Statistics

The last two columns of the Tests of Between-Subjects Effects table show the F and p-values. In this case there was a significant effect of treatment level (p < 0.001), but there was no significant effect of sex (p = 0.185) and no significant sex-by-treatment interaction (p = 0.319). A significant interaction effect would imply that the sexes responded differently to the treatment (e.g., one sex increased or decreased more than the other).
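A comparable two-way ANOVA table can be produced in Python with statsmodels (assuming statsmodels and pandas are installed); the data frame below is invented to mimic the Table 1 layout:

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical data laid out like Table 1: one row per individual
df = pd.DataFrame({
    "metabolism": [1.70, 1.35, 0.95, 1.68, 1.38, 0.99, 1.74, 1.31, 0.92,
                   1.72, 1.40, 0.96, 1.66, 1.36, 1.01],
    "level":      [1, 2, 3] * 5,
    "sex":        [0] * 9 + [1] * 6,
})

# Full factorial model: main effects of sex and level plus their interaction
model = ols("metabolism ~ C(sex) * C(level)", data=df).fit()
table = anova_lm(model, typ=3)   # Type III sums of squares, as SPSS reports

print(table)
```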

For Repeated Measures Two-way ANOVA

Data should be organised as shown in Table 2. Proceed as for the repeated measures one-way ANOVA (see 9), but add a Between-Subjects Factor (e.g., sex). In the output, the extra factor will be included in both the Tests of Within-Subjects and Tests of Between-Subjects Effects tables.

Step 14: Same as 13 for Two-way ANOVA, but add extra fixed factor(s) in the Fixed Factor(s) box. Factors can be added to or removed from the model by using the Model button. This will open a new window. Tick Custom and add/remove variables of interest.

Step 18: Non-parametric tests, paired comparisons

Mann-Whitney U-test

Go to Analyze – Nonparametric tests – Legacy dialogs <select> Two independent samples from the drop-down list. This will open a new window. Data should be organized as shown in Table 1. Add the variable to be tested to the Test Variable List box (e.g., metabolism). Identify which data belong to which treatment by entering your treatment column into the Grouping Variable box. You now need to define your treatment groups by clicking the Define Groups button. A new window will pop up. When using numeric treatment groups, enter the values for Group 1 and Group 2 (i.e., 1 and 2, respectively, in our example in Table 1). If you have used string codes, enter the codes exactly as they appear in your data, e.g., for control = C and treatment = T, enter C for Group 1 and T for Group 2. Click Continue. Make sure the box before Mann-Whitney U is ticked. Click OK. The output appears as follows.

Ranks

             treatment    N   Mean Rank   Sum of Ranks
Metabolism     1.00      10     10.50        105.00
               2.00       5      3.00         15.00
               Total     15

Test Statistics(b)

                                 Metabolism
Mann-Whitney U                      .000
Wilcoxon W                        15.000
Z                                 -3.067
Asymp. Sig. (2-tailed)              .002
Exact Sig. [2*(1-tailed Sig.)]      .001(a)

a. Not corrected for ties.
b. Grouping Variable: treatment

The Z and p-value (Asymp. Sig. (2-tailed)) are shown in the Test Statistics table.

NPAR TESTS /M-W= Metabolism BY treatment(1 2) /MISSING ANALYSIS.
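The same test in Python is scipy.stats.mannwhitneyu. Note that scipy reports U for the first sample, whereas SPSS reports the smaller U (hence U = 0 in the output above); the groups below are hypothetical and completely separated:

```python
import numpy as np
from scipy import stats

# Hypothetical groups mirroring the SPSS example: every value in group1
# exceeds every value in group2 (complete separation)
group1 = np.array([1.40, 1.52, 1.61, 1.33, 1.72, 1.49, 1.80, 1.45, 1.58, 1.64])
group2 = np.array([0.91, 1.02, 1.10, 0.84, 1.05])

u_stat, p = stats.mannwhitneyu(group1, group2, alternative="two-sided")

# With complete separation, U for group1 equals n1 * n2 = 50
print(f"U = {u_stat}, p = {p:.4f}")
```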

Wilcoxon matched pairs test

Go to Analyze – Nonparametric Tests – Legacy dialogs <select> Two Related Samples from the drop-down list. This will open a new window. Data should be organized as shown in Table 2. Add the column identifiers of interest into the Test Pairs box next to pair 1 (i.e., under Variable1 and Variable2). Make sure the box before Wilcoxon is ticked. Click OK. The output appears as follows.

Ranks

                                            N      Mean Rank   Sum of Ranks
Metabolism2 - Metabolism  Negative Ranks    3(a)     4.00         12.00
                          Positive Ranks   12(b)     9.00        108.00
                          Ties              0(c)
                          Total            15

a. Metabolism2 < Metabolism
b. Metabolism2 > Metabolism
c. Metabolism2 = Metabolism

Test Statistics(b)

                          Metabolism2 - Metabolism
Z                                 -2.728(a)
Asymp. Sig. (2-tailed)              .006

a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

The Z and p-value (Asymp. Sig. (2-tailed)) are shown in the Test Statistics table.

NPAR TESTS /WILCOXON=Metabolism WITH Metabolism2 (PAIRED)
  /MISSING ANALYSIS.
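The Python equivalent is scipy.stats.wilcoxon; the paired values below are invented so that every individual increases, which is a more extreme pattern than the example output above (there, 3 of 15 individuals decreased):

```python
import numpy as np
from scipy import stats

# Hypothetical paired data in which every individual increases under treatment
metabolism1 = np.array([1.21, 1.43, 1.12, 1.55, 1.34, 1.62, 1.25, 1.41,
                        1.33, 1.50, 1.18, 1.46, 1.27, 1.60, 1.38])
metabolism2 = metabolism1 + np.array([0.10, 0.20, 0.05, 0.15, 0.10, 0.20, 0.10,
                                      0.05, 0.15, 0.10, 0.20, 0.10, 0.05, 0.15, 0.10])

# With all differences in one direction, the signed-rank statistic is 0
w_stat, p = stats.wilcoxon(metabolism1, metabolism2)

print(f"W = {w_stat}, p = {p:.5f}")
```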

Step 19: Non-parametric analysis when there are multiple treatments or levels

Kruskal-Wallis ANOVA

Go to Analyze – Nonparametric tests – Legacy dialogs <select> k independent samples from the drop-down list. This will open a new window. Data should be organized as shown in Table 1. Add the variable to be tested to the Test Variable List box (e.g., metabolism). Identify which data belong to which treatment by entering your treatment column into the Grouping Variable box. You now need to define your treatment groups by clicking the Define Range button. A new window will pop up. Enter the Minimum and Maximum values for your groups (i.e., 1 and 3, respectively, for the treatment levels in our example in Table 1). Click Continue. Make sure the box before Kruskal-Wallis H is ticked. Click OK. The output appears as follows.

Ranks

             levels    N   Mean Rank
Metabolism    1.00     5     13.00
              2.00     5      8.00
              3.00     5      3.00
              Total   15

Test Statistics(a,b)

              Metabolism
Chi-Square      12.545
df                   2
Asymp. Sig.       .002

a. Kruskal Wallis Test
b. Grouping Variable: levels

The chi-square (χ2) and p-value (Asymp. Sig.) are shown in the Test Statistics table.
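The Python equivalent is scipy.stats.kruskal; the three groups below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical metabolism values for three treatment levels, n = 5 each
level1 = np.array([1.62, 1.71, 1.80, 1.68, 1.74])
level2 = np.array([1.31, 1.42, 1.35, 1.40, 1.38])
level3 = np.array([0.92, 1.01, 0.95, 1.00, 0.98])

# Kruskal-Wallis H-test (the H statistic is chi-square distributed)
h_stat, p = stats.kruskal(level1, level2, level3)

print(f"chi-square = {h_stat:.3f}, p = {p:.4f}")
```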

NPAR TESTS /K-W=Metabolism BY levels(1 3) /MISSING ANALYSIS.

Repeated measures Friedman Test

Go to Analyze – Nonparametric Tests – Legacy dialogs <select> k Related Samples from the drop-down list. This will open a new window. Data should be organized as shown in Table 2. Add the column identifiers of interest into the Test Variables box. Click OK. The output appears as follows.


Ranks

              Mean Rank
Metabolism       1.20
Metabolism2      1.80

Test Statistics(a)

N                15
Chi-Square    5.400
df                1
Asymp. Sig.    .020

a. Friedman Test

The chi-square (χ2) and p-value (Asymp. Sig.) are shown in the Test Statistics table.
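scipy.stats.friedmanchisquare gives the same test but requires at least three related samples, so this hypothetical sketch uses three conditions rather than the two in the example above:

```python
import numpy as np
from scipy import stats

# Hypothetical repeated measures: 15 subjects, 3 conditions, with metabolism
# increasing for every subject (scipy's Friedman test needs >= 3 conditions)
base = np.array([1.21, 1.43, 1.12, 1.55, 1.34, 1.62, 1.25, 1.41,
                 1.33, 1.50, 1.18, 1.46, 1.27, 1.60, 1.38])
cond1, cond2, cond3 = base, base + 0.10, base + 0.20

chi2_stat, p = stats.friedmanchisquare(cond1, cond2, cond3)

# With a perfect ordering in every row, the chi-square statistic is maximal
print(f"chi-square = {chi2_stat:.2f}, p = {p:.6f}")
```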

Step 23: Analysis of Covariance (ANCOVA)

Go to Analyze – General Linear Model <select> Univariate from the drop-down list. This will open a new window. Data should be organized as shown in Table 1. Add the variable to be tested to the Dependent Variable box (e.g., metabolism). Add the column identifier for treatments to the Fixed Factor(s) box (e.g., levels) and add the covariate (e.g., body mass) to the Covariate(s) box. Click OK. The output appears as follows.

Tests of Between-Subjects Effects

Dependent Variable: Metabolism
Source            Type III Sum of Squares   df   Mean Square      F      Sig.
Corrected Model          1.403(a)            3       .468       32.441   .000
Intercept                 .001               1       .001         .065   .804
Bodymass                  .061               1       .061        4.264   .063
Levels                   1.325               2       .663       45.957   .000
Error                     .159              11       .014
Total                   28.926              15
Corrected Total          1.562              14

a. R Squared = .898 (Adjusted R Squared = .871)

The F and p-values are shown in the final two columns. In this example the body mass effect did not reach statistical significance (p = 0.063), but there was a significant effect of treatment (p < 0.05). In SPSS the default full factorial model does not include an interaction effect between body mass and treatment.


To include this interaction rerun the analysis (go to Analyze – General Linear Model <select> Univariate from the drop down list) and click the Model button. This will open a new window. Tick the Custom box and then click and add factors as appropriate using the arrow button. To add an interaction effect, select two factors (in this case body mass and levels) and click the arrow button and the interaction effect will show up in the right box (i.e., levels x body mass). The output is as follows.

Tests of Between-Subjects Effects

Dependent Variable: Metabolism
Source             Type III Sum of Squares   df   Mean Square      F      Sig.
Corrected Model           1.439(a)            5       .288       21.138   .000
Intercept                  .004               1       .004         .288   .604
Levels                     .056               2       .028        2.055   .184
Bodymass                   .077               1       .077        5.644   .042
Levels * Bodymass          .036               2       .018        1.323   .314
Error                      .123               9       .014
Total                    28.926              15
Corrected Total           1.562              14

a. R Squared = .922 (Adjusted R Squared = .878)

Note that when the interaction effect is included there is no significant effect of treatment level. The interaction effect is also not significant. In this case you would remove the interaction effect and analyse the data without it, as shown above.
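Both ANCOVA models (with and without the interaction) can be fitted in Python with statsmodels, assuming statsmodels and pandas are installed; the data below are simulated:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Simulated data: metabolism depends on body mass plus a treatment effect
rng = np.random.default_rng(7)
bodymass = rng.uniform(20, 40, size=15)
level = np.array([1, 2, 3] * 5)
metabolism = (0.03 * bodymass + np.array([0.4, 0.2, 0.0])[level - 1]
              + rng.normal(0, 0.05, size=15))

df = pd.DataFrame({"metabolism": metabolism, "bodymass": bodymass, "level": level})

# ANCOVA without the interaction (the SPSS default full factorial model)
main = anova_lm(ols("metabolism ~ C(level) + bodymass", data=df).fit(), typ=3)

# ANCOVA with the level x body mass interaction added via a custom model
inter = anova_lm(ols("metabolism ~ C(level) * bodymass", data=df).fit(), typ=3)

print(main)
print(inter)
```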

Step 37: Correlation matrix

Go to Analyze – Correlate <select> Bivariate. This will open a new window. Add the variables of interest to the Variables box. Tick the relevant box under Correlation Coefficients (i.e., Pearson). The output is a correlation matrix.

A typical matrix might be as follows for a situation where 5 organ weights are available.

Organ             Liver    WAT    Brain   Skeletal muscle   BAT
Liver              1.00    0.32    0.13        0.66         0.22
WAT                        1.00    0.17        0.55         0.93
Brain                              1.00        0.11         0.03
Skeletal muscle                                1.00         0.61
BAT                                                         1.00

This table highlights that WAT and BAT are highly correlated and hence not independent predictor variables. Skeletal muscle is also quite highly correlated with liver mass. One can proceed with the analysis ignoring these effects, but one should be aware that such correlations may compromise the outcome. In this case a strong effect of WAT might emerge because of the effect of BAT on metabolism combined with the high correlation of WAT with BAT. This analysis requires the number of observations (i.e., individuals) to exceed the number of predictor variables included in the analysis by at least a factor of 3. Hence in this situation, with 5 predictors, each group (i.e., treatment levels and control) would need at least 15 individuals, and preferably many more. Interpretation of these effects depends on the complexity of the interactions. The bottom line is to diagnose an overall treatment effect controlling for these body composition variables. If there is an overall treatment effect one can establish where this occurs using the multiple range tests (Tukey test and Duncan's multiple range test).
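A correlation matrix like the one above takes one call in numpy; the organ weights below are simulated, with BAT deliberately constructed from WAT so that the two are highly correlated:

```python
import numpy as np

# Simulated organ weights for 15 individuals
# (columns: liver, WAT, brain, skeletal muscle, BAT)
rng = np.random.default_rng(11)
n = 15
wat = rng.normal(5.0, 1.0, size=n)
bat = 0.4 * wat + rng.normal(0, 0.1, size=n)     # strongly tied to WAT
organs = np.column_stack([
    rng.normal(10.0, 1.5, size=n),   # liver
    wat,
    rng.normal(1.5, 0.1, size=n),    # brain
    rng.normal(20.0, 2.0, size=n),   # skeletal muscle
    bat,
])

corr = np.corrcoef(organs, rowvar=False)   # 5 x 5 Pearson correlation matrix
print(np.round(corr, 2))
```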

Step 38: PRINCIPAL COMPONENTS ANALYSIS.

Principal Component Analysis can be performed using SPSS, but the procedure to do so is hidden within the procedure for factor analysis. Go to Analyze <select> Dimension Reduction <select> Factor. This opens a new window. Add the column identifiers of the variables of interest to the Variables box. Click the Extraction button. This will open a new window. Under Method, select Principal Components using the drop-down menu. Tick Correlation matrix and Unrotated factor solution. To restrict the number of components, tick Fixed number of factors and type the number of components you want in the Factors to extract box (e.g., 5). Click Continue; this will take you back to the first window. Click OK. The output is as follows.

Communalities

                      Initial   Extraction
Carcass                1.000       .924
HEART                  1.000       .874
LIVER                  1.000       .784
KIDNEY                 1.000       .926
BRAIN                  1.000       .672
Brown Fat              1.000       .861
Abdominal Fat          1.000       .926
Gonadal Fat            1.000       .939
Mesenteric Fat         1.000       .836
Gonads                 1.000       .914
Large Intestine (g)    1.000       .807
Small Intestine (g)    1.000       .735
Stomach                1.000       .805
Lungs                  1.000       .911
Pancreas               1.000       .839
Pelage                 1.000       .942
Tail                   1.000       .895

Extraction Method: Principal Component Analysis.

Total Variance Explained

            Initial Eigenvalues                    Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %   Total   % of Variance   Cumulative %
1           7.688      45.221          45.221      7.688      45.221          45.221
2           3.430      20.176          65.396      3.430      20.176          65.396
3           1.614       9.493          74.889      1.614       9.493          74.889
4           1.060       6.235          81.123      1.060       6.235          81.123
5            .798       4.696          85.819       .798       4.696          85.819
6            .648       3.815          89.634
7            .550       3.237          92.871
8            .354       2.083          94.953
9            .283       1.663          96.616
10           .180       1.061          97.677
11           .125        .735          98.412
12           .103        .605          99.018
13           .075        .440          99.458
14           .046        .272          99.731
15           .023        .134          99.865
16           .019        .110          99.974
17           .004        .026         100.000

Extraction Method: Principal Component Analysis.

Component Matrix(a)

                       Component
                       1       2       3       4       5
Carcass              .923   -.198   -.090   -.145    .069
HEART                .655    .530    .067    .212    .337
LIVER                .785   -.399    .021   -.017    .089
KIDNEY               .800    .501    .113   -.035   -.145
BRAIN                .373    .687    .229   -.090    .012
Brown Fat           -.639   -.218    .615    .021   -.161
Abdominal Fat        .625   -.617    .178    .335   -.106
Gonadal Fat          .743   -.472    .140    .287   -.250
Mesenteric Fat       .796   -.179    .310    .202    .182
Gonads               .386   -.559   -.186   -.465    .449
Large Intestine (g)  .869   -.025   -.124    .009    .187
Small Intestine (g)  .412    .107    .730   -.132    .059
Stomach              .370    .589    .430   -.368   -.027
Lungs               -.052    .689   -.145    .586    .264
Pancreas             .765    .223   -.320   -.068   -.312
Pelage               .909   -.283   -.054    .056   -.173
Tail                 .635    .518   -.340   -.168   -.284

Extraction Method: Principal Component Analysis.
a. 5 components extracted.
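A correlation-based PCA like the SPSS output above can be reproduced with an eigen-decomposition in numpy; the data here are simulated (30 individuals, 6 variables in two correlated blocks), so the numbers will not match the tables:

```python
import numpy as np

# Simulated stand-in for the organ-weight data: 30 individuals x 6 variables,
# built from two latent factors so the first components pick up shared variance
rng = np.random.default_rng(5)
size_factor = rng.normal(size=(30, 1))
fat_factor = rng.normal(size=(30, 1))
X = np.hstack([size_factor + 0.3 * rng.normal(size=(30, 3)),
               fat_factor + 0.3 * rng.normal(size=(30, 3))])

# Standardise, so the PCA is based on the correlation matrix (as in SPSS above)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)

# Eigen-decomposition of the correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()          # the "% of Variance" column
loadings = eigvecs * np.sqrt(eigvals)        # entries of the Component Matrix
scores = Z @ eigvecs                         # component scores per individual

print(np.round(explained * 100, 1))
```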


MINITAB

Step 1: Normality tests

Go to the Statistics tab, <select> Basic Statistics from the drop-down list, then <select> Normality Test (second from bottom). In the box that opens, enter the column identifier for the data that you wish to test. Click on the actual test you wish to perform (e.g. the Anderson-Darling test).

A typical output for this test, for the data in Table 1 column 2, is a normal probability plot (Probability Plot of Metabolism, Normal) with Percent on the y-axis and Metabolism on the x-axis, accompanied by a summary box: Mean 1.351, StDev 0.3340, N 15, AD 0.239, P-Value 0.732.

The significance of the test is indicated by the p value in the box to the right of the plot. If the p value is less than 0.01 then the distribution of data differs significantly from normal. If it is > .01 then the data can be considered normally distributed.
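The Anderson-Darling test is also available as scipy.stats.anderson; note that scipy returns critical values at fixed significance levels rather than a single p-value. The data below are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the metabolism column
metabolism = np.array([1.02, 1.18, 1.25, 1.31, 1.05, 1.48, 1.52, 1.60,
                       1.11, 1.38, 1.72, 1.44, 1.29, 1.83, 1.08])

res = stats.anderson(metabolism, dist="norm")

# The data look non-normal if the AD statistic exceeds the critical value
# at your chosen significance level
print(f"AD = {res.statistic:.3f}")
for crit, sig in zip(res.critical_values, res.significance_level):
    print(f"  {sig}%: critical value {crit:.3f}")
```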

Step 3: Attempt to normalise the distribution by transforming it.

Data can be easily transformed by going to Calc - Calculator. This will open a window. Enter a name for your new variable in the Store results in Variable box and enter your transformation in the Expression box (e.g., LOGTEN(Column Identifier)). Click Ok. Minitab will create a new column with the transformed variable.

Alternatively, data can be transformed using the Box-Cox procedure. Go to the Statistics tab. Select Control Charts from the drop-down box. Select Box-Cox from the options that appear. This opens a new window. Type the column identifier of the data you want to transform in the box. Insert the number 1 in the box that says subgroup size. Click on Options. In the new window that appears, type a column identifier (e.g. C9 for column 9) in the box that says 'store transformed data in' to specify where you want the transformed data to be stored. Click OK to close the window, then click OK to perform the analysis. A typical output looks like this.

The output is a Box-Cox Plot of BEE, showing the standard deviation of the transformed data plotted against lambda, together with the estimated lambda and its confidence limits: Estimate 0.88, Lower CL -0.00, Upper CL 1.91, Rounded Value 1.00 (using 95.0% confidence).

The plot shows the optimal transformation value (lambda). The transformed data will now be in the column you specified in the options.

Step 7: Paired comparisons of parametric normally distributed data

For two sample t-test

Go to the Statistics tab and select Basic Statistics from the drop-down tab. Select '2t Two-Sample t…' from the available options and click on it. This opens a new window. If you have formatted the data as detailed in Table 1, the data you are testing will be in one column (e.g., in the above example the energy expenditure data are in column 2) and the codes identifying which data are treatment and which are control are in another column (in the above example, column 4). In the new window select the 'data in one column' button and enter C2 in the data box and C4 in the subscripts box.

Typical output (for analysis of metabolism against treatment group in Table 1) looks as follows:

Two-sample T for Metabolism

Treatment   N   Mean  StDev  SE Mean
1          10  1.538  0.195    0.062
2           5  0.976  0.204    0.091

Difference = mu (1) - mu (2)
Estimate for difference: 0.562
95% CI for difference: (0.302, 0.822)
T-Test of difference = 0 (vs not =): T-Value = 5.10  P-Value = 0.001  DF = 7

The t-value and p value are shown on the bottom line. If P < .05 then the difference between the two groups is significant. Data for the mean, SD and SE for each of the treatment groups are shown in the table. In this case (metabolism data from Table 1) there was a significant difference between treatment and control groups.
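As a cross-check on the output, the pooled two-sample t statistic can be computed by hand. The sketch below is a minimal Python illustration with hypothetical metabolism values (not the Table 1 data); note that Minitab's output above used the unpooled form, which is why its DF is 7 rather than n1 + n2 - 2.

```python
import math
from statistics import mean, stdev

def two_sample_t(x, y):
    """Pooled (equal-variance) two-sample t statistic and its df."""
    nx, ny = len(x), len(y)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2

treatment = [1.5, 1.6, 1.7]   # hypothetical values, not the Table 1 data
control = [1.0, 1.1, 1.2]
t, df = two_sample_t(treatment, control)
```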

For the paired t-test, the data in Minitab need to be organised as shown in Table 2, i.e., the data for the treatment need to be placed in a separate column from the control data, and data from the same individual need to be aligned in the same row. Go to the Statistics tab. Select 'basic statistics' from the drop-down box. Select 'Paired t...'. This opens a new window. Click the 'samples in columns' button. Enter the column identifiers into the two boxes. Click OK.

Typical output (for data in table 2 comparing metabolism 1 and metabolism 2) looks as follows:

Paired T for Metabolism1 - Metabolism2

              N     Mean   StDev  SE Mean
Metabolism1  15   1.3507  0.3340   0.0862
Metabolism2  15   1.4707  0.2502   0.0646
Difference   15  -0.1200  0.1493   0.0386

95% CI for mean difference: (-0.2027, -0.0373)
T-Test of mean difference = 0 (vs not = 0): T-Value = -3.11  P-Value = 0.008

The t-value and p value are shown on the bottom line. If P < .05 then there is a difference between the treatment and control. The sign of the t-value and the means in the table indicate the direction of the difference. In this case the difference is highly significant (P < .01) and metabolism 1 is lower than metabolism 2.
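The paired t statistic itself is just the mean of the within-individual differences divided by their standard error. A minimal Python sketch with hypothetical paired values (not the Table 2 data):

```python
import math
from statistics import mean, stdev

# hypothetical paired measurements on the same five individuals
metabolism1 = [1.2, 1.4, 1.1, 1.5, 1.3]
metabolism2 = [1.4, 1.5, 1.3, 1.6, 1.2]

# the paired t-test works on the within-row differences
d = [a - b for a, b in zip(metabolism1, metabolism2)]
t = mean(d) / (stdev(d) / math.sqrt(len(d)))
df = len(d) - 1
```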

Step 8: multiple treatment levels, parametric tests

For One way ANOVA

Go to the statistics tab. <Select> ANOVA from the drop-down box. If the data are all in a single column with the identifiers for them in a second column then <Select> ‘one way...’ from the options. This opens a new window. In the response box type the identifier for the variable being tested (e.g., metabolism). In the factor box type the column that contains the treatment levels. On the other hand if the data are structured as in Table 2 with each measurement in a separate column <Select> ‘one way (unstacked)...’. This opens a new window. Enter the column identifiers for the columns containing the data into the box marked ‘Responses (in separate columns)’. Then click on OK.

Using the data from Table 1 the output appears as follows:


One-way ANOVA: Metabolism versus Levels

Source  DF      SS      MS      F      P
Levels   2  1.3418  0.6709  36.58  0.000
Error   12  0.2201  0.0183
Total   14  1.5619

S = 0.1354   R-Sq = 85.91%   R-Sq(adj) = 83.56%

                          Individual 95% CIs For Mean Based on Pooled StDev
Level  N    Mean   StDev  ------+---------+---------+---------+---
1      5  1.7080  0.0769                               (----*-----)
2      5  1.3680  0.0864                 (-----*----)
3      5  0.9760  0.2040  (----*----)
                          ------+---------+---------+---------+---
                             1.00      1.25      1.50      1.75

Pooled StDev = 0.1354

F and P values are shown in the variance table at the start of the output. If the P value is less than .05 then there is a significant effect of the treatment. In this case there is a highly significant treatment effect. (Note Minitab refers to p values less than .001 as 0.000. These should be cited as P < .001).
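The F ratio in the table above is the between-group mean square divided by the within-group (error) mean square. A minimal Python sketch with hypothetical data for three groups:

```python
from statistics import mean

def one_way_anova(groups):
    """Between/within sums of squares and the F ratio (minimal sketch)."""
    allvals = [v for g in groups for v in g]
    grand = mean(allvals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_b = len(groups) - 1
    df_w = len(allvals) - len(groups)
    F = (ss_between / df_b) / (ss_within / df_w)
    return F, df_b, df_w

# hypothetical metabolism values for three treatment levels
groups = [[1.7, 1.8, 1.6], [1.4, 1.3, 1.5], [1.0, 0.9, 1.1]]
F, df_b, df_w = one_way_anova(groups)
```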

Repeated measures ANOVA: There is no specific procedure in Minitab to perform a repeated measures ANOVA. The best way to perform this test is to use the general linear model (GLM) and include individual ID as a random factor in the model. The data need to be in the 'stacked format' for this analysis, i.e. all the data need to be in a single column, with other columns identifying the treatment and the individual IDs. The following analysis uses the data from Table 2, where 15 individuals are measured in 3 conditions (control and 2 treatments, labelled metabolism 1, 2 and 3). To perform this test, go to the statistics tab. <Select> ANOVA. From the options that appear <select> GLM – general linear model. This opens a new window. Type the column identifier for the variable you are interested in testing into the response box. In the model box enter the column identifier for the treatment levels and the column identifier for the column containing the individual IDs. In the box marked 'random factors' enter the same column identifier for the IDs. (Note: if each individual is measured only once in each condition, an interaction of individual and treatment level cannot be tested.) The output appears as follows.

General Linear Model: Metabolism versus ID, Treatment

Factor  Type    Levels  Values
ID      random      15  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
trmt    fixed        3  Metabolism1, Metabolism2, Metabolism3

Analysis of Variance for C21, using Adjusted SS for Tests

Source  DF   Seq SS   Adj SS   Adj MS      F      P
ID      14  2.80231  2.80231  0.20017  13.91  0.000
trmt     2  0.10875  0.10875  0.05438   3.78  0.035
Error   28  0.40292  0.40292  0.01439
Total   44  3.31398

S = 0.119958   R-Sq = 87.84%   R-Sq(adj) = 80.89%

The F and P values for the treatment effect and for the individual effect are shown in the variance table. A value less than .05 indicates a significant effect. In this example there was both a significant treatment effect, and also a significant individual effect.

Step 10: post hoc tests

For paired t-tests, see the procedure detailed above in section 8. For post hoc tests proceed as follows. Conduct the one-way ANOVA as described under step 8. However, before clicking on OK, click on the button labelled Comparisons. Choose the test you want to use, e.g. the Tukey test. This time, in addition to the ANOVA results, there is an additional output below the analysis of variance table, as follows.

Grouping Information Using Tukey Method

Levels  N    Mean  Grouping
1       5  1.7080  A
2       5  1.3680    B
3       5  0.9760      C

Means that do not share a letter are significantly different.

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Levels

Individual confidence level = 97.94%

Levels = 1 subtracted from:

Levels    Lower   Center    Upper  ----+---------+---------+---------+-----
2       -0.5683  -0.3400  -0.1117            (----*-----)
3       -0.9603  -0.7320  -0.5037  (-----*----)
                                   ----+---------+---------+---------+-----
                                     -0.80     -0.40     -0.00      0.40

Levels = 2 subtracted from:

Levels    Lower   Center    Upper  ----+---------+---------+---------+-----
3       -0.6203  -0.3920  -0.1637           (-----*-----)
                                   ----+---------+---------+---------+-----
                                     -0.80     -0.40     -0.00      0.40

The first part of this output shows the pairwise comparisons of each level. In this case the 3 groups all differ significantly from each other, which is indicated in the table by the fact that none of them share a letter adjacent to the level identifier. The information under the table shows the pairwise differences and their confidence limits.
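The grouping letters follow from a single 'honestly significant difference': two means differ when they are further apart than HSD = q * sqrt(MSE/n). The sketch below uses the means and error mean square from the ANOVA output above; the critical value q ≈ 3.77 (3 groups, 12 error df, alpha = 0.05) is an assumed studentized-range table lookup, not part of the original output.

```python
import math

# means and error mean square from the one-way ANOVA output above
means = {1: 1.708, 2: 1.368, 3: 0.976}
mse, n = 0.0183, 5          # error MS and per-group sample size
q = 3.77                    # assumed table value, q(0.05; 3, 12)

hsd = q * math.sqrt(mse / n)   # minimum significant difference
pairs = [(a, b) for a in means for b in means if a < b]
significant = {p: abs(means[p[0]] - means[p[1]]) > hsd for p in pairs}
```

All three pairwise differences (0.340, 0.732 and 0.392) exceed the HSD, which is why each level gets its own letter.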


For repeated measures ANOVA tested in GLM, repeat the analysis as specified above under step 8 but, before clicking on OK to run the test, click on 'Comparisons'. In the new window that opens, enter the column identifier for the treatment in the box labelled 'Terms'. Select the test required, e.g. the Tukey test. The output appears as follows.

Grouping Information Using Tukey Method and 95.0% Confidence

C22           N  Mean  Grouping
Metabolism2  15   1.5  A
Metabolism3  15   1.4  A B
Metabolism1  15   1.4    B

Means that do not share a letter are significantly different.

In this instance metabolism 1 doesn’t differ from metabolism 3 but it does differ from metabolism 2. However metabolism 2 and 3 are also not significantly different.

Step 11: Power analysis

Go to the Statistics tab. Select 'power analysis and sample size'. Select the test you used from the options that appear. Under each of the options you need to specify all the values except 'power'; e.g., under the two-sample t-test you need to fill in the boxes that specify sample sizes, differences and standard deviation. The sample size is the number of measurements in each group. The difference is the size of the effect that you would consider important to detect. For example, if you felt a difference between groups would need to be 5% or larger before you would consider it important, then you need to take the mean value across all the measurements and calculate 5% of that value. Finally, add the pooled standard deviation from the output of the test. For example, in the two-sample t-test detailed above the overall mean was 1.368, so 5% of this would be 0.0684. The standard deviation was 0.133 and the sample size per group was 10. Putting these into the respective boxes and clicking OK runs the analysis.

The typical output looks like this

Power and Sample Size

2-Sample t Test

Testing mean 1 = mean 2 (versus not =)
Calculating power for mean 1 = mean 2 + difference
Alpha = 0.05  Assumed standard deviation = 0.133

            Sample
Difference    Size     Power
     0.068      10  0.191413

The sample size is for each group.


The power for this example is 0.191; multiply this by 100 to express it as a percentage. So the power in this case to detect a 5% difference between means was only 19.1%. This means we cannot be sure that the absence of a difference wasn't just a type 2 error caused by the low sample size. If you run the test again, but this time put the desired minimum power into the power box (i.e. 0.8) and leave the sample size box empty, the analysis will show that detecting a 5% difference would need 62 animals per group.
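The power figure can be approximated by hand: with effect size delta, common standard deviation sigma and n per group, power ≈ Φ(delta / (sigma·√(2/n)) − z_crit). The Python sketch below uses a normal approximation, so it comes out slightly above Minitab's exact t-based value of 0.191:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def approx_power(diff, sd, n, z_crit=1.959964):
    """Normal-approximation power for a two-sided two-sample t test.
    (A sketch only; exact power uses the noncentral t distribution.)"""
    se = sd * math.sqrt(2 / n)          # SE of the difference in means
    return norm_cdf(diff / se - z_crit)

power = approx_power(0.068, 0.133, 10)  # values from the example above
```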

Step 13: Two-way ANOVA

Two-way ANOVA: In Minitab the best way to perform a two-way ANOVA is to use the general linear model (GLM) as detailed in section 9 above. Although there is a facility to do two-way ANOVA in Minitab under the Statistics and ANOVA tabs, this only works for completely balanced designs. To perform two-way ANOVA by GLM the data need to be in the 'stacked format', i.e. all the response data need to be in a single column, with other columns identifying the treatment or factor variables and individual IDs. To select this test choose <statistics>, <ANOVA>, <general linear model>. In the box labelled 'responses' type the column identifier for the variable you want to analyse (e.g. Metabolism). In the box labelled 'model' it is necessary to include the column identifiers for both of the treatment variables plus an additional term, the multiplication of these two variables, to reflect the interaction of the predictors. For example, using the data in Table 1, the treatment levels are in column 5 (C5) and the sex identifiers in column 6 (C6). Hence the model would be: C5 C6 C5*C6

In the case of the metabolism data detailed in table 1 above with treatment and sex as factors the output appears as follows

General Linear Model: Metabolism versus Treatment, Sex

Factor  Type   Levels  Values
Levels  fixed       3  1, 2, 3
Sex     fixed       2  0, 1

Analysis of Variance for Metabolism, using Adjusted SS for Tests

Source      DF   Seq SS   Adj SS   Adj MS      F      P
Levels       2  1.34181  1.21041  0.60520  37.58  0.000
Sex          1  0.03325  0.03325  0.03325   2.06  0.185
Levels*Sex   2  0.04188  0.04188  0.02094   1.30  0.319
Error        9  0.14495  0.14495  0.01611
Total       14  1.56189

S = 0.126908   R-Sq = 90.72%   R-Sq(adj) = 85.56%

The significance of the different effects is shown in the ANOVA table. In this case there is a significant treatment effect (p < 0.001), no significant sex effect (p > 0.05) and no significant sex by treatment interaction (p = 0.319).
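For a balanced design the sums of squares in such a table decompose cleanly: each main-effect SS comes from the level means, and the interaction SS is what remains of the between-cell SS after the main effects are removed. A minimal Python sketch on a hypothetical balanced 2×2 layout (not the Table 1 data):

```python
from statistics import mean

# balanced two-way layout: data[(a, b)] = replicates for level a of
# factor A and level b of factor B (hypothetical values)
data = {(1, 1): [10, 12], (1, 2): [11, 13],
        (2, 1): [14, 16], (2, 2): [19, 21]}
A = sorted({a for a, b in data})
B = sorted({b for a, b in data})
grand = mean(v for cell in data.values() for v in cell)
n_cell = 2   # replicates per cell

def level_mean(factor, lvl):
    return mean(v for (a, b), cell in data.items()
                for v in cell if (a if factor == 'A' else b) == lvl)

ss_a = n_cell * len(B) * sum((level_mean('A', a) - grand) ** 2 for a in A)
ss_b = n_cell * len(A) * sum((level_mean('B', b) - grand) ** 2 for b in B)
ss_cells = n_cell * sum((mean(cell) - grand) ** 2 for cell in data.values())
ss_ab = ss_cells - ss_a - ss_b          # interaction SS
ss_error = sum((v - mean(cell)) ** 2 for cell in data.values() for v in cell)
```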


Repeated measures 2-way ANOVA

To perform this test, simply repeat the above procedure but include a column with the individual IDs in it and put this column into the model. This column is also entered into the box labelled 'random factors'.

Step 14: Proceed in the same way as for step 13 Two-way ANOVA. Add additional factors and interactions into the ‘model’ box.

Step 18: Non-parametric tests, paired comparisons

Mann-Whitney U-test: The data need to be organised with the data for treatment and control in different columns, using the data from Table 1. Go to Statistics – Nonparametrics – Mann-Whitney. This opens a new window. Enter the column identifiers for the treatment data and control data in the respective boxes. Click on OK. The output appears as follows.

Mann-Whitney Test and CI: Treatment, Control

    N  Median
T  10  1.5650
C   5  1.0000

Point estimate for ETA1-ETA2 is 0.5600
95.7 Percent CI for ETA1-ETA2 is (0.2599, 0.7799)
W = 105.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0027
The test is significant at 0.0026 (adjusted for ties)

The p value for the test is shown on the bottom line of the output. If this value is < .05 the difference between the columns is significant.

Wilcoxon matched pairs: The data need to be organised as in Table 2. Before running the test it is necessary to subtract one column from the other. To do this go to Calc and select Calculator. Identify the column where you wish the result to be placed (e.g. C10). Put the subtraction calculation into the 'Expression' box; i.e. using the data in Table 2 the expression would be 'C3-C2' (treatment minus control data). Go to Statistics – Nonparametrics – 1-Sample Wilcoxon. This opens a new window. Put the column identifier for the column that contains the differences into the box marked 'Variables'. Click the button marked 'Test median' and enter the value 0.0 into the box if it doesn't appear automatically.


The output appears as follows.

Wilcoxon Signed Rank Test: C10

Test of median = 0.000000 versus median not = 0.000000

      N  N for Test  Wilcoxon Statistic      P  Estimated Median
C10  15          15               108.0  0.007            0.1075

The value for the Wilcoxon statistic and the associated p value are displayed in the table. If the P value is less than 0.05, as it is in this case, there is a significant difference between the treatment and control.
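The Wilcoxon statistic reported above is the sum of the ranks of the absolute differences that carry one chosen sign (here, positive). A minimal Python sketch with hypothetical paired data, including the average-rank handling of ties:

```python
control = [1.0, 1.2, 1.1, 1.4, 1.3, 1.5]      # hypothetical paired data
treatment = [1.3, 1.1, 1.5, 1.6, 1.6, 1.9]

# signed differences; rounding avoids spurious floating-point tie breaks
d = [round(t - c, 10) for t, c in zip(treatment, control)]
d = [x for x in d if x != 0]                   # zero differences are dropped

order = sorted(range(len(d)), key=lambda i: abs(d[i]))
ranks = [0.0] * len(d)
i = 0
while i < len(order):                          # assign average ranks to ties
    j = i
    while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
        j += 1
    for k in range(i, j + 1):
        ranks[order[k]] = (i + j) / 2 + 1
    i = j + 1

w_plus = sum(r for x, r in zip(d, ranks) if x > 0)   # Wilcoxon statistic
```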

Step 19: Non-parametric analysis when there are multiple treatments or levels

Kruskal-Wallis ANOVA

Go to Statistics – select Nonparametrics – select Kruskal-Wallis. In the response box enter the column identifier for the dependent variable (e.g. metabolism) and in the factor box type the identifier for the treatment variable. Click OK. The output appears as follows.

Kruskal-Wallis Test: Metabolism versus Treatment

Kruskal-Wallis Test on Metabolism

Treatment   N  Median  Ave Rank      Z
1          10   1.565      10.5   3.06
2           5   1.000       3.0  -3.06
Overall    15               8.0

H = 9.38  DF = 1  P = 0.002
H = 9.41  DF = 1  P = 0.002 (adjusted for ties)

The significance is indicated on the last line as a P value. If P < .05 there is a significant treatment effect.

Friedman test: To apply the Friedman test in Minitab the data need to be structured in a different way. All the data need to be in a single column, with the individual identifiers in a separate column and the treatments in a third column. To generate these columns from the data in Table 2 you can use the 'Stack' command: go to Data, <select> Stack and then select Columns from the options that appear. In the first box type the column identifiers for the two sets of metabolism data (C2 and C3), then click the button 'column of current worksheet' and type in the number of the column where you want the stacked data to be stored (e.g. C10). In the store subscripts box type another column name (e.g. C11); this will identify which values correspond to treatment and which to control. To get the corresponding individual IDs type C1 C1 in the 'stack the following columns' box; this will give you the individual data in a third new column (e.g. C12). To perform the Friedman test choose Statistics – Nonparametrics – Friedman. In the box that opens type the column identifier for the stacked metabolism data (in the above case C10). In the treatment box type the column identifier for the subscripts (C11) and in the blocks box type the column identifying the individual IDs (C12). The output is as follows.

Friedman Test: metabolism versus treatment blocked by IDs

S = 5.40  DF = 1  P = 0.020

                           Sum of
trtment   N  Est Median     Ranks
1        15      1.3900      18.0
2        15      1.4900      27.0

Grand median = 1.4400

The result and P value are on the first line of the output.
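The S statistic on the first line can be reproduced from the rank sums in the table: each individual is ranked across the treatments, and the column rank sums feed a chi-square-type formula. A check in Python using the values from the output above:

```python
# n individuals (blocks), k treatments, and the rank sums from the
# Friedman output above (18.0 and 27.0)
n, k = 15, 2
rank_sums = [18.0, 27.0]

# S = 12 / (n*k*(k+1)) * sum(Rj^2) - 3*n*(k+1), tested on k-1 df
S = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
df = k - 1
```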

Step 23: Analysis of covariance (ANCOVA)

Data should be organised as shown in Table 1. Go to Statistics, <select> ANOVA, <select> general linear model. In the box labelled response, add the column identifier for the dependent variable (e.g. metabolism, or column 2 in Table 1). In the box labelled 'model' add the column identifiers for the treatment variable and also the covariate (i.e. body mass) and the treatment by covariate interaction. E.g., using the data in Table 1 as an example, the model is specified as: C3 C4 C3*C4. It is necessary to declare that body weight is a covariate in the model. To do this click on the box labelled 'covariates' and then type the column identifier for the covariate (in this case C3) into the box. Close this box and then click on OK to run the analysis. The output appears as follows.

General Linear Model: Metabolism versus Levels

Factor  Type   Levels  Values
Levels  fixed       3  1, 2, 3

Analysis of Variance for Metabolism, using Adjusted SS for Tests

Source            DF   Seq SS   Adj SS   Adj MS     F      P
Body Mass          1  0.07801  0.07686  0.07686  5.64  0.042
Levels             2  1.32527  0.05598  0.02799  2.06  0.184
Levels*Body Mass   2  0.03604  0.03604  0.01802  1.32  0.314
Error              9  0.12257  0.12257  0.01362
Total             14  1.56189

S = 0.116699   R-Sq = 92.15%   R-Sq(adj) = 87.79%

Term                 Coef  SE Coef      T      P
Constant          -0.3954   0.7367  -0.54  0.604
Body Mass         0.07448  0.03135   2.38  0.042
Body Mass*Levels
  1              -0.05786  0.04214  -1.37  0.203
  2              -0.01052  0.04386  -0.24  0.816

The significance of the different effects is shown in the ANOVA table. In this case body mass was a significant covariate (p = 0.042), and no significant effect of treatment or interaction between body mass and treatment level was found (p > 0.05). Note: see the results under Part 14, where treatment levels were shown to significantly affect metabolism. The analysis of covariance here suggests that these effects on metabolism might be explained by differences in body mass between individuals, and are thus not caused by the treatment. However, to perform the final analysis of these data the analysis needs to be repeated omitting the non-significant interaction effect from the model, i.e. the model should be respecified as: C3 C4. This should only be done when the interaction term is NOT significant. In this analysis keep C3 as a covariate. The revised output is as follows.

General Linear Model: Metabolism versus Levels

Factor  Type   Levels  Values
Levels  fixed       3  1, 2, 3

Analysis of Variance for Metabolism, using Adjusted SS for Tests

Source     DF   Seq SS   Adj SS   Adj MS      F      P
Body Mass   1  0.07801  0.06147  0.06147   4.26  0.063
Levels      2  1.32527  1.32527  0.66264  45.96  0.000
Error      11  0.15861  0.15861  0.01442
Total      14  1.56189

S = 0.120078   R-Sq = 89.85%   R-Sq(adj) = 87.08%

Term          Coef  SE Coef      T      P
Constant   -0.1899   0.7467  -0.25  0.804
Body Mass  0.06559  0.03177   2.06  0.063

This revised analysis excluding the interaction term shows that consistent with the data analysis in section 14 there is a significant treatment effect and an effect of body mass that just fails to reach statistical significance (p = 0.063). This emphasises the critical importance of re-running such analyses excluding non-significant interaction terms.


Step 37: Correlation matrix

Select statistics <select> display basic statistics and <select> correlation. In the variables box enter the column identifiers for all the predictor variables that you want to correlate together. The output is a correlation matrix.

A typical matrix might be as follows for a situation where 5 organ weights are available.

Organ            Liver   WAT  Brain  Skeletal muscle   BAT
Liver             1.00  0.32   0.13             0.66  0.22
WAT                     1.00   0.17             0.55  0.93
Brain                          1.00             0.11  0.03
Skeletal muscle                                 1.00  0.61

This table highlights that WAT and BAT are highly correlated and hence not independent predictor variables. Skeletal muscle also is quite highly correlated to the liver mass. One can proceed with the analysis ignoring these effects but one should be aware that such correlations may compromise the outcome. In this case a strong effect of WAT might emerge because of the effect of BAT on metabolism combined with the high correlation of WAT with BAT. This analysis requires the number of observations (i.e., individuals) to exceed by at least a factor of 3 the number of predictor variables included into the analysis. Hence in this situation one would have 5 predictors so for each group (i.e., treatment levels and control) one would need at least 15 individuals – and preferably many more. Interpretation of these effects depends on the complexity of the interactions. The bottom line is to diagnose an overall treatment effect controlling for these body composition variables. If there is an overall treatment effect one can establish where this occurs using the multiple range tests (TUKEY TEST and DUNCAN’S MULTIPLE RANGE TEST).
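Each entry in such a matrix is a Pearson correlation coefficient: the covariance of the two variables divided by the product of their standard deviations. A minimal Python sketch with hypothetical organ masses:

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation: covariance over the product of SDs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

liver = [1.1, 1.3, 1.2, 1.5, 1.4]   # hypothetical organ masses
wat = [0.5, 0.6, 0.7, 0.9, 0.8]
r = pearson(liver, wat)
```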

Step 38: PRINCIPAL COMPONENTS ANALYSIS.

To perform a principal components analysis the data for the individual organ weights need to be organised such that the organ weights are in separate columns and the organ weights for a given individual are in a single row. An example set of data is included in Appendix 1. These data are 17 organ weights in grams from 30 rats. The original data were published in Selman et al. (2008). To perform a principal components analysis on these data select Statistics – Multivariate – Principal Components. In the box labelled variables type the column identifiers for the 17 organs (e.g. C1-C17). In the box labelled 'number of components to compute' type 5. This will restrict the analysis to calculate only the first 5 components. Otherwise, if this is left blank, the analysis will compute n components, where n is the original number of columns entered into the analysis. Click on the button labelled 'Storage' and in the new window that opens type column identifiers for the same number of columns that you asked the program to compute; e.g. if you asked it to compute 5 components then type 5 column identifiers, for example C18-C22. Click OK to close the new window. Click OK to run the analysis. The output looks as follows:

Principal Component Analysis: Carcass, HEART, LIVER, KIDNEY, BRAIN, Brown Fat, ...

Eigenanalysis of the Correlation Matrix

28 cases used, 2 cases contain missing values

Eigenvalue  7.6875  3.4298  1.6138  1.0599  0.7983  0.6485  0.5503  0.3541
Proportion   0.452   0.202   0.095   0.062   0.047   0.038   0.032   0.021
Cumulative   0.452   0.654   0.749   0.811   0.858   0.896   0.929   0.950

Eigenvalue  0.2827  0.1803  0.1250  0.1029  0.0749  0.0463  0.0228  0.0187
Proportion   0.017   0.011   0.007   0.006   0.004   0.003   0.001   0.001
Cumulative   0.966   0.977   0.984   0.990   0.995   0.997   0.999   1.000

Eigenvalue  0.0044
Proportion   0.000
Cumulative   1.000

Variable                 PC1     PC2     PC3     PC4     PC5
Carcass                0.333  -0.107  -0.071   0.141   0.077
HEART                  0.236   0.286   0.052  -0.206   0.378
LIVER                  0.283  -0.215   0.017   0.016   0.100
KIDNEY                 0.288   0.270   0.089   0.034  -0.163
BRAIN                  0.134   0.371   0.180   0.087   0.014
Brown Fat             -0.231  -0.118   0.485  -0.020  -0.180
Abdominal Fat          0.225  -0.333   0.140  -0.326  -0.118
Gonadal Fat            0.268  -0.255   0.110  -0.279  -0.280
Mesenteric Fat         0.287  -0.097   0.244  -0.196   0.204
Gonads                 0.139  -0.302  -0.147   0.451   0.502
Large Intestine (g)    0.314  -0.014  -0.098  -0.009   0.209
Small Intestine (g)    0.149   0.058   0.574   0.128   0.066
Stomach                0.133   0.318   0.339   0.357  -0.030
Lungs                 -0.019   0.372  -0.114  -0.569   0.296
Pancreas               0.276   0.120  -0.252   0.066  -0.349
Pelage                 0.328  -0.153  -0.042  -0.055  -0.194
Tail                   0.229   0.280  -0.267   0.164  -0.317

At the top of the output, the note reminds us that for this analysis to run it is necessary to have complete data for all animals. If the data for a given animal are incomplete, that animal is excluded from the analysis. Beneath this is a table containing 17 sets of values labelled Eigenvalue, Proportion and Cumulative. These are the proportions of the original variance contained in the 17 computed components, ordered by size. Hence the first principal component explains 45.2% of the original variation. The eigenvalue is a representation of how much better this variable is at describing the variance compared with the original variables. As there were 17 original variables, they each contain 1/17th of the total variation (p = 0.0588). Since this new variable contains p = 0.452 of the variation, it is 0.452/0.0588 = 7.68x better than the original variables at describing the data. Another way of thinking about the eigenvalue is that it is the number of original variables that the current variable is 'worth'. The second principal component in this case has an eigenvalue of 3.43 and explains 20.2% of the variation, so the cumulative variance explained by components 1 and 2 is 65.4%.


Looking at this table you can see that beyond the 4th principal component the eigenvalue falls below 1, so these components explain less than the original variables do. Moreover, the first 4 components together explain 81% of the original variation. This means that looking at just the first 4 components retains 81% of the original information, but in just 4 as opposed to 17 variables.

Below the eigenvalues and variance table is a second table showing each of the original variables alongside the new principal components (PC1 to PC5). The values in this table are ‘eigenvectors’ that show the strength and direction of the association between the original variable and the new component. As you can see almost all the variables affect PC1 in a positive way and so it reflects an overall size component, while PC2 is negatively affected by all the body fat components so it is a reflection of leanness of the animals. We can use the ‘scores’ on these principal components in a general linear model (see above) in place of the original organ masses. The major advantage of this is that these principal components are by definition completely independent of each other. This makes their use in the general linear model more statistically valid.
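The relationship between correlations and eigenvalues is easiest to see with only two variables: the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r, so the first component carries (1 + r)/2 of the variance. A toy Python check (r = 0.8 is an arbitrary illustrative value):

```python
r = 0.8                       # hypothetical correlation between two variables
eigenvalues = [1 + r, 1 - r]  # closed form for a 2x2 correlation matrix
total = sum(eigenvalues)      # equals the number of variables (here 2)
proportions = [e / total for e in eigenvalues]
```

The more strongly the variables are correlated, the more of the total variance the first component absorbs, which is exactly why the 17 correlated organ weights above compress so well into a few components.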


R

Step 1: Normality tests

The following code will perform the Anderson-Darling and the Shapiro-Wilk tests for normality on the Metabolism variable in the Table1 data frame. To perform the Anderson-Darling test using the ad.test() function, the "nortest" package must be installed. On most systems the install.packages() function can be used to install packages; otherwise, there are package installation wizards associated with all script editors. To use the "nortest" package, it must be loaded using the library() function.

Anderson-Darling test

install.packages("nortest")
library(nortest)
ad.test(Table1$Metabolism)

        Anderson-Darling normality test

data:  Table1$Metabolism
A = 0.2395, p-value = 0.732

Shapiro-Wilk test

shapiro.test(Table1$Metabolism)

        Shapiro-Wilk normality test

data:  Table1$Metabolism
W = 0.9566, p-value = 0.6333

If the p-value is less than 0.05, then the distribution of the data differs significantly from normal. If the p-value is greater than 0.05, then the data can be considered normally distributed. Normality Q-Q plots can be made for the Metabolism variable in the Table1 data frame using the qqnorm() function. A Q-Q line that reflects what would be expected if the distribution is normal can be added using the qqline() function. Note that to make this plot with the observed values on the x-axis, datax = TRUE must be specified in both the qqnorm() and qqline() functions.

qqnorm(Table1$Metabolism, datax = TRUE, xlab = "Expected Normal", ylab = "Observed Values")

qqline(Table1$Metabolism, datax = TRUE)


Step 3: Attempt to normalise the distribution by transforming it.

New log10 or square root transformed variables can be added to the data frame using the log10() and sqrt() functions, respectively. Transformed variables (e.g. log10Metabolism or sqrtMetabolism) can be analyzed in the steps below.

Table1$log10Metabolism <- log10(Table1$Metabolism)
Table1$sqrtMetabolism <- sqrt(Table1$Metabolism)

Alternatively, data may be transformed using the Box-Cox procedure. Prior to performing a Box-Cox transformation, the "MASS" library needs to be loaded using the library() function (the "MASS" package comes with the installation of R). To perform the Box-Cox transformation in R, an ANOVA or ANCOVA model needs to be specified using either the lm() or aov() functions (e.g. aov(Metabolism ~ Levels, data = Table1)). The boxcox() function calculates the log-likelihood of a sequence of lambda values (λ), attempting to normalise the residuals of the linear model that is specified. The default of the boxcox() function is to calculate log-likelihood values for λ values between -2 and 2 at 0.1 intervals. In this example, the plateau of the log-likelihood function peaks outside these λ values; therefore, in the function below, log-likelihood values between -5 and 5 at 0.1 intervals are calculated using the seq() function. The default of the boxcox() function is to plot the log-likelihood values against the λ values, with the 95% confidence interval of λ values. In this example, the results of the boxcox() function will be placed in a list that we have arbitrarily called bc. This list contains two vectors: (1) a vector of the

λ values used between -5 and 5 (this vector is called x and it can be seen with the command bc$x), and (2) a vector of all the log-likelihood values calculated from the λ values (this vector is called y and it is shown below rounded to two decimal places using the round() function). The max() function is used to determine the largest log-likelihood value. The which.max() function is used to determine the position of the maximum log-likelihood value in the log-likelihood vector. In this case, the 77th value in the log-likelihood vector was the maximum value. The vector notation bc$x[] will be used to output the λ value in the position corresponding to the maximum log-likelihood value; this value will be called Lambda. Finally, values transformed by the Box-Cox transformation, yi^(λ), are calculated using the following formulas: (1) if the maximum log-likelihood λ ≠ 0: yi^(λ) = (yi^λ – 1) / λ (shown below); (2) if the maximum log-likelihood λ = 0: yi^(λ) = loge(yi), specified using the log() function in R. This transformation is then applied to the dependent variable (i.e. Metabolism) and then the ANOVA or ANCOVA model is re-run using the Box-Cox transformed dependent variable.

library(MASS)
bc <- boxcox(aov(Metabolism ~ Levels, data = Table1), lambda = seq(-5, 5, 0.1))


bc$x
  [1] -5.0 -4.9 -4.8 -4.7 -4.6 -4.5 -4.4 -4.3 -4.2 -4.1 -4.0 -3.9 -3.8 -3.7 -3.6
 [16] -3.5 -3.4 -3.3 -3.2 -3.1 -3.0 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1
 [31] -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6
 [46] -0.5 -0.4 -0.3 -0.2 -0.1  0.0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
 [61]  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4
 [76]  2.5  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9
 [91]  4.0  4.1  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.0

round(bc$y, digits = 2)
  [1] -31.68 -30.83 -29.98 -29.13 -28.28 -27.44 -26.60 -25.76 -24.93 -24.10
 [11] -23.27 -22.45 -21.62 -20.81 -19.99 -19.18 -18.37 -17.56 -16.76 -15.97
 [21] -15.17 -14.38 -13.59 -12.81 -12.03 -11.26 -10.49  -9.72  -8.96  -8.20
 [31]  -7.45  -6.70  -5.96  -5.23  -4.49  -3.77  -3.05  -2.33  -1.63  -0.93
 [41]  -0.23   0.45   1.13   1.80   2.47   3.12   3.76   4.40   5.02   5.63
 [51]   6.23   6.81   7.38   7.94   8.48   9.01   9.52  10.01  10.48  10.93
 [61]  11.35  11.76  12.14  12.50  12.83  13.14  13.42  13.67  13.89  14.09
 [71]  14.26  14.40  14.51  14.60  14.65  14.69  14.69  14.67  14.63  14.56
 [81]  14.47  14.36  14.23  14.09  13.92  13.74  13.54  13.33  13.10  12.86
 [91]  12.61  12.35  12.08  11.80  11.51  11.21  10.91  10.59  10.27   9.95
[101]   9.62

max(bc$y)
[1] 14.6913

which.max(bc$y)
[1] 77

Lambda <- bc$x[which.max(bc$y)]
Lambda

[1] 2.6

Table1$MetabolismBC <- (Table1$Metabolism^Lambda) / Lambda
summary(Table1$MetabolismBC)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1157  0.5230  0.8232  0.9376  1.4030  1.7990
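The two cases of the Box-Cox formula can be wrapped in a small helper function. This is only a sketch: the name box_cox_transform is ours, not part of the “MASS” package. Note that the one-line transformation above omits the “– 1” shift of the formula; because subtracting and dividing by constants are linear operations, the two versions give identical ANOVA results.

```r
# Sketch of the two Box-Cox cases; box_cox_transform is our own
# helper name, not a function from the "MASS" package.
box_cox_transform <- function(y, lambda) {
  if (lambda != 0) {
    (y^lambda - 1) / lambda  # case (1): lambda != 0
  } else {
    log(y)                   # case (2): lambda == 0, natural log
  }
}

box_cox_transform(2, 2)       # (2^2 - 1) / 2 = 1.5
box_cox_transform(exp(1), 0)  # log(e) = 1
```

The helper could then be applied as Table1$MetabolismBC <- box_cox_transform(Table1$Metabolism, Lambda).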

R


Step 7: Paired comparisons of parametric normally distributed data

For two sample t-test
Using the data presented in Table 1, the difference between the Treatments can be evaluated using the t.test() function. In this function, a two-sided test is the default comparison between the Treatments. If equal variances in the two Treatments are assumed, then use the following code:

t.test(Metabolism ~ Treatment, data = Table1, var.equal = TRUE)

If equal variances in the two treatments are not assumed, then use the following code:

t.test(Metabolism ~ Treatment, data = Table1, var.equal = FALSE)

The output from the t.test() function where equal variances in the two treatments are not assumed is:

Welch Two Sample t-test
data: Metabolism by Treatment
t = 5.1023, df = 7.771, p-value = 0.001014
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3066951 0.8173049
sample estimates:
mean in group 1 mean in group 2
          1.538           0.976

The t-value, df, and p-value are shown on the same line. If the p-value < 0.05 then the difference between the two Treatments is significant. In this case, Metabolism in group 1 was significantly greater than Metabolism in group 2.

If you want to output the standard deviation and the standard error (standard deviation / square root of the number of samples) for both groups, then use the tapply() function.

tapply(Table1$Metabolism, Table1$Treatment, sd)
        1         2
0.1950954 0.2040343

tapply(Table1$Metabolism, Table1$Treatment, sd) /
sqrt(tapply(Table1$Metabolism, Table1$Treatment, length))
         1          2
0.06169459 0.09124692


For Paired t-test
This analysis uses the data provided in Table 2. To run this analysis, the data in Table 2 needs to be in ‘stacked format’. In the stacked format data frame, there are five columns: 1) ID, 2) Metabolism – all values from the Metabolism Control, Metabolism Treatment 1, and Metabolism Treatment 2 columns, 3) Levels – the Metabolism Control, Metabolism Treatment 1, and Metabolism Treatment 2 values are specified as Metabolism1, Metabolism2, and Metabolism3, respectively, 4) Body Mass, and 5) Sex. In the stacked data, all categorical variables (ID and Sex) must be specified as factors using the as.factor() function. This paired t-test compares Metabolism between the Levels Metabolism1 and Metabolism2 (the Metabolism3 values are removed using the subset argument with the “not equal” operator !=).

Table2stack <- read.table("Table2b.txt", header = TRUE)
Table2stack$Sex <- as.factor(Table2stack$Sex)
Table2stack$ID <- as.factor(Table2stack$ID)
t.test(Metabolism ~ Levels, data = Table2stack, subset = Table2stack$Levels != "Metabolism3", paired = TRUE, var.equal = FALSE)

Paired t-test
data: Metabolism by Levels
t = -3.1122, df = 14, p-value = 0.007644
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.20269722 -0.03730278
sample estimates:
mean of the differences
                  -0.12

In this case, the difference between the Metabolism1 and Metabolism2 Levels is highly significant (i.e. P < .01). To calculate the mean within each Level, use the tapply() function.

tapply(Table2stack$Metabolism, Table2stack$Levels, mean)
Metabolism1 Metabolism2 Metabolism3
   1.350667    1.470667    1.402000

Step 8: Multiple treatment levels, parametric tests

R

For One way ANOVA
Using the data from Table 1, to compare the Control, Treatment1, and Treatment2 Levels, use the aov() function.


summary(aov(Metabolism ~ Levels, data = Table1))
            Df  Sum Sq Mean Sq F value    Pr(>F)
Levels       2 1.34181 0.67091  36.582 7.827e-06 ***
Residuals   12 0.22008 0.01834
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The p-value in this summary table is under the Pr(>F) heading. In this case, the difference between Levels is highly significant (i.e. P < .01).

Repeated-measures ANOVA
This analysis uses the data provided in Table 2 and compares Metabolism measured under three conditions (Levels: Metabolism1, Metabolism2, and Metabolism3). To run this analysis, the data in Table 2 needs to be in ‘stacked format’ – i.e. all the Metabolism data need to be in a single column, with four other columns identifying the Levels (Metabolism1, Metabolism2, and Metabolism3), Body Mass, Sex, and individual IDs. Repeated measures ANOVA can be performed in R using either the lme() or the aov() functions. The random effect of ID is included in the lme() function by adding “random = ~1|ID” as a separate argument, whereas in the aov() function it is included by adding “+ Error(ID)” in the formula. In order to use the lme() function, the “nlme” package must be installed and loaded using the library() function.

install.packages("nlme")
library(nlme)
anova(lme(Metabolism ~ Levels, random = ~1|ID, data = Table2stack))

            numDF denDF  F-value p-value
(Intercept)     1    28 445.5459  <.0001
Levels          2    28   3.7787  0.0353

summary(aov(Metabolism ~ Levels + Error(ID), data = Table2stack))

Error: ID
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 14 2.8023 0.20017

Error: Within
          Df  Sum Sq  Mean Sq F value  Pr(>F)
Levels     2 0.10875 0.054376  3.7787 0.03525 *
Residuals 28 0.40292 0.014390
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


The p-value in this summary table is under the Pr(>F) heading. The difference between Levels is significant (i.e. P < .05). The significance of the random effect is not provided by either the lme() or the aov() functions, but the F-value can be calculated from the aov() output by dividing the Mean Square of the ID effect (0.20017) by the Mean Square Residual (0.014390). Using this F-value, the p-value can be calculated from the F distribution with degrees of freedom equal to 14 and 28 using the pf() function. In this case, the random effect of ID is highly significant (P < 0.0001).

pf(0.20017 / 0.014390, df1 = 14, df2 = 28, lower.tail = FALSE)
[1] 4.51458e-09

Step 10: post hoc tests

For One way ANOVA
The one-way ANOVA in step 8 suggested that Metabolism was significantly different among the Levels in Table 1. Tukey's post-hoc test can be used to determine which Levels were significantly different from each other, using the TukeyHSD() function.

TukeyHSD(aov(Metabolism ~ Levels, data = Table1))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Metabolism ~ Levels, data = Table1)

$Levels
      diff        lwr        upr     p adj
2-1 -0.340 -0.5685037 -0.1114963 0.0048896
3-1 -0.732 -0.9605037 -0.5034963 0.0000053
3-2 -0.392 -0.6205037 -0.1634963 0.0017004

The Levels that are being compared are given in the left-most column. Based on the p-values (p adj column), all three Levels differ significantly from each other (P < 0.005).

Repeated-measures ANOVA
The repeated measures ANOVA in step 8 suggested that Metabolism differed among the Metabolism1, Metabolism2, and Metabolism3 Levels. This analysis determines which Levels were significantly different from each other. The data in Table 2 needs to be in ‘stacked format’ for this analysis. Post-hoc tests following repeated measures ANOVA can be performed on lme() function objects (see Step 8 for the installation of the "nlme" package for the


lme() function), but not on aov() function objects. To proceed, the “multcomp” package must be installed and loaded using the library() function, and the glht() function is used.

library(nlme)
install.packages("multcomp")
library(multcomp)
summary(glht(lme(Metabolism ~ Levels, random = ~1|ID, data = Table2stack), linfct = mcp(Levels = "Tukey")))

Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: lme.formula(fixed = Metabolism ~ Levels, data = Table2stack, random = ~1 | ID)

Linear Hypotheses:
                               Estimate Std. Error z value Pr(>|z|)
Metabolism2 - Metabolism1 == 0  0.12000    0.04380   2.740    0.017 *
Metabolism3 - Metabolism1 == 0  0.05133    0.04380   1.172    0.470
Metabolism3 - Metabolism2 == 0 -0.06867    0.04380  -1.568    0.260
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

The Levels that are being compared are given in the left-most column. Based on the p-values (Pr(>|z|) column), Metabolism2 differed significantly from Metabolism1 (P = 0.017), but none of the other Levels differed significantly from each other (P > .05).

Step 11: Power analysis

To perform power analyses, the “pwr” package must be installed using the install.packages() function. To use the functions in the “pwr” package, it must be loaded using the library() function. To perform a power analysis on a two sample t-test using the pwr.t2n.test() function, you need to input the sample sizes of the two samples (i.e. n1 = 10 and n2 = 10; note that the default significance level is set to 0.05). To specify the effect size d in R, the absolute difference that you would consider important to detect (here, the mean multiplied by the % difference) is divided by the standard deviation; thus, using the same mean, % difference, and standard deviation as in the MINITAB example: d = 1.368 * 0.05 / 0.133.


install.packages("pwr")
library(pwr)
pwr.t2n.test(n1 = 10, n2 = 10, d = 1.368 * 0.05 / 0.133)

     t test power calculation

             n1 = 10
             n2 = 10
              d = 0.5142857
      sig.level = 0.05
          power = 0.1931212
    alternative = two.sided

As outlined in the MINITAB example, the pwr.t.test() function can be used to calculate the sample sizes required to obtain a specified level of power (e.g. power = 0.8).

pwr.t.test(power = 0.8, d = 1.368 * 0.05 / 0.133, type = "two.sample")

     Two-sample t test power calculation

              n = 60.32651
              d = 0.5142857
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Step 13: Two-way ANOVA

Two-way ANOVA
This analysis uses the data in Table 1. In order to use Type III sums-of-squares, as is done in MINITAB and SPSS, the “car” package must be installed and loaded using the library() function. Two-way ANOVAs can be analyzed using the aov() function. The aov() function is nested within the Anova() function, specifying that we want the analysis to use type III sums-of-squares (i.e. type = "III"). R offers a number of different ways by which factor levels can be compared (i.e. contrasted, in statistical terms). Treatment contrasts are the default method in R; however, this type of contrast is not valid for two-way ANOVAs using type III sums-of-squares. Helmert and sum contrasts are two types of contrasts that are valid for this type of ANOVA. Sum contrasts are used in the command below by specifying "contr.sum" via the contrasts option in the options() function. The Anova() function will output F-values, p-values, and the other values that are used to calculate them. In this example, the


Levels:Sex interaction was non-significant (F2,9 = 1.3, P = 0.32), and thus it can be removed from the model. The model without the interaction suggests that there is a strong effect of Levels (F2,11 = 39.5, P < 0.0001), but that the effect of Sex is non-significant (F1,11 = 1.96, P = 0.19).

install.packages("car")
library(car)
options(contrasts = c("contr.sum", "contr.poly"))
Anova(aov(Metabolism ~ Levels + Sex + Levels:Sex, data = Table1), type = "III")

Anova Table (Type III tests)

Response: Metabolism
             Sum Sq Df   F value    Pr(>F)
(Intercept) 26.6451  1 1654.4056 1.634e-11 ***
Levels       1.2104  2   37.5774 4.278e-05 ***
Sex          0.0333  1    2.0648    0.1846
Levels:Sex   0.0419  2    1.3000    0.3192
Residuals    0.1450  9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Anova(aov(Metabolism ~ Levels + Sex, data = Table1), type = "III")

Anova Table (Type III tests)

Response: Metabolism
             Sum Sq Df  F value    Pr(>F)
(Intercept) 26.6451  1 1568.824 3.221e-13 ***
Levels       1.3418  2   39.502 9.533e-06 ***
Sex          0.0333  1    1.958    0.1893
Residuals    0.1868 11
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Repeated-measures two-way ANOVA
This analysis examines the effect of Levels and Sex on Metabolism, including a random factor of ID. This analysis can be performed using either the lme() or the aov() functions; however, only the commands for the aov() function will be shown. The random effect of ID is included in the aov() function by adding “+ Error(ID)” in the formula. For this analysis, the data in Table 2 needs to be in ‘stacked format’. The “Levels * Sex” notation specifies that both the main effects of Levels and Sex are tested in addition to their interaction. The Levels:Sex interaction was not significant (F2,26 = 0.06, P = 0.94), and thus it was removed from the model. There was a significant difference in Metabolism among the Levels (F2,28 = 3.78, P = 0.04); however, Sex did not significantly affect Metabolism (F1,13 = 0.33, P = 0.58).


summary(aov(Metabolism ~ Levels * Sex + Error(ID), data = Table2stack))

Error: ID
          Df  Sum Sq  Mean Sq F value Pr(>F)
Sex        1 0.06848 0.068481  0.3256  0.578
Residuals 13 2.73383 0.210295

Error: Within
           Df  Sum Sq  Mean Sq F value  Pr(>F)
Levels      2 0.10875 0.054376  3.5258 0.04417 *
Levels:Sex  2 0.00193 0.000967  0.0627 0.93936
Residuals  26 0.40098 0.015422
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

summary(aov(Metabolism ~ Levels + Sex + Error(ID), data = Table2stack))

Error: ID
          Df  Sum Sq  Mean Sq F value Pr(>F)
Sex        1 0.06848 0.068481  0.3256  0.578
Residuals 13 2.73383 0.210295

Error: Within
          Df  Sum Sq  Mean Sq F value  Pr(>F)
Levels     2 0.10875 0.054376  3.7787 0.03525 *
Residuals 28 0.40292 0.014390
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Step 14: Proceed in the same way as for the two-way ANOVA in step 13, but with additional factors added to the model formula
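As a sketch of what this extension might look like, the toy data frame below invents a third factor (Diet) alongside Levels and Sex; none of these values come from Table 1, they are made up purely for illustration.

```r
# Toy data invented for illustration; Diet is a hypothetical third factor.
set.seed(1)
toy <- data.frame(
  Metabolism = rnorm(24, mean = 1.2, sd = 0.2),
  Levels     = factor(rep(1:3, each = 8)),
  Sex        = factor(rep(c("M", "F"), times = 12)),
  Diet       = factor(rep(c("A", "B"), each = 4, times = 3))
)

# "Levels * Sex * Diet" expands to all main effects plus every
# two- and three-way interaction:
fit <- aov(Metabolism ~ Levels * Sex * Diet, data = toy)
attr(terms(fit), "term.labels")
```

As in step 13, the fitted model can then be passed to Anova(fit, type = "III") from the “car” package, with sum contrasts set beforehand via options().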

Step 18: Non-parametric tests, paired comparisons

Mann-Whitney U / two-sample Wilcoxon test
This analysis compares Metabolism between the two Treatments in Table 1. The Mann-Whitney U test can also be called a two-sample Wilcoxon test. To determine the median of each Treatment, use the tapply() function.


tapply(Table1$Metabolism, Table1$Treatment, median)
    1     2
1.565 1.000

wilcox.test(Metabolism ~ Treatment, data = Table1, conf.int = TRUE)

Wilcoxon rank sum test with continuity correction
data: Metabolism by Treatment
W = 50, p-value = 0.002647
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 0.2600396 0.7799804
sample estimates:
difference in location
             0.5543361

R gives a warning for this analysis that exact p-values and confidence intervals cannot be calculated because of ties amongst the Metabolism values; the p-value and the 95% confidence interval are therefore corrected for ties. The p-value suggests that Metabolism is significantly different between the two Treatments (i.e. P < .05).

Wilcoxon matched pairs
This analysis uses the wilcox.test() function to compare the Metabolism Control and Metabolism Treatment1 columns in Table 2, assuming that these values come from the same individual (paired = TRUE). To proceed with this analysis, the Table 2 data need to be imported using: Table2 <- read.table("Table2.txt", header = TRUE).

wilcox.test(Table2$Control, Table2$Treatment1, paired = TRUE, conf.int = TRUE)

Wilcoxon signed rank test with continuity correction
data: Table2$Control and Table2$Treatment1
V = 12, p-value = 0.006945
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -0.20504427 -0.03504422
sample estimates:
(pseudo)median
    -0.1072468

Again, R warns that exact p-values and confidence intervals cannot be calculated because of ties amongst the Metabolism values, so the p-value and the 95% confidence interval are corrected for ties. This analysis suggests that there is a significant difference between Control and Treatment1.


Step 19: Non-parametric analysis when there are multiple treatments or levels

Kruskal-Wallis ANOVA
This analysis compares Metabolism between the Treatments in Table 1 using the kruskal.test() function. The Kruskal-Wallis statistic (χ2) and the p-value adjusted for ties are given on the last line. A p-value of < .05 suggests that Metabolism is significantly different between the Treatments.

kruskal.test(Metabolism ~ Treatment, data = Table1)

Kruskal-Wallis rank sum test
data: Metabolism by Treatment
Kruskal-Wallis chi-squared = 9.4086, df = 1, p-value = 0.00216

Repeated measures Friedman test
This analysis compares Metabolism between the Levels in Table 2. The data in Table 2 needs to be in ‘stacked format’, and this analysis will focus on comparing Metabolism between Levels Metabolism1 and Metabolism2. The subset() function is used to omit the Metabolism3 Level and create a new data frame called Table2stacksub. The as.factor(as.character()) functions are required in this case because, without them, R considers there to be zero individuals in the Metabolism3 Level, which causes an error with the Friedman test.

Table2stacksub <- subset(Table2stack, Table2stack$Levels != "Metabolism3")
Table2stacksub$Levels <- as.factor(as.character(Table2stacksub$Levels))
friedman.test(Table2stacksub$Metabolism, Table2stacksub$Levels, Table2stacksub$ID)

Friedman rank sum test
data: Table2stacksub$Metabolism, Table2stacksub$Levels and Table2stacksub$ID
Friedman chi-squared = 5.4, df = 1, p-value = 0.02014

Note that the argument order of friedman.test() is (y, groups, blocks), so the treatment factor (Levels) must come before the blocking factor (ID).

The significance of this test is indicated on the last line as a p-value. The p-value is < .05 in this case suggesting that Metabolism differs significantly between the Metabolism1 and Metabolism2 Levels taking into account that these Levels were examined in the same individuals.


The p-value of the Friedman test can be reproduced from the χ2 statistic and its degrees of freedom using the pchisq() function:

pchisq(5.4, df = 1, lower.tail = FALSE)
[1] 0.02013675

Step 23: Analysis of covariance (ANCOVA)

This analysis uses the data in Table 1. In order to use Type III sums-of-squares, as is done in MINITAB and SPSS, the “car” package must be installed (see Step 13 for installation of the “car” package) and loaded using the library() function. ANCOVAs can be analyzed using the aov() function. In this example, we will first run the aov() function and call the resulting model aov1; naming the model simplifies the notation for the post-hoc Tukey test below. Next, we will use the Anova() function on aov1, specifying that we want the analysis to use type III sums-of-squares (i.e. type = "III"). Remember that the correct contrasts need to be specified in the options() function in order for the results of Anova() to be correct (see the two-way ANOVA in step 13). In this example, the Levels:Body.Mass interaction was non-significant (F2,9 = 1.32, P = 0.31), and thus it can be removed from the model. The ANOVA tables both including (aov1) and excluding (aov2) the Levels:Body.Mass interaction are presented. The interpretation of the analysis excluding the interaction is that there is a strong effect of Levels (F2,11 = 46.0, P < 0.0001), but that the effect of body mass is only a trend, assuming a significance threshold of 0.05 (F1,11 = 4.3, P = 0.06).

library(car)
options(contrasts = c("contr.sum", "contr.poly"))
aov1 <- aov(Metabolism ~ Body.Mass + Levels + Levels:Body.Mass, data = Table1)
Anova(aov1, type = "III")

Anova Table (Type III tests)

Response: Metabolism
                   Sum Sq Df F value  Pr(>F)
(Intercept)      0.003923  1  0.2881 0.60446
Body.Mass        0.076857  1  5.6436 0.04152 *
Levels           0.055980  2  2.0553 0.18399
Body.Mass:Levels 0.036039  2  1.3231 0.31351
Residuals        0.122567  9
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


aov2 <- aov(Metabolism ~ Body.Mass + Levels, data = Table1)
Anova(aov2, type = "III")

Anova Table (Type III tests)

Response: Metabolism
             Sum Sq Df F value    Pr(>F)
(Intercept) 0.00093  1  0.0647   0.80395
Body.Mass   0.06147  1  4.2635   0.06334 .
Levels      1.32527  2 45.9567 4.561e-06 ***
Residuals   0.15861 11
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

To perform a post-hoc test on an ANCOVA, the "multcomp" package must be installed (see Step 10 for the installation of the "multcomp" package). The "multcomp" package must then be loaded using the library() function, and the glht() function is used. In the example below, the glht() function compares the Levels using a Tukey test, based on the aov2 model.

library(multcomp)
summary(glht(model = aov2, linfct = mcp(Levels = "Tukey"), data = Table1))

Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = Metabolism ~ Body.Mass + Levels, data = Table1)

Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0 -0.37673    0.07800  -4.830  0.00123 **
3 - 1 == 0 -0.72806    0.07597  -9.584  < 0.001 ***
3 - 2 == 0 -0.35133    0.07846  -4.478  0.00219 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)

Step 37: Correlation matrix

Assuming that you have already imported a data frame called BodyComp into R, you can assign a new data frame called cors that only contains the organ weights that you want to compare. R uses [row, column] notation to refer to specific values within a data frame. For example, if you want to return the variable in the second row and third column of the Table1 data frame, you would type:


Table1[2, 3]
[1] 22.3

Because we want to have all rows from the BodyComp data frame in the cors data frame, we leave the row position in the [row, column] notation blank. In the column position, we include all the variable names that we want as a vector (i.e. vectors are specified using the c() notation). Finally, to display the correlation matrix using Pearson correlation coefficients, use the cor() function. The “use” argument within the cor() function allows you to deal with missing values (for more information on the “use” argument, see the help page by typing ?cor):

cors <- BodyComp[ , c("Liver", "WAT", "Brain", "Skeletal.Muscle", "BAT")]
cor(cors, method = "pearson", use = "pairwise.complete.obs")

A typical matrix might be as follows for a situation where 5 organ weights are available.

Organ            Liver  WAT   Brain  Skeletal muscle  BAT
Liver            1.00   0.32  0.13   0.66             0.22
WAT                     1.00  0.17   0.55             0.93
Brain                         1.00   0.11             0.03
Skeletal muscle                      1.00             0.61
BAT                                                   1.00

This table highlights that WAT and BAT are highly correlated and hence are not independent predictor variables. Skeletal muscle is also quite highly correlated with liver mass. One can proceed with the analysis ignoring these effects, but one should be aware that such correlations may compromise the outcome. In this case, a strong effect of WAT might emerge because of the effect of BAT on metabolism combined with the high correlation of WAT with BAT. This analysis requires the number of observations (i.e., individuals) to exceed the number of predictor variables included in the analysis by at least a factor of 3. Hence, in this situation one would have 5 predictors, so for each group (i.e., treatment levels and control) one would need at least 15 individuals – and preferably many more. Interpretation of these effects depends on the complexity of the interactions. The bottom line is to diagnose an overall treatment effect controlling for these body composition variables. If there is an overall treatment effect, one can establish where it occurs using the multiple range tests (TUKEY TEST and DUNCAN’S MULTIPLE RANGE TEST). If there are large numbers of high correlations in the matrix, go to step 39; otherwise END.
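The highly correlated pairs can also be listed programmatically. The following sketch scans the upper triangle of a correlation matrix with an arbitrary |r| > 0.7 cut-off; the matrix m below is typed in from the example table above and stands in for cor(cors) computed on real data.

```r
# Correlation matrix from the example table above; in practice this
# would be m <- cor(cors, use = "pairwise.complete.obs").
organs <- c("Liver", "WAT", "Brain", "Skeletal.Muscle", "BAT")
m <- matrix(c(1.00, 0.32, 0.13, 0.66, 0.22,
              0.32, 1.00, 0.17, 0.55, 0.93,
              0.13, 0.17, 1.00, 0.11, 0.03,
              0.66, 0.55, 0.11, 1.00, 0.61,
              0.22, 0.93, 0.03, 0.61, 1.00),
            nrow = 5, dimnames = list(organs, organs))

# Flag pairs above an arbitrary cut-off, ignoring the diagonal:
high <- which(abs(m) > 0.7 & upper.tri(m), arr.ind = TRUE)
data.frame(var1 = rownames(m)[high[, 1]],
           var2 = colnames(m)[high[, 2]],
           r    = m[high])
```

With the values above, only the WAT–BAT pair (r = 0.93) exceeds the cut-off.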


Step 38: Principal components analysis

This analysis will use the BodyComp data set that was used in step 37. In order to run a PCA in R using prcomp(), individuals without complete data must be removed using the na.omit() function nested within the prcomp() function. The argument scale = TRUE is specified because the variances of the body composition variables differ considerably (the spread of each variable in the BodyComp data frame can be seen with the summary() function). The following line of code will give the proportion and cumulative proportion of the total variance explained.

summary(prcomp(na.omit(BodyComp), scale = TRUE))
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7
Standard deviation     2.7726 1.8520 1.27034 1.02950 0.89347 0.80528 0.74181
Proportion of Variance 0.4522 0.2018 0.09493 0.06235 0.04696 0.03815 0.03237
Cumulative Proportion  0.4522 0.6540 0.74889 0.81123 0.85819 0.89634 0.92871
                           PC8     PC9    PC10    PC11    PC12   PC13    PC14
Standard deviation     0.59503 0.53166 0.42466 0.35356 0.32082 0.2736 0.21518
Proportion of Variance 0.02083 0.01663 0.01061 0.00735 0.00605 0.0044 0.00272
Cumulative Proportion  0.94953 0.96616 0.97677 0.98412 0.99018 0.9946 0.99731
                          PC15   PC16    PC17
Standard deviation     0.15093 0.1366 0.06610
Proportion of Variance 0.00134 0.0011 0.00026
Cumulative Proportion  0.99865 0.9997 1.00000

Eigenvalues for all seventeen components are the squares of the standard deviations given by the previous line of code, and can be output using:

prcomp(na.omit(BodyComp), scale = TRUE)$sdev^2
 [1] 7.687505211 3.429847979 1.613766157 1.059868403 0.798288931 0.648470172
 [7] 0.550281167 0.354066373 0.282664344 0.180336227 0.125007493 0.102923245
[13] 0.074873073 0.046300902 0.022779398 0.018651206 0.004369719

To output the principal component loadings for all seventeen organ weights (only the first four components are shown), use the following line of code. The loadings show the strength of the association between each original variable and the new component.


prcomp(na.omit(BodyComp), scale = TRUE)
                            PC1         PC2         PC3          PC4
Carcass             -0.33276942  0.10678015 -0.07113286  0.140815809
HEART               -0.23637808 -0.28630188  0.05238709 -0.205892396
LIVER               -0.28312835  0.21517741  0.01690019  0.016342695
KIDNEY              -0.28843254 -0.27045367  0.08891807  0.033666276
BRAIN               -0.13447712 -0.37113185  0.18025001  0.087331685
Brown.Fat            0.23057185  0.11777104  0.48450263 -0.020379466
Abdominal.Fat       -0.22535429  0.33295878  0.14048190 -0.325546388
Gonadal.Fat         -0.26791674  0.25488352  0.11035956 -0.278616530
Mesenteric.Fat      -0.28709897  0.09681311  0.24393549 -0.196235051
Gonads              -0.13919520  0.30207318 -0.14657328  0.451415952
Large.Intestine..g. -0.31351304  0.01358145 -0.09778719 -0.008909068
Small.Intestine..g. -0.14871276 -0.05786933  0.57434774  0.128275550
Stomach             -0.13339934 -0.31818402  0.33869275  0.357140179
Lungs                0.01886844 -0.37180831 -0.11381438 -0.569416313
Pancreas            -0.27587262 -0.12034584 -0.25189547  0.066332253
Pelage              -0.32768510  0.15295340 -0.04247428 -0.054512252
Tail                -0.22887083 -0.27969993 -0.26741094  0.163568995

The signs of the loadings within each principal component are arbitrary; it may be that all of them need to be multiplied by -1 in order to obtain consistent results between different statistical programs. Within a given principal component, variables whose loadings have the same sign are correlated with that component in the same direction. For example, fifteen of the seventeen variables load on PC1 in the same direction, suggesting that this component gives a general indication of body size.
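To flip the signs of a component, multiply both its loadings and its scores by -1; the analysis itself is unchanged. The sketch below uses an invented toy data frame in place of BodyComp.

```r
# Toy data invented for illustration, standing in for BodyComp.
set.seed(1)
toy <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
pca <- prcomp(toy, scale = TRUE)

# Flip PC1: negate its loadings and, to keep them consistent,
# its scores. The variance explained is unchanged.
pca$rotation[, "PC1"] <- -pca$rotation[, "PC1"]
pca$x[, "PC1"]        <- -pca$x[, "PC1"]
```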