Statistical Analyses of Multivariate Time Series Data With Application to Compacting Effects on Soil Chemical and Biological Properties in Forestry


VOLUME ONE

By Stuart Fenech BSc (AES)

Australian School of Environmental Studies

Faculty of Environmental Science

GRIFFITH UNIVERSITY

BRISBANE

This dissertation is submitted in partial fulfilment of the requirements of the degree of Bachelor of Science with Honours in Australian Environmental Studies.

October 2002


DECLARATION

I, Stuart Anthony Fenech, hereby declare that this work has not been submitted for a degree or diploma in any university. To the best of my knowledge and belief, the dissertation contains no material previously published or written by another person except where due reference is made in the dissertation itself.

_____________________________________

Stuart Fenech

October 2002


ABSTRACT

When repeated measurements are recorded over time, the result is a time series. Because the measurements are taken on the same entity over time, the resulting values are likely to be correlated. Often more than one univariate (single variable) time series is recorded, giving a multivariate (multiple variable) time series. Statistical analyses for univariate and multivariate time series are the focus of this investigation.

A practical approach was adopted to present the available methods for dealing with data correlated over time. Basic principles were presented before the full repeated measures and time series techniques for statistical analysis. Repeated measures techniques were found to be suited to shorter time series, while time series techniques were better suited to longer time series (i.e. series with more than 25 time points). Both areas of statistical analysis can be applied to data correlated over time. Two main repeated measures techniques, split plot designs and MANOVA, which are also used outside time series analysis, were introduced and evaluated. Various traditional univariate time series models were detailed, including autoregressive integrated moving average (ARIMA) models. Multivariate time series models were then presented, including multiple independent variable and vector based variants of ARIMA models. Practical examples from a rainfall data set illustrated these well developed and well supported concepts. Each section built on those presented previously in a logical, orderly fashion.

A review of recent theoretical developments and practical applications in the area of multiple time series was provided. A large variety of fields make use of multiple time series, and the direction taken by the theoretical and practical literature varied largely with the particular field. Recent developments in ARIMA modelling, genetic algorithms, nonlinear methods and other areas were discussed. Pointers were given towards possible future directions for the analysis of data correlated over time.

A detailed forestry application was presented, based on data from an experiment on the effects of compaction and cultivation on soil chemical and biological properties over time. Because the experiments produced short time series (fewer than twenty time periods), split plot and MANOVA designs were used for most of the analysis. Moving average smoothers and cross correlation functions proved useful for exploring relationships between treatments and variables. Interpreted together, the split plot and MANOVA designs were found to be far more informative than either could be in isolation. Many significant relationships were identified in the original data set.

A number of statistical issues were found to be very important when analysing data correlated over time. Large amounts of natural variation or error make it difficult to establish significant relationships, so sources of variation need to be considered carefully when designing experiments. All of the analyses covered rely on assumptions that need to be carefully monitored. In ARIMA time series analysis, the assumption of stationarity in mean and variance is commonly violated. In split plot and MANOVA designs, the normality assumption must be checked.


ACKNOWLEDGEMENTS

Firstly, a big thank you to my wife Leanne (Fenech) for the patience and occasional bravery to have a go at understanding statistics. Thank you to Mum (Denise Fenech), Dad (Louis Fenech) and Grandma (Hazel Corby), who have always been and continue to always be there. Cheers to my assorted uncles, aunts, cousins, rellies, in-laws, neighbours and even brother Scott, who see Stuart first and mathematical nut second.

The ‘old school’ friends have been supportive, fun, and kept my feet firmly on the ground. Thank you particularly to Ben Marks, Walter Haas, Nicholas Page, Thomas White, Simon Mahoney, Cara Barnes, Kym Higgins and Lisa Tarca.

Thank you to all at the Griffith University Cooperative Research Centre for Sustainable Production Forestry (CRC) for your support. In particular, thank you to Tim Blumfield, ZhiHong Xu and Chengrong Chen for your time.

A special thank you to my supervisor Janet Chaseling. It was not so long ago I was seventeen and hiding in statistics lectures, happily unnoticed. Thank you for coaxing me out of the shadows and into the light of statistics. Your time, dedication, experience, general knowledge and well placed stirring are greatly appreciated.

Thank you to all the other people I have had the pleasure of dealing with at Griffith University over the years. Of particular note are Andrew Rock, Rodney Topor, Carlo Hamalainen, Alex Creagh, Rebecca O’Leary and Cameron Hurst.

Cheers!

Stuart Fenech – October 2002

http://www.humanfrailty.com/


TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES
GLOSSARY
SYMBOLS
1 INTRODUCTION
2 TIME SERIES THEORY
2.1 Fundamental Statistical Concepts
2.1.1 Univariate Information
2.1.2 Bivariate Information
2.1.3 Dependence within Variables
2.1.4 Statistical Measures and Terms
2.1.5 Hypothesis Testing Overview
2.1.6 Outliers
2.2 Correlation Functions
2.2.1 The Autocorrelation Function (ACF)
2.2.2 The Partial Autocorrelation Function (PACF)
2.2.3 The Inverse Autocorrelation Function (IACF)
2.2.4 The Cross Correlation Function (CCF)
2.3 Repeated Measures Models
2.3.1 Background
2.3.2 Split Plot Designs
2.3.3 MANOVA
2.4 Univariate Time Series Models
2.4.1 Time Series Model Components
2.4.2 General Time Series Models
2.4.3 Moving Averages
2.4.4 Simple Linear Regression
2.4.5 Multiple Linear Regression
2.4.6 Stationarity
2.4.7 Backshift Notation
2.4.8 AR (Autoregressive) Models
2.4.9 MA (Moving Average) Models
2.4.10 ARMA (Autoregressive Moving Average) Models
2.4.11 ARIMA (Autoregressive Integrated Moving Average) Models
2.4.12 Forecasting
2.5 Multivariate Time Series Models
2.5.1 Multivariate ARIMA Models
2.5.2 Vector ARIMA Models
3 THEORY LITERATURE REVIEW
3.1 AR/ARMA/ARIMA Developments
3.2 ARIMA Alternative Developments
3.3 Bayesian Developments
3.4 Nonlinear Developments
3.5 Miscellaneous Developments
4 APPLICATION LITERATURE REVIEW
4.1 Medical Applications
4.2 Economic Applications
4.3 Sociology Applications
4.4 Natural Phenomena Applications
5 FORESTRY CASE STUDY
5.1 Background
5.2 Previous Data Analysis
5.2.1 Chemical Data
5.2.2 Biological Data
5.3 Limitations and Scope
5.4 Data Analysis Techniques
5.4.1 Analysis Direction
5.4.2 Exploratory Data Analysis (EDA)
5.4.3 Correlation Analysis
5.4.4 Overall Split Plot Designs
5.4.5 Overall MANOVA Designs
5.4.6 Season Based Split Plot Designs
5.4.7 Season Based MANOVA Designs
5.4.8 Multiple Comparison Tests
5.5 Data Analysis and Results
5.5.1 Nitrate Levels
5.5.2 Ammonium Levels
5.5.3 Total Mineral Nitrogen Levels
5.5.4 Nitrate Dynamics
5.5.5 Ammonium Dynamics
5.5.6 Total Mineral Nitrogen Dynamics
5.5.7 Nitrate Leaching
5.5.8 Ammonium Leaching
5.5.9 Total Mineral Nitrogen Leaching
5.5.10 Microbial Carbon Levels
5.5.11 Microbial Nitrogen Levels
5.5.12 Microbial Carbon to Nitrogen Ratio
5.6 General Discussion
6 CONCLUSION
REFERENCES
APPENDIX A – SAS EXAMPLES INPUT
APPENDIX B – SAS EXAMPLES OUTPUT
APPENDIX C – EXPERIMENT TIME PERIODS
APPENDIX D – VARIABLE LIST
APPENDIX E – NITRATE LEVELS
APPENDIX F – AMMONIUM LEVELS
APPENDIX G – TOTAL MINERAL NITROGEN LEVELS
APPENDIX H – NITRATE DYNAMICS
APPENDIX I – AMMONIUM DYNAMICS
APPENDIX J – TOTAL MINERAL NITROGEN DYNAMICS
APPENDIX K – NITRATE LEACHING
APPENDIX L – AMMONIUM LEACHING
APPENDIX M – TOTAL MINERAL NITROGEN LEACHING
APPENDIX N – MICROBIAL CARBON LEVELS
APPENDIX O – MICROBIAL NITROGEN LEVELS
APPENDIX P – MICROBIAL C:N RATIO


LIST OF FIGURES

Figure 2.1: Skewness – left, right and no skew.
Figure 2.2: May rainfall plotted against year.
Figure 2.3: Correlations of 1 (positive), -1 (negative) and 0.1 (weak positive).
Figure 2.4: Rainfall over time from 1985 to 2001 inclusive – line graph.
Figure 2.5: Example autocorrelation function.
Figure 2.6: Monthly rainfall graphical autocorrelation function.
Figure 2.7: Monthly rainfall partial graphical autocorrelation function.
Figure 2.8: Monthly rainfall inverse graphical autocorrelation function.
Figure 2.9: Rainfall and Days of Rain over time from 1985 to 2001 inclusive.
Figure 2.10: Rainfall and Days of Rain cross correlation function.
Figure 2.11: Time plot of a pure trend component.
Figure 2.12: Time plot of a pure seasonal component.
Figure 2.13: Time plot of a pure random component.
Figure 2.14: Time plot of trend, season and random components (additive).
Figure 2.15: A time series before and after applying a moving average smoother.
Figure 2.16: Applying a 2 × 4MA moving average to rainfall data (1994 to 2001).
Figure 2.17: Simple linear regression presented graphically.
Figure 2.18: Applying differencing to a created series with a clear trend.
Figure 2.19: Typical autocorrelation function for AR(1), positive φ₁.
Figure 2.20: Typical partial autocorrelation function for AR(1), positive φ₁.
Figure 2.21: Typical autocorrelation function for AR(p).
Figure 2.22: Typical partial autocorrelation function for AR(p).
Figure 2.23: Typical autocorrelation function for MA(1), positive θ₁.
Figure 2.24: Typical partial autocorrelation function for MA(1), positive θ₁.
Figure 2.25: Typical autocorrelation function for MA(q).
Figure 2.26: Typical partial autocorrelation function for MA(q).
Figure 2.27: Time plot of rainfall over time from 1985 to 2001 inclusive.
Figure 5.1: Picture of the forwarder used for compaction in experiments.
Figure 5.2: Three sampling cores in the ground at Yarraman. One is being removed.
Figure 5.3: Mean mineral nitrogen levels (kgN/ha) over the nineteen months.
Figure 5.4: Mean mineral nitrogen dynamics (kgN/ha) over the nineteen months.
Figure 5.5: Graphical notation for compaction and cultivation options.
Figure 5.6: Back transformed means (± S.E.) for compaction effects on mean nitrate levels in season one.
Figure 5.7: … nitrate levels in season two.
Figure 5.8: Back transformed means (± S.E.) for compaction and cultivation effects on mean ammonium levels in season one.
Figure 5.9: … ammonium levels in season two (each month separately).
Figure 5.10: Back transformed means (± S.E.) for cultivation effects on mean ammonium levels in season two (each month separately).
Figure 5.11: Back transformed means (± S.E.) for compaction and cultivation effects on mean ammonium levels in season three.
Figure 5.12: Back transformed means (± S.E.) for compaction effects on mean total mineral nitrogen levels in season one.
Figure 5.13: … effects on mean total mineral nitrogen levels in season three.
Figure 5.14: … nitrate dynamics in season one (each month separately).
Figure 5.15: … effects on mean ammonium dynamics.
Figure 5.16: … nitrate leaching.
Figure 5.17: … ammonium leaching.
Figure 5.18: Back transformed means (± S.E.) for cultivation effects on mean total mineral nitrogen leaching.
Figure 5.19: Microbial carbon levels by compaction and cultivation over time.
Figure 5.20: Graphical cross correlation function – microbial carbon and soil moisture.
Figure 5.21: … effects on mean microbial carbon levels in season two.
Figure 5.22: … effects on mean microbial carbon levels in season three.
Figure 5.23: Back transformed means (± S.E.) for block effects on mean microbial nitrogen levels.
Figure 5.24: … microbial nitrogen levels in season one (each month separately).
Figure 5.25: … microbial nitrogen levels in season three (each month separately).
Figure 5.26: Graphical cross correlation function – microbial carbon to nitrogen ratio and soil moisture.
Figure 5.27: … effects on the mean microbial carbon to nitrogen ratio in season one (each month separately).
Figure 5.28: … effects on the mean microbial carbon to nitrogen ratio in season two (each month separately).
Figure 5.29: … effects on the mean microbial carbon to nitrogen ratio in season three (month nine).
Figure 5.30: … effects on the mean microbial carbon to nitrogen ratio in season three (months ten and eleven).
Figure 5.31: … effects on the mean microbial carbon to nitrogen ratio in season four.


LIST OF TABLES

Table 2.1: Autocovariance and autocorrelation data from monthly rainfall.
Table 2.2: Split plot projected ANOVA for moisture repeated measures example.
Table 2.3: Rainfall data from January to December in 2001.
Table 2.4: First and second differencing applied to 2001 rainfall data.
Table 3.1: The fields of research involved in recent theoretical articles.
Table 3.2: Aspects looked at by researchers in recent theoretical articles.
Table 4.1: Field of application for recent literature articles.
Table 4.2: Techniques used in detail in recent literature articles.
Table 5.1: Summary of factors and variables provided for analysis in the case study.
Table 5.2: Standard summary notation for factor levels.
Table 5.3: Legend for symbols denoting significance.
Table 5.4: Structure and df in overall split plot ANOVA designs.
Table 5.5: Structure and df in seasonal split plot ANOVA designs.


GLOSSARY

ACF Autocorrelation function.

AIC Akaike’s information criterion.

ANOVA Analysis of variance.

AR Autoregressive model.

ARIMA Autoregressive integrated moving average model.

ARMA Autoregressive moving average model.

ARMAX Autoregressive moving average model with explanatory variables.

CCF Cross correlation function.

CRD Completely randomised design.

CV Coefficient of variation.

DF Degrees of freedom.

DPI Department of Primary Industries.

EDA Exploratory data analysis.

IACF Inverse autocorrelation function.

MA Moving average.

MANOVA Multivariate analysis of variance.

MSE Mean square error.

PACF Partial autocorrelation function.

PCA Principal components analysis.

RCB Randomised complete block.

SE Standard error.

VAR Vector autoregressive model.

VARMA Vector autoregressive moving average model.

VMA Vector moving average model.


SYMBOLS

α Significance level (usually 0.05).

β Simple or multiple regression model parameters.

B Backshift operator.

c A constant (in models) and the sample covariance.

d Order of (non-seasonal) differencing used.

D Order of seasonal differencing used.

ε Error or random variation.

µ Population mean.

n Sample size.

p Order of autoregressive model components.

P Order of seasonal autoregressive model components.

q Order of moving average model components.

Q Order of seasonal moving average model components.

r Sample correlation coefficient.

r² Coefficient of determination.

s Sample standard deviation.

t Time.

X̄ Sample mean (of a variable X).

φ Autoregressive model parameter (and Dickey-Fuller test parameter).

Φ Seasonal autoregressive model parameter.

θ Moving average model parameter.

Θ Seasonal moving average model parameter.


1 INTRODUCTION

When a random variable is measured at a number of different times, the result is a univariate (single variable) time series. Values in a time series are usually correlated, because they are repeated recordings on the same entity. If several different variables are recorded over time, the result is a multivariate (multiple variable) time series. This dissertation investigates the effective analysis of one or more variables where data are correlated over time.
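To make "correlated over time" concrete, the following minimal Python sketch computes the sample autocorrelation at lag k using the common estimator r_k = Σ(x_t − x̄)(x_{t+k} − x̄) / Σ(x_t − x̄)², where the numerator runs over t = 1, ..., n − k and the denominator over all n values. The function name `sample_acf` and the example series are hypothetical illustrations; the dissertation itself relies on SAS for these calculations, and packages may differ in small details of the estimator.

```python
from statistics import fmean


def sample_acf(x, max_lag):
    """Return the sample autocorrelations [r_0, r_1, ..., r_max_lag] of x.

    Every lag shares the same denominator: the total sum of squared
    deviations from the series mean.
    """
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    mean = fmean(x)
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    acf = []
    for k in range(max_lag + 1):
        # Sum of products of deviations that are k time steps apart.
        num = sum(dev[t] * dev[t + k] for t in range(n - k))
        acf.append(num / denom)
    return acf


# Hypothetical series with a steady upward drift: neighbouring values are
# similar, so the autocorrelation is strongly positive at small lags.
drift = [float(t) for t in range(1, 11)]
print([round(r, 3) for r in sample_acf(drift, 3)])  # prints [1.0, 0.7, 0.412, 0.148]
```

Using one shared denominator for every lag is a deliberate choice: it keeps the sequence of autocorrelations well behaved, at the cost of shrinking estimates at long lags towards zero.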

The motivation for this thesis comes from forestry experiments conducted at Yarraman, Australia. The data come from experiments on the effects of compaction and cultivation on soil chemical and biological properties. Two data sets were provided by the Griffith University Cooperative Research Centre for Sustainable Production Forestry (CRC), one for chemical properties and one for biological properties. The data sets contain many variables measured over time and so represent a multivariate time series situation. Details of the experiment and of the initial data analysis are given in Blumfield et al. (2002) for the chemical data set and Chen et al. (2002) for the biological data set.

Many techniques are available for the analysis of univariate and multivariate time series. Time series occur in many fields, including finance, physics, computing, medicine, ecology and forestry, and the range of available methods is as varied as the fields from which they come. This dissertation investigates modern time series analysis techniques before providing a detailed forestry application involving research into the effects of compaction and cultivation on soil chemical and biological properties. The general aim is to inspire confidence and understanding in dealing, both theoretically and practically, with data correlated over time. To help achieve this, complex time series concepts are built up from the basics.

Advances in computing mean that time series data can easily be analysed on an average personal computer, given appropriate software. In this dissertation the SAS statistical package (SAS Institute, 1999) is used for most calculations. However, the best software in the world will not help without an understanding of what the software is doing. Software is therefore treated as a tool for analysis and is referred to only after the theoretical aspects have been covered.

This dissertation has three main sections. The first reviews widely accepted modern time series analysis techniques. The second is a critical review of recent developments and applications in the multivariate time series field. The final section is a detailed application of methods for dealing with data correlated over time. A more detailed outline of the dissertation is provided below.

Chapter two presents analytical techniques commonly applied to data correlated over time. Complicated techniques are approached gradually, starting from the basics. Basic statistical concepts that arise in the analysis of correlated data, such as covariance, correlation, standard error, outliers and hypothesis testing, are covered first. Four correlation functions commonly used as an exploratory precursor to time series modelling are then presented: the autocorrelation (ACF), partial autocorrelation (PACF), inverse autocorrelation (IACF) and cross correlation (CCF) functions.
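As an illustration of the last of these, the cross correlation function measures how strongly one series is related to another series shifted by k time periods. The sketch below assumes the common estimator r_xy(k) = c_xy(k) / (s_x s_y), with divisor n in both the cross covariance and the standard deviations. The function name `sample_ccf`, the example series and the lag sign convention (a positive k pairs x_t with y_{t+k}, so x leads y) are assumptions of this illustration; texts and packages differ on the sign convention.

```python
from math import sqrt
from statistics import fmean


def sample_ccf(x, y, lag):
    """Sample cross correlation between x_t and y_(t+lag).

    A positive lag pairs earlier x values with later y values (x leads y);
    a negative lag does the reverse. This sign convention is an assumption:
    texts and packages differ on it.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have the same length")
    if abs(lag) >= n:
        raise ValueError("|lag| must be smaller than the series length")
    mx, my = fmean(x), fmean(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    if lag >= 0:
        pairs = zip(dx[: n - lag], dy[lag:])
    else:
        pairs = zip(dx[-lag:], dy[: n + lag])
    # Cross covariance uses divisor n, as do the standard deviations below.
    cov = sum(a * b for a, b in pairs) / n
    sx = sqrt(sum(a * a for a in dx) / n)
    sy = sqrt(sum(b * b for b in dy) / n)
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined cross correlation")
    return cov / (sx * sy)


# Hypothetical example: y repeats the spike in x two time steps later,
# so the cross correlation peaks at lag 2.
x = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
best = max(range(-3, 4), key=lambda k: sample_ccf(x, y, k))
print(best, round(sample_ccf(x, y, best), 3))  # prints 2 0.933
```

Scanning a range of positive and negative lags, as in the last lines, is the usual way a CCF is read: the lag with the largest correlation suggests which series leads and by how much.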

Two main areas of detailed analysis are covered in chapter two. The first is repeated measures analysis, which can be applied to the time series case. Two particular methods for repeated measures, namely split plot designs and MANOVA, are presented in detail. These repeated measures techniques can be used to analyse any number of time series. The second group of analytical techniques covered is known as time series techniques; these were developed specifically for the analysis of time series. Both univariate and multivariate techniques are reviewed. Univariate techniques include AR (autoregressive), MA (moving average), ARMA (autoregressive moving average) and ARIMA (autoregressive integrated moving average) models. Multivariate techniques covered include multivariate ARIMA and vector ARIMA models. Forecasting is not investigated in depth, as it is beyond the scope of this thesis.
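The "integrated" part of an ARIMA model refers to differencing, which is commonly used to remove a trend in the mean before AR and MA terms are fitted. In backshift notation the first difference is (1 − B)x_t = x_t − x_{t−1}, and differencing d times applies (1 − B)^d. The following is a minimal sketch under these definitions; the function name `difference`, the optional `lag` argument (an illustrative extension for seasonal differencing, (1 − B^s)) and the quadratic example series are hypothetical.

```python
def difference(x, d=1, lag=1):
    """Apply (1 - B**lag)**d to the series x and return the shorter result.

    lag=1 gives ordinary differencing; a seasonal lag (for example 12 for
    monthly data) gives seasonal differencing. Each pass removes lag values.
    """
    out = list(x)
    for _ in range(d):
        if len(out) <= lag:
            raise ValueError("series too short for the requested differencing")
        # x_t - x_(t - lag) for every t that has a value lag steps earlier.
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# Hypothetical series with a quadratic trend, x_t = t**2 for t = 1..6.
series = [t * t for t in range(1, 7)]
print(difference(series, d=1))  # prints [3, 5, 7, 9, 11]
print(difference(series, d=2))  # prints [2, 2, 2, 2]
```

A linear trend is removed by one difference and a quadratic trend by two, which is the role played by the order d in an ARIMA(p, d, q) model.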

The theory in chapter two is accompanied by a series of practical examples based on rainfall data collected by the author’s father (Louis Fenech) over seventeen years at Buccan, Queensland, Australia. These examples also provide guidance on using the SAS statistical package (SAS Institute, 1999) for analysis.

Theoretical developments in multivariate time series analysis are covered in chapter three. A rich and varied set of papers from the last eight years is reviewed for its contribution to knowledge of multivariate time series. Many developments are based on the ARIMA models from chapter two, while others involve ARIMA alternatives, Bayesian statistics and nonlinear techniques, among others. This chapter offers an accessible survey of the many directions in which multivariate time series analysis has progressed.

Chapter four investigates practical applications of time series techniques from the last eight years. The methods and techniques used are critically reviewed with a view towards practical understanding. The articles reviewed involve medical, economic, sociological or natural phenomena applications.

Chapter five presents a detailed case study analysing the Yarraman data introduced earlier. Background information on the experiments is provided and the previous data analysis is carefully reviewed. The purpose of the analysis is to investigate the effects over time of soil cultivation, compaction and their possible interaction on soil biological and chemical variables. Detailed analysis using advanced techniques is applied to twelve separate variables, nine from the chemical data set and three from the biological data set. Moving average smoothers, correlation, split plot designs, MANOVA and Bonferroni modified multiple comparison tests are among the techniques applied in the data analysis.

The forestry case study concludes with a discussion about the statistical analysis

undertaken. The value of each method and technique used is reviewed, and consideration is given to how the analysis of this and similar future experiments might be advanced.

The dissertation is provided in two volumes. The first volume encompasses all of the

chapters discussed above, while the second volume contains the appendices. The appendices consist mainly of raw data analysis from the forestry application.


The specific aims of this dissertation are:

• To provide a clear and precise introductory guide to techniques available for the

analysis of data correlated over time, including repeated measures, univariate time

series and multivariate time series techniques.

• To investigate recent theoretical developments in modelling of multivariate time

series situations.

• To investigate recent applications of multivariate time series techniques.

• To apply techniques for dealing with data correlated over time to a data set on

compaction and cultivation effects on soil chemical and biological properties over

time.

While not every concept relating to dealing with correlated data over time is covered

within, every effort has been made to ensure that a thorough cross-section of current time series topics is covered. After reading this thesis, the reader should have a

working knowledge of dealing with data correlated over time in the theoretical and

practical sense.


2 TIME SERIES THEORY

2.1 Fundamental Statistical Concepts

A random variable is a characteristic or attribute that assumes randomly different values

(Bluman, 2001). Under this definition many quantities (eg. temperature, monthly rainfall) can be classified as random variables. For ease of notation, this thesis denotes random variables by upper case letters (eg. X) and particular values of a variable by subscripts (eg. $X_1$).

In this section sample data is dealt with exclusively. Occasionally population data may be

available for analysis but as this is uncommon the focus here is on samples. Summary

values calculated from samples are referred to as statistics. A statistic is an estimator of a population parameter, the true value that would be obtained if the entire population were analysed.

2.1.1 Univariate Information

Initial investigation into a single random variable (univariate) is often referred to as

exploratory data analysis (EDA). A usual first stage in EDA is to plot a graph of the

variable being investigated. A number of summary measures can then be calculated to give more information than can usually be seen by eye.

The most common summary measures are measures of central tendency and measures of

dispersion. Measures of central tendency look at ‘expected’ values and answer questions

such as ‘what is the average temperature?’ Measures of dispersion describe how ‘spread out’ the data is and address questions such as ‘how varied is the temperature?’

Two common measures of central tendency are the median and mean (arithmetic

average). The median is the middle value of the variable when its values are ordered (the average of the two middle values when there is an even number of values). The mean, more commonly seen in statistical analysis, is denoted using a variable with a bar on the top (eg. $\bar{X}$ is the mean of variable X). The mean is calculated by adding all the variable values and dividing the sum by the number of data entries. This is expressed in Equation 2.1, where n is the number of variable values and $X_i$ represents individual variable values. The mean is a commonly seen and generally well understood summary statistic used in a wide range of statistical analyses in the time series area. While mathematically precise, the mean is affected by extreme values (outliers) and hence can sometimes be misleading.

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (2.1)$$

The most common measures of dispersion are the range, variance and standard deviation.

The range is simply the smallest value the variable takes subtracted from the largest value. The variance $s^2$ is, in effect, an average of the squared differences between the data and the mean (their sum divided by $n-1$), giving the amount of dispersion around the mean. The standard deviation s (Equation 2.2) is the square root of the variance and is close in concept to ‘the average distance of data from the average’. The standard deviation is in the same units as the original variable.

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} \qquad (2.2)$$

Closely related to the standard deviation is the coefficient of variation (CV), which is in effect a scaled standard deviation. The problem with the standard deviation and variance is that as means get larger the standard deviation and variance are also likely to get larger, making direct comparison of these measures difficult (Zar, 1999). By scaling by the mean, as shown in Equation 2.3, the coefficient of variation gives a standardised measure with no units that can be compared across variables.

$$CV = 100 \times \frac{s}{\bar{X}} \qquad (2.3)$$
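Equations 2.1 to 2.3, together with the median and range, translate directly into code. The following is a minimal Python sketch for illustration only (it is not the SAS code used in this thesis); the function names and the small data set are hypothetical.

```python
import math

def mean(xs):
    """Arithmetic mean, Equation 2.1: sum of the values divided by n."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle ordered value (average of the two middle values when n is even)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def data_range(xs):
    """Largest value minus smallest value."""
    return max(xs) - min(xs)

def sample_sd(xs):
    """Sample standard deviation, Equation 2.2 (divisor n - 1)."""
    xbar = mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

def cv(xs):
    """Coefficient of variation, Equation 2.3: 100 * s / mean."""
    return 100 * sample_sd(xs) / mean(xs)

# Hypothetical illustrative data, not from the thesis
values = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(values), median(values), data_range(values),
      round(sample_sd(values), 4), round(cv(values), 2))
```

Python's standard `statistics.mean`, `statistics.median` and `statistics.stdev` should give the same results; `stdev` also uses the $n-1$ divisor of Equation 2.2.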

Other measures of dispersion include the interquartile range, mean of absolute deviations

(MAD) and mean of squared deviations (MSD). Variance and standard deviation are

more commonly used and accepted due to favourable statistical properties (Makridakis et

al., 1998).


There are many other summary measures available for random variables. One of these is

skewness, which describes the symmetry of the distribution formed by a random variable. The distribution of a variable is said to have a left skew, no skew or a right skew depending on how the values are spread either side of the centre, as seen in Figure 2.1. The raw formula for calculating skew involves cubed differences between variable values and the mean (whereas the standard deviation uses squared differences). Equation 2.4 shows one of several forms used for the calculation of skew. A skew value of zero indicates no skew, while a value less than zero indicates a skew to the left and a value more than zero a skew to the right.

[Figure: Skewness, with panels No Skew, Right Skew and Left Skew]

Figure 2.1: Skewness – left, right and no skew.

$$skew = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^3 \qquad (2.4)$$

Kurtosis is a statistic that gives another indication of the shape of a distribution formed by a random variable, in comparison with a normal distribution. The kurtosis formula given in Equation 2.5 uses differences between variable values and the mean taken to the power of four. A kurtosis value of zero indicates that the shape is ‘mesokurtic’, with the same form as the bell-shaped normal distribution. When the kurtosis value is more than zero the distribution is ‘leptokurtic’ and tends to have more values further away from the mean, causing a thinner appearance around the mean. Alternatively, a kurtosis value of less than zero indicates a ‘platykurtic’ distribution, where there are more values around the mean, causing a wider appearance (Zar, 1999).

$$kurtosis = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \qquad (2.5)$$

Example 1: Single Variable Investigation on May Rainfall

Being ever so slightly statistically obsessed, the Fenech family has been recording rainfall since 1985 at their humble residence in Buccan, south of Brisbane, Queensland, Australia. Every morning at around 6am for the better part of seventeen years, Louis Fenech, the author’s father, has dutifully recorded the rainfall in millimetres.

In this example ‘May rainfall’ is investigated. For the investigation it is assumed that the

May rainfall data available (1985 to 2002) is a random sample from a population of May

rainfall.

The Fenech household would like to know what the data reveals, including the median, mean, standard deviation and skewness. The total rainfall in millimetres (mm) for each May from 1985 to 2002 was as follows: 81, 89, 124.6, 8, 162.6, 229.5, 68.1, 85.05, 51.5, 64.35, 41.4, 582.55, 142.35, 153.9, 94.95, 57, 40.75, 50. This data is plotted in Figure 2.2.


[Graph: May Rainfall; Rainfall (mm) against Year, 1985 to 2002]

Figure 2.2: May rainfall plotted against year.

The median gives the ‘middle’ value of the May rainfall measurements. It is a good gauge of what May rainfall might be ‘expected’ to be, as it is the point with as many measurements above it as below it. With eighteen values, the median is the average of the ninth and tenth ordered values (81 mm and 85.05 mm), which is 83.025 mm.

The mean, another indication of ‘expected’ rainfall, is calculated below. Notice that the

mean is dramatically higher than the median. A large contribution to this effect is the

extreme rainfall of 582.55 mm in May 1996.

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{81 + 89 + 124.6 + 8 + \cdots + 57 + 40.75 + 50}{18} = \frac{2126.6}{18} = 118.1444 \text{ mm}$$

The range gives an indication of the variation present in the data. Being simply the highest value minus the lowest, it is easy to calculate. A massive range of 575 mm (582.55 mm minus 8 mm) was found in May rainfall to date.


The more computationally intensive standard deviation and CV are calculated below.

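$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} = \sqrt{\frac{1}{18-1}\left[(81-118.14)^2 + (89-118.14)^2 + \cdots + (50-118.14)^2\right]} = 127.8903$$

$$CV = 100 \times \frac{s}{\bar{X}} = 100 \times \frac{127.8903}{118.1444} = 108.249$$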

This large standard deviation of 127.89 mm indicates that, loosely speaking, the ‘average distance from the average’ rainfall is 127.89 mm. This is quite a large standard deviation, a perhaps unsurprising result given the large variability in the original data. The CV, often considered as a ‘percentage’ because it compares the standard deviation with the mean, is 108. A rule of thumb sometimes seen is that a CV over twenty is quite variable, which makes 108 extremely variable.

Given that there are a few extremely high values of May rainfall, it is anticipated that

there will be a skew to the right (where ‘right’ indicates higher rainfall). The result from the skew calculation is above zero, which confirms a skew to the right.

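$$skew = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^3 = \frac{18}{(18-1)(18-2)}\left[\left(\frac{81-118.14}{127.89}\right)^3 + \left(\frac{89-118.14}{127.89}\right)^3 + \cdots + \left(\frac{50-118.14}{127.89}\right)^3\right] = 3.1064$$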

Kurtosis is another measure of the shape of the distribution formed by a random variable.

Given the spread out nature of the May rainfall data, the result here is of little surprise. The kurtosis value, well above zero, indicates that May rainfall is ‘leptokurtic’ and tends to have more values further away from the mean than in a normal distribution.


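$$kurtosis = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} = \frac{18 \times 19}{17 \times 16 \times 15}\left[\left(\frac{81-118.14}{127.89}\right)^4 + \cdots + \left(\frac{50-118.14}{127.89}\right)^4\right] - \frac{3 \times 17^2}{16 \times 15} = 11.1038$$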

Refer to Appendix A under this example for code to make SAS calculate these statistics

for you.
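As a cross-check on the hand calculations above, the following Python sketch (an illustration only, not the SAS code in Appendix A) recomputes the Example 1 statistics from the eighteen May totals. It uses the standard library `statistics` module for the mean, median and standard deviation and codes Equations 2.4 and 2.5 directly; its results should agree with the values above up to rounding.

```python
import statistics

# May rainfall totals (mm) for 1985 to 2002, as listed in Example 1
may_rain = [81, 89, 124.6, 8, 162.6, 229.5, 68.1, 85.05, 51.5,
            64.35, 41.4, 582.55, 142.35, 153.9, 94.95, 57, 40.75, 50]

n = len(may_rain)
xbar = statistics.mean(may_rain)     # Equation 2.1
med = statistics.median(may_rain)    # average of the 9th and 10th ordered values
s = statistics.stdev(may_rain)       # Equation 2.2 (divisor n - 1)
cv = 100 * s / xbar                  # Equation 2.3
rng = max(may_rain) - min(may_rain)  # range

# Standardised deviations (X_i - mean) / s, shared by Equations 2.4 and 2.5
z = [(x - xbar) / s for x in may_rain]

# Equation 2.4: sample skewness
skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

# Equation 2.5: sample kurtosis (approximately zero for normally distributed data)
kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(f"n={n} mean={xbar:.4f} median={med:.3f} range={rng:.2f}")
print(f"s={s:.4f} CV={cv:.3f} skew={skew:.4f} kurtosis={kurt:.4f}")
```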

2.1.2 Bivariate Information

Although summary measures on single variables are common and important, often the

quest is to find relationships between different variables. There are a couple of commonly

used summary measures that can be used to quantify relationships between variables. For

the purposes of this section covariance and correlation will be investigated.

Covariance (c), a measure of how two variables X and Y vary together, is defined in Equation 2.6. The mean of X is denoted by $\bar{X}$, the mean of Y by $\bar{Y}$ and the number of values of X and Y being compared by n.

$$c_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) \qquad (2.6)$$

Closely related to covariance is the sample correlation coefficient, r, which is in effect a scaled covariance whose values lie between -1 and 1. The correlation coefficient is defined in Equation 2.7, where $s_X$ is the standard deviation of X and $s_Y$ is the standard deviation of Y.

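$$r_{XY} = \frac{c_{XY}}{s_X s_Y} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \qquad (2.7)$$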
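Equations 2.6 and 2.7 can be sketched in Python as follows. This is an illustrative implementation only; the function names and test values are hypothetical. The examples check the two extreme cases of a perfect positive and a perfect negative linear relationship.

```python
import math

def covariance(xs, ys):
    """Sample covariance c_XY, Equation 2.6 (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Sample correlation coefficient r_XY, Equation 2.7."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(covariance(x, [2 * v + 1 for v in x]))   # Y increases with X
print(correlation(x, [2 * v + 1 for v in x]))  # perfect positive linear relationship
print(correlation(x, [10 - v for v in x]))     # perfect negative linear relationship
```

The $n-1$ divisors in the covariance and the two standard deviations cancel, which is why the correlation function works with raw sums of products.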

The correlation coefficient measures the strength of the linear relationship between two variables X and Y. A ‘linear’ relationship means that a given change in X is always accompanied by the same change in Y (and vice versa), no matter what the X value is. A value of 1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and a value of 0 indicates no linear relationship at all. Figure 2.3 demonstrates these facts graphically.

Figure 2.3: Correlations of 1 (positive), -1 (negative) and 0.1 (weak positive).

2.1.3 Dependence within Variables

The correlation investigated in section 2.1.2 was between two different variables. However, a variable may also be correlated with itself. This

situation is common in time series, where values at one moment in time may be

correlated with the previous (or other) moments in time. For example, there may be

correlation between temperatures or stock prices on sequential days.

The terms ‘autocovariance’ and ‘autocorrelation’ are used to refer to covariance and

correlation within a variable. These measures are taken for particular ‘lags’ or delays of

the given variable. For instance, if looking at daily temperature measures, a lag of one

would look at the measures one day apart, a lag of two would look at measures two days

apart and so on.

Given a lag k, Equation 2.8 calculates the sample autocovariance $c_k$ and Equation 2.9 calculates the sample autocorrelation $r_k$. Notice that these formulae are similar to those given for the two variable measures in section 2.1.2. In these formulae, $\bar{Y}$ is the mean of time series variable Y, $Y_t$ represents the value of time series Y at time t, and $Y_{t-k}$ the value of Y at lag k (that is, k time steps earlier).

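$$c_k = \frac{1}{n}\sum_{t=k+1}^{n}\left(Y_t - \bar{Y}\right)\left(Y_{t-k} - \bar{Y}\right) \qquad (2.8)$$

$$r_k = \frac{\sum_{t=k+1}^{n}\left(Y_t - \bar{Y}\right)\left(Y_{t-k} - \bar{Y}\right)}{\sum_{t=1}^{n}\left(Y_t - \bar{Y}\right)^2} \qquad (2.9)$$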
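A direct Python translation of Equations 2.8 and 2.9 is sketched below. The function names are hypothetical, and because Python lists are indexed from 0, the sum over $t = k+1, \ldots, n$ becomes `range(k, n)`. Since both formulae share the same numerator, $r_k$ is simply $c_k / c_0$.

```python
def autocovariance(ys, k):
    """Sample autocovariance c_k at lag k (Equation 2.8, divisor n)."""
    n = len(ys)
    ybar = sum(ys) / n
    # Pairs (Y_t, Y_{t-k}) for t = k+1..n in the text, i.e. t = k..n-1 here
    return sum((ys[t] - ybar) * (ys[t - k] - ybar) for t in range(k, n)) / n

def autocorrelation(ys, k):
    """Sample autocorrelation r_k at lag k (Equation 2.9)."""
    n = len(ys)
    ybar = sum(ys) / n
    num = sum((ys[t] - ybar) * (ys[t - k] - ybar) for t in range(k, n))
    den = sum((y - ybar) ** 2 for y in ys)
    return num / den

series = [1, -1, 1, -1, 1, -1]  # hypothetical alternating series
for lag in range(4):
    print(lag, autocovariance(series, lag), round(autocorrelation(series, lag), 4))
```

The alternating series gives a strongly negative autocorrelation at lag one and a positive one at lag two, mirroring its up-down pattern.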

For a given data set where assessment of correlation at a number of lags is wanted, this

process can become time consuming and tedious. Thankfully software packages

including SAS (SAS Institute, 1999) can be coaxed into generating autocovariance and

autocorrelation information.

Example 2: Autocovariance and Autocorrelation in Rainfall Data

Returning to the Buccan rainfall data, it is of interest to see if there is correlation between

rainfalls in different months. That is, information on autocovariance and autocorrelation

is wanted on monthly rainfall. Weather in south east Queensland typically involves hot,

humid, moderately wet summer conditions from around December to March and cool,

dry winters from around June to September.

Figure 2.4, produced in Microsoft Excel (Microsoft Excel, 2001), shows the entire monthly rainfall record from 1985 to 2001.

[Graph: Rainfall Over Time; Rainfall against Month, 1985 to 2001]

Figure 2.4: Rainfall over time from 1985 to 2001 inclusive – line graph.


Using SAS (SAS Institute, 1999) to retrieve autocovariance and autocorrelation data

entails two steps. Firstly, the data must be read in and then proc ‘arima’ must be called to

analyse the data. Appendix A contains the SAS code used for retrieving these results.

The first section of output from the ‘arima’ procedure gives autocorrelation data. A

summary of results obtained is provided in Table 2.1.

Lag   Autocovariance   Autocorrelation
0     7050.041         1
1     1618.361         0.22955
2     430.13           0.06101
3     81.115935        0.01151
4     -208.723         -0.02961
5     -686.799         -0.09742
6     -968.495         -0.13737
7     -1241.304        -0.17607
8     -218.564         -0.031
9     106.109          0.01505
10    281.02           0.03986
11    769.686          0.10917
12    1626.151         0.23066
13    870.382          0.12346
14    -28.987065       -0.00411
15    -31.67815        -0.00449
16    -1228.963        -0.17432
17    -1160.323        -0.16458
18    -1234.903        -0.17516
19    -965.033         -0.13688
20    -386.607         -0.05484
21    47.909485        0.0068
22    858.003          0.1217
23    1163.412         0.16502
24    1052.169         0.14924

Table 2.1: Autocovariance and autocorrelation data from monthly rainfall.

Notice that there is a perfect correlation (1) at a lag of zero. This makes sense because the

set of data is being compared with itself at this lag. The largest correlations are at a lag of

one and a lag of twelve. A lag of one could have been anticipated because there may be


some relationship between rainfalls in successive months. The correlation at a lag of

twelve is a reflection of the seasonal pattern seen frequently in rainfall data.
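Because Equation 2.9 is Equation 2.8 divided by the lag-zero autocovariance, the autocorrelations in Table 2.1 can be recovered from the autocovariances alone. The short Python check below uses a few rows copied from the table; it is an illustrative arithmetic check rather than part of the original analysis.

```python
# Selected rows of Table 2.1 (monthly rainfall): lag -> autocovariance c_k
c = {0: 7050.041, 1: 1618.361, 7: -1241.304, 12: 1626.151}

# r_k = c_k / c_0 (Equation 2.9 is Equation 2.8 divided by its lag-0 value)
r = {k: ck / c[0] for k, ck in c.items()}

for k in sorted(r):
    print(k, round(r[k], 5))
```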

2.1.4 Statistical Measures and Terms

There are a number of common statistical measures and terms that find their way into

time series analysis. This section takes a brief look into these concepts.

The standard error of the mean and the standard error for the difference between two

means are standard statistical measures. Both are common summary measures used in

confidence limits and hypothesis testing (Rao, 1998). The standard error of the mean is

given in Equation 2.10 where s is the standard deviation and n the sample size used to

calculate the mean. The standard error of the difference between two means is given in

Equation 2.11 where n1 and n2 are the sample sizes used in calculating the two means.

$$SE_{\bar{X}} = \frac{s}{\sqrt{n}} \qquad (2.10)$$

$$SE_{\bar{X}_1 - \bar{X}_2} = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad (2.11)$$
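Equations 2.10 and 2.11 are sketched below in Python. Equation 2.11 does not say which s is used; this sketch assumes a pooled standard deviation from the two samples, a common choice that is an assumption here rather than something stated in the text. The function names and sample values are hypothetical.

```python
import math

def se_mean(s, n):
    """Standard error of the mean, Equation 2.10."""
    return s / math.sqrt(n)

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two samples (assumed choice of s for Equation 2.11)."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def se_diff(s, n1, n2):
    """Standard error of the difference between two means, Equation 2.11."""
    return s * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical samples: standard deviations 4.0 and 6.0, each of size 10
s = pooled_sd(4.0, 10, 6.0, 10)
print(se_mean(4.0, 10), se_diff(s, 10, 10))
```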

In most statistical models of a time series, the value of a time series Y at time t ($Y_t$) is seen as a combination of an explained part and a random error $\varepsilon_t$, as seen in Equation 2.12. Random error is also commonly referred to as natural variation, residual or simply error. It arises naturally from inherent variability, measurement error and other similar sources. There are a number of summary measures of random error. Equation 2.13 shows the mean error (ME), Equation 2.14 the mean absolute error (MAE) and Equation 2.15 the frequently seen and used mean square error (MSE). In all of these formulae n is the number of errors and $\varepsilon_t$ the error at time t.

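$$Y_t = [\text{Explained}] + \varepsilon_t \qquad (2.12)$$

$$ME = \frac{1}{n}\sum_{t=1}^{n}\varepsilon_t \qquad (2.13)$$

$$MAE = \frac{1}{n}\sum_{t=1}^{n}\left|\varepsilon_t\right| \qquad (2.14)$$

$$MSE = \frac{1}{n}\sum_{t=1}^{n}\varepsilon_t^2 \qquad (2.15)$$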
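The three error summaries can be computed together, as in the following Python sketch (hypothetical function name and data). The example is chosen so that positive and negative errors cancel: the mean error is zero even though the model is never exactly right, which is why MAE and MSE, in which errors cannot cancel, are usually reported alongside it.

```python
def error_summaries(errors):
    """Return ME, MAE and MSE (Equations 2.13 to 2.15) for a list of errors."""
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    return me, mae, mse

observed = [10.0, 12.0, 9.0, 11.0]
fitted = [11.0, 11.0, 10.0, 10.0]  # hypothetical model values
errors = [y - f for y, f in zip(observed, fitted)]
me, mae, mse = error_summaries(errors)
print(me, mae, mse)
```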

Makridakis et al. (1998) also briefly investigate some other summary error measures.

These include relative or percentage error, mean percentage error and mean absolute

percentage error. These percentage based errors are useful for comparisons of models

when they are not initially in the same units.

2.1.5 Hypothesis Testing Overview

Hypothesis testing is often applied both to models as a whole and to individual model components. In hypothesis testing there is a null hypothesis that is assumed true and an alternative hypothesis that holds if the null is not true. The p value, the probability of obtaining a result at least as extreme as that observed if the null hypothesis were true, is evaluated. If the p value is less than a chosen significance level α (commonly 0.05), the null hypothesis is rejected and the alternative hypothesis accepted.

When testing models and model components, more often than not the null hypothesis

assumes no relationship. No relationship usually entails that a model or model component

has no noticeable impact or influence. For example, in ANOVA the null hypothesis is that there is no difference in the mean of the dependent variable between the levels of a factor. A relationship is shown when the null hypothesis of no relationship is rejected. In this way, hypothesis testing establishes the usefulness of a model or of components in a model.

2.1.6 Outliers

Outliers are values in a data set that are so extreme that they do not appear to be part of

the data set (Zar, 1999). Outliers are frequently the result of errors in measurement or

inconsistency in units. Although outliers may be the result of errors, they can also be an integral, legitimate part of the data set. Sample data containing outliers can lead to severe departures from standard assumptions made in statistical analysis (for example, equal variances in ANOVA). Where outliers are present, careful consideration must be given to how to deal with them. Options include leaving them out of the analysis, correcting errors (if known to be errors) or applying nonparametric statistical analyses, which are less affected by outliers (Zar, 1999).

2.2 Correlation Functions

Correlation functions are used as a diagnostic tool on time series to judge the types of

relationships evident in the time series. The first three correlation functions here look at

relationships within time series, to see if values at particular times are related to values at

previous times in any way. The final correlation function here investigates relationships

between two time series.

There are a number of different relationships commonly seen in time series that can be revealed by correlation functions. These include relationships where values at one time are related to those immediately preceding them. Another common relationship is a seasonal one, where values are related at a fixed interval of time (yearly temperature and rainfall patterns, for example). These and other relationships produce characteristic patterns in correlation functions that aid in their diagnosis.

2.2.1 The Autocorrelation Function (ACF)

The autocorrelation function (ACF), together with its graphical form, the correlogram, presents the autocorrelation data of section 2.1.3 across a range of lags. Displaying autocorrelation data as a graph makes it easier to interpret and understand.

The autocorrelation function involves calculating correlation at different lags within a

variable using the autocorrelation formula. Figure 2.5 shows a typical graphical

autocorrelation function (correlogram). Note that the correlogram is graphed only from lags of zero forward. This is because the autocorrelation function is symmetrical around zero, as negative lags are identical to positive lags (Nemec, 1996). That is, the autocorrelation at a lag of -k is the same as at a lag of k.


[Graph: Autocorrelation Function; Correlation (-1 to 1) against Lag (0 to 20)]

Figure 2.5: Example autocorrelation function.

If there were absolutely no correlation at a lag, a sample correlation of exactly zero might be expected. However, a small correlation will usually appear even if there is no relationship whatsoever. To judge which correlations could have appeared merely by chance and which are significantly different from zero, the standard error of the correlation coefficient is used. The standard error of the correlation coefficient is approximately $1/\sqrt{n}$, where n is the number of values being compared for an autocorrelation measure. That is, the standard error changes through the autocorrelation function because there are a different number of comparisons depending on the lag. When an autocorrelation measure is more than two standard errors away from zero, it is regarded as significant (Chatfield, 1980). That is, correlations more than $2/\sqrt{n}$ away from zero are seen as significant. Should a time series be completely random, the correlations (except at lag zero, which will always be one) should lie within this range, apart from around one in twenty that may fall outside it by chance.
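The two-standard-error rule can be sketched in Python as below. Following the description above, this sketch takes the standard error at lag k as $1/\sqrt{n-k}$, with $n-k$ the number of pairs compared. This is an assumption for illustration: statistical packages may use other approximations (SAS, for example, reports standard errors based on Bartlett's formula), so the lags flagged may differ slightly. The function name and autocorrelation values are hypothetical.

```python
import math

def significant_lags(acf, n):
    """Return the lags k >= 1 whose autocorrelation is more than two standard errors from zero.

    acf[k] is the sample autocorrelation at lag k and n is the length of the series.
    Assumption: the standard error at lag k is 1 / sqrt(n - k), the number of pairs compared.
    """
    flagged = []
    for k in range(1, len(acf)):  # lag 0 is always 1, so it is skipped
        se = 1 / math.sqrt(n - k)
        if abs(acf[k]) > 2 * se:
            flagged.append(k)
    return flagged

# Hypothetical autocorrelations for lags 0 to 3 from a series of length 100
print(significant_lags([1.0, 0.5, 0.1, -0.3], 100))
```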

The autocorrelation function can be used in initial time series analysis phases to deduce

the types of relationships at play, and also after fitting models to judge the success of

modelling.


2.2.2 The Partial Autocorrelation Function (PACF)

Whereas the standard autocorrelation function looks at each lag without considering the effect of other lags, the partial autocorrelation function allows for the shorter lags. That is, the partial autocorrelation at a lag of k measures the correlation between values k time steps apart after removing the effect of correlation at lags of 1, 2, 3, up to k-1.

The use of the partial autocorrelation function is best shown by an example. Let us say

that there is a strong correlation between maximum temperatures one day apart. That is,

there is a strong autocorrelation in temperature using a lag of one. Assuming this

correlation is sufficiently strong, standard autocorrelation will also report a significant

correlation between temperatures two days apart due to the lag one autocorrelation. For

example, since day one and two temperatures are highly correlated as are days two and

three, then days one and three (lag two) are going to have a certain amount of correlation

due entirely to the lag one correlation.

What may be desirable is a correlation measure that isolates the correlation directly attributable to a given lag, taking into account the effects of correlation at shorter lags. This is the role of the partial autocorrelation function (PACF).

Each partial autocorrelation coefficient $a_k$ is a measure of association between a time series Y and the same time series with a lag of k (Makridakis et al., 1998). Each partial autocorrelation coefficient $a_k$ is found by fitting the multiple regression model in Equation 2.16, where $b_k$ is an estimate of $a_k$. The parameter $b_k$ is a standard partial regression coefficient. For more information on multiple linear regression and partial regression coefficients, please refer to section 2.4.5.

$$Y_t = b_0 + b_1 Y_{t-1} + b_2 Y_{t-2} + \cdots + b_k Y_{t-k} \qquad (2.16)$$
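Fitting Equation 2.16 by least squares for every k is one way to obtain the partial autocorrelations. An alternative that works directly from the autocorrelations $r_k$ is the Durbin-Levinson recursion, sketched below in Python. It is swapped in here for illustration and is not the regression procedure described above; in sample data the two give similar but not identical values. For autocorrelations that decay as $r_k = \phi^k$ (the pattern of a first order autoregressive process), the partial autocorrelations should be $\phi$ at lag 1 and zero thereafter, which the example uses. The function name is hypothetical.

```python
def pacf_from_acf(acf, max_lag):
    """Partial autocorrelations for lags 1..max_lag from autocorrelations acf[0..max_lag].

    Uses the Durbin-Levinson recursion (Yule-Walker based), not least squares.
    """
    pacf = []
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = acf[1]
            phi = [phi_kk]
        else:
            num = acf[k] - sum(phi_prev[j] * acf[k - 1 - j] for j in range(k - 1))
            den = 1 - sum(phi_prev[j] * acf[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf

acf = [0.5 ** k for k in range(6)]  # hypothetical AR(1)-like autocorrelations
print(pacf_from_acf(acf, 5))
```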

The graphing of partial autocorrelation coefficients creates the partial autocorrelation

function. This can give a better indication of the exact location of autocorrelation within

variables than the standard autocorrelation function. Critical values for judging

significance are the same as in the standard autocorrelation function. The SAS ARIMA

procedure automatically creates partial autocorrelation functions.


2.2.3 The Inverse Autocorrelation Function (IACF)

Inverse autocorrelation is calculated on a time series by applying the standard autocorrelation function to a modified time series model. In the modified model, the autoregressive and moving average components are swapped (Chatfield, 1980). Autoregressive and moving average components are discussed in detail in sections 2.4.8 to 2.4.11.

The inverse autocorrelation function (IACF) is similar in use and result to the partial

autocorrelation function. The inverse autocorrelation function is regarded as particularly

useful for data with seasonal trends (SAS Institute, 1999). The inverse autocorrelation

function tends to show seasonal (and subset) trend sources more accurately than the other

functions.

The SAS statistical package automatically provides the inverse autocorrelation function,

though it is not as common in time series literature as the other functions documented

here. Critical values are again the same as in the standard autocorrelation function.

Example 3: Rainfall Correlation Functions

From the monthly rainfall data this example generates autocorrelation, partial

autocorrelation and inverse autocorrelation data. The functions formed from this data are

presented graphically. The purpose of this investigation is to find relationships within the

time series. For instance, is the rainfall in a month correlated with rainfall in the previous

month? Is rainfall for a month correlated with rainfall for that month the previous year?

The statistical package SAS was used for the detailed mathematical calculations involved

in this example. Selected input is attached in Appendix A and selected output in

Appendix B.

The autocorrelation function is shown graphically in Figure 2.6. The standard error reported by SAS is not constant: it increases from 0.07 at a lag of one to 0.091 at lag 24, because it allows for the autocorrelations at shorter lags (Bartlett's approximation). Those lags found to be more than two standard errors away from zero, and hence judged statistically significant, were lags of 0, 1, 7, 12, 16 and 18. The correlation at a lag of 0 is perfect and positive, since the data is being compared to itself (a lag of 0 is really no lag at all). The significant correlation at a lag of one reflects similar rainfall in months directly following each other. The significant correlation at lag 12 is to be expected, a side effect of the seasonal patterns seen in rainfall data. Interestingly, lags of 7, 16 and 18 are negative and significant, another reflection of seasonal trends (rainfall records around half a year apart will frequently be opposites).

[Graph: Autocorrelation Function; Correlation (-1 to 1) against Lag (0 to 24)]

Figure 2.6: Monthly rainfall graphical autocorrelation function.

The partial autocorrelation graph shown in Figure 2.7 presents a similar result to the

autocorrelation results. Only three significant effects were evident this time, at lags of 1,

12 and 16. The lags of 7 and 18 were shown not to be significant when all lags prior to

them were taken into account for partial autocorrelation. The lags found significant here

were all significant by the autocorrelation function and are assumed to have the same

interpretation here.


[Graph: Partial Autocorrelation Function; Correlation (-0.6 to 0.6) against Lag (1 to 24)]

Figure 2.7: Monthly rainfall partial graphical autocorrelation function.

The inverse autocorrelation function has a reputation for dealing with seasonal effects more appropriately. It is clear from the inverse autocorrelation function in Figure 2.8 that the data has been treated differently. Only the lag of 1 remains significant, while the seasonal lag of 12 was close to being significant, at 1.7 standard errors away from 0 (two standard errors is judged as significant). Although not significant here, the lag at 12 months was responsible for a number of significant lags that appeared as side effects in the standard autocorrelation function. The inverse autocorrelation function effectively removed these secondary seasonal effects.

[Graph: Inverse Autocorrelation Function; Correlation (-0.6 to 0.6) against Lag (1 to 24)]

Figure 2.8: Monthly rainfall inverse graphical autocorrelation function.


2.2.4 The Cross Correlation Function (CCF)

Whereas the autocorrelation function investigates correlation within one time series

variable, the cross correlation function looks at correlation between two time series

variables. To investigate relationships evident between two time series, correlations at both positive and negative lags need to be calculated. This is because a time series X may cause a delayed effect in a time series Y, or Y may cause a delayed effect in X. The cross correlation function uses a slight extension of the standard correlation r formula. These formulae, which differ for positive and negative lags of k, are given in Equations 2.17 and 2.18 (modified from McCleary and Hay, 1980).

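$$CCF(+k) = \frac{\sum_{i=1}^{n-k}\left(X_i - \bar{X}\right)\left(Y_{i+k} - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \qquad (2.17)$$

$$CCF(-k) = \frac{\sum_{i=k+1}^{n}\left(X_i - \bar{X}\right)\left(Y_{i-k} - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \qquad (2.18)$$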
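Equations 2.17 and 2.18 can be sketched in Python as follows (a hypothetical helper, not SAS output). Indexing starts at 0, and a negative `k` selects Equation 2.18. References and packages differ in which series is shifted for a positive lag, so the sign convention should be checked against the documentation of whichever package is used.

```python
import math

def ccf(xs, ys, k):
    """Cross correlation between X and Y at lag k (Equations 2.17 and 2.18).

    For k >= 0, X_i is paired with Y_{i+k}; for k < 0, X_i is paired with Y_{i-|k|}.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    denom = math.sqrt(sum((x - xbar) ** 2 for x in xs)) * math.sqrt(sum((y - ybar) ** 2 for y in ys))
    if k >= 0:
        num = sum((xs[i] - xbar) * (ys[i + k] - ybar) for i in range(n - k))
    else:
        m = -k
        num = sum((xs[i] - xbar) * (ys[i - m] - ybar) for i in range(m, n))
    return num / denom

x = [1, 0, 0, 0]
y = [0, 1, 0, 0]  # Y repeats X one time step later
for lag in (-1, 0, 1):
    print(lag, round(ccf(x, y, lag), 4))
```

Because Y echoes X one step later, the largest cross correlation appears at lag +1.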

Example 4: Rainfall Cross Correlation Function (CCF)

As well as total monthly rainfall, the Fenech household would also like to look at the number of rain days in each month. For this example, the interest is in investigating correlation relationships between monthly rainfall and days of rain. Does monthly rainfall depend on the days of rain in previous months, or vice versa? Figure 2.9 presents a line graph showing both rainfall and days of rain over time. Due to the large number of values included in the graph, it is difficult to draw many conclusions from it.


Figure 2.9: Rainfall and Days of Rain over time from 1985 to 2001 inclusive (line graph; x-axis Date, left y-axis Rainfall, right y-axis Days Of Rain).

Rather than calculating the cross correlations manually, the SAS statistical package was used. The SAS input used to generate these values is given in Appendix A and selected output in Appendix B. Figure 2.10 shows a graphical summary of the cross correlation function found. The cross correlation function is formed by effectively comparing rainfall with positive and negative time lags of rain days.

Figure 2.10: Rainfall and Days of Rain cross correlation function (graph of correlation against lag, lags −12 to 12).


The cross correlation function revealed a strong, significant correlation at a lag of zero, reflecting the fact that there is generally more rain in months with more rain days. The pattern is almost periodic, a reflection of the strong seasonal trends in the underlying rainfall data.


2.3 Repeated Measures Models

Many standard statistical tests involve assumptions that samples are independently and

randomly collected from populations. These assumptions dictate that the error terms in

models are random and uncorrelated. Frequently this is not the case and there is a degree

of correlation between measures. For example, consider the one factor ANOVA model for testing equality of treatment means given in Equation 2.19. The error terms ($\varepsilon_{ij}$) in this model represent the variation of randomly selected individuals within a treatment (Zar, 1999). Should there be a relationship between specific individuals (within or between treatments), the independence assumptions are violated and an alternative method of analysis should be considered (Rao, 1998).

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij} \tag{2.19}$$

Repeated measures is a label given to a large number of situations where multiple

measures are recorded on the same experimental unit (Rao, 1998). Repeated measures

situations implicitly violate independence assumptions because there may be correlation

between the multiple measures on the same experimental unit. The error terms in models involving repeated measures cannot safely be assumed to be independent and are likely to be correlated.

A wide variety of experimental situations involve repeated measures. Examples include

where different measures are made at a particular time on each experimental unit and

where the same measure is taken at a number of different times on each experimental

unit. Time series situations can be regarded as repeated measures where observations are

made at a number of different times on each experimental unit (measure).

There are a range of techniques available for both repeated measures and time series

analysis. Typically, repeated measures methods are used for the analysis of time series when there are few (up to ten) occasions, while analytical methods designed specifically for time series analysis are used when there are many (at least 25) occasions (Nemec, 1996). Methods available for analysis of repeated measures include split plot designs, mixed models and multivariate ANOVA (MANOVA).


This section first presents a background to the theoretical and practical aspects involved

in repeated measures situations. Split plot designs and MANOVA are then reviewed as

techniques available for the analysis of repeated measures. The SAS statistical package

(SAS Institute, 1999) supports evaluations using all methods introduced here to some

degree.

Working knowledge of ANOVA is assumed in this section. There is a plethora of textbooks available that detail degrees of freedom, models, blocking, ANOVA tables, multiple comparison tests, calculations and other issues pertaining to the use of ANOVA models. Recommended introductory texts include Bluman (2001) and Mann (1998). For more in-depth information, Rao (1998) and particularly Zar (1999) are recommended.

2.3.1 Background

Repeated measures situations commonly occur in ANOVA, the univariate analysis of

variance. In its most basic form, ANOVA tests for equality of means and interactions

between any number of factors (Bluman, 2001). ANOVA tests for equality of sample

means use F-tests of variance ratios. The F-test answers the question of whether the samples could have been taken from the same population. Briefly, the assumptions involved in standard ANOVA hypothesis testing are as follows (Zar, 1999; Bluman, 2001).

1. Model components are additive.

2. Each combination of factors has the same variance.

3. Each combination of factors is taken from a normally distributed population.

4. Error (or natural variation) model terms are independently and normally

distributed. This requires random and independent samples.

The ANOVA F-test is regarded as robust with respect to the second and third assumptions, as violating them makes little difference to the results (Zar, 1999). When a repeated measures situation is present, however, the assumption of independent error terms is violated. When the data have multiple measures recorded on a particular experimental unit, the error terms are not random and independent. There is an explicit correlation between the measures on each experimental unit.


The inclusion of time as a factor in an ANOVA model may be considered a possible way

of modelling a situation involving repeated measures over time. If exactly the same

measure on an experimental unit is taken at different moments in time there is likely to be

a degree of correlation between these measurements. This situation cannot be regarded as

having random, independent samples. Therefore a standard ANOVA model with time as

a factor does not accurately represent the situation.

To demonstrate how repeated measures situations naturally arise, two simple repeated measures situations are presented in Equation 2.20 and Equation 2.21. Both involve measuring moisture, where μ is the overall mean moisture level. For Equation 2.20, i represents the depth at which the moisture measure was taken and $\varepsilon_{ij}$ an estimate of error from replicates j within each depth level i. For Equation 2.21, i represents the time when the moisture measure was taken and $\varepsilon_{ij}$ an estimate of error from replicates j within each time level i. Note that the second situation could involve time series data whereas the first clearly does not.

$$\text{Moisture}_{ij} = \mu + \text{Depth}_i + \varepsilon_{ij} \tag{2.20}$$

$$\text{Moisture}_{ij} = \mu + \text{Time}_i + \varepsilon_{ij} \tag{2.21}$$

For the depth based moisture samples to be completely random, each moisture

measurement would have to be taken from a different, random soil sample. However, it is

likely to be more desirable to take a number of soil samples, divide each up into different

depths and calculate moisture levels from there. This likely situation involves taking

repeated measurements (moisture) from each experimental unit (soil sample). Therefore,

this can be considered a repeated measures situation where moisture is the repeated

measure. Measurements taken from within the same soil sample are likely to have a degree of correlation, leading to error terms that are not independent under standard ANOVA.

The time based moisture measurements would all need to be recorded at random locations to be regarded as independent random samples. Practicalities and the need to control other variables are likely to dictate that measurements are taken at the same location at each time interval. In this case, repeated measures (moisture) are being recorded on the same experimental unit (location). Hence this can also be considered a


repeated measures situation where moisture is the repeated measure. Measurements taken

at the same location are likely to have a degree of correlation, meaning that the error

terms are not going to be truly independent.

For examples later in this section, Equation 2.22 shows a univariate ANOVA model

where repeated measures occur. Because of assumption violations, it is important to note

that any standard ANOVA hinted at by this model is inappropriate. A dependent variable

moisture is being analysed using the variables location, time and block. There are to be

five locations, four times and three blocks. Due to the demonstrative nature of this

example, exact details such as units are not specified.

$$\text{Moisture}_{ijk} = \mu + \text{Location}_i + \text{Time}_j + (\text{Location} \times \text{Time})_{ij} + \text{Block}_k + \varepsilon_{ijk} \tag{2.22}$$

Where:

• i = 1, 2, 3, 4, 5 (location level indicator).

• j = 1, 2, 3, 4 (time level indicator).

• k = 1, 2, 3 (block level indicator).

The block variable is regarded as a standard block that is assumed not to interact with other factors. The error estimate $\varepsilon_{ijk}$ in Equation 2.22 therefore includes all interactions of the block with other factors. In Equation 2.23, the error term is expanded to show all the interactions implied to be within it.

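$$\text{Moisture}_{ijk} = \mu + \text{Location}_i + \text{Time}_j + (\text{Location} \times \text{Time})_{ij} + \text{Block}_k + (\text{Location} \times \text{Block})_{ik} + (\text{Time} \times \text{Block})_{jk} + (\text{Location} \times \text{Time} \times \text{Block})_{ijk} \tag{2.23}$$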

Repeated measures are involved in this situation because of the time factor.

Measurements taken at different times with the same block and location combination are

likely to have correlation that violates assumptions of independence. Therefore moisture

is regarded as a repeated measure over time. The following sections introduce two

general approaches that can be used to analyse repeated measures situations and apply

them to this particular problem.


2.3.2 Split Plot Designs

Split plot designs are a common type of analysis that can be applied to repeated

measures. A split plot design involves placing a subplot (or number of subplots) within a

main plot to test additional factors. Factorial effects contained in the main plot and

subplots are tested separately using different error terms (Rao, 1998). The purpose of

using the different error terms is to avoid violations of independence of errors.

The technique of using each main plot as a complete replicate for another factor often

carries with it a substantial economic or logical advantage (Cochran and Cox, 1957). For

situations where the same measure is recorded at different times, it allows for the analysis

to take place (a logical advantage). The situation where a split plot design is used to deal

with correlation between measurements at different times is referred to as a ‘split plot

over time’.

In split plot designs, the main plot has an estimate of error separate from the contents of any subplots. The main plot error satisfies the requirement of random, independent errors, leaving the factors that repeated measures are taken over (eg. time) in the subplots. In effect, for tests of significance involving main plot factorial effects, values are averaged over the variable that repeated measures are taken over (eg. time). This averaging removes the influence of repeated measures from the main plot.

Every factorial effect contained in the subplots (along with subplots of subplots and so forth, should they be involved) shares a common level of correlation in the error terms. This correlation is due to the repeated measures over the factors contained in the subplot. Factorial effects found in the subplot are tested for significance using a subplot error, which is completely separate from the main plot error. All factorial effects isolated in the subplot are affected by a standard random error and a common correlated error component (resulting from the repeated measures). Factorial effects within the subplot can therefore be compared because they share the same error components.

Once a split plot design is set up, analysis proceeds as per standard ANOVA using variance ratios and F-tests, with one important exception: factorial effects must be tested for significance using the appropriate error (natural variation) term. Drawbacks of split plot designs include that the separation may lead to low error degrees of freedom and the possible loss of valuable information. For further information on split plot designs, Cochran and Cox (1957) is recommended.

Example 5: Setting up a Split Plot Design

Equation 2.24 gives a version of the model introduced in section 2.3.1. In this base

ANOVA model, moisture is the repeated measure over time since identical moisture

measures taken at different times are assumed to be correlated. Hence factorial effects

involving time are placed in a subplot as is typical in a ‘split plot over time’ design. Table

2.2 shows the resulting projected ANOVA for the split plot design. The location and

block factors are tested using an estimation of error from the location and block

interaction (ε1). The time factor and time by location interaction are tested using all terms

involving time and block interactions (ε2).

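$$\text{Moisture}_{ijk} = \mu + \text{Location}_i + \text{Block}_k + (\text{Location} \times \text{Block})_{ik} + \text{Time}_j + (\text{Location} \times \text{Time})_{ij} + (\text{Block} \times \text{Time})_{jk} + (\text{Location} \times \text{Block} \times \text{Time})_{ijk} \tag{2.24}$$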

| Source of Variation | Degrees of Freedom |
|---|---|
| Location | 4 |
| Block | 2 |
| Location × Block (ε1) | 8 |
| Time | 3 |
| Location × Time | 12 |
| Block × Time (ε2) | 6 |
| Location × Block × Time (ε2) | 24 |
| Total | 59 |

Table 2.2: Split plot projected ANOVA for moisture repeated measures example.

Note that the first estimate of error (ε1) has a fairly low number of degrees of freedom, a common consequence of a split plot design. Each factorial effect is tested for significance against an error term that shares the same error components. In the main plot, all factorial effects are affected only by a random error because the values used in the analysis are averaged over the four times. All factorial effects in the subplot are affected by a completely random error component and a common correlated error component due to the repeated measures over time.
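The partition in Table 2.2 can be sketched numerically. The following is a minimal Python/NumPy sketch, assuming one observation per location × time × block cell stored as an array `y[location, time, block]` (a layout chosen here for illustration) and using randomly generated, hypothetical moisture values. It computes the sum of squares for each source, tests Location and Block against Location × Block (ε1), and tests Time and Location × Time against the pooled Block × Time and Location × Block × Time terms (ε2). P-values are omitted; they would come from an F-distribution with the matching degrees of freedom.

```python
import numpy as np


def split_plot_over_time(y):
    """Split plot ANOVA for y[location, time, block], one observation per cell."""
    y = np.asarray(y, dtype=float)
    a, t, b = y.shape                       # locations, times, blocks
    g = y.mean()                            # grand mean
    m_l = y.mean(axis=(1, 2))               # location means
    m_t = y.mean(axis=(0, 2))               # time means
    m_b = y.mean(axis=(0, 1))               # block means
    m_lt = y.mean(axis=2)                   # location x time means, shape (a, t)
    m_lb = y.mean(axis=1)                   # location x block means, shape (a, b)
    m_tb = y.mean(axis=0)                   # time x block means, shape (t, b)

    ss = {
        "Location": t * b * np.sum((m_l - g) ** 2),
        "Block": a * t * np.sum((m_b - g) ** 2),
        "Location x Block (e1)": t * np.sum((m_lb - m_l[:, None] - m_b[None, :] + g) ** 2),
        "Time": a * b * np.sum((m_t - g) ** 2),
        "Location x Time": b * np.sum((m_lt - m_l[:, None] - m_t[None, :] + g) ** 2),
        "Block x Time": a * np.sum((m_tb - m_t[:, None] - m_b[None, :] + g) ** 2),
    }
    ss_total = np.sum((y - g) ** 2)
    # With one observation per cell, the three-way interaction is the remainder.
    ss["Location x Block x Time"] = ss_total - sum(ss.values())

    df = {
        "Location": a - 1,
        "Block": b - 1,
        "Location x Block (e1)": (a - 1) * (b - 1),
        "Time": t - 1,
        "Location x Time": (a - 1) * (t - 1),
        "Block x Time": (b - 1) * (t - 1),
        "Location x Block x Time": (a - 1) * (b - 1) * (t - 1),
    }
    # Main plot error (e1) and pooled subplot error (e2), as in Table 2.2.
    ms_e1 = ss["Location x Block (e1)"] / df["Location x Block (e1)"]
    ss_e2 = ss["Block x Time"] + ss["Location x Block x Time"]
    ms_e2 = ss_e2 / (df["Block x Time"] + df["Location x Block x Time"])

    f = {
        "Location": ss["Location"] / df["Location"] / ms_e1,
        "Block": ss["Block"] / df["Block"] / ms_e1,
        "Time": ss["Time"] / df["Time"] / ms_e2,
        "Location x Time": ss["Location x Time"] / df["Location x Time"] / ms_e2,
    }
    return ss, df, f, ss_total


# Hypothetical data: 5 locations, 4 times, 3 blocks (not real measurements).
rng = np.random.default_rng(42)
moisture = 20 + rng.normal(size=(5, 4, 3))
ss, df, f, total = split_plot_over_time(moisture)
for source, value in f.items():
    print(f"{source}: F = {value:.2f}")
```

The degrees of freedom returned for this layout reproduce the column of Table 2.2, which is a quick way to confirm that the partition is set up as intended.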

2.3.3 MANOVA

MANOVA (multivariate ANOVA) presents an alternative way of dealing with repeated measures situations. MANOVA is a multiple variable (multivariate) equivalent of ANOVA. In MANOVA, correlation between levels of the repeated measure is dealt with by using vectors and matrices in place of the single values used in ANOVA.

MANOVA involves combining a number of univariate ANOVA models together into

one large vector based model. Each level of each variable that repeated measures are

taken over is represented by a univariate ANOVA model that is combined in MANOVA.

In a situation where repeated measures are recorded over time, this means that a univariate model for each different time is combined in MANOVA.

Standard ‘sums of squares’ measurements in the ANOVA table are replaced in

MANOVA with covariance matrices called ‘sums of squares and cross products’ (s.s.p.)

matrices. These are needed because the correlation between dependent variables must be

taken into account and covariance matrices provide that functionality. Mathematically,

correlation and covariance are closely linked but not the same (see section 2.1.2).

Elements along the diagonal of the s.s.p. matrices are conventional sums of squares, as they are calculated on the univariate components within the model. All other elements of these matrices (the cross products) measure how dependent variables vary together. Detailed formulae for these matrices can be found in Crowder and Hand (1990).

Hypothesis testing in MANOVA is again an extension of that used in ANOVA. A one

way ANOVA null hypothesis is usually of the form given in Equation 2.25, where g is

the number of groups. This claims equality of group means while the alternative

hypothesis is that two or more of the means are not equal. For MANOVA, the null

hypothesis is as in ANOVA for every dependent variable (Zar, 1999). So if there were n

separate dependent variables (still with g groups) the null hypothesis would look as in

Equation 2.26.


H0: µ1 = µ2 = … = µg ( 2.25 )

H0: µ11 = … = µg1 and µ12 = … = µg2 and … and µ1n = … = µgn ( 2.26 )

The alternative MANOVA hypothesis states that there are differences between at least two groups in at least one of the dependent variables. This is a vague conclusion and must therefore be investigated further should the null hypothesis be rejected. Note that the MANOVA hypothesis does not test (or enforce) that the means for different dependent variables are the same. A significant MANOVA result for a factorial effect can reflect significant differences between mean levels of that factorial effect and/or interactions between that factorial effect and the factors the repeated measures are taken over.

There are a number of different test statistics for MANOVA, all derived in different ways from the s.s.p. matrices. Standard notation in MANOVA denotes the between groups s.s.p. as $T_B$, the within groups s.s.p. as $T_W$ and the total s.s.p. as $T$. These matrices are the equivalents of the sums of squares in standard ANOVA. The common statistics are Wilks' lambda (determinant of $T_W$ divided by determinant of $T$), Roy's largest root (largest eigenvalue of $T_B T_W^{-1}$), the Hotelling-Lawley trace (sum of eigenvalues of $T_B T_W^{-1}$) and Pillai's trace (sum of eigenvalues of $T_B T^{-1}$). The value obtained from each of these statistics is converted to an approximate F-value (as used in ANOVA) and hence compared to an F-distribution. The degrees of freedom used in the F-test depend on the test statistic involved.
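The four statistics can be computed directly from the s.s.p. matrices. The sketch below is a minimal Python/NumPy version for a one-way layout (one grouping factor, p dependent variables such as moisture at p times), using hypothetical random data. The function name is hypothetical, and the conversion to approximate F-values and p-values is omitted; this is an illustration of the definitions above, not how SAS organises the computation.

```python
import numpy as np


def manova_statistics(groups):
    """One-way MANOVA test statistics from between, within and total s.s.p. matrices.

    groups: list of arrays, each of shape (n_g, p), one row per experimental unit.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.vstack(groups)
    grand = all_obs.mean(axis=0)
    p = all_obs.shape[1]
    t_w = np.zeros((p, p))
    t_b = np.zeros((p, p))
    for g in groups:
        d = g - g.mean(axis=0)
        t_w += d.T @ d                           # within-groups s.s.p. (T_W)
        c = (g.mean(axis=0) - grand)[:, None]
        t_b += len(g) * (c @ c.T)                # between-groups s.s.p. (T_B)
    t = t_b + t_w                                # total s.s.p. (T)
    # Eigenvalues of T_B T_W^-1 are real and non-negative; drop rounding noise.
    eig = np.linalg.eigvals(t_b @ np.linalg.inv(t_w)).real
    stats = {
        "wilks_lambda": np.linalg.det(t_w) / np.linalg.det(t),
        "roy_largest_root": eig.max(),
        "hotelling_lawley_trace": eig.sum(),
        # Sum of eigenvalues of T_B T^-1 equals the trace of that product.
        "pillai_trace": np.trace(t_b @ np.linalg.inv(t)),
    }
    return stats, eig


# Hypothetical data: 3 locations, 6 units each, moisture recorded at 4 times.
rng = np.random.default_rng(3)
groups = [rng.normal(loc=mu, size=(6, 4)) for mu in (0.0, 0.5, 1.0)]
stats, eig = manova_statistics(groups)
print(stats)
```

Two identities give a convenient check: Wilks' lambda equals the product of 1/(1 + λ) and Pillai's trace equals the sum of λ/(1 + λ), where λ are the eigenvalues of $T_B T_W^{-1}$.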

The ‘best’ MANOVA test statistic depends on properties of the data being analysed. Wilks' lambda tends to be the most common and is often used exclusively (such as in Johnson and Wichern, 1982). Zar (1999) points out that Pillai's trace tends to be the most robust of the methods, handling departures from strict assumptions reasonably well. The approximate F-value derived from Roy's largest root functions as an upper bound, so tests based on it tend to overstate significance.

In univariate ANOVA, multiple comparison tests should be carried out if a significant result is obtained from the main ANOVA. This highlights exactly where significant differences lie among the groups being compared. Multiple comparison tests in MANOVA involve splitting the MANOVA into the separate univariate ANOVA models and applying ANOVA multiple comparison techniques to each univariate model. In a MANOVA model of reasonable size, this ends up involving a large number of multiple comparison tests.

Given a standard allowable error level (α) of 0.05, if there are no real differences, an average of one in twenty multiple comparison tests will be significant purely by chance. This risk may be judged unacceptable when there are a large number of multiple comparison tests, as can be the case in MANOVA. One way around this is the Bonferroni approach, where a revised error level is used (Rao, 1998). As shown in Equation 2.27, the new allowable error level is the original error level divided by the number of multiple comparison tests k. The purpose of this modification is to reduce the occurrence of spurious significant results. The Bonferroni approach can be applied to any number of standard multiple comparison tests, including the protected t-test, SNK test and Tukey's HSD (Honestly Significant Difference).

$$\alpha_{new} = \frac{\alpha_{old}}{k} \tag{2.27}$$

There are a couple of drawbacks involved in MANOVA. Firstly, tests of equality of means cannot be carried out for the factor the repeated measures were taken over (eg. time). Furthermore, the additional parameter estimates required in MANOVA over ANOVA may leave few degrees of freedom for error estimation. As the number of dependent variables increases, the degrees of freedom available for error estimation decrease.

Example 6: Setting up a MANOVA Design

In the ongoing example involving modelling moisture from location, time and a block

effect, repeated measures were taken over the time variable. An ANOVA model

considering moisture measures at one moment in time is shown in Equation 2.28, where i

represents the particular location and j the block. The interaction of location and block is

an estimation of error or natural variation.


$$\text{Moisture}_{ij} = \mu + \text{Location}_i + \text{Block}_j + (\text{Location} \times \text{Block})_{ij} \tag{2.28}$$

Since there are four separate times, the MANOVA model simultaneously deals with four univariate models such as Equation 2.28. Equation 2.29 shows a vector based interpretation of what happens when the four ANOVA models are combined. This representation is compressed to give the form seen in Equation 2.30. Components here have interpretations as follows:

● $\text{Moisture}_{ijk}$ is the particular moisture measure at time i, location j and block k.

● $\mu_i$ is the mean moisture level at time i.

● $\text{Location}_{ij}$ is the effect of location j at time i.

● $\text{Block}_{ik}$ is the effect of block k at time i.

● $(\text{Location} \times \text{Block})_{ijk}$ is the interaction between location j and block k at time i.

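$$\begin{bmatrix} \text{Moisture}_{1jk} \\ \text{Moisture}_{2jk} \\ \text{Moisture}_{3jk} \\ \text{Moisture}_{4jk} \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \mu_4 \end{bmatrix} + \begin{bmatrix} \text{Location}_{1j} \\ \text{Location}_{2j} \\ \text{Location}_{3j} \\ \text{Location}_{4j} \end{bmatrix} + \begin{bmatrix} \text{Block}_{1k} \\ \text{Block}_{2k} \\ \text{Block}_{3k} \\ \text{Block}_{4k} \end{bmatrix} + \begin{bmatrix} (\text{Location} \times \text{Block})_{1jk} \\ (\text{Location} \times \text{Block})_{2jk} \\ (\text{Location} \times \text{Block})_{3jk} \\ (\text{Location} \times \text{Block})_{4jk} \end{bmatrix} \tag{2.29}$$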

$$\text{Moisture}_{ijk} = \mu_i + \text{Location}_{ij} + \text{Block}_{ik} + (\text{Location} \times \text{Block})_{ijk} \tag{2.30}$$

Note that the MANOVA presented here will not test for equality of time means. This reflects the common drawback of MANOVA whereby factors that the repeated measures are recorded over cannot be tested for equality of means.

For investigation purposes, suppose that location is found to be significant in the MANOVA model. That is, mean differences in moisture levels exist between at least two locations during at least one time. Multiple comparison tests should then be carried out to find exactly where these differences lie. Recall that five locations were specified previously. For each of the four univariate models there would be 10 pairwise comparison tests (4 + 3 + 2 + 1), leading to a total of 40 multiple comparison tests. Given a default error level (α) of 0.05, on average two (40 × 0.05) of these tests would be expected to be significant purely by chance if there were no real differences. Therefore a Bonferroni modified error level of 0.00125 (0.05 / 40) is worth considering to lessen the occurrence of spurious significant results.
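The arithmetic of this example can be sketched in a few lines of Python. The family-wise error rates at the end assume the 40 tests are independent, which is an assumption made here for illustration only; comparisons that share data are correlated in practice, although the Bonferroni bound of at most α overall holds regardless of dependence.

```python
from math import comb


def bonferroni_alpha(alpha, n_tests):
    """Equation 2.27: divide the original error level by the number of tests."""
    return alpha / n_tests


n_locations = 5
n_times = 4
alpha = 0.05

pairs_per_time = comb(n_locations, 2)          # 4 + 3 + 2 + 1 = 10 pairs
n_tests = pairs_per_time * n_times             # 40 comparisons in total
expected_false_positives = n_tests * alpha     # expected chance "significant" results
alpha_new = bonferroni_alpha(alpha, n_tests)

# Chance of at least one spurious result, assuming independent tests.
fwer_unadjusted = 1 - (1 - alpha) ** n_tests
fwer_bonferroni = 1 - (1 - alpha_new) ** n_tests

print(pairs_per_time, n_tests, expected_false_positives, alpha_new)
print(round(fwer_unadjusted, 3), round(fwer_bonferroni, 3))
```

Under the independence assumption, roughly 87% of such analyses would produce at least one spurious significant comparison at α = 0.05, compared with just under 5% at the Bonferroni level.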


2.4 Univariate Time Series Models

All of the statistical models to be dealt with in this section involve modelling values of one dependent variable from one or more independent variables. That is, models are created to explain the behaviour of a dependent variable using other variables we label independent. This section looks at different models commonly used for single (univariate) time series. In this univariate time series context, the ‘independent’ variables are versions of the time series being investigated: different forms of the original time series (such as lagged values) are used to predict that time series.

2.4.1 Time Series Model Components

Components of time series are often split into three separate parts – trend, season and

random error. These three components are usually viewed in their most fundamental

terms by plotting time (x-axis) against the value for the variable (y-axis). This type of plot

is commonly known as a ‘time plot’. The ongoing purely theoretical example for this

section will be average monthly temperature for a location over 100 years (there is no

actual data).

The term ‘trend’ is used in reference to long term changes in a property over time. We

may find that the average monthly temperature is slowly increasing over the 100 years,

forming a trend. Figure 2.11 shows a graph of a pure trend component.


Figure 2.11: Time plot of a pure trend component (axes: Time, Value).

A recurring pattern over time is often regarded as a seasonal trend. Average monthly temperature will tend to differ from month to month but return to about the same value in the same month every year. For this reason temperature commonly has a strong seasonal trend with a period of twelve months. It is possible to have other cyclic trends that are not based on years. These types of seasonal trends are dealt with in analysis exactly as yearly seasonal trends are. Figure 2.12 shows a graph of a pure seasonal (or cyclic) component.

Figure 2.12: Time plot of a pure seasonal component (axes: Time, Value).


The final component is random error or natural variation. Once everything possible has been taken into account, there will usually be a certain amount of unexplained variation. This component is the difference between the temperature in a particular month and the average temperature in that month (taking into account long term trends). For a purely random series, the sample autocorrelation coefficients are approximately normally distributed with a mean of 0 and a variance of 1/n, where there are n data entries (Chatfield, 1980). Figure 2.13 shows a time series with only a pure random component.
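Chatfield's 1/n result concerns the sample autocorrelations of a purely random series, and it can be checked by simulation. This Python/NumPy sketch (the seed, series length and number of lags are arbitrary choices) generates a purely random series, computes its sample autocorrelations at lags 1 to 50, and compares their variance with 1/n. The ±2/√n band it reports corresponds to the "two standard errors" rule used when judging autocorrelations as significant.

```python
import numpy as np


def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag, each divided by the total sum of squares."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[:-k] * d[k:]) / denom for k in range(1, max_lag + 1)])


rng = np.random.default_rng(0)
n = 2000
noise = rng.standard_normal(n)      # a purely random (white noise) series
r = sample_acf(noise, 50)

print("variance of r_k:", r.var(), " theory 1/n:", 1 / n)
print("share of lags inside +/- 2/sqrt(n):", np.mean(np.abs(r) < 2 / np.sqrt(n)))
```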

Figure 2.13: Time plot of a pure random component (axes: Time, Value).

It is unrealistic to expect to plot a variable against time and see simply a trend, seasonal trend or random component. Most time series are an intricate combination of these three components. Figure 2.14 shows a typical result of combining trend, season and random components. In this case, the three components have been added together. The three components could also be multiplied together or formed by any number of other mathematical combinations.


Figure 2.14: Time plot of trend, season and random components (additive; axes: Time, Value).

2.4.2 General Time Series Models

There is a large number of different models possible in time series. Hence a general decomposition form such as that in Equation 2.31 is often seen (Makridakis et al., 1998). Here, the dependent variable Y at a particular time t ($Y_t$) is seen as a function of seasonal ($S_t$), trend ($T_t$) and error ($\varepsilon_t$) components.

$$Y_t = f(S_t, T_t, \varepsilon_t) \tag{2.31}$$

The exact relationship between components is left out of the general model. This is

because there are many options for these relationships, including additive (Equation

2.32), multiplicative (Equation 2.33) and pseudo-additive (Equation 2.34). Details on the

building of additive models are given in later parts of section 2.4.

$$Y_t = S_t + T_t + \varepsilon_t \tag{2.32}$$

$$Y_t = S_t \times T_t \times \varepsilon_t \tag{2.33}$$

$$Y_t = T_t \left(S_t + \varepsilon_t - 1\right) \tag{2.34}$$
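The three decomposition forms can be illustrated by building a hypothetical series from known components, in the spirit of the theoretical monthly temperature example. No real data are used; the component shapes and magnitudes below are arbitrary assumptions. For the multiplicative and pseudo-additive forms, the seasonal and error terms are expressed as ratios centred on 1, the usual convention for those forms.

```python
import numpy as np

months = np.arange(120)                                 # 10 years of monthly values
trend = 15 + 0.01 * months                              # T_t: slow long term increase
rng = np.random.default_rng(1)

# Additive form (Equation 2.32): seasonal and error terms centred on 0.
season_add = 5 * np.sin(2 * np.pi * months / 12)        # S_t with a 12 month period
error_add = rng.normal(0, 1, size=months.size)          # e_t
y_additive = season_add + trend + error_add

# Multiplicative and pseudo-additive forms: seasonal and error terms centred on 1.
season_mult = 1 + 0.3 * np.sin(2 * np.pi * months / 12)
error_mult = rng.normal(1, 0.05, size=months.size)

y_multiplicative = season_mult * trend * error_mult             # Equation 2.33
y_pseudo_additive = trend * (season_mult + error_mult - 1)      # Equation 2.34
```

In the additive series the seasonal swing keeps a constant size, whereas in the multiplicative series it grows with the trend level, which is one practical way of choosing between the forms by inspecting a time plot.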

There is a clear limit to how much behaviour can be modelled in time series analysis. If relationships are searched for in too much detail, random error may end up being modelled, which is unproductive and misleading. Relationships that are really the result of random error do not actually exist, yet they are presented as meaningful sources of change. These misleading relationships render the results at least partially invalid. Modelling randomness in this way is labelled overfitting.

There are common summary measures available for evaluating the ‘best’ time series model. The model with the minimum mean square error (see section 2.1.4) gives the minimum difference between the predicted and actual values and is an indication of a good model. Another commonly used metric is Akaike's Information Criterion (AIC), which uses the likelihood to evaluate a model while penalising the number of estimated parameters. The best model is regarded as the one that yields the minimum value of the AIC (Pynnönen, 2001).
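The two criteria can be compared in a short sketch. Assuming a least-squares model with normally distributed errors, the AIC reduces, up to a constant that depends only on n, to n ln(SSE/n) + 2p for p estimated parameters; that Gaussian form is an assumption made here rather than the general likelihood-based definition. The data are hypothetical: a linear trend plus noise, fitted with a degree 1 and a degree 6 polynomial.

```python
import numpy as np


def mse(actual, predicted):
    """Mean square error between actual and predicted values."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return np.mean(e ** 2)


def aic_gaussian(actual, predicted, n_params):
    """AIC for a least-squares fit with normal errors, up to an additive constant."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    n = e.size
    return n * np.log(np.sum(e ** 2) / n) + 2 * n_params


# Hypothetical series: linear trend plus noise on a rescaled time axis.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 60)
y = 10 + 12 * x + rng.normal(0, 1, size=x.size)

results = {}
for degree in (1, 6):
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    results[degree] = (mse(y, fitted), aic_gaussian(y, fitted, degree + 1))
    print(degree, results[degree])
```

The degree 6 fit can never have a larger in-sample MSE than the nested degree 1 fit, which is why minimum MSE on its own can reward overfitting; the 2p term in the AIC is the penalty that counters this.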

2.4.3 Moving Averages

Every specific value in a time series is inevitably going to be affected by natural variation. This often leads to a time series having many ‘spikes’ through time that can

make the time series difficult to interpret and analyse. One method commonly used to

combat these noisy time series situations is to use a moving average. Moving averages

are a common tool used in time series analysis to smooth out random variation in data.

Figure 2.15 gives an example of the smoothing effect of a simple moving average

application.

Figure 2.15: A time series before and after applying a moving average smoother (panels: Original Time Series, Smoothed Time Series; axes: Time, Value).

Applying a moving average (usually) creates a smoother time series. Each value in the

new time series is a modified version of the old value taking into account surrounding

time series values. That is, each new value is an average of sorts of itself and surrounding


values in the original time series. Many methods use different weights for surrounding

values in the original series so that different time series locations are ‘prioritised’.

The general form of moving averages is given in Equation 2.35 (Chatfield, 1980). In this general form $T_t$ represents the resulting moving average series T at time t, $Y_{t+j}$ the values from the original time series around time t, and $a_j$ the weights on those original time series values. The half-width m is defined as m = (k – 1) / 2, where k is the number of values from the original time series being included. Note that this formula takes the same number of values on either side of the central position t, and the weights for these surrounding values are symmetrical. The weights must add to one to preserve the same scale as the original time series. The moving averages presented in this section are all specific types of this general moving average form.

$$T_t = \sum_{j=-m}^{m} a_j Y_{t+j} \tag{2.35}$$
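Equation 2.35 can be sketched as a small Python/NumPy function. The function name and the example weights are hypothetical; the checks mirror the conditions stated above (an odd number of symmetric weights that sum to one), and the output shows the m values lost at each end of the series.

```python
import numpy as np


def weighted_moving_average(y, weights):
    """Equation 2.35: T_t = sum over j = -m..m of a_j * Y_{t+j}.

    weights holds a_{-m}, ..., a_m. The result is shorter than y by k - 1,
    because m values are lost at each end of the series.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(weights, dtype=float)
    k = a.size
    if k % 2 == 0:
        raise ValueError("the general form needs an odd number of weights")
    if not np.allclose(a, a[::-1]):
        raise ValueError("weights must be symmetric about the centre")
    if not np.isclose(a.sum(), 1.0):
        raise ValueError("weights must sum to one to keep the original scale")
    m = (k - 1) // 2
    # For each centre t, pair a_{-m..m} with Y_{t-m..t+m}.
    return np.array([np.dot(a, y[t - m : t + m + 1]) for t in range(m, y.size - m)])


series = np.array([12.0, 15, 11, 18, 14, 20, 16, 22, 19, 25])   # hypothetical values
print(weighted_moving_average(series, [0.1, 0.2, 0.4, 0.2, 0.1]))
```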

In all moving averages there are trade-offs involving the number of positions k included in each moving average calculation. The higher the value of k, the smoother the resulting time series is likely to appear. However, including too many values may lead to oversmoothing, where genuine patterns are removed. Furthermore, the larger the value of k, the more values are lost at the beginning and end of the smoothed time series. This is because smoothed values at these positions would require values from before the start and after the end of the original time series.

The most basic form of a moving average is referred to as a simple moving average. This

type of moving average simply gives equal weighting to a set number of positions

surrounding each original position. The number of positions considered k must be odd for

the simple moving average to allow for the same number of values to be considered on

either side of the original position. For example, each point could be replaced with an

average of five values (two from either side of the original position). This means that the

value at time 8 will now be the average of the values at time 6, 7, 8, 9 and 10. This

procedure of selecting values around each time is applied to every observation. This

particular example using five values is known as a 5MA simple moving average. The general formula for a kMA moving average is given in Equation 2.36 (Makridakis et al., 1998). Note that Y is the original time series, T is the resulting smoothed time series and the half-width is m = (k – 1) / 2.

$$T_t = \frac{1}{k} \sum_{j=-m}^{m} Y_{t+j} \tag{2.36}$$
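The 5MA example above can be reproduced with a short Python/NumPy sketch of Equation 2.36. The series values are hypothetical and the function name is illustrative; NumPy's `convolve` with `mode="valid"` keeps only the positions where the full window fits, which matches the loss of m values at each end.

```python
import numpy as np


def simple_moving_average(y, k):
    """kMA of Equation 2.36: equal weights of 1/k on k consecutive values.

    Element i of the result is the smoothed value at original position i + m,
    where m = (k - 1) // 2 is the half-width.
    """
    if k % 2 == 0:
        raise ValueError("a simple moving average needs an odd k")
    y = np.asarray(y, dtype=float)
    # The kernel is symmetric, so convolution equals a sliding mean here.
    return np.convolve(y, np.ones(k) / k, mode="valid")


# Hypothetical series observed at times 1..12 (array index = time - 1).
y = np.array([3.0, 7, 4, 8, 6, 9, 5, 10, 8, 12, 9, 11])
smoothed = simple_moving_average(y, 5)

# The 5MA value at time 8 (index 7) sits at position 7 - m = 5 of the result
# and equals the mean of the values at times 6, 7, 8, 9 and 10.
print(smoothed[5], y[5:10].mean())
```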

The simple moving average only allows an odd number of terms (eg. k = 3, 5 or 7) to be

used, since the same number of values must be taken on either side of the position. A

centered moving average allows for moving averages using an even number of terms (k).

The centered moving average works by taking the average of two separate simple moving

averages. This concept is shown in Equation 2.37.

$$T_t = \frac{T_{t-0.5} + T_{t+0.5}}{2} \tag{2.37}$$

A moving average of this form is known as a 2 × kMA moving average because it is the average of two kMA moving averages. Consider a 2 × 4MA moving average evaluated at t = 4 in Equation 2.37, so that $T_4$ is the average of $T_{3.5}$ and $T_{4.5}$. An even-order moving average can be calculated for a value such as $T_{3.5}$ because it takes an equal number of points (two) on either side of time 3.5. Likewise, the 4MA value $T_{4.5}$ is the average of the values at times 3, 4, 5 and 6.

Centered moving averages give different weights to the first and last positions within each averaging window. These two values have half the weight of all other values involved. In the general case of a 2 × kMA centered moving average, every time series point used has a weight of 1/k except the first and last, which have a weight of 1/(2k). This behaviour is best observed in Example 7 at the end of this section.

The last type of moving average reviewed here is the double moving average, which is another extension of the simple moving average. It involves taking a moving average of a moving average. For example, Equation 2.38 shows how a 3 × 3MA double moving average $T'_t$ would be calculated, where each $T_i$ is itself a 3MA. Expanding these nested averages reveals the weight each original value implicitly receives, as shown for the 3 × 3MA in Equation 2.39.


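$$ T'_t = \frac{T_{t-1} + T_t + T_{t+1}}{3}, \qquad T_t = \frac{Y_{t-1} + Y_t + Y_{t+1}}{3} \qquad (2.38) $$

$$ T'_t = \frac{1}{3}\left[\frac{Y_{t-2} + Y_{t-1} + Y_t}{3} + \frac{Y_{t-1} + Y_t + Y_{t+1}}{3} + \frac{Y_t + Y_{t+1} + Y_{t+2}}{3}\right] = \frac{1}{9}Y_{t-2} + \frac{2}{9}Y_{t-1} + \frac{1}{3}Y_t + \frac{2}{9}Y_{t+1} + \frac{1}{9}Y_{t+2} \qquad (2.39) $$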
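A short Python sketch can check the implied weights in Equation 2.39. It assumes that a moving average of a moving average is the discrete convolution of the two weight sequences. This is a standard way of combining linear filters, not a method taken from the cited sources. Exact fractions are used to avoid rounding. The same function also shows that a 2 × 4MA is a 2MA applied to a 4MA, which gives the 1/8, 1/4, 1/4, 1/4, 1/8 pattern of a centered moving average.

```python
from fractions import Fraction


def combine_moving_averages(outer, inner):
    """Weights of a moving average (outer) taken of another moving average (inner).

    Each argument lists the weights a moving average gives to consecutive values;
    nesting two averages convolves their weight sequences.
    """
    combined = [Fraction(0)] * (len(outer) + len(inner) - 1)
    for i, w_outer in enumerate(outer):
        for j, w_inner in enumerate(inner):
            combined[i + j] += w_outer * w_inner
    return combined


ma3 = [Fraction(1, 3)] * 3
print(combine_moving_averages(ma3, ma3))  # weights on Y_{t-2}, ..., Y_{t+2}

ma2 = [Fraction(1, 2)] * 2
ma4 = [Fraction(1, 4)] * 4
print(combine_moving_averages(ma2, ma4))  # 2 x 4MA weights on Y_{t-2}, ..., Y_{t+2}
```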

Example 7: Calculating a 2 × 4MA (Centered Moving Average)

For this example a centered moving average value is calculated for one particular point

on a time series. The 2001 rainfall data in Table 2.3 is used to find a 2 × 4 MA value at

April 2001.

Month       Rainfall (mm)
Jan 2001    86.9
Feb 2001    140
Mar 2001    415
Apr 2001    48
May 2001    40.8
Jun 2001    13.3
Jul 2001    22.9
Aug 2001    2.5
Sep 2001    12

Table 2.3: Rainfall data from January to December in 2001.

A 2 × 4MA value at time 4 is the average of the 4MA values at times 3.5 and 4.5. When the moving averages at times 3.5 and 4.5 are fully expanded and simplified, the result is an expression that shows the different position weightings.

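$$ T_4 = \frac{T_{3.5} + T_{4.5}}{2} = \frac{1}{2}\left[\frac{Y_2 + Y_3 + Y_4 + Y_5}{4} + \frac{Y_3 + Y_4 + Y_5 + Y_6}{4}\right] = \frac{Y_2 + 2Y_3 + 2Y_4 + 2Y_5 + Y_6}{8} = \frac{1}{8}Y_2 + \frac{1}{4}Y_3 + \frac{1}{4}Y_4 + \frac{1}{4}Y_5 + \frac{1}{8}Y_6 $$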


The expansion shows that the first and last values in the window have a weight of one eighth, while the others have a weight of one quarter. This pattern, where the first and last values have half the weight of all other values, is expected of a centered moving average. Substituting the observed rainfall values gives the final moving average result.

$$ T_4 = \frac{1}{8}(140) + \frac{1}{4}(415) + \frac{1}{4}(48) + \frac{1}{4}(40.8) + \frac{1}{8}(13.3) = 17.5 + 103.75 + 12 + 10.2 + 1.6625 = 145.1125 $$
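The hand calculation can be reproduced with a short Python sketch of a 2 × kMA for even k. The sketch is built directly from Equation 2.37 as the average of the two kMA values on either side of each time. The function name is illustrative, and the list holds the Table 2.3 values. With k = 4, the value for April (index 3 in Python's zero-based indexing) matches the result above.

```python
def centered_moving_average(y, k):
    """2 x kMA (Equation 2.37) for even k; positions without a full window are None."""
    if k % 2 != 0:
        raise ValueError("a 2 x kMA is used with an even k")
    h = k // 2
    smoothed = [None] * len(y)
    for t in range(h, len(y) - h):
        before = sum(y[t - h:t + h]) / k         # kMA centred at t - 0.5
        after = sum(y[t - h + 1:t + h + 1]) / k  # kMA centred at t + 0.5
        smoothed[t] = (before + after) / 2
    return smoothed


rain_2001 = [86.9, 140, 415, 48, 40.8, 13.3, 22.9, 2.5, 12]  # Jan to Sep 2001, Table 2.3
print(centered_moving_average(rain_2001, 4)[3])  # April 2001
```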

Applying the above procedure to every position in the original time series produces a smoothed time series. Figure 2.16 shows the effect of this smoothing on the Buccan monthly rainfall data from 1994 to 2001 inclusive.


[Figure: two panels plotting Rainfall (mm), 0 to 400, against Time; the original series and the Smoothed Time Series.]

Figure 2.16: Applying a 2 × 4MA moving average to rainfall data (1994 to 2001).

2.4.4 Simple Linear Regression

Simple linear regression models one continuous dependent variable (Y) from one continuous independent variable (X). The analysis requires a data set containing values of X and the corresponding values of Y. Simple linear regression amounts to finding a ‘line of best fit’. Estimates of two population regression parameters, $\beta_0$ and $\beta_1$, are calculated to fit the model, and the $\varepsilon$ component represents error or natural variation. The general model is shown in Equation 2.40, where the subscript j represents a particular occasion.

$$ Y_j = \beta_0 + \beta_1 X_j + \varepsilon_j \qquad (2.40) $$

It is good practice to test the regression coefficient $\beta_1$ for significance. If $\beta_1$ is not significantly different from zero, there is no evidence of a linear relationship and X is not a useful predictor of Y. The graphical representation of simple linear regression in Figure 2.17 shows $\beta_0$ as the intercept and $\beta_1$ as the slope of the ‘line of best fit’.

Figure 2.17: Simple linear regression presented graphically.

The coefficient of determination, symbolised by r², gives the proportion of the variation in Y accounted for by X (Zar, 1999). It is calculated by dividing the regression sum of squares by the total sum of squares found during model evaluation. The result is a value between 0 and 1. Zero indicates that the model explains none of the variation, and one indicates that it explains all of it. In simple linear regression, r² is the square of the correlation coefficient (r) between X and Y, so these models work well when that correlation is strong.
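Below is a minimal pure-Python sketch of least squares estimation for Equation 2.40. It computes $\beta_0$, $\beta_1$ and r², with r² taken as the regression sum of squares divided by the total sum of squares. The data in the usage line are made up purely for illustration. Significance tests are left to a statistics package such as SAS.

```python
def simple_linear_regression(x, y):
    """Least squares fit of Y_j = b0 + b1 X_j + e_j (Equation 2.40).

    Returns (b0, b1, r2), where r2 is the regression sum of squares
    divided by the total sum of squares.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_regression = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    return b0, b1, ss_regression / ss_total


# Made-up illustration data, not the Buccan records
b0, b1, r2 = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1, r2)
```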

2.4.5 Multiple Linear Regression

Multiple linear regression extends simple linear regression to allow multiple independent variables. Instead of one independent variable X, there are now p independent variables $X_1, X_2, X_3, \ldots, X_p$. The analysis requires a data set containing a set of values of each X variable and the corresponding values of Y. The additional independent variables increase the number of regression coefficient estimates $\beta_i$, as seen in the general form given in Equation 2.41. Each regression coefficient $\beta_i$ is now regarded as a partial regression coefficient. Once again, the subscript j represents a particular occasion where values are available.

$$ Y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \ldots + \beta_p X_{pj} + \varepsilon_j \qquad (2.41) $$

The model as a whole should first be tested for significance. This F-test in effect tests the hypothesis that Y has no dependence on any of the X variables (Zar, 1999). If this test shows some relationship, each partial regression coefficient $\beta_i$ should then be tested for significance against the null hypothesis that it equals zero (no relationship).

As in simple linear regression, r² indicates how much of the variability in Y is explained by the model. A variant called the adjusted r² is particularly suited to multiple linear regression because it adjusts for the number of variables included in the model. By contrast, r² tends to increase as more variables are added, whether or not they have any meaningful relationship with Y.
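The following pure-Python sketch fits Equation 2.41 by solving the least squares normal equations. It reports both r² and the adjusted r², using the common definition adjusted r² = 1 − (1 − r²)(n − 1)/(n − p − 1) for n occasions and p independent variables. For a least squares fit with an intercept, 1 − (error sum of squares)/(total sum of squares) equals the regression-over-total ratio used above. This is an illustration under those assumptions, not a replacement for a statistics package. When predictors are strongly correlated, the normal equations become ill-conditioned, which is one practical face of the multicollinearity issue discussed below. The data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(xs, y):
    """Least squares fit of Y_j = b0 + b1 X_1j + ... + bp X_pj + e_j (Equation 2.41).

    xs holds one row of p predictor values per occasion j.
    Returns ([b0, ..., bp], r2, adjusted_r2).
    """
    n, p = len(y), len(xs[0])
    design = [[1.0] + [float(v) for v in row] for row in xs]  # leading 1 for b0
    xtx = [[sum(d[i] * d[k] for d in design) for k in range(p + 1)] for i in range(p + 1)]
    xty = [sum(d[i] * yj for d, yj in zip(design, y)) for i in range(p + 1)]
    beta = solve_linear_system(xtx, xty)  # normal equations (X'X) b = X'y
    y_bar = sum(y) / n
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    ss_error = sum((yj - fj) ** 2 for yj, fj in zip(y, fitted))
    ss_total = sum((yj - y_bar) ** 2 for yj in y)
    r2 = 1 - ss_error / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2


# Hypothetical data: two predictors recorded on six occasions
xs = [[0, 1], [1, 0], [2, 1], [3, 5], [4, 2], [1, 3]]
y = [4.1, 2.9, 8.0, 22.2, 14.8, 12.1]
beta, r2, adj_r2 = multiple_regression(xs, y)
print([round(b, 3) for b in beta], round(r2, 4), round(adj_r2, 4))
```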

Note that the term ‘independent’ is taken to mean that the independent variables should not be associated (correlated) with each other. Correlation between ‘independent’ variables leads to a situation known as multicollinearity. One of its many possible side effects is the unusual result of a significant model as a whole with no significant individual variables. Multicollinearity deserves particular attention in time series work because many time series models naturally involve a degree of it (see section 2.4.8). Where correlation exists between ‘independent’ variables, it must be carefully monitored. A non-standard estimation strategy such as ‘ridge regression’ or ‘two phase least squares’ may be needed to obtain accurate parameter estimates (Zar, 1999).


Several techniques are available for selecting which variables to include in a multiple linear regression model, including forward selection, backward selection and stepwise selection. These techniques are based on the significance of individual variables in the model, the associated r² values for the entire model and sometimes other criteria. Detailed discussion of these common techniques is beyond the scope of this thesis.

Forms of regression play an important role in time series analysis. The basic autoregressive (AR) models in section 2.4.8 and moving average (MA) models in section 2.4.9 both involve forms of regression. In turn, the autoregressive moving average (ARMA) models in section 2.4.10 and the autoregressive integrated moving average (ARIMA) models in section 2.4.11 are strongly based on regression models.

Regression is also fundamental to a number of less common time series techniques not discussed in detail in this thesis. Harmonic regression (HREG), for instance, uses sine and cosine components in a multiple linear regression to model a time series (Stergiou et al., 1997).

Example 8: Linear Regression

It has already been established that, for the Buccan rainfall data, the number of rain days in a month is related to total rainfall in that month. For this exercise, a simple linear regression model is fitted in which rainfall is modelled as dependent on rain days.

$$ \mathrm{Rain}_j = \beta_0 + \beta_1 \mathrm{Days}_j + \varepsilon_j $$

The SAS statistical package (SAS Institute, 1999) provides the ‘glm’ and ‘reg’ procedures for evaluating regression models. Appendix A contains the code used with both procedures for the regression analysis of the Buccan rainfall data.

The two procedures produce the same results but present them slightly differently. Only ‘reg’ outputs an adjusted r² value, while ‘glm’ gives a greater variety of ‘sums of squares’ measures. Raw output for both procedures is provided in Appendix B.


Evaluation of the model as a whole found it to be significant, with an F-value of 111.14 and a resulting p value of less than 0.0001. The null hypothesis that there is no relationship between the independent (days of rain) and dependent (rainfall) variables is therefore confidently rejected.

An r² value of 0.3549 indicates that 35.49% of the variation observed in rainfall is explained by the days of rain variable. With only one independent variable, the adjusted r² is very similar at 0.3517.
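The reported statistics can be cross-checked, because in a regression with an intercept both the overall F value and the adjusted r² follow from r², the number of occasions n and the number of predictors p. The number of observations is not stated in this example. The sketch therefore assumes, hypothetically, the same 204 months (1985 to 2001) used in Example 9. Under that assumption, the sketch gives F ≈ 111.13 and an adjusted r² ≈ 0.3517. Both agree with the SAS output, and the small F difference is consistent with r² having been rounded to four decimal places.

```python
def f_and_adjusted_r2(r2, n, p=1):
    """Overall F statistic and adjusted r^2 implied by r^2, n occasions and p predictors."""
    f_value = (r2 / p) / ((1 - r2) / (n - p - 1))
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return f_value, adj_r2


# Assumption: n = 204 monthly observations (January 1985 to December 2001)
f_value, adj_r2 = f_and_adjusted_r2(0.3549, n=204)
print(round(f_value, 2), round(adj_r2, 4))
```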

Since the model as a whole was found to be significant, the model components are investigated next. Days of rain is found to have a significant effect on rainfall (p < 0.0001). In simple linear regression, the significance result for the model as a whole and for the single independent variable will always be the same, because the model F statistic equals the square of the t statistic for the slope.

Parameter estimates for the intercept ($\beta_0$) and days ($\beta_1$) are calculated to give the final model below, with standard errors shown in parentheses. Notice that these parameters do not necessarily make complete logical sense, as the model suggests that a month with no rain days would have about −17 mm of rainfall.

$$ \mathrm{Rain}_j = \underset{(11.073)}{-17.837} + \underset{(1.221)}{12.869} \times \mathrm{Days}_j $$

2.4.6 Stationarity

A number of the models looked at in later sections (AR, MA and ARMA models) require

a time series to be stationary (Chatfield, 1980). This section considers the issue of

stationarity.

For a time series to be stationary, it must be stationary in both mean and variance; that is, the mean and variance must not change over time. If a time series is gradually increasing or decreasing over time, this trend makes it non-stationary in mean. If the data become more or less variable over time, the time series is non-stationary in variance. In many cases, a simple time series graph (the variable plotted against time) makes it clear whether a time series is stationary. Where this is not clear, a number of standard tests are available to assess stationarity, including a group known as unit root tests (Makridakis et al., 1998).

Three tests of stationarity are the Dickey-Fuller test, the random walk test and the Phillips-Perron test (SAS Institute, 1999). The most commonly used of these, the Dickey-Fuller test, is based on a multiple linear regression model. It has several forms, including the zero mean, single mean and trend versions (SAS Institute, 1999). The zero mean test assumes a mean of zero. The single mean test allows a mean of any value, and the trend test allows for a trend (a change in mean over time). All versions test for stationarity of variance, while all except the trend version also test for stationarity of mean.

Equation 2.42 shows the regression model for the zero mean Dickey-Fuller test, Equation 2.43 the model for the single mean test and Equation 2.44 the model for the trend test. Each of these augmented Dickey-Fuller tests produces an estimate of the parameter φ (phi). If the series is stationary, φ will be negative; otherwise it will be close to zero. φ can be tested in several ways, including conversion to rho for comparison with the Dickey-Fuller null distribution, and an F approximation. The number of lagged differenced terms included is flexible, with around three recommended (Makridakis et al., 1998).

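$$ Y'_t = \phi Y_{t-1} + \beta_1 Y'_{t-1} + \beta_2 Y'_{t-2} + \ldots + \beta_p Y'_{t-p} \qquad (2.42) $$

$$ Y'_t = \phi Y_{t-1} + \beta_0 + \beta_1 Y'_{t-1} + \beta_2 Y'_{t-2} + \ldots + \beta_p Y'_{t-p} \qquad (2.43) $$

$$ Y'_t = \phi Y_{t-1} + \alpha t + \beta_0 + \beta_1 Y'_{t-1} + \beta_2 Y'_{t-2} + \ldots + \beta_p Y'_{t-p} \qquad (2.44) $$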

In the Dickey-Fuller regression models (Equations 2.42 to 2.44), $Y_{t-1}$ is the original time series lagged by one period, $Y'_t$ is the differenced series and each $Y'_{t-k}$ is the differenced series lagged by k periods. The values β0, β1 and so on (and α in the trend version) are standard regression coefficients. A differenced time series is formed by calculating the change at each point of the original time series, as shown in Equation 2.45. Here $Y_t$ is the original time series, $Y_{t-1}$ is the original time series lagged by one and $Y'_t$ is the resulting differenced time series.

$$ Y'_t = Y_t - Y_{t-1} \qquad (2.45) $$
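A minimal Python sketch of the simplest case shows why φ separates the two situations. It uses Equation 2.42 with no lagged difference terms and estimates φ by least squares regression of $Y'_t$ on $Y_{t-1}$ without an intercept. Two simulated series are used: a stationary AR(1) series and a random walk. The simulation settings (coefficient 0.5, 500 steps, seed 1) are arbitrary choices for illustration. The sketch returns only the estimate. Judging whether φ is significantly negative requires the Dickey-Fuller null distribution, as used by the SAS tests described above, not ordinary t tables.

```python
import random


def dickey_fuller_phi(y):
    """Least squares estimate of phi in Y'_t = phi * Y_{t-1} (Equation 2.42, no lagged differences).

    Only the estimate is returned. Its significance must be judged against the
    Dickey-Fuller distribution, not ordinary t tables.
    """
    lagged = y[:-1]                                      # Y_{t-1}
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]  # Y'_t
    return sum(d * lag for d, lag in zip(diffs, lagged)) / sum(lag * lag for lag in lagged)


rng = random.Random(1)  # fixed seed so the illustration is repeatable
stationary, walk = [0.0], [0.0]
for _ in range(500):
    stationary.append(0.5 * stationary[-1] + rng.gauss(0, 1))  # zero-mean AR(1)
    walk.append(walk[-1] + rng.gauss(0, 1))                    # random walk (non-stationary)

print(dickey_fuller_phi(stationary))  # expected to be near 0.5 - 1 = -0.5
print(dickey_fuller_phi(walk))        # expected to be near 0
```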

Stationarity needs to be achieved before a time series is analysed conventionally. This process is referred to as prewhitening in McLeary and Hay (1980), among others. If a time series is not stationary in variance, a mathematical transformation is applied to it, such as the natural logarithm or the inverse (dividing one by each value) (Makridakis et al., 1998). Logarithms are particularly common and have been used in many applications, such as those of Chan et al. and Nicholson et al. (1998).

If a time series is not stationary in mean, the differencing described above is used. Sometimes second-order differencing is needed to remove non-stationarity in the mean. In that case, standard differencing is carried out a second time, as in Equation 2.46, where $Y''_t$ is the resulting second-order differenced series. In practice it is rarely necessary to go beyond second-order differencing (Makridakis et al., 1998).

$$ Y''_t = Y'_t - Y'_{t-1} \qquad (2.46) $$

Example 7: Differencing a Time Series

This example has two parts. In the first, first- and second-order differenced series are calculated from a subset of the Buccan rainfall data. In the second, a data set is created and then made stationary using differencing.

Part 1

The first eight months of rainfall in 2001 are taken as the original time series Y. The first lag of this series, $Y_{t-1}$, is found simply by shifting $Y_t$ down one position. The first-order differenced series can then be calculated using $Y'_t = Y_t - Y_{t-1}$. After the differenced series is lagged to create $Y'_{t-1}$, the second-order differenced series is found by calculating $Y''_t = Y'_t - Y'_{t-1}$. Table 2.4 shows the results of applying the differencing to the 2001 data. Notice that each successive difference loses one value from the time series. While this may not matter much for a long series such as the full rainfall record, it should be considered for short time series.

t   Y_t    Y_{t-1}   Y'_t = Y_t − Y_{t-1}   Y'_{t-1}   Y''_t = Y'_t − Y'_{t-1}
1   86.9   -         -                      -          -
2   140    86.9      53.1                   -          -
3   415    140       275                    53.1       221.9
4   48     415       -367                   275        -642
5   40.8   48        -7.2                   -367       359.8
6   13.3   40.8      -27.5                  -7.2       -20.3
7   22.9   13.3      9.6                    -27.5      37.1
8   2.5    22.9      -20.4                  9.6        -30

Table 2.4: First and second differencing applied to 2001 rainfall data.
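A short Python sketch reproduces the differenced columns of Table 2.4 by applying Equation 2.45 once and then again (Equation 2.46). The helper name `difference` is illustrative. Results are rounded to one decimal place, as in the table, because floating point subtraction of values such as 140 − 86.9 is not exact.

```python
def difference(series):
    """First differences Y'_t = Y_t - Y_{t-1} (Equation 2.45); one value is lost."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


rain_2001 = [86.9, 140, 415, 48, 40.8, 13.3, 22.9, 2.5]  # January to August 2001
first = difference(rain_2001)   # Y'_t for t = 2..8
second = difference(first)      # Y''_t for t = 3..8 (Equation 2.46)
print([round(v, 1) for v in first])
print([round(v, 1) for v in second])
```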

Part 2

A time series was created with a definite trend over time and a random error that increased slightly in size as time went on. Simple first-order differencing was applied to make the time series stationary in mean. Figure 2.18 shows the original time series along with the resulting differenced series, and it is clear that the trend has been completely removed.


[Figure: two panels plotting Value against Time; the created series with a trend and the Differenced Time Series.]

Figure 2.18: Applying differencing to a created series with a clear trend.


2.4.7 Backshift Notation

Backshift notation is commonly used to represent time series models (Makridakis et al., 1998). Although fundamentally simple, it makes complex models much easier to write and manipulate. The backshift operator B is defined in Equation 2.47.

$$ B Y_t = Y_{t-1} \qquad (2.47) $$

The backshift operator B simply shifts a time series Y back by one time period. Applying B n times shifts the time series back n time periods, as seen in Equation 2.48.

$$ B^n Y_t = Y_{t-n} \qquad (2.48) $$
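The Python sketch below makes the operator concrete. It applies $B^n$ to a list and uses the result to form $(1 - B)Y_t = Y_t - Y_{t-1}$, which is the first difference of Equation 2.45 written in backshift form. Marking the missing early values as None is an arbitrary choice for illustration, and the function names are hypothetical.

```python
def backshift(series, n=1):
    """Apply B^n (Equation 2.48): position t receives Y_{t-n}.

    The first n positions have no earlier value and are returned as None.
    """
    return [None] * n + list(series[:len(series) - n])


def one_minus_b(series):
    """(1 - B) Y_t = Y_t - Y_{t-1}, the first difference of Equation 2.45."""
    lagged = backshift(series)
    return [None if lag is None else y - lag for y, lag in zip(series, lagged)]


y = [5, 7, 4, 9]
print(backshift(y, 2))  # [None, None, 5, 7]
print(one_minus_b(y))   # [None, 2, -3, 5]
```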

2.4.8 AR (Autoregressive) Models

Section 2.1.3 introduced the concepts of autocovariance and autocorrelation. These measure the covariance and correlation of a variable with itself at different lags. Autoregressive models build on the same idea, using multiple linear regression to model current values of a time series from previous values of the same series (Chatfield, 1980). These models assume that current values depend entirely on previous values, with any differences due to random noise.

An autoregressive model of order p is referred to as AR(p). For example, an autoregressive model that includes the first three lags of the series would be referred to as AR(3). The general form of an autoregressive model, shown in Equation 2.49, is much the same as a standard multiple linear regression model. The backshift notation form is shown in Equation 2.50.

$$ Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \ldots + \phi_p Y_{t-p} + \varepsilon_t \qquad (2.49) $$

Where:

• $Y_t$ represents the observation of time series Y at time t.

• c is a standard regression constant.

• $\phi_i$ are standard regression coefficients.

• $Y_{t-i}$ is the observation of time series Y at a lag of i.

• $\varepsilon_t$ is the random error at time t.

$$ (1 - \phi_1 B - \ldots - \phi_p B^p) Y_t = c + \varepsilon_t \qquad (2.50) $$
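Equation 2.49 can be read as a prediction rule. Because the random error has mean zero, the expected value of $Y_t$ is the constant plus the weighted sum of the previous p observations. The Python sketch below assumes the coefficients are already known, whereas in practice they would be estimated, for example by SAS proc arima. The AR(2) values in the usage line are hypothetical.

```python
def ar_predict(c, phi, history):
    """Expected value of Y_t under an AR(p) model (Equation 2.49).

    phi holds [phi_1, ..., phi_p] and history holds past observations in time
    order, ending with Y_{t-1}. The error e_t has mean zero, so it drops out.
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("an AR(p) prediction needs at least p past values")
    recent = history[::-1][:p]  # Y_{t-1}, Y_{t-2}, ..., Y_{t-p}
    return c + sum(f * y for f, y in zip(phi, recent))


# Hypothetical AR(2): c = 1, phi_1 = 0.5, phi_2 = 0.2, with Y_{t-2} = 2 and Y_{t-1} = 4
print(ar_predict(1.0, [0.5, 0.2], [2.0, 4.0]))
```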

Determination of whether a time series follows an autoregressive model and how many

terms to include can be aided by the use of autocorrelation and partial autocorrelation

functions.

An autoregressive model of order one, AR(1), is a model where each time series value depends only on the previous value (plus random error). The autocorrelation function for this model is expected to decay exponentially to zero on the positive side if $\phi_1$ is positive. If $\phi_1$ is negative, it is expected to alternate in sign, starting with a negative correlation. The partial autocorrelation function is expected to show a spike at lag one (of the same sign as $\phi_1$) before cutting off straight to zero. For the AR(1) case with positive $\phi_1$, Figure 2.19 displays a typical autocorrelation function and Figure 2.20 a typical partial autocorrelation function.

[Figure: AR(1) Autocorrelation Function; Correlation (−1 to 1) plotted against Lag (1 to 12).]

Figure 2.19: Typical autocorrelation function for AR(1), positive $\phi_1$.

[Figure: AR(1) Partial Autocorrelation Function; Correlation (−1 to 1) plotted against Lag (1 to 12).]

Figure 2.20: Typical partial autocorrelation function for AR(1), positive $\phi_1$.

For the general autoregressive model of order p, AR(p), a characteristic pattern can be anticipated. The autocorrelation function is expected to decay exponentially (potentially in a sine-wave pattern). The partial autocorrelation function is expected to have spikes at lags one to p and then cut off to zero (Makridakis et al., 1998). The partial autocorrelation function has spikes at several lags because the series has a separate relationship with each of its first p lags. For the AR(p) case, Figure 2.21 displays a typical autocorrelation function and Figure 2.22 a typical partial autocorrelation function.

[Figure: AR(p) Autocorrelation Function; Correlation (−1 to 1) plotted against Lag (1 to 12).]

Figure 2.21: Typical autocorrelation function for AR(p).

[Figure: AR(p) Partial Autocorrelation Function; Correlation (−1 to 1) plotted against Lag (1 to 12).]

Figure 2.22: Typical partial autocorrelation function for AR(p).
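The AR patterns described above can be checked numerically. The Python sketch below computes the theoretical autocorrelation function of a stationary AR(2) model, which reduces to AR(1) when $\phi_2 = 0$, from the standard Yule-Walker recursion. It then converts the autocorrelations to partial autocorrelations with the Durbin-Levinson recursion. Neither recursion is described in this thesis; both are standard results, and the parameter values are hypothetical. The partial autocorrelations cut off after lag p, while the autocorrelations decay.

```python
def ar2_acf(phi1, phi2, max_lag):
    """Theoretical ACF rho_0..rho_max_lag of a stationary AR(2) model.

    Setting phi2 = 0 gives the AR(1) case, where rho_k = phi1 ** k.
    Uses rho_1 = phi1 / (1 - phi2) and rho_k = phi1 rho_{k-1} + phi2 rho_{k-2}.
    """
    rho = [1.0, phi1 / (1 - phi2)]
    for _ in range(2, max_lag + 1):
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])
    return rho[:max_lag + 1]


def pacf_from_acf(rho):
    """Durbin-Levinson recursion: partial autocorrelations at lags 1..len(rho) - 1."""
    pacf = []
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1} from the previous step
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Hypothetical AR(1) with phi_1 = 0.6, and AR(2) with phi_1 = 0.5, phi_2 = 0.3
for phi1, phi2 in [(0.6, 0.0), (0.5, 0.3)]:
    rho = ar2_acf(phi1, phi2, 6)
    print("ACF ", [round(r, 3) for r in rho[1:]])
    print("PACF", [round(v, 3) for v in pacf_from_acf(rho)])
```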

2.4.9 MA (Moving Average) Models

Moving average models use a form of multiple linear regression to model time series values from previous errors. In effect, the series is assumed to be formed from a moving average of the error series.

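A moving average model of order q is referred to as MA(q) and is shown in Equation 2.51. For example, a moving average model that includes the first three lags of the error would be referred to as MA(3). Subtracting the error terms is a standard notational convention. It does not change what the model can represent, but it does determine the signs of the autocorrelation patterns described below.

$$ Y_t = c + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \ldots - \theta_q \varepsilon_{t-q} \qquad (2.51) $$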

Where:

• $Y_t$ represents the observation of time series Y at time t.

• c is a standard regression constant.

• $\theta_i$ are standard regression coefficients.

• $\varepsilon_t$ is the random error at time t.

• $\varepsilon_{t-i}$ is the random error at a time lag of i.

The backshift notation form is shown in Equation 2.52. As was the case for AR models,

the autocorrelation and partial autocorrelation functions can help in diagnosing the

appropriateness of MA models and the terms to include.


$$ Y_t = c + (1 - \theta_1 B - \ldots - \theta_q B^q) \varepsilon_t \qquad (2.52) $$

A moving average model of order one, MA(1), is a model where each time series value depends only on the current and previous errors. The autocorrelation function for this model is expected to have a spike at lag one and then cut off to zero. Because of the minus sign in Equation 2.51, the spike has the opposite sign to the $\theta_1$ coefficient. The partial autocorrelation function is expected to decay exponentially on the negative side if $\theta_1$ is positive. If $\theta_1$ is negative, it is expected to alternate in sign, starting from the positive side. For the MA(1) case with positive $\theta_1$, Figure 2.23 displays a typical autocorrelation function and Figure 2.24 a typical partial autocorrelation function. Interestingly, the expected patterns in the autocorrelation and partial autocorrelation functions are almost exactly swapped around from the AR model case.

[Figure: MA(1) Autocorrelation Function; Correlation (−1 to 1) plotted against Lag (1 to 12).]

Figure 2.23: Typical autocorrelation function for MA(1), positive $\theta_1$.

[Figure: MA(1) Partial Autocorrelation Function; Correlation (−1 to 1) plotted against Lag (1 to 12).]

Figure 2.24: Typical partial autocorrelation function for MA(1), positive $\theta_1$.


For the general moving average model of order q, MA(q), a characteristic pattern can again be anticipated. The autocorrelation function is expected to have spikes at lags 1 to q and then cut off to zero. The partial autocorrelation function is expected to decay exponentially, potentially in a sine-wave pattern (Makridakis et al., 1998). Once again, the expected behaviour of MA models in the autocorrelation and partial autocorrelation functions is almost exactly swapped around from the AR model case.

[Figure: MA(q) Autocorrelation Function; Correlation (−1 to 1) plotted against Lag (1 to 12).]

Figure 2.25: Typical autocorrelation function for MA(q).

[Figure: MA(q) Partial Autocorrelation Function; Correlation (−1 to 1) plotted against Lag (1 to 12).]

Figure 2.26: Typical partial autocorrelation function for MA(q).
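The Python sketch below computes the theoretical autocorrelation function of an MA(q) model written with the minus-sign convention of Equation 2.51. It uses the standard result that the lag-k autocorrelation equals the sum of products of the error weights k positions apart, divided by the sum of squared weights. This formula is not given in the thesis. The output illustrates the cut-off after lag q. It also shows that, for MA(1), the lag-one autocorrelation is $-\theta_1/(1 + \theta_1^2)$, opposite in sign to $\theta_1$. The parameter values are hypothetical.

```python
def ma_acf(theta, max_lag):
    """Theoretical ACF rho_0..rho_max_lag of Y_t = c + e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q}.

    With weights psi = [1, -theta_1, ..., -theta_q] on the errors, the lag-k
    autocorrelation is sum_j psi_j psi_{j+k} / sum_j psi_j ** 2, which is
    exactly zero once k exceeds q.
    """
    psi = [1.0] + [-t for t in theta]
    denom = sum(w * w for w in psi)
    acf = [1.0]
    for k in range(1, max_lag + 1):
        acf.append(sum(psi[j] * psi[j + k] for j in range(len(psi) - k)) / denom)
    return acf


print(ma_acf([0.5], 4))        # hypothetical MA(1), theta_1 = 0.5
print(ma_acf([0.5, -0.3], 5))  # hypothetical MA(2)
```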


2.4.10 ARMA (Autoregressive Moving Average) Models

Autoregressive moving average (ARMA) models are a useful set of time series models that combine autoregressive (AR) and moving average (MA) models. The resulting form can be seen in Equation 2.53.

$$ Y_t = c + \phi_1 Y_{t-1} + \ldots + \phi_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \ldots - \theta_q \varepsilon_{t-q} \qquad (2.53) $$

All symbols used in Equation 2.53 are as defined in sections 2.4.8 and 2.4.9 for the AR

and MA models combined to form this model. The backshift notation form can be found

in Equation 2.54.

$$ (1 - \phi_1 B - \ldots - \phi_p B^p) Y_t = c + (1 - \theta_1 B - \ldots - \theta_q B^q) \varepsilon_t \qquad (2.54) $$

ARMA models and their components are identified by looking for the underlying patterns of the AR and MA models discussed in sections 2.4.8 and 2.4.9. Experimentation is common with ARMA models: different model components are tested to see which turn out to be significant or useful. Specification of model components in AR, MA, ARMA and ARIMA (section 2.4.11) models is not an exact science.

2.4.11 ARIMA (Autoregressive Integrated Moving Average) Models

All models looked at so far (AR, MA and ARMA) require a time series to be stationary. Stationarity of variance is achieved by mathematical transformation, and stationarity of mean by differencing. See section 2.4.6 for more information on stationarity.

Autoregressive integrated moving average models, hereafter referred to as ARIMA models, allow for autoregressive (AR), integrated (I) and moving average (MA) components. The term ‘integrated’ refers to the use of integration to return the series to its original form after differencing and evaluation. Integration is the inverse of differencing.
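The inverse relationship can be sketched in Python. Cumulative summation of a differenced series, started from the first original value, rebuilds the original series. Undoing second-order differencing repeats the step one level at a time, which needs the first value of the first-differenced series as well. The helper names are illustrative, and `itertools.accumulate` with the `initial` argument requires Python 3.8 or later.

```python
from itertools import accumulate


def difference(series):
    """First differences Y'_t = Y_t - Y_{t-1} (Equation 2.45)."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def integrate(diffs, first_value):
    """Invert first differencing: Y_t = Y_{t-1} + Y'_t, starting from Y_1."""
    return list(accumulate(diffs, initial=first_value))


y = [3, 5, 4, 10, 12]
d1 = difference(y)
d2 = difference(d1)
# Undo second-order differencing one level at a time
restored_d1 = integrate(d2, d1[0])
restored_y = integrate(restored_d1, y[0])
print(restored_y == y)
```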

ARIMA models are written in the form ARIMA (p, d, q), where p is the order of the autoregressive component, d is the number of first differences used and q is the order of the moving average component. All the models previously investigated can be represented in this notation. For example, an autoregressive model of order three is ARIMA (3, 0, 0) and a moving average model of order two is ARIMA (0, 0, 2). An ARMA model that combines these two without differencing is ARIMA (3, 0, 2). A completely random (‘white noise’) model is simply ARIMA (0, 0, 0).

ARIMA models are best shown using backshift notation as the formulae quickly become

very large and complicated without it (Makridakis et al., 1998). The general ARIMA (p,

d, q) formula using backshift notation is as shown in Equation 2.55. The symbols used in

Equation 2.55 are as defined in sections 2.4.8 and 2.4.9 for AR and MA models.

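$$ \underbrace{(1 - \phi_1 B - \ldots - \phi_p B^p)}_{\text{AR}} \, \underbrace{(1 - B)^d}_{\text{I}} \, Y_t = c + \underbrace{(1 - \theta_1 B - \ldots - \theta_q B^q)}_{\text{MA}} \, \varepsilon_t \qquad (2.55) $$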

Notice that the backshift notation of Equation 2.55 allows for ease of distinction between

autoregressive (AR), integrated (I) and moving average (MA) model components. An

example model for ARIMA (1, 2, 1) is shown in Equation 2.56.

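$$ \underbrace{(1 - \phi_1 B)}_{\text{AR}} \, \underbrace{(1 - B)^2}_{\text{I}} \, Y_t = c + \underbrace{(1 - \theta_1 B)}_{\text{MA}} \, \varepsilon_t \qquad (2.56) $$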
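Multiplying out the backshift polynomials turns Equation 2.56 back into an ordinary regression-style equation. The Python sketch below treats each polynomial in B as a list of coefficients and multiplies the lists; the value $\phi_1 = 0.5$ is hypothetical. In general the left side of Equation 2.56 equals $(1 - (2 + \phi_1)B + (1 + 2\phi_1)B^2 - \phi_1 B^3)Y_t$. The model therefore reads $Y_t = (2 + \phi_1)Y_{t-1} - (1 + 2\phi_1)Y_{t-2} + \phi_1 Y_{t-3} + c + \varepsilon_t - \theta_1 \varepsilon_{t-1}$.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, c2, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def arima_121_ar_side(phi1):
    """Coefficients of (1 - phi1 B)(1 - B)^2, the Y_t side of Equation 2.56."""
    one_minus_b = [1.0, -1.0]
    return poly_mul([1.0, -phi1], poly_mul(one_minus_b, one_minus_b))


print(arima_121_ar_side(0.5))  # [1.0, -2.5, 2.0, -0.5] for hypothetical phi_1 = 0.5
```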

Seasonality has so far been left out of the ARIMA models presented. However, seasonality is often an important component to include in analysis. Seasonal components in ARIMA models are treated separately from other components because they can have their own behaviour. Consider the hypothetical modelling of monthly temperature over many years, with a season of twelve months. There may be non-stationarity over time, perhaps as a result of global warming. The temperature in one month may also be related to the temperature in the same month of the previous year. This hypothetical example shows in practical terms why seasonal components are important and have their own characteristics that need to be modelled.

Incorporating seasonality adds further complication to ARIMA models, and the resulting models are referred to as ARIMA (p,d,q)(P,D,Q)$_s$ models. The p, d and q components represent the non-seasonal autoregressive, integrated and moving average components respectively. P, D and Q represent the seasonal autoregressive, integrated and moving average components respectively. The length of the season is given by the parameter s. The general model in backshift notation is given in Equation 2.57 (Makridakis et al., 1998).

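$$ (1 - \phi_1 B - \ldots - \phi_p B^p)(1 - \Phi_1 B^s - \ldots - \Phi_P B^{Ps})(1 - B)^d (1 - B^s)^D Y_t = (1 - \theta_1 B - \ldots - \theta_q B^q)(1 - \Theta_1 B^s - \ldots - \Theta_Q B^{Qs}) \varepsilon_t \qquad (2.57) $$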

Where:

• $Y_t$ is the original time series observation at time t.

• Each $\phi_i$ is a parameter estimate for the non-seasonal autoregressive component.

• Each $\Phi_i$ is a parameter estimate for the seasonal autoregressive component.

• Each $\theta_i$ is a parameter estimate for the non-seasonal moving average component.

• Each $\Theta_i$ is a parameter estimate for the seasonal moving average component.

• $\varepsilon_t$ is the random error at time t.

All parameters used in ARIMA models are restricted in the values they can take, for example by the conditions that keep a model stationary and invertible. For more information on these issues, consult Chatfield (1980) or Makridakis et al. (1998).

The general approach for ARIMA modelling is as follows:

1. Plot the time series variable against time and inspect it; if the variance is not stationary over time, transform the series.

2. If the data is not stationary around the mean (ie. there is a trend of some sort), use differencing, going no further than second-order differencing.

3. Examine the autocorrelation functions of the remaining time series. Seasonal, autoregressive or moving average components (or a combination of these) may reveal themselves.

The goal of ARIMA modelling is to be left with a random (white noise) series as the error component. This is what should remain once all components have been identified and extracted. It is checked by reviewing the autocorrelation function of the residuals after the model has been fitted.
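Below is a minimal Python sketch of this residual check. It computes sample autocorrelations and flags lags that fall outside the rough ±1.96/√n bounds commonly used for a white noise series. These bounds are an approximation assumed here rather than taken from this thesis, and a formal portmanteau test is preferable in practice. The function names are hypothetical.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series such as model residuals."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def outside_white_noise_bounds(acf, n):
    """Lags (1-based) whose autocorrelation lies outside the rough +/- 1.96 / sqrt(n) bounds."""
    bound = 1.96 / math.sqrt(n)
    return [k + 1 for k, r in enumerate(acf) if abs(r) > bound]


residuals = [1, 2, 3, 4, 5]  # made-up values for illustration only
acf = sample_acf(residuals, 2)
print(acf, outside_white_noise_bounds(acf, len(residuals)))
```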


An alternative to ARIMA sometimes used for the analysis of univariate time series data is Winters' three parameter exponential smoothing (WES). It uses three smoothing operations to handle trend, seasonality and randomness, and is available in additive and multiplicative forms.

The remainder of this section focuses on supplying accurate input to time series tools and drawing valid conclusions from them. For this investigation, the tool of choice for calculations is the SAS statistical software package (SAS Institute, 1999), whose ‘arima’ procedure is dedicated to ARIMA models. Below is a brief overview of how ARIMA input is structured in SAS, and the following example applies SAS in practice. For full details of the functionality available in proc arima, consult a SAS/ETS user guide or the standard SAS Help (SAS Institute, 1999).

Within the SAS ‘arima’ procedure, two main statements are fundamental for ARIMA models: ‘identify’ and ‘estimate’. The ‘var’ option in the ‘identify’ statement specifies the time series to be analysed. It is also where any differencing (the integrated component of ARIMA models) is specified. For example, to read in a variable A that needs to be differenced twice (second-order differencing) for analysis, VAR=A(1,1) should be entered.

The ‘stationarity’ option in the ‘identify’ statement can be used to conduct Dickey-Fuller, Phillips-Perron and random walk tests. The form of the Dickey-Fuller test given in section 2.4.6 shows that the number of autoregressive (lagged difference) terms in the test is not fixed. SAS allows different numbers of these terms to be specified for the stationarity tests.

Seasonal and non-seasonal autoregressive and moving average components are specified using the ‘estimate’ statement: autoregressive lags with the ‘P’ option and moving average lags with the ‘Q’ option. Because seasonal and non-seasonal components are modelled separately, care must be taken to ensure that any analysis tool takes this into account. For example, to include moving average components at lags one and two and a seasonal moving average component at lag twelve, Q=(1,2)(12) should be entered. Since the goal of ARIMA modelling is to end up with purely random residuals, information on the form of the residuals after model fitting is valuable. Adding the keyword ‘plot’ to the ‘estimate’ statement produces autocorrelation plots of the errors (residuals) after the model is fitted.

Example 9: Rainfall ARIMA Model

In this example, an ARIMA model is developed for the monthly Buccan rainfall data from 1985 to 2001. The first task is to draw a time plot. The SAS Insight package was used to draw a scatter plot with time on the x-axis and rainfall on the y-axis, and the resulting graph is shown in Figure 2.27.

Figure 2.27: Time plot of rainfall over time from 1985 to 2001 inclusive.

From personal experience, and given the relatively short period covered (1985 to 2001), the mean and variance of the rainfall data are not expected to change over time. Dickey-Fuller tests were nonetheless conducted using SAS. Since the mean need not be zero, the relevant form of the test is the single mean augmented Dickey-Fuller test. SAS allows a number of different autoregressive orders to be used in the base regression model of the Dickey-Fuller tests. By default SAS tests autoregressive orders of 0, 1 and 2, but it was decided to test with orders of 0, 1, 3 and 5. Appendix A contains the SAS code used for these tests and Appendix B contains selected output.

For each single mean augmented Dickey-Fuller test, the null hypothesis of non-stationarity (a unit root) was confidently rejected, with all p values well under 0.01. The Buccan rainfall data can therefore be treated as stationary, and differencing is not required.
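The sketch below shows what the single mean augmented Dickey-Fuller test computes. It builds the regression $\Delta Y_t = \alpha + \rho Y_{t-1} + \sum_{j=1}^{k}\gamma_j\Delta Y_{t-j} + e_t$ and returns the t-type statistic for $\rho$. Here k plays the role of the autoregressive order (0, 1, 3 and 5 above). This is a plain least-squares sketch written for illustration, not SAS's implementation, and it does not compute p values. The statistic must be compared with Dickey-Fuller critical values, not Student t values. For the single mean case the large-sample 5% critical value is roughly -2.86. The helper names `ols` and `adf_single_mean` are hypothetical.

```python
import random


def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination.

    Returns (coefficients, residuals, inverse of X'X).
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Augmented matrix [X'X | I | X'y]; elimination turns it into [I | inv | beta]
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(k)] + [xty[i]]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    inverse = [row[k:2 * k] for row in aug]
    beta = [row[2 * k] for row in aug]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    return beta, resid, inverse


def adf_single_mean(y, k):
    """t-type statistic for rho in dY_t = a + rho*Y_{t-1} + sum_j g_j*dY_{t-j} + e_t."""
    X, target = [], []
    for t in range(k + 1, len(y)):
        lagged_diffs = [y[t - j] - y[t - j - 1] for j in range(1, k + 1)]
        X.append([1.0, y[t - 1]] + lagged_diffs)   # the intercept is the "single mean"
        target.append(y[t] - y[t - 1])
    beta, resid, inverse = ols(X, target)
    s2 = sum(e * e for e in resid) / (len(target) - len(beta))
    return beta[1] / (s2 * inverse[1][1]) ** 0.5


rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(400)]          # stationary series
walk = [0.0]
for _ in range(399):
    walk.append(walk[-1] + rng.gauss(0, 1))            # unit-root series
for k in (0, 1, 3, 5):
    print(k, round(adf_single_mean(noise, k), 2), round(adf_single_mean(walk, k), 2))
```

On simulated data, the stationary series gives statistics far below -2.86 at every order, while the random walk typically does not. Adding lagged differences (larger k) weakens the test somewhat, which is why several orders were tried.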

The autocorrelation, partial autocorrelation and inverse autocorrelation functions of the original time series have already been investigated in Example 3. That analysis suggested a twelve month seasonal pattern and a relationship between rainfall from one month to the next. The model to be tested therefore has a seasonal autoregressive component at lag twelve and a non-seasonal autoregressive component at lag one.

$$\text{ARIMA}(p,d,q)(P,D,Q)_s = \text{ARIMA}(1,0,0)(1,0,0)_{12}$$

$$(1 - \phi_1 B)(1 - \Phi_1 B^{12})Y_t = \varepsilon_t$$

Specifying this model in SAS was straightforward using the ‘estimate’ statement in proc arima. The exact ARIMA code is included in Appendix A under this example. The ‘estimate’ statement specified the non-seasonal and seasonal components as separate factors. As elsewhere in this thesis, ‘P’ refers to autoregressive components. Specifying ‘plot’ tells SAS to include autocorrelation plots of the random ‘error’ component of the model. These plots help judge the adequacy of the model created.

Because models are easy to specify with the ‘estimate’ statement, many different models could be tested and evaluated quickly. For comparison, several alternatives to the base model proposed from the initial analysis were tested. However, the initial model proposed above was found to be the most suitable. Suitability was judged on the significance of the model terms, the residual autocorrelation information and, to a small extent, personal judgement. The remainder of this example explains how the decision to accept the initial model was made.

The two autoregressive components in the initial model (lag one and seasonal lag twelve) were both found to be significant. With both p values below 0.005, there was little doubt that they were contributing to the model.

One indicator of the ‘best’ model is the smallest residual standard error. The initial model gave a standard error estimate of 80.44209, which none of the other models tested improved upon.

The autocorrelation, partial autocorrelation and inverse autocorrelation function plots of the residuals were carefully scrutinised. If the model is adequate, these plots should show no significant autocorrelation at any lag. The autocorrelation and partial autocorrelation functions of the residuals had a significant value at lag 16. This lag had little practical interpretation, and including it in the model did not improve performance, so it was dismissed as sporadic. The inverse autocorrelation function showed no lag anywhere near significance, including lag 16.
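Dismissing an isolated significant lag is reasonable. For white noise the usual significance bands are approximately $\pm 1.96/\sqrt{n}$, so about 5% of lags are expected to cross them by chance. Assume the full 204 months from 1985 to 2001 are available; the exact count is not stated here, so this is an assumption. The band is then about ±0.137, and inspecting, say, 24 lags would be expected to produce roughly one spurious exceedance. The Python sketch below computes sample autocorrelations and flags lags outside this approximate band. SAS's own plots use standard errors that can differ slightly from this simple bound. The names `acf` and `flag_lags` are hypothetical.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag, using the full-sample sum of squares."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]


def flag_lags(residuals, max_lag, z=1.96):
    """Lags whose autocorrelation lies outside the approximate +/- z/sqrt(n) band."""
    bound = z / math.sqrt(len(residuals))
    return [(k, r) for k, r in enumerate(acf(residuals, max_lag), start=1)
            if abs(r) > bound]


print(round(1.96 / math.sqrt(204), 3))   # 0.137, band for 204 monthly residuals
print(round(0.05 * 24, 1))               # 1.2 chance exceedances expected in 24 lags
```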

Having accepted the initial model, the parameter estimates can be retrieved from SAS. The overall mean rainfall estimate is 88.47 with a standard error of 8.91. The lag one parameter estimate is 0.19926 with a standard error of 0.069. The seasonal lag twelve parameter estimate is 0.23 with a standard error of 0.073. These parameters are shown below in the form of the original model.

$$(1 - 0.19926B)(1 - 0.2267B^{12})Y_t = \varepsilon_t$$
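Multiplying out the two factors shows which past months the fitted model actually uses. $Y_t$ depends on $Y_{t-1}$, $Y_{t-12}$ and, through the product term, $Y_{t-13}$. The Python sketch below writes this out as a one-step prediction. It assumes the mean enters as a deviation, $(1 - \phi_1B)(1 - \Phi_1B^{12})(Y_t - \mu) = \varepsilon_t$, which is how SAS proc arima parameterises a mean term. It uses the reported estimates: $\mu = 88.47$, $\phi_1 = 0.19926$ and $\Phi_1 = 0.2267$ (given rounded to 0.23 in the text). The function name `one_step_prediction` is hypothetical.

```python
MU = 88.47       # estimated mean rainfall
PHI = 0.19926    # non-seasonal lag-one AR estimate
SPHI = 0.2267    # seasonal lag-twelve AR estimate


def one_step_prediction(history, mu=MU, phi=PHI, sphi=SPHI):
    """Predict Y_t from past monthly values (history[-1] is Y_{t-1}).

    Expands (1 - phi B)(1 - sphi B^12)(Y_t - mu) = e_t and sets e_t to zero:
    Y_t = mu + phi*d1 + sphi*d12 - phi*sphi*d13, where dk = Y_{t-k} - mu.
    """
    if len(history) < 13:
        raise ValueError("need at least 13 past monthly values")

    def dev(k):
        return history[-k] - mu

    return mu + phi * dev(1) + sphi * dev(12) - phi * sphi * dev(13)


print(round(PHI * SPHI, 4))   # 0.0452, weight (with a minus sign) on the lag-13 deviation
```

For example, if last month was 10 mm above the mean and all other months were at the mean, the model predicts about 2 mm (0.19926 × 10) above the mean for the coming month.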

2.4.12 Forecasting

Although forecasting is a common application of time series models, it is not investigated in detail in this thesis because it is beyond the scope of the time available. Forecasting is the prediction of future values from previous data. There is a large body of literature on forecasting, as it is of keen interest in many fields (economics, biological systems, etc).

2.5 Multivariate Time Series Models

Most of the techniques looked at thus far have been concerned with one variable and are

hence referred to as univariate techniques. This may not realistically reflect the situation

which is to be modelled. Models may be required that include more than one time series.

A variable may be more accurately modelled and predicted given information from a

number of different time series.

There are two main classes of multiple variable (multivariate) time series analysis.

Naming for these different methodologies is inconsistent and therefore in this thesis a

standard naming convention is adopted. Multivariate ARIMA models use many time

series to predict only one time series while vector ARIMA models use multiple time

series to predict multiple time series. For example, in multivariate ARIMA a daily

temperature time series may be modelled from previous daily temperature, humidity, and

atmospheric pressure. In vector ARIMA, all of the variables (daily temperature, humidity and atmospheric pressure) may be modelled simultaneously from previous values of those same variables. Both multivariate ARIMA and vector ARIMA methodologies are extensions of univariate (single variable) ARIMA techniques.

2.5.1 Multivariate ARIMA Models

Multivariate ARIMA models allow for the prediction of one time series from a number of

time series (including itself). This is of use when we suspect that a time series may be

affecting the time series we are analysing and want to include this relationship in the

model. We will refer to these additional influential time series as explanatory variables.

For example, daily temperature may perhaps be predicted better from including the

explanatory variables atmospheric pressure, humidity and rainfall in analysis.

To be included in a multivariate model, an explanatory time series may affect the time

series being modelled, but not vice versa. That is, in our model of daily temperature,

humidity may affect the temperature but temperature is not allowed to affect humidity. If

there are relationships in both directions, a more general approach like vector ARIMA is

more appropriate (Makridakis et al., 1998).


Conceptually, multivariate ARIMA simply adds the effect of any number of explanatory variables ($X_1, X_2, \dots, X_n$) on top of a standard univariate ARIMA model. Equation 2.58 shows this concept: a function of each explanatory time series $X_i$ is added to a standard univariate ARIMA model for $Y$, called $N$.

$$Y_t = f_1(X_1) + f_2(X_2) + \dots + f_n(X_n) + N_t \qquad (2.58)$$

Each additional time series $X_i$ may enter on its own or with any number of lags, because the effect of a cause in $X$ on $Y$ may be delayed.

There is a general backshift notation form used for writing these multivariate models. The

one explanatory variable case is shown in Equation 2.59 (Makridakis et al., 1998).

Generalisations with more explanatory variables simply sum together the effects of

separate explanatory variables.

$$Y_t = c + v(B)X_t + N_t \qquad (2.59)$$

Where:

• $c$ is a constant.

• $Y$ is the time series being investigated.

• $X$ is an explanatory time series.

• $v(B)$ is the transfer function that calculates the effect of $X$ on $Y$. It is defined by $v(B) = v_0 + v_1B + v_2B^2 + \dots + v_kB^k$.

• $N_t$ is an ARIMA model for the time series $Y$.

Sometimes the transfer function $v(B)$ is represented by the ratio $\omega(B)/\delta(B)$, where $\omega(B) = \omega_0 - \omega_1B - \omega_2B^2 - \dots - \omega_sB^s$ and $\delta(B) = 1 - \delta_1B - \delta_2B^2 - \dots - \delta_rB^r$. This alternative formulation provides a more efficient parameterisation because it commonly requires fewer parameters to be estimated.
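Expanding $\omega(B)/\delta(B)$ into the weights $v_0, v_1, v_2, \dots$ shows why the ratio form is efficient. Matching coefficients in $\delta(B)v(B) = \omega(B)$ gives a simple recursion. The Python sketch below implements it using the sign conventions defined above. The function name `impulse_weights` is hypothetical. With only two parameters, $\omega_0 = 2$ and $\delta_1 = 0.5$, the transfer function produces an infinite, geometrically decaying set of weights (2, 1, 0.5, 0.25, ...).

```python
def impulse_weights(omega, delta, n_weights):
    """Weights v_0, v_1, ... of v(B) = omega(B) / delta(B).

    omega = [w0, w1, ..., ws] for omega(B) = w0 - w1 B - ... - ws B^s
    delta = [d1, ..., dr]     for delta(B) = 1 - d1 B - ... - dr B^r
    From delta(B) v(B) = omega(B):  v_j = c_j + sum_i d_i v_{j-i},
    where c_0 = w0, c_j = -w_j for 1 <= j <= s, and c_j = 0 beyond s.
    """
    v = []
    for j in range(n_weights):
        if j == 0:
            c = omega[0]
        elif j < len(omega):
            c = -omega[j]
        else:
            c = 0.0
        v.append(c + sum(d * v[j - i] for i, d in enumerate(delta, start=1) if j - i >= 0))
    return v


print(impulse_weights([2.0], [0.5], 5))   # [2.0, 1.0, 0.5, 0.25, 0.125]
```

Without a denominator (an empty `delta`), the weights are just the numerator coefficients. This is the direct $v(B)$ form, which needs one parameter per lag.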

There are two common techniques for fitting multivariate ARIMA models. The first involves prewhitening and cross correlation. The second, called ‘linear transfer function’ modelling, is a more modern and more precise methodology (Makridakis et al., 1998).

The older method was developed by Box and Jenkins in 1970. McCleary and Hay (1980) provide a number of examples of its use. The basic steps involved are as follows:

1. Each time series is made stationary. An ARIMA model is then fitted to the explanatory series and used to filter both series, so that the filtered explanatory series is white noise. This filtering is referred to as prewhitening.

2. The explanatory variables, and the lags at which they enter, are decided upon from the cross correlation functions of the prewhitened series (see section 2.2.4).

3. An ARIMA model is fitted to the time series of focus by applying standard univariate methods to the residuals from the previous step.
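The prewhitening idea can be sketched in a few lines of Python. For simplicity the sketch assumes the explanatory series X follows an AR(1) model, estimated from its lag-one sample autocorrelation. In practice, whatever ARIMA model fits X would be used as the filter. Both series are passed through the same filter $(1 - \hat\phi B)$, and the cross correlation function of the filtered series is then examined. The example uses simulated, hypothetical data with $Y_t = 2X_{t-3}$. The prewhitened cross correlations are close to zero except at lag 3. In contrast, the raw cross correlations at neighbouring lags are inflated by the autocorrelation in X. All function names are hypothetical.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def lag1_autocorr(x):
    """Lag-one sample autocorrelation (used here as a Yule-Walker AR(1) estimate)."""
    m = mean(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    return num / sum((v - m) ** 2 for v in x)


def ccf(a, b, k):
    """Sample cross correlation between a_t and b_{t+k} for k >= 0."""
    n = len(a)
    ma, mb = mean(a), mean(b)
    sa = math.sqrt(sum((v - ma) ** 2 for v in a) / n)
    sb = math.sqrt(sum((v - mb) ** 2 for v in b) / n)
    return sum((a[t] - ma) * (b[t + k] - mb) for t in range(n - k)) / (n * sa * sb)


def prewhiten(x, y):
    """Filter both series with (1 - phi B), where phi is fitted to x alone."""
    phi = lag1_autocorr(x)
    alpha = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
    beta = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
    return phi, alpha, beta


# Hypothetical data: X is AR(1) with coefficient 0.7 and Y_t = 2 X_{t-3}.
rng = random.Random(7)
x = [0.0]
for _ in range(399):
    x.append(0.7 * x[-1] + rng.gauss(0, 1))
y = [0.0, 0.0, 0.0] + [2.0 * v for v in x[:-3]]

phi, alpha, beta = prewhiten(x, y)
for k in range(6):
    print(k, round(ccf(x, y, k), 2), round(ccf(alpha, beta, k), 2))
```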

The more modern linear transfer function (LTF) involves a number of different steps but

is more precise. Makridakis et al. (1998) contains various pointers for achieving a

successful model. The summarised LTF method steps are as follows:

1. A regression model is fitted using lags of explanatory variables. The number of

lags used should be enough to accurately capture all potential relationships.

2. If the errors from the initial regression model are not stationary, stationarity of the time series is enforced (for example, by differencing).

3. Transfer functions are found to convert explanatory time series into effects on the

time series of focus.

4. A model is fitted using the transfer function and an ARMA model fitted to the

errors resulting from this fitting.

Once an entire model has been fitted, success is measured in the same way as univariate

ARIMA models. Autocorrelation plots of the model residual components should show no

significant lags. The mean square error and Akaike’s Information Criterion (AIC) if

available should be minimised.

A commonly seen multivariate ARIMA model is ARMAX. An ARMAX(p, q, r) model is a standard autoregressive moving average model with autoregressive (AR) order p and moving average (MA) order q, plus explanatory (X) variables of order r. ARMAX models result from looking at a single dependent time series within a vector ARMA model (Franses, 1998). They contain the effects of many independent variables but have only one dependent time series variable. In effect, ARMAX models are a version of the general multivariate models discussed in this section in which the explanatory variables have a fixed order r. Another common multivariate ARIMA form is the dynamic regression (DREG) model, which actually predates ARIMA (Stegiou et al., 1997).

The SAS statistical package (SAS Institute, 1999) contains functionality within proc ‘arima’ to fit these multivariate models. Within the ‘identify’ statement, the ‘crosscorr’ option allows explanatory variables to be specified along with their level of differencing. Then, within the ‘estimate’ statement, the ‘input’ option specifies which explanatory variables (and which of their lags) enter the model as additional parameters.

Example 10: Rainfall Multivariate ARIMA Model

In this example we extend the univariate ARIMA model (see Example 9) deduced earlier

to make a multivariate model. While we still wish to focus on monthly rainfall, we will

include the days of rain in each month as an explanatory variable.

Dickey-Fuller stationarity tests reveal that rainfall and days of rain are both stationary

and do not need transformation. A cross correlation function between rainfall and days of

rain (see Example 4) showed a strong correlation at time 0. This suggests using a model

where only the lag 0 of days of rain is included.

For the time being, the LTF (linear transfer function) method will be used, starting with a model that includes days of rain at lags 0 to 10. The model is shown below, and Appendix A contains the SAS code to run it. The important components to note are the ‘crosscorr’ option, which allows explanatory variables to be specified, and the ‘input’ option, which specifies the explanatory variable components to be included in the model.

$$Y_t = c + (v_0 + v_1B + v_2B^2 + \dots + v_{10}B^{10})X_t + \varepsilon_t$$

As expected, the results from this model showed only the first parameter (from time 0, $v_0$) to be anywhere near significant. In addition, the resulting autocorrelation plots revealed no significant lags. The SAS parameter estimates used to establish this are given in Appendix B.

Given that only the lag 0 correlation was significant, the model was rerun with only lag 0 of the explanatory variable included. The model used is shown below, and selected SAS output is given in Appendix B. The autocorrelation functions of the residuals for this model again showed no meaningful significant lags.

$$Y_t = c + v_0X_t + \varepsilon_t$$

Normally at this point an ARMA model would be fitted to the errors to represent the

influence of previous rainfall values and errors on the current model. However, since

residual autocorrelation plots showed no meaningful significant lags, this is unlikely to

provide a better model.

Purely for experimentation, a model was then run that includes separate autoregressive components at lag one and lag twelve (seasonal), both of which were significant in the univariate rainfall model. The relevant model is shown below and SAS input in Appendix B.

$$(1 - \phi_1B)(1 - \Phi_1B^{12})Y_t = c + v_0X_t + \varepsilon_t$$

As expected, the autoregressive components that were significant in the univariate model

were found not to be significant in this multivariate case. The effect of days of rain at lag

0 was found to be far more crucial than the lagged autoregressive terms.

The simpler model was found to have a higher standard error but performed better on the AIC. The differences were rather small in both cases, though. Accepting the basic model with only lag 0 of days of rain included, the numerical form of the model is then derived. The model form and its parameter estimates are shown below, with standard errors given beneath each estimate.

$$Y_t = c + v_0X_t + \varepsilon_t$$

$$\text{Rainfall}_t = \underset{(12.86902)}{-17.1837} + \underset{(1.221)}{11.073} \times \text{DaysRain}_t + \varepsilon_t$$
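With no ARMA terms, the accepted model is a simple regression of rainfall on days of rain. Its estimates should therefore essentially coincide with ordinary least squares, since SAS proc arima uses conditional least squares by default. The Python sketch below gives the closed-form least-squares estimates. It then plugs a hypothetical month with 10 rain days into the equation reported for this example, using a constant of -17.1837 and a days-of-rain coefficient of 11.073. The function name `fit_lag0_regression` is hypothetical.

```python
def fit_lag0_regression(x, y):
    """Least squares estimates (c, v0) for Y_t = c + v0 * X_t + e_t."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    v0 = sxy / sxx
    return my - v0 * mx, v0


# Reported estimates for this example (constant and days-of-rain coefficient).
c_hat, v0_hat = -17.1837, 11.073
rain_days = 10   # hypothetical month
print(round(c_hat + v0_hat * rain_days, 2))   # 93.55
```

Each additional rain day adds about 11 units of rainfall to the prediction. Because the constant is negative, the fitted equation gives implausible (negative) values for months with one rain day or fewer.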


A final note on this example is that it does not necessarily make much sense or meet the requirements of a multivariate time series model. While we know that rain days affect rainfall, is it also safe to assume that rainfall does not affect rain days? If rainfall does affect rain days, then another model (such as vector ARIMA) should be used. Furthermore, using lag 0 for rain days means that rainfall in a particular month is essentially being predicted from the number of rain days in that same month. The usefulness of such a model, in which previous values play no role whatsoever, is questionable.

2.5.2 Vector ARIMA Models

Vector ARIMA models are an extension of base ARIMA models where instead of

looking at an individual time series, many time series are looked at concurrently. This

view allows for investigation of complex interrelations between different time series,

common components and so forth (Pynnönen, 2001).

All time series to be included in vector ARIMA must first be made stationary so they can be modelled and compared (Franses, 1998). Because non-stationarity is removed before analysis, these methods are usually referred to in the literature as vector ARMA rather than vector ARIMA.

As is the case with standard ARIMA models, there are a few separate models often used

within the class of vector ARIMA. These are vector AR (autoregression), vector MA

(moving average) and vector ARMA (autoregressive moving average). Commonly the

‘vector’ part is shortened to ‘V’ and these models are referred to as VAR, VMA and

VARMA models.

The VARMA model can be represented using exactly the same form as the univariate ARMA model (Equation 2.60). The difference is that each component is a vector of m values (one per time series) and each coefficient is an m × m matrix. Equation 2.61 shows the expanded general form using vectors and matrices (Franses, 1998).

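$$\mathbf{Y}_t = \mathbf{c} + \boldsymbol{\phi}_1\mathbf{Y}_{t-1} + \dots + \boldsymbol{\phi}_p\mathbf{Y}_{t-p} + \boldsymbol{\theta}_1\boldsymbol{\varepsilon}_{t-1} + \dots + \boldsymbol{\theta}_q\boldsymbol{\varepsilon}_{t-q} + \boldsymbol{\varepsilon}_t \qquad (2.60)$$

$$
\begin{pmatrix} Y_{1,t} \\ Y_{2,t} \\ \vdots \\ Y_{m,t} \end{pmatrix}
=
\begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix}
+
\begin{pmatrix}
\phi_{11,1} & \phi_{12,1} & \cdots & \phi_{1m,1} \\
\phi_{21,1} & \phi_{22,1} & \cdots & \phi_{2m,1} \\
\vdots & \vdots & \ddots & \vdots \\
\phi_{m1,1} & \phi_{m2,1} & \cdots & \phi_{mm,1}
\end{pmatrix}
\begin{pmatrix} Y_{1,t-1} \\ Y_{2,t-1} \\ \vdots \\ Y_{m,t-1} \end{pmatrix}
+ \dots +
\begin{pmatrix}
\phi_{11,p} & \phi_{12,p} & \cdots & \phi_{1m,p} \\
\phi_{21,p} & \phi_{22,p} & \cdots & \phi_{2m,p} \\
\vdots & \vdots & \ddots & \vdots \\
\phi_{m1,p} & \phi_{m2,p} & \cdots & \phi_{mm,p}
\end{pmatrix}
\begin{pmatrix} Y_{1,t-p} \\ Y_{2,t-p} \\ \vdots \\ Y_{m,t-p} \end{pmatrix}
+
\begin{pmatrix}
\theta_{11,1} & \theta_{12,1} & \cdots & \theta_{1m,1} \\
\theta_{21,1} & \theta_{22,1} & \cdots & \theta_{2m,1} \\
\vdots & \vdots & \ddots & \vdots \\
\theta_{m1,1} & \theta_{m2,1} & \cdots & \theta_{mm,1}
\end{pmatrix}
\begin{pmatrix} \varepsilon_{1,t-1} \\ \varepsilon_{2,t-1} \\ \vdots \\ \varepsilon_{m,t-1} \end{pmatrix}
+ \dots +
\begin{pmatrix}
\theta_{11,q} & \theta_{12,q} & \cdots & \theta_{1m,q} \\
\theta_{21,q} & \theta_{22,q} & \cdots & \theta_{2m,q} \\
\vdots & \vdots & \ddots & \vdots \\
\theta_{m1,q} & \theta_{m2,q} & \cdots & \theta_{mm,q}
\end{pmatrix}
\begin{pmatrix} \varepsilon_{1,t-q} \\ \varepsilon_{2,t-q} \\ \vdots \\ \varepsilon_{m,t-q} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \vdots \\ \varepsilon_{m,t} \end{pmatrix}
\qquad (2.61)
$$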
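The Python sketch below evaluates one step of Equation 2.61 for given coefficient matrices, past values and errors, to make the matrix form concrete. The function names and argument layout are illustrative assumptions. Each row of the result is one of the m equations, so every series depends on the lagged values of all the others.

```python
def matvec(mat, vec):
    """Multiply an m x m matrix (list of rows) by a length-m vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def varma_step(c, phis, y_lags, thetas, e_lags, e_now):
    """Evaluate Y_t = c + sum_i Phi_i Y_{t-i} + sum_j Theta_j e_{t-j} + e_t.

    phis[i] is the matrix for lag i + 1 and y_lags[i] is Y_{t-i-1};
    thetas and e_lags follow the same layout for the moving average part.
    """
    out = list(c)
    for mat, vec in list(zip(phis, y_lags)) + list(zip(thetas, e_lags)):
        out = [o + v for o, v in zip(out, matvec(mat, vec))]
    return [o + e for o, e in zip(out, e_now)]


# Hypothetical two-variable VARMA(1,1): series 2 responds to lagged series 1.
c = [1.0, 0.0]
phi1 = [[0.5, 0.0],
        [0.1, 0.2]]
theta1 = [[0.3, 0.0],
          [0.0, 0.3]]
print(varma_step(c, [phi1], [[2.0, 4.0]], [theta1], [[1.0, -1.0]], [0.0, 0.0]))
```

A zero off-diagonal element, such as $\phi_{12,1}$ here, means that series 1 does not respond to lagged series 2. Restricting such elements to zero is one way of reducing the parameter count discussed below.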

A drawback of vector ARIMA models is the large number of parameters to be estimated. Estimating too many parameters from too little data is known as overfitting, and continuing when overfitting is evident can give grossly inaccurate and misleading results. The number of parameters in a vector ARMA model is given in Equation 2.62, where m is the number of variables, p the order of the autoregressive component and q the order of the moving average component (Franses, 1998).

$$m + m^2(p + q) \qquad (2.62)$$

The number of parameters involved is the key reason these models are not investigated further in this dissertation. A time series needs to be suitably long to estimate this many parameters. The application data for this thesis do not come anywhere near the length necessary for vector ARIMA given the number of parameters.
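Equation 2.62 grows quickly with m. The short Python sketch below tabulates it for a few hypothetical combinations. It counts only the intercepts and coefficient matrices. The m(m + 1)/2 distinct elements of the error covariance matrix must also be estimated and are shown separately. For example, five variables with an autoregressive order of four already require 5 + 25 × 4 = 105 parameters in the mean equations. The function names are hypothetical.

```python
def varma_param_count(m, p, q):
    """Equation 2.62: m intercepts plus one m x m matrix per AR and MA lag."""
    return m + m * m * (p + q)


def covariance_param_count(m):
    """Distinct elements of the m x m error covariance matrix (not in Eq. 2.62)."""
    return m * (m + 1) // 2


for m, p, q in [(2, 1, 0), (3, 2, 1), (5, 4, 0)]:
    print(m, p, q, varma_param_count(m, p, q), covariance_param_count(m))
```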

Whereas multivariate ARIMA models allow for the inclusion of additional explanatory

variables in ARIMA, multivariate dynamic regression models allow for additional

explanatory variables in vector ARIMA. Multivariate dynamic regression models allow

for improved modelling of a number of dependent variables in vector ARIMA by

incorporating additional variables purely for explanatory purposes.

For the vector ARIMA application, autocorrelation and autocovariance are generalised to autocorrelation and autocovariance matrices. The residuals are assumed to follow a multivariate normal distribution rather than a univariate normal distribution. Judging which elements of these matrices are significant helps decide an appropriate order for the autoregressive components. Provided there is sufficient data, vector ARIMA can be used for a number of different purposes. It is particularly useful for modelling and evaluating complex interrelations between multiple time series.
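A minimal Python sketch of the sample cross-correlation matrix at lag k is given below. Element (i, j) correlates $Y_{i,t+k}$ with $Y_{j,t}$. Texts differ on this ordering, and some transpose it, so the convention here is an assumption. At lag 0 the diagonal is 1 and the off-diagonal elements are the ordinary correlations between series. The function name is hypothetical.

```python
import math


def cross_correlation_matrix(series, k):
    """R(k)[i][j] = sample correlation between Y_{i,t+k} and Y_{j,t}.

    `series` is a list of m equally long lists; n is used in every denominator.
    """
    n = len(series[0])
    means = [sum(s) / n for s in series]
    dev = [[v - mu for v in s] for s, mu in zip(series, means)]
    sd = [math.sqrt(sum(d * d for d in row) / n) for row in dev]
    m = len(series)
    return [[sum(dev[i][t + k] * dev[j][t] for t in range(n - k)) / (n * sd[i] * sd[j])
             for j in range(m)] for i in range(m)]


data = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
print(cross_correlation_matrix(data, 0))
```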

Vector ARIMA is beyond the capabilities of the SAS proc ‘arima’ (SAS Institute, 1999). Instead, SAS offers proc ‘statespace’, which can evaluate these models. Once again, full investigation is considered beyond the scope of this dissertation.

3 THEORY LITERATURE REVIEW

The purpose of this theoretical literature review is to sample progress in the multivariate

time series field. Many situations involve the simultaneous recording of more than one

time series. There is considerable variety in the types of data recorded and purposes for

analysis. Therefore, multivariate time series analyses have been the subject of a

significant amount of theoretical development in recent years.

For the purpose of review, the focus was on recent articles, and all articles finally reviewed were published in the last eight years. Much of the multiple time series literature found from these years was tightly linked to a particular field, mainly because the type of data analysis was only of ‘value’ to that field. Different directions of data analysis are a natural result of the different types of data involved and the different information sought from analysis.

Articles on multivariate time series were rarely found in common statistics journals, and no journal was found that specialises in multivariate time series.

The reason for this is that there is an interest in extracting meaning from naturally

occurring phenomena in a wide range of situations from diverse fields. Each field

develops multivariate time series techniques to suit their particular purpose and

contributes towards the overall wealth of knowledge. For this reason the theoretical

research contained in this review covers a diverse range of different research areas, with

no particular journals having dominance. The journals found with multivariate time series

literature included Computational Statistics and Data Analysis, Artificial Intelligence in

Medicine, Neurocomputing, Physica D, International Journal of Modern Physics,

Reliability Engineering and System Safety and Aquatic Living Resources. The different

fields that arose from a sample of relevant articles found are shown in Table 3.1.


Fields of research (columns): Chemistry, Computing, Forestry, Medicine, Pattern Recognition, Physics, Statistics.

Articles (rows): Akman and De Gooijer 1996; Cao et al. 1998; Crucianu et al. 2001; Guimarães et al. 2001; Kulkarni and Parikh 2000; Lu et al. 2001; Maharaj 1999; Nemec 1995; Ørstavik et al. 2000; Paluš 1996; Pynnönen 2001; Reick and Page 2000; Repucci et al. 2001; Swift and Liu 2002; Swift et al. 2001; Wilson et al. 2001.

Table 3.1: The fields of research involved in recent theoretical articles.

Table 3.2 provides an overview of the base theoretical aspects found in recent articles. Advanced time series textbooks commonly contain information on these base theories (eg. ARIMA, vector ARIMA). For the most part, the recent articles found build on particular aspects of this base theory. Where an article addresses more than one base concept, the concepts are usually considered in isolation and only occasionally combined. While there may be different base concepts, they are linked by the data they analyse, so combinations of methods are always a possibility for the future. The main areas of base theoretical concepts found in literature from the last eight years are:

• ARIMA – Multiple independent variable forms of autoregressive integrated

moving average (ARIMA) models.


• Vector ARIMA – Vector based versions of ARIMA, supporting multiple

dependent and independent variables. Includes vector AR (VAR) and vector

ARMA (VARMA) among others.

• Bayesian – A more recent statistical paradigm allowing for the inclusion of prior

information and additional experimental flexibility.

• Component extraction – Finding common components (properties) within a number of time series.

• Grouping variables – Placing a number of time series into groups depending on

similarities.

• Patterns – The searching for patterns in a number of time series. Analysing

patterns may be used for diagnosis using artificial intelligence, for grouping

together time series and so forth.

• State-Space Modelling – A multiple time series analysis technique related to

vector ARIMA.

• Nonlinear Methods – Analysis for when standard assumptions of linear

relationships are invalid.

The remainder of this chapter provides an overview of recent multivariate time series

developments in a range of areas. The chapter is arranged into the following broad

categories – ARIMA developments, ARIMA alternatives, Bayesian developments,

nonlinear developments and miscellaneous developments.


Techniques involved (columns): ARIMA, Vector ARMA, Bayesian, Component Extraction, Grouping Variables, Patterns, State-Space Modelling, Nonlinear Methods.

Articles (rows): Akman and De Gooijer 1996; Cao et al. 1998; Crucianu et al. 2001; Guimarães et al. 2001; Kulkarni and Parikh 2000; Lu et al. 2001; Maharaj 1999; Nemec 1995; Ørstavik et al. 2000; Paluš 1996; Pynnönen 2001; Reick and Page 2000; Repucci et al. 2001; Swift and Liu 2002; Swift et al. 2001; Wilson et al. 2001.

Table 3.2: Aspects looked at by researchers in recent theoretical articles.

3.1 AR/ARMA/ARIMA Developments

ARIMA models and variants of them are common in standard texts on time series

analysis. It is therefore of little surprise that multiple variable forms of these models are

the subject of ongoing research. ARIMA based models are applicable to multiple time

series where there are fixed time intervals, linear relationships and time series that are not

short.

Time series and repeated measures options available and currently in use are reviewed by

Nemec (1995) without introducing new material. The purpose of Nemec’s review was to

provide an understandable set of standard techniques for use in analysing time series


situations that arise in forestry. Nemec (1995) looks at the base theory of time series

including correlation functions, ARIMA models and forecasting, before looking at the

functionality provided by the SAS statistical package for analysis.

Pynnönen (2001) provides a comprehensive set of lecture notes that cover, and advance upon, topics covered by Nemec (1995). Comparison with simpler material, such as Chatfield (1980), suggests the lecture notes are mathematically accurate. The purpose of Pynnönen's work was to present a set of ARIMA based methods and techniques suitable for use on economic multiple time series. Pynnönen (2001) first reviews base univariate AR, MA, ARMA and ARIMA models. The remainder of the notes gives a detailed review of concepts in vector AR and vector ARMA multiple time series models. Section 2.5.2 contains more information on vector ARIMA based methods.

Cointegration is a commonly used concept in recent literature, with particular relevance in finance for the long term modelling of economic time series (see Felmingham et al., 2000). A time series that must be differenced once to become stationary before ARIMA modelling is referred to as I(1), a notation taken from the ‘I’ for ‘integrated’ in ARIMA models. Two time series are cointegrated if they are both I(1) and some linear combination of the two is stationary. The existence of cointegration suggests a relationship between the two time series, and tests are available to assess the significance of cointegration relationships. An example of the use of a cointegration test can be found in Felmingham et al. (2000). Formally, the time series X and Y are cointegrated if they are both I(1) and there exists an a (a ≠ 0) such that the difference between Y and a × X is a stationary time series (Equation 3.1). Introductory cointegration information can be found in a number of financial application papers, including Diamandis et al. (2000), Felmingham et al. (2000), and Green and Sparks (1999).

$$Y_t - aX_t \sim I(0) \qquad (3.1)$$
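The definition in Equation 3.1 suggests a simple two-step check, known as the Engle-Granger approach. First, estimate a by regressing Y on X. Then ask whether the residuals look stationary. The Python sketch below performs only the first step, on simulated, hypothetical data. It includes an intercept in the regression, which Equation 3.1 omits. It then compares the lag-one autocorrelation of the residuals with that of X. A formal second step would apply a unit-root test to the residuals, using critical values adjusted for the estimated a rather than the ordinary Dickey-Fuller values. The function names are hypothetical.

```python
import random


def ols_line(x, y):
    """Intercept and slope from the least squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def lag1_autocorr(v):
    """Lag-one sample autocorrelation."""
    m = sum(v) / len(v)
    num = sum((v[t] - m) * (v[t - 1] - m) for t in range(1, len(v)))
    return num / sum((a - m) ** 2 for a in v)


rng = random.Random(3)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0, 1))                # X is I(1): a random walk
y = [2.0 * xi + rng.gauss(0, 0.5) for xi in x]       # Y shares X's stochastic trend

c_hat, a_hat = ols_line(x, y)
resid = [yi - c_hat - a_hat * xi for xi, yi in zip(x, y)]
print("estimated a:", round(a_hat, 3))
print("lag-1 autocorrelation of X:", round(lag1_autocorr(x), 3))
print("lag-1 autocorrelation of residuals:", round(lag1_autocorr(resid), 3))
```

X on its own is highly persistent, with a lag-one autocorrelation near 1. The combination Y - aX behaves like noise, which is the pattern that cointegration describes.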

When a number of time series are available on fundamentally the same quantity, parallel

time series often arise. Akman and De Gooijer (1996) devise a method for finding

common components in parallel time series. Finding the common components in multiple

time series can give interesting information about underlying patterns. In effect these


methods provide an additional tool for the types of multiple time series commonly found

in text books and literature.

Common autoregressive (AR) and moving average (MA) components are searched for in

parallel time series using what Akman and De Gooijer (1996) call ‘component extraction

analysis’. Each time series was first assumed correlated with every other time series and

represented by a (stationary) ARMA model of order (p, q). The multiple time series are

then used to find common components. The component extraction method was applied to

each original univariate time series to result in a univariate ‘common component’ time

series. Standard univariate time series analysis techniques are then advocated for the

modelling and forecasting of these resulting series. Simulation using varying sample sizes

and fundamental models was used to test the performance of Akman and De Gooijer’s

(1996) proposed techniques. A medical example of male and female death rates from

certain diseases provides a demonstration of a typical practical usage. Although

mathematically intensive and not intuitive theoretically, the method performs well and

should be useful in the practical sense.

When a number of multiple variable time series as seen in Akman and De Gooijer (1996)

are available, these may need to be placed in groups. Maharaj (1999) developed a method

to compare a number of multiple time series and class them into groups. The input for

analysis was a number of multiple time series while the output was a number of groups

each containing multiple time series. The similarity of multiple time series for group

classification was decided by similarity of model parameter estimates. The purpose of

such analysis was to investigate patterns present across different multiple time series

groups.

The multiple variable time series models Maharaj (1999) used in analysis were vector

ARMA (VARMA) models. Each model was first converted to an infinite order vector AR

(VAR) model. Each infinite order VAR model was then truncated and compared to every

other model using provided formulae. The purpose of this was to ascertain how similar

each multiple time series was to every other multiple time series. Every combination of

two modified VAR models was compared using a hypothesis test. The null hypothesis of

this test was that there is no significant difference between the two multiple time series

models. The result of comparing all models was a matrix of p values. A clustering


algorithm was proposed to form groups of multiple time series from these p values. These

groups highlight which multiple time series are the most similar to each other.

The grouping algorithm proposed by Maharaj (1999) was tested for performance using simulation. Bivariate (two variable) vector ARMA models of lengths 50 and 200 were compared. Larger sample sizes (200 or more) were recommended for best results, particularly as the number of time series in each multiple time series model increases. The techniques make clear logical sense despite being moderately intense mathematically, and they are a useful addition to multiple time series analysis techniques.

Some recent developments in the modelling of multiple time series are presented by Wilson et al. (2001). The focus of the article was on advances in vector based models (as seen in Akman and De Gooijer, 1996). These authors looked first at a more efficient use of parameters in vector autoregressive (VAR) models. Standard VAR models require a large number of parameters (see Franses, 1998), which places restrictive demands on the minimum length of the time series for analysis. The goal of the modified model was therefore to represent the structure well with as few parameters as possible. The technique presented shows variable relationships using a directed acyclic graph (DAG), in which particular lags of each time series are nodes and arrows linking nodes represent causal dependence. A procedure provided by the authors converts the DAG to an undirected graph called a conditional independence graph (CIG). The CIG indicates which parameters are to be estimated and which are to be left out of the model. The final result of this process is a VAR model with fewer than the standard number of parameters. This model performs better on standard model evaluation measures such as Akaike’s Information Criterion (AIC).

An extension of VAR models, also proposed by Wilson et al. (2001), represents moving average (MA) components using a single smoothing coefficient θ. When θ = 0 the model simplifies to standard VAR. These extended VAR models were referred to as VZAR models, after the univariate ZAR label. The univariate equivalents are not always called ZAR models and can be found in many textbooks, including Makridakis et al. (1998). By including MA components, VZAR models provide an alternative to VARMA models in which fewer parameters are used to represent the MA components. The proposed smoothing works by applying powers of an operator W, shown in Equation 3.2, where B is the backshift operator. The advantage of such a scheme, as in the univariate equivalent, is that only one parameter (θ) needs to be estimated rather than different parameters for each time lag. Some questions remain unanswered with ZAR models, including how to choose the smoothing coefficient θ appropriately.

$$W = \frac{B - \theta}{1 - \theta B} \qquad (3.2)$$
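Rearranging Equation 3.2 as (1 − θB)y = (B − θ)x gives a simple recursion, y_t = θy_{t−1} + x_{t−1} − θx_t, so the operator can be applied to a series without any matrix algebra. The Python sketch below assumes zero pre-sample values; the function names are hypothetical and this is not the estimation procedure of Wilson et al. (2001). With θ = 0 it reduces to the ordinary backshift, matching the statement that the model then simplifies to standard VAR.

```python
def apply_w(x, theta):
    """Apply W = (B - theta) / (1 - theta B) once to the series x.

    Uses y_t = theta * y_{t-1} + x_{t-1} - theta * x_t, with the
    pre-sample values x_{-1} and y_{-1} assumed to be zero.
    """
    y = []
    x_prev, y_prev = 0.0, 0.0
    for x_t in x:
        y_t = theta * y_prev + x_prev - theta * x_t
        y.append(y_t)
        x_prev, y_prev = x_t, y_t
    return y


def w_powers(x, theta, k):
    """Return [W^1 x, ..., W^k x], used in place of the plain lags B^1 x, ..., B^k x."""
    out, current = [], list(x)
    for _ in range(k):
        current = apply_w(current, theta)
        out.append(current)
    return out
```

For example, `apply_w([1.0, 2.0, 3.0], 0.0)` simply shifts the series by one step, giving `[0.0, 1.0, 2.0]`.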

Economic applications are used by Wilson et al. (2001) to test their theoretical models. Combining methods such as the two they investigated is a clear direction for future work. Both advancements are useful contributions to the time series field, with the second model particularly promising given the known usefulness and popularity of its univariate equivalent.

3.2 ARIMA Alternative Developments

At times ARIMA and its multiple variable equivalents are unsuitable for various reasons. The two articles covered in this section provide alternative methods of analysis for data sets similar to those handled by the ARIMA based methods in section 3.1.

A well known restriction of multivariate time series techniques using vector ARMA is the large number of parameters involved. Without a sufficiently long time series, it may not be possible to estimate all parameters accurately; fitting more parameters than the data can support is known as overfitting. To avoid this, software packages often impose a minimum on T, the length of the time series. Equation 3.3 shows such a restriction for a VAR model in the SPSS software package, where n is the number of variables and p the order of the vector autoregressive component (Swift and Liu, 2002). For example, to model five time series jointly using three lags (the autoregressive component), the original time series must be longer than 20 time units.

$$T > n(p + 1) \qquad (3.3)$$
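The arithmetic behind the parameter burden and Equation 3.3 can be checked with a few lines of Python. The sketch assumes that each of the p lag matrices of a VAR(p) model has n × n free coefficients and that one intercept is estimated per equation; the noise covariance matrix is not counted. The function names are hypothetical.

```python
def var_parameter_count(n, p, intercept=True):
    """Free coefficients in a VAR(p) model with n variables:
    n*n per lag matrix, plus one intercept per equation if requested."""
    return n * n * p + (n if intercept else 0)


def satisfies_length_rule(T, n, p):
    """Equation 3.3: the series length T must exceed n(p + 1)."""
    return T > n * (p + 1)


def minimum_length(n, p):
    """Smallest integer series length allowed by Equation 3.3."""
    return n * (p + 1) + 1
```

For the example in the text (n = 5, p = 3), `minimum_length(5, 3)` gives 21, while the model itself has 5 × 5 × 3 + 5 = 80 coefficients, which illustrates why short series quickly become a problem.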

Swift and Liu (2002) report an unconventional way to successfully model shorter multivariate time series, which they call the vector autoregressive genetic algorithm (VARGA) method. Based on genetic algorithms, the method may reduce the necessary time series length to as low as p + 1, where p is the order of the autoregressive component.

As with all genetic algorithms, Swift and Liu’s VARGA (2002) is an iterative learning process in which candidate VAR coefficient matrices are mixed and matched (crossover and mutation) over successive ‘generations’ to find the solution that performs best according to a fixed ‘fitness’ function (Swift and Liu, 2002). Experiments with a short multiple variable time series revealed good performance from VARGA compared with the traditional VAR method.
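The following Python sketch shows a generic genetic algorithm in the spirit of this description; it is not Swift and Liu’s VARGA. The assumptions are labelled in the code: a VAR(1) model, fitness taken as the sum of squared one-step-ahead errors, truncation selection that keeps the fitter half of the population, uniform crossover of matrix entries and Gaussian mutation. All function names, settings and the simulated data are hypothetical.

```python
import numpy as np


def one_step_sse(A, X):
    """Sum of squared one-step-ahead errors of the VAR(1) model x_t = A x_{t-1}."""
    pred = X[:-1] @ A.T
    return float(np.sum((X[1:] - pred) ** 2))


def ga_fit_var1(X, pop_size=30, generations=200, mut_sd=0.1, seed=0):
    """Evolve VAR(1) coefficient matrices; a lower one-step SSE means fitter (assumed fitness)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = [rng.normal(0.0, 0.5, (n, n)) for _ in range(pop_size - 1)]
    pop.append(np.zeros((n, n)))  # baseline 'no dynamics' candidate
    for _ in range(generations):
        pop.sort(key=lambda A: one_step_sse(A, X))
        parents = pop[: pop_size // 2]  # fitter half survives unchanged (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            mask = rng.random((n, n)) < 0.5  # uniform crossover of matrix entries
            child = np.where(mask, parents[i], parents[j])
            child = child + rng.normal(0.0, mut_sd, (n, n))  # Gaussian mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda A: one_step_sse(A, X))


# Usage on a short simulated three-variable series with 12 time points.
rng = np.random.default_rng(1)
A_true = np.array([[0.5, 0.2, 0.0], [0.0, 0.4, 0.1], [0.1, 0.0, 0.3]])
X = np.zeros((12, 3))
for t in range(1, 12):
    X[t] = A_true @ X[t - 1] + rng.normal(size=3)
A_hat = ga_fit_var1(X)
```

Because the fitter half is always kept, the best fitness never gets worse from one generation to the next; in practice the fitness function would usually also penalise model complexity.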

Clearly presented and easy to follow, the techniques show what can be achieved by combining statistics and genetic algorithms. VARGA presents a fascinating alternative to conventional time series modelling techniques.

Akman and De Gooijer (1996) presented a technique to classify a number of multiple time series into groups. A simpler problem that is also relevant to multiple time series is to place a number of univariate time series (rather than multiple time series) into groups. This feature is useful but not often covered in standard time series literature. Swift et al. (2001) develop an unconventional technique for splitting a number of time series into groups where within-group dependencies are high and between-group dependencies are low. Swift et al. (2001) apply their grouping techniques to data from a chemical process in an oil refinery and to a medical data set on glaucoma eye conditions. The techniques allow multiple time series, even relatively short ones, to be split into appropriate groups.

The grouping algorithm of Swift et al. (2001) has two main phases. The first collects all statistically significant correlations between variables, including lagged variables. The result is a list, referred to as Q, of triples composed of the two correlated variables and the time lag at which they are correlated. The second phase uses a grouping algorithm based on Q to form groups from the original multiple time series.

Swift et al. (2001) make use of evolutionary programming (EP), similar in nature to genetic algorithms (GA), to find variables that are correlated at particular lags. The grouping is then done using a genetic algorithm variant called a grouping genetic algorithm (GGA).
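The second phase can be illustrated with a much simpler stand-in for the GGA. The Python sketch below assumes that Q has already been found as a list of (i, j, lag) triples and places any two series linked by a triple into the same group, that is, it returns the connected components of the dependency graph using a union-find structure. Unlike the GGA of Swift et al. (2001), it does not balance within-group against between-group dependency, so it can produce very large groups; the function name is hypothetical.

```python
def group_from_triples(n, Q):
    """Group n series so that series linked by any (i, j, lag) triple in Q
    share a group (connected components via union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j, _lag in Q:  # the lag is not needed for this simple rule
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With six series and Q = [(0, 1, 2), (1, 2, 1), (3, 4, 5)], series 0, 1 and 2 form one group, 3 and 4 another, and series 5 is left on its own.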

The application of EP and GA to multivariate time series is fascinating and deserves further research. Although the potential is clear, methods involving EP and GA are beyond the scope of this thesis.

3.3 Bayesian Developments

Bayesian statistics provides an entirely different approach to statistical problems, with a degree of flexibility not found in conventional methods. Bayesian statistics allows previous information to be included, allows an experiment to be stopped or continued at any point, and offers other features that may be considered advantages. A prior distribution is assumed for the unknown parameters before analysis, and this is combined with the distribution of the observed data (the likelihood) to form a posterior distribution (Berry and Stangl, 1996).
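The way a prior and data combine into a posterior can be seen in the simplest conjugate case, sketched below in Python. The assumptions are normal observations with a known noise variance and a normal prior on the unknown mean. This is far simpler than the mixtures used by Crucianu et al. (2001) and the function name is hypothetical.

```python
def normal_posterior(prior_mean, prior_var, data, noise_var):
    """Posterior of an unknown mean mu, with data x_i ~ N(mu, noise_var)
    and prior mu ~ N(prior_mean, prior_var). Returns (mean, variance)."""
    n = len(data)
    post_precision = 1.0 / prior_var + n / noise_var  # precisions add
    post_mean = (prior_mean / prior_var + sum(data) / noise_var) / post_precision
    return post_mean, 1.0 / post_precision
```

With a prior N(0, 1) and four observations equal to 1 (noise variance 1), the posterior is N(0.8, 0.2): the estimate is pulled most of the way towards the data and is more certain than the prior.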

Crucianu et al. (2001) investigate the application of Bayesian techniques to the multivariate time series situation. The focus was on the modelling of time dependent nonlinear processes. Extensions are proposed to address existing problems in Bayesian multivariate time series analysis. The purpose of this and similar Bayesian work is to provide a valid alternative method of analysis that may be more appropriate than conventional techniques under certain conditions.

The bulk of the Crucianu et al. (2001) article concerns structuring the prior distribution and combining it successfully with multiple variable time series data to form a posterior distribution. In particular, a general method was proposed for translating prior knowledge into a prior distribution. The distributions presented are complex mixtures of predominantly Gaussian, gamma and Dirac distributions. Recurrent Neural Networks (RNNs) were used as a tool for estimating the model parameters.

Bayesian statistics is a relatively new area of applied statistics, and its application to multiple variable time series is in the early phases. Therefore, Bayesian techniques are not discussed further in this thesis.

3.4 Nonlinear Developments

Linear relationships form the fundamental basis of most conventional time series analysis. Many univariate and multivariate time series analysis techniques (for example, cross correlation) assume linear relationships. Often relationships are not linear, and therefore methods and techniques have been developed to deal with nonlinear situations. This section looks at the most recent developments in the area of nonlinear time series analysis.

There are a number of tests for nonlinearity available for univariate time series but they

are not as common in the multiple time series context. Paluš (1996) developed a test of

nonlinearity for the multiple time series situation. These techniques are useful for

deciding whether linear or nonlinear models are appropriate.

In the method devised by Paluš (1996), surrogate (substitute) data were generated using a Fourier-transform based algorithm. Nonlinear statistics are computed from both the surrogate and the actual data sets. If there was a significant difference between the statistics for the surrogate data and the actual data, it was concluded that the actual data were not generated by a linear process. Techniques and tests based on ‘linear redundancies’ were investigated to provide additional information about variable relationships and to avoid spurious results from imperfect surrogate data. The techniques were tested using data generated from a two variable (bivariate) autoregressive model and from the (nonlinear) Lorenz equations. Brief applications were shown for meteorological and physiological data. Although moderately complex mathematically, the methods are reliable and informative.
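The Python sketch below shows the general surrogate data idea using the common Fourier-transform phase randomisation scheme. Adding the same random phase to every channel at each frequency keeps the amplitude spectra and cross-spectra (the linear structure) while destroying nonlinear structure. The test statistic used here, a time-reversal asymmetry measure, and the rank-based p value are assumptions chosen for brevity; Paluš (1996) used redundancy-based statistics instead. All function names are hypothetical.

```python
import numpy as np


def multivariate_surrogate(X, rng):
    """Phase-randomised surrogate of a (T, m) array X, with the same random
    phase added to every channel at each frequency."""
    T = X.shape[0]
    F = np.fft.rfft(X, axis=0)
    phases = rng.uniform(0.0, 2.0 * np.pi, F.shape[0])
    phases[0] = 0.0            # keep the zero-frequency (mean) term real
    if T % 2 == 0:
        phases[-1] = 0.0       # keep the Nyquist term real for even T
    return np.fft.irfft(F * np.exp(1j * phases)[:, None], n=T, axis=0)


def reversal_asymmetry(x):
    """Time-reversal asymmetry statistic; close to zero for linear Gaussian series."""
    d = np.diff(x)
    return float(np.mean(d ** 3))


def surrogate_test(X, channel=0, n_surr=99, seed=0):
    """Rank-based p value: how often a surrogate is at least as extreme as the data."""
    rng = np.random.default_rng(seed)
    s0 = abs(reversal_asymmetry(X[:, channel]))
    s = [abs(reversal_asymmetry(multivariate_surrogate(X, rng)[:, channel]))
         for _ in range(n_surr)]
    return (1 + sum(v >= s0 for v in s)) / (n_surr + 1)
```

A small p value suggests that the observed series is unlikely to have come from a linear Gaussian process with the same spectra.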

Although univariate models capturing nonlinear behaviour (dynamics) are common, multiple variable forms are less well known. Cao et al. (1998) investigate the modelling and forecasting of multiple time series when the time series have nonlinear dynamics. The purpose of developing multiple variable models was to reveal relationships and information that may not be found or used in univariate analysis.

Cao et al. (1998) first consider each time series to see if it is relevant for analysis. In particular, if a variable can be exactly predicted from other variables, it is clearly not necessary. A process of ‘embedding’ was proposed to decide how many lags, and which time delays, to include in modelling the multiple time series. Unlike ARIMA based models, where fixed time intervals are required, different time intervals are allowed. The focus was on finding the embedding dimension (number of lags) that gives optimum prediction of the value at the next time point. The new multivariate time series techniques were found to be insensitive to the number of data points.
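The construction of multivariate delay vectors can be sketched in Python as follows. Each variable i contributes d_i values spaced τ_i steps apart, so different variables may use different numbers of lags and different delays. In this sketch the dimensions and delays are simply supplied by hand; the criterion Cao et al. (1998) propose for choosing them is not implemented, and the function name is hypothetical.

```python
def multivariate_embedding(series, dims, delays):
    """Build delay vectors from several equally sampled series.

    series : list of lists, one per variable, all of the same length
    dims   : embedding dimension d_i (number of lags) for each variable
    delays : time delay tau_i for each variable
    Returns (times, vectors), where vectors[k] is the delay vector at times[k].
    """
    length = len(series[0])
    # earliest time at which every variable has all of its lagged values
    start = max((d - 1) * tau for d, tau in zip(dims, delays))
    times, vectors = [], []
    for t in range(start, length):
        vec = []
        for x, d, tau in zip(series, dims, delays):
            vec.extend(x[t - k * tau] for k in range(d))
        times.append(t)
        vectors.append(vec)
    return times, vectors
```

For example, with two lags at delay 2 for the first variable and a single value for the second, the vector at time t is (x_t, x_{t−2}, y_t).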

The prediction technique of Cao et al. (1998) was presented through a ‘hands on’ treatment of physical problems. Theory was presented on determining predictive relationships between variables. Agreement between the reconstructed (modelled) time series and the original time series was found when modelling two nonlinear systems, the Rössler and Lorenz equations. The techniques presented are complex and are appropriate when a phenomenon is accurately recorded and not subject to much natural variation, as was the case with the physical examples provided.

Kulkarni and Parikh (1999) also investigate the modelling of multiple variable nonlinear time series, but in a completely different way from Cao et al. (1998). The article extends a univariate Artificial Neural Network (ANN) approach to a multivariate form. Their motivation was that several variables are usually required to describe system behaviour (dynamics). Univariate models are considered before ‘dependent variable’, ‘vector to scalar’, ‘vector to vector’ and ‘partial data vector to vector’ multivariate models. Computer generated data from the Lorenz equations and the Hénon map were used to test the time series models. The method makes accurate short term predictions using clear and concise techniques.
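Benchmark data of this kind are easy to generate. The Python sketch below iterates the Hénon map, x_{n+1} = 1 − a x_n² + y_n and y_{n+1} = b x_n, giving a bivariate nonlinear time series. The classical parameter values a = 1.4 and b = 0.3 and the starting point (0, 0) are assumptions; the settings used by Kulkarni and Parikh (1999) may differ.

```python
def henon(n, a=1.4, b=0.3, x0=0.0, y0=0.0):
    """Iterate the Hénon map n times and return the x and y series."""
    xs, ys = [], []
    x, y = x0, y0
    for _ in range(n):
        x, y = 1.0 - a * x * x + y, b * x  # both updates use the old x
        xs.append(x)
        ys.append(y)
    return xs, ys
```

The first three points from (0, 0) are (1, 0), (−0.4, 0.3) and (1.076, −0.12). In practice the first few hundred iterations are usually discarded so that the series lies on the attractor.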

Reick and Page (2000) present a method for predicting the next value in a single time series using information from several nonlinear time series. To do this, multivariate versions of standard nonlinear univariate methods were created. The univariate methods extended belong to the class of local methods and involve the use of ‘nearest neighbour’ techniques. The ‘nearest neighbour’ of a section of a time series is the most similar other section that can be found. How the series continued after that earlier section is then used to predict how the current section will continue. These multivariate techniques were created to take advantage of the information that additional variables could provide.

The center-of-mass prediction (COM-prediction) and local linear prediction (LL-prediction) methods extended by Reick and Page (2000) are both based on nearest neighbour techniques. At each time point vectors are considered rather than individual values, and additional mathematics was included to compare sets of vectors effectively instead of sets of individual values. Vectors allow a number of time series to be compared instead of just one. This innovative prediction technique is useful for short term predictions but is unlikely to provide feasible long term predictions.
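A basic centre-of-mass prediction can be sketched in Python as below. The assumptions are that the state at each time is simply the vector of current values of all series (no delay coordinates), that neighbours are found by Euclidean distance, and that the prediction is the unweighted mean of what followed the k nearest past states. Reick and Page’s (2000) exact embedding and weighting may differ, and the function name is hypothetical.

```python
import math


def com_predict(series, target, k=3):
    """One-step centre-of-mass prediction of series[target].

    series : list of equal-length lists, one per variable; the state at
             time t is the vector (series[0][t], ..., series[m-1][t]).
    """
    length = len(series[0])

    def state(t):
        return [s[t] for s in series]

    current = state(length - 1)
    # candidate neighbours: past times whose next value is already known
    candidates = sorted((math.dist(state(t), current), t) for t in range(length - 1))
    neighbours = [t for _, t in candidates[:k]]
    # centre of mass: unweighted mean of the values that followed the neighbours
    return sum(series[target][t + 1] for t in neighbours) / len(neighbours)
```

For a periodic pair of series such as x = 0, 1, 2, 0, 1, 2, ... and y = 5, 6, 7, 5, 6, 7, ..., the past states matching the current one are always followed by the same value, so the next value is predicted exactly.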

Ørstavik et al. (2000) investigate the modelling of a specific case of nonlinear multiple variable time series in which the multiple time series is generated by a spatio-temporal system, that is, a system which varies over time and space. Such systems are commonly nonlinear and include weather, rainfall and traffic. For these systems Ørstavik et al. (2000) propose algorithms to find certain measures (patterns) and then report on the performance of these algorithms. These developments are based on, and intended for use on, physical problems. Further discussion requires substantial mathematical background and is beyond the scope of this thesis.

3.5 Miscellaneous Developments

This final theoretical review section considers several miscellaneous research articles found in the literature, covering the use of ‘hierarchical decomposition’, ‘artificial intelligence’ and ‘state space modelling’. Although varied in nature, these articles present some of the latest and most inventive approaches in the multiple time series arena.

Repucci et al. (2001) develop a new approach to multivariate time series analysis that they label hierarchical decomposition (HD). HD is conceptually similar to standard decomposition methods such as PCA (principal components analysis). Standard techniques do not handle complex, dynamically interacting variables very well in theory or practice. Crucially, unlike PCA, HD is suitable for use where there are nonlinear dynamics. For HD to take place, the variables must be organised hierarchically. This requires one source time series that depends only on its own past (autoregressive) and noise. Each other time series then depends on time series higher in the hierarchy, its own past state (autoregressive) and its own independent noise. If the variables are not in this form, HD is inappropriate.

The HD method given by Repucci et al. (2001) first uses PCA derived components of the original data to create a multivariate linear autoregressive (MLAR) model. The MLAR model is then transformed to be as consistent as possible with a hierarchical interrelationship between variables. When the data did have a hierarchical structure, the underlying generators were accurately recovered. The techniques were tested using simulation and an application to a multiple variable time series data set. Where a hierarchical structure is present between variables, this analysis is well suited.
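The hierarchical structure described above corresponds to a lower-triangular coefficient matrix in an MLAR model: each series depends only on its own past and on series above it in the hierarchy. The Python sketch below simulates such a system and fits an ordinary least-squares VAR(1) model. It only illustrates the target structure; it does not include the PCA and transformation steps of Repucci et al. (2001). The coefficient values are hypothetical.

```python
import numpy as np

# Hypothetical hierarchy: series 0 drives series 1, which drives series 2.
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.3, 0.0],
                   [0.0, 0.5, 0.2]])


def simulate_var1(A, T, seed=0):
    """Simulate x_t = A x_{t-1} + e_t with independent standard normal noise per series."""
    rng = np.random.default_rng(seed)
    X = np.zeros((T, A.shape[0]))
    for t in range(1, T):
        X[t] = A @ X[t - 1] + rng.normal(size=A.shape[0])
    return X


def fit_var1(X):
    """Least-squares VAR(1) fit: solve X[:-1] @ B ~ X[1:], then A = B^T."""
    B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return B.T


A_hat = fit_var1(simulate_var1(A_true, 5000))
upper = A_hat[np.triu_indices(3, k=1)]  # should be near zero under the hierarchy
```

Near-zero estimates above the diagonal are consistent with the hierarchical form; for real data the ordering of the variables is unknown, which is why a transformation step such as that of Repucci et al. is needed.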

Guimarães et al. (2000) apply ‘artificial intelligence’ to multivariate time series analysis. Technically, genetic algorithms as seen in Swift and Liu (2002) can be classed as artificial intelligence. However, in this application artificial intelligence was used in the more common sense of simulating human intelligence. The purpose of this type of analysis was to classify multiple variable time series into groups depending on their properties. Maharaj (1999) presented techniques for grouping multivariate time series under a completely different paradigm.

Group allocations of multiple time series by Guimarães et al. (2000) were based on characteristics predefined by human experts. The planned field of application for this type of analysis was medical diagnosis, where observed multiple time series are compared with typical features of diseases to reach a diagnosis. The specific method developed by Guimarães et al. (2000) was referred to as temporal knowledge conversion (TCon). All criteria for diagnosing particular conditions are written in a structured form of plain English, for ease of understanding and application. The criteria are specified using primitive patterns (ranges that a feature should be in), events, sequences (events with time ranges between them) and temporal patterns (combinations of sequences). Software can then use these rules to analyse multivariate time series data directly and give results. The simple, clear and concise methods proposed are currently specialised to the medical field but could feasibly be modified for use in other fields.
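The building blocks named above can be sketched in Python with time intervals given as (start, end) index pairs. The sketch is an assumption-laden illustration, not TCon itself: TCon expresses its rules in structured English, and its syntax is not reproduced here. The function names, the channel names (airflow, effort) and the thresholds are all hypothetical.

```python
def primitive_pattern(values, low, high):
    """Intervals (start, end), inclusive, where low <= value <= high."""
    intervals, start = [], None
    for t, v in enumerate(values):
        inside = low <= v <= high
        if inside and start is None:
            start = t
        elif not inside and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(values) - 1))
    return intervals


def event(*interval_lists):
    """Intervals where all the given primitive patterns hold at the same time."""
    result = interval_lists[0]
    for other in interval_lists[1:]:
        overlaps = []
        for a0, a1 in result:
            for b0, b1 in other:
                lo, hi = max(a0, b0), min(a1, b1)
                if lo <= hi:
                    overlaps.append((lo, hi))
        result = overlaps
    return result


def sequence(first, second, min_gap, max_gap):
    """Pairs of events where `second` starts min_gap to max_gap steps after `first` ends."""
    return [(e1, e2) for e1 in first for e2 in second
            if min_gap <= e2[0] - e1[1] <= max_gap]


# Hypothetical channels: low airflow together with low breathing effort.
airflow = [5, 5, 0, 0, 0, 5, 5]
effort = [3, 3, 3, 0, 0, 3, 3]
both_low = event(primitive_pattern(airflow, 0, 1), primitive_pattern(effort, 0, 1))
```

Temporal patterns would then be combinations of such sequences, and a diagnosis corresponds to finding the patterns whose criteria are met by a patient’s recordings.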

Modelling and prediction of system performance reliability using multivariate time series analysis was investigated by Lu et al. (2001). State space modelling, a general approach to multivariate time series analysis closely related to vector ARIMA, was used. State space modelling uses a state vector to give a picture of past and present behaviour. Future behaviour is then described and determined from the present state and future inputs. Reliability approaches typically involve probability distributions of time to failure. The use of time series by Lu et al. (2001) was a new approach, and the article presents a multiple variable model where previous models had been univariate. The new approach was expected to represent the dynamic underlying conditions involved in reliability studies better. System performance and reliability are usually measured on one or more critical variables called performance measures. A state space model was created to represent the multivariate time series formed from these performance measures and to produce forecasts. An example of using the models for reliability analysis was provided, using a software package created specially for reliability assessment. Although the techniques are innovative and interesting, it is unclear at this stage whether they offer an advantage over the probability based methods typically used in reliability assessment. State space modelling, as a general approach to time series models, shows much more promise for other, more ‘standard’ time series applications. Further information on state space models is beyond the scope of this thesis.
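The state space idea can be illustrated with the standard linear Gaussian model and the Kalman filter, sketched in Python below. This is a generic textbook formulation, not the model of Lu et al. (2001). The example assumes two hypothetical performance measures that both observe a single, slowly drifting underlying level; all matrices and names are assumptions.

```python
import numpy as np


def kalman_filter(ys, F, H, Q, R, x0, P0):
    """Kalman filter for x_t = F x_{t-1} + w_t, y_t = H x_t + v_t,
    with w_t ~ N(0, Q) and v_t ~ N(0, R). Returns filtered states and final P."""
    x, P = np.asarray(x0, dtype=float), np.asarray(P0, dtype=float)
    states = []
    for y in ys:
        x = F @ x                       # predict the state
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R             # covariance of the one-step prediction error
        K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
        x = x + K @ (np.asarray(y, dtype=float) - H @ x)  # update with the new observations
        P = (np.eye(len(x)) - K @ H) @ P
        states.append(x.copy())
    return np.array(states), P


def forecast(x, F, H, steps):
    """Forecast the observed measures by propagating the state with no future noise."""
    out = []
    for _ in range(steps):
        x = F @ x
        out.append(H @ x)
    return np.array(out)


# Hypothetical example: two performance measures of one slowly drifting level.
F = np.eye(1)
H = np.array([[1.0], [1.0]])
Q = np.array([[1e-4]])
R = np.eye(2)
ys = [[1.0, 1.0]] * 50
states, P_final = kalman_filter(ys, F, H, Q, R, x0=np.zeros(1), P0=np.array([[10.0]]))
fc = forecast(states[-1], F, H, steps=3)
```

The state vector summarises the past, and forecasts follow from the current state alone, which matches the description of state space modelling given above.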

4 APPLICATION LITERATURE REVIEW

When a number of variables are measured over time, a multiple time series situation arises. Such situations arise frequently in many different fields, although the reasons for analysis and the underlying types of data differ. Research was conducted to show how and where multiple time series have been used in recent years.

Applications of multiple time series analysis exist in fields as diverse as economics, ecology and sociology. For this reason the search targeted multiple time series analysis as a topic rather than particular journals. Journals found with applications included the Journal of Hydrology, Agricultural and Forest Meteorology, Fisheries Research, Continental Shelf Research and Artificial Intelligence in Medicine. Recent articles were sought, and all applications finally included in this review were published in the last eight years. Table 4.1 displays the fields involved in the recent application papers found.

Article, Year and Field of Application (Chemistry, Ecology, Economics, Forestry, Geology, Hydrology, Medicine, Meteorology, Sociology)

Boyd and Murray 2001 ✓
Chan et al. 1999 ✓
Chen and Dyke 1998 ✓
Chin, D. A. 1995 ✓
Diamandis et al. 2000 ✓
Felmingham et al. 2000 ✓
Green and Sparks 1999 ✓
Guimarães et al. 2001 ✓
Jensen 2001 ✓
Li and Kafatos 2000 ✓
Nemec 1995 ✓
Nicholson et al. 1998 ✓
Pech et al. 2001 ✓
Peiris and McNicol 1996 ✓
Reick and Page 2000 ✓
Rodó et al. 2002 ✓
Stergiou et al. 1997 ✓
Swift and Liu 2002 ✓
Swift et al. 2001 ✓ ✓
Van Dongen and Geuens 1998 ✓

Table 4.1: Field of application for recent literature articles.

Even more so than in theoretical papers, the directions taken in applications of multiple time series tend to be tightly linked with the field of the researchers. The resulting variety of topics addressed is summarised in Table 4.2. The main categories noted in this table are:

• One dependent – One dependent time series is being investigated.

• Many dependent – Many dependent time series are being investigated

simultaneously.

• ARIMA – Multiple independent variable forms of autoregressive integrated

moving average (ARIMA) models.

• Vector ARIMA – Vector based versions of ARIMA, supporting multiple

dependent and independent variables. Includes vector AR (VAR) and vector

ARMA (VARMA) among others.

• Combining series – Combining a number of time series into a single time series.

• Grouping variables – Placing a number of time series into groups depending on

similarities.

• Clustering – Grouping particular periods in time together from analysing multiple

time series.

• PCA – Principal components analysis. Extracting uncorrelated underlying components from time series.

• Patterns – Searching for patterns in a number of time series. Analysing patterns may be used for diagnosis using artificial intelligence, for grouping together time series and so forth.

• Nonlinear Methods – Analysis when standard assumptions of linear relationships

are invalid.

Article, Year and Techniques Involved (One Dependent, Many Dependent, ARIMA, Vector ARMA, Combining Series, Grouping Variables, Clustering, PCA, Patterns, Nonlinear Methods)

Boyd and Murray 2001 ✓ ✓
Chan et al. 1999 ✓ ✓
Chen and Dyke 1998 ✓ ✓
Chin, D. A. 1995 ✓ ✓
Diamandis et al. 2000 ✓ ✓
Felmingham et al. 2000 ✓ ✓ ✓ ✓
Green and Sparks 1999 ✓ ✓
Guimarães et al. 2001 ✓ ✓
Jensen 2001 ✓
Li and Kafatos 2000 ✓ ✓
Nemec 1995 ✓ ✓
Nicholson et al. 1998 ✓ ✓
Pech et al. 2001 ✓
Peiris and McNicol 1996 ✓ ✓
Reick and Page 2000 ✓ ✓
Rodó et al. 2002 ✓ ✓
Stergiou et al. 1997 ✓ ✓
Swift and Liu 2002 ✓ ✓
Swift et al. 2001 ✓ ✓
Van Dongen and Geuens 1998 ✓ ✓

Table 4.2: Techniques used in detail in recent literature articles.

The purpose of this application literature review was to sample the practical usage of

multivariate time series. The remainder of this chapter provides an overview of the recent

multiple variable time series applications found in a range of fields. The chapter is

arranged into the following broad categories – medical applications, economic

applications, sociology applications and natural phenomena applications.

4.1 Medical Applications

Time series in medical applications are the logical result of monitoring an attribute over time. Multiple time series in turn result when a number of attributes are monitored over time. Commonly these attributes are different indicators of the health of a patient. The three multivariate papers found from the medical field show a focus on diagnosis as the result of analysis. None of the articles considered using these models for prediction, but this is clearly a possibility.

Guimarães et al. (2000) take a different approach to the use and application of multivariate time series. For the analysis of sleep related breathing disorders, many variables can be monitored, including sleep related signals involving respiration and circulation. The simultaneous recording of all of these variables is known as polysomnography (PSG). There are many different types of sleep related breathing disorders, which are normally diagnosed by a human expert from a visual representation of multiple time series. The aim of these authors was to differentiate between disorders automatically, without a human expert. Criteria were set up from human expert knowledge to make the necessary classifications.

It was the method of obtaining results that set Guimarães et al. (2000) apart. The technique used is based on artificial intelligence, where computers are programmed to simulate human intelligence. The specific method used was referred to as temporal knowledge conversion (TCon). The method works by using primitive patterns (ranges that a feature should be in), events, sequences (events with time ranges between them) and temporal patterns (combinations of sequences). These basic elements are constructed using plain English for ease of human understanding. Software is then used to analyse multivariate time series data from sleep indicators directly and give results. The software automates a process of comparing typical sleep disorder characteristics with those observed to find a suitable diagnosis. Note that the elements used to make diagnosis decisions are completely independent of any particular patient’s data. The challenge in this situation lies not so much in the diagnosis process (which is rather simple) as in the accurate representation of the expert information used in the diagnosis.

Swift and Liu (2002) consider a conventional multiple time series situation in which a model was needed to represent medical data. The restrictions imposed by the large number of parameters required in vector ARIMA are bypassed using a genetic algorithm based technique. The application was to data obtained from the group of eye conditions known as glaucoma. The data involve many time series from different locations in the eye, but over short periods of time for multiple time series analysis (there were typically between 10 and 44 separate time points). That is, analysis of many time series over a short period was required rather than the typical few time series over a long period.

The method used by Swift and Liu (2002) was given the label VARGA, for vector autoregressive genetic algorithm. For the glaucoma application, VARGA was found to perform better than an equivalent VAR technique most of the time. The use of genetic algorithms for multivariate time series is an interesting idea with clear potential for further advancement.

Swift et al. (2001) look at classifying multiple time series into groups. Given a number of time series, the goal was to classify them into groups where within-group dependencies are high and between-group dependencies are low. Swift et al. apply innovative grouping techniques to a medical data set on glaucoma eye conditions and to data from a chemical process in an oil refinery. Evolutionary programming (EP) and a grouping genetic algorithm (GGA), both similar in nature to genetic algorithms (GA), are used to develop a method for classifying time series. The procedure involves an initial correlation search followed by an iterative grouping algorithm. In effect, initial correlation information between time series is used to place the time series into suitable groups.

The glaucoma medical data analysed by Swift et al. (2001) consisted of data recorded from the right eye of 82 patients. Each patient was tested at different locations within the eye every six months for between five and 22 years. The aim of grouping here was to group together locations within the eye exhibiting similar behaviour. Because there were 82 patients, 82 separate grouping applications took place (one for each patient’s data). The final group sizes resulting from the grouping algorithm were smaller than expected. Overall, the algorithm gave small groups of similar size for people with good vision, and large groups of varied size for those with poor vision.

The chemical data set also analysed by Swift et al. (2001) involved 50 chemical processes recorded every minute from a fluid catalytic converter. The purpose of grouping here was to group similar chemical processes together. The total time series had a length of 3000 time points, and a large delay between cause and effect was known to exist. The results showed groupings of various sizes, with most variables dependent on others. Applying the results to the original chemical process gave some interesting and sometimes unexpected results.

4.2 Economic Applications

Economic data, including stock prices, exchange rates and interest rates, can form time series. Particular economic time series are likely to be influenced by other time series, since markets and rates are somewhat intertwined. Hence multivariate time series analysis is of interest in economics. Not only could the relationships between these time series be fascinating, but application of the findings could prove financially rewarding. The four papers discussed here look at the analysis of economic multivariate time series.

Chan et al. (1999) used multivariate time series to model relationships between stock markets in China, Hong Kong and Taiwan. Hong Kong tends to function as a ‘middleman’ in trading between China and Taiwan, a result of political tension. A multivariate ARIMA model of daily stock market indicators from 1992 to 1997 was used to model market relationships. The four stock market indicators used were the Hong Kong Hang Seng index, the Shanghai B share index, the Shenzhen B share index and the Taiwan Stock Exchange capitalisation weighted stock index. To deal with non stationarity, logarithms were taken of the stock market values and first differencing was then applied. Refer to section 2.4.6 for more information on stationarity.

Chan et al. (1999) used cross correlation matrices between the four market indicators to establish relationships. Elements of these cross correlation matrices found to be statistically significant gave information on relationships between markets, and these relationships were the main interest of the analysis. Significant correlations were then used to decide which factors to include in the multivariate model for each market indicator. Interestingly, the Hong Kong market appeared to be ‘leading’ the other markets. These and similar relationships can be found through appropriate use of multiple variable time series techniques.
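The two preprocessing and screening steps described for Chan et al. (1999), log differencing followed by lagged cross correlation matrices, can be sketched in Python. The 2/√T bound is a rough large-sample guide that is strictly valid only when the series are uncorrelated and at least one of them is white noise. The function names are hypothetical, and this is not the exact procedure of Chan et al.

```python
import math


def log_diff(prices):
    """First differences of logarithms (log returns), used to remove trends."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def cross_correlation(x, y, lag):
    """Sample correlation between x_t and y_{t+lag} for lag >= 0 (x leading y)."""
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / T)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / T)
    c = sum((x[t] - mx) * (y[t + lag] - my) for t in range(T - lag)) / T
    return c / (sx * sy)


def cross_correlation_matrix(series, lag):
    """Matrix whose (i, j) entry is the correlation of series i at t with series j at t + lag."""
    return [[cross_correlation(a, b, lag) for b in series] for a in series]


def significance_bound(T):
    """Approximate 95% bound 2/sqrt(T) for a single cross correlation."""
    return 2.0 / math.sqrt(T)
```

An element of the lag 1 matrix exceeding the bound in row i and column j would suggest that market i leads market j by one day, which is the kind of ‘leading’ relationship reported for Hong Kong.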

Felmingham et al. (2000) investigate the interdependence of Australian and foreign short term interest rates. The foreign markets of interest were the United States, Japan, the United Kingdom, Canada, Germany and New Zealand. A problem with modelling data such as interest rates is that they are subject to sudden changes as a result of politics, availability of resources and many other potential ‘shocks’. Sudden changes are referred to as ‘breaks’, and care must be taken in their analysis. Felmingham et al. (2000) use a version of the Augmented Dickey-Fuller (ADF) unit root (stationarity) test that takes breaks into account, and find that all the time series require first order differencing for modelling.
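The basic idea behind such tests can be sketched in Python with the plain Dickey-Fuller regression Δy_t = c + ρy_{t−1} + e_t. The t statistic on ρ is compared with a critical value; about −2.86 is the commonly tabulated asymptotic 5% value for this constant-only case. This sketch omits both the augmentation lags of the ADF test and the break adjustment used by Felmingham et al. (2000), and the function name is hypothetical.

```python
import numpy as np


def dickey_fuller_stat(y):
    """t statistic on rho in the regression dy_t = c + rho * y_{t-1} + e_t.

    Values well below about -2.86 suggest rejecting a unit root, meaning
    the series looks stationary without differencing.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    Z = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    resid = dy - Z @ coef
    sigma2 = resid @ resid / (len(dy) - 2)  # residual variance with 2 fitted parameters
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[1] / np.sqrt(cov[1, 1])


# Usage: white noise is stationary, while its cumulative sum is a random walk.
rng = np.random.default_rng(0)
noise = rng.normal(size=500)
walk = np.cumsum(noise)
```

A random walk such as `walk` would normally fail to reject the unit root, indicating that first differencing is needed, whereas `noise` gives a strongly negative statistic.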

The data analysed by Felmingham et al. (2000) were quarterly short term interest rates from 1970 to 1997. Bivariate (two variable) and multivariate analyses using vector AR based models were applied to the data, with special attention to cointegration. Cointegration is discussed earlier in section 3.1. The appearance of cointegration was regarded as limited evidence of long term relationships.

Diamandis et al. (2000) took a detailed look into the Greek drachma to dollar and

drachma to mark exchange rates. The interest was in fitting an accurate model that could

then be used to produce better forecasts that the naïve ‘random walk’ method. Money

supply, income and short-term interest rates were available for each market being

analysed. They applied a new approach to determine the order of integration (number of

times first differencing is required) necessary for model components (variables). The

problem with existing methods to determine the level of differencing needed is that they

have been developed for univariate models. These ‘unit root’ tests are not suitable in the

multivariate context. The model was found to contain one component that required first

differencing twice. A data transformation was applied to convert this component into one

that only involved first differencing once. This data transformation was applied because

statistical inference techniques for time series involving first differencing twice are not as

developed as those for simply first differencing. The detailed analysis that followed,

among other things, showed a relationship between domestic money and exchange rates.
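The order of integration can be illustrated by repeated first differencing. In the Python sketch below, a hypothetical sequence of shocks is cumulatively summed twice to give a series integrated of order two. Differencing that series twice recovers the original shocks, less the first two, which are consumed by the differencing.

```python
from itertools import accumulate


def difference(series, d=1):
    """Apply first differencing d times; each pass shortens the series by one value."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


shocks = [1, -1, 2, 0, -2, 3]          # hypothetical stationary shocks
once = list(accumulate(shocks))        # integrated of order one (a random-walk-type path)
twice = list(accumulate(once))         # integrated of order two

print(difference(twice, 1) == once[1:])    # True: one difference leaves an I(1) series
print(difference(twice, 2))                # [2, 0, -2, 3], the shocks from the third onwards
```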


In their paper Diamandis et al. (2000) use a multivariate cointegration technique to find

combinations of non stationary variables that form stationary series. The chosen

cointegrated vector autoregressive (VAR) model took into account these relationships. A

dynamic error correction model for forecasting, based on long term behaviour, was found to clearly outperform the naïve ‘random walk’ method.
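For reference, a generic single-equation error correction form for two cointegrated series is shown below. It is a sketch of the general idea rather than the specific multivariate model of Diamandis et al. (2000). The symbols μ, α, β and γ are illustrative parameters.

```latex
\Delta y_t = \mu + \alpha\,(y_{t-1} - \beta x_{t-1}) + \gamma\,\Delta x_{t-1} + \varepsilon_t, \qquad \alpha < 0
```

Here y_{t-1} - βx_{t-1} is the deviation from the long term (cointegrating) relationship, and a negative α pulls y back towards it. Setting μ = α = γ = 0 gives Δy_t = ε_t, the random walk, whose forecast of y_{t+1} is simply y_t. This is the benchmark against which the error correction model was compared.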

Another example of modelling economic data using multivariate time series analysis is

provided by Green and Sparks (1999). The exact sources of the growth and dynamics of

Canadian economic development around the start of the 20th century have been debated

for some time. This paper looked at annual data from 1870 to 1939 to resolve the debate

by pinpointing the exact sources of growth and dynamics.

Green and Sparks (1999) use a form of a vector autoregression (VAR) model to represent

the variables of population, terms of trade, exports, investment and gross national product

(GNP). Specifically, a ‘cointegration’ model was used to exploit linear combinations of

non stationary time series that resulted in stationary series. The model allowed unexpected changes, referred to as ‘innovations’, in certain variables to affect specific other

variables. The largest impact on growth and trends in Canadian development was found

to come from innovations in population.
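Impulse responses show how an innovation in one variable works its way through the others. The Python sketch below assumes a hypothetical two-variable VAR(1), y_t = A y_{t-1} + e_t, with made-up coefficients that are not Green and Sparks' estimates. It traces a one-off unit shock to the first variable. The responses are raw and unorthogonalised. In practice, shocks are usually identified first (for example by orthogonalisation) before they are interpreted, and that step is not shown.

```python
def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def impulse_responses(a, shock, horizon):
    """Responses of the VAR(1) y_t = A y_{t-1} + e_t to a single shock at time 0.

    Returns [shock, A shock, A^2 shock, ..., A^horizon shock].
    """
    responses = [list(shock)]
    for _ in range(horizon):
        responses.append(mat_vec(a, responses[-1]))
    return responses


# Hypothetical coefficients: variable 0 affects variable 1, but not the reverse.
A = [[0.5, 0.0],
     [0.2, 0.3]]
for h, r in enumerate(impulse_responses(A, [1.0, 0.0], 3)):
    print(h, [round(v, 3) for v in r])
```

The shock to the first variable reaches the second only through the 0.2 coefficient. Its effect peaks one period later (0.2) and then decays (0.16, then 0.098). A shock to the second variable would never reach the first.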

4.3 Sociology Applications

Social data on crimes, marriage rates and so forth collected over time form time series.

The analysis of multiple social time series has the potential to reveal patterns and trends

in human society. Although only one article was found in this category, the article by Jensen

(2001) gives an indication of the potential of multiple time series analysis in the

sociology area.

Literature commonly claims that there is a relationship between television and homicide

(murder) rates. Using multivariate time series regression similar to multivariate AR,

Jensen (2001) found this relationship to be spurious. Multivariate time series models

were formed with homicide rates being dependent on many social indicators. These

indicators included the marriage-divorce ratio, cirrhosis death rate, immigration,


unemployment and percentage of population 15 to 24 years old. Lagged terms of

previous homicide rates and television were included. The television effect, as judged by the number of televisions, was lagged by ten years in one model and fifteen years in another. This

lagging was in line with the claims by those linking television with murder.
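Including a variable lagged by ten or fifteen years means aligning each year's homicide rate with the value of that variable ten or fifteen years earlier. Each such lag costs the same number of observations at the start of the series. The Python sketch below builds aligned rows from hypothetical annual series, invented purely for illustration.

```python
def lagged(series, k):
    """Shift a series back k steps so that position t holds series[t - k].

    The first k positions have no earlier value and are filled with None.
    """
    if k == 0:
        return list(series)
    return [None] * k + list(series[:-k])


def complete_rows(*columns):
    """Keep only the time points at which every column has a value."""
    return [row for row in zip(*columns) if None not in row]


# Hypothetical annual series (not Jensen's data).
homicide = [4.0, 4.2, 4.1, 4.5, 4.8, 5.0, 5.3, 5.1, 5.6, 5.9, 6.0, 6.2]
televisions = [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66]

# Columns: homicide_t, homicide_{t-1}, televisions_{t-10}
rows = complete_rows(homicide, lagged(homicide, 1), lagged(televisions, 10))
print(rows)
```

With twelve years of data and a ten year lag, only two complete rows remain. This shows why long lags demand long series.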

The analysis by Jensen (2001) was carried out on murder involving white males and

females in the United States (1945 to 1992), murder rates in Canada (1950 to 1985) and

white murder rates in South Africa (1950 to 1985). Divorce was found to be a far better

indicator in all situations than lagged numbers of televisions. In the United States, where cirrhosis data are available, a significant relationship was found between this indicator of alcoholism in society and homicide rates.

The analysis by Jensen (2001) could have been improved by the consideration of more

lags of the provided variables. As it stands, no lags were considered for any variable except television, for which fixed lags were set prior to analysis. Relationships with homicide levels may therefore have gone undetected. A full vector based

model could also provide fascinating information on the nature of relationships between

all of the variables included in analysis rather than just homicide levels.

4.4 Natural Phenomena Applications

Many different fields use multivariate time series analysis for the fundamental task of

analysing natural phenomena. Statistical analyses are common here because natural systems inevitably contain ‘natural variation’ or ‘error’. Often an integral part of natural phenomena models is the effect of

inevitable human intervention. Fields with natural phenomena applications include

chemistry, ecology, forestry, geology, hydrology, and meteorology.

Nemec (1995) is one of very few articles found that apply time series methods in a forestry context.

Nemec presents a practical paper focusing on using repeated measures and time series

techniques for a set of forestry problems. The overall purpose of the paper was to give a

guide to dealing with situations where measurements are correlated. Where


measurements are taken repeatedly on a particular entity, correlation must be

factored into analysis.

The specific examples investigated by Nemec (1995) are repeated measures on the height

of seedlings, time series of tree rings over many years and the relationship between tree

rings and rainfall. The time series techniques used are ARIMA variants. Although Nemec

does not go into too much depth on the multivariate methods, basic information was

provided on the availability and use of these techniques. The SAS statistical package was

referred to and used for all applications. For repeated measures Nemec (1995)

demonstrated the SAS ‘glm’ procedure, with occasional references or assistance from the

‘anova’, ‘print’ and ‘sort’ procedures. For time series analysis, the ‘arima’, ‘autoreg’ and

‘forecast’ procedures were used. Detailed hypotheses provided a variety of information

on the nature of the variables involved. The potential for multivariate analysis could have

been explored in more detail but for the most part analysis was appropriate and

informative.

Peiris and McNicol (1996) investigate modelling daily weather using multiple variable

time series techniques. By modelling rain and non-rain variables simultaneously, they

modelled wet and dry days in the one model. Previous models tended to be specific to a

particular task or site whereas Peiris and McNicol strived for a general model to apply to

the entire Scottish climate. Data from many sites spanning 15 to 50 years were available.

Four sites were chosen for detailed analysis. The reason for analysis was to investigate

patterns and trends in rainfall in Scotland.

Some particular rainfall variables were found by Peiris and McNicol (1996) to have

annual cyclic patterns well modelled using cosine functions. Modelling using sine or

cosine waves is an alternative to ARIMA based modelling techniques. In this case the

cyclic components were more of a hindrance than an interest. These cyclic components

were removed before proceeding with multiple variable analyses.
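A cyclic component can be removed with a cosine (harmonic) term, as sketched below. The sketch assumes the record covers a whole number of cycles of an integer period. Under that assumption, the least squares cosine and sine coefficients have the closed form used in the Python below; otherwise a general regression is needed. Monthly values with a period of 12 are used for brevity. Peiris and McNicol (1996) worked with daily data, and the exact form of their fitted cycles is not reproduced.

```python
import math


def remove_cycle(y, period):
    """Fit y_t = m + a cos(2*pi*t/P) + b sin(2*pi*t/P) + e_t and subtract the cyclic part.

    The closed-form a and b are the least squares estimates only when len(y) is a whole
    number of periods and the period is at least 3 (the cosine and sine terms are then
    orthogonal to each other and to the constant).
    """
    n = len(y)
    if period < 3 or n % period != 0:
        raise ValueError("need a whole number of periods of length >= 3")
    w = 2 * math.pi / period
    a = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
    b = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
    cycle = [a * math.cos(w * t) + b * math.sin(w * t) for t in range(n)]
    return a, b, [v - c for v, c in zip(y, cycle)]


# Hypothetical monthly values: level 5 plus an annual cycle of amplitude 3.
monthly = [5 + 3 * math.cos(2 * math.pi * t / 12) for t in range(24)]
a, b, adjusted = remove_cycle(monthly, 12)
print(round(a, 6), round(b, 6), [round(v, 6) for v in adjusted[:3]])
```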

The authors used the resulting deseasonalised variables in a vector ARMA (VARMA) model.

Parameters were estimated by maximum likelihood, and the final model used

for the four Scottish sites was a VARMA(2,1) model. That is, a vector ARMA model

with an autoregressive component of order two and moving average component of order


one. The large number of parameters needing estimation in this model raised the possibility of overfitting; practical ways to lessen this threat are to use fewer variables or longer time series. Rainfall models were then created by applying logistic regression to the other weather variables. Finally, real and simulated weather were compared over time, and a good level of agreement was found between the predicted and actual weather.
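The overfitting concern is easy to quantify. An unrestricted VARMA(p, q) model in k variables has a k × k coefficient matrix for each autoregressive and moving average lag, so the number of coefficients grows with the square of the number of variables. The number of variables in Peiris and McNicol's (1996) model is not given here, so the values of k below are hypothetical. The total count assumes one intercept per series and an unrestricted error covariance matrix.

```python
def varma_coefficient_count(k, p, q):
    """AR and MA coefficients in an unrestricted k-variable VARMA(p, q): k*k per lag."""
    return k * k * (p + q)


def varma_total_count(k, p, q):
    """Coefficients plus k intercepts and the k(k+1)/2 distinct error covariance terms."""
    return varma_coefficient_count(k, p, q) + k + k * (k + 1) // 2


for k in (2, 4, 6):
    print(k, varma_coefficient_count(k, 2, 1), varma_total_count(k, 2, 1))
```

For a VARMA(2,1), moving from two to six variables raises the number of coefficients from 12 to 108, which is why fewer variables or longer series reduce the risk of overfitting.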

Starting with an overview of conventional vector ARMA methods, Chin (1995) develops

and uses a ‘scale’ model to represent a multivariate rainfall time series. Chin (1995) saw

conventional multivariate ARIMA based methods as restrictive and inappropriate for his particular purpose of modelling monthly and yearly rainfall data from South Florida in the United States. The ‘scale’ model presented by Chin (1995) allowed

for the distinction between regional-scale processes that apply to all locations and small-

scale local processes that only apply to a particular location. Processes (variables) are

judged as regional-scale if they are correlated with rainfall in many locations and as

small-scale if correlated with rainfall in only one location.
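The regional versus local distinction can be sketched as a simple screening rule. The Python below counts the sites at which a candidate process is correlated with rainfall. The correlation threshold and the number of sites that counts as ‘many’ are hypothetical choices. Chin's (1995) scale model itself is a full time series model rather than this screening step.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_process(process, rainfall_by_site, threshold=0.5, min_sites=2):
    """Label a process 'regional', 'local' or 'none' by how many sites' rainfall it tracks.

    threshold and min_sites are illustrative choices, not values from Chin (1995).
    """
    hits = [site for site, rain in rainfall_by_site.items()
            if abs(pearson(process, rain)) >= threshold]
    if len(hits) >= min_sites:
        return "regional", hits
    if len(hits) == 1:
        return "local", hits
    return "none", hits


# Hypothetical rainfall series at three sites and two candidate processes.
sites = {"A": [2, 4, 6, 8, 10, 12], "B": [6, 5, 4, 3, 2, 1], "C": [1, -1, 1, -1, 1, -1]}
trend_like = [1, 2, 3, 4, 5, 6]
alternating = [1, -1, 1, -1, 1, -1]
print(classify_process(trend_like, sites))
print(classify_process(alternating, sites))
```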

Chin (1995) created models for both monthly and annual rainfall. The monthly rainfall was found to be particularly suited to a state space model because the regional-scale

phenomena were found to have a temporal structure. That is, average amounts of rainfall

in particular months were gradually changing over time. This behaviour violates the

assumptions of standard multivariate ARIMA based models.

Li and Kafatos (2000) investigate the relationship between the normalised difference

vegetation index (NDVI) and the El Niño Southern Oscillation (ENSO). The data set analysed comprised 11 years’ worth of monthly NDVI (and ENSO) measurements from 1982 to

1992 from locations throughout the United States. For more information on the nature of

the NDVI and ENSO measures consult the original article. For analysis the authors first

removed the seasonal components before applying principal components analysis (PCA)

to the NDVI. The purpose of the PCA was to find the main sources (components) of

variation within the time series data. The result of the PCA was a set of mutually uncorrelated time series derived from the NDVI, referred to as standardised linear combinations (SLCs).
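The PCA step can be sketched in a few lines of Python with NumPy. This sketch uses the covariance matrix. If the series were on different scales, they would usually be standardised first, which is equivalent to using the correlation matrix; whether Li and Kafatos (2000) did so is not stated here. The three ‘pixel’ series are simulated.

```python
import numpy as np


def principal_components(data):
    """PCA of a (time x series) array via the eigen-decomposition of its covariance matrix.

    Returns the component scores (one time series per column), the loadings and the
    proportion of total variance explained, ordered from strongest to weakest component.
    """
    centred = data - data.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-order from strongest to weakest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centred @ eigvecs                  # columns are mutually uncorrelated time series
    return scores, eigvecs, eigvals / eigvals.sum()


# Hypothetical example: three 'pixel' series sharing one slowly varying signal plus noise.
rng = np.random.default_rng(0)
signal = np.sin(np.arange(132) / 7.0)           # 132 months, i.e. 11 years
data = np.column_stack([signal + 0.1 * rng.standard_normal(132) for _ in range(3)])

scores, loadings, explained = principal_components(data)
print(np.round(explained, 3))
```

A component that explains a very small share of the variance, like the fifth NDVI component, can still carry a meaningful signal. This is why components are examined individually rather than judged only by their share of variance.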

Interannual signals were investigated by wavelet decomposition involving the use of

wavelets, which are fundamental building block functions localised in time or space (Li


and Kafatos, 2000). The result was that the fifth strongest principal component (time

series) from the NDVI was significantly correlated with the ENSO signal. However, this

principal component only explained 0.3% of the variance in the NDVI. Appropriate use

of time series analysis found the relationship between the NDVI and the ENSO.
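Li and Kafatos' (2000) choice of wavelet is not described here. The simplest wavelet, the Haar wavelet, illustrates the idea of splitting a series into smooth and detail parts at successively longer time scales. In the Python sketch below, each level halves the series. Scaled pairwise sums form the smoother approximation and scaled pairwise differences form the detail. Because the transform is orthonormal, the total sum of squares is preserved, which is a handy check. The input values are hypothetical.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar wavelet transform (len(x) must be even).

    Returns (approximation, detail): scaled pairwise sums capture the smoother,
    longer-period behaviour; scaled pairwise differences capture short-period changes.
    """
    if len(x) % 2:
        raise ValueError("need an even number of values")
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def haar_decompose(x, levels):
    """Repeatedly split the approximation; details[j] holds the level j + 1 detail."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # hypothetical values
approx, details = haar_decompose(x, 3)
print(approx, [len(d) for d in details])
```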

Rodó et al. (2001) predict the water level in Lake Gallocanta, a large lake in Spain, for the

years 1889 to 1994 before using multivariate time series techniques to explain the

observed levels. The main purpose of analysis was to provide explanations on the water

level influences. Lake water levels were first predicted using a geochemical method.

Detailed analysis of sediment core samples taken at two sites in Lake Gallocanta resulted in a time series of water level. Consult Rodó et al. (2001) for the exact methods used to reconstruct the evolution of the lake level from the core samples; the theory and use of these methods formed the bulk of the article.

Multivariate time series techniques were then used by Rodó et al. (2001) to find

influences on lake water levels. Before analysis, first order differencing was applied to

the lake level to make the time series stationary (see section 2.4.6). The final time series

model explained 62.5% of the lake level variance by modelling annual water level from

annual rainfall and mean maximum temperature. Other potential explanatory variables were tested

but not found to be significant. Additional variables such as evaporation, wind and

relative humidity were not available for the range of years included but could have been

valuable. Analysis over selected years where these variables are available may provide a

more detailed picture of water level influences.
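The approach of differencing and then regressing can be sketched with simulated data. A figure such as 62.5% of variance explained presumably corresponds to an R² of 0.625 for the differenced series. The Python below fits an ordinary least squares regression of the first differenced level on rainfall and maximum temperature and reports R². The data and coefficients are hypothetical, and the sketch omits any lagged terms Rodó et al. (2001) may have used.

```python
import numpy as np


def fit_with_r2(y, predictors):
    """OLS of y on an intercept plus predictor columns; returns (coefficients, R^2)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Hypothetical annual data (not the Lake Gallocanta record).
rng = np.random.default_rng(42)
n = 100
rain = rng.normal(450.0, 80.0, n)        # annual rainfall, mm
tmax = rng.normal(18.0, 1.0, n)          # mean maximum temperature, °C
yearly_change = 0.004 * (rain - 450.0) - 0.5 * (tmax - 18.0) + rng.normal(0.0, 0.2, n)
level = np.cumsum(yearly_change)         # non-stationary water level

change = np.diff(level)                  # first order differencing
coef, r2 = fit_with_r2(change, [rain[1:], tmax[1:]])
print(np.round(coef, 4), round(r2, 3))
```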

A number of time series may share an underlying trend that is not immediately obvious

from plotting each time series against time. Nicholson et al. (1998) investigate forming a

single time series summarising trend from a number of time series. The data for analysis

involved five groups of phytoplankton sampled over 13 months from two geographical

locations in the North Sea. The goal was to find a linear transformation (univariate time

series) of the multiple time series to maximise the trend. A few techniques are available

for doing this, and a linear smoother method proposed by Hastie and Tibshirani in 1990

was used. This method has the advantages that the data need not be equally spaced in time and that outliers (extreme values) have less of an impact than in other techniques.


Nicholson et al. (1998) took logarithms of the provided data prior to the application of

the linear smoother. The purpose of the logarithm transform was not stated but is assumed to be to stabilise the variance (see section 2.4.6). The amounts of

particular forms of phytoplankton were found to be quite different between the two

locations. That is, linear smoothers applied in the two locations separately showed quite

different trends. More investigation into the details of the linear transformation for

practical applications would be valuable for future analyses.

Boyd and Murray (2001) investigate 22 years’ worth of yearly measurements of 27

variables. The 27 highly correlated variables were recorded over time from a marine

ecosystem at South Georgia where approximately 36% of the data was missing.

Logarithmic transforms were applied to some variables before inclusion in the model (for

stationarity of variance, see section 2.4.6).

For analysis Boyd and Murray (2001) combined the multiple time series into a single

time series called a combined standardised index (CSI). Three approaches were used in

the formation of the CSI with the third method, which involved smoothing a covariance

matrix to ensure positive eigenvalues, being the most successful. It was acknowledged

that this combining may lose important distinctions between the combined groups. The

CSI of the provided data showed periodic fluctuations but little evidence of a long term

trend.
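A basic combined index can be formed by standardising each variable and averaging whatever standardised values are available at each time point. This approach copes naturally with missing data. The Python sketch below shows only this simple approach. It is not Boyd and Murray's (2001) preferred method, which smoothed a covariance matrix, and the example values are hypothetical.

```python
import math


def standardise(values):
    """z-scores computed from the non-missing values (needs at least two); None stays None."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    sd = math.sqrt(sum((v - mean) ** 2 for v in present) / (len(present) - 1))
    return [None if v is None else (v - mean) / sd for v in values]


def combined_index(variables):
    """Average of the available z-scores at each time point (None if nothing was observed)."""
    z = [standardise(v) for v in variables]
    index = []
    for column in zip(*z):
        present = [v for v in column if v is not None]
        index.append(sum(present) / len(present) if present else None)
    return index


# Hypothetical yearly series with gaps.
var_a = [1.0, 2.0, 3.0, 4.0, 5.0]
var_b = [2.0, None, 6.0, 8.0, None]
print(combined_index([var_a, var_b]))
```

Where only one variable is observed in a year, the index is simply that variable's z-score. This is one reason why the index becomes less reliable as more values are missing.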

Boyd and Murray (2001) examined how each CSI dealt with correlated and relatively uncorrelated data. Predictably, when the data were relatively uncorrelated the correlation

between a modelled index and actual index got weaker as more data was removed. When

the variables were correlated the formed index was robust, handling 40 to 50 percent of

values missing.

Most univariate and multivariate time series analyses carry with them assumptions of

linear relationships. When these cannot be assumed, nonlinear techniques are available.

A number of recent nonlinear techniques are discussed in section 3.4. Reick and Page

(2000) provide an application using nonlinear techniques referred to as next (or nearest)

neighbour methods. These methods also carry the advantage that they do not require a


time series to be stationary. Previously used on univariate time series, next neighbour

methods are generalised for the multivariate context and applied. The most elementary

concept behind nearest neighbour prediction is that of analogue prediction. Analogue

prediction involves searching for a time series section as similar as possible to that

leading up to where the prediction is wanted and assuming the earlier pattern continues.

The common nearest neighbour techniques of local linear (LL) prediction and centre-of-mass (COM) prediction are based on the concept of analogue prediction.
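Analogue prediction with a centre-of-mass (COM) average can be sketched in Python as below. The ‘state’ at a time point is the last m values of every series included, so the same function covers univariate, bivariate and trivariate models. The k past states closest to the current one are found, and the values that followed them are averaged. With k = 1 this reduces to pure analogue prediction. The embedding length m, the number of neighbours k and the unweighted average are illustrative assumptions. Reick and Page (2000) may have used different choices, and the local linear (LL) variant, which fits a linear map to the neighbours, is not shown.

```python
import math


def com_prediction(series_list, target=0, m=3, k=2):
    """Centre-of-mass nearest neighbour forecast of the next value of one series.

    series_list holds one or more equally long series; series_list[target] is predicted.
    The k past states closest to the current state (Euclidean distance) are found and
    the target values that immediately followed them are averaged.
    """
    n = len(series_list[0])

    def state(end):  # values at times end-m+1 .. end, for every series
        return [s[t] for s in series_list for t in range(end - m + 1, end + 1)]

    current = state(n - 1)
    candidates = []
    for end in range(m - 1, n - 1):          # each past state must have a known successor
        d = math.dist(state(end), current)
        candidates.append((d, series_list[target][end + 1]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


periodic = [0, 1, 2, 3] * 5                  # hypothetical repeating pattern ending in 1, 2, 3
print(com_prediction([periodic]))            # the pattern 1, 2, 3 has always been followed by 0
```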

Reick and Page (2000) analysed data from twelve time series of zooplankton numbers

collected from the German North Sea. Three data sets were created from these twelve

time series by using different moving average lengths to deal with the noisy nature of the

initial data set. A number of additional quantities covering aspects such as water

temperature, salinity and wind were also available. For each time series in each data set

three time series models were created. These were univariate (one variable), bivariate

(two variable) and trivariate (three variable) models. In every model one of the variables

was the one being predicted. In most cases the next neighbour prediction schemes were

found to give better predictions than comparative autoregressive (AR) models.

Various time series methods were applied and compared by Stergiou et al. (1997). The

data set analysed contained total monthly commercial catches for sixteen species from

eighteen fishing sub areas around the Greek islands. A number of independent variables

covering fishing effort, economic factors and climatic factors were also available. Three

general categories of models were applied. The first were standard simple and multiple

linear regression models based on external independent variables. The second group were

the univariate techniques Winters’ exponential smoothing and ARIMA. The final group

were multivariate techniques, which included harmonic regression, dynamic regression

and vector autoregression. All of these techniques are discussed in varying detail in

sections 2.4 and 2.5. For most techniques one model was made for each species creating

sixteen models in total for each particular modelling method.

The success of the modelling techniques used by Stergiou et al. (1997) was measured by a number

of different standard criteria. The focus was on finding a model that minimised error in

fitting and gave the most accurate forecasts. Despite the inclusion of multivariate

techniques, the all round best performer as judged by standard measures turned out to be


the univariate ARIMA model. The multivariate dynamic regression model was not far

behind, though. This demonstrates that in the time series arena, more complex does not necessarily mean better.
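The particular criteria used by Stergiou et al. (1997) are not listed here. Typical measures of fitting and forecast error include the mean absolute error (MAE), the root mean squared error (RMSE) and the mean absolute percentage error (MAPE). The Python sketch below computes all three for hypothetical numbers.

```python
import math


def accuracy(actual, forecast):
    """Common forecast error measures (MAPE assumes no actual value is zero)."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical monthly catches and forecasts.
print(accuracy([100, 120, 80], [110, 115, 80]))
```

RMSE penalises large errors more heavily than MAE, and MAPE is scale free, so the measures can rank competing models differently.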

Pech et al. (2001) explore the logical relationship between fishing activity and

availability of the resource being fished. Seven time series collected from 1974 to 1992

documenting fishing effort and subsequent catches were analysed. All combinations of

fishing resources, tactics, strategies and locations used were noted. The approach taken to

analysis was mainly mathematical rather than statistical. Mathematical expressions were

derived to represent the effort, catch and biomass (amount of fishing resources) involved.

To suit the available data, the expressions were changed to involve ‘lower level’ (more

specific) variables including a measure of catch effort. Overall the formulae were created

specifically for the situation and based on assumptions unique to that situation. Statistical techniques were used to estimate the parameters in the resulting formulae, so this statistical application is tightly bound to the particular situation.

Chen and Dyke (1998) model suspended sediment concentration along with its

relationship to current water velocity profile using multivariate time series techniques.

This is a new approach towards a situation typically dealt with using deterministic partial

differential equations. Previous investigation by Chen and Dyke found that a multivariate

model was more appropriate than a univariate model. The model decided for use was a

multivariate time series model called ARMAX (refer to section 2.5.1).

In their work Chen and Dyke (1998) used a recursive least squares algorithm to find

parameters for the ARMAX model. A set of statistical measures (one-step prediction

error, maximum one-step prediction vector error, maximum one-step prediction element

error and maximum parameter variation) were used to judge the suitability of different

ARMAX models. The final suspended sediment model decided upon was an ARMAX (4,

2, 1) model. Although the model was found to fit the data well, the large number of parameters offered little in the way of practical physical, chemical or biological meaning. Significance

testing of model parameters may have assisted in drawing meaning.
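Recursive least squares updates parameter estimates one observation at a time rather than refitting the whole regression. The Python sketch below implements the standard update with an optional forgetting factor. It applies the update to a hypothetical noise-free ARX example, that is, an ARMAX model without the moving average part. Full ARMAX estimation also needs the unobserved past errors, which recursive schemes usually replace with past residuals. That step is omitted here, and the sketch is not claimed to be Chen and Dyke's (1998) exact algorithm.

```python
import numpy as np


def recursive_least_squares(X, y, forgetting=1.0, delta=1e6):
    """Update least squares estimates one observation at a time.

    X: (n, p) regressor rows; y: (n,) outputs. A forgetting factor below 1
    discounts older observations. Returns the final parameter vector.
    """
    n_params = X.shape[1]
    theta = np.zeros(n_params)
    P = delta * np.eye(n_params)              # large initial P means little prior information
    for phi, target in zip(X, y):
        P_phi = P @ phi
        gain = P_phi / (forgetting + phi @ P_phi)
        theta = theta + gain * (target - phi @ theta)    # correct by the prediction error
        P = (P - np.outer(gain, P_phi)) / forgetting
    return theta


# Hypothetical noise-free ARX system: y_t = 0.5 y_{t-1} + 0.8 u_{t-1}.
rng = np.random.default_rng(1)
u = rng.standard_normal(200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 * y[t - 1] + 0.8 * u[t - 1]

X = np.column_stack([y[:-1], u[:-1]])         # regressor rows [y_{t-1}, u_{t-1}]
theta = recursive_least_squares(X, y[1:])
print(np.round(theta, 4))
```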

Another example of using multivariate time series analyses in an unconventional

application can be found in Van Dongen and Geuens (1998). Wastewater treatment


problems typically involve deterministic analysis of differential equations. Time series

techniques were used to handle the more variable nature of realistic situations.

Univariate and multivariate ARMA models were used by Van Dongen and Geuens

(1998) to model the behaviour exhibited by a ‘lab-scale’ biological wastewater treatment

plant. Every model had one dependent variable. Multivariate models were created to explain the behaviour of effluent filtered, effluent suspended solids and amount of mixed liquor suspended solids (MLSS). Success was judged by how well the dependent variable was explained, as indicated by the Akaike Information Criterion (AIC) and the adjusted R². A number of independent variables were available for

inclusion in the multivariate models. Correlation between the ‘independent’ variables

meant that they were not technically independent variables. To deal with these

dependencies, the advanced least squares parameter estimation techniques ‘two-stage least squares’ and ‘three-stage least squares’ were used rather than ordinary least squares.
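Two-stage least squares is designed for regressors that are correlated with the error term of the equation, a situation that arises when variables influence one another within a system. The Python sketch below uses simulated data and a hypothetical instrument z to show the two stages. Van Dongen and Geuens' (1998) model specification is not reproduced. Three-stage least squares, which additionally estimates several equations jointly, is not shown. The standard errors from the second-stage regression are not valid 2SLS standard errors and would need correcting.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """2SLS estimate of beta in y = X beta + error when columns of X are correlated with the error.

    Z holds the instruments, including any exogenous columns of X such as the intercept;
    it needs at least as many columns as X.
    """
    # Stage 1: replace each column of X by its fitted values from a regression on Z.
    first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage
    # Stage 2: regress y on the fitted regressors.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


rng = np.random.default_rng(3)
n = 5000
z = rng.standard_normal(n)                     # instrument: related to x, not to the error
u = rng.standard_normal(n)                     # error term
x = z + 0.8 * u + rng.standard_normal(n)       # regressor correlated with the error
y = 1.0 + 1.5 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
tsls = two_stage_least_squares(y, X, Z)
print(np.round(ols, 3), np.round(tsls, 3))     # OLS slope is biased upwards; 2SLS is close to 1.5
```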

Changes were made to dependent variables by Van Dongen and Geuens (1998) as it was

found that some variables modelled better than others. Some variable values were

inverted while others were used in the form of ratios. The better performance after the

transformation is likely to be because the resulting data adhered better to assumptions of

linear relationships. The final models were judged as successful due to the significance of

stochastic (involving probability) parts of the models and the significance of lagged

explanatory variable effects.

As was the case with Chen and Dyke (1998), Van Dongen and Geuens (1998) applied

time series techniques where deterministic models involving partial differential equations

are typically used. In these situations overall behaviour can be effectively modelled by

time series but there is a compromise in interpretation. Models formed from partial

differential equations are built from basic relationships so returning results to the original

context is fairly simple. On the other hand, when time series techniques are used the

results are relatively detached from the original situation. Time series based techniques to

more effectively convey results could be of value in these situations.


5 FORESTRY CASE STUDY

This chapter investigates data sets resulting from an in-depth experiment on the effects of

mechanical harvesting operations and site management (particularly soil cultivation) on

plantation productivity in second rotation hoop pine plantations. The original trials were

set up by the Queensland Forestry Research Institute, Agency for Food and Fibre

Sciences, Queensland Department of Primary Industries (DPI) and are detailed in Smith

and Bubb (2000).

As part of the experiments, the Griffith University team of the Cooperative Research

Centre (CRC) for Sustainable Production Forestry collected soil chemical and biological

data. The data sets contained many variables measured over up to nineteen time periods.

Some results and interpretations produced prior to this case study can be found in


data set.

The focus of this case study is on investigating the effects of soil compaction and soil

cultivation on chemical and biological variables. Since the experiments were conducted

over time it is also of interest to observe variable behaviour in relation to compaction and

cultivation over time. Correlated data are involved because there are repeated measures

over time.

This chapter initially presents the data sets along with information on how they were

collected and issues relating to their usage. Previous data analysis is then scrutinised, and detailed analyses of the data sets are provided using advanced statistical methods.

5.1 Background

A large experiment was set up on approximately 4.6 hectares of land at Yarraman (26°52′ S, 151°51′ E), north-west of Brisbane, Queensland, Australia. The experiment was to

investigate the effects of mechanical harvesting operations and soil cultivation on Red

Ferrosol (Krasnozem) soil properties under wet weather conditions (Smith and Bubb,


2000). Specifically, these effects were to be investigated at plantation establishment of a

second rotation (2R) Hoop Pine plantation.

The data sets provided are from experiments that were part of a larger set of experiments

conducted at the Yarraman site. The experimental design used in the data sets provided

was based on the original experimental design set up by the DPI. The original experiment

was a randomised complete block (RCB) design with three blocks and twelve treatments.

Within each block the twelve treatments were randomly allocated to different locations.

The three blocks were based on slopes considered in three categories of upper (1), mid

(2) or lower (3). The twelve treatments were formed from combinations of two factors: compaction with four levels and cultivation with three levels. Compaction levels were set

by using a fully laden Hemek F18HP Cranab 1200 forwarder weighing 40.2 tons (see

Figure 5.1). The four compaction levels used were zero pass (no compaction), one pass,

four pass and sixteen pass. The three cultivation options were none, disc plough and

winged ripper. The three blocks and twelve treatments led to 36 different sub locations

within the main Yarraman site. For more details on the original experiments and design

consult Smith and Bubb (2000).

Figure 5.1: Picture of the forwarder used for compaction in experiments.

Data sets have been provided from measurements of soil chemical and biological

variables over time. Both sets involved measurements taken every 28 days, where this


measure was referred to as a ‘month’. The chemical data set was measured over nineteen

time periods while the biological data set covered only fourteen time periods. The

fourteen time periods in the biological data set coincide with the first fourteen time

periods in the chemical data set. The exact dates involved in sampling are given in

Appendix C and range from the 3rd of February 2000 through to the 19th of July 2001.

The soil chemical and biological experiments did not utilise the full number of treatments

available. Three compaction levels of (1) zero pass, (2) one pass and (3) sixteen pass

were used along with two cultivation levels of (1) none and (2) disc plough. This created six of the twelve available treatments. The three blocks were still utilised,

leading to the use of eighteen plots within the Yarraman site.
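The layout of the soil chemical and biological experiments can be enumerated directly, which is a useful check on the plot count. The Python sketch below uses the level codes from Table 5.1. The variable names are illustrative only, and the random allocation of treatments to locations within each block is not shown.

```python
from itertools import product

blocks = [1, 2, 3]                                  # upper, mid and lower slope
compaction = {1: "zero pass", 2: "one pass", 3: "sixteen pass"}
cultivation = {1: "none", 2: "disc plough"}

treatments = list(product(compaction, cultivation))   # (compaction, cultivation) codes
plots = list(product(blocks, treatments))             # every treatment once in every block

print(len(treatments), len(plots))
for block, (comp, cult) in plots[:2]:
    print(block, compaction[comp], cultivation[cult])
```

The same construction with four compaction levels and three cultivation levels gives the twelve treatments and 36 sub locations of the original design.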

General environmental measures were also provided to account for potential

environmental influences on data variation. In particular, monthly rainfall, mean

maximum temperature, mean minimum temperature and mean temperature range were

provided for each month. Rainfall was measured on site using a pluviometer while

temperature data were calculated from Yarraman Forestry Office records.

Both data sets contain measurements of soil moisture for each soil sample but from

different perspectives. The chemical data set used weight of water in sample divided by

weight of wet soil while the biological data set used weight of water in sample divided by

weight of dry soil. The result is that the two measures are different in nature but

noticeably correlated.
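If both moisture measures are determined on the same soil, they are linked exactly. Let w be the mass of water and s the mass of dry soil. The chemical measure is then w/(w + s) and the biological measure is w/s, so the dry-basis value equals the wet-basis value divided by one minus the wet-basis value. The relationship is monotone but not linear, so the Pearson correlation between the two would typically be high but below one. The function names in the Python sketch below are illustrative.

```python
def wet_to_dry_basis(theta_wet):
    """Water / wet soil (as a fraction) -> water / dry soil (as a fraction)."""
    return theta_wet / (1.0 - theta_wet)


def dry_to_wet_basis(theta_dry):
    """Water / dry soil (as a fraction) -> water / wet soil (as a fraction)."""
    return theta_dry / (1.0 + theta_dry)


# Example: 25% water on a wet-soil basis is about 33.3% on a dry-soil basis.
print(round(100 * wet_to_dry_basis(0.25), 1))
```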

It was considered that the effects of both soil compaction and cultivation are likely to

vary according to soil depth. For this reason the chemical data set considers two soil

depths of 0-10 cm and 10-20 cm. In the biological data set only the 0-10 cm depth is

investigated.

In the chemical data set, there was interest in dynamics of nitrogen transformations and

leaching. Soil mineral nitrogen dynamics covers soil mineral nitrogen fluxes at various

periods of measurement. To investigate soil mineral nitrogen dynamics and leaching a

technique was used that involved sequential, in situ exposure of soils. At each location in

each month, three sampling tubes called cores were driven into the ground (see Figure


5.2). One was removed immediately and used for baseline data. The remaining two were

left for the 28 days, one capped so that nitrogen leaching could not take place and one

uncapped so that nitrogen leaching could occur.

Figure 5.2: Three sampling cores in the ground at Yarraman. One is being removed.

Each soil sample was analysed for the three forms of nitrogen: nitrite (NO2), nitrate (NO3)

and ammonium (NH4). The standard metric used in analysis for these measures was

kilograms of nitrogen per hectare (kgN/ha). Soil mineral nitrogen dynamics were

calculated by subtracting baseline core levels from the capped core levels. For

ammonium, it was assumed that positive dynamics were related to nitrogen

mineralisation while negative dynamics were a reflection of immobilisation or

nitrification. For nitrate and nitrite, positive dynamics were assumed to be the result of

nitrification whilst negative dynamics were assumed to be the result of denitrification.

Leaching was calculated by subtracting the uncapped core level from the capped core

level. By this method data from the three cores produced baseline nitrogen levels,

nitrogen dynamics and nitrogen leaching measures for each form of nitrogen.
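The core calculations above can be written out directly. The Python sketch below derives the baseline, dynamics and leaching measures for one form of nitrogen. It also attaches the assumed process interpretation of the sign of the dynamics. The function names and core values are hypothetical.

```python
def nitrogen_measures(baseline, capped, uncapped):
    """Core levels (kgN/ha) -> baseline level, dynamics and leaching.

    dynamics = capped - baseline   (net change over 28 days with leaching prevented)
    leaching = capped - uncapped   (difference between the capped and uncapped cores)
    """
    return {"baseline": baseline, "dynamics": capped - baseline, "leaching": capped - uncapped}


def interpret_dynamics(form, value):
    """Process assumed to dominate, following the sign conventions described above."""
    if value == 0:
        return "no net change"
    if form == "NH4":
        return "mineralisation" if value > 0 else "immobilisation or nitrification"
    return "nitrification" if value > 0 else "denitrification"


# Hypothetical (baseline, capped, uncapped) core values for one plot and month.
cores = {"NH4": (10.0, 8.5, 7.0), "NO3": (4.0, 6.5, 3.0)}
for form, (base, cap, uncap) in cores.items():
    m = nitrogen_measures(base, cap, uncap)
    print(form, m, interpret_dynamics(form, m["dynamics"]))
```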

Biological variables were measured from the same soil samples as used in the chemical

data set. Every month the baseline samples from the chemical data set were analysed at

the 0-10 cm depth level only. In particular, microbial biomass carbon (MBC) and


microbial biomass nitrogen (MBN) were measured. An additional ratio variable,

calculated from microbial carbon divided by microbial nitrogen (ie. MBC / MBN), was

also of interest for analysis.

Further details on the methods used for sampling and extracting data can be found in


data set. Table 5.1 contains a summary of the factors and variables provided in the data

sets. For more information on the factors and variables provided, consult Appendix D.

Data Set(s)  Label        Units     Comments
Both         Month        [Levels]  Categories: 0 to 19, or 0 to 14.
Both         Block        [Levels]  Categories: 1 to 3, based on slope.
Both         Compaction   [Levels]  Categories: 1 to 3 (0, 1, 16 pass).
Both         Cultivation  [Levels]  Categories: 1 (none) or 2 (plough).
Chemical     Depth        [Levels]  Categories: 1 (0-10 cm) or 2 (10-20 cm).
Chemical     Grav         %         Gravimetric soil moisture.
Biological   Moist        %         Soil moisture, different from ‘Grav’.
Chemical     Rainfall     mm        Recorded monthly rainfall.
Chemical     MaxTemp      °C        Mean monthly maximum temperature.
Chemical     MinTemp      °C        Mean monthly minimum temperature.
Chemical     TmpRange     °C        Mean monthly temperature range.
Chemical     HaNO2        kgN/ha    Nitrite levels.
Chemical     HaNO3        kgN/ha    Nitrate levels.
Chemical     HaNH4        kgN/ha    Ammonium levels.
Chemical     HaTOTN       kgN/ha    Total mineral nitrogen levels.
Chemical     PotNO2       kgN/ha    Nitrite dynamics.
Chemical     PotNO3       kgN/ha    Nitrate dynamics.
Chemical     PotNH4       kgN/ha    Ammonium dynamics.
Chemical     PotTOTN      kgN/ha    Total mineral nitrogen dynamics.
Chemical     LchNO2       kgN/ha    Nitrite leaching.
Chemical     LchNO3       kgN/ha    Nitrate leaching.
Chemical     LchNH4       kgN/ha    Ammonium leaching.
Chemical     LchTOTN      kgN/ha    Total mineral nitrogen leaching.
Biological   MBN          µg/g      Microbial biomass nitrogen.
Biological   MBC          µg/g      Microbial biomass carbon.
Biological   MicroC:N     ratio     Ratio of MBC / MBN.
Biological   MBNFlux      µg/g      Changes in MBN each month.

Table 5.1: Summary of factors and variables provided for analysis in the case study.

The purpose of analysing the data sets is to investigate the effects of compaction

and cultivation over time on the chemical and biological variables introduced in this


section. It is of interest to see if compaction affects these variables, if cultivation affects

these variables or if perhaps an interaction of these two factors is involved. Furthermore,

the role that time has to play with any of these relationships is of special interest. Related

to this investigation are other factors and variables (blocks, rainfall, etc.) that may also be

influencing the variables.

5.2 Previous Data Analysis

Initial analyses of the chemical data set are reported in Blumfield et al. (2002) and of the biological data set in Chen et al. (2002). This section reviews those analyses from a statistical perspective. The original papers contain detailed analysis and conclusions in terms of forestry that are not pursued here; for more information on the forestry issues involved, consult the original articles.

5.2.1 Chemical Data

Blumfield et al. (2002) analysed the chemical data set using the SPSS Base 10 software system (SPSS, 1999). Parametric techniques included ANOVA for comparison of means and group contrast multiple comparison tests. Where the assumption of normally distributed populations seemed doubtful, nonparametric techniques were used, including the Mann-Whitney U-test (a rank-based comparison of two groups) and Spearman’s rho (rank correlation). Using nonparametric methods is sensible given that many of the variables are unlikely to be normally distributed.
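Spearman’s rho is simply Pearson’s correlation computed on ranks. The following is a minimal pure-Python sketch for illustration only (it is not the SPSS routine used by Blumfield et al.); the function names are our own, and tied values, such as repeated zero nitrogen readings, receive the average of the ranks they span.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r on the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotone but non-linear relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 100]))  # 1.0
```

Because only ranks enter the calculation, the result is unaffected by any monotone transformation of either variable, including the log transformations used later in this chapter.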

Significance decisions were based on p values at the standard 0.05 level, and p values were commonly quoted alongside the significance information.

An unexplained curiosity in the data was the extremely high ammonium values observed in month sixteen. No measurement error or similar cause could be found to explain these extreme values, so they were treated as genuine outliers and omitted from analyses.

Soil moisture was found to be correlated with rainfall, maximum temperature and

minimum temperature. Soil that had not been cultivated was found to have significantly


higher mean moisture content at 0-10 cm than soil that was cultivated. Moisture levels

were not significantly different between cultivation levels at the 10-20 cm depth. Soil

moisture was found to be significantly higher from sixteen pass compaction than from

zero pass and one pass compaction. This was found to be the case both with and without

cultivation.

Blumfield et al. (2002) conducted separate ANOVAs for each combination of the two depths and the three nitrogen dynamics measures (nitrate, ammonium and total nitrogen). The data used in each ANOVA were cumulative totals over the nineteen months. Two small amounts of missing data were ignored in these calculations; although this is not recommended, it probably changed the results only negligibly.

There are advantages and disadvantages to the use of cumulative data. The resulting analysis describes the overall behaviour but gives no information on how behaviour changes over time. The picture at certain months and periods within the data may be completely different from the overall picture, but a cumulative analysis will not show this. The effect of individual extreme values may also be amplified, giving the appearance of relationships that are not really there.
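A toy illustration with invented numbers (hypothetical, not taken from either data set) shows both problems: two treatments with identical cumulative totals can behave in opposite ways over time, and a single extreme month can reverse a cumulative comparison.

```python
def cumulative_total(series):
    """Sum of the monthly values, as used in a cumulative analysis."""
    return sum(series)


def monthly_differences(a, b):
    """Month-by-month difference between two treatment series."""
    return [x - y for x, y in zip(a, b)]


# Hypothetical monthly values for two treatments.
rising = [1, 2, 3, 4, 5, 6]
falling = [6, 5, 4, 3, 2, 1]
print(cumulative_total(rising), cumulative_total(falling))  # 21 21
print(monthly_differences(rising, falling))  # [-5, -3, -1, 1, 3, 5]

# One extreme month outweighs five months of lower values.
steady = [4, 4, 4, 4, 4, 4]
one_spike = [1, 1, 1, 1, 1, 20]
print(cumulative_total(steady), cumulative_total(one_spike))  # 24 25
```

The cumulative comparison reports no difference in the first case and a higher total for the spiked series in the second, even though the monthly differences change sign and the spiked series is lower in five of six months.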

The use of blocks in the experimental design presents a theoretical and practical problem.

The randomised complete block (RCB) design theoretically involves a block that is

assumed to have an effect but not interact with other factors (Zar, 1999). In the

experiment, the blocks were based on subtle differences in slope (Blumfield, personal

communication) that would not be expected to lead to significant differences in nitrogen

measures. This means that the blocks are not expected to have an effect and suggests that

a completely randomised design (CRD) may have been more appropriate from the outset.

Each ANOVA model tested the factorial effects block, compaction, cultivation, block by compaction interaction, block by cultivation interaction and compaction by cultivation interaction. Equation 5.1 shows the form of each model. Every model uses the three-way interaction of block by compaction by cultivation as the error term, which has only four degrees of freedom. A further characteristic of RCB designs is that interactions involving the block are estimates of natural variation. To remain true to the block design, the two block interactions that were tested for significance should have been included in the error. More worrying is that these block interaction terms were at times found to be significant. Combined with the lack of practical differences between the blocks, this suggests that a significant block interaction may indicate that effects depend on the particular piece of land.

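$$\text{Measure}_{ijk} = \mu + \text{Compaction}_i + \text{Cultivation}_j + \text{Block}_k + (\text{Comp}\times\text{Cult})_{ij} + (\text{Block}\times\text{Comp})_{ik} + (\text{Block}\times\text{Cult})_{jk} + \varepsilon_{ijk} \tag{5.1}$$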

Where:

• i = 1, 2, 3 (compaction level indicator).

• j = 1, 2 (cultivation level indicator).

• k = 1, 2, 3 (block level indicator).
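The degrees of freedom quoted above can be checked directly. Assuming one cumulative value per compaction × cultivation × block plot (3 × 2 × 3 = 18 plots), each term’s df is the product of (levels − 1) over its factors. The sketch below uses illustrative names of our own and confirms that the three-way error term has only four df.

```python
from math import prod

LEVELS = {"Compaction": 3, "Cultivation": 2, "Block": 3}


def term_df(*factors):
    """Degrees of freedom of a main effect or crossed interaction."""
    return prod(LEVELS[f] - 1 for f in factors)


# Terms tested in Equation 5.1.
model_terms = {
    "Compaction": term_df("Compaction"),
    "Cultivation": term_df("Cultivation"),
    "Block": term_df("Block"),
    "Comp x Cult": term_df("Compaction", "Cultivation"),
    "Block x Comp": term_df("Block", "Compaction"),
    "Block x Cult": term_df("Block", "Cultivation"),
}
# The remaining three-way interaction serves as the error term.
error_df = term_df("Block", "Compaction", "Cultivation")
total_df = prod(LEVELS.values()) - 1

print(model_terms)
print("error df:", error_df, "total df:", total_df)  # error df: 4 total df: 17
```

Pooling the two block interactions into the error, as the RCB design suggests, would raise the error df to 4 + 2 + 4 = 10.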


The bulk of the article by Blumfield et al. (2002) deals with implications of these results

and more specific results obtained from multiple comparison tests after significant

ANOVA results. Due to the aforementioned concerns with the base design and the

specific nature of reported results, these are not detailed in this section. Blumfield et al.

(2002) should be consulted for further information.

5.2.2 Biological Data

Chen et al. (2002) analysed the biological data set using two main groups of ANOVA models. The first group consisted of ANOVA models for each biological measure that included soil compaction, cultivation, block and sampling month as factors; the factorial effects included in these models are shown in Equation 5.2. The second group consisted of one-factor ANOVA models examining the main effects of soil compaction and cultivation.

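$$\text{Measure}_{ijkl} = \mu + \text{Month}_i + \text{Compaction}_j + \text{Cultivation}_k + \text{Block}_l + (\text{Month}\times\text{Comp})_{ij} + (\text{Month}\times\text{Cult})_{ik} + (\text{Comp}\times\text{Cult})_{jk} + (\text{Block}\times\text{Comp})_{jl} + (\text{Block}\times\text{Cult})_{kl} + (\text{Month}\times\text{Comp}\times\text{Cult})_{ijk} + \varepsilon_{ijkl} \tag{5.2}$$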


Where:

• i = 1, 2, 3, …, 14 (month level indicator).

• j = 1, 2, 3 (compaction level indicator).

• k = 1, 2 (cultivation level indicator).

• l = 1, 2, 3 (block level indicator).

There are two main issues with ANOVA as conducted by Chen et al. (2002). The first is

that sampling month has been included as a factor in ANOVA. Correlation between

measures taken at different times in exactly the same location breaks the ANOVA

requirement of random, independent errors. Therefore models simply including the

sampling month as an additional term are not appropriate and may result in misleading

conclusions. Secondly, looking at main effects for variables when there may be

interactions is not recommended. This can lead to misleading conclusions depending on

the exact nature of the interactions.

The biological data set faces the same theoretical and practical problems arising from the RCB design as the chemical data set. In the overall ANOVA models two block interactions were included, leaving a number of two-, three- and four-way interactions to form the estimate of error. It is unclear exactly how the factorial effects to be included in the model, and those to be pooled into the error, were decided. Of more interest is that there were occasional significant block interactions, again hinting at behaviour that depends on the particular piece of land.

Most statistical results reported by Chen et al. (2002) were based on the two base types of

ANOVA models introduced previously. Given the analysis problems, further results are

not reported in this thesis and it is recommended that the original article be consulted for

further information and biological interpretations.


5.3 Limitations and Scope

The main limitation on the application of statistical techniques is the short duration of the investigations. Both data sets cover less than one and a half years. This makes estimation of seasonal variation difficult because very few seasons are available. In fact, some months appear only once in the data sets, so any assumptions about seasonal behaviour would be rather naïve.

A small number of nitrogen measurements were missing from the chemical data set. Some appeared to have been left out by accident and were easily filled in from other data, while in two cases data were missing and could not be obtained: nitrogen measures were not provided for either cultivation level in month seven, block three, compaction level three and depth two. With less than 0.3% of the data missing, this small number of missing values is not a cause for concern. The missing values were accommodated in the analysis rather than approximated.

The chemical data set contained measurements at two depths, 0-10 cm and 10-20 cm, while the biological data set covered only the first depth. For the purposes of this thesis only the 0-10 cm depth is investigated, because most of the effects and behaviour are expected at this depth (which is why the biological experiment sampled only this depth).

Detailed biological and chemical interpretations of behaviour are beyond the scope of this

thesis. Analysis is dealt with from a statistical point of view in line with the objectives of

the thesis.


5.4 Data Analysis Techniques

This section provides a detailed discussion of the statistical models and techniques

applied in section 5.5 during data analysis. Features of the data set as a whole are

discussed before detailed information on the procedures used in data analysis. All

analysis was aided by the SAS statistical package (SAS Institute, 1999) and Microsoft

Excel (Microsoft Corporation, 2001).

5.4.1 Analysis Direction

There are a number of influences that have affected the direction taken in this thesis

towards data analysis. This section contains a review of how the particular direction of

analysis was chosen from the original experimental situation, data sets and other relevant

information.

The original samples could not be assumed to come from normally distributed populations. The main difficulty with assuming normality is the nature of the variables’ measurements. Many of the variables are concentrations, which are known to follow a lognormal distribution because of the way they are measured (Chaseling, personal communication). Most of the variables that were not concentrations (eg. soil moisture, rainfall) have right-skewed distributions in which larger values tend to have larger standard deviations. Such situations are common with biological data (Rao, 1998).

A natural log transformation was applied to bring all variable samples closer to normality. This type of transformation is commonly applied to achieve closer adherence to normality, particularly for concentrations and biological variables (Rao, 1998 and Zar, 1999). The log transformation is shown in Equation 5.3, where y is the original value and g(y) the transformed value. This transformation was applied to all variables considered except the chemical variables (i.e. it was applied to the biological variables and to the rainfall, temperature and moisture variables). For the chemical variables the transformation shown in Equation 5.4 was used instead, because some values were zero and the logarithm of zero is undefined (it tends to minus infinity).

$$g(y) = \log_e(y), \qquad y = e^{g(y)} \tag{5.3}$$

$$g(y) = \log_e(y + 0.1), \qquad y = e^{g(y)} - 0.1 \tag{5.4}$$
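A minimal sketch of Equations 5.3 and 5.4 and their inverses. The offset of 0.1 is our reading of Equation 5.4 and is kept as a parameter; the function names are illustrative. The plain log of Equation 5.3 fails for a zero reading, which is why the offset form is needed for the nitrogen variables.

```python
import math

OFFSET = 0.1  # constant added in Equation 5.4 (assumed reading; adjust if needed)


def log_transform(y):
    """Equation 5.3: natural log; defined only for y > 0."""
    return math.log(y)


def log_offset_transform(y, offset=OFFSET):
    """Equation 5.4: natural log after adding a small constant, so y = 0 is allowed."""
    return math.log(y + offset)


def back_transform(g, offset=0.0):
    """Invert either transformation: y = e^g - offset."""
    return math.exp(g) - offset


try:
    log_transform(0.0)
except ValueError as err:
    print("log of zero fails:", err)

print(back_transform(log_offset_transform(2.5), OFFSET))  # approximately 2.5
```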

The natural log transformation for normality was not applied directly to the dynamics and

leaching measures in the chemical data set. Rather, the transformation was applied to the

original baseline, capped and uncapped core readings (derived from the provided data

set). The dynamics and leaching measurements used in analysis were derived from these

logged original recordings.

A small error was found in a chemical data measurement while back-calculating the original capped and uncapped core measurements from the nitrogen levels, dynamics and leaching values. In month thirteen, block three, compaction level three, cultivation level one and depth two, the combination of nitrite level and nitrite dynamics measure is invalid: taken at face value, it implies a negative nitrite measurement in the capped core. Judging that the error was more likely to lie in the derivation of the dynamics measure, the dynamics value was changed to zero. Zero was chosen because it agrees mathematically with the other measures in the data set.

A number of extremely high values occur in the ammonium levels from the baseline samples in month sixteen. These outliers appear to be accurate but are completely inconsistent with the remainder of the measurements. A flow-on effect in the ammonium nitrogen dynamics shows extremely high values in month fifteen and extremely low values in month sixteen. Figures 5.3 and 5.4 summarise the extreme ammonium behaviour observed. Leaching levels were unaffected by the unusual values, implying that only baseline levels were subject to extreme values. For the purposes of analysis these extreme values (outliers) were removed, as they may lead to misleading results. The ammonium contribution to the total nitrogen variables was not changed, because the ammonium values may be part of an overall behaviour and the extreme values are similar in magnitude to nitrate measures anyway.


[Figure: Mean Chemical Nitrogen Levels. Mean nitrogen (kgN/ha, axis 0 to 60) against month (1 to 19) for nitrite, nitrate and ammonium.]

Figure 5.3: Mean mineral nitrogen levels (kgN/ha) over the nineteen months.

[Figure: Mean Chemical Nitrogen Dynamics. Mean nitrogen (kgN/ha, axis -60 to 60) against month (1 to 19) for nitrite, nitrate and ammonium.]

Figure 5.4: Mean mineral nitrogen dynamics (kgN/ha) over the nineteen months.


Nearly 76% of nitrite level readings were zero, which is reflected in the very small (where visible) nitrite values in Figures 5.3 and 5.4. Nitrite is present only in small quantities because it is an intermediate in an ongoing chemical reaction (Blumfield, personal communication). This, combined with the large proportion of zero nitrite measures, suggests that meaningful results are unlikely. Therefore nitrite was not analysed in isolation in this thesis.

As introduced in section 5.2, the original design was a randomised complete block (RCB) design with blocks based on small differences in slope. These subtle slope differences would not be expected to be a source of variation for any variable, which runs counter to the rationale of an RCB design. A completely randomised design (CRD) may therefore have been more appropriate statistically, although possibly impractical.

The experimental situation for each variable is an RCB design repeated at a number of different times. It is a repeated measures over time situation because each variable is recorded at a number of different times. The same variable recorded at exactly the same location each month is likely to show correlation between measures, meaning that ‘month’ cannot simply be included as a factor in an RCB design. Two main techniques are used in this thesis to deal with the repeated measures nature of this experiment: split plot and MANOVA. Both techniques are variants of ANOVA suitable for the analysis of repeated measures. For more information on split plot designs and MANOVA refer to section 2.3.

Twelve different variables are investigated in this thesis. Nine chemical variables result

from combinations of the three nitrogen variables nitrate, ammonium and total nitrogen

with the three types of measures levels, dynamics and leaching. The three biological

variables are microbial carbon, microbial nitrogen and the carbon to nitrogen ratio. Each

variable is investigated separately in line with the focus of this thesis on analysis over

time. Analysis including a number of these variables, particularly at a specific time, is a

possibility for future analysis. Sections 5.4.2 through to 5.4.8 detail the techniques and

models applied to every variable.

Techniques developed specifically for time series analysis are, for the most part, not suited to this experimental situation, mainly because the time series involved are very short, with at most nineteen time periods. Series this short can rarely support accurate univariate time series models (Makridakis et al., 1998). In particular, short time series make it difficult, if not impossible, to estimate seasonal effects. More troubling still, multivariate versions of time series techniques require even longer base series (McCleary and Hay, 1980; Franses, 1998). With at most nineteen time periods, accurate multivariate time series models are unlikely to be achievable.

Seasons were created from the original months as an additional factor for analysis, on the suspicion that the behaviour of variables may differ between seasons. Furthermore, individual months may contain a large amount of noise or random variation, which a season based approach can partly control. Five appropriate seasons were identified from the original nineteen months; they are shown in detail, along with the original time periods, in Appendix C. In summary, months two to four form season one (autumn 2000), months five to seven form season two (winter 2000), months nine to eleven form season three (spring 2000), months twelve to fourteen form season four (summer 2000/2001), and months fifteen to seventeen form season five (autumn 2001). Only the first four seasons apply to the biological data set because data were recorded for only fourteen months. Note that four months of data (months one, eight, eighteen and nineteen) are lost from the original nineteen in the conversion to seasons, because the 28-day sampling ‘months’ do not align exactly with the seasons.
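The month-to-season mapping above, and the within-season averaging used for the overall MANOVA models (section 5.4.5), can be sketched as follows. Skipping missing values rather than imputing them is our assumption for illustration, the names are hypothetical, and the values are placeholders rather than study data.

```python
# Month-to-season mapping (months 1, 8, 18 and 19 are unused).
SEASON_MONTHS = {
    1: (2, 3, 4),     # autumn 2000
    2: (5, 6, 7),     # winter 2000
    3: (9, 10, 11),   # spring 2000
    4: (12, 13, 14),  # summer 2000/2001
    5: (15, 16, 17),  # autumn 2001
}
SEASON_OF_MONTH = {m: s for s, months in SEASON_MONTHS.items() for m in months}


def season_means(monthly):
    """Average one plot's monthly values within each season.

    `monthly` maps month number -> value; None marks a missing value and is
    skipped, so a season mean uses whichever of its months are present.
    """
    sums, counts = {}, {}
    for month, value in monthly.items():
        season = SEASON_OF_MONTH.get(month)
        if season is None or value is None:
            continue
        sums[season] = sums.get(season, 0.0) + value
        counts[season] = counts.get(season, 0) + 1
    return {s: sums[s] / counts[s] for s in sorted(sums)}


chemical = {m: float(m) for m in range(1, 20)}    # 19 months (placeholder values)
biological = {m: float(m) for m in range(1, 15)}  # 14 months (placeholder values)
print(season_means(chemical))    # {1: 3.0, 2: 6.0, 3: 10.0, 4: 13.0, 5: 16.0}
print(season_means(biological))  # {1: 3.0, 2: 6.0, 3: 10.0, 4: 13.0}
```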

All chemical variables were analysed in kilograms of nitrogen per hectare (kgN/ha). In the biological data set, the carbon to nitrogen variable is a ratio while the other two variables are measured in micrograms per gram (µg/g). Logged values were used during analysis, but these transformed values are not quoted in the results. Instead, back transformed means and standard errors are quoted where appropriate. Back transformation simply reverses the original log transformation. A back transformed mean is commonly called an ‘equivalent mean’ (Chaseling, personal communication). It generally differs from the arithmetic mean of the untransformed values: for the plain log transformation of Equation 5.3 it is the geometric mean, which never exceeds the arithmetic mean.
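The sketch below computes an equivalent mean, assuming it is the back transform of the mean of the logged values (with the Equation 5.4 offset when one was used). For the plain log this is the geometric mean, which is why it sits below the arithmetic mean for right-skewed data. The example values are invented, the function names are illustrative, and back transformation of standard errors is not attempted.

```python
import math


def equivalent_mean(values, offset=0.0):
    """Back transformed mean: exp(mean(log(y + offset))) - offset."""
    logged = [math.log(y + offset) for y in values]
    return math.exp(sum(logged) / len(logged)) - offset


def arithmetic_mean(values):
    """Ordinary mean of the untransformed values."""
    return sum(values) / len(values)


skewed = [1.0, 10.0, 100.0]  # hypothetical right-skewed readings
print(arithmetic_mean(skewed))  # 37.0
print(equivalent_mean(skewed))  # close to 10.0 (the geometric mean)
print(equivalent_mean([0.0, 0.9, 9.9], offset=0.1))  # close to 0.9
```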

A standard summary notation is used throughout to represent different factor levels. This

notation is for ease of analysis and brevity of reporting analysis results. The original data


set provided utilised a similar scheme. The notation used in this chapter is summarised in

Table 5.2.

Factor | Level | Symbol
Compaction | No compaction (zero pass) | 1
Compaction | One pass | 2
Compaction | Sixteen pass | 3
Cultivation | None | 1
Cultivation | Disc plough | 2

Table 5.2: Standard summary notation for factor levels.

Hypothesis tests are conducted frequently in the data analysis, and results are quoted with p values. Asterisks indicate the level of significance of these p values, as shown in Table 5.3.

Symbol | Interpretation
* | p < 0.05
** | p < 0.01
*** | p < 0.001

Table 5.3: Legend for symbols denoting significance.
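The legend in Table 5.3 amounts to a simple threshold rule, sketched here with an illustrative function name; p values at or above 0.05 receive no symbol.

```python
def significance_stars(p):
    """Map a p value to the asterisk notation of Table 5.3."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


print([significance_stars(p) for p in (0.2, 0.03, 0.004, 0.0002)])  # ['', '*', '**', '***']
```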

5.4.2 Exploratory Data Analysis (EDA)

Exploratory data analysis for each variable involved graphing the variable over time for all six treatments; that is, the variable is graphed over the nineteen (or fourteen) time periods for each compaction and cultivation combination. Values for each combination at each time are averaged over the three blocks, since in an RCB design blocks should not interact with treatments.

To smooth the appearance of these graphs over time, a three month moving average (3MA) was applied to every variable graph. Each smoothed monthly value is the average of that month, the month before and the month after. Hence the smoothed graphs have no values for the first and last months (because there is no month before the first or after the last). These smoothed graphs were generally easier to interpret than the unsmoothed graphs.
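A minimal sketch of the 3MA smoothing described above (the function name is illustrative). Each interior month becomes the mean of itself and its two neighbours, and the first and last months are left empty.

```python
def centred_moving_average(series, window=3):
    """Centred moving average; ends without a full window are left as None."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centred")
    half = window // 2
    smoothed = [None] * len(series)
    for t in range(half, len(series) - half):
        smoothed[t] = sum(series[t - half:t + half + 1]) / window
    return smoothed


print(centred_moving_average([2.0, 4.0, 9.0, 1.0, 5.0]))  # [None, 5.0, 4.666666666666667, 5.0, None]
```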


A standard notation was used to represent compaction and cultivation levels over time in

exploratory data analysis graphs. This notation is shown in Figure 5.5, where the first

value is the level of compaction and the second is the level of cultivation. All graphs are

titled as being ‘by compaction, cultivation’, a reference to the notation symbolising

compaction first and the cultivation second.

Figure 5.5: Graphical notation for compaction and cultivation options.

5.4.3 Correlation Analysis

Correlation is examined in detail to see whether each variable is related to the physical measures of rainfall, temperature and moisture. Correlation was calculated using Pearson’s correlation coefficient on transformed variable values (see section 5.4.1). Cross correlation functions were used to show each variable’s correlation with a number of lags of the physical measures. Lags up to and including four were applied to allow for a delayed effect of up to four months. Where cross correlation functions are included, only positive lags are shown because the reverse relationships (eg. that a variable affects rainfall) do not make practical sense.
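The lag structure can be sketched in plain Python, assuming each series has already been reduced to one value per month. This uses a simple Pearson correlation on the overlapping months at each lag; time series software such as SAS often normalises cross correlations slightly differently, so values may not match exactly. The names and data are invented for illustration.

```python
def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lagged_correlations(variable, physical, max_lag=4):
    """Correlate variable[t] with physical[t - lag] for lag = 0..max_lag.

    Only non-negative lags are computed: the physical measure may lead the
    variable, never the reverse. Each extra lag loses one pair of months.
    """
    n = len(variable)
    return {lag: pearson(variable[lag:], physical[:n - lag]) for lag in range(max_lag + 1)}


# Hypothetical monthly rainfall and a variable that echoes it two months later.
rainfall = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
response = [0.0, 0.0] + rainfall[:-2]
for lag, r in lagged_correlations(response, rainfall).items():
    print(lag, round(r, 3))
```

With nineteen (or fourteen) months, a lag of four leaves only fifteen (or ten) pairs, so correlations at longer lags rest on progressively less data.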

For soil moisture, correlation is calculated by comparing moisture with the chosen

variable from every soil sample. This was done because every soil sample has a unique

soil moisture measure. The soil moisture quantity used was different depending on the

data set (see section 5.1). The moisture quantity used for each variable was the one given

in the appropriate data set.

Correlation coefficients involving temperature and rainfall were calculated using average variable values for each month. The reason is that hypothesis testing on correlation requires that variable values are selected at random from normally distributed populations (Zar, 1999). This assumption clearly cannot be satisfied for rainfall and temperature when the same nineteen (or fourteen) monthly measures are compared many times. Indeed, given the form of the correlation coefficient (see section 2.1.2), variable values in months with particularly high or low rainfall or temperature could be expected to influence the correlation result very strongly.

5.4.4 Overall Split Plot Designs

Split plot designs over time can be applied to the Yarraman forestry data in both the

chemical and biological data sets. This section contains the models and information

pertaining to the use of these models for the twelve separate variables retrieved from the

data sets. Split plot models are discussed in detail in section 2.3.2.

The purpose of each model was to investigate the effects of treatments, blocks and seasons simultaneously. In both data sets there is a repeated measures situation, where a number of measures have been recorded repeatedly over time. Therefore, the split plot design applied is a form of ‘split plot over time’. The design involves two splits: the main plot contains compaction, cultivation and blocks, the subplot contains seasons and the ‘sub-subplot’ contains months within seasons.

The main plot tests the cultivation, compaction and block main effects along with the cultivation by compaction interaction for significance. The main plot error term comprises block interactions. All tests of significance for factorial effects in the main plot use values averaged over time, which removes the effect of any correlation between measures at different times. All factorial effects confined to the subplot share a common correlation from repeated measures over season and a random, independent error, which allows them to be compared validly in tests of significance. The subplot error term is formed by season and block interactions.

The sub-subplot contains replicates formed by months within season. Because it contains only replication (and interactions with replication), the sub-subplot forms one large third error term that is not used for significance testing of any factorial effect.

Equation 5.5 presents the base form of the overall split plot model, where each factorial effect is tested for significance using the error term to its right. The three separate error terms ($\varepsilon^{1}_{ijk}$, $\varepsilon^{2}_{ijkl}$ and $\varepsilon^{3}_{ijklm}$) are formed from block interactions that are not shown explicitly in this form.

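$$\text{Measure}_{ijklm} = \mu + \text{Comp}_i + \text{Cult}_j + (\text{Comp}\times\text{Cult})_{ij} + \text{Block}_k + \varepsilon^{1}_{ijk} + \text{Season}_l + (\text{Season}\times\text{Comp})_{il} + (\text{Season}\times\text{Cult})_{jl} + (\text{Season}\times\text{Comp}\times\text{Cult})_{ijl} + \varepsilon^{2}_{ijkl} + \varepsilon^{3}_{ijklm} \tag{5.5}$$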

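Where:

• i = 1, 2, 3 (compaction level indicator).

• j = 1, 2 (cultivation level indicator).

• k = 1, 2, 3 (block level indicator).

• l = 1, 2, 3, 4, 5 (season level indicator; 1 to 4 for biological variables).

• m = 1, 2, 3 (month within season indicator).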

The structure of the main plot, subplot and sub-subplot is shown in Table 5.4, along with the degrees of freedom (df) for analysis of chemical and biological variables. Degrees of freedom differ between the two data sets because different numbers of months are available for analysis. The terms used as estimates of error are as per standard RCB designs and are marked with an asterisk (*) in Table 5.4. Note that the main plot has ten error degrees of freedom while the subplot has 48 (chemical) or 36 (biological). The main plot error degrees of freedom are low but not low enough to cause concern.


Plot | Source of Variation | Chem. df | Biol. df
Main | Compaction | 2 | 2
Main | Cultivation | 1 | 1
Main | Compaction×Cultivation | 2 | 2
Main | Block | 2 | 2
Main | Block×Compaction (*) | 4 | 4
Main | Block×Cultivation (*) | 2 | 2
Main | Block×Compaction×Cultivation (*) | 4 | 4
Subplot | Season | 4 | 3
Subplot | Season×Compaction | 8 | 6
Subplot | Season×Cultivation | 4 | 3
Subplot | Season×Compaction×Cultivation | 8 | 6
Subplot | Season×Block (*) | 8 | 6
Subplot | Season×Compaction×Block (*) | 16 | 12
Subplot | Season×Cultivation×Block (*) | 8 | 6
Subplot | Season×Compaction×Cultivation×Block (*) | 16 | 12
Sub-subplot | (Month:Season) (*) | 10 | 8
Sub-subplot | (Month:Season)×Compaction (*) | 20 | 16
Sub-subplot | (Month:Season)×Cultivation (*) | 10 | 8
Sub-subplot | (Month:Season)×Compaction×Cultivation (*) | 20 | 16
Sub-subplot | (Month:Season)×Block (*) | 20 | 16
Sub-subplot | (Month:Season)×Compaction×Block (*) | 40 | 32
Sub-subplot | (Month:Season)×Cultivation×Block (*) | 20 | 16
Sub-subplot | (Month:Season)×Comp.×Cult.×Block (*) | 40 | 32

Table 5.4: Structure and df in overall split plot ANOVA designs.

The degrees of freedom given in Table 5.4 assume full and complete data for each variable. This is not always the case, as some chemical data were unavailable or had been removed, so the actual degrees of freedom during analysis were at times lower. The reduction usually appeared in the months within season replication terms, which are not used in hypothesis testing anyway.
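The df in Table 5.4 follow from a few products of (levels − 1), which gives a quick check on the table and on the error df quoted in the text. This is a sketch under the complete-data assumption stated above; the function and key names are our own, and missing or removed data would lower the actual values.

```python
from math import prod


def split_plot_df(n_seasons, months_per_season=3, compaction=3, cultivation=2, blocks=3):
    """Degrees of freedom by stratum for the overall split plot model (Equation 5.5),
    assuming complete data for every plot in every month."""
    n_cells = compaction * cultivation        # treatment combinations
    n_plots = prod((compaction, cultivation, blocks))
    treatments = n_cells - 1                  # Comp + Cult + Comp x Cult
    block = blocks - 1
    season = n_seasons - 1
    df = {
        "main treatments": treatments,
        "main block": block,
        "main error": block * treatments,            # block interactions
        "subplot season": season,
        "subplot season x treatments": season * treatments,
        "subplot error": season * block * n_cells,   # season x block interactions
        "sub-subplot error": n_seasons * (months_per_season - 1) * n_plots,
    }
    total = n_plots * n_seasons * months_per_season - 1
    assert sum(df.values()) == total, "strata must partition the total df"
    return df


print(split_plot_df(n_seasons=5))  # chemical variables
print(split_plot_df(n_seasons=4))  # biological variables
```

Calling `split_plot_df(n_seasons=3, months_per_season=1)` treats the three months of one season as the time factor, which reproduces the df of the season based design in section 5.4.6 (ten main plot error df and 24 subplot error df).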

As with most split plot designs, factorial effects are tested for significance using a standard ANOVA F-test, except that different error mean squares form the denominator for different factorial effects. All factorial effects in the main plot are tested against the main plot error, and those in the subplot against the subplot error.


Hypothesis testing in these split plot designs involves means. For example, a test of significance for ‘compaction’ tests the null hypothesis that the compaction means are equal against the alternative that not all means are equal. Because they are tested on values averaged over time, the main plot results present an ‘overall’ picture and are effectively very similar to the earlier cumulative analyses of Blumfield et al. (2002).

Should season have a significant interaction with compaction or cultivation (or their interaction), further interpretation is sought by considering analyses for each season. If there are no significant interactions involving season, multiple comparison tests are conducted to reveal the nature of the differences. For season based analyses refer to section 5.4.6, and for multiple comparison tests refer to section 5.4.8.

5.4.5 Overall MANOVA Designs

The multivariate analysis of variance (MANOVA) is one way to deal with repeated

measures situations such as found in the chemical and biological data sets. This section

contains the MANOVA models and associated important information for the application

of MANOVA to these data sets. MANOVA models are discussed in detail in section

2.3.3.

In the context of this application, MANOVA offers less flexibility than split plot analysis.

In particular, MANOVA does not provide an indication of equality of means for the

factor over which repeated measures were taken. A significant MANOVA result may be

a reflection of an overall effect or interactions involving the factor over which the

repeated measures are taken. Split plot designs evaluate both of these possibilities

separately.

The models in this section are the MANOVA equivalents of the split plot models presented in section 5.4.4. If every month were included in MANOVA, there would be nineteen (or fourteen) dependent variables. This exceeds the ten available error degrees of freedom, so models of this type cannot be fitted: with more dependent variables than error degrees of freedom, the error sums of squares and cross products matrix is singular and no feasible estimate of natural variation exists. Therefore, the overall MANOVA models use seasons but do not include the detail of months.

To analyse seasons without the month detail, the data in each season were averaged over its three months. This created a data set with four or five time periods (the seasons) and no month components: four seasons for biological variables and five for chemical variables, owing to the original number of sampled months.

The general form of the models used is shown in Equation 5.6. Each model is for a particular variable (eg. total mineral nitrogen dynamics or microbial nitrogen level). Within each model there are five (or four) dependent variables, one for each season. Beyond this, the model resembles a standard ANOVA model. The error term is formed from treatment and block interactions as per standard RCB designs.

$$\text{Measure}_{ijkm} = \mu_m + \text{Compaction}_{im} + \text{Cultivation}_{jm} + (\text{Compaction}\times\text{Cultivation})_{ijm} + \text{Block}_{km} + \varepsilon_{ijkm} \tag{5.6}$$

Where:

• $\text{Measure}_{ijkm}$ refers to each variable measure at a particular compaction level i, cultivation level j, block k and season m.

• $\mu_m$ is the mean level for the variable being analysed at season m.

• i = 1, 2, 3 (compaction level indicator).

• j = 1, 2 (cultivation level indicator).

• k = 1, 2, 3 (block level indicator).

• m = 1, 2, 3, 4, 5 (season level indicator; four seasons in the case of biological variables).

• $\varepsilon_{ijkm}$ represents the natural variation or error term, formed from the interaction of the block with the other model components.

Four common test statistics are used in MANOVA: Wilks’ lambda, Roy’s largest root, the Hotelling-Lawley trace and Pillai’s trace (Zar, 1999). Pillai’s trace tends to be the most robust to deviations from strict MANOVA assumptions (Zar, 1999). Roy’s largest root is not usually considered in isolation because the F statistic derived from it is only an upper bound, so its p value is a lower bound and may overstate significance.
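All four statistics are functions of the eigenvalues of E⁻¹H, where H and E are the hypothesis and error sums of squares and cross products matrices. The sketch assumes those eigenvalues have already been obtained (for example from a linear algebra library) and uses invented values. Conventions for Roy’s root vary: some software reports the largest eigenvalue itself, as here, while other software reports λ/(1 + λ).

```python
import math


def manova_statistics(eigenvalues):
    """Four MANOVA test statistics from the eigenvalues of E^-1 H.

    H is the hypothesis and E the error sums of squares and cross products
    matrix; with E positive definite the eigenvalues are real and >= 0.
    """
    lams = [float(v) for v in eigenvalues]
    if any(v < 0 for v in lams):
        raise ValueError("eigenvalues of E^-1 H should be non-negative")
    return {
        "wilks_lambda": math.prod(1.0 / (1.0 + v) for v in lams),
        "pillai_trace": sum(v / (1.0 + v) for v in lams),
        "hotelling_lawley_trace": sum(lams),
        "roys_largest_root": max(lams),
    }


print(manova_statistics([1.0, 0.25]))
# Wilks 0.4, Pillai 0.7, Hotelling-Lawley 1.25, Roy 1.0 (up to rounding)
```

Small values of Wilks’ lambda, and large values of the other three statistics, point towards rejecting the null hypothesis.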


Hypothesis tests in MANOVA differ from those in split plot models. In MANOVA, the hypotheses combine a number of univariate ANOVA hypotheses. For example, the test of significance for ‘compaction’ tests that the variable means for the three compaction levels are the same in every season (the same in season one, the same in season two, etc.). The alternative hypothesis is that the mean variable level differs between at least two compaction levels in at least one season. Because this conclusion is vague, season based MANOVA models (section 5.4.7) are chosen as the next analysis step if significant relationships are found.

5.4.6 Season Based Split Plot Designs

The purpose of season based split plot designs is to investigate the behaviour of treatments within each separate season. These designs are only relevant if the overall split plot designs reveal significant interactions with season, or if MANOVA significances need to be investigated in more detail. These models test for relationships within particular seasons without consideration of overall behaviour.

For each variable, should season based split plot models be used, five (or four) different models are run, one for each season. Each model is a basic ‘split plot over time’ where treatments and the block are in the main plot and time related factorial effects are in the subplot. Note that month, rather than season, is now the time unit. Split plot models are discussed in detail in section 2.3.2.

The season based model is shown in Equation 5.7. The month index m takes three values because each season contains three months. The two separate error terms ($\varepsilon^{(1)}_{ijk}$ and $\varepsilon^{(2)}_{ijkm}$) are formed from block interactions that are not explicitly shown in this model form.

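$$\begin{aligned} \text{Measure}_{ijkm} = {} & \mu + \text{Comp}_i + \text{Cult}_j + (\text{Comp} \times \text{Cult})_{ij} + \text{Block}_k + \varepsilon^{(1)}_{ijk} \\ & + \text{Month}_m + (\text{Month} \times \text{Comp})_{im} + (\text{Month} \times \text{Cult})_{jm} + (\text{Month} \times \text{Comp} \times \text{Cult})_{ijm} + \varepsilon^{(2)}_{ijkm} \end{aligned} \qquad (5.7)$$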


Where:




• m = 1, 2, 3 (month level indicator).

The structure of the main and subplots is shown in Table 5.5, along with the expected number of degrees of freedom for analysis of chemical and biological variables. In this case the degrees of freedom are identical for the chemical and biological data sets because every season contains the same number of months, irrespective of the data set. The terms used as estimates of error are as per standard RCB designs. Note that the main plot has ten error degrees of freedom while the subplot has 24. The main plot error degrees of freedom are modest, but not low enough to be considered a problem.

| Plot | Source of Variation | Chem. df | Biol. df |
|---|---|---|---|
| Main | Compaction | 2 | 2 |
| Main | Cultivation | 1 | 1 |
| Main | Compaction×Cultivation | 2 | 2 |
| Main | Block | 2 | 2 |
| Main | Block×Compaction (*) | 4 | 4 |
| Main | Block×Cultivation (*) | 2 | 2 |
| Main | Block×Compaction×Cultivation (*) | 4 | 4 |
| Subplot | Month | 2 | 2 |
| Subplot | Month×Compaction | 4 | 4 |
| Subplot | Month×Cultivation | 2 | 2 |
| Subplot | Month×Compaction×Cultivation | 4 | 4 |
| Subplot | Month×Block (*) | 4 | 4 |
| Subplot | Month×Compaction×Block (*) | 8 | 8 |
| Subplot | Month×Cultivation×Block (*) | 4 | 4 |
| Subplot | Month×Compaction×Cultivation×Block (*) | 8 | 8 |

(*) Terms pooled to form the error term for that plot level.

Table 5.5: Structure and df in seasonal split plot ANOVA designs.
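The degrees of freedom in Table 5.5 follow directly from the numbers of factor levels. The Python sketch below reproduces them using the standard split plot rules, assuming three compaction levels, two cultivation levels, three blocks and three months per season as in this design; the function name is hypothetical.

```python
def seasonal_split_plot_df(n_comp=3, n_cult=2, n_block=3, n_month=3):
    """Degrees of freedom for the season based split plot model (Equation 5.7)."""
    a, b, r, t = n_comp - 1, n_cult - 1, n_block - 1, n_month - 1
    return {
        # Main plot: treatments and block.
        "Compaction": a,
        "Cultivation": b,
        "Compaction x Cultivation": a * b,
        "Block": r,
        # Main plot error: block by treatment interactions, pooled.
        "Main plot error": r * (a + b + a * b),
        # Subplot: month and its interactions with the treatments.
        "Month": t,
        "Month x Compaction": t * a,
        "Month x Cultivation": t * b,
        "Month x Compaction x Cultivation": t * a * b,
        # Subplot error: month by block (and by block x treatment) terms, pooled.
        "Subplot error": t * r * (1 + a + b + a * b),
    }

df = seasonal_split_plot_df()
print(df["Main plot error"], df["Subplot error"], sum(df.values()))
```

This prints 10, 24 and 53; the last value is the total degrees of freedom for 3 × 2 × 3 × 3 = 54 observations, which is a quick check that no term has been missed.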

Hypothesis tests in these split plot designs concern treatment means. Should significant differences be found, their exact nature is investigated further using multiple comparison tests, as described in section 5.4.8.


5.4.7 Season Based MANOVA Designs

Season based MANOVA designs are used to investigate treatment and other effects within each separate season. As with the split plot equivalent in section 5.4.6, these tests are only relevant if the overall MANOVA produced significant results or the overall split plot designs revealed significant interactions with season.

Four or five season based MANOVA models are run for each variable depending on the

number of seasons. Each model contains three dependent variables, one for each month.

MANOVA models are discussed in detail in section 2.3.3.

The season based model is shown in Equation 5.8. The error term is formed from the

interaction of treatments and the block as per standard RCB designs.

$$\text{Measure}_{ijkm} = \mu_m + \text{Compaction}_{im} + \text{Cultivation}_{jm} + (\text{Compaction} \times \text{Cultivation})_{ijm} + \text{Block}_{km} + \varepsilon_{ijkm} \qquad (5.8)$$

Where:

• $\text{Measure}_{ijkm}$ refers to each variable measure at a particular compaction level i, cultivation level j, block k and month m.

• $\mu_m$ is the mean level for the variable being analysed at month m.




• m = 1, 2, 3 (month level indicator).

• $\varepsilon_{ijkm}$ represents the natural variation or error term. This is formed from the interaction of the block with the other model components.

When a factorial effect is significant in MANOVA, the conclusion is rather vague. For example, should the ‘compaction’ factor be significant in the season based MANOVA model presented here, the conclusion is that the mean variable level differs significantly between at least two compaction levels during at least one month. This conclusion should be investigated further using multiple comparison tests. In MANOVA this is achieved by reverting to univariate ANOVA models (one per month) and applying standard multiple comparison tests. More information on the multiple comparison tests used in this analysis is provided in section 5.4.8.

5.4.8 Multiple Comparison Tests

Multiple comparison tests were used to find exactly where differences lie, should significant differences be found between factorial effect levels. For example, if compaction is found to be significant in a model, multiple comparison tests can identify exactly which compaction levels differ significantly from each other.

Multiple comparison tests are conducted in this thesis using pairwise t-tests, though other methods may be just as appropriate. Fundamentally, a multiple comparison test compares two means for significant differences. Where software was not available to automate multiple comparison tests, they were done manually using the least significant difference (LSD) formula for pairwise t-tests in Equation 5.9. Two means are declared significantly different when the absolute difference between them exceeds the LSD.

$$LSD = t_{\text{Significance Level},\,edf} \times s \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad (5.9)$$

Where:

• $t_{\text{Significance Level},\,edf}$ is the two-tailed critical value of Student’s t distribution at the chosen significance level with edf degrees of freedom.

• edf is the error degrees of freedom.

• s is an approximation of the error standard deviation (commonly the ‘root error mean square’).

• n1 and n2 are the numbers of observations in the two samples being compared for equality of means.
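The Python sketch below applies Equation 5.9 directly. It is an illustration, not the procedure used in this thesis: the critical t value is passed in rather than computed (it could be read from statistical tables or obtained from `scipy.stats.t.ppf(1 - alpha / 2, edf)`), and all numbers in the usage lines are hypothetical. Because analysis was carried out on logged values, the means being compared would presumably be on the log scale.

```python
import math

def least_significant_difference(t_crit, s, n1, n2):
    """LSD from Equation 5.9.

    t_crit: two-tailed critical t value for the chosen significance level
            and the error degrees of freedom.
    s:      root error mean square.
    n1, n2: numbers of observations behind the two means.
    """
    return t_crit * s * math.sqrt(1.0 / n1 + 1.0 / n2)

def significantly_different(mean1, mean2, lsd):
    """Two means differ significantly if their absolute difference exceeds the LSD."""
    return abs(mean1 - mean2) > lsd

# Hypothetical values: t_crit = 2.228 is approximately t(0.975, 10 df).
lsd = least_significant_difference(t_crit=2.228, s=0.5, n1=4, n2=4)
print(round(lsd, 3), significantly_different(3.3, 2.4, lsd))
```

Computing one LSD per pair of sample sizes, rather than running a separate t-test for every pair, is what made manual comparison practical where software was unavailable.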

Often a full investigation involves a large number of multiple comparison tests. Using a standard allowable error rate of 0.05, about one in twenty tests of a true null hypothesis will return a significant result purely by chance. A common approach to combat this problem, used in this thesis, is the Bonferroni approach. The Bonferroni approach involves a simple modification to the significance level based on the number of multiple comparison tests involved: the standard allowable error level is divided by the number of multiple comparison tests to take place. For example, given a standard allowable error of 0.05 and twenty multiple comparison tests, the new allowable error is 0.0025 (0.05 / 20).
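A minimal Python sketch of the Bonferroni modification follows. The function names and the p values in the example are hypothetical and serve only to show the arithmetic.

```python
def bonferroni_alpha(alpha, n_tests):
    """Bonferroni-adjusted significance level: alpha divided by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p values remain significant after the Bonferroni modification."""
    adjusted = bonferroni_alpha(alpha, len(p_values))
    return [p < adjusted for p in p_values]

print(bonferroni_alpha(0.05, 20))
# Hypothetical p values from three pairwise comparisons.
print(bonferroni_significant([0.012, 0.03, 0.2]))
```

With three comparisons the adjusted level is 0.05 / 3, roughly 0.0167, so only the first p value is flagged. The approach is conservative: it controls the chance of any false positive across the family of tests at the cost of some power.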

All graphs showing treatment means and multiple comparison results follow a standard format. The error bars above and below each mean each extend one standard error. Each mean is assigned at least one letter, and means that share a letter are not significantly different. Where multiple months are shown on one diagram, each month is considered separately. All means and standard errors shown are calculated by back transformation from the logged values used in analysis (see section 5.4.1).
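Section 5.4.1 defines the transformation actually used; the Python sketch below assumes natural logarithms with no added constant, which is an assumption made only for illustration. Under that assumption the back transformed mean is exp of the mean log, which estimates a geometric mean rather than the arithmetic mean, and the one standard error bars become asymmetric on the original scale. The function name and input values are hypothetical.

```python
import math

def back_transform(mean_log, se_log):
    """Back-transform a mean and one-standard-error bars from the natural log scale.

    Assumes natural logarithms with no offset; if a different base or an
    offset such as log(x + 1) was used, that transformation must be inverted
    instead. Returns (lower bar, back-transformed mean, upper bar).
    """
    centre = math.exp(mean_log)
    lower = math.exp(mean_log - se_log)
    upper = math.exp(mean_log + se_log)
    return lower, centre, upper

# Hypothetical logged mean and standard error, for illustration only.
low, mid, high = back_transform(math.log(20.0), 0.1)
print(round(low, 2), round(mid, 2), round(high, 2))
```

Here the lower bar sits about 1.9 below the mean and the upper bar about 2.1 above it, which is why back transformed error bars should not be read as symmetric.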


5.5 Data Analysis and Results

Full analysis of each chemical and biological variable is contained in this section. Each of the twelve variables is analysed separately using exploratory data analysis, investigation of correlation and the application of split plot and MANOVA models. The technicalities and model specifics for these applications are reviewed in section 5.4.

5.5.1 Nitrate Levels

This section investigates what influences nitrate levels. Raw data, graphs and results are contained in Appendix E.

Exploratory data analysis of chemical nitrate levels revealed a rather confused intermingling of treatments over time. In the early months, nitrate levels appear to be lowest where sixteen compactions were applied and highest under one compaction, irrespective of cultivation. Beyond the initial months, though, clear relationships faded, and towards the end the no compaction, no cultivation treatment tended to have the lowest nitrate levels and sixteen compactions with disc plough cultivation the highest. Applying a moving average smoother accentuated these relationships (or rather, the lack thereof). Both the raw and smoothed graphs showed that nitrate levels increased gradually over time. The exploratory data analysis revealed that treatment effects clearly differ over time, as treatments showed no consistent trends.
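The moving average smoother can be sketched in Python as below. The window length of three months is an assumption for illustration, since the window used is not stated here, and the input series is hypothetical. Each treatment’s monthly series would be smoothed separately before plotting.

```python
def moving_average(series, window=3):
    """Simple moving average smoother for one treatment's monthly series.

    Each output value is the mean of `window` consecutive observations, so
    the smoothed series is window - 1 values shorter than the input.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the series length")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Hypothetical monthly nitrate values for one treatment.
print(moving_average([12.0, 18.0, 15.0, 21.0, 24.0], window=3))
```

This returns [15.0, 18.0, 20.0]: the month to month jumps are damped, which is why smoothing makes gradual trends, such as the rise in nitrate over time, easier to see.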

Investigating the correlation of nitrate levels with rainfall, maximum temperature, minimum temperature and soil moisture revealed only one significant correlation. Soil moisture was found to have a significant correlation with nitrate levels in the following month (p = 0.032). This correlation was weak (r = 0.11917) and could be regarded as spurious, given that twenty correlation tests were carried out at a significance level of 0.05.
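A lagged correlation pairs a covariate in one month with the response a fixed number of months later. The Python sketch below shows the idea; the series are hypothetical, and reading the twenty tests as four covariates at lags of zero to four months is an assumption consistent with the lags reported in this chapter. Significance would then be assessed with the usual test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ on n − 2 degrees of freedom, which is not computed here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(covariate, response, lag):
    """Correlate the covariate in month t with the response in month t + lag."""
    if len(covariate) != len(response):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(covariate) - 2:
        raise ValueError("lag must leave at least two paired observations")
    x = covariate[:len(covariate) - lag]
    y = response[lag:]
    return pearson_r(x, y)

# Hypothetical monthly soil moisture and nitrate series.
soil_moisture = [20.0, 25.0, 18.0, 30.0, 22.0, 27.0]
nitrate = [10.0, 11.0, 14.0, 9.0, 16.0, 12.0]
for lag in range(5):  # lags of zero to four months for one covariate
    print(lag, round(lagged_correlation(soil_moisture, nitrate, lag), 3))
```

Longer lags leave fewer paired months, so correlations at large lags rest on little data and are especially prone to being spurious.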

Overall split plot and MANOVA designs found an interaction between season and compaction level. In particular, the split plot design had a strongly significant season by compaction interaction (p < 0.0001). This indicates that the effect of compaction level depends on the season. The overall MANOVA model reported the compaction effect as significant for Roy’s largest root (p = 0.0153) and close to significant (p < 0.10) for all of the other test statistics. As previously discussed, significant effects in MANOVA can reflect either an overall effect or an interaction between that effect and the factor over which repeated measures were taken. In this case, given the significant split plot interaction, the MANOVA result reflects an interaction of compaction with season. Because of this significant interaction, main effects are not investigated, as the results may be misleading. Instead, each season is examined using season based models to investigate behaviour within that season.

Strong significant relationships involving compaction were found in the first season, which covers months two to four. Compaction was significant in both the split plot (p = 0.0007) and MANOVA (p < 0.05 for all four test statistics) season based models in the first season. Multiple comparison tests found significantly lower amounts of nitrate under sixteen pass compaction than under the other two compaction levels (both p < 0.005). Figure 5.6 presents these results graphically using back transformed means and standard errors. The only other significant effect in season one was the month factor in the split plot design, indicating significant differences in mean nitrate levels between the three months in season one. Month differences are not a priority for analysis and are therefore not investigated further.


Nitrate Levels By Compaction in Season 1 [plot: Mean Nitrate (kgN/ha) by Season; legend: None, 1 Pass, 16 Pass]

Figure 5.6: Back transformed means (± S.E.) for compaction effects on mean nitrate

levels in season one.

The significant compaction effect is also present in the second season, which covers

months five to seven. In this instance the compaction effect is not as strong but still

significant in both the split plot (p = 0.0277) and MANOVA (p < 0.05 for three of the

four test statistics) season based designs. At the 0.05 significance level, sixteen pass

compaction once again leads to significantly different mean nitrate levels compared to the

other two compaction levels (p < 0.05). However, using the Bonferroni modified

significance level for multiple comparison tests, only the sixteen pass and zero pass

compaction levels are significantly different (p = 0.012). Back transformed means,

standard errors and significant differences are shown in Figure 5.7. No other factorial

effects were significant in the season based split plot and MANOVA designs in season

two.


Nitrate Levels By Compaction in Season 2 [plot: Mean Nitrate (kgN/ha) by Season; legend: None, 1 Pass, 16 Pass]

Figure 5.7: Back transformed means (± S.E.) for compaction effects on mean nitrate

levels in season two.

No significant effects from compaction (or cultivation) were found in the remaining three

seasons for nitrate levels. The only significant effects were for month in the third (p =

0.0048) and fourth (p = 0.02) seasons using the split plot designs. These simply indicate

differences in mean nitrate levels between months and are not investigated further as they

are not of specific interest.

In summary:

• Compaction significantly affected nitrate levels in the first and second seasons. In the first season, sixteen pass compaction led to significantly lower nitrate levels than the other two compaction levels. In the second season, sixteen pass compaction led to significantly lower nitrate levels than zero pass compaction.

• Cultivation was not found at any point to have a significant influence on nitrate levels.

• The block was not found to have a significant influence on nitrate levels, although a block effect would be expected given the use of an RCB design.

• Months within seasons commonly had differences in mean nitrate level. This was the case in the first, third and fourth seasons.

5.5.2 Ammonium Levels

This section investigates what influences ammonium levels. Raw data, graphs and results are contained in Appendix F.

Exploratory data analysis (EDA) of ammonium levels found extreme values in month sixteen that were subsequently omitted from analysis. The largely unclear trends became slightly easier to interpret with the use of a moving average smoother. The clearest relationship seen was that, in general, ammonium levels were higher when the disc plough was used. In particular, under zero and one pass compaction, mean ammonium levels were notably low with no cultivation and notably high with use of the plough. Over time, ammonium levels changed frequently and were rather unstable.

Correlation analysis examined possible correlation of ammonium levels with rainfall, maximum temperature, minimum temperature and soil moisture. This revealed three significant correlations, all of which are suspected to be spurious. The significant correlations suggested that rainfall is associated with ammonium four months later (p = 0.004), and that soil moisture is associated with ammonium three (p = 0.025) and four (p = 0.026) months later. Should a Bonferroni modification be used on the significance level for correlation, none of these seemingly spurious relationships would be significant.

A plethora of significant effects appear in the overall split plot and MANOVA designs for ammonium levels. It is clear that, unlike nitrate, ammonium levels are influenced by cultivation. The split plot design revealed a significant season by cultivation interaction (p = 0.0319), while the MANOVA design revealed a significant cultivation effect (p = 0.0458 for all test statistics). Both overall models hint at a possible interaction of compaction and cultivation. Its exact nature is difficult to determine from the overall models, since only two of the four MANOVA test statistics found the interaction significant (p < 0.05), and although the term is significant in the split plot design (p = 0.0082), cultivation is also known to interact with season. To obtain a clearer picture of the influences on ammonium levels, season based models were evaluated.

In the first season, an interaction was found between compaction and cultivation. The

interaction was clearly significant in both the split plot (p < 0.0001) and MANOVA (p <

0.05 for all test statistics; p < 0.001 for two) season based models. Multiple comparison

tests found many significant differences in mean ammonium levels. Figure 5.8

graphically displays the following significant differences:

• At zero pass compaction, there was significantly more ammonium when the plough was used for cultivation.

• At sixteen pass compaction the opposite was true: there was significantly less ammonium when the plough was used.

• When there was no cultivation, significantly more ammonium was present under

sixteen pass compaction than the other two compaction levels.

• When there was disc plough cultivation, there was significantly more ammonium

at the zero pass compaction than the one pass compaction.


Ammonium Levels By Compaction, Cultivation in Season 1 [plot: Mean Ammonium (kgN/ha) by Season; legend: 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough]

Figure 5.8: Back transformed means (± S.E.) for compaction and cultivation effects on

mean ammonium levels in season one.

In the second season the effect of compaction and cultivation depends on the particular

month. The season based split plot design found a significant interaction between month

and compaction (p = 0.037) as well as between month and cultivation (p = 0.0221).

MANOVA also found a significant effect for cultivation (p = 0.0043 for all test statistics)

and compaction (p < 0.05 for all). MANOVA significances can be a reflection of an

overall effect or an interaction with the factor over which repeated measures are taken. In

this case it is clear from the split plot design results that the MANOVA result is an

indication of compaction and cultivation interactions with month. Note that there is no

significant compaction and cultivation interaction in season two, unlike in season one.

The exact nature of the season two compaction by month and cultivation by month interactions was investigated using multiple comparison tests within each particular month. Different compaction levels produced no significant differences in ammonium means in any month. Back transformed means and results for these compaction multiple comparison tests are shown in Figure 5.9. With respect to cultivation, the only significant difference was found in month six, where there was significantly more ammonium when the disc plough was used. Back transformed means and results for these cultivation multiple comparison tests are shown in Figure 5.10.

Ammonium Levels By Compaction in Season 2 [plot: Mean Ammonium (kgN/ha) for Months 5, 6 and 7; legend: None, 1 Pass, 16 Pass]

Figure 5.9: Back transformed means (± S.E.) for compaction effects on mean ammonium

levels in season two (each month separately).


Ammonium Levels By Cultivation in Season 2 [plot: Mean Ammonium (kgN/ha) for Months 5, 6 and 7; legend: None, Disc Plough]

Figure 5.10: Back transformed means (± S.E.) for cultivation effects on mean ammonium

levels in season two (each month separately).

The situation in the third season was similar to that in season one except the relationships

were not as strong. The split plot design found a significant interaction between

compaction and cultivation (p = 0.03). In the MANOVA design, only Roy’s largest root

had a significant interaction result (p = 0.0199) but all test statistics had p values under

0.15. Cultivation clearly has a strong effect as it was significant in both the split plot (p =

0.0006) and MANOVA (p = 0.0101 for all), but this effect is known to depend on

compaction level because of the interaction. Multiple comparison tests revealed the

nature of the interaction between compaction and cultivation. For both the zero pass and

one pass compaction levels, significantly higher ammonium levels were found when the

plough was used (as opposed to no cultivation). These multiple comparison test means

and results are shown graphically in Figure 5.11.


Ammonium Levels By Compaction, Cultivation in Season 3 [plot: Mean Ammonium (kgN/ha) by Season; legend: 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough]


Figure 5.11: Back transformed means (± S.E.) for compaction and cultivation effects on mean ammonium levels in season three.

Significant differences between months were common in the season based models. The first (p = 0.002), fourth (p < 0.0001) and fifth (p < 0.004) seasons all had significant month factors in the split plot designs. This indicates that mean ammonium levels differ between months, but this is not investigated further as it is not the focus of the investigation. Apart from month, no factorial effects were significant in seasons four and five. As was the case with nitrate levels, the effects of compaction and cultivation seem to ‘wear off’ over time.

In summary:

• Both compaction and cultivation have varying effects on ammonium levels for the first three seasons. These seasons cover months two through to eleven (excluding month eight). The exact nature of the behaviour in each of the seasons is as follows:

o In the first season, there was an interaction between compaction and cultivation. With no compaction, there was significantly more ammonium when plough cultivation was used. With sixteen pass compaction, there was significantly more ammonium when no cultivation was used. With no cultivation, significantly more ammonium was present under sixteen pass compaction than under the other two compaction levels. With disc plough cultivation, there was significantly more ammonium at zero pass compaction than at one pass compaction.

o In the second season there were interactions between compaction and month and between cultivation and month. The pattern of ammonium means across compaction levels changed from month to month, but no specific differences were significant. In the second month of the season (month six), there was significantly more ammonium when the disc plough was used (as opposed to no cultivation).

o In the third season there was an interaction between compaction and cultivation. There was significantly more ammonium with disc plough cultivation than with no cultivation for the zero and one pass compaction levels.

• The block was not found to have a significant influence on ammonium levels, although a block effect would be expected given the use of an RCB design.

• Months within seasons commonly had differences in mean ammonium level. This was the case in the first, fourth and fifth seasons.

5.5.3 Total Mineral Nitrogen Levels

This section investigates what influences total mineral nitrogen levels. Raw data, graphs and results are contained in Appendix G.

Exploratory data analysis using raw and smoothed time series for each treatment showed different behaviour at different times. Early on, one pass compaction with disc plough cultivation generally resulted in higher total mineral nitrogen levels. The other treatment combinations were more or less interchangeable, with sixteen pass compaction tending to have the lowest total mineral nitrogen levels (under both cultivation levels). The central months visibly lack distinctive patterns or consistency from month to month. In the later months, sixteen pass compaction with disc plough cultivation tended to have the highest total mineral nitrogen, while zero pass compaction with no cultivation had the lowest.

Correlation analysis revealed a number of possible relationships between total mineral nitrogen and the climatic and soil moisture variables. Should the Bonferroni modification be applied, however, none of these relationships would remain significant. The most plausible possible relationships are with minimum temperature lagged by one month (p = 0.0417) and soil moisture with no lag (p = 0.0058).

Total mineral nitrogen levels were first investigated using overall split plot and MANOVA designs. The split plot design revealed a significant interaction of season with compaction (p = 0.0127). Interestingly, the overall MANOVA design did not give any significant relationships whatsoever, including for compaction. Given an interaction between season and compaction, MANOVA would be expected to give a significant result for compaction, but most p values for compaction in the MANOVA are between 0.25 and 0.26. Season based MANOVA and split plot models were used to find the exact nature of the exhibited behaviour.

In the first season the split plot design found compaction significant (p = 0.019), while the MANOVA results were less clear. One of the four MANOVA test statistics found compaction significant (p = 0.0443), while two found the compaction by cultivation interaction significant (p < 0.05). However, Pillai’s trace, the MANOVA test statistic most robust to departures from strict statistical assumptions, was nowhere near significant for the compaction by cultivation interaction (p = 0.1509). Given this and the lack of significance for this interaction in the split plot design (p = 0.4628), the interaction was not investigated further. Investigation of the main effect of compaction during season one found that total mineral nitrogen levels are significantly higher under one pass compaction than under sixteen pass compaction. Means and significant differences resulting from compaction during season one are shown in Figure 5.12.


Total Mineral Nitrogen Levels By Compaction in Season 1 [plot: Mean Nitrogen (kgN/ha) by Season; legend: None, 1 Pass, 16 Pass]

Figure 5.12: Back transformed means (± S.E.) for compaction effects on mean total

mineral nitrogen levels in season one.

The only other season to show significant differences relating to compaction or cultivation was season three. The compaction by cultivation interaction was found to be significant using the season based split plot design (p = 0.0321). The interaction was only significant for Roy’s largest root in MANOVA (p = 0.029), but p values were under 0.15 for all test statistics. Multiple comparison tests using the Bonferroni modification found no significant differences in mean total mineral nitrogen levels between compaction and cultivation combinations, although some comparisons are significant (p < 0.05) without the Bonferroni modification. Figure 5.13 shows the means and (lack of) significant differences for compaction and cultivation combinations in season three.


Total Mineral Nitrogen Levels By Compaction, Cultivation in Season 3 [plot: Mean Nitrogen (kgN/ha) by Season; legend: 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough]


Figure 5.13: Back transformed means (± S.E.) for compaction and cultivation effects on mean total mineral nitrogen levels in season three.

There are significant differences in mean total mineral nitrogen levels between months in a number of seasons. Given the somewhat erratic nature of the exploratory data analysis graphs, this is not surprising. Significant differences between months are present in the first (p = 0.0001), third (p = 0.017), fourth (p = 0.0038) and fifth (p < 0.0001) seasons.

The block was close to having a significant effect in seasons two and five, because the ‘upper bound’ Roy’s largest root test statistic in MANOVA was significant in these two seasons. Furthermore, the block was close to significant in the second season split plot design (p = 0.075). None of these results confirmed a significant block effect at any point.


In summary:

• In season one there are significant differences in mean total mineral nitrogen resulting from different compaction levels. There was significantly more total mineral nitrogen where one pass compaction had been applied than where sixteen pass compaction was applied.

• The block was not found to have a significant influence on mean total mineral nitrogen levels, although a block effect would be expected given the use of an RCB design.

• Months within seasons commonly had differences in mean total mineral nitrogen level. This was the case in the first, third, fourth and fifth seasons.

5.5.4 Nitrate Dynamics

This section investigates what influences nitrate dynamics. Raw data, graphs and results are contained in Appendix H.

Exploratory data analysis assisted slightly in deciphering the complex, seemingly unrelated trends formed by the treatments over time. Even in smoothed form the relationships were unclear. In the earlier and later months there are no clear trends, with the ordering of treatments commonly changing dramatically from month to month. The only consistency over time is found in the middle months, where the one pass compaction with disc plough cultivation and sixteen pass compaction with disc plough cultivation treatments clearly have higher nitrate dynamics.

Correlation analysis found no evidence that rainfall, maximum temperature, minimum temperature or soil moisture are correlated with nitrate dynamics. None of the twenty separate correlation tests revealed a significant result, as all had p values over 0.05.

The overall split plot and MANOVA models revealed little. The split plot model revealed a three way interaction between season, compaction and cultivation (p = 0.0196). This means that the interaction between compaction and cultivation on nitrate dynamics depends on the particular season, which is plausible given the rather inconsistent behaviour shown in graphs created during exploratory data analysis. In MANOVA, the factorial effect equivalent to a season, compaction and cultivation interaction is the compaction by cultivation interaction. However, this interaction was not significant in MANOVA (p > 0.05 for all test statistics). To investigate the nature of the interaction between season, compaction and cultivation found in the split plot model, season based MANOVA and split plot models were the next step.

During season one, the split plot and MANOVA models revealed two significant relationships. The first and most notable is an interaction between month and cultivation from the seasonal split plot design (p = 0.0113). Further investigation found significantly higher nitrate dynamics in month four with no cultivation than with disc plough cultivation (p = 0.0099). The other months in the season had higher dynamics when the disc plough was used, though not significantly so. This cultivation situation is shown graphically in Figure 5.14. The equivalent seasonal MANOVA model did not find cultivation significant (p = 0.1003 for all test statistics) but instead found the block significant (p < 0.05 for three of the four test statistics). The season one split plot model was close to finding the block term significant (p = 0.0547). The block term was not significant in any other season and is not investigated further as it is not the focus of this case study.


Nitrate Dynamics By Cultivation in Season 1 [plot: Mean Nitrate (kgN/ha) for Months 2, 3 and 4; legend: None, Disc Plough]

Figure 5.14: Back transformed means (± S.E.) for cultivation effects on mean nitrate

dynamics in season one (each month separately).

Significant relationships beyond the first season were few and far between. The split plot

designs revealed a significant difference in mean nitrate dynamics levels between the

three months in the fourth season. The MANOVA design in the final season was close to

finding a significant interaction between compaction and cultivation, with ‘upper bound’

Roy’s largest root having a p value under 0.05.

In summary:

• There was an interaction between month and cultivation in season one on nitrate

dynamics. Investigating further revealed the only significant difference to be in

the third month (month four), where there was significantly more nitrate dynamics

when there was no cultivation.

• The block was significant using MANOVA (and very close to being significant using a split plot design) in season one. In no other season was the block significant, although a significant block effect would be expected under an RCB design.


• There is a significant difference in mean nitrate dynamics in season four between

the three months.

5.5.5 Ammonium Dynamics

Ammonium dynamics are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix I.

The graphs created during exploratory data analysis provided few clues to relationships between compaction, cultivation and ammonium dynamics. Early in the experiment, the sixteen pass compaction, no cultivation treatment had the lowest level of ammonium dynamics. For the remainder of the experiment, the time series from the different treatments show no clear common trends; the performance of a treatment appears to depend largely on the particular month. The smoothed graph removes much of this month-to-month variability. It shows that treatments where disc plough cultivation occurred tend to have higher ammonium dynamics during the middle months. Extremely high values observed in month fifteen and extremely low values in month sixteen were removed prior to analysis to prevent misleading results.

Correlation analysis found a significant negative correlation between ammonium dynamics and soil moisture (p = 0.0311), indicating that higher soil moisture is associated with lower ammonium dynamics. However, if a Bonferroni modification were applied to the twenty correlation tests, none would be significant. Caution is therefore needed, because this relationship between ammonium dynamics and soil moisture may be spurious.
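The Bonferroni reasoning above can be checked with a little arithmetic. Multiplying a p value by the number of tests (capped at 1) is equivalent to comparing it against α divided by the number of tests. The sketch below applies this to the soil moisture result. Treating the twenty tests as one family follows the text; the helper name `bonferroni_adjust` is hypothetical.

```python
def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p value: p multiplied by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

alpha = 0.05
n_tests = 20          # twenty correlation tests, as stated in the text
p_moisture = 0.0311   # ammonium dynamics vs soil moisture

adjusted = bonferroni_adjust(p_moisture, n_tests)
per_test_alpha = alpha / n_tests     # equivalent per-test threshold
significant = adjusted < alpha

print(f"adjusted p = {adjusted:.3f}, per-test alpha = {per_test_alpha:.4f}, significant: {significant}")
```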

Overall split plot and MANOVA design results were evaluated to investigate influences on ammonium dynamics. The overall split plot design found a significant interaction between compaction and cultivation (p = 0.0464). MANOVA found this interaction significant only for Roy’s largest root (p = 0.0399), although all p values were under 0.15. Further investigation using multiple comparison tests revealed no significantly different means among the compaction and cultivation combinations, as shown in Figure 5.15. Without the Bonferroni modification, the multiple comparison tests show two significant differences: between the one pass and sixteen pass compaction levels when the disc plough is used, and between the two cultivation levels when sixteen pass compaction is used.

[Figure: Ammonium Dynamics By Compaction, Cultivation. Mean Ammonium (kgN/ha), Seasons 1-5; series: 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]

Figure 5.15: Back transformed means (± S.E.) for compaction and cultivation effects on mean ammonium dynamics.

The overall split plot design also found a significant season effect (p = 0.0036), which simply reflects mean ammonium dynamics differing between seasons. This relationship is not investigated further because it is not of interest. Season based models were not investigated because there was no evidence of interactions involving season, indicating similar behaviour (or lack thereof) across all seasons.

In summary:

• There is an overall interaction between compaction and cultivation on ammonium

dynamics. Significant differences between treatment means are only present if the

Bonferroni modification is not applied. These are between the one pass and


sixteen pass compaction levels when the disc plough is used and between the two

cultivation levels when sixteen pass compaction is used.

• The block does not have a significant effect on ammonium dynamics.

• There is a significant difference in mean ammonium dynamics between different

seasons.

5.5.6 Total Mineral Nitrogen Dynamics

Total mineral nitrogen dynamics are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix J.

The exploratory data analysis graphs for total mineral nitrogen dynamics give a picture similar to those for nitrate and ammonium dynamics: few patterns are evident. Nitrogen dynamics rise slowly from the start and peak in month fifteen. They then drop sharply in month sixteen and partly return to ‘normal’ in month seventeen. For many months, treatments where the disc plough was used appear to have higher total mineral nitrogen dynamics.

Total mineral nitrogen dynamics were compared with lags of rainfall, minimum temperature, maximum temperature and soil moisture to check for significant correlations. None of the correlation tests came close to significance (all p > 0.10). There is therefore no evidence of a relationship between total mineral nitrogen dynamics and rainfall, temperature or soil moisture.

Overall split plot and MANOVA designs revealed very little about the nature of the

influences on total mineral nitrogen dynamics. The only term significant in either model

was season in the split plot design (p < 0.0001). That is, there is a different mean level of

mineral nitrogen dynamics depending on the season. This simply reflected the

exploratory data analysis graphs that showed different dynamics at different times.


In summary:

• It is unclear what is influencing total mineral nitrogen dynamics, apart from time. Mean total mineral nitrogen dynamics were found to be significantly different depending on the season.

• The block was not significant, although a significant block effect would be expected under an RCB design.

5.5.7 Nitrate Leaching

Nitrate leaching levels are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix K.

Exploratory data analysis involved graphing the behaviour of the different treatments on nitrate leaching over time. The clearest relationship was that treatments involving disc plough cultivation tended to have higher levels of nitrate leaching. Beyond these cultivation differences, the trends between compaction levels were inconsistent and varied erratically through the experiment. A smoothed version of the original graph emphasised the apparent differences in leaching levels between the two cultivation levels.

Nitrate leaching was found not to be correlated with rainfall, temperature or soil moisture. This was established using correlation significance tests between nitrate leaching and up to four lags of rainfall, minimum temperature, maximum temperature and soil moisture. The p value for every correlation test was above 0.05.
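The lagged correlation tests described above pair each response value with the weather or soil value recorded k months earlier. A minimal numpy sketch follows. It assumes one monthly series per variable, of equal length, and Pearson correlation; `lagged_correlations` is a hypothetical helper. The t statistic is the standard test of zero correlation with n - 2 degrees of freedom. A p value can then be read from a t table or computed, for example, as `2 * scipy.stats.t.sf(abs(t), n - 2)`.

```python
import numpy as np

def lagged_correlations(response, driver, max_lag=4):
    """Correlate response[t] with driver[t - k] for k = 0..max_lag.

    Returns {k: (n_pairs, r, t)}, where t = r * sqrt((n - 2) / (1 - r^2))
    tests H0: no correlation at that lag (n - 2 degrees of freedom).
    """
    response = np.asarray(response, dtype=float)
    driver = np.asarray(driver, dtype=float)
    results = {}
    for k in range(max_lag + 1):
        y = response[k:]                 # response from month k onwards
        x = driver[:len(driver) - k]     # driver shifted back k months
        n = len(y)
        r = np.corrcoef(y, x)[0, 1]
        t = r * np.sqrt((n - 2) / (1 - r ** 2))
        results[k] = (n, r, t)
    return results

# Hypothetical example: the response follows the driver two months later
rng = np.random.default_rng(0)
driver = rng.normal(size=200)
response = np.empty(200)
response[:2] = rng.normal(size=2)
response[2:] = driver[:-2] + 0.5 * rng.normal(size=198)

for k, (n, r, t) in lagged_correlations(response, driver).items():
    print(f"lag {k}: n = {n}, r = {r:+.3f}, t = {t:+.2f}")
```

Lags zero to four for each of the four climate and soil variables give twenty tests. This is presumably how the counts of twenty tests quoted in this chapter arise.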

The overall split plot result reaffirmed the suspicion that cultivation levels may be affecting mean nitrate leaching levels. The cultivation factor was significant in the split plot design (p = 0.0383) but not in the MANOVA (p = 0.2197 for all test statistics). This inconsistency arises from the different hypotheses tested by the split plot and MANOVA designs. In this case, MANOVA tests for equality of mean nitrate leaching in each season, while the split plot tests for equality of nitrate leaching means averaged over the five seasons. That is, the split plot test looks at the ‘bigger picture’, while the MANOVA focuses on the details. Multiple comparison tests found overall leaching levels significantly higher (p = 0.0383) when the disc plough was used than when no cultivation was applied. The nitrate leaching means and the cultivation level comparison are shown graphically in Figure 5.16.
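The difference between the two hypotheses can be illustrated with a simplified calculation. In a balanced split plot in time, the test for a whole plot factor is equivalent to an analysis of the plot averages over the seasons. The sketch below is a simplified version of that whole plot test, not a reproduction of the thesis model. It assumes only two cultivation levels in three blocks and ignores compaction, so the test reduces to a paired t test across blocks on the season-averaged values. The data and the helper `whole_plot_paired_t` are hypothetical.

```python
import numpy as np

def whole_plot_paired_t(none, plough):
    """Paired t test for a two-level whole plot factor on season-averaged values.

    none, plough: arrays of shape (blocks, seasons), one row per block.
    Returns (t statistic, degrees of freedom = blocks - 1).
    """
    none = np.asarray(none, dtype=float)
    plough = np.asarray(plough, dtype=float)
    # Average each plot over the seasons, then difference within each block
    diff = plough.mean(axis=1) - none.mean(axis=1)
    b = diff.size
    t = diff.mean() / (diff.std(ddof=1) / np.sqrt(b))
    return t, b - 1

# Hypothetical leaching values: 3 blocks (rows) by 5 seasons (columns)
none = np.array([[1.0, 1.1, 0.9, 1.0, 1.0],
                 [1.2, 1.0, 1.1, 0.9, 1.3],
                 [0.8, 0.9, 1.0, 1.1, 0.7]])
effect = np.array([[0.2, 0.3, 0.1, 0.2, 0.2],
                   [0.1, 0.3, 0.2, 0.1, 0.2],
                   [0.3, 0.1, 0.2, 0.2, 0.3]])
plough = none + effect

t_stat, df = whole_plot_paired_t(none, plough)
print(f"t = {t_stat:.2f} on {df} df")
```

In this toy example the season-averaged differences (0.20, 0.18, 0.22) vary less between blocks than the single-season differences do. The averaged test is therefore strongly significant, which mirrors the ‘bigger picture’ argument above.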

[Figure: Nitrate Leaching By Cultivation. Mean Nitrate (kgN/ha), Seasons 1-5; series: None, Disc Plough.]

Figure 5.16: Back transformed means (± S.E.) for cultivation effects on mean nitrate

leaching.

The only other significant result from the overall designs was for season in the split plot

design (p = 0.0047). This is simply a reflection of different mean nitrate leaching in

different seasons and is not of interest for further analysis.

In summary:

• Cultivation significantly affects mean nitrate leaching levels. Significantly more nitrate leaching occurs when the disc plough is used.

• No evidence was found that compaction has any influence on nitrate leaching.

• Season was found to significantly affect nitrate leaching. This reflects differences in mean nitrate leaching at different times.

• The block was not significant, although a significant block effect would be expected under an RCB design.


5.5.8 Ammonium Leaching

Ammonium leaching levels are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix L.

Raw and smoothed graphs were created for exploratory data analysis to investigate the

effects of different compaction and cultivation levels over time. Levels of ammonium

leaching appeared to be highest when the disc plough was used, while differences

between compaction levels were far less clear. Precise behaviour again varies from

month to month and treatments lack distinct trends overall.

Ammonium leaching levels were found not to be correlated with rainfall, temperature or soil moisture. None of a series of correlation significance tests gave a p value under 0.20. This indicates no evidence of correlation between ammonium leaching and rainfall, maximum temperature, minimum temperature or soil moisture.

Overall split plot and MANOVA designs investigated possible effects on mean levels of ammonium leaching. The overall split plot reported a significant cultivation effect (p = 0.0132) that was not significant under MANOVA (p = 0.15156). This is probably because cultivation has an overall effect that is not strong enough in any particular season to give a significant MANOVA result. Multiple comparison tests reveal that overall ammonium leaching levels are higher when the disc plough is used (p = 0.0132). This result is shown graphically in Figure 5.17.


[Figure: Ammonium Leaching By Cultivation. Mean Ammonium (kgN/ha), Seasons 1-5; series: None, Disc Plough.]

Figure 5.17: Back transformed means (± S.E.) for cultivation effects on mean ammonium

leaching.

The overall MANOVA returned a significant result for compaction using the ‘upper bound’ Roy’s largest root (p = 0.0274). The other compaction test statistics were not close to significance, however, with p values over 0.10. Notably, the p value for compaction in the split plot design was much higher, at 0.3504. These results do not provide sufficient evidence that compaction affects ammonium leaching. Season was significant in the split plot model (p = 0.0089), reflecting different mean ammonium leaching levels in different seasons. Overall, no interactions with season were found in the split plot design, and there were no clearly significant factors in the overall MANOVA. There is therefore no motivation for looking at season based models.

In summary:

• Overall, cultivation significantly affected ammonium leaching. Significantly higher levels of ammonium leaching occurred where the disc plough was used compared to where no cultivation was applied.

• No evidence was found that compaction has any influence on ammonium leaching.


• Season was found to be significant. This simply indicates that mean ammonium leaching levels differ depending on the season.

• The block was not significant, although a significant block effect would be expected under an RCB design.

5.5.9 Total Mineral Nitrogen Leaching

Total mineral nitrogen leaching levels are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix M.

Graphs created for exploratory data analysis presented a similar picture for total mineral

nitrogen leaching as for nitrate and ammonium leaching. Higher levels of total mineral

nitrogen leaching tend to be present where the disc plough was used for cultivation.

There is not a clear relationship depending on the level of compaction. Mean levels of

total mineral nitrogen leaching differ depending on the month, with central months

having the highest levels.

Correlation analysis revealed no significant relationships between total mineral nitrogen leaching and rainfall, minimum temperature, maximum temperature or soil moisture. All twenty correlation significance tests gave p values over 0.05.

Overall split plot and MANOVA designs were used to establish influences on total

mineral nitrogen leaching. Cultivation was found to be significant using split plot designs

(p = 0.0482) but not using MANOVA (p = 0.2262). This unusual result is considered a

reflection of there being a significant overall effect but a lack of significant effects in

specific seasons. Investigation into the overall situation found mean mineral nitrogen

leaching to be significantly higher when the disc plough is used (p = 0.0482) as shown in

Figure 5.18.


[Figure: Total Mineral Nitrogen Leaching By Cultivation. Mean Nitrogen (kgN/ha), Seasons 1-5; series: None, Disc Plough.]

Figure 5.18: Back transformed means (± S.E.) for cultivation effects on mean total

mineral nitrogen leaching.

The split plot design also revealed significant differences in mean nitrogen leaching

levels depending on season (p = 0.0004). This is because there are different levels of total

mineral nitrogen leaching in different seasons. The block was nowhere near being

significant in either the overall split plot or MANOVA designs.

In summary:

• There is an overall effect of cultivation on total mineral nitrogen leaching. Where

the disc plough was used there was significantly more nitrogen leaching.

• No evidence was found that compaction has any influence on total mineral nitrogen leaching.

• Mean total mineral nitrogen leaching levels depend on the particular season.

• The block was not significant, although a significant block effect would be expected under an RCB design.


5.5.10 Microbial Carbon Levels

Microbial carbon levels are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix N.

Graphs created during exploratory data analysis for microbial carbon levels revealed interesting behaviour that clearly changed over time. Four treatments were relatively consistent over time. Two treatments where the plough was used varied in a systematic and interesting way, as shown by the smoothed graph in Figure 5.19.

[Figure: Microbial Carbon Levels By Compaction, Cultivation (3MA Smoothed). Mean Microbial Carbon (µg/g) by Month (1-14); series: 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough.]

Figure 5.19: Microbial carbon levels by compaction and cultivation over time.
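Figure 5.19 uses a three-point moving average (3MA) to smooth the monthly series. The sketch below assumes a centred window, in which each smoothed value is the mean of the previous, current and next month, so one month is lost at each end. This section does not state whether the thesis used a centred or a trailing window, and `centred_moving_average` is a hypothetical name.

```python
import numpy as np

def centred_moving_average(series, window=3):
    """Centred moving average; (window - 1) // 2 points are lost at each end."""
    if window % 2 == 0:
        raise ValueError("a centred moving average needs an odd window")
    kernel = np.ones(window) / window
    # 'valid' keeps only positions where the whole window overlaps the data
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

# Hypothetical monthly microbial carbon values for one treatment
monthly = [400, 520, 430, 610, 580, 700, 650]
print(np.round(centred_moving_average(monthly), 1))
```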


Correlation analysis found that microbial carbon levels are highly significantly correlated with soil moisture (p < 0.0001). The correlation coefficient of 0.286 indicates that, while the correlation is not strong, it is clearly significant. Figure 5.20 shows a graphical cross correlation function, calculated from the correlation of microbial carbon with five lags of soil moisture. The strong correlations at lags after zero are assumed to be mainly a result of the strong correlation at lag zero. A significant correlation was also found between rainfall and microbial carbon levels two months later (p = 0.018). This relationship is likely to be spurious, given that a total of twenty correlations were calculated.
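The distinction drawn above between a weak correlation and a significant one follows from the usual test statistic for a Pearson correlation, t = r√(n - 2)/√(1 - r²) with n - 2 degrees of freedom. The same r gives a larger t as the number of paired observations grows. The sample sizes in the sketch are hypothetical, because this section does not restate the number of observations behind r = 0.286.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: rho = 0 from a Pearson r on n pairs (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

r = 0.286  # microbial carbon vs soil moisture at lag zero
for n in (20, 50, 100, 300):  # hypothetical numbers of paired observations
    print(f"n = {n:3d}: t = {correlation_t(r, n):.2f}")
```

For moderate to large samples, |t| above about 2 corresponds to p < 0.05. A coefficient near 0.29 therefore only becomes significant at the 5% level once n reaches roughly 50.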

[Figure: Cross-Correlation: Microbial Carbon, Soil Moisture. Correlation (-1 to 1) by Soil Moisture Lag (0-4).]

Figure 5.20: Graphical cross correlation function - microbial carbon and soil moisture.

Overall split plot and MANOVA designs revealed a number of sources of variation for microbial carbon levels. The only agreement between the split plot and MANOVA models concerned the block. The split plot model found the block significant (p = 0.0234), as did two of the MANOVA test statistics (p < 0.05). In addition, the overall split plot revealed a significant season by compaction interaction (p = 0.0189). Compaction was not significant in MANOVA; had it been, this would have supported the season by compaction interaction. The season by compaction interaction indicates that the behaviour of compaction depends on the season. For this reason, season based models are pursued.


The first significant effects are found in the second season. The split plot design found

the compaction by cultivation interaction to be significant (p = 0.0408). Only Roy’s

largest root found this term significant in MANOVA (p = 0.027). Multiple comparison

tests using the Bonferroni modification found no significant differences between means

resulting from compaction and cultivation combinations. The means involved and

significance results from these tests are shown in Figure 5.21. Without the Bonferroni

modification two significant differences (p < 0.05) would result. These are between zero

pass and sixteen pass compaction when there is no cultivation, and between the two

cultivation levels when there is no compaction.
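With six compaction and cultivation combinations there are 15 possible pairwise comparisons. This is why the Bonferroni modification can remove differences that are significant at p < 0.05 when tested alone. The sketch below assumes the family consists of all pairwise comparisons among the six treatment means. If only a subset of comparisons was adjusted for, the divisor would be smaller. The helper `significant_after_bonferroni` is hypothetical.

```python
from itertools import combinations

compaction = (0, 1, 16)            # number of compaction passes
cultivation = ("None", "Plough")
treatments = [(c, k) for c in compaction for k in cultivation]

pairs = list(combinations(treatments, 2))   # every pair of treatment means
alpha = 0.05
per_comparison_alpha = alpha / len(pairs)

def significant_after_bonferroni(p_value):
    """True if an unadjusted pairwise p value survives the Bonferroni modification."""
    return p_value < per_comparison_alpha

print(len(pairs), round(per_comparison_alpha, 5))
```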

[Figure: Microbial Carbon Levels By Compaction, Cultivation in Season 2. Mean Carbon (kg/ha), Season 2; series: 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]

Figure 5.21: Back transformed means (± S.E.) for compaction and cultivation effects on mean microbial carbon levels in season two.

In the third season, the block is significant by both the split plot (p = 0.0039) and

MANOVA (p < 0.02 for all test statistics). The exact nature of the block effect is not

investigated further as it is not a focus of this case study. The split plot design also found

the compaction by cultivation interaction significant (p = 0.0424) as was the case in the


second season. Multiple comparison tests using the Bonferroni approach did not find any

significant differences between means. Means, significant differences and standard errors

are shown in Figure 5.22. If the Bonferroni approach had not been applied, one (relevant)

significant difference would have resulted (p < 0.05). This is between the two cultivation

levels when there is no compaction. Note this is exactly the same as one of the mean

differences that was close to being significant in season two.

[Figure: Microbial Carbon Levels By Compaction, Cultivation in Season 3. Mean Carbon (kg/ha), Season 3; series: 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]

Figure 5.22: Back transformed means (± S.E.) for compaction and cultivation effects on mean microbial carbon levels in season three.

In summary:

• There was a highly significant, though not strong, positive correlation between soil moisture and microbial carbon levels.

• There was an interaction between compaction and cultivation in seasons two and

three. Multiple comparison tests revealed no significant differences between

means though some were very close.


• The block had a significant effect on mean microbial carbon levels. This was

significant overall and in the third season.

5.5.11 Microbial Nitrogen Levels

Microbial nitrogen levels are investigated in this section to identify the factors influencing them. Raw data, graphs and results are contained in Appendix O.

Exploratory data analysis using raw and smoothed graphs of treatments over time showed some clear trends. The zero pass, no cultivation treatment tends to have the highest levels of microbial nitrogen, and the zero pass, disc cultivation treatment tends to have the lowest. The remaining treatments swap positions frequently, and their ordering is less clear. Over the nineteen months, microbial nitrogen levels decrease overall, with spikes in months six and ten.

As with the other microbial variables, microbial nitrogen is highly significantly correlated with soil moisture (p < 0.0001). With a correlation of 0.378 at no time lag, higher moisture levels are associated with higher levels of microbial nitrogen. No significant correlations were found between microbial nitrogen and a variety of lags of rainfall, minimum temperature and maximum temperature.

Overall split plot and MANOVA designs were applied to the microbial nitrogen variable. The block was significant in the split plot design (p = 0.0082), but in the MANOVA only for Roy’s largest root (p = 0.0099). This suggests that while the block appears significant looking over all times, within any particular month there is not a strongly significant difference. Although the block is not a focus of this case study, multiple comparison tests revealed significant mean differences between the third block and the first and second blocks (see Figure 5.23). The split plot design also reported a significant interaction between season and compaction (p = 0.0484). The MANOVA compaction effect, which should also be significant if there is an interaction between compaction and season, was not significant (except for Roy’s largest root, with a p value of 0.0401). Because of the significant compaction by season interaction in the split plot model, season based models are investigated to find the exact nature of the interaction.


[Figure: Microbial Nitrogen Levels By Block. Mean Nitrogen (µg/g), Seasons 1-5; series: Block 1, Block 2, Block 3.]

Figure 5.23: Back transformed means (± S.E.) for block effects on mean microbial

nitrogen levels.

The interaction between month and compaction was found to be significant in two

different seasons. In season one, the split plot model found the interaction significant (p =

0.0397) while in the equivalent MANOVA model only one test statistic found

compaction significant (p = 0.015). In the third season, the split plot design found the

interaction strongly significant (p = 0.0079) and two test statistics found compaction

significant (p < 0.05) in the equivalent MANOVA model. Using multiple comparison

tests, the only significant compaction effect in season one was in month three. In this

month significantly higher mean microbial nitrogen levels were present with no

compaction than sixteen pass compaction. Only one significant difference was found in

season three as well. A significantly higher mean microbial nitrogen level was present

with no compaction than one pass compaction in month nine. Compaction means and

significances for each month are shown for season one in Figure 5.24 and for season


three in Figure 5.25. These figures show that the effect of compaction varies substantially

between months.

[Figure: Microbial Nitrogen By Compaction in Season 1. Mean Nitrogen (µg/g) by Month (2, 3, 4); series: None, 1 Pass, 16 Pass.]

Figure 5.24: Back transformed means (± S.E.) for compaction effects on mean microbial

nitrogen levels in season one (each month separately).


[Figure: Microbial Nitrogen By Compaction in Season 3. Mean Nitrogen (µg/g) by Month (9, 10, 11); series: None, 1 Pass, 16 Pass.]

Figure 5.25: Back transformed means (± S.E.) for compaction effects on mean microbial

nitrogen levels in season three (each month separately).

The second season was largely devoid of relationships, except for a strongly significant month effect (p < 0.0001). This simply indicates that the mean microbial nitrogen level differs between months.

The block was significant or close to being significant in a number of different seasons.

In the first season the block was significant with one MANOVA test statistic (p = 0.0134)

and close to being significant with the other test statistics (all p < 0.15). Three of the four

MANOVA test statistics found the block significant in the second season (p < 0.05) while

the split plot result was close to significant (p = 0.0685). The block was significant in the

third season for both the split plot (p = 0.0023) and MANOVA (p < 0.05 for three of the

four test statistics) models. In the fourth and final season the block was close to

significant (p = 0.0566) in the split plot model and significant (p = 0.0423) for one test

statistic in MANOVA (all p < 0.15).


In summary:

• There was a highly significant positive correlation between soil moisture and microbial nitrogen levels.

• An interaction between month and compaction exists in seasons one and three.

This is a reflection of different compaction behaviour in different months.

Significantly more microbial nitrogen was found at no compaction compared to

sixteen pass compaction in month three. Significantly more microbial nitrogen

was found at no compaction compared to one pass compaction in month nine.

• Significant differences in mean microbial nitrogen exist between the blocks. The

block was significant overall and in every season (to varying degrees).

• In the second season, month was strongly significant: mean microbial nitrogen levels differ during this season depending on the month.

5.5.12 Microbial Carbon to Nitrogen Ratio

The microbial carbon to nitrogen ratio is investigated in this section to identify the factors influencing it. Raw data, graphs and results are contained in Appendix P.

Graphs created during exploratory data analysis provided a few clues as to the influences

on the microbial carbon to nitrogen ratio. Overall, the ratio decreases slowly until month

seven before sharply rising, then sharply dropping in month nine and sharply rising again

in month twelve. Those treatments where the disc plough was applied tend to have higher

levels than their non cultivated counterparts. The differences between treatments are

unclear and vary from month to month.

Correlation analysis found that the microbial carbon to nitrogen ratio is not correlated

with rainfall, maximum temperature or minimum temperature. Along with the other

biological variables the ratio was found to be strongly correlated with soil moisture (p <

0.0001). Unlike the other biological variables this was a significant negative correlation,

meaning that higher levels of the ratio are associated with lower soil moisture levels.

Figure 5.26 shows the cross correlation function formed from correlation between the


ratio and a number of lags of soil moisture. It is suspected that the correlation is really at

a lag of zero and other strong lags are a reflection of this relationship.
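A simulation can illustrate the suggestion that correlations at lags one to four mainly echo the lag zero correlation. Suppose soil moisture is autocorrelated from month to month and the response depends on moisture at lag zero only. The cross correlation then still decays gradually across the lags instead of stopping at zero. The sketch assumes a first-order autoregressive (AR(1)) moisture series with coefficient 0.8 and uses simulated data; it is not fitted to the thesis data. The helper `cross_corr` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi = 2000, 0.8

# Hypothetical AR(1) soil moisture series: each month carries over 80% of the last
moisture = np.zeros(n)
for t in range(1, n):
    moisture[t] = phi * moisture[t - 1] + rng.normal()

# Response driven by moisture in the same month only, plus independent noise
response = moisture + rng.normal(scale=moisture.std(), size=n)

def cross_corr(y, x, k):
    """Correlation between y[t] and x[t - k]."""
    return np.corrcoef(y[k:], x[:len(x) - k])[0, 1]

ccf = [cross_corr(response, moisture, k) for k in range(5)]
for k, r in enumerate(ccf):
    print(f"lag {k}: r = {r:.2f}")
```

In theory these correlations fall off as 0.8 to the power k times the lag zero value (about 0.71, 0.57, 0.45 and so on). Apparent lagged relationships therefore need not reflect delayed effects.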

[Figure: Cross-Correlation: Microbial C:N Ratio, Soil Moisture. Correlation (-1 to 1) by Soil Moisture Lag (0-4).]

Figure 5.26: Graphical cross correlation function – microbial carbon to nitrogen ratio and

soil moisture.

Many relationships were uncovered by the overall split plot and MANOVA designs. In particular, the split plot design found a highly significant (p = 0.0005) three way interaction between season, compaction and cultivation. This indicates that the behaviour of cultivation and compaction combinations depends on the season. Interestingly, the equivalent MANOVA term expected to be significant, the compaction by cultivation interaction, was significant only for Roy’s ‘upper bound’ largest root (p = 0.0334). All MANOVA test statistics for this interaction did, however, have p values under 0.15. These interactions were taken as reason to investigate further using season based designs. Note that graphs in this section use different scales for clarity of visual representation.

During the first season the split plot design revealed a significant three way interaction

involving month, compaction and cultivation. This is reaffirmed by three of the four

MANOVA test statistics for the compaction by cultivation interaction being significant (p

< 0.05). Multiple comparison tests found three notable significant mean differences. In

month three, the carbon to nitrogen ratio is significantly higher with the plough than

without the plough where one pass compaction has been applied. The precise reverse of

this is true in month four, where the ratio is significantly less when the plough is used


compared to when it is not used at the one pass compaction level. When the plough is

used in month four, there is a significantly higher ratio where there is no compaction

compared to one pass compaction. Many more multiple comparison test results would

have been significant if the Bonferroni modification was not used. However, the results

are somewhat erratic. Figure 5.27 presents the means and significant differences from the

month by compaction by cultivation interaction in season one.

[Figure: Microbial C:N Ratio By Compaction, Cultivation in Season 1. Mean Carbon to Nitrogen Ratio by Month (2, 3, 4); series: 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]

Figure 5.27: Back transformed means (± S.E.) for compaction and cultivation effects on the mean microbial carbon to nitrogen ratio in season one (each month separately).

The exact nature of effects on the biological carbon to nitrogen ratio in season two is

difficult to decipher from the season based split plot and MANOVA results. The split plot

result reported a significant interaction of month and compaction (p = 0.0231) and a

separate significant cultivation effect (p = 0.009). The MANOVA claimed that there is an

interaction between compaction and cultivation though, with all test statistics returning a

p value under 0.05. As a precaution, the behaviour of compaction and cultivation levels was investigated within each month using multiple comparison tests. In month five


the mean carbon to nitrogen ratio was significantly lower for the sixteen pass compaction,

no cultivation treatment than every other treatment. In month six, the only significant

difference was that the ratio was significantly higher for no compaction than sixteen pass

compaction when there was no cultivation. Figure 5.28 presents these results graphically. Month seven has been left out, mainly for clarity, as there were no significant mean differences in that month.


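[Figure: Mean Carbon to Nitrogen Ratio by Month (5, 6) for each compaction and cultivation combination in season two; series: 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]

Figure 5.28: Back transformed means (± S.E.) for compaction and cultivation effects on the mean microbial carbon to nitrogen ratio in season two (each month separately).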

The third season presents a similar scenario to that seen in the first two seasons.

Influences of compaction and cultivation levels depend on the month. The split plot

design found a significant month by compaction interaction (p < 0.0001) and month by

cultivation interaction (p = 0.0009) while the equivalent MANOVA design found the

compaction by cultivation interaction significant (p < 0.05 for three of the four test

statistics). These results demonstrated that the carbon to nitrogen ratio means were

different depending on the month and that there may be an interaction between


compaction and cultivation in these months. Therefore multiple comparison tests were

applied within each month of the third season. Means and results of these tests are shown

in Figure 5.29 for month nine and Figure 5.30 for months ten and eleven. Many

significant differences were found and were sometimes contradictory in different months.

This is a side effect of the known interactions with month.

[Figure: Microbial C:N Ratio By Compaction, Cultivation in Season 3 (Month 9). Mean Carbon to Nitrogen Ratio, Month 9; series: 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; 16 Pass, Plough.]

Figure 5.29: Back transformed means (± S.E.) for compaction and cultivation effects on the mean microbial carbon to nitrogen ratio in season three (month nine).


[Figure 5.30: Microbial C:N Ratio By Compaction, Cultivation in Season 3 (Month 10, 11). Mean microbial carbon to nitrogen ratio (y axis) in season three (months ten and eleven) for the treatments 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; and 16 Pass, Plough.]

The behaviour in season four was similar to that seen in the other three seasons. The

overall split plot design found a significant interaction between season and cultivation (p

= 0.0066) that was reinforced by the significant cultivation effect in MANOVA (p =

0.0134 for all test statistics). While cultivation was interacting with season, it was also interacting with compaction, as found in the split plot design (p = 0.0066) and hinted at by MANOVA (p < 0.10 for all test statistics). The consequence of this is that the effects of

compaction and cultivation once again depended on the particular month. Multiple

comparison tests found that the mean carbon to nitrogen ratio value was significantly

higher for disc plough cultivation than no cultivation when there was one pass

compaction in month twelve. More multiple comparison test results would have been

significant without the use of the Bonferroni modification. The display of means and

significant differences for season four in Figure 5.31 shows the differences present in

mean carbon to nitrogen ratio values between treatments in different months.



[Figure 5.31: Mean microbial carbon to nitrogen ratio (y axis) by month (x axis) in season four, for the treatments 0 Pass, None; 0 Pass, Plough; 1 Pass, None; 1 Pass, Plough; 16 Pass, None; and 16 Pass, Plough.]

As was the case with the other biological variables, the block commonly led to significant

differences in mean carbon to nitrogen ratio measures. The overall split plot design found

the block significant (p = 0.026) while only Roy’s ‘upper bound’ test statistic in the

overall MANOVA found it significant (p = 0.0258). In the first season only Roy’s largest

root test statistic found the block significant (p = 0.0407). For the second season the

block was strongly significant in MANOVA (p < 0.05 for all test statistics) but not so in

the split plot design (p = 0.3982). The split plot design (p = 0.0124) and MANOVA (p <

0.05 for all test statistics) agreed on block significance in the third season. No significant differences were found by either the split plot design or MANOVA in the fourth (and final) season.

More significant differences were found in the analysis of the carbon to nitrogen ratio than with any other variable. This stems from the particularly small error estimates obtained for this variable in the statistical models applied. Detailed biological reasons for these small amounts of error are beyond the scope of this thesis. It would appear that the carbon to nitrogen ratio varies less from one area of land to another than the other variables do.

In summary:

• There was a strong negative correlation between soil moisture and the microbial

carbon to nitrogen ratio.

• The behaviour of the carbon to nitrogen ratio from compaction and cultivation

depends entirely on the month. Significant differences between treatments within individual months are common, but because they change from month to month, the overall picture is unclear.

• The block term was significant (or close to being significant) in most models

evaluated. The block was particularly influential overall and in seasons two and

three.


5.6 General Discussion

The Yarraman experiments and their resulting data sets and analysis make an interesting

statistical case study. In this discussion, the main case study issues and problems are

covered from a statistical point of view. The data analyses are reviewed with the aim of improving the information that can be extracted from similar studies in future. Lastly, future directions for the current data set are presented.

Problems with the initial randomised complete block (RCB) design were felt throughout data analysis. These were evident in the block not being significant in most models. It is also likely that the design contributed towards the overall lack of sensible significant relationships found. RCB designs use a factor as a block. This blocking factor is assumed to have an effect on the variable being modelled and not to interact with any other factors. The data results suggest that both of these RCB

assumptions have been violated. Firstly, rarely was the block found to be significant in

models and this was not surprising given the subtle differences that differentiated the

three blocks (Blumfield, personal communication). Secondly, previous data analysis by

Blumfield et al. (2002) and Chen et al. (2002) suggested significant interactions of the

block with other factors. Given the relative failure of the blocking aspect of the initial

model, alternatives should be considered in future. Blocking is useful for providing

necessary replication but needs to be on a factor that adheres to RCB design assumptions.

The log transformation of variable values prior to analysis was found to be fairly successful. Bringing the samples closer to normality led to noticeably more significant relationships than analysis of the raw data (results not included).

Furthermore, the split plot and MANOVA results were more consistent when analysed

using the log transformed values. This suggests that part of the reason for the occasional

inconsistency between split plot and MANOVA results may be due to deviations from

statistical assumptions.
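To illustrate why the log transformation helped, the following minimal Python sketch compares the sample skewness of a small right-skewed sample before and after taking natural logarithms. The data values and the `skewness` helper are hypothetical illustrations, not the Yarraman data or the SAS procedures used in this thesis; a formal check would use a normality test such as those available in PROC UNIVARIATE.

```python
import math


def skewness(values):
    """Moment coefficient of skewness (g1) of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed measurements (illustrative only, not Yarraman data)
raw = [1.2, 1.5, 1.9, 2.3, 2.8, 3.1, 4.0, 5.6, 8.9, 15.2]
logged = [math.log(v) for v in raw]  # natural log; all values must be positive

print(f"skewness of raw values:    {skewness(raw):.2f}")
print(f"skewness of logged values: {skewness(logged):.2f}")
```

For these values the skewness falls from about 1.6 to about 0.56, so the logged sample is much closer to the symmetry assumed by normal theory tests.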

The possibility of other variables affecting the chemical and biological measures was

investigated during data analysis. This was done by cross correlation of each variable


with the available environmental variables rainfall, maximum temperature, minimum

temperature and soil moisture. The only strongly significant relationships were between

soil moisture and the microbial carbon, microbial nitrogen and microbial carbon to

nitrogen ratio variables. Including soil moisture as a covariate in the models for these variables is a possibility for the future.
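A minimal Python sketch of a one-sided cross correlation of this kind follows. It computes r_k = corr(y_t, x_{t-k}) for non-negative lags only, so that an environmental variable x may lead a soil variable y but not the reverse, using the usual 1/n divisor. The series are hypothetical toy values, constructed so that the response depends on the previous month's soil moisture; they are not the Yarraman measurements.

```python
import math


def cross_correlation(x, y, max_lag):
    """Sample cross correlations r_k = corr(y_t, x_{t-k}) for k = 0..max_lag."""
    n = len(x)
    if n != len(y):
        raise ValueError("series must have equal length")
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    result = []
    for k in range(max_lag + 1):
        # Pair y at time t with x at time t - k (x leads y by k months)
        c = sum((x[t - k] - mx) * (y[t] - my) for t in range(k, n)) / n
        result.append(c / (sx * sy))
    return result


# Hypothetical monthly soil moisture and a response driven by last month's value
# (index -1 wraps to the final month, which keeps the toy example short)
soil_moisture = [10, 50, 20, 60, 30, 70, 40, 40]
microbial = [100 - 0.5 * soil_moisture[t - 1] for t in range(len(soil_moisture))]

r = cross_correlation(soil_moisture, microbial, max_lag=3)
print([round(v, 2) for v in r])  # [0.5, -1.0, 0.5, -0.57]
```

The strong negative correlation appears at lag 1, identifying the one-month delay. With only eight values the approximate significance bound of ±2/√n (about ±0.71) is wide, which is one reason longer series give more reliable cross correlation functions.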

Relatively few significant relationships were found during analysis. Often, while the samples revealed a sizable difference in means, the same could not be concluded for the populations they came from. The reason for this was that high error estimates made significant differences difficult to detect: the variation attributed to a factorial effect (its mean square) must be sufficiently large relative to the error mean square, as measured by their F ratio, for the effect to be judged significant.

The nature of the large amount of error is important to consider because it may be the key to more successful future analyses. It is possible that soil samples are extremely variable due to factors beyond control (eg. what was growing in the location 50 years ago). It is more likely that there are recordable additional variables (such as physical soil properties) influencing the levels of the measured variables. In this case an improved design may be possible that reduces error by dealing with this variation more effectively. The other option is to take more samples (replicates), which increases the precision of treatment comparisons, but this is probably less sensible and less feasible than improving the design.
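The role of the error term can be shown with a one-way analysis of variance, which is simpler than the split plot and MANOVA models used here but relies on the same F ratio principle. In this minimal Python sketch (hypothetical data), two data sets share identical treatment means of 10, 12 and 14; only the spread within treatments differs.

```python
def one_way_f(groups):
    """F ratio (between-group mean square / within-group mean square)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_between = ss_between / (k - 1)   # df = k - 1
    ms_within = ss_within / (n - k)     # error df = n - k
    return ms_between / ms_within


# Hypothetical data: identical treatment means (10, 12, 14), different error
low_error = [[9, 10, 11], [11, 12, 13], [13, 14, 15]]
high_error = [[4, 10, 16], [6, 12, 18], [8, 14, 20]]

print(one_way_f(low_error))   # 12.0
print(one_way_f(high_error))  # about 0.333
```

With 2 and 6 degrees of freedom the 5% critical value of F is about 5.14, so the same mean differences are significant when the error is small but not when it is large.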

The split plot and MANOVA designs used were found to be effective in answering

statistical questions in the case study. The differences in test structure and hypotheses were, for the most part, complementary, with each technique answering a slightly different question. The two techniques gave interesting and often seemingly conflicting results.

Tests of significance for the same factor in split plot and MANOVA designs are quite

different in nature. Consider the overall designs from the case study where both evaluate

compaction, cultivation, the compaction by cultivation interaction and block factorial

effects for significance. In these models repeated measures were taken over season. Both

split plot and MANOVA test for equality of means but how they do so is different. The

split plot tests look at the average value for each factorial effect over time while

MANOVA searches within each time to gauge significant differences of each factorial

effect. Therefore, the split plot result gives a picture of overall behaviour without consideration of specific seasons, and the main plot test on its own does not provide any indication of an interaction with season. MANOVA, on the other hand, could return a significant result that is not an overall effect but an indication of an interaction between season and a factorial effect, because it has evaluated the situation from within each season.

The main plot tests of significance are not the only tests provided by split plot designs.

Continuing with the overall models in the case study, every factorial effect (except the

block) in the main plot is evaluated for an interaction with season in the subplot. This can be seen as a benefit over MANOVA, where interactions between season and a factorial effect are combined with the test for that factorial effect. For example, a significant result

for compaction in MANOVA could be an indication of a significant compaction effect,

significant compaction by season interaction or both. The equivalent split plot model tests

for compaction and the compaction by season interaction separately.

In some analyses factorial effects were significant in MANOVA but not in the split plot

design and vice versa. In fact, at times the results from the two methods were complete opposites. Although puzzling at first, these differences can be explained by considering the hypotheses and analysis techniques of the two methods. When a factorial effect is

significant in MANOVA but not in the split plot designs, the mean variable levels do not

show a significant difference when averaged over time, but do at a particular time or

combination of times. Commonly in this case the split plot design had an interaction

between the factorial effect and time as that was the true nature of the significant

MANOVA result. The alternative case where a factorial effect is significant in the split

plot designs but not in MANOVA also occurred. This arises when there are only small

differences in any particular month but these small differences are significant when

looked at over all times. In these situations, the split plot result will be significant (in the

main plot only) as it analyses values averaged over time while MANOVA looks at each

time period involved.
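The first case can be made concrete with a toy Python example in which a hypothetical cultivation effect reverses between two seasons. The sketch computes only mean differences, not the formal split plot or MANOVA tests: averaging each plot over time, as the split plot main plot does, cancels the effect, while looking within each season reveals it. In a split plot analysis such a pattern would appear as a cultivation by season interaction.

```python
from statistics import mean

# Hypothetical plots: each inner list is [season 1 value, season 2 value]
plots = {
    "None":   [[10.0, 20.0], [11.0, 19.0], [9.0, 21.0]],
    "Plough": [[20.0, 10.0], [19.0, 11.0], [21.0, 9.0]],
}


def time_averaged(level):
    """Each plot's mean over time: the response behind the split plot main plot test."""
    return [mean(p) for p in plots[level]]


def at_time(level, t):
    """Each plot's value at one time: one component of the MANOVA response vector."""
    return [p[t] for p in plots[level]]


diff_avg = mean(time_averaged("Plough")) - mean(time_averaged("None"))
diff_by_season = [mean(at_time("Plough", t)) - mean(at_time("None", t)) for t in range(2)]

print(diff_avg)        # 0.0
print(diff_by_season)  # [10.0, -10.0]
```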

The benefit of using the Bonferroni modification for multiple comparison tests in this application is open to debate. While it may prevent spurious relationships when many comparisons take place, it sometimes prevented any results from being significant. Often when a factorial effect was significant, no multiple comparison tests were significant because the Bonferroni-adjusted significance level was too small. Applying a multiple comparison procedure such as Tukey’s or Student-Newman-Keuls (SNK) without the Bonferroni modification may be a better approach in the future.
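As a rough illustration of how quickly the Bonferroni modification tightens the significance level, the Python sketch below assumes all pairwise comparisons among the six compaction and cultivation combinations within a month. This is an assumption for illustration; the number of comparisons actually made in each analysis may differ.

```python
from math import comb


def bonferroni_level(n_groups, family_alpha=0.05):
    """Number of pairwise comparisons and the Bonferroni per-comparison level."""
    n_comparisons = comb(n_groups, 2)
    return n_comparisons, family_alpha / n_comparisons


# Six treatment combinations: 3 compaction levels x 2 cultivation levels
m, level = bonferroni_level(6)
print(m, round(level, 5))  # 15 0.00333
```

Each comparison must then reach p < 0.0033 rather than p < 0.05, which helps explain why significant factorial effects were often not followed by any significant pairwise differences.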

For the short time series situation found here, MANOVA and split plot designs were

appropriate. As the number of time intervals increases, MANOVA begins to falter because the error degrees of freedom available become small relative to the number of time intervals being analysed. In the case of the Yarraman data sets, this problem was avoided by using seasons rather than months. Conventional time series techniques were not appropriate due to the short length of the time series. Investigating a seasonal effect and including multiple variables, though desirable, were definitely not feasible with a maximum of nineteen time points. As the time series becomes longer (ie. more than 25 time intervals), conventional univariate time series analysis techniques may become useful.
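The degrees of freedom problem can be sketched with simple arithmetic. Assuming one plot for each of the six compaction and cultivation combinations in each of three blocks (18 plots, one observation per plot per time), an RCB analysis leaves (6 - 1)(3 - 1) = 10 error degrees of freedom. Because the p × p error sums of squares and cross products matrix used by MANOVA has rank at most equal to the error degrees of freedom, it can only be nonsingular when the number of times p is no larger than 10. The helpers below are hypothetical, not part of any statistical package.

```python
def rcb_error_df(n_treatments, n_blocks):
    """Error df for an RCB design with one plot per treatment in each block."""
    return (n_treatments - 1) * (n_blocks - 1)


def manova_possible(n_times, error_df):
    """Necessary condition for a nonsingular error SSCP matrix."""
    return n_times <= error_df


df_error = rcb_error_df(n_treatments=3 * 2, n_blocks=3)
print(df_error)                                                     # 10
print(manova_possible(4, df_error), manova_possible(19, df_error))  # True False
```

Four seasons fit comfortably, whereas nineteen monthly responses would leave the error matrix singular, which is consistent with the decision to analyse seasons rather than months.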

The direction taken in analysis from the start was to look at each individual variable over

time, rather than focusing on particular times. There is also potential for multivariate analysis of several variables at specific times, rather than of one variable across different times. For instance, nitrate levels, dynamics and leaching could all be analysed in one MANOVA model (but not in a standard split plot model, which requires a single dependent variable). These types of models were beyond the scope of this thesis but present a possibility for future analysis.


6 CONCLUSION

Data correlated over time result from situations where one or more variables are recorded

over time. This dissertation found the analysis of data correlated over time to be a common endeavour across many fields; medicine, chemistry and forestry are among the fields that work with such data. A variety of techniques are available to deal with differences in types of data and purposes of analysis.

Whatever the area of interest, there is an analysis option available. The better understood theories are covered in many textbooks and are not overly complex. Care must be

taken that data adhere to the assumptions of the statistical tests involved. Advanced

methods even allow for the exploitation of the ‘lack of’ relationships (eg. cointegration

models).

Techniques available for the analysis of data correlated over time can be placed into two

large general categories. These are repeated measures and time series techniques, both

supported in modern statistical software packages. Repeated measures techniques cover

situations where multiple measurements are recorded on the same experimental unit and

these may not necessarily be over time. Time series techniques are designed specifically for situations where a number of measurements are made over time. Despite this, time series techniques have drawbacks with regard to the length of time series required for analysis. A general rule is that at least 25 time points are

recommended for univariate time series (Nemec, 1996), and this increases dramatically

for multivariate techniques due to the additional parameters involved. Where there are

too few times to feasibly conduct (univariate or multivariate) time series analysis,

repeated measures techniques can usually be applied.

Recent theoretical and practical applications found in the literature were quite varied in

nature, with ARIMA based methods featuring predominantly. A range of other

techniques exists on the fringes, though most of these methods are designed for particular

situations. Nonlinear techniques exist for use when the assumption of linear relationships is not tenable. Bayesian techniques introduce a new statistical paradigm into the

multivariate time series field. One of the most interesting possibilities found was the use

of genetic algorithms for the accurate modelling of short multivariate time series

situations.

The Yarraman forestry case study involved many simultaneously recorded variables over

a maximum of nineteen months. Chemical and biological variables were recorded each

month at different compaction and cultivation levels for three designed blocks. In effect,

the original design was a randomised complete block (RCB) recorded over a number of

times. The original decision to use slope as the block was based on experimental

practicalities and suspicions. The validity of the block is questionable, though, as it was based on very slight differences that would not have been expected to be a source of variation in the recorded variables.

Having only nineteen time periods in the case study made analysis using specific time series methods impractical. It was therefore decided to focus instead on repeated measures designs, using the two techniques of MANOVA and split plot analysis. Some time series based techniques were applied where appropriate. Original graphs of the twelve variables for each combination of the compaction and cultivation levels over time were smoothed with the aid of a three month moving average. A form of cross correlation function was used effectively to judge the correlation between particular variables and the environmental variables. Each variable was compared with a number of lags of each environmental variable in case there was a delayed effect of environmental behaviour on variable response. In this case only one side of the typical cross correlation function made practical sense, as it is implausible that soil properties would affect environmental variables such as temperature.
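A three month moving average of the kind used to smooth the original graphs can be sketched in a few lines of Python. The thesis does not state whether the average was centred or trailing, so this minimal version assumes a centred window (each smoothed value is the mean of a month and its two neighbours); the input values are hypothetical.

```python
def moving_average(series, window=3):
    """Centred moving average; the first and last window // 2 points are lost."""
    if window % 2 == 0:
        raise ValueError("a centred moving average needs an odd window")
    half = window // 2
    return [sum(series[i - half:i + half + 1]) / window
            for i in range(half, len(series) - half)]


monthly = [12.0, 18.0, 9.0, 21.0, 15.0, 30.0]  # hypothetical monthly values
print(moving_average(monthly))  # [13.0, 16.0, 15.0, 22.0]
```

A trailing average would instead assign each mean to the last of its three months, which shifts the smoothed curve later in time.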

Split plot designs and MANOVA were both found to be valuable analytical tools for the

case study. Using the split plot designs and MANOVA in conjunction with one another

revealed far more information than either technique would have if used in isolation. This

is because of differences in the hypotheses tested by the two techniques. In the correlated

data over time situation, factorial effects in the (split plot) main plot section are estimated using values averaged over time. In MANOVA, estimates are based on investigating the factorial effects at each time using a global testing process. In practice, it was best to examine the more specific results from the split plot designs first and then refer to MANOVA.

Differences in approach between split plots and MANOVA led to occasional situations

where the two analytical methods appear to give conflicting results. It is seen in the case

study that these apparent differences, if interpreted correctly, add to the overall

understanding of the data. For example, mean nitrate leaching was found to be

significantly affected by cultivation overall from the split plot design (p = 0.0383) but not

the MANOVA (p = 0.2197). This is because no single season shows a noticeable difference between the cultivation levels, but a significant difference emerges when all seasons are considered together.

A major hindrance to finding significant relationships in this experiment was the large

amount of natural variation or error present. The main reason suspected for the high

levels of variation was the presence of confounding variables that were not included in

the model. It was found that the biological variables were significantly correlated with

soil moisture and it is suspected that the inclusion of soil moisture may improve the

results. The time constraints of this thesis prevented relevant covariate analyses from being investigated.

When it comes to analysis of data correlated over time, it is clear from this investigation

that it is important to know what exactly is of interest. This is because there is a great

range of time series techniques available, and an appropriate selection will save time and

resources. Related to this is the importance of a suitable experimental design from the

outset.

Although not elementary statistical concepts, repeated measures and time series analyses

are not too difficult theoretically or practically. Focusing on ARIMA based methods, split

plot designs and MANOVA for discussion, all have advantages and disadvantages.

Modern statistical packages provide support for the evaluation of all base techniques for

ARIMA, split plots and MANOVA. In the case of SAS (SAS Institute, 1999), split plot

designs are not as well supported because multiple factorial effects cannot be specified as an error term (this must be done manually). ARIMA based methods involve more human decision making, as the appropriate terms to include in models are frequently selected from correlation functions and model ‘success’. Split plot designs and MANOVA require

less human decision making, but require expert interpretation when complex or

contrasting behaviour is found. All methods involve important statistical assumptions

about samples that must be carefully monitored. In the case of ARIMA, stationarity is a

common issue while for split plot and MANOVA it is normality.
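Where an ARIMA analysis finds a series to be non-stationary, a common remedy is differencing. The minimal Python sketch below (hypothetical helper and data) shows that a first difference removes a linear trend and a lag 12 difference removes a fixed monthly pattern. It does not perform the Dickey-Fuller tests used to detect non-stationarity.

```python
def difference(series, lag=1):
    """Return the differenced series x_t - x_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


trend = [5.0 + 2.0 * t for t in range(6)]      # 5, 7, ..., 15
print(difference(trend))                       # [2.0, 2.0, 2.0, 2.0, 2.0]

seasonal = [float(t % 12) for t in range(36)]  # same pattern each year
print(set(difference(seasonal, lag=12)))       # {0.0}
```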

This dissertation has presented a set of practically based techniques for dealing with data

correlated over time. A review of recent developments and applications in the literature has been provided. A detailed example demonstrating the usefulness of

techniques of dealing with data correlated over time has been provided. It is anticipated

that the reader now has a working understanding of the concepts and issues surrounding

the analysis of situations involving data correlated over time.


REFERENCES

Akman, I., and De Gooijer, J. G. (1996), Component Extraction Analysis of Multivariate Time Series, Computational Statistics and Data Analysis, 21, 487-499.

Alexandrov, G. A., Yamagata, Y., and Oikawa, T. (1999), Towards a Model for Projecting Net Ecosystem Production of the World Forests, Ecological Modelling, 123, 183-191.

Berry, D. A., and Stangl, D. K. (1996), Bayesian Biostatistics, Marcel Dekker Inc, New York, USA.

Bluman, A. G. (2001), Elementary Statistics – A Step By Step Approach, McGraw-Hill, New York, New York, USA.

Blumfield, T. J., Xu, Z. H., and Chen, C. R. (2002), Soil Compaction and Mineral Nitrogen Dynamics during Hoop Pine Plantation Establishment, Cooperative Research Centre (CRC) for Sustainable Production Forestry, Griffith University, Nathan, Queensland, Australia.

Boyd, I. L. and Murray, A. W. A. (2001), Monitoring a Marine Ecosystem Using Responses of Upper Trophic Level Predators, Journal of Animal Ecology, 70, 747-760.

Cao, L., Mees, A., and Judd, K. (1998), Dynamics from Multivariate Time Series, Physica D, 121, 75-88.

Chan, W. S., Lo, H. W. C., and Cheung, S. H. (1999), Return Transmission Among Stock Markets of Greater China, Mathematics and Computers in Simulation, 48, 511-518.

Chatfield, C. (1980), The Analysis of Time Series: An Introduction, Second Edition, Chapman and Hall, New York, New York, USA.

Chaturvedi, A., Wan, A. T. K., and Singh, S. P. (2002), Improved Multivariate Prediction in a General Linear Model with an Unknown Error Covariance Matrix, Journal of Multivariate Analysis, [Online] Available: http://www.academicpress.com/jmva (23/07/2002).

Chen, C. R., Xu, Z. H., Blumfield, T. J. and Hughes, J. M. (2002), Soil Microbial Biomass During the Early Establishment of Hoop Pine Plantation: Seasonal Variation and Impacts of Site Preparation, Cooperative Research Centre (CRC) for Sustainable Production Forestry, Griffith University, Nathan, Queensland, Australia.

Chen, H., and Dyke, P. P. G. (1998), Multivariate Time-Series Model for Suspended Sediment Concentration, Continental Shelf Research, 18, 123-150.

Chin, D. A. (1995), A Scale Model of Multivariate Rainfall Time Series, Journal of Hydrology, 168, 1-15.

Cochran, W. G., and Cox, G. M. (1957), Experimental Designs – Second Edition, John Wiley & Sons, New York, USA.


Crowder, M. J. and Hand, D. J. (1990), Analysis of Repeated Measures, Chapman & Hall, New York, USA.

Crucianu, M., Boné, R., and Asselin de Beauville, J. (2001), Bayesian Learning For Recurrent Neural Networks, Neurocomputing, 36, 235-242.

Diamandis, P. F., Georgoutsos, D. A., and Kouretas, G. P. (2000), The Monetary Model in the Presence of I(2) Components: Long-run Relationships, Short-run Dynamics and Forecasting of the Greek Drachma, Journal of International Money and Finance, 19, 917-941.

Felmingham, B., Qing, Z., and Healy, T. (2000), The Interdependence of Australian and Foreign Real Interest Rates, Economic Record, 76, 163-171.

Franses, P. H. (1998), Time Series Models for Business and Economic Forecasting, Cambridge University Press, United Kingdom.

Green, A. G. and Sparks, G. R. (1999), Population Growth and the Dynamics of Canadian Development: A Multivariate Time Series Approach, Explorations in Economic History, 36, 56-71.

Guimarães, G., Peter, J. H., Penzel, T., and Ultsch, A. (2001), A Method for Automated Temporal Knowledge Acquisition Applied to Sleep-Related Breathing Disorders, Artificial Intelligence in Medicine, 23, 211-237.

Jensen, G. F. (2001), The Invention of Television as a Cause of Homicide: The Reification of a Spurious Relationship, Homicide Studies, 5, 114-130.

Johnson, R. A. and Wichern, D. W. (1982), Applied Multivariate Statistical Analysis, Prentice-Hall, New Jersey, USA.

Keselman, H. J., Algina, J., Kowalchuk, R. K. and Wolfinger, R. D. (1999), A Comparison of Recent Approaches to the Analysis of Repeated Measurements, British Journal of Mathematical and Statistical Psychology, 52, 63-78.

Khalil, M., Panu, U. S. and Lennox, W. C. (2001), Groups and Neural Networks Based Streamflow Data Infilling Procedures, Journal of Hydrology, 241, 153-176.

Kulkarni, D. R., and Parikh, J. C. (2000), Multivariate Time Series Modeling In A Connectionist Approach, International Journal of Modern Physics, 11, 159-173.

Li, Z., and Kafatos, M. (2000), Interannual Variability of Vegetation in the United States and Its Relation to El Niño/Southern Oscillation, Remote Sensing of Environment, 71, 239-247.

Lu, S., Lu, H. and Kolarik, W. J. (2001), Multivariate Performance Reliability Prediction in Real-Time, Reliability Engineering and System Safety, 72, 39-45.

Maharaj, E. A. (1999), Comparison and Classification of Stationary Multivariate Time Series, Pattern Recognition, 32, 1129-1138.

Makridakis, S., Wheelwright, S. C., and Hyndman, R. J. (1998), Forecasting Methods and Applications – Third Edition, John Wiley & Sons, New York, USA.

Mann, P. M. (1998), Introductory Statistics – Third Edition, John Wiley & Sons, New York, New York, USA.

McCleary, R., and Hay, R. A. Jnr. (1980), Applied Time Series Analysis For The Social Sciences, Sage Publications, Beverly Hills, California, USA.


Microsoft Corporation (2001), Microsoft Excel 2002, Microsoft Corporation, USA.

Nemec, A. F. L. (1996), Analysis of Repeated Measures and Time Series: An Introduction with Forestry Examples (Biometrics Information Handbook No. 6), Ministry of Forests Research Program, British Columbia, Canada.

Nicholson, M., Fryer, R., and Maxwell, D. (1998), Multivariate Trends in Phytoplankton Species Groups in the Southern North Sea, ICES Journal of Marine Science, 55, 581-586.

Ørstavik, S., Carretero-González, R., and Stark, J. (2000), Estimation of Intensive Quantiles in Spatio-Temporal Systems From Time-Series, Physica D, 147, 204-220.

Paluš, M. (1996), Detecting Nonlinearity in Multivariate Time Series, Physics Letters A, 213, 138-147.

Pech, N., Samba, A., Drapeau, L., Sabatier, R., and Laloë, F. (2001), Fitting a Model of Flexible Multifleet-Multispecies Fisheries to Senegalese Artisanal Fishery Data, Aquatic Living Resources, 14, 81-98.

Peiris, D. R., and McNicol, J. W. (1996), Modelling Daily Weather with Multivariate Time Series, Agricultural and Forest Meteorology, 79, 219-231.

Perry, D. A. (1998), The Scientific Basis Of Forestry, Annual Review of Ecology and Systematics, 29, 435-466.

Pynnönen, S. (2001), Multivariate Time Series, Professor of Statistics, University of Vaasa.

Rao, P. V. (1998), Statistical Research Methods in the Life Sciences, Brooks/Cole Publishing Company, Pacific Grove, California, USA.

Reick, C. H., and Page, B. (2000), Time Series Prediction by Multivariate Next Neighbour Methods with Application to Zooplankton Forecasts, Mathematics and Computers in Simulation, 52, 289-310.

Repucci, M. A., Schiff, N. D., and Victor, J. D. (2001), General Strategy for Hierarchical Decomposition of Multivariate Time Series: Implications for Temporal Lobe Seizures, Annals of Biomedical Engineering, 29, 1135-1149.

Rodó, X., Giralt, S., Burjachs, F., Comín, F. A., Tenorio, R. G., and Julià, R. (2002), High-Resolution Saline Lake Sediments As Enhanced Tools For Relating Proxy Paleolake Records to Recent Climatic Data Series, Sedimentary Geology, 148, 203-220.

SAS Institute Inc (1999), The SAS System for Windows – Version 8.00 Edition, SAS Institute Inc., Cary, NC, USA.

Smith, T. E. and Bubb, K. A. (2000), The Effects of Mechanical Harvesting Operations on Plantation Productivity in QDPIF Hoop Pine Plantations, Part A: Impacts to Soil Physical Properties, Queensland Forestry Research Institute: Agency for Food and Fibre Sciences, Gympie, Queensland, Australia.

SPSS (1999), SPSS Base 10 Application Guide, SPSS Inc., Chicago, USA.

Stărică, C. (1999), Multivariate Extremes for Models with Constant Conditional Correlations, Journal of Empirical Finance, 6, 515-553.


Stergiou, K. I., Christou, E. D., and Petrakis, G. (1997), Modelling and Forecasting Monthly Fisheries Catches: Comparison of Regression, Univariate and Multivariate Time Series Methods, Fisheries Research, 29, 55-95.

Swift, S., and Liu, X. (2002), Predicting Glaucomatous Visual Field Deterioration Through Short Multivariate Time Series Modelling, Artificial Intelligence in Medicine, 24, 5-24.

Swift, S., Tucker, A., Martin, N. and Liu, X. (2001), Grouping Multivariate Time Series Variables: Applications to Chemical Process and Visual Field Data, Knowledge-Based Systems, 14, 147-154.

Telenius, B. and Verwijst, T. (1995), The Influence of Allometric Variation, Vertical Biomass Distribution and Sampling Procedure On Biomass Estimates In Commercial Short-Rotation Forests, Bioresource Technology, 51, 247-253.

Van Dongen, G. and Geuens, L. (1998), Multivariate Time Series Analysis For Design and Operation of a Biological Wastewater Treatment Plant, Water Research, 32, 691-700.

Wilson, G. T., Reale, M., and Morton, A. S. (2001), Developments in Multivariate Time Series Modeling, University of Canterbury, Christchurch, New Zealand, Report Number: VCDMS2001/1.

Wold, S., Sjöström, M., and Eriksson, L. (2001), PLS-Regression: A Basic Tool of Chemometrics, Chemometrics and Intelligent Laboratory Systems, 58, 109-130.

Zar, J. H. (1999), Biostatistical Analysis – Fourth Edition, Prentice-Hall International Inc., New Jersey, USA.


STATISTICAL ANALYSES OF MULTIVARIATE

TIME SERIES DATA

WITH APPLICATION TO COMPACTING

EFFECTS ON SOIL CHEMICAL AND

BIOLOGICAL PROPERTIES IN FORESTRY

VOLUME TWO

By Stuart Fenech BSc (AES)

Australian School of Environmental Studies

Faculty of Environmental Science

GRIFFITH UNIVERSITY

BRISBANE

This dissertation is submitted in partial fulfilment of the requirements of the degree of

Bachelor of Science with Honours in Australian Environmental Studies.

October 2002


APPENDIX A – SAS EXAMPLES INPUT

Example 1 - Exploratory Data Analysis

data EDA;
   input mayRain;
   datalines;
... [May rainfall values, one per line; they could not be separated in the extracted text]
;
run;

proc univariate;
run;

Example 2 – Autocorrelation and Autocovariance

/* Reading in the data */
data rainy;
   input mon $ year rain;
   datalines;
Jan 1985 76.2
Feb 1985 94.8
Mar 1985 224
... [A LOT MORE DATA]
Oct 2001 56.9
Nov 2001 153
Dec 2001 102
;
run;

/* Using proc ARIMA for autocovariance and autocorrelation */
proc arima;
   identify VAR=rain;
run;


Example 3 – Rainfall Correlation Functions

/* Reading in the data */
data rainy;
   input mon $ year rain;
   datalines;
Jan 1985 76.2
Feb 1985 94.8
Mar 1985 224
Apr 1985 119
... [A LOT MORE DATA]
Oct 2001 56.9
Nov 2001 153
Dec 2001 102
;
run;

/* Using proc ARIMA for autocorrelation function, etc. */
proc arima;
   identify VAR=rain;
run;

Example 4 – Rainfall Cross Correlation Function

/* Reading in the data */
data rainy;
   input mon $ year rain days;
   datalines;
Jan 1985 76.2 4
Feb 1985 94.8 9
Mar 1985 224 14
...
Sep 2001 12 4
Oct 2001 56.9 6
Nov 2001 153 11
Dec 2001 102 7
;
run;

/* Using proc ARIMA for cross correlation */
proc arima;
   identify VAR=rain CROSSCOR=days;
run;


Example 7 – Linear Regression


/* Using proc REG for regression */
proc reg;
   model rain = days /* Additional variables if needed */;
run;

/* Using proc GLM for regression */
proc glm;
   model rain = days /* Additional variables if needed */;
run;

Example 8 – Rainfall ARIMA Model


/* Initial testing for stationarity */
proc arima;
   identify VAR=rain STATIONARITY=(DICKEY=(0,1,5,10));
run;

/* Running an actual ARIMA model */
proc arima;
   identify VAR=rain STATIONARITY=(DICKEY=(0,1,5,10));
   estimate P=(1)(12) PLOT;
run;

190

Example 9 – Rainfall Multivariate ARIMA Model


/* Testing lags 0 to 12 of days */
proc arima;
  identify VAR=rain CROSSCOR=days STATIONARITY=(DICKEY=(0,1,5,10));
  estimate INPUT=((0,1,2,3,4,5,6,7,8,9,10,11,12) days) PLOT;
run;

/* Testing only lag 0 of days */
proc arima;
  identify VAR=rain CROSSCOR=days STATIONARITY=(DICKEY=(0,1,5,10));
  estimate INPUT=((0) days) PLOT;
run;

/* Testing lag 0 of days with lags 1 and 12 of rain */
proc arima;
  identify VAR=rain CROSSCOR=days STATIONARITY=(DICKEY=(0,1,5,10));
  estimate INPUT=(0 days) P=(1)(12) PLOT;
run;

APPENDIX B – SAS EXAMPLES OUTPUT

Example 3 – Rainfall Correlation Functions

Note that in the following SAS output the axis numbers ‘1’, ‘2’, etc. actually denote the first decimal place. The left-hand side of each diagram represents negative correlations and the right-hand side positive correlations; for example, the ‘7’ on the left-hand side represents ‘-0.7’. Full stops mark two standard errors either side of zero. The asterisks are (rather crude) representations of the correlation at each lag.

Autocorrelations

Lag  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1  Std Error
0    | |********************|  0
1    | . |***** |  0.070014
2    | . |* . |  0.073611
3    | . | . |  0.073858
4    | . *| . |  0.073867
5    | .**| . |  0.073925
6    | ***| . |  0.074552
7    | ****| . |  0.075783
8    | . *| . |  0.077762
9    | . | . |  0.077823
10   | . |* . |  0.077837
11   | . |**. |  0.077937
12   | . |***** |  0.078683
13   | . |**. |  0.081931
14   | . | . |  0.082838
15   | . | . |  0.082839
16   | ***| . |  0.082840
17   | ***| . |  0.084619
18   | ****| . |  0.086174
19   | .***| . |  0.087902
20   | . *| . |  0.088940
21   | . | . |  0.089106
22   | . |** . |  0.089109
23   | . |***. |  0.089920
24   | . |***. |  0.091392

"." marks two standard errors
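The right-hand standard error column, and therefore the full stops at two standard errors, appears to follow Bartlett's approximation for the variance of a sample autocorrelation. The sketch below checks the first two values. It assumes $n = 204$ monthly observations (January 1985 to December 2001). It takes $r_1 = 0.22955$ from the lag 1 partial autocorrelation below, since the lag 1 partial and ordinary autocorrelations are always equal.

```latex
\begin{aligned}
\mathrm{SE}(r_k) &\approx \sqrt{\frac{1}{n}\Bigl(1 + 2\sum_{j=1}^{k-1} r_j^2\Bigr)} \\
\mathrm{SE}(r_1) &= \frac{1}{\sqrt{204}} \approx 0.070014 \\
\mathrm{SE}(r_2) &= \sqrt{\frac{1 + 2 \times 0.22955^2}{204}} \approx 0.073611
\end{aligned}
```

Both results agree with the printed standard errors. Because each standard error accumulates the squared correlations at shorter lags, the band widens gradually. It grows from about ±0.14 at lag 1 to about ±0.18 at lag 24.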

Inverse Autocorrelations

Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1    -0.14819  | ***| . |
2     0.01785  | . | . |
3     0.00192  | . | . |
4    -0.04323  | . *| . |
5    -0.02746  | . *| . |
6    -0.04143  | . *| . |
7     0.12140  | . |**. |
8    -0.06345  | . *| . |
9     0.01748  | . | . |
10    0.03610  | . |* . |
11    0.02650  | . |* . |
12   -0.13184  | ***| . |
13   -0.03791  | . *| . |
14    0.07625  | . |**. |
15   -0.08401  | .**| . |
16    0.12260  | . |**. |
17    0.02838  | . |* . |
18    0.06193  | . |* . |
19    0.02578  | . |* . |
20    0.00497  | . | . |
21    0.03667  | . |* . |
22   -0.08051  | .**| . |
23   -0.03179  | . *| . |
24   -0.02682  | . *| . |

Partial Autocorrelations

Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1     0.22955  | . |***** |
2     0.00878  | . | . |
3    -0.00464  | . | . |
4    -0.03353  | . *| . |
5    -0.08803  | .**| . |
6    -0.10035  | .**| . |
7    -0.12725  | ***| . |
8     0.04279  | . |* . |
9     0.02107  | . | . |
10    0.02495  | . | . |
11    0.07991  | . |**. |
12    0.17402  | . |*** |
13    0.01375  | . | . |
14   -0.06698  | . *| . |
15    0.01440  | . | . |
16   -0.17265  | ***| . |
17   -0.06877  | . *| . |
18   -0.07572  | .**| . |
19   -0.02888  | . *| . |
20   -0.01077  | . | . |
21   -0.00723  | . | . |
22    0.10487  | . |**. |
23    0.04594  | . |* . |
24    0.03017  | . |* . |

Example 4 – Cross Correlation Function (CCF)

The comments on Example 3 (autocorrelation functions) apply here as well.

Correlation of rain and days

Variance of input = 15.10861
Number of Observations 204

Crosscorrelations

Lag  Covariance   Corr      -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
-12   87.698391   0.26871   | . |***** |
-11   62.931614   0.19282   | . |**** |
-10   31.158648   0.09547   | . |**. |
-9   -16.809443   -.05150   | . *| . |
-8    -3.484089   -.01068   | . | . |
-7   -68.982688   -.21136   | ****| . |
-6   -71.005827   -.21756   | ****| . |
-5   -44.118362   -.13518   | ***| . |
-4   -53.835095   -.16495   | ***| . |
-3   -11.265130   -.03452   | . *| . |
-2    31.202538   0.09561   | . |**. |
-1    74.625334   0.22865   | . |***** |
0       194.433   0.59575   | . |************ |
1     74.621641   0.22864   | . |***** |
2     33.891011   0.10384   | . |**. |
3     10.905942   0.03342   | . |* . |
4    -15.958500   -.04890   | . *| . |
5    -56.899180   -.17434   | ***| . |
6    -96.559925   -.29586   | ******| . |
7    -74.264974   -.22755   | *****| . |
8    -35.327098   -.10824   | .**| . |
9    -28.199919   -.08641   | .**| . |
10    16.475877   0.05048   | . |* . |
11    74.464415   0.22816   | . |***** |
12      101.732   0.31171   | . |****** |
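As a check on the lag 0 row, a cross-correlation is the cross-covariance divided by the product of the two standard deviations. The sketch below makes three assumptions. First, it uses divisor $n = 204$ throughout. Second, it takes $x$ = days and $y$ = rain. Third, it takes the variance of rain (not printed here) as the corrected total sum of squares from the Example 8 regression output divided by $n$.

```latex
\begin{aligned}
r_{xy}(0) &= \frac{c_{xy}(0)}{s_x\,s_y}, \qquad s_x^2 = 15.10861, \qquad s_y^2 = \frac{1438208}{204} \approx 7050.04 \\
r_{xy}(0) &\approx \frac{194.433}{\sqrt{15.10861}\,\sqrt{7050.04}} \approx \frac{194.433}{3.887 \times 83.96} \approx 0.596
\end{aligned}
```

This agrees with the printed lag 0 correlation of 0.59575.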



Example 8 – Linear Regression

The REG Procedure
Model: MODEL1
Dependent Variable: rain

Analysis of Variance

                          Sum of         Mean
Source            DF     Squares       Square    F Value    Pr > F
Model              1      510441       510441     111.14    <.0001
Error            202      927767   4592.90592
Corrected Total  203     1438208

Root MSE          67.77098    R-Square    0.3549
Dependent Mean    88.29181    Adj R-Sq    0.3517
Coeff Var         76.75794

Parameter Estimates

                  Parameter    Standard
Variable    DF     Estimate       Error    t Value    Pr > |t|
Intercept    1    -17.18371    11.07325      -1.55      0.1223
days         1     12.86902     1.22072      10.54      <.0001

The GLM Procedure

Dependent Variable: rain

                              Sum of
Source            DF         Squares    Mean Square    F Value    Pr > F
Model              1      510441.398     510441.398     111.14    <.0001
Error            202      927766.995       4592.906
Corrected Total  203     1438208.393

R-Square    Coeff Var    Root MSE    rain Mean
0.354915     76.75794    67.77098     88.29181

Source    DF      Type I SS    Mean Square    F Value    Pr > F
days       1    510441.3983    510441.3983     111.14    <.0001

Source    DF    Type III SS    Mean Square    F Value    Pr > F
days       1    510441.3983    510441.3983     111.14    <.0001

                                 Standard
Parameter         Estimate          Error    t Value    Pr > |t|
Intercept     -17.18370784    11.07324557      -1.55      0.1223
days           12.86902297     1.22072097      10.54      <.0001
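Several numbers in this output are tied together by standard simple-regression identities, which allows a quick consistency check against the Example 4 cross-correlation output. The sketch below takes $c_{xy}(0) = 194.433$ and $s_x^2 = 15.10861$ (the variance of days) from that output.

```latex
\begin{aligned}
\hat\beta_1 &= \frac{c_{xy}(0)}{s_x^2} = \frac{194.433}{15.10861} \approx 12.869 \\
R^2 &= \frac{\mathrm{SS}_{\mathrm{Model}}}{\mathrm{SS}_{\mathrm{Total}}} = \frac{510441}{1438208} \approx 0.3549 \approx 0.59575^2 = r_{xy}(0)^2 \\
t &= \frac{\hat\beta_1}{\mathrm{SE}(\hat\beta_1)} = \frac{12.86902}{1.22072} \approx 10.54, \qquad F = t^2 \approx 111.14
\end{aligned}
```

With a single predictor, the REG and GLM F tests and the t test on days are therefore the same test.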


Example 9 – Rainfall ARIMA Model

Augmented Dickey-Fuller Unit Root Tests

Type          Lags        Rho    Pr < Rho      Tau    Pr < Tau
Zero Mean        0   -74.3640      <.0001    -6.72      <.0001
                 1   -44.1021      <.0001    -4.66      <.0001
                 3   -21.7670      0.0008    -3.14      0.0018
                 5   -15.2863      0.0060    -2.54      0.0110
Single Mean      0   -156.394      0.0001   -11.22      <.0001
                 1   -152.853      0.0001    -8.67      <.0001
                 3   -173.290      0.0001    -6.76      <.0001
                 5   -908.962      0.0001    -6.56      <.0001
Trend            0   -156.416      0.0001   -11.20      <.0001
                 1   -152.913      0.0001    -8.65      <.0001
                 3   -173.197      0.0001    -6.74      <.0001
                 5   -909.016      0.0001    -6.54      <.0001
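The three Type rows differ in which deterministic terms enter the augmented Dickey-Fuller regression. The sketch below shows the usual form, with $p$ the number of lagged differences given in the Lags column.

```latex
\Delta y_t = \alpha + \beta t + \rho\, y_{t-1} + \sum_{j=1}^{p} \gamma_j\, \Delta y_{t-j} + e_t,
\qquad H_0:\ \rho = 0 \ (\text{unit root})
```

The three variants differ as follows:
- Zero Mean sets $\alpha = \beta = 0$.
- Single Mean sets $\beta = 0$.
- Trend keeps both terms.

Tau is the $t$ ratio for $\hat\rho$, compared with Dickey-Fuller rather than Student $t$ critical values. Every Pr < Tau value above is below 0.05, so a unit root is rejected at each lag order shown. This is consistent with fitting the rainfall model without differencing.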

Conditional Least Squares Estimation

                          Standard                 Approx
Parameter    Estimate        Error    t Value    Pr > |t|    Lag
MU           88.46947      8.90639       9.93      <.0001      0
AR1,1         0.19926      0.06926       2.88      0.0044      1
AR2,1         0.22670      0.07276       3.12      0.0021     12

Constant Estimate      54.78186
Variance Estimate       6470.93
Std Error Estimate     80.44209

The ARIMA Procedure

AIC                     2372.02
SBC                    2381.974
Number of Residuals         204

* AIC and SBC do not include log determinant.

Correlations of Parameter Estimates

Parameter       MU     AR1,1     AR2,1
MU           1.000     0.001     0.003
AR1,1        0.001     1.000    -0.063
AR2,1        0.003    -0.063     1.000

Autocorrelation Plot of Residuals

Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
0     1.00000  | |********************|
1     -.00418  | . | . |
2     0.02090  | . | . |
3     -.00335  | . | . |
4     0.02526  | . |* . |
5     -.01751  | . | . |
6     -.05223  | . *| . |
7     -.13569  | ***| . |
8     0.00856  | . | . |
9     0.01327  | . | . |
10    -.00798  | . | . |
11    0.03288  | . |* . |
12    -.01308  | . | . |
13    0.05938  | . |* . |
14    -.02232  | . | . |
15    0.01769  | . | . |
16    -.15192  | ***| . |
17    -.07228  | . *| . |
18    -.10248  | .**| . |
19    -.07657  | .**| . |
20    -.01122  | . | . |
21    -.02114  | . | . |
22    0.09650  | . |**. |
23    0.10554  | . |**. |
24    0.04805  | . |* . |


Inverse Autocorrelations

Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1     0.05504  | . |* . |
2     0.03323  | . |* . |
3    -0.00812  | . | . |
4    -0.05064  | . *| . |
5    -0.01044  | . | . |
6    -0.00149  | . | . |
7     0.11853  | . |**. |
8    -0.02505  | . *| . |
9     0.02746  | . |* . |
10    0.02628  | . |* . |
11   -0.00179  | . | . |
12    0.03220  | . |* . |
13   -0.04304  | . *| . |
14    0.03290  | . |* . |
15   -0.03709  | . *| . |
16    0.11700  | . |**. |
17    0.06798  | . |* . |
18    0.09580  | . |**. |
19    0.08192  | . |**. |
20   -0.00424  | . | . |
21    0.01902  | . | . |
22   -0.09172  | .**| . |
23   -0.06844  | . *| . |
24   -0.02149  | . | . |

Partial Autocorrelations

Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
1    -0.00418  | . | . |
2     0.02089  | . | . |
3    -0.00318  | . | . |
4     0.02481  | . | . |
5    -0.01720  | . | . |
6    -0.05348  | . *| . |
7    -0.13582  | ***| . |
8     0.00804  | . | . |
9     0.02003  | . | . |
10   -0.00602  | . | . |
11    0.03808  | . |* . |
12   -0.02016  | . | . |
13    0.04356  | . |* . |
14   -0.03838  | . *| . |
15    0.01749  | . | . |
16   -0.14918  | ***| . |
17   -0.08034  | .**| . |
18   -0.09348  | .**| . |
19   -0.08340  | .**| . |
20    0.00681  | . | . |
21   -0.02820  | . *| . |
22    0.09420  | . |**. |
23    0.07241  | . |* . |
24    0.02325  | . | . |

Model for variable rain

Estimated Mean 88.46947

Autoregressive Factors

Factor 1: 1 - 0.19926 B**(1)
Factor 2: 1 - 0.2267 B**(12)
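Putting the estimated mean and the two autoregressive factors together gives the fitted multiplicative (seasonal) AR model. It is written here with the backshift operator $B$ ($B^k y_t = y_{t-k}$) and white-noise errors $a_t$. Expanding the factors recovers the Constant Estimate printed earlier.

```latex
\begin{aligned}
&(1 - 0.19926B)(1 - 0.2267B^{12})(\mathrm{rain}_t - 88.46947) = a_t \\
&\text{constant} = 88.46947\,(1 - 0.19926)(1 - 0.2267) \approx 54.78 \\
&\mathrm{rain}_t \approx 54.78 + 0.19926\,\mathrm{rain}_{t-1} + 0.2267\,\mathrm{rain}_{t-12} - 0.04517\,\mathrm{rain}_{t-13} + a_t
\end{aligned}
```

The lag 13 coefficient is the product $0.19926 \times 0.2267$. It is implied by the multiplicative form rather than estimated separately.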


Example 10 – Multivariate Rainfall ARIMA model

(Initial model)


                          Standard      Approx
Parameter     Estimate       Error    Pr > |t|    Lag    Variable
MU            41.31748    34.57051      0.2336      0    rain
NUM1          12.20554     1.44052      <.0001      0    days
NUM1,1        -0.01414     1.48190      0.9924      1    days
NUM1,2         0.81242     1.48888      0.5860      2    days
NUM1,3        -0.02988     1.48894      0.9840      3    days
NUM1,4        -0.84325     1.48745      0.5715      4    days
NUM1,5         1.54941     1.43216      0.2807      5    days
NUM1,6         1.31813     1.47719      0.3734      6    days
NUM1,7         1.63912     1.47407      0.2676      7    days
NUM1,8       0.0088404     1.46303      0.9952      8    days
NUM1,9         1.92493     1.45910      0.1887      9    days
NUM1,10        0.20047     1.42377      0.8882     10    days

(Later model)


                          Standard      Approx
Parameter     Estimate       Error    Pr > |t|    Lag    Variable
MU           -17.18371    11.07325      0.1223      0    rain
NUM1          12.86902     1.22072      <.0001      0    days

Constant Estimate      -17.1837
Variance Estimate      4592.906
Std Error Estimate     67.77098
AIC                      2301.1
SBC                    2307.736
Number of Residuals         204
* AIC and SBC do not include log determinant.
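The AIC and SBC values for this model and for the Example 9 ARIMA model can be reproduced from the residual sum of squares. The sketch below rests on two assumptions. First, the Variance Estimate is $\mathrm{SSE}/(n-k)$ for $k$ estimated parameters. Second, the criteria use a Gaussian likelihood evaluated at $\tilde\sigma^2 = \mathrm{SSE}/n$, with no log determinant, as the footnote states.

```latex
\begin{aligned}
-2\ln L &\approx n\ln(2\pi\tilde\sigma^2) + n, \qquad \tilde\sigma^2 = \mathrm{SSE}/n \\
\mathrm{AIC} &= -2\ln L + 2k, \qquad \mathrm{SBC} = -2\ln L + k\ln n \\
\text{later model } (k = 2):&\quad \tilde\sigma^2 = \tfrac{4592.906 \times 202}{204} \approx 4547.88,\ -2\ln L \approx 2297.10,\ \mathrm{AIC} \approx 2301.10,\ \mathrm{SBC} \approx 2307.74 \\
\text{Example 9 model } (k = 3):&\quad \tilde\sigma^2 = \tfrac{6470.93 \times 201}{204} \approx 6375.77,\ -2\ln L \approx 2366.02,\ \mathrm{AIC} \approx 2372.02,\ \mathrm{SBC} \approx 2381.97
\end{aligned}
```

The later model's MU and NUM1 estimates are identical to the intercept and slope of the Example 8 regression. This is expected, because a transfer function with only a lag 0 input and no ARMA terms is an ordinary regression.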


APPENDIX C – EXPERIMENT TIME PERIODS

Month | Start Date
1 | 3 February 2000
2 | 2 March 2000
3 | 30 March 2000
4 | 27 April 2000
Season 1: Autumn 2000
5 | 25 May 2000
6 | 22 June 2000
7 | 20 July 2000
Season 2: Winter 2000
8 | 17 August 2000
9 | 14 September 2000
10 | 12 October 2000
11 | 9 November 2000
Season 3: Spring 2000
12 | 7 December 2000
13 | 4 January 2001
14 | 1 February 2001
Season 4: Summer 2000/2001
15 | 1 March 2001
16 | 29 March 2001
17 | 26 April 2001
Season 5: Autumn 2001
18 | 24 May 2001
19 | 21 June 2001
20* | 19 July 2001

* No measurements were taken for month 20. Included purely to show end of month 19.

APPENDIX D – VARIABLE LIST

This appendix provides a summary of the factors and variables provided for analysis. Some easily derived (and irrelevant) variables have been omitted for simplicity.

* Indicates variables that are dependent on, or strongly linked with, variables listed earlier.

CHEMICAL VARIABLES

MONTH

Values: 1 to 19.

Interpretation: Month or 28-day period.

BLOCK

Values: 1, 2, 3.

Interpretation: Replicate or block. Blocks differ by slope position: 1 is upper slope, 2 is mid-slope and 3 is lower slope.

COMPACT

Values: 1, 2, 3.

Interpretation: Level of compaction. None (1), one (2) or sixteen (3) compactions.

CULT

Values: 1, 2.

Interpretation: Whether or not cultivation was applied to the sample: not cultivated (1) or cultivated with a disc plough (2).

DEPTH

Values: 1, 2.

Interpretation: Soil depth being analysed within the sample taken: 0-10 cm (1) or 10-20 cm (2).

GRAV

Units: Percentage.

Interpretation: Gravimetric soil moisture. Percent of soil in sample.

MOIST *

Units: Percentage.

Interpretation: Soil moisture content. Percent of water in sample.

Note: Closely related to GRAV above.

RAINFALL

Units: mm per month.

Interpretation: Monthly rainfall as measured on site.

MAXTEMP

Units: Degrees Celsius.

Interpretation: Mean maximum temperature during a month.

MINTEMP *

Units: Degrees Celsius.

Interpretation: Mean minimum temperature during a month.

Note: Very strongly correlated with MAXTEMP (r = 0.85593).

TMPRANGE *

Units: Degrees Celsius.

Interpretation: Mean temperature range during a month.

Note: Correlation of 0.9973 with (MAXTEMP - MINTEMP). Not exactly MAXTEMP - MINTEMP because of rounding error.

HANO2, HANO3, HANH4, HATOTN *

Units: kgN/ha.

Interpretation: Nitrite, nitrate, ammonium and total mineral nitrogen levels, respectively.

Note: Total is the sum of the previous three (subject to rounding error).

POTNO2, POTNO3, POTNH4, POTTOTN *

Units: kgN/ha.

Interpretation: Nitrite, nitrate, ammonium and total mineral nitrogen dynamics, respectively.

Note: Total is the sum of the previous three (subject to rounding error). Calculated as capped core minus baseline.

LCHNO2, LCHNO3, LCHNH4, LCHTOTN *

Units: kgN/ha.

Interpretation: Nitrite, nitrate, ammonium and total mineral nitrogen leaching, respectively.

Note: Total is the sum of the previous three (subject to rounding error). Calculated as capped minus uncapped.

CUPOTNO2, CUPOTNO3, CUPOTNH4, CUPOTTN [All * ]

Units: kgN/ha.

Interpretation: Cumulative nitrite, nitrate, ammonium and total mineral nitrogen dynamics.


CULCHNO2, CULCHNO3, CULCHNH4, CULCHTN [All * ]

Units: kgN/ha.

Interpretation: Cumulative nitrite, nitrate, ammonium and total mineral nitrogen leaching.
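The notes above define several variables by simple arithmetic. The sketch below writes them out for month $t$. The cumulative variables are assumed here to be running totals from month 1, which the list does not state explicitly. The symbols are illustrative rather than data set names.

```latex
\begin{aligned}
\mathrm{TOTN}_t &= [\mathrm{NO_2}]_t + [\mathrm{NO_3}]_t + [\mathrm{NH_4}]_t \quad (\text{up to rounding}) \\
\mathrm{POT}_t &= \text{capped core}_t - \text{baseline}_t \\
\mathrm{LCH}_t &= \text{capped}_t - \text{uncapped}_t \\
\mathrm{CUPOT}_t &= \sum_{s=1}^{t} \mathrm{POT}_s, \qquad \mathrm{CULCH}_t = \sum_{s=1}^{t} \mathrm{LCH}_s
\end{aligned}
```

Each total and cumulative variable is a linear function of variables already listed. This is why they are flagged with *, and they add no independent information to a joint analysis.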



BIOLOGICAL VARIABLES

Note: All measurements were taken at the 0-10 cm soil depth.

BLOCK

Values: 1, 2, 3.

Interpretation: Replicate or block. Blocks differ by slope position: 1 is upper slope, 2 is mid-slope and 3 is lower slope.

Note: In chemical data set as well.

MONTH

Values: 1 to 14.

Interpretation: Month or 28-day period.

Note: Corresponds to the first 14 months of the MONTH variable in the chemical data set.

COMPACT

Values: 1, 2, 3.

Interpretation: Level of compaction. None (1), one (2) or 16 (3) compactions.


CULT

Values: 1, 2.

Interpretation: Whether or not cultivation was applied to the sample: not cultivated (1) or cultivated with a disc plough (2).


MBN

Units: µg/g (micrograms per gram).

Interpretation: Microbial nitrogen level, referred to as MBN.

MBC

Units: µg/g (micrograms per gram).

Interpretation: Microbial carbon level, referred to as MBC.

MICROC:N *

Units: Ratio.

Interpretation: Ratio of microbial carbon to nitrogen. MBC/MBN.

MBNFLUX *

Units: µg/g (micrograms per gram).

Interpretation: Microbial nitrogen flux, the difference in microbial nitrogen between successive months. Only 13 months are available because the first month has no preceding month to difference against.
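The two derived biological variables can be written as follows. This is a sketch; the flux is assumed to be the later month minus the earlier one, since the sign convention is not stated.

```latex
\begin{aligned}
\mathrm{MICROC{:}N}_t &= \mathrm{MBC}_t / \mathrm{MBN}_t, && t = 1, \dots, 14 \\
\mathrm{MBNFLUX}_t &= \mathrm{MBN}_t - \mathrm{MBN}_{t-1}, && t = 2, \dots, 14
\end{aligned}
```

Differencing 14 monthly values leaves 13 flux values.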

MOIST

Units: Percentage.

Interpretation: Soil moisture, as a percentage of oven-dried soil weight.

APPENDIX E – NITRATE LEVELS

Degrees of freedom, test statistics, F values and p values have their usual interpretations. In MANOVA, “Wilk” refers to Wilks’ lambda, “Pillai” to Pillai’s trace, “HL” to the Hotelling-Lawley trace and “Roy” to Roy’s largest root. The number of asterisks (*) represents the level of significance: “***” if p is less than 0.001, “**” if p is less than 0.01 and “*” if p is less than 0.05.
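When an effect has a single hypothesis degree of freedom, as Cultivation does, the four MANOVA statistics are exact transformations of one another and yield the same F. This is why the four Cultivation rows in each MANOVA table share one F and p value. The sketch below uses the Cultivation rows of the Overall MANOVA table in this appendix (Ndf = 5, Ddf = 6). It writes $\Lambda$ for Wilks' lambda, $V$ for Pillai's trace, $U$ for the Hotelling-Lawley trace and $\theta$ for Roy's largest root.

```latex
\begin{aligned}
V &= 1 - \Lambda = 1 - 0.51692 = 0.48308 \\
U = \theta &= \frac{1 - \Lambda}{\Lambda} = \frac{0.48308}{0.51692} \approx 0.93454 \\
F &= U \times \frac{\mathrm{Ddf}}{\mathrm{Ndf}} = 0.93454 \times \frac{6}{5} \approx 1.12
\end{aligned}
```

The same relation gives F ≈ 1.14 for Cultivation in Season 1 (0.42704 × 8/3).

For effects with more than one degree of freedom (Compaction, Compaction×Cultivation and Block), the statistics no longer coincide and can disagree. In the overall table, for example, only Roy's largest root is significant for Compaction.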

Figure: Nitrate Levels By Compaction, Cultivation, Depth 0-10 cm (x-axis: Month, 1 to 19; y-axis: Mean Nitrate (kgN/ha))

Figure: Nitrate Levels By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed) (x-axis: Month, 1 to 19; y-axis: Mean Nitrate (kgN/ha))
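The smoothed plots use a three-point moving average (3MA). The sketch below shows a centred version, with $\bar y_t$ the mean for month $t$. The centred form is an assumption here, since a trailing average over months $t-2$ to $t$ would also fit the label.

```latex
\tilde{y}_t = \frac{\bar y_{t-1} + \bar y_t + \bar y_{t+1}}{3}, \qquad t = 2, \dots, 18
```

Averaging three consecutive months damps month-to-month noise, at the cost of blurring sharp single-month peaks.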

Correlations

Lag Of… | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.071774977 | 0.31109683 | 0.211662798 | 0.08446
1 | 0.390098182 | 0.345335295 | 0.351742539 | 0.11917
2 | 0.225078152 | 0.423550487 | 0.279488789 | 0.03844
3 | 0.217614909 | 0.459377045 | 0.169258885 | -0.00111
4 | 0.117667997 | 0.402983877 | 0.09976742 | -0.03071

Correlation p-values

Lag Of… | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.770289183 | 0.194825683 | 0.384355859 | 0.119
1 | 0.10951225 | 0.1604459 | 0.152314574 | 0.032
2 | 0.38508926 | 0.090235437 | 0.277292335 | 0.503
3 | 0.41815731 | 0.073443253 | 0.53087977 | 0.9851
4 | 0.676205849 | 0.136392668 | 0.723518466 | 0.6154
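The p-values for the three climate variables appear consistent with the usual $t$ test for a Pearson correlation, using $n = 19 - k$ monthly pairs at lag $k$. The sketch below checks this against the rainfall entry at lag 1 ($r = 0.39010$, $n = 18$).

```latex
\begin{aligned}
t &= \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad \nu = n - 2 \\
t &= \frac{0.39010\sqrt{16}}{\sqrt{1 - 0.39010^2}} \approx 1.69, \qquad \nu = 16,\ \text{two-sided } p \approx 0.11
\end{aligned}
```

This agrees with the tabulated 0.1095. The Moisture p-values imply a much larger sample size (r = 0.119 gives p = 0.032 at lag 1), so the n = 19 - k assumption does not apply to that column.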

Figure: Cross-Correlation: Nitrate Levels (panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: Lag, 0 to 4; y-axis: Correlation, -1 to 1)

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 4.463444 | 0.041172 | *
Cultivation | 1 | 2.234167 | 0.165860
Compaction×Cultivation | 2 | 1.088696 | 0.373447
Block | 2 | 2.745249 | 0.112117
Season | 4 | 25.47999 | 0.000000 | ***
Season×Compaction | 8 | 6.50085 | 0.000010 | ***
Season×Cultivation | 4 | 0.838996 | 0.507291
Season×Compaction×Cultivation | 8 | 1.780685 | 0.104355

Overall MANOVA

Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 5 | 6 | 0.51692 | 1.12 | 0.438600
Pillai | Cultivation | 5 | 6 | 0.48308 | 1.12 | 0.438600
HL | Cultivation | 5 | 6 | 0.93454 | 1.12 | 0.438600
Roy | Cultivation | 5 | 6 | 0.93454 | 1.12 | 0.438600
Wilk | Compaction | 10 | 12 | 0.11085 | 2.4 | 0.076200
Pillai | Compaction | 10 | 14 | 1.20432 | 2.12 | 0.096700
HL | Compaction | 10 | 6.7 | 5.17796 | 2.95 | 0.086500
Roy | Compaction | 5 | 7 | 4.55356 | 6.37 | 0.015300 | *
Wilk | Compaction×Cultivation | 10 | 12 | 0.39833 | 0.7 | 0.708700
Pillai | Compaction×Cultivation | 10 | 14 | 0.65686 | 0.68 | 0.723200
HL | Compaction×Cultivation | 10 | 6.7 | 1.37193 | 0.78 | 0.651400
Roy | Compaction×Cultivation | 5 | 7 | 1.26215 | 1.77 | 0.238200
Wilk | Block | 10 | 12 | 0.38669 | 0.73 | 0.686700
Pillai | Block | 10 | 14 | 0.72814 | 0.8 | 0.631000
HL | Block | 10 | 6.7 | 1.28912 | 0.73 | 0.682900
Roy | Block | 5 | 7 | 0.98881 | 1.38 | 0.335000

Seasonal Split Plot

Season 1

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 16.27645 | 0.000717 | ***
Cultivation | 1 | 1.5912711 | 0.235776
Compaction×Cultivation | 2 | 0.91367 | 0.432080
Block | 2 | 0.4690602 | 0.638685
Month | 2 | 11.827422 | 0.000266 | ***
Month×Compaction | 4 | 0.9947435 | 0.429507
Month×Cultivation | 2 | 2.4712513 | 0.105707
Month×Compaction×Cultivation | 4 | 1.8914225 | 0.144662

Season 2

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 5.2468002 | 0.027664 | *
Cultivation | 1 | 0.5459634 | 0.476957
Compaction×Cultivation | 2 | 3.4642485 | 0.071930
Block | 2 | 3.3826475 | 0.075499
Month | 2 | 1.4820891 | 0.247223
Month×Compaction | 4 | 0.6083719 | 0.660507
Month×Cultivation | 2 | 0.3492567 | 0.708737
Month×Compaction×Cultivation | 4 | 0.3892338 | 0.814212

Season 3

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.143626 | 0.867967
Cultivation | 1 | 0.0002949 | 0.986636
Compaction×Cultivation | 2 | 1.3966129 | 0.291810
Block | 2 | 0.6181587 | 0.558316
Month | 2 | 6.7366834 | 0.004763 | **
Month×Compaction | 4 | 0.5769341 | 0.682100
Month×Cultivation | 2 | 0.0604309 | 0.941502
Month×Compaction×Cultivation | 4 | 0.7190796 | 0.587274

Season 4

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.9748616 | 0.410403
Cultivation | 1 | 3.8284691 | 0.078874
Compaction×Cultivation | 2 | 0.5316026 | 0.603386
Block | 2 | 2.0401966 | 0.180687
Month | 2 | 4.6030903 | 0.020319 | *
Month×Compaction | 4 | 0.5033742 | 0.733572
Month×Cultivation | 2 | 1.2328767 | 0.309260
Month×Compaction×Cultivation | 4 | 0.744417 | 0.571232

Season 5

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 1.3006132 | 0.314728
Cultivation | 1 | 0.2153729 | 0.652527
Compaction×Cultivation | 2 | 0.1115725 | 0.895524
Block | 2 | 1.425204 | 0.285375
Month | 2 | 1.7952657 | 0.187678
Month×Compaction | 4 | 0.5815466 | 0.678914
Month×Cultivation | 2 | 0.3404418 | 0.714836
Month×Compaction×Cultivation | 4 | 2.1860018 | 0.101080

Seasonal MANOVA

Season 1


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.70075 | 1.14 | 0.390300
Pillai | Cultivation | 3 | 8 | 0.29925 | 1.14 | 0.390300
HL | Cultivation | 3 | 8 | 0.42704 | 1.14 | 0.390300
Roy | Cultivation | 3 | 8 | 0.42704 | 1.14 | 0.390300
Wilk | Compaction | 6 | 16 | 0.1312 | 4.7 | 0.006100 | **
Pillai | Compaction | 6 | 18 | 0.98649 | 2.92 | 0.036100 | *
HL | Compaction | 6 | 9.1 | 5.72481 | 7.34 | 0.004400 | **
Roy | Compaction | 3 | 9 | 5.56357 | 16.69 | 0.000500 | ***
Wilk | Compaction×Cultivation | 6 | 16 | 0.41762 | 1.46 | 0.253500
Pillai | Compaction×Cultivation | 6 | 18 | 0.65069 | 1.45 | 0.251700
HL | Compaction×Cultivation | 6 | 9.1 | 1.23098 | 1.58 | 0.257400
Roy | Compaction×Cultivation | 3 | 9 | 1.07946 | 3.24 | 0.074600
Wilk | Block | 6 | 16 | 0.58917 | 0.81 | 0.578900
Pillai | Block | 6 | 18 | 0.46305 | 0.9 | 0.513700
HL | Block | 6 | 9.1 | 0.60865 | 0.78 | 0.605600
Roy | Block | 3 | 9 | 0.36735 | 1.1 | 0.397600

Season 2


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.79306 | 0.7 | 0.580200
Pillai | Cultivation | 3 | 8 | 0.20694 | 0.7 | 0.580200
HL | Cultivation | 3 | 8 | 0.26093 | 0.7 | 0.580200
Roy | Cultivation | 3 | 8 | 0.26093 | 0.7 | 0.580200
Wilk | Compaction | 6 | 16 | 0.22955 | 2.9 | 0.041500 | *
Pillai | Compaction | 6 | 18 | 1.03212 | 3.2 | 0.025700 | *
HL | Compaction | 6 | 9.1 | 2.21635 | 2.84 | 0.076700
Roy | Compaction | 3 | 9 | 1.40505 | 4.22 | 0.040500 | *
Wilk | Compaction×Cultivation | 6 | 16 | 0.41398 | 1.48 | 0.247600
Pillai | Compaction×Cultivation | 6 | 18 | 0.68208 | 1.55 | 0.218000
HL | Compaction×Cultivation | 6 | 9.1 | 1.18356 | 1.52 | 0.274400
Roy | Compaction×Cultivation | 3 | 9 | 0.93553 | 2.81 | 0.100400
Wilk | Block | 6 | 16 | 0.49938 | 1.11 | 0.400800
Pillai | Block | 6 | 18 | 0.50771 | 1.02 | 0.443300
HL | Block | 6 | 9.1 | 0.98829 | 1.27 | 0.358900
Roy | Block | 3 | 9 | 0.97371 | 2.92 | 0.092700

Season 3


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.98768 | 0.03 | 0.991200
Pillai | Cultivation | 3 | 8 | 0.01232 | 0.03 | 0.991200
HL | Cultivation | 3 | 8 | 0.01247 | 0.03 | 0.991200
Roy | Cultivation | 3 | 8 | 0.01247 | 0.03 | 0.991200
Wilk | Compaction | 6 | 16 | 0.74211 | 0.43 | 0.849000
Pillai | Compaction | 6 | 18 | 0.26932 | 0.47 | 0.823800
HL | Compaction | 6 | 9.1 | 0.33211 | 0.43 | 0.844700
Roy | Compaction | 3 | 9 | 0.27639 | 0.83 | 0.510500
Wilk | Compaction×Cultivation | 6 | 16 | 0.57897 | 0.84 | 0.558500
Pillai | Compaction×Cultivation | 6 | 18 | 0.47451 | 0.93 | 0.495300
HL | Compaction×Cultivation | 6 | 9.1 | 0.63483 | 0.81 | 0.584700
Roy | Compaction×Cultivation | 3 | 9 | 0.40897 | 1.23 | 0.355400
Wilk | Block | 6 | 16 | 0.64104 | 0.66 | 0.679800
Pillai | Block | 6 | 18 | 0.37778 | 0.7 | 0.654300
HL | Block | 6 | 9.1 | 0.5306 | 0.68 | 0.670600
Roy | Block | 3 | 9 | 0.46782 | 1.4 | 0.304100

Season 4


F Value p value Sig



Season 5


F Value p value Sig



APPENDIX F – AMMONIUM LEVELS

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






Figure: Ammonium Levels By Compaction, Cultivation, Depth 0-10 cm (x-axis: Month, 1 to 19; y-axis: Mean Ammonium (kgN/ha))

Figure: Ammonium Levels By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed) (x-axis: Month, 1 to 19; y-axis: Mean Ammonium (kgN/ha))

Correlations

Lag Of… | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.374834073 | 0.300855478 | 0.399884796 | 0.03618
1 | 0.300605831 | 0.165882532 | 0.32567179 | 0.06732
2 | 0.091370791 | 0.022838142 | 0.18765877 | 0.06133
3 | -0.1447029 | 0.08629043 | -0.00320268 | -0.13599
4 | -0.70678132 | -0.13572033 | -0.34585806 | -0.13985



Figure: Cross-Correlation: Ammonium Levels (panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: Lag, 0 to 4; y-axis: Correlation, -1 to 1)

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.55428 | 0.591168
Cultivation | 1 | 16.20717 | 0.002416 | **
Compaction×Cultivation | 2 | 8.07254 | 0.008186 | **
Block | 2 | 0.401862 | 0.679411
Season | 4 | 9.444786 | 0.000010 | ***
Season×Compaction | 8 | 0.691267 | 0.697171
Season×Cultivation | 4 | 2.889562 | 0.031902 | *
Season×Compaction×Cultivation | 8 | 0.467082 | 0.873146

Overall MANOVA


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 5 | 6 | 0.20791 | 4.57 | 0.045800 | *
Pillai | Cultivation | 5 | 6 | 0.79209 | 4.57 | 0.045800 | *
HL | Cultivation | 5 | 6 | 3.80978 | 4.57 | 0.045800 | *
Roy | Cultivation | 5 | 6 | 3.80978 | 4.57 | 0.045800 | *
Wilk | Compaction | 10 | 12 | 0.22061 | 1.35 | 0.305000
Pillai | Compaction | 10 | 14 | 0.90318 | 1.15 | 0.393000
HL | Compaction | 10 | 6.7 | 2.97184 | 1.69 | 0.254300
Roy | Compaction | 5 | 7 | 2.76922 | 3.88 | 0.052800
Wilk | Compaction×Cultivation | 10 | 12 | 0.09638 | 2.67 | 0.055500
Pillai | Compaction×Cultivation | 10 | 14 | 1.09306 | 1.69 | 0.180000
HL | Compaction×Cultivation | 10 | 6.7 | 7.4103 | 4.22 | 0.037100 | *
Roy | Compaction×Cultivation | 5 | 7 | 7.13481 | 9.99 | 0.004300 | **
Wilk | Block | 10 | 12 | 0.28971 | 1.03 | 0.473900
Pillai | Block | 10 | 14 | 0.85878 | 1.05 | 0.452200
HL | Block | 10 | 6.7 | 1.93913 | 1.1 | 0.464600
Roy | Block | 5 | 7 | 1.62341 | 2.27 | 0.157100

Seasonal Split Plot

Season 1

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 9.92615 | 0.004218 | **
Cultivation | 1 | 11.369482 | 0.007099 | **
Compaction×Cultivation | 2 | 32.614782 | 0.000042 | ***
Block | 2 | 1.2204024 | 0.335550
Month | 2 | 8.1213334 | 0.002024 | **
Month×Compaction | 4 | 0.5126283 | 0.727043
Month×Cultivation | 2 | 0.4873514 | 0.620199
Month×Compaction×Cultivation | 4 | 1.4440088 | 0.250201

Season 2

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.1349308 | 0.875341
Cultivation | 1 | 2.4482647 | 0.148722
Compaction×Cultivation | 2 | 0.5506311 | 0.593114
Block | 2 | 0.3888218 | 0.687671
Month | 2 | 3.3622712 | 0.051607
Month×Compaction | 4 | 3.0347654 | 0.037021 | *
Month×Cultivation | 2 | 4.4844597 | 0.022145 | *
Month×Compaction×Cultivation | 4 | 0.5105076 | 0.728538

Season 3

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.1977355 | 0.823719
Cultivation | 1 | 24.47833 | 0.000581 | ***
Compaction×Cultivation | 2 | 5.0843357 | 0.029965 | *
Block | 2 | 1.1089639 | 0.367293
Month | 2 | 2.0311934 | 0.153123
Month×Compaction | 4 | 0.4866067 | 0.745427
Month×Cultivation | 2 | 1.6554639 | 0.212079
Month×Compaction×Cultivation | 4 | 1.4942311 | 0.235301

Season 4

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.7894138 | 0.480482
Cultivation | 1 | 4.8778011 | 0.051685
Compaction×Cultivation | 2 | 1.2255112 | 0.334176
Block | 2 | 1.8676206 | 0.204559
Month | 2 | 51.581116 | 0.000000 | ***
Month×Compaction | 4 | 0.0290996 | 0.998246
Month×Cultivation | 2 | 3.0976969 | 0.063569
Month×Compaction×Cultivation | 4 | 0.1503591 | 0.961017

Season 5



Seasonal MANOVA

Season 1


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.32401 | 5.56 | 0.023300 | *
Pillai | Cultivation | 3 | 8 | 0.67599 | 5.56 | 0.023300 | *
HL | Cultivation | 3 | 8 | 2.08637 | 5.56 | 0.023300 | *
Roy | Cultivation | 3 | 8 | 2.08637 | 5.56 | 0.023300 | *
Wilk | Compaction | 6 | 16 | 0.28025 | 2.37 | 0.078600
Pillai | Compaction | 6 | 18 | 0.76841 | 1.87 | 0.141300
HL | Compaction | 6 | 9.1 | 2.39464 | 3.07 | 0.063200
Roy | Compaction | 3 | 9 | 2.3198 | 6.96 | 0.010100 | *
Wilk | Compaction×Cultivation | 6 | 16 | 0.09157 | 6.15 | 0.001700 | **
Pillai | Compaction×Cultivation | 6 | 18 | 0.94579 | 2.69 | 0.048100 | *
HL | Compaction×Cultivation | 6 | 9.1 | 9.51314 | 12.2 | 0.000700 | ***
Roy | Compaction×Cultivation | 3 | 9 | 9.47007 | 28.41 | <.0001 | ***
Wilk | Block | 6 | 16 | 0.65911 | 0.62 | 0.713300
Pillai | Block | 6 | 18 | 0.36883 | 0.68 | 0.669000
HL | Block | 6 | 9.1 | 0.4748 | 0.61 | 0.719100
Roy | Block | 3 | 9 | 0.35556 | 1.07 | 0.410600

Season 2


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.20901 | 10.09 | 0.004300 | **
Pillai | Cultivation | 3 | 8 | 0.79099 | 10.09 | 0.004300 | **
HL | Cultivation | 3 | 8 | 3.78444 | 10.09 | 0.004300 | **
Roy | Cultivation | 3 | 8 | 3.78444 | 10.09 | 0.004300 | **
Wilk | Compaction | 6 | 16 | 0.15302 | 4.15 | 0.010500 | *
Pillai | Compaction | 6 | 18 | 1.16514 | 4.19 | 0.008300 | **
HL | Compaction | 6 | 9.1 | 3.45598 | 4.43 | 0.022800 | *
Roy | Compaction | 3 | 9 | 2.68022 | 8.04 | 0.006500 | **
Wilk | Compaction×Cultivation | 6 | 16 | 0.51394 | 1.05 | 0.429100
Pillai | Compaction×Cultivation | 6 | 18 | 0.49822 | 1 | 0.458000
HL | Compaction×Cultivation | 6 | 9.1 | 0.92209 | 1.18 | 0.393500
Roy | Compaction×Cultivation | 3 | 9 | 0.89567 | 2.69 | 0.109400
Wilk | Block | 6 | 16 | 0.53485 | 0.98 | 0.470400
Pillai | Block | 6 | 18 | 0.50742 | 1.02 | 0.443800
HL | Block | 6 | 9.1 | 0.79067 | 1.01 | 0.472400
Roy | Block | 3 | 9 | 0.6733 | 2.02 | 0.181700

Season 3


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.26095 | 7.55 | 0.010100 | *
Pillai | Cultivation | 3 | 8 | 0.73905 | 7.55 | 0.010100 | *
HL | Cultivation | 3 | 8 | 2.83216 | 7.55 | 0.010100 | *
Roy | Cultivation | 3 | 8 | 2.83216 | 7.55 | 0.010100 | *
Wilk | Compaction | 6 | 16 | 0.72612 | 0.46 | 0.825700
Pillai | Compaction | 6 | 18 | 0.28684 | 0.5 | 0.798500
HL | Compaction | 6 | 9.1 | 0.35934 | 0.46 | 0.821200
Roy | Compaction | 3 | 9 | 0.29981 | 0.9 | 0.478500
Wilk | Compaction×Cultivation | 6 | 16 | 0.30654 | 2.15 | 0.103700
Pillai | Compaction×Cultivation | 6 | 18 | 0.7773 | 1.91 | 0.134700
HL | Compaction×Cultivation | 6 | 9.1 | 1.98871 | 2.55 | 0.099300
Roy | Compaction×Cultivation | 3 | 9 | 1.84007 | 5.52 | 0.019900 | *
Wilk | Block | 6 | 16 | 0.47386 | 1.21 | 0.352400
Pillai | Block | 6 | 18 | 0.58589 | 1.24 | 0.331200
HL | Block | 6 | 9.1 | 0.98427 | 1.26 | 0.360900
Roy | Block | 3 | 9 | 0.83288 | 2.5 | 0.125600

Season 4


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.67201 | 1.3 | 0.339100
Pillai | Cultivation | 3 | 8 | 0.32799 | 1.3 | 0.339100
HL | Cultivation | 3 | 8 | 0.48808 | 1.3 | 0.339100
Roy | Cultivation | 3 | 8 | 0.48808 | 1.3 | 0.339100
Wilk | Compaction | 6 | 16 | 0.55689 | 0.91 | 0.514400
Pillai | Compaction | 6 | 18 | 0.46043 | 0.9 | 0.517900
HL | Compaction | 6 | 9.1 | 0.76458 | 0.98 | 0.489800
Roy | Compaction | 3 | 9 | 0.72146 | 2.16 | 0.162100
Wilk | Compaction×Cultivation | 6 | 16 | 0.52724 | 1.01 | 0.455300
Pillai | Compaction×Cultivation | 6 | 18 | 0.51642 | 1.04 | 0.430100
HL | Compaction×Cultivation | 6 | 9.1 | 0.81385 | 1.04 | 0.457400
Roy | Compaction×Cultivation | 3 | 9 | 0.69464 | 2.08 | 0.172700
Wilk | Block | 6 | 16 | 0.52835 | 1 | 0.457500
Pillai | Block | 6 | 18 | 0.5043 | 1.01 | 0.448600
HL | Block | 6 | 9.1 | 0.8309 | 1.07 | 0.446700
Roy | Block | 3 | 9 | 0.74833 | 2.24 | 0.152300

Season 5


F Value p value Sig


Note: The season 5 model has two dependent variables – month 16 is not included.


APPENDIX G – TOTAL MINERAL NITROGEN LEVELS

Degrees of freedom, test statistics, F values and p values have their usual interpretations.

The number of asterisks (*) represents the level of significance: “***” if p is less than 0.001, “**” if p is less than 0.01 and “*” if p is less than 0.05.

Figure: Total Mineral Nitrogen Levels By Compaction, Cultivation, Depth 0-10 cm (x-axis: Month, 1 to 19; y-axis: Mean Mineral Nitrogen (kgN/ha))

Figure: Total Mineral Nitrogen Levels By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed) (x-axis: Month, 1 to 19; y-axis: Mean Mineral Nitrogen (kgN/ha))

Correlations

Lag Of… | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.149596438 | 0.290523525 | 0.265994179 | 0.14902
1 | 0.43981975 | 0.413163352 | 0.484185647 | 0.14639
2 | 0.355556502 | 0.489020628 | 0.426642884 | 0.04027
3 | 0.311261646 | 0.591480017 | 0.331237485 | -0.02697
4 | 0.130664642 | 0.501739359 | 0.175608414 | -0.02886



Figure: Cross-Correlation: Total N. Levels (panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: Lag, 0 to 4; y-axis: Correlation, -1 to 1)

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 3.237539 | 0.082388
Cultivation | 1 | 4.135041 | 0.069400
Compaction×Cultivation | 2 | 0.095566 | 0.909678
Block | 2 | 3.011809 | 0.094667
Season | 4 | 20.70201 | 0.000000 | ***
Season×Compaction | 8 | 2.791478 | 0.012731 | *
Season×Cultivation | 4 | 0.583816 | 0.675846
Season×Compaction×Cultivation | 8 | 0.947481 | 0.487387

Overall MANOVA


F Value p value Sig



Seasonal Split Plot

Season 1

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 6.041082 | 0.019045 | *
Cultivation | 1 | 0.2516923 | 0.626746
Compaction×Cultivation | 2 | 0.8330084 | 0.462793
Block | 2 | 0.0676657 | 0.934997
Month | 2 | 13.320379 | 0.000128 | ***
Month×Compaction | 4 | 0.7994956 | 0.537388
Month×Cultivation | 2 | 1.3335844 | 0.282366
Month×Compaction×Cultivation | 4 | 1.9201531 | 0.139671

Season 2


Season 3

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.4805692 | 0.632007
Cultivation | 1 | 2.3308639 | 0.157819
Compaction×Cultivation | 2 | 4.9457739 | 0.032111 | *
Block | 2 | 0.6367425 | 0.549173
Month | 2 | 4.8493954 | 0.017028 | *
Month×Compaction | 4 | 0.3432344 | 0.846042
Month×Cultivation | 2 | 0.0697412 | 0.932823
Month×Compaction×Cultivation | 4 | 1.0485044 | 0.403169

Season 4


Season 5



Seasonal MANOVA

Season 1


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.7936 | 0.69 | 0.581400
Pillai | Cultivation | 3 | 8 | 0.2064 | 0.69 | 0.581400
HL | Cultivation | 3 | 8 | 0.26007 | 0.69 | 0.581400
Roy | Cultivation | 3 | 8 | 0.26007 | 0.69 | 0.581400
Wilk | Compaction | 6 | 16 | 0.28357 | 2.34 | 0.081500
Pillai | Compaction | 6 | 18 | 0.90768 | 2.49 | 0.062100
HL | Compaction | 6 | 9.1 | 1.85204 | 2.37 | 0.116600
Roy | Compaction | 3 | 9 | 1.35389 | 4.06 | 0.044300 | *
Wilk | Compaction×Cultivation | 6 | 16 | 0.25513 | 2.61 | 0.058400
Pillai | Compaction×Cultivation | 6 | 18 | 0.75602 | 1.82 | 0.150900
HL | Compaction×Cultivation | 6 | 9.1 | 2.87577 | 3.69 | 0.038700 | *
Roy | Compaction×Cultivation | 3 | 9 | 2.86048 | 8.58 | 0.005300 | **
Wilk | Block | 6 | 16 | 0.6056 | 0.76 | 0.611400
Pillai | Block | 6 | 18 | 0.40366 | 0.76 | 0.611300
HL | Block | 6 | 9.1 | 0.63599 | 0.82 | 0.583800
Roy | Block | 3 | 9 | 0.61098 | 1.83 | 0.211400

Season 2


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.65062 | 1.43 | 0.303600
Pillai | Cultivation | 3 | 8 | 0.34938 | 1.43 | 0.303600
HL | Cultivation | 3 | 8 | 0.537 | 1.43 | 0.303600
Roy | Cultivation | 3 | 8 | 0.537 | 1.43 | 0.303600
Wilk | Compaction | 6 | 16 | 0.3996 | 1.55 | 0.224800
Pillai | Compaction | 6 | 18 | 0.72665 | 1.71 | 0.175500
HL | Compaction | 6 | 9.1 | 1.18654 | 1.52 | 0.273300
Roy | Compaction | 3 | 9 | 0.78307 | 2.35 | 0.140600
Wilk | Compaction×Cultivation | 6 | 16 | 0.53245 | 0.99 | 0.465600
Pillai | Compaction×Cultivation | 6 | 18 | 0.51254 | 1.03 | 0.435900
HL | Compaction×Cultivation | 6 | 9.1 | 0.7936 | 1.02 | 0.470500
Roy | Compaction×Cultivation | 3 | 9 | 0.66689 | 2 | 0.184500
Wilk | Block | 6 | 16 | 0.41811 | 1.46 | 0.254400
Pillai | Block | 6 | 18 | 0.60215 | 1.29 | 0.310000
HL | Block | 6 | 9.1 | 1.34324 | 1.72 | 0.221500
Roy | Block | 3 | 9 | 1.30614 | 3.92 | 0.048300 | *

Season 3


Test | Source of Variation | Ndf | Ddf | Test Stat. | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.73508 | 0.96 | 0.456700
Pillai | Cultivation | 3 | 8 | 0.26492 | 0.96 | 0.456700
HL | Cultivation | 3 | 8 | 0.3604 | 0.96 | 0.456700
Roy | Cultivation | 3 | 8 | 0.3604 | 0.96 | 0.456700
Wilk | Compaction | 6 | 16 | 0.69067 | 0.54 | 0.768900
Pillai | Compaction | 6 | 18 | 0.31193 | 0.55 | 0.760500
HL | Compaction | 6 | 9.1 | 0.44409 | 0.57 | 0.746300
Roy | Compaction | 3 | 9 | 0.43543 | 1.31 | 0.331200
Wilk | Compaction×Cultivation | 6 | 16 | 0.31378 | 2.09 | 0.111300
Pillai | Compaction×Cultivation | 6 | 18 | 0.79948 | 2 | 0.119300
HL | Compaction×Cultivation | 6 | 9.1 | 1.82595 | 2.34 | 0.120300
Roy | Compaction×Cultivation | 3 | 9 | 1.60042 | 4.8 | 0.029000 | *
Wilk | Block | 6 | 16 | 0.66737 | 0.6 | 0.728200
Pillai | Block | 6 | 18 | 0.34969 | 0.64 | 0.700400
HL | Block | 6 | 9.1 | 0.47285 | 0.61 | 0.720800
Roy | Block | 3 | 9 | 0.41057 | 1.23 | 0.353900

Season 4


F Value p value Sig



Season 5


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.96712 | 0.09 | 0.963100
Pillai | Cultivation | 3 | 8 | 0.03288 | 0.09 | 0.963100
HL | Cultivation | 3 | 8 | 0.034 | 0.09 | 0.963100
Roy | Cultivation | 3 | 8 | 0.034 | 0.09 | 0.963100
Wilk | Compaction | 6 | 16 | 0.40778 | 1.51 | 0.237600
Pillai | Compaction | 6 | 18 | 0.71627 | 1.67 | 0.184800
HL | Compaction | 6 | 9.1 | 1.14812 | 1.47 | 0.288000
Roy | Compaction | 3 | 9 | 0.73328 | 2.2 | 0.157700
Wilk | Compaction×Cultivation | 6 | 16 | 0.29029 | 2.28 | 0.087700
Pillai | Compaction×Cultivation | 6 | 18 | 0.86384 | 2.28 | 0.081800
HL | Compaction×Cultivation | 6 | 9.1 | 1.91396 | 2.45 | 0.108400
Roy | Compaction×Cultivation | 3 | 9 | 1.57736 | 4.73 | 0.030100 | *
Wilk | Block | 6 | 16 | 0.32136 | 2.04 | 0.119600
Pillai | Block | 6 | 18 | 0.7016 | 1.62 | 0.198600
HL | Block | 6 | 9.1 | 2.04035 | 2.62 | 0.093500
Roy | Block | 3 | 9 | 2.00471 | 6.01 | 0.015600 | *

APPENDIX H – NITRATE DYNAMICS

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






[Figure: Nitrate Dynamics By Compaction, Cultivation, Depth 0-10 cm. x-axis: Month (1-19); y-axis: Mean Nitrate (kgN/ha); series (compaction, cultivation): 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough.]
233

[Figure: Nitrate Dynamics By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed). x-axis: Month (1-19); y-axis: Mean Nitrate (kgN/ha); series (compaction, cultivation): 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough.]
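The "3MA Smoothed" figures show each series after a three-point moving average. The thesis does not say whether the average is centred or trailing, so the sketch below assumes a centred average and leaves the two end months, which lack a neighbour on one side, unsmoothed (returned as None). Both choices are assumptions, and the input values are hypothetical.

```python
def moving_average_3(series):
    """Centred three-point moving average (3MA); end points are returned as None."""
    smoothed = [None] * len(series)
    for i in range(1, len(series) - 1):
        smoothed[i] = (series[i - 1] + series[i] + series[i + 1]) / 3
    return smoothed


# Hypothetical monthly values (kgN/ha), not data from the thesis.
print(moving_average_3([12.0, -6.0, 9.0, 3.0, 18.0]))
```

A centred average keeps peaks in the month where they occur, whereas a trailing average would shift them one month later, which matters when the smoothed curves are read alongside the lagged correlations.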

Correlations

Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.151193688 | 0.096098762 | 0.040455248 | 0.01601
1 | 0.229692186 | 0.332712651 | 0.181337575 | 0.01005
2 | -0.32442784 | 0.428952959 | 0.03836094 | 0.04219
3 | -0.15824778 | -0.06717411 | -0.25523954 | -0.01604
4 | 0.121083764 | -0.25202723 | -0.23340802 | -0.00451
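The correlation tables give, for lags 0 to 4 months, the Pearson correlation between the response and each climate or soil-moisture series. The sketch below assumes that "lag k" pairs the response in month t with the covariate in month t − k (the covariate leading the response), which the "Lag Of…" heading suggests but does not state; only months observed in both shifted series are used. Function names and the example series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlations(response, covariate, max_lag=4):
    """Correlation of response[t] with covariate[t - k] for k = 0..max_lag."""
    if len(response) != len(covariate):
        raise ValueError("series must be the same length")
    result = {}
    for k in range(max_lag + 1):
        y = response[k:]                      # response from month k+1 onwards
        x = covariate[:len(covariate) - k]    # covariate up to k months earlier
        result[k] = pearson(x, y)
    return result


# Hypothetical series: the response simply repeats rainfall one month later.
rainfall = [30.0, 85.0, 12.0, 60.0, 44.0, 97.0, 5.0, 71.0]
response = [0.0] + rainfall[:-1]
print(lagged_correlations(response, rainfall))
```

In this constructed example the lag-1 correlation is exactly 1, which is a quick check that the shift runs in the assumed direction.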



[Figure: Cross-Correlation: Nitrate Dynamics. Panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: lag (0-4); y-axis: Correlation (-1 to 1).]

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.361981 | 0.705056
Cultivation | 1 | 0.001363 | 0.971281
Compaction×Cultivation | 2 | 1.458587 | 0.278075
Block | 2 | 0.415109 | 0.671141
Season | 4 | 3.56246 | 0.012676 | *
Season×Compaction | 8 | 1.068198 | 0.400851
Season×Cultivation | 4 | 0.930248 | 0.454375
Season×Compaction×Cultivation | 8 | 2.585801 | 0.019593 | *
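The p values in the split-plot tables follow from each F value and its degrees of freedom. The error degrees of freedom are not printed, but the whole-plot error df can be inferred from the design the table implies: Block (2 df), Compaction (2 df) and Cultivation (1 df) suggest 3 × 3 × 2 = 18 plots, leaving 17 − 2 − 2 − 1 − 2 = 10 df for whole-plot error. This is an inference, not a stated value. For effects with 2 numerator df the upper tail of the F distribution has a closed form, so the sketch below needs only the standard library; for other numerator df a general routine such as `scipy.stats.f.sf` would be needed.

```python
def f_upper_tail_2df(f_value, den_df):
    """P(F > f_value) for an F distribution with 2 and den_df degrees of freedom.

    With 2 numerator df the survival function is (1 + 2F/den_df) ** (-den_df/2).
    """
    return (1.0 + 2.0 * f_value / den_df) ** (-den_df / 2.0)


WHOLE_PLOT_ERROR_DF = 10  # inferred from the design, not stated in the tables

for source, f_value in [("Compaction", 0.361981), ("Block", 0.415109)]:
    p = f_upper_tail_2df(f_value, WHOLE_PLOT_ERROR_DF)
    print(f"{source}: p = {p:.6f}")
```

Under the inferred 10 error df this reproduces the Compaction (0.705056), Block (0.671141) and Compaction×Cultivation (0.278075) p values in the table above, which supports the inferred error df.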

Overall MANOVA


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 5 | 6 | 0.7117 | 0.49 | 0.777100
Pillai | Cultivation | 5 | 6 | 0.2883 | 0.49 | 0.777100
HL | Cultivation | 5 | 6 | 0.40509 | 0.49 | 0.777100
Roy | Cultivation | 5 | 6 | 0.40509 | 0.49 | 0.777100
Wilk | Compaction | 10 | 12 | 0.29475 | 1.01 | 0.486000
Pillai | Compaction | 10 | 14 | 0.8311 | 1 | 0.489900
HL | Compaction | 10 | 6.7 | 1.96573 | 1.12 | 0.457300
Roy | Compaction | 5 | 7 | 1.71706 | 2.4 | 0.142000
Wilk | Compaction×Cultivation | 10 | 12 | 0.22964 | 1.3 | 0.327000
Pillai | Compaction×Cultivation | 10 | 14 | 0.86304 | 1.06 | 0.446400
HL | Compaction×Cultivation | 10 | 6.7 | 2.95108 | 1.68 | 0.257300
Roy | Compaction×Cultivation | 5 | 7 | 2.80731 | 3.93 | 0.051200
Wilk | Block | 10 | 12 | 0.38124 | 0.74 | 0.676200
Pillai | Block | 10 | 14 | 0.68382 | 0.73 | 0.689300
HL | Block | 10 | 6.7 | 1.45235 | 0.83 | 0.621700
Roy | Block | 5 | 7 | 1.3234 | 1.85 | 0.221300
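The F value in each Roy row is an upper bound rather than an approximation, so the Roy p value is a lower bound on the true significance level. This helps explain why Roy's row is sometimes the only one flagged, as for Compaction×Cultivation in the Season 3 and Season 5 tables preceding Appendix H. The sketch below uses the usual bound F = θ(ν − r + q)/r with r = max(p, q), where θ is the largest eigenvalue; the values p = 5 (one response per season), q = 2 and ν = 10 are inferred from the degrees of freedom in the table above, not stated.

```python
def roy_upper_bound_f(largest_root, p, q, nu):
    """Upper-bound F for Roy's greatest root; returns (F, num_df, den_df).

    p: number of responses, q: hypothesis df, nu: error df.
    """
    r = max(p, q)
    num_df = r
    den_df = nu - r + q
    return largest_root * den_df / num_df, num_df, den_df


# Overall MANOVA above (p = 5, q = 2, nu = 10 are inferred, not stated).
for effect, root in [("Compaction", 1.71706), ("Compaction×Cultivation", 2.80731)]:
    f_value, num_df, den_df = roy_upper_bound_f(root, 5, 2, 10)
    print(f"{effect}: F = {f_value:.2f} on ({num_df}, {den_df}) df")
```

The same function with p = 3 gives the (3, 9) df used in the seasonal MANOVA tables.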


Seasonal Split Plot

Season 1

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 3.1790453 | 0.085376
Cultivation | 1 | 0.1646168 | 0.693486
Compaction×Cultivation | 2 | 0.8604193 | 0.452071
Block | 2 | 3.941777 | 0.054668
Month | 2 | 0.8581284 | 0.436558
Month×Compaction | 4 | 0.5881418 | 0.674368
Month×Cultivation | 2 | 5.4327159 | 0.011319 | *
Month×Compaction×Cultivation | 4 | 1.7116214 | 0.180283
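The Sig column appears to follow the conventional thresholds: * for p < 0.05, ** for p < 0.01 and *** for p < 0.001. This is inferred from the tabulated values (for example, 0.011319 is flagged *, 0.005300 ** and 0.000071 ***) rather than stated in the thesis. A small helper, with a hypothetical name, makes the convention explicit:

```python
def significance_code(p_value):
    """Star code used in the Sig column (thresholds inferred from the tables)."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Season 1 rows from the seasonal split-plot table above.
for source, p in [("Month×Cultivation", 0.011319), ("Block", 0.054668)]:
    print(source, repr(significance_code(p)))
```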

Season 2


Season 3



Season 4


Season 5



Seasonal MANOVA

Season 1


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.47737 | 2.92 | 0.100300
Pillai | Cultivation | 3 | 8 | 0.52263 | 2.92 | 0.100300
HL | Cultivation | 3 | 8 | 1.09481 | 2.92 | 0.100300
Roy | Cultivation | 3 | 8 | 1.09481 | 2.92 | 0.100300
Wilk | Compaction | 6 | 16 | 0.46914 | 1.23 | 0.343600
Pillai | Compaction | 6 | 18 | 0.53604 | 1.1 | 0.400800
HL | Compaction | 6 | 9.1 | 1.1205 | 1.44 | 0.299100
Roy | Compaction | 3 | 9 | 1.11055 | 3.33 | 0.070100
Wilk | Compaction×Cultivation | 6 | 16 | 0.44637 | 1.32 | 0.302500
Pillai | Compaction×Cultivation | 6 | 18 | 0.60111 | 1.29 | 0.311300
HL | Compaction×Cultivation | 6 | 9.1 | 1.13389 | 1.45 | 0.293600
Roy | Compaction×Cultivation | 3 | 9 | 1.03068 | 3.09 | 0.082300
Wilk | Block | 6 | 16 | 0.2136 | 3.1 | 0.032700 | *
Pillai | Block | 6 | 18 | 1.05712 | 3.36 | 0.021100 | *
HL | Block | 6 | 9.1 | 2.41433 | 3.1 | 0.061900
Roy | Block | 3 | 9 | 1.64288 | 4.93 | 0.027100 | *

Season 2


F Value p value Sig



Season 3


F Value p value Sig


Season 4


F Value p value Sig



Season 5


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.98709 | 0.03 | 0.990600
Pillai | Cultivation | 3 | 8 | 0.01291 | 0.03 | 0.990600
HL | Cultivation | 3 | 8 | 0.01308 | 0.03 | 0.990600
Roy | Cultivation | 3 | 8 | 0.01308 | 0.03 | 0.990600
Wilk | Compaction | 6 | 16 | 0.60638 | 0.76 | 0.612900
Pillai | Compaction | 6 | 18 | 0.39475 | 0.74 | 0.626100
HL | Compaction | 6 | 9.1 | 0.64725 | 0.83 | 0.575000
Roy | Compaction | 3 | 9 | 0.64435 | 1.93 | 0.194800
Wilk | Compaction×Cultivation | 6 | 16 | 0.32519 | 2.01 | 0.124000
Pillai | Compaction×Cultivation | 6 | 18 | 0.75246 | 1.81 | 0.153800
HL | Compaction×Cultivation | 6 | 9.1 | 1.83635 | 2.35 | 0.118900
Roy | Compaction×Cultivation | 3 | 9 | 1.69552 | 5.09 | 0.024900 | *
Wilk | Block | 6 | 16 | 0.43874 | 1.36 | 0.289200
Pillai | Block | 6 | 18 | 0.62738 | 1.37 | 0.278700
HL | Block | 6 | 9.1 | 1.12857 | 1.45 | 0.295800
Roy | Block | 3 | 9 | 0.97381 | 2.92 | 0.092600

APPENDIX I – AMMONIUM DYNAMICS

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






[Figure: Ammonium Dynamics By Compaction, Cultivation, Depth 0-10 cm. x-axis: Month (1-19); y-axis: Mean Ammonium (kgN/ha); series (compaction, cultivation): 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough.]

[Figure: Ammonium Dynamics By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed). x-axis: Month (1-19); y-axis: Mean Ammonium (kgN/ha).]

Correlations

Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | -0.30827252 | 0.070493888 | -0.22071841 | -0.1233
1 | -0.06403202 | 0.050717585 | -0.20659955 | -0.15974
2 | -0.19477609 | -0.02038566 | -0.31777376 | -0.11039
3 | -0.04303375 | -0.29792401 | -0.31542012 | -0.06122
4 | 0.501893265 | -0.17026102 | 0.049907475 | -0.04534



[Figure: Cross-Correlation: Ammonium Dynamics. Panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: lag (0-4); y-axis: Correlation (-1 to 1).]

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.586346 | 0.574395
Cultivation | 1 | 1.163804 | 0.306011
Compaction×Cultivation | 2 | 4.236824 | 0.046477 | *
Block | 2 | 1.90036 | 0.199752
Season | 4 | 4.502184 | 0.003607 | **
Season×Compaction | 8 | 1.356114 | 0.239880
Season×Cultivation | 4 | 1.917511 | 0.122747
Season×Compaction×Cultivation | 8 | 0.342826 | 0.944551

Overall MANOVA


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 5 | 6 | 0.43147 | 1.58 | 0.294900
Pillai | Cultivation | 5 | 6 | 0.56853 | 1.58 | 0.294900
HL | Cultivation | 5 | 6 | 1.31765 | 1.58 | 0.294900
Roy | Cultivation | 5 | 6 | 1.31765 | 1.58 | 0.294900
Wilk | Compaction | 10 | 12 | 0.2916 | 1.02 | 0.478400
Pillai | Compaction | 10 | 14 | 0.8445 | 1.02 | 0.471600
HL | Compaction | 10 | 6.7 | 1.96264 | 1.12 | 0.458100
Roy | Compaction | 5 | 7 | 1.68578 | 2.36 | 0.146800
Wilk | Compaction×Cultivation | 10 | 12 | 0.11402 | 2.35 | 0.081100
Pillai | Compaction×Cultivation | 10 | 14 | 1.28728 | 2.53 | 0.055100
HL | Compaction×Cultivation | 10 | 6.7 | 4.25061 | 2.42 | 0.131500
Roy | Compaction×Cultivation | 5 | 7 | 3.124 | 4.37 | 0.039900 | *
Wilk | Block | 10 | 12 | 0.15152 | 1.88 | 0.148800
Pillai | Block | 10 | 14 | 1.214 | 2.16 | 0.091000
HL | Block | 10 | 6.7 | 3.18748 | 1.81 | 0.225900
Roy | Block | 5 | 7 | 1.95102 | 2.73 | 0.111400

APPENDIX J – TOTAL MINERAL NITROGEN DYNAMICS

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






[Figure: Total Mineral Nitrogen Dynamics By Compaction, Cultivation, Depth 0-10 cm. x-axis: Month (1-19); y-axis: Mean Mineral Nitrogen (kgN/ha).]

[Figure: Total Mineral Nitrogen Dynamics By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed). x-axis: Month (1-19); y-axis: Mean Mineral Nitrogen (kgN/ha).]

Correlations

Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.072752525 | 0.249203995 | 0.114758559 | -0.06176
1 | 0.199769622 | 0.241858587 | 0.065315971 | -0.02599
2 | -0.31710666 | 0.275800519 | -0.09265454 | -0.01167
3 | -0.1705821 | -0.15791168 | -0.35713878 | -0.03348
4 | 0.011916034 | -0.36315432 | -0.27357296 | -0.02251



[Figure: Cross-Correlation: Total N. Dynamics. Panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: lag (0-4); y-axis: Correlation (-1 to 1).]

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.113938 | 0.893455
Cultivation | 1 | 0.172506 | 0.686665
Compaction×Cultivation | 2 | 0.564854 | 0.585573
Block | 2 | 0.307029 | 0.742323
Season | 4 | 7.699315 | 0.000071 | ***
Season×Compaction | 8 | 0.952014 | 0.483951
Season×Cultivation | 4 | 0.737626 | 0.570941
Season×Compaction×Cultivation | 8 | 1.951618 | 0.073588

Overall MANOVA


F Value p value Sig



APPENDIX K – NITRATE LEACHING

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






[Figure: Nitrate Leaching By Compaction, Cultivation, Depth 0-10 cm. x-axis: Month (1-19); y-axis: Mean Nitrate (kgN/ha).]

[Figure: Nitrate Leaching By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed). x-axis: Month (1-19); y-axis: Mean Nitrate (kgN/ha).]

Correlations

Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | -0.22372978 | 0.196864848 | -0.02160497 | -0.06462
1 | 0.036330546 | 0.136222382 | 0.020689343 | -0.06545
2 | -0.47593451 | -0.10248736 | -0.33868516 | -0.07224
3 | -0.00302566 | -0.40839496 | -0.41043598 | 0.00638
4 | -0.07609275 | -0.38222812 | -0.267336 | -0.04805



[Figure: Cross-Correlation: Nitrate Leaching. Panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: lag (0-4); y-axis: Correlation (-1 to 1).]

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.623358 | 0.555740
Cultivation | 1 | 5.686292 | 0.038311 | *
Compaction×Cultivation | 2 | 1.221334 | 0.335299
Block | 2 | 0.4524 | 0.648502
Season | 4 | 4.303844 | 0.004686 | **
Season×Compaction | 8 | 1.451897 | 0.200016
Season×Cultivation | 4 | 0.478589 | 0.751252
Season×Compaction×Cultivation | 8 | 0.615986 | 0.760030

Overall MANOVA


F Value p value Sig



APPENDIX L – AMMONIUM LEACHING

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






[Figure: Ammonium Leaching By Compaction, Cultivation, Depth 0-10 cm. x-axis: Month (1-19); y-axis: Mean Ammonium (kgN/ha).]

[Figure: Ammonium Leaching By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed). x-axis: Month (1-19); y-axis: Mean Ammonium (kgN/ha).]

Correlations

Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | -0.20617833 | 0.178228084 | -0.01761848 | -0.00897
1 | -0.04946184 | 0.253037596 | 0.13081083 | -0.03976
2 | -0.06037652 | 0.135222092 | -0.01605389 | -0.06684
3 | 0.163350894 | -0.03935116 | -0.13420019 | -0.03544
4 | 0.306248611 | 0.041600246 | 0.062753102 | -0.05768



[Figure: Cross-Correlation: Ammonium Leaching. Panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: lag (0-4); y-axis: Correlation (-1 to 1).]

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 1.166713 | 0.350414
Cultivation | 1 | 9.036143 | 0.013207 | *
Compaction×Cultivation | 2 | 0.168363 | 0.847393
Block | 2 | 0.070164 | 0.932696
Season | 4 | 3.827587 | 0.008854 | **
Season×Compaction | 8 | 1.268665 | 0.281992
Season×Cultivation | 4 | 1.356305 | 0.263073
Season×Compaction×Cultivation | 8 | 0.338753 | 0.946410

Overall MANOVA


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 5 | 6 | 0.32761 | 2.46 | 0.151600
Pillai | Cultivation | 5 | 6 | 0.67239 | 2.46 | 0.151600
HL | Cultivation | 5 | 6 | 2.05238 | 2.46 | 0.151600
Roy | Cultivation | 5 | 6 | 2.05238 | 2.46 | 0.151600
Wilk | Compaction | 10 | 12 | 0.16027 | 1.8 | 0.166800
Pillai | Compaction | 10 | 14 | 1.04071 | 1.52 | 0.230400
HL | Compaction | 10 | 6.7 | 3.98562 | 2.27 | 0.149500
Roy | Compaction | 5 | 7 | 3.64124 | 5.1 | 0.027400 | *
Wilk | Compaction×Cultivation | 10 | 12 | 0.41624 | 0.66 | 0.740800
Pillai | Compaction×Cultivation | 10 | 14 | 0.64082 | 0.66 | 0.742700
HL | Compaction×Cultivation | 10 | 6.7 | 1.26535 | 0.72 | 0.692100
Roy | Compaction×Cultivation | 5 | 7 | 1.14568 | 1.6 | 0.274700
Wilk | Block | 10 | 12 | 0.18663 | 1.58 | 0.224600
Pillai | Block | 10 | 14 | 1.12079 | 1.78 | 0.156100
HL | Block | 10 | 6.7 | 2.71108 | 1.54 | 0.294600
Roy | Block | 5 | 7 | 1.79176 | 2.51 | 0.131200

APPENDIX M – TOTAL MINERAL NITROGEN LEACHING

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






[Figure: Total Mineral Nitrogen Leaching By Compaction, Cultivation, Depth 0-10 cm. x-axis: Month (1-19); y-axis: Mean Mineral Nitrogen (kgN/ha).]

[Figure: Total Mineral Nitrogen Leaching By Compaction, Cultivation, Depth 0-10 cm (3MA Smoothed). x-axis: Month (1-19); y-axis: Mean Mineral Nitrogen (kgN/ha).]

Correlations

Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | -0.17141961 | 0.243220592 | 0.062611986 | -0.03378
1 | 0.050710422 | 0.225847715 | 0.105032715 | -0.02691
2 | -0.47405996 | 0.003430887 | -0.27977062 | -0.05707
3 | -0.06907472 | -0.34116018 | -0.41998601 | 0.00437
4 | -0.03634688 | -0.39590333 | -0.31650144 | -0.07238



[Figure: Cross-Correlation: Total N. Leaching. Panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: lag (0-4); y-axis: Correlation (-1 to 1).]

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.675458 | 0.530696
Cultivation | 1 | 5.058849 | 0.048246 | *
Compaction×Cultivation | 2 | 1.110948 | 0.366697
Block | 2 | 0.087981 | 0.916479
Season | 4 | 6.180628 | 0.000428 | ***
Season×Compaction | 8 | 1.101983 | 0.378597
Season×Cultivation | 4 | 0.375236 | 0.825161
Season×Compaction×Cultivation | 8 | 0.937304 | 0.495150

Overall MANOVA


F Value p value Sig



APPENDIX N – MICROBIAL CARBON LEVELS

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






[Figure: Microbial Carbon Levels By Compaction, Cultivation. x-axis: Month (1-14); y-axis: Mean Microbial Carbon (µg/g).]

[Figure: Microbial Carbon Levels By Compaction, Cultivation (3MA Smoothed). x-axis: Month (1-14); y-axis: Mean Microbial Carbon (µg/g).]

Correlations

Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.085570019 | 0.238078047 | 0.227729794 | 0.28629
1 | 0.284682759 | 0.184128101 | 0.285440056 | 0.20622
2 | 0.565368398 | 0.060178663 | -0.0113084 | 0.13565
3 | 0.028254635 | 0.368053099 | 0.273313942 | 0.09367
4 | -0.2751191 | -0.02979152 | -0.15943005 | 0.19816


p values for the correlations above:
Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.727608236 | 0.326338458 | 0.348417652 | <.0001
1 | 0.252214465 | 0.464537119 | 0.250899746 | 0.0015
2 | 0.018022878 | 0.818534269 | 0.965641537 | 0.0464
3 | 0.917272112 | 0.160734648 | 0.305719589 | 0.1893
4 | 0.320993117 | 0.916062829 | 0.570338739 | 0.0077
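For a Pearson correlation r computed from n pairs, the usual test of zero correlation uses t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. The number of pairs behind each entry is not reported here, so n is a parameter in the sketch below rather than a value taken from the thesis; the soil-moisture column appears to rest on many more observations, since r ≈ 0.29 gives p < .0001 there. The function name and example values are hypothetical.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic (n - 2 df) for testing zero correlation from r on n pairs."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical example: r = 0.5 from 18 pairs gives t on 16 df.
print(round(correlation_t(0.5, 18), 4))
```

Comparing |t| with a t table on n − 2 df (or `scipy.stats.t.sf`, doubled for a two-sided test) then gives the p value.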


[Figure: Cross-correlation panels for Rainfall, Max. Temp., Min. Temp. and Soil Moisture; x-axis: lag (0-4); y-axis: Correlation (-1 to 1).]

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.39208 | 0.685596
Cultivation | 1 | 0.856956 | 0.376392
Compaction×Cultivation | 2 | 3.435646 | 0.073157
Block | 2 | 5.59942 | 0.023358 | *
Season | 3 | 5.247222 | 0.004160 | **
Season×Compaction | 6 | 2.956556 | 0.018880 | *
Season×Cultivation | 3 | 0.40138 | 0.752847
Season×Compaction×Cultivation | 6 | 1.794473 | 0.127962

Overall MANOVA


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 4 | 7 | 0.39604 | 2.67 | 0.121700
Pillai | Cultivation | 4 | 7 | 0.60396 | 2.67 | 0.121700
HL | Cultivation | 4 | 7 | 1.52498 | 2.67 | 0.121700
Roy | Cultivation | 4 | 7 | 1.52498 | 2.67 | 0.121700
Wilk | Compaction | 8 | 14 | 0.26683 | 1.64 | 0.200400
Pillai | Compaction | 8 | 16 | 0.8934 | 1.61 | 0.197200
HL | Compaction | 8 | 8 | 2.14716 | 1.79 | 0.214100
Roy | Compaction | 4 | 8 | 1.81659 | 3.63 | 0.056900
Wilk | Compaction×Cultivation | 8 | 14 | 0.28697 | 1.52 | 0.236700
Pillai | Compaction×Cultivation | 8 | 16 | 0.90919 | 1.67 | 0.182800
HL | Compaction×Cultivation | 8 | 8 | 1.80119 | 1.5 | 0.289500
Roy | Compaction×Cultivation | 4 | 8 | 1.2577 | 2.52 | 0.124300
Wilk | Block | 8 | 14 | 0.17264 | 2.46 | 0.067200
Pillai | Block | 8 | 16 | 0.93381 | 1.75 | 0.161700
HL | Block | 8 | 8 | 4.1757 | 3.48 | 0.048500 | *
Roy | Block | 4 | 8 | 4.0224 | 8.04 | 0.006600 | **

Seasonal Split Plot

Season 1


Season 2

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.2814361 | 0.760484
Cultivation | 1 | 0.1273823 | 0.728578
Compaction×Cultivation | 2 | 4.4823787 | 0.040763 | *
Block | 2 | 2.9838509 | 0.096336
Month | 2 | 1.973918 | 0.160826
Month×Compaction | 4 | 1.3737506 | 0.272592
Month×Cultivation | 2 | 0.3409503 | 0.714483
Month×Compaction×Cultivation | 4 | 0.4103859 | 0.799361

Season 3

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 1.1097742 | 0.367049
Cultivation | 1 | 1.7627457 | 0.213792
Compaction×Cultivation | 2 | 4.4073715 | 0.042414 | *
Block | 2 | 10.145989 | 0.003921 | **
Month | 2 | 2.5780606 | 0.096778
Month×Compaction | 4 | 0.1928468 | 0.939761
Month×Cultivation | 2 | 0.2881036 | 0.752240
Month×Compaction×Cultivation | 4 | 0.1105257 | 0.977646

Season 4


Seasonal MANOVA

Season 1


F Value p value Sig



Season 2


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.87689 | 0.37 | 0.774000
Pillai | Cultivation | 3 | 8 | 0.12311 | 0.37 | 0.774000
HL | Cultivation | 3 | 8 | 0.14039 | 0.37 | 0.774000
Roy | Cultivation | 3 | 8 | 0.14039 | 0.37 | 0.774000
Wilk | Compaction | 6 | 16 | 0.3441 | 1.88 | 0.146600
Pillai | Compaction | 6 | 18 | 0.73405 | 1.74 | 0.169100
HL | Compaction | 6 | 9.1 | 1.67903 | 2.15 | 0.144000
Roy | Compaction | 3 | 9 | 1.53066 | 4.59 | 0.032600 | *
Wilk | Compaction×Cultivation | 6 | 16 | 0.32839 | 1.99 | 0.127700
Pillai | Compaction×Cultivation | 6 | 18 | 0.75357 | 1.81 | 0.152900
HL | Compaction×Cultivation | 6 | 9.1 | 1.79561 | 2.3 | 0.124800
Roy | Compaction×Cultivation | 3 | 9 | 1.64377 | 4.93 | 0.027000 | *
Wilk | Block | 6 | 16 | 0.49282 | 1.13 | 0.388200
Pillai | Block | 6 | 18 | 0.56443 | 1.18 | 0.360300
HL | Block | 6 | 9.1 | 0.91297 | 1.17 | 0.398500
Roy | Block | 3 | 9 | 0.76015 | 2.28 | 0.148200

Season 3


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.80247 | 0.66 | 0.601300
Pillai | Cultivation | 3 | 8 | 0.19753 | 0.66 | 0.601300
HL | Cultivation | 3 | 8 | 0.24616 | 0.66 | 0.601300
Roy | Cultivation | 3 | 8 | 0.24616 | 0.66 | 0.601300
Wilk | Compaction | 6 | 16 | 0.69444 | 0.53 | 0.775200
Pillai | Compaction | 6 | 18 | 0.31669 | 0.56 | 0.753100
HL | Compaction | 6 | 9.1 | 0.42398 | 0.54 | 0.764200
Roy | Compaction | 3 | 9 | 0.38203 | 1.15 | 0.382100
Wilk | Compaction×Cultivation | 6 | 16 | 0.52339 | 1.02 | 0.447700
Pillai | Compaction×Cultivation | 6 | 18 | 0.47867 | 0.94 | 0.488700
HL | Compaction×Cultivation | 6 | 9.1 | 0.90666 | 1.16 | 0.402000
Roy | Compaction×Cultivation | 3 | 9 | 0.90229 | 2.71 | 0.107900
Wilk | Block | 6 | 16 | 0.15034 | 4.21 | 0.009900 | **
Pillai | Block | 6 | 18 | 1.103 | 3.69 | 0.014400 | *
HL | Block | 6 | 9.1 | 3.96657 | 5.09 | 0.014900 | *
Roy | Block | 3 | 9 | 3.48273 | 10.45 | 0.002700 | **

Season 4


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.58453 | 1.9 | 0.208800
Pillai | Cultivation | 3 | 8 | 0.41547 | 1.9 | 0.208800
HL | Cultivation | 3 | 8 | 0.71078 | 1.9 | 0.208800
Roy | Cultivation | 3 | 8 | 0.71078 | 1.9 | 0.208800
Wilk | Compaction | 6 | 16 | 0.49965 | 1.11 | 0.401300
Pillai | Compaction | 6 | 18 | 0.5675 | 1.19 | 0.356100
HL | Compaction | 6 | 9.1 | 0.86702 | 1.11 | 0.424800
Roy | Compaction | 3 | 9 | 0.66491 | 1.99 | 0.185400
Wilk | Compaction×Cultivation | 6 | 16 | 0.57997 | 0.83 | 0.560500
Pillai | Compaction×Cultivation | 6 | 18 | 0.44453 | 0.86 | 0.543800
HL | Compaction×Cultivation | 6 | 9.1 | 0.68198 | 0.87 | 0.548600
Roy | Compaction×Cultivation | 3 | 9 | 0.61307 | 1.84 | 0.210300
Wilk | Block | 6 | 16 | 0.24935 | 2.67 | 0.054200
Pillai | Block | 6 | 18 | 0.87697 | 2.34 | 0.075500
HL | Block | 6 | 9.1 | 2.50382 | 3.21 | 0.056300
Roy | Block | 3 | 9 | 2.28181 | 6.85 | 0.010700 | *

APPENDIX O – MICROBIAL NITROGEN LEVELS

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






[Figure: Microbial Nitrogen Levels By Compaction, Cultivation. x-axis: Month (1-14); y-axis: Mean Microbial Nitrogen (µg/g); series (compaction, cultivation): 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough.]

[Figure: Microbial Nitrogen Levels By Compaction, Cultivation (3MA Smoothed). x-axis: Month (1-14); y-axis: Mean Microbial Nitrogen (µg/g); series (compaction, cultivation): 0, None; 0, Plough; 1, None; 1, Plough; 16, None; 16, Plough.]

Correlations

Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | -0.12603623 | 0.065341584 | 0.259654118 | 0.37576
1 | 0.09066895 | -0.24046069 | 0.033387725 | 0.34172
2 | 0.139363849 | 0.023840581 | 0.195881708 | 0.26111
3 | -0.19666723 | 0.367826802 | 0.375130526 | 0.06075
4 | -0.3582452 | 0.070785178 | 0.106448489 | 0.15905


p values for the correlations above:
Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.607153746 | 0.79042018 | 0.283038836 | <.0001
1 | 0.720494694 | 0.336475947 | 0.895365091 | <.0001
2 | 0.593715863 | 0.927633935 | 0.451168479 | 0.0001
3 | 0.465370149 | 0.161011908 | 0.152224469 | 0.3952
4 | 0.189809631 | 0.802061769 | 0.705740751 | 0.033

[Figure: Cross-Correlation: Microbial Nitrogen. Panels: Rainfall, Max. Temp., Min. Temp., Soil Moisture; x-axis: lag (0-4); y-axis: Correlation (-1 to 1).]

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.73746 | 0.502634
Cultivation | 1 | 3.028711 | 0.112427
Compaction×Cultivation | 2 | 2.483271 | 0.133166
Block | 2 | 8.05243 | 0.008249 | **
Season | 3 | 50.73149 | 0.000000 | ***
Season×Compaction | 6 | 2.383457 | 0.048397 | *
Season×Cultivation | 3 | 0.360821 | 0.781635
Season×Compaction×Cultivation | 6 | 0.681095 | 0.665831

Overall MANOVA


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 4 | 7 | 0.74643 | 0.59 | 0.678200
Pillai | Cultivation | 4 | 7 | 0.25357 | 0.59 | 0.678200
HL | Cultivation | 4 | 7 | 0.33972 | 0.59 | 0.678200
Roy | Cultivation | 4 | 7 | 0.33972 | 0.59 | 0.678200
Wilk | Compaction | 8 | 14 | 0.29369 | 1.48 | 0.249200
Pillai | Compaction | 8 | 16 | 0.76673 | 1.24 | 0.336900
HL | Compaction | 8 | 8 | 2.19924 | 1.83 | 0.204900
Roy | Compaction | 4 | 8 | 2.10134 | 4.2 | 0.040100 | *
Wilk | Compaction×Cultivation | 8 | 14 | 0.478 | 0.78 | 0.626500
Pillai | Compaction×Cultivation | 8 | 16 | 0.60274 | 0.86 | 0.565500
HL | Compaction×Cultivation | 8 | 8 | 0.92313 | 0.77 | 0.640200
Roy | Compaction×Cultivation | 4 | 8 | 0.67161 | 1.34 | 0.333900
Wilk | Block | 8 | 14 | 0.19026 | 2.26 | 0.086800
Pillai | Block | 8 | 16 | 0.9195 | 1.7 | 0.173800
HL | Block | 8 | 8 | 3.67895 | 3.07 | 0.066900
Roy | Block | 4 | 8 | 3.51482 | 7.03 | 0.009900 | **

Seasonal Split Plot

Season 1

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 0.2247053 | 0.802677
Cultivation | 1 | 1.0992693 | 0.319105
Compaction×Cultivation | 2 | 1.4359927 | 0.282991
Block | 2 | 3.7777499 | 0.059970
Month | 2 | 0.561242 | 0.577809
Month×Compaction | 4 | 2.9751665 | 0.039660 | *
Month×Cultivation | 2 | 0.7881072 | 0.466122
Month×Compaction×Cultivation | 4 | 2.3679733 | 0.081172

Season 2


Season 3

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 2.7697966 | 0.110357
Cultivation | 1 | 1.6339222 | 0.230033
Compaction×Cultivation | 2 | 2.9406423 | 0.098986
Block | 2 | 11.835921 | 0.002310 | **
Month | 2 | 14.509175 | 0.000074 | ***
Month×Compaction | 4 | 4.4397701 | 0.007927 | **
Month×Cultivation | 2 | 1.4491961 | 0.254577
Month×Compaction×Cultivation | 4 | 0.4014639 | 0.805637

Season 4


Seasonal MANOVA

Season 1


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.67994 | 1.26 | 0.352800
Pillai | Cultivation | 3 | 8 | 0.32006 | 1.26 | 0.352800
HL | Cultivation | 3 | 8 | 0.47073 | 1.26 | 0.352800
Roy | Cultivation | 3 | 8 | 0.47073 | 1.26 | 0.352800
Wilk | Compaction | 6 | 16 | 0.31874 | 2.06 | 0.116700
Pillai | Compaction | 6 | 18 | 0.70365 | 1.63 | 0.196600
HL | Compaction | 6 | 9.1 | 2.06713 | 2.65 | 0.090700
Roy | Compaction | 3 | 9 | 2.03257 | 6.1 | 0.015000 | *
Wilk | Compaction×Cultivation | 6 | 16 | 0.40742 | 1.51 | 0.237100
Pillai | Compaction×Cultivation | 6 | 18 | 0.72154 | 1.69 | 0.180100
HL | Compaction×Cultivation | 6 | 9.1 | 1.13792 | 1.46 | 0.292000
Roy | Compaction×Cultivation | 3 | 9 | 0.65374 | 1.96 | 0.190400
Wilk | Block | 6 | 16 | 0.28721 | 2.31 | 0.084800
Pillai | Block | 6 | 18 | 0.78514 | 1.94 | 0.129100
HL | Block | 6 | 9.1 | 2.22986 | 2.86 | 0.075500
Roy | Block | 3 | 9 | 2.1105 | 6.33 | 0.013400 | *

Season 2


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.72433 | 1.01 | 0.435300
Pillai | Cultivation | 3 | 8 | 0.27567 | 1.01 | 0.435300
HL | Cultivation | 3 | 8 | 0.38058 | 1.01 | 0.435300
Roy | Cultivation | 3 | 8 | 0.38058 | 1.01 | 0.435300
Wilk | Compaction | 6 | 16 | 0.40329 | 1.53 | 0.230500
Pillai | Compaction | 6 | 18 | 0.70941 | 1.65 | 0.191200
HL | Compaction | 6 | 9.1 | 1.20019 | 1.54 | 0.268300
Roy | Compaction | 3 | 9 | 0.88411 | 2.65 | 0.112200
Wilk | Compaction×Cultivation | 6 | 16 | 0.38277 | 1.64 | 0.199300
Pillai | Compaction×Cultivation | 6 | 18 | 0.74425 | 1.78 | 0.160500
HL | Compaction×Cultivation | 6 | 9.1 | 1.28073 | 1.64 | 0.240700
Roy | Compaction×Cultivation | 3 | 9 | 0.92006 | 2.76 | 0.103800
Wilk | Block | 6 | 16 | 0.21925 | 3.03 | 0.035700 | *
Pillai | Block | 6 | 18 | 1.00437 | 3.03 | 0.031700 | *
HL | Block | 6 | 9.1 | 2.5411 | 3.26 | 0.054200
Roy | Block | 3 | 9 | 2.04151 | 6.12 | 0.014800 | *

Season 3


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.47535 | 2.94 | 0.098700
Pillai | Cultivation | 3 | 8 | 0.52465 | 2.94 | 0.098700
HL | Cultivation | 3 | 8 | 1.1037 | 2.94 | 0.098700
Roy | Cultivation | 3 | 8 | 1.1037 | 2.94 | 0.098700
Wilk | Compaction | 6 | 16 | 0.27138 | 2.45 | 0.071000
Pillai | Compaction | 6 | 18 | 0.73067 | 1.73 | 0.172000
HL | Compaction | 6 | 9.1 | 2.67722 | 3.43 | 0.047100 | *
Roy | Compaction | 3 | 9 | 2.67439 | 8.02 | 0.006500 | **
Wilk | Compaction×Cultivation | 6 | 16 | 0.44624 | 1.33 | 0.302300
Pillai | Compaction×Cultivation | 6 | 18 | 0.61073 | 1.32 | 0.299100
HL | Compaction×Cultivation | 6 | 9.1 | 1.11325 | 1.43 | 0.302100
Roy | Compaction×Cultivation | 3 | 9 | 0.98342 | 2.95 | 0.090800
Wilk | Block | 6 | 16 | 0.17164 | 3.77 | 0.015700 | *
Pillai | Block | 6 | 18 | 0.8329 | 2.14 | 0.098500
HL | Block | 6 | 9.1 | 4.79977 | 6.15 | 0.008000 | **
Roy | Block | 3 | 9 | 4.79426 | 14.38 | 0.000900 | ***

Season 4


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.65149 | 1.43 | 0.305000
Pillai | Cultivation | 3 | 8 | 0.34851 | 1.43 | 0.305000
HL | Cultivation | 3 | 8 | 0.53495 | 1.43 | 0.305000
Roy | Cultivation | 3 | 8 | 0.53495 | 1.43 | 0.305000
Wilk | Compaction | 6 | 16 | 0.51769 | 1.04 | 0.436400
Pillai | Compaction | 6 | 18 | 0.5572 | 1.16 | 0.370400
HL | Compaction | 6 | 9.1 | 0.78701 | 1.01 | 0.474800
Roy | Compaction | 3 | 9 | 0.49443 | 1.48 | 0.283800
Wilk | Compaction×Cultivation | 6 | 16 | 0.60014 | 0.78 | 0.600600
Pillai | Compaction×Cultivation | 6 | 18 | 0.41514 | 0.79 | 0.592300
HL | Compaction×Cultivation | 6 | 9.1 | 0.6408 | 0.82 | 0.580100
Roy | Compaction×Cultivation | 3 | 9 | 0.59824 | 1.79 | 0.218100
Wilk | Block | 6 | 16 | 0.31867 | 2.06 | 0.116600
Pillai | Block | 6 | 18 | 0.82148 | 2.09 | 0.105300
HL | Block | 6 | 9.1 | 1.6983 | 2.18 | 0.140600
Roy | Block | 3 | 9 | 1.3795 | 4.14 | 0.042300 | *

APPENDIX P – MICROBIAL C:N RATIO

Degrees of freedom, test statistics, F values and p values have their usual interpretations.






[Figure: Microbial Carbon to Nitrogen Ratio By Compaction, Cultivation. x-axis: Month (1-14); y-axis: Mean Microbial Carbon to Nitrogen Ratio.]

[Figure: Microbial Carbon to Nitrogen Ratio By Compaction, Cultivation (3MA Smoothed). x-axis: Month (1-14); y-axis: Mean Microbial Carbon to Nitrogen Ratio.]

Correlations

Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.181414152 | 0.019091962 | -0.21189515 | -0.25247
1 | 0.008497303 | 0.354591055 | 0.075656439 | -0.28758
2 | 0.068060908 | -0.00338806 | -0.23589943 | -0.24208
3 | 0.249237729 | -0.29198277 | -0.34000751 | 0.00542
4 | 0.316141731 | -0.09824805 | -0.19648889 | -0.03484


p values for the correlations above:
Lag | Rainfall | Max. Temp | Min. Temp | Moisture
0 | 0.457310377 | 0.938164708 | 0.383821949 | <.0001
1 | 0.973305208 | 0.148792127 | 0.765423105 | <.0001
2 | 0.795212646 | 0.989703426 | 0.36202593 | 0.0003
3 | 0.351911053 | 0.272499568 | 0.19756854 | 0.9396
4 | 0.250994357 | 0.7275801 | 0.482761667 | 0.6425


[Figure: Cross-correlation panels for Rainfall, Max. Temp., Min. Temp. and Soil Moisture; x-axis: lag (0-4); y-axis: Correlation (-1 to 1).]

Overall Split Plot

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 1.786044 | 0.217153
Cultivation | 1 | 7.516088 | 0.020777 | *
Compaction×Cultivation | 2 | 0.102909 | 0.903153
Block | 2 | 5.37481 | 0.025999 | *
Season | 3 | 107.9573 | 0.000000 | ***
Season×Compaction | 6 | 3.059472 | 0.015975 | *
Season×Cultivation | 3 | 4.632102 | 0.007704 | **
Season×Compaction×Cultivation | 6 | 5.364746 | 0.000483 | ***

Overall MANOVA


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 4 | 7 | 0.25088 | 5.23 | 0.028600 | *
Pillai | Cultivation | 4 | 7 | 0.74912 | 5.23 | 0.028600 | *
HL | Cultivation | 4 | 7 | 2.98597 | 5.23 | 0.028600 | *
Roy | Cultivation | 4 | 7 | 2.98597 | 5.23 | 0.028600 | *
Wilk | Compaction | 8 | 14 | 0.19546 | 2.21 | 0.093100
Pillai | Compaction | 8 | 16 | 1.0953 | 2.42 | 0.062900
HL | Compaction | 8 | 8 | 2.62856 | 2.19 | 0.144100
Roy | Compaction | 4 | 8 | 1.80393 | 3.61 | 0.057800
Wilk | Compaction×Cultivation | 8 | 14 | 0.19083 | 2.26 | 0.087500
Pillai | Compaction×Cultivation | 8 | 16 | 1.07103 | 2.31 | 0.073700
HL | Compaction×Cultivation | 8 | 8 | 2.86792 | 2.39 | 0.119600
Roy | Compaction×Cultivation | 4 | 8 | 2.26102 | 4.52 | 0.033400 | *
Wilk | Block | 8 | 14 | 0.23047 | 1.9 | 0.141100
Pillai | Block | 8 | 16 | 0.90789 | 1.66 | 0.184000
HL | Block | 8 | 8 | 2.73856 | 2.28 | 0.132200
Roy | Block | 4 | 8 | 2.49825 | 5 | 0.025800 | *

Seasonal Split Plot

Season 1

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 3.1441144 | 0.087223
Cultivation | 1 | 0.1351852 | 0.720778
Compaction×Cultivation | 2 | 0.0376455 | 0.963190
Block | 2 | 3.1389074 | 0.087502
Month | 2 | 0.0707726 | 0.931867
Month×Compaction | 4 | 3.2318507 | 0.029541 | *
Month×Cultivation | 2 | 3.2104226 | 0.058141
Month×Compaction×Cultivation | 4 | 9.3208489 | 0.000108 | ***

Season 2

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 1.9565573 | 0.191813
Cultivation | 1 | 10.422711 | 0.009045 | **
Compaction×Cultivation | 2 | 3.0133435 | 0.094576
Block | 2 | 1.0109636 | 0.398226
Month | 2 | 121.84314 | 0.000000 | ***
Month×Compaction | 4 | 3.4508415 | 0.023072 | *
Month×Cultivation | 2 | 0.7271065 | 0.493650
Month×Compaction×Cultivation | 4 | 1.8520383 | 0.151800

Season 3

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 3.5457504 | 0.068564
Cultivation | 1 | 0.1743075 | 0.685134
Compaction×Cultivation | 2 | 4.2775027 | 0.045467 | *
Block | 2 | 7.0336997 | 0.012384 | *
Month | 2 | 20.237078 | 0.000007 | ***
Month×Compaction | 4 | 14.306756 | 0.000004 | ***
Month×Cultivation | 2 | 9.5827093 | 0.000873 | ***
Month×Compaction×Cultivation | 4 | 1.0279365 | 0.413076

Season 4

Source of Variation | Df | F Value | p value | Sig
Compaction | 2 | 2.0741743 | 0.176389
Cultivation | 1 | 11.677755 | 0.006577 | **
Compaction×Cultivation | 2 | 8.6491272 | 0.006597 | **
Block | 2 | 0.9382031 | 0.423228
Month | 2 | 1.3062356 | 0.289409
Month×Compaction | 4 | 0.6745547 | 0.616144
Month×Cultivation | 2 | 4.9171217 | 0.016228 | *
Month×Compaction×Cultivation | 4 | 1.4502301 | 0.248306

Seasonal MANOVA

Season 1


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.48111 | 2.88 | 0.103200
Pillai | Cultivation | 3 | 8 | 0.51889 | 2.88 | 0.103200
HL | Cultivation | 3 | 8 | 1.07854 | 2.88 | 0.103200
Roy | Cultivation | 3 | 8 | 1.07854 | 2.88 | 0.103200
Wilk | Compaction | 6 | 16 | 0.26806 | 2.48 | 0.068300
Pillai | Compaction | 6 | 18 | 0.78752 | 1.95 | 0.127400
HL | Compaction | 6 | 9.1 | 2.5231 | 3.23 | 0.055200
Roy | Compaction | 3 | 9 | 2.43805 | 7.31 | 0.008700 | **
Wilk | Compaction×Cultivation | 6 | 16 | 0.15982 | 4 | 0.012200 | *
Pillai | Compaction×Cultivation | 6 | 18 | 0.896 | 2.43 | 0.066900
HL | Compaction×Cultivation | 6 | 9.1 | 4.90792 | 6.29 | 0.007500 | **
Roy | Compaction×Cultivation | 3 | 9 | 4.8357 | 14.51 | 0.000900 | ***
Wilk | Block | 6 | 16 | 0.33944 | 1.91 | 0.140900
Pillai | Block | 6 | 18 | 0.7683 | 1.87 | 0.141400
HL | Block | 6 | 9.1 | 1.62859 | 2.09 | 0.153300
Roy | Block | 3 | 9 | 1.40223 | 4.21 | 0.040700 | *

Season 2


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.45011 | 3.26 | 0.080700
Pillai | Cultivation | 3 | 8 | 0.54989 | 3.26 | 0.080700
HL | Cultivation | 3 | 8 | 1.22167 | 3.26 | 0.080700
Roy | Cultivation | 3 | 8 | 1.22167 | 3.26 | 0.080700
Wilk | Compaction | 6 | 16 | 0.13576 | 4.57 | 0.006900 | **
Pillai | Compaction | 6 | 18 | 0.9546 | 2.74 | 0.045300 | *
HL | Compaction | 6 | 9.1 | 5.70026 | 7.31 | 0.004500 | **
Roy | Compaction | 3 | 9 | 5.58101 | 16.74 | 0.000500 | ***
Wilk | Compaction×Cultivation | 6 | 16 | 0.14847 | 4.25 | 0.009500 | **
Pillai | Compaction×Cultivation | 6 | 18 | 1.17356 | 4.26 | 0.007600 | **
HL | Compaction×Cultivation | 6 | 9.1 | 3.56643 | 4.57 | 0.020700 | *
Roy | Compaction×Cultivation | 3 | 9 | 2.78865 | 8.37 | 0.005700 | **
Wilk | Block | 6 | 16 | 0.12022 | 5.02 | 0.004500 | **
Pillai | Block | 6 | 18 | 1.28391 | 5.38 | 0.002400 | **
HL | Block | 6 | 9.1 | 3.95635 | 5.07 | 0.015000 | *
Roy | Block | 3 | 9 | 2.72092 | 8.16 | 0.006200 | **

Season 3


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.1361 | 16.93 | 0.000800 | ***
Pillai | Cultivation | 3 | 8 | 0.8639 | 16.93 | 0.000800 | ***
HL | Cultivation | 3 | 8 | 6.34779 | 16.93 | 0.000800 | ***
Roy | Cultivation | 3 | 8 | 6.34779 | 16.93 | 0.000800 | ***
Wilk | Compaction | 6 | 16 | 0.12778 | 4.79 | 0.005600 | **
Pillai | Compaction | 6 | 18 | 0.89702 | 2.44 | 0.066500
HL | Compaction | 6 | 9.1 | 6.63165 | 8.5 | 0.002600 | **
Roy | Compaction | 3 | 9 | 6.60224 | 19.81 | 0.000300 | ***
Wilk | Compaction×Cultivation | 6 | 16 | 0.21373 | 3.1 | 0.032800 | *
Pillai | Compaction×Cultivation | 6 | 18 | 0.78645 | 1.94 | 0.128200
HL | Compaction×Cultivation | 6 | 9.1 | 3.67808 | 4.72 | 0.018900 | *
Roy | Compaction×Cultivation | 3 | 9 | 3.67785 | 11.03 | 0.002300 | **
Wilk | Block | 6 | 16 | 0.08989 | 6.23 | 0.001600 | **
Pillai | Block | 6 | 18 | 1.07905 | 3.52 | 0.017600 | *
HL | Block | 6 | 9.1 | 8.24579 | 10.57 | 0.001200 | **
Roy | Block | 3 | 9 | 8.01118 | 24.03 | 0.000100 | ***

Season 4


Test | Effect | Num DF | Den DF | Value | F Value | p value | Sig
Wilk | Cultivation | 3 | 8 | 0.28018 | 6.85 | 0.013400 | *
Pillai | Cultivation | 3 | 8 | 0.71982 | 6.85 | 0.013400 | *
HL | Cultivation | 3 | 8 | 2.56911 | 6.85 | 0.013400 | *
Roy | Cultivation | 3 | 8 | 2.56911 | 6.85 | 0.013400 | *
Wilk | Compaction | 6 | 16 | 0.58204 | 0.83 | 0.564700
Pillai | Compaction | 6 | 18 | 0.44206 | 0.85 | 0.547900
HL | Compaction | 6 | 9.1 | 0.67669 | 0.87 | 0.552600
Roy | Compaction | 3 | 9 | 0.60867 | 1.83 | 0.212600
Wilk | Compaction×Cultivation | 6 | 16 | 0.26025 | 2.56 | 0.062200
Pillai | Compaction×Cultivation | 6 | 18 | 0.90155 | 2.46 | 0.064600
HL | Compaction×Cultivation | 6 | 9.1 | 2.22072 | 2.85 | 0.076300
Roy | Compaction×Cultivation | 3 | 9 | 1.89214 | 5.68 | 0.018400 | *
Wilk | Block | 6 | 16 | 0.68459 | 0.56 | 0.758500
Pillai | Block | 6 | 18 | 0.3251 | 0.58 | 0.739900
HL | Block | 6 | 9.1 | 0.44658 | 0.57 | 0.744100
Roy | Block | 3 | 9 | 0.41226 | 1.24 | 0.352300
