
Stat Methods Appl (2012) 21:363–369
DOI 10.1007/s10260-012-0203-6

Discussion of “An analysis of global warming in the Alpine region based on nonlinear nonstationary time series models” by F. Battaglia and M. K. Protopapas

Domenico Piccolo

Accepted: 6 July 2012 / Published online: 21 July 2012
© Springer-Verlag 2012

Abstract We discuss the scientific contribution of Battaglia and Protopapas’ paper concerning the debate on global warming, supported by an extensive analysis of temperature time series in the Alpine region. In their work, the Authors use several exploratory and modelling tools for assessing and discriminating the presence of different patterns in the data. We add some general and specific considerations mainly devoted to the modelling stage of their analysis.

Keywords ARIMA models · Time series classification · AR metric · Forecastability content · Non-linear models

1 Introduction

Battaglia and Protopapas have offered an important addition to the debate on global warming, supported by an extensive analysis of temperature time series in the Alpine region. A significant feature of this work is the use of several exploratory and modelling tools for assessing and discriminating the presence of different patterns in the data, with special reference to the last decades: thus, the paper is a valuable prototype of what a statistician should do when faced with dynamic data, since it combines experimental data, different modelling approaches and advanced computational tools.

A missing topic of the paper concerns the possibility of applying one of the several techniques for clustering time series data (see Liao 2005 for a survey). In fact, the regionalization objective is a relevant and common issue in meteorological, environmental and hydrological data analysis, as recently emphasized by Corduas (2011), who introduced a mixed strategy for classifying time series, based on the use of regression models and a testing procedure for the AR metric (Corduas and Piccolo 2008). Although the AR metric (Piccolo 1984, 1990) preliminarily assumes that the series have been transformed in such a way that the observed data are realizations of invertible ARIMA models, the use of such a distance could add further evidence about the common dynamic structure of the selected temperature series (see Otranto 2010 for empirical evidence and generalizations).

D. Piccolo, Department TEOMESUS, Statistical Sciences Unit, University of Naples Federico II, Naples, Italy
e-mail: [email protected]

In this perspective, my discussion is mainly motivated by a different approach to studying dynamic data generated by the same phenomenon in different circumstances. Thus, I will not question the interesting results obtained by the Authors but will focus on some general principles, a few exploratory additions and some modelling issues which may be worthy of consideration. A concluding remark will end this discussion.

2 General principles

I have a general criticism of the principle that a statistician should approach time series with a fitting purpose and without wondering whether it is possible to capture the data generating process. In fact, when we observe a pattern in real data we only see a slice of a single realization of a stochastic process. Thus, any analysis involving time series data (motivated by forecasting, clustering, discrimination, control, etc.) should be strongly anchored to the possibility of doing inference about the stochastic process that has generated those data. Surprisingly, although the Authors state that “…, what we are searching for is not a formalization of the data generating process, but only a parsimonious and efficient representation of the dynamic behavior of the observed data” (Sect. 3), they do look for the data generating process of temperature when they check for a break in the time trend motivated by CO2 and greenhouse gas emissions, and other anthropic factors. Moreover, the Authors argue that the hypothesis of no change in temperature in recent years must be rejected, and they are oriented to check whether a model with structural breaks is valuable and validated by the data. This again goes in the direction of searching for the data generating process supported by data.

A second general principle is that, in time series analysis, we should distinguish between how well a model is able to fit (and forecast) data and how large the forecastability content of our data is. Indeed, it is well known that no model will improve the forecast of a series generated by pure white noise, and whatever model is built for such data would simply be an artifact. As a consequence, it is necessary to establish what the benchmark for forecastability (and, thus, for the explanation of data) is before looking for the “best model”. A possible measure of the forecastability content of data should be nonparametric. In this respect, the spectral approach may be a convenient starting point for developing an appropriate measure. This general consideration seems relevant also for temperature data, where a common pattern in the data is evident. Indeed, Battaglia and Protopapas’ paper confirms that a simple model is really effective for capturing the main features of the temperature time series of the Alpine region.
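As an illustration only, one possible spectral benchmark (an assumption of this sketch, not a measure proposed in the paper under discussion) compares the one-step prediction variance given by Kolmogorov's formula with the total variance of the series, both estimated from the raw periodogram. The function names and the simulated series below are hypothetical; the Euler constant corrects the known downward bias of the log of a raw periodogram ordinate.

```python
import cmath
import math
import random

EULER_GAMMA = 0.5772156649015329  # E[log I] = log(2*pi*f) - gamma for a raw periodogram ordinate


def periodogram(x):
    """Raw periodogram at the Fourier frequencies strictly between 0 and pi."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    ordinates = []
    for j in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * j / n
        dft = sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(centred))
        ordinates.append(abs(dft) ** 2 / n)
    return ordinates


def spectral_forecastability(x):
    """Estimate 1 - (one-step prediction variance) / (variance of the series)."""
    pgram = periodogram(x)
    mean_log = sum(math.log(p) for p in pgram) / len(pgram)
    innovation_var = math.exp(mean_log + EULER_GAMMA)  # Kolmogorov's formula, bias-corrected
    total_var = sum(pgram) / len(pgram)                # average ordinate estimates the variance
    return 1.0 - innovation_var / total_var


def simulate_ar1(phi, n, rng):
    """Gaussian AR(1) path with unit innovation variance (200 burn-in values discarded)."""
    x, out = 0.0, []
    for t in range(n + 200):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= 200:
            out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(512)]
    print("white noise:", round(spectral_forecastability(white), 2))
    print("AR(1), phi = 0.9:", round(spectral_forecastability(simulate_ar1(0.9, 512, rng)), 2))
```

For white noise the index should fluctuate around zero, whereas for a strongly autocorrelated series it approaches one; for an AR(1) process the theoretical value is φ². The naive DFT is used only to keep the sketch free of external libraries.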


Fig. 1 Standardized time series of temperatures (horizontal axis: years, 1800–2000; vertical axis: standardized time series, from −2 to 3)

3 Exploratory evidence

The Authors introduce temperature data with remarkable conciseness and clarity, and support the geographical subdivision into four subregions by means of statistical indices and some exploratory graphs. In the same perspective, we would make a few considerations which may be useful when handling such time series data.

First of all, it seems evident from Fig. 3 of Battaglia and Protopapas’ paper that the temperature series are characterized by varying level and dispersion. The location (that is, the latitude) of the selected stations and some characteristics of the Alpine environment are well-known circumstances which may cause such behaviour. Thus, a first step would be to compare the time series after removing the location and dispersion effects by means of a simple standardization of the data. In Fig. 1 the overlapped plots of the 15 standardized time series are displayed. We notice a homogeneous pattern, with a few limited discrepancies in some short periods. In this way, preliminary evidence in favour of the Authors’ choice to use a unique modelling tool for all data is achieved.
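The standardization step described above can be sketched as follows. The station labels and the numbers in the usage example are made up for illustration and are not the Alpine temperature data; the function names are likewise hypothetical.

```python
import statistics


def standardize(series):
    """Remove location and dispersion: subtract the mean, divide by the sample sd."""
    mean = statistics.fmean(series)
    sd = statistics.stdev(series)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in series]


def standardize_all(series_by_station):
    """Standardize every station's series so that the paths can be overlaid on one plot."""
    return {station: standardize(values) for station, values in series_by_station.items()}


if __name__ == "__main__":
    # Hypothetical toy values: same pattern, different level and spread.
    toy = {
        "station_A": [8.1, 8.4, 7.9, 8.8, 9.2],
        "station_B": [12.2, 12.8, 11.8, 13.6, 14.4],
    }
    for station, z in standardize_all(toy).items():
        print(station, [round(v, 2) for v in z])
```

Two series that differ only by level and scale become identical after this transformation, which is why overlaying the standardized paths is informative about a common pattern.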

Fig. 2 Estimates of global autocorrelation functions of temperatures (stations by row)

A second step that we would have taken concerns the search for a preliminary transformation in order to check whether the standard assumptions for time series modelling are satisfied or whether they may be better achieved by transforming the data. In this respect, we found that the maximum likelihood estimation of the parameter of the Box–Cox transformation (Box and Cox 1964) leads to λ ranging in [−2.0, 0.79]; often, the confidence interval for λ did not include 1. We are conscious that the presence of a break in any temperature series induces a serious bias in the estimation of λ. Nevertheless, we would have preferred that the logarithmic transformation be applied to the data (a logarithmic transformation improves the Gaussianity of all series). As a matter of fact, the joint use of first differences and logged series is a common device for modelling purposes and interpretation, since ∇ log X_t is approximately the relative rate of variation of the series, and the study of this quantity has been indicated by the Authors as one of the main questions (Sect. 1).
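A minimal sketch of how a Box–Cox parameter and a likelihood-ratio confidence interval may be obtained is given below. It assumes strictly positive observations and an i.i.d. Gaussian model after transformation (so it ignores both serial correlation and the structural break mentioned above); the grid search, the function names and the simulated data are illustrative choices, not the procedure actually used for the reported estimates.

```python
import math
import random

CHI2_1_95 = 3.841  # 95 % quantile of the chi-square distribution with 1 degree of freedom


def boxcox_transform(x, lam):
    """Box-Cox transform of strictly positive data (log transform when lam = 0)."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def boxcox_profile_loglik(x, lam):
    """Profile log-likelihood of lam, up to an additive constant."""
    n = len(x)
    y = boxcox_transform(x, lam)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    return -0.5 * n * math.log(var_y) + (lam - 1.0) * sum(math.log(v) for v in x)


def default_grid():
    """Candidate values of lam from -3 to 3 in steps of 0.01 (an arbitrary choice)."""
    return [k / 100.0 for k in range(-300, 301)]


def boxcox_mle(x, grid=None):
    """Grid-search maximum likelihood estimate of lam."""
    grid = default_grid() if grid is None else grid
    return max(grid, key=lambda lam: boxcox_profile_loglik(x, lam))


def boxcox_confidence_interval(x, grid=None):
    """Approximate 95 % interval: grid values not rejected by the likelihood-ratio test."""
    grid = default_grid() if grid is None else grid
    loglik = [boxcox_profile_loglik(x, lam) for lam in grid]
    best = max(loglik)
    kept = [lam for lam, ll in zip(grid, loglik) if 2.0 * (best - ll) <= CHI2_1_95]
    return min(kept), max(kept)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(300)]  # simulated log-normal data
    print("lambda estimate:", boxcox_mle(x))
    print("approximate 95 % interval:", boxcox_confidence_interval(x))
```

When the interval excludes 1, the untransformed scale is not supported by this criterion; a value of λ near 0 points to the logarithmic transformation.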

4 Modelling issues

We do not discuss the nonlinearity arguments (other discussants are more expert on these topics), but the selection of the IMA(1,1) model for fitting the series and, then, the Authors’ conclusion about its inefficacy raise some questions. The initial choice has in fact relevant consequences for the final steps, when piecewise and IMA(1,1) models are compared (see Table 7; Figure 10).

Figures 2 and 3 illustrate the estimates of the global and partial autocorrelation functions of the original data, respectively. We found no evidence of a ∇ operator from these plots; instead, according to our experience and standard identification practice, the pattern of the correlation structures (both global and partial) seems to support the presence of AR or ARMA components (first 2–3 lags, for instance).
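For readers who wish to reproduce this kind of identification check, the sketch below estimates the global and partial autocorrelation functions, the latter through the Durbin–Levinson recursion. The function names and the simulated AR(1) series are illustrative assumptions; the temperature data themselves are not reproduced here.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r(0), ..., r(max_lag), with divisor n at every lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0 for k in range(max_lag + 1)]


def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, max_lag)
    pacf = [1.0]
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi_new
    return pacf


def white_noise_band(n):
    """Approximate 95 % band for the correlations of a white noise series of length n."""
    return 1.96 / math.sqrt(n)


if __name__ == "__main__":
    rng = random.Random(2)
    x, prev = [], 0.0
    for _ in range(300):  # simulated AR(1) with coefficient 0.7
        prev = 0.7 * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    print("acf :", [round(v, 2) for v in sample_acf(x, 5)])
    print("pacf:", [round(v, 2) for v in sample_pacf(x, 5)])
    print("band:", round(white_noise_band(len(x)), 2))
```

A slowly and almost linearly decaying global function would suggest a ∇ operator, whereas a partial function that cuts off after the first few lags points to AR components, which is the reading given above.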


Fig. 3 Estimates of partial autocorrelation functions of temperatures (stations by row)

The Authors base their choice on automatic identification criteria. In principle, we do not adhere to the blind use of such a procedure unless obliged by situations where a large number of time series are to be modelled. Thus, we wonder why they do not use an ad hoc modelling approach when dealing with such a small number of series. Surely, there is a misprint in Table 2, since the specification of the IMA(1,1) model is correctly defined according to the classical (Box and Jenkins 1970) notation, but the negative estimates in Table 2 are not consistent with the negative values of the lag-1 autocorrelation estimates of the first difference, say ρ̂(1): under that notation, ∇x_t = (1 − θB)a_t implies ρ(1) = −θ/(1 + θ²), so that a negative ρ̂(1) calls for a positive θ.

However, our main concern with the IMA(1,1) results refers to the identification of the model, given the estimates of ρ̂(1) for the series ∇x_t as listed in Table 1.

Table 1 Estimates of the lag-1 autocorrelation function of the first difference

Site   Estimate    Site   Estimate    Site   Estimate
BER    −0.481      MIL    −0.487      STU    −0.483
GEN    −0.475      MUN    −0.492      TOR    −0.496
INN    −0.477      PAD    −0.455      UDI    −0.457
KAR    −0.501      REG    −0.492      VER    −0.469
KRE    −0.524      STR    −0.504      WIE    −0.512

Then, in 4 cases (the KAR, KRE, STR and WIE stations) |ρ̂(1)| > 0.5, and this violates the constraint |ρ(1)| < 0.5, which is a necessary condition for an MA(1) process to be admissible. Moreover, for other sites (MIL, MUN, REG and TOR) the estimate is about 0.49 in absolute value, that is, very close to the non-invertibility bound. As a consequence, we really suspect that the introduction of a ∇ operator induces an overdifferencing in the observed time series, which has been partially recovered by the introduction of the θ parameter.

Finally, some words are needed in order to clarify the definition and the use of the R² measure (Sects. 2 and 4, and Tables 2 and 4) as a fitting device. We assume that the Authors use this measure as a tool for comparing empirical initial and residual variances, since the index is not defined for a nonstationary process, as is the case for IMA(1,1) models. Then, we expected an increasing relationship between the estimated θ and R². In addition, when applied to observed data, the index R² discounts the effect of both operators ∇ and (1 − θB), and we are unable to measure the effectiveness of an MA parameter in this context.
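The admissibility bound on ρ(1) invoked above for the Table 1 estimates can be checked numerically. The sketch below uses the Box–Jenkins convention ∇x_t = (1 − θB)a_t, for which ρ(1) = −θ/(1 + θ²), and solves this relation for θ; the values are those of Table 1, while the function and variable names are illustrative.

```python
import math

# Lag-1 autocorrelation estimates of the first differences, as listed in Table 1.
TABLE_1 = {
    "BER": -0.481, "GEN": -0.475, "INN": -0.477, "KAR": -0.501, "KRE": -0.524,
    "MIL": -0.487, "MUN": -0.492, "PAD": -0.455, "REG": -0.492, "STR": -0.504,
    "STU": -0.483, "TOR": -0.496, "UDI": -0.457, "VER": -0.469, "WIE": -0.512,
}


def ma1_theta_from_rho(rho1):
    """Root with |theta| <= 1 of rho1 * theta**2 + theta + rho1 = 0, or None if no real root.

    The quadratic follows from rho(1) = -theta / (1 + theta**2); a real solution
    exists only when |rho1| <= 0.5.
    """
    if abs(rho1) > 0.5:
        return None
    if rho1 == 0.0:
        return 0.0
    return (-1.0 + math.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1)


if __name__ == "__main__":
    for site, rho1 in sorted(TABLE_1.items()):
        theta = ma1_theta_from_rho(rho1)
        label = "no admissible MA(1)" if theta is None else f"theta = {theta:.3f}"
        print(f"{site}: rho(1) = {rho1:+.3f} -> {label}")
```

Values of ρ̂(1) just inside the bound map to θ close to 1, that is, to a moving average operator that nearly cancels the ∇ operator, which is the usual symptom of overdifferencing.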

The whole problem of the difference operator may, perhaps, be overcome if we simply compare the variances of the original x_t and differenced ∇x_t series, respectively. In all cases, except for the GEN, MIL and PAD stations (with a marginal reduction of 5.6, 5.7 and 4.2 %, respectively), the operator ∇ does increase the variance of the temperature series (from 2.7 % up to 41 %).
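The variance comparison is easy to carry out. For a stationary series, Var(∇x_t) = 2γ(0)[1 − ρ(1)], so differencing inflates the variance whenever ρ(1) < 0.5. The sketch below, with hypothetical function names and simulated series in place of the temperature data, reports the percentage change in variance induced by ∇.

```python
import random
import statistics


def difference(x):
    """First difference: x_t - x_{t-1}."""
    return [b - a for a, b in zip(x, x[1:])]


def variance_change_percent(x):
    """Percentage change in sample variance when moving from x_t to its first difference."""
    return 100.0 * (statistics.variance(difference(x)) / statistics.variance(x) - 1.0)


if __name__ == "__main__":
    rng = random.Random(4)
    noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

    # Random walk built from the same shocks: differencing is appropriate here.
    walk, level = [], 0.0
    for e in noise:
        level += e
        walk.append(level)

    print("white noise:", round(variance_change_percent(noise), 1), "%")
    print("random walk:", round(variance_change_percent(walk), 1), "%")
```

An increase in variance after differencing, as in the white noise case, is a simple warning sign of overdifferencing, whereas a genuinely integrated series shows a marked reduction.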

As a consequence, the choice of an IMA(1,1) model as a benchmark for improving linear models cannot be supported, and it cannot be used to compare linear and piecewise nonlinear forecasts (Figure 10; Table 7).

5 Concluding remarks

Modelling multiple time series for regionalization is a rewarding task for assessing the validity, consistency and usefulness of different dynamic structures. This paper meets all these objectives and, in this line of reasoning, we would add a final consideration (derived from data analysis and taking into account several arguments raised by the Authors).

It is evident that a major break occurred in the series around the years 1982–1984, to varying degrees, and this fact is undoubtedly the main objective evidence resulting from this work. However, temperature data (affected by various forces acting at the local level and subject to very long atmospheric changes involving the whole planet) maintain their sound correlation structure, which may be altered by an increasing variability. As a consequence, it would be useful to use models that include both the internal correlation structure of the data and a piecewise linear trend.


Thus, if $g(X_t; \beta_t)$ is the common process able to fit the time dynamic of temperatures and $D_t^{(T)}$ is a dummy variable (= 1 for $t > T$, and = 0 elsewhere, where $T$ is the time of the structural break), the generating process may be simplified as:

$$X_t = \left[\gamma_0 + \gamma_1 t + g(X_t; \beta_t)\right]\left(1 - D_t^{(T)}\right) + \left[c_0 + c_1 t + g(X_t; \beta_t)\right] D_t^{(T)}.$$

This structure may be equivalently written as

$$X_t = \mathrm{level}_t + \mathrm{slope}_t \, t + g(X_t; \beta_t),$$

where

$$\begin{cases} \mathrm{level}_t = \gamma_0 + (c_0 - \gamma_0)\, D_t^{(T)}; \\ \mathrm{slope}_t = \gamma_1 + (c_1 - \gamma_1)\, D_t^{(T)}. \end{cases}$$
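The equivalence of the two representations can be verified numerically for the deterministic part of the model (the common component g is the same in both regimes and is therefore left out). The parameter values and the break date used below are arbitrary illustrative numbers, not estimates, and the function names are hypothetical.

```python
def dummy(t, break_time):
    """D_t^(T): 1 for t > T and 0 elsewhere."""
    return 1 if t > break_time else 0


def trend_two_regimes(t, break_time, gamma0, gamma1, c0, c1):
    """Deterministic part written as a switch between two linear trends."""
    d = dummy(t, break_time)
    return (gamma0 + gamma1 * t) * (1 - d) + (c0 + c1 * t) * d


def trend_level_slope(t, break_time, gamma0, gamma1, c0, c1):
    """Same deterministic part written with a time-varying level and slope."""
    d = dummy(t, break_time)
    level = gamma0 + (c0 - gamma0) * d
    slope = gamma1 + (c1 - gamma1) * d
    return level + slope * t


if __name__ == "__main__":
    # Arbitrary illustrative values (not estimated from the temperature data).
    params = dict(break_time=1983, gamma0=9.0, gamma1=0.001, c0=-50.0, c1=0.03)
    for year in (1950, 1983, 1984, 2000):
        a = trend_two_regimes(year, **params)
        b = trend_level_slope(year, **params)
        print(year, round(a, 3), round(b, 3))
```

The second form makes explicit that the break acts as a shift in level and in slope, while any correlation structure carried by g is left untouched.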

In fact, the model selected by the Authors is consistent with this proposal, except that they do not maintain a common correlation structure, since they rejected an IMA(1,1) model.

As a final conclusion, the Authors are to be warmly congratulated for producing a well-written paper, with interesting results and accurate analysis, which is very stimulating for further developments in a relevant area where accurate statistical analyses are necessary for a correct debate.

References

Box GEP, Cox DR (1964) An analysis of transformations. J R Stat Soc Ser B 26:211–252
Box GEP, Jenkins GM (1970) Time series analysis: forecasting and control, revised edition 1976. Holden-Day, San Francisco
Corduas M (2011) Clustering streamflow time series for regional classification. J Hydrol 407:73–80
Corduas M, Piccolo D (2008) Time series clustering and classification by the autoregressive metric. Comput Stat Data Anal 52:1860–1872
Liao TW (2005) Clustering of time series data - a survey. Pattern Recogn 38:1857–1874
Otranto E (2010) Identifying financial time series with similar dynamic conditional correlation. Comput Stat Data Anal 54:1–15
Piccolo D (1984) Una topologia per la classe dei processi ARIMA. Statistica XLIV:47–59
Piccolo D (1990) A distance measure for classifying ARMA models. J Time Ser Anal 11:153–163
