© ECMWF March 26, 2020 Assessing the skill of SEAS5 and The signal-to-noise paradox With contributions from Stephanie Johnson and Tim Stockdale Antje Weisheimer [email protected]


Page 1: Assessing the skill of SEAS5 - ECMWF Events (Indico) · 2020-03-30 · Assessing the skill of SEAS5 and The signal-to-noise paradox With contributions from Stephanie Johnson and Tim

© ECMWF March 26, 2020

Assessing the skill of SEAS5 and

The signal-to-noise paradox

With contributions from Stephanie Johnson and Tim Stockdale

Antje Weisheimer

[email protected]

Page 2

Nino3.4 SST drift

Local SST bias is a function of:
§ lead time
§ season

System 3 (March 2007 – Nov 2011)
System 4 (Nov 2011 – Oct 2017)
System 5 (Nov 2017 onwards)

Page 3

In a perfect model:
RMSE ≃ ensemble spread

In SEAS5:
actual RMS errors > “perfect model” errors

SST forecast performance: Nino3.4 SST forecast error and ensemble spread, lead-time dependence (all seasons)

[Figure legend: RMSE forecast error; ensemble spread; RMSE of persistence; forecast anomaly correlation; persistence anomaly correlation]

Page 4

What is a perfect model ensemble?
§ Perfect sampling of the underlying probability distribution of the true state
§ Over a large number of forecasts, the statistical properties of the truth are identical to the statistical properties of a member of the ensemble
§ I.e., the truth is indistinguishable from the ensemble
§ Time-mean RMSE of the ensemble mean equals the time-mean ensemble spread

→ spread is an indication of perfect model error

From Palmer et al. (2006)
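The spread–error identity above is easy to verify on synthetic data: draw a "truth" and M ensemble members from the same distribution for each forecast case, then compare the time-mean RMSE of the ensemble mean with the time-mean ensemble spread. A minimal sketch (all names and numbers are illustrative, not SEAS5 settings):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 10_000, 50   # forecast cases, ensemble members

# Truth and members are drawn from the same distribution each case:
# a shared predictable signal plus independent unpredictable noise.
signal = rng.normal(0.0, 1.0, size=N)
truth = signal + rng.normal(0.0, 0.5, size=N)
members = signal[:, None] + rng.normal(0.0, 0.5, size=(N, M))

ens_mean = members.mean(axis=1)

# Time-mean RMSE of the ensemble mean ...
rmse = np.sqrt(np.mean((ens_mean - truth) ** 2))
# ... matches the time-mean ensemble spread (ddof=1 keeps the
# member variance an unbiased estimate of the noise variance).
spread = np.sqrt(np.mean(members.var(axis=1, ddof=1)))

print(rmse, spread)   # close to equal, up to a small 1/M correction
```

In SEAS5, by contrast, the actual RMSE exceeds the spread, which is exactly the diagnostic shown on the previous slide.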

Page 5

Nino3.4 SST forecast error and ensemble spread: seasonal dependence (lead time: 5 months)

[Figure legend: RMSE forecast error; ensemble spread; RMSE of persistence; forecast anomaly correlation; persistence anomaly correlation]

Page 6

System 4 vs SEAS5

                                  System 4 (SEAS4), 2011     SEAS5, 2017
Atmosphere                        Cycle 36r4, TL255 L91      Cycle 43r1, TCo319 L91
Ocean                             NEMO v3.0, ORCA 1.0-L42    NEMO v3.4, ORCA 0.25-L75
Sea ice model                     Sampled climatology        LIM2
Atm. initial conditions           ERA-Interim/Ops            ERA-Interim/Ops
Ocean and sea ice initial conds   OCEAN4                     OCEAN5

In this presentation, plots primarily show seasonal means at one-month forecast lead (i.e. months 2 to 4 of the forecast) using 25 ensemble members.

Reference for SEAS5 documentation: Johnson et al. (2019)

Page 7

Global 2m temperature skill differences

CRPSS(SEAS5) − CRPSS(SEAS4), DJF and JJA

Improved ENSO skill
Reduced skill in eastern Indian Ocean
Reduced skill in NW Atlantic
Improved skill around sea ice edge

Page 8

Nino3.4 SST bias in DJF

System 4

SEAS5

Large improvement in equatorial cold tongue bias

Page 9

Nino3.4 SST skill: bias and anomaly correlation

Improvement in Niño 3.4 skill, particularly at longer lead times

Page 10

[Bar chart: forecast lead time (months) at which the Nino3.4 anomaly correlation drops below 0.9 (ACC (NINO3.4) < 0.9), for systems S1, S2, S3, S4 and SEAS5; y-axis from 0.0 to 5.0 months]

Page 11

Eastern Indian Ocean

Cold SST anomalies in the eastern Indian Ocean are too large, too variable and too frequent. This results in large skill errors in the eastern box of the Indian Ocean Dipole index.

[Figures: ΔCRPSS, bias, amplitude ratio and RMSE of anomalies (⁰C) for SEAS4 and SEAS5 at months 2–4 and months 5–7, plotted against verification month]

Page 12


Eastern Indian Ocean SST forecast anomalies

System 4

SEAS5

Page 13

Northwest Atlantic

§ Poor representation of the decadal variability in the Northwest Atlantic
§ Related to the increased resolution of the new ocean analysis system

[Figures: DJF SST anomalies in the Northwest Atlantic (SEAS4 vs ERA-Interim, SEAS5 vs ERA-Interim); SEAS5 DJF SST bias]

Page 14

Arctic sea ice

SEAS4 used a simple empirical scheme that captured the trend but not the interannual variability.

Adding LIM2 improves skill in predicting sea ice but introduces sea ice biases:
→ largest biases in summer (JJA)
→ biggest skill improvements in autumn (SON)

[Figures: SEAS5 bias and SEAS5 − SEAS4 RMSE, JJA and SON]

Page 15

Arctic sea ice

Increased skill in T2m north of 70⁰N is associated with improved sea ice prediction.

The prognostic sea ice model introduces interannual variability in Arctic sea ice extent.

[Figures: ASO mean 2m temperature north of 70⁰N and ASO mean sea ice extent north of 70⁰N; July start, one-month lead]

Page 16

Zonally averaged profiles of:
§ temperature (colour)
§ zonal wind (contours)

Bias in System 4 vs bias in System 5:

Largely reduced tropospheric temperature biases in DJF
Too strong and poleward-displaced jets, especially in JJA

Strong cold tropopause biases

From Johnson et al. (2019)

Page 17

Bias of geopotential height at 500 hPa

Mean-state changes in line with tropopause warming in SEAS5
Strong jet biases in JJA

From Johnson et al. (2019)

Page 18

From Johnson et al. (2019)

Ensemble mean correlation skill for 2m temperature

Page 19

Ensemble mean correlation skill for precipitation

From Johnson et al. (2019)

Page 20

Reliability diagrams

From Johnson et al. (2019)

2m temperature upper-tercile events in DJF

Tropics Europe

Overconfident forecasts

Page 21

Hindcasts of the NAO in DJF

From Johnson et al. (2019)

Page 22

Hindcast skill of the NAO, AO and PNA in DJF
Nov start dates 1981–2016, 51 ensemble members

Two types of sampling uncertainty:
§ ensemble-size uncertainty
§ sample-years uncertainty
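One way to quantify the two sources of sampling uncertainty is to bootstrap separately over ensemble members (keeping all years) and over hindcast years (keeping the full ensemble). A sketch on a synthetic NAO-like hindcast; all names, sizes and distributions are illustrative assumptions, not the actual SEAS5 verification:

```python
import numpy as np

rng = np.random.default_rng(2)
n_members, n_years = 51, 36

# Synthetic NAO-like hindcast: weak common signal, large member noise.
signal = rng.normal(0.0, 0.4, size=n_years)
obs = signal + rng.normal(0.0, 1.0, size=n_years)
fcst = signal + rng.normal(0.0, 1.0, size=(n_members, n_years))

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# (1) Ensemble-size uncertainty: resample members, keep all years.
r_members = [corr(fcst[rng.integers(0, n_members, n_members)].mean(axis=0), obs)
             for _ in range(1000)]

# (2) Sample-years uncertainty: resample years, keep the full ensemble.
ens_mean = fcst.mean(axis=0)
r_years = [corr(ens_mean[idx], obs[idx])
           for idx in (rng.integers(0, n_years, n_years) for _ in range(1000))]

print("ensemble-size spread:", np.std(r_members))
print("sample-years spread: ", np.std(r_years))
```

The standard deviations of the two bootstrap distributions give error bars for the skill estimate attributable to each source separately.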

Page 23

C3S multi-model hindcast period

Page 24

Seasonal forecasting in the extra-tropics with low signal-to-noise ratios

[Schematic: “ideal” situation vs “real” situation — climatology, forecast and observation distributions]

Page 25

Signal and noise

$x_{m,n}$: variable $x$ with member $m$ and year $n$

Mean:
$$\bar{x} = \frac{1}{NM} \sum_{n}^{N} \sum_{m}^{M} x_{m,n}$$

Ensemble mean:
$$\bar{x}_n = \frac{1}{M} \sum_{m}^{M} x_{m,n}$$

Variance:
$$VAR_{total} = \frac{1}{NM} \sum_{n}^{N} \sum_{m}^{M} \left( x_{m,n} - \bar{x} \right)^2$$

$$VAR_{signal} = \frac{1}{N} \sum_{n}^{N} \left( \bar{x}_n - \bar{x} \right)^2 \qquad VAR_{noise} = \frac{1}{NM} \sum_{n}^{N} \sum_{m}^{M} \left( x_{m,n} - \bar{x}_n \right)^2$$

$$\boldsymbol{VAR_{total} = VAR_{signal} + VAR_{noise}}$$

ensemble mean variance → “signal”
variance of ensemble members about the ensemble mean → “noise”

→ $S/N = VAR_{signal} \, / \, VAR_{noise}$
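The decomposition above translates directly into a few lines of code. This is an illustrative sketch (synthetic data, my own function names), not ECMWF's verification software:

```python
import numpy as np

def signal_noise_decomposition(x):
    """x has shape (M, N): M ensemble members, N years.

    Returns (VAR_total, VAR_signal, VAR_noise, S/N) using the
    slide's definitions (population normalisation 1/NM and 1/N).
    """
    grand_mean = x.mean()            # x-bar
    ens_mean = x.mean(axis=0)        # x-bar_n, one value per year
    var_total = np.mean((x - grand_mean) ** 2)
    var_signal = np.mean((ens_mean - grand_mean) ** 2)
    var_noise = np.mean((x - ens_mean) ** 2)
    return var_total, var_signal, var_noise, var_signal / var_noise

# Synthetic ensemble: weak year-to-year signal, larger member noise.
rng = np.random.default_rng(1)
M, N = 25, 36
signal = rng.normal(0.0, 0.5, size=N)            # shared across members
x = signal + rng.normal(0.0, 1.0, size=(M, N))   # plus member noise

vt, vs, vn, sn = signal_noise_decomposition(x)
print(f"total={vt:.3f} signal={vs:.3f} noise={vn:.3f} S/N={sn:.2f}")
```

By construction the identity VAR_total = VAR_signal + VAR_noise holds exactly, which makes it a useful sanity check on any implementation.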

Page 26

Seasonal forecasts of the winter NAO

April 2014: ensemble hindcasts of the NAO index 1993–2012 with the Met Office model (GloSea GA3)

r = 0.62, S/N = 0.2

[Embedded journal page from Siegert et al. (JClim 2016). Recoverable figure caption — FIG. 1: raw winter NAO ensemble data generated by GloSea5 (small grey-shaded squares), ensemble mean forecasts (large grey squares), and verifying NAO observations (circles).]

Scaife et al. (GRL 2014)

Siegert et al. (JClim 2016)

calibrated

non-calibrated
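The "calibrated" label presumably refers to a variance-adjustment post-processing of the kind proposed by Eade et al. (2014) and discussed by Siegert et al. (2016): rescale the ensemble mean and the member spread so that the forecast's implied signal fraction matches the ensemble-mean correlation with observations. A minimal sketch of one common form of that adjustment; the function name, variable names and toy data are my own, and this is an assumption about the slide's calibration method, not a description of it:

```python
import numpy as np

def variance_adjust(fcst, obs):
    """Eade-et-al.-style rescaling of an (M, N) ensemble.

    After adjustment, the ensemble-mean variance is r^2 * var(obs)
    and the member spread carries the remaining (1 - r^2) * var(obs),
    so the implied signal-to-noise matches the realised skill r.
    """
    ens_mean = fcst.mean(axis=0)
    r = np.corrcoef(ens_mean, obs)[0, 1]
    s_obs = obs.std()
    s_sig = ens_mean.std()             # "signal" std of the raw ensemble
    s_noi = (fcst - ens_mean).std()    # "noise" std of the raw ensemble
    mean_adj = r * (s_obs / s_sig) * ens_mean
    noise_adj = np.sqrt(1.0 - r**2) * (s_obs / s_noi) * (fcst - ens_mean)
    return mean_adj + noise_adj

# Toy underconfident ensemble: damped signal, inflated member noise.
rng = np.random.default_rng(3)
obs = rng.normal(size=200)
fcst = 0.1 * obs + rng.normal(0.0, 1.0, size=(40, 200))

adj = variance_adjust(fcst, obs)
# The adjusted ensemble-mean std now equals r * std(obs).
print(adj.mean(axis=0).std() / obs.std())
```

The adjustment keeps the correlation of the ensemble mean unchanged (it is a positive linear rescaling) while fixing the spread, which is why it improves reliability rather than deterministic skill.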

Page 27

Correlation skill and signal-to-noise (S/N) ratio

The expected value for various measures of skill for seasonal climate predictions is determined by the S/N ratio.

[Kumar (MWR 2009), Fig. 2: variation in various measures of skill (y-axis) with signal-to-noise ratio (x-axis) for AC, HSS and RPSS. The signal-to-noise ratio is defined as the ratio of the mean shift between the climatological PDF and the PDF for the target season to the spread of the PDF, i.e. µ/σ.]

“The expected value, however, is only realized for long verification time series. In practice, the verifications for specific seasons seldom exceed a sample size of 30. The estimates of skill measure based on small verification time series, because of sampling errors, can have large departures from their expected value.”

[Kumar (MWR 2009), Fig. 3: variations in the spread of skill estimates (y-axis) with the expected value of the skill score (x-axis) for AC, HSS and RPSS, for verification time series of sizes 10–50.]

Probability of expected correlation for a given realised value of skill

[Kumar (MWR 2009), Fig. 4: probabilities that, given a realized skill estimate of r = 0.3, 0.5 or 0.7, the expected value of the skill measure takes a given value; dashed curves for a verification time series of length 20, solid curves for length 50.]

Kumar (MWR 2009)
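Kumar's spread estimates can be reproduced with a small Monte Carlo experiment: fix an expected correlation, then look at how sample anomaly correlations scatter when only a short verification series is available. An illustrative sketch (my own construction, not Kumar's exact setup):

```python
import numpy as np

rng = np.random.default_rng(4)

def ac_spread(expected_r, n_verif, n_trials=5000):
    """Std dev of sample anomaly correlations when the true
    (expected) correlation is expected_r and only n_verif
    forecast-observation pairs are available per estimate."""
    # Construct obs so that corr(signal, obs) -> expected_r
    # in the limit of a long verification series.
    sig = rng.normal(size=(n_trials, n_verif))
    obs = (expected_r * sig
           + np.sqrt(1.0 - expected_r**2) * rng.normal(size=(n_trials, n_verif)))
    sig_a = sig - sig.mean(axis=1, keepdims=True)
    obs_a = obs - obs.mean(axis=1, keepdims=True)
    r = (sig_a * obs_a).sum(axis=1) / np.sqrt(
        (sig_a**2).sum(axis=1) * (obs_a**2).sum(axis=1))
    return r.std()

# Spread is largest for low expected skill and short series,
# as in Kumar (2009), Fig. 3.
print(ac_spread(0.3, 10), ac_spread(0.3, 50))   # shrinks with series length
print(ac_spread(0.3, 25), ac_spread(0.9, 25))   # shrinks with higher skill
```

For r ≈ 0.3 and 25 verification years, the spread of the estimate is of the same order as the expected value itself, which is the extratropical seasonal-prediction regime the quote above describes.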

Page 28

The Ratio of Predictable Components (RPC)

Predictable Components (PCs) … the predictable part of the total variance

observed PC_obs … estimated from explained variance $= r^2(obs, ens\ mean)$
model PC_model … estimated from the ratio of signal variance to total variance

$$RPC = \frac{PC_{obs}}{PC_{model}} \geq \frac{r(obs, ens\ mean)}{\sqrt{VAR_{signal} / VAR_{total}}}$$
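Combined with the signal/noise variance decomposition a few slides earlier, the RPC estimate is only a few lines of code. An illustrative sketch with synthetic data (names and numbers are my own):

```python
import numpy as np

def rpc(fcst, obs):
    """Ratio of predictable components for an (M, N) ensemble:
    RPC = r(obs, ensemble mean) / sqrt(VAR_signal / VAR_total).
    RPC > 1 means the real world is more predictable than the
    model's own signal fraction implies."""
    ens_mean = fcst.mean(axis=0)
    r = np.corrcoef(ens_mean, obs)[0, 1]
    return r / np.sqrt(ens_mean.var() / fcst.var())

rng = np.random.default_rng(5)
n_members, n_years = 51, 200
signal = rng.normal(size=n_years)
obs = signal + rng.normal(0.0, 1.0, size=n_years)

# Underconfident toy ensemble: the model carries the right signal,
# but damped and buried in extra member noise -> RPC > 1.
fcst = 0.3 * signal + rng.normal(0.0, 1.5, size=(n_members, n_years))
print(rpc(fcst, obs))

# Perfect-model toy ensemble: members drawn from the same
# distribution as the observations -> RPC close to 1.
perfect = signal + rng.normal(0.0, 1.0, size=(n_members, n_years))
print(rpc(perfect, obs))
```

The underconfident construction mimics the signal-to-noise paradox: good ensemble-mean correlation despite a small model signal fraction.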

Do seasonal-to-decadal climate predictionsunderestimate the predictabilityof the real world?Rosie Eade1, Doug Smith1, Adam Scaife1, Emily Wallace1, Nick Dunstone1,Leon Hermanson1, and Niall Robinson1

1Met Office Hadley Centre, Exeter, UK

Abstract Seasonal-to-decadal predictions are inevitably uncertain, depending on the size of the predictablesignal relative to unpredictable chaos. Uncertainties can be accounted for using ensemble techniques,permitting quantitative probabilistic forecasts. In a perfect system, each ensemble member would represent apotential realization of the true evolution of the climate system, and the predictable components in modelsand reality would be equal. However, we show that the predictable component is sometimes lower in modelsthan observations, especially for seasonal forecasts of the North Atlantic Oscillation and multiyear forecastsof North Atlantic temperature and pressure. In these cases the forecasts are underconfident, with eachensemble member containing too much noise. Consequently, most deterministic and probabilistic measuresunderestimate potential skill and idealized model experiments underestimate predictability. However, skilfuland reliable predictions may be achieved using a large ensemble to reduce noise and adjusting the forecastvariance through a postprocessing technique proposed here.

1. Introduction

Individual weather events are generally not predictable more than a couple of weeks ahead. This is becausethe atmosphere is chaotic, so that infinitesimal differences in initial conditions grow over a few days intolarge-scale disturbances [Lorenz, 1963]. However, the atmosphere can be influenced by predictable slowlyvarying factors, leading to a prolonged shift in the climate. For example, in the tropics, sea surface temperature(SST) in the Pacific varies between warm and cool conditions every few years during the El Niño–SouthernOscillation (ENSO), influencing seasonal temperature and rainfall in many regions [e.g., Trenberth and Caron,2000; Alexander et al., 2002; Smith et al., 2012]. In the extratropics, North Atlantic SST varies on multidecadaltime scales, often referred to as the Atlantic Multidecadal Oscillation or Atlantic Multidecadal Variability(AMV), with associated decadal changes in climate over Europe, America, and Africa [Knight et al., 2006; Zhangand Delworth, 2006; Sutton and Hodson, 2007; Sutton and Dong, 2012], the Atlantic storm track positionand/or strength [Wilson et al., 2009; Woollings et al., 2012; Frankignoul et al., 2013], and Atlantic hurricanefrequency [Goldenberg et al., 2001; Smith et al., 2010; Dunstone et al., 2011]. Other drivers of atmosphericvariability include external factors such as solar variability, changes in greenhouse gases, changes in aerosols,and internal variability including the Madden Julian Oscillation, sudden stratospheric warmings, and theIndian Ocean dipole [Smith et al., 2012].

Seasonal-to-decadal climate predictions aim to predict these drivers and their influence on the atmosphereusing coupled general circulation models (GCMs) of the atmosphere, ocean, land, and cryosphere [e.g.,Doblas-Reyes et al., 2013; Meehl et al., 2014; Smith et al., 2012; Kirtman et al., 2013]. They therefore predictchanges in climate and the frequency of associated extreme events [Hamilton et al., 2012; Eade et al., 2012].Time-averaged predictions may be decomposed into two components: (1) unpredictable noise resultingfrom the chaotic nature of the atmosphere and (2) a component that is potentially predictable because it isconstrained by predictable factors, such as ENSO, AMV, or external forcing. Uncertainties are inevitable inseasonal-to-decadal prediction and depend on the size of the predictable signal relative to unpredictablechaos (signal-to-noise ratio), as well as imperfections in observations used for initial conditions, the fidelity ofGCMs, and uncertainties in projected radiative perturbations [e.g., Hawkins and Sutton, 2011]. Ensemblesare therefore created to assess uncertainties and enable quantitative probabilistic forecasts to be made[e.g., Palmer et al., 2004; Wang et al., 2009].

EADE ET AL. ©2014. The Authors. 5620

PUBLICATIONSGeophysical Research Letters

RESEARCH LETTER10.1002/2014GL061146

Key Points:• Model members can be too noisyand not potential realizations ofthe real world

• Predictability may be underestimatedby idealized experiments andskill measures

• Can achieve skilful and reliable forecastsusing large ensembles to reduce noise

Supporting Information:• Readme• Tables S1 and S2 and Figures S1–S8

Correspondence to:R. Eade,[email protected]

Citation:Eade, R., D. Smith, A. Scaife, E. Wallace,N. Dunstone, L. Hermanson, andN. Robinson (2014), Do seasonal-to-decadal climate predictions under-estimate the predictability of thereal world?, Geophys. Res. Lett., 41,5620–5628, doi:10.1002/2014GL061146.



Eade et al. (GRL 2014)

Page 29

Perfect model ensembles and potential skill

Properties of a perfect model ensemble
§ Time-mean ensemble spread == RMSE of ensemble mean forecast
§ Correlation skill of a perfect ensemble (or potential skill) = correlation(ensemble mean, ensemble member)
§ Observed correlation ≤ perfect model correlation ??
§ RPC of a perfect ensemble == 1
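These properties can be checked on a synthetic "perfect model", in which pseudo-observations and ensemble members are drawn from the same signal-plus-noise process. The sketch below is illustrative only: the variable names and the signal and noise amplitudes are assumptions, not values from SEAS5.

```python
import numpy as np

# Synthetic perfect-model ensemble: observations and members come from
# the same signal-plus-noise process (amplitudes are illustrative).
rng = np.random.default_rng(0)
n_years, n_members = 200, 51
signal_sd, noise_sd = 0.5, 1.0

signal = rng.normal(0, signal_sd, n_years)                       # predictable part
members = signal[:, None] + rng.normal(0, noise_sd, (n_years, n_members))
obs = signal + rng.normal(0, noise_sd, n_years)                  # pseudo-truth

ens_mean = members.mean(axis=1)

# Property 1: time-mean ensemble spread matches the RMSE of the ensemble mean
spread = members.std(axis=1, ddof=1).mean()
rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))

# Property 2: potential skill = correlation(ensemble mean, one member as truth)
potential_r = np.corrcoef(members[:, 1:].mean(axis=1), members[:, 0])[0, 1]

# Property 3: RPC = observed correlation / predictable-component estimate,
# where the latter is sqrt(var(ensemble mean) / var(individual members))
r_obs = np.corrcoef(ens_mean, obs)[0, 1]
rpc = r_obs / np.sqrt(ens_mean.var(ddof=1) / members.var(ddof=1))
```

With a perfect model the spread matches the RMSE and the RPC scatters around 1 within sampling noise; the signal-to-noise paradox discussed in the following slides is precisely the situation where the real-world RPC comes out well above 1.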

Page 30

The signal-to-noise “conundrum” or “paradox”

Eade et al. (GRL 2014)

RPC of DJF MSLP in GloSea5

…representations of troposphere, stratosphere, land surface and ocean, and the associated assimilation systems and other procedures to initialize forecasts. Atmosphere resolution is TL255 (with a grid resolution of 80 km) and 91 vertical levels, fully covering the stratosphere. The ocean has NEMO ORCA1 configuration, and initial conditions come from the ECMWF Ocean Reanalysis System 4 (ORAS4) reanalysis [Balmaseda et al., 2013]. The system became operational in November 2011 and is calibrated using a set of reforecasts covering the period of 1981–2010. The reforecasts have recently been extended to an ensemble size of 51, and results presented here are for December-January-February (DJF) seasonal mean values from integrations starting on 1 November.

Monthly values of the model AO index from each integration are defined by projection onto a pattern obtained from an EOF analysis of the ERA-Interim (ERAI) reanalysis. The pattern is the first EOF of the mean sea level pressure (MSLP) monthly anomalies in the circumpolar region 20N–90N for the period of 1981–2010, where data from all 12 calendar months are analyzed together. Our observed AO index, calculated by projecting ERAI data onto this pattern, correlates with the index from NOAA’s Climate Prediction Center at about 0.996, and our pattern is a very close match to theirs (not shown) despite using a different set of years to calculate the EOF.
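The index construction described above — project anomalies onto the leading EOF of a reference dataset — can be sketched as follows. Everything here is illustrative (`first_eof_index`, the toy grid, the planted pattern), and area weighting is omitted for brevity.

```python
import numpy as np

def first_eof_index(field):
    """Leading EOF of a (time, space) field and the standardized index from
    projecting the data onto it. No area weighting is applied here; a real
    20N-90N MSLP EOF would weight grid points by sqrt(cos(latitude))."""
    anoms = field - field.mean(axis=0)          # remove the time mean
    _, _, vt = np.linalg.svd(anoms, full_matrices=False)
    eof1 = vt[0]                                # leading spatial pattern
    index = anoms @ eof1                        # projection time series
    return eof1, index / index.std(ddof=1)      # standardized index

# toy field: one planted pattern modulated in time, plus white noise
rng = np.random.default_rng(1)
pattern = rng.normal(size=100)                  # 100-point "grid"
amplitude = rng.normal(size=360)                # 360 "months"
field = np.outer(amplitude, pattern) + 0.3 * rng.normal(size=(360, 100))

eof1, index = first_eof_index(field)
r = abs(np.corrcoef(index, amplitude)[0, 1])    # recovered index tracks amplitude
```

The same projection step, applied to model output rather than reanalysis, is what produces a model AO index that is directly comparable to the observed one.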

The ensemble mean AO forecast is then verified against the analyzed AO index. Figure 1 shows the time series of the observed and predicted AO, where the years are labeled according to the start of the forecast (e.g., 1988 forecast verifies in DJF 1988/1989). The ensemble mean AO from the model is scaled by a factor of 6 for visual presentation, for reasons to be discussed shortly. The correlation is 0.608, and many of the major peaks (e.g., 1988/1989 and 2009/2010) are well captured. Some of the biggest failures are in years affected by volcanic eruptions (1991 after Pinatubo and 1982 after El Chichon). Since volcanic aerosol can affect the NH winter circulation [e.g., Robock, 2000], and since S4 does not represent this effect very well, one might choose to exclude the 2 years after each of these eruptions from the analysis. If this is done, the correlation increases to 0.732 for the 26 nonvolcanic winters, although sampling limitations make the actual value of these correlations uncertain.

The 30 year correlation has a 95% confidence interval of 0.56 to 0.81, obtained by a basic bootstrap over ensemble members. A jackknife gives comparable intervals, but note that a percentile bootstrap cannot be used because the correlation skill of a resampled ensemble is biased compared to the full sample. Sampling error means that the correlation of 0.608 is more likely to underestimate rather than overestimate the value that would be obtained with a large ensemble, which is why the confidence interval is centered on values higher than the measured value. Although our bootstrap method gives us a higher “best guess” correlation, we retain the directly calculated value as a conservative estimate.
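A minimal sketch of a basic bootstrap over ensemble members, of the kind described above, might look as follows (the function name and the synthetic hindcast are assumptions, not the authors' code). The reflected, "basic" interval is used because a resampled ensemble repeats members, behaves like a smaller ensemble, and so has biased-low skill; a plain percentile interval would not correct for that bias.

```python
import numpy as np

def basic_bootstrap_ci(members, obs, n_boot=2000, alpha=0.05, seed=0):
    """Basic bootstrap CI for the correlation between the ensemble-mean
    hindcast and observations, resampling ensemble members with replacement.
    The interval [2*r - q_hi, 2*r - q_lo] reflects the resampled quantiles
    around the full-sample estimate, correcting the bias-low replicates."""
    rng = np.random.default_rng(seed)
    n_mem = members.shape[1]
    r_full = np.corrcoef(members.mean(axis=1), obs)[0, 1]
    reps = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n_mem, size=n_mem)   # resample member indices
        reps[i] = np.corrcoef(members[:, idx].mean(axis=1), obs)[0, 1]
    q_lo, q_hi = np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return r_full, (2 * r_full - q_hi, 2 * r_full - q_lo)

# synthetic 30-year, 51-member hindcast with a weak predictable signal
rng = np.random.default_rng(42)
signal = rng.normal(0, 0.5, 30)
members = signal[:, None] + rng.normal(0, 1.0, (30, 51))
obs = signal + rng.normal(0, 1.0, 30)

r, (lo, hi) = basic_bootstrap_ci(members, obs)
```

Because the replicates are biased low, the reflected interval sits above the directly calculated correlation, matching the behaviour described in the text.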

The high level of correlation (0.608) is surprising, because the ensemble members themselves show a high degree of spread. This can be quantified by the ratio of interannual signal (S) to intraensemble noise (N) standard deviations for the set of forecasts. The S/N ratio, calculated using unbiased variance estimators, is 0.09. (Since this is so small, there is substantial sampling uncertainty despite the large sample sizes; a jackknife estimate of the 95% confidence interval is 0.02 to 0.19). Although the total variance of the model AO has realistic amplitude for individual ensemble members, this low S/N ratio results in the ensemble mean anomalies being very small, since the noise averages out in a large ensemble. This is why Figure 1 shows the model ensemble mean AO values with a scaling.
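An S/N ratio with unbiased variance estimators can be computed as below; subtracting noise_var / n_members from the ensemble-mean variance is what makes the signal estimate unbiased, since that much residual noise survives the member averaging. The function name and the synthetic data are illustrative.

```python
import numpy as np

def signal_to_noise(members):
    """Unbiased S/N estimate for a hindcast set of shape (n_years, n_members).
    The raw variance of the ensemble mean overestimates the signal because
    noise of variance noise_var / n_members survives the averaging, so it is
    subtracted before forming the ratio."""
    n_mem = members.shape[1]
    noise_var = members.var(axis=1, ddof=1).mean()                 # intra-ensemble noise
    signal_var = members.mean(axis=1).var(ddof=1) - noise_var / n_mem
    return np.sqrt(max(signal_var, 0.0) / noise_var)

# a planted S/N of 0.5 should be recovered from synthetic hindcasts
rng = np.random.default_rng(3)
signal = rng.normal(0, 0.5, 500)
members = signal[:, None] + rng.normal(0, 1.0, (500, 51))
snr = signal_to_noise(members)
```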

If reality is like the model, the actual outcome for a given year would have little predictability, and the expected correlation between observations and even a perfect forecasting system would be low (variance …

Figure 1. The Arctic Oscillation index for DJF, as analyzed from ERAI (blue) and as predicted by the S4 ensemble mean from 1 November (red). The S4 ensemble mean is scaled by a factor of 6 to be of comparable amplitude to the observed index.



Ensemble hindcasts of the AO index 1981-2010 with the ECMWF System 4

r=0.61 S/N=0.1

Stockdale et al. (GRL 2015)

The real world seems to have higher predictability than the model.

(RPC = PC_obs / PC_model)

Page 31

MSLP anomaly correlation skill

real-world skill perfect-model skill

DJF

JJA

Page 32

Anscombe’s quartet

Illustrative example of correlation drawbacks after Anscombe (1973):

§ Four pairs of x-y variables

§ The four y variables have the same mean (=7.5), variance (=4.1) and correlation (=0.82)

§ However, distributions of variables are very different

Normally distributed, “well behaved”

Not normally distributed, non-linear relationship

Perfect linear relationship, except for one outlier

No relationship with one outlier

Anscombe (Amer. Statist. 1973)
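The quartet's near-identical summary statistics are easy to verify numerically; the data values below are Anscombe's (1973) published quartet.

```python
import numpy as np

# Anscombe's quartet: four x-y pairs with nearly identical summary
# statistics but completely different structure.
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)
ys = [
    np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
]
xs = [x123, x123, x123, x4]

for x, y in zip(xs, ys):
    r = np.corrcoef(x, y)[0, 1]
    # each panel: mean ≈ 7.50, variance ≈ 4.1, r ≈ 0.82
    print(f"mean={y.mean():.2f}  var={y.var(ddof=1):.2f}  r={r:.3f}")
```

Identical means, variances and correlations, yet only the first pair is the "well behaved" linear case — which is why correlation skill alone can be a misleading summary of forecast performance.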

Page 33

§ A very long data set of seasonal hindcasts to study changes in predictability

§ Use of ECMWF’s re-analysis of the 20th Century (ERA-20C) that spans the 110-year period 1900 to 2010 to initialise atmospheric seasonal forecasts with ECMWF’s forecast model

§ SSTs and sea-ice are prescribed using HadISSTs

§ Seasonal re-forecast experiments over the period 1900-2010

§ Large ensemble of 51 perturbed members

§ Focus here: 4-month forecast initialised on 1st of Nov each year to cover boreal winter (DJF) season

§ More details in Weisheimer et al. (QJRMS 2017) and O’Reilly et al. (GRL 2017)

§ Coupled Seasonal Forecasts of the 20th Century (CSF-20C), see Weisheimer et al. (BAMS 2020)

Atmospheric Seasonal Forecasts of the 20th Century (ASF-20C)

Page 34

Z500 anomaly correlation skill

real-world skill perfect-model skill

System 4

SEAS5

ASF-20C

From Weisheimer et al. (2018)

Page 35

Multi-decadal variability in skill and predictability in ASF-20C

From Weisheimer et al. (2018)

r=0.31 (p<0.01)

RPC = 1.02

Page 36

RPC throughout the 20th Century

1912-1940

1942-1970

1981-2009

1900-2009

The relationship between predictable components in the real world and in the perfect model scenario has changed throughout the 20th Century

From Weisheimer et al. (2018)

Why …?

Page 37

References

Anscombe, F.J. 1973. Graphs in statistical analysis. The American Statistician, 27, 1, 17-21. See also https://en.wikipedia.org/wiki/Anscombe%27s_quartet.

Eade, R., D. Smith, A. Scaife, E. Wallace, N. Dunstone, L. Hermanson, and N. Robinson. 2014. Do seasonal-to-decadal climate predictions underestimate the predictability of the real world? Geophys. Res. Lett., 41, 5620–5628, doi:10.1002/2014GL061146.

Johnson, S.J., Stockdale, T. N., Ferranti, L., Balmaseda, M. A., Molteni, F., Magnusson, L., Tietsche, S., Decremer, D., Weisheimer, A., Balsamo, G., Keeley, S., Mogensen, K., Zuo, H., and Monge-Sanz, B. (2019). SEAS5: The new ECMWF seasonal forecast system. Geosci. Model Dev., 12, 1087–1117, doi:10.5194/gmd-12-1087-2019.

Kumar, A., A.G. Barnston and M.P. Hoerling. 2001. Seasonal predictions, probabilistic verifications, and ensemble size. J. Clim., 14, 1671-1676.

Kumar, A. 2009. Finite samples and uncertainty estimates for skill measures for seasonal prediction. Mon. Weather Rev., 137, 2622-2631.

O’Reilly, C. H., J. Heatley, D. MacLeod, A. Weisheimer, T. N. Palmer, N. Schaller, and T. Woollings. 2017. Variability in seasonal forecast skill of Northern Hemisphere winters over the twentieth century, Geophys. Res. Lett., 44, 5729–5738, doi:10.1002/2017GL073736.

O’Reilly, C.H., A. Weisheimer, T. Woollings, L. Gray and D. MacLeod. 2018b. The importance of stratospheric initial conditions for winter North Atlantic Oscillation predictability and implications for the signal-to-noise paradox. Q. J. R. Meteorol. Soc., doi:10.1002/qj.3413.

Palmer, T., R. Buizza, R. Hagedorn, A. Lawrence, M. Leutbecher and L. Smith (2006). Ensemble prediction: A pedagogical perspective. ECMWF Newsletter, 106, 10-17.

Page 38

References

Scaife, A.A., et al. (2014). Skilful long-range prediction of European and North American winters. Geophys. Res. Lett., 41, 2514-2519, doi:10.1002/2014GL059637.

Scaife, A.A. and D. Smith (2018). A signal-to-noise paradox in climate science. npj Climate and Atmospheric Science, 1, 28.

Shi, W., N. Schaller, D. MacLeod, T.N. Palmer and A. Weisheimer. 2015. Impact of hindcast length on estimates of seasonal climate predictability. Geophys. Res. Lett., 42, doi:10.1002/2014GL062829.

Siegert, S., D.B. Stephenson and P.G. Sansom (2016). A Bayesian framework for verification and recalibration of ensemble forecasts: How uncertain is NAO predictability? J. Clim., 29, 995-1012, doi: 10.1175/JCLI-D-15-0196.1

Stockdale, T.N., F. Molteni, and L. Ferranti. 2015. Atmospheric initial conditions and the predictability of the Arctic Oscillation, Geophys. Res. Lett., 42, 1173–1179, doi:10.1002/2014GL062681.

Stockdale, T., S. Johnson, L. Ferranti, M. Balmaseda and S. Briceag. 2018. ECMWF’s new long-range forecasting system SEAS5. ECMWF Newsletter, 154, 15-20.

Weisheimer, A., N. Schaller, C. O'Reilly, D. MacLeod and T.N. Palmer. 2017. Atmospheric seasonal forecasts of the 20th Century: multi-decadal variability in predictive skill of the winter North Atlantic Oscillation and their potential value for extreme event attribution. Q. J. R. Meteorol. Soc., 143, 917-926, doi:10.1002/qj.2976.

Weisheimer, A., D. Decremer, D. MacLeod, C. O'Reilly, T. Stockdale, S. Johnson and T.N. Palmer (2018). How confident are predictability estimates of the winter North Atlantic Oscillation? Q. J. R. Meteorol. Soc., doi:10.1002/qj.3446

Weisheimer, A., D. Befort, D. MacLeod, T.N. Palmer, C. O'Reilly and K. Strommen (2020). Seasonal forecasts of the 20th Century. Bull. Amer. Meteor. Soc., in press, doi:10.1175/BAMS-D-19-0019.1