Climatic Data Analysis and Diagnostics - CLIMsystems Ltddocuments.climsystems.com/services/CLIMsystems-data-analysis-s… · The Climatic Data Analysis and Diagnostics document is

Head Office 9 Achilles Rise Flagstaff Hamilton, New Zealand 3210 Phone: +64 7 834 2999 Mobile: +64 27 316 9777 Email: [email protected] Web: http://climsystems.com/

Climatic Data Analysis and Diagnostics

March 2014

CLIMsystems Ltd

By Dr. Chonhua Yin

Reviewed by Dr. Peter Urich


Introduction

The Climatic Data Analysis and Diagnostics document is one of a series of ‘living’ documents

developed by the staff at CLIMsystems to provide brief overviews of the range of experience held by

our extremely well trained and active staff.

The methods used to carry out climatic data analysis can be simple or very complex. Perhaps the most common statistic is the average of a variable (e.g., temperature or precipitation). However, knowing the average alone is not enough and can sometimes be misleading. For example, the average temperature may be consistent with previous time spans while the variance has changed in some 'significant' way. Other analyses, such as trend analysis and extreme value analysis (EVA), may therefore be more important.
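As a minimal illustration of why the average alone can mislead, the sketch below compares two periods that share the same mean but differ in variance. The temperature values are hypothetical and chosen only to make the point.

```python
from statistics import mean, pvariance

# Hypothetical annual mean temperatures (°C) for two periods (illustrative only).
earlier = [14.8, 15.0, 15.2, 15.1, 14.9]
later = [13.5, 16.5, 15.0, 14.0, 16.0]


def summarize(values):
    """Return (mean, population variance) of a sample."""
    return mean(values), pvariance(values)


mean_a, var_a = summarize(earlier)
mean_b, var_b = summarize(later)
print(f"earlier: mean={mean_a:.2f}, variance={var_a:.3f}")
print(f"later:   mean={mean_b:.2f}, variance={var_b:.3f}")
```

Both periods average 15 °C, yet the later period is far more variable, which a mean-only summary would hide.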

Classical statistical methods assume stationarity, which implies that the properties of a variable's distribution (e.g., mean and variance) do not vary with time and that there is no trend. This assumption is clearly violated under a changing (e.g., warming) climate. Specific methods such as detrending, bias correction and downscaling are therefore needed to deal with the issue. In addition to conventional statistics, there are 'diagnostics', which are used to assess the nature of climate variations on differing time scales.
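Detrending, mentioned above, can be sketched in its simplest form as removing a least-squares straight line from an evenly spaced series. This is only one possible approach, and the function name `linear_detrend` and the sample values are illustrative assumptions rather than the method used in SimCLIM.

```python
def linear_detrend(values):
    """Remove a least-squares linear trend from an evenly spaced series.

    Returns (slope per time step, residuals after subtracting the fitted line).
    """
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    residuals = [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(values)]
    return slope, residuals


# Hypothetical annual mean temperatures (°C) with a warming tendency.
series = [14.0, 14.3, 14.1, 14.6, 14.5, 14.9, 15.0, 15.2]
slope, residuals = linear_detrend(series)
print(f"trend: {slope:.3f} °C per year")
```

The residuals have zero mean and no linear trend, so they are closer to the stationarity assumption that classical methods require.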

Derivative climate metric

There is a general consensus within the climate community that any change in the frequency or

severity of extreme climate events would have profound impacts on nature and society. It is thus

very important to analyse extreme events. As more GCMs and their daily data become available in the CMIP5 archive, SimCLIM can derive more climate indices, which make climate data directly relevant to on-the-ground adaptation decision making.

Interpreting climate change is challenging without running sophisticated climate impact models. However, derivative climate metrics offer an easier and more intuitive way to interpret changes in daily climate data, and they can serve as surrogates for impacts on agriculture, water supply, flood risk, human health, energy demand and ecosystem resilience.

Methods for calculating derivative metrics

A suite of 27 core statistics is calculated from daily data (Tables 1 and 2). Most of these statistics have been widely used for many years in the climate community to characterize extreme events (e.g., Easterling et al., 2003). The calculation of each statistic is straightforward, and more complete descriptions can be found in the literature (e.g., Schulzweida et al., 2011; von Engelen et al., 2008).

Heating and cooling degree days were calculated using a base temperature of 18 °C (65 °F) following

the Encyclopedia of World Climatology (Oliver, 2005). Growing degree days were calculated using a

base temperature of 10 °C with no upper threshold used.
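The degree-day definitions above can be sketched as follows. This is a minimal example assuming daily mean temperature as input for all three metrics; the function name `degree_days` and the sample temperatures are illustrative, not part of SimCLIM.

```python
def degree_days(daily_mean_temps, base, mode):
    """Sum degree days over a period from daily mean temperatures (°C).

    mode="heating": sum of (base - T) on days colder than the base.
    mode="cooling" or "growing": sum of (T - base) on days warmer than the base.
    No upper threshold is applied.
    """
    if mode == "heating":
        return sum(max(base - t, 0.0) for t in daily_mean_temps)
    if mode in ("cooling", "growing"):
        return sum(max(t - base, 0.0) for t in daily_mean_temps)
    raise ValueError(f"unknown mode: {mode}")


temps = [5.0, 12.0, 18.0, 21.0, 25.0]      # hypothetical daily means
hdd = degree_days(temps, 18.0, "heating")  # 13 + 6 = 19
cdd = degree_days(temps, 18.0, "cooling")  # 3 + 7 = 10
gdd = degree_days(temps, 10.0, "growing")  # 2 + 8 + 11 + 15 = 36
print(hdd, cdd, gdd)
```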

Table 1: Temperature-based derivative climate metrics calculated from daily data (16 in total)


Long Name | Variables | Units | Description | Annual Monthly
Average Low Temperature | tasmin | °C | Monthly mean of daily minimum temperatures | yes
Average High Temperature | tasmax | °C | Monthly mean of daily maximum temperatures | yes
Daily Temperature Range | dtr | °C | Monthly mean difference between tasmax and tasmin | yes
Hottest Temperature | txx | °C | Maximum temperature for the month and year | yes yes
Coldest Temperature | tnn | °C | Minimum temperature for the month and year | yes yes
Hot Days Temperature | tx90 | °C | Maximum temperature exceeded on the hottest 10% of all days per year | yes
Number of Frost Days | fd | days | Frost days (min temperature lower than 0°C) | yes yes
Number of Summer Days | su | days | Annual count of days when daily tasmax is greater than 25°C | yes
Number of Warm Days | tx90p | % | Very warm days percent: percent of time that daily Tmax values exceed the reference period (1961-1990) 90th percentile Tmax | yes yes
Number of Cold Days | tx10p | % | Very cold days percent: percent of time that daily Tmax values are below the reference period (1961-1990) 10th percentile Tmax | yes yes
Number of Warm Nights | tn90p | % | Warm nights percent: percent of time that daily Tmin values exceed the reference period (1961-1990) 90th percentile Tmin | yes yes
Number of Cold Nights | tn10p | % | Cold nights percent: percent of time that daily Tmin values are below the reference period (1961-1990) 10th percentile Tmin | yes yes
Heat Wave Duration Index | hwdi | days | Heat wave duration index: number of days per year within intervals of at least 6 days of Tmax > (5°C + Tmax normal for historic period). Normal Tmax for the historic period is a 5-day running mean | yes
Growing Degree Days | gd10 | days | Growing degree days: for Tavg, sum of degrees > 10°C for each day, by month and year | yes yes
Heating Degree Days | hd18 | days | Heating degree days, calculated with 18°C base temperature, by month and year | yes yes
Cooling Degree Days | cd18 | days | Cooling degree days, calculated with 18°C base temperature, by month and year | yes yes


Table 2: Precipitation-based derivative climate metrics calculated from daily data (11 in total).

Long Name | Variables | Units | Description | Annual Monthly
Total Precipitation | pr | mm | Total precipitation for the month and year | yes yes
Consecutive Dry Days | cdd | days | Largest number of consecutive dry days (with daily pr<1mm) per year | yes
Number of Dry Periods | cdd5 | days | Number of consecutive dry day periods of length > 5 days, per year | yes
Number of Wet Days | r02 | days | Number of wet days (with precipitation > 0.2mm/day), per month and year | yes yes
Wet Days | r90p | % | Percent of wet days per year with rainfall>90-percentile wet-day precipitation, where percentiles are based on ref period 1961-1990. Only days with rainfall>1 mm are considered 'wet' | yes yes
Wet Day Rainfall | r90ptot | % | Precipitation percent per year due to days with precipitation>90-percentile reference period precipitation | yes yes
5 Day Rainfall | rx5d | mm | Maximum 5-day precipitation total per year | yes yes
Daily Rainfall | sdii | mm/day | Simple daily intensity index: the mean daily precipitation on 'wet' days (>1mm) | yes yes
1 Day Rainfall | rx1d | mm | Maximum 1-day precipitation total per year | yes yes
Maximum length of Dry Spell | cdd | days | Maximum number of consecutive days with daily precipitation <1mm | yes
Maximum length of Wet Spell | cwd | days | Maximum number of consecutive days with daily precipitation ≥1mm | yes
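Three of the precipitation metrics in Table 2 (cdd, sdii and rx5d) can be sketched as below. This is a minimal example using the thresholds stated in the table; the function names and the ten-day sample record are illustrative assumptions.

```python
def max_consecutive_dry_days(daily_pr, threshold=1.0):
    """cdd: longest run of days with precipitation below the threshold (mm)."""
    longest = current = 0
    for pr in daily_pr:
        current = current + 1 if pr < threshold else 0
        longest = max(longest, current)
    return longest


def simple_daily_intensity(daily_pr, threshold=1.0):
    """sdii: mean precipitation (mm/day) on 'wet' days above the threshold."""
    wet = [pr for pr in daily_pr if pr > threshold]
    return sum(wet) / len(wet) if wet else 0.0


def max_n_day_total(daily_pr, n=5):
    """rx5d for n=5: largest precipitation total over any n consecutive days."""
    if len(daily_pr) < n:
        raise ValueError("record is shorter than the window length")
    return max(sum(daily_pr[i:i + n]) for i in range(len(daily_pr) - n + 1))


# Hypothetical daily precipitation record (mm).
pr = [0.0, 0.5, 12.0, 3.0, 0.0, 0.0, 0.2, 8.0, 20.0, 0.0]
print(max_consecutive_dry_days(pr), simple_daily_intensity(pr), max_n_day_total(pr))
```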

Derivative Climate Metric Applications

To better understand the utility of these metrics, they have been classified by how they relate to specific real-world applications, such as assessing crop productivity, water supply, flood risk, human health, energy demand, and ecosystem resilience.

Crop productivity relies on many different climate factors including total precipitation, growing

degree days, dry days, and average low and high temperatures.


Water supply is focused on three precipitation variables: total precipitation—quantifying average

water input into the system; and two measures of dryness and drought conditions—consecutive dry

days and number of dry periods.

Flood risk is driven by rainfall average, measures of wet day rainfall and short term maximum rainfall

intensities.

Human health focuses solely on temperature stress (hot and cold) to people: hottest and coldest

single day temperature; number of warm days and cold nights; and the heat wave duration index.

Energy demand incorporates heating and cooling demand using heating and cooling degree days.

Ecosystem resilience to climate change is complex and so incorporates many different aspects

including total precipitation, dry conditions, extreme hot and cold temperatures, and growing

degree days.

Advanced Data Analysis and Diagnosis

Drought Indices - SPI and SPEI

The Standardized Precipitation Index (SPI) is a way of measuring drought that is different from the

Palmer drought index (PDI). Like the PDI, this index is negative for drought, and positive for wet

conditions. But the SPI is a probability index that considers only precipitation, while Palmer's indices

are water balance indices that consider water supply (precipitation), demand (evapotranspiration)

and loss (runoff).

The Standardized Precipitation Evapotranspiration Index (SPEI) is an extension of the widely used

Standardized Precipitation Index (SPI). The SPEI is designed to take into account both precipitation

and potential evapotranspiration (PET) in determining drought. Thus, unlike the SPI, the SPEI

captures the main impact of increased temperatures on water demand. Like the SPI, the SPEI can be

calculated on a range of timescales from 1-48 months. At longer timescales (>~18 months), the SPEI

has been shown to correlate with the self-calibrating PDSI (sc-PDSI).

If only limited data are available, say temperature and precipitation, PET can be estimated with the

simple Thornthwaite method. In this simplified approach, variables that can affect PET such as wind

speed, surface humidity and solar radiation are not accounted for. In cases where more data are

available, a more sophisticated method to calculate PET is often preferred in order to make a more

complete accounting of drought variability. However, these additional variables can have large

uncertainties.
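The Thornthwaite estimate mentioned above can be sketched as follows, using the standard published form of the equation: an annual heat index built from monthly mean temperatures, with a day-length and month-length correction. The function name, argument layout and sample inputs are illustrative assumptions, and months at or below 0 °C are assigned zero PET.

```python
def thornthwaite_pet(monthly_temps, day_lengths, days_in_month):
    """Monthly PET (mm) from the Thornthwaite equation.

    monthly_temps: 12 monthly mean temperatures (°C)
    day_lengths:   12 mean daylight lengths (hours)
    days_in_month: 12 day counts
    """
    # Annual heat index from months with positive mean temperature.
    heat_index = sum((t / 5.0) ** 1.514 for t in monthly_temps if t > 0)
    if heat_index == 0:
        return [0.0] * len(monthly_temps)
    a = (6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2
         + 1.792e-2 * heat_index + 0.49239)
    pet = []
    for t, hours, days in zip(monthly_temps, day_lengths, days_in_month):
        if t <= 0:
            pet.append(0.0)
        else:
            unadjusted = 16.0 * (10.0 * t / heat_index) ** a
            pet.append(unadjusted * (hours / 12.0) * (days / 30.0))
    return pet


# Hypothetical site: constant 20 °C, 12-hour days, 30-day months.
pet = thornthwaite_pet([20.0] * 12, [12.0] * 12, [30] * 12)
print(f"{pet[0]:.1f} mm per month")
```

Because only temperature and day length enter the calculation, the estimate ignores wind speed, humidity and radiation, as noted in the text.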


Drought/Extreme Sea Level Rise/Sand Storm/Hydrological Analysis Using Copulas

The return period of a given event is usually defined as the average time elapsing between two

successive realizations of the event itself. In applications, and especially in engineering, design work usually adopts the return period of a prescribed event as a common criterion for sizing structures: the return period provides a very simple yet efficient means of risk analysis because it condenses a large amount of information into a single number.

In most situations the analysis of the return period involves univariate cases; unfortunately, this may

lead to an over/underestimation of the risk associated with a given event. As a matter of fact, these

events are often characterized by the joint behaviour of several random variables (RV), and these

are usually non-independent. For instance, drought events are characterized by duration-magnitude-

intensity (Fig. 1).

As a consequence, the relevant events are better defined in terms of two or more variables. This complicates matters, since the family of events of interest grows with the number of variables. A possible way of investigating bivariate or trivariate data is to study the dependence function and the marginals separately. In this respect, copulas describe and model the dependence structure between random variables exactly, independently of the marginal laws involved.
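The difference between univariate and joint return periods can be sketched with a bivariate copula. The example below assumes a Gumbel-Hougaard copula with a hypothetical dependence parameter `theta` and a mean inter-arrival time of one year; in practice both would be estimated from the data, and the function names are illustrative.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel-Hougaard copula C(u, v), theta >= 1 (theta = 1 is independence)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def joint_return_periods(u, v, theta, mean_interarrival=1.0):
    """Return (T_and, T_or) for marginal non-exceedance probabilities u and v.

    T_and: both variables exceed their thresholds (e.g. D >= d and S >= s).
    T_or:  at least one variable exceeds its threshold (D >= d or S >= s).
    """
    c = gumbel_copula(u, v, theta)
    t_and = mean_interarrival / (1.0 - u - v + c)
    t_or = mean_interarrival / (1.0 - c)
    return t_and, t_or


# Duration and severity thresholds that each have a 10-year univariate return period.
t_and, t_or = joint_return_periods(0.9, 0.9, theta=2.0)
print(f"T_and = {t_and:.1f} years, T_or = {t_or:.1f} years")
```

The univariate return period (10 years here) lies between the two joint values, which is why a purely univariate analysis can over- or underestimate risk.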

Fig. 1 Schematic of joint distribution analysis based on copulas. Panels: return periods T(D>=d and S>=s) and T(D>=d or S>=s) from Gumbel; axes: severity and duration (month).

Multi-timescale extreme precipitation analysis

IDF stands for Intensity-Duration-Frequency. Intensity, duration and frequency are the parameters that make up the axes of the graph of IDF curves. An IDF curve is a tool that characterizes an area’s rainfall pattern. By analysing past rainfall events, statistics about rainfall recurrence can be determined for various standard return periods, for example, the size of the rainfall event that statistically occurs once every 10 years. Typically 2-, 5-, 10-, 25-, 50-, 500- and 1000-year return periods are shown on IDF curves.

Rainfall intensity in the IDF curve is the average rainfall depth that falls per time increment. Put simply, high rainfall intensity indicates that it is raining heavily and low intensity that it is raining lightly. Rainfall intensity is typically stated in mm/hr. Each line on the IDF graph represents a probability of occurrence: for example, the 10-year line represents rainfall events that are equalled or exceeded on average once every 10 years, i.e. with a probability of 1 in 10 in any given year. IDF curves are most often used for design. Municipalities and other approval agencies typically set out standards for the design of infrastructure that include minimum capacity in terms of rainfall return periods.

IDF curves are created by analysing years of rainfall records. The longer and more complete the

record, the better the quality of the statistical analysis. Long records of rainfall data are also less

likely to represent a short-term rainfall anomaly, for example, a decade of high precipitation that is

not representative of the long-term rainfall pattern of the region. The analysis of the rainfall record

usually begins with Cumulative Frequency Analysis, which requires advanced statistical skills.

CLIMsystems is experienced in carrying out IDF analysis using a dozen methods (e.g., Fig. 2).

Fig.2 Schematic of IDF analysis
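One simple way to build IDF values from a rainfall record can be sketched as follows: fit a Gumbel (EV1) distribution to the annual maximum intensities for each duration by the method of moments, then read off the intensity for each return period. This is only one of many possible methods and is not necessarily one used by CLIMsystems; the function names and the sample annual maxima are hypothetical.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649


def gumbel_return_level(annual_maxima, return_period):
    """Return level from a Gumbel (EV1) fit by the method of moments."""
    scale = stdev(annual_maxima) * math.sqrt(6.0) / math.pi
    location = mean(annual_maxima) - EULER_GAMMA * scale
    return location - scale * math.log(-math.log(1.0 - 1.0 / return_period))


def idf_table(maxima_by_duration, return_periods):
    """Map each duration (hours) to {return period (years): intensity (mm/hr)}."""
    return {
        duration: {T: gumbel_return_level(maxima, T) for T in return_periods}
        for duration, maxima in maxima_by_duration.items()
    }


# Hypothetical annual maximum intensities (mm/hr) for 1-hour and 24-hour durations.
maxima = {
    1: [22.0, 30.0, 18.0, 41.0, 27.0, 35.0, 25.0, 29.0],
    24: [2.1, 3.0, 1.8, 3.9, 2.6, 3.2, 2.4, 2.8],
}
table = idf_table(maxima, [2, 10, 50])
for duration, row in table.items():
    print(duration, {T: round(x, 1) for T, x in row.items()})
```

A record of only eight years, as in this toy example, is far too short for reliable estimates, which reflects the point above about long and complete records.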

Extreme Precipitation Event

CLIMsystems carried out an extreme precipitation event analysis of the seven day Boulder, Colorado

rainfall event that started on September 9, 2013, using the GEV tool in SimCLIM 2013. The analysis is

based on fitting the historical annual maximum values to the Generalized Extreme Value (GEV)

distribution. Future changes were also analysed based on the extreme precipitation change patterns

derived from the daily precipitation output of 22 IPCC AR5 GCMs in combination with an RCP8.5

scenario applying an ensemble pattern scaling approach.


The following figure (Fig. 3) shows annual maximum rainfall amounts compared with the corresponding precipitation return period estimates from 1 to 10,000 years for a duration of 7 days at the Boulder rain gauge (39.99 ˚N, 105.27 ˚W). The bounds of the 90% confidence interval are also shown to illustrate the uncertainty associated with the fitted GEV distribution, which increases as the return period becomes longer. The figure shows that the Colorado event that started on September 9, 2013 (red circle) corresponds to an extreme event with a return period of more than 2000 years. Moreover, the figure indicates that there might be more extreme events in the future. Such a change can be read from the figure in two directions (horizontal and vertical arrows). Reading horizontally, an event with a return period of 100 years under present climatic conditions would recur more frequently, with a return period of about 30 years under the 2100_RCP8.5 pattern. That is, it would occur on average once every 30 years. Reading vertically, the event could become more intense: its amount could increase from 220 mm to 300 mm.

Fig.3 Extreme rainfall event analysis
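The two ways of reading the figure correspond to two calculations on a fitted GEV distribution: the amount for a given return period, and the return period for a given amount. The sketch below assumes the location, scale and shape parameters are already known; the parameter values are hypothetical and are not the Boulder fit, and the function names are not part of SimCLIM.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Amount exceeded on average once every `return_period` years."""
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-9:  # Gumbel limit of the GEV
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


def gev_return_period(mu, sigma, xi, amount):
    """Return period (years) of `amount` under the fitted GEV."""
    if abs(xi) < 1e-9:
        cdf = math.exp(-math.exp(-(amount - mu) / sigma))
    else:
        z = 1.0 + xi * (amount - mu) / sigma
        if z <= 0:  # amount lies outside the support of the distribution
            return 1.0 if xi > 0 else math.inf
        cdf = math.exp(-z ** (-1.0 / xi))
    return math.inf if cdf >= 1.0 else 1.0 / (1.0 - cdf)


# Hypothetical parameters (mm) for present and future climates.
present = (60.0, 20.0, 0.1)
future = (70.0, 24.0, 0.1)

level_100 = gev_return_level(*present, 100)
# Vertical reading: how much larger is the 100-year amount in the future?
future_level_100 = gev_return_level(*future, 100)
# Horizontal reading: how often does today's 100-year amount recur in the future?
future_period = gev_return_period(*future, level_100)
print(round(level_100, 1), round(future_level_100, 1), round(future_period, 1))
```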

Wind Data Analysis

A wind rose is a graphic tool used by meteorologists to give a succinct view of how wind speed and

direction are typically distributed at a particular location. Presented in a circular format, the wind

rose shows the frequency of winds blowing FROM particular directions. The length of each "spoke"

around the circle is related to the frequency of time that the wind blows from a particular direction.

Each concentric circle represents a different frequency, emanating from zero at the centre to

increasing frequencies at the outer circles. The wind roses shown here contain additional

information, in that each spoke is broken down into discrete frequency categories that show the

percentage of time that winds blow from a particular direction and at certain speed ranges. All wind

roses shown here use 16 cardinal directions, such as north (N), NNE, NE, etc.

An example is shown in Fig. 4. It is an annual wind rose for a station in Australia, based on 23 years of hourly wind data (all hours of the day). The rose shows that there is no strongly prevailing wind at the station, i.e. no single direction from which most winds blow. Winds from around the south make up 12% of all hourly wind directions, slightly more than any other direction. The rose also shows that the wind rarely blows from the northeast or the southwest. Wind roses also provide details on speeds from different directions. Examining winds from the south (the longest spoke), one can determine that approximately 7% of the time the wind blows from the south at speeds below 10.0 m/s. Similarly, on this spoke it can be calculated that winds blow from the south at speeds between 10.0 and 15.0 m/s about 2% of the time (9%-7%), between 15.0 and 20.0 m/s about 2% of the time (11%-9%), and between 20.0 and 25.0 m/s about 1% of the time.

Fig. 4 Illustration of wind rose
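The frequencies plotted on a wind rose can be sketched as a table of percentages by direction sector and speed class. The example below assumes 16 sectors of 22.5° centred on north and the speed classes used in the example above; the function names and the four sample observations are illustrative.

```python
SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
SPEED_EDGES = [10.0, 15.0, 20.0, 25.0]  # m/s, lower edges of the upper classes


def sector_of(direction_deg):
    """Sector for the direction (degrees) the wind blows FROM."""
    return SECTORS[int(((direction_deg % 360.0) + 11.25) // 22.5) % 16]


def wind_rose_table(observations):
    """Percent of all observations per sector and speed class.

    observations: iterable of (direction_deg, speed_ms) pairs.
    Returns {sector: [pct <10, 10-15, 15-20, 20-25, >=25 m/s]}.
    """
    obs = list(observations)
    table = {s: [0.0] * (len(SPEED_EDGES) + 1) for s in SECTORS}
    for direction, speed in obs:
        speed_class = sum(speed >= edge for edge in SPEED_EDGES)
        table[sector_of(direction)][speed_class] += 100.0 / len(obs)
    return table


# Hypothetical hourly observations: (direction in degrees, speed in m/s).
rose = wind_rose_table([(180.0, 5.0), (185.0, 12.0), (0.0, 3.0), (90.0, 22.0)])
print(rose["S"], sum(rose["S"]))
```

The length of each spoke is the sum of a sector's row, and the coloured segments along the spoke are the individual speed classes.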

Seasonal forecast Indices

The extreme forecast index (EFI; Lalaurette, 2003; Zsoter, 2006), developed at ECMWF, is an example of an index designed to identify situations where the medium-range ensemble prediction system (EPS) forecasts are detecting extreme situations. Detection of extremes can be accomplished by comparing model forecasts to the underlying model climatology (Lalaurette, 2003; Alfieri et al., 2011). The major advantage of such an approach is that it can be applied everywhere, including in areas where observations are sparse or unavailable, and that it inherently accounts for the need for forecast calibration, as it is based on the relative difference between forecasts and model climatology. Dutra et al. (2013) applied the EFI calculation at the seasonal scale.

In the EFI scheme, the forecast probabilities are compared to the model climate (M-climate) distribution for the chosen location, time of year and lead time. The underlying assumption is that, if a forecast is anomalous or extreme relative to the M-climate, the real weather is also likely to be anomalous or extreme compared to the real climate. The M-climate is based on 1999-2010 hindcast climatology re-runs of the current CFSv2 model. If the CFSv2 probability distribution agrees with the M-climate distribution, then EFI = 0. If the probability distribution (mean, spread and asymmetry) does not agree with the M-climate probability distribution, the EFI takes non-zero values. In the special case where all the CFSv2 members forecast values above the absolute maximum in the M-climate, EFI = +1; if they all forecast values below the absolute minimum in the M-climate, EFI = -1.


Although higher EFI values indicate that an extreme event is more likely than usual, the values do not represent probabilities, as such. Any forecasts or warnings must be based on a careful study of probabilistic and deterministic information. Since potentially extreme situations (wind storms, for example) are characterized by high dynamical instability in the atmosphere and high EPS spread, EFI users should be aware that it is not uncommon for an extreme event to be preceded by wide-ranging shallow slope CDFs, yielding EFI values that are not particularly high. CDFs should be directly referenced.

Fig.5 Seasonal Forecast EFI
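The behaviour described above (EFI = 0 when forecast and M-climate agree, EFI = ±1 in the limiting cases) can be reproduced with the commonly used weighted form of the index, EFI = (2/π) ∫ (p − F_f(p)) / √(p(1 − p)) dp over p from 0 to 1, where F_f(p) is the fraction of ensemble members below the p-quantile of the M-climate. The sketch below is an illustration under that assumption, with a simple empirical quantile and hypothetical samples; it is not the operational ECMWF or CLIMsystems code.

```python
import math
from bisect import bisect_left


def extreme_forecast_index(forecast_members, climate_sample, steps=2000):
    """Numerically integrate the weighted EFI.

    The substitution p = sin(theta)**2 removes the singular weight, leaving
    EFI = (2/pi) * integral over [0, pi/2] of 2 * (p - F_f(p)) d(theta).
    """
    climate = sorted(climate_sample)
    members = sorted(forecast_members)
    n_clim, n_mem = len(climate), len(members)
    d_theta = (math.pi / 2.0) / steps
    total = 0.0
    for k in range(steps):
        p = math.sin((k + 0.5) * d_theta) ** 2
        # Simple empirical p-quantile of the M-climate sample.
        quantile = climate[min(int(p * n_clim), n_clim - 1)]
        # Fraction of forecast members strictly below that quantile.
        f_f = bisect_left(members, quantile) / n_mem
        total += 2.0 * (p - f_f) * d_theta
    return (2.0 / math.pi) * total


m_climate = [float(x) for x in range(100)]  # hypothetical M-climate sample
print(round(extreme_forecast_index([150.0, 160.0, 170.0], m_climate), 3))
print(round(extreme_forecast_index([-30.0, -20.0, -10.0], m_climate), 3))
```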

Long-Term Trend and Abrupt Change

Hypothesis testing techniques make it possible to investigate the long-term trend of major climate variables. Parametric tests are limited by assumptions such as normality and constant variance of the error terms. Nonparametric tests do not require these additional assumptions and are better suited to trend testing of hydro-meteorological time series.

The possible trends of annual and monthly climatic time series are detected using a non-parametric method, and abrupt changes are examined using:

Moving T-test (MTT) method,
Yamamoto method,
Pettitt method,
Cramer method,
Lepage method,
Mann-Kendall method and
Hurst exponent.

Fig.6 Schematic of Mann-Kendall abrupt trend diagnosis
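As an example of a non-parametric trend test, the basic Mann-Kendall test can be sketched as follows. This minimal version omits the correction for tied values and is the plain trend test, not the sequential variant used for abrupt-change diagnosis; the function name and the sample series are illustrative.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, two-sided p-value). Positive S indicates an upward trend.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p_value


# Hypothetical annual mean temperatures (°C).
series = [14.0, 14.3, 14.1, 14.6, 14.5, 14.9, 15.0, 15.2, 15.1, 15.5]
s, z, p = mann_kendall(series)
print(s, round(z, 2), round(p, 4))
```

Because the test uses only the signs of pairwise differences, it needs no assumption of normality or constant variance.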

References

Alfieri, L, Velasco, D, Thielen, J. (2011). Flash flood detection through a multi-stage probabilistic warning system for heavy precipitation events. Advances in Geosciences 29: 69–75, DOI: 10.5194/adgeo-29-69-2011, www.adv-geosci.net/29/69/2011/.

Dutra, E., Diamantakis, M., Tsonevsky, I., Zsoter, E., Wetterhall, F., Stockdale, T., Richardson, D. and Pappenberger, F. (2013). The extreme forecast index at the seasonal scale. Atmospheric Science Letters, 14(4), 256-262.

Easterling, D. R., L. V. Alexander, A. Mokssit, and V. Detemmerman, 2003. CCI/CLIVAR Workshop to Develop Priority Climate Indices, Bull. Am. Met. Soc., 84(10), 1403-1407.

Lalaurette F. (2003). Early detection of abnormal weather conditions using a probabilistic extreme forecast index. Quarterly Journal of The Royal Meteorological Society 129(594): 3037–3057.

Zsoter E. (2006). Recent developments in extreme weather forecasting. ECMWF Newsletter 107(107): 8–17.

Oliver, J. E, 2005. Encyclopedia of World Climatology. Springer. The Netherlands. 854 pp.

Schulzweida, U., L. Kornblueh, and R. Quast, 2011. CDO User’s Guide, Climate Data Operators, Version 1.5.0, MPI for Meteorology, March 2011, 183 pp.

von Engelen, A., A. Klein Tank, G. ven de Schrier, and L. Klok, 2008. Towards an operational system for assessing observed changes in climate extremes: European Climate Assessment & Dataset (ECA&D) Rep., 40 pp, KNMI, The Netherlands.
