32
New techniques for climate data homogenization New techniques for climate data homogenization Xiaolan L. Wang Xiaolan L. Wang Climate Research Division, ASTD, STB, Environment Canada Climate Research Division, ASTD, STB, Environment Canada CMC Seminar, 19 September 2008 CMC Seminar, 19 September 2008

New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

New techniques for climate data homogenizationNew techniques for climate data homogenization

Xiaolan L. WangXiaolan L. Wang

Climate Research Division, ASTD, STB, Environment CanadaClimate Research Division, ASTD, STB, Environment Canada

CMC Seminar, 19 September 2008CMC Seminar, 19 September 2008

Page 2: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

-6

-4

-2

0

2

4

6

2000199519901985198019751970196519601955

Mon

thly

win

d sp

eed

anom

alie

s (k

m/h

r)

Time of observation

adjustedraw

Monthly mean surface wind speed series (Nanaimo, BC)

different signs!-0.0015100.002231

-8

-6

-4

-2

0

2

4

6

199519901985198019751970196519601955

Tem

pera

ture

ano

mal

ies

(deg

ree

C)

Time of obervation

-0.000324-0.001287

De-seasonalized

monthly mean series of daily max. temperatures (Collegeville, NS)1. Temperature:

2. Wind speed:

-1.54°CEffects of mean-shifton trendestimation

Mean, var., &extreme indices

biased!

Some examples of discontinuity in climate data series

Page 3: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

3. Precipitation -

discontinuity due to joining of observations from nearby stations

Annual total rainfall for Moose Jaw Sask

0

100

200

300

400

500

600

700

1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000

mm

Moose Jaw Chab Moose Jaw A Moose Jaw CS

Use caution before adjusting to short interval

Courtesy of Lucie Vincent and Eva Mekis

See Vincent and Mekis

(2008), Discontinuities due to joining precipitation station observations in Canada,

J. App. Meteor. Climatol., in press.

Page 4: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Courtesy of Lucie Vincent

See Vincent et al. (2007), Surface temperature and humidity trends in Canada for 1953-2005. J. Clim., 20, 5100-5113.

4. Relative humidity –

discontinuity due to introduction of dewcel

in June 1978

60

65

70

75

80

85

90

1950 1960 1970 1980 1990 2000 2010

%

60

65

70

75

80

85

90

1950 1960 1970 1980 1990 2000 2010

%

60

65

70

75

80

85

90

1950 1960 1970 1980 1990 2000 2010

%

60

65

70

75

80

85

90

1950 1960 1970 1980 1990 2000 2010

%

Original valuesAdjusted values

Kuujjuaq, Québec

Winter Spring

Summer Fall

step= -8.0% step= -7.1%

step= -2.8%

step= -3.3%

Page 5: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Fog occurrence

5. Fog & low ceiling occurrence frequency -

Effects of mean-shift on trend estimation

Fog occurrence

occurrence offrequency theis

month in soccurrence ofnumber theis month in nsobservatio ofnumber theis

5.05.0

log odds Log

t

tt

t

t

tt

tt

MS

f

tStM

SMS

=

⎟⎟⎠

⎞⎜⎜⎝

⎛+−

+=η

Low ceiling occurrence

Page 6: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

6. Cloudiness frequency data

Clear sky occurrence

Overcast occurrence

Page 7: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

-

examples of

hourly

station pressure values

900

1000

1100

1200

1300

2002200120001999199819971996199519941993

pres

sure

(hPa

)

Year

P0Pz

Cape Hooper, NULong run of obviously wrong station pressure values

(with some correct ones in between)

Δ

= 294.6

hPa

920

960

1000

1040

Nov. 2000 Apr. 2001 Jan. 2002

pres

sure

(hPa

)

Year

PzP0

Dease Lake LWIS, BC

Long run of obviously wrong station pressure values (with a few correct ones in between)Δ

= 91.4

hPa

Dashed lines -

corrected values

-20

-15

-10

-5

0

5

10

15

199519901985198019751970

Mon

thly

pre

ssur

e an

omal

ies

(hPa

)

Time of obervation

0.009574-0.002386

De-seasonalized

monthly

mean station pressure series (Burgeo, NFLD)

Ignore 10.6 m elev.

7. Pressure data:

-2.956 hPa

Relatively small shift but it changes the sign

of trend estimate

Huge errors!

SLPStation pressure

Station pressureSLP

Page 8: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

= 3.4 hPa

940

960

980

1000

1020

19901985198019751970

Pz

Year

Lytton, BC

Original Corrected

Rz

series of original pressure data

Rz

series of corrected pressure data

27.4

m

960

990

1020

1970 1975 1980 1985 1990

P z(hPa

) ∆

= 3.4 hPa

- Another problematic hourly pressure series (relocation with elev. drop of 27.4 m)

Periodic feature -

reflects a problem in sealevel pressure reduction for elevated sites

It affects results of extreme (e.g., storminess) analysis!

We use a physical equation combined with a statistical method to tackle this kind of problems

See Wan et al. (2007), A Quality Assurance System for Canadian Hourly Pressure Data.

J. App. Meteor. Climatol., 46 (No. 11), 1804-1817.

Page 9: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Data discontinuities are inevitable due to inevitable changes in

observing instrument/location/environment… and evolving technology

Huge amount of $ spent on collecting climate data

big waste of $ if there were no effort to clean up the data

more and more effort devoted to climate data homogenization

Such discontinuities not only affect trend

assessment, but also affect

-

calibration of statistical relationships

for use in statistical weather forecast,

-

estimates of other statistics

that are used to study

climate variability

and

climate extremes, and to validate

model simulations

-

many other aspects of our study and understanding of the climate system

Page 10: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Special cases:

(a trend-change without anaccompanying mean-shift)

Data Homogenization Tools:1. Use physical relationships to correct some known

shifts

2. Use statistical methods to detect shifts

and estimate their magnitudeMetadata

e.g., hydrostatic equation to adjust pressure data for elevation

changes,log wind profile for anemometer height adjustments for wind speeds

2) TPR3 test for detecting mean-shifts

in constant trend

series (Wang 2003):

3) TPR4 test for detecting mean-shifts and/or

trend-changes

(Lund & Reeves 2002):

1) SNH test for detecting mean-shifts

in zero-trend

series (Alexandersson

1986):

Most commonly used statistical methods include

e.g., urbanization effects on T

Use a variant of TPR3!

changepoint

Page 11: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Problems/drawbacks of these changepoint detection methods:

Our recent studiesOur recent studies1. Propose two penalized tests to 1. Propose two penalized tests to even outeven out distributiondistribution of false alarm rate & detection powerof false alarm rate & detection power2. Extend these penalized tests to 2. Extend these penalized tests to account foraccount for the first order the first order autocorrelationautocorrelation3. Propose a stepwise testing algorithm for detecting 3. Propose a stepwise testing algorithm for detecting multiple changepointsmultiple changepointsNow, Now, developingdeveloping methods to deal with nonmethods to deal with non--Gaussian data (e.g. daily Gaussian data (e.g. daily precipprecip.),.), and and

multimulti--categorical data (e.g., frequencies of cloudiness conditions)categorical data (e.g., frequencies of cloudiness conditions)

1. Uneven distribution

of false alarm rate and detection power (details next)2. Model errors are assumed IID Gaussian3. For a series that contains at most one changepoint

-

The RHtestV2 software package-

Examples of application

-

The new methods and their detection power aspects (vs. those of the old methods)-

The uneven distribution problem of the old methods

The remainder of this presentation -

outline

Detection of a changepoint in a homogeneous series

Page 12: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0.55

0.6

0 50 100 150 200 250 300 350 400 450 500

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 150N = 200N = 300N = 500

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0.11

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

Fals

e al

arm

rat

e (F

AR

)Changepoint position k

N = 8N = 10N = 15N = 20N = 25N = 30N = 40

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 50N = 60N = 70N = 80N = 90N = 100

αα

α

SNH test :(or its equivalentmaximal t test)

SNH test: U-shape FAR(k) curves!

SNH & TPR3 suffer from the effect of unequal sample sizes

(N1

= k

N2

= N-k, generally)FAR(k) =αe(k) ≇α, pe(k) ≇ p !

Problems!

TPR3: W-shape FAR(k) curves!

0.03

0.04

0.05

0.06

0.07

0.08

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 10N = 15N = 20N = 25N = 30N = 40

α

0.03

0.05

0.07

0.09

0.11

0.13

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 50N = 60N = 70N = 80N = 90N = 100

α

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0 50 100 150 200 250 300 350 400 450 500

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 150N = 200N = 250N = 300N = 400N = 500

α

Smaller effectsThe effective level of confidence on the identified changepoints that are

near the ends of series is much lower than the specified level p = (1 - α)

Not all thus-identified changepoints are actually significant at the nominal level,

while some actually-significant changepoints might go undetected – cheating!

Detection power depends on changepoint location (near the ends or middle of the series);

it has a much higher chance to give you a false alarm near the ends of a homogeneous series

Page 13: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

1.05

1.1

0 50 100 150 200 250 300 350 400 450 500

Scal

ing

fact

or P

k

Changepoint position k

N = 150N = 200N = 300N = 500

0.8

0.85

0.9

0.95

1

1.05

1.1

0 5 10 15 20 25 30 35 40

Scal

ing

fact

or P

k

Changepoint position k

N = 8N = 10N = 15N = 20N = 25N = 30N = 40

0.8

0.85

0.9

0.95

1

1.05

1.1

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

Scal

ing

fact

or P

k

Changepoint position k

N = 50N = 60N = 70N = 80N = 90N = 100

Wang et al. (2007) propose a Penalized Maximal t (PMT) testto even out the U-shaped FAR(k) curves of the SNH type tests:

[ ]

⎟⎟⎠

⎞⎜⎜⎝

⎛−+−

−=

−==

−⎟⎠⎞

⎜⎝⎛ −

=

=

∑ ∑

∑∑

= +=

+==

−≤≤

k

t

N

ktttk

N

ktt

k

tt

k

Nk

XXXXN

XkN

XXk

X

XXN

kNkkT

kTkPPT

1 1

22

21

2

12

11

2121

11max

)()(2

1 ;1

;)(ˆ1)( where

)( )(max

σ

σ

∩-shape penalty P(k):(thick curves)

)05.0())((

max

maxT

kTR e

=☓, ╋, □, ○, …:

Page 14: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0.11

0.12

0.13

0.14

0.15

0.16

0.17

0.18

0.19

0.2

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 50

SNHTPMT

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 100

SNHTPMT

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0 5 10 15 20 25 30 35 40 45 50 55 60

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 60

SNHTPMT

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 80

SNHTPMT

0.01

0.05

0.09

0.13

0.17

0.21

0.25

0.29

0.33

0.37

0.41

0.45

0.49

0.53

0 40 80 120 160 200 240 280 320 360 400 440 480

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 500

SNHTPMT

0.01

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0.21

0.23

0.25

0 20 40 60 80 100 120 140

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 150

SNHTPMT

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0.21

0.23

0.25

0.27

0.29

0 20 40 60 80 100 120 140 160 180 200

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 200

SNHTPMT

0.01

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0.21

0.23

0.25

0.27

0.29

0.31

0.33

0.35

0.37

0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 300

SNHTPMT

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0.11

0.12

0 2 4 6 8 10 12 14 16 18 20Fa

lse

alar

m r

ate

(FA

R)

Changepoint position k

N = 20

SNHTPMT

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0.11

0.12

0 2 4 6 8 10

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 10

SNHTPMT

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0.11

0.12

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 30

SNHTPMT

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0.11

0.12

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 40

SNHTPMT

FAR(k): SNH test vs

PMT test

PMT makes FAR(k) =αe(k) ≋α, p e(k

) ≋ p !

Page 15: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

0.98

1

1.02

1.04

1.06

1.08

1.1

1.12

1.14

1.16

0 50 100 150 200 250 300 350 400 450 500

Rat

io o

f m

ean

sign

ific

ance

rat

e

Time series length

RSS = 0.25RSS = 0.5 RSS = 1.0 RSS = 1.5 RSS = 2.0

0.96

0.98

1

1.02

1.04

1.06

1.08

1.1

1.12

1.14

1.16

1.18

1.2

1.22

1.24

1.26

0 50 100 150 200 250 300 350 400 450 500

Rat

io o

f m

ean

hit r

ate

Time series length

RSS = 0.25RSS = 0.5 RSS = 1.0 RSS = 1.5 RSS = 2.0

0.98

1

1.02

1.04

1.06

1.08

1.1

0 50 100 150 200 250 300 350 400 450 500

Rat

io o

f m

ean

posi

tion

rate

Time series length

RSS = 0.25RSS = 0.5 RSS = 1.0 RSS = 1.5 RSS = 2.0

Power of detection: PMT vs

SNH(real mean-shift Δ

at t = k)σ/Δ=RSS

Mean

Hit Rate:(PMT/SNH)

Mean

Position Rate:(PMT/SNH)

Mean

Significance Rate:(PMT/SNH)

Detect it in [k-2, k+2]& with p ≥

0.95

Detect it in [k-2, k+2]regardless of p

Detect it with p ≥

0.95regardless of position

averaged over2 ≤

k

N-2

Each value from 1000 simulations

Page 16: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Wang (2008a) proposes a Penalized Maximal F test (PMF testto even out the W-shaped FAR(k) curves of TPR3 test

[ ]

∑ ∑

= +=

=

−≤≤

−−+−−=

−−=

−−

=

=

k

t

N

ktttA

N

tt

A

A

fNk

tXtXSSE

tXSSE

NSSESSESSEkF

kFkPPF

1 1

22

21

1

2000

0

11max

)ˆˆ()ˆˆ(

;)ˆˆ(

;)3/(

1/)()( where

)( )(max

βμβμ

βμ

M-shape penalty Pf (k):(thick curves)

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

1.05

0 50 100 150 200 250 300 350 400 450 500

Scal

ing

fact

or P

k

Changepoint position k

N = 150N = 200N = 300N = 400N = 500

0.75

0.8

0.85

0.9

0.95

1

1.05

0 5 10 15 20 25 30 35 40

Scal

ing

fact

or P

k

Changepoint position k

N = 15N = 20N = 25N = 30N = 40

0.75

0.8

0.85

0.9

0.95

1

1.05

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

Scal

ing

fact

or P

k

Changepoint position k

N = 50N = 60N = 70N = 80N = 90N = 100

)05.0())((

max

maxF

kFR e

=☓, ╋, □, ○, …:

Page 17: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 40

TPR3PMFT

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0 2 4 6 8 10 12 14

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 15

TPR3PMFT

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0 2 4 6 8 10 12 14 16 18 20

Fals

e al

arm

rat

e (F

AR

)Changepoint position k

N = 20

TPR3PMFT

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 30

TPR3PMFT

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.1

0.11

0.12

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 50

TPR3PMFT

0.03

0.05

0.07

0.09

0.11

0 5 10 15 20 25 30 35 40 45 50 55 60

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 60

TPR3PMFT

0.03

0.05

0.07

0.09

0.11

0.13

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 80

TPR3PMFT

0.03

0.05

0.07

0.09

0.11

0.13

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 100

TPR3PMFT

0.01

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0 20 40 60 80 100 120 140

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 150

TPR3PMFT

0.01

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0 20 40 60 80 100 120 140 160 180 200

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 200

TPR3PMFT

0.01

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0.21

0.23

0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 300

TPR3PMFT

0.01

0.03

0.05

0.07

0.09

0.11

0.13

0.15

0.17

0.19

0.21

0.23

0.25

0.27

0.29

0 40 80 120 160 200 240 280 320 360 400

Fals

e al

arm

rat

e (F

AR

)

Changepoint position k

N = 400

TPR3PMFT

FAR(k): TPR3 vs

PMF test

PMF test makes FAR(k) =αe(k) ≋α, p e(k

) ≋ p !

Page 18: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Power of detection: PMF vs

TPR3(real mean-shift Δ

at t = k)σ/Δ=RSS

Mean

Hit Rate:(PMF/TPR3)

Mean

Position Rate:(PMF/TPR3)

Mean

Significance Rate:(PMF/TPR3)

Detect it in [k-3, k+3]& with p ≥

0.95

Detect it in [k-3, k+3]regardless of p

Detect it with p ≥

0.95regardless of position

averaged over2 ≤

k

N-2

Each value from 10,000 simulations

0.98

1

1.02

1.04

1.06

1.08

1.1

0 50 100 150 200 250 300 350 400 450 500 550 600 650 700

Rat

io o

f m

ean

sign

ific

ance

rat

e

Time series length

RSS = 0.25RSS = 0.5 RSS = 1.0 RSS = 1.5 RSS = 2.0

N

1

1.02

1.04

1.06

0 50 100 150 200 250 300 350 400 450 500 550 600 650 700

Rat

io o

f m

ean

posi

tion

rate

Time series length

RSS = 0.25RSS = 0.5 RSS = 1.0 RSS = 1.5 RSS = 2.0

N

1

1.02

1.04

1.06

1.08

1.1

1.12

1.14

0 50 100 150 200 250 300 350 400 450 500 550 600 650 700

Rat

io o

f m

ean

hit r

ate

Time series length

RSS = 0.25RSS = 0.5 RSS = 1.0 RSS = 1.5 RSS = 2.0

N

Page 19: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

PMT and PMF tests: still assume IID Gaussian errors!

Autocorrelation and Periodicity-

inherent features of most climate data series

Common practice:

-

use ref series to diminish periodicity (and trend)

> problem: ref series are not always available or homogeneous!

-

apply tests to annual series to reduce autocorr.

(usually autocorr. can not be diminished by using ref series! example)

sample size: only N/12

diff. shift-years identified in diff. month series for the same shift!

Some people ignore the effects, unaware of them

Page 20: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Consider AR(1) errors:

ttt z+= −1φεε

IID Gaussian noise(constant var.)

2

2.5

3

3.5

4

4.5

5

5.5

6

6.5

7

7.5

8

8.5

9

9.5

10

10.5

11

11.5

12

12.5

13

13.5

14

14.5

15

15.5

16

16.5

17

17.5

18

18.5

19

19.5

-0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

95-t

h pe

rcen

tiles

of

PTm

ax

AR(1) autocorrelation

N=10

N=15

N=20

N=25

N=30

N=35

N=40

N=45

N=50

N=55

N=60

N=65

N=1200

),(),0( max,max, NPTNPT φαα <

Ignore apositive autocorr.

increase # offalse alarms!

),(),0( max,max, NPTNPT φαα >

Ignore a negative

autocorr.let changepoints

go undetected!

PTmax

percentiles as a function of autocor. Φ

for each N

Effects of autocor. Φ

on PTmax

Each value estimated from 10 million simulations

Page 21: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

2

2.2

2.4

2.6

2.8

3

3.2

3.4

3.6

3.8

4

4.2

4.4

4.6

4.8

5

5.2

5.4

5.6

5.8

6

6.2

6.4

6.6

6.8

7

7.2

7.4

7.6

7.8

8

8.2

8.4

8.6

8.8

9

9.2

9.4

9.6

0 60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200

95-t

h pe

rcen

tiles

of

PTm

ax

Sample size (series length) N

-0.200-0.150-0.100

-0.050

0 0.025

0.0500.075

0.1000.125

0.1500.175

0.2000.225

0.2500.275

0.3000.325

0.3500.375

0.4000.4250.450

0.475

0.500

0.525

0.550

0.575

0.600

0.625

0.650

0.675

0.700

0.725

0.750

0.775

0.800

PTmax percentiles as a function of sample size N for each φ

The PMTredalgorithm

(RHtestV2)

),(max, NPT φα

Table of critical values

)],ˆ( ),,ˆ([ ),ˆ(

series residual model full thefrom ]ˆ,ˆ[ ˆ

maxmaxmax NPTNPTNPT UL

UL

φφφ

φφφ

Each value estimated from 10 million simulations

Page 22: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Effects of autocor. Φ

on PFmax

Consider AR(1) errors:

ttt z+= −1φεε

IID Gaussian noise(constant var.)

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

105

110

115

120

125

130

135

140

145

150

155

160

165

170

175

180

185

190

195

200

205

210

-0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

95-t

h pe

rcen

tiles

of

PFm

ax

AR(1) autocorrelation

N=20

N=25

N=30

N=35

N=40

N=45

N=50

N=60

N=70

N=80

N=90

N=100N=1200

),(),0( max,max, NPFNPF φαα >

Ignore negative

autocorr.let changepoints

go undetected!

PFmax

percentiles as a function of autocor. Φ

),(),0( max,max, NPFNPF φαα <

Ignore apositive autocorr.

increase # offalse alarms!

Each value estimated from 10 million simulations

Page 23: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

4

6

8

10

12

14

16

18

20

22

24

26

28

30

32

34

36

38

40

42

44

46

48

50

52

54

56

58

60

62

64

66

68

70

72

74

76

78

80

82

84

86

88

90

92

94

96

98

0 60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200

95-t

h pe

rcen

tiles

of

PFm

ax

Sample size (series length) N

-0.200-0.150-0.100-0.0500 0.0250.0500.075

0.1000.125

0.1500.175

0.2000.225

0.2500.275

0.3000.325

0.3500.375

0.400

0.425

0.450

0.475

0.500

0.525

0.550

0.575

0.600

0.625

0.650

0.675

0.700

0.725

0.750

0.775

0.800

Table of critical values

PFmax

percentiles as a function of sample size N

for each ɸ

The PMFredalgorithm

(RHtestV2)

),(max, NPF φα

)],ˆ( ),,ˆ([ ),ˆ(

series residual model full thefrom ]ˆ,ˆ[ ̂

maxmaxmax NPFNPFNPF UL

UL

φφφ

φφφ

Each value estimated from 10 million simulations

Page 24: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

PMTred algorithm (for zero-trend series): PMFred algorithm (for constant trend series):

# of shifts

Random order

Power aspects: False Alarm Rate (FAR) and Hit Rate (HR; k±18)

FAR ⋍α=0.05, regardless of ɸ value

Ignore positive ɸ FAR » α

Power of accurate detection is satisfactory, Nc = 1 or Nc > 1!

φ̂in y uncertaint - rangey uncertaint 95%

(Each value estimated from 10,000 simulations)

Power depends on shift magnitude &

average segment length

No shift

Singleshift

Multipleshifts;AR(1)errors

IID

AR(1)

A large sample size is needed for a reasonably accurate estimate of ɸ!

Page 25: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

The RHtestV2 software packageThe RHtestV2 software package --

recognized by WMO at COP-13 (Bali, Dec. 2007)

(R & FORTRAN)(R & FORTRAN)

1. PMT1. PMTredred

algorithm algorithm --

for zerofor zero--trend series with IID or AR(1) Gaussian errors trend series with IID or AR(1) Gaussian errors --

for use with reference seriesfor use with reference series

--

FindU.wRefFindU.wRef, , FindUD.wRefFindUD.wRef, , StepSize.wRefStepSize.wRef

2. PMF2. PMFredred

algorithmalgorithm

--

for constant trend series with IID or AR(1) Gaussian errors for constant trend series with IID or AR(1) Gaussian errors --

can be used can be used withoutwithout

reference seriesreference series

--

FindUFindU, , FindUDFindUD, , StepSizeStepSize

Page 26: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Input data format for the RHtestV2(Annual series) OR

(Monthly series)

OR

(Daily series)

1974 00 00

-4.87

1967 7 00 1015.70

1994 1 27 8.11975 00 00

-3.88

1967 8 00 1015.95

1994 1 28 5.3 1976 00 00

-5.65

1967 9 00 1016.10

1994 1 29 4.91977 00 00

-3.34

1967 10 00 -999.99

1994 1 30 4.91978 00 00

-5.72

1967 11 00 1010.71

1994 1 31 4.01979 00 00

-4.08

1967 12 00 1011.58

1994 2 1 3.91980 00 00

-4.97

1968 1 00 1009.37

1994 2 2 7.21981 00 00

-2.58

1968 2 00 1003.07

1994 2 3 8.71982 00 00

-4.48

1968 3 00 1011.94

1994 2 4 6.31983 00 00

-999.

1968 4 00 1014.74

1994 2 5 -999.1984 00 00

-3.40

1968 5 00 1009.59

1994 2 6 -999.1985 00 00

-4.58

1968 6 00 1011.77

1994 2 7 -999.1986 00 00

-4.25

1968 7 00 1014.35

1994 2 8 -999. 1987 00 00

-2.25

1968 8 00 1010.87

1994 2 9 9.01988 00 00

-3.94

1968 9 00 1016.45

1994 2 10 6.0…

YYYY MM DD data YYYY MM DD data

MissingValuecode

Missingvaluecode

A Ref series should be in a separate file of the same format!Base and Ref series

can have data for different periods or missing valuesat different times, but they must have the same missing value code!

YYYY MM DD data

MissingValuecode

Page 27: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Stepwise testing algorithm-

Possible

multiple change-points in one single time series

-

Most methods developed assuming at most one change-point

1st change-point: might be false or inaccurate (“contaminated”)

NC0

1st

check

S1 S2

C03C01

2nd

check

[1] Find the most probable changepoint in the series being tested:

S43rd

check S5

S3

=S4

+S5(re-assessment of change-point C0 C02)

Is the largest among C01

, C02

, C03

significant at the nominal level α?No (End!)Yes, …

Page 28: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

[3] Estimate the significance of each changepoint Ki in the presence of{Cj

} (j=1,2,…Nc

) [4] Find the most significant

one among {Ki } (i=1,2,…Nc

+1) . Is it significant at the nominal level α?

series whitened thefrom ˆ and ,),ˆ(or ),ˆ( ,ˆ max,max, pNPFNPT ii φφφ αα

[1] Find the most probable changepoint in the series being tested C1

Stepwise testing algorithm for detecting multiple changepoints

[2] Mc =Nc ≥1 changepoints identified so far:

{Cj

} (j=1,2…,Nc ) Nc+1 segments. Identify the most probable changepoint in each segment {Ki } (i=1,2,…Nc+1)

No (Nc=Mc

)

[5] Find the smallest

among the Nc

shifts, re-assess its significance in thepresence of others. Is it significant at the nomial

level α?

Yes, repeat [2]-[4]

[6] output the list {Cj

} (j=1,2,…Nc

) andobtain the final estimates of -

shift significance-

shift size (Nc+1)-phase model fit

Yes

Delete it from the list {Cj

} (j=1,2,…Nc

),set Nc

= Nc

-1, re-assess significanceof the remaining shifts, and repeat [5]

No

add it to list {Cj

} (j=1,2,…Nc

), anddelete it from list {Ki } (i=1,2,…Nc

+1); then, set Nc

=Nc

+1, & repeat [3]-[4]

YesIs Nc

> Mc

?No

Main points:(1) Significant changepoints are added to

the list of changepoints, one by one:the most

significant, the next most…2) Significance of a changepoint is always

assessed in the presence of others listed,and re-assessed if the list changes.

(3) In the resulting list, each changepoint is significant in the presence of all others in the list.

(4) Final estimates (Nc+1)-phase model fitto de-seasonalized

base series, andto the base-minus-reference series (β≡0)if a reference series is used

Page 29: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Examples of applicationExamples of application 1. Single changepoint case1. Single changepoint case

--

PMTPMTredred BaseBase--RefRef seriesseries

((BurgeoBurgeo--Yarmouth)Yarmouth)

Burgeo

monthly station pressure series

-20

-15

-10

-5

0

5

10

15

199519901985198019751970

Mon

thly

pre

ssur

e an

omal

ies

(hPa

) De-seasonalized

pressure series

990

995

1000

1005

1010

1015

1020

1025

199519901985198019751970M

onth

ly p

ress

ure

anom

alie

s (h

Pa)

detectedmetadata

Actual shift: Dec. 1976 (10.6m 0)Detected shift: Oct. 1976

990

995

1000

1005

1010

1015

1020

1025

199519901985198019751970

Mon

thly

mea

n pr

essu

res

(hPa

)

Adjusted pressure series

0.117] [-0.098, 0.010ˆmonthper hPa 009574.0ˆ

=

=

φ

β

-2.956 hPa

Plots like theseare included

in the RHtestV2 output

Page 30: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Examples of applicationExamples of application 2. Two changepoints2. Two changepoints

--

PMFPMFredred dede--seasonalizedseasonalizedtemp. seriestemp. series

(shifts & seasonal means(shifts & seasonal meansestimated via iteration)estimated via iteration)

-30

-20

-10

0

10

20

19951990198519801975197019651960195519501945194019351930192519201915

Tem

pera

ture

(de

gree

C)

Time of obervation

Amos monthly means of daily minimum temperatures

DetectedPMFred Metadata

(1) Oct 1953

Aug 1948-Oct 1958: Stn

elevation change(990 feet 1002 feet)

(2) Aug 1958

Oct 1958: Thermometer screen &stand replaced

-10

-5

0

5

10

19951990198519801975197019651960195519501945194019351930192519201915

Tem

pera

ture

ano

mal

ies

(deg

ree

C)

Time of obervation

De-seasonalized

series

(2)1.97°C

(1)-1.23°C

0.233] [0.112, 0.173ˆmonthper C 001385.0ˆ

=

=

φ

β

Page 31: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

Type-1: in boldType-0: in italic

DetectedPMTred_gRef PMFred Metadata( 1) Feb 1960 Oct 1960 May 1957-Feb 1961: anemometer 45B U2A( 2) Jul 1965 Dec 1964 Feb 1961-Mar 1967: Ane. height change( 3) Feb 1974 Jan 1974 Early 1974: WSD detector has a rusted U-arm ( 4) Jan 1976 Nov 1975 Dec 1975: WSD detector replaced( 5) Mar 1980 X Mar 1981: U2A had bearing jam, was replaced ( 6) Aug 1982 Jan 1984 Jan 1984: WSD detector replaced (was sticking)( 7) Feb 1985 X

Mar 1985: Wind system relocated( 8) Nov 1985 X

Jun 1986: Wind system completely recalibrated( 9) Feb 1993 Feb 1993 Jan 1993: WSD detector & tilt pole replaced (10) Jun 1994 Apr 1994 Jun 1994: WSD detector replaced, new tilt pole(11) Apr 1997 Apr 1997 No metadata from Jul. 1994 to Apr. 1997

Nmin

=5 Nmin

=10

Examples of application Examples of application 3.3.

Multiple changepoints caseMultiple changepoints case

--

PMTPMTredred BaseBase--RefRef seriesseriesRef = Ref = gRefgRef

= geostrophic winds (= geostrophic winds (homo. SLPhomo. SLP) ) or or Ref = Ref = mRefmRef

= 5= 5--stns mean surface windsstns mean surface winds--

PMFPMFredred dede--seasonalizedseasonalized base seriesbase series

Nanaimo monthly wind speed series

0

2

4

6

8

10

12

14

16

2000199519901985198019751970196519601955

Mon

thly

mea

n w

ind

spee

ds (

km/h

r)

Time of observation

BaseDiffResults of PMTred

0

2

4

6

8

10

12

14

16

20052000199519901985198019751970196519601955

Win

d sp

eed

(km

/hr)

Time of observation

PMF basePMT gRef diffResults of PMFred

(solid line)&PMTred

(dashed line)

Nseg <10

0

2

4

6

8

10

12

14

16

2000199519901985198019751970196519601955

Mon

thly

mea

n w

ind

spee

ds (

km/h

r)

Time of observation

BaseDiff

Results of PMTred (using mRef)

(1)(2)

(3)(4)

(5)(6)

(7) (8)(9)

(11)(10)

37.1ˆ03.0ˆ

−=Δ

−=Δ

dNot a shift in the base series (Δ=-0.03)but a problem in the mRef

series!

Two estimates of shift-size - Any notable inconsistency

between them indicates …

Aug. 1971

Apply PMFred

to a Ref series to eliminate large artificial shifts (if any) before it is used!

It is always necessary to check the veracity

of changepoints identified statistically!Use of good reference series increases detection power!

Page 32: New techniques for climate data homogenization Xiaolan L. Wang€¦ · 19-09-2008  · Courtesy of Lucie Vincent and Eva Mekis. See Vincent and Mekis (2008), Discontinuities due to

For more details about using the RHtestV2, please refer to For more details about using the RHtestV2, please refer to

““Wang, X. L. and Y. Feng, 2007: Wang, X. L. and Y. Feng, 2007: RHtestV2 User Manual.RHtestV2 User Manual.””

Available atAvailable at

http://http://cccma.seos.uvic.ca/ETCCDMI/software.shtmlcccma.seos.uvic.ca/ETCCDMI/software.shtml,,

along with the open source softwarealong with the open source software

Thank you very much!Thank you very much!

References:Wang, X. L., 2008a: Penalized maximal F test for detecting undocumented mean-shift without trend change.

J. Atmos. Oceanic Technol., 25 (No. 3), 368-384. DOI:10.1175/2007/JTECHA982.1Wang, X. L., 2008b: Accounting for autocorrelation in detecting mean-shifts in climate data series

using the penalized maximal t or F test. J. App. Meteor. Climatol, 47, 2423–2444.Wang, X. L., Q. H. Wen, and Y. Wu, 2007: Penalized Maximal t-test for Detecting Undocumented Mean Change

in Climate Data Series. J. App. Meteor. Climatol., 46 (No. 6), 916-931. DOI:10.1175/JAM2504.1Wan, H., X. L. Wang, and V. R. Swail, 2007: A Quality Assurance System for Canadian Hourly Pressure Data.

J. App. Meteor. Climatol., 46 (No. 11), 1804-1817.