Federal Department of Home Affairs FDHA
Federal Office of Meteorology and Climatology MeteoSwiss
Fuzzy Verification toolbox: definitions and results
Felix AmentMeteoSwiss, Switzerland
2 Fuzzy Verification [email protected]
Motivation for new scores
Which rain forecast would you rather use?
[Maps: observed 24h rain near Sydney, 21 Mar 2004, compared with a mesoscale model (5 km, RMS=13.0) and a global model (100 km, RMS=4.6)]
Fine-scale verification: fuzzy methods
[Schematic: observation and forecast fields of scattered events; evaluate box statistics as a function of spatial scale and intensity]
General recipe
• Choose a threshold to define events and non-events
• Define the spatial scales of interest
• Evaluate statistics at these scales for verification
"… do not evaluate a point-by-point match!"
→ The score depends on spatial scale and intensity
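The recipe can be sketched in a few lines of Python. This is a hypothetical illustration, not the toolbox code: the function name and the toy field are my own. It thresholds the field into events and evaluates the basic box statistic (event fraction) at a chosen scale.

```python
import numpy as np

def event_fraction(field, threshold, window, i, j):
    """Fraction of grid points exceeding `threshold` inside a square
    box of side `window` with top-left corner (i, j) - the box
    statistic that fuzzy methods evaluate at several scales."""
    box = field[i:i + window, j:j + window]
    return float(np.mean(box >= threshold))

# Toy 6x6 rain field (mm) with one small wet patch
rain = np.zeros((6, 6))
rain[1:3, 1:3] = 5.0
f = event_fraction(rain, threshold=1.0, window=3, i=1, j=1)
# 4 of the 9 points in the box are wet, so f = 4/9
```

Varying `threshold` and `window` produces exactly the score-versus-scale-and-intensity dependence described above.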
A Fuzzy Verification Toolbox
Ebert, E.E., 2007: Fuzzy verification of high-resolution gridded forecasts: a review and proposed framework. Meteorol. Appl., submitted.
Toolbox available at http://www.bom.gov.au/bmrc/wefor/staff/eee/fuzzy_verification.zip
Fuzzy method | Decision model for a useful forecast
Upscaling (Zepeda-Arce et al. 2000; Weygandt et al. 2004) | Resembles obs when averaged to coarser scales
Anywhere in window (Damrath 2004); 50% coverage | Predicts event over a minimum fraction of the region
Fuzzy logic (Damrath 2004); Joint probability (Ebert 2002) | More correct than incorrect
Multi-event contingency table (Atger 2001) | Predicts at least one event close to the observed event
Intensity-scale (Casati et al. 2004) | Lower error than a random arrangement of obs
Fractions skill score (Roberts and Lean 2005) | Similar frequency of forecast and observed events
Practically perfect hindcast (Brooks et al. 1998) | Resembles a forecast based on perfect knowledge of the observations
Pragmatic (Theis et al. 2005) | Can distinguish events and non-events
CSRR (Germann and Zawadzki 2004) | High probability of matching the observed value
Area-related RMSE (Rezacova et al. 2005) | Similar intensity distribution as observed
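To make one toolbox entry concrete, here is a minimal sketch of the fractions skill score (Roberts and Lean 2005). The function name, toy fields, and loop-based implementation are my own, not the toolbox API.

```python
import numpy as np

def fss(fcst, obs, threshold, window):
    """Fractions skill score: compare forecast and observed event
    fractions at a given spatial scale (1 = perfect, 0 = no skill)."""
    def fractions(field):
        events = (field >= threshold).astype(float)
        n = field.shape[0] - window + 1
        out = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                out[i, j] = events[i:i + window, j:j + window].mean()
        return out

    pf, po = fractions(fcst), fractions(obs)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

# A displaced rain patch: otherwise correct, but shifted by one point
obs = np.zeros((8, 8)); obs[2:4, 2:4] = 10.0
fcst = np.roll(obs, 1, axis=1)
```

At window size 1 this reduces to a point-by-point comparison; widening the window lets the displaced but otherwise correct forecast score higher, which is the "similar frequency of events" decision model in action.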
Applying fuzzy scores
Fuzzy scores provide a wealth of information, but
• the results can appear contradictory
• their interpretation is sometimes difficult
• they produce too many numbers
[Example score matrix, colour-coded from poor to good]
Application versus testbed
Know the scores Forecast error is unknown
! ?
? !Scores are unknown Know the forecast
error
Application
Testbed
A fuzzy verification testbed
[Flow diagram: virtual truth (radar data, model data, or a synthetic field) → perturbation generator → realizations of virtual erroneous model forecasts → fuzzy verification toolbox → realizations of verification results → analyzer, assessing sensitivity (mean) and [reliability (STD)]; inset: an example field of perturbation factors]
Two ingredients:
1. Reference fields: hourly radar-derived rain fields from the August 2005 flood event, 19 time stamps (Frei et al., 2005)
2. Perturbations: see next slide
Perturbations
Perturbation | Type of forecast error | Algorithm
PERFECT | No error (a perfect forecast!) | none
XSHIFT | Horizontal translation | horizontal translation by 10 grid points
BROWNIAN | No small-scale skill | random exchange of neighbouring points (Brownian motion)
LS_NOISE | Wrong large-scale forcing | multiplication by a disturbance factor generated from large-scale 2-D Gaussian kernels
SMOOTH | High horizontal diffusion (or a coarse-scale model) | moving-window arithmetic average
DRIZZLE | Overestimation of low-intensity precipitation | moving-window filter setting each point below the window average to the mean value
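Simplified versions of three of these perturbations can be written directly from the table. These are my own minimal implementations (with cyclic boundaries for XSHIFT and edge clipping for SMOOTH); the original generators may treat edges differently.

```python
import numpy as np

def xshift(field, n=10):
    """XSHIFT: horizontal translation by n grid points (cyclic here)."""
    return np.roll(field, n, axis=1)

def smooth(field, window=3):
    """SMOOTH: moving-window arithmetic average (clipped at edges)."""
    ny, nx = field.shape
    out = np.empty((ny, nx))
    h = window // 2
    for i in range(ny):
        for j in range(nx):
            out[i, j] = field[max(0, i - h):i + h + 1,
                              max(0, j - h):j + h + 1].mean()
    return out

def drizzle(field, window=3):
    """DRIZZLE: raise each point below the local window average to
    that average, overestimating low intensities."""
    return np.maximum(field, smooth(field, window))

rain = np.zeros((8, 8)); rain[3:5, 3:5] = 8.0   # toy rain patch
```

Note the design intent: XSHIFT and SMOOTH conserve or nearly conserve total rain while changing its placement or sharpness, whereas DRIZZLE deliberately adds spurious light precipitation.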
Perfect forecast
• All scores should take their perfect value!
• But, in fact, 5 out of 12 do not!
Effect of "leaking" scores
Problem: some methods assume no skill at scales below the window size!
An example: the joint probability method. Within a window with p_obs = 0.5 and p_forecast = 0.5, and assuming random ordering within the window, the joint probabilities are:

forecast \ observation | yes | no
yes | 0.25 | 0.25
no | 0.25 | 0.25

→ The Brier score (BS) is not perfect, even for a perfect forecast!
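The arithmetic of this example is easy to reproduce; the numbers below are exactly those on the slide.

```python
# Joint probability method under the random-ordering assumption:
# with event probability 0.5 in both fields, the assumed joint
# distribution factorizes, even though forecast and observation
# actually match point by point.
p_obs, p_fcst = 0.5, 0.5
joint = {
    ("yes", "yes"): p_fcst * p_obs,
    ("yes", "no"): p_fcst * (1 - p_obs),
    ("no", "yes"): (1 - p_fcst) * p_obs,
    ("no", "no"): (1 - p_fcst) * (1 - p_obs),
}
# Expected Brier score = assumed probability of a point mismatch
brier = joint[("yes", "no")] + joint[("no", "yes")]
# brier = 0.5, not 0: the score "leaks" even for a perfect forecast
```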
Expected response to perturbations
[Panels for XSHIFT, BROWNIAN, LS_NOISE, SMOOTH and DRIZZLE: regions where score sensitivity is expected (= 0.0) and not expected (= 1.0), as a function of spatial scale (coarse to fine) and intensity (low to high)]
Contrast := mean(score where no sensitivity is expected) - mean(score where sensitivity is expected)
Summary in terms of contrast: a high contrast means the score cleanly separates erroneous from error-free forecasts.
Summary: real reference fields
[Bar chart: contrast under the XSHIFT, BROWNIAN, SMOOTH, LS_NOISE and DRIZZLE perturbations, plus STD, for each method: upscaling; anywhere in window; 50% coverage; fuzzy logic; joint probability; multi-event contingency table; intensity-scale; fractions skill score; pragmatic approach; practically perfect hindcast; CSRR; area-related RMSE. Leaking scores are marked; high contrast and low STD are good.]
• Leaking scores show overall poor performance
• "Intensity-scale" and "Practically perfect hindcast" generally perform well, but …
• Many scores have problems detecting large-scale noise (LS_NOISE); "upscaling" and "50% coverage" are beneficial in this respect
Spatial detection versus filtering
Horizontal translation (XSHIFT) with variable displacement x
[Panels: x = 25 km, x = 10 km, x = 5 km]
• The "intensity-scale" method can detect the spatial scale of the perturbation
• All other methods, like the "fractions skill score", merely filter out small-scale errors
Redundancy of scores
Correlation (%) between the resulting scores of all methods, over all thresholds and window sizes, averaged over all types of perturbation:
Groups of scores:
• UP, YN, MC, FB, PP
• FZ, JP
• FB, PP, (IS)
August 2005 flood event
Precipitation sum, 18.8.-23.8.2005:
[Maps with area means: 106.2 mm, 43.2 mm, 73.1 mm, 62.8 mm]
(Hourly radar data calibrated using rain gauges; Frei et al., 2005)
Fuzzy verification of the August 2005 flood
Based on 3-hourly accumulations during the August 2005 flood period (18.8.-23.8.2005)
[Score matrices for COSMO-7 and COSMO-2, colour-coded from bad to good: intensity threshold (mm/3h) versus scale (7 km grid points)]
Fuzzy verification of the August 2005 flood
[Difference of fuzzy scores between COSMO-7 and COSMO-2: intensity threshold (mm/3h) versus scale (7 km grid points); colours mark where COSMO-7 is better, COSMO-2 is better, or neutral]
D-PHASE
Demonstration of Probabilistic Hydrological and Atmospheric Simulation of flood Events in the Alpine region
• The operational phase (June until November 2007) is running
• 33 atmospheric models take part …
• … and store their output in a common format in one data archive
[RADAR observations; standard verification: see poster]
Let's apply the fuzzy toolbox:
• Models: COSMO-2, COSMO-7, COSMO-DE, COSMO-EU
• Period: August 2007
• Lead times: most recent forecast, starting at forecast hour +03
• Observations: Swiss radar data aggregated onto each model grid
• To be verified: 3 h accumulation of precipitation
D-PHASE: August 2007
Intensity-scale score (preliminary), 3 h accumulation
[Panels: COSMO-7, COSMO-2, COSMO-DE, COSMO-EU]
Conclusions
• Fuzzy verification scores are a promising framework for verifying high-resolution precipitation forecasts.
• The testbed is a useful tool to evaluate the wealth of scores (not necessarily fuzzy ones):
  • Not all scores reward a perfect forecast with perfect score values (leaking scores).
  • The "intensity-scale" method is able to detect the specific scale of a spatial error.
• MeteoSwiss goes for the upscaling, intensity-scale and fractions skill score (and practically perfect hindcast) methods.
• The first long-term application, for D-PHASE, has just started.
Summary: idealized reference fields
[Bar chart: contrast under the XSHIFT, BROWNIAN, SMOOTH, LS_NOISE and DRIZZLE perturbations, plus STD, for each method: upscaling; anywhere in window; 50% coverage; fuzzy logic; joint probability; multi-event contingency table; intensity-scale; fractions skill score; pragmatic approach; practically perfect hindcast; CSRR; area-related RMSE. Leaking scores are marked; high contrast and low STD are good.]
D-PHASE: August 2007
D-PHASE: August 2007
D-PHASE: August 2007 – cosmoch7
D-PHASE: August 2007 – Cosmoch2
D-PHASE: August 2007 - LME
D-PHASE: August 2007 - LMK
August 2005 flood event
Precipitation sum, 18.8.-23.8.2005:
[Maps with area means: 73.1 mm, 62.8 mm, 106.2 mm, 43.2 mm]
August 2005 flood event
Fuzzy verification (hourly accumulations):
[Panels: COSMO-7, COSMO-2]
August 2005 flood event
Fuzzy verification, COSMO-2 minus COSMO-7:
• Surprisingly small differences
• However, COSMO-2 seems to be slightly better at …