Federal Department of Home Affairs FDHA
Federal Office of Meteorology and Climatology MeteoSwiss
Fuzzy Verification toolbox: definitions and results
Felix AmentMeteoSwiss, Switzerland
2 Fuzzy Verification [email protected]
Motivation for new scores
Which rain forecast would you rather use?
[Maps: observed 24h rain near Sydney, 21 Mar 2004, compared with a mesoscale model (5 km, RMS=13.0) and a global model (100 km, RMS=4.6)]
Fine-scale verification: fuzzy methods
[Schematic: observation and forecast fields of scattered events; evaluate box statistics as a function of spatial scale and intensity]
General recipe
• Choose a threshold to define events and non-events
• Define the spatial scales of interest
• Evaluate statistics at these scales for verification
"… do not evaluate a point-by-point match!"
→ The score depends on spatial scale and intensity
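The recipe can be sketched in a few lines of Python. This is a hypothetical illustration, not the toolbox code: the function name and the toy field are my own. It thresholds the field into events and evaluates the basic box statistic (event fraction) at a chosen scale.

```python
import numpy as np

def event_fraction(field, threshold, window, i, j):
    """Fraction of grid points exceeding `threshold` inside a square
    box of side `window` with top-left corner (i, j) - the box
    statistic that fuzzy methods evaluate at several scales."""
    box = field[i:i + window, j:j + window]
    return float(np.mean(box >= threshold))

# Toy 6x6 rain field (mm) with one small wet patch
rain = np.zeros((6, 6))
rain[1:3, 1:3] = 5.0
f = event_fraction(rain, threshold=1.0, window=3, i=1, j=1)
# 4 of the 9 points in the box are wet, so f = 4/9
```

Varying `threshold` and `window` produces exactly the score-versus-scale-and-intensity dependence described above.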
A Fuzzy Verification Toolbox
Ebert, E.E., 2007: Fuzzy verification of high-resolution gridded forecasts: a review and proposed framework. Meteorol. Appl., submitted.
Toolbox available at http://www.bom.gov.au/bmrc/wefor/staff/eee/fuzzy_verification.zip
Fuzzy method | Decision model for a useful forecast
Upscaling (Zepeda-Arce et al. 2000; Weygandt et al. 2004) | Resembles obs when averaged to coarser scales
Anywhere in window (Damrath 2004); 50% coverage | Predicts event over a minimum fraction of the region
Fuzzy logic (Damrath 2004); Joint probability (Ebert 2002) | More correct than incorrect
Multi-event contingency table (Atger 2001) | Predicts at least one event close to the observed event
Intensity-scale (Casati et al. 2004) | Lower error than a random arrangement of obs
Fractions skill score (Roberts and Lean 2005) | Similar frequency of forecast and observed events
Practically perfect hindcast (Brooks et al. 1998) | Resembles a forecast based on perfect knowledge of the observations
Pragmatic (Theis et al. 2005) | Can distinguish events and non-events
CSRR (Germann and Zawadzki 2004) | High probability of matching the observed value
Area-related RMSE (Rezacova et al. 2005) | Similar intensity distribution as observed
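To make one toolbox entry concrete, here is a minimal sketch of the fractions skill score (Roberts and Lean 2005). The function name, toy fields, and loop-based implementation are my own, not the toolbox API.

```python
import numpy as np

def fss(fcst, obs, threshold, window):
    """Fractions skill score: compare forecast and observed event
    fractions at a given spatial scale (1 = perfect, 0 = no skill)."""
    def fractions(field):
        events = (field >= threshold).astype(float)
        n = field.shape[0] - window + 1
        out = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                out[i, j] = events[i:i + window, j:j + window].mean()
        return out

    pf, po = fractions(fcst), fractions(obs)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

# A displaced rain patch: otherwise correct, but shifted by one point
obs = np.zeros((8, 8)); obs[2:4, 2:4] = 10.0
fcst = np.roll(obs, 1, axis=1)
```

At window size 1 this reduces to a point-by-point comparison; widening the window lets the displaced but otherwise correct forecast score higher, which is the "similar frequency of events" decision model in action.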
Applying fuzzy scores
Fuzzy scores provide a wealth of information, but
• the results can appear contradictory
• their interpretation is sometimes difficult
• they produce too many numbers
[Example score matrix, colour-coded from poor to good]
Application versus testbed
Know the scores Forecast error is unknown
! ?
? !Scores are unknown Know the forecast
error
Application
Testbed
A fuzzy verification testbed
[Flow diagram: virtual truth (radar data, model data, or a synthetic field) → perturbation generator → realizations of virtual erroneous model forecasts → fuzzy verification toolbox → realizations of verification results → analyzer, assessing sensitivity (mean) and [reliability (STD)]; inset: an example field of perturbation factors]
Two ingredients:
1. Reference fields: hourly radar-derived rain fields from the August 2005 flood event, 19 time stamps (Frei et al., 2005)
2. Perturbations: see next slide
Perturbations
Perturbation | Type of forecast error | Algorithm
PERFECT | No error (a perfect forecast!) | none
XSHIFT | Horizontal translation | horizontal translation by 10 grid points
BROWNIAN | No small-scale skill | random exchange of neighbouring points (Brownian motion)
LS_NOISE | Wrong large-scale forcing | multiplication by a disturbance factor generated from large-scale 2-D Gaussian kernels
SMOOTH | High horizontal diffusion (or a coarse-scale model) | moving-window arithmetic average
DRIZZLE | Overestimation of low-intensity precipitation | moving-window filter setting each point below the window average to the mean value
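Simplified versions of three of these perturbations can be written directly from the table. These are my own minimal implementations (with cyclic boundaries for XSHIFT and edge clipping for SMOOTH); the original generators may treat edges differently.

```python
import numpy as np

def xshift(field, n=10):
    """XSHIFT: horizontal translation by n grid points (cyclic here)."""
    return np.roll(field, n, axis=1)

def smooth(field, window=3):
    """SMOOTH: moving-window arithmetic average (clipped at edges)."""
    ny, nx = field.shape
    out = np.empty((ny, nx))
    h = window // 2
    for i in range(ny):
        for j in range(nx):
            out[i, j] = field[max(0, i - h):i + h + 1,
                              max(0, j - h):j + h + 1].mean()
    return out

def drizzle(field, window=3):
    """DRIZZLE: raise each point below the local window average to
    that average, overestimating low intensities."""
    return np.maximum(field, smooth(field, window))

rain = np.zeros((8, 8)); rain[3:5, 3:5] = 8.0   # toy rain patch
```

Note the design intent: XSHIFT and SMOOTH conserve or nearly conserve total rain while changing its placement or sharpness, whereas DRIZZLE deliberately adds spurious light precipitation.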
Perfect forecast
• All scores should take their perfect value!
• But, in fact, 5 out of 12 do not!
Effect of "leaking" scores
Problem: some methods assume no skill at scales below the window size!
An example: the joint probability method. Within a window with p_obs = 0.5 and p_forecast = 0.5, and assuming random ordering within the window, the joint probabilities are:

forecast \ observation | yes | no
yes | 0.25 | 0.25
no | 0.25 | 0.25

→ The Brier score (BS) is not perfect, even for a perfect forecast!
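The arithmetic of this example is easy to reproduce; the numbers below are exactly those on the slide.

```python
# Joint probability method under the random-ordering assumption:
# with event probability 0.5 in both fields, the assumed joint
# distribution factorizes, even though forecast and observation
# actually match point by point.
p_obs, p_fcst = 0.5, 0.5
joint = {
    ("yes", "yes"): p_fcst * p_obs,
    ("yes", "no"): p_fcst * (1 - p_obs),
    ("no", "yes"): (1 - p_fcst) * p_obs,
    ("no", "no"): (1 - p_fcst) * (1 - p_obs),
}
# Expected Brier score = assumed probability of a point mismatch
brier = joint[("yes", "no")] + joint[("no", "yes")]
# brier = 0.5, not 0: the score "leaks" even for a perfect forecast
```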
Expected response to perturbations
[Panels for XSHIFT, BROWNIAN, LS_NOISE, SMOOTH and DRIZZLE: regions where score sensitivity is expected (= 0.0) and not expected (= 1.0), as a function of spatial scale (coarse to fine) and intensity (low to high)]
Contrast := mean(score where no sensitivity is expected) - mean(score where sensitivity is expected)
Summary in terms of contrast: a high contrast means the score cleanly separates erroneous from error-free forecasts.
Summary: real reference fields
[Bar chart: contrast under the XSHIFT, BROWNIAN, SMOOTH, LS_NOISE and DRIZZLE perturbations, plus STD, for each method: upscaling; anywhere in window; 50% coverage; fuzzy logic; joint probability; multi-event contingency table; intensity-scale; fractions skill score; pragmatic approach; practically perfect hindcast; CSRR; area-related RMSE. Leaking scores are marked; high contrast and low STD are good.]
• Leaking scores show overall poor performance
• "Intensity-scale" and "Practically perfect hindcast" generally perform well, but …
• Many scores have problems detecting large-scale noise (LS_NOISE); "upscaling" and "50% coverage" are beneficial in this respect
Spatial detection versus filtering
Horizontal translation (XSHIFT) with variable displacement x
[Panels: x = 25 km, x = 10 km, x = 5 km]
• The "intensity-scale" method can detect the spatial scale of the perturbation
• All other methods, like the "fractions skill score", merely filter out small-scale errors
Redundancy of scores
Correlation (%) between the resulting scores of all methods, over all thresholds and window sizes, averaged over all types of perturbation:
Groups of scores:
• UP, YN, MC, FB, PP
• FZ, JP
• FB, PP, (IS)
August 2005 flood event
Precipitation sum, 18.8.-23.8.2005:
[Maps with area means: 106.2 mm, 43.2 mm, 73.1 mm, 62.8 mm]
(Hourly radar data calibrated using rain gauges; Frei et al., 2005)
Fuzzy verification of the August 2005 flood
Based on 3-hourly accumulations during the August 2005 flood period (18.8.-23.8.2005)
[Score matrices for COSMO-7 and COSMO-2, colour-coded from bad to good: intensity threshold (mm/3h) versus scale (7 km grid points)]
Fuzzy verification of the August 2005 flood
[Difference of fuzzy scores between COSMO-7 and COSMO-2: intensity threshold (mm/3h) versus scale (7 km grid points); colours mark where COSMO-7 is better, COSMO-2 is better, or neutral]
D-PHASE
Demonstration of Probabilistic Hydrological and Atmospheric Simulation of flood Events in the Alpine region
• The operational phase (June until November 2007) is running
• 33 atmospheric models take part …
• … and store their output in a common format in one data archive
[RADAR observations; standard verification: see poster]
Let's apply the fuzzy toolbox:
• Models: COSMO-2, COSMO-7, COSMO-DE, COSMO-EU
• Period: August 2007
• Lead times: most recent forecast, starting at forecast hour +03
• Observations: Swiss radar data aggregated onto each model grid
• To be verified: 3 h accumulation of precipitation
D-PHASE: August 2007
Intensity-scale score (preliminary), 3 h accumulation
[Panels: COSMO-7, COSMO-2, COSMO-DE, COSMO-EU]
Conclusions
• Fuzzy verification scores are a promising framework for verifying high-resolution precipitation forecasts.
• The testbed is a useful tool to evaluate the wealth of scores (not necessarily fuzzy ones):
  • Not all scores reward a perfect forecast with perfect score values (leaking scores).
  • The "intensity-scale" method is able to detect the specific scale of a spatial error.
• MeteoSwiss goes for the upscaling, intensity-scale and fractions skill score (and practically perfect hindcast) methods.
• The first long-term application, for D-PHASE, has just started.
Summary: idealized reference fields
[Bar chart: contrast under the XSHIFT, BROWNIAN, SMOOTH, LS_NOISE and DRIZZLE perturbations, plus STD, for each method: upscaling; anywhere in window; 50% coverage; fuzzy logic; joint probability; multi-event contingency table; intensity-scale; fractions skill score; pragmatic approach; practically perfect hindcast; CSRR; area-related RMSE. Leaking scores are marked; high contrast and low STD are good.]
D-PHASE: August 2007
D-PHASE: August 2007
D-PHASE: August 2007 – cosmoch7
D-PHASE: August 2007 – Cosmoch2
D-PHASE: August 2007 - LME
D-PHASE: August 2007 - LMK
August 2005 flood event
Precipitation sum, 18.8.-23.8.2005:
[Maps with area means: 73.1 mm, 62.8 mm, 106.2 mm, 43.2 mm]
August 2005 flood event
Fuzzy verification (hourly accumulations):
[Panels: COSMO-7, COSMO-2]
August 2005 flood event
Fuzzy verification, COSMO-2 minus COSMO-7:
• Surprisingly small differences
• However, COSMO-2 seems to be slightly better at …