An analysis of different bias-correction algorithms in a synthetic environment
Joo-Hyung Son (1), Zoltan Toth (2) and Dingchen Hou (3)
1) Numerical Weather Prediction Division, KMA
2) Environmental Modeling Center, NCEP/NWS/NOAA
3) EMC/NCEP/NWS/NOAA and SAIC
OUTLINE
• Introduction
• Generation of a Synthetic Data Set
• Effects of Sample Size on the Bias Estimation
• Bias Estimation Based on Bayesian Approach
• Effect of Bias Correction on Probabilistic Forecast
• Summary
Introduction
Background
• NWP products are subject to systematic and random errors.
• Estimating the bias from historical data and then subtracting it from the forecast is an effective way of reducing systematic errors.
Existing Questions
• How to estimate the bias? Various bias-correction methods exist, e.g. the equal weight method and a Kalman Filter type algorithm (Cui et al., 2005).
• What length of historical data set is required for reasonably accurate bias estimation? This has not been systematically investigated.
This Study – A Simplified Approach
• Single forecast of a single variable at a single grid point.
• Simulated forecast (synthetic data), with no dynamic evolution.
• Simulated forecasts of various skill (lead time) and bias level.
• The simulation can be extended to represent more realistic forecasts.
Generation of synthetic data - analysis
• Assumptions
– Annual cycle removed
– Standardized: $x_i = (a_i - c_m)/c_s$, where $a_i$ is the daily climate data, $c_m$ the climate mean and $c_s$ the climate standard deviation
– Stationary process
• Parameters estimated from
– 40 years of climate data at 37.5N, 117.5W
– 2 m temperature
• Analysis
– General ARMA(p,q) model: $x_t = \sum_{i=1}^{p} \phi_i x_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$, with $\varepsilon_t \sim WN(0, \sigma^2)$
– $p$: order of autoregression
– $q$: order of moving average
– $\varepsilon_t$: white noise
– $\phi_i$: autocorrelation parameter
– $\theta_j$: moving average parameter
[Figure: Autocorrelation; x axis: 0 to 300; y axis: -0.2 to 1.2; p = 20, q = 1.]
[Figure: Time series of analysis; x axis: 0 to 730; y axis: -3 to 4. Climate generated by ARMA(20,1).]
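The ARMA(p,q) recursion used for the synthetic analysis can be sketched in plain Python as below. This is only an illustration: the coefficients (phi = [0.7, 0.1], theta = [0.3]), the burn-in length, the seed and all function names are assumptions chosen for the example, not the ARMA(20,1) parameters fitted in this study.

```python
import random
import statistics


def simulate_arma(phi, theta, n, burn_in=500, seed=0):
    """Simulate x_t = sum_i phi[i]*x_(t-1-i) + sum_j theta[j]*eps_(t-1-j) + eps_t."""
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    x = [0.0] * p      # past values, most recent last
    eps = [0.0] * q    # past white-noise draws, most recent last
    out = []
    for t in range(n + burn_in):
        e_t = rng.gauss(0.0, 1.0)
        ar = sum(phi[i] * x[-1 - i] for i in range(p))
        ma = sum(theta[j] * eps[-1 - j] for j in range(q))
        x_t = ar + ma + e_t
        x.append(x_t)
        eps.append(e_t)
        if t >= burn_in:   # discard the start-up transient
            out.append(x_t)
    return out


def standardize(series):
    """Remove the mean and scale to unit standard deviation."""
    m = statistics.fmean(series)
    s = statistics.pstdev(series)
    return [(v - m) / s for v in series]


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    m = statistics.fmean(series)
    var = sum((v - m) ** 2 for v in series)
    cov = sum((series[t] - m) * (series[t + lag] - m)
              for t in range(len(series) - lag))
    return cov / var


# Hypothetical stationary ARMA(2,1) coefficients, for illustration only.
analysis = standardize(simulate_arma([0.7, 0.1], [0.3], n=5000))
print(round(autocorrelation(analysis, 1), 2))
```

Plotting `autocorrelation(analysis, lag)` against `lag` gives the kind of curve used here to judge whether the chosen orders reproduce the climate record.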
Generation of synthetic data - forecast
Requirements:
• The time series of analysis and forecast are similar stationary stochastic processes.
• The forecast is correlated with the analysis, with a coefficient $\alpha$ reflecting the skill of the forecast: $\alpha = 1$ for perfect correlation and $\alpha = 0$ for a non-correlated forecast (simulating lead times of 1 to 16 days).
• The forecast is subject to random error (independent of the analysis) with variance $\beta^2$ ($\beta = 1$: no skill, $\beta = 0$: no noise).
• The forecast is statistically the same as the analysis, N(0,1). This is satisfied by setting $\beta = \sqrt{1 - \alpha^2}$.
• A constant (time-independent) bias is added to the forecast.
Model: $f = \alpha a + \beta e_f + b$
– $a$: analysis generated by the ARMA model, N(0,1)
– $f$: forecast, N(0,1)
– $e_f$: forecast error, N(0,1)
– $b$: bias, constant
– $\alpha$: correlation between forecast and analysis
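A minimal sketch of the forecast generator follows, assuming the forecast is the correlation times the analysis, plus noise with amplitude sqrt(1 - correlation**2), plus a constant bias. The names `alpha`, `beta` and `bias`, the seeds, and the use of white noise as a stand-in for the ARMA analysis are assumptions made to keep the example short.

```python
import math
import random


def synthetic_forecast(analysis, alpha, bias, seed=1):
    """Return f = alpha*a + beta*e + bias with beta = sqrt(1 - alpha**2), e ~ N(0,1)."""
    rng = random.Random(seed)
    beta = math.sqrt(1.0 - alpha ** 2)
    return [alpha * a + beta * rng.gauss(0.0, 1.0) + bias for a in analysis]


def correlation(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv)


# Stand-in analysis: white noise N(0,1) instead of the ARMA series (assumption).
_rng = random.Random(0)
analysis = [_rng.gauss(0.0, 1.0) for _ in range(20000)]
forecast = synthetic_forecast(analysis, alpha=0.6, bias=0.5)
print(round(correlation(analysis, forecast), 2))
```

Because the noise amplitude is tied to the correlation, the forecast keeps unit variance whatever skill level is chosen, so only the correlation and the bias distinguish one simulated lead time from another.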
[Figure: Time series of synthetic data (no bias); x axis: 0 to 100; y axis: -3 to 4; curves: analysis, corr=0.1, corr=0.9.]
[Figure: Time series of real forecast and analysis; x axis: 0 to 100; y axis: -3 to 4; curves: analysis, day 1, day 10.]
Testing Synthetic forecast model against real forecast data
Comparison between Real data & Synthetic data
Purple line:
• "Prediction" of how the forecast would look.
• Normal forecast distribution $N(\hat{\alpha}\bar{a}, \sigma^2)$, centered on alpha times a.
• $\hat{\alpha}$: correlation estimated over the whole observation period.
• $\bar{a}$: mean of all analysis values falling between 3 and 4.
• $\sigma$: standard deviation of the forecast when the corresponding analysis is between 3 and 4.
Histogram:
• Forecast after removing bias.
[Figure: Histograms of the forecast with the predicted normal distribution; panels labelled day 3, 10 day, day 10 and day 16; x axis: -20 to 20; y axis: 0 to 0.5.]
Chi-square Test
Lead time    chi-square    p-value
day 1           1.78427    0.97081
day 2           9.57123    0.21420
day 3           3.73155    0.81013
day 4          39.51318    0.00000
day 5        1137.78357    0.00000
day 6          37.41785    0.00000
day 7          26.03835    0.00050
day 8           7.17229    0.41117
day 9          17.95259    0.01219
day 10          8.96989    0.25483
day 11         26.03835    0.00050
day 12          4.92888    0.66864
day 13          8.96989    0.25483
day 14          7.32356    0.39599
day 15          7.32356    0.39599
day 16          3.73155    0.81013
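The link between the chi-square statistic and the p-value in the table can be illustrated with the sketch below. The number of degrees of freedom is not stated on the slide; 7 is an assumption here, chosen because it reproduces the tabulated p-values for day 1 and day 2 to about three decimals. The function names are hypothetical, and the closed-form tail probability holds only for 7 degrees of freedom.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over histogram bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_df7(x):
    """Upper-tail probability of a chi-square variable with 7 degrees of freedom.

    Uses the closed form for odd degrees of freedom:
    erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * (x^(1/2) + x^(3/2)/3 + x^(5/2)/15).
    """
    if x <= 0.0:
        return 1.0
    series = math.sqrt(x) * (1.0 + x / 3.0 + x * x / 15.0)
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * series)


# Statistics taken from the table for day 1 and day 2.
for stat in (1.78427, 9.57123):
    print(stat, round(chi2_sf_df7(stat), 3))
```

A large p-value means the histogram of the real forecast is statistically consistent with the predicted normal distribution, while a p-value near zero (day 4 to day 7, for example) indicates a clear mismatch.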
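Bias-correction algorithms
• Traditional method (method 1)
– Bias ~ weighted average of $f - a$
– Bias estimation
  • Equal weight: $\hat{b}_n = \frac{n-1}{n}\hat{b}_{n-1} + \frac{1}{n}(f - a)_n$
  • Kalman Filter: $\hat{b}_n = (1 - w)\hat{b}_{n-1} + w (f - a)_n$, where $w$ is the Kalman Filter weight
– Bias correction: $f'_{n+1} = f_{n+1} - \hat{b}_n$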
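The two running bias estimators of method 1 can be sketched as follows. The recursions are written as a running mean (equal weight) and an exponentially weighted update (Kalman Filter type); the weight name `w`, its default of 0.02 (the value quoted in the Kalman Filter plots), the zero initial estimate and the function names are assumptions for illustration.

```python
def equal_weight_bias(errors):
    """Running mean of the errors: b_n = ((n-1)/n) * b_(n-1) + (1/n) * err_n."""
    b, history = 0.0, []
    for n, err in enumerate(errors, start=1):
        b = (n - 1) / n * b + err / n
        history.append(b)
    return history


def kalman_filter_bias(errors, w=0.02, b0=0.0):
    """Exponentially weighted update: b_n = (1 - w) * b_(n-1) + w * err_n."""
    b, history = b0, []
    for err in errors:
        b = (1.0 - w) * b + w * err
        history.append(b)
    return history


def correct_forecasts(forecasts, bias_history):
    """Correct forecast n+1 with the bias estimate available after step n."""
    return [f - b for f, b in zip(forecasts[1:], bias_history)]


# errors are forecast-minus-analysis differences (f - a) in method 1.
errors = [1.0, 2.0, 3.0]
print(equal_weight_bias(errors))
print(kalman_filter_bias([1.0, 1.0], w=0.5))
```

The equal-weight estimate keeps improving as the record grows, whereas the Kalman Filter estimate forgets old data at a rate set by the weight, so its error levels off at a value that depends on that weight.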
Absolute bias error of Method 1
[Figure: Bias error for a single case [equal weight]; x axis: 0 to 1000; y axis: -1 to 3; curves: corr 0.0, corr 0.3.]
[Figure: Kalman Filter absolute bias error for 100 cases. Kalman Filter method (alpha = 0.02); x axis: n (0 to 500); y axis: absolute bias error (0 to 1); curves: m1 0.0, m1 0.3, m1 0.6, m1 0.95.]
[Figure: Equal weight absolute bias error for 100 cases; x axis: 0 to 5000; y axis: 0 to 0.5; curves: corr 0.00, 0.15, 0.30, 0.45, 0.60, 0.75, 0.90.]
Red points: the point on the equal-weight bias error curve corresponding to the average of the KF bias error from 1001 to 10000, for each correlation (~120).
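Bias-correction algorithms
• Traditional method (method 1)
– Given the forecast model, for a particular $a$:
  $E(f - a \mid a) = E(\alpha a + \beta e_f + b - a \mid a) = E((\alpha - 1)a + \beta e_f + b \mid a)$
  $E(f - a \mid a) = (\alpha - 1)a + b \neq b$
– A longer time series is needed to sample the whole distribution of $a$, so that $E(f - a) = b$, i.e. $E_a[E_e(f - a)] = b$
• New method (method 2)
– Based on Bayesian approach
– Bias ~ weighted average of $f - \alpha a$
– Note $E(f - \alpha a) = b$ without sampling the whole distribution of $a$, so a shorter time series suffices
– Bias estimation
  • Equal weight: $\hat{b}_n = \frac{n-1}{n}\hat{b}_{n-1} + \frac{1}{n}(f - \alpha a)_n$
  • Kalman Filter: $\hat{b}_n = (1 - w)\hat{b}_{n-1} + w (f - \alpha a)_n$, where $w$ is the Kalman Filter weight
– Bias correction: $f'_{n+1} = f_{n+1} - \hat{b}_n$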
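The sketch below compares the two methods. It assumes the forecast model used in this study (correlation times analysis, plus independent N(0,1) noise scaled by sqrt(1 - correlation**2), plus bias), treats the analysis as independent N(0,1) samples, and takes the correlation (`alpha` in the code) as known. Under those assumptions a single sample of forecast minus analysis has variance 2(1 - alpha), while forecast minus alpha times analysis has variance 1 - alpha**2, which is never larger. The sample size, number of cases and seed are arbitrary choices.

```python
import math
import random


def mean_abs_bias_error(alpha, bias, n, cases, seed=0):
    """Mean absolute error of the equal-weight bias estimate for methods 1 and 2."""
    rng = random.Random(seed)
    beta = math.sqrt(1.0 - alpha ** 2)
    err1, err2 = 0.0, 0.0
    for _ in range(cases):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        f = [alpha * ai + beta * rng.gauss(0.0, 1.0) + bias for ai in a]
        b1 = sum(fi - ai for fi, ai in zip(f, a)) / n          # method 1: mean of f - a
        b2 = sum(fi - alpha * ai for fi, ai in zip(f, a)) / n  # method 2: mean of f - alpha*a
        err1 += abs(b1 - bias)
        err2 += abs(b2 - bias)
    return err1 / cases, err2 / cases


m1_err, m2_err = mean_abs_bias_error(alpha=0.3, bias=1.0, n=100, cases=500)
print(round(m1_err, 3), round(m2_err, 3))
```

In practice the correlation has to be estimated from the same record, which this sketch ignores; the real ARMA analysis is also autocorrelated, so the absolute error levels will differ from those printed here.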
Absolute bias error of Method 2
[Figure: Bias error for a single case [equal weight]; x axis: 0 to 1000; y axis: -0.6 to 1.2; curves: corr 0.0, corr 0.3.]
[Figure: Equal weight absolute bias error for 100 cases; x axis: 0 to 5000; y axis: 0 to 0.5; curves: corr 0.00, 0.15, 0.30, 0.45, 0.60, 0.75, 0.90.]
[Figure: Kalman Filter absolute bias error of 100 cases. Kalman Filter method (alpha = 0.02); x axis: n (0 to 500); y axis: absolute bias error (0 to 1); curves: m3 0.0, m3 0.3, m3 0.6, m3 0.95.]
Red points: the point on the equal-weight bias error curve corresponding to the average of the KF bias error from 1001 to 10000, for each correlation (~90).
Comparison of Methods 1 & 2
[Figure: Kalman Filter method (alpha = 0.02); x axis: n (0 to 500); y axis: absolute bias error (0 to 1); curves: m1 0.0, m1 0.3, m1 0.6, m1 0.95, m3 0.0, m3 0.3, m3 0.6, m3 0.95; annotations: m1, m2.]
[Figure: Equal weight method. Sample size required for the error to be less than a specific percentage of real bias; x axis: correlation (0.00 to 1.00); y axis: sample space (0 to 3000); curves: M1_5%, M1_10%, M2_5%, M2_10%; annotations: m1, m2.]
[Figure: BIAS (Kalman Filter, method 1); x axis: correlation (0.00 to 0.95) and lead time (day): 16, 13, 11, 10, 8, 6, 5, 4, 3, 2, 1; y axis: bias (0 to 0.25).]
Effects on ensemble-based probabilistic forecast: Continuous Ranked Probability Score (CRPS) test
• Assumption
– Uncertainty is perfectly known (no bias in the 2nd moment)
• Forecast
– Bias increases with lead time (decreases with correlation)
– Modified bias: [equation relating $b$ and $b_o$]
– Bias is standardized by the climate standard deviation
[Slide labels: 0.95, 0.75, 0.20]
• CRPS
– $c_i = \sum_j \{\alpha_j p_j^2 + \beta_j (1 - p_j)^2\}$
[Figure: forecast CDF and analysis $a_i$, with the contributions $\alpha_j p_j^2$ and $\beta_j (1 - p_j)^2$ to $c_i$ marked.]
• Ensemble distribution = forecast uncertainty
– PDF of forecast: $\varphi(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, with $\mu = f_i$ and $\sigma = \sqrt{1 - \alpha^2}$
[Figure: forecast PDF centered on $f_i$ with spread $\sqrt{1 - \alpha^2}$, and analysis $a_i$.]
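For a normal forecast distribution, the CRPS has a well-known closed form, which is convenient for checking how a residual bias degrades the score. The sketch below uses that closed form in place of the bin-by-bin ensemble sum on the slide, and cross-checks it against a direct numerical integration of the squared difference between the forecast CDF and the step function at the analysis. The function names, integration limits and step count are assumptions.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast verified against y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * normal_cdf(z) - 1.0)
                    + 2.0 * normal_pdf(z) - 1.0 / math.sqrt(math.pi))


def crps_numeric(mu, sigma, y, half_width=10.0, steps=20000):
    """Midpoint-rule integral of (F(x) - H(x - y))^2, with H the unit step at y."""
    lo = min(mu, y) - half_width * sigma
    hi = max(mu, y) + half_width * sigma
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        cdf = normal_cdf((x - mu) / sigma)
        step = 1.0 if x >= y else 0.0
        total += (cdf - step) ** 2 * dx
    return total


# A biased forecast mean scores worse than an unbiased one for the same analysis.
unbiased = crps_gaussian(0.0, 1.0, 0.0)
biased = crps_gaussian(0.5, 1.0, 0.0)
print(round(unbiased, 3), round(biased, 3))
```

Averaging `crps_gaussian` over many simulated forecast and analysis pairs, before and after subtracting an estimated bias, gives a simple way to reproduce the kind of raw versus bias-corrected comparison reported here.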
[Figure: CRPS [equal weight, method 1]; x axis: correlation (0.00 to 0.95) and lead time (day): 16, 13, 11, 10, 8, 6, 5, 4, 3, 2, 1; y axis: crps (0.2 to 0.9); curves: Raw fcst, 100 warming period, 5000 warming period.]
[Figure: CRPS [Kalman Filter, method 1]; x axis: correlation (0.00 to 0.95) and lead time (day): 16, 13, 11, 10, 8, 6, 5, 4, 3, 2, 1; y axis: crps (0.2 to 0.9).]
For synthetic forecast with error levels similar to that in real forecast
[Figure: CRPS (Kalman Filter); x axis: correlation (0.00 to 0.95) and lead time (day): 16, 13, 11, 10, 8, 6, 5, 4, 3, 2, 1; y axis: CRPS (0.2 to 1).]
For synthetic forecast with error levels larger than that in real forecast
Summary
• Working with synthetic analysis/forecast data sets is useful for investigating the performance of various statistical bias-correction methods (quick assessment/comparison).
• A Bayesian-type bias estimation method may have additional benefits (bias error).
• Bias error is independent of bias level, but bias correction can reduce the probabilistic forecast error more when the bias is larger.
• Need to consider realistic ensemble forecasts and more complex bias estimation algorithms (comparing frequentist and Bayesian approaches).