Development of a Macro Editing Approach
Work Session on Statistical Data Editing, Topic v: Editing based on results, 21-23 April 2008, WP 30
Overview
• Introduction to the series of surveys that measure U.S. petroleum product supplied
• Limitations of micro editing and the need for an edit approach at the aggregate level
• Approach considered for macro editing and the three types of models developed, using one product as an example
• In-sample forecast results and out-of-sample forecast performance results
• Summary and conclusions
The PSRS and Micro Edit Limitations
• The surveys, respondents and data collected
– WPSRS: Weekly, six cut-off sample surveys
– MPSRS: Monthly, nine population census surveys
– PSA: Annual, revised monthly estimates, population census
• Limitations
– Variability of responses
– Lagged population coverage
• Corrective Measures
– Micro editing
– Imputation
The Approach
• Purpose of Study
– Develop point and interval forecasts at national and regional levels
– One-month-ahead forecast
• Approach
– Econometric time-series models
– Three models: Base, ARMA, and Supplemental
– Micro editing enhanced by providing capabilities to identify outliers at the aggregate level
Model Development
• Model at product level
– Distillate (Low Sulfur, High Sulfur, Total)
– Gasoline
• Model at two geographic levels
– National
– Regional (PADD)
Model Forms
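• Base Model: trends and seasonal factors, expressed as:
$$\text{Demand}_t = \alpha_0 + \alpha_1\,\text{Trend} + \beta_j\,\text{Shift}_j + \sum_{k=2}^{12} \gamma_k D_k + \text{AR or MA terms}$$
• ARMA Model: Box-Jenkins approach utilizing AR and MA terms to capture the variation and seasonal pattern, expressed as:
$$\text{Demand}_t = \alpha_0 + \alpha_1\,\text{Trend} + \beta_j\,\text{Shift}_j + \sum_{m=1}^{n} \phi_m\,\text{ARMA}_m$$
• Supplemental Model: Base Model with exogenous variables, expressed as:
$$\text{Demand}_t = \alpha_0 + \alpha_1\,\text{Trend} + \beta_j\,\text{Shift}_j + \sum_{k=2}^{12} \gamma_k D_k + \sum_{i} \delta_i\,\text{Exog}_i + \text{AR or MA terms}$$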
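As an illustration of the Base Model idea (trend, shift, and seasonal factors), the following is a minimal sketch rather than the authors' implementation. It assumes a single level-shift dummy, eleven monthly dummies with January as the baseline, no AR or MA terms, and a synthetic noise-free series; the helper names `design_row`, `solve`, and `fit_ols` are hypothetical.

```python
import math

SHIFT_START = 60  # hypothetical month index at which the level shift begins


def design_row(t, shift_start=SHIFT_START):
    """Regressors for month index t (t = 0 is a January)."""
    month = t % 12                                  # 0 = January ... 11 = December
    shift = 1.0 if t >= shift_start else 0.0        # single level-shift dummy (assumed form)
    dummies = [1.0 if month == k else 0.0 for k in range(1, 12)]  # Feb..Dec, January is baseline
    return [1.0, float(t), shift] + dummies


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def synthetic_demand(t):
    """Synthetic series (NOT EIA data): trend + level shift + seasonal cycle."""
    seasonal = 300.0 * math.cos(2.0 * math.pi * (t % 12) / 12.0)
    return 3500.0 + 5.0 * t + (150.0 if t >= SHIFT_START else 0.0) + seasonal


n_obs = 120
rows = [design_row(t) for t in range(n_obs)]
y = [synthetic_demand(t) for t in range(n_obs)]
coef = fit_ols(rows, y)

# One-month-ahead point forecast for the first month after the sample.
forecast = sum(c * x for c, x in zip(coef, design_row(n_obs)))
```

Because the synthetic series contains no noise, the fitted trend and shift coefficients recover the values used to generate it; with real survey data the residuals would also feed the interval forecast.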
US Distillate Demand: 1996-2006
[Figure: three time-series panels, US Distillate: Total, High Sulfur, and Low Sulfur; horizontal axis 1996 to 2006, vertical axis thousand barrels per day]
In-Sample One-Month-Out Forecast Evaluation Statistics

Total Distillate Models
         Base      ARMA      Suppl.
RMSE     100.55    126.48    89.36
MAE      83.03     97.32     73.96
MAPE     2.22      2.59      1.98

HSD Models
         Base      ARMA      Suppl.
RMSE     83.48     109.39    71.22
MAE      63.67     84.96     54.99
MAPE     5.76      7.59      5.13

LSD Models
         Base      ARMA      Suppl.
RMSE     74.74     94.84     74.36
MAE      60.22     76.42     59.85
MAPE     2.28      2.93      2.26

Note: There is no evidence of bias in any of the models
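The three evaluation statistics in these tables follow standard definitions, sketched below on made-up numbers. The function names and the sample values are illustrative assumptions, and the mean error is shown only as one simple way to look for bias; the source does not say which bias test was used.

```python
import math


def errors(actual, predicted):
    """Forecast errors, actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]


def rmse(actual, predicted):
    """Root mean squared error."""
    e = errors(actual, predicted)
    return math.sqrt(sum(x * x for x in e) / len(e))


def mae(actual, predicted):
    """Mean absolute error."""
    e = errors(actual, predicted)
    return sum(abs(x) for x in e) / len(e)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    e = errors(actual, predicted)
    return 100.0 * sum(abs(x) / abs(a) for x, a in zip(e, actual)) / len(e)


def mean_error(actual, predicted):
    """Average signed error; values far from zero suggest systematic bias."""
    e = errors(actual, predicted)
    return sum(e) / len(e)


# Hypothetical one-month-ahead results in thousand barrels per day (not EIA data).
actual = [4000.0, 4100.0, 3900.0, 4200.0]
predicted = [3900.0, 4150.0, 3950.0, 4100.0]

stats = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
    "mean error": mean_error(actual, predicted),
}
```

RMSE and MAE carry the units of the series, while MAPE is unit-free, which is why it is the easiest of the three to compare across HSD, LSD, and Total.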
U.S. Distillate Demand: Best Model Summary Statistics
                      Total     HSD       LSD
Adjusted R²           0.909     0.898     0.949
S.E. of Regression    96.58     76.98     79.55
Note: Estimation period Jan 1996 through Dec 2006
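For reference, the two summary statistics are conventionally defined as below. This is the textbook form, assuming n observations, p regressors excluding the intercept, and fitted values \hat{y}_t; the source does not state the exact degrees-of-freedom convention used.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
s = \sqrt{\frac{\sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2}{n - p - 1}}
```

With monthly data from Jan 1996 through Dec 2006, n is at most 132; AR terms would reduce the usable sample by the number of lags.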
In-Sample Model Fit: Best Model 2000-2006 (±2 forecast standard errors)
[Figure: three panels, US Model for Total Distillate, HSD Demand, and LSD Demand; horizontal axis Jan-00 to Jan-06, vertical axis thousand barrels per day]
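The band drawn around the forecast can double as a macro edit: an aggregate falling outside two forecast standard errors either side of the point forecast is worth a reviewer's attention. The sketch below assumes a symmetric band and hypothetical values; `forecast_band` and `review_flag` are illustrative names, not part of any EIA system.

```python
def forecast_band(point_forecast, forecast_se, width=2.0):
    """Return (lower, upper): the point forecast plus or minus width * SE."""
    half_width = width * forecast_se
    return point_forecast - half_width, point_forecast + half_width


def review_flag(reported, point_forecast, forecast_se, width=2.0):
    """Classify a reported aggregate relative to the forecast band."""
    lower, upper = forecast_band(point_forecast, forecast_se, width)
    if reported < lower:
        return "below band"
    if reported > upper:
        return "above band"
    return "within band"


# Hypothetical values in thousand barrels per day (not EIA data).
print(review_flag(4450.0, 4200.0, 100.0))  # above band
print(review_flag(4150.0, 4200.0, 100.0))  # within band
```

A flag here only says the aggregate is unusual relative to the model; it does not by itself indicate whether respondent data or the model is at fault.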
Out-of-Sample Forecast Results: Best Model 2006-2007
[Figure: three panels, US Model for Total Distillate, Total HSD, and Total LSD; each panel is split into in-sample and out-of-sample periods; horizontal axis Jan-06 to Jul-07, vertical axis thousand barrels per day]
Regional Models
• Regions: Petroleum Administration for Defense Districts (PADDs)
• Identify exogenous variables to explain regional patterns of distillate demand
– Residential heating in the Northeast (PADD 1): Heating Degree-Days
– Agriculture in the Midwest (PADD 2): Precipitation
Exogenous Variables Used in Supplemental Distillate Models
HDD DEV: Population-Weighted Heating Degree-Days (Deviation from Normal)
PRECIP DEV: Area-Weighted Precipitation (Deviation from Long-Term Normal)
EMP TRANS: Employment in Transportation Industries
IPI MFG: Index of Industrial Production for Durable Goods
FREIGHT INDX: Transportation Services Index for Freight
PRICE RATIO: Average monthly spot price ratio (No. 2 Fuel Oil / Natural Gas)
PADD 1 PADD 2 PADD 3 PADD 5 NATIONAL
HSD LSD TOT HSD LSD TOT HSD LSD TOT HSD LSD TOT HSD LSD TOT
HDD DEV X X X X X
PRECIP DEV X X X X
EMP TRANS X X
IPI MFG X
FREIGHT INDX X X X X
PRICE RATIO X
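To make the HDD DEV variable concrete, here is a minimal sketch of a population-weighted heating degree-day deviation. It assumes the common U.S. convention of a 65 °F base temperature and simple population weights; the area names, populations, and values are invented, and the source does not describe how its weather inputs were actually built.

```python
BASE_TEMP_F = 65.0  # conventional U.S. base temperature for heating degree-days


def heating_degree_days(daily_mean_temps_f):
    """Sum of daily shortfalls of the mean temperature below the base."""
    return sum(max(0.0, BASE_TEMP_F - t) for t in daily_mean_temps_f)


def population_weighted(values, populations):
    """Population-weighted average of per-area values."""
    total_population = sum(populations[area] for area in values)
    return sum(values[area] * populations[area] for area in values) / total_population


def hdd_deviation(observed_hdd, normal_hdd, populations):
    """Weighted observed HDD minus weighted normal HDD for the same month."""
    return (population_weighted(observed_hdd, populations)
            - population_weighted(normal_hdd, populations))


# Hypothetical areas within one region (not real data).
populations = {"A": 3.0, "B": 1.0}          # millions of people
observed = {"A": 900.0, "B": 1100.0}        # monthly HDD observed
normal = {"A": 800.0, "B": 1000.0}          # long-term normal HDD for the month

deviation = hdd_deviation(observed, normal, populations)  # positive = colder than normal
```

A positive deviation means a colder-than-normal month, which would be expected to push heating-related distillate demand above the seasonal pattern.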
Regional Model Details: In-Sample Model Fit
[Figure: four panels, PADD 1 HSD, PADD 1 LSD, PADD 2 HSD, and PADD 2 LSD; horizontal axis Jan-00 to Jan-06, vertical axis thousand barrels per day]
Regional Model Details: Out-of-Sample Forecast Results
[Figure: four panels, PADD 1 HSD, PADD 1 LSD, PADD 2 HSD, and PADD 2 LSD; each panel is split into in-sample and out-of-sample periods; horizontal axis Jan-06 to Jul-07, vertical axis thousand barrels per day]
Benefits & Limitations
• How does this improve EIA’s current activities?
– Establishes a range of expected results at the aggregate level that will alert a reviewer when to investigate possible anomalies in the respondent data
– Can identify the region that provides the largest contribution to a deviation, guiding further editing and imputation activities prior to data release
– Reduces risk of revisions to released data
• Limitations of Modeling
– Reasons for deviations are not always readily apparent: respondent error, structural shifts in consumption, or failure of the model to respond to external influences
– Regional-level models provide guidance, but not necessarily answers
– Ranges may be too large
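The idea of finding the region that contributes most to a deviation can be sketched as a comparison of reported and forecast values by region. The figures below are invented, the helper names are hypothetical, and the sketch simply ranks regions by absolute deviation; the source does not specify the rule EIA uses.

```python
def regional_deviations(reported, forecast):
    """Reported minus forecast demand for each modeled region."""
    return {region: reported[region] - forecast[region] for region in forecast}


def largest_contributor(reported, forecast):
    """Return (region, its deviation, summed deviation over modeled regions)."""
    deviations = regional_deviations(reported, forecast)
    region = max(deviations, key=lambda r: abs(deviations[r]))
    return region, deviations[region], sum(deviations.values())


# Hypothetical values in thousand barrels per day (not EIA data).
forecast = {"PADD 1": 1300.0, "PADD 2": 1250.0, "PADD 3": 700.0, "PADD 5": 550.0}
reported = {"PADD 1": 1520.0, "PADD 2": 1230.0, "PADD 3": 710.0, "PADD 5": 540.0}

region, deviation, total_deviation = largest_contributor(reported, forecast)
print(region)  # PADD 1
```

Consistent with the limitations listed above, the result points a reviewer toward a region's respondent data; it does not explain why the deviation occurred.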
Future Plans
• Model improvements
– Dynamic adjustments to known issues like shifts
– Better exogenous variables
• Automation of gathering and formatting model inputs
– Weather Data
– Economic Data
– Forecast generation
• Expand to other key petroleum products
– Gasoline and gasoline subcomponents (currently underway)
– Residual fuel oil