A General Framework for Enhancing Prediction Performance on Time Series Data
Chin-Hui Chen (陳晉暉)
Advisor: Prof. Pu-Jen Cheng (鄭卜壬)
增進時序資料預測效能之一般化模型
Predict Time Series Data
● Time series data: {Yi} where i = 1, ..., t (t = current timestamp). The value Yi is a specific data property, e.g. traffic flow or query frequency.
● Given {Yi} and a prediction horizon h, predict the values {Yj} where j = t+1, ..., t+h.
Predict Time Series Data (cont'd)
● Many methods have been studied for predicting time series data, for example neural-network-based and regression-based methods.
● These methods use past data {Yt-n,..., Yt} to forecast future data {Yt+1,..., Yt+h}.
Past Data {Yt-n,..., Yt} → Predict Method → Future Data {Yt+1,..., Yt+h}
Predict Time Series Data (cont'd)
● Short-term prediction
○ h = 1
○ e.g. predict {Yt+1}.
● Long-term prediction
○ h > 1
○ e.g. predict {Yt+1,..., Yt+13}.
Intuitively...
● The nearer the dataset is, the more accurately we predict.
● The longer the prediction horizon is, the more error occurs.
● The nearer... the dataset is, the more accurately we predict?
● Predict method: Exponential Smoothing.
● We apply Exponential Smoothing to traffic flow data.
● If the predict method captures the trend or periodicity of the time series data, then...
● Predict method: Neural Network
○ Captures periodicity
Nearer does NOT ALWAYS mean more accurate.
Longer does NOT ALWAYS mean more error.
Also: Continuous & Dependent
a. Time series data is continuous, so the predictions can be continuous.
b. Neighboring prediction results may overlap and improve each other: to predict at time t, it is possible to use the results at t-1 or t-2 to cover the result.
Multiple Prediction
● Therefore, each data point in the time series has been predicted h times.
● We thus have "multiple predictions" for a given data point in the time series.
● The most accurate result is not always the latest one.
● We propose a general enhancement framework that utilizes the results of multiple prediction to improve accuracy.
Related Works
Time series predict methods:
1. Machine-learning based
a. Neural Network
2. Regression based
a. ARIMA approach
b. Holt-Winters ES approach
NNet
The architecture of the multilayer perceptron is as follows:
● Notation: NNet(i, h)
● Input layer: i neurons
● Single hidden layer: 4 neurons
● Output layer: h neurons
○ The input neurons take {v(k), k = t−i+1, ..., t}, while the output neurons are {v(t+1), ..., v(t+h)}, where t is the current time.
● A tangent sigmoid activation is used for the hidden layer and a linear transfer function for the output layer.
● The model is trained with the back-propagation algorithm over the training dataset.
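A minimal pure-Python sketch of the forward pass of this architecture (the weights below are random placeholders; back-propagation training over the dataset is omitted):

```python
import math
import random

def nnet_forward(v, W1, b1, W2, b2):
    """Forward pass of NNet(i, h): i inputs -> tanh hidden layer -> h linear outputs."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, v)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * z for w, z in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

# Example: NNet(3, 2) with 4 hidden neurons and placeholder weights.
random.seed(0)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out
out = nnet_forward([0.1, 0.2, 0.3], W1, b1, W2, b2)  # h = 2 predicted values
```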
ARIMA
● Stands for "Autoregressive Integrated Moving Average".
● The model comprises 3 parts:
○ differencing
○ autoregressive (AR)
○ moving average (MA)
● Seasonal variants:
○ NS-ARIMA: nonseasonal ARIMA
○ S-ARIMA: seasonal ARIMA
Differencing: non-stationary -> stationary
● Stationary: a stationary time series is one whose statistical properties, such as mean, variance, and autocorrelation, are constant over time.
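A sketch of how differencing removes a trend so the mean becomes constant (the series below is a made-up example):

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'(t) = y(t) - y(t-1)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# A linear trend (non-stationary mean) becomes constant after one pass.
trend = [2 * t + 1 for t in range(6)]   # 1, 3, 5, 7, 9, 11
diffed = difference(trend)              # -> [2, 2, 2, 2, 2]
```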
NS-ARIMA
● Notation: ARIMA(p, d, q)
○ p = the order of the autoregressive part
○ d = the order of differencing
○ q = the order of the moving average part
Holt-Winters ES
1. Stands for "Holt-Winters Exponential Smoothing".
2. Smooths the series while modeling its trend and periodicity.
[Figure: actual vs. smoothed series, showing the trend and periodicity components]
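The core smoothing recurrence can be sketched as follows (simple exponential smoothing only; full Holt-Winters adds separate trend and seasonal terms, and the alpha value here is arbitrary):

```python
def exp_smooth(series, alpha=0.5):
    """Simple exponential smoothing: s(t) = alpha*y(t) + (1-alpha)*s(t-1).
    Full Holt-Winters adds separate trend and seasonal (periodicity) terms."""
    s = [series[0]]
    for y in series[1:]:
        s.append(alpha * y + (1 - alpha) * s[-1])
    return s

smoothed = exp_smooth([10.0, 12.0, 8.0, 11.0], alpha=0.5)  # -> [10.0, 11.0, 9.5, 10.25]
```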
● To improve time series prediction, we propose a general enhancement framework.
● The framework utilizes multiple prediction results and learns the data dependency to improve accuracy.
Overview
Past Data {Yt-n,..., Yt} → Predict Method {NNet, ARIMA or HW-ES} → Multiple Prediction → STE (Short-Term Enhancement) → LTE-R (Long-Term Enhancement Regression) / LTE-NR (Long-Term Enhancement N-Regression)
● Given a predict method, the multiple prediction results can be generated. The enhancement algorithms take this information as input and learn from it.
● The multiple prediction results and the corresponding labels are listed on the following slide.
STE (Short-Term Enhancement)
● SVR (Support Vector Regression) is adopted.
● Target value: Yt+1
● Since multiple prediction has been done, more accurate prediction values may exist among Z1–Z13.
Feature Set
1. S1: Statistic
a. Trimmed Mean (t_mean)
b. Last N Prediction (last_n)
c. Gaussian Distribution Modeling (gaussian_dist)
2. S2: Reliability
a. Avg Min Error (avg_min_e)
b. Last Min Error (last_min_e)
c. Trend (trend)
3. Periodicity Feature
S1 Statistic
1. Trimmed Mean (t_mean)
Calculates the mean after discarding a given proportion (P%) at the high and low ends.
Mean(Z1,...,Zh) trimmed with P = 10%.
2. Last N Prediction (last_n)
From the elements Zh, Zh-1,..., Z1, take the latest N predictions. N = 1 is applied (e.g. Z13).
3. Gaussian Distribution Modeling (gaussian_dist)
Model N(μ, σ), where μ = mean(Z1,...,Zh) and σ = std(Z1,...,Zh).
Produce N values from the distribution. N = 1 is applied.
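A sketch of the t_mean and gaussian_dist statistics (the Z values below are hypothetical, and population standard deviation is assumed since the slides do not specify which variant):

```python
import random
import statistics

def trimmed_mean(zs, p=0.10):
    """t_mean: mean after discarding proportion p at the high and low ends."""
    zs = sorted(zs)
    k = int(len(zs) * p)
    kept = zs[k:len(zs) - k] if k else zs
    return sum(kept) / len(kept)

def gaussian_dist(zs, n=1, seed=0):
    """gaussian_dist: draw n values from N(mean(zs), std(zs)).
    Population std is assumed; the slides do not specify the variant."""
    mu = statistics.mean(zs)
    sigma = statistics.pstdev(zs)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

zs = [10.0, 11.0, 12.0, 13.0, 100.0]   # hypothetical Z1..Zh with an outlier
tm = trimmed_mean(zs, p=0.20)           # drops 10.0 and 100.0 -> 12.0
samples = gaussian_dist(zs, n=1)
```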
[Figure: multiple-prediction alignment. Each target point Yt+1, Yt+2, ... receives a 1st (farthest) prediction Z1, a 2nd prediction Z2, ..., up to the latest prediction Z13; VZ1, VZ2, ..., VZ12 denote the vectors of partial predicted results used by the reliability features.]
S2 Reliability
1. Avg Min Error (avg_min_e)
VZk: the vector of partial predicted results.
GTZk: the corresponding ground truth of VZk.
Select Zk with the min MAE¹(VZk, GTZk), where k = 1,...,h-1.
¹ MAE = Mean Absolute Error
2. Last Min Error (last_min_e)
Select Zk with the min MAE(VZk[1], GTZk[1]), where k = 1,...,h-1.
3. Trend (trend)
Difference: d(m)(t) = d(m-1)(t) - d(m-1)(t-1)
Select Zk with the max cosine_sim(d(1)(VZk), d(1)(GTZk)), where k = 1,...,h-1 and |VZk| > 3.
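The selections above rest on MAE and on cosine similarity of first differences; a sketch with hypothetical VZk and ground-truth values:

```python
import math

def mae(pred, truth):
    """Mean Absolute Error between a partial result vector and its ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def first_difference(v):
    """d(1): first-order difference, d(t) = v(t) - v(t-1)."""
    return [b - a for a, b in zip(v, v[1:])]

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# avg_min_e would pick the Zk whose VZk has minimum MAE against GTZk;
# trend would pick the Zk whose differenced VZk points in the direction
# closest to the differenced ground truth (numbers are hypothetical).
vzk = [10.0, 12.0, 11.0, 13.0]
gtzk = [10.5, 11.5, 11.5, 13.0]
err = mae(vzk, gtzk)
sim = cosine_sim(first_difference(vzk), first_difference(gtzk))
```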
Periodicity Feature
● Data from the previous period carries a certain confidence of accuracy. Therefore, we add periodicity to the feature set.
● Periodicity detection: FFT (Fast Fourier Transform).
● Add the periodicity enhancement to S1 and S2.
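A sketch of spectrum-based period detection (a naive DFT is used here instead of an FFT so the sketch stays self-contained; both compute the same spectrum):

```python
import cmath
import math

def detect_period(series):
    """Find the dominant period via a naive DFT: pick the frequency bin
    with the largest magnitude and convert it to a period length."""
    n = len(series)
    mean = sum(series) / n
    xs = [x - mean for x in series]              # drop the DC component
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):               # candidate frequencies
        coef = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t, x in enumerate(xs))
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)
    return n // best_k                            # period length in samples

# A signal repeating every 8 samples is detected from 32 samples.
wave = [math.sin(2 * math.pi * t / 8) for t in range(32)]
period = detect_period(wave)
```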
Feature Set w/ Periodicity
1. S1: Statistic w/ Periodicity
a. Trimmed Mean (t_mean_wp)
b. Last N Prediction (last_n)
c. Gaussian Distribution Modeling (gaussian_dist_wp)
2. S2: Reliability w/ Periodicity
a. Avg Min Error (avg_min_e_wp)
b. Last Min Error (last_min_e_wp)
c. Trend (trend_wp)
S1 Statistic w/ Periodicity
1. Trimmed Mean (t_mean_wp)
Calculates the mean after discarding a given proportion (P%) at the high and low ends.
Mean(Z1,...,Zh,Zp) trimmed with P = 10%.
(2. Last N Prediction (last_n) is unchanged.)
3. Gaussian Distribution Modeling (gaussian_dist_wp)
Model N(μ, σ), where μ = mean(Z1,...,Zh,Zp) and σ = std(Z1,...,Zh,Zp).
Produce N values from the distribution. N = 1 is applied.
S2 Reliability w/ Periodicity
1. Avg Min Error (avg_min_e_wp)
VZk: the vector of partial predicted results.
GTZk: the corresponding ground truth of VZk.
Select Zk with the min MAE(VZk, GTZk), where k = 1,...,h-1,p.
2. Last Min Error (last_min_e_wp)
Select Zk with the min MAE(VZk[1], GTZk[1]), where k = 1,...,h-1,p.
3. Trend (trend_wp)
Difference: d(m)(t) = d(m-1)(t) - d(m-1)(t-1)
Select Zk with the max cosine_sim(d(1)(VZk), d(1)(GTZk)), where k = 1,...,h-1,p and |VZk| > 3.
LTE (Long-Term Enhancement)
● LTE-R (Long-Term Enhancement Regression)
● LTE-NR (Long-Term Enhancement N-Regression)
LTE-R (Long-Term Enhancement Regression)
● After STE is done, its predicted result can be used to improve long-term prediction.
● Given a predict method, the method takes the STE result as one of its input values and makes enhanced predictions.
LTE-NR (Long-Term Enhancement N-Regression)
● Train multiple SVRs to make N predictions.
[Figure: N SVRs over the partial result vectors VZ1, VZ2, VZ3, ..., VZ12, VZp, predicting Yt+1, ..., Yt+4]
LTE-NR (Long-Term Enhancement N-Regression)
● These N predicted results can be passed into the predict method to enhance the prediction.
● LTE-R is the special case of LTE-NR where N = 1.
[Figure: illustration of the LTE-NR behavior]
Dataset
● BRS: ETC data from a Bridge Roadside System in Oceania.
○ Data range: Jan 2009 - Dec 2011 (3 yrs)
○ Time interval: week (ISO week date)
○ Data: weekly traffic flow
● Traffic-Flow Theory*
○ Traffic stream properties: speed (v), density (k), flow (q).
○ Flow (q): q = m / T, where
i. x1: a specific detection point (e.g., an induction loop)
ii. m: the number of vehicles passing through x1
iii. T: a predefined time interval (e.g., 1 month)
* Henry Lieu (January/February 1999). "Traffic-Flow Theory". Public Roads (US Dept of Transportation), Vol. 62, No. 4.
Induction Loop
[Photo via http://auto.howstuffworks.com/car-driving-safety/safety-regulatory-devices/red-light-camera1.htm]
Observation
1. Periodicity observed.
2. Spring and summer: dissimilar, shifting.
3. Fall: regular.
4. Winter: small disturbance.
Experiment Setting
● Training data: 2009, 2010 (104 weeks)
● Testing data: 2011 (52 weeks)
● Prediction horizon:
○ Short-term: h = 1
○ Long-term: h = 13 (3 months)
● Evaluation metric: RMSD/RMSE (Root-Mean-Square Deviation/Error)
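The evaluation metric can be sketched as follows (the flow values are hypothetical):

```python
import math

def rmse(pred, truth):
    """Root-Mean-Square Error between predicted and observed values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

# Hypothetical weekly traffic flows: predictions vs. observations.
score = rmse([100.0, 102.0, 98.0], [101.0, 100.0, 99.0])
```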
Model Parameters
● NNet:
○ 5-fold CV
○ input neurons: 52
○ output neurons: h
● ARIMA:
○ d, p, q trained by the Box-Jenkins approach
○ s = 52
● HW-ES:
○ τ = 52
● SVR:
○ 5-fold CV
○ grid search: gamma (γ) = 2^(-3:3), cost (C) = 2^(-1:6)
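The SVR grid search over the stated ranges can be sketched generically (the `evaluate` callback stands in for 5-fold cross-validated SVR error; the toy objective below is only illustrative):

```python
from itertools import product

def grid_search(evaluate):
    """Pick (gamma, C) minimizing an error estimate over the slide's grid:
    gamma = 2^-3 .. 2^3, C = 2^-1 .. 2^6."""
    grid = product([2.0 ** e for e in range(-3, 4)],
                   [2.0 ** e for e in range(-1, 7)])
    return min(grid, key=lambda gc: evaluate(*gc))

# `evaluate` would normally run 5-fold CV of an SVR; this toy objective
# has its optimum at gamma=1, C=4 purely for illustration.
best_gamma, best_c = grid_search(lambda g, c: (g - 1.0) ** 2 + (c - 4.0) ** 2)
```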
STE: BRS

          NNet                    ARIMA                   HW-ES
BL        29508.35                25121.31                16438.36
S1        29096.10   (+1.40%)     27843.35*  (-10.84%)    16246.83   (+1.17%)
S1_wp     24824.02** (+15.87%)    21524.15** (+14.32%)    16333.37   (+0.64%)
S2        27661.48*  (+6.25%)     26718.26*  (-6.36%)     15624.02*  (+4.95%)
S2_wp     25178.40*  (+14.67%)    21862.60*  (+12.97%)    14882.54*  (+9.46%)
S1+S2     28050.20*  (+5.94%)     25552.13   (-1.71%)     15924.13*  (+3.13%)
Total     23593.48** (+20.04%)    21182.93*  (+15.68%)    15592.74*  (+5.14%)

T-test with p < 0.01 (**) and p < 0.05 (*) against the baseline method.
● NNet gets the best improvement.
○ NNet (+20.04%) vs. HW-ES (+5.14%)
● HW-ES remains the most accurate.
○ HW-ES (16438.36 -> 15592.74)
○ NNet (29508.35 -> 23593.48)
● The periodicity feature yields a large improvement.
○ NNet (+5.94% -> +20.04%)
○ ARIMA (-1.71% -> +15.68%)
Feature Analysis
● To better understand the effectiveness of the features, we analyze the correlation between RMSE and each feature value (RMSE vs. feature).
● Three standard measures are considered: Pearson's product-moment, Kendall's tau, and Spearman's rho.
● The absolute values of these measures are depicted below.
● The periodicity features achieve better correlation overall.
● Without periodicity:
○ gaussian_dist
○ last_min_error
● With periodicity:
○ last_min_error_wp
○ trend_wp
LTE-R (h=13)

          NNet                  ARIMA                HW-ES
BL        24321.10              20648.60             25934.51
LTE-R     23401.23* (+3.78%)    20562.28 (+0.41%)    23636.87* (+8.86%)

T-test with p < 0.01 (**) and p < 0.05 (*) against the baseline method.
LTE-NR (h=13)

          N=1       N=2              N=3              N=4              N=5
NNet      +3.78%    +1.56% (-58%)    +5.26% (+39%)    +0.91% (-76%)    -0.87% (-123%)
ARIMA     +0.41%    +1.21% (+195%)   +0.92% (+120%)   +0.12% (-70%)    +0.13% (-68%)
HW-ES     +8.86%    +9.13% (+3%)     +9.59% (+7.6%)   +8.45% (-4.6%)   +3.14% (-65%)
● In LTE-R, ARIMA gives the best prediction, but HW-ES improves the most.
● In LTE-NR, the prediction improves most at N=3 (NNet, HW-ES) or N=2 (ARIMA).
Conclusion
● We design a general framework for enhancing prediction performance when the predict method can capture the trend or periodicity property.
● We adopted real-world traffic data. Given the substantial improvement, the framework can support:
○ city competitiveness planning
○ better budget and forecast estimation
○ improved maintenance planning to optimize maintenance spending