A General Framework for Enhancing Prediction Performance on Time Series Data
Chin-Hui Chen (陳晉暉)
Advisor: Prof. Pu-Jen Cheng (鄭卜壬)
增進時序資料預測效能之一般化模型
Predict Time Series Data
● Time series data: {Yi} where i = 1, ..., t (t = current timestamp). The value Yi is a specific data property, e.g. traffic flow or query frequency.
● Given {Yi} and a prediction horizon h, predict the values {Yj} where j = t+1, ..., t+h.
Predict Time Series Data (cont'd)
● Many methods have been studied for predicting time series data, for example neural-network-based and regression-based methods.
● These methods use past data {Yt-n,..., Yt} to forecast future data {Yt+1,..., Yt+h}.
Past Data {Yt-n,..., Yt} → Predict Method → Future Data {Yt+1,..., Yt+h}
Predict Time Series Data (cont'd)
● Short-term prediction
○ h = 1
○ e.g. predict {Yt+1}.
● Long-term prediction
○ h > 1
○ e.g. predict {Yt+1,..., Yt+13}.
Intuitively...
● The nearer the dataset is, the more accurately we predict.
● The longer the prediction horizon is, the more error occurs.
● The nearer... the dataset is, the more accurately we predict?
● Predict method: Exponential Smoothing.
● We apply Exponential Smoothing to traffic flow data.
● If the predict method captures the trend or periodicity of the time series data, then...
● Predict method: Neural Network
○ Captures periodicity
Nearer does NOT ALWAYS mean more accurate.
Longer does NOT ALWAYS mean more error.
Also: Continuous & Dependent
a. Time series data is continuous, so the predictions can be continuous.
b. Neighboring prediction results may overlap and improve each other: to predict at time t, it is possible to use the results at t-1 or t-2 to cover the result.
Multiple Prediction
● Therefore, each data point in the time series has been predicted h times.
● We thus have "multiple predictions" for a given data point in the time series.
● The most accurate result is not always the latest one.
● We propose a general enhancement framework that utilizes the results of multiple prediction to improve accuracy.
Related Works
Time series predict methods:
1. Machine-learning based
a. Neural Network
2. Regression based
a. ARIMA approach
b. Holt-Winters ES approach
NNet
The architecture of the multilayer perceptron is as follows:
● Notation: NNet(i, h)
● Input layer: i neurons
● Single hidden layer: 4 neurons
● Output layer: h neurons
○ The input neurons take {v(k), k = t−i+1, ..., t}, while the output neurons are {v(t+1), ..., v(t+h)}, where t is the current time.
● A tangent sigmoid activation is used for the hidden layer and a linear transfer function for the output layer.
● The model is trained with the back-propagation algorithm over the training dataset.
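A minimal pure-Python sketch of the forward pass of this architecture (the weights below are random placeholders; back-propagation training over the dataset is omitted):

```python
import math
import random

def nnet_forward(v, W1, b1, W2, b2):
    """Forward pass of NNet(i, h): i inputs -> tanh hidden layer -> h linear outputs."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, v)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * z for w, z in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

# Example: NNet(3, 2) with 4 hidden neurons and placeholder weights.
random.seed(0)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out
out = nnet_forward([0.1, 0.2, 0.3], W1, b1, W2, b2)  # h = 2 predicted values
```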
ARIMA
● Stands for "Autoregressive Integrated Moving Average".
● The model comprises 3 parts:
○ differencing
○ autoregressive (AR)
○ moving average (MA)
● Seasonal variants:
○ NS-ARIMA: nonseasonal ARIMA
○ S-ARIMA: seasonal ARIMA
Differencing: non-stationary -> stationary
● Stationary: a stationary time series is one whose statistical properties, such as mean, variance, and autocorrelation, are constant over time.
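A sketch of how differencing removes a trend so the mean becomes constant (the series below is a made-up example):

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'(t) = y(t) - y(t-1)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# A linear trend (non-stationary mean) becomes constant after one pass.
trend = [2 * t + 1 for t in range(6)]   # 1, 3, 5, 7, 9, 11
diffed = difference(trend)              # -> [2, 2, 2, 2, 2]
```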
NS-ARIMA
● Notation: ARIMA(p, d, q)
○ p = the order of the autoregressive part
○ d = the order of differencing
○ q = the order of the moving average part
Holt-Winters ES
1. Stands for "Holt-Winters Exponential Smoothing".
2. Smooths the series while modeling its trend and periodicity.
[Figure: actual vs. smoothed series, showing the trend and periodicity components]
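The core smoothing recurrence can be sketched as follows (simple exponential smoothing only; full Holt-Winters adds separate trend and seasonal terms, and the alpha value here is arbitrary):

```python
def exp_smooth(series, alpha=0.5):
    """Simple exponential smoothing: s(t) = alpha*y(t) + (1-alpha)*s(t-1).
    Full Holt-Winters adds separate trend and seasonal (periodicity) terms."""
    s = [series[0]]
    for y in series[1:]:
        s.append(alpha * y + (1 - alpha) * s[-1])
    return s

smoothed = exp_smooth([10.0, 12.0, 8.0, 11.0], alpha=0.5)  # -> [10.0, 11.0, 9.5, 10.25]
```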
● To improve time series prediction, we propose a general enhancement framework.
● The framework utilizes multiple prediction results and learns the data dependency to improve accuracy.
Overview
Past Data {Yt-n,..., Yt} → Predict Method {NNet, ARIMA or HW-ES} → Multiple Prediction → STE (Short-Term Enhancement) → LTE-R (Long-Term Enhancement Regression) / LTE-NR (Long-Term Enhancement N-Regression)
● Given a predict method, the multiple prediction results can be generated. The enhancement algorithms take this information as input and learn from it.
● The multiple prediction results and the corresponding labels are listed on the following slide.
STE (Short-Term Enhancement)
● SVR (Support Vector Regression) is adopted.
● Target value: Yt+1
● Since multiple prediction has been done, more accurate prediction values may exist among Z1–Z13.
Feature Set
1. S1: Statistic
a. Trimmed Mean (t_mean)
b. Last N Prediction (last_n)
c. Gaussian Distribution Modeling (gaussian_dist)
2. S2: Reliability
a. Avg Min Error (avg_min_e)
b. Last Min Error (last_min_e)
c. Trend (trend)
3. Periodicity Feature
S1 Statistic
1. Trimmed Mean (t_mean)
Calculates the mean after discarding a given proportion (P%) at the high and low ends.
Mean(Z1,...,Zh) trimmed with P = 10%.
2. Last N Prediction (last_n)
From the elements Zh, Zh-1,..., Z1, take the latest N predictions. N = 1 is applied (e.g. Z13).
3. Gaussian Distribution Modeling (gaussian_dist)
Model N(μ, σ), where μ = mean(Z1,...,Zh) and σ = std(Z1,...,Zh).
Produce N values from the distribution. N = 1 is applied.
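A sketch of the t_mean and gaussian_dist statistics (the Z values below are hypothetical, and population standard deviation is assumed since the slides do not specify which variant):

```python
import random
import statistics

def trimmed_mean(zs, p=0.10):
    """t_mean: mean after discarding proportion p at the high and low ends."""
    zs = sorted(zs)
    k = int(len(zs) * p)
    kept = zs[k:len(zs) - k] if k else zs
    return sum(kept) / len(kept)

def gaussian_dist(zs, n=1, seed=0):
    """gaussian_dist: draw n values from N(mean(zs), std(zs)).
    Population std is assumed; the slides do not specify the variant."""
    mu = statistics.mean(zs)
    sigma = statistics.pstdev(zs)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

zs = [10.0, 11.0, 12.0, 13.0, 100.0]   # hypothetical Z1..Zh with an outlier
tm = trimmed_mean(zs, p=0.20)           # drops 10.0 and 100.0 -> 12.0
samples = gaussian_dist(zs, n=1)
```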
[Figure: multiple-prediction alignment. Each target point Yt+1, Yt+2, ... receives a 1st (farthest) prediction Z1, a 2nd prediction Z2, ..., up to the latest prediction Z13; VZ1, VZ2, ..., VZ12 denote the vectors of partial predicted results used by the reliability features.]
S2 Reliability
1. Avg Min Error (avg_min_e)
VZk: the vector of partial predicted results.
GTZk: the corresponding ground truth of VZk.
Select Zk with the min MAE¹(VZk, GTZk), where k = 1,...,h-1.
¹ MAE = Mean Absolute Error
2. Last Min Error (last_min_e)
Select Zk with the min MAE(VZk[1], GTZk[1]), where k = 1,...,h-1.
3. Trend (trend)
Difference: d(m)(t) = d(m-1)(t) - d(m-1)(t-1)
Select Zk with the max cosine_sim(d(1)(VZk), d(1)(GTZk)), where k = 1,...,h-1 and |VZk| > 3.
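The selections above rest on MAE and on cosine similarity of first differences; a sketch with hypothetical VZk and ground-truth values:

```python
import math

def mae(pred, truth):
    """Mean Absolute Error between a partial result vector and its ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)

def first_difference(v):
    """d(1): first-order difference, d(t) = v(t) - v(t-1)."""
    return [b - a for a, b in zip(v, v[1:])]

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# avg_min_e would pick the Zk whose VZk has minimum MAE against GTZk;
# trend would pick the Zk whose differenced VZk points in the direction
# closest to the differenced ground truth (numbers are hypothetical).
vzk = [10.0, 12.0, 11.0, 13.0]
gtzk = [10.5, 11.5, 11.5, 13.0]
err = mae(vzk, gtzk)
sim = cosine_sim(first_difference(vzk), first_difference(gtzk))
```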
Periodicity Feature
● Data from the previous period carries a certain confidence of accuracy. Therefore, we add periodicity to the feature set.
● Periodicity detection: FFT (Fast Fourier Transform).
● Add the periodicity enhancement to S1 and S2.
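A sketch of spectrum-based period detection (a naive DFT is used here instead of an FFT so the sketch stays self-contained; both compute the same spectrum):

```python
import cmath
import math

def detect_period(series):
    """Find the dominant period via a naive DFT: pick the frequency bin
    with the largest magnitude and convert it to a period length."""
    n = len(series)
    mean = sum(series) / n
    xs = [x - mean for x in series]              # drop the DC component
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):               # candidate frequencies
        coef = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t, x in enumerate(xs))
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)
    return n // best_k                            # period length in samples

# A signal repeating every 8 samples is detected from 32 samples.
wave = [math.sin(2 * math.pi * t / 8) for t in range(32)]
period = detect_period(wave)
```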
Feature Set w/ Periodicity
1. S1: Statistic w/ Periodicity
a. Trimmed Mean (t_mean_wp)
b. Last N Prediction (last_n)
c. Gaussian Distribution Modeling (gaussian_dist_wp)
2. S2: Reliability w/ Periodicity
a. Avg Min Error (avg_min_e_wp)
b. Last Min Error (last_min_e_wp)
c. Trend (trend_wp)
S1 Statistic w/ Periodicity
1. Trimmed Mean (t_mean_wp)
Calculates the mean after discarding a given proportion (P%) at the high and low ends.
Mean(Z1,...,Zh,Zp) trimmed with P = 10%.
(2. Last N Prediction (last_n) is unchanged.)
3. Gaussian Distribution Modeling (gaussian_dist_wp)
Model N(μ, σ), where μ = mean(Z1,...,Zh,Zp) and σ = std(Z1,...,Zh,Zp).
Produce N values from the distribution. N = 1 is applied.
S2 Reliability w/ Periodicity
1. Avg Min Error (avg_min_e_wp)
VZk: the vector of partial predicted results.
GTZk: the corresponding ground truth of VZk.
Select Zk with the min MAE(VZk, GTZk), where k = 1,...,h-1,p.
2. Last Min Error (last_min_e_wp)
Select Zk with the min MAE(VZk[1], GTZk[1]), where k = 1,...,h-1,p.
3. Trend (trend_wp)
Difference: d(m)(t) = d(m-1)(t) - d(m-1)(t-1)
Select Zk with the max cosine_sim(d(1)(VZk), d(1)(GTZk)), where k = 1,...,h-1,p and |VZk| > 3.
LTE (Long-Term Enhancement)
● LTE-R (Long-Term Enhancement Regression)
● LTE-NR (Long-Term Enhancement N-Regression)
LTE-R (Long-Term Enhancement Regression)
● After STE is done, its predicted result can be used to improve long-term prediction.
● Given a predict method, the method takes the STE result as one of its input values and makes enhanced predictions.
LTE-NR (Long-Term Enhancement N-Regression)
● Train multiple SVRs to make N predictions.
[Figure: N SVRs over the partial result vectors VZ1, VZ2, VZ3, ..., VZ12, VZp, predicting Yt+1, ..., Yt+4]
LTE-NR (Long-Term Enhancement N-Regression)
● These N predicted results can be passed into the predict method to enhance the prediction.
● LTE-R is the special case of LTE-NR where N = 1.
[Figure: illustration of the LTE-NR behavior]
Dataset
● BRS: ETC data from a Bridge Roadside System in Oceania.
○ Data range: Jan 2009 - Dec 2011 (3 yrs)
○ Time interval: week (ISO week date)
○ Data: weekly traffic flow
● Traffic-Flow Theory*
○ Traffic stream properties: speed (v), density (k), flow (q).
○ Flow (q): q = m / T, where
i. x1: a specific detection point (e.g., an induction loop)
ii. m: the number of vehicles passing through x1
iii. T: a predefined time interval (e.g., 1 month)
* Henry Lieu (January/February 1999). "Traffic-Flow Theory". Public Roads (US Dept of Transportation), Vol. 62, No. 4.
Induction Loop
[Photo via http://auto.howstuffworks.com/car-driving-safety/safety-regulatory-devices/red-light-camera1.htm]
Observation
1. Periodicity observed.
2. Spring and summer: dissimilar, shifting.
3. Fall: regular.
4. Winter: small disturbance.
Experiment Setting
● Training data: 2009, 2010 (104 weeks)
● Testing data: 2011 (52 weeks)
● Prediction horizon:
○ Short-term: h = 1
○ Long-term: h = 13 (3 months)
● Evaluation metric: RMSD/RMSE (Root-Mean-Square Deviation/Error)
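The evaluation metric can be sketched as follows (the flow values are hypothetical):

```python
import math

def rmse(pred, truth):
    """Root-Mean-Square Error between predicted and observed values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

# Hypothetical weekly traffic flows: predictions vs. observations.
score = rmse([100.0, 102.0, 98.0], [101.0, 100.0, 99.0])
```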
Model Parameters
● NNet:
○ 5-fold CV
○ input neurons: 52
○ output neurons: h
● ARIMA:
○ d, p, q trained by the Box-Jenkins approach
○ s = 52
● HW-ES:
○ τ = 52
● SVR:
○ 5-fold CV
○ grid search: gamma (γ) = 2^(-3:3), cost (C) = 2^(-1:6)
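The SVR grid search over the stated ranges can be sketched generically (the `evaluate` callback stands in for 5-fold cross-validated SVR error; the toy objective below is only illustrative):

```python
from itertools import product

def grid_search(evaluate):
    """Pick (gamma, C) minimizing an error estimate over the slide's grid:
    gamma = 2^-3 .. 2^3, C = 2^-1 .. 2^6."""
    grid = product([2.0 ** e for e in range(-3, 4)],
                   [2.0 ** e for e in range(-1, 7)])
    return min(grid, key=lambda gc: evaluate(*gc))

# `evaluate` would normally run 5-fold CV of an SVR; this toy objective
# has its optimum at gamma=1, C=4 purely for illustration.
best_gamma, best_c = grid_search(lambda g, c: (g - 1.0) ** 2 + (c - 4.0) ** 2)
```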
STE: BRS

          NNet                    ARIMA                   HW-ES
BL        29508.35                25121.31                16438.36
S1        29096.10   (+1.40%)     27843.35*  (-10.84%)    16246.83   (+1.17%)
S1_wp     24824.02** (+15.87%)    21524.15** (+14.32%)    16333.37   (+0.64%)
S2        27661.48*  (+6.25%)     26718.26*  (-6.36%)     15624.02*  (+4.95%)
S2_wp     25178.40*  (+14.67%)    21862.60*  (+12.97%)    14882.54*  (+9.46%)
S1+S2     28050.20*  (+5.94%)     25552.13   (-1.71%)     15924.13*  (+3.13%)
Total     23593.48** (+20.04%)    21182.93*  (+15.68%)    15592.74*  (+5.14%)

T-test with p < 0.01 (**) and p < 0.05 (*) against the baseline method.
● NNet gets the best improvement.
○ NNet (+20.04%) vs. HW-ES (+5.14%)
● HW-ES remains the most accurate.
○ HW-ES (16438.36 -> 15592.74)
○ NNet (29508.35 -> 23593.48)
● The periodicity feature yields a large improvement.
○ NNet (+5.94% -> +20.04%)
○ ARIMA (-1.71% -> +15.68%)
Feature Analysis
● To better understand the effectiveness of the features, we analyze the correlation between RMSE and each feature value (RMSE vs. feature).
● Three standard measures are considered: Pearson's product-moment, Kendall's tau, and Spearman's rho.
● The absolute values of these measures are depicted below.
● The periodicity features achieve better correlation overall.
● Without periodicity:
○ gaussian_dist
○ last_min_error
● With periodicity:
○ last_min_error_wp
○ trend_wp
LTE-R (h=13)

          NNet                  ARIMA                HW-ES
BL        24321.10              20648.60             25934.51
LTE-R     23401.23* (+3.78%)    20562.28 (+0.41%)    23636.87* (+8.86%)

T-test with p < 0.01 (**) and p < 0.05 (*) against the baseline method.
LTE-NR (h=13)

          N=1       N=2              N=3              N=4              N=5
NNet      +3.78%    +1.56% (-58%)    +5.26% (+39%)    +0.91% (-76%)    -0.87% (-123%)
ARIMA     +0.41%    +1.21% (+195%)   +0.92% (+120%)   +0.12% (-70%)    +0.13% (-68%)
HW-ES     +8.86%    +9.13% (+3%)     +9.59% (+7.6%)   +8.45% (-4.6%)   +3.14% (-65%)
● In LTE-R, ARIMA gives the best prediction, but HW-ES improves the most.
● In LTE-NR, the prediction improves most at N=3 (NNet, HW-ES) or N=2 (ARIMA).
Conclusion
● We design a general framework for enhancing prediction performance when the predict method can capture the trend or periodicity property.
● We adopted real-world traffic data. Given the substantial improvement, the framework can support:
○ city competitiveness planning
○ better budget and forecast estimation
○ improved maintenance planning to optimize maintenance spending