
Eötvös Loránd University

Corvinus University of Budapest

EMD and wavelet decomposition based denoising and

forecasting of crude oil prices

MSc thesis

Author: Bálint Plangár

Supervisor: Milán Csaba Badics

May 10, 2019

ACKNOWLEDGEMENT

Firstly, I would like to express my sincere gratitude to my advisor Milán Csaba Badics for the continuous support of my research, and for his patience, motivation, immense knowledge and critical mindset. His guidance helped me throughout the research and writing of this thesis. The door to Milán's office was always open whenever I ran into a trouble spot or had a question about my research or writing. He consistently allowed this paper to be my own work, but steered me in the right direction whenever he thought I needed it. I could not have imagined having a better advisor and mentor for my research project.

DECLARATION

Name: Plangár Bálint

ELTE Faculty of Science, programme: Actuarial and Financial Mathematics

NEPTUN ID: JL3QFB

Title of the thesis:

EMD and wavelet decomposition based denoising and forecasting of crude oil prices

As the author of this thesis, and aware of my disciplinary responsibility, I declare that this thesis is the result of my own independent work and my own intellectual product, that I have consistently applied the standard rules of referencing and quotation in it, and that I have not used parts written by others without proper citation.

Budapest, 2019.05.10 ______________________

signature of the student

Table of Contents

1. Introduction

2. Literature review

3. Critical review

4. Research framework for financial time series forecasting

5. Possible research questions

6. Decomposition methods

6.1. Empirical mode decomposition

6.2. Discrete wavelet based decomposition

7. Data

8. Empirical analysis and results

8.1. Prediction strategy

8.2. Prediction model

8.3. Results

9. Robustness check

10. Conclusion

References

List of figures

1. Figure: Simplified representation of the four broad research designs, Source: Own figure

2. Figure: Research framework of financial time series forecasting, Source: Own figure

3. Figure: Plotting the envelopes and their mean, Source: MetaTrader, 2012

4. Figure: Comparison of transformations, Source: Uliha, 2016, p. 512

5. Figure: Process of wavelet decomposition, Source: Mirzaei et al., 2010, p. 303

6. Figure: Brent crude oil prices and returns for the entire sample, Source: Own figure

7. Figure: Number of IMFs during the estimation period using expanding window, Source: Own figure

8. Figure: Components of Brent crude oil generated by EMD on the entire sample, Source: Own figure

9. Figure: In-sample components of Brent crude oil generated by EMD, Source: Own figure

10. Figure: Components of Brent crude oil generated by EMD during the recession, 2006.09. – 2010.09., Source: Own figure

11. Figure: Prediction process, Source: Own figure

12. Figure: Selected research design for empirical mode decomposition, Source: Own figure

13. Figure: Ratio of significant lags in the first three IMFs, Source: Own figure

14. Figure: Typical values of permutation entropy estimated from denoised signals, Source: Own figure

15. Figure: Number of dropped IMFs based on sample entropy and number of generated IMFs using expanding window, Source: Own figure

16. Figure: Number of dropped IMFs based on Shannon entropy and number of generated IMFs using expanding window, Source: Own figure

17. Figure: Number of dropped detail components based on Shannon and sample entropy using expanding window, Source: Own figure

18. Figure: Denoised signals and their permutation entropy using wavelet decomposition, Source: Own figure

19. Figure: Cumulative RSE of the two best performing models throughout the out-of-sample period, Source: Own figure

20. Figure: Histograms of the number of IMFs using rolling window, Source: Own figure

21. Figure: Permutation entropy based noise selection in case of EMD, Source: Own figure

22. Figure: Number of dropped detail components based on Shannon and sample entropy using weekly data and rolling window, Source: Own figure

EMD and wavelet decomposition based denoising and forecasting of crude oil prices Bálint Plangár


1. Introduction

Signal processing is a long-established technique for analyzing a measured signal and detecting its hidden components. It has been applied mainly in electrical engineering; however, it has several other fields of application, for example processing or interpreting spoken words (Smith et al., 2017) and processing pictures or videos (Baimbetov, 2015). It can also be used for image or video compression (Berres et al., 2017) and noise reduction (Boukhayma et al., 2016).

Signal decomposition is a useful technique for reducing noise or for analyzing the original time series in a less complicated representation. The most commonly used approach is the 'divide and conquer' strategy, a decomposition-ensemble learning paradigm: the original time series is divided into meaningful components, and these components are predicted instead of the original series. Decomposition-ensemble models have been reported to outperform conventional single models. Decomposition also enables noise reduction, which helps to focus on the most important components of the time series: after decomposition, one or more components regarded as noise are dropped. Studies have shown that noise reduction can considerably improve data fitting, resulting in better prediction performance (Jammazi & Aloui, 2012; Guo et al., 2012; Harris & Yilmaz, 2009). Signal processing (decomposition and noise reduction) can thus be regarded as the preprocessing stage of model building.
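The decomposition-ensemble ('divide and conquer') idea and noise reduction by dropping components can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited paper: the hypothetical `decompose` function uses repeated trailing moving averages as a simple stand-in for EMD or wavelet decomposition, and every component is forecast with a naive persistence model in place of ARIMA, SVR or a neural network. The properties that matter here are that the components sum back to the original series and that the component forecasts are aggregated by summation.

```python
import numpy as np


def trailing_mean(x, window):
    """Causal moving average: each point averages the current and up to window-1 past values."""
    return np.array([x[max(0, t - window + 1):t + 1].mean() for t in range(len(x))])


def decompose(x, windows=(2, 4, 8)):
    """Hypothetical stand-in for EMD/wavelets: detail bands (finest first) plus a smooth trend.

    The components sum exactly to x, the property decomposition-ensemble models rely on.
    """
    current = np.asarray(x, dtype=float)
    components = []
    for window in windows:
        smooth = trailing_mean(current, window)
        components.append(current - smooth)  # detail band removed at this scale
        current = smooth
    components.append(current)  # remaining smooth trend
    return components


def persistence_forecast(component):
    """Placeholder single-component model: the next value equals the last one."""
    return component[-1]


def decomposition_ensemble_forecast(x, drop=()):
    """Forecast every kept component separately and sum the forecasts.

    `drop` lists the indices of components treated as noise (noise reduction).
    """
    components = decompose(x)
    kept = [c for i, c in enumerate(components) if i not in drop]
    return sum(persistence_forecast(c) for c in kept)


rng = np.random.default_rng(0)
prices = 50 + np.cumsum(rng.normal(size=200))
print(decomposition_ensemble_forecast(prices))            # same as prices[-1], up to rounding
print(decomposition_ensemble_forecast(prices, drop={0}))  # finest band dropped as noise
```

With no component dropped, the summed persistence forecasts simply reproduce the last observation; the forecast changes only once a component is removed or a better component model replaces the placeholder.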

Financial time series are characterized by complex nonlinearity, dynamic variation, high irregularity and non-stationarity (Watkins & Plourde, 1994; Krichene, 2007; Zhang et al., 2015). For this reason, conventional financial econometric tools (ARIMA, GARCH, VAR etc.) are often inefficient at describing financial time series, and even machine learning models have frequently failed to fit the data and produce satisfactory predictions. These shortcomings, together with the benefits of signal processing, turned researchers' attention to signal processing methods, and several studies have applied signal decomposition to economic/financial time series prediction.

The traditional forecasting strategies can generally be described as follows (Tang et al., 2014):

$$x_{t+h} = f(\mathbf{X}_t) + \varepsilon_t \tag{1}$$

where $x_t$ denotes the value of the time series at time $t$, $h$ is the prediction horizon, $\mathbf{X}_t = \{x_{t-1}, \ldots, x_{t-l}\}$ contains the past values of the original time series and $\varepsilon_t$ denotes the prediction errors, which are independent and identically distributed. Based on the design of the function $f$ and the parameter estimation method, the existing models for crude oil price forecasting can be classified into three main types (Yu et al., 2015): (1) traditional econometric models with relatively simple fixed functions and strict data assumptions, for example the auto-regressive integrated moving average (ARIMA) model (Xiang & Zhuang, 2013), generalized autoregressive conditional heteroscedasticity (GARCH) models (Nomikos & Andriosopoulos, 2012), vector auto-regression (VAR) (Mirmirani & Li, 2005) or error correction models (ECM) (Lanza et al., 2005); (2) machine learning techniques with flexible functions and self-learning capability, such as artificial neural networks (ANN) (Guo et al., 2012), support vector machines (SVM) (Kim, 2003) or support vector regression (SVR) (Lin et al., 2012); and (3) hybrid models combining several single models (Yu et al., 2015).
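To make equation (1) concrete, the sketch below builds the training pairs it describes: the input vector of the l past values x_{t-1}, …, x_{t-l} and the target h steps ahead. The function f is taken to be linear with an intercept and estimated by ordinary least squares, an illustrative assumption corresponding to the simplest econometric model class; the helper names are hypothetical, and any of the model types listed above could replace `fit_linear_f`.

```python
import numpy as np


def make_supervised(x, l, h):
    """Input/target pairs for equation (1): inputs x_{t-1}, ..., x_{t-l}, target x_{t+h}."""
    x = np.asarray(x, dtype=float)
    inputs, targets = [], []
    for t in range(l, len(x) - h):
        inputs.append(x[t - l:t][::-1])  # most recent value first: x_{t-1}, ..., x_{t-l}
        targets.append(x[t + h])
    return np.array(inputs), np.array(targets)


def fit_linear_f(inputs, targets):
    """Estimate a linear f (with intercept) by ordinary least squares."""
    design = np.column_stack([np.ones(len(inputs)), inputs])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def apply_f(coef, inputs):
    """Evaluate the fitted linear f on one or more input vectors."""
    return coef[0] + inputs @ coef[1:]


series = np.arange(10.0)
X, y = make_supervised(series, l=2, h=1)
print(X[0], y[0])  # [1. 0.] 3.0
```

The residuals `y - apply_f(coef, X)` play the role of the error term in equation (1).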

Signal processing techniques have nevertheless gradually made their way into financial time series analysis, because they can represent both smooth and volatile functions in a way that captures the time and frequency information of a time series (Yousefi et al., 2005; Guo et al., 2012; Bekiros & Marcellino, 2013). Predicting oil prices is even more challenging than predicting other financial time series, since the price of oil is strongly influenced by many factors that can cause large price movements, for example political events, investors' expectations about the future, the weather, or economic reports of the top oil-producing countries.

Oil price forecasting receives a great deal of attention, since the oil price plays an important role in the world economy (Guan et al., 2016; H-Y. Zhang et al., 2015; Juvenal & Petrella, 2014). Crude oil is among the most important energy resources, since it is the world's dominant fuel, making up just over a third of all energy consumed (BP, 2018). Furthermore, crude oil is the world's largest and most actively traded commodity: Brent crude oil and West Texas Intermediate are among the three most traded commodities in the world (FIA, 2018).

Although the current financial econometric forecasting literature reports promising results, signal processing tools are not widespread in financial econometric research. Moreover, one can find several shortcomings in the literature that make the reproduction and comparison of articles difficult. There is no general research framework that could help categorize the articles. Most of the papers cannot be reproduced because they lack the necessary parameters, data or code. A common mistake is that papers ignore look-ahead bias, i.e. their models use future information. Some papers do not specify the window size or type (rolling or expanding) or the hyperparameter optimization method, and researchers rarely emphasize the sensitivity of the applied method to the window type and size. Benchmark models are often poorly selected: a flexible model is frequently compared to a relatively simple one. Another frequent mistake is that the differences between EMD and wavelet decomposition are analysed with different prediction models; consequently, the partial effects (decomposition, noise reduction etc.) are not described thoroughly. Some papers overcomplicate their prediction models, using for example one neural network for the decomposed data and another neural network to combine its predictions. Some papers reconstruct the components into low-, medium- and high-frequency components, or into low-, medium-, high-frequency and trend components, but often without analysing the optimality of the reconstruction method. Model comparisons often lack statistical hypothesis testing. Most papers lack an economic evaluation of the prediction model (portfolio selection, Sharpe ratio etc.) or a robustness check (different frequencies, volatile vs calm periods). Some papers compare their prediction models on a single time series only and choose a relatively short out-of-sample period.
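Statistical hypothesis testing of forecast comparisons can be as simple as a Diebold-Mariano test on the loss differential of two models. The sketch below is a simplified version under explicit assumptions: one-step-ahead forecasts, squared-error loss, a serially uncorrelated loss differential (so the plain sample variance replaces the long-run variance) and a normal approximation for the p-value. Multi-step forecasts would need an autocorrelation-robust variance estimate instead.

```python
import math

import numpy as np


def diebold_mariano(errors_1, errors_2):
    """Simplified one-step-ahead Diebold-Mariano test with squared-error loss.

    Returns (statistic, two-sided p-value). A negative statistic means model 1 has lower loss.
    Assumes the loss differential is serially uncorrelated (no HAC variance correction).
    """
    d = np.asarray(errors_1, dtype=float) ** 2 - np.asarray(errors_2, dtype=float) ** 2
    n = len(d)
    statistic = d.mean() / math.sqrt(d.var(ddof=1) / n)
    normal_cdf = 0.5 * (1.0 + math.erf(abs(statistic) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - normal_cdf)
    return statistic, p_value


rng = np.random.default_rng(42)
e_model_1 = rng.normal(scale=1.0, size=500)  # smaller forecast errors
e_model_2 = rng.normal(scale=2.0, size=500)
print(diebold_mariano(e_model_1, e_model_2))
```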

Given the available literature, the paper's contribution is threefold: (1) it provides a thorough literature review based on the most important articles; (2) it introduces a general research framework that describes the possible research designs in decomposition-based economic/financial time series forecasting, and uses this framework to classify the articles introduced in the literature review; (3) it compares PACF-based, entropy-based and expert-judgement-based noise selection methods in terms of their contribution to prediction accuracy.
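The partial autocorrelation function (PACF), one of the noise selection criteria compared in this paper, can be computed from sample autocorrelations with the Durbin-Levinson recursion. The sketch below also counts the lags whose partial autocorrelation lies outside the approximate 95% band ±1.96/√n; a component with few significant lags carries little exploitable linear structure. How such counts are turned into a keep/drop decision is a separate design choice that is not shown here, and the function names are hypothetical.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelations at lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(nlags + 1)])


def sample_pacf(x, nlags):
    """Partial autocorrelations at lags 1..nlags via the Durbin-Levinson recursion."""
    r = sample_acf(x, nlags)
    pacf = np.zeros(nlags)
    phi_prev = np.array([])  # AR coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, nlags + 1):
        phi_kk = (r[k] - np.dot(phi_prev, r[k - 1:0:-1])) / (1.0 - np.dot(phi_prev, r[1:k]))
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k - 1] = phi_kk
    return pacf


def significant_pacf_lags(component, nlags=20):
    """Number of lags with |PACF| above the approximate 95% band 1.96/sqrt(n)."""
    band = 1.96 / np.sqrt(len(component))
    return int(np.sum(np.abs(sample_pacf(component, nlags)) > band))


rng = np.random.default_rng(1)
noise = rng.normal(size=2000)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.9 * ar1[t - 1] + noise[t]
print(sample_pacf(ar1, 3))  # first value close to 0.9, the others close to 0
```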

The remainder of this paper is organized as follows. Section 2 summarizes the most important studies that have been carried out in decomposition-based financial time series forecasting. Section 3 provides a critical review of the literature, focusing on the factors that restrict the reproducibility and comparability of papers. Section 4 describes the general research framework for the current and future literature of financial time series forecasting. Section 5 presents some of the possible research questions and designs that can be formulated based on the framework. The applied methodologies, including advanced techniques, and the research framework of the paper are described in Section 6. Section 7 then introduces the research design and the time series data, followed by the description of the empirical analysis and results in Section 8. Finally, Section 9 provides a robustness check for the prediction models and Section 10 concludes the paper.

2. Literature review

This section summarizes the most important studies that have been carried out in decomposition-based economic/financial time series forecasting. All the papers apply a decomposition method from either the wavelet or the empirical mode decomposition family. The main purpose of this section is to describe the trends in financial time series forecasting, focusing particularly on the differences between signal processing methods.

Yousefi et al. (2005) illustrated an application of wavelets as a possible technique for investigating market efficiency in crude oil futures markets. They introduced a wavelet-based prediction procedure to forecast the spot price over horizons of one, two, three and four months. The forecasts of their models are compared with actual futures prices, and the relative performance of the procedure is used to investigate whether futures markets are efficiently priced. They used average monthly WTI spot prices and NYMEX futures prices covering the period 1986-2003. A five-level decomposition with the Daubechies wavelet of order seven is used as the basis of the prediction model. The predictions are obtained by extending the decomposed series on each level, after which the authors reconstruct the series with the inverse wavelet transform; a spline fit is used to extend the approximation level, while a trigonometric fit is used for the lower detail levels. The researchers concluded that the futures market might not be efficiently priced, since the wavelet-based predictions of spot prices were closer to the realized spot prices than the futures prices were.

Jammazi and Aloui (2012) combined a multilayer back propagation neural network (MBPNN) with a six-level Haar à trous wavelet decomposition to improve the prediction of crude oil prices. They use monthly WTI crude oil spot prices covering the period from 1988 to 2010 to generate out-of-sample forecasts. They choose a prediction horizon of 19 months and the conventional MBPNN as the benchmark model. To improve the fitting ability of the MBPNN, the high-frequency components (D1 – D6) are dropped and only the smoothed signal is used for model building. The inverse wavelet transform is applied to the smoothed component to reconstruct the smoothed WTI prices. They concluded that removing excess noise from the WTI price series can improve the fitting ability of the MBPNN, since the hybrid model outperformed the standard MBPNN model.
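The Haar à trous (undecimated) wavelet decomposition and the 'drop D1-D6, keep the smooth signal' denoising step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the causal Haar filter (each level averages a value with the value 2^(j-1) steps in the past, so no future data enter) and pads the start of the series with its first value; the function name is hypothetical. Because each detail is the difference between two successive smooth signals, the details and the final smooth signal add back up to the original series exactly.

```python
import numpy as np


def haar_a_trous(x, levels):
    """Causal Haar a trous decomposition.

    Returns (details, smooth) with details = [D1, ..., DJ] and
    x == D1 + ... + DJ + smooth (up to floating-point rounding).
    """
    c = np.asarray(x, dtype=float)
    if len(c) <= 2 ** (levels - 1):
        raise ValueError("series too short for the requested number of levels")
    details = []
    for j in range(1, levels + 1):
        step = 2 ** (j - 1)
        # value `step` periods in the past; the first `step` points reuse c[0] (padding assumption)
        lagged = np.concatenate([np.full(step, c[0]), c[:-step]])
        smoother = 0.5 * (c + lagged)
        details.append(c - smoother)  # detail D_j
        c = smoother
    return details, c


rng = np.random.default_rng(7)
t = np.arange(240)
signal = 60 + 10 * np.sin(2 * np.pi * t / 120) + rng.normal(scale=2.0, size=t.size)
details, smooth = haar_a_trous(signal, levels=6)
denoised = smooth  # drop D1-D6 and keep only the smooth component
```

With the causal Haar filter the level-J smooth signal is simply the average of the last 2^J observations, which is why dropping all six details removes most of the high-frequency variation.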

Bekiros and Marcellino (2013) used a shift-invariant wavelet transform to analyze the dependence structure and predictability of currency markets across different timescales. Their study attempts to probe into the micro-foundations of cross-scale causal heterogeneity on the basis of trader behavior with different time horizons. They use three series of daily closing exchange rates, namely EUR/USD, JPY/USD and GBP/USD. They calculated foreign exchange returns and realized volatility series for model building, and chose the random walk as the benchmark model. The data span the period from 1999.01.05. to 2010.05.10. (2960 observations). The researchers determined the optimal level of multiscale decomposition by minimizing a Shannon entropy-related criterion. They used different models for the approximation and the details: a cubic spline fit for the approximation level (A4) and ARIMA for the details (D1-D4) to extend the decomposed signal. The prediction procedure includes the following steps: shift-invariant transformation with the SIDWT, boundary extension with spline and ARIMA, and reconstruction of the wavelet series with the inverse SIDWT; finally, out-of-sample forecasts for one to five days ahead are obtained and compared to the predictions of a neural network. The authors showed that the application of wavelet decomposition and artificial neural networks provided enhanced predictability.

Yu et al. (2008) proposed an empirical mode decomposition (EMD) based neural network ensemble learning paradigm for forecasting crude oil spot prices. They used daily WTI and Brent crude oil spot prices from the period 1986-2006. After the original crude oil spot series were decomposed, a three-layer feedforward neural network (FNN) model was used to model each of the extracted intrinsic mode functions (IMFs). An adaptive linear neural network (ALNN) was then applied to formulate an ensemble output for the original crude oil price series. The following models were used as benchmarks: EMD-FNN-Averaging, EMD-ARIMA-ALNN, EMD-ARIMA-Averaging, single FNN and single ARIMA. The authors' results show that the decomposition-and-ensemble strategy can effectively improve prediction performance in terms of RMSE and deviation statistics. They also show that EMD is a useful tool for improving prediction performance.
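The core of EMD is the sifting process: the mean of the upper and lower envelopes is repeatedly subtracted from the signal until an IMF remains, the IMF is removed, and the procedure restarts on the residue. The sketch below is deliberately simplified and is not a full EMD implementation: it assumes linear-interpolation envelopes anchored at the end points instead of the usual cubic splines, and a fixed number of sifting iterations instead of a formal stopping criterion. It does illustrate a residue-based stopping rule (stop when the residue has too few extrema) and the fact that the IMFs and the residue sum back to the original series. The function names are hypothetical.

```python
import numpy as np


def local_extrema(h):
    """Indices of interior local maxima and minima."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope(h, idx):
    """Linear envelope through the extrema, anchored at both end points (simplification)."""
    knots = np.concatenate(([0], idx, [len(h) - 1]))
    return np.interp(np.arange(len(h)), knots, h[knots])


def sift(signal, n_sift=10):
    """Extract one IMF candidate with a fixed number of sifting iterations."""
    h = signal.copy()
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        h = h - 0.5 * (envelope(h, maxima) + envelope(h, minima))  # subtract the envelope mean
    return h


def simple_emd(x, max_imfs=8, n_sift=10):
    """Return (imfs, residue) with x == sum(imfs) + residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:  # residue-based stop: (almost) monotonic
            break
        imf = sift(residue, n_sift)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue


t = np.linspace(0, 1, 500)
x = np.sin(2 * np.pi * 40 * t) + 0.5 * np.sin(2 * np.pi * 5 * t) + 2 * t
imfs, residue = simple_emd(x)
print(len(imfs))
```

For research use, a dedicated EMD implementation with spline envelopes, end-effect handling and a proper sifting stopping criterion should replace this sketch.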

Lin et al. (2012) proposed a hybrid forecasting model using EMD and least squares support vector regression (LSSVR) for foreign exchange rate forecasting. An LSSVR is constructed to forecast each IMF and the residue individually, and then all these forecasts are aggregated to produce the final foreign exchange rate forecast. This is a typical application of the 'divide and conquer' strategy. Daily USD/NTD, JPY/NTD and RMB/NTD exchange rates are used, covering the period 2005.07.01. – 2009.12.31. The researchers use the following benchmark models: EMD-ARIMA, and single LSSVR and single ARIMA without time series decomposition. Their results show that the proposed EMD-LSSVR model outperforms the benchmark models based on various statistical performance measures.

Xiong et al. (2013) propose a hybrid EMD-based model within a feed-forward neural network (FNN) modeling framework, incorporating the slope-based method (SBM). The slope-based method is proposed to restrain the end effect that occurs during the sifting process of EMD. The authors examine the iterated, direct and multiple-input multiple-output (MIMO) forecasting strategies. After the original crude oil spot series is decomposed, a three-layer feedforward neural network is used to model each of the extracted IMFs, and another FNN is then applied to formulate an ensemble output for the original crude oil price series. Weekly WTI crude oil spot prices between 2000.01.07. and 2011.12.30. are used. They examine several prediction horizons, including 4, 8, 12, 16, 20 and 24 periods. The researchers use the following benchmark models: single FNN without EMD, naïve random walk without EMD and EMD-FNN without SBM. The results indicate that the proposed EMD-SBM-FNN model using the MIMO strategy is the best in terms of prediction accuracy.
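The iterated, direct and MIMO strategies differ only in how the training targets are arranged. In the sketch below a linear model estimated by least squares stands in for the FNN (an assumption for brevity), the inputs are the l most recent observations, and the helper names are hypothetical. The iterated strategy fits one one-step model and feeds its forecasts back as inputs; the direct strategy fits a separate model for each horizon; the MIMO strategy fits one model with an H-dimensional output.

```python
import numpy as np


def windows(x, l, horizon):
    """Inputs: the l most recent values (oldest first); targets: the next `horizon` values."""
    x = np.asarray(x, dtype=float)
    inputs, targets = [], []
    for t in range(l, len(x) - horizon + 1):
        inputs.append(x[t - l:t])
        targets.append(x[t:t + horizon])
    return np.array(inputs), np.array(targets)


def fit(inputs, targets):
    """Least-squares linear model with intercept; stands in for the FNN."""
    design = np.column_stack([np.ones(len(inputs)), inputs])
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef


def predict(coef, last_inputs):
    return coef[0] + last_inputs @ coef[1:]


def iterated_forecast(x, l, horizon):
    coef = fit(*windows(x, l, 1))  # a single one-step-ahead model
    history = list(np.asarray(x, dtype=float)[-l:])
    forecasts = []
    for _ in range(horizon):
        nxt = predict(coef, np.array(history[-l:]))[0]
        forecasts.append(nxt)
        history.append(nxt)  # the forecast becomes an input for the next step
    return np.array(forecasts)


def direct_forecast(x, l, horizon):
    inputs, targets = windows(x, l, horizon)
    last = np.asarray(x, dtype=float)[-l:]
    # one separate model for each step ahead
    return np.array([predict(fit(inputs, targets[:, [k]]), last)[0] for k in range(horizon)])


def mimo_forecast(x, l, horizon):
    inputs, targets = windows(x, l, horizon)
    # one model that outputs the whole horizon at once
    return predict(fit(inputs, targets), np.asarray(x, dtype=float)[-l:])
```

With ordinary least squares the direct and MIMO forecasts coincide, because a multi-output least-squares fit decomposes into separate fits per output; the strategies only diverge for models whose parameters are shared across outputs, such as a neural network with H output nodes.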

Shu-ping et al.'s (2014) study incorporates the idea of decomposition-reconstruction-ensemble. The new insight of their paper is to use the run length judgement method to reconstruct the component sequences based on their characteristics. They built a multiscale combined forecasting model based on EMD, applying ANN and SVM as prediction models. The monthly spot price of WTI crude oil from January 1986 to November 2013 is used. The oil price series is decomposed and reconstructed into high-, medium-, low-frequency and trend sequences. They use an ANN model for the high-frequency sequence, SVM for the medium- and low-frequency sequences individually and ARIMA for the trend component. The authors apply another SVM to formulate an ensemble output for the original time series. The run length judgement method they apply is a potential tool for noise selection; however, they do not drop any components. Their model generates out-of-sample predictions for 12 and 23 periods ahead. They concluded that the multiscale combined model obtained the best forecasting results compared with single ARIMA, Elman, SVM and GARCH models and combined models including ARIMA-SVM and EMD-SVM-SVM.
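The run length judgement method rests on a simple statistic: the number of runs, i.e. maximal stretches of consecutive observations on the same side of the component's mean. Fast-oscillating components produce many runs and slow ones few. The sketch below counts runs and then groups components with two cut-offs; the cut-off values and the three-group labelling are hypothetical choices for illustration, not those of the cited papers.

```python
import numpy as np


def count_runs(component):
    """Number of runs of consecutive values at or above versus below the mean."""
    above = np.asarray(component, dtype=float) >= np.mean(component)
    return 1 + int(np.count_nonzero(above[1:] != above[:-1]))


def group_by_runs(components, high_share=0.2, trend_max_runs=2):
    """Hypothetical grouping: many runs -> high frequency, very few -> trend, otherwise low frequency."""
    groups = {"high": [], "low": [], "trend": []}
    for i, c in enumerate(components):
        runs = count_runs(c)
        if runs > high_share * len(c):
            groups["high"].append(i)
        elif runs <= trend_max_runs:
            groups["trend"].append(i)
        else:
            groups["low"].append(i)
    return groups
```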

Yu et al. (2015) proposed a decomposition-ensemble methodology with data-characteristic-driven reconstruction for crude oil price forecasting, in order to enhance prediction accuracy and reduce computational complexity. Four main steps are involved: data decomposition to simplify the complex data, component reconstruction based on data-characteristic-driven modeling, individual prediction of each reconstructed component, and ensemble prediction for the final output. Weekly crude oil prices in the WTI and Brent markets are used, covering the period between January 1986 and July 2014. They analyze multiple reconstruction methods, including run length judgement, fine-to-coarse and sample entropy reconstruction. In addition, numerous benchmark models are applied to test the proposed method, including typical decomposition-ensemble models without reconstruction and similar decomposition-ensemble models with existing reconstruction strategies. The authors tested the proposed method with prediction horizons of 1, 2, 3 and 4 weeks. The results indicate that the data-characteristic-driven reconstruction approach improves the existing decomposition-ensemble techniques in terms of statistical performance measures and computational time.
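Sample entropy, one of the data characteristics used for reconstruction, measures how often patterns of length m that are similar within a tolerance r remain similar at length m + 1; regular components score low and noisy ones high. The sketch below follows the standard definition (Chebyshev distance, self-matches excluded, the same number of templates for both lengths); the defaults m = 2 and r = 0.2 times the standard deviation are common conventions assumed here, not values taken from the paper.

```python
import numpy as np


def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) = -ln(A / B), where B (A) counts pairs of distinct templates of
    length m (m + 1) whose Chebyshev distance is at most r."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)  # common default tolerance (an assumption)
    n_templates = len(x) - m  # same number of templates for both lengths

    def matching_pairs(length):
        templates = np.array([x[i:i + length] for i in range(n_templates)])
        count = 0
        for i in range(n_templates - 1):
            distances = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.count_nonzero(distances <= r))
        return count

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    if a == 0 or b == 0:
        return np.inf  # undefined: no matches at one of the lengths
    return -np.log(a / b)


rng = np.random.default_rng(3)
t = np.arange(500)
regular = np.sin(2 * np.pi * t / 50)
noisy = rng.normal(size=500)
print(sample_entropy(regular), sample_entropy(noisy))  # the noisy series scores higher
```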

Zhu et al. (2016) developed an adaptive multiscale ensemble learning paradigm incorporating ensemble empirical mode decomposition (EEMD), particle swarm optimization and LSSVM with a kernel function prototype. Three main steps are involved: first, the original oil price series is decomposed with extrema symmetry expansion EEMD (ESE-EEMD); second, the fine-to-coarse reconstruction algorithm is applied to identify the high-frequency, low-frequency and trend components; third, each component is predicted with its own model (ARIMA for the high-frequency components, LSSVM for the low-frequency and trend components), and the predictions of all components are aggregated. The article analyzes three energy price series, including daily WTI crude oil. The fine-to-coarse method applied in the study could also be used for noise selection. Numerous benchmark models are applied, including typical decomposition-ensemble models without reconstruction and similar decomposition-ensemble models with existing reconstruction strategies. The results indicate that the proposed method can significantly improve both level and directional prediction accuracy.
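The fine-to-coarse reconstruction is commonly formulated as follows: IMFs are added up from the finest (highest-frequency) one onwards, and a t-test checks whether the mean of each partial sum departs significantly from zero; the first IMF at which it does marks the boundary between the high-frequency and the low-frequency part. The sketch assumes this formulation, a fixed critical value of 1.96 and an ordinary t-statistic that ignores autocorrelation; the function name is hypothetical.

```python
import numpy as np


def fine_to_coarse_split(imfs, t_crit=1.96):
    """Index k of the first IMF at which the partial sum (finest first) has a mean
    significantly different from zero; imfs[:k] form the high-frequency part and
    imfs[k:] the low-frequency part. Returns len(imfs) if no partial sum is significant."""
    partial = np.zeros(len(imfs[0]))
    for k, imf in enumerate(imfs):
        partial = partial + imf
        t_stat = partial.mean() / (partial.std(ddof=1) / np.sqrt(len(partial)))
        if abs(t_stat) > t_crit:  # ordinary t-test; ignores autocorrelation (assumption)
            return k
    return len(imfs)


t = np.arange(400)
imfs = [np.sin(2 * np.pi * t / 20), np.sin(2 * np.pi * t / 100), 5 + np.sin(2 * np.pi * t / 400)]
k = fine_to_coarse_split(imfs)
high_frequency = np.sum(imfs[:k], axis=0)
low_frequency = np.sum(imfs[k:], axis=0)
```

Used for noise selection, the high-frequency part identified this way is the natural candidate for removal.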

Lahmiri (2016) presents a new time series forecasting model which integrates variational mode decomposition (VMD) and a general regression neural network (GRNN). Three benchmark models are applied: EMD-GRNN, FFNN and ARIMA. Daily data of WTI, CANUS and the Volatility Index from 2008.01.02. to 2013.12.16. are used to conduct the experiments. Two main steps are involved: EMD or VMD is applied to the original data to obtain components, which are then fed to the GRNN for forecasting. The author demonstrated the superiority of the VMD-based method over the three competing approaches and concluded that VMD is an effective technique for the analysis and prediction of economic and financial time series. Unlike EMD, VMD is able to separate tones of similar frequencies and is more robust to noisy data.

Table 1 summarizes the main articles mentioned in this section and gives further details of the research papers. Three rows of the table (Guo et al., 2012; Xiong et al., 2014; Afanasyev & Fedorova, 2016) contain articles that are not discussed in the literature review; however, the decomposition methods they use can be useful for financial time series forecasting.

This section introduced the most important research papers that readers will most frequently encounter. The above review helps the reader become familiar with the current trends in financial time series forecasting, particularly with decomposition-based prediction strategies. Although promising results can be found in the literature, signal processing methods are not widespread in economic/financial time series forecasting, because the articles are often neither reproducible nor comparable. A general research framework is missing from the literature, which could help categorize the articles, determine the parameters necessary for reproduction and foster the comparability of studies.


1. Table: Details of the research papers described in the literature review, Source: Own table

| Author | Data | Frequency | Window | Decomposition method | Stopping criterion | Noise selection / Noise reduction | Aggregation | Prediction horizon | Main prediction model |
|---|---|---|---|---|---|---|---|---|---|
| Yousefi et al. (2005) | WTI spot price, NYMEX futures | Monthly | 100 random samples | Daubechies' wavelet | Expert judgement | Not used | Signal processing inverse | 1, 2, 3, 4 | Spline, trigonometric fit |
| Yu et al. (2008) | WTI spot price, Brent spot price | Daily | NaN | EMD | Residual based | Not used | Learning | 1, 30 | FNN (ensemble: ALNN) |
| Jammazi & Aloui (2012) | WTI spot price | Monthly | NaN | Haar à trous wavelet | Expert judgement | Expert judgement / Drop D1-D6 | Signal processing inverse | 19 | MBPNN |
| Lin et al. (2012) | FX rates | Daily | NaN | EMD | Residual based | Not used | Sum of components | NaN | LSSVR |
| Guo et al. (2012) | Wind speed | Monthly/Daily | NaN | Modified EMD | Residual based | Expert judgement / Drop more freq. | Sum of components | 1, 18 | FNN |
| Bekiros & Marcellino (2013) | FX rates, volatility, return | Daily | Rolling | Shift invariant DWT | Expert judgement | Not used | Signal processing inverse | 1, 2, 3, 4, 5 | Spline, ARIMA |
| Xiong et al. (2013) | WTI spot price | Weekly | Multiple window type | SBM-EMD | NaN | Not used | Learning | 1, 4, 8, 12, 16, 20, 24 | FNN (ensemble: FNN) |
| Shu-ping et al. (2014) | WTI spot price | Monthly | NaN | EMD | NaN | Not used | Learning | 12, 23 | SVM, NN, ARIMA (ensemble: SVM) |
| Xiong et al. (2014) | NN3 competition | Monthly | NaN | EMD, Daubechies' wavelet | Expert judgement | Not used | Learning | 1 | SVR (ensemble: SVR) |
| Yu et al. (2015) | WTI spot price, Brent spot price | Weekly | NaN | EEMD | NaN | Not used | Sum of components | 1, 2, 3, 4 | LSSVR, ANN |
| Zhu et al. (2016) | WTI spot price, CO2 EUA | Daily | Rolling | ESE-EEMD | Expert judgement | Not used | Sum of components | 1 | ARIMA, LSSVM |
| Lahmiri (2016) | WTI spot price, FX rates, VIX | Monthly/Daily | NaN | VMD | Residual based | Not used | Learning | 1 | GRNN |
| Afanasyev & Fedorova (2016) | Power exchange | Daily | Rolling | CEEMDAN | Expert judgement | Not used | NaN | 1 | NaN |



3. Critical review

The main purpose of this section is to provide a critical review of the literature, focusing primarily on its mistakes and shortcomings. The criticisms are based on the research papers discussed in the literature review. The section can help researchers form an opinion on the results of studies, and it can foster the application of signal processing in economic/financial time series forecasting.

First and foremost, there is no general research framework that summarizes the main results and conclusions of studies, the relevant research questions and the possible research designs, nor is there a review paper in the decomposition-based forecasting literature that does so. A general research framework could solve these problems. This study intends to provide such a framework for the current and future literature of economic/financial time series forecasting in Section 4, and it also presents the possible research questions and designs in Section 5.

Most of the papers cannot be reproduced because they lack the necessary model parameters, data or code. Although decomposition based economic/financial time series forecasting studies report promising results, their external and internal validity is low. The most frequently missing elements of research descriptions are the window size and type used for decomposition and prediction. The data, the packages/toolboxes used for the analysis and a model description including parameter selection should be provided. If studies were easily reproducible, considerable progress could be made in applying these methods to economic/financial time series analysis.

The stopping criteria of decomposition methods are not analyzed thoroughly, and their instability is frequently ignored. Analyzing the connection between the number of components and the characteristics of a time series is a prerequisite for the spread of signal processing methods in economic/financial time series analysis. A well-designed robustness check can address this problem, for example by changing the data type (return or price), the window size and type, or the frequency of the data.

A common mistake is to ignore look-ahead bias: the models use information that would not have been available at the time of the forecast. This inflates prediction accuracy and yields seemingly accurate results that are actually unreliable. The decomposition result is highly sensitive to the window; therefore, decomposing the entire time series and then using these components for an earlier prediction is a mistake, and such results cannot be compared with those of traditional econometric tools. Judging from the descriptions and figures provided in the studies, researchers do not analyze the number of components during the prediction process, although it is a crucial element of the analysis. The number of components should be analyzed thoroughly, since it changes as the window used for decomposition rolls or expands (e.g. in the case of EMD). Since wavelet decomposition has a predetermined number of components, the information content of the components should be analyzed throughout the prediction horizon.
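The look-ahead problem can be illustrated with a small sketch. As an assumption, a centred moving average stands in for the low-frequency component of a decomposition, because, like the spline envelopes of EMD, it uses observations on both sides of time t. The function names `smooth_component`, `leaky_feature` and `honest_feature` are hypothetical and the data are simulated.

```python
import numpy as np


def smooth_component(x, half_width=2):
    """Centred moving average, a stand-in (assumption) for the low-frequency
    component of a decomposition; near the edges it averages what is available."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for i in range(len(x)):
        lo, hi = max(0, i - half_width), min(len(x), i + half_width + 1)
        out[i] = x[lo:hi].mean()
    return out


def leaky_feature(x, t):
    """Decompose the whole sample once, then read the component at time t.
    The value depends on x[t+1], ..., x[t+half_width]: look-ahead bias."""
    return smooth_component(x)[t]


def honest_feature(x, t):
    """Decompose only the data available at time t (expanding window)."""
    return smooth_component(x[: t + 1])[t]


rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=50))   # simulated random-walk "price"
x_alt = x.copy()
x_alt[31:] += 10.0                   # change only the observations after t = 30

print(leaky_feature(x, 30) == leaky_feature(x_alt, 30))    # False
print(honest_feature(x, 30) == honest_feature(x_alt, 30))  # True
```

Changing only future observations alters the "leaky" feature at t = 30 but not the one computed on the window ending at t, which is why the decomposition has to be recomputed inside every forecasting window.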

It is rarely explained which model should be fitted to which component (low-, medium-, high-frequency etc.). The statistical analysis of the components is often missing from studies. Reconstructing the components into low-, medium- and high-frequency components is a frequently used method; however, it is difficult to explain why this method should work in general. Researchers should pay more attention to the analysis of components (complexity, nonlinearity, structural breaks etc.) and choose the reconstruction method and forecasting models accordingly. This can also foster the comparison of different decomposition methods.

A frequent mistake in the literature is that different decomposition methods are compared using different prediction models; consequently, the partial effects (decomposition, reconstruction, noise reduction) are not identified properly. In this case it is impossible to decide whether the decomposition or the noise reduction improved the prediction accuracy. A properly designed study selects benchmark models in a way that separates the positive effects of decomposition and noise reduction. The choice of benchmark models is therefore crucial to separating the partial effects.

Another mistake is that the out-of-sample period is often too short. Out-of-sample evaluation shows how well the applied method performs, and the longer the out-of-sample period, the more reliable the results. It is therefore worth choosing a long out-of-sample period and repeating the analysis with both rolling and expanding windows.

Statistical comparisons of models are rarely made, so the significance of the difference between two prediction models is not checked. The Diebold-Mariano test is the most frequently used; however, the model confidence set is a better approach, as it extends the comparison beyond pairs of models. Moreover, most papers lack an economic evaluation based on the prediction models (for example, analyzing the differences between the Sharpe ratios of portfolios selected with the help of the forecasts). Statistical evaluation per se does not provide information about the economic efficiency and applicability of the models.
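A minimal sketch of the Diebold-Mariano test mentioned above, assuming squared-error loss, the asymptotic normal approximation and a long-run variance built from autocovariances up to lag h − 1 (the usual choice for h-step-ahead forecasts). The function name is hypothetical, the forecast errors are simulated, and small-sample corrections such as that of Harvey, Leybourne and Newbold are omitted.

```python
import numpy as np
from math import erf, sqrt


def diebold_mariano(e1, e2, h=1):
    """DM test of equal predictive accuracy for two forecast-error series.

    Loss differential d_t = e1_t^2 - e2_t^2 (squared-error loss).
    Returns (statistic, two-sided p-value); a positive statistic means
    model 1 has the larger average loss.
    """
    d = np.asarray(e1, dtype=float) ** 2 - np.asarray(e2, dtype=float) ** 2
    n = d.size
    d_bar = d.mean()
    dc = d - d_bar
    # long-run variance: gamma_0 + 2 * sum_{k=1}^{h-1} gamma_k
    lrv = dc @ dc / n
    for k in range(1, h):
        lrv += 2.0 * (dc[k:] @ dc[:-k]) / n
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    stat = d_bar / sqrt(lrv / n)
    # two-sided p-value from the standard normal CDF
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(stat) / sqrt(2.0))))
    return stat, p_value


rng = np.random.default_rng(1)
e_benchmark = rng.normal(scale=2.0, size=500)   # noisier benchmark forecasts
e_decomp = rng.normal(scale=1.0, size=500)      # more accurate forecasts
stat, p = diebold_mariano(e_benchmark, e_decomp)
```

In practice `e1` and `e2` would be the out-of-sample errors of, for example, a decomposition based model and its benchmark, evaluated over the same forecast dates.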

Studies usually lack robustness checks. A robustness check can easily be done by changing the frequency of the data (intraday, daily, weekly etc.), using price series instead of return series, or changing the window size and type. Applying both linear and nonlinear prediction models is also a good strategy to check robustness. Articles often ignore how decomposition and noise reduction perform in periods with different characteristics (volatile, smooth, noisy periods). Robustness checks strengthen the reliability of results and provide more information about the decomposition strategy.

There are no articles in the decomposition based forecasting literature that apply simulation. A well-designed study is missing that analyzes how noise reduction performs on time series with different characteristics. In a simulation the data generating process can be controlled, so a comprehensive analysis of decomposition and noise reduction can be performed.
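A sketch of the simulation idea, assuming a hypothetical data generating process (linear trend, a slow cycle, a fast cycle and Gaussian white noise) chosen only for illustration. Because the noise-free signal is known, any denoising method can be scored directly against it, which is not possible with observed prices.

```python
import numpy as np


def simulate_series(n=500, noise_sd=0.5, seed=0):
    """Hypothetical data generating process with known components:
    a linear trend, a slow and a fast cycle, and additive white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    parts = {
        "trend": 0.01 * t,
        "slow_cycle": 2.0 * np.sin(2 * np.pi * t / 100),
        "fast_cycle": 0.5 * np.sin(2 * np.pi * t / 8),
        "noise": rng.normal(scale=noise_sd, size=n),
    }
    x = sum(parts.values())
    signal = x - parts["noise"]          # the noise-free truth is known
    return x, signal, parts


def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


x, signal, parts = simulate_series()
# A denoising method (hypothetical `denoise`) would be scored as
# rmse(denoise(x), signal) and compared with the raw baseline below.
baseline = rmse(x, signal)
```

Varying `noise_sd` or the cycle periods gives series with different characteristics, so the performance of noise reduction can be mapped against the properties of the data generating process.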

The definition of noise and its representation in a time series are not described thoroughly. The implementation of noise reduction can differ across prediction approaches, which makes research papers more difficult to compare. If the concept of noise is not described properly, it is difficult to measure the partial effect of noise reduction.

The criticisms expressed in this section are the author's own opinion and are not taken from any review paper. Nevertheless, avoiding the aforementioned mistakes has several benefits: it can foster the application of signal processing methods in economic/financial time series forecasting and strengthen the reliability and validity of the results.

4. Research framework for financial time series forecasting

The main purpose of this section is to provide a research framework for the current and future literature of financial time series forecasting. The section emphasizes how difficult it is to interpret and reproduce an article without a proper general framework. Such a framework is currently missing from the literature, despite its many advantages. It paves the way for comparing papers in the field of financial time series forecasting and gives a road map for future research.



The framework provides the following advantages: (1) it facilitates the aggregation of the results of the current literature and thus helps us better understand the efficiency of signal processing techniques in financial time series analysis; moreover, it paves the way for a meta-study in which the current results can be combined. (2) It helps with formulating the research design, since it is easier to design research when the general framework of the field is known. (3) It facilitates comparing and classifying research papers, since it provides the necessary groupings for classification. (4) It helps researchers specify all the necessary details or parameters of their research design, thereby facilitating the paper's reproduction. (5) It helps determine the reliability of the results presented in a research paper. All in all, the framework fills a gap in the current literature and opens up opportunities for further research.

Based on the papers described in the literature review, there are four broad research designs. The designs are depicted in figure 1; although figure 1 simplifies the research approaches, its clarity makes them easy to understand. As a first step, all of the approaches decompose the original signal into components. The first design applies noise selection and noise reduction in the second stage in order to enhance prediction performance, and then applies the signal processing inverse to obtain the denoised signal. If the denoising method is well designed, the resulting signal should be less complex and hopefully easier to predict. The second design also decomposes the signal in the first stage, then predicts the future value of each component in the second stage, and finally applies the signal processing inverse to obtain the predicted values. This design makes it possible to predict different components with different prediction methods (e.g. an ANN for highly irregular components and linear regression for a smooth component). Reconstructing the components into low-, medium- and high-frequency components is a frequently applied technique in the literature. The third design decomposes the signal in the first stage and predicts the components in the second stage (reconstruction of the components can be used here as well); however, instead of using the inverse of the signal processing method, it builds a new prediction model that uses the predictions of the second stage as input variables. This design can put different weights on the predicted values of the second stage; however, it is a complex prediction approach and its application should be well-founded. The fourth design first applies a decomposition method and then uses the components as input variables of a prediction model. Of course, the reconstruction of components into low-, medium- and high-frequency components can be used here as well. Nevertheless, the fourth design is the least frequently used design in the literature.
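The first, second and fourth designs can be contrasted in a toy sketch. Assumptions: a causal (trailing) moving average stands in for the decomposition, splitting the series into a low- and a high-frequency part; an OLS AR(1) stands in for the prediction models; all function names are hypothetical. The third design would replace the final sum in `design2_predict_components_then_sum` with a fitted combination model.

```python
import numpy as np


def trailing_ma(x, w):
    """Causal moving average, a stand-in (assumption) for the low-frequency
    component; x - trailing_ma(x, w) is the high-frequency part."""
    x = np.asarray(x, dtype=float)
    c = np.concatenate(([0.0], np.cumsum(x)))
    lo = np.maximum(np.arange(len(x)) - w + 1, 0)
    hi = np.arange(1, len(x) + 1)
    return (c[hi] - c[lo]) / (hi - lo)


def ar1_forecast(y):
    """One-step-ahead forecast from an AR(1) with intercept fitted by OLS."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return float(beta[0] + beta[1] * y[-1])


def design1_denoise_then_predict(x, w=5):
    # drop the high-frequency part as noise, then predict the denoised series
    return ar1_forecast(trailing_ma(x, w))


def design2_predict_components_then_sum(x, w=5):
    low = trailing_ma(x, w)
    high = np.asarray(x, dtype=float) - low
    # the inverse of an additive decomposition is the sum of component forecasts
    return ar1_forecast(low) + ar1_forecast(high)


def design4_components_as_inputs(x, w=5):
    x = np.asarray(x, dtype=float)
    low = trailing_ma(x, w)
    high = x - low
    # components at t are regressors for the original series at t + 1
    X = np.column_stack([np.ones(len(x) - 1), low[:-1], high[:-1]])
    beta, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    return float(beta @ np.array([1.0, low[-1], high[-1]]))
```

Keeping the decomposition and the prediction model fixed while switching only the design is one way to isolate the partial effects discussed in section 3.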

1. Figure: Simplified representation of the four broad research designs, Source: Own figure

The detailed representation of the research designs is shown in figure 2. This paper proposes the structure introduced in the figure as the general research framework for decomposition based financial time series forecasting. The dark green boxes represent the main stages of a prediction process. The first stage is data selection, the second is decomposition, and the third is frequency selection. After frequency selection we arrive at the fourth stage, reconstruction. Researchers should then choose the number of models in the fifth stage, design the prediction thoroughly in the sixth stage, and finally select the aggregation method in the seventh stage. The framework is sufficiently detailed to categorize research papers; moreover, it defines all the parameters that should be given in any research paper to ensure comparability and replication. Furthermore, the framework makes it easier to determine the research question of an article.

The first stage involves the selection of data type, data frequency and window. Returns and price levels should be treated separately, because the same data have different characteristics when expressed in returns and in price levels. Another important aspect of data selection is frequency, because a model that fits weekly data best is not necessarily the best on intraday data. The window size and type (rolling or expanding) should also be given, because some methods are highly sensitive to these parameters; the size and type of the window are the most frequently missing elements of a research design. Here, bootstrap means selecting random samples of consecutive observations of equal length. The elements of the first stage (data type, frequency and window) can also be used for robustness checks, which are rarely done in research.
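The three window types of the first stage can be written down as index generators. This is a sketch with hypothetical function names; in particular, the block sampling scheme for the bootstrap box is an assumption based on the one-sentence description above. Each pair `(start, end)` denotes the training window `x[start:end]`, and the decomposition would be recomputed on every window.

```python
import numpy as np


def rolling_windows(n, size, step=1):
    """Fixed-length training windows x[start:end]; the forecast target is x[end]."""
    return [(end - size, end) for end in range(size, n, step)]


def expanding_windows(n, min_size, step=1):
    """Training windows that always start at 0 and grow by `step`."""
    return [(0, end) for end in range(min_size, n, step)]


def bootstrap_windows(n, size, n_samples, seed=0):
    """Random blocks of `size` consecutive observations (assumed sampling scheme)."""
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, n - size + 1, size=n_samples)
    return [(int(s), int(s) + size) for s in starts]


print(rolling_windows(8, 5))    # [(0, 5), (1, 6), (2, 7)]
print(expanding_windows(8, 5))  # [(0, 5), (0, 6), (0, 7)]
```

Reporting `size`, `min_size` and `step` alongside the results is exactly the information that is most often missing from the studies classified here.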

The second stage is the decomposition. In this stage a broad method family, the exact decomposition method and the stopping criterion should be selected. In the current literature there are two frequently used approaches: empirical mode decomposition (EMD) and wavelet based decomposition. Both methods have improved modifications which, in theory, have better characteristics; however, few papers in the literature analyze the partial effect of choosing an improved modification instead of the simplest version. These modifications are listed in column II. b). The variational mode decomposition (VMD) method in column II. a) has been proposed as an alternative to EMD that can separate tones of similar frequencies in data where EMD fails. This paper lists VMD and EMD separately because VMD is based on a different algorithm. EMD is the simplest version of its decomposition family and is described later in this paper. The slope based modification of EMD intends to handle the end effect problem of the simple EMD, while ensemble empirical mode decomposition (EEMD) intends to handle the potential mode mixing problem of EMD. Extrema symmetry expansion EEMD is a modified version of EEMD that addresses both the mode mixing and the end effect problems. Nevertheless, EEMD introduces additional noise into the results of the decomposition and does not produce a stable number of IMFs when applied repeatedly to the same time series. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) was introduced in the literature to solve this problem.

The third decomposition method discussed in this paper is the discrete wavelet transform (DWT), which is described in detail later. The most frequently applied wavelets in the literature are the Daubechies and Haar wavelets. However, the classical decimated DWT subsamples the filter output to half of its original length, which leads to a serious drawback: the transform is not shift invariant. Specifically, the DWT of a shifted signal is not the shifted version of the DWT of the signal. An undecimated DWT, however, can be implemented without subsampling, and it is invariant to circular shifts of the time series in the sense that circularly shifting the input circularly shifts the coefficients. A variation of the undecimated DWT, the shift invariant DWT (SIDWT), has been proposed in the literature. Besides being shift invariant, SIDWT employs a specialized periodic extension pattern to deal with boundary effects. However, SIDWT is not an orthogonal transform, since it produces an over-determined (redundant) representation of the series. The SIDWT method is also described later in this paper.

After the decomposition method has been chosen, a stopping criterion should be selected. Papers that apply certain thresholds, stop after $\log_2 N$ components, fix the maximum number of sifting iterations or use a predetermined decomposition order are classified in the expert judgement group. The residual based stopping criterion is applicable only to the EMD family and is described later in this paper. Some papers pursue an optimal decomposition by minimizing an entropy-related criterion, which describes the information-relevant properties of the representation of a signal.
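The shift-variance drawback of the decimated DWT can be checked numerically. A sketch using only the Haar wavelet and a single decomposition level, assuming periodic extension for the undecimated version; it illustrates the basic undecimated idea only, not the SIDWT of the cited studies, and the function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt_level(x):
    """One level of the classical decimated Haar DWT (even-length input):
    filter, then keep every second output."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / SQRT2
    detail = (x[0::2] - x[1::2]) / SQRT2
    return approx, detail


def haar_undecimated_level(x):
    """One level of an undecimated Haar transform with periodic extension:
    the same filters, but no subsampling (output length = input length)."""
    x = np.asarray(x, dtype=float)
    x_next = np.roll(x, -1)  # x[t + 1], wrapping around at the end
    return (x + x_next) / SQRT2, (x - x_next) / SQRT2


rng = np.random.default_rng(2)
x = rng.normal(size=16)
x_shifted = np.roll(x, 1)  # circular shift by one observation

_, d = haar_dwt_level(x)
_, d_shifted = haar_dwt_level(x_shifted)
# decimated: no circular shift of d reproduces the coefficients of the shifted input
print(any(np.allclose(d_shifted, np.roll(d, k)) for k in range(d.size)))  # False

_, u = haar_undecimated_level(x)
_, u_shifted = haar_undecimated_level(x_shifted)
print(np.allclose(u_shifted, np.roll(u, 1)))  # True
```

Shifting the input by one observation scrambles the decimated detail coefficients (they now pair different observations), while the undecimated coefficients are simply shifted, at the price of a redundant representation twice as long.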

The third stage is frequency selection, which starts with noise selection. Here, the expert judgement box contains all papers that selected a certain component or components as noise without analysis (e.g. the highest frequency component). A noise component can be selected with the help of the partial autocorrelation function (PACF), which measures the linear relation of a time series with its own lagged values after the effects of the intermediate lags have been filtered out. The run length judgement method is a tool for measuring the irregularity of a signal: it assigns a run number to the signal, and the larger the number, the higher the volatility. Another way of selecting noise is an entropy related approach; the permutation entropy, the sample entropy and the Shannon entropy are possible tools for noise selection. Other entropy definitions exist as well, but column III. a) lists only those mentioned in at least one of the papers in Table 1. Many papers do not apply any noise selection method; they are classified into the 'skip noise selection' category. After the noise component or components have been selected, one or more components can be dropped. A 'no drop' box is introduced to classify papers that skipped the noise selection procedure.
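Two of the noise selection measures named above can be computed in a few lines. A sketch, assuming the Bandt-Pompe definition of permutation entropy (order 3, delay 1, normalized to [0, 1]) and, for the run length judgement, one common variant that counts runs of observations above and not above the mean; the exact variant used in the cited papers may differ, and the function names are hypothetical.

```python
import numpy as np
from collections import Counter
from math import factorial, log


def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Bandt-Pompe permutation entropy: Shannon entropy of the ordinal
    patterns of `order` points spaced `delay` apart (1 = most irregular)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    patterns = Counter(
        tuple(np.argsort(x[i : i + order * delay : delay], kind="stable"))
        for i in range(n)
    )
    p = np.array(list(patterns.values()), dtype=float) / n
    h = float(-np.sum(p * np.log(p)))
    return h / log(factorial(order)) if normalize else h


def run_count(x):
    """Number of runs of consecutive observations above / not above the mean
    (one common variant of the run length judgement, assumed here)."""
    above = np.asarray(x, dtype=float) > np.mean(x)
    return 1 + int(np.count_nonzero(above[1:] != above[:-1]))
```

Applied to each IMF or wavelet detail series, a permutation entropy close to 1 or a high run count flags a candidate noise component; the cut-off remains a research design choice that should be reported.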

The fourth stage is reconstruction. In this stage one should select the reconstruction type and rule. Total aggregation means aggregating the components back to the original level; this box is for articles that drop a noise component, aggregate the remaining components and analyze the denoised time series later on. Several papers reconstruct the components into low-, medium- and high-frequency components in order to analyze different features of the signal separately and improve prediction performance. The 'no reconstruction' box is for papers that do not use reconstruction. Several reconstruction rules can be applied. Expert judgement incorporates all papers that use reconstruction without analysis. The run length judgement method is the same as in the case of noise selection and can also be used as a reconstruction rule. In the data characteristic driven reconstruction rule, the decomposed modes are thoroughly analyzed to explore hidden data characteristics (complexity, cyclicity, mutability, tendency) and are reconstructed accordingly. The fine-to-coarse reconstruction rule works as a high-pass filter that adds fast oscillations (IMFs with smaller index) progressively up to slow ones (IMFs with larger index): the first few components are summed, and a t-test identifies how many components can be summed without the mean of the partial sum departing significantly from zero. These components are reconstructed into a high-frequency component, and the rest of the IMFs into a low-frequency component. A clustering method applied to statistics calculated from each component can be used as well.
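A sketch of the fine-to-coarse rule described above, assuming the IMFs are already available as rows of an array ordered from fastest to slowest, and using a normal critical value of 1.96 instead of the exact t quantile. The function names and the synthetic test components are hypothetical.

```python
import numpy as np


def fine_to_coarse_split(imfs, t_crit=1.96):
    """Return k such that IMF_1..IMF_k form the high-frequency component.

    imfs: array of shape (K, T), IMF_1 (fastest) in the first row.
    Partial sums IMF_1 + ... + IMF_j are tested for zero mean with a one-sample
    t-statistic; the first j whose mean departs significantly from zero ends
    the high-frequency group. t_crit = 1.96 is a normal approximation (assumption).
    """
    imfs = np.asarray(imfs, dtype=float)
    for j in range(1, imfs.shape[0] + 1):
        s = imfs[:j].sum(axis=0)
        t_stat = s.mean() / (s.std(ddof=1) / np.sqrt(s.size))
        if abs(t_stat) > t_crit:
            return j - 1
    return imfs.shape[0]


def reconstruct(imfs, residual, k):
    """High-frequency part = first k IMFs; low-frequency part = the rest + residual."""
    imfs = np.asarray(imfs, dtype=float)
    return imfs[:k].sum(axis=0), imfs[k:].sum(axis=0) + residual


t = np.arange(1000)
imfs = np.array([
    np.sin(2 * np.pi * t / 5),        # zero-mean fast oscillation
    np.sin(2 * np.pi * t / 50),       # zero-mean medium oscillation
    5 + np.sin(2 * np.pi * t / 500),  # slow component with a non-zero level
])
k = fine_to_coarse_split(imfs)
high, low = reconstruct(imfs, np.zeros(t.size), k)
```

Here the first two partial sums stay centred on zero, while adding the third shifts the mean, so the first two components form the high-frequency part.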

In the fifth stage one should choose the number of models used for prediction. The 'one model' box typically contains papers that decompose the original data, drop a noise component and then aggregate the components back to the original level. In the case of the 'same models' and 'different models' boxes, researchers build multiple prediction models. For example, a paper classified into the 'same models' box decomposes the original data and builds an ANN for each component, while a paper from the 'different models' box builds an ANN for one component, an SVM for another, etc.

The sixth stage is prediction. Here the window, the prediction horizon, the prediction model, the feature selection method and the hyperparameter optimization should be selected. Researchers can select the 'same' window if they want to use the same type as in column I. c), or they can choose a different one. The prediction horizon can be set to one or multiple periods. Column VI. c) lists all prediction models used in the papers introduced in Table 1. The feature selection column lists the methods that can be used for selecting input variables; here, expert judgement contains all papers that selected input variables without analysis. Using an ANN typically involves input selection through optimization on a validation set. It is important to point out that most papers do not select hyperparameters through optimization on a validation set; instead, they use a predetermined model architecture.



The seventh stage is aggregation. The 'no aggregation' box is for studies that aggregated the decomposed signal in an earlier stage (first design) or use the components as input variables to predict the future value of the original signal directly (fourth design). The 'prediction' box incorporates studies in which a new model is fitted to the predicted values obtained from each of the components (third design). Some papers apply the inverse of the decomposition method at the end to obtain the predicted values (second design); these are classified into the 'signal processing inverse' box. This inverse is the inverse wavelet transform in the case of wavelets and the summation of the components in the case of the EMD family.
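The difference between the 'signal processing inverse' and 'prediction' boxes can be sketched as follows. Assumptions: the decomposition is additive (EMD-type), so its inverse is a plain sum; for the 'prediction' box a linear regression with intercept is used purely for illustration, whereas the studies in Table 1 use learners such as SVR, SVM or FNN ensembles. The function names are hypothetical.

```python
import numpy as np


def aggregate_by_sum(component_forecasts):
    """'Signal processing inverse' for an additive (EMD-type) decomposition."""
    return np.sum(np.asarray(component_forecasts, dtype=float), axis=0)


def fit_combination(component_forecasts, target):
    """'Prediction' box: a new model fitted on the component forecasts.
    A linear regression with intercept is assumed here for illustration."""
    F = np.asarray(component_forecasts, dtype=float).T   # shape (T, K)
    X = np.column_stack([np.ones(F.shape[0]), F])
    beta, *_ = np.linalg.lstsq(X, np.asarray(target, dtype=float), rcond=None)
    return beta


def predict_combination(beta, component_forecasts):
    F = np.asarray(component_forecasts, dtype=float).T
    return beta[0] + F @ beta[1:]
```

The combination weights have to be estimated on forecasts that precede the evaluation period; fitting them on the out-of-sample period itself would reintroduce look-ahead bias.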

Table 2. classifies the studies described in the literature review according to the general research framework introduced in this section. Each study is assigned seventeen numbers based on its main prediction model. Every number represents a column of figure 2: the first number shows the data type, the second the data frequency, and so on, and the last number represents the aggregation method. A number must be selected from each column, so a zero is given to a column if the researchers do not specify it. For example, the first three numbers would be 1-2-0 for a paper that analyzes daily return data but gives no information about the window. Some papers apply multiple prediction models or window types; in this case several box numbers are given for the same column.
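The seventeen-number codes of Table 2 can also be decoded programmatically, which makes a machine-readable comparison of studies possible. A sketch under these assumptions: the column labels are paraphrased from the stage descriptions in this section; each box number is a single digit, so a field such as 12 means boxes 1 and 2; 0 means "not specified"; stages are separated by "|" and columns by "-". The names `STAGE_COLUMNS` and `parse_category` are hypothetical.

```python
STAGE_COLUMNS = [
    ("I", ["data type", "data frequency", "window"]),
    ("II", ["method family", "decomposition method", "stopping criterion"]),
    ("III", ["noise selection", "drop"]),
    ("IV", ["reconstruction type", "reconstruction rule"]),
    ("V", ["number of models"]),
    ("VI", ["window", "prediction horizon", "prediction model",
            "feature selection", "hyperparameter optimization"]),
    ("VII", ["aggregation"]),
]


def parse_category(code):
    """Decode a Table 2 category string into {column: [box numbers]}.
    0 (not specified) becomes an empty list; '12' becomes [1, 2]."""
    code = code.replace("\u2013", "-")  # tolerate en-dashes used as separators
    groups = [g.replace(" ", "") for g in code.split("|")]
    if len(groups) != len(STAGE_COLUMNS):
        raise ValueError(f"expected {len(STAGE_COLUMNS)} stages, got {len(groups)}")
    decoded = {}
    for (stage, names), group in zip(STAGE_COLUMNS, groups):
        fields = group.split("-")
        if len(fields) != len(names):
            raise ValueError(f"stage {stage}: expected {len(names)} fields, got {len(fields)}")
        for name, field in zip(names, fields):
            decoded[f"{stage}. {name}"] = [int(ch) for ch in field if ch != "0"]
    return decoded


row = parse_category("2-2-0| 2-1-2| 7-1| 3-7| 2| 0-12-6-1-2| 2")
print(len(row))                        # 17
print(row["VI. prediction horizon"])   # [1, 2]
```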

This section introduced a general research framework that describes the possible research designs in decomposition based economic/financial time series forecasting. The framework can help compare papers in the field of economic/financial time series forecasting and gives a road map for future research. Furthermore, it defines all the parameters that should be given for replication. This section also classified the research papers introduced in the literature review. Based on Table 1, the window size, the window type and the stopping criterion are the parameters most frequently missing from research designs.


2. Figure: Research framework of financial time series forecasting, Source: Own figure


Author Title Category

Yousefi et al. (2005) Wavelet-based prediction of oil prices

2-4-3| 3-6-1| 7-1| 3-7| 1| 1-12-12-1-1| 3

Yu et al. (2008) Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm

2-2-0| 2-1-2| 7-1| 3-7| 2| 0-12-6-1-2| 2

Lin et al. (2012) Empirical mode decomposition–based least squares support vector regression for foreign exchange rate forecasting

2-2-0| 2-1-2| 7-2| 3-7| 2| 0-0-7-1-2| 3

Jammazi & Aloui (2012) Crude oil price forecasting: Experimental evidence from wavelet decomposition and neural network modeling

2-4-0| 3-7-1| 1-2| 3-7| 1| 0-2-6-1-2| 3

Bekiros & Marcelino (2013) The multiscale causal dynamics of foreign exchange markets

1-2-1| 3-8-1| 7-1| 3-7| 1| 1-12-24-3-1| 3

Xiong et al. (2013) Beyond one-step-ahead forecasting: Evaluation of alternative multi-step-ahead forecasting models for crude oil prices

2-3-12| 2-2-0| 7-2| 3-7| 2| 1-12-6-3-2| 2

Shu-ping et al. (2014) Multiscale Combined Model Based on Run-Length-Judgment Method and Its Application in Oil Price Forecasting

2-4-0| 2-1-0| 7-2| 2-2| 3| 0-2-468-3-2| 2

Yu et al. (2015) A decomposition–ensemble model with data-characteristic-driven reconstruction for crude oil price forecasting

2-3-0| 2-3-0| 7-2| 2-2345| 2| 0-12-67-1-12| 3

Zhu et al. (2016) An Adaptive Multiscale Ensemble Learning Paradigm for Nonstationary and Nonlinear Energy Price Time Series Forecasting

2-2-2| 2-4-1| 7-2| 2-4| 3| 1-1-47-1-2| 3

Lahmiri (2016) A variational mode decomposition approach for analysis and forecasting of economic and financial time series

2-23-0| 12-1-2| 7-2| 3-7| 1| 0-1-6-1-1| 2

2. Table: Classification of researches introduced in the literature review, Source: Own table



5. Possible research questions

This section presents some of the possible research questions and designs that can be formulated based on the framework. Its main purpose is to briefly introduce research questions that can address the problems raised in the critical review. These are potential articles that could make significant progress in the economic/financial time series forecasting literature.

Analyzing the stopping criterion of the EMD model family. This problem is important because the stopping criterion can influence the number of components. There are cases where the spline fit cannot be done or the threshold has to be changed to ensure convergence. The analysis should take into account the window size and type, the data frequency, and volatile and smooth periods.

Investigating the sensitivity of the decomposition of economic/financial time series to the selected window. In this case one should test how sensitive the EMD and wavelet resolutions are: how the number of components changes when the window rolls or expands in the case of EMD, and how the information content of the components changes in the case of wavelets. Since the number of components has to be selected in advance when applying wavelet decomposition, the information content of these components should be considered.

Comparing noise selection methods and analyzing their effect on prediction accuracy. In this research project one should summarize the possible noise definitions for the four research designs (introduced in section 4) and determine which noise selection method is appropriate in each case.

Comparing different reconstruction methods. In this research project one should test which reconstruction method is the most efficient, how many reconstructed components (low, medium, high etc.) should be formed, and which characteristics should be used to reconstruct the components.

Analyzing the partial effects of decomposition, reconstruction and noise reduction separately. All of these steps can enhance prediction performance; however, their individual contributions to prediction accuracy are rarely analyzed.

Testing the efficiency of prediction models when the components are predicted separately. Some papers apply different prediction models to the components, while others choose a single model. It should be investigated whether it is worth choosing models based on the statistical properties of a component.



Performing an analysis based on simulated data. One should define several data generating processes that produce time series with different frequencies, then use them to create an aggregated time series. Due to the controlled nature of the analysis, the results of signal processing methods will be more reliable.

Investigating whether there is a relationship between the amplitude of noise and liquidity, volatility or volatility of volatility for time series from different asset classes.

Analyzing the contribution of decomposition methods to prediction accuracy for economic/financial time series that are rarely analyzed with signal processing methods (for example volatility indices, inflation, GDP).

Applying multi-dimensional decomposition that exploits the relations between time series. One should investigate whether the result of a simultaneous decomposition can be used to predict one or the other time series more accurately.

The potential research questions mentioned above do not claim to be exhaustive; however, they illustrate that several studies are missing from the literature. The author believes that the results of such research could considerably advance the decomposition based economic/financial time series forecasting literature.

6. Decomposition methods

This section provides a detailed introduction to the decomposition methods, namely empirical mode decomposition (EMD) and wavelet based decomposition. These are the two methods used to analyse the original signal in a new representation.

6.1. Empirical mode decomposition

The empirical mode decomposition (EMD) method first appeared in the article of Huang et al. (1998). They introduced a new method to deal with both non-stationary and nonlinear data by first decomposing the signal and analysing the physical meaning of the decomposition later.

EMD is intuitive, direct, a posteriori and adaptive, with the basis of the decomposition derived from the data. The basic principle of EMD is to decompose the signal into a sum of oscillatory functions, namely intrinsic mode functions (IMFs). The decomposition is based on three assumptions: (1) the signal has at least two extrema, one maximum and one minimum; (2) the characteristic time scale is defined by the time lapse between the extrema; (3) if the data have no extrema but contain only inflection points, they can be differentiated once or more times to reveal the extrema (Huang et al. 1998). Huang et al. (1999) introduced two requirements for obtaining meaningful IMFs: (1) in the whole data series, the number of extrema (the sum of maxima and minima) and the number of zero crossings must be equal or differ at most by one; (2) at every point, the mean value of the envelopes defined by the local maxima and the local minima must be zero. However, the orthogonality of the components is not guaranteed theoretically. For some data, neighboring components can have sections carrying the same frequency at different time durations. The amount of leakage usually depends on the length of the data as well as on the decomposition results. However, Huang et al. (1998) argue that orthogonality is a requirement only for linear decomposition systems and would not make physical sense for a nonlinear decomposition such as EMD.

The different scales can be identified directly in two ways: first, by the time lapse between successive alternations of local maxima and minima, and second, by the time lapse between successive zero crossings. Huang et al. (1998) adopted the time lapse between successive extrema as the definition of the time scale of the intrinsic oscillatory mode. This choice is beneficial because it gives a fine resolution of the oscillatory mode.

One can extract the scales by the sifting process. Any data series $x(t)$ $(t = 1, 2, \dots, n)$ can be decomposed according to the following sifting procedure (Yu et al., 2008):

1) Identify all the local extrema, including the local maxima and local minima of the time series $x(t)$.

2) Connect all local maxima and, separately, all local minima by cubic splines to generate the upper and lower envelopes $X_{up}(t)$ and $X_{low}(t)$.

3) Compute the point-by-point envelope mean $m(t) = \frac{X_{up}(t) + X_{low}(t)}{2}$.

4) Extract the details: $c(t) = x(t) - m(t)$. Steps 1) to 4) are plotted on Figure 3.

5) Check the properties of $c(t)$. If $c(t)$ meets the two requirements of Huang et al. (1999), an IMF is derived and $x(t)$ should be replaced with the residual $r(t) = x(t) - c(t)$. If $c(t)$ is not an IMF, $x(t)$ should be replaced with $c(t)$.

One has to repeat steps 1) to 5) until a stopping criterion is satisfied.
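The sifting steps above can be sketched compactly. This is a simplified illustration under stated assumptions, not the implementation used in the cited studies: end effects are handled crudely by pinning both envelopes to the first and last observations; instead of checking the two IMF requirements in step 5), each candidate is accepted once an SD-type criterion falls below 0.25 (within the 0.2 to 0.3 range recommended by Huang et al. (1998), but computed as a global rather than point-wise ratio); the outer loop stops with the residual based rule or after a maximum number of IMFs. It relies on `scipy.interpolate.CubicSpline`; the function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def local_extrema(x):
    """Step 1: indices of strict interior local maxima and minima."""
    i = np.arange(1, len(x) - 1)
    maxima = i[(x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])]
    minima = i[(x[1:-1] < x[:-2]) & (x[1:-1] < x[2:])]
    return maxima, minima


def envelope_mean(x):
    """Steps 2-3: cubic-spline envelopes through maxima and minima and their mean.
    End points are simply added as knots (a crude end treatment, assumed)."""
    t = np.arange(len(x))
    maxima, minima = local_extrema(x)
    up = np.r_[0, maxima, len(x) - 1]
    low = np.r_[0, minima, len(x) - 1]
    x_up = CubicSpline(up, x[up])(t)
    x_low = CubicSpline(low, x[low])(t)
    return (x_up + x_low) / 2.0


def emd(x, max_imfs=8, max_sift=50, sd_tol=0.25):
    """Return (imfs, residual) with imfs.sum(axis=0) + residual == x."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    while len(imfs) < max_imfs:
        maxima, minima = local_extrema(residual)
        # residual based stop: monotonic or at most one local extremum
        if len(maxima) + len(minima) <= 1:
            break
        c = residual.copy()
        for _ in range(max_sift):
            c_new = c - envelope_mean(c)                      # step 4
            sd = np.sum((c - c_new) ** 2) / np.sum(c ** 2)   # global SD ratio
            c = c_new
            if sd < sd_tol:
                break
        imfs.append(c)
        residual = residual - c                               # step 5
    return np.array(imfs), residual


t = np.arange(500)
x = np.sin(2 * np.pi * t / 10) + 0.5 * np.sin(2 * np.pi * t / 60) + 0.01 * t
imfs, residual = emd(x)
```

By construction the IMFs and the residual add back to the input exactly, which is the "sum of components" inverse used by the EMD family; dedicated packages add proper boundary handling and the IMF checks omitted here.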



The typical stopping criteria can be classified into three groups: (1) residual based, (2) expert judgement, (3) entropy based. According to the residual based criterion, the above algorithm should be stopped when the final residual $r(t)$ becomes a monotonic function or has at most one local extremum. This criterion was suggested by Huang et al. (1999), and it was also used as a stopping criterion by Lin et al. (2012), Yu et al. (2008) and Guo et al. (2012).

Rilling et al. (2003) introduce the mode amplitude $a(t) := \frac{X_{up}(t) - X_{low}(t)}{2}$ and the evaluation function $\sigma(t) := \left| \frac{m(t)}{a(t)} \right|$. Sifting is then iterated until $\sigma(t) < \theta_1$ for a prescribed fraction $(1 - \alpha)$ of the total duration, while $\sigma(t) < \theta_2$ for the remaining fraction. The threshold $\theta_1$ guarantees globally small fluctuations in the mean, while $\theta_2$ takes locally large excursions into account. Typical values are $\theta_1 = 0.05$, $\theta_2 = 0.5$ and $\alpha = 0.05$.
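The criterion of Rilling et al. (2003) reduces to one check on the two envelopes of the current candidate. The sketch below is illustrative: the function name `rilling_stop` is hypothetical, the mode amplitude is written `a` in the code, the small constant `eps` guarding against a zero amplitude is an assumption, and the defaults are the typical values quoted above.

```python
def rilling_stop(x_up, x_low, theta1=0.05, theta2=0.5, alpha=0.05, eps=1e-12):
    """Return True when sifting may stop under the Rilling et al. (2003) rule."""
    sigma = []
    for u, v in zip(x_up, x_low):
        m = (u + v) / 2                       # envelope mean m(t)
        a = (u - v) / 2                       # mode amplitude a(t)
        sigma.append(abs(m) / max(abs(a), eps))  # evaluation function sigma(t)
    # Share of the duration on which sigma(t) is NOT below theta1
    share_above_theta1 = sum(s >= theta1 for s in sigma) / len(sigma)
    # sigma < theta1 on at least a (1 - alpha) share, and sigma < theta2 everywhere
    return share_above_theta1 <= alpha and all(s < theta2 for s in sigma)
```

For example, symmetric envelopes (`x_up = [1.0] * 100`, `x_low = [-1.0] * 100`) give $\sigma(t) = 0$ everywhere and the function returns `True`.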

Several approaches in the literature use a stopping criterion with a threshold determined by the researcher. Lahmiri (2016) computed the standard deviation (SD) from two consecutive sifting results; according to this approach, the sifting process is stopped when the SD is smaller than an arbitrarily small number¹. Huang et al. (1998) emphasize that carrying the sifting process to an extreme could turn the resulting IMF into a pure frequency-modulated signal of constant amplitude. To ensure that the IMF components retain enough physical meaning, the SD threshold should be set between 0.2 and 0.3. Another stopping criterion (Lahmiri, 2016) is defined by three conditions: (1) at each point, (mean amplitude) < (threshold2 × envelope amplitude); (2) the mean of the Boolean array ((mean amplitude)/(envelope amplitude) > threshold) is smaller than a tolerance; and (3) the number of zero crossings and the number of extrema differ by at most one. Here threshold, threshold2 and the tolerance are set by the researcher; Lahmiri (2016) used 0.5 for all three. Zhu et al. (2016) terminated the sifting process after a maximum of 10 sifting iterations. In Xiong et al. (2014) the whole sifting process stops after $\log_2 N$ IMFs have been extracted, where $N$ is the length of the data series.
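The SD criterion compares two consecutive sifting results $c_{k-1}$ and $c_k$ through $SD(k) = \sum_t (c_{k-1}(t) - c_k(t))^2 / \sum_t c_{k-1}^2(t)$ and stops once $SD(k) < \varepsilon$. A minimal sketch follows; the function names are hypothetical and the default $\varepsilon = 0.2$ is simply the lower end of the 0.2-0.3 range recommended by Huang et al. (1998).

```python
def sd_criterion(c_prev, c_curr):
    """SD(k) between two consecutive sifting results c_{k-1} and c_k."""
    num = sum((a - b) ** 2 for a, b in zip(c_prev, c_curr))
    den = sum(a ** 2 for a in c_prev)
    return num / den


def should_stop(c_prev, c_curr, eps=0.2):
    """Stop sifting once SD(k) falls below the threshold eps."""
    return sd_criterion(c_prev, c_curr) < eps
```

Because the numerator measures how much the candidate still changes between two passes, a small SD means further sifting would mostly flatten the amplitude, which is the effect Huang et al. (1998) warn against.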

Tseng and Lee (2010) applied an entropy-based analysis strategy. They analyzed to what extent information relevant to the underlying functions of $x(t)$ is carried by the IMFs, and defined a normalized information scale to measure this extent. Their numerical studies showed that the scale correctly quantifies the amount of information codified in an IMF. Based on this scale, the IMFs that are information-free components can be identified.

¹ $SD(k) = \frac{\sum_{t=0}^{T} \left( c_{k-1}(t) - c_k(t) \right)^2}{\sum_{t=0}^{T} c_{k-1}^2(t)} < \varepsilon$

After the stopping criterion is satisfied, the original data series can be expressed as $x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t)$, where $n$ is the number of IMFs, $r_n(t)$ is the final residual, which is the main trend of $x(t)$, and $c_j(t)$ $(j = 1, \dots, n)$ are the IMFs. Thus, the data series is decomposed into $n$ intrinsic mode functions and one residual. The IMF components occupy different frequency bands and change with the variation of the time series $x(t)$, while $r_n(t)$ represents the central tendency of the data (Yu et al., 2008).

3. Figure: The upper and lower envelopes and their mean, Source: Metatrader, 2012

Empirical mode decomposition has several distinct advantages, but also some serious disadvantages. On the one hand, it is relatively easy to understand and implement; the fluctuations are selected automatically and adaptively from the time series itself; it is robust for the decomposition of nonlinear and nonstationary time series; and it adaptively decomposes a time series into several IMF components and one residual component. Unlike wavelet decomposition, EMD does not require a filter base function to be determined before decomposition (Yu et al., 2008). On the other hand, the decomposition can suffer from mode mixing, which means that a single IMF contains widely disparate timescales, or similar timescales are split across different IMFs (i.e. the orthogonality condition is not satisfied) (Zhu et al., 2016). Furthermore, EMD suffers from the end effect: when the upper and lower envelopes are calculated with the cubic spline function during sifting, divergence appears at both ends of the data series and gradually propagates into its interior, greatly distorting the results (Deng et al., 2001).

6.2. Discrete wavelet based decomposition

Wavelet methodology, a refinement of Fourier analysis, is an alternative for analyzing nonstationary data with high irregularities and cyclical patterns. Wavelet multiscale decomposition allows simultaneous analysis in the time and frequency domains. It converts a signal into a series of wavelets and provides a way of analyzing waveforms that are bounded in both frequency and duration. This is why wavelet decomposition can be a valuable means of exploring the complex dynamics of financial time series (Bekiros, Marcellino, 2013). Figure 4 depicts the benefits of the wavelet transform in comparison with the time-domain representation, the Fourier transform and the short-time Fourier transform.

4. Figure: Comparison of transformations, Source: Uliha, 2016, 512.p

Figure 4 highlights that a time-domain representation carries information about the amplitude of a signal but no frequency information. The Fourier transform uses a basis of sines and cosines of different frequencies to determine how much of each frequency the signal contains. Since it does not allow the frequency content of the signal to change over time, it can tell us how much of each frequency exists in the signal, but not when in time these frequency components occur. To overcome this limitation, the short-time Fourier transform (STFT) was proposed. It applies a short window to the signal and performs the Fourier transform within this window as it slides across the data. However, any time-frequency analysis is limited by the Heisenberg uncertainty principle, which states that it is impossible to know simultaneously the exact frequency and the exact time of occurrence of this frequency in a signal (i.e. there is a trade-off between time and frequency resolution). The problem with the STFT is that it uses windows of constant length. In contrast, the wavelet transform uses local base functions that can be stretched and translated, with a flexible resolution in both frequency and time. In the wavelet transform the time resolution is intrinsically adjusted to the frequency: the window narrows when focusing on high frequencies and widens when assessing low frequencies. Allowing windows of different sizes improves the frequency resolution of the low frequencies and the time resolution of the high frequencies, so high-frequency components can be located more precisely in time and low-frequency components more precisely in frequency. Wavelets therefore enable a more flexible approach to time series analysis, and wavelet analysis is seen as a refinement of Fourier analysis (Rua, 2012; Uliha, 2016).

The signal $x[n]$ is a discrete-time function, i.e. a sequence, where $n$ is an integer. The procedure starts by passing the sequence through a halfband digital lowpass filter with impulse response $h[n]$. Filtering a signal corresponds to the mathematical operation of convolving the signal with the impulse response of the filter. The convolution is defined as follows:

$$x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k] \, h[n-k] \qquad (2)$$
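Equation (2) can be checked on finite sequences by treating every sample outside the stored range as zero. The sketch below is a direct, unoptimized transcription of the sum; the function name `convolve` is hypothetical.

```python
def convolve(x, h):
    """Full convolution (x * h)[n] = sum_k x[k] h[n - k], zero outside the data."""
    out = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(out)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):      # h[n - k] is zero outside its support
                out[n] += x[k] * h[n - k]
    return out


# A two-tap averaging (lowpass) filter applied to a short ramp
smoothed = convolve([1.0, 2.0, 3.0], [0.5, 0.5])
```

Each interior output is the average of two neighbouring samples, which is the simplest example of lowpass filtering.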

A halfband lowpass filter removes all frequencies² that are above half of the highest frequency in the signal. After the signal has passed through a halfband lowpass filter, half of the samples can be eliminated: discarding every other sample subsamples the signal by two, so the signal then has half the number of points. The lowpass filtering removes the high-frequency information but leaves the scale unchanged; only the subsampling changes the scale, which is then doubled. Resolution, however, is related to the amount of information in the signal and is therefore affected by the filtering. The subsampling after filtering does not affect the resolution, because half the samples can be discarded without any loss of information. In summary, the lowpass filtering halves the resolution but leaves the scale unchanged; the signal is then subsampled by 2, since half of the samples are redundant, and this doubles the scale. The procedure can be expressed as:

$$y[n] = \sum_{k=-\infty}^{\infty} h[k] \, x[2n-k] \qquad (3)$$

² For discrete signals, frequency is expressed in radians; the highest frequency a discrete signal can contain is $\pi$ radians per sample.
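Equation (3) merges filtering and subsampling: only every second output of the convolution is computed. The sketch below (hypothetical function name, zero values assumed outside the sequence) returns exactly the even-indexed samples of the full convolution from equation (2).

```python
def filter_and_downsample(x, h):
    """y[n] = sum_k h[k] x[2n - k]: lowpass filtering followed by subsampling by 2."""
    n_out = (len(x) + len(h)) // 2   # number of even indices in the full convolution
    y = []
    for n in range(n_out):
        # Only terms whose input index 2n - k lies inside the sequence contribute
        y.append(sum(h[k] * x[2 * n - k] for k in range(len(h)) if 0 <= 2 * n - k < len(x)))
    return y


# Full convolution of this ramp with [0.5, 0.5] is [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 3.0]
halved = filter_and_downsample([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0.5, 0.5])
```

Here `halved` equals the full convolution with every odd-indexed sample discarded, so the output has half as many points as the filtered signal.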

The discrete wavelet transform (DWT) analyzes the signal at different frequency bands with different resolutions by decomposing it into a coarse approximation and detail information. The DWT employs two sets of functions, called scaling functions and wavelet functions, which are associated with lowpass and highpass filters, respectively. The decomposition of the signal into different frequency bands is obtained simply by successive highpass and lowpass filtering of the time-domain signal. In summary, the original signal is first passed through a halfband highpass filter $g[n]$ and a halfband lowpass filter $h[n]$. After filtering, half of the samples are redundant, so each output is subsampled by 2 simply by discarding every other sample. This constitutes one level of decomposition and can be expressed as follows:

$$y_{high}[k] = \sum_{n} x[n] \, g[2k-n] \qquad (4)$$

$$y_{low}[k] = \sum_{n} x[n] \, h[2k-n] \qquad (5)$$

where $y_{high}[k]$ and $y_{low}[k]$ are the outputs of the highpass and lowpass filters after subsampling by 2. The decomposition halves the time resolution, since only half the number of samples now characterizes the entire signal. However, it doubles the frequency resolution, since the frequency band of each output spans only half of the previous frequency band, reducing the uncertainty in frequency by half. The procedure can be repeated on the lowpass output for further decomposition, as illustrated in Figure 5.
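Because every level halves the frequency band, the level-$j$ detail of a dyadic DWT nominally covers frequencies between $f_s/2^{j+1}$ and $f_s/2^{j}$, where $f_s$ is the sampling frequency, and the level-$J$ approximation keeps everything below $f_s/2^{J+1}$. For daily data ($f_s$ = 1 per day) this corresponds to cycles with periods between $2^j$ and $2^{j+1}$ days. Real wavelet filters are not ideal halfband filters, so these bands overlap somewhat in practice. The helper below is a hypothetical illustration of this bookkeeping.

```python
def detail_band(level, fs=1.0):
    """Nominal frequency band (low, high) and period band of the level-j detail."""
    f_high = fs / 2 ** level          # upper edge, halved at every level
    f_low = fs / 2 ** (level + 1)     # lower edge
    return (f_low, f_high), (1 / f_high, 1 / f_low)


# Daily observations: D1 covers 2-4 day cycles, D10 roughly 1024-2048 day cycles
bands = {j: detail_band(j)[1] for j in range(1, 11)}
```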



5. Figure: Process of wavelet decomposition, Source: Mirzaei et al., 2010, 303.p.

The highpass and lowpass filters are not independent of each other; they are related by the following equation, where $L$ is the filter length (in number of points):

$$g[L-1-n] = (-1)^n \, h[n] \qquad (6)$$

Frequency bands that carry little information about the original signal have very low amplitudes; consequently, that part of the signal can be discarded without significant loss of information, allowing data reduction. The reconstruction of the original signal is easy with halfband filters, since they form an orthonormal basis. The reconstruction formula can be expressed as:

$$x[n] = \sum_{k=-\infty}^{\infty} \left( y_{high}[k] \, g[2k-n] + y_{low}[k] \, h[2k-n] \right) \qquad (7)$$

However, if the filters are not ideal halfband filters, perfect reconstruction cannot be achieved in general. Although ideal filters cannot be realized, under certain conditions it is possible to find filters that still provide perfect reconstruction; the most famous of these are the Daubechies wavelets (Daubechies, 1992). Coiflet, Haar and Symlet wavelets are also frequently used.
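Equations (4)-(7) can be checked numerically. The sketch below uses the 4-tap Daubechies filter (db2) instead of the order-7 wavelet used later in this study, builds the highpass filter from equation (6) and, as an assumption for finite series, extends the signal periodically. With these orthonormal filters, equation (7) recovers the input up to rounding, and the signal energy is split between the two subbands. Function names are hypothetical.

```python
import math

# 4-tap Daubechies lowpass filter (db2): an illustrative stand-in for db7
_R3, _NORM = math.sqrt(3.0), 4 * math.sqrt(2.0)
H = [(1 + _R3) / _NORM, (3 + _R3) / _NORM, (3 - _R3) / _NORM, (1 - _R3) / _NORM]


def highpass_from_lowpass(h):
    """Equation (6): g[L - 1 - n] = (-1)^n h[n]."""
    size = len(h)
    g = [0.0] * size
    for n in range(size):
        g[size - 1 - n] = (-1) ** n * h[n]
    return g


G = highpass_from_lowpass(H)


def dwt_level(x, h=H, g=G):
    """Equations (4)-(5) with periodic extension: y[k] = sum_m f[m] x[(2k - m) mod N]."""
    size = len(x)
    assert size % 2 == 0 and size >= len(h), "need an even length of at least the filter length"
    y_low, y_high = [], []
    for k in range(size // 2):
        y_low.append(sum(h[m] * x[(2 * k - m) % size] for m in range(len(h))))
        y_high.append(sum(g[m] * x[(2 * k - m) % size] for m in range(len(g))))
    return y_low, y_high


def idwt_level(y_low, y_high, h=H, g=G):
    """Equation (7): x[n] = sum_k (y_high[k] g[2k - n] + y_low[k] h[2k - n])."""
    size = 2 * len(y_low)
    x = [0.0] * size
    for k in range(len(y_low)):
        for m in range(len(h)):
            n = (2 * k - m) % size          # the sample n for which 2k - n = m (mod N)
            x[n] += y_high[k] * g[m] + y_low[k] * h[m]
    return x


# Usage: one level of decomposition followed by reconstruction
x = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
y_low, y_high = dwt_level(x)
x_rec = idwt_level(y_low, y_high)
```

A constant input produces (numerically) zero detail coefficients, because the highpass taps built by equation (6) sum to zero.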

One of the most important benefits of wavelet decomposition is its strong theoretical background and the possibility of applying wavelets that produce orthogonal components. In comparison with the Fourier transform, the wavelet transform uses local base functions that can be stretched and translated with a flexible resolution in both frequency and time, which yields more information in both the frequency and the time domain. Owing to the filtering, the high-frequency detail components are natural candidates for noise. On the other hand, a base function must be chosen before the analysis, this choice can strongly affect the results, and there is no established rule for choosing the type of wavelet for a specific time series. Not only the base function but also the order of the wavelet must be chosen in advance by the researcher. The classical, decimated discrete wavelet transform subsamples the outputs of the highpass and lowpass filters to half their original length. This leads to a serious drawback, namely that the transform is not shift-invariant: the DWT of a shifted signal is not the shifted version of the DWT of the original signal (Bekiros, Marcellino, 2013). Furthermore, the wavelet transform, just like EMD, suffers from the boundary effect (Su et al., 2012).

7. Data

This section introduces the data used in the research. First, the main properties of the data and some of its most important descriptive statistics are presented. Then the effect of empirical mode decomposition is illustrated with an example.

In this study the daily Brent crude oil spot price is chosen as the experimental sample. The data can be downloaded from the website of the U.S. Energy Information Administration (EIA). The data span the period from 2000.01.04. to 2019.03.14. (4867 observations). This sample was chosen because it encompasses the most relevant extreme events in the history of the oil price, for example the terrorist attacks of 2001, the invasion of Iraq in 2003, the subprime crisis in 2008 and the OPEC decision in 2014. The original price series is non-stationary according to the KPSS and ADF tests, which is why this study uses log returns for prediction; only the log returns are decomposed and later predicted with the selected models. The data set is divided into two parts: the in-sample period runs from 2000.01.04. to 2006.01.03. (1542 observations), while the out-of-sample period covers 2006.01.04. to 2019.03.14. (3325 observations). The original Brent crude oil time series and the log returns are shown in Figure 6, where the dashed line separates the in-sample from the out-of-sample period. Table 3 describes some of the most important descriptive statistics of the log returns.



| Statistic | In-sample | Out-of-sample |
|---|---|---|
| Mean | 0.06089 % | 0.00153 % |
| Standard deviation | 0.02445 | 0.02137 |
| Median | 0.13609 % | 0.02151 % |
| Min – max | -0.19891 – 0.12853 | -0.16832 – 0.18129 |
| ADF statistic (p-value) | -38.91 (0.001) | -56.79 (0.001) |
| KPSS statistic (p-value) | 0.023 (0.1) | 0.067 (0.1) |
| Auto(1) – Auto(2) | 0.0079 – 0.0259 | 0.0149 – 0.0138 |

3. Table: Descriptive statistics of log returns calculated on observations from the in-sample and out-of-sample periods, Source: Own table

Based on the statistics in Table 3, the two samples have relatively similar characteristics: according to a two-sample t-test, their mean returns can be regarded as equal. Both are stationary (the ADF test rejects a unit root, while the KPSS test does not reject stationarity) and neither shows first- or second-order autocorrelation. However, both contain volatile and calm periods.

6. Figure: Brent crude oil prices and returns for the entire sample, Source: Own figure

Figures 8-10 show the empirical mode decomposition of the log returns in different periods. Figure 8 shows a decomposition that uses the entire sample: the red signal is the original log return series, the green signal is the residual and the blue signals are the IMFs. The figure shows how EMD decomposes the original signal into meaningful components. The original signal can be reconstructed by simply summing up all the components. The components contain high-, mid- and low-frequency information and capture the complex characteristics of returns. The same decomposition for the in-sample period is shown in Figure 9, which illustrates that the decomposition result depends on the window size: using only the in-sample information, 15 components are generated, while 19 components are obtained from the entire sample. I also calculated the components using data from two years before to two years after the collapse of Lehman Brothers. The results are shown in Figure 10; in this case 13 components are generated. The instability of the components can be explained by the continuously changing environment and by the occurrence of extreme events that can alter the data-generating process. However, this instability makes the result of the noise selection more difficult to interpret, since the number of selected components also varies over the analyzed period. Histograms of the number of EMD components can be seen in Figure 7. An expanding window is applied for the decomposition: the first window covers the in-sample period, and the window then expands as each new daily observation is added to the sample. Figure 7 shows that the number of IMFs gradually increases as the window expands. It is important to emphasize that EMD can be time-consuming with an expanding window, since the decomposition has to be recomputed for every window.

All calculations in this paper were performed in MATLAB R2016b: the 'wavedec' function was used for wavelet decomposition and the 'emd' function for empirical mode decomposition. Signal reconstruction is done by summation in the case of EMD, while the 'waverec' function reconstructs the signal from the wavelet coefficients.

7. Figure: Number of IMFs during the estimation period using expanding window, Source: Own figure


8. Figure: Components of Brent crude oil generated by EMD on the entire sample, Source: Own figure


9. Figure: In-sample components of Brent crude oil generated by EMD, Source: Own figure


10. Figure: Components of Brent crude oil generated by EMD during the recession, 2006.09. – 2010.09., Source: Own figure



8. Empirical analysis and results

This section introduces the prediction strategy, briefly describes the ARIMA model and presents the results. The prediction strategy is described with the help of the general research framework introduced in Section 4. This paper applies ARIMA as the prediction model because it is a simple model whose parameters can be estimated relatively quickly. It is important to emphasize that the focus of this paper is on the denoising ability of EMD and wavelet decomposition.

8.1. Prediction strategy

This study applies the first of the four broad research designs introduced in Section 4 (Figure 1). This involves decomposing the original data and then applying different noise selection methods to choose and drop noise components. The remaining components are aggregated with the inverse of the decomposition, which is summation in the case of EMD and the inverse wavelet transform in the case of wavelet decomposition. The prediction process is shown in Figure 11. This study defines noise as follows: the component or components of an observable signal which, if dropped, improve prediction accuracy.

11. Figure: Prediction process, Source: Own figure

The detailed version of the research design is depicted on figure 12.


12. Figure: Selected research design for empirical mode decomposition, Source: Own figure



Figure 12 helps to better understand the selected research design for empirical mode decomposition. This study analyzes daily log return data using an expanding window. The first window runs from 2000.01.04. to 2006.01.03. and expands on a daily basis. Empirical mode decomposition is selected as the decomposition method, and a residual-based stopping criterion is chosen for terminating the algorithm. For the sifting of each component, this paper applies the same stopping criterion as Lahmiri (2016), who computed the standard deviation (SD) from two consecutive sifting results: the sifting process is stopped when the SD is smaller than an arbitrarily small number³. As Huang et al. (1998) emphasize, carrying the sifting process to an extreme could turn the resulting IMF into a pure frequency-modulated signal of constant amplitude, so to ensure that the IMF components retain enough physical meaning the SD threshold should be set between 0.2 and 0.3. This paper sets $\varepsilon = 0.2$; in several cases, however, the threshold had to be relaxed to $\varepsilon = 0.3$ because the algorithm failed to converge.

This paper applies expert judgement, PACF, permutation entropy, sample entropy and Shannon entropy as noise selection methods. With expert judgement, the first, the first two, the first three and then the first four components are dropped. The PACF approach is based on the consideration that an uncorrelated, identically distributed random sequence with zero expected value can be regarded as white noise. The entropies are used as a tool for finding the optimal decomposition by minimizing an entropy that describes the information-relevant properties of the representation of a signal. The entropy of each denoised signal is estimated stepwise and compared with that of the previous level. The procedure is as follows: after decomposition, the first component is dropped, the remaining components are aggregated and the entropy of the denoised signal is estimated; then the first two components are dropped, the remaining components are aggregated and the entropy is estimated again, and so on. The optimal level of decomposition is the one at which the entropy reaches its minimum.
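The stepwise entropy search can be sketched as follows. The thesis does not specify its entropy estimators at this point, so the Shannon entropy below is computed from a histogram with a fixed number of equal-width bins, which is an assumption; the function names and the `bins` parameter are hypothetical. Components are ordered from the highest frequency (IMF1) to the residual, and at least the last component is always kept.

```python
import math


def shannon_entropy(x, bins=10):
    """Histogram-based Shannon entropy (natural logarithm) of a sequence."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0  # a constant series carries no uncertainty
    counts = [0] * bins
    for v in x:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(x)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def denoise(components, k):
    """Drop the first k (highest-frequency) components and sum the rest."""
    kept = components[k:]
    return [sum(vals) for vals in zip(*kept)]


def select_by_entropy(components, entropy=shannon_entropy):
    """Number of dropped components that minimizes the entropy of the denoised signal."""
    scores = {k: entropy(denoise(components, k)) for k in range(1, len(components))}
    return min(scores, key=scores.get), scores


# Toy decomposition: a fast oscillation, a slower one and a flat residual
imf1 = [1.0, -1.0] * 4
imf2 = [0.5, 0.5, -0.5, -0.5] * 2
residual = [0.0] * 8
best_k, entropies = select_by_entropy([imf1, imf2, residual])
```

In the toy case dropping both oscillations leaves a constant series with zero entropy, so `best_k` is 2; this also shows why a pure minimum-entropy rule tends to favour dropping almost everything.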

After the noise selection, a component or some components are dropped and the remaining components are aggregated. This paper applies only one prediction model and uses the same window for prediction as for the decomposition (i.e. an expanding window). An ARIMA(p,0,q) model is used for one-period-ahead prediction with lag parameters p = 1, 2, 3, 4 and q = 0, 1, 2, 3, 4; the optimal parameters are chosen based on the Bayesian information criterion. Since the components were aggregated at an earlier stage, no aggregation is needed at the end of the process.

³ $SD(k) = \frac{\sum_{t=0}^{T} \left( c_{k-1}(t) - c_k(t) \right)^2}{\sum_{t=0}^{T} c_{k-1}^2(t)} < \varepsilon$
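The expanding-window procedure can be summarized as a short loop. This is a schematic sketch: `denoise_fn` and `forecast_fn` are hypothetical placeholders for the decomposition-plus-noise-removal step and for the BIC-selected ARIMA fit, neither of which is reimplemented here. With 1542 in-sample observations and 4867 observations in total, the loop produces 4867 - 1542 = 3325 one-step-ahead forecasts, matching the length of the out-of-sample period.

```python
def walk_forward(series, n_in, denoise_fn, forecast_fn):
    """One-step-ahead forecasts with an expanding window that starts with n_in points."""
    forecasts = []
    for t in range(n_in, len(series)):
        window = series[:t]                     # all observations available before day t
        cleaned = denoise_fn(window)            # decompose, drop noise, aggregate
        forecasts.append(forecast_fn(cleaned))  # prediction for series[t]
    return forecasts


# Toy usage: no denoising and a naive last-value forecast
preds = walk_forward([1.0, 2.0, 3.0, 4.0, 5.0], 2, lambda w: w, lambda w: w[-1])
```

Because the decomposition is redone inside the loop, every forecast uses only information available at that date, which avoids look-ahead bias from decomposing the full sample at once.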

The research design for wavelet decomposition is the same, except that a 10-level discrete wavelet decomposition with the order-7 Daubechies wavelet is applied. The level is selected based on Bekiros & Marcellino (2013), and the order-7 Daubechies wavelet is used because it is one of the most popular choices in the literature. Wavelet decomposition generates one approximation component and ten detail components. The detail components contain the high-frequency information; therefore, these are the components that can potentially be selected as noise. The noise selection procedure is the same as for EMD. The expert judgement approach drops the first, the first two, the first three and then the first four detail components. For PACF-based noise selection, each detail component D1-D10 is reconstructed separately with the inverse wavelet transform after decomposition, and components that show no autocorrelation are dropped. For the entropy statistics, the D1 component is dropped after decomposition, the remaining components (A10, D2-D10) are aggregated using the inverse wavelet transform and the three entropies are estimated. Then the first two detail components are dropped, the rest of the signal (A10, D3-D10) is aggregated using the inverse wavelet transform and the entropy of the denoised signal is estimated, and so on. The optimal level of decomposition is the one at which the entropy reaches its minimum.
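Dropping detail components and inverting the transform can be illustrated with the Haar wavelet, which keeps the example short; the thesis itself uses MATLAB's `wavedec`/`waverec` with an order-7 Daubechies wavelet. In this hypothetical sketch the coefficients are stored as `[A_J, D_J, ..., D_1]`, the same ordering that `wavedec` uses, the series length is assumed to be a multiple of $2^J$, and dropping D1-Dk means setting those coefficients to zero before reconstruction.

```python
import math

S = 1 / math.sqrt(2)


def haar_wavedec(x, levels):
    """Multilevel Haar DWT; returns [A_J, D_J, ..., D_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([S * (a - b) for a, b in pairs])   # D_1 first, D_J last
        approx = [S * (a + b) for a, b in pairs]
    return [approx] + details[::-1]


def haar_waverec(coeffs):
    """Inverse of haar_wavedec: combine A_J with D_J, then D_{J-1}, ..., D_1."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = [v for a, d in zip(approx, detail) for v in (S * (a + d), S * (a - d))]
    return approx


def drop_details(x, levels, k):
    """Zero D1..Dk and reconstruct from A_J and the remaining details."""
    coeffs = haar_wavedec(x, levels)
    for j in range(1, k + 1):          # D_j sits at position len(coeffs) - j
        coeffs[-j] = [0.0] * len(coeffs[-j])
    return haar_waverec(coeffs)


# Dropping D1 from a ramp replaces each pair of samples by its mean
smooth = drop_details([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 3, 1)
```

Dropping all detail levels leaves only the reconstructed approximation, which for the Haar wavelet at the full depth is the overall mean of the series.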

Two main criteria are used to evaluate forecasting performance: one for level prediction and one for directional forecasting. The root mean squared error (RMSE) is selected to evaluate level prediction:

$$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( \hat{x}(t) - x(t) \right)^2} \qquad (8)$$

where $N$ is the number of predictions, $\hat{x}(t)$ is the predicted value and $x(t)$ is the observed signal. Accuracy is one of the most important criteria for forecasting models; the other is the improvement in decisions generated by directional predictions. From a business point of view, the latter is more important than the former. The ability to predict the direction of movement can be measured by a directional statistic ($D_{stat}$) (Yu et al., 2008), which can be expressed as

$$D_{stat} = \frac{1}{N} \sum_{t=1}^{N} a_t \qquad (9)$$

where $a_t = 1$ if $(y_{t+1} - x_t)(x_{t+1} - x_t) \geq 0$ and $a_t = 0$ otherwise. Here $y_{t+1}$ is the value predicted by a model, and $x_t$ and $x_{t+1}$ are observed values.
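Equations (8) and (9) translate directly into code. In this sketch (function names hypothetical), `actual[t]` holds $x_t$ and `forecasts[t]` holds the forecast $y_{t+1}$ made at time $t$ for $x_{t+1}$; this alignment of the two series is an assumption.

```python
import math


def rmse(predicted, observed):
    """Equation (8): root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def d_stat(actual, forecasts):
    """Equation (9): share of periods whose direction of movement is predicted correctly.

    forecasts[t] is the forecast y_{t+1} of actual[t + 1], made at time t.
    """
    hits = [
        1 if (forecasts[t] - actual[t]) * (actual[t + 1] - actual[t]) >= 0 else 0
        for t in range(len(actual) - 1)
    ]
    return sum(hits) / len(hits)
```

For instance, with `actual = [1.0, 2.0, 1.0, 3.0]` and `forecasts = [1.5, 1.5, 0.5]` the first two moves are called correctly and the last one is not, giving $D_{stat} = 2/3$.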

8.2. Prediction model

This section briefly describes the ARIMA prediction model. In this research, ARIMA models are trained on the denoised signal to generate one-period-ahead out-of-sample predictions. Since the literature describes ARIMA in detail, only the most important characteristics of the model are highlighted here.

In an ARIMA model (Box & Jenkins, 1970) the future value of a variable is assumed to be a linear function of several past observations and random errors. The underlying process that generates the time series takes the following form:

$$\Phi(B) \, y_t = \theta(B) \, e_t \qquad (10)$$

where $y_t$ and $e_t$ are the actual value and the random error at time $t$, respectively. $B$ denotes the backward shift operator, $B y_t = y_{t-1}$, $B^2 y_t = y_{t-2}$, etc., and $\Phi(B)$ and $\theta(B)$ are defined as

$$\Phi(B) = 1 - \Phi_1 B - \Phi_2 B^2 - \dots - \Phi_p B^p \qquad (11)$$

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q \qquad (12)$$

where $p$ and $q$ are parameters, often referred to as the lag orders of the model. The random errors are assumed to be independently and identically distributed with zero mean and constant variance. If the $d$-th difference of $\{y_t\}$ is an ARMA process of orders $p$ and $q$, then $y_t$ is called an ARIMA($p$,$d$,$q$) process.
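Given estimated coefficients, equation (10) yields a one-step-ahead forecast by rewriting it as $y_t = \sum_i \Phi_i y_{t-i} + e_t - \sum_j \theta_j e_{t-j}$ and setting the unknown future error to zero. The sketch below (hypothetical function name) takes the coefficients as given. Parameter estimation and the BIC-based order selection are not shown, the process is assumed to have zero mean (no constant term), and pre-sample values and errors are set to zero.

```python
def arma_one_step(y, phi, theta):
    """One-step-ahead forecast of a zero-mean ARMA(p, q) with the signs of (11)-(12)."""
    errors = []
    for t in range(len(y) + 1):
        ar = sum(p * y[t - i] for i, p in enumerate(phi, start=1) if t - i >= 0)
        ma = sum(q * errors[t - j] for j, q in enumerate(theta, start=1) if t - j >= 0)
        y_hat = ar - ma                  # conditional expectation of y_t, with e_t set to 0
        if t == len(y):
            return y_hat                 # forecast for the next, unseen period
        errors.append(y[t] - y_hat)      # in-sample residual e_t
```

For an AR(1) with $\Phi_1 = 0.5$, `arma_one_step([1.0, 2.0], [0.5], [])` returns 1.0, i.e. half of the last observation.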

8.3. Results

This section summarizes the most important empirical results of the study. For the

sake of simplicity the section first introduces the results of the PACF noise selection,

followed by the results of the entropy based noise selection in cases when EMD was used

as decomposition method. After that, the results of wavelet based decomposition are

summarized in the same order, followed by the noise selection made with an expert



judgement approach. At the end of this section the results of EMD and wavelet

decomposition are compared.

Using the prediction strategy with PACF noise selection proved to be a weak approach. If we want to drop components that are uncorrelated with their own lags (i.e. whose PACF can be considered zero at all lags), none of the components is selected as noise: even the highest-frequency component has significant first-, second- and third-order partial autocorrelation. The first bar chart of Figure 13⁴ shows the ratio of first IMFs that have statistically significant lags; according to the figure, every IMF1 has statistically significant first-, second- and third-order partial autocorrelation (i.e. the PACF at these lags is not zero). The results are the same for the IMF2 and IMF3 sequences. Hence, this approach suggests that all the IMFs carry information, so it is beneficial to use the observed signal (the return) itself.
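The PACF screening can be sketched with the Durbin-Levinson recursion and the usual approximate 95% white-noise bound $\pm 1.96/\sqrt{N}$. This illustrates the idea under those assumptions and is not the exact routine used in the thesis; the function names are hypothetical, and the default of six lags mirrors the lags 1-6 reported for Figure 13. A component counts as a noise candidate only if none of its lags is significant.

```python
import math


def acf(x, nlags):
    """Sample autocorrelations r_0 .. r_nlags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(nlags + 1)]


def pacf(x, nlags):
    """Partial autocorrelations at lags 1..nlags via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    result, phi_prev = [], []
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}, then append phi_{k,k}
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
    return result


def significant_lags(x, nlags=6):
    """Lags whose partial autocorrelation lies outside +/- 1.96 / sqrt(N)."""
    bound = 1.96 / math.sqrt(len(x))
    return [k for k, v in enumerate(pacf(x, nlags), start=1) if abs(v) > bound]


def is_white_noise(x, nlags=6):
    """Noise candidate in the PACF sense: no significant lag at all."""
    return not significant_lags(x, nlags)


# A strongly alternating series has a large negative lag-1 partial autocorrelation
alternating = [1.0, -1.0] * 5
```

A fast-alternating component like this is exactly what an IMF1 often looks like, and it fails the white-noise test, which is consistent with the outcome reported above.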

Permutation entropy, a natural complexity measure for time series (Bandt & Pompe, 2002), also proved to be a weak approach in this study. The time delay was set to one and the order of the ordinal patterns to three, which means that three consecutive observations were grouped into embedded vectors⁵. Noise selection with permutation entropy was not successful, since its value decreased as more and more IMFs were dropped. Consequently, the minimum value of permutation entropy was reached at the trend component in 91% of all decompositions.

13. Figure: Ratio of significant lags in the first three IMFs, Source: Own figure

⁴ Since the signal was decomposed on a daily basis, 3325 decompositions were performed. I collected all the IMF1, IMF2 and IMF3 sequences because these are the highest-frequency components. The figure shows the ratio of IMFs that have statistically significant lags 1-6 based on the PACF.

⁵ For more details, see Riedl et al. (2013).
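A compact version of permutation entropy (Bandt & Pompe, 2002) with the settings used here (order 3, delay 1) is sketched below. Normalizing by $\log(3!)$ so that the value lies between 0 and 1 is an assumption, as is breaking ties between equal values by their position; the function name is hypothetical.

```python
import math
from collections import Counter


def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal patterns of `order` samples spaced `delay` apart."""
    span = (order - 1) * delay
    # Each embedded vector is mapped to the permutation that sorts it (stable for ties)
    patterns = Counter(
        tuple(sorted(range(order), key=lambda i: x[t + i * delay]))
        for t in range(len(x) - span)
    )
    total = sum(patterns.values())
    h = -sum(c / total * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(order)) if normalize else h
```

A monotonic series contains a single ordinal pattern and therefore has zero permutation entropy. Smooth, trend-like reconstructions produce few distinct patterns, which is why this measure keeps falling as more IMFs are removed.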

14. Figure: Typical values of permutation entropy estimated from denoised signals, Source: Own figure

Figure 14 shows the effect of denoising on permutation entropy. The values in Figure 14 were calculated using the entire sample and show how permutation entropy decreases as more and more IMFs are dropped⁶. Although Figure 14 shows the result of a single decomposition, the same pattern appeared in most cases. Therefore, in 91% of the decompositions permutation entropy suggests dropping all IMF components except the trend. Noise selection with sample and Shannon entropy was more successful. Figures 15-16 show the results of noise selection based on sample and Shannon entropy: the left histogram shows the number of components dropped based on the entropy, and the right one shows the number of components generated with the expanding window.

⁶ The first point in the figure was calculated after removing IMF1 from the signal; the second point shows the value of permutation entropy when IMF1 and IMF2 are dropped, etc.

15. Figure: Number of dropped IMFs based on sample entropy and number of generated IMFs

using expanding window, Source: Own figure

16. Figure: Number of dropped IMFs based on Shannon entropy and number of generated IMFs


Noise selection with sample entropy led to a similar, but less radical, result than permutation entropy. Sample entropy suggests dropping several components and using the last two to three components for reconstruction. For sample entropy, the embedding dimension was set to 200 and the tolerance value to 0.3⁷. Nevertheless, sample entropy can be used for the analysis because in most cases it suggests keeping some of the components, which can then be used to reconstruct the signal. Shannon entropy-based noise selection has the most promising result: on average, it suggests keeping seven to eight components. The prediction-enhancing performance of sample and Shannon entropy-based noise selection is described later in this section.

⁷ To determine the parameters I relied on the study of Richman and Moorman (2000); however, parameter selection involved several rounds of trial and error.

Using a ten-level wavelet decomposition with PACF-based noise selection yields the same result as for EMD: none of the components is selected as noise, because all of them are autocorrelated according to the PACF. Consequently, PACF could not be used as a noise selection tool in this study.

Wavelet decomposition also led to the same conclusions for the three entropies. The more detail components are dropped, the lower the permutation entropy. In general, permutation entropy suggests reconstructing only the approximation coefficients. Figure 18 shows how permutation entropy decreases as more and more detail components are dropped. The top chart on the left shows a signal denoised from D1, the chart below it shows the signal denoised from D1 and D2, and the last chart is the reconstructed approximation component. Their permutation entropy values are presented on the right side of the figure. Figure 18 was created using the entire sample and thus depicts a single decomposition; however, the pattern in the figure is similar for the majority of decompositions. Sample and Shannon entropy led to results similar to those obtained with empirical mode decomposition. Figure 17 shows the number of detail components dropped based on Shannon and sample entropy. Since a ten-level wavelet decomposition was applied, whenever an entropy suggests dropping ten detail components, this is equivalent to reconstructing only the approximation coefficients. As with EMD, Shannon entropy-based noise selection has the most promising result: it suggests keeping two to three detail components.



17. Figure: Number of dropped detail components based on Shannon and Sample entropy



18. Figure: Denoised signals and their permutation entropy using wavelet decomposition, Source: Own figure



Because sample and Shannon entropy suggest dropping several components for both EMD and wavelet decomposition, an expert judgement approach is also applied to noise selection. This strategy involves dropping the first, the first two, the first three and the first four components. These cut-offs are chosen arbitrarily; nevertheless, the dropped components are the highest-frequency ones, which are the most likely to carry noise, so it is reasonable to expect that dropping them is beneficial. The remainder of this section summarizes the prediction performance of the above methods, using the prediction strategy described in Section 8.1.
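For the wavelet case, the expert judgement strategy amounts to zeroing the finest detail levels before inverting the transform. The sketch below uses an orthonormal Haar wavelet and assumes the series length is divisible by 2**level; both are simplifying assumptions, since the wavelet family and boundary handling of the study are not restated here. The function names `haar_decompose`, `haar_reconstruct` and `drop_details` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(x, level):
    """Multilevel orthonormal Haar DWT. Returns (approximation, [D1, ..., D_level])."""
    if len(x) % (2 ** level):
        raise ValueError("length must be divisible by 2**level in this sketch")
    approx, details = list(x), []
    for _ in range(level):
        half = len(approx) // 2
        a = [(approx[2 * i] + approx[2 * i + 1]) / SQRT2 for i in range(half)]
        d = [(approx[2 * i] - approx[2 * i + 1]) / SQRT2 for i in range(half)]
        details.append(d)  # D1 is the finest (highest-frequency) level
        approx = a
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest detail level."""
    x = list(approx)
    for d in reversed(details):
        out = []
        for a_i, d_i in zip(x, d):
            out += [(a_i + d_i) / SQRT2, (a_i - d_i) / SQRT2]
        x = out
    return x


def drop_details(x, level, k):
    """Reconstruct x after zeroing the k highest-frequency detail levels D1..Dk."""
    approx, details = haar_decompose(x, level)
    kept = [[0.0] * len(d) if j < k else d for j, d in enumerate(details)]
    return haar_reconstruct(approx, kept)


signal = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 4.0, 0.0]
print(drop_details(signal, level=3, k=1))
```

Dropping all `level` detail levels leaves only the approximation, which for Haar is the sample mean repeated; this mirrors the observation that dropping ten detail components of a level ten decomposition means reconstructing the approximation alone. For EMD the analogous step is simply summing the IMFs that are kept together with the residue.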

Table 4 shows the RMSE values for predicting the original log returns, for the expert judgement approach and for entropy based denoising, together with the Diebold-Mariano test statistics. Table 5 shows the values of the direction statistic (D_stat).
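The two accuracy measures can be sketched as follows. RMSE is standard; for the direction statistic this sketch assumes that a hit means the forecast and the realized log return have the same sign, which may differ from the exact definition used in this study. The function names and the example numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of one-step-ahead forecasts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def direction_stat(actual, predicted):
    """Share of periods where the forecast has the same sign as the realized log return.

    Assumption: a zero product is counted as a hit (>= 0).
    """
    hits = sum(1 for a, p in zip(actual, predicted) if a * p >= 0)
    return hits / len(actual)


actual = [0.010, -0.020, 0.005, -0.001]
predicted = [0.008, -0.010, -0.002, -0.003]
print(100 * rmse(actual, predicted))  # RMSE multiplied by 100, as in the tables
print(f"{direction_stat(actual, predicted):.2%}")
```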

Original log returns: 2.115

Method     Drop 1   Drop 1-2   Drop 1-3   Drop 1-4   Shannon   Sample
EMD        2.812    3.445      3.550      3.402      3.191     3.040
Wavelet    2.589    2.625      2.464      2.370      2.30      2.412
DM test    5.84     11.29      14.54      14.96      14.28     11.23

(Drop 1 to Drop 1-4: expert judgement; Shannon, Sample: entropy based noise selection)

4. Table: Prediction performance of the denoising methods based on RMSE (multiplied by 100), Source: Own table


Original log returns: 75.42%

Method     Drop 1   Drop 1-2   Drop 1-3   Drop 1-4   Shannon   Sample
EMD        64.18%   45.65%     30.86%     29.41%     37.62%    36.92%
Wavelet    67.88%   65.92%     70%        71.22%     74.11%    62.33%

(Drop 1 to Drop 1-4: expert judgement; Shannon, Sample: entropy based noise selection)

5. Table: Prediction performance of the denoising methods based on D_stat, Source: Own table

Based on Tables 4 and 5, dropping IMF1 (i.e. the highest frequency component) is the best prediction strategy for empirical mode decomposition, while selecting noise with Shannon entropy gives the most accurate level prediction for wavelet decomposition. The direction statistic leads to the same conclusion. The Diebold-Mariano test analyses the equivalence of two forecasts based on their squared prediction errors. Every EMD prediction strategy is compared to its wavelet counterpart, and based on the DM statistics the equivalence of the forecasts can be rejected for every pair. Tables 4 and 5 also show that wavelet based decomposition led to more accurate predictions than EMD. However, forecasting the original signal is the most accurate prediction based on both RMSE and the direction statistic. When the prediction accuracy for the original signal is compared to the best performing EMD and wavelet models, the DM statistic rejects their equivalence as well. Figure 19 shows the evolution of the cumulative RSE throughout the out-of-sample period for the best performing EMD and wavelet models.
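The Diebold-Mariano comparison can be sketched as below, using squared-error loss and a long-run variance built from autocovariances up to lag h-1 (h=1 for one-step forecasts). This is the basic asymptotic version with a two-sided standard normal p-value; whether the study applied a small-sample correction (such as Harvey, Leybourne and Newbold's) is not stated, so treat it as an assumption. The simulated forecasts are hypothetical.

```python
import math
import random


def diebold_mariano(actual, pred1, pred2, h=1):
    """DM statistic for equal predictive accuracy under squared-error loss.

    Positive values mean pred1 has larger squared errors than pred2.
    """
    d = [(a - p1) ** 2 - (a - p2) ** 2 for a, p1, p2 in zip(actual, pred1, pred2)]
    n = len(d)
    mean_d = sum(d) / n

    def autocov(k):
        return sum((d[t] - mean_d) * (d[t - k] - mean_d) for t in range(k, n)) / n

    # Long-run variance of the loss differential, lags 0..h-1.
    lrv = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    stat = mean_d / math.sqrt(lrv / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # two-sided, N(0, 1)
    return stat, p_value


random.seed(1)
actual = [random.gauss(0, 1) for _ in range(500)]
worse = [a + 2.0 * random.gauss(0, 1) for a in actual]
better = [a + 1.0 * random.gauss(0, 1) for a in actual]
stat, p = diebold_mariano(actual, worse, better)
print(stat, p)
```

Because the statistic is asymptotically standard normal, absolute values well above 1.96 reject equal accuracy at the 5% level.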

19. Figure: Cumulative RSE of the two best performing models throughout the out-of-sample period, Source: Own figure

9. Robustness check

In this last section of the study, I perform robustness checks: I test the validity of my results by recalculating the models with slightly different settings, which shows how sensitive the results are. The general research framework (Figure 2) supports robustness checks, since different settings can be selected in its Data column. This check uses weekly log returns for the same period and applies a rolling window for both decomposition and prediction. Each window contains 250 observations, and 751 decompositions were performed in total.
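The difference between the two window schemes can be sketched with two small generators that yield (training window, next observation) pairs for one-step-ahead forecasting. The exact indexing used in the study (for example, whether the first window ends at observation 250 or 251) is not stated, so the boundaries here are an assumption, and the function names are hypothetical.

```python
def rolling_windows(series, width):
    """Fixed-width window: the oldest observation drops out as a new one enters."""
    for end in range(width, len(series)):
        yield series[end - width:end], series[end]


def expanding_windows(series, start):
    """Every window starts at the first observation and keeps growing."""
    for end in range(start, len(series)):
        yield series[:end], series[end]


series = list(range(10))
for window, target in rolling_windows(series, 4):
    print(window, "->", target)
```

With a rolling window every decomposition sees the same number of observations, whereas with an expanding window each decomposition sees one more observation than the previous one.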

The threshold selection for empirical mode decomposition was highly sensitive. This study used ε = 0.2 as the threshold for terminating the sifting process. However, the threshold had to be changed several times between 0.2 and 0.4 in order to ensure convergence. There seems to be no connection between the threshold changes and volatile periods, and the root cause of the required parameter changes is unknown. I had the same problem with the expanding window on daily data and with the rolling window on weekly data.
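One common stopping rule for sifting is the standard-deviation (SD) criterion of Huang et al. (1998), which compares two consecutive sifting iterates point by point; sifting stops once SD falls below a threshold, typically between 0.2 and 0.3. This sketch assumes that ε plays this role here, which the text does not state explicitly, and the small `eps` guarding against division by zero is an added assumption. The toy vectors show how the same pair of iterates stops the process at ε = 0.4 but not at ε = 0.2.

```python
def sifting_sd(h_prev, h_curr, eps=1e-12):
    """Huang et al. (1998) SD criterion between two consecutive sifting iterates."""
    return sum((p - c) ** 2 / (p ** 2 + eps) for p, c in zip(h_prev, h_curr))


def should_stop(h_prev, h_curr, threshold=0.2):
    """Stop sifting once the SD criterion falls below the threshold (epsilon)."""
    return sifting_sd(h_prev, h_curr) < threshold


h_prev, h_curr = [1.0, 2.0], [0.5, 2.0]
print(sifting_sd(h_prev, h_curr))
print(should_stop(h_prev, h_curr, 0.2), should_stop(h_prev, h_curr, 0.4))
```

Because each term is divided by the squared previous iterate, points where that iterate is close to zero can dominate the sum, so the criterion can react strongly to small changes in the data.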

I also had difficulties with fitting splines to the time series of local minima and maxima. When using a rolling window (of 500, 1000, 1500, 1600, 2000 and 2500 observations) on daily log returns, the spline fit could not be applied, although it is an essential part of the sifting process.
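The envelopes in the sifting process are splines drawn through the local maxima and the local minima, so a window without enough extrema leaves nothing to fit. The sketch below only locates strict local extrema and applies a hypothetical guard; the minimum of four points per envelope is an arbitrary illustrative choice, and plateaus and end points are ignored, unlike in production EMD implementations.

```python
def local_extrema(x):
    """Indices of strict local maxima and minima (plateaus and end points ignored)."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def envelopes_fittable(x, min_points=4):
    """Hypothetical guard: require enough extrema for both upper and lower envelopes."""
    maxima, minima = local_extrema(x)
    return len(maxima) >= min_points and len(minima) >= min_points


print(local_extrema([0, 2, 1, 3, 0]))
print(envelopes_fittable([0, 2, 1, 3, 0]))
```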

The number of IMFs gradually increased with the expanding window (Figure 7), while it remained stable for weekly data with a rolling window. This can be explained by changes in the data generating process. With an expanding window, all past information is used, even observations from before a potential regime shift. A rolling window, which always contains 250 observations, has less chance of mixing information from multiple regimes. A further, more mechanical explanation is that the number of IMFs produced by EMD tends to grow with the length of the input (roughly logarithmically), so an expanding window naturally yields more IMFs over time, whereas a rolling window has a fixed length. Finally, weekly data are smoother than daily data, so some high frequency components disappear when we move from daily to weekly data.
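Moving from daily to weekly data can be sketched through the additivity of log returns: the log return over a week is the sum of the daily log returns within it, which equals the log of the ratio of the week's closing price to the previous week's close. This sketch assumes complete five-trading-day weeks; how the study handled holidays or partial weeks is not stated. The prices and function names are hypothetical.

```python
import math


def log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def aggregate(returns, period=5):
    """Sum non-overlapping blocks of log returns (period=5 trading days, an assumption)."""
    return [sum(returns[i:i + period]) for i in range(0, len(returns) - period + 1, period)]


prices = [50.0, 51.0, 49.5, 52.0, 53.0, 52.5]
daily = log_returns(prices)
weekly = aggregate(daily)
print(weekly, math.log(prices[-1] / prices[0]))
```

Summing five daily returns cancels part of the day-to-day fluctuation, which is one way to see why some high frequency components disappear at the weekly frequency.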

20. Figure: Histograms of the number of IMFs using rolling window, Source: Own figure

The result of PACF based noise selection remained the same for both EMD and wavelet decomposition: based on the PACF, the components are autocorrelated, so they cannot be considered white noise.
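A PACF based white noise check can be sketched with the Durbin-Levinson recursion and the approximate ±1.96/√n band. The decision rule in `looks_like_white_noise` (no partial autocorrelation outside the band among the first `max_lag` lags) is a hypothetical simplification; the study's exact rule and lag count are not restated here. The AR(1) series is simulated for illustration.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (biased estimator, divided by n)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def looks_like_white_noise(x, max_lag=10):
    """Hypothetical rule: all partial autocorrelations inside +/- 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    return all(abs(p) <= bound for p in pacf(x, max_lag))


random.seed(0)
ar = [0.0]
for _ in range(1999):
    ar.append(0.8 * ar[-1] + random.gauss(0, 1))  # AR(1) with coefficient 0.8
print(pacf(ar, 3), looks_like_white_noise(ar))
```

Testing many lags at once inflates the chance that a genuine white noise series crosses the band at some lag, so a rule like this one should be read as a rough screen.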

Noise selection with permutation entropy also remained the same for both EMD and wavelet decomposition.



21. Figure: Permutation entropy based noise selection in case of EMD, Source: Own figure

Figure 21 shows, on the left, the histogram of the number of IMFs obtained from all 751 decompositions using a rolling window and weekly data. The histogram on the right shows the number of IMFs that permutation entropy suggests dropping. The figure shows that, based on permutation entropy, almost all of the IMFs should be dropped. Wavelet decomposition led to the same conclusion: the more detail components we drop, the lower the permutation entropy, so permutation entropy generally suggests reconstructing only the approximation coefficients. Sample entropy drops all the detail components, while Shannon entropy gives results similar to those obtained with the expanding window and daily data.
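Sample entropy, the other measure that suggests dropping every detail component here, can be sketched following Richman and Moorman (2000): it is the negative log of the conditional probability that sequences matching for m points (within tolerance r, Chebyshev distance, self-matches excluded) still match at m+1 points. The defaults m=2 and r=0.2·std are common choices assumed for illustration, not necessarily the study's settings, and the quadratic loop is only meant for short windows such as 250 observations.

```python
import math
import random


def sample_entropy(x, m=2, r=None):
    """SampEn(m, r) following Richman and Moorman (2000); r defaults to 0.2 * std(x)."""
    n = len(x)
    if r is None:
        mean = sum(x) / n
        r = 0.2 * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def matches(length):
        # Same number of templates (n - m) for both lengths; self-matches excluded.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined for this sample; reported as infinite
    return -math.log(a / b)


random.seed(0)
white = [random.gauss(0, 1) for _ in range(300)]
print(sample_entropy([1.0, 2.0] * 20), sample_entropy(white))
```

A perfectly regular series scores zero, while white noise scores high, so any component that looks irregular is pushed towards the "noise" side.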

22. Figure: Number of dropped detail components based on Shannon and Sample entropy using weekly data and rolling window, Source: Own figure



Table 6 shows the RMSE values for predicting the original log returns, for the expert judgement approach and for entropy based denoising. The best performing models changed: sample entropy based denoising is the most accurate for EMD, while dropping the first two components (D1 and D2) is the most accurate strategy for wavelet based decomposition. Using a rolling window and weekly data does not change the fact that wavelet based denoising strategies are more accurate than empirical mode decomposition based strategies. All the wavelet strategies achieve a lower RMSE than predicting the original log returns. Table 7 shows the corresponding direction statistics; by this measure, none of the denoising strategies beats predicting the original log returns (69.6%).


Original log returns: 4.569

Method     Drop 1   Drop 1-2   Drop 1-3   Drop 1-4   Shannon   Sample
EMD        7.468    7.049      6.986      6.133      5.365     5.217
Wavelet    4.056    4.045      4.048      4.054      4.052     4.102
DM test    22.23    19.38      18.73      14.72      12.68     12.74

(Drop 1 to Drop 1-4: expert judgement; Shannon, Sample: entropy based noise selection)

6. Table: Prediction performance of the denoising methods based on RMSE (multiplied by 100) using rolling window and weekly data, Source: Own table


Original log returns: 69.6%

Method     Drop 1   Drop 1-2   Drop 1-3   Drop 1-4   Shannon   Sample
EMD        58.0%    55.2%      57.07%     57.87%     57.5%     60.0%
Wavelet    63.47%   59.73%     64.93%     67.2%      68.4%     64.1%

(Drop 1 to Drop 1-4: expert judgement; Shannon, Sample: entropy based noise selection)

7. Table: Prediction performance of the denoising methods based on D_stat using rolling window and weekly data, Source: Own table

10. Conclusion

Given the available literature, the paper's contribution is threefold: (1) it introduced a general research framework that describes the possible research designs in decomposition based financial time series forecasting, (2) it provided a thorough literature review of the most important articles and classified them with the help of the general research framework, and (3) it compared PACF, entropy and expert judgement based noise selection methods in terms of their contribution to prediction accuracy.

The framework introduced in this paper has several advantages. It helps with formulating the research design; it helps researchers specify all the necessary details and parameters of their research design, thereby facilitating the reproduction of a paper; and it makes comparing and classifying research papers considerably easier, since it provides the necessary groupings for classification. The proposed framework facilitates the aggregation of results in the current literature and consequently helps us better understand the efficiency of signal processing techniques in financial time series analysis. Moreover, it paves the way for a meta-study in which the current results can be combined, and it helps determine the reliability of the results presented in a research paper. The framework fills a gap in the current literature and opens up opportunities for further research.

This paper provided a thorough literature review on decomposition based financial time series forecasting. All the reviewed papers apply a decomposition method from the wavelet or the empirical mode decomposition family and analyze oil price or foreign exchange data. The main purpose of the literature review was to describe the trends in financial time series forecasting, focusing particularly on characteristics such as the decomposition method, data, noise selection, reconstruction method, prediction models and the results of the analyses.

Finally, the paper compared PACF, entropy and expert judgement based noise selection methods, applied to the components obtained from empirical mode decomposition and wavelet decomposition. Dropping the highest frequency component was the best strategy for EMD, while Shannon entropy based noise selection resulted in the most accurate prediction for wavelet decomposition. However, none of the strategies performed better than predicting the original time series. A robustness check recalculated the decomposition strategies with slightly different settings: using weekly data and a rolling window, the wavelet based denoising strategies produced more accurate forecasts, in terms of RMSE, than predicting the original signal. This result emphasizes the sensitivity of denoising methods to the input data and the parameter settings.

The analysis also gives reason for concern. First, the threshold selection for empirical mode decomposition was highly sensitive. The convergence of the EMD algorithm is not stable: it frequently stopped because it failed to fit a spline. Moreover, the number of IMFs gradually increased with the expanding window, which makes the interpretation of the components more difficult. EMD is also time consuming: the daily decomposition with the expanding window took about 4-5 hours. Decomposition based forecasting strategies have shown promising results in the literature review; however, with the expanding window the decomposition strategies presented in this paper failed to beat the strategy that involved no decomposition. The situation was different when a rolling window was applied. Finding a proper noise selection method can enhance prediction performance, since it can help models train on the fundamental part of a signal and capture the most important factors.


References

A. Boukhayma, A. Peizerat, C. Enz, 2016, Noise Reduction Techniques and Scaling Effects towards Photon Counting CMOS Image Sensors, Sensors, 2016. Apr. 09.

A. Chen, M.T. Leung, H. Daouk, 2003, Application of neural networks to an emerging financial market: Forecasting and trading the Taiwan Stock Index, Computers & Operations Research, Vol. 30, 901-923. p.

A. Mirzaei, A. Ayatollahi, P. Gifani, L. Salehi, 2010, Spectral Entropy for Epileptic Seizures Detection, 2010 Second International Conference on Computational Intelligence

A. Lanza, M. Manera, M. Giovannini, 2005, Modeling and forecasting cointegrated relationships among heavy oil and product prices, Energy Economics, Vol. 27, 831-848. p.

A. Rua, 2012, Wavelets in economics, Banco de Portugal Economic Bulletin, Vol. 18, No. 2, 71-79. p.

A.C. Smith, P. Monaghan, F. Huettig, 2017, The multimodal nature of spoken word processing in the visual world: Testing the predictions of alternative models of multimodal integration, Journal of Memory and Language, Vol. 93, 2017 April, 276-303. p.

A. S. Berres, T. L. Turton, M. Petersen, D. H. Rogers, J.P. Ahrens, 2017, Video Compression for Ocean Simulation Image Databases, Workshop on Visualisation in Environmental Sciences

BP, 2018, Statistical Review of World Energy, 2018 June, 67th edition [Link: https://www.bp.com/content/dam/bp/business-sites/en/global/corporate/pdfs/energy-economics/statistical-review/bp-stats-review-2018-full-report.pdf]

B. Zhu, X. Shi, J. Chevallier, P. Wang, Y-M. Wei, 2016, An Adaptive Multiscale Ensemble Learning Paradigm for Nonstationary and Nonlinear Energy Price Time Series Forecasting, Journal of Forecasting

C. Bandt, B. Pompe, 2002, Permutation Entropy: A Natural Complexity Measure for Time Series, Physical Review Letters, Vol. 88, No. 17

C-S. Lin, S-H. Chiu, T-Y. Lin, 2012, Empirical mode decomposition-based least squares support vector regression for foreign exchange rate forecasting, Economic Modelling, Vol. 29, 2583-2590. p.

C-Y. Tseng, H.C. Lee, 2010, Entropic interpretation of empirical mode decomposition and its applications in signal decomposition, Advances in Adaptive Data Analysis, Vol. 2, No. 4, 429-449. p.

FIA, 2018, Total 2017 volume 25.2 billion contracts, down 0.1% from 2016, 2018. Jan. 24. [Link: https://fia.org/articles/total-2017-volume-252-billion-contracts-down-01-2016]


J.L. Zhang, Y.J. Zhang, L. Zhang, 2015, A novel hybrid model for crude oil price forecasting, Energy Economics, Vol. 49, 2015. May, 649-659. p.

N. Krichene, 2007, Recent Dynamics of Crude Oil Prices, International Monetary Fund, Working Paper December 2006

N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung, H.H. Liu, 1998, The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis, Proceedings of the Royal Society A: Mathematical, Physical & Engineering Sciences, Vol. 454, 903-995. p.

L. Juvenal, I. Petrella, 2014, Speculation in the oil market, Journal of Applied Econometrics, Vol. 30, 2015 June/July, 621-649. p.

L. Yu, Z. Wang, L. Tang, 2015, A decomposition-ensemble model with data-characteristic-driven reconstruction for crude oil price forecasting, Applied Energy, Vol. 156, 251-267. p.

L. Yu, S. Wang, K.K. Lai, 2008, Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm, Energy Economics, Vol. 30, 2623-2635. p.

L. Yu, W. Dai, L. Tang, 2016, A novel decomposition ensemble model with extended extreme learning machine for crude oil price forecasting, Engineering Applications of Artificial Intelligence, Article in Press

M. Khashei, M. Bijari, 2010, An artificial neural network (p, d, q) model for timeseries forecasting, Expert Systems with Applications, Vol. 37, 479-489. p.

G. Uliha, 2016, Az olajár és a makrogazdaság kapcsolatának elemzése folytonos wavelet transzformáció segítségével, Statisztikai Szemle, Vol. 94, No. 5, 505-534. p.

G.C. Watkins, A. Plourde, 1994, How volatile are crude oil prices?, OPEC Review, Vol. 18, 220-245. p.

G.E.P. Box, G. Jenkins, 1970, Time Series Analysis: Forecasting and Control, Holden-Day, San Francisco, CA.

G.P. Zhang, B. E. Patuwo, M.Y. Hu, 2001, A simulation study of artificial neural networks for nonlinear time-series forecasting, Computers & Operations Research, Vol. 28, 381-396. p.

H. Su, Q. Liu, J. Li, 2012, Boundary Effects Reduction in Wavelet Transform for Time-frequency Analysis, WSEAS Transactions on Signal Processing, Vol. 8, Issue 4, 169-179. p.

H-Y. Zhang, Q. Ji, Y. Fan, 2015, What drives the formation of global oil trade patterns, Energy Economics, Vol. 49, 2015. March

I. Daubechies, 1992, Ten Lectures on Wavelets, Regional Conference Series in Applied Mathematics (SIAM), Vol. 61, Society for Industrial and Applied Mathematics, Philadelphia, USA


J.S. Richman, J.R. Moorman, 2000, Physiological time-series analysis using approximate entropy and sample entropy, American Journal of Physiology, Vol. 278, 2039-2049. p.

K-J. Kim, 2003, Financial time series forecasting using support vector machines, Neurocomputing, Vol. 55, Issues 1-2, 307-319. p.

M. Riedl, A. Müller, N. Wessel, 2013, Practical consideration of permutation entropy, The European Physical Journal Special Topics, Vol. 222, June 2013, 249-262. p.

N. Nomikos, K. Andriosopoulos, 2012, Modelling energy spot prices: empirical evidence from NYMEX, Energy Economics, Vol. 34, 1153-1169. p.

Q. Guan, H. An, X. Gao, S. Huang, H. Li, 2016, Estimating potential trade links in the international crude oil trade: A link prediction approach, Energy, Vol. 102, 406-415. p.

R. Jammazi, C. Aloui, 2012, Crude oil price forecasting: Experimental evidence from wavelet decomposition and neural network modeling, Energy Economics, Vol. 34, 828-841. p.

R. D.F. Harris, F. Yilmaz, 2009, A momentum trading strategy based on the low frequency component of the exchange rate, Journal of Banking and Finance, Vol. 33, 1575-1585. p.

S. Bekiros, M. Marcellino, The multiscale causal dynamics of foreign exchange markets, Journal of International Money and Finance, Vol. 33, 282-305. p.

S. Lahmiri, A variational mode decomposition approach for analysis and forecasting of economic and financial time series, Expert Systems with Applications, Vol. 55, 268-273. p.

S. Mirmirani, H.C. Li, 2005, A comparison of VAR and neural networks with genetic algorithm in forecasting price of oil, Advances in Econometrics, Vol. 19, 203-223. p.

S. Yousefi, I. Weinreich, D. Reinarz, 2005, Wavelet based prediction of oil prices, Chaos, Solitons and Fractals, 265-275. p.

L. Tang, L. Yu, K.J. He, 2014, A novel data-characteristic-driven modeling methodology for nuclear energy consumption forecasting, Applied Energy, Vol. 128, 1-14. p.

T. Xiong, Y. Bao, Z. Hu, 2013, Beyond one-step-ahead forecasting: Evaluation of alternative multi-step-ahead forecasting models for crude oil prices, Energy Economics, Vol. 40, 405-415. p.

T. Xiong, Y. Bao, Z. Hu, 2014, Does restraining end effect matter in EMD-based modeling framework for time series prediction? Some experimental evidences, Neurocomputing, Vol. 123, 174-184. p.

W. Shu-ping, H. Ai-mei, W. Zhen-xin, L. Ya-qing, B. Xiao-wei, 2014, Multiscale Combined Model Based on Run-Length-Judgment Method and Its Application in Oil Price Forecasting, Hindawi Publishing Corporation

Y. Baimbetov, I. Khalil, M. Steinbauer, G. Anderst-Kotsis, 2015, Using Big Data for Emotionally Intelligent Mobile Services through Multi-Modal Emotion Recognition, In: Geissbühler A., Demongeot J., Mokhtari M., Abdulrazak B., Aloulou H. (eds) Inclusive Smart Cities and e-Health. ICOST 2015. Lecture Notes in Computer Science, Vol. 9102, Springer, Cham

Y. Deng, W. Wang, C. Qian, Z. Wang, D. Dai, 2001, Boundary-processing-technique in EMD method and Hilbert transform, Chinese Science Bulletin, Vol. 46, 954-960. p.

Y. Xiang, H.X. Zhuang, 2013, Application of ARIMA model in short-term prediction of international crude oil price, Advanced Materials Research, Vol. 798, 979-982. p.

Z. Guo, W. Zhao, H. Lu, J. Wang, 2012, Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model, Renewable Energy, Vol. 37, 241-249. p.

Z. Wu, N.E. Huang, 2009, Ensemble empirical mode decomposition: a noise assisted data analysis method, Advances in Adaptive Data Analysis, Vol. 1, 1-41. p.
