A Hybrid Prediction Method for Stock Price Using LSTM and

Research ArticleA Hybrid Prediction Method for Stock Price Using LSTM andEnsemble EMD

Yang Yujun 123 Yang Yimei 12 and Xiao Jianhua12

1School of Computer Science and Engineering Huaihua University Huaihua 418008 China2Key Laboratory of Intelligent Control Technology for Wuling-Mountain Ecological Agriculture in Hunan ProvinceHuaihua 418000 China3Key Laboratory of Wuling-Mountain Health Big Data Intelligent Processing and Application in Hunan Province UniversitiesHuaihua 418000 China

Correspondence should be addressed to Yang Yimei yym1630163com

Received 24 July 2020 Revised 13 September 2020 Accepted 21 November 2020 Published 4 December 2020

Academic Editor Cheng Lu

Copyright copy 2020 Yang Yujun et al )is is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

)e stock market is a chaotic complex and dynamic financial market )e prediction of future stock prices is a concern andcontroversial research issue for researchers More and more analysis and prediction methods are proposed by researchers Weproposed a hybrid method for the prediction of future stock prices using LSTM and ensemble EMD in this paper We usecomprehensive EMD to decompose the complex original stock price time series into several subsequences which are smoothermore regular and stable than the original time series )en we use the LSTM method to train and predict each subsequenceFinally we obtained the prediction values of the original stock price time series by fused the prediction values of severalsubsequences In the experiment we selected five data to fully test the performance of the method)e comparison results with theother four predictionmethods show that the predicted values show higher accuracy)e hybrid predictionmethod we proposed iseffective and accurate in future stock price prediction Hence the hybrid prediction method has practical application andreference value

1 Introduction

According to the statistics of China Securities Depositoryand Clearing Corporation as of March 2020 there are 1633million securities investors in China Stock price forecastingis a difficult and meaningful task for financial institutionsand private investors In order to effectively reduce in-vestment risks and obtain stable returns on investmentmany scholars put forward a large number of predictionmodels [1ndash9] With the speedy development of big dataapplication technology especially the application of ma-chine learning and deep learning in the financial field it hasa profound impact on investors Research directions includelow-frequency data and high-frequency data [10] )eprevious research studies are mainly divided into two kindsof methods fundamental analysis and technical analysis[11]

On the one hand in technical analysis people widely usemathematical statistical techniques to analyze historicalstock price trends and predict recent stock prices In recentyears many researchers have applied a variety of machinelearning algorithms to analyze and predict stock prices suchas neural networks multicore learning [12] stepwise re-gression analysis [13] and deep learning [14 15] Althoughmany algorithms have achieved good results in certain as-pects there are many parameter configurations and dataselection problems in the use of machine learning which isstill an important area of research On the other hand in thefundamental analysis [16ndash18] people mainly use naturallanguage processing to analyze the companyrsquos financial newsand financial statements to predict the future stock pricetrend

)e long-short term memory (LSTM) is a very goodmethod in dealing with time series Stock price data belongs

HindawiComplexityVolume 2020 Article ID 6431712 16 pageshttpsdoiorg10115520206431712

to time-series data )erefore many researchers use LSTM[19ndash26] to analyze and predict stock prices Many studieshave analyzed the correlation between time-series data[27ndash30] and the results show that LSTM has advantages intime-series data In the literature [31] researchers usedLSTM to predict the coding unit split and the experimentalresults proved the advantages of LSTM in terms of efficiencyTime-series data trend research is also a new form of time-series data prediction so LSTM is a natural choice

Empirical mode decomposition (EMD) technology isusually applied to nonstationary and nonlinear signals EMDcan decompose nonlinear signals into several inherentmodal functions (IMF) adaptively EMD can effectivelysuppress continuous noise such as Gaussian noise [32]However EMD cannot suppress intermittent noise andmixed noise Ensemble empirical mode decomposition(EEMD) technology can solve the problem of noise-modemixing [33] In the EEMD algorithm a group of white noiseis first added to the original signal and then it is decomposedinto several IMF )e average value of the correspondingIMF set is regarded as the correct result EEMD will separatethe noise in different IMF from the original signal com-ponents [34] thus eliminating the noise-mode mixingphenomenon In recent years the application of EEMD hasattracted the attention of many researchers and scholars[27 35ndash44] In order to solve the problem that noise inpractical applications makes interference term retrievaldifficult Zhang et al [35] proposed a technique based onEEMD and EMD to achieve automatic interference termretrieval from the spectral domain low-coherence interfer-ometry )e proposed algorithm uses EEMD technology tomake the relative error of coupling strength less than 2 Tosolve the problem that Gaussian noise and non-Gaussiannoise seriously hinder the detection of rolling bearing defectsby traditional methods Jiang et al [36] proposed a newrolling bearing inspectionmethod that combines bispectrumanalysis with improved integrated EMD )is method usesensemble empirical mode decomposition technology to havesuperior performance in reducing multiple backgroundnoises and can more effectively detect defects in rollingbearings To solve the problem of the influence of the au-thenticity of the partial discharge signal on the evaluationaccuracy of the transformer insulation performance Wanget al [37] proposed a method to suppress white noise in PDsignals based on the integration of EMD and the combi-nation of high-order statistics )is method uses EEMDdecomposition to the threshold and reconstructs each IMFto suppress the white noise in each component To solve theproblem that most existing measurement methods onlyfocus on mathematical values and are affected by mea-surement errors interference and uncertainty Wei et al[38] proposed a new time-history comparison for vehiclesafety analysis by the integrating empirical mode decom-position method)is method uses EEMD decomposition tomake the trend signal to reflect the overall change and is notaffected by high-frequency interference To solve the difficultproblem of wind speed prediction Yang and Yang [39]proposed a hybrid BRR-EEMD short-term predictionmethod for wind speed based on the EMD and Bayesian

ridge regression (BRR) )is hybrid method uses theBayesian regression method and the EEMD to performregression prediction on each subsequence decomposed bythe EEMD and obtains good results in wind speed pre-diction In order to find potential profit arbitrage oppor-tunities when the returns of stock index futures contractsand stock index futures contracts continue to deviate fromfair prices in irrational and nonefficiently operating marketsSun and Sheng [40] proposed a time-series analysis methodbased on integrated EMD )is method uses EEMD toanalyze the stock futures basis sequence and extracts amonotonically decreasing trend from the sequence to dis-cover business opportunities To improve the problem that asingle method of predicting complex and nonlinear stockprices cannot achieve good results Al-Hnaity and Abbod[41] proposed a hybrid integrated model based on ensembleempirical mode decomposition and backpropagation neuralnetwork to predict the closing price of the stock index )eresearchers [41] have proposed five hybrid predictionmodels ensemble EMD-NN ensemble EMD-Bagging-NNensemble EMD-Crossvalidation-NN ensemble EMD-CV-Bagging-NN and ensemble EMD-NN )e experimentalresults show that the performance of the ensemble EMD-CV-Bagging-NN ensemble EMD-Crossvalidation-NN andensemble EMD-Bagging-NN models based on ensembleEMD are all a grade higher than that of the ensemble EMD-NN model and significantly higher than the single neuralnetwork model

)e typical forecasting scheme is based on the forecast ofthe time-series data itself and does not deal with the time-series data itself It has become a challenge that how tocombine the existing forecasting methods to improve theforecasting effect by decomposing the time-series data )eabove methods use EEMD to decompose the time-seriesdata to improve the performance of the algorithm How toeffectively decompose complex and nonlinear stock time-series data for prediction has been puzzled by many re-searchers Due to the uncertainty and nonlinearity of thestock time series the deviation of a single method to predictstock prices is generally relatively large)e abovementionedhybrid method does indeed improve the algorithm signifi-cantly )erefore it can be boldly guessed that the hybridmethod generally can get better prediction results than thesingle specific method Besides the original complex timeseries was decomposed by the EEMD method into severalrelatively stable subsequence time series By effectivelycombining several current effective forecasting methods theforecasting results of relatively stable subsequence timeseries are theoretically better Combining the features of theEEMD method based on the improved empirical modedecomposition method and the LSTM machine learningalgorithm this paper proposes a hybrid LSTM-EEMDmethod for stock index price prediction

)e rest of this paper is organized as follows )reerelated terminology such as EMD EEMD and BRR arepresented in Section 2 while Section 3 briefly introduces theflowchart of our proposed LSTM-EEMD method and thestructure of our proposed hybrid LSTM-EEMD method InSection 4 we describe the experiment data collection

2 Complexity

experiment data preprocessing and modeling processing InSection 5 we describe the experimental results of ourproposed hybrid LSTM-EEMD method and analyze theresults of simulation experiment of our proposed hybridLSTM-EEMDmethod for prediction Finally the conclusionof this paper and some future works are described in Section8

2 Related Works

Over the years many studies in the financial field havefocused on the problems of stock price prediction )esestudies mainly focus on three important research directions(1) based on the machine learning method (2) based on thetime-series analysis method and (3) based on the hybridmethod Below we first briefly introduce related termi-nology )en we introduce the LSTM and EEMD related tothis study

21 Stock Price Predicting Stock prices are a highly volatiletime series Stock prices are affected by various factors suchas national policies interest rates exchange rates industrynews inflation monetary policies temporary events in-vestor sentiment and human intervention Predicting stockprices on the surface requires the establishment of a model ofthe relationship between stock prices and these factorsAlthough these factors will temporarily change the stockprice in essence these factors will be reflected in the stockprice and will not change the long-term trend of the stockprice )erefore stock prices can be predicted simply withhistorical data

)is paper believes that there are many studies using asingle analysis method to predict stock market trends butthe results are not good Need to consider a variety of factorsor use a variety of techniques to build a hybrid model tofurther explore the prediction of stock prices In-depthsystematic research is required to answer the followingresearch questions

RQ1 Which factors or combinations of factors mostaffect the trend of the stockRQ2 What kind of analysis technology combination ismost suitable for stock trend predictionRQ3 Do we need to use deep learning methods to minedata in order to better discover the internal relationshipbetween the stock market and influencing factorsRQ4 Whether the predictability of the analysis modeldepends on specific stock company characteristicssuch as the domain shareholder background andpoliciesRQ5 In the context of stockmarket forecasting have wedeveloped some effective forecasting methodsRQ6 We should focus on the analysis of the specificnature of the stock price itself rather than solvinggeneral relationship problems Whether the priceanalysis driven by influencing factors can be moreeffective

22 LSTM )e long-short term memory neural network isgenerally called LSTM )e LSTM was proposed byHochreiter and Schmidhuber [27] in 1997 )e LSTM is aspecial type of recurrent neural network (RNN) )e biggestfeature of the RNN which was improved and promoted byAlex Graves is that long-term dependent information of datacan be obtained LSTM has been widely used in many fieldsand has achieved considerable success in many problemsSince LSTM can remember the long-term information ofdata the design of LSTM can avoid the problem of long-term dependence Currently the LSTM is a very populartime-series forecasting model Below we first introduce theRNN network followed by the LSTM neural network

3 Preliminaries

31RecurrentNeuralNetwork When we deal with problemsrelated to the timeline of events such as speech recognitionsequence data machine translation and natural languageprocessing traditional neural networks are powerless RNNis specifically proposed to solve these problems Because thecorrelation between the contexts of the text needs to beconsidered in the word processing the weather conditions ofconsecutive days and the relationship between the weatherconditions of the day and the past days need to be consideredwhen predicting the weather

)e RNN has a chain form of repeating neural networkmodules In the standard RNN this repeated structuralmodule has only a very simple structure such as a tanh layer)e simple structure of the recurrent neural network isshown in Figure 1

)e design intent of RNN is to solve nonlinear problemswith timelines )e way of internal connection of the re-current neural network generally only feeds forward thedata but in the bidirectional recurrent neural network itallows the forward and backward directions to feedback thedata )e RNN has designed a feedback mechanism so theRNN can easily update the weight or residual value of theprevious step )e design of the feedback mechanism is verysuitable for time-series forecasting )e RNN can extractrules from historical data and then use the rules to predicttime series Figure 1 shows the simple structure of the RNNand Figure 2 shows the expanded diagram of the basicstructure of the RNN )e left side of the arrow with unfoldlabel in Figure 2 is the basic structure of the recurrent neuralnetwork )e right side of the arrow with unfold label inFigure 2 is a continuous 3-level expansion of the basicstructure of the recurrent neural network An input data xt isinput into module h of the RNN )e yt is an output ofmodule h of the RNN values at time t Like other neuralnetworks the recurrent neural network shares all parametersof each layer such as Whx Whh and Wyh in Figure 2 Asshown in Figure 2 the RNN shares two input parametersWhx and Whh and one output parameter Wyh As we allknow the number of parameters in each layer of a generalmultilayer neural network is different Looking at Figure 2we feel that the operation of each step of the recurrent neuralnetwork is the same on the surface In fact the output yt andthe input xt are different )e number of parameters of the

Complexity 3

recurrent neural network will be significantly reducedduring training After multilevel expansion the recurrentneural network becomes a multilayer neural networkLooking closely at Figure 2 we find that Whh between layerht-1 and ht is the same as Whh between layer ht and ht+1 inform In value and meaningWhh between layer ht-1 and ht isdifferent as Whh between layer ht and ht+1 Similarly Whxand Wyh have similar situations

Although each layer of the RNNneural network has outputand input modules the output and input modules of somelayers can be omitted in specific application scenarios Forexample in language translation we only need the overalllanguage symbol output after the last language symbol is inputand do not need to know the language symbol output after eachlanguage symbol is input )e main feature of RNN is self-expanding with multiple hidden layers

As we all know during network training the recurrentneural network models are prone to disappearing gradientsOnce the gradient of the model disappears completely thealgorithm enters an endless loop and the network trainingwill not end eventually leading to RNN paralysis )ereforesimple RNNs are prone to gradient disappearance problemsand are not suitable for long-term predicting

)e purpose of designing LSTM is to avoid or reduce theappearance of the problem of vanishing gradient whiledealing with long-term correlation time series by simpleRNN Based on the simple RNN the LSTM adds the outputgates the input gates and the forget gates In Figure 3 allthree gates are replaced by σ which can effectively preventthe gradient from being eliminated )erefore LSTM cansolve long-term dependence problems )e purpose of de-signing memory neurons is to store some LSTM importantinformation for state information In addition in generaleach gate has an activation function )is function performsnonlinear transformations or trade-offs on data Generallythe forget gate ft can filter some status information Equa-tions (1)ndash(6)fd6 associated with the LSTM neural network isshown below)ey are for the forget gate input gate inverseof the memory cell memory cell output gate and outputrespectively

ft σ Wf middot htminus 1 xt

1113872 1113873 + bf1113872 1113873 (1)

it σ Wi middot htminus 1 xt

1113872 1113873 + bi1113872 1113873 (2)

1113957Ct tanh WC middot htminus 1 xt

1113872 1113873 + bC1113872 1113873 (3)

Xt+1

ht+1

Xtndash1

htndash1

Xt

ht

tanhA A

Figure 1 )e simple structure of the RNN

Xtndash1

htndash1 ht+1ht

ytndash1 yt+1yt

Xt+1Xt

WhxWhxWhxWhx

Whh Whh Whh

WyhWyhWyhWyh

y

x

h Unfold

Figure 2 )e expanded diagram of the basic structure of the RNN

4 Complexity

Ct ft times Ctminus 1 + it times 1113957Ct (4)

ot σ Wo middot htminus 1 xt

1113872 1113873 + bo1113872 1113873 (5)

ht ot times tanh Ct( 1113857 (6)

32 EMD )e empirical mode decomposition (EMD)proposed by Huang et al in 1998 [42] is a widely usedadaptive time-frequency analysis method Empirical modedecomposition is an effective decomposition method fortime-series data Due to the common local features of time-series data the EMDmethod has extracted the required datafrom them and obtained very good results in applicationsfields Hence many scholars apply the EMD method inmany fields successfully )e prerequisite for EMD de-composition is the existence of the following three as-sumptions [43]

)e following briefly introduces the decompositionprocess

(1) Suppose there is a signal s(i) with the black line inFigure 4 )e extreme value of the signal is shown inred and blue dots Form the upper wrapping linethrough all the blue dots and form the lowerwrapping line through all the red dots

(2) Calculate the average value of the lower and upperwrapping lines to form an average purple-red linem(i) Here define the discrepancy line d(i) as

d(i) s(i) minus m(i) (7)

(3) Judge whether the discrepancy line d(i) is an IMFaccording to IMF judgment rules If the discrepancyline d(i) comply with IMF judgment rules the dis-crepancy line d(i) is the ith IMF f(i) Otherwise thediscrepancy line d(i) is considered the signal s(i) and

repeat two steps above until d(i) complies with IMFjudgment rules After this the IMF f(t) is defined as

f(t) d(t) t 1 2 3 n minus 1 (8)

(4) Calculate and get the IMF f(t) by

r(t) s(i) minus c(t) (9)

where r(t) is considered the residual signal(5) Repeat execution the four steps above N times until

running status meets stop conditions Obtain the NIMFs which meet with

r1 minus c2 r2

⋮

rNminus 1 minus cN rN

⎧⎪⎪⎨

⎪⎪⎩(10)

(1) Next level signal has to contain more than two ex-treme values one is the minimum value and theother is the maximum value

htndash1

Ctndash1

Xt

ht

otC~titft

Ct

yt

+

tanhσ σ σ

tanh

times

times times

Figure 3 )e basic structure of the LSTM neural network

IMF 1 iteration 0

ndash2

ndash1

0

1

2

20 30 40 50 60 70 80 90 100 110 12010

Figure 4 )e decomposition process chart of sequences

Complexity 5

(2) Determine the time scale of signal characteristicbased on the time difference between two extremevalues

(3) If the data has only an inflection point and no ex-tremum more times judgments are needed to revealthe extremum

Finally the following equation (11) expresses the com-position of the original signal s(i)

s(i) 1113944N

j1fj(t) + rj(t) (11)

33 EEMD A classical EMD has mode mixing problemswhen decomposing complex vibration signals To solve theabove problem Wu and Huang [44] proposed the EEMDmethod in 2009 )e EEMD method is short for ensembleempirical mode decomposition method EEMD is com-monly used for nonstationary signal decompositionHowever the EEMD method is significantly different fromWT transform and FFT transform Here WT is the wavelettransform and FFT is the fast Fourier transformWithout theneed for basis functions the EEMD method can decomposeany complex signal At the same time the EEMD methodcan decompose any signal into many IMFs Here IMF is theintrinsic modal function)e decomposed IMF componentscontain local different feature signals EEMD can decomposenonstationary data into multiple stable subdata and then useHilbert transform to get the time spectrum which hasimportant physical significance Comparative analysis withFFT and WT decomposition the EEMD has the charac-teristics of intuitive direct posterior and adaptive )eEEMD method has adaptive characteristics because of thelocal features of the time-series signal )e following brieflyintroduces the process of EEMD decomposing data

Assume that the EEMD will decompose the sequence XAccording to the steps of EEMD decomposition n subse-quences will be obtained after decomposition )ese nsubsequences include n minus 1 IMFs and one remaining sub-sequence Rn Here these n minus 1 IMFs are n minus 1 componentsubsequence Ci(i 1 2 n minus 1) of the original sequence X)ese n minus 1 IMFs are named IMFi(i 1 2 n minus 1) )eremaining subsequence Rn is sometimes named residualsubsequence )e detailed steps of using EEMD to de-compose the sequence are introduced as follows

(1) Suppose there is a signal s(i) with the black line inFigure 4 )e extreme value of the signal is shown inred and blue dots Form the upper wrapping linethrough all the blue dots and form the lowerwrapping line through all the red dots )e lowerwrapping line is shown in the red line in Figure 4And the upper wrapping line is the blue line inFigure 4

(2) Calculate the average value of the lower and upperwrapping lines to form a mean line m(i) which isshown in purple-red line in Figure 4

(3) Obtain the first component IMF1 of the signal s(i))e IMF1 is obtained by calculation formula d(i)

s(i) minus m(i) )is formula means the difference of theoriginal signal s(i) minus the mean line m(i) of thelower and upper wrapping lines

(4) Take the first component IMF1 d(i) as a new signals(i) and repeat execution the three steps above untilrunning status meets stop conditions )e followingdescribes the stop condition in detail

(a) )e average m(i) is approximately 0(b) )e number of signal lines d(i) passing through

zero points is greater than or equal to the numberof extreme points

(c) )e number of iterations reaches the setmaximum

(5) Take subsequence d(i) as the ith IMF fi (i 1 2 n minus 1) Obtain the residual R by calculating theformula r(i) s(i) minus d(i)

(6) Take the residual r(i) as the new signal s(i) to cal-culate the (i+1)th IMF and repeat execution the fivesteps above until running status meets stop condi-tions )e following describes the stop condition indetail

(a) )e signal s(i) has been completely decomposedand has obtained all IMF

(b) )e number of decomposition level reaches theset maximum

Finally the following equation (12) expresses the com-position of the original sequences and n subsequencesdecomposed by the EEMD

x(t) 1113944nminus 1

i1Ci( 1113857 + Rn i 1 2 n minus 1 (12)

where the number n of subsequences depends on thecomplexity of the original sequences Figure 5 shows the sintime series represented by equation (13) and the IMF dia-gram of it which is decomposed by the EEMD

x(t) sin(20πt) + 2lowast sin(200πt) + 3t (13)

where t 0 f 2f 1000f f 0001

4 Methodology

)e principle of our hybrid prediction method LSTM-EEMD for stock price based on LSTM and EEMD is in-troduced in detail in this section )ese theories are thetheoretical formation of our forecasting methods )e fol-lowing first introduces the flowchart the basic structure andthe process of the LSTM-EEMD hybrid stock index pre-diction method based on the ensemble empirical modedecomposition and the long-short term memory neuralnetwork Our proposed hybrid LSTM-EEMD predictionmethod first uses the EEMD to decompose the stock indexsequences into a few simple stable subsequences )en thepredict result of each subsequence is predicted by the LSTM

6 Complexity

method Finally the LSTM-EEMD obtains the final pre-diction result of the original stock index sequence by fusingall LSTM prediction results of several stock indexsubsequences

Figure 6 shows the structure and process of the LSTM-EEMD method )e basic structure and process of theLSTM-EEMD predict method include three modules )ethree modules are the EEMD decomposition module LSTMprediction module and fusion module Our proposedLSTM-EEMD prediction method includes three stages andeight steps Figure 7 shows three stages and eight steps of ourproposed method )e three stages of the proposed hybridLSTM-EEMD prediction method are input data modelpredict and evaluate model )e model evaluation stage andthe data input stage each include 4 steps and the modelprediction stage includes 3 steps )e hybrid LSTM-EEMDprediction method is introduced in detail as follows

(1) )e simulation data is generated And the real-worldstock index time-series data are collected )en theoriginal stock index time-series data are pre-processed to make the data format of stock indextime series satisfy the format requirements for de-composition of the EEMD Finally the input data Xof the LSTM-EEMD hybrid prediction method isformed

(2) )e input data X is decomposed into a few sequencesby the EEMD method If n subsequences are ob-tained then there are one residual subsequence Rnand n minus 1 subsequences )ese n subsequences are

expressed as Rn IMF1 IMF2 IMF3 IMFn minus 1respectively

(3) )e prediction process of any subsequence is notaffected by the prediction process of other subse-quences A LSTM model is established and trainedfor each subsequence Hence we need to build and

50

25

00

ndash25

2

0

ndash2

1

0

ndash1

3

2

1

000 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

Time (s)

EEMD decomposition of sin sequences

Sign

alIM

F 1

IMF

2R

Figure 5 )e sequences of sin signal decomposed by the EEMD

Fusion

EEMD

Rn

LSTM1

Using LSTM predict for IMFi and Rn

SubP1 SubP2 SubPn-1

Stock index sequence input data

Prediction results totalP

SubPn

LSTM2 LSTMn-1

IMFn-1IMF2IMF1

LSTMn

Figure 6 )e structure and process of the LSTM-EEMD method

Complexity 7

train n LSTM for n independent subsequences)esen independent LSTM models are namedLSTMk(k 1 2 n minus 1 n) respectivelyWe use then LSTM models to predict these n independentsubsequences and get n prediction values of the stockindex time series )ese n prediction values arenamed SubPk (k 1 2 n minus 1 n) respectively

(4) Fusion function is the core of hybrid method Atpresent there are many fusion functions such assum weighted sum weighted product and so on)e function of these fusion functions is to mergeseveral results into the final result In this paper theproposed hybrid stock prediction method selectedthe weighted sum as the fusion function )eweighted results of all subsequences are accumulatedto form the final prediction result for the originalstock index data )e weight here can be presetaccording to the actual application In this paper weuse the same weight of each subsequence and theweight of each subsequence is 1

(5) Finally we compare the predicted values with theactual value of stock index time-series data sequenceand calculate the values of RMSE MAE and R2 Weuse three evaluation criteria of the RMSE MAE and

R2 to evaluate the LSTM-EEMD hybrid predictionmethod According to these evaluation values thepros and cons of the method can be judged

Figure 7 shows the predict progress and data flowchart ofthe proposed LSTM-EEMD method in this paper )epredict progress in Figure 7 can be introduced in 3 stages)e three stages are input data model predict and evaluatemodel )e stage of input data is divided into 4 steps )efour steps are collect data preprocess data decompose databy the EEMD and generate n sequence )ere are n LSTMmodel in the stage of model predict )e input data of theLSTM model is n sequences generated in the previous stage)ese LSTMmodels separately predict n sequences to obtainn prediction results )e stage of the evaluate model is alsodivided into 4 steps )e first step is to fuse n predictedvalues with weights In this paper we choose weightedaddition as the fusion function )e weighted addition fu-sion function sums the n prediction value with certainweights )e output result of the weighted addition fusionfunction is the prediction result of the stock index time-series data Finally we need to calculate the value of R2MAE and RMSE of the prediction results before evaluatingthe proposed hybrid prediction model )e quality of theproposed hybrid prediction model can directly evaluated bythe values of R2 MAE and RMSE

5 Experiment Data

)e experiment data in our research of this paper is in-troduced in detail in this part We selected two types ofexperiment data to test in this paper in order to betterevaluate the prediction effect of this method)e first type ofexperiment data is artificially simulated data generatedautomatically by computer )e correctness and effective-ness of our method are verified by these artificially simulateddata )e second type of experimental data is real-worldstock index data Only the actual data of the society canreally test the quality of the method )e model testedthrough social actual data is the most fundamental re-quirement for applying our proposed method to some real-world fields

51 Artificially Simulated Experimental Data We use arti-ficially simulated experiment data to verify the effectivenessand correctness of our method To get effective and accurateexperiment result the artificially generated simulation ex-periment data should be long enough Hence in the ex-periment we choose the length of the artificially simulatedexperiment data to be 10000 )e artificially simulatedexperiment data is created by the computer according to thesin function of formula (13)

52 Real-World Experimental Data In order to empiricallystudy the effect of the proposed prediction method wecollected stock indices data in the real-world stock field asexperiment data from Yahoo Finance To obtain more ob-jective experimental results we choose 4 stock indices from

hellip

(1) Collection data

(2) Preprocess data

(3) EEMD decompose data

(4) Obtain IMFi Rn of n subsequence

hellip

hellip

Mod

el ev

alua

tion

Mod

el p

redi

ctD

ata i

nput

(5) Fusion

(6) Obtain prediction results totalP

(7) Obtain RMSE MAE R2 of totalP

(8) Model evaluation results

IMF1 IMF2

LSTM1

SubP1 SubP2

LSTM2

SubPnndash1

LSTMnndash1

IMFnndash1 Rn

LSTMn

SubPn

totalP = sum ni=1 (ki lowast SubPi)

Figure 7 )e data flowchart of our proposed hybrid method

8 Complexity

different countries or regions)ese stock indices are all veryimportant stock indices in the worldrsquos financial field

)e four stock indices in this experiment are ASX DAXHSI and SP500 )e SP500 is used as an abbreviation of theStandardampPoorrsquos 500 in this paper )e SP500 is a US stockmarket index )is stock index is a synthesis of the stockindexes of 500 companies listed on NASDAQ and NYSE)e HSI is used as an abbreviation of the Hang Seng Index)e HSI is an important stock index for Hong Kong inChina )e German DAX is used as an abbreviation of theDeutscher Aktienindex in Germany in Europe )e DAXstock index is compiled by the Frankfurt Stock ExchangeDAX company consists of 30major German companies so itis a blue chip market index )e ASX is used as an abbre-viation of Australian Securities Exchange and is a stockindex compiled by Australia

)e datasets of every stock index include five dailyproperties )ese five properties are transaction volumehighest price lowest price closing price and opening price)e data time interval of each stock index is from theopening date of the stock to May 18 2020

53 Data Preprocessing In order to obtain good experimentresults we try our best to deal with all the wrong or incorrectdata in the experiment data Of course more experimentaldata is also an important condition for obtaining goodexperimental results Incorrect data mainly include recordswith zero trading volume and records with exactly the samedata for two and more consecutive days After removingwrong or incorrect noise data from original data we showthe trend of the close price for the four major stock indexesin Figure 8

In the experiment we usually first standardize the ex-perimental data Let Xi xi(t) be the ith stock time-seriesindex at time t where t 1 2 3 T and i 1 2 3 NWe define the daily return logarithmic as shown in theformula Gi(t) log(xi(t)) minus log(xi(t minus 1)) We define the dailyreturn standardization as shown in the formulaRi(t) (Gi(t) minus langGi(t)rang)δ where langGi(t)rang is the meanvalues of the daily return logarithmic Gi(t) and δ is thestandard deviation of the daily return logarithmic Gi(t)

δ

1n minus 1

1113944

n

i1

11139741113972

xi minus x( 11138572

(14)

where x 1n 1113936ni1 xi

6 Experiment Results of Other Methods

In order to compare the performance of other forecastingmethods we comparatively studied and analyzed the resultsof other three forecasting methods on the same data in theexperiment Table 1 shows the experiment results of fiveprediction methods Since the LSTM is introduced above itwill not be repeated here Firstly SVR BARDR and KNR arebriefly introduced )en the experimental results of thesethree methods are analyzed in detail

61 SVR SVR is used as an abbreviation of the SupportVector Regression )e SVR is a widely used regressionmethod Refer to the manual of the libsvm toolbox the lossand penalty function control the training process of machinelearning )e libsvm toolbox is support vector machinetoolbox software and this toolbox is mainly used for SVMpattern recognition and regression software package In theexperiment the SVR used was a linear kernel which wasimplemented with liblinear instead of libsvm )e SVRshould be extended to large number of samples)e SVR canchoose a variety of penalty functions or loss functions SVRhas 10 parameters and the settings of these parameters areshown in Table 2

62 BARDR BARDR is short for Bayesian ARD Regressionwhich is used to fit the weight of the regression model )ismethod assumes that the weight of the regression modelconforms to the Gaussian distribution To better fit theregression model weights the BARDR uses the ARD priortechnique We assume the distribution of the regressionmodel weights conforms to the Gaussian distribution )eparameter alpha and parameter lambda of BARDR are theprecision of the noise distribution and the precisions of theweights distributions respectively BARDR has 12 param-eters Table 3 shows the settings of these parameters

63 KNR KNR is used as an abbreviation of the K-nearestNeighbors Regression )e K-nearest Neighbors Regressionmodel is also a parameterless model )e K-nearestNeighbors Regression just uses the target value of the K-nearest training samples to make a decision on the re-gression value of the sample to be tested )at is predict theregression value based on the similarity of the sample Ta-ble 4 shows the KNR 8 parameters and the settings of theseparameters

)e experimental results of other four methods and ourproposed two methods for five sequences of sin SP500 HSIDAX and ASX are shown in Table 1 In the experiment wepreprocessed the five sequences for those methods Table 5shows the length of the experimental data Table 1 shows theexperiment values of R2 (R Squared) MAE (Mean AbsoluteError) and RMSE (Root Mean Square Error) According tothe real results and the predicted results we try to get smallerexperimental value of MAE RMSE and R2 )e smaller theresult of MAE or RMSE is the better experimental values ofthe method are However the larger the result of R2 is thebetter the prediction effect of the method is

For comparison we show the top two results of eachsequences in bold as shown in Table 1 Among the abovefour traditional methods (SVR BARDR KNR and LSTM)for predicting sin artificial sequences data BARDR andLSTM are found to be the best methods When predictingSP500 HSI and DAX real time-series data the BARDRmethod and LSTM method have the best results Howeverthe BARDR method and SVR method have the best resultswhile predicting ASX real time-series data )ereforeamong the above four traditional methods the BARDR

Complexity 9

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400

Figure 8 )e trend of the four major stock indexes

Table 1 Prediction results of time seriesMethod Sin SP500 HSI DAX ASX

SVRRMSE 0282409 1219904783 1334991215 0653453 0206921MAE 0238586 990639020 1282862498 0415796 0146022R2 0968293 minus 1665447 minus 1088731 0938262 0959223

BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271

KNRRMSE 0567807 1198484251 553641943 0740763 0303352MAE 0443162 937309021 452300293 0445212 0186848R2 0871824 minus 1572662 minus 1239516 0920661 0912361

LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165

Proposed LSTM-EMDRMSE 0032478 76562082 78487994 0403972 0066237MAE 0026175 47317009 71062834 0217733 0045643R2 0999569 0985523 0987594 0979444 0996174

Proposed LSTM-EEMDRMSE 0201358 48878331 48783906 0324303 0101511MAE 0164348 38556500 38564812 0247722 0081644R2 0982980 0994171 0988451 0986832 0991071

10 Complexity

method and LSTM method are the best methods for the fivesequences data

Carefully observed Table 1 we found that the remainingfive methods in addition to the KNR all have very goodprediction effects on sin sequences )e R2 evaluation in-dexes of SVR BARDR LSTM LSTM-EMD and LSTM-EEMD methods are all greater than 096 BARDR LSTMand LSTM-EMD have the best prediction effect and their R2

evaluation indexes are all greater than 099 Among themBARDR has the best prediction effect and its R2 evaluationvalue is greater than 09999 )ese values show that themethod has better prediction effect for the time series ofchange regularity and stability especially the BARDRmethod Here the result of choosing one of the six methodsshows the prediction effect in Figure 9 Tomake the resultingmap clear and legible the resulting graph of Figure 9 displaysonly the last 90 prediction results

Observing the SVR experiment results in Table 1 wefound that the prediction values of this method on DAXASX and sin is better than that of SP500 and HSI time-seriesdata Figure 10 shows the prediction results of the SVR onASX which has a better prediction effect on ASX stock dataBecause the change of sin sequence has good regularity andstability the SVR method is more suitable to predict se-quence with good regularity and stability It can be specu-lated that the DAX and ASX time-series data have goodregularity and stability However the SVR method predictsSP500 and HSI sequences data )e prediction effect is very

poor)is shows that SP500 andHSI sequences data changeshave poor regularity and stability

Observing the experiment results of the KNR in Table 1we found this method has similar performance to the SVRmethod )e prediction results on DAX ASX and sin se-quence are better than that on SP500 and HSI sequenceEspecially for stock SP500 and HSI sequence the predictioneffect is poor )e stock time series is greatly affected bypeople activities so the changes in stock sequence arecomplicated irregular and unstable It can be inferred thatthe KNR is not suitable to predict the stock sequence so theKNR is not suitable to predict sequence predictions withunstable and irregular changes

In Table 1 observing the experiment results of theBARDR and LSTM we found the performance of theBARDR and LSTM methods is relatively good )ese twomethods not only have better prediction effects on DAXASX and sin time-series data but also have more significantprediction effects on SP500 and HSI sequence In particularthe prediction values of stock SP500 and HSI sequences aremuch better than that of KNR and SVR methods )echanges of stock sequence data are irregular complex andunstable but the prediction effects of the BARDR and LSTMare still very good It can be concluded that the BARDR andLSTM methods can predict stock time series so the BARDRand LSTM are more suitable to predict sequence predictionswith unstable and irregular changes Figure 11 shows theprediction results of the BARDR for DAX time-series datawhich has a better prediction effect on DAX stock time-series data

In Table 1 observing the experiment results of theBARDR and LSTM we found these two methods have goodprediction effects on the five different time-series dataprovided by the experiment By comparing their experi-mental results it is easy to find that the performance of theBARDR is better than the LSTM method Except that theLSTM is a little better than the BARDR in the SP500 time-series prediction effect the LSTM is worse than the BARDRin the other four time-series prediction effects Although theBARDR is better than the LSTM in predicting performancethe BARDR method is several times longer than the LSTMmethod in experimental time In addition although in ourexperiments the prediction values of the LSTM is worsethan the BARDRmethod the experimental time is short andthe training parameters used are still very few In futurework we may be able to improve performance of ourproposed methods by increasing the number of neurons andthe number of training iterations of the LSTM methodBased on the principle analysis we think the predictive effectof the LSTM will exceed that of the BARDR method byreasonably increasing the number of neurons and thenumber of training iterations

7 Experiments

We selected two types of experiment data to test in this paperin order to better evaluate the prediction effect of thismethod )e first type of experiment data is artificiallysimulated data generated automatically by computer )e

Table 2 )e value of eight parameters of SVRParameter name Parameter type Parameter valueepsilon Float 00Tol Float 1e minus 4C Float 10

Loss ldquoepsilonrdquo orldquosquared_epsilonrdquo Epsilon

fit_intercept Bool Trueintercept_scaling Float 10Dual Bool TrueVerbose Int 1random_state Int 0max_iter Int 1000

Table 3 )e value of eight parameters of BARDRParameter name Parameter type Parameter valuen_iter Int 300Tol Float 1e minus 3alpha_1 Float 1e minus 6alpha_2 Float 1e minus 6lambda_1 Float 1e minus 6lambda_2 Float 1e minus 6Compute_score Bool False)reshold_lambda Float 10000fit_intercept Bool TrueNormalize Bool Falsecopy_X Bool TrueVerbose Bool False

Complexity 11

second type of experimental data is real-world stock indexdata )e model tested through social actual data is the mostfundamental requirement for applying our proposed

method to some real-world fields )is section conductsdetailed research and analysis on the experiment results oftwo aspects

71 Analysis of Experimental Results Based on ArtificialSimulation Data We use artificially simulated experimentdata to verify the effectiveness and correctness of ourmethod To get effective and accurate experiment result theartificially generated simulation experiment data should belong enough Hence in the experiment we choose the lengthof the artificially simulated experiment data to be 10000 asshown in Table 5

Before analyzing the experiment results of real-worldstock index data we first use the proposed LSTM-EMD andLSTM-EEMD methods to predict sin simulation sequence)e simulation experiment can verify the effectiveness andcorrectness of the two proposed methods

Observing the sin data column of Table 1 we found thethree indicators of the LSTM method LSTM-EMD methodand LSTM-EEMDmethod are basically equivalent under thesame number of neurons and iterations )e three resultsindicated that these three methods predict effects are similarto the sin simulation time-series data Among them theLSTM-EMD method has the best prediction effect indi-cating that the method we proposed is effective and canimprove the prediction effect of the experiment Observing

Table 4 )e value of eight parameters of KNRParameter name Parameter type Parameter valuen_neighbors Int 5Weights ldquoUniformrdquo or ldquodistancerdquo UniformAlgorithm ldquoAutordquo ldquoball_treerdquo ldquokd_treerdquo ldquobruterdquo Autoleaf_size Int 30p Int 2Metric String or callable Murkowskimetric_params Dict Nonen_jobs Int 1

Table 5 )e length of the experiment dataName of data Sin SP500 HSI DAX ASXAll data length 10000 23204 8237 1401 4937Train data length 9000 20884 7413 1261 4443Test data length 1000 2320 824 140 494

0

1

2

3

4

5

6

4020 80600

RealPredict

Figure 9 Prediction results of the LSTMmethod for sin sequence

200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict

Figure 10 Prediction results of SVR for ASX sequence

200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict

Figure 11 Prediction results of BARDR for DAX sequence

12 Complexity

the sin data column in Table 1 we found the experimentalresults of the LSTM are a little better than the LSTM-EEMDbut the gap is very small and can be almost ignored After in-depth analysis this is actually that the sin time-series dataitself is very regular By adding noise data and then usingEEMD decomposition the number of subtime series de-composition becomes more and more complicated so theprediction effect is slightly worse In summary the twoproposed prediction methods LSTM-EMD and LSTM-EEMD in this paper have a better comprehensive effect thanthe original LSTMmethod indicating that the two proposedprediction methods are correct and effective

72 Analysis of Experiment Results Based on Real DataIn order to verify the practicability of the proposed LSTM-EMD and LSTM-EEMD prediction methods we use the twomethods proposed in this paper to predict four real-worldstock index time series in the financial market )e fourtime-series are the DAX ASX SP500 and HSI )e stockindex time series is generated in human society and reflectssocial phenomena )e time series of stock indexes in thefinancial market is seriously affected by human factors so itis complex and unstable )ese four time-series are veryrepresentative )ey come from four different countries orregions and can better reflect the changes in differentcountries in the world financial market

By comparing the three evaluation indicators of RMSEMAE and R2 of the LSTM LSTM-EMD and LSTM-EEMDprediction methods in Table 1 we found the predictionresults of the LSTM-EEMD prediction method in the foursequence data are better than the LSTM-EMD predictionmethod )e experimental results show that the LSTM-EEMD prediction method is better than the LSTM-EMDprediction method Figure 12 shows the prediction results ofthe LSTM-EMD method and the LSTM-EEMD method ofHSI time series

By observing Table 1 it is easy to find that the results ofthe proposed LSTM-EMD and LSTM-EEMDmethods in thethree sequences of HSI DAX and ASX are much better thanthe traditional LSTM method We think that there are twomain reasons for obtaining such good experimental resultsOn the one hand the method proposed in this paper de-composes HSI DAX and ASX time series by EMD or EEMDso that the complex original HSI DAX and ASX time seriesare decomposed into multiple more regular and stablesubtime series )e multiple subtime series are easier topredict than the complex original time series )erefore thepredicting results of multiple subtime series is more accuratethan the predicting results of directly predicting complexoriginal time series On the other hand although the changesof HSI DAX and ASX time series are complicated they canbe decomposed into more regular and stable time series Inorder to clearly understand the changes in HSI DAX andASX time series and EMD or EEMD decomposition Fig-ures 13 and 14 show all changes in EMD or EEMD de-composition of DAX time series respectively Figure 8 is theoriginal change of the DAX time series and all subgraphs inFigures 13 and 14 are the EMD or EEMD decomposition

subgraphs of the DAX time series It is easy to see that thelower EMD or EEMD decomposition subgraphs are gentlermore regular more stable more predictable and have betterprediction effects

Carefully observe the data of the LSTM method LSTM-EMD method and LSTM-EEMD method in Table 1 in thethree sequences of HSI DAX and ASX It can be found thatthe proposed LSTM-EEMD method has a better experi-mental effect than the proposed LSTM-EMD method andthe proposed LSTM-EMDmethod has a better experimentaleffect than the traditional LSTM method )e three indi-cators RMSE MAE and R2 used in the experiment all ex-emplify the above relationship )e smaller the RMSE orMAE experimental value of a method the better the pre-diction effect of the method However the larger the R2 valueof a method the better its prediction effect

In this paper we proposed a hybrid prediction methodbased on EMD or EEMD )e results of experiment showthat the fusion prediction results are more superior to thetraditional method for direct prediction of complexoriginal time series in most instances Although the ex-periments in this paper comparatively studied two hybridmethods based on EMD or EEMD and four other classicalmethods such as SVR BARDR KNR and LSTM)e twohybrid methods based on EMD and EEMD have differenteffects in different time series and there are different time-series methods that reflect different experimental

1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)

Figure 12 Prediction results of (a) LSTM-EMD method and(b) LSTM-EEMD method of HSI time series

Complexity 13

advantages and disadvantages In actual applicationpeople can choose different methods according to theactual situation and apply it to specific practical fields Inthe actual environment if the results obtained by themethod you choose do not meet your expectations youcan choose another method

8 Conclusion and Future Work

We proposed a hybrid short-term predictionmethod LSTM-EMD or LSTM-EEMD based on LSTM and EMD or EEMDdecomposition methods )e method is based on a complexproblem divided and conquered strategy Combining the

200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01

Figure 13 )e EMD decomposition graphs of DAX time series

375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000

Figure 14 )e EEMD decomposition graphs of DAX time series

14 Complexity

advantages of EMD EEMD and LSTM we used the EMD orEEMD method to decompose the complex sequence intomultiple relatively stable and gentle subsequences Use theLSTM neural network to train and predict each subtimeseries)e prediction process is simple and requires only twosteps to complete First we use the LSTM to predict eachsubtime series value )en the prediction results of multiplesubtime series are fused to form a complex original time-series prediction result In the experiment we selected fivedata for testing to fully the performance of the method )ecomparison results with the other four prediction methodsshow that the predicted values show higher accuracy )ehybrid prediction method we proposed is effective andaccurate in future stock price prediction Hence the hybridprediction method has practical application and referencevalue However there are some shortcomings )e proposedmethod has some unexpected effects on the experimentalresults of time series with very orderly changes

)e research and application of analysis and predictionmethods on time series have a long history and rapid de-velopment but the prediction effect of traditional methodsfails to meet certain requirement of real application on someaspects in certain fields Improving the prediction effect isthe most direct research goal Taking the study results of thispaper as a starting point we still have a lot of work in thefuture that needs further research In the future we willcombine the EMD method or EEMD method with othermethods or use the LSTM method in combination withwavelet or VMD

Data Availability

Data are fully available without restriction )e originalexperimental data can be downloaded from Yahoo Financefor free (httpfinanceyahoocom)

Conflicts of Interest

)e authors declare that they have no conflicts of interest

Authorsrsquo Contributions

Yang Yujun contributed to all aspects of this work YangYimei and Xiao Jianhua conducted the experiment andanalyzed the data All authors reviewed the manuscript

Acknowledgments

)is work was supported in part by the Scientific ResearchFund of Hunan Provincial Education under Grants 17C1266and 19C1472 Key Scientific Research Projects of HuaihuaUniversity under Grant HHUY2019-08 Key Laboratory ofWuling-Mountain Health Big Data Intelligent Processingand Application in Hunan Province Universities and theKey Laboratory of Intelligent Control Technology forWuling-Mountain Ecological Agriculture in Hunan Prov-ince under Grant ZNKZ2018-5 )e fund source has no rolein research design data collection analysis or interpretationor the writing of this manuscript

References

[1] E Hadavandi H Shavandi and A Ghanbari ldquoA geneticfuzzy expert system for stock price forecastingrdquo in Pro-ceedings of the 2010 Seventh International Conference onFuzzy Systems and Knowledge Discovery pp 41ndash44 YantaiChina August 2010

[2] C Zheng and J Zhu ldquoResearch on stock price forecast basedon gray relational analysis and ARMAX modelrdquo in Pro-ceedings of the 2017 International Conference on Grey Systemsand Intelligent Services (GSIS) pp 145ndash148 StockholmSweden August 2017

[3] A Kulaglic and B B Ustundag ldquoStock price forecast usingwavelet transformations in multiple time windows and neuralnetworksrdquo in Proceedings of the 2018 3rd InternationalConference on Computer Science and Engineering (UBMK)pp 518ndash521 Sarajevo Bosnia September 2018

[4] G Xi ldquoA novel stock price forecasting method using thedynamic neural networkrdquo in Proceedings of the 2018 Inter-national Conference on Robots amp Intelligent System (ICRIS)pp 242ndash245 Changsha China February 2018

[5] Y Yu S Wang and L Zhang ldquoStock price forecasting basedon BP neural network model of network public opinionrdquo inProceedings of the 2017 2nd International Conference onImage Vision and Computing (ICIVC) pp 1058ndash1062Chengdu China June 2017

[6] J-S Chou and T-K Nguyen ldquoForward forecast of stock priceusing sliding-window metaheuristic-optimized machine-learning regressionrdquo IEEE Transactions on Industrial Infor-matics vol 14 no 7 pp 3132ndash3142 2018

[7] Y Guo S Han C Shen Y Li X Yin and Y Bai ldquoAn adaptiveSVR for high-frequency stock price forecastingrdquo IEEE Accessvol 6 pp 11397ndash11404 2018

[8] J Lee R Kim Y Koh and J Kang ldquoGlobal stock marketprediction based on stock chart images using deep Q-net-workrdquo IEEE Access vol 7 pp 167260ndash167277 2019

[9] J Zhang Y-H Shao L-W Huang et al ldquoCan the exchangerate be used to predict the shanghai composite indexrdquo IEEEAccess vol 8 pp 2188ndash2199 2020


[11] D Lien Minh A Sadeghi-Niaraki H D Huy K Min andH Moon ldquoDeep learning approach for short-term stocktrends prediction based on two-stream gated recurrent unitnetworkrdquo IEEE Access vol 6 pp 55392ndash55404 2018

[12] I I Hassan ldquoExploiting noisy data normalization for stockmarket predictionrdquo Journal of Engineering and Applied Sci-ences vol 12 no 1 pp 69ndash77 2017

[13] S Jeon B Hong and V Chang ldquoPattern graph tracking-based stock price prediction using big datardquo Future Gener-ation Computer Systems vol 80 pp 171ndash187 2018

[14] E Chong C Han and F C Park ldquoDeep learning networks forstock market analysis and prediction methodology datarepresentations and case studiesrdquo Expert Systems with Ap-plications vol 83 pp 187ndash205 2017

[15] X Ding Y Zhang T Liu and J Duan ldquoUsing structuredevents to predict stock price movement an empirical in-vestigationrdquo in Proceedings of the EMNLP pp 1ndash11 DohaQatar October 2014

[16] M Dang and D Duong ldquoImprovement methods for stockmarket prediction using financial news articlesrdquo in Pro-ceedings of 2016 3rd National Foundation for Science andTechnology Development Conference on Information and

Complexity 15

Computer Science pp 125ndash129 Danang City VietnamSeptember 2016

[17] B Weng M A Ahmed and F M Megahed ldquoStock marketone-day ahead movement prediction using disparate datasourcesrdquo Expert Systems with Applications vol 79 pp 153ndash163 2017

[18] Y Yujun L Jianping and Y Yimei ldquoAn efficient stockrecommendation model based on big order net inflowrdquoMathematical Problems in Engineering vol 2016 Article ID5725143 15 pages 2016

[19] S O Ojo P A Owolawi M Mphahlele and J A AdisaldquoStock market behaviour prediction using stacked LSTMnetworksrdquo in Proceedings of the 2019 International Multi-disciplinary Information Technology and Engineering Con-ference (IMITEC) pp 1ndash5 Vanderbijlpark South AfricaNovember 2019

[20] S Liu G Liao and Y Ding ldquoStock transaction predictionmodeling and analysis based on LSTMrdquo in Proceedings of the2018 13th IEEE Conference on Industrial Electronics andApplications (ICIEA) pp 2787ndash2790 Wuhan China May2018

[21] D Wei ldquoPrediction of stock price based on LSTM neuralnetworkrdquo in Proceedings of the 2019 International Conferenceon Artificial Intelligence and Advanced Manufacturing(AIAM) pp 544ndash547 Dublin Ireland October 2019

[22] K Chen Y Zhou and F Dai ldquoA LSTM-based method forstock returns prediction a case study of China stock marketrdquoin Proceedings of the 2015 IEEE International Conference onBig Data (Big Data) pp 2823-2824 Santa Clara CA USANovember 2015

[23] Y Yang and Y Yang ldquoHybrid method for short-term timeseries forecasting based on EEMDrdquo IEEE Access vol 8pp 61915ndash61928 2020

[24] A H Bukhari M A Z Raja M Sulaiman S IslamM Shoaib and P Kumam ldquoFractional neuro-sequentialARFIMA-LSTM for financial market forecastingrdquo IEEE Ac-cess vol 8 p 71326 2020

[25] S Ma L Gao X Liu and J Lin ldquoDeep learning for trackquality evaluation of high-speed railway based on vehicle-body vibration predictionrdquo IEEE Access vol 7 pp 185099ndash185107 2019

[26] Y Hu X Sun X Nie Y Li and L Liu ldquoAn enhanced LSTMfor trend following of time seriesrdquo IEEE Access vol 7pp 34020ndash34030 2019

[27] S Hochreiter and J Schmidhuber ldquoLong short-term mfe-moryrdquo Neural Computation vol 9 no 8 pp 1735ndash17801997

[28] Y Yang J Li and Y Yang ldquo)e cross-correlation analysis ofmulti property of stock markets based on MM-DFArdquo PhysicaA Statistical Mechanics and Its Applications vol 481pp 23ndash33 2017

[29] Y Yujun L Jianping and Y Yimei ldquoMultiscale multifractalmultiproperty analysis of financial time series based on Renyientropyrdquo International Journal of Modern Physics C vol 28no 2 2017

[30] F A Gers and J Schmidhuber ldquoRecurrent nets that time andcountrdquo in Proceedings of the IEEE-INNS-ENNS InternationalJoint Conference on Neural Networks IJCNN 2000 vol 3pp 189ndash194 Como Italy July 2000

[31] Y Wei Z Wang M Xu and S Qiao ldquoAn LSTM method forpredicting CU splitting in H 264 to HEVC transcodingrdquo inProceedings of the IEEE Visual Communications and ImageProcessing (VCIP) pp 1ndash4 St Petersburg FL USA De-cember 2017

[32] C Zhang W Ren T Mu L Fu and C Jia ldquoEmpirical modedecomposition based background removal and de-noising inpolarization interference imaging spectrometerrdquo Optics Ex-press vol 21 no 3 pp 2592ndash2605 2013

[33] X Zhou H Zhao and T Jiang ldquoAdaptive analysis of opticalfringe patterns using ensemble empirical mode decomposi-tion algorithmrdquo Optics Letters vol 34 no 13 pp 2033ndash20352009

[34] N E Huang J R Yeh and J S Shieh ldquoEnsemble empiricalmode decomposition a noise-assisted data analysis methodrdquoAdvances in Adaptive Data Analysis vol 1 no 1 pp 1ndash412009

[35] H Zhang F Wang D Jia T Liu and Y Zhang ldquoAutomaticinterference term retrieval from spectral domain low-co-herence interferometry using the EEMD-EMD-basedmethodrdquo IEEE Photonics Journal vol 8 no 3 pp 1ndash9 ArticleID 6900709 2016

[36] Y Jiang C Tang X Zhang W Jiao G Li and T Huang ldquoAnovel rolling bearing defect detection method based on bis-pectrum analysis and cloud model-improved EEMDrdquo IEEEAccess vol 8 pp 24323ndash24333 2020

[37] WWangW Peng M Tian andW Tao ldquoPartial discharge ofwhite noise suppression method based on EEMD and higherorder statisticsrdquo Fe Journal of Engineering vol 2017 no 13pp 2043ndash2047 2017

[38] Z Wei K G Robbersmyr and H R Karimi ldquoAn EEMDaided comparison of time histories and its application invehicle safetyrdquo IEEE Access vol 5 pp 519ndash528 2017

[39] Y Yang and Y Yang ldquoHybrid prediction method for windspeed combining ensemble empirical mode decompositionand Bayesian ridge regressionrdquo IEEE Access vol 8pp 71206ndash71218 2020

[40] J Sun and H Sheng ldquoApplications of Ensemble Empiricalmode decomposition to stock-futures basis analysisrdquo inProceedings of the 2010 2nd IEEE International Conference onInformation and Financial Engineering pp 396ndash399Chongqing China September 2010

[41] B Al-Hnaity and M Abbod ldquoA novel hybrid ensemble modelto predict FTSE100 index by combining neural network andEEMDrdquo in Proceedings of the 2015 European Control Con-ference (ECC) pp 3021ndash3028 Linz Austria July 2015

[42] N E Huang Z Shen S R Long et al ldquo)e empirical modedecomposition and the Hilbert spectrum for nonlinear andnon-stationary time series analysisrdquo Proceedings of the RoyalSociety of London Series A Mathematical Physical and En-gineering Sciences vol 454 no 1971 pp 903ndash995 1998

[43] F Jiang Z Zhu and W Li ldquoAn improved VMD with em-pirical mode decomposition and its application in incipientfault detection of rolling bearingrdquo IEEE Access vol 6pp 44483ndash44493 2018

[44] Z H Wu and N E Huang ldquoEnsemble empirical mode de-composition a noise-assisted data analysis methodrdquo Ad-vances in Adaptive Data Analysis vol 1 no 1 pp 1ndash14 2009

16 Complexity

to time-series data )erefore many researchers use LSTM[19ndash26] to analyze and predict stock prices Many studieshave analyzed the correlation between time-series data[27ndash30] and the results show that LSTM has advantages intime-series data In the literature [31] researchers usedLSTM to predict the coding unit split and the experimentalresults proved the advantages of LSTM in terms of efficiencyTime-series data trend research is also a new form of time-series data prediction so LSTM is a natural choice

Empirical mode decomposition (EMD) technology isusually applied to nonstationary and nonlinear signals EMDcan decompose nonlinear signals into several inherentmodal functions (IMF) adaptively EMD can effectivelysuppress continuous noise such as Gaussian noise [32]However EMD cannot suppress intermittent noise andmixed noise Ensemble empirical mode decomposition(EEMD) technology can solve the problem of noise-modemixing [33] In the EEMD algorithm a group of white noiseis first added to the original signal and then it is decomposedinto several IMF )e average value of the correspondingIMF set is regarded as the correct result EEMD will separatethe noise in different IMF from the original signal com-ponents [34] thus eliminating the noise-mode mixingphenomenon In recent years the application of EEMD hasattracted the attention of many researchers and scholars[27 35ndash44] In order to solve the problem that noise inpractical applications makes interference term retrievaldifficult Zhang et al [35] proposed a technique based onEEMD and EMD to achieve automatic interference termretrieval from the spectral domain low-coherence interfer-ometry )e proposed algorithm uses EEMD technology tomake the relative error of coupling strength less than 2 Tosolve the problem that Gaussian noise and non-Gaussiannoise seriously hinder the detection of rolling bearing defectsby traditional methods Jiang et al [36] proposed a newrolling bearing inspectionmethod that combines bispectrumanalysis with improved integrated EMD )is method usesensemble empirical mode decomposition technology to havesuperior performance in reducing multiple backgroundnoises and can more effectively detect defects in rollingbearings To solve the problem of the influence of the au-thenticity of the partial discharge signal on the evaluationaccuracy of the transformer insulation performance Wanget al [37] proposed a method to suppress white noise in PDsignals based on the integration of EMD and the combi-nation of high-order statistics )is method uses EEMDdecomposition to the threshold and reconstructs each IMFto suppress the white noise in each component To solve theproblem that most existing measurement methods onlyfocus on mathematical values and are affected by mea-surement errors interference and uncertainty Wei et al[38] proposed a new time-history comparison for vehiclesafety analysis by the integrating empirical mode decom-position method)is method uses EEMD decomposition tomake the trend signal to reflect the overall change and is notaffected by high-frequency interference To solve the difficultproblem of wind speed prediction Yang and Yang [39]proposed a hybrid BRR-EEMD short-term predictionmethod for wind speed based on the EMD and Bayesian

ridge regression (BRR) )is hybrid method uses theBayesian regression method and the EEMD to performregression prediction on each subsequence decomposed bythe EEMD and obtains good results in wind speed pre-diction In order to find potential profit arbitrage oppor-tunities when the returns of stock index futures contractsand stock index futures contracts continue to deviate fromfair prices in irrational and nonefficiently operating marketsSun and Sheng [40] proposed a time-series analysis methodbased on integrated EMD )is method uses EEMD toanalyze the stock futures basis sequence and extracts amonotonically decreasing trend from the sequence to dis-cover business opportunities To improve the problem that asingle method of predicting complex and nonlinear stockprices cannot achieve good results Al-Hnaity and Abbod[41] proposed a hybrid integrated model based on ensembleempirical mode decomposition and backpropagation neuralnetwork to predict the closing price of the stock index )eresearchers [41] have proposed five hybrid predictionmodels ensemble EMD-NN ensemble EMD-Bagging-NNensemble EMD-Crossvalidation-NN ensemble EMD-CV-Bagging-NN and ensemble EMD-NN )e experimentalresults show that the performance of the ensemble EMD-CV-Bagging-NN ensemble EMD-Crossvalidation-NN andensemble EMD-Bagging-NN models based on ensembleEMD are all a grade higher than that of the ensemble EMD-NN model and significantly higher than the single neuralnetwork model

)e typical forecasting scheme is based on the forecast ofthe time-series data itself and does not deal with the time-series data itself It has become a challenge that how tocombine the existing forecasting methods to improve theforecasting effect by decomposing the time-series data )eabove methods use EEMD to decompose the time-seriesdata to improve the performance of the algorithm How toeffectively decompose complex and nonlinear stock time-series data for prediction has been puzzled by many re-searchers Due to the uncertainty and nonlinearity of thestock time series the deviation of a single method to predictstock prices is generally relatively large)e abovementionedhybrid method does indeed improve the algorithm signifi-cantly )erefore it can be boldly guessed that the hybridmethod generally can get better prediction results than thesingle specific method Besides the original complex timeseries was decomposed by the EEMD method into severalrelatively stable subsequence time series By effectivelycombining several current effective forecasting methods theforecasting results of relatively stable subsequence timeseries are theoretically better Combining the features of theEEMD method based on the improved empirical modedecomposition method and the LSTM machine learningalgorithm this paper proposes a hybrid LSTM-EEMDmethod for stock index price prediction

)e rest of this paper is organized as follows )reerelated terminology such as EMD EEMD and BRR arepresented in Section 2 while Section 3 briefly introduces theflowchart of our proposed LSTM-EEMD method and thestructure of our proposed hybrid LSTM-EEMD method InSection 4 we describe the experiment data collection

2 Complexity


2 Related Works






3 Preliminaries




Complexity 3






1113872 1113873 + bf1113872 1113873 (1)


1113872 1113873 + bi1113872 1113873 (2)


1113872 1113873 + bC1113872 1113873 (3)

Xt+1

ht+1

Xtndash1

htndash1

Xt

ht

tanhA A


Xtndash1

htndash1 ht+1ht

ytndash1 yt+1yt

Xt+1Xt

WhxWhxWhxWhx

Whh Whh Whh

WyhWyhWyhWyh

y

x

h Unfold


4 Complexity



1113872 1113873 + bo1113872 1113873 (5)









f(t) d(t) t 1 2 3 n minus 1 (8)





r1 minus c2 r2

⋮


⎧⎪⎪⎨

⎪⎪⎩(10)


htndash1

Ctndash1

Xt

ht

otC~titft

Ct

yt

+

tanhσ σ σ

tanh

times

times times


IMF 1 iteration 0

ndash2

ndash1

0

1

2

20 30 40 50 60 70 80 90 100 110 12010


Complexity 5




s(i) 1113944N

j1fj(t) + rj(t) (11)

















i1Ci( 1113857 + Rn i 1 2 n minus 1 (12)



where t 0 f 2f 1000f f 0001

4 Methodology


6 Complexity







50

25

00

ndash25

2

0

ndash2

1

0

ndash1

3

2

1

000 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

Time (s)


Sign

alIM

F 1

IMF

2R


Fusion

EEMD

Rn

LSTM1


SubP1 SubP2 SubPn-1



SubPn

LSTM2 LSTMn-1

IMFn-1IMF2IMF1

LSTMn


Complexity 7






5 Experiment Data




hellip

(1) Collection data

(2) Preprocess data



hellip

hellip

Mod

el ev

alua

tion

Mod

el p

redi

ctD

ata i

nput

(5) Fusion




IMF1 IMF2

LSTM1

SubP1 SubP2

LSTM2

SubPnndash1

LSTMnndash1

IMFnndash1 Rn

LSTMn

SubPn



8 Complexity






δ

1n minus 1

1113944

n

i1

11139741113972


(14)









Complexity 9

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400




BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271


LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165



10 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity


2 Related Works






3 Preliminaries




Complexity 3






1113872 1113873 + bf1113872 1113873 (1)


1113872 1113873 + bi1113872 1113873 (2)


1113872 1113873 + bC1113872 1113873 (3)

Xt+1

ht+1

Xtndash1

htndash1

Xt

ht

tanhA A


Xtndash1

htndash1 ht+1ht

ytndash1 yt+1yt

Xt+1Xt

WhxWhxWhxWhx

Whh Whh Whh

WyhWyhWyhWyh

y

x

h Unfold


4 Complexity



1113872 1113873 + bo1113872 1113873 (5)









f(t) d(t) t 1 2 3 n minus 1 (8)





r1 minus c2 r2

⋮


⎧⎪⎪⎨

⎪⎪⎩(10)


htndash1

Ctndash1

Xt

ht

otC~titft

Ct

yt

+

tanhσ σ σ

tanh

times

times times


IMF 1 iteration 0

ndash2

ndash1

0

1

2

20 30 40 50 60 70 80 90 100 110 12010


Complexity 5




s(i) 1113944N

j1fj(t) + rj(t) (11)

















i1Ci( 1113857 + Rn i 1 2 n minus 1 (12)



where t 0 f 2f 1000f f 0001

4 Methodology


6 Complexity







50

25

00

ndash25

2

0

ndash2

1

0

ndash1

3

2

1

000 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

Time (s)


Sign

alIM

F 1

IMF

2R


Fusion

EEMD

Rn

LSTM1


SubP1 SubP2 SubPn-1



SubPn

LSTM2 LSTMn-1

IMFn-1IMF2IMF1

LSTMn


Complexity 7






5 Experiment Data




hellip

(1) Collection data

(2) Preprocess data



hellip

hellip

Mod

el ev

alua

tion

Mod

el p

redi

ctD

ata i

nput

(5) Fusion




IMF1 IMF2

LSTM1

SubP1 SubP2

LSTM2

SubPnndash1

LSTMnndash1

IMFnndash1 Rn

LSTMn

SubPn



8 Complexity






δ

1n minus 1

1113944

n

i1

11139741113972


(14)









Complexity 9

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400




BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271


LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165



10 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity






1113872 1113873 + bf1113872 1113873 (1)


1113872 1113873 + bi1113872 1113873 (2)


1113872 1113873 + bC1113872 1113873 (3)

Xt+1

ht+1

Xtndash1

htndash1

Xt

ht

tanhA A


Xtndash1

htndash1 ht+1ht

ytndash1 yt+1yt

Xt+1Xt

WhxWhxWhxWhx

Whh Whh Whh

WyhWyhWyhWyh

y

x

h Unfold


4 Complexity



1113872 1113873 + bo1113872 1113873 (5)









f(t) d(t) t 1 2 3 n minus 1 (8)





r1 minus c2 r2

⋮


⎧⎪⎪⎨

⎪⎪⎩(10)


htndash1

Ctndash1

Xt

ht

otC~titft

Ct

yt

+

tanhσ σ σ

tanh

times

times times


IMF 1 iteration 0

ndash2

ndash1

0

1

2

20 30 40 50 60 70 80 90 100 110 12010


Complexity 5




s(i) 1113944N

j1fj(t) + rj(t) (11)

















i1Ci( 1113857 + Rn i 1 2 n minus 1 (12)



where t 0 f 2f 1000f f 0001

4 Methodology


6 Complexity







50

25

00

ndash25

2

0

ndash2

1

0

ndash1

3

2

1

000 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

Time (s)


Sign

alIM

F 1

IMF

2R


Fusion

EEMD

Rn

LSTM1


SubP1 SubP2 SubPn-1



SubPn

LSTM2 LSTMn-1

IMFn-1IMF2IMF1

LSTMn


Complexity 7






5 Experiment Data




hellip

(1) Collection data

(2) Preprocess data



hellip

hellip

Mod

el ev

alua

tion

Mod

el p

redi

ctD

ata i

nput

(5) Fusion




IMF1 IMF2

LSTM1

SubP1 SubP2

LSTM2

SubPnndash1

LSTMnndash1

IMFnndash1 Rn

LSTMn

SubPn



8 Complexity






δ

1n minus 1

1113944

n

i1

11139741113972


(14)









Complexity 9

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400




BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271


LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165



10 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity



1113872 1113873 + bo1113872 1113873 (5)









f(t) d(t) t 1 2 3 n minus 1 (8)





r1 minus c2 r2

⋮


⎧⎪⎪⎨

⎪⎪⎩(10)


htndash1

Ctndash1

Xt

ht

otC~titft

Ct

yt

+

tanhσ σ σ

tanh

times

times times


IMF 1 iteration 0

ndash2

ndash1

0

1

2

20 30 40 50 60 70 80 90 100 110 12010


Complexity 5




s(i) 1113944N

j1fj(t) + rj(t) (11)

















i1Ci( 1113857 + Rn i 1 2 n minus 1 (12)



where t 0 f 2f 1000f f 0001

4 Methodology


6 Complexity







50

25

00

ndash25

2

0

ndash2

1

0

ndash1

3

2

1

000 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

Time (s)


Sign

alIM

F 1

IMF

2R


Fusion

EEMD

Rn

LSTM1


SubP1 SubP2 SubPn-1



SubPn

LSTM2 LSTMn-1

IMFn-1IMF2IMF1

LSTMn


Complexity 7






5 Experiment Data




hellip

(1) Collection data

(2) Preprocess data



hellip

hellip

Mod

el ev

alua

tion

Mod

el p

redi

ctD

ata i

nput

(5) Fusion




IMF1 IMF2

LSTM1

SubP1 SubP2

LSTM2

SubPnndash1

LSTMnndash1

IMFnndash1 Rn

LSTMn

SubPn



8 Complexity






δ

1n minus 1

1113944

n

i1

11139741113972


(14)









Complexity 9

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400




BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271


LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165



10 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity




s(i) 1113944N

j1fj(t) + rj(t) (11)

















i1Ci( 1113857 + Rn i 1 2 n minus 1 (12)



where t 0 f 2f 1000f f 0001

4 Methodology


6 Complexity







50

25

00

ndash25

2

0

ndash2

1

0

ndash1

3

2

1

000 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

Time (s)


Sign

alIM

F 1

IMF

2R


Fusion

EEMD

Rn

LSTM1


SubP1 SubP2 SubPn-1



SubPn

LSTM2 LSTMn-1

IMFn-1IMF2IMF1

LSTMn


Complexity 7






5 Experiment Data




hellip

(1) Collection data

(2) Preprocess data



hellip

hellip

Mod

el ev

alua

tion

Mod

el p

redi

ctD

ata i

nput

(5) Fusion




IMF1 IMF2

LSTM1

SubP1 SubP2

LSTM2

SubPnndash1

LSTMnndash1

IMFnndash1 Rn

LSTMn

SubPn



8 Complexity






δ

1n minus 1

1113944

n

i1

11139741113972


(14)









Complexity 9

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400




BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271


LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165



10 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity







50

25

00

ndash25

2

0

ndash2

1

0

ndash1

3

2

1

000 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

00 02 04 06 08 10

Time (s)


Sign

alIM

F 1

IMF

2R


Fusion

EEMD

Rn

LSTM1


SubP1 SubP2 SubPn-1



SubPn

LSTM2 LSTMn-1

IMFn-1IMF2IMF1

LSTMn


Complexity 7






5 Experiment Data




hellip

(1) Collection data

(2) Preprocess data



hellip

hellip

Mod

el ev

alua

tion

Mod

el p

redi

ctD

ata i

nput

(5) Fusion




IMF1 IMF2

LSTM1

SubP1 SubP2

LSTM2

SubPnndash1

LSTMnndash1

IMFnndash1 Rn

LSTMn

SubPn



8 Complexity






δ

1n minus 1

1113944

n

i1

11139741113972


(14)









Complexity 9

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400




BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271


LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165



10 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity






5 Experiment Data




hellip

(1) Collection data

(2) Preprocess data



hellip

hellip

Mod

el ev

alua

tion

Mod

el p

redi

ctD

ata i

nput

(5) Fusion




IMF1 IMF2

LSTM1

SubP1 SubP2

LSTM2

SubPnndash1

LSTMnndash1

IMFnndash1 Rn

LSTMn

SubPn



8 Complexity






δ

1n minus 1

1113944

n

i1

11139741113972


(14)









Complexity 9

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400




BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271


LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165



10 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity






δ

1n minus 1

1113944

n

i1

11139741113972


(14)









Complexity 9

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400




BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271


LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165



10 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity

3000

2000

1000

SP50

0

30000

20000

10000

HSI

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

0 1000 2000 3000 4000 5000

30

25

20

DA

X

8

6

4

2A

SX

0 200 400 600 800 1000 1200 1400




BARDRRMSE 0012349 114140660 326487107 0387025 0110975MAE 0011054 110146472 233641980 0256125 0075355R2 0999939 0998275 0992213 0978343 0988271


LSTMRMSE 0059597 30261563 306481549 0654662 0135236MAE 0051376 19164419 224250031 0504645 0106639R2 0998562 0997128 0980270 0962750 0931165



10 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity









7 Experiments






Complexity 11








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity








0

1

2

3

4

5

6

4020 80600

RealPredict


200015001000 1250 1750750250 50003

4

5

6

7

8

RealPredict


200100 400 5003000

18

20

22

24

26

28

30

32

RealPredict


12 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity








1800020000220002400026000280003000032000

1000 1200600 800200 4000

RealPredict

(a)

1750020000225002500027500300003250035000

200 400 600 800 1000 12000

RealPredict

(b)


Complexity 13




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity




200 8006004000 1200 14001000

2627

ndash250025

ndash202

ndash250025

ndash101

ndash101

ndash20

01


375376377

2224

ndash250025

ndash2500

ndash250025

ndash250025

ndash101

ndash101

ndash101

400 600 14001000 800 12002000


14 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity



Data Availability






Acknowledgments


References

















Complexity 15






























16 Complexity






























16 Complexity

Documents

A Hybrid Prediction Method for Stock Price Using LSTM and