SM Journals - Models for Stock prices data: · Web viewStock price (daily) data was collected from open source of the New York Stock Exchange [1]. The data consists of 40 daily averages

Supplementary Information

Application of different Time Series Models on Epidemiological Data - Comparison and Predictions for Malaria Prevalence

Ram Rup Sarkar 1,2,*, Chandrajit Chatterjee 3

1 Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune 411008, Maharashtra, India
2 Academy of Scientific & Innovative Research (AcSIR), CSIR-NCL Campus, Pune 411008, India
3 Ministry of Statistics and Programme Implementation, Government of India, East Block-6, Level 4, R. K. Puram, New Delhi 110066, India

*Corresponding Author: Dr. Ram Rup Sarkar, Tel: +91-20-2590 3040; Fax: +91-20-2590 2621; E-Mail: [email protected]

Analysis of Financial Data

Data and Diagnostic checks: Daily stock price data were collected from an open source of the New York Stock Exchange [1]. The data consist of 40 daily averages of the IBM (NYSE) stock price. Although data on Indian stock prices with detailed history are available over longer time periods for all stocks, we have chosen 40 time points because our interest is in the small fluctuations of the logged stock prices, not in the volatility patterns that become evident through clustering over large time horizons. Moreover, to compare and contrast two types of time series data (stock price and epidemiological) that are completely different in nature, we chose similar sample sizes for both (one of our epidemiological data sets contains a similar number of data points). These data are used as a test set for the models chosen in this paper, to establish their applicability in the domain to which they originally belong.

For the stock price data, Levene's test rejects the null hypothesis of equal variance across groups (p-value < 0.01), indicating that the series is heteroskedastic. Figure S1 shows the histogram, Q-Q plot and fitted time series models for the stock price data. The Q-Q plot of the stock prices (Figure S1(b)) shows close proximity to the normal distribution in the middle and erratic patterns in the tails, which is common behavior in fat-tailed data sets of this type. Moreover, we observed that the stock price data form a typical non-stationary series, whose autocorrelation settles down eventually as the number of lags increases (Figure S2(a)).
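The Levene statistic behind this heteroskedasticity check can be sketched as follows. This is a minimal illustration, not the computation used for the paper: the grouping of the series into blocks is an assumption (the text does not state how groups were formed), the two blocks below are synthetic, and the function name `levene_statistic` is hypothetical. The p-value would be obtained by comparing W with an F distribution on (k-1, N-k) degrees of freedom, for instance through `scipy.stats.levene`.

```python
import random
from statistics import mean

def levene_statistic(groups):
    """Levene's W statistic, using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand_mean) ** 2
                  for zi, zm in zip(z, z_group_means))
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Synthetic example (assumption): a calm block and a volatile block of 20 points each.
random.seed(0)
calm = [random.gauss(0.0, 0.5) for _ in range(20)]
volatile = [random.gauss(0.0, 3.0) for _ in range(20)]
w_stat = levene_statistic([calm, volatile])
print(f"Levene W = {w_stat:.2f}")
```

A large W relative to the F critical value leads to rejection of equal variances, which is the outcome reported above for the stock price series.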


Figure S1: Stock price data: a) Histogram, b) Q-Q plot, and c) fitted time series models.

To check the suitability of the time series models (ARIMA, GARCH and Random Walk) for financial data, we fitted each of them to the stock price data; the resulting models are summarized below.

Models for Stock price data: Figure S1(c) compares the fits of the different time series models to the stock price data. The characteristics of each fit are listed below, and the statistical summary for the fitted time series is given in Table S1:

(i) An ARMA (6, 6) model was found to fit the stock price data with a coefficient of determination (R2) of 93.78%, signifying a good fit. The model has 6 auto-regressive terms and 6 moving average terms, plus an intercept.

(ii) ARCH (1) was the optimal choice from the family of GARCH models, with an R2 of 83.21%. This fit was expected, as stock price data almost always exhibit volatility clusters, in which the volatility at one point depends conditionally on the variance at the previous point; this justifies the applicability of the model.


(iii) The Random Walk model has an R2 of 99.6% and gives the best fit using random noise terms. This is because the fluctuations along the trend component are short and sharp, a pattern that the random walk model captures appropriately.
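The R2 values quoted in (i) to (iii) compare fitted with observed values. The sketch below shows the calculation for a random walk fit, assuming that the fitted value for each day is simply the previous day's observed price; the paper does not spell out this detail, and the price series here is synthetic rather than the IBM data. The helper name `r_squared` is hypothetical.

```python
import random

def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Synthetic 40-point trending series with small day-to-day noise (assumption).
random.seed(1)
prices = [100.0]
for _ in range(39):
    prices.append(prices[-1] + 0.5 + random.gauss(0.0, 0.4))

# Random walk one-step-ahead fit: the value for day t is the price on day t-1.
rw_fitted = prices[:-1]
rw_r2 = r_squared(prices[1:], rw_fitted)
print(f"Random walk R2 = {rw_r2:.4f}")
```

Because each fitted value sits only one small step away from the observed one while the series as a whole drifts over a much wider range, the residual sum of squares is small relative to the total, which is why a random walk tends to score a very high R2 on a short trending series.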

Table S1: Model selection and diagnostic tests for Stock price data.

We performed the Jarque-Bera test for normality of the residuals and the Augmented Dickey-Fuller test for a unit root in the predicted series. From Table S1, the p-values indicate that neither null hypothesis can be rejected for any of the models applied. Thus, for all the stock price models, the residuals are consistent with normality. Since the null hypothesis of the Augmented Dickey-Fuller test is the presence of a unit root, the test gives no evidence against a unit root in the predicted series, in line with the non-stationary behavior of the stock price data.
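The Jarque-Bera part of these diagnostics can be sketched with the standard library alone. The statistic combines the sample skewness and kurtosis of the residuals and is referred to a chi-square distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(-x/2). The function names below are hypothetical, and the sketch does not cover the Augmented Dickey-Fuller test.

```python
import math

def chi2_sf_2df(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)

def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its p-value (assumes non-constant residuals)."""
    n = len(residuals)
    mu = sum(residuals) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((r - mu) ** 2 for r in residuals) / n
    m3 = sum((r - mu) ** 3 for r in residuals) / n
    m4 = sum((r - mu) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return jb, chi2_sf_2df(jb)

# Consistency check against the 2-degree-of-freedom entries of Table S1.
print(round(chi2_sf_2df(1.03), 2))  # 0.6  (ARIMA column)
print(round(chi2_sf_2df(0.47), 2))  # 0.79 (GARCH column)
```

The two printed values reproduce the p-values listed in Table S1 for the ARIMA and GARCH fits.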

Predictions: We forecasted six further points for the Stock price data and validated them against the available observed data (Table S2). The results show good predictions for each of these models. From Table S2, the AIC suggests that the ARCH (1) (or GARCH (0,1)) model provides the best approximation for the Stock price data. This agrees with historical experience, since a heteroskedastic model is generally expected to fit a volatile stock price data set well.
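For readers unfamiliar with AIC, the criterion trades goodness of fit against the number of fitted parameters, and lower values are preferred. The sketch below assumes Gaussian errors, in which case AIC can be written (up to an additive constant) as n ln(RSS/n) + 2k. The exact form used for Table S2 is not stated, and the numbers below are hypothetical rather than taken from the table; the function name `gaussian_aic` is likewise illustrative.

```python
import math

def gaussian_aic(rss, n_obs, n_params):
    """AIC under Gaussian errors, up to an additive constant: n*ln(RSS/n) + 2k."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params

# Hypothetical comparison: identical residual sum of squares, different model sizes.
aic_small = gaussian_aic(50.0, 40, 2)    # e.g. a 2-parameter model
aic_large = gaussian_aic(50.0, 40, 12)   # e.g. a 12-parameter model
print(f"AIC (2 params) = {aic_small:.2f}, AIC (12 params) = {aic_large:.2f}")
```

With equal residual sums of squares the smaller model is preferred, so a heavily parameterized model must reduce the residuals substantially to obtain the lower AIC.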

Table S2: Observed Stock Price data, Model predictions and AIC.

Table S1 (values):

                              ARIMA           GARCH         Random Walk
Model selection
  Model chosen                ARIMA (6,0,6)   GARCH (0,1)   -
  Model R2                    93.78%          83.21%        99.60%
Diagnostics
  Jarque-Bera Test
    Chi-square statistic      1.03            0.47          -1.96
    Degrees of freedom        2               2             3
    p-value                   0.6             0.79          0.59
  Augmented Dickey-Fuller Test
    D-F statistic             -2.19           -1.94         2.91
    Lag order                 3               3             2
    p-value                   0.5             0.6           0.23


Continuous time domain model: Geometric Brownian motion is well known among financial economists and has been widely applied in asset price modeling since the early twentieth century [2]. Established theories, such as the Black-Scholes model for pricing European options, assume that the underlying stock price follows a Geometric Brownian motion. In our study too, it gives a reasonably good fit to the stock price data, with an R2 of 66.51% and an adjusted R2 of 65.63% (Figure S2(b) shows the model fit).
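A Geometric Brownian motion path of the kind shown in Figure S2(b) can be simulated as sketched below. The sketch assumes the usual form dS = mu S dt + sigma S dW and uses its exact discretization S(t+dt) = S(t) exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z) with Z standard normal. The starting price, drift and volatility are hypothetical values chosen for illustration, not estimates from the IBM data, and the function name `simulate_gbm` is not from the paper.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, n_steps, dt=1.0, seed=None):
    """Simulate one Geometric Brownian motion path with n_steps increments."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        # Log-increment over one step: drift correction plus scaled Gaussian noise.
        log_increment = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(log_increment))
    return path

# Hypothetical parameters: 40 daily points starting near the observed price level.
gbm_path = simulate_gbm(s0=115.0, mu=0.001, sigma=0.01, n_steps=39, seed=42)
print(f"{len(gbm_path)} points, last value {gbm_path[-1]:.2f}")
```

Working with log-increments keeps every simulated price strictly positive, which is one reason this process is a common choice for stock prices.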

Figure S2: Stock price data: a) Auto-correlation functions, and b) Simulated time series of the Geometric Brownian motion.

The results obtained for the stock price data set show that these time series models are suitable for financial data analysis. In fact, this data set acts as a test set to confirm that the time series models we have chosen are accepted methods for modeling such series. In the main paper, we demonstrate that these models are equally applicable to the malaria data and provide excellent methods for forecasting.

Table S2 (values):

Observed data   (Lower and upper          Predictions
                confidence intervals)     GARCH        ARIMA        Random Walk
117.3           (73.13, 161.37)           116.66       115.27       117.07
117.7           (73.50, 161.92)           117.45       115.27       117.54
117.8           (73.61, 162.08)           117.86       116.01       117.68
119.0           (74.43, 163.29)           117.95       114.21       118.72
119.0           (74.57, 163.50)           119.16       114.47       118.9
118.5           (74.24, 163.01)           119.1        113.64       118.48
Sum of Squares                            522103.71    511927.87    523462.37
AIC                                       419.07       382.61       383.5

References:

1. The New York Stock Exchange. Available: http://www.nyse.com/listed/ibm.html

2. Bachelier L. Théorie de la spéculation. Annales Scientifiques de l'École Normale Supérieure, 1900. (English translation: A. J. Boness, in Cootner, P. H. (1964) The random character of stock market prices. Cambridge, MA: MIT Press, 17-75.)