Sales Forecasting of an Airline Company (Time Series Analysis)
Submitted By: Ankush Roy, Ashitha VS, Koushik Rakshit, Krishna B, Roma Agrawal
04/18/2023
Time Series Analysis Using SAS
Agenda
• Introduction
• Objective
• Data Preparation: Check for Volatility, Check for Non-Stationarity, Check for Seasonality
• Creation of Test and Training Datasets
• Building Model and Validation
• Forecasting
• Graphical Representation
• Appendix
Introduction
What is Time Series Analysis?
Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.
Time series forecasting is the use of a model to predict future values based on previously observed values.
Components of Time Series:
1. Seasonal variations: a pattern that repeats over a specific period such as a day, week, month, season, etc.
2. Trend variations: upward or downward movement in a reasonably predictable pattern
3. Cyclical variations: variations that correspond with business or economic 'boom-bust' cycles or follow their own peculiar cycles
4. Random variations: irregular, erratic fluctuations
Objective
Data Description:
The dataset contains two variables: DATE and AIR.
1. DATE: contains sorted SAS date values recorded from Jan 1949 to Dec 1960.
2. AIR: contains the sales value for that month.
Objective:
On the basis of the given data, predict the sales for the next 12 months (Jan 1961 to Dec 1961).
Data Preparation: Volatility Check
• A scatterplot was created with time on the x-axis and sales on the y-axis to get an initial view of the data
• A fan-shaped (Japanese fan) or inverted-fan-shaped plot indicates highly volatile data, i.e. the spread of the series changes with its level
• A transformation is needed to convert highly volatile data into low-volatility data
• In our case, the initial graph was fan shaped, so we tried log and square root transformations
• Of the two, LOG gave the better result and hence was chosen.
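Why a log transformation tames a fan-shaped series can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the series below is made up (it is not the AIR data), and the names `levels`, `raw` and `swings` are hypothetical helpers introduced for this example.

```python
import math

# Hypothetical series (not the AIR data): the level grows 10% per period and
# each period has a low and a high observation 20% below/above that level.
levels = [100 * 1.1 ** t for t in range(8)]
raw = []
for level in levels:
    raw.extend([0.8 * level, 1.2 * level])

def swings(series):
    """High-minus-low swing within each period (consecutive pairs of values)."""
    return [series[i + 1] - series[i] for i in range(0, len(series), 2)]

raw_swings = swings(raw)                            # widens with the level (fan shape)
sqrt_swings = swings([math.sqrt(v) for v in raw])   # still widens, but more slowly
log_swings = swings([math.log(v) for v in raw])     # constant: log(1.2 / 0.8)

print("raw :", [round(s, 1) for s in raw_swings])
print("sqrt:", [round(s, 3) for s in sqrt_swings])
print("log :", [round(s, 3) for s in log_swings])
```

When the swing is proportional to the level, as assumed here, the log turns it into a constant, while the square root only dampens it.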
Data Preparation: Non-Stationarity Check
• A non-stationary series has statistical properties (such as mean and variance) that change over time, so it has no fixed pattern to model. Such data can't be used directly for forecasting
• Non-stationarity is checked using the Augmented Dickey-Fuller (ADF) test
• Non-stationarity can be removed by differencing
• In our case, the data was found to be non-stationary
• Hence, differencing was done to make the data stationary
Note: Differencing was done on the LOG-transformed data
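Differencing itself is a simple operation and can be sketched as follows. This is a minimal Python illustration, not the SAS step used in the deck; the function name `difference` and the sample values are assumptions made for the example, and the ADF test is not reproduced here.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical log-scale series with a steady upward trend (not the AIR data).
log_series = [4.0 + 0.5 * t for t in range(6)]

diffed = difference(log_series)
print(diffed)  # the trend is gone: every value is the constant step 0.5
```

The differenced series is one observation shorter than the input for each unit of lag.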
Data Preparation: Seasonality Check
• The autocorrelation function (ACF) gives the correlation between Y(t) and Y(t-S), where S is the lag
• If the ACF shows high values at a fixed interval, that interval can be taken as the period of seasonality
• Differencing at the same lag is then done to de-seasonalize the data
• In our case, the ACF gave high values at fixed intervals of 12 (so, S=12)
• Hence, differencing was done at an interval of 12
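The seasonality check above can be sketched in Python as a sample ACF followed by differencing at the seasonal lag. The series is a made-up, purely seasonal 12-month cycle (not the AIR data), and `acf` and `seasonal_difference` are hypothetical helper names introduced for this illustration.

```python
import math

def acf(series, lag):
    """Sample autocorrelation between Y(t) and Y(t - lag)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom

def seasonal_difference(series, period):
    """Return Y(t) - Y(t - period), which removes a repeating seasonal pattern."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# Hypothetical monthly series: six years of a pure 12-month cycle.
series = [math.sin(2 * math.pi * t / 12) for t in range(72)]

for lag in (6, 12, 18, 24):
    print(lag, round(acf(series, lag), 3))   # positive at 12 and 24, negative at 6 and 18

deseasonalized = seasonal_difference(series, 12)
print(max(abs(v) for v in deseasonalized))   # practically zero: the cycle is removed
```

On real data the ACF peaks are less clean, but the same pattern of peaks recurring at multiples of 12 is what points to S=12.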
Creation of Test and Training Dataset
Training Dataset: The part of the dataset used to build the model
Test Dataset: The part of the dataset used to validate the model built
Forecasting needs to be done for 1 year (12 months). Therefore, we keep the last year of data (1960) as the test dataset and use the rest of the data as the training dataset to build the model.
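The split described above amounts to holding out the last 12 monthly observations. A minimal Python sketch, using only month labels rather than the actual AIR values (the names `months`, `HOLDOUT`, `train` and `test` are assumptions made for this example):

```python
# Month labels for Jan 1949 .. Dec 1960 as (year, month) pairs.
months = [(year, month) for year in range(1949, 1961) for month in range(1, 13)]

HOLDOUT = 12  # one year of monthly data kept aside for validation

train, test = months[:-HOLDOUT], months[-HOLDOUT:]

print(len(train), len(test))          # 132 12
print(train[-1], test[0], test[-1])   # (1959, 12) (1960, 1) (1960, 12)
```

The split is by position, not random, because the order of observations carries the information in a time series.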
Building Model and Validation
The MINIC (Minimum Information Criterion) option of PROC ARIMA identifies the model with the minimum BIC (Bayesian Information Criterion) after exploring all possible combinations of ‘p’ (Auto Regressive) and ‘q’ (Moving Average) lags from 0 to 5 (the default range).
Building Model and Validation (Continued)
By observation, the minimum of the matrix is the value -6.3503, at the AR 3, MA 0 location (i.e. p=3 & q=0).
We consider all the models in the neighborhood of this model; for each of them we generate the AIC (Akaike Information Criterion) and SBC (Schwarz Bayesian Criterion) and calculate their average.
We select the top 6-7 models with the lowest average values and generate forecasts for each of them.
The detailed Excel sheet with all AIC, SBC and MAPE values is at Location
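The ranking by average of AIC and SBC can be sketched as below, using the standard definitions AIC = -2 ln(L) + 2k and SBC = -2 ln(L) + k ln(n). This is a Python illustration only: the log-likelihoods, parameter counts and `n_obs` are made-up numbers, not results from this analysis, and the function names are hypothetical.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian Criterion: -2 ln(L) + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def average_criterion(log_likelihood, n_params, n_obs):
    """Average of AIC and SBC, the ranking score described above."""
    return (aic(log_likelihood, n_params)
            + sbc(log_likelihood, n_params, n_obs)) / 2.0

def rank_models(candidates, n_obs):
    """candidates maps (p, q) -> (log_likelihood, n_params); lowest score first."""
    scored = [(order, average_criterion(ll, k, n_obs))
              for order, (ll, k) in candidates.items()]
    return sorted(scored, key=lambda item: item[1])

# Made-up values purely for illustration; NOT the figures from the Excel sheet.
candidates = {(3, 0): (210.0, 4), (2, 0): (205.0, 3), (3, 1): (210.5, 5)}
ranking = rank_models(candidates, n_obs=100)

for order, score in ranking:
    print(order, round(score, 2))
```

Both criteria reward fit and penalize extra parameters; SBC penalizes them more heavily once n exceeds about 7, so averaging the two sits between the two penalties.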
Forecasting
The forecasts generated for the year 1960 by each of the 6 combinations selected on the basis of AIC & SBC are separately compared with the actual values for the same time points stored in the test dataset.
‘MAPE’ (Mean Absolute Percentage Error) is calculated for each of the above 6 sets of forecasts for the year 1960.
The lowest MAPE value is obtained for P=0 and Q=3, hence the final forecasting is done using this model.
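The MAPE comparison can be sketched as follows. This is a minimal Python illustration that assumes the forecasts have already been converted back to the original sales scale; the values and the labels `model_a` and `model_b` are made up and are not the forecasts from this analysis.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Made-up test-period values and candidate forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
forecasts = {
    "model_a": [110.0, 180.0, 400.0],
    "model_b": [90.0, 150.0, 440.0],
}

scores = {name: mape(actual, values) for name, values in forecasts.items()}
best = min(scores, key=scores.get)   # candidate with the lowest MAPE

print(scores)
print(best)
```

Because MAPE divides by the actual value, it is only defined when no actual value is zero, which holds for a sales series like this one.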
Forecasted Values
Graphical Representation
[Figure: Actual Vs Forecast. Series: Actual Sales Values, Forecasted Sales Values. X-axis: Date (Jan-49 to Sep-61). Y-axis: Sales Values (50 to 700).]
Appendix
Full Code is at “SAS code for forecasting”