Exploring data patterns
Overview
• The purpose of the forecast is to reduce the range of uncertainty within which management judgments must be made.
• Key guidelines for a forecast:
  • accurate enough and justified on a cost-benefit basis
  • effectively presented
Forecasting Process
• Problem formulation and data collection
• Data manipulation and cleaning
• Model building and evaluation
• Model implementation (the actual forecast)
• Forecast evaluation
Problem formulation and data collection
• The problem determines the appropriate data
  • If a quantitative forecasting methodology is being considered, the relevant data must be available and correct
• If appropriate data are not available
  • The problem may have to be redefined or a non-quantitative forecasting methodology employed.
Data manipulation and cleaning
• Some effort is required to get data into the form required by certain forecasting procedures.
• It is possible to have too much data as well as too little in the forecasting process.
• Some data may not be relevant to the problem.
• Some data may have missing values that must be estimated.
Data
• Data should be reliable and accurate
• Data should be relevant
• Data should be consistent
• Data should be timely
Generally there are two types of data:
• Data collected at a single point in time
  • Cross-sectional data: observations collected at a single point in time
• Observations of data made over time
  • Time series: data collected, recorded, or observed over successive increments of time
Time series data pattern
• Trend: long-term component that represents the growth or decline in the time series over an extended period of time
• Cyclical component: wavelike fluctuation around the trend
• Seasonal component: pattern of change that repeats itself year after year
Data patterns
[Figure: example time series in two panels, Trend-cyclical and Seasonal]
Exploring data patterns: autocorrelation analysis
• Autocorrelation: the correlation between a variable and itself lagged one or more periods
• Autocorrelation coefficients for different time lags of a variable are used to identify time series data patterns
Data patterns: autocorrelation analysis
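$$r_1 = \frac{\sum_{t=2}^{n} (Y_t - \bar{Y})(Y_{t-1} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$

$$r_2 = \frac{\sum_{t=3}^{n} (Y_t - \bar{Y})(Y_{t-2} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$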
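The autocorrelation coefficients can be computed directly from their definition. The sketch below assumes the usual lag-k form (deviations from the series mean, k periods apart, divided by the total sum of squared deviations), which is consistent with the r_k notation used in these slides. The function name and the data are hypothetical, chosen only for illustration.

```python
def autocorrelation(series, k):
    """Lag-k autocorrelation coefficient r_k of a sequence of numbers."""
    n = len(series)
    if not 0 < k < n:
        raise ValueError("lag k must satisfy 0 < k < len(series)")
    mean = sum(series) / n
    # Numerator: products of deviations k periods apart (t = k+1, ..., n).
    numerator = sum(
        (series[t] - mean) * (series[t - k] - mean) for t in range(k, n)
    )
    # Denominator: total sum of squared deviations (t = 1, ..., n).
    denominator = sum((y - mean) ** 2 for y in series)
    return numerator / denominator


# Hypothetical data, invented for illustration only.
y = [1, 2, 3, 4, 5]
r1 = autocorrelation(y, 1)
r2 = autocorrelation(y, 2)
print(r1, r2)
```

A constant series has a zero denominator, so this sketch would raise a ZeroDivisionError for it; a real implementation would need to decide how to handle that case.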
Data patterns
• Random
  • r_k is close to zero for every lag k
• Trend
  • r_k is high at k = 1 and becomes smaller as k increases
• Seasonal/cyclic
  • r_k is high at lags that match the seasonal/cycle repetition
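These three signatures can be checked on synthetic data. The sketch below builds one random, one trending, and one seasonal series (all invented for illustration, with an assumed seasonal period of 12) and computes r_1 to r_12 for each. The helper `acf` is a hypothetical name, not a library function.

```python
import math
import random


def acf(series, max_lag):
    """Return [r_1, ..., r_max_lag] for a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        / denom
        for k in range(1, max_lag + 1)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 240

# Synthetic series, invented for illustration only.
random_series = [rng.gauss(0, 1) for _ in range(n)]
trend_series = [100 + 0.5 * t for t in range(n)]
seasonal_series = [10 * math.sin(2 * math.pi * t / 12) for t in range(n)]

acf_random = acf(random_series, 12)
acf_trend = acf(trend_series, 12)
acf_seasonal = acf(seasonal_series, 12)

for name, values in [
    ("random", acf_random),
    ("trend", acf_trend),
    ("seasonal", acf_seasonal),
]:
    print(name, [round(r, 2) for r in values])
```

In this sketch the random series should stay near zero at every lag, the trend series should start high and decline slowly, and the seasonal series should peak at lag 12, the assumed season length.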
Example
[Figure: time plot of the example series; vertical axis from 100 to 170, horizontal axis from 0 to 15]
[Slides: calculation of r1, r2, and r3 for the example series]
Correlogram
• A correlogram, or autocorrelation function, is a graph of the autocorrelations for various lags of a time series
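A correlogram can be approximated in plain text. The sketch below prints one row per lag and flags coefficients outside ±1.96/√n, a common large-sample approximation to 5% limits for a random series. That limit is an assumption here; statistical packages may compute their significance limits differently. The function names and the data are hypothetical.

```python
import math


def acf(series, max_lag):
    """Return [r_1, ..., r_max_lag] for a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
        / denom
        for k in range(1, max_lag + 1)
    ]


def print_correlogram(series, max_lag):
    """Print one row per lag; return the lags outside the approximate limits."""
    limit = 1.96 / math.sqrt(len(series))  # assumed large-sample 5% limit
    significant = []
    for lag, r in enumerate(acf(series, max_lag), start=1):
        bar = ("+" if r >= 0 else "-") * round(abs(r) * 20)
        flag = "*" if abs(r) > limit else " "
        print(f"{lag:>3} {r:+.2f} {flag} {bar}")
        if abs(r) > limit:
            significant.append(lag)
    return significant


# Hypothetical upward-trending data, invented for illustration only.
series = list(range(1, 21))
significant_lags = print_correlogram(series, max_lag=5)
```

Rows marked with `*` are the lags whose coefficients fall outside the approximate limits.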
Random
[Figure: autocorrelation function for C1, lags 1 to 3, with 5% significance limits for the autocorrelations]
Trend
[Figure: autocorrelation function for C1, lags 1 to 10, with 5% significance limits for the autocorrelations]
[Figure: autocorrelation function for index, lags 1 to 16, with 5% significance limits for the autocorrelations]
Trend
• A significant relationship exists between successive time series values
• Autocorrelation coefficients are typically large for the first several time lags and then gradually drop toward zero as the number of lags increases
• A series with a trend is a non-stationary time series
  • Stationary time series: the mean and variance remain constant over time; the series varies about a fixed level (no growth or decline)
Seasonal data
[Figure: autocorrelation function for C1, lags up to 26, with 5% significance limits for the autocorrelations]
Seasonal data
• If a series is seasonal, a pattern related to the calendar repeats itself over a particular interval of time (usually a year)
• Observations in the same position for different seasonal periods tend to be related
Choosing a forecasting technique
• Define the nature of the forecasting problem
• Explain the nature of the data under investigation
• Describe the capabilities and limitations of potentially useful forecasting techniques
• Develop some predetermined criteria on which the selection decision can be made
A major factor influencing the selection of a forecasting technique is the identification and understanding of historical patterns in the data
Forecasting for stationary data
Stationary forecasting techniques are used whenever:
• The forces generating a series have stabilized and the environment in which the series exists is relatively unchanging
• A very simple model is needed because of lack of data or for ease of explanation or implementation
• Stability may be obtained by making simple corrections for factors such as population growth or inflation
• The series may be transformed into a stable one
• The series is a set of forecast errors from a forecasting technique that is considered adequate
Forecasting techniques: naïve, simple average, moving average, ARMA (Box-Jenkins)
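The three simplest techniques in this list can be sketched in a few lines. The function names and the sales figures below are hypothetical, and each function returns only a one-period-ahead forecast.

```python
def naive_forecast(series):
    """Next-period forecast = the most recent observation."""
    return series[-1]


def simple_average_forecast(series):
    """Next-period forecast = the mean of all observations so far."""
    return sum(series) / len(series)


def moving_average_forecast(series, window):
    """Next-period forecast = the mean of the last `window` observations."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


# Hypothetical stationary data, invented for illustration only.
sales = [20, 22, 19, 21, 23, 21]

naive = naive_forecast(sales)
average = simple_average_forecast(sales)
moving = moving_average_forecast(sales, window=3)
print(naive, average, moving)
```

The naive forecast reacts fully to the latest value, the simple average weights all history equally, and the moving average sits between the two depending on the window length.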
Forecasting techniques for data with trend
• A time series is said to have a trend if its average value changes over time (increases or decreases)
• Forecasting techniques:
  • Moving averages
  • Holt’s linear exponential smoothing
  • Simple regression
  • Growth curves
  • Exponential models
  • ARIMA (Box-Jenkins)
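As one illustration of the simple regression approach, the sketch below fits a straight line to the series against time by ordinary least squares and extrapolates it. It assumes time is indexed t = 1, ..., n; the function names and the data are hypothetical.

```python
def fit_linear_trend(series):
    """Least-squares fit of series[t-1] = intercept + slope * t, t = 1..n."""
    n = len(series)
    t_values = range(1, n + 1)
    t_mean = sum(t_values) / n
    y_mean = sum(series) / n
    slope = sum(
        (t - t_mean) * (y - y_mean) for t, y in zip(t_values, series)
    ) / sum((t - t_mean) ** 2 for t in t_values)
    intercept = y_mean - slope * t_mean
    return intercept, slope


def trend_forecast(series, periods_ahead):
    """Extrapolate the fitted line `periods_ahead` periods past the data."""
    intercept, slope = fit_linear_trend(series)
    return intercept + slope * (len(series) + periods_ahead)


# Hypothetical data lying exactly on the line y = 10 + 2t.
y = [12, 14, 16, 18, 20]
intercept, slope = fit_linear_trend(y)
next_value = trend_forecast(y, periods_ahead=1)
print(intercept, slope, next_value)
```

Because the sample data lie exactly on a line, the fit recovers that line; real data would leave residuals that should be checked for remaining patterns.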
Forecasting techniques for seasonal data
Forecasting techniques for seasonal data are used whenever:
• Weather influences the variable of interest, e.g. electrical consumption, clothing, agricultural growing season
• The annual calendar influences the variable of interest
Forecasting techniques: classical decomposition, Winters' exponential smoothing, multiple regression, ARIMA
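The idea behind seasonal adjustment can be sketched with simple seasonal indices: the average of each season divided by the overall average. This is a deliberately simplified stand-in, not full classical decomposition, which works with centered moving averages to separate out the trend first. The function name and the quarterly data are hypothetical.

```python
def seasonal_indices(series, period):
    """Average of each season divided by the overall average.

    Simplified: assumes no trend and a whole number of seasonal cycles.
    """
    if len(series) % period != 0:
        raise ValueError("series length must be a multiple of period")
    overall_mean = sum(series) / len(series)
    indices = []
    for season in range(period):
        season_values = series[season::period]  # same position in each cycle
        indices.append(sum(season_values) / len(season_values) / overall_mean)
    return indices


# Hypothetical quarterly data for two years, invented for illustration only.
quarterly = [80, 100, 120, 100, 80, 100, 120, 100]
indices = seasonal_indices(quarterly, period=4)
print(indices)
```

An index above 1 marks a season that runs above the overall average, and an index below 1 marks a season that runs below it.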
Forecasting techniques for cyclical series
• Cyclical patterns are difficult to model because their patterns are typically not stable
• The up-down wavelike fluctuations around the trend rarely repeat at fixed intervals of time; the magnitude of the fluctuations also tends to vary
• Because of the irregular behavior of cycles, analyzing the cyclical component of the series often requires finding coincident or leading economic indicators
Forecasting techniques for cyclical series
Forecasting techniques for cyclical data are used whenever:
• The business cycle influences the variable of interest
• Shifts in popular tastes occur
• Shifts in population occur
• Shifts in product life cycle occur
Forecasting techniques: classical decomposition, economic indicators, econometric models, multiple regression, ARIMA (Box-Jenkins)
Other factors to consider
• Time horizon
  • Short to intermediate term: moving averages, exponential smoothing, Box-Jenkins, classical decomposition, regression model
  • Long term: regression model
• Time constraint
  • Exponential smoothing, trend projection, regression model, classical decomposition
• Representation for the decision-making process
  • Regression model, trend projection, classical decomposition, exponential smoothing
Assignment #1
• Find data:
  • Jakarta IHSG
  • World oil price
  • Gold price
  • Rupiah exchange rate against the dollar
  • Gross Domestic Product
  • Blood demand
  • Number of tourists in DIY
  • Consumption of electricity/water/basic necessities (sembako)
• Evaluate the data patterns
• Report:
  • 10 pages maximum
  • Presentation next week
Model building and evaluation
• Fitting the collected data into a forecasting model that is appropriate in terms of minimizing forecasting error.
• The simpler the model, the better it is in terms of gaining acceptance.
• Judgment is involved in this selection process.
Model implementation (the actual forecast)
• Forecasting for recent periods in which the actual historical values are known is often used to check the accuracy of the process.
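This kind of check can be sketched by holding back the most recent periods and forecasting each one from the history before it. The sketch below uses a moving average as the forecasting technique purely as an example; the function name, window length, and demand figures are all hypothetical.

```python
def rolling_one_step_errors(series, n_holdout, window):
    """One-step-ahead moving-average forecast errors for the last periods.

    Each held-out period is forecast using only the observations before it.
    """
    if window > len(series) - n_holdout:
        raise ValueError("not enough history before the holdout periods")
    errors = []
    for t in range(len(series) - n_holdout, len(series)):
        history = series[:t]
        forecast = sum(history[-window:]) / window
        errors.append(series[t] - forecast)  # actual minus forecast
    return errors


# Hypothetical demand data, invented for illustration only.
demand = [10, 12, 11, 13, 12, 14, 13, 15]
errors = rolling_one_step_errors(demand, n_holdout=3, window=2)
print(errors)
```

The resulting errors can then be summarized with any of the accuracy measures used for forecast evaluation.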
Forecast evaluation
• Comparing forecast values with the actual historical values.
• Examination of the error patterns often leads the analyst to modify the forecasting procedure.
Measuring forecasting errors
1. Mean Absolute Deviation (MAD)
$$\mathrm{MAD} = \frac{\sum_{t=1}^{n} |A_t - F_t|}{n}$$
2. Mean Square Error (MSE)
$$\mathrm{MSE} = \frac{\sum_{t=1}^{n} (A_t - F_t)^2}{n}$$
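3. Mean Percentage Error (MPE)
$$\mathrm{MPE} = \frac{\sum_{t=1}^{n} \frac{A_t - F_t}{A_t}}{n} \times 100$$
4. Mean Absolute Percentage Error (MAPE)
$$\mathrm{MAPE} = \frac{\sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|}{n} \times 100$$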
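The four error measures (MAD, MSE, MPE, MAPE) can be sketched as below, assuming A_t denotes the actual value and F_t the forecast for period t. The function names and the sample numbers are hypothetical, and the percentage measures assume no actual value is zero.

```python
def mad(actual, forecast):
    """Mean Absolute Deviation: average of |A_t - F_t|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean Square Error: average of (A_t - F_t)^2."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mpe(actual, forecast):
    """Mean Percentage Error: average of (A_t - F_t) / A_t, times 100."""
    return 100 * sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percentage Error: average of |(A_t - F_t) / A_t|, times 100."""
    return (
        100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
    )


# Hypothetical actual and forecast values, invented for illustration only.
actual = [100, 200, 400]
forecast = [110, 190, 400]

print(mad(actual, forecast), mse(actual, forecast))
print(mpe(actual, forecast), mape(actual, forecast))
```

MPE keeps the sign of the errors, so it indicates bias (a negative value here means the forecasts ran high on average), while MAD, MSE, and MAPE measure the size of the errors regardless of direction.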
5. R-squared
6. Tracking Signal
$$\mathrm{TS} = \frac{\sum (X_t - F_t)}{\mathrm{MAD}}$$
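The tracking signal can be sketched as the running sum of forecast errors divided by the MAD, assuming the usual definition in which X_t is the actual value and F_t the forecast. The function name and the data are hypothetical.

```python
def tracking_signal(actual, forecast):
    """Sum of forecast errors (actual minus forecast) divided by the MAD."""
    errors = [x - f for x, f in zip(actual, forecast)]
    mad = sum(abs(e) for e in errors) / len(errors)
    return sum(errors) / mad


# Hypothetical data in which every forecast is too low.
actual = [100, 105, 110, 120]
forecast = [98, 100, 104, 110]
ts = tracking_signal(actual, forecast)
print(ts)
```

When all errors share the same sign, as in this sample, the signal equals the number of periods in absolute value, which is the most biased a forecast can look under this measure. Values near zero suggest the errors are balancing out.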
Cross-validation methods
• Simple cross validation
  • Training set & test set
• Double cross validation
  • Training set & test set, then swapped
• Test-set cross validation
  • A random 30% as the test set & the rest as the training set
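A random 30% test split can be sketched as follows. The function name, the seed, and the data are hypothetical. For time series the order of observations matters, so a random split like this is only one option and may not suit every forecasting problem.

```python
import random


def holdout_split(data, test_fraction=0.3, seed=0):
    """Randomly assign about `test_fraction` of the items to the test set."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    indices = list(range(len(data)))
    rng.shuffle(indices)
    n_test = round(len(data) * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return [data[i] for i in train_idx], [data[i] for i in test_idx]


# Twenty hypothetical observations, invented for illustration only.
data = list(range(100, 120))
train, test = holdout_split(data)
print(len(train), len(test))
```

The model would be fitted on `train` only and its forecast errors measured on `test`.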
• k-fold cross validation
  • Split into k parts, one part as the test set & the rest as the training set, repeated k times
• Leave-one-out cross validation
  • Similar to k-fold, with k = the number of data points
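Both schemes can be sketched with one index generator, since leave-one-out is the special case where k equals the number of data points. The function name is hypothetical, and the folds here are contiguous blocks without shuffling, which is a simplifying choice rather than a requirement.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k folds over n items."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))       # 5-fold on 10 observations
leave_one_out = list(k_fold_indices(4, 4))  # k = n gives leave-one-out

for train, test in folds:
    print("train:", train, "test:", test)
```

Each observation lands in exactly one test fold, so averaging the error over the k folds uses every data point for testing once.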
Questions?