
A STUDY OF NEURAL NETWORKS AND MULTIPLE NEURAL NETWORKS
IN MAKING SHORT-TERM AND LONG-TERM TIME-SERIES PREDICTION
OF PETROLEUM PRODUCTION AND GAS CONSUMPTION

A Thesis

Submitted to the Faculty of Graduate Studies and Research

In Partial Fulfillment of the Requirements

for the Degree of

Master of Science

in

Computer Science

University of Regina

by

Hanh Hong Nguyen

Regina, Saskatchewan

November, 2002

Copyright 2002: H.H. Nguyen


National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

Bibliotheque nationale du Canada
Acquisitions et services bibliographiques
395, rue Wellington, Ottawa ON K1A 0N4, Canada

ISBN: 0-612-82633-3

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


UNIVERSITY OF REGINA

FACULTY OF GRADUATE STUDIES AND RESEARCH

CERTIFICATION OF THESIS WORK

We, the undersigned, certify that Hanh Hong Nguyen, candidate for the Degree of Master of Science, has presented a thesis titled A Study of Neural Networks and Multiple Neural Networks in Making Short-Term and Long-Term Time-Series Prediction of Petroleum Production and Gas Consumption, that the thesis is acceptable in form and content, and that the student demonstrated a satisfactory knowledge of the field covered by the thesis in an oral examination held December 13, 2002.

External Examiner: Dr. Gordon Huang, Faculty of Engineering

Internal Examiners: Dr. Christine Chan, Supervisor


UNIVERSITY OF REGINA

FACULTY OF GRADUATE STUDIES AND RESEARCH

PERMISSION TO USE POSTGRADUATE THESIS

TITLE OF THESIS: A Study of Neural Networks and Multiple Neural Networks in Making Short-Term and Long-Term Time-Series Prediction of Petroleum Production and Gas Consumption

NAME OF AUTHOR: Hanh Hong Nguyen

DEGREE: Master of Science

In presenting this thesis in partial fulfillment of the requirements for a postgraduate degree from the University of Regina, I agree that the Libraries of this University shall make it freely available for inspection. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the professor or professors who supervised my thesis work, or in their absence, by the Head of the Department or the Dean of the Faculty in which my thesis work was done. It is understood that, with the exception of UMI Dissertations Publishing (UMI), any copying, publication or use of this thesis or parts thereof for financial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to me and to the University of Regina in any scholarly use which may be made of my material in my thesis.

SIGNATURE:

DATE: December 13, 2002


Abstract

The task of modeling data is difficult when the data of some variables are unavailable

either totally or partially during the examined time span. Lacking the data, it is at times

impossible to model causal relationships between those variables and the variable to be

forecasted. In such a case, a possible solution is to use univariate time series modeling

where the historical data of the variable of interest is used to develop a model. In this

thesis, a univariate time series approach, using solely the petroleum production and gas flow rate data respectively, is taken to construct two stand-alone feed-forward neural network forecasting models. The neural network approach was chosen for the tasks due to its ability to

handle non-linearity and its freedom from a priori selection of mathematical models. The

results of the experiments suggest that one-step-ahead forecasts can be made with

reasonable accuracy.

A relatively novel outcome of this thesis is the integration of individual artificial

neural networks into a single model that may produce better long-term predictions. Each

component network is constructed for making direct forecasts of a different time interval

ahead. The combination of individual artificial neural networks, called a multiple neural

network model, propagates forward in different-length steps in order to make forecasts.

Due to the various step-lengths, it is expected that the number of recursion steps is

smaller, and hence the accumulated error is lower.


Acknowledgements

I wish to express my sincere thanks to Dr. Christine W. Chan for her supervision,

encouragement and financial support.

Thanks to Saskatchewan Energy and Mines and SaskEnergy for providing the data on the

petroleum and gas consumption domains.

Special thanks go to Dr. Malcolm Wilson, Erik Nickel, Chris Gilboy and Dr. Gang Zhao

for providing their wise suggestions and expertise on the petroleum domain.

I am grateful to the Faculty of Graduate Studies and Research of the University of Regina

for providing the scholarships.

A warm word of thanks goes to my friends Tran Thi Minh Chau, Linhui Jiang and the

current and past students at the ERU lab for their friendships and useful discussions.

Last but not least, I wish to express my gratitude to my parents and sister for their

incredible encouragement and emotional support.


Table of Contents

ABSTRACT ............................................................................. ii
ACKNOWLEDGEMENTS ........................................................ iii
TABLE OF CONTENTS ........................................................... iv
LIST OF TABLES ................................................................. viii
LIST OF FIGURES .................................................................. ix
1. INTRODUCTION .................................................................. 1
   1.1 MOTIVATION AND RESEARCH OBJECTIVES ................... 1
   1.2 THESIS STRUCTURE ...................................................... 3
2. BACKGROUND ON FORECASTING AND TIME SERIES FORECASTING ......... 4
   2.1 OVERVIEW OF FORECASTING ........................................ 4
      2.1.1 Forecasting System Framework ................................... 4
      2.1.2 Forecasting Applications ............................................ 6
      2.1.3 Classification of Forecasting Models ............................. 7
      2.1.4 Selection of a Forecasting Model ................................. 9
   2.2 OVERVIEW OF TIME SERIES AND TIME SERIES FORECASTING ......... 10
      2.2.1 Time Series .......................................................... 10
      2.2.2 Decompositions of Time Series .................................. 12
      2.2.3 Performance Criteria ............................................... 16
      2.2.4 Time Series Forecast Techniques ................................ 18


         2.2.4.1 Simple Exponential Smoothing ............................. 18
         2.2.4.2 Holt-Winter .................................................... 19
         2.2.4.3 Univariate Box-Jenkins ...................................... 20
         2.2.4.4 Memory Based Reasoning ................................... 22
         2.2.4.5 Artificial Neural Networks .................................. 25
3. METHODOLOGY: NEURAL NETWORKS AND MULTIPLE-NEURAL-NETWORK FRAMEWORK ......... 28
   3.1 BACKGROUND ON NEURAL NETWORKS ........................ 28
      3.1.1 Artificial Neurons .................................................. 29
      3.1.2 Transfer Functions ................................................. 31
   3.2 BACK-PROPAGATION LEARNING PROCEDURE ................ 32
      3.2.1 Generalized Delta Rule and Gradient Descent ................. 32
      3.2.2 Back-propagation Formulae ...................................... 33
      3.2.3 Back-propagation Procedure ..................................... 37
   3.3 CONSIDERATIONS ON NEURAL NETWORK TOPOLOGY AND TRAINING PARAMETERS ......... 38
   3.4 LITERATURE REVIEW: MULTIPLE NEURAL NETWORKS APPROACHES ......... 39
   3.5 MULTIPLE NEURAL NETWORK APPROACH ..................... 45
      3.5.1 Motivation ........................................................... 45
      3.5.2 Structure of a Multiple Neural Network Model ............... 47
   3.6 TOOLS ..................................................................... 49
      3.6.1 NeurOn-line Tool-kit .............................................. 49


      3.6.2 Multiple Neural Network Tool ................................... 50
4. CASE STUDIES ................................................................. 61
   4.1 PETROLEUM PRODUCTION PREDICTION ....................... 61
      4.1.1 Data .................................................................. 64
         4.1.1.1 Data Collection ............................................... 64
         4.1.1.2 Data Cleaning and Transformation ........................ 64
         4.1.1.3 Data Set Manipulation ....................................... 66
      4.1.2 Using NeurOn-line ................................................. 66
         4.1.2.1 Development of a Model of Production Time Series and Geoscience Parameters ......... 67
         4.1.2.2 Development of a Model of Production Time Series Only ......... 69
      4.1.3 Using Multiple Neural Network ................................. 69
      4.1.4 Results ............................................................... 71
         4.1.4.1 NOL Model .................................................... 71
         4.1.4.2 Multiple-ANN and Single-ANN Models .................. 73
      4.1.5 Discussions .......................................................... 75
      4.1.6 Conclusion and Future Works ................................... 76
   4.2 HOURLY GAS FLOW PREDICTION ................................ 76
      4.2.1 Data Collection and Pre-processing ............................ 79
      4.2.2 Training and Validation .......................................... 79
      4.2.3 Testing ............................................................... 81
      4.2.4 Discussions .......................................................... 83
      4.2.5 Conclusion and Future Works ................................... 83


5. OBSERVATIONS AND DISCUSSIONS ...................................... 84
   5.1 DISCUSSIONS ON SUITABILITY OF TIME SERIES MODELLING IN FORECASTING ......... 84
   5.2 DISCUSSIONS ON USING THE NOL TOOL-KIT .................. 85
   5.3 DISCUSSIONS ON USING THE MNN TOOL ....................... 85
      5.3.1 Reusing weights of lower-ordered ANNs ....................... 85
      5.3.2 Using multi-step validation ...................................... 86
      5.3.3 Setting training parameters ...................................... 87
      5.3.4 Updating training parameters .................................... 88
6. CONCLUSION AND FUTURE WORKS .................................... 90
   6.1 CONCLUDING SUMMARY ............................................ 90
   6.2 FUTURE WORKS ........................................................ 92
BIBLIOGRAPHY ................................................................. 95
APPENDIX A - RUNNING THE MNN TOOL .............................. 100
APPENDIX B - FORMATS OF PARAMETER AND DATA FILES FOR THE MNN TOOL ......... 101
APPENDIX C - SAMPLE DATA ............................................... 104


List of Tables

Table 4.1.1 Network Configuration - model 1 ................................. 68
Table 4.1.2 Network configuration - model 2 .................................. 69
Table 4.1.3 Sensitivities - model 1 ............................................... 73
Table 4.1.4 Sensitivities - model 2 ............................................... 73
Table C.1 Sample of oil production data ...................................... 104
Table C.2 Sample of raw core analysis data .................................. 106
Table C.3 Sample of pressure data .............................................. 107
Table C.4 Sample of flow rate data at Melfort station ...................... 107


List of Figures

Figure 2.1 Conceptual framework of a forecasting system ..................... 5
Figure 2.2 A broad classification of forecasting methods ...................... 8
Figure 2.3 Australian monthly production of basic iron ....................... 13
Figure 2.4 The time series from the previous figure after removing trend effect ......... 13
Figure 2.5 A time series with seasonal effect ................................... 14
Figure 2.6 A sample k-d tree ...................................................... 24
Figure 3.1 A Multi-layer Artificial Neural Network ........................... 29
Figure 3.2 An Artificial Neuron .................................................. 30
Figure 3.3 Activation functions ................................................... 31
Figure 3.4 Layers in a feed-forward neural network ........................... 34
Figure 3.5 A sample MNN model ................................................ 48
Figure 3.6 Classes of the neural network system of the MNN tool .......... 50
Figure 3.7 Screen for inputting training parameters ........................... 53
Figure 3.8 Screen for inputting training parameters of component neural network ......... 55
Figure 3.9 Screen for inputting testing parameters ............................ 57
Figure 3.10 Screen for inputting parameters for prediction ................... 58
Figure 3.11 Screen for training output .......................................... 59
Figure 3.12 Screen for inputting testing output ................................ 59
Figure 3.13 Screens for prediction output ...................................... 60
Figure 4.1.1 Well production history ............................................ 63


Figure 4.1.2 Elimination of incomplete records ................................ 65
Figure 4.1.3 Predicted vs. target - model 1 ..................................... 71
Figure 4.1.4 Predicted vs. target - model 2 ..................................... 72
Figure 4.1.5 Test errors for MNN and Single ANN for different prediction periods ......... 74
Figure 4.1.6 Desired vs. predicted outputs ...................................... 74
Figure 4.2.1 Schematic of St. Louis East system ............................... 77
Figure 4.2.2 Hourly flow during a day .......................................... 78
Figure 4.2.3 Validated RMSE of 5 models for 24 hour period ............... 80
Figure 4.2.4 Test errors for MNN and single ANN for 24 hour period ...... 81
Figure 4.2.5 Predicted vs. actual for 24 hours ahead .......................... 82
Figure 4.2.6 Predicted vs. actual for 6 hours ahead ........................... 82
Figure 5.1 Side effect of large validation window ............................. 88
Figure A.1 Main screen of the MNN tool ..................................... 100


Chapter 1

Introduction

1.1 Motivation and Research Objectives

Forecasting is a key element of decision-making. The effectiveness of a decision often

depends heavily on events that occur after the decision. Therefore, the ability to

accurately predict the uncontrollable aspects of these events should improve the choice

that the decision-maker makes.

Time series modeling is a quantitative forecasting method. A time series is a

collection of observations made sequentially in time [Ch75]. In time series forecasting,

historical data is analyzed to identify common data patterns and develop a model that can

later be used for prediction of future values. Time series modeling has been applied in

various areas of business, engineering and science. Our study focuses on univariate time

series forecasting, where the input variables are time-lagged values of the output.
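
As a minimal illustration of this univariate setup (a sketch with hypothetical names, not code from the thesis's tools), a single series can be turned into supervised training pairs by using its own lagged values as the network inputs and the next observation as the target:

    import numpy as np

    def make_lagged_pairs(series, n_lags):
        """Turn a univariate series into (input, target) pairs: each input
        holds the n_lags most recent observations, the target is the next value."""
        X, y = [], []
        for t in range(n_lags, len(series)):
            X.append(series[t - n_lags:t])  # lagged inputs x[t-n_lags] .. x[t-1]
            y.append(series[t])             # one-step-ahead target x[t]
        return np.array(X), np.array(y)

    # Illustrative numbers only, standing in for a monthly production series.
    production = np.array([10.2, 9.8, 9.5, 9.9, 9.4, 9.1, 8.8, 8.9])
    X, y = make_lagged_pairs(production, n_lags=3)
    # X[0] = [10.2, 9.8, 9.5] -> y[0] = 9.9, and so on for later months.

Each row of X can then be fed to a feed-forward network whose single output is trained toward the corresponding entry of y.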

The neural network technique was chosen for this research mainly because it is free

from a priori selection of mathematical models while Box-Jenkins, one of the most

widely used statistical techniques for forecasting, requires it. Model selection involves

examining various graphs based on the transformed data to try to identify potential

mathematical models that might provide a good fit to the data. Other advantages of neural

networks include the ability to learn from examples, the ability to capture non-linear

structure, their parallel computations, and their fault tolerance via redundant information

coding.


Neural computing is an area of artificial intelligence first developed in the 1940s. It suffered a period of stagnation in the early 1970s after some limitations of the simple perceptron were identified in 1969 [Wi92]. In the late 1970s, neural networks received a considerable renewal of interest due to several improvements in network structures and algorithms. The back-propagation algorithm utilized in this thesis is one paradigm developed in that period.

There have been a reasonable number of successful applications of neural networks to time series forecasting, e.g., a financial application in [Wa01] and an industrial application in [DSMV01]. This thesis examines whether the neural network technique is suitable for the applications of petroleum production and gas consumption prediction.

Both the petroleum production and gas consumption prediction applications require prediction of multiple time units ahead. During the development process, we found that recursively applying a neural network that makes one-step-ahead forecasts is not sufficient for long-term or multiple-step-ahead forecasts. The short-term and long-term trends of a time series are often different from each other; if only one short-term neural network is applied recursively to predict the long term, the results can be very inaccurate. Therefore, we propose a multiple neural network model in which short-term and long-term neural networks are combined to cover a wide range of prediction horizons.

A multiple neural network (MNN) is a group of neural networks, each of which is trained to predict a different horizon. The ultimate goal of this combination is to see whether the accuracy of long-term forecasts can be improved.
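
A minimal sketch of this idea follows, with hypothetical predictor callables and a simplified scheduling rule (this is not the thesis's MNN tool, and how each predictor assembles its lagged inputs from the history is glossed over):

    def recursive_forecast(history, one_step_model, horizon):
        # Recursive forecasting with a single one-step-ahead model: every
        # forecast is fed back as an input, so error can accumulate over
        # `horizon` recursion steps.
        hist = list(history)
        for _ in range(horizon):
            hist.append(one_step_model(hist))
        return hist[len(history):]

    def mnn_forecast(history, models, horizon):
        # `models` maps a step length to a predictor for that many steps
        # ahead (a one-step model is assumed to be present).  At each point
        # the longest step that still fits into the remaining horizon is
        # used, so far fewer recursions are needed.
        hist, remaining = list(history), horizon
        while remaining > 0:
            step = max(s for s in models if s <= remaining)
            hist.append(models[step](hist))  # direct forecast `step` units ahead
            remaining -= step
        return hist[len(history):]

With component predictors for, say, 1, 6 and 24 steps ahead, a 24-step forecast takes a single application instead of twenty-four recursive ones, which is where the expected reduction in accumulated error comes from.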

In summary, the objectives of this research include:


• Investigating the feasibility of using feed-forward neural networks and time series

modeling in two applications to forecast petroleum production and gas consumption,

and

• Investigating whether grouping neural networks into a model improves the forecast

performance of a single neural network in long-term forecasting.

1.2 Thesis Structure

This thesis consists of six chapters. Chapter 1 gives a brief introduction of the thesis.

Chapter 2 provides an overview of forecasting in general and time-series forecasting in

particular. Chapter 3 presents the fundamentals of the artificial neural network technique

and reviews existing multiple neural network approaches in the literature. Chapter 4

contains details of the two case studies of developing neural network applications in

prediction of petroleum production and gas consumption. Chapter 5 provides some

discussions based on the case studies in Chapter 4. Chapter 6 draws some conclusions for

this thesis and gives recommendations for further research work.


Chapter 2

Background on Forecasting and Time Series Forecasting

This chapter provides background literature on forecasting and time series forecasting.

Section 2.1 gives an overview of a framework for and a classification of forecasting

systems, and reviews some forecasting applications. Section 2.2 focuses on a type of

forecasting called time series modeling. Several time series forecasting techniques will be

reviewed and discussed.

2.1 Overview of Forecasting

According to [CN03], forecasting and prediction belong to a sub-category within the

general taxonomy of tasks and share the objective of foretelling future events. They

involve making decisions when one does not know with certainty the effect of those

decisions due to, for example, the randomness of future events. Often, prediction makes use

of past and current data with known values to assign explicit values on some unknown or

future data. When expert opinion or heuristics are combined with historical data, it is

called forecasting. For example, a forecast or predictive model can be built using the

payment history of people to whom you have given loans to help identify people who are

likely to default on loans. In this study, the terms forecasting and prediction are used

interchangeably.

2.1.1 Forecasting System Framework

In general, development of a forecasting system consists of two main phases: modeling

(or training) and forecasting (or transfer) [Ru95] [Po89]. At the modeling phase, a


forecasting model is constructed from available data and theory. In some cases, a theory

exists that can suggest particular models. In most situations, however, an empirical model

is built from historical data. At the forecasting phase, the model is used to forecast. The

stability of the forecasting model can be assessed by checking the forecasts against

observations. If forecast errors are high, it is possible that the forecast environment is

different from the model development environment. In this case, adaptation of the model

to the new situation is needed. The modeling and forecasting phases may not be explicitly

separated. They can in fact be combined by presenting unfamiliar stimuli at several

points during the training phase so that the model's knowledge of the patterns is tested as

it progresses in learning [Po89]. If the model is unsatisfactory, it has to be re-specified,

tested again and so on until an adequate model is found. Figure 2.1 illustrates the

conceptual framework of a forecasting system.

[Figure 2.1 Conceptual framework of a forecasting system [Ru95]: a modeling phase (theory and/or previous study, model specification, estimation from data, adequacy check) feeding a forecasting phase (forecast generation, new observations, stability check).]
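
Read as a procedure, this framework is a simple loop. The sketch below only illustrates that control flow; the specification, estimation, adequacy and stability checks are placeholder callables, not methods defined in this thesis:

    def run_forecasting_system(data, specify, estimate, is_adequate,
                               forecast, observe, is_stable):
        while True:
            # Modeling phase: specify and estimate a model until adequate.
            model = estimate(specify(), data)
            if not is_adequate(model, data):
                continue
            # Forecasting phase: generate forecasts, compare them against new
            # observations, and return to modeling if the model is unstable.
            while True:
                prediction = forecast(model, data)
                actual = observe()
                data.append(actual)
                if not is_stable(prediction, actual):
                    break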

One approach for building a forecasting model is to use past data to construct a

function that can be used to make predictions under very general circumstances [Ru95].


However, this approach cannot always be carried out in practice. In some cases, the

underlying principles are unknown or poorly understood because the system of interest is

very complicated. Another problem with this approach is that even when the basic laws

are known, it is often not possible to forecast without detailed information about initial

values and boundary conditions. Forecasting models are often based on an assumption

that a well-defined relationship exists between the past and future values of a single

observable [Ru95].
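
Stated generically (this notation is illustrative rather than taken from [Ru95]), the assumption is that some function $f$ links the next value of the observable to its own recent past,

$$x_{t+1} = f(x_t, x_{t-1}, \ldots, x_{t-p}) + \varepsilon_{t+1},$$

where $p$ is the number of lagged values used and $\varepsilon_{t+1}$ is an error term; building a forecasting model then amounts to estimating $f$ from the historical data.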

2.1.2 Forecasting Applications

Forecasting is a key element of management decision-making. Since the ultimate

effectiveness of any decision depends upon a sequence of events following the decision,

the ability to predict the uncontrollable aspects of these events prior to making the

decision should permit an improved choice over that which would otherwise be made

[MJG90].

Examples of situations where forecasts are useful are production planning, financial

planning, staff scheduling and facilities planning. To plan the manufacture of a product

line, it could be necessary to forecast unit sales for each item by delivery period in the future. This forecast can then be converted into requirements for materials, labor, facilities, etc.,

so that the entire manufacturing system can be scheduled and the required investment can

be justified. Forecasting also plays an important part in process control. By simulating the

future behavior of a process, it may be possible to determine the optimal time and the

level of control actions.

Forecasting technology, principles and applications have been developed in many

different fields such as economics, meteorology, environmental management and control.


Some sample applications are listed as follows. Wu and Lu [WL93] forecasted the trend

of stock market performance. Tangang et al. [THT97] used neural networks to forecast the

sea surface temperatures of the equatorial Pacific. Gardner and Dorling [GD99], Boznar

et al. [BLM93] and Yi and Prybutok [YP92] modeled and predicted short-term air

concentration and ozone concentration based on basic meteorological data. In Kao and

Huang [KH00], a model was developed relating peak pollutant concentrations to

meteorological and emission variables and indices. Guhathakurta et al. [GRT99] and

Shahai et al. [SSS00] made forecasts of Indian summer monsoon rainfall, which is crucial for proper agricultural planning. Yasdi [Ya99] predicted daily road traffic flow in an effort to

assist the traffic control center. Swiercz et al. [SMKLS00] made predictions on

intracranial pressure, which provided valuable information on the condition of

neurosurgical patients. Utility demand forecasts are discussed in Lertpalangsunti and

Chan [LC98], Lertpalangsunti et al. [LCMT99] and Chiu et al. [CLC97].

2.1.3 Classification of Forecasting Models

Forecasting models can be broadly classified as qualitative or quantitative, depending upon the extent to which mathematical and statistical methods are used. Quantitative models belong to either the time series or the causal category. Figure 2.2 illustrates a broad classification of forecasting methods described in [Od83]. Each of the methods is discussed as follows.




(Figure: forecasting methods are divided into qualitative methods and quantitative methods, and quantitative methods are further divided into time series and causal methods.)

Figure 2.2 A broad classification of forecasting methods

• Qualitative forecasting methods generally use the intuitive opinions of experts to predict future events subjectively. These opinions, which may or may not depend on past data, belong to this category. Usually someone else cannot reproduce these forecasts because the forecaster does not specify explicitly how the available information is incorporated into the forecast.

• Quantitative forecasting methods are based on mathematical or statistical models.

They involve the analysis of historical data in an attempt to predict future values of a

variable of interest. Once the underlying model has been chosen, the future forecasts

are determined automatically; they are fully reproducible by any forecaster. Basically,

quantitative forecasting models fall into two fairly well defined categories: the time

series model and the explanatory or causal model.




• In time series models, historical data on the predicted variable are analyzed in an attempt to identify a data pattern. This pattern is then extrapolated to produce a forecast, on the assumption that it will continue in the future.

• Causal models relate the dependent variables to a number of independent variables.

After a model that describes the relationship between these variables has been

developed, it can be used to forecast the values of the dependent variables of interest.

The empirical evidence reported suggests that causal models do not provide

significantly more accurate forecasts than the time series models, even though the former

are more complex and expensive [HG93].

2.1.4 Selection of a Forecasting Model

The following are some of the main considerations in choosing a forecasting model

[Ru95].

• Required degree of accuracy

• Forecasting horizon

• Forecasting cost

• Degree of complexity

• Availability of data

Some techniques are better than others at making short-term or long-term forecasts. Hence the forecasting horizon should be taken into consideration when the forecasting technique is chosen.

In some cases only coarse forecasts are required; in others, highly accurate forecasts are essential. The required degree of accuracy depends on the consequences of making wrong forecasts.




The purpose of forecasting is to reduce the risk in decision-making. Forecasts are usually erroneous, but the magnitude of the errors depends upon the forecasting system used. By investing more resources in forecasting, the forecasting accuracy may be improved, thereby eliminating some of the loss due to uncertainty in the decision-making process. The optimal situation occurs when the total of the cost of the resources used for forecasting and the loss due to bad forecasting is minimal.

However, additional resources devoted to forecasting do not always bring any

improvement in accuracy. In the case where two models give similarly good results, the less complex model should be chosen [KR94].

To construct accurate empirical forecasting models, suitable data should be available.

However, it is not always possible to obtain the necessary data at reasonable cost.

2.2 Overview of Time Series and Time Series Forecasting

2.2.1 Time Series

A time series is a collection of observations made sequentially in time [Ch75]. In formal

terms, a time series is a sequence of vectors, depending on time t [Do96]:

x(t), t = 0, 1, ...

The components of the vectors can be any observable variable, such as the

temperature of a building, the total monthly production of an oil well, the gas

consumption in a given area, or the population of a certain country. Strictly speaking, the time index t must be a non-negative integer, but this restriction can sometimes be relaxed and, in some literature, t can take negative values.




A time series is said to be continuous when observations are made continuously in

time and discrete when observations are taken only at specific times, usually equally

spaced. Discrete time series can arise in several ways. One way is to sample from a

continuous time series at usually equal intervals of time. The result is called a sampled

time series. Another type of discrete series occurs when a variable does not have an

instantaneous value but we can aggregate or accumulate the values over equal intervals of

time.

Time series forecasting consists of estimating the unknown parameters in the appropriate model and, using these estimated parameters, projecting the model into the future to obtain a forecast [MJG90]. Suppose x(t), t = 0, 1, ..., n is an observed time series. The problem is to estimate x_{n+q}. The prediction of x_{n+q} made at time n of the value q steps ahead will be denoted as x̂(n, q). The integer q is called the lead time.

To forecast time series, it is necessary to represent the behavior of the process with a

mathematical model that can be extended into the future. It is required that the model be a

good representation of the observations in any local segment of time close to the present.

If a time series can be predicted exactly, it is said to be deterministic. But most real-world time series are stochastic in the sense that the future is only partly determined by past values. Exact predictions are impossible for such time series. Unknown and uncontrollable factors called noise account for the errors. In some studies, the characteristics of the noise are assumed in order to include noise in the modeling process.

The forecasting period is the basic unit of time for which the forecasts are made. For

example, when forecasts are made every week, the period is a week.




The forecasting horizon is the number of periods in the future covered by the forecast.

When a forecast is required for the next 10 weeks, broken down by week, the period is a

week and the horizon is 10 weeks. Forecasts typically become less accurate with

increasing forecast horizon. Sometimes the term lead time is used in place of forecast horizon.

The forecasting interval is the frequency with which new forecasts are prepared. In most cases the forecasting interval is the same as the forecasting period.

2.2.2 Decomposition of Time Series

Generally speaking, every real-life time series has fluctuations. The fluctuations in the data are caused by many diverse and complex factors. Decomposition is a basic method of analyzing a time series that attempts to group these factors into categories. A time series

is usually regarded as the combination of four meaningful components:

1. Trend component

2. Cyclical component

3. Seasonal component

4. Irregular or random component

Trend refers to the general direction in which the plot of a time series appears to be

rising or falling over a long period of time. Trend is a result of factors that produce a

steady and gradual change over time. A linear trend can be removed from a time series x by replacing it with a series x' consisting of the differences between subsequent values:

x'(t) = x(t) - x(t - 1)
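As a small illustration, the following Python sketch applies this first-differencing operation to a made-up series with a rising linear trend; the data values are invented for the example only.

    def difference(series, lag=1):
        """Return the differenced series x'(t) = x(t) - x(t - lag)."""
        return [series[t] - series[t - lag] for t in range(lag, len(series))]

    # A short made-up series with a rising linear trend.
    x = [10.0, 12.1, 13.9, 16.2, 18.0, 20.1]
    print(difference(x))   # values are roughly constant once the trend is removed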




Figure 2.3 shows the monthly production of basic iron from 1956 to 1995 in Australia

(source: http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/production.html,

file BASIRON.DAT, downloaded April 2002). The time series exhibits a close-to-linear

rising trend.

(Figure: plot titled "Australian monthly production of basic iron 1956-1995", production plotted against months.)

Figure 2.3 Australian monthly production of basic iron

After removing the trend effect, the plot fluctuates around the X-axis as shown in

Figure 2.4.

Figure 2.4 The time series from the previous figure after removing trend effect

While the trend is moving slowly upward or downward, there are oscillations or

fluctuations in a wave-like manner above and below the long-term trend line. These

wave-like cycles are called cyclical effects. These cycles tend to be recurrent but not




periodic, i.e. they may or may not follow exactly the same pattern after equal intervals of

time. An important example of cyclical effect in a time series is the business cycles that

represent intervals of prosperity, recession, depression and recovery.

The seasonal effect is not only recurrent but also periodic, and therefore predictable. The seasonal component refers to the identical or almost identical patterns that a time series appears to follow during corresponding months of successive years. Such movements are due to periodic influencing factors; e.g., the Christmas holiday influences gift sales. The seasonal effect is also easy to eliminate by computing the differences between corresponding sequence elements:

x'(t) = x(t) - x(t - s)

where s is the seasonal period. For example, the water consumption in a customer area shows similar patterns on corresponding days of the week, and s is 7 in this case.
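A minimal sketch of this seasonal differencing, assuming a daily series with a weekly pattern (s = 7) and made-up consumption values, is given below.

    def seasonal_difference(series, s):
        """Return the seasonally differenced series x'(t) = x(t) - x(t - s)."""
        return [series[t] - series[t - s] for t in range(s, len(series))]

    # Two weeks of made-up daily water consumption with a weekly pattern (s = 7).
    daily = [5, 4, 4, 6, 7, 9, 8, 5, 4, 5, 6, 7, 9, 9]
    print(seasonal_difference(daily, s=7))   # mostly near zero once the pattern is removed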

Figure 2.5 A time series with seasonal effect

All other kinds of fluctuations are grouped into a category called irregular or random

component. This refers to the erratic motion of a time series due to unusual and unpredictable events, such as natural disasters or wars. Although the duration of an irregular fluctuation is usually short, it may be severe in amplitude.




A time series may contain none or any possible combination of the four components.

The analysis of a time series consists of a description of the components present. There

are many ways to formulate a model of a time series. The two most common

mathematical models are:

• Additive Model: Y = T + C + S + R, and

• Multiplicative Model: Y = T × C × S × R,

where: T: the trend component

C: the cyclical component

S: the seasonal component

R: the random component

In practice, mixtures of multiplicative and additive components are also possible. The additive

model can be easier to handle but the multiplicative model may often be more

appropriate. In practice the decision as to which method of decomposition should be

assumed, depends on the degree of success achieved in applying the assumption [Ru95].

A multiplicative model may be handled within the additive framework by taking

logarithms of the components. The seasonal effect in Figure 2.4 is multiplicative.
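The log-transformation idea can be sketched as follows; the component values T, C, S and R are made-up numbers used only to show that the logarithm turns the multiplicative combination into an additive one.

    import math

    # Made-up component values for one time period.
    T, C, S, R = 100.0, 1.1, 0.9, 1.02

    y = T * C * S * R                                               # multiplicative model
    log_y = math.log(T) + math.log(C) + math.log(S) + math.log(R)   # additive in log space
    print(abs(math.log(y) - log_y) < 1e-12)                         # True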

Distinguishing between the components is usually not easy. Often the components are

so integrated that they are inseparable [Ru95].

The trend, cyclical and seasonal components are considered deterministic while the

random component is at best probabilistic. Accurate forecast of future values can be

expected only when the random variation, as measured by its variance, is small.

Otherwise, the fluctuations of the random variation over time may overwhelm the effect

of the other components or even cancel them out entirely [Ru95].




2.2.3 Performance Criteria

There are many different ways to define prediction error, each of which has advantages and disadvantages and is used in different circumstances. Generally speaking, the closer the forecasts ŷ_t are to the actual values y_t of the series, the more accurate the forecasting model is.

The most fundamental way to measure error is to calculate the difference between

actual and forecast values. The result is called absolute true error (ATE).

ATE = y_t - ŷ_t

The weakness of ATE is that it does not give any idea of how serious the error is relative

to the magnitude of the variable to be predicted. For example, an error of a meter is

unlikely to be a problem when estimating the dimensions of a wheat field, but it could be

significant when estimating the dimensions of a table.

Relative true error (RTE) gives an idea of the relative magnitude of the error. It is defined as the ratio of the absolute true error to the actual value:

RTE = (y_t - ŷ_t) / y_t

One drawback of using RTE arises when the actual value is extremely small, since the

division by this value will tend to seriously inflate RTE.

Based on the above two basic error measurements, the following are the most

commonly used measures of forecast accuracy.

• Mean Absolute Error (MAE) is defined as the average of the magnitudes of the

absolute true errors.

MAE = (1/n) Σ_{t=1}^{n} |y_t - ŷ_t|




• Mean Absolute Percentage Error (MAPE) is defined as the average of the magnitudes

of the relative true errors.

MAPE = (1/n) Σ_{t=1}^{n} |(y_t - ŷ_t) / y_t| × 100%

• Mean Square Error (MSE) is defined as the mean of the squared residuals.

MSE = (1/n) Σ_{t=1}^{n} (y_t - ŷ_t)^2

• Root Mean Square Error (RMSE) is defined as the positive square root of the mean

square error. It is also called the standard error of estimate.

RMSE = √MSE = √[ (1/n) Σ_{t=1}^{n} (y_t - ŷ_t)^2 ]
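As an illustration, the four accuracy measures can be computed as in the following Python sketch; the actual and forecast values are made-up.

    import math

    def mae(actual, forecast):
        return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

    def mape(actual, forecast):
        return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

    def mse(actual, forecast):
        return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

    def rmse(actual, forecast):
        return math.sqrt(mse(actual, forecast))

    # Made-up actual values and forecasts.
    y     = [10.0, 12.0, 11.0, 13.0]
    y_hat = [ 9.5, 12.5, 10.0, 13.5]
    print(mae(y, y_hat), mape(y, y_hat), mse(y, y_hat), rmse(y, y_hat))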

The basic difference between MAE and MSE (or RMSE) is that the latter squares the

amount of the ATE and by doing so it penalizes large errors more heavily than the former

does. Thus, MAE is an appropriate measure of forecast accuracy when the costs of

forecast errors increase linearly with the size of the error, while the MSE and RMSE are better when large errors are disproportionately costly [Ru95].

Whereas the MAE, MSE and RMSE have dimensions, the MAPE is unit-less.

Therefore, it is particularly useful for comparing the performance of a model on many

different time series. However, due to the drawback of the RTE component in the MAPE

equation, it is not advisable to use MAPE in the circumstances where a series has

extremely small terms.




2.2.4 Time Series Forecast Techniques

No single technique can be applied in every situation. A few of the widely used forecasting techniques for univariate time series are outlined below [Od93][Ch75][CN03].

2.2.4.1 Simple Exponential Smoothing

Smoothing techniques remove random variation and show trend and cyclic components. The simple exponential smoothing method can be applied only to stationary time series, that is, time series with trend and seasonal effects removed. In this technique, a weighted average of recently observed values of the variable of interest is used as a forecast.

x̂(n, 1) = c_0 x_n + c_1 x_{n-1} + c_2 x_{n-2} + ...   (2.1)

The weights {c_i} from the most recent to the older values are calculated as exponentially decreasing values. They are expressed as below in order to have a total sum of one.

c_i = α(1 - α)^i,   i = 0, 1, ...

where α is a constant in the open interval (0, 1). Equation (2.1) then becomes

x̂(n, 1) = α x_n + α(1 - α) x_{n-1} + α(1 - α)^2 x_{n-2} + ...   (2.2)

or




x̂(n, 1) = α x_n + (1 - α)[α x_{n-1} + α(1 - α) x_{n-2} + ...] = α x_n + (1 - α) x̂(n - 1, 1)   (2.3)

If we set x̂(1, 1) = x_1, then equation (2.3) can be used recursively to compute forecasts. A forecast is calculated from the latest observation and the previous forecast. The choice of α is made to minimize the MSE on past data.
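A minimal sketch of the recursive form (2.3) is given below; the series values are made-up, and the crude grid search over α is only one simple way of minimizing the MSE on past data.

    def ses_forecasts(series, alpha):
        """One-step-ahead forecasts from equation (2.3): f(n) = alpha*x_n + (1-alpha)*f(n-1)."""
        forecasts = [series[0]]                      # initialise x̂(1,1) = x_1
        for x in series[1:]:
            forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
        return forecasts                             # forecasts[i] predicts series[i+1]

    def fit_alpha(series, grid=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
        """Pick the alpha with the smallest one-step-ahead MSE on the past data."""
        def mse(a):
            f = ses_forecasts(series, a)
            errors = [(series[i + 1] - f[i]) ** 2 for i in range(len(series) - 1)]
            return sum(errors) / len(errors)
        return min(grid, key=mse)

    x = [20.0, 21.5, 19.8, 22.1, 23.0, 22.4, 24.2]   # made-up, roughly stationary data
    best = fit_alpha(x)
    print(best, ses_forecasts(x, best)[-1])          # chosen alpha and next-period forecast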

This method takes little time to develop, requires a minimal amount of data, and is

easily understood by users. It is fully automatic. There is no need for expert opinion in

developing such a model. Simple exponential smoothing is widely used in immediate and

short-term forecasting because this method is fast and relatively inexpensive.

2.2.4.2 Holt-Winters

This method is a more sophisticated and generalized version of exponential smoothing in

which allowance is made for trend and seasonal patterns in the data. The Holt-Winters

method has three updating equations to smooth three components: level or overall, trend

and seasonal effect. The equations are intended to give more weight to recent

observations and less weight to observations further in the past. These weights are

geometrically decreasing by a constant ratio. Each equation has coefficients based on

constants that range from 0 to 1.

The sets of equations are different for additive and multiplicative models. For the multiplicative model, the basic equations are as follows.

m_t = α x_t / s_{t-S} + (1 - α)(m_{t-1} + r_{t-1})   Overall smoothing
s_t = β x_t / m_t + (1 - β) s_{t-S}   Seasonal smoothing
r_t = γ(m_t - m_{t-1}) + (1 - γ) r_{t-1}   Trend smoothing
x̂(t, h) = (m_t + h r_t) s_{t-S+h}   Forecast




where

• x is the observation
• m is the smoothed (overall) component
• s is the smoothed seasonal index
• r is the trend factor
• S is the number of observations covered by the seasonal period
• x̂(t, h) is the forecast at h periods ahead
• t is an index denoting a time period
• α, β and γ are constants that must be estimated in such a way that the MSE of the forecast error is minimized

This method has all the advantages of simple exponential smoothing, but it tends to

be more accurate.
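A minimal sketch of the multiplicative updating equations above is given below; the initialisation from the first two seasons and the quarterly example values are simplifying assumptions of this sketch rather than a prescribed procedure.

    def holt_winters_multiplicative(series, S, alpha, beta, gamma, h):
        """Multiplicative Holt-Winters; returns the forecast h periods after the last observation."""
        # Simple initialisation from the first two seasons (an assumption of this sketch).
        m = sum(series[:S]) / S                                    # initial level
        r = (sum(series[S:2 * S]) - sum(series[:S])) / (S * S)     # initial trend
        s = [series[i] / m for i in range(S)]                      # initial seasonal indices

        for t in range(S, len(series)):
            x = series[t]
            m_prev = m
            m = alpha * x / s[t % S] + (1 - alpha) * (m_prev + r)  # overall smoothing
            s[t % S] = beta * x / m + (1 - beta) * s[t % S]        # seasonal smoothing
            r = gamma * (m - m_prev) + (1 - gamma) * r             # trend smoothing

        return (m + h * r) * s[(len(series) + h - 1) % S]          # forecast h steps ahead

    # Two years of made-up quarterly data (S = 4) with trend and seasonality.
    q = [10, 14, 8, 12, 12, 17, 10, 15]
    print(holt_winters_multiplicative(q, S=4, alpha=0.3, beta=0.3, gamma=0.2, h=1))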

2.2.4.3 Univariate Box-Jenkins

In the Box-Jenkins approach, a class of models referred to as auto-regressive integrated moving average (ARIMA) models is examined and an appropriate model is selected for forecasting. In an auto-regressive process, each observation is made up of a random error component and a linear combination of prior observations. In a moving average process, each observation is made up of a random error component and a linear combination of previous random error components. These two processes are independent of each other.

The Box-Jenkins modeling procedure involves five steps: data preparation, model selection, parameter estimation, model validation, and forecasting.

Step 1: Data preparation involves transformations and differencing. Transformation operations such as square roots or logarithms can stabilize the variance




in a series where the variation changes with the level. Then the data are differenced until

patterns such as trend or seasonality are totally removed from the data. Differencing

means taking the difference between consecutive observations or between observations a

time period apart. The stationary data are often easier to model than the original data.

Step 2: Model selection involves examining various graphs based on the transformed and differenced data to try to identify potential ARIMA processes which might provide a good fit to the data.

Step 3: Parameter estimation means finding the values of the model coefficients

that provide the best fit to the data. There are sophisticated computational algorithms

designed to do this.

Step 4: Model validation involves testing the assumptions of the model to identify

any areas where the model is inadequate. If the model is found to be inadequate, it is

necessary to go back to step 2 and try to identify a better model.

Step 5: Forecasting is the step after the model has been selected, estimated and

validated. In this step, forecasts are computed using the model.
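The parameter estimation and forecasting steps can be illustrated with a small sketch that fits a pure auto-regressive model of order p by least squares and forecasts one step ahead; this is only a toy stand-in for a full ARIMA fit, and the data are made-up.

    import numpy as np

    def fit_ar(series, p):
        """Estimate AR(p) coefficients by least squares: x_t ≈ c + a_1*x_{t-1} + ... + a_p*x_{t-p}."""
        x = np.asarray(series, dtype=float)
        rows = [x[t - p:t][::-1] for t in range(p, len(x))]        # lagged regressors
        X = np.column_stack([np.ones(len(rows)), np.array(rows)])
        y = x[p:]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef                                                 # [c, a_1, ..., a_p]

    def forecast_one_step(series, coef):
        p = len(coef) - 1
        lags = np.asarray(series[-p:][::-1], dtype=float)
        return coef[0] + coef[1:] @ lags

    data = [2.0, 2.3, 2.1, 2.6, 2.8, 2.7, 3.1, 3.0, 3.4]            # made-up (already differenced) data
    coef = fit_ar(data, p=2)
    print(forecast_one_step(data, coef))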

The contribution of Box and Jenkins was in developing a systematic methodology for

identifying and estimating models that could incorporate both autoregressive and moving

average approaches. This makes Box-Jenkins models a powerful class of models. They

are quite flexible due to the inclusion of both autoregressive and moving average terms. It

is usually possible to find a process that provides an adequate description of the data [Ch75]. This method provides the most accurate forecasts for immediate and short-term forecasting [Od93].




A disadvantage of the Box-Jenkins approach is that expert experience is required to identify a suitable model. Unlike the Holt-Winters method, the Box-Jenkins process is not automatic. Another disadvantage of ARIMA models is that there is no convenient way to update the model parameters when new observations arrive. This method also requires a moderately long series to fit a model to the data. Montgomery [MLJ90] recommends at least 50 and preferably 100 observations.

2.2.4.4 Memory Based Reasoning

Memory-Based Reasoning (MBR), also known as Case-Based Reasoning, is a form of K-

Nearest Neighbor (KNN) technique. MBR attempts to classify new data points by finding

their nearest neighbors in the state space. The concept of nearest neighbor is based on the

similarity between the pattern of interest and the patterns in the historical database.

A number of distance metrics such as Hamming and Euclidean can be used to

measure the level of similarity. There are several algorithms to reduce the search

complexity for the nearest neighbors. Two classical algorithms are bucketing and k-d

trees.

In the bucketing algorithm, the space is divided into identical cells, and the points in each cell are stored in a list. The cells are examined in order of increasing distance from the query point. For each cell, the distance between the internal points and the query point is computed. The search terminates when the distance from the query point to the current cell is greater than the distance to the closest point already visited.

A k-d tree is a generalization of a binary search tree to high dimensions. Each internal node in a k-d tree is a hyper-rectangle split by a hyper-plane orthogonal to one of the coordinate axes. The hyper-plane divides the hyper-rectangle into two parts associated




with the child nodes. The partitioning process continues until the number of points in

each hyper-rectangle is smaller than a given threshold. The purpose of the k-d tree is to partition the sample space according to the distribution of the data: the partitioning is finer where the density of points is higher. To locate the nearest neighbors of a query point, the tree is first descended to find the data points that lie in the same hyper-rectangle as the query point. Then the surrounding cells are examined if they overlap the sphere centered at the query point and containing the closest data point found so far.
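As an illustration, the sketch below builds a k-d tree over made-up two-dimensional points and queries the nearest neighbors of a point; the use of scipy's cKDTree and the data are assumptions of the example, not part of the thesis.

    import numpy as np
    from scipy.spatial import cKDTree

    # Made-up two-dimensional data points.
    points = np.random.default_rng(0).uniform(size=(200, 2))

    tree = cKDTree(points)                      # build the k-d tree over the data
    dist, idx = tree.query([0.5, 0.5], k=3)     # three nearest neighbors of the query point
    print(idx, dist)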




(Figure: a sample k-d tree over Nodes 1 through 9.)

Figure 2.6 A sample k-d tree

MBR is considered a lazy learning algorithm because it defers the data processing

until it receives a request to classify an unlabeled point. No models are created. Also,




confidence levels can be generated using relative distances to matching and non-matching neighbors.

Farmer and Sidorowich [FS87] attempted to predict the behavior of a time series generated by a chaotic system using the MBR approach. The time series was transformed into a reconstructed state space using a delay space embedding. In the delay space embedding, each point in the state space is a vector X composed of time series values corresponding to a sequence of d delay lags: x_1(t) = x(t), x_2(t) = x(t - τ), ..., x_d(t) = x(t - (d - 1)τ). The nearest k (> d) neighbors in the state space representation were then located. A local linear map was created from the k neighbors and applied to the value to be forecast. Although higher-order mappings could be used, Farmer and Sidorowich did not find significant improvements over the linear map. For chaotic time series, they found this approach to be more accurate than standard forecasting techniques such as global linear autoregressive models.
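A minimal sketch of nearest-neighbor forecasting in a delay embedding is given below; for simplicity it averages the successors of the k nearest delay vectors instead of fitting the local linear map used by Farmer and Sidorowich, and the series, d, tau and k values are made-up.

    import numpy as np

    def knn_forecast(series, d=3, tau=1, k=4):
        """Predict the next value by averaging the successors of the k nearest delay vectors."""
        x = np.asarray(series, dtype=float)
        start = (d - 1) * tau
        # Delay vectors [x(t), x(t-tau), ..., x(t-(d-1)tau)] with a known successor x(t+1).
        vectors = np.array([[x[t - i * tau] for i in range(d)] for t in range(start, len(x) - 1)])
        successors = x[start + 1:]
        query = np.array([x[len(x) - 1 - i * tau] for i in range(d)])   # embedding of the last point
        dist = np.linalg.norm(vectors - query, axis=1)
        nearest = np.argsort(dist)[:k]
        return successors[nearest].mean()

    # A made-up noisy periodic series.
    t = np.arange(60)
    series = np.sin(0.4 * t) + 0.05 * np.random.default_rng(1).normal(size=60)
    print(knn_forecast(series))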

The advantages of MBR are its understandability and the low computational cost

during training. The disadvantages of this method are its larger storage requirement and the higher computational cost on recall.

2.2.4.5 Artificial Neural Networks

The strategy of artificial neural networks is the opposite of the lazy algorithm above. Artificial neural networks compile the historical data into a model and discard the training data after the model has been built. Once a model has been developed, forecasts can be computed quickly.

An artificial neural network (ANN) is a system composed of many interconnected

processing elements operating in parallel. Its function is determined by network structure,




connection strengths, and the processing performed at computing elements or neurons.

Artificial neural networks are categorized based on their topology and learning rule.

The ANN technique has been applied to an increasing number of real-world problems of considerable complexity. ANNs are especially prominent in solving problems that are too complex for conventional technologies, such as problems that do not have an algorithmic solution or for which an algorithmic solution is too complex to be found. These problems

include pattern and trend recognition. With the remarkable ability to derive meaning from

complicated or imprecise data, neural networks can be used to extract patterns and detect

trends that are too complex for either humans or other computer techniques to notice. A

trained neural network can then be used to provide projections in new situations.

An ANN model correlates a dependent vector to an independent vector. Each vector

consists of one or more variables. In time series forecasting, the independent variables are

delay lags of the dependent variables. A number of adjoining data points of the time series, x_{t-k+1}, x_{t-k+2}, ..., x_t, form the input window or vector, and a point in the future, x_{t+m}, is the output. Here k is the window size and m is the lead time. The lead time refers to the period of time in the future for which a prediction is made. The value of m depends on whether the forecasting is long-term or short-term. If the prediction is made two months ahead, for example, then the lead time m is two months. The search for k is based on domain knowledge and often requires several trial-and-error experiments.
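The construction of input windows and outputs can be sketched as follows; the window size k, lead time m and series values are chosen arbitrarily for illustration.

    def make_windows(series, k, m):
        """Build (input window, target) pairs: window x_{t-k+1..t}, target x_{t+m}."""
        pairs = []
        for t in range(k - 1, len(series) - m):
            window = series[t - k + 1:t + 1]
            target = series[t + m]
            pairs.append((window, target))
        return pairs

    x = [1, 2, 3, 4, 5, 6, 7, 8]           # made-up series
    for window, target in make_windows(x, k=3, m=2):
        print(window, "->", target)         # e.g. [1, 2, 3] -> 5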

The ANN technique has the following advantages. Firstly, it has the ability to learn how to perform tasks based on training data. Secondly, it can capture non-linear structure. Thirdly, parallel computation makes it suitable for real-time operation. Fourthly, it is fault tolerant via redundant information coding. Finally, it is free from a priori selection of



Page 40: A STUDY OF NEURAL NETWORKS AND MULTIPLE NEURAL NETWORKS …

mathematical models. Although it is necessary to select neural network parameters and structures before training, this information is not as crucial as the mathematical model in other statistical methods.

However, there are also some disadvantages to the ANN technique. Firstly, the neural

interconnections and their logical meaning are complex and difficult to understand.

Secondly, the process of finding a suitable topology and identifying the network

parameters is empirical and often time-consuming. Thirdly, neural networks usually

require long training time on a serial computer simulation although the resultant network

can perform in real-time situations. Fourthly, a neural network can over-fit the data, i.e.

the network memorizes the training data but has low forecast ability. Finally, a neural

network can easily become stuck in a local minimum.

More details on the ANN technique will be provided in the next chapter.


Chapter 3

Methodology: Neural Networks and Multiple-Neural-Network

Framework

This chapter provides background literature on neural network methodology and

introduces the multiple neural network structure used in this project. Section 3.1 provides

some general ideas about neural networks. Section 3.2 focuses on back-propagation

learning procedure. Some practical considerations in neural network topology and

training are discussed in section 3.3. Literature on multiple neural network approaches is reviewed in section 3.4. Section 3.5 introduces the multiple-neural-network framework, including its motivation and structure. Lastly, section 3.6 discusses the tools

to develop neural network applications.

3.1 Background on Neural Networks

An artificial neural network (ANN), or neural network (NN) in short, is an information-

processing paradigm that is inspired by the way biological nervous systems, such as the

brain, process information. The key element of this paradigm is the novel structure of the

information processing system. It is composed of a large number of highly interconnected

processing elements called neurons working together to solve specific problems (Figure

3.1). Neural networks have the ability to learn from examples. An ANN is configured for a

specific application, such as pattern recognition or data classification, through a learning

process.


Figure 3.1 A Multi-layer Artificial Neural Network (input layer, hidden layer, and output layer)

3.1.1 Artificial Neurons

Artificial neural networks typically consist of artificial neurons as shown in Figure 3.2.


Figure 3.2 An Artificial Neuron (labelled with dendrites, cell body, summation, threshold, and axon)

The artificial neuron is viewed as a node or cell body connected to other nodes via

links that correspond to axon-synapse-dendrite connections. Each link is associated with

a weight. Similar to a synapse in a biological neuron, the weight determines the influence

or strength of one node on another. If a weight is negative, then the connection is

inhibitory, i.e. decreasing the activity of the target unit; if it is positive, it has an

excitatory, i.e. activity enhancing, effect. The influence received from an input link is

called the weighted input and is the product of the corresponding input and the weight of

the link.

At each node, the following computational model is applied.

$$y = \theta\Big(\sum_{i=1}^{n} w_i x_i + t\Big)$$


Here $w_i x_i$ is a weighted input, $t$ is the node's threshold or bias, $y$ is the output, and $\theta$ is the threshold function.
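A minimal sketch of this node computation, assuming a simple step function for the threshold function and arbitrary example weights (an illustration only, not code from this thesis):

```python
# Sketch of the node model y = theta(sum_i w_i * x_i + t).
# The step function below is just one possible choice for the threshold function theta.

def step(a):
    """Return 1 if the argument is positive, otherwise 0."""
    return 1.0 if a > 0 else 0.0

def neuron_output(weights, inputs, bias, theta=step):
    """Weighted sum of the inputs plus the bias, passed through theta."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return theta(weighted_sum)

if __name__ == "__main__":
    print(neuron_output(weights=[0.5, -0.3], inputs=[2.0, 1.0], bias=0.1))  # prints 1.0
```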

3.1.2 Transfer Functions

Each node combines the separate influences received on its input links into an overall

influence using a transfer function or activation function $\theta$. A transfer function is usually

non-linear.

One simple transfer function passes the sum of the weighted inputs through a

threshold function to determine the node's output. The output is either 0 or 1 depending

on whether the sum of the input is below or above the threshold value used by the

threshold function.

Other transfer functions include piece-wise linear, sigmoid and Gaussian, as shown in Figure 3.3.

Figure 3.3 Activation functions (threshold, piecewise linear, sigmoid, and Gaussian)

A linear activation function is often used for output units. For hidden units, the logistic sigmoid function is probably the most frequently used in ANNs. It is a strictly increasing function that exhibits smoothness.


$$y = \frac{1}{1 + e^{-k\sigma}}$$

In this equation, k is the slope factor.

This function has the desirable property that its gradient can be expressed as a simple function of the output: $y'(x) = k\,y(x)(1 - y(x))$. The gradient is used in the gradient descent algorithm, which is a part of the back-propagation learning procedure.
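As a quick illustration (not part of the original text), this gradient property can be checked numerically; the slope factor k = 1 and the test point used below are arbitrary choices.

```python
import math

# Logistic sigmoid y = 1 / (1 + exp(-k*x)) and its gradient k * y * (1 - y).
def sigmoid(x, k=1.0):
    return 1.0 / (1.0 + math.exp(-k * x))

def sigmoid_gradient(x, k=1.0):
    y = sigmoid(x, k)
    return k * y * (1.0 - y)

if __name__ == "__main__":
    x, k, h = 0.7, 1.0, 1e-6
    numerical = (sigmoid(x + h, k) - sigmoid(x - h, k)) / (2 * h)  # finite-difference check
    print(sigmoid_gradient(x, k), numerical)  # the two values should agree closely
```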

3.2 Back-Propagation Learning Procedure

The backward-error-propagation procedure (back-propagation in short) is the most

widely used learning procedure for neural networks. This procedure is simple but

relatively efficient. The learning rule in this procedure is called the generalized delta rule.

The generalized delta rule does hill climbing by gradient descent.

3.2.1 Generalized Delta Rule and Gradient Descent

The delta rule developed by Widrow and Hoff (cited in [Ru95]) is one of the most

commonly used learning rules. For a given input vector, the output vector is compared to

the correct answer. The weights are then adjusted to reduce the difference if there is any.

It is an error correcting procedure. The change in weight from a unit in layer i to a unit in

layer j is given by

$$\Delta w_{i \to j} = \eta\,(d_j - o_j)\, o_i = \eta\, \delta_j\, o_i$$

where $\eta$ is the learning rate, $d_j$ is the desired output and $o_j$ is the actual output of the unit in layer j, $o_i$ represents the output of the unit in layer i, and $\delta_j = d_j - o_j$ is sometimes called the error at the unit in layer j.
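A small sketch of the delta rule for a single output unit; the linear output unit, the toy data, and the learning rate below are assumptions made only for this illustration.

```python
# Delta rule for a network without hidden layers:
#   delta_w(i->j) = eta * (d_j - o_j) * o_i

def delta_rule_update(weights, inputs, desired, eta=0.1):
    """Return updated weights for a single (assumed linear) output unit."""
    actual = sum(w * x for w, x in zip(weights, inputs))  # output of the unit
    error = desired - actual                              # delta_j = d_j - o_j
    return [w + eta * error * x for w, x in zip(weights, inputs)]

if __name__ == "__main__":
    w = [0.0, 0.0]
    for _ in range(50):
        w = delta_rule_update(w, inputs=[1.0, 2.0], desired=1.0)
    print(w)  # after repeated updates the output approaches the desired value
```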

The delta rule works well for neural networks without hidden layers. However, with

hidden layers, the desired outputs of the hidden units are not known, and in fact can only


be computed after the best set of weights has been found, thus the weight adjustments

cannot be calculated. Rumelhart et al. (cited in [Ru95]) developed a generalized form of

the delta rule that is suited for networks with hidden layers. They showed that the method

works for the class of semi-linear activation functions, which are non-decreasing and differentiable. To get the error generated from the output of a middle layer, the back-propagation procedure backtracks through the middle layer to the units that are responsible for generating that output. The error generated from the middle layer can then be used with the delta rule to adjust the weights.

The back-propagation procedure is based on the hill climbing process. Hill climbing

is a process of making small changes toward a solution. Each change makes the solution

slightly better, until no further improvements are possible. One way to do hill climbing is

to measure the effects of changing one weight at a time while keeping all the other

weights constant. Then, only the weight that does the most good will be changed.

However, better performance can be obtained if the hill is a sufficiently smooth function

of the weights. In this case, it is possible to proceed in the direction of the most rapid

performance improvement by varying all the weights simultaneously in proportion to

how much good is done by individual changes. This procedure is called gradient descent.

The mathematical explanations for the back-propagation learning procedure are provided in the section below, which is based mostly on Winston [Wi92].

3.2.2 Back-Propagation Formulae

The purpose of making adjustments to the weights is to improve the performance of the

network. One way to measure the performance of a network is to calculate the negative of

total square error:


$$P = -\sum_{s}\sum_{z} (d_{sz} - o_{sz})^2$$

where

P is the measured performance,

s is an index that ranges over sample inputs,

z is an index that ranges over all output nodes,

$d_{sz}$ is the desired output for sample input s at the zth node,

$o_{sz}$ is the actual output for sample input s at the zth node.

Figure 3.4 Layers in a feed-forward neural network (hidden layer i, hidden layer j, hidden layer k, output layer z)

The gradient descent rule suggests that the best improvement in performance is

achieved when all the weights are altered in proportion to the corresponding partial

derivative.

The partial derivative of the performance with respect to a particular weight can be

computed by adding up the partial derivative for each input pattern separately. Thus, we


can focus on an input pattern one at a time and then at the end each weight will be

adjusted by summing the adjustments derived from each input pattern.

Consider, then, the partial derivative

$$\frac{\partial P}{\partial w_{i \to j}}$$

where the weight $w_{i \to j}$ is the weight of a link connecting the i layer of nodes to the j layer of nodes.

The effect of $w_{i \to j}$ on performance P is through the intermediate variable $o_j$, the output of the j node. Using the chain rule:

$$\frac{\partial P}{\partial w_{i \to j}} = \frac{\partial P}{\partial o_j}\,\frac{\partial o_j}{\partial w_{i \to j}} = \frac{\partial o_j}{\partial w_{i \to j}}\,\frac{\partial P}{\partial o_j}$$

Now consider the first quotient on the right hand side. We know that $o_j$ is the result of passing the sum of weighted inputs to an activation function, $o_j = f\big(\sum_i o_i\, w_{i \to j}\big)$. Treating the sum as an intermediate variable $\sigma_j$ and applying the chain rule again:

$$\frac{\partial o_j}{\partial w_{i \to j}} = \frac{d f(\sigma_j)}{d\sigma_j}\,\frac{\partial \sigma_j}{\partial w_{i \to j}} = \frac{d f(\sigma_j)}{d\sigma_j}\, o_i = o_i\,\frac{d f(\sigma_j)}{d\sigma_j}$$

Substituting this result into the previous equation, we have key equation (i):

$$\frac{\partial P}{\partial w_{i \to j}} = o_i\,\frac{d f(\sigma_j)}{d\sigma_j}\,\frac{\partial P}{\partial o_j} \qquad \text{(i)}$$

Note that the last quotient on the right hand side can be expressed in terms of the partial derivatives, $\partial P / \partial o_k$, in the next layer k.

$$\frac{\partial P}{\partial o_j} = \sum_k \frac{\partial P}{\partial o_k}\,\frac{\partial o_k}{\partial o_j} = \sum_k \frac{\partial o_k}{\partial o_j}\,\frac{\partial P}{\partial o_k}$$


We know that $o_k = f\big(\sum_j o_j\, w_{j \to k}\big)$, where f is the activation function. Treat the sum as an intermediate variable, $\sigma_k$, and apply the chain rule:

$$\frac{\partial o_k}{\partial o_j} = \frac{d f(\sigma_k)}{d\sigma_k}\,\frac{\partial \sigma_k}{\partial o_j} = \frac{d f(\sigma_k)}{d\sigma_k}\, w_{j \to k} = w_{j \to k}\,\frac{d f(\sigma_k)}{d\sigma_k}$$

Substituting this result back into the equation for $\partial P / \partial o_j$ yields the following key equation (ii):

$$\frac{\partial P}{\partial o_j} = \sum_k w_{j \to k}\,\frac{d f(\sigma_k)}{d\sigma_k}\,\frac{\partial P}{\partial o_k} \qquad \text{(ii)}$$

The two key equations (i) and (ii) have two important consequences. First, the partial

derivative of performance with respect to a weight depends on a partial derivative of

performance with respect to the following output. Second, the partial derivative of

performance with respect to one output depends on the partial derivatives of performance

with respect to the outputs in the next layer. That is the reason why we need to compute

error backward from the last layer to the initial layer.

The partial derivative of performance with respect to each output in the final layer is:

$$\frac{\partial P}{\partial o_z} = \frac{\partial}{\partial o_z}\big({-(d_z - o_z)^2}\big) = 2(d_z - o_z) \qquad \text{(iii)}$$

Using equation (iii) to compute backward, the only unsolved factor in equations (i) and (ii) is the derivative of the activation function. For the logistic sigmoid $f(\sigma) = \dfrac{1}{1 + e^{-k\sigma}}$, the derivative can be computed easily.

$$\frac{d f(\sigma)}{d\sigma} = \frac{d}{d\sigma}\Big[\frac{1}{1 + e^{-k\sigma}}\Big] = k\,(1 + e^{-k\sigma})^{-2}\, e^{-k\sigma} = k\, f(\sigma)\,(1 - f(\sigma)) = k\, o\,(1 - o)$$


Finally, weight changes should depend on a learning rate parameter $\eta$. Denote $\beta_n = \partial P / \partial o_n$ and absorb the constants into the learning rate; then we have the following set of formulae.

(1) $\Delta w_{i \to j} = \eta\, o_i\, o_j (1 - o_j)\, \beta_j$

(2) $\beta_j = \sum_k w_{j \to k}\, o_k (1 - o_k)\, \beta_k$

(3) $\beta_z = d_z - o_z$

where

$\eta$ is the learning rate

$o_i$, $o_j$, $o_k$ are actual outputs of nodes in hidden layers i, j and k respectively

$o_z$ is the observed output of a node in output layer z

$d_z$ is the desired output of a node in output layer z

$\beta_n$ is a factor that measures how beneficial a change is to a node in layer n

$w_{j \to k}$ is a weight of a link connecting a node from layer j to a node from layer k

$\Delta w_{i \to j}$ is the weight change from layer i to layer j

Formula (3) computes the benefit to an output node while formula (2) does the same

but for a hidden node. Formula (1) calculates the weight change from a node in layer i to

a node in layer j.

The back-propagation learning procedure using the above formulae is described in the following section.

3.2.3 Back-Propagation Procedure

There are two phases in a back-propagation procedure. In the first phase called feed-

forward, output is calculated from an input pattern. In the second phase, it computes

changes to the weights in the final layer first, reuses much of the same computation to


compute changes to the weights in the previous layer, and ultimately returns to the initial

layer. This is what gives back-propagation its name. Using formulae (1), (2) and (3) from section 3.2.2, the procedure is described as follows.

• Let $\eta$ be the learning rate

• Set all weights and biases to small random values

• Until total error is small enough do

• For each input vector

• Do feed forward pass to get outputs

• Compute benefit $\beta$ for output nodes using (3)

• Compute benefit $\beta$ for hidden nodes, working from the last

layer to the first layer using (2)

• Compute and store weight changes for all weights using (1)

• Add up weight changes for all input vectors and change the weights

The procedure above is called batch back-propagation. In online back-propagation,

weights are updated after each input vector is fed instead of at the end of the whole cycle.
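A minimal sketch of the batch procedure, assuming one hidden layer of logistic sigmoid units (slope factor 1), biases treated as extra weights on a constant input of 1.0, a fixed number of passes in place of an error threshold, and a toy XOR data set. This is an illustration of formulae (1)-(3) above, not the implementation used in this thesis.

```python
import math
import random

random.seed(0)

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Toy topology: 2 inputs -> 4 sigmoid hidden units -> 1 sigmoid output unit.
n_in, n_hid = 2, 4
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]

def forward(x):
    hid = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hid]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hid + [1.0])))
    return hid, out

data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
eta = 0.5

for _ in range(10000):                    # fixed number of passes (assumed stopping rule)
    dw_hid = [[0.0] * (n_in + 1) for _ in range(n_hid)]
    dw_out = [0.0] * (n_hid + 1)
    for x, d in data:
        hid, out = forward(x)
        beta_z = d - out                                       # formula (3)
        beta_hid = [w_out[j] * out * (1.0 - out) * beta_z      # formula (2)
                    for j in range(n_hid)]
        for j, v in enumerate(hid + [1.0]):                    # formula (1), hidden -> output
            dw_out[j] += eta * v * out * (1.0 - out) * beta_z
        for j in range(n_hid):                                 # formula (1), input -> hidden
            for i, v in enumerate(x + [1.0]):
                dw_hid[j][i] += eta * v * hid[j] * (1.0 - hid[j]) * beta_hid[j]
    # batch mode: apply the summed changes once per pass over the training set
    w_out = [w + dw for w, dw in zip(w_out, dw_out)]
    w_hid = [[w + dw for w, dw in zip(row, drow)] for row, drow in zip(w_hid, dw_hid)]

for x, d in data:
    print(x, round(forward(x)[1], 3), "target", d)  # outputs should approach the targets
```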

3.3 Considerations on Neural Network Topology and

Training Parameters

A major disadvantage of the neural network approach is that the process of finding a

suitable topology and identifying the network parameters is empirical and often time-

consuming. To develop the best neural network for a particular problem, both the

network topology and the network parameters need to be optimized. Bad choice of


parameters can cause network training to be stuck in a local minimum or oscillate around

a minimum.

The numbers of input and output variables in a causal model can often be determined

provided good domain knowledge is available. The outputs are usually the predicted

variables. The inputs are the parameters that influence the outputs and for which data are

either available or easy to obtain. Although it is feasible to use a neural network approach

without an analysis of the independent variables involved, it is generally desirable to

establish a priori the significant variables. This can simplify the data collection process as well as the complexity of the networks. However, if data gathering and processing is

relatively inexpensive, a common approach is to input all available process parameters to

the network and then let it learn which variables are important. While it may take several

experiments involving different sets of input variables and a number of sensitivity tests

before the most significant input variables can be identified, this approach ensures all

possible variables are examined in the model. The number of hidden nodes is often

determined through a trial-and-error process.
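One common way to organize this trial-and-error search is to train candidate networks with different numbers of hidden nodes and keep the one with the lowest error on held-out data. The sketch below only illustrates the idea; `train_network` and `validation_error` are hypothetical placeholder helpers, not routines defined in this thesis.

```python
# Illustrative trial-and-error search over the number of hidden nodes.
# `train_network` and `validation_error` are hypothetical helpers standing in
# for whatever training and evaluation routines are actually used.

def select_hidden_nodes(train_set, validation_set, candidates,
                        train_network, validation_error):
    best_size, best_err, best_net = None, float("inf"), None
    for n_hidden in candidates:
        net = train_network(train_set, n_hidden)       # train a candidate topology
        err = validation_error(net, validation_set)    # measure error on held-out data
        if err < best_err:
            best_size, best_err, best_net = n_hidden, err, net
    return best_size, best_net

# Example call (with user-supplied helpers):
# size, net = select_hidden_nodes(train, valid, [2, 4, 8, 16], train_network, validation_error)
```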

3.4 Literature Reviews: Multiple Neural Networks

Approaches

The multiple neural network method has been explored by several researchers. There are various ways to integrate individual neural networks into one model, each aimed at a

different purpose. This section will review a few approaches that are not necessarily

related to time series modeling, but which explore different aspects related to neural

networks.


For a given prediction problem, several neural network solutions can be obtained. The

network resulting in the least testing error is usually chosen. However, the network may

not be the optimum when it is applied to the whole population. Hashem et al. [HSY94]

proposed using optimal linear combinations of a number of trained neural networks

instead of using a single best network. Each component network can have a different

architecture and/or training parameters. Optimal linear combinations are constructed by

forming weighted sums of the corresponding outputs of the networks. The combination

weights are selected to minimize the mean squared error with respect to the distribution

of random inputs. Combining the trained networks may help integrate the knowledge acquired by the component networks and thus improve model accuracy. From a neural

network perspective, combining the corresponding outputs of a number of trained

networks is similar to creating a large neural network in which the component neural

networks are sub-networks operating in parallel, and the combination weights are the

connection weights of the output layer.
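The core idea of an optimal linear combination can be illustrated with a least-squares fit of the component outputs to the targets. This sketch is only a rough illustration of the idea, not the exact estimation procedure of Hashem et al.; the toy outputs and targets are made-up values.

```python
# Least-squares estimate of combination weights for an ensemble of trained networks.
# component_outputs[n][s] is the output of network n on sample s; targets[s] is the
# desired value for sample s.
import numpy as np

def combination_weights(component_outputs, targets):
    A = np.asarray(component_outputs).T          # one column per component network
    y = np.asarray(targets)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)    # weights that minimize squared error
    return w

def combined_prediction(weights, new_outputs):
    return float(np.dot(weights, new_outputs))   # weighted sum of component outputs

if __name__ == "__main__":
    outputs = [[1.0, 2.0, 3.0], [1.2, 1.9, 3.3]]  # two networks, three samples (toy values)
    targets = [1.1, 2.0, 3.1]
    w = combination_weights(outputs, targets)
    print(w, combined_prediction(w, [2.5, 2.6]))
```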

Hashem et al. conducted an experiment to evaluate the effectiveness of using optimal

linear combination of neural networks in function approximation. Ten independent

training sets were generated using a chosen function. Each data set was used to train six

neural networks. At the end, ten replications of six trained neural networks were

produced. For each replication, combination weights were estimated. The results

demonstrated that the combinations of neural networks could substantially improve

model accuracy as compared to the approach using individual neural networks.

Another interesting observation in [HSY94] is that the model accuracy achieved by

combining the poorly trained component networks was better than that achieved by


combining the well-trained component networks. Hashem et al. interpreted this as an

indication that the effectiveness of the combined model does not depend on accuracy of

individual component neural networks. In the case where the component neural networks

were poorly trained, the combination weights play a significant role in the model. That is,

the poorly trained networks are given less weight. When the component networks were

well trained, the combination weights only assume the "fine-tuning" role.

The ensemble neural network system introduced by Hashem et al. can be used for

prediction problems only. Cho and Kim [CB95] presented a method using fuzzy integral

to combine multiple neural networks for classification problems. Unlike other methods

such as the majority voting¹ or the Borda count², the proposed method not only combines

the results from the component networks but also considers the relative importance of

each network. For each new input datum, each trained component neural network

calculates the degree of certainty h that the object belongs to a class. Next, the degree of

importance g of each component network in the recognition of a class is calculated. The

fuzzy integral of each class is then computed based on the values of h and g. Finally, the

class with the largest integral value is chosen as the output class.

Cho and Kim conducted an experiment to recognize Arabic numerals, uppercase and

lowercase letters from handwritten characters. Three component neural networks with

different sizes of input vectors were trained. Each network reflects a different view of the

input from coarse to fine. The results show that the overall rates of correct classification of the fuzzy integral are higher than those of the other methods, including the majority

voting or Borda count methods as well as individual networks. In some cases, the fuzzy

1 The majority voting rule chooses the classification made by more than half of the networks.


integral made correct decisions although the partial decisions from individual component

networks were completely inconsistent.

Lee [Le96] focused more on the data and its distribution. Lee introduced a multiple

neural network approach in which each network handles a different set of the input data

with different weights. In this approach, a sub-network is created when the main network

has little confidence in its decision. The main network and sub-network share the same

input vector but each network has its own hidden and output layers. The main network

handles most of the cases while sub-networks handle more or less irregular parts of the

data. The algorithm works as follows. First, a confidence level between 0 and 1 is chosen. A system with a confidence level of 1.0 is equivalent to a general multilayer

neural network. After the main neural network is trained, the confidence level of each

training data point is evaluated. If it is unsatisfactory according to the chosen confidence

level, the data point is extracted and moved to the unsatisfactory data set. When all the

training data points have been examined, a sub-network is generated using the unsatisfactory data set. The procedure is repeated until the unsatisfactory data set reaches the pre-defined minimum size, or until the depth limit is reached.

system is selected among multiple outputs from the neural networks using a preference

vector. A preference vector has the form $P = (p_1, \ldots, p_n)$, where each $p_i$ is the preference value for the network $N_i$ and $p_i$ has the value of 0 or 1. The best output is chosen as the

one with the best confidence among all the outputs with its preference vector of value 1.
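The training loop described above might be sketched roughly as follows. This is only our reading of the procedure in [Le96]; `train_network` and `evaluate` (assumed to return an output together with its confidence) are hypothetical helpers, and the stopping conditions are simplified.

```python
# Simplified sketch of the cascade construction described above (our interpretation
# of [Le96]). `train_network(data)` and `evaluate(net, x)` are hypothetical helpers;
# `evaluate` is assumed to return (output, confidence) for an input x.

def build_cascade(training_data, confidence_level, min_size, depth_limit,
                  train_network, evaluate):
    networks, data = [], list(training_data)
    for _ in range(depth_limit):
        net = train_network(data)
        networks.append(net)
        # keep the points the current network is not confident about
        unsatisfactory = [(x, d) for (x, d) in data
                          if evaluate(net, x)[1] < confidence_level]
        if len(unsatisfactory) <= min_size:
            break
        data = unsatisfactory          # the next sub-network is trained on these points
    return networks

def select_output(networks, preference, x, evaluate):
    """Among networks whose preference value is 1, return the most confident output."""
    results = [evaluate(n, x) for n, p in zip(networks, preference) if p == 1]
    output, _ = max(results, key=lambda r: r[1])
    return output
```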

An application of letter recognition from phonemes was developed in [Le96] based

on the proposed approach. The results indicate that the performance of the proposed

2 The Borda count of a class is the sum of the number of classes ranked below that class by each network. The class with the largest Borda count is chosen.


approach is better than a general multi-layer neural network. The proposed approach

achieved 10% improvement on the rate of correct classification of the test data. In the

experiment, a two level generation process was used. The preference vector of (1,1) gave

the best result. The confidence level of 0.9 shows the best performance in the

generalization. The new approach also improved the vowel recognizing ability, and the

vowel recognition error reduced from 27% to 15%, while the correctness rate increased

from 87% to 93%.

The advantages of the proposed approach include: (1) it provides a way to handle the

problem of local minima, (2) each network specializes in only a subset of the data

distribution while passing the unsatisfactory instances to the next sub-network to handle,

and (3) it reduces the training time compared with a neural network which has the same

number of hidden units.

Kadaba et al. [KNJK89] developed a multiple neural network system to improve

accuracy by decreasing the input and output cardinalities. They used back-propagation

self-organizing networks to compress data records and then used the concentrated low-

cardinality data records to feed a classification neural network. In their case study, a

multiple neural network system was developed to select the appropriate combination of

selection rules and insertion rules for the Traveling Salesman Problem (TSP). Since there

are 6 selection rules and 6 insertion rules, there are 36 combinations. Hence, if a single

neural network is used, there should be 36 output variables. Each input point is also

represented with a vector of length 30. Kadaba et al. used two self-organizing neural

networks to compress both the original input and output vectors into vectors of length 4.

The multiple neural network system achieved an accuracy rate of 94% in contrast to the


single high-dimension back-propagation neural network, which only managed to give an accuracy rate of 10-20%.

The work by Duhoux et al. [DSMV01] is most relevant to the study area of this paper.

Duhoux et al. compared two methods for long-term prediction with neural network chains. The classical method makes one-step-ahead predictions recursively. In this method, only a single one-step-ahead neural network is trained, and it is used iteratively p times to predict p steps ahead. The input is shifted correspondingly at each iteration step.

The proposed method, on the other hand, uses p different trained neural networks with

different sizes of input vectors ranging from 1 to p. The input of a network is the same as

that of the previous network plus the predicted output from the previous network.
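For contrast, the classical recursive method referred to above can be sketched as follows; `one_step_model` stands in for a trained one-step-ahead network, and the toy model used in the example is an arbitrary placeholder.

```python
# Recursive multi-step forecasting with a single one-step-ahead model.
# `one_step_model(window)` is a hypothetical predictor returning X_{t+1}
# from the window [X_{t-k+1}, ..., X_t].

def recursive_forecast(one_step_model, window, p):
    """Predict p steps ahead by feeding each prediction back as an input."""
    window = list(window)
    predictions = []
    for _ in range(p):
        next_value = one_step_model(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]   # shift the input window by one step
    return predictions

if __name__ == "__main__":
    # toy one-step model: average of the window (stand-in for a trained network)
    toy_model = lambda w: sum(w) / len(w)
    print(recursive_forecast(toy_model, [1.0, 2.0, 3.0, 4.0], p=3))
```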

Duhoux et al. conducted an experiment to predict the hot metal temperature in an industrial furnace installation three hours in advance. Since each data sample was taken

every 15 minutes, a prediction has to be made 12 steps ahead in the three-hour interval.

Twelve neural networks were used which predict the temperature from 1 to 12 steps

ahead. Four out of 35 measured variables including the temperature itself were relevant

as input. Twelve previous samples of each of the four signals were used as input for the

first neural network. Each subsequent neural network adds an extra input, which is the output of the previous network. Hence, the input of the 12th neural network includes 24

samples of temperature signal. The training set contains 1300 data points and the testing

set contains 500 data points. The proposed model gave much better results than the

recursive model. Duhoux et al. also reported some disadvantages of the proposed method: (1) the model requires a large number of neural networks and input variables, and (2) a priori knowledge about the signal tendencies is not used.


In summary, several approaches have been proposed to improve the power of neural

networks by integrating them. Hashem et al. [HSY94] and Cho and Kim [CB95] use

parallel neural networks to improve accuracy of prediction or classification. Lee [Le96]

uses neural networks in a several-level hierarchy to deal with the data distribution. Kadaba

[KNJK89] uses supplemental neural networks to compress input and output dimensions.

These methods could be useful in making a direct forecast but are not relevant to time

series forecasting. The work by Duhoux et al. [DSMV01] is most related to long term

forecasting and hence most relevant to our work. However, it is impossible to apply this

method into our application because it would require too many neural networks since the

number of steps ahead to be predicted is high. Nevertheless, this work suggests that we

could develop different neural networks for different prediction terms ahead.

Our neural network approach will be introduced in the next section.

3.5 The Proposed Multiple Neural Network Approach

This section presents our multiple neural network approach, including its motivation and structure.

3.5.1 Motivation

A method that is suited for short-term forecasting may not be suited for long-term

forecasting. Tang et al. [TAF91] examined the ability of Box-Jenkins models and neural networks in short-term and long-term forecasting. The results of experiments on three sets of data show that for one-period-ahead and six-period-ahead forecasts the Box-Jenkins models outperform the neural networks, while for 12-period-ahead and 24-period-ahead forecasts the neural networks are superior. The relative performance of the


neural networks improves as the forecast horizon increases, which suggests that neural networks are a better choice for long-term forecasting.

One common problem with time series forecasting models is the low accuracy of long-term forecasts. The estimated value of a variable may be reasonably reliable for short terms into the future, but for longer terms the estimate is liable to become less accurate. There are several reasons for this inaccuracy. One reason is that the environment in which the model was developed has changed over time, so the assumptions that held during the development process are no longer true after some time. Another reason is that the model itself was not well developed; the inaccuracy arises from immature training or the lack of appropriate data for training. The trained model may cover the surrounding neighborhood but fail to model cyclic changes of trend or seasonal patterns in the data. The third cause of inaccuracy is the propagation of errors during recursive predictions. Usually a model is built to predict one unit of time ahead and is used recursively 10 times when a 10-unit-ahead prediction is required. Every model is likely to be associated with an error. For short-term prediction the error can be less than an acceptable threshold, but for long-term prediction this error accumulates and can increase beyond the threshold.

The multiple-neural-network (MNN) approach presented in this study attempts to

deal with the third problem by reducing the number of recursions necessary. In this

approach, several neural networks built to predict from short- to long-term are combined

into one model.

The assumptions in this approach are as follows.


• The patterns that repeatedly appear in historical data will appear again in the future.

This is an assumption in time series modeling.

• The short-term and long-term trends are different. While this assumption is not strictly necessary, the MNN approach is most useful when this situation holds.

3.5.2 Structure of a Multiple Neural Network Model

A MNN model is a group of ANNs working together to perform a task. Each ANN is developed to predict a different time period ahead. The prediction terms are powers of 2; that is, the first ANN predicts 1 unit ahead, the second predicts 2 units ahead, the third predicts 4 units ahead, and so forth. Hereafter, an ANN that predicts 2^n units ahead is referred to as an n-ordered ANN. There are two reasons to support the choice of the binary exponential. First, big gaps between two consecutive neural networks are not desirable: the smaller the gaps are, the fewer steps the model needs to take in order to make a forecast. Secondly, the binary exponential does not introduce bias in the roles of the networks; a model with a higher exponential base would tend to use more lower-ordered neural networks in order to propagate ahead.

A MNN prediction model can be viewed as a single partially connected ANN as

illustrated in Figure 3.5. However, unlike a complex single ANN requiring a long time to

train, a MNN breaks the training down into sub-ANNs and trains them separately. Figure 3.5 shows a sample MNN with two sub-ANNs.


Figure 3.5 A sample MNN model

To make a prediction, the neural network with the highest possible order is used first.

For example, to predict 7 units ahead, a 2-ordered neural network is used first. Assume

that the time at present is t. The values of x_t and x_{t-1} are already known. We wish to predict x_{t+7}. Let us denote the function that an n-ordered network models as f_n and assume that every network has two input variables; then the value of the output 7 units ahead is computed as follows.

x_{t+7} = f_2(x_{t+3}, x_{t+2})
        = f_2(f_1(x_{t+1}, x_t), f_1(x_t, x_{t-1}))
        = f_2(f_1(f_0(x_t, x_{t-1}), x_t), f_1(x_t, x_{t-1}))
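A minimal Java-style sketch of this composition (hypothetical names; the OrderedNet interface stands for a trained f_n with two inputs) always applies the highest-ordered network whose horizon does not exceed the remaining lead time, exactly as in the expansion of x_{t+7} above:

    // Sketch of the MNN prediction scheme, assuming each n-ordered network f_n
    // maps two consecutive values onto a forecast 2^n time units ahead.
    interface OrderedNet {
        double predict(double recent, double previous);   // stands for f_n
    }

    // value(j) estimates x_{t+j}; x_t and x_{t-1} are the known history.
    static double value(OrderedNet[] nets, double xPrev, double xNow, int j) {
        if (j <= -1) return xPrev;     // x_{t-1}
        if (j == 0)  return xNow;      // x_t
        int n = 31 - Integer.numberOfLeadingZeros(j);   // largest n with 2^n <= j
        int step = 1 << n;                              // horizon of the n-ordered network
        return nets[n].predict(value(nets, xPrev, xNow, j - step),
                               value(nets, xPrev, xNow, j - step - 1));
    }

    // Example: value(nets, xTminus1, xT, 7) reproduces
    // f_2(f_1(f_0(x_t, x_{t-1}), x_t), f_1(x_t, x_{t-1})).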

The training of the individual component neural networks can be dependent on or independent of the training of the other networks. In the development of a MNN, it may be necessary to implement multi-step validation. One-step-ahead validation does not take into account the model's sensitivity to errors that arise due to multi-step predictions [MSV99]. In our MNN tool, multi-step validation was implemented, but the validation window for each ANN is different. The validation error of the n-ordered network is calculated as the average of the root mean square errors (RMSE) of the (2^n)-step-ahead to the


(2^(n+1)-1)-step-ahead predictions. In order to calculate these steps, a higher-ordered ANN needs to use the prediction values of the lower-ordered ANNs.
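As a rough illustration, the multi-step validation error described above could be computed as in the following Java-style sketch (hypothetical names); rmseAtLead is assumed to run the MNN, lower-ordered networks included, over the validation records at a given lead time and return the RMSE:

    import java.util.function.IntToDoubleFunction;

    // Validation error of the n-ordered network: the average of the RMSEs of
    // the 2^n- to (2^(n+1)-1)-step-ahead predictions on the validation set.
    static double multiStepValidationError(int n, IntToDoubleFunction rmseAtLead) {
        int from = 1 << n;              // 2^n
        int to = (1 << (n + 1)) - 1;    // 2^(n+1) - 1
        double sum = 0.0;
        for (int lead = from; lead <= to; lead++) {
            sum += rmseAtLead.applyAsDouble(lead);
        }
        return sum / (to - from + 1);
    }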

3.6 Tools

This section discusses the two tools used in our projects to develop neural network applications.

3.6.1 NeurOn-Line Tool-kit

NeurOnline (NOL) from Gensym Corporation is adopted as the tool kit. It is a complete

graphical, object-oriented software tool kit for building neural network applications

which can be applied to dynamic environments. NOL includes facilities for managing

data sets, training the network, testing the fit between model and data, and deploying the

application in the operation environment. Using the NOL tool kit to develop ANN

applications is straightforward and typically involves three steps: cloning blocks,

connecting them and configuring their behavior. NOL is an application layer built on top

of the G2* Expert System shell. Thus it can be deployed and integrated with G2. Hybrid

neural, expert and fuzzy logic systems are simple to configure in NOL.

However, NOL also has some limitations. NOL implements only four types of neural

networks: Back-propagation, Radial Basis Function Networks, Rho Networks and Auto-

associative Networks. Users are allowed to configure only a small number of parameters, which limits their control over the behavior of the constructed network. Also, while familiarity with G2 is not necessary for using NOL, it is required for full utilization of the tool.

* Trademark of Gensym Corp. USA


3.6.2 Multiple Neural Network Tool

A tool was created to assist the development of multiple neural network applications. This section provides the implementation and usage details of the tool. The multiple-neural-network (MNN) tool was written in Java using the JBuilder 4 development tool. Refer to Appendix A for instructions on how to start the

tool. The MNN system consists of two main parts: the user interface and the neural

network system. The user interface includes a number of screens for receiving input and

displaying output. The neural network system includes several classes implementing

methods for training and testing neural networks, as well as for making forecasts.

Figure 3.6 Classes of the neural network system of the MNN tool

The main classes implemented in the neural network system are illustrated in Figure

3.6. The Synapse class calculates the weighted input of a neuron and performs weight changes. The Neuron class includes functions to connect with another neuron in the network and to activate the transfer function. Several neurons are connected together to form a back-propagation neural network (BPNN), which is a multi-layer perceptron.

The BPNN class implements methods for training, testing and forecasting. If the


multiple-step-ahead validation mode is triggered, an individual network communicates

with other lower-ordered neural networks during the training process. There is also a

class that manages all the neural networks in an array. This class communicates with the

Data-Set class and activates the necessary methods in the component networks to conduct

overall training, testing and forecasts. The Data-Set class reads time series points from

text files and creates training and testing records. The size of these records depends on

user-input parameters and the topology of each component neural network.
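The following Java skeleton is only a sketch of the class roles described above (the signatures, field names and the name of the manager class are hypothetical; the actual thesis code is not reproduced here):

    import java.util.ArrayList;
    import java.util.List;

    class Synapse {                          // weighted connection between two neurons
        Neuron source;
        double weight;
        double weightedInput() { return weight * source.output; }
        void changeWeight(double delta) { weight += delta; }
    }

    class Neuron {                           // sums weighted inputs, applies a transfer function
        double output;
        List<Synapse> incoming = new ArrayList<>();
        void connectFrom(Neuron from) {
            Synapse s = new Synapse();
            s.source = from;
            incoming.add(s);
        }
        void activate() {
            double net = 0.0;
            for (Synapse s : incoming) net += s.weightedInput();
            output = 1.0 / (1.0 + Math.exp(-net));   // assumed sigmoid transfer function
        }
    }

    class BPNN    { /* one-hidden-layer perceptron: methods to train, test and forecast */ }
    class DataSet { /* reads time series points from text files and builds records      */ }
    class NNArray { /* manages the component BPNNs for overall training, testing, forecasting */ }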

All neural networks in the current MNN system are multi-layer perceptrons with only one hidden layer. The user can determine the neural network structures by setting parameters such as the numbers of input, output and hidden units. The

connection weights and biases are initialized with small random values. Users can train a

MNN, test an existing MNN or use a MNN to make forecasts. Training is the heaviest

task in developing a neural network application. For each task, the tool first asks the user

to enter necessary parameters. Then it executes the task and displays the results.

The component neural networks in the system are trained with the back-propagation

algorithm. Each training cycle in this algorithm has two phases. In the first phase,

historical data records are fed one by one into the neural network. The signals go forward

from input to output to compute the output at each hidden and output unit. In the second

phase, the output values are compared with the expected values. If there is a difference,

the algorithm adjusts the connection weights of the neural network to minimize the

prediction error. The system repeats the training cycles until one of the following occurs: the error reaches an acceptable threshold, the number of cycles reaches a pre-set maximum value, or over-fitting is detected.


Over-fitting happens when a neural network learns the training patterns well but has poor generalization ability. Over-fitting is usually detected by dividing the historical data into two sets. The training set is used to train the neural network. The validation set is used to determine the performance of the neural network on patterns that are not used during learning. Training and validation occur simultaneously. When the error from the validation runs starts to increase, training is stopped because over-fitting has begun. In our implementation, the MNN system decides that over-fitting has occurred when the validation error increases monotonically for a certain number of cycles.
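A minimal sketch of this stopping logic (hypothetical names), combining the error threshold and maximum number of cycles from the previous section with the window-based over-fitting check; the minimum-cycle and validation-interval parameters described later in this section are omitted for brevity:

    import java.util.function.DoubleSupplier;

    // trainOneCycle runs one two-phase pass over the training records;
    // validationError supplies the current validation error (e.g. RMSE).
    static int runTraining(Runnable trainOneCycle, DoubleSupplier validationError,
                           double errorThreshold, int maxCycles, int windowSize) {
        double previous = Double.MAX_VALUE;
        int nonDecreasing = 0;                            // consecutive non-decreasing errors
        int cycle;
        for (cycle = 1; cycle <= maxCycles; cycle++) {
            trainOneCycle.run();                          // phase 1 (forward) + phase 2 (update)
            double err = validationError.getAsDouble();
            if (err <= errorThreshold) break;             // acceptable error reached
            if (err >= previous) {
                nonDecreasing++;                          // validation error did not improve
                if (nonDecreasing >= windowSize) break;   // over-fitting assumed to have begun
            } else {
                nonDecreasing = 0;
            }
            previous = err;
        }
        return cycle;                                     // number of cycles actually run
    }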

The user interface of the system consists of two parts for input and output. The input

screens are shown in Figure 3.7 to Figure 3.10 and the output screens are shown in Figure

3.11 to Figure 3.13. Details of the input parameters and output are described below.



Figure 3.7 Screen for inputting training parameters

Users have two choices for input, either from a parameter file or manually. The

parameter file is a text file with a specified structure. The training parameter file contains

values for all the input items in Figures 3.7 and 3.8.

Most of the input items in the input screens are self-explanatory. Some additional

explanations are provided as follows.

Training Parameters (Figure 3.7)

• Lead time: the number of steps ahead to be forecasted. This is an obscure parameter and is to be eliminated in later versions.


• Number of input variables or the size of input vector

• Number of neural networks in the MNN to be trained

• Minimum number of training cycles: In most cases this value is set to zero. However, in some cases the minimum number of training cycles needs to be set to a number greater than zero for the following reason. The validation error often increases at the beginning of the training process and only decreases later. The MNN tool may mistake this increasing error for over-fitting and halt the training. Enabling the user to set a minimum number of training cycles ensures that the MNN tool continues training past this period without stopping due to misperceived over-fitting.

• Maximum number of training cycles: When the neural network has not yet over-fitted

the data and the validation error is still higher than the threshold set by users, the

MNN system would continue training until the maximum number of cycles is

reached.

• Validation interval: The validation interval is the interval between two consecutive

validations in terms of training cycles. For example, if the validation interval is 4,

validation is performed every 4 training cycles. The default value for this parameter is

1. Setting this parameter to a higher value reduces the necessary training time because validation errors are calculated less frequently during the training process.

• Validation window size: The validation window size is the number of consecutive

non-decreasing validation errors needed before the system decides that over-fitting

has occurred.

• Using multi-step validation: Users can choose one-step validation or multi-step

validation by deselecting or selecting this radio button. In one-step validation, the


lead times for validation and training are the same. Each neural network is validated

by itself. In multi-step validation, the lead times for validation spread over longer

ranges (Refer to section 3). In the latter case, the training of higher-ordered neural

networks requires the existence of trained lower-ordered neural networks to calculate

the predicted output for a validation input vector. Hence, the training quality of the

higher-ordered networks depends on the training quality of the lower-ordered neural

networks.

• Training data file name

• Validation data file name

• Training output file name


Figure 3.8. Screen for inputting training parameters of component neural network

Figure 3.8 illustrates the input screen for setting the training parameters of each individual neural network. The screen for the 0-ordered neural network is opened when the


user clicks the "Edit each neural network parameter" button in the screen shown in Figure 3.7. After setting the parameters for one neural network, the user can click the "Next" button to advance to the next neural network.

Neural network parameters for training (Figure 3.8)

• Load neural network from file: The user can continue to train a previously trained neural

network by selecting this radio button and entering the name of the neural network

file.

• Neural network file name: To be entered only when the "load neural network" radio

button is checked.

• Train neural network: Since there are several neural networks in a MNN system, the user may want to continue training only one component network. In this case, the user can choose whether to train a component neural network by selecting or deselecting this radio button in the respective screen for that network.

• Number of hidden units

• Error threshold (in percentage): When the validation error is less than or equal to this value, training is stopped.

• Learning rate: The learning rate is a scaling factor that decides how fast the algorithm learns. A higher learning rate improves the learning speed, but if it is too high, the algorithm may overshoot the optimum weights.

• Momentum: The momentum adds a contribution from the previous step when a weight is updated, as sketched after this list.
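A standard form of the weight update with a learning rate and momentum term, sketched here for illustration (this is the textbook rule; the exact update used inside the MNN tool is not reproduced in this section):

    // new delta = -(learning rate) * error gradient + (momentum) * previous delta;
    // the returned pair is the updated weight and the delta to reuse next step.
    static double[] updateWeight(double weight, double errorGradient,
                                 double previousDelta, double learningRate, double momentum) {
        double delta = -learningRate * errorGradient + momentum * previousDelta;
        return new double[] { weight + delta, delta };
    }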



Figure 3.9. Screen for inputting testing parameters

Figure 3.9 illustrates the input screen for setting the testing parameters. Similar to the

training parameters, the user can set the testing parameters either with a parameter file or

manually.

Testing parameters (Figure 3.9)

• Lead time: the number of steps ahead to be forecasted.

• Number of input variables

• Number of neural networks

• Test data file name

• Test output file name



Figure 3.10. Screen for inputting parameters for prediction

Figure 3.10 illustrates the input screen for setting the prediction parameters. The user

can set the prediction parameters either with a parameter file or manually.

Prediction parameters (Figure 3.10)

• Lead time: the number of steps ahead to be forecasted.

• Number of input variables

• Number of neural networks

• Prediction data file name: This file contains input vector values.

• Prediction output file name: This file contains predicted outputs.



Figure 3.11 Screen for training output


Figure 3.12 Screen for testing output



Figure 3.13 Screens for prediction output

Training, validation and testing results and errors are reported to the users. The output

screens for showing these are shown in Figure 3.11 to Figure 3.13. Figure 3.11 shows the

screen for the training results, which include the training and validation errors. Users are recommended to save the trained neural networks to a file by clicking the "Save neural network" button. Otherwise, the neural networks are saved in default temporary files that can easily be overwritten. Neural network files are binary Java-object files. Predicted output values are written into external text files. Figure 3.12 shows the screen for the test results, which include the testing errors. The predicted output based on the testing data is recorded in separate text files. Figure 3.13 shows the screen to

inform the user that the prediction results have been written to an external file. Apart

from message dialogs, the MNN tool also outputs any errors or exceptions to the log

fields.


Chapter 4

Case Study

In this chapter, the method and framework discussed in Chapter 3 were applied to two problem domains. The first application predicts the monthly production of oil wells in the southwestern region of Saskatchewan. The second application predicts the hourly flow rate at a gas station in Saskatchewan.

4.1 Petroleum Production Prediction

This section is compiled from three sources [NC00], [NCM02], and [CN03].

Estimation of monthly production rate of in-fill wells is important for cost-effective

operations of the petroleum industry. In our project, the artificial neural network

technique was adopted to obtain functional relationships between production time series,

core analysis and drill stem test (DST) results. Such empirical correlations can be used to

assist petroleum engineers in designing production equipment and surface facilities,

planning future production, and making economic forecasts.

Reservoir engineers typically predict primary performance through curve fitting to

existing production data. Experience from past production, particularly from wells within

the same or similar pools (i.e., pools with similar oil and geological characteristics) can

lead to reasonable predictions of primary performance. The decision to make the

transition to secondary and tertiary production requires a more time-consuming and

complex use of reservoir simulators that utilize reservoir characteristics based on core

and log analysis, as well as historical production. This also requires significant


computational capacity. Hence an alternative automation approach is desirable. The ANN

approach was therefore adopted.

Time series modeling is also applied in conjunction with the ANN approach. There is a

huge amount of data readily available from within companies and from public sources

that is barely used to understand more about the production process, to optimize timing

for initiation of advanced recovery processes and, potentially, to identify candidate wells

for production or injection. Thus the set of time-series data can be used to build a model

to predict production and opportunities. The historical data are analyzed to identify data

patterns and, assuming the patterns will continue into the future, they are extrapolated in

order to produce forecasts.

The following are the key concepts from petroleum engineering relevant to this

project:

• Petroleum: Petroleum includes oil and gas. Oil can be categorized into four crude

types based on density: light, medium, heavy and bitumen. Since the oil type has a strong influence on production as well as on fluid parameters, petroleum experts suggested that

one model should be built for each crude type. In this study, only medium oil was

considered.

• Reservoir: A petroleum reservoir is a volume of porous sedimentary rock that has

been filled by petroleum and possibly other fluids. Oil, along with varying amounts of

water and gas, reside in the porous spaces of the rock.

• Well: Many wells can be drilled to recover fluids, including oil and gas, inside the

boundary of each reservoir. Wells can be categorized according to their usages. In

this study, the term 'well' means producer wells.


• Horizon: Each horizon is a formation layer with unique geographical characteristics.

The data used in this study was taken from one horizon to ensure all wells have the

same geographical characteristics.

• Production history: The production rate fluctuates with time as shown in Figure

4.1.1. Fluctuation can be due to the following reasons. Activities such as well

stimulation can create fractures in the near well-bore area, which increases

production. Production decline is often caused by pressure decline in a reservoir or

deterioration of the mechanical condition of the production wells. An effective way of

slowing the decline may be supplemental recovery operations such as water flooding.

Another method is to shut down the well for a period of time to regain pressure.

While production drops to nil during the shut-in period, it usually goes up afterwards.

Eventually, however, the decline will recur [Di85].

Figure 4.1.1 Well production history (monthly production in m3 plotted against months)


The following sections present two ANN approaches for the prediction of oil production rate that have been developed and tested on four pools in the southeastern region of Saskatchewan, Canada.

4.1.1 Data

4.1.1.1 Data Collection

Saskatchewan Energy and Mines supplied the data sets used in this study. The entire data

set contains 14538 production rates and 49 core analysis and pressure data points

recorded from 49 oil producer wells. These 49 wells are located in four independent

reservoirs in the southeastern region of Saskatchewan including Flat Lake, Hoffer,

Neptune and Skinner Lake that produce the same medium type of crude oil from the

same horizon of Ratcliffe.

The production data were collected over a period of about 30 years, from the 1960s to 1995. While there were sufficient data to develop an accurate time series model, the total

number of available data patterns for core and DST analysis is only 49. Since insufficient

core and DST data may mean that meaningful correlation between production rate and

analysis data cannot be found, it would be desirable to increase the number of reservoirs

to enhance validity of the model.

4.1.1.2 Data Cleaning and Transformation

A number of preprocessing steps were taken as follows. The permeability and porosity

data are raw data obtained from core logs. Permeability was averaged from horizontal

and vertical permeability. For each well, permeability and porosity values are measured

from core samples taken from different depths. If those values fall below a cut-off value

64

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

The following sections present two ANN approaches for prediction of oil production

rate that has been developed and tested on four pools in the southeastern region of

Saskatchewan, Canada.

4.1.1 Data

4.1.1.1 Data Collection

Saskatchewan Energy and Mines supplied the data sets used in this study. The entire data

set contains 14538 production rates and 49 core analysis and pressure data points

recorded from 49 oil producer wells. These 49 wells are located in four independent

reservoirs in the southeastern region of Saskatchewan including Flat Lake, Hoffer,

Neptune and Skinner Lake that produce the same medium type of crude oil from the

same horizon of Ratcliffe.

The production data were collected in a period of about 30 years from the 1960’s to

1995. While there were sufficient data to develop an accurate time series model, the total

number of available data patterns for core and DST analysis is only 49. Since insufficient

core and DST data may mean that meaningful correlation between production rate and

analysis data cannot be found, it would be desirable to increase the number of reservoirs

to enhance validity of the model.

4.1.1.2 Data Cleaning and Transformation

A number of preprocessing steps were taken as follows. The permeability and porosity

data are raw data obtained from core logs. Permeability was averaged from horizontal

and vertical permeability. For each well, permeability and porosity values are measured

from core samples taken from different depths. If those values fall below a cut-off value

64

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

Page 78: A STUDY OF NEURAL NETWORKS AND MULTIPLE NEURAL NETWORKS …

set by the petroleum engineering expert, they are ignored. Then the remaining values

taken from the same wells are averaged. Reservoir angle was ignored in the calculation

since it is relatively small in the area selected.

[Figure: the original time series P1 to P15 (top) and the derived length-five records with inputs Var 1 to Var 3 and Output 1, Output 2 (bottom); records containing the missing month P7 are shown as gray rows]

Figure 4.1.2 Elimination of incomplete records

The monthly productions were scaled to 600 hours a month, which is the mean of the

number of production hours. All the months with zero production were eliminated. This

process of elimination has the drawback of producing discontinuity in the time series data

since individual months with missing production data are ignored. In this way, the

number of long-term records is significantly reduced. A record contains both the input

and the expected output. For example, if production of the three months of January,

February and March are to be used to predict production two months ahead, the record

should contain production from January to May. If there was no production in the 7th

month (P7 = 0), P7 was eliminated and therefore the following records of five-month

duration were also eliminated: P3-P7, P4-P8, P5-P9, P6-P10 and P7-P11, which are illustrated as

gray rows in Figure 4.1.2. The longer the duration of the records, the more records are likely to be


eliminated. In the future, this drawback can be

avoided if an attempt is made to repair the missing values based on neighboring values

instead of eliminating the entire record.
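To make the record construction concrete, the following sketch (illustrative Python, not the code used in the thesis; the function name and the sample series are made up) builds length-five records and drops every record that overlaps a zero-production month, mirroring the elimination shown in Figure 4.1.2.

    def build_records(production, length=5):
        """Build sliding-window records of `length` consecutive months and drop
        every record that contains a zero (i.e. missing) production month."""
        records = []
        for start in range(len(production) - length + 1):
            window = production[start:start + length]
            if all(p > 0 for p in window):      # skip records touching a missing month
                records.append(window)
        return records

    # Example: with P7 = 0, the records P3-P7 through P7-P11 are eliminated.
    series = [110, 120, 115, 130, 125, 118, 0, 122, 119, 121, 117, 116, 118, 120, 119]
    print(len(build_records(series)))           # 6 records remain out of 11 possible windows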

Since the sigmoid activation function was used which returns a number in the range

of [0, 1], all monthly productions should be normalized to this range. The following

equation is used for normalization:

x = (x - min) / (max - min),

where min and max are the estimated minimum and maximum boundaries of monthly

productions and not the actual boundaries in the training data set.
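A minimal sketch of the normalization and its inverse (illustrative Python; the bounds 0 and 15000 are made-up stand-ins for the estimated min and max, which the thesis does not state numerically):

    MIN_P, MAX_P = 0.0, 15000.0     # estimated bounds, not taken from the training data

    def normalize(p):
        """Map a monthly production value into [0, 1] for the sigmoid network."""
        return (p - MIN_P) / (MAX_P - MIN_P)

    def denormalize(y):
        """Map a network output back to production units."""
        return y * (MAX_P - MIN_P) + MIN_P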

4.1.1.3 Data Set Manipulation

The data set was divided into three subsets in the proportion of 60:20:20 for training,

validation and testing. The training set is used to train the neural net. The validation set is

used to determine the performance of a neural network on patterns that are not used

during learning. Training and validation occur simultaneously, and the two sets of data are

used for exploration of parameter values of a network configuration. When the error from

validation runs starts to increase, training is stopped because over-fitting has begun. The test

set is used for a final check of the overall performance of a neural network once parameter

values have been determined in the model.
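A sketch of the 60:20:20 split (illustrative Python; the thesis does not state whether records were split chronologically or at random, so a simple ordered split is assumed here):

    def split_60_20_20(records):
        """Split records into training, validation and test subsets (60:20:20)."""
        n = len(records)
        i, j = int(0.6 * n), int(0.8 * n)
        return records[:i], records[i:j], records[j:]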

4.1.2 Using NeurOn-Line

The initial modeling was conducted using NeurOn-Line (NOL from Gensym

Corporation, USA), a tool-kit for neural network modeling. The NOL tool is introduced in

chapter 3, section 3.6.1. NOL supports fast development of a neural network application


to enable rapid assessment on whether a meaningful model can be built on the available

data and whether the set of chosen variables is suitable for the task.

4.1.2.1 Development of a Model of Production Time Series and

Geoscience Parameters

The first neural network model developed on NOL includes geoscience parameters as

input. The factors that have been identified to influence production include permeability,

porosity, viscosity, density, fluid compressibility, oil saturation, pressure and well

location. However, since not all the associated parameters for a well are available, only

the following three parameters that are easily obtainable are included in the model:

1. Permeability (k) describes the relative ease with which fluids can move through the

reservoir and is therefore a factor in determining well productivity.

2. Porosity (φ) is an expression of the volume of void space in the rocks and thus is

related to the volume of oil or gas that can be recovered from the reservoir.

3. First Shut-in Pressure (p) is used as a proxy variable for initial formation pressure.

The permeability and porosity values are obtained from laboratory core analysis, and

the first shut-in pressure data are derived from drill stem test (DST) analysis. In

addition to the above parameters, production time series data were used as a source of

input. The production rates of the three months prior to the target prediction month are

included as input variables. If Pt denotes the production of the target month t for which a

prediction is made, then the productions of the three previous months are Pt-3, Pt-2 and

Pt-1.

In the ANN model, the six conditional variables are permeability, porosity, pressure

and the oil production volumes of the three previous months and the consequent variable


is production of the current month. Since there are only six input variables and one output

variable, it was assumed that the number of hidden neurons needed would also be small. In practice, it is

best to have as few hidden nodes as possible because fewer weights need to be

determined.

A scaled data set was used to select the best network configuration. The same training

and validation sets were applied to train and validate five different back-propagation

networks with 2, 3, 4, 5 and 6 hidden units respectively. As can be seen in Table 4.1.1, the

network with 4 hidden nodes produced the least validation error, and was the best model

found. The training error was 0.034 and the validation error was 0.032.

Table 4.1.1. Network configuration - model 1

# Hidden Units    Training RMSE    Validation RMSE
2                 0.034            0.034
3                 0.034            0.033
4                 0.034            0.032
5                 0.034            0.034
6                 0.033            0.034

With the training and validation sets specified above, the back-propagation neural

network was trained three times with different initial weights. During training, the root

mean square error (RMSE) on the training set declined steadily but the amount of

decrease became insignificant after the first few cycles. The ANN was saved every 300

cycles. The validation error started to increase between cycles 300 and 600, which

indicated over-fitting had occurred. The saved 600-cycle ANN was the final model. We

interpreted the fact that the three training runs gave similar results as an indication that the global

minimum had been reached. The training error was 0.029 while the validation error was

0.04.
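The configuration and training procedure described above can be sketched as follows. This is not the NOL implementation; it is a minimal back-propagation network in Python/NumPy with a checkpoint-and-stop loop, where the learning rate, momentum, maximum cycles and checkpoint interval echo values reported in this chapter, and targets are expected as a column vector of normalized values.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class BPNet:
        """One hidden layer of sigmoid units, trained by batch back-propagation."""

        def __init__(self, n_in, n_hidden, lr=0.7, momentum=0.3, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.uniform(-0.5, 0.5, (n_in, n_hidden))
            self.b1 = np.zeros(n_hidden)
            self.W2 = rng.uniform(-0.5, 0.5, (n_hidden, 1))
            self.b2 = np.zeros(1)
            self.lr, self.mom = lr, momentum
            self.vel = [np.zeros_like(p) for p in (self.W1, self.b1, self.W2, self.b2)]

        def forward(self, X):
            h = sigmoid(X @ self.W1 + self.b1)
            return h, sigmoid(h @ self.W2 + self.b2)

        def rmse(self, X, t):
            return float(np.sqrt(np.mean((self.forward(X)[1] - t) ** 2)))

        def train_cycle(self, X, t):
            """One pass over the training set; t is a column vector of targets."""
            h, y = self.forward(X)
            d_out = (y - t) * y * (1.0 - y)                # squared-error delta at the output
            d_hid = (d_out @ self.W2.T) * h * (1.0 - h)    # back-propagated hidden delta
            grads = [X.T @ d_hid, d_hid.sum(0), h.T @ d_out, d_out.sum(0)]
            params = [self.W1, self.b1, self.W2, self.b2]
            for i, (p, g) in enumerate(zip(params, grads)):
                self.vel[i] = self.mom * self.vel[i] - self.lr * g / len(X)
                p += self.vel[i]

    def train_with_checkpoints(net, Xtr, ttr, Xva, tva, max_cycles=3000, save_every=300):
        """Save the network every save_every cycles and stop once the validation RMSE
        starts to rise; return the checkpoint with the lowest validation RMSE."""
        best_err, best_weights, prev_err = np.inf, None, np.inf
        for cycle in range(1, max_cycles + 1):
            net.train_cycle(Xtr, ttr)
            if cycle % save_every == 0:
                err = net.rmse(Xva, tva)
                if err < best_err:
                    best_err = err
                    best_weights = [p.copy() for p in (net.W1, net.b1, net.W2, net.b2)]
                if err > prev_err:     # validation error increased: over-fitting, stop
                    break
                prev_err = err
        return best_err, best_weights

A call such as train_with_checkpoints(BPNet(n_in=6, n_hidden=4), Xtr, ttr, Xva, tva) would correspond to the six-input, four-hidden-unit configuration selected in Table 4.1.1; Xtr, ttr, Xva and tva are placeholders for the normalized training and validation arrays.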


4.1.2.2 Development of a Model of Production Time Series Only

A sensitivity test was conducted to measure the impact of each input variable on the

output in the first ANN model developed on NOL described in section 4.1.2.1. The results

showed that all the three geoscience variables had less than 5% influence on the

production. This confirmed our concern earlier about the limited amount of geoscience

data. Hence a second model was developed which consists of the three conditional

variables of the oil production volumes of the three previous months, and the output or

consequent variable is production of the current month.

A similar preprocessing, configuration, training and validation process was conducted

as for the first model. As can be seen in Table 4.1.2, the network with 3 hidden nodes produced

the least validation error, and was found to be the best model. The training error was 0.035

while the validation error was 0.03.

Table 4.1.2. Network configuration - model 2

# Hidden Units    Training RMSE    Validation RMSE
2                 0.033            0.036
3                 0.034            0.03
4                 0.035            0.031
5                 0.033            0.035
6                 0.034            0.034

4.1.3 Using Multiple Neural Network

A question that confronts an engineer is how long it takes for a well to dry out. To answer

this question, forecasts of not only one but several months ahead need to be made. The

models presented in section 4.1.2 could not make long term prediction with reasonable

accuracy. Therefore we proposed the multiple neural network (MNN) approach for time

series modeling to make long term predictions.


The MNN was trained with the following parameters:

• Number of maximum training cycles: 3000

• Validation error threshold: 5%

• Number of hidden units for each ANN: 5

• Number of input variables for each ANN: 3

• Lead time: 100 months

• Number of ANNs: 7

• Initial learning rate: 0.7

• Momentum: 0.3

The number of ANNs in the MNN was determined based on the length of the

prediction term. Since 2^6 < 100 < 2^7, seven ANNs were used to predict 100 months ahead.

However, if not enough data are available for training the higher-order ANNs, this number

can be set smaller. The weights of the first ANN were initialized with small random

values. The initial weights of subsequent ANNs were copied from the previously trained

ANNs.
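As a rough illustration of how the number of component networks follows from the lead time (a sketch only, assuming, as one reading of the MNN design, that the i-th component ANN predicts 2^(i-1) steps ahead and that a lead is covered by a greedy binary decomposition of component forecasts):

    import math

    def number_of_anns(lead):
        """Smallest m with 2**m >= lead, e.g. 2**6 < 100 <= 2**7 gives m = 7."""
        return max(1, math.ceil(math.log2(lead)))

    def component_orders(lead):
        """Greedy binary decomposition of the lead into component-ANN orders."""
        orders, remaining = [], lead
        for i in range(number_of_anns(lead), 0, -1):
            step = 2 ** (i - 1)        # the order-i ANN is assumed to predict 2**(i-1) steps
            if step <= remaining:
                orders.append(i)
                remaining -= step
        return orders

    print(number_of_anns(100), component_orders(100))   # 7 [7, 6, 3]  (64 + 32 + 4 = 100)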

Validation was done every four training cycles; four was selected to limit the time spent on

validation. The training process halts under one of the following conditions (see the sketch after the list):

• The number of cycles is equal to the maximum number of training cycles allowed

• The training and validation errors are smaller or equal to the validation error

threshold set by the user

• The values of the last n validation errors increase monotonically, which indicates that

over-fitting is likely to have begun. In our experiments, n = 10 was used, and the ANN that

produces the least validation error is saved.
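A compact version of the stopping check above (illustrative Python; the default threshold, cycle limit and n follow the values listed in this section):

    def should_stop(cycle, train_err, val_errors, max_cycles=3000, threshold=0.05, n=10):
        """Return True when any of the three halting conditions above is met."""
        if cycle >= max_cycles:                                                    # condition 1
            return True
        if val_errors and train_err <= threshold and val_errors[-1] <= threshold:  # condition 2
            return True
        last = val_errors[-n:]                                                     # condition 3
        return len(last) == n and all(a < b for a, b in zip(last, last[1:]))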


The first component ANN, which predicts one month ahead, was used as the single

ANN in our comparison.

4.1.4 Results

4.1.4.1 NOL Models

The test data set was run with the saved ANN models. The testing error rates found were

0.04 for the first model developed on NOL that incorporates both time series and

geoscience data and 0.033 for the second model on NOL with only time series data.

Figures 4.1.3 and 4.1.4 show the predicted values of the two models (indicated by the line)

versus the target values (indicated by the dots).

[Figure: scatter of predicted versus target values; both axes scaled from 0.0 to 1.0]

Figure 4.1.3. Predicted vs. target — model 1


Figure 4.1.4 Predicted vs. target — model 2

Sensitivity tests were conducted over the two trained models to identify input

variables that have strong influence on an output variable, or inputs that have little or no

influence on the output variable. Sensitivity testing is useful for understanding the

correlations in the data, which may lead to a greater understanding of the physical

causality of the process. Sensitivities (or influences of parameters) are obtained by taking

the average of the local derivative information. In our experiments, the sensitivities were

calculated with the NOL tool. They are calculated via the following process [Ge95]:

1. Select a random data point from the data series.

2. Generate the outputs for the data point using the model.

3. Derange the jth input by a small amount, and recalculate the output.

4. For each output i, estimate the local derivative at the selected data point:

S_ij = (output_i' - output_i) / (input_j' - input_j),

where the prime in the indices indicates the deranged input and output.

5. Repeat from step 3 for each input.


6. Repeat for another random data point.

The sensitivity value of output i with respect to input j is then calculated by taking the

average of the absolute values of S_ij over the sample of randomly selected data points.

Finally, each sensitivity is normalized by dividing the sensitivity by the standard

deviation of the respective input variable.
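The sensitivity procedure can be summarized in a few lines of illustrative Python (not the NOL implementation; model_fn stands for any trained single-output model, and the perturbation size eps is an assumed value):

    import numpy as np

    def sensitivities(model_fn, X, eps=1e-3, n_samples=200, seed=0):
        """Average absolute local derivatives of the output w.r.t. each input,
        divided by the input's standard deviation, as in the procedure above."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        acc = np.zeros(d)
        for _ in range(n_samples):
            x = X[rng.integers(n)].astype(float)           # step 1: random data point
            y0 = model_fn(x)                               # step 2: model output
            for j in range(d):
                xj = x.copy()
                xj[j] += eps                               # step 3: derange the j-th input
                acc[j] += abs((model_fn(xj) - y0) / eps)   # step 4: local derivative
        sens = (acc / n_samples) / X.std(axis=0)           # normalize by input std. dev.
        return 100.0 * sens / sens.sum()                   # express as percentages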

In the first model, the sensitivities of the six inputs with respect to production are as follows.

Table 4.1.3 Sensitivities - model 1

k       φ       p       Pt-3    Pt-2    Pt-1
3.5%    1.3%    0.6%    9.6%    21.9%   63%

As can be seen in Table 4.1.3, the influence of the core (k and φ) and DST (p)

analysis on the production is very small (less than 5%). The production of the most recent

month has the strongest effect at 63%, and the productions of the previous two months

are also significant at 9.6% and 21.9%. From this result, it was decided the second model

should include only production time series data.

The sensitivities of the three input variables over the output variable in the second

model are similar to the previous model as can be seen in Table 4.1.4.

Table 4.1.4 Sensitivities - model 2

Pt-3     Pt-2     Pt-1
11.3%    27.6%    61.2%

4.1.4.2 Multiple-ANN and Single-ANN Models

In order to facilitate a comparison between a MNN and a single ANN, the same test

set of data was applied to the MNN and the single ANN to predict monthly production up

to 100 months ahead. The average RMSE was 0.053.


Figure 4.1.5 illustrates the errors for different prediction periods from 1 to 100 months ahead.

Figure 4.1.5 Test errors for MNN and Single ANN for different prediction periods

Figure 4.1.5 indicates that the MNN generally performs slightly better than the single

first-order ANN. As the prediction term increases, the difference becomes more significant.

This indicates that a MNN performs better than a single ANN in long-term forecasting.

[Figure: desired monthly production and the productions predicted by the single ANN and by the MNN, plotted against record number]

Figure 4.1.6 Desired vs. predicted outputs

Figure 4.1.6 illustrates the desired and predicted outputs from MNN and ANN for a

prediction term of 100 months. With the exception of approximately the first 100 values

in the graph, the predictions from the ANN and MNN model are quite close to the desired

results.


4.1.5 Discussions

The results of the models developed on NOL indicate that the production time series

model performs comparably to the mixed causal and time series model. The fact that

geoscience data has insignificant influence on production rates can be explained as

follows. Firstly, core analysis is taken at the well bore and may not represent the real

permeability and porosity values over the entire well. Secondly, pressure usually changes

over a well's lifecycle but only information about the initial pressure is available for the

study. Thirdly, the time series data may already incorporate all the information of the

core and DST because there are correlations between previous productions and a well's

parameters. Lastly, there may not be sufficient core and DST analysis data points to study

the influence of these parameters on the production.

The NOL tool-kit is a convenient and generic tool to develop and deploy an ANN

application. However, modifying network structures or deployment in an environment

that involves non-Gensym products is more complicated. The MNN tool is a program

developed for the specific purpose of combining ANNs into a MNN. Currently, only

back-propagation ANNs are included. However, it is easy to add more network types into

the system as Java is an object-oriented language.

It is observed that a MNN approach has some disadvantages. First, a MNN is more

complex than a single ANN, although it is only linearly more complex.

Secondly, a higher-order ANN requires more data to train and validate.


4.1.6 Conclusion and Future Works

This section presents two ANN approaches for prediction of petroleum production. The

results show that ANNs can be used for petroleum production prediction. The models are efficient

and adaptable.

Another remark from the experiment is that core analysis and DST data have little

contribution to the model output of petroleum production and that a univariate time series is

sufficient to develop a meaningful model.

The MNN model shows superior performance over the single ANN model in long

term prediction. Aside from ANN, it is possible to use other numerical prediction

techniques in a multiple-order model to perform long term prediction.

Future research includes comparing the MNN technique with other statistical curve

fitting techniques.

4.2 Hourly Gas Flow Prediction

The second case study is to predict future hourly gas flow through the Melfort

compressor station. This station is a part of the gas pipeline distribution system at St.

Louis East.


[Schematic: the St. Louis station feeds the Melfort station, which supplies the Nipawin, St. Brieux and Hudson Bay consumption areas]

Figure 4.2.1. Schematic of St. Louis East system

Figure 4.2.1 illustrates the gas stations and their service areas of the St. Louis East gas

pipeline distribution system. The system consists of two stations located at Melfort and

St. Louis. The Melfort station receives gas from the St. Louis station and transmits it to

the surrounding consumption areas of Nipawin and Hudson Bay. It is important to ensure

that customer demand is fulfilled. This means that there should be a sufficient number of

compressors running at the Melfort station and sufficient gas input to the Melfort station.

Dispatchers at the Melfort station need to make decisions to turn compressors on or off,

or to adjust the compression level in order to reach the necessary pressure while not

wasting resources. The decision has a significant impact on the effectiveness of the

natural gas pipeline operation. When the customer demand increases, a dispatcher adds

compression to the pipeline system by turning on one or more compressors. On the other

hand, the dispatcher turns off one or more compressors to reduce compression in the

pipeline system when the customer demand decreases. Incorrect decisions made by the

dispatcher will cause substantial economic loss.

The purpose of this study is to aid the dispatcher in optimizing natural gas pipeline

operations in order to satisfy customer demand with minimal operating costs. A


dispatcher needs to know ahead of time when the largest volume requirement will occur

and to be ready for it. Otherwise, the system pressures at Nipawin and Hudson Bay will

be below the required minimum. Since consumption data are only available monthly from

billing records, it cannot be used for the task of predicting hourly demand. Therefore, we

use the flow rate at the Melfort station as a substitute variable for the demand.

Figure 4.2.2. Hourly flow during a day

The flow rate at Melfort station more or less reflects the consumption patterns of

customers at Nipawin and Hudson Bay. As illustrated in Figure 4.2.2, the natural gas

flow rate fluctuates during a day. For example, the demand is usually low at night. In the

morning, the demand is higher because residential customers start cooking and industrial

customers start their machines. In the afternoon, the demand decreases since the facilities

are already heated up. After work hours, industrial customers' demand becomes lower

while the residential customers' demand gets higher. The demand for natural gas also

fluctuates depending on the season. In the winter, the demand for natural gas is usually

higher than in the summer. Special occasions such as public holidays are also a factor that

affects demand patterns.


4.2.1 Data Collection and Preprocessing

The data was obtained from SaskEnergy/Transgas. Hourly flow rates in the period from

December 2001 to mid August 2002 were collected with an interruption from March 14th

to May 27th. Fall (from September to November) and spring (from March to May) data

was not available. This is a disadvantage since we could not divide the data set into four

seasonal data sets for separate treatments. There were several hourly flow rates with

values of zero in the data set. Those are either missing or abnormal data. All such values

were eliminated from the data set. The total number of data points is approximately 3500.

Since the sigmoid activation function was used which returns a number in the range

of [0, 1], all hourly flow rates should be normalized to this range. The following equation

was used for normalization

x=(x-min)/(max-min),

where min and max are the estimated minimum and maximum boundaries of the hourly

flow rates and not the actual boundaries in the training data set. By examining the plot

of the historical data set, the min and max values are estimated as 0 and 600 (10^3 m^3).

The data set was divided into three subsets for training, validation and testing in the

proportion of 5:1:1. The training set contains approximately 2500 data points. The

validation and test sets contain only around 500 data points each.

4.2.2 Training and Validation

The chosen input size was six, as six hours are a quarter of a day. A shorter period may not

contain enough information to predict 24 hours ahead, while a longer period may make the

neural networks too complex and therefore require more data to train.


The number of ANNs in the MNN was determined based on the length of the

expected prediction term. Since 2^4 < 24 < 2^5, a maximum of five ANNs were used to

predict 24 hours ahead. However, this number can be set smaller based on validation

errors produced by different combinations of ANNs.

The weights of the first ANN were initialized with small values in the range from 0 to

0.5. The following ANNs were initialized with the previous ANN's weights in order to

reduce training time.

Validation was done every four training cycles. Single step validation showed better

results than multiple step validation.

After five neural networks had been trained, five combinations of them that include 1,

2, 3, 4 and 5 neural networks were validated on the validation set to predict 24 hours

ahead. The combination with the lowest validation error rate was chosen as the final MNN.

The one with only one neural network was the single ANN.
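The selection of the final MNN can be sketched as follows (illustrative Python; the argument is a placeholder mapping from the number of component networks to its per-lead validation errors):

    def pick_final_mnn(val_rmse_by_model):
        """val_rmse_by_model maps the number of component networks (1..5) to a list of
        validation RMSEs for leads 1..24; return the size with the lowest average RMSE."""
        return min(val_rmse_by_model,
                   key=lambda m: sum(val_rmse_by_model[m]) / len(val_rmse_by_model[m]))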

Figure 4.2.3. Validated RMSE of 5 models for 24 hour period

As can be seen, the MNNs with 4 and 5 neural networks consistently performed

better than the single ANN model. Meanwhile, the MNNs with 2 and 3 neural networks

gave good results at first and became less and less effective when the prediction period

got longer. The model with 5 neural networks performs only marginally better than the


one with 4 neural networks. The average MAPEs for the five models with 1 to 5 neural

networks were 11.7%, 40.3%, 11.02%, 8.84% and 8.76% respectively. The one with 5

neural networks was chosen as the final MNN for testing.

4.2.3 Testing

In order to facilitate a comparison between a MNN and a single ANN, the same test

set of data was applied to the MNN and the single ANN to predict hourly flow rate for

different leads from 1 to 24 hours. Figure 4.2.4 summarizes the results.


Figure 4.2.4. Test errors for MNN and single ANN for 24 hour period

Figure 4.2.4 indicates that the MNN consistently performs better than the single

first-order ANN. As the prediction term increases, the difference becomes more significant. This

indicates that a MNN performs better than a single ANN in long-term forecasting.

Average MAPEs were calculated as follows, where MAPE(i) is the mean absolute

percentage error for lead i.

Average_MAPE = (1/24) Σ_{i=1}^{24} MAPE(i)
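For reference, a direct implementation of these error measures (illustrative Python):

    import numpy as np

    def mape(actual, predicted):
        """Mean absolute percentage error for a single lead."""
        actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
        return 100.0 * np.mean(np.abs((actual - predicted) / actual))

    def average_mape(actual_by_lead, predicted_by_lead):
        """Average of MAPE(i) over the 24 leads (i = 1..24)."""
        return float(np.mean([mape(a, p) for a, p in zip(actual_by_lead, predicted_by_lead)]))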


For 24 hours, the average errors were 12.38% with the single ANN and 8.736% with

the MNN. Figure 4.2.5 illustrates the desired output and predicted outputs from the MNN

and ANN model for a prediction lead of 24 hours.

[Figure: actual hourly flow rate and the flow rates predicted by the MNN and the single ANN, 24 hours ahead, plotted against record number]

Figure 4.2.5. Predicted vs. actual for 24 hours ahead

As can be seen, neither model's performance was reasonably good. The prediction by

the MNN looked like a delayed version of the actual outputs, and the prediction by the

single ANN was rather random.

For the first 6 hours, the average errors were 5.75% with the single ANN and 4.971%

with the MNN. An error of 5% or less was considered acceptable. Figure 4.2.6 illustrates

the desired output and predicted outputs from MNN and ANN for a prediction lead of 6

hours.

[Figure: actual hourly flow rate and the flow rates predicted by the MNN and the single ANN, 6 hours ahead, plotted against record number]

Figure 4.2.6. Predicted vs. actual for 6 hours ahead


As can be seen, both the predicted lines are quite close to the actual line but the MNN

predicted better than the single ANN.

4.2.4 Discussions

The poor performance of both ANN and MNN models on the 24-hour prediction can be

explained as follows. Firstly, there may not be enough data points for training. The data

used in this study was collected in less than a year. Secondly, special occasions such as

holidays and weekends have not been considered. The gas usage patterns on such

occasions may be different from those of a normal day. Thirdly, seasonal effects may play

a role. A network trained on a data set for summer may not generalize well in winter.

4.2.5 Conclusion and Future Works

The case study indicates that a MNN model shows superior performance over a single

ANN model in long-term prediction. However, if the period is too long, neither model

can predict well. Incorporating more neural networks also does not guarantee a lower

error. In the study above, the model with two neural networks showed less satisfactory

performance overall than the single neural network. Using more than five neural

networks to predict 24 periods ahead is unavailing because neural networks with

orders greater than or equal to five predict 32 or more periods ahead.

Future research can include collecting more data and subdividing the problem into

sub-problems. Classification can be based on seasons, weekends or weekdays.


Chapter 5

Observations and Discussions

This chapter presents some observations and discussions derived from the development of the case studies in Chapter 4.

5.1 Discussions on Suitability of Time Series Modeling

in Forecasting

The two case studies presented in Chapter 4 are two satisfactory applications of time series modeling in forecasting. However, not all time series can be used to build a meaningful forecasting model.

Sometimes crucial information is missing from a time series. For example, temperature is a factor that influences gas consumption, but it is not encoded in the gas consumption time series. In the literature, there are two solutions to this kind of problem. One is to use multivariate time series modeling [Ch94] (cited in [Ru95]); in this approach, the temperature time series is included in the model as an independent variable. The other approach is to classify the time series into several classes and apply univariate time series modeling to each class [LC98], [LCMT99]. For example, based on the date on which the gas consumption was recorded, a data point can be classified into a hot-season class or a cold-season class, and separate models are then built from each of these classes. However, these two methods may require data that is not always obtainable.


5.2 Discussions on Using the NOL Tool-kit

In my opinion, NOL is a useful tool-kit for industrial users who have little knowledge of neural network structures and algorithms but who wish to develop a neural network application. NOL allows quick and simple development of a neural network application, and parameters such as the learning rate are adjusted automatically during the training process. Only a little training is required to use the basic features of the tool-kit. However, in order to utilize the tool-kit fully, knowledge of G2 is mandatory, and it can take some effort to work out the meaning and usage of the various NOL blocks.

There are a number of other disadvantages. Since the source code is not available, it is difficult to modify or improve a neural network's parameters and algorithms, or to deploy the neural network in an environment that involves products other than Gensym's. While most other simulators allow users to run a neural network after a few configuration steps, NOL requires users to build a neural network by connecting its components together.

Nevertheless, NOL is still useful for developers who want to build prototypes of neural network models or investigate the feasibility of building a neural network application. NOL also includes a facility for sensitivity testing, which is very useful in selecting input variables.

5.3 Discussions on Using the MNN Tool

5.3.1 Reusing weights of lower-ordered ANNs

It has not been determined whether reusing the weights of a lower-ordered ANN to initialize a higher-ordered ANN is desirable. On the positive side, reusing the weights reduces the


time necessary to train higher-ordered ANNs. It can also be beneficial when the amount of available data is low or the data has a high level of discontinuity due to missing values. In such a case, the number of data records for training a higher-ordered ANN may not be sufficient to train the ANN from scratch, and since the weights of lower-ordered ANNs are expected to be more or less close to the optimal weights of higher-ordered ANNs, it may be better to reuse them. However, this is possible only when the higher-ordered and lower-ordered ANNs have the same number of hidden units. A disadvantage of reusing weights is that the training of the higher-ordered ANN can easily become stuck in a local minimum close to the point where the training of the lower-ordered ANN stopped. Therefore, when there is sufficient data, it is recommended to initialize all ANNs with random weights. Training the ANNs separately can increase the diversity among them, which could be a factor that increases the generalization ability of an MNN.
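As a rough illustration of this initialization strategy (my own sketch, not the MNN tool's code), the fragment below copies the weight matrices of a trained lower-ordered network into an untrained higher-ordered one. It assumes a hypothetical SimpleAnn class whose layers are stored as plain two-dimensional arrays and whose input and hidden sizes match those of the lower-ordered network.

    /** Hypothetical network class used only for this illustration. */
    final class SimpleAnn {
        double[][] inputToHidden;    // [hidden units][input size + 1 for bias]
        double[][] hiddenToOutput;   // [output units][hidden units + 1 for bias]

        SimpleAnn(int inputSize, int hiddenUnits, int outputUnits) {
            inputToHidden = new double[hiddenUnits][inputSize + 1];
            hiddenToOutput = new double[outputUnits][hiddenUnits + 1];
        }

        /** Reuse the weights of a lower-ordered ANN instead of random initialization;
         *  this is only valid when both networks have the same layer sizes. */
        void initializeFrom(SimpleAnn lowerOrdered) {
            for (int i = 0; i < inputToHidden.length; i++)
                inputToHidden[i] = lowerOrdered.inputToHidden[i].clone();
            for (int i = 0; i < hiddenToOutput.length; i++)
                hiddenToOutput[i] = lowerOrdered.hiddenToOutput[i].clone();
        }
    }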

5.3.2 Using multi-step validation

Similarly, multi-step validation does not always improve the training results. In the case where the data is discontinuous due to missing values, multi-step validation reduces the number of validation records. For example, for a 2nd-ordered network, records of length 4 are needed to perform single-step validation, whereas records of length 7 are needed for multi-step validation (refer to Section 3.5.2). Moreover, if one lower-ordered neural network is not trained well, the subsequent higher-ordered networks are also affected, because their training depends on the previous network. Therefore, it is recommended to train the ANNs well, one by one, from the lowest order to the highest order.


5.3.3 Setting training parameters

Size of Input Vector

There are two ways to choose the size of the input vector. One is to rely on domain experts' opinions. The other is to fix the number of hidden units and vary the size of the input vector, choosing the size that gives the lowest error.

Number of hidden units

After the size of the input vector has been chosen, the number of hidden units of each ANN should be determined by trial and error. In the beginning, the number of hidden units should be initialized with a small value; if the performance is poor, this number should be increased, and if there is evidence of over-fitting, the number of hidden units should be decreased.

Maximum number of training cycles

Users set the maximum number of training cycles before training. This number should be large enough to reduce the amount of user interaction required; on the other hand, it should be small enough that users can update parameters such as the learning rate and momentum when necessary (refer to Section 5.3.4).

Choosing the size of validation windows

The validation window is a window of validation errors that is used to detect over-fitting. If the chosen size is too small, the training process could stop at the first local minimum it reaches. On the other hand, if the validation window is too large, the training can easily miss a minimum that is located near a second, higher minimum. Since the condition for claiming over-fitting is that the errors in the validation window increase monotonically, the training process in Figure 5.1 skips the first minimum. The method using validation


window in this tool is only a coarse-grained solution for detecting over-fitting, and it needs to be improved in the future.

[Figure: validation error curve with two local minima, labelled "First minimum" and "Second minimum", and the validation window marked]

Figure 5.1 Side effect of large validation window
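To make the stopping rule concrete, the following sketch (my own illustration, not the tool's source code) tests whether the most recent validation errors within a window of the chosen size increase monotonically, which is the over-fitting condition described above.

    import java.util.List;

    /** Sketch of the over-fitting test: report over-fitting when the last
     *  windowSize validation errors have increased monotonically. */
    final class ValidationWindow {
        static boolean isOverfitting(List<Double> validationErrors, int windowSize) {
            int n = validationErrors.size();
            if (n < windowSize) {
                return false;                      // not enough history yet
            }
            for (int i = n - windowSize + 1; i < n; i++) {
                if (validationErrors.get(i) <= validationErrors.get(i - 1)) {
                    return false;                  // any non-increase breaks monotonicity
                }
            }
            return true;                           // the error rose at every step in the window
        }
    }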

5.3.4 Updating training parameters

Using the MNN tool to train neural networks is only semi-automatic, as users are still responsible for choosing the best values for the training parameters. However, once the parameters have been set, the tool automatically detects over-fitting or stops training when the error drops below a threshold set by the user, so users do not need to watch error graphs during a training process. It is nevertheless recommended that users examine the error trend every time the maximum number of training cycles is reached and update the parameters if necessary to obtain better training performance.

Parameters that can be updated are the learning rate and the momentum. A large learning rate allows fast convergence but can also cause the model to oscillate around a minimum. The momentum factor tends to keep the weight changes moving in the same direction and hence allows the algorithm to skip over small local minima; it can also improve the speed of learning. However, as with the learning rate, a large momentum factor


may cause the network to skip too far. In the studies in this thesis, the momentum was fixed and the learning rate was adjusted by trial and error.
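For reference, assuming the standard back-propagation update rule with momentum, the two parameters enter the weight change at training cycle t as

    \Delta w_{ij}(t) = -\eta \, \frac{\partial E}{\partial w_{ij}} + \alpha \, \Delta w_{ij}(t-1)

where \eta is the learning rate scaling the current error gradient and \alpha is the momentum factor that re-applies part of the previous change, which is what keeps successive updates moving in the same direction.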


Chapter 6

Conclusion and Future Works

6.1 Concluding Summary

One of the research objectives was to develop neural network models for two prediction applications: the first predicts the monthly oil production of a well, and the second predicts hourly gas consumption.

The first step of the research project was to determine whether the time series alone is sufficient to develop a good model. For the first application, two neural network models with different input vectors were developed: one includes only time series lags as input, while the other has additional variables. The results show that the more sophisticated model did not perform better than the univariate time series model. Sensitivity testing on the mixed model also confirmed that the time series lags had a higher influence on the output than the additional variables. The reasonable errors also suggest that the neural network is a promising technique for a problem that petroleum engineering has not dealt with successfully using conventional techniques.

As the next step in the research project, the models for both applications were extended to predict a longer term ahead. We proposed a multiple neural network structure in an attempt to reduce the error that accumulates in the recursive propagation process; the multiple-neural-network model propagates ahead in steps of different lengths to make its forecasts. The experimental results were in favor of the proposed structure. A disadvantage of multiple neural


network techniques is that they require a data set with longer and more continuous time series in order to build a model. The contribution of this work is the modification of the recursive neural network approach and the successful application of this method in two industrial problem domains. The proposed multiple neural network method generated results with higher accuracy in long-term forecasting than the single neural network.

The idea of using multiple neural networks is not new. Several methods for combining evidence produced by multiple sources into one final result have been developed [HSY94][CK95]. Multiple neural network methods have also been applied in time series modeling to improve the accuracy of long-term forecasts [DSMV01]. The novelty of the approach proposed in this thesis is the use of different, exponentially growing prediction terms for the component neural networks. The variety of prediction terms across the component networks allows the combined model to cover both short-term and long-term trends. Moreover, this method does not require many component networks: since the prediction terms of the component networks increase exponentially and the length of the longest prediction term determines the exponent of the highest-ordered component network, the number of component networks remains small even when the prediction term is long.
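As a concrete illustration of this point (my own sketch, based on the scheme in which the component network of order k predicts 2^k periods ahead), the number of component networks grows only logarithmically with the prediction lead:

    /** Illustration only: number of component networks needed for a given lead
     *  when the network of order k predicts 2^k periods ahead. */
    final class ComponentCount {
        static int networksForLead(int lead) {
            int count = 1;                  // order 0 covers a one-period step
            while ((1 << count) <= lead) {  // add order k as long as 2^k does not exceed the lead
                count++;
            }
            return count;                   // e.g. lead = 24 gives orders 0..4, i.e. 5 networks
        }
    }

For the 24-hour gas consumption case this gives the five component networks discussed in Chapter 4.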

Since neural networks have the ability to cope with redundant and missing information, it is expected that the application models are robust and reliable. The models are also reusable in a changing situation, since the training of neural networks has a generational property: after a training process, a generation of a neural network is created, and when the situation changes, the neural network can resume training with new data to generate

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

network techniques is that it requires a data set involving longer and more continuous

time series in order to build a model.

The contribution of this work is modification of the recursive neural network

approach and the successful application of this method in the two industrial problem

domains. The proposed multiple neural network method generated results with a higher

accuracy in long term forecasting than the single neural network.

The idea of using multiple neural networks is not new. Several methods for

combining evidences produced by multiple sources into one final result have been

developed [HSY94][CB95]. Multiple neural network methods have also been applied in

time series modeling to improve the accuracy of long-term forecast [DSMV01], The

novelty of the approach proposed in this thesis is the use of different exponential

prediction terms for the component neural networks. The variety of prediction terms for

the various component networks allows the combined model to cover both short term and

long term trends. Moreover, this method does not require many component networks.

Since the prediction terms of the component networks increase in an exponential manner

and the length of the longest prediction term determines the exponent of the highest-

ordered component network, the number of component networks is not high even when

the prediction term is long.

Since neural networks have the ability to encode redundant and missing information,

it is expected that the application models are robust and reliable. The models are also

reusable in a changing situation since the training of neural networks has generation

property. After a training process, a generation of a neural network is created. When the

situation changes, the neural network can resume the training with new data to generate

91

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

Page 105: A STUDY OF NEURAL NETWORKS AND MULTIPLE NEURAL NETWORKS …

another generation that can adapt to the new situation. The new generation therefore

inherits the characteristics of the previous generation.

We observed some limitations that cause forecasting inaccuracy no matter how well a

model is trained.

• There is genuine random noise in the data due to errors in the recording process. In both time series, we noticed several non-zero but abnormal data points; it is not known whether they were incorrectly recorded or whether they are genuine irregularities in the data.

• The sample data set is not spread evenly over the feature space, and it is only a partial representation of the population. In the time series context, this means that the length of the time series under investigation is insufficient to represent all patterns in the problem space. For example, the gas consumption time series does not contain enough seasons to allow the seasonal factor to be included.

• Factors that significantly influence the variable to be forecast are completely or partially unavailable within the examined time span. One example of such a parameter is temperature: when the temperature drops, customers tend to use more gas than when it is warm, but future temperature is itself hard to predict. Permeability in the petroleum production application is another example; since it is costly to measure, only one or two data points are available during a well's life.

6.2 Future Works

Despite the satisfactory performance of the MNN application in petroleum production

prediction, some experts commented that the model should be built on individual wells


because each well has unique characteristics, and that the results obtained in the experiments described were good by chance. The problem with developing an individual model for each well is the serious lack of data: a well can last up to 30 years, but if a model is built after 5 years there are only 60 monthly production figures, and withdrawing a portion of this set for testing leaves only about 40 data points for training. If this approach is followed, an attempt could be made using a combination of techniques such as in [Ru95]. In that study, Rumantir used a statistical technique to model the trend and seasonal factors and a neural network technique to model the irregularities; since the statistical technique requires little data, the problem of insufficient data is overcome.

For the problem of predicting gas consumption, it is difficult to make any improvement unless more data is collected. Further research can include dividing the problem into sub-problems based on the season of the year. We estimate that two to five years of data are necessary in order to build accurate seasonal models.

A weakness of the reported work is that only the simple hold-out validation method is employed in the current systems: the data set was divided into two portions, one for training and one for testing. However, this method of evaluation can have a high variance. The result may depend heavily on which data points end up in the training set and which end up in the test set, and the evaluation may therefore differ significantly depending on how the division is made.

Future improvements to the systems can include cross-validation. K-fold cross-validation is one way to improve on the hold-out method: the data set is divided into k subsets, and the hold-out procedure is repeated k times. Each time, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set. Then


the average error over all k test sets is computed. The advantage of the k-fold cross-validation method is that it matters less how the data gets divided: every data point is in a test set exactly once and in a training set k-1 times, and as k is increased the variance of the evaluation decreases. The disadvantage of this method is that the training algorithm has to be rerun k times, which means it takes k times as much computation to make an evaluation.
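A minimal sketch of the splitting step described above (my own illustration, not part of the current systems) is given below; it only partitions the record indices into k folds, and the existing training and testing routines would then be run once per fold.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    /** Sketch of k-fold index splitting: each record index appears in the test fold
     *  exactly once and in the training set for the remaining k-1 folds. */
    final class KFoldSplit {
        static List<int[]> testFolds(int numRecords, int k) {
            List<Integer> indices = new ArrayList<>();
            for (int i = 0; i < numRecords; i++) indices.add(i);
            Collections.shuffle(indices);            // random assignment of records to folds
            List<int[]> folds = new ArrayList<>();
            for (int fold = 0; fold < k; fold++) {
                List<Integer> test = new ArrayList<>();
                for (int i = fold; i < numRecords; i += k) test.add(indices.get(i));
                folds.add(test.stream().mapToInt(Integer::intValue).toArray());
            }
            return folds;                            // train on all indices not in folds.get(f)
        }
    }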

The current MNN tool still needs much improvement. Considering the serious loss in the number of data records when missing data points are eliminated, a method of filling in missing points should be applied. A simple approach would be to replace a missing value with the average of its neighboring points.
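A sketch of this simple imputation (an illustration only; it assumes missing values are marked as Double.NaN, which is not necessarily how the tool's data files encode them) could look like this:

    /** Sketch: replace each missing value with the average of the nearest
     *  non-missing neighbours on either side. */
    final class FillMissing {
        static double[] fill(double[] series) {
            double[] filled = series.clone();
            for (int i = 0; i < filled.length; i++) {
                if (Double.isNaN(filled[i])) {
                    Double left = null, right = null;
                    for (int j = i - 1; j >= 0 && left == null; j--)
                        if (!Double.isNaN(series[j])) left = series[j];
                    for (int j = i + 1; j < series.length && right == null; j++)
                        if (!Double.isNaN(series[j])) right = series[j];
                    if (left != null && right != null) filled[i] = (left + right) / 2.0;
                    else if (left != null) filled[i] = left;    // missing point at the end of the series
                    else if (right != null) filled[i] = right;  // missing point at the start of the series
                }
            }
            return filled;
        }
    }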

A future topic to investigate would be a more automatic training strategy that reduces users' efforts. A number of methods for adapting the learning rate, such as bold driver [Sa99] and annealing [BA98], have been proposed in the literature. In a bold driver neural network, after each training cycle the training error is compared to its previous value: if the error has decreased, the learning rate is increased slightly; if the error has increased significantly, the last weight changes are discarded and the learning rate is decreased sharply. The bold driver method thus keeps growing the learning rate slowly until it takes a step that has clearly gone too far, up onto the opposite slope of the error function. The annealing method, in contrast, gradually lowers the global learning rate.
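The bold driver rule can be summarised in a few lines; the sketch below is my own illustration, and the adjustment factors are invented rather than taken from [Sa99].

    /** Sketch of the bold driver rule: grow the learning rate slightly while the
     *  error keeps falling, and cut it sharply when the error rises significantly. */
    final class BoldDriver {
        static double adapt(double learningRate, double previousError, double currentError) {
            if (currentError < previousError) {
                return learningRate * 1.05;       // error decreased: increase the rate slightly
            } else if (currentError > previousError * 1.01) {
                // error increased significantly: the last weight changes would also be
                // discarded at this point, and the rate is decreased sharply
                return learningRate * 0.5;
            }
            return learningRate;                  // error essentially unchanged: keep the rate
        }
    }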

Another, more ambitious improvement to the MNN system would be to implement different kinds of training algorithms for the component neural networks.


Bibliography

[BA98] Bos S. and Amari S., Annealed online learning in multilayer neural networks,

1998, downloaded from citeseer.nj.nec.com/147107.html in October 2002

[BLM93] Boznar M., Lesjak M. and Mlakar P., A Neural Network Based Method for

Short-Term Predictions of Ambient SO2 Concentrations in Highly Polluted Industrial

Areas of Complex Terrain, Atmospheric Environment, vol. 27B, no. 2, 1993, pp. 221-230

[CLC97] Chih-Chou Chiu, Ling-Jing Kao and Cook D. F., Combining a Neural Network with a Rule-Based Expert System Approach for Short-Term Power Load Forecasting in Taiwan, Expert Systems with Applications, vol. 13, no. 4, 1997, pp. 299-305

[CN03] Chan C.W. and Nguyen H.H., Artificial Intelligence Techniques in Forecasting

Applications, In Leondes C.T, ed, Intelligent systems: Technology and Applications,

CRC Press, Boca Raton, London, New York, Washington D.C., 2003, vol. 5, ch. 5, pp.

115-152

[Ch94] Chakraborty K. et al., Forecasting the behavior of multivariate time series using

neural networks, in V. Rao Vemuri and Robert D. Rogers, eds, Artificial Neural

Networks : Forecasting Time Series, IEEE Computer Society Press, Los Alamitos,

California, 1994, pp. 51-60

[Ch75] Chatfield C., The Analysis of Time Series: Theory and Practice, London:

Chapman & Hall, New York: Halsted Press, 1975

[CK95] Cho S.B. and Kim J.H., Combining Multiple Neural Networks by Fuzzy Integral

for Robust Classification, IEEE Transactions on Systems, Man, and Cybernetics, vol. 25,

no. 2, 1995, pp. 380-384


[Di85] Dikkers A. J., Geology in Petroleum Production, Amsterdam, New York:

Elsevier, 1985

[Do96] Dorffner G., Neural Networks for Time Series Processing, Neural Network

World, vol. 6, no.4, 1996, pp. 447-468

[DSMV01] Duhoux M, Suykens J.A.K, De Moor B., and Vandewalle J., Improved Long-

Term Temperature Prediction by Chaining of Neural Networks, International Journal of

Neural Systems, vol. 11, no. 1, 2001, pp. 1-10

[FS87] Farmer J.D. and Sidorowich J.J, Predicting Chaotic Time-Series, Physical Review

Letters, vol. 59, no. 8, 1987, pp. 845-848

[GD99] Gardner M.W, Dorling S.R., Neural Network Modeling and Prediction of Hourly

NOx and NO2 Concentrations in Urban Air in London, Atmospheric Environment, vol.

33, no. 5, 1999, pp. 709-719

[Ge95] Gensym Corporation, NeurOn-Line Reference Manual 1.1, 1995

[GRT99] Guhathakurta P., Rajeevan M. and Thapliyal V., Long Range Forecasting Indian Summer Monsoon Rainfall by a Hybrid Principal Component Neural Network Model, Meteorology and Atmospheric Physics, vol. 71, 1999, pp. 255-266

[HG93] Harrison H.C. and Gong Qizhong, An Intelligent Business Forecasting System, ACM Conference on Computer Science, Indianapolis, IN, USA, 1993, pp. 229-236

[HSY94] Hashem S., Schemeiser B. and Yih Y., Optimal Linear Combinations of Neural

Networks: An Overview, Proceedings of the 1994 IEEE International Conference on

Neural Networks (ICNN'94), vol. 3, Orlando, FL, 1994, pp. 1507-1512


[KH00] Kao J.J. and Huang S.S., Forecasts Using Neural Network versus Box-Jenkins

Methodology for Ambient Air Quality Monitoring Data, Journal of the Air & Waste

Management Association, vol. 50, 2000, pp. 219-226

[KNJK89] Kadaba N, Nygard K.E., Juell P.L., and Kangas L., Modular Back-

Propagation Neural Networks for Large Domain Pattern Classification, Proceedings of

the International Joint Conference on Neural Networks IJCNN'89, Washington DC, USA,

1989, vol. 2, pp. 607-610

[KR94] Kolarik T. and Rudorfer G., Time Series Forecasting Using Neural Networks,

ACM SIGAPL APL Quote Quad, vol. 25 no.1, 1994, pp. 86-94

[LC98] Lertpalangsunti N. and Chan C.W, An Architectural Framework for Construction

of Hybrid Intelligent Forecasting Systems: Application for Electricity Demand

Prediction, Engineering Applications of Artificial Intelligence, vol. 11, 1998, pp. 549-565

[LCMT99] Lertpalangsunti N., Chan C.W., Mason R., and Tontiwachwuthikul P., A

toolset for construction of hybrid intelligent forecasting systems: application for water

demand prediction, Artificial Intelligence in Engineering, vol. 13, no. 1, 1999, pp. 21-42

[Le96] Lee B.J., Applying Parallel Learning Models of Artificial Neural Networks to

Letters Recognition from Phonemes, Proceedings of the Conference on Integrating

Multiple Learned Models for Improving and Scaling Machine Learning Algorithms,

Portland, Oregon, 1996, pp.66-71

[MJG90] Montgomery D. C., Johnson L. A., Gardiner J. S., Forecasting & Time Series

Analysis, 2nd ed., New York: McGraw-Hill Inc., 1990


[MSV99] McNames J., Suykens J.A.K. and Vandewalle J., Winning Entry of the K. U.

Leuven Time Series Prediction Competition, International Journal of Bifurcation and

Chaos, vol. 9, no. 8, 1999, pp. 1485-1500

[NC00] Nguyen H.H. and Chan C.W., Petroleum Production Prediction: A Neural

Network Approach, International Joint Conference on Engineering Design and

Automation 2001 (EDA 2001), 5-8 August 2001, Las Vegas, USA, pp. 85-90

[NCM02] Nguyen H.H., Chan C.W., and Malcolm W., Prediction of Oil Well Production

using Multi-Neural Network, Proceedings of the 2002 IEEE Canadian Conference on

Electrical & Computer Engineering, Winnipeg, Canada, May 2002, pp. 798-802

[Od83] O'Donovan T. M., Short Term Forecasting: An Introduction to the Box-Jenkins

Approach, Chichester, New York: John Wiley & Sons, 1983

[Po89] Posner M.I., Foundations of Cognitive Science, The MIT Press, Cambridge, 1989

[Ru95] Rumantir G.W., A Hybrid Statistical and Feedforward Network Model for

Forecasting with a Limited Amount of Data: Average Monthly Water Demand Time-

series, Minor Master Thesis in Computer Science, RMIT (Royal Melbourne Institute of

Technology) University, 1995

[Sa99] Sarkar D., Methods to speed up error back-propagation learning algorithm, ACM

Computing Surveys (CSUR), vol. 27, no. 4, 1995, pp. 519-542

[SMKLS00] Swircz M, Mariak Z., Krejza J. and Szydlik P., Intracranial Pressure

Processing with Artificial Neural Networks: Prediction of ICP Trends, Acta

Neurochirurgica, vol. 142, 2000, pp. 401-406


[SSS00] Sahai A.K, Soman M.K and Satyan V, All India Summer Monsoon Rainfall

Prediction Using an Artificial Neural Network, Climate Dynamics, vol. 16, 2000, pp.

291-302

[TAF91] Tang Z., Almeida C. and Fishwick P. A., Time Series Forecasting Using Neural

Networks vs. Box-Jenkins Methodology, Simulation, vol. 57, no. 5, 1991, pp. 303-310

[THT97] Tangang F.T., Hsieh W.W and Tang B, Forecasting the Equatorial Pacific Sea

Surface Temperatures by Neural Network Models, Climate Dynamics, vol. 13, 1997, pp.

135-147

[Th95] Thearling K., Massively Parallel Architectures and Algorithms for Time Series

Analysis, In Nadel L. and Stein D., eds., 1993 Lectures in Complex Systems, Redwood

City, California: Addison-Wesley, 1995, pp.381-396

[Wa01] Walczak S., An Empirical Analysis of Data Requirements for Financial Forecasting with Neural Networks, Journal of Management Information Systems, vol. 17, no. 4, 2001, pp. 203-222

[Wi92] Winston P. H., Artificial Intelligence, 3rd ed., Reading, Mass.:Addison-Wesley

Publishing Company, 1992.

[WL93] Wu S. and Lu R.P, Combining Artificial Neural Networks and Statistics for

Stock-Market, Proceedings of the 1993 ACM conference on Computer science,

Indianapolis, Indiana, United States, 1993, pp. 257-264

[Ya99] Yasdi R, Prediction of Road Traffic Using a Neural Network Approach, Neural

Computing & Applications, vol. 8, 1999, pp. 135-142


[YP96] Yi J. and Prybutok V.R, A Neural Network Model Forecasting for Prediction of

Daily Maximum Ozone Concentration in an Industrialized Urban Area, Environmental

Pollution, vol. 92, no. 3, 1996, pp. 349-357


APPENDIX A - RUNNING THE MNN TOOL

The classes of the MNN tool and the necessary libraries are bundled into a JAR (Java Archive) file called mnn.jar. A Java Runtime Environment is required for the MNN tool to function.

To start the tool, from the directory that contains the archive, type "java -jar mnn.jar" at a DOS command line. The main screen will appear.

[Screenshot: the main screen, titled "Multiple Neural Networks", with the buttons Train Neural Networks, Test Existing Neural Networks, Predict Using Existing Neural Networks, and Exit]

Figure A.1 Main screen of the MNN tool

When the user chooses to train or test neural networks, or to use existing neural networks for prediction, a window opens for the user to enter the necessary parameters. Refer to Section 3.6.2 for descriptions of these parameters.


APPENDIX B - FORMATS OF PARAMETER AND DATA

FILES FOR THE MNN TOOL

The symbol ( )+ indicates that the term inside the brackets is repeated one or more times.

Training, testing and validation data file format

Estimated lower bound of data values
Estimated upper bound of data values
#
(
    (
        A data point
        // Note: the data points in this loop must be continuous in time
    )+
    #
)+
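For illustration, a small training data file in this format might look as follows. The bounds are invented, the data points are taken from the oil production sample in Appendix C (split here into two hypothetical continuous segments), and the placement of the '#' markers simply follows the grammar above.

    0
    12000
    #
    6941
    4345
    3275
    2868
    2800
    #
    7750
    7992
    10432
    #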

Prediction input data file format

Estimated lower bound of data values
Estimated upper bound of data values
(
    An input vector, with values separated by spaces
)+


Training parameter file format

Lead
Size of input vector
Number of neural networks
Minimum number of training cycles
Maximum number of training cycles
Validation interval
Size of validation windows
Using multi-validation or not? (Y/N)
Path of training data file
Path of validation data file
Path of training output file
(
    #
    Load this neural network from file or not? (Y/N)
    Path of the neural network file
    Train this neural network or not? (Y/N)
    Number of hidden neurons
    MAPE threshold
    Learning rate
    Momentum
)+
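As a sketch only, a training parameter file describing two networks might look like the following; every value and file name is invented, and the meaning of each parameter is given in section 3.6.2:

    1
    12
    2
    500
    5000
    10
    6
    Y
    train.txt
    validation.txt
    train_out.txt
    #
    N
    nn1.net
    Y
    8
    0.05
    0.5
    0.9
    #
    N
    nn2.net
    Y
    10
    0.05
    0.3
    0.8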


Testing parameter file format

Lead
Size of input vector
Number of neural networks
Path of test data file
Path of test output file
(
    Path of neural network file
)+
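For instance, a hypothetical testing parameter file that evaluates two previously saved networks (all names and values invented) could be:

    1
    12
    2
    test.txt
    test_out.txt
    nn1.net
    nn2.net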

Prediction parameter file format

Lead
Size of input vector
Number of neural networks
Path of prediction input file
Path of prediction output file
(
    Path of neural network file
)+
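A prediction parameter file follows the same pattern; for example (names again invented):

    1
    12
    2
    predict_in.txt
    predict_out.txt
    nn1.net
    nn2.net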


APPENDIX C - SAMPLE DATA

Sample Petroleum Data

Table C.1 Sample of oil production data

Location  Production Date  Hours on Production  Oil Volume
01/11-25-001-15W2/0 196506 336 6941
01/11-25-001-15W2/0 196507 336 4345
01/11-25-001-15W2/0 196508 240 3275
01/11-25-001-15W2/0 196509 216 2868
01/11-25-001-15W2/0 196510 216 2800
01/11-25-001-15W2/0 196511 624 7750
01/11-25-001-15W2/0 196512 528 7992
01/11-25-001-15W2/0 196601 744 10432
01/11-25-001-15W2/0 196602 672 6012
01/11-25-001-15W2/0 196603 624 9533
01/11-25-001-15W2/0 196604 360 5298
01/11-25-001-15W2/0 196605 624 8227
01/11-25-001-15W2/0 196606 384 7973
01/11-25-001-15W2/0 196607 504 6797
01/11-25-001-15W2/0 196608 672 9349
01/11-25-001-15W2/0 196609 600 8211
01/11-25-001-15W2/0 196610 600 7323
01/11-25-001-15W2/0 196611 600 8958
01/11-25-001-15W2/0 196612 744 9284
01/11-25-001-15W2/0 196701 720 9325
01/11-25-001-15W2/0 196702 552 6903
01/11-25-001-15W2/0 196703 216 3145
01/11-25-001-15W2/0 196704 600 8610
01/11-25-001-15W2/0 196705 648 8252
01/11-25-001-15W2/0 196706 648 8464
01/11-25-001-15W2/0 196707 552 6903
01/11-25-001-15W2/0 196708 696 8451
01/11-25-001-15W2/0 196709 456 5516
01/11-25-001-15W2/0 196710 624 7844
01/11-25-001-15W2/0 196711 720 10002
01/11-25-001-15W2/0 196712 648 7434
01/11-25-001-15W2/0 196801 336 7731
01/11-25-001-15W2/0 196802 696 9978
01/11-25-001-15W2/0 196803 384 5058
01/11-25-001-15W2/0 196804 576 8154
01/11-25-001-15W2/0 196805 504 6987
01/11-25-001-15W2/0 196806 624 9075


01/11-25-001-15W2/0 196807 552 8181
01/11-25-001-15W2/0 196808 480 7623
01/11-25-001-15W2/0 196809 672 9490
01/11-25-001-15W2/0 196810 432 6027
01/11-25-001-15W2/0 196811 576 8675
01/11-25-001-15W2/0 196812 528 7248
01/11-25-001-15W2/0 196901 720 10032
01/11-25-001-15W2/0 196902 600 8305
01/11-25-001-15W2/0 196903 504 6528
01/11-25-001-15W2/0 196904 480 6884
01/11-25-001-15W2/0 196905 528 6882
01/11-25-001-15W2/0 196906 600 8314
01/11-25-001-15W2/0 196907 696 8572
01/11-25-001-15W2/0 196908 576 10763
01/11-25-001-15W2/0 196909 504 5908
01/11-25-001-15W2/0 196910 744 9052
01/11-25-001-15W2/0 196911 456 5473
01/11-25-001-15W2/0 196912 648 7809
01/11-25-001-15W2/0 197001 720 8287
01/11-25-001-15W2/0 197002 672 8228
01/11-25-001-15W2/0 197003 552 6730
01/11-25-001-15W2/0 197004 384 4766
01/11-25-001-15W2/0 197005 624 9280
01/11-25-001-15W2/0 197006 720 10334
01/11-25-001-15W2/0 197007 720 11399
01/11-25-001-15W2/0 197008 672 7539
01/11-25-001-15W2/0 197009 720 11600
01/11-25-001-15W2/0 197010 744 10995
01/11-25-001-15W2/0 197011 720 10892
01/11-25-001-15W2/0 197012 744 17687
01/11-25-001-15W2/0 197101 744 16752
01/11-25-001-15W2/0 197102 672 14219
01/11-25-001-15W2/0 197103 744 16249
01/11-25-001-15W2/0 197104 720 14084
01/11-25-001-15W2/0 197105 744 14787
01/11-25-001-15W2/0 197106 720 13142
01/11-25-001-15W2/0 197107 744 13836
01/11-25-001-15W2/0 197108 720 15066
01/11-25-001-15W2/0 197109 720 16523
01/11-25-001-15W2/0 197110 744 19001
01/11-25-001-15W2/0 197111 576 15235
01/11-25-001-15W2/0 197112 624 14605
01/11-25-001-15W2/0 197201 744 16867
01/11-25-001-15W2/0 197202 696 15600
01/11-25-001-15W2/0 197203 600 14078
01/11-25-001-15W2/0 197204 720 13650


01/11-25-001-15W2/0 197205 672 11020
01/11-25-001-15W2/0 197206 360 15023
01/11-25-001-15W2/0 197207 744 24237
01/11-25-001-15W2/0 197208 600 15362
01/11-25-001-15W2/0 197209 648 14057
01/11-25-001-15W2/0 197210 648 12662
01/11-25-001-15W2/0 197211 672 12527
01/11-25-001-15W2/0 197212 744 12231
01/11-25-001-15W2/0 197301 744 13242
01/11-25-001-15W2/0 197302 600 9277
01/11-25-001-15W2/0 197303 744 11443
01/11-25-001-15W2/0 197304 672 9651
01/11-25-001-15W2/0 197305 504 6266
01/11-25-001-15W2/0 197306 528 6758
01/11-25-001-15W2/0 197307 480 6242
01/11-25-001-15W2/0 197308 216 2926
01/11-25-001-15W2/0 197309 384 4578

Table C.2 Sample of raw core analysis data

Location  Sample #  Formulations  Horizontal Permeability (mD)  Vertical Permeability (mD)  Porosity
21/07-02-001-16W2/0 2 RATCLIFF 0.06 0.08 0.034
21/07-02-001-16W2/0 3 RATCLIFF 0.05 0 0.025
21/07-02-001-16W2/0 4 RATCLIFF 0.21 0.08 0.074
21/07-02-001-16W2/0 5 RATCLIFF 0.41 0.8 0.082
21/07-02-001-16W2/0 6 RATCLIFF 0.09 0 0.042
21/07-02-001-16W2/0 7 RATCLIFF 1.2 5.2 0.101
21/07-02-001-16W2/0 8 RATCLIFF 0.91 0.48 0.156
21/07-02-001-16W2/0 9 RATCLIFF 0.13 0.06 0.065
21/07-02-001-16W2/0 10 RATCLIFF 0.18 0.15 0.066
21/07-02-001-16W2/0 11 RATCLIFF 0 0 0
21/07-02-001-16W2/0 12 RATCLIFF 0.43 0.52 0.13
21/07-02-001-16W2/0 13 RATCLIFF 3 0.47 0.191
21/07-02-001-16W2/0 14 RATCLIFF 3.6 1.7 0.182
21/07-02-001-16W2/0 15 RATCLIFF 1.1 0.28 0.198
21/07-02-001-16W2/0 16 RATCLIFF 3.6 1.6 0.152
21/07-02-001-16W2/0 17 RATCLIFF 2.6 0.65 0.126
21/07-02-001-16W2/0 18 RATCLIFF 0.8 0.5 0.117
21/07-02-001-16W2/0 19 RATCLIFF 0.32 0.08 0.137
21/07-02-001-16W2/0 20 RATCLIFF 0.87 0.37 0.115
21/07-02-001-16W2/0 21 RATCLIFF 0.36 0.23 0.074


Table C.3 Sample of pressure data

Location  First Shut-in Pressure
21/05-02-001-16W2/0 19512
21/05-03-001-16W2/0 19209
21/07-03-001-16W2/0 19691
01/10-03-001-16W2/0 19443
01/12-03-001-16W2/0 19167
01/02-04-001-16W2/0 18478
01/02-04-001-16W2/0 18768
01/02-04-001-16W2/0 19864
01/04-04-001-16W2/0 19671
01/04-04-001-16W2/0 20009
01/10-04-001-16W2/0 18554
01/12-04-001-16W2/0 18947
01/06-05-001-16W2/0 10004
01/06-05-001-16W2/0 9818
01/08-05-001-16W2/0 19753
01/08-05-001-16W2/0 19836
01/16-05-001-16W2/0 20043
01/10-08-001-16W2/0 20202
01/10-08-001-16W2/0 20250
01/02-09-001-16W2/0 19781

Gas Consumption Data

Table C.4 Sample of flow rate data at Melfort station

Date  Time  Melfort Flow
12/3/01 7:02:48 313.498
12/3/01 8:02:48 302.869
12/3/01 9:02:48 299.551
12/3/01 10:02:48 298.992
12/3/01 11:02:48 298.433
12/3/01 12:02:48 296.643
12/3/01 13:02:48 294.375
12/3/01 14:02:48 292.086
12/3/01 15:02:48 289.27
12/3/01 16:02:48 287.285
12/3/01 17:02:48 286.631
12/3/01 18:02:48 285.179
12/3/01 19:02:48 283.278
12/3/01 20:02:48 281.662
12/3/01 21:02:48 280.284
12/3/01 22:02:48 279.989
12/3/01 23:02:48 280.502
12/4/01 0:02:48 279.909


12/4/01 1:02:48 278.911
12/4/01 2:02:48 277.122
12/4/01 3:02:48 275.748
12/4/01 4:02:48 274.825
12/4/01 5:02:48 274.501
12/4/01 6:02:48 272.838
12/4/01 7:02:48 267.886
12/4/01 8:02:48 259.44
12/4/01 9:02:48 258.686
12/4/01 10:02:48 257.922
12/4/01 11:02:48 259.766
12/4/01 12:02:48 260.665
12/4/01 13:02:48 263.435
12/4/01 14:02:48 264.775
12/4/01 15:02:48 265.253
12/4/01 16:02:48 264.418
12/4/01 17:02:48 260.613
12/4/01 18:02:48 258.936
12/4/01 19:02:48 259.931
12/4/01 20:02:48 259.742
12/4/01 21:02:48 262.267
12/4/01 22:02:48 269.248
12/4/01 23:02:48 277.601
12/5/01 0:02:48 396.097
12/5/01 1:02:48 439.517
12/5/01 2:02:48 422.462
12/5/01 3:02:48 417.804
12/5/01 4:02:48 414.955
12/5/01 5:02:48 412.88
12/5/01 6:02:48 409.965
12/5/01 7:02:48 406.003
12/5/01 8:02:48 400.458
12/5/01 9:02:48 397.828
12/5/01 10:02:48 399.274
12/5/01 11:02:48 400.686
12/5/01 12:02:48 403.374
12/5/01 13:02:48 252.343
12/5/01 14:02:48 132.44
12/5/01 15:02:48 233.556
12/5/01 16:02:48 259.07
12/5/01 17:02:48 267.627
12/5/01 18:02:48 274.7
12/5/01 19:02:48 278.981
12/5/01 20:02:48 277.827
12/5/01 21:02:48 276.722
12/5/01 22:02:48 276.011
12/5/01 23:02:48 277.28
12/6/01 0:02:48 280.312
12/6/01 1:02:48 283.357


12/6/01 2:02:48 284.738
12/6/01 3:02:48 286.074
12/6/01 4:02:48 286.101
12/6/01 5:02:48 285.916
12/6/01 6:02:48 285.767
12/6/01 7:02:48 281.365
12/6/01 8:02:48 275.539
12/6/01 9:02:48 274.457
12/6/01 10:02:48 273.49
12/6/01 11:02:48 273.429
12/6/01 12:02:48 280.909
12/6/01 13:02:48 285.983
12/6/01 14:02:48 289.991
12/6/01 15:02:48 289.406
12/6/01 16:02:48 288.704
12/6/01 17:02:48 287.357
12/6/01 18:02:48 289.722
12/6/01 19:02:48 292.257
12/6/01 20:02:48 293.333
12/6/01 21:02:48 294.406
12/6/01 22:02:48 295.003
12/6/01 23:02:48 371.902
12/7/01 0:02:48 473.774
12/7/01 1:02:48 440.975
12/7/01 2:02:48 421.134
12/7/01 3:02:48 404.733
12/7/01 4:02:48 392.364
12/7/01 5:02:48 381.867
12/7/01 6:02:48 371.74
12/7/01 7:02:48 360.038
12/7/01 8:02:48 340.814
12/7/01 9:02:48 0
12/7/01 10:02:48 66.8873
