Artificial neural networks for infectious diarrhea prediction using meteorological factors in Shanghai
Yongming Wang, Junzhong Gu and Zili Zhou
Department of Computer Science & Technology, East China Normal University
Institute of Computer Applications, Shanghai, China
E-mail: [email protected]
http://www.ica.stc.sh.cn
26th International Conference on Software Engineering and Knowledge Engineering
SEKE 2014, Hyatt Regency, Vancouver, Canada
OUTLINES
• Introduction
• Study area and dataset
• Prediction method and performance metrics
• Development of FFBPNN model
– input and output parameters
– Data pre-processing and post-processing
– Determination of optimum network and parameters
• Development of MLR model
• Experiments results and discussion
• Sensitivity analyses
• Conclusions
Introduction
Infectious diarrhea is a common and important infectious disease that poses a serious threat to human health, causing about one billion disease episodes and 1.8 million deaths each year (WHO, 2008).
In Shanghai, China (the largest developing country), the incidence of infectious diarrhea shows marked seasonality throughout the year and has been particularly high in summer and autumn in recent years.
Hence, a robust short-term forecasting model for infectious diarrhea incidence is necessary for decision-making in policy and public health.
Infectious diseases are closely related to meteorological factors such as temperature and rainfall, which can affect disease incidence in a linear or nonlinear fashion. In recent years, there has been a large scientific and public debate on climate change and its direct and indirect effects on human health.
In the literature on diarrheal disease prediction, many forecasting models based on statistical methods have been reported.
Because many meteorological factors affect infectious diarrhea and the inter-relations among them are complicated, prediction models based on statistical methods may not be fully suitable for this type of problem.
Artificial Neural Networks (ANNs) are now regarded as intelligent tools for understanding complex problems and have been widely used in the medical and health field. To the best of the authors' knowledge, no work has been carried out that uses ANNs to predict diarrheal disease.
Contribution: Establish a new ANN model (FFBPNN) to predict infectious diarrhea in Shanghai with a set of meteorological factors as predictors.
Study area and Dataset-Study area
Shanghai is located in eastern China, the largest developing country in the world. The city has a mild subtropical climate with four distinct seasons and abundant rainfall. It is the most populous city in China, comprising urban and suburban districts and counties, with a total area of 6,340.5 square kilometers and a population of more than 25.0 million at the end of 2013.
Study area and dataset-dataset
The infectious diarrhea cases for the period 2005.1.3-2009.1.4
[Figure: weekly number of infectious diarrhea cases plotted against time (week).]
The meteorological factors data for the period 2005.1.3-2009.1.4
[Figure: weekly meteorological factors plotted against time (week). Panels: (a) weekly average maximum temperature, (b) weekly average minimum temperature, (c) weekly average temperature, (d) weekly average minimum relative humidity, (e) weekly average relative humidity, (f) weekly average atmospheric pressure, (g) weekly average sunshine duration, (h) weekly average wind speed, (i) weekly average rainfall.]
Method and performance metrics
The schematic flowchart of the proposed method:
Step 1: Data collection (data collecting, producing the dataset)
Step 2: Data pre-processing (data calculating, data normalizing, data gathering)
Step 3: Data mining (models development, models testing and comparing, producing the prediction model)
Three layered feed-forward back-propagation artificial neural network model.
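$$y = f(x, w) = \sum_{j=1}^{m} w_j \, \varphi\!\left( \sum_{i=1}^{n} v_{ij} x_i + b_j \right) + b_0$$
where $x_i$ are the inputs, $v_{ij}$ and $w_j$ the connection weights, $b_j$ and $b_0$ the biases, and $\varphi$ the hidden-layer transfer function.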
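As a rough illustration of the three-layer network described here, the sketch below computes one forward pass in NumPy. It assumes a tanh hidden layer and a linear output layer; the function name `ffbpnn_forward`, the random weights and the 9-4-1 sizes in the usage lines are illustrative assumptions, not the authors' trained model.

```python
import numpy as np

def ffbpnn_forward(x, V, b_hidden, w, b_out):
    """Forward pass of a three-layer feed-forward network (hypothetical helper).

    x        : (n_samples, n_inputs) input matrix
    V        : (n_inputs, n_hidden) input-to-hidden weights
    b_hidden : (n_hidden,) hidden-layer biases
    w        : (n_hidden,) hidden-to-output weights
    b_out    : scalar output bias
    """
    hidden = np.tanh(x @ V + b_hidden)  # assumed tanh-type hidden activation
    return hidden @ w + b_out           # linear output unit

# Illustrative sizes and random weights (assumptions, not fitted values).
rng = np.random.default_rng(0)
n_inputs, n_hidden = 9, 4
V = rng.normal(scale=0.5, size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w = rng.normal(scale=0.5, size=n_hidden)
b_out = 0.0

x = rng.uniform(-1.0, 1.0, size=(5, n_inputs))  # five fake weekly samples
y_hat = ffbpnn_forward(x, V, b_hidden, w, b_out)
print(y_hat.shape)
```

Each row of `x` stands for one week of normalized predictors, and the function returns one prediction per row.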
The models with the smallest RMSE, MAE and MAPE and the largest R and R2 are considered to be the best models.
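$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right|$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\%$$
$$\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 }$$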
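$$R = \sqrt{ 1 - \frac{ \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 }{ \sum_{t=1}^{n} \left( y_t - \bar{y} \right)^2 } }$$
$$R^2 = 1 - \frac{ \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 }{ \sum_{t=1}^{n} \left( y_t \right)^2 }$$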
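The five criteria named above (MAE, MAPE, RMSE, R and R2) can be computed with a few lines of NumPy. This is a minimal sketch: the function name `performance_metrics` and the toy numbers are hypothetical, and the forms used for R (square root of one minus the error sum of squares over the centered sum of squares) and R2 (one minus the error sum of squares over the uncentered sum of squares of the actual values) are an assumed reading of the slide.

```python
import numpy as np

def performance_metrics(y, y_hat):
    """Return MAE, MAPE (%), RMSE, R and R2 for actual y and predicted y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    sse = np.sum(err ** 2)  # sum of squared errors
    return {
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y)) * 100.0,  # requires every y != 0
        "RMSE": np.sqrt(np.mean(err ** 2)),
        # Assumed form; NaN if the model is worse than predicting the mean.
        "R": np.sqrt(1.0 - sse / np.sum((y - y.mean()) ** 2)),
        # Assumed form with an uncentered denominator.
        "R2": 1.0 - sse / np.sum(y ** 2),
    }

# Toy example with made-up weekly counts.
m = performance_metrics([100, 200, 300], [110, 190, 330])
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Weeks with zero actual cases would make MAPE undefined, so such weeks need separate handling before this function is applied to real data.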
Development of FFBPNN model
The FFBPNN modeling consists of two steps:
• Train the network using the training dataset
– Model input and output parameters
– Data pre-processing and post-processing
– Determination of optimum network and parameters
• Test the network with the testing dataset
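The pre-processing and post-processing step is commonly a normalization of inputs and targets before training, followed by the inverse transform on the predictions. The slides do not state the scaling used, so the sketch below assumes column-wise min-max scaling to [-1, 1]; the helper names `minmax_fit`, `minmax_apply` and `minmax_invert` are hypothetical.

```python
import numpy as np

def minmax_fit(data, lo=-1.0, hi=1.0):
    """Record column-wise min/max so the same scaling can be reused later."""
    data = np.asarray(data, dtype=float)
    return {"min": data.min(axis=0), "max": data.max(axis=0), "lo": lo, "hi": hi}

def minmax_apply(data, p):
    """Pre-processing: map each column linearly onto [lo, hi]."""
    span = p["max"] - p["min"]  # a constant column would give span == 0
    scaled = (np.asarray(data, dtype=float) - p["min"]) / span
    return p["lo"] + scaled * (p["hi"] - p["lo"])

def minmax_invert(scaled, p):
    """Post-processing: map scaled values back to the original units."""
    span = p["max"] - p["min"]
    unit = (np.asarray(scaled, dtype=float) - p["lo"]) / (p["hi"] - p["lo"])
    return p["min"] + unit * span

# Made-up two-column example (e.g. temperature and humidity).
data = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
params = minmax_fit(data)
scaled = minmax_apply(data, params)
restored = minmax_invert(scaled, params)
print(scaled)
```

Fitting the parameters on the training set only and reusing them for the testing set keeps the two sets on the same scale.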
[Figure: Hidden neurons and network errors.]
The optimum model architecture and parameters for the diarrhea prediction.
Parameters                            FFBPNN
Number of input layer units           9
Number of hidden layers               1
Number of hidden layer units          4
Number of output layer units          1
Momentum rate                         0.9
Learning rate                         0.74
Error after learning                  1e-6
Learning cycle                        1500 epochs
Transfer function in hidden layer     Tansig
Transfer function in output layer     Purelin
Training function                     TRAINGDM
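The momentum rate and learning rate in this table enter the weight update of gradient descent with momentum. The sketch below shows one textbook form of that update, applied to a toy quadratic loss; the function name `momentum_step` and the toy problem are hypothetical, and the exact scaling used inside the TRAINGDM training function may differ from this form.

```python
import numpy as np

def momentum_step(weights, grad, velocity, lr=0.74, mc=0.9):
    """One classical momentum update (assumed textbook form).

    lr and mc default to the learning rate and momentum rate from the table.
    """
    velocity = mc * velocity - lr * grad  # blend previous step with new gradient
    return weights + velocity, velocity

# Toy problem: minimize 0.5 * w**2, whose gradient is simply w.
w = np.array([1.0])
v = np.zeros_like(w)
for _ in range(300):
    grad = w
    w, v = momentum_step(w, grad, v)
print(w)
```

On this toy loss the iterate oscillates around zero while shrinking toward it, which is the usual behaviour of momentum at a fairly large learning rate.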
Development of MLR model
Dependent variable : diarrhea number
Independent variables : meteorological factors
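[Equation: fitted MLR model expressing the weekly number of infectious diarrhea cases (WNID) as an intercept plus a linear combination of the nine meteorological factors: maximum, minimum and average temperature (T), minimum and average relative humidity (RH), atmospheric pressure (AP), sunshine duration (SD), wind speed (WS) and rainfall (R). The coefficient values are garbled in this copy.]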
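An MLR baseline of this kind, with one dependent variable and several independent variables, can be fitted by ordinary least squares. The sketch below uses synthetic data only; the helper names `fit_mlr` and `predict_mlr` and the coefficient values are made up for illustration and are unrelated to the fitted model on the slide.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply fitted coefficients to new predictor rows."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

# Synthetic stand-in for nine meteorological predictors (not real data).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 9))
true_coef = np.arange(10, dtype=float)  # made-up intercept 0 and slopes 1..9
y = true_coef[0] + X @ true_coef[1:]    # noise-free target for a clean check

coef = fit_mlr(X, y)
print(np.round(coef, 3))
```

Because the synthetic target is noise-free, the recovered coefficients match the made-up ones; with real case counts the fit would leave residuals, which is what the error metrics measure.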
Results and discussion
PECs        FFBPNN                  MLR
            Training    Testing     Training    Testing
MAE         20.7628     27.7547     29.8077     35.3774
RMSE        28.3007     36.0526     39.3739     48.9395
MAPE (%)    27.27       38.41       43.37       41.82
R           0.8783      0.8490      0.8089      0.6968
R2          0.9213      0.9125      0.8811      0.8388
The better performance of the FFBPNN model compared with the MLR model may be attributed to the complex nonlinear relationship between infectious diseases and meteorological factors.
[Figure: Comparison curves plot of actual vs. predicted trends for training dataset. Panels: Actual vs. FFBPNN (a) and Actual vs. MLR. Axes: time (week) against the weekly number of infectious diarrhea.]
[Figure: Comparison scatter plot of actual vs. predicted values for training dataset. Panels: FFBPNN (b) and MLR. Axes: actual values against predicted values. FFBPNN fit: y=0.83x+17, R2=0.9385.]
[Figure: Comparison curves plot of actual vs. predicted trends for testing dataset. Panels (c): Actual vs. FFBPNN and Actual vs. MLR. Axes: time (week) against the weekly number of infectious diarrhea.]
[Figure: Comparison scatter plot of actual vs. predicted values for testing dataset. Panels (d): FFBPNN and MLR. Axes: actual values against predicted values. FFBPNN fit: y=0.68x+28, R2=0.9125. MLR fit: y=0.54x+39, R2=0.8388.]
Sensitivity analyses
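Sensitivity analysis (Cosine Amplitude Method)
[Diagram: the ANN acts as a black box between the meteorological factors (inputs) and infectious diarrhea (output).]
$$r_{ij} = \frac{ \sum_{k=1}^{m} x_{ik} x_{jk} }{ \sqrt{ \sum_{k=1}^{m} x_{ik}^2 \, \sum_{k=1}^{m} x_{jk}^2 } }$$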
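The cosine amplitude method scores the strength of the relation between two data series as their normalized inner product. A minimal sketch follows, assuming each factor and the case counts are available as equal-length series; the function name `cosine_amplitude` and the numbers are hypothetical.

```python
import numpy as np

def cosine_amplitude(x_i, x_j):
    """Strength of relation r_ij between two equal-length data series."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    return np.sum(x_i * x_j) / np.sqrt(np.sum(x_i ** 2) * np.sum(x_j ** 2))

# Made-up series: one input factor and the output over four weeks.
factor = [12.0, 18.0, 25.0, 30.0]
cases = [40.0, 70.0, 110.0, 150.0]
r = cosine_amplitude(factor, cases)
print(round(r, 4))
```

Computing this score for every input against the output and sorting the results gives a ranking of the factors, with values closer to 1 indicating a stronger relation.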
Most effective meteorological factor: temperature
Least effective meteorological factor: average rainfall
Conclusions
1. The proposed method is more suitable for predicting infectious diarrhea than the statistical MLR method.
2. The feed-forward back-propagation neural network (FFBPNN) model with architecture 9-4-1 gives the most accurate predictions of the weekly number of infectious diarrhea cases.
3. The most effective meteorological factor on infectious diarrhea is weekly average temperature, whereas weekly average rainfall is the least effective.
Therefore, this technique can be used to predict infectious diarrhea. The results can be used as a baseline against which to compare other prediction techniques in the future.