8
Procedia Computer Science 47 (2015) 101 – 108 Available online at www.sciencedirect.com 1877-0509 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of the Graph Algorithms, High Performance Implementations and Applications (ICGHIA2014) doi:10.1016/j.procs.2015.03.188 ScienceDirect Relevance study of Data mining for the identification of negatively influenced factors in sick groups A.S.Aneeshkumar a , Dr.C.Jothi Venkateswaran b a Research Scholar, PG and Research Department of Computer Science, Presidency College, Chennai 600 005, India b Dean, Department of Computer Science & Applications, Presidency College, Chennai 600 005, India Abstract The application of medical data mining is emphasis with temporal data in this piece of writing. The application of computational methods in medical field is well known and which started in previous decades, but still now also lot of researches take place in artificial intelligence, knowledge discovery and mining. Correlating computational intelligence with medical intelligence to predict negatively influenced epidemiological factors in liver disorder patients. Keywords: Neural Networks; Back Propagation Algorithm; Analysis of Variance. 1. Introduction Data mining task is used to discover most suitable model and relevant information derived from a large domain specific data set by the Knowledge extraction process and which include various techniques to find the relationship between the attributes of the dataset. The major challenge in pattern determination is the ability to recognise relevant values and most useful information from the domain data. Even data mining application process in information identification is also differing according to the kind of data and the knowledge which we needed [1]. *Coresponding Author Email: [email protected] © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of the Graph Algorithms, High Performance Implementations and Applications (ICGHIA2014)

1-s2.0-S1877050915004561-main

Embed Size (px)

Citation preview

Page 1: 1-s2.0-S1877050915004561-main

Procedia Computer Science 47 ( 2015 ) 101 – 108

Available online at www.sciencedirect.com

1877-0509 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).Peer-review under responsibility of organizing committee of the Graph Algorithms, High Performance Implementations and Applications (ICGHIA2014)doi: 10.1016/j.procs.2015.03.188

ScienceDirect

Relevance study of Data mining for the identification of negatively influenced factors in sick groups

A.S.Aneeshkumara, Dr.C.Jothi Venkateswaranb a Research Scholar, PG and Research Department of Computer Science, Presidency College, Chennai 600 005, India

b Dean, Department of Computer Science & Applications, Presidency College, Chennai 600 005, India

Abstract

The application of medical data mining is emphasis with temporal data in this piece of writing. The application of computational methods in medical field is well known and which started in previous decades, but still now also lot of researches take place in artificial intelligence, knowledge discovery and mining. Correlating computational intelligence with medical intelligence to predict negatively influenced epidemiological factors in liver disorder patients.

© 2015 The Authors. Published by Elsevier B.V. Peer-review under responsibility of organizing committee of the Graph Algorithms, High Performance Implementations and Applications (ICGHIA2014).

Keywords: Neural Networks; Back Propagation Algorithm; Analysis of Variance.

1. Introduction

Data mining task is used to discover most suitable model and relevant information derived from a large domain specific data set by the Knowledge extraction process and which include various techniques to find the relationship between the attributes of the dataset. The major challenge in pattern determination is the ability to recognise relevant values and most useful information from the domain data. Even data mining application process in information identification is also differing according to the kind of data and the knowledge which we needed [1].

*Coresponding Author Email: [email protected]

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).Peer-review under responsibility of organizing committee of the Graph Algorithms, High Performance Implementations and Applications (ICGHIA2014)

Page 2: 1-s2.0-S1877050915004561-main

102 A.S. Aneeshkumar and C. Jothi Venkateswaran / Procedia Computer Science 47 ( 2015 ) 101 – 108

Classification plays an important role in data mining task [2]. Each of the record in a given dataset is belongs to

any one of the predefined class and the classification is a task which extract rules from the given unidentified class elements to classify correctly [3]. Artificial Neural Network is an essential data mining task used in classification and clustering problems where build a machine to learn like a brain. Less computational cost and the representation of dimensionality make artificial neural network a competitor in multidimensional and big data mining [4]. It attempt to present learning model from examples and display some possibility for generalization on training data [5]. Neural Networks adjust their internal constraints by performing vector mappings from the input to the output node. The basic function of neural networks is the belief of the requested function [6].

Back Propagation network is an optimizing learning algorithm used in neural network learning and it follows the reverse application of neural networks. In back propagation, if the achieved result is not expected one then the weights between layers are modified and iterate it repeatedly until the error become small enough. Apart from the fixed learning algorithms, it learns from examples and so improves its learning performance in training [7]. This type of approach is known as meta-learning [8]. Meta-learning algorithms are often follows the procedure of supervised learning or reinforcement learning, where the learner improves its performance and learning ability through trial-and-error scenario [9]. In supervised learning, learning and meta-learning are usually treated as independent processes [10, 11].

Patient’s condition and period of infection is always an antonym for treatment time of the disease and the most complex issue in medical diagnosis is inability of effective recognition of initial and associated symptoms with the predicted disease. If the medical practitioners discover theses symptoms as a stepping stone to other disease instead of identifying it as a simple and independent symptom, only after some days of medical treatment. The reaction of human body to a foreign agent, infection or other inborn diseases may differ. This reaction is an identification mark or a symptom which tells that something happened or is going to happen in the body. These factors promote the risk of disease and which help for proper medical diagnosis, patient management strategy planning and counseling for the patients and their relatives to alter habitual activities, life style and psychosocial interventions to prevent and manage chronic diseases.

Liver disorder is also shows some symptoms like this. If it is hepatitis infection, the symptoms will come out immediately with more effect but in other cases like Alcoholic fatty liver disorder and Non-alcoholic fatty liver disorder, the liver damage will grow gradually and the initial symptoms are less effective to reach in a hospital and so the chance of misclassification is high without the consideration of further episodes. This work will lighten into the effect of other symptoms in liver disorder patients.

2. Dataset Description

The performance of biomedical data and physical assessments done by an expert is used here for medical decision making. The total of 2020 alcoholic liver patient’s data is collected from a reputed hospital in Chennai, with 39 attributes. Bilirubin (d), Bilirubin (t), S.G.O.T., S.G.P.T., Gamma GT, Alkaline phosphate, Total proteins, Albumin and Globulins are known as liver enzymes and gender, age, alcoholic consumption, smoking, height, weight, obesity, vomiting, abdomen pain, yellowish urinary discharge, low appetite, head ache, acting differently are other physical and biological symptoms Plasma glucose–f, Plasma glucose –r, Blood pressure (diastolic), Blood pressure (systolic), Triglycerides, DC Polymorphs, DC Lymphocytes, DC Basophils, DC Monocytes, E.S.R., Haemoglobin, RBC Count, PCV, MCV, MCH, MCHC and Platelet count are used for this study.

3. Approach

Constructed a neural network for alcoholic fatty liver disorder where input layer assigned as liver enzymes values with nine neurons and the hidden layer is physical and biological symptoms to produce output as alcoholic fatty liver disorder and which is initialized with small random numbers as its weights. Then the actual output will be identified with a forward pass. Then calculate error of output and each neuron from hidden layer. In reverse pass this error is then used mathematically to change the weights to get target output with a smaller error. This process will repeat until it reaches to minimum error.

The given figure1 represents i1 to i9 input neurons to represent liver enzymes and hi to h30 hidden neurons to represent other symptoms to produce o as output for alcoholic fatty liver disorder, where wi1h1 to wi9h30 are the weight between input layer and hidden layer and wh1o to w h30o are the weight between hidden and output layers.

Initially the network train with given input with random number as weight and the error is calculated from the first iteration result.

Page 3: 1-s2.0-S1877050915004561-main

103 A.S. Aneeshkumar and C. Jothi Venkateswaran / Procedia Computer Science 47 ( 2015 ) 101 – 108

Wi1h1

Wi9h1

Wi1h3

Wi9h3

Wh1o

Wh30o

Error in output is calculated as

Error in hidden layer is calculated as

So output layer weight change is

Hidden layer weight change is

Where shows target output, o actual output, hidden neuron values, new weight and is a constant equal to one for learning rate.

Negative weight of the hidden layer for the target output is identified and the corresponding hidden neurons are used for analysis.

Fig1: Neural Network model

Table1: Analysis of variance between input and hidden layers

Hidden Layer Input Layer Mean S. D. F Value P Value

Weight Bilirubin (D) 0.21 0.406

27.387 <0.001**

Bilirubin (T) 0.15 0.359

S.G.O.T. 0.11 0.318

S.G.P.T. 0.06 0.230

Gamma GT 0.10 0.305

Alkaline Phosphate 0.17 0.372

Total Proteins 0.14 0.349

Albumin 0.15 0.353

Globulins 0.10 0.293

Obese Bilirubin (D) 0.17 0.377

18.095 <0.001**

Bilirubin (T) 0.10 0.294

S.G.O.T. 0.13 0.339

S.G.P.T. 0.12 0.324

Gamma GT 0.12 0.326

i1

i9 h30

h1

o

Page 4: 1-s2.0-S1877050915004561-main

104 A.S. Aneeshkumar and C. Jothi Venkateswaran / Procedia Computer Science 47 ( 2015 ) 101 – 108

Alkaline Phosphate 0.11 0.312

Total Proteins 0.05 0.226

Albumin 0.12 0.323

Globulins 0.14 0.344

DC Lymphocytes Bilirubin (D) 0.55 0.498

4.972 0.026*

Bilirubin (T) 0.51 0.500

S.G.O.T. 0.45 0.498

S.G.P.T. 0.49 0.500

Gamma GT 0.47 0.499

Alkaline Phosphate 0.58 0.494

Total Proteins 0.45 0.498

Albumin 0.49 0.500

Globulins 0.49 0.500

Haemoglobin Bilirubin (D) 0.22 0.412

56.772 <0.001**

Bilirubin (T) 0.22 0.413

S.G.O.T. 0.18 0.387

S.G.P.T. 0.24 0.428

Gamma GT 0.22 0.417

Alkaline Phosphate 0.22 0.415

Total Proteins 0.26 0.439

Albumin 0.31 0.461

Globulins 0.25 0.436

RBC Count Bilirubin (D) 0.26 0.437

70.341 <0.001**

Bilirubin (T) 0.27 0.444

S.G.O.T. 0.18 0.387

S.G.P.T. 0.20 0.401

Gamma GT 0.23 0.423

Alkaline Phosphate 0.18 0.382

Total Proteins 0.15 0.359

Albumin 0.24 0.425

Globulins 0.16 0.366

PCV Bilirubin (D) 0.11 0.312

0.105 0.746

Bilirubin (T) 0.15 0.352

S.G.O.T. 0.14 0.347

S.G.P.T. 0.09 0.293

Gamma GT 0.17 0.379

Alkaline Phosphate 0.18 0.383

Total Proteins 0.11 0.314

Albumin 0.11 0.312

Globulins 0.13 0.331

Page 5: 1-s2.0-S1877050915004561-main

105 A.S. Aneeshkumar and C. Jothi Venkateswaran / Procedia Computer Science 47 ( 2015 ) 101 – 108

MCV Bilirubin (D) 0.10 0.307

4.987 0.026*

Bilirubin (T) 0.10 0.303

S.G.O.T. 0.07 0.263

S.G.P.T. 0.10 0.296

Gamma GT 0.06 0.242

Alkaline Phosphate 0.06 0.235

Total Proteins 0.04 0.205

Albumin 0.15 0.353

Globulins 0.07 0.256

MCH Bilirubin (D) 0.00 0.000

5.046 0.025*

Bilirubin (T) 0.04 0.184

S.G.O.T. 0.04 0.184

S.G.P.T. 0.02 0.123

Gamma GT 0.03 0.174

Alkaline Phosphate 0.02 0.138

Total Proteins 0.03 0.174

Albumin 0.04 0.184

Globulins 0.02 0.125

MCHC Bilirubin (D) 0.06 0.245

16.314 <0.001**

Bilirubin (T) 0.06 0.245

S.G.O.T. 0.11 0.308

S.G.P.T. 0.02 0.152

Gamma GT 0.08 0.275

Alkaline Phosphate 0.02 0.152

Total Proteins 0.07 0.257

Albumin 0.09 0.287

Globulins 0.11 0.308

Platelet Count Bilirubin (D) 0.50 0.500

6.389 0.011*

Bilirubin (T) 0.45 0.498

S.G.O.T. 0.41 0.492

S.G.P.T. 0.40 0.491

Gamma GT 0.41 0.492

Alkaline Phosphate 0.40 0.491

Total Proteins 0.43 0.496

Albumin 0.51 0.500

Globulins 0.50 0.500

Page 6: 1-s2.0-S1877050915004561-main

106 A.S. Aneeshkumar and C. Jothi Venkateswaran / Procedia Computer Science 47 ( 2015 ) 101 – 108

Page 7: 1-s2.0-S1877050915004561-main

107 A.S. Aneeshkumar and C. Jothi Venkateswaran / Procedia Computer Science 47 ( 2015 ) 101 – 108

Fig.2: Mean diagram for the variance

4. Discussion

Table 1 shows the influence of liver enzymes in each symptom where a total of 2020 data set trained in the neural network with back propagation algorithm and as per the table mean, standard deviation, f measure and significance of each neurons of the input layer is calculated. For weight and obese, Bilirubin-D is the most influenced but S.G.P.T. and total protein are less influenced factors respectively. According to dc lymphocytes, alkaline phosphate’s (0.58) mean value is high and S.G.O.T and total protein share the least influence (0.45). The standard deviation of this group lies between 0.498 and 0.500 and so on. The maximum f value carried by RBC count (70.341) and the less is for PCV (0.105). P value for weight, obese, fever, haemoglobin, RBC count and MCHC is less than one percentage level of significance and DC lymphocytes, MCV, MCH and platelet count sharing less than five percentage level of significance. Hence there is significance between these hidden layers and the input layer. But the significant level of PCV is above five percentages and so there is no significance between this

Page 8: 1-s2.0-S1877050915004561-main

108 A.S. Aneeshkumar and C. Jothi Venkateswaran / Procedia Computer Science 47 ( 2015 ) 101 – 108

factor and the input neurons. Figure 1 represents the pictorial view of the table mean for each group, where below averaged weight and obese value is correlated with high value of Bilirubin (D) and less S.G.O.T. for weight and less protein for obesity is seen in AFLD group. The effect of Alkaline Phosphate is high in decreased DC Lymphocytes and for hemoglobin, Albumin value is highly influenced, but in both groups the value of S.G.O.T. is low. There is an inverse relation between RBC Count and Bilirubin (T), PCV and Alkaline Phosphate, MCV and Albumin, Platelet count and Albumin. Negative representation of MCH is influenced with high Bilirubin (D), S.G.O.T. and Albumin and for MCHC, S.G.O.T. and Globulin is extremely subjective.

5. Conclusion

Regular alcoholic consumption will lead to various psychodynamic, socioeconomic and physiologic problems. In this paper we analyzed about various symptoms of liver disorders in alcoholic patients and identified negatively influenced factors due to alcoholic usage and the level of liver enzymes influences with respect to the symptoms. So this supportive study may be useful to make awareness within the patient groups.

Acknowledgement

We express our sincere thanks to the Director Dr.(Capt.) K. J. Jayakumar M.S., M.N.A.M.S., F.A.I.S. and Chief Manager Dr. R. Rajamahendran, B.Sc., M.B.B.S., D.M.C.H., D.H.H.M., P.G.D.H.S.C.(Diab.), F.C.D., Sir Ivan Stedeford Hospital, Chennai for providing permission to collect data. We are grateful to the chief Manager for his guidance and also would like to thank other hospital staffs for their valuable suggestions and help throughout this study.

References 1. Lin Li, “Efficient mining of Haplotype patterns for disease prediction”, Ph.D. Thesis, National University of Singapore, 2007 2. Humar Kahramanli and Novruz Allahverdi, “Mining Classification Rules for Liver Disorders”, International Journal of Mathematics and

Computers in Simulation, Issue 1, Volume 3, pg: 9-19, 2009. 3. W. Au, K. C. C. Chan, and X. Yao, “A Novel Evolutionary Data Mining Algorithm With Applications to Churn Prediction”, IEEE

Transactions on Evolutionary Computation, Vol. 7, No. 6, 2003 4. S. Wang, “Classification with incomplete survey data: A Hopfield neural network approach”, Computers & Operations Research, 32, pp.

2583–2594, 2005. 5. S. Mukhopadhyay, C. Tang, J. Huang, M. Yu and M. Palakal, “A Comparative Study of Genetic Sequence Classification Algorithms”,

Proceedings of the 2002 12th IEEE Workshop on Neural Networks for Signal Processing, pp. 57 – 66, 2002. 6. L. Nikolaos, “Radial Basis Functions Networks to hybrid neuro genetic RBFΝs in Financial Evaluation of Corporations”, International

Journal of Computers, Volume 2, pp. 176-182, 2008. 7. J.Schmidhuber, J.Zhao, and M.Wiering, “Simple Principles of Metalearning”, Technical Report IDSIA, pp.69-96, 1996. 8. S.Hochreiter, A.S.Younger, and P.R.Conwell, “Learning to Learn Using Gradient Descent”, Lecture Notes in Computer Science, vol.2130,

2001. 9. Demetri Psaltis, Athanasios Sideris, and Alan A. Yamamura, “A Multilayered Neural Network Controller” IEEE, April 1988. 10. M.Katrak, “Metalearning methods for neural networks”, MS Thesis, Technical University Kosice, 2005. 11. J.Schmidhuber, “A Neural Network that Embeds its own Meta-Levels, Proceedings of the International Conference on Neural Networks,

San Francisco, IEEE, 1993