16
Research Article Soft Clustering for Enhancing the Diagnosis of Chronic Diseases over Machine Learning Algorithms Theyazn H.H Aldhyani , 1 Ali Saleh Alshebami, 2 and Mohammed Y. Alzahrani 3 1 Department of Computer Sciences and Information Technology, King Faisal University, Al-Hasa 31982, Saudi Arabia 2 Department of Administrative and Financial Sciences, King Faisal University, Al-Hasa 31982, Saudi Arabia 3 Department of Computer Sciences and Information Technology, Albaha University, Albaha 65527, Saudi Arabia Correspondence should be addressed to eyazn H.H Aldhyani; [email protected] Received 8 October 2019; Revised 2 January 2020; Accepted 18 January 2020; Published 9 March 2020 Academic Editor: Cesare F. Valenti Copyright©2020eyaznH.HAldhyanietal.isisanopenaccessarticledistributedundertheCreativeCommonsAttribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Chronic diseases represent a serious threat to public health across the world. It is estimated at about 60% of all deaths worldwide and approximately 43% of the global burden of chronic diseases. us, the analysis of the healthcare data has helped health officials, patients, and healthcare communities to perform early detection for those diseases. Extracting the patterns from healthcare data has helped the healthcare communities to obtain complete medical data for the purpose of diagnosis. e objective of the present research work is presented to improve the surveillance detection system for chronic diseases, which is used for the protection of people’s lives. For this purpose, the proposed system has been developed to enhance the detection of chronic disease by using machine learning algorithms. e standard data related to chronic diseases have been collected from various worldwide resources. In healthcare data, special chronic diseases include ambiguous objects of the class. erefore, the presence of am- biguous objects shows the availability of traits involving two or more classes, which reduces the accuracy of the machine learning algorithms. e novelty of the current research work lies in the assumption that demonstrates the noncrisp Rough K-means (RKM) clustering for figuring out the ambiguity in chronic disease dataset to improve the performance of the system. e RKM algorithm has clustered data into two sets, namely, the upper approximation and lower approximation. e objects belonging to the upper approximation are favourable objects, whereas the ones belonging to the lower approximation are excluded and identified as ambiguous. ese ambiguous objects have been excluded to improve the machine learning algorithms. e machine learning algorithms, namely, na¨ ıve Bayes (NB), support vector machine (SVM), K-nearest neighbors (KNN), and random forest tree, are presented and compared. e chronic disease data are obtained from the machine learning repository and Kaggle to test and evaluate the proposed model. e experimental results demonstrate that the proposed system is successfully employed for the diagnosis of chronic diseases. e proposed model achieved the best results with naive Bayes with RKM for the classification of diabetic disease (80.55%), whereas SVM with RKM for the classification of kidney disease achieved 100% and SVM with RKM for the classification of cancer disease achieved 97.53 with respect to accuracy metric. e performance measures, such as accuracy, sensitivity, specificity, precision, and F-score, are employed to evaluate the performance of the proposed system. Furthermore, evaluation and comparison of the proposed system with the existing machine learning algorithms are presented. Finally, the proposed system has enhanced the performance of machine learning algorithms. 1. Introduction Chronic diseases are serious diseases because they pose a serious threat to people’s lives and persist over long periods. ey can impede the freedom and health of people who have physical disabilities. us, they further cause frustration of people who suffer from various health disabilities. e available vaccines and medicine cannot completely prevent chronic diseases because they show no indications in any case. With aging changes, chronic diseases continue to become a more common phenomenon. Hence, there is a need to identify factors causing them and to take the Hindawi Journal of Healthcare Engineering Volume 2020, Article ID 4984967, 16 pages https://doi.org/10.1155/2020/4984967

SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

Research ArticleSoft Clustering for Enhancing the Diagnosis of ChronicDiseases over Machine Learning Algorithms

Theyazn HH Aldhyani 1 Ali Saleh Alshebami2 and Mohammed Y Alzahrani 3

1Department of Computer Sciences and Information Technology King Faisal University Al-Hasa 31982 Saudi Arabia2Department of Administrative and Financial Sciences King Faisal University Al-Hasa 31982 Saudi Arabia3Department of Computer Sciences and Information Technology Albaha University Albaha 65527 Saudi Arabia

Correspondence should be addressed to eyazn HH Aldhyani taldhyanikfuedusa

Received 8 October 2019 Revised 2 January 2020 Accepted 18 January 2020 Published 9 March 2020

Academic Editor Cesare F Valenti

Copyright copy 2020eyaznHHAldhyani et alis is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

Chronic diseases represent a serious threat to public health across the world It is estimated at about 60 of all deaths worldwideand approximately 43 of the global burden of chronic diseases us the analysis of the healthcare data has helped healthofficials patients and healthcare communities to perform early detection for those diseases Extracting the patterns fromhealthcare data has helped the healthcare communities to obtain complete medical data for the purpose of diagnosise objectiveof the present research work is presented to improve the surveillance detection system for chronic diseases which is used for theprotection of peoplersquos lives For this purpose the proposed system has been developed to enhance the detection of chronic diseaseby using machine learning algorithms e standard data related to chronic diseases have been collected from various worldwideresources In healthcare data special chronic diseases include ambiguous objects of the class erefore the presence of am-biguous objects shows the availability of traits involving two or more classes which reduces the accuracy of the machine learningalgorithms e novelty of the current research work lies in the assumption that demonstrates the noncrisp Rough K-means(RKM) clustering for figuring out the ambiguity in chronic disease dataset to improve the performance of the system e RKMalgorithm has clustered data into two sets namely the upper approximation and lower approximation e objects belonging tothe upper approximation are favourable objects whereas the ones belonging to the lower approximation are excluded andidentified as ambiguous ese ambiguous objects have been excluded to improve the machine learning algorithms e machinelearning algorithms namely naıve Bayes (NB) support vector machine (SVM) K-nearest neighbors (KNN) and random foresttree are presented and compared e chronic disease data are obtained from the machine learning repository and Kaggle to testand evaluate the proposed modele experimental results demonstrate that the proposed system is successfully employed for thediagnosis of chronic diseases e proposed model achieved the best results with naive Bayes with RKM for the classification ofdiabetic disease (8055) whereas SVM with RKM for the classification of kidney disease achieved 100 and SVMwith RKM forthe classification of cancer disease achieved 9753 with respect to accuracy metric e performance measures such as accuracysensitivity specificity precision and F-score are employed to evaluate the performance of the proposed system Furthermoreevaluation and comparison of the proposed system with the existing machine learning algorithms are presented Finally theproposed system has enhanced the performance of machine learning algorithms

1 Introduction

Chronic diseases are serious diseases because they pose aserious threat to peoplersquos lives and persist over long periodsey can impede the freedom and health of people who havephysical disabilities us they further cause frustration of

people who suffer from various health disabilities eavailable vaccines and medicine cannot completely preventchronic diseases because they show no indications in anycase With aging changes chronic diseases continue tobecome a more common phenomenon Hence there is aneed to identify factors causing them and to take the

HindawiJournal of Healthcare EngineeringVolume 2020 Article ID 4984967 16 pageshttpsdoiorg10115520204984967

required corrective measures accordingly Factors such assmoking physical inactivity food diet and insufficient orexcessive alcohol consumption could largely contribute tochronic diseases Previous studies have identified chronicdiseases as the seventh cause of death among other causes Inthe United States they resulted in 658 of deaths among USmales and 672 of deaths among US females in 2010 [1]Heart cancer diabetes asthma and kidney diseases areidentified as chronic diseases In addition chronic diseasesare measured as noncommunicable diseases they slowly endthe life of people in a long period Chronic diseases do nottransfer from one person to another In the United Stateschronic diseases drive up medicinal service expenses andbreak up human services reasonably ey possess an es-sential part of the economy and thwart the health quality ofpeople is study promotes the classification of chronicdisease conditions namely cancer kidney asthma anddiabetes e World Health Organization reported in 2002that mortality dreariness and incapacity were credited tothe major chronic diseases Currently records show that60 of all deaths and 43 of the global weight of illnesses areattributed to chronic diseases By 2020 it is expected that thepercentage of deaths will reach 73 of total deaths and 60of the global weight of sicknesses [2] With the help ofmachine learning algorithms predicting chronic diseaseshas become an easy task erefore it is aimed here todevelop a surveillance system to predict and diagnosehealthcare data for helping the health communities

e machine learning algorithms increase the level ofindividual prediction and therefore diagnosis will ultimatelystrengthen anticipation efforts e availability of wellprevention measures will not only enhance or provide goodhealth for persons but also reduce healthcare spendingMachine learning techniques have been used widely in thehealthcare domain ey have now become crucial tools forhealthcare management ey have also assisted in im-proving health care by using the prediction measures forepidemic diseases faced by people around the world eWorld Health Organization (WHO) has significantlybenefited from the employment of machine learning ap-plications that improve the quality of health care

Machine learning algorithms are considered to beclassification clustering and prediction for the sake ofsolving various issues in real-time applicationsey providean assurance of the classification and prediction solutions forstability and reliability in performance Based on machinelearning algorithms a few researchers have developedsuccessful healthcare systems Algorithms include statisticsSVM decision trees clustering and optimization algorithmsand others Machine learning applications rely largely ondatasets that analyze and discover the patterns that are usedto solve specific tasks e healthcare system has the po-tential promotion in the health domain to extract anddiscover the hidden patterns in the database [3] us theavailable healthcare data are universally scattered and am-biguousey may also contain insufficient and insignificantinformation stored in terms of the constancy in predictionand classification One of the biggest challenges of healthcaredata and its information is the accurate diagnosis of certain

significant information To predict and analyze the chronicdiseases such as kidney diabetic cancer and heart diseasesthere are several proposed machine learning algorithms thatcan be usedese algorithms include the decision tree (DT)SVM ANN linear regression (LR) KNN NB and timeseries prediction models Because of the rapid innovationand continuous changes in software engineering a hugevolume of information can be generated With the devel-opment of a healthcare database management system therewill be more opportunities for the enhancement of thehealthcare systems Extracting patterns from these datasetsand managing large amounts of dimensionality data havebecome a major field of machine learning e machinelearning algorithm is considered to be the classification ofhealthcare datasets to obtain useful knowledge that can helphealth officials and communities To apply machine learningalgorithms that enhance the performance of the classifica-tion process the preprocessing of the soft clustering algo-rithm is required

e remaining parts of the article are organized intosections Introduction is discussed in Section 1 relatedstudies are given in Section 2 data andmethods are shown inSection 3 and results and discussion are shown in Section 4Lastly conclusion is presented in Section 5

2 Related Studies

ere is a considerable number of research works that havebeen done in relation to the classification and the predictionof healthcare data Solanki [4] proposed most of the classifieralgorithms on the Weka tool for predicting the prevalentsickle cell disease e obtained results compared with theclassifiers are available on the Weka data mining tool It isobserved that the random tree approach is a better algorithmfor classifying sickle cell Similarly Joshi et al [5] used anumber of machine learning approaches such as Bayes netlogistic model tree (LMT) multilayer perception stochasticgradient descent and sequential minimal optimizationtechniques ese researchers suggested using LMT algo-rithms for diagnosing breast cancer because of its highperformance and accuracy Furthermore David et al [6]applied the KNN algorithm Bayesian network decision treealgorithm and random tree method namely the J48 tree topredict leukemia disease Accordingly it was found that thedecision tree algorithm had shown better accuracy in theresult In one more study conducted by Vijayarani andSudha [7] LMT and the sequential minimal optimizationmultilayer and perceptron algorithms are employed topredict heart diseases Furthermore the study conducted bySugandhi et al [8] proposed a random tree algorithm for theclassification of heart diseases e outcome of the researchhas shown that random tree gives a better performance thanother classification algorithms Consequently for obtainingresults from a random tree classifier it is found that therandom tree classifier is outperformede study of Yasodhaand Kannan [9] has also reported that the Weka classifi-cation algorithm was used for analyzing and predicting thediabetic patientrsquos database Likewise Bin Othman and Yau[10] have compared various classification approaches with

2 Journal of Healthcare Engineering

theWeka data mining tool for predicting breast cancer Israa[11] has applied NB decision tree (DT) random forest andsupport vector machine techniques to improve the classi-fication of heart diseases On the other hand D Sisodia andSisodia [12] have proposed three machine learning algo-rithms that is DT SVM and naive Bayes (NB) to detectdiabetes us experiments have been done by usingstandard data from the UCI machine learning repository Itis observed that the NB approach outperforms as comparedwith other algorithms which have an accuracy rate of7630 e study of Syed et al [13] has employed SVMBayesian network and decision tree algorithms to predictthe obesity of schoolchildren Sandeep et al [14] proposedlinear discriminate analysis (LDA) NB random forest LRand quadratic discriminate analysis (QDA) for the analysisand classification of chronic kidney diseases e study ofSahana and Minavathi [15] has also focused on predictingkidney disease using classification algorithms namely ANNand C45 It concentrated on accurate prediction and timefactor performance e ANN and C45 algorithms are usedfor helping out the medical practitioner to give propermedication and medical treatment K Polaraju and Prasad[16] have proposed a multiple regression model to classifychronic heart disease It is proved that the multiple linearregression model is favourable for predicting heart diseases

In this research the training dataset consists of 3000values with 13 different features From the experimentalresults it is shown that the regression algorithm performsbetter than other algorithms Kim et al [17] have proposedthe character-recurrent neural network (Char-RNN) modelto predict chronic diseases ey have collected data fromthe Korean National Health and Nutrition ExaminationSurvey (KNHANES) It is observed that the Char-RNNmodel obtained higher accuracy than the conventionalmultilayer perceptron model Ng et al [18] have used ma-chine learning algorithms to detect heart failure Moreoverelectronic health record data are used to predict events andthe onset of diseases Zhang et al [19] have proposed aconvolution neural network (CNN) architecture namedGroup Net to predict chronic diseases us the experi-mental analysis is conducted using data from local medicalcenters ey have noted that CNN has achieved the bestaccuracy Kriplani et al [20] have used deep learning topredict chronic kidney disease e proposed models aretested by using standard datasets of diseases available on theUCI e analysis results have appeared better in usingcross-validation performance Liu et al [21] proposed CNNLSTM and hierarchical models to predict chronic diseasesBrisimi et al [22] have applied four machine learning al-gorithms namely SVM kernelized sparse logistic regres-sion and random forests to predict chronic heart anddiabetic diseases ey have gathered standard data fromelectronic health records (EHRs) Chen et al [23] usedstreamline machine learning techniques to predict chronicdisease epidemics eir experiment has proposed predic-tion models using real-life hospital data gathered fromcentral China in 2013ndash2015e convolution neural networkis implemented as based on multimodal disease risk pre-diction (CNN-MDRP) algorithm using structured and

unstructured data from hospitals Patel et al [24] havedeveloped a system using three classifiers such as KStarSMO and J48 Bayes net and multilayer perception neuralnetwork algorithms with the help of Weka software toclassify heart diseases It is observed that the Bayes net hasaccomplished optimum performance as compared withfurther classification algorithms namely KStar multilayerperception and J48 approaches by using the k-fold cross-validation method e research of Deepika and Seema [25]also designed a system to predict chronic diseases via ma-chine learning algorithms such as naıve Bayes decision treeSVM and ANN A comparative analysis of the performancesof algorithms is presented It is observed that the support ofthe vector machine and the naıve Bayes provides the highestaccuracy rate when predicting the diabetic disease Ul Haqet al [26] have suggested different machine learning algo-rithms such as naive Bayes classification tree KNN logisticregression SVM and ANN to predict heart diseases reefeature selection methods are applied to improve the clas-sification algorithms It is concluded that the feature se-lectionmethod increases the performance of the classifier forpredicting heart diseases Ahmed et al [27] proposed a fuzzylogic algorithm to classify kidney diseases Coacci et al [28]used two classification approaches namely logistic regres-sion and ANN Xun et al [29] presented ANN and naıveBayes classifiers for predicting chronic diseases Some re-searchers used the UCI machine learning repository fortesting proposed models such as chronic diseases [30 31]diabetic disease [32 33] and breast cancer [34] Using thedeep learning algorithm to predict chronic diseases Kimet al [17] proposed nature-inspired computing algorithmsfor the diagnosis of chronic diseases [35] and employedmachine learning algorithm to develop E-health for thediagnosis of chronic diseases [36]

In the current research article traditional machinelearning algorithms are employed for predicting chronicdiseases erefore the result of the existing classificationalgorithms is needed to make the healthcare system morereliable Subsequently the soft clustering algorithm is ap-plied to increase the accuracy of classification algorithms

3 Materials and Methods

e proposed model is designed explicitly to classify chronicdiseases using machine learning algorithms Figure 1 dis-plays the proposed system that combines the existing ma-chine learning algorithms with the rough k-means clusteringtechnique Noncrisp rough k-means algorithm is demon-strated to handle the ambiguous objects ese ambiguousobjects obstruct the performance of machine learning al-gorithms e RKM clustering has clustered data into twoclusters us it is used to measure the roughness of theobjects Moreover the threshold value is also used tomaximize the roughness of objects for reducing the am-biguous objects e threshold value parameter plays a verysignificant role in making the program of the noncrisp al-gorithm It has experimented and found out that thethreshold value is 14 e RKM algorithm is used to dealwith ambiguous objects for improving the classification

Journal of Healthcare Engineering 3

algorithme RKM algorithm has clustered data into lowerapproximation and upper approximation in which theclustered objects in the lower approximation are consideredbut the objects clustered in upper approximation are ex-cluded e novelty of the proposed model has used roughk-means to handle the ambiguous objects belonging to lowerapproximation that is processed with the help of machinelearning algorithms e rough k-means clustering is pro-posed to explicitly determine ambiguous objects To close itis investigated that the results of the proposed system haveoutperformed all the alternative models used for measuringthe performance e detailed description of the proposedsystem is discussed in the following subsections

31 Datasets e chronic disease datasets have been col-lected from the different resources as follows

311 Diabetic Disease Dataset e diabetes data collectedfrom the machine learning repository contained nine at-tributes eight features and one class is dataset has beengathered from an automatic electronic recording device andpaper records [37] Table 1 shows the features of data

312 Breast Cancer Disease Dataset e cancer data col-lected from the Kaggle contained nine attributes 30 featuresand 1 class ese features are obtained from digitizedimages of breast cancer [38] Table 2 shows the features ofdata

313 Kidney Disease Dataset e collected kidney datafrom the Kaggle contained 26 attributes 24 features and 1class [38] Table 3 shows the features of data

32HandlingAmbiguity Machine learning algorithms havesucceeded in a number of real-time applications such asimage processing recognition video recognition marketingprediction weather forecasting and network security econventional machine learning algorithms are used toidentify the objects belonging to exactly one class In dataanalysis it may be possible that an object shows the char-acteristics of different classes [39] In that event an objectshould belong to more than one class and as a result objectboundaries should necessarily overlap e machinelearning algorithms categorize an object into one classprecisely Figure 2 shows the ambiguous data Such re-quirement is found to be too restrictive in a number of real-time applications

In Figure 2 the basic example of ambiguous objects canbe noted It clearly shows the three separate classes Hence itis observed that five objects are not classified under anyprecise class Henceforth these five objects decrease theperformance of machine learning algorithms us it isrequired to determine such ambiguous objects and deal withthem before applying machine learning algorithms For this

Data sets

Rough K-means

Clustering data

Clustering data into twoclusters

Threshold value ofroughness (14)

No

Threshold value ofroughness

appropriate

Upperapproximate Lower approximate

Machine learningalgorithms

Performancemeasurement

Ambiguous objects Correct objects

Denied

Yes

Figure 1 Proposed model

Table 1 Features of diabetic dataset

Feature name CategoriesPregnancies NumericGlucose NumericBlood pressure NumericSkin thickness NumericInsulin NumericBMI NumericDiabetes pedigree function NumericAge Numeric

Class Nominal diabeticsnot diabetics (0 1)

4 Journal of Healthcare Engineering

issue the present research work applies the RKM techniqueto recognize ambiguous packets from chronic diseasedatasets e detailed description of the RKM algorithmemployed for identifying the ambiguous objects is presentedin the subsequent subsections

321 Rough K-Means Clustering Algorithm e proposedRKM clustering approach is based on a simple K-meansclustering [40ndash42] Peters [43] enhanced the algorithm of[40] (original proposal) by calculating rough centroid usingratios of distances as new proposals to differentiate betweensimilar distances Joshi and Lingras [44] used RKM andECM clustering algorithms to handle high dimensional dataAldhyani and Joshi [39] used the rough K-means and ECMclustering algorithms to handle ambiguous objects of in-trusion detection e rough K-means approach is designedto determine the ambiguous objects that belong to the upperboundary of clusters Cluster the data as lower approxi-mation and upper approximation e rough K-Meansrepresents each

(P1) An object xrarr can be part of at most one lower

approximation (lower bound)(P2) xisinA( c

rarri) rArr c

rarrisinA( crarr

i)

(P3) An object xrarr is not part of any lower

approximationxrarr belongs to two or more upper approximations(upper bound)

Overall ideas of soft clustering are more appropriate todeal with ambiguous objects When the algorithm is pro-cessed all objects are assigned wlower and wupper For eachobject vector v

rarr let d ( vrarr c

rarrj) be the distance between itself

and the centroid of cluster crarr

j Let d ( vrarr c

rarri)min 1 le j le k

d ( vrarr c

rarrj) e ratios d ( v

rarr crarr

j)d ( vrarr c

rarrj) 1le I jle k are

used to determine the membership of vrarr Let T j d ( v

rarrc

rarrj)d ( v

rarr crarr

j)ge threshold and ine j

(1) If T ϕ vrarrisinA ( c

rarrj) and v

rarrisinA ( crarr

j) forallj isinT Fur-thermore v

rarr is not part of any lower approximatione above criterion guarantees that property (P3) issatisfied

Table 2 Features of cancer dataset

Feature name CategoriesId NumericRadius mean NumericTexture_mean NumericPerimeter_mean NumericArea_mean NumericSmoothness_mean NumericCompactness_mean NumericConcavity_mean NumericCONCAVE points_mean NumericSymmetry_mean NumericFractal_dimension_mean NumericRadius_se NumericTexture_se NumericPerimeter_se NumericArea_se NumericSmoothness_se NumericCompactness_se NumericConcavity_se NumericConcave points_se NumericSymmetry_se NumericFractal_dimension_se NumericRadius_worst NumericTexture_worst NumericPerimeter_worst NumericArea_worst NumericSmoothness_worst NumericCompactness_worst NumericConcavity_worst NumericConcave points_worst NumericSymmetry_worst NumericFractal_dimension_worst Numeric

Diagnosis Nominal (Mmalignantand B benign)

Table 3 Features of kidney dataset

Feature name CategoriesAge Numericbp blood pressure Numeric

sg specific gravity Nominal 1005 10101015 1020 1025

al albumin Nominal 0 1 2 3 4 5su sugar Nominal 0 1 2 3 4 5rbc red blood cells Nominal 0 1pc pus cell Nominal 0 1pcc pus cell clumps Nominal 0 1ba bacteria Nominal 0 1bgr blood glucose random Numericbu blood urea Numericsc serum creatinine Numericsod sodium Numericpot potassium Numerichemo hemoglobin Numericpcv packed cell volume Numericwc white blood cell count Numericrc red blood cell count Numerichtn hypertension Nominaldm diabetes mellitus Nominalcad coronary artery disease Nominalappet appetite Nominalpe pedal edema Nominalane anemia Nominalclass class Nominal CKD not CKD

60

60

50

50

40

40

30

30

20

20

10

00 10

Figure 2 Sample of ambiguous objects

Journal of Healthcare Engineering 5

(2) Otherwise if T ϕ vrarrisinA ( c

rarrj) In addition by

property (P2) vrarrisinA ( c

rarrj)

e rough k-means algorithm has stability and reliabilityfor handling ambiguity e rough k-means algorithm hasclustered objects into lower bound and upper bound eobjects in the upper bound are ambiguous objects whereasthe objects in the lower bound are correct objects e upperbound should not be empty and the objects in the upperbound can belong to one or more upper bounds in thecluster numbers Figure 3 shows a snapshot of output ob-tained from the RKM algorithm to determine the ambiguousobjects for improving the performance of machine learningalgorithms e objects in lower bound are correction ob-jects whereas the objects on boundary bound are ambiguousobjects

33 Classification Algorithms In this section conventionalmachine learning algorithms are discussed e automaticclassification namely naive Bayes (NB) support vectormachine (SVM) K-nearest neighbor (KNN) and randomforest tree are presented to predict chronic diseases forenhancing healthcare systems

331 Support Vector Machine Algorithm e supportvector machine is used to analyze data as classification andregression In the SVM algorithm the data point is con-sidered as n-dimensional space where there are a number offeatures of data and the values of features are the values ofa specific coordinate e classification of data is achievedby finding the best difference between the classes of datausing hyperplane A support vector machine algorithmclassifies data by separating the hyperactive plane of labeltraining data e SVM obtains lower error when themargin is large In the present research work two classes ofchronic diseases are used All types of kernel functions areapplied to classify the chronic disease datasets in whichradial basis function (RBF) along with kernel functionobtain high accuracy e kernel function is applied andobserved that the RBF function and kernel function areappropriate with the RKM algorithm to obtain goodaccuracy

k(x x) exp minusx minus x

2σ21113888 1113889 (1)

where x minus x2 is the square Euclidean distance between twofeature vectors and σ is a parameter

332 Naıve Bayes Algorithm Naive Bayes algorithm isdefined as a probabilistic method used to classify the datasetbased on the well-known Bayes theorem of probability enaıve Bayes classification algorithm works as prior proba-bility posterior probability likelihood probability and ev-idence probability It normally uses probabilitydistributions e working Bayesian algorithm is as followsAssume AA1 A2 A3 An is regarded as the featurevector of chronic disease features and the values of the

features are A1 A2 A3 An and are considered as anumber of features in the dataset C indicates a class ofchronic data as normal and abnormal e Bayes equation isshown as follows

Conditional probability

P(C | A) P(A C)

P(A) (2)

P(C | A) P(A C)

P(c) (3)

It is assumed that the predictor A on the given class c isindependent of the values of other predictors and it isknown as conditional class independence P(C | A) is theposterior probability of class c given predictor (feature)

eorem is as follows

P(C | A) P(C) P (A | C)

P(A) (4)

333 K-Nearest Neighbors Algorithm A K-nearest neigh-bors algorithm is a simple machine learning algorithmwhich uses the entire dataset in its training phase KNNalgorithm has low complexity in programming andimplementation e basic idea can be presented in asample space when its nearest neighboring features belongto a category and then the features belong to the samecategory e KNN classification algorithm can be usedwith either a single or a multidimensional feature datasetand can find the closest features It employs the Euclideandistance method for finding the closest point among thefeatures

D

x1 minus x2( 1113857 + y1 minus y2( 11138572

1113969

(5)

334 Random Forest Algorithm A decision tree algorithmis one of the powerful decisions It is used to build the blockof a random forest It works to select the best split of anobject from the dataset in each step To reduce the highharnessed of variance we can create multiple trees withvarious samples of datasets and combine this operation withbootstrap aggregation or bagging e disadvantage of thebootstrap aggregation method is used to spill the values ofeach tree which creates a problem in decision-makingFurthermore it makes predictions of training data similarand mitigates the variance originally sought us therandom forest algorithm can be further used for the clas-sification and regression problems and for the overfitting ofdata as well e selected attributes are measured byemploying the information gain method to discover thevalue or the information from the entire dataset e in-formation gain method is calculated for each splitting at-tribute with selecting high gain attributes It is assumed thatD is the dataset

6 Journal of Healthcare Engineering

info(D) minus 1113944m

i1pilogpi( 1113857 (6)

whereD is the dataset i 1 2 m is the class of dataset Dand the probability is pi

Let B be an attribute in dataset D and b1 b2 b3 bn1113864 1113865

are values of the attributes in B Attributes are a partition forgenerating the amount of information from attributes

info(D) 1113944n

j1

Dj

11138681113868111386811138681113868

11138681113868111386811138681113868

D⎛⎝ ⎞⎠lowast info Dj1113872 1113873 (7)

e attributes show the highest information as follows

Gain(B) info(D) minus infoB(D) (8)

34 Performance Measurement e performance measuresare used to test and evaluate the proposed system eaccuracy specificity and sensitivity precision recall and F-score evaluation matrices have been employed to test theproposed model e evaluation matrices are computed byusing the equations (9)ndash(13) as described below where wehave true positive (TP) true negative (TN) false positive(FP) and false negative (FN)

341 Accuracy Accuracy is the number of correct pre-dictions made by the model over all kinds of predictionsmade It is calculated as the total number of correct labels(TP+TN) divided by the total number of chronic diseasedatasets (P+N)

accuracy TP + TN

TP + FP + FN + TN (9)

342 Specificity Specificity (also called the true negativerate) is a measure that tells us about the percentage ofpatients who do not suffer from chronic diseases which arepredicted by the model as not chronic diseases

specificity TN

TN + FPtimes 100 (10)

343 Sensitivity Sensitivity (also called the true positiverate the recall or probability of detection) is the measurethat tells about the percentage of patients who actually sufferfrom chronic diseases which are diagnosed by the classi-fication algorithms on chronic diseases

specificity TP

TP + FNtimes 100 (11)

344 Precision Precision is a measure that tells about theproportion of patients that we diagnosed as having chronicdiseases actually had chronic diseases It is known aspositive predictive value (PPV)

precision TP

TP + FPtimes 100 (12)

345 F1-Score F1-score (also called the precision F-scoreF-measure and recall) is the harmonic mean (average) of theprecision and recall

F1 minus score 2lowast precisionlowast sensitivityprecision + sensitivity

100 (13)

4 Experimental Results and Discussion

erefore the rough K-means algorithm is applied forimproving the classification of chronic diseases It is used todetermine the ambiguous objects that have obstructed theclassification algorithms It has further experimented withvarious standard chronic datasets It is aimed here to im-prove the diagnosis of chronic diseases In the beginning theconventional classification algorithms are applied to predictchronic diseases However it is observed that the obtainedresults were not appropriate From the obtained results it isnoted that there are ambiguous objects that decrease theaccuracy of machine learning algorithms One of the biggestchallenges that we have faced within the implementation ofthe proposed system is the ambiguity embedded in thevariable of the standard dataset For this reason the RKMalgorithm is considered to handle these ambiguous objectsso that the accuracy of the classification algorithms can beimproved e RKM algorithm is appropriately designed for

Number of iterations 67Centroid for cluster 0 371 14655 7340 3180 30111 3550 059 3389Lower bound for cluster 08 13 16 20 31 43 53 54 56 73 95 99 111 132 139 144 153 162 156 195 199 206 215 22

Boundary region for cluster 04 14 35 39 91 114 130 150 204 214 223 236 243 244 292 293 298 308 309 325 325 338 382Centroid for cluster 1 370 12010 6882 2122 7561 3191 049 3331Lower bound for cluster 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 32 33 34 36 37

Boundary region for cluster 14 14 35 39 91 11 130 150 204 214 223 236 243 244 292 293 298 308 309 325 338 382

Fitness value is 20412240011599756End

Figure 3 Snapshot of output RKM algorithm

Journal of Healthcare Engineering 7

detecting the ambiguity in the chronic disease datasets eexperimental results have shown that the performance of theproposed system is better than that of the conventionalmodels For measuring and evaluating the performance ofthe proposed system the performance measures are appliede standard evaluation matrices namely accuracy spec-ificity sensitivity precision and F-score have been pre-sented to test the proposed system against the existingmachine learning techniques Moreover for validating theproposed system the datasets are divided into 70 train and30 test Numerous experiments have attempted to evaluatethe proposed system e results of machine learning al-gorithms and enhanced model to the various datasets arepresented as follows

41 ClassificationResults ofDiabeticsDisease In this sectiondifferent experiments of classification algorithms with theenhanced proposed system have been conducted e softcomputing rough K-means algorithm is used to handleambiguous objects e ambiguous objects in chronic dis-ease datasets have reduced the performance of machinelearning algorithms When applying the classification al-gorithm on the original diabetics data it is observed that theresults are not favourable From the data it is investigatedthat there are ambiguous objects that hinder the classifi-cation algorithms e diabetes data contain seven instancesand two classes ese ambiguous objects are examined byRKM clustering to assist in determining the exact class ofambiguous diseases or the closest one e dataset has beenclustered for two clusters corresponding into two classes thatare labelled variables in datasets e RKM algorithm hasclustered the ambiguous objects into upper approximationand lower approximationose objects that belong to upperapproximation which belongs to one or more clusternumbers are excluded Among 768 instances 718 instancesare clustered as a lower approximation Moreover theremaining objects are clustered as an upper approximationand are considered as ambiguous objects as well e am-biguous objects have been denied from the data e clas-sification algorithm is applied to process the data in a lowerapproximation for diagnosing the diseases Table 4 shows theresults obtained from the RKM algorithm used for dis-covering the ambiguous objects

Table 5 shows the results of the classification algorithmnamely naıve Bayes SVM random forest tree and KNN Itis observed that the obtained results are needed to improvee rough K-means is applied to enhance the existingmachine learning Table 6 shows the results of machinelearning techniques with RKM algorithm e roughK-means is used to deal with ambiguous objects It is ob-served that the RKM algorithm has improved the results ofthe classification algorithms e results of naıve Bayes withRKM are 8055 8014 8014 90 and 8478 in termsof accuracy sensitivity specificity precision and F-scorerespectively Similarly the results of SVM with RKM ap-proaches are 7778 7724 7887 8819 and 8235 corre-sponding to accuracy sensitivity and specificity precisionand F-score in that order e results of random forest with

RKM are 7720 5609 6905 and 620 Furthermore theobtained results by using KNN with RKM are 71307929 5658 7708 7708 and 7870 Finally fromthe obtained data it is investigated that the classificationalgorithm is improved by using the RKM algorithmFigures 4ndash7 show the performance of the classification al-gorithms with the RKM algorithm

42 Classification Results of Kidney Disease is sectiondemonstrates the classification of kidney diseases with thehelp of machine learning algorithms and the enhancedproposed system Table 7 shows the results obtained fromthe RKM algorithm to figure out the ambiguous objectsKidney diseases contain 400 instancese data are clusteredinto two clusters e rough K-means clusters data intoupper approximation and lower approximation e objectsthat have been clustered in lower approximation are 174instances ose objects that belong to lower approximationare regarded as approved objects because they belong to thesame cluster numbers e remaining objects that is 226objects are clustered in upper approximation which isconsidered as ambiguous objects

Table 8 shows the performance of existing machinelearning algorithms namely naıve Bayes SVM randomforest tree and KNN It is observed that there is a possibilityfor improving the classification algorithms if they handleambiguous objects It is also noted that the RKM algorithmhas improved the results of the classification algorithmsTable 9 shows results of machine learning techniques withthe RKM algorithm e results of naıve Bayes with RKMare 9811 9643 9615 9615 and 980478 withrespect to the evaluation matrices Similarly results ofSVM with RKM approaches are 100 100 100 100and 100 in terms of accuracy sensitivity and specificityprecision and F-score respectively e results of randomforest with RKM are 100 100 100 100 and 9802Furthermore the obtained results by using KNN withRKM are 8491 8065 9091 9259 and 8621Lastly from the obtained results it is found that theclassification algorithm is improved by using the RKMalgorithm Figures 8ndash11 display the performance of theclassification algorithms with the RKM algorithm forpredicting kidney diseases

43 Classification Results of Cancer Disease is sectionshows the classification of cancer disease using the existingmachine learning algorithms and the proposed system usingthe RKM algorithm e cancer data contain 569 instancesthat are classified into two classes such as benign and ma-lignant e soft computing RKM clustering algorithm isused to handle ambiguous objects It clusters data into two

Table 4 Results of RKM algorithm for handling ambiguous objectsin the diabetic dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 718 50

8 Journal of Healthcare Engineering

Table 5 Results of existing machine learning for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 6969 7072 66 8819 8235SVM 7316 7379 7209 8168 8235Random forest 76 556 656 556 6002KNN 6580 5679 5658 51 5380

Table 6 Results of machine learning after handling ambiguous objects for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 8055 8014 8014 90 8478SVM with RKM 7778 7724 7887 8819 8235Random forest with RKM 7720 569 6905 569 6206KNN with RKM 7130 7929 7067 7708 7817

69698055

70728014

66

80148819 90

82358478

0102030405060708090

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 4 Comparison results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for diabetic diseases

7316

77787379

7724

7209

78878168

8819

8235 8235

50

55

60

65

70

75

80

85

90

95

100

SVM SVM with RKM

Precision ()Accuracy ()F-score ()Sensitivity ()

Specificity ()

Figure 5 Comparison results of existing SVM classifier and SVM using RKM algorithm for diabetic diseases

Journal of Healthcare Engineering 9

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 2: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

required corrective measures accordingly Factors such assmoking physical inactivity food diet and insufficient orexcessive alcohol consumption could largely contribute tochronic diseases Previous studies have identified chronicdiseases as the seventh cause of death among other causes Inthe United States they resulted in 658 of deaths among USmales and 672 of deaths among US females in 2010 [1]Heart cancer diabetes asthma and kidney diseases areidentified as chronic diseases In addition chronic diseasesare measured as noncommunicable diseases they slowly endthe life of people in a long period Chronic diseases do nottransfer from one person to another In the United Stateschronic diseases drive up medicinal service expenses andbreak up human services reasonably ey possess an es-sential part of the economy and thwart the health quality ofpeople is study promotes the classification of chronicdisease conditions namely cancer kidney asthma anddiabetes e World Health Organization reported in 2002that mortality dreariness and incapacity were credited tothe major chronic diseases Currently records show that60 of all deaths and 43 of the global weight of illnesses areattributed to chronic diseases By 2020 it is expected that thepercentage of deaths will reach 73 of total deaths and 60of the global weight of sicknesses [2] With the help ofmachine learning algorithms predicting chronic diseaseshas become an easy task erefore it is aimed here todevelop a surveillance system to predict and diagnosehealthcare data for helping the health communities

e machine learning algorithms increase the level ofindividual prediction and therefore diagnosis will ultimatelystrengthen anticipation efforts e availability of wellprevention measures will not only enhance or provide goodhealth for persons but also reduce healthcare spendingMachine learning techniques have been used widely in thehealthcare domain ey have now become crucial tools forhealthcare management ey have also assisted in im-proving health care by using the prediction measures forepidemic diseases faced by people around the world eWorld Health Organization (WHO) has significantlybenefited from the employment of machine learning ap-plications that improve the quality of health care

Machine learning algorithms are considered to beclassification clustering and prediction for the sake ofsolving various issues in real-time applicationsey providean assurance of the classification and prediction solutions forstability and reliability in performance Based on machinelearning algorithms a few researchers have developedsuccessful healthcare systems Algorithms include statisticsSVM decision trees clustering and optimization algorithmsand others Machine learning applications rely largely ondatasets that analyze and discover the patterns that are usedto solve specific tasks e healthcare system has the po-tential promotion in the health domain to extract anddiscover the hidden patterns in the database [3] us theavailable healthcare data are universally scattered and am-biguousey may also contain insufficient and insignificantinformation stored in terms of the constancy in predictionand classification One of the biggest challenges of healthcaredata and its information is the accurate diagnosis of certain

significant information To predict and analyze the chronicdiseases such as kidney diabetic cancer and heart diseasesthere are several proposed machine learning algorithms thatcan be usedese algorithms include the decision tree (DT)SVM ANN linear regression (LR) KNN NB and timeseries prediction models Because of the rapid innovationand continuous changes in software engineering a hugevolume of information can be generated With the devel-opment of a healthcare database management system therewill be more opportunities for the enhancement of thehealthcare systems Extracting patterns from these datasetsand managing large amounts of dimensionality data havebecome a major field of machine learning e machinelearning algorithm is considered to be the classification ofhealthcare datasets to obtain useful knowledge that can helphealth officials and communities To apply machine learningalgorithms that enhance the performance of the classifica-tion process the preprocessing of the soft clustering algo-rithm is required

e remaining parts of the article are organized intosections Introduction is discussed in Section 1 relatedstudies are given in Section 2 data andmethods are shown inSection 3 and results and discussion are shown in Section 4Lastly conclusion is presented in Section 5

2 Related Studies

ere is a considerable number of research works that havebeen done in relation to the classification and the predictionof healthcare data Solanki [4] proposed most of the classifieralgorithms on the Weka tool for predicting the prevalentsickle cell disease e obtained results compared with theclassifiers are available on the Weka data mining tool It isobserved that the random tree approach is a better algorithmfor classifying sickle cell Similarly Joshi et al [5] used anumber of machine learning approaches such as Bayes netlogistic model tree (LMT) multilayer perception stochasticgradient descent and sequential minimal optimizationtechniques ese researchers suggested using LMT algo-rithms for diagnosing breast cancer because of its highperformance and accuracy Furthermore David et al [6]applied the KNN algorithm Bayesian network decision treealgorithm and random tree method namely the J48 tree topredict leukemia disease Accordingly it was found that thedecision tree algorithm had shown better accuracy in theresult In one more study conducted by Vijayarani andSudha [7] LMT and the sequential minimal optimizationmultilayer and perceptron algorithms are employed topredict heart diseases Furthermore the study conducted bySugandhi et al [8] proposed a random tree algorithm for theclassification of heart diseases e outcome of the researchhas shown that random tree gives a better performance thanother classification algorithms Consequently for obtainingresults from a random tree classifier it is found that therandom tree classifier is outperformede study of Yasodhaand Kannan [9] has also reported that the Weka classifi-cation algorithm was used for analyzing and predicting thediabetic patientrsquos database Likewise Bin Othman and Yau[10] have compared various classification approaches with

2 Journal of Healthcare Engineering

theWeka data mining tool for predicting breast cancer Israa[11] has applied NB decision tree (DT) random forest andsupport vector machine techniques to improve the classi-fication of heart diseases On the other hand D Sisodia andSisodia [12] have proposed three machine learning algo-rithms that is DT SVM and naive Bayes (NB) to detectdiabetes us experiments have been done by usingstandard data from the UCI machine learning repository Itis observed that the NB approach outperforms as comparedwith other algorithms which have an accuracy rate of7630 e study of Syed et al [13] has employed SVMBayesian network and decision tree algorithms to predictthe obesity of schoolchildren Sandeep et al [14] proposedlinear discriminate analysis (LDA) NB random forest LRand quadratic discriminate analysis (QDA) for the analysisand classification of chronic kidney diseases e study ofSahana and Minavathi [15] has also focused on predictingkidney disease using classification algorithms namely ANNand C45 It concentrated on accurate prediction and timefactor performance e ANN and C45 algorithms are usedfor helping out the medical practitioner to give propermedication and medical treatment K Polaraju and Prasad[16] have proposed a multiple regression model to classifychronic heart disease It is proved that the multiple linearregression model is favourable for predicting heart diseases

In this research the training dataset consists of 3000values with 13 different features From the experimentalresults it is shown that the regression algorithm performsbetter than other algorithms Kim et al [17] have proposedthe character-recurrent neural network (Char-RNN) modelto predict chronic diseases ey have collected data fromthe Korean National Health and Nutrition ExaminationSurvey (KNHANES) It is observed that the Char-RNNmodel obtained higher accuracy than the conventionalmultilayer perceptron model Ng et al [18] have used ma-chine learning algorithms to detect heart failure Moreoverelectronic health record data are used to predict events andthe onset of diseases Zhang et al [19] have proposed aconvolution neural network (CNN) architecture namedGroup Net to predict chronic diseases us the experi-mental analysis is conducted using data from local medicalcenters ey have noted that CNN has achieved the bestaccuracy Kriplani et al [20] have used deep learning topredict chronic kidney disease e proposed models aretested by using standard datasets of diseases available on theUCI e analysis results have appeared better in usingcross-validation performance Liu et al [21] proposed CNNLSTM and hierarchical models to predict chronic diseasesBrisimi et al [22] have applied four machine learning al-gorithms namely SVM kernelized sparse logistic regres-sion and random forests to predict chronic heart anddiabetic diseases ey have gathered standard data fromelectronic health records (EHRs) Chen et al [23] usedstreamline machine learning techniques to predict chronicdisease epidemics eir experiment has proposed predic-tion models using real-life hospital data gathered fromcentral China in 2013ndash2015e convolution neural networkis implemented as based on multimodal disease risk pre-diction (CNN-MDRP) algorithm using structured and

unstructured data from hospitals Patel et al [24] havedeveloped a system using three classifiers such as KStarSMO and J48 Bayes net and multilayer perception neuralnetwork algorithms with the help of Weka software toclassify heart diseases It is observed that the Bayes net hasaccomplished optimum performance as compared withfurther classification algorithms namely KStar multilayerperception and J48 approaches by using the k-fold cross-validation method e research of Deepika and Seema [25]also designed a system to predict chronic diseases via ma-chine learning algorithms such as naıve Bayes decision treeSVM and ANN A comparative analysis of the performancesof algorithms is presented It is observed that the support ofthe vector machine and the naıve Bayes provides the highestaccuracy rate when predicting the diabetic disease Ul Haqet al [26] have suggested different machine learning algo-rithms such as naive Bayes classification tree KNN logisticregression SVM and ANN to predict heart diseases reefeature selection methods are applied to improve the clas-sification algorithms It is concluded that the feature se-lectionmethod increases the performance of the classifier forpredicting heart diseases Ahmed et al [27] proposed a fuzzylogic algorithm to classify kidney diseases Coacci et al [28]used two classification approaches namely logistic regres-sion and ANN Xun et al [29] presented ANN and naıveBayes classifiers for predicting chronic diseases Some re-searchers used the UCI machine learning repository fortesting proposed models such as chronic diseases [30 31]diabetic disease [32 33] and breast cancer [34] Using thedeep learning algorithm to predict chronic diseases Kimet al [17] proposed nature-inspired computing algorithmsfor the diagnosis of chronic diseases [35] and employedmachine learning algorithm to develop E-health for thediagnosis of chronic diseases [36]

In the current research article traditional machinelearning algorithms are employed for predicting chronicdiseases erefore the result of the existing classificationalgorithms is needed to make the healthcare system morereliable Subsequently the soft clustering algorithm is ap-plied to increase the accuracy of classification algorithms

3 Materials and Methods

e proposed model is designed explicitly to classify chronicdiseases using machine learning algorithms Figure 1 dis-plays the proposed system that combines the existing ma-chine learning algorithms with the rough k-means clusteringtechnique Noncrisp rough k-means algorithm is demon-strated to handle the ambiguous objects ese ambiguousobjects obstruct the performance of machine learning al-gorithms e RKM clustering has clustered data into twoclusters us it is used to measure the roughness of theobjects Moreover the threshold value is also used tomaximize the roughness of objects for reducing the am-biguous objects e threshold value parameter plays a verysignificant role in making the program of the noncrisp al-gorithm It has experimented and found out that thethreshold value is 14 e RKM algorithm is used to dealwith ambiguous objects for improving the classification

Journal of Healthcare Engineering 3

algorithme RKM algorithm has clustered data into lowerapproximation and upper approximation in which theclustered objects in the lower approximation are consideredbut the objects clustered in upper approximation are ex-cluded e novelty of the proposed model has used roughk-means to handle the ambiguous objects belonging to lowerapproximation that is processed with the help of machinelearning algorithms e rough k-means clustering is pro-posed to explicitly determine ambiguous objects To close itis investigated that the results of the proposed system haveoutperformed all the alternative models used for measuringthe performance e detailed description of the proposedsystem is discussed in the following subsections

31 Datasets e chronic disease datasets have been col-lected from the different resources as follows

311 Diabetic Disease Dataset e diabetes data collectedfrom the machine learning repository contained nine at-tributes eight features and one class is dataset has beengathered from an automatic electronic recording device andpaper records [37] Table 1 shows the features of data

312 Breast Cancer Disease Dataset e cancer data col-lected from the Kaggle contained nine attributes 30 featuresand 1 class ese features are obtained from digitizedimages of breast cancer [38] Table 2 shows the features ofdata

313 Kidney Disease Dataset e collected kidney datafrom the Kaggle contained 26 attributes 24 features and 1class [38] Table 3 shows the features of data

32HandlingAmbiguity Machine learning algorithms havesucceeded in a number of real-time applications such asimage processing recognition video recognition marketingprediction weather forecasting and network security econventional machine learning algorithms are used toidentify the objects belonging to exactly one class In dataanalysis it may be possible that an object shows the char-acteristics of different classes [39] In that event an objectshould belong to more than one class and as a result objectboundaries should necessarily overlap e machinelearning algorithms categorize an object into one classprecisely Figure 2 shows the ambiguous data Such re-quirement is found to be too restrictive in a number of real-time applications

In Figure 2 the basic example of ambiguous objects canbe noted It clearly shows the three separate classes Hence itis observed that five objects are not classified under anyprecise class Henceforth these five objects decrease theperformance of machine learning algorithms us it isrequired to determine such ambiguous objects and deal withthem before applying machine learning algorithms For this

Data sets

Rough K-means

Clustering data

Clustering data into twoclusters

Threshold value ofroughness (14)

No

Threshold value ofroughness

appropriate

Upperapproximate Lower approximate

Machine learningalgorithms

Performancemeasurement

Ambiguous objects Correct objects

Denied

Yes

Figure 1 Proposed model

Table 1 Features of diabetic dataset

Feature name CategoriesPregnancies NumericGlucose NumericBlood pressure NumericSkin thickness NumericInsulin NumericBMI NumericDiabetes pedigree function NumericAge Numeric

Class Nominal diabeticsnot diabetics (0 1)

4 Journal of Healthcare Engineering

issue the present research work applies the RKM techniqueto recognize ambiguous packets from chronic diseasedatasets e detailed description of the RKM algorithmemployed for identifying the ambiguous objects is presentedin the subsequent subsections

321 Rough K-Means Clustering Algorithm e proposedRKM clustering approach is based on a simple K-meansclustering [40ndash42] Peters [43] enhanced the algorithm of[40] (original proposal) by calculating rough centroid usingratios of distances as new proposals to differentiate betweensimilar distances Joshi and Lingras [44] used RKM andECM clustering algorithms to handle high dimensional dataAldhyani and Joshi [39] used the rough K-means and ECMclustering algorithms to handle ambiguous objects of in-trusion detection e rough K-means approach is designedto determine the ambiguous objects that belong to the upperboundary of clusters Cluster the data as lower approxi-mation and upper approximation e rough K-Meansrepresents each

(P1) An object xrarr can be part of at most one lower

approximation (lower bound)(P2) xisinA( c

rarri) rArr c

rarrisinA( crarr

i)

(P3) An object xrarr is not part of any lower

approximationxrarr belongs to two or more upper approximations(upper bound)

Overall ideas of soft clustering are more appropriate todeal with ambiguous objects When the algorithm is pro-cessed all objects are assigned wlower and wupper For eachobject vector v

rarr let d ( vrarr c

rarrj) be the distance between itself

and the centroid of cluster crarr

j Let d ( vrarr c

rarri)min 1 le j le k

d ( vrarr c

rarrj) e ratios d ( v

rarr crarr

j)d ( vrarr c

rarrj) 1le I jle k are

used to determine the membership of vrarr Let T j d ( v

rarrc

rarrj)d ( v

rarr crarr

j)ge threshold and ine j

(1) If T ϕ vrarrisinA ( c

rarrj) and v

rarrisinA ( crarr

j) forallj isinT Fur-thermore v

rarr is not part of any lower approximatione above criterion guarantees that property (P3) issatisfied

Table 2 Features of cancer dataset

Feature name CategoriesId NumericRadius mean NumericTexture_mean NumericPerimeter_mean NumericArea_mean NumericSmoothness_mean NumericCompactness_mean NumericConcavity_mean NumericCONCAVE points_mean NumericSymmetry_mean NumericFractal_dimension_mean NumericRadius_se NumericTexture_se NumericPerimeter_se NumericArea_se NumericSmoothness_se NumericCompactness_se NumericConcavity_se NumericConcave points_se NumericSymmetry_se NumericFractal_dimension_se NumericRadius_worst NumericTexture_worst NumericPerimeter_worst NumericArea_worst NumericSmoothness_worst NumericCompactness_worst NumericConcavity_worst NumericConcave points_worst NumericSymmetry_worst NumericFractal_dimension_worst Numeric

Diagnosis Nominal (Mmalignantand B benign)

Table 3 Features of kidney dataset

Feature name CategoriesAge Numericbp blood pressure Numeric

sg specific gravity Nominal 1005 10101015 1020 1025

al albumin Nominal 0 1 2 3 4 5su sugar Nominal 0 1 2 3 4 5rbc red blood cells Nominal 0 1pc pus cell Nominal 0 1pcc pus cell clumps Nominal 0 1ba bacteria Nominal 0 1bgr blood glucose random Numericbu blood urea Numericsc serum creatinine Numericsod sodium Numericpot potassium Numerichemo hemoglobin Numericpcv packed cell volume Numericwc white blood cell count Numericrc red blood cell count Numerichtn hypertension Nominaldm diabetes mellitus Nominalcad coronary artery disease Nominalappet appetite Nominalpe pedal edema Nominalane anemia Nominalclass class Nominal CKD not CKD

60

60

50

50

40

40

30

30

20

20

10

00 10

Figure 2 Sample of ambiguous objects

Journal of Healthcare Engineering 5

(2) Otherwise if T ϕ vrarrisinA ( c

rarrj) In addition by

property (P2) vrarrisinA ( c

rarrj)

e rough k-means algorithm has stability and reliabilityfor handling ambiguity e rough k-means algorithm hasclustered objects into lower bound and upper bound eobjects in the upper bound are ambiguous objects whereasthe objects in the lower bound are correct objects e upperbound should not be empty and the objects in the upperbound can belong to one or more upper bounds in thecluster numbers Figure 3 shows a snapshot of output ob-tained from the RKM algorithm to determine the ambiguousobjects for improving the performance of machine learningalgorithms e objects in lower bound are correction ob-jects whereas the objects on boundary bound are ambiguousobjects

33 Classification Algorithms In this section conventionalmachine learning algorithms are discussed e automaticclassification namely naive Bayes (NB) support vectormachine (SVM) K-nearest neighbor (KNN) and randomforest tree are presented to predict chronic diseases forenhancing healthcare systems

331 Support Vector Machine Algorithm e supportvector machine is used to analyze data as classification andregression In the SVM algorithm the data point is con-sidered as n-dimensional space where there are a number offeatures of data and the values of features are the values ofa specific coordinate e classification of data is achievedby finding the best difference between the classes of datausing hyperplane A support vector machine algorithmclassifies data by separating the hyperactive plane of labeltraining data e SVM obtains lower error when themargin is large In the present research work two classes ofchronic diseases are used All types of kernel functions areapplied to classify the chronic disease datasets in whichradial basis function (RBF) along with kernel functionobtain high accuracy e kernel function is applied andobserved that the RBF function and kernel function areappropriate with the RKM algorithm to obtain goodaccuracy

k(x x) exp minusx minus x

2σ21113888 1113889 (1)

where x minus x2 is the square Euclidean distance between twofeature vectors and σ is a parameter

332 Naıve Bayes Algorithm Naive Bayes algorithm isdefined as a probabilistic method used to classify the datasetbased on the well-known Bayes theorem of probability enaıve Bayes classification algorithm works as prior proba-bility posterior probability likelihood probability and ev-idence probability It normally uses probabilitydistributions e working Bayesian algorithm is as followsAssume AA1 A2 A3 An is regarded as the featurevector of chronic disease features and the values of the

features are A1 A2 A3 An and are considered as anumber of features in the dataset C indicates a class ofchronic data as normal and abnormal e Bayes equation isshown as follows

Conditional probability

P(C | A) P(A C)

P(A) (2)

P(C | A) P(A C)

P(c) (3)

It is assumed that the predictor A on the given class c isindependent of the values of other predictors and it isknown as conditional class independence P(C | A) is theposterior probability of class c given predictor (feature)

eorem is as follows

P(C | A) P(C) P (A | C)

P(A) (4)

333 K-Nearest Neighbors Algorithm A K-nearest neigh-bors algorithm is a simple machine learning algorithmwhich uses the entire dataset in its training phase KNNalgorithm has low complexity in programming andimplementation e basic idea can be presented in asample space when its nearest neighboring features belongto a category and then the features belong to the samecategory e KNN classification algorithm can be usedwith either a single or a multidimensional feature datasetand can find the closest features It employs the Euclideandistance method for finding the closest point among thefeatures

D

x1 minus x2( 1113857 + y1 minus y2( 11138572

1113969

(5)

334 Random Forest Algorithm A decision tree algorithmis one of the powerful decisions It is used to build the blockof a random forest It works to select the best split of anobject from the dataset in each step To reduce the highharnessed of variance we can create multiple trees withvarious samples of datasets and combine this operation withbootstrap aggregation or bagging e disadvantage of thebootstrap aggregation method is used to spill the values ofeach tree which creates a problem in decision-makingFurthermore it makes predictions of training data similarand mitigates the variance originally sought us therandom forest algorithm can be further used for the clas-sification and regression problems and for the overfitting ofdata as well e selected attributes are measured byemploying the information gain method to discover thevalue or the information from the entire dataset e in-formation gain method is calculated for each splitting at-tribute with selecting high gain attributes It is assumed thatD is the dataset

6 Journal of Healthcare Engineering

info(D) minus 1113944m

i1pilogpi( 1113857 (6)

whereD is the dataset i 1 2 m is the class of dataset Dand the probability is pi

Let B be an attribute in dataset D and b1 b2 b3 bn1113864 1113865

are values of the attributes in B Attributes are a partition forgenerating the amount of information from attributes

info(D) 1113944n

j1

Dj

11138681113868111386811138681113868

11138681113868111386811138681113868

D⎛⎝ ⎞⎠lowast info Dj1113872 1113873 (7)

e attributes show the highest information as follows

Gain(B) info(D) minus infoB(D) (8)

34 Performance Measurement e performance measuresare used to test and evaluate the proposed system eaccuracy specificity and sensitivity precision recall and F-score evaluation matrices have been employed to test theproposed model e evaluation matrices are computed byusing the equations (9)ndash(13) as described below where wehave true positive (TP) true negative (TN) false positive(FP) and false negative (FN)

341 Accuracy Accuracy is the number of correct pre-dictions made by the model over all kinds of predictionsmade It is calculated as the total number of correct labels(TP+TN) divided by the total number of chronic diseasedatasets (P+N)

accuracy TP + TN

TP + FP + FN + TN (9)

342 Specificity Specificity (also called the true negativerate) is a measure that tells us about the percentage ofpatients who do not suffer from chronic diseases which arepredicted by the model as not chronic diseases

specificity TN

TN + FPtimes 100 (10)

343 Sensitivity Sensitivity (also called the true positiverate the recall or probability of detection) is the measurethat tells about the percentage of patients who actually sufferfrom chronic diseases which are diagnosed by the classi-fication algorithms on chronic diseases

specificity TP

TP + FNtimes 100 (11)

344 Precision Precision is a measure that tells about theproportion of patients that we diagnosed as having chronicdiseases actually had chronic diseases It is known aspositive predictive value (PPV)

precision TP

TP + FPtimes 100 (12)

345 F1-Score F1-score (also called the precision F-scoreF-measure and recall) is the harmonic mean (average) of theprecision and recall

F1 minus score 2lowast precisionlowast sensitivityprecision + sensitivity

100 (13)

4 Experimental Results and Discussion

erefore the rough K-means algorithm is applied forimproving the classification of chronic diseases It is used todetermine the ambiguous objects that have obstructed theclassification algorithms It has further experimented withvarious standard chronic datasets It is aimed here to im-prove the diagnosis of chronic diseases In the beginning theconventional classification algorithms are applied to predictchronic diseases However it is observed that the obtainedresults were not appropriate From the obtained results it isnoted that there are ambiguous objects that decrease theaccuracy of machine learning algorithms One of the biggestchallenges that we have faced within the implementation ofthe proposed system is the ambiguity embedded in thevariable of the standard dataset For this reason the RKMalgorithm is considered to handle these ambiguous objectsso that the accuracy of the classification algorithms can beimproved e RKM algorithm is appropriately designed for

Number of iterations 67Centroid for cluster 0 371 14655 7340 3180 30111 3550 059 3389Lower bound for cluster 08 13 16 20 31 43 53 54 56 73 95 99 111 132 139 144 153 162 156 195 199 206 215 22

Boundary region for cluster 04 14 35 39 91 114 130 150 204 214 223 236 243 244 292 293 298 308 309 325 325 338 382Centroid for cluster 1 370 12010 6882 2122 7561 3191 049 3331Lower bound for cluster 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 32 33 34 36 37

Boundary region for cluster 14 14 35 39 91 11 130 150 204 214 223 236 243 244 292 293 298 308 309 325 338 382

Fitness value is 20412240011599756End

Figure 3 Snapshot of output RKM algorithm

Journal of Healthcare Engineering 7

detecting the ambiguity in the chronic disease datasets eexperimental results have shown that the performance of theproposed system is better than that of the conventionalmodels For measuring and evaluating the performance ofthe proposed system the performance measures are appliede standard evaluation matrices namely accuracy spec-ificity sensitivity precision and F-score have been pre-sented to test the proposed system against the existingmachine learning techniques Moreover for validating theproposed system the datasets are divided into 70 train and30 test Numerous experiments have attempted to evaluatethe proposed system e results of machine learning al-gorithms and enhanced model to the various datasets arepresented as follows

41 ClassificationResults ofDiabeticsDisease In this sectiondifferent experiments of classification algorithms with theenhanced proposed system have been conducted e softcomputing rough K-means algorithm is used to handleambiguous objects e ambiguous objects in chronic dis-ease datasets have reduced the performance of machinelearning algorithms When applying the classification al-gorithm on the original diabetics data it is observed that theresults are not favourable From the data it is investigatedthat there are ambiguous objects that hinder the classifi-cation algorithms e diabetes data contain seven instancesand two classes ese ambiguous objects are examined byRKM clustering to assist in determining the exact class ofambiguous diseases or the closest one e dataset has beenclustered for two clusters corresponding into two classes thatare labelled variables in datasets e RKM algorithm hasclustered the ambiguous objects into upper approximationand lower approximationose objects that belong to upperapproximation which belongs to one or more clusternumbers are excluded Among 768 instances 718 instancesare clustered as a lower approximation Moreover theremaining objects are clustered as an upper approximationand are considered as ambiguous objects as well e am-biguous objects have been denied from the data e clas-sification algorithm is applied to process the data in a lowerapproximation for diagnosing the diseases Table 4 shows theresults obtained from the RKM algorithm used for dis-covering the ambiguous objects

Table 5 shows the results of the classification algorithmnamely naıve Bayes SVM random forest tree and KNN Itis observed that the obtained results are needed to improvee rough K-means is applied to enhance the existingmachine learning Table 6 shows the results of machinelearning techniques with RKM algorithm e roughK-means is used to deal with ambiguous objects It is ob-served that the RKM algorithm has improved the results ofthe classification algorithms e results of naıve Bayes withRKM are 8055 8014 8014 90 and 8478 in termsof accuracy sensitivity specificity precision and F-scorerespectively Similarly the results of SVM with RKM ap-proaches are 7778 7724 7887 8819 and 8235 corre-sponding to accuracy sensitivity and specificity precisionand F-score in that order e results of random forest with

RKM are 7720 5609 6905 and 620 Furthermore theobtained results by using KNN with RKM are 71307929 5658 7708 7708 and 7870 Finally fromthe obtained data it is investigated that the classificationalgorithm is improved by using the RKM algorithmFigures 4ndash7 show the performance of the classification al-gorithms with the RKM algorithm

42 Classification Results of Kidney Disease is sectiondemonstrates the classification of kidney diseases with thehelp of machine learning algorithms and the enhancedproposed system Table 7 shows the results obtained fromthe RKM algorithm to figure out the ambiguous objectsKidney diseases contain 400 instancese data are clusteredinto two clusters e rough K-means clusters data intoupper approximation and lower approximation e objectsthat have been clustered in lower approximation are 174instances ose objects that belong to lower approximationare regarded as approved objects because they belong to thesame cluster numbers e remaining objects that is 226objects are clustered in upper approximation which isconsidered as ambiguous objects

Table 8 shows the performance of existing machinelearning algorithms namely naıve Bayes SVM randomforest tree and KNN It is observed that there is a possibilityfor improving the classification algorithms if they handleambiguous objects It is also noted that the RKM algorithmhas improved the results of the classification algorithmsTable 9 shows results of machine learning techniques withthe RKM algorithm e results of naıve Bayes with RKMare 9811 9643 9615 9615 and 980478 withrespect to the evaluation matrices Similarly results ofSVM with RKM approaches are 100 100 100 100and 100 in terms of accuracy sensitivity and specificityprecision and F-score respectively e results of randomforest with RKM are 100 100 100 100 and 9802Furthermore the obtained results by using KNN withRKM are 8491 8065 9091 9259 and 8621Lastly from the obtained results it is found that theclassification algorithm is improved by using the RKMalgorithm Figures 8ndash11 display the performance of theclassification algorithms with the RKM algorithm forpredicting kidney diseases

43 Classification Results of Cancer Disease is sectionshows the classification of cancer disease using the existingmachine learning algorithms and the proposed system usingthe RKM algorithm e cancer data contain 569 instancesthat are classified into two classes such as benign and ma-lignant e soft computing RKM clustering algorithm isused to handle ambiguous objects It clusters data into two

Table 4 Results of RKM algorithm for handling ambiguous objectsin the diabetic dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 718 50

8 Journal of Healthcare Engineering

Table 5 Results of existing machine learning for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 6969 7072 66 8819 8235SVM 7316 7379 7209 8168 8235Random forest 76 556 656 556 6002KNN 6580 5679 5658 51 5380

Table 6 Results of machine learning after handling ambiguous objects for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 8055 8014 8014 90 8478SVM with RKM 7778 7724 7887 8819 8235Random forest with RKM 7720 569 6905 569 6206KNN with RKM 7130 7929 7067 7708 7817

69698055

70728014

66

80148819 90

82358478

0102030405060708090

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 4 Comparison results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for diabetic diseases

7316

77787379

7724

7209

78878168

8819

8235 8235

50

55

60

65

70

75

80

85

90

95

100

SVM SVM with RKM

Precision ()Accuracy ()F-score ()Sensitivity ()

Specificity ()

Figure 5 Comparison results of existing SVM classifier and SVM using RKM algorithm for diabetic diseases

Journal of Healthcare Engineering 9

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 3: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

theWeka data mining tool for predicting breast cancer Israa[11] has applied NB decision tree (DT) random forest andsupport vector machine techniques to improve the classi-fication of heart diseases On the other hand D Sisodia andSisodia [12] have proposed three machine learning algo-rithms that is DT SVM and naive Bayes (NB) to detectdiabetes us experiments have been done by usingstandard data from the UCI machine learning repository Itis observed that the NB approach outperforms as comparedwith other algorithms which have an accuracy rate of7630 e study of Syed et al [13] has employed SVMBayesian network and decision tree algorithms to predictthe obesity of schoolchildren Sandeep et al [14] proposedlinear discriminate analysis (LDA) NB random forest LRand quadratic discriminate analysis (QDA) for the analysisand classification of chronic kidney diseases e study ofSahana and Minavathi [15] has also focused on predictingkidney disease using classification algorithms namely ANNand C45 It concentrated on accurate prediction and timefactor performance e ANN and C45 algorithms are usedfor helping out the medical practitioner to give propermedication and medical treatment K Polaraju and Prasad[16] have proposed a multiple regression model to classifychronic heart disease It is proved that the multiple linearregression model is favourable for predicting heart diseases

In this research the training dataset consists of 3000values with 13 different features From the experimentalresults it is shown that the regression algorithm performsbetter than other algorithms Kim et al [17] have proposedthe character-recurrent neural network (Char-RNN) modelto predict chronic diseases ey have collected data fromthe Korean National Health and Nutrition ExaminationSurvey (KNHANES) It is observed that the Char-RNNmodel obtained higher accuracy than the conventionalmultilayer perceptron model Ng et al [18] have used ma-chine learning algorithms to detect heart failure Moreoverelectronic health record data are used to predict events andthe onset of diseases Zhang et al [19] have proposed aconvolution neural network (CNN) architecture namedGroup Net to predict chronic diseases us the experi-mental analysis is conducted using data from local medicalcenters ey have noted that CNN has achieved the bestaccuracy Kriplani et al [20] have used deep learning topredict chronic kidney disease e proposed models aretested by using standard datasets of diseases available on theUCI e analysis results have appeared better in usingcross-validation performance Liu et al [21] proposed CNNLSTM and hierarchical models to predict chronic diseasesBrisimi et al [22] have applied four machine learning al-gorithms namely SVM kernelized sparse logistic regres-sion and random forests to predict chronic heart anddiabetic diseases ey have gathered standard data fromelectronic health records (EHRs) Chen et al [23] usedstreamline machine learning techniques to predict chronicdisease epidemics eir experiment has proposed predic-tion models using real-life hospital data gathered fromcentral China in 2013ndash2015e convolution neural networkis implemented as based on multimodal disease risk pre-diction (CNN-MDRP) algorithm using structured and

unstructured data from hospitals Patel et al [24] havedeveloped a system using three classifiers such as KStarSMO and J48 Bayes net and multilayer perception neuralnetwork algorithms with the help of Weka software toclassify heart diseases It is observed that the Bayes net hasaccomplished optimum performance as compared withfurther classification algorithms namely KStar multilayerperception and J48 approaches by using the k-fold cross-validation method e research of Deepika and Seema [25]also designed a system to predict chronic diseases via ma-chine learning algorithms such as naıve Bayes decision treeSVM and ANN A comparative analysis of the performancesof algorithms is presented It is observed that the support ofthe vector machine and the naıve Bayes provides the highestaccuracy rate when predicting the diabetic disease Ul Haqet al [26] have suggested different machine learning algo-rithms such as naive Bayes classification tree KNN logisticregression SVM and ANN to predict heart diseases reefeature selection methods are applied to improve the clas-sification algorithms It is concluded that the feature se-lectionmethod increases the performance of the classifier forpredicting heart diseases Ahmed et al [27] proposed a fuzzylogic algorithm to classify kidney diseases Coacci et al [28]used two classification approaches namely logistic regres-sion and ANN Xun et al [29] presented ANN and naıveBayes classifiers for predicting chronic diseases Some re-searchers used the UCI machine learning repository fortesting proposed models such as chronic diseases [30 31]diabetic disease [32 33] and breast cancer [34] Using thedeep learning algorithm to predict chronic diseases Kimet al [17] proposed nature-inspired computing algorithmsfor the diagnosis of chronic diseases [35] and employedmachine learning algorithm to develop E-health for thediagnosis of chronic diseases [36]

In the current research article traditional machinelearning algorithms are employed for predicting chronicdiseases erefore the result of the existing classificationalgorithms is needed to make the healthcare system morereliable Subsequently the soft clustering algorithm is ap-plied to increase the accuracy of classification algorithms

3 Materials and Methods

e proposed model is designed explicitly to classify chronicdiseases using machine learning algorithms Figure 1 dis-plays the proposed system that combines the existing ma-chine learning algorithms with the rough k-means clusteringtechnique Noncrisp rough k-means algorithm is demon-strated to handle the ambiguous objects ese ambiguousobjects obstruct the performance of machine learning al-gorithms e RKM clustering has clustered data into twoclusters us it is used to measure the roughness of theobjects Moreover the threshold value is also used tomaximize the roughness of objects for reducing the am-biguous objects e threshold value parameter plays a verysignificant role in making the program of the noncrisp al-gorithm It has experimented and found out that thethreshold value is 14 e RKM algorithm is used to dealwith ambiguous objects for improving the classification

Journal of Healthcare Engineering 3

algorithme RKM algorithm has clustered data into lowerapproximation and upper approximation in which theclustered objects in the lower approximation are consideredbut the objects clustered in upper approximation are ex-cluded e novelty of the proposed model has used roughk-means to handle the ambiguous objects belonging to lowerapproximation that is processed with the help of machinelearning algorithms e rough k-means clustering is pro-posed to explicitly determine ambiguous objects To close itis investigated that the results of the proposed system haveoutperformed all the alternative models used for measuringthe performance e detailed description of the proposedsystem is discussed in the following subsections

31 Datasets e chronic disease datasets have been col-lected from the different resources as follows

311 Diabetic Disease Dataset e diabetes data collectedfrom the machine learning repository contained nine at-tributes eight features and one class is dataset has beengathered from an automatic electronic recording device andpaper records [37] Table 1 shows the features of data

312 Breast Cancer Disease Dataset e cancer data col-lected from the Kaggle contained nine attributes 30 featuresand 1 class ese features are obtained from digitizedimages of breast cancer [38] Table 2 shows the features ofdata

313 Kidney Disease Dataset e collected kidney datafrom the Kaggle contained 26 attributes 24 features and 1class [38] Table 3 shows the features of data

32HandlingAmbiguity Machine learning algorithms havesucceeded in a number of real-time applications such asimage processing recognition video recognition marketingprediction weather forecasting and network security econventional machine learning algorithms are used toidentify the objects belonging to exactly one class In dataanalysis it may be possible that an object shows the char-acteristics of different classes [39] In that event an objectshould belong to more than one class and as a result objectboundaries should necessarily overlap e machinelearning algorithms categorize an object into one classprecisely Figure 2 shows the ambiguous data Such re-quirement is found to be too restrictive in a number of real-time applications

In Figure 2 the basic example of ambiguous objects canbe noted It clearly shows the three separate classes Hence itis observed that five objects are not classified under anyprecise class Henceforth these five objects decrease theperformance of machine learning algorithms us it isrequired to determine such ambiguous objects and deal withthem before applying machine learning algorithms For this

Data sets

Rough K-means

Clustering data

Clustering data into twoclusters

Threshold value ofroughness (14)

No

Threshold value ofroughness

appropriate

Upperapproximate Lower approximate

Machine learningalgorithms

Performancemeasurement

Ambiguous objects Correct objects

Denied

Yes

Figure 1 Proposed model

Table 1 Features of diabetic dataset

Feature name CategoriesPregnancies NumericGlucose NumericBlood pressure NumericSkin thickness NumericInsulin NumericBMI NumericDiabetes pedigree function NumericAge Numeric

Class Nominal diabeticsnot diabetics (0 1)

4 Journal of Healthcare Engineering

issue the present research work applies the RKM techniqueto recognize ambiguous packets from chronic diseasedatasets e detailed description of the RKM algorithmemployed for identifying the ambiguous objects is presentedin the subsequent subsections

321 Rough K-Means Clustering Algorithm e proposedRKM clustering approach is based on a simple K-meansclustering [40ndash42] Peters [43] enhanced the algorithm of[40] (original proposal) by calculating rough centroid usingratios of distances as new proposals to differentiate betweensimilar distances Joshi and Lingras [44] used RKM andECM clustering algorithms to handle high dimensional dataAldhyani and Joshi [39] used the rough K-means and ECMclustering algorithms to handle ambiguous objects of in-trusion detection e rough K-means approach is designedto determine the ambiguous objects that belong to the upperboundary of clusters Cluster the data as lower approxi-mation and upper approximation e rough K-Meansrepresents each

(P1) An object xrarr can be part of at most one lower

approximation (lower bound)(P2) xisinA( c

rarri) rArr c

rarrisinA( crarr

i)

(P3) An object xrarr is not part of any lower

approximationxrarr belongs to two or more upper approximations(upper bound)

Overall ideas of soft clustering are more appropriate todeal with ambiguous objects When the algorithm is pro-cessed all objects are assigned wlower and wupper For eachobject vector v

rarr let d ( vrarr c

rarrj) be the distance between itself

and the centroid of cluster crarr

j Let d ( vrarr c

rarri)min 1 le j le k

d ( vrarr c

rarrj) e ratios d ( v

rarr crarr

j)d ( vrarr c

rarrj) 1le I jle k are

used to determine the membership of vrarr Let T j d ( v

rarrc

rarrj)d ( v

rarr crarr

j)ge threshold and ine j

(1) If T ϕ vrarrisinA ( c

rarrj) and v

rarrisinA ( crarr

j) forallj isinT Fur-thermore v

rarr is not part of any lower approximatione above criterion guarantees that property (P3) issatisfied

Table 2 Features of cancer dataset

Feature name CategoriesId NumericRadius mean NumericTexture_mean NumericPerimeter_mean NumericArea_mean NumericSmoothness_mean NumericCompactness_mean NumericConcavity_mean NumericCONCAVE points_mean NumericSymmetry_mean NumericFractal_dimension_mean NumericRadius_se NumericTexture_se NumericPerimeter_se NumericArea_se NumericSmoothness_se NumericCompactness_se NumericConcavity_se NumericConcave points_se NumericSymmetry_se NumericFractal_dimension_se NumericRadius_worst NumericTexture_worst NumericPerimeter_worst NumericArea_worst NumericSmoothness_worst NumericCompactness_worst NumericConcavity_worst NumericConcave points_worst NumericSymmetry_worst NumericFractal_dimension_worst Numeric

Diagnosis Nominal (Mmalignantand B benign)

Table 3 Features of kidney dataset

Feature name CategoriesAge Numericbp blood pressure Numeric

sg specific gravity Nominal 1005 10101015 1020 1025

al albumin Nominal 0 1 2 3 4 5su sugar Nominal 0 1 2 3 4 5rbc red blood cells Nominal 0 1pc pus cell Nominal 0 1pcc pus cell clumps Nominal 0 1ba bacteria Nominal 0 1bgr blood glucose random Numericbu blood urea Numericsc serum creatinine Numericsod sodium Numericpot potassium Numerichemo hemoglobin Numericpcv packed cell volume Numericwc white blood cell count Numericrc red blood cell count Numerichtn hypertension Nominaldm diabetes mellitus Nominalcad coronary artery disease Nominalappet appetite Nominalpe pedal edema Nominalane anemia Nominalclass class Nominal CKD not CKD

60

60

50

50

40

40

30

30

20

20

10

00 10

Figure 2 Sample of ambiguous objects

Journal of Healthcare Engineering 5

(2) Otherwise if T ϕ vrarrisinA ( c

rarrj) In addition by

property (P2) vrarrisinA ( c

rarrj)

e rough k-means algorithm has stability and reliabilityfor handling ambiguity e rough k-means algorithm hasclustered objects into lower bound and upper bound eobjects in the upper bound are ambiguous objects whereasthe objects in the lower bound are correct objects e upperbound should not be empty and the objects in the upperbound can belong to one or more upper bounds in thecluster numbers Figure 3 shows a snapshot of output ob-tained from the RKM algorithm to determine the ambiguousobjects for improving the performance of machine learningalgorithms e objects in lower bound are correction ob-jects whereas the objects on boundary bound are ambiguousobjects

33 Classification Algorithms In this section conventionalmachine learning algorithms are discussed e automaticclassification namely naive Bayes (NB) support vectormachine (SVM) K-nearest neighbor (KNN) and randomforest tree are presented to predict chronic diseases forenhancing healthcare systems

331 Support Vector Machine Algorithm e supportvector machine is used to analyze data as classification andregression In the SVM algorithm the data point is con-sidered as n-dimensional space where there are a number offeatures of data and the values of features are the values ofa specific coordinate e classification of data is achievedby finding the best difference between the classes of datausing hyperplane A support vector machine algorithmclassifies data by separating the hyperactive plane of labeltraining data e SVM obtains lower error when themargin is large In the present research work two classes ofchronic diseases are used All types of kernel functions areapplied to classify the chronic disease datasets in whichradial basis function (RBF) along with kernel functionobtain high accuracy e kernel function is applied andobserved that the RBF function and kernel function areappropriate with the RKM algorithm to obtain goodaccuracy

k(x x) exp minusx minus x

2σ21113888 1113889 (1)

where x minus x2 is the square Euclidean distance between twofeature vectors and σ is a parameter

332 Naıve Bayes Algorithm Naive Bayes algorithm isdefined as a probabilistic method used to classify the datasetbased on the well-known Bayes theorem of probability enaıve Bayes classification algorithm works as prior proba-bility posterior probability likelihood probability and ev-idence probability It normally uses probabilitydistributions e working Bayesian algorithm is as followsAssume AA1 A2 A3 An is regarded as the featurevector of chronic disease features and the values of the

features are A1 A2 A3 An and are considered as anumber of features in the dataset C indicates a class ofchronic data as normal and abnormal e Bayes equation isshown as follows

Conditional probability

P(C | A) P(A C)

P(A) (2)

P(C | A) P(A C)

P(c) (3)

It is assumed that the predictor A on the given class c isindependent of the values of other predictors and it isknown as conditional class independence P(C | A) is theposterior probability of class c given predictor (feature)

eorem is as follows

P(C | A) P(C) P (A | C)

P(A) (4)

333 K-Nearest Neighbors Algorithm A K-nearest neigh-bors algorithm is a simple machine learning algorithmwhich uses the entire dataset in its training phase KNNalgorithm has low complexity in programming andimplementation e basic idea can be presented in asample space when its nearest neighboring features belongto a category and then the features belong to the samecategory e KNN classification algorithm can be usedwith either a single or a multidimensional feature datasetand can find the closest features It employs the Euclideandistance method for finding the closest point among thefeatures

D

x1 minus x2( 1113857 + y1 minus y2( 11138572

1113969

(5)

334 Random Forest Algorithm A decision tree algorithmis one of the powerful decisions It is used to build the blockof a random forest It works to select the best split of anobject from the dataset in each step To reduce the highharnessed of variance we can create multiple trees withvarious samples of datasets and combine this operation withbootstrap aggregation or bagging e disadvantage of thebootstrap aggregation method is used to spill the values ofeach tree which creates a problem in decision-makingFurthermore it makes predictions of training data similarand mitigates the variance originally sought us therandom forest algorithm can be further used for the clas-sification and regression problems and for the overfitting ofdata as well e selected attributes are measured byemploying the information gain method to discover thevalue or the information from the entire dataset e in-formation gain method is calculated for each splitting at-tribute with selecting high gain attributes It is assumed thatD is the dataset

6 Journal of Healthcare Engineering

info(D) minus 1113944m

i1pilogpi( 1113857 (6)

whereD is the dataset i 1 2 m is the class of dataset Dand the probability is pi

Let B be an attribute in dataset D and b1 b2 b3 bn1113864 1113865

are values of the attributes in B Attributes are a partition forgenerating the amount of information from attributes

info(D) 1113944n

j1

Dj

11138681113868111386811138681113868

11138681113868111386811138681113868

D⎛⎝ ⎞⎠lowast info Dj1113872 1113873 (7)

e attributes show the highest information as follows

Gain(B) info(D) minus infoB(D) (8)

34 Performance Measurement e performance measuresare used to test and evaluate the proposed system eaccuracy specificity and sensitivity precision recall and F-score evaluation matrices have been employed to test theproposed model e evaluation matrices are computed byusing the equations (9)ndash(13) as described below where wehave true positive (TP) true negative (TN) false positive(FP) and false negative (FN)

341 Accuracy Accuracy is the number of correct pre-dictions made by the model over all kinds of predictionsmade It is calculated as the total number of correct labels(TP+TN) divided by the total number of chronic diseasedatasets (P+N)

accuracy TP + TN

TP + FP + FN + TN (9)

342 Specificity Specificity (also called the true negativerate) is a measure that tells us about the percentage ofpatients who do not suffer from chronic diseases which arepredicted by the model as not chronic diseases

specificity TN

TN + FPtimes 100 (10)

343 Sensitivity Sensitivity (also called the true positiverate the recall or probability of detection) is the measurethat tells about the percentage of patients who actually sufferfrom chronic diseases which are diagnosed by the classi-fication algorithms on chronic diseases

specificity TP

TP + FNtimes 100 (11)

344 Precision Precision is a measure that tells about theproportion of patients that we diagnosed as having chronicdiseases actually had chronic diseases It is known aspositive predictive value (PPV)

precision TP

TP + FPtimes 100 (12)

345 F1-Score F1-score (also called the precision F-scoreF-measure and recall) is the harmonic mean (average) of theprecision and recall

F1 minus score 2lowast precisionlowast sensitivityprecision + sensitivity

100 (13)

4 Experimental Results and Discussion

erefore the rough K-means algorithm is applied forimproving the classification of chronic diseases It is used todetermine the ambiguous objects that have obstructed theclassification algorithms It has further experimented withvarious standard chronic datasets It is aimed here to im-prove the diagnosis of chronic diseases In the beginning theconventional classification algorithms are applied to predictchronic diseases However it is observed that the obtainedresults were not appropriate From the obtained results it isnoted that there are ambiguous objects that decrease theaccuracy of machine learning algorithms One of the biggestchallenges that we have faced within the implementation ofthe proposed system is the ambiguity embedded in thevariable of the standard dataset For this reason the RKMalgorithm is considered to handle these ambiguous objectsso that the accuracy of the classification algorithms can beimproved e RKM algorithm is appropriately designed for

Number of iterations 67Centroid for cluster 0 371 14655 7340 3180 30111 3550 059 3389Lower bound for cluster 08 13 16 20 31 43 53 54 56 73 95 99 111 132 139 144 153 162 156 195 199 206 215 22

Boundary region for cluster 04 14 35 39 91 114 130 150 204 214 223 236 243 244 292 293 298 308 309 325 325 338 382Centroid for cluster 1 370 12010 6882 2122 7561 3191 049 3331Lower bound for cluster 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 32 33 34 36 37

Boundary region for cluster 14 14 35 39 91 11 130 150 204 214 223 236 243 244 292 293 298 308 309 325 338 382

Fitness value is 20412240011599756End

Figure 3 Snapshot of output RKM algorithm

Journal of Healthcare Engineering 7

detecting the ambiguity in the chronic disease datasets eexperimental results have shown that the performance of theproposed system is better than that of the conventionalmodels For measuring and evaluating the performance ofthe proposed system the performance measures are appliede standard evaluation matrices namely accuracy spec-ificity sensitivity precision and F-score have been pre-sented to test the proposed system against the existingmachine learning techniques Moreover for validating theproposed system the datasets are divided into 70 train and30 test Numerous experiments have attempted to evaluatethe proposed system e results of machine learning al-gorithms and enhanced model to the various datasets arepresented as follows

41 ClassificationResults ofDiabeticsDisease In this sectiondifferent experiments of classification algorithms with theenhanced proposed system have been conducted e softcomputing rough K-means algorithm is used to handleambiguous objects e ambiguous objects in chronic dis-ease datasets have reduced the performance of machinelearning algorithms When applying the classification al-gorithm on the original diabetics data it is observed that theresults are not favourable From the data it is investigatedthat there are ambiguous objects that hinder the classifi-cation algorithms e diabetes data contain seven instancesand two classes ese ambiguous objects are examined byRKM clustering to assist in determining the exact class ofambiguous diseases or the closest one e dataset has beenclustered for two clusters corresponding into two classes thatare labelled variables in datasets e RKM algorithm hasclustered the ambiguous objects into upper approximationand lower approximationose objects that belong to upperapproximation which belongs to one or more clusternumbers are excluded Among 768 instances 718 instancesare clustered as a lower approximation Moreover theremaining objects are clustered as an upper approximationand are considered as ambiguous objects as well e am-biguous objects have been denied from the data e clas-sification algorithm is applied to process the data in a lowerapproximation for diagnosing the diseases Table 4 shows theresults obtained from the RKM algorithm used for dis-covering the ambiguous objects

Table 5 shows the results of the classification algorithmnamely naıve Bayes SVM random forest tree and KNN Itis observed that the obtained results are needed to improvee rough K-means is applied to enhance the existingmachine learning Table 6 shows the results of machinelearning techniques with RKM algorithm e roughK-means is used to deal with ambiguous objects It is ob-served that the RKM algorithm has improved the results ofthe classification algorithms e results of naıve Bayes withRKM are 8055 8014 8014 90 and 8478 in termsof accuracy sensitivity specificity precision and F-scorerespectively Similarly the results of SVM with RKM ap-proaches are 7778 7724 7887 8819 and 8235 corre-sponding to accuracy sensitivity and specificity precisionand F-score in that order e results of random forest with

RKM are 7720 5609 6905 and 620 Furthermore theobtained results by using KNN with RKM are 71307929 5658 7708 7708 and 7870 Finally fromthe obtained data it is investigated that the classificationalgorithm is improved by using the RKM algorithmFigures 4ndash7 show the performance of the classification al-gorithms with the RKM algorithm

42 Classification Results of Kidney Disease is sectiondemonstrates the classification of kidney diseases with thehelp of machine learning algorithms and the enhancedproposed system Table 7 shows the results obtained fromthe RKM algorithm to figure out the ambiguous objectsKidney diseases contain 400 instancese data are clusteredinto two clusters e rough K-means clusters data intoupper approximation and lower approximation e objectsthat have been clustered in lower approximation are 174instances ose objects that belong to lower approximationare regarded as approved objects because they belong to thesame cluster numbers e remaining objects that is 226objects are clustered in upper approximation which isconsidered as ambiguous objects

Table 8 shows the performance of existing machinelearning algorithms namely naıve Bayes SVM randomforest tree and KNN It is observed that there is a possibilityfor improving the classification algorithms if they handleambiguous objects It is also noted that the RKM algorithmhas improved the results of the classification algorithmsTable 9 shows results of machine learning techniques withthe RKM algorithm e results of naıve Bayes with RKMare 9811 9643 9615 9615 and 980478 withrespect to the evaluation matrices Similarly results ofSVM with RKM approaches are 100 100 100 100and 100 in terms of accuracy sensitivity and specificityprecision and F-score respectively e results of randomforest with RKM are 100 100 100 100 and 9802Furthermore the obtained results by using KNN withRKM are 8491 8065 9091 9259 and 8621Lastly from the obtained results it is found that theclassification algorithm is improved by using the RKMalgorithm Figures 8ndash11 display the performance of theclassification algorithms with the RKM algorithm forpredicting kidney diseases

43 Classification Results of Cancer Disease is sectionshows the classification of cancer disease using the existingmachine learning algorithms and the proposed system usingthe RKM algorithm e cancer data contain 569 instancesthat are classified into two classes such as benign and ma-lignant e soft computing RKM clustering algorithm isused to handle ambiguous objects It clusters data into two

Table 4 Results of RKM algorithm for handling ambiguous objectsin the diabetic dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 718 50

8 Journal of Healthcare Engineering

Table 5 Results of existing machine learning for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 6969 7072 66 8819 8235SVM 7316 7379 7209 8168 8235Random forest 76 556 656 556 6002KNN 6580 5679 5658 51 5380

Table 6 Results of machine learning after handling ambiguous objects for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 8055 8014 8014 90 8478SVM with RKM 7778 7724 7887 8819 8235Random forest with RKM 7720 569 6905 569 6206KNN with RKM 7130 7929 7067 7708 7817

69698055

70728014

66

80148819 90

82358478

0102030405060708090

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 4 Comparison results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for diabetic diseases

7316

77787379

7724

7209

78878168

8819

8235 8235

50

55

60

65

70

75

80

85

90

95

100

SVM SVM with RKM

Precision ()Accuracy ()F-score ()Sensitivity ()

Specificity ()

Figure 5 Comparison results of existing SVM classifier and SVM using RKM algorithm for diabetic diseases

Journal of Healthcare Engineering 9

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 4: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

algorithme RKM algorithm has clustered data into lowerapproximation and upper approximation in which theclustered objects in the lower approximation are consideredbut the objects clustered in upper approximation are ex-cluded e novelty of the proposed model has used roughk-means to handle the ambiguous objects belonging to lowerapproximation that is processed with the help of machinelearning algorithms e rough k-means clustering is pro-posed to explicitly determine ambiguous objects To close itis investigated that the results of the proposed system haveoutperformed all the alternative models used for measuringthe performance e detailed description of the proposedsystem is discussed in the following subsections

31 Datasets e chronic disease datasets have been col-lected from the different resources as follows

311 Diabetic Disease Dataset e diabetes data collectedfrom the machine learning repository contained nine at-tributes eight features and one class is dataset has beengathered from an automatic electronic recording device andpaper records [37] Table 1 shows the features of data

312 Breast Cancer Disease Dataset e cancer data col-lected from the Kaggle contained nine attributes 30 featuresand 1 class ese features are obtained from digitizedimages of breast cancer [38] Table 2 shows the features ofdata

313 Kidney Disease Dataset e collected kidney datafrom the Kaggle contained 26 attributes 24 features and 1class [38] Table 3 shows the features of data

32HandlingAmbiguity Machine learning algorithms havesucceeded in a number of real-time applications such asimage processing recognition video recognition marketingprediction weather forecasting and network security econventional machine learning algorithms are used toidentify the objects belonging to exactly one class In dataanalysis it may be possible that an object shows the char-acteristics of different classes [39] In that event an objectshould belong to more than one class and as a result objectboundaries should necessarily overlap e machinelearning algorithms categorize an object into one classprecisely Figure 2 shows the ambiguous data Such re-quirement is found to be too restrictive in a number of real-time applications

In Figure 2 the basic example of ambiguous objects canbe noted It clearly shows the three separate classes Hence itis observed that five objects are not classified under anyprecise class Henceforth these five objects decrease theperformance of machine learning algorithms us it isrequired to determine such ambiguous objects and deal withthem before applying machine learning algorithms For this

Data sets

Rough K-means

Clustering data

Clustering data into twoclusters

Threshold value ofroughness (14)

No

Threshold value ofroughness

appropriate

Upperapproximate Lower approximate

Machine learningalgorithms

Performancemeasurement

Ambiguous objects Correct objects

Denied

Yes

Figure 1 Proposed model

Table 1 Features of diabetic dataset

Feature name CategoriesPregnancies NumericGlucose NumericBlood pressure NumericSkin thickness NumericInsulin NumericBMI NumericDiabetes pedigree function NumericAge Numeric

Class Nominal diabeticsnot diabetics (0 1)

4 Journal of Healthcare Engineering

issue the present research work applies the RKM techniqueto recognize ambiguous packets from chronic diseasedatasets e detailed description of the RKM algorithmemployed for identifying the ambiguous objects is presentedin the subsequent subsections

321 Rough K-Means Clustering Algorithm e proposedRKM clustering approach is based on a simple K-meansclustering [40ndash42] Peters [43] enhanced the algorithm of[40] (original proposal) by calculating rough centroid usingratios of distances as new proposals to differentiate betweensimilar distances Joshi and Lingras [44] used RKM andECM clustering algorithms to handle high dimensional dataAldhyani and Joshi [39] used the rough K-means and ECMclustering algorithms to handle ambiguous objects of in-trusion detection e rough K-means approach is designedto determine the ambiguous objects that belong to the upperboundary of clusters Cluster the data as lower approxi-mation and upper approximation e rough K-Meansrepresents each

(P1) An object xrarr can be part of at most one lower

approximation (lower bound)(P2) xisinA( c

rarri) rArr c

rarrisinA( crarr

i)

(P3) An object xrarr is not part of any lower

approximationxrarr belongs to two or more upper approximations(upper bound)

Overall ideas of soft clustering are more appropriate todeal with ambiguous objects When the algorithm is pro-cessed all objects are assigned wlower and wupper For eachobject vector v

rarr let d ( vrarr c

rarrj) be the distance between itself

and the centroid of cluster crarr

j Let d ( vrarr c

rarri)min 1 le j le k

d ( vrarr c

rarrj) e ratios d ( v

rarr crarr

j)d ( vrarr c

rarrj) 1le I jle k are

used to determine the membership of vrarr Let T j d ( v

rarrc

rarrj)d ( v

rarr crarr

j)ge threshold and ine j

(1) If T ϕ vrarrisinA ( c

rarrj) and v

rarrisinA ( crarr

j) forallj isinT Fur-thermore v

rarr is not part of any lower approximatione above criterion guarantees that property (P3) issatisfied

Table 2 Features of cancer dataset

Feature name CategoriesId NumericRadius mean NumericTexture_mean NumericPerimeter_mean NumericArea_mean NumericSmoothness_mean NumericCompactness_mean NumericConcavity_mean NumericCONCAVE points_mean NumericSymmetry_mean NumericFractal_dimension_mean NumericRadius_se NumericTexture_se NumericPerimeter_se NumericArea_se NumericSmoothness_se NumericCompactness_se NumericConcavity_se NumericConcave points_se NumericSymmetry_se NumericFractal_dimension_se NumericRadius_worst NumericTexture_worst NumericPerimeter_worst NumericArea_worst NumericSmoothness_worst NumericCompactness_worst NumericConcavity_worst NumericConcave points_worst NumericSymmetry_worst NumericFractal_dimension_worst Numeric

Diagnosis Nominal (Mmalignantand B benign)

Table 3 Features of kidney dataset

Feature name CategoriesAge Numericbp blood pressure Numeric

sg specific gravity Nominal 1005 10101015 1020 1025

al albumin Nominal 0 1 2 3 4 5su sugar Nominal 0 1 2 3 4 5rbc red blood cells Nominal 0 1pc pus cell Nominal 0 1pcc pus cell clumps Nominal 0 1ba bacteria Nominal 0 1bgr blood glucose random Numericbu blood urea Numericsc serum creatinine Numericsod sodium Numericpot potassium Numerichemo hemoglobin Numericpcv packed cell volume Numericwc white blood cell count Numericrc red blood cell count Numerichtn hypertension Nominaldm diabetes mellitus Nominalcad coronary artery disease Nominalappet appetite Nominalpe pedal edema Nominalane anemia Nominalclass class Nominal CKD not CKD

60

60

50

50

40

40

30

30

20

20

10

00 10

Figure 2 Sample of ambiguous objects

Journal of Healthcare Engineering 5

(2) Otherwise if T ϕ vrarrisinA ( c

rarrj) In addition by

property (P2) vrarrisinA ( c

rarrj)

e rough k-means algorithm has stability and reliabilityfor handling ambiguity e rough k-means algorithm hasclustered objects into lower bound and upper bound eobjects in the upper bound are ambiguous objects whereasthe objects in the lower bound are correct objects e upperbound should not be empty and the objects in the upperbound can belong to one or more upper bounds in thecluster numbers Figure 3 shows a snapshot of output ob-tained from the RKM algorithm to determine the ambiguousobjects for improving the performance of machine learningalgorithms e objects in lower bound are correction ob-jects whereas the objects on boundary bound are ambiguousobjects

33 Classification Algorithms In this section conventionalmachine learning algorithms are discussed e automaticclassification namely naive Bayes (NB) support vectormachine (SVM) K-nearest neighbor (KNN) and randomforest tree are presented to predict chronic diseases forenhancing healthcare systems

331 Support Vector Machine Algorithm e supportvector machine is used to analyze data as classification andregression In the SVM algorithm the data point is con-sidered as n-dimensional space where there are a number offeatures of data and the values of features are the values ofa specific coordinate e classification of data is achievedby finding the best difference between the classes of datausing hyperplane A support vector machine algorithmclassifies data by separating the hyperactive plane of labeltraining data e SVM obtains lower error when themargin is large In the present research work two classes ofchronic diseases are used All types of kernel functions areapplied to classify the chronic disease datasets in whichradial basis function (RBF) along with kernel functionobtain high accuracy e kernel function is applied andobserved that the RBF function and kernel function areappropriate with the RKM algorithm to obtain goodaccuracy

k(x x) exp minusx minus x

2σ21113888 1113889 (1)

where x minus x2 is the square Euclidean distance between twofeature vectors and σ is a parameter

332 Naıve Bayes Algorithm Naive Bayes algorithm isdefined as a probabilistic method used to classify the datasetbased on the well-known Bayes theorem of probability enaıve Bayes classification algorithm works as prior proba-bility posterior probability likelihood probability and ev-idence probability It normally uses probabilitydistributions e working Bayesian algorithm is as followsAssume AA1 A2 A3 An is regarded as the featurevector of chronic disease features and the values of the

features are A1 A2 A3 An and are considered as anumber of features in the dataset C indicates a class ofchronic data as normal and abnormal e Bayes equation isshown as follows

Conditional probability

P(C | A) P(A C)

P(A) (2)

P(C | A) P(A C)

P(c) (3)

It is assumed that the predictor A on the given class c isindependent of the values of other predictors and it isknown as conditional class independence P(C | A) is theposterior probability of class c given predictor (feature)

eorem is as follows

P(C | A) P(C) P (A | C)

P(A) (4)

333 K-Nearest Neighbors Algorithm A K-nearest neigh-bors algorithm is a simple machine learning algorithmwhich uses the entire dataset in its training phase KNNalgorithm has low complexity in programming andimplementation e basic idea can be presented in asample space when its nearest neighboring features belongto a category and then the features belong to the samecategory e KNN classification algorithm can be usedwith either a single or a multidimensional feature datasetand can find the closest features It employs the Euclideandistance method for finding the closest point among thefeatures

D

x1 minus x2( 1113857 + y1 minus y2( 11138572

1113969

(5)

334 Random Forest Algorithm A decision tree algorithmis one of the powerful decisions It is used to build the blockof a random forest It works to select the best split of anobject from the dataset in each step To reduce the highharnessed of variance we can create multiple trees withvarious samples of datasets and combine this operation withbootstrap aggregation or bagging e disadvantage of thebootstrap aggregation method is used to spill the values ofeach tree which creates a problem in decision-makingFurthermore it makes predictions of training data similarand mitigates the variance originally sought us therandom forest algorithm can be further used for the clas-sification and regression problems and for the overfitting ofdata as well e selected attributes are measured byemploying the information gain method to discover thevalue or the information from the entire dataset e in-formation gain method is calculated for each splitting at-tribute with selecting high gain attributes It is assumed thatD is the dataset

6 Journal of Healthcare Engineering

info(D) minus 1113944m

i1pilogpi( 1113857 (6)

whereD is the dataset i 1 2 m is the class of dataset Dand the probability is pi

Let B be an attribute in dataset D and b1 b2 b3 bn1113864 1113865

are values of the attributes in B Attributes are a partition forgenerating the amount of information from attributes

info(D) 1113944n

j1

Dj

11138681113868111386811138681113868

11138681113868111386811138681113868

D⎛⎝ ⎞⎠lowast info Dj1113872 1113873 (7)

e attributes show the highest information as follows

Gain(B) info(D) minus infoB(D) (8)

34 Performance Measurement e performance measuresare used to test and evaluate the proposed system eaccuracy specificity and sensitivity precision recall and F-score evaluation matrices have been employed to test theproposed model e evaluation matrices are computed byusing the equations (9)ndash(13) as described below where wehave true positive (TP) true negative (TN) false positive(FP) and false negative (FN)

341 Accuracy Accuracy is the number of correct pre-dictions made by the model over all kinds of predictionsmade It is calculated as the total number of correct labels(TP+TN) divided by the total number of chronic diseasedatasets (P+N)

accuracy TP + TN

TP + FP + FN + TN (9)

342 Specificity Specificity (also called the true negativerate) is a measure that tells us about the percentage ofpatients who do not suffer from chronic diseases which arepredicted by the model as not chronic diseases

specificity TN

TN + FPtimes 100 (10)

343 Sensitivity Sensitivity (also called the true positiverate the recall or probability of detection) is the measurethat tells about the percentage of patients who actually sufferfrom chronic diseases which are diagnosed by the classi-fication algorithms on chronic diseases

specificity TP

TP + FNtimes 100 (11)

344 Precision Precision is a measure that tells about theproportion of patients that we diagnosed as having chronicdiseases actually had chronic diseases It is known aspositive predictive value (PPV)

precision TP

TP + FPtimes 100 (12)

345 F1-Score F1-score (also called the precision F-scoreF-measure and recall) is the harmonic mean (average) of theprecision and recall

F1 minus score 2lowast precisionlowast sensitivityprecision + sensitivity

100 (13)

4 Experimental Results and Discussion

erefore the rough K-means algorithm is applied forimproving the classification of chronic diseases It is used todetermine the ambiguous objects that have obstructed theclassification algorithms It has further experimented withvarious standard chronic datasets It is aimed here to im-prove the diagnosis of chronic diseases In the beginning theconventional classification algorithms are applied to predictchronic diseases However it is observed that the obtainedresults were not appropriate From the obtained results it isnoted that there are ambiguous objects that decrease theaccuracy of machine learning algorithms One of the biggestchallenges that we have faced within the implementation ofthe proposed system is the ambiguity embedded in thevariable of the standard dataset For this reason the RKMalgorithm is considered to handle these ambiguous objectsso that the accuracy of the classification algorithms can beimproved e RKM algorithm is appropriately designed for

Number of iterations 67Centroid for cluster 0 371 14655 7340 3180 30111 3550 059 3389Lower bound for cluster 08 13 16 20 31 43 53 54 56 73 95 99 111 132 139 144 153 162 156 195 199 206 215 22

Boundary region for cluster 04 14 35 39 91 114 130 150 204 214 223 236 243 244 292 293 298 308 309 325 325 338 382Centroid for cluster 1 370 12010 6882 2122 7561 3191 049 3331Lower bound for cluster 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 32 33 34 36 37

Boundary region for cluster 14 14 35 39 91 11 130 150 204 214 223 236 243 244 292 293 298 308 309 325 338 382

Fitness value is 20412240011599756End

Figure 3 Snapshot of output RKM algorithm

Journal of Healthcare Engineering 7

detecting the ambiguity in the chronic disease datasets eexperimental results have shown that the performance of theproposed system is better than that of the conventionalmodels For measuring and evaluating the performance ofthe proposed system the performance measures are appliede standard evaluation matrices namely accuracy spec-ificity sensitivity precision and F-score have been pre-sented to test the proposed system against the existingmachine learning techniques Moreover for validating theproposed system the datasets are divided into 70 train and30 test Numerous experiments have attempted to evaluatethe proposed system e results of machine learning al-gorithms and enhanced model to the various datasets arepresented as follows

41 ClassificationResults ofDiabeticsDisease In this sectiondifferent experiments of classification algorithms with theenhanced proposed system have been conducted e softcomputing rough K-means algorithm is used to handleambiguous objects e ambiguous objects in chronic dis-ease datasets have reduced the performance of machinelearning algorithms When applying the classification al-gorithm on the original diabetics data it is observed that theresults are not favourable From the data it is investigatedthat there are ambiguous objects that hinder the classifi-cation algorithms e diabetes data contain seven instancesand two classes ese ambiguous objects are examined byRKM clustering to assist in determining the exact class ofambiguous diseases or the closest one e dataset has beenclustered for two clusters corresponding into two classes thatare labelled variables in datasets e RKM algorithm hasclustered the ambiguous objects into upper approximationand lower approximationose objects that belong to upperapproximation which belongs to one or more clusternumbers are excluded Among 768 instances 718 instancesare clustered as a lower approximation Moreover theremaining objects are clustered as an upper approximationand are considered as ambiguous objects as well e am-biguous objects have been denied from the data e clas-sification algorithm is applied to process the data in a lowerapproximation for diagnosing the diseases Table 4 shows theresults obtained from the RKM algorithm used for dis-covering the ambiguous objects

Table 5 shows the results of the classification algorithmnamely naıve Bayes SVM random forest tree and KNN Itis observed that the obtained results are needed to improvee rough K-means is applied to enhance the existingmachine learning Table 6 shows the results of machinelearning techniques with RKM algorithm e roughK-means is used to deal with ambiguous objects It is ob-served that the RKM algorithm has improved the results ofthe classification algorithms e results of naıve Bayes withRKM are 8055 8014 8014 90 and 8478 in termsof accuracy sensitivity specificity precision and F-scorerespectively Similarly the results of SVM with RKM ap-proaches are 7778 7724 7887 8819 and 8235 corre-sponding to accuracy sensitivity and specificity precisionand F-score in that order e results of random forest with

RKM are 7720 5609 6905 and 620 Furthermore theobtained results by using KNN with RKM are 71307929 5658 7708 7708 and 7870 Finally fromthe obtained data it is investigated that the classificationalgorithm is improved by using the RKM algorithmFigures 4ndash7 show the performance of the classification al-gorithms with the RKM algorithm

42 Classification Results of Kidney Disease is sectiondemonstrates the classification of kidney diseases with thehelp of machine learning algorithms and the enhancedproposed system Table 7 shows the results obtained fromthe RKM algorithm to figure out the ambiguous objectsKidney diseases contain 400 instancese data are clusteredinto two clusters e rough K-means clusters data intoupper approximation and lower approximation e objectsthat have been clustered in lower approximation are 174instances ose objects that belong to lower approximationare regarded as approved objects because they belong to thesame cluster numbers e remaining objects that is 226objects are clustered in upper approximation which isconsidered as ambiguous objects

Table 8 shows the performance of existing machinelearning algorithms namely naıve Bayes SVM randomforest tree and KNN It is observed that there is a possibilityfor improving the classification algorithms if they handleambiguous objects It is also noted that the RKM algorithmhas improved the results of the classification algorithmsTable 9 shows results of machine learning techniques withthe RKM algorithm e results of naıve Bayes with RKMare 9811 9643 9615 9615 and 980478 withrespect to the evaluation matrices Similarly results ofSVM with RKM approaches are 100 100 100 100and 100 in terms of accuracy sensitivity and specificityprecision and F-score respectively e results of randomforest with RKM are 100 100 100 100 and 9802Furthermore the obtained results by using KNN withRKM are 8491 8065 9091 9259 and 8621Lastly from the obtained results it is found that theclassification algorithm is improved by using the RKMalgorithm Figures 8ndash11 display the performance of theclassification algorithms with the RKM algorithm forpredicting kidney diseases

43 Classification Results of Cancer Disease is sectionshows the classification of cancer disease using the existingmachine learning algorithms and the proposed system usingthe RKM algorithm e cancer data contain 569 instancesthat are classified into two classes such as benign and ma-lignant e soft computing RKM clustering algorithm isused to handle ambiguous objects It clusters data into two

Table 4 Results of RKM algorithm for handling ambiguous objectsin the diabetic dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 718 50

8 Journal of Healthcare Engineering

Table 5 Results of existing machine learning for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 6969 7072 66 8819 8235SVM 7316 7379 7209 8168 8235Random forest 76 556 656 556 6002KNN 6580 5679 5658 51 5380

Table 6 Results of machine learning after handling ambiguous objects for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 8055 8014 8014 90 8478SVM with RKM 7778 7724 7887 8819 8235Random forest with RKM 7720 569 6905 569 6206KNN with RKM 7130 7929 7067 7708 7817

69698055

70728014

66

80148819 90

82358478

0102030405060708090

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 4 Comparison results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for diabetic diseases

7316

77787379

7724

7209

78878168

8819

8235 8235

50

55

60

65

70

75

80

85

90

95

100

SVM SVM with RKM

Precision ()Accuracy ()F-score ()Sensitivity ()

Specificity ()

Figure 5 Comparison results of existing SVM classifier and SVM using RKM algorithm for diabetic diseases

Journal of Healthcare Engineering 9

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 5: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

issue the present research work applies the RKM techniqueto recognize ambiguous packets from chronic diseasedatasets e detailed description of the RKM algorithmemployed for identifying the ambiguous objects is presentedin the subsequent subsections

321 Rough K-Means Clustering Algorithm e proposedRKM clustering approach is based on a simple K-meansclustering [40ndash42] Peters [43] enhanced the algorithm of[40] (original proposal) by calculating rough centroid usingratios of distances as new proposals to differentiate betweensimilar distances Joshi and Lingras [44] used RKM andECM clustering algorithms to handle high dimensional dataAldhyani and Joshi [39] used the rough K-means and ECMclustering algorithms to handle ambiguous objects of in-trusion detection e rough K-means approach is designedto determine the ambiguous objects that belong to the upperboundary of clusters Cluster the data as lower approxi-mation and upper approximation e rough K-Meansrepresents each

(P1) An object xrarr can be part of at most one lower

approximation (lower bound)(P2) xisinA( c

rarri) rArr c

rarrisinA( crarr

i)

(P3) An object xrarr is not part of any lower

approximationxrarr belongs to two or more upper approximations(upper bound)

Overall ideas of soft clustering are more appropriate todeal with ambiguous objects When the algorithm is pro-cessed all objects are assigned wlower and wupper For eachobject vector v

rarr let d ( vrarr c

rarrj) be the distance between itself

and the centroid of cluster crarr

j Let d ( vrarr c

rarri)min 1 le j le k

d ( vrarr c

rarrj) e ratios d ( v

rarr crarr

j)d ( vrarr c

rarrj) 1le I jle k are

used to determine the membership of vrarr Let T j d ( v

rarrc

rarrj)d ( v

rarr crarr

j)ge threshold and ine j

(1) If T ϕ vrarrisinA ( c

rarrj) and v

rarrisinA ( crarr

j) forallj isinT Fur-thermore v

rarr is not part of any lower approximatione above criterion guarantees that property (P3) issatisfied

Table 2 Features of cancer dataset

Feature name CategoriesId NumericRadius mean NumericTexture_mean NumericPerimeter_mean NumericArea_mean NumericSmoothness_mean NumericCompactness_mean NumericConcavity_mean NumericCONCAVE points_mean NumericSymmetry_mean NumericFractal_dimension_mean NumericRadius_se NumericTexture_se NumericPerimeter_se NumericArea_se NumericSmoothness_se NumericCompactness_se NumericConcavity_se NumericConcave points_se NumericSymmetry_se NumericFractal_dimension_se NumericRadius_worst NumericTexture_worst NumericPerimeter_worst NumericArea_worst NumericSmoothness_worst NumericCompactness_worst NumericConcavity_worst NumericConcave points_worst NumericSymmetry_worst NumericFractal_dimension_worst Numeric

Diagnosis Nominal (Mmalignantand B benign)

Table 3 Features of kidney dataset

Feature name CategoriesAge Numericbp blood pressure Numeric

sg specific gravity Nominal 1005 10101015 1020 1025

al albumin Nominal 0 1 2 3 4 5su sugar Nominal 0 1 2 3 4 5rbc red blood cells Nominal 0 1pc pus cell Nominal 0 1pcc pus cell clumps Nominal 0 1ba bacteria Nominal 0 1bgr blood glucose random Numericbu blood urea Numericsc serum creatinine Numericsod sodium Numericpot potassium Numerichemo hemoglobin Numericpcv packed cell volume Numericwc white blood cell count Numericrc red blood cell count Numerichtn hypertension Nominaldm diabetes mellitus Nominalcad coronary artery disease Nominalappet appetite Nominalpe pedal edema Nominalane anemia Nominalclass class Nominal CKD not CKD

60

60

50

50

40

40

30

30

20

20

10

00 10

Figure 2 Sample of ambiguous objects

Journal of Healthcare Engineering 5

(2) Otherwise if T ϕ vrarrisinA ( c

rarrj) In addition by

property (P2) vrarrisinA ( c

rarrj)

e rough k-means algorithm has stability and reliabilityfor handling ambiguity e rough k-means algorithm hasclustered objects into lower bound and upper bound eobjects in the upper bound are ambiguous objects whereasthe objects in the lower bound are correct objects e upperbound should not be empty and the objects in the upperbound can belong to one or more upper bounds in thecluster numbers Figure 3 shows a snapshot of output ob-tained from the RKM algorithm to determine the ambiguousobjects for improving the performance of machine learningalgorithms e objects in lower bound are correction ob-jects whereas the objects on boundary bound are ambiguousobjects

33 Classification Algorithms In this section conventionalmachine learning algorithms are discussed e automaticclassification namely naive Bayes (NB) support vectormachine (SVM) K-nearest neighbor (KNN) and randomforest tree are presented to predict chronic diseases forenhancing healthcare systems

331 Support Vector Machine Algorithm e supportvector machine is used to analyze data as classification andregression In the SVM algorithm the data point is con-sidered as n-dimensional space where there are a number offeatures of data and the values of features are the values ofa specific coordinate e classification of data is achievedby finding the best difference between the classes of datausing hyperplane A support vector machine algorithmclassifies data by separating the hyperactive plane of labeltraining data e SVM obtains lower error when themargin is large In the present research work two classes ofchronic diseases are used All types of kernel functions areapplied to classify the chronic disease datasets in whichradial basis function (RBF) along with kernel functionobtain high accuracy e kernel function is applied andobserved that the RBF function and kernel function areappropriate with the RKM algorithm to obtain goodaccuracy

k(x x) exp minusx minus x

2σ21113888 1113889 (1)

where x minus x2 is the square Euclidean distance between twofeature vectors and σ is a parameter

332 Naıve Bayes Algorithm Naive Bayes algorithm isdefined as a probabilistic method used to classify the datasetbased on the well-known Bayes theorem of probability enaıve Bayes classification algorithm works as prior proba-bility posterior probability likelihood probability and ev-idence probability It normally uses probabilitydistributions e working Bayesian algorithm is as followsAssume AA1 A2 A3 An is regarded as the featurevector of chronic disease features and the values of the

features are A1 A2 A3 An and are considered as anumber of features in the dataset C indicates a class ofchronic data as normal and abnormal e Bayes equation isshown as follows

Conditional probability

P(C | A) P(A C)

P(A) (2)

P(C | A) P(A C)

P(c) (3)

It is assumed that the predictor A on the given class c isindependent of the values of other predictors and it isknown as conditional class independence P(C | A) is theposterior probability of class c given predictor (feature)

eorem is as follows

P(C | A) P(C) P (A | C)

P(A) (4)

333 K-Nearest Neighbors Algorithm A K-nearest neigh-bors algorithm is a simple machine learning algorithmwhich uses the entire dataset in its training phase KNNalgorithm has low complexity in programming andimplementation e basic idea can be presented in asample space when its nearest neighboring features belongto a category and then the features belong to the samecategory e KNN classification algorithm can be usedwith either a single or a multidimensional feature datasetand can find the closest features It employs the Euclideandistance method for finding the closest point among thefeatures

D

x1 minus x2( 1113857 + y1 minus y2( 11138572

1113969

(5)

334 Random Forest Algorithm A decision tree algorithmis one of the powerful decisions It is used to build the blockof a random forest It works to select the best split of anobject from the dataset in each step To reduce the highharnessed of variance we can create multiple trees withvarious samples of datasets and combine this operation withbootstrap aggregation or bagging e disadvantage of thebootstrap aggregation method is used to spill the values ofeach tree which creates a problem in decision-makingFurthermore it makes predictions of training data similarand mitigates the variance originally sought us therandom forest algorithm can be further used for the clas-sification and regression problems and for the overfitting ofdata as well e selected attributes are measured byemploying the information gain method to discover thevalue or the information from the entire dataset e in-formation gain method is calculated for each splitting at-tribute with selecting high gain attributes It is assumed thatD is the dataset

6 Journal of Healthcare Engineering

info(D) minus 1113944m

i1pilogpi( 1113857 (6)

whereD is the dataset i 1 2 m is the class of dataset Dand the probability is pi

Let B be an attribute in dataset D and b1 b2 b3 bn1113864 1113865

are values of the attributes in B Attributes are a partition forgenerating the amount of information from attributes

info(D) 1113944n

j1

Dj

11138681113868111386811138681113868

11138681113868111386811138681113868

D⎛⎝ ⎞⎠lowast info Dj1113872 1113873 (7)

e attributes show the highest information as follows

Gain(B) info(D) minus infoB(D) (8)

34 Performance Measurement e performance measuresare used to test and evaluate the proposed system eaccuracy specificity and sensitivity precision recall and F-score evaluation matrices have been employed to test theproposed model e evaluation matrices are computed byusing the equations (9)ndash(13) as described below where wehave true positive (TP) true negative (TN) false positive(FP) and false negative (FN)

341 Accuracy Accuracy is the number of correct pre-dictions made by the model over all kinds of predictionsmade It is calculated as the total number of correct labels(TP+TN) divided by the total number of chronic diseasedatasets (P+N)

accuracy TP + TN

TP + FP + FN + TN (9)

342 Specificity Specificity (also called the true negativerate) is a measure that tells us about the percentage ofpatients who do not suffer from chronic diseases which arepredicted by the model as not chronic diseases

specificity TN

TN + FPtimes 100 (10)

343 Sensitivity Sensitivity (also called the true positiverate the recall or probability of detection) is the measurethat tells about the percentage of patients who actually sufferfrom chronic diseases which are diagnosed by the classi-fication algorithms on chronic diseases

specificity TP

TP + FNtimes 100 (11)

344 Precision Precision is a measure that tells about theproportion of patients that we diagnosed as having chronicdiseases actually had chronic diseases It is known aspositive predictive value (PPV)

precision TP

TP + FPtimes 100 (12)

345 F1-Score F1-score (also called the precision F-scoreF-measure and recall) is the harmonic mean (average) of theprecision and recall

F1 minus score 2lowast precisionlowast sensitivityprecision + sensitivity

100 (13)

4 Experimental Results and Discussion

erefore the rough K-means algorithm is applied forimproving the classification of chronic diseases It is used todetermine the ambiguous objects that have obstructed theclassification algorithms It has further experimented withvarious standard chronic datasets It is aimed here to im-prove the diagnosis of chronic diseases In the beginning theconventional classification algorithms are applied to predictchronic diseases However it is observed that the obtainedresults were not appropriate From the obtained results it isnoted that there are ambiguous objects that decrease theaccuracy of machine learning algorithms One of the biggestchallenges that we have faced within the implementation ofthe proposed system is the ambiguity embedded in thevariable of the standard dataset For this reason the RKMalgorithm is considered to handle these ambiguous objectsso that the accuracy of the classification algorithms can beimproved e RKM algorithm is appropriately designed for

Number of iterations 67Centroid for cluster 0 371 14655 7340 3180 30111 3550 059 3389Lower bound for cluster 08 13 16 20 31 43 53 54 56 73 95 99 111 132 139 144 153 162 156 195 199 206 215 22

Boundary region for cluster 04 14 35 39 91 114 130 150 204 214 223 236 243 244 292 293 298 308 309 325 325 338 382Centroid for cluster 1 370 12010 6882 2122 7561 3191 049 3331Lower bound for cluster 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 32 33 34 36 37

Boundary region for cluster 14 14 35 39 91 11 130 150 204 214 223 236 243 244 292 293 298 308 309 325 338 382

Fitness value is 20412240011599756End

Figure 3 Snapshot of output RKM algorithm

Journal of Healthcare Engineering 7

detecting the ambiguity in the chronic disease datasets eexperimental results have shown that the performance of theproposed system is better than that of the conventionalmodels For measuring and evaluating the performance ofthe proposed system the performance measures are appliede standard evaluation matrices namely accuracy spec-ificity sensitivity precision and F-score have been pre-sented to test the proposed system against the existingmachine learning techniques Moreover for validating theproposed system the datasets are divided into 70 train and30 test Numerous experiments have attempted to evaluatethe proposed system e results of machine learning al-gorithms and enhanced model to the various datasets arepresented as follows

41 ClassificationResults ofDiabeticsDisease In this sectiondifferent experiments of classification algorithms with theenhanced proposed system have been conducted e softcomputing rough K-means algorithm is used to handleambiguous objects e ambiguous objects in chronic dis-ease datasets have reduced the performance of machinelearning algorithms When applying the classification al-gorithm on the original diabetics data it is observed that theresults are not favourable From the data it is investigatedthat there are ambiguous objects that hinder the classifi-cation algorithms e diabetes data contain seven instancesand two classes ese ambiguous objects are examined byRKM clustering to assist in determining the exact class ofambiguous diseases or the closest one e dataset has beenclustered for two clusters corresponding into two classes thatare labelled variables in datasets e RKM algorithm hasclustered the ambiguous objects into upper approximationand lower approximationose objects that belong to upperapproximation which belongs to one or more clusternumbers are excluded Among 768 instances 718 instancesare clustered as a lower approximation Moreover theremaining objects are clustered as an upper approximationand are considered as ambiguous objects as well e am-biguous objects have been denied from the data e clas-sification algorithm is applied to process the data in a lowerapproximation for diagnosing the diseases Table 4 shows theresults obtained from the RKM algorithm used for dis-covering the ambiguous objects

Table 5 shows the results of the classification algorithmnamely naıve Bayes SVM random forest tree and KNN Itis observed that the obtained results are needed to improvee rough K-means is applied to enhance the existingmachine learning Table 6 shows the results of machinelearning techniques with RKM algorithm e roughK-means is used to deal with ambiguous objects It is ob-served that the RKM algorithm has improved the results ofthe classification algorithms e results of naıve Bayes withRKM are 8055 8014 8014 90 and 8478 in termsof accuracy sensitivity specificity precision and F-scorerespectively Similarly the results of SVM with RKM ap-proaches are 7778 7724 7887 8819 and 8235 corre-sponding to accuracy sensitivity and specificity precisionand F-score in that order e results of random forest with

RKM are 7720 5609 6905 and 620 Furthermore theobtained results by using KNN with RKM are 71307929 5658 7708 7708 and 7870 Finally fromthe obtained data it is investigated that the classificationalgorithm is improved by using the RKM algorithmFigures 4ndash7 show the performance of the classification al-gorithms with the RKM algorithm

42 Classification Results of Kidney Disease is sectiondemonstrates the classification of kidney diseases with thehelp of machine learning algorithms and the enhancedproposed system Table 7 shows the results obtained fromthe RKM algorithm to figure out the ambiguous objectsKidney diseases contain 400 instancese data are clusteredinto two clusters e rough K-means clusters data intoupper approximation and lower approximation e objectsthat have been clustered in lower approximation are 174instances ose objects that belong to lower approximationare regarded as approved objects because they belong to thesame cluster numbers e remaining objects that is 226objects are clustered in upper approximation which isconsidered as ambiguous objects

Table 8 shows the performance of existing machinelearning algorithms namely naıve Bayes SVM randomforest tree and KNN It is observed that there is a possibilityfor improving the classification algorithms if they handleambiguous objects It is also noted that the RKM algorithmhas improved the results of the classification algorithmsTable 9 shows results of machine learning techniques withthe RKM algorithm e results of naıve Bayes with RKMare 9811 9643 9615 9615 and 980478 withrespect to the evaluation matrices Similarly results ofSVM with RKM approaches are 100 100 100 100and 100 in terms of accuracy sensitivity and specificityprecision and F-score respectively e results of randomforest with RKM are 100 100 100 100 and 9802Furthermore the obtained results by using KNN withRKM are 8491 8065 9091 9259 and 8621Lastly from the obtained results it is found that theclassification algorithm is improved by using the RKMalgorithm Figures 8ndash11 display the performance of theclassification algorithms with the RKM algorithm forpredicting kidney diseases

43 Classification Results of Cancer Disease is sectionshows the classification of cancer disease using the existingmachine learning algorithms and the proposed system usingthe RKM algorithm e cancer data contain 569 instancesthat are classified into two classes such as benign and ma-lignant e soft computing RKM clustering algorithm isused to handle ambiguous objects It clusters data into two

Table 4 Results of RKM algorithm for handling ambiguous objectsin the diabetic dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 718 50

8 Journal of Healthcare Engineering

Table 5 Results of existing machine learning for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 6969 7072 66 8819 8235SVM 7316 7379 7209 8168 8235Random forest 76 556 656 556 6002KNN 6580 5679 5658 51 5380

Table 6 Results of machine learning after handling ambiguous objects for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 8055 8014 8014 90 8478SVM with RKM 7778 7724 7887 8819 8235Random forest with RKM 7720 569 6905 569 6206KNN with RKM 7130 7929 7067 7708 7817

69698055

70728014

66

80148819 90

82358478

0102030405060708090

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 4 Comparison results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for diabetic diseases

7316

77787379

7724

7209

78878168

8819

8235 8235

50

55

60

65

70

75

80

85

90

95

100

SVM SVM with RKM

Precision ()Accuracy ()F-score ()Sensitivity ()

Specificity ()

Figure 5 Comparison results of existing SVM classifier and SVM using RKM algorithm for diabetic diseases

Journal of Healthcare Engineering 9

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 6: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

(2) Otherwise if T ϕ vrarrisinA ( c

rarrj) In addition by

property (P2) vrarrisinA ( c

rarrj)

e rough k-means algorithm has stability and reliabilityfor handling ambiguity e rough k-means algorithm hasclustered objects into lower bound and upper bound eobjects in the upper bound are ambiguous objects whereasthe objects in the lower bound are correct objects e upperbound should not be empty and the objects in the upperbound can belong to one or more upper bounds in thecluster numbers Figure 3 shows a snapshot of output ob-tained from the RKM algorithm to determine the ambiguousobjects for improving the performance of machine learningalgorithms e objects in lower bound are correction ob-jects whereas the objects on boundary bound are ambiguousobjects

33 Classification Algorithms In this section conventionalmachine learning algorithms are discussed e automaticclassification namely naive Bayes (NB) support vectormachine (SVM) K-nearest neighbor (KNN) and randomforest tree are presented to predict chronic diseases forenhancing healthcare systems

331 Support Vector Machine Algorithm e supportvector machine is used to analyze data as classification andregression In the SVM algorithm the data point is con-sidered as n-dimensional space where there are a number offeatures of data and the values of features are the values ofa specific coordinate e classification of data is achievedby finding the best difference between the classes of datausing hyperplane A support vector machine algorithmclassifies data by separating the hyperactive plane of labeltraining data e SVM obtains lower error when themargin is large In the present research work two classes ofchronic diseases are used All types of kernel functions areapplied to classify the chronic disease datasets in whichradial basis function (RBF) along with kernel functionobtain high accuracy e kernel function is applied andobserved that the RBF function and kernel function areappropriate with the RKM algorithm to obtain goodaccuracy

k(x x) exp minusx minus x

2σ21113888 1113889 (1)

where x minus x2 is the square Euclidean distance between twofeature vectors and σ is a parameter

332 Naıve Bayes Algorithm Naive Bayes algorithm isdefined as a probabilistic method used to classify the datasetbased on the well-known Bayes theorem of probability enaıve Bayes classification algorithm works as prior proba-bility posterior probability likelihood probability and ev-idence probability It normally uses probabilitydistributions e working Bayesian algorithm is as followsAssume AA1 A2 A3 An is regarded as the featurevector of chronic disease features and the values of the

features are A1 A2 A3 An and are considered as anumber of features in the dataset C indicates a class ofchronic data as normal and abnormal e Bayes equation isshown as follows

Conditional probability

P(C | A) P(A C)

P(A) (2)

P(C | A) P(A C)

P(c) (3)

It is assumed that the predictor A on the given class c isindependent of the values of other predictors and it isknown as conditional class independence P(C | A) is theposterior probability of class c given predictor (feature)

eorem is as follows

P(C | A) P(C) P (A | C)

P(A) (4)

333 K-Nearest Neighbors Algorithm A K-nearest neigh-bors algorithm is a simple machine learning algorithmwhich uses the entire dataset in its training phase KNNalgorithm has low complexity in programming andimplementation e basic idea can be presented in asample space when its nearest neighboring features belongto a category and then the features belong to the samecategory e KNN classification algorithm can be usedwith either a single or a multidimensional feature datasetand can find the closest features It employs the Euclideandistance method for finding the closest point among thefeatures

D

x1 minus x2( 1113857 + y1 minus y2( 11138572

1113969

(5)

334 Random Forest Algorithm A decision tree algorithmis one of the powerful decisions It is used to build the blockof a random forest It works to select the best split of anobject from the dataset in each step To reduce the highharnessed of variance we can create multiple trees withvarious samples of datasets and combine this operation withbootstrap aggregation or bagging e disadvantage of thebootstrap aggregation method is used to spill the values ofeach tree which creates a problem in decision-makingFurthermore it makes predictions of training data similarand mitigates the variance originally sought us therandom forest algorithm can be further used for the clas-sification and regression problems and for the overfitting ofdata as well e selected attributes are measured byemploying the information gain method to discover thevalue or the information from the entire dataset e in-formation gain method is calculated for each splitting at-tribute with selecting high gain attributes It is assumed thatD is the dataset

6 Journal of Healthcare Engineering

info(D) minus 1113944m

i1pilogpi( 1113857 (6)

whereD is the dataset i 1 2 m is the class of dataset Dand the probability is pi

Let B be an attribute in dataset D and b1 b2 b3 bn1113864 1113865

are values of the attributes in B Attributes are a partition forgenerating the amount of information from attributes

info(D) 1113944n

j1

Dj

11138681113868111386811138681113868

11138681113868111386811138681113868

D⎛⎝ ⎞⎠lowast info Dj1113872 1113873 (7)

e attributes show the highest information as follows

Gain(B) info(D) minus infoB(D) (8)

34 Performance Measurement e performance measuresare used to test and evaluate the proposed system eaccuracy specificity and sensitivity precision recall and F-score evaluation matrices have been employed to test theproposed model e evaluation matrices are computed byusing the equations (9)ndash(13) as described below where wehave true positive (TP) true negative (TN) false positive(FP) and false negative (FN)

341 Accuracy Accuracy is the number of correct pre-dictions made by the model over all kinds of predictionsmade It is calculated as the total number of correct labels(TP+TN) divided by the total number of chronic diseasedatasets (P+N)

accuracy TP + TN

TP + FP + FN + TN (9)

342 Specificity Specificity (also called the true negativerate) is a measure that tells us about the percentage ofpatients who do not suffer from chronic diseases which arepredicted by the model as not chronic diseases

specificity TN

TN + FPtimes 100 (10)

343 Sensitivity Sensitivity (also called the true positiverate the recall or probability of detection) is the measurethat tells about the percentage of patients who actually sufferfrom chronic diseases which are diagnosed by the classi-fication algorithms on chronic diseases

specificity TP

TP + FNtimes 100 (11)

344 Precision Precision is a measure that tells about theproportion of patients that we diagnosed as having chronicdiseases actually had chronic diseases It is known aspositive predictive value (PPV)

precision TP

TP + FPtimes 100 (12)

345 F1-Score F1-score (also called the precision F-scoreF-measure and recall) is the harmonic mean (average) of theprecision and recall

F1 minus score 2lowast precisionlowast sensitivityprecision + sensitivity

100 (13)

4 Experimental Results and Discussion

erefore the rough K-means algorithm is applied forimproving the classification of chronic diseases It is used todetermine the ambiguous objects that have obstructed theclassification algorithms It has further experimented withvarious standard chronic datasets It is aimed here to im-prove the diagnosis of chronic diseases In the beginning theconventional classification algorithms are applied to predictchronic diseases However it is observed that the obtainedresults were not appropriate From the obtained results it isnoted that there are ambiguous objects that decrease theaccuracy of machine learning algorithms One of the biggestchallenges that we have faced within the implementation ofthe proposed system is the ambiguity embedded in thevariable of the standard dataset For this reason the RKMalgorithm is considered to handle these ambiguous objectsso that the accuracy of the classification algorithms can beimproved e RKM algorithm is appropriately designed for

Number of iterations 67Centroid for cluster 0 371 14655 7340 3180 30111 3550 059 3389Lower bound for cluster 08 13 16 20 31 43 53 54 56 73 95 99 111 132 139 144 153 162 156 195 199 206 215 22

Boundary region for cluster 04 14 35 39 91 114 130 150 204 214 223 236 243 244 292 293 298 308 309 325 325 338 382Centroid for cluster 1 370 12010 6882 2122 7561 3191 049 3331Lower bound for cluster 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 32 33 34 36 37

Boundary region for cluster 14 14 35 39 91 11 130 150 204 214 223 236 243 244 292 293 298 308 309 325 338 382

Fitness value is 20412240011599756End

Figure 3 Snapshot of output RKM algorithm

Journal of Healthcare Engineering 7

detecting the ambiguity in the chronic disease datasets eexperimental results have shown that the performance of theproposed system is better than that of the conventionalmodels For measuring and evaluating the performance ofthe proposed system the performance measures are appliede standard evaluation matrices namely accuracy spec-ificity sensitivity precision and F-score have been pre-sented to test the proposed system against the existingmachine learning techniques Moreover for validating theproposed system the datasets are divided into 70 train and30 test Numerous experiments have attempted to evaluatethe proposed system e results of machine learning al-gorithms and enhanced model to the various datasets arepresented as follows

41 ClassificationResults ofDiabeticsDisease In this sectiondifferent experiments of classification algorithms with theenhanced proposed system have been conducted e softcomputing rough K-means algorithm is used to handleambiguous objects e ambiguous objects in chronic dis-ease datasets have reduced the performance of machinelearning algorithms When applying the classification al-gorithm on the original diabetics data it is observed that theresults are not favourable From the data it is investigatedthat there are ambiguous objects that hinder the classifi-cation algorithms e diabetes data contain seven instancesand two classes ese ambiguous objects are examined byRKM clustering to assist in determining the exact class ofambiguous diseases or the closest one e dataset has beenclustered for two clusters corresponding into two classes thatare labelled variables in datasets e RKM algorithm hasclustered the ambiguous objects into upper approximationand lower approximationose objects that belong to upperapproximation which belongs to one or more clusternumbers are excluded Among 768 instances 718 instancesare clustered as a lower approximation Moreover theremaining objects are clustered as an upper approximationand are considered as ambiguous objects as well e am-biguous objects have been denied from the data e clas-sification algorithm is applied to process the data in a lowerapproximation for diagnosing the diseases Table 4 shows theresults obtained from the RKM algorithm used for dis-covering the ambiguous objects

Table 5 shows the results of the classification algorithmnamely naıve Bayes SVM random forest tree and KNN Itis observed that the obtained results are needed to improvee rough K-means is applied to enhance the existingmachine learning Table 6 shows the results of machinelearning techniques with RKM algorithm e roughK-means is used to deal with ambiguous objects It is ob-served that the RKM algorithm has improved the results ofthe classification algorithms e results of naıve Bayes withRKM are 8055 8014 8014 90 and 8478 in termsof accuracy sensitivity specificity precision and F-scorerespectively Similarly the results of SVM with RKM ap-proaches are 7778 7724 7887 8819 and 8235 corre-sponding to accuracy sensitivity and specificity precisionand F-score in that order e results of random forest with

RKM are 7720 5609 6905 and 620 Furthermore theobtained results by using KNN with RKM are 71307929 5658 7708 7708 and 7870 Finally fromthe obtained data it is investigated that the classificationalgorithm is improved by using the RKM algorithmFigures 4ndash7 show the performance of the classification al-gorithms with the RKM algorithm

42 Classification Results of Kidney Disease is sectiondemonstrates the classification of kidney diseases with thehelp of machine learning algorithms and the enhancedproposed system Table 7 shows the results obtained fromthe RKM algorithm to figure out the ambiguous objectsKidney diseases contain 400 instancese data are clusteredinto two clusters e rough K-means clusters data intoupper approximation and lower approximation e objectsthat have been clustered in lower approximation are 174instances ose objects that belong to lower approximationare regarded as approved objects because they belong to thesame cluster numbers e remaining objects that is 226objects are clustered in upper approximation which isconsidered as ambiguous objects

Table 8 shows the performance of existing machinelearning algorithms namely naıve Bayes SVM randomforest tree and KNN It is observed that there is a possibilityfor improving the classification algorithms if they handleambiguous objects It is also noted that the RKM algorithmhas improved the results of the classification algorithmsTable 9 shows results of machine learning techniques withthe RKM algorithm e results of naıve Bayes with RKMare 9811 9643 9615 9615 and 980478 withrespect to the evaluation matrices Similarly results ofSVM with RKM approaches are 100 100 100 100and 100 in terms of accuracy sensitivity and specificityprecision and F-score respectively e results of randomforest with RKM are 100 100 100 100 and 9802Furthermore the obtained results by using KNN withRKM are 8491 8065 9091 9259 and 8621Lastly from the obtained results it is found that theclassification algorithm is improved by using the RKMalgorithm Figures 8ndash11 display the performance of theclassification algorithms with the RKM algorithm forpredicting kidney diseases

43 Classification Results of Cancer Disease is sectionshows the classification of cancer disease using the existingmachine learning algorithms and the proposed system usingthe RKM algorithm e cancer data contain 569 instancesthat are classified into two classes such as benign and ma-lignant e soft computing RKM clustering algorithm isused to handle ambiguous objects It clusters data into two

Table 4 Results of RKM algorithm for handling ambiguous objectsin the diabetic dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 718 50

8 Journal of Healthcare Engineering

Table 5 Results of existing machine learning for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 6969 7072 66 8819 8235SVM 7316 7379 7209 8168 8235Random forest 76 556 656 556 6002KNN 6580 5679 5658 51 5380

Table 6 Results of machine learning after handling ambiguous objects for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 8055 8014 8014 90 8478SVM with RKM 7778 7724 7887 8819 8235Random forest with RKM 7720 569 6905 569 6206KNN with RKM 7130 7929 7067 7708 7817

69698055

70728014

66

80148819 90

82358478

0102030405060708090

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 4 Comparison results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for diabetic diseases

7316

77787379

7724

7209

78878168

8819

8235 8235

50

55

60

65

70

75

80

85

90

95

100

SVM SVM with RKM

Precision ()Accuracy ()F-score ()Sensitivity ()

Specificity ()

Figure 5 Comparison results of existing SVM classifier and SVM using RKM algorithm for diabetic diseases

Journal of Healthcare Engineering 9

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 7: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

info(D) minus 1113944m

i1pilogpi( 1113857 (6)

whereD is the dataset i 1 2 m is the class of dataset Dand the probability is pi

Let B be an attribute in dataset D and b1 b2 b3 bn1113864 1113865

are values of the attributes in B Attributes are a partition forgenerating the amount of information from attributes

info(D) 1113944n

j1

Dj

11138681113868111386811138681113868

11138681113868111386811138681113868

D⎛⎝ ⎞⎠lowast info Dj1113872 1113873 (7)

e attributes show the highest information as follows

Gain(B) info(D) minus infoB(D) (8)

34 Performance Measurement e performance measuresare used to test and evaluate the proposed system eaccuracy specificity and sensitivity precision recall and F-score evaluation matrices have been employed to test theproposed model e evaluation matrices are computed byusing the equations (9)ndash(13) as described below where wehave true positive (TP) true negative (TN) false positive(FP) and false negative (FN)

341 Accuracy Accuracy is the number of correct pre-dictions made by the model over all kinds of predictionsmade It is calculated as the total number of correct labels(TP+TN) divided by the total number of chronic diseasedatasets (P+N)

accuracy TP + TN

TP + FP + FN + TN (9)

342 Specificity Specificity (also called the true negativerate) is a measure that tells us about the percentage ofpatients who do not suffer from chronic diseases which arepredicted by the model as not chronic diseases

specificity TN

TN + FPtimes 100 (10)

343 Sensitivity Sensitivity (also called the true positiverate the recall or probability of detection) is the measurethat tells about the percentage of patients who actually sufferfrom chronic diseases which are diagnosed by the classi-fication algorithms on chronic diseases

specificity TP

TP + FNtimes 100 (11)

344 Precision Precision is a measure that tells about theproportion of patients that we diagnosed as having chronicdiseases actually had chronic diseases It is known aspositive predictive value (PPV)

precision TP

TP + FPtimes 100 (12)

345 F1-Score F1-score (also called the precision F-scoreF-measure and recall) is the harmonic mean (average) of theprecision and recall

F1 minus score 2lowast precisionlowast sensitivityprecision + sensitivity

100 (13)

4 Experimental Results and Discussion

erefore the rough K-means algorithm is applied forimproving the classification of chronic diseases It is used todetermine the ambiguous objects that have obstructed theclassification algorithms It has further experimented withvarious standard chronic datasets It is aimed here to im-prove the diagnosis of chronic diseases In the beginning theconventional classification algorithms are applied to predictchronic diseases However it is observed that the obtainedresults were not appropriate From the obtained results it isnoted that there are ambiguous objects that decrease theaccuracy of machine learning algorithms One of the biggestchallenges that we have faced within the implementation ofthe proposed system is the ambiguity embedded in thevariable of the standard dataset For this reason the RKMalgorithm is considered to handle these ambiguous objectsso that the accuracy of the classification algorithms can beimproved e RKM algorithm is appropriately designed for

Number of iterations 67Centroid for cluster 0 371 14655 7340 3180 30111 3550 059 3389Lower bound for cluster 08 13 16 20 31 43 53 54 56 73 95 99 111 132 139 144 153 162 156 195 199 206 215 22

Boundary region for cluster 04 14 35 39 91 114 130 150 204 214 223 236 243 244 292 293 298 308 309 325 325 338 382Centroid for cluster 1 370 12010 6882 2122 7561 3191 049 3331Lower bound for cluster 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 32 33 34 36 37

Boundary region for cluster 14 14 35 39 91 11 130 150 204 214 223 236 243 244 292 293 298 308 309 325 338 382

Fitness value is 20412240011599756End

Figure 3 Snapshot of output RKM algorithm

Journal of Healthcare Engineering 7

detecting the ambiguity in the chronic disease datasets eexperimental results have shown that the performance of theproposed system is better than that of the conventionalmodels For measuring and evaluating the performance ofthe proposed system the performance measures are appliede standard evaluation matrices namely accuracy spec-ificity sensitivity precision and F-score have been pre-sented to test the proposed system against the existingmachine learning techniques Moreover for validating theproposed system the datasets are divided into 70 train and30 test Numerous experiments have attempted to evaluatethe proposed system e results of machine learning al-gorithms and enhanced model to the various datasets arepresented as follows

41 ClassificationResults ofDiabeticsDisease In this sectiondifferent experiments of classification algorithms with theenhanced proposed system have been conducted e softcomputing rough K-means algorithm is used to handleambiguous objects e ambiguous objects in chronic dis-ease datasets have reduced the performance of machinelearning algorithms When applying the classification al-gorithm on the original diabetics data it is observed that theresults are not favourable From the data it is investigatedthat there are ambiguous objects that hinder the classifi-cation algorithms e diabetes data contain seven instancesand two classes ese ambiguous objects are examined byRKM clustering to assist in determining the exact class ofambiguous diseases or the closest one e dataset has beenclustered for two clusters corresponding into two classes thatare labelled variables in datasets e RKM algorithm hasclustered the ambiguous objects into upper approximationand lower approximationose objects that belong to upperapproximation which belongs to one or more clusternumbers are excluded Among 768 instances 718 instancesare clustered as a lower approximation Moreover theremaining objects are clustered as an upper approximationand are considered as ambiguous objects as well e am-biguous objects have been denied from the data e clas-sification algorithm is applied to process the data in a lowerapproximation for diagnosing the diseases Table 4 shows theresults obtained from the RKM algorithm used for dis-covering the ambiguous objects

Table 5 shows the results of the classification algorithmnamely naıve Bayes SVM random forest tree and KNN Itis observed that the obtained results are needed to improvee rough K-means is applied to enhance the existingmachine learning Table 6 shows the results of machinelearning techniques with RKM algorithm e roughK-means is used to deal with ambiguous objects It is ob-served that the RKM algorithm has improved the results ofthe classification algorithms e results of naıve Bayes withRKM are 8055 8014 8014 90 and 8478 in termsof accuracy sensitivity specificity precision and F-scorerespectively Similarly the results of SVM with RKM ap-proaches are 7778 7724 7887 8819 and 8235 corre-sponding to accuracy sensitivity and specificity precisionand F-score in that order e results of random forest with

RKM are 7720 5609 6905 and 620 Furthermore theobtained results by using KNN with RKM are 71307929 5658 7708 7708 and 7870 Finally fromthe obtained data it is investigated that the classificationalgorithm is improved by using the RKM algorithmFigures 4ndash7 show the performance of the classification al-gorithms with the RKM algorithm

42 Classification Results of Kidney Disease is sectiondemonstrates the classification of kidney diseases with thehelp of machine learning algorithms and the enhancedproposed system Table 7 shows the results obtained fromthe RKM algorithm to figure out the ambiguous objectsKidney diseases contain 400 instancese data are clusteredinto two clusters e rough K-means clusters data intoupper approximation and lower approximation e objectsthat have been clustered in lower approximation are 174instances ose objects that belong to lower approximationare regarded as approved objects because they belong to thesame cluster numbers e remaining objects that is 226objects are clustered in upper approximation which isconsidered as ambiguous objects

Table 8 shows the performance of existing machinelearning algorithms namely naıve Bayes SVM randomforest tree and KNN It is observed that there is a possibilityfor improving the classification algorithms if they handleambiguous objects It is also noted that the RKM algorithmhas improved the results of the classification algorithmsTable 9 shows results of machine learning techniques withthe RKM algorithm e results of naıve Bayes with RKMare 9811 9643 9615 9615 and 980478 withrespect to the evaluation matrices Similarly results ofSVM with RKM approaches are 100 100 100 100and 100 in terms of accuracy sensitivity and specificityprecision and F-score respectively e results of randomforest with RKM are 100 100 100 100 and 9802Furthermore the obtained results by using KNN withRKM are 8491 8065 9091 9259 and 8621Lastly from the obtained results it is found that theclassification algorithm is improved by using the RKMalgorithm Figures 8ndash11 display the performance of theclassification algorithms with the RKM algorithm forpredicting kidney diseases

43 Classification Results of Cancer Disease is sectionshows the classification of cancer disease using the existingmachine learning algorithms and the proposed system usingthe RKM algorithm e cancer data contain 569 instancesthat are classified into two classes such as benign and ma-lignant e soft computing RKM clustering algorithm isused to handle ambiguous objects It clusters data into two

Table 4 Results of RKM algorithm for handling ambiguous objectsin the diabetic dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 718 50

8 Journal of Healthcare Engineering

Table 5 Results of existing machine learning for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 6969 7072 66 8819 8235SVM 7316 7379 7209 8168 8235Random forest 76 556 656 556 6002KNN 6580 5679 5658 51 5380

Table 6 Results of machine learning after handling ambiguous objects for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 8055 8014 8014 90 8478SVM with RKM 7778 7724 7887 8819 8235Random forest with RKM 7720 569 6905 569 6206KNN with RKM 7130 7929 7067 7708 7817

69698055

70728014

66

80148819 90

82358478

0102030405060708090

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 4 Comparison results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for diabetic diseases

7316

77787379

7724

7209

78878168

8819

8235 8235

50

55

60

65

70

75

80

85

90

95

100

SVM SVM with RKM

Precision ()Accuracy ()F-score ()Sensitivity ()

Specificity ()

Figure 5 Comparison results of existing SVM classifier and SVM using RKM algorithm for diabetic diseases

Journal of Healthcare Engineering 9

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 8: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

detecting the ambiguity in the chronic disease datasets eexperimental results have shown that the performance of theproposed system is better than that of the conventionalmodels For measuring and evaluating the performance ofthe proposed system the performance measures are appliede standard evaluation matrices namely accuracy spec-ificity sensitivity precision and F-score have been pre-sented to test the proposed system against the existingmachine learning techniques Moreover for validating theproposed system the datasets are divided into 70 train and30 test Numerous experiments have attempted to evaluatethe proposed system e results of machine learning al-gorithms and enhanced model to the various datasets arepresented as follows

41 ClassificationResults ofDiabeticsDisease In this sectiondifferent experiments of classification algorithms with theenhanced proposed system have been conducted e softcomputing rough K-means algorithm is used to handleambiguous objects e ambiguous objects in chronic dis-ease datasets have reduced the performance of machinelearning algorithms When applying the classification al-gorithm on the original diabetics data it is observed that theresults are not favourable From the data it is investigatedthat there are ambiguous objects that hinder the classifi-cation algorithms e diabetes data contain seven instancesand two classes ese ambiguous objects are examined byRKM clustering to assist in determining the exact class ofambiguous diseases or the closest one e dataset has beenclustered for two clusters corresponding into two classes thatare labelled variables in datasets e RKM algorithm hasclustered the ambiguous objects into upper approximationand lower approximationose objects that belong to upperapproximation which belongs to one or more clusternumbers are excluded Among 768 instances 718 instancesare clustered as a lower approximation Moreover theremaining objects are clustered as an upper approximationand are considered as ambiguous objects as well e am-biguous objects have been denied from the data e clas-sification algorithm is applied to process the data in a lowerapproximation for diagnosing the diseases Table 4 shows theresults obtained from the RKM algorithm used for dis-covering the ambiguous objects

Table 5 shows the results of the classification algorithmnamely naıve Bayes SVM random forest tree and KNN Itis observed that the obtained results are needed to improvee rough K-means is applied to enhance the existingmachine learning Table 6 shows the results of machinelearning techniques with RKM algorithm e roughK-means is used to deal with ambiguous objects It is ob-served that the RKM algorithm has improved the results ofthe classification algorithms e results of naıve Bayes withRKM are 8055 8014 8014 90 and 8478 in termsof accuracy sensitivity specificity precision and F-scorerespectively Similarly the results of SVM with RKM ap-proaches are 7778 7724 7887 8819 and 8235 corre-sponding to accuracy sensitivity and specificity precisionand F-score in that order e results of random forest with

RKM are 7720 5609 6905 and 620 Furthermore theobtained results by using KNN with RKM are 71307929 5658 7708 7708 and 7870 Finally fromthe obtained data it is investigated that the classificationalgorithm is improved by using the RKM algorithmFigures 4ndash7 show the performance of the classification al-gorithms with the RKM algorithm

42 Classification Results of Kidney Disease is sectiondemonstrates the classification of kidney diseases with thehelp of machine learning algorithms and the enhancedproposed system Table 7 shows the results obtained fromthe RKM algorithm to figure out the ambiguous objectsKidney diseases contain 400 instancese data are clusteredinto two clusters e rough K-means clusters data intoupper approximation and lower approximation e objectsthat have been clustered in lower approximation are 174instances ose objects that belong to lower approximationare regarded as approved objects because they belong to thesame cluster numbers e remaining objects that is 226objects are clustered in upper approximation which isconsidered as ambiguous objects

Table 8 shows the performance of existing machinelearning algorithms namely naıve Bayes SVM randomforest tree and KNN It is observed that there is a possibilityfor improving the classification algorithms if they handleambiguous objects It is also noted that the RKM algorithmhas improved the results of the classification algorithmsTable 9 shows results of machine learning techniques withthe RKM algorithm e results of naıve Bayes with RKMare 9811 9643 9615 9615 and 980478 withrespect to the evaluation matrices Similarly results ofSVM with RKM approaches are 100 100 100 100and 100 in terms of accuracy sensitivity and specificityprecision and F-score respectively e results of randomforest with RKM are 100 100 100 100 and 9802Furthermore the obtained results by using KNN withRKM are 8491 8065 9091 9259 and 8621Lastly from the obtained results it is found that theclassification algorithm is improved by using the RKMalgorithm Figures 8ndash11 display the performance of theclassification algorithms with the RKM algorithm forpredicting kidney diseases

43 Classification Results of Cancer Disease is sectionshows the classification of cancer disease using the existingmachine learning algorithms and the proposed system usingthe RKM algorithm e cancer data contain 569 instancesthat are classified into two classes such as benign and ma-lignant e soft computing RKM clustering algorithm isused to handle ambiguous objects It clusters data into two

Table 4 Results of RKM algorithm for handling ambiguous objectsin the diabetic dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 718 50

8 Journal of Healthcare Engineering

Table 5 Results of existing machine learning for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 6969 7072 66 8819 8235SVM 7316 7379 7209 8168 8235Random forest 76 556 656 556 6002KNN 6580 5679 5658 51 5380

Table 6 Results of machine learning after handling ambiguous objects for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 8055 8014 8014 90 8478SVM with RKM 7778 7724 7887 8819 8235Random forest with RKM 7720 569 6905 569 6206KNN with RKM 7130 7929 7067 7708 7817

69698055

70728014

66

80148819 90

82358478

0102030405060708090

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 4 Comparison results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for diabetic diseases

7316

77787379

7724

7209

78878168

8819

8235 8235

50

55

60

65

70

75

80

85

90

95

100

SVM SVM with RKM

Precision ()Accuracy ()F-score ()Sensitivity ()

Specificity ()

Figure 5 Comparison results of existing SVM classifier and SVM using RKM algorithm for diabetic diseases

Journal of Healthcare Engineering 9

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 9: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

Table 5 Results of existing machine learning for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 6969 7072 66 8819 8235SVM 7316 7379 7209 8168 8235Random forest 76 556 656 556 6002KNN 6580 5679 5658 51 5380

Table 6 Results of machine learning after handling ambiguous objects for diagnosing diabetic diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 8055 8014 8014 90 8478SVM with RKM 7778 7724 7887 8819 8235Random forest with RKM 7720 569 6905 569 6206KNN with RKM 7130 7929 7067 7708 7817

69698055

70728014

66

80148819 90

82358478

0102030405060708090

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 4 Comparison results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for diabetic diseases

7316

77787379

7724

7209

78878168

8819

8235 8235

50

55

60

65

70

75

80

85

90

95

100

SVM SVM with RKM

Precision ()Accuracy ()F-score ()Sensitivity ()

Specificity ()

Figure 5 Comparison results of existing SVM classifier and SVM using RKM algorithm for diabetic diseases

Journal of Healthcare Engineering 9

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 10: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

clusters according to the class label of data e RKMtechnique has clustered data into lower approximation andupper approximation e objects belonged to the lowerapproximation are appropriate objects and are processed byusing a machine learning algorithm However the objectsthat have been clustered into upper approximation areconsidered as ambiguous ones e RKM is clustered into539 objects in the lower approximation and the remainingobjects are in the upper approximation Table 10 demon-strates the results of the RKM algorithm Subsequentlymachine learning is applied to diagnose cancer as benign andmalignant

Table 11 shows the obtained results of conventionalmachine learning for the classification of cancer disease It isnoted that the results need more improvement and theRKM algorithm is applied to enhance the existing machine

learning algorithm Table 12 shows results analysis of theproposed model e results of naıve Bayes with RKM are9444 9495 9365 9592 and 9543 in terms ofaccuracy sensitivity specificity precision and F-scoreSimilarly the results of SVM with RKM approaches are9753 9907 9444 9727 and 9817 regarding ac-curacy sensitivity specificity precision and F-score re-spectively e results of random forest with RKM are9630 9309 9601 95 and 9309 Furthermore the

76 772

556 569656

6905

556 5696002 6206

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 6 Comparison results of existing random forest classifier and random forest using RKM algorithm for diabetic diseases

658713

5679

7929

5658

7067

51

7708

538

7817

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 7 Comparison results of existing KNN classifier and KNN using RKM algorithm for diabetic diseases

Table 7 Results of RKM algorithm for handling ambiguous objectsin kidney dataset

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

10 Journal of Healthcare Engineering

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 11: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

Table 8 Results of existing machine learning for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 9587 8936 8936 9367 9673SVM 502 370 400 420 51Random forest 9807 100 966 100 9802KNN 6957 6003 7439 8265 7686

Table 9 Results of machine learning after handling ambiguous objects for diagnosing kidney diseases

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9811 9643 9615 9615 9804SVM with RKM 100 100 100 100 100Random forest with RKM 100 100 100 100 100KNN with RKM 8491 8065 9091 9259 8621

9587

9811

8936

9643

8936

9615

9367

961596739804

84

86

88

90

92

94

96

98

100

Naiumlve Bayes Naiumlve Bayes with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 8 Comparison of results of existing naıve Bayes classifier and naıve Bayes using RKM algorithm for kidney diseases

502

100

37

100

40

100

42

100

51

100

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 9 Comparison of results of existing SVM classifier and SVM using the RKM algorithm for kidney diseases

Journal of Healthcare Engineering 11

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 12: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

obtained results by using KNN with RKM are 85809505 7049 8421 and 8930 Finally from theobtained results it is investigated that the classification al-gorithms are improved by using a soft clustering RKM al-gorithm Figures 12ndash15 display the performance of theclassification algorithms with the RKM algorithm for thediagnosis of cancer disease From graphic representationsthat are shown it can be noted that the proposed system isbetter

5 Comparative Analysis

In this section a comparative analysis between the proposedmodel and some of the other state-of-the-art work is used inthe same datasetse comparison is very important because

it examines the results of the proposed model e accuracymetric is used to compare the proposed model with theexisting classification algorithms Table 13 shows the resultsof the proposed system and the existing neural networkapproach It is investigated that the results of the proposedsystem are better than those of the existing neural networkapproach

9807 100100 100966 100100 1009802 100

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 10 Comparison of results of existing random forest classifier and random forest using the RKM algorithm for kidney diseases

6957

8491

6003

80657439

90918265

9259

7686

8621

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Precision ()

Specificity ()

Accuracy ()Sensitivity () F-score ()

Figure 11 Comparison results of existing KNN classifier and KNN using the RKM algorithm for kidney diseases

Table 10 Results analysis of machine learning algorithm using theRKM algorithm for the diagnosis of cancer disease

Cluster number Lower approximation Upper approximationCluster 1Cluster 2 174 226

12 Journal of Healthcare Engineering

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 13: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

Table 11 Results of existing machine learning for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes 8889 8824 8986 9278 9045SVM 9591 9649 9444 9727 9201Random forest 9506 902 9605 90 932KNN 7953 8942 6418 7949 8416

Table 12 Results of machine learning after handling ambiguous objects for the diagnosis of cancer disease dataset

Model Accuracy () Sensitivity () Specificity () Precision () F-score ()Naıve Bayes with RKM 9444 9495 9365 9592 9543SVM with RKM 9753 9907 9444 9727 9817Random forest with RKM 9630 9309 9601 950 9309KNN with RKM 8580 9505 7049 8421 8930

88899444

88249495

8986 93659278 95929045

9543

0102030405060708090

100

Naiumlve Bayes with RKMNaiumlve Bayes

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 12 Comparison of results of the existing naıve Bayes classifier and naıve Bayes using the RKM algorithm for cancer disease

9591 97539649 99079444 94449727 9727

92019817

0

10

20

30

40

50

60

70

80

90

100

SVM SVM with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 13 Comparison of results of the existing SVM classifier and SVM using the RKM algorithm for cancer disease

Journal of Healthcare Engineering 13

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 14: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

6 Conclusion

e performance of the existing machine learning isthwarted from diagnosing chronic diseases because of the

availability of ambiguous objects ese ambiguous objectsshow traits in more than one class To identify and processthe ambiguous objects explicitly we have demonstrated thenoncrisp RKM clustering that can handle these ambiguous

9506 963902 93099605 9601

9095932 9309

0

10

20

30

40

50

60

70

80

90

100

Random forest Random forest with RKM

Specificity ()

Accuracy () Precision ()Sensitivity () F-score ()

Figure 14 Comparison of results of the existing random forest classifier and random forest using the RKM algorithm for cancer disease

79538588942

9505

64187049

794984218416

893

0

10

20

30

40

50

60

70

80

90

100

KNN KNN with RKM

Specificity ()

Accuracy ()Sensitivity ()

Precision ()F-score ()

Figure 15 Comparison of results of the existing KNN classifier and KNN using the RKM algorithm for cancer disease

Table 13 Results of the proposed system against the existing neural network approach

Technique References Accuracy () DiseasesANN [45] 804 KidneyGeneral regression neural network [46] 8020 DiabetesBack propagation neural network [47] 9503 KidneyBPNNs [48] 9284 Breast cancerProposed model 100 KidneyProposed model 8055 DiabetesPropose model 9753 Breast cancer

14 Journal of Healthcare Engineering

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 15: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

objects to improve the accuracy of classification algorithmse framework of the proposed system lies in its use of a softclustering algorithm namely rough K-means that can beemployed for modelling ambiguity e rough K-meansclustering can assist in determining the exact class of theambiguous objects or the approximate ones It is observedthat the RKM algorithm has increased the performance ofthe conventional machine algorithms to predict chronicdiseases e ambiguous objects are excluded from chronicdataset erefore the RKM algorithm clustered the datainto lower and upper approximation e objects clusteredin lower approximation are considered as appropriate ob-jects Additionally the objects that belong to the upperapproximation are denied and considered as ambiguousobjects e objects that belong to the lower approximationare proposed by using machine learning algorithms topredict chronic diseases e experimental results demon-strate that the proposed system is successfully employed forthe diagnosis of chronic diseases Comparative analysisresults between existing machine learning algorithms andthe proposed system are presented Moreover it is observedthat the results of the proposed system are superior in termsof accuracy specificity sensitivity precision recall andF-score performance measures Identifying common websearch activity behaviour is regarded as a proxy for chronicdisease risk factors using machine learning algorithms canbe considered in future work

Data Availability

e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

e authors declare no conflicts of interest

Acknowledgments

e authors acknowledge the Deanship of Scientific Re-search at King Faisal University for the financial supportunder Nasher Track (grant no 186163) e authors ac-knowledge the Deanship of Scientific Research at King FaisalUniversity for the financial support

Supplementary Materials

e chronic disease datasets have been collected from dif-ferent resources as follows 1 Diabetics disease dataset thediabetes data collected from the machine learning repositorycontained nine attributes eight features and one class isdataset has been gathered from an automatic electronicrecording device and paper records 2 Breast cancer diseasedataset the cancer data collected from the Kaggle containednine attributes 30 features and 1 class ese features areobtained from digitized images of breast cancer [38] 3Kidney disease dataset the collected kidney data from theKaggle contained 26 attributes 24 features and 1 class [38](Supplementary Materials)

References

[1] httpswwwcdcgov[2] httpswwwwhoint[3] J H Witten and E Frank ldquoData Mining practical machine

learning tools and techniquesrdquo International Journal onCybernetics amp Informatics (IJCI) vol 4 no 4 2015

[4] A V Solanki ldquoData mining techniques using WEKA clas-sification for sickle cell diseaserdquo International Journal ofComputer Science and Information Technology vol 5 no 4pp 5857ndash5860 2014

[5] J Joshi D Rinal and J Patel ldquoDiagnosis and prognosis ofbreast cancer using classification rulesrdquo International Journalof Engineering Research and General Science vol 2 no 6pp 315ndash323 2014

[6] S K David A T Saeb and K Al Rubeaan ldquoComparativeanalysis of data mining tools and classification techniquesusing WEKA in medical bioinformaticsrdquo Computer Engi-neering and Intelligent Systems vol 4 no 13 pp 28ndash38 2013

[7] S Vijayarani and S Sudha ldquoComparative analysis of classi-fication function techniques for heart disease predictionrdquoInternational Journal of Innovative Research in Computer andCommunication Engineering vol 1 no 3 pp 735ndash741 2013

[8] C Sugandhi P Yasodha and M Kannan ldquoAnalysis of apopulation of cataract patient database in WEKA toolrdquo In-ternational Journal of Scientific and Engineering Researchvol 2 no 10 2011

[9] P Yasodha and M Kannan ldquoAnalysis of population of di-abetic patient database in WEKA toolrdquo International Journalof Science and Engineering Research vol 2 no 5 2011

[10] M F Bin Othman and T M S Yau ldquoComparison of differentclassification techniques using WEKA for breast cancerrdquo inProceedings of the 3rd Kuala Lumpur International Conferenceon Biomedical Engineering pp 520ndash523 Springer HeidelbergKuala Lumpur Malaysia December 2006

[11] A Israa and M Azzeh ldquoComparative study of predictingheart diseases by using data mining classificationrdquo Interna-tional Journal of Computer Science and Information Securityvol 14 no 12 2016

[12] D Sisodia and D S Sisodia ldquoPrediction of diabetes usingclassification algorithmsrdquo Procedia Computer Sciencevol 132 pp 1578ndash1585 2018

[13] F A Syed A Aryati S Wajihah and M R Shahir ldquoClas-sification of obesity childhood among 6 school children usingdata mining techniquerdquo International Conference on SoftComputing and Data Mining p 549 2016

[14] J Sandeep R Mula and B Kuriachan ldquoChronic kidneydisease analysis using machine learning algorithmsrdquo Inter-national Journal for Research in Applied Science amp EngineeringTechnology vol 6 no 1 pp 3367ndash3379 2018

[15] B J Sahana and Minavathi ldquoKidney disease prediction usingdata mining classification techniquesand ANNrdquo InternationalJournal of Innovative Research in Computer and Communi-cation Engineering vol 4 no 5 pp 8675ndash8679 2017

[16] K Polaraju and D D Prasad ldquoPrediction of heart diseaseusing multiple linear regression modelrdquo International Journalof Engineering Development and Research Development vol 5no 4 pp 1419ndash1425 2017

[17] C Kim Y Son and S Youm ldquoChronic disease predictionusing character-recurrent neural network in the presence ofmissing informationrdquo Applied Sciences vol 9 no 10 p 21702019

[18] K Ng S Steinbuhl C deFilippi S Dey and W StewartldquoEarly detection of heart failure using electronic health

Journal of Healthcare Engineering 15

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering

Page 16: SoftClusteringforEnhancingtheDiagnosisofChronic ... · algorithms on the Weka tool for predicting the prevalent ... rithms for diagnosing breast cancer because of its high performance

records practical implications for time before diagnosis datadiversity data quantity and data densityrdquo Journal of Patient-Centered Research and Reviews vol 4 no 3 pp 174-1752017

[19] X Zhang H Zhao S Zhang and R Li ldquoA novel deep neuralnetwork model for multi-label chronic disease predictionrdquoFrontiers in Genetics vol 10 2019

[20] H Kriplani B Patel and S Roy ldquoPrediction of chronic kidneydiseases using deep artificial neural network techniquerdquoComputer Aided Intervention and Diagnostics in Clinical andMedical Images Springer Berlin Germany pp 179ndash187 2019

[21] J Liu Z Zhang and N Razavian ldquoDeep EHR chronic diseaseprediction using medical notesrdquo 2018 httpsarxivorgabs180804928

[22] T S Brisimi T Xu T Wang W Dai W G Adams andI C Paschalidis ldquoPredicting chronic disease hospitalizationsfrom electronic health records an interpretable classificationapproachrdquo Proceedings of the IEEE vol 106 no 4 pp 690ndash707 2018

[23] M Chen Y Hao K Hwang L Wang and L Wang ldquoDiseaseprediction by machine learning over big data from healthcarecommunitiesrdquo IEEE Access vol 5 pp 8869ndash8879 2017

[24] J Patel T Upadhyay and S Patel ldquoHeart disease predictionusing machine learning and data mining techniquerdquo Inter-national Journal of Computer Science amp Communicationvol 7 no 1 pp 129ndash137 2016

[25] K Deepika and S S Seema ldquoPredictive analytics to preventand control chronic diseaserdquo in Proceedings of the Interna-tional Conference on Applied and Aeoretical Computing andCommunication Technology (iCATccT) pp 381ndash386 IEEEBangalore India July 2016

[26] A Ul Haq J P Li M H Memon S Nazir and R Sun ldquoAhybrid intelligent system framework for the prediction ofheart disease using machine learning algorithmsrdquo MobileInformation System vol 2018 Article ID 3860146 21 pages2018

[27] S Ahmed M T Kabir N T Mahmood and R M RahmanldquoDiagnosis of kidney disease using fuzzy expert systemrdquo inProceedings of the 8th International Conference on SoftwareKnowledge Information Management and Applications(SKIMA 2014) IEEE Dhaka Bangladesh December 2014

[28] G Caocci R Baccoli R Littera S Orru C Carcassi and G LaNasa ldquoComparison between an ANN and logistic regressionin predicting long term kidney transplantation outcomerdquo inArtificial Neural NetworksmdashArchitectures and Applicationspp 106ndash124 IntechOpen Philadelphia PA USA 2012

[29] L Xun W Xiaoping L Ningshan and L Tanqi ldquoApplicationof radial basis function neural network to estimate glomerularfiltration rate in Chinese patients with chronic kidney dis-easerdquo in Proceedings of the International Conference onComputer Application and System Modeling (ICCASM)vol 15 pp V15ndashV332 IEEE Taiyuan China October 2010

[30] H Polat H D Mehr and A Cetin ldquoDiagnosis of chronickidney disease based on support vector machine by featureselection methodsrdquo Systems-Level Quality Improvementvol 41 no 55 pp 2ndash11 2017

[31] V Anuja Kumari and R Chitra ldquoClassification of diabetesdisease using support vector machinerdquo International Journalof Engineering Research and Applications (IJERA) vol 3 no 2pp 1797ndash1801 2013

[32] A Hamza and H Moetque ldquoDiabetes disease diagnosismethod based on feature extraction using K-SVMrdquo Inter-national Journal of Advanced Computer Science and Appli-cations vol 8 no 1 p 236 2017

[33] A Mert N Kilic and A Akan ldquoBreast cancer classification byusing support vector machines with reduced dimensionrdquo inProceedings of the 53rd International Symposium ELMARZadar Croatia September 2011

[34] D West P Mangiameli R Rampal and V West ldquoEnsemblestrategies for a medical diagnostic decision support system abreast cancer diagnosis applicationrdquo European Journal ofOperational Research vol 162 no 2 pp 532ndash551 2005

[35] R Gautam P Kaur and M Sharma ldquoA comprehensive re-view on nature inspired computing algorithms for the di-agnosis of chronic disorders in human beingsrdquo Progress inArtificial Intelligence vol 8 no 4 pp 401ndash424 2019

[36] E M Rojas H B R Moreno and J S M Palencia ldquoCon-tributions of machine learning in the health area as support inthe diagnosis and care of chronic diseasesrdquo Innovation inMedicine and Healthcare Systems and Multimedia Springervol 145 pp 261ndash269 Berlin Germany 2019

[37] httparchiveicsuciedumldatasetsdiabetes[38] httpswwwkagglecomucimlbreast-cancer-wisconsin-data[39] T H Aldhyani andM R Joshi ldquoHandling ambiguous packets

in intrusion detectionrdquo in Proceedings of the 3rd InternationalConference on Signal Processing Communication and Net-working (ICSCN) IEEE Chennai India March 2015

[40] P Lingras and C West ldquoInterval set clustering of web userswith rough k-meansrdquo Journal of Intelligent InformationSystems vol 23 no 1 pp 5ndash16 2004

[41] J A Hartigan and M A Wong ldquoAlgorithm AS 136 ak-means clustering algorithmrdquo Applied Statistics vol 28no 1 pp 100ndash108 1979

[42] Q MacQueen ldquoSome methods for classification and analysisof multivariate observationsrdquordquo in Proceedings of the FifthBerkeley Symposium on Mathematical Statistics and Proba-bility L M L Cam and J Neyman Eds vol 1 pp 281ndash297University of California Press Berkeley CA USA 1967

[43] G Peters ldquoSome refinements of rough -means clusteringrdquoPattern Recognition vol 39 no 8 pp 1481ndash1491 2006

[44] M Joshi and P Lingras ldquoEvidential clustering or roughclustering the choice is yoursrdquo in Rough Sets and KnowledgeTechnology LNCS T Li H S Nguyen G Wang et al Edsvol 7414 pp 123ndash128 Springer Heidelberg Germany 2012

[45] S Ramya and N Radha ldquoDiagnosis of chronic kidney diseaseusing machine learning algorithmsrdquo International Journal ofInnovative Research in Computer and Communication Engi-neering vol 4 no 1 pp 812ndash820 2016

[46] K Kayaer and T Yldrm ldquoMedical diagnosis on Pima Indiandiabetes using general regression neural networksrdquo in Pro-ceedings of the International Conference on Artificial NeuralNetworks and Neural Information Processing (ICANNICONIP) pp 181ndash184 Daegu South Korea June 2013

[47] B V Ravindra N Sriraam and M Geetha ldquoChronic kidneydisease detection using back propagation neural networkclassifierrdquo in Proceedings of the International Conference onCommunication Computing and Internet of Aingsvol 65ndash68 pp 15ndash17 Chennai India February 2018

[48] A Marcano-Cedentildeo F S Buendıa-Buendıa and D AndinaBreast Cancer Classification Applying Artificial MetaplasticitySpringer-Verlag Berlin-Heidelberg Germany 2009

16 Journal of Healthcare Engineering