
Research Article

Supervised Clustering Based on DPClusO: Prediction of Plant-Disease Relations Using Jamu Formulas of KNApSAcK Database

Sony Hartono Wijaya,1,2 Husnawati Husnawati,3 Farit Mochamad Afendi,4

Irmanida Batubara,5 Latifah K. Darusman,5 Md. Altaf-Ul-Amin,1 Tetsuo Sato,1

Naoaki Ono,1 Tadao Sugiura,1 and Shigehiko Kanaya1

1 Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0192, Japan
2 Department of Computer Science, Bogor Agricultural University, Kampus IPB Dramaga, Jl. Meranti, Bogor 16680, Indonesia
3 Department of Biochemistry, Bogor Agricultural University, Kampus IPB Dramaga, Jl. Meranti, Bogor 16680, Indonesia
4 Department of Statistics, Bogor Agricultural University, Kampus IPB Dramaga, Jl. Meranti, Bogor 16680, Indonesia
5 Biopharmaca Research Center, Bogor Agricultural University, Kampus IPB Taman Kencana, Jl. Taman Kencana No. 3, Bogor 16151, Indonesia

Correspondence should be addressed to Shigehiko Kanaya; [email protected]

Received 30 November 2013; Accepted 18 February 2014; Published 7 April 2014

Academic Editor: Samuel Kuria Kiboi

Copyright © 2014 Sony Hartono Wijaya et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Indonesia has the largest number of medicinal plant species in the world, and these plants are used as Jamu medicines. Jamu medicines are popular traditional medicines from Indonesia, and we need to systematize the formulation of Jamu and develop basic scientific principles of Jamu to meet the requirements of the Indonesian Healthcare System. We propose a new approach to predict the relation between plant and disease using network analysis and supervised clustering. As a preliminary step, we assigned 3138 Jamu formulas to 116 diseases of the International Classification of Diseases (ver. 10), which belong to 18 classes of disease from the National Center for Biotechnology Information. The correlation measures between Jamu pairs were determined based on their ingredient similarity. Networks were constructed and analyzed by selecting highly correlated Jamu pairs. Clusters were then generated using the network clustering algorithm DPClusO. Using the matching score of a cluster, the dominant disease and the high-frequency plant associated with the cluster were determined. The plant-disease relations predicted by our method were evaluated in the context of previously published results and were found to produce around 90% successful predictions.

1. Introduction

Big data biology, a discipline of data-intensive science, has emerged because of the rapid increase of data in omics fields such as genomics, transcriptomics, proteomics, and metabolomics, as well as in several other fields such as ethnomedicinal survey. The number of medicinal plants is estimated to be 40,000 to 70,000 around the world [1], and many countries utilize these plants as blended herbal medicines, for example, China (traditional Chinese medicine), Japan (Kampo medicine), India (Ayurveda, Siddha, and Unani), and Indonesia (Jamu). Nowadays, the use of traditional medicines is rapidly increasing [2, 3]. These medicines consist of ingredients made from plants, animals, minerals, or a combination of them. Traditional medicines have been used for generations to treat diseases or maintain health, and the most popular form of traditional medicine is herbal medicine. Blended herbal medicines as well as single-herb medicines include a large number of constituent substances, which exert effects on human physiology through a variety of biological pathways. The KNApSAcK Family database systems can be used to comprehensively understand the medicinal usage of plants based upon traditional and modern knowledge [4, 5]. This

Hindawi Publishing Corporation
BioMed Research International
Volume 2014, Article ID 831751, 15 pages
http://dx.doi.org/10.1155/2014/831751


Table 1: List of diseases using International Classification of Diseases ver. 10 (class of disease IDs correspond to Table 2).

| ID | Disease | Class of disease |
|---|---|---|
| 1 | Abdominal pain | 3 |
| 2 | Abdominal pain, diarrhea | 3 |
| 3 | Acne | 16 |
| 4 | Acne, skin problems (cosmetics) | 16 |
| 5 | Amenorrhoea, dysmenorrhea | 6 |
| 6 | Amenorrhoea, irregular menstruation | 6 |
| 7 | Anaemia | 1 |
| 8 | Appendicitis, urinary tract infection, tonsillitis | 3 |
| 9 | Arthralgia | 11 |
| 10 | Arthralgia, arthritis | 11 |
| 11 | Asthma | 15 |
| 12 | Benign prostatic hyperplasia (Bph) | 10 |
| 13 | Breast disorder | 6 |
| 14 | Bromhidrosis | 16 |
| 15 | Bronchitis | 15 |
| 16 | Cancer | 2 |
| 17 | Cancer pain | 2 |
| 18 | Cancer, inflammation | 2 |
| 19 | Colic abdomen, bloating (in infant) | 3 |
| 20 | Common cold | 15 |
| 21 | Common cold, dyspepsia, insect bites | 15, 3, 16 |
| 22 | Common cold, influenza | 15 |
| 23 | Cough | 15 |
| 24 | Degenerative disease | 14 |
| 25 | Dermatitis, urticaria, erythema | 16 |
| 26 | Diabetes | 14 |
| 27 | Diabetic gangrene | 16 |
| 28 | Diarrhea | 3 |
| 29 | Diarrhea, abdominal pain | 3 |
| 30 | Diseases of the eye | 5 |
| 31 | Disorders in pregnancy | 6 |
| 32 | Dysmenorrhea | 6 |
| 33 | Dysmenorrhea, irregular menstruation | 6 |
| 34 | Dysmenorrhea, menstrual syndrome | 6 |
| 35 | Dyspepsia | 3 |
| 36 | Dyspnoea | 15 |
| 37 | Dyspnoea, cough, orthopnoea | 15 |
| 38 | Fatigue | 11 |
| 39 | Fatigue, anaemia, loss appetite | 1 |
| 40 | Fatigue, lack of sexual function | 6 |
| 41 | Fatigue, low back pain | 11 |
| 42 | Fatigue, myalgia, arthralgia | 11 |
| 43 | Fatigue, osteoarthritis | 11 |
| 44 | Fertility problem | 6, 10 |
| 45 | Fever | 0 |
| 46 | Gastritis, gastric ulcer | 3 |
| 47 | Haemorrhoids | 1 |
| 48 | Headache | 13 |
| 49 | Heart diseases | 8 |
| 50 | Heartburn | 3, 8 |
| 51 | Hepatitis, other diseases of liver | 3 |
| 52 | Hypercholesterolaemia | 14 |
| 53 | Hypertension | 8 |
| 54 | Hypertension, diabetes | 14 |
| 55 | Hypertension, hypercholesterolaemia | 14 |
| 56 | Hyperuricemia | 1 |
| 57 | Immunodefficiency | 9 |
| 58 | Indigestion (K.30) | 3 |
| 59 | Indigestion, lose appetite | 3 |
| 60 | Infertility | 6, 10 |
| 61 | Irregular menstruation, menstruation syndrome | 6 |
| 62 | Kidney diseases | 17 |
| 63 | Lactation problems | 6 |
| 64 | Leukorrhoea (Vaginalis) | 6 |
| 65 | Leukorrhoea (Vaginalis), dysmenorrhoea | 6 |
| 66 | Lose appetite | 3 |
| 67 | Lose appetite, underweight | 14 |
| 68 | Low back pain, myalgia, arthralgia | 11 |
| 69 | Low back pain, myalgia, constipation | 11 |
| 70 | Low back pain, urinary tract infection | 17 |
| 71 | Lung diseases | 15 |
| 72 | Malaise and Fatigue | 11 |
| 73 | Malaise and Fatigue, Constipation | 11 |
| 74 | Malaise and Fatigue, Fertility Problems | 10, 11 |
| 75 | Malaise and Fatigue, Low Back Pain | 11 |
| 76 | Malaise and Fatigue, Sexual Dysfunction | 11, 6, 10 |
| 77 | Malaise and Fatigue, Skin Problems (Cosmetics) | 16 |
| 78 | Malaria, anaemia | 1 |
| 79 | Meno-metrorrhagia | 6 |
| 80 | Menopausal syndrome | 6 |
| 81 | Menopause/menstrual syndrome, leukorrhoea (vaginalis) | 6 |
| 82 | Menstrual syndrome | 6 |
| 83 | Menstrual syndrome, fatigue | 6 |
| 84 | Migraine | 13 |
| 85 | Mood disorder | 18 |
| 86 | Myalgia, arthralgia | 11 |
| 87 | Nausea/vomiting of pregnancy | 6 |
| 88 | Osteoarthritis | 11 |
| 89 | Osteoarthritis, fatigue | 11 |
| 90 | Overweight, obesity | 14 |
| 91 | Paralysis | 13 |
| 92 | Post partum syndrome | 6 |
| 93 | Prevent from overweight | 14 |
| 94 | Respiratory infection due to smoking | 15 |
| 95 | Respiratory tract infection | 15 |
| 96 | Rheumatoid arthritis, gout | 11 |
| 97 | Secondary amenorrhea | 6 |
| 98 | Secondary amenorrhea, irregular menstruation | 6 |
| 99 | Sexual dysfunction, fatigue | 6, 10 |
| 100 | Skin diseases | 16 |
| 101 | Skin problems (cosmetics) | 16 |
| 102 | Sleeping and Mood Disorders | 18 |
| 103 | Sleeping disorders | 18 |
| 104 | Stomatitis | 3 |
| 105 | Stomatitis, gingivitis, tonsilitis | 3 |
| 106 | Stone in kidney (N20.0) | 17 |
| 107 | Stone in kidney (N20.0), urinary bladder stone (N21.0) | 17 |
| 108 | Tonsilitis | 4 |
| 109 | Tonsilofaringitis | 4 |
| 110 | Toothache | 13 |
| 111 | Typhoid, dyspepsia | 3 |
| 112 | Ulcer of anus and rectum | 3 |
| 113 | Underweight, lose appetite | 3 |
| 114 | Urinary tract infection (urethritis) | 17 |
| 115 | Vaginal discharges | 6 |
| 116 | Vaginal diseases | 6 |

database has information about the selected herbal ingredients, that is, the formulas of Kampo and Jamu, omics information of plants and humans, and physiological activities in humans. Jamu is generally composed based on the experience of users over decades or even hundreds of years. However, versatile scientific analyses are needed to support their efficacy and safety. Attaining this objective is in accordance with the 2010 policy of the Ministry of Health of the Indonesian Government on the scientification of Jamu. Thus, it is necessary to systematize the formulations and develop basic scientific principles of Jamu to meet the requirements of the Indonesian Healthcare System. Afendi et al. initiated and conducted scientific analysis of Jamu to find the correlation between plants, Jamu, and their efficacy using statistical methods [6–8]. They used Biplot, partial least squares (PLS), and bootstrapping methods to summarize the data and also focused on prediction of Jamu formulations. These methods give a good understanding of the relationship between plants, Jamu, and their efficacy. Among 465 plants used in 3138 Jamu, 190 plants were shown to be effective for at least one efficacy, and these plants were considered to be the main ingredients of Jamu. The other 275 plants are considered to be supporting ingredients in Jamu because their efficacy has not been established yet.

Network biology can be defined as the study of the network representations of molecular interactions, both to analyze such networks and to use them as a tool to make biological predictions [9]. This study includes modelling, analysis, and visualization, which play an important role in life science today [10]. Network analysis has been increasingly utilized in interpreting high-throughput data on omics information, including transcriptional regulatory networks [11], coexpression networks [12], and protein-protein interactions [13]. With a network, we can easily describe the relationships between entities and also concentrate on the part of the network consisting of important nodes or edges. These advantages can be adopted for analyzing the medicinal usage of plants in Jamu and diseases. Network analysis provides information about groups of Jamu that are closely related to each other in terms of ingredient similarity and thus allows precise investigation to relate plants to diseases. On the other hand, multivariate statistical methods such as PLS can assign plants to efficacy by global linear modeling of the Jamu ingredients and efficacy. However, there is still a lack of appropriate network-based methods to learn how and why many plants are grouped in a certain Jamu formula and the combination rules embedded in numerous Jamu formulas.

It is necessary to explore the relationship between Indonesian herbal plants used in Jamu medicines and the diseases that are treated using Jamu medicines. When the effectiveness of a plant against a disease is firmly established, further analysis of that plant can proceed to the molecular level to pinpoint the drug targets. The present study developed a network-based approach for prediction of plant-disease relations. We utilized the Jamu data from the KNApSAcK database. A Jamu network was constructed based on the similarity of their ingredients, and then Jamu clusters were generated using the network clustering algorithm DPClusO [14, 15]. Plant-disease relations were then predicted by determining the dominant diseases and plants associated with selected Jamu clusters.

2. Methods

2.1. Concept of the Methodology. Jamu medicines consist of combinations of medicinal plants and are used to treat various diseases. In this work we exploit the ingredient similarity between Jamu medicines to predict plant-disease relations. The concept of the proposed method is depicted in Figure 1. In step 1 a network is constructed where a node is a Jamu medicine and an edge represents high ingredient similarity between the corresponding Jamu pair. In Figure 1, nodes of the same color indicate Jamu medicines used for the same disease. The similarity is represented by the Pearson correlation coefficient [16, 17]; that is,

$$\operatorname{corr}(X, Y) = \frac{\sum_{i=1}^{l} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{l} (x_i - \bar{x})^2 \sum_{i=1}^{l} (y_i - \bar{y})^2}}, \tag{1}$$


Table 2: Distribution of Jamu formulas according to 18 classes of disease (classes of diseases are determined by NCBI in ID1 to ID16 and by the present study in ID17 and ID18, represented by asterisks in the Ref. column).

| ID | Class of disease (NCBI) | Ref. | Number of Jamu | Percentage |
|---|---|---|---|---|
| 1 | Blood and lymph diseases | NCBI | 201 | 6.41 |
| 2 | Cancers | NCBI | 32 | 1.02 |
| 3 | The digestive system | NCBI | 457 | 14.56 |
| 4 | Ear, nose, and throat | NCBI | 2 | 0.06 |
| 5 | Diseases of the eye | NCBI | 1 | 0.03 |
| 6 | Female-specific diseases | NCBI | 382 | 12.17 |
| 7 | Glands and hormones | NCBI | 0 | - |
| 8 | The heart and blood vessels | NCBI | 57 | 1.82 |
| 9 | Diseases of the immune system | NCBI | 22 | 0.70 |
| 10 | Male-specific diseases | NCBI | 17 | 0.54 |
| 11 | Muscle and bone | NCBI | 649 | 20.68 |
| 12 | Neonatal diseases | NCBI | 0 | - |
| 13 | The nervous system | NCBI | 32 | 1.02 |
| 14 | Nutritional and metabolic diseases | NCBI | 576 | 18.36 |
| 15 | Respiratory diseases | NCBI | 313 | 9.97 |
| 16 | Skin and connective tissue | NCBI | 163 | 5.19 |
| 17 | The urinary system | ∗ | 90 | 2.87 |
| 18 | Mental and behavioral disorders | ∗ | 21 | 0.67 |
| | The number of Jamu classified into multiple disease classes | | 119 | 3.79 |
| | The number of Jamu unclassified | | 4 | 0.13 |
| | Total Jamu formulas | | 3138 | 100.00 |

where $x_i$ is the weight of plant $i$ in Jamu $X$, $y_i$ is the weight of plant $i$ in Jamu $Y$, $\bar{x}$ is the mean of Jamu $X$, and $\bar{y}$ is the mean of Jamu $Y$. The higher the similarity between a Jamu pair, the higher the correlation value. In the present study, $x_i$ and $y_i$ are set to 1 or 0 when the $i$th plant is, respectively, included or not included in the formula. Under this condition, the Pearson correlation corresponds to the fourfold point correlation coefficient; that is,

$$\operatorname{corr}(X, Y) = \frac{ad - bc}{\sqrt{(a + b)(a + c)(b + d)(c + d)}}, \tag{2}$$

where $a$, $b$, $c$, and $d$ represent the numbers of plants included in both $X$ and $Y$, in only $X$, in only $Y$, and in neither $X$ nor $Y$, respectively.
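As an illustration of the correspondence between (1) and (2), the following minimal Python sketch computes both quantities for one pair of binary ingredient vectors. The function names, the two toy vectors, and the choice to return 0.0 when the denominator vanishes are assumptions made for this example and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors, as in (1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mean_x) ** 2 for xi in x)
               * sum((yi - mean_y) ** 2 for yi in y))
    # Illustrative choice: treat a constant vector as uncorrelated.
    return num / den if den else 0.0


def fourfold(x, y):
    """Fourfold point correlation of two 0/1 vectors, as in (2)."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # in both
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # only in X
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # only in Y
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # in neither
    den = sqrt((a + b) * (a + c) * (b + d) * (c + d))
    return (a * d - b * c) / den if den else 0.0


# Hypothetical Jamu pair over six plants: 1 = plant included, 0 = not included.
jamu_x = [1, 1, 1, 0, 0, 0]
jamu_y = [1, 1, 0, 1, 0, 0]

r_pearson = pearson(jamu_x, jamu_y)
r_fourfold = fourfold(jamu_x, jamu_y)
print(r_pearson, r_fourfold)
```

For this pair, a = 2, b = 1, c = 1, and d = 2, so both functions give (4 - 1)/9 = 1/3 up to floating-point rounding.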

In step 2 the Jamu clusters are generated using the network clustering algorithm DPClusO. DPClusO can generate clusters characterized by high density and identified by periphery; that is, the Jamu medicines belonging to a cluster are highly cohesive and separated by a natural boundary. Such clusters contain potential information about plant-disease relations.
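The network that serves as input to the clustering step can be sketched as follows: every Jamu pair whose fourfold point correlation reaches a cutoff becomes an edge. This is only a sketch under stated assumptions. The formula IDs, plant names, universe size `n_plants`, and the cutoff value 0.7 are hypothetical, and DPClusO itself is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def fourfold_from_sets(plants_x, plants_y, n_plants):
    """Fourfold point correlation for two formulas given as sets of plants."""
    a = len(plants_x & plants_y)   # plants in both formulas
    b = len(plants_x - plants_y)   # plants only in X
    c = len(plants_y - plants_x)   # plants only in Y
    d = n_plants - a - b - c       # plants in neither formula
    den = sqrt((a + b) * (a + c) * (b + d) * (c + d))
    return (a * d - b * c) / den if den else 0.0


def build_network(formulas, n_plants, threshold):
    """Return edges (jamu_1, jamu_2, correlation) for highly correlated pairs."""
    edges = []
    for j1, j2 in combinations(sorted(formulas), 2):
        r = fourfold_from_sets(formulas[j1], formulas[j2], n_plants)
        if r >= threshold:
            edges.append((j1, j2, r))
    return edges


# Hypothetical toy data: three formulas over a universe of 10 plants.
formulas = {
    "J1": {"ginger", "turmeric", "tamarind"},
    "J2": {"ginger", "turmeric"},
    "J3": {"clove", "cinnamon"},
}
edges = build_network(formulas, n_plants=10, threshold=0.7)
print(edges)
```

In this toy input only the J1-J2 pair shares ingredients, so it is the only pair that survives the cutoff; the resulting edge list is what a network clustering tool would take as input.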

In step 3 we assess disease-dominant clusters based on the matching score represented by the following equation:

$$\text{matching score} = \frac{\text{number of Jamu belonging to the same disease}}{\text{total number of Jamu in the cluster}}. \tag{3}$$

The matching score of a cluster is the ratio of the highest number of Jamu associated with a single disease to the total number of Jamu in the cluster. We assign a disease to a cluster if its matching score is greater than a threshold value. In step 4, we determine the frequency of plants associated with a cluster if and only if a disease was assigned to it in the previous step. The highest-frequency plant associated with a cluster is considered to be related to the disease assigned to that cluster. The true positive rate (TPR), or sensitivity, was used to evaluate the resulting plants. TPR is the proportion of true positive predictions out of all actual positive cases, defined by the following formula [18]:

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \tag{4}$$

where true positive (TP) is the number of correctly classified entities and false negative (FN) is the number of incorrectly rejected entities. We refer to the proposed method as supervised clustering because, after generation of the clusters, we narrow down the candidate clusters for further analysis based on supervised learning and thus improve the accuracy of prediction of the proposed method.
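Steps 3 and 4 and the evaluation measure can be illustrated with a short Python sketch. It assumes, for simplicity, that each Jamu carries exactly one disease label; the cluster, labels, plant names, threshold value 0.6, and the TP and FN counts are hypothetical and are not results from the paper.

```python
from collections import Counter


def matching_score(cluster, disease_of):
    """Return (dominant disease, matching score) for one cluster, as in (3)."""
    counts = Counter(disease_of[jamu] for jamu in cluster)
    disease, n_dominant = counts.most_common(1)[0]
    return disease, n_dominant / len(cluster)


def top_plants(cluster, plants_of):
    """Return the plant(s) used by the largest number of Jamu in the cluster."""
    freq = Counter(plant for jamu in cluster for plant in plants_of[jamu])
    best = max(freq.values())
    return sorted(plant for plant, count in freq.items() if count == best)


def true_positive_rate(tp, fn):
    """Sensitivity as in (4): TP / (TP + FN)."""
    return tp / (tp + fn)


# Hypothetical cluster of four Jamu with disease labels and ingredients.
cluster = ["J1", "J2", "J3", "J4"]
disease_of = {"J1": "cough", "J2": "cough", "J3": "cough", "J4": "fever"}
plants_of = {
    "J1": {"ginger", "lime"},
    "J2": {"ginger"},
    "J3": {"ginger", "clove"},
    "J4": {"clove"},
}

THRESHOLD = 0.6  # hypothetical cutoff for assigning a disease to a cluster
disease, score = matching_score(cluster, disease_of)
predicted = top_plants(cluster, plants_of) if score > THRESHOLD else []
print(disease, score, predicted)
print(true_positive_rate(tp=9, fn=1))
```

Here three of the four Jamu share the label "cough", so the matching score is 0.75, the cluster passes the cutoff, and the most frequent plant ("ginger", present in three formulas) is the one predicted to be related to that disease.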

3. Result and Discussion

3.1. Construction and Comparison of Jamu and Random Networks. We used the same number of Jamu formulas from previous research [6], 3138 Jamu formulas, and the set union


[Figure 1 diagram. Input: Jamu formulas. Step 1: Constructing ingredient correlation network. Step 2: Extracting highly connected Jamu. Step 3: Supervised analysis for voting utilization. Step 4: Listing ingredients. Output: plant-disease relations.]

Figure 1: Concept of the methodology: network construction based on ingredient similarity between individual Jamu medicines, network clustering, and classification of medicinal plants to dominant disease.

J02538

J01667

J01392

J02594J00533

J00527J00526

J01267

J01407

J02548

J02484

J01406

J02338

J00379

J02195

J01401

J01760

J02570

J00525

J01669

J01666

J01665

J01399

J01659

J01400

J01398

J01875

J02712

J01452 J00942

J02752

J02148

J01448

J02867

J00518

J00374

J02931J01864

J02100

J00168

J01623J02806

J02154

J01205

J01185

J00778J00420

J01898J01899

J01897J01322J00709J01187

J02194J01281

J01241

J00358

J01057

J00365

J03028J02122

J02688

J00842

J00164

J00166J00929

J00876

J01670

J01473

J00107

J00174J02397

J00707J01148

J00430

J02762

J03029

J01209

J01207

J00195

J01463

J01634

J00253

J00605J01269

J00142J00022J03040

J00824J00917

J01709J01352

J01459

J00023

J01321

J01208

J00114

J01310

J01263J01251

J00292

J00199J02329J02328

J02119

J01163

J00686

J00321

J01602J02268J02905

J02071J01430

J02012

J02718J01787

J01800

J02189

J02657

J01725

J00496

J02146J00658

J01833

J02940J01673

J00111

J02359J02522

J01726J02198J01455

J01451

J02723 J00261

J00262

J00960

J01881

J01450

J03030

J02932

J02021

J00201J00412

J00380

J00838

J01841

J02938

J02549

J02537J01171

J02020

J02881

J00318

J01210

J02979J01005

J01303

J00494

J01926J01629

J03125

J00989J02459J00363

J00495

J00241

J03131

J02011

J00766

J01636

J00402

J03089

J03004

J01090

J02103

J02886

J02390

J01359

J02879

J02848J02660

J02855J03097J02789

J02261

J01361

J01053J01501

J00068J02754J02263J02374

J01698

J03088J02969J01863

J02203

J00758

J01528J01498J01565

J01615

J02031

J02354J03095

J01432

J02469 J02039

J02138

J01548

J01434J01576

J03101J02658J01908J02925

J02342J02811J02497J02339

J01837J00157

J02045J02373

J02568

J02663J01614J00227J00102

J02259J02530

J02157

J02788J00067

J01827J00798J02124J02260

J02840

J02793J02792

J02022J01983

J01690J00986

J00821

J01431J02856

J03100J01502J02655

J02036

J02262

J02137

J01551J00332

J00088

J02666J02400J02884

J02831

J01704J01680

J00898

J02037J01855J01368

J02204J03115J02388J03086

J01685

J01270

J01122

J01649J02326J01831J03085

J00054J01700J01064J02317

J00307J00286

J02950J02075

J01306

J01006

J00135

J00417

J00384

J02061

J00271

J00698

J01029

J00419

J00627

J02398

J01964

J02005J02040

J00141

J01045

J00165

J01189J00354

J01013

J00353

J00395J00409

J01166

J00429

J01799J00752

J00747J03090

J00039

J02356

J02010

J01020

J01225

J01044

J02472

J00276

J02536

J02346

J00828

J00759

J02526

J02922

J03081

J00280

J01689

J02126

J02656J02893

J02812J02847

J01832J01909

J02123J02156

J01697J01681

J01861

J02029J02159J02033

J02152J02028J01550

J01353J01609J01377J02830J01862

J00734J00331

J02871J00145

J01393J00233

J01845J00149J01500J02135

J00158

J01569J01617

J02670

J02662J02136

J02032J02751J01648

J02038J02653

J02202J01616

J00985

J00071

J03082

J01579

J01825

J02803J01394J00891

J00160J00147

J00894J00958J02387

J01786

J00957

J00895

J02904

J01887

J01023J02843

J01170

J01695

J00312

J00311J01552

J00554 J01868

J02223J02790J02616

J01879J03114J02732J01679J01316

J00732J00191

J02737J03093

J02027

J02997

J02255J01542J00298

J02486

J01640

J02639

J02640

J01655

J02389

J02818

J02816

J01554

J02633

J01668

J01671

J02303

J00990

J01635

J01637

J01639

J02206

J02967

J01672

J02703

J01497 J01641

J01638

J02554

J02116

J02961

J02702

J02493

J03000

J00362

J02665J02827

J00656

J01068

J00105

J00983

J02406

J01625

J01490

J02528

J01829J02701

J02098

J01227J00978

J00060

J02634

J02934J01584

J00214J02696

J02637J02093

J02635

J01867

J02324

J02807

J02707

J03108

J01656 J02987J01657J02672

J01445

J01596

J02275J01537J02697

J01980

J02286J02768

J01060

J00737J00435

J03010J02242

J02971

J00984

J01418

J01810

J01462

J00398J00757

J00729J02281

J00193

J02295

J00754

J00626J02708

J02251

J03110

J02048

J00753J01707

J01475

J00750

J00515

J01486J01593

J00364 J01582 J02651 J00070 J02619 J01107

J00238 J01581 J02650 J00069 J02618 J02846J01435J02569J00339J00563J02911J01108

J02842J00018J00954J00316J00556J02890

J00809

J02277J01478

J00465

J02288

J02626

J02499

J00760

J02309

J01558

J02291

J02648J00530

J00308

J01423

J02046

J02076

J00139

J01351

J01627

J02243

J01755

J01866

J02998

J01950 J00858J01734J01098

J02439

J01873

J02233

J01338

J01097

J01246

J00098

J00702

J01761J00851

J00113J00679J02217

J00375

J00704

J03003

J01002

J01803

J00901

J02423

J03044

J00436

J02695

J00546J01426 J00080

J00028J00186

J02516

J01139 J02441 J01389

J02442

J02193

J02212J01041

J01290

J01421

J01157

J01840

J00573

J01870

J02711

J02822

J01960

J00315

J00127

J02722

J02111

J00245

J02815J03046

J00117J00291

J01764

J01513

J01938 J00250 J00258

J03017

J02360

J01939J00422

J00904J02933J01645

J02256

J00879J01177J02557

J00540

J02140J02234

J02543J02627

J02130

J01051

J01590J01591

J01519

J01138J03024J00192

J01972J01975 J02623

J01971J01509

J01483

J00473

J01425

J01650J01842

J01922

J00489

J01324J01654

J02058

J01244

J02629

J00322

J00862J01533J01532J00259

J00918

J02721

J00899J00247J00900

J01265

J02283J02290J01653

J02963

J02249

J01601

J02000

J02227J00885

J01701J02572

J01481J01117

J01420

J01485J02049J00021J02825

J00811J03096

J02172

J00666

J03105

J00788

J02118

J00773

J00202

J03063

J00628J00638

J00864

J02369

J00644

J00589J01716

J00782

J00664J00637

J00722

J01017J00643

J00416

J03027J01232J01199

J01224J01201J01075

J00956J03130

J01135

J00034

J00007J00781

J00780

J00793

J02781J00661

J00593

J02427J02425

J00802

J03034

J03020

J00716

J02383

J00951J00787

J01223J01197

J01198

J00008

J01283

J01235

J00641

J00036

J01356

J03055

J01985

J00467

J01412

J01917

J02013

J00440

J02254J01776

J01792 J00739J01771

J01706

J02447J00019

J00619

J00624J00587J02451

J01768

J00376

J01749

J01746

J00834J00503

J02284

J00408

J01894

J01878

J01691

J00172

J00506

J02405

J00076

J01422

J02403

J00320J01708

J00812

J00220J00219J01573

J03136J00046

J00432J00367

J02571J00221J01770

J02253J00283

J00063J00064J00319

J01769J01692J00194

J01774J00222J02617

J00448J02970

J01248

J00703

J01355

J02347

J01229J02580

J02966

J02964

J00138

J00180

J00044J00196

J02835J01632

J02047J00300

J01773

J01882

J01748

J01713

J01443

J01427J00519

J01693

J02404

J01647

J02197

J01772

J00058

J01048

J00539J00908

J00282

J00100

J00323

J02114

J02009

J01388

J00653

J02024

J00225

J00324J02121

J01687

J01184

J02738J02561

J02336

J02725J00471

J01402

J02735J01856

J01247

J02652

J00582

J01775

J01793

J01676

J00594J00065

J01710

J02026

J01610

J02467

J01404

J00014J00013

J02476

J01416J00481

J00392

J00003J00950

J02208

J01410

J00049

J00931J03038

J03026

J03039

J00459

J03051

J02785

J02365

J03052

J02368

J00863J02191

J03043J03037

J03057

J01951J01220

J01203

J00461

J01811 J01192

J01997

J03048

J00969J01642

J00083J00971

J00377J02517

J00352

J01390J00575

J01114J00167

J02192J00004

J00700

J00551J02985J00152

J02099

J01743J00151

J00434

J01240

J00907

J01923

J03116

J01664

J00998

J00236

J03117

J03137

J02756

J00578

J01530

J00335J00096

J00896

J02115

J02141J00097J00699

J02611

J01903J00301

J02107J03103

J02276

J00101

J03133

J00056

J01815J01791J02158

J02943

J00187

J01477J01988J00125

J02714J02942J02959

J02372J01987J00938J00334J00178

J00078

J02272J02632

J02375J02960

J01797J00218

J00189J00094

J01900

J03064

J00770J02509

J01682J00991

J00631

J02409J02228

J02132

J02069

J01955J01936

J00179

J00667

J00617

J00584J00678

J01914J01727

J00622J02542

J02686

J00443

J00579J00414

J01160

J02381

J02462

J00600J01913

J00275

J01262

J01259

J01257

J02413

J02216

J00771

J00630

J02664

J00639

J00621

J00884

J01805

J01928

J00484

J01102J01766

J01896

J00257

J00887

J01893

J01460J02186

J00346J00343 J00349 J00994

J00604J03113

J02797J02169

J00997

J00406 J00687

J00728

J00726

J00410

J00508

J00911

J03041

J00683

J00947J00801J00134

J00910

J00888

J00170

J00705

J02939J02167

J00889

J02539J01054J01164J00350

J00949

J03032

J02171J02431J01115

J00795

J03119J00243

J01403J00502 J00762

J03120

J02019

J00640J02923

J01777J02869J02896

J02872J02841

J00235

J01646

J02892

J02727

J01780

J00614J02885

J00633

J00592

J03087J01279

J00646

J01993

J02929

J01071

J00988

J03134J02996

J01912

J02105J01597 J01531

J00413

J01630

J02724

J00333

J00955

J00675

J00234

J00810

J00684

J01789

J02207 J02742

J03056

J02487

J00571

J01499J02974

J03033

J00351

J01295

J00464

J01696

J00348

J02507

J01883

J02579

J01991J00823

J00162

J02927

J02687

J02743J03075

J01077

J00873J01911

J01439J01101

J02715J01266

J03062

J00531

J03060J02370

J01021

J02558

J02161

J01744

J01300

J01907J00317 J01358

J00255J02502

J01411

J02593

J01515J00209

J01386

J02716

J01367J00815J00085

J02808 J00813

J02845J01824

J02849


Figure 2: The network consisting of 0.7% Jamu pairs (correlation value above or equal to 0.596).


Table 3: Statistics of three datasets.

Parameters | 0.7% | 0.5% | 0.3%
Network statistics
Total pairs | 34,454 | 24,610 | 14,766
Minimum correlation | 0.596 | 0.665 | 0.718
Number of Jamu formulas | 2,779 | 2,496 | 2,085
Average degree | 24.8 | 19.7 | 14.2
(Random network: ER) | (24.8 ± 0.0) | (19.7 ± 0.0) | (14.2 ± 0.0)
(Random network: BA) | (24.7 ± 0.1) | (19.7 ± 0.1) | (14.1 ± 0.1)
(Random network: CNN) | (24.7 ± 0.4) | (19.7 ± 0.4) | (14.0 ± 0.4)
Clustering coefficient | 0.521 | 0.520 | 0.540
(Random network: ER) | (0.009 ± 0.000) | (0.008 ± 0.000) | (0.007 ± 0.000)
(Random network: BA) | (0.030 ± 0.001) | (0.028 ± 0.001) | (0.026 ± 0.001)
(Random network: CNN) | (0.246 ± 0.008) | (0.239 ± 0.008) | (0.233 ± 0.010)
Number of connected components | 69 | 119 | 254
(Random networks: ER, BA, CNN) | (1) | (1) | (1)
Network diameter | 15 | 17 | 20
(Random network: ER) | (4.0 ± 0.0) | (4.0 ± 0.0) | (5.0 ± 0.0)
(Random network: BA) | (10.8 ± 0.8) | (11.2 ± 1.5) | (10.8 ± 0.9)
(Random network: CNN) | (14.6 ± 1.9) | (14.1 ± 1.4) | (14.7 ± 1.3)
Network density | 0.008 | 0.008 | 0.007
(Random network: ER) | (0.009 ± 0.000) | (0.008 ± 0.000) | (0.007 ± 0.000)
(Random network: BA) | (0.009 ± 0.000) | (0.008 ± 0.000) | (0.007 ± 0.000)
(Random network: CNN) | (0.009 ± 0.000) | (0.008 ± 0.000) | (0.007 ± 0.000)
DPClusO
Total number of clusters | 1,746 | 1,411 | 938
Number of clusters with more than 2 Jamu | 1,296 | 873 | 453
(%) | (74.2) | (61.9) | (48.3)
Number of Jamu formulas in the biggest cluster | 118 | 104 | 89

of all formulas consists of 465 plants. We assigned the 3138 Jamu formulas to 116 diseases of the International Classification of Diseases (ICD) version 10 from the World Health Organization (WHO, Table 1) [19]. Those 116 diseases are mapped to 18 classes of disease, which comprise 16 classes of disease from the National Center for Biotechnology Information (NCBI) [20] and 2 additional classes. Table 2 shows the distribution of the 3138 Jamu formulas among the 18 classes of disease. According to this classification, most Jamu formulas are used for relieving diseases of muscle and bone, nutritional and metabolic diseases, and diseases of the digestive system. Furthermore, no Jamu formula is classified into the glands and hormones or the neonatal disease classes. We excluded from the evaluation process 4 Jamu formulas that are used to treat fever, because this symptom is very general and appears in almost all disease classes. Jamu-plant-disease relations can be represented using 2 matrices: the first is the Jamu-plant relation matrix with dimension 3138 × 465 and the second is the Jamu-disease relation matrix with dimension 3138 × 18.

After completing the data acquisition process, we calculated the similarity between Jamu pairs using a correlation measure based on their ingredients. For $K$ Jamu formulas (3138 in the present case), there can be at most $K(K-1)/2 = 3138 \times 3137/2 = 4{,}921{,}953$ Jamu pairs. We sorted the Jamu pairs by correlation value in descending order and selected the top-$n$ (0.7%, 0.5%, and 0.3%) pairs to create 3 sets of Jamu pairs. The 0.7%, 0.5%, and 0.3% datasets contain 34,454, 24,610, and 14,766 pairs, and the corresponding minimum correlation values are 0.596, 0.665, and 0.718, respectively. The three datasets of Jamu pairs can be regarded as three undirected networks (step 1 in Figure 1) consisting of 2779, 2496, and 2085 Jamu formulas, respectively (Table 3). Figure 2 shows a visualization of the 0.7% Jamu network using the Cytoscape Spring Embedded layout. We verified that the degree distributions of the Jamu networks are roughly of power law type, that is, close to those of scale-free networks. However, in the high-degree region the power law structure is broken (Figure 3). A nearly exact power law relation between medicinal herbs and the number of formulas utilizing them was observed in the Jamu system but not in Kampo (the Japanese crude drug system) [4]. The difference between Jamu and Kampo formulas can be explained by herb selection by medicinal researchers based on an optimization process [4]. Thus, the broken power law structure of the Jamu networks is associated with the fact that selecting Jamu pairs based on ingredient correlation leads to nonrandom selection. We also constructed random networks according


Figure 3: Degree distributions of the three Jamu networks (0.7%, 0.5%, and 0.3% datasets) roughly follow a power law. The x-axis corresponds to the log of the degree of a node in the Jamu network and the y-axis corresponds to the log of the number of Jamu (frequency).

to the Erdős-Rényi (ER) model [21], the Barabási-Albert (BA) model [22], and Vázquez's Connecting Nearest Neighbor (CNN) model [23], each of the same size as the corresponding real Jamu network. We used the Cytoscape Network Analyzer plugin [24] and R software to analyze the characteristics of both the Jamu and the random networks.
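The pair counting and top-n selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text only says "correlation measure", so Pearson correlation on binary ingredient vectors is an assumption, and the names `count_pairs`, `pearson`, and `top_pairs` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def count_pairs(k: int) -> int:
    """Number of unordered pairs among k formulas: k(k-1)/2."""
    return k * (k - 1) // 2


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two equal-length vectors.

    Assumption: a constant vector (zero variance) gets correlation 0.0.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def top_pairs(formulas: dict[str, list[int]], fraction: float):
    """Rank all formula pairs by descending correlation and keep the top fraction."""
    scored = [
        (pearson(formulas[a], formulas[b]), a, b)
        for a, b in combinations(sorted(formulas), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    keep = round(fraction * len(scored))
    return scored[:keep]


# Pair counts reported in the text for K = 3138.
total = count_pairs(3138)
sizes = [round(f * total) for f in (0.007, 0.005, 0.003)]

# Toy Jamu-plant rows (1 = plant used in the formula); purely illustrative data.
toy = {
    "J1": [1, 1, 0, 0],
    "J2": [1, 1, 0, 0],
    "J3": [0, 0, 1, 1],
    "J4": [1, 0, 0, 0],
}
selected = top_pairs(toy, 0.5)
```

Each kept triple `(correlation, jamu_a, jamu_b)` corresponds to one undirected edge of the Jamu network; the smallest correlation among the kept pairs plays the role of the minimum correlation listed in Table 3.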

We determined five statistical indexes, that is, average degree, clustering coefficient, number of connected components, network diameter, and network density, for each Jamu network and each random network. The clustering coefficient $C_n$ of a node $n$ is defined as $C_n = 2e_n/(k_n(k_n - 1))$, where $k_n$ is the number of neighbors of $n$ and $e_n$ is the number of connected pairs between all neighbors of $n$. The network diameter is the largest distance between any two nodes. If a network is disconnected, its diameter is the maximum of all diameters of its connected components. A network's density is the ratio of the number of edges in the network to the total number of possible edges between all pairs of nodes (which is $n(n-1)/2$ for an undirected graph, where $n$ here is the number of vertices). The average number of neighbors and the network density are the same for the real and random networks of the same size, as shown in Table 3. For the 0.7% and 0.5% real networks the clustering coefficient is roughly the same, and for the 0.3% network it is somewhat larger. The number of connected components and the diameter of the Jamu networks gradually decrease as the network grows by the addition of more nodes and edges.
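The five indexes defined above can be computed on a small undirected graph as in the sketch below. It is only an illustration in plain Python, not the Cytoscape or R workflow used in the study; the function names are hypothetical, and treating nodes with fewer than two neighbors as having clustering coefficient 0 is an assumption.

```python
from collections import deque


def build_graph(edges):
    """Adjacency sets for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    return sum(len(nb) for nb in adj.values()) / len(adj)


def density(adj):
    """Edges divided by the n(n-1)/2 possible edges."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    return m / (n * (n - 1) / 2)


def clustering_coefficient(adj, node):
    """C_n = 2 e_n / (k_n (k_n - 1)); assumed 0 when k_n < 2."""
    nb = adj[node]
    k = len(nb)
    if k < 2:
        return 0.0
    # Each edge among the neighbors is seen from both endpoints, hence // 2.
    e = sum(1 for u in nb for v in adj[u] if v in nb) // 2
    return 2 * e / (k * (k - 1))


def bfs_distances(adj, start):
    """Shortest-path lengths from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    seen, comps = set(), []
    for start in adj:
        if start not in seen:
            comp = set(bfs_distances(adj, start))
            seen |= comp
            comps.append(comp)
    return comps


def diameter(adj):
    """Largest shortest-path distance, maximised over all components."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


# Toy graph: a triangle a-b-c with a pendant node d, plus a separate edge e-f.
toy = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")])
```

For the 0.7% network in Table 3 (2,779 formulas, 34,454 pairs), the average degree formula gives 2 × 34,454 / 2,779 ≈ 24.8, which matches the tabulated value.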


Figure 4: Distribution of clusters based on matching score for the (a) 0.7%, (b) 0.5%, and (c) 0.3% datasets. The x-axis is the matching score and the y-axis is the number of clusters.

Figure 5: (a) Success rate (ratio of number of clusters to total clusters) and (b) number of predicted plants with respect to matching score thresholds, for the 0.7%, 0.5%, and 0.3% datasets.

The very different values of clustering coefficient, number of connected components, and network diameter imply that the Jamu networks are quite different from all 3 types of random networks. The differences between the Jamu networks and the ER random networks are the largest. Random networks constructed with the other two models are also substantially different from the Jamu networks. Because the random networks constructed with all three types of models differ from the Jamu networks, it can be concluded that the structure of the Jamu networks is reasonably biased and thus might contain certain information about plant-disease relations. In particular, the much higher clustering coefficient indicates that there are clusters in the networks worth investigating. To extract clusters from the Jamu networks (step 2 in Figure 1), we applied the DPClusO network clustering algorithm [14], which generates overlapping clusters based on density and periphery tracking.
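To illustrate why an ER baseline has a much lower clustering coefficient than the Jamu networks, the sketch below draws a random graph with a fixed number of nodes and edges and averages the node clustering coefficients. It assumes the G(n, m) variant of the ER model (the text only says "of the same size") and uses hypothetical helper names; it does not reproduce the BA or CNN models or the DPClusO algorithm.

```python
import random
from itertools import combinations


def er_random_graph(n: int, m: int, seed: int = 0):
    """ER-style G(n, m) graph: m distinct edges chosen uniformly at random."""
    rng = random.Random(seed)
    edges = rng.sample(list(combinations(range(n), 2)), m)
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def mean_clustering(adj) -> float:
    """Average of C_n over all nodes; nodes with k_n < 2 count as 0 (assumption)."""
    total = 0.0
    for node, nb in adj.items():
        k = len(nb)
        if k >= 2:
            e = sum(1 for u in nb for v in adj[u] if v in nb) // 2
            total += 2 * e / (k * (k - 1))
    return total / len(adj)


# Small toy size so the example runs quickly; the real networks are far larger.
baseline = er_random_graph(200, 1000, seed=1)
baseline_clustering = mean_clustering(baseline)
```

In a graph of this kind the expected clustering coefficient is close to the network density, which is consistent with the ER rows of Table 3, where both values are about 0.007 to 0.009.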

3.2. Supervised Clustering Based on DPClusO. DPClusO is a general-purpose clustering algorithm that is useful for finding overlapping cohesive groups in an undirected simple graph


Table 4: List of plants assigned to each disease.

Number | Plant name | Hit-miss status

A. Disease: blood and lymph diseases
1 | Tamarindus indica | Hit ∗
2 | Allium sativum | Hit ∗
3 | Tinospora tuberculata | Hit ∗
4 | Piper retrofractum | Hit
5 | Syzygium aromaticum | Hit ∗
6 | Bupleurum falcatum | Hit
7 | Graptophyllum pictum | Hit
8 | Plantago major | Hit
9 | Zingiber officinale | Hit ∗
10 | Cinnamomum burmannii | Hit ∗
11 | Soya max | Miss ∗
12 | Kaempferia galanga | Hit
13 | Curcuma longa | Hit ∗
14 | Piper nigrum | Hit
15 | Zingiber aromaticum | Hit ∗
16 | Phyllanthus urinaria | Hit ∗
17 | Oryza sativa | Hit
18 | Myristica fragrans | Hit ∗
19 | Alstonia scholaris | Hit ∗
20 | Syzygium polyanthum | Miss
21 | Andrographis paniculata | Hit ∗
22 | Sida rhombifolia | Miss
23 | Cyperus rotundus | Hit
24 | Sonchus arvensis | Miss
25 | Curcuma aeruginosa | Hit ∗
26 | Curcuma xanthorrhiza | Hit

B. Disease: cancers
1 | Catharanthus roseus | Hit

C. Disease: the digestive system
1 | Foeniculum vulgare | Hit
2 | Glycyrrhiza uralensis | Hit ∗
3 | Imperata cylindrica | Hit
4 | Zingiber purpureum | Hit ∗
5 | Physalis peruviana | Hit
6 | Punica granatum | Hit ∗
7 | Echinacea purpurea | Hit
8 | Zingiber officinale | Hit ∗
9 | Psidium guajava | Hit
10 | Baeckea frutescens | Hit ∗
11 | Amomum compactum | Hit
12 | Cinnamomum burmannii | Hit ∗
13 | Melaleuca leucadendra | Hit
14 | Caesalpinia sappan | Hit ∗
15 | Parkia roxburghii | Hit
16 | Rheum tanguticum | Hit
17 | Kaempferia galanga | Hit
18 | Coriandrum sativum | Hit
19 | Curcuma longa | Hit
20 | Zingiber aromaticum | Hit
21 | Phyllanthus urinaria | Hit
22 | Myristica fragrans | Hit
23 | Hydrocotyle asiatica | Hit ∗
24 | Carica papaya | Hit
25 | Mentha arvensis | Hit
26 | Lepiniopsis ternatensis | Hit
27 | Helicteres isora | Hit
28 | Andrographis paniculata | Hit
29 | Symplocos odoratissima | Hit
30 | Schisandra chinensis | Hit
31 | Blumea balsamifera | Hit
32 | Silybum marianum | Hit ∗
33 | Cinnamomum sintoc | Hit
34 | Elephantopus scaber | Hit
35 | Curcuma aeruginosa | Hit
36 | Kaempferia pandurata | Hit
37 | Curcuma xanthorrhiza | Hit
38 | Curcuma mangga | Hit ∗
39 | Curcuma zedoaria | Hit
40 | Daucus carota | Hit ∗
41 | Matricaria chamomilla | Hit ∗
42 | Cymbopogon nardus | Hit ∗

D. Disease: female-specific diseases
1 | Foeniculum vulgare | Hit
2 | Imperata cylindrica | Hit
3 | Tamarindus indica | Hit
4 | Pluchea indica | Hit ∗
5 | Piper retrofractum | Hit
6 | Punica granatum | Hit
7 | Uncaria rhynchophylla | Hit
8 | Zingiber officinale | Hit
9 | Guazuma ulmifolia | Hit ∗
10 | Nigella sativa | Hit
11 | Terminalia bellirica | Hit
12 | Baeckea frutescens | Hit
13 | Phaseolus radiatus | Hit
14 | Amomum compactum | Hit ∗
15 | Sauropus androgynus | Hit
16 | Usnea misaminensis | Hit
17 | Cinnamomum burmannii | Hit
18 | Melaleuca leucadendra | Hit
19 | Parameria laevigata | Hit
20 | Parkia roxburghii | Hit
21 | Piper cubeba | Hit
22 | Kaempferia galanga | Hit
23 | Coriandrum sativum | Hit
24 | Kaempferia angustifolia | Hit
25 | Curcuma longa | Hit
26 | Zingiber aromaticum | Hit
27 | Languas galanga | Hit
28 | Galla lusitania | Hit
29 | Quercus lusitanica | Hit
30 | Hydrocotyle asiatica | Hit
31 | Areca catechu | Hit
32 | Lepiniopsis ternatensis | Hit
33 | Helicteres isora | Hit ∗
34 | Piper betle | Hit
35 | Elephantopus scaber | Hit ∗
36 | Kaempferia pandurata | Hit
37 | Curcuma xanthorrhiza | Hit
38 | Sesbania grandiflora | Hit

E. Disease: the heart and blood vessels
1 | Allium sativum | Hit
2 | Curcuma longa | Hit ∗
3 | Morinda citrifolia | Hit ∗
4 | Homalomena occulta | Hit ∗
5 | Hydrocotyle asiatica | Hit
6 | Alstonia scholaris | Hit ∗
7 | Syzygium polyanthum | Miss ∗
8 | Andrographis paniculata | Hit ∗
9 | Apium graveolens | Miss
10 | Imperata cylindrica | Hit

F. Disease: male-specific diseases
1 | Cucurbita pepo | Miss
2 | Serenoa repens | Miss
3 | Baeckea frutescens | Hit
4 | Phaseolus radiatus | Hit
5 | Curcuma longa | Hit
6 | Elephantopus scaber | Hit

G. Disease: muscle and bone
1 | Foeniculum vulgare | Hit
2 | Clausena anisum-olens | Hit ∗
3 | Zingiber purpureum | Hit
4 | Allium sativum | Hit
5 | Strychnos ligustrina | Hit
6 | Tinospora tuberculata | Hit ∗
7 | Piper retrofractum | Hit
8 | Syzygium aromaticum | Hit
9 | Cola nitida | Hit ∗
10 | Ginkgo biloba | Hit ∗
11 | Panax ginseng | Hit
12 | Equisetum debile | Hit ∗
13 | Zingiber officinale | Hit
14 | Ganoderma lucidum | Hit
15 | Nigella sativa | Hit
16 | Terminalia bellirica | Hit ∗
17 | Baeckea frutescens | Hit ∗
18 | Amomum compactum | Hit
19 | Cinnamomum burmannii | Hit
20 | Melaleuca leucadendra | Hit
21 | Parameria laevigata | Hit ∗
22 | Psophocarpus tetragonolobus | Hit ∗
23 | Parkia roxburghii | Hit
24 | Piper cubeba | Hit ∗
25 | Kaempferia galanga | Hit
26 | Coriandrum sativum | Hit
27 | Cola acuminata | Hit
28 | Coffea arabica | Hit
29 | Orthosiphon stamineus | Hit
30 | Curcuma longa | Hit
31 | Piper nigrum | Hit
32 | Alpinia galanga | Hit
33 | Vitex trifolia | Hit
34 | Zingiber amaricans | Hit ∗
35 | Zingiber zerumbet | Hit
36 | Zingiber aromaticum | Hit
37 | Languas galanga | Hit
38 | Massoia aromatica | Hit
39 | Morinda citrifolia | Hit
40 | Carum copticum | Hit ∗
41 | Panax pseudoginseng | Hit ∗
42 | Oryza sativa | Hit
43 | Myristica fragrans | Hit
44 | Pandanus amaryllifolius | Hit
45 | Eurycoma longifolia | Hit
46 | Hydrocotyle asiatica | Hit
47 | Areca catechu | Hit ∗
48 | Mentha arvensis | Hit ∗
49 | Lepiniopsis ternatensis | Hit
50 | Pimpinella pruatjan | Hit
51 | Andrographis paniculata | Hit
52 | Blumea balsamifera | Hit
53 | Cymbopogon nardus | Hit
54 | Sida rhombifolia | Hit
55 | Cinnamomum sintoc | Hit
56 | Piper betle | Hit ∗
57 | Talinum paniculatum | Hit
58 | Elephantopus scaber | Hit
59 | Cyperus rotundus | Hit
60 | Curcuma aeruginosa | Hit
61 | Kaempferia pandurata | Hit ∗
62 | Curcuma xanthorrhiza | Hit
63 | Tribulus terrestris | Hit
64 | Corydalis yanhusuo | Hit
65 | Pausinystalia yohimbe | Hit

H. Disease: nutritional and metabolic diseases
1 | Foeniculum vulgare | Hit
2 | Glycyrrhiza uralensis | Hit
3 | Zingiber purpureum | Hit
4 | Allium sativum | Hit
5 | Tinospora tuberculata | Hit
6 | Pandanus conoideus | Hit
7 | Syzygium aromaticum | Hit
8 | Punica granatum | Hit
9 | Zingiber officinale | Hit
10 | Guazuma ulmifolia | Hit
11 | Nigella sativa | Hit
12 | Amomum compactum | Hit ∗
13 | Cinnamomum burmannii | Hit
14 | Parameria laevigata | Hit
15 | Caesalpinia sappan | Hit
16 | Soya max | Hit ∗
17 | Cocos nucifera | Hit
18 | Rheum tanguticum | Hit
19 | Piper cubeba | Hit ∗
20 | Murraya paniculata | Hit
21 | Kaempferia galanga | Hit ∗
22 | Coffea arabica | Hit ∗
23 | Orthosiphon stamineus | Hit
24 | Curcuma longa | Hit
25 | Piper nigrum | Hit ∗
26 | Zingiber aromaticum | Hit
27 | Aloe vera | Hit
28 | Phaleria papuana | Hit
29 | Galla lusitania | Hit
30 | Quercus lusitanica | Hit
31 | Morinda citrifolia | Hit
32 | Myristica fragrans | Hit ∗
33 | Momordica charantia | Hit
34 | Areca catechu | Hit
35 | Lepiniopsis ternatensis | Hit
36 | Alstonia scholaris | Hit
37 | Hibiscus sabdariffa | Hit
38 | Laminaria japonica | Hit
39 | Syzygium polyanthum | Hit
40 | Andrographis paniculata | Hit
41 | Sindora sumatrana | Hit ∗
42 | Cassia angustifolia | Hit
43 | Woodfordia floribunda | Hit
44 | Piper betle | Hit
45 | Spirulina | Hit
46 | Stevia rebaudiana | Hit
47 | Theae sinensis | Hit
48 | Sonchus arvensis | Hit
49 | Curcuma heyneana | Hit
50 | Curcuma aeruginosa | Hit
51 | Kaempferia pandurata | Hit ∗
52 | Curcuma xanthorrhiza | Hit
53 | Curcuma zedoaria | Hit ∗
54 | Olea europaea | Hit

I. Disease: respiratory diseases
1 | Foeniculum vulgare | Hit
2 | Clausena anisum-olens | Hit
3 | Glycyrrhiza uralensis | Hit
4 | Zingiber purpureum | Hit
5 | Piper retrofractum | Hit ∗
6 | Syzygium aromaticum | Hit
7 | Gaultheria punctata | Hit
8 | Panax ginseng | Hit
9 | Equisetum debile | Hit ∗
10 | Zingiber officinale | Hit
11 | Citrus aurantium | Hit ∗
12 | Nigella sativa | Hit ∗
13 | Amomum compactum | Hit
14 | Cinnamomum burmannii | Hit
15 | Melaleuca leucadendra | Hit
16 | Parkia roxburghii | Hit
17 | Cocos nucifera | Hit
18 | Piper cubeba | Hit
19 | Kaempferia galanga | Hit
20 | Coriandrum sativum | Hit
21 | Curcuma longa | Hit
22 | Piper nigrum | Hit
23 | Zingiber aromaticum | Hit
24 | Languas galanga | Hit
25 | Mentha piperita | Hit
26 | Oryza sativa | Hit ∗
27 | Myristica fragrans | Hit
28 | Pandanus amaryllifolius | Hit ∗
29 | Hydrocotyle asiatica | Hit ∗
30 | Mentha arvensis | Hit
31 | Lepiniopsis ternatensis | Hit
32 | Helicteres isora | Hit
33 | Blumea balsamifera | Hit
34 | Cymbopogon nardus | Hit
35 | Piper betle | Hit
36 | Curcuma xanthorrhiza | Hit
37 | Salix alba | Hit ∗
38 | Matricaria chamomilla | Miss ∗

J. Disease: skin and connective tissue
1 | Strychnos ligustrina | Hit
2 | Merremia mammosa | Hit ∗
3 | Piper retrofractum | Hit ∗
4 | Santalum album | Hit
5 | Zingiber officinale | Hit ∗
6 | Citrus aurantium | Hit
7 | Citrus hystrix | Hit
8 | Cassia siamea | Hit
9 | Cocos nucifera | Hit
10 | Trigonella foenum-graecum | Hit
11 | Orthosiphon stamineus | Hit
12 | Curcuma longa | Hit
13 | Vetiveria zizanioides | Hit
14 | Aloe vera | Hit
15 | Rosa chinensis | Hit
16 | Jasminum sambac | Hit
17 | Phyllanthus urinaria | Hit
18 | Mentha piperita | Hit
19 | Oryza sativa | Hit
20 | Myristica fragrans | Hit ∗
21 | Hydrocotyle asiatica | Hit
22 | Lepiniopsis ternatensis | Hit
23 | Alstonia scholaris | Hit
24 | Andrographis paniculata | Hit
25 | Cymbopogon nardus | Hit
26 | Piper betle | Hit
27 | Theae sinensis | Hit
28 | Curcuma heyneana | Hit
29 | Kaempferia pandurata | Hit ∗
30 | Curcuma xanthorrhiza | Hit
31 | Melaleuca leucadendra | Hit
32 | Matricaria chamomilla | Miss ∗

K. Disease: the urinary system
1 | Foeniculum vulgare | Hit ∗
2 | Imperata cylindrica | Hit ∗
3 | Strychnos ligustrina | Hit ∗
4 | Plantago major | Hit
5 | Zingiber officinale | Hit ∗
6 | Cinnamomum burmannii | Hit ∗
7 | Strobilanthes crispus | Hit
8 | Kaempferia galanga | Hit ∗
9 | Orthosiphon stamineus | Hit
10 | Phyllanthus urinaria | Hit
11 | Blumea balsamifera | Hit ∗
12 | Sonchus arvensis | Hit
13 | Curcuma xanthorrhiza | Hit

∗ indicates that the plant will not be assigned if we use matching score >0.7.

Figure 6: Distribution of 135 plants assigned based on the 0.7% dataset with respect to the number of diseases they are assigned to. The x-axis is the number of diseases (1 to 9) and the y-axis is the number of plants; the bar values are 63, 24, 14, 18, 5, 6, 2, 2, and 1.

for any type of application. It ensures coverage and performs robustly in the case of random addition, removal, and rearrangement of edges in protein-protein interaction (PPI) networks [14]. When applying DPClusO, we set the density and cluster property parameters to 0.9 and 0.5, respectively [15]. Table 3 summarizes the DPClusO clustering results. Because clusters consisting of two Jamu formulas are trivial, in the subsequent steps we use only clusters that consist of 3 or more Jamu formulas. The total number of clusters increases with the size of the dataset, although the threshold correlation between Jamu pairs decreases. We evaluated the clustering result using the matching score to determine the dominant disease for every cluster (step 3 in Figure 1). The matching score of a cluster is the ratio of the highest number of Jamu formulas associated with the same disease to the total number of Jamu formulas in the cluster. Thus the matching score indicates how strongly a disease is associated with a cluster. Figure 4 shows the distribution of the clusters with respect to matching score for the three datasets. All datasets have the highest frequency of clusters at matching score >0.9, and overall most of the clusters have high matching scores, which means that most of the clusters generated by DPClusO can be confidently related to a dominant disease. Furthermore, for the 0.3% dataset the number of clusters with matching score >0.9 is remarkably larger than the number in the other ranges of matching score (Figure 4(c)). Comparing the proportion of clusters with matching score >0.9 across datasets, the 0.3% dataset has the highest value at 40.84% (of 453), compared to 29.67% (of 873) and 21.91% (of 1296) for the 0.5% and 0.7% datasets, respectively. Thus, the most reliable species-to-disease relations can be predicted at matching score >0.9 from the clusters generated from the 0.3% dataset.
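The matching score defined above can be sketched in a few lines. The example assumes each Jamu formula carries exactly one disease class label and that clusters are given as lists of Jamu identifiers; `matching_score`, `nontrivial`, and the toy labels are hypothetical, and ties between diseases are resolved by first occurrence, which the text does not specify.

```python
from collections import Counter


def nontrivial(clusters):
    """Keep only clusters with 3 or more Jamu formulas."""
    return [c for c in clusters if len(c) >= 3]


def matching_score(cluster, disease_of):
    """Return (dominant disease, matching score) for one cluster.

    The score is the largest number of formulas sharing a disease class
    divided by the cluster size.
    """
    counts = Counter(disease_of[j] for j in cluster)
    disease, top = counts.most_common(1)[0]
    return disease, top / len(cluster)


# Toy labels and clusters; identifiers and classes are illustrative only.
disease_of = {
    "J1": "digestive", "J2": "digestive", "J3": "digestive",
    "J4": "urinary", "J5": "urinary",
}
clusters = [["J1", "J2", "J3", "J4"], ["J4", "J5"]]
kept = nontrivial(clusters)
dominant, score = matching_score(kept[0], disease_of)
```

Here the two-formula cluster is discarded as trivial, and the remaining cluster is labelled "digestive" with a matching score of 3/4.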

Figure 5(a) shows the success rate for all 3 datasets with respect to the threshold matching score. Success rate is defined as the ratio of the number of clusters with matching score larger than the threshold to the total number of clusters. As expected, the success rate tends to be lower when we decrease the correlation value used to create the datasets. However, more clusters are generated and more information can be extracted when we lower the threshold correlation value. The success rate increases rapidly as the matching score decreases


Table 5: Relation between disease classes in NCBI and efficacy classes reported by Afendi et al. [6].

Class of disease | Ref. | Efficacy class
D1 Blood and lymph diseases | NCBI | E7 Pain/inflammation (PIN)
D2 Cancers | NCBI | E7 Pain/inflammation (PIN)
D3 The digestive system | NCBI | E4 Gastrointestinal disorders (GST); E7 Pain/inflammation (PIN)
D4 Ear, nose, and throat | NCBI | E7 Pain/inflammation (PIN)
D5 Diseases of the eye | NCBI | E7 Pain/inflammation (PIN)
D6 Female-specific diseases | NCBI | E5 Female reproductive organ problems (FML)
D7 Glands and hormones | NCBI | E7 Pain/inflammation (PIN)
D8 The heart and blood vessels | NCBI | E7 Pain/inflammation (PIN)
D9 Diseases of the immune system | NCBI | E7 Pain/inflammation (PIN)
D10 Male-specific diseases | NCBI | E6 Musculoskeletal and connective tissue disorders (MSC)
D11 Muscle and bone | NCBI | E6 Musculoskeletal and connective tissue disorders (MSC)
D12 Neonatal diseases | NCBI | E7 Pain/inflammation (PIN)
D13 The nervous system | NCBI | E7 Pain/inflammation (PIN)
D14 Nutritional and metabolic diseases | NCBI | E2 Disorders of appetite (DOA); E4 Gastrointestinal disorders (GST)
D15 Respiratory diseases | NCBI | E8 Respiratory disease (RSP); E7 Pain/inflammation (PIN)
D16 Skin and connective tissue | NCBI | E9 Wounds and skin infections (WND)
D17 The urinary system | ∗ | E1 Urinary related problems (URI)
D18 Mental and behavioural disorders | ∗ | E3 Disorders of mood and behavior (DMB)

from 0.9 to 0.6, and after that the increase in success rate slows. Therefore, in this study we empirically chose 0.6 as the threshold matching score for predicting plant-disease relations.
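The success rate as a function of the threshold matching score can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `success_rate` and the list of matching scores are hypothetical and do not reproduce the values behind Figure 5(a).

```python
def success_rate(scores, threshold):
    """Fraction of clusters whose matching score is strictly above threshold."""
    if not scores:
        raise ValueError("at least one cluster is required")
    return sum(s > threshold for s in scores) / len(scores)


# Hypothetical matching scores for eight clusters.
scores = [1.0, 0.95, 0.8, 0.67, 0.5, 0.4, 0.33, 0.25]

# Success rate at several candidate thresholds.
rates = {t: success_rate(scores, t) for t in (0.9, 0.8, 0.7, 0.6)}
```

Lowering the threshold can only keep or raise the success rate, because every cluster that passes a higher threshold also passes a lower one.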

3.3. Assignment of Plants to Disease. By using the clusters generated by DPClusO, we assigned plants to classes of disease. Based on a threshold matching score, we assigned a dominant disease to each cluster. Then we assigned a plant to the cluster by analyzing the ingredients of the Jamu formulas belonging to that cluster and determining the highest frequency plant, that is, the plant that is used in the largest number of Jamu formulas belonging to that cluster (step 4 in Figure 1). Thus we assign a disease and a plant to each cluster having a matching score greater than the threshold. Our hypothesis is that the disease and the plant assigned to the same cluster are related.
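The assignment step described above can be sketched in Python as follows. The data layout (a list of disease label and plant set pairs), the function name `assign_cluster`, and the placeholder plant names are assumptions for illustration; ties between plants are not handled in any particular way here, and the source does not say how they were resolved.

```python
from collections import Counter


def assign_cluster(formulas, threshold=0.6):
    """Assign a dominant disease and the highest frequency plant to a cluster.

    formulas is a list of (disease_label, set_of_plants) pairs, one per Jamu
    formula in the cluster. Returns None when the matching score does not
    exceed the threshold.
    """
    diseases = [disease for disease, _ in formulas]
    dominant, count = Counter(diseases).most_common(1)[0]
    score = count / len(formulas)
    if score <= threshold:
        return None
    # Count each plant once per formula in which it appears.
    plant_counts = Counter(p for _, plants in formulas for p in set(plants))
    top_plant, _ = plant_counts.most_common(1)[0]
    return dominant, top_plant, score


# Hypothetical cluster; plant names are placeholders.
cluster = [
    ("D3", {"plant_A", "plant_B"}),
    ("D3", {"plant_A", "plant_C"}),
    ("D3", {"plant_A"}),
    ("D14", {"plant_B", "plant_D"}),
]
result = assign_cluster(cluster)
```

Here the cluster passes the 0.6 threshold with a matching score of 0.75, and `plant_A`, which occurs in three formulas, is paired with disease class D3.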

The total number of assigned plants depends on the matching score value. Figure 5(b) shows the number of predicted plants that can be assigned to diseases as a function of matching score. With a higher matching score value, the number of predicted plants assigned to classes of disease is expected to remain similar or decrease, but the reliability of prediction increases. In Figure 5(b) a sudden change in the number of predicted plants is seen at matching score 0.6, which we consider the empirical threshold in this work. Based on the 0.7% dataset, the largest number of plants (135 plants, Table 4) was assigned to diseases. There are 63 plants assigned to only one class of disease, whereas the other 72 plants are assigned to two or more classes of disease (Figure 6).

3.4. Evaluation of the Supervised Clustering Based on DPClusO. We used previously published results [6] as the gold standard to evaluate our results. The previous study assigned plants to 9 kinds of efficacy, whereas we assigned the plants to 18 disease classes (16 from NCBI and 2 additional classes). For the sake of evaluation, we had a professional doctor map the 18 disease classes to the 9 efficacy classes; the mapping is shown in Table 5. Table 6 shows the prediction result of plant-disease relations for all 3 datasets, corresponding to clusters with matching score greater than 0.6. Table 6 also shows the corresponding efficacy, the number of assigned plants, the number of correctly predicted plants, and the true positive rates (TPR), respectively.

We determined the TPR corresponding to a disease/efficacy class by calculating the ratio of the number of correct predictions to the number of all predictions. When a disease corresponds to more than one kind of efficacy, the highest TPR is considered the TPR for the corresponding disease. For all 3 datasets the TPR corresponding to each disease is roughly 90% or more. The 0.3% dataset consists of Jamu pairs with higher correlation values, and based on this dataset 117 plants are assigned to 14 disease classes. The 0.7% dataset contains more Jamu pairs and assigned plants to 11 disease classes, one disease class fewer than the 0.5% dataset. The two disease classes covered by the 0.3% dataset but not covered by the 0.5% and 0.7% datasets are the nervous system (D13) and diseases of the immune system (D9). The only disease class covered by the 0.3% and 0.5% datasets but not covered by the 0.7% dataset is mental and behavioural disorders (D18). The larger dataset network tends to have


Table 6: The prediction result of plant-disease relations using matching score >0.6.

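| Class of disease | Corresponding efficacy | 0.7% dataset: number of assigned plants | 0.7% dataset: correct prediction | 0.7% dataset: true positive rate | 0.5% dataset: number of assigned plants | 0.5% dataset: correct prediction | 0.5% dataset: true positive rate | 0.3% dataset: number of assigned plants | 0.3% dataset: correct prediction | 0.3% dataset: true positive rate |
|---|---|---|---|---|---|---|---|---|---|---|
| D1 | E7 | 26 | 22 | 0.85 | 24 | 20 | 0.83 | 24 | 20 | 0.83 |
| D2 | E7 | 1 | 1 | 1.00 | 5 | 5 | 1.00 | 1 | 1 | 1.00 |
| D3 | E4 | 42 | 42 | 1.00 | 33 | 33 | 1.00 | 28 | 28 | 1.00 |
| | E7 | | 38 | 0.90 | | 30 | 0.91 | | 25 | 0.89 |
| D4 | E7 | 0 | 0 | — | 0 | 0 | — | 0 | 0 | — |
| D5 | E7 | 0 | 0 | — | 0 | 0 | — | 0 | 0 | — |
| D6 | E5 | 38 | 38 | 1.00 | 37 | 37 | 1.00 | 32 | 32 | 1.00 |
| D7 | E7 | 0 | 0 | — | 0 | 0 | — | 0 | 0 | — |
| D8 | E7 | 10 | 8 | 0.80 | 8 | 7 | 0.88 | 6 | 5 | 0.83 |
| D9 | E7 | 0 | 0 | — | 0 | 0 | — | 1 | 1 | 1.00 |
| D10 | E6 | 6 | 4 | 0.67 | 2 | 0 | — | 3 | 1 | 0.33 |
| D11 | E6 | 65 | 65 | 1.00 | 71 | 71 | 1.00 | 60 | 60 | 1.00 |
| D12 | E7 | 0 | 0 | — | 0 | 0 | — | 0 | 0 | — |
| D13 | E7 | 0 | 0 | — | 0 | 0 | — | 5 | 5 | 1.00 |
| D14 | E2 | 54 | 44 | 0.81 | 45 | 36 | 0.80 | 35 | 26 | 0.74 |
| | E4 | | 54 | 1.00 | | 45 | 1.00 | | 35 | 1.00 |
| D15 | E7 | 38 | 37 | 0.97 | 34 | 34 | 1.00 | 33 | 33 | 1.00 |
| | E8 | | 31 | 0.82 | | 30 | 0.88 | | 29 | 0.88 |
| D16 | E9 | 32 | 31 | 0.97 | 32 | 32 | 1.00 | 27 | 27 | 1.00 |
| D17 | E1 | 13 | 13 | 1.00 | 9 | 9 | 1.00 | 8 | 8 | 1.00 |
| D18 | E3 | 0 | 0 | — | 5 | 5 | 1.00 | 4 | 4 | 1.00 |
| Total assigned plants | | 135 | | | 129 | | | 117 | | |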

lower coverage of disease classes. The number of Jamu pairs, that is, the number of edges in the network, affects the number of clusters generated by DPClusO and the number of Jamu formulas per cluster. As a consequence, for the larger dataset networks, the success rate becomes lower and the coverage of disease classes is lower, but more plant-disease relations can be predicted.
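The TPR calculation described in Section 3.4, including the rule of taking the highest TPR when a disease class maps to more than one efficacy class, can be sketched as follows. The function name `tpr_for_disease` is an assumption for illustration; the counts are the 0.7% dataset values for D3 and D15 in Table 6.

```python
def tpr_for_disease(n_assigned, correct_by_efficacy):
    """Return the TPR of one disease class, or None if no plant was assigned.

    n_assigned is the number of plants assigned to the disease class.
    correct_by_efficacy maps each corresponding efficacy class to its number
    of correct predictions; the highest resulting ratio is reported.
    """
    if n_assigned == 0:
        return None
    return max(c / n_assigned for c in correct_by_efficacy.values())


# Counts taken from Table 6 (0.7% dataset).
tpr_d3 = tpr_for_disease(42, {"E4": 42, "E7": 38})
tpr_d15 = tpr_for_disease(38, {"E7": 37, "E8": 31})
```

For D3 the ratio for E4 (42/42) exceeds the ratio for E7 (38/42), so the TPR reported for the disease class is the E4 value.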

4. Conclusions

This paper introduces a novel method called supervised clustering for analyzing big biological data by integrating network clustering and selection of clusters based on supervised learning. In the present work we applied the method to data mining of Jamu formulas accumulated in the KNApSAcK database. Jamu networks were constructed based on correlation similarities between Jamu formulas, and then the network clustering algorithm DPClusO was applied to generate high density Jamu modules. For the analysis in the next steps, potential clusters were selected by supervised learning. The successful clusters containing several Jamu related to the same disease might be useful for finding the main ingredient plant for that disease, and the clusters with lower matching score values will be associated with varying plants which might be supporting ingredients. By applying the proposed method, important plants from Jamu formulas for every class of disease were determined. The plant to disease relations predicted by the proposed network based method were evaluated in the context of previously published results and were found to produce a TPR of 90%. For the larger dataset networks, the success rate and the coverage of disease classes become lower, but more plant-disease relations can be predicted.

Conflict of Interests

The authors declare that there is no financial interest or conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Bioscience Database Center in Japan and the Ministry of Education, Culture, Sports, Science, and Technology of Japan (Grant-in-Aid for Scientific Research on Innovation Areas “Biosynthetic Machinery. Deciphering and Regulating the System for Creating Structural Diversity of Bioactivity Metabolites (2007)”).


References

[1] R. Verporte, H. K. Kim, and Y. H. Choi, “Plants as source of medicines,” in Medicinal and Aromatic Plants, R. J. Boger, L. E. Craker, and D. Lange, Eds., chapter 19, pp. 261–273, 2006.

[2] A. Furnharm, “Why do people choose and use complementary therapies?” in Complementary Medicine: An Objective Appraisal, E. Ernst, Ed., pp. 71–88, Butterworth-Heinemann, Oxford, UK, 1996.

[3] E. Ernst, “Herbal medicines put into context,” British Medical Journal, vol. 327, no. 7420, pp. 881–882, 2003.

[4] F. M. Afendi, T. Okada, M. Yamazaki et al., “KNApSAcK family databases: integrated metabolite—plant species databases for multifaceted plant research,” Plant and Cell Physiology, vol. 53, no. 2, p. e1, 2012.

[5] F. M. Afendi, N. Ono, Y. Nakamura et al., “Data mining methods for omics and knowledge of crude medicinal plants toward big data biology,” Computational and Structural Biotechnology Journal, vol. 4, no. 5, Article ID e201301010, 2013.

[6] F. M. Afendi, L. K. Darusman, A. Hirai et al., “System biology approach for elucidating the relationship between Indonesian herbal plants and the efficacy of Jamu,” in Proceedings of the 10th IEEE International Conference on Data Mining Workshops (ICDMW ’10), pp. 661–668, Sydney, Australia, December 2010.

[7] F. M. Afendi, L. K. Darusman, A. H. Morita et al., “Efficacy of Jamu formulations by PLS modeling,” Current Computer-Aided Drug Design, vol. 9, pp. 46–59, 2013.

[8] F. M. Afendi, L. K. Darusman, M. Fukuyama, M. Altaf-Ul-Amin, and S. Kanaya, “A bootstrapping approach for investigating the consistency of assignment of plants to Jamu efficacy by PLS-DA model,” Malaysian Journal of Mathematical Sciences, vol. 6, no. 2, pp. 147–164, 2012.

[9] W. Winterbach, P. V. Mieghem, M. Reinders, H. Wang, and D. de Ridder, “Topology of molecular interaction networks,” BMC Systems Biology, vol. 7, article 90, 2013.

[10] C. Bachmaier, U. Brandes, and F. Schreiber, “Biological network,” in Handbook of Graph Drawing and Visualization, pp. 621–651, CRC Press, 2013.

[11] X. Chen, M. Chen, and K. Ning, “BNArray: an R package for constructing gene regulatory networks from microarray data by using Bayesian network,” Bioinformatics, vol. 22, no. 23, pp. 2952–2954, 2006.

[12] P. Langfelder and S. Horvath, “WGCNA: an R package for weighted correlation network analysis,” BMC Bioinformatics, vol. 9, article 559, 2008.

[13] A. Martin, M. E. Ochagavia, L. C. Rabasa, J. Miranda, J. Fernandez-de-Cossio, and R. Bringas, “BisoGenet: a new tool for gene network building, visualization and analysis,” BMC Bioinformatics, vol. 11, article 91, 2010.

[14] M. Altaf-Ul-Amin, M. Wada, and S. Kanaya, “Partitioning a PPI network into overlapping modules constrained by high-density and periphery tracking,” ISRN Biomathematics, vol. 2012, Article ID 726429, 11 pages, 2012.

[15] M. Altaf-Ul-Amin, H. Tsuji, K. Kurokawa, H. Asahi, Y. Shinbo, and S. Kanaya, “DPClus: a density-periphery based graph clustering software mainly focused on detection of protein complexes in interaction networks,” Journal of Computer Aided Chemistry, vol. 7, pp. 150–156, 2006.

[16] S. K. Kachigan, Multivariate Statistical Analysis: A Conceptual Introduction, Radius Press, New York, NY, USA, 1991.

[17] J. L. Rodgers and W. A. Nicewander, “Thirteen ways to look at the correlations coefficient,” The American Statistician, vol. 42, pp. 59–66, 1995.

[18] M. Li, J.-E. Chen, J.-X. Wang, B. Hu, and G. Chen, “Modifying the DPClus algorithm for identifying protein complexes based on new topological structures,” BMC Bioinformatics, vol. 9, article 398, 2008.

[19] World Health Organization, “International Classification of Diseases (ICD) 10,” 2010, http://www.who.int/classifications/icd/en/.

[20] National Center for Biotechnology Information, Genes and Disease, NCBI, Bethesda, Md, USA, 1998.

[21] P. Erdos and A. Renyi, “On the evolution of random graph,” Publicationes Mathematicae Debrecen, vol. 6, pp. 290–297, 1959.

[22] A.-L. Barabasi and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.

[23] A. Vazquez, “Growing network with local rules: preferential attachment, clustering hierarchy, and degree correlations,” Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, vol. 67, no. 5, Article ID 056104, 15 pages, 2003.

[24] Max Planck Institut Informatik, “NetworkAnalyzer,” 2013, http://med.bioinf.mpi-inf.mpg.de/netanalyzer/index.php.
