Construct an Optimal Triage Prediction Model: A Case Study of the Emergency Department of a Teaching Hospital in Taiwan

ORIGINAL PAPER


Shen-Tsu Wang

Received: 9 July 2013 / Accepted: 13 August 2013 / Published online: 29 August 2013 © Springer Science+Business Media New York 2013

Abstract The purpose of triage is to prevent delays in treating patients with real emergencies that arise from excessive numbers of patients in the hospital. This study uses data from patients with consistent triage to develop a triage prediction model. By integrating Principal Component Analysis (PCA) and Support Vector Machine (SVM), the prediction accuracy rate for anomaly detection (overestimate and underestimate) reaches 100 %, which is better than the accuracy rate of SVM alone (about 89.2 %) or Back-propagation Neural Networks (BPNN) (96.71 %). This study then uses Support Vector Regression (SVR), with a Genetic Algorithm (GA) determining the three SVR parameters, to predict triage. Using the predictive values for the scroll data, we calculate the Absolute Percentage Error (APE) of each scroll datum. The resulting Mean Absolute Percentage Error (MAPE) is 3.78 % for SVR and 5.99 % for BPNN; therefore, the triage prediction model proposed in this study can effectively predict anomaly detection and triage.

Keywords Patients · Triage · Prediction model · PCA · Anomaly detection · SVM

Introduction

This study summarizes the medical processes and triage status of patients in medical centres in order to develop a triage prediction model. On one hand, various parameters of clinical diagnostic significance are identified by the triage prediction model; on the other hand, through a data mining prediction model, concrete data mining steps can be defined. This study extracted the required samples from the patient visit database of the emergency department of a teaching hospital in Taiwan, and summarized and classified the data as required. From process construction and parameter selection to sample selection, as well as literature review, we engaged in thorough discussions with the administrative directors and professionals of the emergency department. The research design, parameter measurement, data acquisition, and equipment explanations are illustrated below. The purpose of establishing a triage system in the emergency department is “to prevent the delay of treatment for patients in real emergencies due to excessive numbers of visits to the hospital”. The clinical process of the emergency department is as follows: “after patients are admitted to the emergency room, the nursing staff will implement triage examination according to the criticality of patients, and distribute them to the various divisions for patient treatment. After the completion of the clinic treatment of physicians, the patients will be arranged according to conditions, and then, the triage determinations of nursing staff are evaluated”. If the nursing staff and physicians are consistent in triage, the triage is correct. If the criticality judged by the nursing staff is more serious than that judged by the physician, it is known as an “overestimate”; if the criticality judged by the nursing staff is less serious than that judged by the physician, it is known as an “underestimate” [1, 2].
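
The consistency rule described above can be illustrated with a short Python sketch. It assumes that triage levels are coded as integers 1 to 4, with Level 1 the most critical; the function name `compare_triage` is hypothetical and is not part of the hospital's system.

```python
def compare_triage(nurse_level: int, physician_level: int) -> str:
    """Label a nurse triage decision relative to the physician's evaluation.

    Assumption: levels are integers 1-4 and Level 1 is the most critical.
    """
    for level in (nurse_level, physician_level):
        if level not in (1, 2, 3, 4):
            raise ValueError(f"triage level must be 1-4, got {level}")
    if nurse_level == physician_level:
        return "consistent"
    # A smaller level number means a more serious judgment.
    if nurse_level < physician_level:
        return "overestimate"
    return "underestimate"
```

For example, a nurse assigning Level 1 to a patient whom the physician later rates as Level 3 would be labelled an overestimate under this coding.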

Regarding the criteria used by the nursing staff, they are based on the patient’s complaints of (c/o), medical history, general appearance, vital signs, symptoms and signs, and physical assessment results. The patient’s medical history, general appearance, symptoms and signs, and physical assessment results cannot be quantified; therefore, the emergency department combines them into the patient c/o (the patient’s description of his or her own condition). The vital signs consist of 6 parameters: breathing, temperature, pulse, diastolic pressure (Dias.), systolic pressure (Systolic), and SaO2. With respect to breathing, as abnormal breathing is rare among patients, the nursing staff report any abnormality in breathing together with the c/o for evaluation. Summarizing the emergency department’s processes and parameters, this study converted the process into the nursing staff triage and a system analysis prediction model. The existing triage processes and relevant examination items can be found in the model, and patients are classified by criticality into Level 1, Level 2, Level 3, and Level 4 for judging overestimates and underestimates. In this study, the data of patients with consistent triage are used to develop a triage prediction model [2, 3].

S.-T. Wang (*)
Department of Commerce Automation and Management, National Pingtung Institute of Commerce, No. 51, Min Sheng E. Road, Pingtung 900, Taiwan, Republic of China
e-mail: [email protected]

J Med Syst (2013) 37:9968
DOI 10.1007/s10916-013-9968-x

Regarding data prediction, emergency department activity reflects the global increase in patients' health problems during a given period, and the profile of patients referred to emergency departments might be a basis for detecting excess mortality in the catchment area [4]. The objective of that reference was to develop a real-time surveillance model based on emergency department data to detect excessive heat-related mortality as early as possible. A simulation model, using a full factorial design of experiments with two factors and the model’s predicted percent diversion as the response function, has identified that emergency department diversion could be negligible (less than 0.5 %) if patients discharged home stay in the emergency department no more than 5 h, and patients admitted into the hospital stay in the emergency department no more than 6 h [5]. We extracted the data of various parameters from the patient visit database of the emergency department (including the registration query database, triage database, and medical order query database). If the patient data contained extreme values or missing values, we discussed the case with the professionals in the emergency department to confirm whether the data should be modified or deleted before inputting the summarized patient data into the data mining classification tool for subsequent analysis to improve the consistency of triage decision making. As the classification results are subject to the impact of the algorithm, parameter selection, and data acquisition, in order to increase the credibility of the experiment and improve the classification accuracy rate, the data were divided into training and testing groups prior to analysis for 10-fold cross-validation.
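
The division into training and testing groups can be sketched as follows, under the assumption that the cross validation is a standard 10-fold scheme in which each fold serves once as the testing group. The function name `k_fold_indices` and the fixed random seed are illustrative choices, not details taken from the study.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx
```

With 3000 samples and k = 10, each round trains on 2700 samples and tests on the remaining 300, and every sample is tested exactly once.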

Regarding the parameters, according to the literature, we used the patient’s c/o, medical history, general appearance, vital signs, symptoms and signs, and physical assessment results for measurement. After discussions, the head of the emergency department also believed that the above parameters can sufficiently describe patient triage. The parameters are illustrated below. The six vital-sign parameters are all ratio scales. According to the “Adult Triage Scale” of the Department of Health, Executive Yuan, patients with body temperature ≥41 °C or ≤32 °C belong to triage Level 1; patients with body temperature between 39 °C and 40 °C, or between 32 °C and 35 °C, belong to triage Level 2; patients with body temperature <39 °C belong to triage Level 3; the remaining patients belong to triage Level 4. Patients with systolic pressure >220 mmHg or ≤80 mmHg belong to triage Level 1; patients with systolic pressure between 180 mmHg and 220 mmHg belong to triage Level 2; there are no clear definitions regarding patients belonging to triage Level 3 or Level 4 [6]. In addition, according to the clinical diagnosis of physicians, regarding blood pressure: systolic pressure between 140 mmHg and 160 mmHg and diastolic pressure between 90 mmHg and 95 mmHg are threshold values; when systolic pressure >160 mmHg and diastolic pressure >95 mmHg, it is known as “hypertension”; when systolic pressure is below 90 mmHg~100 mmHg, it is known as “hypotension”. Regarding adult pulse: a pulse of more than 100 beats per minute is known as “tachycardia”; a pulse of fewer than 60 beats per minute is known as “bradycardia”; heartbeats that are too fast or too slow often affect the volume of blood pumped by the heart per minute. Regarding SaO2: the normal SaO2 level is 92 %~99 %, and an SaO2 level that is too low can result in coma. Regarding breathing, which we reported along with the c/o: for adults, with fixed breathing depth, a breathing rate above 24 breaths per minute is known as “tachypnea”; a breathing rate below 8 breaths per minute is known as “oligopnea”; when the breathing rate is normal and the breathing depth is deeper, it is known as “breathing transition”; when the breathing rate is fixed and the breathing depth is shallower, it is known as “hypopnea”; breathing transition and tachypnea are known as “excessive ventilation”. It can be seen from the above that the vital-sign parameters are not fully listed on the “Adult Triage Scale”. With the exception of temperature, which is explicitly classified into four levels, the remaining parameters depend on the judgment of experienced nursing staff [2, 3, 7–14].
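
The pulse, breathing, and SaO2 thresholds quoted above can be expressed as a small rule check. This is only an illustrative sketch: the function name `flag_vital_signs` and the flag labels are hypothetical, the breathing rules assume a fixed breathing depth as stated in the text, and the sketch does not assign a triage level.

```python
def flag_vital_signs(pulse, breaths_per_min, sao2):
    """Return descriptive flags for adult pulse, breathing rate, and SaO2 (%)."""
    flags = []
    if pulse > 100:
        flags.append("tachycardia")
    elif pulse < 60:
        flags.append("bradycardia")
    if breaths_per_min > 24:
        flags.append("tachypnea")
    elif breaths_per_min < 8:
        flags.append("oligopnea")
    # The text gives 92-99 % as the normal SaO2 range; only the low side is flagged here.
    if sao2 < 92:
        flags.append("low SaO2")
    return flags
```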

In the busy society of Taiwan, where medical service quality is being upgraded, many important studies have discussed improving medical service quality, including the waiting that patients do not expect. Song et al. (2010) [15] therefore attempted to identify the optimal physician inquiry start time by solving a goal-programming problem. One of their simulation results shows that the proposed optimal physician inquiry start time decreased patient wait times by 50 % without increasing overall physician utilization; the solution greatly shortened the wait time of patients. Su et al. (2012) [16] considered that pressure ulcers are a serious problem during patient care processes. Four data mining techniques, namely, the Mahalanobis Taguchi System (MTS), Support Vector Machines (SVMs), Decision Tree (DT), and Logistic Regression (LR), were used to select the important attributes of data in order to predict the incidence of pressure ulcers. That reference concludes that data mining techniques can help identify important factors and provide a feasible model to predict pressure ulcer development. Such findings may help spare patients the severe pain caused by pressure ulcers. Yeh et al. (2008) [2] evaluated inter-rater reliability for the present emergency treatment triage system, and compared case histories with computer data after systematic sampling. The findings showed that the accuracy of using the triage results of emergency physicians to predict the hospitalization of patients was more than twice that of emergency nursing personnel. Wei et al. (2012) [17] used data mining techniques to investigate the disease patterns of various administrative areas, in order to analyze the differences among them and to draw up a disease distribution map. It is hoped that such mapping may help formulate future public health strategies and lead to more appropriately allocated medical resources. Cheng et al. (2012) [18] adopted healthcare failure mode and effect analysis (HFMEA) to identify potential chemotherapy process failures. Chemotherapy is regarded as a high-risk process: multiple errors can occur during ordering, preparation, compounding, dispensing, and administering medications, which can lead to serious consequences. HFMEA is a useful tool for evaluating potential risk in healthcare processes.

It is learnt from the relevant literature that Back-propagation Neural Networks (BPNN) and Support Vector Machine (SVM) have considerably good prediction effectiveness [19–22]. Differing from previous studies and methodology applications [2, 15–22], this study combined Principal Component Analysis (PCA) and SVM to analyze anomaly detection (overestimate and underestimate), and compared the accuracy rate with those of SVM and BPNN. Next, we used Support Vector Regression (SVR), with the three SVR parameters searched for by the Genetic Algorithm (GA), to predict triage. Using the predictive values for the scroll data, we calculated the Absolute Percentage Error (APE) of each scroll datum and evaluated performance using the Mean Absolute Percentage Error (MAPE). It is expected that the proposed triage prediction model can be used to effectively predict anomaly detection and triage for the reference of relevant medical institutions [23–26].

Literature review

Regarding BPNN, an evolving hybrid neural approach [19] is considered in this study. To evaluate the effectiveness of the proposed approach, Production Simulation (PS) is employed to generate test data. According to the experimental results, the prediction accuracy of the evolving hybrid neural approach is significantly better than those of many existing approaches. In addition, to improve the practicability of the evolving hybrid neural approach, several issues in practical applications are addressed and discussed. PS has been used for a reliable water supply policy, which was specifically applied during the dry season for accurate predictions of water table depth fluctuations [20]. Owing to the difficulties of identifying a non-linear model structure and estimating its associated parameters, a BPNN and a Radial Basis Function Network (RBFN) model were taken into account in that study. A back-propagation neural network model, with a delta algorithm, was calibrated using historical groundwater level records and related hydro-meteorological data to simulate water table fluctuations in the study area. Similarly, an RBFN was used to analyze the water table depth prediction for four different stations. The importance performance analysis (IPA) model has been widely used as the primary tool for customer satisfaction management [27]. IPA is a 2-D matrix analysis based on the importance and performance of the organization from the customer perception of quality. The firm’s customer satisfaction management strategy is formulated according to the IPA analysis results. That reference puts forth a new decision making and analysis methodology that exploits the BPNN to establish quality characteristics and important hidden integral satisfaction assumptions. The decision making trial and evaluation laboratory (DEMATEL) is used to calculate the causal relationship and extent of mutual influence among qualities in order to adjust the importance of quality characteristics and identify the core Order-Winners and Qualifiers problems. The proposed method modifies quality importance, improves IPA model ranking, and resolves difficult practical problems with fewer resources. The approach is illustrated with Taiwan industrial computers, working in conjunction with IPA models established with BPNN and DEMATEL, to observe its application and effect.

Regarding PCA, Li et al. (2007) [28] considered a batch of jobs waiting for the services of a machine or resource, where it is desirable to minimize the variance of job waiting times, the Waiting Time Variance (WTV), for the service stability of all the jobs in the batch, in order that all jobs have approximately the same waiting times. Many factors, including the sum of the jobs' processing times, the probability distribution of job processing times, and the scheduling method, may influence the variance of job waiting times. That study used multivariate exploratory techniques, such as PCA and Correspondence Analysis (CA), along with other statistical analysis techniques, to investigate these factors. Yang et al. (2009) [29] considered the paddy stem borer (Scirpophaga incertulas), which is an important insect pest of rice. Damaged plants wither, and the tassels die or become blanched and infertile. Severe infestation leads to greatly decreased grain production. The best damage control requires accurate descriptions and forecasting of population dynamics. That paper applied PCA and Back Propagation (BP) Artificial Neural Network (ANN) methods to analyze historical data on population occurrence, to determine a non-linear relation between pest occurrence and meteorological factors, and to build the prediction model. Test results show that there indeed exists a non-linear relation between insect population occurrence and meteorological factors, and that the new prediction model, based on BP ANN and PCA, improved prediction accuracy as compared with other methods. Lloyd (2010) [30] noted that the analysis of quantitative sources of data on multiple population characteristics is often conducted through the use of some form of multivariate statistical procedure, such as PCA. Such approaches assist in the identification of characteristics that group different populations, or those that vary between the groups. That paper focuses on two particular problems that are rarely considered in the analysis of multivariate population data. The methods are illustrated through a case study, which focuses on selected characteristics of the population of Northern Ireland, as represented in data released from the 2001 Census of Population. Key substantive findings included that the characteristics which most strongly differentiate members of the population are geographically variable.

Regarding SVM, standard univariate analysis of neuroimaging data has revealed a host of neuroanatomical and functional differences between healthy individuals and patients suffering from a wide range of neurological and psychiatric disorders [21]. Because these findings are significant only at the group level, they have had limited clinical translation, and recent attention has turned toward alternative forms of analysis, including SVM, which is a type of machine learning. SVM allows categorization of an individual's previously unseen data into a predefined group using a classification algorithm developed on a training data set. In recent years, SVM has been successfully applied in the context of disease diagnosis, transition prediction, and treatment prognosis, using both structural and functional neuroimaging data. That reference provides a brief overview of the method, and reviews studies that applied it to the investigation of Alzheimer's disease, schizophrenia, major depression, bipolar disorder, presymptomatic Huntington's disease, Parkinson's disease, and autistic spectrum disorder. Chang et al. (2010) [22] considered that most thyroid nodules are heterogeneous with various internal components, which confuse many radiologists and physicians with their various echo patterns in ultrasound images. Numerous textural feature extraction methods are used to characterize these patterns in order to reduce the misdiagnosis rate. Thyroid nodules can be classified using the corresponding textural features. In that paper, six SVMs are adopted to select significant textural features and classify the nodular lesions of a thyroid. Experimental results show that the proposed method can correctly and efficiently classify thyroid nodules. Wu et al. (2008) [31] considered the implicit characteristics of learning disabilities (LD), as the identification or diagnosis of students with learning disabilities has long been a difficult issue. The LD diagnosis procedure usually involves the interpretation of standard tests or checklist scores for comparison to the norms derived from statistical methods. That study applied two well-known artificial intelligence techniques, ANN and SVM, to the LD diagnosis problem. To improve overall identification accuracy, the authors also experimented with GA-based feature selection algorithms as the pre-processing step. To the best of those authors' knowledge, it was the first attempt at applying ANN or SVM to similar applications. The experimental results show that, in general, ANN performs better than SVM in this application; the wrapper-based GA feature selection procedure can improve LD identification accuracy; and the combination of using the SVM learner in the feature selection procedure and the ANN learner in the classification stage results in a feature set that achieves the best prediction accuracy. Most important of all, the study indicates that the ANN classifier can correctly identify up to 50 % of the LD students with 100 % confidence, which is much better than the currently used LD diagnosis predictors derived through the statistical method.

SVR has often been applied in the prediction of financial time series with many characteristics. On account of the time consumption of a global SVR, local machines are used to accelerate computation. Jiang and He (2011) [26] introduced the local grey SVR (LG-SVR), which integrates grey relational grades with local SVR for financial time series forecasting. The pattern search method and leave-one-out errors are adopted for model selection. Experimental results of three actual financial time series predictions demonstrate that LG-SVR can increase computing speed and improve prediction accuracy. Lu (2009) [32] considered that financial time series are inherently noisy and non-stationary, and that forecasting them is regarded as one of the most challenging applications of time series forecasting. Due to its advantage of generalization capability in obtaining a unique solution, SVR has also been successfully applied in financial time series forecasting. In the modelling of financial time series using SVR, one of the key problems is the inherent high noise; thus, detecting and removing noise are important but difficult tasks when building an SVR forecasting model. To alleviate the influence of noise, a two-stage modelling approach, using independent component analysis (ICA) and SVR, is proposed for financial time series forecasting. ICA is a novel statistical signal processing technique that was originally proposed to find the latent source signals of observed mixture signals, without having any prior knowledge of the mixing mechanism. The proposed approach first applies ICA to the forecasting variables in order to generate the independent components (ICs). After identifying and removing the ICs containing noise, the remaining ICs are then used to reconstruct the forecasting variables, which contain less noise and serve as the input variables of the SVR forecasting model.


Integration of PCA and SVM optimization methodologies

This study combined PCA and SVM to analyze anomaly detection (overestimate and underestimate), and compared the accuracy with those of SVM and BPNN. Next, we used SVR, with the three SVR parameters searched for by the GA, to predict triage. Using the predictive values for the scroll data, we calculated the APE of each scroll datum and evaluated performance using MAPE. It is expected that the proposed triage prediction model can be used to effectively predict anomaly detection and triage for the reference of relevant medical institutions.
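
The two error measures named above can be computed as in the following sketch, which uses the standard definitions APE = |actual − predicted| / |actual| × 100 % and MAPE = the mean of the APE values. The function names are illustrative, and the sketch assumes that no actual value is zero (which holds for triage levels 1 to 4).

```python
def absolute_percentage_errors(actual, predicted):
    """APE (%) for each actual/predicted pair; actual values must be non-zero."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


def mean_absolute_percentage_error(actual, predicted):
    """MAPE (%): the average of the individual APE values."""
    apes = absolute_percentage_errors(actual, predicted)
    return sum(apes) / len(apes)
```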

Principal Component Analysis (PCA)

PCA is used to identify representative patterns in data. The purpose of PCA is to find a new set of dimensions (attributes) that captures the variation in the data; it can therefore serve as a pattern-finding technique. Most of the variation in the data can often be captured by a small portion of the full set of dimensions, so dimensionality reduction yields data of much lower dimension, and much of the noise can be reduced in the process. By searching for the projection axes of maximum variance, PCA can judge the number of independent variables. As it deletes unimportant or irrelevant data and develops new variables from highly correlated data, it can reduce the complexity of computation while limiting the loss of information [28, 29]. The assumed data matrix is as shown in Eq. (1):

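$$
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1m} \\
x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \cdots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nm}
\end{bmatrix} \tag{1}
$$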

where $X$ is the data set, $x_{ij}$ represents the value of variable $j$ of sample $i$, and $x_i$ is the vector of object $i$ in the data set. As variables may be correlated, they can be integrated into new assessment variables by PCA, as shown in Eq. (2):

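$$
\begin{cases}
y_1 = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1m}x_m \\
y_2 = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2m}x_m \\
\quad\vdots \\
y_m = a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mm}x_m
\end{cases} \tag{2}
$$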

where $y_i$ and $y_j$ ($i \neq j$) are mutually uncorrelated; $y_1$ has the largest variance (square difference) among all linear combinations of $x_1, x_2, \ldots, x_m$ that satisfy the above equations, $y_2$ has the second largest variance, and so on, with $y_m$ being the new evaluation variable with the smallest variance. Therefore, $y_1, y_2, \ldots, y_m$ are respectively known as the first, second, ..., and $m$-th principal components, and their variances gradually decrease.

PCA determines the data transformation that satisfies the following characteristics:

(1) Each pair of distinct new attributes has 0 covariance.

(2) The attributes are ordered by the amount of variance of the data that each captures.

(3) The first attribute captures as much of the variance of the data as possible.

(4) Subject to the orthogonality requirement, each successive attribute captures as much of the remaining variance as possible.

Data transformations with these characteristics can be obtained by eigenvalue analysis of the covariance matrix. Regarding the derivation of the algorithm, the data set matrix should first be standardized; for example, if the matrix data have no differences of dimension, the Z-score transformation can be implemented directly. If the standardized data matrix is $A$, then the correlation matrix is $R = A^{T}A$, where $A^{T}$ is the transpose of $A$, and the correlation coefficient matrix of the variables is as shown in Eq. (3):

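$$
R = \begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1m} \\
r_{21} & r_{22} & \cdots & r_{2m} \\
\vdots & \vdots & \cdots & \vdots \\
r_{m1} & r_{m2} & \cdots & r_{mm}
\end{bmatrix} \tag{3}
$$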
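
The standardization and correlation matrix steps can be sketched in plain Python as follows. Two points are assumptions made for this illustration rather than details from the text: the Z-score uses the population standard deviation, and the product $A^{T}A$ is divided by the number of samples $n$ so that the diagonal entries equal 1. The function names are hypothetical.

```python
import math


def z_score_columns(data):
    """Standardize each column to mean 0 and population standard deviation 1.

    Columns with zero variance are not handled and would raise ZeroDivisionError.
    """
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n)
        for j in range(m)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in data]


def correlation_matrix(data):
    """Correlation matrix R = A^T A / n, where A is the z-scored data matrix."""
    a = z_score_columns(data)
    n, m = len(a), len(a[0])
    return [
        [sum(a[i][p] * a[i][q] for i in range(n)) / n for q in range(m)]
        for p in range(m)
    ]
```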

Next, feature selection mainly filters out redundant or irrelevant attributes; however, some data may be lost. Redundant features are those that duplicate most of the information contained in one or more other attributes, while irrelevant features are those that contain almost no useful information.

The $m$ eigenvalues $\lambda_i$ are obtained from the characteristic equation:

$$
\lvert R - \lambda I \rvert = 0, \quad \lambda_1 > \lambda_2 > \cdots > \lambda_m \geq 0
$$

As the correlation coefficient matrix is a symmetric matrix, the eigenvectors $a_i = (a_{i1}, a_{i2}, \cdots, a_{im})$ corresponding to the eigenvalues $\lambda_i$ should satisfy Eq. (4):

$$
a_i a_j^{T} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \tag{4}
$$

By using the eigenvectors, we can obtain the corresponding principal component operation equation, Eq. (5):

$$
y_i = (a_{i1}, a_{i2}, \ldots, a_{im}) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} \tag{5}
$$
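
As a rough illustration of how the leading eigenvector and the projection of Eq. (5) might be computed without a linear algebra library, the sketch below uses power iteration. Power iteration is a substitute technique chosen here for brevity and is not stated in the text; it only recovers the first principal component, and the function names, start vector, and iteration count are assumptions.

```python
import math


def leading_eigenpair(matrix, iterations=500):
    """Approximate the largest eigenvalue and its unit eigenvector by power iteration."""
    m = len(matrix)
    # Slightly uneven start vector, so it is unlikely to be orthogonal to the target.
    vec = [1.0 + 0.1 * i for i in range(m)]
    for _ in range(iterations):
        product = [sum(matrix[p][q] * vec[q] for q in range(m)) for p in range(m)]
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    product = [sum(matrix[p][q] * vec[q] for q in range(m)) for p in range(m)]
    eigenvalue = sum(vec[p] * product[p] for p in range(m))  # Rayleigh quotient
    return eigenvalue, vec


def project(eigvec, x):
    """Eq. (5): the principal component score is the dot product a_i . x."""
    return sum(a * xi for a, xi in zip(eigvec, x))
```

For a 2 × 2 correlation matrix with off-diagonal entry 0.8, the leading eigenvalue is 1.8 and the eigenvector weights both standardized variables equally.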

The extent to which each principal component summarizes the original variables can be represented by the corresponding variance (square difference) contribution rate, which can be calculated by Eq. (6):

$$
f_i = \frac{\lambda_i}{\sum_{j=1}^{m} \lambda_j} \tag{6}
$$

where $f_i$ is the variance (square difference) contribution rate of the $i$-th principal component, and $\lambda_i$ is the corresponding eigenvalue. If the preset threshold for the principal component contribution rate is $\alpha$, the first $k$ principal components are selected in descending order of contribution rate such that their cumulative contribution rate is greater than the threshold, as shown in Eq. (7); these $k$ principal components are then the confirmed principal components.

$$
\sum_{i=1}^{k} f_i > \alpha \tag{7}
$$
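
The selection rule of Eqs. (6) and (7) can be sketched as follows: the contribution rate of each component is its eigenvalue divided by the sum of all eigenvalues, and $k$ is the smallest number of leading components whose cumulative rate exceeds the threshold. The function names and the example eigenvalues are illustrative only.

```python
def contribution_rates(eigenvalues):
    """Eq. (6): share of the total variance carried by each component."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]


def select_components(eigenvalues, alpha):
    """Smallest k whose cumulative contribution rate exceeds alpha (Eq. 7)."""
    rates = contribution_rates(sorted(eigenvalues, reverse=True))
    cumulative = 0.0
    for k, rate in enumerate(rates, start=1):
        cumulative += rate
        if cumulative > alpha:
            return k
    return len(rates)
```

With hypothetical eigenvalues 3.0, 1.5, 1.0, and 0.5 and a threshold of 0.85, the cumulative rates are 0.50, 0.75, and about 0.92, so three components are retained.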

As the correlation coefficient matrix is calculated on the basis of the standardized data matrix, the original data matrix values cannot be input directly when calculating the principal components. Instead, the standardized data matrix should be input into the principal component operation equation. Each object in the data set can then be evaluated through the synthesized value of its principal components, as shown in Eq. (8).

$$
F = \sum_{i=1}^{k} f_i y_i \tag{8}
$$

PCA optimizes the integration of multi-variable plane data. When regression analysis, grouping, and other analysis methods are used, it can effectively reduce the number of variables in multivariate data.

SVM and SVR

SVM has been regarded as one of the most effective and accurate classifiers in recent years. Its theoretical framework is mainly based on the Structural Risk Minimization (SRM) principle of statistical learning theory. SVM searches for the separating hyper-plane with the maximum margin, and thereby separates the data into binary or multi-class categories. The regression technique developed on the basis of SVM is known as SVR, which is extensively applied in the prediction of time series. Due to its global optimal solution and its consideration of structural risk, it has recently attracted increasing attention [22, 26, 31, 32].

SVM is a machine learning classifier based on margin maximization. If $\{x_i\}$ is the training data and $\{y_i\}$ represents the classification categories, with $y_i \in \{+1, -1\}$, SVM can determine the optimal separating hyper-plane $f(x) = w^{T}x + b$ that distinguishes the two categories of data. The classification boundary (hyper-plane) is defined by the weight vector $w$ and the bias $b$, and the maximization of the margin is the purpose of the support vector machine.

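For the two categories, the hyper-plane must satisfy Eqs. (9) and (10):

$$
w^{T} x_{+} + b \geq +1, \quad y_i = +1 \tag{9}
$$

$$
w^{T} x_{-} + b \leq -1, \quad y_i = -1 \tag{10}
$$

Eqs. (9) and (10) can be combined into $y_i (w^{T} x_i + b) \geq 1$. As the distance between the closest points of the two categories is $M = 2 / \lVert w \rVert$, obtaining the hyper-plane of the maximum margin requires solving the optimization problem in $(w, b)$ subject to this constraint.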
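
The combined constraint and the margin width can be checked numerically with a small sketch. The sign convention $w^{T}x + b$ is assumed, and the function names and sample points are hypothetical.

```python
import math


def margin_width(w):
    """Margin M = 2 / ||w|| between the two supporting hyper-planes."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def satisfies_margin(w, b, samples, labels):
    """True if every sample satisfies y_i * (w . x_i + b) >= 1."""
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1.0
        for x, y in zip(samples, labels)
    )
```

A smaller norm of $w$ gives a wider margin, which is why the maximum-margin problem is usually stated as minimizing $\lVert w \rVert$ subject to the constraint.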

Vapnik (2000) [33] used the ε-insensitive loss function to propose the regression form of SVM, namely, SVR. When SVR is used to process regression for prediction, the purpose is to determine the optimal weight vector $w$ and bias $b$. For the historical observation data $G = \{x_i, d_i\}_{i=1}^{n}$, $x_i$ represents the input vectors, $d_i$ represents the expected prediction value, namely, the actual value of the historical observation data, and $n$ denotes the number of historical observations. The unknown function is approximated by the regression method to develop Eq. (11).

$$
y = f(x, w) = w\phi(x) + b \tag{11}
$$

Here $\phi(x)$ denotes a mapping into a high-dimensional feature space, which means that $f(x)$ is non-linear in the input space. In addition, the SVR approximation function has two characteristics: (1) in regression, it uses the ε-insensitive loss function to state the risk minimization problem; (2) risk minimization is based on the SVR theory that $\lVert w \rVert^{2}$ is bounded by a defined constant. Given these risk minimization characteristics, SVR can be represented by Eq. (12):

$$
R_{SVM}(C) = \frac{1}{2} \lVert w \rVert^{2} + C \left( \sum_{i=1}^{n} \lvert d_i - y_i \rvert_{\varepsilon} \right) \tag{12}
$$

Here $\frac{1}{2} \lVert w \rVert^{2}$ is known as the regularized term, $C \sum_{i=1}^{n} \lvert d_i - y_i \rvert_{\varepsilon}$ is the empirical error, and $C$ is the regularization constant, which determines the trade-off between the empirical error and the regularized term. Both $\varepsilon$ and $C$ are set manually by experiential rules; hence, there is no single value that is suitable for all the historical observation values to be predicted. When the gap between the prediction value of the regression equation $f(x)$ and the actual value is calculated, if the error is no larger than $\varepsilon$, then $\lvert d - y \rvert_{\varepsilon} = 0$; otherwise, the error is as represented by Eq. (13).

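$$
\lvert d - y \rvert_{\varepsilon} = \begin{cases} 0, & \lvert d - y \rvert \leq \varepsilon \\ \lvert d - y \rvert - \varepsilon, & \text{otherwise} \end{cases} \tag{13}
$$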
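
The ε-insensitive loss of Eq. (13) and the regularized risk of Eq. (12) can be written out as the following sketch. It reads the empirical term of Eq. (12) as a sum of ε-insensitive losses, which is an interpretation adopted here; the function names and the numeric values in the tests are illustrative.

```python
def eps_insensitive_loss(d, y, eps):
    """Eq. (13): zero inside the epsilon tube, linear outside it."""
    error = abs(d - y)
    return 0.0 if error <= eps else error - eps


def regularized_risk(w, targets, predictions, c, eps):
    """Eq. (12): 0.5 * ||w||^2 + C * (sum of epsilon-insensitive losses)."""
    penalty = 0.5 * sum(wi * wi for wi in w)
    empirical = sum(
        eps_insensitive_loss(d, y, eps) for d, y in zip(targets, predictions)
    )
    return penalty + c * empirical
```

A larger $C$ puts more weight on fitting the observations, while a larger ε tolerates bigger deviations without any penalty.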

The purpose of SVR is to minimize $R_{SVM}$. Slack variables $\xi$ and $\xi^{*}$ are introduced to form the SVR primal problem, and Lagrange multipliers are then introduced to obtain Eq. (14).

$$
f(x, \alpha_i, \alpha_i^{*}) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x, x_i) + b \tag{14}
$$

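By using the Lagrange multipliers $\alpha_i$ and $\alpha_i^{*}$, the SVR primal problem is converted into the SVR dual problem, as represented in Eq. (15):

$$
\begin{aligned}
\max \; R(\alpha, \alpha^{*}) = {} & -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} d_i (\alpha_i - \alpha_i^{*}) \\
\text{s.t.} \; & \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) = 0, \\
& 0 \leq \alpha_i \leq C, \; 0 \leq \alpha_i^{*} \leq C, \; i = 1, 2, \ldots, n
\end{aligned} \tag{15}
$$

According to the Karush-Kuhn-Tucker (KKT) conditions, solving the dual problem yields $\alpha_i$ and $\alpha_i^{*}$. The optimal weight is then written as $w_0$ and the optimal bias as $b_0$, as represented by Eqs. (16) and (17):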

$$
w_0 = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) \phi(x_i) \tag{16}
$$

$$
b_0 = \frac{1}{n} \sum_{i=1}^{n} \left[ y_i - w_0^{T} \phi(x_i) \right] \tag{17}
$$

In Eq. (14), $K(x_i, x_j)$ is the SVR kernel function, whose role is “spatial conversion”: it maps the input space into a feature space. The kernel functions commonly used in SVR are the same as those in SVM: polynomial, radial basis function, and multilayer perceptron kernels. In this study, the SVM kernel function is a Gaussian radial basis function. SVM is an effective algorithm that can find the global optimum of its objective function. It implements capacity control by maximizing the margin of the decision boundary, and establishes proxy variables in accordance with the attribute variables of each category; hence, it can be applied to data of various categorical types. With such classification qualities, SVM is one of the commonly used classification algorithms.
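
A Gaussian radial basis kernel and the prediction function of Eq. (14) can be sketched as follows. The parameterization $K(x, z) = \exp(-\gamma \lVert x - z \rVert^{2})$ and the sign convention $(\alpha_i - \alpha_i^{*})$ are assumptions made for this illustration, as are the function names; the multiplier values in the usage below are made up and are not fitted to any data.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian radial basis kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, alphas, alphas_star, b, gamma):
    """Eq. (14): f(x) = sum_i (alpha_i - alpha_i*) * K(x, x_i) + b."""
    return b + sum(
        (a - a_star) * rbf_kernel(x, sv, gamma)
        for sv, a, a_star in zip(support_vectors, alphas, alphas_star)
    )
```

In this form the three SVR parameters that a search procedure such as a GA would tune are plausibly $C$, $\varepsilon$, and the kernel width $\gamma$, although the text does not name them explicitly.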

Case study of the emergency department of a teaching hospital in Taiwan

Data acquisition

From the emergency department of a teaching hospital in Taiwan, this study obtained 109,360 samples of patient visits in 2010 (including 312 samples with missing values). For data processing, we used the “medical record number” as the first key and the “visit time” as the second key, together with the six parameters described under “parameter measurement”, for filtering. If a visit record has any blank column, it is regarded as having a missing value and the sample is deleted. In summary, triage Level 1 patients accounted for 10.79 %, Level 2 patients for 36.53 %, Level 3 patients for 52.16 %, and Level 4 patients for 0.52 %. In total, 561 samples had triage abnormalities (overestimates and underestimates).
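The filtering step described above can be sketched as follows. The field names `record_no` and `visit_time` and the dictionary layout are hypothetical, since the structure of the hospital database is not given; the sketch only shows dropping records with blank columns and ordering the rest by the two keys.

```python
def clean_visits(records, required_fields):
    """Drop records with any blank required field, then order by the two keys."""
    complete = [r for r in records
                if all(r.get(f) not in (None, "") for f in required_fields)]
    # First key: medical record number; second key: visit time (hypothetical names).
    return sorted(complete, key=lambda r: (r["record_no"], r["visit_time"]))

visits = [
    {"record_no": "B2", "visit_time": "2010-01-02 08:00", "temperature": 36.5},
    {"record_no": "A1", "visit_time": "2010-03-01 09:00", "temperature": ""},
    {"record_no": "A1", "visit_time": "2010-01-05 10:00", "temperature": 38.2},
]
cleaned = clean_visits(visits, ["record_no", "visit_time", "temperature"])
```

Here the second record is discarded because its temperature column is blank, and the two remaining records come back ordered by record number.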

To determine the implicit information and rules in the patient data, with the triage level as the target value, we randomly selected 3000 samples of consistent patient data from 2010 for subsequent analysis. The samples were drawn so that the prior probabilities of all levels are the same, by randomly selecting 500 samples per level. As the parameters are measured in different units, the 3000 samples were standardized so that the significance of the parameters could be evaluated, and the prior probability was set at 0.25 for all groups.
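A minimal sketch of equal-prior sampling and standardization is given below. It assumes z-score standardization (the text says only that the samples were standardized), and the function names, the `level_key` argument, and the seed are illustrative choices.

```python
import random
import statistics

def balanced_sample(records, level_key, n_per_level, seed=0):
    """Draw the same number of records from every triage level (equal priors)."""
    rng = random.Random(seed)
    by_level = {}
    for r in records:
        by_level.setdefault(r[level_key], []).append(r)
    sample = []
    for level in sorted(by_level):
        sample.extend(rng.sample(by_level[level], n_per_level))
    return sample

def standardize(column):
    """Z-score one parameter so that differently scaled units become comparable."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0] * len(column)   # constant column carries no information
    return [(v - mean) / std for v in column]
```

After standardization each parameter has mean 0 and unit variance, so parameters recorded in different units can be compared on one scale.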

Data summary

According to the literature, classification results are subject to the classification technology, parameter selection, and data acquisition. Therefore, in order to increase the credibility of the experiment and improve the classification accuracy rate, the 3000 samples were divided into training and testing groups for 10-fold cross-validation. The 10-fold cross-validation divides the 3000 samples in the proportion of 9:1 into the Training Data Set (2700) and the Test Data Set (300). The Training Data Set includes nine subsets with the same amount of data (namely, each of the nine subsets of the Training Data Set, like the single subset of the Test Data Set, contains 300 samples). In particular, each subset should contain 75 samples of each triage level. In other words, each of the 10 subsets of the 3000 samples should contain 75 samples of triage Level 1, 75 samples of Level 2, 75 samples of Level 3, and 75 samples of Level 4. After data assignment, the subsets were named: the nine subsets of the Training Data Set were named A, B, C, …, I, and the Test Data Set was named J (as shown in Table 1).

As Table 2 shows, cross-validation differs from a single experiment as follows. When only one experiment (Experiment 1) is conducted, the data variance of the Training Data Set (A~I) or the Test Data Set (J) may be neglected; therefore, researchers usually want to move Test Data Set J into the Training Data Set, or move a subset of the Training Data Set (A~I) into the Test Data Set, and repeat the experiment. The cross-validation rules derived from these concepts are as follows. Experiment 1 uses Data Sets A, B, C, …, I as the Training Data Set and Data Set J as the Test Data Set. In Experiment 2, Data Set J and Data Set A are switched, so that Data Sets J, B, C, …, I form the Training Data Set and Data Set A is the Test Data Set. In Experiment 3, Data Set A and Data Set B are switched, so that Data Sets J, A, C, D, …, I form the Training Data Set and Data Set B is the Test Data Set. By analogy, the 10 cross-validations were completed [19, 20, 27, 29, 30].

Table 1 Raw data grouping

Subset               Level 1   Level 2   Level 3   Level 4
Training Data Set
  A                  75        75        75        75
  B                  75        75        75        75
  C                  75        75        75        75
  ⋮                  ⋮         ⋮         ⋮         ⋮
  I                  75        75        75        75
Test Data Set
  J                  75        75        75        75
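The rotation of subsets described above can be generated programmatically. The sketch below reproduces the ordering of Table 2 from the subset names alone; the function name `rotation_schedule` is hypothetical.

```python
def rotation_schedule(subsets):
    """Return (training_subsets, test_subset) pairs in the order of Table 2."""
    *head, last = subsets
    schedule = [(head, last)]                 # Experiment 1: A..I train, J tests
    for held_out in head:                     # Experiments 2..10: A, B, ..., I test in turn
        train = [last] + [s for s in head if s != held_out]
        schedule.append((train, held_out))
    return schedule

schedule = rotation_schedule(list("ABCDEFGHIJ"))
for number, (train, test) in enumerate(schedule, start=1):
    print(f"Experiment {number}: train={' '.join(train)} test={test}")
```

Every subset serves as the Test Data Set exactly once, and each experiment trains on the remaining nine subsets.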

In data preprocessing, data integrity was checked. According to expert suggestions, inappropriate parameters were deleted to keep the data feasible and complete. After this preliminary preprocessing, the study is divided into two parts that use different data mining technologies. For anomaly detection (overestimate and underestimate), PCA is applied to the multivariate actual triage data to reduce the dimensions of the parameters: the eigenvalues and eigenvectors are obtained through a linear transformation, which reduces the number of parameters from 26 to 10. After this reduction, we plotted pairs of parameters on the X and Y axes to filter the noise data. The results of the PCA implementation are shown in Fig. 1; the outlier data shown, O(1)~O(10), were removed from the population of samples.
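The eigenvalue and eigenvector step of PCA can be illustrated with a small pure-Python sketch. It uses power iteration with deflation as a stand-in for whatever eigen-solver was actually used, which the text does not name; the function names and the iteration count are assumptions. With k = 10 on a 26-parameter covariance matrix it would return the ten leading components.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix; each row is one (standardized) sample."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]

def mat_vec(m, v):
    return [sum(m_ab * v_b for m_ab, v_b in zip(row, v)) for row in m]

def principal_components(cov, k, iters=200, seed=0):
    """Top-k (eigenvalue, unit eigenvector) pairs by power iteration with deflation."""
    rng = random.Random(seed)
    work = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in work]      # random start vector
        for _ in range(iters):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                         # no variance left to extract
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        pairs.append((eigenvalue, v))
        # Deflation: remove the component just found before searching for the next.
        for a in range(len(work)):
            for b in range(len(work)):
                work[a][b] -= eigenvalue * v[a] * v[b]
    return pairs

def project(centered_rows, pairs):
    """Scores of each centered sample on the retained components."""
    return [[sum(x * e for x, e in zip(r, vec)) for _, vec in pairs]
            for r in centered_rows]
```

Plotting two columns of the projected scores against each other gives the kind of scatter on which outlying samples can be identified.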

Table 2 Cross-validation of original data

Experiment       Training Data Set     Test Data Set
Experiment 1     A B C D E F G H I     J
Experiment 2     J B C D E F G H I     A
Experiment 3     J A C D E F G H I     B
Experiment 4     J A B D E F G H I     C
Experiment 5     J A B C E F G H I     D
Experiment 6     J A B C D F G H I     E
Experiment 7     J A B C D E G H I     F
Experiment 8     J A B C D E F H I     G
Experiment 9     J A B C D E F G I     H
Experiment 10    J A B C D E F G H     I

Fig. 1 Filtering of noise data

Integration of PCA and SVM

First, the original 3000 samples were divided into 2400 training data and 600 validation data in the proportion of 4:1. The training data are used to design the SVM model, and the validation data are used to validate model effectiveness. By using the validation data, it was found that 26 parameters may be the causes of prediction abnormality. By PCA, the original 26 possible parameters can be converted into 10 principal components. Table 3 shows the SVM classification accuracy rate and operational time. The average test accuracy was used as the basis for the model accuracy rate. Table 3 also shows the results of integrating SVM and PCA: the anomaly detection (overestimate and underestimate) prediction accuracy rate is 100 %, which is better than that of SVM alone (about 89.2 %).

According to the two parts of Table 3, integrating PCA improves the accuracy rate from 89.2 to 100 %, an improvement of about 10.8 percentage points, and reduces the operational time from 981 to 382.8 s. SVM prediction alone already has a high accuracy rate; combining it with PCA considerably improves both the overall accuracy rate and the implementation time of the model.
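The averages quoted above follow directly from the per-run SVM figures in Table 3, as the short check below shows. The variable names are illustrative, and the 382.8 s average for PCA integrated with SVM is taken from the text as reported.

```python
# Per-run figures for plain SVM, copied from Table 3.
svm_test_accuracy = [79, 82, 98, 99, 88]          # percent
svm_time_seconds = [982, 969, 1021, 897, 1036]    # seconds

mean_accuracy = sum(svm_test_accuracy) / len(svm_test_accuracy)   # 89.2
mean_time = sum(svm_time_seconds) / len(svm_time_seconds)         # 981.0

gain_points = 100.0 - mean_accuracy     # accuracy gain in percentage points
time_saving = 1.0 - 382.8 / mean_time   # fraction of run time saved

print(f"mean accuracy {mean_accuracy:.1f} %, mean time {mean_time:.0f} s")
print(f"gain {gain_points:.1f} points, time saved {time_saving:.0%}")
```

On these figures the integrated model needs roughly 61 % less run time than SVM alone.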

BPNN

In BPNN, the most appropriate number of neurons in the hidden layer should be found first. This study used the trial and error method to determine the optimal number of processing units, searching from half the number of neurons of the input layer and the output layer to twice the number of neurons of the input layer [20, 27], and then calculated the convergence value of the MSE (mean square error). Trying 16 to 24 neurons showed that the MSE value gradually converged. The original data were divided into training data and validation data at the same 4:1 ratio used for SVM, and the accuracy rate was calculated for each number of neurons, as shown in Table 4. The optimal number of neurons is 21: the anomaly detection (overestimate and underestimate) training data accuracy rate is 99.67 % and the validation data accuracy rate is 96.71 %, suggesting that the BPNN anomaly detection accuracy rate is poorer than that of the method integrating PCA and SVM.
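The trial and error selection amounts to picking the candidate with the best validation accuracy. A minimal sketch using the Table 4 values is shown below; choosing by validation accuracy alone is an assumption about the selection criterion, and the variable names are illustrative.

```python
# Candidate hidden-layer sizes and validation accuracies (percent), from Table 4.
neurons = [16, 17, 18, 19, 20, 21, 22, 23, 24]
validation_accuracy = [90.11, 93.21, 91.23, 95.34, 96.21, 96.71, 92.13, 93.29, 92.36]

# Keep the candidate whose validation accuracy is highest.
best_neurons, best_accuracy = max(zip(neurons, validation_accuracy),
                                  key=lambda pair: pair[1])
print(best_neurons, best_accuracy)
```

The selection returns 21 neurons with a validation accuracy of 96.71 %, in line with the result stated above.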

Comparison of the SVR and BPNN measurement predictions

To compare SVR and BPNN, this study used MAPE as the evaluation standard for the prediction error rates. For multiple predictions, we validated by scrolling (rolling) the test data. Namely, after the model trained on the training data predicts the first testing sample, the second testing sample is predicted by a model trained on N−1 samples of the training data (N is the total number of training samples) together with the first testing sample. The training set for the third testing sample is N−2 training samples together with the previous two testing samples. By analogy, all the testing data are predicted.
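The rolling scheme can be sketched as a fixed-size window that slides over the data. Two points are assumptions here: that the sample dropped at each step is the oldest one, and the placeholder `mean_target_model`, which merely stands in for a trained SVR or BPNN.

```python
def rolling_predictions(train, test, fit_predict):
    """Predict each test sample from a window of N samples, then roll the window."""
    window = list(train)
    predictions = []
    for features, actual in test:
        predictions.append(fit_predict(window, features))
        # Assumed rolling rule: drop the oldest sample, append the one just observed.
        window = window[1:] + [(features, actual)]
    return predictions

def mean_target_model(window, features):
    """Placeholder model: ignores the features and predicts the mean target."""
    return sum(target for _, target in window) / len(window)

train = [((), 1.0), ((), 2.0), ((), 3.0)]
test = [((), 4.0), ((), 5.0)]
print(rolling_predictions(train, test, mean_target_model))
```

The window size stays at N throughout, so every prediction is made by a model that has seen the same amount of data.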

In this study, SVR uses GA to search for the three SVR parameters (ε, C, σ). First, the data are divided by a ratio of 1:4:1 into training, validation, and testing data. GA adjusts the SVR parameters on the training data to obtain candidate parameter values, which are then input into the SVR model in order to select the optimal parameters. While the residual on the training data is being minimized, the validation data are predicted and compared with the actual validation data. The candidate parameter group with the minimum residual value is the approximate optimal parameter solution of the SVR model. Its actual predictability is then obtained by comparing the predicted values of the testing data with the actual testing data. Through the above method, the three parameters are: ε = 0.0031287, C = 1.36338, σ = 0.381673.
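A genetic search over (ε, C, σ) can be sketched as below. Everything specific in it is an assumption made for illustration: the search bounds, the operators (tournament selection, blend crossover, Gaussian mutation, elitism), the population settings, and above all `toy_validation_error`, which is a stand-in for training an SVR and measuring its residual on the validation data.

```python
import random

# Hypothetical search ranges for (eps, C, sigma).
BOUNDS = [(1e-4, 0.1), (0.1, 10.0), (0.01, 2.0)]

def toy_validation_error(params):
    """Stand-in fitness; a real run would train SVR and score the validation data."""
    target = (0.003, 1.4, 0.4)   # arbitrary optimum, for the demonstration only
    return sum(((p - t) / (hi - lo)) ** 2
               for p, t, (lo, hi) in zip(params, target, BOUNDS))

def clip(value, lo, hi):
    return min(hi, max(lo, value))

def genetic_search(fitness, bounds, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]                       # elitism: keep the best candidate
        while len(next_pop) < pop_size:
            p1 = min(rng.sample(pop, 3), key=fitness)   # tournament selection
            p2 = min(rng.sample(pop, 3), key=fitness)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]   # blend crossover
            child = [clip(g + rng.gauss(0, 0.05 * (hi - lo)), lo, hi)
                     if rng.random() < 0.2 else g                  # Gaussian mutation
                     for g, (lo, hi) in zip(child, bounds)]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=fitness)
    return best, history + [fitness(best)]

best_params, history = genetic_search(toy_validation_error, BOUNDS)
```

Because the best candidate is always carried into the next generation, the recorded best fitness never gets worse from one generation to the next.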

Table 3 Analysis of the prediction results by PCA integrated with SVM

SVM implementation results
Cross-validation run i                   1        2        3        4        5
Training data accuracy rate              98 %     99 %     97 %     96 %     95 %
Test data accuracy rate                  79 %     82 %     98 %     99 %     88 %
Average accuracy rate                    89.2 %
Implementation time (second)             982      969      1021     897      1036
Average implementation time (second)     981

PCA integrated with SVM
Cross-validation run i                   1        2        3        4        5
Training data accuracy rate              100 %    100 %    100 %    100 %    100 %
Test data accuracy rate                  100 %    100 %    100 %    100 %    100 %
Average accuracy rate                    100 %
Implementation time (second)             368      368      368      368      368
Average implementation time (second)     382.8

Table 4 Accuracy rate of BPNN by number of neurons

Number of neurons                16        17        18        19        20        21        22        23        24
Training data accuracy rate      99.19 %   99.36 %   98.19 %   99.23 %   98.29 %   99.67 %   98.98 %   99.11 %   98.29 %
Validation data accuracy rate    90.11 %   93.21 %   91.23 %   95.34 %   96.21 %   96.71 %   92.13 %   93.29 %   92.36 %

In the time sequence of measurement prediction, BPNN sets the number of hidden-layer neurons between 6 and 11. After repeated model implementation, the MAPE values of the training data for the various numbers of neurons were validated. When the number of neurons is 7, the MAPE value is at its minimum, namely, the error rate is lowest. The APE and MAPE values of BPNN are shown in Table 5; the MAPE of BPNN is 5.99 %. For SVR, after rolling 11 times to obtain the predicted values of 11 samples, the APE value of each rolling step was calculated, giving an SVR MAPE of 3.78 %.
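The MAPE is the mean of the per-step APE values, where APE = |actual − predicted| / |actual| × 100. The sketch below recomputes both MAPE figures from the APE columns of Table 5; the function names are illustrative.

```python
def ape(actual, predicted):
    """Absolute percentage error of one prediction, in percent."""
    return abs(actual - predicted) / abs(actual) * 100.0

def mape(ape_values):
    """Mean absolute percentage error: the average of the APE values."""
    return sum(ape_values) / len(ape_values)

# APE per rolling step (percent), copied from Table 5.
bpnn_ape = [9.36, 10.23, 3.12, 0.86, 1.29, 11.21, 8.91, 7.12, 3.66, 3.12, 7.09]
svr_ape = [6.23, 5.86, 4.31, 0.92, 3.23, 0.99, 6.14, 3.10, 5.13, 2.11, 3.61]

print(f"BPNN MAPE = {mape(bpnn_ape):.4f} %")
print(f"SVR  MAPE = {mape(svr_ape):.4f} %")
```

The SVR mean is about 3.7845 %, which rounds to the reported 3.78 %. The BPNN mean is about 5.9973 %, which agrees with the reported 5.99 % when truncated to two decimals.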

Table 5 MAPE of BPNN and SVR

BPNN                    SVR
Rolling   APE           Rolling   APE
1         9.36 %        1         6.23 %
2         10.23 %       2         5.86 %
3         3.12 %        3         4.31 %
4         0.86 %        4         0.92 %
5         1.29 %        5         3.23 %
6         11.21 %       6         0.99 %
7         8.91 %        7         6.14 %
8         7.12 %        8         3.1 %
9         3.66 %        9         5.13 %
10        3.12 %        10        2.11 %
11        7.09 %        11        3.61 %
MAPE      5.99 %        MAPE      3.78 %

Conclusions

By integrating PCA and SVM, the anomaly detection (overestimate and underestimate) prediction accuracy rate is 100 %, which is better than that of SVM (about 89.2 %) and BPNN (96.71 %), as shown in Tables 3 and 4. In this study, SVR uses GA to search for the three SVR parameters to predict triage. By using the rolling predictive values, we calculated the APE of each rolling step and found that the SVR MAPE was 3.78 % and the BPNN MAPE was 5.99 %, as shown in Table 5; hence, the proposed triage prediction model can effectively predict anomaly detection (overestimate and underestimate) and triage. The proposed model can provide a reference to relevant medical institutions [2, 3, 13, 19, 21, 32].

Conflict of Interest The author declares no conflict of interest.

References

1. Cheng, A. P., A discussion on the implications and measurement indicators of pre-hospital emergency medical care service quality for the fire department. Cent Police Univ Police Stud Ser 34(5):155–184, 2004.

2. Yeh, S. Y., Bullard, M. J., and Hu, P. M., An evaluation of the Taiwan triage scale in a regional hospital. J Emerg Crit Care 19(3):102–112, 2008.

3. Goransson, K. E., Ehrenberg, A., and Ehnfors, M., Triage in emergency departments: National survey. J Clin Nurs 14:1067–1074, 2005.

4. Semenza, J. C., Are electronic emergency department data predictive of heat-related mortality? J Med Syst 23(5):419–421, 1999.

5. Kolker, A., Process modeling of emergency department patient flow: Effect of patient length of stay on ED diversion. J Med Syst 32(5):389–401, 2008.

6. Department of Health, Executive Yuan, ROC (2010) Adult Triage Scale, Department of Health, Executive Yuan website.

7. Campo, T., McNulty, R., Sabatini, M., and Fitzpatrick, J., Nurse practitioners performing procedures with confidence and independence in the emergency care setting. Adv Emerg Nurs J 30(2):153–170, 2008.

8. Carter, A. J., and Chochino, A. H., A systematic review of the impact of nurse practitioners on cost, quality of care, satisfaction and wait times in the emergency department. Can J Emerg Med 9(4):286–295, 2007.

9. Considine, J., Martin, R., Smit, D., Jenkins, J., and Winter, C., Defining the scope of practice of the emergency nurse practitioner role in a metropolitan emergency department. Int J Nurs Pract 12:205–213, 2006.

10. Curren, J., Nurse practitioners and physician assistants: Do you know the difference? Medsurge Nurs 16(6):404–407, 2007.

11. McGee, L. A., and Kaplan, L., Factors influencing the decision to use nurse practitioners in the emergency department. J Emerg Nurs 33(5):441–446, 2007.

12. Lee, C. H., Kuan, J. T., Chiu, T. F., Szu, L. Y., Chen, L. C., Chen, J. C., and Ng, C. J., Coverage and appropriateness of the Taiwan adult triage complaint list. J Taiw Emerg Med 9:65–71, 2007.

13. Li, X., Ye, N., Xu, X., and Sawhey, R., Influencing factors of job waiting time variance on a single machine. Eur J Ind Eng 1(1):56–73, 2007.

14. Steiner, I. P., Nichols, D. N., Blitz, S., Tapper, L., Stagg, A. P., Sharma, L., and Policicchio, C., Impact of a nurse practitioner on patient care in a Canadian emergency department. Can J Emerg Med 11(3):207–214, 2009.

15. Song, W. T., Chih, M., and Bair, A. E., Improving the efficiency of physical examination services. J Med Syst 34(4):579–590, 2010.

16. Su, C. T., Wang, P. C., Chen, Y. C., and Chen, L. F., Data mining techniques for assisting the diagnosis of pressure ulcer development in surgical patients. J Med Syst 36(4):2387–2399, 2012.

17. Wei, C. K., Su, S., and Yang, M. C., Application of data mining on the development of a disease distribution map of screened community residents of Taipei County in Taiwan. J Med Syst 36(3):2021–2027, 2012.

18. Cheng, C. H., Chou, C. J., Wang, P. C., Lin, H. Y., Kao, C. L., and Su, C. T., Applying HFMEA to prevent chemotherapy errors. J Med Syst 36(3):1543–1551, 2012.

19. Chen, T., and Wang, Y. C., An evolving hybrid neural approach for predicting job completion time in a semiconductor fabrication plant. Eur J Ind Eng 4(3):336–354, 2010.

20. Ghose, D. K., Panda, S. S., and Swain, P. C., Prediction of water table depth in western region, Orissa using BPNN and RBFN neural networks. J Hydro 394:296–304, 2010.

21. Orrù, G., Using support vector machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review. Neurosci Biobehav R 36(4):1140–1152, 2012.

22. Chang, C. Y., Chen, S. J., and Tsai, M. F., Application of support-vector-machine-based method for feature selection and classification of thyroid nodules in ultrasound images. Pattern Recogn 43:3494–3506, 2010.

23. Huang, M. L., and Chen, H. Y., Development and comparison of automated classifiers for glaucoma diagnosis using stratus optical coherence tomography. Invest Ophth Visual 46(11):4121–4129, 2005.

24. Sharda, R., and Delen, D., Predicting box-office success of motion pictures with neural networks. Expert Syst Appl 30(2):243–254, 2006.

25. Abdolmaleki, P., Buadu, L. D., and Naderimansh, H., Feature extraction and classification of breast cancer on dynamic magnetic resonance imaging using artificial neural network. Cancer Lett 171(2):183–191, 2001.

26. Jiang, H., and He, W., Grey relational grade in local support vector regression for financial time series prediction. Expert Syst Appl 39(3):2256–2262, 2011.

27. Hu, H. Y., Lee, Y. C., Yen, T. M., and Tsai, C. H., Using BPNN and DEMATEL to modify importance–performance analysis model–a study of the computer industry. Expert Syst Appl 36(6):9969–9979, 2009.

28. Li, G., Lau, J. T., McCarthy, M. L., Schull, M. J., and Kelen, G. D., Emergency department utilization in the United States and Ontario, Canada. Acad Emerg Med 14(6):582–584, 2007.

29. Yang, L. N., Peng, L., Zhang, L. M., Zang, L. L., and Yang, S. S., A prediction model for population occurrence of paddy stem borer (Scirpophaga incertulas), based on back propagation artificial neural network and principal components analysis. Comput Electron Agr 68(2):200–206, 2009.

30. Lloyd, C. D., Analyzing population characteristics using geographically weighted principal components analysis: a case study of Northern Ireland in 2001. Environ Urban Syst 34(5):389–399, 2010.

31. Wu, T. K., Huang, S. C., and Meng, Y. R., Evaluation of ANN and SVM classifiers as predictors to the diagnosis of students with learning disabilities. Expert Syst Appl 34(3):1846–1856, 2008.

32. Lu, C. J., Lee, T. S., and Chiu, C. C., Financial time series forecasting using independent component analysis and support vector regression. Decis Support Syst 47(2):115–125, 2009.

33. Vapnik, V., The nature of statistical learning theory. Springer, New York, 2000.
