Research Article

A Robust Probability Classifier Based on the Modified χ²-Distance

Yongzhi Wang,1 Yuli Zhang,2 Jining Yi,3 Honggang Qu,3,4 and Jinli Miu3,4

1 College of Instrumentation & Electrical Engineering, Jilin University, Changchun 130061, China
2 Department of Automation, TNList, Tsinghua University, Beijing 100084, China
3 Development and Research Center of China Geological Survey, Beijing 100037, China
4 Key Laboratory of Geological Information Technology, Ministry of Land and Resources, Beijing 100037, China

Correspondence should be addressed to Yongzhi Wang; iamwangyongzhi@126.com

Received 9 January 2014; Revised 5 April 2014; Accepted 7 April 2014; Published 30 April 2014

Academic Editor: Hua-Peng Chen

Copyright © 2014 Yongzhi Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We propose a robust probability classifier model to address classification problems with data uncertainty. A class-conditional probability distributional set is constructed based on the modified χ²-distance. Based on a "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. An optimal robust minimax classifier is defined as the one with the minimal worst-case absolute error loss function value over all possible distributions belonging to the constructed distributional set. Based on the conic duality theorem, we show that the resulting optimization problem can be reformulated as a second order cone programming problem, which can be efficiently solved by interior point algorithms. The robustness of the proposed model allows it to avoid the "overlearning" phenomenon on training sets and thus to keep a comparable accuracy on test sets. Numerical experiments validate the effectiveness of the proposed model and further show that it also provides promising results on multiple classification problems.

1. Introduction

Statistical classification has been extensively studied in the fields of machine learning and statistics. A typical classification problem is to design a linear or nonlinear classifier based on a known training set such that a new observation can be assigned to one of the known classes. Many classification models have been proposed, such as naive Bayes classifiers (NBC) [1, 2], artificial neural networks [3], and support vector machines (SVM) [4].

In real-world classification problems, it is often the case that the data of the training set are imprecise due to unavoidable observational noise in the process of data collection or data approximation from incomplete samples. One way to handle the data uncertainty is to design a robust classifier in the sense that it has the minimal worst-case misclassification probability for the training sets. The idea of robustness has been widely applied in many traditional machine learning and statistics techniques, such as robust Bayes classifiers [5], robust support vector machines [6], and robust quadratic regressions [7]. Robust classifiers are highly related to the recently flourishing research on robust optimization. For recent developments on robust optimization, we refer the readers to the excellent book [8] and the reviews [9, 10].

Recently, [11, 12] proposed a robust minimax approach, called the minimax probability machine, to design a binary classifier. Unlike traditional methods, they make no assumption on the class-conditional distributions; only the mean and covariance matrix of each class are assumed to be known. Under this assumption, the designed classifier is determined by minimizing the worst-case probability of misclassification over all possible choices of class-conditional distributions with the given mean and covariance matrix. By reformulating the classifier design problem as second order cone programming, they show that the computational complexity of the proposed approach is similar to that of SVM. Because of its computational advantage and competitive performance with other current methods, this approach has been further extended to incorporate other features.



El Ghaoui et al. [13] propose a robust classification model by minimizing the worst-case value of a given loss function over all possible choices of the data in bounded hyperrectangles. Three loss functions, from SVM, logistic regression, and minimax probability machines, are studied in [13]. Based on the same assumption of known mean and covariance matrix, [14, 15] propose the biased minimax probability machine to address the biased classification problem and further generalize it to obtain the minimum error minimax probability machine. Hoi and Lyu [16] study a quadratic classifier with positive definite covariance matrices and further consider the problem of finding a convex set to cover known sampled data in one class while minimizing the worst-case misclassification probability. Minimax probability machines have also been extended to solve multiple classification problems; see [17, 18].

In this paper, we propose a robust probability classifier (RPC) based on the modified χ²-distance. Specifically, for a given training set, we first estimate the probability of each sample belonging to each class based on each feature, which is called a nominal class-conditional distribution. Then an ϵ-confidence probability distributional set $P_\epsilon$ is constructed based on the nominal class-conditional distributions and the modified χ²-distance, where the parameter ϵ controls the size of the constructed set. Unlike the "conditional independence assumption" in NBC, we introduce a "linear combination assumption" for the posterior class-conditional probabilities; the proposed classifier takes a linear combination of these probabilities based on different features and assigns the sample to the class with the maximal posterior probability. To get a robust classifier, we minimize the worst-case loss function value over all possible choices of class-conditional distributions in the distributional set $P_\epsilon$. The underlying assumption is that, due to observational noise, we cannot obtain the true probability distribution of each class, but it can be well estimated by the nominal distribution such that it belongs to the distributional set $P_\epsilon$.

Our two major contributions are as follows. First, in our model the proposed distributional set $P_\epsilon$ is based on the nominal distribution and the modified χ²-distance. As pointed out in [19], such a distributional set can make use of more information conveyed in the training set compared with traditional robust approaches, which only use the information of mean and covariance matrix. To the best of our knowledge, this is among the first studies of classification models considering complex distribution information. Although [20] considers an ϵ-contaminated robust support vector machine model, its distributional set is defined by easily handled linear constraints and its analysis depends heavily on a characterization of the extreme points of this set; here our proposed distributional set is defined by a nonlinear quadratic function and is analyzed via the conic duality theorem. Second, by taking the absolute error function as the loss function, we show how to transform our robust minimax optimization problem into a computable second order cone program. The absolute error function in the objective also distinguishes our model from other existing models, such as the soft-margin support vector machine, which uses the Hinge loss function [21, 22], and the logistic regression, which uses the negative log likelihood function [23]. Note that the absolute error function is essential in our model to obtain a tractable optimization problem. Numerical experiments on a real-world application validate the effectiveness of the proposed classifier and further show that it also performs well on multiple classification problems.

The paper proceeds as follows. Section 2 introduces the proposed robust minimax probability classifier based on the modified χ²-distance and discusses how to construct the desired distributional set $P_\epsilon$. Section 3 provides an equivalent reformulation by handling the robust constraints and the robust objective separately. Numerical experiments on real-world data sets are carried out to validate the effectiveness of the proposed classifier in Section 4. Section 5 concludes this paper and gives future research directions.

2. Classifier Models

In this section, a simple probability classifier is first presented, and then it is extended to handle data uncertainty by introducing a distributional set $P_\epsilon$. We also discuss how to construct this distributional set based on the training data set.

Consider a multiclass, multifeature classification problem in which each sample contains |L| features and there are |J| classes and |I| samples. Specifically, given a training set $(X, Y) \in \mathbb{R}^{|I|\times|L|} \times \{0,1\}^{|I|\times|J|}$, $x_{il}$ denotes the $l$th feature of the $i$th sample, and $y_{ij} = 1$ if the $i$th sample belongs to the $j$th class; otherwise $y_{ij} = 0$. In the following context, we also use $x_i$ to denote the $i$th sample, that is, $x_i = (x_{i1}, \dots, x_{i|L|})$.

2.1. Probability Classifier

Bayes classifiers assign an observation $x$ to the $j^*(x)$th class, the class with the maximal posterior probability, that is,

$$j^*(x) = \arg\max_{j\in J} P(j \mid x), \qquad (1)$$

where $P(j \mid x)$ is the posterior probability function, that is, the conditional probability that the sample belongs to the $j$th class given that we know it has feature vector $x$.

Using Bayes' theorem, we have

$$P(j \mid x) = \frac{P(j)\, P(x \mid j)}{P(x)} \propto P(j)\, P(x \mid j), \qquad (2)$$

where $P(j)$ is the prior probability of the $j$th class, $P(x \mid j)$ is the class-conditional probability for the $j$th class, and $P(x)$ is the probability that a sample has feature vector $x$. Note that $P(x)$ is a constant once the values of the feature variables are known and thus can be omitted. To design an effective Bayes classifier, the key issue is estimating the class-conditional probability $P(x \mid j)$ or the joint probability $P(x, j)$. Theoretically, using the chain rule, we have

$$P(x, j) = P(j)\, P(x_1 \mid j)\, P(x_2 \mid j, x_1) \cdots P(x_{|L|} \mid j, x_1, \dots, x_{|L|-1}). \qquad (3)$$


However, such an estimation method leads to the problem of the "dimension disaster" (the curse of dimensionality).

To address this issue, the naive Bayes classifier makes the following "conditional independence assumption":

$$P(x \mid j) = \prod_{l=1}^{|L|} p^l_j(x), \qquad (4)$$

where $p^l_j(x) = P(x_l \mid j)$ is the class-conditional probability that the observation $x$ belongs to the $j$th class based on the $l$th feature. Here we introduce another assumption, the "linear combination assumption", for the class-conditional probability:

$$P(x \mid j) = \sum_{l=1}^{|L|} \beta^l_j\, p^l_j(x), \qquad (5)$$

where $\beta^l_j$ is a coefficient. Compared with the "conditional independence assumption", which combines the probabilistic information by multiplication, the proposed "linear combination assumption" combines the probabilistic information by a weighted sum. We further discuss the rationality of this assumption at the end of this subsection.

Under this assumption, we have

$$P(j \mid x) \propto P(j)\, P(x \mid j) = P(j) \sum_{l=1}^{|L|} \beta^l_j\, p^l_j(x) = \sum_{l=1}^{|L|} \alpha^l_j\, p^l_j(x), \qquad (6)$$

where $\alpha^l_j = P(j)\beta^l_j$ denotes the probability weight of the $l$th feature for the $j$th class.

To obtain the optimal probability classifier based on the "linear combination assumption", it is natural to consider the following optimization problem:

$$\min_{\alpha\in\Theta} \sum_{j\in J}\sum_{i\in I} L\left( P(j \mid x_i),\, y_{ij} \right), \qquad (7)$$

where $L(\cdot,\cdot): \mathbb{R}\times\mathbb{R}\to\mathbb{R}_+$ is a prespecified loss function. In the following context, we take the absolute error function as our loss function, that is, $L(x,y) = |x - y|$. Writing $f(j \mid x_i) = \sum_{l\in L} \alpha^l_j p^l_{ij}$ for the estimated posterior probability, with $p^l_{ij} = p^l_j(x_i)$, it is straightforward, in view of its probability interpretation, to impose the following constraints on the posterior probability:

$$0 \le f(j \mid x_i) \le 1, \quad \forall i\in I,\ j\in J. \qquad (8)$$

Under such constraints, we have that

$$\sum_{j\in J}\sum_{i\in I} L\left( f(j \mid x_i),\, y_{ij} \right) = \sum_{j\in J}\sum_{i\in I} \left| f(j \mid x_i) - y_{ij} \right|$$
$$= \sum_{j\in J}\sum_{i\in I} \left[ y_{ij}\left( 1 - f(j \mid x_i) \right) + \left( 1 - y_{ij} \right) f(j \mid x_i) \right]$$
$$= \sum_{j\in J}\sum_{i\in I} (1 - 2y_{ij})\, f(j \mid x_i) + |I|, \qquad (9)$$

where $|I| = \sum_{j\in J}\sum_{i\in I} y_{ij}$, since each sample belongs to exactly one class.

Thus, the optimal probability classifier (PC) problem can be formulated as follows:

$$\text{(PC)} \quad \min\ \sum_{j\in J}\sum_{i\in I} (1 - 2y_{ij}) \sum_{l\in L} \alpha^l_j\, p^l_{ij} + |I|$$
$$\text{s.t.} \quad 0 \le \sum_{l\in L} \alpha^l_j\, p^l_{ij} \le 1, \quad \forall i\in I,\ j\in J. \qquad (10)$$
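Problem (10) is a linear program in the weights α. As a concrete illustration, the following Python sketch solves it with the cvxpy modeling package; the function name and data layout are illustrative assumptions, not part of the paper.

```python
import cvxpy as cp
import numpy as np

def solve_pc(p, y):
    """Sketch of the linear program (PC) in (10).

    p : (|I|, |J|, |L|) array of nominal probabilities p_ij^l
        (e.g., from the histogram procedure of Section 2.3).
    y : (|I|, |J|) 0/1 label matrix.
    """
    n_i, n_j, n_l = p.shape
    alpha = cp.Variable((n_j, n_l))                      # weights alpha_j^l
    # f[i, j] = sum_l alpha_j^l * p_ij^l, the estimated posterior
    f = cp.vstack([p[:, j, :] @ alpha[j, :] for j in range(n_j)]).T
    # objective (10); the constant |I| assumes one class per sample, cf. (9)
    objective = cp.sum(cp.multiply(1.0 - 2.0 * y, f)) + n_i
    problem = cp.Problem(cp.Minimize(objective), [f >= 0, f <= 1])
    problem.solve()
    return alpha.value
```

A new observation $x$ is then assigned to the class $\arg\max_{j} \sum_{l} \alpha^l_j\, p^l_j(x)$, in line with (1) and (6).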

There is no doubt that the "linear combination assumption" may sometimes fail. However, we justify the proposed classifier by the following facts.

(1) As an intuitive interpretation, note that $p^l_j(x)$ estimates the probability of the observation $x$ belonging to the $j$th class based only on the $l$th feature; thus it provides partial probabilistic information about the sample. Hence we can interpret the weight $\alpha^l_j$ as a certain degree of trust in this information, and in this sense the "linear combination assumption" is a way of combining evidence from different sources. Similar ideas can also be found in the theory of evidence; see the Dempster-Shafer theory [24, 25].

(2) In terms of classification performance, in the worst case the proposed classifier may put all weight on one feature; in that case it is equivalent to a Bayes classifier based on a well-selected feature. If each class has a "typical" feature which distinguishes it from the other classes, the proposed classifier has the ability to learn this property by putting different weights on different features for different classes and thus provides better classification performance. A real-life application to lithology classification problems also validates its classification performance by comparison with support vector machines and the naive Bayes classifier.

(3) Another advantage of the proposed classifier is its high computability. As we show in Section 3, the proposed classifier and its robust counterpart can be reformulated as second order cone programming problems and thus can be solved by interior point algorithms in polynomial time.

2.2. Robust Probability Classifier

Due to observational noise, the true class-conditional probability distribution is often difficult to obtain. Instead, we can construct a confidence distributional set which contains the true distribution. Unlike the traditional distributional sets in minimax probability machines, which only utilize the mean and covariance matrix, we construct our class-conditional probability distributional set based on the modified χ²-distance, which uses more information from the samples.


The modified χ²-distance $d(\cdot,\cdot): \mathbb{R}^m\times\mathbb{R}^m\to\mathbb{R}$ is used in statistics to measure the distance between two discrete probability distribution vectors. For given $p = (p_1,\dots,p_m)^T$ and $q = (q_1,\dots,q_m)^T$, it is defined as

$$d(q, p) = \sum_{j=1}^{m} \frac{(q_j - p_j)^2}{p_j}. \qquad (11)$$

Based on the modified χ²-distance, we present the following class-conditional probability distributional set:

$$P_\epsilon = \left\{ \{q^l_{ij}\} : \sum_{j\in J} q^l_{ij} = 1,\ q^l_{ij} \ge 0,\ \sum_{j\in J} \frac{(q^l_{ij} - p^l_{ij})^2}{p^l_{ij}} \le \epsilon,\ \forall i\in I,\ l\in L,\ j\in J \right\}, \qquad (12)$$

where $p^l_{ij}$ is the nominal class-conditional probability that the $i$th sample belongs to the $j$th class based on the $l$th feature, and the prespecified parameter ϵ is used to control the size of the set.

To design a robust classifier, we need to consider the effect of data uncertainty on both the objective function and the constraints. The robust objective function minimizes the worst-case loss function value over all the possible distributions in the distributional set $P_\epsilon$; the robust constraints ensure that all the original constraints are satisfied for every distribution in $P_\epsilon$. Thus the robust probability classifier problem takes the following form:

$$\text{(RPC)} \quad \min\ \max\left\{ \sum_{j\in J}\sum_{i\in I} (1 - 2y_{ij}) \sum_{l\in L} \alpha^l_j\, q^l_{ij} + |I| : \{q^l_{ij}\} \in P_\epsilon \right\}$$
$$\text{s.t.} \quad 0 \le \sum_{l\in L} \alpha^l_j\, q^l_{ij} \le 1, \quad \forall \{q^l_{ij}\} \in P_\epsilon,\ \forall i\in I,\ j\in J. \qquad (13)$$

Note that the above optimization problem has an infinite number of robust constraints, and its objective function contains an embedded subproblem. We show how to solve such a minimax optimization problem in Section 3.

2.3. Construct the Distributional Set

To get the distributional set $P_\epsilon$, we need to define the parameter ϵ and the nominal probabilities $p^l_{ij}$. The selection of the parameter ϵ is application based, and we discuss this issue in the numerical experiment section; next we provide a procedure to calculate $p^l_{ij}$.

For the $l$th feature, the following procedure takes an integer $K_l$, indicating the number of data intervals, as input and outputs the estimated probability $p^l_{ij}$ of the $i$th sample belonging to the $j$th class.

(1) Sort the samples in increasing order of the $l$th feature and divide them into $K_l$ intervals such that each interval contains at least $\lfloor |I|/K_l \rfloor$ samples. Denote the $k$th interval by $\Delta_{lk}$.

(2) Calculate the total number of samples in the $j$th class, $N_j$; the total number of samples in the $k$th interval, $N_{lk}$; and the total number of samples of the $j$th class in the $k$th interval, $N_{lkj}$.

(3) For the $i$th sample, if it falls into the $k$th interval, the class-conditional probability $p^l_{ij}$ is calculated by

$$p^l_{ij} = \text{Prob}(i\in j \mid x_{il}\in\Delta_{lk}) = \frac{\text{Prob}(i\in j,\ x_{il}\in\Delta_{lk})}{\text{Prob}(x_{il}\in\Delta_{lk})}$$
$$= \frac{\text{Prob}(i\in j)\,\text{Prob}(x_{il}\in\Delta_{lk} \mid i\in j)}{\sum_{j'\in J} \text{Prob}(i\in j')\,\text{Prob}(x_{il}\in\Delta_{lk} \mid i\in j')}$$
$$= \frac{(N_j/|I|)\cdot(N_{lkj}/N_j)}{\sum_{j'\in J} (N_{j'}/|I|)\cdot(N_{lkj'}/N_{j'})} = \frac{N_{lkj}}{N_{lk}}. \qquad (14)$$
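A minimal numpy sketch of this three-step procedure (names are illustrative; `labels[i]` is assumed to be the class index of sample $i$):

```python
import numpy as np

def nominal_probabilities(x_l, labels, K_l):
    """Histogram estimate of p_ij^l for one feature l, following (14).

    x_l    : (|I|,) values of the l-th feature.
    labels : (|I|,) class index of each sample.
    K_l    : number of data intervals.
    """
    n = len(x_l)
    classes = np.unique(labels)
    # Step (1): sort and cut into K_l intervals of at least floor(n/K_l) samples.
    interval = np.empty(n, dtype=int)
    interval[np.argsort(x_l)] = np.minimum(np.arange(n) // (n // K_l), K_l - 1)
    # Steps (2)-(3): within interval k, p_ij^l = N_lkj / N_lk.
    p = np.empty((n, len(classes)))
    for k in range(K_l):
        in_k = interval == k
        for j, c in enumerate(classes):
            p[in_k, j] = (labels[in_k] == c).mean()
    return p    # p[i, j] = nominal probability of sample i in class j
```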

Note that from the definition of $P_\epsilon$, we can easily compute the upper bound $\overline{q}^l_{ij}$ and the lower bound $\underline{q}^l_{ij}$ for the true class-conditional probability $q^l_{ij}$ as follows:

$$\overline{q}^l_{ij} = \max\left\{ q^l_{ij} : \sum_{s\in J} q^l_{is} = 1,\ \sum_{s\in J} \frac{(q^l_{is} - p^l_{is})^2}{p^l_{is}} \le \epsilon,\ q^l_{is} \ge 0,\ \forall s\in J \right\}, \qquad (15)$$

$$\underline{q}^l_{ij} = \min\left\{ q^l_{ij} : \sum_{s\in J} q^l_{is} = 1,\ \sum_{s\in J} \frac{(q^l_{is} - p^l_{is})^2}{p^l_{is}} \le \epsilon,\ q^l_{is} \ge 0,\ \forall s\in J \right\}. \qquad (16)$$

The above problems can be efficiently solved by a second order cone solver such as SeDuMi [26] or SDPT3 [27].
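For instance, under illustrative names, the bounds (15) and (16) can be computed with a few lines of cvxpy; any SOCP-capable solver can be plugged into `solve`:

```python
import cvxpy as cp
import numpy as np

def probability_bounds(p_il, eps):
    """Upper/lower bounds (15)-(16) for one sample i and feature l.

    p_il : (|J|,) nominal probabilities p_ij^l over classes j.
    eps  : radius of the modified chi^2 ball.
    """
    m = len(p_il)
    q = cp.Variable(m, nonneg=True)
    ball = [cp.sum(q) == 1,
            cp.sum(cp.multiply(cp.square(q - p_il), 1.0 / p_il)) <= eps]
    q_ub = np.array([cp.Problem(cp.Maximize(q[j]), ball).solve() for j in range(m)])
    q_lb = np.array([cp.Problem(cp.Minimize(q[j]), ball).solve() for j in range(m)])
    return q_lb, q_ub
```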

3. Solution Methods for RPC

In this section, we first reduce the infinite number of robust constraints to a finite set of linear constraints and then transform the inner robust objective function into a minimization problem by the conic duality theorem. Finally, we obtain an equivalent computable second order cone program for the RPC problem. The following analysis is based on the strong duality result in [8].


Consider a conic program of the following form:

$$\text{(CP)} \quad \min\ c^T x$$
$$\text{s.t.} \quad A_i x - b_i \in C_i, \quad \forall i = 1,\dots,m,$$
$$Ax = b, \qquad (17)$$

and its dual problem

$$\text{(DP)} \quad \max\ b^T z + \sum_{i=1}^{m} b_i^T y_i$$
$$\text{s.t.} \quad A^* z + \sum_{i=1}^{m} A_i^* y_i = c,$$
$$y_i \in C_i^*, \quad \forall i = 1,\dots,m, \qquad (18)$$

where $C_i$ is a cone in $\mathbb{R}^{n_i}$ and $C_i^*$ is its dual cone, defined by

$$C_i^* = \left\{ y\in\mathbb{R}^{n_i} : y^T x \ge 0,\ \forall x\in C_i \right\}. \qquad (19)$$

A conic program is called strictly feasible if it admits a feasible solution $x$ such that $A_i x - b_i \in \text{int}\, C_i, \forall i = 1,\dots,m$, where $\text{int}\, C_i$ denotes the set of interior points of $C_i$.

Lemma 1 (see [8]). If one of the problems (CP) and (DP) is strictly feasible and bounded, then the other problem is solvable, and (CP) = (DP) in the sense that both have the same optimal objective function value.

3.1. Robust Constraints

The following lemma provides an equivalent characterization of the infinite number of robust constraints in terms of a finite set of linear constraints, which can be handled efficiently.

Lemma 2. For given $i, j$, the robust constraint

$$0 \le \sum_{l\in L} \alpha^l_j\, q^l_{ij} \le 1, \quad \forall \{q^l_{ij}\} \in P_\epsilon \qquad (20)$$

is equivalent to the following constraints:

$$\sum_{l\in L} \left( \underline{q}^l_{ij}\, u^{l0}_{ij} - \overline{q}^l_{ij}\, v^{l0}_{ij} \right) \ge 0,$$
$$\alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0, \quad u^{l0}_{ij}, v^{l0}_{ij} \ge 0, \quad \forall l\in L,$$
$$1 + \sum_{l\in L} \left( \underline{q}^l_{ij}\, u^{l1}_{ij} - \overline{q}^l_{ij}\, v^{l1}_{ij} \right) \ge 0,$$
$$v^{l1}_{ij} - \alpha^l_j - u^{l1}_{ij} \ge 0, \quad u^{l1}_{ij}, v^{l1}_{ij} \ge 0, \quad \forall l\in L, \qquad (21)$$

where $\overline{q}^l_{ij}$ and $\underline{q}^l_{ij}$ are the bounds (15) and (16).

Proof. First note that the distributional set $P_\epsilon$ can be represented as the Cartesian product of a series of projected subsets:

$$P_\epsilon = \prod_{i\in I} P_{\epsilon i}, \qquad (22)$$

where the projected subset on index $i$ is defined by

$$P_{\epsilon i} = \left\{ \{q^l_{ij}\} : \sum_{j\in J} q^l_{ij} = 1,\ q^l_{ij} \ge 0,\ \sum_{j\in J} \frac{(q^l_{ij} - p^l_{ij})^2}{p^l_{ij}} \le \epsilon,\ \forall l\in L,\ j\in J \right\}. \qquad (23)$$

Then, for given $i, j$, since the robust constraint is only associated with the variables $q^l_{ij}, l\in L$, we can further split the projected subset $P_{\epsilon i}$ into |J| subsets:

$$P_{\epsilon i} = \prod_{j\in J} P_{\epsilon ij} = \prod_{j\in J} \left\{ \{q^l_{ij}\} : \underline{q}^l_{ij} \le q^l_{ij} \le \overline{q}^l_{ij},\ \forall l\in L \right\}, \qquad (24)$$

where $\overline{q}^l_{ij}$ and $\underline{q}^l_{ij}$ are computed by (15) and (16), respectively.

The constraint $\sum_{l\in L} \alpha^l_j q^l_{ij} \ge 0$ for all $\{q^l_{ij}\} \in P_\epsilon$ is equivalent to the following chain of constraints:

$$\sum_{l\in L} \alpha^l_j\, q^l_{ij} \ge 0, \quad \forall \{q^l_{ij}\} \in P_{\epsilon i}$$
$$\iff \sum_{l\in L} \alpha^l_j\, q^l_{ij} \ge 0, \quad \forall \{q^l_{ij}\} \in P_{\epsilon ij}$$
$$\iff \min\left\{ \sum_{l\in L} \alpha^l_j\, q^l_{ij} : \underline{q}^l_{ij} \le q^l_{ij} \le \overline{q}^l_{ij},\ \forall l\in L \right\} \ge 0$$
$$\iff \max\left\{ \sum_{l\in L} \left( \underline{q}^l_{ij}\, u^{l0}_{ij} - \overline{q}^l_{ij}\, v^{l0}_{ij} \right) : \alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0,\ u^{l0}_{ij}, v^{l0}_{ij} \ge 0,\ \forall l\in L \right\} \ge 0$$
$$\iff \sum_{l\in L} \left( \underline{q}^l_{ij}\, u^{l0}_{ij} - \overline{q}^l_{ij}\, v^{l0}_{ij} \right) \ge 0, \quad \alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0,\ u^{l0}_{ij}, v^{l0}_{ij} \ge 0,\ \forall l\in L, \qquad (25)$$

where the last equivalence comes from the strong duality between the two linear programs.

For the constraint $\sum_{l\in L} \alpha^l_j q^l_{ij} \le 1, \forall \{q^l_{ij}\} \in P_\epsilon$, the same technique applies; thus we complete the proof.
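The last equivalence above is ordinary linear programming duality over a box; the following toy numpy check (a sketch, with random box bounds standing in for (15) and (16)) illustrates it:

```python
import numpy as np

# Toy check of the last equivalence in (25): for a box lb <= q <= ub,
# the dual choice u = max(alpha, 0), v = max(-alpha, 0) satisfies
# alpha - u + v = 0 and attains min_{q in box} alpha^T q.
rng = np.random.default_rng(0)
alpha = rng.normal(size=5)
lb = rng.uniform(0.0, 0.3, size=5)          # stands in for q_lb from (16)
ub = lb + rng.uniform(0.0, 0.5, size=5)     # stands in for q_ub from (15)

box_min = np.where(alpha >= 0, alpha * lb, alpha * ub).sum()
u, v = np.maximum(alpha, 0.0), np.maximum(-alpha, 0.0)
dual_value = lb @ u - ub @ v                # dual objective in (25)
assert np.isclose(box_min, dual_value)
```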

3.2. Robust Objective Function

In the RPC problem, the robust objective function is defined by an inner maximization problem. The following proposition shows that it can be transformed into a minimization problem over second order cones. To prove this result, we utilize the conjugate function $d^*$ of the modified χ²-distance:

$$d^*(s) = \sup_{t\ge 0}\left\{ st - d(t) \right\} = \frac{[s+2]_+^2}{4} - 1, \qquad (26)$$


where $d(t) = (t-1)^2$ is the one-dimensional distance term corresponding to (11) and the function $[\cdot]_+$ is defined as $[x]_+ = x$ if $x \ge 0$ and $[x]_+ = 0$ otherwise. For more details about conjugate functions, see [28].

Proposition 3. The following inner maximization problem

$$\max\left\{ \sum_{j\in J}\sum_{i\in I} (1 - 2y_{ij}) \sum_{l\in L} \alpha^l_j\, q^l_{ij} + |I| : \{q^l_{ij}\} \in P_\epsilon \right\} \qquad (27)$$

is equivalent to the second order cone programming problem

$$\min\ \sum_{i\in I}\sum_{l\in L} (\epsilon\lambda^l_i - \theta^l_i) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p^l_{ij}\, w^l_{ij} + |I|$$
$$\text{s.t.} \quad \left( w^l_{ij},\, z^l_{ij},\, 2\lambda^l_i + w^l_{ij} \right) \in L^3, \quad \forall i\in I,\ j\in J,\ l\in L,$$
$$r^l_{ij} = \alpha^l_j (1 - 2y_{ij}) + \theta^l_i, \quad \forall i\in I,\ l\in L,\ j\in J,$$
$$z^l_{ij} \ge r^l_{ij} + 2\lambda^l_i, \quad \lambda^l_i, z^l_{ij} \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L, \qquad (28)$$

where the second order cone $L^{n+1}$ is defined as

$$L^{n+1} = \left\{ x\in\mathbb{R}^{n+1} : x_{n+1} \ge \sqrt{\sum_{i=1}^{n} x_i^2} \right\}. \qquad (29)$$

Proof. For a given feasible $\alpha$ satisfying the robust constraints, it is straightforward to show that the inner maximization problem is equal to the following minimization problem (MP):

$$\text{(MP)} \quad \min\ t$$
$$\text{s.t.} \quad t \ge \sum_{j\in J}\sum_{i\in I} (1 - 2y_{ij}) \sum_{l\in L} \alpha^l_j\, q^l_{ij} + |I|, \quad \forall \{q^l_{ij}\} \in P_\epsilon. \qquad (30)$$

The above constraint can be further reduced to the following constraint:

$$\max\left\{ \sum_{j\in J}\sum_{i\in I} (1 - 2y_{ij}) \sum_{l\in L} \alpha^l_j\, q^l_{ij} : \{q^l_{ij}\} \in P_\epsilon \right\} + |I| - t \le 0. \qquad (31)$$

By assigning Lagrange multipliers $\theta^l_i\in\mathbb{R}$ and $\lambda^l_i\in\mathbb{R}_+$ to the constraints in the left-hand optimization problem, we obtain the following Lagrange function:

$$L(q,\theta,\lambda) = \sum_{i\in I}\sum_{l\in L} (\epsilon\lambda^l_i - \theta^l_i) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} \left( r^l_{ij}\, q^l_{ij} - \lambda^l_i\, \frac{(q^l_{ij} - p^l_{ij})^2}{p^l_{ij}} \right) + |I| - t, \qquad (32)$$

where $r^l_{ij} = \alpha^l_j(1 - 2y_{ij}) + \theta^l_i$. Its dual function is given by

$$D(\theta,\lambda) = \max_{q\ge 0} L(q,\theta,\lambda)$$
$$= \sum_{i\in I}\sum_{l\in L} (\epsilon\lambda^l_i - \theta^l_i) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} \max_{q^l_{ij}\ge 0} \left( r^l_{ij}\, q^l_{ij} - \lambda^l_i\, p^l_{ij} \left( \frac{q^l_{ij} - p^l_{ij}}{p^l_{ij}} \right)^2 \right) + |I| - t$$
$$= \sum_{i\in I}\sum_{l\in L} (\epsilon\lambda^l_i - \theta^l_i) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p^l_{ij} \max_{\tau\ge 0} \left( r^l_{ij}\tau - \lambda^l_i (\tau - 1)^2 \right) + |I| - t$$
$$= \sum_{i\in I}\sum_{l\in L} (\epsilon\lambda^l_i - \theta^l_i) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p^l_{ij}\, \lambda^l_i \max_{\tau\ge 0} \left( \frac{r^l_{ij}}{\lambda^l_i}\tau - (\tau - 1)^2 \right) + |I| - t$$
$$= \sum_{i\in I}\sum_{l\in L} (\epsilon\lambda^l_i - \theta^l_i) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p^l_{ij}\, \lambda^l_i\, d^*\!\left( \frac{r^l_{ij}}{\lambda^l_i} \right) + |I| - t, \qquad (33)$$

where the second equality substitutes $\tau = q^l_{ij}/p^l_{ij}$.

Note that for any feasible $\alpha$, the primal maximization problem (31) is bounded and has a strictly feasible solution $\{p^l_{ij}\}$; thus there is no duality gap between (31) and the following dual problem:

$$\min\left\{ D(\theta,\lambda) : \theta^l_i\in\mathbb{R},\ \lambda^l_i\in\mathbb{R}_+,\ \forall i\in I,\ l\in L \right\}$$
$$\iff \min\ \sum_{i\in I}\sum_{l\in L} (\epsilon\lambda^l_i - \theta^l_i) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p^l_{ij}\, w^l_{ij} + |I| - t$$
$$\text{s.t.} \quad w^l_{ij} \ge \lambda^l_i\, d^*\!\left( \frac{r^l_{ij}}{\lambda^l_i} \right), \quad \forall i\in I,\ l\in L,\ j\in J,$$
$$\theta^l_i\in\mathbb{R},\ \lambda^l_i\in\mathbb{R}_+, \quad \forall i\in I,\ l\in L. \qquad (34)$$

Next we show that the constraint involving the conjugate function can be represented by second order cone constraints:

$$\lambda^l_i\, d^*\!\left( \frac{r^l_{ij}}{\lambda^l_i} \right) \le w^l_{ij} \iff \lambda^l_i \left( -1 + \frac{1}{4}\left[ \frac{r^l_{ij}}{\lambda^l_i} + 2 \right]_+^2 \right) \le w^l_{ij}$$
$$\iff 4\lambda^l_i \left( \lambda^l_i + w^l_{ij} \right) \ge \left[ r^l_{ij} + 2\lambda^l_i \right]_+^2$$
$$\iff 4\lambda^l_i \left( \lambda^l_i + w^l_{ij} \right) \ge (z^l_{ij})^2, \quad z^l_{ij} \ge 0,\ z^l_{ij} \ge r^l_{ij} + 2\lambda^l_i$$
$$\iff \left( w^l_{ij},\, z^l_{ij},\, 2\lambda^l_i + w^l_{ij} \right) \in L^3, \quad z^l_{ij} \ge 0,\ z^l_{ij} \ge r^l_{ij} + 2\lambda^l_i. \qquad (35)$$

By reinjecting the above constraints into (MP), the robust objective function is equivalent to the following problem:

$$\min\ t$$
$$\text{s.t.} \quad \sum_{i\in I}\sum_{l\in L} (\epsilon\lambda^l_i - \theta^l_i) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p^l_{ij}\, w^l_{ij} + |I| \le t,$$
$$\left( w^l_{ij},\, z^l_{ij},\, 2\lambda^l_i + w^l_{ij} \right) \in L^3, \quad z^l_{ij} \ge r^l_{ij} + 2\lambda^l_i,\ z^l_{ij}, \lambda^l_i \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L,$$
$$r^l_{ij} = \alpha^l_j (1 - 2y_{ij}) + \theta^l_i, \quad \forall i\in I,\ l\in L,\ j\in J. \qquad (36)$$

By eliminating the variable $t$, we complete the proof.

Based on Lemma 2 and Proposition 3, we obtain our main result.

Proposition 4. The RPC problem can be solved as the following second order cone programming problem:

$$\min\ \sum_{i\in I}\sum_{l\in L} (\epsilon\lambda^l_i - \theta^l_i) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p^l_{ij}\, w^l_{ij} + |I|$$
$$\text{s.t.} \quad \left( w^l_{ij},\, z^l_{ij},\, 2\lambda^l_i + w^l_{ij} \right) \in L^3, \quad \forall i\in I,\ j\in J,\ l\in L,$$
$$r^l_{ij} = \alpha^l_j (1 - 2y_{ij}) + \theta^l_i, \quad \forall i\in I,\ j\in J,\ l\in L,$$
$$z^l_{ij} \ge r^l_{ij} + 2\lambda^l_i, \quad \forall i\in I,\ j\in J,\ l\in L,$$
$$\sum_{l\in L} \left( \underline{q}^l_{ij}\, u^{l0}_{ij} - \overline{q}^l_{ij}\, v^{l0}_{ij} \right) \ge 0, \quad \forall i\in I,\ j\in J,$$
$$1 + \sum_{l\in L} \left( \underline{q}^l_{ij}\, u^{l1}_{ij} - \overline{q}^l_{ij}\, v^{l1}_{ij} \right) \ge 0, \quad \forall i\in I,\ j\in J,$$
$$\alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L,$$
$$v^{l1}_{ij} - \alpha^l_j - u^{l1}_{ij} \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L,$$
$$\lambda^l_i,\ z^l_{ij},\ u^{l1}_{ij},\ v^{l1}_{ij},\ u^{l0}_{ij},\ v^{l0}_{ij} \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L,$$
$$r^l_{ij},\ \theta^l_i,\ w^l_{ij},\ \alpha^l_j \in \mathbb{R}, \quad \forall i\in I,\ j\in J,\ l\in L. \qquad (37)$$
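To make Proposition 4 concrete, the following cvxpy sketch assembles the SOCP (37). It is a hypothetical transcription under assumed data layouts (the lists `p`, `qlb`, `qub` come from the earlier procedures), not the authors' Matlab/SDPT3 implementation:

```python
import cvxpy as cp
import numpy as np

def solve_rpc(p, y, qlb, qub, eps):
    """Sketch of the SOCP (37); all names are illustrative.

    p, qlb, qub : length-|L| lists of (|I|, |J|) arrays holding the
                  nominal probabilities p_ij^l and the bounds (16)/(15).
    y           : (|I|, |J|) 0/1 label matrix; eps : set radius.
    """
    n_l, (n_i, n_j) = len(p), y.shape
    sgn = 1.0 - 2.0 * y
    ones_i, ones_j = np.ones((n_i, 1)), np.ones((1, n_j))

    alpha = cp.Variable((n_j, n_l))
    theta = cp.Variable((n_i, n_l))
    lam = cp.Variable((n_i, n_l), nonneg=True)
    free = lambda: [cp.Variable((n_i, n_j)) for _ in range(n_l)]
    pos = lambda: [cp.Variable((n_i, n_j), nonneg=True) for _ in range(n_l)]
    w, z = free(), pos()
    u0, v0, u1, v1 = pos(), pos(), pos(), pos()

    cons, obj, lo0, lo1 = [], n_i + cp.sum(eps * lam - theta), 0, 0
    for l in range(n_l):
        A = ones_i @ cp.reshape(alpha[:, l], (1, n_j))   # alpha_j^l spread over (i, j)
        T = cp.reshape(theta[:, l], (n_i, 1)) @ ones_j   # theta_i^l spread over (i, j)
        Lam = cp.reshape(lam[:, l], (n_i, 1)) @ ones_j   # lambda_i^l spread over (i, j)
        r = cp.multiply(sgn, A) + T                      # r_ij^l = alpha_j^l(1-2y_ij) + theta_i^l
        cons += [z[l] >= r + 2 * Lam]
        # elementwise cone (w, z, 2*lam + w) in L^3, i.e. 4*lam*(lam + w) >= z^2
        cons += [cp.norm(cp.vstack([cp.vec(w[l]), cp.vec(z[l])]), 2, axis=0)
                 <= cp.vec(2 * Lam + w[l])]
        # dual feasibility of the robust constraints (Lemma 2)
        cons += [A - u0[l] + v0[l] >= 0, v1[l] - A - u1[l] >= 0]
        lo0 = lo0 + cp.multiply(qlb[l], u0[l]) - cp.multiply(qub[l], v0[l])
        lo1 = lo1 + cp.multiply(qlb[l], u1[l]) - cp.multiply(qub[l], v1[l])
        obj = obj + cp.sum(cp.multiply(p[l], w[l]))
    cons += [lo0 >= 0, 1 + lo1 >= 0]
    cp.Problem(cp.Minimize(obj), cons).solve()
    return alpha.value
```

The recovered weights define the classifier: a new sample $x$ is assigned to $\arg\max_{j} \sum_{l} \alpha^l_j\, p^l_j(x)$, as in the nominal model.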

4. Numerical Experiments on Real-World Applications

In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.

All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second order cone programs in our proposed method and in the regularized SVM.

4.1. Data Sets

Lithology classification is one of the basic tasks of geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as Gamma coefficients, acoustic wave, striation density, and fusibility.

Here numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, 12 data sets are used for binary classification problems and 8 data sets for multiple classification problems. Each data set is randomly partitioned, based on a prespecified training rate γ ∈ [0, 1], into a training set and a test set, such that the training set accounts for a fraction γ of the total number of samples.

4.2. Experiment Design

The parameters in our models are chosen based on the size of the data set. The parameter ϵ depends on the number of classes and is defined as $\epsilon = \delta^2/|J|$, where $\delta\in(0,1)$. The choice of ϵ can be explained in this way: if there are |J| classes and the training data are uniformly distributed, then for each probability $p^l_{ij} = 1/|J|$, its maximal variation range is between $p^l_{ij}(1-\delta)$ and $p^l_{ij}(1+\delta)$. The number of data intervals $K_l$ is defined as $K_l = |I|/(|J|\times K)$, such that if the training data are uniformly distributed, then each data interval contains $K$ samples of each class. In the following context, we set $\delta = 0.2$ and $K = 8$.
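As a quick illustration of these choices (the data set size here is an assumed example, not one of the boreholes):

```python
# delta = 0.2, K = 8 as in the paper; |I| = 400 samples, |J| = 2 classes assumed.
delta, K = 0.2, 8
n_I, n_J = 400, 2
eps = delta ** 2 / n_J     # eps = 0.02: each p = 1/|J| may vary within p*(1 +/- delta)
K_l = n_I // (n_J * K)     # K_l = 25 intervals, i.e. about K samples per class per interval
```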

We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (take the $j$th class for example):

$$\text{(RSVM)} \quad \min\ \sum_{i\in I} \xi_{ij} + \lambda_j \left\| w_j \right\|$$
$$\text{s.t.} \quad \tilde{y}_{ij} \left( \sum_{l\in L} w^l_j\, x_{il} + b_j \right) \ge 1 - \xi_{ij}, \quad i\in I,$$
$$\xi_{ij} \ge 0, \quad i\in I, \qquad (38)$$

where $\tilde{y}_{ij} = 2y_{ij} - 1$ and $\lambda_j \ge 0$ is a regularization parameter. As pointed out by [8], $\lambda_j$ represents a trade-off between the number of training set errors and the amount of robustness with respect to spherical perturbations of the data points.


Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set.

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 90.7 | 88.2 | 63.9 | 66.2 | 88.4 | 90.5*
55 | 89.9 | 88.6 | 69.1 | 72.8 | 89.5 | 89.9*
60 | 89.0 | 85.0 | 70.3 | 72.1 | 91.3 | 86.4*
65 | 86.3 | 85.9 | 72.1 | 72.8 | 88.0 | 92.5*
70 | 92.3 | 84.1 | 70.3 | 75.7 | 90.8 | 86.3*
75 | 88.8 | 87.9 | 74.2 | 74.6 | 88.7 | 91.6*
80 | 88.7 | 93.8* | 90.0 | 87.5 | 88.3 | 93.3
85 | 89.5 | 89.3 | 93.4 | 89.6 | 89.2 | 91.0*
90 | 89.5 | 88.4 | 93.3 | 95.8* | 89.2 | 92.6

Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set.

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 91.4 | 84.8 | 76.5 | 68.9 | 91.3 | 87.5*
55 | 92.5 | 86.6 | 68.0 | 77.0 | 92.0 | 90.3*
60 | 89.8 | 86.1 | 72.9 | 73.8 | 88.9 | 90.9*
65 | 91.0 | 82.3 | 80.5 | 81.6 | 89.8 | 92.9*
70 | 86.8 | 95.5* | 83.4 | 89.8 | 88.4 | 93.7
75 | 89.4 | 85.2 | 85.9 | 79.5 | 89.7 | 93.5*
80 | 91.8 | 80.8 | 88.1 | 79.9 | 89.7 | 91.1*
85 | 88.3 | 89.9 | 89.9 | 92.8 | 90.8 | 97.1*
90 | 88.5 | 90.2 | 88.8 | 94.2 | 90.9 | 97.2*

To make a fair comparison, in the following experiments we test a series of λ values and choose the one with the best performance. Note that if $\lambda_j = 0$, we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.

4.3. Test on Binary Classification

In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for the binary classification problems using cross-validation methods. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].

Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set based on the parameter tr, which varies from 0.5 to 0.9. The highest classification accuracy on a training set among the three methods is highlighted in bold, while the best classification accuracy on a test set is marked with an asterisk.

Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, in most cases RSVM has the highest classification accuracy on training sets, but its performance on test sets is unsatisfactory. In most cases, the proposed RPC provides the highest classification accuracy on test sets. NBC provides better performance on test sets as the training rate increases. The experimental results also show that for a given training rate, RPC can provide better performance on test sets than on training sets; thus it can avoid the "overlearning" phenomenon.

To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, namely, T41-T45 and T61-T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As shown in Tables 1 and 2, the robustness of the proposed RPC guarantees its scalability on the test sets.

4.4. Test on Multiple Classification

In this subsection, we test the performance of RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter λ, we run a set of RSVMs with λ varying from 0 to a large enough number and select the one with the best performance on test sets.

Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets for binary classification problems by finding the optimal separating hyperplane, while RPC is more robust when extended to solve multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.


Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on the other data sets when tr = 70%.

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
T41 | 62.0 | 59.7 | 82.4 | 78.5 | 77.9 | 83.5*
T42 | 87.0 | 82.2 | 84.1 | 83.1 | 80.5 | 85.3*
T43 | 68.0 | 61.2 | 80.2 | 75.4 | 85.5 | 86.9*
T44 | 91.3 | 83.9 | 77.9 | 86.8 | 88.8 | 90.5*
T45 | 86.5 | 87.0 | 93.2 | 91.0* | 84.0 | 89.1
T61 | 80.6 | 79.0 | 80.5 | 83.0 | 83.6 | 87.8*
T62 | 71.4 | 66.5 | 86.9 | 85.4* | 86.3 | 85.4*
T63 | 63.7 | 69.5 | 89.6 | 89.1* | 82.2 | 84.4
T64 | 88.2 | 86.7 | 97.0 | 96.9* | 93.4 | 95.5
T65 | 75.0 | 63.4 | 79.7 | 81.5 | 90.5 | 92.9*

Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set.

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
M1 | 65.4 | 68.2 | 72.7 | 73.7 | 79.1 | 77.4*
M2 | 76.9 | 75.3 | 82.6 | 74.8 | 81.7 | 80.9*
M3 | 57.9 | 69.9 | 74.8 | 87.4 | 95.4 | 92.0*
M4 | 70.4 | 64.1 | 97.1 | 92.3 | 95.4 | 92.3*
M5 | 77.4 | 71.3 | 89.4 | 88.1* | 92.0 | 88.0
M6 | 75.7 | 70.5 | 74.1 | 79.4 | 86.4 | 80.8*

[Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set. Accuracy on the training set (%) against the training rate (0.6-0.9).]

Figures 2 and 4 show the performances of the methods on the Y5 and T1 test sets, respectively.

[Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set. Accuracy on the test set (%) against the training rate (0.6-0.9).]

We can see that in most cases RPC provides the highest accuracy among the three methods. The accuracy of RSVM outperforms that of NBC on the Y5 test set, while the latter outperforms the former on the T1 test set.

To further test the performance of RPC on multiple classification problems, we carry out more experiments on data sets M1-M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%.


[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set. Accuracy on the training set (%) against the training rate (0.6-0.9).]

[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set. Accuracy on the test set (%) against the training rate (0.6-0.9).]

Except for the M5 data set, RPC always provides the highest classification performance among the three methods, and even for the M5 data set, its accuracy (88.0%) is very close to the best one (88.1%).

From the tested real-life application, we conclude that the proposed RPC has the robustness to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.

5. Conclusion

In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified χ²-distance. We assume that the true distribution lies in the constructed distributional set, centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all the possible distributions belonging to the distributional set.

Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone program based on the conic duality theorem. Thus our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also has promising performance for multiple classification problems.

There are still many important extensions of our model. Other forms of loss function, such as the mean squared error function and the Hinge loss function, should be studied to obtain tractable reformulations, and the resulting models may provide better performance. Probability models considering joint probability distribution information are also an interesting research direction.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223-228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209-226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization - methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453-480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464-501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801-807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555-582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558-563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253-1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175-1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191-201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123-136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691-710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Scholkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85-90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625-653, 1999.
[27] K. C. Toh, R. H. Tutuncu, and M. J. Todd, "On the implementation and usage of SDPT3 - a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341-357, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article A Robust Probability Classifier Based on the … · 2020. 1. 13. · Research Article A Robust Probability Classifier Based on the Modified 2-Distance YongzhiWang,

2 Mathematical Problems in Engineering

approach has been further extended to incorporate other features. El Ghaoui et al. [13] propose a robust classification model by minimizing the worst-case value of a given loss function over all possible choices of the data in bounded hyperrectangles; three loss functions, from SVM, logistic regression, and minimax probability machines, are studied in [13]. Based on the same assumption of known mean and covariance matrix, [14, 15] propose the biased minimax probability machines to address the biased classification problem and further generalize them to obtain the minimum error minimax probability machines. Hoi and Lyu [16] study a quadratic classifier with positive definite covariance matrices and further consider the problem of finding a convex set to cover the known sampled data in one class while minimizing the worst-case misclassification probability. The minimax probability machines have also been extended to solve multiple classification problems; see [17, 18].

In this paper we propose a robust probability classifier (RPC) based on the modified χ²-distance. Specifically, for a given training set, we first estimate the probability of each sample belonging to each class based on a feature, which is called a nominal class-conditional distribution. Then an ε-confidence probability distributional set P_ε is constructed based on the nominal class-conditional distributions and the modified χ²-distance, where the parameter ε controls the size of the constructed set. Unlike the "conditional independence assumption" in NBC, we introduce a "linear combination assumption" for the posterior class-conditional probabilities: the proposed classifier takes a linear combination of these probabilities based on the different features and assigns the sample to the class with the maximal posterior probability. To get a robust classifier, we minimize the worst-case loss function value over all possible choices of class-conditional distributions in the distributional set P_ε. The underlying assumption is that, due to observational noises, we cannot obtain the true probability distribution of each class, but it can be well estimated by the nominal distribution such that it belongs to the distributional set P_ε.

Our two major contributions are as follows. First, in our model the proposed distributional set P_ε is based on the nominal distribution and the modified χ²-distance. As pointed out in [19], such a distributional set can make use of more of the information conveyed in the training set compared with traditional robust approaches, which only use the mean and covariance matrix. To the best of our knowledge, this is among the first studies of classification models considering such complex distribution information. Although [20] considers an ε-contaminated robust support vector machine model, its distributional set is defined by easily handled linear constraints, and its analysis depends heavily on a characterization of the extreme points of that set; our proposed distributional set is instead defined by a nonlinear quadratic function and is analyzed via the conic duality theorem. Second, by taking the absolute error function as the loss function, we show how to transform our robust minimax optimization problem into a computable second order cone program. The absolute error function in the objective also distinguishes our model from other existing models, such as the soft-margin support vector machine, which uses the Hinge loss function [21, 22], and the logistic regression, which uses the negative log likelihood function [23]. Note that the absolute error function is essential in our model for obtaining a tractable optimization problem. Numerical experiments on a real-world application validate the effectiveness of the proposed classifier and further show that it also performs well on multiple classification problems.

The paper proceeds as follows. Section 2 introduces the proposed robust minimax probability classifier based on the modified χ²-distance and discusses how to construct the desired distributional set P_ε. Section 3 provides an equivalent reformulation by handling the robust constraints and the robust objective separately. Numerical experiments on real-world data sets are carried out in Section 4 to validate the effectiveness of the proposed classifier. Section 5 concludes the paper and gives future research directions.

2. Classifier Models

In this section a simple probability classifier is first presented; it is then extended to handle data uncertainty by introducing a distributional set P_ε. We also discuss how to construct this distributional set from the training data.

Consider a multiclass, multifeature classification problem in which each sample has |L| features and there are |J| classes and |I| samples. Specifically, we are given a training set (X, Y) ∈ R^{|I|×|L|} × {0,1}^{|I|×|J|}, where x_{il} denotes the lth feature of the ith sample, and y_{ij} = 1 if the ith sample belongs to the jth class and y_{ij} = 0 otherwise. In the following we also use x^i to denote the ith sample, that is, x^i = (x_{i1}, …, x_{i|L|}).

2.1. Probability Classifier. Bayes classifiers assign an observation x to the j*(x)th class, the one with the maximal posterior probability, that is,

j*(x) = arg max_{j∈J} P(j | x),   (1)

where P(j | x) is the posterior probability function, that is, the conditional probability that the sample belongs to the jth class given that it has feature vector x.

Using Bayes' theorem, we have

P(j | x) = P(j) P(x | j) / P(x) ∝ P(j) P(x | j),   (2)

where P(j) is the prior probability of the jth class, P(x | j) is the class-conditional probability for the jth class, and P(x) is the probability that a sample has feature vector x. Note that P(x) is a constant once the values of the feature variables are known and can thus be omitted. To design an effective Bayes classifier, the key issue is estimating the class-conditional probability P(x | j) or the joint probability P(x, j). Theoretically, using the chain rule, we have

P(x, j) = P(j) P(x_1 | j) P(x_2 | j, x_1) ⋯ P(x_{|L|} | j, x_1, …, x_{|L|−1}).   (3)

However, such an estimation method suffers from the "curse of dimensionality".

To address this issue, the naive Bayes classifier makes the following "conditional independence assumption":

P(x | j) = ∏_{l=1}^{|L|} p^l_j(x),   (4)

where p^l_j(x) = P(x_l | j) is the class-conditional probability that the observation x belongs to the jth class based on the lth feature. Here we introduce instead a "linear combination assumption" for the class-conditional probability:

P(x | j) = Σ_{l=1}^{|L|} β^l_j p^l_j(x),   (5)

where β^l_j is a coefficient. Compared with the "conditional independence assumption", which combines the probabilistic information multiplicatively, the proposed "linear combination assumption" combines it as a weighted sum. We further discuss the rationality of this assumption at the end of this subsection.

Under this assumption we have

P(j | x) ∝ P(j) P(x | j) = P(j) Σ_{l=1}^{|L|} β^l_j p^l_j(x) = Σ_{l=1}^{|L|} α^l_j p^l_j(x),   (6)

where α^l_j = P(j) β^l_j denotes the probability weight of the lth feature for the jth class.

To obtain the optimal probability classifier based on the "linear combination assumption", it is natural to consider the following optimization problem:

min_{α∈Θ} Σ_{j∈J} Σ_{i∈I} L( P(j | x^i), y_{ij} ),   (7)

where L(·,·): R × R → R₊ is a prespecified loss function. In the following we take the absolute error function as our loss function, that is, L(x, y) = |x − y|. Writing f(j | x^i) = Σ_{l∈L} α^l_j p^l_{ij} for the estimated posterior probability, where p^l_{ij} := p^l_j(x^i), it is natural in view of its probability interpretation to impose the following constraints:

0 ≤ f(j | x^i) ≤ 1, ∀i ∈ I, j ∈ J.   (8)

Under such constraints we have

Σ_{j∈J} Σ_{i∈I} L( f(j | x^i), y_{ij} )
= Σ_{j∈J} Σ_{i∈I} | f(j | x^i) − y_{ij} |
= Σ_{j∈J} Σ_{i∈I} [ y_{ij} (1 − f(j | x^i)) + (1 − y_{ij}) f(j | x^i) ]
= Σ_{j∈J} Σ_{i∈I} (1 − 2 y_{ij}) f(j | x^i) + |I|,   (9)

where |I| = Σ_{j∈J} Σ_{i∈I} y_{ij}.

Thus the optimal probability classifier (PC) problem can be formulated as follows:

(PC)  min  Σ_{j∈J} Σ_{i∈I} (1 − 2 y_{ij}) Σ_{l∈L} α^l_j p^l_{ij} + |I|
      s.t. 0 ≤ Σ_{l∈L} α^l_j p^l_{ij} ≤ 1, ∀i ∈ I, j ∈ J.   (10)
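Since both the objective and the constraints of (PC) are linear in the weights α^l_j, (PC) is a linear program that decomposes across the classes j ∈ J. The following sketch is an illustrative implementation of ours (not code from the paper); it assumes the nominal probabilities p^l_{ij} have already been computed and solves each per-class subproblem with scipy:

```python
import numpy as np
from scipy.optimize import linprog

def solve_pc(P, Y):
    """Solve the (PC) linear program (10).

    P : array of shape (I, J, L), nominal probabilities p_ij^l.
    Y : array of shape (I, J), 0/1 class indicators y_ij.
    Returns alpha of shape (J, L); the constant |I| in (10) is dropped,
    since it does not affect the minimizer.
    """
    I, J, L = P.shape
    alpha = np.zeros((J, L))
    for j in range(J):
        # Objective coefficients: sum_i (1 - 2 y_ij) p_ij^l for each l.
        c = ((1.0 - 2.0 * Y[:, j])[:, None] * P[:, j, :]).sum(axis=0)
        # Constraints 0 <= sum_l alpha_j^l p_ij^l <= 1 for every sample i.
        A_ub = np.vstack([P[:, j, :], -P[:, j, :]])
        b_ub = np.concatenate([np.ones(I), np.zeros(I)])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * L)
        alpha[j] = res.x
    return alpha
```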

Admittedly, the "linear combination assumption" may not always hold. However, we justify the proposed classifier by the following facts.

(1) As an intuitive interpretation, note that p^l_j(x) estimates the probability of the observation x belonging to the jth class based only on the lth feature; thus it provides partial probabilistic information about the sample. Hence we can interpret the weight α^l_j as a degree of trust in that information, and in this sense the "linear combination assumption" is a way of combining evidence from different sources. Similar ideas can be found in the theory of evidence; see the Dempster-Shafer theory [24, 25].

(2) In terms of classification performance, in the worst case the proposed classifier may put all its weight on one feature, in which case it is equivalent to a Bayes classifier based on a well-selected feature. If each class has a "typical" feature that distinguishes it from the other classes, the proposed classifier can learn this property by putting different weights on different features for different classes and thus provide better classification performance. A real-life application to lithology classification problems also validates its classification performance in comparison with support vector machines and the naive Bayes classifier.

(3) Another advantage of the proposed classifier is its computational tractability. As we show in Section 3, the proposed classifier and its robust counterpart can be reformulated as second order cone programming problems and thus solved by interior point algorithms in polynomial time.

2.2. Robust Probability Classifier. Due to observational noises, the true class-conditional probability distribution is often difficult to obtain. Instead, we can construct a confidence distributional set which contains the true distribution. Unlike the traditional distributional sets in minimax probability machines, which only utilize the mean and covariance matrix, we construct our class-conditional probability distributional set based on the modified χ²-distance, which uses more information from the samples.

The modified χ²-distance d(·,·): R^m × R^m → R is used in statistics to measure the distance between two discrete probability distribution vectors. For given p = (p_1, …, p_m)^T and q = (q_1, …, q_m)^T, it is defined as

d(q, p) = Σ_{j=1}^{m} (q_j − p_j)² / p_j.   (11)
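For concreteness, the distance can be computed directly from this definition (a minimal sketch, assuming numpy; not part of the original paper):

```python
import numpy as np

def modified_chi2_distance(q, p):
    """Modified chi^2-distance d(q, p) = sum_j (q_j - p_j)^2 / p_j
    between two discrete probability vectors; p_j > 0 is assumed."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.sum((q - p) ** 2 / p)

# Example: a small perturbation of a uniform distribution.
p = np.array([0.25, 0.25, 0.25, 0.25])
q = np.array([0.30, 0.20, 0.25, 0.25])
print(modified_chi2_distance(q, p))  # 0.02
```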

Based on the modified χ²-distance, we present the following class-conditional probability distributional set:

P_ε = { q^l_{ij} : Σ_{j∈J} q^l_{ij} = 1, q^l_{ij} ≥ 0, Σ_{j∈J} (q^l_{ij} − p^l_{ij})² / p^l_{ij} ≤ ε, ∀i ∈ I, l ∈ L, j ∈ J },   (12)

where p^l_{ij} is the nominal class-conditional probability that the ith sample belongs to the jth class based on the lth feature, and the prespecified parameter ε controls the size of the set.

To design a robust classifier, we need to consider the effect of data uncertainty on both the objective function and the constraints. The robust objective minimizes the worst-case loss function value over all possible distributions in the distributional set P_ε; the robust constraints ensure that the original constraints are satisfied for every distribution in P_ε. Thus the robust probability classifier problem takes the following form:

(RPC)  min_α  max { Σ_{j∈J} Σ_{i∈I} (1 − 2 y_{ij}) Σ_{l∈L} α^l_j q^l_{ij} + |I| : {q^l_{ij}} ∈ P_ε }
       s.t.  0 ≤ Σ_{l∈L} α^l_j q^l_{ij} ≤ 1, ∀ {q^l_{ij}} ∈ P_ε, ∀i ∈ I, j ∈ J.   (13)

Note that the above optimization problem has an infinite number of robust constraints, and its objective function is itself an embedded subproblem. We show how to solve this minimax optimization problem in Section 3.

2.3. Constructing the Distributional Set. To specify the distributional set P_ε, we need to choose the parameter ε and the nominal probabilities p^l_{ij}. The selection of ε is application dependent, and we discuss it in the numerical experiment section; here we provide a procedure to calculate p^l_{ij}.

For the lth feature, the following procedure takes an integer K_l, the number of data intervals, as input and outputs the estimated probability p^l_{ij} of the ith sample belonging to the jth class.

(1) Sort the samples in increasing order of the lth feature and divide them into K_l intervals such that each interval contains at least ⌊|I|/K_l⌋ samples. Denote the kth interval by Δ^l_k.

(2) Calculate the total number N_j of samples in the jth class, the total number N^l_k of samples in the kth interval, and the total number N^l_{kj} of samples of the jth class in the kth interval.

(3) For the ith sample, if it falls into the kth interval, the class-conditional probability p^l_{ij} is calculated by

p^l_{ij} = Prob(i ∈ j | x_{il} ∈ Δ^l_k)
         = Prob(i ∈ j, x_{il} ∈ Δ^l_k) / Prob(x_{il} ∈ Δ^l_k)
         = Prob(i ∈ j) Prob(x_{il} ∈ Δ^l_k | i ∈ j) / Σ_{j′∈J} Prob(i ∈ j′) Prob(x_{il} ∈ Δ^l_k | i ∈ j′)
         = (N_j / |I|)(N^l_{kj} / N_j) / Σ_{j′∈J} (N_{j′} / |I|)(N^l_{kj′} / N_{j′})
         = N^l_{kj} / N^l_k.   (14)
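A compact sketch of this three-step procedure for a single feature l (an illustrative implementation; the function and variable names are ours):

```python
import numpy as np

def nominal_probabilities(x, y, K_l):
    """Estimate p_ij^l = N_kj^l / N_k^l for one feature, following (14).

    x   : shape (I,), the l-th feature of each sample.
    y   : shape (I,), class labels in {0, ..., J-1}.
    K_l : number of data intervals.
    Returns p of shape (I, J); row i gives p_ij^l for each class j.
    """
    I, J = len(x), int(y.max()) + 1
    order = np.argsort(x)                    # step (1): sort the samples
    size = I // K_l                          # at least floor(|I|/K_l) each
    interval = np.empty(I, dtype=int)
    for k in range(K_l):
        hi = I if k == K_l - 1 else (k + 1) * size
        interval[order[k * size:hi]] = k     # membership in Delta_k^l
    p = np.zeros((I, J))
    for k in range(K_l):                     # steps (2)-(3): count and divide
        members = interval == k
        N_k = members.sum()
        for j in range(J):
            N_kj = np.sum(members & (y == j))
            p[members, j] = N_kj / N_k       # p_ij^l = N_kj^l / N_k^l
    return p
```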

Note that from the definition of P_ε we can easily compute an upper bound q̄^l_{ij} and a lower bound q̲^l_{ij} on the true class-conditional probability q^l_{ij} as follows:

q̄^l_{ij} = max { q^l_{ij} : Σ_{s∈J} q^l_{is} = 1, Σ_{s∈J} (q^l_{is} − p^l_{is})² / p^l_{is} ≤ ε, q^l_{is} ≥ 0, ∀s ∈ J },   (15)

q̲^l_{ij} = min { q^l_{ij} : Σ_{s∈J} q^l_{is} = 1, Σ_{s∈J} (q^l_{is} − p^l_{is})² / p^l_{is} ≤ ε, q^l_{is} ≥ 0, ∀s ∈ J }.   (16)

The above problems can be efficiently solved by a second order cone solver such as SeDuMi [26] or SDPT3 [27].
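For instance, using the cvxpy modeling package rather than the Matlab solvers employed in the paper, the bounds for one sample-feature pair can be computed as follows (an illustrative sketch under our own naming):

```python
import cvxpy as cp
import numpy as np

def probability_bounds(p, eps):
    """Compute the bounds (15)-(16) for every class j, given the nominal
    distribution p (shape (J,), entries > 0) and the radius eps."""
    J = len(p)
    q = cp.Variable(J, nonneg=True)
    constraints = [cp.sum(q) == 1,
                   cp.sum(cp.square(q - p) / p) <= eps]  # chi^2 ball
    upper, lower = np.zeros(J), np.zeros(J)
    for j in range(J):
        upper[j] = cp.Problem(cp.Maximize(q[j]), constraints).solve()
        lower[j] = cp.Problem(cp.Minimize(q[j]), constraints).solve()
    return lower, upper
```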

3. Solution Methods for RPC

In this section we first reduce the infinite number of robust constraints to a finite set of linear constraints and then transform the inner robust objective into a minimization problem via the conic duality theorem. Combining the two, we obtain an equivalent computable second order cone program for the RPC problem. The following analysis is based on the strong duality result in [8].

Consider a conic program of the following form:

(CP)  min  c^T x
      s.t. A_i x − b_i ∈ C_i, ∀i = 1, …, m,
           A x = b,   (17)

and its dual problem

(DP)  max  b^T z + Σ_{i=1}^{m} b_i^T y_i
      s.t. A^* z + Σ_{i=1}^{m} A_i^* y_i = c,
           y_i ∈ C_i^*, ∀i = 1, …, m,   (18)

where C_i is a cone in R^{n_i} and C_i^* is its dual cone, defined by

C_i^* = { y ∈ R^{n_i} : y^T x ≥ 0, ∀x ∈ C_i }.   (19)

A conic program is called strictly feasible if it admits a feasible solution x such that A_i x − b_i ∈ int C_i, ∀i = 1, …, m, where int C_i denotes the interior of C_i.

Lemma 1 (see [8]). If one of the problems (CP) and (DP) is strictly feasible and bounded, then the other problem is solvable and (CP) = (DP), in the sense that both have the same optimal objective function value.

3.1. Robust Constraints. The following lemma provides an equivalent characterization of the infinite family of robust constraints in terms of a finite set of linear constraints, which can be handled efficiently.

Lemma 2. For given i, j, the robust constraint

0 ≤ Σ_{l∈L} α^l_j q^l_{ij} ≤ 1, ∀ {q^l_{ij}} ∈ P_ε,   (20)

is equivalent to the following finite set of linear constraints:

Σ_{l∈L} ( q̲^l_{ij} u^{l0}_{ij} − q̄^l_{ij} v^{l0}_{ij} ) ≥ 0,
α^l_j − u^{l0}_{ij} + v^{l0}_{ij} ≥ 0, u^{l0}_{ij}, v^{l0}_{ij} ≥ 0, ∀l ∈ L,
1 + Σ_{l∈L} ( q̲^l_{ij} u^{l1}_{ij} − q̄^l_{ij} v^{l1}_{ij} ) ≥ 0,
v^{l1}_{ij} − α^l_j − u^{l1}_{ij} ≥ 0, u^{l1}_{ij}, v^{l1}_{ij} ≥ 0, ∀l ∈ L,   (21)

where q̄^l_{ij} and q̲^l_{ij} are the bounds (15) and (16).

Proof. First note that the distributional set P_ε can be represented as the Cartesian product of a series of projected subsets,

P_ε = ∏_{i∈I} P_{ε,i},   (22)

where the projected subset for index i is defined by

P_{ε,i} = { q^l_{ij} : Σ_{j∈J} q^l_{ij} = 1, q^l_{ij} ≥ 0, Σ_{j∈J} (q^l_{ij} − p^l_{ij})² / p^l_{ij} ≤ ε, ∀l ∈ L, j ∈ J }.   (23)

Then, for given i, j, since the robust constraint is only associated with the variables q^l_{ij}, l ∈ L, we can further split the projected subset P_{ε,i} into |J| boxes,

P_{ε,i} = ∏_{j∈J} P_{ε,ij} = ∏_{j∈J} { q^l_{ij} : q̲^l_{ij} ≤ q^l_{ij} ≤ q̄^l_{ij}, ∀l ∈ L },   (24)

where q̄^l_{ij} and q̲^l_{ij} are computed by (15) and (16), respectively.

For the constraint Σ_{l∈L} α^l_j q^l_{ij} ≥ 0, ∀{q^l_{ij}} ∈ P_ε, we have the chain of equivalences

Σ_{l∈L} α^l_j q^l_{ij} ≥ 0, ∀{q^l_{ij}} ∈ P_{ε,i}

⇔ Σ_{l∈L} α^l_j q^l_{ij} ≥ 0, ∀{q^l_{ij}} ∈ P_{ε,ij}

⇔ min { Σ_{l∈L} α^l_j q^l_{ij} : q̲^l_{ij} ≤ q^l_{ij} ≤ q̄^l_{ij}, ∀l ∈ L } ≥ 0

⇔ max { Σ_{l∈L} ( q̲^l_{ij} u^{l0}_{ij} − q̄^l_{ij} v^{l0}_{ij} ) : α^l_j − u^{l0}_{ij} + v^{l0}_{ij} ≥ 0, u^{l0}_{ij}, v^{l0}_{ij} ≥ 0, ∀l ∈ L } ≥ 0

⇔ Σ_{l∈L} ( q̲^l_{ij} u^{l0}_{ij} − q̄^l_{ij} v^{l0}_{ij} ) ≥ 0; α^l_j − u^{l0}_{ij} + v^{l0}_{ij} ≥ 0, u^{l0}_{ij}, v^{l0}_{ij} ≥ 0, ∀l ∈ L,   (25)

where the last equivalence comes from the strong duality between the two linear programs.

For the constraint Σ_{l∈L} α^l_j q^l_{ij} ≤ 1, ∀{q^l_{ij}} ∈ P_ε, the same technique applies; thus we complete the proof.
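The last equivalence can also be checked numerically. The toy verification below (ours, not part of the paper) compares the optimal value of the box-constrained LP with the value of its dual certificate:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
L = 4
alpha = rng.normal(size=L)
q_lo = rng.uniform(0.0, 0.3, size=L)          # lower bounds (16)
q_hi = q_lo + rng.uniform(0.1, 0.4, size=L)   # upper bounds (15)

# Primal: min alpha^T q  s.t.  q_lo <= q <= q_hi.
primal = linprog(alpha, bounds=list(zip(q_lo, q_hi))).fun

# Dual: max q_lo^T u - q_hi^T v  s.t.  u - v = alpha, u, v >= 0;
# the optimum is attained at u = [alpha]_+, v = [-alpha]_+.
u, v = np.maximum(alpha, 0), np.maximum(-alpha, 0)
dual = q_lo @ u - q_hi @ v
print(np.isclose(primal, dual))  # True
```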

3.2. Robust Objective Function. In the RPC problem the robust objective function is defined by an inner maximization problem. The following proposition shows that it can be transformed into a minimization problem over second order cones. To prove this result we use the conjugate function d* of the scalar modified χ²-distance d(t) = (t − 1)²:

d*(s) = sup_{t≥0} { s t − d(t) } = [s + 2]₊² / 4 − 1,   (26)

where [x]₊ = x if x ≥ 0 and [x]₊ = 0 otherwise. For more details about conjugate functions, see [28].

Proposition 3. The inner maximization problem

max { Σ_{j∈J} Σ_{i∈I} (1 − 2 y_{ij}) Σ_{l∈L} α^l_j q^l_{ij} + |I| : {q^l_{ij}} ∈ P_ε }   (27)

is equivalent to the second order cone program

min  Σ_{i∈I} Σ_{l∈L} ( ε λ^l_i − θ^l_i ) + Σ_{i∈I} Σ_{l∈L} Σ_{j∈J} p^l_{ij} w^l_{ij} + |I|
s.t. ( w^l_{ij}, z^l_{ij}, 2 λ^l_i + w^l_{ij} ) ∈ L³, ∀i ∈ I, j ∈ J, l ∈ L,
     r^l_{ij} = α^l_j (1 − 2 y_{ij}) + θ^l_i, ∀i ∈ I, l ∈ L, j ∈ J,
     z^l_{ij} ≥ r^l_{ij} + 2 λ^l_i, λ^l_i ≥ 0, z^l_{ij} ≥ 0, ∀i ∈ I, j ∈ J, l ∈ L,   (28)

where the second order cone L^{n+1} is defined as

L^{n+1} = { x ∈ R^{n+1} : x_{n+1} ≥ ( Σ_{i=1}^{n} x_i² )^{1/2} }.   (29)

Proof. For given feasible α satisfying the robust constraints, it is straightforward to show that the inner maximization problem equals the following minimization problem:

(MP)  min  t
      s.t. t ≥ Σ_{j∈J} Σ_{i∈I} (1 − 2 y_{ij}) Σ_{l∈L} α^l_j q^l_{ij} + |I|, ∀ {q^l_{ij}} ∈ P_ε.   (30)

The above constraint can be further reduced to

max { Σ_{j∈J} Σ_{i∈I} (1 − 2 y_{ij}) Σ_{l∈L} α^l_j q^l_{ij} : {q^l_{ij}} ∈ P_ε } + |I| − t ≤ 0.   (31)

By assigning Lagrange multipliers θ^l_i ∈ R and λ^l_i ∈ R₊ to the constraints Σ_{j∈J} q^l_{ij} = 1 and Σ_{j∈J} (q^l_{ij} − p^l_{ij})²/p^l_{ij} ≤ ε of the left-hand maximization problem, we obtain the Lagrange function

L(q, θ, λ) = Σ_{i∈I} Σ_{l∈L} ( ε λ^l_i − θ^l_i ) + Σ_{i∈I} Σ_{l∈L} Σ_{j∈J} ( r^l_{ij} q^l_{ij} − λ^l_i (q^l_{ij} − p^l_{ij})² / p^l_{ij} ) + |I| − t,   (32)

where r^l_{ij} = α^l_j (1 − 2 y_{ij}) + θ^l_i. Its dual function is given by

D(θ, λ) = max_{q≥0} L(q, θ, λ)

= Σ_{i∈I} Σ_{l∈L} ( ε λ^l_i − θ^l_i ) + Σ_{i∈I} Σ_{l∈L} Σ_{j∈J} max_{q^l_{ij}≥0} ( r^l_{ij} q^l_{ij} − λ^l_i p^l_{ij} ( (q^l_{ij} − p^l_{ij}) / p^l_{ij} )² ) + |I| − t

= Σ_{i∈I} Σ_{l∈L} ( ε λ^l_i − θ^l_i ) + Σ_{i∈I} Σ_{l∈L} Σ_{j∈J} p^l_{ij} max_{s≥0} ( r^l_{ij} s − λ^l_i (s − 1)² ) + |I| − t   (substituting s = q^l_{ij} / p^l_{ij})

= Σ_{i∈I} Σ_{l∈L} ( ε λ^l_i − θ^l_i ) + Σ_{i∈I} Σ_{l∈L} Σ_{j∈J} p^l_{ij} λ^l_i max_{s≥0} ( (r^l_{ij} / λ^l_i) s − (s − 1)² ) + |I| − t

= Σ_{i∈I} Σ_{l∈L} ( ε λ^l_i − θ^l_i ) + Σ_{i∈I} Σ_{l∈L} Σ_{j∈J} p^l_{ij} λ^l_i d*( r^l_{ij} / λ^l_i ) + |I| − t.   (33)

Note that for any feasible α the primal maximization problem (31) is bounded and has a strictly feasible solution p^l_{ij}; thus there is no duality gap between (31) and the following dual problem:

min { D(θ, λ) : θ^l_i ∈ R, λ^l_i ∈ R₊, ∀i ∈ I, l ∈ L }

⇔ min  Σ_{i∈I} Σ_{l∈L} ( ε λ^l_i − θ^l_i ) + Σ_{i∈I} Σ_{l∈L} Σ_{j∈J} p^l_{ij} w^l_{ij} + |I| − t
   s.t. w^l_{ij} ≥ λ^l_i d*( r^l_{ij} / λ^l_i ), ∀i ∈ I, l ∈ L, j ∈ J,
        θ^l_i ∈ R, λ^l_i ∈ R₊, ∀i ∈ I, l ∈ L.   (34)

Next we show that the constraint involving the conjugate function can be represented by second order cone constraints:

λ^l_i d*( r^l_{ij} / λ^l_i ) ≤ w^l_{ij}

⇔ λ^l_i ( −1 + (1/4) [ r^l_{ij} / λ^l_i + 2 ]₊² ) ≤ w^l_{ij}

⇔ 4 λ^l_i ( λ^l_i + w^l_{ij} ) ≥ [ r^l_{ij} + 2 λ^l_i ]₊²

⇔ 4 λ^l_i ( λ^l_i + w^l_{ij} ) ≥ ( z^l_{ij} )², z^l_{ij} ≥ 0, z^l_{ij} ≥ r^l_{ij} + 2 λ^l_i

⇔ ( w^l_{ij}, z^l_{ij}, 2 λ^l_i + w^l_{ij} ) ∈ L³, z^l_{ij} ≥ 0, z^l_{ij} ≥ r^l_{ij} + 2 λ^l_i.   (35)

Reinjecting the above constraints into (MP), the robust objective function is equivalent to the following problem:

min  t
s.t. Σ_{i∈I} Σ_{l∈L} ( ε λ^l_i − θ^l_i ) + Σ_{i∈I} Σ_{l∈L} Σ_{j∈J} p^l_{ij} w^l_{ij} + |I| ≤ t,
     ( w^l_{ij}, z^l_{ij}, 2 λ^l_i + w^l_{ij} ) ∈ L³, ∀i ∈ I, j ∈ J, l ∈ L,
     z^l_{ij} ≥ r^l_{ij} + 2 λ^l_i, z^l_{ij} ≥ 0, λ^l_i ≥ 0, ∀i ∈ I, j ∈ J, l ∈ L,
     r^l_{ij} = α^l_j (1 − 2 y_{ij}) + θ^l_i, ∀i ∈ I, l ∈ L, j ∈ J.   (36)

Eliminating the variable t completes the proof.

Based on Lemma 2 and Proposition 3, we obtain our main result.

Proposition 4. The RPC problem can be solved as the following second order cone program:

min  Σ_{i∈I} Σ_{l∈L} ( ε λ^l_i − θ^l_i ) + Σ_{i∈I} Σ_{l∈L} Σ_{j∈J} p^l_{ij} w^l_{ij} + |I|
s.t. ( w^l_{ij}, z^l_{ij}, 2 λ^l_i + w^l_{ij} ) ∈ L³, ∀i ∈ I, j ∈ J, l ∈ L,
     r^l_{ij} = α^l_j (1 − 2 y_{ij}) + θ^l_i, ∀i ∈ I, j ∈ J, l ∈ L,
     z^l_{ij} ≥ r^l_{ij} + 2 λ^l_i, ∀i ∈ I, j ∈ J, l ∈ L,
     Σ_{l∈L} ( q̲^l_{ij} u^{l0}_{ij} − q̄^l_{ij} v^{l0}_{ij} ) ≥ 0, ∀i ∈ I, j ∈ J,
     1 + Σ_{l∈L} ( q̲^l_{ij} u^{l1}_{ij} − q̄^l_{ij} v^{l1}_{ij} ) ≥ 0, ∀i ∈ I, j ∈ J,
     α^l_j − u^{l0}_{ij} + v^{l0}_{ij} ≥ 0, ∀i ∈ I, j ∈ J, l ∈ L,
     v^{l1}_{ij} − α^l_j − u^{l1}_{ij} ≥ 0, ∀i ∈ I, j ∈ J, l ∈ L,
     λ^l_i, z^l_{ij}, u^{l1}_{ij}, v^{l1}_{ij}, u^{l0}_{ij}, v^{l0}_{ij} ≥ 0, ∀i ∈ I, j ∈ J, l ∈ L,
     r^l_{ij}, θ^l_i, w^l_{ij}, α^l_j ∈ R, ∀i ∈ I, j ∈ J, l ∈ L.   (37)
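To make the reformulation concrete, the following sketch assembles problem (37) with cvxpy (an illustrative re-implementation under our own naming; the paper itself solves (37) in Matlab with SDPT3). The cone membership (w, z, 2λ + w) ∈ L³ is encoded as ‖(w, z)‖₂ ≤ 2λ + w:

```python
import cvxpy as cp
import numpy as np

def solve_rpc(P, Y, q_lo, q_hi, eps):
    """Assemble and solve the RPC program (37) for small instances.

    P, Y       : nominal probabilities (I, J, L) and 0/1 labels (I, J).
    q_lo, q_hi : the bounds (16) and (15), each of shape (I, J, L).
    Returns the optimal weights alpha, shape (J, L).
    """
    I, J, L = P.shape
    alpha = cp.Variable((J, L))
    theta = cp.Variable((I, L))
    lam = cp.Variable((I, L), nonneg=True)
    # Variables indexed by (i, j, l), flattened to row = i * J + j.
    w = cp.Variable((I * J, L))
    z = cp.Variable((I * J, L), nonneg=True)
    u0 = cp.Variable((I * J, L), nonneg=True)
    v0 = cp.Variable((I * J, L), nonneg=True)
    u1 = cp.Variable((I * J, L), nonneg=True)
    v1 = cp.Variable((I * J, L), nonneg=True)

    cons = []
    for i in range(I):
        for j in range(J):
            row = i * J + j
            for l in range(L):
                r = alpha[j, l] * (1 - 2 * Y[i, j]) + theta[i, l]
                cons += [
                    # (w, z, 2*lam + w) in L^3:
                    cp.SOC(2 * lam[i, l] + w[row, l],
                           cp.hstack([w[row, l], z[row, l]])),
                    z[row, l] >= r + 2 * lam[i, l],
                    alpha[j, l] - u0[row, l] + v0[row, l] >= 0,
                    v1[row, l] - alpha[j, l] - u1[row, l] >= 0,
                ]
            # Finite robust constraints from Lemma 2 for this (i, j).
            cons += [
                q_lo[i, j, :] @ u0[row, :] - q_hi[i, j, :] @ v0[row, :] >= 0,
                1 + q_lo[i, j, :] @ u1[row, :] - q_hi[i, j, :] @ v1[row, :] >= 0,
            ]
    pw = cp.sum(cp.multiply(P.reshape(I * J, L), w))
    objective = cp.Minimize(cp.sum(eps * lam - theta) + pw + I)
    cp.Problem(objective, cons).solve()
    return alpha.value
```

Given the learned weights, a new observation x is assigned to the class j*(x) = arg max_{j∈J} Σ_{l∈L} α^l_j p^l_j(x), exactly as in Section 2.1.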

4. Numerical Experiments on Real-World Applications

In this section numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.

All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second order cone programs in our proposed method and in the regularized SVM.

4.1. Data Sets. Lithology classification is one of the basic tasks in geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as Gamma coefficients, acoustic wave, striation density, and fusibility.

Numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6, all located in the Tarim Basin, China. In total, 12 data sets are used for binary classification problems and 8 data sets for multiple classification problems. For each data set, given a prespecified training rate γ ∈ [0, 1], the data are randomly partitioned into a training set and a test set such that the training set accounts for a fraction γ of the total number of samples.

4.2. Experiment Design. The parameters in our model are chosen based on the size of the data set. The parameter ε depends on the number of classes and is defined as ε = δ²/|J|, where δ ∈ (0, 1). This choice can be explained as follows: if there are |J| classes and the training data are uniformly distributed, so that each probability is p^l_{ij} = 1/|J|, then the χ² constraint allows each probability a maximal variation between p^l_{ij}(1 − δ) and p^l_{ij}(1 + δ). The number of data intervals is defined as K_l = |I|/(|J| × K), so that if the training data are uniformly distributed, each data interval contains K samples of each class. In the following we set δ = 0.2 and K = 8.
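In code form (a hypothetical helper mirroring the two formulas above):

```python
def rpc_parameters(num_samples, num_classes, delta=0.2, K=8):
    """Parameter choices of Section 4.2: eps = delta^2 / |J| and
    K_l = |I| / (|J| * K) data intervals per feature."""
    eps = delta ** 2 / num_classes
    K_l = max(1, num_samples // (num_classes * K))
    return eps, K_l
```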

We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (taking the jth class as an example):

(RSVM)  min  Σ_{i∈I} ξ_{ij} + λ_j ‖w_j‖
        s.t. ỹ_{ij} ( Σ_{l∈L} w^l_j x_{il} + b_j ) ≥ 1 − ξ_{ij}, i ∈ I,
             ξ_{ij} ≥ 0, i ∈ I,   (38)

where ỹ_{ij} = 2 y_{ij} − 1 and λ_j ≥ 0 is a regularization parameter. As pointed out in [8], λ_j represents a trade-off between the number of training set errors and the amount of robustness with respect to spherical perturbations of the data points.

Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set (train/test accuracy, %; * marks the best test accuracy in each row).

tr (%) | RSVM train/test | NBC train/test | RPC train/test
50     | 90.7 / 88.2     | 63.9 / 66.2    | 88.4 / 90.5*
55     | 89.9 / 88.6     | 69.1 / 72.8    | 89.5 / 89.9*
60     | 89.0 / 85.0     | 70.3 / 72.1    | 91.3 / 86.4*
65     | 86.3 / 85.9     | 72.1 / 72.8    | 88.0 / 92.5*
70     | 92.3 / 84.1     | 70.3 / 75.7    | 90.8 / 86.3*
75     | 88.8 / 87.9     | 74.2 / 74.6    | 88.7 / 91.6*
80     | 88.7 / 93.8*    | 90.0 / 87.5    | 88.3 / 93.3
85     | 89.5 / 89.3     | 93.4 / 89.6    | 89.2 / 91.0*
90     | 89.5 / 88.4     | 93.3 / 95.8*   | 89.2 / 92.6

Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set (train/test accuracy, %; * marks the best test accuracy in each row).

tr (%) | RSVM train/test | NBC train/test | RPC train/test
50     | 91.4 / 84.8     | 76.5 / 68.9    | 91.3 / 87.5*
55     | 92.5 / 86.6     | 68.0 / 77.0    | 92.0 / 90.3*
60     | 89.8 / 86.1     | 72.9 / 73.8    | 88.9 / 90.9*
65     | 91.0 / 82.3     | 80.5 / 81.6    | 89.8 / 92.9*
70     | 86.8 / 95.5*    | 83.4 / 89.8    | 88.4 / 93.7
75     | 89.4 / 85.2     | 85.9 / 79.5    | 89.7 / 93.5*
80     | 91.8 / 80.8     | 88.1 / 79.9    | 89.7 / 91.1*
85     | 88.3 / 89.9     | 89.9 / 92.8    | 90.8 / 97.1*
90     | 88.5 / 90.2     | 88.8 / 94.2    | 90.9 / 97.2*

To make a fair comparison, in the following experiments we test a series of λ values and choose the one with the best performance. Note that if λ_j = 0, we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.
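For completeness, the per-class RSVM subproblem (38) can be sketched in the same way (illustrative, again assuming cvxpy):

```python
import cvxpy as cp

def solve_rsvm(X, y_pm, lam_j):
    """Solve (38) for one class: X is (I, L); y_pm holds the labels
    2*y_ij - 1 in {-1, +1}; lam_j >= 0 is the regularization parameter."""
    I, L = X.shape
    w = cp.Variable(L)
    b = cp.Variable()
    xi = cp.Variable(I, nonneg=True)
    margin = cp.multiply(y_pm, X @ w + b)
    problem = cp.Problem(cp.Minimize(cp.sum(xi) + lam_j * cp.norm(w, 2)),
                         [margin >= 1 - xi])
    problem.solve()
    return w.value, b.value
```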

4.3. Test on Binary Classification. In this subsection RSVM, NBC, and RPC are implemented on 12 data sets for binary classification problems using cross-validation. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].

Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set we randomly partition it into a training set and a test set based on the parameter tr, which varies from 50% to 90%. The best classification accuracy on each test set is marked with an asterisk in the tables.

Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification compared with NBC and RSVM. In most cases RSVM has the highest classification accuracy on the training sets, but its performance on the test sets is unsatisfactory, whereas the proposed RPC most often provides the highest classification accuracy on the test sets. NBC performs better on the test sets as the training rate increases. The results also show that, for a given training rate, RPC can perform better on the test sets than on the training sets; thus it avoids the "overlearning" phenomenon.

To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, T41-T45 and T61-T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for the data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and on all the data sets its test accuracy is higher than 80%. As in Tables 1 and 2, the robustness of the proposed RPC ensures stable performance on the test sets.

4.4. Test on Multiple Classification. In this subsection we test the performance of RPC on multiple classification problems in comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter λ, we run a set of RSVMs with λ varying from 0 to a sufficiently large value and select the one with the best performance on the test sets.

Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets for binary classification problems by finding the optimal separating hyperplane, while RPC is more robust when extended to multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.

Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on the other data sets when tr = 70% (train/test accuracy, %; * marks the best test accuracy in each row).

Data set | RSVM train/test | NBC train/test | RPC train/test
T41      | 62.0 / 59.7     | 82.4 / 78.5    | 77.9 / 83.5*
T42      | 87.0 / 82.2     | 84.1 / 83.1    | 80.5 / 85.3*
T43      | 68.0 / 61.2     | 80.2 / 75.4    | 85.5 / 86.9*
T44      | 91.3 / 83.9     | 77.9 / 86.8    | 88.8 / 90.5*
T45      | 86.5 / 87.0     | 93.2 / 91.0*   | 84.0 / 89.1
T61      | 80.6 / 79.0     | 80.5 / 83.0    | 83.6 / 87.8*
T62      | 71.4 / 66.5     | 86.9 / 85.4*   | 86.3 / 85.4*
T63      | 63.7 / 69.5     | 89.6 / 89.1*   | 82.2 / 84.4
T64      | 88.2 / 86.7     | 97.0 / 96.9*   | 93.4 / 95.5
T65      | 75.0 / 63.4     | 79.7 / 81.5    | 90.5 / 92.9*

Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set (train/test accuracy, %; * marks the best test accuracy in each row).

Data set | RSVM train/test | NBC train/test | RPC train/test
M1       | 65.4 / 68.2     | 72.7 / 73.7    | 79.1 / 77.4*
M2       | 76.9 / 75.3     | 82.6 / 74.8    | 81.7 / 80.9*
M3       | 57.9 / 69.9     | 74.8 / 87.4    | 95.4 / 92.0*
M4       | 70.4 / 64.1     | 97.1 / 92.3    | 95.4 / 92.3*
M5       | 77.4 / 71.3     | 89.4 / 88.1*   | 92.0 / 88.0
M6       | 75.7 / 70.5     | 74.1 / 79.4    | 86.4 / 80.8*

[Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set. Accuracy on the training set (%) versus training rate (0.6-0.9).]

Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that in most of the cases RPC provides the highest accuracy among the three methods. The accuracy of RSVM outperforms that of NBC on the Y5 test set, while the latter outperforms the former on the T1 test set.

[Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set. Accuracy on the test set (%) versus training rate (0.6-0.9).]

To further test the performance of RPC on multiple classification problems, we carry out more experiments on the data sets M1-M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always provides the highest classification performance among the three methods, and even on the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).

[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set. Accuracy on the training set (%) versus training rate (0.6-0.9).]

[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set. Accuracy on the test set (%) versus training rate (0.6-0.9).]

From the tested real-life application we conclude that, compared with RSVM and NBC, the proposed RPC is robust enough to provide better performance on both binary and multiple classification problems. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.

5. Conclusion

In this paper we propose a robust probability classifier model to address data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified χ²-distance. We assume that the true distribution lies in the constructed distributional set, which is centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all possible distributions belonging to the distributional set.

Our proposed model introduces the recently developed distributionally robust optimization methodology into classifier design. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone program based on the conic duality theorem; thus our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also shows promising performance on multiple classification problems.

There remain many important extensions of our model. Other forms of loss function, such as the mean squared error function and the Hinge loss function, should be studied to obtain tractable reformulations, and the resulting models may provide better performance. Probability models considering joint probability distribution information are also an interesting research direction.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.

[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223-228, AAAI Press, Menlo Park, Calif, USA, July 1992.

[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.

[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.

[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209-226, 2001.

[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.

[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.

[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.

[9] A. Ben-Tal and A. Nemirovski, "Robust optimization - methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453-480, 2002.

[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464-501, 2011.

[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801-807, 2001.

[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555-582, 2003.

[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.

[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558-563, IEEE, July 2004.

[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253-1286, 2004.

[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175-1178, June 2004.

[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191-201, 2008.

[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123-136, 2007.

[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691-710, 2013.

[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.

[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.

[22] B. Scholkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.

[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.

[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85-90, 1986.

[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.

[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625-653, 1999.

[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3 - a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.

[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341-357, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article A Robust Probability Classifier Based on the … · 2020. 1. 13. · Research Article A Robust Probability Classifier Based on the Modified 2-Distance YongzhiWang,

Mathematical Problems in Engineering 3

However such estimating method leads to the problem ofldquodimension disasterrdquo

To address this issue the naive Bayes classifier makes thefollowing ldquoconditional independence assumptionrdquo

119875 (119909 | 119895) =

|119871|

prod

119897=1

119901119897

119895(119909) (4)

where 119901119897

119895(119909) = 119875(119909

119897| 119895) is the class-conditional probability

that the observation 119909 belongs to the 119895th class based on the119897th feature Here we introduce another ldquolinear combinationassumptionrdquo for the class-conditional probability

119875 (119909 | 119895) =

|119871|

sum

119897=1

120573119897

119895119901119897

119895(119909) (5)

where 120573119897

119895is a coefficient Compared with the ldquoconditional

independence assumptionrdquo which uses the probabilisticinformation in terms of multiplication the proposed ldquolinearcombination assumptionrdquo uses the probabilistic informationin terms of weighted sum We will further discuss therationality of this assumption at the end of this subsection

Under this assumption we have

119875 (119895 | 119909) prop 119875 (119895) 119875 (119909 | 119895) = 119875 (119895)

|119871|

sum

119897=1

120573119897

119895119901119897

119895(119909) =

|119871|

sum

119897=1

120572119897

119895119901119897

119895(119909)

(6)

where 120572119897

119895= 119875(119895)120573

119897

119895denotes the probability weight of the 119897th

feature for the 119895th classTo obtain the optimal probability classifier based on the

ldquolinear combination assumptionrdquo it is natural to consider thefollowing optimization problem

min120572isinΘ

sum

119895isin119869

sum

119894isin119868

119871 (119875 (119895 | 119909119894) 119910119894119895

) (7)

where 119871(sdot sdot) R timesR rarr 119877+is a prespecified loss function In

the following context we will take the absolute error functionas our loss function that is 119871(119909 119910) = |119909 minus 119910| In view ofits probability property it is straightforward to impose thefollowing constraints on the posterior probability

0 le 119891 (119895 | 119909119894) le 1 forall119894 isin 119868 119895 isin 119869 (8)

Under such constraints, we have that

$$\begin{aligned}
\sum_{j \in J} \sum_{i \in I} L\left(f(j \mid x_i), y_{ij}\right) & = \sum_{j \in J} \sum_{i \in I} \left| f(j \mid x_i) - y_{ij} \right| \\
& = \sum_{j \in J} \sum_{i \in I} \left[ y_{ij} \left(1 - f(j \mid x_i)\right) + \left(1 - y_{ij}\right) f(j \mid x_i) \right] \\
& = \sum_{j \in J} \sum_{i \in I} \left(1 - 2 y_{ij}\right) f(j \mid x_i) + |I|, \qquad (9)
\end{aligned}$$

where $|I| = \sum_{j \in J} \sum_{i \in I} y_{ij}$.

Thus the optimal probability classifier (PC) problem can be formulated as follows:

$$\begin{aligned}
\text{(PC)} \quad \min \quad & \sum_{j \in J} \sum_{i \in I} \left(1 - 2 y_{ij}\right) \sum_{l \in L} \alpha^l_j p^l_{ij} + |I| \\
\text{s.t.} \quad & 0 \le \sum_{l \in L} \alpha^l_j p^l_{ij} \le 1, \quad \forall i \in I, \; j \in J. \qquad (10)
\end{aligned}$$
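To make the formulation concrete, note that the nominal PC problem above is simply a linear program in the weights $\alpha$. The following is a minimal sketch, not the authors' original Matlab implementation, written in Python with the cvxpy modeling package; the array `p` holds the nominal probabilities $p^l_{ij}$ and `y` is the 0-1 label matrix.

```python
import numpy as np
import cvxpy as cp

def solve_pc(p, y):
    """Solve the nominal probability classifier (PC) problem (10).

    p : array of shape (n_samples, n_classes, n_features), p[i, j, l] = p^l_ij
    y : 0-1 array of shape (n_samples, n_classes), y[i, j] = 1 iff sample i is in class j
    """
    n_i, n_j, n_l = p.shape
    alpha = cp.Variable((n_j, n_l))  # weights alpha^l_j
    # posterior f(j | x_i) = sum_l alpha^l_j * p^l_ij, stacked into an (n_i, n_j) expression
    f = cp.vstack([cp.sum(cp.multiply(alpha, p[i]), axis=1) for i in range(n_i)])
    objective = cp.Minimize(cp.sum(cp.multiply(1 - 2 * y, f)) + n_i)
    cp.Problem(objective, [f >= 0, f <= 1]).solve()
    return alpha.value
```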

There is no doubt that the "linear combination assumption" may not always hold. However, we justify the proposed classifier by the following facts.

(1) As an intuitive interpretation, note that $p^l_j(x)$ estimates the probability of the observation $x$ belonging to the $j$th class only based on the $l$th feature; thus it provides partial probabilistic information about the sample. Hence we can interpret the weight $\alpha^l_j$ as a certain degree of trust in this information, and in this sense the "linear combination assumption" is a way of combining evidence from different sources. Similar ideas can also be found in the theory of evidence; see the Dempster-Shafer theory [24, 25].

(2) In terms of the classification performance, in the worst case the proposed classifier may put all weight on one feature; in such a case it is equivalent to a Bayes classifier based on a well-selected feature. If each class has a "typical" feature which can distinguish it from the other classes, the proposed classifier has the ability to learn this property by putting different weights on different features for different classes and thus provides better classification performance. A real-life application on lithology classification problems also validates its classification performance by comparison with support vector machines and the naive Bayes classifier.

(3) Another advantage of the proposed classifier is its high computability. As we show in Section 3, the proposed classifier and its robust counterpart problems can be reformulated as second order cone programming problems and thus can be solved by interior algorithms in polynomial time.

2.2. Robust Probability Classifier. Due to observational noises, the true class-conditional probability distribution is often difficult to obtain. Instead, we can construct a confidence distributional set which contains the true distribution. Unlike the traditional distributional sets in minimax probability machines, which only utilize the mean and covariance matrix, we construct our class-conditional probability distributional set based on the modified $\chi^2$-distance, which uses more information from the samples.


The modified $\chi^2$-distance $d(\cdot, \cdot): \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ is used in statistics to measure the distance between two discrete probability distribution vectors. For given $p = (p_1, \ldots, p_m)^T$ and $q = (q_1, \ldots, q_m)^T$, it is defined as

$$d(q, p) = \sum_{j=1}^{m} \frac{(q_j - p_j)^2}{p_j}. \qquad (11)$$

Based on the modified $\chi^2$-distance, we present the following class-conditional probability distributional set:

$$P_\epsilon = \left\{ \{q^l_{ij}\} : \sum_{j} q^l_{ij} = 1, \; q^l_{ij} \ge 0, \; \sum_{j \in J} \frac{(q^l_{ij} - p^l_{ij})^2}{p^l_{ij}} \le \epsilon, \; \forall i \in I, \; l \in L, \; j \in J \right\}, \qquad (12)$$

where $p^l_{ij}$ is the nominal class-conditional distribution probability for the $i$th sample belonging to the $j$th class based on the $l$th feature, and the prespecified parameter $\epsilon$ is used to control the size of the set.

To design a robust classifier, we need to consider the effect of data uncertainty on the objective function and constraints. The robust objective function is to minimize the worst-case loss function value over all the possible distributions in the distributional set $P_\epsilon$; the robust constraints ensure that all the original constraints should also be satisfied for any distribution in $P_\epsilon$. Thus the robust probability classifier problem is of the following form:

$$\begin{aligned}
\text{(RPC)} \quad \min \quad & \max \left\{ \sum_{j \in J} \sum_{i \in I} \left(1 - 2 y_{ij}\right) \sum_{l \in L} \alpha^l_j q^l_{ij} + |I| : \{q^l_{ij}\} \in P_\epsilon \right\} \\
\text{s.t.} \quad & 0 \le \sum_{l \in L} \alpha^l_j q^l_{ij} \le 1, \quad \forall \{q^l_{ij}\} \in P_\epsilon, \; \forall i, j. \qquad (13)
\end{aligned}$$

Note that the above optimization problem has an infinite number of robust constraints and its objective function is also an embedded subproblem. We will show how to solve such a minimax optimization problem in Section 3.

2.3. Construct the Distributional Set. To get the distributional set $P_\epsilon$, we need to define the parameter $\epsilon$ and the nominal probability $p^l_{ij}$. The selection of the parameter $\epsilon$ is application based, and we will discuss this issue in the numerical experiment section; next we provide a procedure to calculate $p^l_{ij}$.

For the $l$th feature, the following procedure takes an integer $K_l$ indicating the number of data intervals as an input and outputs the estimated probability $p^l_{ij}$ of the $i$th sample belonging to the $j$th class (a code sketch is given after the procedure).

(1) Sort the samples in increasing order and divide them into $K_l$ intervals such that each interval has at least $\lfloor |I| / K_l \rfloor$ samples. Denote the $k$th interval by $\Delta_{lk}$.

(2) Calculate the total number of samples in the $j$th class, $N_j$; the total number of samples in the $k$th interval, $N_{lk}$; and the total number of samples belonging to the $j$th class in the $k$th interval, $N_{lkj}$.

(3) For the $i$th sample, if it falls into the $k$th interval, the class-conditional probability $p^l_{ij}$ is calculated by

$$p^l_{ij} = \operatorname{Prob}\left(i \in j \mid x_{il} \in \Delta_{lk}\right) = \frac{\operatorname{Prob}\left(i \in j, \; x_{il} \in \Delta_{lk}\right)}{\operatorname{Prob}\left(x_{il} \in \Delta_{lk}\right)} = \frac{\operatorname{Prob}\left(i \in j\right) \operatorname{Prob}\left(x_{il} \in \Delta_{lk} \mid i \in j\right)}{\sum_{j' \in J} \operatorname{Prob}\left(i \in j'\right) \operatorname{Prob}\left(x_{il} \in \Delta_{lk} \mid i \in j'\right)} = \frac{\left(N_j / |I|\right) \cdot \left(N_{lkj} / N_j\right)}{\sum_{j' \in J} \left(N_{j'} / |I|\right) \cdot \left(N_{lkj'} / N_{j'}\right)} = \frac{N_{lkj}}{N_{lk}}. \qquad (14)$$
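As an illustration, the histogram estimation above can be implemented in a few lines. This is a minimal sketch for one feature at a time, not the authors' original code, assuming the last interval simply absorbs any remainder samples.

```python
import numpy as np

def nominal_probs(x_l, labels, K_l):
    """Estimate p^l_ij = N_lkj / N_lk for one feature via steps (1)-(3).

    x_l    : (n_samples,) values of the l-th feature
    labels : (n_samples,) class index of each sample, in 0..n_classes-1
    K_l    : number of data intervals
    """
    n, classes = len(x_l), np.unique(labels)
    order = np.argsort(x_l)                                  # step (1): sort samples
    interval = np.empty(n, dtype=int)
    interval[order] = np.minimum(np.arange(n) // (n // K_l), K_l - 1)
    p = np.empty((n, len(classes)))
    for k in range(K_l):                                     # steps (2)-(3): count and normalize
        in_k = interval == k
        N_lk = in_k.sum()
        for j, c in enumerate(classes):
            p[in_k, j] = np.sum(in_k & (labels == c)) / N_lk  # N_lkj / N_lk
    return p
```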

Note that, from the definition of $P_\epsilon$, we can easily compute the upper bound $\bar{q}^l_{ij}$ and lower bound $\underline{q}^l_{ij}$ for the true class-conditional probability $q^l_{ij}$ as follows:

$$\bar{q}^l_{ij} = \max \left\{ q^l_{ij} : \sum_{s} q^l_{is} = 1, \; \sum_{s \in J} \frac{(q^l_{is} - p^l_{is})^2}{p^l_{is}} \le \epsilon, \; q^l_{is} \ge 0, \; \forall s \in J \right\}, \qquad (15)$$

$$\underline{q}^l_{ij} = \min \left\{ q^l_{ij} : \sum_{s} q^l_{is} = 1, \; \sum_{s \in J} \frac{(q^l_{is} - p^l_{is})^2}{p^l_{is}} \le \epsilon, \; q^l_{is} \ge 0, \; \forall s \in J \right\}. \qquad (16)$$

The above problems can be efficiently solved by a second order cone solver such as SeDuMi [26] or SDPT3 [27].
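For instance, the bounds (15)-(16) for one sample-feature pair can be obtained with any conic solver. The sketch below uses Python with cvxpy, which is our choice for illustration; the paper itself calls SeDuMi/SDPT3 from Matlab.

```python
import numpy as np
import cvxpy as cp

def prob_bounds(p_il, eps):
    """Upper/lower bounds (15)-(16) on q^l_ij for one sample i and feature l.

    p_il : (n_classes,) nominal probabilities p^l_i1, ..., p^l_i|J|
    eps  : radius of the modified chi^2 ball
    """
    m = len(p_il)
    upper, lower = np.empty(m), np.empty(m)
    for j in range(m):
        q = cp.Variable(m, nonneg=True)
        ball = [cp.sum(q) == 1,
                cp.sum(cp.multiply(cp.square(q - p_il), 1.0 / p_il)) <= eps]  # modified chi^2-distance
        upper[j] = cp.Problem(cp.Maximize(q[j]), ball).solve()
        lower[j] = cp.Problem(cp.Minimize(q[j]), ball).solve()
    return upper, lower
```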

3. Solution Methods for RPC

In this section, we first reduce the infinite number of robust constraints to a finite set of linear constraints and then transform the inner robust objective function into a minimization problem by the conic duality theorem. At last, we obtain an equivalent computable second order cone programming for the RPC problem. The following analysis is based on the strong duality result in [8].


Consider a conic program of the following form:

$$\begin{aligned}
\text{(CP)} \quad \min \quad & c^T x \\
\text{s.t.} \quad & A_i x - b_i \in C_i, \quad \forall i = 1, \ldots, m, \\
& A x = b, \qquad (17)
\end{aligned}$$

and its dual problem

$$\begin{aligned}
\text{(DP)} \quad \max \quad & b^T z + \sum_{i=1}^{m} b_i^T y_i \\
\text{s.t.} \quad & A^* z + \sum_{i=1}^{m} A_i^* y_i = c, \\
& y_i \in C_i^*, \quad \forall i = 1, \ldots, m, \qquad (18)
\end{aligned}$$

where $C_i$ is a cone in $\mathbb{R}^{n_i}$ and $C_i^*$ is its dual cone defined by

$$C_i^* = \left\{ y \in \mathbb{R}^{n_i} : y^T x \ge 0, \; \forall x \in C_i \right\}. \qquad (19)$$

A conic program is called strictly feasible if it admits a feasible solution $x$ such that $A_i x - b_i \in \operatorname{int} C_i$, $\forall i = 1, \ldots, m$, where $\operatorname{int} C_i$ denotes the interior point set of $C_i$.

Lemma 1 (see [8]). If one of the problems (CP) and (DP) is strictly feasible and bounded, then the other problem is solvable and (CP) = (DP) in the sense that both have the same optimal objective function value.

3.1. Robust Constraints. The following lemma provides an equivalent characterization of the infinite number of robust constraints in terms of a finite set of linear constraints, which can be handled efficiently.

Lemma 2. For given $i, j$, the robust constraint

$$0 \le \sum_{l \in L} \alpha^l_j q^l_{ij} \le 1, \quad \forall \{q^l_{ij}\} \in P_\epsilon \qquad (20)$$

is equal to the following constraints:

$$\begin{aligned}
& \sum_{l \in L} \left( \underline{q}^l_{ij} u^{l0}_{ij} - \bar{q}^l_{ij} v^{l0}_{ij} \right) \ge 0, \\
& \alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0, \quad u^{l0}_{ij}, v^{l0}_{ij} \ge 0, \quad \forall l \in L, \\
& 1 + \sum_{l \in L} \left( \underline{q}^l_{ij} u^{l1}_{ij} - \bar{q}^l_{ij} v^{l1}_{ij} \right) \ge 0, \\
& v^{l1}_{ij} - \alpha^l_j - u^{l1}_{ij} \ge 0, \quad u^{l1}_{ij}, v^{l1}_{ij} \ge 0, \quad \forall l \in L. \qquad (21)
\end{aligned}$$

Proof. First note that the distributional set $P_\epsilon$ can be represented as the Cartesian product of a series of projected subsets:

$$P_\epsilon = \prod_{i \in I} P_\epsilon^i, \qquad (22)$$

where the projected subset on index $i$ is defined by

$$P_\epsilon^i = \left\{ \{q^l_{ij}\} : \sum_{j} q^l_{ij} = 1, \; q^l_{ij} \ge 0, \; \sum_{j \in J} \frac{(q^l_{ij} - p^l_{ij})^2}{p^l_{ij}} \le \epsilon, \; \forall l \in L, \; j \in J \right\}. \qquad (23)$$

Then, for given $i, j$, since the robust constraint is only associated with the variables $q^l_{ij}$, $l \in L$, we can further split the projected subset $P_\epsilon^i$ into $|J|$ subsets:

$$P_\epsilon^i = \prod_{j \in J} P_\epsilon^{ij} = \prod_{j \in J} \left\{ \{q^l_{ij}\} : \underline{q}^l_{ij} \le q^l_{ij} \le \bar{q}^l_{ij}, \; \forall l \in L \right\}, \qquad (24)$$

where $\bar{q}^l_{ij}$ and $\underline{q}^l_{ij}$ are computed by (15) and (16), respectively.

For the constraint $\sum_{l \in L} \alpha^l_j q^l_{ij} \ge 0$, $\forall \{q^l_{ij}\} \in P_\epsilon$, it is equal to the following constraint:

$$\begin{aligned}
& \sum_{l \in L} \alpha^l_j q^l_{ij} \ge 0, \quad \forall \{q^l_{ij}\} \in P_\epsilon^i \\
\Longleftrightarrow \; & \sum_{l \in L} \alpha^l_j q^l_{ij} \ge 0, \quad \forall \{q^l_{ij}\} \in P_\epsilon^{ij} \\
\Longleftrightarrow \; & \min \left\{ \sum_{l \in L} \alpha^l_j q^l_{ij} : \underline{q}^l_{ij} \le q^l_{ij} \le \bar{q}^l_{ij}, \; \forall l \in L \right\} \ge 0 \\
\Longleftrightarrow \; & \max \left\{ \sum_{l \in L} \left( \underline{q}^l_{ij} u^{l0}_{ij} - \bar{q}^l_{ij} v^{l0}_{ij} \right) : \alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0, \; u^{l0}_{ij}, v^{l0}_{ij} \ge 0, \; \forall l \in L \right\} \ge 0 \\
\Longleftrightarrow \; & \sum_{l \in L} \left( \underline{q}^l_{ij} u^{l0}_{ij} - \bar{q}^l_{ij} v^{l0}_{ij} \right) \ge 0, \quad \alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0, \quad u^{l0}_{ij}, v^{l0}_{ij} \ge 0, \quad \forall l \in L, \qquad (25)
\end{aligned}$$

where the last equivalence comes from the strong duality between these two linear programs.

For the constraint $\sum_{l \in L} \alpha^l_j q^l_{ij} \le 1$, $\forall \{q^l_{ij}\} \in P_\epsilon$, the same technique applies; thus we complete the proof.
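The strong-duality step used in the proof can be checked numerically on random data. This is our own sanity check, using the closed-form optima of both box-constrained linear programs:

```python
import numpy as np

# Numerical check of the duality step in (25):
#   min { sum_l alpha_l q_l : q_lo <= q <= q_hi }
# = max { sum_l (q_lo_l u_l - q_hi_l v_l) : alpha_l - u_l + v_l >= 0, u, v >= 0 }.
rng = np.random.default_rng(0)
alpha = rng.normal(size=5)
q_lo = rng.uniform(0.0, 0.4, size=5)
q_hi = q_lo + rng.uniform(0.0, 0.4, size=5)

# primal box LP in closed form: pick q_lo where alpha >= 0, q_hi otherwise
primal = np.sum(np.where(alpha >= 0, alpha * q_lo, alpha * q_hi))
# an optimal dual pair: u = [alpha]_+, v = [-alpha]_+
u, v = np.maximum(alpha, 0.0), np.maximum(-alpha, 0.0)
assert abs(primal - np.sum(q_lo * u - q_hi * v)) < 1e-12
```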

3.2. Robust Objective Function. In the RPC problem, the robust objective function is defined by an inner maximization problem. The following proposition shows that it can be transformed into a minimization problem over second order cones. To prove the following result, we utilize the conjugate function $d^*$ of the modified $\chi^2$-distance:

$$d^*(s) = \sup_{t \ge 0} \left\{ s t - d(t) \right\} = \frac{[s + 2]_+^2}{4} - 1, \qquad (26)$$


where the function $[\cdot]_+$ is defined as $[x]_+ = x$ if $x \ge 0$ and $[x]_+ = 0$ otherwise. For more details about conjugate functions, see [28].
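As a quick sanity check (our addition, with $d(t) = (t-1)^2$ the scalar building block of (11) that appears later in (33)), the closed form in (26) can be verified against a direct numerical maximization:

```python
import numpy as np

def d_star_closed(s):
    """Closed form (26): d*(s) = [s + 2]_+^2 / 4 - 1."""
    return np.maximum(s + 2.0, 0.0) ** 2 / 4.0 - 1.0

def d_star_numeric(s, t_max=100.0, n=2_000_001):
    """Direct evaluation of sup_{t >= 0} { s*t - (t - 1)^2 } on a fine grid."""
    t = np.linspace(0.0, t_max, n)
    return np.max(s * t - (t - 1.0) ** 2)

for s in [-5.0, -2.0, 0.0, 1.5, 3.0]:
    assert abs(d_star_closed(s) - d_star_numeric(s)) < 1e-6
```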

Proposition 3. The following inner maximization problem

$$\max \left\{ \sum_{j \in J} \sum_{i \in I} \left(1 - 2 y_{ij}\right) \sum_{l \in L} \alpha^l_j q^l_{ij} + |I| : \{q^l_{ij}\} \in P_\epsilon \right\} \qquad (27)$$

is equivalent to a second order cone programming:

$$\begin{aligned}
\min \quad & \sum_{i \in I} \sum_{l \in L} \left( \epsilon \lambda^l_i - \theta^l_i \right) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij} w^l_{ij} + |I| \\
\text{s.t.} \quad & \left( w^l_{ij}, \; z^l_{ij}, \; 2 \lambda^l_i + w^l_{ij} \right) \in L^3, \quad \forall i \in I, \; j \in J, \; l \in L, \\
& r^l_{ij} = \alpha^l_j \left(1 - 2 y_{ij}\right) + \theta^l_i, \quad \forall i \in I, \; l \in L, \; j \in J, \\
& z^l_{ij} \ge r^l_{ij} + 2 \lambda^l_i, \quad \lambda^l_i, z^l_{ij} \ge 0, \quad \forall i \in I, \; j \in J, \; l \in L, \qquad (28)
\end{aligned}$$

where a second order cone $L^{n+1}$ is defined as

$$L^{n+1} = \left\{ x \in \mathbb{R}^{n+1} : x_{n+1} \ge \sqrt{\sum_{i=1}^{n} x_i^2} \right\}. \qquad (29)$$

Proof. For given feasible $\alpha$ satisfying the robust constraints, it is straightforward to show that the inner maximization problem is equal to the following minimization problem (MP):

$$\begin{aligned}
\text{(MP)} \quad \min \quad & t \\
\text{s.t.} \quad & t \ge \sum_{j \in J} \sum_{i \in I} \left(1 - 2 y_{ij}\right) \sum_{l \in L} \alpha^l_j q^l_{ij} + |I|, \quad \forall \{q^l_{ij}\} \in P_\epsilon. \qquad (30)
\end{aligned}$$

The above constraint can be further reduced to the following constraint:

$$\max \left\{ \sum_{j \in J} \sum_{i \in I} \left(1 - 2 y_{ij}\right) \sum_{l \in L} \alpha^l_j q^l_{ij} + |I| - t : \{q^l_{ij}\} \in P_\epsilon \right\} \le 0. \qquad (31)$$

By assigning Lagrange multipliers $\theta^l_i \in \mathbb{R}$ and $\lambda^l_i \in \mathbb{R}_+$ to the constraints in the left optimization problem, we obtain the following Lagrange function:

$$L(q, \theta, \lambda) = \sum_{i \in I} \sum_{l \in L} \left( \epsilon \lambda^l_i - \theta^l_i \right) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} \left( r^l_{ij} q^l_{ij} - \lambda^l_i \frac{(q^l_{ij} - p^l_{ij})^2}{p^l_{ij}} \right) + |I| - t, \qquad (32)$$

where $r^l_{ij} = \alpha^l_j (1 - 2 y_{ij}) + \theta^l_i$. Its dual function is given as

$$\begin{aligned}
D(\theta, \lambda) = \; & \max_{q \ge 0} L(t, q, \theta, \lambda) \\
= \; & \sum_{i \in I} \sum_{l \in L} \left( \epsilon \lambda^l_i - \theta^l_i \right) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} \max_{q^l_{ij} \ge 0} \left( r^l_{ij} q^l_{ij} - \lambda^l_i p^l_{ij} \left( \frac{q^l_{ij} - p^l_{ij}}{p^l_{ij}} \right)^2 \right) + |I| - t \\
= \; & \sum_{i \in I} \sum_{l \in L} \left( \epsilon \lambda^l_i - \theta^l_i \right) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij} \max_{t' \ge 0} \left( r^l_{ij} t' - \lambda^l_i (t' - 1)^2 \right) + |I| - t \\
= \; & \sum_{i \in I} \sum_{l \in L} \left( \epsilon \lambda^l_i - \theta^l_i \right) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij} \lambda^l_i \max_{t' \ge 0} \left( \frac{r^l_{ij}}{\lambda^l_i} t' - (t' - 1)^2 \right) + |I| - t \\
= \; & \sum_{i \in I} \sum_{l \in L} \left( \epsilon \lambda^l_i - \theta^l_i \right) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij} \lambda^l_i \, d^* \! \left( \frac{r^l_{ij}}{\lambda^l_i} \right) + |I| - t, \qquad (33)
\end{aligned}$$

where the inner maximization uses the substitution $t' = q^l_{ij} / p^l_{ij}$.

Note that, for any feasible $\alpha$, the primal maximization problem (31) is bounded and has a strictly feasible solution $p^l_{ij}$; thus there is no duality gap between (31) and the following dual problem:

$$\begin{aligned}
& \min \left\{ D(\theta, \lambda) : \theta^l_i \in \mathbb{R}, \; \lambda^l_i \in \mathbb{R}_+, \; \forall i \in I, \; l \in L \right\} \\
\Longleftrightarrow \; & \min \quad \sum_{i \in I} \sum_{l \in L} \left( \epsilon \lambda^l_i - \theta^l_i \right) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij} w^l_{ij} + |I| - t \\
& \;\; \text{s.t.} \quad w^l_{ij} \ge \lambda^l_i \, d^* \! \left( \frac{r^l_{ij}}{\lambda^l_i} \right), \quad \forall i \in I, \; l \in L, \; j \in J, \\
& \qquad \;\; \theta^l_i \in \mathbb{R}, \; \lambda^l_i \in \mathbb{R}_+, \quad \forall i \in I, \; l \in L. \qquad (34)
\end{aligned}$$

Next we show that the constraint involving the conjugate function can be represented by second order cone constraints:

$$\begin{aligned}
\lambda^l_i \, d^* \! \left( \frac{r^l_{ij}}{\lambda^l_i} \right) \le w^l_{ij} \; & \Longleftrightarrow \; \lambda^l_i \left( -1 + \frac{1}{4} \left[ \frac{r^l_{ij}}{\lambda^l_i} + 2 \right]_+^2 \right) \le w^l_{ij} \\
& \Longleftrightarrow \; 4 \lambda^l_i \left( \lambda^l_i + w^l_{ij} \right) \ge \left[ r^l_{ij} + 2 \lambda^l_i \right]_+^2 \\
& \Longleftrightarrow \; 4 \lambda^l_i \left( \lambda^l_i + w^l_{ij} \right) \ge \left( z^l_{ij} \right)^2, \quad z^l_{ij} \ge 0, \quad z^l_{ij} \ge r^l_{ij} + 2 \lambda^l_i \\
& \Longleftrightarrow \; \left( w^l_{ij}, \; z^l_{ij}, \; 2 \lambda^l_i + w^l_{ij} \right) \in L^3, \quad z^l_{ij} \ge 0, \quad z^l_{ij} \ge r^l_{ij} + 2 \lambda^l_i. \qquad (35)
\end{aligned}$$
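The last equivalence in (35) rests on the algebraic identity $(2\lambda + w)^2 - w^2 = 4\lambda(\lambda + w)$; a quick random test (our addition) confirms that membership in $L^3$ is exactly the rotated-cone inequality:

```python
import numpy as np

# (2*lam + w)**2 - w**2 == 4*lam*(lam + w), so (w, z, 2*lam + w) in L^3
# is equivalent to 4*lam*(lam + w) >= z**2 together with 2*lam + w >= 0
rng = np.random.default_rng(1)
lam = rng.uniform(0.0, 5.0, size=1000)          # lam >= 0 as in (35)
w, z = rng.normal(size=1000), rng.normal(size=1000)
in_cone = 2 * lam + w >= np.sqrt(w**2 + z**2)
rotated = (4 * lam * (lam + w) >= z**2) & (2 * lam + w >= 0)
assert np.array_equal(in_cone, rotated)
```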

By reinjecting the above constraints into (MP), the robust objective function is equivalent to the following problem:

$$\begin{aligned}
\min \quad & t \\
\text{s.t.} \quad & \sum_{i \in I} \sum_{l \in L} \left( \epsilon \lambda^l_i - \theta^l_i \right) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij} w^l_{ij} + |I| \le t, \\
& \left( w^l_{ij}, \; z^l_{ij}, \; 2 \lambda^l_i + w^l_{ij} \right) \in L^3, \\
& z^l_{ij} \ge r^l_{ij} + 2 \lambda^l_i, \quad z^l_{ij}, \lambda^l_i \ge 0, \quad \forall i \in I, \; j \in J, \; l \in L, \\
& r^l_{ij} = \alpha^l_j \left(1 - 2 y_{ij}\right) + \theta^l_i, \quad \forall i \in I, \; l \in L, \; j \in J. \qquad (36)
\end{aligned}$$

By eliminating the variable $t$, we complete the proof.

Based on Lemma 2 and Proposition 3, we obtain our main result.

Proposition 4. The RPC problem can be solved as the following second order cone programming:

$$\begin{aligned}
\min \quad & \sum_{i \in I} \sum_{l \in L} \left( \epsilon \lambda^l_i - \theta^l_i \right) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij} w^l_{ij} + |I| \\
\text{s.t.} \quad & \left( w^l_{ij}, \; z^l_{ij}, \; 2 \lambda^l_i + w^l_{ij} \right) \in L^3, \quad \forall i \in I, \; j \in J, \; l \in L, \\
& r^l_{ij} = \alpha^l_j \left(1 - 2 y_{ij}\right) + \theta^l_i, \quad \forall i \in I, \; j \in J, \; l \in L, \\
& z^l_{ij} \ge r^l_{ij} + 2 \lambda^l_i, \quad \forall i \in I, \; j \in J, \; l \in L, \\
& \sum_{l \in L} \left( \underline{q}^l_{ij} u^{l0}_{ij} - \bar{q}^l_{ij} v^{l0}_{ij} \right) \ge 0, \quad \forall i \in I, \; j \in J, \\
& 1 + \sum_{l \in L} \left( \underline{q}^l_{ij} u^{l1}_{ij} - \bar{q}^l_{ij} v^{l1}_{ij} \right) \ge 0, \quad \forall i \in I, \; j \in J, \\
& \alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0, \quad \forall i \in I, \; j \in J, \; l \in L, \\
& v^{l1}_{ij} - \alpha^l_j - u^{l1}_{ij} \ge 0, \quad \forall i \in I, \; j \in J, \; l \in L, \\
& \lambda^l_i, \; z^l_{ij}, \; u^{l1}_{ij}, \; v^{l1}_{ij}, \; u^{l0}_{ij}, \; v^{l0}_{ij} \ge 0, \quad \forall i \in I, \; j \in J, \; l \in L, \\
& r^l_{ij}, \; \theta^l_i, \; w^l_{ij}, \; \alpha^l_j \in \mathbb{R}, \quad \forall i \in I, \; j \in J, \; l \in L. \qquad (37)
\end{aligned}$$
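For readers who want to experiment, the SOCP (37) can be written almost verbatim in a modeling language. The following is a minimal loop-based sketch in Python with cvxpy, our illustration rather than the authors' implementation (the paper solves the model with SDPT3 from Matlab); `p` and `y` are as before, `q_lo` and `q_hi` are the bounds from (16) and (15), and `eps` is the distributional-set radius.

```python
import numpy as np
import cvxpy as cp

def solve_rpc(p, y, q_lo, q_hi, eps):
    """SOCP (37): robust probability classifier (illustrative; slow for large data)."""
    I, J, L = p.shape                      # p, q_lo, q_hi: (I, J, L); y: (I, J) in {0, 1}
    alpha = cp.Variable((J, L))
    theta = cp.Variable((I, L))
    lam = cp.Variable((I, L), nonneg=True)
    w = [cp.Variable((J, L)) for _ in range(I)]
    z = [cp.Variable((J, L), nonneg=True) for _ in range(I)]
    u0 = [cp.Variable((J, L), nonneg=True) for _ in range(I)]
    v0 = [cp.Variable((J, L), nonneg=True) for _ in range(I)]
    u1 = [cp.Variable((J, L), nonneg=True) for _ in range(I)]
    v1 = [cp.Variable((J, L), nonneg=True) for _ in range(I)]
    cons = []
    for i in range(I):
        for j in range(J):
            for l in range(L):
                r = alpha[j, l] * (1 - 2 * y[i, j]) + theta[i, l]
                cons += [  # (w, z, 2*lam + w) in L^3 plus the linear constraints of (37)
                    cp.norm(cp.hstack([w[i][j, l], z[i][j, l]])) <= 2 * lam[i, l] + w[i][j, l],
                    z[i][j, l] >= r + 2 * lam[i, l],
                    alpha[j, l] - u0[i][j, l] + v0[i][j, l] >= 0,
                    v1[i][j, l] - alpha[j, l] - u1[i][j, l] >= 0,
                ]
            cons += [  # robust constraints (21), via the bounds (15)-(16)
                cp.sum(cp.multiply(q_lo[i, j], u0[i][j]) - cp.multiply(q_hi[i, j], v0[i][j])) >= 0,
                1 + cp.sum(cp.multiply(q_lo[i, j], u1[i][j]) - cp.multiply(q_hi[i, j], v1[i][j])) >= 0,
            ]
    obj = cp.sum(eps * lam - theta) + sum(cp.sum(cp.multiply(p[i], w[i])) for i in range(I)) + I
    cp.Problem(cp.Minimize(obj), cons).solve()
    return alpha.value
```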

4. Numerical Experiments on Real-World Applications

In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.

All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second order cone programs in our proposed method and the regularized SVM.

4.1. Data Sets. Lithology classification is one of the basic tasks of geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as Gamma coefficients, acoustic wave, striation density, and fusibility.

Here numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, there are 12 data sets used for binary classification problems and 8 data sets used for multiple classification problems. Each data set, based on a prespecified training rate $\gamma \in [0, 1]$, is randomly partitioned into two subsets, a training set and a test set, such that the size of the training set accounts for $\gamma$ of the total number of samples.

4.2. Experiment Design. The parameters in our models are chosen based on the size of the data set. The parameter $\epsilon$ depends on the number of classes and is defined as $\epsilon = \delta^2 / |J|$, where $\delta \in (0, 1)$. The choice of $\epsilon$ can be explained in this way: if there are $|J|$ classes and the training data are uniformly distributed, then for each probability $p^l_{ij} = 1/|J|$ its maximal variation range is between $p^l_{ij}(1 - \delta)$ and $p^l_{ij}(1 + \delta)$. The number of data intervals $K_l$ is defined as $K_l = |I| / (|J| \times K)$ such that, if the training data are uniformly distributed, then in each data interval there are $K$ samples in each class. In the following context, we set $\delta = 0.2$ and $K = 8$. A short sketch of these parameter choices is given below.
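This is a minimal helper, written under the reading $\epsilon = \delta^2 / |J|$ reconstructed above (the exponent was garbled in extraction), so treat the formula as an assumption:

```python
def rpc_parameters(n_samples, n_classes, delta=0.2, K=8):
    """Default parameter choices from Section 4.2 (reconstructed reading of eps)."""
    eps = delta ** 2 / n_classes                 # single-probability deviation sqrt(eps/|J|) = delta/|J|
    K_l = max(1, n_samples // (n_classes * K))   # about K samples per class per interval
    return eps, K_l

# e.g. 400 samples, 2 classes: eps = 0.02, K_l = 25 intervals
print(rpc_parameters(400, 2))
```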

We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (take the $j$th class, for example):

$$\begin{aligned}
\text{(RSVM)} \quad \min \quad & \sum_{i \in I} \xi_{ij} + \lambda_j \left\| w_j \right\| \\
\text{s.t.} \quad & \tilde{y}_{ij} \left( \sum_{l \in L} w^l_j x^l_i + b_j \right) \ge 1 - \xi_{ij}, \quad i \in I, \\
& \xi_{ij} \ge 0, \quad i \in I, \qquad (38)
\end{aligned}$$

where $\tilde{y}_{ij} = 2 y_{ij} - 1$ and $\lambda_j \ge 0$ is a regularization parameter. As pointed out by [8], $\lambda_j$ represents a trade-off between the number of training set errors and the amount of robustness with respect to spherical perturbations of the data points. To make a fair comparison, in the following experiments we test a series of $\lambda$ values and choose the one with the best performance. Note that, if $\lambda_j = 0$, we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.
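For completeness, the comparison model (38) is an ordinary soft-margin SVM with a norm penalty; a minimal one-vs-rest sketch (our illustration, not the authors' code) is:

```python
import numpy as np
import cvxpy as cp

def solve_rsvm(X, y_j, lam):
    """Regularized SVM (38) for one class j (one-vs-rest).

    X   : (n_samples, n_features) data matrix
    y_j : (n_samples,) labels in {+1, -1}, i.e. 2*y_ij - 1
    lam : regularization parameter lambda_j >= 0
    """
    n, d = X.shape
    w, b = cp.Variable(d), cp.Variable()
    xi = cp.Variable(n, nonneg=True)
    margin = cp.multiply(y_j, X @ w + b)
    cp.Problem(cp.Minimize(cp.sum(xi) + lam * cp.norm(w, 2)),
               [margin >= 1 - xi]).solve()
    return w.value, b.value
```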


Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set (accuracy in %; an asterisk marks the best test accuracy in each row).

tr (%) | RSVM Train | RSVM Test | NBC Train | NBC Test | RPC Train | RPC Test
50     | 90.7       | 88.2      | 63.9      | 66.2     | 88.4      | 90.5*
55     | 89.9       | 88.6      | 69.1      | 72.8     | 89.5      | 89.9*
60     | 89.0       | 85.0      | 70.3      | 72.1     | 91.3      | 86.4*
65     | 86.3       | 85.9      | 72.1      | 72.8     | 88.0      | 92.5*
70     | 92.3       | 84.1      | 70.3      | 75.7     | 90.8      | 86.3*
75     | 88.8       | 87.9      | 74.2      | 74.6     | 88.7      | 91.6*
80     | 88.7       | 93.8*     | 90.0      | 87.5     | 88.3      | 93.3
85     | 89.5       | 89.3      | 93.4      | 89.6     | 89.2      | 91.0*
90     | 89.5       | 88.4      | 93.3      | 95.8*    | 89.2      | 92.6

Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set (same layout as Table 1).

tr (%) | RSVM Train | RSVM Test | NBC Train | NBC Test | RPC Train | RPC Test
50     | 91.4       | 84.8      | 76.5      | 68.9     | 91.3      | 87.5*
55     | 92.5       | 86.6      | 68.0      | 77.0     | 92.0      | 90.3*
60     | 89.8       | 86.1      | 72.9      | 73.8     | 88.9      | 90.9*
65     | 91.0       | 82.3      | 80.5      | 81.6     | 89.8      | 92.9*
70     | 86.8       | 95.5*     | 83.4      | 89.8     | 88.4      | 93.7
75     | 89.4       | 85.2      | 85.9      | 79.5     | 89.7      | 93.5*
80     | 91.8       | 80.8      | 88.1      | 79.9     | 89.7      | 91.1*
85     | 88.3       | 89.9      | 89.9      | 92.8     | 90.8      | 97.1*
90     | 88.5       | 90.2      | 88.8      | 94.2     | 90.9      | 97.2*


4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for binary classification problems using cross-validation methods. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].

Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set based on the parameter tr, which varies from 50% to 90%. The best classification accuracy on a test set among the three methods is marked with an asterisk.

Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, for most of the cases, RSVM has the highest classification accuracy on the training sets, but its performance on the test sets is unsatisfactory. For most of the cases, the proposed RPC provides the highest classification accuracy on the test sets. NBC provides better performance on the test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can provide better performance on the test sets than on the training sets; thus it can avoid the "overlearning" phenomenon.

To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, that is, T41-T45 and T61-T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for the data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As shown in Tables 1 and 2, the robustness of the proposed RPC guarantees its scalability on the test sets.

4.4. Test on Multiple Classification. In this subsection, we test the performance of RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter $\lambda$, we run a set of RSVMs with $\lambda$ varying from 0 to a big enough number and select the one with the best performance on the test sets.

Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separation hyperplane for binary classification problems, while RPC is more robust to extend to solve multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.


Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on the other data sets when tr = 70% (accuracy in %; an asterisk marks the best test accuracy in each row).

Data set | RSVM Train | RSVM Test | NBC Train | NBC Test | RPC Train | RPC Test
T41      | 62.0       | 59.7      | 82.4      | 78.5     | 77.9      | 83.5*
T42      | 87.0       | 82.2      | 84.1      | 83.1     | 80.5      | 85.3*
T43      | 68.0       | 61.2      | 80.2      | 75.4     | 85.5      | 86.9*
T44      | 91.3       | 83.9      | 77.9      | 86.8     | 88.8      | 90.5*
T45      | 86.5       | 87.0      | 93.2      | 91.0*    | 84.0      | 89.1
T61      | 80.6       | 79.0      | 80.5      | 83.0     | 83.6      | 87.8*
T62      | 71.4       | 66.5      | 86.9      | 85.4*    | 86.3      | 85.4*
T63      | 63.7       | 69.5      | 89.6      | 89.1*    | 82.2      | 84.4
T64      | 88.2       | 86.7      | 97.0      | 96.9*    | 93.4      | 95.5
T65      | 75.0       | 63.4      | 79.7      | 81.5     | 90.5      | 92.9*

Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set (accuracy in %; an asterisk marks the best test accuracy in each row).

Data set | RSVM Train | RSVM Test | NBC Train | NBC Test | RPC Train | RPC Test
M1       | 65.4       | 68.2      | 72.7      | 73.7     | 79.1      | 77.4*
M2       | 76.9       | 75.3      | 82.6      | 74.8     | 81.7      | 80.9*
M3       | 57.9       | 69.9      | 74.8      | 87.4     | 95.4      | 92.0*
M4       | 70.4       | 64.1      | 97.1      | 92.3     | 95.4      | 92.3*
M5       | 77.4       | 71.3      | 89.4      | 88.1*    | 92.0      | 88.0
M6       | 75.7       | 70.5      | 74.1      | 79.4     | 86.4      | 80.8*

[Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set. Accuracy on the training set (%) is plotted against the training rate for the three methods.]


Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that, for most of the cases, RPC provides the highest accuracy among the three methods. The accuracy of RSVM outperforms that of NBC on the Y5 test set, while the latter outperforms the former on the T1 test set.

[Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set. Accuracy on the test set (%) is plotted against the training rate.]

To further test the performance of RPC on multiple classification problems, we carry out more experiments on the data sets M1-M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always


[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set. Accuracy on the training set (%) is plotted against the training rate.]

[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set. Accuracy on the test set (%) is plotted against the training rate.]

provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).

From the tested real-life application, we conclude that the proposed RPC has the robustness to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.

5. Conclusion

In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified $\chi^2$-distance. We assume that the true distribution lies in the constructed distributional set centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all the possible distributions belonging to the distributional set.

Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone programming based on the conic duality theorem. Thus our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides a higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also has a promising performance for multiple classification problems.

There are still many important extensions of our model. Other forms of loss function, such as the mean squared error function and Hinge loss functions, should be studied to obtain tractable reformulations, and the resulting models may provide better performances. Probability models considering joint probability distribution information are also an interesting research direction.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.

[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223-228, AAAI Press, Menlo Park, Calif, USA, July 1992.

[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.

[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.

[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209-226, 2001.

[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.

[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.

[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.

[9] A. Ben-Tal and A. Nemirovski, "Robust optimization: methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453-480, 2002.

[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464-501, 2011.

[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801-807, 2001.

[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555-582, 2003.

[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.

[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558-563, IEEE, July 2004.

[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253-1286, 2004.

[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175-1178, June 2004.

[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191-201, 2008.

[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123-136, 2007.

[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691-710, 2013.

[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.

[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.

[22] B. Scholkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, UK, 2002.

[23] T. Hastie, R. Tibshirani, and J. J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.

[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85-90, 1986.

[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.

[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625-653, 1999.

[27] K. C. Toh, R. H. Tutuncu, and M. J. Todd, "On the implementation and usage of SDPT3: a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.

[28] A. Ben-Tal, D. D. Hertog, A. D. Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341-357, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article A Robust Probability Classifier Based on the … · 2020. 1. 13. · Research Article A Robust Probability Classifier Based on the Modified 2-Distance YongzhiWang,

4 Mathematical Problems in Engineering

The modified 1205942-distance 119889(sdot sdot) R119898 times R119898 rarr 119877 is

used tomeasure the distance between twodiscrete probabilitydistribution vectors in statistics For given 119901 = (119901

1 119901

119898)119879

and 119902 = (1199021 119902

119898)119879 it is defined as

119889 (119902 119901) =

119898

sum

119895=1

(119902119895

minus 119901119895)2

119901119895

(11)

Based on the modified 1205942-distance we present the following

class-conditional probability distributional set

119875120598

=

119902119897

119894119895 sum

119895

119902119897

119894119895= 1 119902119897

119894119895ge 0 sum

119895isin119869

(119902119897

119894119895minus 119901119897

119894119895)2

119901119897

119894119895

le 120598

forall119894 isin 119868 119897 isin 119871 119895 isin 119869

(12)

where 119901119897

119894119895is the nominal class-conditional distribution prob-

ability for the 119894th sample belonging to the 119895th class based onthe 119897th feature and the prespecified parameter 120598 is used tocontrol the size of the set

To design a robust classifier we need to consider the effectof data uncertainty on the objective function and constraintsThe robust objective function is to minimize the worst-case loss function value over all the possible distributionsin the distributional set 119875

120598 the robust constraints ensure

that all the original constraints should also be satisfied forany distribution in 119875

120598 Thus the robust probability classifier

problem is of the following form

(RPC) min

maxsum

119895isin119869

sum

119894isin119868

(1 minus 2119910119894119895

) sum

119897isin119871

120572119897

119895119902119897

119894119895

+ |119868| 119902119897

119894119895 isin 119875120598

st 0 le sum

119897isin119871

120572119897

119895119902119897

119894119895le 1 forall 119902

119897

119894119895 isin 119875120598

forall119894 119895

(13)

Note that the above optimization problem has an infinitenumber of robust constraints and its objective function is alsoan embedded subproblem We will show how to solve suchminimax optimization problem in Section 3

23 Construct the Distributional Set To get the distributionalset 119875120598 we need to define the parameter 120598 and the nominal

probability 119901119897

119894119895 The selection of parameter 120598 is application

based and we will discuss this issue in the numerical exper-iment section next we will provide a procedure to calculate119901119897

119894119895For the 119897th feature the following procedure takes an

integer 119870119897indicating the number of data intervals as an input

andwill output the estimated probability119901119897

119894119895of the 119894th sample

belonging to the 119895th class

(1) Sort samples in the increased order and divide theminto 119870

119897intervals such that each interval has at least

lfloor|119868|119870119897rfloor number of samples Denote the 119896th interval

by Δ119897119896

(2) Calculate the total number of samples in the 119895-class119873119895 the total number of samples in the 119896th interval

119873119897119896 and the total number of samples belonging to the

119895-class in the 119896th interval 119873119897119896119895

(3) For the 119894th sample if it falls into the 119896th interval the

class-conditional probability 119901119897

119894119895is calculated by

119901119897

119894119895= Prob (119894 isin 119895 | 119909

119894119897isin Δ119897119896

)

=Prob (119894 isin 119895 119909

119894119897isin Δ119897119896

)

Prob (119909119894119897

isin Δ119897119896

)

=Prob (119894 isin 119895)Prob (119909

119894119897isin Δ119897119896

| 119894 isin 119895)

sum1198951015840isin119869Prob (119894 isin 119895

1015840)Prob (119909

119894119897isin Δ119897119896

| 119894 isin 1198951015840)

=

(119873119895 |119868|) sdot (119873

119897119896119895119873119895)

sum1198951015840isin119869

(1198731015840

119895 |119868|) sdot (119873

11989711989611989510158401198731015840

119895)

=

119873119897119896119895

119873119897119896

(14)

Note that from the definition of 119875120598 we easily compute the

upper bound 119902119897

119894119895and lower bound 119902

119897

119894119895for the true class-

conditional probability 119902119897

119894119895as follows

119902119897

119894119895= max

119902119897

119894119895 sum

119904

119902119897

119894119904= 1

sum

119904isin119869

(119902119897

119894119904minus 119901119897

119894119904)2

119901119897

119894119904

le 120598 119902119897

119894119904ge 0 forall119904 isin 119869

(15)

119902119897

119894119895= min

119902119897

119894119895 sum

119904

119902119897

119894119904= 1

sum

119904isin119869

(119902119897

119894119904minus 119901119897

119894119904)2

119901119897

119894119904

le 120598 119902119897

119894119904ge 0 forall119904 isin 119869

(16)

The above problems can be efficiently solved by a secondorder cone solver such as SeDuMi [26] or SDPT3 [27]

3 Solution Methods for RPC

In this section we first reduce the infinite number of robustconstraints to a finite set of linear constraints and then trans-form the inner robust objective function into a minimizationproblem by the conic duality theorem At last we obtainan equivalent computable second order cone programmingfor the RPC problem The following analysis is based on thestrong duality result in [8]

Mathematical Problems in Engineering 5

Consider a conic program of the following form

(CP) min 119888119879119909

st 119860119894119909 minus 119887119894isin 119862119894 forall119894 = 1 119898

119860119909 = 119887

(17)

and its dual problem

(DP) max 119887119879119911 +

119898

sum

119894=1

119887119879

119894119910119894

st 119860lowast119911 +

119898

sum

119894=1

119860lowast

119894119910119894= 119888

119910119894isin 119862lowast

119894 forall119894 = 1 119898

(18)

where 119862119894is a cone in R119899119894 and 119862

lowast

119894is its dual cone defined by

119862lowast

119894= 119910 isin R

119899119894 119910119879119909 ge forall119909 isin 119862

119894 (19)

A conic program is called strictly feasible if it admits a feasiblesolution 119909 such that 119860

119894119909 minus 119887119894

isin int119862119894 forall119894 = 1 119898 where

int119862119894denotes the interior point set of 119862

119894

Lemma 1 (see [8]) If one of the problems (CP) and (DP) isstrictly feasible and bounded then the other problem is solvableand (CP) = (DP) in the sense that both have the same optimalobjective function value

31 Robust Constraints The following lemma provides anequivalent characterization for the infinite number of robustconstraints in terms of a finite set of linear constraints whichcan be solved efficiently

Lemma 2 For given 119894 119895 the robust constraint

0 le sum

119897isin119871

120572119897

119895119901119897

119894119895le 1 forall 119902

119897

119894119895 isin 119875120598 (20)

is equal to the following constraints

sum

119897isin119871

(119902119897

1198941198951199061198970

119894119895minus 119902119897

119894119895V1198970119894119895

) ge 0

120572119897

119894119895minus 1199061198970

119894119895+ V1198970119894119895

ge 0 1199061198970

119894119895 V1198970119894119895

ge 0 forall119897 isin 119871

1 + sum

119897isin119871

(119902119897

1198941198951199061198971

119894119895minus 119902119897

119894119895V1198971119894119895

) ge 0

V1198971119894119895

minus 120572119897

119894119895minus 1199061198971

119894119895ge 0 119906

1198971

119894119895 V1198971119894119895

ge 0 forall119897 isin 119871

(21)

Proof First note that the distributional set 119875120598119894can be repre-

sented as theCartesian product of a series of projected subsets

119875120598

= prod

119894isin119868

119875120598119894

(22)

where the projected subset on index 119894 is defined by

119875120598119894

=

119902119897

119894119895 sum

119895

119902119897

119894119895= 1 119902119897

119894119895ge 0

sum

119895isin119869

(119902119897

119894119895minus 119901119897

119894119895)2

119901119897

119894119895

le 120598 forall119897 isin 119871 119895 isin 119869

(23)

Then for given 119894 119895 since the robust constraint is onlyassociated with variables 119902

119897

119894119895 119897 isin 119871 we can further split the

projected subset 119875120598119894into |119869| subsets

119875120598119894

= prod

119895isin119869

119875120598119894119895

= prod

119895isin119869

119902119897

119894119895 119902119897

119894119895le 119902119897

119894119895le 119902119897

119894119895 forall119897 isin 119871 (24)

where 119902119897

119894119895and 119902119897

119894119895are computed by (15) and (16) respectively

For constraint sum119897isin119871

120572119897

119895119901119897

119894119895ge 0 forall119902

119897

119894119895 isin 119875120598 it is equal to

the following constraint

sum

119897isin119871

120572119897

119895119901119897

119894119895ge 0 forall 119902

119897

119894119895 isin 119875120598119894

lArrrArr sum

119897isin119871

120572119897

119895119901119897

119894119895ge 0 forall 119902

119897

119894119895 isin 119875120598119894119895

lArrrArr minsum

119897isin119871

120572119897

119895119901119897

119894119895 119902119897

119894119895le 119902119897

119894119895le 119902119897

119894119895 forall119897 isin 119871 ge 0

lArrrArr maxsum

119897isin119871

(119902119897

1198941198951199061198970

119894119895minus 119902119897

119894119895V1198970119894119895

)

120572119897

119894119895minus 1199061198970

119894119895+ V1198970119894119895

ge 0 1199061198970

119894119895 V1198970119894119895

ge 0 forall119897 isin 119871 ge 0

lArrrArr sum

119897isin119871

(119902119897

1198941198951199061198970

119894119895minus 119902119897

119894119895V1198970119894119895

) ge 0

120572119897

119894119895minus 1199061198970

119894119895+ V1198970119894119895

ge 0 1199061198970

119894119895 V1198970119894119895

ge 0 forall119897 isin 119871

(25)

where the last equivalence comes from the strong dualitybetween these two linear programs

For the constraint sum119897isin119871

120572119897

119895119901119897

119894119895le 1 forall119902

119897

119894119895 isin 119875120598 the same

technique applies thus we complete the proof

32 Robust Objective Function In the RPC problem therobust objective function is defined by an innermaximizationproblem The following proposition shows that it can betransformed into a minimization problem over second ordercones To prove the following result we utilize the concept ofconjugate function 119889

lowast of the modified 1205942-distance

119889lowast

(119904) = sup119905ge0

119904119905 minus 119889 (119905) =[119904 + 2]

2

+

4minus 1 (26)

6 Mathematical Problems in Engineering

where the function [sdot]+is defined as [119909]

+= 119909 if 119909 ge

0 otherwise [119909]+

= 0 For more details about conjugatefunctions see [28]

Proposition 3 The following inner maximization problem

maxsum

119895isin119869

sum

119894isin119868

(1 minus 2119910119894119895

) sum

119897isin119871

120572119897

119895119902119897

119894119895+ |119868| 119902

119897

119894119895 isin 119875120598 (27)

is equivalent to a second order cone programming

min sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894) + sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895119908119897

119894119895+ |119868|

st (

119908119897

119894119895

119911119897

119894119895

2120582119897

119894+ 119908119897

119894119895

) isin 1198713 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119906119897

119894119895= 120572119897

119895(1 minus 2119868

119894119895) + 120579119897

119894 forall119894 isin 119868 119897 isin 119871 119895 isin 119869

119911119897

119894119895ge119903119897

119894119895+2120582119897

119894 120582119897

119894119895 119911119897

119894119895ge0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

(28)

where a second order cone 119871119899+1 is defined as

119871119899+1

=

119909 isin R119899+1

119909119899+1

ge radic

119899

sum

119894=1

1199092

119894

(29)

Proof For given feasible 120572 satisfying the robust constraints itis straightforward to show that the inner maximum problemis equal to the following minimization problem (MP)

(MP) min 119905

st 119905 ge sum

119895isin119869

sum

119894isin119868

(1 minus 2119910119894119895

) sum

119897isin119871

120572119897

119895119902119897

119894119895 + |119868|

forall 119902119897

119894119895 isin 119875120598

(30)

The above constraint can be further reduced to the followingconstraint

max

sum

119895isin119869

sum

119894isin119868

(1 minus 2119910119894119895

) sum

119897isin119871

120572119897

119895119902119897

119894119895

+ |119868| minus 119905 forall 119902119897

119894119895 isin 119875120598 le 0

(31)

By assigning Lagrange multipliers 120579119897

119894isin R and 120582

119897

119894isin R+

to the constraints in the left optimization problem we obtainthe following Lagrange function

119871 (119902 120579 120582) = sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894)

+ sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

(119903119897

119894119895119902119897

119894119895minus 120582119897

119894

(119902119897

119894119895minus 119901119897

119894119895)2

119901119897

119894119895

)

+ |119868| minus 119905

(32)

where 119903119897

119894119895= 120572119897

119895(1 minus 2119910

119894119895) + 120579119897

119894 Its dual function is given as

$$\begin{aligned}
D(\theta,\lambda)&=\max_{q\ge 0}L(q,\theta,\lambda)\\
&=\sum_{i\in I}\sum_{l\in L}\big(\epsilon\lambda_i^l-\theta_i^l\big)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}\max_{q_{ij}^l\ge 0}\Big(r_{ij}^l q_{ij}^l-\lambda_i^l p_{ij}^l\Big(\frac{q_{ij}^l-p_{ij}^l}{p_{ij}^l}\Big)^2\Big)+|I|-t\\
&=\sum_{i\in I}\sum_{l\in L}\big(\epsilon\lambda_i^l-\theta_i^l\big)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l\max_{s\ge 0}\big(r_{ij}^l s-\lambda_i^l(s-1)^2\big)+|I|-t\\
&=\sum_{i\in I}\sum_{l\in L}\big(\epsilon\lambda_i^l-\theta_i^l\big)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l\lambda_i^l\max_{s\ge 0}\Big(\frac{r_{ij}^l}{\lambda_i^l}\,s-(s-1)^2\Big)+|I|-t\\
&=\sum_{i\in I}\sum_{l\in L}\big(\epsilon\lambda_i^l-\theta_i^l\big)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l\lambda_i^l\,d^*\!\Big(\frac{r_{ij}^l}{\lambda_i^l}\Big)+|I|-t,
\end{aligned}\tag{33}$$

where the third line uses the substitution $s=q_{ij}^l/p_{ij}^l$.
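The closed form $d^*(s)=[s+2]_+^2/4-1$ of the conjugate function used in the last step (see (26)) can be verified numerically. A minimal sketch, assuming the modified $\chi^2$-distance generator $d(t)=(t-1)^2$:

```python
import numpy as np

def d_star(s):
    # Conjugate of d(t) = (t - 1)^2 over t >= 0: [s + 2]_+^2 / 4 - 1, cf. (26).
    return np.maximum(s + 2.0, 0.0) ** 2 / 4.0 - 1.0

t = np.linspace(0.0, 50.0, 2_000_001)            # dense grid over t >= 0
for s in (-3.0, -1.0, 0.5, 2.0):
    grid_sup = np.max(s * t - (t - 1.0) ** 2)    # sup_{t >= 0} (s t - d(t))
    assert abs(grid_sup - d_star(s)) < 1e-6, s
print("closed-form conjugate verified")
```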

Note that, for any feasible $\alpha$, the primal maximization problem (31) is bounded and has a strictly feasible solution $p_{ij}^l$; thus there is no duality gap between (31) and the following dual problem:

$$\min\big\{D(\theta,\lambda)\;:\;\theta_i^l\in\mathbb{R},\ \lambda_i^l\in\mathbb{R}_+,\ \forall i\in I,\ l\in L\big\}$$

$$\Longleftrightarrow\qquad\begin{aligned}
\min\;& \sum_{i\in I}\sum_{l\in L}\big(\epsilon\lambda_i^l-\theta_i^l\big)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|-t\\
\text{s.t.}\;& w_{ij}^l\ge\lambda_i^l\,d^*\!\Big(\frac{r_{ij}^l}{\lambda_i^l}\Big), &&\forall i\in I,\ l\in L,\ j\in J,\\
& \theta_i^l\in\mathbb{R},\ \lambda_i^l\in\mathbb{R}_+, &&\forall i\in I,\ l\in L.
\end{aligned}\tag{34}$$

Next we show that the constraint involving the conjugate function can be represented by second order cone constraints:

$$\begin{aligned}
\lambda_i^l\,d^*\!\Big(\frac{r_{ij}^l}{\lambda_i^l}\Big)\le w_{ij}^l
&\Longleftrightarrow\ \lambda_i^l\Big(-1+\frac{1}{4}\Big[\frac{r_{ij}^l}{\lambda_i^l}+2\Big]_+^2\Big)\le w_{ij}^l\\
&\Longleftrightarrow\ 4\lambda_i^l\big(\lambda_i^l+w_{ij}^l\big)\ge\big[r_{ij}^l+2\lambda_i^l\big]_+^2\\
&\Longleftrightarrow\ 4\lambda_i^l\big(\lambda_i^l+w_{ij}^l\big)\ge\big(z_{ij}^l\big)^2,\quad z_{ij}^l\ge 0,\quad z_{ij}^l\ge r_{ij}^l+2\lambda_i^l\\
&\Longleftrightarrow\ \big(w_{ij}^l,\;z_{ij}^l,\;2\lambda_i^l+w_{ij}^l\big)\in L^3,\quad z_{ij}^l\ge 0,\quad z_{ij}^l\ge r_{ij}^l+2\lambda_i^l,
\end{aligned}\tag{35}$$

where the last step uses the identity $\big(2\lambda_i^l+w_{ij}^l\big)^2-\big(w_{ij}^l\big)^2=4\lambda_i^l\big(\lambda_i^l+w_{ij}^l\big)$.
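This equivalence can be sanity-checked numerically: choosing $w=\lambda\,d^*(r/\lambda)$ makes the cone constraint in (35) tight. A small sketch under the same closed form for $d^*$:

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    lam = rng.uniform(0.1, 5.0)                    # lambda_i^l > 0
    r = rng.uniform(-10.0, 10.0)                   # r_ij^l
    w = lam * (np.maximum(r / lam + 2.0, 0.0) ** 2 / 4.0 - 1.0)  # w = lam d*(r/lam)
    z = max(r + 2.0 * lam, 0.0)                    # z >= 0 and z >= r + 2 lam
    # (w, z, 2 lam + w) lies on the boundary of L^3: ||(w, z)|| = 2 lam + w
    assert abs(np.hypot(w, z) - (2.0 * lam + w)) < 1e-8
print("SOC reformulation of (35) is tight at w = lam * d*(r / lam)")
```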

Substituting the above constraints back into (MP), the robust objective function is equivalent to the following problem:

$$\begin{aligned}
\min\;& t\\
\text{s.t.}\;& \sum_{i\in I}\sum_{l\in L}\big(\epsilon\lambda_i^l-\theta_i^l\big)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|\le t,\\
& \big(w_{ij}^l,\;z_{ij}^l,\;2\lambda_i^l+w_{ij}^l\big)\in L^3, &&\forall i\in I,\ j\in J,\ l\in L,\\
& z_{ij}^l\ge r_{ij}^l+2\lambda_i^l,\quad z_{ij}^l,\ \lambda_i^l\ge 0, &&\forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l=\alpha_j^l\big(1-2y_{ij}\big)+\theta_i^l, &&\forall i\in I,\ j\in J,\ l\in L.
\end{aligned}\tag{36}$$

By eliminating the variable $t$, we complete the proof.

Based on Lemma 2 and Proposition 3, we obtain our main result.

Proposition 4. The RPC problem can be solved as the following second order cone program:

$$\begin{aligned}
\min\;& \sum_{i\in I}\sum_{l\in L}\big(\epsilon\lambda_i^l-\theta_i^l\big)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|\\
\text{s.t.}\;& \big(w_{ij}^l,\;z_{ij}^l,\;2\lambda_i^l+w_{ij}^l\big)\in L^3, &&\forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l=\alpha_j^l\big(1-2y_{ij}\big)+\theta_i^l, &&\forall i\in I,\ j\in J,\ l\in L,\\
& z_{ij}^l\ge r_{ij}^l+2\lambda_i^l, &&\forall i\in I,\ j\in J,\ l\in L,\\
& \sum_{l\in L}\big(\underline{q}_{ij}^l u_{ij}^{l0}-\bar{q}_{ij}^l v_{ij}^{l0}\big)\ge 0, &&\forall i\in I,\ j\in J,\\
& 1+\sum_{l\in L}\big(\underline{q}_{ij}^l u_{ij}^{l1}-\bar{q}_{ij}^l v_{ij}^{l1}\big)\ge 0, &&\forall i\in I,\ j\in J,\\
& \alpha_j^l-u_{ij}^{l0}+v_{ij}^{l0}\ge 0, &&\forall i\in I,\ j\in J,\ l\in L,\\
& v_{ij}^{l1}-\alpha_j^l-u_{ij}^{l1}\ge 0, &&\forall i\in I,\ j\in J,\ l\in L,\\
& \lambda_i^l,\ z_{ij}^l,\ u_{ij}^{l1},\ v_{ij}^{l1},\ u_{ij}^{l0},\ v_{ij}^{l0}\ge 0, &&\forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l,\ \theta_i^l,\ w_{ij}^l,\ \alpha_j^l\in\mathbb{R}, &&\forall i\in I,\ j\in J,\ l\in L,
\end{aligned}\tag{37}$$

where $\underline{q}_{ij}^l$ and $\bar{q}_{ij}^l$ are the lower and upper bounds computed by (15) and (16).
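To make Proposition 4 concrete, the following sketch assembles (37) for a tiny synthetic instance. The experiments in Section 4 solve such programs in Matlab with SDPT3 [27]; here we use Python with CVXPY purely for illustration, and the nominal probabilities $p_{ij}^l$ and the bounds of (15)–(16) are replaced by hypothetical stand-ins derived from the $\chi^2$ ball radius $\epsilon$.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
nI, nJ, nL = 6, 2, 3                      # |I| samples, |J| classes, |L| features
eps = 0.2 ** 2 / nJ                       # epsilon = delta^2 / |J| with delta = 0.2
y = np.zeros((nI, nJ))
y[np.arange(nI), rng.integers(0, nJ, nI)] = 1.0

# Nominal probabilities P[l][i, j]; each row sums to 1 over j.
P = [rng.dirichlet(np.ones(nJ), size=nI) for _ in range(nL)]
# Hypothetical stand-ins for the bounds (15)-(16): |q - p| <= sqrt(eps * p).
Qlo = [np.clip(Pl - np.sqrt(eps * Pl), 0.0, 1.0) for Pl in P]
Qhi = [np.clip(Pl + np.sqrt(eps * Pl), 0.0, 1.0) for Pl in P]

a     = cp.Variable((nL, nJ))             # alpha_j^l
theta = cp.Variable((nI, nL))             # theta_i^l
lam   = cp.Variable((nI, nL), nonneg=True)
mk    = lambda nonneg=False: [cp.Variable((nI, nJ), nonneg=nonneg) for _ in range(nL)]
w, r, z = mk(), mk(), mk(True)
u0, v0, u1, v1 = mk(True), mk(True), mk(True), mk(True)

cons = []
for i in range(nI):
    for j in range(nJ):
        for l in range(nL):
            cons += [
                r[l][i, j] == a[l, j] * (1 - 2 * y[i, j]) + theta[i, l],
                z[l][i, j] >= r[l][i, j] + 2 * lam[i, l],
                # (w, z, 2*lam + w) in L^3:
                cp.norm(cp.hstack([w[l][i, j], z[l][i, j]])) <= 2 * lam[i, l] + w[l][i, j],
                a[l, j] - u0[l][i, j] + v0[l][i, j] >= 0,
                v1[l][i, j] - a[l, j] - u1[l][i, j] >= 0,
            ]
        cons += [
            sum(Qlo[l][i, j] * u0[l][i, j] - Qhi[l][i, j] * v0[l][i, j] for l in range(nL)) >= 0,
            1 + sum(Qlo[l][i, j] * u1[l][i, j] - Qhi[l][i, j] * v1[l][i, j] for l in range(nL)) >= 0,
        ]

obj = cp.sum(eps * lam - theta) + sum(cp.sum(cp.multiply(P[l], w[l])) for l in range(nL)) + nI
prob = cp.Problem(cp.Minimize(obj), cons)
prob.solve()                              # any SOCP-capable solver works
print(prob.status, prob.value)
```

The per-triple loops are acceptable here only because the instance is tiny; any SOCP-capable solver (SDPT3 [27], SeDuMi [26], or CVXPY's bundled solvers) handles the resulting program.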

4. Numerical Experiments on Real-World Applications

In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.

All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second order cone programs in our proposed method and in the regularized SVM.

4.1. Data Sets. Lithology classification is one of the basic tasks of geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as gamma coefficients, acoustic wave, striation density, and fusibility.

Here, numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, there are 12 data sets used for binary classification problems and 8 data sets used for multiple classification problems. Each data set is randomly partitioned, based on a prespecified training rate $\gamma\in[0,1]$, into a training set and a test set such that the training set accounts for a fraction $\gamma$ of the total number of samples.
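The $\gamma$-partition is a standard random split; a minimal sketch (the borehole data are not public, so X, y, and gamma below are placeholders):

```python
import numpy as np

def gamma_split(X, y, gamma, seed=0):
    """Randomly assign a fraction gamma of the samples to the training set."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_train = int(round(gamma * len(X)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], y[tr], X[te], y[te]
```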

4.2. Experiment Design. The parameters in our models are chosen based on the size of the data set. The parameter $\epsilon$ depends on the number of classes and is defined as $\epsilon=\delta^2/|J|$, where $\delta\in(0,1)$. The choice of $\epsilon$ can be explained in this way: if there are $|J|$ classes and the training data are uniformly distributed, then each probability $p_{ij}^l=1/|J|$ has a maximal variation range between $p_{ij}^l(1-\delta)$ and $p_{ij}^l(1+\delta)$. The number of data intervals $K_l$ is defined as $K_l=|I|/(|J|\times K)$ such that, if the training data are uniformly distributed, each data interval contains $K$ samples of each class. In the following, we set $\delta=0.2$ and $K=8$.
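As a worked instance of these rules (the sample count $|I|$ below is hypothetical): with $p=1/|J|$, the single-coordinate radius $\sqrt{\epsilon p}$ of the $\chi^2$ ball equals $\delta p$, which matches the stated variation range.

```python
import numpy as np

delta, K, n_classes = 0.2, 8, 4      # delta, K as in the text; |J| = 4 is made up
n_samples = 960                      # |I|, hypothetical
eps = delta ** 2 / n_classes         # epsilon = delta^2 / |J|      -> 0.01
K_l = n_samples / (n_classes * K)    # K_l = |I| / (|J| * K)        -> 30.0 intervals
p = 1.0 / n_classes                  # uniform nominal probability
assert np.isclose(np.sqrt(eps * p), delta * p)   # max deviation is delta * p
print(eps, K_l)
```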

We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (taking the $j$th class as an example):

$$\text{(RSVM)}\qquad\begin{aligned}
\min\;& \sum_{i\in I}\xi_{ij}+\lambda_j\big\|w_j\big\|\\
\text{s.t.}\;& \tilde{y}_{ij}\Big(\sum_{l\in L}w_j^l x_i^l+b_j\Big)\ge 1-\xi_{ij}, && i\in I,\\
& \xi_{ij}\ge 0, && i\in I,
\end{aligned}\tag{38}$$

where $\tilde{y}_{ij}=2y_{ij}-1$ and $\lambda_j\ge 0$ is a regularization parameter. As pointed out in [8], $\lambda_j$ represents a trade-off between the number of training-set errors and the amount of robustness with respect to spherical perturbations of the data points. To make a fair comparison, in the following experiments we test a series of $\lambda$ values and choose the one with the best performance. Note that if $\lambda_j=0$, we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its application to multiple classification problems.
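A compact sketch of (38) in the same Python/CVXPY setting, with synthetic placeholders for the features $x_i^l$ and labels $y_{ij}$:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
nI, nL = 40, 3
X = rng.normal(size=(nI, nL))                  # synthetic feature matrix x_i^l
y_tilde = 2.0 * rng.integers(0, 2, nI) - 1.0   # tilde{y}_ij = 2 y_ij - 1
lam_j = 0.5                                    # regularization weight, to be tuned

w = cp.Variable(nL)
b = cp.Variable()
xi = cp.Variable(nI, nonneg=True)
cons = [cp.multiply(y_tilde, X @ w + b) >= 1 - xi]
prob = cp.Problem(cp.Minimize(cp.sum(xi) + lam_j * cp.norm(w)), cons)
prob.solve()
print(prob.status, prob.value)
```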

Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set.

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 90.7 | 88.2 | 63.9 | 66.2 | 88.4 | 90.5*
55 | 89.9 | 88.6 | 69.1 | 72.8 | 89.5 | 89.9*
60 | 89.0 | 85.0 | 70.3 | 72.1 | 91.3 | 86.4*
65 | 86.3 | 85.9 | 72.1 | 72.8 | 88.0 | 92.5*
70 | 92.3 | 84.1 | 70.3 | 75.7 | 90.8 | 86.3*
75 | 88.8 | 87.9 | 74.2 | 74.6 | 88.7 | 91.6*
80 | 88.7 | 93.8* | 90.0 | 87.5 | 88.3 | 93.3
85 | 89.5 | 89.3 | 93.4 | 89.6 | 89.2 | 91.0*
90 | 89.5 | 88.4 | 93.3 | 95.8* | 89.2 | 92.6

Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set.

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 91.4 | 84.8 | 76.5 | 68.9 | 91.3 | 87.5*
55 | 92.5 | 86.6 | 68.0 | 77.0 | 92.0 | 90.3*
60 | 89.8 | 86.1 | 72.9 | 73.8 | 88.9 | 90.9*
65 | 91.0 | 82.3 | 80.5 | 81.6 | 89.8 | 92.9*
70 | 86.8 | 95.5* | 83.4 | 89.8 | 88.4 | 93.7
75 | 89.4 | 85.2 | 85.9 | 79.5 | 89.7 | 93.5*
80 | 91.8 | 80.8 | 88.1 | 79.9 | 89.7 | 91.1*
85 | 88.3 | 89.9 | 89.9 | 92.8 | 90.8 | 97.1*
90 | 88.5 | 90.2 | 88.8 | 94.2 | 90.9 | 97.2*

4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for binary classification problems using cross-validation methods. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].

Tables 1 and 2 show the average classification performance of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set based on the parameter tr, which varies from 0.5 to 0.9. The best classification accuracy on a test set among the three methods is marked with an asterisk.

Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, in most cases RSVM has the highest classification accuracy on the training sets, but its performance on the test sets is unsatisfactory. In most cases, the proposed RPC provides the highest classification accuracy on the test sets. NBC provides better performance on the test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can perform better on the test sets than on the training sets; thus it can avoid the "overlearning" phenomenon.

To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, namely, T41–T45 and T61–T65. Table 3 reports the average performance of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for the data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As in Tables 1 and 2, the robustness of the proposed RPC supports its generalization performance on the test sets.

4.4. Test on Multiple Classification. In this subsection, we test the performance of RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter $\lambda$, we run a set of RSVMs with $\lambda$ varying from 0 to a sufficiently large number and select the one with the best performance on the test sets.

Figures 1 and 3 plot the performance of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separating hyperplane for binary classification problems, while RPC is more robust when extended to multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.

Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on the other data sets when tr = 70%.

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
T41 | 62.0 | 59.7 | 82.4 | 78.5 | 77.9 | 83.5*
T42 | 87.0 | 82.2 | 84.1 | 83.1 | 80.5 | 85.3*
T43 | 68.0 | 61.2 | 80.2 | 75.4 | 85.5 | 86.9*
T44 | 91.3 | 83.9 | 77.9 | 86.8 | 88.8 | 90.5*
T45 | 86.5 | 87.0 | 93.2 | 91.0* | 84.0 | 89.1
T61 | 80.6 | 79.0 | 80.5 | 83.0 | 83.6 | 87.8*
T62 | 71.4 | 66.5 | 86.9 | 85.4* | 86.3 | 85.4*
T63 | 63.7 | 69.5 | 89.6 | 89.1* | 82.2 | 84.4
T64 | 88.2 | 86.7 | 97.0 | 96.9* | 93.4 | 95.5
T65 | 75.0 | 63.4 | 79.7 | 81.5 | 90.5 | 92.9*

Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set.

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
M1 | 65.4 | 68.2 | 72.7 | 73.7 | 79.1 | 77.4*
M2 | 76.9 | 75.3 | 82.6 | 74.8 | 81.7 | 80.9*
M3 | 57.9 | 69.9 | 74.8 | 87.4 | 95.4 | 92.0*
M4 | 70.4 | 64.1 | 97.1 | 92.3 | 95.4 | 92.3*
M5 | 77.4 | 71.3 | 89.4 | 88.1* | 92.0 | 88.0
M6 | 75.7 | 70.5 | 74.1 | 79.4 | 86.4 | 80.8*

[Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set. Axes: training rate (0.6–0.9) versus accuracy on the training set (%).]

Figures 2 and 4 show the performance of the three methods on the Y5 and T1 test sets, respectively. We can see that in most cases RPC provides the highest accuracy among the three methods. The accuracy of RSVM exceeds that of NBC on the Y5 test set, while the latter outperforms the former on the T1 test set.

[Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set. Axes: training rate (0.6–0.9) versus accuracy on the test set (%).]

To further test the performance of RPC on multiple classification problems, we carry out more experiments on the data sets M1–M6. Table 4 reports the average performance of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).

[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set. Axes: training rate (0.6–0.9) versus accuracy on the training set (%).]

[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set. Axes: training rate (0.6–0.9) versus accuracy on the test set (%).]

From the tested real-life application, we conclude that the proposed RPC is robust enough to provide better performance on both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.

5. Conclusion

In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems.

To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified $\chi^2$-distance. We assume that the true distribution lies in the constructed distributional set centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all possible distributions belonging to the distributional set.

Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone program based on the conic duality theorem. Thus our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also has a promising performance on multiple classification problems.

There are still many important extensions of our model. Other forms of loss function, such as the mean squared error and hinge loss functions, should be studied to obtain tractable reformulations, and the resulting models may provide better performance. Probability models considering joint probability distribution information are also an interesting research direction.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223–228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209–226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization—methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453–480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801–807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555–582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558–563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253–1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175–1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191–201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123–136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691–710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Scholkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85–90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.
[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3—a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341–357, 2013.



6 Mathematical Problems in Engineering

where the function [sdot]+is defined as [119909]

+= 119909 if 119909 ge

0 otherwise [119909]+

= 0 For more details about conjugatefunctions see [28]

Proposition 3 The following inner maximization problem

maxsum

119895isin119869

sum

119894isin119868

(1 minus 2119910119894119895

) sum

119897isin119871

120572119897

119895119902119897

119894119895+ |119868| 119902

119897

119894119895 isin 119875120598 (27)

is equivalent to a second order cone programming

min sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894) + sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895119908119897

119894119895+ |119868|

st (

119908119897

119894119895

119911119897

119894119895

2120582119897

119894+ 119908119897

119894119895

) isin 1198713 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119906119897

119894119895= 120572119897

119895(1 minus 2119868

119894119895) + 120579119897

119894 forall119894 isin 119868 119897 isin 119871 119895 isin 119869

119911119897

119894119895ge119903119897

119894119895+2120582119897

119894 120582119897

119894119895 119911119897

119894119895ge0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

(28)

where a second order cone 119871119899+1 is defined as

119871119899+1

=

119909 isin R119899+1

119909119899+1

ge radic

119899

sum

119894=1

1199092

119894

(29)

Proof For given feasible 120572 satisfying the robust constraints itis straightforward to show that the inner maximum problemis equal to the following minimization problem (MP)

(MP) min 119905

st 119905 ge sum

119895isin119869

sum

119894isin119868

(1 minus 2119910119894119895

) sum

119897isin119871

120572119897

119895119902119897

119894119895 + |119868|

forall 119902119897

119894119895 isin 119875120598

(30)

The above constraint can be further reduced to the followingconstraint

max

sum

119895isin119869

sum

119894isin119868

(1 minus 2119910119894119895

) sum

119897isin119871

120572119897

119895119902119897

119894119895

+ |119868| minus 119905 forall 119902119897

119894119895 isin 119875120598 le 0

(31)

By assigning Lagrange multipliers 120579119897

119894isin R and 120582

119897

119894isin R+

to the constraints in the left optimization problem we obtainthe following Lagrange function

119871 (119902 120579 120582) = sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894)

+ sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

(119903119897

119894119895119902119897

119894119895minus 120582119897

119894

(119902119897

119894119895minus 119901119897

119894119895)2

119901119897

119894119895

)

+ |119868| minus 119905

(32)

where 119903119897

119894119895= 120572119897

119895(1 minus 2119910

119894119895) + 120579119897

119894 Its dual function is given as

119863 (120579 120582) = max119902ge0

119871 (119905 119902 120579 120582)

= sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894)

+ sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

max119902119897

119894119895ge0

(119903119897

119894119895119902119897

119894119895minus 120582119897

119894119901119897

119894119895(

119902119897

119894119895minus 119901119897

119894119895

119901119897

119894119895

)

2

)

+ |119868| minus 119905

= sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894)

+ sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895max119905ge0

(119903119897

119894119895119905 minus 120582119897

119894(119905 minus 1)

2) + |119868| minus 119905

= sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894)

+ sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895120582119897

119894max119905ge0

(

119903119897

119894119895

120582119897

119894

119905 minus (119905 minus 1)2) + |119868| minus 119905

= sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894)

+ sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895120582119897

119894119889lowast

(

119903119897

119894119895

120582119897

119894

) + |119868| minus 119905

(33)

Note that for any feasible 120572 the primal maximizationproblem (31) is bounded and has a strictly feasible solution119901119897

119894119895 thus there is no duality gap between (31) and the

following dual problem

min 119863 (120579 120582) 120579119897

119894isin R 120582

119897

119894isin R+ forall119894 isin 119868 119897 isin 119871

lArrrArr

min sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894) + sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895119908119897

119894119895+ |119868| minus 119905

st 119908119897

119894119895ge120582119897

119894119889lowast

(

119903119897

119894119895

120582119897

119894

) forall119894isin119868 119897isin119871 119895isin119869

120579119897

119894isin R 120582

119897

119894isin R+ forall119894 isin 119868 119897 isin 119871

(34)

Next we show that the constraint about the conjugate func-tion can be represented by second order cone constraints

120582119897

119894119889lowast

(

119903119897

119894119895

120582119897

119894

) le 119908119897

119894119895lArrrArr 120582

119897

119894(minus1 +

1

4

[

[

119903119897

119894119895

120582119897

119894

+ 2]

]

2

+

) le 119908119897

119894119895

lArrrArr 4120582119897

119894(120582119897

119894+ 119908119897

119894119895) ge [119903

119897

119894119895+ 2120582119897

119894]2

+

lArrrArr 4120582119897

119894(120582119897

119894+ 119908119897

119894119895) ge (119911

119897

119894119895)2

119911119897

119894119895ge 0 119911

119897

119894119895ge 119903119897

119894119895+ 2120582119897

119894

Mathematical Problems in Engineering 7

lArrrArr (

119908119897

119894119895

119911119897

119894119895

2120582119897

119894+ 119908119897

119894119895

) isin 1198713

119911119897

119894119895ge 0 119911

119897

119894119895ge 119903119897

119894119895+ 2120582119897

119894

(35)

By reinjecting the above constraints into (MP) the robustobjective function is equivalent to the following problem

min 119905

st sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894) + sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895119908119897

119894119895+ |119868| le 119905

(

119908119897

119894119895

119911119897

119894119895

2120582119897

119894+ 119908119897

119894119895

) isin 1198713

119911119897

119894119895ge119903119897

119894119895+2120582119897

119894 119911119897

119894119895 120582119897

119894119895ge0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119906119897

119894119895= 120572119897

119895(1 minus 2119868

119894119895) + 120579119897

119894 forall119894 isin 119868 119897 isin 119871 119895 isin 119869

(36)

By eliminating variable 119905 we complete the proof

Based on the Lemma 2 and Proposition 3 we obtain ourmain result

Proposition 4 The RPC problem can be solved as the follow-ing second order cone programming

min sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894) + sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895119908119897

119894119895+ |119868|

st (

119908119897

119894119895

119911119897

119894119895

2120582119897

119894+ 119908119897

119894119895

) isin 1198713 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119903119897

119894119895= 120572119897

119895(1 minus 2119868

119894119895) + 120579119897

119894 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119911119897

119894119895ge 119903119897

119894119895+ 2120582119897

119894 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

sum

119897isin119871

(119902119897

1198941198951199061198970

119894119895minus 119902119897

119894119895V1198970119894119895

) ge 0 forall119894 isin 119868 119895 isin 119869

1 + sum

119897isin119871

(119902119897

1198941198951199061198971

119894119895minus 119902119897

119894119895V1198971119894119895

) ge 0 forall119894 isin 119868 119895 isin 119869

120572119897

119894119895minus 1199061198970

119894119895+ V1198970119894119895

ge 0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

V1198971119894119895

minus 120572119897

119894119895minus 1199061198971

119894119895ge 0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

120582119897

119894119895 119911119897

119894119895 1199061198971

119894119895 V1198971119894119895

1199061198970

119894119895 V1198970119894119895

ge 0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119903119897

119894119895 120579119897

119894119895 119908119897

119894119895 120572119897

119894119895isin R forall119894 isin 119868 119895 isin 119869 119897 isin 119871

(37)

4 Numerical Experiments onReal-World Applications

In this section numerical experiments on real-world appli-cations are carried out to verify the effectiveness of theproposed robust probability classifier model Specifically weconsider lithology classification data sets from our practicalapplication We compare our model with the regularizedSVM (RSVM) and the naive Bayes classifier (NBC) on bothbinary and multiple classification problems

All the numerical experiments are implemented in Mat-lab 770 and run on Intel(R) Core(TM) i5-4570 CPU SDPT3solver [27] is called to solve the second order cone programsin our proposed method and the regularized SVM

41 Data Sets Lithology classification is one of the basic tasksfor geological investigation To discriminate the lithology ofthe underground strata various electromagnetic techniquesare applied to the same strata to obtain different features suchas Gamma coefficients acoustic wave striation density andfusibility

Here numerical experiments are carried out on a seriesof data sets the borehole T1 Y4 Y5 and Y6 All boreholesare located in Tarim Basin China In total there are 12 datasets used for binary classification problems and 8 data setsused for multiple classification problems For each data setbased on a prespecified training rate 120574 isin [0 1] it is randomlypartitioned into two subsets a training set and a test set suchthat the size of training set accounts for 120574 of the total numberof samples

42 Experiment Design The parameters in our models arechosen based on the size of data setThe parameter 120598 dependson the number of the classes and defined as 120598 = 120575

2|119869| where

120575 isin (0 1)The choice of 120598 can be explained in this way if thereare |119869| classes and the training data are uniformly distributedthen for each probability 119901

119897

119894119895= 1|119869| its maximal variation

range is between 119901119897

119894119895(1 minus 120575) and 119901

119897

119894119895(1 + 120575) The number of

data intervals 119870119897is defined as 119870

119897= |119868|(|119869| times 119870) such that if

the training data are uniformly distributed then in each datainterval there are 119870 samples in each class In the followingcontext we set 120575 = 02 and 119870 = 8

We compare the performances of the proposed RPCmodel with the following regularized support vectormachinemodel [6] (take the 119895th class for example)

(RSVM) min sum

119894isin119868

120585119894119895

+ 120582119895

10038171003817100381710038171003817119908119895

10038171003817100381710038171003817

st 119910119894119895

(sum

119897isin119871

119908119897

119895119909119897

119894+ 119887119895) ge 1 minus 120585

119894119895 119894 isin 119868

120585119894119895

ge 0 119894 isin 119868

(38)

where 119910119894119895

= 2119910119894119895

minus1 and 120582119895

ge 0 is a regularization parameterAs pointed by [8] 120582

119895ge 0 represents a trade-off between the

number of training set errors and the amount of robustness

8 Mathematical Problems in Engineering

Table 1 Performances of RSVM NBC and RPC for binary classification problems on Y5 data set

tr () RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

50 907 882 639 662 884 905lowast

55 899 886 691 728 895 899lowast

60 890 850 703 721 913 864lowast

65 863 859 721 728 880 925lowast

70 923 841 703 757 908 863lowast

75 888 879 742 746 887 916lowast

80 887 938lowast 900 875 883 93385 895 893 934 896 892 910lowast

90 895 884 933 958lowast 892 926

Table 2 Performances of RSVM NBC and RPC for binary classification problems on T1 data set

tr () RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

50 914 848 765 689 913 875lowast

55 925 866 680 770 920 903lowast

60 898 861 729 738 889 909lowast

65 910 823 805 816 898 929lowast

70 868 955lowast 834 898 884 93775 894 852 859 795 897 935lowast

80 918 808 881 799 897 911lowast

85 883 899 899 928 908 971lowast

90 885 902 888 942 909 972lowast

with respect to spherical perturbations of the data pointsTo make a fair comparison in the following experiments wewill test a series of 120582 values and choose the one with bestperformance Note that if 120582

119895= 0 we refer to this model as the

classic support vector machine (SVM) See also [6] for moredetails onRSVMand its applications tomultiple classificationproblems

43 Test on Binary Classification In this subsection RSVMNBC and RPC are implemented on 12 data sets for the binaryclassification problems using the cross-validation methodsTo improve the performances of RSVM we transform theoriginal data by the popularly used polynomial kernels [6]

Tables 1 and 2 show the averaged classification per-formances of RSVM NBC and the proposed RPC (over10 randomly generated instances) for binary classificationproblems on Y5 and T1 data sets respectively For each dataset we randomly partition it into a training set and a testset based on the parameter tr which varies from 05 to 09The highest classification accuracy on a training set amongthese three methods is highlighted in bold while the bestclassification accuracy on a test set is marked with an asterisk

Tables 1 and 2 validate the effectiveness of the proposedRPC for binary classification problems compared with NBCand RSVM Specifically for most of the cases RSVM hasthe highest classification accuracy on training sets but itsperformance on test sets is unsatisfactory For most of thecases the proposed RPC provides the highest classification

accuracy on test sets NBC provides better performanceson test sets as the training rate increases The experimentalresults also show that for given training rate PRC can providebetter performances on test sets than that on training setsthus it can avoid the ldquooverlearningrdquo phenomenon

To further validate the effectiveness of the proposed RPCwe test it on additional 10 data sets that is T41ndashT45 andT61ndashT65 Table 3 reports the averaged performances of threemethods over 10 randomly generated instances when thetraining rate is set to 70 Except for data sets T45 T63and T64 RPC provides the highest accuracy on the test setsand for all the data sets its accuracy is higher than 80 Asshown in Tables 1 and 2 the robustness of the proposed RPCguarantees its scalability on the test sets

44 Test onMultiple Classification In this subsection we testthe performances of on multiple classification problems bycomparison with RSVM and NBC Since the performance ofRSVM is determined by its regularization parameter 120582 werun a set of RSVM with 120582 varying from 0 to a big enoughnumber and select the one with the best performance on testsets

Figures 1 and 3 plot the performances of three methodson Y5 and T1 training sets respectively Unlike the case ofbinary classification problems we can see that RPC providesa competitive performance even on the training sets Oneexplanation is that RSVM can outperform the proposed RPCon training sets by finding the optimal separation hyperplane

Mathematical Problems in Engineering 9

Table 3 Performances of RSVM NBC and RPC for binary classification problems on other data sets when tr = 70

Data set RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

T41 620 597 824 785 779 835lowast

T42 870 822 841 831 805 853lowast

T43 680 612 802 754 855 869lowast

T44 913 839 779 868 888 905lowast

T45 865 870 932 910lowast 840 891T61 806 790 805 830 836 878lowast

T62 714 665 869 854lowast 863 854lowast

T63 637 695 896 891lowast 822 844T64 882 867 970 969lowast 934 955T65 750 634 797 815 905 929lowast

Table 4 Performances of RSVM NBC and RPC for multiple classification problems on T1 data set

Data set RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

M1 654 682 727 737 791 774lowast

M2 769 753 826 748 817 809lowast

M3 579 699 748 874 954 920lowast

M4 704 641 971 923 954 923lowast

M5 774 713 894 881lowast 920 880M6 757 705 741 794 864 808lowast

06 065 07 075 08 085 09055

06

065

07

075

08

085

09

095

Training rate

Accu

racy

on

trai

ning

set (

)

RSVMNBCRPC

Figure 1 Performances of RSVM NBC and RPC on Y5 trainingset

for binary classification problem S while RPC is more robustto extend to solve multiple classification problems since ituses the nonlinear probability information of the data setsThe accuracy of NBC on the training sets also improves asthe training rate increases

Figures 2 and 4 show the performances of both methodson Y5 and T1 test sets respectively We can see that for most

06 065 07 075 08 085 09Training rate

RSVMNBCRPC

055

06

065

07

075

08

085

09

095

1

Accu

racy

on

test

set (

)

Figure 2 Performances of RSVM NBC and RPC on Y5 test set

of the cases RPC provides the highest accuracy among threemethods The accuracy of RSVM outperforms that of NBCon Y5 test set while the latter outperforms the former on theT1 test set

To further test the performance of PRC on multipleclassification problems we carry out more experiments ondata sets M1ndashM6 Table 4 reports the averaged performancesof three methods on these data sets when the training rateis set to 70 Except for the M5 data set PRC always

10 Mathematical Problems in Engineering

06

065

07

075

08

085

Accu

racy

on

trai

ning

set (

)

06 065 07 075 08 085 09Training rate

RSVMNBCRPC

Figure 3 Performances of RSVM NBC and RPC on T1 trainingset

055

06

065

07

075

08

085

09

Accu

racy

on

test

set (

)

06 065 07 075 08 085 09Training rate

RSVMNBCRPC

Figure 4 Performances of RSVM NBC and RPC on T1 test set

provides the highest classification performances among threemethods and even for the M5 data set its accuracy (880)is very close to the best one (881)

From the tested real-life application we conclude that theproposed RPC has the robustness to provide better perfor-mance for both binary and multiple classification problemscompared with RSVM and NBC The robustness of PRCenables it to avoid the ldquooverlearningrdquo phenomenon especiallyfor the binary classification problems

5 Conclusion

In this paper we propose a robust probability classifier modelto address the data uncertainty in classification problems

To quantitatively describe the data uncertainty a class-conditional distributional set is constructed based on themodified 120594

2-distance We assume that the true distribu-tion lies in the constructed distributional set centered inthe nominal probability distribution Based on the ldquolinearcombination assumptionrdquo for the posterior class-conditionalprobabilities we consider a classification criterion using theweighted sum of the posterior probabilities The optimalrobust probability classifier is determined by minimizingthe worst-case absolute error value over all the possibledistributions belonging to the distributional set

Our proposed model introduces the recently developeddistributionally robust optimization method into the clas-sifier design problems To obtain a computable modelwe transform the resulted optimization problem into anequivalent second order cone programming based on conicduality theorem Thus our model has the same compu-tational complexity as the classic support vector machineand numerical experiments on real-life application validateits effectiveness On the one hand the proposed robustprobability classifier provides a higher accuracy comparedwith RSVM and NBC by avoiding overlearning on trainingsets for binary classification problems on the other hand italso has a promising performance for multiple classificationproblems

There are still many important extensions in our modelOther forms of loss function such as the mean squarederror function and Hinge loss functions should be studied toobtain tractable reformulations and the resulted models mayprovide better performances Probability models consideringjoint probability distribution information are also interestingresearch directions

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] R O Duda and P E Hart Pattern Classification and SceneAnalysis John Wiley amp Sons New York NY USA 1973

[2] P Langley W Iba and K Thompson ldquoAn analysis of Bayesianclassifiersrdquo in Proceedings of the 10th National Conference onArtificial Intelligence (AAAI rsquo92) vol 90 pp 223ndash228 AAAIPress Menlo Park Calif USA July 1992

[3] B D Ripley Pattern Recognition and Neural Networks Cam-bridge University Press Cambridge UK 2007

[4] V Vapnik The Nature of Statistical Learning Theory SpringerBerlin Germany 2000

[5] M Ramoni and P Sebastiani ldquoRobust Bayes classifiersrdquo Artifi-cial Intelligence vol 125 no 1-2 pp 209ndash226 2001

[6] Y Shi Y Tian G Kou and Y Peng ldquoRobust support vectormachinesrdquo in Optimization Based Data Mining Theory andApplications Springer London UK 2011

[7] Y Z Wang Y L Zhang F L Zhang and J N Yi ldquoRobustquadratic regression and its application to energy-growth con-sumption problemrdquoMathematical Problems in Engineering vol2013 Article ID 210510 10 pages 2013

Mathematical Problems in Engineering 11

[8] A Ben-Tal L El Ghaoui and A Nemirovski Robust Optimiza-tion Princeton University Press Princeton NJ USA 2009

[9] A Ben-Tal and A Nemirovski ldquoRobust optimizationmdashmethodology and applicationsrdquo Mathematical Programmingvol 92 no 3 pp 453ndash480 2002

[10] D Bertsimas D B Brown and C Caramanis ldquoTheory andapplications of robust optimizationrdquo SIAM Review vol 53 no3 pp 464ndash501 2011

[11] G R G Lanckriet L E Ghaoui C Bhattacharyya and M IJordan ldquoMinimax probability machinerdquo in Advances in NeuralInformation Processing Systems pp 801ndash807 2001

[12] G R G Lanckriet L El Ghaoui C Bhattacharyya and M IJordan ldquoA robust minimax approach to classificationrdquo Journalof Machine Learning Research vol 3 no 3 pp 555ndash582 2003

[13] L El Ghaoui G R G Lanckriet and G Natsoulis ldquoRobustclassification with interval datardquo Tech Rep UCBCSD-03-1279Computer Science Division University of California 2003


where the function $[\cdot]_+$ is defined as $[x]_+ = x$ if $x \ge 0$ and $[x]_+ = 0$ otherwise. For more details about conjugate functions, see [28].
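As a worked step (a reconstruction from the formulas below, consistent with (33) and (35)), the conjugate-type function $d^*$ associated with the modified $\chi^2$-distance can be computed explicitly:
\[
d^*(s) = \max_{t \ge 0}\bigl(st - (t-1)^2\bigr).
\]
Setting the derivative $s - 2(t-1)$ to zero gives $t = 1 + s/2$, which is feasible whenever $s \ge -2$ and yields the value $s + s^2/4 = \tfrac{1}{4}(s+2)^2 - 1$; for $s < -2$ the maximum over $t \ge 0$ is attained at $t = 0$ with value $-1$. Hence $d^*(s) = -1 + \tfrac{1}{4}[s+2]_+^2$, which is exactly the form expanded in (35).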

Proposition 3. The following inner maximization problem
\[
\max\ \sum_{j\in J}\sum_{i\in I}\bigl(1-2y_{ij}\bigr)\sum_{l\in L}\alpha_j^l q_{ij}^l + |I|,\qquad q_{ij}^l \in P_\epsilon, \tag{27}
\]
is equivalent to the second order cone programming problem
\[
\begin{aligned}
\min\ \ & \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l - \theta_i^l\bigr) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p_{ij}^l w_{ij}^l + |I|\\
\text{s.t.}\ \ & \bigl(w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l + w_{ij}^l\bigr) \in L^3, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l = \alpha_j^l\bigl(1-2y_{ij}\bigr) + \theta_i^l, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& z_{ij}^l \ge r_{ij}^l + 2\lambda_i^l,\quad \lambda_i^l,\ z_{ij}^l \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L,
\end{aligned}\tag{28}
\]
where a second order cone $L^{n+1}$ is defined as
\[
L^{n+1} = \Bigl\{ x\in\mathbb{R}^{n+1} : x_{n+1} \ge \sqrt{\textstyle\sum_{i=1}^{n} x_i^2} \Bigr\}. \tag{29}
\]

Proof. For a given feasible $\alpha$ satisfying the robust constraints, it is straightforward to show that the inner maximization problem equals the following minimization problem (MP):
\[
\begin{aligned}
\text{(MP)}\quad \min\ \ & t\\
\text{s.t.}\ \ & t \ge \sum_{j\in J}\sum_{i\in I}\bigl(1-2y_{ij}\bigr)\sum_{l\in L}\alpha_j^l q_{ij}^l + |I|,\qquad \forall\, q_{ij}^l \in P_\epsilon.
\end{aligned}\tag{30}
\]

The above constraint can be further reduced to the following constraint:
\[
\max_{q_{ij}^l \in P_\epsilon}\ \Bigl\{ \sum_{j\in J}\sum_{i\in I}\bigl(1-2y_{ij}\bigr)\sum_{l\in L}\alpha_j^l q_{ij}^l + |I| - t \Bigr\} \le 0. \tag{31}
\]

By assigning Lagrange multipliers $\theta_i^l \in \mathbb{R}$ and $\lambda_i^l \in \mathbb{R}_+$ to the constraints in the left optimization problem, we obtain the following Lagrange function:
\[
L(q,\theta,\lambda) = \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l - \theta_i^l\bigr)
 + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J}\Bigl( r_{ij}^l q_{ij}^l - \lambda_i^l \frac{\bigl(q_{ij}^l - p_{ij}^l\bigr)^2}{p_{ij}^l} \Bigr) + |I| - t, \tag{32}
\]
where $r_{ij}^l = \alpha_j^l\bigl(1-2y_{ij}\bigr) + \theta_i^l$. Its dual function is given by

\[
\begin{aligned}
D(\theta,\lambda) &= \max_{q\ge 0} L(q,\theta,\lambda)\\
&= \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)
 + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J}\max_{q_{ij}^l\ge 0}\Bigl(r_{ij}^l q_{ij}^l-\lambda_i^l p_{ij}^l\Bigl(\frac{q_{ij}^l-p_{ij}^l}{p_{ij}^l}\Bigr)^{\!2}\Bigr)+|I|-t\\
&= \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)
 + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p_{ij}^l\max_{s\ge 0}\bigl(r_{ij}^l s-\lambda_i^l(s-1)^{2}\bigr)+|I|-t\\
&= \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)
 + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p_{ij}^l\lambda_i^l\max_{s\ge 0}\Bigl(\frac{r_{ij}^l}{\lambda_i^l}\,s-(s-1)^{2}\Bigr)+|I|-t\\
&= \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)
 + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p_{ij}^l\lambda_i^l\, d^{*}\!\Bigl(\frac{r_{ij}^l}{\lambda_i^l}\Bigr)+|I|-t,
\end{aligned}\tag{33}
\]
where the second equality uses the substitution $s = q_{ij}^l/p_{ij}^l$.

Note that for any feasible $\alpha$ the primal maximization problem (31) is bounded and has a strictly feasible solution $p_{ij}^l$; thus there is no duality gap between (31) and the following dual problem:
\[
\min\ D(\theta,\lambda),\qquad \theta_i^l \in \mathbb{R},\ \lambda_i^l \in \mathbb{R}_+,\quad \forall i\in I,\ l\in L,
\]
which is equivalent to
\[
\begin{aligned}
\min\ \ & \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l - \theta_i^l\bigr) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p_{ij}^l w_{ij}^l + |I| - t\\
\text{s.t.}\ \ & w_{ij}^l \ge \lambda_i^l\, d^{*}\!\Bigl(\frac{r_{ij}^l}{\lambda_i^l}\Bigr),\quad \forall i\in I,\ l\in L,\ j\in J,\\
& \theta_i^l \in \mathbb{R},\ \lambda_i^l \in \mathbb{R}_+,\quad \forall i\in I,\ l\in L.
\end{aligned}\tag{34}
\]

Next we show that the constraint involving the conjugate function can be represented by second order cone constraints:
\[
\begin{aligned}
\lambda_i^l\, d^{*}\!\Bigl(\frac{r_{ij}^l}{\lambda_i^l}\Bigr) \le w_{ij}^l
&\Longleftrightarrow \lambda_i^l\Bigl(-1 + \frac{1}{4}\Bigl[\frac{r_{ij}^l}{\lambda_i^l} + 2\Bigr]_+^2\Bigr) \le w_{ij}^l\\
&\Longleftrightarrow 4\lambda_i^l\bigl(\lambda_i^l + w_{ij}^l\bigr) \ge \bigl[r_{ij}^l + 2\lambda_i^l\bigr]_+^2\\
&\Longleftrightarrow 4\lambda_i^l\bigl(\lambda_i^l + w_{ij}^l\bigr) \ge \bigl(z_{ij}^l\bigr)^2,\quad z_{ij}^l \ge 0,\quad z_{ij}^l \ge r_{ij}^l + 2\lambda_i^l\\
&\Longleftrightarrow \bigl(w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l + w_{ij}^l\bigr) \in L^3,\quad z_{ij}^l \ge 0,\quad z_{ij}^l \ge r_{ij}^l + 2\lambda_i^l.
\end{aligned}\tag{35}
\]
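As a quick numerical sanity check on the last equivalence in (35) (illustrative only, not part of the original derivation), membership in $L^3$ coincides with the algebraic condition $4\lambda(\lambda+w) \ge z^2$ together with $2\lambda + w \ge 0$:

```python
import numpy as np

# Randomized check that (w, z, 2*lam + w) in L^3 is equivalent to
# 4*lam*(lam + w) >= z**2 together with 2*lam + w >= 0.
rng = np.random.default_rng(1)
for _ in range(10_000):
    w, z, lam = rng.normal(), abs(rng.normal()), abs(rng.normal())
    in_cone = 2 * lam + w >= np.hypot(w, z)                  # x3 >= sqrt(x1^2 + x2^2)
    algebraic = (4 * lam * (lam + w) >= z**2) and (2 * lam + w >= 0)
    assert in_cone == algebraic
```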

By reinjecting the above constraints into (MP), the robust objective function is equivalent to the following problem:
\[
\begin{aligned}
\min\ \ & t\\
\text{s.t.}\ \ & \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l - \theta_i^l\bigr) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p_{ij}^l w_{ij}^l + |I| \le t,\\
& \bigl(w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l + w_{ij}^l\bigr) \in L^3, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& z_{ij}^l \ge r_{ij}^l + 2\lambda_i^l,\quad z_{ij}^l,\ \lambda_i^l \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l = \alpha_j^l\bigl(1-2y_{ij}\bigr) + \theta_i^l, \quad \forall i\in I,\ j\in J,\ l\in L.
\end{aligned}\tag{36}
\]
By eliminating the variable $t$, we complete the proof.

Based on Lemma 2 and Proposition 3, we obtain our main result.

Proposition 4. The RPC problem can be solved as the following second order cone programming problem:
\[
\begin{aligned}
\min\ \ & \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l - \theta_i^l\bigr) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p_{ij}^l w_{ij}^l + |I|\\
\text{s.t.}\ \ & \bigl(w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l + w_{ij}^l\bigr) \in L^3, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l = \alpha_j^l\bigl(1-2y_{ij}\bigr) + \theta_i^l, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& z_{ij}^l \ge r_{ij}^l + 2\lambda_i^l, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& \sum_{l\in L}\bigl(q_{ij}^l u_{ij}^{l0} - q_{ij}^l v_{ij}^{l0}\bigr) \ge 0, \quad \forall i\in I,\ j\in J,\\
& 1 + \sum_{l\in L}\bigl(q_{ij}^l u_{ij}^{l1} - q_{ij}^l v_{ij}^{l1}\bigr) \ge 0, \quad \forall i\in I,\ j\in J,\\
& \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& v_{ij}^{l1} - \alpha_j^l - u_{ij}^{l1} \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& \lambda_i^l,\ z_{ij}^l,\ u_{ij}^{l1},\ v_{ij}^{l1},\ u_{ij}^{l0},\ v_{ij}^{l0} \ge 0, \quad \forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l,\ \theta_i^l,\ w_{ij}^l,\ \alpha_j^l \in \mathbb{R}, \quad \forall i\in I,\ j\in J,\ l\in L.
\end{aligned}\tag{37}
\]
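To illustrate how the cone constraints of (28) are assembled in practice, the following minimal Python/CVXPY sketch solves the inner worst-case problem for a randomly generated toy instance with the classifier weights $\alpha$ held fixed. It is an illustration under stated assumptions (random data, toy sizes, CVXPY instead of the Matlab/SDPT3 setup of Section 4), not the authors' implementation; the additional constraints of (37) coupling $\alpha$ with the auxiliary variables $u$, $v$ are omitted here.

```python
import numpy as np
import cvxpy as cp

# Toy sizes (hypothetical): |I| = 4 data intervals, |J| = 2 classes, |L| = 3 features.
nI, nJ, nL = 4, 2, 3
rng = np.random.default_rng(0)

p = rng.dirichlet(np.ones(nJ), size=(nI, nL))    # p[i, l, j]: nominal class-conditional probabilities
y = rng.integers(0, 2, size=(nI, nJ))            # y[i, j]: class-membership indicators
alpha = rng.normal(size=(nJ, nL))                # fixed weights alpha_j^l for the inner problem
delta = 0.2
eps = delta**2 / nJ                              # epsilon = delta^2 / |J|, as chosen in Section 4.2

lam = cp.Variable((nI, nL), nonneg=True)         # lambda_i^l
theta = cp.Variable((nI, nL))                    # theta_i^l
w = [cp.Variable((nI, nL)) for _ in range(nJ)]   # w_{ij}^l, one (i, l)-matrix per class j
z = [cp.Variable((nI, nL), nonneg=True) for _ in range(nJ)]

objective = cp.sum(eps * lam - theta) + nI       # sum_{i,l} (eps*lam - theta) + |I|
constraints = []
for j in range(nJ):
    objective += cp.sum(cp.multiply(p[:, :, j], w[j]))
    for i in range(nI):
        for l in range(nL):
            r = alpha[j, l] * (1 - 2 * y[i, j]) + theta[i, l]
            # (w, z, 2*lam + w) in L^3, i.e. sqrt(w^2 + z^2) <= 2*lam + w
            constraints.append(cp.SOC(2 * lam[i, l] + w[j][i, l],
                                      cp.hstack([w[j][i, l], z[j][i, l]])))
            constraints.append(z[j][i, l] >= r + 2 * lam[i, l])

prob = cp.Problem(cp.Minimize(objective), constraints)
prob.solve()  # any SOCP-capable solver installed with CVXPY will do
print("worst-case objective value:", prob.value)
```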

4. Numerical Experiments on Real-World Applications

In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.

All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second order cone programs in our proposed method and in the regularized SVM.

4.1. Data Sets. Lithology classification is one of the basic tasks in geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as Gamma coefficients, acoustic wave, striation density, and fusibility.

Here, numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, there are 12 data sets used for binary classification problems and 8 data sets used for multiple classification problems. For each data set, based on a prespecified training rate $\gamma \in [0,1]$, it is randomly partitioned into two subsets, a training set and a test set, such that the training set accounts for a fraction $\gamma$ of the total number of samples.
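For concreteness, the random $\gamma$-partition described above can be sketched as follows (a minimal illustration; `data` and `labels` are hypothetical arrays, not the borehole files themselves):

```python
import numpy as np

def train_test_partition(data, labels, gamma, seed=0):
    """Randomly split (data, labels) so the training set holds a gamma fraction."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(gamma * len(data))
    train, test = idx[:n_train], idx[n_train:]
    return (data[train], labels[train]), (data[test], labels[test])
```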

4.2. Experiment Design. The parameters in our models are chosen based on the size of the data set. The parameter $\epsilon$ depends on the number of classes and is defined as $\epsilon = \delta^2/|J|$, where $\delta \in (0,1)$. The choice of $\epsilon$ can be explained in this way: if there are $|J|$ classes and the training data are uniformly distributed, then for each probability $p_{ij}^l = 1/|J|$ its maximal variation range is between $p_{ij}^l(1-\delta)$ and $p_{ij}^l(1+\delta)$. The number of data intervals $K_l$ is defined as $K_l = |I|/(|J| \times K)$, such that if the training data are uniformly distributed, then in each data interval there are $K$ samples of each class. In what follows, we set $\delta = 0.2$ and $K = 8$.
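A small helper makes these parameter choices concrete (sizes here are hypothetical; the formula for $\epsilon$ follows the definition above):

```python
def rpc_parameters(n_samples: int, n_classes: int, delta: float = 0.2, K: int = 8):
    """Parameter choices of Section 4.2: distributional-set radius and interval count."""
    eps = delta**2 / n_classes           # epsilon = delta^2 / |J|
    K_l = n_samples // (n_classes * K)   # K_l = |I| / (|J| * K) data intervals
    return eps, K_l

print(rpc_parameters(n_samples=512, n_classes=2))  # -> (0.02, 32)
```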

We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (taking the $j$th class as an example):
\[
\begin{aligned}
\text{(RSVM)}\quad \min\ \ & \sum_{i\in I} \xi_{ij} + \lambda_j \bigl\| w_j \bigr\|\\
\text{s.t.}\ \ & \tilde{y}_{ij}\Bigl(\sum_{l\in L} w_j^l x_i^l + b_j\Bigr) \ge 1 - \xi_{ij},\quad i \in I,\\
& \xi_{ij} \ge 0,\quad i \in I,
\end{aligned}\tag{38}
\]
where $\tilde{y}_{ij} = 2y_{ij} - 1$ and $\lambda_j \ge 0$ is a regularization parameter. As pointed out by [8], $\lambda_j$ represents a trade-off between the number of training set errors and the amount of robustness with respect to spherical perturbations of the data points.

Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set. All entries are accuracies in %; an asterisk marks the best test accuracy for each training rate.

tr (%)   RSVM Train   RSVM Test   NBC Train   NBC Test   RPC Train   RPC Test
50       90.7         88.2        63.9        66.2       88.4        90.5*
55       89.9         88.6        69.1        72.8       89.5        89.9*
60       89.0         85.0        70.3        72.1       91.3        86.4*
65       86.3         85.9        72.1        72.8       88.0        92.5*
70       92.3         84.1        70.3        75.7       90.8        86.3*
75       88.8         87.9        74.2        74.6       88.7        91.6*
80       88.7         93.8*       90.0        87.5       88.3        93.3
85       89.5         89.3        93.4        89.6       89.2        91.0*
90       89.5         88.4        93.3        95.8*      89.2        92.6

Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set. All entries are accuracies in %; an asterisk marks the best test accuracy for each training rate.

tr (%)   RSVM Train   RSVM Test   NBC Train   NBC Test   RPC Train   RPC Test
50       91.4         84.8        76.5        68.9       91.3        87.5*
55       92.5         86.6        68.0        77.0       92.0        90.3*
60       89.8         86.1        72.9        73.8       88.9        90.9*
65       91.0         82.3        80.5        81.6       89.8        92.9*
70       86.8         95.5*       83.4        89.8       88.4        93.7
75       89.4         85.2        85.9        79.5       89.7        93.5*
80       91.8         80.8        88.1        79.9       89.7        91.1*
85       88.3         89.9        89.9        92.8       90.8        97.1*
90       88.5         90.2        88.8        94.2       90.9        97.2*

To make a fair comparison, in the following experiments we test a series of $\lambda$ values and choose the one with the best performance. Note that if $\lambda_j = 0$, we refer to this model as the classic support vector machine (SVM). See [6] for more details on RSVM and its applications to multiple classification problems.
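As an illustration of model (38) (a minimal sketch with synthetic data, not the setup used in the experiments below), the RSVM for a single class can be written in a few lines:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))                            # 40 samples, 3 features (synthetic)
y01 = (X @ np.array([1.0, -0.5, 0.2]) > 0).astype(int)  # 0/1 labels for one class j
y_tilde = 2 * y01 - 1                                   # y~_{ij} = 2*y_{ij} - 1, as in (38)

w_j = cp.Variable(3)
b_j = cp.Variable()
xi = cp.Variable(40, nonneg=True)                       # slack variables xi_{ij}
lam_j = 1.0                                             # regularization weight; tuned over a grid in the experiments

prob = cp.Problem(
    cp.Minimize(cp.sum(xi) + lam_j * cp.norm(w_j, 2)),
    [cp.multiply(y_tilde, X @ w_j + b_j) >= 1 - xi],
)
prob.solve()
print("objective:", prob.value)
```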

4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for binary classification problems using cross-validation. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].

Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set based on the parameter tr, which varies from 50% to 90%. The best classification accuracy on a test set among the three methods is marked with an asterisk.

Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, in most cases RSVM achieves the highest classification accuracy on the training sets, but its performance on the test sets is unsatisfactory. In most cases, the proposed RPC provides the highest classification accuracy on the test sets. NBC performs better on the test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can perform better on the test sets than on the training sets; thus it avoids the "overlearning" phenomenon.

To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, namely, T41–T45 and T61–T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and on all the data sets its test accuracy is higher than 80%. As in Tables 1 and 2, the robustness of the proposed RPC preserves its generalization ability on the test sets.

4.4. Test on Multiple Classification. In this subsection, we test the performance of RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter $\lambda$, we run a set of RSVMs with $\lambda$ varying from 0 to a sufficiently large number and select the one with the best performance on the test sets.

Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separating hyperplane for binary classification problems, while RPC is more robust and extends to multiple classification problems since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.

Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on the other data sets when tr = 70%. All entries are accuracies in %; an asterisk marks the best test accuracy.

Data set   RSVM Train   RSVM Test   NBC Train   NBC Test   RPC Train   RPC Test
T41        62.0         59.7        82.4        78.5       77.9        83.5*
T42        87.0         82.2        84.1        83.1       80.5        85.3*
T43        68.0         61.2        80.2        75.4       85.5        86.9*
T44        91.3         83.9        77.9        86.8       88.8        90.5*
T45        86.5         87.0        93.2        91.0*      84.0        89.1
T61        80.6         79.0        80.5        83.0       83.6        87.8*
T62        71.4         66.5        86.9        85.4*      86.3        85.4*
T63        63.7         69.5        89.6        89.1*      82.2        84.4
T64        88.2         86.7        97.0        96.9*      93.4        95.5
T65        75.0         63.4        79.7        81.5       90.5        92.9*

Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set. All entries are accuracies in %; an asterisk marks the best test accuracy.

Data set   RSVM Train   RSVM Test   NBC Train   NBC Test   RPC Train   RPC Test
M1         65.4         68.2        72.7        73.7       79.1        77.4*
M2         76.9         75.3        82.6        74.8       81.7        80.9*
M3         57.9         69.9        74.8        87.4       95.4        92.0*
M4         70.4         64.1        97.1        92.3       95.4        92.3*
M5         77.4         71.3        89.4        88.1*      92.0        88.0
M6         75.7         70.5        74.1        79.4       86.4        80.8*

Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set (accuracy on the training set versus training rate).

Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that in most cases RPC provides the highest accuracy among the three methods. The accuracy of RSVM exceeds that of NBC on the Y5 test set, while NBC outperforms RSVM on the T1 test set.

Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set (accuracy on the test set versus training rate).

To further test the performance of RPC on multiple classification problems, we carry out more experiments on data sets M1–M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always provides the highest classification accuracy among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).

Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set (accuracy on the training set versus training rate).

Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set (accuracy on the test set versus training rate).

From the tested real-life applications, we conclude that the proposed RPC is robust enough to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.

5. Conclusion

In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified $\chi^2$-distance. We assume that the true distribution lies in the constructed distributional set centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all possible distributions belonging to the distributional set.

Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone programming problem based on the conic duality theorem. Thus, our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also shows promising performance on multiple classification problems.

There remain many important extensions of our model. Other forms of loss function, such as the mean squared error function and hinge loss functions, should be studied to obtain tractable reformulations, and the resulting models may provide better performance. Probability models considering joint probability distribution information are also an interesting research direction.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.

[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223–228, AAAI Press, Menlo Park, Calif, USA, July 1992.

[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.

[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.

[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209–226, 2001.

[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.

[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.

[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.

[9] A. Ben-Tal and A. Nemirovski, "Robust optimization – methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453–480, 2002.

[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.

[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801–807, 2001.

[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555–582, 2003.

[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.

[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558–563, IEEE, July 2004.

[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253–1286, 2004.

[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175–1178, June 2004.

[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191–201, 2008.

[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123–136, 2007.

[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691–710, 2013.

[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.

[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.

[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.

[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.

[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85–90, 1986.

[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.

[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.

[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3 – a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.

[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341–357, 2013.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article A Robust Probability Classifier Based on the … · 2020. 1. 13. · Research Article A Robust Probability Classifier Based on the Modified 2-Distance YongzhiWang,

Mathematical Problems in Engineering 7

lArrrArr (

119908119897

119894119895

119911119897

119894119895

2120582119897

119894+ 119908119897

119894119895

) isin 1198713

119911119897

119894119895ge 0 119911

119897

119894119895ge 119903119897

119894119895+ 2120582119897

119894

(35)

By reinjecting the above constraints into (MP) the robustobjective function is equivalent to the following problem

min 119905

st sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894) + sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895119908119897

119894119895+ |119868| le 119905

(

119908119897

119894119895

119911119897

119894119895

2120582119897

119894+ 119908119897

119894119895

) isin 1198713

119911119897

119894119895ge119903119897

119894119895+2120582119897

119894 119911119897

119894119895 120582119897

119894119895ge0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119906119897

119894119895= 120572119897

119895(1 minus 2119868

119894119895) + 120579119897

119894 forall119894 isin 119868 119897 isin 119871 119895 isin 119869

(36)

By eliminating variable 119905 we complete the proof

Based on the Lemma 2 and Proposition 3 we obtain ourmain result

Proposition 4 The RPC problem can be solved as the follow-ing second order cone programming

min sum

119894isin119868

sum

119897isin119871

(120598120582119897

119894minus 120579119897

119894) + sum

119894isin119868

sum

119897isin119871

sum

119895isin119869

119901119897

119894119895119908119897

119894119895+ |119868|

st (

119908119897

119894119895

119911119897

119894119895

2120582119897

119894+ 119908119897

119894119895

) isin 1198713 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119903119897

119894119895= 120572119897

119895(1 minus 2119868

119894119895) + 120579119897

119894 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119911119897

119894119895ge 119903119897

119894119895+ 2120582119897

119894 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

sum

119897isin119871

(119902119897

1198941198951199061198970

119894119895minus 119902119897

119894119895V1198970119894119895

) ge 0 forall119894 isin 119868 119895 isin 119869

1 + sum

119897isin119871

(119902119897

1198941198951199061198971

119894119895minus 119902119897

119894119895V1198971119894119895

) ge 0 forall119894 isin 119868 119895 isin 119869

120572119897

119894119895minus 1199061198970

119894119895+ V1198970119894119895

ge 0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

V1198971119894119895

minus 120572119897

119894119895minus 1199061198971

119894119895ge 0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

120582119897

119894119895 119911119897

119894119895 1199061198971

119894119895 V1198971119894119895

1199061198970

119894119895 V1198970119894119895

ge 0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871

119903119897

119894119895 120579119897

119894119895 119908119897

119894119895 120572119897

119894119895isin R forall119894 isin 119868 119895 isin 119869 119897 isin 119871

(37)

4 Numerical Experiments onReal-World Applications

In this section numerical experiments on real-world appli-cations are carried out to verify the effectiveness of theproposed robust probability classifier model Specifically weconsider lithology classification data sets from our practicalapplication We compare our model with the regularizedSVM (RSVM) and the naive Bayes classifier (NBC) on bothbinary and multiple classification problems

All the numerical experiments are implemented in Mat-lab 770 and run on Intel(R) Core(TM) i5-4570 CPU SDPT3solver [27] is called to solve the second order cone programsin our proposed method and the regularized SVM

41 Data Sets Lithology classification is one of the basic tasksfor geological investigation To discriminate the lithology ofthe underground strata various electromagnetic techniquesare applied to the same strata to obtain different features suchas Gamma coefficients acoustic wave striation density andfusibility

Here numerical experiments are carried out on a seriesof data sets the borehole T1 Y4 Y5 and Y6 All boreholesare located in Tarim Basin China In total there are 12 datasets used for binary classification problems and 8 data setsused for multiple classification problems For each data setbased on a prespecified training rate 120574 isin [0 1] it is randomlypartitioned into two subsets a training set and a test set suchthat the size of training set accounts for 120574 of the total numberof samples

42 Experiment Design The parameters in our models arechosen based on the size of data setThe parameter 120598 dependson the number of the classes and defined as 120598 = 120575

2|119869| where

120575 isin (0 1)The choice of 120598 can be explained in this way if thereare |119869| classes and the training data are uniformly distributedthen for each probability 119901

119897

119894119895= 1|119869| its maximal variation

range is between 119901119897

119894119895(1 minus 120575) and 119901

119897

119894119895(1 + 120575) The number of

data intervals 119870119897is defined as 119870

119897= |119868|(|119869| times 119870) such that if

the training data are uniformly distributed then in each datainterval there are 119870 samples in each class In the followingcontext we set 120575 = 02 and 119870 = 8

We compare the performances of the proposed RPCmodel with the following regularized support vectormachinemodel [6] (take the 119895th class for example)

(RSVM) min sum

119894isin119868

120585119894119895

+ 120582119895

10038171003817100381710038171003817119908119895

10038171003817100381710038171003817

st 119910119894119895

(sum

119897isin119871

119908119897

119895119909119897

119894+ 119887119895) ge 1 minus 120585

119894119895 119894 isin 119868

120585119894119895

ge 0 119894 isin 119868

(38)

where 119910119894119895

= 2119910119894119895

minus1 and 120582119895

ge 0 is a regularization parameterAs pointed by [8] 120582

119895ge 0 represents a trade-off between the

number of training set errors and the amount of robustness

8 Mathematical Problems in Engineering

Table 1 Performances of RSVM NBC and RPC for binary classification problems on Y5 data set

tr () RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

50 907 882 639 662 884 905lowast

55 899 886 691 728 895 899lowast

60 890 850 703 721 913 864lowast

65 863 859 721 728 880 925lowast

70 923 841 703 757 908 863lowast

75 888 879 742 746 887 916lowast

80 887 938lowast 900 875 883 93385 895 893 934 896 892 910lowast

90 895 884 933 958lowast 892 926

Table 2 Performances of RSVM NBC and RPC for binary classification problems on T1 data set

tr () RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

50 914 848 765 689 913 875lowast

55 925 866 680 770 920 903lowast

60 898 861 729 738 889 909lowast

65 910 823 805 816 898 929lowast

70 868 955lowast 834 898 884 93775 894 852 859 795 897 935lowast

80 918 808 881 799 897 911lowast

85 883 899 899 928 908 971lowast

90 885 902 888 942 909 972lowast

with respect to spherical perturbations of the data pointsTo make a fair comparison in the following experiments wewill test a series of 120582 values and choose the one with bestperformance Note that if 120582

119895= 0 we refer to this model as the

classic support vector machine (SVM) See also [6] for moredetails onRSVMand its applications tomultiple classificationproblems

43 Test on Binary Classification In this subsection RSVMNBC and RPC are implemented on 12 data sets for the binaryclassification problems using the cross-validation methodsTo improve the performances of RSVM we transform theoriginal data by the popularly used polynomial kernels [6]

Tables 1 and 2 show the averaged classification per-formances of RSVM NBC and the proposed RPC (over10 randomly generated instances) for binary classificationproblems on Y5 and T1 data sets respectively For each dataset we randomly partition it into a training set and a testset based on the parameter tr which varies from 05 to 09The highest classification accuracy on a training set amongthese three methods is highlighted in bold while the bestclassification accuracy on a test set is marked with an asterisk

Tables 1 and 2 validate the effectiveness of the proposedRPC for binary classification problems compared with NBCand RSVM Specifically for most of the cases RSVM hasthe highest classification accuracy on training sets but itsperformance on test sets is unsatisfactory For most of thecases the proposed RPC provides the highest classification

accuracy on test sets NBC provides better performanceson test sets as the training rate increases The experimentalresults also show that for given training rate PRC can providebetter performances on test sets than that on training setsthus it can avoid the ldquooverlearningrdquo phenomenon

To further validate the effectiveness of the proposed RPCwe test it on additional 10 data sets that is T41ndashT45 andT61ndashT65 Table 3 reports the averaged performances of threemethods over 10 randomly generated instances when thetraining rate is set to 70 Except for data sets T45 T63and T64 RPC provides the highest accuracy on the test setsand for all the data sets its accuracy is higher than 80 Asshown in Tables 1 and 2 the robustness of the proposed RPCguarantees its scalability on the test sets

44 Test onMultiple Classification In this subsection we testthe performances of on multiple classification problems bycomparison with RSVM and NBC Since the performance ofRSVM is determined by its regularization parameter 120582 werun a set of RSVM with 120582 varying from 0 to a big enoughnumber and select the one with the best performance on testsets

Figures 1 and 3 plot the performances of three methodson Y5 and T1 training sets respectively Unlike the case ofbinary classification problems we can see that RPC providesa competitive performance even on the training sets Oneexplanation is that RSVM can outperform the proposed RPCon training sets by finding the optimal separation hyperplane

Mathematical Problems in Engineering 9

Table 3 Performances of RSVM NBC and RPC for binary classification problems on other data sets when tr = 70

Data set RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

T41 620 597 824 785 779 835lowast

T42 870 822 841 831 805 853lowast

T43 680 612 802 754 855 869lowast

T44 913 839 779 868 888 905lowast

T45 865 870 932 910lowast 840 891T61 806 790 805 830 836 878lowast

T62 714 665 869 854lowast 863 854lowast

T63 637 695 896 891lowast 822 844T64 882 867 970 969lowast 934 955T65 750 634 797 815 905 929lowast

Table 4 Performances of RSVM NBC and RPC for multiple classification problems on T1 data set

Data set RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

M1 654 682 727 737 791 774lowast

M2 769 753 826 748 817 809lowast

M3 579 699 748 874 954 920lowast

M4 704 641 971 923 954 923lowast

M5 774 713 894 881lowast 920 880M6 757 705 741 794 864 808lowast

06 065 07 075 08 085 09055

06

065

07

075

08

085

09

095

Training rate

Accu

racy

on

trai

ning

set (

)

RSVMNBCRPC

Figure 1 Performances of RSVM NBC and RPC on Y5 trainingset

for binary classification problem S while RPC is more robustto extend to solve multiple classification problems since ituses the nonlinear probability information of the data setsThe accuracy of NBC on the training sets also improves asthe training rate increases

Figures 2 and 4 show the performances of both methodson Y5 and T1 test sets respectively We can see that for most

06 065 07 075 08 085 09Training rate

RSVMNBCRPC

055

06

065

07

075

08

085

09

095

1

Accu

racy

on

test

set (

)

Figure 2 Performances of RSVM NBC and RPC on Y5 test set

of the cases RPC provides the highest accuracy among threemethods The accuracy of RSVM outperforms that of NBCon Y5 test set while the latter outperforms the former on theT1 test set

To further test the performance of PRC on multipleclassification problems we carry out more experiments ondata sets M1ndashM6 Table 4 reports the averaged performancesof three methods on these data sets when the training rateis set to 70 Except for the M5 data set PRC always

10 Mathematical Problems in Engineering

06

065

07

075

08

085

Accu

racy

on

trai

ning

set (

)

06 065 07 075 08 085 09Training rate

RSVMNBCRPC

Figure 3 Performances of RSVM NBC and RPC on T1 trainingset

055

06

065

07

075

08

085

09

Accu

racy

on

test

set (

)

06 065 07 075 08 085 09Training rate

RSVMNBCRPC

Figure 4 Performances of RSVM NBC and RPC on T1 test set

provides the highest classification performances among threemethods and even for the M5 data set its accuracy (880)is very close to the best one (881)

From the tested real-life application we conclude that theproposed RPC has the robustness to provide better perfor-mance for both binary and multiple classification problemscompared with RSVM and NBC The robustness of PRCenables it to avoid the ldquooverlearningrdquo phenomenon especiallyfor the binary classification problems

5 Conclusion

In this paper we propose a robust probability classifier modelto address the data uncertainty in classification problems

To quantitatively describe the data uncertainty a class-conditional distributional set is constructed based on themodified 120594

2-distance We assume that the true distribu-tion lies in the constructed distributional set centered inthe nominal probability distribution Based on the ldquolinearcombination assumptionrdquo for the posterior class-conditionalprobabilities we consider a classification criterion using theweighted sum of the posterior probabilities The optimalrobust probability classifier is determined by minimizingthe worst-case absolute error value over all the possibledistributions belonging to the distributional set

Our proposed model introduces the recently developeddistributionally robust optimization method into the clas-sifier design problems To obtain a computable modelwe transform the resulted optimization problem into anequivalent second order cone programming based on conicduality theorem Thus our model has the same compu-tational complexity as the classic support vector machineand numerical experiments on real-life application validateits effectiveness On the one hand the proposed robustprobability classifier provides a higher accuracy comparedwith RSVM and NBC by avoiding overlearning on trainingsets for binary classification problems on the other hand italso has a promising performance for multiple classificationproblems

There are still many important extensions in our modelOther forms of loss function such as the mean squarederror function and Hinge loss functions should be studied toobtain tractable reformulations and the resulted models mayprovide better performances Probability models consideringjoint probability distribution information are also interestingresearch directions

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] R O Duda and P E Hart Pattern Classification and SceneAnalysis John Wiley amp Sons New York NY USA 1973

[2] P Langley W Iba and K Thompson ldquoAn analysis of Bayesianclassifiersrdquo in Proceedings of the 10th National Conference onArtificial Intelligence (AAAI rsquo92) vol 90 pp 223ndash228 AAAIPress Menlo Park Calif USA July 1992

[3] B D Ripley Pattern Recognition and Neural Networks Cam-bridge University Press Cambridge UK 2007

[4] V Vapnik The Nature of Statistical Learning Theory SpringerBerlin Germany 2000

[5] M Ramoni and P Sebastiani ldquoRobust Bayes classifiersrdquo Artifi-cial Intelligence vol 125 no 1-2 pp 209ndash226 2001

[6] Y Shi Y Tian G Kou and Y Peng ldquoRobust support vectormachinesrdquo in Optimization Based Data Mining Theory andApplications Springer London UK 2011

[7] Y Z Wang Y L Zhang F L Zhang and J N Yi ldquoRobustquadratic regression and its application to energy-growth con-sumption problemrdquoMathematical Problems in Engineering vol2013 Article ID 210510 10 pages 2013

Mathematical Problems in Engineering 11

[8] A Ben-Tal L El Ghaoui and A Nemirovski Robust Optimiza-tion Princeton University Press Princeton NJ USA 2009

[9] A Ben-Tal and A Nemirovski ldquoRobust optimizationmdashmethodology and applicationsrdquo Mathematical Programmingvol 92 no 3 pp 453ndash480 2002

[10] D Bertsimas D B Brown and C Caramanis ldquoTheory andapplications of robust optimizationrdquo SIAM Review vol 53 no3 pp 464ndash501 2011

[11] G R G Lanckriet L E Ghaoui C Bhattacharyya and M IJordan ldquoMinimax probability machinerdquo in Advances in NeuralInformation Processing Systems pp 801ndash807 2001

[12] G R G Lanckriet L El Ghaoui C Bhattacharyya and M IJordan ldquoA robust minimax approach to classificationrdquo Journalof Machine Learning Research vol 3 no 3 pp 555ndash582 2003

[13] L El Ghaoui G R G Lanckriet and G Natsoulis ldquoRobustclassification with interval datardquo Tech Rep UCBCSD-03-1279Computer Science Division University of California 2003

[14] K Huang H Yang I King andM R Lyu ldquoLearning classifiersfrom imbalanced data based on biased minimax probabilitymachinerdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo04)vol 2 pp 558ndash563 IEEE July 2004

[15] K Huang H Yang I King M R Lyu and L Chan ldquoTheminimum error minimax probability machinerdquo The Journal ofMachine Learning Research vol 5 pp 1253ndash1286 2004

[16] C-H Hoi and M R Lyu ldquoRobust face recognition usingminimax probability machinerdquo in Proceedings of the IEEEInternational Conference on Multimedia and Expo (ICME rsquo04)vol 2 pp 1175ndash1178 June 2004

[17] T Kitahara S Mizuno and K Nakata ldquoQuadratic and convexminimax classification problemsrdquo Journal of the OperationsResearch Society of Japan vol 51 no 2 pp 191ndash201 2008

[18] T Kitahara S Mizuno and K Nakata ldquoAn extension of aminimax approach to multiple classificationrdquo Journal of theOperations Research Society of Japan vol 50 no 2 pp 123ndash1362007

[19] D Klabjan D Simchi-Levi and M Song ldquoRobust stochasticlot-sizing by means of histogramsrdquo Production and OperationsManagement vol 22 no 3 pp 691ndash710 2013

[20] L V Utkin ldquoA framework for imprecise robust one-class classi-fication modelsrdquo International Journal of Machine Learning andCybernetics 2012

[21] N Cristianini and J Shawe-Taylor An Introduction to SupportVector Machines and Other Kernel-Based Learning MethodsCambridge University Press Cambridge UK 2000

[22] B Scholkopf and A J Smola Learning with Kernels The MITPress Cambridge UK 2002

[23] T Hastie R Tibshirani and J J H Friedman The Elements ofStatistical Learning Springer New York NY USA 2001

[24] L A Zadeh ldquoA simple view of the Dempster-Shafer theory ofevidence and its implication for the rule of combinationrdquo AIMagazine vol 7 no 2 pp 85ndash90 1986

[25] R Yager M Fedrizzi and J Kacprzyk Advances in theDempster-Shafer Theory of Evidence John Wiley amp Sons NewYork NY USA 1994

[26] J F Sturm ldquoUsing SeDuMi 102 a MATLAB toolbox foroptimization over symmetric conesrdquoOptimizationMethods andSoftware vol 11 no 1 pp 625ndash653 1999

[27] K C Toh R H T Tutunu and M J Todd ldquoOn the implemen-tation and usage of SDPT3Cmdasha Matlab software package for

semidefinite quadratic linear programming version 40rdquo 2006httpwwwmathnusedusgsimmattohkcsdpt3guide4-0-draftpdf

[28] A Ben-Tal D D Hertog A D Waegenaere B Melenberg andG Rennen ldquoRobust solutions of optimization problems affectedby uncertain probabilitiesrdquo Management Science vol 59 no 2pp 341ndash357 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article A Robust Probability Classifier Based on the … · 2020. 1. 13. · Research Article A Robust Probability Classifier Based on the Modified 2-Distance YongzhiWang,

8 Mathematical Problems in Engineering

Table 1 Performances of RSVM NBC and RPC for binary classification problems on Y5 data set

tr () RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

50 907 882 639 662 884 905lowast

55 899 886 691 728 895 899lowast

60 890 850 703 721 913 864lowast

65 863 859 721 728 880 925lowast

70 923 841 703 757 908 863lowast

75 888 879 742 746 887 916lowast

80 887 938lowast 900 875 883 93385 895 893 934 896 892 910lowast

90 895 884 933 958lowast 892 926

Table 2 Performances of RSVM NBC and RPC for binary classification problems on T1 data set

tr () RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

50 914 848 765 689 913 875lowast

55 925 866 680 770 920 903lowast

60 898 861 729 738 889 909lowast

65 910 823 805 816 898 929lowast

70 868 955lowast 834 898 884 93775 894 852 859 795 897 935lowast

80 918 808 881 799 897 911lowast

85 883 899 899 928 908 971lowast

90 885 902 888 942 909 972lowast

with respect to spherical perturbations of the data pointsTo make a fair comparison in the following experiments wewill test a series of 120582 values and choose the one with bestperformance Note that if 120582

119895= 0 we refer to this model as the

classic support vector machine (SVM) See also [6] for moredetails onRSVMand its applications tomultiple classificationproblems

43 Test on Binary Classification In this subsection RSVMNBC and RPC are implemented on 12 data sets for the binaryclassification problems using the cross-validation methodsTo improve the performances of RSVM we transform theoriginal data by the popularly used polynomial kernels [6]

Tables 1 and 2 show the averaged classification per-formances of RSVM NBC and the proposed RPC (over10 randomly generated instances) for binary classificationproblems on Y5 and T1 data sets respectively For each dataset we randomly partition it into a training set and a testset based on the parameter tr which varies from 05 to 09The highest classification accuracy on a training set amongthese three methods is highlighted in bold while the bestclassification accuracy on a test set is marked with an asterisk

Tables 1 and 2 validate the effectiveness of the proposedRPC for binary classification problems compared with NBCand RSVM Specifically for most of the cases RSVM hasthe highest classification accuracy on training sets but itsperformance on test sets is unsatisfactory For most of thecases the proposed RPC provides the highest classification

accuracy on test sets NBC provides better performanceson test sets as the training rate increases The experimentalresults also show that for given training rate PRC can providebetter performances on test sets than that on training setsthus it can avoid the ldquooverlearningrdquo phenomenon

To further validate the effectiveness of the proposed RPCwe test it on additional 10 data sets that is T41ndashT45 andT61ndashT65 Table 3 reports the averaged performances of threemethods over 10 randomly generated instances when thetraining rate is set to 70 Except for data sets T45 T63and T64 RPC provides the highest accuracy on the test setsand for all the data sets its accuracy is higher than 80 Asshown in Tables 1 and 2 the robustness of the proposed RPCguarantees its scalability on the test sets

44 Test onMultiple Classification In this subsection we testthe performances of on multiple classification problems bycomparison with RSVM and NBC Since the performance ofRSVM is determined by its regularization parameter 120582 werun a set of RSVM with 120582 varying from 0 to a big enoughnumber and select the one with the best performance on testsets

Figures 1 and 3 plot the performances of three methodson Y5 and T1 training sets respectively Unlike the case ofbinary classification problems we can see that RPC providesa competitive performance even on the training sets Oneexplanation is that RSVM can outperform the proposed RPCon training sets by finding the optimal separation hyperplane

Mathematical Problems in Engineering 9

Table 3 Performances of RSVM NBC and RPC for binary classification problems on other data sets when tr = 70

Data set RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

T41 620 597 824 785 779 835lowast

T42 870 822 841 831 805 853lowast

T43 680 612 802 754 855 869lowast

T44 913 839 779 868 888 905lowast

T45 865 870 932 910lowast 840 891T61 806 790 805 830 836 878lowast

T62 714 665 869 854lowast 863 854lowast

T63 637 695 896 891lowast 822 844T64 882 867 970 969lowast 934 955T65 750 634 797 815 905 929lowast

Table 4 Performances of RSVM NBC and RPC for multiple classification problems on T1 data set

Data set RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()

M1 654 682 727 737 791 774lowast

M2 769 753 826 748 817 809lowast

M3 579 699 748 874 954 920lowast

M4 704 641 971 923 954 923lowast

M5 774 713 894 881lowast 920 880M6 757 705 741 794 864 808lowast

06 065 07 075 08 085 09055

06

065

07

075

08

085

09

095

Training rate

Accu

racy

on

trai

ning

set (

)

RSVMNBCRPC

Figure 1 Performances of RSVM NBC and RPC on Y5 trainingset

for binary classification problem S while RPC is more robustto extend to solve multiple classification problems since ituses the nonlinear probability information of the data setsThe accuracy of NBC on the training sets also improves asthe training rate increases

Figures 2 and 4 show the performances of both methodson Y5 and T1 test sets respectively We can see that for most

06 065 07 075 08 085 09Training rate

RSVMNBCRPC

055

06

065

07

075

08

085

09

095

1

Accu

racy

on

test

set (

)

Figure 2 Performances of RSVM NBC and RPC on Y5 test set

of the cases RPC provides the highest accuracy among threemethods The accuracy of RSVM outperforms that of NBCon Y5 test set while the latter outperforms the former on theT1 test set

To further test the performance of PRC on multipleclassification problems we carry out more experiments ondata sets M1ndashM6 Table 4 reports the averaged performancesof three methods on these data sets when the training rateis set to 70 Except for the M5 data set PRC always


[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set; x-axis: training rate, y-axis: accuracy on training set (%).]

[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set; x-axis: training rate, y-axis: accuracy on test set (%).]

provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).

From the tested real-life applications, we conclude that the proposed RPC is robust enough to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.

5. Conclusion

In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems.

To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified χ²-distance. We assume that the true distribution lies in the constructed distributional set, centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all possible distributions belonging to the distributional set.
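For concreteness, a distributional set of this type can be written in the following standard form; this is a sketch with assumed notation (p̂ for the nominal distribution and r for the radius), not the paper's verbatim definition.

```latex
% Modified chi^2-distance ball around the nominal distribution \hat{p};
% the radius r and the symbols here are assumed notation.
\[
  \mathcal{P} = \Bigl\{\, p \in \mathbb{R}^{n}_{+} :
      \sum_{i=1}^{n} p_i = 1,\;
      \sum_{i=1}^{n} \frac{(p_i - \hat{p}_i)^2}{\hat{p}_i} \le r
  \Bigr\}.
\]
```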

Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone program based on the conic duality theorem. Thus our model has the same computational complexity as the classic support vector machine, and numerical experiments on real-life applications validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also shows promising performance on multiple classification problems.
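To illustrate the computational route, a toy second order cone program of the worst-case absolute-error flavor can be handed to an off-the-shelf conic interior-point solver via CVXPY. This is a sketch only: A, b, and the problem shape are placeholders, not the paper's full RPC model.

```python
# Toy SOCP sketch (illustrative, not the paper's full RPC model):
# minimize t subject to the second order cone constraint ||Ax - b||_2 <= t.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))  # placeholder data matrix
b = rng.standard_normal(20)       # placeholder right-hand side

x = cp.Variable(5)
t = cp.Variable()

# ||Ax - b||_2 <= t is a second order cone constraint
prob = cp.Problem(cp.Minimize(t), [cp.norm(A @ x - b, 2) <= t])
prob.solve()  # dispatched to a conic interior-point solver

print("worst-case error bound t =", t.value)
```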

There are still many important extensions of our model. Other forms of the loss function, such as the mean squared error function and the hinge loss function, should be studied to obtain tractable reformulations, and the resulting models may provide better performance; see the standard forms below. Probability models considering joint probability distribution information are also an interesting research direction.
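For reference, the candidate losses mentioned above take the following textbook forms for a label y ∈ {−1, +1} and classifier score f(x); these are standard definitions, not reproduced from the paper.

```latex
% Absolute error, squared error, and hinge loss (standard forms).
\[
  \ell_{\mathrm{abs}} = \lvert y - f(x) \rvert, \qquad
  \ell_{\mathrm{sq}} = \bigl(y - f(x)\bigr)^{2}, \qquad
  \ell_{\mathrm{hinge}} = \max\bigl\{0,\; 1 - y\,f(x)\bigr\}.
\]
```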

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.

[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223–228, AAAI Press, Menlo Park, Calif, USA, July 1992.

[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.

[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.

[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209–226, 2001.

[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.

[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.

[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.

[9] A. Ben-Tal and A. Nemirovski, "Robust optimization – methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453–480, 2002.

[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.

[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801–807, 2001.

[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555–582, 2003.

[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.

[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558–563, IEEE, July 2004.

[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253–1286, 2004.

[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175–1178, June 2004.

[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191–201, 2008.

[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123–136, 2007.

[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691–710, 2013.

[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.

[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.

[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.

[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.

[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85–90, 1986.

[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.

[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.

[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3 – a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.

[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341–357, 2013.
